PyCogent-1.5.3/ChangeLog

*********
Changelog
*********

Cogent 1.5.2 - 1.5.3
====================

New Features
------------

* Added a withoutLostSpans() method to Feature objects in cogent.core.annotation. Useful after projecting features from one aligned sequence across to another. Implemented for ordinary Features and SimpleVariables but not xxy_list Variables.

Changes
-------

* Made Span.remapWith() a little clearer in cogent.core.location.
* Tidied annotation remapping code in cogent.core.annotation and cogent.core.sequence so that new positions are only calculated once when slicing, projecting, or otherwise remapping parts of sequences. The old code was needlessly doing it twice.

Bug Fixes
---------

* Fixed a bug in the BLAT application controller (cogent.app.blat) which would drop some input sequences when running assign_dna_reads_to_protein_database.
* Prevented negative widths from arising in cogent.draw.compatibility when the alignment is too wide.

Cogent 1.5.1 - 1.5.2
====================

New Features
------------

* Added a new mantel_test function to cogent.maths.stats.test that allows the type of significance test to be specified. This function is meant to replace the pre-existing mantel function.
* Added a new correlation_test function to cogent.maths.stats.test that computes the correlation (Pearson or Spearman) between two vectors, in addition to parametric and nonparametric tests of significance and confidence intervals. This function gives more control and information than the pre-existing correlation function. The spearman function is also a new addition.
* Added a new mc_t_two_sample function to cogent.maths.stats.test that performs a two-sample t-test and uses Monte Carlo permutations to determine nonparametric significance (similar to R's Deducer::perm.t.test).
* Added guppy 1.1, pplacer 1.1, ParsInsert 1.04, usearch 5.2.32, rtax 0.981, raxml 7.3.0, BLAT 34, and BWA 0.6.2 application controllers.
* Added new functions to cogent.maths.stats.rarefaction that provide alternative ways to perform rarefaction subsampling.
* Added convenience wrappers assign_dna_reads_to_database, assign_dna_reads_to_protein_database, and assign_dna_reads_to_dna_database for BLAT, BWA, and usearch, with a consistent interface across all three.

Changes
-------

* Minimum matplotlib version now set to 1.1.0.
* Minimum Vienna package version now set to 1.8.5.
* The pearson function in cogent.maths.stats.test has more robust error-checking.
* The mantel and mantel_test functions in cogent.maths.stats.test now check for symmetric, hollow distance matrices as input by default, with an option to disable these checks.
* cogent.draw.distribution_plots now uses matplotlib proxy Artists for legend creation (this simplifies the code a bit). Added the ability to set the size of plot figures through two new optional parameters to generate_box_plots and generate_comparative_plots. More robust checks have been put in place in case making room for labels fails (this now uses matplotlib 1.1.0's new tight_layout() functionality, but it can still fail in some cases).
* cogent.app.raxml (version 7.0.3) is now deprecated and will be removed in 1.6.0. Please use cogent.app.raxml_v730 (version 7.3.0) instead.
* cogent.app.muscle (version 3.6) is now deprecated and will be removed in 1.6.0. Please use cogent.app.muscle_v38 (version 3.8) instead.
* Updated cogent.app.uclust to handle --stepwords and --w.

Bug Fixes
---------

* Improved handling of reading frames from Ensembl.
* Actually included the test_ensembl/test_metazoa.py file that was accidentally overlooked.
* Fixed a small diff in the postscript output from RNAfold.
* Deprecation and discontinued warnings are now not ignored by default. cogent.util.warning was ignored in Python 2.7 because it uses DeprecationWarnings.
  These warnings are temporarily forced to not be ignored.
* Included the test_app/test_formatdb.py and test_app/test_mothur.py files in alltests.py.
* Fixed test_safe_md5 in tests.test_util.test_misc to no longer run an MD5 over a file in PyCogent (this caused the test to break when a new release went out because the MD5 changes due to the new version string). The test now writes a temporary file populated with fixed data and computes the MD5 from that.
* Fixed data_file_links.html in the PyCogent documentation to correctly point to several data files that were previously unreachable.

Cogent 1.5 - 1.5.1
==================

New Features
------------

* Alignments can now add sequences that are pairwise aligned to a sequence already present in the alignment.
* Alignment.addSeqs has more flexibility, with the specific order of sequences now controllable by the user. Thanks to Jan Kosinski for these two very useful patches!
* Increased options for reading Table data from files: a limit keyword, and line-based (as distinct from column-based) type-casting of delimited files.
* Flexible parser for raw Greengenes 16S records.
* Added fast pairwise distance estimation for DNA/RNA sequences. Currently only Jukes-Cantor 1969 and Tamura-Nei 1993 distances are provided. A cookbook entry was added to building_phylogenies.
* Added a PredefinedNucleotide substitution model class. This class uses Cython implementations for nucleotide models where analytical solutions are available. Substantial speedups are achieved. These implementations do not support obtaining the rate matrix; use the older style implementation if you require that (toggled by the rate_matrix_required argument).
* Added a fit_function function. This allows fitting any model to an x and y dataset using simplex to reduce the error between the model and the data.
* Added parsers for bowtie and for BLAT's PSL format.
* Table can now read/write gzipped files.
* GeneticCode class has a getStopIndices method.
  Returns the index positions of stop codons in a sequence for a user-specified translation frame.
* Added the LogDet metric to cogent.evolve.pairwise_distance. With able assistance from Yicheng Zhu. Thanks Yicheng!
* Added jackknife code to cogent.maths.stats.jackknife. This can be used to measure confidence of an estimate from a single vector or a matrix. Thanks to Anuj Pahwa for help implementing this!
* Added the abundance-based Jaccard beta diversity index (Chao et al., 2005).

Changes
-------

* Python 2.6 is now the minimum required version.
* We have removed code authored by Ziheng Yang as it is not available under an open source license. We note a modest performance hit for nucleotide and dinucleotide models. Codon models are not affected. The PredefinedNucleotide models recently added are faster than the older approach that used Yang's code.
* The PredefinedNucleotide models are now available via cogent.evolve.models. The old-style (slower) nucleotide models can be obtained by setting rate_matrix_required=True.
* RichGenbankParser can now return WGS blocks.

Bug Fixes
---------

* Fixed a bug that crept into doing consensus trees from tree collections. Thanks to Klara Verbyla for catching this one!
* Fixed a bug (#3170464) affecting obtaining sequences from non-chromosome level coordinate systems. Thanks to brandoninvergo for reporting and Hua Ying for the patch!
* Fixed a bug (#2987278) associated with missing unit tests for gbseq.py.
* Fixed a bug (#2987264) associated with missing unit tests for paml_matrix.py.
* Fixed a bug (#2987238) associated with missing unit tests for tinyseq.py.

Cogent 1.4.1 - 1.5
==================

New Features
------------

* Major additions to the Cookbook. Thanks to the many contributors (too many to list here)!
* Added an AlleleFreqs attribute to ensembl Variation objects.
* Added a getGeneByStableId method to genome objects.
* Added an Introns attribute to Transcript objects and an Intron class. (Thanks to Hua Ying for this patch!)
* Added the Mann-Whitney test and a Monte Carlo version.
* Exploratory and confirmatory period estimation techniques (suitable for symbolic and continuous data).
* Information-theoretic measures (AIC and BIC) added.
* Drawing of trees with collapsed nodes.
* Progress display indicator support for terminal and GUI apps.
* Added a parser for Illumina HiSeq2000 and GAiix sequence files as cogent.parse.illumina_sequence.MinimalIlluminaSequenceParser.
* Added a parser for FASTQ files, one of the output options for Illumina's workflow; also added a cookbook demo.
* Added functionality for parsing SFF files without the Roche tools in cogent.parse.binary_sff.

Changes
-------

* Thousand-fold performance improvement to nmds.
* >10-fold performance improvements to some Table operations.

Bug Fixes
---------

* Fixed a bug in cogent.core.alphabet that resulted in 4 tests erroring out when using NumPy v1.4.1.
* Sourceforge bugs 2987289, 2987277, 2987378, 2987272, and 2987269 were addressed and fixed.

Cogent 1.4 - 1.4.1
==================

New Features
------------

* Simplified getting genetic variation from Ensembl, and provide the protein location of nonsynonymous variants.
* Rate heterogeneity variants of pre-canned continuous-time substitution models are easier to define.
* Added an implementation of generalised neighbour joining.
* New capabilities for examining genetic variants using Ensembl.
* Phylogenetic methods that can return collections of trees do so as a TreeCollections object, which has writeToFile and getConsensusTree methods.
* Added a uclust application controller which currently supports uclust v1.1.579.

Changes
-------

* Major additions to Cookbook documentation courtesy of Tom Elliot. Thanks Tom!
* Improvements to parallelisation.

Bug Fixes
---------

Cogent 1.3 - 1.4
================

New Features
------------

* Added support for manipulating and handling macromolecular structures. This includes a PDB file format parser and a hierarchical data structure to represent molecules.
  Various utilities to manipulate molecules, e.g. clean-up, plus efficient surface area and proximity-contact calculation via Cython. Expansion into unit cells and crystal lattices is also possible.
* Added a KD-Tree class for fast nearest-neighbour look-ups. Supports k-nearest-neighbour and all-neighbours-within-radius queries, currently only in 3D.
* Added new tools for evaluating clustering stresses, goodness_of_fit, in cogent.cluster.
* Added a new clustering tool, procrustes analysis, in cogent.cluster.
* The phylo.distance.EstimateDistances class has a new argument, modify_lf. This allows the user to modify the likelihood function, possibly constraining parameters, freeing them, setting bounds, pre-optimising, etc.
* cogent Table mods: added transposed and normalized methods, while the summed method can now return column or row sums.
* Added a new context-dependent model class. The conditional nucleotide frequency weighted class has been demonstrated to be superior to the model forms of Goldman and Yang (1994) and Muse and Gaut (1994). The publication supporting this claim is in press at Mol Biol Evol, authored by Yap, Lindsay, Easteal and Huttley.
* Added a new argument to LoadTable to facilitate speedier loading of large files. The static_column_types argument auto-generates a separator format parser with column conversion for numeric columns.
* Added a BLAST XML parser + tests.
* Added a compatibility matrix module for determining reticulate evolution.
* Added 'start' and 'order' options to the WLS and ML tree-finding method .trex(). These allow the search of tree space to be constrained to start at a particular subtree and to proceed in a specified order of leaf additions.
* Consensus tree of weighted trees from phylo.maximum_likelihood.
* Added Alignment.withGapsFrom() (aka mirrorGaps), which mirrors the gaps of a provided alignment.
* Added ANOVA to maths.stats.test.
* LoadTable gets an optional argument (static_column_types) to simplify speedy loading of big files.
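The KD-Tree entry above accelerates spatial queries such as "all neighbours within a radius". As a rough illustration of what such a query computes — this is a brute-force pure-Python sketch for exposition, not the PyCogent KD-Tree API, and the function name is ours — consider:

```python
def neighbors_within_radius(points, query, radius):
    """Return all 3D points within `radius` of `query`.

    Brute-force O(n) scan over every point; a KD-tree answers the
    same query much faster by pruning whole subtrees of space.
    """
    r2 = radius * radius  # compare squared distances, avoiding sqrt
    return [p for p in points
            if sum((a - b) ** 2 for a, b in zip(p, query)) <= r2]

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 3.0, 3.0)]
# both points at distance 0.5 are kept; the far point is excluded
print(neighbors_within_radius(atoms, (0.5, 0.0, 0.0), 1.0))
```

For the proximity-contact calculations mentioned above, a KD-tree turns this per-atom scan from O(n) into roughly O(log n), which is why the class was added for structure work.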
Changes
-------

* Python 2.4 is no longer supported.
* NumPy 1.3 is now the minimum supported NumPy version.
* zlib is now a dependency.
* cogent.format.table.asReportlabTable is being discontinued in version 1.5. This is the last dependency on reportlab, and its removal will simplify installation.
* The conditional nucleotide model (Yap et al 2009) will be made the default model form for context-dependent models in version 1.5.
* Changed the required MPI library from PyxMPI to mpi4py.
* Moved all of the cogent.draw.matplotlib.* modules up to cogent.draw.*.
* Substituted matplotlib for reportlab throughout cogent.draw.
* cogent.db.ensembl code updated to work with the latest Ensembl release (56).
* motif prob pseudocount option, used for initial values of optimisable mprobs.
* The mlagan application controller has been removed.

Bug Fixes
---------

* Fix and test for two bugs in multiple alignment: one in the pairwise.py Hirschberg code and the other in indel_positions.py, where gaps at the end were effectively taken to be deletions and never insertions, unlike gaps at the start.
* Fix #2811993 alignment getMotifProbs: the allow_gap argument now has an effect.

Cogent 1.2 - 1.3
================

New Features
------------

* Python 2.6 is now supported.
* Added cogent.cluster.nmds, code to perform nonmetric multidimensional scaling. Not as fast as others (e.g. R's MASS package, isoMDS).
* Documentation ported to the Sphinx documentation generator.
* Major additions to documentation in doc/examples.
* Added partial support for querying the Ensembl MySQL databases. This capacity has additional dependencies (MySQL-python and SQLAlchemy). This module should be considered alpha-level code (although it has worked reliably for some time in the hands of the developers).
* Introduced a new substitution model family. This family has the same form as that originally described by Muse and Gaut (Mol Biol Evol, 11, 715-24). These models were applied in the article by Lindsay et al.
  (2008, Biol Direct, 3, 52). Model state defaults to the tuple-weighted matrix (e.g. the Goldman and Yang codon models). Selecting the nucleotide-weighted matrix is done using the mprob_model argument.
* Likelihood functions now have a getStatistics method. This returns cogent Table objects. Optional arguments are with_motif_probs and with_titles, where the latter refers to the Table.Title attribute being set.
* Added an rna_struct formatter and an rna_plot parser.
* A fast unifrac method implementation.
* Added new methods on tree-related objects: TreeNode.getNodesDict, TreeNode.reassignNames, PhyloNode.tipsWithinDistance, PhyloNode.totalDescendingBranchLength.
* Adopted the Sphinx documentation system; added many new use cases and improved existing ones.
* Added setTablesFormat to the likelihood function. This allows setting the spacing and display precision of the stats tables resulting from printing a likelihood function.
* Added a non-parametric multidimensional scaling (nmds) method.
* Added a separate app controller for FastTree v1.0.
* New protein MolType, PROTEIN_WITH_STOP, that supports the stop codon, plus new sequence objects, ProteinWithStopSequence and ModelProteinWithStopSequence, to support the new MolType.
* Support for Cython added.

Changes
-------

* reconstructAncestralSequences has been deprecated in favour of reconstructAncestralSeqs. It will be removed in version 1.4.
* Updated parsers.
* TreeNode.getNewick is now iterative. For the recursive version, use TreeNode.getNewickRecursive. Both the iterative and recursive methods of getNewick now support the keyword 'escape_name'. DndParser now supports the keyword 'unescape_name'. DndParser unescape_name will now try to remove underscores (like underscore_unmunge).
* Generalized MinimalRnaalifoldParser to parse structures and energies from RNAfold as well.
* PhyloNode.tipToTipDistances can now work on a subset of tips by passing either a list of tip names or a list of tip nodes using the endpoints param.
* Deprecated reconstructAncestralSequences in favour of reconstructAncestralSeqs.
* Updated app controller parameters for FastTree v1.1.
* Allow and require a more recent version of Pyrex.
* LoadTree is now a method of cogent.__init__.

.. warning:: Pyrex is no longer the accepted way to develop extensions. Use `Cython `_ instead.

Bug Fixes
---------

* Fixed the alignment sample methods and xsample (randint had the wrong max argument).
* Fixed the tests that no longer work with NCBI's API changes, and added a big warning for the unwary in the ncbi module pointing users to the "official" list of reported rettypes. Note that the rettypes changed recently and NCBI says they do not plan to support the old behavior.
* The TreeNode operators cmp, contains, and any operator that relies on those methods will now only perform comparisons against the object's id. Prior behavior first checked the TreeNode's Name attribute and then the object id if the Name was not present. This resulted in ambiguous behavior in certain situations.
* Added type conversion to Mantel so it works on lists.
* kendall tau fix, bug 2794469.
* Table now raises a RuntimeError if provided malformed data.
* Fixed silent type conversion in TreeNode, bug 2804431.
* RangeNode now properly passes kwargs to its base class, bug 2804441.
* DndParser was not producing correct trees in niche cases, bug 2798580.

Cogent 1.1 - 1.2
================

New Features
------------

* Code for performing molecular coevolution/covariation analyses on multiple sequence alignments, plus support files. (Described in J. Caporaso et al., BMC Evol Biol, 8(1):327, 2008.)
* App controller for `CD-HIT `_.
* A ParameterEnumerator object is now available in cogent.app.util. This method will iterate over a range of parameters, returning parameter dicts that can be used with the relevant app controller.
* Sequence and alignment objects that inherit from Annotatable can now mask regions of sequence, returning new objects where the observed sequence characters in the regions spanned by the annotations are replaced by a mask character (mask_char).
* Table.count method. Counts the number of rows satisfying some condition.
* Format writers for stockholm and clustal formats.
* App controllers for dotur, infernal, RNAplot, and RNAalifold. Parsers for infernal and dotur.
* Empirical nucleotide substitution matrix estimation code (described in M. Oscamou et al., BMC Bioinformatics, 9(1):511, 2008).

New Documentation
-----------------

Usage examples (see doc/) were added for the following:

* Querying NCBI
* The motif module
* UPGMA clustering
* Using the ParameterCombinations object and generating command lines
* Coevolution modelling
* Sequence annotation handling
* Table manipulation
* Principal coordinates analysis (PCoA)
* Genetic code objects
* How to construct profiles, consensus seqs, etc.

Changes
-------

* PyCogent no longer relies on the Python math module. All math functions are now imported from numpy. The main motivator was to remove casting between numpy and Python types, such as a 'numpy.float64' variable unknowingly being converted to a Python 'float' type.
* Table.getDistinctValues now handles multiple columns.
* Table.Header is now an immutable property of Tables. Use the withNewHeader method for modifying Header labels.
* The TreeNode comparison methods now only check against the object's ID.

Bug Fixes
---------

* LoadTable was ignoring the title argument for a standard file read.
* Fixed a bug in Table.joined: when a join produces no result, it now returns a Table with 0 rows.
* Improved consistency of LoadTable with previous behaviour of cogent.Table.
* Added methods to detect large sequences/alphabets and handle counts from sequence triples correctly.
* goldman_q_dna_pair() and goldman_q_rna_pair() now average the frequency matrix used.
* Reverse complement of annotations with disjoint spans now correctly preserves order.
* Fixed an ambiguity in TreeNode comparison methods which resulted in the prune method incorrectly removing entire subtrees.

Cogent 1.0.1 - 1.1
==================

New Features
------------

* Added functionality to cogent.util.unit_test.TestCase:

  - assertSameObj - use in place of 'assert a is b'
  - assertNotSameObj - use in place of 'assert a is not b'
  - assertIsPermutation - checks if observed is a permutation of items
  - assertIsProb - checks whether a value(s) are probabilities
  - assertIsBetween - use in place of 'assert a < obs < b'
  - assertLessThan - use in place of 'assert obs < value'
  - assertGreaterThan - use in place of 'assert obs > value'
  - assertSimiliarFreqs - compares frequency distributions using a G-test
  - assertSimiliarMeans - compares samples using a t-test
  - _set_suite_pvalue - set a suite-wide pvalue

  .. note:: Both the similarity assertions can have a pvalue specified in the testing module. This pvalue can be overwritten during alltests.py by calling TestCase._set_suite_pvalue(pvalue).

  .. note:: All of these new assert methods can take lists as well. For instance::

      obs = [1,2,3,4]
      value = 5
      self.assertLessThan(obs, value)

* Alignment constructor now checks for iterators (e.g. results from parsers) and lists() them -- this allows direct construction like Alignment(MinimalFastaParser(open(myfile.fasta))). Applies to both dense and sparse alignments, and SequenceCollections.
* Parameterized LoadTree underscore stripping in node names, and turned it off by default.
* New Table features and refactor. Trivial edits of the code provided by Felix Schill for SQL-like table joining. Principally a unification of the different types of table joins (inner- and outer-join) between 2 tables, and porting of all testing code into test_table.rest. The method Table.joined provides the interface (see tests/test_table.rest for usage).
* Added a Table.count method, which simply counts the number of rows satisfying some condition. The method has the same args as Table.filtered.
* Functions for obtaining the rate matrix for 2 or 3 sequences using the Goldman method. Support for RNA and DNA.
* Additions to the clustalw and muscle app controllers:

  muscle.py:
    add_seqs_to_alignment, align_two_alignments, align_unaligned_seqs, align_and_build_tree, build_tree_from_alignment
  clustalw.py:
    align_unaligned_seqs, bootstrap_tree_from_alignment, build_tree_from_alignment, align_and_build_tree

* App controllers for Clearcut, ClustalW, and Mafft.
* Added midpoint rooting.
* Accept FloatingPointError as well as ZeroDivisionError to accommodate numpy.
* Trees can now compare themselves to other trees using a couple of methods: subsets (compare based on the fraction of subsets of labels defined by clades that are the same in the two trees) and tip_to_tip (compare based on correlations of tip-to-tip distances). Both of these are fairly badly behaved statistically, so should always be compared to a distribution of values from random (e.g. label-permuted) trees using Monte Carlo.
* Added the ability to exclude non-shared taxa from the subsets tree comparison method.
* Added Zongzhi's combination and permutation implementations to transform.py.
* Added some docs to UPGMA_cluster.
* Added median to cogent.maths.stats.test, because the numpy version does not support an axis parameter. This function works like numpy functions (sum, mean, etc.) where you can specify axis, and should be safe in place of numpy.median.

Changes
-------

* Many changes to the core objects, mainly for compatibility. Major changes in this update:

  - ModelSequence now inherits from SequenceI and supports the various Sequence methods (e.g. nucleic acids can reverse-complement, etc.). Type checking is still performed using strings (e.g. for ambiguous characters, etc.) and could be improved, but everything seems to work. Bug # 1851959.
  - ModelProteinSequence added. Bug # 1851961.
  - DenseAlignment and ModelSequence can now handle the '?' character, which is added to the Alphabet during install. Bug # 1851483.
  - Fixed a severe bug in moltype constructors that mutated the dict of ambiguous states after construction of each of the standard moltypes (for example, preventing re-instantiation of a similar moltype after the initial install): bug # 1851482. This would have been very confusing for anyone trying to experiment with custom MolTypes.
  - DenseAlignment now implements many methods of Alignment (some of which have actually been moved into SequenceCollection), e.g. getGappedSeq(), as per bug # 1816573.

* Added a parameter to MageListFromString and MageGroupFromString. Can now handle 'on' as well as 'off'.
* SequenceCollection, Alignment, etc. now check for duplicate seq labels and raise an exception or strip duplicates as desired. Added a unit test to cover this case.

  - SequenceCollection now also produces FASTA as default __str__ behavior like the other objects.
  - DenseAlignment now iterates over its mapped items, not the indices for those items, by default. This allows API compatibility with Alignment but is slow: it may be worth optimizing this for cases such as detecting ambiguous chars, as I have already implemented for gaps.

* Updated std in cogent.maths.stats.test:

  - std now takes an axis parameter like numpy functions (sum, mean, etc.).
  - also added a docstring and tests.

  .. note:: cogent.maths.stats.test imports sqrt from numpy instead of math in order to allow std to work on arrays.

* Tree now uses an iterative implementation of traverse().

  .. warning:: If you **do** modify the tree while using traverse(), you will get undesired results. If you need to modify the tree, use traverse_recursive() instead. This only applies to the tree topology (e.g. if you are adding or deleting nodes, or moving nodes around; it doesn't apply if you are changing branch lengths, etc.).
  The only two uses I found in Cogent where the tree is modified during iteration are in rna2d (some of the structure tree operations) and the prune() method. I have changed both to use traverse_recursive for now. However, there might be issues with other code. It might be worth figuring out how to make the iterative method do the right thing when the tree is modified -- suggestions are welcome provided they do not impose substantial performance penalties.

* Made compatible with Python 2.4.
* Changed dev status in setup call.
* Dropped comments indicating Windows support.

Bug Fixes
---------

* Fixed bug 1850981: UPGMA does not check diagonals. This bug arose because the UPGMA algorithm picks the smallest distance between nodes at each step but should never pick something on the diagonal. To prevent a diagonal choice we set the diagonal to a large number, but for very large matrices the diagonal was sometimes chosen because that number decreases in value as the distances are averaged during node collapse. To prevent this error, the program now checks that the selected smallest_index is not on the diagonal; if it is, it reassigns the diagonal to the large number.
* Fixed a bug in gff.py attribute string parsing. If the attribute string did not contain double quotes, find() returned -1, so the last character of the string was inadvertently omitted.
* Fixed an error in taxonomy convenience functions that failed to pass in the specified database. This used to be masked by NCBI's automatic conversion between protein and nucleotide ids, but apparently this conversion no longer operates in the tested cases.
* Fixed unittest methods. Zongzhi noticed that assertFloatEqual would pass when a shape (4,0) array was compared against a shape (4,4) array. I added tests for assertFloatEqual, assertFloatEqualAbs, assertFloatEqualRel and assertEqual. The same bug was noticed in assertFloatEqualRel. They are now fixed.
  These fixes resulted in errors in maths.stats.test.std and correlation_matrix. The std function needed a work-over, but the correlation_matrix failure was a fault in the test case itself.
* Fixed a bug in reading tab-delimited tables: a failure when a record had a missing observation in the last field. Line stripping of only line-feed characters is now done.
* Fixed an important bug in metric_scaling. numpy eig() produces an eigenvector array that is the transpose of Numeric eig(). Therefore, any code that does not take this into account will produce results that are TOTALLY INCORRECT when fed to downstream analyses. Coordinates from this module prior to this patch are incorrect and are not to be trusted.
* Fixed a typo in a dialign test.
* tree __repr__ is now more robust to non-str Name entries.
* seqsim.rangenode traverse is now compatible with the base class.
* Fixed a line color bug in PR2 bias plots.
* Added a method to dump raw coords from dendrogram.
* Fixed a call to eigenvector when no pyrex.
* Fixed a bug in nonrecursive postorder traversal if not root.

Cogent 1.0 - (9/8/2007)
=======================

* Initial release

PyCogent-1.5.3/cogent-requirements.txt

cogent
numpy>=1.3.0

PyCogent-1.5.3/README

The Readme
==========

:Download: `From sourceforge `_ or follow the :ref:`quick-install` instructions.
:Registration: To be informed of bugs and new releases, please subscribe to the `mailing lists at sourceforge `_.

Dependencies
------------

The toolkit requires Python 2.6 or greater, and Numpy 1.3 or greater. Aside from these, the dependencies below are optional and the code will work as is. A C compiler, however, will allow the external C modules responsible for the likelihood and matrix exponentiation calculations to be compiled, resulting in significantly improved performance.

.. _required:

Required
^^^^^^^^

- Python_: the language the toolkit is primarily written in, and in which the user writes control scripts.
- Numpy_: a python module used for speeding up matrix computations. It is available as source code for \*nix.
- zlib_: a compression library which is available for all platforms and comes pre-installed on most too. If, by chance, your platform doesn't have this installed then download the source from the zlib_ site and follow the install instructions, or refer to the instructions for `compiling matplotlib`_.

.. note:: On some linux platforms (like Ubuntu), you must specifically install a ``python-dev`` package so that the Python_ header files required for building some external dependencies are available.

Optional
^^^^^^^^

- C compiler: standard on most \*nix platforms. On MacOS X this is available for free in the Developer tools which, if you don't already have them, can be obtained from Apple_.
- Matplotlib_: used to plot several kinds of graphs related to codon usage. For installation, see these instructions for `compiling matplotlib`_.
- Cython_: only necessary if you are a developer who wants to modify the \*.pyx files.
- mpi4py_: Message Passing Interface bindings, required for parallel computation.
- SQLAlchemy_ and `MySQL-python`_: required for the Ensembl querying code.

If you use the :ref:`quick-install` approach, these are all specified in the requirements file.
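The version floors above are enforced at install time by setup.py, which compares version tuples and raises a RuntimeError if the interpreter or NumPy is too old. A minimal sketch of that kind of guard (the helper name is ours, for illustration only — it is not part of PyCogent):

```python
import sys

def meets_minimum(actual, minimum):
    """True when version tuple `actual` is at least `minimum`.

    Python compares tuples element-wise, so (2, 7) >= (2, 6) and
    (1, 2, 9) < (1, 3) both behave as version comparisons should.
    """
    return tuple(actual) >= tuple(minimum)

# mirrors the Python >= 2.6 guard in PyCogent's setup.py
print(meets_minimum(sys.version_info[:2], (2, 6)))
```

Running such a check before ``python setup.py build`` saves a partial build that would only fail later.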
Installation
------------

If you don't wish to use the :ref:`quick-install` approach then the conventional \*nix platform (including MacOS X) python package installation procedure applies. Download the software from `here `_. Uncompress the archive, change into the ``PyCogent`` directory, and type::

    $ python setup.py build

This automatically compiles the modules. If you have administrative privileges type::

    $ sudo python setup.py install

This then places the entire package into your python/site-packages folder.

If you do not have administrator privileges on your machine you can change the build approach to::

    $ python setup.py build_ext -if

which compiles the extensions in place (the ``i`` option) forcibly (the ``f`` option, i.e. even if they've already been compiled). Then move the cogent directory to where you want it (or leave it in place) and add this location to your python path using ``sys.path.insert(0, "/your/path/to/PyCogent")`` in each script, or by setting shell environment variables (e.g. ``$ export PYTHONPATH=/your/path/to/PyCogent:$PYTHONPATH``).

Testing
-------

``PyCogent/tests`` contains all the tests (currently >3100). You can most readily run the tests using the ``PyCogent/run_tests`` shell script, by typing:

.. code-block:: guess

    $ sh run_tests

which will automatically build extensions in place, set up the PYTHONPATH and run ``PyCogent/tests/alltests.py``. Note that if certain optional applications are not installed this will be indicated in the output as "can't find" or "not installed". A "``.``" will be printed to screen for each test and if they all pass, you'll see output like:

.. code-block:: guess

    Ran 3299 tests in 58.128s

    OK

Tips for usage
--------------

A good IDE can greatly simplify writing control scripts. Features such as code completion and definition look-up are extremely useful. For a complete list of `editors go here`_.

To get help on attributes of an object in python, use
code-block:: python >>> dir(myalign) to list the attributes of ``myalign`` or .. code-block:: python >>> help(myalign.writeToFile) to figure out how to use the ``myalign.writeToFile`` method. Also note that the directory structure of the package mirrors the import statements required to use a module -- to see the contents of ``alignment.py`` or ``sequence.py`` you need to look in the ``cogent/core`` directory; to use the classes in those files you specify ``cogent.core`` when importing. .. _Python: http://www.python.org .. _Cython: http://www.cython.org/ .. _Numpy: http://numpy.scipy.org/ .. _Matplotlib: http://matplotlib.sourceforge.net .. _Apple: http://www.apple.com .. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ .. _`editors go here`: http://www.python.org/cgi-bin/moinmoin/PythonEditors .. _mpi4py: http://code.google.com/p/mpi4py .. _`restructured text`: http://docutils.sourceforge.net/rst.html .. _gcc: http://gcc.gnu.org/ .. _SQLAlchemy: http://www.sqlalchemy.org .. _`MySQL-python`: http://sourceforge.net/projects/mysql-python .. _zlib: http://www.zlib.net/ .. _`compiling matplotlib`: http://sourceforge.net/projects/pycogent/forums/forum/651121/topic/5635916 PyCogent-1.5.3/run_tests000755 000765 000024 00000000612 11623063350 016212 0ustar00jrideoutstaff000000 000000 #!/bin/sh # make sure we remove .pyc files in case someone has renamed a module .. find . 
-name '*.pyc' -delete # for automated testing - check if using alternate python install PYTHON_EXE=python if [ $PYTHON_TEST_EXE ]; then PYTHON_EXE=$PYTHON_TEST_EXE; fi set -e $PYTHON_EXE setup.py build_ext --inplace export PYTHONPATH=`pwd`:$PYTHONPATH cd tests nice $PYTHON_EXE alltests.py "$@" PyCogent-1.5.3/setup.py000644 000765 000024 00000015136 12024702176 015761 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from distutils.core import setup import sys, os, re, subprocess __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2011, The Cogent Project" __contributors__ = ["Peter Maxwell", "Gavin Huttley", "Matthew Wakefield", "Greg Caporaso", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" # Check Python version, no point installing if an unsupported version is in place if sys.version_info < (2, 6): py_version = ".".join([str(n) for n in sys.version_info]) raise RuntimeError("Python-2.6 or greater is required, Python-%s used." % py_version) # Check Numpy version, no point installing if an unsupported version is in place try: import numpy except ImportError: raise RuntimeError("Numpy required but not found.") numpy_version = re.split("[^\d]", numpy.__version__) numpy_version_info = tuple([int(i) for i in numpy_version if i.isdigit()]) if numpy_version_info < (1, 3): raise RuntimeError("Numpy-1.3 is required, %s found." % numpy.__version__) doc_imports_failed = False try: import sphinx except ImportError: doc_imports_failed = True # A new command for predist, ie: pyrexc but no compile. import distutils.ccompiler class NullCompiler(distutils.ccompiler.CCompiler): # this is basically to stop pyrexc building binaries, just the .c files executables = () def __init__(self, *args, **kw): pass def compile(self, *args, **kw): return [] def link(self, *args, **kw): pass # Pyrex makes some messy C code so limit some warnings when we know how. 
import distutils.sysconfig if (distutils.sysconfig.get_config_var('CC') or '').startswith("gcc"): pyrex_compile_options = ['-w'] else: pyrex_compile_options = [] # On windows with no commandline probably means we want to build an installer. if sys.platform == "win32" and len(sys.argv) < 2: sys.argv[1:] = ["bdist_wininst"] # Restructured Text -> HTML def build_html(): if doc_imports_failed: print "Failed to build html due to ImportErrors for sphinx" return cwd = os.getcwd() os.chdir('doc') subprocess.call(["make", "html"]) os.chdir(cwd) print "Built index.html" # Compiling Pyrex modules to .c and .so include_path = os.path.join(os.getcwd(), 'include') # find arrayobject.h on every system an alternative would be to put # arrayobject.h into pycogent/include, but why .. numpy_include_path = numpy.get_include() distutils_extras = {"include_dirs": [include_path, numpy_include_path]} try: if 'DONT_USE_PYREX' in os.environ: raise ImportError from Cython.Compiler.Version import version version = tuple([int(v) \ for v in re.split("[^\d]", version) if v.isdigit()]) if version < (0, 11, 2): print "Your Cython version is too old" raise ImportError except ImportError: print "No Cython, will compile from .c files" for cmd in ['cython', 'pyrexc', 'predist']: if cmd in sys.argv: print "'%s' not available without Cython" % cmd sys.exit(1) from distutils.extension import Extension pyrex_suffix = ".c" else: from Cython.Distutils import build_ext from Cython.Distutils.extension import Extension pyrex_suffix = ".pyx" class build_wrappers(build_ext): # for predist, make .c files def run(self): self.compiler = NullCompiler() # skip build_ext.run() and thus ccompiler setup build_ext.build_extensions(self) class build_wrappers_and_html(build_wrappers): def run(self): build_wrappers.run(self) build_html() distutils_extras["cmdclass"] = { 'build_ext': build_ext, 'pyrexc': build_wrappers, 'cython': build_wrappers, 'predist': build_wrappers_and_html} # predist python setup.py predist --inplace 
--force; this line is in _darcs/prefs/prefs so that darcs executes the above command at predist time ('predist' is a darcs term) # Save some repetitive typing. We have all compiled # modules in place with their python siblings. def CogentExtension(module_name, extra_compile_args=[], **kw): path = module_name.replace('.', '/') kw['extra_compile_args'] = pyrex_compile_options + extra_compile_args if pyrex_suffix == '.pyx': kw['pyrex_include_dirs'] = [include_path] return Extension(module_name, [path + pyrex_suffix], **kw) short_description = "COmparative GENomics Toolkit" # This ends up displayed by the installer long_description = """Cogent A toolkit for statistical analysis of biological sequences. Version %s. """ % __version__ setup( name="cogent", version=__version__, url="http://sourceforge.net/projects/pycogent", author="Gavin Huttley, Rob Knight", author_email="gavin.huttley@anu.edu.au, rob@spot.colorado.edu", description=short_description, long_description=long_description, platforms=["any"], license=["GPL"], keywords=["biology", "genomics", "statistics", "phylogeny", "evolution", "bioinformatics"], classifiers=[ "Development Status :: 5 - Production/Stable", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License (GPL)", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Software Development :: Libraries :: Python Modules", "Operating System :: OS Independent", ], packages=['cogent', 'cogent.align', 'cogent.align.weights', 'cogent.app', 'cogent.cluster', 'cogent.core', 'cogent.data', 'cogent.db', 'cogent.db.ensembl', 'cogent.draw', 'cogent.evolve', 'cogent.format', 'cogent.maths', 'cogent.maths.matrix', 'cogent.maths.stats', 'cogent.maths.stats.cai', 'cogent.maths.unifrac', 'cogent.maths.spatial', 'cogent.motif', 'cogent.parse', 'cogent.phylo', 'cogent.recalculation', 'cogent.seqsim', 'cogent.struct', 'cogent.util'], ext_modules=[ CogentExtension("cogent.align._compare"), 
CogentExtension("cogent.align._pairwise_seqs"), CogentExtension("cogent.align._pairwise_pogs"), CogentExtension("cogent.evolve._solved_models"), CogentExtension("cogent.evolve._likelihood_tree"), CogentExtension("cogent.evolve._pairwise_distance"), CogentExtension("cogent.struct._asa"), CogentExtension("cogent.struct._contact"), CogentExtension("cogent.maths._period"), CogentExtension("cogent.maths.spatial.ckd3"), ], **distutils_extras ) PyCogent-1.5.3/tests/000755 000765 000024 00000000000 12024703635 015404 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/__init__.py000644 000765 000024 00000001204 12024702176 017511 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python sub_modules = ['alltests', 'benchmark', 'benchmark_aligning', 'test_draw', 'test_phylo', 'timetrial'] for sub_module in sub_modules: exec ("from %s import %s" % (__name__, sub_module)) __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight", "Matthew Wakefield", "Andrew Butterfield", "Edward Lang"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" PyCogent-1.5.3/tests/alltests.py000644 000765 000024 00000031543 12024702176 017616 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # # suite of cogent package unit tests. 
# run suite by executing this file # import doctest, cogent.util.unit_test as unittest, sys, os from cogent.util.misc import app_path __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight", "Hau Ying", "Helen Lindsay", "Jeremy Widmann", "Sandra Smit", "Greg Caporaso", "Matthew Wakefield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def my_import(name): """Imports a module, possibly qualified with periods. Returns the module. __import__ only imports the top-level module. Recipe from python documentation at: http://www.python.org/doc/2.4/lib/built-in-funcs.html """ mod = __import__(name) components = name.split('.') for comp in components[1:]: mod = getattr(mod, comp) return mod def module_present(modules): """returns True if dependencies present""" if type(modules) == str: modules = [modules] try: for module in modules: mod = __import__(module) except ImportError: return False return True def suite(): modules_to_test = [ 'test_recalculation.rst', 'test_phylo', 'test_dictarray.rst', 'test_align.test_align', 'test_align.test_algorithm', 'test_align.test_weights.test_methods', 'test_align.test_weights.test_util', 'test_app.test_parameters', 'test_app.test_util', 'test_cluster.test_goodness_of_fit', 'test_cluster.test_metric_scaling', 'test_cluster.test_approximate_mds', 'test_cluster.test_procrustes', 'test_cluster.test_UPGMA', 'test_cluster.test_nmds', 'test_core.test_alphabet', 'test_core.test_alignment', 'test_core.test_annotation', 'test_core.test_bitvector', 'test_core.test_core_standalone', 'test_core.test_features.rst', 'test_core.test_entity', 'test_core.test_genetic_code', 'test_core.test_info', 'test_core.test_location', 'test_core.test_maps', 'test_core.test_moltype', 'test_core.test_profile', 'test_core.test_seq_aln_integration', 
'test_core.test_sequence', 'test_core.test_tree', 'test_core.test_usage', 'test_data.test_molecular_weight', 'test_evolve.test_best_likelihood', 'test_evolve.test_bootstrap', 'test_evolve.test_coevolution', 'test_evolve.test_models', 'test_evolve.test_motifchange', 'test_evolve.test_substitution_model', 'test_evolve.test_scale_rules', 'test_evolve.test_likelihood_function', 'test_evolve.test_newq', 'test_evolve.test_pairwise_distance', 'test_evolve.test_parameter_controller', 'test_format.test_bedgraph', 'test_format.test_fasta', 'test_format.test_mage', 'test_format.test_pdb_color', 'test_format.test_xyzrn', 'test_maths.test_fit_function', 'test_maths.test_geometry', 'test_maths.test_matrix_logarithm', 'test_maths.test_period', 'test_maths.test_matrix.test_distance', 'test_maths.test_spatial.test_ckd3', 'test_maths.test_stats.test_alpha_diversity', 'test_maths.test_stats.test_distribution', 'test_maths.test_stats.test_histogram', 'test_maths.test_stats.test_information_criteria', 'test_maths.test_stats.test_period', 'test_maths.test_stats.test_special', 'test_maths.test_stats.test_test', 'test_maths.test_stats.test_ks', 'test_maths.test_stats.test_rarefaction', 'test_maths.test_stats.test_util', 'test_maths.test_stats.test_cai.test_adaptor', 'test_maths.test_stats.test_cai.test_get_by_cai', 'test_maths.test_stats.test_cai.test_util', 'test_maths.test_optimisers', 'test_maths.test_distance_transform', 'test_maths.test_unifrac.test_fast_tree', 'test_maths.test_unifrac.test_fast_unifrac', 'test_motif.test_util', 'test_parse.test_aaindex', 'test_parse.test_agilent_microarray', 'test_parse.test_binary_sff', 'test_parse.test_blast', 'test_parse.test_bowtie', 'test_parse.test_bpseq', 'test_parse.test_cigar', 'test_parse.test_clustal', 'test_parse.test_column', 'test_parse.test_comrna', 'test_parse.test_consan', 'test_parse.test_cove', 'test_parse.test_ct', 'test_parse.test_cut', 'test_parse.test_cutg', 'test_parse.test_dialign', 'test_parse.test_ebi', 
'test_parse.test_fasta', 'test_parse.test_fastq', 'test_parse.test_gbseq', 'test_parse.test_gibbs', 'test_parse.test_genbank', 'test_parse.test_gff', 'test_parse.test_greengenes', 'test_parse.test_ilm', 'test_parse.test_illumina_sequence', 'test_parse.test_locuslink', 'test_parse.test_mage', 'test_parse.test_meme', 'test_parse.test_msms', 'test_parse.test_ncbi_taxonomy', 'test_parse.test_nexus', 'test_parse.test_nupack', 'test_parse.test_pdb', 'test_parse.test_psl', 'test_parse.test_structure', 'test_parse.test_pamlmatrix', 'test_parse.test_phylip', 'test_parse.test_pknotsrg', 'test_parse.test_rdb', 'test_parse.test_record', 'test_parse.test_record_finder', 'test_parse.test_rfam', 'test_parse.test_rnaalifold', 'test_parse.test_rna_fold', 'test_parse.test_rnaview', 'test_parse.test_rnaforester', 'test_parse.test_sprinzl', 'test_parse.test_tinyseq', 'test_parse.test_tree', 'test_parse.test_unigene', 'test_seqsim.test_analysis', 'test_seqsim.test_birth_death', 'test_seqsim.test_markov', 'test_seqsim.test_microarray', 'test_seqsim.test_microarray_normalize', 'test_seqsim.test_randomization', 'test_seqsim.test_searchpath', 'test_seqsim.test_sequence_generators', 'test_seqsim.test_tree', 'test_seqsim.test_usage', 'test_struct.test_knots', 'test_struct.test_pairs_util', 'test_struct.test_rna2d', 'test_struct.test_asa', 'test_struct.test_contact', 'test_struct.test_annotation', 'test_struct.test_selection', 'test_struct.test_manipulation', 'test_util.test_unit_test', 'test_util.test_array', 'test_util.test_dict2d', 'test_util.test_misc', 'test_util.test_organizer', 'test_util.test_recode_alignment', 'test_util.test_table.rst', 'test_util.test_transform', ] try: import matplotlib except: print >> sys.stderr, "No matplotlib so not running test_draw.py" else: modules_to_test.append('test_draw') modules_to_test.append('test_draw.test_distribution_plots') #Try importing modules for app controllers apps = [('formatdb', 'test_formatdb'), ('blastall', 'test_blast'), ('blat', 
'test_blat'), ('bwa', 'test_bwa'), ('carnac', 'test_carnac'), ('clearcut', 'test_clearcut'), ('clustalw', 'test_clustalw'), ('cmalign', 'test_infernal'), ('cmfinder.pl', 'test_cmfinder'), ('comrna', 'test_comrna'), ('contrafold', 'test_contrafold'), ('covea', 'test_cove'), ('dialign2-2', 'test_dialign'), ('dynalign', 'test_dynalign'), ('FastTree', 'test_fasttree'), ('foldalign', 'test_foldalign'), ('guppy', 'test_guppy'), ('ilm', 'test_ilm'), ('knetfold.pl', 'test_knetfold'), ('mafft', 'test_mafft'), ('mfold', 'test_mfold'), ('mothur', 'test_mothur'), ('muscle', 'test_muscle_v38'), ('msms', 'test_msms'), ('ParsInsert', 'test_parsinsert'), ('pplacer', 'test_pplacer'), ('rdp_classifier-2.2.jar', 'test_rdp_classifier'), ('rdp_classifier-2.0.jar', 'test_rdp_classifier20'), ('Fold.out', 'test_nupack'), ('findphyl', 'test_pfold'), ('pknotsRG-1.2-i386-linux-static', 'test_pknotsrg'), ('RNAalifold', 'test_rnaalifold'), ('rnaview', 'test_rnaview'), ('RNAfold', 'test_vienna_package'), ('raxmlHPC', 'test_raxml_v730'), ('rtax', 'test_rtax'), ('sfold.X86_64.LINUX', 'test_sfold'), ('stride', 'test_stride'), ('hybrid-ss-min', 'test_unafold'), ('cd-hit', 'test_cd_hit'), ('calculate_likelihood', 'test_gctmpca'), ('sfffile', 'test_sfffile'), ('sffinfo', 'test_sffinfo'), ('uclust','test_uclust'), ('usearch','test_usearch') ] for app, test_name in apps: should_run_test = False if app_path(app): should_run_test = True elif app.startswith('rdp_classifier') and os.environ.get('RDP_JAR_PATH'): # This is ugly, but because this is a jar file, it won't be in # $PATH -- we require users to set an environment variable to # point to the location of this jar file, so we test for that. # My new version of app_path can be applied to do smarter checks, # but will involve some re-write of how we check whether tests can # be run. -Greg if app == os.path.basename(os.environ.get('RDP_JAR_PATH')): should_run_test = True if should_run_test: modules_to_test.append('test_app.' 
+ test_name) else: print >> sys.stderr, "Can't find %s executable: skipping test" % app if app_path('muscle'): modules_to_test.append('test_format.test_pdb_color') # we now toggle the db tests, based on an environment flag if int(os.environ.get('TEST_DB', 0)): db_tests = ['test_db.test_ncbi', 'test_db.test_pdb', 'test_db.test_rfam', 'test_db.test_util'] # we check for an environment flag for ENSEMBL # we expect this to have the username and account for a localhost # installation of the Ensembl MySQL databases if 'ENSEMBL_ACCOUNT' in os.environ: # check for cogent.db.ensembl dependencies test_ensembl = True for module in ['MySQLdb', 'sqlalchemy']: if not module_present(module): test_ensembl = False print >> sys.stderr, \ "Module '%s' not present: skipping test" % module if test_ensembl: db_tests += ['test_db.test_ensembl.test_assembly', 'test_db.test_ensembl.test_database', 'test_db.test_ensembl.test_compara', 'test_db.test_ensembl.test_genome', 'test_db.test_ensembl.test_host', 'test_db.test_ensembl.test_metazoa', 'test_db.test_ensembl.test_species', 'test_db.test_ensembl.test_feature_level'] else: print >> sys.stderr, "Environment variable ENSEMBL_ACCOUNT not "\ "set: skipping db.ensembl tests" for db_test in db_tests: modules_to_test.append(db_test) else: print >> sys.stderr, \ "Environment variable TEST_DB=1 not set: skipping db tests" assert sys.version_info >= (2, 6) alltests = unittest.TestSuite() for module in modules_to_test: if module.endswith('.rst'): module = os.path.join(*module.split(".")[:-1]) + ".rst" test = doctest.DocFileSuite(module, optionflags= doctest.REPORT_ONLY_FIRST_FAILURE | doctest.ELLIPSIS) else: test = unittest.findTestCases(my_import(module)) alltests.addTest(test) return alltests class BoobyTrappedStream(object): def __init__(self, output): self.output = output def write(self, text): self.output.write(text) raise RuntimeError, "Output not allowed in tests" def flush(self): pass def isatty(self): return False if __name__ == '__main__': 
if '--debug' in sys.argv: s = suite() s.debug() else: orig = sys.stdout if '--output-ok' in sys.argv: sys.argv.remove('--output-ok') else: sys.stdout = BoobyTrappedStream(orig) try: unittest.main(defaultTest='suite', argv=sys.argv) finally: sys.stdout = orig PyCogent-1.5.3/tests/benchmark.py000644 000765 000024 00000012073 12024702176 017712 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import sys #,hotshot from cogent.evolve.substitution_model import Nucleotide, Dinucleotide, Codon from cogent import LoadSeqs, LoadTree from cogent.maths import optimisers from cogent.util import parallel __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" ALIGNMENT = LoadSeqs(filename="data/brca1.fasta") TREE = LoadTree(filename="data/murphy.tree") def subtree(size): names = ALIGNMENT.getSeqNames()[:size] assert len(names) == size tree = TREE.getSubTree(names) #.balanced() return names, tree def brca_test(subMod, names, tree, length, par_rules, **kw): #names = ALIGNMENT.getSeqNames()[:taxa] #assert len(names) == taxa tree = TREE.getSubTree(names) #.balanced() aln = ALIGNMENT.takeSeqs(names).omitGapPositions()[:length] assert len(aln) == length, (len(aln), length) #the_tree_analysis = LikelihoodFunction(treeobj = tree, submodelobj = subMod, alignobj = aln) par_controller = subMod.makeParamController(tree, **kw) for par_rule in par_rules: par_controller.setParamRule(**par_rule) #lf = par_controller.makeCalculator(aln) return (par_controller, aln) def measure_evals_per_sec(pc, aln): pc.setAlignment(aln) return pc.measureEvalsPerSecond(time_limit=2.0, wall=False) def makePC(modelClass, parameterisation, length, taxa, tree, opt_mprobs, **kw): modelClass = eval(modelClass) if parameterisation is not None: predicates = {'silly': 
silly_predicate} par_rules = [{'par_name':'silly', 'is_independent':parameterisation}] else: predicates = {} par_rules = [] subMod = modelClass(equal_motif_probs=True, optimise_motif_probs=opt_mprobs, predicates=predicates, recode_gaps=True, mprob_model="conditional") (pc, aln) = brca_test(subMod, taxa, tree, length, par_rules, **kw) return (pc, aln) def quiet(f, *args, **kw): import sys, cStringIO temp = cStringIO.StringIO() _stdout = sys.stdout try: sys.stdout = temp result = f(*args, **kw) finally: #pass sys.stdout = _stdout return result def evals_per_sec(*args): pc, aln = makePC(*args) #quiet(makeLF, *args) speed1 = measure_evals_per_sec(pc, aln) speed = str(int(speed1)) return speed class CompareImplementations(object): def __init__(self, switch): self.switch = switch def __call__(self, *args): self.switch(0) (pc,aln) = quiet(makePC, *args) speed1 = measure_evals_per_sec(pc,aln) self.switch(1) (pc,aln) = quiet(makePC, *args) speed2 = measure_evals_per_sec(pc,aln) if speed1 < speed2: speed = '+%2.1f' % (speed2/speed1) else: speed = '-%2.1f' % (speed1/speed2) if speed in ['+1.0', '-1.0']: speed = '' return speed def benchmarks(test): alphabets = ["Nucleotide", "Dinucleotide", "Codon"] sequence_lengths = [18, 2004] treesizes = [5, 20] for (optimise_motifs, parameterisation) in [ (False, 'global'), (False, 'local'), (True, 'global')]: print parameterisation, ['', 'opt motifs'][optimise_motifs] print ' ' * 14, wcol = 5*len(sequence_lengths) + 2 for alphabet in alphabets: print str(alphabet).ljust(wcol), print print '%-15s' % "", # "length" for alphabet in alphabets: for sequence_length in sequence_lengths: print "%4s" % sequence_length, print ' ', print print ' '*12 + (' | '.join(['']+['-'*(len(sequence_lengths)*5) for alphabet in alphabets]+[''])) for treesize in treesizes: print ("%4s taxa | " % treesize), (taxa, tree) = subtree(treesize) for alphabet in alphabets: for sequence_length in sequence_lengths: speed = test(alphabet, parameterisation=='local', 
sequence_length, taxa, tree, optimise_motifs) print "%4s" % speed, print '| ', print print print def silly_predicate(a,b): return a.count('A') > a.count('T') or b.count('A') > b.count('T') #def asym_predicate((a,b)): # print a, b, 'a' in a # return 'a' in a #mA = Codon() #mA.setPredicates({'asym': asym_predicate}) def exponentiator_switch(switch): import cogent.evolve.substitution_calculation cogent.evolve.substitution_calculation.use_new = switch import sys if 'relative' in sys.argv: test = CompareImplementations(exponentiator_switch) else: test = evals_per_sec parallel.inefficiency_forgiven = True if parallel.getCommunicator().Get_rank() > 0: #benchmarks(test) quiet(benchmarks, test) else: try: benchmarks(test) except KeyboardInterrupt: print ' OK' PyCogent-1.5.3/tests/benchmark_aligning.py000644 000765 000024 00000002334 12024702176 021561 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import numpy import time from cogent import DNA from cogent.align.align import classic_align_pairwise, make_dna_scoring_dict __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" def _s2i(s): return numpy.array(['ATCG'.index(c) for c in s]) def test(r=1, **kw): S = make_dna_scoring_dict(10, -1, -8) seq2 = DNA.makeSequence('AAAATGCTTA' * r) seq1 = DNA.makeSequence('AATTTTGCTG' * r) t0 = time.time() aln = classic_align_pairwise(seq1, seq2, S, 10, 2, local=False, **kw) t = time.time() - t0 return (len(seq1)*len(seq2))/t if __name__ == '__main__': d = 2 e = 1 options = [(False, False), (True, False), (False, True)] template = "%10s " * 4 print " 1000s positions per second" print template % ("size", "simple", "logs", "scaled") for r in [50, 100, 200, 500]: times = [test(r, use_logs=l, use_scaling=s) for (l,s) in options] print template % tuple([r*10] + [int(t/1000) for t in 
times]) PyCogent-1.5.3/tests/data/000755 000765 000024 00000000000 12024703634 016314 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_align/000755 000765 000024 00000000000 12024703632 017532 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_app/000755 000765 000024 00000000000 12024703632 017220 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_cluster/000755 000765 000024 00000000000 12024703632 020121 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_core/000755 000765 000024 00000000000 12024703632 017370 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_data/000755 000765 000024 00000000000 12024703633 017352 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_db/000755 000765 000024 00000000000 12024703632 017025 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_dictarray.rst000644 000765 000024 00000002302 11634002704 021150 0ustar00jrideoutstaff000000 000000 >>> import numpy >>> from cogent import DNA >>> from cogent.util.dict_array import DictArrayTemplate >>> a = numpy.identity(3, int) >>> b = DictArrayTemplate('abc', 'ABC').wrap(a) >>> b[0] =========== A B C ----------- 1 0 0 ----------- >>> b['a'] =========== A B C ----------- 1 0 0 ----------- >>> b.keys() ['a', 'b', 'c'] >>> row = b['a'] >>> row.keys() ['A', 'B', 'C'] >>> list(row) [1, 0, 0] >>> sum(row) 1 >>> # Dimensions can also be ordinary integers >>> b = DictArrayTemplate(3, 3).wrap(a) >>> b.keys() [0, 1, 2] >>> b[0].keys() [0, 1, 2] >>> sum(b[0]) 1 >>> # Or a mix >>> b = DictArrayTemplate('ABC', 3).wrap(a) >>> b.keys() ['A', 'B', 'C'] >>> b['A'].keys() [0, 1, 2] ``DictArray`` should work properly in ``numpy`` operations. >>> darr = DictArrayTemplate(list(DNA), list(DNA)).wrap([[.7,.1,.1,.1], ... [.1,.7,.1,.1], ... [.1,.1,.7,.1], ... 
[.1,.1,.1,.7]]) >>> mprobs = numpy.array([0.25, 0.25, 0.25, 0.25]) >>> print mprobs.dot(darr) [ 0.25 0.25 0.25 0.25] >>> print numpy.dot(mprobs, darr) [ 0.25 0.25 0.25 0.25] PyCogent-1.5.3/tests/test_draw/000755 000765 000024 00000000000 12024703633 017376 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_draw.py000644 000765 000024 00000024321 12024702176 017753 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """This script/module can do any of 5 different actions with the figures it makes. When run as a script, one of the actions must be specified: - exercise The default when used as a module. Drawing code is used to make a matplotlib figure object but that is all. - record Save PNG images of the figures in draw_results/baseline/ To be used before making changes in drawing code. - check Compare figures with saved baseline figures. Fail if they don't match. Failed figures are saved in draw_results/current. Also makes an HTML page comparing them with the baseline images. - compare Save ALL differing figures in draw_results/current and make HTML page comparing them with the baseline images. - view Save all differing figures in draw_results/current and make HTML page comparing them with the baseline images, along with all the matching figures too. 
""" import warnings warnings.filterwarnings('ignore', category=UserWarning, module='matplotlib') import matplotlib matplotlib.use('Agg') import unittest import sys, os, cStringIO from cogent import DNA, LoadTree, LoadSeqs from cogent.core import alignment, alphabet, annotation from cogent.draw.linear import * from cogent.draw.dendrogram import * from cogent.draw.compatibility import partimatrix __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight", "Matthew Wakefield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def file_for_test(msg, baseline=False, prefixed=True): file_ext = "png" dirname = 'baseline' if baseline else 'current' if prefixed: dirname = os.path.join('draw_results', dirname) fname = msg.replace(' ', '_') + '.' + file_ext return os.path.join(dirname, fname) def fig2png(fig): f = cStringIO.StringIO() fig.savefig(f, format='png') return f.getvalue() def writefile(fname, content): dirname = os.path.dirname(fname) if not os.path.exists(dirname): os.makedirs(dirname) f = open(fname, 'wb') f.write(content) f.close() def exercise(msg, fig): pass def record(msg, fig): png = fig2png(fig) fname = file_for_test(msg, True) writefile(fname, png) class CheckOutput(object): def __init__(self, failOnDifference=True, showAll=False): if not os.path.exists('draw_results/baseline'): raise RuntimeError( 'No baseline found. 
Run "test_draw.py record" first') self.results = [] self.failOnDifference = failOnDifference self.showAll = showAll self.anyFailures = False def __call__(self, msg, fig): fname = file_for_test(msg, True) observed = fig2png(fig) if os.path.exists(fname): expected = open(fname, 'rb').read() different = observed != expected self.results.append((msg, different)) if different: self.anyFailures = True writefile(file_for_test(msg, False), observed) if self.failOnDifference: raise AssertionError('See draw_results/comparison.html') else: print 'difference from', fname else: raise RuntimeError('No baseline image at %s' % fname) def writeHTML(self): html = ['<html><head><title>Drawing Test Output</title></head>', '<body>'] html.append('<p>%s figures of which %s differ from baseline</p>' % ( len(self.results), sum(d for (m,d) in self.results))) for (msg, different) in self.results: fn1 = file_for_test(msg, True, False) fn2 = file_for_test(msg, False, False) if different: html.append('<h2>%s</h2>' % msg) html.append('<h3>Old</h3>') html.append('<img src="%s"/>' % fn1) html.append('<h3>New</h3>') html.append('<img src="%s"/>' % fn2) elif self.showAll: html.append('<h2>%s</h2>' % msg) html.append('<img src="%s"/>' % fn1) else: html.append('<p>%s</p>' % msg) html.append('<hr/>') html.append('</body></html>') html = '\n'.join(html) f = open('draw_results/comparison.html', 'w') f.write(html) f.close() def report(self): self.writeHTML() if self.anyFailures or self.showAll: if sys.platform == 'darwin': import subprocess subprocess.call(['open', 'draw_results/comparison.html']) else: print "See draw_results/comparison.html" def do(msg, display, **kw): fig = display.makeFigure(**kw) test_figure(msg, fig) def makeSampleSequence(): seq = 'tgccnwsrygagcgtgttaaacaatggccaactctctaccttcctatgttaaacaagtgagatcgcaggcgcgccaaggc' seq = DNA.makeSequence(seq) v = seq.addAnnotation(annotation.Feature, 'exon', 'exon', [(20,35)]) v = seq.addAnnotation(annotation.Feature, 'repeat_unit', 'repeat_unit', [(39,49)]) v = seq.addAnnotation(annotation.Feature, 'repeat_unit', 'rep2', [(49,60)]) return seq def makeSampleAlignment(): # must be an easier way to make an alignment of annotated sequences! from cogent.align.align import global_pairwise, make_dna_scoring_dict DNA = make_dna_scoring_dict(10, -8, -8) seq1 = makeSampleSequence()[:-2] seq2 = makeSampleSequence()[2:] seq2 = seq2[:30] + seq2[50:] seq1.Name = 'FAKE01' seq2.Name = 'FAKE02' names = (seq1.getName(), seq2.getName()) align = global_pairwise(seq1, seq2, DNA, 2, 1) align.addAnnotation(annotation.Variable, 'redline', 'align', [((0,15),1),((15,30),2),((30,45),3)]) align.addAnnotation(annotation.Variable, 'blueline', 'align', [((0,15),1.5),((15,30),2.5),((30,45),3.5)]) return align seq = makeSampleSequence() a = seq.addAnnotation(annotation.Variable, 'blueline', 'seq', [((0,15),1),((15,30),2),((30,45),3)]) v = seq.addAnnotation(annotation.Feature, 'gene', 'gene', [(0,15),(20,35),(40,55)]) b = v.addAnnotation(annotation.Variable, 'redline', 'feat', [((0,15),1.5),((15,30),2.5),((30,45),3.5)]) align = makeSampleAlignment() def green_cg(seq): seq = str(seq) posn = 0 result = [] while True: last = posn posn = seq.find('CG', posn) if posn < 0: break result.append('k' * (posn-last)+'gg') posn += 2 result.append('k' * 
(len(seq)-last)) return list(''.join(result)) class DrawingTests(unittest.TestCase): def test_seqs(self): seqd = Display(seq) do('sequence wrapped at 50', seqd, rowlen=50) small = FontProperties(size=7, stretch='extra-condensed') do('squashed sequence', seqd.copy(seq_font=small, colour_sequences=True)) do('seq display slice from 5 to 45 starts %s' % seq[5:8], seqd[5:45]) def test_alns(self): alignd = Display(align, colour_sequences=True, min_feature_height=10) do('coloured text alignment', alignd) do('coloured alignment no text', alignd.copy(show_text=False)) do('no text and no colour', alignd.copy(show_text=False, colour_sequences=False)) do('no shapes', alignd.copy(show_text=False, draw_bases=False)) do('no text or colour or shapes', alignd.copy(show_text=False, colour_sequences=False, draw_bases=False)) do('green seqs', alignd.copy(seq_color_callback=green_cg)) def test_legend(self): from cogent.draw.legend import Legend do('Feature Legend', Legend()) def test_dotplot(self): from cogent.draw.dotplot import Display2D do('2d', Display2D(seq, seq[:40], show_text=False, draw_bases=False)) def test_trees(self): treestring = "((A:.1,B:.22)ab:.3,((C:.4,D:.5)cd:.55,E:.6)cde:.7,F:.2)" for edge in 'ABCDEF': treestring = treestring.replace(edge, edge+edge.lower()*10) t = LoadTree(treestring=treestring) for klass in [ UnrootedDendrogram, SquareDendrogram, ContemporaneousDendrogram, ShelvedDendrogram, # StraightDendrogram, # ContemporaneousStraightDendrogram ]: dendro = klass(t) dendro.getConnectingNode('Ccccccccccc', 'Eeeeeeeeeee').setCollapsed( color="green", label="C, D and E") do(klass.__name__, dendro, shade_param="length", show_params=["length"]) def callback(edge): return ["blue", "red"][edge.Name.startswith("A")] do("Highlight edge A", UnrootedDendrogram(t), edge_color_callback=callback) def test_partimatrix(self): aln = LoadSeqs(filename='data/brca1.fasta', moltype=DNA) species5 = ['Human','HowlerMon','Mouse','NineBande','DogFaced'] aln = aln.takeSeqs(species5) aln 
= aln[:500] fig = partimatrix(aln, samples=0, display=True, print_stats=False, s_limit=10, title="brca1") test_figure('compatibility', fig) if __name__ != "__main__": test_figure = exercise else: myargs = [] for arg in ['exercise', 'record', 'check', 'compare', 'view']: if arg in sys.argv: sys.argv.remove(arg) myargs.append(arg) if len(myargs) != 1: print 'Need one action, got', myargs print __doc__ sys.exit(1) action = myargs[0] if action == 'record': test_figure = record elif action == 'check': test_figure = CheckOutput(True) elif action == 'compare': test_figure = CheckOutput(False) elif action == 'view': test_figure = CheckOutput(False, True) elif action == 'exercise': test_figure = exercise else: raise RuntimeError('Unknown action %s' % action) try: unittest.main() finally: if hasattr(test_figure, 'report'): test_figure.report() PyCogent-1.5.3/tests/test_evolve/000755 000765 000024 00000000000 12024703633 017741 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_format/000755 000765 000024 00000000000 12024703632 017730 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_maths/000755 000765 000024 00000000000 12024703635 017557 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_motif/000755 000765 000024 00000000000 12024703635 017561 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_parse/000755 000765 000024 00000000000 12024703635 017555 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_phylo.py000644 000765 000024 00000026430 12024702176 020154 0ustar00jrideoutstaff000000 000000 #! 
/usr/bin/env python import unittest, os import warnings from numpy import log, exp warnings.filterwarnings('ignore', 'Not using MPI as mpi4py not found') from cogent.phylo.distance import EstimateDistances from cogent.phylo.nj import nj, gnj from cogent.phylo.least_squares import wls from cogent import LoadSeqs, LoadTree from cogent.phylo.tree_collection import LogLikelihoodScoredTreeCollection,\ WeightedTreeCollection, LoadTrees from cogent.evolve.models import JC69, HKY85, F81 from cogent.phylo.consensus import majorityRule, weightedMajorityRule from cogent.util.misc import remove_files __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Matthew Wakefield",\ "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def Tree(t): return LoadTree(treestring=t) class ConsensusTests(unittest.TestCase): def setUp(self): self.trees = [ Tree("((a,b),(c,d));"), Tree("((a,b),(c,d));"), Tree("((a,c),(b,d));"), Tree("((a,b),c,d);")] data = zip(map(log, [0.4,0.4,0.05,0.15]), # emphasizing the a,b clade self.trees) data.sort() data.reverse() self.scored_trees = data def test_majorityRule(self): """Tests for majority rule consensus trees""" trees = self.trees outtrees = majorityRule(trees, strict=False) self.assertEqual(len(outtrees), 1) self.assert_(outtrees[0].sameTopology(Tree("((c,d),(a,b));"))) outtrees = majorityRule(trees, strict=True) self.assertEqual(len(outtrees), 1) self.assert_(outtrees[0].sameTopology(Tree("(c,d,(a,b));"))) def test_consensus_from_scored_trees_collection(self): """tree collection should get same consensus as direct approach""" sct = LogLikelihoodScoredTreeCollection([(1, t) for t in self.trees]) ct = sct.getConsensusTree() self.assertTrue(ct.sameTopology(Tree("((c,d),(a,b));"))) def test_weighted_consensus_from_scored_trees_collection(self): """weighted consensus 
from a tree collection should be different""" sct = LogLikelihoodScoredTreeCollection(self.scored_trees) ct = sct.getConsensusTree() self.assertTrue(ct.sameTopology(Tree("((a,b),(c,d));"))) def test_weighted_trees_satisfying_cutoff(self): """build consensus tree from those satisfying cutoff""" sct = LogLikelihoodScoredTreeCollection(self.scored_trees) cts = sct.getWeightedTrees(cutoff=0.8) expected_trees = [Tree(t) for t in "((a,b),(c,d));", "((a,b),(c,d));", "((a,b),c,d);"] for i in range(len(cts)): cts[i][1].sameTopology(expected_trees[i]) ct = cts.getConsensusTree() self.assertTrue(ct.sameTopology(Tree("((a,b),(c,d));"))) def test_tree_collection_read_write_file(self): """should correctly read / write a collection from a file""" def eval_klass(coll): coll.writeToFile('sample.trees') read = LoadTrees('sample.trees') self.assertTrue(type(read) == type(coll)) eval_klass(LogLikelihoodScoredTreeCollection(self.scored_trees)) # convert lnL into p eval_klass(WeightedTreeCollection([(exp(s), t) for s,t in self.scored_trees])) remove_files(['sample.trees'], error_on_missing=False) class TreeReconstructionTests(unittest.TestCase): def setUp(self): self.tree = LoadTree(treestring='((a:3,b:4):2,(c:6,d:7):30,(e:5,f:5):5)') self.dists = self.tree.getDistances() def assertTreeDistancesEqual(self, t1, t2): d1 = t1.getDistances() d2 = t2.getDistances() self.assertEqual(len(d1), len(d2)) for key in d2: self.assertAlmostEqual(d1[key], d2[key]) def test_nj(self): """testing nj""" reconstructed = nj(self.dists) self.assertTreeDistancesEqual(self.tree, reconstructed) def test_gnj(self): """testing gnj""" results = gnj(self.dists, keep=1) (length, reconstructed) = results[0] self.assertTreeDistancesEqual(self.tree, reconstructed) results = gnj(self.dists, keep=10) (length, reconstructed) = results[0] self.assertTreeDistancesEqual(self.tree, reconstructed) # Results should be a TreeCollection len(results) results.getConsensusTree() # From GNJ paper. Pearson, Robins, Zhang 1999. 
tied_dists = { ('a', 'b'):3, ('a', 'c'):3, ('a', 'd'):4, ('a', 'e'):3, ('b', 'c'):3, ('b', 'd'):3, ('b', 'e'):4, ('c', 'd'):3, ('c', 'e'):3, ('d', 'e'):3} results = gnj(tied_dists, keep=3) scores = [score for (score, tree) in results] self.assertEqual(scores[:2], [7.75, 7.75]) self.assertNotEqual(scores[2], 7.75) def test_wls(self): """testing wls""" reconstructed = wls(self.dists, a=4) self.assertTreeDistancesEqual(self.tree, reconstructed) def test_truncated_wls(self): """testing wls with order option""" order = ['e', 'b', 'c', 'd'] reconstructed = wls(self.dists, order=order) self.assertEqual(set(reconstructed.getTipNames()), set(order)) def test_limited_wls(self): """testing (well, exercising at least), wls with constrained start""" init = LoadTree(treestring='((a,c),b,d)') reconstructed = wls(self.dists, start=init) self.assertEqual(len(reconstructed.getTipNames()), 6) init2 = LoadTree(treestring='((a,d),b,c)') reconstructed = wls(self.dists, start=[init, init2]) self.assertEqual(len(reconstructed.getTipNames()), 6) init3 = LoadTree(treestring='((a,d),b,z)') self.assertRaises(Exception, wls, self.dists, start=[init, init3]) # if start tree has all seq names, should raise an error self.assertRaises(Exception, wls, self.dists, start=[LoadTree(treestring='((a,c),b,(d,(e,f)))')]) class DistancesTests(unittest.TestCase): def setUp(self): self.al = LoadSeqs(data = {'a':'GTACGTACGATC', 'b':'GTACGTACGTAC', 'c':'GTACGTACGTTC', 'e':'GTACGTACTGGT'}) self.collection = LoadSeqs(data = {'a':'GTACGTACGATC', 'b':'GTACGTACGTAC', 'c':'GTACGTACGTTC', 'e':'GTACGTACTGGT'}, aligned=False) def assertDistsAlmostEqual(self, expected, observed, precision=4): observed = dict([(frozenset(k),v) for (k,v) in observed.items()]) expected = dict([(frozenset(k),v) for (k,v) in expected.items()]) for key in expected: self.assertAlmostEqual(expected[key], observed[key], precision) def test_EstimateDistances(self): """testing (well, exercising at least), EstimateDistances""" d = 
EstimateDistances(self.al, JC69()) d.run() canned_result = {('b', 'e'): 0.440840, ('c', 'e'): 0.440840, ('a', 'c'): 0.088337, ('a', 'b'): 0.188486, ('a', 'e'): 0.440840, ('b', 'c'): 0.0883373} result = d.getPairwiseDistances() self.assertDistsAlmostEqual(canned_result, result) # exercise writing to file d.writeToFile('junk.txt') try: os.remove('junk.txt') except OSError: pass # probably parallel def test_EstimateDistancesWithMotifProbs(self): """EstimateDistances with supplied motif probs""" motif_probs= {'A':0.1,'C':0.2,'G':0.2,'T':0.5} d = EstimateDistances(self.al, HKY85(), motif_probs=motif_probs) d.run() canned_result = {('a', 'c'): 0.07537, ('b', 'c'): 0.07537, ('a', 'e'): 0.39921, ('a', 'b'): 0.15096, ('b', 'e'): 0.39921, ('c', 'e'): 0.37243} result = d.getPairwiseDistances() self.assertDistsAlmostEqual(canned_result, result) def test_EstimateDistances_fromThreeway(self): """testing (well, exercising at least), EstimateDistances fromThreeway""" d = EstimateDistances(self.al, JC69(), threeway=True) d.run() canned_result = {('b', 'e'): 0.495312, ('c', 'e'): 0.479380, ('a', 'c'): 0.089934, ('a', 'b'): 0.190021, ('a', 'e'): 0.495305, ('b', 'c'): 0.0899339} result = d.getPairwiseDistances(summary_function="mean") self.assertDistsAlmostEqual(canned_result, result) def test_EstimateDistances_fromUnaligned(self): """Exercising estimate distances from unaligned sequences""" d = EstimateDistances(self.collection, JC69(), do_pair_align=True, rigorous_align=True) d.run() canned_result = {('b', 'e'): 0.440840, ('c', 'e'): 0.440840, ('a', 'c'): 0.088337, ('a', 'b'): 0.188486, ('a', 'e'): 0.440840, ('b', 'c'): 0.0883373} result = d.getPairwiseDistances() self.assertDistsAlmostEqual(canned_result, result) d = EstimateDistances(self.collection, JC69(), do_pair_align=True, rigorous_align=False) d.run() canned_result = {('b', 'e'): 0.440840, ('c', 'e'): 0.440840, ('a', 'c'): 0.088337, ('a', 'b'): 0.188486, ('a', 'e'): 0.440840, ('b', 'c'): 0.0883373} result = 
d.getPairwiseDistances() self.assertDistsAlmostEqual(canned_result, result) def test_EstimateDistances_other_model_params(self): """test getting other model params from EstimateDistances""" d = EstimateDistances(self.al, HKY85(), est_params=['kappa']) d.run() # this will be a Number object with Mean, Median etc .. kappa = d.getParamValues('kappa') self.assertAlmostEqual(kappa.Mean, 0.8939, 4) # this will be a dict with pairwise instances, it's called by the above # method, so the correctness of its values is already checked kappa = d.getPairwiseParam('kappa') def test_EstimateDistances_modify_lf(self): """tests modifying the lf""" def constrain_fit(lf): lf.setParamRule('kappa', is_constant=True) lf.optimise(local=True) return lf d = EstimateDistances(self.al, HKY85(), modify_lf=constrain_fit) d.run() result = d.getPairwiseDistances() d = EstimateDistances(self.al, F81()) d.run() expect = d.getPairwiseDistances() self.assertDistsAlmostEqual(expect, result) if __name__ == '__main__': unittest.main()
PyCogent-1.5.3/tests/test_recalculation.rst000644 000765 000024 00000012320 11425201333 022010 0ustar00jrideoutstaff000000 000000 
A simple calculator

>>> from cogent.recalculation.definition import *
>>> def add(*args):
...     return sum(args)
...
>>> top = CalcDefn(add)(ParamDefn('A'), ParamDefn('B'))
>>> pc = top.makeParamController()
>>> f = pc.makeCalculator()

f.getValueArray() shows the inputs, ie: the optimisable parameters

>>> f.getValueArray()
[1.0, 1.0]

The calculator can be called like a function

>>> f([3.0, 4.25])
7.25

Or just a subset of the inputs can be changed directly

>>> f.change([(1, 4.5)])
7.5
>>> f.getValueArray()
[3.0, 4.5]

Now with scopes. We will set up the calculation

result = (Ax+Bx) + (Ay+By) + (Az+Bz)

A and B will remain distinct parameters, but x, y and z are merely scopes -
ie: it may be the case that Ax = Ay = Az, and that may simplify the
calculation, but we will never even notice if Ax = Bx.
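The scoped calculation can be sketched in plain Python (an illustration only, not the cogent recalculation API; `scoped_result` is an invented name). Sharing A across the x, y and z categories means the optimiser sees fewer free parameters, without changing the formula being computed:

```python
# Plain-Python illustration of result = (Ax+Bx) + (Ay+By) + (Az+Bz),
# where A and B each map a category scope ('x', 'y', 'z') to a value.
def scoped_result(A, B):
    return sum(A[s] + B[s] for s in 'xyz')

# Default global scope: Ax == Ay == Az, so only 2 free parameters (A, B).
A = dict.fromkeys('xyz', 1.0)
B = dict.fromkeys('xyz', 1.0)
print(scoped_result(A, B))  # 6.0

# With A made independent per category there are 4 free parameters
# (Ax, Ay, Az, B); Ax = Ay = Az = 2.0 and B = 1.0 gives 9.0, matching
# the doctest value f([1.0, 2.0, 2.0, 2.0]).
A = {'x': 2.0, 'y': 2.0, 'z': 2.0}
print(scoped_result(A, B))  # 9.0
```

The point of the scope machinery below is that this parameter sharing is declared on the parameter controller rather than hard-coded into the calculation.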
Each scope dimension (here there is just one, 'category') must be collapsed
away at some point towards the end of the calculation if the calculation is
to produce a scalar result. Here this is done with the selectFromDimension
method.

>>> A = ParamDefn('A', dimensions = ['category'])
>>> B = ParamDefn('B', dimensions = ['category'])
>>> mid = CalcDefn(add, name="mid")(A, B)
>>> top = CalcDefn(add)(
...     mid.selectFromDimension('category', "x"),
...     mid.selectFromDimension('category', "y"),
...     mid.selectFromDimension('category', "z"))
...
>>> # or equivalently:
>>> # top = CalcDefn(add, *mid.acrossDimension('category',
>>> #     ['x', 'y', 'z']))
>>>
>>> pc = top.makeParamController()
>>> f = pc.makeCalculator()
>>> f.getValueArray()
[1.0, 1.0]

There are still only 2 inputs because the default scope is global, ie:
Ax == Ay == Az. If we allow A to be different in the x, y and z categories
and set their initial values to 2.0:

>>> pc.assignAll('A', value=2.0, independent=True)
>>> f = pc.makeCalculator()
>>> f.getValueArray()
[1.0, 2.0, 2.0, 2.0]

Now we have A local and B still global, so the calculation is
(Ax+B) + (Ay+B) + (Az+B) with the input parameters being [B, Ax, Ay, Az], so:

>>> f([1.0, 2.0, 2.0, 2.0])
9.0
>>> f([0.25, 2.0, 2.0, 2.0])
6.75

Constants do not appear in the optimisable inputs. Set one of the 3 A values
to be a constant and there will be one fewer optimisable parameter:

>>> pc.assignAll('A', scope_spec={'category':'z'}, const=True)
>>> f = pc.makeCalculator()
>>> f.getValueArray()
[1.0, 2.0, 2.0]

The parameter controller should catch cases where the specified scope does
not exist:

>>> pc.assignAll('A', scope_spec={'category':'nosuch'})
Traceback (most recent call last):
InvalidScopeError: ...
>>> pc.assignAll('A', scope_spec={'nonsuch':'nosuch'})
Traceback (most recent call last):
InvalidDimensionError: ...
It is complicated guesswork matching the parameters you expect with
positions in the value array, let alone remembering whether or not they are
presented to the optimiser as logs, so .getValueArray(), .change() and
.__call__() should only be used by optimisers. For other purposes there is
an alternative, human friendly interface:

>>> pc.updateFromCalculator(f)
>>> pc.getParamValue('A', category='x')
2.0
>>> pc.getParamValue('B', category=['x', 'y'])
1.0

Despite the name, .getParamValue can get the value from any step in the
calculation, so long as it has a unique name.

>>> pc.getParamValue('mid', category='x')
3.0

For bulk retrieval of parameter values by parameter name and scope name
there is the .getParamValueDict() method:

>>> pc.getParamValueDict(['category']).keys()
['A', 'B']
>>> pc.getParamValueDict(['category'])['A']['x']
2.0

Here is a function that is more like a likelihood function, in that it has
a maximum:

>>> def curve(x, y):
...     return 0 - (x**2 + y**2)
...
>>> top = CalcDefn(curve)(ParamDefn('X'), ParamDefn('Y'))
>>> pc = top.makeParamController()
>>> f = pc.makeCalculator()

Now ask it to find the maximum. It is a simple function with only one local
maximum so local optimisation should be enough:

>>> f.optimise(local=True)
>>> pc.updateFromCalculator(f)

There were two parameters, X and Y, and at the maximum they should both be
0.0:

>>> pc.getParamValue('Y')
0.0
>>> pc.getParamValue('X')
0.0

Because this function has a maximum it is possible to ask it for a
confidence interval around a parameter, ie: how far from 0.0 can we move x
before f(x,y) falls below f(X,Y)-dropoff:

>>> pc.getParamInterval('X', dropoff=4, xtol=0.0)
(-2.0, 0.0, 2.0)

We test the ability to omit xtol. Due to precision issues we convert the
returned value to a string.
>>> '-2.0, 0.0, 2.0' == "%.1f, %.1f, %.1f" % pc.getParamInterval('X', dropoff=4)
True

And finally intervals can be calculated in bulk by passing a dropoff value
to .getParamValueDict():

>>> pc.getParamValueDict([], dropoff=4, xtol=0.0)['X']
(-2.0, 0.0, 2.0)

For likelihood functions it is more convenient to provide 'p' rather than
'dropoff', dropoff = chdtri(1, p) / 2.0. Also in general you won't need
ultra precise answers, so don't use 'xtol=0.0', that's just to make the
doctest work.
PyCogent-1.5.3/tests/test_seqsim/000755 000765 000024 00000000000 12024703633 017742 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_struct/000755 000765 000024 00000000000 12024703632 017764 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_util/000755 000765 000024 00000000000 12024703635 017420 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/timetrial.py000644 000765 000024 00000006362 12024702176 017756 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # Simple script to run another command a certain number of times, # recording how long each run took, and writing the results out to a file. import os import os.path import re import string import sys import time __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Edward Lang"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" # Values that affect the running of the program. 
minimum_accepted_time = 2 iterations = int(sys.argv[1]) args = sys.argv[2:] script_re = re.compile(".py$") type = "" for i in range(len(args)): if script_re.search(args[i], 1): script = args[i][0:string.index(args[i], '.')] if args[i] == "mpirun": type = "parallel" if not type: type = "serial" output = "timing/" + script + "-" + str(int(time.time())) + "-" + type def usage(): pass def standard_dev(numbers = [], mean = 1): import math sum = 0 for i in range(len(numbers)): sum = sum + math.pow(numbers[i] - mean, 2) sigma = math.sqrt(sum / (len(numbers) - 1)) return sigma def main(): if args: command = ' '.join(map(str, args)) else: usage() sys.exit() total_time = 0.0 times = [] print 'Running "%s" %d times...' % (command, iterations) i = 0 attempt = 0 while i < iterations: start_time = time.time() os.system(command + " > " + output + "." + str(i)) end_time = time.time() - start_time if end_time > minimum_accepted_time: times.append(end_time) total_time = total_time + end_time print "Time for run %d: %.3f seconds" % (i, end_time) i = i + 1 attempt = 0 else: print "Discarding probably bogus time: %.3f seconds" % end_time attempt = attempt + 1 if attempt == 5: print "Aborting early due to multiple errors" sys.exit(3) times.sort() mean = total_time / len(times) sd = standard_dev(times, mean) print "" print "Fastest time : %.3f" % times[0] print "Slowest time : %.3f" % times[len(times) - 1] print "Mean : %.3f" % mean print "Standard dev : %.3f" % sd print "Total time : %.3f" % total_time print "" corrected_total = 0.0 corrected_times = [] for i in range(len(times)): if abs(mean - times[i]) < sd: corrected_times.append(times[i]) corrected_total = corrected_total + times[i] else: print "Discarding value '%.3f'" % times[i] if len(times) != len(corrected_times): corrected_mean = corrected_total / len(corrected_times) corrected_sd = standard_dev(corrected_times, corrected_mean) print "" print "CORRECTED RESULTS" print "Fastest time : %.3f" % corrected_times[0] print "Slowest 
time : %.3f" % corrected_times[len(corrected_times)-1] print "Mean : %.3f" % corrected_mean print "Standard dev : %.3f" % corrected_sd if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_util/__init__.py000644 000765 000024 00000001042 12024702176 021525 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_unit_test', 'test_misc', 'test_array', 'test_dict2d', 'test_organizer', 'test_transform', 'test_recode_alignment'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Sandra Smit", "Gavin Huttley", "Rob Knight", "Zongzhi Liu", "Amanda Birmingham", "Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_util/test_array.py000644 000765 000024 00000073122 12024702176 022153 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides tests for array.py """ #SUPPORT2425 #from __future__ import with_statement from cogent.util.unit_test import main, TestCase#, numpy_err from cogent.util.array import gapped_to_ungapped, unmasked_to_masked, \ ungapped_to_gapped, masked_to_unmasked, pairs_to_array,\ ln_2, log2, safe_p_log_p, safe_log, row_uncertainty, column_uncertainty,\ row_degeneracy, column_degeneracy, hamming_distance, norm,\ euclidean_distance, \ count_simple, count_alphabet, \ is_complex, is_significantly_complex, \ has_neg_off_diags, has_neg_off_diags_naive, \ sum_neg_off_diags, sum_neg_off_diags_naive, \ scale_row_sum, scale_row_sum_naive, scale_trace, \ abs_diff, sq_diff, norm_diff, \ cartesian_product, with_diag, without_diag, \ only_nonzero, combine_dimensions, split_dimension, \ non_diag, perturb_one_off_diag, perturb_off_diag, \ merge_samples, sort_merged_samples_by_value, classifiers, \ minimize_error_count, minimize_error_rate, mutate_array import numpy Float = numpy.core.numerictypes.sctype2char(float) from numpy import array, zeros, transpose, 
sqrt, reshape, arange, \ ravel, trace, ones __author__ = "Rob Knight and Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight", "Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class arrayTests(TestCase): """Tests of top-level functions.""" def setUp(self): """set up some standard sequences and masks""" self.gap_state = array('-', 'c') self.s1 = array('ACT-G', 'c') self.s2 = array('--CT', 'c') self.s3 = array('AC--', 'c') self.s4 = array('AC', 'c') self.s5 = array('--', 'c') self.m1 = array([0,0,0,1,0]) self.m2 = array([1,1,0,0]) self.m3 = array([0,0,1,1]) self.m4 = array([0,0]) self.m5 = array([1,1]) def test_unmasked_to_masked(self): """unmasked_to_masked should match hand-calculated results""" u2m = unmasked_to_masked self.assertEqual(u2m(self.m1), array([0,1,2,4])) self.assertEqual(u2m(self.m2), array([2,3])) self.assertEqual(u2m(self.m3), array([0,1])) self.assertEqual(u2m(self.m4), array([0,1])) self.assertEqual(u2m(self.m5), array([])) def test_ungapped_to_gapped(self): """ungapped_to_gapped should match hand-calculated results""" u2g = ungapped_to_gapped gap_state = self.gap_state self.assertEqual(u2g(self.s1, gap_state), array([0,1,2,4])) self.assertEqual(u2g(self.s2, gap_state), array([2,3])) self.assertEqual(u2g(self.s3, gap_state), array([0,1])) self.assertEqual(u2g(self.s4, gap_state), array([0,1])) self.assertEqual(u2g(self.s5, gap_state), array([])) def test_masked_to_unmasked(self): """masked_to_unmasked should match hand-calculated results""" m2u = masked_to_unmasked self.assertEqual(m2u(self.m1), array([0,1,2,2,3])) self.assertEqual(m2u(self.m1, True), array([0,1,2,-1,3])) self.assertEqual(m2u(self.m2), array([-1,-1,0,1])) self.assertEqual(m2u(self.m2, True), array([-1,-1,0,1])) self.assertEqual(m2u(self.m3), array([0,1,1,1])) self.assertEqual(m2u(self.m3, True), array([0,1,-1,-1])) 
self.assertEqual(m2u(self.m4), array([0,1])) self.assertEqual(m2u(self.m4, True), array([0,1])) self.assertEqual(m2u(self.m5), array([-1,-1])) self.assertEqual(m2u(self.m5, True), array([-1,-1])) def test_gapped_to_ungapped(self): """gapped_to_ungapped should match hand-calculated results""" g2u = gapped_to_ungapped gap_state = self.gap_state self.assertEqual(g2u(self.s1, gap_state), array([0,1,2,2,3])) self.assertEqual(g2u(self.s1, gap_state, True), array([0,1,2,-1,3])) self.assertEqual(g2u(self.s2, gap_state), array([-1,-1,0,1])) self.assertEqual(g2u(self.s2, gap_state, True), array([-1,-1,0,1])) self.assertEqual(g2u(self.s3, gap_state), array([0,1,1,1])) self.assertEqual(g2u(self.s3, gap_state, True), array([0,1,-1,-1])) self.assertEqual(g2u(self.s4, gap_state), array([0,1])) self.assertEqual(g2u(self.s4, gap_state, True), array([0,1])) self.assertEqual(g2u(self.s5, gap_state), array([-1,-1])) self.assertEqual(g2u(self.s5, gap_state, True), array([-1,-1])) def test_pairs_to_array(self): """pairs_to_array should match hand-calculated results""" p2a = pairs_to_array p1 = [0, 1, 0.5] p2 = [2, 3, 0.9] p3 = [1, 2, 0.6] pairs = [p1, p2, p3] self.assertEqual(p2a(pairs), \ array([[0,.5,0,0],[0,0,.6,0],[0,0,0,.9],[0,0,0,0]])) #try it without weights -- should assign 1 new_pairs = [[0,1],[2,3],[1,2]] self.assertEqual(p2a(new_pairs), \ array([[0,1,0,0],[0,0,1,0],[0,0,0,1],[0,0,0,0]])) #try it with explicit array size self.assertEqual(p2a(pairs, 5), \ array([[0,.5,0,0,0],[0,0,.6,0,0],[0,0,0,.9,0],[0,0,0,0,0],\ [0,0,0,0,0]])) #try it when we want to map the indices into gapped coords #we're effectively doing ABCD -> -A--BC-D- transform = array([1,4,5,7]) result = p2a(pairs, transform=transform) self.assertEqual(result.shape, (8,8)) exp = zeros((8,8), Float) exp[1,4] = 0.5 exp[4,5] = 0.6 exp[5,7] = 0.9 self.assertEqual(result, exp) result = p2a(pairs, num_items=9, transform=transform) self.assertEqual(result.shape, (9,9)) exp = zeros((9,9), Float) exp[1,4] = 0.5 exp[4,5] = 
0.6 exp[5,7] = 0.9 self.assertEqual(result, exp) class ArrayMathTests(TestCase): def test_ln_2(self): """ln_2: should be constant""" self.assertFloatEqual(ln_2, 0.693147) def test_log2(self): """log2: should work fine on positive/negative numbers and zero""" self.assertEqual(log2(1),0) self.assertEqual(log2(2),1) self.assertEqual(log2(4),2) self.assertEqual(log2(8),3) #SUPPORT2425 #with numpy_err(divide='ignore'): ori_err = numpy.geterr() numpy.seterr(divide='ignore') try: try: self.assertEqual(log2(0),float('-inf')) except (ValueError, OverflowError): #platform-dependent pass finally: numpy.seterr(**ori_err) #SUPPORT2425 ori_err = numpy.geterr() numpy.seterr(divide='raise') try: #with numpy_err(divide='raise'): self.assertRaises(FloatingPointError, log2, 0) finally: numpy.seterr(**ori_err) #nan is the only thing that's not equal to itself try: self.assertNotEqual(log2(-1),log2(-1)) #now nan except ValueError: pass def test_safe_p_log_p(self): """safe_p_log_p: should handle pos/neg/zero/empty arrays as expected """ #normal valid array a = array([[4,0,8],[2,16,4]]) self.assertEqual(safe_p_log_p(a),array([[-8,0,-24],[-2,-64,-8]])) #just zeros a = array([[0,0],[0,0]]) self.assertEqual(safe_p_log_p(a),array([[0,0],[0,0]])) #negative number -- skip self.assertEqual(safe_p_log_p(array([-4])), array([0])) #integer input, float output self.assertFloatEqual(safe_p_log_p(array([3])),array([-4.75488750])) #empty array self.assertEqual(safe_p_log_p(array([])),array([])) def test_safe_log(self): """safe_log: should handle pos/neg/zero/empty arrays as expected """ #normal valid array a = array([[4,0,8],[2,16,4]]) self.assertEqual(safe_log(a),array([[2,0,3],[1,4,2]])) #input integers, output floats self.assertFloatEqual(safe_log(array([1,2,3])),array([0,1,1.5849625])) #just zeros a = array([[0,0],[0,0]]) self.assertEqual(safe_log(a),array([[0,0],[0,0]])) #negative number try: self.assertFloatEqual(safe_log(array([0,3,-4]))[0:2], \ array([0,1.5849625007])) except ValueError: 
#platform-dependent pass try: self.assertNotEqual(safe_log(array([0,3,-4]))[2],\ safe_log(array([0,3,-4]))[2]) except ValueError: #platform-dependent pass #empty array self.assertEqual(safe_log(array([])),array([])) #double empty array self.assertEqual(safe_log(array([[]])),array([[]])) def test_row_uncertainty(self): """row_uncertainty: should handle pos/neg/zero/empty arrays as expected """ #normal valid array b = transpose(array([[.25,.2,.45,.25,1],[.25,.2,.45,0,0],\ [.25,.3,.05,.75,0],[.25,.3,.05,0,0]])) self.assertFloatEqual(row_uncertainty(b),[2,1.97,1.47,0.81,0],1e-3) #one-dimensional array self.assertRaises(ValueError, row_uncertainty,\ array([.25,.25,.25,.25])) #zeros self.assertEqual(row_uncertainty(array([[0,0]])),array([0])) #empty 2D array self.assertEqual(row_uncertainty(array([[]])),array([0])) self.assertEqual(row_uncertainty(array([[],[]])),array([0,0])) #negative number -- skip self.assertEqual(row_uncertainty(array([[-2]])), array([0])) def test_col_uncertainty(self): """column_uncertainty: should handle pos/neg/zero/empty arrays """ b = array([[.25,.2,.45,.25,1],[.25,.2,.45,0,0],[.25,.3,.05,.75,0],\ [.25,.3,.05,0,0]]) self.assertFloatEqual(column_uncertainty(b),[2,1.97,1.47,0.81,0],1e-3) #one-dimensional array self.assertRaises(ValueError, column_uncertainty,\ array([.25,.25,.25,.25])) #zeros self.assertEqual(column_uncertainty(array([[0,0]])),array([0,0])) #empty 2D array self.assertEqual(column_uncertainty(array([[]])),array([])) self.assertEqual(column_uncertainty(array([[],[]])),array([])) #negative number -- skip self.assertEqual(column_uncertainty(array([[-2]])), array([0])) def test_row_degeneracy(self): """row_degeneracy: should work with different cutoff values and arrays """ a = array([[.1, .3, .4, .2],[.5, .3, 0, .2],[.8, 0, .1, .1]]) self.assertEqual(row_degeneracy(a,cutoff=.75),[3,2,1]) self.assertEqual(row_degeneracy(a,cutoff=.95),[4,3,3]) #one-dimensional array self.assertRaises(ValueError, row_degeneracy,\ 
array([.25,.25,.25,.25])) #if cutoff value is not found, results are clipped to the #number of columns in the array self.assertEqual(row_degeneracy(a,cutoff=2), [4,4,4]) #same behavior on empty array self.assertEqual(row_degeneracy(array([[]])),[]) def test_column_degeneracy(self): """column_degeneracy: should work with different cutoff values """ a = array([[.1,.8,.3],[.3,.2,.3],[.6,0,.4]]) self.assertEqual(column_degeneracy(a,cutoff=.75),[2,1,3]) self.assertEqual(column_degeneracy(a,cutoff=.45),[1,1,2]) #one-dimensional array self.assertRaises(ValueError, column_degeneracy,\ array([.25,.25,.25,.25])) #if cutoff value is not found, results are clipped to the #number of rows in the array self.assertEqual(column_degeneracy(a,cutoff=2), [3,3,3]) #same behavior on empty array self.assertEqual(column_degeneracy(array([[]])),[]) def test_hamming_distance_same_length(self): """hamming_distance: should return # of chars different""" hd = hamming_distance(array('ABC','c'),array('ABB','c')) self.assertEqual(hd,1) self.assertEqual(hamming_distance(array('ABC', 'c'),array('ABC', 'c')),0) self.assertEqual(hamming_distance(array('ABC', 'c'),array('DDD', 'c')),3) def test_hamming_distance_diff_length(self): """hamming_distance: truncates at shortest sequence""" self.assertEqual(hamming_distance(array('ABC', 'c'),array('ABBDDD', 'c')),1) self.assertEqual(hamming_distance(array('ABC', 'c'),array('ABCDDD', 'c')),0) self.assertEqual(hamming_distance(array('ABC', 'c'),array('DDDDDD', 'c')),3) def test_norm(self): """norm: should return vector or matrix norm""" self.assertFloatEqual(norm(array([2,3,4,5])),sqrt(54)) self.assertEqual(norm(array([1,1,1,1])),2) self.assertFloatEqual(norm(array([[2,3],[4,5]])),sqrt(54)) self.assertEqual(norm(array([[1,1],[1,1]])),2) def test_euclidean_distance(self): """euclidean_distance: should return dist between 2 vectors or matrices """ a = array([3,4]) b = array([8,5]) c = array([[2,3],[4,5]]) d = array([[1,5],[8,2]]) 
        self.assertFloatEqual(euclidean_distance(a,b),sqrt(26))
        self.assertFloatEqual(euclidean_distance(c,d),sqrt(30))

    def test_euclidean_distance_unexpected(self):
        """euclidean_distance: works always when frames are aligned. UNEXPECTED!
        """
        a = array([3,4])
        b = array([8,5])
        c = array([[2,3],[4,5]])
        d = array([[1,5],[8,2]])
        e = array([[4,5],[4,5],[4,5]])
        f = array([1,1,1,1,1])
        self.assertFloatEqual(euclidean_distance(a,c),sqrt(4))
        self.assertFloatEqual(euclidean_distance(c,a),sqrt(4))
        self.assertFloatEqual(euclidean_distance(a,e),sqrt(6))
        #IT DOES RAISE AN ERROR WHEN THE FRAMES ARE NOT ALIGNED
        self.assertRaises(ValueError,euclidean_distance,c,e)
        self.assertRaises(ValueError,euclidean_distance,c,f)

    def test_count_simple(self):
        """count_simple should return correct counts"""
        self.assertEqual(count_simple(array([]), 3), array([0,0,0]))
        self.assertEqual(count_simple(array([1,2,2,1,0]), 3), array([1,2,2]))
        self.assertEqual(count_simple(array([1,1,1,1,1]), 3), array([0,5,0]))
        self.assertEqual(count_simple(array([1,1,1,1,1]), 2), array([0,5]))
        #raises index error if alphabet length is 0
        self.assertRaises(IndexError, count_simple, array([1]), 0)

    def test_count_alphabet(self):
        """count_alphabet should return correct counts"""
        self.assertEqual(count_alphabet(array([]), 3), array([0,0,0]))
        self.assertEqual(count_alphabet(array([1,2,2,1,0]), 3), array([1,2,2]))
        self.assertEqual(count_alphabet(array([1,1,1,1,1]), 3), array([0,5,0]))
        self.assertEqual(count_alphabet(array([1,1,1,1,1]), 2), array([0,5]))
        #raises index error if alphabet length is 0
        self.assertRaises(IndexError, count_alphabet, array([1]), 0)

    def test_is_complex(self):
        """is_complex should return True on matrix with complex values"""
        self.assertEqual(is_complex(array([[1,2],[3,4]])), False)
        self.assertEqual(is_complex(array([[1,2],[3,4.0]])), False)
        self.assertEqual(is_complex(array([[1,2+1j],[3,4]])), True)
        self.assertEqual(is_complex(array([[1,2.0j],[3,4.0]])), True)

    def test_is_significantly_complex(self):
"""is_significantly_complex should return True on complex matrix""" isc = is_significantly_complex self.assertEqual(isc(array([[1,2],[3,4]])), False) self.assertEqual(isc(array([[1,2],[3,4.0]])), False) self.assertEqual(isc(array([[1,2+1j],[3,4]])), True) self.assertEqual(isc(array([[1,2.0j],[3,4.0]])), True) self.assertEqual(isc(array([[1,1e-10j],[3,4.0]])), False) self.assertEqual(isc(array([[1,1e-10j],[3,4.0]]), 1e-12), True) def test_has_neg_off_diags_naive(self): """has_neg_off_diags_naive should return True if any off-diags negative""" hnod = has_neg_off_diags_naive self.assertEqual(hnod(array([[1,2],[3,4]])), False) self.assertEqual(hnod(array([[-1,2],[3,-4]])), False) self.assertEqual(hnod(array([[-1,-2],[3,-4]])), True) self.assertEqual(hnod(array([[1,-2],[3,4]])), True) def test_has_neg_off_diags(self): """has_neg_off_diags should be same as has_neg_off_diags_naive""" hnod = has_neg_off_diags self.assertEqual(hnod(array([[1,2],[3,4]])), False) self.assertEqual(hnod(array([[-1,2],[3,-4]])), False) self.assertEqual(hnod(array([[-1,-2],[3,-4]])), True) self.assertEqual(hnod(array([[1,-2],[3,4]])), True) def test_sum_neg_off_diags_naive(self): """sum_neg_off_diags_naive should return the sum of negative off-diags""" snod = sum_neg_off_diags_naive self.assertEqual(snod(array([[1,2],[3,4]])), 0) self.assertEqual(snod(array([[-1,2],[3,-4]])), 0) self.assertEqual(snod(array([[-1,-2],[3,-4]])), -2) self.assertEqual(snod(array([[1,-2],[3,4]])), -2) self.assertEqual(snod(array([[1,-2],[-3,4]])), -5) def test_sum_neg_off_diags(self): """sum_neg_off_diags should return same as sum_neg_off_diags_naive""" snod = sum_neg_off_diags self.assertEqual(snod(array([[1,2],[3,4]])), 0) self.assertEqual(snod(array([[-1,2],[3,-4]])), 0) self.assertEqual(snod(array([[-1,-2],[3,-4]])), -2) self.assertEqual(snod(array([[1,-2],[3,4]])), -2) self.assertEqual(snod(array([[1,-2],[-3,4]])), -5) def test_scale_row_sum(self): """scale_row_sum should give same result as 
scale_row_sum_naive""" m = array([[1.0,2,3,4],[2,4,4,0],[1,1,1,1],[0,0,0,100]]) scale_row_sum(m) self.assertFloatEqual(m, [[0.1,0.2,0.3,0.4],[0.2,0.4,0.4,0],\ [0.25,0.25,0.25,0.25],[0,0,0,1.0]]) scale_row_sum(m,4) self.assertFloatEqual(m, [[0.4,0.8,1.2,1.6],[0.8,1.6,1.6,0],\ [1,1,1,1],[0,0,0,4.0]]) #if any of the rows sums to zero, an exception will be raised. #SUPPORT2425 ori_err = numpy.geterr() numpy.seterr(divide='raise') try: #with numpy_err(divide='raise'): self.assertRaises((ZeroDivisionError, FloatingPointError), \ scale_row_sum, array([[1,0],[0,0]])) finally: numpy.seterr(**ori_err) def test_scale_row_sum_naive(self): """scale_row_sum_naive should scale rows to correct values""" m = array([[1.0,2,3,4],[2,4,4,0],[1,1,1,1],[0,0,0,100]]) scale_row_sum_naive(m) self.assertFloatEqual(m, [[0.1,0.2,0.3,0.4],[0.2,0.4,0.4,0],\ [0.25,0.25,0.25,0.25],[0,0,0,1.0]]) scale_row_sum_naive(m,4) self.assertFloatEqual(m, [[0.4,0.8,1.2,1.6],[0.8,1.6,1.6,0],\ [1,1,1,1],[0,0,0,4.0]]) #if any of the rows sums to zero, an exception will be raised. 
        #SUPPORT2425
        ori_err = numpy.geterr()
        numpy.seterr(divide='raise')
        try: #with numpy_err(divide='raise'):
            self.assertRaises((ZeroDivisionError, FloatingPointError), \
                scale_row_sum_naive, array([[1,0],[0,0]]))
        finally:
            numpy.seterr(**ori_err)

    def test_scale_trace(self):
        """scale_trace should scale trace to correct values"""
        #should scale to -1 by default
        #WARNING: won't work with integer matrices
        m = array([[-2., 0],[0,-2]])
        scale_trace(m)
        self.assertFloatEqual(m, [[-0.5, 0],[0,-0.5]])
        #should work even with zero rows
        m = array([
            [1.0,2,3,4],
            [2,4,4,0],
            [1,1,0,1],
            [0,0,0,0]
            ])
        m_orig = m.copy()
        scale_trace(m)
        self.assertFloatEqual(m, m_orig / -5)
        #but should fail if trace is zero
        m = array([[0,1,1],[1,0,1],[1,1,0]])
        #SUPPORT2425
        ori_err = numpy.geterr()
        numpy.seterr(divide='raise')
        try: #with numpy_err(divide='raise'):
            self.assertRaises((ZeroDivisionError, FloatingPointError), \
                scale_trace, m)
        finally:
            numpy.seterr(**ori_err)

    def test_abs_diff(self):
        """abs_diff should calculate element-wise sum of abs(first-second)"""
        m = array([[1.0,2,3],[4,5,6], [7,8,9]])
        m2 = array([[1.0,1,4],[2,6,-1],[8,6,-5]])
        #matrix should not be different from itself
        self.assertEqual(abs_diff(m,m), 0.0)
        self.assertEqual(abs_diff(m2,m2), 0.0)
        #difference should be same either direction
        self.assertEqual(abs_diff(m,m2), 29.0)
        self.assertEqual(abs_diff(m2,m), 29.0)

    def test_sq_diff(self):
        """sq_diff should calculate element-wise sum square of abs(first-second)"""
        m = array([[1.0,2,3],[4,5,6], [7,8,9]])
        m2 = array([[1.0,1,4],[2,6,-1],[8,6,-5]])
        #matrix should not be different from itself
        self.assertEqual(sq_diff(m,m), 0.0)
        self.assertEqual(sq_diff(m2,m2), 0.0)
        #difference should be same either direction
        self.assertEqual(sq_diff(m,m2), 257.0)
        self.assertEqual(sq_diff(m2,m), 257.0)

    def test_norm_diff(self):
        """norm_diff should calculate per-element rms difference"""
        m = array([[1.0,2,3],[4,5,6], [7,8,9]])
        m2 = array([[1.0,1,4],[2,6,-1],[8,6,-5]])
        #matrix should not be different from itself
        self.assertEqual(norm_diff(m,m), 0.0)
        self.assertEqual(norm_diff(m2,m2), 0.0)
        #difference should be same either direction
        self.assertEqual(norm_diff(m,m2), sqrt(257.0)/9)
        self.assertEqual(norm_diff(m2,m), sqrt(257.0)/9)

    def test_cartesian_product(self):
        """cartesian_product should return expected results."""
        a = 'abc'
        b = [1,2,3]
        c = [1.0]
        d = [0,1]
        #cartesian_product of list of single list should be same list
        self.assertEqual(cartesian_product([c]), [(1.0,)])
        self.assertEqual(cartesian_product([a]), [('a',),('b',),('c',)])
        #should combine two lists correctly
        self.assertEqual(cartesian_product([a,b]), \
            [('a',1),('a',2),('a',3),('b',1),('b',2),\
            ('b',3),('c',1),('c',2),('c',3)])
        #should combine three lists correctly
        self.assertEqual(cartesian_product([d,d,d]), \
            [(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1)])
        self.assertEqual(cartesian_product([c,d,d]), \
            [(1.0,0,0),(1.0,0,1),(1.0,1,0),(1.0,1,1)])

    def test_without_diag(self):
        """without_diag should omit diagonal from matrix"""
        a = array([[1,2,3],[4,5,6],[7,8,9]])
        b = without_diag(a)
        self.assertEqual(b, array([[2,3],[4,6],[7,8]]))

    def test_with_diag(self):
        """with_diag should add diagonal to matrix"""
        a = array([[2,3],[4,6],[7,8]])
        b = with_diag(a, array([1,5,9]))
        self.assertEqual(b, array([[1,2,3],[4,5,6],[7,8,9]]))

    def test_only_nonzero(self):
        """only_nonzero should return only items whose first element is nonzero"""
        a = reshape(arange(1,46),(5,3,3))
        a[1,0,0] = 0
        a[3,0,0] = 0
        #expect result to be rows 0, 2 and 4 of a
        result = only_nonzero(a)
        self.assertEqual(result, array([[[1,2,3],[4,5,6],[7,8,9]],\
            [[19,20,21],[22,23,24],[25,26,27]],
            [[37,38,39],[40,41,42],[43,44,45]]]))

    def test_combine_dimensions(self):
        """combine_dimensions should aggregate expected dimensions"""
        m = reshape(arange(81), (3,3,3,3))
        a = combine_dimensions(m, 0)
        self.assertEqual(a.shape, (3,3,3,3))
        a = combine_dimensions(m, 1)
        self.assertEqual(a.shape, (3,3,3,3))
        a = combine_dimensions(m, 2)
        self.assertEqual(a.shape, (9,3,3))
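An editorial aside, not part of the PyCogent source: the exact ordering the cartesian_product tests above assert matches what the standard library's itertools.product produces when the input lists are unpacked. A minimal sketch, with a hypothetical wrapper name:

```python
# Illustration only: itertools.product reproduces the ordering asserted
# in the cartesian_product tests above.
from itertools import product

def cartesian_product_sketch(lists):
    """Return all ordered combinations, one element drawn per input list."""
    return list(product(*lists))
```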
        a = combine_dimensions(m, 3)
        self.assertEqual(a.shape, (27,3))
        a = combine_dimensions(m, 4)
        self.assertEqual(a.shape, (81,))
        #should work for negative indices as well, starting at end
        a = combine_dimensions(m, -1)
        self.assertEqual(a.shape, (3,3,3,3))
        a = combine_dimensions(m, -2)
        self.assertEqual(a.shape, (3,3,9))
        a = combine_dimensions(m, -3)
        self.assertEqual(a.shape, (3,27))
        a = combine_dimensions(m, -4)
        self.assertEqual(a.shape, (81,))

    def test_split_dimension(self):
        """split_dimension should unpack specified dimension"""
        m = reshape(arange(12**3), (12,12,12))
        a = split_dimension(m, 0, (4,3))
        self.assertEqual(a.shape, (4,3,12,12))
        a = split_dimension(m, 0, (2,3,2))
        self.assertEqual(a.shape, (2,3,2,12,12))
        a = split_dimension(m, 1, (6,2))
        self.assertEqual(a.shape, (12, 6, 2, 12))
        a = split_dimension(m, 2, (3,4))
        self.assertEqual(a.shape, (12,12,3,4))
        #should work for negative index
        a = split_dimension(m, -1, (3,4))
        self.assertEqual(a.shape, (12,12,3,4))
        a = split_dimension(m, -2, (3,4))
        self.assertEqual(a.shape, (12,3,4,12))
        a = split_dimension(m, -3, (3,4))
        self.assertEqual(a.shape, (3,4,12,12))
        #should fail with IndexError for invalid dimension
        self.assertRaises(IndexError, split_dimension, m, 5, (3,4))
        #should assume even split if not supplied
        m = reshape(arange(16**3), (16,16,16))
        a = split_dimension(m, 0)
        self.assertEqual(a.shape, (4,4,16,16))
        a = split_dimension(m, 1)
        self.assertEqual(a.shape, (16,4,4,16))

    def test_non_diag(self):
        """non_diag should return non-diag elements from flattened matrices"""
        a = reshape(arange(16), (4,4))
        m = non_diag(a)
        self.assertEqual(m, array([[1,2],[5,6],[9,10],[13,14]]))
        a = reshape(arange(27), (3,9))
        m = non_diag(a)
        self.assertEqual(m, array([[1,2,3,5,6,7],[10,11,12,14,15,16],\
            [19,20,21,23,24,25]]))

    def test_perturb_one_off_diag(self):
        """perturb_element should perturb a random off-diagonal element"""
        for i in range(100):
            a = zeros((4,4), Float)
            p = perturb_one_off_diag(a)
            #NOTE: off-diag element and diag element will _both_ change
            self.assertEqual(sum(ravel(p != a)), 2)
            #check that sum is still 0
            self.assertEqual(sum(ravel(p)), 0)
            #check that trace is negative
            assert trace(p) < 1
        #check that we can pick an element to change
        a = zeros((4,4), Float)
        p = perturb_one_off_diag(a, mean=5, sd=0.1, element_to_change=8)
        #check that row still sums to 0
        self.assertEqual(sum(ravel(p)), 0)
        #set diag in changed row to 0
        p[2][2] = 0
        assert ((4.5 < sum(p)).any() < 5.5).any()
        assert 4.5 < p[2][3] < 5.5
        p[2][3] = 0
        self.assertEqual(sum(ravel(p)), 0)

    def test_perturb_off_diag(self):
        """perturb_off_diag should change all off_diag elements."""
        a = zeros((4,4), Float)
        d = perturb_off_diag(a)
        self.assertFloatEqual(sum(ravel(d)), 0)
        #try it with a valid rate matrix
        a = ones((4,4), Float)
        for i in range(4):
            a[i][i] = -3
        d = perturb_off_diag(a)
        self.assertNotEqual(d, a)
        self.assertFloatEqual(sum(ravel(d)), 0)
        #check that we didn't change it too much
        assert -13 < trace(d) < -11

    def test_merge_samples(self):
        """merge_samples should keep the sample label"""
        self.assertEqual(merge_samples(array([1,2]),array([3,4]),array([5])),
            array([[1,2,3,4,5],[0,0,1,1,2]]))

    def test_sort_merged_samples_by_value(self):
        """sort_merged_samples_by_value should keep label associations"""
        s = merge_samples(array([3,4]), array([5,6]), array([1,2]))
        result = sort_merged_samples_by_value(s)
        self.assertEqual(result, array([[1,2,3,4,5,6],[2,2,0,0,1,1]]))

    def test_classifiers(self):
        """classifiers should return all the 1D classifiers of samples"""
        first = array([2,1,5,3,5])
        second = array([2,5,5,4,6,7])
        result = classifiers(first, second)
        self.assertEqual(len(result), 6)
        exp = [(1,False,0,4,1,6),(3,False,1,3,2,5),(4,False,1,2,3,5),\
            (5,False,2,2,3,4),(9,False,4,0,5,2),(10,False,5,0,5,1)]
        self.assertEqual(result, exp)
        #should work in reverse
        result = classifiers(second, first)
        exp = [(1,True,0,4,1,6),(3,True,1,3,2,5),(4,True,1,2,3,5),\
            (5,True,2,2,3,4),(9,True,4,0,5,2),(10,True,5,0,5,1)]

    def test_minimize_error_count(self):
"""minimize_error_count should return correct classifier""" first = array([2,1,5,3,5]) second = array([2,5,5,4,6,7]) c = classifiers(first, second) exp = (4,False,1,2,3,5) self.assertEqual(minimize_error_count(c), exp) def test_minimize_error_rate(self): """minimize_error_rate should return correct classifier""" #should be same as error count on example used above first = array([2,1,5,3,5]) second = array([2,5,5,4,6,7]) c = classifiers(first, second) exp = (4,False,1,2,3,5) self.assertEqual(minimize_error_rate(c), exp) #here's a case where they should differ first = array([2,3,11,5]) second = array([1,4,6,7,8,9,10]) c = classifiers(first, second) self.assertEqual(minimize_error_count(c), (3,False,1,2,2,6)) self.assertEqual(minimize_error_rate(c), (5,False,2,1,3,5)) def test_mutate_array(self): """mutate_array should return mutated copy""" a = arange(5) m = mutate_array(a, 1, 2) assert a is not m self.assertNotEqual(a, m) residuals = m - a assert min(residuals) > -6 assert max(residuals) < 6 if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_util/test_dict2d.py000644 000765 000024 00000064132 12024702176 022207 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.util.dict2d import Dict2D, \ average, largest, smallest, swap, nonzero, not_0, upper_to_lower, \ lower_to_upper, Dict2DInitError, Dict2DError, Dict2DSparseError from cogent.maths.stats.util import Numbers, Freqs __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "caporaso@colorado.edu" __status__ = "Development" class Dict2DTests(TestCase): """ Tests of the Dict2DTests class """ def setUp(self): """Define a few standard matrices""" self.empty = {} self.single_same = {'a':{'a':2}} self.single_diff = {'a':{'b':3}} self.square = { 'a':{'a':1,'b':2,'c':3}, 
            'b':{'a':2,'b':4,'c':6},
            'c':{'a':3,'b':6,'c':9},
            }
        self.top_triangle = {
            'a':{'a':1, 'b':2, 'c':3},
            'b':{'b':4, 'c':6},
            'c':{'c':9}
            }
        self.bottom_triangle = {
            'b':{'a':2},
            'c':{'a':3, 'b':6}
            }
        self.sparse = {
            'a':{'a':1, 'c':3},
            'd':{'b':2},
            }
        self.dense = {
            'a':{'a':1,'b':2,'c':3},
            'b':{'a':2,'b':4,'c':6},
            }

    def test_init(self):
        """Dict2D init should work as expected"""
        #NOTE: currently only tests init from dict of dicts. Other initializers
        #are tested in the test_guess_input* and test_from* methods
        #should compare equal to the relevant dict
        for d in [self.empty, self.single_same, self.single_diff, self.dense, \
            self.sparse]:
            d2d = Dict2D(d)
            self.assertEqual(d2d, d)
            self.assertEqual(d2d.__class__, Dict2D)
        #spot-check values
        d2d = Dict2D(self.sparse)
        self.assertEqual(d2d['a']['c'], 3)
        self.assertEqual(d2d['d']['b'], 2)
        self.assertEqual(len(d2d), 2)
        self.assertEqual(len(d2d['a']), 2)
        self.assertRaises(KeyError, d2d.__getitem__, 'c')
        #check truth values
        assert not Dict2D(self.empty)
        assert Dict2D(self.single_same)

    def test_fromDicts(self):
        """Dict2D.fromDicts should construct from dict of dicts"""
        d2d = Dict2D()
        d2d.fromDicts(self.sparse)
        self.assertEqual(d2d['a']['c'], 3)
        self.assertEqual(d2d['d']['b'], 2)
        self.assertEqual(len(d2d), 2)
        self.assertEqual(len(d2d['a']), 2)
        self.assertRaises(KeyError, d2d.__getitem__, 'c')
        self.assertRaises(Dict2DInitError, d2d.fromDicts, [1,2,3])

    def test_fromIndices(self):
        """Dict2D.fromIndices should construct from list of indices"""
        d2d = Dict2D(self.sparse)
        d2d2 = Dict2D()
        self.assertNotEqual(d2d, d2d2)
        d2d2.fromIndices([('a','a',1),('a','c',3),('d','b',2)])
        self.assertEqual(d2d, d2d2)
        self.assertRaises(Dict2DInitError, d2d2.fromIndices, [1,2,3])

    def test_fromLists(self):
        """Dict2D.fromLists should construct from list of lists"""
        #Note that this only works for dense matrices, not sparse ones
        orig = Dict2D(self.dense)
        new = Dict2D(self.dense)    #will overwrite this data
        self.assertEqual(orig, new)
        assert orig is not new
        new.RowOrder = ['b','a']
        new.ColOrder = ['c','a','b']
        new.fromLists([[3,6,9],[1,3,5]])
        self.assertNotEqual(orig, new)
        test = Dict2D({'b':{'c':3,'a':6,'b':9},'a':{'c':1,'a':3,'b':5}})
        self.assertEqual(new, test)

    def test_guess_input_type_fromLists(self):
        """Dict2D init can correctly guess input type: Lists """
        # Will fail if Error is raised
        d = Dict2D(data=[[1,2,3],[4,5,6]], RowOrder=list('ab'), \
            ColOrder=list('def'))

    def test_guess_input_type_fromDict(self):
        """Dict2D init can correctly guess input type: Dict """
        # Will fail if error is raised
        d = Dict2D({})

    def test_guess_input_type_fromIndices(self):
        """Dict2D init can correctly guess input type: Indices """
        # Will fail if error is raised
        d = Dict2D([('a','b',1)])

    def test_init_without_data(self):
        """Dict2D init functions correctly without a data parameter """
        d = Dict2D(RowOrder=['a'],ColOrder=['b'],Pad=True,Default=42,
            RowConstructor=Freqs)
        self.assertEqual(d.RowOrder,['a'])
        self.assertEqual(d.ColOrder,['b'])
        self.assertEqual(d.Pad,True)
        self.assertEqual(d.Default,42)
        self.assertEqual(d.RowConstructor, Freqs)
        self.assertEqual(d,{'a':{'b':42.}})

    def test_pad(self):
        """Dict2D pad should fill empty slots with default, but not make square"""
        d = Dict2D(self.sparse)
        d.pad()
        self.assertEqual(len(d), 2)
        self.assertEqual(len(d['a']), 3)
        self.assertEqual(len(d['d']), 3)
        self.assertEqual(d['a'].keys(), d['d'].keys())
        self.assertEqual(d['a']['b'], None)
        #check that it works with a different default value
        d = Dict2D(self.sparse, Default='x')
        d.pad()
        self.assertEqual(d['a']['b'], 'x')
        #check that it works with a different constructor
        d = Dict2D(self.sparse, Default=0, RowConstructor=Freqs)
        d.pad()
        self.assertEqual(d['a']['b'], 0)
        assert isinstance(d['a'], Freqs)

    def test_purge(self):
        """Dict2D purge should delete unwanted keys at both levels"""
        d = Dict2D(self.square)
        d.RowOrder = 'ab'
        d.ColOrder = 'bc'
        d.purge()
        self.assertEqual(d, Dict2D({'a':{'b':2,'c':3},'b':{'b':4,'c':6}}))
        #check that a superset of the keys is OK
        d = Dict2D(self.square)
        d.RowOrder = dict.fromkeys('abcd')
        d.ColOrder = dict.fromkeys('abcd')
        d.purge()
        self.assertEqual(d, Dict2D(self.square))
        #check that everything gets deleted if nothing is valid
        d.RowOrder = list('xyz')
        d.ColOrder = list('xyz')
        d.purge()
        self.assertEqual(d, {})

    def test_rowKeys(self):
        """Dict2D rowKeys should find all the keys of component rows"""
        self.assertEqual(Dict2D(self.empty).rowKeys(), [])
        self.assertEqual(Dict2D(self.single_diff).rowKeys(), ['a'])
        #note that keys will be returned in arbitrary order
        self.assertEqualItems(Dict2D(self.dense).rowKeys(), ['a','b',])
        self.assertEqualItems(Dict2D(self.square).rowKeys(), ['a','b','c'])
        self.assertEqualItems(Dict2D(self.sparse).rowKeys(), ['a','d'])

    def test_colKeys(self):
        """Dict2D colKeys should find all the keys of component cols"""
        self.assertEqual(Dict2D(self.empty).colKeys(), [])
        self.assertEqual(Dict2D(self.single_diff).colKeys(), ['b'])
        #note that keys will be returned in arbitrary order
        self.assertEqualItems(Dict2D(self.square).colKeys(), ['a','b','c'])
        self.assertEqualItems(Dict2D(self.dense).colKeys(), ['a','b','c'])
        self.assertEqualItems(Dict2D(self.sparse).colKeys(), ['a','b','c'])

    def test_sharedColKeys(self):
        """Dict2D sharedColKeys should find keys shared by all component cols"""
        self.assertEqual(Dict2D(self.empty).sharedColKeys(), [])
        self.assertEqual(Dict2D(self.single_diff).sharedColKeys(), ['b'])
        #note that keys will be returned in arbitrary order
        self.assertEqualItems(Dict2D(self.square).sharedColKeys(),['a','b','c'])
        self.assertEqualItems(Dict2D(self.dense).sharedColKeys(), ['a','b','c'])
        self.assertEqualItems(Dict2D(self.sparse).sharedColKeys(), [])
        self.square['x'] = {'b':3, 'c':5, 'e':7}
        self.assertEqualItems(Dict2D(self.square).colKeys(),['a','b','c','e'])
        self.assertEqualItems(Dict2D(self.square).sharedColKeys(),['b','c'])

    def test_square(self):
        """Dict2D square should ensure that all d[i][j] exist"""
        #will raise exception if rows and cols aren't equal...
        self.assertRaises(Dict2DError, Dict2D(self.sparse).square)
        self.assertRaises(Dict2DError, Dict2D(self.dense).square)
        #...unless reset_order is True
        d = Dict2D(self.sparse)
        d.square(reset_order=True)
        self.assertEqual(d, Dict2D({
            'a':{'a':1,'b':None,'c':3,'d':None},
            'b':{'a':None, 'b':None, 'c':None, 'd':None},
            'c':{'a':None, 'b':None, 'c':None, 'd':None},
            'd':{'a':None, 'b':2, 'c':None, 'd':None},
            }))
        #Check that passing in a default works too
        d = Dict2D(self.sparse)
        d.square(reset_order=True, default='x')
        self.assertEqual(d, Dict2D({
            'a':{'a':1,'b':'x','c':3,'d':'x'},
            'b':{'a':'x', 'b':'x', 'c':'x', 'd':'x'},
            'c':{'a':'x', 'b':'x', 'c':'x', 'd':'x'},
            'd':{'a':'x', 'b':2, 'c':'x', 'd':'x'},
            }))

    def test_rows(self):
        """Dict2D Rows property should return list in correct order"""
        #should work with no data
        self.assertEqual(list(Dict2D(self.empty).Rows), [])
        #should work on square matrix
        sq = Dict2D(self.square, RowOrder='abc', ColOrder='abc')
        self.assertEqual(list(sq.Rows), [[1,2,3],[2,4,6],[3,6,9]])
        #check that it works when we change the row and col order
        sq.RowOrder = 'ba'
        sq.ColOrder = 'ccb'
        self.assertEqual(list(sq.Rows), [[6,6,4],[3,3,2]])
        #check that it doesn't raise an error on sparse matrices...
        sp = Dict2D(self.sparse)
        rows = list(sp.Rows)
        for r in rows:
            r.sort()
        rows.sort()
        self.assertEqual(rows, [[1,3],[2]])
        #...unless self.RowOrder and self.ColOrder are set...
        sp.RowOrder = 'ad'
        sp.ColOrder = 'abc'
        self.assertRaises(Dict2DSparseError, list, sp.Rows)
        #...and then, only if self.Pad is not set
        sp.Pad = True
        sp.Default = 'xxx'
        self.assertEqual(list(sp.Rows), [[1, 'xxx', 3],['xxx',2,'xxx']])

    def test_cols(self):
        """Dict2D Cols property should return list in correct order"""
        #should work with no data
        self.assertEqual(list(Dict2D(self.empty).Cols), [])
        #should work with square matrix
        sq = Dict2D(self.square, RowOrder='abc', ColOrder='abc')
        self.assertEqual(list(sq.Cols), [[1,2,3],[2,4,6],[3,6,9]])
        #check that it works when we change the row and col order
        sq.RowOrder = 'ba'
        sq.ColOrder = 'ccb'
        self.assertEqual(list(sq.Cols), [[6,3],[6,3],[4,2]])
        #check that it _does_ raise an error on sparse matrices...
        sp = Dict2D(self.sparse)
        self.assertRaises(Dict2DSparseError, list, sp.Cols)
        #...especially if self.RowOrder and self.ColOrder are set...
        sp.RowOrder = 'ad'
        sp.ColOrder = 'abc'
        self.assertRaises(Dict2DSparseError, list, sp.Cols)
        #...and then, only if self.Pad is not set
        sp.Pad = True
        sp.Default = 'xxx'
        self.assertEqual(list(sp.Cols), [[1,'xxx'],['xxx',2],[3,'xxx']])

    def test_items(self):
        """Dict2D Items property should return list in correct order"""
        #should work with no data
        self.assertEqual(list(Dict2D(self.empty).Items), [])
        #should work on square matrix
        sq = Dict2D(self.square, RowOrder='abc', ColOrder='abc')
        self.assertEqual(list(sq.Items), [1,2,3,2,4,6,3,6,9])
        #check that it works when we change the row and col order
        sq.RowOrder = 'ba'
        sq.ColOrder = 'ccb'
        self.assertEqual(list(sq.Items), [6,6,4,3,3,2])
        #check that it doesn't raise an error on sparse matrices...
        sp = Dict2D(self.sparse)
        items = list(sp.Items)
        items.sort()
        self.assertEqual(items, [1,2,3])
        #...unless self.RowOrder and self.ColOrder are set...
        sp.RowOrder = 'ad'
        sp.ColOrder = 'abc'
        self.assertRaises(Dict2DSparseError, list, sp.Items)
        #...and then, only if self.Pad is not set
        sp.Pad = True
        sp.Default = 'xxx'
        self.assertEqual(list(sp.Items), [1, 'xxx', 3,'xxx',2,'xxx'])

    def test_getRows(self):
        """Dict2D getRows should get specified rows"""
        self.assertEqual(Dict2D(self.square).getRows(['a','c']), \
            {'a':{'a':1,'b':2,'c':3},'c':{'a':3,'b':6,'c':9}})
        #should work on sparse matrix
        self.assertEqual(Dict2D(self.sparse).getRows(['d']), {'d':{'b':2}})
        #should raise KeyError if row doesn't exist...
        d = Dict2D(self.sparse)
        self.assertRaises(KeyError, d.getRows, ['c'])
        #...unless we're Padding
        d.Pad = True
        self.assertEqual(d.getRows('c'), {'c':{}})
        #should work when we negate it
        self.assertEqual(Dict2D(self.square).getRows(['a','c'], negate=True),
            {'b':{'a':2,'b':4,'c':6}})

    def test_getRowIndices(self):
        """Dict2D getRowIndices should return indices of rows where f(x) True"""
        d = Dict2D(self.square)
        lt_15 = lambda x: sum(x) < 15
        self.assertEqual(d.getRowIndices(lt_15), ['a','b'])
        #should be bound by RowOrder and ColOrder
        d.RowOrder = d.ColOrder = 'ac'
        self.assertEqual(d.getRowIndices(lt_15), ['a','c'])
        #negate should work
        d.RowOrder = d.ColOrder = None
        self.assertEqual(d.getRowIndices(lt_15, negate=True), ['c'])

    def test_getRowsIf(self):
        """Dict2D getRowsIf should return object with rows where f(x) is True"""
        d = Dict2D(self.square)
        lt_15 = lambda x: sum(x) < 15
        self.assertEqual(d.getRowsIf(lt_15), \
            {'a':{'a':1,'b':2,'c':3},'b':{'a':2,'b':4,'c':6}})
        #should do test by RowOrder, but copy the whole row
        d.RowOrder = d.ColOrder = 'ac'
        self.assertEqual(d.getRowsIf(lt_15), \
            {'a':{'a':1,'b':2,'c':3},'c':{'a':3,'b':6,'c':9}})
        #negate should work
        d.RowOrder = d.ColOrder = None
        self.assertEqual(d.getRowsIf(lt_15, negate=True), \
            {'c':{'a':3,'b':6,'c':9}})

    def test_getCols(self):
        """Dict2D getCols should return object with specified cols only"""
        d = Dict2D(self.square)
        self.assertEqual(d.getCols('bc'), {
            'a':{'b':2, 'c':3},
            'b':{'b':4, 'c':6},
            'c':{'b':6,'c':9},
            })
        #check that it works on ragged matrices
        d = Dict2D(self.top_triangle)
        self.assertEqual(d.getCols('ac'), {
            'a':{'a':1, 'c':3},
            'b':{'c':6},
            'c':{'c':9}
            })
        #check that negate works
        d = Dict2D(self.square)
        self.assertEqual(d.getCols('bc', negate=True), {
            'a':{'a':1},
            'b':{'a':2},
            'c':{'a':3},
            })

    def test_getColIndices(self):
        """Dict2D getColIndices should return list of indices of matching cols"""
        d = Dict2D(self.square)
        lt_15 = lambda x: sum(x) < 15
        self.assertEqual(d.getColIndices(lt_15), ['a','b'])
        #check that negate works
        self.assertEqual(d.getColIndices(lt_15, negate=True), ['c'])

    def test_getColsIf(self):
        """Dict2D getColsIf should return new Dict2D with matching cols"""
        d = Dict2D(self.square)
        lt_15 = lambda x: sum(x) < 15
        self.assertEqual(d.getColsIf(lt_15), {
            'a':{'a':1,'b':2},'b':{'a':2,'b':4},'c':{'a':3,'b':6}
            })
        #check that negate works
        self.assertEqual(d.getColsIf(lt_15, negate=True), \
            {'a':{'c':3},'b':{'c':6},'c':{'c':9}})

    def test_getItems(self):
        """Dict2D getItems should return list of relevant items"""
        d = Dict2D(self.square)
        self.assertEqual(d.getItems([('a','a'),('b','c'),('c','a'),('a','a')]),\
            [1,6,3,1])
        #should work on ragged matrices...
        d = Dict2D(self.top_triangle)
        self.assertEqual(d.getItems([('a','c'),('c','c')]), [3,9])
        #...unless absent items are asked for...
        self.assertRaises(KeyError, d.getItems, [('a','a'),('c','a')])
        #...unless self.Pad is True
        d.Pad = True
        self.assertEqual(d.getItems([('a','c'),('c','a')]), [3, None])
        #negate should work -- must specify RowOrder and ColOrder to get
        #results in predictable order
        d.Pad = False
        d.RowOrder = d.ColOrder = 'abc'
        self.assertEqual(d.getItems([('a','c'),('c','a'),('a','a')], \
            negate=True), [2,4,6,9])

    def test_getItemIndices(self):
        """Dict2D getItemIndices should return indices when f(item) is True"""
        lt_5 = lambda x: x < 5
        d = Dict2D(self.square)
        d.RowOrder = d.ColOrder = 'abc'
        self.assertEqual(d.getItemIndices(lt_5), \
            [('a','a'),('a','b'),('a','c'),('b','a'),('b','b'),('c','a')])
        self.assertEqual(d.getItemIndices(lt_5, negate=True), \
            [('b','c'),('c','b'),('c','c')])
        d = Dict2D(self.top_triangle)
        d.RowOrder = d.ColOrder = 'abc'
        self.assertEqual(d.getItemIndices(lt_5), \
            [('a','a'),('a','b'),('a','c'),('b','b')])

    def test_getItemsIf(self):
        """Dict2D getItemsIf should return list of items when f(item) is True"""
        lt_5 = lambda x: x < 5
        d = Dict2D(self.square)
        d.RowOrder = d.ColOrder = 'abc'
        self.assertEqual(d.getItemsIf(lt_5), [1,2,3,2,4,3])
        self.assertEqual(d.getItemsIf(lt_5, negate=True), [6,6,9])
        d = Dict2D(self.top_triangle)
        d.RowOrder = d.ColOrder = 'abc'
        self.assertEqual(d.getItemsIf(lt_5), [1,2,3,4])
        self.assertEqual(d.getItemsIf(lt_5, negate=True), [6,9])

    def test_toLists(self):
        """Dict2D toLists should convert dict into list of lists"""
        d = Dict2D(self.square)
        d.RowOrder = 'abc'
        d.ColOrder = 'abc'
        self.assertEqual(d.toLists(), [[1,2,3],[2,4,6],[3,6,9]])
        self.assertEqual(d.toLists(headers=True), \
            [['-', 'a', 'b', 'c'],
            ['a', 1, 2, 3],
            ['b', 2, 4, 6],
            ['c', 3, 6, 9],
            ])
        #should raise error if called on sparse matrix...
        self.assertRaises(Dict2DSparseError, Dict2D(self.sparse).toLists)
        #...unless self.Pad is True
        d = Dict2D(self.sparse)
        d.RowOrder = 'ad'
        d.ColOrder = 'abc'
        d.Pad = True
        d.Default = 'x'
        self.assertEqual(d.toLists(headers=True), \
            [['-','a','b','c'],['a',1,'x',3],['d','x',2,'x']])
        #works without RowOrder or ColOrder
        goal = [[1,2,3],[2,4,6],[3,6,9]]
        # headers=False
        d = Dict2D(self.square)
        l = d.toLists()
        for r in l:
            r.sort()
        l.sort()
        self.assertEqual(l,goal)
        # headers=True
        d.toLists(headers=True)
        l = d.toLists()
        for r in l:
            r.sort()
        l.sort()
        self.assertEqual(l,goal)

    def test_copy(self):
        """Dict2D copy should copy data and selected attributes"""
        #if it works on sparse matrices, it'll work on dense ones
        s = Dict2D(self.sparse)
        s.Pad = True
        s.RowOrder = 'abc'
        s.ColOrder = 'def'
        s.xxx = 'yyy'   #arbitrary attributes won't be copied
        s2 = s.copy()
        self.assertEqual(s, s2)
        assert s is not s2
        assert not hasattr(s2, 'xxx')
        self.assertEqual(s2.RowOrder, 'abc')
        self.assertEqual(s2.ColOrder, 'def')
        self.assertEqual(s2.Pad, True)
        assert 'Default' not in s2.__dict__
        assert 'RowConstructor' not in s2.__dict__

    def test_fill(self):
        """Dict2D fill should fill in specified values"""
        #with no parameters, should just fill in elements that exist
        d = Dict2D(self.sparse)
        d.fill('x')
        self.assertEqual(d, {'a':{'a':'x','c':'x'}, 'd':{'b':'x'}})
        #if cols is set, makes sure all the relevant cols exist in each row
        #doesn't delete extra cols if they are present
        d = Dict2D(self.sparse)
        d.fill('x', cols='bc')
        #note that d[a][a] should not be affected by the fill
        self.assertEqual(d, {'a':{'a':1,'b':'x','c':'x'},\
            'd':{'b':'x','c':'x'}
            })
        #if rows but not cols is set, should create but not fill rows
        d = Dict2D(self.sparse)
        d.fill('y', rows='ab')
        self.assertEqual(d, {'a':{'a':'y','c':'y'},
            'b':{},         #new row created
            'd':{'b':2}     #unaffected since not in rows
            })
        #if both rows and cols are set, should create and fill rows
        d = Dict2D(self.sparse)
        d.fill('z', rows='abc', cols='abc')
        self.assertEqual(d,
            {'a':{'a':'z','b':'z','c':'z'},
            'b':{'a':'z','b':'z','c':'z'},
            'c':{'a':'z','b':'z','c':'z'},
            'd':{'b':2}     #unaffected since col skipped
            })
        #if set_orders is True, should reset RowOrder and ColOrder
        d = Dict2D(self.sparse)
        d.fill('z', rows='abc', cols='xyz', set_orders=True)
        self.assertEqual(d.RowOrder, 'abc')
        self.assertEqual(d.ColOrder, 'xyz')
        d.fill('a', set_orders=True)
        self.assertEqual(d.RowOrder, None)
        self.assertEqual(d.ColOrder, None)

    def test_setDiag(self):
        """Dict2D setDiag should set diagonal to specified value"""
        #should have no effect on empty dict2d
        d = Dict2D(self.empty)
        d.setDiag(0)
        self.assertEqual(d, {})
        #should work on one-element dict
        d = Dict2D(self.single_same)
        d.setDiag(0)
        self.assertEqual(d, {'a':{'a':0}})
        d = Dict2D(self.single_diff)
        d.setDiag(0)
        self.assertEqual(d, {'a':{'a':0,'b':3}})
        #should work on dense dict
        d = Dict2D(self.square)
        d.setDiag(9)
        self.assertEqual(d, {
            'a':{'a':9,'b':2,'c':3},
            'b':{'a':2,'b':9,'c':6},
            'c':{'a':3,'b':6,'c':9},
            })
        #should work on sparse dict, creating cols for rows but not vice versa
        d = Dict2D(self.sparse)
        d.setDiag(-1)
        self.assertEqual(d, {'a':{'a':-1,'c':3},'d':{'b':2,'d':-1}})

    def test_scale(self):
        """Dict2D scale should apply f(x) to each d[i][j]"""
        doubler = lambda x: x * 2
        #should have no effect on empty Dict2D
        d = Dict2D(self.empty)
        d.scale(doubler)
        self.assertEqual(d, {})
        #should work on single-element dict
        d = Dict2D(self.single_diff)
        d.scale(doubler)
        self.assertEqual(d, {'a':{'b':6}})
        #should work on dense dict
        d = Dict2D(self.square)
        d.scale(doubler)
        self.assertEqual(d, {
            'a':{'a':2,'b':4,'c':6},
            'b':{'a':4,'b':8,'c':12},
            'c':{'a':6,'b':12,'c':18},
            })
        #should work on sparse dict, not creating any new elements
        d = Dict2D(self.sparse)
        d.scale(doubler)
        self.assertEqual(d, {'a':{'a':2,'c':6},'d':{'b':4}})

    def test_transpose(self):
        """Dict2D transpose should work on both dense and sparse matrices,
        in place"""
        #should do nothing to empty matrix
        d = Dict2D(self.empty)
        d.transpose()
        self.assertEqual(d,
{}) #should do nothing to single-element square matrix d = Dict2D(self.single_same) d.transpose() self.assertEqual(d, {'a':{'a':2}}) #should reverse single-element non-square matrix d = Dict2D(self.single_diff) d.transpose() self.assertEqual(d, {'b':{'a':3}}) #should work on sparse matrix d = Dict2D(self.sparse) d.transpose() self.assertEqual(d, {'a':{'a':1}, 'c':{'a':3},'b':{'d':2}}) #should reverse row and col order d = Dict2D(self.dense) d.RowOrder = 'ab' d.ColOrder = 'abc' d.transpose() self.assertEqual(d, \ {'a':{'a':1,'b':2},'b':{'a':2,'b':4},'c':{'a':3,'b':6}}) self.assertEqual(d.ColOrder, 'ab') self.assertEqual(d.RowOrder, 'abc') def test_reflect(self): """Dict2D reflect should reflect square matrices across diagonal.""" d = Dict2D(self.top_triangle) #should fail if RowOrder and/or ColOrder are unspecified self.assertRaises(Dict2DError, d.reflect) self.assertRaises(Dict2DError, d.reflect, upper_to_lower) d.RowOrder = 'abc' self.assertRaises(Dict2DError, d.reflect) d.RowOrder = None d.ColOrder = 'abc' self.assertRaises(Dict2DError, d.reflect) #should work if RowOrder and ColOrder are both set d.RowOrder = 'abc' d.reflect(upper_to_lower) self.assertEqual(d, self.square) #try it on lower triangle as well -- note that the diagonal won't be #set if it's absent. 
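The reflect behaviour exercised above can be sketched independently of Dict2D. `reflect_upper_to_lower` below is a hypothetical stand-in for the `upper_to_lower` policy (copy each upper-triangle cell into its mirror position when the lower cell is absent); it is not PyCogent's implementation:

```python
# Minimal sketch (assumed semantics, not cogent.util.dict2d itself):
# reflect an upper-triangular nested dict across its diagonal, copying
# d[r][c] into d[c][r]. As in the tests, an absent diagonal stays absent.
def reflect_upper_to_lower(d, order):
    for i, row in enumerate(order):
        for col in order[i + 1:]:
            if col in d.get(row, {}):
                d.setdefault(col, {})[row] = d[row][col]
    return d

top = {'a': {'b': 2, 'c': 3}, 'b': {'c': 6}}
print(reflect_upper_to_lower(top, 'abc'))
# {'a': {'b': 2, 'c': 3}, 'b': {'c': 6, 'a': 2}, 'c': {'a': 3, 'b': 6}}
```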
d = Dict2D(self.bottom_triangle) d.ColOrder = 'abc' d.RowOrder = 'abc' d.reflect(lower_to_upper) self.assertEqual(d, { 'a':{'b':2,'c':3}, 'b':{'a':2,'c':6}, 'c':{'a':3,'b':6}, }) d = Dict2D({ 'a':{'a':2,'b':4,'c':6}, 'b':{'a':10,'b':20, 'c':30}, 'c':{'a':30, 'b':60, 'c':90}, }) d.ColOrder = d.RowOrder = 'abc' d.reflect(average) self.assertEqual(d, { 'a':{'a':2,'b':7,'c':18}, 'b':{'a':7,'b':20,'c':45}, 'c':{'a':18,'b':45,'c':90}, }) def test_toDelimited(self): """Dict2D toDelimited should return delimited string for printing""" d = Dict2D(self.square) d.RowOrder = d.ColOrder = 'abc' self.assertEqual(d.toDelimited(), \ '-\ta\tb\tc\na\t1\t2\t3\nb\t2\t4\t6\nc\t3\t6\t9') self.assertEqual(d.toDelimited(headers=False), \ '1\t2\t3\n2\t4\t6\n3\t6\t9') #set up a custom formatter... def my_formatter(x): try: return '%1.1f' % x except: return str(x) #...and use it self.assertEqual(d.toDelimited(headers=True, item_delimiter='x', \ row_delimiter='y', formatter=my_formatter), \ '-xaxbxcyax1.0x2.0x3.0ybx2.0x4.0x6.0ycx3.0x6.0x9.0') if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_util/test_misc.py000644 000765 000024 00000210565 12024702176 021774 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for utility functions and classes. 
""" from copy import copy, deepcopy from os import remove, rmdir from os.path import exists from cogent.app.util import get_tmp_filename from cogent.util.unit_test import TestCase, main from cogent.util.misc import (iterable, max_index, min_index, flatten, is_iterable, is_char, is_char_or_noniterable, is_str_or_noniterable, not_list_tuple, list_flatten, recursive_flatten, unflatten, unzip, select, sort_order, find_all, find_many, unreserve, extract_delimited, caps_from_underscores, add_lowercase, InverseDict, InverseDictMulti, DictFromPos, DictFromFirst, DictFromLast, DistanceFromMatrix, PairsFromGroups, ClassChecker, Delegator, FunctionWrapper, ConstraintError, ConstrainedContainer, ConstrainedString, ConstrainedList, ConstrainedDict, MappedString, MappedList, MappedDict, generateCombinations, makeNonnegInt, NonnegIntError, reverse_complement, not_none, get_items_except, NestedSplitter, curry, app_path, remove_files, get_random_directory_name, revComp, parse_command_line_parameters, safe_md5, create_dir, handle_error_codes, identity, if_, deep_list, deep_tuple, combinate,gzip_dump,gzip_load,recursive_flatten_old,getNewId,toString, timeLimitReached, get_independent_coords, get_merged_by_value_coords, get_merged_overlapping_coords, get_run_start_indices) from numpy import array from time import clock, sleep __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Amanda Birmingham", "Sandra Smit", "Zongzhi Liu", "Peter Maxwell", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class UtilsTests(TestCase): """Tests of individual functions in utils""" def setUp(self): """ """ self.files_to_remove = [] self.dirs_to_remove = [] def tearDown(self): """ """ map(remove,self.files_to_remove) map(rmdir,self.dirs_to_remove) def test_identity(self): """should return same object""" foo = [1,'a',lambda x: x] exp = id(foo) 
self.assertEqual(id(identity(foo)), exp) def test_if_(self): """implementation of c-like ternary operator""" exp = 'yay' obs = if_(True, 'yay', 'nay') self.assertEqual(obs, exp) exp = 'nay' obs = if_(False, 'yay', 'nay') self.assertEqual(obs, exp) def test_deep_list(self): """should convert nested tuple to nested list""" input = ((1,(2,3)),(4,5),(6,7)) exp = [[1,[2,3]],[4,5],[6,7]] obs = deep_list(input) self.assertEqual(obs, exp) def test_deep_tuple(self): """Should convert a nested list to a nested tuple""" exp = ((1,(2,3)),(4,5),(6,7)) input = [[1,[2,3]],[4,5],[6,7]] obs = deep_tuple(input) self.assertEqual(obs, exp) def test_combinate(self): """Should return combinations""" input = [1,2,3,4] n = 2 exp = [[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]] obs = list(combinate(input, n)) self.assertEqual(obs, exp) def test_recursive_flatten_old(self): """Should flatten nested lists""" input = [[[1,2],[3,[4,5]],[6,7]],8] exp = [1,2,3,4,5,6,7,8] obs = recursive_flatten_old(input) self.assertEqual(obs, exp) def test_getNewId(self): """should return a random 12 digit id""" rand_f = lambda x: 1 obs = getNewId(rand_f=rand_f) exp = '111111111111' self.assertEqual(obs,exp) def test_toString(self): """should stringify an object""" class foo(object): def __init__(self): self.bar = 5 exp = 'bar: 5' obs = toString(foo()) self.assertEqual(obs, exp) # this test and the code it tests are architecture dependent. that is not # good. 
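The chunked-hashing behaviour that test_safe_md5 relies on can be sketched with `hashlib` alone. `md5_of_file` is a hypothetical helper illustrating the assumed approach (hash a file-like object in fixed-size chunks so large files never sit wholly in memory); it is not `cogent.util.misc.safe_md5` itself:

```python
import hashlib
import io

# Sketch of a safe_md5-style helper (assumed behaviour, not PyCogent's
# implementation): update the digest chunk by chunk instead of calling
# read() once on the whole file.
def md5_of_file(fileobj, chunk_size=65536):
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
    return digest.hexdigest()

# Same input/expected digest as test_safe_md5 below.
print(md5_of_file(io.BytesIO(b'foo\n')))  # d3b07384d113edec49eaa6238ad5ff00
```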
#def test_timeLimitReached(self): # """should return true if timelimit has been reached, else return false""" # start = clock() # timelimit = .0002 # exp = False # sleep(1) # obs = timeLimitReached(start, timelimit) # self.assertEqual(obs, exp) # sleep(1) # exp = True # obs = timeLimitReached(start, timelimit) # self.assertEqual(obs, exp) def test_safe_md5(self): """Make sure we have the expected md5""" exp = 'd3b07384d113edec49eaa6238ad5ff00' tmp_fp = get_tmp_filename(prefix='test_safe_md5', suffix='txt') self.files_to_remove.append(tmp_fp) tmp_f = open(tmp_fp, 'w') tmp_f.write('foo\n') tmp_f.close() obs = safe_md5(open(tmp_fp, 'U')) self.assertEqual(obs.hexdigest(),exp) def test_iterable(self): """iterable(x) should return x or [x], always an iterable result""" self.assertEqual(iterable('x'), 'x') self.assertEqual(iterable(''), '') self.assertEqual(iterable(3), [3]) self.assertEqual(iterable(None), [None]) self.assertEqual(iterable({'a':1}), {'a':1}) self.assertEqual(iterable(['a','b','c']), ['a', 'b', 'c']) def test_max_index(self): """max_index should return index of largest item, last if tie""" self.assertEqual(max_index('abcde'), 4) self.assertEqual(max_index('ebcda'), 0) self.assertRaises(ValueError, max_index, '') self.assertEqual(max_index('ebcde'), 4) self.assertEqual(max_index([0, 0, 1, 0]), 2) def test_min_index(self): """min_index should return index of smallest item, first if tie""" self.assertEqual(min_index('abcde'), 0) self.assertEqual(min_index('ebcda'), 4) self.assertRaises(ValueError, min_index, '') self.assertEqual(min_index('ebcde'), 1) self.assertEqual(min_index([0,0,1,0]), 0) def test_flatten_no_change(self): """flatten should not change non-nested sequences (except to list)""" self.assertEqual(flatten('abcdef'), list('abcdef')) #test identities self.assertEqual(flatten([]), []) #test empty sequence self.assertEqual(flatten(''), []) #test empty string def test_flatten(self): """flatten should remove one level of nesting from nested 
sequences""" self.assertEqual(flatten(['aa', 'bb', 'cc']), list('aabbcc')) self.assertEqual(flatten([1,[2,3], [[4, [5]]]]), [1, 2, 3, [4,[5]]]) def test_is_iterable(self): """is_iterable should return True for iterables""" #test str self.assertEqual(is_iterable('aa'), True) #test list self.assertEqual(is_iterable([3,'aa']), True) #test Number, expect False self.assertEqual(is_iterable(3), False) def test_is_char(self): """is_char(obj) should return True when obj is a char""" self.assertEqual(is_char('a'), True) self.assertEqual(is_char('ab'), False) self.assertEqual(is_char(''), True) self.assertEqual(is_char([3]), False) self.assertEqual(is_char(3), False) def test_is_char_or_noniterable(self): """is_char_or_noniterable should return True or False""" self.assertEqual(is_char_or_noniterable('a'), True) self.assertEqual(is_char_or_noniterable('ab'), False) self.assertEqual(is_char_or_noniterable(3), True) self.assertEqual(is_char_or_noniterable([3]), False) def test_is_str_or_noniterable(self): """is_str_or_noniterable should return True or False""" self.assertEqual(is_str_or_noniterable('a'), True) self.assertEqual(is_str_or_noniterable('ab'), True) self.assertEqual(is_str_or_noniterable(3), True) self.assertEqual(is_str_or_noniterable([3]), False) def test_recursive_flatten(self): """recursive_flatten should remove all nesting from nested sequences""" self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]]), [1,2,3,4,5]) #test default behavior on str unpacking self.assertEqual(recursive_flatten( ['aa',[8,'cc','dd'], ['ee',['ff','gg']]]), ['a', 'a', 8, 'c', 'c', 'd', 'd', 'e', 'e', 'f', 'f', 'g', 'g']) #test str untouched flattening using is_leaf=is_str_or_noniterable self.assertEqual(recursive_flatten( ['aa',[8,'cc','dd'], ['ee',['ff','gg']]], is_leaf=is_str_or_noniterable), ['aa',8,'cc','dd','ee','ff','gg']) def test_create_dir(self): """create_dir creates dir and fails meaningful.""" tmp_dir_path = get_random_directory_name() tmp_dir_path2 = 
get_random_directory_name(suppress_mkdir=True) tmp_dir_path3 = get_random_directory_name(suppress_mkdir=True) self.dirs_to_remove.append(tmp_dir_path) self.dirs_to_remove.append(tmp_dir_path2) self.dirs_to_remove.append(tmp_dir_path3) # create on existing dir raises OSError if fail_on_exist=True self.assertRaises(OSError, create_dir, tmp_dir_path, fail_on_exist=True) self.assertEqual(create_dir(tmp_dir_path, fail_on_exist=True, handle_errors_externally=True), 1) # return should be 1 if dir exists and fail_on_exist=False self.assertEqual(create_dir(tmp_dir_path, fail_on_exist=False), 1) # if dir not there make it and return always 0 self.assertEqual(create_dir(tmp_dir_path2), 0) self.assertEqual(create_dir(tmp_dir_path3, fail_on_exist=True), 0) def test_handle_error_codes(self): """handle_error_codes raises the right error.""" self.assertRaises(OSError, handle_error_codes, "test", False,1) self.assertEqual(handle_error_codes("test", True, 1), 1) self.assertEqual(handle_error_codes("test", False, 0), 0) self.assertEqual(handle_error_codes("test"), 0) def test_not_list_tuple(self): """not_list_tuple(obj) should return False when obj is list or tuple""" self.assertEqual(not_list_tuple([8,3]), False) self.assertEqual(not_list_tuple((8,3)), False) self.assertEqual(not_list_tuple('34'), True) def test_list_flatten(self): """list_flatten should remove all nesting, str is untouched""" self.assertEqual(list_flatten( ['aa',[8,'cc','dd'], ['ee',['ff','gg']]], ), ['aa',8,'cc','dd','ee','ff','gg']) def test_recursive_flatten_max_depth(self): """recursive_flatten should not remove more than max_depth levels""" self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]]), [1,2,3,4,5]) self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 0), \ [1,[2,3], [[4, [5]]]]) self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 1), \ [1,2,3, [4, [5]]]) self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 2), \ [1,2,3,4, [5]]) self.assertEqual(recursive_flatten([1,[2,3], [[4, 
[5]]]], 3), \ [1,2,3,4,5]) self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 5000), \ [1,2,3,4,5]) def test_unflatten(self): """unflatten should turn a 1D sequence into a 2D list""" self.assertEqual(unflatten("abcdef", 1), list("abcdef")) self.assertEqual(unflatten("abcdef", 1, True), list("abcdef")) self.assertEqual(unflatten("abcdef", 2), ['ab','cd','ef']) self.assertEqual(unflatten("abcdef", 3), ['abc','def']) self.assertEqual(unflatten("abcdef", 4), ['abcd']) #should be able to preserve extra items self.assertEqual(unflatten("abcdef", 4, True), ['abcd', 'ef']) self.assertEqual(unflatten("abcdef", 10), []) self.assertEqual(unflatten("abcdef", 10, True), ['abcdef']) #should succeed on empty sequence self.assertEqual(unflatten('',10), []) def test_unflatten_bad_row_width(self): """unflatten should raise ValueError with row_width < 1""" self.assertRaises(ValueError, unflatten, "abcd", 0) self.assertRaises(ValueError, unflatten, "abcd", -1) def test_unzip(self): """unzip(items) should be the inverse of zip(*items)""" chars = [list('abcde'), list('ghijk')] numbers = [[1,2,3,4,5], [0,0,0,0,0]] strings = [["abcde", "fghij", "klmno"], ['xxxxx'] * 3] empty = [[]] lists = [chars, numbers, strings] zipped = [zip(*i) for i in lists] unzipped = [unzip(i) for i in zipped] for u, l in zip(unzipped, lists): self.assertEqual(u, l) def test_select_sequence(self): """select should work on a sequence with a list of indices""" chars = 'abcdefghij' strings = list(chars) tests = { (0,):['a'], (-1,):['j'], (0, 2, 4): ['a', 'c', 'e'], (9,8,7,6,5,4,3,2,1,0):list('jihgfedcba'), (-8, 8): ['c', 'i'], ():[], } for test, result in tests.items(): self.assertEqual(select(test, chars), result) self.assertEqual(select(test, strings), result) def test_select_empty(self): """select should raise error if indexing into empty sequence""" self.assertRaises(IndexError, select, [1], []) def test_select_mapping(self): """select should return the values corresponding to a list of keys""" values = 
{'a':5, 'b':2, 'c':4, 'd':6, 'e':7} self.assertEqual(select('abc', values), [5,2,4]) self.assertEqual(select(['e','e','e'], values), [7,7,7]) self.assertEqual(select(('e', 'b', 'a'), values), [7, 2, 5]) #check that it raises KeyError on anything out of range self.assertRaises(KeyError, select, 'abx', values) def test_sort_order(self): """sort_order should return the ordered indices of items""" self.assertEqual(sort_order('abc'), [0, 1, 2]) self.assertEqual(sort_order('cba'), [2,1,0]) self.assertEqual(sort_order('bca'), [2,0,1]) def test_sort_order_cmpfunc(self): """sort_order should use cmpfunc if passed""" self.assertEqual(sort_order([4, 8, 10], lambda x,y:cmp(y,x)), [2, 1, 0]) def test_sort_order_empty(self): """sort_order should return empty list on empty sequence""" self.assertEqual(sort_order([]), []) def test_find_all(self): """find_all should return list of all occurrences""" self.assertEqual(find_all('abc', 'd'), []) self.assertEqual(find_all('abc', 'a'), [0]) self.assertEqual(find_all('abcabca', 'a'), [0,3,6]) self.assertEqual(find_all('abcabca', 'c'), [2,5]) self.assertEqual(find_all('abcabca', '3'), []) self.assertEqual(find_all('abcabca', 'bc'), [1,4]) self.assertRaises(TypeError, find_all,'abcabca', 3) def test_find_many(self): """find_many should return list of all occurrences of all items""" #should be same as find_all for single chars self.assertEqual(find_many('abc', 'd'), []) self.assertEqual(find_many('abc', 'a'), [0]) self.assertEqual(find_many('abcabca', 'a'), [0,3,6]) self.assertEqual(find_many('abcabca', 'c'), [2,5]) self.assertEqual(find_many('abcabca', '3'), []) #should sort together the items from the two lists self.assertEqual(find_many('abcabca', 'bc'), [1,2,4,5]) #note difference between 2-char string and 1-string list self.assertEqual(find_many('abcabca', ['bc']), [1,4]) self.assertRaises(TypeError, find_many,'abcabca', [3]) def test_unreserve(self): """unreserve should trim trailing underscore if present.""" for i in (None, [], ['x'], 
'xyz', '', 'a', '__abc'): self.assertEqual(unreserve(i), i) self.assertEqual(unreserve('_'), '') self.assertEqual(unreserve('class_'), 'class') def test_extract_delimited_bad_delimiters(self): """extract_delimited should raise error if delimiters identical""" self.assertRaises(TypeError, extract_delimited, '|acb|acx', '|','|') def test_extract_delimited_missing_right(self): """extract_delimited should raise error if right delimiter missing""" self.assertRaises(ValueError, extract_delimited, 'ac[acgsd', '[', ']') def test_extract_delimited_normal(self): """extract_delimited should return correct field if present, or None""" self.assertEqual(extract_delimited('[]', '[', ']'), '') self.assertEqual(extract_delimited('asdsad', '[', ']'), None) self.assertEqual(extract_delimited('ac[abc]ac', '[', ']'), 'abc') self.assertEqual(extract_delimited('[xyz]asd', '[', ']'), 'xyz') self.assertEqual(extract_delimited('acg[xyz]', '[', ']'), 'xyz') self.assertEqual(extract_delimited('abcdef', 'a', 'e'), 'bcd') def test_extract_delimited_indexed(self): """extract_delimited should return correct field with starting index""" self.assertEqual(extract_delimited('[abc][def]', '[',']', 0), 'abc') self.assertEqual(extract_delimited('[abc][def]','[',']',1), 'def') self.assertEqual(extract_delimited('[abc][def]', '[',']',5), 'def') def test_caps_from_underscores(self): """caps_from_underscores should become CapsFromUnderscores""" cfu = caps_from_underscores #should still convert strings without underscores self.assertEqual(cfu('ABCDE abcde!$'), 'Abcde Abcde!$') self.assertEqual(cfu('abc_def'), 'AbcDef') #should read through multiple underscores self.assertEqual(cfu('_caps__from_underscores___'), 'CapsFromUnderscores') def test_add_lowercase(self): """add_lowercase should add lowercase version of each key w/ same val""" d = {'a':1, 'b':'test', 'A':5, 'C':123, 'D':[], 'AbC':'XyZ', \ None:'3', '$':'abc', 145:'5'} add_lowercase(d) assert d['d'] is d['D'] d['D'].append(3) self.assertEqual(d['D'], 
[3]) self.assertEqual(d['d'], [3]) self.assertNotEqual(d['a'], d['A']) self.assertEqual(d, {'a':1, 'b':'test', 'A':5, 'C':123, 'c':123, \ 'D':[3], 'd':[3], 'AbC':'XyZ', 'abc':'xyz', None:'3', '$':'abc', \ 145:'5'}) #should work with strings d = 'ABC' self.assertEqual(add_lowercase(d), 'ABCabc') #should work with tuples d = tuple('ABC') self.assertEqual(add_lowercase(d), tuple('ABCabc')) #should work with lists d = list('ABC') self.assertEqual(add_lowercase(d), list('ABCabc')) #should work with sets d = set('ABC') self.assertEqual(add_lowercase(d), set('ABCabc')) #...even frozensets d = frozenset('ABC') self.assertEqual(add_lowercase(d), frozenset('ABCabc')) def test_add_lowercase_tuple(self): """add_lowercase should deal with tuples correctly""" d = {('A','B'):'C', ('D','e'):'F', ('b','c'):'H'} add_lowercase(d) self.assertEqual(d, { ('A','B'):'C', ('a','b'):'c', ('D','e'):'F', ('d','e'):'f', ('b','c'):'H', }) def test_InverseDict(self): """InverseDict should invert dict's keys and values""" self.assertEqual(InverseDict({}), {}) self.assertEqual(InverseDict({'3':4}), {4:'3'}) self.assertEqual(InverseDict({'a':'x','b':1,'c':None,'d':('a','b')}), \ {'x':'a',1:'b',None:'c',('a','b'):'d'}) self.assertRaises(TypeError, InverseDict, {'a':['a','b','c']}) d = InverseDict({'a':3, 'b':3, 'c':3}) self.assertEqual(len(d), 1) assert 3 in d assert d[3] in 'abc' def test_InverseDictMulti(self): """InverseDictMulti should invert keys and values, keeping all keys""" self.assertEqual(InverseDictMulti({}), {}) self.assertEqual(InverseDictMulti({'3':4}), {4:['3']}) self.assertEqual(InverseDictMulti(\ {'a':'x','b':1,'c':None,'d':('a','b')}), \ {'x':['a'],1:['b'],None:['c'],('a','b'):['d']}) self.assertRaises(TypeError, InverseDictMulti, {'a':['a','b','c']}) d = InverseDictMulti({'a':3, 'b':3, 'c':3, 'd':'3', 'e':'3'}) self.assertEqual(len(d), 2) assert 3 in d d3_items = d[3][:] self.assertEqual(len(d3_items), 3) d3_items.sort() self.assertEqual(''.join(d3_items), 'abc') assert '3' in d 
d3_items = d['3'][:] self.assertEqual(len(d3_items), 2) d3_items.sort() self.assertEqual(''.join(d3_items), 'de') def test_DictFromPos(self): """DictFromPos should return correct lists of positions""" d = DictFromPos self.assertEqual(d(''), {}) self.assertEqual(d('a'), {'a':[0]}) self.assertEqual(d(['a','a','a']), {'a':[0,1,2]}) self.assertEqual(d('abacdeeee'), {'a':[0,2],'b':[1],'c':[3],'d':[4], \ 'e':[5,6,7,8]}) self.assertEqual(d(('abc',None, 'xyz', None, 3)), {'abc':[0],None:[1,3], 'xyz':[2], 3:[4]}) def test_DictFromFirst(self): """DictFromFirst should return correct first positions""" d = DictFromFirst self.assertEqual(d(''), {}) self.assertEqual(d('a'), {'a':0}) self.assertEqual(d(['a','a','a']), {'a':0}) self.assertEqual(d('abacdeeee'), {'a':0,'b':1,'c':3,'d':4,'e':5}) self.assertEqual(d(('abc',None, 'xyz', None, 3)), {'abc':0,None:1, 'xyz':2, 3:4}) def test_DictFromLast(self): """DictFromLast should return correct last positions""" d = DictFromLast self.assertEqual(d(''), {}) self.assertEqual(d('a'), {'a':0}) self.assertEqual(d(['a','a','a']), {'a':2}) self.assertEqual(d('abacdeeee'), {'a':2,'b':1,'c':3,'d':4,'e':8}) self.assertEqual(d(('abc',None, 'xyz', None, 3)), {'abc':0,None:3, 'xyz':2, 3:4}) def test_DistanceFromMatrix(self): """DistanceFromMatrix should return correct elements""" m = {'a':{'3':4, 6:1}, 'b':{'3':5,'6':2}} d = DistanceFromMatrix(m) self.assertEqual(d('a','3'), 4) self.assertEqual(d('a',6), 1) self.assertEqual(d('b','3'), 5) self.assertEqual(d('b','6'), 2) self.assertRaises(KeyError, d, 'c', 1) self.assertRaises(KeyError, d, 'b', 3) def test_PairsFromGroups(self): """PairsFromGroups should return dict with correct pairs""" empty = [] self.assertEqual(PairsFromGroups(empty), {}) one = ['abc'] self.assertEqual(PairsFromGroups(one), dict.fromkeys([ \ ('a','a'), ('a','b'), ('a','c'), \ ('b','a'), ('b','b'), ('b','c'), \ ('c','a'), ('c','b'), ('c','c'), \ ])) two = ['xy', 'abc'] self.assertEqual(PairsFromGroups(two), dict.fromkeys([ \ 
('a','a'), ('a','b'), ('a','c'), \ ('b','a'), ('b','b'), ('b','c'), \ ('c','a'), ('c','b'), ('c','c'), \ ('x','x'), ('x','y'), ('y','x'), ('y','y'), \ ])) #if there's overlap, note that the groups should _not_ be expanded #(e.g. in the following case, 'x' is _not_ similar to 'c', even though #both 'x' and 'c' are similar to 'a'. overlap = ['ax', 'abc'] self.assertEqual(PairsFromGroups(overlap), dict.fromkeys([ \ ('a','a'), ('a','b'), ('a','c'), \ ('b','a'), ('b','b'), ('b','c'), \ ('c','a'), ('c','b'), ('c','c'), \ ('x','x'), ('x','a'), ('a','x'), \ ])) def test_remove_files(self): """Remove files functions as expected """ # create list of temp file paths test_filepaths = \ [get_tmp_filename(prefix='remove_files_test') for i in range(5)] # try to remove them with remove_files and verify that an IOError is # raises self.assertRaises(OSError,remove_files,test_filepaths) # now get no error when error_on_missing=False remove_files(test_filepaths,error_on_missing=False) # touch one of the filepaths so it exists open(test_filepaths[2],'w').close() # check that an error is raised on trying to remove the files... self.assertRaises(OSError,remove_files,test_filepaths) # ... but that the existing file was still removed self.assertFalse(exists(test_filepaths[2])) # touch one of the filepaths so it exists open(test_filepaths[2],'w').close() # no error is raised on trying to remove the files # (although 4 don't exist)... remove_files(test_filepaths,error_on_missing=False) # ... 
and the existing file was removed self.assertFalse(exists(test_filepaths[2])) def test_get_random_directory_name(self): """get_random_directory_name functions as expected """ # repeated calls yield different directory names dirs = [] for i in range(100): d = get_random_directory_name(suppress_mkdir=True) self.assertTrue(d not in dirs) dirs.append(d) actual = get_random_directory_name(suppress_mkdir=True) self.assertFalse(exists(actual),'Random dir exists: %s' % actual) self.assertTrue(actual.startswith('/'),\ 'Random dir is not a full path: %s' % actual) # prefix, suffix and output_dir are used as expected actual = get_random_directory_name(suppress_mkdir=True,prefix='blah',\ output_dir='/tmp/',suffix='stuff') self.assertTrue(actual.startswith('/tmp/blah2'),\ 'Random dir does not begin with output_dir + prefix '+\ '+ 2 (where 2 indicates the millenium in the timestamp): %s' % actual) self.assertTrue(actual.endswith('stuff'),\ 'Random dir does not end with suffix: %s' % actual) # changing rand_length functions as expected actual1 = get_random_directory_name(suppress_mkdir=True) actual2 = get_random_directory_name(suppress_mkdir=True,\ rand_length=10) actual3 = get_random_directory_name(suppress_mkdir=True,\ rand_length=0) self.assertTrue(len(actual1) > len(actual2) > len(actual3),\ "rand_length does not affect directory name lengths "+\ "as expected:\n%s\n%s\n%s" % (actual1,actual2,actual3)) # changing the timestamp pattern functions as expected actual1 = get_random_directory_name(suppress_mkdir=True) actual2 = get_random_directory_name(suppress_mkdir=True,\ timestamp_pattern='%Y') self.assertNotEqual(actual1,actual2) self.assertTrue(len(actual1)>len(actual2),\ 'Changing timestamp_pattern does not affect directory name') # empty string as timestamp works actual3 = get_random_directory_name(suppress_mkdir=True,\ timestamp_pattern='') self.assertTrue(len(actual2) > len(actual3)) # creating the directory works as expected actual = 
get_random_directory_name(output_dir='/tmp/',\ prefix='get_random_directory_test') self.assertTrue(exists(actual)) rmdir(actual) def test_independent_spans(self): """get_independent_coords returns truly non-overlapping (decorated) spans""" # single span is returned data = [(0, 20, 'a')] got = get_independent_coords(data) self.assertEqual(got, data) # multiple non-overlapping data = [(20, 30, 'a'), (35, 40, 'b'), (65, 75, 'c')] got = get_independent_coords(data) self.assertEqual(got, data) # over-lapping first/second returns first occurrence by default data = [(20, 30, 'a'), (25, 40, 'b'), (65, 75, 'c')] got = get_independent_coords(data) self.assertEqual(got, [(20, 30, 'a'), (65, 75, 'c')]) # but randomly the first or second if random_tie_breaker is chosen got = get_independent_coords(data, random_tie_breaker=True) self.assertTrue(got in ([(20, 30, 'a'), (65, 75, 'c')], [(25, 40, 'b'), (65, 75, 'c')])) # over-lapping second/last returns first occurrence by default data = [(20, 30, 'a'), (30, 60, 'b'), (50, 75, 'c')] got = get_independent_coords(data) self.assertEqual(got, [(20, 30, 'a'), (30, 60, 'b')]) # but randomly the first or second if random_tie_breaker is chosen got = get_independent_coords(data, random_tie_breaker=True) self.assertTrue(got in ([(20, 30, 'a'), (50, 75, 'c')], [(20, 30, 'a'), (30, 60, 'b')])) # over-lapping middle returns first occurrence by default data = [(20, 24, 'a'), (25, 40, 'b'), (30, 35, 'c'), (65, 75, 'd')] got = get_independent_coords(data) self.assertEqual(got, [(20, 24, 'a'), (25, 40, 'b'), (65, 75, 'd')]) # but randomly the first or second if random_tie_breaker is chosen got = get_independent_coords(data, random_tie_breaker=True) self.assertTrue(got in ([(20, 24, 'a'), (25, 40, 'b'), (65, 75, 'd')], [(20, 24, 'a'), (30, 35, 'c'), (65, 75, 'd')])) def test_get_merged_spans(self): """tests merger of overlapping spans""" sample = [[0, 10], [12, 15], [13, 16], [18, 25], [19, 20]] result = get_merged_overlapping_coords(sample) expect 
= [[0, 10], [12, 16], [18, 25]] self.assertEqual(result, expect) sample = [[0, 10], [5, 9], [12, 16], [18, 20], [19, 25]] result = get_merged_overlapping_coords(sample) expect = [[0, 10], [12, 16], [18, 25]] self.assertEqual(result, expect) def test_get_run_start_indices(self): """return indices corresponding to start of a run of identical values""" # 0 1 2 3 4 5 6 7 data = [1, 2, 3, 3, 3, 4, 4, 5] expect = [[0, 1], [1, 2], [2, 3], [5, 4], [7, 5]] got = get_run_start_indices(data) self.assertEqual(list(got), expect) # raise an exception if try and provide a converter and num digits def wrap_gen(): # need to wrap generator so we can actually test this gen = get_run_start_indices(data, digits=1, converter_func=lambda x: x) def call(): for v in gen: pass return call self.assertRaises(AssertionError, wrap_gen()) def test_merged_by_value_spans(self): """correctly merge adjacent spans with the same value""" # initial values same data = [[20, 21, 0], [21, 22, 0], [22, 23, 1], [23, 24, 0]] self.assertEqual(get_merged_by_value_coords(data), [[20, 22, 0], [22, 23, 1], [23, 24, 0]]) # middle values same data = [[20, 21, 0], [21, 22, 1], [22, 23, 1], [23, 24, 0]] self.assertEqual(get_merged_by_value_coords(data), [[20, 21, 0], [21, 23, 1], [23, 24, 0]]) # last values same data = [[20, 21, 0], [21, 22, 1], [22, 23, 0], [23, 24, 0]] self.assertEqual(get_merged_by_value_coords(data), [[20, 21, 0], [21, 22, 1], [22, 24, 0]]) # all unique values data = [[20, 21, 0], [21, 22, 1], [22, 23, 2], [23, 24, 0]] self.assertEqual(get_merged_by_value_coords(data), [[20, 21, 0], [21, 22, 1], [22, 23, 2], [23, 24, 0]]) # all values same data = [[20, 21, 0], [21, 22, 0], [22, 23, 0], [23, 24, 0]] self.assertEqual(get_merged_by_value_coords(data), [[20, 24, 0]]) # all unique values to 2nd decimal data = [[20, 21, 0.11], [21, 22, 0.12], [22, 23, 0.13], [23, 24, 0.14]] self.assertEqual(get_merged_by_value_coords(data), [[20, 21, 0.11], [21, 22, 0.12], [22, 23, 0.13], [23, 24, 0.14]]) # all values 
same at 1st decimal data = [[20, 21, 0.11], [21, 22, 0.12], [22, 23, 0.13], [23, 24, 0.14]] self.assertEqual(get_merged_by_value_coords(data, digits=1), [[20, 24, 0.1]]) class _my_dict(dict): """Used for testing subclass behavior of ClassChecker""" pass class ClassCheckerTests(TestCase): """Unit tests for the ClassChecker class.""" def setUp(self): """define a few standard checkers""" self.strcheck = ClassChecker(str) self.intcheck = ClassChecker(int, long) self.numcheck = ClassChecker(float, int, long) self.emptycheck = ClassChecker() self.dictcheck = ClassChecker(dict) self.mydictcheck = ClassChecker(_my_dict) def test_init_good(self): """ClassChecker should init OK when initialized with classes""" self.assertEqual(self.strcheck.Classes, [str]) self.assertEqual(self.numcheck.Classes, [float, int, long]) self.assertEqual(self.emptycheck.Classes, []) def test_init_bad(self): """ClassChecker should raise TypeError if initialized with non-class""" self.assertRaises(TypeError, ClassChecker, 'x') self.assertRaises(TypeError, ClassChecker, str, None) def test_contains(self): """ClassChecker should return True only if given instance of class""" self.assertEqual(self.strcheck.__contains__('3'), True) self.assertEqual(self.strcheck.__contains__('ahsdahisad'), True) self.assertEqual(self.strcheck.__contains__(3), False) self.assertEqual(self.strcheck.__contains__({3:'c'}), False) self.assertEqual(self.intcheck.__contains__('ahsdahisad'), False) self.assertEqual(self.intcheck.__contains__('3'), False) self.assertEqual(self.intcheck.__contains__(3.0), False) self.assertEqual(self.intcheck.__contains__(3), True) self.assertEqual(self.intcheck.__contains__(4**60), True) self.assertEqual(self.intcheck.__contains__(4**60 * -1), True) d = _my_dict() self.assertEqual(self.dictcheck.__contains__(d), True) self.assertEqual(self.dictcheck.__contains__({'d':1}), True) self.assertEqual(self.mydictcheck.__contains__(d), True) self.assertEqual(self.mydictcheck.__contains__({'d':1}), 
False) self.assertEqual(self.emptycheck.__contains__('d'), False) self.assertEqual(self.numcheck.__contains__(3), True) self.assertEqual(self.numcheck.__contains__(3.0), True) self.assertEqual(self.numcheck.__contains__(-3), True) self.assertEqual(self.numcheck.__contains__(-3.0), True) self.assertEqual(self.numcheck.__contains__(3e-300), True) self.assertEqual(self.numcheck.__contains__(0), True) self.assertEqual(self.numcheck.__contains__(4**1000), True) self.assertEqual(self.numcheck.__contains__('4**1000'), False) def test_str(self): """ClassChecker str should be the same as str(self.Classes)""" for c in [self.strcheck, self.intcheck, self.numcheck, self.emptycheck, self.dictcheck, self.mydictcheck]: self.assertEqual(str(c), str(c.Classes)) def test_copy(self): """copy.copy should work correctly on ClassChecker""" c = copy(self.strcheck) assert c is not self.strcheck assert '3' in c assert 3 not in c assert c.Classes is self.strcheck.Classes def test_deepcopy(self): """copy.deepcopy should work correctly on ClassChecker""" c = deepcopy(self.strcheck) assert c is not self.strcheck assert '3' in c assert 3 not in c assert c.Classes is not self.strcheck.Classes class modifiable_string(str): """Mutable class to allow arbitrary attributes to be set""" pass class _list_and_string(list, Delegator): """Trivial class to demonstrate Delegator. 
""" def __init__(self, items, string): Delegator.__init__(self, string) self.NormalAttribute = 'default' self._x = None self._constant = 'c' for i in items: self.append(i) def _get_rand_property(self): return self._x def _set_rand_property(self, value): self._x = value prop = property(_get_rand_property, _set_rand_property) def _get_constant_property(self): return self._constant constant = property(_get_constant_property) class DelegatorTests(TestCase): """Verify that Delegator works with attributes and properties.""" def test_init(self): """Delegator should init OK when data supplied""" ls = _list_and_string([1,2,3], 'abc') self.assertRaises(TypeError, _list_and_string, [123]) def test_getattr(self): """Delegator should find attributes in correct places""" ls = _list_and_string([1,2,3], 'abcd') #behavior as list self.assertEqual(len(ls), 3) self.assertEqual(ls[0], 1) ls.reverse() self.assertEqual(ls, [3,2,1]) #behavior as string self.assertEqual(ls.upper(), 'ABCD') self.assertEqual(len(ls.upper()), 4) self.assertEqual(ls.replace('a', 'x'), 'xbcd') #behavior of normal attributes self.assertEqual(ls.NormalAttribute, 'default') #behavior of properties self.assertEqual(ls.prop, None) self.assertEqual(ls.constant, 'c') #shouldn't be allowed to get unknown properties self.assertRaises(AttributeError, getattr, ls, 'xyz') #if the unknown property can be set in the forwarder, do it there flex = modifiable_string('abcd') ls_flex = _list_and_string([1,2,3], flex) ls_flex.blah = 'zxc' self.assertEqual(ls_flex.blah, 'zxc') self.assertEqual(flex.blah, 'zxc') #should get AttributeError if changing a read-only property self.assertRaises(AttributeError, setattr, ls, 'constant', 'xyz') def test_setattr(self): """Delegator should set attributes in correct places""" ls = _list_and_string([1,2,3], 'abcd') #ability to create a new attribute ls.xyz = 3 self.assertEqual(ls.xyz, 3) #modify a normal attribute ls.NormalAttribute = 'changed' self.assertEqual(ls.NormalAttribute, 'changed') 
#modify a read/write property ls.prop = 'xyz' self.assertEqual(ls.prop, 'xyz') def test_copy(self): """copy.copy should work correctly on Delegator""" l = ['a'] d = Delegator(l) c = copy(d) assert c is not d assert c._handler is d._handler def test_deepcopy(self): """copy.deepcopy should work correctly on Delegator""" l = ['a'] d = Delegator(l) c = deepcopy(d) assert c is not d assert c._handler is not d._handler assert c._handler == d._handler class FunctionWrapperTests(TestCase): """Tests of the FunctionWrapper class""" def test_init(self): """FunctionWrapper should initialize with any callable""" f = FunctionWrapper(str) g = FunctionWrapper(id) h = FunctionWrapper(iterable) x = 3 self.assertEqual(f(x), '3') self.assertEqual(g(x), id(x)) self.assertEqual(h(x), [3]) def test_copy(self): """copy should work for FunctionWrapper objects""" f = FunctionWrapper(str) c = copy(f) assert c is not f assert c.Function is f.Function #NOTE: deepcopy does not work for FunctionWrapper objects because you #can't copy a function. 
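FunctionWrapper, as used in the tests above, is essentially a callable holder: it stores the wrapped callable on .Function and applies it when called. A sketch including the copy behaviour the tests check (a shallow copy is a new wrapper sharing the same function object):

```python
# Sketch of FunctionWrapper from FunctionWrapperTests; illustration only,
# not cogent's code.
from copy import copy

class FunctionWrapper(object):
    """Stores a callable as .Function and applies it when called."""

    def __init__(self, function):
        self.Function = function

    def __call__(self, *args, **kwargs):
        return self.Function(*args, **kwargs)


f = FunctionWrapper(str)
c = copy(f)                          # a distinct wrapper object...
shared = c.Function is f.Function    # ...sharing the same callable
```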
class _simple_container(object): """example of a container to constrain""" def __init__(self, data): self._data = list(data) def __getitem__(self, item): return self._data.__getitem__(item) class _constrained_simple_container(_simple_container, ConstrainedContainer): """constrained version of _simple_container""" def __init__(self, data): _simple_container.__init__(self, data) ConstrainedContainer.__init__(self, None) class ConstrainedContainerTests(TestCase): """Tests of the generic ConstrainedContainer interface.""" def setUp(self): """Make a couple of standard containers""" self.alphabet = _constrained_simple_container('abc') self.numbers = _constrained_simple_container([1,2,3]) self.alphacontainer = 'abcdef' self.numbercontainer = ClassChecker(int) def test_matchesConstraint(self): """ConstrainedContainer matchesConstraint should return true if items ok""" self.assertEqual(self.alphabet.matchesConstraint(self.alphacontainer), \ True) self.assertEqual(self.alphabet.matchesConstraint(self.numbercontainer),\ False) self.assertEqual(self.numbers.matchesConstraint(self.alphacontainer), \ False) self.assertEqual(self.numbers.matchesConstraint(self.numbercontainer),\ True) def test_otherIsValid(self): """ConstrainedContainer should use constraint for checking other""" self.assertEqual(self.alphabet.otherIsValid('12d8jc'), True) self.alphabet.Constraint = self.alphacontainer self.assertEqual(self.alphabet.otherIsValid('12d8jc'), False) self.alphabet.Constraint = list('abcdefghijkl12345678') self.assertEqual(self.alphabet.otherIsValid('12d8jc'), True) self.assertEqual(self.alphabet.otherIsValid('z'), False) def test_itemIsValid(self): """ConstrainedContainer should use constraint for checking item""" self.assertEqual(self.alphabet.itemIsValid(3), True) self.alphabet.Constraint = self.alphacontainer self.assertEqual(self.alphabet.itemIsValid(3), False) self.assertEqual(self.alphabet.itemIsValid('a'), True) def test_sequenceIsValid(self): """ConstrainedContainer should 
use constraint for checking sequence""" self.assertEqual(self.alphabet.sequenceIsValid('12d8jc'), True) self.alphabet.Constraint = self.alphacontainer self.assertEqual(self.alphabet.sequenceIsValid('12d8jc'), False) self.alphabet.Constraint = list('abcdefghijkl12345678') self.assertEqual(self.alphabet.sequenceIsValid('12d8jc'), True) self.assertEqual(self.alphabet.sequenceIsValid('z'), False) def test_Constraint(self): """ConstrainedContainer should only allow valid constraints to be set""" try: self.alphabet.Constraint = self.numbers except ConstraintError: pass else: raise AssertionError, \ "Failed to raise ConstraintError with invalid constraint." self.alphabet.Constraint = 'abcdefghi' self.alphabet.Constraint = ['a','b', 'c', 1, 2, 3] self.numbers.Constraint = range(20) self.numbers.Constraint = xrange(20) self.numbers.Constraint = [5,1,3,7,2] self.numbers.Constraint = {1:'a',2:'b',3:'c'} self.assertRaises(ConstraintError, setattr, self.numbers, \ 'Constraint', '1') class ConstrainedStringTests(TestCase): """Tests that ConstrainedString can only contain allowed items.""" def test_init_good_data(self): """ConstrainedString should init OK if string matches constraint""" self.assertEqual(ConstrainedString('abc', 'abcd'), 'abc') self.assertEqual(ConstrainedString('', 'abcd'), '') items = [1,2,3.2234, (['a'], ['b'],), 'xyz'] #should accept anything str() does if no constraint is passed self.assertEqual(ConstrainedString(items), str(items)) self.assertEqual(ConstrainedString(items, None), str(items)) self.assertEqual(ConstrainedString('12345'), str(12345)) self.assertEqual(ConstrainedString(12345, '1234567890'), str(12345)) #check that list is formatted correctly and chars are all there test_list = [1,2,3,4,5] self.assertEqual(ConstrainedString(test_list, '][, 12345'), str(test_list)) def test_init_bad_data(self): """ConstrainedString should fail init if unknown chars in string""" self.assertRaises(ConstraintError, ConstrainedString, 1234, '123') 
self.assertRaises(ConstraintError, ConstrainedString, '1234', '123') self.assertRaises(ConstraintError, ConstrainedString, [1,2,3], '123') def test_add_prevents_bad_data(self): """ConstrainedString should allow addition only of compliant string""" a = ConstrainedString('123', '12345') b = ConstrainedString('444', '4') c = ConstrainedString('45', '12345') d = ConstrainedString('x') self.assertEqual(a + b, '123444') self.assertEqual(a + c, '12345') self.assertRaises(ConstraintError, b.__add__, c) self.assertRaises(ConstraintError, c.__add__, d) #should be OK if constraint removed b.Constraint = None self.assertEqual(b + c, '44445') self.assertEqual(b + d, '444x') #should fail if we add the constraint back b.Constraint = '4x' self.assertEqual(b + d, '444x') self.assertRaises(ConstraintError, b.__add__, c) #check that added strings retain constraint self.assertRaises(ConstraintError, (a+b).__add__, d) def test_mul(self): """ConstrainedString mul and rmul should retain constraint""" a = ConstrainedString('123', '12345') b = 3*a c = b*2 self.assertEqual(b, '123123123') self.assertEqual(c, '123123123123123123') self.assertRaises(ConstraintError, b.__add__, 'x') self.assertRaises(ConstraintError, c.__add__, 'x') def test_getslice(self): """ConstrainedString getslice should remember constraint""" a = ConstrainedString('123333', '12345') b = a[2:4] self.assertEqual(b, '33') self.assertEqual(b.Constraint, '12345') def test_getitem(self): """ConstrainedString getitem should handle slice objects""" a = ConstrainedString('7890543', '1234567890') self.assertEqual(a[0], '7') self.assertEqual(a[1], '8') self.assertEqual(a[-1], '3') self.assertRaises(AttributeError, getattr, a[1], 'Alphabet') self.assertEqual(a[1:6:2], '804') self.assertEqual(a[1:6:2].Constraint, '1234567890') def test_init_masks(self): """ConstrainedString should init OK with masks""" def mask(x): return str(int(x) + 3) a = ConstrainedString('12333', '45678', mask) self.assertEqual(a, '45666') assert 'x' not in a
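A compact sketch of the ConstrainedString behaviour tested above: construction rejects characters outside the constraint, an optional mask transforms each character first, and concatenation re-applies the constraint so derived strings stay constrained. This is an illustration, not cogent's implementation (which also preserves the constraint through slicing and multiplication):

```python
# Hypothetical minimal ConstrainedString; mirrors the semantics the
# ConstrainedStringTests above assert, nothing more.

class ConstraintError(Exception):
    """Raised when data violates a constraint."""

class ConstrainedString(str):
    def __new__(cls, data, constraint=None, mask=None):
        s = str(data)
        if mask is not None:
            # mask transforms each character before validation
            s = ''.join(mask(c) for c in s)
        if constraint is not None:
            for c in s:
                if c not in constraint:
                    raise ConstraintError('%r not in %r' % (c, constraint))
        obj = str.__new__(cls, s)
        obj.Constraint = constraint
        return obj

    def __add__(self, other):
        # the result is checked against (and keeps) this string's constraint
        return ConstrainedString(str(self) + str(other), self.Constraint)
```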
self.assertRaises(TypeError, a.__contains__, 1) class MappedStringTests(TestCase): """MappedString should behave like ConstrainedString, but should map items.""" def test_init_masks(self): """MappedString should init OK with masks""" def mask(x): return str(int(x) + 3) a = MappedString('12333', '45678', mask) self.assertEqual(a, '45666') assert 1 in a assert 'x' not in a class ConstrainedListTests(TestCase): """Tests that bad data can't sneak into ConstrainedLists.""" def test_init_good_data(self): """ConstrainedList should init OK if list matches constraint""" self.assertEqual(ConstrainedList('abc', 'abcd'), list('abc')) self.assertEqual(ConstrainedList('', 'abcd'), list('')) items = [1,2,3.2234, (['a'], ['b'],), list('xyz')] #should accept anything str() does if no constraint is passed self.assertEqual(ConstrainedList(items), items) self.assertEqual(ConstrainedList(items, None), items) self.assertEqual(ConstrainedList('12345'), list('12345')) #check that list is formatted correctly and chars are all there test_list = list('12345') self.assertEqual(ConstrainedList(test_list, '12345'), test_list) def test_init_bad_data(self): """ConstrainedList should fail init with items not in constraint""" self.assertRaises(ConstraintError, ConstrainedList, '1234', '123') self.assertRaises(ConstraintError,ConstrainedList,[1,2,3],['1','2','3']) def test_add_prevents_bad_data(self): """ConstrainedList should allow addition only of compliant data""" a = ConstrainedList('123', '12345') b = ConstrainedList('444', '4') c = ConstrainedList('45', '12345') d = ConstrainedList('x') self.assertEqual(a + b, list('123444')) self.assertEqual(a + c, list('12345')) self.assertRaises(ConstraintError, b.__add__, c) self.assertRaises(ConstraintError, c.__add__, d) #should be OK if constraint removed b.Constraint = None self.assertEqual(b + c, list('44445')) self.assertEqual(b + d, list('444x')) #should fail if we add the constraint back b.Constraint = {'4':1, 5:2} 
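ConstrainedList applies the same membership rule on every mutating operation. A sketch of the append/extend path the tests above rely on; the real class also guards setitem, slices, +=, and *=, and this simplified version is for illustration only:

```python
# Minimal ConstrainedList sketch matching the append/extend assertions
# in ConstrainedListTests; hypothetical reimplementation.

class ConstraintError(Exception):
    """Raised when an item violates the list's constraint."""

class ConstrainedList(list):
    def __init__(self, data=(), constraint=None):
        list.__init__(self)
        self.Constraint = constraint
        self.extend(data)          # initial data is validated too

    def append(self, item):
        if self.Constraint is not None and item not in self.Constraint:
            raise ConstraintError(item)
        list.append(self, item)

    def extend(self, items):
        for item in items:
            self.append(item)      # route everything through the check
```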
self.assertRaises(ConstraintError, b.__add__, c) def test_iadd_prevents_bad_data(self): """ConstrainedList should allow in-place addition only of compliant data""" a = ConstrainedList('12', '123') a += '2' self.assertEqual(a, list('122')) self.assertEqual(a.Constraint, '123') self.assertRaises(ConstraintError, a.__iadd__, '4') def test_imul(self): """ConstrainedList imul should preserve constraint""" a = ConstrainedList('12', '123') a *= 3 self.assertEqual(a, list('121212')) self.assertEqual(a.Constraint, '123') def test_mul(self): """ConstrainedList mul should preserve constraint""" a = ConstrainedList('12', '123') b = a * 3 self.assertEqual(b, list('121212')) self.assertEqual(b.Constraint, '123') def test_rmul(self): """ConstrainedList rmul should preserve constraint""" a = ConstrainedList('12', '123') b = 3 * a self.assertEqual(b, list('121212')) self.assertEqual(b.Constraint, '123') def test_setitem(self): """ConstrainedList setitem should work only if item in constraint""" a = ConstrainedList('12', '123') a[0] = '3' self.assertEqual(a, list('32')) self.assertRaises(ConstraintError, a.__setitem__, 0, 3) a = ConstrainedList('1'*20, '123') self.assertRaises(ConstraintError, a.__setitem__, slice(0,1,1), [3]) self.assertRaises(ConstraintError, a.__setitem__, slice(0,1,1), ['111']) a[2:9:2] = '3333' self.assertEqual(a, list('11313131311111111111')) def test_append(self): """ConstrainedList append should work only if item in constraint""" a = ConstrainedList('12', '123') a.append('3') self.assertEqual(a, list('123')) self.assertRaises(ConstraintError, a.append, 3) def test_extend(self): """ConstrainedList extend should work only if all items in constraint""" a = ConstrainedList('12', '123') a.extend('321') self.assertEqual(a, list('12321')) self.assertRaises(ConstraintError, a.extend, ['1','2', 3]) def test_insert(self): """ConstrainedList insert should work only if item in constraint""" a = ConstrainedList('12', '123') a.insert(0, '2') self.assertEqual(a, 
list('212')) self.assertRaises(ConstraintError, a.insert, 0, [2]) def test_getslice(self): """ConstrainedList getslice should remember constraint""" a = ConstrainedList('123333', '12345') b = a[2:4] self.assertEqual(b, list('33')) self.assertEqual(b.Constraint, '12345') def test_setslice(self): """ConstrainedList setslice should fail if slice has invalid chars""" a = ConstrainedList('123333', '12345') a[2:4] = ['2','2'] self.assertEqual(a, list('122233')) self.assertRaises(ConstraintError, a.__setslice__, 2,4, [2,2]) a[:] = [] self.assertEqual(a, []) self.assertEqual(a.Constraint, '12345') def test_setitem_masks(self): """ConstrainedList setitem with masks should transform input""" a = ConstrainedList('12333', range(5), lambda x: int(x) + 1) self.assertEqual(a, [2,3,4,4,4]) self.assertRaises(ConstraintError, a.append, 4) b = a[1:3] assert b.Mask is a.Mask assert '1' not in a assert '2' not in a assert 2 in a assert 'x' not in a class MappedListTests(TestCase): """MappedList should behave like ConstrainedList, but map items.""" def test_setitem_masks(self): """MappedList setitem with masks should transform input""" a = MappedList('12333', range(5), lambda x: int(x) + 1) self.assertEqual(a, [2,3,4,4,4]) self.assertRaises(ConstraintError, a.append, 4) b = a[1:3] assert b.Mask is a.Mask assert '1' in a assert 'x' not in a class ConstrainedDictTests(TestCase): """Tests that bad data can't sneak into ConstrainedDicts.""" def test_init_good_data(self): """ConstrainedDict should init OK if list matches constraint""" self.assertEqual(ConstrainedDict(dict.fromkeys('abc'), 'abcd'), \ dict.fromkeys('abc')) self.assertEqual(ConstrainedDict('', 'abcd'), dict('')) items = [1,2,3.2234, tuple('xyz')] #should accept anything dict() does if no constraint is passed self.assertEqual(ConstrainedDict(dict.fromkeys(items)), \ dict.fromkeys(items)) self.assertEqual(ConstrainedDict(dict.fromkeys(items), None), \ dict.fromkeys(items)) self.assertEqual(ConstrainedDict([(x,1) for x in 
'12345']), \ dict.fromkeys('12345', 1)) #check that list is formatted correctly and chars are all there test_dict = dict.fromkeys('12345') self.assertEqual(ConstrainedDict(test_dict, '12345'), test_dict) def test_init_sequence(self): """ConstrainedDict should init from sequence, unlike normal dict""" self.assertEqual(ConstrainedDict('abcda'), {'a':2,'b':1,'c':1,'d':1}) def test_init_bad_data(self): """ConstrainedDict should fail init with items not in constraint""" self.assertRaises(ConstraintError, ConstrainedDict, \ dict.fromkeys('1234'), '123') self.assertRaises(ConstraintError,ConstrainedDict, \ dict.fromkeys([1,2,3]),['1','2','3']) def test_setitem(self): """ConstrainedDict setitem should work only if key in constraint""" a = ConstrainedDict(dict.fromkeys('12'), '123') a['1'] = '3' self.assertEqual(a, {'1':'3','2':None}) self.assertRaises(ConstraintError, a.__setitem__, 1, '3') def test_copy(self): """ConstrainedDict copy should retain constraint""" a = ConstrainedDict(dict.fromkeys('12'), '123') b = a.copy() self.assertEqual(a.Constraint, b.Constraint) self.assertRaises(ConstraintError, a.__setitem__, 1, '3') self.assertRaises(ConstraintError, b.__setitem__, 1, '3') def test_fromkeys(self): """ConstrainedDict instance fromkeys should retain constraint""" a = ConstrainedDict(dict.fromkeys('12'), '123') b = a.fromkeys('23') self.assertEqual(a.Constraint, b.Constraint) self.assertRaises(ConstraintError, a.__setitem__, 1, '3') self.assertRaises(ConstraintError, b.__setitem__, 1, '3') b['2'] = 5 self.assertEqual(b, {'2':5, '3':None}) def test_setdefault(self): """ConstrainedDict setdefault shouldn't allow bad keys""" a = ConstrainedDict({'1':None, '2': 'xyz'}, '123') self.assertEqual(a.setdefault('2', None), 'xyz') self.assertEqual(a.setdefault('1', None), None) self.assertRaises(ConstraintError, a.setdefault, 'x', 3) a.setdefault('3', 12345) self.assertEqual(a, {'1':None, '2':'xyz', '3': 12345}) def test_update(self): """ConstrainedDict should allow update only 
of compliant data""" a = ConstrainedDict(dict.fromkeys('123'), '12345') b = ConstrainedDict(dict.fromkeys('444'), '4') c = ConstrainedDict(dict.fromkeys('45'), '12345') d = ConstrainedDict([['x','y']]) a.update(b) self.assertEqual(a, dict.fromkeys('1234')) a.update(c) self.assertEqual(a, dict.fromkeys('12345')) self.assertRaises(ConstraintError, b.update, c) self.assertRaises(ConstraintError, c.update, d) #should be OK if constraint removed b.Constraint = None b.update(c) self.assertEqual(b, dict.fromkeys('45')) b.update(d) self.assertEqual(b, {'4':None, '5':None, 'x':'y'}) #should fail if we add the constraint back b.Constraint = {'4':1, 5:2, '5':1, 'x':1} self.assertRaises(ConstraintError, b.update, {4:1}) b.update({5:1}) self.assertEqual(b, {'4':None, '5':None, 'x':'y', 5:1}) def test_setitem_masks(self): """ConstrainedDict setitem should work only if key in constraint""" key_mask = str val_mask = lambda x: int(x) + 3 d = ConstrainedDict({1:4, 2:6}, '123', key_mask, val_mask) d[1] = '456' self.assertEqual(d, {'1':459,'2':9,}) d['1'] = 234 self.assertEqual(d, {'1':237,'2':9,}) self.assertRaises(ConstraintError, d.__setitem__, 4, '3') e = d.copy() assert e.Mask is d.Mask assert '1' in d assert not 1 in d class MappedDictTests(TestCase): """MappedDict should work like ConstrainedDict, but map keys.""" def test_setitem_masks(self): """MappedDict setitem should work only if key in constraint""" key_mask = str val_mask = lambda x: int(x) + 3 d = MappedDict({1:4, 2:6}, '123', key_mask, val_mask) d[1] = '456' self.assertEqual(d, {'1':459,'2':9,}) d['1'] = 234 self.assertEqual(d, {'1':237,'2':9,}) self.assertRaises(ConstraintError, d.__setitem__, 4, '3') e = d.copy() assert e.Mask is d.Mask assert '1' in d assert 1 in d assert 1 not in d.keys() assert 'x' not in d.keys() def test_getitem(self): """MappedDict getitem should automatically map key.""" key_mask = str d = MappedDict({}, '123', key_mask) self.assertEqual(d, {}) d['1'] = 5 self.assertEqual(d, {'1':5}) 
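The MappedDict behaviour exercised above, sketched standalone: keys pass through a key mask and values through a value mask before storage, lookups mask the key too, and masked keys must be built from characters in the constraint. Hypothetical reimplementation for illustration, not cogent's code:

```python
# Sketch of MappedDict key/value masking from MappedDictTests.

class ConstraintError(Exception):
    """Raised when a masked key violates the constraint."""

class MappedDict(dict):
    def __init__(self, data=None, constraint=None,
                 key_mask=None, value_mask=None):
        dict.__init__(self)
        self.Constraint = constraint
        self.KeyMask = key_mask
        self.ValueMask = value_mask
        for k, v in (data or {}).items():
            self[k] = v            # populate through __setitem__

    def __setitem__(self, key, value):
        if self.KeyMask is not None:
            key = self.KeyMask(key)
        if self.ValueMask is not None:
            value = self.ValueMask(value)
        # assumes the masked key is iterable (a string, as in the tests)
        if self.Constraint is not None and \
                any(c not in self.Constraint for c in key):
            raise ConstraintError(key)
        dict.__setitem__(self, key, value)

    def __getitem__(self, key):
        if self.KeyMask is not None:
            key = self.KeyMask(key)    # lookups map the key as well
        return dict.__getitem__(self, key)
```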
self.assertEqual(d[1], 5) def test_get(self): """MappedDict get should automatically map key.""" key_mask = str d = MappedDict({}, '123', key_mask) self.assertEqual(d, {}) d['1'] = 5 self.assertEqual(d, {'1':5}) self.assertEqual(d.get(1, 'x'), 5) self.assertEqual(d.get(5, 'x'), 'x') def test_has_key(self): """MappedDict has_key should automatically map key.""" key_mask = str d = MappedDict({}, '123', key_mask) self.assertEqual(d, {}) d['1'] = 5 assert d.has_key('1') assert d.has_key(1) assert not d.has_key('5') class generateCombinationsTests(TestCase): """Tests for public generateCombinations function.""" def test_generateCombinations(self): """function should return all combinations of given length""" #test all 3-position combinations of a 2-digit alphabet, since I can #work that one out by hand ... correct_result = [ "AAA", "AAB", "ABA", "ABB", \ "BBB", "BBA", "BAB", "BAA"] real_result = generateCombinations("AB", 3) correct_result.sort() real_result.sort() self.assertEquals(str(real_result), str(correct_result)) #end test_generateCombinations def test_generateCombinations_singleAlphabet(self): """function should return correct value when alphabet is one char""" real_result = generateCombinations("A", 4) self.assertEquals(str(real_result), str(["AAAA"])) #end test_generateCombinations_singleAlphabet def test_generateCombinations_singleLength(self): """function should return correct values if length is 1""" real_result = generateCombinations("ABC", 1) self.assertEquals(str(real_result), str(["A", "B", "C"])) #end test_generateCombinations_singleLength def test_generateCombinations_emptyAlphabet(self): """function should return empty list if alphabet arg is [], "" """ real_result = generateCombinations("", 4) self.assertEquals(str(real_result), str([])) real_result = generateCombinations([], 4) self.assertEquals(str(real_result), str([])) #end test_generateCombinations_emptyAlphabet def test_generateCombinations_zeroLength(self): """function should return empty 
list if length arg is 0 """ real_result = generateCombinations("ABC", 0) self.assertEquals(str(real_result), str([])) #end test_generateCombinations_zeroLength def test_generateCombinations_badArgs(self): """function should error if args are not castable to right type.""" self.assertRaises(RuntimeError, generateCombinations, 12, 4) self.assertRaises(RuntimeError, generateCombinations, [], None) #end test_generateCombinations_badArgs #end generateCombinationsTests class makeNonnegIntTests(TestCase): """Tests of the public makeNonnegInt function""" def test_makeNonnegInt_unchanged(self): """Should return an input nonneg int unchanged""" self.assertEquals(makeNonnegInt(3), 3) #end test_makeNonnegInt_unchanged def test_makeNonnegInt_castable(self): """Should return nonneg int version of a castable input""" self.assertEquals(makeNonnegInt(-4.2), 4) #end test_makeNonnegInt_castable def test_makeNonnegInt_noncastable(self): """Should raise a special NonnegIntError if input isn't castable""" self.assertRaises(NonnegIntError, makeNonnegInt, "blue") #end test_makeNonnegInt_noncastable #end makeNonnegIntTests class reverse_complementTests(TestCase): """Tests of the public reverse_complement function""" def test_reverse_complement_DNA(self): """reverse_complement should correctly return reverse complement of DNA""" #input and correct output taken from example at #http://bioweb.uwlax.edu/GenWeb/Molecular/Seq_Anal/ #Reverse_Comp/reverse_comp.html user_input = "ATGCAGGGGAAACATGATTCAGGAC" correct_output = "GTCCTGAATCATGTTTCCCCTGCAT" real_output = reverse_complement(user_input) self.assertEquals(real_output, correct_output) # revComp is a pointer to reverse_complement (for backward # compatibility) real_output = revComp(user_input) self.assertEquals(real_output, correct_output) #end test_reverse_complement_DNA def test_reverse_complement_RNA(self): """reverse_complement should correctly return reverse complement of RNA""" #input and correct output taken from 
test_reverse_complement_DNA test, #with all Ts changed to Us user_input = "AUGCAGGGGAAACAUGAUUCAGGAC" correct_output = "GUCCUGAAUCAUGUUUCCCCUGCAU" #remember to use False toggle to get RNA instead of DNA real_output = reverse_complement(user_input, False) self.assertEquals(real_output, correct_output) #end test_reverse_complement_RNA def test_reverse_complement_caseSensitive(self): """reverse_complement should convert bases without changing case""" user_input = "aCGtAcgT" correct_output = "AcgTaCGt" real_output = reverse_complement(user_input) self.assertEquals(real_output, correct_output) #end test_reverse_complement_caseSensitive def test_reverse_complement_nonNucleicSeq(self): """reverse_complement should raise ValueError on chars other than ACGT/U""" user_input = "BDeF" self.assertRaises(ValueError,reverse_complement,user_input) #end test_reverse_complement_nonNucleicSeq def test_reverse_complement_emptySeq(self): """reverse_complement should return empty string if given empty sequence""" #shouldn't matter whether in DNA or RNA mode real_output = reverse_complement("") self.assertEquals(real_output, "") #end test_reverse_complement_emptySeq def test_reverse_complement_noSeq(self): """reverse_complement should return error if given no sequence argument""" self.assertRaises(TypeError, reverse_complement) #end test_reverse_complement_noSeq #end reverse_complementTests def test_not_none(self): """not_none should return True if none of the items is None""" assert not_none([1,2,3,4]) assert not not_none([1,2,3,None]) self.assertEqual(filter(not_none,[(1,2),(3,None)]),[(1,2)]) #end test_not_none def test_get_items_except(self): """get_items_except should return all items of seq not in indices""" self.assertEqual(get_items_except('a-b-c-d',[1,3,5]),'abcd') self.assertEqual(get_items_except([0,1,2,3,4,5,6],[1,3,5]),[0,2,4,6]) self.assertEqual(get_items_except((0,1,2,3,4,5,6),[1,3,5]),(0,2,4,6)) self.assertEqual(get_items_except('a-b-c-d',[1,3,5],tuple), ('a','b','c','d')) #end
test_get_items_except def test_NestedSplitter(self): """NestedSplitter should make a function which returns the expected list""" #test delimiters, constructor, filter_ line='ii=0; oo= 9, 6 5; ; xx= 8; ' cmds = [ "NestedSplitter(';=,')(line)", "NestedSplitter([';', '=', ','])(line)", "NestedSplitter([(';'), '=', ','], constructor=None)(line)", "NestedSplitter([(';'), '=', ','], filter_=None)(line)", "NestedSplitter([(';',1), '=', ','])(line)", "NestedSplitter([(';',-1), '=', ','])(line)" ] results=[ [['ii', '0'], ['oo', ['9', '6 5']], '', ['xx', '8'], ''], [['ii', '0'], ['oo', ['9', '6 5']], '', ['xx', '8'], ''], [['ii', '0'], [' oo', [' 9', ' 6 5']], ' ', [' xx', ' 8'], ' '], [['ii', '0'], ['oo', ['9', '6 5']], ['xx', '8']], [['ii', '0'], ['oo', ['9', '6 5; ; xx'], '8;']], [['ii', '0; oo', ['9', '6 5; ; xx'], '8'], ''] ] for cmd, result in zip(cmds, results): self.assertEqual(eval(cmd), result) #test discontinuous levels of delimiters test = 'a; b,c; d,e:f; g:h;' #g:h should get [[g,h]] instead of [g,h] self.assertEqual(NestedSplitter(';,:')(test), ['a', ['b', 'c'], ['d', ['e', 'f']], [['g', 'h']], '']) #test empty self.assertEqual(NestedSplitter(';,:')(''), ['']) self.assertEqual(NestedSplitter(';,:')(' '), ['']) self.assertEqual(NestedSplitter(';,:', filter_=None)(' ;, :'), [[[]]]) def test_curry(self): """curry should generate the function with parameters set""" curry_test = curry(cmp, 5) knowns = ((3, 1), (9, -1), (5, 0)) for arg2, result in knowns: self.assertEqual(curry_test(arg2), result) def test_app_path(self): """app_path should return correct paths""" self.assertEqual(app_path('ls'), '/bin/ls') self.assertEqual(app_path('lsxxyyx'), False) class CommandLineParserTests(TestCase): def test_parse_command_line_parameters(self): """parse_command_line_parameters returns without error There is not a lot of detailed testing that can be done here, so the basic functionality is tested.
""" option_parser, opts, args = parse_command_line_parameters( script_description="My script", script_usage=[('Print help','%prog -h','')], version='1.0',help_on_no_arguments=False, command_line_args=[]) self.assertEqual(len(args),0) d = {'script_description':"My script",\ 'script_usage':[('Print help','%prog -h','')],\ 'version':'1.0', 'help_on_no_arguments':False, 'command_line_args':[]} option_parser, opts, args = parse_command_line_parameters(**d) self.assertEqual(len(args),0) # allowing positional arguments functions as expected as does # passing a positional argument d = {'script_description':"My script",\ 'script_usage':[('Print help','%prog -h','')],\ 'version':'1.0', 'help_on_no_arguments':False, 'command_line_args':['hello'], 'disallow_positional_arguments':False} option_parser, opts, args = parse_command_line_parameters(**d) self.assertEqual(len(args),1) #run tests on command-line invocation if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_util/test_organizer.py000644 000765 000024 00000017071 12024702176 023036 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests Filter, Organizer and filterfunctions, classes for filtering """ from cogent.util.organizer import Filter, Organizer, GroupList, regroup from cogent.util.transform import find_any, find_no, find_all,\ keep_if_more, exclude_if_more, keep_if_more_other, exclude_if_more_other from cogent.util.unit_test import TestCase,main __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class Sequence(object): """Simple sequence class for tests.""" def __init__(self, s, info): self.s = s self.__dict__.update(info) if not hasattr(self, 'Gene'): self.Gene = None def __contains__(self, s): return s in self.s def __repr__(self): return `self.s` def __iter__(self): return 
iter(self.s) def __nonzero__(self): return bool(self.s) def lower(self): return self.s.lower() def __cmp__(self, other): return cmp(self.s,other.s) class FilterTests(TestCase): """Tests of Filter class""" def test_init(self): """Filter should init as expected""" empty_filter = Filter('',{}) named_empty_filter = Filter('Archaea',{}) self.assertEqual(empty_filter,{}) self.assertEqual(empty_filter.Name,'') self.assertEqual(named_empty_filter,{}) self.assertEqual(named_empty_filter.Name,'Archaea') f = find_all('abcd') g = keep_if_more_other('ab',7) fil = Filter('Archaea',{'Arch':[f,g]}) assert fil['Arch'][0] is f assert fil['Arch'][1] is g def test_call_empty(self): """Empty Filter should return True when called on anything""" f = Filter('',{}) data = ['aa','bb','cc'] self.assertEqual(f(data),True) def test_call_full(self): """Filter should return True if the object satisfies all criteria""" seq1 = Sequence('ACGU',{'Gene':'LSU'}) seq2 = Sequence('ACGUACGU',{'Gene':'SSU'}) seq3 = Sequence('ACGUN',{'Gene':'LSU'}) seq4 = Sequence('ACG',{'Gene':'LSU'}) seq5 = Sequence('ACGU',{}) seq6 = Sequence('',{}) f = Filter('valid',{None:[find_all('AGCU'),find_no('N')],\ 'Gene':[find_any(['LSU'])]}) self.assertEqual(f(seq1),True) self.assertEqual(f(seq2),False) self.assertEqual(f(seq3),False) self.assertEqual(f(seq4),False) self.assertEqual(f(seq5),False) self.assertEqual(f(seq6),False) class GroupListTests(TestCase): """Tests of GroupList class""" def test_init_empty(self): """Empty GroupList should init OK""" g = GroupList([]) self.assertEqual(len(g),0) self.assertEqual(g.Groups,[]) def test_init_full(self): """GroupList should init OK with data and groups""" data = ['a','b','c'] groups = [1,2,3] g = GroupList(data,groups) self.assertEqual(g,data) self.assertEqual(g.Groups,groups) self.assertEqual(len(g),3) class OrganizerTests(TestCase): """Tests of Classifier class""" def setUp(self): """Define some standard Organizers for testing""" self.Empty = Organizer([]) self.a = 
Filter('a',{None:[find_any('a')]}) self.b = Filter('b',{None:[find_any('b')]}) self.Ab_org = Organizer([self.a,self.b]) lsu = Filter('LSU',{None:[exclude_if_more('N',5)],\ 'Gene':[find_any(['LSU'])]}) ssu = Filter('SSU',{None:[exclude_if_more('N',5)],\ 'Gene':[find_any(['SSU'])]}) self.Gene_org = Organizer([lsu,ssu]) self.Ab_seq = ['aa','bb','abab','cc',''] self.seq1 = Sequence('ACGU',{'Gene':'LSU'}) self.seq2 = Sequence('ACGUACGU',{'Gene':'SSU'}) self.seq3 = Sequence('ACGUNNNNNN',{'Gene':'LSU'}) self.seq4 = Sequence('ACGUNNNNNN',{'Gene':'SSU'}) self.seq5 = Sequence('ACGU',{}) self.seq6 = Sequence('',{}) self.seq7 = Sequence('ACGU',{'Gene':'unit'}) self.Gene_seq = [self.seq1,self.seq2,self.seq3,self.seq4,\ self.seq5,self.seq6,self.seq7] f = Filter('valid',{None:[find_all('AGCU'),find_no('N')],\ 'Gene':[find_any(['LSU'])]}) self.Mult_func_org = Organizer([f]) def test_init_empty(self): """Empty Organizer should init correctly""" org = self.Empty self.assertEqual(len(org),0) def test_init_full(self): """Organizer should init correctly with multiple functions""" org = Organizer([self.a,self.b]) self.assertEqual(org[0],self.a) self.assertEqual(org[1],self.b) self.assertEqual(len(org),2) def test_empty_org_empty_list(self): """Empty Organizer should return [] when applied to []""" org = self.Empty l = [] self.assertEqual(org(l),[]) def test_empty_org_full_list(self): """Empty organizer, applied to full list, should return the original""" org = self.Empty l = self.Ab_seq obs = org(l) self.assertEqual(obs,[l]) self.assertEqual(obs[0].Groups,[None]) def test_full_org_empty_list(self): """Organizer should return [] when applied to []""" org = self.Ab_org l = [] obs = org(l) self.assertEqual(obs,[]) def test_full_org_full_list(self): """Organizer should return correct organization""" org = self.Ab_org l = self.Ab_seq obs = org(l) obs.sort() exp = [['aa','abab'],['bb'],['cc','']] self.assertEqual(obs,exp) self.assertEqual(obs[0].Groups,['a']) 
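The Filter/Organizer idea these tests exercise can be sketched standalone: a Filter maps attribute names to lists of predicates (None meaning "the object itself"), an empty Filter accepts everything, and the organizing step bins each item under the first Filter it satisfies, with non-matches grouped under None. The organize() helper below is a simplified stand-in for cogent's Organizer class (which returns GroupList objects), shown for illustration only:

```python
# Hypothetical sketch of cogent.util.organizer semantics, matching the
# partitioning asserted in test_full_org_full_list above.

class Filter(dict):
    """Maps attribute names to predicate lists; callable on an object."""

    def __init__(self, name, criteria):
        dict.__init__(self, criteria)
        self.Name = name

    def __call__(self, obj):
        for attr, checks in self.items():
            # None means "apply the predicates to the object itself"
            target = obj if attr is None else getattr(obj, attr, None)
            if not all(check(target) for check in checks):
                return False
        return True  # an empty Filter accepts everything


def organize(filters, items):
    """Bin items by the first Filter they satisfy; leftovers go to None."""
    bins = [(f.Name, []) for f in filters] + [(None, [])]
    for item in items:
        for i, f in enumerate(filters):
            if f(item):
                bins[i][1].append(item)
                break
        else:
            bins[-1][1].append(item)
    return [b for b in bins if b[1]]
```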
        self.assertEqual(obs[1].Groups, ['b'])
        self.assertEqual(obs[2].Groups, [None])

    def test_double_org_empty_list(self):
        """Organizer should return [] when applied to []"""
        org = self.Gene_org
        l = []
        obs = org(l)
        self.assertEqual(obs, [])

    def test_double_org_full_list(self):
        """Organizer should handle multiple filters correctly"""
        org = self.Gene_org
        l = self.Gene_seq
        obs = org(l)
        obs.sort()
        exp = [[self.seq1], [self.seq2], [self.seq3, self.seq4,
               self.seq5, self.seq6, self.seq7]]
        self.assertEqual(obs, exp)
        self.assertEqual(obs[0].Groups, ['LSU'])
        self.assertEqual(obs[1].Groups, ['SSU'])
        self.assertEqual(obs[2].Groups, [None])

    def test_multiple_func(self):
        """Organizer should handle filter with multiple functions correctly"""
        org = self.Mult_func_org
        l = self.Gene_seq
        obs = org(l)
        obs.sort()
        exp = [[self.seq1], [self.seq2, self.seq3, self.seq4, self.seq5,
               self.seq6, self.seq7]]
        self.assertEqual(obs, exp)
        self.assertEqual(obs[0].Groups, ['valid'])
        self.assertEqual(obs[1].Groups, [None])


class organizerTests(TestCase):
    """Tests for module-level functions"""

    def test_regroup(self):
        """regroup: should group lists with identical hierarchy-info together"""
        g1 = GroupList([1, 2, 3], ['a'])
        g2 = GroupList([4, 5, 6], ['b'])
        g3 = GroupList([7, 7, 7], ['a', 'b'])
        g4 = GroupList([8, 8, 8], ['a'])
        all = [g1, g2, g3, g4]
        self.assertEqualItems(regroup(all),
                              [[1, 2, 3, 8, 8, 8], [7, 7, 7], [4, 5, 6]])


if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_util/test_recode_alignment.py

#!/usr/bin/env python
# Author: Greg Caporaso (gregcaporaso@gmail.com)
# test_recode_alignment.py
"""Description

File created on 19 Jun 2007.
""" from __future__ import division from numpy import array from cogent import LoadSeqs from cogent.util.unit_test import TestCase, main from cogent.core.alignment import DenseAlignment from cogent.evolve.models import DSO78_matrix, DSO78_freqs from cogent.evolve.substitution_model import SubstitutionModel from cogent.core.alphabet import Alphabet from cogent.app.gctmpca import gctmpca_aa_order,\ default_gctmpca_aa_sub_matrix from cogent.util.recode_alignment import alphabets, recode_dense_alignment,\ build_alphabet_map, recode_freq_vector, recode_alignment,\ recode_counts_and_freqs, recode_count_matrix __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Beta" class RecodeAlignmentTests(TestCase): """ Tests of functions in recode_alphabet.py These functions will probably move at some point, and the unit tests will move with them. 
""" def setUp(self): """ Initialize some variables for the tests """ self.canonical_abbrevs = 'ACDEFGHIKLMNPQRSTVWY' self.ambiguous_abbrevs = 'BXZ' self.all_to_a = [('A',self.canonical_abbrevs+\ self.ambiguous_abbrevs)] self.charge_2 = alphabets['charge_2'] self.hydropathy_3 = alphabets['hydropathy_3'] self.orig = alphabets['orig'] self.aln = DenseAlignment(\ data={'1':'CDDFBXZ', '2':'CDD-BXZ', '3':'AAAASS-'}) self.aln2 = LoadSeqs(\ data={'1':'CDDFBXZ', '2':'CDD-BXZ', '3':'AAAASS-'}) def test_build_alphabet_map_handles_bad_data(self): """build_alphabet_map: bad data raises error """ self.assertRaises(ValueError,build_alphabet_map) self.assertRaises(ValueError,build_alphabet_map,'not_a_valid_id') self.assertRaises(ValueError,build_alphabet_map,\ alphabet_def=['A','BCD','B','EFG']) def test_build_alphabet_map_w_alphabet_id(self): """build_alphabet_map: returns correct dict when given alphabet_id """ expected = dict([\ ('G','G'), ('A','G'), ('V','G'), ('L','G'), ('I','G'),\ ('S','G'), ('P','G'), ('T','G'), ('C','G'), ('N','G'), ('D','G'),\ ('X','G'), ('B','G'), ('M','M'), ('F','M'), ('Y','M'), ('W','M'),\ ('Q','M'), ('K','M'), ('H','M'), ('R','M'), ('E','M'), ('Z','M')]) self.assertEqual(build_alphabet_map('size_2'),expected) self.assertEqual(build_alphabet_map('charge_3')['E'],'D') self.assertEqual(build_alphabet_map('charge_3')['B'],'A') self.assertEqual(build_alphabet_map('charge_3')['K'],'K') def test_build_alphabet_map_w_alphabet_def(self): """build_alphabet_map: returns correct dict when given alphabet_def """ expected = dict([\ ('G','S'), ('A','S'), ('V','S'), ('L','S'), ('I','S'),\ ('S','S'), ('P','S'), ('T','S'), ('C','S'), ('N','S'), ('D','S'),\ ('X','S'), ('B','S'), ('M','L'), ('F','L'), ('Y','L'), ('W','L'),\ ('Q','L'), ('K','L'), ('H','L'), ('R','L'), ('E','L'), ('Z','L')]) self.assertEqual(build_alphabet_map(alphabet_def=\ [('S','GAVLISPTCNDXB'),('L','MFYWQKHREZ')]),expected) def test_build_alphabet_map_handles_all_ids_and_defs_wo_error(self): 
"""build_alphabet_map: handles all pre-defined alphabets w/o error""" for alphabet_id, alphabet_def in alphabets.items(): try: build_alphabet_map(alphabet_id=alphabet_id) except ValueError: raise AssertionError, "Failed on id: %s" % alphabet_id try: build_alphabet_map(alphabet_def=alphabet_def) except ValueError: raise AssertionError, "Failed on def: %s" % str(alphabet_def) def test_recode_dense_alignment_handles_all_ids_and_defs_wo_error(self): """recode_dense_alignment: handles pre-defined alphabets w/o error""" for alphabet_id, alphabet_def in alphabets.items(): try: recode_dense_alignment(self.aln,alphabet_id=alphabet_id) except ValueError: raise AssertionError, "Failed on id: %s" % alphabet_id try: recode_dense_alignment(self.aln,alphabet_def=alphabet_def) except ValueError: raise AssertionError, "Failed on def: %s" % str(alphabet_def) def test_recode_dense_alignment_leaves_original_alignment_intact(self): """recode_dense_alignment: leaves input alignment intact """ # provided with alphabet_id actual = recode_dense_alignment(self.aln, alphabet_id='charge_2') self.assertNotEqual(actual,self.aln) # provided with alphabet_def actual = recode_dense_alignment(self.aln, alphabet_def=self.charge_2) self.assertNotEqual(actual,self.aln) def test_recode_dense_alignment(self): """recode_dense_alignment: recode alignment to charge_2 alpha works """ expected_c2 = DenseAlignment(data=\ {'1':'AKKAKAK','2':'AKK-KAK','3':'AAAAAA-'}) expected_h3 = DenseAlignment(data=\ {'1':'PRRPRPR','2':'PRR-RPR','3':'PPPPYY-'}) expected_aa = DenseAlignment(data=\ {'1':'AAAAAAA','2':'AAA-AAA','3':'AAAAAA-'}) # provided with alphabet_id actual = recode_dense_alignment(self.aln, alphabet_id='charge_2') self.assertEqual(actual,expected_c2) # provided with alphabet_def actual = recode_dense_alignment(self.aln, alphabet_def=self.charge_2) self.assertEqual(actual,expected_c2) # different alphabet actual = recode_dense_alignment(self.aln, alphabet_id='hydropathy_3') 
        self.assertEqual(actual, expected_h3)
        actual = recode_dense_alignment(self.aln,
                                        alphabet_def=self.hydropathy_3)
        self.assertEqual(actual, expected_h3)
        # different alphabet
        actual = recode_dense_alignment(self.aln, alphabet_def=self.all_to_a)
        self.assertEqual(actual, expected_aa)
        # original characters which aren't remapped are left in original state
        actual = recode_dense_alignment(self.aln, alphabet_def=[('a', 'b')])
        self.assertEqual(actual, self.aln)
        # non-alphabetic character mapped same as alphabetic characters
        actual = recode_dense_alignment(self.aln, alphabet_def=[('.', '-')])
        expected = DenseAlignment(
            data={'1': 'CDDFBXZ', '2': 'CDD.BXZ', '3': 'AAAASS.'})
        self.assertEqual(actual, expected)

    def test_recode_dense_alignment_to_orig(self):
        """recode_dense_alignment: recode aln to orig returns original aln"""
        # provided with alphabet_id
        self.assertEqual(recode_dense_alignment(
            self.aln, alphabet_id='orig'), self.aln)
        # provided with alphabet_def
        self.assertEqual(recode_dense_alignment(
            self.aln, alphabet_def=self.orig), self.aln)

    # THE FUNCTION THAT THESE TESTS APPLY TO ONLY EXISTS AS A STUB RIGHT
    # NOW -- WILL UNCOMMENT THE TESTS WHEN THE FUNCTION IS READY.
    # --GREG C.
    # (11/19/08)
    # def test_recode_alignment(self):
    #     """recode_alignment: recode alignment works as expected"""
    #     expected_c2 = LoadSeqs(data=\
    #         {'1':'AKKAKAK','2':'AKK-KAK','3':'AAAAAA-'})
    #     expected_h3 = LoadSeqs(data=\
    #         {'1':'PRRPRPR','2':'PRR-RPR','3':'PPPPYY-'})
    #     expected_aa = LoadSeqs(data=\
    #         {'1':'AAAAAAA','2':'AAA-AAA','3':'AAAAAA-'})
    #
    #     # provided with alphabet_id
    #     actual = recode_alignment(self.aln2, alphabet_id='charge_2')
    #     self.assertEqual(actual,expected_c2)
    #     # provided with alphabet_def
    #     actual = recode_alignment(self.aln2, alphabet_def=self.charge_2)
    #     self.assertEqual(actual,expected_c2)
    #
    #     # different alphabet
    #     actual = recode_alignment(self.aln2, alphabet_id='hydropathy_3')
    #     self.assertEqual(actual,expected_h3)
    #     actual = recode_alignment(self.aln2,\
    #         alphabet_def=self.hydropathy_3)
    #     self.assertEqual(actual,expected_h3)
    #
    #     # different alphabet
    #     actual = recode_alignment(self.aln2, alphabet_def=self.all_to_a)
    #     self.assertEqual(actual,expected_aa)
    #
    #     # original characters which aren't remapped are left in original state
    #     actual = recode_alignment(self.aln2, alphabet_def=[('a','b')])
    #     self.assertEqual(actual,self.aln2)
    #
    #     # non-alphabetic character mapped same as alphabetic characters
    #     actual = recode_alignment(self.aln2, alphabet_def=[('.','-')])
    #     expected = LoadSeqs(\
    #         data={'1':'CDDFBXZ', '2':'CDD.BXZ', '3':'AAAASS.'})
    #     self.assertEqual(actual,expected)
    #
    # def test_recode_alignment_to_orig(self):
    #     """recode_alignment: recode aln to orig returns original aln"""
    #     # provided with alphabet_id
    #     self.assertEqual(recode_alignment(\
    #         self.aln2, alphabet_id='orig'), self.aln2)
    #     # provided with alphabet_def
    #     self.assertEqual(recode_alignment(\
    #         self.aln2, alphabet_def=self.orig), self.aln2)
    #
    # def test_recode_alignment_leaves_original_alignment_intact(self):
    #     """recode_alignment: leaves input alignment intact"""
    #     # provided with alphabet_id
    #     actual = recode_alignment(self.aln2, alphabet_id='charge_2')
    #     self.assertNotEqual(actual,self.aln2)
    #     # provided with alphabet_def
    #     actual = recode_alignment(self.aln2, alphabet_def=self.charge_2)
    #     self.assertNotEqual(actual,self.aln2)

    def test_recode_freq_vector(self):
        """recode_freq_vector: bg freqs updated to reflect recoded alphabet"""
        freqs = {'A': 0.21, 'E': 0.29, 'C': 0.05, 'D': 0.45}
        a_def = [('A', 'AEC'), ('E', 'D')]
        expected = {'A': 0.55, 'E': 0.45}
        self.assertFloatEqual(recode_freq_vector(a_def, freqs), expected)
        # reversal of alphabet
        freqs = {'A': 0.21, 'E': 0.29, 'C': 0.05, 'D': 0.45}
        a_def = [('A', 'D'), ('E', 'C'), ('C', 'E'), ('D', 'A')]
        expected = {'A': 0.45, 'E': 0.05, 'C': 0.29, 'D': 0.21}
        self.assertFloatEqual(recode_freq_vector(a_def, freqs), expected)
        # no change in freqs (old alphabet = new alphabet)
        freqs = {'A': 0.21, 'E': 0.29, 'C': 0.05, 'D': 0.45}
        a_def = [('A', 'A'), ('E', 'E'), ('C', 'C'), ('D', 'D')]
        self.assertFloatEqual(recode_freq_vector(a_def, freqs), freqs)
        freqs = {'A': 0.21, 'E': 0.29, 'C': 0.05, 'D': 0.45}
        a_def = [('X', 'AEC'), ('Y', 'D')]
        expected = {'X': 0.55, 'Y': 0.45}
        self.assertFloatEqual(recode_freq_vector(a_def, freqs), expected)

    def test_recode_freq_vector_ignores(self):
        """recode_freq_vector: ignored chars are ignored"""
        freqs = {'A': 0.21, 'B': 0.29, 'C': 0.05, 'D': 0.45,
                 'X': 0.22, 'Z': 0.5}
        a_def = [('A', 'A'), ('B', 'B'), ('C', 'C'), ('D', 'D'),
                 ('X', 'X'), ('Z', 'Z')]
        expected = {'A': 0.21, 'C': 0.05, 'D': 0.45}
        self.assertFloatEqual(recode_freq_vector(a_def, freqs), expected)
        freqs = {'D': 0.21, 'E': 0.29, 'N': 0.05,
                 'Q': 0.45, 'B': 0.26, 'Z': 0.74, 'X': 1.0}
        a_def = [('D', 'DEN'), ('Q', 'Q')]
        expected = {'D': 0.55, 'Q': 0.45}
        self.assertFloatEqual(recode_freq_vector(a_def, freqs), expected)


class RecodeMatrixTests(TestCase):
    """Tests of substitution matrix recoding.
""" def setUp(self): """ Create variables for use in the tests """ self.m1 = [[0,4,1,3,5],[4,0,2,4,6],[1,2,0,7,8],[3,4,7,0,9],[5,6,8,9,0]] self.recoded_m1 =\ [[0,0,21,0,0],[0,0,0,0,0],[21,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]] self.aa_order1 = 'DELIV' self.input_freqs1 = dict(zip(self.aa_order1,[0.2]*5)) self.alphabet1 = [('D','DE'),('L','LIV')] #create_recoded_rate_matrix(alphabets['a1_4']) self.m2 = [[0,8,6,5,1],[8,0,7,3,0],[6,7,0,4,2],[5,3,4,0,0],[1,0,2,0,0]] self.recoded_m2 =\ [[0,0,21,0,1],[0,0,0,0,0],[21,0,0,0,2],[0,0,0,0,0],[1,0,2,0,0]] self.aa_order2 = 'DELIC' self.input_freqs2 = dict(zip(self.aa_order2,[0.2]*5)) self.alphabet2 = [('D','DE'),('L','LI'),('C','C')] self.alphabet2_w_ambig = [('D','DEX'),('L','LIB'),('C','CZ')] def test_recode_counts_and_freqs(self): """recode_counts_and_freqs: functions as expected """ alphabet = alphabets['charge_his_3'] aa_order = 'ACDEFGHIKLMNPQRSTVWY' actual = recode_counts_and_freqs(alphabet) expected_matrix = recode_count_matrix(alphabet,\ count_matrix=DSO78_matrix,aa_order=aa_order) expected_freqs = {}.fromkeys(aa_order,0.0) expected_freqs.update(recode_freq_vector(alphabet,DSO78_freqs)) expected = (expected_matrix,expected_freqs) self.assertEqual(actual,expected) def test_recode_count_matrix_2_states(self): """recode_count_matrix: returns correct result with 2-state alphabet """ actual = recode_count_matrix(self.alphabet1,self.m1,self.aa_order1) expected = self.recoded_m1 self.assertEqual(actual,expected) def test_recode_count_matrix_3_states(self): """recode_count_matrix: returns correct result with 3-state alphabet """ actual = recode_count_matrix(self.alphabet2,self.m2,self.aa_order2) expected = self.recoded_m2 self.assertEqual(actual,expected) def test_recode_count_matrix_3_states_ambig_ignored(self): """recode_count_matrix: correct result w 3-state alphabet w ambig chars """ actual =\ recode_count_matrix(self.alphabet2_w_ambig,self.m2,self.aa_order2) expected = self.recoded_m2 self.assertEqual(actual,expected) def 
            test_recode_count_matrix_no_change(self):
        """recode_count_matrix: no changes applied when they shouldn't be"""
        # recoding recoded matrices
        actual = \
            recode_count_matrix(self.alphabet1, self.recoded_m1, self.aa_order1)
        expected = self.recoded_m1
        self.assertEqual(actual, expected)
        actual = \
            recode_count_matrix(self.alphabet2, self.recoded_m2, self.aa_order2)
        expected = self.recoded_m2
        self.assertEqual(actual, expected)


if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_util/test_table.rst

Data Manipulation using ``Table``
=================================

.. sectionauthor:: Gavin Huttley

.. Copyright 2007-2009, The Cogent Project
   Credits Gavin Huttley, Felix Schill
   License, GPL
   version, 1.3.0.dev
   Maintainer, Gavin Huttley
   Email, gavin.huttley@anu.edu.au
   Status, Production

The toolkit has a ``Table`` object that can be used for manipulating tabular
data. Its properties can be thought of as an ordered 2-dimensional dictionary
or tuple, with flexible output formatting that is useful for exporting data
for import into external applications. Importantly, via the restructured
text format one can generate html or latex formatted tables. The ``table``
module is located within ``cogent.util``. The ``LoadTable`` convenience
function is provided as a top-level ``cogent`` import.

Table creation
--------------

Tables can be created directly using the Table object itself, or a
convenience function that handles loading from files. We import both here:

.. doctest::

    >>> from cogent import LoadTable
    >>> from cogent.util.table import Table

First, if you try and create a ``Table`` without any data, it raises a
``RuntimeError``.
.. doctest::

    >>> t = Table()
    Traceback (most recent call last):
    RuntimeError: header and rows must be provided to Table
    >>> t = Table(header=[], rows=[])
    Traceback (most recent call last):
    RuntimeError: header and rows must be provided to Table

Let's create a very simple, rather nonsensical, table first. To create a
table requires a header series, and a 2D series (either of type ``tuple``,
``list``, ``dict``).

.. doctest::

    >>> column_headings = ['Journal', 'Impact']

The string "Journal" will become the first column heading, "Impact" the
second column heading. The data are,

.. doctest::

    >>> rows = [['INT J PARASITOL', 2.9],
    ...         ['J MED ENTOMOL', 1.4],
    ...         ['Med Vet Entomol', 1.0],
    ...         ['INSECT MOL BIOL', 2.85],
    ...         ['J AM MOSQUITO CONTR', 0.811],
    ...         ['MOL PHYLOGENET EVOL', 2.8],
    ...         ['HEREDITY', 1.99e+0],
    ...         ['AM J TROP MED HYG', 2.105],
    ...         ['MIL MED', 0.605],
    ...         ['MED J AUSTRALIA', 1.736]]

We create the simplest of tables.

.. doctest::

    >>> t = Table(header = column_headings, rows = rows)
    >>> print t
    =============================
                Journal    Impact
    -----------------------------
        INT J PARASITOL    2.9000
          J MED ENTOMOL    1.4000
        Med Vet Entomol    1.0000
        INSECT MOL BIOL    2.8500
    J AM MOSQUITO CONTR    0.8110
    MOL PHYLOGENET EVOL    2.8000
               HEREDITY    1.9900
      AM J TROP MED HYG    2.1050
                MIL MED    0.6050
        MED J AUSTRALIA    1.7360
    -----------------------------

The format above is referred to as 'simple' format in the documentation.
Notice that the numbers in this table have 4 decimal places, despite the
fact that the original data were largely strings with at most 3 decimal
places of precision. ``Table`` converts string representations of numbers
to their appropriate form when you do ``str(table)`` or print the table.
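The string-to-number coercion just described can be sketched in isolation.
This is an illustrative helper only, not cogent's actual implementation, and
the name ``coerce_cell`` is hypothetical:

```python
# Illustrative sketch only -- not cogent's implementation. Shows the kind of
# coercion Table applies when printing: string representations of numbers are
# converted to a fixed-precision form, everything else passes through as str.
def coerce_cell(value, digits=4):
    """Return value formatted to `digits` decimal places if numeric."""
    try:
        return "%.*f" % (digits, float(value))
    except (TypeError, ValueError):
        return str(value)
```

For example, ``coerce_cell('2.9')`` gives ``'2.9000'``, matching the
4-decimal default seen in the table above, while a non-numeric cell such as
``'HEREDITY'`` is returned unchanged.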
We have several things we might want to specify when creating a table: the
precision and/or format of floating point numbers (integer argument -
``digits``), the spacing between columns (integer argument or actual string
of whitespace - ``space``), title (argument - ``title``), and legend
(argument - ``legend``). Let's modify some of these and provide a title and
legend.

.. doctest::

    >>> t = Table(column_headings, rows, title='Journal impact factors',
    ...           legend='From ISI', digits=2, space=' ')
    >>> print t
    Journal impact factors
    =================================
                  Journal      Impact
    ---------------------------------
          INT J PARASITOL        2.90
            J MED ENTOMOL        1.40
          Med Vet Entomol        1.00
          INSECT MOL BIOL        2.85
      J AM MOSQUITO CONTR        0.81
      MOL PHYLOGENET EVOL        2.80
                 HEREDITY        1.99
        AM J TROP MED HYG        2.10
                  MIL MED        0.60
          MED J AUSTRALIA        1.74
    ---------------------------------
    From ISI

.. note:: You can also use the representation of a table for a quick summary.

.. doctest::

    >>> t
    Table(numrows=10, numcols=2, header=['Journal', 'Impact'], rows=[['INT J PARASITOL', 2.9000],..])

The Table class cannot handle arbitrary python objects, unless they are
passed in as strings. Note that in this case we now directly pass in the
column headings list, and the handling of missing data can be explicitly
specified.

.. doctest::

    >>> t2 = Table(['abcd', 'data'], [[str(range(1,6)), '0'],
    ...            ['x', 5.0], ['y', None]],
    ...            missing_data='*')
    >>> print t2
    =========================
               abcd      data
    -------------------------
    [1, 2, 3, 4, 5]         0
                  x    5.0000
                  y         *
    -------------------------

Table column headings can be accessed from the ``table.Header`` property

.. doctest::

    >>> assert t2.Header == ['abcd', 'data']

and this is immutable (cannot be changed).

.. doctest::

    >>> t2.Header[1] = 'Data'
    Traceback (most recent call last):
    RuntimeError: Table Header is immutable, use withNewHeader

If you want to change the Header, use the ``withNewHeader`` method. This can
be done one column at a time, or as a batch.
The returned Table is identical aside from the modified column labels. .. doctest:: >>> mod_header = t2.withNewHeader('abcd', 'ABCD') >>> assert mod_header.Header == ['ABCD', 'data'] >>> mod_header = t2.withNewHeader(['abcd', 'data'], ['ABCD', 'DATA']) >>> print mod_header ========================= ABCD DATA ------------------------- [1, 2, 3, 4, 5] 0 x 5.0000 y * ------------------------- Tables may also be created from 2-dimensional dictionaries. In this case, special capabilities are provided to enforce printing rows in a particular order. .. doctest:: >>> d2D={'edge.parent': {'NineBande': 'root', 'edge.1': 'root', ... 'DogFaced': 'root', 'Human': 'edge.0', 'edge.0': 'edge.1', ... 'Mouse': 'edge.1', 'HowlerMon': 'edge.0'}, 'x': {'NineBande': 1.0, ... 'edge.1': 1.0, 'DogFaced': 1.0, 'Human': 1.0, 'edge.0': 1.0, ... 'Mouse': 1.0, 'HowlerMon': 1.0}, 'length': {'NineBande': 4.0, ... 'edge.1': 4.0, 'DogFaced': 4.0, 'Human': 4.0, 'edge.0': 4.0, ... 'Mouse': 4.0, 'HowlerMon': 4.0}, 'y': {'NineBande': 3.0, 'edge.1': 3.0, ... 'DogFaced': 3.0, 'Human': 3.0, 'edge.0': 3.0, 'Mouse': 3.0, ... 'HowlerMon': 3.0}, 'z': {'NineBande': 6.0, 'edge.1': 6.0, ... 'DogFaced': 6.0, 'Human': 6.0, 'edge.0': 6.0, 'Mouse': 6.0, ... 'HowlerMon': 6.0}, ... 'edge.name': ['Human', 'HowlerMon', 'Mouse', 'NineBande', 'DogFaced', ... 'edge.0', 'edge.1']} >>> row_order = d2D['edge.name'] >>> d2D['edge.name'] = dict(zip(row_order, row_order)) >>> t3 = Table(['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], d2D, ... row_order = row_order, missing_data='*', space=8, max_width = 50, ... row_ids = True, title = 'My Title', ... 
legend = 'Legend: this is a nonsense example.')
    >>> print t3
    My Title
    ==========================================
    edge.name        edge.parent        length
    ------------------------------------------
        Human             edge.0        4.0000
    HowlerMon             edge.0        4.0000
        Mouse             edge.1        4.0000
    NineBande               root        4.0000
     DogFaced               root        4.0000
       edge.0             edge.1        4.0000
       edge.1               root        4.0000
    ------------------------------------------
    continued: My Title
    =====================================
    edge.name             x             y
    -------------------------------------
        Human        1.0000        3.0000
    HowlerMon        1.0000        3.0000
        Mouse        1.0000        3.0000
    NineBande        1.0000        3.0000
     DogFaced        1.0000        3.0000
       edge.0        1.0000        3.0000
       edge.1        1.0000        3.0000
    -------------------------------------
    continued: My Title
    =======================
    edge.name             z
    -----------------------
        Human        6.0000
    HowlerMon        6.0000
        Mouse        6.0000
    NineBande        6.0000
     DogFaced        6.0000
       edge.0        6.0000
       edge.1        6.0000
    -----------------------
    Legend: this is a nonsense example.

In the above we specify a maximum width of the table, and also specify row
identifiers (using ``row_ids``, the integer corresponding to the column at
which data begin; preceding columns are taken as the identifiers). This has
the effect of forcing the table to wrap when the simple text format is used,
but wrapping does not occur for any other format. The ``row_ids`` are keys
for slicing the table by row, and as identifiers are presented in each
wrapped sub-table. Wrapping generates neat-looking tables whether or not you
index the table rows. We demonstrate here

.. doctest::

    >>> from cogent import LoadTable
    >>> h = ['A/C', 'A/G', 'A/T', 'C/A']
    >>> rows = [[0.0425, 0.1424, 0.0226, 0.0391]]
    >>> wrap_table = LoadTable(header=h, rows=rows, max_width=30)
    >>> print wrap_table
    ==============================
       A/C       A/G       A/T
    ------------------------------
    0.0425    0.1424    0.0226
    ------------------------------
    continued:
    ==========
       C/A
    ----------
    0.0391
    ----------
    >>> wrap_table = LoadTable(header=h, rows=rows, max_width=30,
    ...
row_ids=True) >>> print wrap_table ========================== A/C A/G A/T -------------------------- 0.0425 0.1424 0.0226 -------------------------- continued: ================ A/C C/A ---------------- 0.0425 0.0391 ---------------- We can also customise the formatting of individual columns. .. doctest:: >>> rows = (('NP_003077_hs_mm_rn_dna', 'Con', 2.5386013224378985), ... ('NP_004893_hs_mm_rn_dna', 'Con', 0.12135142635634111e+06), ... ('NP_005079_hs_mm_rn_dna', 'Con', 0.95165949788861326e+07), ... ('NP_005500_hs_mm_rn_dna', 'Con', 0.73827030202664901e-07), ... ('NP_055852_hs_mm_rn_dna', 'Con', 1.0933217708952725e+07)) We first create a table and show the default formatting behaviour for ``Table``. .. doctest:: >>> t46 = Table(['Gene', 'Type', 'LR'], rows) >>> print t46 =============================================== Gene Type LR ----------------------------------------------- NP_003077_hs_mm_rn_dna Con 2.5386 NP_004893_hs_mm_rn_dna Con 121351.4264 NP_005079_hs_mm_rn_dna Con 9516594.9789 NP_005500_hs_mm_rn_dna Con 0.0000 NP_055852_hs_mm_rn_dna Con 10933217.7090 ----------------------------------------------- We then format the ``LR`` column to use a scientific number format. .. doctest:: >>> t46 = Table(['Gene', 'Type', 'LR'], rows) >>> t46.setColumnFormat('LR', "%.4e") >>> print t46 ============================================ Gene Type LR -------------------------------------------- NP_003077_hs_mm_rn_dna Con 2.5386e+00 NP_004893_hs_mm_rn_dna Con 1.2135e+05 NP_005079_hs_mm_rn_dna Con 9.5166e+06 NP_005500_hs_mm_rn_dna Con 7.3827e-08 NP_055852_hs_mm_rn_dna Con 1.0933e+07 -------------------------------------------- It is safe to directly modify certain attributes, such as the title, legend and white space separating columns, which we do for the ``t46``. .. 
doctest:: >>> t46.Title = "A new title" >>> t46.Legend = "A new legend" >>> t46.Space = ' ' >>> print t46 A new title ======================================== Gene Type LR ---------------------------------------- NP_003077_hs_mm_rn_dna Con 2.5386e+00 NP_004893_hs_mm_rn_dna Con 1.2135e+05 NP_005079_hs_mm_rn_dna Con 9.5166e+06 NP_005500_hs_mm_rn_dna Con 7.3827e-08 NP_055852_hs_mm_rn_dna Con 1.0933e+07 ---------------------------------------- A new legend We can provide settings for multiple columns. .. doctest:: >>> t3 = Table(['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], d2D, ... row_order = row_order) >>> t3.setColumnFormat('x', "%.1e") >>> t3.setColumnFormat('y', "%.2f") >>> print t3 =============================================================== edge.name edge.parent length x y z --------------------------------------------------------------- Human edge.0 4.0000 1.0e+00 3.00 6.0000 HowlerMon edge.0 4.0000 1.0e+00 3.00 6.0000 Mouse edge.1 4.0000 1.0e+00 3.00 6.0000 NineBande root 4.0000 1.0e+00 3.00 6.0000 DogFaced root 4.0000 1.0e+00 3.00 6.0000 edge.0 edge.1 4.0000 1.0e+00 3.00 6.0000 edge.1 root 4.0000 1.0e+00 3.00 6.0000 --------------------------------------------------------------- In some cases, the contents of a column can be of different types. In this instance, rather than passing a column template we pass a reference to a function that will handle this complexity. To illustrate this we will define a function that formats floating point numbers, but returns everything else as is. .. doctest:: >>> def formatcol(value): ... if isinstance(value, float): ... val = "%.2f" % value ... else: ... val = str(value) ... return val We apply this to a table with mixed string, integer and floating point data. .. doctest:: >>> t6 = Table(['ColHead'], [['a'], [1], [0.3], ['cc']], ... 
column_templates = dict(ColHead=formatcol))
    >>> print t6
    =======
    ColHead
    -------
          a
          1
       0.30
         cc
    -------

Representation of tables
^^^^^^^^^^^^^^^^^^^^^^^^

The representation formatting provides a quick overview of a table's
dimensions and its contents. We show this for a table with 3 columns and
multiple rows

.. doctest::

    >>> t46
    Table(numrows=5, numcols=3, header=['Gene', 'Type', 'LR'], rows=[['NP_003077_hs_mm_rn_dna', 'Con', 2.5386],..])

and larger

.. doctest::

    >>> t3
    Table(numrows=7, numcols=6, header=['edge.name', 'edge.parent', 'length',..], rows=[['Human', 'edge.0', 4.0000,..],..])

.. note:: within a script use ``print repr(t3)`` to get the same
   representation.

Table output
------------

Table can output in multiple formats, including restructured text or 'rest'
and delimited. These can be obtained using the ``tostring`` method and
``format`` argument as follows. Using table ``t`` from above,

.. doctest::

    >>> print t.tostring(format='rest')
    +------------------------------+
    |    Journal impact factors    |
    +---------------------+--------+
    |             Journal | Impact |
    +=====================+========+
    |     INT J PARASITOL |   2.90 |
    +---------------------+--------+
    |       J MED ENTOMOL |   1.40 |
    +---------------------+--------+
    |     Med Vet Entomol |   1.00 |
    +---------------------+--------+
    |     INSECT MOL BIOL |   2.85 |
    +---------------------+--------+
    | J AM MOSQUITO CONTR |   0.81 |
    +---------------------+--------+
    | MOL PHYLOGENET EVOL |   2.80 |
    +---------------------+--------+
    |            HEREDITY |   1.99 |
    +---------------------+--------+
    |   AM J TROP MED HYG |   2.10 |
    +---------------------+--------+
    |             MIL MED |   0.60 |
    +---------------------+--------+
    |     MED J AUSTRALIA |   1.74 |
    +---------------------+--------+
    | From ISI                     |
    +------------------------------+

Arguments such as ``space`` have no effect in this case. The table may also
be written to file in any of the available formats (latex, simple text,
html, pickle) or using a custom separator (such as a comma or tab).
This makes it convenient to get data into other applications (such as R or a
spreadsheet program). Here is the latex format; note how the title and
legend are joined into the latex table caption. We also provide optional
arguments for the column alignment (first column left aligned, second column
right aligned and remaining columns centred) and a label for table
referencing.

.. doctest::

    >>> print t3.tostring(format='tex', justify="lrcccc", label="table:example")
    \begin{longtable}[htp!]{ l r c c c c }
    \hline
    \bf{edge.name} & \bf{edge.parent} & \bf{length} & \bf{x} & \bf{y} & \bf{z} \\
    \hline
    \hline
    Human & edge.0 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
    HowlerMon & edge.0 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
    Mouse & edge.1 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
    NineBande & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
    DogFaced & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
    edge.0 & edge.1 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
    edge.1 & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
    \hline
    \label{table:example}
    \end{longtable}

More complex latex table justification is also possible. Specifying the
width of individual columns requires passing in a series (list or tuple) of
justification commands. In the following we introduce the command for
specific column widths.

..
doctest:: >>> print t3.tostring(format='tex', justify=["l","p{3cm}","c","c","c","c"]) \begin{longtable}[htp!]{ l p{3cm} c c c c } \hline \bf{edge.name} & \bf{edge.parent} & \bf{length} & \bf{x} & \bf{y} & \bf{z} \\ \hline \hline Human & edge.0 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\ HowlerMon & edge.0 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\ Mouse & edge.1 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\ NineBande & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\ DogFaced & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\ edge.0 & edge.1 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\ edge.1 & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\ \hline \end{longtable} >>> print t3.tostring(sep=',') edge.name,edge.parent,length, x, y, z Human, edge.0,4.0000,1.0e+00,3.00,6.0000 HowlerMon, edge.0,4.0000,1.0e+00,3.00,6.0000 Mouse, edge.1,4.0000,1.0e+00,3.00,6.0000 NineBande, root,4.0000,1.0e+00,3.00,6.0000 DogFaced, root,4.0000,1.0e+00,3.00,6.0000 edge.0, edge.1,4.0000,1.0e+00,3.00,6.0000 edge.1, root,4.0000,1.0e+00,3.00,6.0000 You can specify any standard text character that will work with your desired target. Useful separators are tabs ('\\t'), or pipes ('\|'). If ``Table`` encounters any of these characters within a cell, it wraps the cell in quotes -- a standard approach to facilitate import by other applications. We will illustrate this with ``t2``. .. doctest:: >>> print t2.tostring(sep=', ') abcd, data "[1, 2, 3, 4, 5]", 0 x, 5.0000 y, * Note that I introduced an extra space after the column just to make the result more readable in this example. Test the writing of phylip distance matrix format. .. doctest:: >>> rows = [['a', '', 0.088337278874079342, 0.18848582712597683, ... 0.44084000179091454], ['c', 0.088337278874079342, '', ... 0.088337278874079342, 0.44083999937417828], ['b', 0.18848582712597683, ... 0.088337278874079342, '', 0.44084000179090932], ['e', ... 
0.44084000179091454, 0.44083999937417828, 0.44084000179090932, '']]
    >>> header = ['seq1/2', 'a', 'c', 'b', 'e']
    >>> dist = Table(rows = rows, header = header,
    ...              row_ids = True)
    >>> print dist.tostring(format = 'phylip')
        4
    a           0.0000  0.0883  0.1885  0.4408
    c           0.0883  0.0000  0.0883  0.4408
    b           0.1885  0.0883  0.0000  0.4408
    e           0.4408  0.4408  0.4408  0.0000

The ``tostring`` method also provides generic html generation via the
restructured text format. The ``toRichHtmlTable`` method can be used to
generate the html table element by itself, with greater control over
formatting. Specifically, users can provide custom callback functions to the
``row_cell_func`` and ``header_cell_func`` arguments to control in detail
the formatting of table elements, or use the simpler dictionary based
``element_formatters`` approach. We use the above ``dist`` table to provide
a specific callback that will set the background color for diagonal cells.
We first write a function that takes the cell value and coordinates,
returning the html formatted text.

.. doctest::

    >>> def format_cell(value, row_num, col_num):
    ...     bgcolor=['', ' bgcolor="#0055ff"'][value=='']
    ...     return '<td%s>%s</td>' % (bgcolor, value)

We then call the method, without this argument, then with it.

.. doctest::

    >>> straight_html = dist.toRichHtmlTable()
    >>> print straight_html
seq1/2a... >>> rich_html = dist.toRichHtmlTable(row_cell_func=format_cell, ... compact=False) >>> print rich_html ... Exporting bedGraph format ------------------------- One export format available is bedGraph_. This format can be used for viewing data as annotation track in a genome browser. This format allows for unequal spans and merges adjacent spans with the same value. The format has many possible arguments that modify the appearance in the genome browser. For this example we just create a simple data set. .. doctest:: >>> rows = [['1', 100, 101, 1.123], ['1', 101, 102, 1.123], ... ['1', 102, 103, 1.123], ['1', 103, 104, 1.123], ... ['1', 104, 105, 1.123], ['1', 105, 106, 1.123], ... ['1', 106, 107, 1.123], ['1', 107, 108, 1.123], ... ['1', 108, 109, 1], ['1', 109, 110, 1], ... ['1', 110, 111, 1], ['1', 111, 112, 1], ... ['1', 112, 113, 1], ['1', 113, 114, 1], ... ['1', 114, 115, 1], ['1', 115, 116, 1], ... ['1', 116, 117, 1], ['1', 117, 118, 1], ... ['1', 118, 119, 2], ['1', 119, 120, 2], ... ['1', 120, 121, 2], ['1', 150, 151, 2], ... ['1', 151, 152, 2], ['1', 152, 153, 2], ... ['1', 153, 154, 2], ['1', 154, 155, 2], ... ['1', 155, 156, 2], ['1', 156, 157, 2], ... ['1', 157, 158, 2], ['1', 158, 159, 2], ... ['1', 159, 160, 2], ['1', 160, 161, 2]] ... >>> bgraph = LoadTable(header=['chrom', 'start', 'end', 'value'], ... rows=rows) ... >>> print bgraph.tostring(format='bedgraph', name='test track', ... graphType='bar', description='test of bedgraph', color=(255,0,0)) # doctest: +NORMALIZE_WHITESPACE track type=bedGraph name="test track" description="test of bedgraph" color=255,0,0 graphType=bar 1 100 108 1.12 1 108 118 1.0 1 118 161 2.0 The bedgraph formatter defaults to rounding values to 2 decimal places. You can adjust that precision using the ``digits`` argument. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> print bgraph.tostring(format='bedgraph', name='test track', ... graphType='bar', description='test of bedgraph', color=(255,0,0), ... 
digits=0) # doctest: +NORMALIZE_WHITESPACE track type=bedGraph name="test track" description="test of bedgraph" color=255,0,0 graphType=bar 1 100 118 1.0 1 118 161 2.0 .. note:: Writing files in bedgraph format is done using the ``writeToFile`` method with the same arguments, e.g. ``writeToFile(filename, format='bedgraph', name='test track', description='test of bedgraph', color=(255,0,0))``. .. _bedGraph: https://cgwb.nci.nih.gov/goldenPath/help/bedgraph.html Saving a table for reloading ---------------------------- Saving a table object to file for later reloading can be done using the standard ``writeToFile`` method, specifying any of the formats supported by ``tostring``. Reloading is then done with the ``filename`` argument to ``LoadTable``, which recreates the table from the raw data located at ``filename``. To illustrate this, we first write out the table ``t3`` in ``pickle`` format and then the table ``t2`` in csv (comma separated values) format. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> t3 = Table(['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], d2D, ... row_order = row_order, missing_data='*', space=8, max_width = 50, ... row_ids = True, title = 'My Title', ...
legend = 'Legend: this is a nonsense example.') >>> t3.writeToFile("t3.pickle") >>> t3_loaded = LoadTable(filename = "t3.pickle") >>> print t3_loaded My Title ========================================== edge.name edge.parent length ------------------------------------------ Human edge.0 4.0000 HowlerMon edge.0 4.0000 Mouse edge.1 4.0000 NineBande root 4.0000 DogFaced root 4.0000 edge.0 edge.1 4.0000 edge.1 root 4.0000 ------------------------------------------ continued: My Title ===================================== edge.name x y ------------------------------------- Human 1.0000 3.0000 HowlerMon 1.0000 3.0000 Mouse 1.0000 3.0000 NineBande 1.0000 3.0000 DogFaced 1.0000 3.0000 edge.0 1.0000 3.0000 edge.1 1.0000 3.0000 ------------------------------------- continued: My Title ======================= edge.name z ----------------------- Human 6.0000 HowlerMon 6.0000 Mouse 6.0000 NineBande 6.0000 DogFaced 6.0000 edge.0 6.0000 edge.1 6.0000 ----------------------- Legend: this is a nonsense example. >>> t2 = Table(['abcd', 'data'], [[str(range(1,6)), '0'], ['x', 5.0], ... ['y', None]], missing_data='*', title = 'A \ntitle') >>> t2.writeToFile('t2.csv', sep=',') >>> t2_loaded = LoadTable(filename = 't2.csv', header = True, with_title = True, ... sep = ',') >>> print t2_loaded A title ========================= abcd data ------------------------- [1, 2, 3, 4, 5] 0 x 5.0000 y ------------------------- Note the ``missing_data`` attribute is not saved in the delimited format, but is in the ``pickle`` format. In the next case, I'm going to override the digits format on reloading of the table. .. doctest:: >>> t2 = Table(['abcd', 'data'], [[str(range(1,6)), '0'], ['x', 5.0], ... ['y', None]], missing_data='*', title = 'A \ntitle', ... legend = "And\na legend too") >>> t2.writeToFile('t2.csv', sep=',') >>> t2_loaded = LoadTable(filename = 't2.csv', header = True, ... 
with_title = True, with_legend = True, sep = ',', digits = 2) >>> print t2_loaded # doctest: +NORMALIZE_WHITESPACE A title ======================= abcd data ----------------------- [1, 2, 3, 4, 5] 0 x 5.00 y ----------------------- And a legend too A few things to note about the delimited file saving: formatting arguments are lost in saving to a delimited format; the ``header`` argument specifies whether the first line of the file should be treated as the header; the ``with_title`` and ``with_legend`` arguments are necessary if the file contains them, otherwise they become the header or part of the table. Importantly, if you wish to preserve numerical precision use the ``pickle`` format. ``cPickle`` can load a useful object from the pickled ``Table`` by itself, without needing to know anything about the ``Table`` class. .. doctest:: >>> import cPickle >>> f = file("t3.pickle") >>> pickled = cPickle.load(f) >>> f.close() >>> pickled.keys() ['digits', 'row_ids', 'rows', 'title', 'space', 'max_width', 'header',... >>> pickled['rows'][0] ['Human', 'edge.0', 4.0, 1.0, 3.0, 6.0] We can read in a delimited format using a custom reader. There are two approaches. The first one allows specifying different type conversions for different columns. The second allows specifying a whole line-based parser. You can also read and write tables in gzip compressed format. This can be done simply by ending a filename with '.gz' or specifying ``compress=True``. We write a compressed file the two different ways and read it back in. .. doctest:: >>> t2.writeToFile('t2.csv.gz', sep=',') >>> t2_gz = LoadTable('t2.csv.gz', sep=',', with_title=True, ... with_legend = True) >>> t2_gz.Shape == t2.Shape True >>> t2.writeToFile('t2.csv', sep=',', compress=True) >>> t2_gz = LoadTable('t2.csv.gz', sep=',', with_title=True, ... 
with_legend = True) >>> t2_gz.Shape == t2.Shape True Defining a custom reader with type conversion for each column ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We convert columns 2-5 to floats by specifying a field converter. We then create a reader, specifying the properties of the data source (below a list, but it can also be a file). Note that if no converter is provided, all data are returned as strings. We can also provide this reader to the ``Table`` constructor for a more direct way of opening such files. In this case, ``Table`` assumes there is a header row and nothing else. .. doctest:: >>> from cogent.parse.table import ConvertFields, SeparatorFormatParser >>> t3.Title = t3.Legend = None >>> comma_sep = t3.tostring(sep=",").splitlines() >>> print comma_sep ['edge.name,edge.parent,length, x, y, z', ' Human, ... >>> converter = ConvertFields([(2,float), (3,float), (4,float), (5, float)]) >>> reader = SeparatorFormatParser(with_header=True,converter=converter, ... sep=",") >>> comma_sep = [line for line in reader(comma_sep)] >>> print comma_sep [['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], ['Human',... >>> t3.writeToFile("t3.tab", sep="\t") >>> reader = SeparatorFormatParser(with_header=True,converter=converter, ... sep="\t") >>> t3a = LoadTable(filename="t3.tab", reader=reader, title="new title", ... space=2) ... >>> print t3a new title ====================================================== edge.name edge.parent length x y z ------------------------------------------------------ Human edge.0 4.0000 1.0000 3.0000 6.0000 HowlerMon edge.0 4.0000 1.0000 3.0000 6.0000 Mouse edge.1 4.0000 1.0000 3.0000 6.0000 NineBande root 4.0000 1.0000 3.0000 6.0000 DogFaced root 4.0000 1.0000 3.0000 6.0000 edge.0 edge.1 4.0000 1.0000 3.0000 6.0000 edge.1 root 4.0000 1.0000 3.0000 6.0000 ------------------------------------------------------ We can use the ``SeparatorFormatParser`` to ignore reading certain lines by using a callback function.
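The ignore-callback mechanism can be sketched in plain Python. The following minimal reader is a hypothetical illustration only (the ``simple_reader`` helper is not part of cogent; ``SeparatorFormatParser`` remains the real implementation):

```python
# Hypothetical minimal reader illustrating an "ignore" callback;
# the real implementation is cogent.parse.table.SeparatorFormatParser.
def simple_reader(lines, sep="\t", ignore=None):
    for raw in lines:
        fields = raw.rstrip("\n").split(sep)
        # skip any line for which the callback returns True
        if ignore is not None and ignore(fields):
            continue
        yield fields

data = ["Human\tedge.0\t4.0", "edge.0\tedge.1\t4.0", "Mouse\tedge.1\t4.0"]
is_internal = lambda line: line[0].startswith("edge")
tips = list(simple_reader(data, ignore=is_internal))
print(tips)  # [['Human', 'edge.0', '4.0'], ['Mouse', 'edge.1', '4.0']]
```

The callback receives the already-split fields, so any predicate over the row's values can be used to drop lines before they reach the table.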
We illustrate this using the above data, skipping any rows with ``edge.name`` starting with ``edge``. .. doctest:: >>> def ignore_internal_nodes(line): ... return line[0].startswith('edge') ... >>> reader = SeparatorFormatParser(with_header=True,converter=converter, ... sep="\t", ignore=ignore_internal_nodes) ... >>> tips = LoadTable(filename="t3.tab", reader=reader, digits=1, space=2) >>> print tips ============================================= edge.name edge.parent length x y z --------------------------------------------- Human edge.0 4.0 1.0 3.0 6.0 HowlerMon edge.0 4.0 1.0 3.0 6.0 Mouse edge.1 4.0 1.0 3.0 6.0 NineBande root 4.0 1.0 3.0 6.0 DogFaced root 4.0 1.0 3.0 6.0 --------------------------------------------- We can also limit the amount of data to be read in, which is very handy for checking large files. .. doctest:: >>> t3a = LoadTable("t3.tab", sep='\t', limit=3) >>> print t3a ================================================================ edge.name edge.parent length x y z ---------------------------------------------------------------- Human edge.0 4.0000 1.0000 3.0000 6.0000 HowlerMon edge.0 4.0000 1.0000 3.0000 6.0000 Mouse edge.1 4.0000 1.0000 3.0000 6.0000 ---------------------------------------------------------------- Limiting should also work when ``static_column_types`` is invoked. .. doctest:: >>> t3a = LoadTable("t3.tab", sep='\t', limit=3, static_column_types=True) >>> t3a.Shape[0] == 3 True In the above example, the data type in a column is static, e.g. all values in ``x`` are floats. Rather than providing a custom reader, you can get the ``Table`` to construct such a reader based on the first data row using the ``static_column_types`` argument. .. doctest:: >>> t3a = LoadTable(filename="t3.tab", static_column_types=True, digits=1, ...
sep='\t') >>> print t3a ======================================================= edge.name edge.parent length x y z ------------------------------------------------------- Human edge.0 4.0 1.0 3.0 6.0 HowlerMon edge.0 4.0 1.0 3.0 6.0 Mouse edge.1 4.0 1.0 3.0 6.0 NineBande root 4.0 1.0 3.0 6.0 DogFaced root 4.0 1.0 3.0 6.0 edge.0 edge.1 4.0 1.0 3.0 6.0 edge.1 root 4.0 1.0 3.0 6.0 ------------------------------------------------------- If you invoke the ``static_column_types`` argument and the column data are not static, you'll get a ``ValueError``. We show this by first creating a simple table with mixed data types in a column, writing it to file and then trying to load it with ``static_column_types=True``. .. doctest:: >>> t3b = LoadTable(header=['A', 'B'], rows=[[1,1], ['a', 2]], space=2) >>> print t3b ====== A B ------ 1 1 a 2 ------ >>> t3b.writeToFile('test3b.txt', sep='\t') >>> t3b = LoadTable('test3b.txt', sep = '\t', static_column_types=True) Traceback (most recent call last): ValueError: invalid literal for int() with base 10: 'a' We also test the reader function for a tab delimited format with missing data at the end. .. doctest:: >>> data = ['ab\tcd\t', 'ab\tcd\tef'] >>> tab_reader = SeparatorFormatParser(sep='\t') >>> for line in tab_reader(data): ... assert len(line) == 3, line Defining a custom reader that operates on entire lines ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It can also be the case that data types differ between lines. The basic mechanism is the same as above, except in defining the converter you must set the argument ``by_column=False``. We illustrate this capability by writing a short function that tries to cast entire lines to ``int`` or ``float``, or leaves them as strings. .. doctest:: >>> def CastLine(): ... floats = lambda x: map(float, x) ... ints = lambda x: map(int, x) ... def call(line): ... try: ... line = ints(line) ... except ValueError: ... try: ... line = floats(line) ... except ValueError: ... pass ... return line ...
return call We then define a couple of lines, create an instance of ``ConvertFields`` and call it for each type. .. doctest:: >>> line_str_ints = '\t'.join(map(str, range(5))) >>> line_str_floats = '\t'.join(map(str, map(float, range(5)))) >>> data = [line_str_ints, line_str_floats] >>> cv = ConvertFields(CastLine(), by_column=False) >>> tab_reader = SeparatorFormatParser(with_header=False, converter=cv, ... sep='\t') >>> for line in tab_reader(data): ... print line [0, 1, 2, 3, 4] [0.0, 1.0, 2.0, 3.0, 4.0] Defining a custom writer ^^^^^^^^^^^^^^^^^^^^^^^^ We can likewise specify a writer, using a custom field formatter and provide this to the ``Table`` directly for writing. We first illustrate how the writer works to generate output. We then use it to escape some text fields in quotes. In order to read that back in, we define a custom reader that strips these quotes off. .. doctest:: >>> from cogent.format.table import FormatFields, SeparatorFormatWriter >>> formatter = FormatFields([(0,'"%s"'), (1,'"%s"')]) >>> writer = SeparatorFormatWriter(formatter=formatter, sep=" | ") >>> for formatted in writer(comma_sep, has_header=True): ... print formatted edge.name | edge.parent | length | x | y | z "Human" | "edge.0" | 4.0 | 1.0 | 3.0 | 6.0 "HowlerMon" | "edge.0" | 4.0 | 1.0 | 3.0 | 6.0 "Mouse" | "edge.1" | 4.0 | 1.0 | 3.0 | 6.0 "NineBande" | "root" | 4.0 | 1.0 | 3.0 | 6.0 "DogFaced" | "root" | 4.0 | 1.0 | 3.0 | 6.0 "edge.0" | "edge.1" | 4.0 | 1.0 | 3.0 | 6.0 "edge.1" | "root" | 4.0 | 1.0 | 3.0 | 6.0 >>> t3.writeToFile(filename="t3.tab", writer=writer) >>> strip = lambda x: x.replace('"', '') >>> converter = ConvertFields([(0,strip), (1, strip)]) >>> reader = SeparatorFormatParser(with_header=True, converter=converter, ... sep="|", strip_wspace=True) >>> t3a = LoadTable(filename="t3.tab", reader=reader, title="new title", ... 
space=2) >>> print t3a new title ============================================= edge.name edge.parent length x y z --------------------------------------------- Human edge.0 4.0 1.0 3.0 6.0 HowlerMon edge.0 4.0 1.0 3.0 6.0 Mouse edge.1 4.0 1.0 3.0 6.0 NineBande root 4.0 1.0 3.0 6.0 DogFaced root 4.0 1.0 3.0 6.0 edge.0 edge.1 4.0 1.0 3.0 6.0 edge.1 root 4.0 1.0 3.0 6.0 --------------------------------------------- .. note:: There are performance issues for large files. Pickling has proven very slow for saving very large files and introduces significant file size bloat. A simple delimited format is much more efficient both in storage and, if you use a custom reader (or specify ``static_column_types=True``), in the time taken to generate and read. A custom reader was approximately 6 fold faster than the standard delimited file reader. Table slicing and iteration --------------------------- The Table class is capable of slicing by a row, a range of rows, a column or a range of columns (specified by headings or positions), and slicing can also be used to identify a single cell. The method ``getColumns`` can also be used to reorder columns. In the case of columns, either the string headings or their position integers can be used. For rows, if ``row_ids`` was specified as ``True``, the 0'th cell in each row can also be used. .. doctest:: >>> t4 = Table(['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], d2D, ... row_order = row_order, row_ids = True, title = 'My Title') We subset the ``t4`` columns and reorder them. .. doctest:: >>> new = t4.getColumns(['z', 'y']) >>> print new My Title ============================= edge.name z y ----------------------------- Human 6.0000 3.0000 HowlerMon 6.0000 3.0000 Mouse 6.0000 3.0000 NineBande 6.0000 3.0000 DogFaced 6.0000 3.0000 edge.0 6.0000 3.0000 edge.1 6.0000 3.0000 ----------------------------- We use the column position indexes to get the same table. ..
doctest:: >>> new = t4.getColumns([5, 4]) >>> print new My Title ============================= edge.name z y ----------------------------- Human 6.0000 3.0000 HowlerMon 6.0000 3.0000 Mouse 6.0000 3.0000 NineBande 6.0000 3.0000 DogFaced 6.0000 3.0000 edge.0 6.0000 3.0000 edge.1 6.0000 3.0000 ----------------------------- We can also use more general slicing, by both rows and columns. The following returns all rows from 4 on, and columns up to (but excluding) 'y': .. doctest:: >>> k = t4[4:, :'y'] >>> print k My Title ============================================ edge.name edge.parent length x -------------------------------------------- DogFaced root 4.0000 1.0000 edge.0 edge.1 4.0000 1.0000 edge.1 root 4.0000 1.0000 -------------------------------------------- We can explicitly reference individual cells, in this case using both row and column keys. .. doctest:: >>> val = t4['HowlerMon', 'y'] >>> print val 3.0 We slice a single row, .. doctest:: >>> new = t4[3] >>> print new My Title ================================================================ edge.name edge.parent length x y z ---------------------------------------------------------------- NineBande root 4.0000 1.0000 3.0000 6.0000 ---------------------------------------------------------------- and a range of rows. .. doctest:: >>> new = t4[3:6] >>> print new My Title ================================================================ edge.name edge.parent length x y z ---------------------------------------------------------------- NineBande root 4.0000 1.0000 3.0000 6.0000 DogFaced root 4.0000 1.0000 3.0000 6.0000 edge.0 edge.1 4.0000 1.0000 3.0000 6.0000 ---------------------------------------------------------------- You can get disjoint rows. ..
doctest:: >>> print t4.getDisjointRows(['Human', 'Mouse', 'DogFaced']) My Title ================================================================ edge.name edge.parent length x y z ---------------------------------------------------------------- Human edge.0 4.0000 1.0000 3.0000 6.0000 Mouse edge.1 4.0000 1.0000 3.0000 6.0000 DogFaced root 4.0000 1.0000 3.0000 6.0000 ---------------------------------------------------------------- You can iterate over the table one row at a time and slice the rows. We illustrate this for slicing a single column, .. doctest:: >>> for row in t: ... print row['Journal'] INT J PARASITOL J MED ENTOMOL Med Vet Entomol INSECT MOL BIOL J AM MOSQUITO CONTR MOL PHYLOGENET EVOL HEREDITY AM J TROP MED HYG MIL MED MED J AUSTRALIA and for multiple columns. .. doctest:: >>> for row in t: ... print row['Journal'], row['Impact'] INT J PARASITOL 2.9 J MED ENTOMOL 1.4 Med Vet Entomol 1.0 INSECT MOL BIOL 2.85 J AM MOSQUITO CONTR 0.811 MOL PHYLOGENET EVOL 2.8 HEREDITY 1.99 AM J TROP MED HYG 2.105 MIL MED 0.605 MED J AUSTRALIA 1.736 The numerical slice equivalent to the first case above would be ``row[0]``; to the second case, either ``row[:]`` or ``row[:2]``. Filtering tables - selecting subsets of rows/columns ---------------------------------------------------- We want to be able to slice a table, based on some condition(s), to produce a new subset table. For instance, we construct a table with type and probability values. .. doctest:: >>> header = ['Gene', 'type', 'LR', 'df', 'Prob'] >>> rows = (('NP_003077_hs_mm_rn_dna', 'Con', 2.5386, 1, 0.1111), ... ('NP_004893_hs_mm_rn_dna', 'Con', 0.1214, 1, 0.7276), ... ('NP_005079_hs_mm_rn_dna', 'Con', 0.9517, 1, 0.3293), ... ('NP_005500_hs_mm_rn_dna', 'Con', 0.7383, 1, 0.3902), ... ('NP_055852_hs_mm_rn_dna', 'Con', 0.0000, 1, 0.9997), ... ('NP_057012_hs_mm_rn_dna', 'Unco', 34.3081, 1, 0.0000), ... ('NP_061130_hs_mm_rn_dna', 'Unco', 3.7986, 1, 0.0513), ...
('NP_065168_hs_mm_rn_dna', 'Con', 89.9766, 1, 0.0000), ... ('NP_065396_hs_mm_rn_dna', 'Unco', 11.8912, 1, 0.0006), ... ('NP_109590_hs_mm_rn_dna', 'Con', 0.2121, 1, 0.6451), ... ('NP_116116_hs_mm_rn_dna', 'Unco', 9.7474, 1, 0.0018)) >>> t5 = Table(header, rows) >>> print t5 ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111 NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276 NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293 NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902 NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997 NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513 NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 --------------------------------------------------------- We then seek to obtain only those rows that contain probabilities < 0.05. We use valid python code within a string. **Note:** Make sure your column headings are valid python variable names, or the string based approach will fail (you could use an external function instead, see below). .. doctest:: >>> sub_table1 = t5.filtered(callback = "Prob < 0.05") >>> print sub_table1 ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 --------------------------------------------------------- Using the above table, we test the ``getRawData`` method for extracting the raw data for a single column, .. doctest:: >>> raw = sub_table1.getRawData('LR') >>> raw [34.3081..., 89.9766..., 11.8912, 9.7474...] and from multiple columns. ..
doctest:: >>> raw = sub_table1.getRawData(columns = ['df', 'Prob']) >>> raw [[1, 0.0], [1, 0.0],... We can also do filtering using an external function; in this case we use a ``lambda`` as our callback function to obtain only those rows of type 'Unco' that contain probabilities < 0.05. .. doctest:: >>> func = lambda (ty, pr): ty == 'Unco' and pr < 0.05 >>> sub_table2 = t5.filtered(columns = ('type', 'Prob'), callback = func) >>> print sub_table2 ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 --------------------------------------------------------- This can also be done using the string approach. .. doctest:: >>> sub_table2 = t5.filtered(callback = "type == 'Unco' and Prob < 0.05") >>> print sub_table2 ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 --------------------------------------------------------- We can also filter table columns using ``filteredByColumn``. Say we only want the numerical columns; we can write a callback that returns ``False`` if some numerical operation fails, and ``True`` otherwise. .. doctest:: >>> def is_numeric(values): ... try: ... sum(values) ... except TypeError: ... return False ...
return True >>> print t5.filteredByColumn(callback=is_numeric) ======================= LR df Prob ----------------------- 2.5386 1 0.1111 0.1214 1 0.7276 0.9517 1 0.3293 0.7383 1 0.3902 0.0000 1 0.9997 34.3081 1 0.0000 3.7986 1 0.0513 89.9766 1 0.0000 11.8912 1 0.0006 0.2121 1 0.6451 9.7474 1 0.0018 ----------------------- Appending tables ---------------- Tables may also be appended to each other, to make larger tables. We'll construct two simple tables to illustrate this. .. doctest:: >>> geneA = Table(['edge.name', 'edge.parent', 'z'], [['Human','root', ... 6.0],['Mouse','root', 6.0], ['Rat','root', 6.0]], ... title='Gene A') >>> geneB = Table(['edge.name', 'edge.parent', 'z'], [['Human','root', ... 7.0],['Mouse','root', 7.0], ['Rat','root', 7.0]], ... title='Gene B') >>> print geneB Gene B ================================== edge.name edge.parent z ---------------------------------- Human root 7.0000 Mouse root 7.0000 Rat root 7.0000 ---------------------------------- We now use the ``appended`` Table method to create a new table, specifying that we want a new column created (by passing the ``new_column`` argument a heading) in which the table titles will be placed. .. doctest:: >>> new = geneA.appended('Gene', geneB, title='Appended tables') >>> print new Appended tables ============================================ Gene edge.name edge.parent z -------------------------------------------- Gene A Human root 6.0000 Gene A Mouse root 6.0000 Gene A Rat root 6.0000 Gene B Human root 7.0000 Gene B Mouse root 7.0000 Gene B Rat root 7.0000 -------------------------------------------- We repeat this without adding a new column. ..
doctest:: >>> new = geneA.appended(None, geneB, title="Appended, no new column") >>> print new Appended, no new column ================================== edge.name edge.parent z ---------------------------------- Human root 6.0000 Mouse root 6.0000 Rat root 6.0000 Human root 7.0000 Mouse root 7.0000 Rat root 7.0000 ---------------------------------- Miscellaneous ------------- Tables have a ``Shape`` attribute, which specifies *x* (number of rows) and *y* (number of columns). The attribute is a tuple and we illustrate it for the above ``sub_table`` tables. Combined with the ``filtered`` method, this attribute can tell you how many rows satisfy a specific condition. .. doctest:: >>> t5.Shape (11, 5) >>> sub_table1.Shape (4, 5) >>> sub_table2.Shape (3, 5) For instance, 3 of the 11 rows in ``t5`` were significant and belonged to the ``Unco`` type. For completeness, we generate a table with no rows and assess its shape. .. doctest:: >>> func = lambda (ty, pr): ty == 'Unco' and pr > 0.1 >>> sub_table3 = t5.filtered(columns = ('type', 'Prob'), callback = func) >>> sub_table3.Shape (0, 5) The distinct values can be obtained for a single column, .. doctest:: >>> distinct = new.getDistinctValues("edge.name") >>> assert distinct == set(['Rat', 'Mouse', 'Human']) or multiple columns .. doctest:: >>> distinct = new.getDistinctValues(["edge.parent", "z"]) >>> assert distinct == set([('root', 6.0), ('root', 7.0)]), distinct We can compute column sums, assuming only numerical values in a column. .. doctest:: >>> assert new.summed('z') == 39., new.summed('z') We now construct an example with mixed numerical and non-numerical data and compute the column sum. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> mix = LoadTable(header=['A', 'B'], rows=[[0,''],[1,2],[3,4]]) >>> print mix ====== A B ------ 0 1 2 3 4 ------ >>> mix.summed('B', strict=False) 6 We also compute row sums for the pure numerical and mixed non-numerical/numerical rows.
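The non-strict summing just demonstrated can be mimicked with a small stand-alone helper (a hypothetical sketch of the behaviour, not the cogent code): values that do not support numeric addition are simply skipped.

```python
def tolerant_sum(values):
    # Add only the values that support addition with a number,
    # silently skipping non-numeric entries such as empty strings.
    total = 0
    for value in values:
        try:
            total += value
        except TypeError:
            continue
    return total

# mirrors mix.summed('B', strict=False) for the column B == ['', 2, 4]
print(tolerant_sum(['', 2, 4]))  # 6
```

With ``strict`` summing, by contrast, the non-numeric value would raise the ``TypeError`` rather than being skipped.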
For summing across rows we must specify the actual row index as an ``int``. .. doctest:: >>> mix.summed(0, col_sum=False, strict=False) 0 >>> mix.summed(1, col_sum=False) 3 We can compute the totals for all columns or rows too. .. doctest:: >>> mix.summed(strict=False) [4, 6] >>> mix.summed(col_sum=False, strict=False) [0, 3, 7] It is not currently possible to sum just a subset of columns/rows. We show this for rows here. .. doctest:: >>> mix.summed([0, 2], col_sum=False, strict=False) Traceback (most recent call last): RuntimeError: unknown indices type: [0, 2] We test these for a strictly numerical table. .. doctest:: >>> non_mix = LoadTable(header=['A', 'B'], rows=[[0,1],[1,2],[3,4]]) >>> non_mix.summed() [4, 7] >>> non_mix.summed(col_sum=False) [1, 3, 7] We can normalise a numerical table by row, .. doctest:: >>> print non_mix.normalized(by_row=True) ================ A B ---------------- 0.0000 1.0000 0.3333 0.6667 0.4286 0.5714 ---------------- or by column, such that the row/column sums are 1. .. doctest:: >>> print non_mix.normalized(by_row=False) ================ A B ---------------- 0.0000 0.1429 0.2500 0.2857 0.7500 0.5714 ---------------- We normalise by an arbitrary function (maximum value) by row, .. doctest:: >>> print non_mix.normalized(by_row=True, denominator_func=max) ================ A B ---------------- 0.0000 1.0000 0.5000 1.0000 0.7500 1.0000 ---------------- and by column. .. doctest:: >>> print non_mix.normalized(by_row=False, denominator_func=max) ================ A B ---------------- 0.0000 0.2500 0.3333 0.5000 1.0000 1.0000 ---------------- Extending tables ---------------- In some cases it is desirable to compute an additional column from existing column values. This is done using the ``withNewColumn`` method. We'll use ``t4`` from above, adding two of the columns to create an additional column. ..
doctest:: >>> t7 = t4.withNewColumn('Sum', callback="z+x", digits=2) >>> print t7 My Title ================================================================== edge.name edge.parent length x y z Sum ------------------------------------------------------------------ Human edge.0 4.00 1.00 3.00 6.00 7.00 HowlerMon edge.0 4.00 1.00 3.00 6.00 7.00 Mouse edge.1 4.00 1.00 3.00 6.00 7.00 NineBande root 4.00 1.00 3.00 6.00 7.00 DogFaced root 4.00 1.00 3.00 6.00 7.00 edge.0 edge.1 4.00 1.00 3.00 6.00 7.00 edge.1 root 4.00 1.00 3.00 6.00 7.00 ------------------------------------------------------------------ We test this with an externally defined function. .. doctest:: >>> func = lambda (x, y): x * y >>> t7 = t4.withNewColumn('Sum', callback=func, columns=("y","z"), ... digits=2) >>> print t7 My Title =================================================================== edge.name edge.parent length x y z Sum ------------------------------------------------------------------- Human edge.0 4.00 1.00 3.00 6.00 18.00 HowlerMon edge.0 4.00 1.00 3.00 6.00 18.00 Mouse edge.1 4.00 1.00 3.00 6.00 18.00 NineBande root 4.00 1.00 3.00 6.00 18.00 DogFaced root 4.00 1.00 3.00 6.00 18.00 edge.0 edge.1 4.00 1.00 3.00 6.00 18.00 edge.1 root 4.00 1.00 3.00 6.00 18.00 ------------------------------------------------------------------- >>> func = lambda x: x**3 >>> t7 = t4.withNewColumn('Sum', callback=func, columns="y", digits=2) >>> print t7 My Title =================================================================== edge.name edge.parent length x y z Sum ------------------------------------------------------------------- Human edge.0 4.00 1.00 3.00 6.00 27.00 HowlerMon edge.0 4.00 1.00 3.00 6.00 27.00 Mouse edge.1 4.00 1.00 3.00 6.00 27.00 NineBande root 4.00 1.00 3.00 6.00 27.00 DogFaced root 4.00 1.00 3.00 6.00 27.00 edge.0 edge.1 4.00 1.00 3.00 6.00 27.00 edge.1 root 4.00 1.00 3.00 6.00 27.00 ------------------------------------------------------------------- Sorting tables -------------- We 
want a table sorted according to values in a column. .. doctest:: >>> sorted = t5.sorted(columns = 'LR') >>> print sorted ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997 NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276 NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451 NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902 NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293 NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111 NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000 --------------------------------------------------------- We want a table sorted according to values in a subset of columns, note the order of columns determines the sort order. .. doctest:: >>> sorted = t5.sorted(columns=('LR', 'type')) >>> print sorted ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997 NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276 NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451 NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902 NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293 NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111 NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000 --------------------------------------------------------- We now do a sort based on 2 columns. .. 
doctest:: >>> sorted = t5.sorted(columns=('type', 'LR')) >>> print sorted ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997 NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276 NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451 NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902 NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293 NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111 NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000 NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 --------------------------------------------------------- Reverse sort a single column .. doctest:: >>> sorted = t5.sorted('LR', reverse = 'LR') >>> print sorted ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000 NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513 NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111 NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293 NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902 NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451 NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276 NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997 --------------------------------------------------------- Reverse sort one column but not another .. 
doctest:: >>> sorted = t5.sorted(columns=('type', 'LR'), reverse = 'LR') >>> print sorted ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000 NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111 NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293 NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902 NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451 NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276 NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997 NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513 --------------------------------------------------------- Reverse sort both columns. .. doctest:: >>> sorted = t5.sorted(columns=('type', 'LR'), reverse = ('type', 'LR')) >>> print sorted ========================================================= Gene type LR df Prob --------------------------------------------------------- NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000 NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006 NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018 NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513 NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000 NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111 NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293 NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902 NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451 NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276 NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997 --------------------------------------------------------- Joining Tables -------------- The Table object is capable of joining or merging records in two tables. There are two fundamental types of joins -- inner and outer -- each with different sub-types. We demonstrate these by first constructing some simple tables. .. doctest:: >>> a=Table(header=["index", "col2","col3"], ...
rows=[[1,2,3],[2,3,1],[2,6,5]], title="A") >>> print a A ===================== index col2 col3 --------------------- 1 2 3 2 3 1 2 6 5 --------------------- >>> b=Table(header=["index", "col2","col3"], ... rows=[[1,2,3],[2,2,1],[3,6,3]], title="B") >>> print b B ===================== index col2 col3 --------------------- 1 2 3 2 2 1 3 6 3 --------------------- >>> c=Table(header=["index","col_c2"],rows=[[1,2],[3,2],[3,5]],title="C") >>> print c C =============== index col_c2 --------------- 1 2 3 2 3 5 --------------- For a natural inner join, only one copy of columns with the same name is retained. So we expect the headings to be identical between the table ``a``/``b`` and the result of ``a.joined(b)`` or ``b.joined(a)``. .. doctest:: >>> assert a.joined(b).Header == b.Header >>> assert b.joined(a).Header == a.Header For a standard inner join, the joined table should contain all columns from ``a`` and ``b`` excepting the index column(s). Simply providing a column name (or index) selects this behaviour. Note that in this case, column names from the second table are made unique by prefixing them with that table's title. If the provided tables do not have a title, a ``RuntimeError`` is raised. .. doctest:: >>> b.Title = None >>> try: ... a.joined(b) ... except RuntimeError: ... pass >>> b.Title = 'B' >>> assert a.joined(b, "index").Header == ["index", "col2", "col3", ... "B_col2", "B_col3"] ... Note that the second table's title was used to prefix its column headings. We further test this using table ``c``, which has different dimensions. .. doctest:: >>> assert a.joined(c,"index").Header == ["index","col2","col3", ... "C_col_c2"] It's also possible to specify index columns using numerical values; the results should be the same. .. doctest:: >>> assert a.joined(b,[0, 2]).getRawData() ==\ ... a.joined(b,["index","col3"]).getRawData() Additionally, it's possible to provide two series of indices for the two tables. Here, they have identical values.
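The inner join on index columns described above can be sketched in plain Python. This is a minimal illustration of the idea only, not PyCogent's ``Table.joined()`` implementation; the ``inner_join`` helper is a hypothetical name, and the row data mirrors tables ``a`` and ``b`` from the text.

```python
# Minimal sketch of an inner join keyed on shared index columns.
def inner_join(rows_a, rows_b, key_a, key_b):
    # index the second table by the values in its key columns
    lookup = {}
    for row in rows_b:
        lookup.setdefault(tuple(row[i] for i in key_b), []).append(row)
    joined = []
    for row in rows_a:
        key = tuple(row[i] for i in key_a)
        for match in lookup.get(key, []):
            # keep all columns of the first row, then the non-key
            # columns of the matching row
            joined.append(row + [v for i, v in enumerate(match)
                                 if i not in key_b])
    return joined

# tables ``a`` and ``b`` from the text, joined on the "index" column (0)
a = [[1, 2, 3], [2, 3, 1], [2, 6, 5]]
b = [[1, 2, 3], [2, 2, 1], [3, 6, 3]]
result = inner_join(a, b, [0], [0])
```

Using every shared column as the key leaves no non-key columns to append, which reproduces the natural-join behaviour of keeping only rows identical between the two parents.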
.. doctest:: >>> assert a.joined(b, ["index", "col3"],["index", "col3"]).getRawData()\ ... == a.joined(b,["index","col3"]).getRawData() The results of a standard join between tables ``a`` and ``b`` are .. doctest:: >>> print a.joined(b, ["index"], title='A&B') A&B ========================================= index col2 col3 B_col2 B_col3 ----------------------------------------- 1 2 3 2 3 2 3 1 2 1 2 6 5 2 1 ----------------------------------------- We demo the table specific indices. .. doctest:: >>> print a.joined(c, ["col2"], ["index"], title='A&C by "col2/index"') A&C by "col2/index" ================================= index col2 col3 C_col_c2 --------------------------------- 2 3 1 2 2 3 1 5 --------------------------------- Tables ``a`` and ``c`` share a single row with the same value in the ``index`` column, hence a join by that index should return a table with just that row. .. doctest:: >>> print a.joined(c, "index", title='A&C by "index"') A&C by "index" ================================= index col2 col3 C_col_c2 --------------------------------- 1 2 3 2 --------------------------------- A natural join of tables ``a`` and ``b`` results in a table with only rows that were identical between the two parents. .. doctest:: >>> print a.joined(b, title='A&B Natural Join') A&B Natural Join ===================== index col2 col3 --------------------- 1 2 3 --------------------- We test the outer join by defining an additional table with different dimensions, and conducting a join specifying ``inner_join=False``. .. 
doctest:: >>> d=Table(header=["index", "col_c2"], rows=[[5,42],[6,23]], title="D") >>> print d D =============== index col_c2 --------------- 5 42 6 23 --------------- >>> print c.joined(d,inner_join=False, title='C&D Outer join') C&D Outer join ====================================== index col_c2 D_index D_col_c2 -------------------------------------- 1 2 5 42 1 2 6 23 3 2 5 42 3 2 6 23 3 5 5 42 3 5 6 23 -------------------------------------- We establish the ``joined`` method works for mixtures of character and numerical data, setting some indices and some cell values to be strings. .. doctest:: >>> a=Table(header=["index", "col2","col3"], ... rows=[[1,2,"3"],["2",3,1],[2,6,5]], title="A") >>> b=Table(header=["index", "col2","col3"], ... rows=[[1,2,"3"],["2",2,1],[3,6,3]], title="B") >>> assert a.joined(b, ["index", "col3"],["index", "col3"]).getRawData()\ ... == a.joined(b,["index","col3"]).getRawData() We test that the ``joined`` method works when the column index orders differ. .. doctest:: >>> t1_header = ['a', 'b'] >>> t1_rows = [(1,2),(3,4)] >>> t2_header = ['b', 'c'] >>> t2_rows = [(3,6),(4,8)] >>> t1 = Table(header = t1_header, rows = t1_rows, title='t1') >>> t2 = Table(header = t2_header, rows = t2_rows, title='t2') >>> t3 = t1.joined(t2, columns_self = ["b"], columns_other = ["b"]) >>> print t3 ============== a b t2_c -------------- 3 4 8 -------------- We then establish that a join with no values does not cause a failure, just returns an empty ``Table``. .. doctest:: >>> t4_header = ['b', 'c'] >>> t4_rows = [(5,6),(7,8)] >>> t4 = LoadTable(header = t4_header, rows = t4_rows) >>> t4.Title = 't4' >>> t5 = t1.joined(t4, columns_self = ["b"], columns_other = ["b"]) >>> print t5 ============== a b t4_c -------------- -------------- Whose representation looks like .. doctest:: >>> t5 Table(numrows=0, numcols=3, header=['a', 'b', 't4_c'], rows=[]) Transposing a table ------------------- Tables can be transposed. .. 
doctest:: >>> from cogent import LoadTable >>> title='#Full OTU Counts' >>> header = ['#OTU ID', '14SK041', '14SK802'] >>> rows = [[-2920, '332', 294], ... [-1606, '302', 229], ... [-393, 141, 125], ... [-2109, 138, 120], ... [-5439, 104, 117], ... [-1834, 70, 75], ... [-18588, 65, 47], ... [-1350, 60, 113], ... [-2160, 57, 52], ... [-11632, 47, 36]] >>> table = LoadTable(header=header,rows=rows,title=title) >>> print table #Full OTU Counts ============================= #OTU ID 14SK041 14SK802 ----------------------------- -2920 332 294 -1606 302 229 -393 141 125 -2109 138 120 -5439 104 117 -1834 70 75 -18588 65 47 -1350 60 113 -2160 57 52 -11632 47 36 ----------------------------- We now transpose this. We require a new column heading for header data and an identifier for which existing column will become the header (default is index 0). .. doctest:: >>> tp = table.transposed(new_column_name='sample', ... select_as_header='#OTU ID', space=2) ... >>> print tp ============================================================================== sample -2920 -1606 -393 -2109 -5439 -1834 -18588 -1350 -2160 -11632 ------------------------------------------------------------------------------ 14SK041 332 302 141 138 104 70 65 60 57 47 14SK802 294 229 125 120 117 75 47 113 52 36 ------------------------------------------------------------------------------ We check that transposing with the default ``select_as_header`` gives the same result. .. doctest:: >>> tp = table.transposed(new_column_name='sample', space=2) ... >>> print tp ============================================================================== sample -2920 -1606 -393 -2109 -5439 -1834 -18588 -1350 -2160 -11632 ------------------------------------------------------------------------------ 14SK041 332 302 141 138 104 70 65 60 57 47 14SK802 294 229 125 120 117 75 47 113 52 36 ------------------------------------------------------------------------------ We test transposition selecting a different column to become the header. ..
doctest:: >>> tp = table.transposed(new_column_name='sample', ... select_as_header='14SK802', space=2) ... >>> print tp ============================================================================== sample 294 229 125 120 117 75 47 113 52 36 ------------------------------------------------------------------------------ #OTU ID -2920 -1606 -393 -2109 -5439 -1834 -18588 -1350 -2160 -11632 14SK041 332 302 141 138 104 70 65 60 57 47 ------------------------------------------------------------------------------ Counting rows ------------- We can count the number of rows for which a condition holds. This method uses the same arguments as ``filtered`` but returns an integer result only. .. doctest:: >>> print c.count("col_c2 == 2") 2 >>> print c.joined(d,inner_join=False).count("index==3 and D_index==5") 2 Testing a sub-component ----------------------- Before using ``Table``, we exercise some formatting code: .. doctest:: >>> from cogent.format.table import formattedCells, phylipMatrix, latex We check we can format an arbitrary 2D list, without a header, using the ``formattedCells`` function directly. .. doctest:: >>> data = [[230, 'acdef', 1.3], [6, 'cc', 1.9876]] >>> head = ['one', 'two', 'three'] >>> header, formatted = formattedCells(data, header = head) >>> print formatted [['230', 'acdef', '1.3000'], [' 6', ' cc', '1.9876']] >>> print header ['one', ' two', ' three'] We directly test the latex formatting. .. doctest:: >>> print latex(formatted, header, justify='lrl', caption='A legend', ... label="table:test") \begin{longtable}[htp!]{ l r l } \hline \bf{one} & \bf{two} & \bf{three} \\ \hline \hline 230 & acdef & 1.3000 \\ 6 & cc & 1.9876 \\ \hline \caption{A legend} \label{table:test} \end{longtable} .. Import the ``os`` module so some file cleanup can be done at the end. To check the contents of those files, just delete the following prior to running the test. 
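The column-justification behaviour of ``formattedCells`` shown above can be approximated in a few lines of plain Python. This is a sketch only: the real function also applies numeric formatting (e.g. ``1.3000``), which is omitted here, and ``format_cells`` is a hypothetical name, not part of ``cogent.format.table``.

```python
# Sketch of header/cell justification: stringify a 2D list, then
# right-justify every column to the width of its widest entry.
def format_cells(rows, header):
    str_rows = [[str(v) for v in row] for row in rows]
    cols = list(zip(*([header] + str_rows)))
    widths = [max(len(v) for v in col) for col in cols]
    new_header = [h.rjust(w) for h, w in zip(header, widths)]
    justified = [[v.rjust(w) for v, w in zip(row, widths)]
                 for row in str_rows]
    return new_header, justified

header, formatted = format_cells([[230, 'acdef', 1.3], [6, 'cc', 1.9876]],
                                 ['one', 'two', 'three'])
```

The padded header (``['one', '  two', ' three']``) matches the output of ``formattedCells`` in the text; the cell values differ only because no float formatting is applied.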
The try/except clause below is aimed at the case where ``junk.pdf`` wasn't created due to ``reportlab`` not being present. .. doctest:: :hide: >>> import os >>> to_delete = ['t3.pickle', 't2.csv', 't2.csv.gz', 't3.tab', ... 'test3b.txt'] >>> for f in to_delete: ... try: ... os.remove(f) ... except OSError: ... pass PyCogent-1.5.3/tests/test_util/test_transform.py #!/usr/bin/env python """Tests of transformation and composition functions.""" from cogent.util.unit_test import TestCase, main from cogent.util.misc import identity from cogent.util.transform import apply_each, bools, bool_each, \ conjoin, all, both, \ disjoin, any, either, negate, none, neither, compose, compose_many, \ per_shortest, per_longest, for_seq, \ has_field, extract_field, test_field, index, test_container, \ trans_except, trans_all, make_trans, find_any, find_no, find_all, \ keep_if_more, exclude_if_more, keep_if_more_other, exclude_if_more_other, \ keep_chars, exclude_chars, reorder, reorder_inplace, float_from_string, \ first, last, first_in_set, last_in_set, first_not_in_set, last_not_in_set, \ first_index, last_index, first_index_in_set, last_index_in_set, \ first_index_not_in_set, last_index_not_in_set, perm, comb, cross_comb, _increment_comb __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Zongzhi Liu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class has_x(object): #convenience class for has_field and related functions def __init__(self, x): self.x = x def __hash__(self): return hash(self.x) def __str__(self): return str(self.x) class has_y(object): #convenience class for has_field and related functions def __init__(self, y): self.y = y def __hash__(self): return hash(self.y) def __str__(self): return str(self.y) class
metafunctionsTests(TestCase): """Tests of standalone functions.""" def setUp(self): """Define some standard functions and data.""" self.Numbers = range(20) self.SmallNumbers = range(3) self.SmallNumbersRepeated = range(5) * 4 self.Letters = 'abcde' self.Mixed = list(self.Letters) + range(5) self.firsts = 'ab2' self.seconds = '0bc' self.is_char = lambda x: isinstance(x, str) and len(x) == 1 self.is_vowel = lambda x: x in 'aeiou' self.is_consonant = lambda x: x not in 'aeiuo' self.is_number = lambda x: isinstance(x, int) self.is_odd_number = lambda x: x%2 self.is_odd_letter = lambda x: x in 'acegikmoqs' self.is_zero = lambda x: x == 0 self.is_small = lambda x: x < 3 self.double = lambda x: x * 2 self.minusone = lambda x: x - 1 #function to test *args, **kwargs) self.is_alpha_digit = lambda first, second: \ first.isalpha() and second.isdigit() self.is_digit_alpha = lambda first, second: \ first.isdigit() and second.isalpha() def test_apply_each(self): """apply_each should apply each function to args, kwargs""" self.assertEqual(apply_each( \ [self.is_char, self.is_vowel, self.is_consonant, self.is_number], \ self.Letters[0]), [True, True, False, False]) self.assertEqual(apply_each( \ [self.is_char, self.is_vowel, self.is_consonant, self.is_number], \ self.Letters[1]), [True, False, True, False]) self.assertEqual(apply_each( \ [self.double, self.minusone], self.SmallNumbers[0]), [0, -1]) self.assertEqual(apply_each( \ [self.double, self.minusone], self.SmallNumbers[1]), [2, 0]) expects = [[True, False], [False, False], [False, True]] for i in range(len(expects)): self.assertEqual(apply_each( \ [self.is_alpha_digit, self.is_digit_alpha], self.firsts[i], self.seconds[i]), expects[i]) self.assertEqual(apply_each( \ [self.is_alpha_digit, self.is_digit_alpha], self.firsts[i], second=self.seconds[i]), expects[i]) self.assertEqual(apply_each( \ [self.is_alpha_digit, self.is_digit_alpha], second=self.seconds[i], first=self.firsts[i]), expects[i]) def test_bools(self): """bools 
should convert items to True or False.""" self.assertEqual(bools(self.Letters), [True]*5) self.assertEqual(bools(self.Numbers), [False] + [True]*19) def test_bool_each(self): """bool_each should return boolean version of applying each f to args""" self.assertEqual(bool_each([self.double, self.minusone], \ self.SmallNumbers[0]), [False, True]) self.assertEqual(bool_each([self.double, self.minusone], \ self.SmallNumbers[1]), [True, False]) def test_conjoin(self): """conjoin should return True if all components True""" self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel],'a'), True) self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel], x='b'), False) self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel],'c'), False) self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel],'e'), True) #technically, this one should be true as well, but I left it off to #have an even vowel test case... self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel],'u'), False) #should short-circuit, i.e. not evaluate later cases after False self.assertEqual(conjoin([self.is_odd_letter, self.fail], 'b'), False) self.assertRaises(AssertionError, conjoin, \ [self.is_odd_letter, self.fail], 'a') def test_all(self): """all should return a function returning True if all components True""" odd_vowel = all([self.is_odd_letter, self.is_vowel, self.is_char]) self.assertEqual(odd_vowel('a'), True) self.assertEqual(map(odd_vowel, 'abceu'), [True,False,False,True,False]) odd_number = all([self.is_odd_number, self.is_number]) self.assertEqual(map(odd_number, range(5)), [False,True]*2+[False]) #should short-circuit, i.e. 
not evaluate later cases after False self.assertEqual(all([self.is_odd_letter, self.fail])('b'), False) self.assertRaises(AssertionError, all([self.is_odd_letter,self.fail]),\ 'a') def test_both(self): """both should return True if both components True""" odd_vowel = both(self.is_odd_letter, self.is_vowel) self.assertEqual(map(odd_vowel, 'abcu'), [True,False,False,False]) #should short-circuit self.assertEqual(both(self.is_odd_letter, self.fail)('b'), False) self.assertRaises(AssertionError, both(self.is_odd_letter, self.fail),\ 'a') def test_disjoin(self): """disjoin should return True if any component True""" self.assertEqual(disjoin([self.is_odd_letter,self.is_vowel], 'a'), True) self.assertEqual(disjoin([self.is_odd_letter,self.is_vowel], 'b'),False) self.assertEqual(disjoin([self.is_odd_letter,self.is_vowel], 'c'), True) self.assertEqual(disjoin([self.is_odd_letter,self.is_vowel], x='u'), True) #should short-circuit after first True self.assertEqual(disjoin([self.is_odd_letter, self.fail], 'a'), True) self.assertRaises(AssertionError, \ disjoin, [self.is_odd_letter, self.fail], 'b') def test_any(self): """any should return a function returning True if any component True""" odd_vowel = any([self.is_odd_letter, self.is_vowel]) self.assertEqual(odd_vowel('a'), True) self.assertEqual(map(odd_vowel, 'abceu'), [True,False,True,True,True]) odd = any([self.is_odd_number, self.is_small]) self.assertEqual(map(odd, range(5)), [True]*4+[False]) #should short-circuit after first True self.assertEqual(any([self.is_odd_letter, self.fail])(x='a'), True) self.assertRaises(AssertionError, any([self.is_odd_letter,self.fail]),\ 'b') def test_either(self): """either should return function returning True if either component True""" odd_vowel = either(self.is_odd_letter, self.is_vowel) self.assertEqual(map(odd_vowel, 'abcu'), [True,False,True,True]) #should short-circuit self.assertEqual(either(self.is_odd_letter, self.fail)(x='a'), True) self.assertRaises(AssertionError, \ 
either(self.is_odd_letter, self.fail), 'b') def test_negate(self): """negate should return True if no component True""" self.assertEqual(negate([self.is_odd_letter,self.is_vowel], 'a'), False) self.assertEqual(negate([self.is_odd_letter,self.is_vowel], 'b'), True) self.assertEqual(negate([self.is_odd_letter,self.is_vowel], 'c'), False) self.assertEqual(negate([self.is_odd_letter,self.is_vowel], 'u'), False) #should short-circuit after first True self.assertEqual(negate([self.is_odd_letter, self.fail], x='a'), False) self.assertRaises(AssertionError, \ negate, [self.is_odd_letter, self.fail], 'b') def test_none(self): """none should return a function returning True if no component True""" odd_vowel = none([self.is_odd_letter, self.is_vowel]) self.assertEqual(odd_vowel('a'), False) self.assertEqual(map(odd_vowel, 'abceu'), [False,True] + [False]*3) odd = none([self.is_odd_number, self.is_small]) self.assertEqual(map(odd, range(5)), [False]*4+[True]) #should short-circuit after first True self.assertEqual(none([self.is_odd_letter, self.fail])(x='a'), False) self.assertRaises(AssertionError, none([self.is_odd_letter,self.fail]),\ 'b') def test_neither(self): """neither should return function returning True if each component False""" odd_vowel = neither(self.is_odd_letter, self.is_vowel) self.assertEqual(map(odd_vowel, 'abcu'), [False,True,False,False]) #should short-circuit self.assertEqual(neither(self.is_odd_letter, self.fail)(x='a'), False) self.assertRaises(AssertionError, \ neither(self.is_odd_letter, self.fail), 'b') def test_compose(self): """compose should return function returning f(g(x))""" ds = compose(self.double, self.minusone) sd = compose(self.minusone, self.double) self.assertEqual(ds(5), 8) self.assertEqual(sd(x=5), 9) #check that it works when arg lists are different commafy = compose(','.join, list) self.assertEqual(commafy('abc'), 'a,b,c') self.assertEqual(commafy(''), '') self.assertEqual(commafy('a'), 'a') def test_compose_many(self): 
"""compose_many should return composition of all args""" from numpy import arange def to_strings(x): return map(str, x) printable_range = compose_many(''.join, to_strings, range) printable_arange = compose_many(''.join, to_strings, arange) self.assertEqual(printable_range(3), '012') self.assertEqual(printable_range(0), '') self.assertEqual(printable_range(5), '01234') self.assertEqual(printable_arange(stop=51, start=10, step=10), '1020304050') def test_identity(self): """identity should return x""" for i in ['a', 'abc', None, '', [], [1], 1, 2**50, 0.3e-50, {'a':3}]: assert identity(i) is i def test_has_field(self): """has_field should return True if specified field exists.""" x = has_x(1) y = has_y(1) check_x = has_field('x') self.assertEqual(check_x(x), True) self.assertEqual(check_x(y), False) check_y = has_field('y') self.assertEqual(check_y(x), False) self.assertEqual(check_y(y), True) del y.y self.assertEqual(check_y(y), False) y.x = 3 self.assertEqual(check_x(y), True) def test_extract_field(self): """extract_field should apply constructor to field, or return None""" num = has_x('1') alpha = has_x('x') y = has_y('1') extractor = extract_field('x') self.assertEqual(extractor(num), '1') self.assertEqual(extractor(alpha), 'x') self.assertEqual(extractor(y), None) int_extractor = extract_field('x', int) self.assertEqual(int_extractor(num), 1) self.assertEqual(int_extractor(alpha), None) self.assertEqual(int_extractor(y), None) def test_test_field(self): """test_field should return boolean result of applying constructor""" num = has_x('5') alpha = has_x('x') zero = has_x(0) y = has_y('5') tester = test_field('x') self.assertEqual(tester(num), True) self.assertEqual(tester(alpha), True) self.assertEqual(tester(y), False) int_tester = test_field('x', int) self.assertEqual(int_tester(num), True) self.assertEqual(int_tester(alpha), False) self.assertEqual(int_tester(y), False) self.assertEqual(int_tester(zero), False) def test_index(self): """index should index 
objects by specified field or identity""" num = has_x(5) let = has_x('5') zer = has_x('0') non = has_x(None) y = has_y(3) items = [num, let, zer, non, y] duplicates = items * 3 basic_indexer = index() i = basic_indexer(items) self.assertEqual(i, {num:[num], let:[let], zer:[zer], non:[non], y:[y]}) #test reusability i = basic_indexer([3,3,4]) self.assertEqual(i, {3:[3, 3], 4:[4]}) #test duplicates d = basic_indexer(duplicates) self.assertEqual(d, {num:[num]*3, let:[let]*3, zer:[zer]*3, \ non:[non]*3, y:[y]*3}) #test with constructor str_indexer = index(str) i = str_indexer(items) self.assertEqual(i, {'5':[num,let], '0':[zer], 'None':[non], '3':[y]}) #test order correct in duplicates i = str_indexer(duplicates) self.assertEqual(i, {'5':[num,let,num,let,num,let], '0':[zer,zer,zer], 'None':[non,non,non], '3':[y,y,y]}) #test with squashing overwriter = index(str, overwrite=True) i = overwriter(duplicates) self.assertEqual(i, {'5':let, '0':zer, 'None':non, '3':y}) def test_test_container(self): """test_container should return True or False in a typesafe way.""" test_dict = test_container({'a':1}) test_list = test_container([1,2,3]) test_str = test_container('438hfanvr438') for item in (1, 2, 3): assert test_list(item) assert not test_dict(item) assert not test_str(item) assert test_dict('a') assert not test_list('a') assert test_str('a') for item in ('4', 'h', 'fan'): assert not test_dict(item) assert not test_list(item) assert test_str(item) for item in (['x','y'],{},{'a':3},'@#@',('a','b'),None,False): assert not test_dict(item) assert not test_list(item) assert not test_str(item) class SequenceFunctionsTests(TestCase): """Tests of standalone functions for dealing with sequences.""" def test_per_shortest(self): """per_shortest should divide by min(len(x), len(y))""" self.assertEqual(per_shortest(20, 'aaaaaa', 'bbbb'), 5) self.assertEqual(per_shortest(20, 'aaaaaa', 'b'), 20) self.assertEqual(per_shortest(20, 'a', 'bbbbb'), 20) self.assertEqual(per_shortest(20, '', 'b'), 
0) self.assertEqual(per_shortest(20, '', ''), 0) #check that it does it in floating-point self.assertEqual(per_shortest(1, 'aaaaaa', 'bbbb'), 0.25) #check that it raises TypeError on non-seq self.assertRaises(TypeError, per_shortest, 1, 2, 3) def test_per_longest(self): """per_longest should divide by max(len(x), len(y))""" self.assertEqual(per_longest(20, 'aaaaaa', 'bbbb'), 20/6.0) self.assertEqual(per_longest(20, 'aaaaaa', 'b'), 20/6.0) self.assertEqual(per_longest(20, 'a', 'bbbbb'), 20/5.0) self.assertEqual(per_longest(20, '', 'b'), 20) self.assertEqual(per_longest(20, '', ''), 0) #check that it does it in floating-point self.assertEqual(per_longest(1, 'aaaaaa', 'bbbb'), 1/6.0) #check that it raises TypeError on non-seq self.assertRaises(TypeError, per_longest, 1, 2, 3) def test_for_seq(self): """for_seq should return the correct function""" is_eq = lambda x,y: x == y is_ne = lambda x,y: x != y lt_5 = lambda x,y: x + y < 5 diff = lambda x,y: x - y sumsq = lambda x: sum([i*i for i in x]) long_norm = lambda s, x, y: (s + 0.0) / max(len(x), len(y)) times_two = lambda s, x, y: 2*s empty = [] s1 = [1,2,3,4,5] s2 = [1,3,2,4,5] s3 = [1,1,1,1,1] s4 = [5,5,5,5,5] s5 = [3,3,3,3,3] short = [1] #test behavior with default aggregator and normalizer f = for_seq(is_eq) self.assertFloatEqual(f(s1, s1), 1.0) self.assertFloatEqual(f(s1, short), 1.0) self.assertFloatEqual(f(short, s1), 1.0) self.assertFloatEqual(f(short, s4), 0.0) self.assertFloatEqual(f(s4, short), 0.0) self.assertFloatEqual(f(s1,s2), 0.6) f = for_seq(is_ne) self.assertFloatEqual(f(s1, s1), 0.0) self.assertFloatEqual(f(s1, short), 0.0) self.assertFloatEqual(f(short, s1), 0.0) self.assertFloatEqual(f(short, s4), 1.0) self.assertFloatEqual(f(s4, short), 1.0) self.assertFloatEqual(f(s1, s2), 0.4) f = for_seq(lt_5) self.assertFloatEqual(f(s3,s3), 1.0) self.assertFloatEqual(f(s3,s4), 0.0) self.assertFloatEqual(f(s2,s3), 0.6) f = for_seq(diff) self.assertFloatEqual(f(s1,s1), 0.0) self.assertFloatEqual(f(s4,s1), 2.0) 
self.assertFloatEqual(f(s1,s4), -2.0) #test behavior with different aggregator f = for_seq(diff) self.assertFloatEqual(f(s1,s5), 0) f = for_seq(diff, aggregator=sum) self.assertFloatEqual(f(s1,s5), 0) f = for_seq(diff, aggregator=sumsq) self.assertFloatEqual(f(s1,s5), 2.0) #test behavior with different normalizer f = for_seq(diff, aggregator=sumsq, normalizer=None) self.assertFloatEqual(f(s1,s5), 10) f = for_seq(diff, aggregator=sumsq) self.assertFloatEqual(f(s1,s5), 2.0) f = for_seq(diff, aggregator=sumsq, normalizer=times_two) self.assertFloatEqual(f(s1,s5), 20) f = for_seq(diff, aggregator=sumsq) self.assertFloatEqual(f(s5,short), 4) f = for_seq(diff, aggregator=sumsq, normalizer=long_norm) self.assertFloatEqual(f(s5,short), 0.8) class Filter_Criteria_Tests(TestCase): """Tests of standalone functions used as filter criteria""" def test_trans_except(self): """trans_except should return trans table mapping non-good chars to x""" a = trans_except('Aa', '-') none = trans_except('', '*') some = trans_except('zxcvbnm,.zxcvbnm,.', 'V') self.assertEqual('abcABA'.translate(a), 'a--A-A') self.assertEqual(''.translate(a), '') self.assertEqual('12345678'.translate(a), '--------') self.assertEqual(''.translate(none), '') self.assertEqual('abcdeEFGHI12345&*(!@'.translate(none), '*'*20) self.assertEqual('qazwsxedcrfv'.translate(some),'VVzVVxVVcVVv') def test_trans_all(self): """trans_all should return trans table mapping all bad chars to x""" a = trans_all('Aa', '-') none = trans_all('', '*') some = trans_all('zxcvbnm,.zxcvbnm,.', 'V') self.assertEqual('abcABA'.translate(a), '-bc-B-') self.assertEqual(''.translate(a), '') self.assertEqual('12345678'.translate(a), '12345678') self.assertEqual(''.translate(none), '') self.assertEqual('abcdeEFGHI12345&*(!@'.translate(none), \ 'abcdeEFGHI12345&*(!@') self.assertEqual('qazwsxedcrfv'.translate(some),'qaVwsVedVrfV') def test_make_trans(self): """make_trans should return trans table mapping chars to default""" a = make_trans() 
self.assertEqual('abc123'.translate(a), 'abc123') a = make_trans('a', 'x') self.assertEqual('abc123'.translate(a), 'xbc123') a = make_trans('ac', 'xa') self.assertEqual('abc123'.translate(a), 'xba123') a = make_trans('ac', 'xa', '.') self.assertEqual('abc123'.translate(a), 'x.a...') self.assertRaises(ValueError, make_trans, 'ac', 'xa', 'av') def test_find_any(self): """find_any should be True if one of the words is in the string""" f = find_any('ab') self.assertEqual(f(''),0) #empty self.assertRaises(AttributeError,f,None) # none self.assertEqual(f('cde'),0) #none of the elements self.assertEqual(f('axxx'),1) #one of the elements self.assertEqual(f('bxxx'),1) #one of the elements self.assertEqual(f('axxxb'),1) #all elements self.assertEqual(f('aaaa'),1) #repeated element # works on any sequence f = find_any(['foo','bar']) self.assertEqual(f("joe"),0) self.assertEqual(f("only foo"),1) self.assertEqual(f("bar and foo"),1) # does NOT work on numbers def test_find_no(self): """find_no should be True if none of the words in the string""" f = find_no('ab') self.assertEqual(f(''),1) #empty self.assertRaises(AttributeError,f,None) # none self.assertEqual(f('cde'),1) #none of the elements self.assertEqual(f('axxx'),0) #one of the elements self.assertEqual(f('bxxx'),0) #one of the elements self.assertEqual(f('axxxb'),0) #all elements self.assertEqual(f('aaaa'),0) #repeated element # works on any sequence f = find_no(['foo','bar']) self.assertEqual(f("joe"),1) self.assertEqual(f("only foo"),0) self.assertEqual(f("bar and foo"),0) # does NOT work on numbers def test_find_all(self): """find_all should be True if all words appear in the string""" f = find_all('ab') self.assertEqual(f(''),0) #empty self.assertRaises(AttributeError,f,None) # none self.assertEqual(f('cde'),0) #none of the elements self.assertEqual(f('axxx'),0) #one of the elements self.assertEqual(f('bxxx'),0) #one of the elements self.assertEqual(f('axxxb'),1) #all elements self.assertEqual(f('aaaa'),0) #repeated 
element # works on any sequence f = find_all(['foo','bar']) self.assertEqual(f("joe"),0) self.assertEqual(f("only foo"),0) self.assertEqual(f("bar and foo"),1) # does NOT work on numbers def test_keep_if_more(self): """keep_if_more should be True if #items in s > x""" self.assertRaises(ValueError, keep_if_more,'lksfj','ksfd') #not int self.assertRaises(IndexError,keep_if_more,'ACGU',-3) #negative f = keep_if_more('a',0) #zero self.assertEqual(f(''),0) self.assertEqual(f('a'),1) self.assertEqual(f('b'),0) # works on strings f = keep_if_more('ACGU',5) #positive self.assertEqual(f(''),0) self.assertEqual(f('ACGUAGCUioooNNNNNA'),1) self.assertEqual(f('NNNNNNN'),0) # works on words f = keep_if_more(['foo'],1) self.assertEqual(f(''),0) self.assertEqual(f(['foo', 'bar','foo']),1) self.assertEqual(f(['joe']),0) # works on numbers f = keep_if_more([0,1],3) self.assertEqual(f(''),0) self.assertEqual(f([0,1,2,3,4,5]),0) self.assertEqual(f([0,1,0,1]),1) def test_exclude_if_more(self): """exclude_if_more should be True if #items in s <= x""" self.assertRaises(ValueError, exclude_if_more,'lksfj','ksfd') #not int self.assertRaises(IndexError,exclude_if_more,'ACGU',-3) #negative f = exclude_if_more('a',0) #zero self.assertEqual(f(''),1) self.assertEqual(f('a'),0) self.assertEqual(f('b'),1) # works on strings f = exclude_if_more('ACGU',5) #positive self.assertEqual(f(''),1) self.assertEqual(f('ACGUAGCUioooNNNNNA'),0) self.assertEqual(f('NNNNNNN'),1) # works on words f = exclude_if_more(['foo'],1) self.assertEqual(f(''),1) self.assertEqual(f(['foo', 'bar','foo']),0) self.assertEqual(f(['joe']),1) # works on numbers f = exclude_if_more([0,1],3) self.assertEqual(f(''),1) self.assertEqual(f([0,1,2,3,4,5]),1) self.assertEqual(f([0,1,0,1]),0) def test_keep_if_more_other(self): """keep_if_more_other should be True if #other items > x""" self.assertRaises(ValueError, keep_if_more_other,'lksfj','ks') #not int self.assertRaises(IndexError,keep_if_more_other,'ACGU',-3) #negative f = 
keep_if_more_other('a',0) #zero self.assertEqual(f(''),0) self.assertEqual(f('a'),0) self.assertEqual(f('b'),1) # works on strings f = keep_if_more_other('ACGU',5) #positive self.assertEqual(f(''),0) self.assertEqual(f('ACGUNNNNN'),0) self.assertEqual(f('ACGUAGCUioooNNNNNA'),1) self.assertEqual(f('NNNNNNN'),1) # works on words f = keep_if_more_other(['foo'],1) self.assertEqual(f(''),0) self.assertEqual(f(['foo', 'bar','foo']),0) self.assertEqual(f(['joe','oef']),1) # works on numbers f = keep_if_more_other([0,1],3) self.assertEqual(f(''),0) self.assertEqual(f([0,1,2,3,4,5]),1) self.assertEqual(f([0,1,0,1]),0) def test_exclude_if_more_other(self): """exclude_if_more_other should be True if #other items <= x""" self.assertRaises(ValueError, exclude_if_more_other,'lks','ks') #not int self.assertRaises(IndexError,exclude_if_more_other,'ACGU',-3) #negative f = exclude_if_more_other('a',0) #zero self.assertEqual(f(''),1) self.assertEqual(f('a'),1) self.assertEqual(f('b'),0) # works on strings f = exclude_if_more_other('ACGU',5) #positive self.assertEqual(f(''),1) self.assertEqual(f('ACGUNNNNN'),1) self.assertEqual(f('ACGUAGCUioooNNNNNA'),0) self.assertEqual(f('NNNNNNN'),0) # works on words f = exclude_if_more_other(['foo'],1) self.assertEqual(f(''),1) self.assertEqual(f(['foo', 'bar','foo']),1) self.assertEqual(f(['joe','oef']),0) # works on numbers f = exclude_if_more_other([0,1],3) self.assertEqual(f(''),1) self.assertEqual(f([0,1,2,3,4,5]),0) self.assertEqual(f([0,1,0,1]),1) def test_keep_chars(self): """keep_chars returns a string containing only chars in keep""" f = keep_chars('ab c3*[') self.assertEqual(f(''),'') #empty self.assertRaises(AttributeError,f,None) #None #one character, case sensitive self.assertEqual(f('b'),'b') self.assertEqual(f('g'),'') self.assertEqual(f('xyz123'),'3') self.assertEqual(f('xyz 123'),' 3') #more characters, case sensitive self.assertEqual(f('kjbwherzcagebcujrkcs'),'bcabcc') self.assertEqual(f('f[ffff*ff*fff3fff'),'[**3') # case 
insensitive f = keep_chars('AbC',False) self.assertEqual(f('abcdef'),'abc') self.assertEqual(f('ABCDEF'),'ABC') self.assertEqual(f('aBcDeF'),'aBc') def test_exclude_chars(self): """exclude_chars returns string containing only chars not in exclude""" f = exclude_chars('ab c3*[') self.assertEqual(f(''),'') #empty self.assertRaises(AttributeError,f,None) #None #one character, case sensitive self.assertEqual(f('b'),'') self.assertEqual(f('g'),'g') self.assertEqual(f('xyz123'),'xyz12') self.assertEqual(f('xyz 123'),'xyz12') #more characters, case sensitive self.assertEqual(f('axxxbxxxcxxx'),'xxxxxxxxx') # case insensitive f = exclude_chars('AbC',False) self.assertEqual(f('abcdef'),'def') self.assertEqual(f('ABCDEF'),'DEF') self.assertEqual(f('aBcDeF'),'DeF') def test_reorder(self): """reorder should always use the same order when invoked""" list_test = reorder([3,2,1]) dict_test = reorder(['x','y','z']) multi_test = reorder([3,2,2]) null_test = reorder([]) first_seq = 'abcde' second_seq = [3,4,5,6,7] empty_list = [] empty_dict = {} full_dict = {'a':3, 'c':5, 'x':'abc','y':'234','z':'qaz'} for i in (first_seq, second_seq, empty_list, empty_dict): self.assertEqual(null_test(i), []) self.assertEqual(list_test(first_seq), ['d','c','b']) self.assertEqual(list_test(second_seq), [6,5,4]) self.assertEqual(multi_test(first_seq), ['d','c','c']) self.assertEqual(dict_test(full_dict), ['abc','234','qaz']) self.assertRaises(KeyError, dict_test, empty_dict) self.assertRaises(IndexError, list_test, empty_list) def test_reorder_inplace(self): """reorder_inplace should replace object's data with new order""" attr_test = reorder_inplace([3,2,1], 'Data') obj_test = reorder_inplace([3,2,2]) seq = [3,4,5,6,7] class obj(object): pass o = obj() o.XYZ = [9, 7, 5] o.Data = ['a','b','c','d','e'] orig_data = o.Data self.assertEqual(obj_test(seq), [6,5,5]) self.assertEqual(seq, [6,5,5]) assert attr_test(o) is o self.assertEqual(o.XYZ, [9,7,5]) self.assertEqual(o.Data, ['d','c','b']) assert 
orig_data is o.Data def test_float_from_string(self): """float_from_string should ignore funny chars""" ffs = float_from_string self.assertEqual(ffs('3.5'), 3.5) self.assertEqual(ffs(' -3.45e-10 '), float(' -3.45e-10 ')) self.assertEqual(ffs('jsdjhsdf[]()0.001IVUNZSDFl]]['), 0.001) def test_first_index(self): """first_index should return index of first occurrence where f(s)""" vowels = 'aeiou' is_vowel = lambda x: x in vowels s1 = 'ebcua' s2 = 'bcbae' s3 = '' s4 = 'cbd' self.assertEqual(first_index(s1, is_vowel), 0) self.assertEqual(first_index(s2, is_vowel), 3) self.assertEqual(first_index(s3, is_vowel), None) self.assertEqual(first_index(s4, is_vowel), None) def test_last_index(self): """last_index should return index of last occurrence where f(s)""" vowels = 'aeiou' is_vowel = lambda x: x in vowels s1 = 'ebcua' s2 = 'bcbaef' s3 = '' s4 = 'cbd' self.assertEqual(last_index(s1, is_vowel), 4) self.assertEqual(last_index(s2, is_vowel), 4) self.assertEqual(last_index(s3, is_vowel), None) self.assertEqual(last_index(s4, is_vowel), None) def test_first_index_in_set(self): """first_index_in_set should return index of first occurrence """ vowels = 'aeiou' s1 = 'ebcua' s2 = 'bcbae' s3 = '' s4 = 'cbd' self.assertEqual(first_index_in_set(s1, vowels), 0) self.assertEqual(first_index_in_set(s2, vowels), 3) self.assertEqual(first_index_in_set(s3, vowels), None) self.assertEqual(first_index_in_set(s4, vowels), None) def test_last_index_in_set(self): """last_index_in_set should return index of last occurrence""" vowels = 'aeiou' s1 = 'ebcua' s2 = 'bcbaef' s3 = '' s4 = 'cbd' self.assertEqual(last_index_in_set(s1, vowels), 4) self.assertEqual(last_index_in_set(s2, vowels), 4) self.assertEqual(last_index_in_set(s3, vowels), None) self.assertEqual(last_index_in_set(s4, vowels), None) def test_first_index_not_in_set(self): """first_index_not_in_set should return index of first occurrence """ vowels = 'aeiou' s1 = 'ebcua' s2 = 'bcbae' s3 = '' s4 = 'cbd' 
self.assertEqual(first_index_not_in_set(s1, vowels), 1) self.assertEqual(first_index_not_in_set(s2, vowels), 0) self.assertEqual(first_index_not_in_set(s3, vowels), None) self.assertEqual(first_index_not_in_set(s4, vowels), 0) def test_last_index_not_in_set(self): """last_index_not_in_set should return index of last occurrence""" vowels = 'aeiou' s1 = 'ebcua' s2 = 'bcbaef' s3 = '' s4 = 'cbd' self.assertEqual(last_index_not_in_set(s1, vowels), 2) self.assertEqual(last_index_not_in_set(s2, vowels), 5) self.assertEqual(last_index_not_in_set(s3, vowels), None) self.assertEqual(last_index_not_in_set(s4, vowels), 2) def test_first(self): """first should return first occurrence where f(s)""" vowels = 'aeiou' is_vowel = lambda x: x in vowels s1 = 'ebcua' s2 = 'bcbae' s3 = '' s4 = 'cbd' self.assertEqual(first(s1, is_vowel), 'e') self.assertEqual(first(s2, is_vowel), 'a') self.assertEqual(first(s3, is_vowel), None) self.assertEqual(first(s4, is_vowel), None) def test_last(self): """last should return last occurrence where f(s)""" vowels = 'aeiou' is_vowel = lambda x: x in vowels s1 = 'ebcua' s2 = 'bcbaef' s3 = '' s4 = 'cbd' self.assertEqual(last(s1, is_vowel), 'a') self.assertEqual(last(s2, is_vowel), 'e') self.assertEqual(last(s3, is_vowel), None) self.assertEqual(last(s4, is_vowel), None) def test_first_in_set(self): """first_in_set should return first occurrence """ vowels = 'aeiou' s1 = 'ebcua' s2 = 'bcbae' s3 = '' s4 = 'cbd' self.assertEqual(first_in_set(s1, vowels), 'e') self.assertEqual(first_in_set(s2, vowels), 'a') self.assertEqual(first_in_set(s3, vowels), None) self.assertEqual(first_in_set(s4, vowels), None) def test_last_in_set(self): """last_in_set should return last occurrence""" vowels = 'aeiou' s1 = 'ebcua' s2 = 'bcbaef' s3 = '' s4 = 'cbd' self.assertEqual(last_in_set(s1, vowels), 'a') self.assertEqual(last_in_set(s2, vowels), 'e') self.assertEqual(last_in_set(s3, vowels), None) self.assertEqual(last_in_set(s4, vowels), None) def test_first_not_in_set(self): 
"""first_not_in_set should return first occurrence """ vowels = 'aeiou' s1 = 'ebcua' s2 = 'bcbae' s3 = '' s4 = 'cbd' self.assertEqual(first_not_in_set(s1, vowels), 'b') self.assertEqual(first_not_in_set(s2, vowels), 'b') self.assertEqual(first_not_in_set(s3, vowels), None) self.assertEqual(first_not_in_set(s4, vowels), 'c') def test_last_not_in_set(self): """last_not_in_set should return last occurrence""" vowels = 'aeiou' s1 = 'ebcua' s2 = 'bcbaef' s3 = '' s4 = 'cbd' self.assertEqual(last_not_in_set(s1, vowels), 'c') self.assertEqual(last_not_in_set(s2, vowels), 'f') self.assertEqual(last_not_in_set(s3, vowels), None) self.assertEqual(last_not_in_set(s4, vowels), 'd') def test_perm(self): """perm should return correct permutations""" self.assertEqual(list(perm('abc')), ['abc','acb','bac','bca','cab','cba']) def test_comb(self): """comb should return correct combinations""" self.assertEqual(list(comb(range(5), 0)), []) self.assertEqual(list(comb(range(5), 1)), [[0], [1], [2], [3], [4]]) self.assertEqual(list(comb(range(5), 2)), [[0, 1], [0, 2], [0, 3], [0, 4], [1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]) self.assertEqual(list(comb(range(5), 3)), [[0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 2, 3], [0, 2, 4], [0, 3, 4], [1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]) self.assertEqual(list(comb(range(5), 4)), [[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 3, 4], [0, 2, 3, 4], [1, 2, 3, 4]]) self.assertEqual(list(comb(range(5), 5)), [[0, 1, 2, 3, 4]]) def test_cross_comb(self): """cross_comb should produce correct combinations""" v1 = range(2) v2 = range(3) v3 = list('abc') vv1 = ([e] for e in v1) v1_x_v2 = [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]] v1v2v3 = [[0, 0, 'a'], [0, 0, 'b'], [0, 0, 'c'], [0, 1, 'a'], [0, 1, 'b'], [0, 1, 'c'], [0, 2, 'a'], [0, 2, 'b'], [0, 2, 'c'], [1, 0, 'a'], [1, 0, 'b'], [1, 0, 'c'], [1, 1, 'a'], [1, 1, 'b'], [1, 1, 'c'], [1, 2, 'a'], [1, 2, 'b'], [1, 2, 'c']] self.assertEqual(list( _increment_comb(vv1, v2)), v1_x_v2) self.assertEqual(list( 
cross_comb([v1, v2])), v1_x_v2)
        self.assertEqual(list(cross_comb([v1, v2, v3])), v1v2v3)

#run tests if invoked from the commandline
if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_util/test_trie.py

#!/usr/bin/env python
"""Tests for Trie and compressed Trie class."""

__author__ = "Jens Reeder"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jens Reeder"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jens Reeder"
__email__ = "jens.reeder@gmail.com"
__status__ = "Prototype"

from cogent.util.unit_test import TestCase, main
from cogent.util.trie import Trie, Compressed_Trie, build_prefix_map, \
    build_trie, _build_prefix_map

class TrieTests(TestCase):

    def setUp(self):
        self.data = dict({"0": "ab", "1": "abababa", "2": "abab",
                          "3": "baba", "4": "ababaa", "5": "a",
                          "6": "abababa", "7": "bab", "8": "babba"})

    def test_init(self):
        """Trie init should create an empty trie."""
        t = Trie()
        self.assertEqual(t.root.labels, [])
        self.assertEqual(t.root.children, {})

    def test_insert_find(self):
        """An added key should be found by find."""
        data = self.data
        t = Trie()
        for (label, seq) in data.iteritems():
            t.insert(seq, label)
        for (label, seq) in data.iteritems():
            self.assertEqual(label in t.find(seq), True)
        self.assertEqual(t.find("cacgchagc"), [])
        self.assertEqual(t.find("abababa"), ["1", "6"])

    def test_insert_unique(self):
        """insert_unique should insert only unique words."""
        data = self.data
        t = Trie()
        for (label, seq) in data.iteritems():
            t._insert_unique(seq, label)
        self.assertEqual(t.find("ab"), [])
        self.assertEqual(t.find("cacgchagc"), [])
        self.assertEqual(t.find("abababa"), ["1"])

    def test_build_prefix_map(self):
        """prefix_map should map prefix strings."""
        self.assertEqual(dict(_build_prefix_map(self.data.iteritems())),
                         {'1': ['0', '2', '5', '6'], '8': [], '3': ['7'],
                          '4': []})

    def test_build_trie(self):
        """build_trie should build a trie from seqs."""
        t = build_trie(self.data.iteritems(), Trie)
        self.assertTrue(isinstance(t, Trie))
        for (label, seq) in self.data.iteritems():
            self.assertContains(t.find(seq), label)
        self.assertEqual(t.find(""), [])
        self.assertEqual(t.find("ccc"), [])

class Compressed_Trie_Tests(TestCase):

    def setUp(self):
        self.data = dict({"0": "ab", "1": "abababa", "2": "abab",
                          "3": "baba", "4": "ababaa", "5": "a",
                          "6": "abababa", "7": "bab", "8": "babba"})
        self.trie = build_trie(self.data.iteritems())

    def test_init(self):
        """Trie init should create an empty trie."""
        t = Compressed_Trie()
        self.assertEqual(t.root.labels, [])
        self.assertEqual(t.root.children, {})
        self.assertEqual(t.root.key, "")

    def test_non_zero(self):
        """__nonzero__ should check for any data in the trie."""
        t = Compressed_Trie()
        self.assertEqual(t.__nonzero__(), False)
        self.assertEqual(self.trie.__nonzero__(), True)

    def test_len(self):
        """__len__ should return the number of seqs in the trie."""
        self.assertEqual(len(self.trie), 9)
        t = Compressed_Trie()
        self.assertEqual(len(t), 0)

    def test_size(self):
        """size should return the number of nodes in the trie."""
        self.assertEqual(self.trie.size(), 10)
        #an empty trie contains only the root node
        t = Compressed_Trie()
        self.assertEqual(t.size(), 1)

    def test_to_string(self):
        """_to_string should create a string representation."""
        string_rep = """
key
{
\tkey a['5']
\t{
\t\tkey b['0']
\t\t{
\t\t\tkey ab['2']
\t\t\t{
\t\t\t\tkey a
\t\t\t\t{
\t\t\t\t\tkey a['4']
\t\t\t\t}
\t\t\t\t{
\t\t\t\t\tkey ba['1', '6']
\t\t\t\t}
\t\t\t}
\t\t}
\t}
}
{
\tkey bab['7']
\t{
\t\tkey a['3']
\t}
\t{
\t\tkey ba['8']
\t}
}
"""
        self.assertEqual(str(self.trie), string_rep)

    def test_insert_find(self):
        """An added key should be found by find."""
        data = self.data
        t = Compressed_Trie()
        for (label, seq) in data.iteritems():
            t.insert(seq, label)
        for (label, seq) in data.iteritems():
            self.assertEqual(label in t.find(seq), True)
        self.assertEqual(t.find("abababa"), ["1", "6"])
        self.assertEqual(t.find("cacgchagc"), [])

    def test_prefixMap(self):
        """prefixMap (Compressed_Trie) should map prefix strings."""
        self.assertEqual(self.trie.prefixMap(),
                         {'1': ['6', '2', '0', '5'], '8': ['7'], '3': [],
                          '4': []})

    def test_build_trie(self):
        """build_trie should build a compressed trie from seqs."""
        t = build_trie(self.data.iteritems())
        self.assertTrue(isinstance(t, Compressed_Trie))
        for (label, seq) in self.data.iteritems():
            self.assertContains(t.find(seq), label)
        self.assertEqual(t.find(""), [])
        self.assertEqual(t.find("ccc"), [])

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_util/test_unit_test.py

#!/usr/bin/env python
"""Tests for cogent.util.unit_test, extension of the built-in PyUnit framework.
""" ##SUPPORT2425 #from __future__ import with_statement from cogent.util.unit_test import TestCase, main, FakeRandom #,numpy_err import numpy; from numpy import array, zeros, log, inf from sys import exc_info __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Gavin Huttley", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" ## SUPPORT2425 #class NumpyErrTests(TestCase): #"""Tests numpy_err function.""" #def test_usage(self): #with numpy_err(divide='raise'): #self.assertRaises(FloatingPointError, log, 0) #with numpy_err(divide='ignore'): #self.assertEqual(log(0), -inf) #with numpy_err(divide='raise'): #self.assertRaises(FloatingPointError, log, 0) #def test_err_status(self): #ori_status = numpy.geterr() #numpy.seterr(divide='warn') #with numpy_err(all='ignore'): #for v in numpy.geterr().values(): #self.assertEqual(v, 'ignore') #self.assertEqual(numpy.geterr()['divide'], 'warn') #numpy.seterr(**ori_status) class FakeRandomTests(TestCase): """Tests FakeRandom class.""" def test_call_constant(self): """FakeRandom __call__ should return next item from list if constant""" const = FakeRandom([1]) self.assertEqual(const(), 1) self.assertRaises(IndexError, const) def test_call_constant_wrap(self): """FakeRandom __call__ should wrap for one-item list if specified""" const = FakeRandom([1], True) for i in range(10): self.assertEqual(const(), True) def test_call_var(self): """FakeRandom __call__ should work with a multi-item list""" f = FakeRandom([1,2,3]) self.assertEqual(f(), 1) self.assertEqual(f(), 2) self.assertEqual(f(), 3) self.assertRaises(IndexError, f) def test_call_var_wrap(self): """FakeRandom __call__ should work with a multi-item wrapped list""" f = FakeRandom([1,2,3], True) result = [f() for i in range(10)] self.assertEqual(result, [1,2,3,1,2,3,1,2,3,1]) def test_cal_var_args(self): 
"""FakeRandom __call__ should ignore extra args""" f = FakeRandom([[1,2,3]], True) for i in range(5): result = f((5,5)) #shape parameter ignored self.assertEqual(result, [1,2,3]) class TestCaseTests(TestCase): """Tests for extension of the built-in unittest framework. For each test, includes an example of success and failure. """ unequal_pairs = [ (1, 0), ([], ()), (None, 0), ('', ' '), (1, '1'), (0, '0'), ('', None), (array([1,2,3]),array([1,2,4])), (array([[1,2],[3,4]]), array([[1.0,2.0],[3.0,4.1]])), (array([1]), array([1,2])), (zeros(0), array([1])), (array([1,1,1]), array([1])), (array([[1,1],[1,1]]), array([1,1,1,1])), (zeros(0), None), (zeros(3), zeros(5)), (zeros(0), ''), ] equal_pairs = [ (1, 1), (0, 0), (5, 5L), (5, 5.0), (0, 0.0), ('', ''), (' ', ' '), ('a', 'a'), (None, None), ([0, 1], [0.0, 1.0]), (array([1,2,3]), array([1,2,3])), (array([[1,2],[3,4]]), array([[1.0,2.0],[3.0,4.0]])), (zeros(0), []), (zeros(0), zeros(0)), (array([]), zeros(0)), (zeros(3), zeros(3)), (array([0,0,0]), zeros(3)), (array([]), []), ] small = 1e-7 big = 1e-5 within_1e6_abs_pairs = [ (1, 1 + small), (1 + small, 1), (1, 1 - small), (1 - small, 1), (100000, 100000 - small), (-100000, -100000 - small), (-1, -1 + small), (-1, -1 - small), (0, small), (0, -small), (array([1,2]), array([1,2+small])), (array([[1,2],[3,4]]), array([[1,2+small],[3,4]])) ] within_1e6_rel_pairs = [ (1, 1 + 1 * small), (1 + 1 * small, 1), (1, 1 - 1 * small), (1 - 1 * small, 1), (100000, 100000 - 100000 * small), (-100000, -100000 - 100000 * small), (-1, -1 + -1 * small), (-1, -1 - -1 * small), (array([1,2]), array([1+small,2])), (array([[1000,1000],[1000,1000]]), \ array([[1000+1000*small, 1000], [1000,1000]])), ] outside_1e6_abs_pairs = [ (1, 1 + big), (1 + big, 1), (1, 1 - big), (1 - big, 1), (100000, 100000 - big), (-100000, -100000 - big), (-1, -1 + big), (-1, -1 - big), (0, big), (0, -big), (1e7, 1e7 + 1), (array([1,1]), array([1,1+big])), (array([[1,1],[1,1]]), array([[1,1+big],[1,1]])), ] 
outside_1e6_rel_pairs = [ (1, 1 + 1 * big), (1 + 1 * big, 1), (1, 1 - 1 * big), (1 - 1 * big, 1), (100000, 100000 - 100000 * big), (-100000, -100000 - 100000 * big), (-1, -1 + -1 * big), (-1, -1 - -1 * big), (1e-30, 1e-30 + small), (0, small), (1e5, 1e5 + 1), (array([1,1]), array([1,1+1*big])), ] def test_assertNotEqual_None(self): """assertNotEqual should raise exception with two copies of None""" try: self.assertNotEqual(None, None) except: message = str(exc_info()[1]) self.assertEqual(message, 'Observed None and expected None: shouldn\'t test equal') else: raise AssertionError, \ "unit_test.assertNotEqual failed on input %s and %s" \ % (`first`, `second`) def test_assertNotEqual_numbers(self): """assertNotEqual should raise exception with integer and float zero""" try: self.assertNotEqual(0, 0.0) except: message = str(exc_info()[1]) self.assertEqual(message, 'Observed 0 and expected 0.0: shouldn\'t test equal') else: raise AssertionError, \ "unit_test.assertNotEqual failed on input %s and %s" \ % (`first`, `second`) def test_assertNotEqual_unequal(self): """assertNotEqual should not raise exception when values differ""" for first, second in self.unequal_pairs: try: self.assertNotEqual(first, second) except: raise AssertionError, \ "unit_test.assertNotEqual failed on input %s and %s" \ % (`first`, `second`) def test_assertNotEqual_equal(self): """assertNotEqual should raise exception when values differ""" for first, second in self.equal_pairs: try: self.assertNotEqual(first, second) except: message = str(exc_info()[1]) self.assertEqual(message, 'Observed %s and expected %s: shouldn\'t test equal' \ % (`first`, `second`)) else: raise AssertionError, \ "unit_test.assertNotEqual failed on input %s and %s" \ % (`first`, `second`) def test_assertEqual_None(self): """assertEqual should not raise exception with two copies of None""" try: self.assertEqual(None, None) except: raise AssertionError, \ "unit_test.assertEqual failed on input %s and %s" \ % (`first`, `second`) 
def test_assertEqual_numbers(self): """assertEqual should not raise exception with integer and float zero""" try: self.assertEqual(0, 0.0) except: raise AssertionError, \ "unit_test.assertEqual failed on input %s and %s" \ % (`first`, `second`) def test_assertEqual_unequal(self): """assertEqual should raise exception when values differ""" for first, second in self.unequal_pairs: try: self.assertEqual(first, second) except: message = str(exc_info()[1]) self.assertEqual(message, 'Got %s, but expected %s' \ % (`first`, `second`)) else: raise AssertionError, \ "unit_test.assertEqual failed on input %s and %s" \ % (`first`, `second`) def test_assertEqual_equal(self): """assertEqual should not raise exception when values test equal""" for first, second in self.equal_pairs: try: self.assertEqual(first, second) except: raise AssertionError, \ "unit_test.assertEqual failed on input %s and %s" \ % (`first`, `second`) def test_assertEqual_nested_array(self): self.assertEqual([[1,0], [0,1]], [array([1,0]), array([0,1])]) def test_assertEqual_shape_mismatch(self): """assertEqual should raise when obs and exp shapes mismatch""" obs = [1,2,3] exp = [1,2,3,4] self.assertRaises(AssertionError, self.assertEqual, obs, exp) def test_assertFloatEqualAbs_equal(self): """assertFloatEqualAbs should not raise exception when values within eps""" for first, second in self.within_1e6_abs_pairs: try: self.assertFloatEqualAbs(first, second, eps=1e-6) except: raise AssertionError, \ "unit_test.assertFloatEqualAbs failed on input %s and %s" \ % (`first`, `second`) def test_assertFloatEqualAbs_threshold(self): """assertFloatEqualAbs should raise exception when eps is very small""" for first, second in self.within_1e6_abs_pairs: try: self.assertFloatEqualAbs(first, second, 1e-30) except: message = str(exc_info()[1]) diff = first - second self.assertEqual(message, 'Got %s, but expected %s (diff was %s)' \ % (`first`, `second`, `diff`)) else: raise AssertionError, \ "unit_test.assertFloatEqualAbs 
failed on input %s and %s" \ % (`first`, `second`) def test_assertFloatEqualAbs_unequal(self): """assertFloatEqualAbs should raise exception when values differ by >eps""" for first, second in self.outside_1e6_abs_pairs: try: self.assertFloatEqualAbs(first, second) except: message = str(exc_info()[1]) diff = first - second self.assertEqual(message, 'Got %s, but expected %s (diff was %s)' \ % (`first`, `second`, `diff`)) else: raise AssertionError, \ "unit_test.assertFloatEqualAbs failed on input %s and %s" \ % (`first`, `second`) def test_assertFloatEqualAbs_shape_mismatch(self): """assertFloatEqualAbs should raise when obs and exp shapes mismatch""" obs = [1,2,3] exp = [1,2,3,4] self.assertRaises(AssertionError, self.assertFloatEqualAbs, obs, exp) def test_assertFloatEqualRel_equal(self): """assertFloatEqualRel should not raise exception when values within eps""" for first, second in self.within_1e6_rel_pairs: try: self.assertFloatEqualRel(first, second) except: raise AssertionError, \ "unit_test.assertFloatEqualRel failed on input %s and %s" \ % (`first`, `second`) def test_assertFloatEqualRel_unequal(self): """assertFloatEqualRel should raise exception when eps is very small""" for first, second in self.within_1e6_rel_pairs: try: self.assertFloatEqualRel(first, second, 1e-30) except: message = str(exc_info()[1]) diff = first - second self.assertEqual(message, 'Got %s, but expected %s (diff was %s)' \ % (`first`, `second`, `diff`)) else: raise AssertionError, \ "unit_test.assertFloatEqualRel failed on input %s and %s" \ % (`first`, `second`) def test_assertFloatEqualRel_unequal(self): """assertFloatEqualRel should raise exception when values differ by >eps""" for first, second in self.outside_1e6_rel_pairs: try: self.assertFloatEqualRel(first, second) except: message = str(exc_info()[1]) diff = first - second self.assertEqual(message, 'Got %s, but expected %s (diff was %s)' \ % (`first`, `second`, `diff`)) else: raise AssertionError, \ 
"unit_test.assertFloatEqualRel failed on input %s and %s" \ % (`first`, `second`) def test_assertFloatEqualRel_shape_mismatch(self): """assertFloatEqualRel should raise when obs and exp shapes mismatch""" obs = [1,2,3] exp = [1,2,3,4] self.assertRaises(AssertionError, self.assertFloatEqualRel, obs, exp) def test_assertFloatEqualList_equal(self): """assertFloatEqual should work on two lists of similar values""" originals = [0, 1, -1, 10, -10, 100, -100] modified = [i + 1e-7 for i in originals] try: self.assertFloatEqual(originals, modified) self.assertFloatEqual([], []) #test empty lists as well except: raise AssertionError, \ "unit_test.assertFloatEqual failed on lists of similar values" def test_assertFloatEqual_shape_mismatch(self): """assertFloatEqual should raise when obs and exp shapes mismatch""" obs = [1,2,3] exp = [1,2,3,4] self.assertRaises(AssertionError, self.assertFloatEqual, obs, exp) def test_assertFloatEqualList_unequal(self): """assertFloatEqual should fail on two lists of dissimilar values""" originals = [0, 1, -1, 10, -10, 100, -100] modified = [i + 1e-5 for i in originals] try: self.assertFloatEqual(originals, modified) except: pass else: raise AssertionError, \ "unit_test.assertFloatEqual failed on lists of dissimilar values" def test_assertFloatEqual_mixed(self): """assertFloatEqual should work on equal lists of mixed types.""" first = [i[0] for i in self.equal_pairs] second = [i[1] for i in self.equal_pairs] self.assertFloatEqual(first, second) def test_assertFloatEqualAbs_mixed(self): first = [i[0] for i in self.equal_pairs] second = [i[1] for i in self.equal_pairs] """assertFloatEqualAbs should work on equal lists of mixed types.""" self.assertFloatEqualAbs(first, second) def test_assertFloatEqualRel_mixed(self): first = [i[0] for i in self.equal_pairs] second = [i[1] for i in self.equal_pairs] """assertFloatEqualRel should work on equal lists of mixed types.""" self.assertFloatEqualRel(first, second) def 
test_assertFloatEqual_mixed_unequal(self):
        """assertFloatEqual should work on unequal lists of mixed types."""
        first = [i[0] for i in self.unequal_pairs]
        second = [i[1] for i in self.unequal_pairs]
        self.assertRaises(AssertionError, \
            self.assertFloatEqual, first, second)

    def test_assertFloatEqualAbs_mixed_unequal(self):
        """assertFloatEqualAbs should work on unequal lists of mixed types."""
        first = [i[0] for i in self.unequal_pairs]
        second = [i[1] for i in self.unequal_pairs]
        self.assertRaises(AssertionError, \
            self.assertFloatEqualAbs, first, second)

    def test_assertFloatEqualRel_mixed_unequal(self):
        """assertFloatEqualRel should work on unequal lists of mixed types."""
        first = [i[0] for i in self.unequal_pairs]
        second = [i[1] for i in self.unequal_pairs]
        self.assertRaises(AssertionError, \
            self.assertFloatEqualRel, first, second)

    def test_assertEqualItems(self):
        """assertEqualItems should raise exception if items not equal"""
        self.assertEqualItems('abc', 'abc')
        self.assertEqualItems('abc', 'cba')
        self.assertEqualItems('', '')
        self.assertEqualItems('abc', ['a','b','c'])
        self.assertEqualItems([0], [0.0])
        try:
            self.assertEqualItems('abc', 'abcd')
        except:
            message = str(exc_info()[1])
            self.assertEqual(message,
                'Observed and expected are different lengths: 3 and 4')
        else:
            raise AssertionError, \
            "unit_test.assertEqualItems failed on input %s and %s" \
            % (`first`, `second`)
        try:
            self.assertEqualItems('cab', 'acc')
        except:
            message = str(exc_info()[1])
            self.assertEqual(message,
                'Observed b and expected c at sorted index 1')
        else:
            raise AssertionError, \
            "unit_test.assertEqualItems failed on input %s and %s" \
            % (`first`, `second`)
        try:
            self.assertEqualItems('cba', 'yzx')
        except:
            message = str(exc_info()[1])
            self.assertEqual(message,
                'Observed a and expected x at sorted index 0')
        else:
            raise AssertionError, \
            "unit_test.assertEqualItems failed on input %s and %s" \
            % (`first`, `second`)

    def test_assertSameItems(self):
        """assertSameItems should raise exception if items not same"""
        x = 0
        y = 'abcdef'
        z = 3
        y1 = 'abc' + 'def'
        z1 = 3.0

        y_id = id(y)
        z_id = id(z)
        y1_id = id(y1)
        z1_id = id(z1)

        self.assertSameItems([x,y,z], [x,y,z])
        self.assertSameItems([x,y,z], [z,x,y])
        self.assertSameItems('', '')
        self.assertSameItems([x,y,z], (x,y,z))

        try:
            self.assertSameItems([x,y,z], [x,y,z,y])
        except:
            message = str(exc_info()[1])
            self.assertEqual(message,
            'Observed and expected are different lengths: 3 and 4')
        else:
            raise AssertionError, \
            "unit_test.assertSameItems failed on input %s and %s" \
            % (`[x,y,z]`, `[x,y,z,y]`)

        try:
            first_list = [x,y,z]
            second_list = [y,x,z1]
            self.assertSameItems(first_list, second_list)
        except self.failureException:
            pass
        else:
            raise AssertionError, \
            "unit_test.assertEqualItems failed on input %s and %s" \
            % (`[x,y,z]`, `[y,x,z1]`)

        # assert y is not y1
        # try:
        #     self.assertSameItems([y], (y1,))
        # except self.failureException:
        #     pass
        # else:
        #     raise AssertionError, \
        #     "unit_test.assertEqualItems failed on input %s and %s" \
        #     % (`[y]`, `(y1,)`)

    def test_assertNotEqualItems(self):
        """assertNotEqualItems should raise exception if all items equal"""
        self.assertNotEqualItems('abc', '')
        self.assertNotEqualItems('abc', 'cbad')
        self.assertNotEqualItems([0], [0.01])

        try:
            self.assertNotEqualItems('abc', 'abc')
        except:
            message = str(exc_info()[1])
            self.assertEqual(message, "Observed 'abc' has same items as 'abc'")
        else:
            raise AssertionError, \
            "unit_test.assertNotEqualItems failed on input %s and %s" \
            % (`'abc'`, `'abc'`)

        try:
            self.assertNotEqualItems('', [])
        except:
            message = str(exc_info()[1])
            self.assertEqual(message, "Observed '' has same items as []")
        else:
            raise AssertionError, \
            "unit_test.assertNotEqualItems failed on input %s and %s" \
            % (`''`, `[]`)

    def test_assertContains(self):
        """assertContains should raise exception if item not in test set"""
        self.assertContains('abc', 'a')
        self.assertContains(['a', 'b', 'c'], 'a')
        self.assertContains(['a', 'b', 'c'], 'b')
        self.assertContains(['a', 'b', 'c'], 'c')
        self.assertContains({'a':1, 'b':2}, 'a')

        class _fake_container(object):
            def __contains__(self, other):
                return True
        fake = _fake_container()
        self.assertContains(fake, 'x')
        self.assertContains(fake, 3)
        self.assertContains(fake, {'a':'b'})

        try:
            self.assertContains('', [])
        except:
            message = str(exc_info()[1])
            self.assertEqual(message, "Item [] not found in ''")
        else:
            raise AssertionError, \
            "unit_test.assertContains failed on input %s and %s" \
            % (`''`, `[]`)

        try:
            self.assertContains('abcd', 'x')
        except:
            message = str(exc_info()[1])
            self.assertEqual(message, "Item 'x' not found in 'abcd'")
        else:
            raise AssertionError, \
            "unit_test.assertContains failed on input %s and %s" \
            % (`'abcd'`, `'x'`)

    def test_assertNotContains(self):
        """assertNotContains should raise exception if item in test set"""
        self.assertNotContains('abc', 'x')
        self.assertNotContains(['a', 'b', 'c'], 'x')
        self.assertNotContains('abc', None)
        self.assertNotContains(['a', 'b', 'c'], {'x':1})
        self.assertNotContains({'a':1, 'b':2}, 3.0)

        class _fake_container(object):
            def __contains__(self, other):
                return False
        fake = _fake_container()
        self.assertNotContains(fake, 'x')
        self.assertNotContains(fake, 3)
        self.assertNotContains(fake, {'a':'b'})

        try:
            self.assertNotContains('', '')
        except:
            message = str(exc_info()[1])
            self.assertEqual(message, "Item '' should not have been in ''")
        else:
            raise AssertionError, \
            "unit_test.assertNotContains failed on input %s and %s" \
            % (`''`, `''`)

        try:
            self.assertNotContains('abcd', 'a')
        except:
            message = str(exc_info()[1])
            self.assertEqual(message, "Item 'a' should not have been in 'abcd'")
        else:
            raise AssertionError, \
            "unit_test.assertNotContains failed on input %s and %s" \
            % (`'abcd'`, `'a'`)

        try:
            self.assertNotContains({'a':1, 'b':2}, 'a')
        except:
            message = str(exc_info()[1])
            self.assertEqual(message, \
            "Item 'a' should not have been in {'a': 1, 'b': 2}")
        else:
            raise AssertionError, \
            "unit_test.assertNotContains failed on input %s and %s" \
            % (`{'a':1, 'b':2}`, `'a'`)

    def test_assertGreaterThan_equal(self):
"""assertGreaterThan should raise exception if equal""" self.assertRaises(AssertionError, self.assertGreaterThan, 5, 5) self.assertRaises(AssertionError, self.assertGreaterThan, 5.0, 5.0) self.assertRaises(AssertionError, self.assertGreaterThan, 5.0, 5) self.assertRaises(AssertionError, self.assertGreaterThan, 5, 5.0) def test_assertGreaterThan_None(self): """assertGreaterThan should raise exception if compared to None""" self.assertRaises(AssertionError, self.assertGreaterThan, 5, None) self.assertRaises(AssertionError, self.assertGreaterThan, None, 5) self.assertRaises(AssertionError, self.assertGreaterThan, 5.0, None) self.assertRaises(AssertionError, self.assertGreaterThan, None, 5.0) def test_assertGreaterThan_numbers_true(self): """assertGreaterThan should pass when observed > value""" self.assertGreaterThan(10, 5) def test_assertGreaterThan_numbers_false(self): """assertGreaterThan should raise when observed <= value""" self.assertRaises(AssertionError, self.assertGreaterThan, 2, 5) def test_assertGreaterThan_numbers_list_true(self): """assertGreaterThan should pass when all elements are > value""" observed = [1,2,3,4,3,2,3,4,6,3] self.assertGreaterThan(observed, 0) def test_assertGreaterThan_numbers_list_false(self): """assertGreaterThan should raise when a single element is <= value""" observed = [2,3,4,3,2,1,3,4,6,3] self.assertRaises(AssertionError, self.assertGreaterThan, observed, 1) def test_assertGreaterThan_floats_true(self): """assertGreaterThan should pass when observed > value""" self.assertGreaterThan(5.0, 3.0) def test_assertGreaterThan_floats_false(self): """assertGreaterThan should raise when observed <= value""" self.assertRaises(AssertionError, self.assertGreaterThan, 3.0, 5.0) def test_assertGreaterThan_floats_list_true(self): """assertGreaterThan should pass when all elements are > value""" observed = [1.0,2.0,3.0,4.0,6.0,3.0] self.assertGreaterThan(observed, 0.0) def test_assertGreaterThan_floats_list_false(self): """assertGreaterThan 
should raise when any elements are <= value""" observed = [2.0,3.0,4.0,1.0, 3.0,3.0] self.assertRaises(AssertionError, self.assertGreaterThan, observed, 1.0) def test_assertGreaterThan_mixed_true(self): """assertGreaterThan should pass when observed > value""" self.assertGreaterThan(5.0, 3) self.assertGreaterThan(5, 3.0) def test_assertGreaterThan_mixed_false(self): """assertGreaterThan should raise when observed <= value""" self.assertRaises(AssertionError, self.assertGreaterThan, -3, 5.0) self.assertRaises(AssertionError, self.assertGreaterThan, 3.0, 5) def test_assertGreaterThan_mixed_list_true(self): """assertGreaterThan should pass when all elements are > value""" observed = [1.0, 2, 3.0, 4.0, 6, 3.0] self.assertGreaterThan(observed, 0.0) self.assertGreaterThan(observed, 0) def test_assertGreaterThan_mixed_list_false(self): """assertGreaterThan should raise when a single element is <= value""" observed = [2.0, 3, 4, 1.0, 3.0, 3.0] self.assertRaises(AssertionError, self.assertGreaterThan, observed, 1.0) self.assertRaises(AssertionError, self.assertGreaterThan, observed, 1) def test_assertGreaterThan_numpy_array_true(self): """assertGreaterThan should pass when all elements are > value""" observed = array([1,2,3,4]) self.assertGreaterThan(observed, 0) self.assertGreaterThan(observed, 0.0) def test_assertGreaterThan_numpy_array_false(self): """assertGreaterThan should pass when any element is <= value""" observed = array([1,2,3,4]) self.assertRaises(AssertionError, self.assertGreaterThan, observed, 3) self.assertRaises(AssertionError, self.assertGreaterThan, observed, 3.0) def test_assertLessThan_equal(self): """assertLessThan should raise exception if equal""" self.assertRaises(AssertionError, self.assertLessThan, 5, 5) self.assertRaises(AssertionError, self.assertLessThan, 5.0, 5.0) self.assertRaises(AssertionError, self.assertLessThan, 5.0, 5) self.assertRaises(AssertionError, self.assertLessThan, 5, 5.0) def test_assertLessThan_None(self): """assertLessThan 
should raise exception if compared to None""" self.assertRaises(AssertionError, self.assertLessThan, 5, None) self.assertRaises(AssertionError, self.assertLessThan, None, 5) self.assertRaises(AssertionError, self.assertLessThan, 5.0, None) self.assertRaises(AssertionError, self.assertLessThan, None, 5.0) def test_assertLessThan_numbers_true(self): """assertLessThan should pass when observed < value""" self.assertLessThan(10, 15) def test_assertLessThan_numbers_false(self): """assertLessThan should raise when observed >= value""" self.assertRaises(AssertionError, self.assertLessThan, 6, 5) def test_assertLessThan_numbers_list_true(self): """assertLessThan should pass when all elements are < value""" observed = [1,2,3,4,3,2,3,4,6,3] self.assertLessThan(observed, 8) def test_assertLessThan_numbers_list_false(self): """assertLessThan should raise when a single element is >= value""" observed = [2,3,4,3,2,1,3,4,6,3] self.assertRaises(AssertionError, self.assertLessThan, observed, 6) def test_assertLessThan_floats_true(self): """assertLessThan should pass when observed < value""" self.assertLessThan(-5.0, 3.0) def test_assertLessThan_floats_false(self): """assertLessThan should raise when observed >= value""" self.assertRaises(AssertionError, self.assertLessThan, 3.0, -5.0) def test_assertLessThan_floats_list_true(self): """assertLessThan should pass when all elements are < value""" observed = [1.0,2.0,-3.0,4.0,-6.0,3.0] self.assertLessThan(observed, 5.0) def test_assertLessThan_floats_list_false(self): """assertLessThan should raise when a single element is >= value""" observed = [2.0,3.0,4.0,1.0, 3.0,3.0] self.assertRaises(AssertionError, self.assertLessThan, observed, 4.0) def test_assertLessThan_mixed_true(self): """assertLessThan should pass when observed < value""" self.assertLessThan(2.0, 3) self.assertLessThan(2, 3.0) def test_assertLessThan_mixed_false(self): """assertLessThan should raise when observed >= value""" self.assertRaises(AssertionError, 
self.assertLessThan, 6, 5.0) self.assertRaises(AssertionError, self.assertLessThan, 6.0, 5) def test_assertLessThan_mixed_list_true(self): """assertLessThan should pass when all elements are < value""" observed = [1.0, 2, 3.0, 4.0, 6, 3.0] self.assertLessThan(observed, 7.0) self.assertLessThan(observed, 7) def test_assertLessThan_mixed_list_false(self): """assertLessThan should raise when a single element is >= value""" observed = [2.0, 3, 4, 1.0, 3.0, 3.0] self.assertRaises(AssertionError, self.assertLessThan, observed, 4.0) self.assertRaises(AssertionError, self.assertLessThan, observed, 4) def test_assertLessThan_numpy_array_true(self): """assertLessThan should pass when all elements are < value""" observed = array([1,2,3,4]) self.assertLessThan(observed, 5) self.assertLessThan(observed, 5.0) def test_assertLessThan_numpy_array_false(self): """assertLessThan should pass when any element is >= value""" observed = array([1,2,3,4]) self.assertRaises(AssertionError, self.assertLessThan, observed, 3) self.assertRaises(AssertionError, self.assertLessThan, observed, 3.0) def test_assertIsBetween_bounds(self): """assertIsBetween should raise if min_value >= max_value""" self.assertRaises(AssertionError, self.assertIsBetween, 5, 6, 3) self.assertRaises(AssertionError, self.assertIsBetween, 5, 3, 3) def test_assertIsBetween_equal(self): """assertIsBetween should raise when a value is equal to either bound""" self.assertRaises(AssertionError, self.assertIsBetween, 1, 1, 5) self.assertRaises(AssertionError, self.assertIsBetween, 5, 1, 5) def test_assertIsBetween_None(self): """assertIsBetween should raise when compared to None""" self.assertRaises(AssertionError, self.assertIsBetween, None, 1, 5) self.assertRaises(AssertionError, self.assertIsBetween, 1, None, 5) self.assertRaises(AssertionError, self.assertIsBetween, 5, 1, None) def test_assertIsBetween_numbers_true(self): """assertIsBetween should pass when in bounds""" self.assertIsBetween(5,3,7) def 
test_assertIsBetween_numbers_false(self): """assertIsBetween should raise when out of bounds""" self.assertRaises(AssertionError, self.assertIsBetween, 5, 1, 3) def test_assertIsBetween_numbers_list_true(self): """assertIsBetween should pass when all elements are in bounds""" observed = [3,4,5,4,3,4,5,4,3] self.assertIsBetween(observed, 1, 7) def test_assertIsBetween_numbers_list_false(self): """assertIsBetween should raise when any elements are out of bounds""" observed = [3,4,5,4,3,4,5,6] self.assertRaises(AssertionError, self.assertIsBetween, observed, 1, 5) def test_assertIsBetween_floats_true(self): """assertIsBetween should pass when in bounds""" self.assertIsBetween(5.0, 3.0 ,7.0) def test_assertIsBetween_floats_false(self): """assertIsBetween should raise when out of bounds""" self.assertRaises(AssertionError, self.assertIsBetween, 5.0, 1.0, 3.0) def test_assertIsBetween_floats_list_true(self): """assertIsBetween should pass when all elements are in bounds""" observed = [3.0, 4.0, -5.0, 4.0, 3.0] self.assertIsBetween(observed, -7.0, 7.0) def test_assertIsBetween_floats_list_false(self): """assertIsBetween should raise when any elements are out of bounds""" observed = [3.0, 4.0, -5.0, 5.0, 6.0] self.assertRaises(AssertionError, self.assertIsBetween,observed,1.0,5.0) def test_assertIsBetween_mixed_true(self): """assertIsBetween should pass when in bounds""" self.assertIsBetween(5.0, 3, 7) self.assertIsBetween(5, 3.0, 7) self.assertIsBetween(5, 3, 7.0) self.assertIsBetween(5.0, 3.0, 7) self.assertIsBetween(5, 3.0, 7.0) self.assertIsBetween(5.0, 3, 7.0) def test_assertIsBetween_mixed_false(self): """assertIsBetween should raise when out of bounds""" self.assertRaises(AssertionError, self.assertIsBetween, 5.0, 1, 3) self.assertRaises(AssertionError, self.assertIsBetween, 5, 1.0, 3) self.assertRaises(AssertionError, self.assertIsBetween, 5, 1, 3.0) self.assertRaises(AssertionError, self.assertIsBetween, 5.0, 1.0, 3) self.assertRaises(AssertionError, 
self.assertIsBetween, 5, 1.0, 3.0) self.assertRaises(AssertionError, self.assertIsBetween, 5.0, 1, 3.0) def test_assertIsBetween_mixed_list_true(self): """assertIsBetween should pass when all elements are in bounds""" observed = [3,4,5,4.0,3,4.0,5,4,3.0] self.assertIsBetween(observed, 1, 7) self.assertIsBetween(observed, 1.0, 7) self.assertIsBetween(observed, 1, 7.0) self.assertIsBetween(observed, 1.0, 7.0) def test_assertIsBetween_mixed_list_false(self): """assertIsBetween should raise when any elements are out of bounds""" observed = [3.0,4,5.0,4,3,4.0,5,6] self.assertRaises(AssertionError, self.assertIsBetween,observed, 1.0, 5) self.assertRaises(AssertionError, self.assertIsBetween,observed, 1, 5.0) self.assertRaises(AssertionError, self.assertIsBetween,observed,1.0,5.0) self.assertRaises(AssertionError, self.assertIsBetween,observed, 1, 5) def test_assertIsBetween_numpy_array_true(self): """assertIsBetween should pass when all elements are in bounds""" observed = array([1,2,4,5,6]) self.assertIsBetween(observed, 0, 7) def test_assertIsBetween_numpy_array_false(self): """assertIsBetween should raise when any elements is out of bounds""" observed = array([1,2,4,5,6]) self.assertRaises(AssertionError, self.assertIsBetween, observed, 2, 7) def test_assertIsProb_None(self): """assertIsProb should raise when compared against None""" self.assertRaises(AssertionError, self.assertIsProb, None) def test_assertIsProb_numbers_true(self): """assertIsProb should pass when compared against valid numbers""" self.assertIsProb(0) self.assertIsProb(1) def test_assertIsProb_numbers_false(self): """assertIsProb should raise when compared against invalid numbers""" self.assertRaises(AssertionError, self.assertIsProb, -1) self.assertRaises(AssertionError, self.assertIsProb, 2) def test_assertIsProb_numbers_list_true(self): """assertIsProb should pass when all elements are probs""" observed = [0, 1, 0] self.assertIsProb(observed) def test_assertIsProb_numbers_list_false(self): 
"""assertIsProb should raise when any element is not a prob""" observed = [-2, -4, 3] self.assertRaises(AssertionError, self.assertIsProb, observed) def test_assertIsProb_float_true(self): """assertIsProb should pass when compared against valid numbers""" self.assertIsProb(0.0) self.assertIsProb(1.0) def test_assertIsProb_float_false(self): """assertIsProb should raise when compared against invalid numbers""" self.assertRaises(AssertionError, self.assertIsProb, -1.0) self.assertRaises(AssertionError, self.assertIsProb, 2.0) def test_assertIsProb_float_list_true(self): """assertIsProb should pass when all elements are probs""" observed = [0.0, 1.0, 0.0] self.assertIsProb(observed) def test_assertIsProb_float_list_false(self): """assertIsProb should raise when any element is not a prob""" observed = [-2.0, -4.0, 3.0] self.assertRaises(AssertionError, self.assertIsProb, observed) def test_assertIsProb_mixed_list_true(self): """assertIsProb should pass when all elements are probs""" observed = [0.0, 1, 0.0] self.assertIsProb(observed) def test_assertIsProb_mixed_list_false(self): """assertIsProb should raise when any element is not a prob""" observed = [-2.0, -4, 3.0] self.assertRaises(AssertionError, self.assertIsProb, observed) def test_assertIsProb_numpy_array_true(self): """assertIsProb should pass when all elements are probs""" observed = array([0.0,0.4,0.8]) self.assertIsProb(observed) def test_assertIsProb_numpy_array_true(self): """assertIsProb should pass when all elements are probs""" observed = array([0.0,-0.4,0.8]) self.assertRaises(AssertionError, self.assertIsProb, observed) def test_assertSimilarMeans_one_obs_true(self): """assertSimilarMeans should pass when p > pvalue""" obs = [5] expected = [1,2,3,4,5,6,7,8,9,10,11] self.assertSimilarMeans(obs, expected) self.assertSimilarMeans(obs, expected, pvalue=0.25) self._set_suite_pvalue(0.10) self.assertSimilarMeans(obs, expected) def test_assertSimilarMeans_one_obs_false(self): """assertSimilarMeans should 
raise when p < pvalue""" obs = [5] expected = [.001,.009,.00012] self.assertRaises(AssertionError, self.assertSimilarMeans, \ obs, expected) self.assertRaises(AssertionError, self.assertSimilarMeans, \ obs, expected, 0.1) self._set_suite_pvalue(0.001) self.assertRaises(AssertionError, self.assertSimilarMeans, \ obs, expected) def test_assertSimilarMeans_twosample_true(self): """assertSimilarMeans should pass when p > pvalue""" obs = [4,5,6] expected = [1,2,3,4,5,6,7,8,9] self.assertSimilarMeans(obs, expected) self.assertSimilarMeans(obs, expected, pvalue=0.25) self._set_suite_pvalue(0.10) self.assertSimilarMeans(obs, expected) def test_assertSimilarMeans_twosample_false(self): """assertSimilarMeans should raise when p < pvalue""" obs = [1,2,3] expected = [6,7,8,9,10,11,12,13,14] self.assertRaises(AssertionError, self.assertSimilarMeans, \ obs, expected) self.assertRaises(AssertionError, self.assertSimilarMeans, \ obs, expected, 0.1) self._set_suite_pvalue(0.001) self.assertRaises(AssertionError, self.assertSimilarMeans, \ obs, expected) def test_assertSimilarFreqs_true(self): """assertSimilarFreqs should pass when p > pvalue""" observed = [2,2,3,2,1,2,2,2,2] expected = [2,2,2,2,2,2,2,2,2] self.assertSimilarFreqs(observed, expected) self.assertSimilarFreqs(observed, expected, pvalue=0.25) self._set_suite_pvalue(0.10) self.assertSimilarFreqs(observed, expected) def test_assertSimilarFreqs_false(self): """assertSimilarFreqs should raise when p < pvalue""" observed = [10,15,20,10,12,12,13] expected = [100,50,10,20,700,2,100] self.assertRaises(AssertionError, self.assertSimilarFreqs, \ observed, expected) self.assertRaises(AssertionError, self.assertSimilarFreqs, \ observed, expected, 0.2) self._set_suite_pvalue(0.001) self.assertRaises(AssertionError, self.assertSimilarFreqs, \ observed, expected) def test_assertSimilarFreqs_numpy_array_true(self): """assertSimilarFreqs should pass when p > pvalue""" observed = array([2,2,3,2,1,2,2,2,2]) expected = 
array([2,2,2,2,2,2,2,2,2]) self.assertSimilarFreqs(observed, expected) self.assertSimilarFreqs(observed, expected, pvalue=0.25) self._set_suite_pvalue(0.10) self.assertSimilarFreqs(observed, expected) def test_assertSimilarFreqs_numpy_array_false(self): """assertSimilarFreqs should raise when p < pvalue""" observed = array([10,15,20,10,12,12,13]) expected = array([100,50,10,20,700,2,100]) self.assertRaises(AssertionError, self.assertSimilarFreqs, \ observed, expected) self.assertRaises(AssertionError, self.assertSimilarFreqs, \ observed, expected, 0.2) self._set_suite_pvalue(0.001) self.assertRaises(AssertionError, self.assertSimilarFreqs, \ observed, expected) def test_set_suite_pvalue(self): """Should set the suite pvalue""" # force stats to fail self._set_suite_pvalue(0.99) obs = [2,5,6] exp = [1,2,3,4,5,6,7,8,9] self.assertRaises(AssertionError, self.assertSimilarMeans, obs, exp) # force stats to pass self._set_suite_pvalue(0.01) self.assertSimilarMeans(obs, exp) def test_assertIsPermutation_true(self): """assertIsPermutation should pass when a is a permutation of b""" observed = [3,2,1,4,5] items = [1,2,3,4,5] self.assertIsPermutation(observed, items) def test_assertIsPermutation_false(self): """assertIsPermutation should raise when a is not a permutation of b""" items = [1,2,3,4,5] self.assertRaises(AssertionError,self.assertIsPermutation, items,items) self.assertRaises(AssertionError,self.assertIsPermutation, [1,2],[3,4]) def test_assertSameObj_true(self): """assertSameObj should pass when 'a is b'""" self.assertSameObj("foo", "foo") self.assertSameObj(None, None) bar = lambda x:5 self.assertSameObj(bar, bar) def test_assertSameObj_false(self): """assertSameObj should raise when 'a is not b'""" self.assertRaises(AssertionError, self.assertSameObj, "foo", "bar") self.assertRaises(AssertionError, self.assertSameObj, None, "bar") self.assertRaises(AssertionError, self.assertSameObj, lambda x:5, \ lambda y:6) def test_assertNotSameObj_true(self): 
"""assertNotSameObj should pass when 'a is not b'""" self.assertNotSameObj("foo", "bar") self.assertNotSameObj(None, 5) self.assertNotSameObj(lambda x:5, lambda y:6) def test_assertNotSameObj_false(self): """assertSameObj should raise when 'a is b'""" self.assertRaises(AssertionError, self.assertNotSameObj, "foo", "foo") self.assertRaises(AssertionError, self.assertNotSameObj, None, None) bar = lambda x:5 self.assertRaises(AssertionError, self.assertNotSameObj, bar, bar) def test_assertIsNotBetween_bounds(self): """assertIsNotBetween should raise if min_value >= max_value""" self.assertRaises(AssertionError, self.assertIsNotBetween, 5, 4, 3) self.assertRaises(AssertionError, self.assertIsNotBetween, 5, 3, 3) def test_assertIsNotBetween_equals(self): """assertIsNotBetween should pass when equal on either bound""" self.assertIsNotBetween(1, 1, 2) self.assertIsNotBetween(1.0, 1, 2) self.assertIsNotBetween(1, 1.0, 2) self.assertIsNotBetween(1.0, 1.0, 2) self.assertIsNotBetween(2, 1, 2) self.assertIsNotBetween(2.0, 1, 2) self.assertIsNotBetween(2, 1, 2.0) self.assertIsNotBetween(2.0, 1, 2.0) def test_assertIsNotBetween_None(self): """assertIsNotBetween should raise when compared against None""" self.assertRaises(AssertionError, self.assertIsNotBetween, None, 1, 2) self.assertRaises(AssertionError, self.assertIsNotBetween, 1, None, 2) self.assertRaises(AssertionError, self.assertIsNotBetween, 1, 2, None) def test_assertIsNotBetween_numbers_true(self): """assertIsNotBetween should pass when a number is not in bounds""" self.assertIsNotBetween(1,2,3) self.assertIsNotBetween(4,2,3) self.assertIsNotBetween(-1,-3,-2) self.assertIsNotBetween(-4,-3,-2) self.assertIsNotBetween(2,-1,1) def test_assertIsNotBetween_numbers_false(self): """assertIsNotBetween should raise when a number is in bounds""" self.assertRaises(AssertionError, self.assertIsNotBetween, 2, 1, 3) self.assertRaises(AssertionError, self.assertIsNotBetween, 0, -1, 1) self.assertRaises(AssertionError, 
self.assertIsNotBetween, -2, -3, -1) def test_assertIsNotBetween_numbers_list_true(self): """assertIsNotBetween should pass when all elements are out of bounds""" obs = [1,2,3,4,5] self.assertIsNotBetween(obs, 5, 10) self.assertIsNotBetween(obs, -2, 1) def test_assertIsNotBetween_numbers_list_false(self): """assertIsNotBetween should raise when any element is in bounds""" obs = [1,2,3,4,5] self.assertRaises(AssertionError, self.assertIsNotBetween, obs, 3, 7) self.assertRaises(AssertionError, self.assertIsNotBetween, obs, -3, 3) self.assertRaises(AssertionError, self.assertIsNotBetween, obs, 2, 4) def test_assertIsNotBetween_float_true(self): """assertIsNotBetween should pass when a number is not in bounds""" self.assertIsNotBetween(1.0, 2.0, 3.0) self.assertIsNotBetween(4.0, 2.0, 3.0) self.assertIsNotBetween(-1.0, -3.0, -2.0) self.assertIsNotBetween(-4.0, -3.0, -2.0) self.assertIsNotBetween(2.0, -1.0, 1.0) def test_assertIsNotBetween_float_false(self): """assertIsNotBetween should raise when a number is in bounds""" self.assertRaises(AssertionError, self.assertIsNotBetween, 2.0,1.0,3.0) self.assertRaises(AssertionError, self.assertIsNotBetween, 0.0,-1.0,1.0) self.assertRaises(AssertionError,self.assertIsNotBetween,-2.0,-3.0,-1.0) def test_assertIsNotBetween_float_list_true(self): """assertIsNotBetween should pass when all elements are out of bounds""" obs = [1.0, 2.0, 3.0, 4.0, 5.0] self.assertIsNotBetween(obs, 5.0, 10.0) self.assertIsNotBetween(obs, -2.0, 1.0) def test_assertIsNotBetween_float_list_false(self): """assertIsNotBetween should raise when any element is in bounds""" obs = [1.0, 2.0, 3.0, 4.0, 5.0] self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 3.0, 7.0) self.assertRaises(AssertionError,self.assertIsNotBetween, obs, -3.0,3.0) self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 2.0, 4.0) def test_assertIsNotBetween_mixed_true(self): """assertIsNotBetween should pass when a number is not in bounds""" self.assertIsNotBetween(1, 
2.0, 3.0) self.assertIsNotBetween(1.0, 2, 3.0) self.assertIsNotBetween(1.0, 2.0, 3) def test_assertIsNotBetween_mixed_false(self): """assertIsNotBetween should raise when a number is in bounds""" self.assertRaises(AssertionError, self.assertIsNotBetween, 2.0, 1.0, 3) self.assertRaises(AssertionError, self.assertIsNotBetween, 2.0, 1, 3.0) self.assertRaises(AssertionError, self.assertIsNotBetween, 2, 1.0, 3.0) def test_assertIsNotBetween_mixed_list_true(self): """assertIsNotBetween should pass when all elements are not in bounds""" obs = [1, 2.0, 3, 4.0, 5.0] self.assertIsNotBetween(obs, 5.0, 10.0) self.assertIsNotBetween(obs, 5, 10.0) self.assertIsNotBetween(obs, 5.0, 10) def test_assertIsNotBetween_mixed_list_false(self): """assertIsNotBetween should raise when any element is in bounds""" obs = [1, 2.0, 3, 4.0, 5.0] self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 3.0, 7.0) self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 3, 7.0) self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 3.0, 7) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_struct/__init__.py000644 000765 000024 00000000637 12024702176 022105 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_rna2d', 'test_annotation', 'test_selection', 'test_asa', 'test_manipulation', 'test_contact'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_struct/test_annotation.py000644 000765 000024 00000004315 12024702176 023554 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os try: from cogent.util.unit_test import TestCase, main from cogent.parse.pdb import PDBParser from cogent.struct.annotation import xtradata from cogent.struct.selection import einput except ImportError: from 
zenpdb.cogent.util.unit_test import TestCase, main from zenpdb.cogent.parse.pdb import PDBParser from zenpdb.cogent.struct.annotation import xtradata from zenpdb.cogent.struct.selection import einput __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" class AnnotationTest(TestCase): """tests if annotation get into xtra.""" def setUp(self): self.input_file = os.path.join('data', '2E12.pdb') self.input_structure = PDBParser(open(self.input_file)) def test_xtradata(self): """tests if an full_id's in the data dict are correctly parsed.""" structure = einput(self.input_structure, 'S')[('2E12',)] model = einput(self.input_structure, 'M')[('2E12', 0)] chain = einput(self.input_structure, 'C')[('2E12', 0, 'B')] residue = einput(self.input_structure, 'R')[('2E12', 0, 'B', ('LEU', 24, ' '))] atom = einput(self.input_structure, 'A')[('2E12', 0, 'B', ('LEU', 24, ' '), ('CD1', ' '))] data_model = {(None, 0):{'model':1}} xtradata(data_model, structure) self.assertEquals(model.xtra, {'model': 1}) data_chain = {(None, None, 'B'):{'chain':1}} xtradata(data_chain, model) self.assertEquals(chain.xtra, {'chain': 1}) data_chain = {(None, 0, 'B'):{'chain': 2}} xtradata(data_chain, structure) self.assertEquals(chain.xtra['chain'], 2) data_residue = {(None, None, 'B', ('LEU', 24, ' ')):{'residue':1}} xtradata(data_residue, model) self.assertEquals(residue.xtra, {'residue': 1}) data_residue = {(None, 0, 'B', ('LEU', 24, ' ')):{'residue':2}} xtradata(data_residue, structure) self.assertEquals(residue.xtra, {'residue': 2}) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_struct/test_asa.py000644 000765 000024 00000021056 12024702176 022147 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os import numpy as np from numpy import sum from cogent.util.unit_test 
import TestCase, main from cogent.app.util import ApplicationNotFoundError from cogent.parse.pdb import PDBParser from cogent.struct.selection import einput from cogent.maths.stats.test import correlation __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" class DummyFile(object): def __init__(self, some_string): self.some_string = some_string slist = self.some_string.split('\n') self.some_string_list = [i + '\n' for i in slist] def readlines(self): return self.some_string_list def close(self): pass test_file_water = """ HETATM 1185 O HOH 268 141.577 14.676 13.168 1.00 54.76 O HETATM 1186 O HOH 269 137.019 17.606 19.854 1.00 33.36 O HETATM 1187 O HOH 270 149.639 55.203 4.611 1.00 49.01 O HETATM 1188 O HOH 271 156.238 32.191 -4.204 1.00 64.53 O CONECT 453 685 CONECT 685 453 MASTER 357 0 0 1 10 0 0 6 1187 1 2 12 END """ dummy_water = DummyFile(test_file_water) class asaTest(TestCase): """Tests for surface calculations.""" def setUp(self): self.arr = np.random.random(3000).reshape((1000, 3)) self.point = np.random.random(3) self.center = np.array([0.5, 0.5, 0.5]) def test_0import(self): # sort by name """tests if can import _asa cython extension.""" global _asa from cogent.struct import _asa assert 'asa_loop' in dir(_asa) def test_1import(self): # sort by name """tests if can import asa.""" global asa from cogent.struct import asa def test_asa_loop(self): """tests if inner asa_loop (cython) performs correctly""" self.lcoords = np.array([[-4., 0, 0], [0, 0, 0], [4, 0, 0], [10, 0, 0]]) self.qcoords = np.array([[0., 0, 0], [4., 0, 0]]) self.lradii = np.array([2., 3.]) self.qradii = np.array([3., 2.]) #spoints, np.ndarray[DTYPE_t, ndim =1] box,\ # DTYPE_t probe, unsigned int bucket_size, MAXSYM =200000) self.spoints = np.array([[1., 0., 0.], [-1., 0., 0.], [0., 1., 
0.], \ [0., -1., 0.], [0., 0., 1.], [0., 0., -1.]]) output = _asa.asa_loop(self.qcoords, self.lcoords, self.qradii, \ self.lradii, self.spoints, \ np.array([-100., -100., -100., 100., 100., 100.]), 1., 10) self.assertFloatEqual(output, np.array([ 75.39822369, 41.88790205])) def test_asa_xtra(self): """test internal asa""" self.input_file = os.path.join('data', '2E12.pdb') self.input_structure = PDBParser(open(self.input_file)) self.assertRaises(ValueError, asa.asa_xtra, self.input_structure, mode='a') result = asa.asa_xtra(self.input_structure) a = einput(self.input_structure, 'A') for i in range(len(result)): self.assertEquals(result.values()[i]['ASA'], a[result.keys()[i]].xtra['ASA']) r = einput(self.input_structure, 'R') for water in r.selectChildren('H_HOH', 'eq', 'name').values(): self.assertFalse('ASA' in water.xtra) for residue in r.selectChildren('H_HOH', 'ne', 'name').values(): for a in residue: self.assertTrue('ASA' in a.xtra) result = asa.asa_xtra(self.input_structure, xtra_key='SASA') for residue in r.selectChildren('H_HOH', 'ne', 'name').values(): for a in residue: a.xtra['ASA'] == a.xtra['SASA'] def test_asa_xtra_stride(self): """test asa via stride""" self.input_file = os.path.join('data', '2E12.pdb') self.input_structure = PDBParser(open(self.input_file)) try: result = asa.asa_xtra(self.input_structure, 'stride') except ApplicationNotFoundError: return self.assertAlmostEqual(self.input_structure[(0,)][('B',)]\ [(('LEU', 35, ' '),)].xtra['STRIDE_ASA'], 17.20) def test_compare(self): """compares internal asa to stride.""" self.input_file = os.path.join('data', '2E12.pdb') self.input_structure = PDBParser(open(self.input_file)) try: asa.asa_xtra(self.input_structure, mode='stride') except ApplicationNotFoundError: return asa.asa_xtra(self.input_structure) self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True) residues = einput(self.input_structure, 'R') asa1 = [] asa2 = [] for residue in residues.selectChildren('H_HOH', 'ne', 'name').values(): 
            asa1.append(residue.xtra['ASA'])
            asa2.append(residue.xtra['STRIDE_ASA'])
        self.assertAlmostEqual(correlation(asa1, asa2)[1], 0.)

    def test_uc(self):
        """compares asa within unit cell."""
        self.input_file = os.path.join('data', '2E12.pdb')
        self.input_structure = PDBParser(open(self.input_file))
        asa.asa_xtra(self.input_structure, symmetry_mode='uc',
                     xtra_key='ASA_UC')
        asa.asa_xtra(self.input_structure)
        self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
        self.input_structure.propagateData(sum, 'A', 'ASA_UC', xtra=True)
        residues = einput(self.input_structure, 'R')
        x = residues[('2E12', 0, 'B', ('GLU', 77, ' '))].xtra.values()
        self.assertTrue(x[0] != x[1])

    def test_uc2(self):
        self.input_file = os.path.join('data', '1LJO.pdb')
        self.input_structure = PDBParser(open(self.input_file))
        asa.asa_xtra(self.input_structure, symmetry_mode='uc',
                     xtra_key='ASA_XTAL')
        asa.asa_xtra(self.input_structure)
        self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
        self.input_structure.propagateData(sum, 'A', 'ASA_XTAL', xtra=True)
        residues = einput(self.input_structure, 'R')
        r1 = residues[('1LJO', 0, 'A', ('ARG', 65, ' '))]
        r2 = residues[('1LJO', 0, 'A', ('ASN', 46, ' '))]
        self.assertFloatEqual(r1.xtra.values(),
                              [128.94081270529105, 22.807700865674093])
        self.assertFloatEqual(r2.xtra.values(),
                              [115.35738419425566, 115.35738419425566])

    def test_crystal(self):
        """compares asa within the crystal."""
        self.input_file = os.path.join('data', '2E12.pdb')
        self.input_structure = PDBParser(open(self.input_file))
        asa.asa_xtra(self.input_structure, symmetry_mode='uc', crystal_mode=2,
                     xtra_key='ASA_XTAL')
        asa.asa_xtra(self.input_structure)
        self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
        self.input_structure.propagateData(sum, 'A', 'ASA_XTAL', xtra=True)
        residues = einput(self.input_structure, 'R')
        r1 = residues[('2E12', 0, 'A', ('ALA', 42, ' '))]
        r2 = residues[('2E12', 0, 'A', ('VAL', 8, ' '))]
        r3 = residues[('2E12', 0, 'A', ('LEU', 25, ' '))]
        self.assertFloatEqual(r1.xtra.values(), \
                              [32.041070749038823, 32.041070749038823])
        self.assertFloatEqual(r3.xtra.values(), \
                              [0., 0.])
        self.assertFloatEqual(r2.xtra.values(), \
                              [28.873559956056916, 0.0])

    def test__prepare_entities(self):
        self.input_structure = PDBParser(dummy_water)
        self.assertRaises(ValueError, asa._prepare_entities,
                          self.input_structure)

    def _test_bio(self):
        """compares asa within a bio unit."""
        self.input_file = os.path.join('data', '1A1X.pdb')
        self.input_structure = PDBParser(open(self.input_file))
        asa.asa_xtra(self.input_structure, symmetry_mode='bio',
                     xtra_key='ASA_BIO')
        asa.asa_xtra(self.input_structure)
        self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
        self.input_structure.propagateData(sum, 'A', 'ASA_BIO', xtra=True)
        residues = einput(self.input_structure, 'R')
        r1 = residues[('1A1X', 0, 'A', ('GLU', 37, ' '))]
        r2 = residues[('1A1X', 0, 'A', ('TRP', 15, ' '))]
        self.assertFloatEqual(r1.xtra.values(), \
                              [20.583191467544726, 78.996394472066541])
        self.assertFloatEqual(r2.xtra.values(), \
                              [136.41436710386989, 136.41436710386989])

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_struct/test_contact.py

#!/usr/bin/env python
import os
import numpy as np
try:
    from cogent.util.unit_test import TestCase, main
    from cogent.parse.pdb import PDBParser
    from cogent.struct.selection import einput
except ImportError:
    from zenpdb.cogent.util.unit_test import TestCase, main
    from zenpdb.cogent.parse.pdb import PDBParser
    from zenpdb.cogent.struct.selection import einput

__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"


class asaTest(TestCase):
    """Tests for surface calculations."""

    def setUp(self):
        self.arr = np.random.random(3000).reshape((1000,
                                                   3))
        self.point = np.random.random(3)
        self.center = np.array([0.5, 0.5, 0.5])

    def test_0import(self):  # sort by name
        """tests if can import _contact cython extension."""
        global _contact
        from cogent.struct import _contact
        assert 'cnt_loop' in dir(_contact)

    def test_1import(self):  # sort by name
        """tests if can import contact."""
        global contact
        from cogent.struct import contact

    def test_chains(self):
        """compares contacts diff chains"""
        self.input_file = os.path.join('data', '1A1X.pdb')  # one chain
        self.input_structure = PDBParser(open(self.input_file))
        res = contact.contacts_xtra(self.input_structure)
        self.assertTrue(res == {})
        self.input_file = os.path.join('data', '2E12.pdb')  # two chains
        self.input_structure = PDBParser(open(self.input_file))
        res = contact.contacts_xtra(self.input_structure)
        self.assertTrue(res)
        self.assertFloatEqual(\
            res[('2E12', 0, 'B', ('THR', 17, ' '), ('OG1', ' '))]['CONTACTS']\
               [('2E12', 0, 'A', ('ALA', 16, ' '), ('CB', ' '))][0],
            5.7914192561064004)

    def test_symmetry(self):
        """compares contacts diff symmetry mates"""
        self.input_file = os.path.join('data', '2E12.pdb')
        self.input_structure = PDBParser(open(self.input_file))
        res = contact.contacts_xtra(self.input_structure, \
                                    symmetry_mode='uc',
                                    contact_mode='diff_sym')
        self.assertTrue(res)
        self.assertFloatEqual(\
            res[('2E12', 0, 'B', ('GLU', 77, ' '), ('OE2', ' '))]['CONTACTS']\
               [('2E12', 0, 'B', ('GLU', 57, ' '), ('OE2', ' '))][0], \
            5.2156557833123873)

    def test_crystal(self):
        """compares contacts diff unit-cell-mates"""
        pass

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_struct/test_dihedral.py

#!/usr/bin/env python
#
# test_dihedral.py
#
# Tests the dihedral module.
# """Provides tests for functions in the file dihedral.py """ __author__ = "Kristian Rother" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Kristian Rother", "Sandra Smit"] __credits__ = ["Janusz Bujnicki", "Nils Goldmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kristian Rother" __email__ = "krother@rubor.de" __status__ = "Production" from cogent.util.unit_test import main, TestCase from cogent.struct.dihedral import dihedral, scalar, angle, \ calc_angle, DihedralGeometryError, AngleGeometryError from random import random from numpy import array from math import pi, cos, sin class DihedralTests(TestCase): def get_random_array(self): """ Returns a one-dimensional numpy array with three random floats in the range between -5.0 and +5.0. """ return array([(random()-0.5)*10,(random()-0.5)*10,(random()-0.5)*10]) def assertAlmostEqualAngle(self, is_value, should_value, digits=7): """ Checks two angles in degrees whether they are the same within the given number of digits. 
        This has been implemented to make sure that 359.9999991 == 0.0
        """
        maxv = 359.0
        for i in range(digits):
            maxv += 0.9 * 0.1**i
        while is_value < 0.0:
            is_value += 360
        while is_value > maxv:
            is_value -= 360
        while should_value < 0.0:
            should_value += 360
        while should_value > maxv:
            should_value -= 360
        self.assertAlmostEqual(is_value, should_value, digits)

    def test_scalar(self):
        """Tests the scalar product function for one-dimensional arrays."""
        # test one-dimensional arrays
        self.assertEqual(scalar(array([0]),array([0])), 0.0)
        self.assertEqual(scalar(array([2]),array([3])), 6.0)
        self.assertEqual(scalar(array([0,0]),array([0,0])), 0.0)
        self.assertEqual(scalar(array([-1,-4]),array([1,4])), -17.0)
        self.assertEqual(scalar(array([1,2]),array([3,4])), 11.0)
        self.assertEqual(scalar(array([-1,-4]),array([-1,4])), -15.0)
        self.assertEqual(scalar(array([0,0,0]),array([0,0,0])), 0.0)
        self.assertEqual(scalar(array([2.5,0,-1]),array([2.5,0,-1])), 7.25)
        self.assertEqual(scalar(array([1,2,3]),array([0,0,0])), 0.0)
        self.assertEqual(scalar(array([1,2,3]),array([1,2,3])), 14.0)
        self.assertEqual(scalar(array([1,2,3]),array([4,5,6])), 32.0)
        # test two-dimensional arrays (should not be a feature)
        self.assertNotEqual(scalar(array([[0,0],[0,0]]),\
                                   array([[0,0],[0,0]])), 0.0)

    def test_angle_simple(self):
        """Tests the angle function for one- and two-dimensional vectors."""
        # test two-dimensional vectors (not arrays!)
        self.assertEqual(angle(array([0,1]),array([1,0])), 0.5*pi)
        self.assertEqual(angle(array([5,0]),array([13,0])), 0.0)
        self.assertEqual(angle(array([2,3]),array([26,39])), 0.0)
        self.assertEqual(angle(array([2,3]),array([-3,2])), 0.5*pi)
        self.assertEqual(angle(array([-5,0]),array([13,0])), pi)
        # test three-dimensional vectors (not arrays!)
        self.assertEqual(angle(array([0,0,-1]),array([0,0,1])), pi)
        self.assertEqual(angle(array([0,15,-1]),array([0,-15,1])), pi)
        self.assertEqual(angle(array([0,0,7]),array([14,14,0])), 0.5*pi)
        self.assertEqual(angle(array([0,7,7]),array([0,14,14])), 0.0)
        self.assertAlmostEqual(angle(array([100000000.0,0,1]),\
                                     array([1,0,0])), 0.0)

    def test_calc_angle_simple(self):
        """Tests the calc_angle function for one- and two-dimensional vectors."""
        # test two-dimensional vectors (not arrays!)
        self.assertEqual(calc_angle(array([0,1]),array([0,0]),array([1,0])), 0.5*pi)
        self.assertEqual(calc_angle(array([5,0]),array([0,0]),array([13,0])), 0.0)
        self.assertEqual(calc_angle(array([4,3]),array([2,0]),array([28,39])), 0.0)
        self.assertEqual(calc_angle(array([2,-13]),array([0,-10]),array([-3,-12])), 0.5*pi)
        self.assertEqual(calc_angle(array([-5,0]),array([0,0]),array([13,0])), pi)
        # test three-dimensional vectors (not arrays!)
        self.assertEqual(calc_angle(array([0,0,-1]),array([0,0,0]),array([0,0,1])), pi)
        self.assertEqual(calc_angle(array([0,15,-1]),array([0,0,0]),array([0,-15,1])), pi)
        self.assertEqual(calc_angle(array([0,10,7]),array([0,10,0]),array([14,24,0])), 0.5*pi)
        self.assertEqual(calc_angle(array([0,7,7]),array([0,0,0]),array([0,14,14])), 0.0)
        self.assertAlmostEqual(calc_angle(array([100000000.0,0,1]),\
                                          array([0,0,0]),array([1,0,0])), 0.0)

##    def make_scipy_angles(self):
##        """Generates the test data given below. Was commented out
##        to get rid of the library dependency."""
##        from Scientific.Geometry import Vector
##        for i in range(20):
##            v1 = self.get_random_array()
##            v2 = self.get_random_array()
##            vec1 = Vector(v1[0],v1[1],v1[2])
##            vec2 = Vector(v2[0],v2[1],v2[2])
##            scipy_angle = vec1.angle(vec2)
##            out = [(vec1[0],vec1[1],vec1[2]),\
##                   (vec2[0],vec2[1],vec2[2]),scipy_angle]
##            print out,","

    def test_angle_scipy(self):
        """
        Asserts that dihedral and ScientificPython calculate the same angles.
""" for v1,v2,scipy_angle in SCIPY_ANGLES: ang = angle(array(v1),array(v2)) self.assertAlmostEqual(ang, scipy_angle) def test_angle_fail(self): """The angle function should fail for zero length vectors.""" # should not work for zero length vectors self.assertRaises(AngleGeometryError,angle,\ array([0,0]),array([0,0])) self.assertRaises(AngleGeometryError,angle,\ array([0,0,0]),array([0,0,0])) def test_dihedral_eight_basic_directions(self): """Checks dihedrals in all 45 degree intervals.""" # using vectors with integer positions. self.assertAlmostEqualAngle(\ dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2,-1, 0]), 0.0) self.assertAlmostEqualAngle(\ dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2,-1,-1]), 45.0) self.assertAlmostEqualAngle(\ dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 0,-1]), 90.0) self.assertAlmostEqualAngle(\ dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 1,-1]),135.0) self.assertAlmostEqualAngle(\ dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 1, 0]),180.0) self.assertAlmostEqualAngle(\ dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 1, 1]),225.0) self.assertAlmostEqualAngle(\ dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 0, 1]),270.0) self.assertAlmostEqualAngle(\ dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2,-1, 1]),315.0) def test_dihedral_rotation(self): """Checks all angles in 0.2 degree intervals.""" # constructs vectors using sin/cos and then calculates dihedrals precision = 5.0 # the higher the better v1 = array([1,0,1]) v2 = array([0,0,1]) v3 = array([0,0,2]) for i in range(int(360*precision)): degrees = i/precision radians = pi*degrees/180.0 opp_degrees = 360-degrees if opp_degrees == 360.0: opp_degrees = 0.0 # construct circular motion of vector v4 = array([cos(radians), sin(radians),2]) self.assertAlmostEqualAngle(dihedral(v4,v3,v2,v1), degrees, 5) # check rotation in the opposite direction self.assertAlmostEqualAngle(dihedral(v1,v2,v3,v4), degrees, 5) def test_dihedral_samples(self): """Checks values measured manually from atoms in PyMOL.""" coordinates = [ 
            [(-1.225,4.621,42.070),(-1.407,4.455,43.516),\
             (-2.495,4.892,44.221),(-3.587,5.523,43.715)],
            [(-2.495,4.892,44.221),(1.513,0.381,40.711),\
             (-3.091,4.715,47.723),(-0.567,3.892,44.433)],
            [(-0.349,5.577,39.446),(-1.559,3.400,41.427),\
             (-4.304,5.563,45.998),(-2.495,4.892,44.221)],
            [(-45.819,84.315,19.372),(-31.124,72.286,14.035),\
             (-27.975,58.688,7.025),(-16.238,78.659,23.731)],
            [(-29.346,66.973,24.152),(-29.977,69.635,24.580),\
             (-30.875,68.788,24.663),(-30.668,67.495,24.449)],
            [(-34.586,84.884,14.064),(-23.351,69.756,11.028),\
             (-40.924,69.442,24.630),(-30.875,68.788,24.663)]
            ]
        angles = [1.201, 304.621, 295.672, 195.184, 358.699, 246.603]
        for i in range(len(coordinates)):
            v1,v2,v3,v4 = coordinates[i]
            self.assertAlmostEqualAngle(dihedral(v1,v2,v3,v4), angles[i], 3)

    def test_dihedral_linear(self):
        """The dihedral function should fail for collinear vectors."""
        v1 = [1,0,0]
        v2 = [2,0,0]
        v3 = [3,0,0]
        v4 = [4,0,0]
        # print dihedral(v1,v2,v3,v4)
        for i in range(100):
            offset = array([int((random()-0.5)*10),\
                            int((random()-0.5)*10),\
                            int((random()-0.5)*10)])
            v1 = array([int((random()-0.5)*100),\
                        int((random()-0.5)*100),\
                        int((random()-0.5)*100)])
            v2 = v1 * int((random()-0.5)*100) + offset
            v3 = v1 * int((random()-0.5)*100) + offset
            v4 = v1 * int((random()-0.5)*100) + offset
            v1 += offset
            self.assertRaises(DihedralGeometryError, dihedral, v1, v2, v3, v4)

    def test_dihedral_identical(self):
        """The dihedral function should fail if two vectors are the same."""
        # except for the first and last (the vectors form a triangle),
        # in which case the dihedral angle should be 0.0
        for i in range(100):
            v1 = self.get_random_array()
            v2 = self.get_random_array()
            v3 = self.get_random_array()
            self.assertRaises(DihedralGeometryError, dihedral, v1, v1, v2, v3)
            self.assertRaises(DihedralGeometryError, dihedral, v1, v2, v1, v3)
            self.assertRaises(DihedralGeometryError, dihedral, v1, v2, v2, v3)
            self.assertRaises(DihedralGeometryError, dihedral, v1, v2, v3, v3)
            self.assertRaises(DihedralGeometryError, dihedral, v1, v3, v2, v3)
            # now the triangular case
            # make sure that 359.999998 is equal to 0.0
            torsion = dihedral(v1,v2,v3,v1) + 0.000001
            if torsion > 360.0:
                torsion -= 360.0
            self.assertAlmostEqualAngle(torsion, 0.0, 5)


SCIPY_ANGLES = [
    [(-4.4891521637990852, -1.2310927013330153, -0.96969777583098771),
     (4.2147455310344171, -3.5069051036633514, 2.2430685816310305),
     2.2088870817461759],
    [(0.13959847081794097, 1.7204537912940399, -1.9303780516641089),
     (0.35412687539602361, -2.9493521724340743, -4.865941405480644),
     1.2704043143950585],
    [(2.3192363837822327, -3.6376441859213848, -2.2337816400479813),
     (-1.0271253661119029, -2.5736009846920425, -4.1470855710278975),
     0.83609068310373857],
    [(-0.38347986357358477, -4.1453876196041719, -2.1354583394773785),
     (0.27416747968044608, -2.5747732838982551, -0.68554680652905264),
     0.28348352552436806],
    [(-2.4928231204363449, 1.9263125976608209, -0.34275964486924715),
     (0.6721152528064811, 1.5270465172130598, -3.5720133753579564),
     1.3701121510959966],
    [(-1.50101403139692, 2.3218275982958292, 1.044582416480222),
     (-3.044743729573085, 2.0655933798619532, 2.9037849925327897),
     0.46218988498537578],
    [(4.9648826388927603, -1.7439743270051977, 1.0432135258334796),
     (-3.3694557299188608, 3.7697639370274052, 2.6962018714965055),
     2.3004031013653625],
    [(-3.3337033325729051, -0.79660906888508021, -3.4875326261817454),
     (1.4735023133671066, -0.066399047153666846, 0.94171530437632489),
     2.8293957790283595],
    [(-2.1249404252000517, 2.7456001658201568, 1.6891202129451799),
     (-0.66412553435299504, 3.371012200444512, -1.1548086037901306),
     0.89846374464990042],
    [(3.3993205618602018, 1.2047703532166887, -1.5839949555795063),
     (-4.6759756026580863, -2.8551222890449646, 4.888270217692785),
     2.7825564894291754],
    [(-3.966467296275785, 0.75617096138383189, -3.1711352932360248),
     (2.1054362220912326, 4.2867761689586601, -0.65739369331424213),
     1.6933117193742961],
    [(0.44413554305522851, 4.6000690382282361, -3.8338383756621819),
     (-2.4947413565865029, 1.8136080147734013, -4.0295344084655405),
     0.73110709489481174],
    [(-1.0971777991639065, -3.3166205797568815, 2.7098739534055563),
     (2.1536566381847289, 4.7817155120086055, -0.068554664323454695),
     2.4878958372925202],
    [(2.2696914760438136, 4.8841630875833673, 4.9524177412608861),
     (1.2249510822623111, 0.73008672334971658, 1.131607772478449),
     0.45741431674591532],
    [(-2.4456899797216938, 4.7894200033986447, -2.839449354837468),
     (0.95035116225980154, 4.2179212878828238, -1.801158217734109),
     0.63144509227951684],
    [(-4.6954041297179474, -2.8326266911591391, -1.1804869511610427),
     (3.2585456362256924, -2.2325171051479265, -2.0527260317826901),
     1.8363466110668369],
    [(2.1416146613604283, 3.8577375591718677, 3.1463493245087939),
     (-0.32185887240442468, 2.2163051363839505, 2.4704882512058224),
     0.52534998201320993],
    [(-2.8493351354335941, -3.8203784990110954, 2.4657357720402273),
     (-2.7799043389229383, 4.358406526726669, -2.8319872383058744),
     2.0906833125217235],
    [(2.274223250163784, -3.6086250253596406, 1.7143006579401876),
     (3.2763334328544347, -0.89908959703552171, -4.4068824993431557),
     1.4477009361020545],
    [(0.66737672421842809, -3.4628508908383848, 3.9044108358095366),
     (-1.9078974719893915, -0.53231141116878433, 1.3323584972786728),
     1.0932781951137689],
    ]

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_struct/test_knots.py

#!/usr/bin/env python
# test_knots.py
"""Provides tests for classes and functions in the file knots.py
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.util.dict2d import Dict2D
from cogent.struct.rna2d import Pairs
from cogent.struct.knots import PairedRegion, PairedRegionFromPairs,\
    PairedRegions, PairedRegionsFromPairs, ConflictMatrix,\
    opt_all, contains_true, empty_matrix,\
    pick_multi_best, dp_matrix_multi, matrix_solutions,\
    opt_single_random, opt_single_property,\
    inc_order, inc_length, inc_range,\
    find_max_conflicts, find_min_gain,\
    conflict_elimination, add_back_non_conflicting,\
    num_bps, hydrogen_bonds,\
    nussinov_fill, nussinov_traceback, nussinov_restricted

__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"


class PairedRegionTests(TestCase):
    """Tests for PairedRegion class"""

    def test_init_valid(self):
        """PairedRegion __init__: should work as expected on valid input
        """
        pr = PairedRegion(3,10,2)
        self.assertEqual(pr.Start, 3)
        self.assertEqual(pr.End, 10)
        self.assertEqual(pr.Length, 2)
        self.assertEqual(pr.Pairs, [(3,10),(4,9)])
        pr = PairedRegion(3,10,2, Id=0)
        self.assertEqual(pr.Id, 0)
        pr = PairedRegion(3,10,2, Id='A')
        self.assertEqual(pr.Id, 'A')
        self.assertRaises(ValueError, PairedRegion, 4, 10, 0)

    def test_init_weird(self):
        """PairedRegion __init__: no error checking
        """
        pr = PairedRegion(3,6,4)
        self.assertEqual(pr.Start, 3)
        self.assertEqual(pr.End, 6)
        self.assertEqual(pr.Length, 4)
        self.assertEqual(pr.Pairs, [(3,6),(4,5),(5,4),(6,3)])

    def test_str(self):
        """PairedRegion __str__: should print pairs"""
        pr = PairedRegion(3,10,2)
        p = Pairs([(3,10),(4,9)])
        self.assertEqual(str(pr), str(p))

    def test_len(self):
        """PairedRegion __len__: should return number of pairs"""
        pr = PairedRegion(3,10,2)
        self.assertEqual(len(pr), 2)

    def test_eq(self):
        """PairedRegion __eq__: should use pairs and IDs"""
        pr1 = PairedRegion(3,10,2)
        pr2 = PairedRegion(3,10,2)
        pr3 = PairedRegion(3,10,2, Id='A')
        pr4 = PairedRegion(3,10,2, Id='A')
        pr5 = PairedRegion(3,20,4, Id='A')
        self.assertEqual(pr1==pr2, True)   # same pairs, no IDs
        self.assertEqual(pr3==pr4, True)   # same pairs, same IDs
        self.assertEqual(pr1==pr3, False)  # same pairs, diff ID
        self.assertEqual(pr3==pr5, False)  # diff pairs, same IDs

    def test_upstream(self):
        """PairedRegion upstream: single and multiple pair(s)"""
        pr = PairedRegion(3,10,2)
        self.assertEqual(pr.upstream(), [3,4])
        pr = PairedRegion(3,10,1)
        self.assertEqual(pr.upstream(), [3])

    def test_downstream(self):
        """PairedRegion downstream: single and multiple pair(s)"""
        pr = PairedRegion(3,10,2)
        self.assertEqual(pr.downstream(), [9,10])
        pr = PairedRegion(3,10,1)
        self.assertEqual(pr.downstream(), [10])

    def test_paired(self):
        """PairedRegion paired: single and multiple pair(s)"""
        pr = PairedRegion(3,10,2)
        self.assertEqual(pr.paired(), [3,4,9,10])
        pr = PairedRegion(3,10,1)
        self.assertEqual(pr.paired(), [3,10])

    def test_regionRange(self):
        """PairedRegion regionRange: single and multiple pair(s)"""
        pr = PairedRegion(3,10,2)
        self.assertEqual(pr.range(), 4)
        pr = PairedRegion(1,10,4)
        self.assertEqual(pr.range(), 2)
        pr = PairedRegion(1,5,1)
        self.assertEqual(pr.range(), 3)
        # no error checking
        pr = PairedRegion(5,8,3)  # 5,6,7 -- 6,7,8
        self.assertEqual(pr.range(), -2)

    def test_overlapping(self):
        """PairedRegion overlapping: identical and different regions"""
        pr1 = PairedRegion(1,10,2)
        pr2 = PairedRegion(3,15,2)
        self.assertEqual(pr1.overlapping(pr2), False)
        self.assertEqual(pr2.overlapping(pr1), False)
        pr1 = PairedRegion(1,10,2)
        pr2 = PairedRegion(2,15,2)
        pr3 = PairedRegion(9,20,4)
        self.assertEqual(pr1.overlapping(pr2), True)
        self.assertEqual(pr2.overlapping(pr1), True)
        self.assertEqual(pr1.overlapping(pr3), True)
        bl1 = PairedRegion(2,10,1)
        bl2 = PairedRegion(12,20,3)
        self.assertEqual(bl1.overlapping(bl2), False)
        pr1 = PairedRegion(1,10,2, Id='A')
        pr2 = PairedRegion(1,10,2, Id='A')
        pr3 = PairedRegion(1,10,2, Id='B')
        self.assertEqual(pr1.overlapping(pr2), True)
        self.assertEqual(pr1.overlapping(pr3), True)  # ignore ID

    def test_conflicting(self):
        """PairedRegion conflicting: identical, nested and pseudoknot"""
        bl1 = PairedRegion(2,10,3)
        bl2 = PairedRegion(12,20,3)
        # identical blocks are NOT conflicting...
        self.assertEqual(bl1.conflicting(bl1), False)  # identical blocks
        self.assertEqual(bl1.conflicting(bl2), False)  # one after the other
        self.assertEqual(bl2.conflicting(bl1), False)  # one after the other
        bl1 = PairedRegion(1,30,2)   # [(1,30),(2,29)]
        bl2 = PairedRegion(14,20,2)  # [(14,20),(15,19)]
        self.assertEqual(bl1.conflicting(bl2), False)  # one inside the other
        self.assertEqual(bl2.conflicting(bl1), False)  # one inside the other
        bl1 = PairedRegion(1,10,2)  # [(1,10),(2,9)]
        bl2 = PairedRegion(4,15,3)  # [(4,15),(5,14),(6,13)]
        self.assertEqual(bl1.conflicting(bl2), True)  # pseudoknot
        self.assertEqual(bl2.conflicting(bl1), True)  # pseudoknot

    def test_score(self):
        """PairedRegion score: should take arbitrary scoring function"""
        f = lambda x: x.Length  # scoring function
        bl1 = PairedRegion(2,10,3)
        bl2 = PairedRegion(12,30,4)
        bl1.score(f)  # set Score attribute for bl1
        bl2.score(f)  # set Score attribute for bl2
        self.assertEqual(bl1.Score, 3)
        self.assertEqual(bl2.Score, 4)

    def test_PairedRegionFromPairs(self):
        """PairedRegionFromPairs: should handle valid input"""
        p = Pairs([(3,10),(4,9),(5,8)])
        pr = PairedRegionFromPairs(p, Id='A')
        self.assertEqual(pr.Start, 3)
        self.assertEqual(pr.End, 10)
        self.assertEqual(pr.Length, 3)
        self.assertEqual(pr.Id, 'A')
        self.assertEqual(pr.Pairs, [(3,10),(4,9),(5,8)])

    def test_PairedRegionFromPairs_invalid(self):
        """PairedRegionFromPairs: conflicts and error checking"""
        p = Pairs([(3,10),(4,9),(4,None)])
        self.assertRaises(ValueError, PairedRegionFromPairs, p)
        # no error checking on input pairs...
        p = Pairs([(3,10),(4,9),(6,8)])  # not a real paired region
        pr = PairedRegionFromPairs(p, Id='A')
        self.assertEqual(pr.Start, 3)
        self.assertEqual(pr.End, 10)
        self.assertEqual(pr.Length, 3)
        self.assertEqual(pr.Id, 'A')
        # NOTE: Pairs will be different than input because assumption does
        # not hold
        self.assertEqual(pr.Pairs, [(3,10),(4,9),(5,8)])
        self.assertRaises(ValueError, PairedRegionFromPairs, [])


class PairedRegionsTests(TestCase):
    """Tests for PairedRegions class"""

    def test_init(self):
        """PairedRegions __init__: should accept list of PairedRegion objects
        """
        pr1 = PairedRegion(3,10,2)
        pr2 = PairedRegion(12,20,3)
        obs = PairedRegions([pr1, pr2])
        self.assertEqual(obs[0].Start, 3)
        self.assertEqual(obs[1].Id, None)
        self.assertEqual(obs[1].End, 20)

    def test_init_no_validation(self):
        """PairedRegions __init__: does not perform validation
        """
        # can give any list of arbitrary objects as input
        obs = PairedRegions([1,2,3])
        self.assertEqual(obs[0], 1)

    def test_str(self):
        """PairedRegions __str__: full and empty list
        """
        pr1 = PairedRegion(3,10,2)
        pr2 = PairedRegion(12,20,3, Id='A')
        prs = PairedRegions([pr1, pr2])
        self.assertEqual(str(prs), "(None:3,10,2; A:12,20,3;)")

    def test_eq(self):
        """PairedRegions __eq__: with/without IDs, in or out of order
        """
        pr1 = PairedRegion(3,10,2)
        pr2 = PairedRegion(12,20,3, Id='A')
        prs1 = PairedRegions([pr1, pr2])
        pr3 = PairedRegion(3,10,3)
        pr4 = PairedRegion(20,30,3, Id='A')
        prs2 = PairedRegions([pr3, pr4])
        pr5 = PairedRegion(3,10,2)
        pr6 = PairedRegion(12,20,3, Id='A')
        prs3 = PairedRegions([pr5, pr6])
        prs4 = PairedRegions([pr6, pr5])
        self.assertEqual(prs1==prs2, False)
        self.assertEqual(prs1==prs1, True)
        self.assertEqual(prs1==prs3, True)
        self.assertEqual(prs1==prs4, True)

    def test_ne(self):
        """PairedRegions __ne__: with/without IDs, in or out of order
        """
        pr1 = PairedRegion(3,10,2)
        pr2 = PairedRegion(12,20,3, Id='A')
        prs1 = PairedRegions([pr1, pr2])
        pr3 = PairedRegion(3,10,3)
        pr4 = PairedRegion(20,30,3, Id='A')
        prs2 = PairedRegions([pr3, pr4])
        pr5 = PairedRegion(3,10,2)
        pr6 = PairedRegion(12,20,3, Id='A')
        prs3 = PairedRegions([pr5, pr6])
        prs4 = PairedRegions([pr6, pr5])
        self.assertEqual(prs1!=prs2, True)
        self.assertEqual(prs1!=prs1, False)
        self.assertEqual(prs1!=prs3, False)
        self.assertEqual(prs1!=prs4, False)

    def test_byId(self):
        """PairedRegions byId: unique IDs and duplicates
        """
        pr1 = PairedRegion(3,10,2, Id='A')
        pr2 = PairedRegion(12,20,3, Id='B')
        prs1 = PairedRegions([pr1, pr2])
        obs = prs1.byId()
        self.assertEqual(obs['A'], pr1)
        self.assertEqual(obs['B'], pr2)
        self.assertEqual(len(obs), 2)
        pr3 = PairedRegion(3,10,2, Id='A')
        pr4 = PairedRegion(12,20,3, Id='A')
        prs2 = PairedRegions([pr3, pr4])
        self.assertRaises(ValueError, prs2.byId)
        pr3 = PairedRegion(3,10,2)
        pr4 = PairedRegion(12,20,3)
        prs2 = PairedRegions([pr3, pr4])
        self.assertRaises(ValueError, prs2.byId)
        self.assertEqual(PairedRegions().byId(), {})

    def test_numberOfRegions(self):
        """PairedRegions numberOfRegions: full and empty"""
        pr1 = PairedRegion(2,10,2)
        pr2 = PairedRegion(11,20,4)
        prs1 = PairedRegions([pr1,pr2])
        prs2 = PairedRegions([pr1,pr1,pr2,pr2])
        self.assertEqual(prs1.numberOfRegions(), 2)
        self.assertEqual(prs2.numberOfRegions(), 4)
        self.assertEqual(PairedRegions().numberOfRegions(), 0)

    def test_totalLength(self):
        """PairedRegions totalLength: full and empty"""
        pr1 = PairedRegion(2,10,2)
        pr2 = PairedRegion(11,20,4)
        prs1 = PairedRegions([pr1,pr2])
        self.assertEqual(prs1.totalLength(), 6)
        self.assertEqual(PairedRegions().totalLength(), 0)

    def test_totalScore(self):
        """PairedRegions totalScore: full, empty, None"""
        pr1 = PairedRegion(2,10,2)
        pr1.Score = 3
        pr2 = PairedRegion(11,20,4)
        pr2.Score = 2
        pr3 = PairedRegion(11,20,4)
        pr3.Score = None
        pr4 = PairedRegion(11,20,4)
        pr4.Score = "abc"
        pr5 = PairedRegion(11,20,4)
        prs1 = PairedRegions([pr1,pr2])
        prs2 = PairedRegions([pr1,pr3])
        prs3 = PairedRegions([pr1,pr4])
        prs4 = PairedRegions([pr1,pr5])
        self.assertEqual(prs1.totalScore(), 5)
        self.assertRaises(ValueError, prs2.totalScore)
        self.assertRaises(ValueError, prs3.totalScore)
        self.assertRaises(ValueError, prs4.totalScore)

    def test_toPairs(self):
        """PairedRegions toPairs: good data"""
        pr1 = PairedRegion(2,10,2)
        pr2 = PairedRegion(11,20,4)
        prs1 = PairedRegions([pr2,pr1])
        exp = [(2,10),(3,9),(11,20),(12,19),(13,18),(14,17)]
        self.assertEqual(prs1.toPairs(), exp)
        prs2 = PairedRegions([pr1,pr1])
        exp = [(2,10),(2,10),(3,9),(3,9)]
        self.assertEqual(prs2.toPairs(), exp)
        self.assertEqual(PairedRegions().toPairs(), Pairs())

    def test_byStartEnd(self):
        """PairedRegions byStartEnd: unique and duplicate keys"""
        pr1 = PairedRegion(2,10,2)
        pr2 = PairedRegion(11,20,4)
        prs1 = PairedRegions([pr2,pr1])
        exp = {(2,10): pr1, (11,20): pr2}
        self.assertEqual(prs1.byStartEnd(), exp)
        pr3 = PairedRegion(2,10,2, Id='A')
        pr4 = PairedRegion(2,10,3, Id='B')
        prs2 = PairedRegions([pr3,pr4])
        self.assertRaises(ValueError, prs2.byStartEnd)

    def test_lowestStart(self):
        """PairedRegions lowestStart: full and empty object"""
        pr1 = PairedRegion(2,10,2)
        pr2 = PairedRegion(11,20,4)
        pr3 = PairedRegion(2,30,5,Id='A')
        prs1 = PairedRegions([pr2,pr1,pr3])
        self.assertEqual(prs1.lowestStart(), 2)
        self.assertEqual(PairedRegions().lowestStart(), None)

    def test_highestEnd(self):
        """PairedRegions highestEnd: full and empty object"""
        pr1 = PairedRegion(2,10,2)
        pr2 = PairedRegion(11,20,4)
        pr3 = PairedRegion(2,30,5,Id='A')
        prs1 = PairedRegions([pr2,pr1,pr3])
        self.assertEqual(prs1.highestEnd(), 30)
        self.assertEqual(PairedRegions().highestEnd(), None)

    def test_sortedIds(self):
        """PairedRegions sortedIds: full and empty list"""
        pr1 = PairedRegion(2,10,2, Id='C')
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(2,30,5,Id='B')
        prs1 = PairedRegions([pr1,pr2,pr3])
        self.assertEqual(prs1.sortedIds(), ['A','B','C'])
        pr1 = PairedRegion(2,10,2, Id='C')
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(2,30,5,Id='C')
        prs1 = PairedRegions([pr1,pr2,pr3])
        self.assertEqual(prs1.sortedIds(), ['A','C','C'])
        pr1 = PairedRegion(2,10,2)
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(2,30,5,Id=2)
        prs1 = PairedRegions([pr1,pr2,pr3])
        self.assertEqual(prs1.sortedIds(), [None, 2, 'A'])

    def test_upstream(self):
        """PairedRegions upstream: full and empty"""
        self.assertEqual(PairedRegions().upstream(), [])
        pr1 = PairedRegion(2,10,2, Id='C')
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(4,30,1,Id='B')
        prs1 = PairedRegions([pr1,pr2,pr3])
        exp = [2,3,11,12,13,14,4]
        exp.sort()
        self.assertEqual(prs1.upstream(), exp)

    def test_downstream(self):
        """PairedRegions downstream: full and empty"""
        self.assertEqual(PairedRegions().downstream(), [])
        pr1 = PairedRegion(2,10,2, Id='C')
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(4,30,1,Id='B')
        prs1 = PairedRegions([pr1,pr2,pr3])
        exp = [10,9,20,19,18,17,30]
        exp.sort()
        self.assertEqual(prs1.downstream(), exp)

    def test_pairedPos(self):
        """PairedRegions pairedPos: full and empty"""
        self.assertEqual(PairedRegions().pairedPos(), [])
        pr1 = PairedRegion(2,10,2, Id='C')
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(4,30,1,Id='B')
        prs1 = PairedRegions([pr1,pr2,pr3])
        exp = [2,3,11,12,13,14,4,10,9,20,19,18,17,30]
        exp.sort()
        self.assertEqual(prs1.pairedPos(), exp)

    def test_boundaries(self):
        """PairedRegions boundaries: full and empty"""
        self.assertEqual(PairedRegions().boundaries(), [])
        pr1 = PairedRegion(2,10,2, Id='C')
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(4,30,1,Id='B')
        prs1 = PairedRegions([pr1,pr2,pr3])
        exp = [2,10,11,20,4,30]
        exp.sort()
        self.assertEqual(prs1.boundaries(), exp)

    def test_enumeratedBoundaries(self):
        """PairedRegions enumeratedBoundaries: full and empty"""
        self.assertEqual(PairedRegions().enumeratedBoundaries(), {})
        pr1 = PairedRegion(2,10,2, Id='C')
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(4,30,1,Id='B')
        prs1 = PairedRegions([pr1,pr2,pr3])
        exp = {0:2,2:10,3:11,4:20,1:4,5:30}
        self.assertEqual(prs1.enumeratedBoundaries(), exp)

    def test_invertedEnumeratedBoundaries(self):
        """PairedRegions
        invertedEnumeratedBoundaries: full and empty"""
        self.assertEqual(PairedRegions().invertedEnumeratedBoundaries(), {})
        pr1 = PairedRegion(2,10,2, Id='C')
        pr2 = PairedRegion(11,20,4, Id='A')
        pr3 = PairedRegion(4,30,1,Id='B')
        prs1 = PairedRegions([pr1,pr2,pr3])
        exp = {2:0,10:2,11:3,20:4,4:1,30:5}
        self.assertEqual(prs1.invertedEnumeratedBoundaries(), exp)
        pr1 = PairedRegion(3,10,2)
        pr2 = PairedRegion(5,10,3)
        prs = PairedRegions([pr1, pr2])
        self.assertRaises(ValueError, prs.invertedEnumeratedBoundaries)

    def test_merge(self):
        """PairedRegions merge: different, duplicates, empty"""
        pr1 = PairedRegion(3,10,2, Id='A')
        pr2 = PairedRegion(11,20,3, Id='B')
        pr3 = PairedRegion(15,25,1, Id='C')
        prs1 = PairedRegions([pr1, pr2])
        prs2 = PairedRegions([pr1, pr3])
        prs3 = PairedRegions()
        exp = PairedRegions([pr1,pr2,pr3])
        self.assertEqual(prs1.merge(prs2), exp)
        self.assertEqual(prs2.merge(prs1), exp)
        self.assertEqual(prs1.merge(prs3), prs1)
        self.assertEqual(prs2.merge(prs3), prs2)

    def test_conflicting_no_ids(self):
        """PairedRegions conflicting: raises error on duplicate IDs
        """
        pr1 = PairedRegion(1,10,2)
        pr2 = PairedRegion(11,20,2)
        prs = PairedRegions([pr1, pr2])
        self.assertRaises(ValueError, prs.conflicting)  # conflicting IDs

    def test_conflicting(self):
        """PairedRegions conflicting: works when IDs are set and unique
        """
        pr1 = PairedRegion(3,10,2, Id='A')
        pr2 = PairedRegion(11,20,3, Id='B')
        pr3 = PairedRegion(15,25,1, Id='C')
        prs = PairedRegions([pr1,pr2,pr3])
        self.assertEqual(prs.conflicting(), PairedRegions([pr2,pr3]))
        prs = PairedRegions()
        self.assertEqual(prs.conflicting(), PairedRegions())

    def test_non_conflicting_no_ids(self):
        """PairedRegions nonConflicting: raises error on duplicate IDs
        """
        pr1 = PairedRegion(1,10,2)
        pr2 = PairedRegion(11,20,2)
        prs = PairedRegions([pr1, pr2])
        self.assertRaises(ValueError, prs.nonConflicting)  # conflicting IDs

    def test_non_conflicting(self):
        """PairedRegions nonConflicting: works when IDs are set and unique
        """
        pr1 = PairedRegion(3,10,2, Id='A')
pr2 = PairedRegion(11,20,3, Id='B')
        pr3 = PairedRegion(15,25,1, Id='C')
        prs = PairedRegions([pr1,pr2,pr3])
        self.assertEqual(prs.nonConflicting(), PairedRegions([pr1]))
        prs = PairedRegions()
        self.assertEqual(prs.nonConflicting(), PairedRegions())

    def test_conflictCliques(self):
        """PairedRegions conflictCliques: should work when IDs are unique"""
        pr1 = PairedRegion(3,10,2, Id='A')
        pr2 = PairedRegion(11,20,3, Id='B')
        pr3 = PairedRegion(15,25,1, Id='C')
        pr4 = PairedRegion(30,40,2, Id='D')
        pr5 = PairedRegion(28,35,1, Id='E')
        prs = PairedRegions([pr1,pr2,pr3,pr4,pr5])
        obs = prs.conflictCliques()
        exp = [PairedRegions([pr2,pr3]),PairedRegions([pr5,pr4])]
        for i in obs:
            self.failUnless(i in exp)
        self.assertEqual(len(obs), len(exp))
        prs = PairedRegions()
        self.assertEqual(prs.conflictCliques(), [])

    def test_PairedRegionsFromPairs(self):
        """PairedRegionsFromPairs: should work on valid input"""
        p = Pairs([(1,10),(2,9),(12,20),(13,19),(14,18)])
        prs = PairedRegionsFromPairs(p)
        self.assertEqual(len(prs), 2)
        self.assertEqual(prs[0].Id, 0)
        self.assertEqual(prs[0].Pairs, [(1,10),(2,9)])
        self.assertEqual(prs[0].Start, 1)
        self.assertEqual(prs[0].End, 10)
        self.assertEqual(PairedRegionsFromPairs(Pairs()), PairedRegions())

    def test_PairedRegionsFromPairs_conflict(self):
        """PairedRegionsFromPairs: should raise error on overlapping pairs"""
        p = Pairs([(2,20),(5,10),(10,15)])
        self.assertRaises(ValueError, PairedRegionsFromPairs, p)

class ConflictMatrixTests(TestCase):
    """Tests for ConflictMatrix class"""

    def test_conflict_matrix_from_pairs(self):
        """ConflictMatrix __init__: Pairs as input, w/wo conflict"""
        f = ConflictMatrix
        # conflict free
        d = [(1,10),(2,9),(12,20),(13,19),(14,18)]
        exp = Dict2D({0:{0:False,1:False},1:{0:False,1:False}})
        self.assertEqual(f(d).Matrix, exp)
        self.failIf(not isinstance(f(d).Matrix, Dict2D))
        # 1 conflict
        d = [(1,10),(2,9),(12,20),(13,19),(14,18),(15,30),(16,29)]
        exp = Dict2D({0:{0:False,1:False,2:False},\
            1:{0:False,1:False,2:True},\
            2:{0:False,1:True,2:False}})
self.assertEqual(f(d).Matrix, exp) # 1 conflict d = Pairs([(1,10),(2,9),(12,20),(13,19),(14,18),(15,30),(16,29)]) exp = Dict2D({0:{0:False,1:False,2:False},\ 1:{0:False,1:False,2:True},\ 2:{0:False,1:True,2:False}}) m = f(d).Matrix self.assertEqual(m, exp) self.assertEqual(m.RowOrder, [0,1,2]) self.assertEqual(m.ColOrder, [0,1,2]) d = [] # empty input exp = Dict2D() self.assertEqual(f(d).Matrix, exp) def test_ConflictMatrix_Pairs_overlap(self): """ConflictMatrix __init__: raises error on overlapping pairs""" p = Pairs([(1,10),(2,9),(3,9),(12,20)]) self.assertRaises(ValueError, ConflictMatrix, p) def test_conflict_matrix_from_PairedRegions(self): """ConflictMatrix __init__: PairedRegions as input, w/wo conflict """ f = ConflictMatrix # conflict free pr1 = PairedRegion(1,10,2, Id=0) pr2 = PairedRegion(12,20,3, Id=1) prs = PairedRegions([pr1,pr2]) exp = Dict2D({0:{0:False,1:False},1:{0:False,1:False}}) self.assertEqual(f(prs).Matrix, exp) self.failIf(not isinstance(f(prs).Matrix, Dict2D)) pr1 = PairedRegion(1,10,2, Id=0) pr2 = PairedRegion(12,20,3, Id=1) pr3 = PairedRegion(15,30,2, Id=2) prs = PairedRegions([pr1,pr2, pr3]) # 1 conflict exp = Dict2D({0:{0:False,1:False,2:False},\ 1:{0:False,1:False,2:True},\ 2:{0:False,1:True,2:False}}) self.assertEqual(f(prs).Matrix, exp) # 1 conflict pr1 = PairedRegion(1,10,2, Id=4) pr2 = PairedRegion(12,20,3, Id=1) pr3 = PairedRegion(15,30,2, Id=9) prs = PairedRegions([pr1,pr2, pr3]) exp = Dict2D({1:{4:False,1:False,9:True},\ 4:{1:False,4:False,9:False},\ 9:{1:True,4:False,9:False}}) m = f(prs).Matrix self.assertEqual(m, exp) self.assertEqual(m.RowOrder, [1,4,9]) self.assertEqual(m.ColOrder, [1,4,9]) prs = PairedRegions() exp = Dict2D() self.assertEqual(f(prs).Matrix, exp) # input some weird data. Other errors might occur. 
self.assertRaises(ValueError, f, 'ABC') self.assertRaises(ValueError, f, [('a','b'),('c','d')]) def test_ConflictMatrix_PairedRegions_overlap(self): """ConflictMatrix __init__: raises error on overlapping PairedRegions """ pr1 = PairedRegion(1,10,2, Id='A') pr2 = PairedRegion(8,20,2, Id='B') prs = PairedRegions([pr1, pr2]) self.assertRaises(ValueError, ConflictMatrix, prs) def test_conflictsOf(self): """ConflictMatrix conflictsOf: with/without conflicts""" p = Pairs([(1,10),(5,15),(20,30),(25,35),(24,32),(0,80)]) cm = ConflictMatrix(p) self.assertEqual(cm.conflictsOf(0), []) self.assertEqual(cm.conflictsOf(1), [2]) self.assertEqual(cm.conflictsOf(2), [1]) self.assertEqual(cm.conflictsOf(3), [4,5]) p = Pairs([(1,10),(11,20)]) cm = ConflictMatrix(p) self.assertEqual(cm.conflictsOf(0), []) self.assertEqual(cm.conflictsOf(1), []) self.assertRaises(KeyError, cm.conflictsOf, 2) def test_conflicting(self): """ConflictMatrix conflicting: full and empty Pairs""" p = Pairs([(1,10),(5,15),(20,30),(25,35),(24,32),(0,80)]) cm = ConflictMatrix(p) obs = cm.conflicting() exp = [1,2,3,4,5] self.assertEqual(obs, exp) self.assertEqual(ConflictMatrix(Pairs()).conflicting(), []) def test_nonConflicting(self): """ConflictMatrix nonConflicting: full and empty Pairs""" p = Pairs([(1,10),(5,15),(20,30),(25,35),(24,32),(0,80)]) cm = ConflictMatrix(p) obs = cm.nonConflicting() exp = [0] self.assertEqual(obs, exp) self.assertEqual(ConflictMatrix(Pairs()).nonConflicting(), []) def test_conflictCliques(self): """ConflictMatrix conflictCliques: full and empty Pairs""" p = Pairs([(1,10),(5,15),(20,30),(25,35),(24,32),(0,80)]) cm = ConflictMatrix(p) obs = cm.conflictCliques() exp = [[1,2],[3,4,5]] self.assertEqual(obs, exp) self.assertEqual(ConflictMatrix(Pairs()).conflictCliques(), []) class DPTests(TestCase): """Tests for opt_all and related functions""" def test_num_bps(self): """num_bps: should return length of paired region""" f = num_bps pr1 = PairedRegion(0,10,3) self.assertEqual(f(pr1), 3) 
def test_hydrogen_bonds(self): """hydrogen_bonds: score GC, AU, and GU base pairs""" f = hydrogen_bonds('UACGAAAUGCGUG') pr1 = PairedRegion(0,12,5) self.assertEqual(f(pr1),10) f = hydrogen_bonds('UACGAAA') # sequence too short pr1 = PairedRegion(0,12,5) self.assertRaises(IndexError, f, pr1) def test_contains_true(self): """contains_true: should return True if True in input""" f = contains_true self.assertEqual(f([True]), True) self.assertEqual(f([True, False]), True) self.assertEqual(f([1, 0]), True) self.assertEqual(f([1]), True) self.assertEqual(f([False]), False) self.assertEqual(f([3]), False) self.assertEqual(f(["a","b","c"]), False) self.assertEqual(f("abc"), False) def test_empty_matrix(self): """empty_matrix: valid input and error""" f = empty_matrix p = PairedRegions() exp = [[[p],[p]], [[p],[p]]] self.assertEqual(f(2), exp) self.assertEqual(f(1), [[[p]]]) self.assertRaises(ValueError, f, 0) def test_pick_multi_best_max(self): """pick_multi_best: max, full and empty list""" pr1 = PairedRegion(2,10,2, Id='A') pr2 = PairedRegion(4,15,3, Id='B') pr3 = PairedRegion(20,40,5, Id='C') pr4 = PairedRegion(22,30,3, Id='D') for i in [pr1,pr2,pr3,pr4]: i.score(num_bps) prs1 = PairedRegions([pr1, pr2]) prs2 = PairedRegions([pr3]) prs3 = PairedRegions([pr4]) self.assertEqualItems(pick_multi_best([prs1, prs2, prs3]), [prs1, prs2]) self.assertEqual(pick_multi_best([]), [PairedRegions()]) def test_pick_multi_best_min(self): """pick_multi_best: min, full and empty list""" f = lambda x: -1 pr1 = PairedRegion(2,10,2) pr2 = PairedRegion(4,15,3) pr3 = PairedRegion(20,40,5) pr4 = PairedRegion(22,30,3) for i in [pr1,pr2,pr3,pr4]: i.score(f) prs1 = PairedRegions([pr1, pr2]) prs2 = PairedRegions([pr3]) prs3 = PairedRegions([pr4]) self.assertEqual(pick_multi_best([prs1, prs2, prs3], goal='min'),\ [prs1]) self.assertEqual(pick_multi_best([], goal='min'), [PairedRegions()]) def test_dp_matrix_multi_toy(self): """dp_matrix_multi: test on initial toy example""" pr0 = PairedRegion(0, 70, 
2, Id='C') pr1 = PairedRegion(10, 30, 4, Id='A') pr2 = PairedRegion(20, 50, 3, Id='B') pr3 = PairedRegion(40, 90, 2, Id='E') pr4 = PairedRegion(60, 80, 3, Id='D') prs = PairedRegions([pr0, pr1, pr2, pr3, pr4]) obs = dp_matrix_multi(prs) self.assertEqual(obs[0][0], [PairedRegions()]) self.assertEqual(obs[0][3], [PairedRegions([pr1])]) self.assertEqual(obs[2][5], [PairedRegions([pr2])]) self.assertEqual(obs[4][9], [PairedRegions([pr3,pr4])]) self.assertEqual(obs[2][9], [PairedRegions([pr2,pr4])]) self.assertEqual(obs[1][8], [PairedRegions([pr1,pr4])]) self.assertEqual(obs[1][9], [PairedRegions([pr1,pr3,pr4])]) self.assertEqual(obs[0][9], [PairedRegions([pr1,pr3,pr4])]) def test_dp_matrix_multi_lsu(self): """dp_matrix_multi: test on LSU rRNA domain I case""" pr0 = PairedRegion(56, 69, 3, Id=0) pr1 = PairedRegion(60, 92, 1, Id=1) pr2 = PairedRegion(62, 89, 3, Id=2) pr3 = PairedRegion(75, 109, 6, Id=3) pr4 = PairedRegion(84, 96, 3, Id=4) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4]) obs = dp_matrix_multi(prs) self.assertEqual(obs[0][0], [PairedRegions()]) self.assertEqual(obs[0][5], [PairedRegions([pr0])]) self.assertEqual(obs[1][6], [PairedRegions([pr2])]) self.assertEqual(obs[1][7], [PairedRegions([pr1,pr2])]) self.assertEqualItems(obs[2][8],\ [PairedRegions([pr2]),PairedRegions([pr4])]) self.assertEqual(obs[1][9], [PairedRegions([pr3,pr4])]) self.assertEqual(obs[0][9], [PairedRegions([pr0,pr3,pr4])]) def test_dp_matrix_multi_artificial(self): """dp_matrix_multi: test on artificial structure""" pr0 = PairedRegion(0, 77, 2, Id=0) pr1 = PairedRegion(7, 75, 5, Id=1) pr2 = PairedRegion(13, 83, 3, Id=2) pr3 = PairedRegion(18, 41, 5, Id=3) pr4 = PairedRegion(23, 53, 10, Id=4) pr5 = PairedRegion(33, 70, 3, Id=5) pr6 = PairedRegion(59, 93, 9, Id=6) pr7 = PairedRegion(78, 96, 3, Id=7) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7]) obs = dp_matrix_multi(prs) self.assertEqual(obs[0][0], [PairedRegions()]) self.assertEqual(obs[0][6], [PairedRegions([pr3])]) 
self.assertEqual(obs[0][7], [PairedRegions([pr4])])
        self.assertEqual(obs[9][15], [PairedRegions([pr7])])
        self.assertEqual(obs[1][10], [PairedRegions([pr1,pr4])])
        self.assertEqual(obs[0][11], [PairedRegions([pr0,pr1,pr4])])
        self.assertEqual(obs[3][14], [PairedRegions([pr4, pr6])])
        self.assertEqual(obs[0][13], [PairedRegions([pr0, pr1, pr4])])
        self.assertEqual(obs[0][14], [PairedRegions([pr4, pr6])])
        self.assertEqual(obs[1][15], [PairedRegions([pr4, pr6])])
        self.assertEqual(obs[0][15], [PairedRegions([pr0, pr1, pr4, pr7])])

    def test_pick_multi_best_saturated(self):
        """pick_multi_best: should only include saturated solutions"""
        pr1 = PairedRegion(2,10,2, Id='A')
        pr1.Score = 2
        pr2 = PairedRegion(15,25,2, Id='B')
        pr2.Score = 2
        pr3 = PairedRegion(4,22,4, Id='C')
        pr3.Score = 0
        prs1 = PairedRegions([pr1])
        prs2 = PairedRegions([pr2])
        prs3 = PairedRegions([pr1, pr3])
        self.assertEqualItems(pick_multi_best([prs1, prs2, prs3]),\
            [prs2, prs3])
        self.assertEqual(pick_multi_best([]), [PairedRegions()])

    def test_matrix_solutions(self):
        """matrix_solutions: should return contents of top-right cell"""
        pr0 = PairedRegion(56, 69, 3, Id=0)
        pr1 = PairedRegion(60, 92, 1, Id=1)
        pr2 = PairedRegion(62, 89, 3, Id=2)
        pr3 = PairedRegion(75, 109, 6, Id=3)
        pr4 = PairedRegion(84, 96, 3, Id=4)
        prs = PairedRegions([pr0, pr1, pr2, pr3, pr4])
        obs = matrix_solutions(prs)
        self.assertEqual(obs, [PairedRegions([pr0,pr3,pr4])])
        # error, size should be at least 1
        prs = PairedRegions()
        self.assertRaises(ValueError, matrix_solutions, prs)
        pr = PairedRegion(2,20, 5, Id='A')
        prs = PairedRegions([pr])
        obs = matrix_solutions(prs)
        self.assertEqual(obs, [prs])

    def test_opt_all_nested(self):
        """opt_all: should return input when already nested"""
        p = Pairs([(1,10),(2,9),(20,30),(22,29)])
        obs = opt_all(p)
        self.assertEqual(len(obs), 1)
        self.assertEqual(obs[0], p)
        p = Pairs()
        self.assertEqual(opt_all(p), [[]])

    def test_opt_all_overlap(self):
        """opt_all: should
raise error on overlapping pairs""" p = Pairs([(1,10),(2,9),(9,30),(22,29),(1,None)]) self.assertRaises(ValueError, opt_all, p) def test_opt_all_knot(self): """opt_all: single/multiple solution(s)""" p = Pairs([(1,10),(2,9),(3,15),(4,14),(11,20),(12,19),(25,30)]) obs = opt_all(p) exp = Pairs([(1,10),(2,9),(11,20),(12,19),(25,30)]) exp_rem = [(3,15),(4,14)] self.assertEqual(len(obs), 1) self.assertEqual(obs[0], exp) self.assertEqual(opt_all(p, return_removed=True)[0][1],\ exp_rem) p = Pairs([(1,10),(2,9),(4,14),(3,15)]) obs = opt_all(p) self.assertEqual(len(obs), 2) self.assertEqualItems(obs, [Pairs([(1,10),(2,9)]),\ Pairs([(3,15),(4,14)])]) exp_rem = [(Pairs([(1,10),(2,9)]),Pairs([(3,15),(4,14)])),\ (Pairs([(3,15),(4,14)]),Pairs([(1,10),(2,9)]))] self.assertEqualItems(opt_all(p, return_removed=True),\ exp_rem) def test_opt_all_some_non_conflicting(self): """opt_all: some conflicting, other not""" p = Pairs([(30,40),(10,20),(12,17),(13,None),(17,12),(35,45),(36,44)]) exp = Pairs([(10,20),(12,17),(35,45),(36,44)]) exp_rem = [(30,40)] self.assertEqual(opt_all(p, return_removed=True),\ [(exp,exp_rem)]) def test_opt_all_scoring1(self): """opt_all: one optimal in bps, both optimal in energy""" p = Pairs([(1,10),(2,9),(4,15),(5,14),(6,13)]) obs_bps = opt_all(p, goal='max', scoring_function=num_bps) obs_energy = opt_all(p, goal='max',\ scoring_function=hydrogen_bonds('CCCAAAUGGGGUCGUUC')) exp_bps = [[(4,15),(5,14),(6,13)]] exp_energy = [[(1,10),(2,9)],[(4,15),(5,14),(6,13)]] self.assertEqualItems(obs_bps, exp_bps) self.assertEqualItems(obs_energy, exp_energy) def test_opt_all_scoring2(self): """opt_all: both optimal in bps, one optimal in energy""" p = Pairs([(0,9),(1,8),(2,7),(3,13),(4,12),(5,11)]) obs_bps = opt_all(p, goal='max', scoring_function=num_bps) obs_energy = opt_all(p, goal='max',\ scoring_function=hydrogen_bonds('CCCAAAAGGGUUUU')) exp_bps = [[(0,9),(1,8),(2,7)],[(3,13),(4,12),(5,11)]] exp_energy = [[(0,9),(1,8),(2,7)]] self.assertEqualItems(obs_bps, exp_bps) 
self.assertEqualItems(obs_energy, exp_energy) def test_opt_all_scoring3(self): """opt_all: one optimal in bps, the other optimal in energy""" p = Pairs([(0,11),(1,10),(2,9),(4,15),(5,14),(6,13),(7,12)]) obs_bps = opt_all(p, goal='max', scoring_function=num_bps) obs_energy = opt_all(p, goal='max',\ scoring_function=hydrogen_bonds('CCCCAAAAGGGGUUUU')) exp_bps = [[(4,15),(5,14),(6,13),(7,12)]] exp_energy = [[(0,11),(1,10),(2,9)]] self.assertEqualItems(obs_bps, exp_bps) self.assertEqualItems(obs_energy, exp_energy) def test_opt_single_random(self): """opt_single_random: should return single solution""" p = Pairs ([(10,20),(11,19),(15,25),(16,24)]) exp1, exp_rem1 = [(10,20),(11,19)], [(15,25),(16,24)] exp2, exp_rem2 = [(15,25),(16,24)], [(10,20),(11,19)] obs = opt_single_random(p) self.failUnless(obs == exp1 or obs == exp2) obs = opt_single_random(p, return_removed=True) self.failUnless(obs == (exp1, exp_rem1) or obs == (exp2, exp_rem2)) def test_opt_single_property(self): """opt_single_property: three properties""" # one solution single region, other solution two regions p = Pairs ([(10,20),(25,35),(26,34),(27,33),\ (12,31),(13,30),(14,29),(15,28)]) exp = [(12,31),(13,30),(14,29),(15,28)] exp_rem = [(10,20),(25,35),(26,34),(27,33)] self.assertEqual(opt_single_property(p), exp) self.assertEqual(opt_single_property(p, return_removed=True),\ (exp,exp_rem)) # both two blocks, one shorter average range p = Pairs ([(10,20),(22,40),(23,39),(24,38),\ (17,26),(18,25),(36,43),(37,42)]) exp = [(17,26),(18,25),(36,43),(37,42)] exp_rem = [(10,20),(22,40),(23,39),(24,38)] self.assertEqual(opt_single_property(p), exp) self.assertEqual(opt_single_property(p, return_removed=True),\ (exp,exp_rem)) # both single block over same range, pick lowest start p = Pairs([(10,20),(15,25)]) exp = [(10,20)] exp_rem = [(15,25)] self.assertEqual(opt_single_property(p), exp) self.assertEqual(opt_single_property(p, return_removed=True),\ (exp,exp_rem)) class EliminationMethodsTests(TestCase): """Tests 
for conflict_elimination and related functions""" def test_find_max_conflicts(self): """find_max_conflicts: simple case""" f = find_max_conflicts pr0 = PairedRegion(0, 77, 2, Id=0) pr1 = PairedRegion(7, 75, 5, Id=1) pr2 = PairedRegion(13, 83, 3, Id=2) pr3 = PairedRegion(18, 41, 5, Id=3) pr4 = PairedRegion(23, 53, 10, Id=4) pr5 = PairedRegion(33, 70, 3, Id=5) pr6 = PairedRegion(59, 93, 9, Id=6) pr7 = PairedRegion(78, 96, 3, Id=7) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() self.assertEqual(f(conf, cm, prs.byId()), 6) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr7]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() self.assertEqual(f(conf, cm, prs.byId()), 2) prs = PairedRegions([pr0, pr1, pr3, pr4, pr5, pr7]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() self.assertEqual(f(conf, cm, prs.byId()), 5) def test_find_max_conflicts_on_start(self): """find_max_conflicts: in case of equal conflicts and gain""" f = find_max_conflicts pr0 = PairedRegion(10, 20, 2, Id=0) pr1 = PairedRegion(15, 25, 2, Id=1) prs = PairedRegions([pr0, pr1]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() self.assertEqual(f(conf, cm, prs.byId()), 1) def test_find_min_gain(self): """find_min_gain: differentiate on gain only""" f = find_min_gain pr0 = PairedRegion(0, 77, 2, Id=0) pr1 = PairedRegion(7, 75, 5, Id=1) pr2 = PairedRegion(13, 83, 3, Id=2) pr3 = PairedRegion(18, 41, 5, Id=3) pr4 = PairedRegion(23, 53, 10, Id=4) pr5 = PairedRegion(33, 70, 3, Id=5) pr6 = PairedRegion(59, 93, 9, Id=6) pr7 = PairedRegion(78, 96, 3, Id=7) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() self.assertEqual(f(conf, cm, prs.byId()), 5) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr6, pr7]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() 
self.assertEqual(f(conf, cm, prs.byId()), 2) def test_find_min_gain_conf(self): """find_min_gain: in case of equal gain, differentiate on conflicts""" f = find_min_gain pr0 = PairedRegion(10,30,3, Id=0) pr1 = PairedRegion(1,20,6, Id=1) pr2 = PairedRegion(22,40,2, Id=2) pr3 = PairedRegion(50,80,3, Id=3) pr4 = PairedRegion(60,90,8, Id=4) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() self.assertEqual(f(conf, cm, prs.byId()), 0) def test_find_min_gain_start(self): """find_min_gain: in case of equal gain and number of conflicts""" f = find_min_gain pr0 = PairedRegion(10,30,3, Id=0) pr1 = PairedRegion(1,20,6, Id=1) pr2 = PairedRegion(22,40,2, Id=2) pr3 = PairedRegion(50,80,3, Id=3) pr4 = PairedRegion(60,90,7, Id=4) pr5 = PairedRegion(45,55,1, Id=5) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() self.assertEqual(f(conf, cm, prs.byId()), 3) def test_add_back_non_conflicting(self): """add_back_non_conflicting: should add all non-confl regions""" f = add_back_non_conflicting pr0 = PairedRegion(10,20,3, Id=0) pr1 = PairedRegion(30,40,2, Id=1) pr2 = PairedRegion(50,60,2, Id=2) pr3 = PairedRegion(45,55,3, Id=3) # confl with pr1 and pr2 pr4 = PairedRegion(0,90,7, Id=4) # not confl with 1,2,3 pr5 = PairedRegion(32,38,2, Id=5) # not confl with 1,2,3 prs = PairedRegions([pr0, pr1, pr2]) removed = {3: pr3, 4: pr4, 5: pr5} exp_prs = PairedRegions([pr0, pr1, pr2, pr4, pr5]) exp_rem = {3: pr3} self.assertEqual(f(prs, removed), (exp_prs, exp_rem)) def test_add_back_non_conflicting_order(self): """add_back_non_conflicting: should add 5' side first""" f = add_back_non_conflicting pr0 = PairedRegion(10,20,3, Id=0) pr1 = PairedRegion(30,40,2, Id=1) pr2 = PairedRegion(50,60,2, Id=2) pr3 = PairedRegion(45,55,3, Id=3) # confl with pr1 and pr2 pr4 = PairedRegion(0,90,7, Id=4) # not confl with 1,2,3 pr5 = PairedRegion(80,95,2, Id=5) # not confl with 
1,2,3 prs = PairedRegions([pr0, pr1, pr2]) removed = {3: pr3, 4: pr4, 5: pr5} exp_prs = PairedRegions([pr0, pr1, pr2, pr4 ]) exp_rem = {3: pr3, 5: pr5} self.assertEqual(f(prs, removed), (exp_prs, exp_rem)) def test_elim_most_conflict(self): """conflict_elimination: find_max_conflicts, simple case""" f = conflict_elimination func = find_max_conflicts pr0 = PairedRegion(0, 77, 2, Id=0) pr1 = PairedRegion(7, 75, 5, Id=1) pr2 = PairedRegion(13, 83, 3, Id=2) pr3 = PairedRegion(18, 41, 5, Id=3) pr4 = PairedRegion(23, 53, 10, Id=4) pr5 = PairedRegion(33, 70, 3, Id=5) pr6 = PairedRegion(59, 93, 9, Id=6) pr7 = PairedRegion(78, 96, 3, Id=7) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7]) pairs = prs.toPairs() exp = PairedRegions([pr0, pr1, pr4, pr7]).toPairs() exp_rem = PairedRegions([pr2, pr3, pr5, pr6]).toPairs() self.assertEqual(f(pairs, func), exp) self.assertEqual(f(pairs, func, return_removed=True), (exp, exp_rem)) def test_elim_mc_circular(self): """conflict_elimination: find_max_conflicts, circular removal""" # simply remove in order of most conflicts, don't add back prfp = PairedRegionFromPairs f = conflict_elimination func = find_max_conflicts pr0 = prfp([(13, 65), (14, 64)], Id=0) pr1 = prfp([(15, 102), (16, 101), (17, 100), (18, 99), (19, 98)], Id=1) pr2 = prfp([(22, 72), (23, 71), (24, 70), (25, 69),\ (26, 68), (27, 67), (28, 66)], Id=2) pr3 = prfp([(31, 147), (32, 146), (33, 145), (34, 144), (35, 143),\ (36, 142), (37, 141), (38, 140), (39, 139)], Id=3) pr4 = prfp([(42, 129), (43, 128), (44, 127)], Id=4) pr5 = prfp([(46, 149), (47, 148)], Id=5) pr6 = prfp([(49, 92), (50, 91), (51, 90), (52, 89), (53, 88)], Id=6) pr7 = prfp([(75, 138), (76, 137), (77, 136), (78, 135)], Id=7) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7]) exp = PairedRegions([pr3, pr6]).toPairs() exp_rem = PairedRegions([pr0, pr1, pr2, pr4, pr5, pr7]).toPairs() self.assertEqual(f(prs.toPairs(), func, add_back=False,\ return_removed=True), (exp, exp_rem)) # add back 
circular removals exp = PairedRegions([pr3, pr4, pr6]).toPairs() exp_rem = PairedRegions([pr0, pr1, pr2, pr5, pr7]).toPairs() self.assertEqual(f(prs.toPairs(), func, add_back=True,\ return_removed=True), (exp, exp_rem)) def test_elim_min_gain(self): """conflict_elimination: find_min_gain, simple case""" f = conflict_elimination func = find_min_gain pr0 = PairedRegion(0, 77, 2, Id=0) pr1 = PairedRegion(7, 75, 5, Id=1) pr2 = PairedRegion(13, 83, 3, Id=2) pr3 = PairedRegion(18, 41, 5, Id=3) pr4 = PairedRegion(23, 53, 10, Id=4) pr5 = PairedRegion(33, 70, 3, Id=5) pr6 = PairedRegion(59, 93, 9, Id=6) pr7 = PairedRegion(78, 96, 3, Id=7) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7]) pairs = prs.toPairs() exp = PairedRegions([pr4, pr6]).toPairs() exp_rem = PairedRegions([pr0, pr1, pr2, pr3, pr5, pr7]).toPairs() self.assertEqual(f(pairs, func), exp) self.assertEqual(f(pairs, func, return_removed=True), (exp, exp_rem)) def test_elim_min_gain_circular(self): """conflict_elimination: find_min_gain, circular removal""" # simply remove in order of most conflicts, don't add back prfp = PairedRegionFromPairs f = conflict_elimination func = find_min_gain pr0 = prfp([(5, 170), (6, 169), (7, 168), (8, 167), (9, 166),\ (10, 165)], Id=0) pr1 = prfp([(25, 62), (26, 61)], Id=1) pr2 = prfp([(29, 46), (30, 45), (31, 44)], Id=2) pr3 = prfp([(48, 124), (49, 123)], Id=3) pr4 = prfp([(67, 183), (68, 182), (69, 181), (70, 180), (71, 179),\ (72, 178), (73, 177), (74, 176), (75, 175), (76, 174)], Id=4) pr5 = prfp([(82, 172), (83, 171)], Id=5) pr6 = prfp([(117, 135), (118, 134), (119, 133)], Id=6) pr7 = prfp([(151, 199), (152, 198), (153, 197), (154, 196),\ (155, 195), (156, 194), (157, 193), (158, 192), (159, 191),\ (160, 190)], Id=7) prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7]) exp = PairedRegions([pr1, pr2, pr4, pr6]).toPairs() exp_rem = PairedRegions([pr0, pr3, pr5, pr7]).toPairs() self.assertEqual(f(prs.toPairs(), func, add_back=False,\ return_removed=True), 
(exp, exp_rem))
        # add back circular removals
        exp = PairedRegions([pr1, pr2, pr4, pr5, pr6]).toPairs()
        exp_rem = PairedRegions([pr0, pr3, pr7]).toPairs()
        self.assertEqual(f(prs.toPairs(), func, add_back=True,\
            return_removed=True), (exp, exp_rem))

class IncrementalMethodsTests(TestCase):
    """Tests for incremental pseudoknot-removal methods"""

    def test_inc_order_forward(self):
        """inc_order: starting at 5' end"""
        f = inc_order
        p = Pairs([(1,10),(2,9),(3,15),(4,14),(11,20),(12,19),(25,30)])
        exp = Pairs([(1,10),(2,9),(11,20),(12,19),(25,30)])
        exp_rem = Pairs([(3,15),(4,14)])
        self.assertEqual(f(p, reversed=False), exp)
        self.assertEqual(f(p, return_removed=True), (exp, exp_rem))
        p = Pairs([(1,20),(2,30),(3,29),(4,28),(5,27),(7,24)])
        exp = Pairs([(1,20)])
        exp_rem = Pairs([(2,30),(3,29),(4,28),(5,27),(7,24)])
        self.assertEqual(f(p, reversed=False), exp)
        self.assertEqual(f(p, return_removed=True), (exp, exp_rem))
        self.assertEqual(f([]), [])
        p = [(1,10),(3,13)] # input as list of tuples
        exp = Pairs([(1,10)])
        self.assertEqual(f(p), exp)
        p = [(1,10),(4,7),(2,9),(5,None)] # pseudoknot-free
        exp = [(1,10),(2,9),(4,7)]
        self.assertEqual(f(p), exp)
        p = [(1,10),(2,10)] # conflict
        self.assertRaises(ValueError, f, p)

    def test_inc_order_reversed(self):
        """inc_order: starting at 3' end"""
        f = inc_order
        p = Pairs([(1,10),(2,9),(3,15),(4,14),(24,31),(25,30)])
        exp = Pairs([(3,15),(4,14),(24,31),(25,30)])
        exp_rem = Pairs([(1,10),(2,9)])
        self.assertEqual(f(p, reversed=True), exp)
        self.assertEqual(f(p, reversed=True, return_removed=True),\
            (exp, exp_rem))
        p = Pairs([(1,20),(2,30),(3,29),(4,28),(5,27),(7,24)])
        exp = Pairs([(2,30),(3,29),(4,28),(5,27),(7,24)])
        exp_rem = Pairs([(1,20)])
        self.assertEqual(f(p, reversed=True), exp)
        self.assertEqual(f(p, reversed=True, return_removed=True),\
            (exp, exp_rem))
        self.assertEqual(f([], reversed=True), [])
        p = [(1,10),(3,13)] # input as list of tuples
        exp = Pairs([(3,13)])
        self.assertEqual(f(p, reversed=True), exp)
        p = [(1,10),(4,7),(2,9),(5,None)] #
pseudoknot-free exp = [(1,10),(2,9),(4,7)] self.assertEqual(f(p), exp) p = [(1,10),(2,10)] #conflict self.assertRaises(ValueError, f, p) def test_inc_length(self): """inc_length: should handle standard input """ f = inc_length # All blocks in conflict, start empty, add first p = Pairs([(1,10),(2,9),(3,8),(5,13),(6,12),(7,11)]) exp = Pairs([(1,10),(2,9),(3,8)]) self.assertEqual(f(p), exp) # Start with length 3 and 2, add 1 block p = Pairs([(1,10),(2,9),(3,8),(20,30),(21,29),(25,40),(32,38)]) exp = Pairs([(1,10),(2,9),(3,8),(20,30),(21,29),(32,38)]) self.assertEqual(f(p), exp) p = Pairs([(1,10),(2,9),(3,8),(12,20),(13,19),(15,23),(16,22)]) exp_5 = Pairs([(1,10),(2,9),(3,8),(12,20),(13,19)]) exp_3 = Pairs([(1,10),(2,9),(3,8),(15,23),(16,22)]) self.assertEqual(f(p), exp_5) self.assertEqual(f(p, reversed=True), exp_3) self.assertEqual(f(p, return_removed=True),(exp_5,[(15,23),(16,22)])) p = [(1,10),(4,7),(2,9),(5,None)] # pseudoknot-free exp = [(1,10),(2,9),(4,7)] self.assertEqual(f(p), exp) p = [(1,10),(2,10)] #conflict self.assertRaises(ValueError, f, p) def test_inc_length_rev(self): """inc_length: should prefer 3' side when reversed is True """ f = inc_length p = Pairs([(1,10),(2,9),(5,20),(6,19)]) self.assertEqual(f(p), [(1,10),(2,9)]) self.assertEqual(f(p, reversed=True), [(5,20),(6,19)]) def test_inc_range(self): """inc_range: should handle normal input """ f = inc_range p = [(1,5),(4,20),(15,23),(16,22)] exp = [(1,5),(15,23),(16,22)] self.assertEqual(f(p), exp) self.assertEqual(f(p, return_removed=True), (exp, [(4,20)])) p = [(1,11),(5,15)] # same range self.assertEqual(f(p), [(1,11)]) # 5' wins self.assertEqual(f(p, reversed=True), [(5,15)]) # 3' wins p = [(1,10),(2,10)] #conflict self.assertRaises(ValueError, f, p) def test_inc_range_empty(self): """inc_range: should handle empty or pseudoknot-free pairs """ f = inc_range p = [] exp = [] self.assertEqual(f(p), exp) p = [(1,10),(4,7),(2,9),(5,None)] exp = [(1,10),(2,9),(4,7)] self.assertEqual(f(p), exp) class 
NussinovTests(TestCase): """Tests for restricted nussinov algorithm and related functions""" def test_nussinov_fill(self): """nussinov_fill: basic test""" p = Pairs([(0,7),(1,6),(2,5),(3,9),(4,8)]) exp = [[0,0,0,0,0,1,2,3,3,3], [0,0,0,0,0,1,2,2,2,2], [0,0,0,0,0,1,1,1,1,2], [0,0,0,0,0,0,0,0,1,2], [0,0,0,0,0,0,0,0,1,1,], [0,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0,0,0]] obs = nussinov_fill(p,size=10) self.assertEqual(obs, exp) def test_nussinov_traceback(self): """nussinov_traceback: basic test""" p = Pairs([(0,7),(1,6),(2,5),(3,9),(4,8)]) m = nussinov_fill(p,size=10) exp = set([(0,7),(1,6),(2,5)]) obs = nussinov_traceback(m, 0, 9, p) self.assertEqual(obs, exp) def test_nussinov_restricted(self): """nussinov_restricted: basic test""" p = Pairs([(0,7),(1,6),(2,5),(3,9),(4,8)]) obs = nussinov_restricted(p) obs_rem = nussinov_restricted(p, return_removed=True) exp = [(0,7),(1,6),(2,5)] exp_rem = ([(0,7),(1,6),(2,5)],[(3,9),(4,8)]) self.assertEqual(obs, exp) self.assertEqual(obs_rem, exp_rem) p = Pairs([(0,7),(1,6),(2,6)]) self.assertRaises(ValueError, nussinov_restricted, p) p = Pairs([(0,7),(1,6),(2,5)]) exp = Pairs([(0,7),(1,6),(2,5)]) self.assertEqual(nussinov_restricted(p), exp) def test_nussinov_restricted_bi(self): """nussinov_restricted: include bifurcation""" p = Pairs([(0,7),(1,6),(2,14),(3,13),(4,12),(5,11),\ (8,17),(9,16),(10,15)]) obs = nussinov_restricted(p) obs_rem = nussinov_restricted(p, return_removed=True) exp = [(0,7),(1,6),(8,17),(9,16),(10,15)] exp_rem = ([(0,7),(1,6),(8,17),(9,16),(10,15)],\ [(2,14),(3,13),(4,12),(5,11)]) self.assertEqual(obs, exp) self.assertEqual(obs_rem, exp_rem) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_struct/test_manipulation.py000644 000765 000024 00000007652 12024702176 024111 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os, tempfile try: from cogent.util.unit_test import TestCase, main from cogent.parse.pdb import 
PDBParser
    from cogent.format.pdb import PDBWriter
    from cogent.struct.selection import einput
    from cogent.struct.manipulation import copy, clean_ical, \
        expand_symmetry, expand_crystal
except ImportError:
    from zenpdb.cogent.util.unit_test import TestCase, main
    from zenpdb.cogent.parse.pdb import PDBParser
    from zenpdb.cogent.format.pdb import PDBWriter
    from zenpdb.cogent.struct.selection import einput
    from zenpdb.cogent.struct.manipulation import copy, clean_ical, \
        expand_symmetry, expand_crystal

__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"

class ManipulationTest(TestCase):
    """tests manipulating entities"""

    def setUp(self):
        self.input_file = os.path.join('data', '2E12.pdb')
        self.input_structure = PDBParser(open(self.input_file))

    def test_clean_ical(self):
        """tests the clean_ical function which cleans structures."""
        chainB = self.input_structure.table['C'][('2E12', 0, 'B')]
        leu25 = self.input_structure.table['R'][('2E12', 0, 'B', \
            ('LEU', 25, ' '))]
        leu25icA = copy(leu25)
        self.assertTrue(leu25icA.parent is None)
        self.assertTrue(leu25icA is not leu25)
        self.assertTrue(leu25icA[(('N', ' '),)] is not leu25[(('N', ' '),)])
        leu25icA.setIc('A')
        self.assertEquals(leu25icA.getId(), (('LEU', 25, 'A'),))
        chainB.addChild(leu25icA)
        self.assertFalse(chainB[(('LEU', 25, 'A'),)] is \
            chainB[(('LEU', 25, ' '),)])
        self.assertEquals(clean_ical(self.input_structure), \
            ([], [('2E12', 0, 'B', ('LEU', 25, 'A'))]))
        clean_ical(self.input_structure, pretend=False)
        self.assertTrue(chainB[(('LEU', 25, 'A'),)] is leu25icA)
        self.assertFalse((('LEU', 25, 'A'),) in chainB.keys())
        self.assertFalse((('LEU', 25, 'A'),) in chainB)
        self.assertTrue((('LEU', 25, 'A'),) in chainB.keys(unmask=True))
        self.input_structure.setUnmasked(force=True)
        self.assertEquals(clean_ical(self.input_structure), \
([], [('2E12', 0, 'B', ('LEU', 25, 'A'))]))
        clean_ical(self.input_structure, pretend=False, mask=False)
        self.assertFalse((('LEU', 25, 'A'),) in chainB.keys())
        self.assertFalse((('LEU', 25, 'A'),) in chainB)
        self.assertFalse((('LEU', 25, 'A'),) in chainB.keys(unmask=True))

    def test_0expand_symmetry(self):
        """tests the expansion of an asu to a unit-cell."""
        global fn
        mh = expand_symmetry(self.input_structure[(0,)])
        fd, fn = tempfile.mkstemp('.pdb')
        os.close(fd)
        fh = open(fn, 'w')
        PDBWriter(fh, mh, self.input_structure.raw_header)
        fh.close()

    def test_1expand_crystal(self):
        """tests the expansion of a unit-cell to a crystal"""
        fh = open(fn, 'r')
        input_structure = PDBParser(fh)
        self.assertEqual(len(input_structure.values()), 4) # 4 models
        sh = expand_crystal(input_structure)
        self.assertTrue(len(sh) == 27)
        fd, fn2 = tempfile.mkstemp('.pdb')
        os.close(fd)
        fh = open(fn2, 'w')
        a1 = einput(input_structure, 'A')
        a2 = einput(sh.values()[3], 'A')
        k = a1.values()[99].getFull_id()
        name = sh.values()[3].name
        a1c = a1[k].coords
        a2c = a2[(name,) + k[1:]].coords
        self.assertEqual(len(a1), len(a2))
        self.assertRaises(AssertionError, self.assertFloatEqual, a1c, a2c)
        PDBWriter(fh, sh)
        fh.close()
        os.unlink(fn)
        os.unlink(fn2)

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_struct/test_pairs_util.py000644 000765 000024 00000105505 12024702176 023560 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
# test_pairs_util.py
"""Provides tests for gapping/ungapping functions and base pair comparison
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.core.sequence import RnaSequence, ModelSequence, Sequence
from cogent.core.moltype import RNA
from cogent.core.alphabet import CharAlphabet
from cogent.struct.rna2d import Pairs
from cogent.struct.pairs_util import PairsAdjustmentError,\
    adjust_base, adjust_base_structures, adjust_pairs_from_mapping,\
    delete_gaps_from_pairs, insert_gaps_in_pairs, gapped_to_ungapped,\
    get_gap_symbol, get_gap_list,\
degap_model_seq, degap_seq,\ ungapped_to_gapped,\ pairs_intersection, pairs_union, compare_pairs,\ compare_pairs_mapping, compare_random_to_correct,\ sensitivity, selectivity, get_all_pairs, get_counts, extract_seqs,\ mcc, approximate_correlation, correlation_coefficient, all_metrics __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Shandy Wikman", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class GappedUngappedTests(TestCase): """Provides tests for gapped_to_ungapped and ungapped_to_gapped functions """ def setUp(self): """setUp: set up method for all tests""" self.rna1 = RnaSequence('UCAG-RYN-N', Name='rna1') self.m1 = ModelSequence('UCAG-RYN-N', Name='rna1',\ Alphabet=RNA.Alphabets.DegenGapped) self.s1 = 'UCAG-RYN-N' def test_adjust_base(self): """adjust_base: should work for pairs object or list of pairs""" p = Pairs() self.assertEqual(adjust_base(p,10),[]) pairs = [(1,21),(2,15),(3,13),(4,11),(5,10),(6,9)] offset = -1 expected = [(0,20),(1,14),(2,12),(3,10),(4,9),(5,8)] obs_pairs = adjust_base(pairs, offset) self.assertEqual(obs_pairs, expected) pairs = Pairs([(0,10),(1,9)]) self.assertEqual(adjust_base(pairs, -1), Pairs([(-1,9),(0,8)])) self.assertEqual(adjust_base(pairs, 5), Pairs([(5,15),(6,14)])) self.assertRaises(PairsAdjustmentError, adjust_base, pairs, 3.5) def test_adjust_base_structures(self): """adjust_pairs_structures: simple structure""" p = Pairs([(3,10),(4,9)]) p2 = Pairs([(2,7), (30,40)]) self.assertEqual(adjust_base_structures([p,p2], -1),\ [[(2,9),(3,8)],[(1,6),(29,39)]]) def test_adjust_base_None(self): """adjust_base: should keep Nones or duplicates, ignore conflicts""" pairs = Pairs([(2,8),(3,7),(6,None),(None,None),(2,10)]) expected = Pairs([(1,7),(2,6),(5,None),(None, None),(1,9)]) self.assertEqual(adjust_base(pairs,-1), expected) p = 
Pairs([(1,2),(2,1),(1,2),(2,None)]) self.assertEqual(adjust_base(p, 1), [(2,3),(3,2),(2,3),(3,None)]) def test_adjust_pairs_from_mapping_confl(self): """adjust_pairs_from_mapping: should handle conflicts, pseudo, dupl """ f = adjust_pairs_from_mapping p = Pairs([(0,6),(1,5),(2,None),(None,None),(1,4),(3,7),(6,0)]) m = {0:1,1:3,2:6,3:7,4:8,5:10,6:11,7:12} exp = Pairs([(1,11),(3,10),(6,None),(None,None),(3,8),(7,12),(11,1)]) self.assertEqual(f(p, m), exp) p = Pairs([(1,11),(3,10),(7,12),(6,None),(None,None),(5,8)]) m = {1: 0, 3: 1, 6: 2, 7: 3, 8: 4, 10: 5, 11: 6, 12: 7} exp = Pairs([(0,6),(1,5),(3,7),(2,None),(None,None)]) self.assertEqual(f(p,m), exp) def test_delete_gaps_from_pairs(self): """delete_gaps_from_pairs: should work on standard input""" r = delete_gaps_from_pairs # empty list p = Pairs([]) self.assertEqual(r(p,[1,2,3]), []) # normal list p1 = Pairs([(2,8), (3,6)]) gap_list = [0,1,4,5,7,9] self.assertEqualItems(r(p1, gap_list), [(0,3),(1,2)]) p2 = Pairs([(2,8),(3,6),(4,9)]) self.assertEqualItems(r(p2, gap_list), [(0,3),(1,2)]) p3 = Pairs([(2,8),(3,6),(4,10)]) self.assertEqualItems(r(p3, gap_list), [(0,3),(1,2)]) def test_delete_gaps_from_pairs_weird(self): """delete_gaps_from_pairs: should ignore conflicts etc""" r = delete_gaps_from_pairs gap_list = [0,1,4,5,7,9] p = Pairs([(2,6),(3,8)]) self.assertEqualItems(r(p, gap_list), [(0,2),(1,3)]) p = Pairs([(2,6),(3,8),(3,None),(6,2),(3,8), (None, None)]) self.assertEqualItems(r(p, gap_list),\ [(0,2),(1,3),(1,None),(2,0),(1,3),(None, None)]) def test_insert_gaps_in_pairs(self): """insert_gaps_in_pairs: should work with normal and conflicts""" p = Pairs([(0,3),(1,2),(1,4),(3,None)]) gaps = [0,1,4,5,7] self.assertEqual(insert_gaps_in_pairs(p, gaps),\ [(2,8),(3,6),(3,9),(8,None)]) p = Pairs([(0,6),(1,5),(2,None),(3,7),(0,1),(5,1)]) gaps = [0,2,6,9] self.assertEqual(insert_gaps_in_pairs(p, gaps),\ [(1,10),(3,8),(4,None),(5,11),(1,3),(8,3)]) gaps = [2,3,4,9] self.assertEqual(insert_gaps_in_pairs(p, gaps),\ 
[(0,10),(1,8),(5,None),(6,11),(0,1),(8,1)]) p = Pairs([(0,6),(1,5),(2,None),(3,7),(0,1),(5,1)]) gaps = [] self.assertEqual(insert_gaps_in_pairs(p, gaps),\ [(0,6),(1,5),(2,None),(3,7),(0,1),(5,1)]) def test_get_gap_symbol(self): """get_gap_symbol: Sequence, ModelSequence, old_cogent, string""" self.assertEqual(get_gap_symbol(self.rna1), '-') self.assertEqual(get_gap_symbol(self.m1), '-') self.assertEqual(get_gap_symbol(self.s1), '-') self.assertEqual(get_gap_symbol(''), '-') def test_get_gap_list(self): """get_gap_list: Sequence, ModelSequence, old_cogent, string""" gs = '-' self.assertEqual(get_gap_list(self.rna1), [4,8]) self.assertEqual(get_gap_list(self.m1), [4,8]) self.assertEqual(get_gap_list(self.s1,gs),[4,8]) self.assertEqual(get_gap_list('',gs), []) def test_degap_model_seq(self): """degap_model_seq: replacement for broken method""" self.assertEqual(str(degap_model_seq(self.m1)),'UCAGRYNN') def test_degap_seq(self): """degap_seq: Sequence, ModelSequence, old_cogent, string""" f = degap_seq gs = '-' self.assertEqual(f(self.rna1, gs), 'UCAGRYNN') self.assertEqual(str(f(self.m1, gs)), 'UCAGRYNN') self.assertEqual(f(self.s1, gs), 'UCAGRYNN') def test_gapped_to_ungapped(self): """gapped_to_ungapped: Sequence, ModelSequence, old_cogent, string """ p = Pairs([(0,6),(1,5),(3,9)]) exp = Pairs([(0,5),(1,4),(3,7)]) f = gapped_to_ungapped self.assertEqual(f(self.rna1, p)[1], exp) self.assertEqual(f(self.m1, p)[1], exp) self.assertEqual(f(self.s1, p)[1], exp) def test_ungapped_to_gapped(self): """ungapped_to_gapped: Sequence, ModelSequence, old_cogent, string """ p = Pairs([(0,6),(1,5),(3,9)]) exp = Pairs([(0,5),(1,4),(3,7)]) f = ungapped_to_gapped self.assertEqual(f(self.rna1, exp)[1], p) self.assertEqual(f(self.m1, exp)[1], p) self.assertEqual(f(self.s1, exp)[1], p) class OldAdjustmentFunctionsTests(TestCase): """Provides tests for gapped_to_ungapped and ungapped_to_gapped functions """ def setUp(self): """setUp: set up method for all tests""" self.ungapped = 
'AGAUGCUAGCUAC' self.gapped = '-AGA--UGC-UAG--CUAC' self.diff_sym = '*AGA**UGC*UAG**CUAC' self.simple = Pairs([(2,7),(3,6),(8,12)]) self.simple_g = Pairs([(3,11),(6,10),(12,18)]) self.out_order = Pairs([(6,10),(4,1),(9,7),(5,11)]) self.out_order_g = Pairs([(10,16),(7,2),(15,11),(8,17)]) self.duplicates = Pairs([(3,9),(3,9),(2,10),(0,12)]) self.duplicates_g = Pairs([(6,15),(6,15),(3,16),(1,18)]) self.pseudo = Pairs([(0,7),(2,6),(3,10)]) self.pseudo_g = Pairs([(1,11),(3,10),(6,16)]) def test_adjust_base(self): """adjust_base: should work for pairs object or list of pairs""" p = Pairs() self.assertEqual(adjust_base(p,10),[]) pairs = [(1,21),(2,15),(3,13),(4,11),(5,10),(6,9)] offset = -1 expected = [(0,20),(1,14),(2,12),(3,10),(4,9),(5,8)] obs_pairs = adjust_base(pairs, offset) self.assertEqual(obs_pairs, expected) pairs = Pairs([(0,10),(1,9)]) self.assertEqual(adjust_base(pairs, -1), Pairs([(-1,9),(0,8)])) self.assertEqual(adjust_base(pairs, 5), Pairs([(5,15),(6,14)])) self.assertRaises(PairsAdjustmentError, adjust_base, pairs, 3.5) def test_adjust_base_None(self): """adjust_base: should keep Nones or duplicates, ignore conflicts""" pairs = Pairs([(2,8),(3,7),(6,None),(None,None),(2,10)]) expected = Pairs([(1,7),(2,6),(5,None),(None, None),(1,9)]) self.assertEqual(adjust_base(pairs,-1), expected) p = Pairs([(1,2),(2,1),(1,2),(2,None)]) self.assertEqual(adjust_base(p, 1), [(2,3),(3,2),(2,3),(3,None)]) def test_delete_gaps_from_pairs(self): """delete_gaps_from_pairs: should work on standard input""" r = delete_gaps_from_pairs # empty list p = Pairs([]) self.assertEqual(r(p,[1,2,3]), []) # normal list p1 = Pairs([(2,8), (3,6)]) gap_list = [0,1,4,5,7,9] self.assertEqualItems(r(p1, gap_list), [(0,3),(1,2)]) p2 = Pairs([(2,8),(3,6),(4,9)]) self.assertEqualItems(r(p2, gap_list), [(0,3),(1,2)]) p3 = Pairs([(2,8),(3,6),(4,10)]) self.assertEqualItems(r(p3, gap_list), [(0,3),(1,2)]) def test_delete_gaps_from_pairs_weird(self): """delete_gaps_from_pairs: should ignore conflicts 
etc""" r = delete_gaps_from_pairs gap_list = [0,1,4,5,7,9] p = Pairs([(2,6),(3,8)]) self.assertEqualItems(r(p, gap_list), [(0,2),(1,3)]) p = Pairs([(2,6),(3,8),(3,None),(6,2),(3,8), (None, None)]) self.assertEqualItems(r(p, gap_list),\ [(0,2),(1,3),(1,None),(2,0),(1,3),(None, None)]) def test_insert_gaps_in_pairs(self): """insert_gaps_in_pairs: should work with normal and conflicts""" p = Pairs([(0,3),(1,2),(1,4),(3,None)]) gaps = [0,1,4,5,7] self.assertEqual(insert_gaps_in_pairs(p, gaps),\ [(2,8),(3,6),(3,9),(8,None)]) p = Pairs([(0,6),(1,5),(2,None),(3,7),(0,1),(5,1)]) gaps = [0,2,6,9] self.assertEqual(insert_gaps_in_pairs(p, gaps),\ [(1,10),(3,8),(4,None),(5,11),(1,3),(8,3)]) gaps = [2,3,4,9] self.assertEqual(insert_gaps_in_pairs(p, gaps),\ [(0,10),(1,8),(5,None),(6,11),(0,1),(8,1)]) def test_gapped_to_ungapped_simple(self): """gapped_to_ungapped: should work for simple case""" s = RnaSequence(self.gapped) p = self.simple_g obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.simple) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) def test_gapped_to_ungapped_out_of_order(self): """gapped_to_ungapped: should work when pairs are out of order """ s = RnaSequence(self.gapped) p = Pairs(self.out_order_g) obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.out_order) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) def test_gapped_to_ungapped_duplicates(self): """gapped_to_ungapped: should work when pairs contains duplicates """ s = RnaSequence(self.gapped) p = Pairs(self.duplicates_g) obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.duplicates) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) def test_gapped_to_ungapped_pseudo(self): """gapped_to_ungapped: shouldn't care 
about pseudoknots """ s = RnaSequence(self.gapped) p = Pairs(self.pseudo_g) obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.pseudo) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) def test_gapped_to_ungapped_no_gaps(self): """gapped_to_ungapped: should return same pairs when no gaps """ s = RnaSequence(self.ungapped) p = Pairs(self.simple) obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.simple) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) def test_ungapped_to_gapped(self): """ungapped_to_gapped: should work for basic case """ s = RnaSequence(self.gapped) p = self.simple obs_seq, obs_pairs = ungapped_to_gapped(s,p) assert obs_seq is s self.assertEqualItems(obs_pairs, self.simple_g) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) def test_ungapped_to_gapped_out_of_order(self): """ungapped_to_gapped: should work when pairs out of order """ s = RnaSequence(self.gapped) p = self.out_order obs_seq, obs_pairs = ungapped_to_gapped(s,p) assert obs_seq is s self.assertEqualItems(obs_pairs, self.out_order_g) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) def test_gapped_to_ungapped_simple(self): """gapped_to_ungapped: should work on simple case """ s = self.gapped p = self.simple_g obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.simple) assert not isinstance(obs_seq, RnaSequence) assert isinstance(obs_seq, str) assert isinstance(obs_pairs, Pairs) def test_gapped_to_ungapped_pseudo(self): """gapped_to_ungapped: shouldn't care about pseudoknots """ s = self.gapped p = self.pseudo_g obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.pseudo) assert not 
isinstance(obs_seq, RnaSequence) assert isinstance(obs_seq, str) assert isinstance(obs_pairs, Pairs) def test_ungapped_to_gapped_simple(self): """ungapped_to_gapped: should work on basic case""" s = self.gapped p = self.simple obs_seq, obs_pairs = ungapped_to_gapped(s,p) assert obs_seq is s self.assertEqualItems(obs_pairs, self.simple_g) assert not isinstance(obs_seq, RnaSequence) assert isinstance(obs_seq, str) assert isinstance(obs_pairs, Pairs) def test_ungapped_to_gapped_duplicates(self): """ungapped_to_gapped: should work when pairs are duplicated""" s = self.gapped p = self.duplicates obs_seq, obs_pairs = ungapped_to_gapped(s,p) assert obs_seq is s self.assertEqualItems(obs_pairs, self.duplicates_g) assert not isinstance(obs_seq, RnaSequence) assert isinstance(obs_seq, str) assert isinstance(obs_pairs, Pairs) def test_gapped_to_ungapped_general(self): """gapped_to_ungapped: should return object of right type """ s = RnaSequence(self.gapped) p = self.simple_g #in case of RnaSequence obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.simple) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) #in case of str input s = self.gapped obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.simple) assert not isinstance(obs_seq, RnaSequence) assert isinstance(obs_seq, str) assert isinstance(obs_pairs, Pairs) def test_ungapped_to_gapped_general(self): """ungapped_to_gapped: should return object of right type """ s = RnaSequence(self.gapped) p = self.simple #in case of RnaSequence obs_seq, obs_pairs = ungapped_to_gapped(s,p) assert obs_seq is s self.assertEqualItems(obs_pairs, self.simple_g) assert isinstance(obs_seq, RnaSequence) assert isinstance(obs_pairs, Pairs) #in case of str input s = self.gapped obs_seq, obs_pairs = ungapped_to_gapped(s,p) assert obs_seq is s self.assertEqualItems(obs_pairs, 
self.simple_g) assert not isinstance(obs_seq, RnaSequence) assert isinstance(obs_seq, str) assert isinstance(obs_pairs, Pairs) def test_gapped_to_ungapped_general_seq(self): """gapped_to_ungapped: when input is Sequence obj, treat as string """ s = Sequence(self.gapped) p = self.simple_g obs_seq, obs_pairs = gapped_to_ungapped(s,p) self.assertEqual(obs_seq, self.ungapped) self.assertEqualItems(obs_pairs, self.simple) #assert not isinstance(obs_seq, Sequence) #assert isinstance(obs_seq, str) assert isinstance(obs_seq, Sequence) assert isinstance(obs_pairs, Pairs) def test_adjust_pairs_from_mapping(self): """adjust_pairs_from_mapping: should work both ways """ #ungapped to gapped r = RnaSequence('UC-AG-UC-CG-A-') u_to_g = r.gapMaps()[0] #{0: 0, 1: 1, 2: 3, 3: 4, 4: 6, 5: 7, 6: 9, 7: 10, 8: 12} ungapped_pairs = Pairs([(0,8),(1,6),(2,5)]) exp_pairs = Pairs([(0,12),(1,9),(3,7)]) self.assertEqualItems(adjust_pairs_from_mapping(ungapped_pairs,\ u_to_g), exp_pairs) #gapped to ungapped r = RnaSequence('UC-AG-UC-CG-A-') g_to_u = r.gapMaps()[1] #{0: 0, 1: 1, 3: 2, 4: 3, 6: 4, 7: 5, 9: 6, 10: 7, 12: 8} gapped_pairs = Pairs([(0,12),(1,9),(3,7)]) exp_pairs = Pairs([(0,8),(1,6),(2,5)]) self.assertEqualItems(adjust_pairs_from_mapping(gapped_pairs,\ g_to_u), exp_pairs) class PairsComparisonTests(TestCase): """Provides tests for comparing different Pairs objects""" def test_pairs_intersection(self): """pairs_intersection: should work on simple case """ p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([(1,12),(4,9),(5,8)]) self.assertEqualItems(pairs_intersection(p1,p2),[(4,9),(5,8)]) #works when one is empty p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([]) self.assertEqualItems(pairs_intersection(p1,p2),[]) #works also on lists (not Pairs) p1 = [(3,10),(4,9),(5,8),(20,24)] p2 = [(1,12),(4,9),(5,8)] self.assertEqualItems(pairs_intersection(p1,p2),[(4,9),(5,8)]) def test_pairs_intersection_duplicates(self): """pairs_intersection: should work on flipped pairs and duplicates 
""" p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([(10,3),(4,9),(5,8),(9,4),(4,9),(23,30)]) self.assertEqualItems(pairs_intersection(p1,p2),[(3,10),(4,9),(5,8)]) # Conflicts, duplicates, None, pseudoknots p1 = Pairs([(3,10),(4,9),(5,8),(20,24),(22,26),(3,2),(9,4),(6,None)]) p2 = Pairs([(1,12),(4,9),(5,8)]) self.assertEqualItems(pairs_intersection(p1,p2),\ [(4,9),(5,8)]) def test_pairs_union(self): """pairs_union: should work on simple case """ p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([(1,12),(4,9),(5,8)]) self.assertEqualItems(pairs_union(p1,p2),\ [(1,12),(3,10),(4,9),(5,8),(20,24)]) #works when one is empty p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([]) self.assertEqualItems(pairs_union(p1,p2),p1) #works also on lists (not Pairs) p1 = [(3,10),(4,9),(5,8),(20,24)] p2 = [(1,12),(4,9),(5,8)] self.assertEqualItems(pairs_union(p1,p2),\ [(1,12),(3,10),(4,9),(5,8),(20,24)]) def test_union_duplicates(self): """pairs_union: should work on flipped base pairs and duplicates """ p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([(10,3),(4,9),(5,8),(9,4),(4,9),(23,30)]) self.assertEqualItems(pairs_union(p1,p2),\ [(3,10),(4,9),(5,8),(20,24),(23,30)]) # Conflicts, duplicates, None, pseudoknots p1 = Pairs([(3,10),(4,9),(5,8),(20,24),(22,26),(3,2),(9,4),(6,None)]) p2 = Pairs([(1,12),(4,9),(5,8)]) self.assertEqualItems(pairs_union(p1,p2),\ [(1,12),(3,10),(4,9),(5,8),(20,24),(22,26),(2,3)]) def test_compare_pairs(self): """compare_pairs: should work on simple case""" #all the same p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([(3,10),(4,9),(5,8),(20,24)]) self.assertEqual(compare_pairs(p1,p2),1) #all different p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([(1,2),(3,4),(5,6)]) self.assertEqual(compare_pairs(p1,p2),0) #one empty p1 = Pairs([(3,10),(4,9),(5,8),(20,24)]) p2 = Pairs([]) self.assertEqual(compare_pairs(p1,p2),0) #partially different p1 = Pairs([(1,2),(3,4),(5,6),(7,8)]) p2 = Pairs([(1,2),(3,4),(9,10),(11,12)]) 
self.assertFloatEqual(compare_pairs(p1,p2),.33333333333333333) #partially different p1 = Pairs([(1,2),(3,4),(5,6)]) p2 = Pairs([(1,2),(3,4),(9,10)]) self.assertFloatEqual(compare_pairs(p1,p2),.5) def test_compare_pairs_both_empty(self): """compare_pairs: should return 1.0 when both lists are empty """ p1 = Pairs([]) p2 = Pairs([]) self.assertEqual(compare_pairs(p1,p2),1) def test_compare_pairs_weird(self): """compare_pairs: should handle conflicts, duplicates, pseudo, None """ #Should raise error on conflict p1 = Pairs([(1,2),(3,4),(5,6),(2,None),(4,3),(None,None)]) p2 = Pairs([(1,2),(3,4),(9,10)]) self.assertRaises(ValueError, compare_pairs, p1, p2) p1 = Pairs([(1,2),(3,4),(5,6),(4,3),(None,None),(10,None)]) p2 = Pairs([(1,2),(3,4),(9,10)]) self.assertFloatEqual(compare_pairs(p1,p2),.5) p1 = Pairs([(1,8),(2,10),(7,3)]) p2 = Pairs([(1,8),(10,2),(3,7),(4,6)]) self.assertFloatEqual(compare_pairs(p1,p2), 0.75) def test_compare_pairs_mapping(self): """compare_pairs_mapping: should work with correct mapping """ # pos in first seq, base, pos in second seq #1 U 0 #2 C 1 #3 G 2 #4 A 3 # A 4 #5 C 5 #6 C 6 #7 U #8 G 7 #all the same p1 = Pairs([(3,6),(1,8)]) p2 = Pairs([(2,6),(0,7)]) mapping = {1:0,2:1,3:2,4:3,5:5,6:6,7:None, 8:7} self.assertEqual(compare_pairs_mapping(p1,p2, mapping),1) #all different p1 = Pairs([(3,6),(1,8)]) p2 = Pairs([(1,5),(4,7)]) mapping = {1:0,2:1,3:2,4:3,5:5,6:6,7:None, 8:7} self.assertEqual(compare_pairs_mapping(p1,p2, mapping),0) #partially the same p1 = Pairs([(5,6),(1,4),(2,7)]) p2 = Pairs([(5,6),(0,3),(4,7)]) self.assertEqual(compare_pairs_mapping(p1,p2, mapping),.5) p1 = Pairs([(1,8),(2,7),(3,6),(4,5)]) p2 = Pairs([(0,7),(1,6),(2,5),(3,4)]) self.assertFloatEqual(compare_pairs_mapping(p1,p2, mapping),1/7) #one empty p1 = Pairs([(1,8),(2,7),(3,6),(4,5)]) p2 = [] self.assertEqual(compare_pairs_mapping(p1,p2, mapping),0) #both empty p1 = [] p2 = [] self.assertEqual(compare_pairs_mapping(p1,p2, mapping),1) def test_compare_random_to_correct(self): 
"""comapre_random_to_correct: should return correct fraction """ p1 = Pairs([(1,8),(2,7),(3,6),(4,5)]) p2 = Pairs([(1,8)]) p3 = Pairs([(1,8), (2,7), (4,5)]) p4 = Pairs([(1,8),(2,7),(9,10),(11,12)]) self.assertFloatEqual(compare_random_to_correct(p2,p1),1) self.assertFloatEqual(compare_random_to_correct(p3,p1),1) self.assertFloatEqual(compare_random_to_correct(p4,p1),0.5) self.assertFloatEqual(compare_random_to_correct([],p1),0) self.assertFloatEqual(compare_random_to_correct(p2,[]),0) self.assertFloatEqual(compare_random_to_correct([],[]),1) class GardnerMetricsTest(TestCase): """Tests for the metrics from Gardner & Giegerich 2004""" def setUp(self): """setUp: setup method for all tests""" self.true = Pairs([(0,40),(1,39),(2,38),(3,37),(10,20),\ (11,19),(12,18),(13,17),(26,33),(27,32)]) self.predicted = Pairs([(0,40),(1,39),(2,38),(3,37),(4,36),\ (5,35),(10,22),(11,20),(14,29),(15,28)]) self.seq = ['>seq1\n','agguugaaggggauccgauccacuccccggcuggucaaccu'] def test_conflicts(self): """all metrics should raise error when conflicts in one of the structs """ ref = Pairs([(1,6),(2,5),(3,10),(7,None),(None,None),(5,2),(1,12)]) pred = Pairs([(6,1),(10,11),(3,12)]) self.assertRaises(ValueError, sensitivity, ref, pred) self.assertRaises(ValueError, sensitivity, pred, ref) self.assertRaises(ValueError, selectivity, ref, pred) self.assertRaises(ValueError, selectivity, pred, ref) self.assertRaises(ValueError, approximate_correlation, ref, pred,\ self.seq) self.assertRaises(ValueError, approximate_correlation, pred, ref,\ self.seq) self.assertRaises(ValueError, correlation_coefficient, ref, pred,\ self.seq) self.assertRaises(ValueError, correlation_coefficient, pred, ref,\ self.seq) self.assertRaises(ValueError, mcc, ref, pred, self.seq) self.assertRaises(ValueError, mcc, pred, ref, self.seq) def test_get_all_pairs(self): """get_all_pairs: should return the number of possible pairs""" seq = RnaSequence('UCAG-NACGU') seq2 = RnaSequence('UAAG-CACGC') 
self.assertEqual(get_all_pairs([seq], min_dist=4), 6) self.assertEqual(get_all_pairs([seq2], min_dist=4), 4) # when given multiple sequences, should average over all of them self.assertEqual(get_all_pairs([seq,seq2], min_dist=4), 5) # different min distance self.assertEqual(get_all_pairs([seq], min_dist=2),10) # error on invalid minimum distance self.assertRaises(ValueError, get_all_pairs, [seq], min_dist=-2) def test_get_counts(self): """get_counts: should work with all parameters""" seq = RnaSequence('UCAG-NAUGU') seq2 = RnaSequence('UAAG-CACGC') p = Pairs([(1,8),(2,7)]) p2 = Pairs([(1,8),(2,6),(3,6),(4,9),]) exp = {'TP':1,'TN':0, 'FN':1,'FP':3,\ 'FP_INCONS':0, 'FP_CONTRA':0, 'FP_COMP':0} self.assertEqual(get_counts(p, p2), exp) exp = {'TP':1,'TN':0, 'FN':1,'FP':3,\ 'FP_INCONS':1, 'FP_CONTRA':1, 'FP_COMP':1} self.assertEqual(get_counts(p, p2, split_fp=True), exp) seq = RnaSequence('UCAG-NACGU') exp = {'TP':1,'TN':7, 'FN':1,'FP':3,\ 'FP_INCONS':1, 'FP_CONTRA':1, 'FP_COMP':1} self.assertEqual(get_counts(p, p2, split_fp=True,\ sequences=[seq], min_dist=2), exp) # check against compare_ct.pm exp = {'TP':4,'TN':266, 'FN':6,'FP':6,\ 'FP_INCONS':2, 'FP_CONTRA':2, 'FP_COMP':2} seq = 'agguugaaggggauccgauccacuccccggcuggucaaccu'.upper() self.assertEqual(get_counts(self.true, self.predicted, split_fp=True,\ sequences=[seq], min_dist=4), exp) def test_extract_seqs(self): """extract_seqs: should handle different input formats""" s1 = ">seq1\nACGUAGC\n>seq2\nGGUAGCG" s2 = [">seq1","ACGUAGC",">seq2","GGUAGCG"] s3 = ['ACGUAGC','GGUAGCG'] s4 = [RnaSequence('ACGUAGC'), RnaSequence('GGUAGCG')] m1 = ModelSequence('ACGUAGC', Name='rna1',\ Alphabet=RNA.Alphabets.DegenGapped) m2 = ModelSequence('GGUAGCG', Name='rna2',\ Alphabet=RNA.Alphabets.DegenGapped) s5 = [m1, m2] f = extract_seqs self.assertEqual(f(s1), ['ACGUAGC', 'GGUAGCG']) self.assertEqual(f(s2), ['ACGUAGC', 'GGUAGCG']) self.assertEqual(f(s3), ['ACGUAGC', 'GGUAGCG']) self.assertEqual(f(s4), ['ACGUAGC', 'GGUAGCG']) 
        self.assertEqual(f(s5), ['ACGUAGC', 'GGUAGCG'])

    def test_sensitivity(self):
        """sensitivity: check against compare_ct.pm"""
        sen = sensitivity(self.true,self.predicted)
        self.assertEqual(sen, 0.4)

    def test_sensitivity_general(self):
        """sensitivity: should work in general"""
        ref = Pairs([(1,6),(2,5),(3,10)])
        pred = Pairs([(6,1),(10,11),(3,12)])
        # one good prediction
        self.assertFloatEqual(sensitivity(ref, pred), 1/3)
        # over-prediction not penalized
        pred = Pairs([(6,1),(10,11),(3,12),(13,20),(14,19),(15,18)])
        self.assertFloatEqual(sensitivity(ref, pred), 1/3)

    def test_sensitivity_dupl(self):
        """sensitivity: should handle duplicates, pseudo, None"""
        ref = Pairs([(1,6),(2,5),(3,10),(7,None),(None,None),(5,2),(4,9)])
        pred = Pairs([(6,1),(10,11),(3,12)])
        self.assertFloatEqual(sensitivity(ref, pred), 0.25)
        pred = Pairs([(6,1),(10,11),(3,12),(20,None),(None,None),(1,6)])
        self.assertFloatEqual(sensitivity(ref, pred), 0.25)

    def test_sensitivity_empty(self):
        """sensitivity: should work on empty Pairs"""
        # both empty
        self.assertFloatEqual(sensitivity(Pairs(), Pairs()), 1)
        pred = Pairs([(6,1),(10,11),(3,12),(13,20),(14,19),(15,18)])
        # prediction empty
        self.assertFloatEqual(sensitivity(Pairs(), pred), 0)
        # reference empty
        self.assertFloatEqual(sensitivity(pred, Pairs()), 0)

    def test_selectivity(self):
        """selectivity: check against compare_ct.pm"""
        sel = selectivity(self.true,self.predicted)
        self.assertEqual(sel, 0.5)

    def test_selectivity_general(self):
        """selectivity: should work in general"""
        ref = Pairs([(1,6),(2,5),(10,13)])
        pred = Pairs([(6,1),(3,4),(10,12)])
        # one good prediction
        self.assertFloatEqual(selectivity(ref, pred), 0.5)
        # over-prediction not penalized
        pred = Pairs([(6,1),(10,11),(3,12),(13,20),(14,19),(15,18)])
        self.assertFloatEqual(selectivity(ref, pred), 0.25)

    def test_selectivity_dupl(self):
        """selectivity: duplicates and Nones shouldn't influence the calc.
        """
        ref = Pairs([(1,6),(2,5),(10,13),(6,1),(7,None),(None,None)])
        pred = Pairs([(6,1),(3,4),(10,12)])
        self.assertFloatEqual(selectivity(ref, pred), 0.5)

    def test_selectivity_empty(self):
        """selectivity: should handle empty reference/predicted structure"""
        # both empty
        self.assertFloatEqual(selectivity(Pairs(), Pairs()), 1)
        pred = Pairs([(6,1),(10,11),(3,12),(13,20),(14,19),(15,18)])
        # prediction empty
        self.assertFloatEqual(selectivity(Pairs(), pred), 0)
        # reference empty
        self.assertFloatEqual(selectivity(pred, Pairs()), 0)

    def test_approximate_correlation(self):
        """approximate_correlation: check against compare_ct.pm"""
        self.assertFloatEqual(approximate_correlation(self.true,\
            self.predicted, seqs=self.seq), 0.45)

    def test_correlation_coefficient(self):
        """correlation_coefficient: check against compare_ct.pm"""
        self.assertFloatEqual(correlation_coefficient(self.true,\
            self.predicted, seqs=self.seq, min_dist=4), 0.42906394)

    def test_cc_bad_pred(self):
        """correlation_coefficient: should give 0 when TP=0"""
        ref = Pairs([(1,7),(2,5)])
        pred = Pairs([(0,8)])
        seqs = ['CAUCGAUUG']
        self.assertEqual(correlation_coefficient(ref, pred, seqs=seqs), 0.0)

    def test_mcc(self):
        """mcc: check against compare_ct.pm"""
        res = mcc(self.true,self.predicted,self.seq, min_dist=4)
        self.assertFloatEqual(res, 0.42906394)

    def test_all_metrics(self):
        """all_metrics: check against compare_ct.pm"""
        exp = {'SENSITIVITY':0.4, 'SELECTIVITY':0.5, 'AC':0.45,\
            'CC':0.42906394, 'MCC':0.42906394}
        obs = all_metrics(self.true, self.predicted, seqs=self.seq, min_dist=4)
        self.assertEqualItems(obs.keys(), exp.keys())
        for k in exp:
            self.assertFloatEqual(obs[k], exp[k])

    def test_get_counts_pseudo(self):
        """get_counts: should work when pseudo in ref -> classification off"""
        # pairs that would normally be compatible, are now contradicting
        ref = Pairs([(0,8),(1,7),(4,10)])
        pred = Pairs([(0,8),(3,6),(4,10)])
        seq = 'GACUGUGUCAU'
        exp = {'TP':2,'TN':13-2-1, 'FN':1,'FP':1,\
            'FP_INCONS':0, 'FP_CONTRA':1, 'FP_COMP':0}
        self.assertEqual(get_counts(ref, pred, split_fp=True,\
            sequences=[seq], min_dist=4), exp)

    def test_all_metrics_pseudo(self):
        """all_metrics: pseudoknot in ref, check against compare_ct.pm"""
        ref = Pairs([(0,8),(1,7),(4,10)])
        pred = Pairs([(0,8),(3,6),(4,10)])
        seq = 'GACUGUGUCAU'
        exp = {'SENSITIVITY':0.6666667, 'SELECTIVITY':0.6666667,\
            'AC':0.6666667, 'CC':0.57575758, 'MCC':0.57575758}
        obs = all_metrics(ref, pred, seqs=[seq], min_dist=4)
        self.assertEqualItems(obs.keys(), exp.keys())
        for k in exp:
            self.assertFloatEqual(obs[k], exp[k])

    def test_all_metrics_weird_input(self):
        """all_metrics: should work when ref or prediction empty or no seqs"""
        ref = Pairs([(3,10)])
        pred = Pairs()
        seqs = ['UACGUAGCUAGCUAGCUACG']
        obs = all_metrics(ref, pred, seqs=seqs, min_dist=4)
        exp = {'SENSITIVITY':0, 'SELECTIVITY':0,\
            'AC':0, 'CC':0, 'MCC':0}
        for k in exp:
            self.assertFloatEqual(obs[k], exp[k])
        ref = Pairs()
        pred = Pairs()
        seqs = ['UACGUAGCUAGCUAGCUACG']
        obs = all_metrics(ref, pred, seqs=seqs, min_dist=4)
        exp = {'SENSITIVITY':1, 'SELECTIVITY':1,\
            'AC':1, 'CC':1, 'MCC':1}
        for k in exp:
            self.assertFloatEqual(obs[k], exp[k])
        ref = Pairs([(3,10)])
        pred = Pairs([(1,12)])
        seqs = ['UACGUAGCUAGCUAGCUACG']
        self.assertRaises(ValueError, all_metrics, ref, pred, seqs="",\
            min_dist=4)

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_struct/test_rna2d.py

#!/usr/bin/env python
"""Tests for ViennaStructure and related classes.
""" from cogent.util.unit_test import TestCase, main from cogent.struct.rna2d import ViennaStructure, Vienna, Pairs,\ Partners, EmptyPartners, WussStructure, wuss_to_vienna, StructureNode, \ Stem, classify, PairError __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class RnaAlphabet(object): Pairs = { ('A','U'): True, #True vs False for 'always' vs 'sometimes' pairing ('C','G'): True, ('G','C'): True, ('U','A'): True, ('G','U'): False, ('U','G'): False, } class Rna(str): Alphabet = RnaAlphabet class StemTests(TestCase): """Tests for the Stem object""" def test_init_empty(self): """Stem should init ok with no parameters.""" s = Stem() self.assertEqual((s.Start, s.End, s.Length), (None, None, 0)) def test_init(self): """Stem should init with Start, End, and Length""" s = Stem(Length=3) self.assertEqual((s.Start, s.End, s.Length), (None, None, 3)) #should set Length to 0 if not supplied and unpaired s = Stem(Start=3) self.assertEqual((s.Start, s.End, s.Length), (3, None, 0)) s = Stem(End=3) self.assertEqual((s.Start, s.End, s.Length), (None, 3, 0)) #should set Length to 1 if not supplied and paired s = Stem(Start=3, End=5) self.assertEqual((s.Start, s.End, s.Length), (3, 5, 1)) #parameters should be in order Start, End, Length #note that you're allowed to initialize an invalid stem, like the #following one (can't have 7 pairs between 3 and 5); this is often #useful when building up a tree that you plan to renumber(). 
s = Stem(3, 5, 7) self.assertEqual((s.Start, s.End, s.Length), (3, 5, 7)) #not allowed more than 3 parameters self.assertRaises(TypeError, Stem, 1, 2, 3, 4) def test_len(self): """Stem len() should return self.Length""" s = Stem() self.assertEqual(len(s), 0) s.Length = 5 self.assertEqual(len(s), 5) s.Length = None self.assertRaises(TypeError, len, s) def test_getitem(self): """Stem getitem should return a Stem object for the ith pair in the stem""" s = Stem() self.assertRaises(IndexError, s.__getitem__, 0) s.Start = 5 s.End = 8 s.Length = 1 pairs = list(s) self.assertEqual(pairs, [Stem(5, 8, 1)]) s.Length = 2 pairs = list(s) self.assertEqual(pairs, [Stem(5,8,1),Stem(6,7,1)]) #WARNING: Stem will not complain when iterating over an invalid helix, #as per the one below s.Length = 5 pairs = list(s) self.assertEqual(pairs, [Stem(5,8,1),Stem(6,7,1),Stem(7,6,1),\ Stem(8,5,1), Stem(9,4,1)]) def test_cmp(self): """Stems should compare equal when the data is the same""" self.assertEqual(Stem(1,2,3), Stem(1,2,3)) self.assertNotEqual(Stem(1,2,5), Stem(1,2,3)) a = Stem(1, 10, 2) b = Stem(2, 8, 1) c = Stem(15, 20, 2) l = [c, b, a] l.sort() self.assertEqual(l, [a,b,c]) def test_extract(self): """Stems extract should return a list of 'pairs' from a sequence""" seq = 'UGAGAUUUUCU' s = Stem(1, 10, 3) self.assertEqual(s.extract(seq), [('G','U'),('A','C'),('G','U')]) s = Stem(0, 1) self.assertEqual(s.extract(seq), [('U','G')]) #should put in None if either position hasn't been specified s = Stem(5) self.assertEqual(s.extract(seq), [('U', None)]) s = Stem() self.assertEqual(s.extract(seq), [(None, None)]) #should raise IndexError if the stem contains bases outside the seq s = Stem(50, 60, 5) self.assertRaises(IndexError, s.extract, seq) def test_hash(self): """Stems hash should allow use as dict keys if unchanged""" #WARNING: if you change the Stem after putting it in a dict, all bets #are off as to behavior. Don't do it! 
s = Stem(1, 5, 2) t = Stem(1, 5, 2) u = Stem(2, 4, 6) v = Stem(2, 4, 6) w = Stem(2, 4, 4) d = {} assert s is not t for i in (s, t, u, v, w): if i in d: d[i] += 1 else: d[i] = 1 self.assertEqual(len(d), 3) self.assertEqual(d[Stem(1, 5, 2)], 2) self.assertEqual(d[Stem(2, 4, 6)], 2) self.assertEqual(d[Stem(2, 4, 4)], 1) assert Stem(1,5) not in d def test_str(self): """Stem str should print Start, End and Length as a tuple""" self.assertEqual(str(Stem()), '(None,None,0)') self.assertEqual(str(Stem(3)), '(3,None,0)') self.assertEqual(str(Stem(3,4)), '(3,4,1)') self.assertEqual(str(Stem(3,4,5)), '(3,4,5)') def test_nonzero(self): """Stem nonzero should return True if paired (length > 0)""" assert not Stem() assert not Stem(1) assert Stem(7, 10) assert Stem(1, 7, 1) assert Stem(2, 8, 3) #go strictly by length; don't check if data is invalid assert Stem(0, 0) assert Stem(5, None, 10) assert not Stem(5, 7, -1) class PartnersTests(TestCase): """Tests for Partners object""" def test_init(self): """Partners should init with empty list and stay free of conflicts""" self.assertEqual(Partners([]),[]) empty = Partners([None]*6) self.assertEqual(empty,[None,None,None,None,None,None]) self.assertRaises(ValueError,empty.__setitem__,2,2) empty[2] = 3 self.assertEqual(empty,[None,None,3,2,None,None]) empty[3] = 1 self.assertEqual(empty,[None,3,None,1,None,None]) empty[3] = 5 self.assertEqual(empty,[None,None,None,5,None,3]) empty[1] = None self.assertEqual(empty,[None,None,None,5,None,3]) def test_toPairs(self): """Partners toPairs() should return a Pairs object""" p = Partners([None,3,None,1,5,4]) self.assertEqualItems(p.toPairs(),[(1,3),(4,5)]) assert isinstance(p.toPairs(),Pairs) self.assertEqual(Partners([None]*10).toPairs(),[]) def test_not_implemented(self): """Partners not_implemented should raise error for 'naughty' methods""" p = Partners([None,3,1,5,4]) self.assertRaises(NotImplementedError,p.pop) self.assertRaises(NotImplementedError,p.sort) 
self.assertRaises(NotImplementedError,p.__delitem__,3) class PairsTests(TestCase): """Tests for Pairs object""" def setUp(self): """Pairs SetUp method for all tests""" self.Empty = Pairs([]) self.OneList = Pairs([[1,2]]) self.OneTuple = Pairs([(1,2)]) self.MoreLists = Pairs([[2,4],[3,9],[6,36],[7,49]]) self.MoreTuples = Pairs([(2,4),(3,9),(6,36),(7,49)]) self.MulNoOverlap = Pairs([(1,10),(2,9),(3,7),(4,12)]) self.MulOverlap = Pairs([(1,2),(2,3)]) self.Doubles = Pairs([[1,2],[1,2],[2,3],[1,3]]) self.Undirected = Pairs([(2,1),(6,4),(1,7),(8,3)]) self.UndirectedNone = Pairs([(5,None),(None,3)]) self.UndirectedDouble = Pairs([(2,1),(1,2)]) self.NoPseudo = Pairs([(1,20),(2,19),(3,7),(4,6),(10,15),(11,14)]) self.NoPseudo2 = Pairs([(1,3),(4,6)]) #((.(.)).) self.p0 = Pairs([(0,6),(1,5),(3,8)]) #(.((..(.).).)) self.p1 = Pairs([(0,9),(2,12),(3,10),(5,7)]) #((.(.(.).)).) self.p2 = Pairs([(0,10),(1,9),(3,12),(5,7)]) #((.((.(.)).).)) self.p3 = Pairs([(0,9),(1,8),(3,14),(4,13),(6,11)]) #(.(((.((.))).)).(((.((((..))).)))).) 
self.p4 = Pairs([(0,35),(2,11),(3,10),(4,9),(6,14),(7,13),(16,28),\ (17,27),(18,26),(20,33),(21,32),(22,31),(23,30)]) #(.((.).)) self.p5 = Pairs([(0,5),(2,8),(3,7)]) self.p6 = Pairs([(0,19),(2,6),(3,5),(8,14),(9,13),(10,12),\ (16,22),(17,21)]) self.p7 = Pairs([(0,20),(2,6),(3,5),(8,14),(9,10),(11,16),(12,15),\ (17,23),(18,22)]) def test_init(self): """Pairs should initialize with both lists and tuples""" self.assertEqual(self.Empty,[]) self.assertEqual(self.OneList,[[1,2]]) self.assertEqual(self.OneTuple,[(1,2)]) self.assertEqual(self.MulNoOverlap,[(1,10),(2,9),(3,7),(4,12)]) self.assertEqual(self.MulOverlap,[(1,2),(2,3)]) def test_toPartners(self): """Pairs toPartners() should return a Partners object""" a = Pairs([(1,5),(3,4),(6,9),(7,8)]) #normal b = Pairs([(0,4),(2,6)]) #pseudoknot c = Pairs([(1,6),(3,6),(4,5)]) #conflict self.assertEqual(a.toPartners(10),[None,5,None,4,3,1,9,8,7,6]) self.assertEqual(a.toPartners(13,3),\ [None,None,None,None,8,None,7,6,4,12,11,10,9]) assert isinstance(a.toPartners(10),Partners) self.assertEqual(b.toPartners(7),[4,None,6,None,0,None,2]) self.assertRaises(ValueError,c.toPartners,7) self.assertEqual(c.toPartners(7,strict=False),[None,None,None,6,5,4,3]) #raises an error when trying to insert something at non-existing indices self.assertRaises(IndexError,c.toPartners,0) def test_toVienna(self): """Pairs toVienna() should return a ViennaStructure if possible""" a = Pairs([(1,5),(3,4),(6,9),(7,8)]) #normal b = Pairs([(0,4),(2,6)]) #pseudoknot c = Pairs([(1,6),(3,6),(4,5)]) #conflict d = Pairs([(1,6),(3,None)]) e = Pairs([(1,9),(8,2),(7,3)]) #not directed f = Pairs([(1,6),(2,5),(10,15),(14,11)]) # not directed self.assertEqual(a.toVienna(10),'.(.())(())') self.assertEqual(a.toVienna(13,offset=3),'....(.())(())') self.assertRaises(PairError,b.toVienna,7) #pseudoknot NOT accepted self.assertRaises(Exception,b.toVienna,7) #old test for exception self.assertRaises(ValueError,c.toVienna,7) #pairs containing None are being skipped
self.assertEquals(d.toVienna(7),'.(....)') #raises error when trying to insert at non-existing indices self.assertRaises(IndexError,a.toVienna,3) self.assertEqual(Pairs().toVienna(3),'...') #test when parsing in the sequence self.assertEqual(a.toVienna('ACGUAGCUAG'),'.(.())(())') self.assertEqual(a.toVienna(Rna('AACCGGUUAGCUA'), offset=3),\ '....(.())(())') self.assertEqual(e.toVienna(10),'.(((...)))') self.assertEqual(f.toVienna(20),'.((..))...((..))....') def test_tuples(self): """Pairs tuples() should transform the elements of list to tuples""" x = Pairs([]) x.tuples() assert x == [] x = Pairs([[1,2],[3,4]]) x.tuples() assert x == [(1,2),(3,4)] x = Pairs([(1,2),(3,4)]) x.tuples() assert x == [(1,2),(3,4)] assert x != [[1,2],[3,4]] def test_unique(self): """Pairs unique() should remove double occurences of certain tuples""" self.assertEqual(self.Empty.unique(),[]) self.assertEqual(self.MoreTuples.unique(),self.MoreTuples) self.assertEqual(self.Doubles.unique(),Pairs([(1,2),(2,3),(1,3)])) def test_directed(self): """Pairs directed() should change all pairs so that a}]',-0.01) self.WussTwoHelix = WussStructure('{[.]}(<>).',1.11) self.WussThreeHelix = WussStructure('::(<<({__}),,([(__)])-->>)') self.WussPseudo = WussStructure('<<__AA>>_aa::') def test_wuss_toPairs(self): """WussStructure toPairs() should return a valid Pairs object""" self.assertEqual(self.WussNoPairs.toPairs(),[]) self.assertEqualItems(self.WussOneHelix.toPairs(),\ [(0,12),(2,11),(3,10),(4,7)]) self.assertEqualItems(self.WussTwoHelix.toPairs(),\ [(0,4),(1,3),(5,8),(6,7)]) self.assertEqualItems(self.WussThreeHelix.toPairs(),\ [(2,25),(3,24),(4,23),(5,10),(6,9),(13,20),(14,19),(15,18)]) self.assertEqualItems(self.WussPseudo.toPairs(),\ [(0,7),(1,6)]) def test_wuss_toPartners(self): """WussStructure toPartners() should return valid Partners object""" self.assertEqual(self.WussNoPairs.toPartners(),[None]*6) self.assertEqualItems(self.WussThreeHelix.toPartners(),\ 
[None,None,25,24,23,10,9,None,None,6,5,None,None,20,19,\ 18,None,None,15,14,13,None,None,4,3,2]) self.assertEqualItems(self.WussPseudo.toPartners(),\ [7,6,None,None,None,None,1,0,None,None,None,None,None]) class Rna2dTests(TestCase): def test_Vienna(self): """Vienna should initialize from several formats""" self.NoPairs = Vienna('.......... (0.0)') self.OneHelix = Vienna('((((())))) (-1e-02)') self.TwoHelix = Vienna('((.))(()). \t(1.11)') self.ThreeHelix = Vienna('(((((..))..(((..)))..)))') self.GivenEnergy = Vienna('((.))',0.1) self.TwoEnergies = Vienna('((.)) (4.6)',2.1) self.assertEqual(self.NoPairs, '..........') self.assertEqual(self.NoPairs.Energy, 0.0) self.assertEqual(self.OneHelix, '((((()))))') self.assertEqual(self.OneHelix.Energy, -1e-2) self.assertEqual(self.TwoHelix, '((.))(()).') self.assertEqual(self.TwoHelix.Energy, 1.11) self.assertEqual(self.ThreeHelix, '(((((..))..(((..)))..)))') self.assertEqual(self.ThreeHelix.Energy, None) self.assertEqual(self.GivenEnergy.Energy,0.1) self.assertEqual(self.TwoEnergies.Energy,2.1) def test_EmptyPartners(self): """EmptyPartners should return list of 'None's of given length""" self.assertEqual(EmptyPartners(0),[]) self.assertEqual(EmptyPartners(1),[None]) self.assertEqual(EmptyPartners(10),[None]*10) def test_wuss_to_vienna(self): """wuss_to_vienna() should transform Wuss into Vienna""" empty = WussStructure('.....') normal = WussStructure('[.{[<...}}}}') pseudo = WussStructure('[..AA..]..aa') self.assertEqual(wuss_to_vienna(normal),'(.(((...))))') self.assertEqual(wuss_to_vienna(empty),'.....') self.assertEqual(wuss_to_vienna(pseudo),'(......)....') def test_classify(self): """classify() should classify valid structures correctly""" Empty = '' NoPairs = '.....' OneHelix = '((((()))))' ManyHelices = '(..(((...)).((.(((((..))).)))..((((..))))))...)' Ends = '..(.)..' FirstEnd = '..((()))' LastEnd = '((..((.))))...' Internal = '(((...)))..((.)).'
#following structure is from p 25 of Eddy's WUSS description manual Eddy = '..((((.(((...)))...((.((....))..)).)).))' structs = [Empty, NoPairs, OneHelix, ManyHelices, Ends, \ FirstEnd, LastEnd, Internal, Eddy] EmptyResult = '' NoPairsResult = 'EEEEE' OneHelixResult = 'SSSSSSSSSS' ManyHelicesResult = 'SBBSSSLLLSSJSSBSSSSSLLSSSBSSSJJSSSSLLSSSSSSBBBS' EndsResult = 'EESLSEE' FirstEndResult = 'EESSSSSS' LastEndResult = 'SSBBSSLSSSSEEE' InternalResult = 'SSSLLLSSSFFSSLSSE' #following structure is from p 25 of Eddy's WUSS description manual Eddy = 'EESSSSJSSSLLLSSSJJJSSBSSLLLLSSBBSSJSSBSS' results = [EmptyResult, NoPairsResult, OneHelixResult, ManyHelicesResult, EndsResult, FirstEndResult, LastEndResult, InternalResult, Eddy] for s, r in zip(structs, results): c = classify(s) self.assertEqual(classify(s), r) long_struct = ".((((((((((((((.((((((..((((.....)))))))))).))..))))))))))))....(((.((((.((((((((......((((((.((..(((((((....)))).)))..))))))))...))))))))...........(((((.(..(((((((((......((((((((((((.........))))))))))))....))))).))))..)..)))))..(((((((((((((((((((......(((((((((((((((((((((((((((((((...(((.......((((((((........)))))))).......)))...))))))))))))))))))))))))))))))).((((........(((((((((((((((((((...))))))))))))))))))).......)))).....((((((((((((((((((((((((((((((.(((...))).)))))))))))))))))))))))...........))))))).))))))))))))))))))).....)))).)))......" #compare standalone method with classification from tree c = classify(long_struct) d = ViennaStructure(long_struct).toTree().classify() self.assertEqual(c,d) #Error is raised when trying to classify invalid structures invalid_structure = '(((..)).))))(...)(...' self.assertRaises(IndexError, classify, invalid_structure) class ViennaNodeTests(TestCase): """Tests of the ViennaNode class.""" def setUp(self): """Instantiate some standard ViennaNodes.""" self.EmptyStr = '' self.NoPairsStr = '.....' 
self.OneHelixStr = '((((()))))' self.ManyHelicesStr = '(..(((...)).((.(((((..))).)))..((((..))))))...)' self.EndsStr = '..(.)..' self.FirstEndStr = '..((()))' self.LastEndStr = '((..((.))))...' self.InternalStr = '(((...)))..((.)).' #following structure is from p 25 of Eddy's WUSS description manual self.EddyStr = '..((((.(((...)))...((.((....))..)).)).))' #add in the tree versions by deleting trailing 'Str' for s in self.__dict__.keys(): if s.endswith('Str'): self.__dict__[s[:-3]] = \ ViennaStructure(self.__dict__[s]).toTree() def test_str(self): """ViennaNode str should return Vienna-format string""" for s in [self.EmptyStr, self.NoPairsStr, self.OneHelixStr, self.ManyHelicesStr, self.EndsStr, self.InternalStr]: self.assertEqual(str(ViennaStructure(s).toTree()), s) #test with multiple-base helix in a node r = StructureNode() r.append(StructureNode()) r.append(StructureNode(Data=Stem(1,7,5))) r[1].append(StructureNode()) r.append(StructureNode()) r.append(StructureNode()) r.renumber() self.assertEqual(str(r), '.(((((.)))))..') def test_classify(self): """ViennaNode classify should return correct classification string""" self.assertEqual(self.Empty.classify(), '') self.assertEqual(self.NoPairs.classify(), 'EEEEE') self.assertEqual(self.OneHelix.classify(), 'SSSSSSSSSS') self.assertEqual(self.ManyHelices.classify(), \ 'SBBSSSLLLSSJSSBSSSSSLLSSSBSSSJJSSSSLLSSSSSSBBBS') self.assertEqual(self.Ends.classify(), 'EESLSEE') self.assertEqual(self.FirstEnd.classify(), 'EESSSSSS') self.assertEqual(self.LastEnd.classify(), 'SSBBSSLSSSSEEE') self.assertEqual(self.Internal.classify(), 'SSSLLLSSSFFSSLSSE') self.assertEqual(self.Eddy.classify(), \ 'EESSSSJSSSLLLSSSJJJSSBSSLLLLSSBBSSJSSBSS') def test_renumber(self): """ViennaNode renumber should assign correct numbers to nodes""" #should have no effect on empty structure se = self.Empty self.assertEqual(se.renumber(5), 5) self.assertEqual((se.Start, se.End, se.Length), (None, None, 0)) #with no pairs, should number consecutively sn 
= self.NoPairs self.assertEqual(sn.renumber(5), 10) self.assertEqual([i.Start for i in sn], [5, 6, 7, 8, 9]) self.assertEqual([i.End for i in sn], [None]*5) self.assertEqual([i.Length for i in sn], [0]*5) #spot checks on a complex structure sm = self.ManyHelices self.assertEqual(sm.renumber(5), 52) s0 = sm[0] self.assertEqual((s0.Start, s0.End, s0.Length), (5, 51, 1)) s5 = sm[0][2][2][0] self.assertEqual(len(s5), 2) self.assertEqual((s5.Start, s5.End, s5.Length), (18, 33, 1)) s6 = s5[0] self.assertEqual((s6.Start, s6.End, s6.Length), (19,None,0)) #test with some helices of different lengths root = StructureNode() root.extend([StructureNode() for i in range(3)]) root.insert(1, StructureNode(Data=Stem(3, 7, 5))) root.insert(3, StructureNode(Data=Stem(6,2,2))) root.append(StructureNode()) self.assertEqual(root.renumber(0), 18) self.assertEqual(len(root), 6) curr = root[0] self.assertEqual((curr.Start,curr.End,curr.Length), (0, None, 0)) curr = root[1] self.assertEqual((curr.Start, curr.End, curr.Length), (1, 10, 5)) curr = root[2] self.assertEqual((curr.Start, curr.End, curr.Length), (11, None, 0)) curr = root[3] self.assertEqual((curr.Start, curr.End, curr.Length), (12, 15, 2)) curr = root[4] self.assertEqual((curr.Start, curr.End, curr.Length), (16, None, 0)) curr = root[5] self.assertEqual((curr.Start, curr.End, curr.Length), (17, None, 0)) def test_unpair(self): """StructureNode unpair should break a base pair and add correct nodes""" i = self.Internal self.assertEqual(i[0].unpair(), True) self.assertEqual(str(i), '.((...))...((.)).') e = self.Ends self.assertEqual(e[0].unpair(), False) self.assertEqual(str(e), self.EndsStr) o = self.OneHelix self.assertEqual(o[0].unpair(), True) self.assertEqual(str(o), '.(((()))).') self.assertEqual(o[1][0][0].unpair(), True) self.assertEqual(str(o), '.((.().)).') self.assertEqual(o[1].unpair(), True) self.assertEqual(str(o), '..(.().)..') self.assertEqual(o[2][1].unpair(), True) self.assertEqual(str(o), '..(....)..') 
self.assertEqual(o[2].unpair(), True) self.assertEqual(str(o), '..........') #test with multiple bases in helix r = StructureNode() r.append(StructureNode(Data=Stem(0,0, 5))) r.renumber() self.assertEqual(str(r), '((((()))))') self.assertEqual(r[0].unpair(), True) self.assertEqual(str(r), '.(((()))).') def test_pairBefore(self): """StructureNode pairBefore should make a pair before the current node""" #shouldn't be able to make any pairs if everything is paired already o = self.OneHelix for i in o: self.assertEqual(i.pairBefore(), False) n = self.NoPairs #shouldn't be able to pair at the start... self.assertEqual(n[0].pairBefore(), False) #...or at the end... self.assertEqual(n[-1].pairBefore(), False) #...but should work OK in the middle self.assertEqual(n[1].pairBefore(), True) self.assertEqual(str(n), '(.)..') e = self.Ends self.assertEqual(e[2].pairBefore(), True) self.assertEqual(e[1].pairBefore(), True) self.assertEqual(str(e), '(((.)))') self.assertEqual((e[0].Start, e[0].End, e[0].Length), (0,6,1)) def test_pairAfter(self): """StructureNode pairAfter should create pairs after a node""" n = self.NoPairs self.assertEqual(n.pairAfter(), True) self.assertEqual(str(n), '(...)') self.assertEqual(n[0].pairAfter(), True) self.assertEqual(str(n), '((.))') self.assertEqual(n[0][0].pairAfter(), False) self.assertEqual(str(n), '((.))') curr = n[0][0] #check that child is correct self.assertEqual(len(curr), 1) self.assertEqual((curr[0].Start, curr[0].End, curr[0].Length), \ (2,None,0)) #check that pair is correct self.assertEqual((curr.Start, curr.End, curr.Length), (1,3,1)) m = self.ManyHelices n = m[0][2][0][0] self.assertEqual(n.pairAfter(), True) self.assertEqual(str(m), \ '(..((((.))).((.(((((..))).)))..((((..))))))...)') self.assertEqual(n[0].pairAfter(), False) def test_pairChildren(self): """StructureNode PairChildren should make the correct pairs""" n = ViennaStructure('.....').toTree() #same as self.NoPairs self.assertEqual(n.pairChildren(0, 4), True) 
self.assertEqual(str(n), '(...)') n = ViennaStructure('.....').toTree() #same as self.NoPairs self.assertEqual(n.pairChildren(1, 4), True) self.assertEqual(str(n), '.(..)') n = ViennaStructure('.....').toTree() #same as self.NoPairs #can't pair same object self.assertEqual(n.pairChildren(1, 1), False) self.assertEqual(str(n), '.....') self.assertEqual(n.pairChildren(1, -1), True) self.assertEqual(str(n), '.(..)') #can't pair something already paired self.assertEqual(n.pairChildren(0,1), False) #IndexError if out of range self.assertRaises(IndexError, n.pairChildren, 0, 5) n.append(StructureNode()) n.append(StructureNode()) n.renumber() self.assertEqual(str(n), '.(..)..') self.assertEqual(n.pairChildren(0, -2), True) self.assertEqual(str(n), '((..)).') def test_expand(self): """StructureNode expand should extend helices.""" s = StructureNode(Data=(Stem(1, 10, 3))) s.append(StructureNode()) #need to make a root node for consistency r = StructureNode() r.append(s) self.assertEqual(str(s), '(((.)))') s.expand() self.assertEqual(str(s), '(((.)))') self.assertEqual((s.Start, s.End, s.Length), (1, 10, 1)) n = s[0] self.assertEqual((n.Start, n.End, n.Length), (2, 9, 1)) n = s[0][0] self.assertEqual((n.Start, n.End, n.Length), (3, 8, 1)) n = s[0][0][0] self.assertEqual((n.Start, n.End, n.Length), (None, None, 0)) s.renumber() self.assertEqual((s.Start, s.End, s.Length), (0, 6, 1)) n = s[0] self.assertEqual((n.Start, n.End, n.Length), (1, 5, 1)) n = s[0][0] self.assertEqual((n.Start, n.End, n.Length), (2, 4, 1)) n = s[0][0][0] self.assertEqual((n.Start, n.End, n.Length), (3, None, 0)) #check that it's not recursive s[0][0].append(StructureNode(Data=Stem(20, 24, 2))) s.expand() n = s[0][0][-1] self.assertEqual((n.Start, n.End, n.Length), (20, 24, 2)) n.expand() self.assertEqual((n.Start, n.End, n.Length), (20, 24, 1)) n = n[0] self.assertEqual((n.Start, n.End, n.Length), (21, 23, 1)) def test_expandAll(self): """StructureNode expandAll should act recursively""" r = 
StructureNode() r.append(StructureNode(Data=Stem(0, 6, 4))) r.append(StructureNode(Data=Stem(0, 6, 3))) r.append(StructureNode()) r[0].append(StructureNode()) r[0].append(StructureNode(Data=Stem(0,6,2))) r[0][-1].append(StructureNode()) r.renumber() self.assertEqual(str(r), '((((.((.))))))((())).') r.expandAll() self.assertEqual(str(r), '((((.((.))))))((())).') expected_nodes = [ (None, None, 0), (0, 13, 1), (1, 12, 1), (2, 11, 1), (3, 10, 1), (4, None, 0), (5, 9, 1), (6, 8, 1), (7, None, 0), (14, 19, 1), (15, 18, 1), (16, 17, 1), (20, None, 0), ] for obs, exp in zip(r.traverse(), expected_nodes): self.assertEqual((obs.Start, obs.End, obs.Length), exp) def test_collapse(self): """StructureNode collapse should collapse consecutive pairs from self""" one = ViennaStructure('(.)').toTree() self.assertEqual(one.collapse(), False) self.assertEqual(str(one), '(.)') two = ViennaStructure('((.))').toTree() #can't collapse root node self.assertEqual(two.collapse(), False) #should be able to collapse next node self.assertEqual(two[0].collapse(), True) self.assertEqual((two[0].Start, two[0].End, two[0].Length), (0,4,2)) self.assertEqual(str(two), '((.))') three = ViennaStructure('(((...)))..').toTree() self.assertEqual(three[0].collapse(), True) self.assertEqual((three[0].Start, three[0].End, three[0].Length), \ (0,8,3)) self.assertEqual(str(three), '(((...)))..') self.assertEqual(three[0].collapse(), False) self.assertEqual(three[-1].collapse(), False) oh = self.OneHelix self.assertEqual(oh[0].collapse(), True) self.assertEqual(str(oh), '((((()))))') def test_collapseAll(self): """StructureNode collapseAll should collapse consecutive pairs""" for s in [self.Empty, self.NoPairs, self.OneHelix, self.ManyHelices,\ self.Ends, self.FirstEnd, self.LastEnd, self.Internal, self.Eddy]: before = str(s) s.collapseAll() after = str(s) self.assertEqual(after, before) oh = self.OneHelix[0] self.assertEqual((oh.Start, oh.End, oh.Length), (0,9,5)) m_obs = self.ManyHelices.traverse() m_exp = 
[ (None, None, 0), (0, 46, 1), (1, None, 0), (2, None, 0), (3, 42, 1), (4, 10, 2), (6, None, 0), (7, None, 0), (8, None, 0), (11, None, 0), (12, 41, 1), (13, 28, 1), (14, None, 0), (15, 27, 2), (17, 24, 3), (20, None, 0), (21, None, 0), (25, None, 0), (29, None, 0), (30, None, 0), (31, 40, 4), (35, None, 0), (36, None, 0), (43, None, 0), (44, None, 0), (45, None, 0), (46, None, 0), ] for obs, exp in zip([(i.Start, i.End, i.Length) for i in m_obs], m_exp): self.assertEqual(obs, exp) def test_breakBadPairs(self): """StructureNode breakBadPairs should eliminate mispaired bases.""" oh_str = ViennaStructure(self.OneHelixStr) #no change if all pairs valid oh = oh_str.toTree() oh.breakBadPairs(Rna('CCCCCGGGGG')) self.assertEqual(str(oh), str(oh_str)) #break everything if all pairs invalid oh.breakBadPairs(Rna('CCCCCAAAAA')) self.assertEqual(str(oh), '..........') #break a single pair oh = oh_str.toTree() oh.breakBadPairs(Rna('GCCCCGGGGG')) self.assertEqual(str(oh), '.(((()))).') #break two pairs oh = oh_str.toTree() oh.breakBadPairs(Rna('GCCCCCGGGG')) self.assertEqual(str(oh), '.(((..))).') #break internal pairs oh = oh_str.toTree() oh.breakBadPairs(Rna('GCCGCGGGGG')) self.assertEqual(str(oh), '.((.().)).') #repeat with multiple independent helices th_str = ViennaStructure('((.)).((.))') th = th_str.toTree() th.breakBadPairs(Rna('CCUGGCUUCGG')) self.assertEqual(str(th), th_str) th.breakBadPairs(Rna('CGUAGCAGUUU')) self.assertEqual(str(th), '(...).((.))') th = th_str.toTree() th.breakBadPairs(Rna('UUUUUUUUUUU')) self.assertEqual(str(th), '...........') def test_extendHelix(self): """StructureNode extendHelix should extend the helix as far as possible """ #single paired node is root[4] op_str = ViennaStructure('....(......)...') op = op_str.toTree() #can't extend if base pairs not allowed op[4].extendHelix(Rna('AAAAAAAAAAAAAAA')) self.assertEqual(str(op), op_str) #should extend a pair 5' op[4].extendHelix(Rna('AAACCAAAAAAGGAA')) self.assertEqual(str(op), '...((......))..') 
#should extend multiple pairs 5' op = op_str.toTree() op[4].extendHelix(Rna('CCCCCUUUUUUGGGG')) self.assertEqual(str(op), '.((((......))))') #should extend a pair 3', but must leave > 2-base loop op = op_str.toTree() op[4].extendHelix(Rna('AAAACCCCGGGGAAA')) self.assertEqual(str(op), '....((....))...') op[4][0].insert(1, StructureNode(Data=Stem(Start=1,End=1,Length=1))) op.renumber() self.assertEqual(str(op), '....((.()...))...') op[4][0].extendHelix(Rna( 'AAAACCCUACGGGGAAA')) self.assertEqual(str(op), '....(((()..)))...') #should extend a pair in both directions if possible op = op_str.toTree() op[4].extendHelix(Rna('AAACCCAAAAGGGAA')) self.assertEqual(str(op), '...(((....)))..') def test_extendHelices(self): """StructureNode extendHelices should extend all helices""" e = ViennaStructure('........') t = e.toTree() t.extendHelices(Rna('CCCCCCCCCC')) self.assertEqual(str(t), e) #no pairs if sequence can't form them s = ViennaStructure('(.....(...)..)...((.....))...') r = Rna('AAAAAAAAAAAAAAAAAAAAAAAAAAAAA') t = s.toTree() t.extendHelices(r) self.assertEqual(str(t), s) #should be able to extend a single helix s = ViennaStructure('(.....(...)..)...((.....))...') r = Rna('CAAAAACAAAGAAGCCCCCCCAGGGGGGG') t = s.toTree() t.extendHelices(r) self.assertEqual(str(t), '(.....(...)..)((((((...))))))') #should be able to extend multiple helices s = ViennaStructure('(.....(...)..)...((.....))...') r = Rna('AAAAACCCAGGGUUCCCCCAUAAAGGGAA') t = s.toTree() t.extendHelices(r) self.assertEqual(str(t), '((...((...))))..(((.....)))..') def test_fitSeq(self): """StructureNode fitSeq should adjust structure to match sequence""" #this is just a minimal test, since we know that both breakBadPairs() #and extendHelices() work fine with more extensive tests. 
s = ViennaStructure('..(((.....)))......(((.....)))...') r = Rna( 'UCCCCACUGAGGGGUUUGGGGGGUUUUCGCCCU') t = s.toTree() t.fitSeq(r) self.assertEqual(str(t), '.((((.....))))...(((.((...)).))).') #run the test suites if invoked as a script from the command line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_struct/test_selection.py000644 000765 000024 00000004202 12024702176 023362 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os try: from cogent.util.unit_test import TestCase, main from cogent.parse.pdb import PDBParser from cogent.struct.selection import einput, select except ImportError: from zenpdb.cogent.util.unit_test import TestCase, main from zenpdb.cogent.parse.pdb import PDBParser from zenpdb.cogent.struct.selection import einput, select __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" class AnnotationTest(TestCase): """tests selecting entities""" def setUp(self): self.input_file = os.path.join('data', '2E12.pdb') self.input_structure = PDBParser(open(self.input_file)) def test_einput(self): """tests einput.""" structures = einput(self.input_structure, 'S') models = einput(self.input_structure, 'M') chains = einput(self.input_structure, 'C') residues = einput(self.input_structure, 'R') atoms = einput(self.input_structure, 'A') self.assertEquals(structures.level, 'H') self.assertEquals(models.level, 'S') self.assertEquals(chains.level, 'M') self.assertEquals(residues.level, 'C') self.assertEquals(atoms.level, 'R') atoms2 = einput(models, 'A') self.assertEquals(atoms, atoms2) atoms3 = einput(chains, 'A') self.assertEquals(atoms, atoms3) structures2 = einput(atoms, 'S') self.assertEquals(self.input_structure, structures2.values()[0]) residues2 = einput(atoms, 'R') self.assertEquals(residues, residues2) def 
test_select(self): """tests select.""" water = select(self.input_structure, 'R', 'H_HOH', 'eq', 'name') for residue in water: self.assertTrue(residue.name == 'H_HOH') non_water = select(self.input_structure, 'R', 'H_HOH', 'ne', 'name') for residue in non_water: self.assertTrue(residue.name != 'H_HOH') if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_seqsim/__init__.py000644 000765 000024 00000000463 12024702176 022057 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_seqsim'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" PyCogent-1.5.3/tests/test_seqsim/test_analysis.py000644 000765 000024 00000030413 12024702176 023200 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for analysis.py: substitution matrix analysis code.""" from cogent.seqsim.analysis import tree_threeway_counts, \ tree_twoway_counts, counts_to_probs, probs_to_rates, \ tree_threeway_rates, tree_twoway_rates, \ rates_to_array, multivariate_normal_prob from cogent.seqsim.tree import RangeNode from cogent.core.usage import DnaPairs, ABPairs from cogent.seqsim.usage import Rates, Counts, Probs from numpy import array, average, ones, zeros, float64, ravel, diag, any from numpy.random import random, randint from copy import deepcopy from cogent.parse.tree import DndParser from cogent.util.unit_test import TestCase, main __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class analysisTests(TestCase): """Tests of top-level functions.""" def setUp(self): """Make a couple of standard trees""" self.t1 = DndParser('((a,(b,c)),(d,e))',
RangeNode) #self.t1 indices: ((0,(1,2)5)6,(3,4)7)8 def test_threeway_counts(self): """threeway_counts should produce correct count matrix""" self.t1.makeIdIndex() ind = self.t1.IdIndex ind[0].Sequence = array([0,0,0]) ind[1].Sequence = array([0,1,0]) ind[2].Sequence = array([1,0,1]) ind[3].Sequence = array([1,1,0]) ind[4].Sequence = array([1,1,1]) depths = self.t1.leafLcaDepths() result = tree_threeway_counts(self.t1, depths, ABPairs) #check we got the right number of comparisons self.assertEqual(len(result), 20) #check we got the right keys for k in [(1,2,0),(2,1,0),(0,1,3),(1,0,3),(0,1,4),(1,0,4),(0,2,3),\ (2,0,3),(0,2,4),(2,0,4),(1,2,3),(2,1,3),(1,2,4),(2,1,4),(3,4,1),\ (4,3,1),(3,4,2),(4,3,2)]: assert k in result #spot-check a few results self.assertEqual(result[(1,2,0)]._data, array([[2,1],[0,0]])) self.assertEqual(result[(2,1,0)]._data, array([[1,2],[0,0]])) self.assertEqual(result[(2,1,3)]._data, array([[0,1],[1,1]])) def test_twoway_counts(self): """twoway_counts should produce correct count matrix""" self.t1.makeIdIndex() ind = self.t1.IdIndex ind[0].Sequence = array([0,0,0]) ind[1].Sequence = array([0,1,0]) ind[2].Sequence = array([1,0,1]) ind[3].Sequence = array([1,1,0]) ind[4].Sequence = array([1,1,1]) depths = self.t1.leafLcaDepths() #check that it works with averaging result = tree_twoway_counts(self.t1, ABPairs) #check we got the right number of comparisons: average by default self.assertEqual(len(result), 10) #check we got the right keys for k in [(0,1),(0,2),(0,3),(0,4),(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]: assert k in result #spot-check a few results self.assertEqual(result[(0,1)]._data, array([[2,.5],[.5,0]])) self.assertEqual(result[(2,3)]._data, array([[0,1],[1,1]])) #check that it works when we don't average result = tree_twoway_counts(self.t1, ABPairs, average=False) self.assertEqual(len(result), 20) #check we got the right keys for k in [(0,1),(0,2),(0,3),(0,4),(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]: assert k in result #reverse should be in
result too assert (k[1],k[0]) in result #spot-check values self.assertEqual(result[(0,1)]._data, array([[2,1],[0,0]])) self.assertEqual(result[(1,0)]._data, array([[2,0],[1,0]])) def test_counts_to_probs(self): """counts_to_probs should skip cases with zero rows""" counts = { (0,1): Counts(array([[0,1],[1,0]]), ABPairs), (1,2): Counts(array([[0,0],[1,0]]), ABPairs), #bad row (0,3): Counts(array([[0,0],[0,0]]), ABPairs), #bad row (0,4): Counts(array([[0.0,0.0],[0.0,0.0]]), ABPairs), #bad row (0,5): Counts(array([[0.1,0.3],[0.0,0.0]]), ABPairs), #bad row (3,4): Counts(array([[0.1,0.3],[0.4,0.1]]), ABPairs), (2,1): Counts(array([[0,5],[1,0]]), ABPairs), } result = counts_to_probs(counts) self.assertEqual(len(result), 3) self.assertFloatEqual(result[(0,1)]._data, array([[0,1],[1,0]])) self.assertFloatEqual(result[(3,4)]._data, \ array([[0.25,0.75],[0.8,0.2]])) self.assertFloatEqual(result[(2,1)]._data, array([[0,1],[1,0]])) def test_probs_to_rates(self): """probs_to_rates converts probs to rates, omitting problem cases""" probs = dict([(i, Probs.random(DnaPairs)) for i in range(100)]) rates = probs_to_rates(probs) #check we got at most the same number of items as in probs assert len(rates) <= len(probs) #check that we didn't get anything bad vals = rates.values() for v in vals: assert not v.isSignificantlyComplex() #check that we didn't miss anything good for key, val in probs.items(): if key not in rates: try: r = val.toRates() print r.isValid() assert r.isSignificantlyComplex() or (not r.isValid()) except (ZeroDivisionError, OverflowError, ValueError): pass def test_rates_to_array(self): """rates_to_array should pack rates into array correctly""" m1 = array([[-1,1,1,1],[2,-2,2,2],[3,3,-3,3],[1,2,3,-4]]) m2 = m1 * 2 m3 = m1 * 0.5 m4 = zeros((4,4)) m5 = array([0,0]) r1, r2, r3, r4, r5 = [Rates(i, DnaPairs) for i in m1,m2,m3,m4,m5] data = {(0,1,0):r1, (1,2,0):r2, (2,0,0):r3, (2,1,1):r4} #note that array can be, but need not be, floating point to_fill = zeros((3,3,3,16), 
'float64') result = rates_to_array(data, to_fill) #check that the things we deliberately set are OK self.assertEqual(to_fill[0][1][0], ravel(m1)) self.assertNotEqual(to_fill[0][1][0], ravel(m2)) self.assertEqual(to_fill[1,2,0], ravel(m2)) self.assertEqual(to_fill[2][0][0], ravel(m3)) self.assertEqual(to_fill[2][1][1], ravel(m4)) #check that everything else is zero nonzero = [(0,1,0),(1,2,0),(2,0,0)] for x in [(i, j, k) for i in range(3) for j in range(3) \ for k in range(3)]: if x not in nonzero: self.assertEqual(to_fill[x], zeros(16)) #check that it works omitting the diagonal to_fill = zeros((3,3,3,12), 'float64') result = rates_to_array(data, to_fill, without_diagonal=True) #check that the things we deliberately set are OK m1_nodiag = array([[1,1,1],[2,2,2],[3,3,3],[1,2,3]]) self.assertEqual(to_fill[0][1][0], ravel(m1_nodiag)) self.assertNotEqual(to_fill[0][1][0], ravel(m1_nodiag*2)) self.assertEqual(to_fill[1,2,0], ravel(m1_nodiag*2)) self.assertEqual(to_fill[2][0][0], ravel(m1_nodiag*0.5)) self.assertEqual(to_fill[2][1][1], zeros(12)) #check that everything else is zero nonzero = [(0,1,0),(1,2,0),(2,0,0)] for x in [(i, j, k) for i in range(3) for j in range(3) \ for k in range(3)]: if x not in nonzero: self.assertEqual(to_fill[x], zeros(12)) def test_tree_threeway_rates(self): """tree_threeway_rates should give plausible results on rand trees""" #note: the following fails occasionally, but repeating it 5 times #and checking that one passes is fairly safe for i in range(5): try: t = self.t1 t.assignLength(0.05) t.Q = Rates.random(DnaPairs).normalize() t.assignQ() t.assignP() t.evolve(randint(0,4,100)) t.makeIdIndex() depths = t.leafLcaDepths() result = tree_threeway_rates(t, depths) self.assertEqual(result.shape, (5,5,5,16)) #check that row sums are 0 for x in [(i,j,k) for i in range(5) for j in range(5) \ for k in range(5)]: self.assertFloatEqual(sum(result[x]), 0) assert any(result) #check that it works without_diag result = tree_threeway_rates(t, depths,
without_diag=True) self.assertEqual(result.shape, (5,5,5,12)) #check that it works with/without normalize #default: no normalization, so row sums shouldn't be 1 after #omitting diagonal result = tree_threeway_rates(t, depths, without_diag=True) self.assertEqual(result.shape, (5,5,5,12)) for x in [(i,j,k) for i in range(5) for j in range(5) \ for k in range(5)]: assert sum(result[x]) == 0 or abs(sum(result[x]) - 1) > 0.01 #...but if we tell it to normalize, row sums should be nearly 1 #after omitting diagonal result = tree_threeway_rates(t, depths, without_diag=True, \ normalize=True) self.assertEqual(result.shape, (5,5,5,12)) for x in [(i,j,k) for i in range(5) for j in range(5) \ for k in range(5)]: s = sum(result[x]) if s != 0: self.assertFloatEqual(s, 1) break except AssertionError: pass def test_tree_twoway_rates(self): """tree_twoway_rates should give plausible results on rand trees""" t = self.t1 t.assignLength(0.05) t.Q = Rates.random(DnaPairs).normalize() t.assignQ() t.assignP() t.evolve(randint(0,4,100)) t.makeIdIndex() result = tree_twoway_rates(t) self.assertEqual(result.shape, (5,5,16)) #check that row sums are 0 for x in [(i,j) for i in range(5) for j in range(5)]: self.assertFloatEqual(sum(result[x]), 0) #need to make sure we didn't just get an empty array self.assertGreaterThan((abs(result)).sum(), 0) #check that it works without_diag result = tree_twoway_rates(t, without_diag=True) self.assertEqual(result.shape, (5,5,12)) #check that it works with/without normalize #default: no normalization, so row sums shouldn't be 1 after omitting #diagonal result = tree_twoway_rates(t, without_diag=True) self.assertEqual(result.shape, (5,5,12)) #check that the row sums are not 1 before normalization (note that they #can be zero, though) sums_before = [] for x in [(i,j) for i in range(5) for j in range(5)]: curr_sum = sum(result[x]) sums_before.append(curr_sum) #...but if we tell it to normalize, row sums should be nearly 1 #after omitting diagonal result = 
tree_twoway_rates(t, without_diag=True, \ normalize=True) self.assertEqual(result.shape, (5,5,12)) sums_after = [] for x in [(i,j) for i in range(5) for j in range(5)]: curr_sum = sum(result[x]) sums_after.append(curr_sum) if curr_sum != 0: self.assertFloatEqual(curr_sum, 1) try: self.assertFloatEqual(sums_before, sums_after) except AssertionError: pass else: raise AssertionError, "Expected different arrays before/after norm" def test_multivariate_normal_prob(self): """Multivariate normal prob should match R results""" cov = array([[3,1,2],[1,5,4],[2,4,6]]) a = array([0,0,0]) b = array([1,1,1]) c = array([0.1, 0.2, 0.3]) small_cov = cov/10.0 mvp = multivariate_normal_prob self.assertFloatEqual(mvp(a, cov), 0.01122420) self.assertFloatEqual(mvp(a, cov, b), 0.009018894) self.assertFloatEqual(mvp(a, small_cov, b), 0.03982319) self.assertFloatEqual(mvp(c, small_cov, b), 0.06091317) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_seqsim/test_birth_death.py000644 000765 000024 00000026676 12024702176 023652 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file test_birth_death.py """Unit tests of birth_death.py: implementation of the birth-death model. 
""" from cogent.seqsim.birth_death import ExtinctionError, TooManyTaxaError, \ BirthDeathModel, DoubleBirthDeathModel from cogent.util.unit_test import TestCase, main, FakeRandom __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Mike Robeson"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class BirthDeathModelTests(TestCase): """Tests of the BirthDeathModel class, which makes birth-death trees.""" def test_init_deafults(self): """BirthDeathModel should init correctly w/ default params""" m = BirthDeathModel(0.1, 0.2, 0.3) self.assertEqual(m.BirthProb, 0.1) self.assertEqual(m.DeathProb, 0.2) self.assertEqual(m.TimePerStep, 0.3) self.assertEqual(m.MaxStep, 1000) self.assertEqual(m.MaxTaxa, None) self.assertEqual(m.CurrStep, 0) self.assertEqual(m.Tree.__class__, m.NodeClass) self.assertEqual(m.CurrTaxa, [m.Tree]) self.assertEqual(m.ChangedBirthProb,None) self.assertEqual(m.ChangedDeathProb,None) self.assertEqual(m.ChangedBirthStep,None) self.assertEqual(m.ChangedDeathStep,None) self.assertEqual(m.CurrBirthProb, 0.1) self.assertEqual(m.CurrDeathProb, 0.2) def test_init_bad(self): """BirthDeathModel should raise exceptions on init with bad data""" #BirthProb and DeathProb must be probabilities between 0 and 1 self.assertRaises(ValueError, BirthDeathModel, -1, 0.2, 0.3) self.assertRaises(ValueError, BirthDeathModel, 2, 0.2, 0.3) self.assertRaises(ValueError, BirthDeathModel, 0.1, -1, 0.3) self.assertRaises(ValueError, BirthDeathModel, 0.1, 2, 0.3) #TimePerStep can't be negative or 0 self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, -1) self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0) def test_init_extras(self): """BirthDeathModel should init OK with extra params""" m = BirthDeathModel(BirthProb=0.1, DeathProb=0.2, TimePerStep=0.3, \ ChangedBirthProb=0.4, ChangedDeathProb=0.3, ChangedBirthStep=3,\ 
ChangedDeathStep=4, MaxStep=5, MaxTaxa=10) self.assertEqual(m.BirthProb, 0.1) self.assertEqual(m.DeathProb, 0.2) self.assertEqual(m.TimePerStep, 0.3) self.assertEqual(m.ChangedBirthProb, 0.4) self.assertEqual(m.ChangedDeathProb, 0.3) self.assertEqual(m.ChangedBirthStep, 3) self.assertEqual(m.ChangedDeathStep, 4) self.assertEqual(m.MaxStep, 5) self.assertEqual(m.MaxTaxa, 10) self.assertEqual(m.CurrStep, 0) self.assertEqual(m.Tree.__class__, m.NodeClass) self.assertEqual(m.CurrTaxa, [m.Tree]) def test_step(self): """BirthDeathModel step should match hand-calculated results""" m = BirthDeathModel(BirthProb=0.1, DeathProb=0.2, TimePerStep=1) born_and_died = FakeRandom([0],True) born_only = FakeRandom([1,0],True) died_only = FakeRandom([0,1],True) neither = FakeRandom([1],True) kill_alternate = FakeRandom([0,1,1,1], True) born_alternate = FakeRandom([1,1,1,0], True) #check that with neither birth nor death, we just continue m.step(neither) self.assertEqual(len(m.Tree.Children), 0) #check that with born_only we get a duplication m.step(born_only) self.assertEqual(len(m.Tree.Children), 2) assert m.Tree not in m.CurrTaxa for i in m.CurrTaxa: assert i.Parent is m.Tree self.assertEqual(i.Length, 1) #check that with a second round of born_only we duplicate again m.step(born_only) self.assertEqual(len(m.Tree.Children), 2) self.assertEqual(len(list(m.Tree.traverse())), 4) for i in m.Tree.traverse(): self.assertEqual(i.Length, 1) for i in m.Tree.Children: self.assertEqual(i.Length, 1) #check that branch lengths add correctly for i in range(4): m.step(neither) self.assertEqual(len(m.CurrTaxa), 4) self.assertEqual(len(m.Tree.Children), 2) self.assertEqual(len(list(m.Tree.traverse())), 4) for i in m.Tree.traverse(): self.assertEqual(i.Length, 5) for i in m.Tree.Children: self.assertEqual(i.Length, 1) #check that we can kill offspring correctly m.step(kill_alternate) self.assertEqual(len(m.CurrTaxa), 2) #make sure we killed the right children m.Tree.assignIds() for i in 
m.Tree.Children: #note that killing a child doesn't remove it, just stops it changing self.assertEqual(len(i.Children), 2) self.assertEqual(i.Children[0].Length, 5) self.assertEqual(i.Children[1].Length, 6) self.assertEqual([i.Length for i in m.Tree.traverse()], \ [5,6,5,6]) #make sure that born_and_died does the same thing as neither m.step(born_and_died) self.assertEqual([i.Length for i in m.Tree.traverse()], \ [5,7,5,7]) m.step(neither) self.assertEqual([i.Length for i in m.Tree.traverse()], \ [5,8,5,8]) #check that only CurrTaxa are brought forward self.assertEqual([i.Length for i in m.CurrTaxa], [8,8]) #check that we can duplicate a particular taxon m.step(born_alternate) self.assertEqual([i.Length for i in m.CurrTaxa], [9,1,1]) self.assertEqual(m.CurrTaxa[1].Parent.Length, 8) #check that we can kill 'em all m.step(died_only) self.assertEqual(len(m.CurrTaxa), 0) def test_prob_step_check(self): """prob_check and step_check should return error when out of bounds. Prob values should be between zero and one Step values should be greater than zero """ #ChangedBirthProb = -0.1 , raises ValueError self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0.3,\ ChangedBirthProb=-0.1,ChangedBirthStep=3,ChangedDeathProb=0.3,\ ChangedDeathStep=4, MaxStep=5) #ChangedBirthStep = 0 , raises ValueError self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0.3,\ ChangedBirthProb=0.6,ChangedBirthStep=0,ChangedDeathProb=0.3,\ ChangedDeathStep=4, MaxStep=5) #ChangedDeathProb = 2 , raises ValueError self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0.3,\ ChangedBirthProb=0.6,ChangedBirthStep=3,ChangedDeathProb=2,\ ChangedDeathStep=4, MaxStep=5) #ChangedDeathStep = -1 , raises ValueError self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0.3,\ ChangedBirthProb=0.6,ChangedBirthStep=3,ChangedDeathProb=0.3,\ ChangedDeathStep=-1, MaxStep=5) def test_timeOk(self): """BirthDeathModel TimeOk should return True if time not exceeded""" b = BirthDeathModel(0.1, 0.2, 0.3, 
MaxStep=5) assert b.timeOk() b.CurrStep = 4 assert b.timeOk() b.CurrStep = 5 assert not b.timeOk() b.CurrStep = 1000 assert not b.timeOk() b.MaxStep = None assert b.timeOk() b.MaxStep = 1001 assert b.timeOk() b.step() assert not b.timeOk() def test_taxaOk(self): """BirthDeathModel TaxaOk should return True if taxa not exceeded""" b = BirthDeathModel(0.1, 0.2, 0.3, MaxTaxa=5) born_alternate = FakeRandom([1,1,1,0], True) born_only = FakeRandom([1,0],True) kill_only = FakeRandom([0,1,0,1], True) #start off with single taxon assert b.taxaOk() #taxa are OK if there are a few b.step(born_only) #now 2 taxa assert b.taxaOk() b.step(born_only) #now 4 taxa assert b.taxaOk() b.step(born_only) #now 8 taxa assert not b.taxaOk() b.MaxTaxa = 8 assert not b.taxaOk() b.MaxTaxa = 9 assert b.taxaOk() b.MaxTaxa = 17 assert b.taxaOk() b.step(born_only) assert b.taxaOk() b.step(born_only) assert not b.taxaOk() #ok if no maximum b.MaxTaxa = None assert b.taxaOk() #not ok if there are no taxa left b.step(kill_only) assert not b.taxaOk() #still not OK if not MaxTaxa b.MaxTaxa = None assert not b.taxaOk() def test_call_exact(self): """BirthDeathModel call should produce right # taxa when exact""" m = BirthDeathModel(0.01, 0.005, 0.1, MaxTaxa=10) for i in range(10): try: result = m(filter=True, exact=True) self.assertEqual(len(list(result.traverse())), 10) except (TooManyTaxaError, ExtinctionError), e: pass def test_call(self): """BirthDeathModel call should produce hand-calculated trees""" m = BirthDeathModel(0.01, 0.005, 0.1, MaxTaxa=10) r = FakeRandom(\ [1,0,\ 1,1, 1,1,\ 1,0, 0,0,\ 0,0, 0,0, 1,0,\ 0,0, 0,0, 0,1, 0,0, \ 1,0, 0,0, 0,0,\ 1,0, 0,0, 0,0, 1,0, \ 1,0, 1,0, 0,1, 1,1, 1,0, 1,0, \ 1,1, 1,1, 1,1, 1,1, 1,0, 1,1, 1,1, 1,1, 1,1], True) m = BirthDeathModel(0.1, 0.5, 1, MaxTaxa=10) result = m(filter=False, random_f=r) self.assertEqual([i.Length for i in result.traverse()], \ [2,2,2,2,2,1,1,1,2,2,2,2]) #try it with pruning m = BirthDeathModel(0.1, 0.5, 1, MaxTaxa=10) result = 
m(filter=True, random_f=r) self.assertEqual([i.Length for i in result.traverse()], \ [2,2,2,2,1,1,2,2,2,2]) #try it with fewer taxa m = BirthDeathModel(0.1, 0.5, 1, MaxTaxa=4) result = m(filter=True, random_f=r) self.assertEqual([i.Length for i in result.traverse()], \ [2,2,1,1]) def test_changed_values_step(self): """Tests if values changed at specified steps in step(). Note, in m.step() CurrStep is logically tested one step later. """ m = BirthDeathModel( 0.1, 0.2, 0.3,ChangedBirthProb=0.6,\ ChangedBirthStep=3,ChangedDeathProb=0.3,ChangedDeathStep=4,\ MaxStep=5) # all values should be as initialized m.step() assert m.CurrStep == 1 assert m.BirthProb == 0.1 assert m.DeathProb == 0.2 assert m.CurrBirthProb == 0.1 assert m.CurrDeathProb == 0.2 # continue 2 steps m.step() m.step() # when logically evaluated CurrBirthProb should change # from 0.1 to 0.6 m.step() assert m.CurrStep == 4 assert m.BirthProb == 0.1 assert m.DeathProb == 0.2 assert m.CurrBirthProb == 0.6 assert m.CurrDeathProb == 0.2 # All values other than CurrStep should be as above # except that CurrDeathProb should change from 0.2 to 0.3 m.step() assert m.CurrStep == 5 assert m.BirthProb == 0.1 assert m.DeathProb == 0.2 assert m.CurrBirthProb == 0.6 assert m.CurrDeathProb == 0.3 class DoubleBirthDeathTests(TestCase): """Tests of the double birth-death model.""" def test_double_birth_death(self): """double_birth_death should run without errors""" pass if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_seqsim/test_markov.py000644 000765 000024 00000012334 12024702176 022656 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """test_markov.py: tests of the MarkovGenerator class.
""" from cogent.seqsim.markov import MarkovGenerator from StringIO import StringIO from operator import mul from sys import path from cogent.util.unit_test import TestCase, main from numpy import array __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Jesse Zaneveld", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" class MarkovGeneratorTests(TestCase): """Tests of the MarkovGenerator class.""" def setUp(self): """Define a few well-known frequencies.""" self.single = MarkovGenerator(['UUUUUUUUUU'], order=0) self.equal = MarkovGenerator(['UUUUUCCCCC'], order=0) self.unequal_0 = MarkovGenerator(['UCCCCCCCCC'], order=0) self.unequal_1 = MarkovGenerator(['UCCCCCCCCC'], order=-1) self.pairs = MarkovGenerator(['UCAUCAUCAUCAUCA'], order=1) self.randquads=MarkovGenerator(['AACCUAUCUACUACUAUCUUCAUAUUCC']\ ,order=3, calc_entropy=True, delete_bad_suffixes=False) self.empty = MarkovGenerator('', order=0) self.linebreaks= MarkovGenerator(StringIO('abb\nbcc\nd\n')) self.dinucs=MarkovGenerator(['ATACATAC'],order=1) self.orderfive=MarkovGenerator(['AAAAAGAAAAATAAAAAGAAAAAT'],order=5) def test_init(self): """MarkovGenerator init should give right frequency distributions.""" self.assertEqual(self.empty.Frequencies, {}) self.assertEqual(self.single.Frequencies, {'':{'U':1.0}}) self.assertEqual(self.equal.Frequencies, {'':{'U':0.5,'C':0.5}}) self.assertEqual(self.unequal_0.Frequencies, {'':{'U':0.1,'C':0.9}}) self.assertEqual(self.unequal_1.Frequencies, {'':{'U':0.5, 'C':0.5}}) self.assertEqual(self.pairs.Frequencies, \ {'U':{'C':1},'C':{'A':1},'A':{'U':1}}) #check that recalculating the frequencies doesn't break anything self.pairs.calcFrequencies() self.assertEqual(self.pairs.Frequencies, \ {'U':{'C':1},'C':{'A':1},'A':{'U':1}}) exp={'AAC':{'C':1},'ACC':{'U':1},'CCU':{'A':1},'CUA':{'U':0.5,'C':0.5},\ 
'UAU':{'U':1/3.0,'C':2/3.0},'AUC':{'U':1},'UCU':{'U':0.5,'A':0.5},\ 'UAC':{'U':1},'ACU':{'A':1},'CUU':{'C':1},'UUC':{'C':0.5,'A':0.5},\ 'UCA':{'U':1},'CAU':{'A':1},'AUA':{'U':1},'AUU':{'C':1}, } obs = self.randquads.Frequencies self.assertFloatEqual(obs, exp) #check that resetting linebreaks has the desired effect self.assertEqual(self.linebreaks.Frequencies, \ {'a':{'b':1},'b':{'b':0.5,'c':0.5},'c':{'c':1}}) self.linebreaks.Linebreaks = True self.linebreaks.Text.seek(0) self.linebreaks.calcFrequencies() #NOTE: current algorithm won't extend over line breaks. If you want #to force use of line breaks, read into a single string. self.assertEqual(self.linebreaks.Frequencies, \ {'a':{'b':1},'b':{'b':0.5,'c':0.5},'c':{'c':1}}) def test_next(self): """MarkovGenerator.next should generate text with expected properties""" #haven't figured how to do this for longer correlation lengths yet pass def test_entropy(self): """MarkovGenerator._entropy() should correctly calculate average H""" self.assertFloatEqual(self.randquads.Entropy, \ 3.0/25 * 0.91829583405448956 + 8.0/25) def test_evaluateProbability(self): """Should calculate proper P value for seq""" self.dinucs.Prior=1 q=self.dinucs.evaluateProbability('AT') self.assertFloatEqual(q,.50) z=self.dinucs.evaluateProbability('ATAT') self.assertFloatEqual(z,.25) p=self.dinucs.evaluateProbability('ATATAT') self.assertFloatEqual(p,.125) j=self.dinucs.evaluateProbability('ATACAT') self.assertFloatEqual(j,.125) h=self.orderfive.evaluateProbability('AAAAAT') self.assertFloatEqual(h,.50) def test_replaceDegenerateBases(self): """strips degenerate bases....""" text = 'AATCGCRRCCYAATC' m=MarkovGenerator([text],order=2) self.assertEqual(m.Text, [text]) m.replaceDegenerateBases() self.assertEqual(m.Text[0][0:6],'aatcgc') p=m.Text[0][6] q= p in ['a','t','c','g'] self.assertEqual(q,True) def test_wordToUniqueKey(self): """wordToUniqueKey should generate proper integers""" m=MarkovGenerator(['aataacaataac'],order=2) word='gca' 
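As an aside, the positional encoding that `wordToUniqueKey` is expected to produce here can be sketched independently. This is an illustrative re-implementation, not PyCogent's actual code; the alphabet ordering (a=0, c=1, t=2, g=3) is taken from the comment in this test.

```python
def word_to_unique_key(word, alphabet="actg"):
    # Base-len(alphabet) positional encoding: position i contributes
    # index(char) * len(alphabet)**i, matching the hand calculation
    # (4**0)*3 + (4**1)*1 + (4**2)*0 = 7 for the word 'gca'.
    index = dict((c, i) for i, c in enumerate(alphabet))
    return sum(index[c] * len(alphabet) ** i for i, c in enumerate(word))

print(word_to_unique_key("gca"))  # 7
```

Because each position gets its own power of the alphabet size, distinct words of the same length always map to distinct integers.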
uniqueKey=m.wordToUniqueKey(word) #a=0 c=1 t=2 g=3 #should be (4^0*3)+(4^1*1)+(4^2*0)=3+4+0=7 self.assertEqual(uniqueKey,7) def test_evaluateArrayProbability(self): """evaluateArrayProbability should calc prob from array indices""" m=MarkovGenerator(['aaaaaaaatt'],order=0) #8 a's, 2 t's m.calcFrequencies() prob=m.evaluateArrayProbability(array([0,2])) self.assertFloatEqual(prob,0.16) #0.8*0.2 prob=m.evaluateArrayProbability(array([0,1])) self.assertFloatEqual(prob,0) #0.8*0 if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_seqsim/test_microarray.py000644 000765 000024 00000013105 12024702176 023524 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the microarray module, dealing with fake expression data.""" from cogent.util.unit_test import TestCase, main from cogent.seqsim.microarray import MicroarrayNode from numpy import ones __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" class MicroarrayNodeTests(TestCase): """Tests of the MicroarrayNode class""" def test_init_empty(self): """MicroarrayNode empty init should return new object as expected""" m = MicroarrayNode() self.assertEqual(m.Length, 0) self.assertEqual(m.Array, None) self.assertEqual(m.Name, None) self.assertEqual(m.Children, []) self.assertEqual(m.Parent, None) def test_init(self): """MicroarrayNode init should return new object w/ correct attributes""" m = MicroarrayNode(Name='x') self.assertEqual(m.Length, 0) self.assertEqual(m.Array, None) self.assertEqual(m.Name, 'x') self.assertEqual(m.Children, []) self.assertEqual(m.Parent, None) n = MicroarrayNode(3, 'xyz', Parent=m) self.assertEqual(n.Length, 3) self.assertEqual(n.Array, 'xyz') self.assertEqual(n.Name, None) self.assertEqual(n.Children, []) assert n.Parent is m def test_mutate(self): 
"""Microarray mutate should set arrays appropriately""" #check that it works as the root a = ones(25, 'float64') m = MicroarrayNode() m.setExpression(a) assert m.Array is not a self.assertEqual(m.Array, a) #check that it works on a single node w/ branchlength set m.Length = 1 m.setExpression(a) self.assertNotEqual(m.Array, a) assert min(m.Array) > -4 assert max(m.Array) < 6 #check that it works for the children m.Length = None m2, m3, m4 = MicroarrayNode(), MicroarrayNode(), MicroarrayNode() m5, m6, m7 = MicroarrayNode(), MicroarrayNode(), MicroarrayNode() m8, m9, m10 = MicroarrayNode(), MicroarrayNode(), MicroarrayNode() m.Children = [m2,m3, m4] m2.Children = [m5] m3.Children = [m6,m7,m8] m8.Children = [m9,m10] m2.Length = 2 # should be ~ 2 sd from 1 m3.Length = 0 # should test equal to m.Array m4.Length = 0.1 # should be ~ 0.1 sd from 1 m5.Length = 1 # should be ~ 3 sd from 1 m6.Length = 0.1 # should be in same bounds as m4 m7.Length = 2 # should be in same bounds as m2 m8.Length = 1 # should be ~ 1 sd from 1 m9.Length = 1 # should be in same bounds as m2 m10.Length = 0 # should test equal to m8 m.setExpression(a) self.assertNotEqual(m.Array, m2.Array) self.assertEqual(m.Array, m3.Array) self.assertNotEqual(m.Array, m4.Array) self.assertNotEqual(m.Array, m5.Array) self.assertNotEqual(m.Array, m6.Array) self.assertNotEqual(m.Array, m7.Array) self.assertNotEqual(m.Array, m8.Array) self.assertNotEqual(m.Array, m9.Array) self.assertNotEqual(m.Array, m10.Array) self.assertNotEqual(m2.Array, m3.Array) self.assertNotEqual(m2.Array, m4.Array) self.assertNotEqual(m2.Array, m5.Array) self.assertNotEqual(m2.Array, m6.Array) self.assertNotEqual(m2.Array, m7.Array) self.assertNotEqual(m2.Array, m8.Array) self.assertNotEqual(m2.Array, m9.Array) self.assertNotEqual(m2.Array, m10.Array) self.assertNotEqual(m3.Array, m4.Array) self.assertNotEqual(m3.Array, m5.Array) self.assertNotEqual(m3.Array, m6.Array) self.assertNotEqual(m3.Array, m7.Array) self.assertNotEqual(m3.Array, 
m8.Array) self.assertNotEqual(m3.Array, m9.Array) self.assertNotEqual(m3.Array, m10.Array) self.assertNotEqual(m4.Array, m5.Array) self.assertNotEqual(m4.Array, m6.Array) self.assertNotEqual(m4.Array, m7.Array) self.assertNotEqual(m4.Array, m8.Array) self.assertNotEqual(m4.Array, m9.Array) self.assertNotEqual(m4.Array, m10.Array) self.assertNotEqual(m5.Array, m6.Array) self.assertNotEqual(m5.Array, m7.Array) self.assertNotEqual(m5.Array, m8.Array) self.assertNotEqual(m5.Array, m9.Array) self.assertNotEqual(m5.Array, m10.Array) self.assertNotEqual(m6.Array, m7.Array) self.assertNotEqual(m6.Array, m8.Array) self.assertNotEqual(m6.Array, m9.Array) self.assertNotEqual(m6.Array, m10.Array) self.assertNotEqual(m7.Array, m8.Array) self.assertNotEqual(m7.Array, m9.Array) self.assertNotEqual(m7.Array, m10.Array) self.assertNotEqual(m8.Array, m9.Array) self.assertEqual(m8.Array, m10.Array) self.assertNotEqual(m9.Array, m10.Array) #check that amount of change is about right #assert 1 > min(m2.Array) > -15 #assert 1 < max(m2.Array) < 15 #assert 1 > min(m4.Array) > 0.4 #assert 1 < max(m4.Array) < 1.6 # might want stochastic tests here... self.assertIsBetween(m2.Array, -11, 13) self.assertIsBetween(m4.Array, 0.4, 1.6) self.assertIsBetween(m5.Array, -15, 17) self.assertIsBetween(m6.Array, 0.4, 1.6) self.assertIsBetween(m7.Array, -11, 13) self.assertIsBetween(m8.Array, -4, 7) self.assertIsBetween(m9.Array, -11, 13) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_seqsim/test_microarray_normalize.py000644 000765 000024 00000007700 12024702176 025610 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests of microarray_normalize.py: code for normalizing microarrays. 
""" from cogent.seqsim.microarray_normalize import (zscores, logzscores, ranks, quantiles, make_quantile_normalizer, make_normal_quantile_normalizer, make_empirical_quantile_normalizer, geometric_mean ) from cogent.util.unit_test import TestCase, main from numpy import array, arange, reshape, log2 __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Micah Hamady", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" class microarray_normalize_tests(TestCase): """Tests of top-level functions.""" def test_zscores(self): """zscores should convert array to zscores within each column""" a = reshape(arange(15),(5,3)) z = zscores(a) self.assertEqual(z[2], array([0,0,0])) #middle should be mean self.assertFloatEqual(z[0], [-1.41421356]*3) #check that it works when arrays aren't sorted a[0] = a[-1] a[1] = a[-2] a[0, -1] = 50 z = zscores(a) self.assertEqual(z[0,0],z[-1,0]) self.assertFloatEqual(z[0,-1], 1.9853692256351525) self.assertFloatEqual(z[-1,-1], -0.30544141932848506) def test_logzscores(self): """logzscores should perform zscores on log of a""" a = reshape(arange(1,16),(5,3)) #won't work with zero value self.assertFloatEqual(logzscores(a), zscores(log2(a))) def test_ranks(self): """ranks should convert array to ranks within each column""" a = array([[10,20,30],[20,10,50],[30,5,10]]) r = ranks(a) self.assertEqual(r, array([[0,2,1],[1,1,2],[2,0,0]])) def test_quantiles(self): """quantiles should convert array to quantiles within each column""" a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]]) q = quantiles(a) self.assertEqual(q, \ array([[0,.5,.25],[.25,.25,.75],[.5,0,0],[.75,.75,.5]])) def test_make_quantile_normalizer(self): """make_quantile_normalizer should sample from right distribution.""" dist = array([1,2,3,4]) qn = make_quantile_normalizer(dist) a = 
array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]]) q = qn(a) self.assertEqual(q, \ array([[1,3,2],[2,2,4],[3,1,1],[4,4,3]])) #check that it works when they don't match in size exactly dist = array([2,4,6,7,8,8,8,8]) qn = make_quantile_normalizer(dist) a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]]) q = qn(a) self.assertEqual(q, \ array([[2,8,6],[6,6,8],[8,2,2],[8,8,8]])) def test_make_normal_quantile_normalizer(self): """make_normal_quantile_normalizer should sample from normal dist.""" nqn = make_normal_quantile_normalizer(20, 10) a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]]) q = nqn(a) exp = array([[-289.02323062, 20. , -47.44897502], [ -47.44897502, -47.44897502, 87.44897502], [ 20. , -289.02323062, -289.02323062], [ 87.44897502, 87.44897502, 20. ]]) self.assertFloatEqual(q, exp) def test_make_empirical_quantile_normalizer(self): """make_empirical_quantile_normalizer should convert a to dist of data""" dist = array([4,2,3,1]) #note: out of order qn = make_empirical_quantile_normalizer(dist) a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]]) q = qn(a) self.assertEqual(q, \ array([[1,3,2],[2,2,4],[3,1,1],[4,4,3]])) def test_geometric_mean(self): """geometric_mean should return geometric mean.""" a = array([1.05, 1.2, .96]) gmean = geometric_mean(a) self.assertFloatEqual(gmean, 1.065484802091121) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_seqsim/test_randomization.py000644 000765 000024 00000006201 12024702176 024231 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the microarray module, dealing with fake expression data.""" from cogent.util.unit_test import TestCase, main from cogent.seqsim.randomization import shuffle_range, shuffle_between, \ shuffle_except_indices, shuffle_except __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = 
"rob@spot.colorado.edu" __status__ = "Development" class randomization_tests(TestCase): """Tests of the top-level functionality""" def setUp(self): """Make some standard objects to randomize""" self.numbers = list('123') self.letters = list('abcdef') self.to_test = self.numbers + 2*self.letters + self.numbers def test_shuffle_range(self): """shuffle_range should shuffle only inside range""" shuffle_range(self.to_test, 3, -3) self.assertEqual(self.to_test[:3],self.numbers) self.assertEqual(self.to_test[-3:], self.numbers) self.assertNotEqual(self.to_test[3:-3], 2*self.letters) self.assertEqualItems(self.to_test[3:-3], 2*self.letters) #this time, start is negative and end is positive shuffle_range(self.to_test, -15, 15) self.assertEqual(self.to_test[:3],self.numbers) self.assertEqual(self.to_test[-3:], self.numbers) self.assertNotEqual(self.to_test[3:-3], 2*self.letters) self.assertEqualItems(self.to_test[3:-3], 2*self.letters) def test_shuffle_between(self): """shuffle_between should shuffle between specified chars""" shuffle_peptides = shuffle_between('KR') seq1 = 'AGHCDSGAHF' #each 10 chars long seq2 = 'PLMIDNYHGT' protein = seq1 + 'K' + seq2 result = shuffle_peptides(protein) self.assertEqual(result[10], 'K') self.assertNotEqual(result[:10], seq1) self.assertEqualItems(result[:10], seq1) self.assertNotEqual(result[11:], seq2) self.assertEqualItems(result[11:], seq2) def test_shuffle_except_indices(self): """shuffle_except_indices should shuffle all except specified indices""" seq1 = 'AGHCDSGAHF' #each 10 chars long seq2 = 'PLMIDNYHGT' protein = seq1 + 'K' + seq2 result = list(protein) shuffle_except_indices(result, [10]) self.assertEqual(result[10], 'K') self.assertNotEqual(''.join(result), protein) self.assertEqualItems(''.join(result), protein) self.assertNotEqualItems(''.join(result[:10]), seq1) def test_shuffle_except(self): """shuffle_except_indices should shuffle all except specified indices""" seq1 = 'AGHCDSGAHF' #each 10 chars long seq2 = 'PLMIDNYHGT' 
protein = seq1 + 'K' + seq2 prot = protein se = shuffle_except('K') result = se(prot) self.assertEqual(result[10], 'K') self.assertNotEqual(''.join(result), protein) self.assertEqualItems(''.join(result), protein) self.assertNotEqualItems(''.join(result[:10]), seq1) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_seqsim/test_searchpath.py000644 000765 000024 00000064245 12024702176 023511 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests private methods of SearchPath and SearchNode classes. """ from cogent.util.unit_test import TestCase, main from cogent.util.misc import NonnegIntError from cogent.seqsim.searchpath import SearchPath, SearchNode __author__ = "Amanda Birmingham" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Amanda Birmingham"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Amanda Birmingham" __email__ = "amanda.birmingham@thermofisher.com" __status__ = "Production" class SearchPathHelper(object): """Contains data and function defs used by public AND private tests""" #all primers have certain forbidden sequences: no runs longer than 3 of #any purines or pyrimidines standard_forbid_seq = ['AAAA', 'GAAA', 'AGAA', 'GGAA', 'AAGA', 'GAGA', \ 'AGGA', 'GGGA', 'AAAG', 'GAAG', 'AGAG', 'GGAG', \ 'AAGG', 'GAGG', 'AGGG', 'GGGG', 'CCCC', 'TCCC', \ 'CTCC', 'TTCC', 'CCTC', 'TCTC', 'CTTC', 'TTTC', \ 'CCCT', 'TCCT', 'CTCT', 'TTCT', 'CCTT', 'TCTT', \ 'CTTT', 'TTTT'] alphabets = {SearchPath.DEFAULT_KEY:"ACGT"} #end SearchPathHelper class SearchNodeHelper(object): """Contains data and function defs used by public AND private tests""" #list of possible bases at any position alphabet = ["A", "C", "G", "T"] #end SearchNodeHelper class SearchPathTests(TestCase): """Tests public SearchPath methods.""" #------------------------------------------------------------------- #Tests of clearNodes def test_clearNodes(self): """Should empty path stack and variable forbidden""" #create a searchpath and add just one 
node test = SearchPath(SearchPathHelper.alphabets) test.generate(1) #now call clear and make sure path value is "" (empty) test.clearNodes() self.assertEquals(test.Value, "") #end test_clearNodes #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of generate method def tripletGenerator(self, pathobj, path_length): """Helper function to generate primers and catalogue triplets""" found_triplets = {} #make a hundred random searchpaths for i in xrange(100): curr_path_val = pathobj.generate(path_length) #now find all the triplets in this path for r in xrange(path_length-2): curr_triplet = curr_path_val[r:r+3] found_triplets[curr_triplet] = True #end if #clear out the path pathobj.clearNodes() #next rand sequence return found_triplets #end tripletGenerator def test_generate_fullCoverage(self): """With no constraints, should produce all possible triplets""" path_length = 20 test = SearchPath(SearchPathHelper.alphabets) #make a hundred random searchpaths and see what triplets produced found_triplets = self.tripletGenerator(test, path_length) num_found = len(found_triplets.keys()) self.assertEquals(num_found, 64) #end test_generate_fullCoverage def test_generate_withForbidden(self): """With 2 triplet constraints, should produce all others""" forbidden_triplet = ["ATG", "CCT"] path_length = 20 test = SearchPath(SearchPathHelper.alphabets, forbidden_triplet) #make a hundred random searchpaths and see what triplets produced found_triplets = self.tripletGenerator(test, path_length) num_found = len(found_triplets.keys()) self.assertEquals(num_found, 62) #end test_generate_oneForbidden def test_generate_nonePossible(self): """Should return null if no path can match constraints""" alphabet = {SearchPath.DEFAULT_KEY:"AB"} #forbid all combinations of alphabet forbidden_seqs = ["AA", "AB", "BB", "BA"] test = SearchPath(alphabet, forbidden_seqs) output = test.generate(2) 
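The generate-with-forbidden-sequences behaviour exercised in these tests can be sketched independently of SearchPath: extend a random string one position at a time and reject any symbol that would complete a forbidden subsequence, returning None when a position has no legal option. This is a simplified illustration only (a greedy stand-in, not the real backtracking SearchPath implementation); the name `generate_avoiding` is invented here.

```python
import random

def generate_avoiding(alphabet, forbidden, length, rng=random.Random(42)):
    """Randomly build a string of the given length, never allowing any
    forbidden subsequence to appear; return None on a dead end."""
    path = ''
    while len(path) < length:
        options = [c for c in alphabet
                   if not any((path + c).endswith(f) for f in forbidden)]
        if not options:
            return None  # a full backtracking search would pop a node here
        path += rng.choice(options)
    return path

seq = generate_avoiding('ACGT', ['ATG', 'CCT'], 20)
assert all(f not in seq for f in ['ATG', 'CCT'])
# with every pair over a two-letter alphabet forbidden, no path exists
assert generate_avoiding('AB', ['AA', 'AB', 'BA', 'BB'], 2) is None
```

Because new symbols are checked as they are appended, testing only the suffix with `endswith` is enough to catch every forbidden occurrence.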
self.assertEquals(output, None) #end test_generate_nonePossible def test_generate_multiple(self): """Should be able to call generate multiple times to extend path""" test = SearchPath(SearchPathHelper.alphabets) output1 = test.generate(2) output2 = test.generate(3) #make sure that the length of the path is now three self.assertEquals(len(output2), 3) #make sure that the new path is a superset of the old one self.assertEquals(output1, output2[:2]) #end test_generate_multiple def test_generate_correctAlph(self): """Should get correct alphabet even if node is popped then readded""" test_alphs = {0:"A",1:"BC",2:"D",3:"E",SearchPath.DEFAULT_KEY:"X"} forbidden_seqs = ["CD"] test = SearchPath(test_alphs, forbidden_seqs) #given these position alphabets and this forbidden seq, #the only legal 3-node searchpath should be ABD. Make #a hundred searchpaths and make sure this is the only one #that actually shows up. found_paths = {} for i in xrange(100): curr_path = test.generate(3) found_paths[curr_path] = True test.clearNodes() #next #make sure there is only one path found and that it is the right one found_path_str = str("".join(found_paths.keys())) self.assertEquals(len(found_paths), 1) self.assertEquals(found_path_str, "ABD") #end test_generate_correctAlph #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of findAllowedOption def test_findAllowedOption_currentAllowed(self): """Should return true when current option is allowed""" #searchpath with no forbidden seqs, so anything should work test = SearchPath(SearchPathHelper.alphabets) test._add_node(SearchNode(SearchNodeHelper.alphabet)) allowed_found = test.findAllowedOption() self.assertEquals(allowed_found, True) #end test_findAllowedOption_currentAllowed def test_findAllowedOption_otherAllowed(self): """Should return true when curr option is bad but another is good""" node_vals = [] #create a path and put in 2 nodes; since all 
the forbidden seqs I #used to init the path have 4 entries, there should be no chance that #the path I just created has anything forbidden in it test = self._fill_path(2, node_vals) #add the existing path value to the forbidden list test._fixed_forbidden["".join(node_vals)] = True test._forbidden_lengths[2] = True #call findAllowedOption ... should find next available good option allowed_found = test.findAllowedOption() self.assertEquals(allowed_found, True) #end test_findAllowedOption_otherAllowed def test_findAllowedOption_none(self): """Should return false if curr option is bad and no good exist""" test = self._fill_path(1) self._empty_top(test) #get the value of the top node's only remaining option; #add to forbidden last_option = test._get_top().Options test._fixed_forbidden["".join(last_option)] = True test._forbidden_lengths[1] = True #now make sure we get back result that no options for path remain allowed_found = test.findAllowedOption() self.assertEquals(allowed_found, False) #end test_findAllowedOption_none #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of removeOption #Helper function def _fill_path(self, num_nodes, node_vals = []): """create a searchpath and add searchnodes; return path""" test = SearchPath(SearchPathHelper.alphabets, \ SearchPathHelper.standard_forbid_seq) for i in xrange(num_nodes): curr_node = SearchNode(SearchNodeHelper.alphabet) node_vals.append(curr_node.Value) test._add_node(curr_node) #next i return test #end _fill_path #Helper function def _empty_top(self, spath): """remove all but one options from the top node""" top_node = spath._get_top() num_options = len(top_node.Options) for i in xrange(num_options-1): top_node.removeOption() #end _empty_top def test_removeOption_simple(self): """Should correctly remove option from untapped node""" #create a searchpath and a searchnode test = self._fill_path(1) orig_len_minus1 = 
len(test._get_top().Options) - 1 #also check that remove result is true: node still has options has_options = test.removeOption() self.assertEqual(has_options, True) #get the top node and make sure that it has fewer options top_node = test._get_top() option_len = len(top_node.Options) self.assertEqual(option_len, orig_len_minus1) #end test_removeOption_simple def test_removeOption_empty(self): """Should return False if removing option leads to empty stack""" #create a searchpath with just one searchnode, then (almost) empty it test = self._fill_path(1) self._empty_top(test) #now remove the last option, and make sure the stack is now empty some_left = test.removeOption() self.assertEquals(some_left, False) #end test_removeOption_empty def test_removeOption_recurse(self): """Should correctly remove empty node and curr option of next""" #put two nodes in the search path and almost empty the top one test = self._fill_path(2) self._empty_top(test) test.removeOption() #make sure there's only one item left in the path self.assertEquals(len(test._path_stack), 1) #make sure that it has one fewer options top_node = test._get_top() self.assertEquals(len(top_node.Options), len(top_node.Alphabet)-1) #end test_removeOption_recurse #------------------------------------------------------------------- #end SearchPathTests class SearchNodeTests(TestCase): """Tests public SearchNode methods.""" #------------------------------------------------------------------- #Tests of removeOption method def test_removeOption_someLeft(self): """removeOption should cull options and return T when some left.""" #create a search node and get its current value test = SearchNode(SearchNodeHelper.alphabet) last_val = test.Value some_left = test.removeOption() #new current value must be different from old #and return value must be true self.assertNotEqual(test.Value, last_val) self.assertEqual(some_left, True) #end test_removeOption_someLeft def test_removeOption_noneLeft(self): """removeOption should 
cull options and return F when none left.""" test = SearchNode(SearchNodeHelper.alphabet) num_options = len(test.Options) #removeOption num_options times: that should get 'em all for i in xrange(num_options): some_left = test.removeOption() #return value should be false (no options remain) self.assertEqual(some_left, False) #end test_removeOption_noneLeft #------------------------------------------------------------------- #------------------------------------------------------------------- #Test of options property (and _get_options method) def test_options(self): """Should return a copy of real options""" test = SearchNode(SearchNodeHelper.alphabet) optionsA = test.Options del optionsA[0] optionsB = test.Options self.assertNotEqual(len(optionsA), len(optionsB)) #end test_options #------------------------------------------------------------------- #end SearchNodeTests class SearchPathTests_private(TestCase): """Tests for private SearchPath methods.""" #No need to test __str__: just calls toString in general_tools #No need to test _accept_option or _remove_accepted_option: they #simply pass in the base class implementation #No need to test _in_extra_forbidden: just returns False in base class #implementation #------------------------------------------------------------------- #Helper functions def _fill_path(self, num_nodes, node_vals = []): """create a searchpath and add searchnodes; return path""" test = SearchPath(SearchPathHelper.alphabets, \ SearchPathHelper.standard_forbid_seq) for i in xrange(num_nodes): curr_node = SearchNode(SearchNodeHelper.alphabet) node_vals.append(curr_node.Value) test._add_node(curr_node) #next i return test #end _fill_path def _empty_top(self, spath): """remove all but one options from the top node""" top_node = spath._get_top() num_options = len(top_node.Options) for i in xrange(num_options-1): top_node.removeOption() #end _empty_top #------------------------------------------------------------------- 
#------------------------------------------------------------------- #Tests of __init__ def test_init_noForbid(self): """Init should correctly set private properties w/o forbid list""" test = SearchPath(SearchPathHelper.alphabets) real_result = len(test._fixed_forbidden.keys()) self.assertEquals(real_result, 0) #end test_init_noForbid def test_init_withForbid(self): """Init should correctly set private properties, w/forbid list""" user_input = SearchPathHelper.standard_forbid_seq[:] user_input.extend(["AUG", "aaaaccuag"]) test = SearchPath(SearchPathHelper.alphabets, user_input) user_input = [i.upper() for i in user_input] user_input.sort() real_result = test._fixed_forbidden.keys() real_result.sort() self.assertEquals(str(real_result), str(user_input)) #end test_init_withForbid def test_init_badAlphabets(self): """Init should fail if alphabets param is not dictionary-like""" self.assertRaises(ValueError, SearchPath, "blue") #end test_init_badAlphabets def test_init_noDefault(self): """Init should fail if alphabets param has no 'default' key""" self.assertRaises(ValueError, SearchPath, {12:"A"}) #end test_init_noDefault #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of value property (and _get_value method) def test_value_empty(self): """Should return empty string when path is empty""" test = SearchPath(SearchPathHelper.alphabets) self.assertEquals(test.Value, "") #end test_value_empty def test_value(self): """Should return string of node values when nodes exist""" node_vals = [] test = self._fill_path(3, node_vals) self.assertEquals(test.Value, "".join(node_vals)) #end test_value #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of _top_index property (and _get_top_index method) def test_top_index(self): """Should return index of top node when one exists""" test = 
self._fill_path(3) top_index = test._top_index self.assertEquals(top_index,2) #end test_top_index def test_top_index_None(self): """Should return None when stack has no entries""" test = SearchPath(SearchPathHelper.alphabets) top_index = test._top_index self.assertEquals(top_index, None) #end test_top_index_None #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of _get_top method def test_get_top(self): """Should return a reference to top node on stack if there is one""" test = SearchPath(SearchPathHelper.alphabets) test._add_node(SearchNode(SearchNodeHelper.alphabet)) topnode = SearchNode(SearchNodeHelper.alphabet) test._add_node(topnode) resultnode = test._get_top() self.assertEquals(resultnode, topnode) #end test_get_top def test_get_top_None(self): """Should return None if stack is empty""" test = SearchPath(SearchPathHelper.alphabets) topnode = test._get_top() self.assertEquals(topnode, None) #end test_get_top_None #------------------------------------------------------------------- #------------------------------------------------------------------- #Test of _get_forbidden_lengths def test_get_forbidden_lengths(self): """get_forbidden_lengths should return dict of forbidden seq lens""" correct_result = str([3, 4, 9]) user_input = SearchPathHelper.standard_forbid_seq[:] user_input.extend(["AUG", "aaaaccuag"]) test = SearchPath(SearchPathHelper.alphabets, user_input) real_dict = test._get_forbidden_lengths() real_list = real_dict.keys() real_list.sort() real_result = str(real_list) self.assertEquals(real_result, correct_result) #end test_get_forbidden_lengths #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of _add_node def test_add_node_first(self): """add_node should correctly add first node and increase top index.""" test = SearchPath(SearchPathHelper.alphabets, \ 
SearchPathHelper.standard_forbid_seq) test_node = SearchNode(SearchNodeHelper.alphabet) test._add_node(test_node) self.assertEquals(len(test._path_stack), 1) self.assertEquals(test._top_index, 0) #end test_add_node_first def test_add_node_subsequent(self): """add_node should correctly add additional nodes and up top index.""" test = SearchPath(SearchPathHelper.alphabets, \ SearchPathHelper.standard_forbid_seq) test_node = SearchNode(SearchNodeHelper.alphabet) test_node2 = SearchNode(SearchNodeHelper.alphabet) test._add_node(test_node) test._add_node(test_node2) self.assertEquals(len(test._path_stack), 2) self.assertEquals(test._top_index, 1) #end test_add_node_subsequent #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of _get_nmer def test_get_nmer(self): """get_nmer should return correct nmer for n <= length of stack""" node_values = [] n = 4 test_path = SearchPath(SearchPathHelper.alphabets, \ SearchPathHelper.standard_forbid_seq) for i in xrange(n+1): curr_node = SearchNode(SearchNodeHelper.alphabet) test_path._add_node(curr_node) node_values.append(curr_node.Value) #next #get a nmer, and get the last n values that were put on stack; #should be the same real_result = test_path._get_nmer(n) correct_result = "".join(node_values[-n:]) self.assertEquals(real_result, correct_result) #end test_get_nmer def test_get_nmer_tooLong(self): """get_nmer should return None for n > length of stack""" test_path = SearchPath(SearchPathHelper.alphabets, \ SearchPathHelper.standard_forbid_seq) test_node = SearchNode(SearchNodeHelper.alphabet) test_path._add_node(test_node) #stack is 1 long. 
Ask for a 2 mer real_result = test_path._get_nmer(2) self.assertEquals(real_result, None) #end test_get_nmer_tooLong def test_get_nmer_len1(self): """get_nmer should return correct result for nmer 1 on full stack""" test_path = SearchPath(SearchPathHelper.alphabets, \ SearchPathHelper.standard_forbid_seq) test_node = SearchNode(SearchNodeHelper.alphabet) test_path._add_node(test_node) correct_result = test_node.Value real_result = test_path._get_nmer(1) self.assertEquals(real_result, correct_result) #end test_get_nmer_len1 def test_get_nmer_len0(self): """get_nmer should return an empty string if n is 0""" #if n is zero, this should return "" even when stack is empty test_path = SearchPath(SearchPathHelper.alphabets, \ SearchPathHelper.standard_forbid_seq) real_result = test_path._get_nmer(0) self.assertEquals(real_result, "") #end test_get_nmer_len0 def test_get_nmer_badArg(self): """get_nmer should error if given a non integer-castable n""" test_path = SearchPath(SearchPathHelper.alphabets, \ SearchPathHelper.standard_forbid_seq) self.assertRaises(NonnegIntError, test_path._get_nmer, "blue") #end test_get_nmer_badArg #------------------------------------------------------------------- #------------------------------------------------------------------- #Tests of _check_forbidden_seqs def test_check_forbidden_seqs_fixed(self): """Should return True if path includes a fixed forbidden seq""" forbidden_seq = ["G", "U", "A"] user_input = ["".join(forbidden_seq)] user_input.extend(SearchPathHelper.standard_forbid_seq) test = SearchPath(SearchPathHelper.alphabets, user_input) test._add_node(SearchNode(SearchNodeHelper.alphabet)) #add more values, and cheat so as to make them something forbidden for bad_val in forbidden_seq: curr_node = SearchNode(SearchNodeHelper.alphabet) curr_node._options[0] = bad_val #torque the node's innards test._add_node(curr_node) #next bad_val real_result = test._check_forbidden_seqs() self.assertEquals(real_result, True) #end 
test_check_forbidden_seqs_fixed

    def test_check_forbidden_seqs_none(self):
        """Should return False if path includes no forbidden seqs"""
        #a seq that isn't in the standard fixed forbidden lib
        allowed_seq = ["C", "U", "A", "T"]
        test = SearchPath(SearchPathHelper.alphabets, \
                          SearchPathHelper.standard_forbid_seq)
        test._add_node(SearchNode(SearchNodeHelper.alphabet))
        #add more values, and cheat so as to make them something known
        for known_val in allowed_seq:
            curr_node = SearchNode(SearchNodeHelper.alphabet)
            curr_node._options[0] = known_val    #torque the node's innards
            test._add_node(curr_node)
        #next known_val
        real_result = test._check_forbidden_seqs()
        self.assertEquals(real_result, False)
    #end test_check_forbidden_seqs_none
    #-------------------------------------------------------------------

    #-------------------------------------------------------------------
    #Tests of _get_alphabet
    def test_get_alphabet_exists(self):
        """Should return alphabet for position when one exists"""
        alph1 = "G"
        alph2 = "ACGT"
        test_alphs = {0: alph1, 2: alph1, SearchPath.DEFAULT_KEY: alph2}
        test = SearchPath(test_alphs)
        real_alph = test._get_alphabet(2)
        self.assertEquals(str(real_alph), alph1)
    #end test_get_alphabet_exists

    def test_get_alphabet_default(self):
        """Should return default alphabet if none defined for position"""
        #SearchPathHelper.alphabets has only a default entry
        test = SearchPath(SearchPathHelper.alphabets)
        real_alph = test._get_alphabet(0)
        correct_alph = SearchPathHelper.alphabets[SearchPath.DEFAULT_KEY]
        self.assertEquals(str(real_alph), str(correct_alph))
    #end test_get_alphabet_default

    def test_get_alphabet_badPosition(self):
        """Should raise error if input isn't castable to nonneg int"""
        test = SearchPath(SearchPathHelper.alphabets)
        self.assertRaises(NonnegIntError, test._get_alphabet, "blue")
    #end test_get_alphabet_badPosition
    #-------------------------------------------------------------------
#end SearchPathTests_private

class SearchNodeTests_private(TestCase):
    """Tests for private 
SearchNode methods.""" #No need to test __str__: just calls toString in general_tools #No need to test _get_value and the value property: just references #an item in an array #No need to test _get_alphabet and alphabet property: ibid #------------------------------------------------------------------- #Tests of __init__ def test_init_noArg(self): """Init should correctly set private properties w/no arg""" correct_result = str(SearchNodeHelper.alphabet) test = SearchNode(SearchNodeHelper.alphabet) options = test.Options options.sort() real_result = str(options) self.assertEquals(real_result, correct_result) #end test_init_noArg #------------------------------------------------------------------- #end SearchNodeTests_private if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_seqsim/test_sequence_generators.py000644 000765 000024 00000142373 12024702176 025427 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """test_sequence_generator.py: tests of the sequence_generator module. 
""" from cogent.seqsim.sequence_generators import permutations, combinations, \ SequenceGenerator, Partition, Composition, \ MageFrequencies, SequenceHandle, IUPAC_DNA, IUPAC_RNA, BaseFrequency, \ PairFrequency, BasePairFrequency, RegionModel, ConstantRegion, \ UnpairedRegion, ShuffledRegion, PairedRegion, MatchingRegion, \ SequenceModel, Rule, Motif, Module, SequenceEmbedder from StringIO import StringIO from operator import mul from sys import path from cogent.maths.stats.util import Freqs from cogent.util.misc import app_path from cogent.struct.rna2d import ViennaStructure from cogent.util.unit_test import TestCase, main __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" #need to skip some tests if RNAfold absent if app_path('RNAfold'): RNAFOLD_PRESENT = True else: RNAFOLD_PRESENT = False class FunctionTests(TestCase): """Tests of standalone functions""" def setUp(self): self.standards = (0, 1, 5, 30, 173, 1000, 4382) def test_permuations_negative_k(self): """permutations should raise IndexError if k negative""" self.assertRaises(IndexError, permutations, 3, -1) def test_permutations_k_more_than_n(self): """permutations should raise IndexError if k > n""" self.assertRaises(IndexError, permutations, 3, 4) def test_permutations_negative_n(self): """permutations should raise IndexError if n negative""" self.assertRaises(IndexError, permutations, -3, -2) def test_permutations_k_equals_1(self): """permutations should return n if k=1""" for n in self.standards[1:]: self.assertEqual(permutations(n,1), n) def test_permutations_k_equals_2(self): """permutations should return n*(n-1) if k=2""" for n in self.standards[2:]: self.assertEqual(permutations(n,2), n*(n-1)) def test_permutations_k_equals_n(self): """permutations should return n! 
if k=n""" for n in self.standards[1:]: self.assertEqual(permutations(n,n), reduce(mul, range(1,n+1))) def test_combinations_k_equals_n(self): """combinations should return 1 if k = n""" for n in self.standards: self.assertEqual(combinations(n,n), 1) def test_combinations_k_equals_n_minus_1(self): """combinations should return n if k=(n-1)""" for n in self.standards[1:]: self.assertEqual(combinations(n, n-1), n) def test_combinations_zero_k(self): """combinations should return 1 if k is zero""" for n in self.standards: self.assertEqual(combinations(n, 0), 1) def test_combinations_symmetry(self): """combinations(n,k) should equal combinations(n,n-k)""" for n in self.standards[3:]: for k in (0, 1, 5, 18): self.assertEquals(combinations(n, k), combinations(n, n-k)) def test_combinations_arbitrary_values(self): """combinations(n,k) should equal results from spreadsheet""" results = { 30:{0:1, 1:30, 5:142506, 18:86493225, 29:30, 30:1}, 173:{0:1, 1:173, 5:1218218079, 18:1.204353e24, 29:7.524850e32, \ 30:3.611928e33}, 1000:{0:1, 1:1000, 5:8.2502913e12, 18:1.339124e38,29:7.506513e55, \ 30:2.429608e57}, 4382:{0:1, 1:4382, 5:1.343350e16, 18:5.352761e49, 29:4.184411e74, \ 30:6.0715804e76}, } for n in self.standards[3:]: for k in (0, 1, 5, 18, 29, 30): self.assertFloatEqualRel(combinations(n,k), results[n][k], 1e-5) class SequenceGeneratorTests(TestCase): """Tests of SequenceGenerator, which fills in degenerate bases""" def setUp(self): """Defines a few standard generators""" self.rna_codons = SequenceGenerator('NNN') self.dna_iupac_small = SequenceGenerator('RH', IUPAC_DNA) self.empty = SequenceGenerator('') self.huge = SequenceGenerator('N'*50) self.binary = SequenceGenerator('01??01', {'0':'0','1':'1','?':'10'}) def test_len(self): """len(SequenceGenerator) should return number of possible matches""" lengths = ((self.rna_codons, 64), (self.dna_iupac_small, 6), (self.empty, 0), (self.binary, 4)) for item, expected in lengths: self.assertEqual(len(item), expected) try: 
len(self.huge) except OverflowError: pass else: raise AssertionError, "Failed to raise expected OverflowError" def test_numPossibilities(self): """SequenceGenerator.numPossibilities() should be robust to overflow""" lengths = ((self.rna_codons, 64), (self.dna_iupac_small, 6), (self.empty, 0), (self.binary, 4), (self.huge, 4**50)) for item, expected in lengths: self.assertEqual(item.numPossibilities(), expected) def test_sequences(self): """SequenceGenerator should produce the correct list of sequences""" self.assertEqual(list(self.empty), []) self.assertEqual(list(self.dna_iupac_small), \ ['AT','AC','AA','GT','GC','GA']) codons = [] for first in 'UCAG': for second in 'UCAG': for third in 'UCAG': codons.append(''.join([first, second, third])) self.assertEqual(list(self.rna_codons), codons) #test that it still works if we call the generator a second time self.assertEqual(list(self.rna_codons), codons) def test_iter(self): """SequenceGenerator should act like a list with for..in syntax""" as_list = list(self.rna_codons) for obs, exp in zip(self.rna_codons, as_list): self.assertEqual(obs, exp) def test_getitem(self): """SequenceGenerator should allow __getitem__ like a list""" as_list = list(self.rna_codons) for i in range(64): self.assertEqual(self.rna_codons[i], as_list[i]) for i in range(1,65): self.assertEqual(self.rna_codons[-i], as_list[-i]) self.assertEqual(self.huge[-1], 'G'*50) def test_getitem_slices(self): """SequenceGenerator slicing should work the same as a list""" e = list(self.rna_codons) o = self.rna_codons values = ( (o[:], e[:]), (o[0:], e[0:]), (o[1:], e[1:]), (o[5:], e[5:]), (o[0:5], e[0:5]), (o[1:5], e[1:5]), (o[5:5], e[5:5]), (o[0:1], e[0:1]), (o[len(o)-1:len(o)], e[len(e)-1:len(e)]), (o[len(o):len(o)], e[len(e):len(e)]), ) testnum = 0 for obs, exp in values: testnum += 1 self.assertEqual(list(obs), exp) big = list(self.huge[1:5]) self.assertEqual(['U'*49+'C', 'U'*49+'A', 'U'*49+'G', 'U'*48+'CU'], big) class PartitionTests(TestCase): """Tests of 
the Paritition object.""" def test_single_partition(self): """If number of objects = bins * min, only one way to partition""" for num_bins in range(1, 10): for occupancy in range(10): self.assertEqual(len(Partition(num_bins*occupancy, num_bins, occupancy)), 1) def test_partitions(self): """Test several properties of partitions, especially start/end""" for num_bins in range(1, 5): for occupancy in range(5): for num_items in \ range(num_bins*occupancy, num_bins*occupancy + 10): p = Partition(num_items, num_bins, occupancy) l = [i for i in p] l2 = [i for i in p] #check that calling it twice doesn't break it self.assertEqual(l, l2) #check the lengths self.assertEqual(len(p), len(l)) #check the ranges are the same... self.assertEqual(l[0][1:], l[-1][0:-1]) #and that they contain the right values. self.assertEqual(l[0][1:], [occupancy]*(num_bins - 1)) #check the first and last elements self.assertEqual(l[0][0], l[-1][-1]) self.assertEqual(l[0][0], \ num_items - occupancy * (num_bins - 1)) def test_values(self): """Partition should match precalculated values""" self.assertEqual(len(Partition(20, 4, 1)), 969) def test_str(self): """str(partition) should work as expected""" p = Partition(20,4,1) self.assertEqual(str(p), "Items: 20 Pieces: 4 Min Per Piece: 1") p.NumItems = 13 p.NumPieces = 2 p.MinOccupancy = 0 self.assertEqual(len(p), len(Partition(13, 2, 0))) self.assertEqual(str(p), "Items: 13 Pieces: 2 Min Per Piece: 0") class CompositionTests(TestCase): """Tests of the Composition class.""" def setUp(self): """Define a few standard compositions.""" self.bases_10pct = Composition(10, 0, "ACGU") self.bases_5pct = Composition(5, 1, "ACGU") self.bases_extra = Composition(10, 0, "CYGEJ") self.small = Composition(20, 0, "xy") self.unique = Composition(20, 1, "z") def test_lengths(self): """Composition should return correct number of elements""" self.assertEqual(len(self.bases_10pct), len(Partition(10,4,0))) self.assertEqual(len(self.bases_5pct), len(Partition(20,4,1))) 
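The Partition counts asserted in these tests follow the stars-and-bars identity: the number of ordered ways to split n items into k pieces with at least m items per piece is C(n - k*m + k - 1, k - 1). A quick independent cross-check of the precalculated value (this sketch is not the Partition implementation; `n_partitions` and `choose` are names invented here):

```python
from math import factorial

def choose(n, k):
    """Binomial coefficient C(n, k)."""
    if k < 0 or k > n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))

def n_partitions(num_items, num_pieces, min_occupancy):
    """Count ordered compositions of num_items into num_pieces parts,
    each part >= min_occupancy, via stars and bars."""
    free = num_items - num_pieces * min_occupancy
    if free < 0:
        return 0
    return choose(free + num_pieces - 1, num_pieces - 1)

# matches the precalculated value tested for Partition(20, 4, 1)
assert n_partitions(20, 4, 1) == 969
# exactly one way when every piece holds the minimum occupancy
assert n_partitions(12, 4, 3) == 1
```

The same identity explains why len(Composition(5, 1, "ACGU")) equals len(Partition(20, 4, 1)): a 5% spacing over four bases is a composition of 20 units into 4 parts, each at least 1.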
self.assertEqual(len(self.bases_extra), len(Partition(10,5,0))) self.assertEqual(len(self.small), len(Partition(5, 2, 0))) self.assertEqual(len(self.unique), len(Partition(5, 1, 1))) def test_known_vals(self): """Composition should return precalculated elements for known cases""" self.assertEqual(len(Composition(5,1,"ACGU")), 969) self.assertEqual(len(Composition(5,0,"ACGU")), 1771) as_list = list(Composition(5,1,"ACGU")) self.assertEqual(as_list[0], Freqs('A'*17+'CGU')) self.assertEqual(as_list[-1], Freqs('U'*17+'ACG')) def test_updating(self): """Composition updates should reset frequencies correctly.""" exp_list = list(Composition(5, 1, "GCAUN")) self.bases_10pct.Spacing = 5 self.bases_10pct.Alphabet = "GCAUN" self.bases_10pct.MinOccupancy = 1 self.assertEqual(list(self.bases_10pct), exp_list) class MageFrequenciesTests(TestCase): """Tests of the MageFrequencies class -- presentation for Composition.""" def setUp(self): """Define a few standard compositions.""" self.bases_10pct = Composition(10, 0, "ACGU") def test_str(self): """MageFrequencies string conversions work correctly""" obs_list = list(self.bases_10pct) self.assertEqual(str(MageFrequencies(obs_list[0])), '1.0 0.0 0.0') self.assertEqual(str(MageFrequencies(obs_list[-1], "last")), \ '{last} 0.0 0.0 0.0') self.assertEqual(str(MageFrequencies({'C':2, 'A':3, 'T':5, 'x':17}, \ 'bases')), '{bases} 0.3 0.2 0.0') class SequenceHandleTests(TestCase): """Tests of the SequenceHandle class.""" def setUp(self): """Define some standard SequenceHandles.""" self.rna = SequenceHandle('uuca', 'ucag') self.any = SequenceHandle(['u', 1, None]) self.empty = SequenceHandle() def test_init_good(self): """SequenceHandle should init OK without alphabet""" self.assertEqual(SequenceHandle('abc123'), list('abc123')) self.assertEqual(SequenceHandle(), list()) self.assertEqual(SequenceHandle('abcaaa', 'abcd'), list('abcaaa')) self.assertEqual(SequenceHandle([1,2,3]), [1,2,3]) def test_init_bad(self): """SequenceHandle should raise 
ValueError if item not in alphabet""" self.assertRaises(ValueError, SequenceHandle, 'abc1', 'abc') self.assertRaises(ValueError, SequenceHandle, '1', [1]) def test_setitem_good(self): """SequenceHandle setitem should allow items in alphabet""" self.rna[0] = 'c' self.assertEqual(self.rna, list('cuca')) self.rna[-1] = 'u' self.assertEqual(self.rna, list('cucu')) self.any[1] = [1, 2, 3] self.assertEqual(self.any, ['u', [1, 2, 3], None]) def test_setitem_bad(self): """SequenceHandle setitem should reject items not in alphabet""" self.assertRaises(ValueError, self.rna.__setitem__, 0, 'x') def test_setslice_good(self): """SequenceHandle setslice should allow same-length slice""" self.rna[:] = list('aaaa') self.assertEqual(self.rna, list('aaaa')) self.rna[0:1] = ['u'] self.assertEqual(self.rna, list('uaaa')) self.rna[-2:] = ['g','g'] self.assertEqual(self.rna, list('uagg')) def test_setslice_bad(self): """SequenceHandle setslice should reject bad items or length change""" self.assertRaises(ValueError, self.rna.__setslice__, 0, len(self.rna), \ ['a']*5) self.assertRaises(ValueError, self.any.__setslice__, 0, len(self.any), \ ['a']*5) self.assertRaises(ValueError, self.rna.__setslice__, 0, 1, ['x']) def test_string(self): """SequenceHandle str should join items without spaces""" #use ''.join if items are strings self.assertEqual(str(self.rna), 'uuca') self.assertEqual(str(self.empty), '') #if some of the items raise errors, use built-in method instead self.assertEqual(str(self.any), str(['u', 1, None])) def test_naughty_methods(self): """SequenceHandle list mutators should raise NotImplementedError""" r = self.rna naughty = [r.__delitem__, r.__delslice__, r.__iadd__, r.__imul__, \ r.append, r.extend, r.insert, r.pop, r.remove] for n in naughty: self.assertRaises(NotImplementedError, n) class BaseFrequencyTests(TestCase): """Tests of BaseFrequency class: wrapper for FrequencyDistibution.""" def test_init(self): """BaseFrequency should init as expected""" 
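The single-argument PairFrequency behaviour tested below amounts to multiplying independent single-base frequencies: for 'UCCC', P(U) = 0.25 and P(C) = 0.75, so P(U,U) = 0.0625, P(U,C) = P(C,U) = 0.1875, and P(C,C) = 0.5625. A minimal sketch of that computation, independent of the Freqs class (`independent_pair_freqs` is an invented name):

```python
from collections import Counter
from itertools import product

def independent_pair_freqs(seq):
    """Pair probabilities assuming each position is an independent draw
    from the single-symbol frequency distribution of seq."""
    counts = Counter(seq)
    total = float(len(seq))
    single = dict((base, n / total) for base, n in counts.items())
    return dict(((a, b), single[a] * single[b])
                for a, b in product(single, repeat=2))

freqs = independent_pair_freqs('UCCC')
assert abs(freqs[('U', 'U')] - 0.0625) < 1e-9
assert abs(freqs[('U', 'C')] - 0.1875) < 1e-9
assert abs(freqs[('C', 'C')] - 0.5625) < 1e-9
```

Restricting the result to an explicit pair alphabet, as PairFrequency's second parameter does, would then just renormalize over the allowed pairs.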
        self.assertEqual(BaseFrequency('UUUCCCCAG'), \
            Freqs('UUUCCCCAG', 'UCAG'))
        self.assertEqual(BaseFrequency('TTTCAGG', RNA=False), \
            Freqs('TTTCAGG'))

    def test_init_bad(self):
        """BaseFrequency init should disallow bad characters"""
        self.assertRaises(Exception, BaseFrequency, 'TTTCAGG')
        self.assertRaises(Exception, BaseFrequency, 'UACGUA', False)

class PairFrequencyTests(TestCase):
    """Tests of PairFrequency class: wrapper for Freqs."""
    def test_init_one_parameter(self):
        """PairFrequency should interpret single parameter as pair probs"""
        obs = PairFrequency('UCCC')
        exp = Freqs({('U','U'):0.0625, ('U','C'):0.1875,
            ('C','U'):0.1875, ('C','C'):0.5625})
        for k, v in exp.items():
            self.assertEqual(v, obs[k])
        for k, v in obs.items():
            if k not in exp:
                self.assertEqual(v, 0)
        self.assertEqual(PairFrequency('UCCC', [('U','U'),('C','C')]), \
            Freqs({('U','U'):0.1, ('C','C'):0.9}))
        #check that the alphabets are right: should not raise error on
        #incrementing characters already there, but should raise KeyError
        #on anything that's missing.
        p = PairFrequency('UCCC')
        p[('U','U')] += 1
        try:
            p[('X','U')] += 1
        except KeyError:
            pass
        else:
            raise AssertionError, "Expected KeyError."
        p = PairFrequency('UCCC', (('C','C'),))
        p[('C','C')] += 1
        try:
            p[('U','U')] += 1
        except KeyError:
            pass
        else:
            raise AssertionError, "Expected KeyError."

class BasePairFrequencyTests(TestCase):
    """Tests of the BasePairFrequency class, constructed for easy
    initialization."""
    def test_init(self):
        """BasePairFrequency init should provide correct PairFrequency"""
        WatsonCrick = [('A','U'), ('U','A'),('G','C'),('C','G')]
        Wobble = WatsonCrick + [('G','U'), ('U','G')]
        #by default, basepair should have the wobble alphabet
        bpf = BasePairFrequency('UUACG')
        pf = PairFrequency('UUACG', Wobble)
        self.assertEqual(bpf, pf)
        self.assertEqual(bpf.Constraint, pf.Constraint)
        #can turn GU off, leading to watson-crickery
        bpf = BasePairFrequency('UUACG', False)
        #make sure this gives different results...
        self.assertNotEqual(bpf, pf)
        self.assertNotEqual(bpf.Constraint, pf.Constraint)
        #...but that the results are the same when the correct alphabet is used
        pf = PairFrequency('UUACG', WatsonCrick)
        self.assertEqual(bpf, pf)
        self.assertEqual(bpf.Constraint, pf.Constraint)

class RegionModelTests(TestCase):
    """Tests of the RegionModel class. Base class just returns the template."""
    def test_init(self):
        """RegionModel base class should always return current template."""
        #test blank region model
        r = RegionModel()
        self.assertEqual(str(r.Current), '')
        self.assertEqual(len(r), 0)
        #now assign it to a template
        r.Template = ('ACGUUCGA')
        self.assertEqual(str(r.Current), 'ACGUUCGA')
        self.assertEqual(len(r), len('ACGUUCGA'))
        #check that refresh doesn't break anything
        r.refresh()
        self.assertEqual(str(r.Current), 'ACGUUCGA')
        self.assertEqual(len(r), len('ACGUUCGA'))
        #check composition
        self.assertEqual(r.Composition, None)
        d = {'A':3, 'U':10}
        r.Composition = Freqs(d)
        self.assertEqual(r.Composition, d)
        #check that composition doesn't break the update
        r.refresh()
        self.assertEqual(str(r.Current), 'ACGUUCGA')
        self.assertEqual(len(r), len('ACGUUCGA'))

class ConstantRegionTests(TestCase):
    """Tests of the ConstantRegion class.
    Just returns the template."""
    def test_init(self):
        """ConstantRegion should always return current template."""
        #test blank region model
        r = ConstantRegion()
        self.assertEqual(str(r.Current), '')
        self.assertEqual(len(r), 0)
        #now assign it to a template
        r.Template = ('ACGUUCGA')
        self.assertEqual(str(r.Current), 'ACGUUCGA')
        self.assertEqual(len(r), len('ACGUUCGA'))
        #check that refresh doesn't break anything
        r.refresh()
        self.assertEqual(str(r.Current), 'ACGUUCGA')
        self.assertEqual(len(r), len('ACGUUCGA'))
        #check composition
        self.assertEqual(r.Composition, None)
        d = {'A':3, 'U':10}
        r.Composition = Freqs(d)
        self.assertEqual(r.Composition, d)
        #check that composition doesn't break the update
        r.refresh()
        self.assertEqual(str(r.Current), 'ACGUUCGA')
        self.assertEqual(len(r), len('ACGUUCGA'))

class UnpairedRegionTests(TestCase):
    """Tests of unpaired region: should fill in w/ single-base frequencies."""
    def test_init(self):
        """Unpaired region should generate right freqs, even after change"""
        freqs = Freqs({'C':10,'U':1, 'A':0})
        r = UnpairedRegion('NN', freqs)
        seq = r.Current
        assert seq[0] in 'CU'
        assert seq[1] in 'CU'
        self.assertEqual(len(seq), 2)
        fd = []
        for i in range(1000):
            r.refresh()
            fd.append(str(seq))
        fd = Freqs(''.join(fd))
        observed = [fd['C'], fd['U']]
        expected = [1800, 200]
        self.assertSimilarFreqs(observed, expected)
        self.assertEqual(fd['U'] + fd['C'], 2000)
        freqs2 = Freqs({'A':5, 'U':5})
        r.Composition = freqs2
        r.Template = 'NNN'
        #note that changing the Template changes seq ref
        seq = r.Current
        self.assertEqual(len(seq), 3)
        assert seq[0] in 'AU'
        assert seq[1] in 'AU'
        assert seq[2] in 'AU'
        fd = []
        for i in range(1000):
            r.refresh()
            fd.append(str(seq))
        fd = Freqs(''.join(fd))
        observed = [fd['A'], fd['U']]
        expected = [1500, 1500]
        self.assertSimilarFreqs(observed, expected)
        self.assertEqual(fd['A'] + fd['U'], 3000)

class ShuffledRegionTests(TestCase):
    """Shuffled region should randomize string"""
    def test_init(self):
        """Shuffled region should init ok with string, ignoring
        base freqs"""
        #general strategy: seqs should be different, but sorted seqs should
        #be the same
        empty = ''
        seq = 'UUUCCCCAAAGGG'
        #check that we don't get errors on empty template
        r = ShuffledRegion(empty)
        r.refresh()
        self.assertEqual(str(r.Current), '')
        #check that changing the template changes the sequence
        r.Template = seq
        self.assertNotEqual(str(r.Current), '')
        #check that it shuffled the sequence the first time
        self.assertNotEqual(str(r.Current), seq)
        curr = str(r.Current)
        as_list = list(curr)
        #check that we have the right number of each type of base
        as_list.sort()
        exp_as_list = list(seq)
        exp_as_list.sort()
        self.assertEqual(as_list, exp_as_list)
        #check that we get something different if we refresh again
        r.refresh()
        self.assertNotEqual(str(r.Current), curr)
        as_list = list(str(r.Current))
        as_list.sort()
        self.assertEqual(as_list, exp_as_list)

class PairedRegionTests(TestCase):
    """Tests of paired region generation."""
    def test_init(self):
        """Paired region init and mutation should give expected results"""
        WatsonCrick = {'A':'U', 'U':'A', 'C':'G', 'G':'C'}
        Wobble = {'A':'U', 'U':'AG', 'C':'G', 'G':'UC'}
        #check that empty init doesn't give errors
        r = PairedRegion()
        r.refresh()
        #check that mutation works correctly
        r.Template = "N"
        self.assertEqual(len(r), 1)
        r.monomers('UCCGGA')
        upstream = r.Current[0]
        downstream = r.Current[1]
        states = {}
        num_to_do = 10000
        for i in range(num_to_do):
            r.refresh()
            curr = (upstream[0], downstream[0])
            assert upstream[0] in Wobble[downstream[0]]
            states[curr] = states.get(curr, 0) + 1
        for i in states.keys():
            assert i[1] in Wobble[i[0]]
        for i in Wobble:
            for j in Wobble[i]:
                assert (i, j) in states.keys()
        expected_dict = {('A','U'):num_to_do/14, ('U','A'):num_to_do/14,
            ('C','G'):num_to_do/14*4, ('G','C'):num_to_do/14*4,
            ('U','G'):num_to_do/14*2, ('G','U'):num_to_do/14*2,}
        # the following for loop was replaced with the assertSimilarFreqs
        # call below it
        #for key, val in expected.items():
        #    self.assertFloatEqualAbs(val, states[key], 130)
        #    #conservative?
        expected = [val for key, val in expected_dict.items()]
        observed = [states[key] for key, val in expected_dict.items()]
        self.assertSimilarFreqs(observed, expected)
        assert ('G','U') in states
        assert ('U','G') in states
        r.monomers('UCGA', GU=False)
        upstream = r.Current[0]
        downstream = r.Current[1]
        states = {}
        num_to_do = 10000
        for i in range(num_to_do):
            r.refresh()
            curr = (upstream[0], downstream[0])
            assert upstream[0] in WatsonCrick[downstream[0]]
            states[curr] = states.get(curr, 0) + 1
        for i in states.keys():
            assert i[1] in WatsonCrick[i[0]]
        for i in WatsonCrick:
            for j in WatsonCrick[i]:
                assert (i, j) in states.keys()
        expected_dict = {('A','U'):num_to_do/4, ('U','A'):num_to_do/4,
            ('C','G'):num_to_do/4, ('G','C'):num_to_do/4,}
        expected = [val for key, val in expected_dict.items()]
        observed = [states[key] for key, val in expected_dict.items()]
        self.assertSimilarFreqs(observed, expected)
        #for key, val in expected.items():
        #    self.assertFloatEqualAbs(val, states[key], 130) #3 std devs
        assert ('G','U') not in states
        assert ('U','G') not in states

class SequenceModelTests(TestCase):
    """Tests of the SequenceModel class."""
    def test_init(self):
        """SequenceModel should init OK with Isoleucine motif."""
        helices = [PairedRegion('NNN'), PairedRegion('NNNNN')]
        constants = [ConstantRegion('CUAC'), ConstantRegion('UAUUGGGG')]
        order = "H0 C0 H1 - H1 C1 H0"
        isoleucine = SequenceModel(order=order, constants=constants, \
            helices=helices)
        isoleucine.Composition = BaseFrequency('UCAG')
        #print
        #print
        for i in range(10):
            isoleucine.refresh()
            #print list(isoleucine)
        #print
        isoleucine.Composition = BaseFrequency('UCAG')
        isoleucine.GU = False
        #print
        for i in range(10):
            isoleucine.refresh()
            #print list(isoleucine)
        #print

class RuleTests(TestCase):
    """Tests of the Rule class"""
    def test_init_bad_params(self):
        """Rule should fail validation except with exactly 5 parameters"""
        self.assertRaises(TypeError, Rule, 1, 1, 1, 1)
        self.assertRaises(TypeError, Rule, 1, 1, 1, 1, 1, 1)

    def test_init_bad_length(self):
        """Rule should fail validation if helix extends past downstream start"""
        self.assertRaises(ValueError, Rule, 0, 0, 1, 0, 2)
        self.assertRaises(ValueError, Rule, 0, 0, 10, 10, 12)

    def test_init_bad_negative_params(self):
        """Rule should fail validation if any parameters are negative"""
        self.assertRaises(ValueError, Rule, -1, 0, 1, 0, 1)
        self.assertRaises(ValueError, Rule, 0, -1, 1, 1, 1)
        self.assertRaises(ValueError, Rule, 0, 0, -1, 0, 5)
        self.assertRaises(ValueError, Rule, 0, 0, 0, -1, 1)
        self.assertRaises(ValueError, Rule, 0, 0, 1, 1, -1)

    def test_init_bad_zero_length(self):
        """Rule should fail validation if length is zero"""
        self.assertRaises(ValueError, Rule, 0, 0, 1, 1, 0)

    def test_init_overlap(self):
        """Rule should fail validation if bases must pair with themselves"""
        self.assertRaises(ValueError, Rule, 0, 0, 0, 0, 1)
        self.assertRaises(ValueError, Rule, 0, 10, 0, 15, 4)

    def test_init_wrong_order(self):
        """First sequence must have lower index"""
        self.assertRaises(ValueError, Rule, 1, 0, 0, 5, 3)

    def test_init_ok_length(self):
        """Rule should init OK if helix extends to exactly downstream start"""
        x = Rule(0, 0, 1, 0, 1)
        self.assertEqual(str(x), \
            "Up Seq: 0 Up Pos: 0 Down Seq: 1 Down Pos: 0 Length: 1")
        #check adjacent bases
        x = Rule(0, 0, 0, 1, 1)
        self.assertEqual(str(x), \
            "Up Seq: 0 Up Pos: 0 Down Seq: 0 Down Pos: 1 Length: 1")
        x = Rule(1, 10, 2, 8, 7)
        #check rule that would cause overlap if motifs weren't different
        self.assertEqual(str(x), \
            "Up Seq: 1 Up Pos: 10 Down Seq: 2 Down Pos: 8 Length: 7")

    def test_str(self):
        """Rule str method should give expected results"""
        x = Rule(1, 10, 2, 8, 7)
        self.assertEqual(str(x), \
            "Up Seq: 1 Up Pos: 10 Down Seq: 2 Down Pos: 8 Length: 7")

class RuleTests_compatibility(TestCase):
    """Tests to see whether the Rule compatibility code works"""
    def setUp(self):
        """Sets up some standard rules"""
        self.x = Rule(1, 5, 2, 10, 3)
        self.x_ok = Rule(1, 8, 2, 14, 4)
        self.x_ok_diff_sequences = Rule(3, 5, 5, 10,
            3)
        self.x_bad_first = Rule(1, 0, 3, 10, 10)
        self.x_bad_first_2 = Rule(0, 0, 1, 8, 2)
        self.x_bad_second = Rule(1, 15, 2, 15, 8)
        self.x_bad_second_2 = Rule(1, 14, 2, 8, 4)

    def test_is_compatible_ok(self):
        """Rule.isCompatible should return True if rules don't overlap"""
        self.assertEqual(self.x.isCompatible(self.x_ok), True)
        self.assertEqual(self.x.isCompatible(self.x_ok_diff_sequences), True)
        #check that it's symmetric
        self.assertEqual(self.x_ok.isCompatible(self.x), True)
        self.assertEqual(self.x_ok_diff_sequences.isCompatible(self.x), True)

    def test_is_compatible_bad(self):
        """Rule.isCompatible should return False if rules overlap"""
        tests = [
            (self.x, self.x_bad_first),
            (self.x, self.x_bad_first_2),
            (self.x, self.x_bad_second),
            (self.x, self.x_bad_second_2),
            ]
        for first, second in tests:
            self.assertEqual(first.isCompatible(second), False)
            #check that it's symmetric
            self.assertEqual(second.isCompatible(first), False)

    def test_fits_in_sequence(self):
        """Rule.fitsInSequence should return True if sequence long enough"""
        sequences = map('x'.__mul__, range(21)) #0 to 20 copies of 'x'
        rules = [self.x, self.x_ok, self.x_ok_diff_sequences,
            self.x_bad_first, self.x_bad_first_2, self.x_bad_second,
            self.x_bad_second_2]
        #test a bunch of values for all the rules we have handy
        for s in sequences:
            for r in rules:
                if r.UpstreamPosition + r.Length > len(s):
                    self.assertEqual(r.fitsInSequence(s), False)
                else:
                    self.assertEqual(r.fitsInSequence(s), True)
        #test a couple of specific boundary cases
        #length-1 helix
        r = Rule(0, 0, 1, 0, 1)
        self.assertEqual(r.fitsInSequence(''), False)
        self.assertEqual(r.fitsInSequence('x'), True)
        self.assertEqual(r.fitsInSequence('xx'), True)
        #length-2 helix starting one base from the start
        r = Rule(1, 1, 2, 2, 2)
        self.assertEqual(r.fitsInSequence(''), False)
        self.assertEqual(r.fitsInSequence('x'), False)
        self.assertEqual(r.fitsInSequence('xx'), False)
        self.assertEqual(r.fitsInSequence('xxx'), True)
        self.assertEqual(r.fitsInSequence('xxxx'), True)

class ModuleTests(TestCase):
    """Tests of the Module class, which holds sequences and structures."""
    def test_init_bad(self):
        """Module init should fail if seq/struct missing, or mismatched
        lengths"""
        #test incorrect param number
        self.assertRaises(TypeError, Module, 'abc')
        self.assertRaises(TypeError, Module, 'abc', 'def', 'ghi')
        #test incorrect lengths
        self.assertRaises(ValueError, Module, 'abc', 'abcd')
        self.assertRaises(ValueError, Module, 'abcd', 'acb')

    def test_init_good(self):
        """Module init should work if seq and struct same length"""
        m = Module('U', '.')
        self.assertEqual(m.Sequence, 'U')
        self.assertEqual(m.Structure, '.')
        m.Sequence = ''
        m.Structure = ''
        self.assertEqual(m.Sequence, '')
        self.assertEqual(m.Structure, '')
        m.Sequence = 'CCUAGG'
        m.Structure = '((..))'
        self.assertEqual(m.Sequence, 'CCUAGG')
        self.assertEqual(m.Structure, '((..))')
        m.Structure = ''
        self.assertRaises(ValueError, m.__len__)

    def test_len(self):
        """Module len should work if seq and struct same length"""
        m = Module('CUAG', '....')
        self.assertEqual(len(m), 4)
        m = Module('', '')
        self.assertEqual(len(m), 0)
        m.Sequence = 'AUCGAUCGA'
        self.assertRaises(ValueError, m.__len__)

    def test_str(self):
        """Module str should contain sequence and structure"""
        m = Module('CUAG', '....')
        self.assertEqual(str(m), 'Sequence: CUAG\nStructure: ....')
        m = Module('', '')
        self.assertEqual(str(m), 'Sequence: \nStructure: ')

    def test_matches(self):
        """Module matches should return correct result for seq/struct match"""
        empty = Module('', '')
        short_p = Module('AC', '((')
        short_u = Module('UU', '..')
        short_up = Module('UU', '((')
        long_all = Module('GGGACGGUUGGUUGGUU', ')))((..((....((((') #struct+seq
        long_seq = Module('GGGACGGUUGGUU', ')))))))))))))') #seq but not struct
        long_struct = Module('GGGGGGGGGGGGG', ')))((..((....') #struct, not seq
        long_none = Module('GGGGGGGGGGGGG', ')))))))))))))') #not struct or seq
        #test overall matching
        for matcher in [empty, short_p,
                short_u, short_up]:
            self.assertEqual(matcher.matches(long_all), True)
            for longer in [long_seq, long_struct, long_none]:
                if matcher is empty:
                    self.assertEqual(matcher.matches(longer), True)
                else:
                    self.assertEqual(matcher.matches(longer), False)
        #test specific positions
        positions = {3:short_p, 11:short_u, 7:short_up, 15:short_up}
        for module in [short_p, short_u, short_up]:
            for i in range(len(long_all)):
                result = module.matches(long_all, i)
                if positions.get(i, None) is module:
                    self.assertEqual(result, True)
                else:
                    self.assertEqual(result, False)

class MotifTests(TestCase):
    """Tests of the Motif object, which has a set of Modules and Rules."""
    def setUp(self):
        """Defines a few standard motifs"""
        self.ile_mod_0 = Module('NNNCUACNNNNN', '(((((..(((((')
        self.ile_mod_1 = Module('NNNNNUAUUGGGGNNN', ')))))......)))))')
        self.ile_rule_0 = Rule(0, 0, 1, 15, 3)
        self.ile_rule_1 = Rule(0, 7, 1, 4, 5)
        self.ile = Motif([self.ile_mod_0, self.ile_mod_1], \
            [self.ile_rule_0, self.ile_rule_1])
        self.hh_mod_0 = Module('NNNNUNNNNN', '(((((.((((')
        self.hh_mod_1 = Module('NNNNCUGANGAGNNN', ')))).......((((')
        self.hh_mod_2 = Module('NNNCGAAANNNN', '))))...)))))')
        self.hh_rule_0 = Rule(0, 0, 2, 11, 5)
        self.hh_rule_1 = Rule(0, 6, 1, 3, 4)
        self.hh_rule_2 = Rule(1, 11, 2, 3, 4)
        self.hh = Motif([self.hh_mod_0, self.hh_mod_1, self.hh_mod_2], \
            [self.hh_rule_0, self.hh_rule_1, self.hh_rule_2])
        self.simple_0 = Module('CCCCC', '(((..')
        self.simple_1 = Module('GGGGG', '..)))')
        self.simple_r = Rule(0, 0, 1, 4, 3)
        self.simple = Motif([self.simple_0, self.simple_1], [self.simple_r])

    def test_init_bad_rule_lengths(self):
        """Motif init should fail if rules don't match module lengths"""
        bad_rule = Rule(0, 0, 1, 8, 6)
        self.assertRaises(ValueError, Motif, [self.simple_0, self.simple_1], \
            [bad_rule])

    def test_init_conflicting_rules(self):
        """Motif init should fail if rules overlap"""
        interferer = Rule(0, 2, 2, 20, 4)
        self.assertRaises(ValueError, Motif, [self.ile_mod_0, self.ile_mod_1, \
            self.ile_mod_0],
            [self.ile_rule_0, interferer])

    def test_matches_simple(self):
        """Test of simple match should work correctly"""
        index  = '01234567890123456789012345678901'
        seq    = 'AAACCCCCUUUGGGGGAAACCCCCUUUGGGGG'
        struct = ViennaStructure('((..((..))....))...(((.......)))')
        struct_2 = ViennaStructure('((((((..((())))))))).....(((.)))')
        #substring right, not pair
        self.assertEqual(self.simple.matches(seq, struct, [19, 27]), True)
        self.assertEqual(self.simple.matches(seq, struct_2, [19,27]), False)
        for first_pos in range(len(seq) - len(self.simple_0) + 1):
            for second_pos in range(len(seq) - len(self.simple_1) + 1):
                #should match struct only at one location
                match=self.simple.matches(seq, struct, [first_pos, second_pos])
                if (first_pos == 19) and (second_pos == 27):
                    self.assertEqual(match, True)
                else:
                    self.assertEqual(match, False)
                #should never match in struct_2
                self.assertEqual(self.simple.matches(seq, struct_2, \
                    [first_pos, second_pos]), False)
        #check that it doesn't fail if there are _two_ matches
        index  = '01234567890123456789'
        seq    = 'CCCCCGGGGGCCCCCGGGGG'
        struct = '(((....)))(((....)))'
        struct = ViennaStructure(struct)
        self.assertEqual(self.simple.matches(seq, struct, [0, 5]), True)
        self.assertEqual(self.simple.matches(seq, struct, [10,15]), True)
        #not allowed to cross-pair
        self.assertEqual(self.simple.matches(seq, struct, [0, 15]), False)

    def test_matches_ile(self):
        """Test of isoleucine match should work correctly"""
        index    = '012345678901234567890123456789012345'
        seq_good = 'AAACCCCUACUUUUUCCCAAAAAUAUUGGGGGGGAA'
        seq_bad  = 'AAACCCCUACUUUUUCCCAAAAAUAUUGGGCGGGAA'
        st_good  = '...(((((..(((((...)))))......)))))..'
        st_bad   = '((((((((..(((((...)))))...))))))))..'
        st_good = ViennaStructure(st_good)
        st_bad = ViennaStructure(st_bad)
        for first_pos in range(len(seq_good) - len(self.ile_mod_0) + 1):
            for second_pos in range(len(seq_good) - len(self.ile_mod_1) + 1):
                #seq_good and struct_good should match at one location
                match=self.ile.matches(seq_good,st_good,[first_pos,second_pos])
                if (first_pos == 3) and (second_pos == 18):
                    self.assertEqual(match, True)
                else:
                    self.assertEqual(match, False)
                self.assertEqual(self.ile.matches(seq_good, st_bad, \
                    [first_pos, second_pos]), False)
                self.assertEqual(self.ile.matches(seq_bad, st_good, \
                    [first_pos, second_pos]), False)
                self.assertEqual(self.ile.matches(seq_bad, st_bad, \
                    [first_pos, second_pos]), False)

    def test_matches_hh(self):
        """Test of hammerhead match should work correctly"""
        index     = '0123456789012345678901234567890123456'
        seq_good  = 'CCCCUAGGGGCCCCCUGAAGAGAAAUUUCGAAAGGGG'
        seq_bad   = 'CCCCCAGGGGCCCCCUGAAGAGAAAUUUCGAAGGGGG'
        structure = '(((((.(((()))).......(((())))...)))))'
        struct = ViennaStructure(structure)
        self.assertEqual(self.hh.matches(seq_good, struct, [0, 10, 25]), True)
        self.assertEqual(self.hh.matches(seq_bad, struct, [0, 10, 25]), False)

    def test_structureMatches_hh(self):
        """Test of hammerhead structureMatch should work correctly"""
        index     = '0123456789012345678901234567890123456'
        seq_good  = 'CCCCUAGGGGCCCCCUGAAGAGAAAUUUCGAAAGGGG'
        seq_bad   = 'CCCCCAGGGGCCCCCUGAAGAGAAAUUUCGAAGGGGG'
        structure = '(((((.(((()))).......(((())))...)))))'
        struct = ViennaStructure(structure)
        self.assertEqual(self.hh.structureMatches(struct, [0, 10, 25]), True)
        self.assertEqual(self.hh.structureMatches(struct, [0, 10, 25]), True)

class SequenceEmbedderTests(TestCase):
    """Tests of the SequenceEmbedder class."""
    def setUp(self):
        """Define a few standard models and motifs"""
        ile_mod_0 = Module('NNNCUACNNNNN', '(((((..(((((')
        ile_mod_1 = Module('NNNNNUAUUGGGGNNN', ')))))......)))))')
        ile_rule_0 = Rule(0, 0, 1, 15, 5)
        ile_rule_1 = Rule(0, 7, 1, 4, 5)
        ile_motif = Motif([ile_mod_0, ile_mod_1], \
            [ile_rule_0,
            ile_rule_1])
        helices = [PairedRegion('NNN'), PairedRegion('NNNNN')]
        constants = [ConstantRegion('CUAC'), ConstantRegion('UAUUGGGG')]
        order = "H0 C0 H1 - H1 C1 H0"
        ile_model = SequenceModel(order=order, constants=constants, \
            helices=helices, composition=BaseFrequency('UCAG'))
        self.ile_embedder = SequenceEmbedder(length=50, num_to_do=10, \
            motif=ile_motif, model=ile_model, composition=BaseFrequency('UCAG'))
        short_ile_mod_0 = Module('NCUACNN', '(((..((')
        short_ile_mod_1 = Module('NNUAUUGGGGN', '))......)))')
        short_ile_rule_0 = Rule(0, 0, 1, 10, 3)
        short_ile_rule_1 = Rule(0, 5, 1, 1, 2)
        short_ile_motif = Motif([short_ile_mod_0, short_ile_mod_1], \
            [short_ile_rule_0, short_ile_rule_1])
        short_helices = [PairedRegion('N'), PairedRegion('NN')]
        short_constants = [ConstantRegion('CUAC'), ConstantRegion('UAUUGGGG')]
        short_order = "H0 C0 H1 - H1 C1 H0"
        short_ile_model = SequenceModel(order=short_order, \
            constants=short_constants, \
            helices=short_helices, composition=BaseFrequency('UCAG'))
        self.short_ile_embedder = SequenceEmbedder(length=50, num_to_do=10, \
            motif=short_ile_motif, model=short_ile_model, \
            composition=BaseFrequency('UCAG'))

    def test_composition_change(self):
        """Changes in composition should propagate."""
        rr = str(self.ile_embedder.RandomRegion.Current)
        #for base in 'UCAG':
        #    assert base in rr
        #the above two lines should generally be true but fail stochastically
        self.ile_embedder.Composition = BaseFrequency('CG')
        self.assertEqual(self.ile_embedder.Model.Composition, \
            BaseFrequency('CG'))
        self.assertEqual(self.ile_embedder.RandomRegion.Composition, \
            BaseFrequency('CG'))
        self.ile_embedder.RandomRegion.refresh()
        self.assertEqual(len(self.ile_embedder.RandomRegion), 22)
        rr = str(self.ile_embedder.RandomRegion.Current)
        assert ('C' in rr or 'G' in rr)
        assert 'A' not in rr
        assert 'U' not in rr

    def test_choose_locations_too_short(self):
        """SequenceEmbedder _choose_locations should fail if too little space"""
        self.ile_embedder.Length = 28 #no positions left over
        self.assertRaises(ValueError, self.ile_embedder._choose_locations)
        self.ile_embedder.Length = 29 #one position left over
        self.assertRaises(ValueError, self.ile_embedder._choose_locations)

    def test_choose_locations_exact(self):
        """SequenceEmbedder _choose_locations should pick all locations"""
        self.ile_embedder.Length = 30 #two positions left: must both be filled
        for i in range(10):
            first, second = self.ile_embedder._choose_locations()
            self.assertEqual(first, 0)
            self.assertEqual(second, 1)

    def test_choose_locations_even(self):
        """SequenceEmbedder _choose_locations should pick locations evenly"""
        self.ile_embedder.Length = 31 #three positions left
        counts = {}
        for i in range(1000):
            key = tuple(self.ile_embedder._choose_locations())
            assert key[0] != key[1]
            curr = counts.get(key, 0)
            counts[key] = curr + 1
        expected = [333, 333, 333]
        observed = [counts[(0,1)], counts[(0,2)], counts[(1,2)]]
        self.assertSimilarFreqs(observed, expected)
        #make sure nothing else snuck in there
        self.assertEqual(counts[(0,1)]+counts[(0,2)]+counts[(1,2)], 1000)

    def test_choose_locations_with_replacement(self):
        """SequenceEmbedder _choose_locations can sample with replacement"""
        self.ile_embedder.Length = 28 #exact fit
        self.ile_embedder.WithReplacement = True
        for i in range(10):
            first, second = self.ile_embedder._choose_locations()
            self.assertEqual(first, 0)
            self.assertEqual(second, 0)
        self.ile_embedder.Length = 29 #one left over: can be 0,0 0,1 1,1
        counts = {}
        for i in range(1000):
            key = tuple(self.ile_embedder._choose_locations())
            curr = counts.get(key, 0)
            counts[key] = curr + 1
        expected = [250, 500, 250]
        observed = [counts[(0,0)], counts[(0,1)], counts[(1,1)]]
        self.assertSimilarFreqs(observed, expected)
        #make sure nothing else snuck in there
        self.assertEqual(counts[(0,0)]+counts[(0,1)]+counts[(1,1)], 1000)

    def test_insert_modules(self):
        """SequenceEmbedder _insert_modules should make correct sequence"""
        ile = self.ile_embedder
        ile.Length = 50
        ile.RandomRegion.Current[:] = ['A'] * 22
        modules = \
            list(ile.Model)
        ile.Positions = [0, 0]
        #try inserting at first position
        self.assertEqual(str(ile), modules[0] + modules[1] + 'A'*22)
        ile.Positions = [3, 20]
        self.assertEqual(str(ile), 'A'*3+modules[0]+'A'*17+modules[1]+'A'*2)

    def test_refresh(self):
        """SequenceEmbedder refresh should change module sequences"""
        modules_before = list(self.ile_embedder.Model)
        random_before = str(self.ile_embedder.RandomRegion.Current)
        self.ile_embedder.refresh()
        random_after = str(self.ile_embedder.RandomRegion.Current)
        self.assertNotEqual(random_before, random_after)
        modules_after = list(self.ile_embedder.Model)
        for before, after in zip(modules_before, modules_after):
            self.assertNotEqual(before, after)
        #check that it works twice
        self.ile_embedder.refresh()
        random_third = str(self.ile_embedder.RandomRegion.Current)
        modules_third = list(self.ile_embedder.Model)
        self.assertNotEqual(random_third, random_before)
        self.assertNotEqual(random_third, random_after)
        for first, second, third in \
            zip(modules_before, modules_after, modules_third):
            self.assertNotEqual(first, third)
            self.assertNotEqual(second, third)

    def test_countMatches(self):
        """Shouldn't find any Ile matches if all the pairs are GU"""
        if not RNAFOLD_PRESENT:
            return
        self.ile_embedder.NumToDo = 100
        self.ile_embedder.Composition = BaseFrequency('GGGGGGGGGU')
        self.ile_embedder.Length = 40
        good_count = self.ile_embedder.countMatches()
        self.assertEqual(good_count, 0)

    def test_countMatches_pass(self):
        """Should find some matches against a random background"""
        if not RNAFOLD_PRESENT:
            return
        self.ile_embedder.NumToDo = 100
        self.ile_embedder.Composition = BaseFrequency('UCAG')
        self.ile_embedder.Length = 40
        good_count = self.ile_embedder.countMatches()
        self.assertNotEqual(good_count, 0)

    def test_refresh_specific_position(self):
        """Should always find the module in the same position if specified"""
        first_module = Module('AAAAA', '(((((')
        second_module = Module('UUUUU', ')))))')
        rule_1 = Rule(0, 0, 1, 4, 5)
        helix = Motif([first_module,
            second_module], [rule_1])
        model = SequenceModel(constants=[ConstantRegion('AAAAA'), \
            ConstantRegion('UUUUU')], order='C0 - C1', \
            composition=BaseFrequency('A'))
        embedder = SequenceEmbedder(length=30, num_to_do=100, \
            motif=helix, model=model, composition=BaseFrequency('CG'), \
            positions=[3, 6])
        last = ''
        for i in range(100):
            embedder.refresh()
            curr = str(embedder)
            self.assertEqual(curr[3:8], 'AAAAA')
            self.assertEqual(curr[11:16], 'UUUUU')
            self.assertEqual(curr.count('A'), 5)
            self.assertEqual(curr.count('U'), 5)
            self.assertNotEqual(last, curr)
            last = curr

    def test_refresh_primers(self):
        """Module should appear in correct location with primers"""
        first_module = Module('AAAAA', '(((((')
        second_module = Module('UUUUU', ')))))')
        rule_1 = Rule(0, 0, 1, 4, 5)
        helix = Motif([first_module, second_module], [rule_1])
        model = SequenceModel(constants=[ConstantRegion('AAAAA'), \
            ConstantRegion('UUUUU')], order='C0 - C1', \
            composition=BaseFrequency('A'))
        embedder = SequenceEmbedder(length=30, num_to_do=100, \
            motif=helix, model=model, composition=BaseFrequency('CG'), \
            positions=[3, 6], primer_5 = 'UUU', primer_3 = 'AAA')
        last = ''
        for i in range(100):
            embedder.refresh()
            curr = str(embedder)
            self.assertEqual(curr[0:3], 'UUU')
            self.assertEqual(curr[6:11], 'AAAAA')
            self.assertEqual(curr[14:19], 'UUUUU')
            self.assertEqual(curr.count('A'), 8)
            self.assertEqual(curr.count('U'), 8)
            self.assertEqual(curr[-3:], 'AAA')
            self.assertNotEqual(last, curr)
            last = curr

    def xxx_test_count_long(self):
        self.ile_embedder.NumToDo = 100000
        self.ile_embedder.Composition = BaseFrequency('UCAG')
        print
        print "Extended helices"
        for length in range(30, 150):
            self.ile_embedder.Length = length
            good_count = self.ile_embedder.countMatches()
            print "Length: %s Matches: %s/100000" % (length, good_count)
        print

    def xxx_test_count_short(self):
        self.short_ile_embedder.NumToDo = 10000
        self.short_ile_embedder.Composition = BaseFrequency('UCAG')
        print
        print "Minimal motif"
        for length in range(20, 150):
            self.short_ile_embedder.Length = length
            good_count = self.short_ile_embedder.countMatches()
            print "Length: %s Matches: %s/10000" % (length, good_count)
        print

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_seqsim/test_tree.py

#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.parse.tree import DndParser
from cogent.seqsim.tree import RangeNode, balanced_breakpoints, BalancedTree, \
    RandomTree, CombTree, StarTree, LineTree
from cogent.core.usage import DnaPairs
from copy import deepcopy
from operator import mul, or_, add
from numpy import array, average, diag
from numpy.random import random, randint
from cogent.seqsim.usage import Rates

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

class treeTests(TestCase):
    """Tests for top-level functions."""
    def test_init(self):
        """Make sure keyword arguments are being passed to baseclass"""
        node = RangeNode(LeafRange=1, Id=2, Name='foo', Length=42)
        self.assertEqual(node.LeafRange, 1)
        self.assertEqual(node.Id, 2)
        self.assertEqual(node.Name, 'foo')
        self.assertEqual(node.Length, 42)

    def test_balanced_breakpoints(self):
        """balanced_breakpoints should produce expected arrays."""
        self.assertRaises(ValueError, balanced_breakpoints, 1)
        self.assertEqual(balanced_breakpoints(2), array([0]))
        self.assertEqual(balanced_breakpoints(4), array([1,0,2]))
        self.assertEqual(balanced_breakpoints(8), \
            array([3,1,5,0,2,4,6]))
        self.assertEqual(balanced_breakpoints(16), \
            array([7,3,11,1,5,9,13,0,2,4,6,8,10,12,14]))
        self.assertEqual(balanced_breakpoints(32), \
            array([15,7,23,3,11,19,27,1,5,9,13,17,21,25,29,\
            0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30]))

    def test_BalancedTree(self):
        """BalancedTree should
        return a balanced tree"""
        b = BalancedTree(4)
        self.assertEqual(len(list(b.traverse())), 4)
        b.assignIds()
        self.assertEqual(str(b), '((0,1)4,(2,3)5)6')

    def test_RandomTree(self):
        """RandomTree should return correct number of nodes

        NOTE: all the work is done in breakpoints, which is thoroughly
        tested independently. RandomTree just makes permutations.
        """
        d = {}
        for i in range(10):
            r = RandomTree(100)
            self.assertEqual(len(list(r.traverse())), 100)
            r.assignIds()
            #make sure we get different trees each time...
            s = str(r)
            assert s not in d
            d[s] = None

    def test_CombTree(self):
        """CombTree should return correct topology"""
        c = CombTree(4)
        c.assignIds()
        self.assertEqual(str(c), '(0,(1,(2,3)4)5)6')
        c = CombTree(4, deepest_first=False)
        c.assignIds()
        self.assertEqual(str(c), '(((0,1)4,2)5,3)6')

    def test_StarTree(self):
        """StarTree should return correct star topology and # nodes"""
        t = StarTree(5)
        self.assertEqual(len(t.Children), 5)
        for c in t.Children:
            assert c.Parent is t

    def test_LineTree(self):
        """LineTree should return correct number of nodes"""
        t = LineTree(5)
        depth = 1
        curr = t
        while curr.Children:
            self.assertEqual(len(curr.Children), 1)
            depth += 1
            curr = curr.Children[0]
        self.assertEqual(depth, 5)

class RangeTreeTests(TestCase):
    """Tests of the RangeTree class."""
    def setUp(self):
        """Make some standard objects to test."""
        #Notes on sample string:
        #
        #1. trailing zeros are stripped in conversion to/from float, so result
        #   is only exactly the same without them.
        #
        #2. trailing chars (e.g. semicolon) are not recaptured in the output,
        #   so were deleted from original Newick-format string.
        #
        #3. whitespace is stripped, but is handy for formatting, so is stripped
self.sample_tree_string = """ ( ( xyz:0.28124, ( def:0.24498, mno:0.03627) A:0.1771) B:0.0487, abc:0.05925, ( ghi:0.06914, jkl:0.13776) C:0.09853) """ self.t = DndParser(self.sample_tree_string, RangeNode) self.i = self.t.indexByAttr('Name') self.sample_string_2 = '((((a,b),c),(d,e)),((f,g),h))' self.t2 = DndParser(self.sample_string_2, RangeNode) self.i2 = self.t2.indexByAttr('Name') self.sample_string_3 = '(((a,b),c),(d,e))' self.t3 = DndParser(self.sample_string_3, RangeNode) def test_str(self): """RangeNode should round-trip Newick string correctly.""" r = RangeNode() self.assertEqual(str(r), '()') #should work for tree with branch lengths set t = DndParser(self.sample_tree_string, RangeNode) expected = self.sample_tree_string.replace('\n', '') expected = expected.replace(' ', '') self.assertEqual(str(t), expected) #self.assertEqual(t.getNewick(with_distances=True), expected) #should also work for tree w/o branch lengths t2 = DndParser(self.sample_string_2, RangeNode) self.assertEqual(str(t2), self.sample_string_2) def test_traverse(self): """RangeTree traverse should visit all nodes in correct order""" t = self.t i = self.i #first, check that lengths are correct #naked traverse() only does leaves; should be 6. self.assertEqual(len(list(t.traverse())), 6) #traverse() with self_before should count all nodes. 
self.assertEqual(len(list(t.traverse(self_before=True))), 10) #traverse() with self_after should have same count as self_before self.assertEqual(len(list(t.traverse(self_after=True))), 10) #traverse() with self_before and self_after should visit internal #nodes multiple times self.assertEqual(len(list(t.traverse(True,True))), 14) #now, check that items are in correct order exp = ['xyz','def','mno','abc','ghi','jkl'] obs = [i.Name for i in t.traverse()] self.assertEqual(obs, exp) exp = [None, 'B', 'xyz', 'A', 'def', 'mno', 'abc', 'C', 'ghi', 'jkl'] obs = [i.Name for i in t.traverse(self_before=True)] self.assertEqual(obs, exp) exp = ['xyz', 'def', 'mno', 'A', 'B', 'abc', 'ghi', 'jkl', 'C', None] obs = [i.Name for i in t.traverse(self_after=True)] self.assertEqual(obs, exp) exp = [None, 'B', 'xyz', 'A', 'def', 'mno', 'A', 'B', 'abc', 'C', \ 'ghi', 'jkl', 'C', None] obs = [i.Name for i in t.traverse(self_before=True, self_after=True)] self.assertEqual(obs, exp) def test_indexByAttr(self): """RangeNode indexByAttr should make index using correct attr""" t = self.t i = self.i #check that we got the right number of elements #all elements unique, so should be same as num nodes self.assertEqual(len(i), len(list(t.traverse(self_before=True)))) #check that we got everything i_keys = i.keys() i_vals = i.values() for node in t.traverse(self_before=True): assert node.Name in i_keys assert node in i_vals #can't predict which node will have None as the key if node.Name is not None: assert i[node.Name] is node #check that it works when elements are not unique t = self.t3 for node in t.traverse(self_before=True): node.X = 'b' for node in t.traverse(): node.X = 'a' result = t.indexByAttr('X', multiple=True) self.assertEqual(len(result), 2) self.assertEqual(len(result['a']), 5) self.assertEqual(len(result['b']), 4) for n in t.traverse(): assert n in result['a'] assert not n in result['b'] def test_indexByFunc(self): """RangeNode indexByFunc should make index from function""" t = 
self.t def f(n): try: return n.Name.isupper() except AttributeError: return None i = self.i f_i = t.indexByFunc(f) self.assertEqual(len(f_i), 3) self.assertEqual(f_i[True], [i['B'], i['A'], i['C']]) self.assertEqual(f_i[False], [i['xyz'], i['def'], i['mno'], \ i['abc'], i['ghi'], i['jkl']]) self.assertEqual(f_i[None], [i[None]]) def test_assignIds(self): """RangeNode assignIds should work as expected""" t = self.t2 index = self.i2 t.assignIds() #check that ids were set correctly on the leaves for i, a in enumerate('abcdefgh'): self.assertEqual(index[a].Id, i) #check that ranges were set correctly on the leaves for i, a in enumerate('abcdefgh'): self.assertEqual(index[a].LeafRange, (i, i+1)) #check that internal ids were set correctly obs = [i.Id for i in t.traverse(self_after=True)] exp = [0,1,8,2,9,3,4,10,11,5,6,12,7,13,14] self.assertEqual(obs, exp) #check that internal ranges were set correctly obs = [i.LeafRange for i in t.traverse(self_after=True)] exp = [(0,1),(1,2),(0,2),(2,3),(0,3),(3,4),(4,5),(3,5),(0,5), \ (5,6),(6,7),(5,7),(7,8),(5,8),(0,8)] self.assertEqual(obs, exp) def test_propagateAttr(self): """RangeNode propagateAttr should send attr down tree, unless set""" t = self.t i = self.i for n in t.traverse(self_before=True): assert not hasattr(n, 'XYZ') t.XYZ = 3 t.propagateAttr('XYZ') for n in t.traverse(self_before=True): self.assertEqual(n.XYZ, 3) #shouldn't overwrite internal nodes by default a_children = list(i['A'].traverse(self_before=True)) i['A'].GHI = 5 t.GHI = 1 t.propagateAttr('GHI') for n in t.traverse(self_before=True): if n in a_children: self.assertEqual(n.GHI, 5) else: self.assertEqual(n.GHI, 1) t.GHI = 0 t.propagateAttr('GHI', overwrite=True) for n in t.traverse(self_before=True): self.assertEqual(n.GHI, 0) def test_delAttr(self): """RangeNode delAttr should delete attr from self and children""" t = self.t2 for n in t.traverse(self_before=True): assert hasattr(n, 'Name') t.delAttr('Name') for n in t.traverse(self_before=True): assert 
not hasattr(n, 'Name') def test_accumulateAttr(self): """RangeNode accumulateAttr should accumulate attr in right direction""" t = self.t3 #test towards_leaves (the default) f = lambda a, b: b + 1 for n in t.traverse(self_before=True): n.Level = 0 t.accumulateAttr('Level', f=f) levels = [i.Level for i in t.traverse(self_before=True)] self.assertEqual(levels, [0,1,2,3,3,2,1,2,2]) for n in t.traverse(self_before=True): n.Level=0 #test away from leaves f = lambda a, b : max(a, b+1) for n in t.traverse(self_before=True): n.Level=0 t.accumulateAttr('Level', towards_leaves=False,f=f) levels = [i.Level for i in t.traverse(self_before=True)] self.assertEqual(levels, [3,2,1,0,0,0,1,0,0]) def test_accumulateChildAttr(self): """RangeNode accumulateChildAttr should work as expected""" t = self.t2 i = self.i2 i['a'].x = 3 i['b'].x = 4 i['d'].x = 0 i['f'].x = 1 i['g'].x = 1 i['h'].x = 1 t.accumulateChildAttr('x', f=mul) self.assertEqual([i.__dict__.get('x', None) for i in \ t.traverse(self_after=True)], [3, 4, 12, None, 12, 0, None, 0, 0, 1, 1, 1, 1, 1, 0]) t.accumulateChildAttr('x', f=add) self.assertEqual([i.__dict__.get('x', None) for i in \ t.traverse(self_after=True)], [3, 4, 7, None, 7, 0, None, 0, 7, 1, 1, 2, 1, 3, 10]) def test_assignLevelsFromRoot(self): """RangeNode assignLevelsFromRoot should match hand-calculated levels""" t = self.t3 t.assignLevelsFromRoot() levels = [i.Level for i in t.traverse(self_before=True)] self.assertEqual(levels, [0,1,2,3,3,2,1,2,2]) def test_assignLevelsFromLeaves(self): """RangeNode assignLevelsFromLeaves should match hand-calculated levels""" t = self.t3 t.assignLevelsFromLeaves() levels = [i.Level for i in t.traverse(self_before=True)] self.assertEqual(levels, [3,2,1,0,0,0,1,0,0]) t.assignLevelsFromLeaves(use_min=True) levels = [i.Level for i in t.traverse(self_before=True)] self.assertEqual(levels, [2,1,1,0,0,0,1,0,0]) def test_attrToList(self): """RangeNode attrToList should return correct list of attr""" t = self.t3 t.assignIds() 
t.assignLevelsFromRoot() #make sure nodes are in the order we expect self.assertEqual([n.Id for n in t.traverse(self_before=True)], [8,6,5,0,1,2,7,3,4]) #by default, should return list containing all nodes obs = t.attrToList('Level') self.assertEqual(obs, [3,3,2,2,2,2,1,1,0]) #should be able to do leaves only if specified obs = t.attrToList('Level', leaves_only=True) self.assertEqual(obs, [3,3,2,2,2]) #should be able to specify larger size obs=t.attrToList('Level', size=12) self.assertEqual(obs, [3,3,2,2,2,2,1,1,0,None,None,None]) #should be able to set default obs=t.attrToList('Level', default='x', size=12) self.assertEqual(obs, [3,3,2,2,2,2,1,1,0,'x','x','x']) def test_attrFromList(self): """RangeNode attrFromList should set values correctly""" t = self.t3 t.assignIds() #by default, should set all nodes from array t.attrFromList('Level', [3,3,2,2,2,2,1,1,0]) self.assertEqual([n.Level for n in t.traverse(self_before=True)], \ [0,1,2,3,3,2,1,2,2]) #should also work if we choose to set only the leaves (rest should #stay at default values) t.Level = -1 t.propagateAttr('Level', overwrite=True) t.attrFromList('Level', [3,3,2,2,2,2,1,1,0], leaves_only=True) self.assertEqual([n.Level for n in t.traverse(self_before=True)], \ [-1,-1,-1,3,3,2,-1,2,2]) def test_toBreakpoints(self): """RangeNode toBreakpoints should give expected list""" t = self.t2 t.assignIds() self.assertEqual(t.toBreakpoints(), [4,2,1,0,3,6,5]) def test_fromBreakpoints(self): """RangeNode fromBreakpoints should have correct topology""" breakpoints = [4,2,1,0,3,6,5] t = RangeNode.fromBreakpoints(breakpoints) #check number of leaves self.assertEqual(len(list(t.traverse())), 8) self.assertEqual(len(list(t.traverse(self_before=True))), 15) #check that leaves were created in right order self.assertEqual([i.Id for i in t.traverse()], range(8)) #check that whole topology is right wrt ids... 
nodes = list(t.traverse(self_before=True)) obs = [i.Id for i in nodes] exp = [8, 9, 11, 13, 0, 1, 2, 12, 3, 4, 10, 14, 5, 6, 7] self.assertEqual(obs, exp) #...and ranges obs = [i.LeafRange for i in nodes] exp = [(0,8),(0,5),(0,3),(0,2),(0,1),(1,2),(2,3),(3,5),(3,4),(4,5), \ (5,8),(5,7),(5,6),(6,7),(7,8)] self.assertEqual(obs, exp) def test_leafLcaDepths(self): """RangeNode leafLcaDepths should return expected depths""" t = self.t3 result = t.leafLcaDepths() self.assertEqual(result, array([[0,1,2,3,3], [1,0,2,3,3], [2,2,0,3,3], [3,3,3,0,1], [3,3,3,1,0]])) def test_randomNode(self): """RandomNode should hit all nodes equally""" t = self.t3 result = {} for i in range(100): ans = id(t.randomNode()) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 9) for node in t.traverse(self_before=True): assert id(node) in result def test_randomLeaf(self): """RandomLeaf should hit all leaf nodes equally""" t = self.t3 result = {} for i in range(100): ans = id(t.randomLeaf()) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 5) for node in t.traverse(): assert id(node) in result def test_randomNodeWithNLeaves(self): """RandomNodeWithNLeaves should return node with correct # leaves""" t = self.t3 #check that only the root gets selected with 5 leaves result = {} for i in range(20): ans = id(t.randomNodeWithNLeaves(5)) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 1) assert id(t) in result #check that nothing has 6 or 4 (for this tree) leaves self.assertRaises(KeyError, t.randomNodeWithNLeaves, 6) self.assertRaises(KeyError, t.randomNodeWithNLeaves, 4) #check that it works with fewer than 5 leaves #SINGLE LEAF: result = {} for i in range(40): ans = id(t.randomNodeWithNLeaves(1)) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 5) self.assertEqual(sum(result.values()), 40) #TWO LEAVES: result = {} for i in range(20): ans = 
id(t.randomNodeWithNLeaves(2)) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 2) self.assertEqual(sum(result.values()), 20) #THREE LEAVES: result = {} for i in range(20): ans = id(t.randomNodeWithNLeaves(3)) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 1) self.assertEqual(sum(result.values()), 20) def test_randomNodeAtLevel(self): """RangeNode randomNodeAtLevel should return random node at correct level""" t = self.t3 #LEAVES: result = {} for i in range(40): ans = id(t.randomNodeAtLevel(0)) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 5) self.assertEqual(sum(result.values()), 40) #BACK ONE LEVEL: result = {} for i in range(20): ans = id(t.randomNodeAtLevel(1)) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 2) self.assertEqual(sum(result.values()), 20) #BACK TWO LEVELS: result = {} for i in range(20): ans = id(t.randomNodeAtLevel(2)) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 1) self.assertEqual(sum(result.values()), 20) #BACK THREE LEVELS (to root): result = {} for i in range(20): ans = id(t.randomNodeAtLevel(3)) if ans not in result: result[ans] = 0 result[ans] += 1 self.assertEqual(len(result), 1) self.assertEqual(sum(result.values()), 20) self.assertEqual(result.keys()[0], id(t)) def test_outgroupLast(self): """RangeNode outgroupLast should reorder nodes to put outgroup last""" t = self.t3 a, b, c, d, e = t.traverse() self.assertEqual(t.outgroupLast(c,a,b), (a, b, c)) self.assertEqual(t.outgroupLast(c,b,a), (b, a, c)) self.assertEqual(t.outgroupLast(b,d,a), (b, a, d)) self.assertEqual(t.outgroupLast(c,d,e), (d, e, c)) self.assertEqual(t.outgroupLast(a,d,e), (d, e, a)) self.assertEqual(t.outgroupLast(a,d,b), (a, b, d)) #check that it works if we suppress the cache self.assertEqual(t.outgroupLast(c,a,b, False), (a, b, c)) self.assertEqual(t.outgroupLast(c,b,a, False), 
(b, a, c)) self.assertEqual(t.outgroupLast(b,d,a, False), (b, a, d)) self.assertEqual(t.outgroupLast(c,d,e, False), (d, e, c)) self.assertEqual(t.outgroupLast(a,d,e, False), (d, e, a)) self.assertEqual(t.outgroupLast(a,d,b, False), (a, b, d)) def test_filter(self): """RangeNode filter should keep or omit selected nodes.""" t_orig = self.t2 t = deepcopy(t_orig) idx = t.indexByAttr('Name') to_keep = map(idx.__getitem__, 'abch') curr_leaves = list(t.traverse()) t.filter(to_keep) curr_leaves = list(t.traverse()) for i in to_keep: assert i in curr_leaves for i in map(idx.__getitem__, 'defg'): assert i not in curr_leaves #note that it collapses one-child nodes self.assertEqual(str(t), '(((a,b),c),h)') #test same thing but omitting t = deepcopy(t_orig) idx = t.indexByAttr('Name') to_omit = map(idx.__getitem__, 'abch') t.filter(to_omit, keep=False) curr_leaves = list(t.traverse()) for i in to_omit: assert i not in curr_leaves for i in map(idx.__getitem__, 'defg'): assert i in curr_leaves #note that it collapses one-child nodes self.assertEqual(str(t), '((d,e),(f,g))') #test that it works with internal nodes t = deepcopy(t_orig) idx = t.indexByAttr('Name') to_omit = [idx['a'].Parent.Parent] t.filter(to_omit, keep=False) self.assertEqual(str(t), '((d,e),((f,g),h))') #test that it adds branch lengths t = deepcopy(t_orig) idx = t.indexByAttr('Name') for i in t.traverse(self_after=True): i.BranchLength = 1 to_omit = map(idx.__getitem__, 'abdefg') t.filter(to_omit, keep=False) self.assertEqual(str(t), '(c,h)') #test that it got rid of the temporary '_selected' attribute for node in t.traverse(self_before=True): assert not hasattr(node, '_selected') #if nothing valid in to_keep, should return empty tree t = deepcopy(t_orig) idx = t.indexByAttr('Name') to_keep = [] t.filter(to_keep, keep=True) curr_leaves = list(t.traverse()) self.assertEqual(len(curr_leaves), 0) #if nothing valid in to_keep, should return empty tree t = deepcopy(t_orig) idx = t.indexByAttr('Name') to_keep = list('abcde') 
#note: just labels, not nodes t.filter(to_keep, keep=True) curr_leaves = list(t.traverse()) self.assertEqual(len(curr_leaves), 0) def test_addChildren(self): """RangeNode addChildren should add specified # children to list""" t = RangeNode() t2 = RangeNode(Parent=t) t.addChildren(5) self.assertEqual(len(t.Children), 6) assert t.Children[0] is t2 for c in t.Children: assert c.Parent is t class OldPhyloNodeTests(TestCase): """Tests of the PhyloNode class -- these are all now methods of RangeNode.""" def setUp(self): """Make a couple of standard trees""" self.t1 = DndParser('((a,(b,c)),(d,e))', RangeNode) #self.t1 indices: ((0,(1,2)5)6,(3,4)7)8 def test_makeIdIndex(self): """RangeNode makeIdIndex should assign ids to every node""" self.t1.makeIdIndex() result = self.t1.IdIndex nodes = list(self.t1.traverse(self_before=True)) #check we got an entry for each node self.assertEqual(len(result), len(nodes)) #check the ids are in the result for i in nodes: assert hasattr(i, 'Id') assert i.Id in result def test_assignQ_single_passed(self): """RangeNode assignQ should propagate single Q param down tree""" #should work if Q explicitly passed t = self.t1 Q = ['a'] t.assignQ(Q) for node in t.traverse(self_before=True): assert node.Q is Q def test_assignQ_single_set(self): """RangeNode assignQ should propagate single Q if set""" t = self.t1 Q = ['a'] assert not hasattr(t, 'Q') t.Q = Q t.assignQ() for node in t.traverse(self_before=True): assert node.Q is Q def test_assignQ_single_overwrite(self): """RangeNode assignQ should overwrite root Q if new Q passed""" t = self.t1 Q = ['a'] Q2 = ['b'] t.Q = Q t.assignQ(Q2) for node in t.traverse(self_before=True): assert node.Q is Q2 assert not node.Q is Q def test_assignQ_multiple(self): """RangeNode assignQ should propagate multiple Qs""" t = self.t1 Q1 = ['a'] Q2 = ['b'] Q3 = ['c'] t.makeIdIndex() t.IdIndex[7].Q = Q1 t.IdIndex[5].Q = Q2 t.assignQ(Q3) result = [i.Q for i in t.traverse(self_after=True)] assert t.Q is Q3 self.assertEqual(result, 
[Q3,Q2,Q2,Q2,Q3,Q1,Q1,Q1,Q3]) def test_assignQ_multiple_overwrite(self): """RangeNode assignQ should allow overwrite""" t = self.t1 Q1 = ['a'] Q2 = ['b'] Q3 = ['c'] t.makeIdIndex() t.IdIndex[7].Q = Q1 t.IdIndex[5].Q = Q2 t.assignQ(Q3, overwrite=True) for i in t.traverse(self_after=True): assert i.Q is Q3 def test_assignQ_special(self): """RangeNode assignQ should work with special Qs""" t = self.t1 Q1 = 'a' Q2 = 'b' Q3 = 'c' t.makeIdIndex() special = {7:Q1, 1:Q2} #won't work if no Q at root self.assertRaises(ValueError, t.assignQ, special_qs=special) t.assignQ(Q3, special_qs=special) result = [i.Q for i in t.traverse(self_after=True)] self.assertEqual(result, ['c','b','c','c','c','a','a','a','c']) def test_assignP(self): """RangeNode assignP should work when Qs set.""" t = self.t1 for i in t.traverse(self_before=True): i.Length = random() * 0.5 #range 0 to 0.5 t.Q = Rates.random(DnaPairs) t.assignQ() t.assignP() t.assignIds() for node in t.traverse(self_after=True): if node.Parent is not None: self.assertFloatEqual(average(1-diag(node.P._data), axis=0), \ node.Length) def test_assignLength(self): """RangeNode assignLength should set branch length""" t = self.t1 t.assignLength(0.3) for i in t.traverse(self_before=True): self.assertEqual(i.Length, 0.3) def test_evolve(self): """RangeNode evolve should work on a starting vector""" t = self.t1 t.Q = Rates.random(DnaPairs) t.assignQ() t.assignLength(0.1) t.assignP() start = array([1,0,2,1,0,0,2,1,2,0,1,2,1,0,2,0,0,3,0,2,1,0,3,1,0,2,0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3]) t.evolve(start) for i in t.traverse(): self.assertEqual(len(i.Sequence), len(start)) self.assertNotEqual(i.Sequence, start) #WARNING: Doesn't test base freqs etc. at this point, but those aren't #really evolve()'s responsibility (tested as self.P.mutate(seq) once #P is set, which we've already demonstrated works.) 
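test_evolve above only checks sequence lengths and inequality; conceptually, once assignP() has set a substitution-probability matrix on each branch, evolving a sequence means drawing each child-site state from the row of P indexed by the parent state (the work the source attributes to P.mutate). A minimal stdlib-only sketch of that per-site sampling, as an illustration rather than the cogent.seqsim implementation (which operates on numpy index arrays):

```python
import random

def evolve_site(parent_state, P, rng=random):
    # Row P[parent_state] holds the probability of each child state
    # given the parent state; draw one child state from that row.
    states = range(len(P))
    return rng.choices(states, weights=P[parent_state], k=1)[0]

def evolve_seq(parent_seq, P, rng=random):
    # Sites evolve independently: one draw per position.
    return [evolve_site(s, P, rng) for s in parent_seq]
```

With an identity P (zero branch length) the child equals the parent; off-diagonal mass introduces substitutions, which is why the assertNotEqual checks above pass for nonzero branch lengths.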
def test_assignPs(self): """RangeNode assignPs should assign multiple scaled P matrices""" t = self.t1 for i in t.traverse(self_before=True): i.Length = random() * 0.5 #range 0 to 0.5 t.Q = Rates.random(DnaPairs) t.assignQ() t.assignPs([1, 0.5, 0.25]) t.assignIds() for node in t.traverse(self_after=True): if node.Parent is not None: self.assertEqual(len(node.Ps), 3) self.assertFloatEqual(average(1-diag(node.Ps[0]._data), axis=0), \ node.Length) self.assertFloatEqual(average(1-diag(node.Ps[1]._data), axis=0), \ 0.5*node.Length) self.assertFloatEqual(average(1-diag(node.Ps[2]._data), axis=0), \ 0.25*node.Length) def test_evolveSeqs(self): """PhyloNode evolveSeqs should evolve multiple sequences""" t = self.t1 for i in t.traverse(self_before=True): i.Length = 0.5 t.Q = Rates.random(DnaPairs) t.assignQ() t.assignPs([1, 1, 0.1]) t.assignIds() orig_seqs = [array(i) for i in [randint(0,4,200), randint(0,4,200), \ randint(0,4,200)]] t.evolveSeqs(orig_seqs) for node in t.traverse(): #only look at leaves if node.Parent is not None: self.assertEqual(len(node.Sequences), 3) for orig, new in zip(orig_seqs, node.Sequences): self.assertEqual(len(orig), len(new)) self.assertNotEqual(orig, new) assert sum(orig_seqs[1]!=node.Sequences[1]) > \ sum(orig_seqs[2]!=node.Sequences[2]) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_seqsim/test_usage.py #!/usr/bin/env python """Unit tests for usage and substitution matrices. 
""" from cogent.util.unit_test import TestCase, main from cogent.core.moltype import RNA from cogent.core.usage import RnaBases, DnaBases, RnaPairs, DnaPairs from cogent.core.alphabet import Alphabet from cogent.core.sequence import ModelRnaSequence as RnaSequence, \ ModelRnaCodonSequence from cogent.seqsim.usage import Usage, DnaUsage, RnaUsage, PairMatrix, Counts,\ Probs, Rates, goldman_q_dna_pair, goldman_q_rna_pair,\ goldman_q_dna_triple, goldman_q_rna_triple from numpy import average, asarray, sqrt, identity, diagonal, trace, \ array, sum from cogent.maths.matrix_logarithm import logm from cogent.maths.matrix_exponentiation import FastExponentiator as expm #need to find test directory to get access to the tests of the Freqs interface try: from os import getcwd from sys import path from os.path import sep,join test_path = getcwd().split(sep) index = test_path.index('tests') fields = test_path[:index+1] + ["test_maths"] test_path = sep + join(*fields) path.append(test_path) from test_stats.test_util import StaticFreqsTestsI my_alpha = Alphabet('abcde') class myUsage(Usage): Alphabet = my_alpha class UsageAsFreqsTests(StaticFreqsTestsI, TestCase): """Note that the remaining Usage methods are tested here.""" ClassToTest=myUsage except ValueError: #couldn't find directory pass __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" NUM_TESTS = 10 #for randomization trials class UsageTests(TestCase): """Tests of the Usage object.""" def setUp(self): """Defines some standard test items.""" self.ab = Alphabet('ab') class abUsage(Usage): Alphabet = self.ab self.abUsage = abUsage def test_init(self): """Usage init should succeed only in subclasses""" self.assertRaises(TypeError, Usage, [1,2,3,4]) self.assertEqual(self.abUsage().items(), [('a',0),('b',0)]) 
self.assertEqual(self.abUsage([5,6]).items(), [('a',5.0),('b',6.0)]) #should also construct from seq, if not same length as freqs self.assertEqual(self.abUsage([0,0,1,1,1,0,1,1]).items(), \ [('a',3),('b',5)]) def test_getitem(self): """Usage getitem should get item via alphabet""" u = self.abUsage([3,4]) self.assertEqual(u['a'], 3) self.assertEqual(u['b'], 4) def test_setitem(self): """Usage setitem should set item via alphabet""" u = self.abUsage([3,4]) self.assertEqual(u['a'], 3) u['a'] = 10 self.assertEqual(u['a'], 10) u['b'] += 5 self.assertEqual(u['a'], 10) self.assertEqual(u['b'], 9) def test_str(self): """Usage str should print like equivalent list""" u = self.abUsage() self.assertEqual(str(u), "[('a', 0.0), ('b', 0.0)]") u = self.abUsage([1,2.0]) self.assertEqual(str(u), \ "[('a', 1.0), ('b', 2.0)]") def test_iter(self): """Usage iter should iterate over keys""" u = self.abUsage([1,2]) x = tuple(u) self.assertEqual(x, ('a', 'b')) #should be able to convert to dict via iter d = dict(u) self.assertEqual(dict(u), {'a':1,'b':2}) def test_cmp(self): """Usage cmp should work as expected""" a = self.abUsage([3,4]) b = self.abUsage([3,2]) c = self.abUsage([3,4]) self.assertEqual(a, a) self.assertNotEqual(a,b) self.assertEqual(a,c) self.assertEqual(a==a, True) self.assertEqual(a!=a, False) self.assertEqual(a==b, False) self.assertEqual(a!=b, True) self.assertEqual(a==c, True) self.assertEqual(a!=c, False) self.assertEqual(a==3, False) self.assertEqual(a!=3, True) def test_add(self): """Usage add should add two sets of counts together""" u, v = self.abUsage([1,2]), self.abUsage([6,4]) x = self.abUsage([7,6]) y = self.abUsage([7,6]) self.assertEqual(x, y) self.assertEqual(x, u+v) self.assertEqual(u + v, self.abUsage([7,6])) def test_sub(self): """Usage sub should subtract one set of counts from the other""" u, v = self.abUsage([1,2]), self.abUsage([6,4]) self.assertEqual(v-u, self.abUsage([5,2])) def test_mul(self): """Usage mul should multiply usage by a scalar""" u 
= self.abUsage([0,4]) self.assertEqual(u*3, self.abUsage([0,12])) def test_div(self): """Usage div should divide usage by scalar (unsafely)""" u = self.abUsage([0,4]) self.assertEqual(u/2, self.abUsage([0,2])) self.assertEqual(u/8, self.abUsage([0.0,0.5])) #note: don't need to divide by floating point to get fractions self.assertEqual(u/8.0, self.abUsage([0.0,0.5])) def test_scale_sum(self): """Usage scale_sum should scale usage to specified sum""" u = self.abUsage([1,3]) self.assertEqual(u.scale_sum(12), self.abUsage([3.0, 9.0])) self.assertEqual(u.scale_sum(1), self.abUsage([0.25,0.75])) #default is sum to 1 self.assertEqual(u.scale_sum(), self.abUsage([0.25,0.75])) def test_scale_max(self): """Usage scale_max should scale usage to specified max""" u = self.abUsage([1,3]) self.assertEqual(u.scale_max(12), self.abUsage([4.0, 12.0])) self.assertEqual(u.scale_max(1), self.abUsage([1/3.0,1.0])) #default is max to 1 self.assertEqual(u.scale_max(), self.abUsage([1/3.0,1.0])) def test_probs(self): """Usage probs should scale usage to sum to 1""" u = self.abUsage([1,3]) self.assertEqual(u.probs(), self.abUsage([0.25,0.75])) def test_randomIndices(self): """Usage randomIndices should return correct sequence.""" d = DnaUsage([0.25, 0.5, 0.1, 0.15]) s = d.randomIndices(7, [0, 0.49, 1, 0.74, 0.76, 0.86, 0.2]) self.assertEqual(s, array([0,1,3,1,2,3,0])) s = d.randomIndices(10000) u, c, a, g = [asarray(s==i, 'int32') for i in [0,1,2,3]] assert 2300 < sum(u) < 2700 assert 4800 < sum(c) < 5200 assert 800 < sum(a) < 1200 assert 1300 < sum(g) < 1700 def test_fromSeqData(self): """Usage fromSeqData should construct from a sequence object w/ data""" class o(object): pass s = o() s._data = array([0,0,0,1]) self.assertEqual(self.abUsage.fromSeqData(s), self.abUsage([3,1])) def test_fromArray(self): """Usage fromArray should construct from array holding seq of indices""" s = array([0,0,0,1]) self.assertEqual(self.abUsage.fromArray(s), self.abUsage([3,1])) def test_get(self): """Usage 
get should behave like dict""" u = self.abUsage([3,4]) self.assertEqual(u.get('a', 5), 3) self.assertEqual(u.get('b', 5), 4) self.assertEqual(u.get('x', 5), 5) def test_values(self): """Usage values should return list of values in alphabet order""" u = self.abUsage([3,4]) self.assertEqual(u.values(), [3,4]) def test_keys(self): """Usage keys should return list of symbols in alphabet order""" u = self.abUsage([3,4]) self.assertEqual(u.keys(), ['a','b']) def test_items(self): """Usage items should return list of key-value pairs""" u = self.abUsage([3,4]) self.assertEqual(u.items(), [('a',3),('b',4)]) def test_entropy(self): """Usage items should calculate their Shannon entropy""" #two equal choices implies one bit of entropy u = RnaUsage([1,1,0,0]) self.assertEqual(u.entropy(), 1) u = RnaUsage([10,10,0,0]) self.assertEqual(u.entropy(), 1) #four equal choices implies two bits u = RnaUsage([3,3,3,3]) self.assertEqual(u.entropy(), 2) #only one choice -> no entropy u = RnaUsage([3,0,0,0]) self.assertEqual(u.entropy(), 0) #empty usage also has no entropy u = RnaUsage([0,0,0,0]) self.assertEqual(u.entropy(), 0) #calculated this one by hand u = RnaUsage([.5,.3,.1,.1]) self.assertFloatEqual(u.entropy(),1.6854752972273346) class PairMatrixTests(TestCase): """Tests of the PairMatrix base class.""" def setUp(self): """Define standard alphabet and matrices for tests.""" self.ab = Alphabet('ab') self.ab_pairs = self.ab*self.ab self.empty = PairMatrix([0,0,0,0], self.ab_pairs) self.named = PairMatrix([[1,2],[3,4]], self.ab_pairs, 'name') def test_init(self): """PairMatrix init requires data and alphabet""" #should only care about number of elements, not shape p = PairMatrix([1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8], RnaPairs) assert p.Alphabet is RnaPairs self.assertEqual(len(p._data), 4) self.assertEqual(len(p._data.flat), 16) self.assertEqual(p._data[0], array([1,2,3,4])) self.assertEqual(p._data[1], array([5,6,7,8])) def test_init_bad(self): """PairMatrix init should fail if data wrong 
length""" self.assertRaises(ValueError, PairMatrix, [1,2,3,4], RnaPairs) #should also require alphabet self.assertRaises(TypeError, PairMatrix, [1,2,3,4]) def test_toMatlab(self): """PairMatrix toMatlab should return correct format string""" self.assertEqual(self.empty.toMatlab(), "m=[0.0 0.0;\n0.0 0.0];\n") self.assertEqual(self.named.toMatlab(), \ "name=[1.0 2.0;\n3.0 4.0];\n") def test_str(self): """PairMatrix __str__ should return string corresponding to data""" self.assertEqual(str(self.named), str(self.named._data)) def test_repr(self): """PairMatrix __repr__ should return reconstructable string""" self.assertEqual(repr(self.named), \ 'PairMatrix('+ repr(self.named._data) + ',' +\ repr(self.ab_pairs)+",'name')") def test_getitem(self): """PairMatrix __getitem__ should translate indices and get from array""" n = self.named self.assertEqual(n['a'], array([1,2])) self.assertEqual(n['b'], array([3,4])) self.assertEqual(n['a','a'], 1) self.assertEqual(n['a','b'], 2) self.assertEqual(n['b','a'], 3) self.assertEqual(n['b','b'], 4) #WARNING: m[a][b] doesn't work b/c indices not translated! #must access as m[a,b] instead. 
try: x = n['a']['b'] except ValueError: pass #should work even if SubAlphabets not the same a = Alphabet('ab') x = Alphabet('xyz') j = a * x m = PairMatrix([1,2,3,4,5,6], j) self.assertEqual(m['a','x'], 1) self.assertEqual(m['a','y'], 2) self.assertEqual(m['a','z'], 3) self.assertEqual(m['b','x'], 4) self.assertEqual(m['b','y'], 5) self.assertEqual(m['b','z'], 6) #should work even if SubAlphabets are different types a = Alphabet([1,2,3]) b = Alphabet(['abc', 'xyz']) j = a * b m = PairMatrix([1,2,3,4,5,6], j) self.assertEqual(m[1,'abc'], 1) self.assertEqual(m[1,'xyz'], 2) self.assertEqual(m[2,'abc'], 3) self.assertEqual(m[2,'xyz'], 4) self.assertEqual(m[3,'abc'], 5) self.assertEqual(m[3,'xyz'], 6) self.assertEqual(list(m[2]), [3,4]) #gives KeyError if single item not present in first level self.assertRaises(KeyError, m.__getitem__, 'x') def test_empty(self): """PairMatrix empty classmethod should produce correct class""" p = PairMatrix.empty(self.ab_pairs) self.assertEqual(p._data.flat, array([0,0,0,0])) self.assertEqual(p._data, array([[0,0],[0,0]])) self.assertEqual(p._data.shape, (2,2)) def test_eq(self): """Pairmatrix test for equality should check all elements""" p = self.ab_pairs a = PairMatrix.empty(p) b = PairMatrix.empty(p) assert a is not b self.assertEqual(a, b) c = PairMatrix([1,2,3,4], p) d = PairMatrix([1,2,3,4], p) assert c is not d self.assertEqual(c, d) self.assertNotEqual(a, c) #Note: still compare equal if alphabets are different x = Alphabet('xy') x = x*x y = PairMatrix([1,2,3,4], x) self.assertEqual(y, c) #should check all elements, not just first c = PairMatrix([1,1,1,1], p) d = PairMatrix([1,1,1,4], p) assert c is not d self.assertNotEqual(c, d) def test_ne(self): """PairMatrix test for inequality should check all elements""" p = self.ab_pairs a = PairMatrix.empty(p) b = PairMatrix.empty(p) c = PairMatrix([1,2,3,4], p) d = PairMatrix([1,2,3,4], p) assert a != c assert a == b assert c == d #Note: still compare equal if alphabets are different x 
= Alphabet('xy') x = x*x y = PairMatrix([1,2,3,4], x) assert y == c #should check all elements, not just first c = PairMatrix([1,1,1,1], p) d = PairMatrix([1,1,1,4], p) assert c != d def test_iter(self): """PairMatrix __iter__ should iterate over rows.""" p = self.ab_pairs c = PairMatrix([1,2,3,4], p) l = list(c) self.assertEqual(len(l), 2) self.assertEqual(list(l[0]), [1,2]) self.assertEqual(list(l[1]), [3,4]) def test_len(self): """PairMatrix __len__ should return number of rows""" p = self.ab_pairs c = PairMatrix([1,2,3,4], p) self.assertEqual(len(c), 2) class CountsTests(TestCase): """Tests of the Counts class, including inferring counts from sequences.""" def test_toProbs(self): """Counts toProbs should return valid prob matrix.""" c = Counts([1,2,3,4,2,2,2,2,0.2,0.4,0.6,0.8,1,0,0,0], RnaPairs) p = c.toProbs() assert isinstance(p, Probs) self.assertEqual(p, Probs([0.1,0.2,0.3,0.4,0.25,0.25,0.25,0.25, \ 0.1,0.2,0.3,0.4,1.0,0.0,0.0,0.0], RnaPairs)) self.assertEqual(p['U','U'], 0.1) self.assertEqual(p['G','U'], 1.0) self.assertEqual(p['G','G'], 0.0) def test_fromPair(self): """Counts fromPair should return correct counts.""" s = Counts.fromPair( RnaSequence('UCCGAUCGAUUAUCGGGUACGUA'), \ RnaSequence('GUCGAGUAUAGCGUACGGCUACG'), RnaPairs) assert isinstance(s, Counts) vals = [ ('U','U',0),('U','C',2.5),('U','A',1),('U','G',2.5), ('C','U',2.5),('C','C',1),('C','A',1),('C','G',0.5), ('A','U',1),('A','C',1),('A','A',1),('A','G',2), ('G','U',2.5),('G','C',0.5),('G','A',2),('G','G',2), ] for i, j, val in vals: self.assertFloatEqual(s[i,j], val) #check that it works for big seqs s = Counts.fromPair( RnaSequence('UCAG'*1000), \ RnaSequence('UGAG'*1000), RnaPairs) assert isinstance(s, Counts) vals = [ ('U','U',1000),('U','C',0),('U','A',0),('U','G',0), ('C','U',0),('C','C',0),('C','A',0),('C','G',500), ('A','U',0),('A','C',0),('A','A',1000),('A','G',0), ('G','U',0),('G','C',500),('G','A',0),('G','G',1000), ] for i, j, val in vals: self.assertFloatEqual(s[i,j], val) #check 
that it works for codon seqs s1 = ModelRnaCodonSequence('UUCGCG') s2 = ModelRnaCodonSequence('UUUGGG') c = Counts.fromPair(s1, s2, RNA.Alphabet.Triples**2) self.assertEqual(c._data.sum(), 2) self.assertEqual(c._data[0,1], 0.5) self.assertEqual(c._data[1,0], 0.5) self.assertEqual(c._data[55,63], 0.5) self.assertEqual(c._data[63,55], 0.5) def test_fromTriple(self): """Counts fromTriple should return correct counts.""" cft = Counts.fromTriple rs = RnaSequence A, C, G, U = map(rs, 'ACGU') #counts if different from both the other groups s = cft(A, C, C, RnaPairs) assert isinstance(s, Counts) self.assertEqual(s['C','A'], 1) self.assertEqual(s['A','C'], 0) self.assertEqual(s['C','C'], 0) #try it with longer sequences AAA, CCC = map(rs, ['AAA', 'CCC']) s = cft(AAA, CCC, CCC, RnaPairs) self.assertEqual(s['C','A'], 3) self.assertEqual(s['A','C'], 0) #doesn't count if all three differ ACG, CGA, GAC = map(rs, ['ACG','CGA','GAC']) s = cft(ACG, CGA, GAC, RnaPairs) self.assertEqual(s['C','A'], 0) self.assertEqual(s['A','C'], 0) self.assertEqual(s, Counts.empty(RnaPairs)) #counts as no change if same as other sequence... 
s = cft(AAA, AAA, CCC, RnaPairs) self.assertEqual(s['A','A'], 3) self.assertEqual(s['A','C'], 0) #...or same as the outgroup s = cft(AAA, CCC, AAA, RnaPairs) self.assertEqual(s['A','A'], 3) self.assertEqual(s['A','C'], 0) #spot-check a mixed example s = cft( \ rs('AUCGCUAGCAUACGUCA'), rs('AAGCUGCGUAGCGCAUA'), rs('GCGCAUAUGACGAUAGC'), RnaPairs ) vals = [ ('U','U',1),('U','C',0),('U','A',0),('U','G',0), ('C','U',0),('C','C',0),('C','A',0),('C','G',1), ('A','U',1),('A','C',0),('A','A',4),('A','G',0), ('G','U',0),('G','C',1),('G','A',0),('G','G',1), ] for i, j, val in vals: self.assertFloatEqual(s[i,j], val) #check a long sequence s = cft( \ rs('AUCGCUAGCAUACGUCA'*1000), rs('AAGCUGCGUAGCGCAUA'*1000), rs('GCGCAUAUGACGAUAGC'*1000), RnaPairs ) vals = [ ('U','U',1000),('U','C',0),('U','A',0),('U','G',0), ('C','U',0),('C','C',0),('C','A',0),('C','G',1000), ('A','U',1000),('A','C',0),('A','A',4000),('A','G',0), ('G','U',0),('G','C',1000),('G','A',0),('G','G',1000), ] for i, j, val in vals: self.assertFloatEqual(s[i,j], val) #check that it works when forced to use both variants of fromTriple s = cft( \ rs('AUCGCUAGCAUACGUCA'*1000), rs('AAGCUGCGUAGCGCAUA'*1000), rs('GCGCAUAUGACGAUAGC'*1000), RnaPairs, threshold=0 #forces "large" method ) vals = [ ('U','U',1000),('U','C',0),('U','A',0),('U','G',0), ('C','U',0),('C','C',0),('C','A',0),('C','G',1000), ('A','U',1000),('A','C',0),('A','A',4000),('A','G',0), ('G','U',0),('G','C',1000),('G','A',0),('G','G',1000), ] for i, j, val in vals: self.assertFloatEqual(s[i,j], val) s = cft( \ rs('AUCGCUAGCAUACGUCA'*1000), rs('AAGCUGCGUAGCGCAUA'*1000), rs('GCGCAUAUGACGAUAGC'*1000), RnaPairs, threshold=1e12 #forces "small" method ) vals = [ ('U','U',1000),('U','C',0),('U','A',0),('U','G',0), ('C','U',0),('C','C',0),('C','A',0),('C','G',1000), ('A','U',1000),('A','C',0),('A','A',4000),('A','G',0), ('G','U',0),('G','C',1000),('G','A',0),('G','G',1000), ] for i, j, val in vals: self.assertFloatEqual(s[i,j], val) #check that it works for codon seqs 
s1 = ModelRnaCodonSequence('UUCGCG') s2 = ModelRnaCodonSequence('UUUGGG') s3 = s2 c = Counts.fromTriple(s1, s2, s3, RNA.Alphabet.Triples**2) self.assertEqual(c._data.sum(), 2) self.assertEqual(c._data[0,1], 1) self.assertEqual(c._data[63,55], 1) class ProbsTests(TestCase): """Tests of the Probs class.""" def setUp(self): """Define an alphabet and some probs.""" self.ab = Alphabet('ab') self.ab_pairs = self.ab**2 def test_isValid(self): """Probs isValid should return True if it's a prob matrix""" a = self.ab_pairs m = Probs([0.5,0.5,1,0], a) self.assertEqual(m.isValid(), True) #fails if don't sum to 1 m = Probs([0.5, 0, 1, 0], a) self.assertEqual(m.isValid(), False) #fails if negative elements m = Probs([1, -1, 0, 1], a) self.assertEqual(m.isValid(), False) def test_makeModel(self): """Probs makeModel should return correct substitution pattern""" a = Alphabet('abc')**2 m = Probs([0.5,0.25,0.25,0.1,0.8,0.1,0.3,0.6,0.1], a) obs = m.makeModel(array([0,1,1,0,2,2])) exp = array([[0.5,0.25,0.25],[0.1,0.8,0.1],[0.1,0.8,0.1],\ [0.5,0.25,0.25],[0.3,0.6,0.1],[0.3,0.6,0.1]]) self.assertEqual(obs, exp) def test_mutate(self): """Probs mutate should return correct vector from input vector""" a = Alphabet('abc')**2 m = Probs([0.5,0.25,0.25,0.1,0.8,0.1,0.3,0.6,0.1], a) #because of fp math in accumulate, can't predict boundaries exactly #so add/subtract eps to get the result we expect eps = 1e-6 # a b b a c c a b c seq = array([0,1,1,0,2,2,0,1,2]) random_vec = array([0,.01,.8-eps,1,1,.3,.05,.9+eps,.95]) self.assertEqual(m.mutate(seq, random_vec), \ # a a b c c a a c c array([0,0,1,2,2,0,0,2,2])) #check that freq. distribution is about right seqs = array([m.mutate(seq) for i in range(1000)]) #WARNING: bool operators return byte arrays, whose sums wrap at 256! 
zero_count = asarray(seqs == 0, 'int32') sums = sum(zero_count, axis=0) #expect: 500, 100, 100, 500, 300, 300, 500, 100, 300 #std dev = sqrt(npq), which is sqrt(250), sqrt(90), sqrt(210) means = array([500, 100, 100, 500, 300, 300, 500, 100, 300]) var = array([250, 90, 90, 250, 210, 210, 250, 90, 210]) three_sd = 3 * sqrt(var) for obs, exp, sd in zip(sums, means, three_sd): assert exp - 2*sd < obs < exp + 2*sd def test_toCounts(self): """Probs toCounts should return counts object w/ right numbers""" a = Alphabet('abc')**2 m = Probs([0.5,0.25,0.25,0.1,0.8,0.1,0.3,0.6,0.1], a) obs = m.toCounts(30) assert isinstance(obs, Counts) exp = Counts([[5.,2.5,2.5,1,8,1,3,6,1]], a) self.assertEqual(obs, exp) def test_toRates(self): """Probs toRates should return log of probs, optionally normalized""" a = Alphabet('abc')**2 p = Probs([0.9,0.05,0.05,0.1,0.85,0.05,0.02,0.02,0.96], a) assert p.isValid() r = p.toRates() assert isinstance(r, Rates) assert r.isValid() assert not r.isComplex() self.assertEqual(r._data, logm(p._data)) r_norm = p.toRates(normalize=True) self.assertFloatEqual(trace(r_norm._data), -1.0) def test_random_p_matrix(self): """Probs random should return random Probs rows that sum to 1""" for i in range(NUM_TESTS): p = Probs.random(RnaPairs)._data for i in p: self.assertFloatEqual(sum(i), 1.0) #length should be 4 by default self.assertEqual(len(p), 4) self.assertEqual(len(p[0]), 4) def test_random_p_matrix_diag(self): """Probs random should work with a scalar diagonal""" #if diagonal is 1, off-diagonal elements should be 0 for i in range(NUM_TESTS): p = Probs.random(RnaPairs, 1)._data self.assertEqual(p, identity(4, 'd')) #if diagonal is between 0 and 1, rows should sum to 1 for i in range(NUM_TESTS): p = Probs.random(RnaPairs, 0.5)._data for i in range(4): self.assertFloatEqual(sum(p[i]), 1.0) self.assertEqual(p[i][i], 0.5) assert min(p[i]) >= 0 assert max(p[i]) <= 1 #if diagonal > 1, rows should still sum to 1 for i in range(NUM_TESTS): p = 
Probs.random(RnaPairs, 2)._data for i in range(4): self.assertEqual(p[i][i], 2.0) self.assertFloatEqual(sum(p[i]), 1.0) assert min(p[i]) < 0 def test_random_p_matrix_diag_vector(self): """Probs random should work with a vector diagonal""" for i in range(NUM_TESTS): diag = [0, 0.2, 0.6, 1.0] p = Probs.random(RnaPairs, diag)._data for i, d, row in zip(range(4), diag, p): self.assertFloatEqual(sum(row), 1.0) self.assertEqual(row[i], diag[i]) class RatesTests(TestCase): """Tests of the Rates class.""" def setUp(self): """Define standard alphabets.""" self.abc = Alphabet('abc') self.abc_pairs = self.abc**2 def test_init(self): """Rates init should take additional parameter to normalize""" r = Rates([-2,1,1,0,-1,1,0,0,0], self.abc_pairs) self.assertEqual(r._data, array([[-2,1,1],[0,-1,1],[0,0,0]])) r = Rates([-2.5,1,1,0,-1,1,0,0,0], self.abc_pairs) self.assertEqual(r._data, array([[-2.5,1.,1.],[0.,-1.,1.],[0.,0.,0.]])) r = Rates([-2,1,1,0,-1,1,2,0,-1], self.abc_pairs, normalize=True) self.assertEqual(r._data, \ array([[-0.5,.25,.25],[0.,-.25,.25],[.5,0.,-.25]])) def test_isComplex(self): """Rates isComplex should return True if complex elements""" r = Rates([0,0,0.1j,0,0,0,0,0,0], self.abc_pairs) assert r.isComplex() r = Rates([0,0,0.1,0,0,0,0,0,0], self.abc_pairs) assert not r.isComplex() def test_isSignificantlyComplex(self): """Rates isSignificantlyComplex should be true if large imag component""" r = Rates([0,0,0.2j,0,0,0,0,0,0], self.abc_pairs) assert r.isSignificantlyComplex() assert r.isSignificantlyComplex(0.01) assert not r.isSignificantlyComplex(0.2) assert not r.isSignificantlyComplex(0.3) r = Rates([0,0,0.1,0,0,0,0,0,0], self.abc_pairs) assert not r.isSignificantlyComplex() assert not r.isSignificantlyComplex(1e-30) assert not r.isSignificantlyComplex(1e3) def test_isValid(self): """Rates isValid should check row sums and neg off-diags""" r = Rates([-2,1,1,0,-1,1,0,0,0], self.abc_pairs) assert r.isValid() r = Rates([0,0,0,0,0,0,0,0,0], self.abc_pairs) assert 
r.isValid() #not valid if negative off-diagonal r = Rates([-2,-1,3,1,-1,0,2,2,-4], self.abc_pairs) assert not r.isValid() #not valid if rows don't all sum to 0 r = Rates([0,0.0001,0,0,0,0,0,0,0], self.abc_pairs) assert not r.isValid() def test_normalize(self): """Rates normalize should return normalized copy of self where trace=-1""" r = Rates([-2,1,1,0,-1,1,2,0,-1], self.abc_pairs) n = r.normalize() self.assertEqual(n._data, \ array([[-0.5,.25,.25],[0.,-.25,.25],[.5,0.,-.25]])) #check that we didn't change the original assert n._data is not r._data self.assertEqual(r._data, \ array([[-2,1,1,],[0,-1,1,],[2,0,-1]])) def test_toProbs(self): """Rates toProbs should return correct probability matrix""" a = self.abc_pairs p = Probs([0.75, 0.1, 0.15, 0.2, 0.7, 0.1, 0.05, 0.1, 0.85], a) q = p.toRates() self.assertEqual(q._data, logm(p._data)) p2 = q.toProbs() self.assertFloatEqual(p2._data, p._data) #test a case that didn't work for DNA q = Rates(array( [[-0.64098451, 0.0217681 , 0.35576469, 0.26345171], [ 0.31144238, -0.90915091, 0.25825858, 0.33944995], [ 0.01578521, 0.43162879, -0.99257581, 0.54516182], [ 0.13229986, 0.04027147, 0.05817791, -0.23074925]]), DnaPairs) self.assertFloatEqual(q.toProbs(0.5)._data, expm(q._data)(t=0.5)) def test_timeForSimilarity(self): """Rates timeForSimilarity should return correct time""" a = self.abc_pairs p = Probs([0.75, 0.1, 0.15, 0.2, 0.7, 0.1, 0.05, 0.15, 0.8], a) q = p.toRates() d = 0.5 t = q.timeForSimilarity(d) x = expm(q._data)(t) self.assertFloatEqual(average(diagonal(x), axis=0), d) t = q.timeForSimilarity(d, array([1/3.0]*3)) x = expm(q._data)(t) self.assertFloatEqual(average(diagonal(x), axis=0), d) self.assertEqual(q.timeForSimilarity(1), 0) def test_toSimilarProbs(self): """Rates toSimilarProbs should match individual steps""" a = self.abc_pairs p = Probs([0.75, 0.1, 0.15, 0.2, 0.7, 0.1, 0.05, 0.15, 0.8], a) q = p.toRates() self.assertEqual(q.toSimilarProbs(0.5), \ q.toProbs(q.timeForSimilarity(0.5))) #test a case that 
didn't work for DNA q = Rates(array( [[-0.64098451, 0.0217681 , 0.35576469, 0.26345171], [ 0.31144238, -0.90915091, 0.25825858, 0.33944995], [ 0.01578521, 0.43162879, -0.99257581, 0.54516182], [ 0.13229986, 0.04027147, 0.05817791, -0.23074925]]), DnaPairs) p = q.toSimilarProbs(0.66) self.assertFloatEqual(average(diagonal(p._data), axis=0), 0.66) def test_random_q_matrix(self): """Rates random should return matrix of correct size""" for i in range(NUM_TESTS): q = Rates.random(RnaPairs)._data self.assertEqual(len(q), 4) self.assertEqual(len(q[0]), 4) for row in q: self.assertFloatEqual(sum(row), 0.0) assert min(row) < 0 assert max(row) > 0 l = list(row) l.sort() assert min(l[1:]) >= 0 assert max(l[1:]) <= 1 def test_random_q_matrix_diag(self): """Rates random should set diagonal correctly from scalar""" for i in range(NUM_TESTS): q = Rates.random(RnaPairs, -1)._data self.assertEqual(len(q), 4) for i, row in enumerate(q): self.assertFloatEqual(sum(row), 0) self.assertEqual(row[i], -1) assert max(row) <= 1 l = list(row) l.sort() assert min(l[1:]) >= 0 assert max(l[1:]) <= 1 for i in range(NUM_TESTS): q = Rates.random(RnaPairs, -5)._data self.assertEqual(len(q), 4) for i, row in enumerate(q): self.assertFloatEqual(sum(row), 0) self.assertEqual(row[i], -5) assert max(row) <= 5 l = list(row) l.sort() assert min(l[1:]) >= 0 assert max(l[1:]) <= 5 def test_random_q_matrix_diag_vector(self): """Rates random should init with vector as diagonal""" diag = [1, -1, 2, -2] for i in range(NUM_TESTS): q = Rates.random(RnaPairs, diag)._data for i, d, row in zip(range(4), diag, q): self.assertFloatEqual(sum(row, axis=0), 0.0) self.assertEqual(row[i], diag[i]) def test_fixNegsDiag(self): """Rates fixNegsDiag should fix negatives by adding to diagonal""" q = Rates([[-6,2,2,2],[-6,-2,4,4],[2,2,-6,2],[4,4,-2,-6]], RnaPairs) m = q.fixNegsDiag()._data self.assertEqual(m,array([[-6,2,2,2],[0,-8,4,4],[2,2,-6,2],[4,4,0,-8]])) def test_fixNegsEven(self): """Rates fixNegsEven should fix 
negatives by adding evenly to others""" q = Rates([[-6,2,2,2],[-3,-2,3,2],[-2,-2,-6,2],[4,4,-6,-2]], RnaPairs) m = q.fixNegsEven()._data self.assertEqual(m,array([[-6,2,2,2],[0,-3,2,1],[0,0,-0,0],[2,2,0,-4]])) def test_fixNegsFmin(self): """Rates fixNegsFmin should fix negatives using fmin method""" q = Rates(array([[-0.28936029, 0.14543346, -0.02648614, 0.17041297], [ 0.00949624, -0.31186005, 0.17313171, 0.1292321 ], [ 0.10443209, 0.16134479, -0.30480186, 0.03902498], [ 0.01611264, 0.12999161, 0.15558259, -0.30168684]]), DnaPairs) r = q.fixNegsFmin() assert not q.isValid() assert r.isValid() def test_fixNegsConstrainedOpt(self): """Rates fixNegsConstrainedOpt should fix negatives w/ constrained opt""" q = Rates(array([[-0.28936029, 0.14543346, -0.02648614, 0.17041297], [ 0.00949624, -0.31186005, 0.17313171, 0.1292321 ], [ 0.10443209, 0.16134479, -0.30480186, 0.03902498], [ 0.01611264, 0.12999161, 0.15558259, -0.30168684]]), DnaPairs) r = q.fixNegsConstrainedOpt() assert not q.isValid() assert r.isValid() def test_fixNegsReflect(self): """Rates fixNegsReflect should reflect negatives across diagonal""" ab = Alphabet('ab')**2 #should leave matrix alone if no negative off-diagonal elements q = Rates([0,0,1,-1], ab) self.assertEqual(q.fixNegsReflect()._data, array([[0,0],[1,-1]])) q = Rates([-2,2,1,-1], ab) self.assertEqual(q.fixNegsReflect()._data, array([[-2,2],[1,-1]])) #should work if precisely one off-diag element in a pair is negative q = Rates([2,-2,1,-1], ab) self.assertEqual(q.fixNegsReflect()._data, array([[0,0],[3,-3]])) q = Rates([-1,1,-2,2], ab) self.assertEqual(q.fixNegsReflect()._data, array([[-3,3],[0,-0]])) #should work if both off-diag elements in a pair are negative q = Rates([1,-1,-2,2], ab) self.assertEqual(q.fixNegsReflect()._data, array([[-2,2],[1,-1]])) q = Rates([2,-2,-1,1], ab) self.assertEqual(q.fixNegsReflect()._data, array([[-1,1],[2,-2]])) q = Rates([[ 0, 3, -2, -1], [ 2, -1, 2, -3], [-1, -1, 2, 0], [-3, 2, 0, 1]], RnaPairs) q2 = q.fixNegsReflect() 
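The reflection rule these fixNegsReflect assertions encode can be sketched in plain numpy. This is a reconstruction inferred from the expected matrices in the test, not PyCogent's actual implementation: each negative off-diagonal rate is zeroed and its magnitude added to the transposed element, then the diagonal is reset so every row sums to zero.

```python
import numpy as np

def fix_negs_reflect(q):
    """Reflect negative off-diagonal rates across the diagonal.

    Sketch inferred from the expected matrices in test_fixNegsReflect:
    off the diagonal, new[i, j] = max(q[i, j], 0) + max(-q[j, i], 0);
    the diagonal is then set so each row sums to zero.
    """
    q = np.asarray(q, dtype=float)
    fixed = np.maximum(q, 0) + np.maximum(-q.T, 0)  # move negatives across
    np.fill_diagonal(fixed, 0)                      # discard old diagonal
    np.fill_diagonal(fixed, -fixed.sum(axis=1))     # rows must sum to 0
    return fixed
```

With the 2x2 input [[2, -2], [1, -1]] this yields [[0, 0], [3, -3]], matching the expected matrix asserted above; matrices with no negative off-diagonal entries pass through unchanged.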
self.assertEqual(q2._data, \ array([[-7, 3, 1, 3], [ 2, -5, 3, 0], [ 2, 0, -2, 0], [ 1, 5, 0, -6]])) class GoldmanTests(TestCase): def setUp(self): pass def test_goldman_q_dna_pair(self): """Should return expected rate matrix""" seq1 = "ATGCATGCATGC" seq2 = "AAATTTGGGCCC" expected = array([[-(2/3.0), (1/3.0), (1/3.0), 0], [(1/3.0), -(2/3.0), 0, (1/3.0)], [(1/3.0), 0, -(2/3.0), (1/3.0)], [0, (1/3.0), (1/3.0), -(2/3.0)]]) observed = goldman_q_dna_pair(seq1, seq2) self.assertFloatEqual(observed, expected) def test_goldman_q_rna_pair(self): """Should return expected rate matrix""" seq1 = "AUGCAUGCAUGC" seq2 = "AAAUUUGGGCCC" expected = array([[-(2/3.0), (1/3.0), (1/3.0), 0], [(1/3.0), -(2/3.0), 0, (1/3.0)], [(1/3.0), 0, -(2/3.0), (1/3.0)], [0, (1/3.0), (1/3.0), -(2/3.0)]]) observed = goldman_q_rna_pair(seq1, seq2) self.assertFloatEqual(observed, expected) def test_goldman_q_dna_triple(self): """Should return expected rate matrix""" seq1 = "ATGCATGCATGC" seq2 = "AAATTTGGGCCC" outgroup = "AATTGGCCAATT" expected = array([[-(1/2.0), (1/2.0), 0, 0], [0, 0, 0, 0], [(1/3.0), 0, -(1/3.0), 0], [0, 0, 0, 0]]) observed = goldman_q_dna_triple(seq1, seq2, outgroup) self.assertFloatEqual(observed, expected) def test_goldman_q_rna_triple(self): """Should return expected rate matrix""" seq1 = "AUGCAUGCAUGC" seq2 = "AAAUUUGGGCCC" outgroup = "AAUUGGCCAAUU" expected = array([[-(1/2.0), (1/2.0), 0, 0], [0, 0, 0, 0], [(1/3.0), 0, -(1/3.0), 0], [0, 0, 0, 0]]) observed = goldman_q_rna_triple(seq1, seq2, outgroup) self.assertFloatEqual(observed, expected) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/__init__.py #!/usr/bin/env python __all__ = ['test_aaindex', 'test_agilent_microarray', 'test_blast', 'test_bpseq', 'test_cigar', 'test_clustal', 'test_cutg', 'test_dialign', 'test_ebi', 'test_fasta', 'test_fastq', 'test_genbank', 'test_illumina_sequence', 'test_locuslink', 'test_mage', 
'test_meme', 'test_msms', 'test_ncbi_taxonomy', 'test_nexus', 'test_pdb', 'test_structure', 'test_phylip', 'test_record', 'test_record_finder', 'test_rfam', 'test_rna_fold', 'test_rnaview', 'test_sprinzl', 'test_tree', 'test_unigene', 'test_stride'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Catherine Lozupone", "Gavin Huttley", "Rob Knight", "Sandra Smit", "Micah Hamady", "Hua Ying", "Greg Caporaso", "Zongzhi Liu", "Jason Carnes", "Peter Maxwell", "Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_parse/test_aaindex.py #!/usr/bin/env python """Tests of the AAIndex parser. """ from cogent.util.unit_test import TestCase, main from cogent.parse.aaindex import AAIndex1Parser, AAIndex2Parser,\ AAIndexRecord, AAIndex1Record, AAIndex2Record, AAIndex1FromFiles,\ AAIndex2FromFiles __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "caporaso@colorado.edu" __status__ = "Production" class test_aaindex1_parser(TestCase): """ Tests AAIndex1Parser class """ def setUp(self): """ Setup some variables """ self._fake_file = list(fake_file_aaindex1.split('\n')) self.AAIndexObjects = AAIndex1FromFiles(self._fake_file) def test_init(self): """ AAI1: Test that init runs w/o error """ aa1p = AAIndex1Parser() def test_read_file_as_list(self): """AAI1: Test that a file is correctly opened as a list """ aap = AAIndex1Parser() AAIndexObjects = aap(self._fake_file) def test_correct_num_of_records(self): """AAI1: Test that one object is created per record """ self.assertEqual(6, len(self.AAIndexObjects)) def test_ID_entries(self): """ 
AAI1: Test ID Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].ID, 'ANDN920101') self.assertEqual(self.AAIndexObjects['ARGP820103'].ID, 'ARGP820103') self.assertEqual(self.AAIndexObjects['JURD980101'].ID, 'JURD980101') def test_single_line_Description_entries(self): """ AAI1: Test Single Line Description Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].Description,\ 'alpha-CH chemical shifts (Andersen et al., 1992)') self.assertEqual(self.AAIndexObjects['ARGP820103'].Description,\ 'Membrane-buried preference parameters (Argos et al., 1982)') def test_multi_line_Description_entries(self): """ AAI1: Test Multi Line Description Entries """ self.assertEqual(self.AAIndexObjects['JURD980101'].Description,\ 'Modified Kyte-Doolittle hydrophobicity scale (Juretic et al., 1998)') def test_LITDB_entries(self): """ AAI1: Test LITDB Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].LITDBEntryNum,\ 'LIT:1810048b PMID:1575719') self.assertEqual(self.AAIndexObjects['ARGP820103'].LITDBEntryNum,\ 'LIT:0901079b PMID:7151796') self.assertEqual(self.AAIndexObjects['JURD980101'].LITDBEntryNum,\ '') def test_Authors_entries(self): """ AAI1: Test Authors Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].Authors,\ 'Andersen, N.H., Cao, B. and Chen, C.') self.assertEqual(self.AAIndexObjects['ARGP820103'].Authors,\ 'Argos, P., Rao, J.K.M. and Hargrave, P.A.') self.assertEqual(self.AAIndexObjects['JURD980101'].Authors,\ 'Juretic, D., Lucic, B., Zucic, D. 
and Trinajstic, N.') def test_mult_line_Title_entries(self): """ AAI1: Test Multi Line Title Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].Title,\ 'Peptide/protein structure analysis using the chemical shift index ' +\ 'method: upfield alpha-CH values reveal dynamic helices and aL sites') self.assertEqual(self.AAIndexObjects['JURD980101'].Title,\ 'Protein transmembrane structure: recognition and prediction by ' +\ 'using hydrophobicity scales through preference functions') def test_sing_line_Title_entries(self): """ AAI1: Test Single Line Title Entries """ self.assertEqual(self.AAIndexObjects['ARGP820103'].Title,\ 'Structural prediction of membrane-bound proteins') def test_Citation_entries(self): """ AAI1: Test Citation Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].Citation,\ 'Biochem. and Biophys. Res. Comm. 184, 1008-1014 (1992)') self.assertEqual(self.AAIndexObjects['ARGP820103'].Citation,\ 'Eur. J. Biochem. 128, 565-575 (1982)') self.assertEqual(self.AAIndexObjects['JURD980101'].Citation,\ 'Theoretical and Computational Chemistry, 5, 405-445 (1998)') def test_Comments_entries(self): """ AAI1: Test Comments Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].Comments,\ '') self.assertEqual(self.AAIndexObjects['ARGP820103'].Comments,\ '') self.assertEqual(self.AAIndexObjects['JURD980101'].Comments,\ '') self.assertEqual(self.AAIndexObjects['TSAJ990102'].Comments,\ '(Cyh 113.7)') def test_single_line_Correlating_entries(self): """ AAI1: Test single line Correlating Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].\ Correlating['BUNA790102'], 0.949) def test_empty_Correlating_entries(self): """ AAI1: Test empty Correlating Entries """ self.assertEqual(self.AAIndexObjects['WILM950104'].Correlating, {}) def test_multi_line_Correlating_entries(self): """ AAI1: Test multi line Correlating Entries """ self.assertEqual(self.AAIndexObjects['ARGP820103'].\ Correlating['ARGP820102'], 0.961) 
self.assertEqual(self.AAIndexObjects['ARGP820103'].\ Correlating['MIYS850101'], 0.822) self.assertEqual(self.AAIndexObjects['ARGP820103'].\ Correlating['JURD980101'], 0.800) self.assertEqual(self.AAIndexObjects['JURD980101'].\ Correlating['KYTJ820101'], 0.996) self.assertEqual(self.AAIndexObjects['JURD980101'].\ Correlating['NADH010101'], 0.925) self.assertEqual(self.AAIndexObjects['JURD980101'].\ Correlating['OOBM770101'], -0.903) def test_Data_entries(self): """ AAI1: Test Data Entries """ self.assertEqual(self.AAIndexObjects['ANDN920101'].Data['A'],\ 4.35) self.assertEqual(self.AAIndexObjects['ANDN920101'].Data['Q'],\ 4.37) self.assertEqual(self.AAIndexObjects['ANDN920101'].Data['V'],\ 3.95) self.assertEqual(self.AAIndexObjects['ARGP820103'].Data['A'],\ 1.56) self.assertEqual(self.AAIndexObjects['ARGP820103'].Data['Q'],\ 0.51) self.assertEqual(self.AAIndexObjects['ARGP820103'].Data['V'],\ 1.14) self.assertEqual(self.AAIndexObjects['JURD980101'].Data['A'],\ 1.10) self.assertEqual(self.AAIndexObjects['JURD980101'].Data['Q'],\ -3.68) self.assertEqual(self.AAIndexObjects['JURD980101'].Data['V'],\ 4.2) class test_aaindex2_parser(TestCase): def setUp(self): """ Setup some variables """ self._fake_file = list(fake_file_aaindex2.split('\n')) self.AAIndexObjects = AAIndex2FromFiles(self._fake_file) def test_init(self): """ AAI2: Test that init runs w/o error """ aa2p = AAIndex2Parser() def test_read_file_as_list(self): """AAI2: Test that a file is correctly opened as a list """ aap = AAIndex2Parser() AAIndexObjects = aap(self._fake_file) def test_correct_num_of_records(self): """AAI2: Test that one object is created per record """ self.assertEqual(6, len(self.AAIndexObjects)) def test_ID_entries(self): """ AAI2: Test ID Entries """ self.assertEqual(self.AAIndexObjects['ALTS910101'].ID, 'ALTS910101') self.assertEqual(self.AAIndexObjects['BENS940103'].ID, 'BENS940103') self.assertEqual(self.AAIndexObjects['QUIB020101'].ID, 'QUIB020101') def test_Description_entries(self): 
""" AAI2: Test Description Entries """ self.assertEqual(self.AAIndexObjects['ALTS910101'].Description,\ 'The PAM-120 matrix (Altschul, 1991)') self.assertEqual(self.AAIndexObjects['BENS940103'].Description,\ 'Log-odds scoring matrix collected in 74-100 PAM (Benner et al., '+\ '1994)') self.assertEqual(self.AAIndexObjects['QUIB020101'].Description,\ 'STROMA score matrix for the alignment of known distant homologs ' +\ '(Qian-Goldstein, 2002)') def test_LITDB_entries(self): """ AAI2: Test LITDB Entries """ self.assertEqual(self.AAIndexObjects['ALTS910101'].LITDBEntryNum,\ 'LIT:1713145 PMID:2051488') self.assertEqual(self.AAIndexObjects['BENS940103'].LITDBEntryNum,\ 'LIT:2023094 PMID:7700864') self.assertEqual(self.AAIndexObjects['QUIB020101'].LITDBEntryNum,\ 'PMID:12211027') def test_Authors_entries(self): """ AAI2: Test Atuthor Entries """ self.assertEqual(self.AAIndexObjects['ALTS910101'].Authors,\ 'Altschul, S.F.') self.assertEqual(self.AAIndexObjects['BENS940103'].Authors,\ 'Benner, S.A., Cohen, M.A. and Gonnet, G.H.') self.assertEqual(self.AAIndexObjects['QUIB020101'].Authors,\ 'Qian, B. and Goldstein, R.A.') def test_Title_entries(self): """ AAI2: Test Title Entries """ self.assertEqual(self.AAIndexObjects['ALTS910101'].Title,\ 'Amino acid substitution matrices from an information theoretic ' +\ 'perspective') self.assertEqual(self.AAIndexObjects['BENS940103'].Title,\ 'Amino acid substitution during functionally constrained divergent ' +\ 'evolution of protein sequences') self.assertEqual(self.AAIndexObjects['QUIB020101'].Title,\ 'Optimization of a new score function for the generation of '+\ 'accurate alignments') def test_Citation_entries(self): """ AAI2: Test citation entries """ self.assertEqual(self.AAIndexObjects['ALTS910101'].Citation,\ 'J. Mol. Biol. 219, 555-565 (1991)') self.assertEqual(self.AAIndexObjects['BENS940103'].Citation,\ 'Protein Engineering 7, 1323-1332 (1994)') self.assertEqual(self.AAIndexObjects['QUIB020101'].Citation,\ 'Proteins. 
48, 605-610 (2002)') def test_Comments_entries(self): """ AAI2: Tests null, single line, multi line comments """ self.assertEqual(self.AAIndexObjects['ALTS910101'].Comments,\ '') self.assertEqual(self.AAIndexObjects['BENS940103'].Comments,\ 'extrapolated to 250 PAM') self.assertEqual(self.AAIndexObjects['QUIB020101'].Comments,\ '') self.assertEqual(self.AAIndexObjects['HENS920104'].Comments,\ '# Matrix made by matblas from blosum50.iij ' + '* # BLOSUM Clustered Scoring Matrix in 1/3 Bit Units ' + '* # Blocks Database = /data/blocks_5.0/blocks.dat ' + '* # Cluster Percentage: >= 50 ' + '* # Entropy = 0.4808, Expected = -0.3573') def test_Data_entries_20x20_LTM(self): """ AAI2: correct data entries when 20x20 LTM""" self.assertEqual(self.AAIndexObjects['ALTS910101'].Data['A']['A'],\ 3.) self.assertEqual(self.AAIndexObjects['ALTS910101'].Data['Y']['R'],\ -6.) self.assertEqual(self.AAIndexObjects['ALTS910101'].Data['V']['V'],\ 5.) self.assertEqual(self.AAIndexObjects['BENS940103'].Data['A']['A'],\ 2.4) self.assertEqual(self.AAIndexObjects['BENS940103'].Data['Y']['R'],\ -2.0) self.assertEqual(self.AAIndexObjects['BENS940103'].Data['V']['V'],\ 3.4) self.assertEqual(self.AAIndexObjects['QUIB020101'].Data['A']['A'],\ 2.5) self.assertEqual(self.AAIndexObjects['QUIB020101'].Data['Y']['R'],\ -0.9) self.assertEqual(self.AAIndexObjects['QUIB020101'].Data['V']['V'],\ 4.2) def test_Data_entries_20x20_Square(self): """ AAI2: correct data entries when 20x20 square matrix """ self.assertEqual(self.AAIndexObjects['HENS920104'].Data['V']['Y'],\ -1) self.assertEqual(self.AAIndexObjects['HENS920104'].Data['Q']['A'],\ -1) self.assertEqual(self.AAIndexObjects['HENS920104'].Data['N']['N'],\ 7) def test_Data_entries_with_abnormal_fields(self): """ AAI2: test correct data entries when more than std fields present Some entries in AAIndex2 have more than 20 fields; this tests that such data is correctly parsed and identified. 
""" # There are no entries that fit this category that are square # matrices, which is all we are concerned with at this point, # so this method should just serve as a reminder to test this # when we begin parsing data other than square matrices. pass def test_Data_entries_21x21_LTM(self): """ AAI2: correct data entries when 21x21 LTM""" self.assertEqual(self.AAIndexObjects['KOSJ950101'].Data['-']['-'],\ 55.7) self.assertEqual(self.AAIndexObjects['KOSJ950101'].Data['Y']['-'],\ 0.3) self.assertEqual(self.AAIndexObjects['KOSJ950101'].Data['N']['R'],\ 3.0) def test_Data_entries_22x21_square(self): """ AAI2: correct data entries when 22x21 square matrix """ # It's not really a sqaure matrix, but it's fully populated ... self.assertEqual(self.AAIndexObjects['OVEJ920102'].Data['J']['D'],\ 0.001) self.assertEqual(self.AAIndexObjects['OVEJ920102'].Data['-']['I'],\ 0.022) self.assertEqual(self.AAIndexObjects['OVEJ920102'].Data['D']['E'],\ 0.109) class AAIndexRecordTests(TestCase): """ AAIR: Tests AAIndexRecord class """ def setUp(self): self.id = "5" self.description = "Some Info" self.LITDB_entry_num = "25" self.authors = "Greg" self.title = "A test" self.citation = "something" self.comments = "This is a test, this is only a test" self.data = {} class AAIndex1RecordTests(AAIndexRecordTests): """ AAIR1: Tests AAIndex1Records class """ def setUp(self): AAIndexRecordTests.setUp(self) self.correlating = [0.987, 0.783, 1., 0] values = [] keys = 'ARNDCQEGHILKMFPSTWYV' for i in range(20): values += [float(i) + 0.15] self.data = dict(zip(keys,values)) self.aar = AAIndex1Record(self.id, self.description,\ self.LITDB_entry_num, self.authors, self.title,\ self.citation, self.comments, self.correlating, self.data) def test_init(self): """ AAIR1: Tests init method returns with no errors""" test_aar = AAIndex1Record(self.id, self.description,\ self.LITDB_entry_num, self.authors, self.title,\ self.citation, self.comments, self.correlating, self.data) def test_general_init_data(self): """ 
AAIR1: Tests init correctly initializes data""" self.assertEqual(self.aar.ID, str(self.id)) self.assertEqual(self.aar.Description, str(self.description)) self.assertEqual(self.aar.LITDBEntryNum,\ str(self.LITDB_entry_num)) self.assertEqual(self.aar.Authors, str(self.authors)) self.assertEqual(self.aar.Title, str(self.title)) self.assertEqual(self.aar.Citation, str(self.citation)) self.assertEqual(self.aar.Comments, str(self.comments)) self.assertEqual(self.aar.Correlating, self.correlating) self.assertEqual(self.aar.Data,self.data) def test_toSquareDistanceMatrix(self): """ AAIR1: Tests that _toSquareDistanceMatrix runs without returning an error """ square = self.aar._toSquareDistanceMatrix() def test_toSquareDistanceMatrix_data_integrity_diagonal(self): """ AAIR1: Tests that diag = 0 when square matrix is built """ square = self.aar._toSquareDistanceMatrix() # Test diagonal keys = 'ARNDCQEGHILKMFPSTWYV' for k in keys: self.assertEqual(square[k][k], 0.) def test_toSquareDistanceMatrix_data_integrity(self): """ AAIR1: Tests that _toSquareDistanceMatrix works right w/o stops """ square = self.aar._toSquareDistanceMatrix() self.assertFloatEqualAbs(square['R']['A'], square['A']['R']) self.assertFloatEqualAbs(square['A']['R'], 1.) self.assertFloatEqualAbs(square['D']['N'], square['N']['D']) self.assertFloatEqualAbs(square['D']['N'], 1.) self.assertFloatEqualAbs(square['A']['C'], square['C']['A']) self.assertFloatEqualAbs(square['A']['C'], 4.) self.assertFloatEqualAbs(square['V']['A'], square['A']['V']) self.assertFloatEqualAbs(square['V']['A'], 19.) self.assertFloatEqualAbs(square['V']['Y'], square['Y']['V']) self.assertFloatEqualAbs(square['V']['Y'], 1.) def test_toSquareDistanceMatrix_data_integrity_w_stops(self): """ AAIR1: Tests that _toSquareDistanceMatrix works right w/ stops """ square = self.aar._toSquareDistanceMatrix(include_stops=1) self.assertFloatEqualAbs(square['R']['A'], square['A']['R']) self.assertFloatEqualAbs(square['A']['R'], 1.) 
        self.assertFloatEqualAbs(square['D']['N'], square['N']['D'])
        self.assertFloatEqualAbs(square['D']['N'], 1.)
        self.assertFloatEqualAbs(square['A']['C'], square['C']['A'])
        self.assertFloatEqualAbs(square['A']['C'], 4.)
        self.assertFloatEqualAbs(square['V']['A'], square['A']['V'])
        self.assertFloatEqualAbs(square['V']['A'], 19.)
        self.assertFloatEqualAbs(square['V']['Y'], square['Y']['V'])
        self.assertFloatEqualAbs(square['V']['Y'], 1.)
        self.assertFloatEqualAbs(square['V']['*'], None)
        self.assertFloatEqualAbs(square['*']['Y'], None)
        self.assertFloatEqualAbs(square['*']['*'], None)
        self.assertFloatEqualAbs(square['*']['R'], None)

    def test_toDistanceMatrix(self):
        """ AAIR1: Tests that toDistanceMatrix functions as expected """
        dm = self.aar.toDistanceMatrix()
        self.assertFloatEqualAbs(dm['R']['A'], dm['A']['R'])
        self.assertFloatEqualAbs(dm['A']['R'], 1.)
        self.assertFloatEqualAbs(dm['D']['N'], dm['N']['D'])
        self.assertFloatEqualAbs(dm['D']['N'], 1.)
        self.assertFloatEqualAbs(dm['A']['C'], dm['C']['A'])
        self.assertFloatEqualAbs(dm['A']['C'], 4.)
        self.assertFloatEqualAbs(dm['V']['A'], dm['A']['V'])
        self.assertFloatEqualAbs(dm['V']['A'], 19.)
        self.assertFloatEqualAbs(dm['V']['Y'], dm['Y']['V'])
        self.assertFloatEqualAbs(dm['V']['Y'], 1.)

    def test_toDistanceMatrix_w_stops(self):
        """ AAIR1: Tests that toDistanceMatrix works right w/ stops """
        square = self.aar.toDistanceMatrix(include_stops=1)
        self.assertFloatEqualAbs(square['R']['A'], square['A']['R'])
        self.assertFloatEqualAbs(square['A']['R'], 1.)
        self.assertFloatEqualAbs(square['D']['N'], square['N']['D'])
        self.assertFloatEqualAbs(square['D']['N'], 1.)
        self.assertFloatEqualAbs(square['A']['C'], square['C']['A'])
        self.assertFloatEqualAbs(square['A']['C'], 4.)
        self.assertFloatEqualAbs(square['V']['A'], square['A']['V'])
        self.assertFloatEqualAbs(square['V']['A'], 19.)
        self.assertFloatEqualAbs(square['V']['Y'], square['Y']['V'])
        self.assertFloatEqualAbs(square['V']['Y'], 1.)
        self.assertFloatEqualAbs(square['V']['*'], None)
        self.assertFloatEqualAbs(square['*']['Y'], None)
        self.assertFloatEqualAbs(square['*']['*'], None)
        self.assertFloatEqualAbs(square['*']['R'], None)

class AAIndex2RecordTests(AAIndexRecordTests):
    """ AAIR2: Tests AAIndex2Records class """

    def setUp(self):
        AAIndexRecordTests.setUp(self)
        # Build LTM data
        values = range(210)
        keys = 'ARNDCQEGHILKMFPSTWYV'
        self.LTMdata = dict.fromkeys(keys)
        i = 0
        for r in keys:
            new_row = dict.fromkeys(keys)
            for c in keys:
                if keys.find(c) <= keys.find(r):
                    new_row[c] = values[i]
                    i +=1
            self.LTMdata[r] = new_row
        self.aarLTM = AAIndex2Record(self.id, self.description,\
            self.LITDB_entry_num, self.authors, self.title,\
            self.citation, self.comments, self.LTMdata)

        # Build Square matrix data
        values = range(400)
        self.SQUdata = dict.fromkeys(keys)
        i = 0
        for r in keys:
            new_row = dict.fromkeys(keys)
            for c in keys:
                new_row[c] = values[i]
                i +=1
            self.SQUdata[r] = new_row
        self.aarSquare = AAIndex2Record(self.id, self.description,\
            self.LITDB_entry_num, self.authors, self.title,\
            self.citation, self.comments, self.SQUdata)

    def test_init(self):
        """ AAIR2: Tests init method returns with no errors"""
        test_aar = AAIndex2Record(self.id, self.description,\
            self.LITDB_entry_num, self.authors, self.title,\
            self.citation, self.comments, self.SQUdata)

    def test_init_data(self):
        """ AAIR2: Tests init correctly initializes data"""
        self.assertEqual(self.aarLTM.ID, str(self.id))
        self.assertEqual(self.aarLTM.Description, str(self.description))
        self.assertEqual(self.aarLTM.LITDBEntryNum,\
            str(self.LITDB_entry_num))
        self.assertEqual(self.aarLTM.Authors, str(self.authors))
        self.assertEqual(self.aarLTM.Title, str(self.title))
        self.assertEqual(self.aarLTM.Citation, str(self.citation))
        self.assertEqual(self.aarLTM.Comments, str(self.comments))

#    def test_matrix_values_col_by_row(self):
#        """ Tests that keys and values correctly correspond in data LTM
#
#
#            Also tests that reverse keys are same as forward keys.
# # """ # # data_matrix = self.aarLTM.Data # self.assertEqual(data_matrix['A']['A'], 0) # self.assertEqual(data_matrix['A']['R'], 1) # self.assertEqual(data_matrix['R']['R'], 2) # self.assertEqual(data_matrix['C']['H'], 40) # self.assertEqual(data_matrix['I']['M'], 87) # self.assertEqual(data_matrix['D']['P'], 108) # self.assertEqual(data_matrix['W']['V'], 207) # self.assertEqual(data_matrix['Y']['V'], 208) # self.assertEqual(data_matrix['V']['V'], 209) # def test_LTM_values_row_by_col(self): # """ Tests that keys are correctly linked to values in a LTM # # This tests that some random places hold the correct values. # These are some randomly selected keys with hand calculated # values. Also included are the extreme values. Technically if # the first and last three are correct all values should be # correct. # # """ # data_matrix = self.aarLTM.Data # self.assertEqual(data_matrix['R']['A'], 1) # self.assertEqual(data_matrix['H']['C'], 40) # self.assertEqual(data_matrix['M']['I'], 87) # self.assertEqual(data_matrix['P']['D'], 108) # self.assertEqual(data_matrix['V']['W'], 207) # self.assertEqual(data_matrix['V']['Y'], 208) # self.assertEqual(data_matrix['A']['A'], 0) # self.assertEqual(data_matrix['R']['R'], 2) # self.assertEqual(data_matrix['V']['V'], 209) def test_Square_Matrix_values_row_by_col(self): """ AAIR2: Tests that key -> value pair integrity in Square matrix """ data_matrix = self.aarSquare.Data self.assertEqual(data_matrix['R']['A'], 20) #self.assertEqual(data_matrix['H']['C'], 40) #self.assertEqual(data_matrix['M']['I'], 87) #self.assertEqual(data_matrix['P']['D'], 108) #self.assertEqual(data_matrix['V']['W'], 207) #self.assertEqual(data_matrix['V']['Y'], 208) self.assertEqual(data_matrix['A']['A'], 0) self.assertEqual(data_matrix['R']['R'], 21) self.assertEqual(data_matrix['V']['V'], 399) def test_toSquareDistanceMatrix_data_integrity(self): """ AAIR2: Tests that _toSquareDistanceMatrix works right w/o stops """ square = 
            self.aarSquare._toSquareDistanceMatrix()
        self.assertEqual(square['R']['A'], 20)
        self.assertEqual(square['A']['A'], 0)
        self.assertEqual(square['R']['R'], 21)
        self.assertEqual(square['V']['V'], 399)

    def test_toSquareDistanceMatrix_data_integrity_w_stops(self):
        """ AAIR2: Tests that _toSquareDistanceMatrix works right with stops
        """
        square = self.aarSquare._toSquareDistanceMatrix(include_stops=1)
        self.assertEqual(square['R']['A'], 20)
        self.assertEqual(square['A']['A'], 0)
        self.assertEqual(square['R']['R'], 21)
        self.assertEqual(square['V']['V'], 399)
        self.assertEqual(square['V']['*'], None)
        self.assertEqual(square['*']['Y'], None)
        self.assertEqual(square['*']['*'], None)
        self.assertEqual(square['*']['R'], None)

# Data for parser tests
fake_file_aaindex1 =\
"""
H ANDN920101
D alpha-CH chemical shifts (Andersen et al., 1992)
R LIT:1810048b PMID:1575719
A Andersen, N.H., Cao, B. and Chen, C.
T Peptide/protein structure analysis using the chemical shift index
  method: upfield alpha-CH values reveal dynamic helices and aL sites
J Biochem. and Biophys. Res. Comm. 184, 1008-1014 (1992)
C BUNA790102 0.949
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
4.35 4.38 4.75 4.76 4.65 4.37 4.29 3.97 4.63 3.95
4.17 4.36 4.52 4.66 4.44 4.50 4.35 4.70 4.60 3.95
//
H ARGP820101
D Hydrophobicity index (Argos et al., 1982)
R LIT:0901079b PMID:7151796
A Argos, P., Rao, J.K.M. and Hargrave, P.A.
T Structural prediction of membrane-bound proteins
J Eur. J. Biochem. 128, 565-575 (1982)
C JOND750101 1.000 SIMZ760101 0.967 GOLD730101 0.936
  TAKK010101 0.906 MEEJ810101 0.891 CIDH920105 0.867
  LEVM760106 0.865 CIDH920102 0.862 MEEJ800102 0.855
  MEEJ810102 0.853 CIDH920103 0.827 PLIV810101 0.820
  CIDH920104 0.819 LEVM760107 0.806 NOZY710101 0.800
  PARJ860101 -0.835 WOLS870101 -0.838 BULH740101 -0.854
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
0.61 0.60 0.06 0.46 1.07 0.
0.47 0.07 0.61 2.22 1.53 1.15 1.18 2.02 1.95 0.05 0.05 2.65 1.88 1.32 // H TSAJ990102 D Volumes not including the crystallographic waters using the ProtOr (Tsai et al., 1999) R PMID:10388571 A Tsai, J., Taylor, R., Chothia, C. and Gerstein, M. T The packing density in proteins: standard radii and volumes J J Mol Biol. 290, 253-266 (1999) * (Cyh 113.7) C TSAJ990101 1.000 CHOC750101 0.996 BIGC670101 0.992 GOLD730102 0.991 KRIW790103 0.987 FAUJ880103 0.985 GRAR740103 0.978 CHAM820101 0.978 CHOC760101 0.972 FASG760101 0.940 LEVM760105 0.928 LEVM760102 0.918 ROSG850101 0.909 DAWD720101 0.905 CHAM830106 0.896 FAUJ880106 0.882 RADA880106 0.864 LEVM760107 0.861 LEVM760106 0.841 RADA880103 -0.879 I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V 90.0 194.0 124.7 117.3 103.3 149.4 142.2 64.9 160.0 163.9 164.0 167.3 167.0 191.9 122.9 95.4 121.5 228.2 197.0 139.0 // H JURD980101 D Modified Kyte-Doolittle hydrophobicity scale (Juretic et al., 1998) R A Juretic, D., Lucic, B., Zucic, D. and Trinajstic, N. T Protein transmembrane structure: recognition and prediction by using hydrophobicity scales through preference functions J Theoretical and Computational Chemistry, 5, 405-445 (1998) C KYTJ820101 0.996 CHOC760103 0.967 NADH010102 0.931 JANJ780102 0.928 NADH010101 0.925 EISD860103 0.901 DESM900102 0.900 NADH010103 0.900 EISD840101 0.895 RADA880101 0.893 MANP780101 0.887 WOLR810101 0.881 PONP800103 0.879 JANJ790102 0.879 NADH010104 0.873 CHOC760104 0.870 PONP800102 0.869 JANJ790101 0.868 MEIH800103 0.861 PONP800101 0.858 NAKH920108 0.858 RADA880108 0.857 PONP800108 0.856 ROSG850102 0.854 PONP930101 0.849 RADA880107 0.842 BIOV880101 0.840 MIYS850101 0.837 FAUJ830101 0.833 CIDH920104 0.832 DESM900101 0.829 WARP780101 0.827 KANM800104 0.826 LIFS790102 0.824 RADA880104 0.824 NADH010105 0.821 NISK800101 0.816 NISK860101 0.808 BIOV880102 0.805 ARGP820102 0.802 ARGP820103 0.800 VHEG790101 -0.814 KRIW790101 -0.824 CHOC760102 -0.851 ROSM880101 -0.851 MONM990101 -0.853 JANJ780103 -0.853 RACS770102 
-0.855 PRAM900101 -0.862 JANJ780101 -0.862 GUYH850101 -0.864 GRAR740102 -0.864 MEIH800102 -0.879 KUHL950101 -0.884 ROSM880102 -0.894 OOBM770101 -0.903 I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V 1.10 -5.10 -3.50 -3.60 2.50 -3.68 -3.20 -0.64 -3.20 4.50 3.80 -4.11 1.90 2.80 -1.90 -0.50 -0.70 -0.46 -1.3 4.2 // H WILM950104 D Hydrophobicity coefficient in RP-HPLC, C18 with 0.1%TFA/2-PrOH/MeCN/H2O (Wilce et al. 1995) R A Wilce, M.C., Aguilar, M.I. and Hearn, M.T. T Physicochemical basis of amino acid hydrophobicity scales: evaluation of four new scales of amino acid hydrophobicity coefficients derived from RP-HPLC of peptides J Anal Chem. 67, 1210-1219 (1995) C I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V -2.34 1.60 2.81 -0.48 5.03 0.16 1.30 -1.06 -3.00 7.26 1.09 1.56 0.62 2.57 -0.15 1.93 0.19 3.59 -2.58 2.06 // H ARGP820103 D Membrane-buried preference parameters (Argos et al., 1982) R LIT:0901079b PMID:7151796 A Argos, P., Rao, J.K.M. and Hargrave, P.A. T Structural prediction of membrane-bound proteins J Eur. J. Biochem. 128, 565-575 (1982) C ARGP820102 0.961 MIYS850101 0.822 NAKH900106 0.810 EISD860103 0.810 KYTJ820101 0.806 JURD980101 0.800 I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V 1.56 0.45 0.27 0.14 1.23 0.51 0.23 0.62 0.29 1.67 2.93 0.15 2.96 2.03 0.76 0.81 0.91 1.08 0.68 1.14 // """ fake_file_aaindex2 =\ """ H ALTS910101 D The PAM-120 matrix (Altschul, 1991) R LIT:1713145 PMID:2051488 A Altschul, S.F. T Amino acid substitution matrices from an information theoretic perspective J J. Mol. Biol. 219, 555-565 (1991) M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV 3. -3. 6. 0. -1. 4. 0. -3. 2. 5. -3. -4. -5. -7. 9. -1. 1. 0. 1. -7. 6. 0. -3. 1. 3. -7. 2. 5. 1. -4. 0. 0. -5. -3. -1. 5. -3. 1. 2. 0. -4. 3. -1. -4. 7. -1. -2. -2. -3. -3. -3. -3. -4. -4. 6. -3. -4. -4. -5. -7. -2. -4. -5. -3. 1. 5. -2. 2. 1. -1. -7. 0. -1. -3. -2. -2. -4. 5. -2. -1. -3. -4. -6. -1. -4. -4. -4. 1. 3. 0. 8. -4. -4. -4. -7. -6. -6. -6. -5. -2. 0. 0. -6. -1. 8. 1. -1. -2. -2. -3. 0. 
-1. -2. -1. -3. -3. -2. -3. -5. 6. 1. -1. 1. 0. -1. -2. -1. 1. -2. -2. -4. -1. -2. -3. 1. 3. 1. -2. 0. -1. -3. -2. -2. -1. -3. 0. -3. -1. -1. -4. -1. 2. 4. -7. 1. -5. -8. -8. -6. -8. -8. -5. -7. -5. -5. -7. -1. -7. -2. -6. 12. -4. -6. -2. -5. -1. -5. -4. -6. -1. -2. -3. -6. -4. 4. -6. -3. -3. -1. 8. 0. -3. -3. -3. -2. -3. -3. -2. -3. 3. 1. -4. 1. -3. -2. -2. 0. -8. -3. 5. // H BENS940103 D Log-odds scoring matrix collected in 74-100 PAM (Benner et al., 1994) R LIT:2023094 PMID:7700864 A Benner, S.A., Cohen, M.A. and Gonnet, G.H. T Amino acid substitution during functionally constrained divergent evolution of protein sequences J Protein Engineering 7, 1323-1332 (1994) * extrapolated to 250 PAM M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV 2.4 -0.8 4.8 -0.2 0.3 3.6 -0.3 -0.5 2.2 4.8 0.3 -2.2 -1.8 -3.2 11.8 -0.3 1.6 0.7 0.8 -2.6 3.0 -0.1 0.3 1.0 2.9 -3.2 1.7 3.7 0.6 -1.0 0.4 0.2 -2.0 -1.1 -0.5 6.6 -1.0 1.0 1.2 0.4 -1.3 1.4 0.2 -1.6 6.1 -0.8 -2.6 -2.8 -3.9 -1.2 -2.0 -2.9 -4.3 -2.3 4.0 -1.4 -2.4 -3.1 -4.2 -1.6 -1.7 -3.1 -4.6 -1.9 2.8 4.2 -0.4 2.9 0.9 0.4 -2.9 1.7 1.2 -1.1 0.6 -2.3 -2.4 3.4 -0.8 -1.8 -2.2 -3.2 -1.2 -1.0 -2.2 -3.5 -1.5 2.6 2.9 -1.5 4.5 -2.6 -3.5 -3.2 -4.7 -0.7 -2.8 -4.3 -5.4 0.0 0.9 2.1 -3.6 1.3 7.2 0.4 -1.0 -1.0 -1.0 -3.1 -0.2 -0.7 -1.7 -1.0 -2.6 -2.2 -0.8 -2.4 -3.8 7.5 1.1 -0.2 0.9 0.4 0.1 0.1 0.1 0.4 -0.3 -1.8 -2.2 0.0 -1.4 -2.6 0.5 2.1 0.7 -0.3 0.4 -0.2 -0.6 -0.1 -0.2 -1.0 -0.5 -0.3 -1.1 0.1 -0.4 -2.2 0.1 1.4 2.5 -4.1 -1.6 -4.0 -5.5 -0.9 -2.8 -4.7 -4.1 -1.0 -2.3 -0.9 -3.6 -1.3 3.0 -5.2 -3.4 -3.7 14.7 -2.6 -2.0 -1.4 -2.8 -0.4 -1.8 -3.0 -4.3 2.5 -1.0 -0.1 -2.4 -0.5 5.3 -3.4 -1.9 -2.1 3.6 8.1 0.1 -2.2 -2.2 -2.9 -0.2 -1.7 -2.1 -3.1 -2.1 3.2 1.9 -1.9 1.8 0.1 -1.9 -1.0 0.2 -2.9 -1.4 3.4 // H QUIB020101 D STROMA score matrix for the alignment of known distant homologs (Qian-Goldstein, 2002) R PMID:12211027 A Qian, B. and Goldstein, R.A. T Optimization of a new score function for the generation of accurate alignments J Proteins. 
48, 605-610 (2002) M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV 2.5 0.2 5.2 1.1 0.7 2.5 1 0.1 3.3 5.3 1.2 -1.3 -1.9 -3.1 11.5 -0.1 2 1.9 1.1 -2.5 3.6 1.2 1.9 2.3 3.2 -2.4 1.7 3.7 1.4 -0.2 0.7 0.9 -1.3 -0.3 0.5 7.5 -1.4 1.5 1.4 0.5 -1.7 1.4 0.3 -1.7 6.8 0.3 -1.9 -2.4 -2.9 -3.2 -0.9 -3.1 -3.7 -1.8 4.5 -0.2 -1.5 -2.4 -3.4 -1.6 -1.2 -1.5 -3.8 -2.4 3.4 5.2 -0.2 3.4 1.6 1.4 -3 2.2 1.2 0.4 1.1 -1.5 -2 3.9 -0.2 -1.4 -2.1 -2.8 -1.3 -0.6 -2 -3.8 -0.8 2.2 3.1 -0.5 5.4 -1.6 -3.2 -2.5 -3.7 -0.8 -1.7 -13.7 -4.7 -0.9 2.2 3.7 -2.8 1.7 7 0.7 -0.6 -0.1 -0.2 -3.6 1 0 -0.8 -2.1 -2.4 -1.4 0.2 -1.9 -4.1 8.1 1.7 0.2 1.4 1.7 0.7 0.9 1.1 1.6 -0.1 -1.1 -0.8 1.4 -1.1 -2.5 2 2.8 1.7 0.2 1.4 0.1 0.3 -0.1 1.6 -0.6 -0.2 0 0.3 1 -0.3 -0.8 1.1 2.6 0.4 -3.3 -1.5 -4 -5.7 -0.5 -2.9 -4.7 -4.2 -1.2 -1.8 -1.2 -3 -0.6 3.7 -5 -2.8 -2.9 14.9 -1.8 -0.9 -0.8 -2.9 -0.3 -1.5 -2.2 -4.8 2.9 0.2 0.8 -1.5 0.5 5.2 -3.3 -0.9 -0.8 4.9 8.1 1.9 -2.8 -0.9 -2.5 0.7 -1.5 -1.3 -1.4 -2.5 4.5 3.4 -1 1.7 0.9 -1.1 -3 1.5 -2.5 0.3 4.2 // H HENS920104 D BLOSUM50 substitution matrix (Henikoff-Henikoff, 1992) R LIT:1902106 PMID:1438297 A Henikoff, S. and Henikoff, J.G. T Amino acid substitution matrices from protein blocks J Proc. Natl. Acad. Sci. 
USA 89, 10915-10919 (1992) * # Matrix made by matblas from blosum50.iij * # BLOSUM Clustered Scoring Matrix in 1/3 Bit Units * # Blocks Database = /data/blocks_5.0/blocks.dat * # Cluster Percentage: >= 50 * # Entropy = 0.4808, Expected = -0.3573 M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV 5 -2 -1 -2 -1 -1 -1 0 -2 -1 -2 -1 -1 -3 -1 1 0 -3 -2 0 -2 7 -1 -2 -4 1 0 -3 0 -4 -3 3 -2 -3 -3 -1 -1 -3 -1 -3 -1 -1 7 2 -2 0 0 0 1 -3 -4 0 -2 -4 -2 1 0 -4 -2 -3 -2 -2 2 8 -4 0 2 -1 -1 -4 -4 -1 -4 -5 -1 0 -1 -5 -3 -4 -1 -4 -2 -4 13 -3 -3 -3 -3 -2 -2 -3 -2 -2 -4 -1 -1 -5 -3 -1 -1 1 0 0 -3 7 2 -2 1 -3 -2 2 0 -4 -1 0 -1 -1 -1 -3 -1 0 0 2 -3 2 6 -3 0 -4 -3 1 -2 -3 -1 -1 -1 -3 -2 -3 0 -3 0 -1 -3 -2 -3 8 -2 -4 -4 -2 -3 -4 -2 0 -2 -3 -3 -4 -2 0 1 -1 -3 1 0 -2 10 -4 -3 0 -1 -1 -2 -1 -2 -3 2 -4 -1 -4 -3 -4 -2 -3 -4 -4 -4 5 2 -3 2 0 -3 -3 -1 -3 -1 4 -2 -3 -4 -4 -2 -2 -3 -4 -3 2 5 -3 3 1 -4 -3 -1 -2 -1 1 -1 3 0 -1 -3 2 1 -2 0 -3 -3 6 -2 -4 -1 0 -1 -3 -2 -3 -1 -2 -2 -4 -2 0 -2 -3 -1 2 3 -2 7 0 -3 -2 -1 -1 0 1 -3 -3 -4 -5 -2 -4 -3 -4 -1 0 1 -4 0 8 -4 -3 -2 1 4 -1 -1 -3 -2 -1 -4 -1 -1 -2 -2 -3 -4 -1 -3 -4 10 -1 -1 -4 -3 -3 1 -1 1 0 -1 0 -1 0 -1 -3 -3 0 -2 -3 -1 5 2 -4 -2 -2 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 2 5 -3 -2 0 -3 -3 -4 -5 -5 -1 -3 -3 -3 -3 -2 -3 -1 1 -4 -4 -3 15 2 -3 -2 -1 -2 -3 -3 -1 -2 -3 2 -1 -1 -2 0 4 -3 -2 -2 2 8 -1 0 -3 -3 -4 -1 -3 -3 -4 -4 4 1 -3 1 -1 -3 -2 0 -3 -1 5 // H KOSJ950101 D Context-dependent optimal substitution matrices for exposed helix (Koshi-Goldstein, 1995) R LIT:2124140 PMID:8577693 A Koshi, J.M. and Goldstein, R.A. T Context-dependent optimal substitution matrices. 
J Protein Engineering 8, 641-645 (1995) M rows = -ARNDCQEGHILKMFPSTWYV, cols = -ARNDCQEGHILKMFPSTWYV 55.7 3.0 3.0 3.0 3.0 0.4 0.1 3.0 3.0 2.1 3.0 3.0 3.0 0.1 1.9 2.2 2.4 3.0 0.8 1.3 3.0 25.6 47.2 1.5 1.0 0.7 0.3 1.9 2.3 4.3 0.6 0.2 2.0 0.8 0.1 0.3 3.1 2.8 3.7 0.4 0.1 2.0 14.8 0.9 62.7 1.3 0.4 0.3 4.6 0.3 0.1 1.9 0.5 2.2 5.1 0.6 0.2 0.4 1.9 1.5 0.4 0.2 0.3 15.2 0.2 0.5 48.2 3.3 0.1 3.2 4.9 0.1 1.7 1.7 1.4 3.0 0.6 1.0 0.1 9.7 2.7 0.7 1.1 1.5 15.9 3.9 1.4 7.3 52.1 0.3 0.9 11.0 2.0 0.4 0.6 0.1 0.6 0.5 0.1 0.6 2.9 0.1 0.1 0.1 0.1 9.4 1.5 0.1 1.5 1.6 73.6 0.1 2.6 0.1 0.1 2.1 4.0 0.1 0.1 0.8 0.7 0.3 2.2 0.1 0.1 0.1 0.1 8.4 5.7 2.0 4.5 0.3 47.5 8.2 0.9 1.6 0.1 3.4 7.8 0.5 0.1 0.7 5.3 2.2 0.2 0.7 0.5 5.2 5.3 1.0 1.5 8.6 0.1 4.9 56.8 1.5 1.0 0.3 0.9 5.8 0.1 0.2 1.6 2.1 2.4 0.2 0.1 1.1 20.2 2.0 1.2 2.3 3.3 0.1 0.4 0.1 6 4.8 0.8 0.1 0.1 1.4 0.3 0.6 0.1 1.2 0.6 0.1 0.5 13.3 0.3 4.7 7.5 1.8 0.1 4.4 0.7 0.1 56.9 0.6 0.1 2.3 1.2 2.2 0.1 0.1 0.1 0.1 4.4 0.1 18.4 0.1 0.1 0.1 0.1 0.1 0.4 0.1 0.1 0.1 5 2.6 10.8 1.2 3.5 1.3 0.1 0.1 3.4 0.1 0.1 // H OVEJ920102 D Environment-specific amino acid substitution matrix for alpha residues (Overington et al., 1992) R LIT:1811128 PMID:1304904 A Overington, J., Donnelly, D., Johnson, M.S., Sali, A. and Blundell, T.L. 
T Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds J Protein Science 1, 216-226 (1992) M rows = ACDEFGHIKLMNPQRSTVWYJ-, cols = ACDEFGHIKLMNPQRSTVWYJ 0.355 0.007 0.090 0.100 0.050 0.177 0.037 0.077 0.096 0.056 0.081 0.103 0.106 0.090 0.088 0.163 0.120 0.098 0.065 0.036 0.252 0.001 0.901 0.000 0.000 0.000 0.000 0.000 0.004 0.001 0.000 0.000 0.003 0.000 0.006 0.006 0.004 0.002 0.000 0.007 0.000 0.000 0.038 0.000 0.315 0.109 0.006 0.041 0.027 0.009 0.033 0.004 0.009 0.088 0.051 0.089 0.023 0.065 0.048 0.013 0.012 0.011 0.009 0.044 0.011 0.111 0.305 0.011 0.048 0.026 0.011 0.059 0.013 0.009 0.068 0.069 0.086 0.053 0.033 0.045 0.017 0.012 0.018 0.000 0.017 0.000 0.005 0.007 0.415 0.004 0.009 0.039 0.025 0.097 0.042 0.013 0.006 0.011 0.009 0.009 0.014 0.041 0.053 0.085 0.009 0.065 0.000 0.070 0.042 0.006 0.370 0.017 0.022 0.029 0.013 0.015 0.036 0.043 0.031 0.013 0.068 0.049 0.014 0.009 0.021 0.045 0.010 0.000 0.012 0.011 0.010 0.007 0.571 0.003 0.022 0.005 0.015 0.043 0.006 0.035 0.021 0.016 0.008 0.017 0.009 0.037 0.009 0.029 0.014 0.009 0.008 0.048 0.021 0.004 0.325 0.017 0.076 0.107 0.018 0.007 0.007 0.015 0.014 0.033 0.112 0.016 0.030 0.018 0.053 0.007 0.044 0.081 0.020 0.041 0.044 0.026 0.336 0.029 0.059 0.073 0.045 0.094 0.163 0.041 0.054 0.026 0.041 0.028 0.036 0.038 0.000 0.006 0.018 0.210 0.019 0.004 0.139 0.033 0.415 0.225 0.033 0.016 0.041 0.028 0.029 0.026 0.133 0.037 0.057 0.036 0.013 0.000 0.004 0.003 0.016 0.007 0.000 0.043 0.014 0.053 0.197 0.010 0.000 0.018 0.004 0.003 0.010 0.018 0.021 0.021 0.018 0.031 0.007 0.057 0.035 0.010 0.026 0.054 0.012 0.034 0.012 0.013 0.195 0.015 0.066 0.026 0.037 0.046 0.012 0.002 0.048 0.000 0.022 0.000 0.036 0.035 0.005 0.026 0.011 0.009 0.020 0.006 0.000 0.013 0.424 0.013 0.016 0.039 0.011 0.009 0.002 0.000 0.000 0.025 0.011 0.045 0.039 0.011 0.021 0.031 0.004 0.045 0.015 0.035 0.059 0.015 0.183 0.029 0.030 0.030 0.008 0.007 0.025 0.009 0.019 0.011 0.012 0.023 0.005 
0.008 0.019 0.010 0.069 0.009 0.004 0.018 0.013 0.028 0.348 0.030 0.019 0.005 0.007 0.018 0.018
0.086 0.021 0.075 0.047 0.012 0.079 0.033 0.020 0.041 0.020 0.009 0.089 0.082 0.069 0.063 0.264 0.096 0.028 0.005 0.020 0.054
0.043 0.007 0.039 0.033 0.020 0.038 0.014 0.026 0.032 0.015 0.026 0.057 0.028 0.046 0.035 0.065 0.266 0.037 0.016 0.034 0.000
0.055 0.000 0.018 0.021 0.069 0.022 0.044 0.178 0.025 0.111 0.016 0.018 0.025 0.017 0.015 0.129 0.060 0.350 0.012 0.043 0.162
0.009 0.000 0.003 0.004 0.022 0.004 0.007 0.006 0.012 0.006 0.020 0.001 0.001 0.006 0.004 0.002 0.007 0.003 0.588 0.064 0.000
0.009 0.000 0.006 0.006 0.046 0.006 0.029 0.014 0.007 0.013 0.031 0.033 0.003 0.020 0.010 0.007 0.017 0.016 0.078 0.377 0.027
0.009 0.000 0.001 0.000 0.001 0.004 0.001 0.002 0.002 0.002 0.004 0.000 0.000 0.004 0.003 0.006 0.004 0.010 0.000 0.005 0.297
0.028 0.004 0.041 0.074 0.010 0.029 0.017 0.022 0.050 0.031 0.033 0.031 0.045 0.039 0.028 0.047 0.034 0.032 0.002 0.021 0.000
//
"""

# Run tests if called from the command line
if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_agilent_microarray.py

#!/usr/bin/env python
"""Tests for the Microarray output parser
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.parse.agilent_microarray import *

__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"

class MicroarrayParserTests(TestCase):
    """Tests for MicroarrayParser.
    """
    def setUp(self):
        """Setup function for MicroarrayParser tests.
""" self.sample_file = ['first line in file', 'second line, useless data', 'FEATURES\tFirst\tL\tProbeName\tGeneName\tLogRatio', 'DATA\tFirst\tData\tProbe1\tGene1\t0.02', 'DATA\tSecond\tData\tProbe2\tGene2\t-0.34'] def test_MicroarrayParser_empty_list(self): #Empty list should return tuple of empty lists self.assertEqual(MicroarrayParser([]),([],[],[])) def test_MicroarrayParser(self): #Given correct file format, return correct results self.assertEqual(MicroarrayParser(self.sample_file), (['PROBE1','PROBE2'], ['GENE1','GENE2'],[float(0.02),float(-0.34)])) #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_binary_sff.py000644 000765 000024 00000043253 12024702176 023316 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import copy import os import tempfile from unittest import TestCase, main from cogent.parse.binary_sff import ( seek_pad, parse_common_header, parse_read_header, parse_read_data, validate_common_header, parse_read, parse_binary_sff, UnsupportedSffError, write_pad, write_common_header, write_read_header, write_read_data, write_read, write_binary_sff, format_common_header, format_read_header, format_read_data, format_binary_sff, base36_encode, base36_decode, decode_location, decode_timestamp, decode_accession, decode_sff_filename, ) __author__ = "Kyle Bittinger" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Kyle Bittinger"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kyle Bittinger" __email__ = "kylebittinger@gmail.com" __status__ = "Production" TEST_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__))) SFF_FP = os.path.join(TEST_DIR, 'data', 'F6AVWTA01.sff') class WritingFunctionTests(TestCase): def setUp(self): self.output_file = tempfile.TemporaryFile() def test_write_pad(self): self.output_file.write('\x01\x02\x03\x04') write_pad(self.output_file) self.output_file.seek(0) buff = self.output_file.read() self.assertEqual(buff, 
                         '\x01\x02\x03\x04\x00\x00\x00\x00')

    def test_write_common_header(self):
        write_common_header(self.output_file, COMMON_HEADER)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

        self.output_file.seek(0)
        observed = parse_common_header(self.output_file)
        self.assertEqual(observed, COMMON_HEADER)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

    def test_write_read_header(self):
        write_read_header(self.output_file, READ_HEADER)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

        self.output_file.seek(0)
        observed = parse_read_header(self.output_file)
        self.assertEqual(observed, READ_HEADER)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

    def test_write_read_data(self):
        write_read_data(self.output_file, READ_DATA)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

        self.output_file.seek(0)
        num_flows = len(READ_DATA['flowgram_values'])
        num_bases = len(READ_DATA['Bases'])
        observed = parse_read_data(self.output_file, num_bases, num_flows)
        self.assertEqual(observed, READ_DATA)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

    def test_write_read(self):
        read = READ_HEADER.copy()
        read.update(READ_DATA)
        write_read(self.output_file, read)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

        self.output_file.seek(0)
        num_flows = len(read['flowgram_values'])
        observed = parse_read(self.output_file)
        self.assertEqual(observed, read)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

    def test_write_binary_sff(self):
        read = READ_HEADER.copy()
        read.update(READ_DATA)
        header = COMMON_HEADER.copy()
        header['number_of_reads'] = 1
        write_binary_sff(self.output_file, header, [read])
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

        self.output_file.seek(0)
        observed_header, observed_reads = parse_binary_sff(
            self.output_file, native_flowgram_values=True)
        observed_reads = list(observed_reads)
        self.assertEqual(observed_header,
                         header)
        self.assertEqual(observed_reads[0], read)
        self.assertEqual(len(observed_reads), 1)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)

class ParsingFunctionTests(TestCase):
    def setUp(self):
        self.sff_file = open(SFF_FP)

    def test_seek_pad(self):
        f = self.sff_file
        f.seek(8)
        seek_pad(f)
        self.assertEqual(f.tell(), 8)
        f.seek(9)
        seek_pad(f)
        self.assertEqual(f.tell(), 16)
        f.seek(10)
        seek_pad(f)
        self.assertEqual(f.tell(), 16)
        f.seek(15)
        seek_pad(f)
        self.assertEqual(f.tell(), 16)
        f.seek(16)
        seek_pad(f)
        self.assertEqual(f.tell(), 16)
        f.seek(17)
        seek_pad(f)
        self.assertEqual(f.tell(), 24)

    def test_parse_common_header(self):
        observed = parse_common_header(self.sff_file)
        self.assertEqual(observed, COMMON_HEADER)

    def test_validate_common_header(self):
        header = {
            'magic_number': 779314790,
            'version': 1,
            'flowgram_format_code': 1,
            'index_offset': 0,
            'index_length': 0,
            'number_of_reads': 0,
            'header_length': 0,
            'key_length': 0,
            'number_of_flows_per_read': 0,
            'flow_chars': 'A',
            'key_sequence': 'A',
            }
        self.assertEqual(validate_common_header(header), None)
        header['version'] = 2
        self.assertRaises(UnsupportedSffError, validate_common_header, header)

    def test_parse_read_header(self):
        self.sff_file.seek(440)
        observed = parse_read_header(self.sff_file)
        self.assertEqual(observed, READ_HEADER)

    def test_parse_read_data(self):
        self.sff_file.seek(440 + 32)
        observed = parse_read_data(self.sff_file, 271, 400)
        self.assertEqual(observed, READ_DATA)

    def test_parse_read(self):
        self.sff_file.seek(440)
        observed = parse_read(self.sff_file, 400)
        expected = dict(READ_HEADER.items() + READ_DATA.items())
        self.assertEqual(observed, expected)

    def test_parse_sff(self):
        header, reads = parse_binary_sff(self.sff_file)
        self.assertEqual(header, COMMON_HEADER)
        counter = 0
        for read in reads:
            self.assertEqual(
                len(read['flowgram_values']),
                header['number_of_flows_per_read'])
            counter += 1
        self.assertEqual(counter, 20)

class FormattingFunctionTests(TestCase):
    def setUp(self):
        self.output_file =\
            tempfile.TemporaryFile()

    def test_format_common_header(self):
        self.assertEqual(
            format_common_header(COMMON_HEADER), COMMON_HEADER_TXT)

    def test_format_read_header(self):
        self.assertEqual(
            format_read_header(READ_HEADER), READ_HEADER_TXT)

    # Renamed from a second test_format_read_header, which silently
    # shadowed the method above and left format_read_data untested.
    def test_format_read_data(self):
        self.assertEqual(
            format_read_data(READ_DATA, READ_HEADER), READ_DATA_TXT)

    def test_format_binary_sff(self):
        output_buffer = format_binary_sff(open(SFF_FP))
        output_buffer.seek(0)
        expected = COMMON_HEADER_TXT + READ_HEADER_TXT + READ_DATA_TXT
        observed = output_buffer.read(len(expected))
        self.assertEqual(observed, expected)

class Base36Tests(TestCase):
    def test_base36_encode(self):
        self.assertEqual(base36_encode(2), 'C')
        self.assertEqual(base36_encode(37), 'BB')

    def test_base36_decode(self):
        self.assertEqual(base36_decode('C'), 2)
        self.assertEqual(base36_decode('BB'), 37)

    def test_decode_location(self):
        self.assertEqual(decode_location('C'), (0, 2))

    def test_decode_timestamp(self):
        self.assertEqual(decode_timestamp('C3U5GW'), (2004, 9, 22, 16, 59, 10))
        self.assertEqual(decode_timestamp('GA202I'), (2010, 1, 22, 13, 28, 56))

    def test_decode_accession(self):
        self.assertEqual(
            decode_accession('GA202I001ER3QL'),
            ((2010, 1, 22, 13, 28, 56), '0', 1, (1843, 859)))

    def test_decode_sff_filename(self):
        self.assertEqual(
            decode_sff_filename('F6AVWTA01.sff'),
            ((2009, 11, 25, 14, 30, 19), 'A', 1))

COMMON_HEADER = {
    'header_length': 440,
    'flowgram_format_code': 1,
    'index_length': 900,
    'magic_number': 779314790,
    'number_of_flows_per_read': 400,
    'version': 1,
    'flow_chars': 100 * 'TACG',
    'key_length': 4,
    'key_sequence': 'TCAG',
    'number_of_reads': 20,
    'index_offset': 33464,
    }

COMMON_HEADER_TXT = """\
Common Header:
  Magic Number:  0x2E736666
  Version:       0001
  Index Offset:  33464
  Index Length:  900
  # of Reads:    20
  Header Length: 440
  Key Length:    4
  # of Flows:    400
  Flowgram Code: 1
  Flow Chars:
TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG Key Sequence: TCAG """ READ_HEADER = { 'name_length': 14, 'Name': 'GA202I001ER3QL', 'clip_adapter_left': 0, 'read_header_length': 32, 'clip_adapter_right': 0, 'number_of_bases': 271, 'clip_qual_left': 5, 'clip_qual_right': 271, } READ_HEADER_TXT = """ >GA202I001ER3QL Run Prefix: R_2010_01_22_13_28_56_ Region #: 1 XY Location: 1843_0859 Read Header Len: 32 Name Length: 14 # of Bases: 271 Clip Qual Left: 5 Clip Qual Right: 271 Clip Adap Left: 0 Clip Adap Right: 0 """ READ_DATA = { 'flow_index_per_base': ( 1, 2, 3, 2, 3, 3, 2, 1, 1, 2, 1, 2, 0, 2, 3, 3, 2, 3, 3, 0, 2, 0, 2, 0, 1, 1, 1, 2, 0, 2, 2, 1, 0, 0, 3, 0, 2, 1, 0, 1, 1, 3, 1, 2, 2, 2, 3, 2, 1, 0, 2, 0, 3, 0, 3, 3, 1, 3, 0, 0, 0, 0, 2, 1, 0, 2, 0, 2, 0, 2, 2, 2, 2, 3, 2, 2, 0, 1, 0, 0, 0, 2, 1, 3, 2, 0, 3, 3, 2, 1, 2, 0, 2, 2, 1, 2, 1, 2, 0, 1, 3, 0, 0, 3, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 3, 0, 2, 1, 1, 2, 1, 3, 2, 2, 1, 0, 3, 3, 0, 2, 0, 1, 1, 3, 3, 3, 2, 0, 0, 0, 3, 3, 2, 1, 1, 2, 2, 1, 1, 0, 1, 0, 2, 0, 3, 1, 1, 0, 2, 0, 0, 1, 0, 3, 2, 3, 3, 3, 1, 3, 2, 0, 1, 3, 3, 3, 1, 3, 2, 0, 1, 2, 2, 3, 3, 3, 2, 3, 3, 3, 0, 3, 3, 2, 2, 0, 3, 1, 1, 3, 0, 1, 0, 3, 2, 2, 0, 2, 0, 2, 0, 0, 2, 3, 2, 2, 0, 2, 0, 3, 2, 3, 1, 2, 0, 3, 0, 2, 2, 2, 1, 1, 2, 2, 1, 1, 0, 3, 3, 2, 0, 1, 0, 3, 0, 2, 3, 1, 1, 1, 1, 3, 1, 0, 1, 1, 2, 2, 3, 1, 0, 0, 1, 1, 3, 3, 1, 3, 0, 1, 0), 'flowgram_values': ( 101, 0, 98, 3, 0, 104, 2, 95, 1, 0, 97, 3, 0, 110, 2, 102, 102, 110, 2, 99, 101, 0, 195, 5, 102, 0, 5, 96, 7, 0, 95, 7, 101, 0, 8, 98, 9, 0, 190, 9, 201, 0, 194, 101, 107, 104, 12, 198, 13, 104, 2, 105, 295, 7, 4, 197, 10, 101, 195, 98, 101, 3, 10, 100, 
102, 0, 100, 7, 101, 0, 96, 8, 11, 102, 12, 102, 203, 9, 196, 8, 13, 206, 13, 6, 103, 10, 4, 103, 102, 3, 7, 479, 9, 102, 202, 10, 198, 6, 195, 9, 102, 0, 100, 5, 100, 2, 103, 8, 8, 100, 6, 102, 7, 200, 388, 10, 97, 100, 8, 5, 100, 12, 197, 7, 13, 103, 8, 7, 104, 10, 101, 104, 12, 201, 12, 99, 8, 99, 106, 13, 103, 102, 8, 202, 108, 9, 13, 293, 7, 4, 203, 103, 202, 107, 376, 103, 8, 11, 188, 8, 99, 101, 104, 8, 92, 101, 12, 4, 92, 11, 101, 7, 96, 202, 8, 12, 93, 11, 11, 202, 7, 195, 101, 102, 6, 0, 101, 7, 7, 106, 2, 6, 107, 4, 404, 12, 6, 104, 8, 10, 98, 2, 105, 110, 100, 8, 95, 3, 105, 102, 208, 201, 13, 195, 14, 0, 99, 86, 202, 9, 301, 206, 8, 8, 85, 6, 101, 6, 9, 103, 8, 9, 96, 4, 7, 102, 111, 0, 8, 93, 7, 194, 111, 5, 10, 95, 5, 10, 104, 2, 6, 98, 103, 0, 11, 99, 15, 192, 110, 5, 98, 8, 91, 8, 10, 92, 5, 10, 102, 8, 7, 105, 15, 102, 7, 9, 100, 2, 3, 102, 6, 9, 203, 6, 14, 107, 12, 8, 107, 1, 103, 13, 202, 2, 6, 108, 103, 99, 11, 2, 201, 207, 14, 8, 94, 4, 95, 9, 195, 13, 193, 9, 306, 13, 100, 11, 6, 75, 13, 91, 12, 205, 7, 203, 10, 3, 107, 17, 111, 12, 4, 105, 106, 7, 208, 5, 9, 202, 8, 108, 6, 84, 16, 103, 108, 92, 16, 93, 8, 95, 94, 207, 17, 10, 103, 3, 0, 104, 0, 202, 217, 16, 12, 197, 4, 90, 15, 17, 108, 98, 125, 104, 88, 14, 15, 99, 187, 106, 109, 12, 100, 11, 81, 8, 11, 92, 304, 112, 107, 2, 11, 94, 7, 6, 86, 97, 19, 3, 225, 206), 'Bases': ( 'TCAGCAGTAGTCCTGCTGCCTTCCGTAGGAGTTTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCT' 'CTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCCCGCCTACTATCTAATGGAACGCATCCCC' 'ATCGTCTACCGGAATACCTTTAATCATGTGAACATGTGAACTCATGATGCCATCTTGTATTAATCTTCCT' 'TTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGG'), 'quality_scores': ( 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 37, 37, 37, 37, 37, 37, 37, 37, 34, 34, 34, 34, 34, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 32, 32, 32, 32, 38, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 40, 40, 40, 38, 38, 38, 38, 38, 38, 38, 40, 38, 38, 38, 38, 38, 38, 37, 38, 38, 36, 37, 37, 36, 33, 28, 28, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 31, 30, 30, 25, 25, 25, 25), } READ_DATA_TXT = """ Flowgram: 1.01 0.00 0.98 0.03 0.00 1.04 0.02 0.95 0.01 0.00 0.97 0.03 0.00 1.10 0.02 1.02 1.02 1.10 0.02 0.99 1.01 0.00 1.95 0.05 1.02 0.00 0.05 0.96 0.07 0.00 0.95 0.07 1.01 0.00 0.08 0.98 0.09 0.00 1.90 0.09 2.01 0.00 1.94 1.01 1.07 1.04 0.12 1.98 0.13 1.04 0.02 1.05 2.95 0.07 0.04 1.97 0.10 1.01 1.95 0.98 1.01 0.03 0.10 1.00 1.02 0.00 1.00 0.07 1.01 0.00 0.96 0.08 0.11 1.02 0.12 1.02 2.03 0.09 1.96 0.08 0.13 2.06 0.13 0.06 1.03 0.10 0.04 1.03 1.02 0.03 0.07 4.79 0.09 1.02 2.02 0.10 1.98 0.06 1.95 0.09 1.02 0.00 1.00 0.05 1.00 0.02 1.03 0.08 0.08 1.00 0.06 1.02 0.07 2.00 3.88 0.10 0.97 1.00 0.08 0.05 1.00 0.12 1.97 0.07 0.13 1.03 0.08 0.07 1.04 0.10 1.01 1.04 0.12 2.01 0.12 0.99 0.08 0.99 1.06 0.13 1.03 1.02 0.08 2.02 1.08 0.09 0.13 2.93 0.07 0.04 2.03 1.03 2.02 1.07 3.76 1.03 0.08 0.11 1.88 0.08 0.99 1.01 1.04 0.08 0.92 1.01 0.12 0.04 0.92 0.11 1.01 0.07 0.96 2.02 0.08 0.12 0.93 0.11 0.11 2.02 0.07 1.95 1.01 1.02 0.06 0.00 1.01 0.07 0.07 1.06 0.02 0.06 1.07 0.04 4.04 0.12 0.06 1.04 0.08 0.10 0.98 0.02 1.05 1.10 1.00 0.08 0.95 0.03 1.05 1.02 2.08 2.01 0.13 1.95 0.14 0.00 0.99 0.86 2.02 0.09 3.01 2.06 0.08 0.08 0.85 0.06 1.01 0.06 0.09 1.03 0.08 0.09 0.96 0.04 0.07 1.02 1.11 0.00 0.08 0.93 
0.07 1.94 1.11 0.05 0.10 0.95 0.05 0.10 1.04 0.02 0.06 0.98 1.03 0.00 0.11 0.99 0.15 1.92 1.10 0.05 0.98 0.08 0.91 0.08 0.10 0.92 0.05 0.10 1.02 0.08 0.07 1.05 0.15 1.02 0.07 0.09 1.00 0.02 0.03 1.02 0.06 0.09 2.03 0.06 0.14 1.07 0.12 0.08 1.07 0.01 1.03 0.13 2.02 0.02 0.06 1.08 1.03 0.99 0.11 0.02 2.01 2.07 0.14 0.08 0.94 0.04 0.95 0.09 1.95 0.13 1.93 0.09 3.06 0.13 1.00 0.11 0.06 0.75 0.13 0.91 0.12 2.05 0.07 2.03 0.10 0.03 1.07 0.17 1.11 0.12 0.04 1.05 1.06 0.07 2.08 0.05 0.09 2.02 0.08 1.08 0.06 0.84 0.16 1.03 1.08 0.92 0.16 0.93 0.08 0.95 0.94 2.07 0.17 0.10 1.03 0.03 0.00 1.04 0.00 2.02 2.17 0.16 0.12 1.97 0.04 0.90 0.15 0.17 1.08 0.98 1.25 1.04 0.88 0.14 0.15 0.99 1.87 1.06 1.09 0.12 1.00 0.11 0.81 0.08 0.11 0.92 3.04 1.12 1.07 0.02 0.11 0.94 0.07 0.06 0.86 0.97 0.19 0.03 2.25 2.06 Flow Indexes: 1 3 6 8 11 14 16 17 18 20 21 23 23 25 28 31 33 36 39 39 41 41 43 43 44 45 46 48 48 50 52 53 53 53 56 56 58 59 59 60 61 64 65 67 69 71 74 76 77 77 79 79 82 82 85 88 89 92 92 92 92 92 94 95 95 97 97 99 99 101 103 105 107 110 112 114 114 115 115 115 115 117 118 121 123 123 126 129 131 132 134 134 136 138 139 141 142 144 144 145 148 148 148 151 151 152 153 153 154 155 155 155 155 156 159 159 161 162 163 165 166 169 171 173 174 174 177 180 180 182 182 183 184 187 190 193 195 195 195 195 198 201 203 204 205 207 209 210 211 211 212 212 214 214 217 218 219 219 221 221 221 222 222 225 227 230 233 236 237 240 242 242 243 246 249 252 253 256 258 258 259 261 263 266 269 272 274 277 280 283 283 286 289 291 293 293 296 297 298 301 301 302 302 305 307 309 309 311 311 313 313 313 315 318 320 322 322 324 324 327 329 332 333 335 335 338 338 340 342 344 345 346 348 350 351 352 352 355 358 360 360 361 361 364 364 366 369 370 371 372 373 376 377 377 378 379 381 383 386 387 387 387 388 389 392 395 396 399 399 400 400 Bases: 
tcagCAGTAGTCCTGCTGCCTTCCGTAGGAGTTTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCTCTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCCCGCCTACTATCTAATGGAACGCATCCCCATCGTCTACCGGAATACCTTTAATCATGTGAACATGTGAACTCATGATGCCATCTTGTATTAATCTTCCTTTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGG Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 34 34 34 34 34 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 32 32 32 32 38 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 38 38 38 40 40 40 38 38 38 38 38 38 38 40 38 38 38 38 38 38 37 38 38 36 37 37 36 33 28 28 31 31 31 31 31 31 31 31 31 31 31 32 32 31 30 30 25 25 25 25 """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_blast.py000644 000765 000024 00000032350 12024702176 022275 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of BLAST parser. 
""" from string import split, strip from cogent.util.unit_test import TestCase, main from cogent.parse.blast import iter_finder, query_finder, iteration_set_finder,\ is_blast_junk, is_blat_junk, make_label, PsiBlastQueryFinder, \ TableToValues, \ PsiBlastTableParser, PsiBlastFinder, GenericBlastParser9, \ PsiBlastParser9, LastProteinIds9, QMEBlast9, QMEPsiBlast9, \ fastacmd_taxonomy_splitter, FastacmdTaxonomyParser __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Production" class BlastTests(TestCase): """Tests of top-level functions""" def setUp(self): """Define some standard data""" self.rec = """# BLASTP 2.2.10 [Oct-19-2004] # Iteration: 1 # Query: ece:Z4181 # Database: db/everything.faa # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-06 52.8 # BLASTP 2.2.10 [Oct-19-2004] # Iteration: 2 # Query: ece:Z4181 # Database: db/everything.faa # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. 
end, e-value, bit score ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-54 211 ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211 ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0 ece:Z4181 sfl:CP0138 33.98 103 57 2 8 110 6 97 6e-06 50.5 ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8 ece:Z4181 sec:SC2804 37.50 72 45 0 39 110 30 101 1e-05 49.8 ece:Z4181 stm:STM2872 37.50 72 45 0 39 110 30 101 1e-05 49.8""".split('\n') self.rec2 = """# BLASTP 2.2.10 [Oct-19-2004] # Iteration: 1 # Query: ece:Z4181 # Database: db/everything.faa # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-06 52.8 # BLASTP 2.2.10 [Oct-19-2004] # Iteration: 2 # Query: ece:Z4181 # Database: db/everything.faa # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-54 211 ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211 ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0 ece:Z4181 sfl:CP0138 33.98 103 57 2 8 110 6 97 6e-06 50.5 ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8 ece:Z4181 sec:SC2804 37.50 72 45 0 39 110 30 101 1e-05 49.8 ece:Z4181 stm:STM2872 37.50 72 45 0 39 110 30 101 1e-05 49.8 # BLASTP 2.2.10 [Oct-19-2004] # Iteration: 1 # Query: ece:Z4182 # Database: db/everything.faa # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. 
end, e-value, bit score ece:Z4182 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4182 ecs:ECs3718 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4182 cvi:CV2422 41.67 72 42 0 39 110 29 100 2e-06 52.8""".split('\n') self.rec3 = """# BLASTP 2.2.10 [Oct-19-2004] # Iteration: 1 # Query: ece:Z4181 # Database: db/everything.faa # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8 # BLASTP 2.2.10 [Oct-19-2004] # Iteration: 2 # Query: ece:Z4181 # Database: db/everything.faa # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211 ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0 # BLASTP 2.2.10 [Oct-19-2004] # Iteration: 1 # Query: ece:Z4182 # Database: db/everything.faa # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. 
end, e-value, bit score ece:Z4182 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187 ece:Z4182 cvi:CV2422 41.67 72 42 0 39 110 29 100 2e-06 52.8""".split('\n') def test_iter_finder(self): """iter_finder should split on lines starting with '# Iteration:'""" lines = 'abc\n# Iteration: 3\ndef'.split('\n') self.assertEqual(map(iter_finder,lines), [False, True, False]) def test_query_finder(self): """query_finder should split on lines starting with '# Query:'""" lines = 'abc\n# Query: dfdsffsd\ndef'.split('\n') self.assertEqual(map(query_finder,lines), [False, True, False]) def test_iteration_set_finder(self): """iter_finder should split on lines starting with '# Iteration:'""" lines = 'abc\n# Iteration: 3\ndef\n# Iteration: 1'.split('\n') self.assertEqual(map(iteration_set_finder,lines), \ [False, False, False, True]) def test_is_junk(self): """is_junk should reject an assortment of invalid lines""" #Note: testing two functions that call it instead of function itself lines = 'abc\n# BLAST blah blah\n \n# BLAT blah\n123'.split('\n') self.assertEqual(map(is_blast_junk, lines), \ [False, True, True, False, False]) self.assertEqual(map(is_blat_junk, lines), \ [False, False, True, True, False]) def test_make_label(self): """make_label should turn comment lines into (key, val) pairs""" a = 'this test will fail: no # at start' b = '#this test will fail because no colon' c = '# Iteration: 1' d = '# Query: ece:Z4147 ygdP; putative invasion protein [EC:3.6.1.-]' e = '#Iteration: 1' #no space after the hash self.assertRaises(ValueError, make_label, a) self.assertRaises(ValueError, make_label, b) #Note that we _do_ map the data type of known values value, so the #value of the iteration will be 1, not '1' self.assertEqual(make_label(c), ('ITERATION', 1)) self.assertEqual(make_label(d), ('QUERY', \ 'ece:Z4147 ygdP; putative invasion protein [EC:3.6.1.-]')) self.assertEqual(make_label(e), ('ITERATION', 1)) def test_TableToValues(self): """TableToValues should convert itself into the 
correct type.""" constructors = {'a':int, 'b':float, 'c':str} table=[['c','b','a','d'], ['1.5', '3.5', '2', '2.5'],['1','2','3','4']] self.assertEqual(TableToValues(table, constructors), \ ([['1.5',3.5,2,'2.5'],['1',2.0,3,'4']], ['c','b','a','d'])) #check that it works with supplied header self.assertEqual(TableToValues(table[1:], constructors, list('cbad')), \ ([['1.5',3.5,2,'2.5'],['1',2.0,3,'4']], ['c','b','a','d'])) def test_PsiBlastTableParser(self): """PsiBlastTableParser should wrap values in table.""" fields = map(strip, 'Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score'.split(',')) table = map(split, """ece:Z4147 ece:Z4147 100.00 176 0 0 1 176 1 176 2e-89 328 ece:Z4147 ecs:ECs3687 100.00 176 0 0 1 176 1 176 2e-89 328 ece:Z4147 ecc:c3425 100.00 176 0 0 1 176 1 176 2e-89 328 ece:Z4147 sfl:SF2840 100.00 176 0 0 1 176 1 176 2e-89 328""".split('\n')) headed_table = [fields] + table new_table, new_fields = PsiBlastTableParser(headed_table) self.assertEqual(new_fields, fields) self.assertEqual(len(new_table), 4) self.assertEqual(new_table[1], ['ece:Z4147', 'ecs:ECs3687', 100.0, \ 176, 0, 0, 1, 176, 1, 176, 2e-89, 328]) def test_GenericBlastParser9(self): """GenericBlastParser9 should read blast's tabular format (#9).""" rec = self.rec p = GenericBlastParser9(rec, PsiBlastFinder) result = list(p) self.assertEqual(len(result), 2) first, second = result self.assertEqual(first[0], {'ITERATION':1,'QUERY':'ece:Z4181',\ 'DATABASE':'db/everything.faa', 'FIELDS':'Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. 
end, e-value, bit score'}) self.assertEqual(len(first[1]), 3) self.assertEqual(second[0]['ITERATION'], 2) self.assertEqual(len(second[1]), 7) self.assertEqual(second[1][-1], \ 'ece:Z4181 stm:STM2872 37.50 72 45 0 39 110 30 101 1e-05 49.8'.split()) def test_PsiBlastParser9(self): """PsiBlastParser9 should provide convenient results for format #9.""" result = PsiBlastParser9(self.rec2) self.assertEqual(len(result), 2) assert 'ece:Z4181' in result assert 'ece:Z4182' in result first = result['ece:Z4181'] second = result['ece:Z4182'] self.assertEqual(len(first), 2) self.assertEqual(len(second), 1) iter_1 = first[0] iter_2 = first[1] self.assertEqual(len(iter_1), 3) self.assertEqual(len(iter_2), 7) iter_1_2 = second[0] self.assertEqual(len(iter_1_2), 3) self.assertEqual(len(result['ece:Z4181'][1][3]), 12) self.assertEqual(result['ece:Z4181'][1][3]['ALIGNMENT LENGTH'], 103) def test_LastProteinIds9(self): """LastProteinIds9 should give last protein ids in iter""" result = LastProteinIds9(self.rec) self.assertEqual(result, ['ece:Z4181', 'ecs:ECs3717', 'cvi:CV2421',\ 'sfl:CP0138', 'spt:SPA2730', 'sec:SC2804', 'stm:STM2872']) #should also work if threshold set result = LastProteinIds9(self.rec, False, threshold=8e-6) self.assertEqual(result, ['ece:Z4181', 'ecs:ECs3717', 'cvi:CV2421',\ 'sfl:CP0138']) #should work on multiple records result = map(LastProteinIds9, PsiBlastQueryFinder(self.rec2)) self.assertEqual(len(result), 2) self.assertEqual(result[0], ['ece:Z4181', 'ecs:ECs3717', 'cvi:CV2421',\ 'sfl:CP0138', 'spt:SPA2730', 'sec:SC2804', 'stm:STM2872']) self.assertEqual(result[1], ['ece:Z4182','ecs:ECs3718','cvi:CV2422']) def test_QMEBlast9(self): """QMEBlast9 should return expected lines from all iterations""" self.assertFloatEqual(QMEBlast9(self.rec3), [\ ('ece:Z4181','ece:Z4181',3e-47), ('ece:Z4181','ecs:ECs3717',3e-47), ('ece:Z4181','spt:SPA2730', 1e-5), ('ece:Z4181','ecs:ECs3717',3e-54), #WARNING: allows duplicates ('ece:Z4181','cvi:CV2421',2e-8), 
('ece:Z4182','ece:Z4182',3e-47), ('ece:Z4182','cvi:CV2422',2e-6), ]) def test_QMEPsiBlast9(self): """QMEPsiBlast9 should only return items from last iterations""" self.assertFloatEqual(QMEPsiBlast9(self.rec3), [\ ('ece:Z4181','ecs:ECs3717',3e-54), ('ece:Z4181','cvi:CV2421',2e-8), ('ece:Z4182','ece:Z4182',3e-47), ('ece:Z4182','cvi:CV2422',2e-6), ]) def test_fastacmd_taxonomy_splitter(self): """fastacmd_taxonomy_splitter should split records into groups""" text = """NCBI sequence id: gi|3021565|emb|AJ223314.1|PSAJ3314 NCBI taxonomy id: 3349 Common name: Scots pine Scientific name: Pinus sylvestris NCBI sequence id: gi|37777029|dbj|AB108787.1| NCBI taxonomy id: 228610 Common name: cf. Acremonium sp. KR21-2 Scientific name: cf. Acremonium sp. KR21-2 """.splitlines() recs = list(fastacmd_taxonomy_splitter(text)) self.assertEqual(len(recs), 2) self.assertEqual(recs[0], text[:5]) #includes trailing blank def test_FastaCmdTaxonomyParser(self): """FastaCmdTaxonomyParser should parse taxonomy record to dict""" text = """NCBI sequence id: gi|3021565|emb|AJ223314.1|PSAJ3314 NCBI taxonomy id: 3349 Common name: Scots pine Scientific name: Pinus sylvestris NCBI sequence id: gi|37777029|dbj|AB108787.1| NCBI taxonomy id: 228610 Common name: cf. Acremonium sp. KR21-2 Scientific name: cf. Acremonium sp. KR21-2 """.splitlines() recs = list(FastacmdTaxonomyParser(text)) self.assertEqual(len(recs), 2) for r in recs: self.assertEqual(sorted(r.keys()), ['common_name','scientific_name', 'seq_id', 'tax_id']) r0, r1 = recs self.assertEqual(r0['tax_id'], '3349') self.assertEqual(r0['common_name'], 'Scots pine') self.assertEqual(r0['scientific_name'], 'Pinus sylvestris') self.assertEqual(r0['seq_id'], 'gi|3021565|emb|AJ223314.1|PSAJ3314') self.assertEqual(r1['tax_id'], '228610') if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_blast_xml.py000644 000765 000024 00000027220 12024702176 023155 0ustar00jrideoutstaff000000 000000 #! 
/usr/bin/env python # # test_blast_xml.py # __author__ = "Kristian Rother" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Micah Hamady"] __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kristian Rother" __email__ = "krother@rubor.de" __status__ = "Prototype" from cogent.util.unit_test import main, TestCase from cogent.parse.blast_xml import BlastXMLResult, MinimalBlastParser7,\ get_tag, parse_hsp, parse_hit, parse_header, parse_parameters,\ HSP_XML_FIELDNAMES, HIT_XML_FIELDNAMES import xml.dom.minidom class GetTagTests(TestCase): """Tests for the auxiliary function evaluating the tag objects.""" def setUp(self): self.single_tag = xml.dom.minidom.parseString(\ "bla contentbla") self.double_tag = xml.dom.minidom.parseString(\ "first contentsecond content") self.empty_tag = xml.dom.minidom.parseString("") def test_get_tag_works(self): self.assertEqual(get_tag(self.single_tag,'inner'),'content') self.assertEqual(get_tag(self.double_tag,'inner'),'first content') self.assertEqual(get_tag(self.empty_tag,'inner'),None) self.assertEqual(get_tag(self.empty_tag,'inner', 'blue elephant'),\ 'blue elephant') self.assertEqual(get_tag(self.single_tag,'non-existing tag'),None) self.assertEqual(get_tag(self.single_tag,'non-existing tag',\ 'pink elephant'),'pink elephant') self.assertEqual(get_tag(self.single_tag,'inner'),'content') def test_get_tag_fail(self): """Make sure the tag and name parameters are in the proper types.""" self.assertRaises(AttributeError, get_tag,None,"h1") self.assertRaises(AttributeError, get_tag,\ "
This is not a XML tag object
","h1") class MinimalBlastParser7Tests(TestCase): """Tests for the functions required by the Blast XML parsers.""" def setUp(self): self.hit1 = xml.dom.minidom.parseString(HIT_WITH_ONE_HSP) self.hit2 = xml.dom.minidom.parseString(HIT_WITH_TWO_HSPS) self.hsp1 = xml.dom.minidom.parseString(HSP_ONE) self.hsp2 = xml.dom.minidom.parseString(HSP_TWO) self.hsp_gaps = xml.dom.minidom.parseString(HSP_WITH_GAPS) self.param = xml.dom.minidom.parseString(PARAM_XML) self.header = xml.dom.minidom.parseString(HEADER_XML) self.complete = xml.dom.minidom.parseString(HEADER_COMPLETE) def test_parse_header(self): """Fields from XML header tag should be available as dict.""" data = parse_header(self.header) self.assertEqual(data.get('application'), 'my Grandma') self.assertEqual(data.get('version'), 'has') self.assertEqual(data.get('reference'), 'furry') self.assertEqual(data.get('query_letters'), 27) self.assertEqual(data.get('database'), 'Cats') def test_parse_parameters(self): """Fields from XML parameter tag should be available as dict.""" data = parse_parameters(self.param) self.assertEqual(data.get('matrix'), 'BLOSUM62') self.assertEqual(data.get('expect'), '10') self.assertEqual(data.get('gap_open_penalty'), 11.1) self.assertEqual(data.get('gap_extend_penalty'), 22.2) self.assertEqual(data.get('filter'), 'F') def test_parse_header_complete(self): """Fields from header+param tag should be available as dict.""" # try to process header with parameters etc in the XML data = parse_header(self.complete) self.assertEqual(data.get('database'), 'Cats') self.assertEqual(data.get('matrix'), 'BLOSUM62') def test_parse_hit(self): """Should return a list with all values for a hit+hsp.""" data = parse_hit(self.hit1) self.assertEqual(len(data),1) d = dict(zip(HIT_XML_FIELDNAMES,data[0])) self.assertEqual(d['SUBJECT_ID'],"gi|148670104|gb|EDL02051.1|") self.assertEqual(d['HIT_DEF'], "insulin-like growth factor 2 receptor, isoform CRA_c [Mus musculus]") self.assertEqual(d['HIT_ACCESSION'],"2001") 
self.assertEqual(int(d['HIT_LENGTH']),707) # check hit with more HSPs data = parse_hit(self.hit2) self.assertEqual(len(data),2) self.assertNotEqual(data[0],data[1]) def test_parse_hsp(self): """Should return list with all values for a hsp.""" data = parse_hsp(self.hsp1) d = dict(zip(HSP_XML_FIELDNAMES,data)) self.assertEqual(float(d['BIT_SCORE']),1023.46) self.assertEqual(float(d['SCORE']),2645) self.assertEqual(float(d['E_VALUE']),0.333) self.assertEqual(int(d['QUERY_START']),4) self.assertEqual(int(d['QUERY_END']),18) self.assertEqual(int(d['SUBJECT_START']),5) self.assertEqual(int(d['SUBJECT_END']),19) self.assertEqual(int(d['GAP_OPENINGS']),0) self.assertEqual(int(d['ALIGNMENT_LENGTH']),14) self.assertEqual(d['QUERY_ALIGN'],'ELEPHANTTHISISAHITTIGER') self.assertEqual(d['MIDLINE_ALIGN'],'ORCA-WHALE') self.assertEqual(d['SUBJECT_ALIGN'],'SEALSTHIS---HIT--GER') class BlastXmlResultTests(TestCase): """Tests parsing of output of Blast with output mode 7 (XML).""" def setUp(self): self.result = BlastXMLResult(COMPLETE_XML,xml=True) def test_options(self): """Constructor should take parser as an option.""" result = BlastXMLResult(COMPLETE_XML,parser=MinimalBlastParser7) self.assertEqual(len(result.keys()),1) # make sure whether normal Blast parser still works upon code merge! def test_parsed_query_sequence(self): """The result dict should have one query sequence as a key.""" # The full query sequence is not given in the XML file. # Thus it is not checked explicitly, only whether there is # exactly one found. 
self.assertEqual(len(self.result.keys()),1) def test_parsed_iterations(self): """The result should have the right number of iterations.""" n_iter = 0 for query_id,hits in self.result.iterHitsByQuery(): n_iter += 1 self.assertEqual(n_iter,1) def test_parsed_hsps(self): """The result should have the right number of hsps.""" n_hsps = 0 for query_id,hsps in self.result.iterHitsByQuery(): n_hsps += len(hsps) self.assertEqual(n_hsps,3) def test_parse_hit_details(self): """The result should have data from hit fields.""" for query in self.result: first_hsp = self.result[query][0][0] self.assertEqual(first_hsp['SUBJECT_ID'], "gi|148670104|gb|EDL02051.1|") self.assertEqual(first_hsp['HIT_DEF'], "insulin-like growth factor 2 receptor, isoform CRA_c [Mus musculus]") self.assertEqual(first_hsp['HIT_ACCESSION'],"2001") self.assertEqual(first_hsp['HIT_LENGTH'],707) def test_parse_hsp_details(self): """The result should have data from hsp fields.""" for query in self.result: # should check integers in next version. 
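The comment above notes that HSP values are still strings in this version; a hedged sketch of the kind of type coercion a later parser might apply (`typed_hsp` and the cast table are hypothetical, with field names taken from the assertions in this test):

```python
# Hypothetical sketch, not part of cogent.parse.blast_xml: coerce the
# string-valued HSP fields the XML parser leaves as text into numbers.
HSP_CASTS = {
    'BIT_SCORE': float,
    'SCORE': int,
    'E_VALUE': float,
    'QUERY_START': int,
    'QUERY_END': int,
    'SUBJECT_START': int,
    'SUBJECT_END': int,
}

def typed_hsp(hsp):
    """Copy of hsp with known numeric fields cast; other fields kept as strings."""
    return dict((key, HSP_CASTS.get(key, str)(value))
                for key, value in hsp.items())
```

Under this sketch, `'2645'` would come back as the int `2645` while alignment strings pass through untouched.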
first_hsp = self.result[query][0][0] self.assertEqual(first_hsp['QUERY ID'],1) self.assertEqual(first_hsp['BIT_SCORE'],'1023.46') self.assertEqual(first_hsp['SCORE'],'2645') self.assertEqual(first_hsp['E_VALUE'],'0.333') self.assertEqual(first_hsp['QUERY_START'],'4') self.assertEqual(first_hsp['QUERY_END'],'18') self.assertEqual(first_hsp['QUERY_ALIGN'],'ELEPHANTTHISISAHITTIGER') self.assertEqual(first_hsp['MIDLINE_ALIGN'],'ORCA-WHALE') self.assertEqual(first_hsp['SUBJECT_ALIGN'],'SEALSTHIS---HIT--GER') self.assertEqual(first_hsp['SUBJECT_START'],'5') self.assertEqual(first_hsp['SUBJECT_END'],'19') self.assertEqual(first_hsp['PERCENT_IDENTITY'],'55') self.assertEqual(first_hsp['POSITIVE'],'555') self.assertEqual(first_hsp['GAP_OPENINGS'],0) self.assertEqual(first_hsp['ALIGNMENT_LENGTH'],'14') gap_hsp = self.result[query][0][1] self.assertEqual(gap_hsp['GAP_OPENINGS'],'33') HSP_XML = """ 1 1023.46 2645 0.333 4 18 5 19 1 1 55 %s 555 14 ELEPHANTTHISISAHITTIGER SEALSTHIS---HIT--GER ORCA-WHALE """ HSP_ONE = HSP_XML%'' HSP_WITH_GAPS = HSP_XML%'33' HSP_TWO = """ 2 1023.46 2645 0.333 6 22 5 23 1 1 55 %s 555 18 EPHANT---THISISAHIT-TIGER ALSWWWTHIS---HITW--GER ORCA-WHALE """ HIT_XML = """ 1 gi|148670104|gb|EDL02051.1| insulin-like growth factor 2 receptor, isoform CRA_c [Mus musculus] 2001 707 %s """ HIT_WITH_ONE_HSP = HIT_XML%HSP_ONE HIT_WITH_TWO_HSPS = HIT_XML%(HSP_WITH_GAPS+HSP_TWO) PARAM_XML = """ BLOSUM62 10 11.1 22.2 F """ HEADER_XML = """ my Grandma has Cats furry 27 %s """ HIT_PREFIX = """ """ HIT_SUFFIX = """ """ HEADER_COMPLETE=HEADER_XML%(PARAM_XML+HIT_PREFIX+HIT_WITH_ONE_HSP+\ HIT_WITH_TWO_HSPS+HIT_SUFFIX) COMPLETE_XML = """ """+HEADER_COMPLETE if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_bowtie.py000644 000765 000024 00000006674 12024702176 022473 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the bowtie default output parser. 
Compatible with bowtie version 0.12.5 """ from cogent.parse.bowtie import BowtieOutputParser, BowtieToTable from cogent.util.unit_test import TestCase, main from cogent import LoadTable __author__ = "Gavin Huttley, Anuj Pahwa" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight","Peter Maxwell", "Gavin Huttley", "Anuj Pahwa"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Development" fname = 'data/bowtie_output.map' expected = [['GAPC_0015:6:1:1283:11957#0/1', '-', 'Mus', 66047927, 'TGTATATATAAACATATATGGAAACTGAATATATATACATTATGTATGTATATATGTATATGTTATATATACATA', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 0, ['55:A>G', '64:C>A']], ['GAPC_0015:6:1:1394:18813#0/1', '+', 'Mus', 77785518, 'ATGAAATTCCTAGCCAAATGGATGGACCTGGAGGGCATCATC', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 447, []], ['GAPC_0015:6:1:1560:18056#0/1', '+', 'Mus', 178806665, 'TAGATAAAGGCTCTGTTTTTCATCATTGAGAAATTGTTATTTTTCTGATGTTATA', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 0, ['9:T>G']], ['GAPC_0015:6:1:1565:19849#0/1', '+', 'Mus', 116516430, 'ACCATTTGCTTGGAAAATTGTTTTCCAGCCTTTCACTCTGAG', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 141, []], ['GAPC_0015:6:1:1591:17397#0/1', '-', 'Mus', 120440696, 'TCTAAATCTGTTCATTAATTAAGCCTGTTTCCATGTCCTTGGTCTTAAGACCAATCTGTTATGCGGGTGTGA', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 0, ['70:A>C', '71:G>T']]] class BowtieOutputTest(TestCase): def test_parsing(self): """make sure that the bowtie output file is parsed properly""" parser = BowtieOutputParser(fname) header = parser.next() index = 0 for row in parser: self.assertEqual(row, expected[index]) index += 1 def test_psl_to_table(self): """make sure that the table is built without any errors""" table = BowtieToTable(fname) def test_getting_seq_coords(self): """get correct information from the table""" table = 
BowtieToTable(fname) index = 0 for row in table: query_name = row['Query Name'] strand_direction = row['Strand Direction'] query_offset = row['Offset'] self.assertEqual(query_name, expected[index][0]) self.assertEqual(strand_direction, expected[index][1]) self.assertEqual(query_offset, expected[index][3]) index += 1 def test_no_row_converter(self): """setting row_converter=None returns strings""" # straight parser parser = BowtieOutputParser(fname, row_converter=None) header = parser.next() for index, row in enumerate(parser): query_offset = row[3] other_matches = row[6] self.assertEqual(query_offset, str(expected[index][3])) self.assertEqual(other_matches, str(expected[index][6])) # table table = BowtieToTable(fname, row_converter=None) for index, row in enumerate(table): query_offset = row['Offset'] other_matches = row['Other Matches'] self.assertEqual(query_offset, str(expected[index][3])) self.assertEqual(other_matches, str(expected[index][6])) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_bpseq.py000644 000765 000024 00000017670 12024702176 022312 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides Tests for BpseqParser and related functions. 
""" from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.struct.knots import inc_order from cogent.parse.bpseq import BpseqParseError, construct_sequence,\ parse_header, parse_residues, MinimalBpseqParser, BpseqParser,\ bpseq_specify_output __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class BpseqParserTests(TestCase): """Provides tests for BpseqParser and related functions""" def test_parse_header(self): """parse_header: should work on standard header""" h1 = ['Filename: d.16.b.E.coli.bpseq','Organism: Escherichia coli',\ 'Accession Number: J01695', 'Citation and related information'+\ ' available at http://www.rna.icmb.utexas.edu'] self.assertEqual(parse_header(h1),{'Filename':'d.16.b.E.coli.bpseq',\ 'Accession Number': 'J01695', 'Organism': 'Escherichia coli',\ 'Refs': {},'Citation':'http://www.rna.icmb.utexas.edu'}) assert isinstance(parse_header(h1), Info) # lines without ':' are skipped h2 = ['Filename: d.16.b.E.coli.bpseq','Organism: Escherichia coli',\ 'Accession Number: J01695', 'Remark this is an interesting seq'] exp = {'Filename':'d.16.b.E.coli.bpseq', 'Refs': {},\ 'Organism': 'Escherichia coli', 'Accession Number':'J01695'} self.assertEqual(parse_header(h2),exp) def test_construct_sequence(self): """construct_sequence: should return correct sequence or raise error """ d = {0:'A',1:'C',2:'G',3:'U'} self.assertEqual(construct_sequence(d),'ACGU') # doesn't check residue identity d = {0:'A',1:'-',2:'R',3:'U'} self.assertEqual(construct_sequence(d),'A-RU') # error when sequence isn't continuous d = {0:'A',1:'C',2:'G',5:'U'} self.assertRaises(BpseqParseError, construct_sequence, d) # error when first index is not zero d = {1:'C',2:'G',3:'U',4:'A'} self.assertRaises(BpseqParseError, 
construct_sequence, d) def test_parse_residues(self): """parse_residues: should work on valid data """ lines = RES_LINES.split('\n') exp_seq = 'UGGUAAUACGUUGCGAAGCC' exp_pairs = [(2,8),(3,7),(4,11),(5,10),(6,9),(12,18),(13,17)] self.assertEqual(parse_residues(lines, num_base=1,\ unpaired_symbol='0'), (exp_seq, exp_pairs)) def test_parse_residues_errors(self): """parse_residues: should raise BpseqParseErrors in several cases """ not_all_lines = RES_LINES_NOT_ALL.split('\n') wrong_lines = RES_LINES_WRONG.split('\n') conflict_lines = RES_LINES_CONFLICT.split('\n') bp_conflict = RES_LINES_BP_CONFLICT.split('\n') self.assertRaises(BpseqParseError, parse_residues, not_all_lines,\ num_base=1, unpaired_symbol='0') self.assertRaises(BpseqParseError, parse_residues, wrong_lines,\ num_base=1, unpaired_symbol='0') self.assertRaises(BpseqParseError, parse_residues, conflict_lines,\ num_base=1, unpaired_symbol='0') self.assertRaises(BpseqParseError, parse_residues, bp_conflict,\ num_base=1, unpaired_symbol='0') def test_parse_residues_diff_base(self): """parse_residues: should work with diff base and unpaired_symbol""" lines = RES_LINES_DIFF_BASE.split('\n') exp_seq = 'CAGACU' exp_pairs = [(1,5),(2,4)] obs = parse_residues(lines, num_base=3, unpaired_symbol='xxx') self.assertEqual(obs, (exp_seq, exp_pairs)) def test_MinimalBpseqParser(self): """MinimalBpseqParser: should separate lines correctly""" lines = ['Accesion: J01234', 'LABEL : label', '1 U 4', '2 A 10', 'xx',\ 'A B C D E'] exp = {'HEADER': ['Accesion: J01234', 'LABEL : label'],\ 'SEQ_STRUCT': ['1 U 4', '2 A 10']} self.assertEqual(MinimalBpseqParser(lines), exp) def test_BpseqParser(self): """BpseqParser: should work on valid data, returning Vienna or Pairs """ lines = RES_LINES_W_HEADER.split('\n') exp_seq = 'UGGUAAUACGUUGCGAAGCC' exp_pairs = [(2,8),(3,7),(4,11),(5,10),(6,9),(12,18),(13,17)] self.assertEqual(BpseqParser(lines),(exp_seq, exp_pairs)) self.assertEqual(BpseqParser(lines)[0].Info,\ 
{'Filename':'d.16.b.E.coli.bpseq',\ 'Accession Number': 'J01695', 'Organism': 'Escherichia coli',\ 'Refs': {},'Citation':'http://www.rna.icmb.utexas.edu'}) # should work with different base lines = RES_LINES_DIFF_BASE.split('\n') exp_seq = 'CAGACU' exp_pairs = [(1,5),(2,4)] obs_seq, obs_pairs = BpseqParser(lines, num_base=3,\ unpaired_symbol='xxx') self.assertEqual(obs_seq, exp_seq) self.assertEqual(obs_seq.Info, {'Refs':{}}) self.assertEqual(obs_pairs, exp_pairs) def test_BpseqParser_errors(self): """BpseqParser: should skip lines in unknown format""" exp_seq = 'UGGUAAUACGUUGCGAAGCC' exp_vienna_m = '....(((..)))((...)).' exp_pairs = [(2,8),(3,7),(4,11),(5,10),(6,9),(12,18),(13,17)] #skips lines in unknown format lines = RES_LINES_UNKNOWN.split('\n') obs_seq, obs_pairs = BpseqParser(lines) self.assertEqual(obs_seq, exp_seq) self.assertEqual(obs_pairs, exp_pairs) self.assertEqual(obs_seq.Info,\ {'Filename':'d.16.b.E.coli.bpseq',\ 'Accession Number': 'J01695', 'Organism': 'Escherichia coli',\ 'Refs': {},'Citation':'http://www.rna.icmb.utexas.edu'}) class ConvenienceFunctionTests(TestCase): """Tests for convenience functions""" def test_bpseq_specify_output(self): """bpseq_specify_output: different return values""" f = bpseq_specify_output lines = RES_LINES_W_HEADER.split('\n') exp_seq = 'UGGUAAUACGUUGCGAAGCC' exp_pairs = [(2,8),(3,7),(4,11),(5,10),(6,9),(12,18),(13,17)] exp_pairs_majority = [(4,11),(5,10),(6,9),(12,18),(13,17)] exp_pairs_first = [(2,8),(3,7),(12,18),(13,17)] exp_vienna_majority = '....(((..)))((...)).' 
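The bpseq tests above exercise residue lines of the form "position residue partner", one-based, with `0` marking an unpaired residue. A minimal standalone sketch of that convention (hypothetical helper, not PyCogent's `parse_residues`, and without its error checking):

```python
def parse_bpseq_residues(lines, num_base=1, unpaired_symbol='0'):
    """Parse 'position residue partner' triples into (sequence, pairs).

    Standalone sketch of the bpseq conventions exercised by the tests
    above; not the PyCogent implementation. Positions are renumbered
    from zero and each base pair is reported once, smaller index first.
    """
    residues = {}
    pairs = set()
    for line in lines:
        pos, residue, partner = line.split()
        pos = int(pos) - num_base
        residues[pos] = residue
        if partner != unpaired_symbol:
            partner = int(partner) - num_base
            pairs.add((min(pos, partner), max(pos, partner)))
    seq = ''.join(residues[i] for i in sorted(residues))
    return seq, sorted(pairs)
```

Feeding it the RES_LINES fixture from these tests yields the same `(exp_seq, exp_pairs)` the real parser is checked against.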
self.assertEqual(f(lines),(exp_seq, exp_pairs)) self.assertEqual(f(lines, remove_pseudo=True),\ (exp_seq, exp_pairs_majority)) self.assertEqual(f(lines, remove_pseudo=True, pseudoknot_function=inc_order),\ (exp_seq, exp_pairs_first)) self.assertEqual(f(lines, return_vienna=True),\ (exp_seq, exp_vienna_majority)) RES_LINES=\ """1 U 0 2 G 0 3 G 9 4 U 8 5 A 12 6 A 11 7 U 10 8 A 4 9 C 3 10 G 7 11 U 6 12 U 5 13 G 19 14 C 18 15 G 0 16 A 0 17 A 0 18 G 14 19 C 13 20 C 0""" RES_LINES_NOT_ALL=\ """1 U 0 2 G 0 3 G 0 6 A 0""" RES_LINES_WRONG=\ """1 U0 2 G 0 3 G 0 6 A 0""" RES_LINES_CONFLICT=\ """1 U 4 2 G 3 3 G 2 4 A 1 4 A 0""" RES_LINES_BP_CONFLICT=\ """1 U 0 2 G 4 3 G 4 4 C 2 5 A 0""" RES_LINES_DIFF_BASE=\ """3 C xxx 4 A 8 5 G 7 6 A xxx 7 C 5 8 U 4""" RES_LINES_W_HEADER=\ """Filename: d.16.b.E.coli.bpseq Organism: Escherichia coli Accession Number: J01695 Citation and related information available at http://www.rna.icmb.utexas.edu 1 U 0 2 G 0 3 G 9 4 U 8 5 A 12 6 A 11 7 U 10 8 A 4 9 C 3 10 G 7 11 U 6 12 U 5 13 G 19 14 C 18 15 G 0 16 A 0 17 A 0 18 G 14 19 C 13 20 C 0""" RES_LINES_UNKNOWN=\ """Filename: d.16.b.E.coli.bpseq Organism: Escherichia coli Accession Number: J01695 Citation and related information available at http://www.rna.icmb.utexas.edu 1 U 0 2 G 0 3 G 9 UNKNOWN LINE 4 U 8 5 A 12 6 A 11 7 U 10 8 A 4 9 C 3 10 G 7 11 U 6 12 U 5 13 G 19 14 C 18 15 G 0 16 A 0 17 A 0 18 G 14 19 C 13 20 C 0""" #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_cigar.py000644 000765 000024 00000006075 12024702176 022262 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import unittest, sys, os from cogent import DNA, LoadSeqs from cogent.parse.cigar import map_to_cigar, cigar_to_map, aligned_from_cigar, \ slice_cigar, CigarParser __author__ = "Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Hua Ying", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Hua Ying" 
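The `return_vienna=True` path tested above renders a pair list as a dot-bracket (Vienna) string such as `'....(((..)))((...)).'`. A minimal sketch of that conversion (hypothetical helper, not `bpseq_specify_output`; it assumes pseudoknots have already been removed, so plain parentheses suffice):

```python
def pairs_to_vienna(pairs, length):
    """Render zero-based base pairs as a dot-bracket string.

    Hypothetical helper illustrating the conversion tested above;
    handles nested pairs only -- pseudoknotted pairs must be removed
    first, as the remove_pseudo option in the tests does.
    """
    struct = ['.'] * length
    for upstream, downstream in pairs:
        struct[upstream] = '('
        struct[downstream] = ')'
    return ''.join(struct)
```

With the majority pair set from the tests, `pairs_to_vienna([(4, 11), (5, 10), (6, 9), (12, 18), (13, 17)], 20)` reproduces the expected Vienna string.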
__email__ = "hua.ying@anu.edu.au"
__status__ = "Production"

class TestCigar(unittest.TestCase):
    def setUp(self):
        self.cigar_text = '3D2M3D6MDM2D3MD'
        self.aln_seq = DNA.makeSequence('---AA---GCTTAG-A--CCT-')
        self.aln_seq1 = DNA.makeSequence('CCAAAAAA---TAGT-GGC--G')
        self.map, self.seq = self.aln_seq.parseOutGaps()
        self.map1, self.seq1 = self.aln_seq1.parseOutGaps()
        self.slices = [(1, 4), (0, 8), (7, 12), (0, 1), (3, 5)]
        self.aln = LoadSeqs(data={"FAKE01": self.aln_seq,
                                  "FAKE02": self.aln_seq1})
        self.cigars = {"FAKE01": self.cigar_text,
                       "FAKE02": map_to_cigar(self.map1)}
        self.seqs = {"FAKE01": str(self.seq), "FAKE02": str(self.seq1)}

    def test_map_to_cigar(self):
        """convert a Map to cigar string"""
        assert map_to_cigar(self.map) == self.cigar_text

    def test_cigar_to_map(self):
        """test generating a Map from cigar"""
        map = cigar_to_map(self.cigar_text)
        assert str(map) == str(self.map)

    def test_aligned_from_cigar(self):
        """test generating aligned seq from cigar"""
        aligned_seq = aligned_from_cigar(self.cigar_text, self.seq)
        assert aligned_seq == self.aln_seq

    def test_slice_cigar(self):
        """test slicing cigars"""
        for start, end in self.slices:
            # test by_align = True
            map1, loc1 = slice_cigar(self.cigar_text, start, end)
            ori1 = self.aln_seq[start:end]
            if loc1:
                slicealn1 = self.seq[loc1[0]:loc1[1]].gappedByMap(map1)
                assert ori1 == slicealn1
            else:
                assert map1.length == len(ori1)
            # test by_align = False
            map2, loc2 = slice_cigar(self.cigar_text, start, end,
                                     by_align=False)
            slicealn2 = self.seq[start:end].gappedByMap(map2)
            ori2 = self.aln_seq[loc2[0]:loc2[1]]
            assert slicealn2 == ori2

    def test_CigarParser(self):
        """test without slice"""
        aln = CigarParser(self.seqs, self.cigars)
        assert aln == self.aln
        # test slice
        i = 1
        for start, end in self.slices:
            self.aln.getSeq("FAKE01").addFeature("annot%d" % i, "annot",
                                                 [(start, end)])
            annot = self.aln.getAnnotationsFromAnySequence("annot%d" % i)
            slice_aln = aln.getRegionCoveringAll(annot).asOneSpan().getSlice()
            i += 1
            cmp_aln = \
CigarParser(self.seqs, self.cigars, sliced = True, ref_seqname = "FAKE01", start = start, end = end) assert cmp_aln == slice_aln if __name__ == '__main__': unittest.main() PyCogent-1.5.3/tests/test_parse/test_clustal.py000644 000765 000024 00000012371 12024702176 022640 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the clustal parsers. """ from cogent.parse.clustal import LabelLineParser, is_clustal_seq_line, \ last_space, delete_trailing_number, MinimalClustalParser from cogent.parse.record import RecordError from cogent.util.unit_test import TestCase, main from cogent.core.alignment import Alignment __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" #Note: the data are all strings and hence immutable, so it's OK to define #them here instead of in setUp and then subclassing everything from that #base class. If the data were mutable, we'd need to take more precautions #to avoid crossover between tests. 
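The CIGAR fixtures in test_cigar.py above pair the gapped sequence `'---AA---GCTTAG-A--CCT-'` with the string `'3D2M3D6MDM2D3MD'`. A standalone sketch of that encoding (not `cogent.parse.cigar.map_to_cigar`; the convention that length-1 runs carry no count is inferred from that fixture):

```python
from itertools import groupby

def gapped_to_cigar(aligned_seq):
    """Encode a gapped sequence as a CIGAR-style string.

    Standalone sketch matching the fixture above: M for an aligned
    residue, D for a gap character, runs longer than one prefixed with
    their length, length-1 runs written without a count.
    """
    out = []
    for is_gap, run in groupby(aligned_seq, key=lambda c: c == '-'):
        n = len(list(run))
        op = 'D' if is_gap else 'M'
        out.append('%d%s' % (n, op) if n > 1 else op)
    return ''.join(out)
```

Run-length encoding via `itertools.groupby` keeps the sketch to a single pass over the aligned sequence.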
minimal = 'abc\tucag' two = 'abc\tuuu\ndef\tccc\n\n ***\n\ndef ggg\nabc\taaa\n'.split('\n') real = """CLUSTAL W (1.82) multiple sequence alignment abc GCAUGCAUGCAUGAUCGUACGUCAGCAUGCUAGACUGCAUACGUACGUACGCAUGCAUCA 60 def ------------------------------------------------------------ xyz ------------------------------------------------------------ abc GUCGAUACGUACGUCAGUCAGUACGUCAGCAUGCAUACGUACGUCGUACGUACGU-CGAC 119 def -----------------------------------------CGCGAUGCAUGCAU-CGAU 18 xyz -------------------------------------CAUGCAUCGUACGUACGCAUGAC 23 * * * * * ** abc UGACUAGUCAGCUAGCAUCGAUCAGU 145 def CGAUCAGUCAGUCGAU---------- 34 xyz UGCUGCAUCA---------------- 33 * ***""".split('\n') bad = ['dshfjsdfhdfsj','hfsdjksdfhjsdf'] space_labels = ['abc uca','def ggg ccc'] class clustalTests(TestCase): """Tests of top-level functions.""" def test_is_clustal_seq_line(self): """is_clustal_seq_line should reject blanks and 'CLUSTAL'""" ic = is_clustal_seq_line assert ic('abc') assert ic('abc def') assert not ic('CLUSTAL') assert not ic('CLUSTAL W fsdhicjkjsdk') assert not ic(' * *') assert not ic(' abc def') assert not ic('MUSCLE (3.41) multiple sequence alignment') def test_last_space(self): """last_space should split on last whitespace""" self.assertEqual(last_space('a\t\t\t b c'), ['a b', 'c']) self.assertEqual(last_space('xyz'), ['xyz']) self.assertEqual(last_space(' a b'), ['a','b']) def test_delete_trailing_number(self): """delete_trailing_number should delete the trailing number if present""" dtn = delete_trailing_number self.assertEqual(dtn('abc'), 'abc') self.assertEqual(dtn('a b c'), 'a b c') self.assertEqual(dtn('a \t b \t c'), 'a \t b \t c') self.assertEqual(dtn('a b 3'), 'a b') self.assertEqual(dtn('a b c \t 345'), 'a b c') class MinimalClustalParserTests(TestCase): """Tests of the MinimalClustalParser class""" def test_null(self): """MinimalClustalParser should return empty dict and list on null input""" result = MinimalClustalParser([]) self.assertEqual(result, 
({},[])) def test_minimal(self): """MinimalClustalParser should handle single-line input correctly""" result = MinimalClustalParser([minimal]) #expects seq of lines self.assertEqual(result, ({'abc':['ucag']}, ['abc'])) def test_two(self): """MinimalClustalParser should handle two-sequence input correctly""" result = MinimalClustalParser(two) self.assertEqual(result, ({'abc':['uuu','aaa'],'def':['ccc','ggg']}, \ ['abc', 'def'])) def test_real(self): """MinimalClustalParser should handle real Clustal output""" data, labels = MinimalClustalParser(real) self.assertEqual(labels, ['abc', 'def', 'xyz']) self.assertEqual(data, { 'abc': [ 'GCAUGCAUGCAUGAUCGUACGUCAGCAUGCUAGACUGCAUACGUACGUACGCAUGCAUCA', 'GUCGAUACGUACGUCAGUCAGUACGUCAGCAUGCAUACGUACGUCGUACGUACGU-CGAC', 'UGACUAGUCAGCUAGCAUCGAUCAGU' ], 'def': [ '------------------------------------------------------------', '-----------------------------------------CGCGAUGCAUGCAU-CGAU', 'CGAUCAGUCAGUCGAU----------' ], 'xyz': [ '------------------------------------------------------------', '-------------------------------------CAUGCAUCGUACGUACGCAUGAC', 'UGCUGCAUCA----------------' ] }) def test_bad(self): """MinimalClustalParser should reject bad data if strict""" result = MinimalClustalParser(bad, strict=False) self.assertEqual(result, ({},[])) #should fail unless we turned strict processing off self.assertRaises(RecordError, MinimalClustalParser, bad) def test_space_labels(self): """MinimalClustalParser should tolerate spaces in labels""" result = MinimalClustalParser(space_labels) self.assertEqual(result, ({'abc':['uca'],'def ggg':['ccc']},\ ['abc', 'def ggg'])) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_column.py000644 000765 000024 00000013751 12024702176 022471 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.parse.foldalign import find_struct from cogent.parse.pfold import tree_struct_sep from 
cogent.parse.column import column_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class ColumnParserTest(TestCase): """Provides tests for Column format RNA secondary structure parsers""" def setUp(self): """Setup function""" #output self.pfold_out = PFOLD self.foldalign_out = FOLDALIGN #expected self.pfold_exp = [['GCAGAUUUAGAUGC',[(0,13),(1,12),(2,11),(6,10)]]] self.foldalign_exp = [['GCAGAUUUAGAUGC',[(0,13),(1,12),(2,11),(6,10)]]] self.find_struct_exp = [[[(0,13),(1,12),(2,11),(6,10)],'GCCACGUAGCUCAG', 'GCCGUAUGUUUCAG']] def test_pfold_output(self): """Test for column_parser for pfold format""" tree,lines = tree_struct_sep(self.pfold_out) self.assertEqual(tree,PFOLD_tree) obs = column_parser(lines) self.assertEqual(obs,self.pfold_exp) def test_foldalign_output(self): """Test for column_parser for foldalign format""" obs = column_parser(self.foldalign_out) self.assertEqual(obs,self.foldalign_exp) def test_foldalign_find_struct(self): """ Test for foldalign parser find struct function""" obs = find_struct(self.foldalign_out) self.assertEqual(obs,self.find_struct_exp) FOLDALIGN = ['; FOLDALIGN 2.0.3\n', '; REFERENCE J.H. Havgaard, R.B. Lyngs\xf8, G.D. Stormo, J. Gorodkin\n', '; REFERENCE Pairwise local structural alignment of RNA sequences\n', '; REFERENCE with sequence similarity less than 40%\n', '; REFERENCE Bioinformatics 21(9), 1815-1824, 2005\n', '; ALIGNMENT_ID n.a.\n', '; ALIGNING seq1 against seq2\n', '; ALIGN seq1 \n', '; ALIGN seq2 \n', '; ALIGN Score: 929\n', '; ALIGN Identity: 69 % ( 48 / 70 )\n', '; ALIGN Begin\n', '; ALIGN\n', '; ALIGN seq1 GCCACGUAGC UCAG\n', '; ALIGN Structure (((...(... 
))))\n', '; ALIGN seq2 GCCGUAUGUU UCAG\n', '; ALIGN \n', '; ALIGN End\n', '; ==============================================================================\n', '; TYPE RNA\n', '; COL 1 label\n', '; COL 2 residue\n', '; COL 3 seqpos\n', '; COL 4 alignpos\n', '; COL 5 align_bp\n', '; COL 6 seqpos_bp\n', '; ENTRY seq1\n', '; ALIGNMENT_ID n.a.\n', '; ALIGNMENT_LIST seq1 seq2\n', '; FOLDALIGN_SCORE 929\n', '; GROUP 1\n', '; FILENAME seq1.fasta\n', '; START_POSITION 2\n', '; END_POSITION 71\n', '; ALIGNMENT_SIZE 2\n', '; ALIGNMENT_LENGTH 70\n', '; SEQUENCE_LENGTH 76\n', '; PARAMETER max_length=76\n', '; PARAMETER max_diff=76\n', '; PARAMETER min_loop=3\n', '; PARAMETER score_matrix=\n', '; PARAMETER nobranching=\n', '; PARAMETER global=\n', '; ----------\n', 'N G 1 1 14 0.90\n', 'N C 2 2 13 0.79\n', 'N A 3 3 12 0.87\n', 'N G 4 4 . 0.60\n', 'N A 5 5 . 0.34\n', 'N U 6 6 . 0.34\n', 'N U 7 7 11 0.98\n', 'N U 8 8 . 0.34\n', 'N A 9 9 . 0.56\n', 'N G 10 10 . 0.67\n', 'N A 11 11 7 0.78\n', 'N U 12 12 3 0.87\n', 'N G 13 13 2 0.87\n', 'N C 14 14 1 0.90\n', '; **********\n'] PFOLD = ['; generated by fasta2col\n', '; ============================================================\n', '; TYPE TREE\n', '; COL 1 label\n', '; COL 2 number\n', '; COL 3 name\n', '; COL 4 uplen\n', '; COL 5 child\n', '; COL 6 brother\n', '; ENTRY tree\n', '; root 1\n', '; ----------\n', ' N 1 seq1 0.001000 . .\n', '; **********\n', '; TYPE RNA\n', '; COL 1 label\n', '; COL 2 residue\n', '; COL 3 seqpos\n', '; COL 4 alignpos\n', '; COL 5 align_bp\n', '; COL 6 certainty\n', '; ENTRY seq1\n', '; ----------\n', 'N G 1 1 14 0.90\n', 'N C 2 2 13 0.79\n', 'N A 3 3 12 0.87\n', 'N G 4 4 . 0.60\n', 'N A 5 5 . 0.34\n', 'N U 6 6 . 0.34\n', 'N U 7 7 11 0.98\n', 'N U 8 8 . 0.34\n', 'N A 9 9 . 0.56\n', 'N G 10 10 . 
0.67\n', 'N A 11 11 7 0.78\n', 'N U 12 12 3 0.87\n', 'N G 13 13 2 0.87\n', 'N C 14 14 1 0.90\n', '; **********\n'] PFOLD_tree = ['; generated by fasta2col\n', '; ============================================================\n', '; TYPE TREE\n', '; COL 1 label\n', '; COL 2 number\n', '; COL 3 name\n', '; COL 4 uplen\n', '; COL 5 child\n', '; COL 6 brother\n', '; ENTRY tree\n', '; root 1\n', '; ----------\n', ' N 1 seq1 0.001000 . .\n', '; **********\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_comrna.py000644 000765 000024 00000015433 12024702176 022452 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.parse.comrna import comRNA_parser,common __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class ComrnaParserTest(TestCase): """Provides tests for COMRNA RNA secondary structure format parsers""" def setUp(self): """Setup function """ #output self.comrna_out = COMRNA #expected self.comrna_exp = [['GGCUAGAUAGCUCA',[(0,13),(1,12),(4,9),(5,8)]], ['GGCUAGAUAGCUCA',[(0,13),(1,12),(4,9),(5,8)]], ['GGCUAGAUAGCUCA',[(0,13),(1,12),(4,9),(5,8)]], ['GGCUAGAUAGCUCA',[(0,13),(1,12)]]] def test_comrna_output(self): """Test for comrna format parser""" obs = comRNA_parser(self.comrna_out) self.assertEqual(obs,self.comrna_exp) def test_common_func(self): """Test common function in comrna parser """ obs = common(self.comrna_exp) exp = [['GGCUAGAUAGCUCA',[(0,13),(1,12),(4,9),(5,8)]], ['GGCUAGAUAGCUCA',[(0,13),(1,12)]]] self.assertEqual(obs,exp) COMRNA = ['comRNA input.fasta \n', '\n', 'PARAMETERS: \n', 'L = 4, Minimum length of a straight stem;\n', 'E = -5.00, Maximum stem energy allowed for a stem to be analyzed, in kcal/mol;\n', 'S = 0.00, Minimum 
stem similarity score b/w two stems compared;\n', 'Sh = 0.60, Maximum stem similarity score threshold that will be tested;\n', 'Sl = 0.20, Minimum stem similarity score threshold that will be tested;\n', 'P = 0.50, Minimum percentage of sequences in which a common structure should occur;\n', 'n = 10, Number of common structures to be reported;\n', 'x = 999, Maximum number of pseudoknot crossover pattern allowed between one stem and other stems in a structure;\n', 'a = 1, Use anchor region during stem comparison;\n', 'o = 4, Maximum number of overlapping nucleotides allowed between two stems;\n', 'c = 0.30, Maximum percentage of stem length that is allowed overlapping between two stems;\n', 'j = 0.70, Maximum percentage of stems allowed overlapping between two different cliques;\n', 'r = 0.40, Minimum percentage of stems required to be same for two cliques to be considered same when reporting structures;\n', 'f = 10, Number of flanking nucleotides of a stem to be refolded together during structure refinement;\n', 'v = 5, Maximum length of nucleotides allowed for a new loop to deviate from its length in the original structure pattern;\n', 'g = 0, Use topological sort to assemble stem blocks;\n', '\n', 'Sequence file name: "input.fasta"\n', '\n', 'Sequences loaded ...\n', '1 seq1 72 nt\n', '2 seq2 72 nt\n', '3 seq3 72 nt\n', '4 seq4 72 nt\n', '\n', '\n', 'Number of stems in each energy bin for each sequence:\n', '\n', 'energy[kc/m] -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0\n', 'seq1 2 1 1 6 1 5 9 2 1 3 0 1\n', 'seq2 2 1 1 6 1 5 9 2 1 3 0 1\n', 'seq3 2 1 1 6 1 5 9 2 1 3 0 1\n', 'seq4 2 1 1 6 1 5 9 2 1 3 0 1\n', '\n', '\n', 'Pairwise Sequence Identity (%): \n', '\n', ' 1 2 3 4\n', '\n', ' 1 - 100 100 100\n', ' 2 100 - 100 100\n', ' 3 100 100 - 100\n', ' 4 100 100 100 - \n', '\n', 'Average Pairwise Sequence Identity (%): 100\n', '\n', 'Comparing stems pairwise ... 
\n', '\n', 'Number of edges that has stem-similarity-score higher than a certain threshold in the stem graph:\n', '\n', 'Score: 0.8 0.78 0.76 0.74 0.72 0.7 0.68 0.66 0.64 0.62 0.6 0.58 0.56 0.54 0.52 0.5 0.48 0.46 0.44 0.42 0.4 0.38 0.36 0.34 0.32 0.3 0.28 0.26 0.24 0.22 0.2\n', 'Num of edges: 12 12 18 18 18 24 24 54 72 78 102 114 114 120 132 132 144 156 168 168 174 180 180 180 180 180 180 180 180 180 180 180\n', '\n', 'Time spent on comparing stems: 0.03 seconds user CPU time; 0.04 seconds real time.\n', '\n', 'Maximum structure finding time: 1 min\n', '\n', '=========================== S = 0.6 ===========================\n', '\n', 'Find conserved stems (cliques) ... ==== 17 cliques ==== 17 unique ====\n', 'Time spent on finding conserved stems: 0 sec CPU time; 0 sec clock time.\n', '\n', 'Construct clique topological graph ... ==== 53 edges ====\n', 'Assemble conserved stems (cliques) ... ==== 44 structures ====\n', 'Time spent on topologically assembling conserved stems: 0 sec CPU time; 0 sec clock time.\n', '\n', 'Report top 10 structures.\n', '-------------------------------------------\n', 'Structure #1: Score = 10.1, pattern: 41, path: 0 1 3 , comseq: 1 2 3 4 , incompatible_seq: 0() 1() 3() \n', '(a) Clique 0: OriginalScore = 3.82, ModifiedScore = 3.82\n', ' 1, seq1 1 GGCUAGA 7 ... 66 UCUGGCC 72 [-13 kc/m]\n', ' 2, seq2 1 GGCUAGA 7 ... 66 UCUGGCC 72 [-13 kc/m]\n', ' 3, seq3 1 GGCUAGA 7 ... 66 UCUGGCC 72 [-13 kc/m]\n', ' 4, seq4 1 GGCUAGA 7 ... 66 UCUGGCC 72 [-13 kc/m]\n', '(b) Clique 1: OriginalScore = 3.45, ModifiedScore = 3.45\n', ' 1, seq1 29 GGAUUGAA 36 ... 54 UUCGAUCC 61 [-11.6 kc/m]\n', ' 2, seq2 29 GGAUUGAA 36 ... 54 UUCGAUCC 61 [-11.6 kc/m]\n', ' 3, seq3 29 GGAUUGAA 36 ... 54 UUCGAUCC 61 [-11.6 kc/m]\n', ' 4, seq4 29 GGAUUGAA 36 ... 
54 UUCGAUCC 61 [-11.6 kc/m]\n', '(c) Clique 3: OriginalScore = 2.82, ModifiedScore = 2.82\n', ' 1, seq1 49 GUCGG 53 UUCGAUC 61 CCGGC 65 [-8.4 kc/m]\n', ' 2, seq2 49 GUCGG 53 UUCGAUC 61 CCGGC 65 [-8.4 kc/m]\n', ' 3, seq3 49 GUCGG 53 UUCGAUC 61 CCGGC 65 [-8.4 kc/m]\n', ' 4, seq4 49 GUCGG 53 UUCGAUC 61 CCGGC 65 [-8.4 kc/m]\n', '\n', '\n', 'seq1 1 GGCUAGAUAGCUCA 14 \n', ' aa bb bb aa\n', 'seq2 1 GGCUAGAUAGCUCA 14 \n', ' aa bb bb aa\n', 'seq3 1 GGCUAGAUAGCUCA 14 \n', ' aa bb bb aa\n', 'seq4 1 GGCUAGAUAGCUCA 14 \n', ' aa aa\n', '\n', '\n', '-------------------------------------------'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_consan.py000644 000765 000024 00000002667 12024702176 022461 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.parse.consan import consan_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class ConsanParserTest(TestCase): """Provides tests for CONSAN RNA secondary structure parsers""" def setUp(self): """Setup function""" #output self.consan_out = CONSAN #expected self.consan_exp = [{'seq1':'GGACACGUCGCUCA','seq2':'G.ACAGAUCGCUCA'}, [(0,13),(2,11),(5,8)]] def test_consan_output(self): """Test for consan format parser""" obs = consan_parser(self.consan_out) self.assertEqual(obs,self.consan_exp) CONSAN = ['Using standard STA scoring\n', 'Using QRADIUS constraints (Quality > 0.9500 SPACED 20) !\n', '# STOCKHOLM 1.0\n', '\n', '#=GF SC\t 1.000000\n', '#=GF PI\t 0.720000\n', '#=GF ME\t QRadius Num: 3 Win: 20 Cutoff: 0.95\n', '\n', 'seq1 \tGGACACGUCGCUCA\n', 'seq2 \tG.ACAGAUCGCUCA\n', '#=GC SS_cons\t\t >.>..>..<..<.<\n', '#=GC PN \t\t .......*......\n', '\n', '\n', '\n', '#=GF TIME 43.0\n', '\n', '//\n'] if 
__name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_cove.py

#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.cove import coves_parser

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class CovesParserTest(TestCase):
    """Provides tests for Coves RNA secondary structure parsers"""

    def setUp(self):
        """Setup function"""
        # output
        self.cove_out = COVE
        # expected
        self.cove_exp = [['UAGAUGGCUCUCAU', [(0, 13), (2, 11), (3, 10)]]]

    def test_cove_output(self):
        """Test coves format parser"""
        obs = coves_parser(self.cove_out)
        self.assertEqual(obs, self.cove_exp)

COVE = ['coves - scoring and structure prediction of RNA sequences\n',
        ' using a covariance model\n',
        ' version 2.4.4, January 1996\n',
        '\n',
        '---------------------------------------------------\n',
        'Database to search/score: single.fasta\n',
        'Model: single.fasta.cm\n',
        'GC% of background model: 50%\n',
        '---------------------------------------------------\n',
        '\n',
        '-32.55 bits : seq1\n',
        ' seq1 UAGAUGGCUCUCAU\n',
        ' seq1 >.>>......<<.<\n',
        '\n']

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_ct.py

#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.ct import ct_parser

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class CtParserTest(TestCase):
    """Provides tests for
RNA secondary structure parsers""" def setUp(self): """Setup function""" #output self.carnac_out = CARNAC self.dynalign_out = DYNALIGN self.mfold_out = MFOLD self.sfold_out = SFOLD self.unafold_out = UNAFOLD self.knetfold_out = KNETFOLD #expected self.carnac_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)]]] self.dynalign_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)],-46.3]] self.mfold_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)],-23.47]] self.sfold_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)],-22.40]] self.unafold_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)],-20.5]] self.knetfold_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)]]] def test_carnac_output(self): """Test for ct_parser for carnac format""" obs = ct_parser(self.carnac_out) self.assertEqual(obs,self.carnac_exp) def test_dynalign_output(self): """Test for ct_parser for dynalign format""" obs = ct_parser(self.dynalign_out) self.assertEqual(obs,self.dynalign_exp) def test_mfold_output(self): """Test for ct_parser for mfold format""" obs = ct_parser(self.mfold_out) self.assertEqual(obs,self.mfold_exp) def test_sfold_output(self): """Test for ct_parser for sfold format""" obs = ct_parser(self.sfold_out) self.assertEqual(obs,self.sfold_exp) def test_unafold_output(self): """Test for ct_parser for unafold format""" obs = ct_parser(self.unafold_out) self.assertEqual(obs,self.unafold_exp) def test_knetfold_output(self): """Test for ct_parser for knetfold format""" obs = ct_parser(self.knetfold_out) self.assertEqual(obs,self.knetfold_exp) CARNAC = [' 12 seq1\n', ' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 11\n', ' 12 C 8 13 1 12\n'] DYNALIGN = [' 72 ENERGY = -46.3 seq 3\n', ' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 
11\n', ' 12 C 8 13 1 12\n'] MFOLD = [' 12 dG = -23.47 [initially -22.40] seq1 \n', ' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 11\n', ' 12 C 8 13 1 12\n'] SFOLD = ['Structure 1 -22.40 0.63786E-01\n', ' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 11\n', ' 12 C 8 13 1 12\n'] UNAFOLD = ['12\tdG = -20.5\tseq1\n', ' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 11\n', ' 12 C 8 13 1 12\n'] KNETFOLD = [' 12 \n', ' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 11\n', ' 12 C 8 13 1 12\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_cut.py000644 000765 000024 00000002204 12024702176 021756 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.parse.cut import cut_parser __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" class CutParserTest(TestCase): """Provides tests for codon usage table parser""" def test_cut_parser(self): """cut_parser should work on first few lines of supplied file""" lines = """#Species: Saccharomyces cerevisiae #Division: gbpln #Release: TranstermMay1994 #CdsCount: 24 #Coding GC 44.99% #1st letter GC 47.28% #2nd letter GC 40.83% #3rd letter GC 46.86% #Codon 
AA Fraction Frequency Number GCA A 0.010 1.040 6 GCC A 0.240 22.420 130 GCG A 0.000 0.000 0 GCT A 0.750 71.610 411 TGC C 0.070 0.520 3 """ result = cut_parser(lines.splitlines()) self.assertEqual(result, \ {'GCA':6,'GCC':130,'GCG':0,'GCT':411,'TGC':3}) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_cutg.py000644 000765 000024 00000014671 12024702176 022140 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the CUTG database parsers. """ from cogent.parse.cutg import CutgParser, CutgSpeciesParser, InfoFromLabel from cogent.parse.record import RecordError from cogent.util.unit_test import TestCase, main __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" sample_gene = r""">AB000406\AB000406\100..1506\1407\BAA19100.1\Xenopus laevis\Xenopus laevis mRNA for protein phosphatase 2A regulatory subunit,complete cds./gene="PP2A"/codon_start=1/product="protein phosphatase 2A regulatory subunit"/protein_id="BAA19100.1"/db_xref="GI:1783183" 1 7 4 2 15 6 4 4 11 4 1 7 6 7 4 7 14 7 10 7 4 6 5 2 3 8 3 4 3 6 5 4 6 0 8 5 8 8 20 10 21 13 2 9 5 7 22 16 20 10 10 8 7 3 6 15 12 5 14 11 6 0 1 0 >AB000458#1\AB000458\105..623\519\BAA22881.1\Xenopus laevis\Xenopus laevis Xem1 mRNA for transmembrane protein, complete cds./gene="Xem1"/codon_start=1/product="transmembrane protein"/protein_id="BAA22881.1"/db_xref="GI:2554596" 0 1 1 0 2 1 6 7 10 4 0 1 4 6 1 5 2 1 3 1 1 5 2 5 0 5 2 4 0 2 2 1 1 1 2 5 4 4 2 2 1 3 1 6 4 2 2 5 1 2 3 2 1 2 2 9 3 5 2 6 4 1 0 0 >AB000736\AB000736\27..557\531\BAA19174.1\Xenopus laevis\Xenopus laevis mRNA for myelin basic protein, complete cds./codon_start=1/product="myelin basic protein"/protein_id="BAA19174.1"/db_xref="GI:1816437" 1 1 1 2 5 11 1 0 3 2 0 1 8 4 2 11 3 1 2 1 0 0 5 2 0 5 3 3 0 1 14 3 4 2 1 0 1 4 3 7 3 0 2 3 9 7 1 4 3 3 5 
4 0 0 5 2 1 1 0 4 1 0 0 1 """.split('\n') sample_species = r"""Salmonella enterica: 332 432 1587 640 1808 586 275 758 825 3557 1668 2943 1663 1267 928 1002 1397 1359 1277 1096 1880 1305 1267 911 608 1752 1002 1684 1868 2960 1627 995 2399 1228 2103 1605 1289 2024 2017 4800 1681 1719 3181 1699 2857 700 1723 4347 2560 1577 4174 1031 2938 496 678 1354 3501 1713 1955 3743 2497 1366 214 22 96 Salmonella enterica IIIb 50:k:z: 3 2 5 2 4 3 3 4 6 12 5 7 1 7 5 1 4 7 3 10 13 5 6 9 4 3 5 17 14 14 14 6 18 5 12 15 10 4 11 24 5 16 17 9 19 6 3 11 9 9 18 7 20 2 0 8 17 6 11 16 11 5 2 0 1 Salmonella enterica subsp. VII: 5 4 1 6 7 6 0 8 15 10 9 18 9 15 6 6 19 21 3 21 14 11 22 16 10 11 6 38 13 20 10 8 8 9 9 7 5 7 22 58 21 25 40 24 22 11 16 41 21 6 14 9 11 2 12 10 31 16 23 22 22 3 0 1 4""".split('\n') strange_db = r'''>AB001737\AB001737\1..696\696\BAA19944.1\Mus musculus\Mus musculus mRNA for anti-CEA scFv antibody, complete cds./codon_start=1/product="anti-CEA scFv antibody"/protein_id="BAA19944.1"/db_xref="GI:2094751"/db_xref="IMGT/LIGM:AB001737"''' class InfoFromLabelTests(TestCase): """Tests of the InfoFromLabel constructor.""" def test_init(self): """InfoFromLabel should handle a typical label line""" i = InfoFromLabel(sample_gene[0]) sa = self.assertEqual sa(i.GenBank, ['AB000406']) sa(i.Locus, 'AB000406') sa(i.CdsNumber, '1'), sa(i.Location, '100..1506') sa(i.Length, '1407') sa(i.Species, 'Xenopus laevis') sa(i.Description, r'Xenopus laevis mRNA for protein phosphatase 2A regulatory subunit,complete cds./gene="PP2A"/codon_start=1/product="protein phosphatase 2A regulatory subunit"/protein_id="BAA19100.1"/db_xref="GI:1783183"') sa(i.Gene, 'PP2A') sa(i.CodonStart, '1') sa(i.Product, 'protein phosphatase 2A regulatory subunit') sa(i.GenPept, ['BAA19100.1']) sa(i.GI, ['1783183']) j = InfoFromLabel(sample_gene[2]) assert j.Refs is not i.Refs assert j._handler is not i._handler assert j._handler is j.Refs assert j.Refs.GI is not i.Refs.GI assert j.GI is not i.GI sa(j.GenBank, 
['AB000458']), sa(j.Locus, 'AB000458') sa(j.CdsNumber, '1') sa(j.Location, '105..623') sa(j.Length, '519'), sa(j.Species, 'Xenopus laevis') sa(j.Description, 'Xenopus laevis Xem1 mRNA for transmembrane protein, complete cds./gene="Xem1"/codon_start=1/product="transmembrane protein"/protein_id="BAA22881.1"/db_xref="GI:2554596"') sa(j.GenPept, ['BAA22881.1']), sa(j.GI, ['2554596']) sa(j.Product, 'transmembrane protein') def test_init_unknown_db(self): """InfoFromLabel should handle a line whose database is unknown""" i = InfoFromLabel(strange_db) self.assertEqual(i.Locus, 'AB001737') class CutgSpeciesParserTests(TestCase): """Tests of the CutgSpeciesParser.""" def test_init(self): """CutgSpeciesParser should read records one at a time from lines""" recs = list(CutgSpeciesParser(sample_species)) self.assertEqual(len(recs), 3) a, b, c = recs self.assertEqual(a.Species, 'Salmonella enterica') self.assertEqual(a.NumGenes, 332) self.assertEqual(a['CGA'], 432) self.assertEqual(a['UGG'], 1366) self.assertEqual(b.Species, 'Salmonella enterica IIIb 50:k:z') self.assertEqual(b.NumGenes, 3) self.assertEqual(b['CGA'], 2) self.assertEqual(b['UGG'], 5) self.assertEqual(c.Species, 'Salmonella enterica subsp. VII') self.assertEqual(c.NumGenes, 5) self.assertEqual(c['CGA'], 4) self.assertEqual(c['UGG'], 3) #check that it won't work if we're missing any lines self.assertRaises(RecordError, list, CutgSpeciesParser(sample_species[1:])) self.assertRaises(RecordError, list, CutgSpeciesParser(sample_species[:-1])) #...but that it does work if we only have some of them recs = list(CutgSpeciesParser(sample_species[2:])) self.assertEqual(recs[0], b) self.assertEqual(len(list(CutgSpeciesParser(sample_species[1:], strict=False))), 2) class CutgParserTests(TestCase): """Tests of the CutgParser. Note: these are fairly incomplete at present since most of the work is in parsing the label line, which is tested by itself. 
""" def test_init(self): """CutgParser should read records one at a time from lines""" recs = list(CutgParser(sample_gene)) self.assertEqual(len(recs), 3) a, b, c = recs self.assertEqual(a.Species, 'Xenopus laevis') self.assertEqual(a['CGC'], 7) self.assertEqual(a.GI, ['1783183']) self.assertRaises(RecordError, list, CutgParser(sample_gene[1:])) self.assertEqual(len(list(CutgParser(sample_gene[1:],strict=False))), 2) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_dialign.py000644 000765 000024 00000010414 12024702176 022574 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import unittest from cogent import PROTEIN, LoadSeqs from cogent.parse.dialign import align_block_lines, parse_data_line, DialignParser __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" data = \ """ DIALIGN 2.2.1 ************* Program code written by Burkhard Morgenstern and Said Abdeddaim e-mail contact: dialign (at) gobics (dot) de Published research assisted by DIALIGN 2 should cite: Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211 - 218. For more information, please visit the DIALIGN home page at http://bibiserv.techfak.uni-bielefeld.de/dialign/ ************************************************************ program call: dialign -fa -fn /tmp/ct/seq1.fasta /tmp/ct/seq1.txt Aligned sequences: length: ================== ======= 1) HTL2 57 2) MMLV 58 3) HEPB 62 4) ECOL 54 Average seq. length: 57.8 Please note that only upper-case letters are considered to be aligned. 
Alignment (DIALIGN format):
===========================

HTL2 1 ldtapC-LFS DGS------P QKAAYVL--- ----WDQTIL QQDITPLPSH
MMLV 1 pdadhtw-YT DGSSLLQEGQ RKAGAAVtte teviWa---- KALDAG---T
HEPB 1 rpgl-CQVFA DAT------P TGWGLVM--- ----GHQRMR GTFSAPLPIH
ECOL 1 mlkqv-EIFT DGSCLGNPGP GGYGAIL--- ----RYRGRE KTFSAGytrT
       0000000588 8882222229 9999999000 0000666666 6666633334

HTL2 37 ethSAQKGEL LALICGLRAa k--------- ---
MMLV 43 ---SAQRAEL IALTQALKm- ---------- ---
HEPB 37 t------AEL LAA-CFARSr sganiigtdn svv
ECOL 43 ---TNNRMEL MAAIv----- ---------- ---
        0003333455 5533333300 0000000000 000

Sequence tree:
==============

Tree constructed using UPGMA
based on DIALIGN fragment weight scores

((HTL2 :0.130254MMLV :0.130254):0.067788(HEPB :0.120520ECOL :0.120520):0.077521);
""".splitlines()


class TestDialign(unittest.TestCase):

    def setUp(self):
        aln_seqs = {
            "HTL2": "ldtapC-LFSDGS------PQKAAYVL-------WDQTILQQDITPLPSHethSAQKGELLALICGLRAak------------",
            "MMLV": "pdadhtw-YTDGSSLLQEGQRKAGAAVtteteviWa----KALDAG---T---SAQRAELIALTQALKm--------------",
            "HEPB": "rpgl-CQVFADAT------PTGWGLVM-------GHQRMRGTFSAPLPIHt------AELLAA-CFARSrsganiigtdnsvv",
            "ECOL": "mlkqv-EIFTDGSCLGNPGPGGYGAIL-------RYRGREKTFSAGytrT---TNNRMELMAAIv------------------"}
        self.aln_seqs = {}
        for name, seq in aln_seqs.items():
            self.aln_seqs[name] = PROTEIN.Sequence(seq, Name=name)
        self.QualityScores = "00000005888882222229999999900000006666666666633334000333345555333333000000000000000"

    def test_line_split(self):
        """test splitting of sequence record lines"""
        result = parse_data_line("HTL2 1 ldtapcLFSD GS------PQ KAAYVLWDQT ILQQDITPLP SHethsaqkg ")
        self.assertEqual(result,
            ("HTL2", "ldtapcLFSDGS------PQKAAYVLWDQTILQQDITPLPSHethsaqkg"))
        result = parse_data_line(" 1111111111 1000001111 1111033333 3333333333 3000000000 ")
        self.assertEqual(result,
            (None, "11111111111000001111111103333333333333333000000000"))

    def test_aligned_from_dialign(self):
        """test getting aligned seqs"""
        aligned_seq = dict(list(DialignParser(data,
                                              seq_maker=PROTEIN.Sequence)))
        assert aligned_seq == self.aln_seqs

    def test_quality_scores(self):
        """test quality scores correctly returned"""
        result = dict(list(DialignParser(data, seq_maker=PROTEIN.Sequence,
                                         get_scores=True)))
        assert result["QualityScores"] == self.QualityScores

if __name__ == '__main__':
    unittest.main()

PyCogent-1.5.3/tests/test_parse/test_dotur.py

#!/usr/bin/env python
# test_dotur.py

from cogent.util.unit_test import TestCase, main
from cogent.parse.dotur import get_otu_lists, OtuListParser

__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Development"

class DoturParserTests(TestCase):
    """Tests for DoturParser.
    """

    def setUp(self):
        """setup for DoturParserTests.
        """
        self.otu_list_string = \
"""unique 3 a b c
0.00 3 a b c
0.59 2 a,c b
0.78 1 a,c,b
"""
        self.otu_res_list = [
            [0.0, 3, [['a'], ['b'], ['c']]],
            [0.0, 3, [['a'], ['b'], ['c']]],
            [float(0.59), 2, [['a', 'c'], ['b']]],
            [float(0.78), 1, [['a', 'c', 'b']]],
            ]
        self.otu_lists_unparsed = [
            ['a', 'b', 'c'],
            ['a', 'b', 'c'],
            ['a,c', 'b'],
            ['a,c,b'],
            ]
        self.otu_lists_parsed = [
            [['a'], ['b'], ['c']],
            [['a'], ['b'], ['c']],
            [['a', 'c'], ['b']],
            [['a', 'c', 'b']],
            ]

    def test_get_otu_lists_no_data(self):
        """get_otu_lists should function as expected.
        """
        self.assertEqual(get_otu_lists([]), [])

    def test_get_otu_lists(self):
        """get_otu_lists should function as expected.
        """
        for obs, exp in zip(self.otu_lists_unparsed, self.otu_lists_parsed):
            self.assertEqual(get_otu_lists(obs), exp)

    def test_otulistparser_no_data(self):
        """OtuListParser should return correct result given no data.
        """
        res = OtuListParser([])
        self.assertEqual(list(res), [])

    def test_otulistparser_parser(self):
        """OtuListParser should return correct result given basic output.
""" res = OtuListParser(self.otu_list_string.split('\n')) self.assertEqual(res,self.otu_res_list) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_ebi.py000644 000765 000024 00000104537 12024702176 021736 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Provides tests for EbiParser and related classes and functions. """ from cogent.parse.ebi import cc_alternative_products_parser, \ cc_basic_itemparser, cc_biophysicochemical_properties_parser, \ cc_interaction_parser, cc_itemfinder, cc_parser, EbiFinder, \ MinimalEbiParser, hanging_paragraph_finder, join_parser, \ join_split_dict_parser, join_split_parser, labeloff, linecode_maker, \ linecode_merging_maker, mapping_parser, pairs_to_dict, period_tail_finder, \ rstrip_, ft_basic_itemparser, ft_id_parser, ft_mutagen_parser, \ ft_mutation_parser, ft_parser, try_int, ra_parser, rc_parser, \ rg_parser, rl_parser, rn_parser, rp_parser, rt_parser, rx_parser, \ gn_parser, single_ref_parser, ac_parser, de_itemparser, dr_parser, \ dt_parser, id_parser, oc_parser, og_parser, os_parser, ox_parser, \ sq_parser, de_parser, RecordError, FieldError, curry, required_labels, \ EbiParser from cogent.util.unit_test import TestCase, main from cogent.core.sequence import Sequence from cogent.core.info import Info __author__ = "Zongzhi Liu" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Zongzhi Liu", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Zongzhi Liu" __email__ = "zongzhi.liu@gmail.com" __status__ = "Development" def item_empty_filter(d): """return a dict with only nonempty values""" pairs = [(k,v) for (k,v) in d.iteritems() if v] return dict(pairs) class EbiTests(TestCase): """ Tests ebi parsers and generic parsers and general functions """ def setUp(self): """ Construct some fake data for testing purposes """ pass def test_item_empty_filter(self): """item_empty_filter: known values""" inputs = [ {1:0, 2:1, 3:'', 4:[], 5:False, 6:{}}] expects = 
[ {2:1}] self.assertEqual(map(item_empty_filter, inputs), expects) def test_rstrip_(self): """rstrip_ should generate the expected function""" test = ' aaa; ' self.assertEqual(rstrip_('; ')(test), test.rstrip('; ')) #test default self.assertEqual(rstrip_()(test), test.rstrip()) def test_hanging_paragraph_finder(self): """hanging_paragraph_finder should give expected results""" f = hanging_paragraph_finder test = [ '-aaa:', ' content', ' content', '-bbb:', #3 ' bbb', 'c',] self.assertEqual(list(f(test)), [test[0:3], test[3:-1], test[-1:]]) #test all indent lines all_indent = [' aa', ' bb'] self.assertEqual(list(f(all_indent)), [all_indent]) #test empty lines self.assertEqual(list(f(['',' '])), []) def test_period_tail_finder(self): """period_tail_finder should yield each group of expected lines.""" test = "a\naa.\nb\nbb.".splitlines() self.assertEqual(list(period_tail_finder(test)), [['a','aa.'],['b','bb.']]) def test_EbiFinder(self): """EbiFinder should return expected list""" test = "a\n//\nb\n//".splitlines() self.assertEqual(list(EbiFinder(test)), [['a','//'],['b','//']]) test_fail = test + ['c'] self.assertRaises(RecordError, list, EbiFinder(test_fail)) def test_pairs_to_dict(self): """pairs_to_dict should return the expected dict""" test_dict = {'a': 1, 'b': 2, 'b': 3,} sorted_items = [('a', 1), ('b', 2), ('b', 3),] add_one = lambda x: x + 1 double = lambda x: x*2 set_zero = lambda x: 0 handlers ={ 'a': add_one, 'b': double,} #test default all self.assertEqual(pairs_to_dict(sorted_items), {'a': 1, 'b': 3}) #test 'overwrite_value' self.assertEqual(pairs_to_dict(sorted_items, 'overwrite_value'), {'a': 1, 'b': 3}) #test no_duplicated_key, raise self.assertRaises(ValueError, pairs_to_dict, sorted_items, 'no_duplicated_key') #test always_multi_value self.assertEqual(pairs_to_dict(sorted_items, 'always_multi_value'), {'a': [1], 'b': [2, 3]}) #test allow multi_value self.assertEqual(pairs_to_dict(sorted_items, 'allow_multi_value'), {'a': 1, 'b': [2, 3]}) #test raise 
error when key not found in all_keys f = curry(pairs_to_dict, all_keys=['a','c']) self.assertRaises(ValueError, f, sorted_items) #test handlers sorted_items.append(('c', 4)) self.assertEqual(pairs_to_dict(sorted_items, handlers=handlers, default_handler=set_zero), {'a': 2, 'b': 6, 'c': 0}) #test raise error when no valid handlers were found f = curry(pairs_to_dict, handlers=handlers) self.assertRaises(ValueError, f, sorted_items) #test sanity test_dict = dict(sorted_items) self.assertEqual(pairs_to_dict(test_dict.items()), test_dict) def test_linecode_maker(self): """linecode_maker: should return expected tuple""" tests = ['AA aa.', 'BB bb.', 'CC C cc.', 'DD dd.'] expected_linecodes =[ 'AA', 'BB', 'CC C', 'DD dd.'] #pprint(map(linecode_maker, tests)) self.assertEqual(map(linecode_maker, tests), zip(expected_linecodes, tests)) def test_labeloff(self): """labeloff: should return expected lines""" tests = ['AA aa.', 'BB bb.', 'CC C cc.', 'DD dd.', 'EE', ''] #expects = [line[5:] for line in tests] expects = ['aa.', ' bb.', ' cc.', '.','',''] self.assertEqual(labeloff(tests), expects) def test_join_parser(self): """join parser should return expected str.""" test_list = '"aaa\nbbb \nccc"; \n'.splitlines() test_str = 'aaa bb. 
' #test default, list input self.assertEqual(join_parser(test_list), '"aaa bbb ccc"') #test default, str input self.assertEqual(join_parser(test_str), 'aaa bb') #test no strip self.assertEqual(join_parser(test_list, chars_to_strip=''), '"aaa bbb ccc"; ') #test strip self.assertEqual(join_parser(test_list, chars_to_strip='"; '), 'aaa bbb ccc') #test empty self.assertEqual(join_parser([]),'') self.assertEqual(join_parser(['', ' ']),'') self.assertEqual(join_parser(''),'') def test_join_split_parser(self): """join_split_parser: should return expected""" f = join_split_parser assertEqual = self.assertEqual assertEqual(f(['aa; bb;', 'cc.']), ['aa', 'bb', 'cc']) assertEqual(f(['aa; bb, bbb;', 'cc.'],delimiters=';,'), ['aa', ['bb','bbb'], 'cc']) #test item_modifer assertEqual(f('aa (bb) (cc).', '(', item_modifier=rstrip_(') ')), ['aa','bb','cc']) assertEqual(f('aa (bb)xx (cc).', '(', item_modifier=rstrip_(') ')), ['aa','bb)xx','cc']) #test empty assertEqual(f([]),['']) assertEqual(f(['', ' ']),['']) assertEqual(f(''),['']) def test_join_split_dict_parser(self): """join_split_dict_parser: should return expected""" f = join_split_dict_parser #test default self.assertEqual(f('aa=1; bb=2,3; cc=4 (if aa=1);'), {'aa':'1', 'bb': ['2','3'], 'cc': '4 (if aa=1)'}) self.assertEqual(f('aa=1; bb=2,3; cc=4:5', delimiters=';=,:'), {'aa':'1', 'bb': ['2','3'], 'cc': '4:5'}) #test strict=False -> splits without dict() self.assertEqual(f('aa=1; bb.', strict=False), [['aa', '1'], ['bb']]) #test strict -> raise ValueError self.assertRaises(ValueError, f, 'aa=1; bb.') self.assertRaises(ValueError, f, 'aa=1; bb=2=3.', ';=') self.assertRaises(ValueError, f, '') def test_mapping_parser(self): """mapping_parser: should return expected dict""" fields = [None, 'A', 'B', ('C', int), ('D', float)] line = 'blah aa bb; 2 3.1;' expect = dict(A='aa', B='bb', C=2, D=3.1) self.assertEqual(mapping_parser(line, fields), expect) #test more splits -> ignore the last splits line_leftover = line + 'blah blah' 
self.assertEqual(mapping_parser(line_leftover, fields), expect) #test more fields -> truncate the last fields fields_leftover = fields + ['E'] self.assertEqual(mapping_parser(line, fields_leftover), expect) #test empty self.assertEqual(mapping_parser('', fields), {}) def test_linecode_merging_maker(self): """linecode_merging_maker: """ f = linecode_merging_maker lines =['ID id.', 'RN rn.', 'RR invalid', 'RN rn.'] labels = ['ID', 'REF', 'RR', 'RN rn.'] self.assertEqual(map(f, lines), zip(labels, lines)) def test_parse_header(lines): pass def test_MinimalEbiParser_valid(self): """MinimalEbiParser: integrity of output """ f = curry(MinimalEbiParser, strict=False) #test valid result: sequence, number of records, keys of a header valid_result = list(f(fake_records_valid)) self.assertEqual(len(valid_result), 2) sequence, header = valid_result[0] self.assertEqual(sequence, 'aaccppgghhh') #the first fake record use only the required labels, the header is #deleted of '', which was assigned to sequence self.assertEqual(list(sorted(header.keys())), list(sorted(required_labels))[1:]) #[1:] to exclude the '' #test selected_labels selected_labels = ['ID', 'DE'] select_result = list(f(fake_records_valid, selected_labels=selected_labels)) self.assertEqual(list(sorted(select_result[0][1].keys())), list(sorted(selected_labels))) #test bad record - unknown linecode or wrong line format self.assertRaises(ValueError, list, f(fake_records_valid + ['ID id.', 'RR not valid.','//'])) self.assertRaises(ValueError, list, f(fake_records_valid + ['ID id.', ' RN bad format.','//'])) self.assertRaises(ValueError, list, f(fake_records_valid + ['ID id.', 'RN bad format.','//'])) #test bad record - not end with '//' self.assertRaises(RecordError, list, f(fake_records_valid + ['ID not end with //', ' seq'])) #test strict: lacking required linecodes #?? How to test the error message? warn message? #the first record, [:-5], is valid even when strict=True. 
the_first_valid = list(f(fake_records_valid[:-5], strict=True))[0] #[1] get the header_dict self.assertEqual(len(the_first_valid[1]),9) self.assertRaises(RecordError, list, f(fake_records_valid, strict=True)) def test_EbiParser(self): """EbiParser: """ f = curry(EbiParser, strict=False) first_valid = fake_records_valid[:-5] #test valid self.assertEqual(len(list(f(fake_records_valid))), 2) #test skipping bad record which strict=False #self.assertEqual(len(list(f(fake_records_valid[:-1] + # ['OX xx=no equal.', '//']))), 1) ##test Raise RecordError from parse_head when strict=True #self.assertRaises(RecordError, list, f(first_valid[:-1] + # ['OX xx=no equal.', '//'], strict=True)) class RootParsersKnownValues(TestCase): """Test most xx_parsers with known value""" def test_id_parser(self): """id_parser should return expected dict""" id_line = [ 'ID CYC_BOVIN STANDARD; PRT; 104 AA.'] self.assertEqual( id_parser(id_line), {'DataClass': 'STANDARD', 'Length': 104, 'MolType': 'PRT', 'EntryName': 'CYC_BOVIN'}) def test_sq_parser(self): """sq_parser should return expected dict""" lines = [ "SQ SEQUENCE 486 AA; 55639 MW; D7862E867AD74383 CRC64;"] self.assertEqual( sq_parser(lines), {'Crc64': 'D7862E867AD74383', 'Length': 486, 'MolWeight': 55639}) def test_ac_parser(self): """ac_parser should return expected list""" lines = [ "AC Q16653; O00713; O00714;", "AC Q92892; Q92893;"] self.assertEqual( ac_parser(lines), ['Q16653', 'O00713', 'O00714', 'Q92892', 'Q92893']) def test_oc_parser(self): """oc_parser should return expected list""" lines = [ "OC Eukaryota; Metazoa; Chordata;", "OC Mammalia;"] self.assertEqual(oc_parser(lines), ['Eukaryota', 'Metazoa', 'Chordata', 'Mammalia']) def test_dt_parser(self): """dt_parser should return expected list of list""" lines = \ "DT 01-AUG-1988 (Rel. 08, Created)\n"\ "DT 30-MAY-2000 (Rel. 39, Last sequence update)\n"\ "DT 10-MAY-2005 (Rel. 47, Last annotation update)\n".splitlines() self.assertEqual( dt_parser(lines), ['01-AUG-1988 (Rel. 
08, Created)', '30-MAY-2000 (Rel. 39, Last sequence update)', '10-MAY-2005 (Rel. 47, Last annotation update)']) def test_de_itemparser(self): """de_itemparser: known values""" inputs = [ ' AAA (aa) ', 'AAA [xx] (aa)', 'AAA', '',] expects = [ {'OfficalName': 'AAA', 'Synonyms': ['aa']}, {'OfficalName': 'AAA [xx]', 'Synonyms': ['aa']}, {'OfficalName': 'AAA', 'Synonyms': []}, {'OfficalName': '', 'Synonyms': []}] #pprint(map(de_itemparser, inputs)) self.assertEqual(map(de_itemparser, inputs), expects) def test_de_parser(self): """de_parser should return expected list""" inputs = [ "DE Annexin [Includes: CCC] [Contains: AAA] (Fragment).", "DE A [Includes: II] (Fragment).", "DE A [Contains: CC].", "DE A (Fragment).", "DE A (aa)."] filtered_dicts = [item_empty_filter(de_parser([e])) for e in inputs] self.assertEqual(map(len, filtered_dicts), [4, 3, 2, 2, 2]) def test_os_parser(self): """os_parser should return expected list""" lines = [ 'OS Solanum melongena (Eggplant) (Auber-', 'OS gine).'] self.assertEqual(os_parser(lines), ['Solanum melongena', 'Eggplant', 'Auber- gine']) lines = \ """OS Escherichia coli.""".splitlines() self.assertEqual(os_parser(lines), ['Escherichia coli']) def test_og_parser(self): """og_parser should return expected list""" lines = [ "OG XXX; xx.", "OG Plasmid R6-5, Plasmid IncFII R100 (NR1), and", "OG Plasmid IncFII R1-19 (R1 drd-19)."] self.assertEqual(og_parser(lines), ['XXX; xx', [ 'Plasmid R6-5', 'Plasmid IncFII R100 (NR1)', 'Plasmid IncFII R1-19 (R1 drd-19)']]) def test_ox_parser(self): """ox_parser should return expected dict""" lines = ["OX NCBI_TaxID=9606;"] self.assertEqual(ox_parser(lines), {'NCBI_TaxID': '9606'}) def test_gn_parser(self): """gn_parser should return expected list of dict""" lines = [ "GN Name=hns; Synonyms=bglY, cur, topS;", "GN OrderedLocusNames=b1237, c1701, ECs1739;"] self.assertEqual(gn_parser(lines), [{'Synonyms': ['bglY', 'cur', 'topS'], 'OrderedLocusNames': ['b1237', 'c1701', 'ECs1739'], 'Name': 'hns'}]) lines = [ 
"GN Name=Jon99Cii; Synonyms=SER1, Ser99Da; ORFNames=CG7877;", "GN and", "GN Name=Jon99Ciii; Synonyms=SER2, Ser99Db; ORFNames=CG15519;"] self.assertEqual(gn_parser(lines), [{'ORFNames': 'CG7877', 'Synonyms': ['SER1', 'Ser99Da'], 'Name': 'Jon99Cii'}, {'ORFNames': 'CG15519', 'Synonyms': ['SER2', 'Ser99Db'], 'Name': 'Jon99Ciii'}]) def test_dr_parser(self): """dr_parser should return expected dict""" lines = dr_lines self.assertEqual(dr_parser(lines), dr_expect) class FT_Tests(TestCase): """Tests for FT parsers. """ def test_ft_basic_itemparser(self): """ft_basic_itemparser: known values""" inputs = [ ['DNA_BIND >102 292'], ['CONFLICT 327 327 E -> R (in Ref. 2).'], ['PROPEP ?25 48', ' /FTId=PRO_021449.',], ['VARIANT 214 214 V -> I.', ' /FTId=VAR_009122.',],] expects = [ ('DNA_BIND', '>102', 292, ''), ('CONFLICT', 327, 327, 'E -> R (in Ref. 2)'), ('PROPEP', '?25', 48, '/FTId=PRO_021449'), ('VARIANT', 214, 214, 'V -> I. /FTId=VAR_009122')] #pprint(map(ft_basic_itemparser, inputs)) self.assertEqual(map(ft_basic_itemparser, inputs), expects) def test_try_int(self): """try_int: known values""" inputs = ['9', '0', '-3', '2.3', '<9', '>9', '?', '?35', ''] expects = [9, 0, -3, '2.3', '<9', '>9', '?', '?35', ''] self.assertEqual(map(try_int, inputs), expects) def test_ft_id_parser(self): """ft_id_parser: known values""" inputs = [ '', 'ddd', '/FTId=PRO_021449', 'V -> I. /FTId=VAR_009122', 'E -> R (tumor). /FTId=VAR_002343',] expects = [ {'Description': '', 'Id': ''}, {'Description': 'ddd', 'Id': ''}, {'Description': '', 'Id': 'PRO_021449'}, {'Description': 'V -> I', 'Id': 'VAR_009122'}, {'Description': 'E -> R (tumor)', 'Id': 'VAR_002343'}] #pprint(map(ft_id_parser, inputs)) self.assertEqual(map(ft_id_parser, inputs), expects) def test_ft_mutation_parser(self): """ft_mutation_parser: known values""" inputs = [ '', 'ddd', #should raise error? 'V -> I. /FTId=xxxxxx', #should raise error? 
'V -> I', 'E -> R (tumor)', 'missing (tumor)', ] expects = [ {'MutateFrom': '', 'Comment': '', 'MutateTo': ''}, {'MutateFrom': 'ddd', 'Comment': '', 'MutateTo': ''}, {'MutateFrom': 'V', 'Comment': '', 'MutateTo': 'I. /FTId=xxxxxx'}, {'MutateFrom': 'V', 'Comment': '', 'MutateTo': 'I'}, {'MutateFrom': 'E', 'Comment': 'tumor', 'MutateTo': 'R'}, {'MutateFrom': 'missing ', 'Comment': 'tumor', 'MutateTo': ''}] #pprint(map(ft_mutation_parser, inputs)) self.assertEqual(map(ft_mutation_parser, inputs), expects) def test_ft_mutation_parser_raise(self): """ft_mutation_parser: raise ValueError""" pass def test_ft_mutagen_parser(self): """ft_mutagen_parser: known values""" inputs = [ 'C->R,E,A: Loss of cADPr hydrolas', 'Missing: Abolishes ATP-binding', ] expects = [ {'Comment': ' Loss of cADPr hydrolas', 'MutateFrom': 'C', 'MutateTo': 'R,E,A'}, {'Comment': ' Abolishes ATP-binding', 'MutateFrom': 'Missing', 'MutateTo': ''}] #pprint(map(ft_mutagen_parser, inputs)) self.assertEqual(map(ft_mutagen_parser, inputs), expects) def test_ft_id_mutation_parser(self): """ft_id_mutation_parser: known values""" pass def test_ft_parser(self): """ft_parser should return expected dict""" lines = ft_lines #pprint(ft_parser(lines)) self.assertEqual(ft_parser(lines), ft_expect) class CC_Tests(TestCase): """tests for cc_parsers. 
""" def test_cc_itemfinder_valid(self): """cc_itemfinder: yield each expected block.""" #pprint(list(cc_itemfinder(labeloff(cc_lines)))) input_with_license = labeloff(cc_lines) self.assertEqual(len(list(cc_itemfinder( input_with_license))), 9) input_without_license = labeloff(cc_lines[:-4]) self.assertEqual(len(list(cc_itemfinder( input_without_license))), 8) def test_cc_itemfinder_raise(self): """cc_itemfinder: raise RecordError if license block bad.""" input_with_license_lacking_bottom = labeloff(cc_lines[:-1]) self.assertRaises(FieldError, cc_itemfinder, input_with_license_lacking_bottom) def test_cc_basic_itemparser(self): """cc_basic_itemparser: known results or FieldError""" valid_topics = [ ['-!- topic1: first: line', ' second line'], ['-!- topic2: ', ' first line', ' second line'], [' topic3: not treated invalid topic format'],] expects = [ ('topic1', ['first: line', 'second line']), ('topic2', ['first line', 'econd line']), ('topic3', ['not treated invalid topic format']),] self.assertEqual(map(cc_basic_itemparser, valid_topics), expects) bad_topic = ['-!- bad_topic without colon', ' FieldError'] self.assertRaises(FieldError, cc_basic_itemparser, bad_topic) def test_cc_interaction_parser(self): """cc_interaction_parser: known values""" inputs = [ ['Self; NbExp=1; IntAct=EBI-123485, EBI-123485;', 'Q9W158:CG4612; NbExp=1; IntAct=EBI-123485, EBI-89895;', 'Q9VYI0:fne; NbExp=1; IntAct=EBI-123485, EBI-126770;',]] expects =[ [('Self', {'NbExp': '1', 'IntAct': ['EBI-123485', 'EBI-123485']}), ('Q9W158:CG4612', {'NbExp': '1', 'IntAct': ['EBI-123485', 'EBI-89895']}), ('Q9VYI0:fne', {'NbExp': '1', 'IntAct': ['EBI-123485', 'EBI-126770']})]] self.assertEqual(map(cc_interaction_parser, inputs), expects) def test_cc_biophysicochemical_properties_parser(self): """cc_biophysicochemical_properties_parser: known values""" #pprint(cc['BIOPHYSICOCHEMICAL PROPERTIES']) #topic specific parser f = cc_biophysicochemical_properties_parser valid_inputs = [ ['Kinetic parameters:', ' 
KM=98 uM for ATP;', ' KM=688 uM for pyridoxal;', ' Vmax=1.604 mmol/min/mg enzyme;', 'pH dependence:', ' Optimum pH is 6.0. Active pH 4.5 to 10.5;',] ] expects = [ {'Kinetic parameters': { 'KM': ['98 uM for ATP', '688 uM for pyridoxal'], 'Vmax': '1.604 mmol/min/mg enzyme'}, 'pH dependence': 'Optimum pH is 6.0. Active pH 4.5 to 10.5'}, ] self.assertEqual(map(f, valid_inputs), expects) def test_cc_alternative_products_parser(self): """cc_alternative_products_parser: know values""" f = cc_alternative_products_parser valid_inputs = [ ['Event=Alternative initiation;' ' Comment=Free text;', 'Event=Alternative splicing; Named isoforms=3;', ' Comment=Additional isoforms seem to exist.', ' confirmation;', 'Name=1; Synonyms=AIRE-1;', ' IsoId=O43918-1; Sequence=Displayed;', 'Name=3; Synonyms=AIRE-3,', 'ai-2, ai-3;', #broken the hanging_paragraph_finder ' IsoId=O43918-3; Sequence=VSP_004089, VSP_004090;',]] expects = \ [[{'Comment': 'Free text', 'Event': 'Alternative initiation'}, {'Comment': 'Additional isoforms seem to exist. confirmation', 'Event': 'Alternative splicing', 'Named isoforms': '3', 'Names': [{'IsoId': 'O43918-1', 'Name': '1', 'Sequence': 'Displayed', 'Synonyms': 'AIRE-1'}, {'IsoId': 'O43918-3', 'Name': '3', 'Sequence': ['VSP_004089', 'VSP_004090'], 'Synonyms': ['AIRE-3', 'ai-2', 'ai-3']}]}]] #pprint(map(f,valid_inputs)) self.assertEqual(map(f, valid_inputs), expects) def test_cc_parser(self): """cc_parser: known values and raise when strict""" cc = cc_parser(cc_lines) #pprint(cc) #print cc.keys() self.assertEqual(list(sorted(cc.keys())), ['ALLERGEN', 'ALTERNATIVE PRODUCTS', 'BIOPHYSICOCHEMICAL PROPERTIES', 'DATABASE', 'DISEASE', 'INTERACTION', 'LICENSE', 'MASS SPECTROMETRY']) #test Disease topic (default_handler) self.assertEqual(cc['DISEASE'], [ 'Defects in PHKA1 are linked to X-linked muscle glycogenosis ' '[MIM:311870]', 'Defects in ABCD1 are the cause of recessive X-linked ' 'adrenoleukodystrophy (X-ALD) [MIM:300100]. 
X-ALD is a rare ' 'phenotype' ]) #test License (default_handler) #pprint(cc['LICENSE']) self.assertEqual(cc['LICENSE'], [ 'This SWISS-PROT entry is copyright. It is produced through a ' 'collaboration removed']) #pprint(cc['DATABASE']) #join_split_dict self.assertEqual(cc['DATABASE'], [{ 'NAME': 'CD40Lbase', 'NOTE': 'European CD40L defect database (mutation db)', 'WWW': '"http://www.expasy.org/cd40lbase/"'}]) #test strict cc_lines_with_unknown_topic = ['CC -!- BLAHBLAH: xxxxx'] + cc_lines #pprint(cc_parser(cc_lines_with_unknown_topic)) self.assertEqual(cc_parser(cc_lines_with_unknown_topic)['BLAHBLAH'], ['xxxxx']) self.assertRaises(FieldError, cc_parser, cc_lines_with_unknown_topic, strict=True) class ReferenceTests(TestCase): """Tests for parsers related to reference blocks""" def test_ref_finder(self): """ref_finder: should return a list of ref blocks""" pass def test_refs_parser(self): """refs_parser: should return a dict of {RN: ref_dict}""" pass def test_single_ref_parser(self): """single_ref_parser: should return the expected dict""" fake_ref_block = ['RN [1]', 'RP NUCLEOTIDE', 'RC STRAIN=Bristol N2;', 'RX PubMed=1113;', 'RA Zhang L., Wu S.-L., Rubin C.S.;', 'RT "A novel ";', 'RL J. Biol. Chem. 
276:10.',] rn, others = single_ref_parser(fake_ref_block) self.assertEqual(rn, 1) self.assertEqual(len(others), 6) #test strict: lacking required labels self.assertEqual(len(single_ref_parser( fake_ref_block[:-1], strict=False)[1]), 5) self.assertRaises(RecordError, single_ref_parser, fake_ref_block[:-1], True) def test_ra_parser(self): """ra_parser should return expected list""" lines = \ "RA Galinier A., Bleicher F., Negre D.,\n"\ "RA Cozzone A.J., Cortay J.-C.;\n".splitlines() self.assertEqual( ra_parser(lines), ['Galinier A.', 'Bleicher F.', 'Negre D.', 'Cozzone A.J.', 'Cortay J.-C.']) def test_rx_parser(self): """rx_parser should return expected dict""" inputs = [ ['RX MEDLINE=22709107; PubMed=12788972; DOI=10.1073/pnas.113'], ['RX PubMed=14577811; '\ 'DOI=10.1597/1545-1569(2003)040<0632:AMMITS>2.0.CO;2;']] expects = [ {'DOI': '10.1073/pnas.113', 'MEDLINE': '22709107', 'PubMed': '12788972'}, {'DOI': '10.1597/1545-1569(2003)040<0632:AMMITS>2.0.CO;2', 'PubMed': '14577811'}] self.assertEqual(map(rx_parser, inputs), expects) def test_rc_parser(self): """rc_parser should return expected dict""" lines = [ "RC PLASMID=R1 (R7268); TRANSPOSON=Tn3;", "RC STRAIN=AL.012, AZ.026;"] self.assertEqual(rc_parser(lines), {'TRANSPOSON': 'Tn3', 'PLASMID': 'R1 (R7268)', 'STRAIN': ['AL.012', 'AZ.026']}) def test_rt_parser(self): """rt_parser should return expected str""" lines = [ 'RT "New insulin-like proteins', 'RT analysis and homology modeling.";'] self.assertEqual( rt_parser(lines), 'New insulin-like proteins analysis and homology modeling') def test_rl_parser(self): """rl_parser should return expected str""" lines = [ "RL J. Mol. Biol. 168:321-331(1983)."] self.assertEqual( rl_parser(lines), 'J. Mol. Biol. 
168:321-331(1983)') def test_rn_parser(self): """rn_parser should return expected str""" lines = [ "RN [8]"] self.assertEqual( rn_parser(lines), 8) def test_rg_parser(self): """rg_parser should return expected str""" lines = [ "RG The mouse genome sequencing consortium;"] self.assertEqual( rg_parser(lines), ['The mouse genome sequencing consortium']) def test_rp_parser(self): """rp_parser should return expected str""" lines = [ "RP X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS)."] self.assertEqual( rp_parser(lines), 'X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS)') ################################# # global test data ft_lines = \ """FT CHAIN 29 262 Granzyme A. FT /FTId=PRO_0000027394. FT ACT_SITE 69 69 Charge relay system. FT VARIANT 121 121 T -> M (in dbSNP:3104233). FT /FTId=VAR_024291. FT VARIANT 1 7 unknown (in a skin tumor). FT /FTId=VAR_005851. FT CONFLICT 282 282 R -> Q (in Ref. 18). FT STRAND 30 30 FT STRAND 33 34 FT TURN 37 38""".splitlines() ft_expect = \ {'ACT_SITE': [{'Start': 69, 'End': 69, 'Description': 'Charge relay system'}], 'CHAIN': [{'Description': {'Description': 'Granzyme A', 'Id': 'PRO_0000027394'}, 'End': 262, 'Start': 29}], 'CONFLICT': [{'Description': {'Comment': 'in Ref. 18', 'MutateFrom': 'R', 'MutateTo': 'Q'}, 'End': 282, 'Start': 282}], 'SecondaryStructure': [('STRAND', 30, 30), ('STRAND', 33, 34), ('TURN', 37, 38)], 'VARIANT': [{'Description': {'Comment': 'in dbSNP:3104233', 'Id': 'VAR_024291', 'MutateFrom': 'T', 'MutateTo': 'M'}, 'End': 121, 'Start': 121}, {'Description': {'Comment': 'in a skin tumor', 'Id': 'VAR_005851', 'MutateFrom': 'unknown ', 'MutateTo': ''}, 'End': 7, 'Start': 1}]} dr_lines =\ """DR MIM; 140050; gene. DR GO; GO:0001772; C:immunological synapse; TAS. DR GO; GO:0005634; C:nucleus; TAS. DR GO; GO:0006915; P:apoptosis; TAS. DR GO; GO:0006922; P:cleavage of lamin; IDA. DR GO; GO:0006955; P:immune response; TAS. DR InterPro; IPR001254; Peptidase_S1_S6. DR InterPro; IPR001314; Peptidase_S1A. 
DR Pfam; PF00089; Trypsin; 1.""".splitlines() dr_expect =\ {'GO': [['GO:0001772', 'C:immunological synapse', 'TAS'], ['GO:0005634', 'C:nucleus', 'TAS'], ['GO:0006915', 'P:apoptosis', 'TAS'], ['GO:0006922', 'P:cleavage of lamin', 'IDA'], ['GO:0006955', 'P:immune response', 'TAS']], 'Pfam': [['PF00089', 'Trypsin', '1']], 'InterPro': [['IPR001254', 'Peptidase_S1_S6'], ['IPR001314', 'Peptidase_S1A']], 'MIM': [['140050', 'gene']]} cc_lines = \ """CC -!- ALLERGEN: Causes an allergic reaction in human. Binds to IgE. CC bovine dander. CC -!- ALTERNATIVE PRODUCTS: CC Event=Alternative splicing; Named isoforms=3; CC Comment=Additional isoforms seem to exist. CC confirmation; CC Name=1; Synonyms=AIRE-1; CC IsoId=O43918-1; Sequence=Displayed; CC Name=2; Synonyms=AIRE-2; CC IsoId=O43918-2; Sequence=VSP_004089; CC Name=3; Synonyms=AIRE-3; CC IsoId=O43918-3; Sequence=VSP_004089, VSP_004090; CC -!- BIOPHYSICOCHEMICAL PROPERTIES: CC Kinetic parameters: CC KM=98 uM for ATP; CC KM=688 uM for pyridoxal; CC Vmax=1.604 mmol/min/mg enzyme; CC pH dependence: CC Optimum pH is 6.0. Active pH 4.5 to 10.5; CC -!- DATABASE: NAME=CD40Lbase; CC NOTE=European CD40L defect database (mutation db); CC WWW="http://www.expasy.org/cd40lbase/". CC -!- DISEASE: Defects in PHKA1 are linked to X-linked muscle CC glycogenosis [MIM:311870]. CC -!- DISEASE: Defects in ABCD1 are the cause of recessive X-linked CC adrenoleukodystrophy (X-ALD) [MIM:300100]. X-ALD is a rare CC phenotype. CC -!- INTERACTION: CC Self; NbExp=1; IntAct=EBI-123485, EBI-123485; CC Q9W158:CG4612; NbExp=1; IntAct=EBI-123485, EBI-89895; CC Q9VYI0:fne; NbExp=1; IntAct=EBI-123485, EBI-126770; CC -!- MASS SPECTROMETRY: MW=24948; MW_ERR=6; METHOD=MALDI; RANGE=1-228; CC NOTE=Ref.2. CC -------------------------------------------------------------------------- CC This SWISS-PROT entry is copyright. It is produced through a collaboration CC removed. 
CC --------------------------------------------------------------------------""".splitlines()

fake_records_valid = """ID CYC_BOVIN STANDARD; PRT; 104 AA.
AC ac1; ac2;
DT dt.
DE de.
OS os.
OC oc.
OX NCBI_TaxID=9606;
RN [1]
SQ SEQUENCE 486 AA; 55639 MW; D7862E867AD74383 CRC64
aac cpp ggh hh
//
ID idid std; prt; 104 #-5
OX NCBI_TaxID=9606;
DE dede.
ggaaccpp
//""".splitlines()

# Run tests if called from the command line
if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_fasta.py

#!/usr/bin/env python
"""Unit tests for FASTA and related parsers.
"""
from cogent.parse.fasta import FastaParser, MinimalFastaParser, \
    NcbiFastaLabelParser, NcbiFastaParser, RichLabel, LabelParser, \
    GroupFastaParser
from cogent.core.sequence import DnaSequence, Sequence, ProteinSequence as Protein
from cogent.core.info import Info
from cogent.parse.record import RecordError
from cogent.util.unit_test import TestCase, main

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

def Dna(seq, *args, **kwargs):
    seq = seq.replace('u', 't')
    seq = seq.replace('U', 'T')
    d = DnaSequence(seq, *args, **kwargs)
    return d

class GenericFastaTest(TestCase):
    """Setup data for all the various FASTA parsers."""

    def setUp(self):
        """standard files"""
        self.labels = '>abc\n>def\n>ghi\n'.split('\n')
        self.oneseq = '>abc\nUCAG\n'.split('\n')
        self.multiline = '>xyz\nUUUU\nCC\nAAAAA\nG'.split('\n')
        self.threeseq = '>123\na\n> \t abc \t \ncag\ngac\n>456\nc\ng'.split('\n')
        self.twogood = '>123\n\n> \t abc \t \ncag\ngac\n>456\nc\ng'.split('\n')
        self.oneX = '>123\nX\n> \t abc \t \ncag\ngac\n>456\nc\ng'.split('\n')
        self.nolabels = 'GJ>DSJGSJDF\nSFHKLDFS>jkfs\n'.split('\n')
        self.empty = []

class MinimalFastaParserTests(GenericFastaTest):
"""Tests of MinimalFastaParser: returns (label, seq) tuples.""" def test_empty(self): """MinimalFastaParser should return empty list from 'file' w/o labels""" self.assertEqual(list(MinimalFastaParser(self.empty)), []) self.assertEqual(list(MinimalFastaParser(self.nolabels, strict=False)), []) self.assertRaises(RecordError, list, MinimalFastaParser(self.nolabels)) def test_no_labels(self): """MinimalFastaParser should return empty list from file w/o seqs""" #should fail if strict (the default) self.assertRaises(RecordError, list, MinimalFastaParser(self.labels,strict=True)) #if not strict, should skip the records self.assertEqual(list(MinimalFastaParser(self.labels, strict=False)), []) def test_single(self): """MinimalFastaParser should read single record as (label, seq) tuple""" f = list(MinimalFastaParser(self.oneseq)) self.assertEqual(len(f), 1) a = f[0] self.assertEqual(a, ('abc', 'UCAG')) f = list(MinimalFastaParser(self.multiline)) self.assertEqual(len(f), 1) a = f[0] self.assertEqual(a, ('xyz', 'UUUUCCAAAAAG')) def test_multiple(self): """MinimalFastaParser should read multiline records correctly""" f = list(MinimalFastaParser(self.threeseq)) self.assertEqual(len(f), 3) a, b, c = f self.assertEqual(a, ('123', 'a')) self.assertEqual(b, ('abc', 'caggac')) self.assertEqual(c, ('456', 'cg')) def test_multiple_bad(self): """MinimalFastaParser should complain or skip bad records""" self.assertRaises(RecordError, list, MinimalFastaParser(self.twogood)) f = list(MinimalFastaParser(self.twogood, strict=False)) self.assertEqual(len(f), 2) a, b = f self.assertEqual(a, ('abc', 'caggac')) self.assertEqual(b, ('456', 'cg')) class FastaParserTests(GenericFastaTest): """Tests of FastaParser: returns sequence objects.""" def test_empty(self): """FastaParser should return empty list from 'file' w/o labels""" self.assertEqual(list(FastaParser(self.empty)), []) self.assertEqual(list(FastaParser(self.nolabels, strict=False)), []) self.assertRaises(RecordError, list, 
FastaParser(self.nolabels)) def test_no_labels(self): """FastaParser should return empty list from file w/o seqs""" #should fail if strict (the default) self.assertRaises(RecordError, list, FastaParser(self.labels,strict=True)) #if not strict, should skip the records self.assertEqual(list(FastaParser(self.labels, strict=False)), []) def test_single(self): """FastaParser should read single record as seq object""" f = list(FastaParser(self.oneseq)) self.assertEqual(len(f), 1) a = f[0] self.assertEqual(a, ('abc', 'UCAG')) self.assertEqual(a[1].Name, 'abc') f = list(FastaParser(self.multiline)) self.assertEqual(len(f), 1) a = f[0] self.assertEqual(a, ('xyz', 'UUUUCCAAAAAG')) self.assertEqual(a[1].Name, 'xyz') def test_single_constructor(self): """FastaParser should use constructors if supplied""" f = list(FastaParser(self.oneseq, Dna)) self.assertEqual(len(f), 1) a = f[0] self.assertEqual(a, ('abc', 'TCAG')) self.assertEqual(a[1].Name, 'abc') def upper_abc(x): return None, {'ABC': x.upper()} f = list(FastaParser(self.multiline, Dna, upper_abc)) self.assertEqual(len(f), 1) a = f[0] self.assertEqual(a, (None, 'TTTTCCAAAAAG')) self.assertEqual(a[1].Name, None) self.assertEqual(a[1].Info.ABC, 'XYZ') def test_multiple(self): """FastaParser should read multiline records correctly""" f = list(FastaParser(self.threeseq)) self.assertEqual(len(f), 3) for i in f: assert isinstance(i[1], Sequence) a, b, c = f self.assertEqual((a[1].Name, a[1]), ('123', 'a')) self.assertEqual((b[1].Name, b[1]), ('abc', 'caggac')) self.assertEqual((c[1].Name, c[1]), ('456', 'cg')) def test_multiple_bad(self): """Parser should complain or skip bad records""" self.assertRaises(RecordError, list, FastaParser(self.twogood)) f = list(FastaParser(self.twogood, strict=False)) self.assertEqual(len(f), 2) a, b = f a, b = a[1], b[1] #field 0 is name self.assertEqual((a.Name, a), ('abc', 'caggac')) self.assertEqual((b.Name, b), ('456', 'cg')) def test_multiple_constructor_bad(self): """Parser should complain 
or skip bad records w/ constructor""" def dnastrict(x, **kwargs): try: return Dna(x, check=True, **kwargs) except Exception, e: raise RecordError, "Could not convert sequence" self.assertRaises(RecordError, list, FastaParser(self.oneX, dnastrict)) f = list(FastaParser(self.oneX, dnastrict, strict=False)) self.assertEqual(len(f), 2) a, b = f a, b = a[1], b[1] self.assertEqual((a.Name, a), ('abc', 'caggac'.upper())) self.assertEqual((b.Name, b), ('456', 'cg'.upper())) class NcbiFastaLabelParserTests(TestCase): """Tests of the label line parser for NCBI's FASTA identifiers.""" def test_init(self): """Labels from genpept.fsa should work as expected""" i = NcbiFastaLabelParser( '>gi|37549575|ref|XP_352503.1| similar to EST gb|ATTS1136')[1] self.assertEqual(i.GI, ['37549575']) self.assertEqual(i.RefSeq, ['XP_352503.1']) self.assertEqual(i.Description, 'similar to EST gb|ATTS1136') i = NcbiFastaLabelParser( '>gi|32398734|emb|CAD98694.1| (BX538350) dbj|baa86974.1, possible')[1] self.assertEqual(i.GI, ['32398734']) self.assertEqual(i.RefSeq, []) self.assertEqual(i.EMBL, ['CAD98694.1']) self.assertEqual(i.Description, '(BX538350) dbj|baa86974.1, possible') i = NcbiFastaLabelParser( '>gi|10177064|dbj|BAB10506.1| (AB005238) ')[1] self.assertEqual(i.GI, ['10177064']) self.assertEqual(i.DDBJ, ['BAB10506.1']) self.assertEqual(i.Description, '(AB005238)') class NcbiFastaParserTests(TestCase): """Tests of the NcbiFastaParser.""" def setUp(self): """Define a few standard files""" self.peptide = [ '>gi|10047090|ref|NP_055147.1| small muscle protein, X-linked [Homo sapiens]', 'MNMSKQPVSNVRAIQANINIPMGAFRPGAGQPPRRKECTPEVEEGVPPTSDEEKKPIPGAKKLPGPAVNL', 'SEIQNIKSELKYVPKAEQ', '>gi|10047092|ref|NP_037391.1| neuronal protein [Homo sapiens]', 'MANRGPSYGLSREVQEKIEQKYDADLENKLVDWIILQCAEDIEHPPPGRAHFQKWLMDGTVLCKLINSLY', 'PPGQEPIPKISESKMAFKQMEQISQFLKAAETYGVRTTDIFQTVDLWEGKDMAAVQRTLMALGSVAVTKD' ] self.nasty = [ ' ', #0 ignore leading blank line '>gi|abc|ref|def|', #1 no description -- ok 'UCAG', #2 
single line of sequence
            '#comment',  #3 comment -- skip
            ' \t ',  #4 ignore blank line between records
            '>gi|xyz|gb|qwe| \tdescr \t\t',  #5 description has whitespace
            'UUUU',  #6 two lines of sequence
            'CCCC',  #7
            '>gi|bad|ref|nonsense',  #8 missing last pipe -- error
            'ACU',  #9
            '>gi|bad|description',  #10 not enough fields -- error
            'AAA',  #11
            '>gi|bad|ref|stuff|label',  #12
            'XYZ',  #13 bad sequence -- error
            '>gi|bad|gb|ignore| description',  #14 label without sequence -- error
            '> gi | 123 | dbj | 456 | desc|with|pipes| ',  #15 label w/ whitespace -- OK
            'ucag',  #16
            ' \t ',  #17 ignore blank line inside record
            'UCAG',  #18
            'tgac',  #19 lowercase should be OK
            '# comment',  #20 comment -- skip
            'NNNN',  #21 degenerates should be OK
            ' ',  #22 ignore trailing blank line
            ]
        self.empty = []
        self.no_label = ['ucag']

    def test_empty(self):
        """NcbiFastaParser should accept empty input"""
        self.assertEqual(list(NcbiFastaParser(self.empty)), [])
        self.assertEqual(list(NcbiFastaParser(self.empty, Protein)), [])

    def test_normal(self):
        """NcbiFastaParser should accept normal record if loose or strict"""
        f = list(NcbiFastaParser(self.peptide, Protein))
        self.assertEqual(len(f), 2)
        a, b = f
        a, b = a[1], b[1]  #field 0 is the name
        self.assertEqual(a, 'MNMSKQPVSNVRAIQANINIPMGAFRPGAGQPPRRKECTPEVEEGVPPTSDEEKKPIPGAKKLPGPAVNLSEIQNIKSELKYVPKAEQ')
        self.assertEqual(a.Info.GI, ['10047090'])
        self.assertEqual(a.Info.RefSeq, ['NP_055147.1'])
        self.assertEqual(a.Info.DDBJ, [])
        self.assertEqual(a.Info.Description,
            'small muscle protein, X-linked [Homo sapiens]')
        self.assertEqual(b, 'MANRGPSYGLSREVQEKIEQKYDADLENKLVDWIILQCAEDIEHPPPGRAHFQKWLMDGTVLCKLINSLYPPGQEPIPKISESKMAFKQMEQISQFLKAAETYGVRTTDIFQTVDLWEGKDMAAVQRTLMALGSVAVTKD')
        self.assertEqual(b.Info.GI, ['10047092'])
        self.assertEqual(b.Info.RefSeq, ['NP_037391.1'])
        self.assertEqual(b.Info.Description, 'neuronal protein [Homo sapiens]')

    def test_bad(self):
        """NcbiFastaParser should raise error on bad records if strict"""
        #if strict, starting anywhere in the first 15 lines should
cause errors for i in range(15): self.assertRaises(RecordError,list,NcbiFastaParser(self.nasty[i:])) #...but the 16th is OK. r = list(NcbiFastaParser(self.nasty[15:]))[0] self.assertEqual(r, ('123', 'ucagUCAGtgacNNNN')) #test that we get what we expect if not strict r = list(NcbiFastaParser(self.nasty, Sequence, strict=False)) self.assertEqual(len(r), 4) a, b, c, d = r self.assertEqual((a[1], a[1].Info.GI, a[1].Info.RefSeq, \ a[1].Info.Description), ('UCAG', ['abc'], ['def'], '')) self.assertEqual((b[1], b[1].Info.GI, b[1].Info.GenBank, \ b[1].Info.Description), ('UUUUCCCC', ['xyz'], ['qwe'], 'descr')) self.assertEqual((c[1], c[1].Info.GI, c[1].Info.RefSeq, \ c[1].Info.Description), ('XYZ', ['bad'], ['stuff'], 'label')) self.assertEqual((d[1], d[1].Info.GI, d[1].Info.DDBJ, \ d[1].Info.Description), ('ucagUCAGtgacNNNN'.upper(), ['123'], ['456'], 'desc|with|pipes|')) #...and when we explicitly supply a constructor r = list(NcbiFastaParser(self.nasty, Dna, strict=False)) self.assertEqual(len(r), 3) a, b, c = r a, b, c = a[1], b[1], c[1] self.assertEqual((a, a.Info.GI, a.Info.RefSeq, a.Info.Description), ('TCAG', ['abc'], ['def'], '')) self.assertEqual((b, b.Info.GI, b.Info.GenBank, b.Info.Description), ('TTTTCCCC', ['xyz'], ['qwe'], 'descr')) self.assertEqual((c, c.Info.GI, c.Info.DDBJ, c.Info.Description), ('tcagTCAGtgacNNNN'.upper(), ['123'], ['456'], 'desc|with|pipes|')) class LabelParsingTest(TestCase): """Test generic fasta label parsing""" def test_rich_label(self): """rich label correctly constructs label strings""" # labels should be equal based on the result of applying their # attributes to their string template k = RichLabel(Info(species="rat"), "%(species)s") l = RichLabel(Info(species="rat", seq_id="xy5"), "%(species)s") self.assertEqual(k, l) # labels should construct from Info components correctly k = RichLabel(Info(species="rat", seq_id="xy5"), "%(seq_id)s:%(species)s") self.assertEqual(k, "xy5:rat") k = RichLabel(Info(species="rat", seq_id="xy5"), 
"%(species)s:%(seq_id)s")
        self.assertEqual(k, "rat:xy5")

        # extra components should be ignored
        k = RichLabel(Info(species="rat", seq_id="xy5"), "%(species)s")
        self.assertEqual(k, "rat")

        # the label should have Info object
        self.assertEqual(k.Info.species, "rat")
        self.assertEqual(k.Info.seq_id, "xy5")

        # label should be constructable just like a normal string
        self.assertEqual(RichLabel('a'), 'a')

    def test_label_parser(self):
        """label parser factory function copes with mixed structure labels"""
        # the label parser factory function should correctly handle label lines
        # with mixed separators
        make = LabelParser("%(species)s:%(accession)s",
                           [[0, "accession", str], [2, "species", str]],
                           split_with=": ")
        for label, expect in [(">abcd:human:misc", "misc:abcd"),
                              ("abcd:human:misc", "misc:abcd"),
                              (">abcd:Human misc", "misc:abcd"),
                              (">abcd Human:misc", "misc:abcd"),
                              (">abcd:Human misc", "misc:abcd")]:
            self.assertEqual(make(label), expect)

        # should raise an assertion error if template doesn't match at least
        # one field name
        self.assertRaises(AssertionError, LabelParser, "%s:%s",
                          [[0, "accession", str], [2, "species", str]],
                          split_with=": ")

class GroupFastaParsingTest(TestCase):
    """test parsing of grouped sequences in a collection"""

    def test_groups(self):
        """correctly yield grouped sequences from fasta formatted data"""
        data = [">group1:seq1_id:species1", "ACTG",
                ">group1:seq2_id:species2", "ACTG",
                ">group2:seq3_id:species1", "ACGT",
                ">group2:seq4_id:species2", "ACGT"]
        expected = [{"species1": "ACTG", "species2": "ACTG"},
                    {"species1": "ACGT", "species2": "ACGT"}]
        label_to_name = LabelParser("%(species)s",
                                    [(0, "Group", str), (1, "seq_id", str),
                                     (2, "species", str)], split_with=":")
        parser = GroupFastaParser(data, label_to_name, aligned=True)
        count = 0
        for group in parser:
            got = group.todict()
            want = expected[count]
            self.assertEqual(got, want)
            self.assertEqual(group.Info.Group, "group%s" % (count + 1))
            count += 1

        # check we don't return a done group
        done_groups = ["group1"]
        parser = \
GroupFastaParser(data, label_to_name, done_groups=done_groups, aligned=True)
        for group in parser:
            got = group.todict()
            want = expected[1]
            self.assertEqual(got, want)
            self.assertEqual(group.Info.Group, "group2")

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_fastq.py

#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.parse.fastq import MinimalFastqParser

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

data = {
    "GAPC_0015:6:1:1259:10413#0/1":
        dict(seq='AACACCAAACTTCTCCACCACGTGAGCTACAAAAG',
             qual=r'````Y^T]`]c^cabcacc`^Lb^ccYT\T\Y\WF'),
    "GAPC_0015:6:1:1283:11957#0/1":
        dict(seq='TATGTATATATAACATATACATATATACATACATA',
             qual=r']KZ[PY]_[YY^```ac^\\`bT``c`\aT``bbb'),
    "GAPC_0015:6:1:1284:10484#0/1":
        dict(seq='TCAGTTTTCCTCGCCATATTTCACGTCCTAAAGCG',
             qual=r'UM_]]U_]Z_Y^\^^``Y]`^SZ]\Ybb`^_LbL_'),
    "GAPC_0015:6:1:1287:17135#0/1":
        dict(seq='TGTGCCTATGGAAGCAGTTCTAGGATCCCCTAGAA',
             qual=r'^aacccL\ccc\c\cTKTS]KZ\]]I\[Wa^T`^K'),
    "GAPC_0015:6:1:1293:3171#0/1":
        dict(seq="AAAGAAAGGAAGAAAAGAAAAAGAAACCCGAGTTA",
             qual=r"b`bbbU_[YYcadcda_LbaaabWbaacYcc`a^c"),
    "GAPC_0015:6:1:1297:10729#0/1":
        dict(seq="TAATGCCAAAGAAATATTTCCAAACTACATGCTTA",
             qual=r"T\ccLbb``bacc]_cacccccLccc\ccTccYL^"),
    "GAPC_0015:6:1:1299:5940#0/1":
        dict(seq="AATCAAGAAATGAAGATTTATGTATGTGAAGAATA",
             qual=r"dcddbcfffdfffd`dd`^`c`Oc`Ybb`^eecde"),
    "GAPC_0015:6:1:1308:6996#0/1":
        dict(seq="TGGGACACATGTCCATGCTGTGGTTTTAACCGGCA",
             qual=r"a]`aLY`Y^^ccYa`^^TccK_X]\c\c`caTTTc"),
    "GAPC_0015:6:1:1314:13295#0/1":
        dict(seq="AATATTGCTTTGTCTGAACGATAGTGCTCTTTGAT",
             qual=r"cLcc\\dddddaaYd`T```bLYT\`a```bZccc"),
    "GAPC_0015:6:1:1317:3403#0/1":
        dict(seq="TTGTTTCCACTTGGTTGATTTCACCCCTGAGTTTG",
qual=r"\\\ZTYTSaLbb``\_UZ_bbcc`cc^[ac\a\T\ ".strip())  # had to add space
}

class ParseFastq(TestCase):
    def test_parse(self):
        """sequence and info objects should correctly match"""
        for label, seq, qual in MinimalFastqParser('data/fastq.txt'):
            self.assertTrue(label in data)
            self.assertEqual(seq, data[label]["seq"])
            self.assertEqual(qual, data[label]["qual"])

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_parse/test_flowgram.py

#!/usr/bin/env python
"""tests for Flowgram and Flowgramcollection objects
"""

__author__ = "Jens Reeder, Julia Goodrich"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jens Reeder", "Julia Goodrich"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jens Reeder"
__email__ = "jreeder@colorado.edu"
__status__ = "Development"

from cogent.util.unit_test import TestCase, main
from types import GeneratorType

from numpy import array, transpose

from cogent.core.sequence import Sequence
from cogent.parse.flowgram import Flowgram, build_averaged_flowgram
from cogent.parse.flowgram_parser import parse_sff

class FlowgramTests(TestCase):
    def test_init_empty(self):
        """Flowgram should init correctly."""
        f = Flowgram()
        self.assertEqual(f._flowgram, '')
        self.assertEqual(f.flowgram, [])

    def test_init_data(self):
        """Flowgram init with data should set data in correct location"""
        f = Flowgram('0.5 1.0 4.0 0.0', Name='a', KeySeq="ATCG",
                     floworder="TACG", header_info={'Bases': 'TACCCC'})
        self.assertEqual(f._flowgram, '0.5 1.0 4.0 0.0')
        self.assertEqual(f.flowgram, [0.5, 1.0, 4.0, 0.0])
        self.assertEqual(f.Name, 'a')
        self.assertEqual(f.keySeq, "ATCG")
        self.assertEqual(f.floworder, "TACG")
        self.assertEqual(f.Bases, 'TACCCC')
        self.assertEqual(f.header_info, {'Bases': 'TACCCC'})

        f = Flowgram([0.5, 1.0, 4.0, 0.0], Name='a', KeySeq="ATCG",
                     floworder="TACG", header_info={'Bases': 'TACCCC'})
        self.assertEqual(f._flowgram, '0.5 1.0
4.0 0.0') self.assertEqual(f.flowgram, [0.5, 1.0, 4.0, 0.0]) def test_cmpSeqToString(self): """Sequence should compare equal to same string.""" f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCC'}) self.assertTrue(f.cmpSeqToString('TACCCC')) self.assertFalse(f.cmpSeqToString('TACCC')) f = Flowgram('0.5 1.0 4.0 0.0',floworder = "TACG") self.assertTrue(f.cmpSeqToString('TACCCC')) self.assertFalse(f.cmpSeqToString('TACCC')) def test_cmp_flow_to_string(self): """Sequence should compare equal to same string.""" f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCC'}) self.assertEqual(f, '0.5 1.0 4.0 0.0') self.assertNotEqual(f,'0.5 1.0 4.0') f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCC'}) self.assertEqual(f,f2) def test_cmpBySeqs(self): """Flowgrams should be the same if name, bases, or to_seqs are equal""" f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCC'}) f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCC'}) self.assertEqual(f.cmpBySeqs(f2), 0) f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'b',floworder = "TACG", header_info = {'Bases':'TACCCC'}) self.assertEqual(f.cmpBySeqs(f2), 0) f2 = Flowgram('0.5 1.0 4.0 0.0',floworder = "TACG") self.assertEqual(f.cmpBySeqs(f2), 0) def test_cmpByName(self): """Flowgrams should be the same if name, bases, or to_seqs are equal""" f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCC'}) f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCC'}) self.assertEqual(f.cmpByName(f2), 0) self.assertEqual(f.cmpByName(f), 0) f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'b',floworder = "TACG", header_info = {'Bases':'TACCCC'}) self.assertNotEqual(f.cmpByName(f2), 0) def test_toFasta(self): """Flowgram toFasta() should return Fasta-format string""" even = 
'0.5 1.0 4.0 0.0' odd = '0.5 1.0 4.0 1.0' even_f = Flowgram(even, Name='even', floworder = "TACG") odd_f = Flowgram(odd, Name='odd', floworder = "TACG") self.assertEqual(even_f.toFasta(), '>even\nTACCCC') #set line wrap to small number so we can test that it works self.assertEqual(even_f.toFasta(LineWrap = 2), '>even\nTA\nCC\nCC') self.assertEqual(odd_f.toFasta(LineWrap = 2), '>odd\nTA\nCC\nCC\nG') even_f = Flowgram(even, Name='even', floworder = "TACG", header_info ={'Bases':'TACCCG'}) odd_f = Flowgram(odd, Name='odd', floworder = "TACG", header_info ={'Bases':'TACCCGG'}) self.assertEqual(even_f.toFasta(), '>even\nTACCCG') #set line wrap to small number so we can test that it works self.assertEqual(even_f.toFasta(LineWrap = 2), '>even\nTA\nCC\nCG') self.assertEqual(odd_f.toFasta(LineWrap = 2), '>odd\nTA\nCC\nCG\nG') def test_contains(self): """Flowgram contains should return correct result""" f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCC'}) assert '0.5' in f assert '0.5 1.0' in f assert '2.0' not in f assert '5.0' not in f def test_cmp(self): """_cmp_ should compare the flowgram strings.""" f1 = Flowgram(['1 2 3 4']) f2 = Flowgram(['2 2 3 4']) self.assertNotEqual(f1,f2) self.assertEqual(f1,f1) #works also with string self.assertNotEqual(f1,"1 2 3 5") self.assertEqual(f1,"1 2 3 4") self.assertNotEqual(f1,"") def test_iter(self): """Flowgram iter should iterate over sequence""" f = Flowgram('0.5 1.0 4.0 0.0') self.assertEqual(list(f), [0.5,1.0,4.0,0.0]) def test_str(self): """__str__ returns self._flowgram unmodified.""" f = Flowgram('0.5 1.0 4.0 0.0') self.assertEqual(str(f), '0.5\t1.0\t4.0\t0.0') f = Flowgram([0.5, 1.0, 4.0, 0.0]) self.assertEqual(str(f), '0.5\t1.0\t4.0\t0.0') def test_len(self): """returns the length of the flowgram""" f = Flowgram('0.5 1.0 4.0 0.0') self.assertEqual(len(f), 4) f = Flowgram() self.assertEqual(len(f), 0) def test_hash(self): """__hash__ behaves like the flowgram string for dict 
lookup.""" f = Flowgram('0.5 1.0 4.0 0.0', floworder = "TACG") self.assertEqual(hash(f), hash('0.5 1.0 4.0 0.0')) f = Flowgram() self.assertEqual(hash(f), hash('')) def test_toSeq(self): """toSeq should Translate flowgram to sequence""" f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCG'}) self.assertEqual(f.toSeq(), 'TACCCG') self.assertEqual(isinstance(f.toSeq(),Sequence), True) self.assertEqual(f.toSeq(Bases = False), 'TACCCC') f = Flowgram('0.5 1.0 4.0 0.0 0.0 1.23 0.0 6.1', Name = 'a',floworder = "TACG", header_info = {'Bases':'TACCCG'}) self.assertEqual(f.toSeq(), 'TACCCG') self.assertEqual(f.toSeq(Bases = False), 'TACCCCAGGGGGG') f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG", header_info = {}) self.assertEqual(f.toSeq(), 'TACCCC') self.assertEqual(isinstance(f.toSeq(),Sequence), True) self.assertEqual(f.toSeq(Bases = False), 'TACCCC') f = Flowgram('0.5 1.0 4.0 0.0 0.0 1.23 0.0 6.1', Name = 'a',floworder = "TACG", header_info = {}) self.assertEqual(f.toSeq(Bases = True), 'TACCCCAGGGGGG') f = Flowgram('0.4 0.0 0.0 0.0 0.0 1.23 0.0 1.1', Name = 'a',floworder = "TACG", header_info = {}) self.assertEqual(f.toSeq(), 'NAG') def test_getQualityTrimmedFlowgram(self): """getQualityTrimmedFlowgram trims the flowgram correctly""" f = Flowgram('0.5 1.0 4.1 0.0 0.0 1.23 0.0 3.1', Name = 'a', floworder = "TACG", header_info = {'Bases':'TACCCCAGGG', 'Clip Qual Right': 7, 'Flow Indexes': "1\t2\t3\t3\t3\t3\t6\t8\t8\t8"}) trimmed = f.getQualityTrimmedFlowgram() self.assertEqual(trimmed.toSeq(), "TACCCCA") self.assertEqual(str(trimmed), "0.5\t1.0\t4.1\t0.0\t0.0\t1.23") # tests on real data flow1 = self.flows[0] flow2 = self.flows[1] flow1_trimmed = flow1.getQualityTrimmedFlowgram() self.assertEqual(str(flow1_trimmed), "1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 
1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97") self.assertEqual(flow1_trimmed.Bases, "tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCA") flow2_trimmed = flow2.getQualityTrimmedFlowgram() self.assertEqual(str(flow2_trimmed), "1.04 0.00 1.01 0.00 0.00 1.00 0.00 1.00 0.00 1.05 0.00 0.91 0.10 1.07 0.95 1.01 0.00 0.06 0.93 0.02 0.03 1.06 1.18 0.09 1.00 0.05 0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 
0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99") self.assertEqual(flow2_trimmed.Bases, "tcagAGACGCACTCAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAA") def test_getPrimerTrimmedFlowgram(self): """getPrimerTrimmedFlowgram cuts the barcode of the flowgram correctly""" f = Flowgram('0.5 1.0 4.1 0.0 0.0 1.23 0.0 3.1', Name = 'a', floworder = "TACG", header_info = {'Bases':'TACCCCAGGG', 'Clip Qual Right': 7, 'Flow Indexes': "1\t2\t3\t3\t3\t3\t6\t8\t8\t8"}) trimmed = f.getPrimerTrimmedFlowgram(primerseq="TA") #test primer trimming self.assertEqual(trimmed.toSeq(), "CCCCAGGG") self.assertEqual(str(trimmed), "0.00\t0.00\t4.10\t0.00\t0.00\t1.23\t0.00\t3.10") for (a,b) in zip(trimmed.flowgram, [0.0,0.0,4.1,0.0,0.0,1.23,0.0,3.1]): self.assertFloatEqual(a,b) trimmed = f.getPrimerTrimmedFlowgram(primerseq="TACC") for (a,b) in zip(trimmed.flowgram, [0.0,0.0,2.1,0.0,0.0,1.23,0.0,3.1]): self.assertFloatEqual(a,b) self.assertEqual(trimmed.toSeq(), "CCAGGG") self.assertEqual(str(trimmed), "0.00\t0.00\t2.10\t0.00\t0.00\t1.23\t0.00\t3.10") # test that primer trimming does not leave ambig flow at begin trimmed = f.getPrimerTrimmedFlowgram(primerseq="TACCCC") for (a,b) in zip(trimmed.flowgram, [0.0,1.23,0.0,3.1]): self.assertFloatEqual(a,b) self.assertEqual(trimmed.toSeq(), "AGGG") self.assertEqual(str(trimmed), "0.00\t1.23\t0.00\t3.10") # tests on real data flow1 = self.flows[0] flow2 = self.flows[1] flow3 = self.flows[2] flow1_trimmed = flow1.getPrimerTrimmedFlowgram(primerseq="TCAG"+"GCTAACTGTAA") self.assertEqual(str(flow1_trimmed), "0.00\t0.00\t2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 
0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04") self.assertEqual(flow1_trimmed.Bases, "CCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc") flow1_trimmed = flow1.getPrimerTrimmedFlowgram(primerseq="TCAG"+"GCTAACTGTAAC") self.assertEqual(str(flow1_trimmed), "0.00\t0.00\t1.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 
0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04") self.assertEqual(flow1_trimmed.Bases, "CCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc") #test that trimming does not leave 4 zero flows 
(homopolymer) flow1_trimmed = flow1.getPrimerTrimmedFlowgram(primerseq="TCAG"+"GCTAACTGTAACCC") self.assertEqual(str(flow1_trimmed), "0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 
0.05 0.05 0.04") self.assertEqual(flow1_trimmed.Bases, "TCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc") #test that trimming does not leave 4 zero flows (signal <1.5) flow1_trimmed = flow1.getPrimerTrimmedFlowgram(primerseq="TCAG"+"GCTAACTGTAACCCTC") self.assertEqual(str(flow1_trimmed), "1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 
0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04") self.assertEqual(flow1_trimmed.Bases, "TTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc") flow1_untrimmed= flow1.getPrimerTrimmedFlowgram("") self.assertEqual(str(flow1_untrimmed), "1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 
0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04") self.assertEqual(flow1_untrimmed.Bases, "tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc") flow2_trimmed = flow2.getPrimerTrimmedFlowgram(primerseq="TCAG"+"AGACGCACT") self.assertEqual(str(flow2_trimmed), "0.00\t0.05\t0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 
0.09 1.03 0.14 0.03 1.01 1.99 1.05 0.14 1.03 0.13 0.03 1.10 0.10 0.96 0.11 0.99 0.12 0.05 0.94 2.83 0.14 0.12 0.96 0.00 1.00 0.11 0.14 1.98 0.08 0.11 1.04 0.01 0.11 2.03 0.15 2.05 0.10 0.03 0.93 0.01 0.08 0.12 0.00 0.16 0.05 0.07 0.08 0.11 0.07 0.05 0.04 0.10 0.05 0.05 0.03 0.07 0.03 0.04 0.04 0.06 0.03 0.05 0.04 0.09 0.03 0.08 0.03 0.07 0.02 0.05 0.02 0.06 0.01 0.05 0.04 0.06 0.02 0.04 0.04 0.04 0.03 0.03 0.06 0.06 0.03 0.02 0.02 0.08 0.03 0.01 0.01 0.06 0.03 0.01 0.03 0.04 0.02 0.00 0.02 0.05 0.00 0.02 0.02 0.03 0.00 0.02 0.02 0.04 0.01 0.00 0.01 0.05") self.assertEqual(flow2_trimmed.Bases, "CAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAActgagcgggctggcaaggc") #trimming at the end of the flow cycle works flow3_trimmed = flow3.getPrimerTrimmedFlowgram(primerseq="TCAG"+"ATTAGATACCCNGGTAGG") self.assertEqual(str(flow3_trimmed), "0.05 0.05 2.04 0.10 0.03 1.06 1.05 1.01 0.07 0.09 2.07 1.01 0.93 2.88 1.06 1.95 1.00 0.05 0.05 2.97 0.09 0.00 0.93 1.01 0.06 0.05 0.99 0.09 0.98 1.01 0.03 1.02 1.92 0.07 0.01 1.03 1.01 0.01 0.05 0.96 0.09 0.05 0.98 1.07 0.02 2.02 2.05 0.09 1.87 0.12 2.15 0.05 0.13 0.92 1.05 1.96 3.01 0.13 0.04 1.05 0.96 0.05 0.05 0.95 0.12 0.01 1.00 2.02 0.03 0.03 0.99 1.01 0.05 0.06 0.98 0.13 0.06 0.97 0.11 1.01 0.08 0.12 1.02 0.12 1.02 2.19 1.03 1.01 0.08 0.11 0.96 0.09 0.08 1.01 0.08 0.06 2.10 2.11 0.12 1.04 0.13 0.09 0.94 1.03 0.08 0.05 3.06 0.12 1.00 0.03 0.09 0.95 0.10 0.03 2.09 0.21 0.99 0.06 0.11 4.06 0.10 1.04 0.04 1.05 1.05 1.04 1.02 0.97 0.13 0.93 0.10 0.12 1.08 0.12 0.99 1.06 0.10 0.11 0.98 0.10 0.02 2.01 0.10 1.01 0.09 0.96 0.07 0.11 2.03 4.12 1.05 0.08 1.01 0.04 0.98 0.14 0.12 2.96 0.13 1.98 0.12 2.08 0.10 0.12 1.99 0.13 0.07 0.98 0.03 0.93 0.86 4.10 0.13 0.10 3.99 1.13 0.07 0.06 1.07 0.09 0.05 1.03 1.12 0.13 0.05 2.01 0.08 0.80 0.05 0.11 0.98 0.13 0.04 1.01 0.07 1.02 0.07 0.11 1.07 2.19 0.06 0.97 0.11 1.03 0.05 0.11 
1.05 0.14 0.06 1.03 0.13 0.10 0.97 0.16 0.13 1.00 0.13 0.06 1.02 2.15 0.02 0.16 0.95 0.09 2.06 2.12 0.07 0.07 2.08 0.12 0.97 1.00 0.03 0.99 1.02 1.01 0.03 0.15 0.90 0.07 0.01 2.00 1.01 1.00 0.06 0.11 1.08 1.00 0.03 1.99 0.03 1.00 0.02 1.85 1.93 0.14 1.97 0.91 1.83 0.06 0.04 1.97 0.05 2.08 0.04 0.06 1.05 0.05 2.13 0.16 0.09 1.17 0.01 1.01 1.07 0.09 0.14 0.91 0.06 0.08 1.03 1.04 0.08 0.05 1.05 1.03 1.16 0.06 0.05 1.01 0.06 2.15 0.06 1.99 0.13 0.04 1.08 0.97 0.11 0.07 1.05 0.08 0.07 2.13 0.14 0.09 1.10 0.15 0.00 1.02 0.07 1.05 0.05 0.95 0.09 1.00 0.15 0.95 0.08 0.15 1.11 0.07 0.12 1.05 1.06 0.09 1.03 0.07 0.11 1.01 0.05 0.05 1.05 0.98 0.00 0.93 0.08 0.12 1.85 1.11 0.10 0.07 1.00 0.01 0.10 1.87 0.05 2.14 1.10 0.03 1.06 0.10 0.91 0.10 0.06 1.05 1.02 1.02 0.07 0.06 0.98 0.95 1.09 0.06 0.14 0.97 0.04 2.44") self.assertEqual(flow3_trimmed.Bases, "CCACGCCGTAAACGGTGGGCGCTAGTTGTGCGAACCTTCCACGGTTTGTGCGGCGCAGCTAACGCATTAAGCGCCCTGCCTGGGGAGTACGATCGCAAGATTAAAACTCAAAGGAATTGACGGGGCCCCGCACAAGCAGCGGAGCATGCGGCTTAATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATATACAGGAATATGGCAGAGATGTCATAGCCGCAAGGTCTGTATACAGG") flow3_trimmed = flow3.getPrimerTrimmedFlowgram(primerseq="TCAG"+"ATTAGATACCCNGGTAG") self.assertEqual(str(flow3_trimmed), "0.00\t0.00\t0.00 1.10 0.05 0.05 2.04 0.10 0.03 1.06 1.05 1.01 0.07 0.09 2.07 1.01 0.93 2.88 1.06 1.95 1.00 0.05 0.05 2.97 0.09 0.00 0.93 1.01 0.06 0.05 0.99 0.09 0.98 1.01 0.03 1.02 1.92 0.07 0.01 1.03 1.01 0.01 0.05 0.96 0.09 0.05 0.98 1.07 0.02 2.02 2.05 0.09 1.87 0.12 2.15 0.05 0.13 0.92 1.05 1.96 3.01 0.13 0.04 1.05 0.96 0.05 0.05 0.95 0.12 0.01 1.00 2.02 0.03 0.03 0.99 1.01 0.05 0.06 0.98 0.13 0.06 0.97 0.11 1.01 0.08 0.12 1.02 0.12 1.02 2.19 1.03 1.01 0.08 0.11 0.96 0.09 0.08 1.01 0.08 0.06 2.10 2.11 0.12 1.04 0.13 0.09 0.94 1.03 0.08 0.05 3.06 0.12 1.00 0.03 0.09 0.95 0.10 0.03 2.09 0.21 0.99 0.06 0.11 4.06 0.10 1.04 0.04 1.05 1.05 1.04 1.02 0.97 0.13 0.93 0.10 0.12 1.08 0.12 0.99 1.06 0.10 0.11 0.98 0.10 0.02 2.01 0.10 1.01 0.09 0.96 0.07 0.11 2.03 4.12 1.05 
0.08 1.01 0.04 0.98 0.14 0.12 2.96 0.13 1.98 0.12 2.08 0.10 0.12 1.99 0.13 0.07 0.98 0.03 0.93 0.86 4.10 0.13 0.10 3.99 1.13 0.07 0.06 1.07 0.09 0.05 1.03 1.12 0.13 0.05 2.01 0.08 0.80 0.05 0.11 0.98 0.13 0.04 1.01 0.07 1.02 0.07 0.11 1.07 2.19 0.06 0.97 0.11 1.03 0.05 0.11 1.05 0.14 0.06 1.03 0.13 0.10 0.97 0.16 0.13 1.00 0.13 0.06 1.02 2.15 0.02 0.16 0.95 0.09 2.06 2.12 0.07 0.07 2.08 0.12 0.97 1.00 0.03 0.99 1.02 1.01 0.03 0.15 0.90 0.07 0.01 2.00 1.01 1.00 0.06 0.11 1.08 1.00 0.03 1.99 0.03 1.00 0.02 1.85 1.93 0.14 1.97 0.91 1.83 0.06 0.04 1.97 0.05 2.08 0.04 0.06 1.05 0.05 2.13 0.16 0.09 1.17 0.01 1.01 1.07 0.09 0.14 0.91 0.06 0.08 1.03 1.04 0.08 0.05 1.05 1.03 1.16 0.06 0.05 1.01 0.06 2.15 0.06 1.99 0.13 0.04 1.08 0.97 0.11 0.07 1.05 0.08 0.07 2.13 0.14 0.09 1.10 0.15 0.00 1.02 0.07 1.05 0.05 0.95 0.09 1.00 0.15 0.95 0.08 0.15 1.11 0.07 0.12 1.05 1.06 0.09 1.03 0.07 0.11 1.01 0.05 0.05 1.05 0.98 0.00 0.93 0.08 0.12 1.85 1.11 0.10 0.07 1.00 0.01 0.10 1.87 0.05 2.14 1.10 0.03 1.06 0.10 0.91 0.10 0.06 1.05 1.02 1.02 0.07 0.06 0.98 0.95 1.09 0.06 0.14 0.97 0.04 2.44") self.assertEqual(flow3_trimmed.Bases, "GCCACGCCGTAAACGGTGGGCGCTAGTTGTGCGAACCTTCCACGGTTTGTGCGGCGCAGCTAACGCATTAAGCGCCCTGCCTGGGGAGTACGATCGCAAGATTAAAACTCAAAGGAATTGACGGGGCCCCGCACAAGCAGCGGAGCATGCGGCTTAATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATATACAGGAATATGGCAGAGATGTCATAGCCGCAAGGTCTGTATACAGG") def test_createFlowHeader(self): """header_info dict turned into flowgram header""" f = Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a', header_info = {'Bases':'TACCCCTTGG','Name Length':'14'}) self.assertEqual(f.createFlowHeader(), """>a\n Name Length:\t14\nBases:\tTACCCCTTGG\nFlowgram:\t0.5\t1.0\t4.0\t0.0\t1.5\t0.0\t0.0\t2.0\n""") def test_build_averaged_flowgram(self): f1 = [0.3, 1.1, 4.0 , 0.01, 0.8, 0.0, 0.0, 2.0] f2 = [0.6, 0.9, 4.05, 0.1, 1.2, 0.1, 0.4] f3 = [0.4, 1.2, 4.05, 0.2, 1.3, 0.2] f4 = [0.7, 1.0, 4.0 , 0.02, 1.5] flowgrams = [f1,f2,f3,f4] self.assertFloatEqual(build_averaged_flowgram(flowgrams), 
[0.5, 1.05, 4.03, 0.08, 1.2, 0.1, 0.2, 2.0]) self.assertFloatEqual(build_averaged_flowgram([f1,f1,f1,f1,f1,f1]), [0.3, 1.1, 4.0 , 0.01, 0.8, 0.0, 0.0, 2.0]) def setUp(self): """Define some standard data""" self.rec = """Common Header: Magic Number: 0x2E736666 Version: 0001 Index Offset: 96099976 Index Length: 1158685 # of Reads: 57902 Header Length: 440 Key Length: 4 # of Flows: 400 Flowgram Code: 1 Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG Key Sequence: TCAG >FIQU8OX05GCVRO Run Prefix: R_2008_10_15_16_11_02_ Region #: 5 XY Location: 2489_3906 Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Read Header Len: 32 Name Length: 14 # of Bases: 104 Clip Qual Left: 5 Clip Qual Right: 85 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 
0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04 Flow Indexes: 1 3 6 8 8 11 13 14 14 15 17 20 21 22 22 23 23 23 25 27 29 29 32 32 35 38 39 39 39 42 43 45 46 46 46 47 48 51 51 54 54 57 59 61 61 64 67 69 72 72 74 76 77 80 81 81 81 82 83 83 86 88 88 91 94 95 95 95 98 100 103 106 106 109 112 113 116 118 118 121 122 124 125 127 130 131 133 136 138 140 143 144 144 144 147 149 152 152 155 158 158 160 160 163 Bases: tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 
37 40 40 40 40 37 37 37 37 37 39 39 39 39 24 24 24 37 34 28 24 24 24 28 34 39 39 39 39 39 39 39 39 39 39 39 39 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 >FIQU8OX05F8ILF Run Prefix: R_2008_10_15_16_11_02_ Region #: 5 XY Location: 2440_0913 Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Read Header Len: 32 Name Length: 14 # of Bases: 206 Clip Qual Left: 5 Clip Qual Right: 187 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.04 0.00 1.01 0.00 0.00 1.00 0.00 1.00 0.00 1.05 0.00 0.91 0.10 1.07 0.95 1.01 0.00 0.06 0.93 0.02 0.03 1.06 1.18 0.09 1.00 0.05 0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 
0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99 1.05 0.14 1.03 0.13 0.03 1.10 0.10 0.96 0.11 0.99 0.12 0.05 0.94 2.83 0.14 0.12 0.96 0.00 1.00 0.11 0.14 1.98 0.08 0.11 1.04 0.01 0.11 2.03 0.15 2.05 0.10 0.03 0.93 0.01 0.08 0.12 0.00 0.16 0.05 0.07 0.08 0.11 0.07 0.05 0.04 0.10 0.05 0.05 0.03 0.07 0.03 0.04 0.04 0.06 0.03 0.05 0.04 0.09 0.03 0.08 0.03 0.07 0.02 0.05 0.02 0.06 0.01 0.05 0.04 0.06 0.02 0.04 0.04 0.04 0.03 0.03 0.06 0.06 0.03 0.02 0.02 0.08 0.03 0.01 0.01 0.06 0.03 0.01 0.03 0.04 0.02 0.00 0.02 0.05 0.00 0.02 0.02 0.03 0.00 0.02 0.02 0.04 0.01 0.00 0.01 0.05 Flow Indexes: 1 3 6 8 10 12 14 15 16 19 22 23 25 27 30 30 33 33 34 37 37 37 39 39 42 45 46 48 51 53 53 56 56 56 57 58 60 61 64 65 67 70 70 73 74 74 77 80 83 85 88 91 93 94 97 100 102 102 103 106 109 112 112 112 114 116 117 118 119 122 122 122 125 126 129 129 131 133 133 135 138 138 140 142 145 146 147 149 152 154 157 159 161 163 166 169 169 169 171 171 173 173 173 174 176 178 181 182 185 186 189 190 191 191 191 194 196 198 198 200 201 204 206 206 206 209 209 211 211 213 216 216 218 221 223 226 227 230 233 234 236 237 238 240 241 241 243 245 246 249 249 249 249 249 250 253 253 253 256 258 261 264 266 268 270 270 270 271 273 273 273 274 277 278 279 281 282 285 285 285 285 285 287 290 293 294 294 295 297 300 302 304 307 308 308 308 311 313 316 316 319 322 322 324 324 327 Bases: tcagAGACGCACTCAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAActgagcgggctggcaaggc Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 38 38 40 
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 34 34 34 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 36 36 36 36 36 38 25 25 25 38 37 37 37 37 37 37 33 33 34 37 37 37 37 37 37 37 38 34 20 20 26 26 20 34 38 37 37 37 37 37 37 37 37 37 38 38 38 37 37 37 37 37 37 37 37 37 37 >FIQU8OX06G9PCS Run Prefix: R_2008_10_15_16_11_02_ Region #: 6 XY Location: 2863_3338 Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Read Header Len: 32 Name Length: 14 # of Bases: 264 Clip Qual Left: 5 Clip Qual Right: 264 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.04 0.05 1.01 0.07 0.05 0.99 0.03 1.05 0.04 1.05 0.05 0.06 2.05 1.13 0.03 1.00 0.08 1.07 0.09 0.05 1.02 1.11 3.06 0.09 0.04 1.03 0.13 1.97 1.02 1.07 0.06 2.10 0.05 0.05 2.04 0.10 0.03 1.06 1.05 1.01 0.07 0.09 2.07 1.01 0.93 2.88 1.06 1.95 1.00 0.05 0.05 2.97 0.09 0.00 0.93 1.01 0.06 0.05 0.99 0.09 0.98 1.01 0.03 1.02 1.92 0.07 0.01 1.03 1.01 0.01 0.05 0.96 0.09 0.05 0.98 1.07 0.02 2.02 2.05 0.09 1.87 0.12 2.15 0.05 0.13 0.92 1.05 1.96 3.01 0.13 0.04 1.05 0.96 0.05 0.05 0.95 0.12 0.01 1.00 2.02 0.03 0.03 0.99 1.01 0.05 0.06 0.98 0.13 0.06 0.97 0.11 1.01 0.08 0.12 1.02 0.12 1.02 2.19 1.03 1.01 0.08 0.11 0.96 0.09 0.08 1.01 0.08 0.06 2.10 2.11 0.12 1.04 0.13 0.09 0.94 1.03 0.08 0.05 3.06 0.12 1.00 0.03 0.09 0.95 0.10 0.03 2.09 0.21 0.99 0.06 0.11 4.06 0.10 1.04 0.04 1.05 1.05 1.04 1.02 0.97 0.13 
0.93 0.10 0.12 1.08 0.12 0.99 1.06 0.10 0.11 0.98 0.10 0.02 2.01 0.10 1.01 0.09 0.96 0.07 0.11 2.03 4.12 1.05 0.08 1.01 0.04 0.98 0.14 0.12 2.96 0.13 1.98 0.12 2.08 0.10 0.12 1.99 0.13 0.07 0.98 0.03 0.93 0.86 4.10 0.13 0.10 3.99 1.13 0.07 0.06 1.07 0.09 0.05 1.03 1.12 0.13 0.05 2.01 0.08 0.80 0.05 0.11 0.98 0.13 0.04 1.01 0.07 1.02 0.07 0.11 1.07 2.19 0.06 0.97 0.11 1.03 0.05 0.11 1.05 0.14 0.06 1.03 0.13 0.10 0.97 0.16 0.13 1.00 0.13 0.06 1.02 2.15 0.02 0.16 0.95 0.09 2.06 2.12 0.07 0.07 2.08 0.12 0.97 1.00 0.03 0.99 1.02 1.01 0.03 0.15 0.90 0.07 0.01 2.00 1.01 1.00 0.06 0.11 1.08 1.00 0.03 1.99 0.03 1.00 0.02 1.85 1.93 0.14 1.97 0.91 1.83 0.06 0.04 1.97 0.05 2.08 0.04 0.06 1.05 0.05 2.13 0.16 0.09 1.17 0.01 1.01 1.07 0.09 0.14 0.91 0.06 0.08 1.03 1.04 0.08 0.05 1.05 1.03 1.16 0.06 0.05 1.01 0.06 2.15 0.06 1.99 0.13 0.04 1.08 0.97 0.11 0.07 1.05 0.08 0.07 2.13 0.14 0.09 1.10 0.15 0.00 1.02 0.07 1.05 0.05 0.95 0.09 1.00 0.15 0.95 0.08 0.15 1.11 0.07 0.12 1.05 1.06 0.09 1.03 0.07 0.11 1.01 0.05 0.05 1.05 0.98 0.00 0.93 0.08 0.12 1.85 1.11 0.10 0.07 1.00 0.01 0.10 1.87 0.05 2.14 1.10 0.03 1.06 0.10 0.91 0.10 0.06 1.05 1.02 1.02 0.07 0.06 0.98 0.95 1.09 0.06 0.14 0.97 0.04 2.44 Flow Indexes: 1 3 6 8 10 13 13 14 16 18 21 22 23 23 23 26 28 28 29 30 32 32 35 35 38 39 40 43 43 44 45 46 46 46 47 48 48 49 52 52 52 55 56 59 61 62 64 65 65 68 69 72 75 76 78 78 79 79 81 81 83 83 86 87 88 88 89 89 89 92 93 96 99 100 100 103 104 107 110 112 115 117 118 118 119 120 123 126 129 129 130 130 132 135 136 139 139 139 141 144 147 147 149 152 152 152 152 154 156 157 158 159 160 162 165 167 168 171 174 174 176 178 181 181 182 182 182 182 183 185 187 190 190 190 192 192 194 194 197 197 200 202 203 204 204 204 204 207 207 207 207 208 211 214 215 218 218 220 223 226 228 231 232 232 234 236 239 242 245 248 251 252 252 255 257 257 258 258 261 261 263 264 266 267 268 271 274 274 275 276 279 280 282 282 284 286 286 287 287 289 289 290 291 291 294 294 296 296 299 301 301 304 306 307 310 313 314 
317 318 319 322 324 324 326 326 329 330 333 336 336 339 342 344 346 348 350 353 356 357 359 362 365 366 368 371 371 372 375 378 378 380 380 381 383 385 388 389 390 393 394 395 398 400 400 Bases: tcagATTAGATACCCAGGTAGGCCACGCCGTAAACGGTGGGCGCTAGTTGTGCGAACCTTCCACGGTTTGTGCGGCGCAGCTAACGCATTAAGCGCCCTGCCTGGGGAGTACGATCGCAAGATTAAAACTCAAAGGAATTGACGGGGCCCCGCACAAGCAGCGGAGCATGCGGCTTAATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATATACAGGAATATGGCAGAGATGTCATAGCCGCAAGGTCTGTATACAGG Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 37 40 40 38 38 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 37 37 37 37 37 37 37 37 38 38 40 40 40 40 40 38 38 38 38 38 40 40 38 38 38 38 38 40 40 40 40 38 38 38 38 38 38 31 30 30 30 32 31 32 31 32 31 31 28 25 21 20 """.split('\n') flows, head = parse_sff(self.rec) self.flows = list(flows) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_flowgram_collection.py000644 000765 000024 00000110425 12024702176 025221 0ustar00jrideoutstaff000000 000000 __author__ = "Julia Goodrich" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jens Reeder","Julia Goodrich"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Julia Goodrich" __email__ = "julia.goodrich@colorado.edu" __Status__ = "Development" from cogent.util.unit_test import TestCase, main from types import GeneratorType from numpy import array, transpose from cogent.core.sequence import Sequence from cogent.parse.flowgram_collection 
import FlowgramCollection, flows_from_array,\
    flows_from_generic, flows_from_kv_pairs, flows_from_empty, flows_from_dict,\
    flows_from_sff, assign_sequential_names, flows_from_flowCollection,\
    pick_from_prob_density, seqs_to_flows
from cogent.parse.flowgram import Flowgram
from cogent.core.alignment import SequenceCollection
from tempfile import mktemp
from os import remove

class flowgram_tests(TestCase):
    """Tests of top-level functions."""

    def test_flows_from_array(self):
        """flows_from_array should return chars, and successive indices."""
        a = array([[0,1,2],[2,1,0]]) #three 2-char seqs
        obs_a, obs_labels, obs_info = flows_from_array(a)
        #note transposition
        self.assertEqual(obs_a, [array([0,2]), array([1,1]), array([2,0])])
        self.assertEqual(obs_labels, None)
        self.assertEqual(obs_info, None)

    def test_flows_from_generic(self):
        """flows_from_generic should initialize from list of flowgram objects"""
        c = Flowgram('0.0 1.1 3.0 1.0', Name='a')
        b = Flowgram('0.5 1.0 4.0 0.0', Name='b')
        obs_a, obs_labels, obs_info = flows_from_generic([c,b])
        self.assertEqual(map(str,obs_a),
                         ['0.0\t1.1\t3.0\t1.0', '0.5\t1.0\t4.0\t0.0'])
        self.assertEqual(obs_labels, ['a','b'])
        self.assertEqual(obs_info, [None,None])

        f = ['0.0 1.1 3.0 1.0', '0.5 1.0 4.0 0.0']
        obs_a, obs_labels, obs_info = flows_from_generic(f)
        self.assertEqual(map(str,obs_a),
                         ['0.0 1.1 3.0 1.0', '0.5 1.0 4.0 0.0'])
        self.assertEqual(obs_labels, [None,None])
        self.assertEqual(obs_info, [None,None])

    def test_flows_from_flowCollection(self):
        """flows_from_flowCollection should init from existing collection"""
        c = FlowgramCollection({'a':'0.0 1.1 3.0 1.0', 'b':'0.5 1.0 4.0 0.0'})
        obs_a, obs_labels, obs_info = flows_from_flowCollection(c)
        self.assertEqual(map(str,obs_a),
                         ['0.0\t1.1\t3.0\t1.0', '0.5\t1.0\t4.0\t0.0'])
        self.assertEqual(obs_labels, ['a','b'])
        self.assertEqual(obs_info, [None,None])

    def test_flows_from_kv_pairs(self):
        """flows_from_kv_pairs should initialize from key-value pairs"""
        c = [['a','0.0 1.1 3.0 1.0'], ['b','0.5 1.0 4.0 0.0']]
        obs_a, obs_labels, obs_info = flows_from_kv_pairs(c)
        self.assertEqual(map(str,obs_a),
                         ['0.0 1.1 3.0 1.0', '0.5 1.0 4.0 0.0'])
        self.assertEqual(obs_labels, ['a','b'])
        self.assertEqual(obs_info, [None,None])

        c = [['a',Flowgram('0.0 1.1 3.0 1.0')], ['b',Flowgram('0.5 1.0 4.0 0.0')]]
        obs_a, obs_labels, obs_info = flows_from_kv_pairs(c)
        self.assertEqual(map(str,obs_a),
                         ['0.0\t1.1\t3.0\t1.0', '0.5\t1.0\t4.0\t0.0'])
        self.assertEqual(obs_labels, ['a','b'])
        self.assertEqual(obs_info, [None,None])

    def test_flows_from_empty(self):
        """flows_from_empty should always raise ValueError"""
        self.assertRaises(ValueError, flows_from_empty, 'xyz')

    def test_flows_from_dict(self):
        """flows_from_dict should init from dictionary"""
        c = {'a':'0.0 1.1 3.0 1.0', 'b':'0.5 1.0 4.0 0.0'}
        obs_a, obs_labels, obs_info = flows_from_dict(c)
        self.assertEqual(map(str,obs_a),
                         ['0.0 1.1 3.0 1.0', '0.5 1.0 4.0 0.0'])
        self.assertEqual(obs_labels, ['a','b'])
        self.assertEqual(obs_info, [None,None])

        c = {'a':Flowgram('0.0 1.1 3.0 1.0'), 'b':Flowgram('0.5 1.0 4.0 0.0')}
        obs_a, obs_labels, obs_info = flows_from_dict(c)
        self.assertEqual(map(str,obs_a),
                         ['0.0\t1.1\t3.0\t1.0', '0.5\t1.0\t4.0\t0.0'])
        self.assertEqual(obs_labels, ['a','b'])
        self.assertEqual(obs_info, [None,None])

    def test_pick_from_prob_density(self):
        """pick_from_prob_density should take bin probabilities and a bin size
        and return a randomly drawn bin value"""
        i = pick_from_prob_density([0,1.0,0,0], 1)
        self.assertEqual(i, 1)
        i = pick_from_prob_density([1.0,0,0,0], .01)
        self.assertEqual(i, 0.0)

    def test_seqs_to_flows(self):
        """seqs_to_flows should take a list of seqs and probs and return
        a FlowgramCollection"""
        seqs = [('a','ATCGT'), ('b','ACCCAG'), ('c','GTAATG')]
        a = SequenceCollection(seqs)
        flows = seqs_to_flows(a.items())
        assert isinstance(flows, FlowgramCollection)
        for f,i in zip(flows,
                       ['0.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0',
                        '0.0 1.0 3.0 0.0 0.0 1.0 0.0 1.0',
                        '0.0 0.0 0.0 1.0 1.0 2.0 0.0 0.0 1.0 0.0 0.0 1.0']):
            self.assertEqual(f, i)

        probs = {0:[1.0,0,0,0,0], 1:[0,1.0,0,0,0], 2:[0,0,1.0,0,0],
                 3:[0,0,0,1.0,0]}
        flows = seqs_to_flows(a.items(), probs=probs, bin_size=1.0)
        assert isinstance(flows, FlowgramCollection)
        for f,i in zip(flows,
                       ['0.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0',
                        '0.0 1.0 3.0 0.0 0.0 1.0 0.0 1.0',
                        '0.0 0.0 0.0 1.0 1.0 2.0 0.0 0.0 1.0 0.0 0.0 1.0']):
            self.assertEqual(f, i)


class FlowgramCollectionTests(TestCase):
    """Tests of the FlowgramCollection class"""
    Class = FlowgramCollection

    def test_guess_input_type(self):
        """_guess_input_type should figure out data type correctly"""
        git = self.unordered._guess_input_type
        self.assertEqual(git(self.unordered), 'flowcoll')
        self.assertEqual(git(['0.0 1.1 3.0 1.0', '0.5 1.0 4.0 0.0']), 'generic')
        self.assertEqual(git([Flowgram('0.0 1.1 3.0 1.0'),
                              Flowgram('0.5 1.0 4.0 0.0')]), 'generic')
        self.assertEqual(git([[1,2],[4,5]]), 'kv_pairs') #precedence over generic
        self.assertEqual(git([('a',Flowgram('0.0 1.1 3.0 1.0')),
                              ('b',Flowgram('0.5 1.0 4.0 0.0'))]), 'kv_pairs')
        self.assertEqual(git([[1,2,3],[4,5,6]]), 'generic')
        self.assertEqual(git(array([[1,2,3],[4,5,6]])), 'array')
        self.assertEqual(git({'a':'0.0 1.1 3.0 1.0'}), 'dict')
        self.assertEqual(git({'a':Flowgram('0.0 1.1 3.0 1.0')}), 'dict')
        self.assertEqual(git([]), 'empty')
        self.assertEqual(git('Common Header'), 'sff')

    def test_init_pairs(self):
        """FlowgramCollection init from list of (key,val) should work"""
        Flows = [['a','0.0 1.1 3.0 1.0'], ['b','0.5 1.0 4.0 0.0']]
        a = self.Class(Flows)
        self.assertEqual(len(a.NamedFlows), 2)
        self.assertEqual(a.NamedFlows['a'], '0.0 1.1 3.0 1.0')
        self.assertEqual(a.NamedFlows['b'], '0.5 1.0 4.0 0.0')
        self.assertEqual(a.Names, ['a','b'])
        self.assertEqual(list(a.flows), ['0.0 1.1 3.0 1.0', '0.5 1.0 4.0 0.0'])

    def test_init_aln(self):
        """FlowgramCollection should init from existing Collections"""
        start = self.Class([['a','0.0 1.1 3.0 1.0'], ['b','0.5 1.0 4.0 0.0']])
        exp = self.Class([['a','0.0 1.1 3.0 1.0'], ['b','0.5 1.0 4.0 0.0']])
        f = self.Class(start)
        self.assertEqual(f,
exp)

    test_init_aln.__doc__ = Class.__name__ + test_init_aln.__doc__

    def test_init_dict(self):
        """FlowgramCollection init from dict should work as expected"""
        d = {'a':'0.0 1.1 3.0 1.0','b':'0.5 1.0 4.0 0.0'}
        a = self.Class(d)
        self.assertEqual(a, d)
        self.assertEqual(a.NamedFlows.items(), d.items())

    def test_init_name_mapped(self):
        """FlowgramCollection init should allow name mapping function"""
        d = {'a':'0.0 1.1 3.0 1.0','b':'0.5 1.0 4.0 0.0'}
        f = lambda x: x.upper()
        a = self.Class(d, name_conversion_f=f)
        self.assertNotEqual(a, d)
        self.assertNotEqual(a.NamedFlows.items(), d.items())
        d_upper = {'A':'0.0 1.1 3.0 1.0','B':'0.5 1.0 4.0 0.0'}
        self.assertEqual(a, d_upper)
        self.assertEqual(a.NamedFlows.items(), d_upper.items())

    def test_init_flow(self):
        """FlowgramCollection init from list of flowgrams should use indices as keys"""
        f1 = Flowgram('0.0 1.1 3.0 1.0')
        f2 = Flowgram('0.5 1.0 4.0 0.0')
        flows = [f1,f2]
        a = self.Class(flows)
        self.assertEqual(len(a.NamedFlows), 2)
        self.assertEqual(a.NamedFlows['seq_0'], '0.0 1.1 3.0 1.0')
        self.assertEqual(a.NamedFlows['seq_1'], '0.5 1.0 4.0 0.0')
        self.assertEqual(a.Names, ['seq_0','seq_1'])
        self.assertEqual(list(a.Flows), ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])

    def test_flows_from_sff(self):
        """flow_from_sff should init from sff iterator"""
        s = self.rec
        f = self.Class(s)
        self.assertEqual(f.NamedFlows['FIQU8OX05GCVRO'], self.flow)

    def test_init_duplicate_keys(self):
        """FlowgramCollection init from kv pairs should fail on dup. keys"""
        f = [['a','0.0 1.1 3.0 1.0'],['b','0.5 1.0 4.0 0.0'],
             ['b','1.5 2.0 0.0 0.5']]
        self.assertRaises(ValueError, self.Class, f)
        self.assertEqual(self.Class(f, remove_duplicate_names=True).Names,
                         ['a','b'])

    def test_init_ordered(self):
        """FlowgramCollection should iter over flows correctly, ordered too"""
        first = self.ordered1
        sec = self.ordered2
        un = self.unordered

        self.assertEqual(first.Names, ['a','b'])
        self.assertEqual(sec.Names, ['b', 'a'])
        self.assertEqual(un.Names, un.NamedFlows.keys())

        first_list = list(first.flow_str)
        sec_list = list(sec.flow_str)
        un_list = list(un.flow_str)
        self.assertEqual(first_list, ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])
        self.assertEqual(sec_list, ['0.5 1.0 4.0 0.0', '0.0 1.1 3.0 1.0'])
        # check that the unordered seq matches one of the lists
        self.assertTrue((un_list == first_list) or (un_list == sec_list))
        self.assertNotEqual(first_list, sec_list)

    def test_flow_str(self):
        """FlowgramCollection flow_str prop returns flows in correct order."""
        first = self.ordered1
        sec = self.ordered2
        un = self.unordered

        first_list = list(first.flow_str)
        sec_list = list(sec.flow_str)
        un_list = list(un.flow_str)
        self.assertEqual(first_list, ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])
        self.assertEqual(sec_list, ['0.5 1.0 4.0 0.0', '0.0 1.1 3.0 1.0'])
        # check that the unordered seq matches one of the lists
        self.assertTrue((un_list == first_list) or (un_list == sec_list))
        self.assertNotEqual(first_list, sec_list)

    def test_iter(self):
        """FlowgramCollection __iter__ method should yield flows in order"""
        f = self.Class(['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0','1.5 0.0 2.0 1.0'],
                       Names=['a','b','c'])
        for i,b in zip(f,['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0',
                          '1.5 0.0 2.0 1.0']):
            self.assertEqual(i,b)

    def test_str(self):
        """FlowgramCollection __str__ should return sff format"""
        a = [Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
                      header_info={'Bases':'TACCCCTTGG','Name Length':'14'}),
             Flowgram('1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0', Name='b',
                      header_info=
{'Bases':'TTATTTACCG','Name Length':'14'})]
        f = FlowgramCollection(a, header_info={'Flow Chars':'TACG'})
        self.assertEqual(str(f),
            """Common Header:\n Flow Chars:\tTACG\n\n>a\n Name Length:\t14\nBases:\tTACCCCTTGG\nFlowgram:\t0.5\t1.0\t4.0\t0.0\t1.5\t0.0\t0.0\t2.0\n\n>b\n Name Length:\t14\nBases:\tTTATTTACCG\nFlowgram:\t1.5\t1.0\t0.0\t0.0\t2.5\t1.0\t2.0\t1.0\n""")

    def test_len(self):
        """len(FlowgramCollection) returns the number of flowgrams"""
        a = [('a','0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0'),
             ('b','1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0'),
             ('c','2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0')]
        f = FlowgramCollection(a)
        self.assertEqual(len(f), 3)

    def test_writeToFile(self):
        """FlowgramCollection.writeToFile should write in correct format"""
        a = [Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
                      header_info={'Bases':'TACCCCTTGG','Name Length':'14'}),
             Flowgram('1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0', Name='b',
                      header_info={'Bases':'TTATTTACCG','Name Length':'14'})]
        f = FlowgramCollection(a, header_info={'Flow Chars':'TACG'})
        fn = mktemp(suffix='.sff')
        f.writeToFile(fn)
        result = open(fn, 'U').read()
        self.assertEqual(result,
            """Common Header:\n Flow Chars:\tTACG\n\n>a\n Name Length:\t14\nBases:\tTACCCCTTGG\nFlowgram:\t0.5\t1.0\t4.0\t0.0\t1.5\t0.0\t0.0\t2.0\n\n>b\n Name Length:\t14\nBases:\tTTATTTACCG\nFlowgram:\t1.5\t1.0\t0.0\t0.0\t2.5\t1.0\t2.0\t1.0\n""")
        remove(fn)

    def test_createCommonHeader(self):
        """createCommonHeader should return lines for sff common header"""
        a = [Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
                      header_info={'Bases':'TACCCCTTGG','Name Length':'14'}),
             Flowgram('1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0', Name='b',
                      header_info={'Bases':'TTATTTACCG','Name Length':'14'})]
        f = FlowgramCollection(a, header_info={'Flow Chars':'TACG'})
        self.assertEqual('\n'.join(f.createCommonHeader()),
                         """Common Header:\n Flow Chars:\tTACG""")

    def test_toFasta(self):
        """FlowgramCollection should return correct FASTA string"""
        f = self.Class(['0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                        '1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0',
                        '2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0',
                        '0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0'],
                       header_info={'Flow Chars':'TACG'})
        self.assertEqual(f.toFasta(),
            '>seq_0\nTACCCCTTGG\n>seq_1\nTTATTTACCG\n>seq_2\nTTTCCCCTAG\n>seq_3\nAGGGTTACGG')
        # NOTE THE FOLLOWING SURPRISING BEHAVIOR BECAUSE OF THE TWO-ITEM
        # SEQUENCE RULE:
        aln = self.Class(['0.5 1.0 0.0 0.0','0.0 1.0 1.0 0.0'],
                         header_info={'Flow Chars':'TACG'})
        self.assertEqual(aln.toFasta(), '>A\nC\n>T\nA')

    def test_toPhylip(self):
        """FlowgramCollection should return PHYLIP string format correctly"""
        f = self.Class(['0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                        '1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0',
                        '2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0',
                        '0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0'],
                       header_info={'Flow Chars':'TACG'})
        phylip_str, id_map = f.toPhylip()
        self.assertEqual(phylip_str,
            """4 10\nseq0000001 TACCCCTTGG\nseq0000002 TTATTTACCG\nseq0000003 TTTCCCCTAG\nseq0000004 AGGGTTACGG""")
        self.assertEqual(id_map, {'seq0000004':'seq_3', 'seq0000001':'seq_0',
                                  'seq0000003': 'seq_2', 'seq0000002': 'seq_1'})

    def test_toNexus(self):
        """FlowgramCollection should return correct Nexus string format"""
        f = self.Class(['0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                        '1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0',
                        '2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0',
                        '0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0'],
                       header_info={'Flow Chars':'TACG'})
        expect = '#NEXUS\n\nbegin data;\n dimensions ntax=4 nchar=10;\n'+\
                 ' format datatype=dna interleave=yes missing=? gap=-;\n'+\
                 ' matrix\n seq_1 TTATTTACCG\n seq_0'+\
                 ' TACCCCTTGG\n seq_3 AGGGTTACGG\n '+\
                 ' seq_2 TTTCCCCTAG\n\n ;\nend;'
        self.assertEqual(f.toNexus('dna'), expect)

    def test_toSequenceCollection(self):
        """toSequenceCollection should return sequence collection from flows"""
        f = self.Class(['0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                        '1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0',
                        '2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0',
                        '0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0'],
                       header_info={'Flow Chars':'TACG'})
        s = f.toSequenceCollection()
        assert isinstance(s, SequenceCollection)
        for i,j in zip(s.iterSeqs(),['TACCCCTTGG','TTATTTACCG','TTTCCCCTAG',
                                     'AGGGTTACGG']):
            self.assertEqual(i,j)

        a = [Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
                      header_info={'Bases':'TACTTGG','Name Length':'14'}),
             Flowgram('1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0', Name='b',
                      header_info={'Bases':'TTATTTG','Name Length':'14'})]
        f = self.Class(a)
        s = f.toSequenceCollection(Bases=True)
        assert isinstance(s, SequenceCollection)
        for i,j in zip(s.iterSeqs(),['TACTTGG','TTATTTG']):
            self.assertEqual(i,j)

    def test_addFlows(self):
        """addFlows should return a FlowgramCollection with the new flows appended"""
        a = [('s4', '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0'),
             ('s3', '1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0')]
        b = [('s1', '2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0'),
             ('s2', '0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0')]
        f1 = self.Class(a, header_info={'Flow Chars':'TACG'})
        f2 = self.Class(b, header_info={'Flow Chars':'TACG'})
        self.assertEqual(f1.addFlows(f2).toFasta(),
            self.Class(a+b, header_info={'Flow Chars':'TACG'}).toFasta())

    def test_iterFlows(self):
        """FlowgramCollection iterFlows() method should support reordering"""
        f = self.Class(['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0','1.5 0.0 2.0 1.0'],
                       Names=['a','b','c'])
        flows = map(str, list(f.iterFlows()))
        self.assertEqual(flows, ['0.0\t1.1\t3.0\t1.0',
                                 '0.5\t1.0\t4.0\t0.0','1.5\t0.0\t2.0\t1.0'])
        flows = list(f.iterFlows(flow_order=['b','a','a']))
        self.assertEqual(map(str,flows), ['0.5\t1.0\t4.0\t0.0',
                                          '0.0\t1.1\t3.0\t1.0',
'0.0\t1.1\t3.0\t1.0'])
        self.assertSameObj(flows[1], flows[2])
        self.assertSameObj(flows[0], f.NamedFlows['b'])

    def test_Items(self):
        """FlowgramCollection Items should iterate over items in specified order."""
        # should work if one row
        self.assertEqual(list(self.one_seq.Items), [0.0, 1.1, 3.0, 1.0])
        # should take order into account
        self.assertEqual(list(self.ordered1.Items),
                         [0.0, 1.1, 3.0, 1.0] + [0.5, 1.0, 4.0, 0.0])
        self.assertEqual(list(self.ordered2.Items),
                         [0.5, 1.0, 4.0, 0.0] + [0.0, 1.1, 3.0, 1.0])

    def test_takeFlows(self):
        """takeFlows should return new FlowgramCollection with selected seqs."""
        f = self.Class(['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0','1.5 0.0 2.0 1.0'],
                       Names=['a','b','c'])
        a = f.takeFlows('bc')
        self.assertTrue(isinstance(a, FlowgramCollection))
        self.assertEqual(a, {'b':'0.5 1.0 4.0 0.0','c':'1.5 0.0 2.0 1.0'})
        # should be able to negate
        a = f.takeFlows('bc', negate=True)
        self.assertEqual(a, {'a':'0.0 1.1 3.0 1.0'})

    def test_getFlowIndices(self):
        """FlowgramCollection getFlowIndices should return names of seqs where f(row) is True"""
        f = self.ambiguous
        is_long = lambda x: len(x) > 10
        is_med = lambda x: len(str(x).replace('N','')) > 7  # strips N's
        is_any = lambda x: len(x) > 0
        self.assertEqual(f.getFlowIndices(is_long, Bases=True), [])
        f.Names = 'cba'
        self.assertEqual(f.getFlowIndices(is_med, Bases=True), ['c','a'])
        f.Names = 'bac'
        self.assertEqual(f.getFlowIndices(is_med, Bases=True), ['a','c'])
        self.assertEqual(f.getFlowIndices(is_any, Bases=True), ['b','a','c'])
        # should be able to negate
        self.assertEqual(f.getFlowIndices(is_med, Bases=True, negate=True),
                         ['b'])
        self.assertEqual(f.getFlowIndices(is_any, Bases=True, negate=True), [])

    def test_takeFlowsIf(self):
        """FlowgramCollection takeFlowsIf should return flows where f(row) is True"""
        is_long = lambda x: len(x) > 10
        is_med = lambda x: len(str(x).replace('N','')) > 7
        is_any = lambda x: len(x) > 0
        f = self.ambiguous
        self.assertEqual(f.takeFlowsIf(is_long, Bases=True), {})
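As an aside, the flowgram/base-call pairs these tests exercise (e.g. `'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0'` mapping to `'TACCCCTTGG'` in `test_toFasta`) follow a simple rule: each flow intensity is rounded to the nearest whole number (halves rounding up) and the corresponding cycled flow character is emitted that many times. A minimal standalone sketch of that rule, assuming the default `'TACG'` flow order — this is an illustrative helper, not the PyCogent implementation:

```python
from itertools import cycle

def flow_to_bases(flow_str, flow_chars='TACG'):
    # Round each flow intensity to the nearest integer (halves round up)
    # and emit that many copies of the cycled flow character.
    bases = []
    for value, char in zip(flow_str.split(), cycle(flow_chars)):
        bases.append(char * int(float(value) + 0.5))
    return ''.join(bases)

flow_to_bases('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0')  # -> 'TACCCCTTGG'
```

This reproduces all four expected sequences in the `test_toFasta` fixture above.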
self.assertEqual(f.takeFlowsIf(is_med, Bases=True),
                         {'a':'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                          'c':'1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0'})
        self.assertEqual(f.takeFlowsIf(is_any, Bases=True), f)
        self.assertTrue(isinstance(f.takeFlowsIf(is_med, Bases=True),
                                   FlowgramCollection))
        # should be able to negate
        self.assertEqual(f.takeFlowsIf(is_med, Bases=True, negate=True),
                         {'b':'0.0 0.0 0.0 0.0 2.0 1.0 2.0 2.0'})

    def test_getFlow(self):
        """FlowgramCollection.getFlow should return specified flow"""
        a = [('a','0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0'),
             ('b','1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0'),
             ('c','2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0')]
        f = FlowgramCollection(a)
        self.assertEqual(f.getFlow('a'), '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0')
        self.assertRaises(KeyError, f.getFlow, 'd')

    def test_getIntMap(self):
        """FlowgramCollection.getIntMap should return correct mapping."""
        f = self.Class({'seq1':'0.5 1.0 2.0 0.0',
                        'seq2':'1.5 0.0 0.0 2.0','seq3':'0.0 3.0 0.1 1.0'})
        int_keys = {'seq_0':'seq1','seq_1':'seq2','seq_2':'seq3'}
        int_map = {'seq_0':'0.5 1.0 2.0 0.0','seq_1':'1.5 0.0 0.0 2.0',
                   'seq_2':'0.0 3.0 0.1 1.0'}
        im, ik = f.getIntMap()
        self.assertEqual(ik, int_keys)
        self.assertEqual(im, int_map)
        # test change prefix from default 'seq_'
        prefix = 'seqn_'
        int_keys = {'seqn_0':'seq1','seqn_1':'seq2','seqn_2':'seq3'}
        int_map = {'seqn_0':'0.5 1.0 2.0 0.0','seqn_1':'1.5 0.0 0.0 2.0',
                   'seqn_2':'0.0 3.0 0.1 1.0'}
        im, ik = f.getIntMap(prefix=prefix)
        self.assertEqual(ik, int_keys)
        self.assertEqual(im, int_map)

    def test_toDict(self):
        """FlowgramCollection.toDict should return dict of strings (not obj)"""
        f = self.Class({'a': '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                        'b': '1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0'})
        self.assertEqual(f.toDict(),
                         {'a':'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                          'b':'1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0'})
        for i in f.toDict().values():
            assert isinstance(i, str)

    def test_omitAmbiguousFlows(self):
        """FlowgramCollection omitAmbiguousFlows should return flows w/o N's"""
self.assertEqual(self.ambiguous.omitAmbiguousFlows(Bases=True), {'a':'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', 'c':'1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0'}) self.assertEqual(self.ambiguous.omitAmbiguousFlows(Bases=False), {'a':'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', 'c':'1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0'}) #check new object creation self.assertNotSameObj(self.ambiguous.omitAmbiguousFlows(), self.ambiguous) self.assertTrue(isinstance(self.ambiguous.omitAmbiguousFlows( Bases = True), FlowgramCollection)) def test_setBases(self): """FlowgramCollection setBases should set Bases property correctly""" f = self.Class([Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a', header_info = {'Bases':'TACCCCTTGG'}), Flowgram('0.0 1.0 0.0 0.0 2.0 1.0 2.0 2.0', Name='b', header_info = {'Bases':'ATTACCGG'}), Flowgram('1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0', Name='c', header_info = {'Bases':'TTACCTTGG'})], header_info = {'Flow Chars':'TACG'}) f.setBases() for i,b in zip(f,['TACCCCTTGG','ATTACCGG','TTACCTTGG']): self.assertEqual(i.Bases,b) def setUp(self): """Define some standard data""" self.rec = """Common Header: Magic Number: 0x2E736666 Version: 0001 Index Offset: 96099976 Index Length: 1158685 # of Reads: 57902 Header Length: 440 Key Length: 4 # of Flows: 400 Flowgram Code: 1 Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG Key Sequence: TCAG >FIQU8OX05GCVRO Run Prefix: R_2008_10_15_16_11_02_ Region #: 5 XY Location: 2489_3906 Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford Analysis Name: 
/data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Read Header Len: 32 Name Length: 14 # of Bases: 104 Clip Qual Left: 5 Clip Qual Right: 85 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 
0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04 Flow Indexes: 1 3 6 8 8 11 13 14 14 15 17 20 21 22 22 23 23 23 25 27 29 29 32 32 35 38 39 39 39 42 43 45 46 46 46 47 48 51 51 54 54 57 59 61 61 64 67 69 72 72 74 76 77 80 81 81 81 82 83 83 86 88 88 91 94 95 95 95 98 100 103 106 106 109 112 113 116 118 118 121 122 124 125 127 130 131 133 136 138 140 143 144 144 144 147 149 152 152 155 158 158 160 160 163 Bases: tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 37 37 37 37 37 39 39 39 39 24 24 24 37 34 28 24 24 24 28 34 39 39 39 39 39 39 39 39 39 39 39 39 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 >FIQU8OX05F8ILF Run Prefix: R_2008_10_15_16_11_02_ Region #: 5 XY Location: 2440_0913 Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Read Header Len: 32 Name Length: 14 # of Bases: 206 Clip Qual Left: 5 Clip Qual Right: 187 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.04 0.00 1.01 0.00 0.00 1.00 0.00 1.00 0.00 1.05 0.00 0.91 0.10 1.07 0.95 1.01 0.00 0.06 0.93 
0.02 0.03 1.06 1.18 0.09 1.00 0.05 0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99 1.05 0.14 1.03 0.13 0.03 1.10 0.10 0.96 0.11 0.99 0.12 0.05 0.94 2.83 0.14 0.12 0.96 0.00 1.00 0.11 0.14 1.98 0.08 0.11 1.04 0.01 0.11 2.03 0.15 2.05 0.10 0.03 0.93 0.01 0.08 0.12 0.00 0.16 0.05 0.07 0.08 0.11 0.07 0.05 0.04 0.10 0.05 0.05 0.03 0.07 0.03 0.04 0.04 0.06 0.03 0.05 0.04 0.09 0.03 0.08 0.03 0.07 0.02 0.05 0.02 0.06 0.01 0.05 0.04 0.06 0.02 0.04 0.04 0.04 0.03 0.03 0.06 0.06 0.03 0.02 0.02 0.08 0.03 0.01 0.01 0.06 0.03 0.01 0.03 0.04 0.02 0.00 0.02 0.05 0.00 0.02 0.02 0.03 0.00 0.02 0.02 0.04 0.01 0.00 0.01 0.05 Flow Indexes: 1 3 6 8 10 12 14 15 16 19 22 23 25 27 30 30 33 33 34 37 37 37 39 39 42 45 46 48 
51 53 53 56 56 56 57 58 60 61 64 65 67 70 70 73 74 74 77 80 83 85 88 91 93 94 97 100 102 102 103 106 109 112 112 112 114 116 117 118 119 122 122 122 125 126 129 129 131 133 133 135 138 138 140 142 145 146 147 149 152 154 157 159 161 163 166 169 169 169 171 171 173 173 173 174 176 178 181 182 185 186 189 190 191 191 191 194 196 198 198 200 201 204 206 206 206 209 209 211 211 213 216 216 218 221 223 226 227 230 233 234 236 237 238 240 241 241 243 245 246 249 249 249 249 249 250 253 253 253 256 258 261 264 266 268 270 270 270 271 273 273 273 274 277 278 279 281 282 285 285 285 285 285 287 290 293 294 294 295 297 300 302 304 307 308 308 308 311 313 316 316 319 322 322 324 324 327 Bases: tcagAGACGCACTCAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAActgagcgggctggcaaggc Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 34 34 34 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 36 36 36 36 36 38 25 25 25 38 37 37 37 37 37 37 33 33 34 37 37 37 37 37 37 37 38 34 20 20 26 26 20 34 38 37 37 37 37 37 37 37 37 37 38 38 38 37 37 37 37 37 37 37 37 37 37 """.split('\n') self.flow = """1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 
1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04""" self.unordered = self.Class({'a':'0.0 1.1 3.0 1.0', 'b':'0.5 1.0 4.0 0.0'}) self.ordered1 = self.Class({'a':'0.0 1.1 3.0 1.0',\ 'b':'0.5 1.0 4.0 0.0'}, Names=['a','b']) self.ordered2 = self.Class({'a':'0.0 1.1 3.0 1.0',\ 'b':'0.5 1.0 4.0 0.0'}, Names=['b','a']) self.one_seq = self.Class({'a':'0.0 1.1 3.0 1.0'}) self.ambiguous = self.Class([Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a', header_info = 
{'Bases':'TACCCCTTGG'}), Flowgram('0.0 0.0 0.0 0.0 2.0 1.0 2.0 2.0', Name = 'b', header_info = {'Bases':'NTTACCGG'}), Flowgram('1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0', Name='c', header_info = {'Bases':'TTACCTTGG'})], header_info = {'Flow Chars':'TACG'}) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_flowgram_parser.py000644 000765 000024 00000031350 12024702176 024361 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """tests for sff parser""" __author__ = "Julia Goodrich, Jens Reeder" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Julia Goodrich","Jens Reeder"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jens Reeder" __email__ = "jreeder@colorado.edu" __status__ = "Development" from types import GeneratorType from cogent.util.unit_test import TestCase, main from cogent.parse.flowgram_parser import get_header_info, get_summaries,\ get_all_summaries, split_summary, parse_sff, lazy_parse_sff_handle class SFFParserTests(TestCase): """Tests sff parser functions""" def setUp(self): """Define some standard data""" self.rec = """Common Header: Magic Number: 0x2E736666 Version: 0001 Index Offset: 96099976 Index Length: 1158685 # of Reads: 57902 Header Length: 440 Key Length: 4 # of Flows: 400 Flowgram Code: 1 Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG Key Sequence: TCAG >FIQU8OX05GCVRO Run Prefix: R_2008_10_15_16_11_02_ Region #: 5 XY Location: 2489_3906 Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford Analysis Name: 
/data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Read Header Len: 32 Name Length: 14 # of Bases: 104 Clip Qual Left: 5 Clip Qual Right: 85 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 
0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04 Flow Indexes: 1 3 6 8 8 11 13 14 14 15 17 20 21 22 22 23 23 23 25 27 29 29 32 32 35 38 39 39 39 42 43 45 46 46 46 47 48 51 51 54 54 57 59 61 61 64 67 69 72 72 74 76 77 80 81 81 81 82 83 83 86 88 88 91 94 95 95 95 98 100 103 106 106 109 112 113 116 118 118 121 122 124 125 127 130 131 133 136 138 140 143 144 144 144 147 149 152 152 155 158 158 160 160 163 Bases: tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 37 37 37 37 37 39 39 39 39 24 24 24 37 34 28 24 24 24 28 34 39 39 39 39 39 39 39 39 39 39 39 39 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 >FIQU8OX05F8ILF Run Prefix: R_2008_10_15_16_11_02_ Region #: 5 XY Location: 2440_0913 Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis Read Header Len: 32 Name Length: 14 # of Bases: 206 Clip Qual Left: 5 Clip Qual Right: 187 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.04 0.00 1.01 0.00 0.00 1.00 0.00 1.00 0.00 1.05 0.00 0.91 0.10 1.07 0.95 1.01 0.00 0.06 0.93 
0.02 0.03 1.06 1.18 0.09 1.00 0.05 0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99 1.05 0.14 1.03 0.13 0.03 1.10 0.10 0.96 0.11 0.99 0.12 0.05 0.94 2.83 0.14 0.12 0.96 0.00 1.00 0.11 0.14 1.98 0.08 0.11 1.04 0.01 0.11 2.03 0.15 2.05 0.10 0.03 0.93 0.01 0.08 0.12 0.00 0.16 0.05 0.07 0.08 0.11 0.07 0.05 0.04 0.10 0.05 0.05 0.03 0.07 0.03 0.04 0.04 0.06 0.03 0.05 0.04 0.09 0.03 0.08 0.03 0.07 0.02 0.05 0.02 0.06 0.01 0.05 0.04 0.06 0.02 0.04 0.04 0.04 0.03 0.03 0.06 0.06 0.03 0.02 0.02 0.08 0.03 0.01 0.01 0.06 0.03 0.01 0.03 0.04 0.02 0.00 0.02 0.05 0.00 0.02 0.02 0.03 0.00 0.02 0.02 0.04 0.01 0.00 0.01 0.05 Flow Indexes: 1 3 6 8 10 12 14 15 16 19 22 23 25 27 30 30 33 33 34 37 37 37 39 39 42 45 46 48 
51 53 53 56 56 56 57 58 60 61 64 65 67 70 70 73 74 74 77 80 83 85 88 91 93 94 97 100 102 102 103 106 109 112 112 112 114 116 117 118 119 122 122 122 125 126 129 129 131 133 133 135 138 138 140 142 145 146 147 149 152 154 157 159 161 163 166 169 169 169 171 171 173 173 173 174 176 178 181 182 185 186 189 190 191 191 191 194 196 198 198 200 201 204 206 206 206 209 209 211 211 213 216 216 218 221 223 226 227 230 233 234 236 237 238 240 241 241 243 245 246 249 249 249 249 249 250 253 253 253 256 258 261 264 266 268 270 270 270 271 273 273 273 274 277 278 279 281 282 285 285 285 285 285 287 290 293 294 294 295 297 300 302 304 307 308 308 308 311 313 316 316 319 322 322 324 324 327 Bases: tcagAGACGCACTCAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAActgagcgggctggcaaggc Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 34 34 34 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 36 36 36 36 36 38 25 25 25 38 37 37 37 37 37 37 33 33 34 37 37 37 37 37 37 37 38 34 20 20 26 26 20 34 38 37 37 37 37 37 37 37 37 37 38 38 38 37 37 37 37 37 37 37 37 37 37 """.split('\n') def test_get_header_info(self): """get_header_info should return a sff file common header as a dict""" header = get_header_info(self.rec) self.assertEqual(len(header), 11) self.assertEqual(header['Key Length'], '4') self.assertEqual(header['Key Sequence'], 'TCAG') def test_get_summaries(self): """get_summaries should return a generator of the summaries""" summaries = get_summaries(self.rec,number_list = [1]) sum_list = list(summaries) 
        self.assertEqual(len(sum_list), 1)
        self.assertEqual(isinstance(summaries, GeneratorType), True)
        self.assertEqual(len(sum_list[0]), 18)
        self.assertEqual(sum_list[0][0], '>FIQU8OX05F8ILF')

        summaries = get_summaries(self.rec,name_list = ['FIQU8OX05GCVRO'])
        sum_list = list(summaries)
        self.assertEqual(len(sum_list), 1)
        self.assertEqual(isinstance(summaries, GeneratorType), True)
        self.assertEqual(len(sum_list[0]), 18)
        self.assertEqual(sum_list[0][0], '>FIQU8OX05GCVRO')

        summaries = get_summaries(self.rec,all_sums = True )
        sum_list = list(summaries)
        self.assertEqual(len(sum_list), 2)
        self.assertEqual(isinstance(summaries, GeneratorType), True)
        self.assertEqual(len(sum_list[0]), 18)
        self.assertEqual(sum_list[0][0], '>FIQU8OX05GCVRO')
        self.assertEqual(sum_list[1][0], '>FIQU8OX05F8ILF')

        summaries = get_summaries(self.rec,number_list = [0], name_list =['FIQU8OX05GCVRO'])
        self.assertRaises(AssertionError,list,summaries)

        summaries = get_summaries(self.rec)
        self.assertRaises(ValueError,list, summaries)

    def test_get_all_summaries(self):
        """get_all_summaries should return a list of the summaries"""
        summaries = get_all_summaries(self.rec)
        self.assertEqual(len(summaries), 2)
        self.assertEqual(isinstance(summaries,list), True)
        self.assertEqual(len(summaries[0]), 18)
        self.assertEqual(summaries[0][0], '>FIQU8OX05GCVRO')
        self.assertEqual(summaries[1][0], '>FIQU8OX05F8ILF')

    def test_split_summary(self):
        """split_summary should return the info of a flowgram header."""
        summaries = get_all_summaries(self.rec)
        sum_dict = split_summary(summaries[0])
        self.assertEqual(len(sum_dict), 18)
        self.assertEqual(sum_dict['Name'], 'FIQU8OX05GCVRO')
        assert 'Flowgram' in sum_dict
        assert 'Bases' in sum_dict

        sum_dict = split_summary(summaries[1])
        self.assertEqual(len(sum_dict), 18)
        self.assertEqual(sum_dict['Name'], 'FIQU8OX05F8ILF')
        assert 'Flowgram' in sum_dict
        assert 'Bases' in sum_dict

    def test_parse_sff(self):
        """SFParser should read in the SFF file correctly."""
        flows, head = parse_sff(self.rec)
        self.assertEqual(len(flows),2)
        self.assertEqual(len(head), 11)
        self.assertEqual(head['Key Length'], '4')
        self.assertEqual(head['Key Sequence'], 'TCAG')
        self.assertEqual(flows[0].Name, 'FIQU8OX05GCVRO')
        self.assertEqual(flows[1].Name, 'FIQU8OX05F8ILF')

    def test_lazy_parse_sff_handle(self):
        """LazySFParser should read in the SFF file correctly."""
        flows, head = lazy_parse_sff_handle(self.rec)
        flows = list(flows)
        self.assertEqual(len(flows),2)
        self.assertEqual(len(head), 11)
        self.assertEqual(head['Key Length'], '4')
        self.assertEqual(head['Key Sequence'], 'TCAG')
        self.assertEqual(flows[0].Name, 'FIQU8OX05GCVRO')
        self.assertEqual(flows[1].Name, 'FIQU8OX05F8ILF')

if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_parse/test_gbseq.py000644 000765 000024 00000016547 12024702176 022303 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
import xml.dom.minidom
from cogent.util.unit_test import TestCase, main
from cogent.parse.gbseq import GbSeqXmlParser

__author__ = "Matthew Wakefield"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Matthew Wakefield"
__email__ = "wakefield@wehi.edu.au"
__status__ = "Production"

data = """ AY286018 99 single mRNA linear MAM 29-SEP-2003 01-JUN-2003 Macropus eugenii medium wave-sensitive opsin 1 (OPN1MW) mRNA, complete cds AY286018 AY286018.1 gb|AY286018.1| gi|31322957 Macropus eugenii (tammar wallaby) Macropus eugenii Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Metatheria; Diprotodontia; Macropodidae; Macropus 1 1..99 Deeb,S.S. Wakefield,M.J. Tada,T. Marotte,L. Yokoyama,S. Marshall Graves,J.A. The cone visual pigments of an Australian marsupial, the tammar wallaby (Macropus eugenii): sequence, spectral tuning, and evolution Mol. Biol. Evol. 20 (10), 1642-1649 (2003) doi 10.1093/molbev/msg181 12885969 2 1..99 Deeb,S.S. Wakefield,M.J. Tada,T. Marotte,L. Yokoyama,S. Graves,J.A.M.
Direct Submission Submitted (29-APR-2003) RSBS, The Australian National University, Acton, ACT 0200, Australia source 1..99 1 99 AY286018.1 organism Macropus eugenii mol_type mRNA db_xref taxon:9315 country Australia: Kangaroo Island gene 1..99 1 99 AY286018.1 gene OPN1MW CDS 31..99 31 99 AY286018.1 gene OPN1MW note cone pigments codon_start 1 transl_table 1 product medium wave-sensitive opsin 1 protein_id AAP37945.1 db_xref GI:31322958 translation MTQAWDPAGFLAWRRDENE ggcagggaaagggaagaaagtaaaggggccatgacacaggcatgggaccctgcagggttcttggcttggcggcgggacgagaacgaggagacgactcgg """ sample_seq = ">AY286018.1\nGGCAGGGAAAGGGAAGAAAGTAAAGGGGCCATGACACAGGCATGGGACCCTGCAGGGTTCTTGGCTTGGCGGCGGGACGAGAACGAGGAGACGACTCGG" sample_annotations = '[source "[0:99]/99 of AY286018.1" at [0:99]/99, organism "Macropus eugenii" at [0:99]/99, gene "OPN1MW" at [0:99]/99, CDS "OPN1MW" at [30:99]/99]' class ParseGBseq(TestCase): def test_parse(self): for name,seq in [GbSeqXmlParser(data).next(),GbSeqXmlParser(xml.dom.minidom.parseString(data)).next()]: self.assertEqual(name, 'AY286018.1') self.assertEqual(sample_seq, seq.toFasta()) self.assertEqual(str(seq.annotations), sample_annotations) pass if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_genbank.py000644 000765 000024 00000043366 12024702176 022606 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the GenBank database parsers. 
""" from cogent.parse.genbank import parse_locus, parse_single_line, \ indent_splitter, parse_sequence, block_consolidator, parse_organism, \ parse_feature, location_line_tokenizer, parse_simple_location_segment, \ parse_location_line, parse_reference, parse_source, \ Location, LocationList from cogent.util.unit_test import TestCase, main __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class GenBankTests(TestCase): """Tests of the GenBank main functions.""" def test_parse_locus(self): """parse_locus should give correct results on specimen locus lines""" line = 'LOCUS AF108830 5313 bp mRNA linear PRI 19-MAY-1999' result = parse_locus(line) self.assertEqual(len(result), 6) self.assertEqual(result['locus'], 'AF108830') self.assertEqual(result['length'], 5313) #note: int, not str self.assertEqual(result['mol_type'], 'mRNA') self.assertEqual(result['topology'], 'linear') self.assertEqual(result['db'], 'PRI') self.assertEqual(result['date'], '19-MAY-1999') #should work if some of the fields are missing line = 'LOCUS AF108830 5313' result = parse_locus(line) self.assertEqual(len(result), 2) self.assertEqual(result['locus'], 'AF108830') self.assertEqual(result['length'], 5313) #note: int, not str def test_parse_single_line(self): """parse_single_line should split off the label and return the rest""" line_1 = 'VERSION AF108830.1 GI:4868112\n' self.assertEqual(parse_single_line(line_1), 'AF108830.1 GI:4868112') #should work if leading spaces line_2 = ' VERSION AF108830.1 GI:4868112\n' self.assertEqual(parse_single_line(line_2), 'AF108830.1 GI:4868112') def test_indent_splitter(self): """indent_splitter should split lines at correct locations""" #if lines have same indent, should not group together lines = [ 'abc xxx', 'def yyy' ] 
self.assertEqual(list(indent_splitter(lines)),\ [[lines[0]], [lines[1]]]) #if second line is indented, should group with first lines = [ 'abc xxx', ' def yyy' ] self.assertEqual(list(indent_splitter(lines)),\ [[lines[0], lines[1]]]) #if both lines indented but second is more, should group with first lines = [ ' abc xxx', ' def yyy' ] self.assertEqual(list(indent_splitter(lines)),\ [[lines[0], lines[1]]]) #if both lines indented equally, should not group lines = [ ' abc xxx', ' def yyy' ] self.assertEqual(list(indent_splitter(lines)), \ [[lines[0]], [lines[1]]]) #for more complex situation, should produce correct grouping lines = [ ' xyz', #0 - ' xxx', #1 - ' yyy', #2 ' uuu', #3 ' iii', #4 ' qaz', #5 - ' wsx', #6 - ' az', #7 ' sx', #8 ' gb',#9 ' bg', #10 ' aaa', #11 - ] self.assertEqual(list(indent_splitter(lines)), \ [[lines[0]], lines[1:5], [lines[5]], lines[6:11], [lines[11]]]) #real example from genbank file lines = \ """LOCUS NT_016354 92123751 bp DNA linear CON 29-AUG-2006 DEFINITION Homo sapiens chromosome 4 genomic contig, reference assembly. ACCESSION NT_016354 NT_006109 NT_006204 NT_006245 NT_006302 NT_006371 NT_006397 NT_016393 NT_016589 NT_016599 NT_016606 NT_022752 NT_022753 NT_022755 NT_022760 NT_022774 NT_022797 NT_022803 NT_022846 NT_022960 NT_025694 NT_028147 NT_029273 NT_030643 NT_030646 NT_030662 NT_031780 NT_031781 NT_031791 NT_034703 NT_034705 NT_037628 NT_037629 NT_079512 VERSION NT_016354.18 GI:88977422 KEYWORDS . SOURCE Homo sapiens (human) ORGANISM Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo. ? REFERENCE 2 (bases 1 to 92123751) AUTHORS International Human Genome Sequencing Consortium. 
TITLE Finishing the euchromatic sequence of the human genome""".split('\n') self.assertEqual(list(indent_splitter(lines)), \ [[lines[0]],[lines[1]],lines[2:8],[lines[8]],[lines[9]],lines[10:15],\ [lines[15]], lines[16:]]) def test_parse_sequence(self): """parse_sequence should strip bad chars out of sequence lines""" lines = """ ORIGIN 1 gggagcgcgg cgcgggagcc cgaggctgag actcaccgga ggaagcggcg cgagcgcccc 61 gccatcgtcc \t\t cggctgaagt 123 \ngcagtg \n 121 cctgggctta agcagtcttc45ccacctcagc //\n\n\n""".split('\n') result = parse_sequence(lines) self.assertEqual(result, 'gggagcgcggcgcgggagcccgaggctgagactcaccggaggaagcggcgcgagcgccccgccatcgtcccggctgaagtgcagtgcctgggcttaagcagtcttcccacctcagc') def test_block_consolidator(self): """block_consolidator should join the block together.""" lines = """ ORGANISM Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.""".split('\n') label, data = block_consolidator(lines) self.assertEqual(label, 'ORGANISM') self.assertEqual(data, ['Homo sapiens', ' Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;', ' Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;', ' Hominidae; Homo.']) lines = r"""COMMENT Contact: Spindel ER Division of Neuroscience""".splitlines() label, data = block_consolidator(lines) self.assertEqual(label, "COMMENT") self.assertEqual(data, ['', ' Contact: Spindel ER', ' Division of Neuroscience']) def test_parse_organism(self): """parse_organism should return species, taxonomy (up to genus)""" #note: lines modified to include the following: # - multiword names # - multiword names split over a line break # - periods and other punctuation in names lines = """ ORGANISM Homo sapiens Eukaryota; Metazoa; Chordata Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates \t abc. 
2.; Catarrhini Hominidae; Homo.""".split('\n') species, taxonomy = parse_organism(lines) self.assertEqual(species, 'Homo sapiens') self.assertEqual(taxonomy, ['Eukaryota', 'Metazoa', \ 'Chordata Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', \ 'Eutheria', 'Euarchontoglires', 'Primates abc. 2.', \ 'Catarrhini Hominidae', 'Homo']) def test_parse_feature(self): """parse_feature should return dict containing annotations of feature""" example_feature=\ """ CDS complement(join(102262..102647,105026..105217, 106638..106719,152424..152682,243209..243267)) /gene="nad1" /note="Protein sequence is in conflict with the conceptual translation; author given translation (not conceptual translation) start codon is created by C to U RNA editing" /codon_start=1 /exception="RNA editing" /product="NADH dehydrogenase subunit 1" /protein_id="NP_064011.1" /db_xref="GI:9838451" /db_xref="IPI:12345" /translation="MYIAVPAEILGIILPLLLGVAFLVLAERKVMAFVQRRKGPDVVG SFGLLQPLADGSKLILKEPISPSSANFSLFRMAPVTTFMLSLVARAVVPFDYGMVLSD PNIGLLYLFAISSLGVYGIIIAGWSSNSKYAFLGALRSAAQMVPYEVSIGLILITVLI CVGPRNSSEIVMAQKQIWSGIPLFPVLVMFFISCLAETNRAPFDLPEAERELVAGYNV EYSSMGSALFFLGEYANMILMSGLCTSLSPGGWPPILDLPISKRIPGSIWFSIKVILF LFLYIWVRAAFPRYRYDQLMGLGRKVFLPLSLARVVAVSGVLVTFQWLP""" result = parse_feature(example_feature.split('\n')) self.assertEqual(result['type'], 'CDS') self.assertEqual(result['raw_location'], \ ['complement(join(102262..102647,105026..105217,', \ ' 106638..106719,152424..152682,243209..243267))']) self.assertEqual(result['gene'], ['nad1']) self.assertEqual(result['note'], ['Protein sequence is in conflict with the conceptual translation; author given translation (not conceptual translation) start codon is created by C to U RNA editing']) self.assertEqual(result['codon_start'], ['1']) self.assertEqual(result['exception'], ['RNA editing']) self.assertEqual(result['product'], ['NADH dehydrogenase subunit 1']) self.assertEqual(result['protein_id'],['NP_064011.1']) self.assertEqual(result['db_xref'], 
['GI:9838451','IPI:12345']) self.assertEqual(result['translation'],['MYIAVPAEILGIILPLLLGVAFLVLAERKVMAFVQRRKGPDVVGSFGLLQPLADGSKLILKEPISPSSANFSLFRMAPVTTFMLSLVARAVVPFDYGMVLSDPNIGLLYLFAISSLGVYGIIIAGWSSNSKYAFLGALRSAAQMVPYEVSIGLILITVLICVGPRNSSEIVMAQKQIWSGIPLFPVLVMFFISCLAETNRAPFDLPEAERELVAGYNVEYSSMGSALFFLGEYANMILMSGLCTSLSPGGWPPILDLPISKRIPGSIWFSIKVILFLFLYIWVRAAFPRYRYDQLMGLGRKVFLPLSLARVVAVSGVLVTFQWLP']) self.assertEqual(len(result), 11) short_feature = ['D-loop 15418..16866'] result = parse_feature(short_feature) self.assertEqual(result['type'], 'D-loop') self.assertEqual(result['raw_location'], ['15418..16866']) #can get more than one = in a line #from AF260826 bad_feature = \ """ tRNA 1173..1238 /note="codon recognized: AUC; Cove score = 16.56" /product="tRNA-Ile" /anticodon=(pos:1203..1205,aa:Ile)""" result = parse_feature(bad_feature.split('\n')) self.assertEqual(result['note'], \ ['codon recognized: AUC; Cove score = 16.56']) #need not always have an = in a line #from NC_001807 bad_feature = \ ''' mRNA 556 /partial /citation=[6] /product="H-strand"''' result = parse_feature(bad_feature.split('\n')) self.assertEqual(result['partial'], ['']) def test_location_line_tokenizer(self): """location_line_tokenizer should tokenize location lines""" llt =location_line_tokenizer self.assertEqual(list(llt(['123..456'])), ['123..456']) self.assertEqual(list(llt(['complement(123..456)'])), \ ['complement(', '123..456', ')']) self.assertEqual(list(llt(['join(1..2,3..4)'])), \ ['join(', '1..2', ',', '3..4', ')']) self.assertEqual(list(llt([\ 'join(complement(1..2, join(complement( 3..4),',\ '\n5..6), 7..8\t))'])),\ ['join(','complement(','1..2',',','join(','complement(','3..4',\ ')', ',', '5..6',')',',','7..8',')',')']) def test_parse_simple_location_segment(self): """parse_simple_location_segment should parse simple segments""" lsp = parse_simple_location_segment l = lsp('37') self.assertEqual(l._data, 37) self.assertEqual(str(l), '37') self.assertEqual(l.Strand, 1) l = lsp('40..50') 
        first, second = l._data
        self.assertEqual(first._data, 40)
        self.assertEqual(second._data, 50)
        self.assertEqual(str(l), '40..50')
        self.assertEqual(l.Strand, 1)
        #should handle ambiguous starts and ends
        l = lsp('>37')
        self.assertEqual(l._data, 37)
        self.assertEqual(str(l), '>37')
        l = lsp('<37')
        self.assertEqual(l._data, 37)
        self.assertEqual(str(l), '<37')
        l = lsp('<37..>42')
        first, second = l._data
        self.assertEqual(first._data, 37)
        self.assertEqual(second._data, 42)
        self.assertEqual(str(first), '<37')
        self.assertEqual(str(second), '>42')
        self.assertEqual(str(l), '<37..>42')

    def test_parse_location_line(self):
        """parse_location_line should give correct list of location objects"""
        llt = location_line_tokenizer
        r = parse_location_line(llt(['123..456']))
        self.assertEqual(str(r), '123..456')
        r = parse_location_line(llt(['complement(123..456)']))
        self.assertEqual(str(r), 'complement(123..456)')
        r = parse_location_line(llt(['complement(123..456, 345..678)']))
        self.assertEqual(str(r), \
            'join(complement(345..678),complement(123..456))')
        r = parse_location_line(llt(['complement(join(123..456, 345..678))']))
        self.assertEqual(str(r), \
            'join(complement(345..678),complement(123..456))')
        r = parse_location_line(\
            llt(['join(complement(123..456), complement(345..678))']))
        self.assertEqual(str(r), \
            'join(complement(123..456),complement(345..678))')
        #try some nested joins and complements
        r = parse_location_line(llt(\
            ['complement(join(1..2,3..4,complement(5..6),',
            'join(7..8,complement(9..10))))']))
        self.assertEqual(str(r), \
            'join(9..10,complement(7..8),5..6,complement(3..4),complement(1..2))')

    def test_parse_reference(self):
        """parse_reference should give correct fields"""
        r = \
"""REFERENCE 2 (bases 1 to 2587)
AUTHORS Janzen,D.M. and Geballe,A.P.
TITLE The effect of eukaryotic release factor depletion
on translation termination in human cell lines
JOURNAL (er) Nucleic Acids Res.
32 (15), 4491-4502 (2004) PUBMED 15326224""" result = parse_reference(r.split('\n')) self.assertEqual(len(result), 5) self.assertEqual(result['reference'], '2 (bases 1 to 2587)') self.assertEqual(result['authors'], 'Janzen,D.M. and Geballe,A.P.') self.assertEqual(result['title'], \ 'The effect of eukaryotic release factor depletion ' + \ 'on translation termination in human cell lines') self.assertEqual(result['journal'], \ '(er) Nucleic Acids Res. 32 (15), 4491-4502 (2004)') self.assertEqual(result['pubmed'], '15326224') def test_parse_source(self): """parse_source should split into source and organism""" s = \ """SOURCE African elephant. ORGANISM Mitochondrion Loxodonta africana Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Proboscidea; Elephantidae; Loxodonta.""".split('\n') r = parse_source(s) self.assertEqual(len(r), 3) self.assertEqual(r['source'], 'African elephant.') self.assertEqual(r['species'], 'Mitochondrion Loxodonta africana') self.assertEqual(r['taxonomy'], ['Eukaryota','Metazoa', 'Chordata',\ 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia',\ 'Eutheria', 'Proboscidea', 'Elephantidae', 'Loxodonta']) class LocationTests(TestCase): """Tests of the Location class.""" def test_init(self): """Location should init with 1 or 2 values, plus params.""" l = Location(37) self.assertEqual(str(l), '37') l = Location(37, Ambiguity = '>') self.assertEqual(str(l), '>37') l = Location(37, Ambiguity='<') self.assertEqual(str(l), '<37') l = Location(37, Accession='AB123') self.assertEqual(str(l), 'AB123:37') l = Location(37, Accession='AB123', Db='Kegg') self.assertEqual(str(l), 'Kegg::AB123:37') l1 = Location(37) l2 = Location(42) l = Location([l1,l2]) self.assertEqual(str(l), '37..42') l3 = Location([l1,l2], IsBounds=True) self.assertEqual(str(l3), '(37.42)') l4 = Location([l1,l2], IsBetween=True) self.assertEqual(str(l4), '37^42') l5 = Location([l4,l3]) self.assertEqual(str(l5), '37^42..(37.42)') l5 = Location([l4,l3], 
            Strand=-1)
        self.assertEqual(str(l5), 'complement(37^42..(37.42))')

class LocationListTests(TestCase):
    """Tests of the LocationList class."""

    def test_extract(self):
        """LocationList extract should return correct sequence"""
        l = Location(3)
        l2_a = Location(5)
        l2_b = Location(7)
        l2 = Location([l2_a,l2_b], Strand=-1)
        l3_a = Location(10)
        l3_b = Location(12)
        l3 = Location([l3_a, l3_b])
        ll = LocationList([l, l2, l3])
        s = ll.extract('ACGTGCAGTCAGTAGCAT')
        #             123456789012345678
        self.assertEqual(s, 'G'+'TGC'+'CAG')
        #check a case where it wraps around
        l5_a = Location(16)
        l5_b = Location(4)
        l5 = Location([l5_a,l5_b])
        ll = LocationList([l5])
        s = ll.extract('ACGTGCAGTCAGTAGCAT')
        self.assertEqual(s, 'CATACGT')

if __name__ == '__main__':
    from sys import argv
    if len(argv) > 2 and argv[1] == 'x':
        filename = argv[2]
        lines = open(filename)
        for i in indent_splitter(lines):
            print '******'
            print i[0]
            for j in indent_splitter(i[1:]):
                print '?????'
                for line in j:
                    print line
    else:
        main()
PyCogent-1.5.3/tests/test_parse/test_gff.py000644 000765 000024 00000005026 12024702176 021732 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Unit tests for GFF and related parsers.
"""
from cogent.parse.gff import *
from cogent.util.unit_test import TestCase, main
from StringIO import StringIO

__author__ = "Matthew Wakefield"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Matthew Wakefield"
__email__ = "wakefield@wehi.edu.au"
__status__ = "Production"

headers = [
"""##gff-version 2
##source-version
##date
##Type []
##DNA
##acggctcggattggcgctggatgatagatcagacgac
##...
##end-DNA
""",
"""##gff-version 2
""",
"",
]

# '\t\t\t\t\t\t\t\t[attribute]\n'
data_lines = [
    ('seq1\tBLASTX\tsimilarity\t101\t235\t87.1\t+\t0\tTarget "HBA_HUMAN" 11 55 ; E_value 0.0003\n',
     ('seq1', 'BLASTX', 'similarity', 100, 235, '87.1', '+', '0', 'Target "HBA_HUMAN" 11 55 ; E_value 0.0003', None)),
    ('dJ102G20\tGD_mRNA\tcoding_exon\t7105\t7201\t.\t-\t2\tSequence "dJ102G20.C1.1"\n',
     ('dJ102G20', 'GD_mRNA', 'coding_exon', 7201, 7104, '.', '-', '2', 'Sequence "dJ102G20.C1.1"', None)),
    ('dJ102G20\tGD_mRNA\tcoding_exon\t7105\t7201\t.\t-\t2\t\n',
     ('dJ102G20', 'GD_mRNA', 'coding_exon', 7201, 7104, '.', '-', '2', '', None)),
    ('12345\tSource with spaces\tfeature with spaces\t-100\t3600000000\t1e-5\t-\t.\tSequence "BROADO5" ; Note "This is a \\t tab containing \\n multi line comment"\n',
     ('12345', 'Source with spaces', 'feature with spaces', 3600000000L, 101, '1e-5', '-', '.', 'Sequence "BROADO5" ; Note "This is a \\t tab containing \\n multi line comment"', None)),
    ]

class GffTest(TestCase):
    """Setup data for all the GFF parsers."""

    def testGffParserData(self):
        """Test GffParser with valid data lines"""
        for (line,canned_result) in data_lines:
            result = GffParser(StringIO(line)).next()
            self.assertEqual(result,canned_result)

    def testGffParserHeaders(self):
        """Test GffParser with valid data headers"""
        data = "".join([x[0] for x in data_lines])
        for header in headers:
            result = list(GffParser(StringIO(header+data)))
            self.assertEqual(result,[x[1] for x in data_lines])

    def test_parse_attributes(self):
        """Test parse_attributes"""
        self.assertEqual([parse_attributes(x[1][8]) for x in data_lines],
            ['HBA_HUMAN', 'dJ102G20.C1.1', '', 'BROADO5'])

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_parse/test_gibbs.py000644 000765 000024 00000522703 12024702176 022264 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Tests for the Gibbs parser
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
import string
import re
from
cogent.motif.util import Motif,Module from cogent.core.moltype import DNA,RNA,PROTEIN from cogent.parse.record import DelimitedSplitter from cogent.parse.record_finder import LabeledRecordFinder from cogent.parse.gibbs import get_sequence_and_motif_blocks, get_sequence_map,\ get_motif_blocks, get_motif_sequences, get_motif_p_value, guess_alphabet,\ build_module_objects, module_ids_to_int, GibbsParser from math import exp __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" class GibbsTests(TestCase): """Tests for gibbs parser. """ def setUp(self): """Setup function for gibbs tests. """ self.gibbs_lines = GIBBS_FILE.split('\n') self.sequence_map = {'1':'1091044',\ '10':'135765',\ '11':'1388082',\ '12':'140543',\ '13':'14286173',\ '14':'14578634',\ '15':'14600438',\ '16':'15218394',\ '17':'15597673',\ '18':'15599256',\ '19':'15602312',\ '2':'11467494',\ '20':'15605725',\ '21':'15605963',\ '22':'15609375',\ '23':'15609658',\ '24':'15613511',\ '25':'15614085',\ '26':'15614140',\ '27':'15615431',\ '28':'15643152',\ '29':'15672286',\ '3':'11499727',\ '30':'15790738',\ '31':'15791337',\ '32':'15801846',\ '33':'15805225',\ '34':'15805374',\ '35':'15807234',\ '36':'15826629',\ '37':'15899007',\ '38':'15899339',\ '39':'15964668',\ '4':'1174686',\ '40':'15966937',\ '41':'15988313',\ '42':'16078864',\ '43':'16123427',\ '44':'16125919',\ '45':'16330420',\ '46':'1633495',\ '47':'16501671',\ '48':'1651717',\ '49':'16759994',\ '5':'12044976',\ '50':'16761507',\ '51':'16803644',\ '52':'16804867',\ '53':'17229033',\ '54':'17229859',\ '55':'1729944',\ '56':'17531233',\ '57':'17537401',\ '58':'17547503',\ '59':'18309723',\ '6':'13186328',\ '60':'18313548',\ '61':'18406743',\ '62':'19173077',\ '63':'19554157',\ '64':'19705357',\ '65':'19746502',\ '66':'20092028',\ 
'67':'20151112',\ '68':'21112072',\ '69':'21222859',\ '7':'13358154',\ '70':'21223405',\ '71':'21227878',\ '72':'21283385',\ '73':'21674812',\ '74':'23098307',\ '75':'2649838',\ '76':'267116',\ '77':'27375582',\ '78':'2822332',\ '79':'30021713',\ '8':'13541053',\ '80':'3261501',\ '81':'3318841',\ '82':'3323237',\ '83':'4155972',\ '84':'4200327',\ '85':'4433065',\ '86':'4704732',\ '87':'4996210',\ '88':'5326864',\ '89':'6322180',\ '9':'13541117',\ '90':'6323138',\ '91':'6687568',\ '92':'6850955',\ '93':'7109697',\ '94':'7290567',\ '95':'9955016',\ '96':'15677788',\ } self.motif_a_lines = """ 10 columns Num Motifs: 27 2, 1 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494 6, 1 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328 8, 1 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053 9, 1 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117 """.split('\n') self.motif_b_lines = """ MOTIF b 15 columns Num Motifs: 6 2, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494 47, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671 67, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112 81, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841 87, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210 95, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016 **** * ******* ** * Log Motif portion of MAP for motif b = -187.76179 Log Fragmentation portion of MAP for motif b = -7.77486 ------------------------------------------------------------------------- """.split('\n') def test_get_sequence_and_motif_blocks(self): """get_sequence_and_motif_blocks tests.""" seq_motif_lines = ['before line',\ '=====MAP MAXIMIZATION RESULTS=====',\ 'after line'\ ] exp_seq_block=['before line'] exp_motif_block=['=====MAP MAXIMIZATION RESULTS=====','after line'] seq_block,motif_block = get_sequence_and_motif_blocks(seq_motif_lines) self.assertEqual(seq_block,exp_seq_block) self.assertEqual(motif_block,exp_motif_block) def test_get_sequence_map(self): """get_sequence_map tests.""" 
        sequence_map = get_sequence_map(self.gibbs_lines)
        self.assertEqual(sequence_map,self.sequence_map)

    def test_get_motif_blocks(self):
        """get_motif_blocks tests."""
        motif_lines = ['before motifs',\
            'first MOTIF a',\
            ' motif a data',\
            'second MOTIF b',\
            ' motif b data',
            'after motifs'
            ]
        exp_motif_blocks = [['first MOTIF a', 'motif a data'],\
            ['second MOTIF b', 'motif b data', 'after motifs']]
        motif_blocks = get_motif_blocks(motif_lines)
        self.assertEqual(motif_blocks,exp_motif_blocks)

    def test_get_motif_sequences(self):
        """get_motif_sequences tests."""
        motif_list = get_motif_sequences(self.motif_a_lines)
        exp_motif_list = [('2', 71, 'ILAISVDSPFSH', 1.0, '1'),\
            ('6', 65, 'IYAISNDSHFVQ', 1.0, '1'),\
            ('8', 67, 'VISVSEDTVYVH', 1.0, '1'),\
            ('9', 65, 'VIGISVDSPFSL', 1.0, '1')]
        self.assertEqual(motif_list,exp_motif_list)

    def test_get_motif_p_value(self):
        """get_motif_p_value tests."""
        log_list = ['Column 8 : Sequence Description from Fast A input',\
            'Log Motif portion of MAP for motif a = -469.15170',\
            'Log Fragmentation portion of MAP for motif a = -3.80666',\
            ]
        exp_p_val = exp(-469.15170)
        self.assertEqual(get_motif_p_value(log_list),exp_p_val)

    def test_guess_alphabet(self):
        """guess_alphabet tests."""
        motif_list = [('2', 71, 'ILAISVDSPFSH', 1.0, '1'),\
            ('6', 65, 'IYAISNDSHFVQ', 1.0, '1'),\
            ('8', 67, 'VISVSEDTVYVH', 1.0, '1'),\
            ('9', 65, 'VIGISVDSPFSL', 1.0, '1')]
        alphabet = guess_alphabet(motif_list)
        self.assertEqual(alphabet,PROTEIN)

    def test_build_module_objects(self):
        """build_module_objects tests."""
        module = list(build_module_objects(self.motif_b_lines,\
            self.sequence_map))[0]
        exp_module_dict = {('20151112', 153): 'AQYVAAHPGEVCPAKWKEG',\
            ('9955016', 159): 'AFQYTDEHGEVCPAGWKPG',\
            ('16501671', 159): 'ALQFHEEHGDVCPAQWEKG',\
            ('11467494', 160): 'IQYVKENPGYACPVNWNFG',\
            ('4996210', 162): 'SLQLTNTHPVATPVNWKEG',\
            ('3318841', 165): 'SLQLTAEKRVATPVDWKDG',\
            }
        #module.AlignedSeqs.items() == exp_module_dict.items()
        for k1,k2 in zip(module.AlignedSeqs.keys(),\
                exp_module_dict.keys()):
self.assertEqual(k1,k2) v1 = str(module.AlignedSeqs[k1]) v2 = exp_module_dict[k2] self.assertEqual(v1,v2) def test_module_ids_to_int(self): """module_ids_to_int tests.""" module = list(build_module_objects(self.motif_b_lines,\ self.sequence_map))[0] module_ids_to_int([module]) self.assertEqual(module.ID,'0') GIBBS_FILE = """ Gibbs.linux superfamily_aln_gis.fasta 10,15,20,25 5,5,5,5 i = 20 range = 20 high = 11 low = -9 Gibbs 2.06.024 Jul 21 2005 Data file: superfamily_aln_gis.fasta Current directory: /home/widmannj/superfamilies The following options are set: Concentrated Region False Sequence type False Collapsed Alphabet False Pseudocount weight False Use Expectation/Maximization False Don't Xnu sequence False Help flag False Near optimal cutoff False Number of iterations False Don't fragment False Don't use map maximization False Repeat regions False Output file False Informed priors file False Plateau periods False palindromic sequence False Don't Reverse complement False Number of seeds False Seed Value False Pseudosite weight False Suboptimal sampler output False Overlap False Allow width to vary False Wilcoxon signed rank False Sample along length False Output Scan File False Output prior file False Modular Sampler False Ignore Spacing Model False Sample Background False Bkgnd Comp Model False Init from prior False Homologous Seq pairs False Parallel Tempering False Group Sampler False No progress info False Fragment from middle False Verify Mode False Alternate sample on k False No freq. soln. False Calc. def. pseudo wt. False Motif/Recur smpl False Phylogenetic Sampling False Supress Near Opt. 
False Nearopt display cutoff False site_samp = 0 nMotifLen = 10, 15, 20, 25 nAlphaLen = 20 nNumMotifs = 5 ,5 ,5 ,5 dPseudoCntWt = 0.1 dPseudoSiteWt = 0.8 nMaxIterations = 500 lSeedVal = 1149743202 nPlateauPeriods = 20 nSeeds = 10 nNumMotifTypes = 4 dCutoff = 0.01 dNearOptDispCutoff = 0.5 RevComplement = 0 glOverlapParam = 0 Rcutoff factor = 0.001 Post Plateau Samples = 0 Frag/Shft Per. = 5 Frag width = 15,22,30,37 Sequences to be Searched: _________________________ #1 1091044 #2 11467494 #3 11499727 #4 1174686 #5 12044976 #6 13186328 #7 13358154 #8 13541053 #9 13541117 #10 135765 #11 1388082 #12 140543 #13 14286173 #14 14578634 #15 14600438 #16 15218394 #17 15597673 #18 15599256 #19 15602312 #20 15605725 #21 15605963 #22 15609375 #23 15609658 #24 15613511 #25 15614085 #26 15614140 #27 15615431 #28 15643152 #29 15672286 #30 15790738 #31 15791337 #32 15801846 #33 15805225 #34 15805374 #35 15807234 #36 15826629 #37 15899007 #38 15899339 #39 15964668 #40 15966937 #41 15988313 #42 16078864 #43 16123427 #44 16125919 #45 16330420 #46 1633495 #47 16501671 #48 1651717 #49 16759994 #50 16761507 #51 16803644 #52 16804867 #53 17229033 #54 17229859 #55 1729944 #56 17531233 #57 17537401 #58 17547503 #59 18309723 #60 18313548 #61 18406743 #62 19173077 #63 19554157 #64 19705357 #65 19746502 #66 20092028 #67 20151112 #68 21112072 #69 21222859 #70 21223405 #71 21227878 #72 21283385 #73 21674812 #74 23098307 #75 2649838 #76 267116 #77 27375582 #78 2822332 #79 30021713 #80 3261501 #81 3318841 #82 3323237 #83 4155972 #84 4200327 #85 4433065 #86 4704732 #87 4996210 #88 5326864 #89 6322180 #90 6323138 #91 6687568 #92 6850955 #93 7109697 #94 7290567 #95 9955016 #96 15677788 Processed Sequence Length: 16216 Total sequence length: 16307 Seed = 1149743202 motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 
13878 a = 20; b = 55492; p = 0.000357066 ** 1 ** 1 2 3 4[] motif A cycle 4 AP 0.0 (0 sites) [] motif B cycle 4 AP -567.7 (18 sites) [] motif C cycle 4 AP -1161.7 (31 sites) [] motif D cycle 4 AP -245.0 (4 sites) Total Map : 412.899 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 53 5[] motif A cycle 5 AP -26.6 (1 sites) [] motif B cycle 5 AP -426.5 (17 sites) [] motif C cycle 5 AP -1245.5 (33 sites) [------] motif D cycle 5 AP -210.4 (4 sites) Total Map : 499.315 Prev: 412.899 Diff: 86.4157 Motifs: 55 6[] motif A cycle 6 AP -178.0 (10 sites) [] motif B cycle 6 AP -691.2 (26 sites) [] motif C cycle 6 AP -1189.4 (32 sites) [------] motif D cycle 6 AP -315.4 (6 sites) Total Map : 605.7 Prev: 499.315 Diff: 106.385 Motifs: 74 7[] motif A cycle 7 AP -260.3 (16 sites) [] motif B cycle 7 AP -664.9 (25 sites) [] motif C cycle 7 AP -1189.4 (32 sites) [------] motif D cycle 7 AP -366.4 (7 sites) Total Map : 646.637 Prev: 605.7 Diff: 40.937 Motifs: 80 8[] motif A cycle 8 AP -331.5 (20 sites) [] motif B cycle 8 AP -725.3 (27 sites) [] motif C cycle 8 AP -1141.6 (31 sites) [------] motif D cycle 8 AP -366.4 (7 sites) Total Map : 660.38 Prev: 646.637 Diff: 13.7434 Motifs: 85 9 10[] motif A cycle 10 AP -371.3 (22 sites) [] motif B cycle 10 AP -727.7 (27 sites) [] motif C cycle 10 AP -1245.5 (33 sites) [] motif D cycle 10 AP -346.6 (7 sites) Total Map : 671.561 Prev: 660.38 Diff: 11.181 Motifs: 89 11[] motif A cycle 11 AP -365.7 (22 sites) [] motif B cycle 11 AP -760.1 (28 sites) [] motif C cycle 11 AP -1189.4 (32 sites) [] motif D cycle 11 AP -346.6 (7 sites) Total Map : 685.28 Prev: 671.561 Diff: 13.719 Motifs: 89 12[] motif A cycle 12 AP -446.5 (26 sites) [] motif B cycle 12 AP -725.3 (27 sites) [] motif C cycle 12 AP -1141.6 (31 sites) [] motif D cycle 12 AP -346.6 (7 sites) Total Map : 689.5 Prev: 685.28 Diff: 4.21965 Motifs: 91 13 14 15[] motif A cycle 15 AP -422.8 (26 sites) [] motif B cycle 15 AP -726.1 (27 sites) [] motif C cycle 15 AP -1041.5 (29 sites) [] motif D cycle 15 
AP -348.1 (7 sites) Total Map : 714.246 Prev: 689.5 Diff: 24.7462 Motifs: 89 16[] motif A cycle 16 AP -440.3 (27 sites) [] motif B cycle 16 AP -725.3 (27 sites) [] motif C cycle 16 AP -1041.5 (29 sites) [] motif D cycle 16 AP -348.1 (7 sites) Total Map : 717.139 Prev: 714.246 Diff: 2.89264 Motifs: 90 17[] motif A cycle 17 AP -416.4 (26 sites) [] motif B cycle 17 AP -725.3 (27 sites) [] motif C cycle 17 AP -1041.5 (29 sites) [] motif D cycle 17 AP -348.1 (7 sites) Total Map : 723.199 Prev: 717.139 Diff: 6.05975 Motifs: 89 18[] motif A cycle 18 AP -476.5 (29 sites) [] motif B cycle 18 AP -660.6 (25 sites) [] motif C cycle 18 AP -1041.5 (29 sites) [] motif D cycle 18 AP -348.1 (7 sites) Total Map : 725.194 Prev: 723.199 Diff: 1.99565 Motifs: 90 19[] motif A cycle 19 AP -387.7 (25 sites) [] motif B cycle 19 AP -725.3 (27 sites) [] motif C cycle 19 AP -1041.5 (29 sites) [] motif D cycle 19 AP -348.1 (7 sites) Total Map : 730.839 Prev: 725.194 Diff: 5.64428 Motifs: 88 20[] motif A cycle 20 AP -454.8 (28 sites) [] motif B cycle 20 AP -691.2 (26 sites) [] motif C cycle 20 AP -1090.2 (30 sites) [] motif D cycle 20 AP -348.1 (7 sites) Total Map : 740.315 Prev: 730.839 Diff: 9.47605 Motifs: 91 21 22[] motif A cycle 22 AP -347.4 (23 sites) [] motif B cycle 22 AP -691.2 (26 sites) [] motif C cycle 22 AP -1041.0 (29 sites) [] motif D cycle 22 AP -348.1 (7 sites) Total Map : 742.668 Prev: 740.315 Diff: 2.35327 Motifs: 85 23 24[] motif A cycle 24 AP -368.0 (24 sites) [] motif B cycle 24 AP -728.8 (27 sites) [] motif C cycle 24 AP -1041.0 (29 sites) [] motif D cycle 24 AP -348.1 (7 sites) Total Map : 742.863 Prev: 742.668 Diff: 0.19554 Motifs: 87 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 MAX :: 742.863379 (Seed = 1149743202, Iteration = 24 Motif A = 24 Motif B = 27 Motif C = 29 Motif D = 7 ) motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 
a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 2 ** 1 2 3 4[] motif A cycle 4 AP -53.8 (3 sites) [] motif B cycle 4 AP -751.2 (33 sites) [] motif C cycle 4 AP -1873.4 (54 sites) [] motif D cycle 4 AP -667.4 (12 sites) Total Map : 1679.61 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 102 5[] motif A cycle 5 AP -51.0 (3 sites) [] motif B cycle 5 AP -684.4 (33 sites) [] motif C cycle 5 AP -1599.1 (54 sites) [------] motif D cycle 5 AP -534.8 (12 sites) Total Map : 2005.21 Prev: 1679.61 Diff: 325.602 Motifs: 102 6[] motif A cycle 6 AP -71.4 (4 sites) [] motif B cycle 6 AP -741.8 (35 sites) [] motif C cycle 6 AP -1599.1 (54 sites) [------] motif D cycle 6 AP -468.9 (11 sites) Total Map : 2020.82 Prev: 2005.21 Diff: 15.6117 Motifs: 104 7[] motif A cycle 7 AP -51.0 (3 sites) [] motif B cycle 7 AP -741.8 (35 sites) [] motif C cycle 7 AP -1599.1 (54 sites) [------] motif D cycle 7 AP -468.9 (11 sites) Total Map : 2022.05 Prev: 2020.82 Diff: 1.23593 Motifs: 103 8 9 10[] motif A cycle 10 AP -48.5 (3 sites) [] motif B cycle 10 AP -741.8 (35 sites) [] motif C cycle 10 AP -1599.1 (54 sites) [] motif D cycle 10 AP -534.8 (12 sites) Total Map : 2026.1 Prev: 2022.05 Diff: 4.04393 Motifs: 104 11 12 13 14 15[] motif A cycle 15 AP -47.4 (3 sites) [] motif B cycle 15 AP -741.8 (35 sites) [] motif C cycle 15 AP -1599.1 (54 sites) [] motif D cycle 15 AP -534.8 (12 sites) Total Map : 2026.45 Prev: 2026.1 Diff: 0.353957 Motifs: 104 16 17 18[] motif A cycle 18 AP -64.6 (4 sites) [] motif B cycle 18 AP -741.8 (35 sites) [] motif C cycle 18 AP -1599.1 (54 sites) [] motif D cycle 18 AP -468.9 (11 sites) Total Map : 2028.63 Prev: 2026.45 Diff: 2.1753 Motifs: 104 19 20 21 22[] motif A cycle 22 AP -64.6 (4 sites) [] motif B cycle 22 AP -850.9 (38 sites) [] motif C cycle 22 AP -1599.1 (54 sites) [] motif D cycle 22 AP -468.9 (11 sites) Total Map : 2029.4 Prev: 2028.63 Diff: 0.769306 Motifs: 107 23 24 25[] motif A cycle 25 AP -64.6 (4 
sites) [] motif B cycle 25 AP -850.9 (38 sites) [] motif C cycle 25 AP -1599.1 (54 sites) [] motif D cycle 25 AP -534.8 (12 sites) Total Map : 2029.8 Prev: 2029.4 Diff: 0.400632 Motifs: 108 26 27 28 29 30[] motif A cycle 30 AP -47.5 (3 sites) [] motif B cycle 30 AP -850.9 (38 sites) [] motif C cycle 30 AP -1658.1 (55 sites) [] motif D cycle 30 AP -467.4 (11 sites) Total Map : 2035.48 Prev: 2029.8 Diff: 5.68003 Motifs: 107 31 32 33 34[] motif A cycle 34 AP -47.5 (3 sites) [] motif B cycle 34 AP -816.5 (37 sites) [] motif C cycle 34 AP -1599.1 (54 sites) [] motif D cycle 34 AP -467.4 (11 sites) Total Map : 2036.53 Prev: 2035.48 Diff: 1.0477 Motifs: 105 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 MAX :: 2036.525370 (Seed = 1149743202, Iteration = 34 Motif A = 3 Motif B = 37 Motif C = 54 Motif D = 11 ) motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 3 ** 1 2 3 4[] motif A cycle 4 AP -408.4 (26 sites) [] motif B cycle 4 AP -71.4 (2 sites) [] motif C cycle 4 AP -1871.1 (53 sites) [] motif D cycle 4 AP -174.1 (3 sites) Total Map : 1146.38 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 84 5[] motif A cycle 5 AP -296.5 (26 sites) [] motif B cycle 5 AP -71.4 (2 sites) [] motif C cycle 5 AP -1665.3 (53 sites) [] motif D cycle 5 AP -174.1 (3 sites) Total Map : 1455.53 Prev: 1146.38 Diff: 309.149 Motifs: 84 6[] motif A cycle 6 AP -343.8 (29 sites) [] motif B cycle 6 AP -71.4 (2 sites) [] motif C cycle 6 AP -1696.2 (54 sites) [] motif D cycle 6 AP -174.1 (3 sites) Total Map : 1498.45 Prev: 1455.53 Diff: 42.9205 Motifs: 88 7 8[] motif A cycle 8 AP -384.0 (31 sites) [] motif B cycle 8 AP -71.4 (2 sites) [] motif C cycle 8 AP -1696.2 (54 sites) [] motif D cycle 8 AP -174.1 (3 sites) Total Map : 1502.55 Prev: 1498.45 Diff: 4.0994 
Motifs: 90 9[] motif A cycle 9 AP -422.5 (33 sites) [] motif B cycle 9 AP -71.4 (2 sites) [] motif C cycle 9 AP -1696.2 (54 sites) [] motif D cycle 9 AP -174.1 (3 sites) Total Map : 1503.54 Prev: 1502.55 Diff: 0.99024 Motifs: 92 10 11[] motif A cycle 11 AP -419.9 (34 sites) [] motif B cycle 11 AP -71.4 (2 sites) [] motif C cycle 11 AP -1753.3 (55 sites) [] motif D cycle 11 AP -174.1 (3 sites) Total Map : 1516.48 Prev: 1503.54 Diff: 12.938 Motifs: 94 12[] motif A cycle 12 AP -419.9 (34 sites) [] motif B cycle 12 AP -71.4 (2 sites) [] motif C cycle 12 AP -1696.2 (54 sites) [] motif D cycle 12 AP -174.1 (3 sites) Total Map : 1518.67 Prev: 1516.48 Diff: 2.18498 Motifs: 93 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 4 ** 1 2 3 4[] motif A cycle 4 AP 0.0 (0 sites) [] motif B cycle 4 AP -1240.2 (53 sites) [] motif C cycle 4 AP -192.8 (5 sites) [] motif D cycle 4 AP -1139.0 (22 sites) Total Map : 1140.99 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 80 5[] motif A cycle 5 AP -27.5 (1 sites) [] motif B cycle 5 AP -1108.9 (54 sites) [] motif C cycle 5 AP -107.7 (4 sites) [+++] motif D cycle 5 AP -1112.4 (24 sites) Total Map : 1548.32 Prev: 1140.99 Diff: 407.325 Motifs: 83 6 7 8 9[] motif A cycle 9 AP -370.2 (22 sites) [] motif B cycle 9 AP -1103.5 (54 sites) [] motif C cycle 9 AP -1257.6 (33 sites) [+++] motif D cycle 9 AP 0.0 (0 sites) Total Map : 1561.07 Prev: 1548.32 Diff: 12.75 Motifs: 109 10[] motif A cycle 10 AP -434.4 (26 sites) [] motif B cycle 10 AP -1071.7 (53 sites) [] motif C cycle 10 AP -1241.9 (34 sites) [] motif D cycle 10 AP -75.4 (1 sites) Total Map : 1606.41 Prev: 1561.07 Diff: 45.3433 Motifs: 114 11 12[] motif A cycle 12 AP -551.0 (33 sites) [] motif B cycle 12 
AP -1144.9 (55 sites) [] motif C cycle 12 AP -1232.3 (34 sites) [] motif D cycle 12 AP 0.0 (0 sites) Total Map : 1660.34 Prev: 1606.41 Diff: 53.9249 Motifs: 122 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 5 ** 1 2 3 4[] motif A cycle 4 AP -44.7 (2 sites) [] motif B cycle 4 AP -1352.6 (53 sites) [] motif C cycle 4 AP -98.3 (2 sites) [] motif D cycle 4 AP -1454.8 (28 sites) Total Map : 989.762 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 85 5[] motif A cycle 5 AP -61.7 (3 sites) [] motif B cycle 5 AP -1093.0 (53 sites) [] motif C cycle 5 AP -82.2 (2 sites) [-----] motif D cycle 5 AP -1250.5 (29 sites) Total Map : 1543.73 Prev: 989.762 Diff: 553.967 Motifs: 87 6[] motif A cycle 6 AP -117.1 (6 sites) [] motif B cycle 6 AP -1118.4 (54 sites) [] motif C cycle 6 AP -82.2 (2 sites) [-----] motif D cycle 6 AP -1228.9 (29 sites) Total Map : 1579.26 Prev: 1543.73 Diff: 35.536 Motifs: 91 7 8 9 10[] motif A cycle 10 AP -119.2 (6 sites) [] motif B cycle 10 AP -1156.4 (55 sites) [] motif C cycle 10 AP -81.4 (2 sites) [] motif D cycle 10 AP -1227.7 (29 sites) Total Map : 1584.6 Prev: 1579.26 Diff: 5.33507 Motifs: 92 11[] motif A cycle 11 AP -117.1 (6 sites) [] motif B cycle 11 AP -1117.5 (54 sites) [] motif C cycle 11 AP -81.4 (2 sites) [] motif D cycle 11 AP -1227.7 (29 sites) Total Map : 1584.81 Prev: 1584.6 Diff: 0.213828 Motifs: 91 12[] motif A cycle 12 AP -117.1 (6 sites) [] motif B cycle 12 AP -1156.4 (55 sites) [] motif C cycle 12 AP -81.4 (2 sites) [] motif D cycle 12 AP -1227.7 (29 sites) Total Map : 1587.59 Prev: 1584.81 Diff: 2.77506 Motifs: 92 13 14 15 16 17 18 19 20[] motif A cycle 20 AP -97.4 (5 sites) [] motif B cycle 20 AP -1156.8 (55 sites) [] motif C cycle 20 AP -79.9 
(2 sites) [] motif D cycle 20 AP -1227.7 (29 sites) Total Map : 1587.99 Prev: 1587.59 Diff: 0.397059 Motifs: 91 21 22 23 24 25 26 27 28 29 30[] motif A cycle 30 AP -39.0 (2 sites) [] motif B cycle 30 AP -1156.8 (55 sites) [] motif C cycle 30 AP -80.5 (2 sites) [] motif D cycle 30 AP -1227.7 (29 sites) Total Map : 1588.16 Prev: 1587.99 Diff: 0.175261 Motifs: 88 31 32[] motif A cycle 32 AP -58.3 (3 sites) [] motif B cycle 32 AP -1156.8 (55 sites) [] motif C cycle 32 AP -80.5 (2 sites) [] motif D cycle 32 AP -1227.7 (29 sites) Total Map : 1588.59 Prev: 1588.16 Diff: 0.429544 Motifs: 89 33 34 35 36[] motif A cycle 36 AP -232.0 (13 sites) [] motif B cycle 36 AP -1117.1 (54 sites) [] motif C cycle 36 AP 0.0 (0 sites) [] motif D cycle 36 AP -1227.7 (29 sites) Total Map : 1601.78 Prev: 1588.59 Diff: 13.1946 Motifs: 96 37[] motif A cycle 37 AP -229.0 (13 sites) [] motif B cycle 37 AP -1156.8 (55 sites) [] motif C cycle 37 AP -234.4 (5 sites) [] motif D cycle 37 AP -1227.7 (29 sites) Total Map : 1611.86 Prev: 1601.78 Diff: 10.0767 Motifs: 102 38[] motif A cycle 38 AP -247.4 (14 sites) [] motif B cycle 38 AP -1156.8 (55 sites) [] motif C cycle 38 AP -544.7 (13 sites) [] motif D cycle 38 AP -1227.7 (29 sites) Total Map : 1688.66 Prev: 1611.86 Diff: 76.798 Motifs: 111 39 40[] motif A cycle 40 AP -199.9 (12 sites) [] motif B cycle 40 AP -1156.8 (55 sites) [] motif C cycle 40 AP -692.6 (17 sites) [] motif D cycle 40 AP -1227.7 (29 sites) Total Map : 1767.8 Prev: 1688.66 Diff: 79.137 Motifs: 113 41[] motif A cycle 41 AP -311.4 (18 sites) [] motif B cycle 41 AP -1156.8 (55 sites) [] motif C cycle 41 AP -784.5 (19 sites) [] motif D cycle 41 AP -1227.7 (29 sites) Total Map : 1785.93 Prev: 1767.8 Diff: 18.1288 Motifs: 121 42 43 44 45 46[] motif A cycle 46 AP -483.4 (27 sites) [] motif B cycle 46 AP -1120.7 (54 sites) [] motif C cycle 46 AP -928.7 (22 sites) [] motif D cycle 46 AP -1227.7 (29 sites) Total Map : 1799.6 Prev: 1785.93 Diff: 13.6793 Motifs: 132 47[] motif A cycle 47 AP 
-529.5 (29 sites) [] motif B cycle 47 AP -1156.8 (55 sites) [] motif C cycle 47 AP -880.8 (21 sites) [] motif D cycle 47 AP -1227.7 (29 sites) Total Map : 1804.67 Prev: 1799.6 Diff: 5.06939 Motifs: 134 48[] motif A cycle 48 AP -459.6 (26 sites) [] motif B cycle 48 AP -1118.4 (54 sites) [] motif C cycle 48 AP -876.6 (21 sites) [] motif D cycle 48 AP -1227.7 (29 sites) Total Map : 1808.9 Prev: 1804.67 Diff: 4.22417 Motifs: 130 49 50 51 52[] motif A cycle 52 AP -487.1 (27 sites) [] motif B cycle 52 AP -1156.8 (55 sites) [] motif C cycle 52 AP -914.8 (22 sites) [] motif D cycle 52 AP -1227.7 (29 sites) Total Map : 1817.39 Prev: 1808.9 Diff: 8.49225 Motifs: 133 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 6 ** 1 2 3 4[] motif A cycle 4 AP -67.0 (3 sites) [] motif B cycle 4 AP -1223.6 (54 sites) [] motif C cycle 4 AP -145.7 (3 sites) [] motif D cycle 4 AP -224.8 (4 sites) Total Map : 981.16 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 64 5[] motif A cycle 5 AP -37.3 (2 sites) [] motif B cycle 5 AP -989.0 (54 sites) [] motif C cycle 5 AP -304.6 (8 sites) [----] motif D cycle 5 AP -187.9 (4 sites) Total Map : 1275.57 Prev: 981.16 Diff: 294.41 Motifs: 68 6[] motif A cycle 6 AP -81.8 (5 sites) [] motif B cycle 6 AP -989.0 (54 sites) [] motif C cycle 6 AP -590.1 (16 sites) [----] motif D cycle 6 AP -358.3 (7 sites) Total Map : 1416.85 Prev: 1275.57 Diff: 141.283 Motifs: 82 7[] motif A cycle 7 AP -96.8 (6 sites) [] motif B cycle 7 AP -989.0 (54 sites) [] motif C cycle 7 AP -633.2 (17 sites) [----] motif D cycle 7 AP -358.3 (7 sites) Total Map : 1424.61 Prev: 1416.85 Diff: 7.75982 Motifs: 84 8 9 10[] motif A cycle 10 AP -91.9 (6 sites) [] motif B cycle 10 AP -989.0 (54 sites) [] motif 
C cycle 10 AP -674.5 (18 sites) [-] motif D cycle 10 AP -337.3 (7 sites) Total Map : 1448.23 Prev: 1424.61 Diff: 23.6156 Motifs: 85 11[] motif A cycle 11 AP -91.9 (6 sites) [] motif B cycle 11 AP -989.0 (54 sites) [] motif C cycle 11 AP -667.6 (18 sites) [-] motif D cycle 11 AP -337.3 (7 sites) Total Map : 1456.26 Prev: 1448.23 Diff: 8.03531 Motifs: 85 12 13[] motif A cycle 13 AP -114.3 (7 sites) [] motif B cycle 13 AP -989.0 (54 sites) [] motif C cycle 13 AP -854.9 (22 sites) [-] motif D cycle 13 AP -337.3 (7 sites) Total Map : 1463.23 Prev: 1456.26 Diff: 6.96803 Motifs: 90 14[] motif A cycle 14 AP -137.2 (8 sites) [] motif B cycle 14 AP -989.0 (54 sites) [] motif C cycle 14 AP -803.3 (21 sites) [-] motif D cycle 14 AP -337.3 (7 sites) Total Map : 1465.75 Prev: 1463.23 Diff: 2.52048 Motifs: 90 15[] motif A cycle 15 AP -134.8 (8 sites) [] motif B cycle 15 AP -989.0 (54 sites) [] motif C cycle 15 AP -803.3 (21 sites) [+] motif D cycle 15 AP -336.6 (7 sites) Total Map : 1470.32 Prev: 1465.75 Diff: 4.56864 Motifs: 90 16[] motif A cycle 16 AP -108.3 (7 sites) [] motif B cycle 16 AP -989.0 (54 sites) [] motif C cycle 16 AP -850.5 (22 sites) [+] motif D cycle 16 AP -411.5 (8 sites) Total Map : 1480.76 Prev: 1470.32 Diff: 10.4373 Motifs: 91 17 18 19 20[] motif A cycle 20 AP -128.6 (8 sites) [] motif B cycle 20 AP -989.0 (54 sites) [] motif C cycle 20 AP -940.1 (24 sites) [--] motif D cycle 20 AP -405.0 (8 sites) Total Map : 1500.43 Prev: 1480.76 Diff: 19.6736 Motifs: 94 21[] motif A cycle 21 AP -108.3 (7 sites) [] motif B cycle 21 AP -989.0 (54 sites) [] motif C cycle 21 AP -940.1 (24 sites) [--] motif D cycle 21 AP -405.0 (8 sites) Total Map : 1501.82 Prev: 1500.43 Diff: 1.38679 Motifs: 93 22[] motif A cycle 22 AP -108.3 (7 sites) [] motif B cycle 22 AP -989.0 (54 sites) [] motif C cycle 22 AP -986.8 (25 sites) [--] motif D cycle 22 AP -405.0 (8 sites) Total Map : 1503.53 Prev: 1501.82 Diff: 1.70712 Motifs: 94 23 24 25 26 27[] motif A cycle 27 AP -128.6 (8 sites) [] 
motif B cycle 27 AP -989.0 (54 sites) [] motif C cycle 27 AP -939.0 (24 sites) [] motif D cycle 27 AP -404.6 (8 sites) Total Map : 1504 Prev: 1503.53 Diff: 0.475394 Motifs: 94 28[] motif A cycle 28 AP -108.3 (7 sites) [] motif B cycle 28 AP -989.0 (54 sites) [] motif C cycle 28 AP -939.0 (24 sites) [] motif D cycle 28 AP -404.6 (8 sites) Total Map : 1505.38 Prev: 1504 Diff: 1.37798 Motifs: 93 29 30 31 32 33 34 35[] motif A cycle 35 AP -142.7 (9 sites) [] motif B cycle 35 AP -989.0 (54 sites) [] motif C cycle 35 AP -889.8 (23 sites) [] motif D cycle 35 AP -471.7 (9 sites) Total Map : 1508.79 Prev: 1505.38 Diff: 3.40696 Motifs: 95 36 37[] motif A cycle 37 AP -159.8 (10 sites) [] motif B cycle 37 AP -989.0 (54 sites) [] motif C cycle 37 AP -939.0 (24 sites) [] motif D cycle 37 AP -404.6 (8 sites) Total Map : 1514.12 Prev: 1508.79 Diff: 5.3377 Motifs: 96 38[] motif A cycle 38 AP -159.8 (10 sites) [] motif B cycle 38 AP -989.0 (54 sites) [] motif C cycle 38 AP -986.4 (25 sites) [] motif D cycle 38 AP -404.6 (8 sites) Total Map : 1514.79 Prev: 1514.12 Diff: 0.670946 Motifs: 97 39 40[] motif A cycle 40 AP -175.3 (11 sites) [] motif B cycle 40 AP -989.0 (54 sites) [] motif C cycle 40 AP -939.0 (24 sites) [] motif D cycle 40 AP -404.6 (8 sites) Total Map : 1518.92 Prev: 1514.79 Diff: 4.12608 Motifs: 97 41[] motif A cycle 41 AP -151.0 (10 sites) [] motif B cycle 41 AP -989.0 (54 sites) [] motif C cycle 41 AP -937.3 (24 sites) [] motif D cycle 41 AP -404.6 (8 sites) Total Map : 1522.39 Prev: 1518.92 Diff: 3.46783 Motifs: 96 42[] motif A cycle 42 AP -132.3 (9 sites) [] motif B cycle 42 AP -989.0 (54 sites) [] motif C cycle 42 AP -986.4 (25 sites) [] motif D cycle 42 AP -404.6 (8 sites) Total Map : 1523.36 Prev: 1522.39 Diff: 0.973725 Motifs: 96 43 44 45 46 47 48 49 50[] motif A cycle 50 AP -151.0 (10 sites) [] motif B cycle 50 AP -989.0 (54 sites) [] motif C cycle 50 AP -986.4 (25 sites) [] motif D cycle 50 AP -404.6 (8 sites) Total Map : 1524.12 Prev: 1523.36 Diff: 0.756895 
Motifs: 97 51 52 53 54[] motif A cycle 54 AP -169.9 (11 sites) [] motif B cycle 54 AP -989.0 (54 sites) [] motif C cycle 54 AP -939.0 (24 sites) [] motif D cycle 54 AP -404.6 (8 sites) Total Map : 1524.49 Prev: 1524.12 Diff: 0.371963 Motifs: 97 55 56 57 58 59 60 61 62 63 64 65[] motif A cycle 65 AP -151.0 (10 sites) [] motif B cycle 65 AP -989.0 (54 sites) [] motif C cycle 65 AP -986.8 (25 sites) [] motif D cycle 65 AP -402.9 (8 sites) Total Map : 1529.11 Prev: 1524.49 Diff: 4.61554 Motifs: 97 66[] motif A cycle 66 AP -169.9 (11 sites) [] motif B cycle 66 AP -989.0 (54 sites) [] motif C cycle 66 AP -986.8 (25 sites) [] motif D cycle 66 AP -402.9 (8 sites) Total Map : 1530.16 Prev: 1529.11 Diff: 1.05073 Motifs: 98 67 68 69 70[] motif A cycle 70 AP -169.9 (11 sites) [] motif B cycle 70 AP -989.0 (54 sites) [] motif C cycle 70 AP -986.4 (25 sites) [] motif D cycle 70 AP -402.9 (8 sites) Total Map : 1532.06 Prev: 1530.16 Diff: 1.89983 Motifs: 98 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 7 ** 1 2 3 4[] motif A cycle 4 AP -201.0 (12 sites) [] motif B cycle 4 AP -113.6 (3 sites) [] motif C cycle 4 AP -930.1 (23 sites) [] motif D cycle 4 AP -2959.1 (54 sites) Total Map : 903.055 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 92 5[] motif A cycle 5 AP -150.4 (11 sites) [] motif B cycle 5 AP -117.6 (4 sites) [] motif C cycle 5 AP -871.9 (23 sites) [] motif D cycle 5 AP -2452.5 (52 sites) Total Map : 1437.59 Prev: 903.055 Diff: 534.536 Motifs: 90 6[] motif A cycle 6 AP -185.8 (13 sites) [] motif B cycle 6 AP -63.1 (2 sites) [] motif C cycle 6 AP -817.6 (22 sites) [] motif D cycle 6 AP -2556.7 (54 sites) Total Map : 1469.43 Prev: 1437.59 Diff: 31.8425 Motifs: 91 7[] motif A cycle 7 
AP -185.8 (13 sites) [] motif B cycle 7 AP -63.1 (2 sites) [] motif C cycle 7 AP -817.6 (22 sites) [] motif D cycle 7 AP -2469.2 (53 sites) Total Map : 1490.24 Prev: 1469.43 Diff: 20.8077 Motifs: 90 8 9 10[] motif A cycle 10 AP -185.8 (13 sites) [] motif B cycle 10 AP -60.9 (2 sites) [] motif C cycle 10 AP -923.4 (24 sites) [--] motif D cycle 10 AP -2454.7 (53 sites) Total Map : 1505.14 Prev: 1490.24 Diff: 14.9023 Motifs: 92 11 12 13[] motif A cycle 13 AP -185.8 (13 sites) [] motif B cycle 13 AP -60.9 (2 sites) [] motif C cycle 13 AP -817.6 (22 sites) [--] motif D cycle 13 AP -2454.7 (53 sites) Total Map : 1505.52 Prev: 1505.14 Diff: 0.372181 Motifs: 90 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 8 ** 1 2 3 4[] motif A cycle 4 AP -56.8 (3 sites) [] motif B cycle 4 AP -594.9 (23 sites) [] motif C cycle 4 AP -194.1 (4 sites) [] motif D cycle 4 AP -367.4 (6 sites) Total Map : 271.959 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 36 5[] motif A cycle 5 AP -46.0 (3 sites) [] motif B cycle 5 AP -486.2 (21 sites) [] motif C cycle 5 AP -979.1 (29 sites) [-----] motif D cycle 5 AP -276.6 (6 sites) Total Map : 840.305 Prev: 271.959 Diff: 568.345 Motifs: 59 6[] motif A cycle 6 AP -46.0 (3 sites) [] motif B cycle 6 AP -481.1 (20 sites) [] motif C cycle 6 AP -1396.9 (39 sites) [-----] motif D cycle 6 AP -486.7 (10 sites) Total Map : 950.671 Prev: 840.305 Diff: 110.367 Motifs: 72 7 8[] motif A cycle 8 AP -46.0 (3 sites) [] motif B cycle 8 AP -316.3 (10 sites) [] motif C cycle 8 AP -3004.0 (74 sites) [-----] motif D cycle 8 AP -552.2 (11 sites) Total Map : 954.552 Prev: 950.671 Diff: 3.88022 Motifs: 98 9 10[] motif A cycle 10 AP -46.5 (3 sites) [] motif B cycle 10 AP -243.9 (8 sites) [] 
motif C cycle 10 AP -2992.1 (74 sites) [-] motif D cycle 10 AP -530.1 (11 sites) Total Map : 1001.93 Prev: 954.552 Diff: 47.3832 Motifs: 96 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 9 ** 1 2 3 4[] motif A cycle 4 AP -86.2 (4 sites) [] motif B cycle 4 AP -122.1 (3 sites) [] motif C cycle 4 AP -2149.3 (51 sites) [] motif D cycle 4 AP -234.1 (4 sites) Total Map : 509.201 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 62 5[] motif A cycle 5 AP -66.9 (4 sites) [] motif B cycle 5 AP -75.7 (3 sites) [] motif C cycle 5 AP -1697.1 (50 sites) [] motif D cycle 5 AP -461.9 (9 sites) Total Map : 1092.03 Prev: 509.201 Diff: 582.825 Motifs: 66 6[] motif A cycle 6 AP -311.5 (21 sites) [] motif B cycle 6 AP -96.6 (3 sites) [] motif C cycle 6 AP -1790.1 (52 sites) [] motif D cycle 6 AP -742.9 (14 sites) Total Map : 1213.73 Prev: 1092.03 Diff: 121.707 Motifs: 90 7[] motif A cycle 7 AP -368.8 (24 sites) [] motif B cycle 7 AP -112.7 (4 sites) [] motif C cycle 7 AP -1737.2 (51 sites) [] motif D cycle 7 AP -1041.8 (19 sites) Total Map : 1253.72 Prev: 1213.73 Diff: 39.9908 Motifs: 98 8 9 10[] motif A cycle 10 AP -481.8 (31 sites) [] motif B cycle 10 AP -58.0 (2 sites) [] motif C cycle 10 AP -1790.1 (52 sites) [] motif D cycle 10 AP -1237.8 (23 sites) Total Map : 1285.2 Prev: 1253.72 Diff: 31.4769 Motifs: 108 11[] motif A cycle 11 AP -489.8 (32 sites) [] motif B cycle 11 AP -99.2 (3 sites) [] motif C cycle 11 AP -1790.1 (52 sites) [] motif D cycle 11 AP -1417.6 (26 sites) Total Map : 1303.74 Prev: 1285.2 Diff: 18.5344 Motifs: 113 12[] motif A cycle 12 AP -492.2 (32 sites) [] motif B cycle 12 AP -58.0 (2 sites) [] motif C cycle 12 AP -1842.5 (53 sites) [] motif D cycle 12 AP -1419.1 (26 sites) 
Total Map : 1309.87 Prev: 1303.74 Diff: 6.13088 Motifs: 113 13 14 15 16 17[] motif A cycle 17 AP -469.5 (31 sites) [] motif B cycle 17 AP -58.7 (2 sites) [] motif C cycle 17 AP -1790.1 (52 sites) [] motif D cycle 17 AP -1419.1 (26 sites) Total Map : 1311.38 Prev: 1309.87 Diff: 1.50905 Motifs: 111 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771 motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158 motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232 motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066 ** 10 ** 1 2 3 4[] motif A cycle 4 AP -374.5 (22 sites) [] motif B cycle 4 AP -74.4 (2 sites) [] motif C cycle 4 AP -1774.4 (54 sites) [] motif D cycle 4 AP -1794.7 (36 sites) Total Map : 1821.24 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 114 5[] motif A cycle 5 AP -364.1 (22 sites) [] motif B cycle 5 AP -74.4 (2 sites) [] motif C cycle 5 AP -1607.6 (54 sites) [++] motif D cycle 5 AP -1738.3 (36 sites) Total Map : 2005.44 Prev: 1821.24 Diff: 184.204 Motifs: 114 6[] motif A cycle 6 AP -355.0 (22 sites) [] motif B cycle 6 AP -74.4 (2 sites) [] motif C cycle 6 AP -1607.6 (54 sites) [++] motif D cycle 6 AP -1800.3 (37 sites) Total Map : 2026.99 Prev: 2005.44 Diff: 21.5427 Motifs: 115 7[] motif A cycle 7 AP -305.7 (20 sites) [] motif B cycle 7 AP -74.4 (2 sites) [] motif C cycle 7 AP -1607.6 (54 sites) [++] motif D cycle 7 AP -1800.3 (37 sites) Total Map : 2042.47 Prev: 2026.99 Diff: 15.4851 Motifs: 113 8[] motif A cycle 8 AP -360.2 (23 sites) [] motif B cycle 8 AP -74.4 (2 sites) [] motif C cycle 8 AP -1607.6 (54 sites) [++] motif D cycle 8 AP -1734.6 (36 sites) Total Map : 2049.89 Prev: 2042.47 Diff: 7.41789 Motifs: 115 9 10[] motif A cycle 10 AP -321.7 (22 sites) [] motif B cycle 10 AP -74.4 (2 sites) [] motif C cycle 10 AP -1607.6 (54 sites) [] motif D cycle 10 AP -1800.3 (37 sites) Total Map : 2065.41 Prev: 2049.89 Diff: 15.5161 Motifs: 
115 11[] motif A cycle 11 AP -345.0 (23 sites) [] motif B cycle 11 AP -74.4 (2 sites) [] motif C cycle 11 AP -1607.6 (54 sites) [] motif D cycle 11 AP -1800.3 (37 sites) Total Map : 2068.72 Prev: 2065.41 Diff: 3.31307 Motifs: 116 12 13 14 15 16[] motif A cycle 16 AP -467.4 (29 sites) [] motif B cycle 16 AP -74.4 (2 sites) [] motif C cycle 16 AP -1607.6 (54 sites) [] motif D cycle 16 AP -1800.3 (37 sites) Total Map : 2074.61 Prev: 2068.72 Diff: 5.89359 Motifs: 122 17[] motif A cycle 17 AP -469.2 (29 sites) [] motif B cycle 17 AP -74.4 (2 sites) [] motif C cycle 17 AP -1607.6 (54 sites) [] motif D cycle 17 AP -1800.3 (37 sites) Total Map : 2075.76 Prev: 2074.61 Diff: 1.14881 Motifs: 122 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32[] motif A cycle 32 AP -377.5 (25 sites) [] motif B cycle 32 AP -107.2 (3 sites) [] motif C cycle 32 AP -1607.6 (54 sites) [] motif D cycle 32 AP -1800.3 (37 sites) Total Map : 2090.08 Prev: 2075.76 Diff: 14.3209 Motifs: 119 33 34[] motif A cycle 34 AP -398.0 (26 sites) [] motif B cycle 34 AP -107.2 (3 sites) [] motif C cycle 34 AP -1607.6 (54 sites) [] motif D cycle 34 AP -1800.3 (37 sites) Total Map : 2090.33 Prev: 2090.08 Diff: 0.249073 Motifs: 120 35[] motif A cycle 35 AP -377.0 (25 sites) [] motif B cycle 35 AP -89.9 (3 sites) [] motif C cycle 35 AP -1607.6 (54 sites) [] motif D cycle 35 AP -1800.3 (37 sites) Total Map : 2095.44 Prev: 2090.33 Diff: 5.10984 Motifs: 119 36[] motif A cycle 36 AP -377.0 (25 sites) [] motif B cycle 36 AP -159.5 (5 sites) [] motif C cycle 36 AP -1607.6 (54 sites) [] motif D cycle 36 AP -1800.3 (37 sites) Total Map : 2099.24 Prev: 2095.44 Diff: 3.79902 Motifs: 121 37[] motif A cycle 37 AP -377.0 (25 sites) [] motif B cycle 37 AP -160.6 (5 sites) [] motif C cycle 37 AP -1607.6 (54 sites) [] motif D cycle 37 AP -1800.3 (37 sites) Total Map : 2102.31 Prev: 2099.24 Diff: 3.07348 Motifs: 121 38 39 40 41[] motif A cycle 41 AP -356.4 (24 sites) [] motif B cycle 41 AP -160.7 (5 sites) [] motif C cycle 41 AP -1607.6 
(54 sites) [] motif D cycle 41 AP -1800.3 (37 sites) Total Map : 2103.18 Prev: 2102.31 Diff: 0.864532 Motifs: 120 42 43[] motif A cycle 43 AP -377.0 (25 sites) [] motif B cycle 43 AP -160.7 (5 sites) [] motif C cycle 43 AP -1607.6 (54 sites) [] motif D cycle 43 AP -1800.3 (37 sites) Total Map : 2103.36 Prev: 2103.18 Diff: 0.182404 Motifs: 121 44 45[] motif A cycle 45 AP -422.3 (27 sites) [] motif B cycle 45 AP -229.0 (7 sites) [] motif C cycle 45 AP -1607.6 (54 sites) [] motif D cycle 45 AP -1800.3 (37 sites) Total Map : 2104.94 Prev: 2103.36 Diff: 1.57855 Motifs: 125 46[] motif A cycle 46 AP -422.3 (27 sites) [] motif B cycle 46 AP -187.8 (6 sites) [] motif C cycle 46 AP -1607.6 (54 sites) [] motif D cycle 46 AP -1668.3 (35 sites) Total Map : 2109.23 Prev: 2104.94 Diff: 4.29029 Motifs: 122 47[] motif A cycle 47 AP -446.7 (28 sites) [] motif B cycle 47 AP -187.8 (6 sites) [] motif C cycle 47 AP -1607.6 (54 sites) [] motif D cycle 47 AP -1668.3 (35 sites) Total Map : 2109.87 Prev: 2109.23 Diff: 0.644012 Motifs: 123 48 49[] motif A cycle 49 AP -490.2 (30 sites) [] motif B cycle 49 AP -187.8 (6 sites) [] motif C cycle 49 AP -1607.6 (54 sites) [] motif D cycle 49 AP -1668.3 (35 sites) Total Map : 2110.99 Prev: 2109.87 Diff: 1.11211 Motifs: 125 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 MAX :: 2110.985589 (Seed = 1149743202, Iteration = 49 Motif A = 30 Motif B = 6 Motif C = 54 Motif D = 35 ) Max subopt MAP found on seed 10 ======================== NEAR OPTIMAL RESULTS ======================== ====================================================================== MAP = 584 maybe = 588 discard = 64640 Max set 2111.157969 at 4 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195 200 205 210 215 220 225 230 235 240 245 250 255 260 265 270 275 280 285 290 295 300 305 310 315 320 325 330 335 340 345 350 355 360 365 370 375 380 385 390 395 400 405 410 415 420 425 430 435 440 445 
450 455 460 465 470 475 480 485 490 495 500 ============================================================= ====== Results by Sequence ===== ============================================================= 1, 1, 3 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044 1, 2, 0 79 agvda VIVLSANDPFVQ safgk 90 1.00 F 1091044 2, 1, 3 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494 2, 2, 0 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494 2, 3, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494 3, 1, 2 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727 4, 1, 2 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686 5, 1, 2 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976 6, 1, 3 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 1.00 F 13186328 6, 2, 0 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328 7, 1, 2 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154 8, 1, 3 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053 8, 2, 0 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053 9, 1, 3 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117 9, 2, 0 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117 10, 1, 2 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765 11, 1, 2 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082 12, 1, 2 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543 13, 1, 3 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173 13, 2, 0 65 eldce LVGLSVDQVFSH ikwie 76 1.00 F 14286173 14, 1, 2 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634 15, 1, 3 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438 15, 2, 0 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438 16, 1, 2 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394 17, 1, 2 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673 18, 1, 2 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256 19, 1, 2 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312 20, 1, 2 61 eefkg 
KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725 21, 1, 0 80 megvd VTVVSMDLPFAQ krfce 91 1.00 F 15605963 22, 1, 3 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375 23, 1, 3 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658 23, 2, 0 70 tagln VVGISPDKPEKL atfrd 81 1.00 F 15609658 24, 1, 3 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511 24, 2, 0 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511 25, 1, 2 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085 26, 1, 2 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140 27, 1, 2 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431 28, 1, 3 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152 28, 2, 0 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152 30, 1, 3 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738 30, 2, 0 101 tpgla VWGISPDSTYAH eafad 112 1.00 F 15790738 31, 1, 2 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337 32, 1, 0 80 idntv VLCISADLPFAQ srfcg 91 1.00 F 15801846 33, 1, 2 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225 34, 1, 2 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374 35, 1, 3 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234 35, 2, 0 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234 36, 1, 3 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629 37, 1, 2 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007 38, 1, 3 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339 38, 2, 0 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339 39, 1, 3 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668 40, 1, 2 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937 41, 1, 2 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313 42, 1, 2 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864 43, 1, 2 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427 44, 1, 3 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919 46, 1, 2 21 vlkad 
GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495 47, 1, 3 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671 47, 2, 0 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671 47, 3, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671 48, 1, 2 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717 49, 1, 2 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994 50, 1, 2 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507 51, 1, 3 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644 52, 1, 2 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867 53, 1, 3 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033 53, 2, 0 73 ngvde IVCISVNDAFVM newak 84 1.00 F 17229033 54, 1, 2 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859 55, 1, 2 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944 56, 1, 2 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233 57, 1, 2 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401 58, 1, 2 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503 59, 1, 2 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723 60, 1, 3 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548 60, 2, 0 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548 61, 1, 2 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743 61, 2, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743 62, 1, 3 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077 62, 2, 0 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077 63, 1, 2 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157 64, 1, 2 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357 66, 1, 2 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028 67, 1, 3 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112 67, 2, 0 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112 67, 3, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112 68, 1, 3 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072 
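The site records above follow a fixed layout, described by the "Column 1..8" legend later in this output: sequence number/site number/motif type, left end location, left flank, motif element, right flank, right end location, element probability, strand (F/R), and the FastA sequence description. A minimal parsing sketch, assuming that layout (the `parse_site` helper and its regex are hypothetical, not part of the sampler; the per-motif listings further below omit the motif-type field and would need a slightly different pattern):

```python
import re

# Assumed record shape (from the Column 1-8 legend in this output), e.g.:
#   "68, 1, 3 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072"
SITE_RE = re.compile(
    r"(\d+),\s*(\d+),\s*(\d+)\s+"            # sequence no., site no., motif type
    r"(\d+)\s+(\S*)\s+([A-Z]+)\s+(\S*)\s+"   # left end, left flank, element, right flank
    r"(\d+)\s+([\d.]+)\s+([FR])\s+(\S+)"     # right end, probability, strand, description
)

def parse_site(line):
    """Parse one 'Results by Sequence' record into a dict; None if no match."""
    m = SITE_RE.match(line.strip())
    if m is None:
        return None
    seq, site, motif, left, lflank, element, rflank, right, prob, strand, desc = m.groups()
    return {
        "seq": int(seq), "site": int(site), "motif": int(motif),
        "left": int(left), "element": element, "right": int(right),
        "prob": float(prob), "strand": strand, "description": desc,
    }

rec = parse_site(
    "68, 1, 3 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072"
)
```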
69, 1, 2 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859 70, 1, 3 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405 70, 2, 0 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405 71, 1, 3 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878 72, 1, 0 78 keegi VLTISADLPFAQ krwca 89 1.00 F 21283385 73, 1, 3 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812 73, 2, 0 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812 74, 1, 2 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307 76, 1, 2 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116 77, 1, 2 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582 78, 1, 2 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332 79, 1, 2 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713 80, 1, 2 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501 81, 1, 3 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841 81, 2, 0 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841 81, 3, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841 82, 1, 2 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237 83, 1, 2 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972 84, 1, 2 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327 85, 1, 3 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065 85, 2, 0 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065 86, 1, 3 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732 86, 2, 0 74 kgide IICFSVNDPFVM kawgk 85 1.00 F 4704732 87, 1, 3 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210 87, 2, 0 68 klnck LIGFSCNSKESH dqwie 79 1.00 F 4996210 87, 3, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210 88, 1, 3 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864 89, 1, 3 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180 90, 1, 3 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 1.00 F 6323138 91, 1, 2 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568 92, 1, 0 68 
klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955 93, 1, 2 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697 94, 1, 2 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567 95, 1, 3 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016 95, 2, 0 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016 95, 3, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016 96, 1, 2 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788 124 motifs Column 1 : Sequence Number, Site Number Column 2 : Motif type Column 3 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input ------------------------------------------------------------------------- MOTIF a Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | . 68 . . . . . . 20 . . 10 . . . . . . . . 2.4 2 | . 17 . . . . . . 34 3 . 34 . . 6 . . . . 3 1.7 3 | 13 6 10 . . . 51 . . . . 3 . . . . . . 10 3 1.9 4 | . 31 . . . 10 . . 48 . . 10 . . . . . . . . 2.1 5 | . . . . . . . . . . . . . 3 . . . . 96 . 3.8 7 | . . . 86 . . . . . . . . . 13 . . . . . . 3.2 8 | . . . 10 . . . . . . 3 10 . . . 6 3 . 58 6 2.0 9 | 3 34 . . 6 . . 6 . . 3 . . . . 37 3 . . 3 1.8 10 | . . . . 24 58 . . . . . . . . 17 . . . . . 2.9 12 | . . . . . . . 55 . . . 13 6 6 . . 17 . . . 3.4 nonsite 8 8 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5 site 1 15 1 9 3 6 5 6 10 . . 8 . 2 2 4 2 . 16 1 Motif probability model ____________________________________________ Pos. 
# a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.001 0.679 0.000 0.001 0.001 0.001 0.001 0.000 0.204 0.000 0.001 0.103 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001 2 | 0.001 0.171 0.000 0.001 0.001 0.001 0.001 0.000 0.340 0.034 0.001 0.341 0.000 0.001 0.068 0.001 0.001 0.001 0.001 0.035 3 | 0.137 0.069 0.102 0.001 0.001 0.001 0.510 0.000 0.001 0.000 0.001 0.035 0.000 0.001 0.000 0.001 0.001 0.001 0.103 0.035 4 | 0.001 0.306 0.000 0.001 0.001 0.103 0.001 0.000 0.476 0.000 0.001 0.103 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001 5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.035 0.000 0.001 0.001 0.001 0.950 0.001 7 | 0.001 0.001 0.000 0.849 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.136 0.000 0.001 0.001 0.001 0.001 0.001 8 | 0.001 0.001 0.000 0.103 0.001 0.001 0.001 0.000 0.001 0.000 0.035 0.103 0.000 0.001 0.000 0.069 0.034 0.001 0.577 0.069 9 | 0.035 0.340 0.000 0.001 0.069 0.001 0.001 0.068 0.001 0.000 0.035 0.002 0.000 0.001 0.000 0.374 0.034 0.001 0.001 0.035 10 | 0.001 0.001 0.000 0.001 0.238 0.577 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.001 0.170 0.001 0.001 0.001 0.001 0.001 12 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.543 0.001 0.000 0.001 0.137 0.068 0.068 0.000 0.001 0.170 0.001 0.001 0.001 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 10 columns Num Motifs: 29 1, 1 79 agvda VIVLSANDPFVQ safgk 90 1.00 F 1091044 2, 1 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494 6, 1 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328 8, 1 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053 9, 1 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117 13, 1 65 eldce LVGLSVDQVFSH ikwie 76 1.00 F 14286173 15, 1 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438 21, 1 80 megvd VTVVSMDLPFAQ krfce 91 1.00 F 15605963 23, 1 70 tagln VVGISPDKPEKL atfrd 81 1.00 F 15609658 24, 1 64 glntv 
ILGVSPDPVERH kkfie 75 1.00 F 15613511 28, 1 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152 30, 1 101 tpgla VWGISPDSTYAH eafad 112 1.00 F 15790738 32, 1 80 idntv VLCISADLPFAQ srfcg 91 1.00 F 15801846 35, 1 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234 38, 1 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339 47, 1 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671 53, 1 73 ngvde IVCISVNDAFVM newak 84 1.00 F 17229033 60, 1 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548 62, 1 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077 67, 1 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112 70, 1 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405 72, 1 78 keegi VLTISADLPFAQ krwca 89 1.00 F 21283385 73, 1 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812 81, 1 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841 85, 1 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065 86, 1 74 kgide IICFSVNDPFVM kawgk 85 1.00 F 4704732 87, 1 68 klnck LIGFSCNSKESH dqwie 79 1.00 F 4996210 92, 1 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955 95, 1 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016 ***** **** * Column 1 : Sequence Number, Site Number Column 2 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input Log Motif portion of MAP for motif a = -469.15170 Log Fragmentation portion of MAP for motif a = -3.80666 ============================================================= ====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME ===== ====== Motif a ===== ============================================================= Listing of those elements occurring greater than 50% of the time in near optimal sampling using 500 iterations Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | . 74 . . . . . . 14 . . 
11 . . . . . . . . 2.5 2 | . 14 . . . 3 . . 29 3 . 37 . . 7 . . . . 3 1.6 3 | 14 3 3 . . . 59 . . . . 3 . . . . . . 11 3 1.9 4 | . 33 . . . 7 . . 48 . . 11 . . . . . . . . 2.1 5 | . . . . . . . . . . . . . 3 . . . . 96 . 3.8 7 | . . . 96 . . . . . . . . . 3 . . . . . . 3.5 8 | . . . . . . . . . . 3 11 . . . 7 3 . 66 7 2.4 9 | . 40 . . 7 . . 7 . . 3 . . . . 33 3 . . 3 1.9 10 | . . . . 25 51 . . . . . . . . 18 . . . . 3 2.7 12 | . . . . . . . 59 . . . 14 . 7 . . 18 . . . 3.6 nonsite 8 8 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5 site 1 16 . 9 3 6 5 6 9 . . 8 . 1 2 4 2 . 17 2 Motif probability model ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.002 0.729 0.000 0.001 0.001 0.001 0.001 0.000 0.147 0.000 0.001 0.111 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001 2 | 0.002 0.147 0.000 0.001 0.001 0.037 0.001 0.000 0.292 0.037 0.001 0.365 0.000 0.001 0.073 0.001 0.001 0.001 0.001 0.037 3 | 0.147 0.038 0.037 0.001 0.001 0.001 0.583 0.000 0.001 0.000 0.001 0.038 0.000 0.001 0.000 0.001 0.001 0.001 0.110 0.037 4 | 0.002 0.329 0.000 0.001 0.001 0.074 0.001 0.000 0.474 0.000 0.001 0.111 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001 5 | 0.002 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.037 0.000 0.001 0.001 0.001 0.946 0.001 7 | 0.002 0.001 0.000 0.947 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.037 0.000 0.001 0.001 0.001 0.001 0.001 8 | 0.002 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.038 0.111 0.000 0.001 0.000 0.074 0.037 0.001 0.655 0.074 9 | 0.002 0.401 0.000 0.001 0.074 0.001 0.001 0.073 0.001 0.000 0.038 0.002 0.000 0.001 0.000 0.328 0.037 0.001 0.001 0.037 10 | 0.002 0.001 0.000 0.001 0.256 0.510 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.001 0.182 0.001 0.001 0.001 0.001 0.037 12 | 0.002 0.001 0.000 0.001 0.001 0.001 0.001 0.582 0.001 0.000 0.001 0.147 0.000 0.073 0.000 0.001 0.182 0.001 0.001 0.001 Background probability 
model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 10 columns Num Motifs: 27 2, 1 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494 6, 1 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328 8, 1 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053 9, 1 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117 13, 1 65 eldce LVGLSVDQVFSH ikwie 76 0.98 F 14286173 15, 1 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438 21, 1 80 megvd VTVVSMDLPFAQ krfce 91 0.65 F 15605963 23, 1 70 tagln VVGISPDKPEKL atfrd 81 0.99 F 15609658 24, 1 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511 28, 1 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152 30, 1 101 tpgla VWGISPDSTYAH eafad 112 0.98 F 15790738 32, 1 80 idntv VLCISADLPFAQ srfcg 91 0.99 F 15801846 35, 1 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234 38, 1 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339 47, 1 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671 60, 1 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548 62, 1 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077 67, 1 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112 70, 1 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405 72, 1 78 keegi VLTISADLPFAQ krwca 89 0.99 F 21283385 73, 1 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812 81, 1 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841 85, 1 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065 87, 1 68 klnck LIGFSCNSKESH dqwie 79 0.56 F 4996210 89, 1 127 kkyaa VFGLSADSVTSQ kkfqs 138 0.51 F 6322180 92, 1 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955 95, 1 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016 ***** **** * ------------------------------------------------------------------------- MOTIF b Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | 50 . . . . . . . 16 . . . . . . . . . 33 . 1.9 2 | . . . . . 16 . . . . . 50 . . . . 33 . . . 
2.1 3 | . . . . . . . . . . . . . . 33 . 66 . . . 3.4 4 | . 33 . . . 16 . . . . . 33 . . 16 . . . . . 1.7 6 | 33 . . 16 33 . . . . . . . . 16 . . . . . . 1.5 8 | . . . . . . . 50 . . 16 . . . . 33 . . . . 3.2 9 | . . . . . . 66 . . . . . . . . 16 . 16 . . 2.3 10 | . 33 . 16 33 . . . . . . . . . 16 . . . . . 1.7 11 | 50 50 . . . . . . . . . . . . . . . . . . 2.1 12 | . . 66 . . . . . . . . . . . . . . . . 33 4.4 13 | . . . . . . . . . . . . . . . 100 . . . . 3.8 14 | 50 50 . . . . . . . . . . . . . . . . . . 2.1 16 | . . . . . . . . . 100 . . . . . . . . . . 5.8 17 | . . . . 16 . . . . . 66 . . 16 . . . . . . 2.1 19 | . . . . . . 100 . . . . . . . . . . . . . 3.2 nonsite 8 7 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5 site 12 11 4 2 5 2 11 3 1 6 5 5 . 2 4 10 6 1 2 2 Motif probability model ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.468 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.158 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.312 0.004 2 | 0.006 0.006 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.469 0.002 0.003 0.002 0.004 0.310 0.003 0.004 0.004 3 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.310 0.004 0.618 0.003 0.004 0.004 4 | 0.006 0.314 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.315 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004 6 | 0.314 0.006 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004 8 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.463 0.004 0.001 0.159 0.007 0.002 0.003 0.002 0.312 0.002 0.003 0.004 0.004 9 | 0.006 0.006 0.001 0.005 0.005 0.004 0.621 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.158 0.002 0.157 0.004 0.004 10 | 0.006 0.314 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004 11 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 
0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 12 | 0.006 0.006 0.617 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.312 13 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.927 0.002 0.003 0.004 0.004 14 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 16 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.924 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 17 | 0.006 0.006 0.001 0.005 0.159 0.004 0.005 0.001 0.004 0.001 0.621 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004 19 | 0.006 0.006 0.001 0.005 0.005 0.004 0.928 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 15 columns Num Motifs: 6 2, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494 47, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671 67, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112 81, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841 87, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210 95, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016 **** * ******* ** * Column 1 : Sequence Number, Site Number Column 2 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input Log Motif portion of MAP for motif b = -187.76179 Log Fragmentation portion of MAP for motif b = -7.77486 ============================================================= ====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME ===== ====== Motif b ===== ============================================================= Listing of those elements occurring greater than 50% of the time 
in near optimal sampling using 500 iterations Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | 50 . . . . . . . 16 . . . . . . . . . 33 . 1.9 2 | . . . . . 16 . . . . . 50 . . . . 33 . . . 2.1 3 | . . . . . . . . . . . . . . 33 . 66 . . . 3.4 4 | . 33 . . . 16 . . . . . 33 . . 16 . . . . . 1.7 6 | 33 . . 16 33 . . . . . . . . 16 . . . . . . 1.5 8 | . . . . . . . 50 . . 16 . . . . 33 . . . . 3.2 9 | . . . . . . 66 . . . . . . . . 16 . 16 . . 2.3 10 | . 33 . 16 33 . . . . . . . . . 16 . . . . . 1.7 11 | 50 50 . . . . . . . . . . . . . . . . . . 2.1 12 | . . 66 . . . . . . . . . . . . . . . . 33 4.4 13 | . . . . . . . . . . . . . . . 100 . . . . 3.8 14 | 50 50 . . . . . . . . . . . . . . . . . . 2.1 16 | . . . . . . . . . 100 . . . . . . . . . . 5.8 17 | . . . . 16 . . . . . 66 . . 16 . . . . . . 2.1 19 | . . . . . . 100 . . . . . . . . . . . . . 3.2 nonsite 8 7 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5 site 12 11 4 2 5 2 11 3 1 6 5 5 . 2 4 10 6 1 2 2 Motif probability model ____________________________________________ Pos. 
# a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.468 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.158 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.312 0.004 2 | 0.006 0.006 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.469 0.002 0.003 0.002 0.004 0.310 0.003 0.004 0.004 3 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.310 0.004 0.618 0.003 0.004 0.004 4 | 0.006 0.314 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.315 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004 6 | 0.314 0.006 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004 8 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.463 0.004 0.001 0.159 0.007 0.002 0.003 0.002 0.312 0.002 0.003 0.004 0.004 9 | 0.006 0.006 0.001 0.005 0.005 0.004 0.621 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.158 0.002 0.157 0.004 0.004 10 | 0.006 0.314 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004 11 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 12 | 0.006 0.006 0.617 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.312 13 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.927 0.002 0.003 0.004 0.004 14 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 16 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.924 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 17 | 0.006 0.006 0.001 0.005 0.159 0.004 0.005 0.001 0.004 0.001 0.621 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004 19 | 0.006 0.006 0.001 0.005 0.005 0.004 0.928 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 Background probability model 0.089 0.079 
0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 15 columns Num Motifs: 6 2, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494 47, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671 67, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112 81, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841 87, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210 95, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016 **** * ******* ** * ------------------------------------------------------------------------- MOTIF c Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | . . . 5 3 . 5 . . . 53 3 . . . 3 11 7 1 3 1.6 3 | . 61 . . . . . . 22 . . 7 3 . . . . . 1 3 2.1 4 | . 38 . . . 3 3 . 11 . . 40 1 . . . . . . . 1.7 5 | 5 35 . . . . . . 24 . 1 31 1 . . . . . . . 1.7 6 | . . . 48 . . . 1 . . . . . 37 11 . 1 . . . 2.7 7 | 1 7 . . . 85 . . . . . 3 . . 1 . . . . . 3.4 8 | . . . . . 5 3 1 . 55 . . . . 18 . . . 9 5 3.5 9 | 87 . . . . 1 5 . . . . . . . . . . . 3 1 2.8 10 | 3 . . 14 9 . . 1 . . . . . 1 . 16 5 . 20 25 1.5 11 | . . . . . . 1 . . 90 . 1 . . 1 . . . 1 1 5.2 12 | . . 100 . . . . . . . . . . . . . . . . . 6.0 13 | 9 7 . . 1 . 53 . . . 1 . 1 . . 24 . . . . 2.0 14 | . 7 . 1 . . 1 . . . . 1 . . 1 83 . . 1 . 3.2 15 | . . 100 . . . . . . . . . . . . . . . . . 6.0 16 | . 5 . 1 1 . . 3 1 . 27 1 . . . . 9 46 . . 2.1 18 | . 3 . . 37 12 . . 18 . . 16 3 . . . 3 . . 3 1.4 20 | . . . 1 . . . . . . 3 . . . . 94 . . . . 3.9 22 | . 7 . . . 22 . . 9 . . 44 11 . 5 . . . . . 1.8 24 | 7 . . 3 37 . . 3 . . 27 1 . 1 . . 11 1 1 1 1.4 25 | 3 18 . 1 . 9 . . 7 . . 51 1 . . . . . 1 3 1.4 nonsite 8 7 1 6 7 4 6 1 5 1 7 9 2 4 2 4 3 4 4 4 site 5 9 10 3 4 7 3 . 
4 7 5 10 1 2 2 11 2 2 2 2 Motif probability model ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.001 0.001 0.000 0.056 0.037 0.000 0.056 0.000 0.001 0.000 0.533 0.038 0.000 0.000 0.000 0.037 0.110 0.074 0.019 0.037 3 | 0.001 0.606 0.000 0.001 0.001 0.000 0.001 0.000 0.221 0.000 0.001 0.074 0.037 0.000 0.000 0.000 0.000 0.000 0.019 0.037 4 | 0.001 0.386 0.000 0.001 0.001 0.037 0.037 0.000 0.111 0.000 0.001 0.405 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000 5 | 0.056 0.349 0.000 0.001 0.001 0.000 0.001 0.000 0.239 0.000 0.019 0.313 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000 6 | 0.001 0.001 0.000 0.478 0.001 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.367 0.110 0.000 0.019 0.000 0.000 0.000 7 | 0.019 0.074 0.000 0.001 0.001 0.845 0.001 0.000 0.001 0.000 0.001 0.038 0.000 0.000 0.019 0.000 0.000 0.000 0.000 0.000 8 | 0.001 0.001 0.000 0.001 0.001 0.056 0.037 0.018 0.001 0.551 0.001 0.001 0.000 0.000 0.184 0.000 0.000 0.000 0.092 0.056 9 | 0.863 0.001 0.000 0.001 0.001 0.019 0.056 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.037 0.019 10 | 0.037 0.001 0.000 0.147 0.092 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.166 0.055 0.000 0.202 0.257 11 | 0.001 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.001 0.899 0.001 0.019 0.000 0.000 0.019 0.000 0.000 0.000 0.019 0.019 12 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 13 | 0.093 0.074 0.000 0.001 0.019 0.000 0.533 0.000 0.001 0.000 0.019 0.001 0.019 0.000 0.000 0.239 0.000 0.000 0.000 0.000 14 | 0.001 0.074 0.000 0.019 0.001 0.000 0.019 0.000 0.001 0.000 0.001 0.019 0.000 0.000 0.019 0.826 0.000 0.000 0.019 0.000 15 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 16 | 0.001 0.056 0.000 0.019 0.019 0.000 0.001 0.037 0.019 0.000 0.276 
0.019 0.000 0.000 0.000 0.000 0.092 0.459 0.000 0.000 18 | 0.001 0.037 0.000 0.001 0.368 0.129 0.001 0.000 0.184 0.000 0.001 0.166 0.037 0.000 0.000 0.000 0.037 0.000 0.000 0.037 20 | 0.001 0.001 0.000 0.019 0.001 0.000 0.001 0.000 0.001 0.000 0.037 0.001 0.000 0.000 0.000 0.936 0.000 0.000 0.000 0.000 22 | 0.001 0.074 0.000 0.001 0.001 0.221 0.001 0.000 0.092 0.000 0.001 0.441 0.110 0.000 0.055 0.000 0.000 0.000 0.000 0.000 24 | 0.074 0.001 0.000 0.037 0.368 0.000 0.001 0.037 0.001 0.000 0.276 0.019 0.000 0.019 0.000 0.000 0.110 0.019 0.019 0.019 25 | 0.037 0.184 0.000 0.019 0.001 0.092 0.001 0.000 0.074 0.000 0.001 0.515 0.019 0.000 0.000 0.000 0.000 0.000 0.019 0.037 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 20 columns Num Motifs: 54 3, 1 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727 4, 1 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686 5, 1 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976 7, 1 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154 10, 1 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765 11, 1 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082 12, 1 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543 14, 1 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634 16, 1 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394 17, 1 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673 18, 1 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256 19, 1 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312 20, 1 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725 25, 1 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085 26, 1 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140 27, 1 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431 31, 1 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337 33, 1 72 adyrg 
RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225 34, 1 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374 37, 1 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007 40, 1 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937 41, 1 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313 42, 1 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864 43, 1 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427 46, 1 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495 48, 1 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717 49, 1 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994 50, 1 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507 52, 1 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867 54, 1 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859 55, 1 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944 56, 1 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233 57, 1 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401 58, 1 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503 59, 1 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723 61, 1 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743 61, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743 63, 1 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157 64, 1 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357 66, 1 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028 69, 1 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859 74, 1 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307 76, 1 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116 77, 1 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582 78, 1 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332 79, 1 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713 80, 1 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501 82, 1 19 tietn 
PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237 83, 1 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972 84, 1 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327 91, 1 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568 93, 1 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697 94, 1 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567 96, 1 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788 * ************** * * * ** Column 1 : Sequence Number, Site Number Column 2 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input Log Motif portion of MAP for motif c = -1607.59351 Log Fragmentation portion of MAP for motif c = -10.42374 ============================================================= ====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME ===== ====== Motif c ===== ============================================================= Listing of those elements occurring greater than 50% of the time in near optimal sampling using 500 iterations Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | . . . 5 3 . 5 . . . 53 3 . . . 3 11 7 1 3 1.6 3 | . 61 . . . . . . 22 . . 7 3 . . . . . 1 3 2.1 4 | . 38 . . . 3 3 . 11 . . 40 1 . . . . . . . 1.7 5 | 5 35 . . . . . . 24 . 1 31 1 . . . . . . . 1.7 6 | . . . 48 . . . 1 . . . . . 37 11 . 1 . . . 2.7 7 | 1 7 . . . 85 . . . . . 3 . . 1 . . . . . 3.4 8 | . . . . . 5 3 1 . 55 . . . . 18 . . . 9 5 3.5 9 | 87 . . . . 1 5 . . . . . . . . . . . 3 1 2.8 10 | 3 . . 14 9 . . 1 . . . . . 1 . 16 5 . 20 25 1.5 11 | . . . . . . 1 . . 90 . 1 . . 1 . . . 1 1 5.2 12 | . . 100 . . . . . . . . . . . . . . . . . 6.0 13 | 9 7 . . 1 . 53 . . . 1 . 1 . . 24 . . . . 2.0 14 | . 
7 . 1 . . 1 . . . . 1 . . 1 83 . . 1 . 3.2 15 | . . 100 . . . . . . . . . . . . . . . . . 6.0 16 | . 5 . 1 1 . . 3 1 . 27 1 . . . . 9 46 . . 2.1 18 | . 3 . . 37 12 . . 18 . . 16 3 . . . 3 . . 3 1.4 20 | . . . 1 . . . . . . 3 . . . . 94 . . . . 3.9 22 | . 7 . . . 22 . . 9 . . 44 11 . 5 . . . . . 1.8 24 | 7 . . 3 37 . . 3 . . 27 1 . 1 . . 11 1 1 1 1.4 25 | 3 18 . 1 . 9 . . 7 . . 51 1 . . . . . 1 3 1.4 nonsite 8 7 1 6 7 4 6 1 5 1 7 9 2 4 2 4 3 4 4 4 site 5 9 10 3 4 7 3 . 4 7 5 10 1 2 2 11 2 2 2 2 Motif probability model ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.001 0.001 0.000 0.056 0.037 0.000 0.056 0.000 0.001 0.000 0.533 0.038 0.000 0.000 0.000 0.037 0.110 0.074 0.019 0.037 3 | 0.001 0.606 0.000 0.001 0.001 0.000 0.001 0.000 0.221 0.000 0.001 0.074 0.037 0.000 0.000 0.000 0.000 0.000 0.019 0.037 4 | 0.001 0.386 0.000 0.001 0.001 0.037 0.037 0.000 0.111 0.000 0.001 0.405 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000 5 | 0.056 0.349 0.000 0.001 0.001 0.000 0.001 0.000 0.239 0.000 0.019 0.313 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000 6 | 0.001 0.001 0.000 0.478 0.001 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.367 0.110 0.000 0.019 0.000 0.000 0.000 7 | 0.019 0.074 0.000 0.001 0.001 0.845 0.001 0.000 0.001 0.000 0.001 0.038 0.000 0.000 0.019 0.000 0.000 0.000 0.000 0.000 8 | 0.001 0.001 0.000 0.001 0.001 0.056 0.037 0.018 0.001 0.551 0.001 0.001 0.000 0.000 0.184 0.000 0.000 0.000 0.092 0.056 9 | 0.863 0.001 0.000 0.001 0.001 0.019 0.056 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.037 0.019 10 | 0.037 0.001 0.000 0.147 0.092 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.166 0.055 0.000 0.202 0.257 11 | 0.001 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.001 0.899 0.001 0.019 0.000 0.000 0.019 0.000 0.000 0.000 0.019 0.019 12 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 
0.000 0.000 0.000 0.000 0.000 13 | 0.093 0.074 0.000 0.001 0.019 0.000 0.533 0.000 0.001 0.000 0.019 0.001 0.019 0.000 0.000 0.239 0.000 0.000 0.000 0.000 14 | 0.001 0.074 0.000 0.019 0.001 0.000 0.019 0.000 0.001 0.000 0.001 0.019 0.000 0.000 0.019 0.826 0.000 0.000 0.019 0.000 15 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 16 | 0.001 0.056 0.000 0.019 0.019 0.000 0.001 0.037 0.019 0.000 0.276 0.019 0.000 0.000 0.000 0.000 0.092 0.459 0.000 0.000 18 | 0.001 0.037 0.000 0.001 0.368 0.129 0.001 0.000 0.184 0.000 0.001 0.166 0.037 0.000 0.000 0.000 0.037 0.000 0.000 0.037 20 | 0.001 0.001 0.000 0.019 0.001 0.000 0.001 0.000 0.001 0.000 0.037 0.001 0.000 0.000 0.000 0.936 0.000 0.000 0.000 0.000 22 | 0.001 0.074 0.000 0.001 0.001 0.221 0.001 0.000 0.092 0.000 0.001 0.441 0.110 0.000 0.055 0.000 0.000 0.000 0.000 0.000 24 | 0.074 0.001 0.000 0.037 0.368 0.000 0.001 0.037 0.001 0.000 0.276 0.019 0.000 0.019 0.000 0.000 0.110 0.019 0.019 0.019 25 | 0.037 0.184 0.000 0.019 0.001 0.092 0.001 0.000 0.074 0.000 0.001 0.515 0.019 0.000 0.000 0.000 0.000 0.000 0.019 0.037 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 20 columns Num Motifs: 54 3, 1 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727 4, 1 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686 5, 1 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976 7, 1 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154 10, 1 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765 11, 1 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082 12, 1 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543 14, 1 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634 16, 1 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394 17, 1 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673 18, 1 26 ensfh 
KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256 19, 1 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312 20, 1 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725 25, 1 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085 26, 1 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140 27, 1 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431 31, 1 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337 33, 1 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225 34, 1 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374 37, 1 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007 40, 1 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937 41, 1 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313 42, 1 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864 43, 1 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427 46, 1 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495 48, 1 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717 49, 1 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994 50, 1 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507 52, 1 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867 54, 1 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859 55, 1 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944 56, 1 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233 57, 1 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401 58, 1 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503 59, 1 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723 61, 1 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743 61, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743 63, 1 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157 64, 1 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357 66, 1 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028 69, 1 103 adykg 
KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859 74, 1 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307 76, 1 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116 77, 1 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582 78, 1 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332 79, 1 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713 80, 1 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501 82, 1 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237 83, 1 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972 84, 1 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327 91, 1 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568 93, 1 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697 94, 1 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567 96, 1 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788 * ************** * * * ** ------------------------------------------------------------------------- MOTIF d Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | 2 . . 28 17 2 . 2 8 . . 17 . . 11 . . . 2 5 1.1 2 | 5 2 . . 5 34 . . . . 2 14 . . 17 . . 8 2 5 1.4 3 | 8 . . 5 11 . 14 . 2 . 28 . . 5 2 . . 17 . 2 1.0 4 | 2 . . 11 . . 62 . . . 5 . . 11 . . 2 . 2 . 2.1 5 | . . . . . . 2 . . . 57 . . 5 . . 5 22 5 . 2.2 7 | 2 68 . . . 2 5 . 2 . 2 . . 2 . . . . 2 8 1.9 8 | 2 62 . . . . . . 25 . . 8 . . . . . . . . 2.3 9 | . 8 . . . 8 . . 5 . . 77 . . . . . . . . 2.3 10 | 8 8 . . . 57 . . 2 . . 5 . . 11 . . . 2 2 2.1 11 | 14 2 . . . 62 8 . . . . . . . . . . . 11 . 2.4 12 | 2 11 . . . 14 . 11 2 5 . 5 . . 45 . . . . . 2.4 13 | . . . . . . . . . . . . . . . 100 . . . . 4.3 14 | 22 . . . . 2 28 2 . . 8 17 5 . 2 . . 8 . . 1.2 15 | 51 . . 42 . . . . . . . . . 2 . . . . 2 . 2.3 16 | . . . 2 . 71 2 . . 5 . . 5 . 5 . . . 5 . 
2.9 17 | . . . . . . . . . . . . . . . . . . 11 88 3.6 18 | 5 . . . . 25 2 . . . . . . . . 51 2 . 11 . 2.4 19 | . 54 . . . . 20 . 8 . . . . . . . . . . 17 2.0 20 | . . 97 . . . . . . . . . . . . . . . 2 . 6.2 21 | 5 . . . . . . . . . . . . . . 28 2 . 20 42 2.3 22 | 14 2 . . . . 2 . . . 14 5 5 . . . 2 8 5 37 1.3 23 | . . . . 62 . . . . . 2 . . 8 . . 14 . 5 5 2.2 24 | 14 2 . . . 2 2 22 11 . . 31 11 . . . . . . . 1.8 27 | 2 2 . . 2 45 17 . 8 . . 8 . . 8 2 . . . . 1.6 28 | 11 . . 2 2 8 5 . . . . . . 2 14 . 5 22 22 . 1.4 nonsite 8 7 . 6 7 4 7 1 5 . 7 9 2 4 2 4 3 4 5 5 site 7 9 3 3 4 13 7 1 3 . 4 7 1 1 4 7 1 3 4 8 Motif probability model ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.029 0.001 0.000 0.283 0.170 0.029 0.001 0.028 0.085 0.000 0.001 0.170 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.057 2 | 0.058 0.029 0.000 0.001 0.057 0.339 0.001 0.000 0.001 0.000 0.029 0.142 0.000 0.001 0.169 0.001 0.000 0.085 0.029 0.057 3 | 0.086 0.001 0.000 0.057 0.114 0.001 0.142 0.000 0.029 0.000 0.283 0.001 0.000 0.057 0.029 0.001 0.000 0.170 0.001 0.029 4 | 0.029 0.001 0.000 0.114 0.001 0.001 0.621 0.000 0.001 0.000 0.057 0.001 0.000 0.113 0.000 0.001 0.029 0.001 0.029 0.001 5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.564 0.001 0.000 0.057 0.000 0.001 0.057 0.226 0.057 0.001 7 | 0.029 0.677 0.000 0.001 0.001 0.029 0.057 0.000 0.029 0.000 0.029 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.085 8 | 0.029 0.621 0.000 0.001 0.001 0.001 0.001 0.000 0.254 0.000 0.001 0.086 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001 9 | 0.001 0.086 0.000 0.001 0.001 0.085 0.001 0.000 0.057 0.000 0.001 0.762 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001 10 | 0.086 0.086 0.000 0.001 0.001 0.564 0.001 0.000 0.029 0.000 0.001 0.058 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.029 11 | 0.142 0.029 0.000 0.001 0.001 0.620 0.085 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 
0.000 0.001 0.113 0.001 12 | 0.029 0.114 0.000 0.001 0.001 0.142 0.001 0.113 0.029 0.057 0.001 0.058 0.000 0.001 0.451 0.001 0.000 0.001 0.001 0.001 13 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.987 0.000 0.001 0.001 0.001 14 | 0.227 0.001 0.000 0.001 0.001 0.029 0.283 0.028 0.001 0.000 0.086 0.170 0.057 0.001 0.029 0.001 0.000 0.085 0.001 0.001 15 | 0.508 0.001 0.000 0.423 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.001 16 | 0.001 0.001 0.000 0.029 0.001 0.705 0.029 0.000 0.001 0.057 0.001 0.001 0.057 0.001 0.057 0.001 0.000 0.001 0.057 0.001 17 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.874 18 | 0.058 0.001 0.000 0.001 0.001 0.254 0.029 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.508 0.029 0.001 0.113 0.001 19 | 0.001 0.536 0.000 0.001 0.001 0.001 0.198 0.000 0.085 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.170 20 | 0.001 0.001 0.958 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.029 0.001 21 | 0.058 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.282 0.029 0.001 0.198 0.423 22 | 0.142 0.029 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.142 0.058 0.057 0.001 0.000 0.001 0.029 0.085 0.057 0.367 23 | 0.001 0.001 0.000 0.001 0.621 0.001 0.001 0.000 0.001 0.000 0.029 0.001 0.000 0.085 0.000 0.001 0.141 0.001 0.057 0.057 24 | 0.142 0.029 0.000 0.001 0.001 0.029 0.029 0.226 0.113 0.000 0.001 0.311 0.113 0.001 0.000 0.001 0.000 0.001 0.001 0.001 27 | 0.029 0.029 0.000 0.001 0.029 0.451 0.170 0.000 0.085 0.000 0.001 0.086 0.000 0.001 0.085 0.029 0.000 0.001 0.001 0.001 28 | 0.114 0.001 0.000 0.029 0.029 0.085 0.057 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.141 0.001 0.057 0.226 0.226 0.001 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 
0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 25 columns Num Motifs: 35 1, 1 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044 2, 1 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494 6, 1 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 1.00 F 13186328 8, 1 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053 9, 1 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117 13, 1 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173 15, 1 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438 22, 1 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375 23, 1 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658 24, 1 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511 28, 1 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152 30, 1 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738 35, 1 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234 36, 1 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629 38, 1 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339 39, 1 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668 44, 1 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919 47, 1 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671 51, 1 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644 53, 1 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033 60, 1 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548 62, 1 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077 67, 1 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112 68, 1 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072 70, 1 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405 71, 1 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878 73, 1 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812 81, 1 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841 
85, 1 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065 86, 1 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732 87, 1 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210 88, 1 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864 89, 1 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180 90, 1 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 1.00 F 6323138 95, 1 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016 ***** ****************** ** Column 1 : Sequence Number, Site Number Column 2 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input Log Motif portion of MAP for motif d = -1668.31468 Log Fragmentation portion of MAP for motif d = -7.86327 ============================================================= ====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME ===== ====== Motif d ===== ============================================================= Listing of those elements occurring greater than 50% of the time in near optimal sampling using 500 iterations Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | 2 . . 28 17 2 . 2 8 . . 17 . . 11 . . . 2 5 1.1 2 | 5 2 . . 5 34 . . . . 2 14 . . 17 . . 8 2 5 1.4 3 | 8 . . 5 11 . 14 . 2 . 28 . . 5 2 . . 17 . 2 1.0 4 | 2 . . 11 . . 62 . . . 5 . . 11 . . 2 . 2 . 2.1 5 | . . . . . . 2 . . . 57 . . 5 . . 5 22 5 . 2.2 7 | 2 68 . . . 2 5 . 2 . 2 . . 2 . . . . 2 8 1.9 8 | 2 62 . . . . . . 25 . . 8 . . . . . . . . 2.3 9 | . 8 . . . 8 . . 5 . . 77 . . . . . . . . 2.3 10 | 8 8 . . . 57 . . 2 . . 5 . . 11 . . . 2 2 2.1 11 | 14 2 . . . 62 8 . . . . . . . . . . . 11 . 2.4 12 | 2 11 . . . 14 . 11 2 5 . 5 . . 45 . . . . . 2.4 13 | . . . . . . . 
. . . . . . . . 100 . . . . 4.3 14 | 22 . . . . 2 28 2 . . 8 17 5 . 2 . . 8 . . 1.2 15 | 51 . . 42 . . . . . . . . . 2 . . . . 2 . 2.3 16 | . . . 2 . 71 2 . . 5 . . 5 . 5 . . . 5 . 2.9 17 | . . . . . . . . . . . . . . . . . . 11 88 3.6 18 | 5 . . . . 25 2 . . . . . . . . 51 2 . 11 . 2.4 19 | . 54 . . . . 20 . 8 . . . . . . . . . . 17 2.0 20 | . . 97 . . . . . . . . . . . . . . . 2 . 6.2 21 | 5 . . . . . . . . . . . . . . 28 2 . 20 42 2.3 22 | 14 2 . . . . 2 . . . 14 5 5 . . . 2 8 5 37 1.3 23 | . . . . 62 . . . . . 2 . . 8 . . 14 . 5 5 2.2 24 | 14 2 . . . 2 2 22 11 . . 31 11 . . . . . . . 1.8 27 | 2 2 . . 2 45 17 . 8 . . 8 . . 8 2 . . . . 1.6 28 | 11 . . 2 2 8 5 . . . . . . 2 14 . 5 22 22 . 1.4 nonsite 8 7 . 6 7 4 7 1 5 . 7 9 2 4 2 4 3 4 5 5 site 7 9 3 3 4 13 7 1 3 . 4 7 1 1 4 7 1 3 4 8 Motif probability model ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.029 0.001 0.000 0.283 0.170 0.029 0.001 0.028 0.085 0.000 0.001 0.170 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.057 2 | 0.058 0.029 0.000 0.001 0.057 0.339 0.001 0.000 0.001 0.000 0.029 0.142 0.000 0.001 0.169 0.001 0.000 0.085 0.029 0.057 3 | 0.086 0.001 0.000 0.057 0.114 0.001 0.142 0.000 0.029 0.000 0.283 0.001 0.000 0.057 0.029 0.001 0.000 0.170 0.001 0.029 4 | 0.029 0.001 0.000 0.114 0.001 0.001 0.621 0.000 0.001 0.000 0.057 0.001 0.000 0.113 0.000 0.001 0.029 0.001 0.029 0.001 5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.564 0.001 0.000 0.057 0.000 0.001 0.057 0.226 0.057 0.001 7 | 0.029 0.677 0.000 0.001 0.001 0.029 0.057 0.000 0.029 0.000 0.029 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.085 8 | 0.029 0.621 0.000 0.001 0.001 0.001 0.001 0.000 0.254 0.000 0.001 0.086 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001 9 | 0.001 0.086 0.000 0.001 0.001 0.085 0.001 0.000 0.057 0.000 0.001 0.762 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001 10 | 0.086 0.086 0.000 0.001 0.001 0.564 0.001 
0.000 0.029 0.000 0.001 0.058 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.029 11 | 0.142 0.029 0.000 0.001 0.001 0.620 0.085 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.001 12 | 0.029 0.114 0.000 0.001 0.001 0.142 0.001 0.113 0.029 0.057 0.001 0.058 0.000 0.001 0.451 0.001 0.000 0.001 0.001 0.001 13 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.987 0.000 0.001 0.001 0.001 14 | 0.227 0.001 0.000 0.001 0.001 0.029 0.283 0.028 0.001 0.000 0.086 0.170 0.057 0.001 0.029 0.001 0.000 0.085 0.001 0.001 15 | 0.508 0.001 0.000 0.423 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.001 16 | 0.001 0.001 0.000 0.029 0.001 0.705 0.029 0.000 0.001 0.057 0.001 0.001 0.057 0.001 0.057 0.001 0.000 0.001 0.057 0.001 17 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.874 18 | 0.058 0.001 0.000 0.001 0.001 0.254 0.029 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.508 0.029 0.001 0.113 0.001 19 | 0.001 0.536 0.000 0.001 0.001 0.001 0.198 0.000 0.085 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.170 20 | 0.001 0.001 0.958 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.029 0.001 21 | 0.058 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.282 0.029 0.001 0.198 0.423 22 | 0.142 0.029 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.142 0.058 0.057 0.001 0.000 0.001 0.029 0.085 0.057 0.367 23 | 0.001 0.001 0.000 0.001 0.621 0.001 0.001 0.000 0.001 0.000 0.029 0.001 0.000 0.085 0.000 0.001 0.141 0.001 0.057 0.057 24 | 0.142 0.029 0.000 0.001 0.001 0.029 0.029 0.226 0.113 0.000 0.001 0.311 0.113 0.001 0.000 0.001 0.000 0.001 0.001 0.001 27 | 0.029 0.029 0.000 0.001 0.029 0.451 0.170 0.000 0.085 0.000 0.001 0.086 0.000 0.001 0.085 0.029 0.000 0.001 0.001 0.001 28 | 0.114 0.001 0.000 0.029 0.029 0.085 0.057 
0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.141 0.001 0.057 0.226 0.226 0.001 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 25 columns Num Motifs: 35 1, 1 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044 2, 1 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494 6, 1 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 0.98 F 13186328 8, 1 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053 9, 1 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117 13, 1 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173 15, 1 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438 22, 1 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375 23, 1 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658 24, 1 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511 28, 1 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152 30, 1 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738 35, 1 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234 36, 1 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629 38, 1 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339 39, 1 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668 44, 1 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919 47, 1 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671 51, 1 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644 53, 1 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033 60, 1 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548 62, 1 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077 67, 1 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112 68, 1 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072 70, 1 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405 71, 1 28 eihly 
DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878 73, 1 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812 81, 1 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841 85, 1 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065 86, 1 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732 87, 1 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210 88, 1 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864 89, 1 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180 90, 1 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 0.99 F 6323138 95, 1 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016 ***** ****************** ** Log Background portion of Map = -39912.17887 Log Alignment portion of Map = -957.33606 Log Site/seq portion of Map = 0.00000 Log Null Map = -46943.36311 Log Map = 2111.15797 log MAP = sum of motif and fragmentation parts of MAP + background + alignment + sites/seq - Null ============================================================= ====== Results by Sequence ===== ====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME ===== ============================================================= 1, 1, 3 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044 2, 1, 3 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494 2, 2, 0 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494 2, 3, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494 3, 1, 2 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727 4, 1, 2 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686 5, 1, 2 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976 6, 1, 3 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 0.98 F 13186328 6, 2, 0 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328 7, 1, 2 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154 8, 1, 3 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053 8, 2, 0 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053 9, 1, 3 26 mrkls 
EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117 9, 2, 0 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117 10, 1, 2 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765 11, 1, 2 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082 12, 1, 2 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543 13, 1, 3 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173 13, 2, 0 65 eldce LVGLSVDQVFSH ikwie 76 0.98 F 14286173 14, 1, 2 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634 15, 1, 3 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438 15, 2, 0 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438 16, 1, 2 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394 17, 1, 2 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673 18, 1, 2 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256 19, 1, 2 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312 20, 1, 2 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725 21, 1, 0 80 megvd VTVVSMDLPFAQ krfce 91 0.65 F 15605963 22, 1, 3 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375 23, 1, 3 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658 23, 2, 0 70 tagln VVGISPDKPEKL atfrd 81 0.99 F 15609658 24, 1, 3 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511 24, 2, 0 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511 25, 1, 2 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085 26, 1, 2 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140 27, 1, 2 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431 28, 1, 3 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152 28, 2, 0 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152 30, 1, 3 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738 30, 2, 0 101 tpgla VWGISPDSTYAH eafad 112 0.98 F 15790738 31, 1, 2 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337 32, 1, 0 80 idntv VLCISADLPFAQ srfcg 91 0.99 F 15801846 33, 1, 2 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL 
aahpg 96 1.00 F 15805225 34, 1, 2 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374 35, 1, 3 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234 35, 2, 0 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234 36, 1, 3 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629 37, 1, 2 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007 38, 1, 3 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339 38, 2, 0 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339 39, 1, 3 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668 40, 1, 2 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937 41, 1, 2 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313 42, 1, 2 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864 43, 1, 2 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427 44, 1, 3 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919 46, 1, 2 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495 47, 1, 3 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671 47, 2, 0 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671 47, 3, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671 48, 1, 2 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717 49, 1, 2 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994 50, 1, 2 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507 51, 1, 3 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644 52, 1, 2 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867 53, 1, 3 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033 54, 1, 2 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859 55, 1, 2 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944 56, 1, 2 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233 57, 1, 2 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401 58, 1, 2 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503 59, 1, 2 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 
18309723 60, 1, 3 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548 60, 2, 0 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548 61, 1, 2 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743 61, 2, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743 62, 1, 3 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077 62, 2, 0 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077 63, 1, 2 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157 64, 1, 2 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357 66, 1, 2 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028 67, 1, 3 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112 67, 2, 0 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112 67, 3, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112 68, 1, 3 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072 69, 1, 2 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859 70, 1, 3 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405 70, 2, 0 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405 71, 1, 3 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878 72, 1, 0 78 keegi VLTISADLPFAQ krwca 89 0.99 F 21283385 73, 1, 3 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812 73, 2, 0 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812 74, 1, 2 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307 76, 1, 2 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116 77, 1, 2 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582 78, 1, 2 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332 79, 1, 2 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713 80, 1, 2 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501 81, 1, 3 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841 81, 2, 0 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841 81, 3, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841 82, 1, 2 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 
3323237 83, 1, 2 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972 84, 1, 2 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327 85, 1, 3 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065 85, 2, 0 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065 86, 1, 3 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732 87, 1, 3 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210 87, 2, 0 68 klnck LIGFSCNSKESH dqwie 79 0.56 F 4996210 87, 3, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210 88, 1, 3 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864 89, 1, 3 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180 89, 2, 0 127 kkyaa VFGLSADSVTSQ kkfqs 138 0.51 F 6322180 90, 1, 3 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 0.99 F 6323138 91, 1, 2 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568 92, 1, 0 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955 93, 1, 2 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697 94, 1, 2 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567 95, 1, 3 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016 95, 2, 0 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016 95, 3, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016 96, 1, 2 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788 122 motifs Column 1 : Sequence Number, Site Number Column 2 : Motif type Column 3 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input ======================== MAP MAXIMIZATION RESULTS ==================== ====================================================================== ============================================================= ====== Results by Sequence ===== ============================================================= 1, 1, 3 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044 1, 2, 0 79 
agvda VIVLSANDPFVQ safgk 90 0.48 F 1091044 2, 1, 3 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494 2, 2, 0 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494 2, 3, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494 3, 1, 2 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727 4, 1, 2 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686 5, 1, 2 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976 6, 1, 3 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 0.98 F 13186328 6, 2, 0 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328 7, 1, 2 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154 8, 1, 3 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053 8, 2, 0 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053 9, 1, 3 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117 9, 2, 0 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117 10, 1, 2 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765 11, 1, 2 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082 12, 1, 2 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543 13, 1, 3 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173 13, 2, 0 65 eldce LVGLSVDQVFSH ikwie 76 0.98 F 14286173 14, 1, 2 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634 15, 1, 3 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438 15, 2, 0 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438 16, 1, 2 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394 17, 1, 2 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673 18, 1, 2 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256 19, 1, 2 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312 20, 1, 2 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725 21, 1, 0 80 megvd VTVVSMDLPFAQ krfce 91 0.65 F 15605963 22, 1, 3 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375 23, 1, 3 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658 23, 2, 0 70 tagln VVGISPDKPEKL atfrd 81 
0.99 F 15609658 24, 1, 3 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511 24, 2, 0 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511 25, 1, 2 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085 26, 1, 2 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140 27, 1, 2 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431 28, 1, 3 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152 28, 2, 0 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152 30, 1, 3 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738 30, 2, 0 101 tpgla VWGISPDSTYAH eafad 112 0.98 F 15790738 31, 1, 2 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337 32, 1, 0 80 idntv VLCISADLPFAQ srfcg 91 0.99 F 15801846 33, 1, 2 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225 34, 1, 2 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374 35, 1, 3 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234 35, 2, 0 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234 36, 1, 3 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629 37, 1, 2 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007 38, 1, 3 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339 38, 2, 0 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339 39, 1, 3 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668 40, 1, 2 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937 41, 1, 2 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313 42, 1, 2 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864 43, 1, 2 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427 44, 1, 3 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919 46, 1, 2 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495 47, 1, 3 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671 47, 2, 0 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671 47, 3, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671 48, 1, 2 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 
1.00 F 1651717 49, 1, 2 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994 50, 1, 2 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507 51, 1, 3 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644 52, 1, 2 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867 53, 1, 3 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033 53, 2, 0 73 ngvde IVCISVNDAFVM newak 84 0.32 F 17229033 54, 1, 2 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859 55, 1, 2 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944 56, 1, 2 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233 57, 1, 2 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401 58, 1, 2 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503 59, 1, 2 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723 60, 1, 3 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548 60, 2, 0 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548 61, 1, 2 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743 61, 2, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743 62, 1, 3 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077 62, 2, 0 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077 63, 1, 2 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157 64, 1, 2 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357 66, 1, 2 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028 67, 1, 3 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112 67, 2, 0 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112 67, 3, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112 68, 1, 3 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072 69, 1, 2 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859 70, 1, 3 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405 70, 2, 0 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405 71, 1, 3 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878 72, 1, 0 78 keegi 
VLTISADLPFAQ krwca 89 0.99 F 21283385 73, 1, 3 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812 73, 2, 0 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812 74, 1, 2 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307 76, 1, 2 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116 77, 1, 2 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582 78, 1, 2 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332 79, 1, 2 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713 80, 1, 2 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501 81, 1, 3 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841 81, 2, 0 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841 81, 3, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841 82, 1, 2 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237 83, 1, 2 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972 84, 1, 2 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327 85, 1, 3 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065 85, 2, 0 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065 86, 1, 3 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732 86, 2, 0 74 kgide IICFSVNDPFVM kawgk 85 0.43 F 4704732 87, 1, 3 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210 87, 2, 0 68 klnck LIGFSCNSKESH dqwie 79 0.56 F 4996210 87, 3, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210 88, 1, 3 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864 89, 1, 3 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180 90, 1, 3 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 0.99 F 6323138 91, 1, 2 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568 92, 1, 0 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955 93, 1, 2 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697 94, 1, 2 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567 95, 1, 3 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016 95, 2, 0 71 klgce VLGVSVDSQFTH lawin 82 
1.00 F 9955016 95, 3, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016 96, 1, 2 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788 124 motifs Column 1 : Sequence Number, Site Number Column 2 : Motif type Column 3 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input ------------------------------------------------------------------------- MOTIF a Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | . 68 . . . . . . 20 . . 10 . . . . . . . . 2.4 2 | . 17 . . . . . . 34 3 . 34 . . 6 . . . . 3 1.7 3 | 13 6 10 . . . 51 . . . . 3 . . . . . . 10 3 1.9 4 | . 31 . . . 10 . . 48 . . 10 . . . . . . . . 2.1 5 | . . . . . . . . . . . . . 3 . . . . 96 . 3.8 7 | . . . 86 . . . . . . . . . 13 . . . . . . 3.2 8 | . . . 10 . . . . . . 3 10 . . . 6 3 . 58 6 2.0 9 | 3 34 . . 6 . . 6 . . 3 . . . . 37 3 . . 3 1.8 10 | . . . . 24 58 . . . . . . . . 17 . . . . . 2.9 12 | . . . . . . . 55 . . . 13 6 6 . . 17 . . . 3.4 nonsite 8 8 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5 site 1 15 1 9 3 6 5 6 10 . . 8 . 2 2 4 2 . 16 1 Motif probability model ____________________________________________ Pos. 
# a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.001 0.679 0.000 0.001 0.001 0.001 0.001 0.000 0.204 0.000 0.001 0.103 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001 2 | 0.001 0.171 0.000 0.001 0.001 0.001 0.001 0.000 0.340 0.034 0.001 0.341 0.000 0.001 0.068 0.001 0.001 0.001 0.001 0.035 3 | 0.137 0.069 0.102 0.001 0.001 0.001 0.510 0.000 0.001 0.000 0.001 0.035 0.000 0.001 0.000 0.001 0.001 0.001 0.103 0.035 4 | 0.001 0.306 0.000 0.001 0.001 0.103 0.001 0.000 0.476 0.000 0.001 0.103 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001 5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.035 0.000 0.001 0.001 0.001 0.950 0.001 7 | 0.001 0.001 0.000 0.849 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.136 0.000 0.001 0.001 0.001 0.001 0.001 8 | 0.001 0.001 0.000 0.103 0.001 0.001 0.001 0.000 0.001 0.000 0.035 0.103 0.000 0.001 0.000 0.069 0.034 0.001 0.577 0.069 9 | 0.035 0.340 0.000 0.001 0.069 0.001 0.001 0.068 0.001 0.000 0.035 0.002 0.000 0.001 0.000 0.374 0.034 0.001 0.001 0.035 10 | 0.001 0.001 0.000 0.001 0.238 0.577 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.001 0.170 0.001 0.001 0.001 0.001 0.001 12 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.543 0.001 0.000 0.001 0.137 0.068 0.068 0.000 0.001 0.170 0.001 0.001 0.001 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 10 columns Num Motifs: 29 1, 1 79 agvda VIVLSANDPFVQ safgk 90 0.48 F 1091044 2, 1 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494 6, 1 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328 8, 1 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053 9, 1 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117 13, 1 65 eldce LVGLSVDQVFSH ikwie 76 0.98 F 14286173 15, 1 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438 21, 1 80 megvd VTVVSMDLPFAQ krfce 91 0.65 F 15605963 23, 1 70 tagln VVGISPDKPEKL atfrd 81 0.99 F 15609658 24, 1 64 glntv 
ILGVSPDPVERH kkfie 75 1.00 F 15613511 28, 1 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152 30, 1 101 tpgla VWGISPDSTYAH eafad 112 0.98 F 15790738 32, 1 80 idntv VLCISADLPFAQ srfcg 91 0.99 F 15801846 35, 1 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234 38, 1 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339 47, 1 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671 53, 1 73 ngvde IVCISVNDAFVM newak 84 0.32 F 17229033 60, 1 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548 62, 1 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077 67, 1 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112 70, 1 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405 72, 1 78 keegi VLTISADLPFAQ krwca 89 0.99 F 21283385 73, 1 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812 81, 1 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841 85, 1 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065 86, 1 74 kgide IICFSVNDPFVM kawgk 85 0.43 F 4704732 87, 1 68 klnck LIGFSCNSKESH dqwie 79 0.56 F 4996210 92, 1 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955 95, 1 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016 ***** **** * Column 1 : Sequence Number, Site Number Column 2 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input Log Motif portion of MAP for motif a = -469.15170 Log Fragmentation portion of MAP for motif a = -3.80666 ------------------------------------------------------------------------- MOTIF b Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | 50 . . . . . . . 16 . . . . . . . . . 33 . 1.9 2 | . . . . . 16 . . . . . 50 . . . . 33 . . . 2.1 3 | . . . . . . . . . . . . . . 33 . 66 . . . 3.4 4 | . 33 . . . 16 . . . . . 33 . . 16 . . . . . 1.7 6 | 33 . . 16 33 . . . . . . . . 16 . . . . . . 1.5 8 | . . . 
. . . . 50 . . 16 . . . . 33 . . . . 3.2 9 | . . . . . . 66 . . . . . . . . 16 . 16 . . 2.3 10 | . 33 . 16 33 . . . . . . . . . 16 . . . . . 1.7 11 | 50 50 . . . . . . . . . . . . . . . . . . 2.1 12 | . . 66 . . . . . . . . . . . . . . . . 33 4.4 13 | . . . . . . . . . . . . . . . 100 . . . . 3.8 14 | 50 50 . . . . . . . . . . . . . . . . . . 2.1 16 | . . . . . . . . . 100 . . . . . . . . . . 5.8 17 | . . . . 16 . . . . . 66 . . 16 . . . . . . 2.1 19 | . . . . . . 100 . . . . . . . . . . . . . 3.2 nonsite 8 7 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5 site 12 11 4 2 5 2 11 3 1 6 5 5 . 2 4 10 6 1 2 2 Motif probability model ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.468 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.158 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.312 0.004 2 | 0.006 0.006 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.469 0.002 0.003 0.002 0.004 0.310 0.003 0.004 0.004 3 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.310 0.004 0.618 0.003 0.004 0.004 4 | 0.006 0.314 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.315 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004 6 | 0.314 0.006 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004 8 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.463 0.004 0.001 0.159 0.007 0.002 0.003 0.002 0.312 0.002 0.003 0.004 0.004 9 | 0.006 0.006 0.001 0.005 0.005 0.004 0.621 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.158 0.002 0.157 0.004 0.004 10 | 0.006 0.314 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004 11 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 12 | 0.006 0.006 0.617 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 
0.312 13 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.927 0.002 0.003 0.004 0.004 14 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 16 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.924 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 17 | 0.006 0.006 0.001 0.005 0.159 0.004 0.005 0.001 0.004 0.001 0.621 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004 19 | 0.006 0.006 0.001 0.005 0.005 0.004 0.928 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 15 columns Num Motifs: 6 2, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494 47, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671 67, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112 81, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841 87, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210 95, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016 **** * ******* ** * Column 1 : Sequence Number, Site Number Column 2 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input Log Motif portion of MAP for motif b = -187.76179 Log Fragmentation portion of MAP for motif b = -7.77486 ------------------------------------------------------------------------- MOTIF c Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | . . . 5 3 . 5 . . . 53 3 . . . 3 11 7 1 3 1.6 3 | . 61 . . . . . . 22 . . 7 3 . . . . . 1 3 2.1 4 | . 38 . . . 3 3 . 11 . 
. 40 1 . . . . . . . 1.7 5 | 5 35 . . . . . . 24 . 1 31 1 . . . . . . . 1.7 6 | . . . 48 . . . 1 . . . . . 37 11 . 1 . . . 2.7 7 | 1 7 . . . 85 . . . . . 3 . . 1 . . . . . 3.4 8 | . . . . . 5 3 1 . 55 . . . . 18 . . . 9 5 3.5 9 | 87 . . . . 1 5 . . . . . . . . . . . 3 1 2.8 10 | 3 . . 14 9 . . 1 . . . . . 1 . 16 5 . 20 25 1.5 11 | . . . . . . 1 . . 90 . 1 . . 1 . . . 1 1 5.2 12 | . . 100 . . . . . . . . . . . . . . . . . 6.0 13 | 9 7 . . 1 . 53 . . . 1 . 1 . . 24 . . . . 2.0 14 | . 7 . 1 . . 1 . . . . 1 . . 1 83 . . 1 . 3.2 15 | . . 100 . . . . . . . . . . . . . . . . . 6.0 16 | . 5 . 1 1 . . 3 1 . 27 1 . . . . 9 46 . . 2.1 18 | . 3 . . 37 12 . . 18 . . 16 3 . . . 3 . . 3 1.4 20 | . . . 1 . . . . . . 3 . . . . 94 . . . . 3.9 22 | . 7 . . . 22 . . 9 . . 44 11 . 5 . . . . . 1.8 24 | 7 . . 3 37 . . 3 . . 27 1 . 1 . . 11 1 1 1 1.4 25 | 3 18 . 1 . 9 . . 7 . . 51 1 . . . . . 1 3 1.4 nonsite 8 7 1 6 7 4 6 1 5 1 7 9 2 4 2 4 3 4 4 4 site 5 9 10 3 4 7 3 . 4 7 5 10 1 2 2 11 2 2 2 2 Motif probability model ____________________________________________ Pos. 
# a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.001 0.001 0.000 0.056 0.037 0.000 0.056 0.000 0.001 0.000 0.533 0.038 0.000 0.000 0.000 0.037 0.110 0.074 0.019 0.037 3 | 0.001 0.606 0.000 0.001 0.001 0.000 0.001 0.000 0.221 0.000 0.001 0.074 0.037 0.000 0.000 0.000 0.000 0.000 0.019 0.037 4 | 0.001 0.386 0.000 0.001 0.001 0.037 0.037 0.000 0.111 0.000 0.001 0.405 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000 5 | 0.056 0.349 0.000 0.001 0.001 0.000 0.001 0.000 0.239 0.000 0.019 0.313 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000 6 | 0.001 0.001 0.000 0.478 0.001 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.367 0.110 0.000 0.019 0.000 0.000 0.000 7 | 0.019 0.074 0.000 0.001 0.001 0.845 0.001 0.000 0.001 0.000 0.001 0.038 0.000 0.000 0.019 0.000 0.000 0.000 0.000 0.000 8 | 0.001 0.001 0.000 0.001 0.001 0.056 0.037 0.018 0.001 0.551 0.001 0.001 0.000 0.000 0.184 0.000 0.000 0.000 0.092 0.056 9 | 0.863 0.001 0.000 0.001 0.001 0.019 0.056 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.037 0.019 10 | 0.037 0.001 0.000 0.147 0.092 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.166 0.055 0.000 0.202 0.257 11 | 0.001 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.001 0.899 0.001 0.019 0.000 0.000 0.019 0.000 0.000 0.000 0.019 0.019 12 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 13 | 0.093 0.074 0.000 0.001 0.019 0.000 0.533 0.000 0.001 0.000 0.019 0.001 0.019 0.000 0.000 0.239 0.000 0.000 0.000 0.000 14 | 0.001 0.074 0.000 0.019 0.001 0.000 0.019 0.000 0.001 0.000 0.001 0.019 0.000 0.000 0.019 0.826 0.000 0.000 0.019 0.000 15 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 16 | 0.001 0.056 0.000 0.019 0.019 0.000 0.001 0.037 0.019 0.000 0.276 0.019 0.000 0.000 0.000 0.000 0.092 0.459 0.000 0.000 18 | 0.001 0.037 0.000 0.001 0.368 0.129 
0.001 0.000 0.184 0.000 0.001 0.166 0.037 0.000 0.000 0.000 0.037 0.000 0.000 0.037 20 | 0.001 0.001 0.000 0.019 0.001 0.000 0.001 0.000 0.001 0.000 0.037 0.001 0.000 0.000 0.000 0.936 0.000 0.000 0.000 0.000 22 | 0.001 0.074 0.000 0.001 0.001 0.221 0.001 0.000 0.092 0.000 0.001 0.441 0.110 0.000 0.055 0.000 0.000 0.000 0.000 0.000 24 | 0.074 0.001 0.000 0.037 0.368 0.000 0.001 0.037 0.001 0.000 0.276 0.019 0.000 0.019 0.000 0.000 0.110 0.019 0.019 0.019 25 | 0.037 0.184 0.000 0.019 0.001 0.092 0.001 0.000 0.074 0.000 0.001 0.515 0.019 0.000 0.000 0.000 0.000 0.000 0.019 0.037 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 20 columns Num Motifs: 54 3, 1 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727 4, 1 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686 5, 1 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976 7, 1 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154 10, 1 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765 11, 1 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082 12, 1 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543 14, 1 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634 16, 1 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394 17, 1 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673 18, 1 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256 19, 1 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312 20, 1 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725 25, 1 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085 26, 1 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140 27, 1 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431 31, 1 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337 33, 1 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225 34, 1 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 
15805374 37, 1 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007 40, 1 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937 41, 1 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313 42, 1 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864 43, 1 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427 46, 1 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495 48, 1 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717 49, 1 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994 50, 1 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507 52, 1 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867 54, 1 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859 55, 1 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944 56, 1 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233 57, 1 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401 58, 1 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503 59, 1 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723 61, 1 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743 61, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743 63, 1 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157 64, 1 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357 66, 1 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028 69, 1 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859 74, 1 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307 76, 1 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116 77, 1 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582 78, 1 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332 79, 1 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713 80, 1 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501 82, 1 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237 83, 1 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972 84, 1 79 
vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327 91, 1 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568 93, 1 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697 94, 1 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567 96, 1 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788 * ************** * * * ** Column 1 : Sequence Number, Site Number Column 2 : Left End Location Column 4 : Motif Element Column 5 : Right End Location Column 6 : Probability of Element Column 7 : Forward Motif (F) or Reverse Complement (R) Column 8 : Sequence Description from Fast A input Log Motif portion of MAP for motif c = -1607.59351 Log Fragmentation portion of MAP for motif c = -10.42374 ------------------------------------------------------------------------- MOTIF d Motif model (residue frequency x 100) ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t Info _____________________________________________________________________________________________ 1 | 2 . . 28 17 2 . 2 8 . . 17 . . 11 . . . 2 5 1.1 2 | 5 2 . . 5 34 . . . . 2 14 . . 17 . . 8 2 5 1.4 3 | 8 . . 5 11 . 14 . 2 . 28 . . 5 2 . . 17 . 2 1.0 4 | 2 . . 11 . . 62 . . . 5 . . 11 . . 2 . 2 . 2.1 5 | . . . . . . 2 . . . 57 . . 5 . . 5 22 5 . 2.2 7 | 2 68 . . . 2 5 . 2 . 2 . . 2 . . . . 2 8 1.9 8 | 2 62 . . . . . . 25 . . 8 . . . . . . . . 2.3 9 | . 8 . . . 8 . . 5 . . 77 . . . . . . . . 2.3 10 | 8 8 . . . 57 . . 2 . . 5 . . 11 . . . 2 2 2.1 11 | 14 2 . . . 62 8 . . . . . . . . . . . 11 . 2.4 12 | 2 11 . . . 14 . 11 2 5 . 5 . . 45 . . . . . 2.4 13 | . . . . . . . . . . . . . . . 100 . . . . 4.3 14 | 22 . . . . 2 28 2 . . 8 17 5 . 2 . . 8 . . 1.2 15 | 51 . . 42 . . . . . . . . . 2 . . . . 2 . 2.3 16 | . . . 2 . 71 2 . . 5 . . 5 . 5 . . . 5 . 2.9 17 | . . . . . . . . . . . . . . . . . . 11 88 3.6 18 | 5 . . . . 25 2 . . . . . . . . 51 2 . 11 . 2.4 19 | . 54 . . . . 20 . 8 . . . . . . . . . . 17 2.0 20 | . . 97 . . . . . . . . . . . . . . . 2 . 
6.2 21 | 5 . . . . . . . . . . . . . . 28 2 . 20 42 2.3 22 | 14 2 . . . . 2 . . . 14 5 5 . . . 2 8 5 37 1.3 23 | . . . . 62 . . . . . 2 . . 8 . . 14 . 5 5 2.2 24 | 14 2 . . . 2 2 22 11 . . 31 11 . . . . . . . 1.8 27 | 2 2 . . 2 45 17 . 8 . . 8 . . 8 2 . . . . 1.6 28 | 11 . . 2 2 8 5 . . . . . . 2 14 . 5 22 22 . 1.4 nonsite 8 7 . 6 7 4 7 1 5 . 7 9 2 4 2 4 3 4 5 5 site 7 9 3 3 4 13 7 1 3 . 4 7 1 1 4 7 1 3 4 8 Motif probability model ____________________________________________ Pos. # a v c d e f g h i w k l m n y p q r s t ____________________________________________ 1 | 0.029 0.001 0.000 0.283 0.170 0.029 0.001 0.028 0.085 0.000 0.001 0.170 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.057 2 | 0.058 0.029 0.000 0.001 0.057 0.339 0.001 0.000 0.001 0.000 0.029 0.142 0.000 0.001 0.169 0.001 0.000 0.085 0.029 0.057 3 | 0.086 0.001 0.000 0.057 0.114 0.001 0.142 0.000 0.029 0.000 0.283 0.001 0.000 0.057 0.029 0.001 0.000 0.170 0.001 0.029 4 | 0.029 0.001 0.000 0.114 0.001 0.001 0.621 0.000 0.001 0.000 0.057 0.001 0.000 0.113 0.000 0.001 0.029 0.001 0.029 0.001 5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.564 0.001 0.000 0.057 0.000 0.001 0.057 0.226 0.057 0.001 7 | 0.029 0.677 0.000 0.001 0.001 0.029 0.057 0.000 0.029 0.000 0.029 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.085 8 | 0.029 0.621 0.000 0.001 0.001 0.001 0.001 0.000 0.254 0.000 0.001 0.086 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001 9 | 0.001 0.086 0.000 0.001 0.001 0.085 0.001 0.000 0.057 0.000 0.001 0.762 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001 10 | 0.086 0.086 0.000 0.001 0.001 0.564 0.001 0.000 0.029 0.000 0.001 0.058 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.029 11 | 0.142 0.029 0.000 0.001 0.001 0.620 0.085 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.001 12 | 0.029 0.114 0.000 0.001 0.001 0.142 0.001 0.113 0.029 0.057 0.001 0.058 0.000 0.001 0.451 0.001 0.000 0.001 0.001 0.001 13 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 
0.000 0.001 0.001 0.000 0.001 0.000 0.987 0.000 0.001 0.001 0.001 14 | 0.227 0.001 0.000 0.001 0.001 0.029 0.283 0.028 0.001 0.000 0.086 0.170 0.057 0.001 0.029 0.001 0.000 0.085 0.001 0.001 15 | 0.508 0.001 0.000 0.423 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.001 16 | 0.001 0.001 0.000 0.029 0.001 0.705 0.029 0.000 0.001 0.057 0.001 0.001 0.057 0.001 0.057 0.001 0.000 0.001 0.057 0.001 17 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.874 18 | 0.058 0.001 0.000 0.001 0.001 0.254 0.029 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.508 0.029 0.001 0.113 0.001 19 | 0.001 0.536 0.000 0.001 0.001 0.001 0.198 0.000 0.085 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.170 20 | 0.001 0.001 0.958 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.029 0.001 21 | 0.058 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.282 0.029 0.001 0.198 0.423 22 | 0.142 0.029 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.142 0.058 0.057 0.001 0.000 0.001 0.029 0.085 0.057 0.367 23 | 0.001 0.001 0.000 0.001 0.621 0.001 0.001 0.000 0.001 0.000 0.029 0.001 0.000 0.085 0.000 0.001 0.141 0.001 0.057 0.057 24 | 0.142 0.029 0.000 0.001 0.001 0.029 0.029 0.226 0.113 0.000 0.001 0.311 0.113 0.001 0.000 0.001 0.000 0.001 0.001 0.001 27 | 0.029 0.029 0.000 0.001 0.029 0.451 0.170 0.000 0.085 0.000 0.001 0.086 0.000 0.001 0.085 0.029 0.000 0.001 0.001 0.001 28 | 0.114 0.001 0.000 0.029 0.029 0.085 0.057 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.141 0.001 0.057 0.226 0.226 0.001 Background probability model 0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052 25 columns Num Motifs: 35 1, 1 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044 2, 1 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 
11467494 6, 1 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 0.98 F 13186328 8, 1 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053 9, 1 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117 13, 1 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173 15, 1 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438 22, 1 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375 23, 1 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658 24, 1 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511 28, 1 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152 30, 1 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738 35, 1 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234 36, 1 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629 38, 1 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339 39, 1 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668 44, 1 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919 47, 1 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671 51, 1 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644 53, 1 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033 60, 1 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548 62, 1 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077 67, 1 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112 68, 1 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072 70, 1 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405 71, 1 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878 73, 1 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812 81, 1 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841 85, 1 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065 86, 1 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732 87, 1 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 
4996210
 88,  1    41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp    68  1.00 F  5326864
 89,  1    88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe   115  1.00 F  6322180
 90,  1    43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld    70  0.99 F  6323138
 95,  1    31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed    58  1.00 F  9955016
                    ***** ****************** **

Column 1 :  Sequence Number, Site Number
Column 2 :  Left End Location
Column 4 :  Motif Element
Column 5 :  Right End Location
Column 6 :  Probability of Element
Column 7 :  Forward Motif (F) or Reverse Complement (R)
Column 8 :  Sequence Description from Fast A input

Log Motif portion of MAP for motif d = -1668.31468
Log Fragmentation portion of MAP for motif d = -7.86327

Log Background portion of Map = -39912.17887
Log Alignment portion of Map = -956.36102
Log Site/seq portion of Map = 0.00000
Log Null Map = -46943.36311
Log Map = 2112.13301

log MAP = sum of motif and fragmentation parts of MAP + background + alignment + sites/seq - Null

Frequency Map = 2109.909622
Nearopt Map = 2111.157969
Maximal Map = 2111.157969

Total Time 105 sec (1.750000 min)
Elapsed time: 104.960000 secs

DOF[0] = 190
DOF[1] = 285
DOF[2] = 380
DOF[3] = 475
"""

#run if called from command-line
if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_parse/test_greengenes.py
#!/usr/bin/env python

from cogent.util.unit_test import TestCase, main
from cogent.parse.greengenes import MinimalGreengenesParser, make_ignore_f,\
        DefaultDelimitedSplitter, SpecificGreengenesParser

__author__ = "Daniel McDonald"
__copyright__ = "Copyright 2007-2012, The Cogent Project"  #consider project name
__credits__ = ["Daniel McDonald"]  #remember to add yourself if you make changes
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Daniel McDonald"
__email__ = "daniel.mcdonald@colorado.edu"
__status__ = "Prototype"

class ParseGreengenesRecordsTests(TestCase):
    def setUp(self):
        pass

    def test_MinimalGreengenesParser_mock(self):
        """Test MinimalGreengenesParser against mock data"""
        res = MinimalGreengenesParser(mock_data.splitlines(), RecStart="my_starting", \
                                      RecEnd="my_ending")
        records = list(res)
        exp = [{'a':'1','b':'2','c':'3','d':'','e':'5'},
               {'q':'asdasd','c':'taco'}]
        self.assertEqual(records, exp)

    def test_MinimalGreengenesParser_real(self):
        """Test MinimalGreengenesParser against real data"""
        res = MinimalGreengenesParser(real_data.splitlines())
        record1, record2 = list(res)
        self.assertEqual(record1['G2_chip_tax_string'], 'Unclassified')
        self.assertEqual(record1['authors'],
            'Hernanandez-Eugenio,G., Silva-Rojas,H.V., Zelaya-Molina,L.X.')
        self.assertEqual(record1['bel3_div_ratio'], '')
        self.assertEqual(len(record1), 72)
        self.assertEqual(record2['ncbi_acc_w_ver'], 'FJ832719.1')
        self.assertEqual(record2['timestamp'], '2010-03-23 14:08:27')
        self.assertEqual(record2['title'],
            'Developmental Microbial Ecology of the Crop of the Folivorous Hoatzin')

    def test_SpecificGreengenesParser_real(self):
        """Test SpecificGreengenesParser against real data"""
        fields = ['prokMSA_id', 'journal']
        res = SpecificGreengenesParser(real_data.splitlines(), fields)
        records = list(res)
        exp = [('604868', ''), ('604867', 'ISME J (2010) In press')]
        self.assertEqual(records, exp)

        ids = ['604867', '12312312323']
        res = SpecificGreengenesParser(real_data.splitlines(), fields, ids)
        records = list(res)
        exp = [('604867', 'ISME J (2010) In press')]
        self.assertEqual(records, exp)

    def test_make_ignore_f(self):
        """Properly ignore empty records and the start line"""
        f = make_ignore_f('testing')
        self.assertFalse(f(['asasdasd', '']))
        self.assertFalse(f(['test', '']))
        self.assertFalse(f(['testing2', '']))
        self.assertFalse(f(['testing', 'asd']))
        self.assertTrue(f(['', '']))
        self.assertTrue(f(None))
        self.assertTrue(f(['', '']))
        self.assertTrue(f(['testing', '']))

mock_data = """my_starting
a=1
b=2
c=3
d=
e=5
my_ending
my_starting
q=asdasd
c=taco
my_ending
"""

real_data = """BEGIN
G2_chip_tax_string=Unclassified
G2_chip_tax_string_format_2=Unclassified HOMD_tax_string= HOMD_tax_string_format_2= Hugenholtz_tax_string=Unclassified Hugenholtz_tax_string_format_2=Unclassified Ludwig_tax_string=Unclassified Ludwig_tax_string_format_2=Unclassified Pace_tax_string=Unclassified Pace_tax_string_format_2=Unclassified RDP_tax_string=Unclassified RDP_tax_string_format_2=Unclassified Silva_tax_string=Unclassified Silva_tax_string_format_2=Unclassified authors=Hernanandez-Eugenio,G., Silva-Rojas,H.V., Zelaya-Molina,L.X. bel3_div_ratio= bellerophon= blast_perc_ident_to_template= clone=51a contact_info=Irrigacion, Universidad Autonoma Chapingo, Carretera Mexico-Texcoco Km 37.5, Texcoco, Mexico 56230, Mexico core_set_member= core_set_member2= country=Mexico: Mexico City create_date=21-NOV-2009 db_name= decision=clone description=Uncultured bacterium clone 51a 16S ribosomal RNA gene, partial sequence email= gold_id= img_oid= isolate= isolation_source=mesophilic anaerobic reactor fed with effluent from the chemical industry journal= longest_insertion= medline_ids= ncbi_acc= ncbi_acc_w_ver=FJ461956.1 ncbi_gi=213390944 ncbi_seq_length=1512 ncbi_tax_id=77133 ncbi_tax_string=Bacteria; environmental samples ncbi_tax_string_format_2=Unclassified non_ACGT_count= non_ACGT_percent= note= organism=uncultured bacterium perc_ident_to_invariant_core= prokMSA_id=604868 prokMSAname=Microbial ecology industrial digestor mesophilic anaerobic reactor fed effluent chemical industry clone 51a pubmed_ids= remark= replaced_by= single_nt_runs_over_7= small_gap_intrusions= source=uncultured bacterium span_aligned=1..2 specific_host= status=0 strain= study_id=38002 sub_species= submit_date=24-OCT-2008 template= timestamp=2010-03-23 14:08:27 title=Microbial ecology of industrial anaerobic digestor unaligned_length= update_date=21-NOV-2009 warning= wigeon95= wigeon99= wigeon_std_dev= aligned_seq=unaligned END BEGIN G2_chip_tax_string=Unclassified G2_chip_tax_string_format_2=Unclassified HOMD_tax_string= 
HOMD_tax_string_format_2= Hugenholtz_tax_string=Unclassified Hugenholtz_tax_string_format_2=Unclassified Ludwig_tax_string=Unclassified Ludwig_tax_string_format_2=Unclassified Pace_tax_string=Unclassified Pace_tax_string_format_2=Unclassified RDP_tax_string=Unclassified RDP_tax_string_format_2=Unclassified Silva_tax_string=Unclassified Silva_tax_string_format_2=Unclassified authors=Brodie,E.L., Dominguez-Bello,M.G., Garcia-Amado,M.A., Godoy-Vitorino,F., Goldfarb,K.C., Michelangeli,F. bel3_div_ratio= bellerophon= blast_perc_ident_to_template= clone=J3Q101_11C02 contact_info=Biology, University of Puerto Rico, Rio Piedras Campus, PO Box 23360, San Juan, PR 00931-3360, USA core_set_member= core_set_member2= country=Venezuela create_date=10-DEC-2009 db_name= decision=clone description=Uncultured bacterium clone J3Q101_11C02 16S ribosomal RNA gene, partial sequence email= gold_id= img_oid= isolate= isolation_source=crop contents journal=ISME J (2010) In press longest_insertion= medline_ids= ncbi_acc= ncbi_acc_w_ver=FJ832719.1 ncbi_gi=226447371 ncbi_seq_length=1326 ncbi_tax_id=77133 ncbi_tax_string=Bacteria; environmental samples ncbi_tax_string_format_2=Unclassified non_ACGT_count= non_ACGT_percent= note= organism=uncultured bacterium perc_ident_to_invariant_core= prokMSA_id=604867 prokMSAname=Microbial Ecology Crop Folivorous Hoatzin crop contents clone J3Q101_11C02 pubmed_ids= remark= replaced_by= single_nt_runs_over_7= small_gap_intrusions= source=uncultured bacterium span_aligned=1..2 specific_host= status=0 strain= study_id=37901 sub_species= submit_date=16-MAR-2009 template= timestamp=2010-03-23 14:08:27 title=Developmental Microbial Ecology of the Crop of the Folivorous Hoatzin unaligned_length= update_date=10-DEC-2009 warning= wigeon95= wigeon99= wigeon_std_dev= aligned_seq=unaligned END """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_illumina_sequence.py000644 000765 000024 00000010507 12024702176 024672 0ustar00jrideoutstaff000000 
000000 #!/usr/bin/env python """Tests of Illumina sequence file parser. """ from cogent.util.unit_test import TestCase, main from cogent.util.misc import remove_files from cogent.app.util import get_tmp_filename from cogent.parse.illumina_sequence import (MinimalIlluminaSequenceParser) __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Production" class ParseIlluminaSequenceTests(TestCase): """ Test of top-level Illumina parsing functions """ def setUp(self): """ """ self.illumina_read1 = illumina_read1 self.illumina_read2 = illumina_read2 self.expected_read1 = expected_read1 self.expected_read2 = expected_read2 self.illumina_read1_fp = get_tmp_filename( prefix='ParseIlluminaTest',suffix='.txt') open(self.illumina_read1_fp,'w').write('\n'.join(self.illumina_read1)) self.files_to_remove = [self.illumina_read1_fp] def tearDown(self): """ """ remove_files(self.files_to_remove) def test_MinimalIlluminaSequenceParser(self): """ MinimalIlluminaSequenceParser functions as expected """ actual_read1 = list(MinimalIlluminaSequenceParser(self.illumina_read1)) self.assertEqual(actual_read1,self.expected_read1) actual_read2 = list(MinimalIlluminaSequenceParser(self.illumina_read2)) self.assertEqual(actual_read2,self.expected_read2) def test_MinimalIlluminaSequenceParser_handles_filepath_as_input(self): """ MinimalIlluminaSequenceParser functions with filepath as input """ actual_read1 = list(MinimalIlluminaSequenceParser( self.illumina_read1_fp)) self.assertEqual(actual_read1,self.expected_read1) def test_MinimalIlluminaSequenceParser_handles_file_as_input(self): """ MinimalIlluminaSequenceParser functions with file handle as input """ actual_read1 = list(MinimalIlluminaSequenceParser( open(self.illumina_read1_fp))) self.assertEqual(actual_read1,self.expected_read1) 
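The Illumina records these tests exercise are colon-delimited: five header fields (machine, run/lane/tile coordinates, index/read tag), then the sequence, then the quality string. A minimal standalone sketch of that split, for illustration only — this is an assumed 7-field layout inferred from the test data below, not PyCogent's actual MinimalIlluminaSequenceParser implementation:

```python
def parse_illumina_line(line):
    """Split one pre-Casava Illumina record line into (header fields, seq, qual).

    Hypothetical helper, not part of cogent.parse.illumina_sequence; assumes
    the colon-delimited 7-field layout used by illumina_read1/illumina_read2
    below, and that ':' never occurs inside the quality string.
    """
    fields = line.strip().split(':')
    header, seq, qual = fields[:5], fields[5], fields[6]
    return header, seq, qual
```

Applied to the first record in illumina_read1 below, this yields the (["HWI-6X_9267", "1", "1", "4", "1699#ACCACCC/1"], sequence, quality) triple that expected_read1 encodes.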
illumina_read1 = """HWI-6X_9267:1:1:4:1699#ACCACCC/1:TACGGAGGGTGCGAGCGTTAATCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGAAAAAAAAAAAAAAAAAAAAAAA:abbbbbbbbbb`_`bbbbbb`bb^aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaDaabbBBBBBBBBBBBBBBBBBBB HWI-6X_9267:1:1:4:390#ACCTCCC/1:GACAGGAGGAGCAAGTGTTATTCAAATTATGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAA:aaaaaaaaaa```aa\^_aa``aVaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaBaaaaa""".split('\n') expected_read1 = [(["HWI-6X_9267","1","1","4","1699#ACCACCC/1"], "TACGGAGGGTGCGAGCGTTAATCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGAAAAAAAAAAAAAAAAAAAAAAA", "abbbbbbbbbb`_`bbbbbb`bb^aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaDaabbBBBBBBBBBBBBBBBBBBB"), (["HWI-6X_9267","1","1","4","390#ACCTCCC/1"], "GACAGGAGGAGCAAGTGTTATTCAAATTATGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAA", "aaaaaaaaaa```aa\^_aa``aVaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaBaaaaa")] illumina_read2 = """HWI-6X_9267:1:1:4:1699#ACCACCC/2:TTTTAAAAAAAAGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTAAAAAAAAACCCCCCCGGGGGGGGTTTTTTTAATTATTC:aaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccBcccccccccccccccc```````BBBB HWI-6X_9267:1:1:4:390#ACCTCCC/2:ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG:aaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbb""".split('\n') expected_read2 = [(["HWI-6X_9267","1","1","4","1699#ACCACCC/2"], "TTTTAAAAAAAAGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTAAAAAAAAACCCCCCCGGGGGGGGTTTTTTTAATTATTC", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccBcccccccccccccccc```````BBBB"), (["HWI-6X_9267","1","1","4","390#ACCTCCC/2"], "ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG", 
"aaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbb")] if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_ilm.py000644 000765 000024 00000002045 12024702176 021747 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.parse.ilm import ilm_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class IlmParserTest(TestCase): """Provides tests for ILM RNA secondary structure format parsers""" def setUp(self): """Setup function""" #output self.ilm_out = ILM #expected self.ilm_exp = [[(0,13),(1,12),(2,11),(6,7)]] def test_ilm_output(self): """Test for ilm format""" obs = ilm_parser(self.ilm_out) self.assertEqual(obs,self.ilm_exp) ILM = ['\n', 'Final Matching:\n', '1 14\n', '2 13\n', '3 12\n', '4 0\n', '5 0\n', '6 0\n', '7 8\n', '8 7\n', '9 0\n', '10 0\n', '11 0\n', '12 3\n', '13 2\n', '14 1\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_infernal.py000644 000765 000024 00000015773 12024702176 023000 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # test_infernal.py from cogent.util.unit_test import TestCase, main from cogent.parse.infernal import CmsearchParser,CmalignScoreParser __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" class CmsearchParserTests(TestCase): """Tests for CmsearchParser. """ def setUp(self): """setup for CmsearchParserTests. 
""" self.basic_results_empty = """# command data # date # CM summary # Post search summary """ self.basic_results_hits = """# command data # date # CM summary Model_1 Target_1 1 10 1 10 25.25 - 50 Model_1 Target_2 3 13 1 10 14.2 - 49 # Post search summary """ self.basic_res = [['Model_1','Target_1', 1, 10, 1, 10, 25.25, '-', 50],\ ['Model_1','Target_2', 3, 13, 1, 10, 14.2, '-', 49]] self.search_res = [['model1.cm','seq_0', 5, 23, 1, 19, 12.85, '-', 37],\ ['model1.cm','seq_1', 1, 19, 1, 19, 14.36, '-', 47]] def test_cmsearch_parser_no_data(self): """CmsearchParser should return correct result given no data. """ parser = CmsearchParser([]) self.assertEqual(list(parser),[]) def test_cmsearch_parser_no_res(self): """CmsearchParser should return correct result given no hits in result. """ parser = CmsearchParser(self.basic_results_empty.split('\n')) self.assertEqual(list(parser),[]) def test_cmsearch_parser_basic(self): """CmsearchParser should return correct result given basic output. """ parser = CmsearchParser(self.basic_results_hits.split('\n')) self.assertEqual(list(parser),self.basic_res) def test_cmsearch_parser_full(self): """CmsearchParser should return correct result given cmsearch output. """ parser = CmsearchParser(SEARCH_DATA.split('\n')) self.assertEqual(list(parser),self.search_res) class CmalignScoreParserTests(TestCase): """Tests for CmalignScoreParser. """ def setUp(self): """setup for CmalignScoreParserTests. 
""" self.basic_results_hits = """# command: data # date: # # cm summary 1 Target_1 83 55.02 2.94 0.956 00:00:00.01 2 Target_2 84 53.31 4.42 0.960 00:00:00.01 # post alignment summary """ self.basic_res = [[1,'Target_1',83,55.02,2.94,0.956,'00:00:00.01'],\ [2,'Target_2',84,53.31,4.42,0.960,'00:00:00.01']] self.search_res = \ [[1,'AABL01002928.1/2363-2445',83,55.02,2.94,0.956,'00:00:00.01'],\ [2,'AACV01025780.1/26051-26134',84,53.31,4.42,0.960,'00:00:00.01']] def test_cmalign_score_parser_no_data(self): """CmalignScoreParser should return correct result given no data. """ parser = CmalignScoreParser([]) self.assertEqual(list(parser),[]) def test_cmalign_score_parser_basic(self): """CmalignScoreParser should return correct result given basic output. """ parser = CmalignScoreParser(self.basic_results_hits.split('\n')) self.assertEqual(list(parser),self.basic_res) def test_cmalign_score_parser_full(self): """CmalignScoreParser should return correct result given cmalign output. """ parser = CmalignScoreParser(ALIGN_DATA.split('\n')) self.assertEqual(list(parser),self.search_res) SEARCH_DATA = """# command: cmsearch -T 0.0 --tabfile /tmp/tmpQGr0PGVeaEvGUkw2TM3e.txt --informat FASTA /tmp/tmp40hq0MqFPLn2lAymQeAD.txt /tmp/tmplTEQNgv0UA7sFSV0Z2RL.txt # date: Mon Nov 8 13:51:12 2010 # num seqs: 3 # dbsize(Mb): 0.000124 # # Pre-search info for CM 1: model1.cm # # rnd mod alg cfg beta bit sc cut # --- --- --- --- ----- ---------- # 1 hmm fwd loc - 3.00 # 2 cm cyk loc 1e-10 0.00 # 3 cm ins loc 1e-15 0.00 # # CM: model1.cm # target coord query coord # ---------------------- ------------ # model name target name start stop start stop bit sc E-value GC% # ------------------------------- ----------- ---------- ---------- ----- ----- -------- -------- --- model1.cm seq_0 5 23 1 19 12.85 - 37 model1.cm seq_1 1 19 1 19 14.36 - 47 # # Post-search info for CM 1: /tmp/tmpWmLUo5hsKH6nyib4nGMq.cm # # rnd mod alg cfg beta bit sc cut num hits surv fract # --- --- --- --- ----- ---------- 
-------- ---------- # 1 hmm fwd loc - 3.00 2 0.4113 # 2 cm cyk loc 1e-10 0.00 2 0.4113 # 3 cm ins loc 1e-15 0.00 2 0.3065 # # run time # ----------- # 00:00:00""" ALIGN_DATA = """# cmalign :: align sequences to an RNA CM # INFERNAL 1.0.2 (October 2009) # Copyright (C) 2009 HHMI Janelia Farm Research Campus # Freely distributed under the GNU General Public License (GPLv3) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # command: cmalign orig_alignments/RF01057_3NPQ_chain_A_results.cm RF01057_two_seqs.fasta # date: Thu May 1 13:00:32 2011 # # cm name algorithm config sub bands tau # ------------------------- --------- ------ --- ----- ------ # RF01057_3NPQ_chain_A_resu opt acc global no hmm 1e-07 # # bit scores # ------------------ # seq idx seq name len total struct avg prob elapsed # ------- -------------------------- ----- -------- -------- -------- ----------- 1 AABL01002928.1/2363-2445 83 55.02 2.94 0.956 00:00:00.01 2 AACV01025780.1/26051-26134 84 53.31 4.42 0.960 00:00:00.01 # STOCKHOLM 1.0 #=GF AU Infernal 1.0.2 AABL01002928.1/2363-2445 CGCGCCGAGGAGCGCUGCGACGGCCCG...UCGAGGGCCGCCAGGCUCGG AACV01025780.1/26051-26134 CCUGCCGAGGGGCGCUGCGACCGGAUCcaaUGAGGCCCGGCCAGGCUCGG #=GC SS_cons :::::::::<--<<<--<--<<_____...________>>->-------- #=GC RF CuuuCCGAGGAGCGCUGcAACgGgcuc...uuacggcccGCcAGGCUCGG AABL01002928.1/2363-2445 CGGGG...ACAAucgguUUUCCAACGGCGSUCUGUUUAU AACV01025780.1/26051-26134 UAAGGuggCUUU.....GUAACAACGGCGCCCGGCUAGA #=GC SS_cons -----...----.....--------->>>-->::::::: #=GC RF aaagG...uaaa.....ccuaCAACGGCGCUCAcuCaca // # # CPU time: 0.02u 0.00s 00:00:00.02 Elapsed: 00:00:00""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_kegg_fasta.py000644 000765 000024 00000005650 12024702176 023266 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Jesse Zaneveld" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Zaneveld"] __license__ = "GPL" __version__ = "1.5.3" 
__maintainer__ = "Jesse Zaneveld" __email__ = "zaneveld@gmail.com" __status__ = "Production" """ Test code for kegg_fasta.py in cogent.parse. """ from cogent.util.unit_test import TestCase, main from cogent.parse.kegg_fasta import kegg_label_fields, parse_fasta class ParseKeggFastaTests(TestCase): def test_kegg_label_fields(self): """kegg_label_fields should return fields from line""" # Format is species:gene_id [optional gene_name]; description. # Note that the '>' should already be stripped by the Fasta Parser test1 = \ """stm:STM0001 thrL; thr operon leader peptide ; K08278 thr operon leader peptide""" test2 = \ """stm:STM0002 thrA; bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K00003 homoserine dehydrogenase [EC:1.1.1.3]; K00928 aspartate kinase [EC:2.7.2.4]""" obs = kegg_label_fields(test1) exp = ('stm:STM0001','stm','STM0001',\ 'thrL','thr operon leader peptide ; K08278 thr operon leader peptide') self.assertEqual(obs,exp) obs = kegg_label_fields(test2) exp = ('stm:STM0002', 'stm', 'STM0002', 'thrA', \ 'bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K00003 homoserine dehydrogenase [EC:1.1.1.3]; K00928 aspartate kinase [EC:2.7.2.4]') self.assertEqual(obs,exp) def test_parse_fasta(self): """parse_fasta should parse KEGG FASTA lines""" obs = parse_fasta(TEST_KEGG_FASTA_LINES) exp = EXP_RESULT for i,entry in enumerate(obs): self.assertEqual(entry, exp[i]) TEST_KEGG_FASTA_LINES = \ [">stm:STM0001 thrL; thr operon leader peptide; K08278 thr operon leader peptide",\ "atgaaccgcatcagcaccaccaccattaccaccatcaccattaccacaggtaacggtgcgggctga",\ ">stm:STM0002 thrA; bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K12524 bifunctional aspartokinase/homoserine dehydrogenase 1 [EC:2.7.2.4 1.1.1.3]",\ "atgcgagtgttgaagttcggcggtacatcagtggcaaatgcagaacgttttctgcgtgtt",\ "gccgatattctggaaagcaatgccaggcaagggcaggtagcgaccgtactttccgccccc"] EXP_RESULT = \ ["\t".join(["stm:STM0001","stm","STM0001",\ 
"thrL","thr operon leader peptide; K08278 thr operon leader peptide","atgaaccgcatcagcaccaccaccattaccaccatcaccattaccacaggtaacggtgcgggctga","\n"]),\ "\t".join(["stm:STM0002","stm","STM0002",\ "thrA","bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K12524 bifunctional aspartokinase/homoserine dehydrogenase 1 [EC:2.7.2.4 1.1.1.3]",\ "atgcgagtgttgaagttcggcggtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgccaggcaagggcaggtagcgaccgtactttccgccccc","\n"])] if __name__=="__main__": main() PyCogent-1.5.3/tests/test_parse/test_kegg_ko.py000644 000765 000024 00000041365 12024702176 022604 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Jesse Zaneveld" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Zaneveld"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Zaneveld" __email__ = "zaneveld@gmail.com" __status__ = "Production" """ Test code for kegg_ko.py in cogent.parse. """ from cogent.util.unit_test import TestCase, main from cogent.parse.kegg_ko import kegg_label_fields,\ parse_kegg_taxonomy, ko_record_iterator, ko_record_splitter,\ ko_default_parser, ko_first_field_parser, delete_comments,\ ko_colon_fields, ko_colon_delimited_parser, _is_new_kegg_rec_group,\ group_by_end_char, class_lines_to_fields, ko_class_parser, parse_ko,\ parse_ko_file, make_tab_delimited_line_parser class ParseKOTests(TestCase): def make_tab_delimited_line_parser(self): """make_tab_delimited_line_parser should generate line parser""" line ="good\tbad:good\tgood\tgood\tbad:good\tgood" parse_fn = make_tab_delimited_line_parser([0,2,3,5]) obs = parse_fn(line) exp = "good\tgood\tgood\tgood\tgood\tgood" self.assertEqual(obs,exp) def test_kegg_label_fields(self): """kegg_label_fields should return fields from line""" # Format is species:gene_id [optional gene_name]; description. 
# Note that the '>' should already be stripped by the Fasta Parser test1 = \ """stm:STM0001 thrL; thr operon leader peptide ; K08278 thr operon leader peptide""" test2 = \ """stm:STM0002 thrA; bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K00003 homoserine dehydrogenase [EC:1.1.1.3]; K00928 aspartate kinase [EC:2.7.2.4]""" obs = kegg_label_fields(test1) exp = ('stm:STM0001','stm','STM0001',\ 'thrL','thr operon leader peptide ; K08278 thr operon leader peptide') self.assertEqual(obs,exp) obs = kegg_label_fields(test2) exp = ('stm:STM0002', 'stm', 'STM0002', 'thrA', \ 'bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K00003 homoserine dehydrogenase [EC:1.1.1.3]; K00928 aspartate kinase [EC:2.7.2.4]') self.assertEqual(obs,exp) def test_ko_record_iterator(self): """ko_record_iterator should iterate over KO records""" recs = [] for rec in ko_record_iterator(TEST_KO_LINES): recs.append(rec) self.assertEqual(len(recs),3) self.assertEqual(len(recs[0]),31) exp = 'ENTRY K01559 KO\n' self.assertEqual(recs[0][0],exp) exp = ' RCI: RCIX1162 RCIX2396\n' self.assertEqual(recs[0][-1],exp) exp = 'ENTRY K01561 KO\n' self.assertEqual(recs[-1][0],exp) exp = ' MSE: Msed_1088\n' self.assertEqual(recs[-1][-1],exp) def test_ko_record_splitter(self): """ko_record_splitter should split ko lines into a dict of groups""" recs=[rec for rec in ko_record_iterator(TEST_KO_LINES)] split_recs = ko_record_splitter(recs[0]) exp = ['GENES AFM: AFUA_4G13070\n',\ ' PHA: PSHAa2393\n',\ ' ABO: ABO_0668\n',\ ' BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n',\ ' MPT: Mpe_A2274\n',\ ' BBA: Bd0910(catD)\n',\ ' GBE: GbCGDNIH1_0998 GbCGDNIH1_1171\n',\ ' FNU: FN1345\n', \ ' RBA: RB13257\n',\ ' HMA: rrnAC1925(mhpC)\n',\ ' RCI: RCIX1162 RCIX2396\n'] self.assertEqual(exp,split_recs["GENES"]) exp = ['CLASS Metabolism; Biosynthesis of Secondary Metabolites; Limonene and\n', ' pinene degradation [PATH:ko00903]\n', ' Metabolism; Xenobiotics Biodegradation 
and Metabolism; Caprolactam\n', ' degradation [PATH:ko00930]\n', ' Metabolism; Xenobiotics Biodegradation and Metabolism;\n', ' 1,1,1-Trichloro-2,2-bis(4-chlorophenyl)ethane (DDT) degradation\n', ' [PATH:ko00351]\n', ' Metabolism; Xenobiotics Biodegradation and Metabolism; Benzoate\n', ' degradation via CoA ligation [PATH:ko00632]\n', ' Metabolism; Xenobiotics Biodegradation and Metabolism; Benzoate\n', ' degradation via hydroxylation [PATH:ko00362]\n'] def test_ko_default_parser(self): """ko_default parser should strip out newlines and join lines together""" # Applies to 'NAME' and 'DEFINITION' lines default_line_1 = ['NAME E3.8.1.2\n'] obs = ko_default_parser(default_line_1) self.assertEqual(obs,'E3.8.1.2') default_line_2 = ['DEFINITION 2-haloacid dehalogenase [EC:3.8.1.2]\n'] obs = ko_default_parser(default_line_2) self.assertEqual(obs,'2-haloacid dehalogenase [EC:3.8.1.2]') def test_ko_first_field_parser(self): """ko_first_field_parser should strip out newlines and join lines together (first field only)""" obs = ko_first_field_parser(\ ['ENTRY K01559 KO\n']) exp = 'K01559' self.assertEqual(obs,exp) def test_delete_comments(self): """delete_comments should delete parenthetical comments from lines""" test_line = \ "bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13);" exp = "bifunctional aspartokinase I/homeserine dehydrogenase I ;" obs = delete_comments(test_line) self.assertEqual(obs,exp) nested_test_line = \ "text(comment1(comment2));" exp = "text;" obs = delete_comments(nested_test_line) self.assertEqual(obs,exp) def test_ko_colon_fields(self): """ko_colon_fields should convert lines to (key, [list of values])""" test_lines =\ [' BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n'] obs = ko_colon_fields(test_lines) exp = ('BXE', ['Bxe_B0037', 'Bxe_C0683', 'Bxe_C1002', 'Bxe_C1023']) self.assertEqual(obs,exp) test_lines = [' HMA: rrnAC1925(mhpC)\n'] obs = ko_colon_fields(test_lines, without_comments = True) exp = ('HMA', ['rrnAC1925']) 
self.assertEqual(obs,exp) test_lines = [' HMA: rrnAC1925(mhpC)\n'] obs = ko_colon_fields(test_lines, without_comments = False) exp = ('HMA', ['rrnAC1925(mhpC)']) self.assertEqual(obs,exp) def test_ko_colon_delimited_parser(self): """ko_colon_delimited_parser should return a dict of id: values for colon delimited lines""" test_lines =\ ['GENES AFM: AFUA_4G13070\n',\ ' PHA: PSHAa2393\n',\ ' ABO: ABO_0668\n',\ ' BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n',\ ' MPT: Mpe_A2274\n',\ ' BBA: Bd0910(catD)\n',\ ' GBE: GbCGDNIH1_0998 GbCGDNIH1_1171\n',\ ' FNU: FN1345\n',\ ' RBA: RB13257\n',\ ' HMA: rrnAC1925(mhpC)\n',\ ' RCI: RCIX1162 RCIX2396\n'] obs = ko_colon_delimited_parser(test_lines, without_comments = True) self.assertEqual(obs['BXE'],['Bxe_B0037','Bxe_C0683', 'Bxe_C1002','Bxe_C1023']) self.assertEqual(obs['PHA'],['PSHAa2393']) # Check that comments are stripped self.assertEqual(obs['BBA'],['Bd0910']) obs = ko_colon_delimited_parser(test_lines, without_comments = False) # Lines without comments shouldn't be affected self.assertEqual(obs['BXE'],['Bxe_B0037','Bxe_C0683', 'Bxe_C1002','Bxe_C1023']) self.assertEqual(obs['PHA'],['PSHAa2393']) # Comments should be preserved self.assertEqual(obs['BBA'],['Bd0910(catD)']) def test_is_new_kegg_rec_group(self): """_is_new_kegg_rec_group should check for irregular field terminators in KEGG""" pass # Handle unusual KEGG fields. 
def test_group_by_end_char(self): """group_by_end_char should yield successive lines that end with a given char, plus the last group of lines""" class_lines=['CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n',\ ' gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n',\ ' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',\ ' 1,2-Dichloroethane degradation [PATH:ko00631]\n'] exp =[['CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n',\ ' gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n'],\ [' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',\ ' 1,2-Dichloroethane degradation [PATH:ko00631]\n']] for i,group in enumerate(group_by_end_char(class_lines)): self.assertEqual(group, exp[i]) def test_class_lines_to_fields(self): """class_lines_to_fields should split groups of lines for one KO class definition""" class_lines1=['CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n',\ ' gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n'] class_lines2=[' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',\ ' 1,2-Dichloroethane degradation [PATH:ko00631]\n'] obs = class_lines_to_fields(class_lines1) exp = ('PATH:ko00361',('Metabolism', 'Xenobiotics Biodegradation and Metabolism', 'gamma-Hexachlorocyclohexane degradation')) self.assertEqual(obs,exp) obs = class_lines_to_fields(class_lines2) exp = ('PATH:ko00631',('Metabolism', 'Xenobiotics Biodegradation and Metabolism','1,2-Dichloroethane degradation')) self.assertEqual(obs,exp) def test_ko_class_parser(self): """ko_class_parser should return fields""" class_lines='CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n',\ ' gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n',\ ' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',\ ' 1,2-Dichloroethane degradation [PATH:ko00631]\n' exp = [('PATH:ko00361',('Metabolism','Xenobiotics Biodegradation and Metabolism',\ 'gamma-Hexachlorocyclohexane degradation')),\ 
('PATH:ko00631',('Metabolism', 'Xenobiotics Biodegradation and Metabolism', '1,2-Dichloroethane degradation'))] for i,obs in enumerate(ko_class_parser(class_lines)): self.assertEqual(obs,exp[i]) def test_parse_ko(self): """parse_ko should parse a ko record into fields """ lines = TEST_KO_LINES r = parse_ko(lines) results = [] for result in r: results.append(result) # For each entry we expect a dict self.assertEqual(results[0]["ENTRY"], "K01559") self.assertEqual(results[1]["ENTRY"], "K01560") self.assertEqual(results[2]["ENTRY"], "K01561") self.assertEqual(results[0]["NAME"], "E3.7.1.-") self.assertEqual(results[1]["NAME"], "E3.8.1.2") self.assertEqual(results[2]["NAME"], "E3.8.1.3") self.assertEqual(results[0].get("DEFINITION"), None) #case 1 has no def self.assertEqual(results[1]["DEFINITION"],\ "2-haloacid dehalogenase [EC:3.8.1.2]") self.assertEqual(results[2]["DEFINITION"],\ "haloacetate dehalogenase [EC:3.8.1.3]") self.assertEqual(len(results[0]["CLASS"]), 5) self.assertEqual(results[0]["CLASS"][4], \ ('PATH:ko00362', ('Metabolism', \ 'Xenobiotics Biodegradation and Metabolism',\ 'Benzoate degradation via hydroxylation'))) self.assertEqual(results[0]["DBLINKS"], \ {'RN': ['R04488', 'R05100', 'R05363', \ 'R05365', 'R06371', 'R07515', \ 'R07831']}) self.assertEqual(results[1]["DBLINKS"], \ {'GO': ['0018784'], 'RN': ['R05287'], 'COG': ['COG1011']}) self.assertEqual(results[2]["DBLINKS"], \ {'GO': ['0018785'], 'RN': ['R05287']}) self.assertEqual(results[0]["GENES"], \ {'AFM': ['AFUA_4G13070'], 'FNU': ['FN1345'],\ 'GBE': ['GbCGDNIH1_0998', 'GbCGDNIH1_1171'],\ 'PHA': ['PSHAa2393'], \ 'BBA': ['Bd0910'], \ 'ABO': ['ABO_0668'],\ 'MPT': ['Mpe_A2274'],\ 'RCI': ['RCIX1162', 'RCIX2396'], \ 'BXE': ['Bxe_B0037', 'Bxe_C0683', 'Bxe_C1002', 'Bxe_C1023'],\ 'HMA': ['rrnAC1925'], \ 'RBA': ['RB13257']}) TEST_KO_LINES = ['ENTRY K01559 KO\n', '\ NAME E3.7.1.-\n', '\ PATHWAY ko00351 1,1,1-Trichloro-2,2-bis(4-chlorophenyl)ethane (DDT)\n', '\ degradation\n', '\ ko00362 Benzoate 
degradation via hydroxylation\n', '\ ko00632 Benzoate degradation via CoA ligation\n', '\ ko00903 Limonene and pinene degradation\n', '\ ko00930 Caprolactam degradation\n', '\ CLASS Metabolism; Biosynthesis of Secondary Metabolites; Limonene and\n', '\ pinene degradation [PATH:ko00903]\n', '\ Metabolism; Xenobiotics Biodegradation and Metabolism; Caprolactam\n', '\ degradation [PATH:ko00930]\n', '\ Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\ 1,1,1-Trichloro-2,2-bis(4-chlorophenyl)ethane (DDT) degradation\n', '\ [PATH:ko00351]\n', '\ Metabolism; Xenobiotics Biodegradation and Metabolism; Benzoate\n', '\ degradation via CoA ligation [PATH:ko00632]\n', '\ Metabolism; Xenobiotics Biodegradation and Metabolism; Benzoate\n', '\ degradation via hydroxylation [PATH:ko00362]\n', '\ DBLINKS RN: R04488 R05100 R05363 R05365 R06371 R07515 R07831\n', '\ GENES AFM: AFUA_4G13070\n', '\ PHA: PSHAa2393\n', '\ ABO: ABO_0668\n', '\ BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n', '\ MPT: Mpe_A2274\n', '\ BBA: Bd0910(catD)\n', '\ GBE: GbCGDNIH1_0998 GbCGDNIH1_1171\n', '\ FNU: FN1345\n', '\ RBA: RB13257\n', '\ HMA: rrnAC1925(mhpC)\n', '\ RCI: RCIX1162 RCIX2396\n', '\ ///\n', '\ ENTRY K01560 KO\n', '\ NAME E3.8.1.2\n', '\ DEFINITION 2-haloacid dehalogenase [EC:3.8.1.2]\n', '\ PATHWAY ko00361 gamma-Hexachlorocyclohexane degradation\n', '\ ko00631 1,2-Dichloroethane degradation\n', '\ CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\ gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n', '\ Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\ 1,2-Dichloroethane degradation [PATH:ko00631]\n', '\ DBLINKS RN: R05287\n', '\ COG: COG1011\n', '\ GO: 0018784\n', '\ GENES NCR: NCU03617\n', '\ ANI: AN5830.2 AN7918.2\n', '\ AFM: AFUA_2G07750 AFUA_5G14640 AFUA_8G05870\n', '\ AOR: AO090001000019 AO090003001435 AO090011000921\n', '\ PST: PSPTO_0247(dehII)\n', '\ PSP: PSPPH_1747(dehII1) PSPPH_5028(dehII2)\n', '\ ATU: Atu0797 Atu3405(hadL)\n', '\ ATC: 
AGR_C_1458 AGR_L_2834\n', '\ RET: RHE_CH00996(ypch00330) RHE_PF00342(ypf00173)\n', '\ MSE: Msed_0732\n', '\ ///\n', '\ ENTRY K01561 KO\n', '\ NAME E3.8.1.3\n', '\ DEFINITION haloacetate dehalogenase [EC:3.8.1.3]\n', '\ PATHWAY ko00361 gamma-Hexachlorocyclohexane degradation\n', '\ ko00631 1,2-Dichloroethane degradation\n', '\ CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\ gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n', '\ Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\ 1,2-Dichloroethane degradation [PATH:ko00631]\n', '\ DBLINKS RN: R05287\n', '\ GO: 0018785\n', '\ GENES RSO: RSc0256(dehH)\n', '\ REH: H16_A0197\n', '\ BPS: BPSL0329\n', '\ BPM: BURPS1710b_0537(dehH)\n', '\ BPD: BURPS668_0347\n', '\ STO: ST2570\n', '\ MSE: Msed_1088\n', '\ ///\n'] if __name__=="__main__": main() PyCogent-1.5.3/tests/test_parse/test_kegg_pos.py000644 000765 000024 00000003447 12024702176 022773 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.parse.kegg_pos import parse_pos_lines, parse_pos_file __author__ = "Jesse Zaneveld" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Zaneveld", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Zaneveld" __email__ = "zaneveld@gmail.com" __status__ = "Production" """ Test code for kegg_pos.py in cogent.parse. 
""" class ParsePosTests(TestCase): def test_parse_pos_lines(self): """Parse pos lines should parse given lines and filename""" test_lines = \ ['YPO0021 hemN 28982 28982..30355 1374\n',\ 'YPO0022 glnG, glnT, ntrC 30409 complement(30409..31821) 1413\n',\ 'YPO0023 glnL, ntrB 31829 complement(31829..32878) 1050\n',\ 'YPO0024 glnA 33131 complement(33131..34540) 1410\n'] obs = parse_pos_lines(test_lines, file_name = 'y.pestis.pos') exp = ['y.pestis\tYPO0021 hemN 28982 28982..30355 1374\n',\ 'y.pestis\tYPO0022 glnG, glnT, ntrC 30409 complement(30409..31821) 1413\n',\ 'y.pestis\tYPO0023 glnL, ntrB 31829 complement(31829..32878) 1050\n',\ 'y.pestis\tYPO0024 glnA 33131 complement(33131..34540) 1410\n'] for i,parsed_line in enumerate(obs): self.assertEqual(parsed_line, exp[i]) def test_pos_to_fields(self): """parse_pos_to_fields should open files and extract fields""" # Note that this test is set to pass, as this is just a simple # open/yield wrapper equivalent to the demo blocks for other parsers. # It is kept as an independent function to allow calling by handlers. pass if __name__=="__main__": main() PyCogent-1.5.3/tests/test_parse/test_kegg_taxonomy.py000644 000765 000024 00000003260 12024702176 024041 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.parse.kegg_taxonomy import parse_kegg_taxonomy __author__ = "Jesse Zaneveld" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Zaneveld","Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Zaneveld" __email__ = "zaneveld@gmail.com" __status__ = "Production" """ Test code for kegg_taxonomy.py in cogent.parse. 
""" class ParseKEGGTaxonomy(TestCase): def test_parse_kegg_taxonomy(self): """parse_kegg_taxonomy should return successive taxonomy entries from lines""" test_lines =\ ['# Eukaryotes\n',\ '## Animals\n',\ '### Vertebrates\n',\ '#### Mammals\n',\ 'T01001(2000)\thsa\tH.sapiens\tHomo sapiens (human)\n',\ '#### Birds\n',\ 'T01006(2005)\tgga\tG.gallus\tGallus gallus (chicken)\n',\ '### Arthropods\n',\ '#### Insects\n',\ 'T00030(2000)\tdme\tD.melanogaster\tDrosophila melanogaster (fruit fly)\n'] exp =\ ['Eukaryotes\tAnimals\tVertebrates\tMammals\tT01001(2000)\thsa\tH.sapiens\tHomo sapiens (human)\tHomo\tsapiens\thuman\n',\ 'Eukaryotes\tAnimals\tVertebrates\tBirds\tT01006(2005)\tgga\tG.gallus\tGallus gallus (chicken)\tGallus\tgallus\tchicken\n',\ 'Eukaryotes\tAnimals\tArthropods\tInsects\tT00030(2000)\tdme\tD.melanogaster\tDrosophila melanogaster (fruit fly)\tDrosophila\tmelanogaster\tfruit fly\n'] obs = parse_kegg_taxonomy(test_lines) for i,res in enumerate(obs): self.assertEqual(res,exp[i]) if __name__=="__main__": main() PyCogent-1.5.3/tests/test_parse/test_locuslink.py000644 000765 000024 00000033067 12024702176 023201 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for locuslink-specific classes """ from cogent.parse.locuslink import ll_start,LLFinder,pipes,first_pipe,commas, \ _read_accession, _read_rell, _read_accnum, \ _read_map, _read_sts, _read_comp, _read_grif, _read_pmid, _read_go, \ _read_extannot, _read_cdd, _read_contig, LocusLink, LinesToLocusLink from cogent.util.unit_test import TestCase, main __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class locuslinkTests(TestCase): """Tests toplevel functions.""" def test_read_accession(self): """_read_accession should perform correct conversions""" 
self.assertEqual(_read_accession('NP_035835|6755985|na\n'), \ {'Accession':'NP_035835','Gi':'6755985','Strain':'na'}) #check that it ignores additional fields self.assertEqual(_read_accession('NG_002740|30172554|na|1|1315\n'), \ {'Accession':'NG_002740','Gi':'30172554','Strain':'na'}) def test_read_rell(self): """_read_rell should perform correct conversions""" self.assertEqual(_read_rell(\ 'related mRNA|AK090391|n|NM_153775--AK090391\n'), \ {'Description':'related mRNA','Id':'AK090391','IdType':'n',\ 'Printable':'NM_153775--AK090391'}) def test_read_accnum(self): """_read_accnum should perform correct conversions""" self.assertEqual(_read_accnum('NG_002740|30172554|na|1|1315\n'), \ {'Accession':'NG_002740','Gi':'30172554','Strain':'na',\ 'Start':'1','End':'1315'}) def test_read_map(self): """_read_map should perform correct conversions""" self.assertEqual(_read_map('10 C1|RefSeq|C|\n'), \ {'Location':'10 C1', 'Source':'RefSeq','Type':'C'}) def test_read_sts(self): """_read_sts should perform correct conversions""" self.assertEqual(_read_sts('RH35858|2|37920|na|seq_map|epcr\n'), \ {'Name':'RH35858','Chromosome':'2','StsId':'37920', 'Segment':'na',\ 'SequenceKnown':'seq_map', 'Evidence':'epcr'}) def test_read_cdd(self): """_read_cdd should perform correct conversions""" self.assertEqual(_read_cdd(\ 'Immunoglobulin C-2 Type|smart00408|103|na|4.388540e+01\n'), {'Name':'Immunoglobulin C-2 Type','Key':'smart00408',\ 'Score':'103', 'EValue':'na', 'BitScore':'4.388540e+01'}) def test_read_comp(self): """_read_comp should perform correct conversions""" self.assertEqual(_read_comp(\ '10090|Map2k6|11|11 cM|26399|17|MAP2K6|ncbi_mgd\n'), \ {'TaxonId':'10090', 'Symbol':'Map2k6', 'Chromosome':'11', \ 'Position':'11 cM', 'LocusId':'26399', 'ChromosomeSelf':'17', \ 'SymbolSelf':'MAP2K6','MapName':'ncbi_mgd'}) def test_read_grif(self): """_read_grif should perform correct conversions""" self.assertEqual(_read_grif('12037672|interaction with pRb\n'), \ {'PubMedId':'12037672', 
'Description':'interaction with pRb'}) def test_read_pmid(self): """_read_pmid should perform correct conversions""" self.assertEqual(_read_pmid('12875969,12817023,12743034\n'), \ ['12875969','12817023','12743034']) def test_read_go(self): """_read_go should perform correct conversions""" self.assertEqual(_read_go(\ 'molecular function|zinc ion binding|IEA|GO:0008270|GOA|na\n'), \ {'Category':'molecular function', 'Term':'zinc ion binding',\ 'EvidenceCode':'IEA','GoId':'GO:0008270','Source':'GOA', \ 'PubMedId':'na'}) def test_read_extannot(self): """_read_extannot should perform correct conversions""" self.assertEqual(_read_extannot(\ 'cellular role|Pol II transcription|NR|Proteome|8760285\n'), \ {'Category':'cellular role','Term':'Pol II transcription',\ 'EvidenceCode':'NR', 'Source':'Proteome', 'PubMedId':'8760285'}) def test_read_contig(self): """_read_contig should perform correct conversions""" self.assertEqual(_read_contig(\ 'NT_011109.15|29800594|na|31124734|31133047|-|19|reference\n'),\ {'Accession':'NT_011109.15','Gi':'29800594','Strain':'na',\ 'From':'31124734','To':'31133047','Orientation':'-',\ 'Chromosome':'19','Assembly':'reference'}) def test_LinesToLocusLink(self): """LinesToLocusLink should give expected results on sample data""" fake_file = \ """>>1 LOCUSID: 1 LOCUS_CONFIRMED: yes LOCUS_TYPE: gene with protein product, function known or inferred ORGANISM: Homo sapiens STATUS: REVIEWED NM: NM_130786|21071029|na NP: NP_570602|21071030 CDD: Immunoglobulin C-2 Type|smart00408|103|na|4.388540e+01 PRODUCT: alpha 1B-glycoprotein ASSEMBLY: AF414429,AK055885,AK056201 CONTIG: NT_011109.15|29800594|na|31124734|31133047|-|19|reference EVID: supported by alignment with mRNA XM: NM_130786|21071029|na XP: NP_570602|21071030|na ACCNUM: AC010642|9929687|na|43581|41119 TYPE: g ACCNUM: AF414429|15778555|na|na|na TYPE: m PROT: AAL07469|15778556 ACCNUM: AK055885|16550723|na|na|na TYPE: m ACCNUM: AK056201|16551539|na|na|na TYPE: m ACCNUM: BC035719|23273475|na|na|na 
TYPE: m PROT: AAH35719|23273476 ACCNUM: none|na|na|na|na TYPE: p PROT: P04217|23503038 OFFICIAL_SYMBOL: A1BG OFFICIAL_GENE_NAME: alpha-1-B glycoprotein ALIAS_SYMBOL: A1B ALIAS_SYMBOL: ABG ALIAS_SYMBOL: GAB PREFERRED_PRODUCT: alpha 1B-glycoprotein SUMMARY: Summary: The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. CHR: 19 STS: RH65092|-|10673|na|na|epcr STS: WI-16009|-|52209|na|na|epcr STS: G59506|-|136670|na|na|epcr COMP: 10090|A1bg|na|na|117586|19|A1BG|ncbi_mgd COMP: 10090|A1bg|7|7 cM|117586|19|A1BG|ncbi_mgd BUTTON: unigene.gif LINK: http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=390608 UNIGENE: Hs.390608 OMIM: 138670 MAP: 19q13.4|RefSeq|C| MAPLINK: default_human_gene|A1BG BUTTON: snp.gif LINK: http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=1 BUTTON: homol.gif LINK: http://www.ncbi.nlm.nih.gov/HomoloGene/homolquery.cgi?TEXT=1[loc]&TAXID=9606 BUTTON: ensembl.gif LINK: http://www.ensembl.org/Homo_sapiens/contigview?geneid=NM_130786 BUTTON: ucsc.gif LINK: http://genome.ucsc.edu/cgi-bin/hgTracks?org=human&position=NM_130786 BUTTON: mgc.gif LINK: http://mgc.nci.nih.gov/Genes/GeneInfo?ORG=Hs&CID=390608 PMID: 12477932,8889549,3458201,2591067 GO: molecular function|molecular_function unknown|ND|GO:0005554|GOA|3458201 GO: biological process|biological_process unknown|ND|GO:0000004|GOA|na GO: cellular component|extracellular|IDA|GO:0005576|GOA|3458201 >>386590 LOCUSID: 386590 LOCUS_CONFIRMED: yes LOCUS_TYPE: gene with protein product, function known or inferred ORGANISM: Danio rerio ACCNUM: AF510108|31323727|na|na|na TYPE: m PROT: AAP47138|31323728 OFFICIAL_SYMBOL: tra1 OFFICIAL_GENE_NAME: tumor rejection antigen (gp96) 1 BUTTON: zfin.gif LINK: http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-031002-1 PMID: 14499652""" records = list(LLFinder(fake_file.split('\n'))) self.assertEqual(len(records), 2) first, second 
= map(LinesToLocusLink, records) #test the second one first, since it's shorter self.assertEqual(second.LOCUSID, 386590) self.assertEqual(second.LOCUS_CONFIRMED, 'yes') self.assertEqual(second.LOCUS_TYPE, \ 'gene with protein product, function known or inferred') self.assertEqual(second.ORGANISM, 'Danio rerio') self.assertEqual(second.ACCNUM, [{'Accession':'AF510108', \ 'Gi':'31323727', 'Strain':'na','Start':'na','End':'na'}]) self.assertEqual(second.TYPE, ['m']) self.assertEqual(second.PROT, \ [{'Accession':'AAP47138','Gi':'31323728'}]) self.assertEqual(second.OFFICIAL_SYMBOL, 'tra1') self.assertEqual(second.OFFICIAL_GENE_NAME, \ 'tumor rejection antigen (gp96) 1') self.assertEqual(second.BUTTON, ['zfin.gif']) self.assertEqual(second.LINK, \ ['http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-031002-1']) self.assertEqual(second.PMID, ['14499652']) #now for the annoying test on the longer record self.assertEqual(first.LOCUSID, 1) self.assertEqual(first.LOCUS_CONFIRMED, 'yes') self.assertEqual(first.ORGANISM, 'Homo sapiens') self.assertEqual(first.LOCUS_TYPE, \ 'gene with protein product, function known or inferred') self.assertEqual(first.STATUS, 'REVIEWED') self.assertEqual(first.NM, [{'Accession':'NM_130786','Gi':'21071029', \ 'Strain':'na'}]) self.assertEqual(first.NP, [{'Accession':'NP_570602','Gi':'21071030'}]) self.assertEqual(first.CDD, [{'Name':'Immunoglobulin C-2 Type',\ 'Key':'smart00408','Score':'103', 'EValue':'na',\ 'BitScore':'4.388540e+01'}]) self.assertEqual(first.PRODUCT, ['alpha 1B-glycoprotein']) self.assertEqual(first.ASSEMBLY, [['AF414429','AK055885','AK056201']]) self.assertEqual(first.CONTIG, [{'Accession':'NT_011109.15',\ 'Gi':'29800594','Strain':'na', 'From':'31124734','To':'31133047',\ 'Orientation':'-','Chromosome':'19','Assembly':'reference'}]) self.assertEqual(first.EVID, ['supported by alignment with mRNA']) self.assertEqual(first.XM, [{'Accession':'NM_130786', 'Gi':'21071029', \ 'Strain':'na'}]) self.assertEqual(first.XP, 
[{'Accession':'NP_570602', 'Gi':'21071030', \ 'Strain':'na'}]) self.assertEqual(first.ACCNUM, [ \ {'Accession':'AC010642','Gi':'9929687','Strain':'na',\ 'Start':'43581', 'End':'41119'}, {'Accession':'AF414429','Gi':'15778555','Strain':'na',\ 'Start':'na', 'End':'na'}, {'Accession':'AK055885','Gi':'16550723','Strain':'na',\ 'Start':'na', 'End':'na'}, {'Accession':'AK056201','Gi':'16551539','Strain':'na',\ 'Start':'na', 'End':'na'}, {'Accession':'BC035719','Gi':'23273475','Strain':'na',\ 'Start':'na', 'End':'na'}, {'Accession':'none','Gi':'na','Strain':'na',\ 'Start':'na', 'End':'na'}, ]) self.assertEqual(first.TYPE, ['g','m','m','m','m','p']) self.assertEqual(first.PROT, [ \ {'Accession':'AAL07469', 'Gi':'15778556'}, {'Accession':'AAH35719', 'Gi':'23273476'}, {'Accession':'P04217', 'Gi':'23503038'}, ]) self.assertEqual(first.OFFICIAL_SYMBOL, 'A1BG') self.assertEqual(first.OFFICIAL_GENE_NAME, 'alpha-1-B glycoprotein') self.assertEqual(first.ALIAS_SYMBOL, ['A1B','ABG','GAB']) self.assertEqual(first.PREFERRED_PRODUCT, ['alpha 1B-glycoprotein']) self.assertEqual(first.SUMMARY, ["""Summary: The protein encoded by this gene is a plasma glycoprotein of unknown function. 
The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins."""]) self.assertEqual(first.CHR, ['19']) self.assertEqual(first.STS, [ {'Name':'RH65092','Chromosome':'-','StsId':'10673','Segment':'na',\ 'SequenceKnown':'na','Evidence':'epcr'}, {'Name':'WI-16009','Chromosome':'-','StsId':'52209','Segment':'na',\ 'SequenceKnown':'na','Evidence':'epcr'}, {'Name':'G59506','Chromosome':'-','StsId':'136670','Segment':'na',\ 'SequenceKnown':'na','Evidence':'epcr'}, ]) self.assertEqual(first.COMP, [ {'TaxonId':'10090','Symbol':'A1bg','Chromosome':'na','Position':'na',\ 'LocusId':'117586', 'ChromosomeSelf':'19','SymbolSelf':'A1BG',\ 'MapName':'ncbi_mgd'}, {'TaxonId':'10090','Symbol':'A1bg','Chromosome':'7','Position':'7 cM',\ 'LocusId':'117586', 'ChromosomeSelf':'19','SymbolSelf':'A1BG',\ 'MapName':'ncbi_mgd'}, ]) self.assertEqual(first.BUTTON, ['unigene.gif','snp.gif','homol.gif', \ 'ensembl.gif', 'ucsc.gif', 'mgc.gif']) self.assertEqual(first.LINK, [ \ 'http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=390608', 'http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=1', 'http://www.ncbi.nlm.nih.gov/HomoloGene/homolquery.cgi?TEXT=1[loc]&TAXID=9606', 'http://www.ensembl.org/Homo_sapiens/contigview?geneid=NM_130786', 'http://genome.ucsc.edu/cgi-bin/hgTracks?org=human&position=NM_130786', 'http://mgc.nci.nih.gov/Genes/GeneInfo?ORG=Hs&CID=390608', ]) self.assertEqual(first.UNIGENE, ['Hs.390608']) self.assertEqual(first.OMIM, ['138670']) self.assertEqual(first.MAP, [{'Location':'19q13.4','Source':'RefSeq',\ 'Type':'C'}]) self.assertEqual(first.MAPLINK, ['default_human_gene|A1BG']) self.assertEqual(first.PMID, ['12477932','8889549','3458201','2591067']) self.assertEqual(first.GO, [ \ {'Category':'molecular function','Term':'molecular_function unknown',\ 'EvidenceCode':'ND','GoId':'GO:0005554','Source':'GOA',\ 'PubMedId':'3458201'}, {'Category':'biological process','Term':'biological_process unknown',\ 
'EvidenceCode':'ND','GoId':'GO:0000004','Source':'GOA',\ 'PubMedId':'na'}, {'Category':'cellular component','Term':'extracellular',\ 'EvidenceCode':'IDA','GoId':'GO:0005576','Source':'GOA',\ 'PubMedId':'3458201'}, ]) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_mage.py000644 000765 000024 00000022665 12024702176 022111 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of MageParser """ from cogent.util.unit_test import TestCase, main from cogent.parse.mage import MageParser, MageGroupFromString,\ MageListFromString, MagePointFromString from cogent.format.mage import MagePoint, MageList, MageGroup, Kinemage __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class MageGroupFromStringTests(TestCase): """Tests for the MageGroupFromString function""" def test_MageGroupFromString(self): """MageGroupFromString should fill itself from string correctly""" f = MageGroupFromString group = f('@group {GA manifold} off') self.assertEqual(str(group),'@group {GA manifold} off') group = f('@group {dna} recessiveon off dominant '+\ 'master= {master name} nobutton clone={clone_name}\n') self.assertEqual(str(group),\ '@group {dna} off nobutton recessiveon dominant '+\ 'master={master name} clone={clone_name}') group = f(\ '@subgroup {max_group} recessiveon instance= {inst \tname} lens') self.assertEqual(str(group), '@subgroup {max_group} recessiveon lens instance={inst \tname}') group = f('@subgroup {Pos 1} on') self.assertEqual(str(group), '@subgroup {Pos 1}') def test_MageGroupFromString_wrong(self): """MageGroupFromString should fail on wrong input""" f = MageGroupFromString # unknown keyword self.assertRaises(KeyError,f,'@group {GA manifold} of') # wrong nesting self.assertRaises(ValueError,f,'@group {GA manifold} 
master={blabla') class MageListFromStringTests(TestCase): """Tests for the MageListFromString function""" def test_MageListFromString(self): """MageListFromString should fill itself from string correctly""" f = MageListFromString l = f('@dotlist {label} color= green radius=0.3 off \t nobutton\n' ) self.assertEqual(str(l),\ '@dotlist {label} off nobutton color=green radius=0.3') l = f('@vectorlist') self.assertEqual(str(l),'@vectorlist') l = f('@balllist off angle= 4 width= 2 face=something '+\ 'font=\tother size= 3') self.assertEqual(str(l),'@balllist off angle=4 width=2 '+\ 'face=something font=other size=3') l = f('@dotlist {} on nobutton color=sky') self.assertEqual(str(l),'@dotlist nobutton color=sky') def test_MageListFromString_wrong(self): """MageListFromString should fail on wrong input""" f = MageListFromString self.assertRaises(ValueError,f, '@somelist {label} color= green radius=0.3 off \t nobutton\n') self.assertRaises(KeyError,f, '@vectorlist {label} colors= green radius=0.3 off \t nobutton\n') class MagePointFromStringTests(TestCase): """Tests of the MagePointFromString factory function.""" def test_MagePointFromString(self): """MagePoint should fill itself from string correctly""" m = MagePointFromString('{construction}width5 0.000 0.707 -1.225\n') self.assertEqual(str(m), \ '{construction} width5 ' + ' '.join(map(str, [0.0,0.707,-1.225]))) m = MagePointFromString('3, 4, 5') self.assertEqual(str(m), ' '.join(map(str, map(float, [3, 4, 5])))) m = MagePointFromString('{b2}P 0.000 0.000 0.000') self.assertEqual(str(m), '{b2} P ' + \ ' '.join(map(str, map(float, [0,0,0])))) m = MagePointFromString('P -2650192.000 4309510.000 3872241.000') self.assertEqual(str(m), 'P ' + \ ' '.join(map(str, map(float, [-2650192,4309510,3872241])))) m = MagePointFromString('{"}P -2685992.000 5752262.000 535328.000') self.assertEqual(str(m), '{"} P ' + \ ' '.join(map(str, map(float, [-2685992,5752262,535328])))) m = MagePointFromString('{ 1, d, 0 } P 1.000, 0.618, 0.000') 
self.assertEqual(str(m), '{ 1, d, 0 } P ' + \ ' '.join(map(str, map(float, [1.000, 0.618, 0.000])))) m = MagePointFromString('{"}width1 -1.022 0.969 -0.131') self.assertEqual(str(m), '{"} width1 ' + \ ' '.join(map(str, map(float, [-1.022,0.969,-0.131])))) m = MagePointFromString(\ 'width3 {A label with spaces } A blue r=3.7 5, 6, 7') self.assertEqual(m.Width, 3) self.assertEqual(m.Label, 'A label with spaces ') self.assertFloatEqual(m.Coordinates, [5, 6, 7]) self.assertFloatEqual(m.Radius, 3.7) self.assertEqual(m.Color, 'blue') self.assertEqual(m.State, 'A') self.assertEqual(str(m),'{A label with spaces } A blue width3 r=3.7 ' +\ ' '.join(map(str, map(float, [5, 6, 7])))) class MageParserTests(TestCase): """Tests for the MageParser""" def test_MageParser(self): """MageParser should work on valid input""" obs = str(MageParser(EXAMPLE_1.split('\n'))).split('\n') exp = EXP_EXAMPLE_1.split('\n') assert len(obs) == len(exp) #first check per line; easier for debugging for x in range(len(obs)): self.assertEqual(obs[x],exp[x]) #double check to see if the whole string is the same self.assertEqual(str(MageParser(EXAMPLE_1.split('\n'))),EXP_EXAMPLE_1) EXAMPLE_1 = """ @text Kinemage of ribosomal RNA SSU Bacteria @kinemage1 @caption SSU Bacteria secondary structure elements @viewid {oblique} @zoom 1.05 @zslab 467 @center 0.500 0.289 0.204 @matrix -0.55836 -0.72046 -0.41133 0.82346 -0.42101 -0.38036 0.10085 -0.55108 0.82833 @2viewid {top} @2zoom 0.82 @2zslab 470 @2center 0.500 0.289 0.204 @2matrix -0.38337 0.43731 -0.81351 0.87217 -0.11840 -0.47466 -0.30389 -0.89148 -0.33602 @3viewid {side} @3zoom 0.82 @3zslab 470 @3center 0.500 0.289 0.204 @3matrix -0.49808 -0.81559 -0.29450 0.86714 -0.46911 -0.16738 -0.00164 -0.33875 0.94088 @4viewid {End-O-Line} @4zoom 1.43 @4zslab 469 @4center 0.500 0.289 0.204 @4matrix 0.00348 -0.99984 -0.01766 0.57533 -0.01244 0.81784 -0.81792 -0.01301 0.57519 @perspective @fontsizelabel 24 @onewidth @zclipoff @localrotation 1 0 0 .5 .866 0 .5 .289 .816 
@group {Tetrahedron} @vectorlist {Edges} color=white nobutton P {0 0 0} 0 0 0 0.5 0 0 {1 0 0} 1 0 0 0.5 0.5 0 {0 1 0} 0 1 0 P 0 0 0 0 0.5 0 {0 1 0} 0 1 0 0 0.5 0.5 {0 0 1} 0 0 1 P 0 0 0 0 0 0.5 {0 0 1} 0 0 1 0.5 0 0.5 {1 0 0} 1 0 0 @labellist {labels} color=white nobutton {U} 0 0 0 {A} 1.1 0 0 {C} 0 1.05 0 {G} 0 0 1.08 @group {Lines} @vectorlist {A=U&C=G} color= green off P 0 0.5 0.5 .1 .4 .4 .25 .25 .25 .4 .1 .1 L 0.500, 0.000, 0.000 @vectorlist {A=G&C=U} color= red off P 0.5 0 0.5 .25 .25 .25 L 0, 0.500, 0.000 @vectorlist {A=C&G=U} color= red off P 0.5 0.5 0 .25 .25 .25 L 0.000, 0.000, 0.500 @group {SSU Bacteria} recessiveon @dotlist {Stem} radius=0.03 color= orange {a} .3 .1 .4 {b} r=.2 .1 .1 .1 @balllist {Junction} radius=.04 {c} red .4 .4 0 {}\t r=.1 green .3 .2 .1 @group {empty group} @group {Nested} @subgroup {First} @spherelist color=\tpurple {e} .1 .1 .1 @subgroup {Second} master=\t {master name} @labellist {labels} {U} 0 0 0 {A} 1.1 0 0 {C} 0 1.05 0 {G} 0 0 1.08 """ EXP_EXAMPLE_1 =\ """@kinemage 1 @viewid {oblique} @zoom 1.05 @zslab 467 @center 0.500 0.289 0.204 @matrix -0.55836 -0.72046 -0.41133 0.82346 -0.42101 -0.38036 0.10085 -0.55108 0.82833 @2viewid {top} @2zoom 0.82 @2zslab 470 @2center 0.500 0.289 0.204 @2matrix -0.38337 0.43731 -0.81351 0.87217 -0.11840 -0.47466 -0.30389 -0.89148 -0.33602 @3viewid {side} @3zoom 0.82 @3zslab 470 @3center 0.500 0.289 0.204 @3matrix -0.49808 -0.81559 -0.29450 0.86714 -0.46911 -0.16738 -0.00164 -0.33875 0.94088 @4viewid {End-O-Line} @4zoom 1.43 @4zslab 469 @4center 0.500 0.289 0.204 @4matrix 0.00348 -0.99984 -0.01766 0.57533 -0.01244 0.81784 -0.81792 -0.01301 0.57519 @perspective @fontsizelabel 24 @onewidth @zclipoff @localrotation 1 0 0 .5 .866 0 .5 .289 .816 @text Kinemage of ribosomal RNA SSU Bacteria @caption SSU Bacteria secondary structure elements @group {Tetrahedron} @vectorlist {Edges} nobutton color=white {0 0 0} P 0.0 0.0 0.0 0.5 0.0 0.0 {1 0 0} 1.0 0.0 0.0 0.5 0.5 0.0 {0 1 0} 0.0 1.0 0.0 P 0.0 0.0 0.0 0.0 
0.5 0.0 {0 1 0} 0.0 1.0 0.0 0.0 0.5 0.5 {0 0 1} 0.0 0.0 1.0 P 0.0 0.0 0.0 0.0 0.0 0.5 {0 0 1} 0.0 0.0 1.0 0.5 0.0 0.5 {1 0 0} 1.0 0.0 0.0 @labellist {labels} nobutton color=white {U} 0.0 0.0 0.0 {A} 1.1 0.0 0.0 {C} 0.0 1.05 0.0 {G} 0.0 0.0 1.08 @group {Lines} @vectorlist {A=U&C=G} off color=green P 0.0 0.5 0.5 0.1 0.4 0.4 0.25 0.25 0.25 0.4 0.1 0.1 L 0.5 0.0 0.0 @vectorlist {A=G&C=U} off color=red P 0.5 0.0 0.5 0.25 0.25 0.25 L 0.0 0.5 0.0 @vectorlist {A=C&G=U} off color=red P 0.5 0.5 0.0 0.25 0.25 0.25 L 0.0 0.0 0.5 @group {SSU Bacteria} recessiveon @dotlist {Stem} color=orange radius=0.03 {a} 0.3 0.1 0.4 {b} r=0.2 0.1 0.1 0.1 @balllist {Junction} radius=.04 {c} red 0.4 0.4 0.0 {} green r=0.1 0.3 0.2 0.1 @group {empty group} @group {Nested} @subgroup {First} @spherelist color=purple {e} 0.1 0.1 0.1 @subgroup {Second} master={master name} @labellist {labels} {U} 0.0 0.0 0.0 {A} 1.1 0.0 0.0 {C} 0.0 1.05 0.0 {G} 0.0 0.0 1.08""" if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_meme.py000644 000765 000024 00000101172 12024702176 022112 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for the MEME parser """ from __future__ import division from cogent.util.unit_test import TestCase, main import string import re from cogent.motif.util import Motif from cogent.core.moltype import DNA from cogent.parse.record import DelimitedSplitter from cogent.parse.record_finder import LabeledRecordFinder from cogent.parse.meme import * __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jermey.widmann@colorado.edu" __status__ = "Production" class MemeTests(TestCase): """Tests for meme module. """ def setUp(self): """Setup function for meme tests. 
""" #Meme output data: self.meme_file = MEME_FILE.split('\n') self.meme_main = LabeledRecordFinder(lambda x: x.startswith('COMMAND')) self.meme_command = LabeledRecordFinder(lambda x: x.startswith('MOTIF')) self.meme_summary = LabeledRecordFinder(lambda x: x.startswith('SUMMARY')) self.meme_module = LabeledRecordFinder(lambda x: x.startswith('Motif')) self.alphabet_block, self.main_block = \ list(self.meme_main(self.meme_file)) self.cmd_mod_list = list(self.meme_command(self.main_block)) self.command_block = self.cmd_mod_list[0] self.module_blocks = self.cmd_mod_list[1:] self.summary_block = list(self.meme_summary(self.module_blocks[-1]))[1] self.module_data_blocks = [] for module in self.module_blocks: self.module_data_blocks.append(\ list(self.meme_module(module))) #List and Dict for testing dictFromList function self.sample_list = ['key1',1,'key2',2,'key3',3,'key4',4] self.sample_dict = {'key1':1, 'key2':2, 'key3':3, 'key4':4, } #List of command line data self.command_line_list = [ 'model: mod= tcm nmotifs= 3 evt= 1e+100', 'object function= E-value of product of p-values', 'width: minw= 4 maxw= 10 minic= 0.00', 'width: wg= 11 ws= 1 endgaps= yes', 'nsites: minsites= 2 maxsites= 50 wnsites= 0.8', 'theta: prob= 1 spmap= uni spfuzz= 0.5', 'em: prior= dirichlet b= 0.01 maxiter= 20', 'distance= 1e-05', 'data: n= 597 N= 15', 'strands: +', 'sample: seed= 0 seqfrac= 1', ] #List of dicts which contain general info for each module. 
self.module_info_dicts = [ {'MOTIF':'1', 'width':'10', 'sites':'11', 'llr':'131', 'E-value':'1.3e-019', }, {'MOTIF':'2', 'width':'7', 'sites':'11', 'llr':'88', 'E-value':'2.5e-006', }, {'MOTIF':'3', 'width':'7', 'sites':'6', 'llr':'53', 'E-value':'5.5e-001', }, ] #Summary dict self.summary_dict = {'CombinedP':{ '1': float(3.48e-02), '11': float(3.78e-05), '17': float(2.78e-08), '28': float(3.49e-06), '105': float(3.98e-06), '159': float(1.08e-02), '402-C01': float(4.22e-07), '407-A07': float(7.32e-08), '410-A10': float(4.23e-04), '505-D01': float(5.72e-07), '507-B04-1': float(1.01e-04), '518-D12': float(2.83e-06), '621-H01': float(8.69e-07), '625-H05': float(8.86e-06), '629-C08': float(5.61e-07), } } self.remap_dict = { '11':'11', '1':'1', '407-A07':'407-A07', '17':'17', '159':'159', '505-D01':'505-D01', '28':'28', '507-B04-1':'507-B04-1', '402-C01':'402-C01', '621-H01':'621-H01', '629-C08':'629-C08', '410-A10':'410-A10', '105':'105', '625-H05':'625-H05', '518-D12':'518-D12' } #ModuleInstances and Modules self.ModuleInstances = [ [ModuleInstance('CTATTGGGGC',Location('629-C08',18,28), float(1.95e-06)), ModuleInstance('CTATTGGGGC',Location('621-H01',45,55), float(1.95e-06)), ModuleInstance('CTATTGGGGC',Location('505-D01',26,36), float(1.95e-06)), ModuleInstance('CTATTGGGGC',Location('407-A07',5,15), float(1.95e-06)), ModuleInstance('CTATTGGGGC',Location('105',0,10), float(1.95e-06)), ModuleInstance('CTATTGGGGC',Location('28',3,13), float(1.95e-06)), ModuleInstance('CTATTGGGGC',Location('17',16,26), float(1.95e-06)), ModuleInstance('CTATTGGGCC',Location('402-C01',24,34), float(3.30e-06)), ModuleInstance('CTAGTGGGGC',Location('625-H05',2,12), float(5.11e-06)), ModuleInstance('CTAGTGGGCC',Location('11',15,25), float(6.37e-06)), ModuleInstance('CTATTGGGGT',Location('518-D12',0,10), float(9.40e-06)), ], [ModuleInstance('CGTTACG',Location('629-C08',37,44), float(6.82e-05)), ModuleInstance('CGTTACG',Location('621-H01',30,37), float(6.82e-05)), 
ModuleInstance('CGTTACG',Location('507-B04-1',8,15), float(6.82e-05)), ModuleInstance('CGTTACG',Location('410-A10',7,14), float(6.82e-05)), ModuleInstance('CGTTACG',Location('407-A07',26,33), float(6.82e-05)), ModuleInstance('CGTTACG',Location('17',0,7), float(6.82e-05)), ModuleInstance('TGTTACG',Location('625-H05',32,39), float(1.74e-04)), ModuleInstance('TGTTACG',Location('505-D01',3,10), float(1.74e-04)), ModuleInstance('CATTACG',Location('518-D12',30,37), float(2.14e-04)), ModuleInstance('CGGTACG',Location('402-C01',1,8), float(2.77e-04)), ModuleInstance('TGTTCCG',Location('629-C08',5,12), float(6.45e-04)), ], [ModuleInstance('CTATTGG',Location('629-C08',57,64), float(1.06e-04)), ModuleInstance('CTATTGG',Location('507-B04-1',42,49), float(1.06e-04)), ModuleInstance('CTATTGG',Location('410-A10',27,34), float(1.06e-04)), ModuleInstance('CTATTGG',Location('159',14,21), float(1.06e-04)), ModuleInstance('CTATTGG',Location('1',18,25), float(1.06e-04)), ModuleInstance('CTAATGG',Location('507-B04-1',28,35), float(1.63e-04)), ], ] self.Modules = [] for module, info in zip(self.ModuleInstances, self.module_info_dicts): curr_module_data = {} for instance in module: curr_module_data[(instance.Location.SeqId, instance.Location.Start)] = instance temp_module = Module(curr_module_data, MolType=DNA, Evalue=float(info['E-value']), Llr=int(info['llr'])) self.Modules.append(temp_module) self.ConsensusSequences = ['CTATTGGGGC','CGTTACG','CTATTGG'] def test_get_data_block(self): """getDataBlock should return the main block and the alphabet.""" main_block, alphabet = getDataBlock(self.meme_file) self.assertEqual(main_block,self.main_block) self.assertEqual(alphabet, DNA) def test_get_alphabet(self): """getMolType should return the correct alphabet.""" self.assertEqual(getMolType(self.alphabet_block),DNA) def test_get_command_module_blocks(self): """getCommandModuleBlocks should return the command and module blocks. 
""" command_block, module_blocks = getCommandModuleBlocks(self.main_block) self.assertEqual(command_block, self.command_block) self.assertEqual(module_blocks, self.module_blocks) def test_get_summary_block(self): """getSummaryBlock should return the MEME summary block.""" self.assertEqual(getSummaryBlock(self.module_blocks[-1]), self.summary_block) def test_dict_from_list(self): """dictFromList should return a dict given a list.""" self.assertEqual(dictFromList(self.sample_list),self.sample_dict) def test_extract_command_line_data(self): """extractCommandLineData should return a dict of command line data.""" self.assertEqual(extractCommandLineData(self.command_block), self.command_line_list) def test_get_module_data_blocks(self): """getModuleDataBlocks should return a list of blocks for each module. """ self.assertEqual(getModuleDataBlocks(self.module_blocks), self.module_data_blocks) def test_extract_module_data(self): """extractModuleData should return a Module object.""" for data, module in zip(self.module_data_blocks,self.Modules): ans = extractModuleData(data,DNA,self.remap_dict) self.assertEqual(ans,module) def test_get_consensus_sequence(self): """getConsensusSequence should return Module's Consensus sequence.""" for data,seq in zip(self.module_data_blocks,self.ConsensusSequences): ans = getConsensusSequence(data[1]) self.assertEqual(ans,seq) def test_get_module_general_info(self): """getModuleGeneralInfo should return a dict of Module info.""" for module, data_dict in zip(self.module_data_blocks, self.module_info_dicts): self.assertEqual(getModuleGeneralInfo(module[0][0]),data_dict) #Test that getModuleGeneralInfo can parse general info line when # motif ID is > 100. MEME changes the format of this line when in this # case. 
data_line_special = \ 'MOTIF100 width = 50 sites = 2 llr = 273 E-value = 3.1e-007' expected = {'MOTIF':'100','width':'50','sites':'2','llr':'273',\ 'E-value':'3.1e-007'} self.assertEqual(getModuleGeneralInfo(data_line_special),expected) def test_extract_summary_data(self): """extractSummaryData should return a dict of MEME summary data.""" self.assertEqual(extractSummaryData(self.summary_block), self.summary_dict) def test_meme_parser(self): """MemeParser should correctly return a MotifResults object.""" test_motif_results = MotifResults([],[],{}) test_motif_results.Results = self.summary_dict test_motif_results.Results['Warnings']=[] test_motif_results.Parameters = self.command_line_list test_motif_results.Modules = self.Modules ans_motif_results = MemeParser(self.meme_file) self.assertEqual(ans_motif_results.Modules,test_motif_results.Modules) self.assertEqual(ans_motif_results.Results,test_motif_results.Results) self.assertEqual(ans_motif_results.Parameters, test_motif_results.Parameters) MEME_FILE = """ ******************************************************************************** MEME - Motif discovery tool ******************************************************************************** MEME version 3.0 (Release date: 2001/03/03 13:05:22) For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.sdsc.edu. This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. MAST is available for interactive use and downloading at http://meme.sdsc.edu. ******************************************************************************** ******************************************************************************** REFERENCE ******************************************************************************** If you use this program in your research, please cite: Timothy L. 
Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994. ******************************************************************************** ******************************************************************************** TRAINING SET ******************************************************************************** DATAFILE= meme.16346.data ALPHABET= ACGT Sequence name Weight Length Sequence name Weight Length ------------- ------ ------ ------------- ------ ------ 1 1.0000 26 11 1.0000 25 17 1.0000 26 28 1.0000 26 105 1.0000 21 159 1.0000 21 402-C01 1.0000 34 407-A07 1.0000 34 410-A10 1.0000 34 505-D01 1.0000 49 507-B04-1 1.0000 49 518-D12 1.0000 49 621-H01 1.0000 74 625-H05 1.0000 65 629-C08 1.0000 64 ******************************************************************************** ******************************************************************************** COMMAND LINE SUMMARY ******************************************************************************** This information can also be useful in the event you wish to report a problem with the MEME software. 
command: meme meme.16346.data -dna -mod tcm -nmotifs 3 -minw 4 -maxw 10 -evt 1e100 -time 720 -maxsize 60000 -nostatus -maxiter 20 model: mod= tcm nmotifs= 3 evt= 1e+100 object function= E-value of product of p-values width: minw= 4 maxw= 10 minic= 0.00 width: wg= 11 ws= 1 endgaps= yes nsites: minsites= 2 maxsites= 50 wnsites= 0.8 theta: prob= 1 spmap= uni spfuzz= 0.5 em: prior= dirichlet b= 0.01 maxiter= 20 distance= 1e-05 data: n= 597 N= 15 strands: + sample: seed= 0 seqfrac= 1 Letter frequencies in dataset: A 0.173 C 0.206 G 0.299 T 0.322 Background letter frequencies (from dataset with add-one prior applied): A 0.173 C 0.207 G 0.298 T 0.322 ******************************************************************************** ******************************************************************************** MOTIF 1 width = 10 sites = 11 llr = 131 E-value = 1.3e-019 ******************************************************************************** -------------------------------------------------------------------------------- Motif 1 Description -------------------------------------------------------------------------------- Simplified A ::a::::::: pos.-specific C a:::::::29 probability G :::2:aaa8: matrix T :a:8a::::1 bits 2.5 * 2.3 * * 2.0 * * 1.8 * * *** * Information 1.5 *** **** * content 1.3 *** ****** (17.2 bits) 1.0 ********** 0.8 ********** 0.5 ********** 0.3 ********** 0.0 ---------- Multilevel CTATTGGGGC consensus sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ---------- 629-C08 19 1.95e-06 TCCGTGAACA CTATTGGGGC GTGTAAGAGC 621-H01 46 1.95e-06 CGCATGCGTG CTATTGGGGC GTCATTTGTC 505-D01 27 1.95e-06 TTGATTGTTG CTATTGGGGC ATTGCCGTAC 407-A07 6 1.95e-06 CGTTA 
CTATTGGGGC GGGTATTTTC 105 1 1.95e-06 . CTATTGGGGC CGAAATGGTT 28 4 1.95e-06 TCC CTATTGGGGC CAAGGGCTAC 17 17 1.95e-06 GCTACTTGTG CTATTGGGGC 402-C01 25 3.30e-06 CTTAACATTC CTATTGGGCC 625-H05 3 5.11e-06 GC CTAGTGGGGC AGCTGACAGA 11 16 6.37e-06 TGTTAGACAG CTAGTGGGCC 518-D12 1 9.40e-06 . CTATTGGGGT GTTGTATTGA -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 629-C08 2e-06 18_[1]_36 621-H01 2e-06 45_[1]_19 505-D01 2e-06 26_[1]_13 407-A07 2e-06 5_[1]_19 105 2e-06 [1]_11 28 2e-06 3_[1]_13 17 2e-06 16_[1] 402-C01 3.3e-06 24_[1] 625-H05 5.1e-06 2_[1]_53 11 6.4e-06 15_[1] 518-D12 9.4e-06 [1]_39 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 1 width=10 seqs=11 629-C08 ( 19) CTATTGGGGC 1 621-H01 ( 46) CTATTGGGGC 1 505-D01 ( 27) CTATTGGGGC 1 407-A07 ( 6) CTATTGGGGC 1 105 ( 1) CTATTGGGGC 1 28 ( 4) CTATTGGGGC 1 17 ( 17) CTATTGGGGC 1 402-C01 ( 25) CTATTGGGCC 1 625-H05 ( 3) CTAGTGGGGC 1 11 ( 16) CTAGTGGGCC 1 518-D12 ( 1) CTATTGGGGT 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 10 n= 462 bayes= 6.40183 E= 1.3e-019 -1010 227 -1010 -1010 -1010 -1010 -1010 164 253 -1010 -1010 -1010 -1010 -1010 -71 135 -1010 -1010 -1010 164 -1010 -1010 174 -1010 -1010 -1010 174 -1010 -1010 -1010 174 -1010 -1010 
-19 146 -1010 -1010 214 -1010 -182 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 10 nsites= 11 E= 1.3e-019 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.181818 0.818182 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.181818 0.818182 0.000000 0.000000 0.909091 0.000000 0.090909 -------------------------------------------------------------------------------- Time 0.54 secs. ******************************************************************************** ******************************************************************************** MOTIF 2 width = 7 sites = 11 llr = 88 E-value = 2.5e-006 ******************************************************************************** -------------------------------------------------------------------------------- Motif 2 Description -------------------------------------------------------------------------------- Simplified A :1::9:: pos.-specific C 7:::1a: probability G :91:::a matrix T 3:9a::: bits 2.5 2.3 * 2.0 ** 1.8 *** Information 1.5 **** content 1.3 ******* (11.6 bits) 1.0 ******* 0.8 ******* 0.5 ******* 0.3 ******* 0.0 ------- Multilevel CGTTACG consensus T sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------- 629-C08 38 6.82e-05 CGTGTAAGAG CGTTACG 
TGTTCCGTGA 621-H01 31 6.82e-05 CGAGGGAGTA CGTTACG CATGCGTGCT 507-B04-1 9 6.82e-05 CTTGCACA CGTTACG TGTGAGCCAT 410-A10 8 6.82e-05 CTTTGCT CGTTACG TGGTTGTATG 407-A07 27 6.82e-05 GGTATTTTCC CGTTACG T 17 1 6.82e-05 . CGTTACG CTACTTGTGC 625-H05 33 1.74e-04 ATAGGTCGAC TGTTACG GTTAGCGTTC 505-D01 4 1.74e-04 GCA TGTTACG TGACTTTTGA 518-D12 31 2.14e-04 GTTATTGCGA CATTACG CGTTCTGGTT 402-C01 2 2.77e-04 C CGGTACG GTTTGTCTTA 629-C08 6 6.45e-04 TTACG TGTTCCG TGAACACTAT -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 629-C08 0.00064 5_[2]_25_[2]_20 621-H01 6.8e-05 30_[2]_37 507-B04-1 6.8e-05 8_[2]_34 410-A10 6.8e-05 7_[2]_20 407-A07 6.8e-05 26_[2]_1 17 6.8e-05 [2]_19 625-H05 0.00017 32_[2]_26 505-D01 0.00017 3_[2]_39 518-D12 0.00021 30_[2]_12 402-C01 0.00028 1_[2]_26 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 2 width=7 seqs=11 629-C08 ( 38) CGTTACG 1 621-H01 ( 31) CGTTACG 1 507-B04-1 ( 9) CGTTACG 1 410-A10 ( 8) CGTTACG 1 407-A07 ( 27) CGTTACG 1 17 ( 1) CGTTACG 1 625-H05 ( 33) TGTTACG 1 505-D01 ( 4) TGTTACG 1 518-D12 ( 31) CATTACG 1 402-C01 ( 2) CGGTACG 1 629-C08 ( 6) TGTTCCG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 7 n= 507 bayes= 6.53743 E= 2.5e-006 -1010 181 -1010 -24 -93 
-1010 161 -1010 -1010 -1010 -171 150 -1010 -1010 -1010 164 239 -118 -1010 -1010 -1010 227 -1010 -1010 -1010 -1010 174 -1010 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 7 nsites= 11 E= 2.5e-006 0.000000 0.727273 0.000000 0.272727 0.090909 0.000000 0.909091 0.000000 0.000000 0.000000 0.090909 0.909091 0.000000 0.000000 0.000000 1.000000 0.909091 0.090909 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 -------------------------------------------------------------------------------- Time 0.85 secs. ******************************************************************************** ******************************************************************************** MOTIF 3 width = 7 sites = 6 llr = 53 E-value = 5.5e-001 ******************************************************************************** -------------------------------------------------------------------------------- Motif 3 Description -------------------------------------------------------------------------------- Simplified A ::a2::: pos.-specific C a:::::: probability G :::::aa matrix T :a:8a:: bits 2.5 * 2.3 * * 2.0 * * 1.8 * * ** Information 1.5 *** *** content 1.3 *** *** (12.7 bits) 1.0 ******* 0.8 ******* 0.5 ******* 0.3 ******* 0.0 ------- Multilevel CTATTGG consensus sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------- 629-C08 58 1.06e-04 TCCGTGAACA CTATTGG 507-B04-1 43 1.06e-04 
TGGTGTTGCG CTATTGG 410-A10 28 1.06e-04 TTGTATGCCG CTATTGG 159 15 1.06e-04 GACCGTTGGT CTATTGG 1 19 1.06e-04 TTGGATAGTG CTATTGG G 507-B04-1 29 1.63e-04 GAGCCATTCT CTAATGG TGTTGCGCTA -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 629-C08 0.00011 57_[3] 507-B04-1 0.00016 28_[3]_7_[3] 410-A10 0.00011 27_[3] 159 0.00011 14_[3] 1 0.00011 18_[3]_1 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 3 width=7 seqs=6 629-C08 ( 58) CTATTGG 1 507-B04-1 ( 43) CTATTGG 1 410-A10 ( 28) CTATTGG 1 159 ( 15) CTATTGG 1 1 ( 19) CTATTGG 1 507-B04-1 ( 29) CTAATGG 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 4 w= 7 n= 507 bayes= 6.83576 E= 5.5e-001 -923 227 -923 -923 -923 -923 -923 164 253 -923 -923 -923 -6 -923 -923 137 -923 -923 -923 164 -923 -923 174 -923 -923 -923 174 -923 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 4 w= 7 nsites= 6 E= 5.5e-001 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 
1.000000 0.000000 0.000000 0.000000 0.166667 0.000000 0.000000 0.833333 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 1.000000 0.000000 -------------------------------------------------------------------------------- Time 1.09 secs. ******************************************************************************** ******************************************************************************** SUMMARY OF MOTIFS ******************************************************************************** -------------------------------------------------------------------------------- Combined block diagrams: non-overlapping sites with p-value < 0.0001 -------------------------------------------------------------------------------- SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 1 3.48e-02 26 11 3.78e-05 15_[1(6.37e-06)] 17 2.78e-08 [2(6.82e-05)]_9_[1(1.95e-06)] 28 3.49e-06 3_[1(1.95e-06)]_13 105 3.98e-06 [1(1.95e-06)]_11 159 1.08e-02 21 402-C01 4.22e-07 24_[1(3.30e-06)] 407-A07 7.32e-08 5_[1(1.95e-06)]_11_[2(6.82e-05)]_1 410-A10 4.23e-04 7_[2(6.82e-05)]_20 505-D01 5.72e-07 26_[1(1.95e-06)]_13 507-B04-1 1.01e-04 8_[2(6.82e-05)]_34 518-D12 2.83e-06 [1(9.40e-06)]_39 621-H01 8.69e-07 30_[2(6.82e-05)]_8_[1(1.95e-06)]_19 625-H05 8.86e-06 2_[1(5.11e-06)]_53 629-C08 5.61e-07 18_[1(1.95e-06)]_9_[2(6.82e-05)]_20 -------------------------------------------------------------------------------- ******************************************************************************** ******************************************************************************** Stopped because nmotifs = 3 reached. 
********************************************************************************

CPU: compute-0-2.local

********************************************************************************
"""

#run if called from command-line
if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_parse/test_mothur.py
#!/usr/bin/env python
#file cogent.parse.mothur.py

__author__ = "Kyle Bittinger"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Kyle Bittinger"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kyle Bittinger"
__email__ = "kylebittinger@gmail.com"
__status__ = "Prototype"

from cStringIO import StringIO
from cogent.util.unit_test import TestCase, main
from cogent.parse.mothur import parse_otu_list


class FunctionTests(TestCase):
    def test_parse_otu_list(self):
        observed = list(parse_otu_list(StringIO(mothur_output)))
        expected = [
            (0.0, [['cccccc'], ['bbbbbb'], ['aaaaaa']]),
            (0.62, [['bbbbbb', 'cccccc'], ['aaaaaa']]),
            (0.67000000000000004, [['aaaaaa', 'bbbbbb', 'cccccc']])
            ]
        self.assertEqual(observed, expected)

mothur_output = """\
unique 3 cccccc bbbbbb aaaaaa
0.62 2 bbbbbb,cccccc aaaaaa
0.67 1 aaaaaa,bbbbbb,cccccc
"""

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_parse/test_msms.py
#!/usr/bin/env python

import os, tempfile

from cogent.util.unit_test import TestCase, main
from cogent.parse.msms import parse_VertFile

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"


class MsmsTest(TestCase):
    """Tests for Msms application output parsers"""

    def setUp(self):
        vs = "1. 2. 3.\n" + \
             "4. 5. 6.\n" + \
             "7. 8. 9.\n"
        self.vertfile = StringIO(vs)

    def test_parseVertFile(self):
        out_arr = parse_VertFile(self.vertfile)
        assert out_arr.dtype == 'float64'
        assert out_arr.shape == (3,3)
        assert out_arr[0][0] == 1.

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_parse/test_ncbi_taxonomy.py
#!/usr/bin/env python
"""Tests of parsers for dealing with NCBI Taxonomy files.
"""
from cogent.parse.ncbi_taxonomy import MissingParentError, NcbiTaxon, \
    NcbiTaxonParser, NcbiTaxonLookup, NcbiName, NcbiNameParser, \
    NcbiNameLookup, \
    NcbiTaxonomy, NcbiTaxonNode, NcbiTaxonomyFromFiles
from cogent.util.unit_test import TestCase, main

__author__ = "Jason Carnes"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jason Carnes", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

good_nodes = '''1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
6\t|\t2\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t
7\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
9\t|\t7\t|\tsubspecies\t|\tBA\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
10\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
'''.split('\n')

bad_nodes = '''1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
6\t|\t2\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t
7\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
9\t|\t777\t|\tsubspecies\t|\tBA\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
10\t|\t666\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
'''.split('\n')

good_names = '''1\t|\tall\t|\t\t|\tsynonym\t|
1\t|\troot\t|\t\t|\tscientific name\t|
2\t|\tBacteria\t|\tBacteria \t|\tscientific name\t|
2\t|\tMonera\t|\tMonera \t|\tin-part\t|
2\t|\tProcaryotae\t|\tProcaryotae <#1>\t|\tin-part\t|
2\t|\tProkaryotae\t|\tProkaryotae <#1>\t|\tin-part\t|
2\t|\teubacteria\t|\t\t|\tgenbank common name\t|
6\t|\tAzorhizobium\t|\t\t|\tscientific name\t|
7\t|\tAzorhizobium caulinodans\t|\t\t|\tscientific name\t|
9\t|\tBuchnera aphidicola\t|\t\t|\tscientific name\t|
10\t|\tFakus namus\t|\t\t|\tscientific name\t|
'''.split('\n')


class NcbiTaxonTests(TestCase):
    """Tests proper parsing of NCBI node file, e.g. nodes.dmp"""

    def test_init(self):
        """NcbiTaxon init should return object containing taxonomy data"""
        good_1 = '''1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|\n'''
        good_2 = '''2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|\n'''
        good_3 = '''6\t|\t2\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\n'''
        good_4 = '''7\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|\n'''
        node_1 = NcbiTaxon(good_1) #make a NcbiTaxon object
        node_2 = NcbiTaxon(good_2) #from the corresponding
        node_3 = NcbiTaxon(good_3) #line.
        node_4 = NcbiTaxon(good_4)
        self.assertEqual(node_1.Rank, 'no rank')      #confirm object holds
        self.assertEqual(node_1.RankId, 28)           #right data
        self.assertEqual(node_1.ParentId, 1)
        self.assertEqual(node_2.Rank, 'superkingdom')
        self.assertEqual(node_2.RankId, 27)
        self.assertEqual(node_2.ParentId, 1)
        self.assertEqual(node_3.Rank, 'genus')
        self.assertEqual(node_3.RankId, 8)
        self.assertEqual(node_3.ParentId, 2)
        self.assertEqual(node_4.Rank, 'species')
        self.assertEqual(node_4.RankId, 4)
        self.assertEqual(node_4.ParentId, 6)
        #test some comparisons
        assert node_1 > node_2
        assert node_1 > node_3
        assert node_1 > node_4
        assert node_1 == node_1
        assert node_2 < node_1
        assert node_2 == node_2
        assert node_4 < node_1
        assert node_3 > node_4

    def test_str(self):
        """NcbiTaxon str should write data in input format from nodes.dmp"""
        good = '''2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|\n'''
        node = NcbiTaxon(good)
        self.assertEqual(str(node), good)
        root = '''1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|'''
        root_node = NcbiTaxon(root)
        self.assertEqual(str(root), root)

    def test_bad_input(self):
        """NcbiTaxon init should raise ValueError if nodes missing"""
        bad_node_taxid = '''\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|\n''' #contains no taxon_id; not valid
        bad_node_parentid = '''7\t|\t\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|\n''' #contains no parent_id; not valid
        self.assertRaises(ValueError, NcbiTaxon, bad_node_taxid)
        self.assertRaises(ValueError, NcbiTaxon, bad_node_parentid)


class NcbiNameTests(TestCase):
    """Tests proper parsing NCBI name file, e.g.
    names.dmp."""

    def test_init(self):
        """NcbiName should init OK with well-formed name line"""
        line_1 = '''1\t|\tall\t|\t\t|\tsynonym\t|\n'''
        line_2 = '''1\t|\troot\t|\t\t|\tscientific name\t|\n'''
        line_3 = '''2\t|\tBacteria\t|\tBacteria \t|\tscientific name\t|\n'''
        line_4 = '''7\t|\tAzorhizobium caulinodans\t|\t\t|\tscientific name\t|\n'''
        name_1 = NcbiName(line_1) #make an NcbiName object
        name_2 = NcbiName(line_2) #from the corresponding line
        name_3 = NcbiName(line_3)
        name_4 = NcbiName(line_4)
        self.assertEqual(name_1.TaxonId, 1)           #test that the data
        self.assertEqual(name_1.NameClass, 'synonym') #fields in the object
        self.assertEqual(name_2.TaxonId, 1)           #hold right data
        self.assertEqual(name_2.NameClass, 'scientific name')
        self.assertEqual(name_3.TaxonId, 2)
        self.assertEqual(name_3.NameClass, 'scientific name')
        self.assertEqual(name_4.TaxonId, 7)
        self.assertEqual(name_4.NameClass, 'scientific name')

    def test_str(self):
        """NcbiName str should return line in original format"""
        line = '''1\t|\troot\t|\t\t|\tscientific name|\n'''
        name = NcbiName(line)
        self.assertEqual(str(name), line)

    def test_bad_input(self):
        """NcbiName init should raise correct errors on bad data"""
        bad_name_taxid = '''\t|\troot\t|\t\t|\tscientific name\t|\n''' #no tax_id
        self.assertRaises(ValueError, NcbiName, bad_name_taxid)


class NcbiNameLookupTest(TestCase):
    """Tests of the NcbiNameLookup factory function."""

    def test_init(self):
        """NcbiNameLookup should map taxon ids to scientific names"""
        names = list(NcbiNameParser(good_names)) #list of objects
        sci_names = NcbiNameLookup(names) #NcbiNameLookup object
        root = names[1] #NcbiName object made from 2nd line of good_name_file
        bacteria = names[2] #from 3rd line of good_name_file
        azorhizobium = names[7]
        caulinodans = names[8]
        assert (sci_names[1] is root)         #gets NcbiName object from the
        assert (sci_names[2] is bacteria)     #NcbiNameLookup object and
        assert (sci_names[6] is azorhizobium) #asks if it is the original
        assert (sci_names[7] is caulinodans)  #NcbiName object
        self.assertEqual(sci_names[1].Name, 'root')
        self.assertEqual(sci_names[2].Name, 'Bacteria')
        self.assertEqual(sci_names[7].Name, 'Azorhizobium caulinodans')
        self.assertEqual(sci_names[9].Name, 'Buchnera aphidicola')


class NcbiTaxonLookupTest(TestCase):
    """Tests of the NcbiTaxonLookup factory function."""

    def setUp(self):
        """Sets up the class tests"""
        self.names = list(NcbiNameParser(good_names))
        self.nodes = list(NcbiTaxonParser(good_nodes))
        self.taxID_to_obj = NcbiTaxonLookup(self.nodes)
        self.names_to_obj = NcbiNameLookup(self.names)

    def test_init(self):
        """NcbiTaxonLookup should have correct fields for input NcbiTaxon"""
        line1_obj = self.nodes[0] #NcbiTaxon objects made from lines of
        line2_obj = self.nodes[1] #good_node_file
        line3_obj = self.nodes[2]
        line4_obj = self.nodes[3]
        line5_obj = self.nodes[4]
        assert (self.taxID_to_obj[1] is line1_obj) #gets NcbiTaxon object from
        assert (self.taxID_to_obj[2] is line2_obj) #NcbiTaxonLookup object &
        assert (self.taxID_to_obj[6] is line3_obj) #asks if it is the original
        assert (self.taxID_to_obj[7] is line4_obj) #NcbiTaxon object
        assert (self.taxID_to_obj[9] is line5_obj)
        self.assertEqual(self.taxID_to_obj[1].ParentId, 1) #checking a few
        self.assertEqual(self.taxID_to_obj[2].ParentId, 1) #individual
        self.assertEqual(self.taxID_to_obj[6].ParentId, 2) #fields of the
        self.assertEqual(self.taxID_to_obj[7].ParentId, 6) #NcbiTaxon objs
        self.assertEqual(self.taxID_to_obj[9].ParentId, 7)
        self.assertEqual(self.taxID_to_obj[1].Rank, 'no rank')
        self.assertEqual(self.taxID_to_obj[2].Rank, 'superkingdom')
        self.assertEqual(self.taxID_to_obj[6].Rank, 'genus')
        self.assertEqual(self.taxID_to_obj[7].Rank, 'species')
        self.assertEqual(self.taxID_to_obj[7].EmblCode, 'AC')
        self.assertEqual(self.taxID_to_obj[7].DivisionId, '0')
        self.assertEqual(self.taxID_to_obj[7].DivisionInherited, 1)
        self.assertEqual(self.taxID_to_obj[7].TranslTable, 11)
        self.assertEqual(self.taxID_to_obj[7].TranslTableInherited, 1)
        self.assertEqual(self.taxID_to_obj[7].TranslTableMt, 0)
        self.assertEqual(self.taxID_to_obj[7].TranslTableMtInherited, 1)


class NcbiTaxonomyTests(TestCase):
    """Tests of the NcbiTaxonomy class."""

    def setUp(self):
        self.tx = NcbiTaxonomyFromFiles(good_nodes, good_names)

    def test_init_good(self):
        """NcbiTaxonomyFromFiles should pass spot-checks of resulting objects"""
        self.assertEqual(len(self.tx.ByName), 6)
        self.assertEqual(len(self.tx.ById), 6)
        self.assertEqual(self.tx[10].Name, 'Fakus namus')
        self.assertEqual(self.tx['1'].Name, 'root')
        self.assertEqual(self.tx['root'].Parent, None)
        self.assertEqual(self.tx.Deadbeats, {})

    def test_init_bad(self):
        """NcbiTaxonomyFromFiles should produce deadbeats by default"""
        bad_tx = NcbiTaxonomyFromFiles(bad_nodes, good_names)
        self.assertEqual(len(bad_tx.Deadbeats), 2)
        assert 777 in bad_tx.Deadbeats
        assert 666 in bad_tx.Deadbeats
        assert bad_tx.Deadbeats[777] == bad_tx[9]

    def test_init_strict(self):
        """NcbiTaxonomyFromFiles should fail if strict and deadbeats exist"""
        tx = NcbiTaxonomyFromFiles(good_nodes, good_names, strict=True)
        self.assertRaises(MissingParentError, NcbiTaxonomyFromFiles, \
            bad_nodes, good_names, strict=True)

    def test_Ancestors(self):
        """NcbiTaxonomy should support Ancestors correctly, not incl. self"""
        result = self.tx['7'].ancestors()
        tax_ids = [taxon_obj.TaxonId for taxon_obj in result]
        self.assertEqual(tax_ids, [6, 2, 1])

    def test_Parent(self):
        """NcbiTaxonomy should support Parent correctly"""
        assert self.tx[7].Parent is self.tx[6]
        assert self.tx[6].Parent is self.tx[2]
        assert self.tx[2].Parent is self.tx[1]
        assert self.tx[1].Parent is None

    def test_Siblings(self):
        """NcbiTaxonomy should support Siblings correctly"""
        sibs = self.tx[7].siblings()
        self.assertEqual(len(sibs), 1)
        assert sibs[0] is self.tx[10]

    def test_Children(self):
        """NcbiTaxonomy should support Children correctly"""
        children = self.tx[6].Children
        self.assertEqual(len(children), 2)
        assert children[0] is self.tx[7]
        assert children[1] is self.tx[10]
        root_kids = self.tx['root']
        self.assertEqual(len(root_kids), 1)
        assert root_kids[0] is self.tx[2]
        self.assertEqual(len(self.tx[10].Children), 0)

    def test_Names(self):
        """NcbiTaxonomy should fill in names correctly"""
        self.assertEqual(self.tx['6'].Name, 'Azorhizobium')
        self.assertEqual(self.tx['1'].Name, 'root')
        self.assertEqual(self.tx['2'].Name, 'Bacteria')
        self.assertEqual(self.tx['7'].Name, 'Azorhizobium caulinodans')

    def test_lastCommonAncestor(self):
        """NcbiTaxonomy should support lastCommonAncestor()"""
        assert self.tx[9].lastCommonAncestor(self.tx[9]) is self.tx[9]
        assert self.tx[9].lastCommonAncestor(self.tx[7]) is self.tx[7]
        assert self.tx[9].lastCommonAncestor(self.tx[10]) is self.tx[6]
        assert self.tx[9].lastCommonAncestor(self.tx[1]) is self.tx[1]


class NcbiTaxonNodeTests(TestCase):
    """Tests of the NcbiTaxonNode class.

    Note: only testing methods that differ from the TreeNode base class.

    Note: nested_species is explicitly designed to test the case where the
    nodes file does _not_ contain the root, and where the id of the de facto
    root is not 1, to make sure there's nothing special about a node called
    'root' or with id 1.
""" def test_getRankedDescendants(self): """NcbiTaxonNode getRankedDescendants should return correct list""" nested_species = '''3\t|\t3\t|\tsuperkingdom\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t| 11\t|\t3\t|\tkingdom\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t| 22\t|\t11\t|\tclass\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t| 44\t|\t22\t|\torder\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t| 66\t|\t22\t|\torder\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t 77\t|\t66\t|\tfamily\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 99\t|\t66\t|\tfamily\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 88\t|\t44\t|\tfamily\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 101\t|\t77\t|\tgenus\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t| 202\t|\t77\t|\tgenus\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t| 606\t|\t99\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t 707\t|\t88\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 909\t|\t88\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 123\t|\t909\t|\tgroup\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 1111\t|\t123\t|\tspecies\t|\tAT\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t| 2222\t|\t707\t|\tspecies\t|\tTT\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t| 6666\t|\t606\t|\tspecies\t|\tGG\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t 7777\t|\t606\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 9999\t|\t202\t|\tspecies\t|\tBA\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 1010\t|\t101\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 5555\t|\t555\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t| 
555\t|\t3\t|\tsuperclass\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|'''.split('\n') nested_names = [ '3|a||scientific name|', '11|b||scientific name|', '555|c||scientific name|', '22|d||scientific name|', '44|e||scientific name|', '66|f||scientific name|', '88|g||scientific name|', '77|h||scientific name|', '99|i||scientific name|', '707|j||scientific name|', '909|k||scientific name|', '101|l||scientific name|', '202|m||scientific name|', '606|n||scientific name|', '2222|o||scientific name|', '123|p||scientific name|', '1111|q||scientific name|', '1010|r||scientific name|', '9999|s||scientific name|', '7777|t||scientific name|', '6666|u||scientific name|', '5555|z||scientific name|', ] tx = NcbiTaxonomyFromFiles(nested_species, nested_names) dec = tx[3].getRankedDescendants('superclass') self.assertEqual(len(dec), 1) assert dec[0] is tx[555] sp = tx['f'].getRankedDescendants('species') self.assertSameItems(sp, [tx[1010], tx[9999], tx[7777], tx[6666]]) empty = tx[11].getRankedDescendants('superclass') self.assertEqual(empty, []) gr = tx[3].getRankedDescendants('group') self.assertEqual(gr, [tx[123]]) assert tx[3] is tx['a'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_nexus.py000644 000765 000024 00000037464 12024702176 022345 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the Nexus Parser """ from cogent.util.unit_test import TestCase, main from cogent.parse.nexus import get_tree_info, parse_nexus_tree, parse_PAUP_log, \ split_tree_info, parse_trans_table, parse_dnd, get_BL_table, parse_taxa, \ find_fields __author__ = "Catherine Lozupone" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Catherine Lozupone", "Rob Knight", "Micah Hamady"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Catherine Lozupone" __email__ = "lozupone@colorado.edu" __status__ = "Production" Nexus_tree = """#NEXUS Begin trees; [Treefile saved Wednesday, May 5, 2004 5:02 PM] [! 
>Data file = Grassland_short.nex >Neighbor-joining search settings: > Ties (if encountered) will be broken systematically > Distance measure = Jukes-Cantor > (Tree is unrooted) ] Translate 1 outgroup25, 2 AF078391l, 3 AF078211af, 4 AF078393l, 5 AF078187af, 6 AF078320l, 7 AF078432l, 8 AF078290af, 9 AF078350l, 10 AF078356l, 11 AF078306af, 12 AF078429l, 13 AF078256af, 14 AF078443l, 15 AF078450l, 16 AF078452l, 17 AF078258af, 18 AF078380l, 19 AF078251af, 20 AF078179af, 21 outgroup258 ; tree PAUP_1 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21); tree PAUP_2 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21); End;""".split('\n') Nexus_tree_2 = """#NEXUS Begin trees; [Treefile saved Wednesday, June 14, 2006 11:20 AM] [!>Neighbor-joining search settings: > Ties (if encountered) will be broken systematically > Distance measure = uncorrected ("p") > (Tree is unrooted) ] tree nj = [&U] ((((((((((YA10260L1:0.01855,SARAG06_Y:0.00367):0.01965,(((YA270L1G0:0.01095,SARAD10_Y:0.00699):0.01744,YA270L1A0:0.04329):0.00028,((YA165L1C1:0.01241,SARAA02_Y:0.02584):0.00213,((YA165L1H0:0.00092,SARAF10_Y:-0.00092):0.00250,(YA165L1A0:0.00177,SARAH10_Y:0.01226):0.00198):0.00131):0.00700):0.01111):0.11201,(YA160L1F0:0.00348,SARAG01_Y:-0.00122):0.13620):0.01202,((((YRM60L1D0:0.00357,(YRM60L1C0:0.00477,SARAE10_Y:-0.00035):0.00086):0.00092,SARAE03_Y:0.00126):0.00125,SARAC11_Y:0.00318):0.00160,YRM60L1H0:0.00593):0.09975):0.07088,SARAA01_Y:0.02880):0.00190,SARAB04_Y:0.05219):0.00563,YRM60L1E0:0.06099):0.00165,(YRM60L1H0:0.00450,SARAF11_Y:0.01839):0.00288):0.00129,YRM60L1B1:0.00713):0.00194,(YRM60L1G0:0.00990,(YA165L1G0:0.00576,(YA160L1G0:0.01226,SARAA11_Y:0.00389):0.00088):0.00300):0.00614,SARAC06_Y:0.00381); end;""".split('\n') Nexus_tree_3 = """#NEXUS Begin trees; [Treefile saved Wednesday, May 5, 2004 5:02 PM] [! 
>Data file = Grassland_short.nex >Neighbor-joining search settings: > Ties (if encountered) will be broken systematically > Distance measure = Jukes-Cantor > (Tree is unrooted) ] Translate 1 outgroup25, 2 AF078391l, 3 'AF078211af', 4 AF078393l, 5 AF078187af, 6 AF078320l, 7 AF078432l, 8 AF078290af, 9 AF078350l, 10 AF078356l, 11 AF078306af, 12 AF078429l, 13 AF078256af, 14 'AF078443l', 15 AF078450l, 16 AF078452l, 17 AF078258af, 18 'AF078380l', 19 AF078251af, 20 AF078179af, 21 outgroup258 ; tree PAUP_1 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21); tree PAUP_2 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21); End;""".split('\n') PAUP_log = """ P A U P * Version 4.0b10 for Macintosh (PPC/Altivec) Wednesday, May 5, 2004 5:03 PM This copy registered to: Scott Dawson UC-Berkeley (serial number = B400784) -----------------------------NOTICE----------------------------- This is a beta-test version. Please report any crashes, apparent calculation errors, or other anomalous results. There are no restrictions on publication of results obtained with this version, but you should check the WWW site frequently for bug announcements and/or updated versions. See the README file on the distribution media for details. 
---------------------------------------------------------------- Tree description: Optimality criterion = parsimony Character-status summary: Of 500 total characters: All characters are of type 'unord' All characters have equal weight 253 characters are constant 109 variable characters are parsimony-uninformative Number of parsimony-informative characters = 138 Multistate taxa interpreted as uncertainty Character-state optimization: Accelerated transformation (ACCTRAN) AncStates = "standard" Tree number 1 (rooted using user-specified outgroup) Branch lengths and linkages for tree #1 Assigned Minimum Maximum Connected branch possible possible Node to node length length length ------------------------------------------------------------------------- 40 root 0 0 0 outgroup25 (1)* 40 40 24 52 39 40 57 15 72 AF078391l (2) 39 56 48 81 38 39 33 17 71 37 38 31 14 48 22 37 20 11 33 AF078211af (3) 22 4 2 7 AF078393l (4) 22 1 0 3 36 37 14 5 32 AF078187af (5) 36 18 10 28 35 36 21 16 45 34 35 10 3 23 26 34 5 3 9 24 26 4 3 13 23 24 0 0 3 AF078320l (6) 23 1 1 3 AF078356l (10) 23 2 2 2 AF078350l (9) 24 5 3 5 25 26 9 2 10 AF078306af (11) 25 6 4 10 AF078380l (18) 25 5 3 10 33 34 5 4 15 29 33 3 1 4 28 29 2 2 2 27 28 3 1 3 AF078432l (7) 27 2 2 2 AF078450l (15) 27 3 3 4 AF078251af (19) 28 6 6 7 AF078258af (17) 29 6 6 6 32 33 4 3 15 AF078290af (8) 32 9 8 11 31 32 9 6 18 AF078429l (12) 31 2 1 5 30 31 10 9 18 AF078443l (14) 30 2 1 6 AF078452l (16) 30 4 4 5 AF078256af (13) 35 4 1 6 AF078179af (20) 38 48 34 79 outgroup258 (21)* 40 45 27 67 ------------------------------------------------------------------------- Sum 509 Tree length = 509 Consistency index (CI) = 0.7151 Homoplasy index (HI) = 0.2849 """.split('\n') line1 = " 40 root 0 0 0" line2 = "outgroup25 (1)* 40 40 24 52" line3 = " 39 40 57 15 72" line4 = "AF078391l (2) 39 56 48 81" class NexusParserTests(TestCase): """Tests of the Nexus Parser functions""" def test_parse_nexus_tree(self): """parse_nexus_tree returns a dnd string and a 
translation table list""" Trans_table, dnd = parse_nexus_tree(Nexus_tree) #check the full dendrogram string is returned self.assertEqual(dnd['tree PAUP_1'],\ "(1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);") #check that all taxa are returned in the Trans_table self.assertEqual(Trans_table['1'], 'outgroup25') self.assertEqual(Trans_table['2'], 'AF078391l') self.assertEqual(Trans_table['3'], 'AF078211af') self.assertEqual(Trans_table['4'], 'AF078393l') self.assertEqual(Trans_table['5'], 'AF078187af') self.assertEqual(Trans_table['6'], 'AF078320l') self.assertEqual(Trans_table['21'], 'outgroup258') self.assertEqual(Trans_table['20'], 'AF078179af') self.assertEqual(Trans_table['19'], 'AF078251af') #check that Nexus files without translation table work Trans_table, dnd = parse_nexus_tree(Nexus_tree_2) self.assertEqual(Trans_table, None) self.assertEqual(dnd['tree nj'], '((((((((((YA10260L1:0.01855,SARAG06_Y:0.00367):0.01965,(((YA270L1G0:0.01095,SARAD10_Y:0.00699):0.01744,YA270L1A0:0.04329):0.00028,((YA165L1C1:0.01241,SARAA02_Y:0.02584):0.00213,((YA165L1H0:0.00092,SARAF10_Y:-0.00092):0.00250,(YA165L1A0:0.00177,SARAH10_Y:0.01226):0.00198):0.00131):0.00700):0.01111):0.11201,(YA160L1F0:0.00348,SARAG01_Y:-0.00122):0.13620):0.01202,((((YRM60L1D0:0.00357,(YRM60L1C0:0.00477,SARAE10_Y:-0.00035):0.00086):0.00092,SARAE03_Y:0.00126):0.00125,SARAC11_Y:0.00318):0.00160,YRM60L1H0:0.00593):0.09975):0.07088,SARAA01_Y:0.02880):0.00190,SARAB04_Y:0.05219):0.00563,YRM60L1E0:0.06099):0.00165,(YRM60L1H0:0.00450,SARAF11_Y:0.01839):0.00288):0.00129,YRM60L1B1:0.00713):0.00194,(YRM60L1G0:0.00990,(YA165L1G0:0.00576,(YA160L1G0:0.01226,SARAA11_Y:0.00389):0.00088):0.00300):0.00614,SARAC06_Y:0.00381);') def test_parse_nexus_tree_sq(self): """remove single quotes from tree and translate tables""" Trans_table, dnd = parse_nexus_tree(Nexus_tree_3) #check the full dendrogram string is returned self.assertEqual(dnd['tree PAUP_1'],\ 
"(1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);") #check that all taxa are returned in the Trans_table self.assertEqual(Trans_table['1'], 'outgroup25') self.assertEqual(Trans_table['2'], 'AF078391l') self.assertEqual(Trans_table['3'], 'AF078211af') self.assertEqual(Trans_table['4'], 'AF078393l') self.assertEqual(Trans_table['5'], 'AF078187af') self.assertEqual(Trans_table['6'], 'AF078320l') self.assertEqual(Trans_table['21'], 'outgroup258') self.assertEqual(Trans_table['20'], 'AF078179af') self.assertEqual(Trans_table['19'], 'AF078251af') def test_get_tree_info(self): """get_tree_info returns the Nexus file section that describes the tree""" result = get_tree_info(Nexus_tree) self.assertEqual(len(result), 33) self.assertEqual(result[0],\ "Begin trees; [Treefile saved Wednesday, May 5, 2004 5:02 PM]") self.assertEqual(result[31], \ "tree PAUP_1 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);") def test_split_tree_info(self): """split_tree_info splits lines into header, Trans_table, and dnd""" tree_info = get_tree_info(Nexus_tree) header, trans_table, dnd = split_tree_info(tree_info) self.assertEqual(len(header), 9) self.assertEqual(len(trans_table), 22) self.assertEqual(len(dnd), 2) self.assertEqual(header[0],\ "Begin trees; [Treefile saved Wednesday, May 5, 2004 5:02 PM]") self.assertEqual(header[8], "\tTranslate") self.assertEqual(trans_table[0], "\t\t1 outgroup25,") self.assertEqual(trans_table[21], "\t\t;") self.assertEqual(dnd[0], \ "tree PAUP_1 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);") def test_parse_trans_table(self): """parse_trans_table returns a dict with the taxa names indexed by number""" tree_info = get_tree_info(Nexus_tree) header, trans_table, dnd = split_tree_info(tree_info) Trans_table = parse_trans_table(trans_table) self.assertEqual(len(Trans_table), 21) #check that taxa are returned in the 
Trans_table self.assertEqual(Trans_table['1'], 'outgroup25') self.assertEqual(Trans_table['2'], 'AF078391l') self.assertEqual(Trans_table['3'], 'AF078211af') self.assertEqual(Trans_table['4'], 'AF078393l') self.assertEqual(Trans_table['5'], 'AF078187af') self.assertEqual(Trans_table['6'], 'AF078320l') self.assertEqual(Trans_table['21'], 'outgroup258') self.assertEqual(Trans_table['20'], 'AF078179af') self.assertEqual(Trans_table['19'], 'AF078251af') def test_parse_dnd(self): """parse_dnd returns a dict with dnd indexed by tree name""" tree_info = get_tree_info(Nexus_tree) header, trans_table, dnd = split_tree_info(tree_info) dnd_dict = parse_dnd(dnd) self.assertEqual(dnd_dict['tree PAUP_1'],\ "(1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);") #------------------------------------------------------ def test_get_BL_table(self): """get_BL_table returns the section of the log file w/ the BL table""" BL_table = get_BL_table(PAUP_log) self.assertEqual(len(BL_table), 40) self.assertEqual(BL_table[0], \ " 40 root 0 0 0") self.assertEqual(BL_table[39], \ "outgroup258 (21)* 40 45 27 67") def test_find_fields(self): """find_fields takes BL table line and returns field names mapped to info""" result = find_fields(line1) self.assertEqual(result['taxa'], "40") self.assertEqual(result['bl'], "0") self.assertEqual(result['parent'], "root") def test_parse_taxa(self): """parse_taxa should return the taxa # from a taxa_field from find_fields""" result1 = find_fields(line1) result2 = find_fields(line2) result3 = find_fields(line3) result4 = find_fields(line4) self.assertEqual(parse_taxa(result1["taxa"]), '40') self.assertEqual(parse_taxa(result2["taxa"]), '1') self.assertEqual(parse_taxa(result3["taxa"]), '39') self.assertEqual(parse_taxa(result4["taxa"]), '2') def test_parse_PAUP_log(self): """parse_PAUP_log extracts branch length info from a PAUP log file""" BL_dict = parse_PAUP_log(PAUP_log) self.assertEqual(len(BL_dict), 40) 
self.assertEqual(BL_dict['1'], ('40', 40))
        self.assertEqual(BL_dict['40'], ('root', 0))
        self.assertEqual(BL_dict['39'], ('40', 57))
        self.assertEqual(BL_dict['2'], ('39', 56))
        self.assertEqual(BL_dict['26'], ('34', 5))
        self.assertEqual(BL_dict['21'], ('40', 45))

#run if called from command line
if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_parse/test_nupack.py000644 000765 000024 00000003125 12024702176 022447 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.nupack import nupack_parser

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class NupackParserTest(TestCase):
    """Provides tests for NUPACK RNA secondary structure format parsers"""

    def setUp(self):
        """Setup function"""
        #output
        self.nupack_out = NUPACK
        #expected
        self.nupack_exp = [['GGCUAGUCCCUUCU',[(0,9),(1,8),(4,13),(5,12)],-23.30]]

    def test_nupack_output(self):
        """Test for nupack format"""
        obs = nupack_parser(self.nupack_out)
        self.assertEqual(obs,self.nupack_exp)

NUPACK = ['****************************************************************\n',
 'NUPACK 1.2\n',
 'Copyright 2003, 2004 by Robert M. Dirks & Niles A.
Pierce\n', 'California Institute of Technology\n', 'Pasadena, CA 91125 USA\n', '\n', 'Last Modified: 03/18/2004\n', '****************************************************************\n', '\n', '\n', 'Fold.out Version 1.2: Complexity O(N^5) (pseudoknots enabled)\n', 'Reading Input File...\n', 'Sequence Read.\n', 'Energy Parameters Loaded\n', 'SeqLength = 14\n', 'Sequence and a Minimum Energy Structure:\n', 'GGCUAGUCCCUUCU\n', '((..{{..))..}}\n', 'mfe = -23.30 kcal/mol\n', 'pseudoknotted!\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_pamlmatrix.py000644 000765 000024 00000004656 12024702176 023356 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from StringIO import StringIO from cogent.util.unit_test import TestCase, main from cogent.evolve.models import DSO78_matrix, DSO78_freqs from cogent.parse.paml_matrix import PamlMatrixParser __author__ = "Matthew Wakefield" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Matthew Wakefield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Matthew Wakefield" __email__ = "wakefield@wehi.edu.au" __status__ = "Production" data = """ 27 98 32 120 0 905 36 23 0 0 89 246 103 134 0 198 1 148 1153 0 716 240 9 139 125 11 28 81 23 240 535 86 28 606 43 10 65 64 77 24 44 18 61 0 7 41 15 34 0 0 73 11 7 44 257 26 464 318 71 0 153 83 27 26 46 18 72 90 1 0 0 114 30 17 0 336 527 243 18 14 14 0 0 0 0 15 48 196 157 0 92 250 103 42 13 19 153 51 34 94 12 32 33 17 11 409 154 495 95 161 56 79 234 35 24 17 96 62 46 245 371 26 229 66 16 53 34 30 22 192 33 136 104 13 78 550 0 201 23 0 0 0 0 0 27 0 46 0 0 76 0 75 0 24 8 95 0 96 0 22 0 127 37 28 13 0 698 0 34 42 61 208 24 15 18 49 35 37 54 44 889 175 10 258 12 48 30 157 0 28 0.087127 0.040904 0.040432 0.046872 0.033474 0.038255 0.049530 0.088612 0.033618 0.036886 0.085357 0.080482 0.014753 0.039772 0.050680 0.069577 0.058542 0.010494 0.029916 0.064718 Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val 
S_ij = S_ji and PI_i for the Dayhoff model, with the rate Q_ij=S_ij*PI_j
The rest of the file is not used.
Prepared by Z. Yang, March 1995.
See the following reference for notation used here:
Yang, Z., R. Nielsen and M. Hasegawa. 1998. Models of amino acid
substitution and applications to mitochondrial protein evolution.
Mol. Biol. Evol. 15:1600-1611.
"""

class TestParsePamlMatrix(TestCase):

    def test_parse(self):
        matrix,freqs = PamlMatrixParser(StringIO(data))
        self.assertEqual(DSO78_matrix,matrix)
        self.assertEqual(DSO78_freqs,freqs)
        pass

if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_parse/test_pdb.py000755 000765 000024 00000114601 12024702176 021740 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Unit tests for the pdb parser.
"""
from cogent.util.unit_test import TestCase, main
from cogent.parse.pdb import dict2pdb, dict2ter, pdb2dict, get_symmetry, \
                             get_coords_offset, get_trailer_offset, \
                             parse_header, parse_coords, parse_trailer, \
                             PDBParser
from cogent.core.entity import Structure
from cogent.core.entity import StructureBuilder
from numpy import array, allclose

__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Production"

class pdbTests(TestCase):
    """Tests of cogent.parse.pdb functions."""

    def test_PDBParser(self):
        """tests the UI parsing function.
""" fh = open('data/2E12.pdb') structure = PDBParser(fh, 'JUNK') assert type(structure) is Structure assert len(structure) == 1 assert (0,) in structure assert structure.header['space_group'] == 'P 21 21 21' assert structure.header['experiment_type'] == 'X-RAY DIFFRACTION' assert structure.header['r_free'] == '0.280' assert structure.header['dbref_acc'] == 'Q8P4R5' assert structure.header['cryst1'] == '49.942 51.699 82.120 90.00 90.00 90.00' assert structure.header['matthews'] == '2.29' model = structure[(0,)] assert len(model) == 2 assert structure.raw_header assert structure.raw_trailer assert structure.header assert structure.trailer == {} assert structure.getId() == ('JUNK', ) def test_parse_trailer(self): """testing trailer parsing dummy.""" d = parse_trailer(None) assert isinstance(d, dict) def test_parse_coords(self): """testing minimal structure building and coords parsing. """ builder = StructureBuilder() builder.initStructure('JUNK') atom = 'ATOM 10 CA PRO A 2 51.588 38.262 31.417 1.00 6.58 C \n' hetatm = 'HETATM 1633 O HOH B 164 17.979 35.529 38.171 1.00 1.02 O \n' lines = ['MODEL ', atom, hetatm] z = parse_coords(builder, lines) assert len(z[(0,)]) == 2 assert len(z[(0,)][('A',)]) == 1 assert len(z[(0,)][('B',)]) == 1 z.setTable() atom1 = z.table['A'][('JUNK', 0, 'A', ('PRO', 2, ' '), ('CA', ' '))] hetatm1 = z.table['A'][('JUNK', 0, 'B', ('H_HOH', 164, ' '), ('O', ' '))] self.assertAlmostEqual([51.588 , 38.262 , 31.417][2], list(atom1.coords)[2]) self.assertAlmostEqual([17.979 , 35.529 , 38.171][2], list(hetatm1.coords)[2]) def test_parse_header(self): """testing header parsing. 
""" header = ['HEADER TRANSLATION 17-OCT-06 2E12 \n', 'TITLE THE CRYSTAL STRUCTURE OF XC5848 FROM XANTHOMONAS CAMPESTRIS \n', 'TITLE 2 ADOPTING A NOVEL VARIANT OF SM-LIKE MOTIF \n', 'COMPND MOL_ID: 1; \n', 'COMPND 2 MOLECULE: HYPOTHETICAL PROTEIN XCC3642; \n', 'COMPND 3 CHAIN: A, B; \n', 'COMPND 4 SYNONYM: SM-LIKE MOTIF; \n', 'COMPND 5 ENGINEERED: YES \n', 'SOURCE MOL_ID: 1; \n', 'SOURCE 2 ORGANISM_SCIENTIFIC: XANTHOMONAS CAMPESTRIS PV. CAMPESTRIS; \n', 'SOURCE 3 ORGANISM_TAXID: 340; \n', 'SOURCE 4 STRAIN: PV. CAMPESTRIS; \n', 'SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI; \n', 'SOURCE 6 EXPRESSION_SYSTEM_TAXID: 562 \n', 'KEYWDS NOVEL SM-LIKE MOTIF, LSM MOTIF, XANTHOMONAS CAMPESTRIS, X- \n', 'KEYWDS 2 RAY CRYSTALLOGRAPHY, TRANSLATION \n', 'EXPDTA X-RAY DIFFRACTION \n', 'AUTHOR K.-H.CHIN,S.-K.RUAN,A.H.-J.WANG,S.-H.CHOU \n', 'REVDAT 2 24-FEB-09 2E12 1 VERSN \n', 'REVDAT 1 30-OCT-07 2E12 0 \n', 'JRNL AUTH K.-H.CHIN,S.-K.RUAN,A.H.-J.WANG,S.-H.CHOU \n', 'JRNL TITL XC5848, AN ORFAN PROTEIN FROM XANTHOMONAS \n', 'JRNL TITL 2 CAMPESTRIS, ADOPTS A NOVEL VARIANT OF SM-LIKE MOTIF \n', 'JRNL REF PROTEINS V. 68 1006 2007 \n', 'JRNL REFN ISSN 0887-3585 \n', 'JRNL PMID 17546661 \n', 'JRNL DOI 10.1002/PROT.21375 \n', 'REMARK 1 \n', 'REMARK 2 \n', 'REMARK 2 RESOLUTION. 1.70 ANGSTROMS. \n', 'REMARK 3 \n', 'REMARK 3 REFINEMENT. \n', 'REMARK 3 PROGRAM : CNS \n', 'REMARK 3 AUTHORS : BRUNGER,ADAMS,CLORE,DELANO,GROS,GROSSE- \n', 'REMARK 3 : KUNSTLEVE,JIANG,KUSZEWSKI,NILGES, PANNU, \n', 'REMARK 3 : READ,RICE,SIMONSON,WARREN \n', 'REMARK 3 \n', 'REMARK 3 REFINEMENT TARGET : NULL \n', 'REMARK 3 \n', 'REMARK 3 DATA USED IN REFINEMENT. 
\n', 'REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.70 \n', 'REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 30.00 \n', 'REMARK 3 DATA CUTOFF (SIGMA(F)) : 5.000 \n', 'REMARK 3 DATA CUTOFF HIGH (ABS(F)) : NULL \n', 'REMARK 3 DATA CUTOFF LOW (ABS(F)) : NULL \n', 'REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 99.1 \n', 'REMARK 3 NUMBER OF REFLECTIONS : 6937 \n', 'REMARK 3 \n', 'REMARK 3 FIT TO DATA USED IN REFINEMENT. \n', 'REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT \n', 'REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM \n', 'REMARK 3 R VALUE (WORKING SET) : 0.220 \n', 'REMARK 3 FREE R VALUE : 0.280 \n', 'REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL \n', 'REMARK 3 FREE R VALUE TEST SET COUNT : NULL \n', 'REMARK 3 ESTIMATED ERROR OF FREE R VALUE : NULL \n', 'REMARK 3 \n', 'REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN. \n', 'REMARK 3 TOTAL NUMBER OF BINS USED : NULL \n', 'REMARK 3 BIN RESOLUTION RANGE HIGH (A) : 1.70 \n', 'REMARK 3 BIN RESOLUTION RANGE LOW (A) : 1.75 \n', 'REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : 97.00 \n', 'REMARK 3 REFLECTIONS IN BIN (WORKING SET) : NULL \n', 'REMARK 3 BIN R VALUE (WORKING SET) : 0.2400 \n', 'REMARK 3 BIN FREE R VALUE : 0.2200 \n', 'REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : NULL \n', 'REMARK 3 BIN FREE R VALUE TEST SET COUNT : NULL \n', 'REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : 0.012 \n', 'REMARK 3 \n', 'REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT. \n', 'REMARK 3 PROTEIN ATOMS : 1512 \n', 'REMARK 3 NUCLEIC ACID ATOMS : 0 \n', 'REMARK 3 HETEROGEN ATOMS : 0 \n', 'REMARK 3 SOLVENT ATOMS : 122 \n', 'REMARK 3 \n', 'REMARK 3 B VALUES. \n', 'REMARK 3 FROM WILSON PLOT (A**2) : 24.00 \n', 'REMARK 3 MEAN B VALUE (OVERALL, A**2) : NULL \n', 'REMARK 3 OVERALL ANISOTROPIC B VALUE. 
\n', 'REMARK 3 B11 (A**2) : NULL \n', 'REMARK 3 B22 (A**2) : NULL \n', 'REMARK 3 B33 (A**2) : NULL \n', 'REMARK 3 B12 (A**2) : NULL \n', 'REMARK 3 B13 (A**2) : NULL \n', 'REMARK 3 B23 (A**2) : NULL \n', 'REMARK 3 \n', 'REMARK 3 ESTIMATED COORDINATE ERROR. \n', 'REMARK 3 ESD FROM LUZZATI PLOT (A) : NULL \n', 'REMARK 3 ESD FROM SIGMAA (A) : NULL \n', 'REMARK 3 LOW RESOLUTION CUTOFF (A) : NULL \n', 'REMARK 3 \n', 'REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR. \n', 'REMARK 3 ESD FROM C-V LUZZATI PLOT (A) : NULL \n', 'REMARK 3 ESD FROM C-V SIGMAA (A) : NULL \n', 'REMARK 3 \n', 'REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES. \n', 'REMARK 3 BOND LENGTHS (A) : 0.007 \n', 'REMARK 3 BOND ANGLES (DEGREES) : 1.32 \n', 'REMARK 3 DIHEDRAL ANGLES (DEGREES) : NULL \n', 'REMARK 3 IMPROPER ANGLES (DEGREES) : NULL \n', 'REMARK 3 \n', 'REMARK 3 ISOTROPIC THERMAL MODEL : NULL \n', 'REMARK 3 \n', 'REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. RMS SIGMA \n', 'REMARK 3 MAIN-CHAIN BOND (A**2) : NULL ; NULL \n', 'REMARK 3 MAIN-CHAIN ANGLE (A**2) : NULL ; NULL \n', 'REMARK 3 SIDE-CHAIN BOND (A**2) : NULL ; NULL \n', 'REMARK 3 SIDE-CHAIN ANGLE (A**2) : NULL ; NULL \n', 'REMARK 3 \n', 'REMARK 3 BULK SOLVENT MODELING. \n', 'REMARK 3 METHOD USED : NULL \n', 'REMARK 3 KSOL : NULL \n', 'REMARK 3 BSOL : NULL \n', 'REMARK 3 \n', 'REMARK 3 NCS MODEL : NULL \n', 'REMARK 3 \n', 'REMARK 3 NCS RESTRAINTS. RMS SIGMA/WEIGHT \n', 'REMARK 3 GROUP 1 POSITIONAL (A) : NULL ; NULL \n', 'REMARK 3 GROUP 1 B-FACTOR (A**2) : NULL ; NULL \n', 'REMARK 3 \n', 'REMARK 3 PARAMETER FILE 1 : NULL \n', 'REMARK 3 TOPOLOGY FILE 1 : NULL \n', 'REMARK 3 \n', 'REMARK 3 OTHER REFINEMENT REMARKS: NULL \n', 'REMARK 4 \n', 'REMARK 4 2E12 COMPLIES WITH FORMAT V. 3.15, 01-DEC-08 \n', 'REMARK 100 \n', 'REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY PDBJ ON 19-OCT-06. \n', 'REMARK 100 THE RCSB ID CODE IS RCSB026092. 
\n', 'REMARK 200 \n', 'REMARK 200 EXPERIMENTAL DETAILS \n', 'REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION \n', 'REMARK 200 DATE OF DATA COLLECTION : 28-JUL-06 \n', 'REMARK 200 TEMPERATURE (KELVIN) : 100 \n', 'REMARK 200 PH : 8.0 \n', 'REMARK 200 NUMBER OF CRYSTALS USED : 10 \n', 'REMARK 200 \n', 'REMARK 200 SYNCHROTRON (Y/N) : Y \n', 'REMARK 200 RADIATION SOURCE : NSRRC \n', 'REMARK 200 BEAMLINE : BL13B1 \n', 'REMARK 200 X-RAY GENERATOR MODEL : NULL \n', 'REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M \n', 'REMARK 200 WAVELENGTH OR RANGE (A) : 0.96437, 0.97983 \n', 'REMARK 200 MONOCHROMATOR : NULL \n', 'REMARK 200 OPTICS : NULL \n', 'REMARK 200 \n', 'REMARK 200 DETECTOR TYPE : CCD \n', 'REMARK 200 DETECTOR MANUFACTURER : ADSC QUANTUM 315 \n', 'REMARK 200 INTENSITY-INTEGRATION SOFTWARE : DENZO \n', 'REMARK 200 DATA SCALING SOFTWARE : HKL-2000 \n', 'REMARK 200 \n', 'REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 6937 \n', 'REMARK 200 RESOLUTION RANGE HIGH (A) : 1.700 \n', 'REMARK 200 RESOLUTION RANGE LOW (A) : 30.000 \n', 'REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 2.000 \n', 'REMARK 200 \n', 'REMARK 200 OVERALL. \n', 'REMARK 200 COMPLETENESS FOR RANGE (%) : 99.7 \n', 'REMARK 200 DATA REDUNDANCY : 4.500 \n', 'REMARK 200 R MERGE (I) : 0.24000 \n', 'REMARK 200 R SYM (I) : 0.06000 \n', 'REMARK 200 FOR THE DATA SET : 8.0000 \n', 'REMARK 200 \n', 'REMARK 200 IN THE HIGHEST RESOLUTION SHELL. 
\n', 'REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 1.70 \n', 'REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : NULL \n', 'REMARK 200 COMPLETENESS FOR SHELL (%) : 97.5 \n', 'REMARK 200 DATA REDUNDANCY IN SHELL : 4.50 \n', 'REMARK 200 R MERGE FOR SHELL (I) : 0.06000 \n', 'REMARK 200 R SYM FOR SHELL (I) : 0.24000 \n', 'REMARK 200 FOR SHELL : 7.900 \n', 'REMARK 200 \n', 'REMARK 200 DIFFRACTION PROTOCOL: MAD \n', 'REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: MAD \n', 'REMARK 200 SOFTWARE USED: AMORE \n', 'REMARK 200 STARTING MODEL: NULL \n', 'REMARK 200 \n', 'REMARK 200 REMARK: NULL \n', 'REMARK 280 \n', 'REMARK 280 CRYSTAL \n', 'REMARK 280 SOLVENT CONTENT, VS (%): 46.26 \n', 'REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 2.29 \n', 'REMARK 280 \n', 'REMARK 280 CRYSTALLIZATION CONDITIONS: PH 8.0, VAPOR DIFFUSION, SITTING \n', 'REMARK 280 DROP, TEMPERATURE 298K \n', 'REMARK 290 REMARK: NULL \n', 'REMARK 300 \n', 'REMARK 300 BIOMOLECULE: 1 \n', 'REMARK 300 SEE REMARK 350 FOR THE AUTHOR PROVIDED AND/OR PROGRAM \n', 'REMARK 300 GENERATED ASSEMBLY INFORMATION FOR THE STRUCTURE IN \n', 'REMARK 300 THIS ENTRY. THE REMARK MAY ALSO PROVIDE INFORMATION ON \n', 'REMARK 300 BURIED SURFACE AREA. \n', 'REMARK 465 \n', 'REMARK 465 MISSING RESIDUES \n', 'REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE \n', 'REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN \n', 'REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) 
\n', 'REMARK 465 \n', 'REMARK 465 M RES C SSSEQI \n', 'REMARK 465 LEU A 94 \n', 'REMARK 465 GLY A 95 \n', 'REMARK 465 ALA A 96 \n', 'REMARK 465 PRO A 97 \n', 'REMARK 465 GLN A 98 \n', 'REMARK 465 VAL A 99 \n', 'REMARK 465 MET A 100 \n', 'REMARK 465 PRO A 101 \n', 'REMARK 465 LEU B 94 \n', 'REMARK 465 GLY B 95 \n', 'REMARK 465 ALA B 96 \n', 'REMARK 465 PRO B 97 \n', 'REMARK 465 GLN B 98 \n', 'REMARK 465 VAL B 99 \n', 'REMARK 465 MET B 100 \n', 'REMARK 465 PRO B 101 \n', 'REMARK 500 \n', 'REMARK 500 GEOMETRY AND STEREOCHEMISTRY \n', 'REMARK 500 SUBTOPIC: CLOSE CONTACTS IN SAME ASYMMETRIC UNIT \n', 'REMARK 500 \n', 'REMARK 500 THE FOLLOWING ATOMS ARE IN CLOSE CONTACT. \n', 'REMARK 500 \n', 'REMARK 500 ATM1 RES C SSEQI ATM2 RES C SSEQI DISTANCE \n', 'REMARK 500 O HOH A 127 O HOH A 149 2.05 \n', 'REMARK 500 \n', 'REMARK 500 REMARK: NULL \n', 'REMARK 500 \n', 'REMARK 500 GEOMETRY AND STEREOCHEMISTRY \n', 'REMARK 500 SUBTOPIC: TORSION ANGLES \n', 'REMARK 500 \n', 'REMARK 500 TORSION ANGLES OUTSIDE THE EXPECTED RAMACHANDRAN REGIONS: \n', 'REMARK 500 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; \n', 'REMARK 500 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). \n', 'REMARK 500 \n', 'REMARK 500 STANDARD TABLE: \n', 'REMARK 500 FORMAT:(10X,I3,1X,A3,1X,A1,I4,A1,4X,F7.2,3X,F7.2) \n', 'REMARK 500 \n', 'REMARK 500 EXPECTED VALUES: GJ KLEYWEGT AND TA JONES (1996). PHI/PSI- \n', 'REMARK 500 CHOLOGY: RAMACHANDRAN REVISITED. 
STRUCTURE 4, 1395 - 1400 \n', 'REMARK 500 \n', 'REMARK 500 M RES CSSEQI PSI PHI \n', 'REMARK 500 ASN A 64 -175.84 -178.56 \n', 'REMARK 500 HIS A 71 -156.72 -164.33 \n', 'REMARK 500 LEU A 72 -70.52 -135.73 \n', 'REMARK 500 ALA A 74 -75.47 -29.45 \n', 'REMARK 500 SER A 75 -5.54 -145.62 \n', 'REMARK 500 GLN A 76 -178.36 65.32 \n', 'REMARK 500 GLU A 77 115.33 61.52 \n', 'REMARK 500 MET A 92 -36.89 93.21 \n', 'REMARK 500 LEU B 25 37.31 -77.89 \n', 'REMARK 500 GLN B 28 37.09 32.85 \n', 'REMARK 500 ARG B 30 132.20 -36.84 \n', 'REMARK 500 ASN B 64 -172.78 -175.93 \n', 'REMARK 500 GLN B 76 67.64 34.46 \n', 'REMARK 500 PRO B 91 -156.99 -48.61 \n', 'REMARK 500 MET B 92 -37.52 -160.44 \n', 'REMARK 500 \n', 'REMARK 500 REMARK: NULL \n', 'REMARK 525 \n', 'REMARK 525 SOLVENT \n', 'REMARK 525 \n', 'REMARK 525 THE SOLVENT MOLECULES HAVE CHAIN IDENTIFIERS THAT \n', 'REMARK 525 INDICATE THE POLYMER CHAIN WITH WHICH THEY ARE MOST \n', 'REMARK 525 CLOSELY ASSOCIATED. THE REMARK LISTS ALL THE SOLVENT \n', 'REMARK 525 MOLECULES WHICH ARE MORE THAN 5A AWAY FROM THE \n', 'REMARK 525 NEAREST POLYMER CHAIN (M = MODEL NUMBER; \n', 'REMARK 525 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE \n', 'REMARK 525 NUMBER; I=INSERTION CODE): \n', 'REMARK 525 \n', 'REMARK 525 M RES CSSEQI \n', 'REMARK 525 HOH B 115 DISTANCE = 6.82 ANGSTROMS \n', 'REMARK 525 HOH A 116 DISTANCE = 6.52 ANGSTROMS \n', 'REMARK 525 HOH B 119 DISTANCE = 5.12 ANGSTROMS \n', 'REMARK 525 HOH B 121 DISTANCE = 5.21 ANGSTROMS \n', 'REMARK 525 HOH B 123 DISTANCE = 5.18 ANGSTROMS \n', 'REMARK 525 HOH A 124 DISTANCE = 6.99 ANGSTROMS \n', 'REMARK 525 HOH B 124 DISTANCE = 5.13 ANGSTROMS \n', 'REMARK 525 HOH B 134 DISTANCE = 7.25 ANGSTROMS \n', 'REMARK 525 HOH B 140 DISTANCE = 5.54 ANGSTROMS \n', 'REMARK 525 HOH B 141 DISTANCE = 5.94 ANGSTROMS \n', 'REMARK 525 HOH B 142 DISTANCE = 6.60 ANGSTROMS \n', 'REMARK 525 HOH B 143 DISTANCE = 7.39 ANGSTROMS \n', 'REMARK 525 HOH A 145 DISTANCE = 9.25 ANGSTROMS \n', 'REMARK 525 HOH A 150 
DISTANCE = 6.01 ANGSTROMS \n', 'REMARK 525 HOH B 152 DISTANCE = 5.46 ANGSTROMS \n', 'REMARK 525 HOH B 153 DISTANCE = 9.74 ANGSTROMS \n', 'REMARK 525 HOH B 154 DISTANCE = 9.32 ANGSTROMS \n', 'REMARK 525 HOH B 155 DISTANCE = 5.41 ANGSTROMS \n', 'REMARK 525 HOH B 163 DISTANCE = 5.16 ANGSTROMS \n', 'DBREF 2E12 A 1 101 UNP Q8P4R5 Q8P4R5_XANCP 1 101 \n', 'DBREF 2E12 B 1 101 UNP Q8P4R5 Q8P4R5_XANCP 1 101 \n', 'SEQRES 1 A 101 MET PRO LYS TYR ALA PRO HIS VAL TYR THR GLU GLN ALA \n', 'SEQRES 2 A 101 GLN ILE ALA THR LEU GLU HIS TRP VAL LYS LEU LEU ASP \n', 'SEQRES 3 A 101 GLY GLN GLU ARG VAL ARG ILE GLU LEU ASP ASP GLY SER \n', 'SEQRES 4 A 101 MET ILE ALA GLY THR VAL ALA VAL ARG PRO THR ILE GLN \n', 'SEQRES 5 A 101 THR TYR ARG ASP GLU GLN GLU ARG GLU GLY SER ASN GLY \n', 'SEQRES 6 A 101 GLN LEU ARG ILE ASP HIS LEU ASP ALA SER GLN GLU PRO \n', 'SEQRES 7 A 101 GLN TRP ILE TRP MET ASP ARG ILE VAL ALA VAL HIS PRO \n', 'SEQRES 8 A 101 MET PRO LEU GLY ALA PRO GLN VAL MET PRO \n', 'SEQRES 1 B 101 MET PRO LYS TYR ALA PRO HIS VAL TYR THR GLU GLN ALA \n', 'SEQRES 2 B 101 GLN ILE ALA THR LEU GLU HIS TRP VAL LYS LEU LEU ASP \n', 'SEQRES 3 B 101 GLY GLN GLU ARG VAL ARG ILE GLU LEU ASP ASP GLY SER \n', 'SEQRES 4 B 101 MET ILE ALA GLY THR VAL ALA VAL ARG PRO THR ILE GLN \n', 'SEQRES 5 B 101 THR TYR ARG ASP GLU GLN GLU ARG GLU GLY SER ASN GLY \n', 'SEQRES 6 B 101 GLN LEU ARG ILE ASP HIS LEU ASP ALA SER GLN GLU PRO \n', 'SEQRES 7 B 101 GLN TRP ILE TRP MET ASP ARG ILE VAL ALA VAL HIS PRO \n', 'SEQRES 8 B 101 MET PRO LEU GLY ALA PRO GLN VAL MET PRO \n', 'FORMUL 3 HOH *122(H2 O) \n', 'HELIX 1 1 GLU A 11 LEU A 24 1 14 \n', 'HELIX 2 2 GLU B 11 LEU B 25 1 15 \n', 'SHEET 1 A 3 ILE A 51 ARG A 55 0 \n', 'SHEET 2 A 3 GLU A 61 ASP A 70 -1 O ASN A 64 N GLN A 52 \n', 'SHEET 3 A 3 GLN A 79 TRP A 82 -1 O ILE A 81 N LEU A 67 \n', 'SHEET 1 B 5 ILE A 51 ARG A 55 0 \n', 'SHEET 2 B 5 GLU A 61 ASP A 70 -1 O ASN A 64 N GLN A 52 \n', 'SHEET 3 B 5 MET A 40 VAL A 45 -1 N THR A 44 O ASP A 70 \n', 'SHEET 4 B 5 ARG A 30 
LEU A 35 -1 N ILE A 33 O ILE A 41 \n', 'SHEET 5 B 5 ILE A 86 PRO A 91 -1 O VAL A 87 N GLU A 34 \n', 'SHEET 1 C 5 PRO B 78 TRP B 82 0 \n', 'SHEET 2 C 5 GLN B 66 ASP B 70 -1 N ILE B 69 O GLN B 79 \n', 'SHEET 3 C 5 MET B 40 VAL B 47 -1 N ALA B 46 O ARG B 68 \n', 'SHEET 4 C 5 VAL B 31 LEU B 35 -1 N ILE B 33 O ILE B 41 \n', 'SHEET 5 C 5 ILE B 86 HIS B 90 -1 O VAL B 87 N GLU B 34 \n', 'SHEET 1 D 2 GLN B 52 ARG B 55 0 \n', 'SHEET 2 D 2 GLU B 61 ASN B 64 -1 O ASN B 64 N GLN B 52 \n', 'CRYST1 49.942 51.699 82.120 90.00 90.00 90.00 P 21 21 21 8 \n'] correct_header = { 'bio_cmx': [[[('A',), ('B',)], 1]], 'uc_mxs': array([[[ 1. , 0. , 0. , 0. ],\ [ 0. , 1. , 0. , 0. ],\ [ 0. , 0. , 1. , 0. ],\ [ 0. , 0. , 0. , 1. ]],\ [[ -1. , 0. , 0. , 24.971 ],\ [ 0. , -1. , 0. , 0. ],\ [ 0. , 0. , 1. , 41.06 ],\ [ 0. , 0. , 0. , 1. ]],\ [[ -1. , 0. , 0. , 0. ],\ [ 0. , 1. , 0. , 25.8495],\ [ 0. , 0. , -1. , 41.06 ],\ [ 0. , 0. , 0. , 1. ]],\ [[ 1. , 0. , 0. , 24.971 ],\ [ 0. , -1. , 0. , 25.8495],\ [ 0. , 0. , -1. , 0. ],\ [ 0. , 0. , 0. , 1. ]]]), \ 'dbref_acc_full': 'Q8P4R5_XANCP', \ 'name': 'TRANSLATION', \ 'solvent_content': '46.26', \ 'expdta': 'X-RAY', \ 'bio_mxs': array([[[ 1., 0., 0., 0.],\ [ 0., 1., 0., 0.],\ [ 0., 0., 1., 0.],\ [ 0., 0., 0., 1.]]]), 'uc_omx': array([[ 49.94256605, 0. , 0. ],\ [ 0. , 51.69828879, 0. ],\ [ 0. , 0. , 82.12203334]]), \ 'space_group': 'P 21 21 21', 'r_free': '0.280', \ 'cryst1': '49.942 51.699 82.120 90.00 90.00 90.00', \ 'experiment_type': 'X-RAY DIFFRACTION', \ 'uc_fmx': array([[ 0.020023, 0. , 0. ],\ [ 0. , 0.019343, 0. ],\ [ 0. , 0. 
, 0.012177]]),\
            'date': '17-OCT-06', \
            'matthews': '2.29', \
            'resolution': '1.70', \
            'id': '2E12', \
            'dbref_acc': 'Q8P4R5'}

        parsed_header = parse_header(header)
        for key, val in parsed_header.items():
            assert val == correct_header[key]

    def test_get_trailer_offset(self):
        lines = ['ATOM', 'CONNECT']
        assert get_trailer_offset(lines) == 1

    def test_get_coords_offset(self):
        lines = ['dummy', 'ATOM', 'CONNECT']
        assert get_coords_offset(lines) == 1

    def test_get_symmetry(self):
        """testing parsing of symmetry operators """
        lines = [
        'REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000 \n',
        'REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000 \n',
        'REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000 \n',
        'REMARK 290 SMTRY1 2 -1.000000 0.000000 0.000000 24.97100 \n',
        'REMARK 290 SMTRY2 2 0.000000 -1.000000 0.000000 0.00000 \n',
        'REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 41.06000 \n',
        'REMARK 290 SMTRY1 3 -1.000000 0.000000 0.000000 0.00000 \n',
        'REMARK 290 SMTRY2 3 0.000000 1.000000 0.000000 25.84950 \n',
        'REMARK 290 SMTRY3 3 0.000000 0.000000 -1.000000 41.06000 \n',
        'REMARK 290 SMTRY1 4 1.000000 0.000000 0.000000 24.97100 \n',
        'REMARK 290 SMTRY2 4 0.000000 -1.000000 0.000000 25.84950 \n',
        'REMARK 290 SMTRY3 4 0.000000 0.000000 -1.000000 0.00000 \n',
        'REMARK 290 \n',
        'REMARK 290 REMARK: NULL \n',
        'REMARK 350 \n',
        'REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN \n',
        'REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE \n',
        'REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS \n',
        'REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND \n',
        'REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN.
\n',
        'REMARK 350 \n',
        'REMARK 350 BIOMOLECULE: 1 \n',
        'REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC \n',
        'REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B \n',
        'REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000 \n',
        'REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000 \n',
        'REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000 \n',
        'CRYST1 49.942 51.699 82.120 90.00 90.00 90.00 P 21 21 21 8 \n',
        'ORIGX1 1.000000 0.000000 0.000000 0.00000 \n',
        'ORIGX2 0.000000 1.000000 0.000000 0.00000 \n',
        'ORIGX3 0.000000 0.000000 1.000000 0.00000 \n',
        'SCALE1 0.020023 0.000000 0.000000 0.00000 \n',
        'SCALE2 0.000000 0.019343 0.000000 0.00000 \n',
        'SCALE3 0.000000 0.000000 0.012177 0.00000 \n']
        sym = get_symmetry(lines)
        correct_sym = {
            'bio_cmx': [[[('A',), ('B',)], 1]],
            'uc_mxs': array([[[ 1. , 0. , 0. , 0. ],
                              [ 0. , 1. , 0. , 0. ],
                              [ 0. , 0. , 1. , 0. ],
                              [ 0. , 0. , 0. , 1. ]],
                             [[ -1. , 0. , 0. , 24.971 ],
                              [ 0. , -1. , 0. , 0. ],
                              [ 0. , 0. , 1. , 41.06 ],
                              [ 0. , 0. , 0. , 1. ]],
                             [[ -1. , 0. , 0. , 0. ],
                              [ 0. , 1. , 0. , 25.8495],
                              [ 0. , 0. , -1. , 41.06 ],
                              [ 0. , 0. , 0. , 1. ]],
                             [[ 1. , 0. , 0. , 24.971 ],
                              [ 0. , -1. , 0. , 25.8495],
                              [ 0. , 0. , -1. , 0. ],
                              [ 0. , 0. , 0. , 1. ]]]),
            'bio_mxs': array([[[ 1., 0., 0., 0.],
                               [ 0., 1., 0., 0.],
                               [ 0., 0., 1., 0.],
                               [ 0., 0., 0., 1.]]]),
            'uc_omx': array([[ 49.94256605, 0. , 0. ],
                             [ 0. , 51.69828879, 0. ],
                             [ 0. , 0. , 82.12203334]]),
            'uc_fmx': array([[ 0.020023, 0. , 0. ],
                             [ 0. , 0.019343, 0. ],
                             [ 0. , 0. , 0.012177]]),
        }
        for key in sym:
            try:
                assert sym[key] == correct_sym[key]
            except ValueError:
                assert allclose(sym[key], correct_sym[key])

    def test_dict2pdb(self):
        """testing pdb dict round-trip.
        """
        line = 'ATOM 1 N MET A 1 53.045 42.225 33.724 1.00 2.75 N\n'
        d = pdb2dict(line)
        line2 = dict2pdb(d)
        assert line == line2
        d.pop('coords')
        assert d == {'ser_num': 1, 'res_long_id': ('MET', 1, ' '),
                     'h_flag': ' ', 'at_name': ' N ', 'at_long_id': ('N', ' '),
                     'bfactor': 2.75, 'chain_id': 'A', 'occupancy': 1.0,
                     'element': ' N', 'res_name': 'MET', 'seg_id': ' ',
                     'at_id': 'N', 'alt_loc': ' ', 'res_ic': ' ',
                     'res_id': 1, 'at_type': 'ATOM '}

    def test_dict2ter(self):
        d = {'ser_num': 1, 'chain_id': 'A', 'res_name': 'MET', 'res_ic': ' ', \
             'res_id': 1,}
        assert dict2ter(d) == 'TER 2 MET A 1 \n'

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_phylip.py

#!/usr/bin/env python
#file cogent/parse/test_phylip.py
"""Unit tests for the phylip parser
"""
from cogent.parse.phylip import MinimalPhylipParser, get_align_for_phylip
from cogent.parse.record import RecordError
from cogent.util.unit_test import TestCase, main
from StringIO import StringIO

__author__ = "Micah Hamady"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Micah Hamady", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Micah Hamady"
__email__ = "hamady@colorado.edu"
__status__ = "Production"

class PhylipGenericTest(TestCase):
    """Setup data for Phylip parsers."""
    def setUp(self):
        """standard files"""
        self.big_interleaved = StringIO("""10 705 I
Cow ATGGCATATCCCATACAACTAGGATTCCAAGATGCAACATCACCAATCATAGAAGAACTA
Carp ATGGCACACCCAACGCAACTAGGTTTCAAGGACGCGGCCATACCCGTTATAGAGGAACTT
Chicken ATGGCCAACCACTCCCAACTAGGCTTTCAAGACGCCTCATCCCCCATCATAGAAGAGCTC
Human ATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTACTTCCCCTATCATAGAAGAGCTT
Loach ATGGCACATCCCACACAATTAGGATTCCAAGACGCGGCCTCACCCGTAATAGAAGAACTT
Mouse ATGGCCTACCCATTCCAACTTGGTCTACAAGACGCCACATCCCCTATTATAGAAGAGCTA
Rat ATGGCTTACCCATTTCAACTTGGCTTACAAGACGCTACATCACCTATCATAGAAGAACTT
Seal ATGGCATACCCCCTACAAATAGGCCTACAAGATGCAACCTCTCCCATTATAGAGGAGTTA
Whale ATGGCATATCCATTCCAACTAGGTTTCCAAGATGCAGCATCACCCATCATAGAAGAGCTC Frog ATGGCACACCCATCACAATTAGGTTTTCAAGACGCAGCCTCTCCAATTATAGAAGAATTA CTTCACTTTCATGACCACACGCTAATAATTGTCTTCTTAATTAGCTCATTAGTACTTTAC CTTCACTTCCACGACCACGCATTAATAATTGTGCTCCTAATTAGCACTTTAGTTTTATAT GTTGAATTCCACGACCACGCCCTGATAGTCGCACTAGCAATTTGCAGCTTAGTACTCTAC ATCACCTTTCATGATCACGCCCTCATAATCATTTTCCTTATCTGCTTCCTAGTCCTGTAT CTTCACTTCCATGACCATGCCCTAATAATTGTATTTTTGATTAGCGCCCTAGTACTTTAT ATAAATTTCCATGATCACACACTAATAATTGTTTTCCTAATTAGCTCCTTAGTCCTCTAT ACAAACTTTCATGACCACACCCTAATAATTGTATTCCTCATCAGCTCCCTAGTACTTTAT CTACACTTCCATGACCACACATTAATAATTGTGTTCCTAATTAGCTCATTAGTACTCTAC CTACACTTTCACGATCATACACTAATAATCGTTTTTCTAATTAGCTCTTTAGTTCTCTAC CTTCACTTCCACGACCATACCCTCATAGCCGTTTTTCTTATTAGTACGCTAGTTCTTTAC ATTATTTCACTAATACTAACGACAAAGCTGACCCATACAAGCACGATAGATGCACAAGAA ATTATTACTGCAATGGTATCAACTAAACTTACTAATAAATATATTCTAGACTCCCAAGAA CTTCTAACTCTTATACTTATAGAAAAACTATCA---TCAAACACCGTAGATGCCCAAGAA GCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAACATCTCAGACGCTCAGGAA GTTATTATTACAACCGTCTCAACAAAACTCACTAACATATATATTTTGGACTCACAAGAA ATCATCTCGCTAATATTAACAACAAAACTAACACATACAAGCACAATAGATGCACAAGAA ATTATTTCACTAATACTAACAACAAAACTAACACACACAAGCACAATAGACGCCCAAGAA ATTATCTCACTTATACTAACCACGAAACTCACCCACACAAGTACAATAGACGCACAAGAA ATTATTACCCTAATGCTTACAACCAAATTAACACATACTAGTACAATAGACGCCCAAGAA ATTATTACTATTATAATAACTACTAAACTAACTAATACAAACCTAATGGACGCACAAGAG GTAGAGACAATCTGAACCATTCTGCCCGCCATCATCTTAATTCTAATTGCTCTTCCTTCT ATCGAAATCGTATGAACCATTCTACCAGCCGTCATTTTAGTACTAATCGCCCTGCCCTCC GTTGAACTAATCTGAACCATCCTACCCGCTATTGTCCTAGTCCTGCTTGCCCTCCCCTCC ATAGAAACCGTCTGAACTATCCTGCCCGCCATCATCCTAGTCCTCATCGCCCTCCCATCC ATTGAAATCGTATGAACTGTGCTCCCTGCCCTAATCCTCATTTTAATCGCCCTCCCCTCA GTTGAAACCATTTGAACTATTCTACCAGCTGTAATCCTTATCATAATTGCTCTCCCCTCT GTAGAAACAATTTGAACAATTCTCCCAGCTGTCATTCTTATTCTAATTGCCCTTCCCTCC GTGGAAACGGTGTGAACGATCCTACCCGCTATCATTTTAATTCTCATTGCCCTACCATCA GTAGAAACTGTCTGAACTATCCTCCCAGCCATTATCTTAATTTTAATTGCCTTGCCTTCA ATCGAAATAGTGTGAACTATTATACCAGCTATTAGCCTCATCATAATTGCCCTTCCATCC 
TTACGAATTCTATACATAATAGATGAAATCAATAACCCATCTCTTACAGTAAAAACCATA CTACGCATCCTGTACCTTATAGACGAAATTAACGACCCTCACCTGACAATTAAAGCAATA CTCCAAATCCTCTACATAATAGACGAAATCGACGAACCTGATCTCACCCTAAAAGCCATC CTACGCATCCTTTACATAACAGACGAGGTCAACGATCCCTCCCTTACCATCAAATCAATT CTACGAATTCTATATCTTATAGACGAGATTAATGACCCCCACCTAACAATTAAGGCCATG CTACGCATTCTATATATAATAGACGAAATCAACAACCCCGTATTAACCGTTAAAACCATA CTACGAATTCTATACATAATAGACGAGATTAATAACCCAGTTCTAACAGTAAAAACTATA TTACGAATCCTCTACATAATGGACGAGATCAATAACCCTTCCTTGACCGTAAAAACTATA TTACGGATCCTTTACATAATAGACGAAGTCAATAACCCCTCCCTCACTGTAAAAACAATA CTTCGTATCCTATATTTAATAGATGAAGTTAATGATCCACACTTAACAATTAAAGCAATC GGACATCAGTGATACTGAAGCTATGAGTATACAGATTATGAGGACTTAAGCTTCGACTCC GGACACCAATGATACTGAAGTTACGAGTATACAGACTATGAAAATCTAGGATTCGACTCC GGACACCAATGATACTGAACCTATGAATACACAGACTTCAAGGACCTCTCATTTGACTCC GGCCACCAATGGTACTGAACCTACGAGTACACCGACTACGGCGGACTAATCTTCAACTCC GGGCACCAATGATACTGAAGCTACGAGTATACTGATTATGAAAACTTAAGTTTTGACTCC GGGCACCAATGATACTGAAGCTACGAATATACTGACTATGAAGACCTATGCTTTGATTCA GGACACCAATGATACTGAAGCTATGAATATACTGACTATGAAGACCTATGCTTTGACTCC GGACATCAGTGATACTGAAGCTATGAGTACACAGACTACGAAGACCTGAACTTTGACTCA GGTCACCAATGATATTGAAGCTATGAGTATACCGACTACGAAGACCTAAGCTTCGACTCC GGCCACCAATGATACTGAAGCTACGAATATACTAACTATGAGGATCTCTCATTTGACTCT TACATAATTCCAACATCAGAATTAAAGCCAGGGGAGCTACGACTATTAGAAGTCGATAAT TATATAGTACCAACCCAAGACCTTGCCCCCGGACAATTCCGACTTCTGGAAACAGACCAC TACATAACCCCAACAACAGACCTCCCCCTAGGCCACTTCCGCCTACTAGAAGTCGACCAT TACATACTTCCCCCATTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAAT TACATAATCCCCACCCAGGACCTAACCCCTGGACAATTCCGGCTACTAGAGACAGACCAC TATATAATCCCAACAAACGACCTAAAACCTGGTGAACTACGACTGCTAGAAGTTGATAAC TACATAATCCCAACCAATGACCTAAAACCAGGTGAACTTCGTCTATTAGAAGTTGATAAT TATATGATCCCCACACAAGAACTAAAGCCCGGAGAACTACGACTGCTAGAAGTAGACAAT TATATAATCCCAACATCAGACCTAAAGCCAGGAGAACTACGATTATTAGAAGTAGATAAC TATATAATTCCAACTAATGACCTTACCCCTGGACAATTCCGGCTGCTAGAAGTTGATAAT CGAGTTGTACTACCAATAGAAATAACAATCCGAATGTTAGTCTCCTCTGAAGACGTATTA CGAATAGTTGTTCCAATAGAATCCCCAGTCCGTGTCCTAGTATCTGCTGAAGACGTGCTA 
CGCATTGTAATCCCCATAGAATCCCCCATTCGAGTAATCATCACCGCTGATGACGTCCTC CGAGTAGTACTCCCGATTGAAGCCCCCATTCGTATAATAATTACATCACAAGACGTCTTG CGAATGGTTGTTCCCATAGAATCCCCTATTCGCATTCTTGTTTCCGCCGAAGATGTACTA CGAGTCGTTCTGCCAATAGAACTTCCAATCCGTATATTAATTTCATCTGAAGACGTCCTC CGGGTAGTCTTACCAATAGAACTTCCAATTCGTATACTAATCTCATCCGAAGACGTCCTG CGAGTAGTCCTCCCAATAGAAATAACAATCCGCATACTAATCTCATCAGAAGATGTACTC CGAGTTGTCTTACCTATAGAAATAACAATCCGAATATTAGTCTCATCAGAAGACGTACTC CGAATAGTAGTCCCAATAGAATCTCCAACCCGACTTTTAGTTACAGCCGAAGACGTCCTC CACTCATGAGCTGTGCCCTCTCTAGGACTAAAAACAGACGCAATCCCAGGCCGTCTAAAC CATTCTTGAGCTGTTCCATCCCTTGGCGTAAAAATGGACGCAGTCCCAGGACGACTAAAT CACTCATGAGCCGTACCCGCCCTCGGGGTAAAAACAGACGCAATCCCTGGACGACTAAAT CACTCATGAGCTGTCCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGACGTCTAAAC CACTCCTGGGCCCTTCCAGCCATGGGGGTAAAGATAGACGCGGTCCCAGGACGCCTTAAC CACTCATGAGCAGTCCCCTCCCTAGGACTTAAAACTGATGCCATCCCAGGCCGACTAAAT CACTCATGAGCCATCCCTTCACTAGGGTTAAAAACCGACGCAATCCCCGGCCGCCTAAAC CACTCATGAGCCGTACCGTCCCTAGGACTAAAAACTGATGCTATCCCAGGACGACTAAAC CACTCATGGGCCGTACCCTCCTTGGGCCTAAAAACAGATGCAATCCCAGGACGCCTAAAC CACTCGTGAGCTGTACCCTCCTTGGGTGTCAAAACAGATGCAATCCCAGGACGACTTCAT CAAACAACCCTTATATCGTCCCGTCCAGGCTTATATTACGGTCAATGCTCAGAAATTTGC CAAGCCGCCTTTATTGCCTCACGCCCAGGGGTCTTTTACGGACAATGCTCTGAAATTTGT CAAACCTCCTTCATCACCACTCGACCAGGAGTGTTTTACGGACAATGCTCAGAAATCTGC CAAACCACTTTCACCGCTACACGACCGGGGGTATACTACGGTCAATGCTCTGAAATCTGT CAAACCGCCTTTATTGCCTCCCGCCCCGGGGTATTCTATGGGCAATGCTCAGAAATCTGT CAAGCAACAGTAACATCAAACCGACCAGGGTTATTCTATGGCCAATGCTCTGAAATTTGT CAAGCTACAGTCACATCAAACCGACCAGGTCTATTCTATGGCCAATGCTCTGAAATTTGC CAAACAACCCTAATAACCATACGACCAGGACTGTACTACGGTCAATGCTCAGAAATCTGT CAAACAACCTTAATATCAACACGACCAGGCCTATTTTATGGACAATGCTCAGAGATCTGC CAAACATCATTTATTGCTACTCGTCCGGGAGTATTTTACGGACAATGTTCAGAAATTTGC GGGTCAAACCACAGTTTCATACCCATTGTCCTTGAGTTAGTCCCACTAAAGTACTTTGAA GGAGCTAATCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCTCTCGAACACTTCGAA GGAGCTAACCACAGCTACATACCCATTGTAGTAGAGTCTACCCCCCTAAAACACTTTGAA GGAGCAAACCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCCCTAAAAATCTTTGAA 
GGAGCAAACCACAGCTTTATACCCATCGTAGTAGAAGCGGTCCCACTATCTCACTTCGAA GGATCTAACCATAGCTTTATGCCCATTGTCCTAGAAATGGTTCCACTAAAATATTTCGAA GGCTCAAATCACAGCTTCATACCCATTGTACTAGAAATAGTGCCTCTAAAATATTTCGAA GGTTCAAACCACAGCTTCATACCTATTGTCCTCGAATTGGTCCCACTATCCCACTTCGAG GGCTCAAACCACAGTTTCATACCAATTGTCCTAGAACTAGTACCCCTAGAAGTCTTTGAA GGAGCAAACCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCGCTAACCGACTTTGAA AAATGATCTGCGTCAATATTA---------------------TAA AACTGATCCTCATTAATACTAGAAGACGCCTCGCTAGGAAGCTAA GCCTGATCCTCACTA------------------CTGTCATCTTAA ATA---------------------GGGCCCGTATTTACCCTATAG AACTGGTCCACCCTTATACTAAAAGACGCCTCACTAGGAAGCTAA AACTGATCTGCTTCAATAATT---------------------TAA AACTGATCAGCTTCTATAATT---------------------TAA AAATGATCTACCTCAATGCTT---------------------TAA AAATGATCTGTATCAATACTA---------------------TAA AACTGATCTTCATCAATACTA---GAAGCATCACTA------AGA """) self.space_interleaved = StringIO(""" 5 176 I cox2_leita MAFILSFWMI FLLDSVIVLL SFVCFVCVWI CALLFSTVLL VSKLNNIYCT cox2_crifa MAFILSFWMI FLIDAVIVLL SFVCFVCIWI CSLFFSSFLL VSKINNVYCT cox2_bsalt MSFIISFWML FLIDSLIVLL SGAIFVCIWI CSLFFLCILF ICKLDYIFCS cox2_trybb MSFILTFWMI FLMDSIIVLI SFSIFLSVWI CALIIATVLT VTKINNIYCT cox2_tborr MLFFINQLLL LLVDTFVILE IFSLFVCVFI IVMYILFINY NIFLKNINVY WDFTASKFID VYWFTIGGMF SLGLLLRLCL LLYFGHLNFV SFDLCKVVGF WDFTASKFID AYWFTIGGMF VLCLLLRLCL LLYFGCLNFV SFDLCKVVGF WDFISAKFID LYWFTLGCLF IVCLLIRLCL LLYFSCLNFV CFDLCKCIGF WDFISSKFID TYWFVLGMMF ILCLLLRLCL LLYFSCINFV SFDLCKVIGF LDFIGSKYLD LYWFLIGIFF VIVLLIRLCL LLYYSWISLL IFDLCKIMGF QWYWVYFIFG ETTIFSNLIL ESDYMIGDLR LLQCNHVLTL LSLVIYKLWL QWYWVYFIFG ETTIFSNLIL ESDYLIGDLR LLQCNHVLTL LSLVIYKLWL QWYWVYFIFG ETTIFSNLIL ESDYLIGDLR LLQCNHVLTL LSLVIYKVWL QWYWVYFLFG ETTIFSNLIL ESDYLIGDLR ILQCNHVLTL LSLVIYKLWV QWYWIFFVFK ENVIFSNLLI ESDYWIGDLR LLQCNNTFNL ICLVVYKIWV SAVDVIHSFA ISSLGVKVEN LVAVMK SAVDVIHSFA VSSLGIKVDC IPGRCN SAIDVIHSFT LANLGIKVD? 
?PGRCN
SAVDVIHSFT ISSLGIKVEN PGRCNE
TSIDVIHSFT ISTLGIKIDC IPGRCN
""")
        self.interleaved_little = StringIO(""" 6 39 I
Archaeopt CGATGCTTAC CGCCGATGCT
HesperorniCGTTACTCGT TGTCGTTACT
BaluchitheTAATGTTAAT TGTTAATGTT
B. virginiTAATGTTCGT TGTTAATGTT
BrontosaurCAAAACCCAT CATCAAAACC
B.subtilisGGCAGCCAAT CACGGCAGCC

TACCGCCGAT GCTTACCGC
CGTTGTCGTT ACTCGTTGT
AATTGTTAAT GTTAATTGT
CGTTGTTAAT GTTCGTTGT
CATCATCAAA ACCCATCAT
AATCACGGCA GCCAATCAC
""")
        self.empty = []
        self.noninterleaved_little = StringIO(""" 6 20
Archaeopt CGATGCTTAC CGCCGATGCT
HesperorniCGTTACTCGT TGTCGTTACT
BaluchitheTAATGTTAAT TGTTAATGTT
B. virginiTAATGTTCGT TGTTAATGTT
BrontosaurCAAAACCCAT CATCAAAACC
B.subtilisGGCAGCCAAT CACGGCAGCC
""")
        self.noninterleaved_big = StringIO("""10 297
Rhesus tgtggcacaaatactcatgccagctcattacagcatgagaac---agtttgttactcact
aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggcttg
gcaaggagccaacataacagatggactggaagtaaggaaacatgtaatgataggcagact
cccagcacagagaaaaaggtagatctgaatgctaatgccctgtatgagagaaaagaatgg
aataagcaaaaactgccatgctctgagaatcctagagacactgaagatgttccttgg
Manatee tgtggcacaaatactcatgccagctcattacagcatgagaatagcagtttattactcact
aaagacagaatgaatgtagaaaaggctgaattctgtcataaaagcaaacagcctggctta
acaaggagccagcagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact
cctagcacagagaaaaaggtagatatgaatgctaatccattgtatgagagaaaagaagtg
aataagcagaaacctccatgctccgagagtgttagagatacacaagatattccttgg
Pig tgtggcacagatactcatgccagctcgttacagcatgagaacagcagtttattactcact
aaagacagaatgaatgtagaaaaggctgaattttgtaataaaagcaagcagcctgtctta
gcaaagagccaacagagcagatgggctgaaagtaagggcacatgtaatgataggcagact
cctaacacagagaaaaaggtagttctgaatactgatctcctgtatgggagaaacgaactg
aataagcagaaacctgcgtgctctgacagtcctagagattcccaagatgttccttgg
""")

class MinimalPhylipParserTests(PhylipGenericTest):
    """Tests of MinimalPhylipParser: returns (label, seq) tuples."""
    def test_empty(self):
        """MinimalPhylipParser should return empty list from 'file' w/o labels"""
        self.assertEqual(list(MinimalPhylipParser(self.empty)), [])

    def test_minimal_parser(self):
        """MinimalPhylipParser should read single record as (label, seq) tuple"""
        seqs = list(MinimalPhylipParser(self.big_interleaved))
        self.assertEqual(len(seqs), 10)
        label, seq = seqs[-1]
        self.assertEqual(label, 'Frog')
        self.assertEqual(seq, \
            'ATGGCACACCCATCACAATTAGGTTTTCAAGACGCAGCCTCTCCAATTATAGAAGAATTACTTCACTTCCACGACCATACCCTCATAGCCGTTTTTCTTATTAGTACGCTAGTTCTTTACATTATTACTATTATAATAACTACTAAACTAACTAATACAAACCTAATGGACGCACAAGAGATCGAAATAGTGTGAACTATTATACCAGCTATTAGCCTCATCATAATTGCCCTTCCATCCCTTCGTATCCTATATTTAATAGATGAAGTTAATGATCCACACTTAACAATTAAAGCAATCGGCCACCAATGATACTGAAGCTACGAATATACTAACTATGAGGATCTCTCATTTGACTCTTATATAATTCCAACTAATGACCTTACCCCTGGACAATTCCGGCTGCTAGAAGTTGATAATCGAATAGTAGTCCCAATAGAATCTCCAACCCGACTTTTAGTTACAGCCGAAGACGTCCTCCACTCGTGAGCTGTACCCTCCTTGGGTGTCAAAACAGATGCAATCCCAGGACGACTTCATCAAACATCATTTATTGCTACTCGTCCGGGAGTATTTTACGGACAATGTTCAGAAATTTGCGGAGCAAACCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCGCTAACCGACTTTGAAAACTGATCTTCATCAATACTA---GAAGCATCACTA------AGA')
        self.assertEqual(seqs[0][0], 'Cow')

        seqs = list(MinimalPhylipParser(self.space_interleaved))
        self.assertEqual(len(seqs), 5)
        self.assertEqual(seqs[0][0], 'cox2_leita')
        self.assertEqual(seqs[-1][0], 'cox2_tborr')
        self.assertEqual(len(seqs[0][1]), 176)
        self.assertEqual(len(seqs[-1][1]), 176)

        seqs = list(MinimalPhylipParser(self.interleaved_little))
        self.assertEqual(len(seqs), 6)
        self.assertEqual(seqs[1][0], 'Hesperorni')
        self.assertEqual(seqs[-1][0], 'B.subtilis')
        self.assertEqual(seqs[-1][1],
            'GGCAGCCAATCACGGCAGCCAATCACGGCAGCCAATCAC')

        seqs = list(MinimalPhylipParser(self.noninterleaved_little))
        self.assertEqual(len(seqs), 6)
        self.assertEqual(seqs[0][0], 'Archaeopt')
        self.assertEqual(seqs[-1][0], 'B.subtilis')
        self.assertEqual(seqs[-1][-1], 'GGCAGCCAATCACGGCAGCC')

        seqs = list(MinimalPhylipParser(self.noninterleaved_big))
        self.assertEqual(len(seqs), 3)
        self.assertEqual(seqs[0][0], 'Rhesus')
        self.assertEqual(seqs[-1][0], 'Pig')
        self.assertEqual(seqs[-1][1],
            'tgtggcacagatactcatgccagctcgttacagcatgagaacagcagtttattactcactaaagacagaatgaatgtagaaaaggctgaattttgtaataaaagcaagcagcctgtcttagcaaagagccaacagagcagatgggctgaaagtaagggcacatgtaatgataggcagactcctaacacagagaaaaaggtagttctgaatactgatctcctgtatgggagaaacgaactgaataagcagaaacctgcgtgctctgacagtcctagagattcccaagatgttccttgg')

    def test_get_align(self):
        """get_align_for_phylip should return Alignment object for phylip files"""
        align = get_align_for_phylip(self.big_interleaved)

        align = get_align_for_phylip(self.interleaved_little)
        s = str(align)
        self.assertEqual(s, '''>Archaeopt
CGATGCTTACCGCCGATGCTTACCGCCGATGCTTACCGC
>Hesperorni
CGTTACTCGTTGTCGTTACTCGTTGTCGTTACTCGTTGT
>Baluchithe
TAATGTTAATTGTTAATGTTAATTGTTAATGTTAATTGT
>B. virgini
TAATGTTCGTTGTTAATGTTCGTTGTTAATGTTCGTTGT
>Brontosaur
CAAAACCCATCATCAAAACCCATCATCAAAACCCATCAT
>B.subtilis
GGCAGCCAATCACGGCAGCCAATCACGGCAGCCAATCAC
''')

        align = get_align_for_phylip(self.noninterleaved_little)
        s = str(align)
        self.assertEqual(s, '''>Archaeopt
CGATGCTTACCGCCGATGCT
>Hesperorni
CGTTACTCGTTGTCGTTACT
>Baluchithe
TAATGTTAATTGTTAATGTT
>B. virgini
TAATGTTCGTTGTTAATGTT
>Brontosaur
CAAAACCCATCATCAAAACC
>B.subtilis
GGCAGCCAATCACGGCAGCC
''')

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_pknotsrg.py

#!/usr/bin/env python

from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.pknotsrg import pknotsrg_parser

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class PknotsrgParserTest(TestCase):
    """Provides tests for pknotsRG RNA secondary structure format parsers"""

    def setUp(self):
        """Setup function"""
        #output
        self.pknotsrg_out = PKNOTSRG
        #expected
        self.pknotsrg_exp = [['UGCAUAAUAGCUCC',[(0,8),(3,11),(5,13)],-22.40]]

    def test_pknotsrg_output(self):
        """Test for pknotsrg format parser"""
        obs = pknotsrg_parser(self.pknotsrg_out)
        self.assertEqual(obs,self.pknotsrg_exp)

PKNOTSRG = ['UGCAUAAUAGCUCC\n', '(..{.[..)..}.] (-22.40)\n']

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_psl.py

#!/usr/bin/env python
"""Unit tests for the PSL parser.
Compatible with blat v.34
"""
from cogent.parse.psl import make_header, MinimalPslParser, PslToTable
from cogent.util.unit_test import TestCase, main
from cogent import LoadTable

__author__ = "Gavin Huttley, Anuj Pahwa"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight","Peter Maxwell", "Gavin Huttley", "Anuj Pahwa"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Development"

fname = 'data/test.psl'

class Test(TestCase):
    def test_header(self):
        """should return correct header"""
        expect = ['match', 'mis-match', 'rep. match', "N's", 'Q gap count',
            'Q gap bases', 'T gap count', 'T gap bases', 'strand', 'Q name',
            'Q size', 'Q start', 'Q end', 'T name', 'T size', 'T start',
            'T end', 'block count', 'blockSizes', 'qStarts', 'tStarts']
        parser = MinimalPslParser(fname)
        version = parser.next()
        header = parser.next()
        self.assertEqual(header, expect)

    def test_psl_to_table(self):
        table = PslToTable(fname)

    def test_getting_seq_coords(self):
        """get correct sequence coordinates to produce a trimmed sequence"""
        table = PslToTable(fname)
        for row in table:
            query_name = row["Q name"]
            query_strand = row["strand"]
            q_start = row["Q start"]

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_parse/test_rdb.py

#!/usr/bin/env python
#test_rdb.py
"""Unit test for RDB Parser
"""
from cogent.util.unit_test import TestCase, main
from cogent.parse.rdb import RdbParser, MinimalRdbParser,is_seq_label,\
    InfoMaker, create_acceptable_sequence
from cogent.core.sequence import Sequence, DnaSequence, RnaSequence
from cogent.core.info import Info
from cogent.parse.record import RecordError

__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"

class RdbTests(TestCase):
    """Tests for top-level functions in Rdb.py"""
    def test_is_seq_label(self):
        """is_seq_label should return True if a line starts with 'seq:'"""
        seq = 'seq:this is a sequence line'
        not_seq = 'this is not a sequence line'
        still_not_seq = 'this seq: is still not a sequence line'
        self.assertEqual(is_seq_label(seq),True)
        self.assertEqual(is_seq_label(not_seq),False)
        self.assertEqual(is_seq_label(still_not_seq),False)

    def test_create_acceptable_sequence(self):
        """create_acceptable_sequence: should handle 'o' and sec. struct"""
        f = create_acceptable_sequence
        # should keep any char accepted by RNA.Alphabet.DegenGapped
        s = "UCAG---NRYBDHKMNSRWVY?"
        self.assertEqual(f(s),s)
        # should replace 'o' by '?'
        s = "UCAG-oo-ACGU"
        self.assertEqual(f(s), "UCAG-??-ACGU")
        # should strip out secondary info
        s = "{UC^AG-[oo]-A(CG)U}"
        self.assertEqual(f(s), "UCAG-??-ACGU")
        # should leave other chars untouched
        s = "XYZ1234"
        self.assertEqual(f(s), "XYZ1234")

class InfoMakerTests(TestCase):
    """Tests for the Constructor InfoMaker. Should return an Info object"""
    def test_empty(self):
        """InfoMaker: should return empty Info from empty header"""
        empty_header = []
        obs = InfoMaker(empty_header)
        exp = Info()
        self.assertEqual(obs,exp)

    def test_full(self):
        """InfoMaker should return Info object with name, value pairs"""
        test_header = ['acc: X3402','abc:1','mty: ssu','seq: Mit. X3402',\
            '','nonsense',':no_name']
        obs = InfoMaker(test_header)
        exp = Info()
        exp.rRNA = 'X3402'
        exp.abc = '1'
        exp.Species = 'Mit. X3402'
        exp.Gene = 'ssu'
        self.assertEqual(obs,exp)

class GenericRdbTest(TestCase):
    """SetUp data for all Rdb parsers"""
    def setUp(self):
        self.empty = []
        self.labels = 'mty:ssu\nseq:bac\n//\nttl:joe\nseq:mit\n//'.split('\n')
        self.nolabels = 'ACGUAGCUAGCUAC\nGCUGCAUCG\nAUCG\n//'.split('\n')
        self.oneseq = 'seq:H.Sapiens\nAGUCAUCUAGAUHCAUHC\n//'.split('\n')
        self.multiline = 'seq:H.Sapiens\nAGUCAUUAG\nAUHCAUHC\n//'.split('\n')
        self.threeseq =\
            'seq:bac\nAGU\n//\nseq:mit\nACU\n//\nseq:pla\nAAA\n//'.split('\n')
        self.twogood =\
            'seq:bac\n//\nseq:mit\nACU\n//\nseq:pla\nAAA\n//'.split('\n')
        self.oneX =\
            'seq:bac\nX\n//\nseq:mit\nACT\n//\nseq:pla\nAAA\n//'.split('\n')
        self.strange = 'seq:bac\nACGUXxAaKkoo---*\n//'.split('\n')

class MinimalRdbParserTests(GenericRdbTest):
    """Tests of MinimalRdbParser: returns (headerLines,sequence) tuples"""
    def test_empty(self):
        """MinimalRdbParser should return empty list from file w/o seqs"""
        self.assertEqual(list(MinimalRdbParser(self.empty)),[])

    def test_only_labels(self):
        """MinimalRdbParser should return empty list from file w/o seqs"""
        #should fail if strict (the default)
        self.assertRaises(RecordError, list,
            MinimalRdbParser(self.labels,strict=True))
        #if not strict, should skip the records
        self.assertEqual(list(MinimalRdbParser(self.labels, strict=False)), [])

    def test_only_sequences(self):
        """MinimalRdbParser should return empty list from file w/o labels"""
        #should fail if strict (the default)
        self.assertRaises(RecordError, list,
            MinimalRdbParser(self.nolabels,strict=True))
        #if not strict, should skip the records
        self.assertEqual(list(MinimalRdbParser(self.nolabels, strict=False)),
            [])

    def test_single(self):
        """MinimalRdbParser should read single record as (header,seq) tuple"""
        res = list(MinimalRdbParser(self.oneseq))
        self.assertEqual(len(res),1)
        first = res[0]
        self.assertEqual(first,
(['seq:H.Sapiens'], 'AGUCAUCUAGAUHCAUHC'))

        res = list(MinimalRdbParser(self.multiline))
        self.assertEqual(len(res),1)
        first = res[0]
        self.assertEqual(first, (['seq:H.Sapiens'], 'AGUCAUUAGAUHCAUHC'))

    def test_multiple(self):
        """MinimalRdbParser should read multiple records correctly"""
        res = list(MinimalRdbParser(self.threeseq))
        self.assertEqual(len(res), 3)
        a, b, c = res
        self.assertEqual(a, (['seq:bac'], 'AGU'))
        self.assertEqual(b, (['seq:mit'], 'ACU'))
        self.assertEqual(c, (['seq:pla'], 'AAA'))

    def test_multiple_bad(self):
        """MinimalRdbParser should complain or skip bad records"""
        self.assertRaises(RecordError, list, MinimalRdbParser(self.twogood))
        f = list(MinimalRdbParser(self.twogood, strict=False))
        self.assertEqual(len(f), 2)
        a, b = f
        self.assertEqual(a, (['seq:mit'], 'ACU'))
        self.assertEqual(b, (['seq:pla'], 'AAA'))

    def test_strange(self):
        """MRP: handle strange char. according to constr. and strip off '*'"""
        f = list(MinimalRdbParser(self.strange))
        obs = f[0]
        exp = (['seq:bac'],'ACGUXxAaKkoo---')
        self.assertEqual(obs,exp)

class RdbParserTests(GenericRdbTest):
    """Tests for the RdbParser.

    Should return Sequence objects"""
    def test_empty(self):
        """RdbParser should return empty list from 'file' w/o labels"""
        self.assertEqual(list(RdbParser(self.empty)), [])
        self.assertEqual(list(RdbParser(self.nolabels, strict=False)), [])
        self.assertRaises(RecordError, list, RdbParser(self.nolabels))

    def test_only_labels(self):
        """RdbParser should return empty list from file w/o seqs"""
        #should fail if strict (the default)
        self.assertRaises(RecordError, list, RdbParser(self.labels,strict=True))
        #if not strict, should skip the records
        self.assertEqual(list(RdbParser(self.labels, strict=False)), [])

    def test_only_sequences(self):
        """RdbParser should return empty list from file w/o labels"""
        #should fail if strict (the default)
        self.assertRaises(RecordError, list,
            RdbParser(self.nolabels,strict=True))
        #if not strict, should skip the records
        self.assertEqual(list(RdbParser(self.nolabels, strict=False)), [])

    def test_single(self):
        """RdbParser should read single record as (header,seq) tuple"""
        res = list(RdbParser(self.oneseq))
        self.assertEqual(len(res),1)
        first = res[0]
        self.assertEqual(first, Sequence('AGUCAUCUAGAUHCAUHC'))
        self.assertEqual(first.Info, Info({'Species':'H.Sapiens',\
            'OriginalSeq':'AGUCAUCUAGAUHCAUHC'}))
        res = list(RdbParser(self.multiline))
        self.assertEqual(len(res),1)
        first = res[0]
        self.assertEqual(first, Sequence('AGUCAUUAGAUHCAUHC'))
        self.assertEqual(first.Info, Info({'Species':'H.Sapiens',\
            'OriginalSeq':'AGUCAUUAGAUHCAUHC'}))

    def test_single_constructor(self):
        """RdbParser should use constructors if supplied"""
        to_dna = lambda x, Info: DnaSequence(str(x).replace('U','T'), \
            Info=Info)
        f = list(RdbParser(self.oneseq, to_dna))
        self.assertEqual(len(f), 1)
        a = f[0]
        self.assertEqual(a, 'AGTCATCTAGATHCATHC')
        self.assertEqual(a.Info, Info({'Species':'H.Sapiens',\
            'OriginalSeq':'AGUCAUCUAGAUHCAUHC'}))

        def alternativeConstr(header_lines):
            info = Info()
            for line in header_lines:
                all = line.strip().split(':',1)
                #strip out empty lines, lines without name, lines without colon
                if not all[0] or len(all) != 2:
                    continue
                name = all[0].upper()
                value = all[1].strip().upper()
                info[name] = value
            return info

        f = list(RdbParser(self.oneseq, to_dna, alternativeConstr))
        self.assertEqual(len(f), 1)
        a = f[0]
        self.assertEqual(a, 'AGTCATCTAGATHCATHC')
        exp_info = Info({'OriginalSeq':'AGUCAUCUAGAUHCAUHC',\
            'Refs':{}, 'SEQ':'H.SAPIENS'})
        self.assertEqual(a.Info, Info({'OriginalSeq':'AGUCAUCUAGAUHCAUHC',\
            'Refs':{}, 'SEQ':'H.SAPIENS'}))

    def test_multiple_constructor_bad(self):
        """RdbParser should complain or skip bad records w/ constructor"""
        def dnastrict(x, **kwargs):
            try:
                return DnaSequence(x, **kwargs)
            except Exception:
                raise RecordError, "Could not convert sequence"
        self.assertRaises(RecordError, list, RdbParser(self.oneX,dnastrict))
        f = list(RdbParser(self.oneX, dnastrict, strict=False))
        self.assertEqual(len(f), 2)
        a, b = f
        self.assertEqual(a, 'ACT')
        self.assertEqual(a.Info,Info({'Species':'mit','OriginalSeq':'ACT'}))
        self.assertEqual(b, 'AAA')
        self.assertEqual(b.Info,Info({'Species':'pla','OriginalSeq':'AAA'}))

    def test_full(self):
        """RdbParser: full data, valid and invalid"""
        # when only good record, should work independent of strict
        r1 = RnaSequence("-??GG-UGAA--CGCU---ACGU-N???---",\
            Info=Info({'Species': "unidentified Thermus OPB AF027020",\
            'Refs':{'rRNA':['AF027020']},\
            'OriginalSeq':'-o[oGG-U{G}AA--C^GC]U---ACGU-Nooo---'}))
        r2 = RnaSequence("---CGAUCG--UAUACG-N???-",\
            Info=Info({'Species':'Thermus silvanus X84211',\
            'Refs':{'rRNA':['X84211']},\
            'OriginalSeq':'---CGAU[C(G){--UA}U]ACG-Nooo-'}))

        obs = list(RdbParser(RDB_LINES_ONLY_GOOD.split('\n'), strict=True))
        self.assertEqual(len(obs), 2)
        self.assertEqual(obs[0], r1)
        self.assertEqual(str(obs[0]), str(r1))
        self.assertEqual(obs[0].Info, r1.Info)
        self.assertEqual(obs[1], r2)
        self.assertEqual(str(obs[1]), str(r2))
        self.assertEqual(obs[1].Info, r2.Info)

        obs = list(RdbParser(RDB_LINES_ONLY_GOOD.split('\n'), strict=False))
        self.assertEqual(len(obs), 2)
        self.assertEqual(obs[0], r1)
        self.assertEqual(str(obs[0]), str(r1))
        self.assertEqual(obs[0].Info, r1.Info)

        # when strict, should raise error on invalid record
        f = RdbParser(RDB_LINES_GOOD_BAD.split('\n'), strict=True)
        self.assertRaises(RecordError, list, f)
        # when not strict, the malformed record is skipped
        obs = list(RdbParser(RDB_LINES_GOOD_BAD.split('\n'), strict=False))
        self.assertEqual(len(obs), 2)
        self.assertEqual(obs[0], r1)
        self.assertEqual(str(obs[0]), str(r1))
        self.assertEqual(obs[0].Info, r1.Info)
        self.assertEqual(obs[1], r2)
        self.assertEqual(str(obs[1]), str(r2))
        self.assertEqual(obs[1].Info, r2.Info)

RDB_LINES_ONLY_GOOD=\
"""acc:AF027020
seq: unidentified Thermus OPB AF027020
-o[oGG-U{G}AA--C^GC]U---ACGU-Nooo---
//
acc:X84211
seq: Thermus silvanus X84211
---CGAU[C(G){--UA}U]ACG-Nooo-
//
"""

RDB_LINES_GOOD_BAD=\
"""acc:AF027020
seq: unidentified Thermus OPB AF027020
-o[oGG-U{G}AA--C^GC]U---ACGU-Nooo---
//
acc:ABC123
seq: E. coli
---ACGU-Nooo-RYXQ-
//
acc:X84211
seq: Thermus silvanus X84211
---CGAU[C(G){--UA}U]ACG-Nooo-
//
"""

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_parse/test_record.py

#!/usr/bin/env python
"""Unit tests for parser support libraries dealing with records.
"""
from cogent.parse.record import FieldError, RecordError, Grouper, \
    DelimitedSplitter, GenericRecord, MappedRecord, \
    TypeSetter, list_adder, dict_adder, \
    LineOrientedConstructor, int_setter, str_setter, bool_setter, \
    string_and_strip, FieldWrapper, StrictFieldWrapper, raise_unknown_field, \
    FieldMorpher, list_extender
from cogent.util.unit_test import TestCase, main

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

class recordsTests(TestCase):
    """Tests of top-level functionality in records."""
    def test_string_and_strip(self):
        """string_and_strip should convert all items to strings and strip them"""
        self.assertEqual(string_and_strip(), [])
        self.assertEqual(string_and_strip('\t', ' ', '\n\t'), ['','',''])
        self.assertEqual(string_and_strip('\ta\tb', 3, ' cde e', None), \
            ['a\tb', '3', 'cde e', 'None'])

    def test_raise_unknown_field(self):
        """raise_unknown_field should always raise FieldError"""
        self.assertRaises(FieldError, raise_unknown_field, 'xyz', 123)

class GrouperTests(TestCase):
    """Tests of the Grouper class."""
    def test_call(self):
        """Grouper should return lists containing correct number of groups"""
        empty = []
        s3 = 'abc'
        s10 = range(10)
        g1 = Grouper(1)
        g2 = Grouper(2)
        g5 = Grouper(5)
        self.assertEqual(list(g1(empty)), [])
        self.assertEqual(list(g2(empty)), [])
        self.assertEqual(list(g5(empty)), [])
        self.assertEqual(list(g1(s3)), [['a'], ['b'], ['c']])
        self.assertEqual(list(g2(s3)), [['a','b'], ['c']])
        self.assertEqual(list(g5(s3)), [['a','b','c']])
        self.assertEqual(list(g1(s10)), [[i] for i in range(10)])
        self.assertEqual(list(g2(s10)), [[0,1],[2,3],[4,5],[6,7],[8,9]])
        self.assertEqual(list(g5(s10)), [[0,1,2,3,4],[5,6,7,8,9]])

    def test_call_bad(self):
        """Grouper call should raise ValueError if NumItems is not an int"""
        g_none = Grouper(None)
        g_neg = Grouper(-1)
g_zero = Grouper(0) g_alpha = Grouper('abc') for g in (g_none, g_neg, g_zero, g_alpha): iterator = g('abcd') self.assertRaises(ValueError, list, iterator) class DelimitedSplitterTests(TestCase): """Tests of the DelimitedSplitter factory function.""" def test_parsers(self): """DelimitedSplitter should return function with correct behavior""" empty = DelimitedSplitter() space = DelimitedSplitter(None) semicolon = DelimitedSplitter(';') twosplits = DelimitedSplitter(';', 2) allsplits = DelimitedSplitter(';', None) lastone = DelimitedSplitter(';', -1) lasttwo = DelimitedSplitter(';', -2) self.assertEqual(empty('a b c'), ['a', 'b c']) self.assertEqual(empty('abc'), ['abc']) self.assertEqual(empty(' '), []) self.assertEqual(empty('a b c'), space('a b c')) self.assertEqual(semicolon(' a ; b ; c d'), ['a','b ; c d']) self.assertEqual(twosplits(' a ; b ; c d'), ['a','b', 'c d']) self.assertEqual(allsplits(' a ; b ; c;;d;e ;'),\ ['a','b','c','','d','e','']) self.assertEqual(lastone(' a ; b ; c;;d;e ;'),\ ['a ; b ; c;;d;e','']) self.assertEqual(lasttwo(' a ; b ; c;;d;e ;'),\ ['a ; b ; c;;d','e','']) self.assertEqual(lasttwo(''), []) self.assertEqual(lasttwo('x'), ['x']) self.assertEqual(lasttwo('x;'), ['x', '']) class GenericRecordTests(TestCase): """Tests of the GenericRecord class""" class gr(GenericRecord): Required = {'a':'x', 'b':[], 'c':{}} def test_init(self): """GenericRecord init should work OK empty or with data""" self.assertEqual(GenericRecord(), {}) self.assertEqual(GenericRecord({'a':1}), {'a':1}) assert isinstance(GenericRecord(), GenericRecord) def test_init_subclass(self): """GenericRecord subclass init should include required data""" self.assertEqual(self.gr(), {'a':'x', 'b':[], 'c':{}}) self.assertEqual(self.gr({'a':[]}), {'a':[], 'b':[],'c':{}}) assert isinstance(self.gr(), self.gr) assert isinstance(self.gr(), GenericRecord) def test_delitem(self): """GenericRecord delitem should fail if item required""" g = self.gr() g['d'] = 3 self.assertEqual(g, 
{'a':'x','b':[],'c':{},'d':3}) del g['d'] self.assertEqual(g, {'a':'x','b':[],'c':{}}) self.assertRaises(AttributeError, g.__delitem__, 'a') g['c'][3] = 4 self.assertEqual(g['c'], {3:4}) def test_copy(self): """GenericRecord copy should include attributes and set correct class""" g = self.gr() g['a'] = 'abc' g.X = 'y' h = g.copy() self.assertEqual(g, h) assert isinstance(h, self.gr) self.assertEqual(h.X, 'y') self.assertEqual(h, {'a':'abc', 'b':[], 'c':{}}) class MappedRecordTests(TestCase): """Tests of the MappedRecord class""" def setUp(self): """Define a few standard MappedRecords""" self.empty = MappedRecord() self.single = MappedRecord({'a':3}) self.several = MappedRecord(a=4,b=5,c='a',d=[1,2,3]) def test_init_empty(self): """MappedRecord empty init should work OK""" g = MappedRecord() self.assertEqual(g, {}) def test_init_data(self): """MappedRecord should work like normal dict init""" exp = {'a':3, 'b':4} self.assertEqual(MappedRecord({'a':3, 'b':4}), exp) self.assertEqual(MappedRecord(a=3, b=4), exp) self.assertEqual(MappedRecord([['a',3],['b',4]]), exp) def test_init_subclass(self): """MappedRecord subclasses should behave as expected""" class rec(MappedRecord): Required = {'a':{}, 'b':'xyz', 'c':3} Aliases = {'B':'b'} r = rec() self.assertEqual(r, {'a':{}, 'b':'xyz', 'c':3}) #test that subclassing is correct s = r.copy() assert isinstance(s, rec) #test Aliases s.B = 0 self.assertEqual(s, {'a':{}, 'b':0, 'c':3}) #test Required try: del s.B except AttributeError: pass else: raise AssertionError, "Subclass failed to catch requirement" def test_getattr(self): """MappedRecord getattr should look in dict after real attrs""" s = self.several self.assertEqual(s.Aliases, {}) self.assertEqual(s.a, 4) self.assertEqual(s.d, [1,2,3]) for key in s: self.assertEqual(getattr(s, key), s[key]) assert 'xyz' not in s self.assertEqual(s.xyz, None) self.assertEqual(s['xyz'], None) s.Aliases = {'xyz':'a'} self.assertEqual(s['xyz'], 4) def test_setattr(self): """MappedRecord 
setattr should add to dict""" s = self.single #check that we haven't screwed up normal attribute setting assert 'Aliases' not in s s.Aliases = {'x':'y'} assert 'Aliases' not in s self.assertEqual(s.Aliases, {'x':'y'}) s.x = 5 assert 'x' in s self.assertEqual(s['x'], 5) self.assertEqual(s.x, 5) s.Aliases = {'XYZ':'b'} s.XYZ = 3 self.assertEqual(s.b, 3) def test_delattr(self): """MappedRecord delattr should work for 'normal' and other attributes""" s = self.single s.__dict__['x'] = 'y' assert 'x' not in s self.assertEqual(s.x, 'y') del s.x self.assertEqual(s.x, None) self.assertEqual(s, {'a':3}) #try it for an internal attribute: check it doesn't delete anything else s.b = 4 self.assertEqual(s, {'a':3, 'b':4}) del s.a self.assertEqual(s, {'b':4}) del s.abc self.assertEqual(s, {'b':4}) s.Required = {'b':True} try: del s.b except AttributeError: pass else: raise AssertionError, "Allowed deletion of required attribute" s.a = 3 self.assertEqual(s.a, 3) s.Aliases = {'xyz':'a'} del s.xyz self.assertEqual(s.a, None) def test_getitem(self): """MappedRecord getitem should work only for keys, not attributes""" s = self.single self.assertEqual(s['Required'], None) self.assertEqual(s['a'], 3) self.assertEqual(s['xyz'], None) self.assertEqual(s[list('abc')], None) s.Aliases = {'xyz':'a'} self.assertEqual(s['xyz'], 3) def test_setitem(self): """MappedRecord setitem should work only for keys, not attributes""" s = self.single s['Required'] = None self.assertEqual(s, {'a':3, 'Required':None}) self.assertEqual(s.Required, {}) self.assertNotEqual(s.Required, None) s['c'] = 5 self.assertEqual(s, {'a':3, 'c':5, 'Required':None}) #still not allowed unhashable objects as keys self.assertRaises(TypeError, s.__setitem__, range(3)) s.Aliases = {'C':'c'} s['C'] = 3 self.assertEqual(s, {'a':3, 'c':3, 'Required':None}) def test_delitem(self): """MappedRecord delitem should only work for keys, not attributes""" s = self.single del s['Required'] self.assertEqual(s.Required, {}) s.Required =
{'a':True} try: del s['a'] except AttributeError: pass else: raise AssertionError, "Allowed deletion of required item" s.Aliases = {'B':'b'} s.b = 5 self.assertEqual(s.b, 5) del s.B self.assertEqual(s.b, None) def test_contains(self): """MappedRecord contains should use aliases, but not apply to attrs""" s = self.single assert 'a' in s assert 'b' not in s s.b = 5 assert 'b' in s assert 'Required' not in s assert 'A' not in s s.Aliases = {'A':'a'} assert 'A' in s def test_get(self): """MappedRecord get should be typesafe against unhashables""" s = self.single self.assertEqual(s.get(1, 6), 6) self.assertEqual(s.get('a', 'xyz'), 3) self.assertEqual(s.get('ABC', 'xyz'), 'xyz') s.Aliases = {'ABC':'a'} self.assertEqual(s.get('ABC', 'xyz'), 3) self.assertEqual(s.get([1,2,3], 'x'), 'x') def test_setdefault(self): """MappedRecord setdefault should not be typesafe against unhashables""" s = self.single x = s.setdefault('X', 'xyz') self.assertEqual(x, 'xyz') self.assertEqual(s, {'a':3, 'X':'xyz'}) self.assertRaises(TypeError, s.setdefault, ['a','b'], 'xyz') def test_update(self): """MappedRecord update should transparently convert keys""" s = self.single s.b = 999 s.Aliases = {'XYZ':'x', 'ABC':'a'} d = {'ABC':111, 'CVB':222} s.update(d) self.assertEqual(s, {'a':111, 'b':999, 'CVB':222}) def test_copy(self): """MappedRecord copy should return correct class""" s = self.single t = s.copy() assert isinstance(t, MappedRecord) s.Aliases = {'XYZ':'x'} u = s.copy() u.Aliases['ABC'] = 'a' self.assertEqual(s.Aliases, {'XYZ':'x'}) self.assertEqual(t.Aliases, {}) self.assertEqual(u.Aliases, {'XYZ':'x', 'ABC':'a'}) def test_subclass(self): """MappedRecord subclassing should work correctly""" class ret3(MappedRecord): DefaultValue = 3 ClassData = 'xyz' x = ret3({'ABC':777, 'DEF':'999'}) self.assertEqual(x.ZZZ, 3) self.assertEqual(x.ABC, 777) self.assertEqual(x.DEF, '999') self.assertEqual(x.ClassData, 'xyz') x.ZZZ = 6 self.assertEqual(x.ZZZ, 6) self.assertEqual(x.ZZ, 3) x.ClassData = 'qwe' 
self.assertEqual(x.ClassData, 'qwe') self.assertEqual(ret3.ClassData, 'xyz') def test_DefaultValue(self): """MappedRecord DefaultValue should give new copy when requested""" class m(MappedRecord): DefaultValue=[] a = m() b = m() assert a['abc'] is not b['abc'] assert a['abc'] == b['abc'] class dummy(object): """Do-nothing class whose attributes can be freely abused.""" pass class TypeSetterTests(TestCase): """Tests of the TypeSetter class""" def test_setter_empty(self): """TypeSetter should set attrs to vals on empty init""" d = dummy() ident = TypeSetter() ident(d, 'x', 'abc') self.assertEqual(d.x, 'abc') ident(d, 'y', 3) self.assertEqual(d.y, 3) ident(d, 'x', 2) self.assertEqual(d.x, 2) def test_setter_typed(self): """TypeSetter should set attrs to constructor(val) when specified""" d = dummy() i = TypeSetter(int) i(d, 'zz', 3) self.assertEqual(d.zz, 3) i(d, 'xx', '456') self.assertEqual(d.xx, 456) class TypeSetterLikeTests(TestCase): """Tests of the functions that behave similarly to TypeSetter products""" def test_list_adder(self): """list_adder should add items to list, creating if necessary""" d = dummy() list_adder(d, 'x', 3) self.assertEqual(d.x, [3]) list_adder(d, 'x', 'abc') self.assertEqual(d.x, [3, 'abc']) list_adder(d, 'y', [2,3]) self.assertEqual(d.x, [3, 'abc']) self.assertEqual(d.y, [[2,3]]) def test_list_extender(self): """list_extender should extend list with items, creating if necessary""" d = dummy() list_extender(d, 'x', '345') self.assertEqual(d.x, ['3','4','5']) list_extender(d, 'x', 'abc') self.assertEqual(d.x, ['3','4','5','a','b','c']) list_extender(d, 'y', [2,3]) self.assertEqual(d.x, ['3','4','5','a','b','c']) self.assertEqual(d.y, [2,3]) list_extender(d, 'y', None) self.assertEqual(d.y, [2,3,None]) def test_dict_adder(self): """dict_adder should add items to dict, creating if necessary""" d = dummy() dict_adder(d, 'x', 3) self.assertEqual(d.x, {3:None}) dict_adder(d, 'x', 'ab') self.assertEqual(d.x, {3:None, 'a':'b'}) dict_adder(d, 'x', ['a',
0]) self.assertEqual(d.x, {3:None, 'a':0}) dict_adder(d, 'y', None) self.assertEqual(d.x, {3:None, 'a':0}) self.assertEqual(d.y, {None:None}) class LineOrientedConstructorTests(TestCase): """Tests of the LineOrientedConstructor class""" def test_init_empty(self): """LOC empty init should succeed with expected defaults""" l = LineOrientedConstructor() self.assertEqual(l.Lines, []) self.assertEqual(l.LabelSplitter(' ab cd '), ['ab','cd']) self.assertEqual(l.FieldMap, {}) self.assertEqual(l.Constructor, MappedRecord) self.assertEqual(l.Strict, False) def test_empty_LOC(self): """LOC empty should fail if strict, fill fields if not strict""" data = ["abc def","3 n","\t abc \txyz\n\n", "fgh "] l = LineOrientedConstructor() result = l() self.assertEqual(result, {}) result = l([]) self.assertEqual(result, {}) result = l([' ','\n\t ']) self.assertEqual(result, {}) result = l(data) self.assertEqual(result, {'abc':'xyz', '3':'n', 'fgh':None}) def test_full_LOC(self): """LOC should behave as expected when initialized with rich data""" data = ["abc\t def"," 3 \t n"," abc \txyz\n\n", "x\t5", "fgh ", "x\t3 "] class rec(MappedRecord): Required = {'abc':[]} maps = {'abc':list_adder, 'x':int_setter, 'fgh':bool_setter} label_splitter = DelimitedSplitter('\t') constructor = rec strict = True loc_bad = LineOrientedConstructor(data, label_splitter, maps, \ constructor, strict) self.assertRaises(FieldError, loc_bad) strict = False loc_good = LineOrientedConstructor(data, label_splitter, maps, \ constructor, strict) result = loc_good() assert isinstance(result, rec) self.assertEqual(result, \ {'abc':['def','xyz'], '3':'n','fgh':False,'x':3}) class fake_dict(dict): """Test that constructors return the correct subclass""" pass class FieldWrapperTests(TestCase): """Tests of the FieldWrapper factory function""" def test_default(self): """Default FieldWrapper should wrap fields and labels""" fields = list('abcde') f = FieldWrapper(fields) self.assertEqual(f(''), {}) self.assertEqual(f('xy za 
'), {'a':'xy','b':'za'}) self.assertEqual(f('1 2\t\t 3 \n4 5 6'), \ {'a':'1','b':'2','c':'3','d':'4','e':'5'}) def test_splitter(self): """FieldWrapper with splitter should use that splitter""" fields = ['label', 'count'] splitter = DelimitedSplitter(':', -1) f = FieldWrapper(fields, splitter) self.assertEqual(f(''), {}) self.assertEqual(f('nknasd:'), {'label':'nknasd', 'count':''}) self.assertEqual(f('n:k:n:a:sd '), {'label':'n:k:n:a', 'count':'sd'}) def test_constructor(self): """FieldWrapper with constructor should use that constructor""" fields = list('abc') f = FieldWrapper(fields, constructor=fake_dict) self.assertEqual(f('x y'), {'a':'x','b':'y'}) assert isinstance(f('x y'), fake_dict) class StrictFieldWrapperTests(TestCase): """Tests of the StrictFieldWrapper factory function""" def test_default(self): """Default StrictFieldWrapper should wrap fields if count correct""" fields = list('abcde') f = StrictFieldWrapper(fields) self.assertEqual(f('1 2\t\t 3 \n4 5 '), \ {'a':'1','b':'2','c':'3','d':'4','e':'5'}) self.assertRaises(FieldError, f, '') self.assertRaises(FieldError, f, 'xy za ') def test_splitter(self): """StrictFieldWrapper with splitter should use that splitter""" fields = ['label', 'count'] splitter = DelimitedSplitter(':', -1) f = StrictFieldWrapper(fields, splitter) self.assertEqual(f('n:k:n:a:sd '), {'label':'n:k:n:a', 'count':'sd'}) self.assertEqual(f('nknasd:'), {'label':'nknasd', 'count':''}) self.assertRaises(FieldError, f, '') def test_constructor(self): """StrictFieldWrapper with constructor should use that constructor""" fields = list('ab') f = StrictFieldWrapper(fields, constructor=fake_dict) self.assertEqual(f('x y'), {'a':'x','b':'y'}) assert isinstance(f('x y'), fake_dict) class FieldMorpherTests(TestCase): """Tests of the FieldMorpher class.""" def test_default(self): """FieldMorpher default should use correct constructors""" fm = FieldMorpher({'a':int, 'b':str}) self.assertEqual(fm({'a':'3', 'b':456}), {'a':3,'b':'456'}) def 
test_default_error(self): """FieldMorpher default should raise FieldError on unknown fields""" fm = FieldMorpher({'a':int, 'b':str}) self.assertRaises(FieldError, fm, {'a':'3', 'b':456, 'c':'4'}) def test_altered_default(self): """FieldMorpher with default set should apply it""" func = lambda x, y: (str(x), float(y) - 0.5) fm = FieldMorpher({'3':str,4:int}, func) #check that recognized values aren't tampered with self.assertEqual(fm({3:3, 4:'4'}), {'3':'3', 4:4}) #check that unrecognized values get the appropriate conversion self.assertEqual(fm({3:3, 5:'5'}), {'3':'3', '5':4.5}) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_record_finder.py000644 000765 000024 00000023462 12024702176 024001 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for recordfinders: parsers that group the lines for a record. """ from cogent.parse.record import RecordError from cogent.parse.record_finder import DelimitedRecordFinder, \ LabeledRecordFinder, LineGrouper, TailedRecordFinder from cogent.util.unit_test import TestCase, main __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Zongzhi Liu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class TailedRecordFinderTests(TestCase): """Tests of the TailedRecordFinder factory function.""" def setUp(self): """Define a standard TailedRecordFinder""" self.endswith_period = lambda x: x.endswith('.') self.period_tail_finder = TailedRecordFinder(self.endswith_period) def test_parsers(self): """TailedRecordFinder should split records into lines correctly""" lines = '>abc\ndef\nz.\n>efg\nz.'.split() fl = self.period_tail_finder self.assertEqual(list(fl(lines)), \ [['>abc', 'def', 'z.'], ['>efg','z.']]) def test_parsers_empty(self): """TailedRecordFinder should return empty list on empty lines""" fl = self.period_tail_finder self.assertEqual(list(fl([' 
','\n'])), []) self.assertEqual(list(fl([])), []) def test_parsers_strip(self): """TailedRecordFinder should trim each line correctly""" fl = self.period_tail_finder lines = '>abc \n \t def\n z. \t\n>efg \nz.'.split('\n') self.assertEqual(list(fl(lines)), \ [['>abc', ' \t def', ' z.'], ['>efg','z.']]) def test_parsers_leftover(self): """TailedRecordFinder should raise error or yield leftover""" f = self.period_tail_finder good = [ 'abc \n', 'def\n', '.\n', 'ghi \n', 'j.', ] blank = ['', ' ', '\t \t\n\n'] bad = ['abc'] result = [['abc', 'def','.'], ['ghi','j.']] self.assertEqual(list(f(good)), result) self.assertEqual(list(f(good+blank)), result) self.assertRaises(RecordError, list, f(good+bad)) f2 = TailedRecordFinder(self.endswith_period, strict=False) self.assertEqual(list(f2(good+bad)), result + [['abc']]) def test_parsers_ignore(self): """TailedRecordFinder should skip lines to ignore.""" def never(line): return False def ignore_labels(line): return (not line) or line.isspace() or line.startswith('#') lines = ['abc','\n','1.','def','#ignore','2.'] self.assertEqual(list(TailedRecordFinder(self.endswith_period)(lines)), [['abc', '1.'],['def','#ignore','2.']]) self.assertEqual(list(TailedRecordFinder(self.endswith_period, ignore=never)(lines)), [['abc', '', '1.'],['def','#ignore','2.']]) self.assertEqual(list(TailedRecordFinder(self.endswith_period, ignore=ignore_labels)(lines)), [['abc','1.'],['def','2.']]) class DelimitedRecordFinderTests(TestCase): """Tests of the DelimitedRecordFinder factory function.""" def test_parsers(self): """DelimitedRecordFinder should split records into lines correctly""" lines = 'abc\ndef\n//\nefg\n//'.split() self.assertEqual(list(DelimitedRecordFinder('//')(lines)), \ [['abc', 'def', '//'], ['efg','//']]) self.assertEqual(list(DelimitedRecordFinder('//', keep_delimiter=False) (lines)), \ [['abc', 'def'], ['efg']]) def test_parsers_empty(self): """DelimitedRecordFinder should return empty list on empty lines""" 
self.assertEqual(list(DelimitedRecordFinder('//')([' ','\n'])), []) self.assertEqual(list(DelimitedRecordFinder('//')([])), []) def test_parsers_strip(self): """DelimitedRecordFinder should trim each line correctly""" lines = ' \t abc \n \t def\n // \t\n\t\t efg \n//'.split('\n') self.assertEqual(list(DelimitedRecordFinder('//')(lines)), \ [['abc', 'def', '//'], ['efg','//']]) def test_parsers_error(self): """DelimitedRecordFinder should raise RecordError if trailing data""" good = [ ' \t abc \n', '\t def\n', '// \t\n', '\t\n', '\t efg \n', '\t\t//\n', ] blank = ['', ' ', '\t \t\n\n'] bad = ['abc'] result = [['abc', 'def', '//'], ['efg','//']] r = DelimitedRecordFinder('//') self.assertEqual(list(r(good)), result) self.assertEqual(list(r(good+blank)), result) try: list(r(good+bad)) except RecordError: pass else: raise AssertionError, "Parser failed to raise error on bad data" r = DelimitedRecordFinder('//', strict=False) self.assertEqual(list(r(good+bad)), result + [['abc']]) def test_parsers_ignore(self): """DelimitedRecordFinder should skip lines to ignore.""" def never(line): return False def ignore_labels(line): return (not line) or line.isspace() or line.startswith('#') lines = ['>abc','\n','1', '$$', '>def','#ignore','2', '$$'] self.assertEqual(list(DelimitedRecordFinder('$$')(lines)), [['>abc', '1', '$$'],['>def','#ignore','2', '$$']]) self.assertEqual(list(DelimitedRecordFinder('$$', ignore=never)(lines)), [['>abc', '', '1', '$$'],['>def','#ignore','2','$$']]) self.assertEqual(list(DelimitedRecordFinder('$$', ignore=ignore_labels)(lines)), [['>abc','1','$$'],['>def','2','$$']]) class LabeledRecordFinderTests(TestCase): """Tests of the LabeledRecordFinder factory function.""" def setUp(self): """Define a standard LabeledRecordFinder""" self.FastaLike = LabeledRecordFinder(lambda x: x.startswith('>')) def test_parsers(self): """LabeledRecordFinder should split records into lines correctly""" lines = '>abc\ndef\n//\n>efg\n//'.split() fl = self.FastaLike 
self.assertEqual(list(fl(lines)), \ [['>abc', 'def', '//'], ['>efg','//']]) def test_parsers_empty(self): """LabeledRecordFinder should return empty list on empty lines""" fl = self.FastaLike self.assertEqual(list(fl([' ','\n'])), []) self.assertEqual(list(fl([])), []) def test_parsers_strip(self): """LabeledRecordFinder should trim each line correctly""" fl = self.FastaLike lines = ' \t >abc \n \t def\n // \t\n\t\t >efg \n//'.split('\n') self.assertEqual(list(fl(lines)), \ [['>abc', 'def', '//'], ['>efg','//']]) def test_parsers_leftover(self): """LabeledRecordFinder should not raise RecordError if last line label""" fl = self.FastaLike good = [ ' \t >abc \n', '\t def\n', '\t\n', '\t >efg \n', 'ghi', ] blank = ['', ' ', '\t \t\n\n'] bad = ['>abc'] result = [['>abc', 'def'], ['>efg','ghi']] self.assertEqual(list(fl(good)), result) self.assertEqual(list(fl(good+blank)), result) self.assertEqual(list(fl(good+bad)), result + [['>abc']]) def test_parsers_ignore(self): """LabeledRecordFinder should skip lines to ignore.""" def never(line): return False def ignore_labels(line): return (not line) or line.isspace() or line.startswith('#') def is_start(line): return line.startswith('>') lines = ['>abc','\n','1','>def','#ignore','2'] self.assertEqual(list(LabeledRecordFinder(is_start)(lines)), [['>abc', '1'],['>def','#ignore','2']]) self.assertEqual(list(LabeledRecordFinder(is_start, ignore=never)(lines)), [['>abc', '', '1'],['>def','#ignore','2']]) self.assertEqual(list(LabeledRecordFinder(is_start, ignore=ignore_labels)(lines)), [['>abc','1'],['>def','2']]) class LineGrouperTests(TestCase): """Tests of the LineGrouper class.""" def test_parser(self): """LineGrouper should return n non-blank lines at a time""" good = [ ' \t >abc \n', '\t def\n', '\t\n', '\t >efg \n', 'ghi', ] c = LineGrouper(2) self.assertEqual(list(c(good)), [['>abc', 'def'],['>efg','ghi']]) c = LineGrouper(1) self.assertEqual(list(c(good)), [['>abc'], ['def'],['>efg'],['ghi']]) c = LineGrouper(4) 
self.assertEqual(list(c(good)), [['>abc', 'def','>efg','ghi']]) #shouldn't work if not evenly divisible c = LineGrouper(3) self.assertRaises(RecordError, list, c(good)) def test_parser_ignore(self): """LineGrouper should skip lines to ignore.""" def never(line): return False def ignore_labels(line): return (not line) or line.isspace() or line.startswith('#') lines = ['abc','\n','1','def','#ignore','2'] self.assertEqual(list(LineGrouper(1)(lines)), [['abc'], ['1'],['def'],['#ignore'],['2']]) self.assertEqual(list(LineGrouper(1, ignore=never)(lines)), [[i.strip()] for i in lines]) self.assertEqual(list(LineGrouper(2, ignore=ignore_labels)(lines)), [['abc','1'],['def','2']]) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_rfam.py000644 000765 000024 00000056502 12024702176 022122 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Provides tests for RfamParser and related classes and functions. """ from cogent.parse.rfam import is_header_line, is_seq_line, is_structure_line,\ HeaderToInfo, MinimalRfamParser, RfamFinder, NameToInfo, RfamParser,\ ChangedSequence, is_empty_or_html from cogent.util.unit_test import TestCase, main from cogent.parse.record import RecordError from cogent.core.info import Info from cogent.struct.rna2d import WussStructure from cogent.core.alignment import Alignment from cogent.core.moltype import BYTES __author__ = "Sandra Smit and Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Greg Caporaso", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" Sequence = BYTES.Sequence class RfamParserTests(TestCase): """ Tests components of the rfam parser, in the rfam.py file """ def setUp(self): """ Construct some fake data for testing purposes """ self._fake_headers = [] temp = list(fake_headers.split('\n')) for line in temp: self._fake_headers.append(line.strip())
del temp self._fake_record_no_headers =\ list(fake_record_no_headers.split('\n')) self._fake_record_no_sequences =\ list(fake_record_no_sequences.split('\n')) self._fake_record_no_structure =\ list(fake_record_no_structure.split('\n')) self._fake_two_records =\ list(fake_two_records.split('\n')) self._fake_record =\ list(fake_record.split('\n')) self._fake_record_bad_header_1 =\ list(fake_record_bad_header_1.split('\n')) self._fake_record_bad_header_2 =\ list(fake_record_bad_header_2.split('\n')) self._fake_record_bad_sequence_1 =\ list(fake_record_bad_sequence_1.split('\n')) self._fake_record_bad_structure_1 =\ list(fake_record_bad_structure_1.split('\n')) self._fake_record_bad_structure_2 =\ list(fake_record_bad_structure_2.split('\n')) self.single_family = single_family.split('\n') def test_is_empty_or_html(self): """is_empty_or_html: should ignore empty and HTML line""" line = ' ' self.assertEqual(is_empty_or_html(line), True) line = '\n\n' self.assertEqual(is_empty_or_html(line), True) line = '
<pre>'
        self.assertEqual(is_empty_or_html(line), True)
        line = '</pre>
\n\n' self.assertEqual(is_empty_or_html(line), True) line = '\t>>')]) # should get empty on missing sequence or missing structure self.assertEqual(list(MinimalRfamParser(self._fake_record_no_sequences,\ strict=False)), []) self.assertEqual(list(MinimalRfamParser(self._fake_record_no_structure,\ strict=False)), []) def test_MinimalRfamParser_strict_invalid_sequence(self): """MinimalRfamParser: toggle strict functions w/ invalid seq """ #strict = True self.assertRaises(RecordError,list,\ MinimalRfamParser(self._fake_record_bad_sequence_1)) # strict = False # you expect to get back as much information as possible, also # half records or sequences result = MinimalRfamParser(self._fake_record_bad_sequence_1,strict=False) self.assertEqual(len(list(MinimalRfamParser(\ self._fake_record_bad_sequence_1,strict=False))[0][1].NamedSeqs), 3) def test_MinimalRfamParser_strict_invalid_structure(self): """MinimalRfamParser: toggle strict functions w/ invalid structure """ #strict = True self.assertRaises(RecordError,list,\ MinimalRfamParser(self._fake_record_bad_structure_1)) # strict = False self.assertEqual(list(MinimalRfamParser(\ self._fake_record_bad_structure_1,strict=False))[0][2],None) def test_MinimalRfamParser_w_valid_data(self): """MinimalRfamParser: integrity of output """ # Some ugly constructions here, but this is what the output of # parsing fake_two_records should be headers = ['#=GF AC RF00014','#=GF AU Mifsud W'] sequences =\ {'U17136.1/898-984':\ ''.join(['AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA',\ 'AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU']),\ 'M15749.1/155-239':\ ''.join(['AACGCAUCGGAUUUCCCGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUU',\ 'AGCAAGUUUGAUCCCGACUCCUG-CGAGUCGGGAUUU']),\ 'AF090431.1/222-139':\ ''.join(['CUCACAUCAGAUUUCCUGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUA',\ 'AGCAAGUUUGAUCCCGACCCGU--AGGGCCGGGAUUU'])} structure = WussStructure(''.join(\ ['...<<<<<<<.....>>>>>>>....................<<<<<...',\ '.>>>>>....<<<<<<<<<<.....>>>>>>>>>>..'])) data = [] for r in 
MinimalRfamParser(self._fake_two_records, strict=False): data.append(r) self.assertEqual(data[0],(headers,sequences,structure)) assert isinstance(data[0][1],Alignment) # This line tests that invalid entries are ignored when strict=False # Note, there are two records in self._fake_two_records, but 2nd is # invalid self.assertEqual(len(data),1) def test_RfamFinder(self): """RfamFinder: integrity of output """ fake_record = ['a','//','b','b','//'] num_records = 0 data = [] for r in RfamFinder(fake_record): data.append(r) num_records += 1 self.assertEqual(num_records, 2) self.assertEqual(data[0], ['a','//']) self.assertEqual(data[1], ['b','b','//']) def test_ChangedSequence(self): """ChangedSequence: integrity of output""" # Made up input, based on a line that would look like: # U17136.1/898-984 AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA s_in = 'AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA' s_out = 'AACA--CAU--CAGAUUUCCU--GGUGUAA-CGAA' sequence = ChangedSequence(s_in) self.assertEqual(sequence, s_out) # test some extremes on the seq # sequence of all blanks s_in = '.' 
* 5 s_out = '-' * 5 sequence = ChangedSequence(s_in) self.assertEqual(sequence, s_out) # sequence of no blanks s_in = 'U' * 5 s_out = 'U' * 5 sequence = ChangedSequence(s_in) self.assertEqual(sequence, s_out) def test_NameToInfo(self): """NameToInfo: integrity of output """ # Made up input, based on a line that would look like: # U17136.1/898-984 AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA s_in = 'AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA' #s_out = 'AACA--CAU--CAGAUUUCCU--GGUGUAA-CGAA' sequence = Sequence(s_in, Name='U17136.1/898-984') info = NameToInfo(sequence) #self.assertEqual(seq, s_out) self.assertEqual(info['Start'], 897) self.assertEqual(info['End'], 984) self.assertEqual(info['GenBank'], ['U17136.1']) def test_NameToInfo_invalid_label(self): """NameToInfo: raises error on invalid label """ s = 'AA' invalid_labels = ['U17136.1898-984','U17136.1/898984'] for l in invalid_labels: self.assertRaises(RecordError,NameToInfo,\ Sequence(s, Name=l)) a = 'U17136.1/' #missing start/end positions b = '/898-984' #missing genbank id obs_info = NameToInfo(Sequence(s,Name=a)) exp = Info({'GenBank':'U17136.1','Start':None,'End':None}) self.assertEqual(obs_info,exp) obs_info = NameToInfo(Sequence(s,Name=b)) exp = Info({'GenBank':None,'Start':897,'End':984}) self.assertEqual(obs_info,exp) #strict = False # in non-strict mode you want to get back as much info as possible lab1 = 'U17136.1898-984' lab2 = 'U17136.1/898984' obs_info = NameToInfo(Sequence(s,Name=lab1), strict=False) exp = Info({'GenBank':None,'Start':None,'End':None}) self.assertEqual(obs_info,exp) obs_info = NameToInfo(Sequence(s,Name=lab2), strict=False) exp = Info({'GenBank':'U17136.1','Start':None,'End':None}) self.assertEqual(obs_info,exp) def test_RfamParser(self): """RfamParser: integrity of output """ expected_sequences =\ [''.join(['AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA',\ 'AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU']),\ ''.join(['AACGCAUCGGAUUUCCCGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUU',\
'AGCAAGUUUGAUCCCGACUCCUG-CGAGUCGGGAUUU']),\ ''.join(['CUCACAUCAGAUUUCCUGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUA',\ 'AGCAAGUUUGAUCCCGACCCGU--AGGGCCGGGAUUU'])] expected_structure = ''.join(\ ['...<<<<<<<.....>>>>>>>....................<<<<<...',\ '.>>>>>....<<<<<<<<<<.....>>>>>>>>>>..']) for r in RfamParser(self._fake_record): headers,sequences,structure = r self.assertEqual(headers['Refs']['Rfam'], ['RF00014']) self.assertEqual(headers['Author'], 'Mifsud W') self.assertEqualItems(sequences.values(), expected_sequences) assert isinstance(sequences, Alignment) self.assertEqualItems([s.Info.GenBank for s in sequences.Seqs], [['U17136.1'],['M15749.1'],['AF090431.1']]) self.assertEqualItems([s.Info.Start for s in sequences.Seqs], [897,154,221]) self.assertEqual(structure, expected_structure) assert isinstance(structure,WussStructure) def test_RfamParser_strict_missing_fields(self): """RfamParser: toggle strict functions correctly """ # strict = True self.assertRaises(RecordError,list,\ RfamParser(self._fake_record_no_headers)) self.assertRaises(RecordError,list,\ RfamParser(self._fake_record_no_sequences)) self.assertRaises(RecordError,list,\ RfamParser(self._fake_record_no_structure)) # strict = False self.assertEqual(list(RfamParser(self._fake_record_no_headers,\ strict=False)), []) self.assertEqual(list(RfamParser(self._fake_record_no_sequences,\ strict=False)), []) self.assertEqual(list(RfamParser(self._fake_record_no_structure,\ strict=False)), []) def test_RFamParser_strict_invalid_headers(self): """RfamParser: functions when toggling strict w/ record w/ bad header """ self.assertRaises(RecordError,list,\ RfamParser(self._fake_record_bad_header_1)) self.assertRaises(RecordError,list,\ RfamParser(self._fake_record_bad_header_2)) # strict = False x = list(RfamParser(self._fake_record_bad_header_1, strict=False)) obs = list(RfamParser(self._fake_record_bad_header_1,\ strict=False))[0][0].keys() self.assertEqual(len(obs),1) obs = 
list(RfamParser(self._fake_record_bad_header_2,\ strict=False))[0][0].keys() self.assertEqual(len(obs),1) def test_RfamParser_strict_invalid_sequences(self): """RfamParser: functions when toggling strict w/ record w/ bad seq """ self.assertRaises(RecordError,list, MinimalRfamParser(self._fake_record_bad_sequence_1)) # strict = False # in 'False' mode you expect to get back as much as possible, also # parts of sequences self.assertEqual(len(list(RfamParser(self._fake_record_bad_sequence_1,\ strict=False))[0][1].NamedSeqs), 3) def test_RfamParser_strict_invalid_structure(self): """RfamParser: functions when toggling strict w/ record w/ bad struct """ # strict self.assertRaises(RecordError,list,\ RfamParser(self._fake_record_bad_structure_2)) #not strict self.assertEqual(list(RfamParser(self._fake_record_bad_structure_2,\ strict=False)),[]) def test_RfamParser_single_family(self): """RfamParser: should work on a single family in stockholm format""" exp_header = Info() exp_aln = {'K02120.1/628-682':\ 'AUGGGAAAUUCCCCCUCCUAUAACCCCCCCGCUGGUAUCUCCCCCUCAGACUGGC',\ 'D00647.1/629-683':\ 'AUGGGAAACUCCCCCUCCUAUAACCCCCCCGCUGGCAUCUCCCCCUCAGACUGGC'} exp_struct = '<<<<<<.........>>>>>>.........<<<<<<.............>>>>>>' h, a, s = list(RfamParser(self.single_family))[0] self.assertEqual(h,exp_header) self.assertEqual(a,exp_aln) self.assertEqual(s,exp_struct) # This is an altered version of some header info from Rfam.seed modified to # incorporate different cases for testing fake_headers = """#=GF AC RF00001 #=GF AU Griffiths-Jones SR #=GF ID 5S_rRNA #=GF RT 5S Ribosomal RNA Database. 
#=GF DR URL; http://oberon.fvms.ugent.be:8080/rRNA/ssu/index.html; #=GF DR URL; http://rdp.cme.msu.edu/html/; #=GF CC This is a short #=GF CC comment #=GF SQ 606 #=GF PK not real""" fake_record_no_headers ="""Z11765.1/1-89 GGUC #=GC SS_cons ............>>> //""" fake_record_no_sequences ="""#=GF AC RF00006 #=GC SS_cons ............> //""" fake_record_no_structure ="""#=GF AC RF00006 Z11765.1/1-89 GGUCAGC //""" fake_two_records ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx // #=GF AC RF00015 //""" fake_record ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. 
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_header_1 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AUMifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_header_2 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GFAUMifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_sequence_1 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... 
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_structure_1 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_structure_2 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<>>>>....<<<<<<<<<<.....>>>>>>>>>>.. 
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" single_family=\ """K02120.1/628-682 AUGGGAAAUUCCCCCUCCUAUAACCCCCCCGCUGGUAUCUCCCCCUCAGA D00647.1/629-683 AUGGGAAACUCCCCCUCCUAUAACCCCCCCGCUGGCAUCUCCCCCUCAGA #=GC SS_cons <<<<<<.........>>>>>>.........<<<<<<.............> K02120.1/628-682 CUGGC D00647.1/629-683 CUGGC #=GC SS_cons >>>>> //""" # Run tests if called from the command line if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_rna_fold.py000644 000765 000024 00000011264 12024702176 022755 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for the RNAfold dot plot parser """ from __future__ import division from cogent.util.unit_test import TestCase, main from cogent.parse.rna_fold import * __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" class RnaFoldParserTests(TestCase): """Tests for RnaFoldParser. """ def setUp(self): """Setup function for RnaFoldParser tests. 
""" self.rna_fold_parser_results = ('ACGUGCUAG', [(1, 7, float(0.01462)), (2, 9, float(0.11118)), (3, 7, float(0.00985)), (4, 8, float(0.01005)), (4, 9, float(0.01586))]) self.sequence_lines = ['/sequence line before where sequences start\n', ' ACCUGUCUAUCGCUGC&*$#@(*\n', 'ACGGUUAUAUUAUCUCUG\\\n', ') end of sequence\n'] self.sequence_lines_empty = ['/sequence \n', '\n', ')\n'] self.index_lines = ['unimportant line', '1 3 0.332 ubox', '1 4 0.003 ubox'] self.index_lines_no_ubox = ['unimportant line', '1 2 0.432 u box', '1 4 0.32 ubo x'] def test_getSequence(self): self.assertEqual(getSequence(self.sequence_lines), 'ACCUGUCUAUCGCUGC&*$#@(*ACGGUUAUAUUAUCUCUG') self.assertEqual(getSequence(self.sequence_lines_empty),'') def test_getIndices(self): self.assertEqual(getIndices(self.index_lines),[(1,3,float(0.332)), (1,4,float(0.003))]) self.assertEqual(getIndices(self.index_lines_no_ubox),[]) def test_RnaFoldParser(self): self.assertEqual(RnaFoldParser([]), ('',[])) self.assertEqual(RnaFoldParser(RNA_FOLD_RESULTS), self.rna_fold_parser_results) RNA_FOLD_RESULTS = ['%!PS-Adobe-3.0 EPSF-3.0\n', '%%Title: RNA DotPlot\n', '%%Creator: PS_dot.c,v 1.24 2003/08/07 09:01:00 ivo Exp $, ViennaRNA-1.5\n', '%%CreationDate: Fri Oct 8 13:15:01 2004\n', '%%BoundingBox: 66 211 518 662\n', '%%DocumentFonts: Helvetica\n', '%%Pages: 1\n', '%%EndComments\n', '\n', '%Options: \n', '%This file contains the square roots of the base pair probabilities in the form\n', '% i j sqrt(p(i,j)) ubox\n', '100 dict begin\n', '\n', '/logscale false def\n', '\n', '%delete next line to get rid of title\n', '270 665 moveto /Helvetica findfont 14 scalefont setfont (dot.ps) show\n', '\n', '/lpmin {\n', ' 1e-05 log % log(pmin) only probs>pmin will be shown\n', '} bind def\n', '\n', '/box { %size x y box - draws box centered on x,y\n', ' 2 index 0.5 mul add % x += 0.5\n', ' exch 2 index 0.5 mul add exch % x += 0.5\n', ' newpath\n', ' moveto\n', ' dup neg 0 rlineto\n', ' dup neg 0 exch rlineto\n', ' 0 rlineto\n', ' 
closepath\n', ' fill\n', '} bind def\n', '\n', '/sequence { (\\\n', 'ACGUGCUAG\\\n', ') } def\n', '/len { sequence length } bind def\n', '\n', '/ubox {\n', ' logscale {\n', ' log dup add lpmin div 1 exch sub dup 0 lt { pop 0 } if\n', ' } if\n', ' 3 1 roll\n', ' exch len exch sub 1 add box\n', '} bind def\n', '\n', '/lbox {\n', ' 3 1 roll\n', ' len exch sub 1 add box\n', '} bind def\n', '\n', '72 216 translate\n', '72 6 mul len 1 add div dup scale\n', '/Helvetica findfont 0.95 scalefont setfont\n', '\n', '% print sequence along all 4 sides\n', '[ [0.7 -0.3 0 ]\n', ' [0.7 0.7 len add 0]\n', ' [0.7 -0.2 90]\n', ' [-0.3 len sub 0.7 len add -90]\n', '] {\n', ' gsave\n', ' aload pop rotate translate\n', ' 0 1 len 1 sub {\n', ' dup 0 moveto\n', ' sequence exch 1 getinterval\n', ' show\n', ' } for\n', ' grestore\n', '} forall\n', '\n', '0.5 dup translate\n', '% draw diagonal\n', '0.04 setlinewidth\n', '0 len moveto len 0 lineto stroke \n', '\n', '%draw grid\n', '0.01 setlinewidth\n', 'len log 0.9 sub cvi 10 exch exp % grid spacing\n', 'dup 1 gt {\n', ' dup dup 20 div dup 2 array astore exch 40 div setdash\n', '} { [0.3 0.7] 0.1 setdash } ifelse\n', '0 exch len {\n', ' dup dup\n', ' 0 moveto\n', ' len lineto \n', ' dup\n', ' len exch sub 0 exch moveto\n', ' len exch len exch sub lineto\n', ' stroke\n', '} for\n', '0.5 neg dup translate\n', '\n', '1 7 0.01462 ubox\n', '2 9 0.11118 ubox\n', '3 7 0.00985 ubox\n', '4 8 0.01005 ubox\n', '4 9 0.01586 ubox\n', 'showpage\n', 'end\n', '%%EOF\n'] #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_rna_plot.py000644 000765 000024 00000016216 12024702176 023011 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for the RNAplot parser """ from __future__ import division from cogent.util.unit_test import TestCase, main import string import re from cogent.parse.rna_plot import get_sequence, get_coordinates, get_pairs,\ RnaPlotParser __author__ = "Jeremy Widmann" 
__copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" class RnaPlotParserTests(TestCase): """Tests for RnaPlotParser. """ def setUp(self): """Setup function for RnaPlotParser tests. """ self.sequence_lines = SEQUENCE_LINES.split('\n') self.expected_seq = 'AAACCGCCUUU' self.coordinate_lines = COORDINATE_LINES.split('\n') self.expected_coords = [[92.500,92.500],\ [92.500,77.500],\ [92.500,62.500],\ [82.218,50.185],\ [85.577,34.497],\ [100.000,27.472],\ [114.423,34.497],\ [117.782,50.185],\ [107.500,62.500],\ [107.500,77.500],\ [107.500,92.500]] self.pairs_lines = PAIRS_LINES.split('\n') self.expected_pairs = [[0,10],\ [1,9],\ [2,8]] self.rna_plot_lines = RNA_PLOT_FILE.split('\n') def test_get_sequence(self): """get_sequence should properly parse out sequence. """ #test real data obs_seq = get_sequence(self.sequence_lines) self.assertEqual(obs_seq, self.expected_seq) #test empty list self.assertEqual(get_sequence([]),'') def test_get_coordinates(self): """get_coordinates should properly parse out coordinates. """ obs_coords = get_coordinates(self.coordinate_lines) for (obs1, obs2), (exp1, exp2) in zip(obs_coords,self.expected_coords): self.assertFloatEqual(obs1,exp1) self.assertFloatEqual(obs2,exp2) #test empty list self.assertEqual(get_coordinates([]),[]) def test_get_pairs(self): """get_pairs should properly parse out pairs. """ obs_pairs = get_pairs(self.pairs_lines) self.assertEqual(obs_pairs, self.expected_pairs) #test empty list self.assertEqual(get_pairs([]),[]) def test_RnaPlotParser(self): """RnaPlotParser should properly parse full RNAplot postscript file. 
""" obs_seq, obs_coords, obs_pairs = RnaPlotParser(self.rna_plot_lines) #test seq is correctly parsed self.assertEqual(obs_seq, self.expected_seq) #test coords are correctly parsed for (obs1, obs2), (exp1, exp2) in zip(obs_coords,self.expected_coords): self.assertFloatEqual(obs1,exp1) self.assertFloatEqual(obs2,exp2) #test pairs are correctly parsed self.assertEqual(obs_pairs, self.expected_pairs) #test empty list self.assertEqual(RnaPlotParser([]),('',[],[])) SEQUENCE_LINES = """ /sequence (\ AAACCGCCUUU\ ) def /coor [ [92.500 92.500] [92.500 77.500] [92.500 62.500] [82.218 50.185] [85.577 34.497] [100.000 27.472] [114.423 34.497] [117.782 50.185] [107.500 62.500] [107.500 77.500] [107.500 92.500] ] def /pairs [ [1 11] [2 10] [3 9] ] def init % switch off outline pairs or bases by removing these lines drawoutline drawpairs drawbases % show it showpage end %%EOF """ COORDINATE_LINES = """ /coor [ [92.500 92.500] [92.500 77.500] [92.500 62.500] [82.218 50.185] [85.577 34.497] [100.000 27.472] [114.423 34.497] [117.782 50.185] [107.500 62.500] [107.500 77.500] [107.500 92.500] ] def /pairs [ [1 11] [2 10] [3 9] ] def init % switch off outline pairs or bases by removing these lines drawoutline drawpairs drawbases % show it showpage end %%EOF """ PAIRS_LINES = """ /pairs [ [1 11] [2 10] [3 9] ] def init % switch off outline pairs or bases by removing these lines drawoutline drawpairs drawbases % show it showpage end %%EOF """ RNA_PLOT_FILE = """ %!PS-Adobe-3.0 EPSF-3.0 %%Creator: PS_dot.c,v 1.38 2007/02/02 15:18:13 ivo Exp $, ViennaRNA-1.8.2 %%CreationDate: Wed Apr 14 12:08:23 2010 %%Title: RNA Secondary Structure Plot %%BoundingBox: 66 210 518 662 %%DocumentFonts: Helvetica %%Pages: 1 %%EndComments %Options: % to switch off outline pairs of sequence comment or % delete the appropriate line near the end of the file %%BeginProlog /RNAplot 100 dict def RNAplot begin /fsize 14 def /outlinecolor {0.2 setgray} bind def /paircolor {0.2 setgray} bind def /seqcolor {0 setgray} 
bind def /cshow { dup stringwidth pop -2 div fsize -3 div rmoveto show} bind def /min { 2 copy gt { exch } if pop } bind def /max { 2 copy lt { exch } if pop } bind def /drawoutline { gsave outlinecolor newpath coor 0 get aload pop 0.8 0 360 arc % draw 5' circle of 1st sequence currentdict /cutpoint known % check if cutpoint is defined {coor 0 cutpoint getinterval {aload pop lineto} forall % draw outline of 1st sequence coor cutpoint get aload pop 2 copy moveto 0.8 0 360 arc % draw 5' circle of 2nd sequence coor cutpoint coor length cutpoint sub getinterval {aload pop lineto} forall} % draw outline of 2nd sequence {coor {aload pop lineto} forall} % draw outline as a whole ifelse stroke grestore } bind def /drawpairs { paircolor 0.7 setlinewidth [9 3.01] 9 setdash newpath pairs {aload pop coor exch 1 sub get aload pop moveto coor exch 1 sub get aload pop lineto } forall stroke } bind def % draw bases /drawbases { [] 0 setdash seqcolor 0 coor { aload pop moveto dup sequence exch 1 getinterval cshow 1 add } forall pop } bind def /init { /Helvetica findfont fsize scalefont setfont 1 setlinejoin 1 setlinecap 0.8 setlinewidth 72 216 translate % find the coordinate range /xmax -1000 def /xmin 10000 def /ymax -1000 def /ymin 10000 def coor { aload pop dup ymin lt {dup /ymin exch def} if dup ymax gt {/ymax exch def} {pop} ifelse dup xmin lt {dup /xmin exch def} if dup xmax gt {/xmax exch def} {pop} ifelse } forall /size {xmax xmin sub ymax ymin sub max} bind def 72 6 mul size div dup scale size xmin sub xmax sub 2 div size ymin sub ymax sub 2 div translate } bind def end %%EndProlog RNAplot begin % data start here /sequence (\ AAACCGCCUUU\ ) def /coor [ [92.500 92.500] [92.500 77.500] [92.500 62.500] [82.218 50.185] [85.577 34.497] [100.000 27.472] [114.423 34.497] [117.782 50.185] [107.500 62.500] [107.500 77.500] [107.500 92.500] ] def /pairs [ [1 11] [2 10] [3 9] ] def init % switch off outline pairs or bases by removing these lines drawoutline drawpairs drawbases % show 
it showpage end %%EOF """ #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_rnaalifold.py000644 000765 000024 00000006172 12024702176 023306 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.parse.rnaalifold import rnaalifold_parser, MinimalRnaalifoldParser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman","Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class RnaalifoldParserTest(TestCase): """Provides tests for RNAALIFOLD RNA secondary structure format parsers""" def setUp(self): """Setup function """ #output self.rnaalifold_out = RNAALIFOLD #expected self.rnaalifold_exp = [['GGCUAGGUAAAUCC',[(0,13),(1,12),(3,10)],-26.50]] #output 2 self.rnaalifold_out_2 = RNAALIFOLD_2_STRUCTS #expected self.rnaalifold_exp_2 = \ [['GGCUAGGUAAAUCC',[(0,13),(1,12),(3,10)],\ float(-26.50)],\ ['-GAUCCUAAGCGACGAAGUUYAWSCU------YGKRYARYRWWKKR-',\ [(6,21),(7,20),(8,19),(9,18),(10,17)],\ float(-0.80)]] def test_rnaalifold_output(self): """Test for rnaalifold format parser""" #Test empty lines self.assertEqual(rnaalifold_parser(''),[]) #Test one structure obs = rnaalifold_parser(self.rnaalifold_out) self.assertEqual(obs,self.rnaalifold_exp) #Test two structures obs_2 = rnaalifold_parser(self.rnaalifold_out_2) self.assertEqual(obs_2,self.rnaalifold_exp_2) class MinimalRnaalifoldParserTest(TestCase): """Provides tests for MinimalRnaalifoldParser structure format parser. 
""" def setUp(self): """Setup function """ #output self.rnaalifold_out = RNAALIFOLD #expected self.rnaalifold_exp = [['GGCUAGGUAAAUCC','((.(......).))',\ float(-26.50)]] #output 2 self.rnaalifold_out_2 = RNAALIFOLD_2_STRUCTS #expected self.rnaalifold_exp_2 = \ [['GGCUAGGUAAAUCC','((.(......).))',float(-26.50)],\ ['-GAUCCUAAGCGACGAAGUUYAWSCU------YGKRYARYRWWKKR-',\ '......(((((......))))).........................',\ float(-0.80)]] def test_rnaalifold_output(self): """Test for rnaalifold format parser""" #Test empty lines self.assertEqual(MinimalRnaalifoldParser(''),[]) #Test one structure obs = MinimalRnaalifoldParser(self.rnaalifold_out) self.assertEqual(obs,self.rnaalifold_exp) #Test two structures obs_2 = MinimalRnaalifoldParser(self.rnaalifold_out_2) self.assertEqual(obs_2,self.rnaalifold_exp_2) RNAALIFOLD = ['GGCUAGGUAAAUCC\n', '((.(......).)) (-26.50 = -26.50 + 0.00) \n'] RNAALIFOLD_2_STRUCTS = ['GGCUAGGUAAAUCC\n', '((.(......).)) (-26.50 = -26.50 + 0.00) \n',\ '-GAUCCUAAGCGACGAAGUUYAWSCU------YGKRYARYRWWKKR-\n', '......(((((......)))))......................... ( -0.80 = -1.30 + 0.50)\n',] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_rnaforester.py000644 000765 000024 00000014434 12024702176 023525 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.parse.rnaforester import rnaforester_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class RnaforesterParserTest(TestCase): """Provides tests for RNAforester RNA secondary structure format parsers""" def setUp(self): """Setup function application output is not always actual(real) output from the application but output in the format of the application output in question. 
this to save space and time(mine)""" #output self.rnaforester_out = RNAFORESTER #expected self.rnaforester_exp = [[{'seq1':'ggccacguagcucagucgguagagcaaaggacugaaaauccuugugucguugguucaauuccaaccguggccacca', 'seq2':'-gccagauagcucagucgguagagcguucgccugaaaagugaaaggucgccgguucgaucccggcucuggccacca'}, 'ggccacauagcucagucgguagagcaaacgacugaaaagccaaaggucgccgguucaaucccaacccuggccacca', [(0, 71), (1, 70), (2, 69), (3, 68), (4, 67), (5, 66), (6, 65), (9, 24), (10, 23), (11, 22), (12, 21), (26, 42), (27, 41), (28, 40), (29, 39), (31, 38), (32, 37), (48, 64), (49, 63), (50, 62), (51, 61), (52, 60)]]] def test_rnaforester_output(self): """Test for rnaforester format""" obs = rnaforester_parser(self.rnaforester_out) self.assertEqual(obs,self.rnaforester_exp) RNAFORESTER = ['*** Scoring parameters ***\n', '\n', 'Scoring type: similarity\n', 'Scoring parameters:\n', 'pm: 10\n', 'pd: -5\n', 'bm: 1\n', 'br: 0\n', 'bd: -10\n', '\n', '\n', 'Input string (upper or lower case); & to end for multiple alignments, @ to quit\n', '....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8\n', '\n', '*** Calculation ***\n', '\n', 'clustering threshold is: 0.7\n', 'join clusters cutoff is: 0\n', '\n', 'Computing all pairwise similarities\n', '2,1: 0.74606\n', '\n', 'joining alignments:\n', '1,2: 0.74606 -> 1\n', 'Calculate similarities to other clusters\n', '\n', '\n', '*** Results ***\n', '\n', 'Minimum basepair probability for consensus structure (-cmin): 0.5\n', '\n', 'RNA Structure Cluster Nr: 1\n', 'Score: 264.25\n', 'Members: 2\n', '\n', 'seq1 ggccacguagcucagucgguagagcaaaggacugaaaauccuugugucguugguu\n', 'seq2 -gccagauagcucagucgguagagcguucgccugaaaagugaaaggucgccgguu\n', ' **** ****************** * ******* **** ****\n', '\n', 'seq1 caauuccaaccguggccacca\n', 'seq2 cgaucccggcucuggccacca\n', ' * ** ** * *********\n', '\n', 'seq1 (((((((..((((........)))).(((((.......))))).....(((((..\n', 'seq2 -((((((..((((........)))).((((.((....)))))).....(((((..\n', ' 
***************************** **** *****************\n', '\n', 'seq1 .....))))))))))))....\n', 'seq2 .....))))))))))).....\n', ' **************** ****\n', '\n', '\n', 'Consensus sequence/structure:\n', ' 100% **** ****************** * ******* **** ****\n', ' 90% **** ****************** * ******* **** ****\n', ' 80% **** ****************** * ******* **** ****\n', ' 70% **** ****************** * ******* **** ****\n', ' 60% **** ****************** * ******* **** ****\n', ' 50% *******************************************************\n', ' 40% *******************************************************\n', ' 30% *******************************************************\n', ' 20% *******************************************************\n', ' 10% *******************************************************\n', ' ggccacauagcucagucgguagagcaaacgacugaaaagccaaaggucgccgguu\n', ' (((((((..((((........)))).((((.((....)))))).....(((((..\n', ' 10% *******************************************************\n', ' 20% *******************************************************\n', ' 30% *******************************************************\n', ' 40% *******************************************************\n', ' 50% *******************************************************\n', ' 60% *******************************************************\n', ' 70% ****************************** **** ****************\n', ' 80% ****************************** **** ****************\n', ' 90% ****************************** **** ****************\n', ' 100% ****************************** **** ****************\n', '\n', ' 100% * ** ** * *********\n', ' 90% * ** ** * *********\n', ' 80% * ** ** * *********\n', ' 70% * ** ** * *********\n', ' 60% * ** ** * *********\n', ' 50% *********************\n', ' 40% *********************\n', ' 30% *********************\n', ' 20% *********************\n', ' 10% *********************\n', ' caaucccaacccuggccacca\n', ' .....))))))))))))....\n', ' 10% *********************\n', ' 20% 
*********************\n', ' 30% *********************\n', ' 40% *********************\n', ' 50% *********************\n', ' 60% *********************\n', ' 70% **************** ****\n', ' 80% **************** ****\n', ' 90% **************** ****\n', ' 100% **************** ****\n', '\n', '\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_rnaview.py000644 000765 000024 00000120626 12024702176 022647 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #test_rnaview.py """Provides tests for code in the rnaview.py file. """ from cogent.util.unit_test import TestCase, main from cogent.parse.rnaview import is_roman_numeral, is_edge, is_orientation,\ parse_annotation, parse_uncommon_residues, parse_base_pairs,\ parse_base_multiplets, parse_pair_counts, MinimalRnaviewParser,\ RnaviewParser, Base, BasePair, BasePairs, BaseMultiplet, BaseMultiplets,\ PairCounts, RnaViewObjectError, RnaViewParseError, MinimalRnaviewParser,\ parse_filename, parse_number_of_pairs, verify_bp_counts,\ in_chain, is_canonical, is_not_canonical, is_stacked, is_not_stacked,\ is_tertiary, is_not_stacked_or_tertiary, is_tertiary_base_base __author__ = "Greg Caporaso and Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" #============================================================================== # RNAVIEW OBJECTS TESTS #============================================================================== class BaseTests(TestCase): """Tests for Base class""" def test_init(self): """Base __init__: should initialize on standard data""" b = Base('A','30','G') self.assertEqual(b.ChainId, 'A') self.assertEqual(b.ResId, '30') self.assertEqual(b.ResName, 'G') #ResId or ResName can't be None or empty string self.assertRaises(RnaViewObjectError, Base, None, None, 'G') 
self.assertRaises(RnaViewObjectError, Base, '1', 'A', '') self.assertRaises(RnaViewObjectError, Base, None, '', 'C') #Can pass RnaViewSeqPos (str) b = Base('C','12','A','10') self.assertEqual(b.RnaViewSeqPos, '10') def test_str(self): """Base __str__: should return correct string""" b = Base('A','30','G') self.assertEqual(str(b), 'A 30 G') def test_eq(self): """Base ==: functions as expected """ # Define a standard to compare others b = Base('A','30','G') # Identical to b b_a = Base('A','30','G') # Differ in Chain from b b_b = Base('B','30','G') # Differ in ResId from b b_c = Base('A','25','G') # Differ in ResName from b b_d = Base('A','30','C') # Differ in RnaViewSeqPos b_e = Base('A','30','G','2') # Differ in everything from b b_e = Base('C','12','U','1') self.assertEqual(b == b, True) self.assertEqual(b_a == b, True) self.assertEqual(b_b == b, False) self.assertEqual(b_c == b, False) self.assertEqual(b_d == b, False) self.assertEqual(b_e == b, False) def test_ne(self): """Base !=: functions as expected""" # Define a standard to compare others b = Base('A','30','G') # Identical to b b_a = Base('A','30','G') # Differ in Chain from b b_b = Base('B','30','G') # Differ in ResId from b b_c = Base('A','25','G') # Differ in ResName from b b_d = Base('A','30','C') # Differ in everything from b b_e = Base('C','12','U') self.assertEqual(b != b, False) self.assertEqual(b_a != b, False) self.assertEqual(b_b != b, True) self.assertEqual(b_c != b, True) self.assertEqual(b_d != b, True) self.assertEqual(b_e != b, True) class BasePairTests(TestCase): """Tests for BasePair object""" def setUp(self): """setUp method for all tests in BasePairTests""" self.b1 = Base('A','30','G') self.b2 = Base('A','36','C') self.bp = BasePair(self.b1, self.b2, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation=None) def test_init(self): """BasePair __init__: should initialize on standard data""" self.failUnless(self.bp.Up is self.b1) self.failUnless(self.bp.Down is self.b2) 
self.failUnless(self.bp.Conformation is None) self.assertEqual(self.bp.Edges, 'H/W') self.assertEqual(self.bp.Orientation, 'cis') self.assertEqual(self.bp.Saenger, 'XI') def test_str(self): """BasePair __str__: should return correct string""" self.assertEqual(str(self.bp), "Bases: A 30 G -- A 36 C; Annotation: H/W -- cis -- "+\ "None -- XI;") def test_eq(self): "BasePair ==: should function as expected" # identical up = Base('A','30','G') down = Base('A','36','C') bp = BasePair(up, down, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation=None) self.assertEqual(bp == self.bp, True) # diff up base diff_up = Base('C','12','A') bp = BasePair(diff_up, down, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation=None) self.assertEqual(bp == self.bp, False) # diff down base diff_down = Base('D','13','U') bp = BasePair(up, diff_down, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation=None) self.assertEqual(bp == self.bp, False) # diff edges bp = BasePair(up, down, Edges='W/W', Saenger='XI',\ Orientation='cis', Conformation=None) self.assertEqual(bp == self.bp, False) # diff orientation bp = BasePair(up, down, Edges='H/W', Saenger='XI',\ Orientation='tran', Conformation=None) self.assertEqual(bp == self.bp, False) # diff conformation bp = BasePair(up, down, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation='syn') self.assertEqual(bp == self.bp, False) # diff saenger bp = BasePair(up, down, Edges='H/W', Saenger='XIX',\ Orientation='cis', Conformation=None) self.assertEqual(bp == self.bp, False) def test_ne(self): "BasePair !=: should function as expected" # identical up = Base('A','30','G') down = Base('A','36','C') bp = BasePair(up, down, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation=None) self.assertEqual(bp != self.bp, False) # diff up base diff_up = Base('C','12','A') bp = BasePair(diff_up, down, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation=None) self.assertEqual(bp != self.bp, True) # diff down base diff_down 
= Base('D','13','U') bp = BasePair(up, diff_down, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation=None) self.assertEqual(bp != self.bp, True) # diff edges bp = BasePair(up, down, Edges='W/W', Saenger='XI',\ Orientation='cis', Conformation=None) self.assertEqual(bp != self.bp, True) # diff orientation bp = BasePair(up, down, Edges='H/W', Saenger='XI',\ Orientation='tran', Conformation=None) self.assertEqual(bp != self.bp, True) # diff conformation bp = BasePair(up, down, Edges='H/W', Saenger='XI',\ Orientation='cis', Conformation='syn') self.assertEqual(bp != self.bp, True) # diff saenger bp = BasePair(up, down, Edges='H/W', Saenger='XIX',\ Orientation='cis', Conformation=None) self.assertEqual(bp != self.bp, True) def test_isWC(self): """BasePair isWC: should return True for GC or AU pair""" bp = BasePair(Base('A','30','G'), Base('A','36','C')) self.assertEqual(bp.isWC(), True) bp = BasePair(Base('A','30','g'), Base('A','36','C')) self.assertEqual(bp.isWC(), True) bp = BasePair(Base('A','30','C'), Base('A','36','G')) self.assertEqual(bp.isWC(), True) bp = BasePair(Base('A','30','U'), Base('A','36','a')) self.assertEqual(bp.isWC(), True) bp = BasePair(Base('A','30','G'), Base('A','36','U')) self.assertEqual(bp.isWC(), False) def test_isWobble(self): """BasePair isWobble: should return True for GU pair""" bp = BasePair(Base('A','30','G'), Base('A','36','U')) self.assertEqual(bp.isWobble(), True) bp = BasePair(Base('A','30','g'), Base('A','36','U')) self.assertEqual(bp.isWobble(), True) bp = BasePair(Base('A','30','u'), Base('A','36','g')) self.assertEqual(bp.isWobble(), True) bp = BasePair(Base('A','30','A'), Base('A','36','U')) self.assertEqual(bp.isWobble(), False) class BasePairsTests(TestCase): """Tests for BasePairs object""" def setUp(self): """setUp method for all BasePairs tests""" self.a1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX') self.a2 = BasePair(Base('A','31','A'), Base('A','35','U'), Saenger='XX') self.a3 = 
BasePair(Base('A','40','G'), Base('A','60','U'), Saenger='V') self.a4 = BasePair(Base('A','41','A'), Base('A','58','U'),\ Saenger=None) self.ab1 = BasePair(Base('A','41','A'), Base('B','58','U')) self.ac1 = BasePair(Base('A','10','C'), Base('C','3','G')) self.bc1 = BasePair(Base('B','41','A'), Base('C','1','U')) self.bn1 = BasePair(Base('B','41','A'), Base(None,'1','U')) self.cd1 = BasePair(Base('C','41','A'), Base('D','1','U')) self.bp1 = BasePair(Base('A','34','U'), Base('A','40','A')) self.bp2 = BasePair(Base('A','35','C'), Base('A','39','G')) self.bp3 = BasePair(Base('B','32','G'), Base('B','38','U')) self.bp4 = BasePair(Base('B','33','G'), Base('B','37','C')) self.bp5 = BasePair(Base('A','31','C'), Base('B','41','G')) self.bp6 = BasePair(Base('A','32','U'), Base('B','40','A')) self.bp7 = BasePair(Base('A','37','U'), Base('B','35','A')) self.pairs = BasePairs([self.bp1, self.bp2, self.bp3, self.bp4,\ self.bp5, self.bp6, self.bp7]) def test_init(self): """BasePairs __init__: should work with or without Model""" # init from list bps = BasePairs([self.a1, self.a2]) self.failUnless(bps[0] is self.a1) self.failUnless(bps[1] is self.a2) # init from tuple bps = BasePairs((self.a1, self.a2)) self.failUnless(bps[0] is self.a1) self.failUnless(bps[1] is self.a2) def test_str(self): """BasePairs __str__: should produce expected string""" b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX') b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\ Orientation='cis',Edges='W/W') bps = BasePairs([b1, b2]) exp_lines = [ "===================================================================",\ "Bases: Up -- Down; Annotation: Edges -- Orient. -- Conf. 
-- Saenger",\ "===================================================================",\ "Bases: A 30 G -- A 36 C; Annotation: None -- None -- None -- XX;",\ "Bases: A 31 A -- A 35 U; Annotation: W/W -- cis -- None -- None;"] self.assertEqual(str(bps), '\n'.join(exp_lines)) def test_select(self): """BasePairs select: should work with any good function""" def xx(bp): if bp.Saenger == 'XX': return True return False bps = BasePairs([self.a1, self.a2, self.a3, self.a4]) obs = bps.select(xx) self.assertEqual(len(obs), 2) self.failUnless(obs[0] is self.a1) self.failUnless(obs[1] is self.a2) for i in obs: self.assertEqual(i.Saenger, 'XX') def test_PresentChains(self): """BasePairs PresentChains: should work on single/multiple chain(s)""" bps = BasePairs([self.a1, self.a2, self.a3, self.a4]) self.assertEqual(bps.PresentChains, ['A']) bps = BasePairs([self.a1, self.ab1]) self.assertEqualItems(bps.PresentChains, ['A','B']) bps = BasePairs([self.a1, self.ab1, self.ac1, self.bc1]) self.assertEqualItems(bps.PresentChains, ['A','B', 'C']) bps = BasePairs([self.a1, self.ab1, self.bn1]) self.assertEqualItems(bps.PresentChains, [None, 'A','B']) def test_cliques(self): """BasePairs cliques: single/multiple chains and cliques""" #one chain, one clique bps = BasePairs([self.a1, self.a2, self.a3, self.a4]) obs_cl = list(bps.cliques()) self.assertEqual(len(obs_cl), 1) #3 chains, 2 cliques bps = BasePairs([self.a1, self.a2, self.cd1]) obs_cl = list(bps.cliques()) self.assertEqual(len(obs_cl), 2) self.assertEqual(len(obs_cl[0]), 2) self.assertEqual(len(obs_cl[1]), 1) self.failUnless(obs_cl[1][0] is self.cd1) self.assertEqual(obs_cl[1].PresentChains, ['C','D']) #5 chains, 1 clique bps = BasePairs([self.a1, self.ab1, self.bc1, self.bn1, self.cd1]) obs_cl = list(bps.cliques()) self.assertEqual(len(obs_cl), 1) self.assertEqual(len(obs_cl[0]), 5) self.failUnless(obs_cl[0][0] is self.a1) self.assertEqualItems(obs_cl[0].PresentChains, ['A','B','C','D', None]) def test_hasConflicts(self): 
"""BasePairs hadConflicts: handle chains and residue IDs""" # no conflict b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX') b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\ Orientation='cis',Edges='W/W') b3 = BasePair(Base('A','15','G'), Base('A','42','C')) bps = BasePairs([b1, b2, b3]) self.assertEqual(bps.hasConflicts(), False) self.assertEqual(bps.hasConflicts(return_conflict=True), (False, None)) # conflict within chain b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX') b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\ Orientation='cis',Edges='W/W') b3 = BasePair(Base('A','30','G'), Base('A','42','C')) bps = BasePairs([b1, b2, b3]) self.assertEqual(bps.hasConflicts(), True) # conflict within chain -- return conflict b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX') b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\ Orientation='cis',Edges='W/W') b3 = BasePair(Base('A','30','G'), Base('A','42','C')) bps = BasePairs([b1, b2, b3]) self.assertEqual(bps.hasConflicts(return_conflict=True),\ (True, "A 30 G")) # no conflict, same residue ID, different chain b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX') b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\ Orientation='cis',Edges='W/W') b3 = BasePair(Base('C','30','G'), Base('A','42','C')) bps = BasePairs([b1, b2, b3]) self.assertEqual(bps.hasConflicts(), False) class BaseMultipletTests(TestCase): """Tests for BaseMultiplet object""" def test_init(self): """BaseMultiplet __init__: should work as expected""" b1 = Base('A','30','A') b2 = Base('A','35','G') b3 = Base('A','360','U') bm = BaseMultiplet([b1, b2, b3]) self.failUnless(bm[0] is b1) self.failUnless(bm[2] is b3) #should work from tuple also bm = BaseMultiplet((b1, b2, b3)) self.failUnless(bm[0] is b1) self.failUnless(bm[2] is b3) def test_str(self): """BaseMultiplet __str__: should give expected string""" b1 = Base('A','30','A') b2 = Base('A','35','G') b3 = 
Base('A','360','U') bm = BaseMultiplet([b1, b2, b3]) exp = "A 30 A -- A 35 G -- A 360 U;" self.assertEqual(str(bm), exp) class BaseMultipletsTests(TestCase): """Tests for BaseMultiplets object""" def test_init(self): """BaseMultiplets __init__: from list and tuple""" b1 = Base('A','30','A') b2 = Base('A','35','G') b3 = Base('A','360','U') bm1 = BaseMultiplet([b1, b2, b3]) b4 = Base('B','12','C') b5 = Base('B','42','U') b6 = Base('C','2','A') bm2 = BaseMultiplet([b4, b5, b6]) bms = BaseMultiplets([bm1, bm2]) self.failUnless(bms[0] is bm1) self.failUnless(bms[1] is bm2) self.assertEqual(bms[1][2].ResId, '2') #should work from tuple also bms = BaseMultiplets((bm1, bm2)) self.failUnless(bms[0] is bm1) self.failUnless(bms[1] is bm2) self.assertEqual(bms[1][2].ResId, '2') def test_str(self): """BaseMultiplets __str__: should give expected string""" b1 = Base('A','30','A') b2 = Base('A','35','G') b3 = Base('A','360','U') bm1 = BaseMultiplet([b1, b2, b3]) b4 = Base('B','12','C') b5 = Base('B','42','U') b6 = Base('C','2','A') bm2 = BaseMultiplet([b4, b5, b6]) bms = BaseMultiplets([bm1, bm2]) exp_lines = [\ "A 30 A -- A 35 G -- A 360 U;",\ "B 12 C -- B 42 U -- C 2 A;"] self.assertEqual(str(bms), '\n'.join(exp_lines)) class TestPairCounts(TestCase): """Tests for PairCounts object. Contains only a test for __init__. Everything should function as a dict. 
""" def test_init(self): """PairCounts __init__: should work as dict""" res = PairCounts(\ {'Standard':1, 'WS--cis':300, 'Bifurcated': 2, 'HS-tran':0}) self.assertEqual(res['Standard'], 1) self.assertEqual(res['WS--cis'], 300) self.assertEqual(res['Bifurcated'], 2) self.assertEqual(res['HS-tran'], 0) #============================================================================== # SELECTION FUNCTIONS TESTS #============================================================================== class SelectionFunctionTests(TestCase): def test_in_chain(self): b1, b2 = Base('A','30','C'), Base('A','40','G') bp1 = BasePair(b1, b2) self.assertEqual(in_chain("A")(bp1), True) self.assertEqual(in_chain(["A",'B'])(bp1), True) self.assertEqual(in_chain("B")(bp1), False) b3, b4 = Base('A','30','C'), Base('B','40','G') bp2 = BasePair(b3,b4) self.assertEqual(in_chain("A")(bp2), False) self.assertEqual(in_chain(["A",'B'])(bp2), True) self.assertEqual(in_chain("AB")(bp2), True) self.assertEqual(in_chain("AC")(bp2), False) b5, b6 = Base('A','30','C'), Base('C','40','G') bp3 = BasePair(b5,b6) self.assertEqual(in_chain("A")(bp3), False) self.assertEqual(in_chain("C")(bp3), False) self.assertEqual(in_chain(["A",'B'])(bp3), False) self.assertEqual(in_chain("AC")(bp3), True) def test_is_canocical(self): """is_canonical: work on annotation, not base identity""" b1, b2 = Base('A','30','C'), Base('A','40','G') bp = BasePair(b1, b2, Edges='+/+') self.assertEqual(is_canonical(bp), True) bp = BasePair(b1, b2, Edges='-/-') self.assertEqual(is_canonical(bp), True) bp = BasePair(b1, b2, Edges='W/W') self.assertEqual(is_canonical(bp), False) bp = BasePair(b1, b2, Edges='W/W', Orientation='cis',Saenger='XXVIII') self.assertEqual(is_canonical(bp), True) def test_is_not_canocical(self): """is_not_canonical: opposite of is_canonical""" b1, b2 = Base('A','30','C'), Base('A','40','G') bp = BasePair(b1, b2, Edges='+/+') self.assertEqual(is_not_canonical(bp), False) bp = BasePair(b1, b2, Edges='-/-') 
self.assertEqual(is_not_canonical(bp), False) bp = BasePair(b1, b2, Edges='W/W') self.assertEqual(is_not_canonical(bp), True) bp = BasePair(b1, b2, Edges='W/W', Orientation='cis',Saenger='XXVIII') self.assertEqual(is_not_canonical(bp), False) def test_is_stacked(self): """is_stacked: checks annotation, not base identity""" b1, b2 = Base('A','30','C'), Base('A','40','A') bp = BasePair(b1, b2, Edges='stacked') self.assertEqual(is_stacked(bp), True) bp = BasePair(b1, b2, Edges='H/?') self.assertEqual(is_stacked(bp), False) def test_is_not_stacked(self): """is_not_stacked: opposite of is_stacked""" b1, b2 = Base('A','30','C'), Base('A','40','A') bp = BasePair(b1, b2, Edges='stacked') self.assertEqual(is_not_stacked(bp), False) bp = BasePair(b1, b2, Edges='H/?') self.assertEqual(is_not_stacked(bp), True) def test_is_tertiary(self): """is_tertiary: checks annotation, not base identity""" b1, b2 = Base('A','30','C'), Base('A','40','U') bp = BasePair(b1, b2, Saenger='!1H(b_b)') self.assertEqual(is_tertiary(bp), True) bp = BasePair(b1, b2, Edges='H/?', Saenger='XX') self.assertEqual(is_tertiary(bp), False) bp = BasePair(b1,b2, Edges='stacked') self.assertEqual(is_tertiary(bp), False) def test_is_not_stacked_or_tertiary(self): """is_not_stacked_or_tertiary: checks annotation, not base identity""" b1, b2 = Base('A','30','C'), Base('A','40','U') bp = BasePair(b1, b2, Saenger='!1H(b_b)') self.assertEqual(is_not_stacked_or_tertiary(bp), False) bp = BasePair(b1, b2, Edges='stacked') self.assertEqual(is_not_stacked_or_tertiary(bp), False) bp = BasePair(b1, b2, Edges='W/W', Saenger='XX') self.assertEqual(is_not_stacked_or_tertiary(bp), True) def test_is_tertiary_base_base(self): """is_tertiary_base_base: checks annotation, not base identity""" b1, b2 = Base('A','30','C'), Base('A','40','U') bp = BasePair(b1, b2, Saenger='!1H(b_b)') self.assertEqual(is_tertiary_base_base(bp), True) bp = BasePair(b1, b2, Edges='H/?', Saenger='!(s_s)') self.assertEqual(is_tertiary_base_base(bp), 
False) #============================================================================== # RNAVIEW PARSER TESTS #============================================================================== class RnaviewParserTests(TestCase): """Tests for RnaviewParser and related code""" def test_is_roman_numeral(self): """is_roman_numeral: should work for all, including comma""" self.assertEqual(is_roman_numeral('XIII'),True) self.assertEqual(is_roman_numeral('Xiii'),False) self.assertEqual(is_roman_numeral('MMCDXXVIII'),True) self.assertEqual(is_roman_numeral('XII,XIII'),True) self.assertEqual(is_roman_numeral('n/a'),False) def test_is_edge(self): """is_edge: should identify valid edges correctly""" self.assertEqual(is_edge('H/W'),True) self.assertEqual(is_edge('./W'),True) self.assertEqual(is_edge('+/+'),True) self.assertEqual(is_edge(' '),False) self.assertEqual(is_edge('P/W'),False) self.assertEqual(is_edge('X/W'),True) self.assertEqual(is_edge('X/X'),True) def test_is_orientation(self): """is_orientation: should fail on anything but 'cis' or 'tran'""" self.assertEqual(is_orientation('cis'),True) self.assertEqual(is_orientation('tran'),True) self.assertEqual(is_orientation('tranxxx'),False) def test_parse_annotation(self): """parse_annotation: should return correct tuple of 4 or raise error """ self.assertEqual(parse_annotation(['W/S', 'tran', 'syn', 'syn',\ 'n/a']), ('W/S','tran','syn syn','n/a')) self.assertEqual(parse_annotation(['syn','stacked']),\ ('stacked', None,'syn',None)) self.assertEqual(parse_annotation(['W/W','tran','syn','XII,XIII']),\ ('W/W', 'tran','syn','XII,XIII')) self.assertEqual(parse_annotation(['./W','cis','!1H(b_b)']),\ ('./W', 'cis',None,'!1H(b_b)')) self.assertEqual(parse_annotation([]),\ (None, None, None, None)) self.assertRaises(RnaViewParseError, parse_annotation, ['X--X']) def test_parse_filename(self): """parse_filename: should return name of file""" lines = ["PDB data file name: pdb1t4l.ent_nmr.pdb"] self.assertEqual(parse_filename(lines), 
'pdb1t4l.ent_nmr.pdb') lines = ["PDB data file name: pdb1t4l.ent_nmr.pdb","other line"] self.assertRaises(RnaViewParseError, parse_filename, lines) def test_parse_uncommon_residues(self): """parse_uncommon_residues: should fail on some missing residue info """ lines = UC_LINES.split('\n') self.assertEqual(parse_uncommon_residues(lines),\ {('D','16','TLN'):'u',('D','17','LCG'):'g',\ ('0','2588','OMG'):'g',(' ','2621','PSU'):'P'}) for l in UC_LINES_WRONG.split('\n'): self.assertRaises(RnaViewParseError, parse_uncommon_residues, [l]) def test_parse_base_pairs_basic(self): """parse_base_pairs: basic input""" basic_lines =\ ['25_437, 0: 34 C-G 448 0: +/+ cis XIX',\ '26_436, 0: 35 U-A 447 0: -/- cis XX'] bp1 = BasePair(Up=Base('0','34','C','25'),\ Down=Base('0','448','G','437'),\ Edges='+/+', Orientation='cis',Conformation=None,Saenger='XIX') bp2 = BasePair(Up=Base('0','35','U','26'),\ Down=Base('0','447','A','436'),\ Edges='-/-', Orientation='cis',Conformation=None,Saenger='XX') bps = BasePairs([bp1,bp2]) obs = parse_base_pairs(basic_lines) for o,e in zip(obs,[bp1,bp2]): self.assertEqual(o,e) self.assertEqual(len(obs), 2) basic_lines =\ ['25_437, 0: 34 c-P 448 0: +/+ cis XIX',\ '26_436, 0: 35 U-X 447 0: -/- cis XX'] self.assertRaises(RnaViewParseError, parse_base_pairs, basic_lines) basic_lines =\ ['25_437, 0: 34 c-P 448 0: +/+ cis XIX',\ '26_436, 0: 35 I-A 447 0: -/- cis XX'] bp1 = BasePair(Up=Base('0','34','c','25'),\ Down=Base('0','448','P','437'),\ Edges='+/+', Orientation='cis',Conformation=None,Saenger='XIX') bp2 = BasePair(Up=Base('0','35','I','26'),\ Down=Base('0','447','A','436'),\ Edges='-/-', Orientation='cis',Conformation=None,Saenger='XX') bps = BasePairs([bp1,bp2]) obs = parse_base_pairs(basic_lines) for o,e in zip(obs,[bp1,bp2]): self.assertEqual(o,e) self.assertEqual(len(obs), 2) lines = ['1_2, : 6 G-G 7 : stacked',\ '1_16, : 6 G-C 35 : +/+ cis XIX'] bp1 = BasePair(Up=Base(' ','6','G','1'),\ Down=Base(' ','7','G','2'), Edges='stacked') bp2 = 
BasePair(Up=Base(' ','6','G','1'),\ Down=Base(' ','35','C','16'),\ Edges='+/+', Orientation='cis',Conformation=None,Saenger='XIX') obs = parse_base_pairs(lines) for o,e in zip(obs,[bp1,bp2]): self.assertEqual(o,e) self.assertEqual(len(obs), 2) def test_parse_base_multiplets_basic(self): """parse_base_multiplets: basic input""" basic_lines =\ ['235_237_254_| [20 3] 0: 246 G + 0: 248 A + 0: 265 U',\ '273_274_356_| [21 3] 0: 284 C + 0: 285 A + 0: 367 G'] bm1 = BaseMultiplet([Base('0','246','G','235'),\ Base('0','248','A','237'), Base('0','265','U','254')]) bm2 = BaseMultiplet([Base('0','284','C','273'),\ Base('0','285','A','274'), Base('0','367','G','356')]) bms = BaseMultiplets([bm1,bm2]) obs = parse_base_multiplets(basic_lines) for o,e in zip(obs,bms): for base_x, base_y in zip(o,e): self.assertEqual(base_x,base_y) self.assertEqual(len(obs), 2) self.assertEqual(len(obs[0]), 3) basic_lines =\ ['235_237_254_| [20 3] 0: 246 G + 0: 248 A + 0: 265 I',\ '273_274_356_| [21 3] 0: 284 P + 0: 285 a + 0: 367 G'] bm1 = BaseMultiplet([Base('0','246','G','235'),\ Base('0','248','A','237'), Base('0','265','I','254')]) bm2 = BaseMultiplet([Base('0','284','P','273'),\ Base('0','285','a','274'), Base('0','367','G','356')]) bms = BaseMultiplets([bm1,bm2]) obs = parse_base_multiplets(basic_lines) for o,e in zip(obs,bms): for base_x, base_y in zip(o,e): self.assertEqual(base_x,base_y) self.assertEqual(len(obs), 2) self.assertEqual(len(obs[0]), 3) def test_parse_base_multiplets_errors(self): """parse_base_multiplets: error checking""" # Unknown base basic_lines =\ ['235_237_254_| [20 3] 0: 246 X + 0: 248 A + 0: 265 U',\ '273_274_356_| [21 3] 0: 284 C + 0: 285 A + 0: 367 G'] self.assertRaises(RnaViewParseError, parse_base_multiplets,\ basic_lines) # number of rnaview_seqpos doesn't match number of bases basic_lines =\ ['235_237_| [20 3] 0: 246 X + 0: 248 A + 0: 265 U',\ '273_274_356_| [21 3] 0: 284 C + 0: 285 A + 0: 367 G'] self.assertRaises(RnaViewParseError, parse_base_multiplets,\ 
basic_lines) # Number of reported bases incorrect basic_lines =\ ['235_237_254_| [20 3] 0: 246 X + 0: 248 A + 0: 265 U',\ '273_274_356_| [21 5] 0: 284 C + 0: 285 A + 0: 367 G'] self.assertRaises(RnaViewParseError, parse_base_multiplets,\ basic_lines) def test_parse_number_of_pairs(self): """parse_number_of_pairs: good/bad input""" lines = ["The total base pairs = 31 (from 65 bases)"] exp = {'NUM_PAIRS':31, 'NUM_BASES':65} self.assertEqual(parse_number_of_pairs(lines), exp) lines = ["The total base pairs = 31 (from 65 bases)","XXX"] self.assertRaises(RnaViewParseError, parse_number_of_pairs, lines) lines = ["The total base pairs = 31(from 65 bases)"] self.assertRaises(RnaViewParseError, parse_number_of_pairs, lines) def test_parse_pair_counts(self): """parse_pair_counts: should work for even number of lines""" lines = PC_COUNTS1.split('\n') res = parse_pair_counts(lines) self.assertEqual(res['Standard'], 1) self.assertEqual(res['WS--cis'], 300) self.assertEqual(res['Bifurcated'], 2) self.assertEqual(res['HS-tran'], 0) lines = PC_COUNTS2.split('\n') res = parse_pair_counts(lines) self.assertEqual(res['Standard'], 19) self.assertEqual(res['WW-tran'], 1) self.assertEqual(res['HS-tran'], 0) self.failIf('Bifurcated' in res) self.assertRaises(RnaViewParseError, parse_pair_counts,\ PC_COUNTS2.split('\n')[:-1]) self.assertEqual(parse_pair_counts([]),{}) def test_verify_bp_counts(self): """verify_bp_counts: should raise an error if bp counts are wrong""" lines = RNAVIEW_PDB_REAL.split('\n') obs = RnaviewParser(lines) # this shouldn't raise an error verify_bp_counts(obs['BP'],11,obs['PC']) # reported number isn't right self.assertRaises(RnaViewParseError,\ verify_bp_counts, obs['BP'], 12, obs['PC']) # No longer checks for the base pair counts reported in the # dictionary, b/c this number doesn't match the total when # modified bases are present. 
## PREVIOUS TEST: # pair_counts isn't right #obs['PC']['Standard'] = 14 #self.assertRaises(RnaViewParseError,\ # verify_bp_counts, obs['BP'], 11, obs['PC']) ## NEW TEST obs['PC']['Standard'] = 14 verify_bp_counts(obs['BP'],11,obs['PC']) def test_MinimalRnaviewParser(self): """MinimalRnaviewParser: should divide lines into right classes""" exp = {'FN': ['PDB data file name: 1EHZ.pdb'], 'UC':\ ['uncommon residue I 1 on chain A [#1] assigned to: I', 'uncommon residue 2MG 10 on chain A [#10] assigned to: g'], 'BP':['1_72, A: 1 I-C 72 A: X/X cis n/a', '58_60, A: 58 a-C 60 A: S/S tran syn !(s_s)'], 'BM':['9_12_23_| [1 3] A: 9 A + A: 12 U + A: 23 A', '13_22_46_| [2 3] A: 13 C + A: 22 G + A: 46 g'], 'PC':['Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran', '19 3 1 0 1 0 0', 'WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran', '0 3 0 2 0 0'], 'NP':['The total base pairs = 30 (from 76 bases)']} obs = MinimalRnaviewParser(RNAVIEW_LINES.split('\n')) self.assertEqual(len(obs), len(exp)) self.assertEqual(obs, exp) def test_MinimalRnaviewParser_short(self): """MinimalRnaviewParser: should leave lists empty if no lines found""" lines = RNAVIEW_LINES_SHORT.split('\n') res = MinimalRnaviewParser(lines) self.assertEqual(len(res['FN']), 1) self.assertEqual(res['UC'], []) self.assertEqual(res['BM'], []) self.assertEqual(len(res['PC']), 4) self.assertEqual(len(res['BP']), 11) self.assertEqual(len(res['NP']), 1) def test_RnaviewParser(self): """RnaviewParser: should work with/without model and/or verification """ rnaview_lines = RNAVIEW_PDB_REAL.split('\n') obs = RnaviewParser(rnaview_lines) self.assertEqual(obs['FN'], 'pdb430d.ent') self.assertEqual(len(obs['UC']),1) self.assertEqual(len(obs['BP']),19) self.assertEqual(len(obs['BM']),0) self.assertEqual(obs['BM'],BaseMultiplets()) self.assertEqual(obs['PC']['Standard'],7) self.assertEqual(obs['BP'][2].Down.ResName,'c') self.assertEqual(obs['BP'][6].Edges,'stacked') self.assertEqual(obs['NP']['NUM_PAIRS'], 11) 
self.assertEqual(obs['NP']['NUM_BASES'], 29) def test_RnaviewParser_error(self): """RnaviewParser: strict or not""" lines = RNAVIEW_LINES_ERROR.split('\n') self.assertRaises(RnaViewParseError, RnaviewParser, lines, strict=True) obs = RnaviewParser(lines, strict=False) self.assertEqual(obs['NP'], None) self.assertEqual(obs['BP'][1].Up.ResId, '2') self.assertEqual(obs['PC']['Standard'], 6) RNAVIEW_LINES=\ """PDB data file name: 1EHZ.pdb uncommon residue I 1 on chain A [#1] assigned to: I uncommon residue 2MG 10 on chain A [#10] assigned to: g BEGIN_base-pair 1_72, A: 1 I-C 72 A: X/X cis n/a 58_60, A: 58 a-C 60 A: S/S tran syn !(s_s) END_base-pair Summary of triplets and higher multiplets BEGIN_multiplets 9_12_23_| [1 3] A: 9 A + A: 12 U + A: 23 A 13_22_46_| [2 3] A: 13 C + A: 22 G + A: 46 g END_multiplets The total base pairs = 30 (from 76 bases) ------------------------------------------------ Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran 19 3 1 0 1 0 0 WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran 0 3 0 2 0 0 ------------------------------------------------ """ RNAVIEW_LINES_SHORT=\ """PDB data file name: pdb17ra.ent_nmr.pdb BEGIN_base-pair 1_21, : 1 G-C 21 : +/+ cis XIX 2_20, : 2 G-C 20 : +/+ cis XIX 3_19, : 3 C-G 19 : +/+ cis XIX 4_18, : 4 G-U 18 : W/W cis XXVIII 5_6, : 5 U-A 6 : stacked 5_17, : 5 U-A 17 : -/- cis XX 7_16, : 7 A-U 16 : W/W cis n/a 8_15, : 8 G-C 15 : +/+ cis XIX 9_14, : 9 G-C 14 : +/+ cis XIX 10_14, : 10 A-C 14 : stacked 11_13, : 11 U-A 13 : S/W tran n/a END_base-pair The total base pairs = 9 (from 21 bases) ------------------------------------------------ Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran 6 2 0 0 0 0 0 WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran 0 0 0 1 0 0 ------------------------------------------------""" PC_COUNTS1=\ """ Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran 1 0 0 0 0 0 0 WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran 12 0 300 0 0 0 Single-bond Bifurcated 0 2""" PC_COUNTS2=\ """ 
Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran 19 3 1 0 1 0 0 WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran 0 3 0 2 0 0""" UC_LINES=\ """uncommon residue TLN 16 on chain D [#16] assigned to: u uncommon residue LCG 17 on chain D [#17] assigned to: g uncommon residue OMG 2588 on chain 0 [#2430] assigned to: g uncommon residue PSU 2621 on chain [#2463] assigned to: P""" UC_LINES_WRONG=\ """uncommon residue 16 on chain D [#16] assigned to: u uncommon residue LCG on chain D [#17] assigned to: g uncommon residue OMG 2588 on chain 0 [#2430] assigned to: """ RNAVIEW_LINES_TOTAL=\ """PDB data file name: 1EHZ.pdb uncommon residue PSU 1 on chain A [#1] assigned to: P uncommon residue 2MG 10 on chain A [#10] assigned to: g BEGIN_base-pair 1_72, A: 1 P-C 20 A: +/+ cis n/a 1_72, A: 2 A-U 19 A: H/W cis n/a 58_60, A: 10 a-U 60 A: S/S tran syn !(s_s) END_base-pair Summary of triplets and higher multiplets BEGIN_multiplets 9_12_23_| [1 3] A: 1 P + A: 2 U + A: 60 U 13_22_46_| [2 3] A: 20 C + A: 19 U + A: 60 U END_multiplets The total base pairs = 30 (from 76 bases) ------------------------------------------------ Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran 19 3 1 0 1 0 0 WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran 0 3 0 2 0 0 ------------------------------------------------ """ RNAVIEW_PDB_REAL=\ """PDB data file name: pdb430d.ent uncommon residue +C 27 on chain A [#27] assigned to: c BEGIN_base-pair 1_29, A: 1 G-C 29 A: +/+ cis XIX 2_28, A: 2 G-C 28 A: +/+ cis XIX 3_27, A: 3 G-c 27 A: +/+ cis XIX 4_26, A: 4 U-A 26 A: -/- cis XX 5_25, A: 5 G-C 25 A: +/+ cis XIX 6_24, A: 6 C-G 24 A: +/+ cis XIX 7_8, A: 7 U-C 8 A: stacked 8_9, A: 8 C-A 9 A: stacked 9_21, A: 9 A-A 21 A: H/H tran II 11_20, A: 11 U-A 20 A: W/H tran XXIV 12_19, A: 12 A-G 19 A: H/S tran XI 13_18, A: 13 C-G 18 A: +/+ cis XIX 14_17, A: 14 G-A 17 A: S/H tran XI 25_26, A: 25 C-A 26 A: stacked 7_23, A: 7 U-C 23 A: W/W tran !1H(b_b) 8_23, A: 8 C-C 23 A: S/H cis !1H(b_b). 
10_11, A: 10 G-U 11 A: S/H cis !1H(b_b) 11_21, A: 11 U-A 21 A: W/H tran !1H(b_b) 10_20, A: 10 G-A 20 A: W/. tran !(s_s) END_base-pair The total base pairs = 11 (from 29 bases) ------------------------------------------------ Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran 7 0 0 0 1 0 0 WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran 0 1 0 0 0 2 ------------------------------------------------""" RNAVIEW_LINES_ERROR=\ """PDB data file name: pdb17ra.ent_nmr.pdb BEGIN_base-pair 1_21, : 1 G-C 21 : +/+ cis XIX 2_20, : 2 G-C 20 : +/+ cis XIX 3_19, : 3 C-G 19 : +/+ cis XIX 4_18, : 4 G-U 18 : W/W cis XXVIII 5_6, : 5 U-A 6 : stacked 5_17, : 5 U-A 17 : -/- cis XX 7_16, : 7 A-U 16 : W/W cis n/a 8_15, : 8 G-C 15 : +/+ cis XIX 9_14, : 9 G-C 14 : +/+ cis XIX 10_14, : 10 A-C 14 : stacked 11_13, : 11 U-A 13 : S/W tran n/a END_base-pair The total base pairs = 9(from 21 bases) ------------------------------------------------ Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran 6 2 0 0 0 0 0 WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran 0 0 0 1 0 0 ------------------------------------------------""" #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_sprinzl.py #!/usr/bin/env python #file evo/parsers/test_sprinzl.py """Unit tests for the Sprinzl tRNA database parser. 
""" from string import strip from cogent.parse.sprinzl import OneLineSprinzlParser, GenomicSprinzlParser,\ _fix_sequence, get_pieces, get_counts, sprinzl_to_vienna from cogent.util.unit_test import TestCase, main __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Jeremy Widmann", "Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" sample_file = """Accession@@@AA@@@Anticodon@@@Species@@@Strain@@@0@@@1@@@2@@@3@@@4@@@5@@@6@@@7@@@8@@@9@@@10@@@11@@@12@@@13@@@14@@@15@@@16@@@17@@@17A@@@18@@@19@@@20@@@20A@@@20B@@@21@@@22@@@23@@@24@@@25@@@26@@@27@@@28@@@29@@@30@@@31@@@32@@@33@@@34@@@35@@@36@@@37@@@38@@@39@@@40@@@41@@@42@@@43@@@44@@@45@@@e11@@@e12@@@e13@@@e14@@@e15@@@e16@@@e17@@@e1@@@e2@@@e3@@@e4@@@e5@@@e27@@@e26@@@e25@@@e24@@@e23@@@e22@@@e21@@@46@@@47@@@48@@@49@@@5.@@@51@@@52@@@53@@@54@@@55@@@56@@@57@@@58@@@59@@@60@@@61@@@62@@@63@@@64@@@65@@@66@@@67@@@68@@@69@@@7.@@@71@@@72@@@73@@@74@@@75@@@76 GA0000001@@@Ala@@@TGC@@@Haemophilus influenzae@@@Rd KW20@@@-@@@G@@@G@@@G@@@G@@@C@@@C@@@T@@@T@@@A@@@G@@@C@@@T@@@C@@@A@@@G@@@C@@@T@@@-@@@G@@@G@@@G@@@-@@@-@@@A@@@G@@@A@@@G@@@C@@@G@@@C@@@C@@@T@@@G@@@C@@@T@@@T@@@T@@@G@@@C@@@A@@@C@@@G@@@C@@@A@@@G@@@G@@@A@@@G@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@G@@@T@@@C@@@A@@@G@@@C@@@G@@@G@@@T@@@T@@@C@@@G@@@A@@@T@@@C@@@C@@@C@@@G@@@C@@@T@@@A@@@G@@@G@@@C@@@T@@@C@@@C@@@A@@@-@@@-@@@- GA0000002@@@Ala@@@GGC@@@Chlamydia pneumoniae @@@AR39@@@-@@@G@@@G@@@G@@@G@@@T@@@A@@@T@@@T@@@A@@@G@@@C@@@T@@@C@@@A@@@G@@@T@@@T@@@-@@@G@@@G@@@T@@@-@@@-@@@A@@@G@@@A@@@G@@@C@@@G@@@C@@@A@@@A@@@C@@@A@@@A@@@T@@@G@@@G@@@C@@@A@@@T@@@T@@@G@@@T@@@T@@@G@@@A@@@G@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@G@@@T@@@C@@@A@@@G@@@C@@@G@@@G@@@T@@@T@@@C@@@G@@@A@@@C@@@C@@@C@@@C@@@G@@@C@@@T@@@A@@@T@@@G@@@C@@@T@@@C@@@C@@@-@@@-@@@-@@@- GA0000003@@@Ala@@@TGC@@@Chlamydia 
pneumoniae @@@AR39@@@-@@@G@@@G@@@G@@@G@@@A@@@C@@@T@@@T@@@A@@@G@@@C@@@T@@@T@@@A@@@G@@@T@@@T@@@-@@@G@@@G@@@T@@@-@@@-@@@A@@@G@@@A@@@G@@@C@@@G@@@T@@@C@@@T@@@G@@@A@@@T@@@T@@@T@@@G@@@C@@@A@@@T@@@T@@@C@@@A@@@G@@@A@@@A@@@G@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@G@@@T@@@C@@@A@@@G@@@G@@@A@@@G@@@T@@@T@@@C@@@G@@@A@@@A@@@T@@@C@@@T@@@C@@@C@@@T@@@A@@@G@@@T@@@C@@@T@@@C@@@C@@@-@@@-@@@-@@@-""" sample_lines = ['\t'.join(i.split('@@@')) for i in sample_file.split('\n')] class OneLineSprinzlParserTests(TestCase): """Tests of OneLineSprinzlParser""" def setUp(self): """standard tRNA file""" self.tRNAs = sample_lines #open('data_sprinzl.txt').read().split('\n') def test_minimal(self): """OneLineSprinzlParser should work on a minimal 'file'""" small = ['acc\taa\tac\tsp\tst\ta\tb\tc','q\tw\te\tr\tt\tA\tC\tG'] p = OneLineSprinzlParser(small) result = list(p) self.assertEqual(len(result), 1) self.assertEqual(result[0], 'ACG') self.assertEqual(result[0].Info.Accession, 'q') self.assertEqual(result[0].Info.AA, 'w') self.assertEqual(result[0].Info.Anticodon, 'e') self.assertEqual(result[0].Info.Species, 'r') self.assertEqual(result[0].Info.Strain, 't') def test_init(self): """OneLineSprinzlParser should read small file correctly""" p = OneLineSprinzlParser(self.tRNAs) recs = list(p) self.assertEqual(len(recs), 3) first, second, third = recs assert first.Info.Labels is second.Info.Labels assert first.Info.Labels is third.Info.Labels expected_label_list = "0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 17A 18 19 20 20A 20B 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 e11 e12 e13 e14 e15 e16 e17 e1 e2 e3 e4 e5 e27 e26 e25 e24 e23 e22 e21 46 47 48 49 5. 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 7. 
71 72 73 74 75 76".split() exp_labels = {} for i, label in enumerate(expected_label_list): exp_labels[label] = i self.assertEqual(first.Info.Labels, exp_labels) self.assertEqual(first.Info.Accession, 'GA0000001') self.assertEqual(first.Info.AA, 'Ala') self.assertEqual(first.Info.Anticodon, 'TGC') self.assertEqual(first.Info.Species, 'Haemophilus influenzae') self.assertEqual(first.Info.Strain, 'Rd KW20') self.assertEqual(first, '-GGGGCCTTAGCTCAGCT-GGG--AGAGCGCCTGCTTTGCACGCAGGAG-------------------GTCAGCGGTTCGATCCCGCTAGGCTCCA---'.replace('T','U')) self.assertEqual(third.Info.Accession, 'GA0000003') self.assertEqual(third.Info.AA, 'Ala') self.assertEqual(third.Info.Anticodon, 'TGC') self.assertEqual(third.Info.Species, 'Chlamydia pneumoniae') self.assertEqual(third.Info.Strain, 'AR39') self.assertEqual(third, '-GGGGACTTAGCTTAGTT-GGT--AGAGCGTCTGATTTGCATTCAGAAG-------------------GTCAGGAGTTCGAATCTCCTAGTCTCC----'.replace('T','U')) genomic_sample = """3\t5950\tsequences\t\t0\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t13\t14\t15\t16\t17\t17A\t18\t19\t20\t20A\t20B\t21\t22\t23\t24\t25\t26\t27\t28\t29\t30\t31\t32\t33\t34\t35\t36\t37\t38\t39\t40\t41\t42\t43\t44\t45\te11\te12\te13\te14\te15\te16\te17\te1\te2\te3\te4\te5\te27\te26\te25\te24\te23\te22\te21\t46\t47\t48\t49\t50\t51\t52\t53\t54\t55\t56\t57\t58\t59\t60\t61\t62\t63\t64\t65\t66\t67\t68\t69\t70\t71\t72\t73\t74\t75\t76\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t 2\tGA0000001\tAla\t\tTGC\t\tHaemophilus influenzae\t\t\t\t\t\t\t\t\t\tRd KW20\t\t\t\tBacteria; Proteobacteria; gamma subdivision; Pasteurellaceae; Haemophilus\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t 
\t\t\t\t-\tG\tG\tG\tG\tC\tC\tT\tT\tA\tG\tC\tT\tC\tA\tG\tC\tT\t-\tG\tG\tG\t-\t-\tA\tG\tA\tG\tC\tG\tC\tC\tT\tG\tC\tT\tT\tT\tG\tC\tA\tC\tG\tC\tA\tG\tG\tA\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\tG\tT\tC\tA\tG\tC\tG\tG\tT\tT\tC\tG\tA\tT\tC\tC\tC\tG\tC\tT\tA\tG\tG\tC\tT\tC\tC\tA\t-\t-\t-\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t\t\t\t=\t=\t*\t=\t=\t=\t=\t\t\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t=\t=\t=\t=\t*\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t 3\tGA0000002\tAla\t\tGGC\t\tChlamydia pneumoniae \t\t\t\t\t\t\t\t\t\tAR39\t\t\t\tBacteria; Chlamydiales; Chlamydiaceae; Chlamydophila; Chlamydophila pneumoniae\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t\t\t-\tG\tG\tG\tG\tT\tA\tT\tT\tA\tG\tC\tT\tC\tA\tG\tT\tT\t-\tG\tG\tT\t-\t-\tA\tG\tA\tG\tC\tG\tC\tA\tA\tC\tA\tA\tT\tG\tG\tC\tA\tT\tT\tG\tT\tT\tG\tA\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\tG\tT\tC\tA\tG\tC\tG\tG\tT\tT\tC\tG\tA\tC\tC\tC\tC\tG\tC\tT\tA\tT\tG\tC\tT\tC\tC\t-\t-\t-\t-\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t\t\t\t=\t=\t*\t=\t*\t=\t=\t\t\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t=\t=\t*\t=\t*\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t 4\tGA0000003\tAla\t\tTGC\t\tChlamydia pneumoniae \t\t\t\t\t\t\t\t\t\tAR39\t\t\t\tBacteria; Chlamydiales; Chlamydiaceae; Chlamydophila; Chlamydophila pneumoniae\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t 
\t\t\t\t-\tG\tG\tG\tG\tA\tC\tT\tT\tA\tG\tC\tT\tT\tA\tG\tT\tT\t-\tG\tG\tT\t-\t-\tA\tG\tA\tG\tC\tG\tT\tC\tT\tG\tA\tT\tT\tT\tG\tC\tA\tT\tT\tC\tA\tG\tA\tA\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\tG\tT\tC\tA\tG\tG\tA\tG\tT\tT\tC\tG\tA\tA\tT\tC\tT\tC\tC\tT\tA\tG\tT\tC\tT\tC\tC\t-\t-\t-\t-\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t\t\t\t=\t=\t*\t=\t=\t=\t=\t\t\t=\t=\t=\t*\t\t\t\t\t\t\t\t\t\t\t\t*\t=\t=\t=\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t=\t=\t=\t=\t*\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t""".split('\n') class GenomicSprinzlParserTests(TestCase): """Tests of the GenomicSprinzlParser class.""" def test_single(self): """GenomicSprinzlParser should work with single sequence""" seqs = list(GenomicSprinzlParser(genomic_sample[0:4])) self.assertEqual(len(seqs), 1) s = seqs[0] self.assertEqual(s, '-GGGGCCTTAGCTCAGCT-GGG--AGAGCGCCTGCTTTGCACGCAGGAG-------------------GTCAGCGGTTCGATCCCGCTAGGCTCCA---'.replace('T','U')) self.assertEqual(s.Info.Accession, 'GA0000001') self.assertEqual(s.Info.AA, 'Ala') self.assertEqual(s.Info.Anticodon, 'UGC') self.assertEqual(s.Info.Species, 'Haemophilus influenzae') self.assertEqual(s.Info.Strain, 'Rd KW20') self.assertEqual(s.Info.Taxonomy, ['Bacteria', 'Proteobacteria', \ 'gamma subdivision', 'Pasteurellaceae', 'Haemophilus']) self.assertEqual(s.Pairing, '.==*====..====...........====.=====.......=====........................=====.......=========*==....') def test_multi(self): """GenomicSprinzlParser should work with multiple sequences""" seqs = list(GenomicSprinzlParser(genomic_sample)) self.assertEqual(len(seqs), 3) self.assertEqual([s.Info.Accession for s in seqs], \ ['GA0000001', 'GA0000002', 'GA0000003']) self.assertEqual(seqs[2].Info.Anticodon, 'UGC') self.assertEqual(seqs[0].Info.Order, seqs[2].Info.Order) class FixSequenceTests(TestCase): """Tests that _fix_structure functions properly.""" def 
test_fix_sequence(self): """Fix sequence should properly replace terminal gaps with CCA""" seqs = ['','ACGUUCC-','ACGUUC--','ACGUU---','ACGU----'] results = ['','ACGUUCCA','ACGUUCCA','ACGUUCCA','ACGU-CCA'] for s,r in zip(seqs,results): self.assertEqual(_fix_sequence(s),r) class SprinzlToViennaTests(TestCase): def setUp(self): """setUp function for SprinzlToViennaTests""" self.structures = map(strip,STRUCTURES.split('\n')) self.vienna_structs = map(strip,VIENNA.split('\n')) self.short_struct = '...===...===.' #structure too long self.incorrect1 = ''.join(['..=====*..*==.............==*...=.=...', '....=.=..........................=====.......=====*=====......']) #two halves don't match self.incorrect2 = ''.join(['..=====*..*===............==*...=.=...', '....=.=..........................=====.......=====*=====.....']) def test_get_pieces(self): """get_pieces: should return the correct pieces""" splits = [0,3,7,-1,13] self.assertEqual(get_pieces(self.short_struct, splits),\ ['...','===.','..===','.']) #will include empty strings for indices outside of the structure self.assertEqual(get_pieces(self.short_struct,[2,10,20,30]),\ ['.===...=','==.','']) #will return empty list if no break-positions are given self.assertEqual(get_pieces(self.short_struct,[]), []) def test_get_counts(self): """get_counts: should return list of lengths of paired regions""" self.assertEqual(get_counts('.===.=..'),[3,1]) self.assertEqual(get_counts('...====..'),[4]) self.assertEqual(get_counts('...'),[]) def test_sprinzl_to_vienna(self): """sprinzl_to_vienna: should give expected output""" #This should only work for correct for sprinzl,vienna in zip(self.structures, self.vienna_structs): self.assertEqual(sprinzl_to_vienna(sprinzl),vienna) #Check two obvious errors self.assertRaises(AssertionError,sprinzl_to_vienna,self.incorrect1) self.assertRaises(AssertionError,sprinzl_to_vienna,self.incorrect2) STRUCTURES="""\ 
.===*===..====...........====.=====.......=====........................=====.......========*===.... .=======..=*=.............=*=.=====.......=====........................=====.......============.... .=======..====...........====.*====.......====*........................=.===.......===.========.... .=====.=..====...........====..===.........===.........................**===.......===**=.=====.... ....====..====...........====.*====.......====*........................=====.......=========....... ..=====*..*==.............==*...=.=.......=.=..........................=====.......=====*=====..... .=.=.==*..*.=*...........*=.*.*=.==.......==.=*........................=====.......=====*==.=.=.... .====*.=..**.=...........=.**.*====.......====*........................=.*.=.......=.*.==.*====.... .====*.=..**.=............=**.*====.......====*........................=.*.=.......=.*.==.*====....""" VIENNA="""\ .(((((((..((((...........)))).(((((.......)))))........................(((((.......)))))))))))).... .(((((((..(((.............))).(((((.......)))))........................(((((.......)))))))))))).... .(((((((..((((...........)))).(((((.......)))))........................(.(((.......))).)))))))).... .(((((.(..((((...........))))..(((.........))).........................(((((.......)))))).))))).... ....((((..((((...........)))).(((((.......)))))........................(((((.......)))))))))....... ..((((((..(((.............)))...(.(.......).)..........................(((((.......)))))))))))..... .(.(.(((..(.((...........)).).((.((.......)).))........................(((((.......)))))))).).).... .(((((.(..((.(...........).)).(((((.......)))))........................(.(.(.......).).)).))))).... 
.(((((.(..((.(............))).(((((.......)))))........................(.(.(.......).).)).)))))....""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_stockholm.py000644 000765 000024 00000062130 12024702176 023172 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Provides tests for StockholmParser and related classes and functions. """ from cogent.parse.stockholm import is_gf_line, is_gc_line, is_gs_line, \ is_gr_line, is_seq_line, is_structure_line, GfToInfo, GcToInfo, GsToInfo, \ GrToInfo, MinimalStockholmParser, StockholmFinder, \ StockholmParser, Sequence, is_empty_or_html from cogent.util.unit_test import TestCase, main from cogent.parse.record import RecordError from cogent.core.info import Info from cogent.struct.rna2d import WussStructure from cogent.core.alignment import Alignment from cogent.core.moltype import BYTES __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" Sequence = BYTES.Sequence class StockholmParserTests(TestCase): """ Tests componenets of the stockholm parser, in the stockholm.py file """ def setUp(self): """ Construct some fake data for testing purposes """ self._fake_headers = [] temp = list(fake_headers.split('\n')) for line in temp: self._fake_headers.append(line.strip()) del temp self._fake_gc_annotation = [] temp = list(fake_gc_annotation.split('\n')) for line in temp: self._fake_gc_annotation.append(line.strip()) del temp self._fake_gs_annotation = [] temp = list(fake_gs_annotation.split('\n')) for line in temp: self._fake_gs_annotation.append(line.strip()) del temp self._fake_gr_annotation = [] temp = list(fake_gr_annotation.split('\n')) for line in temp: self._fake_gr_annotation.append(line.strip()) del temp self._fake_record_no_headers =\ list(fake_record_no_headers.split('\n')) 
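The strip-and-split pattern repeated throughout this setUp can be collapsed into a single helper; a minimal sketch (the helper name `clean_lines` is illustrative, not part of the library):

```python
def clean_lines(text):
    """Split a multi-line string and strip leading/trailing
    whitespace from each line, mirroring the setUp loops above."""
    return [line.strip() for line in text.split('\n')]

# The stripped lines feed the line-type predicates (is_gf_line, etc.)
# the same way the hand-rolled loops do.
assert clean_lines('  #=GF AC RF00001  \n  #=GF SQ 606 ') == \
    ['#=GF AC RF00001', '#=GF SQ 606']
```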
self._fake_record_no_sequences =\ list(fake_record_no_sequences.split('\n')) self._fake_record_no_structure =\ list(fake_record_no_structure.split('\n')) self._fake_two_records =\ list(fake_two_records.split('\n')) self._fake_record =\ list(fake_record.split('\n')) self._fake_record_bad_header_1 =\ list(fake_record_bad_header_1.split('\n')) self._fake_record_bad_header_2 =\ list(fake_record_bad_header_2.split('\n')) self._fake_record_bad_sequence_1 =\ list(fake_record_bad_sequence_1.split('\n')) self._fake_record_bad_structure_1 =\ list(fake_record_bad_structure_1.split('\n')) self._fake_record_bad_structure_2 =\ list(fake_record_bad_structure_2.split('\n')) self.single_family = single_family.split('\n') def test_is_empty_or_html(self): """is_empty_or_html: should ignore empty and HTML line""" line = ' ' self.assertEqual(is_empty_or_html(line), True) line = '\n\n' self.assertEqual(is_empty_or_html(line), True) line = '
<pre>'
        self.assertEqual(is_empty_or_html(line), True)
        line = '</pre>
\n\n' self.assertEqual(is_empty_or_html(line), True) line = '\t>..'), True) self.assertEqual(is_gr_line('#=GR RF cGGacG'), True) self.assertEqual(is_gr_line(''), False) self.assertEqual(is_gr_line('X07545.1/505-619 '), False) self.assertEqual(is_gr_line('#=G'), False) self.assertEqual(is_gr_line('=GF'), False) self.assertEqual(is_gr_line('#=GC SS_cons'), False) def test_is_seq_line(self): """is_seq_line: functions correctly w/ various lines """ s = 'X07545.1/505-619 .\ .ACCCGGC.CAUA...GUGGCCG.GGCAA.CAC.CCGG.U.C..UCGUU' assert is_seq_line('s') assert is_seq_line('X07545.1/505-619') assert is_seq_line('M21086.1/8-123') assert not is_seq_line('') assert not is_seq_line('#GF=') assert not is_seq_line('//blah') def test_is_structure_line(self): """is_structure_line: functions correctly w/ various lines """ s = '#=GC SS_cons\ <<<<<<<<<........<<.<<<<.<...<.<...<<<<.<.<.......' self.assertEqual(is_structure_line(s), True) self.assertEqual(is_structure_line('#=GC SS_cons'), False) self.assertEqual(is_structure_line('#=GC SS_cons2'), False) self.assertEqual(is_structure_line('#=GC SS_cons '), True) self.assertEqual(is_structure_line(''), False) self.assertEqual(is_structure_line(' '), False) self.assertEqual(is_structure_line('#=GF AC RF00001'), False) self.assertEqual(is_structure_line('X07545.1/505-619'), False) self.assertEqual(is_structure_line('=GC SS_cons'), False) self.assertEqual(is_structure_line('#=GC'), False) self.assertEqual(is_structure_line('#=GC RF'), False) def test_GfToInfo(self): """GfToInfo: correctly builds info object from header information""" info = GfToInfo(self._fake_headers) self.assertEqual(info['AccessionNumber'], 'RF00001') self.assertEqual(info['Identification'], '5S_rRNA') self.assertEqual(info['Comment'], 'This is a short comment') self.assertEqual(info['Author'], 'Griffiths-Jones SR') self.assertEqual(info['Sequences'], '606') self.assertEqual(info['DatabaseReference'],\ ['URL; http://oberon.fvms.ugent.be:8080/rRNA/ssu/index.html;',\ 'URL; 
http://rdp.cme.msu.edu/html/;']) self.assertEqual(info['PK'],'not real') def test_GfToInfo_invalid_data(self): """GfToInfo: correctly raises error when necessary """ invalid_headers = [['#=GF ACRF00001'],['#=GFACRF00001']] for h in invalid_headers: self.assertRaises(RecordError, GfToInfo, h) def test_GcToInfo(self): """GcToInfo: correctly builds info object from header information""" info = GcToInfo(self._fake_gc_annotation) self.assertEqual(info['ConsensusSecondaryStructure'], \ '..........<<<<<<<<<<.....>>>>>>>>>>..') self.assertEqual(info['ReferenceAnnotation'], \ 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') def test_GcToInfo_invalid_data(self): """GcToInfo: correctly raises error when necessary """ invalid_headers = [['#=GCSS_cons ..<<..>>..'],['#=GCSAxxxxxxx']] for h in invalid_headers: self.assertRaises(RecordError, GcToInfo, h) def test_GsToInfo(self): """GsToInfo: correctly builds info object from header information""" info = GsToInfo(self._fake_gs_annotation) self.assertEqual(info['BasePair'], \ {'1N77_C':['0 70 cWW CC','1 69 cWW CC','2 68 cWW CC',\ '3 67 cWW CC']}) def test_GsToInfo_invalid_data(self): """GsToInfo: correctly raises error when necessary """ invalid_headers = [['#=GSBPS 0 10 cwW CC'],['#=GSACRF00001']] for h in invalid_headers: self.assertRaises(RecordError, GsToInfo, h) def test_GrToInfo(self): """GrToInfo: correctly builds info object from header information""" info = GrToInfo(self._fake_gr_annotation) self.assertEqual(info['SecondaryStructure'], \ {'1N77_C':'..........<<<<<<<<<<.....>>>>>>>>>>..'}) self.assertEqual(info['ReferenceAnnotation'], \ {'1N77_C':'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'}) def test_GrToInfo_invalid_data(self): """GrToInfo: correctly raises error when necessary """ invalid_headers = [['#=GRSS ..<<..>>..'],['#=GRSAxxxxxx']] for h in invalid_headers: self.assertRaises(RecordError, GrToInfo, h) def test_StockholmStockholmParser_strict_missing_fields(self): """MinimalStockholmParser: toggle strict functions w/ missing 
fields""" # strict = True self.assertRaises(RecordError,list,\ MinimalStockholmParser(self._fake_record_no_sequences)) # strict = False # no header shouldn't be a problem headers, aln, struct = \ list(MinimalStockholmParser(self._fake_record_no_headers,\ strict=False))[0] self.assertEqual((headers,aln.todict(),str(struct)), \ ({'GS':[],'GF':[],'GR':[],\ 'GC':['#=GC SS_cons ............>>>']},\ {'Z11765.1/1-89':'GGUC'},'............>>>')) # should get empty on missing sequence or missing structure self.assertEqual(list(MinimalStockholmParser(\ self._fake_record_no_sequences,\ strict=False)), []) def test_MinimalStockholmParser_strict_invalid_sequence(self): """MinimalStockholmParser: toggle strict functions w/ invalid seq """ #strict = True self.assertRaises(RecordError,list,\ MinimalStockholmParser(self._fake_record_bad_sequence_1)) # strict = False # you expect to get back as much information as possible, also # half records or sequences result = MinimalStockholmParser(\ self._fake_record_bad_sequence_1,strict=False) self.assertEqual(len(list(MinimalStockholmParser(\ self._fake_record_bad_sequence_1,strict=False))[0][1].NamedSeqs), 3) def test_StockholmParser_strict_invalid_structure(self): """StockholmParser: toggle strict functions w/ invalid structure """ #strict = True self.assertRaises(RecordError,list,\ StockholmParser(self._fake_record_bad_structure_1)) # strict = False self.assertEqual(list(MinimalStockholmParser(\ self._fake_record_bad_structure_1,strict=False))[0][2],None) def test_MinimalStockholmParser_w_valid_data(self): """MinimalStockholmParser: integrity of output """ # Some ugly constructions here, but this is what the output of # parsing fake_two_records should be headers = ['#=GF AC RF00014','#=GF AU Mifsud W'] sequences =\ {'U17136.1/898-984':\ ''.join(['AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA',\ 'AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU']),\ 'M15749.1/155-239':\ ''.join(['AACGCAUCGGAUUUCCCGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUU',\ 
'AGCAAGUUUGAUCCCGACUCCUG-CGAGUCGGGAUUU']),\ 'AF090431.1/222-139':\ ''.join(['CUCACAUCAGAUUUCCUGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUA',\ 'AGCAAGUUUGAUCCCGACCCGU--AGGGCCGGGAUUU'])} structure = WussStructure(''.join(\ ['...<<<<<<<.....>>>>>>>....................<<<<<...',\ '.>>>>>....<<<<<<<<<<.....>>>>>>>>>>..'])) data = [] for r in MinimalStockholmParser(self._fake_two_records, strict=False): data.append(r) self.assertEqual(\ (data[0][0]['GF'],data[0][1].todict(),\ str(data[0][2])),(headers,sequences,structure)) assert isinstance(data[0][1],Alignment) # This line tests that invalid entries are ignored when strict=False # Note, there are two records in self._fake_two_records, but 2nd is # invalid self.assertEqual(len(data),1) def test_StockholmFinder(self): """StockholmFinder: integrity of output """ fake_record = ['a','//','b','b','//'] num_records = 0 data = [] for r in StockholmFinder(fake_record): data.append(r) num_records += 1 self.assertEqual(num_records, 2) self.assertEqual(data[0], ['a','//']) self.assertEqual(data[1], ['b','b','//']) def test_StockholmParser(self): """StockholmParser: integrity of output """ expected_sequences =\ [''.join(['AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA',\ 'AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU']),\ ''.join(['AACGCAUCGGAUUUCCCGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUU',\ 'AGCAAGUUUGAUCCCGACUCCUG-CGAGUCGGGAUUU']),\ ''.join(['CUCACAUCAGAUUUCCUGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUA',\ 'AGCAAGUUUGAUCCCGACCCGU--AGGGCCGGGAUUU'])] expected_structure = ''.join(\ ['...<<<<<<<.....>>>>>>>....................<<<<<...',\ '.>>>>>....<<<<<<<<<<.....>>>>>>>>>>..']) for r in StockholmParser(self._fake_record): headers = r.Info sequences = r structure = r.Info['Struct'] self.assertEqual(headers['GF']['AccessionNumber'], 'RF00014') self.assertEqual(headers['GF']['Author'], 'Mifsud W') self.assertEqualItems(sequences.values(), expected_sequences) assert isinstance(sequences, Alignment) self.assertEqual(structure, expected_structure) assert 
isinstance(structure,WussStructure) def test_StockholmParser_strict_missing_fields(self): """StockholmParser: toggle strict functions correctly """ # strict = True self.assertRaises(RecordError,list,\ StockholmParser(self._fake_record_no_headers)) # strict = False self.assertEqual(list(StockholmParser(self._fake_record_no_headers,\ strict=False)), []) self.assertEqual(list(StockholmParser(self._fake_record_no_sequences,\ strict=False)), []) def test_StockholmParser_strict_invalid_headers(self): """StockholmParser: functions when toggling strict record w/ bad header """ self.assertRaises(RecordError,list,\ StockholmParser(self._fake_record_bad_header_1)) self.assertRaises(RecordError,list,\ StockholmParser(self._fake_record_bad_header_2)) # strict = False x = list(StockholmParser(self._fake_record_bad_header_1, strict=False)) obs = list(StockholmParser(self._fake_record_bad_header_1,\ strict=False))[0].Info.GF.keys() self.assertEqual(len(obs),1) obs = list(StockholmParser(self._fake_record_bad_header_2,\ strict=False))[0].Info.GF.keys() self.assertEqual(len(obs),1) def test_StockholmParser_strict_invalid_sequences(self): """StockholmParser: functions when toggling strict w/ record w/ bad seq """ self.assertRaises(RecordError,list, MinimalStockholmParser(self._fake_record_bad_sequence_1)) # strict = False # in 'False' mode you expect to get back as much as possible, also # parts of sequences self.assertEqual(len(list(StockholmParser(\ self._fake_record_bad_sequence_1,\ strict=False))[0].NamedSeqs), 3) def test_StockholmParser_strict_invalid_structure(self): """StockholmParser: functions when toggling strict record w/ bad struct """ # strict self.assertRaises(RecordError,list,\ StockholmParser(self._fake_record_bad_structure_2)) #not strict self.assertEqual(list(StockholmParser(\ self._fake_record_bad_structure_2,\ strict=False)),[]) def test_StockholmParser_single_family(self): """StockholmParser: should work on a family in stockholm format""" exp_header = {} exp_aln 
= {'K02120.1/628-682':\ 'AUGGGAAAUUCCCCCUCCUAUAACCCCCCCGCUGGUAUCUCCCCCUCAGACUGGC',\ 'D00647.1/629-683':\ 'AUGGGAAACUCCCCCUCCUAUAACCCCCCCGCUGGCAUCUCCCCCUCAGACUGGC'} exp_struct = '<<<<<<.........>>>>>>.........<<<<<<.............>>>>>>' aln = list(StockholmParser(self.single_family))[0] h = aln.Info['GF'] a = aln s = aln.Info['Struct'] self.assertEqual(h,exp_header) self.assertEqual(a,exp_aln) self.assertEqual(s,exp_struct) # This is an altered version of some header info from Rfam.seed modified to # incorporate different cases for testing fake_headers = """#=GF AC RF00001 #=GF AU Griffiths-Jones SR #=GF ID 5S_rRNA #=GF RT 5S Ribosomal RNA Database. #=GF DR URL; http://oberon.fvms.ugent.be:8080/rRNA/ssu/index.html; #=GF DR URL; http://rdp.cme.msu.edu/html/; #=GF CC This is a short #=GF CC comment #=GF SQ 606 #=GF PK not real""" fake_gc_annotation = """#=GC SS_cons ..........<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx """ fake_gs_annotation = """#=GS 1N77_C BP 0 70 cWW CC #=GS 1N77_C BP 1 69 cWW CC #=GS 1N77_C BP 2 68 cWW CC #=GS 1N77_C BP 3 67 cWW CC """ fake_gr_annotation = """#=GR 1N77_C SS ..........<<<<<<<<<<.....>>>>>>>>>>.. #=GR 1N77_C RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx """ fake_record_no_headers ="""Z11765.1/1-89 GGUC #=GC SS_cons ............>>> //""" fake_record_no_sequences ="""#=GF AC RF00006 #=GC SS_cons ............> //""" fake_record_no_structure ="""#=GF AC RF00006 Z11765.1/1-89 GGUCAGC //""" fake_two_records ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... 
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx // #=GF AC RF00015 //""" fake_record ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_header_1 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AUMifsudW U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. 
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_header_2 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GFAUMifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_sequence_1 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<... #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_structure_1 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons...<<<<<<<.....>>>>>>>....................<<<<<... 
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU #=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" fake_record_bad_structure_2 ="""# STOCKHOLM 1.0 #=GF AC RF00014 #=GF AU Mifsud W U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA #=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<>>>>....<<<<<<<<<<.....>>>>>>>>>>.. #=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx //""" single_family=\ """K02120.1/628-682 AUGGGAAAUUCCCCCUCCUAUAACCCCCCCGCUGGUAUCUCCCCCUCAGA D00647.1/629-683 AUGGGAAACUCCCCCUCCUAUAACCCCCCCGCUGGCAUCUCCCCCUCAGA #=GC SS_cons <<<<<<.........>>>>>>.........<<<<<<.............> K02120.1/628-682 CUGGC D00647.1/629-683 CUGGC #=GC SS_cons >>>>> //""" # Run tests if called from the command line if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_stride.py000644 000765 000024 00000004522 12024702176 022462 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os try: from cogent.util.unit_test import TestCase, main from cogent.struct.selection import einput from cogent.parse.pdb import PDBParser from cogent.parse.stride import stride_parser from cogent.app.stride import Stride, stride_xtra except ImportError: from zenpdb.cogent.util.unit_test import TestCase, main from zenpdb.cogent.struct.selection import einput from zenpdb.cogent.parse.pdb import PDBParser from zenpdb.cogent.parse.stride import stride_parser from zenpdb.cogent.app.stride import Stride, stride_xtra __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = 
"Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" class StrideParseTest(TestCase): """Tests for Stride application controller.""" def setUp(self): input_file = os.path.join('data', '2E12.pdb') self.input_structure = PDBParser(open(input_file)) stride_app = Stride() res = stride_app(self.input_structure) self.lines = res['StdOut'].readlines() def test_stride_parser(self): """tests if output is parsed fully""" id_xtra = stride_parser(self.lines) assert len(id_xtra) < len(self.input_structure[(0,)][('A',)]) + \ len(self.input_structure[(0,)][('B',)]) self.input_structure[(0,)][('A',)].remove_hetero() self.input_structure[(0,)][('B',)].remove_hetero() assert len(id_xtra) == len(self.input_structure[(0,)][('A',)]) + \ len(self.input_structure[(0,)][('B',)]) def test_stride_xtra(self): """tests if residues get annotated with parsed data.""" stride_xtra(self.input_structure) self.assertEquals(\ self.input_structure[(0,)][('A',)][(('H_HOH', 138, ' '),)].xtra, {}) self.assertAlmostEquals(\ self.input_structure[(0,)][('A',)][(('ILE', 86, ' '),)].xtra['STRIDE_ASA'], 13.9) self.input_structure[(0,)][('A',)].remove_hetero() self.input_structure[(0,)][('B',)].remove_hetero() all_residues = einput(self.input_structure, 'R') a = all_residues.data_children('STRIDE_ASA', xtra=True, forgiving=False) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_structure.py000755 000765 000024 00000002300 12024702176 023223 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the pdb parser. 
""" from cogent.util.unit_test import TestCase, main from cogent.core.entity import Structure from cogent.parse.structure import FromFilenameStructureParser, FromFileStructureParser __author__ = "Marcin Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Production" class structuresTests(TestCase): """Tests of cogent.parse.structure UI functions.""" def test_FromFilenameStructureParser(self): structure = FromFilenameStructureParser('data/1LJO.pdb', 'pdb') self.assertRaises(TypeError, FromFilenameStructureParser, open('data/1LJO.pdb'), 'pdb') assert isinstance(structure, Structure) def test_FromFileStructureParser(self): structure = FromFileStructureParser(open('data/1LJO.pdb'), 'pdb') assert isinstance(structure, Structure) self.assertRaises(TypeError, FromFileStructureParser, 'data/1LJO.pdb', 'pdb') assert isinstance(structure, Structure) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_tinyseq.py000644 000765 000024 00000003245 12024702176 022665 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from StringIO import StringIO import xml.dom.minidom from cogent.util.unit_test import TestCase, main from cogent.parse.tinyseq import TinyseqParser __author__ = "Matthew Wakefield" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Matthew Wakefield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Matthew Wakefield" __email__ = "wakefield@wehi.edu.au" __status__ = "Production" data = """ 31322957 AY286018.1 9315 Macropus eugenii Macropus eugenii medium wave-sensitive opsin 1 (OPN1MW) mRNA, complete cds 99 GGCAGGGAAAGGGAAGAAAGTAAAGGGGCCATGACACAGGCATGGGACCCTGCAGGGTTCTTGGCTTGGCGGCGGGACGAGAACGAGGAGACGACTCGG """ sample_seq = ">AY286018.1\nGGCAGGGAAAGGGAAGAAAGTAAAGGGGCCATGACACAGGCATGGGACCCTGCAGGGTTCTTGGCTTGGCGGCGGGACGAGAACGAGGAGACGACTCGG" 
sample_annotations = '[genbank_id "AY286018.1" at [0:99]/99, organism "Macropus eugenii" at [0:99]/99]' class ParseTinyseq(TestCase): def test_parse(self): for name,seq in [TinyseqParser(data).next(),TinyseqParser(xml.dom.minidom.parseString(data)).next()]: self.assertEqual(name, 'AY286018.1') self.assertEqual(sample_seq, seq.toFasta()) self.assertEqual(str(seq.annotations), sample_annotations) pass if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_parse/test_tree.py000644 000765 000024 00000021446 12024702176 022133 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for tree parsers. """ from cogent.parse.tree import DndParser, DndTokenizer, RecordError from cogent.core.tree import PhyloNode from cogent.util.unit_test import TestCase, main #from cogent.parse.newick import parse_string, TreeParseError as RecordError #def DndParser(data, NodeClass=PhyloNode, unescape_name=True): # if not unescape_name: # raise NotImplementedError # def constructor(children, name, attribs): # return NodeClass(Children = list(children or []), Name=name, Params=attribs) # return parse_string(data, constructor) __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Peter Maxwell", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" sample = """ ( ( xyz:0.28124, ( def:0.24498, mno:0.03627) :0.17710) :0.04870, abc:0.05925, ( ghi:0.06914, jkl:0.13776) :0.09853); """ node_data_sample = """ ( ( xyz:0.28124, ( def:0.24498, mno:0.03627) 'A':0.17710) B:0.04870, abc:0.05925, ( ghi:0.06914, jkl:0.13776) C:0.09853); """ minimal = "();" no_names = "((,),(,));" missing_tip_name = "((a,b),(c,));" empty = '();' single = '(abc:3);' double = '(abc:3, def:4);' onenest = '(abc:3, (def:4, ghi:5):6 );' nodedata = '(abc:3, (def:4, ghi:5)jkl:6 );' class DndTokenizerTests(TestCase): """Tests of the DndTokenizer factory 
function.""" def test_gdata(self): """DndTokenizer should work as expected on real data""" exp = \ ['(', '(', 'xyz', ':', '0.28124',',', '(', 'def', ':', '0.24498',\ ',', 'mno', ':', '0.03627', ')', ':', '0.17710', ')', ':', '0.04870', \ ',', 'abc', ':', '0.05925', ',', '(', 'ghi', ':', '0.06914', ',', \ 'jkl', ':', '0.13776', ')', ':', '0.09853', ')', ';'] #split it up for debugging on an item-by-item basis obs = list(DndTokenizer(sample)) self.assertEqual(len(obs), len(exp)) for i, j in zip(obs, exp): self.assertEqual(i, j) #try it all in one go self.assertEqual(list(DndTokenizer(sample)), exp) def test_nonames(self): """DndTokenizer should work as expected on trees with no names""" exp = ['(','(',',',')',',','(',',',')',')',';'] obs = list(DndTokenizer(no_names)) self.assertEqual(obs, exp) def test_missing_tip_name(self): """DndTokenizer should work as expected on trees with a missing name""" exp = ['(','(','a',',','b',')',',','(','c',',',')',')',';'] obs = list(DndTokenizer(missing_tip_name)) self.assertEqual(obs, exp) def test_minimal(self): """DndTokenizer should work as expected a minimal tree without names""" exp = ['(',')',';'] obs = list(DndTokenizer(minimal)) self.assertEqual(obs, exp) class DndParserTests(TestCase): """Tests of the DndParser factory function.""" def test_nonames(self): """DndParser should produce the correct tree when there are no names""" obs = DndParser(no_names) exp = PhyloNode() exp.append(PhyloNode()) exp.append(PhyloNode()) exp.Children[0].append(PhyloNode()) exp.Children[0].append(PhyloNode()) exp.Children[1].append(PhyloNode()) exp.Children[1].append(PhyloNode()) self.assertEqual(str(obs), str(exp)) def test_minimal(self): """DndParser should produce the correct minimal tree""" obs = DndParser(minimal) exp = PhyloNode() exp.append(PhyloNode()) self.assertEqual(str(obs), str(exp)) def test_missing_tip_name(self): """DndParser should produce the correct tree when missing a name""" obs = DndParser(missing_tip_name) exp = 
PhyloNode() exp.append(PhyloNode()) exp.append(PhyloNode()) exp.Children[0].append(PhyloNode(Name='a')) exp.Children[0].append(PhyloNode(Name='b')) exp.Children[1].append(PhyloNode(Name='c')) exp.Children[1].append(PhyloNode()) self.assertEqual(str(obs), str(exp)) def test_gsingle(self): """DndParser should produce a single-child PhyloNode on minimal data""" t = DndParser(single) self.assertEqual(len(t), 1) child = t[0] self.assertEqual(child.Name, 'abc') self.assertEqual(child.Length, 3) self.assertEqual(str(t), '(abc:3.0);') def test_gdouble(self): """DndParser should produce a double-child PhyloNode from data""" t = DndParser(double) self.assertEqual(len(t), 2) self.assertEqual(str(t), '(abc:3.0,def:4.0);') def test_gonenest(self): """DndParser should work correctly with nested data""" t = DndParser(onenest) self.assertEqual(len(t), 2) self.assertEqual(len(t[0]), 0) #first child is terminal self.assertEqual(len(t[1]), 2) #second child has two children self.assertEqual(str(t), '(abc:3.0,(def:4.0,ghi:5.0):6.0);') def test_gnodedata(self): """DndParser should assign Name to internal nodes correctly""" t = DndParser(nodedata) self.assertEqual(len(t), 2) self.assertEqual(len(t[0]), 0) #first child is terminal self.assertEqual(len(t[1]), 2) #second child has two children self.assertEqual(str(t), '(abc:3.0,(def:4.0,ghi:5.0)jkl:6.0);') info_dict = {} for node in t.traverse(): info_dict[node.Name] = node.Length self.assertEqual(info_dict['abc'], 3.0) self.assertEqual(info_dict['def'], 4.0) self.assertEqual(info_dict['ghi'], 5.0) self.assertEqual(info_dict['jkl'], 6.0) def test_data(self): """DndParser should work as expected on real data""" t = DndParser(sample) self.assertEqual(str(t), '((xyz:0.28124,(def:0.24498,mno:0.03627):0.1771):0.0487,abc:0.05925,(ghi:0.06914,jkl:0.13776):0.09853);') tdata = DndParser(node_data_sample, unescape_name=True) self.assertEqual(str(tdata), 
"((xyz:0.28124,(def:0.24498,mno:0.03627)A:0.1771)B:0.0487,abc:0.05925,(ghi:0.06914,jkl:0.13776)C:0.09853);") def test_gbad(self): """DndParser should fail if parens unbalanced""" left = '((abc:3)' right = '(abc:3))' self.assertRaises(RecordError, DndParser, left) self.assertRaises(RecordError, DndParser, right) def test_DndParser(self): """DndParser tests""" t_str = "(A_a,(B:1.0,C),'D_e':0.5)E;" tree_unesc = DndParser(t_str, PhyloNode, unescape_name=True) tree_esc = DndParser(t_str, PhyloNode, unescape_name=False) self.assertEqual(tree_unesc.Name, 'E') self.assertEqual(tree_unesc.Children[0].Name, 'A a') self.assertEqual(tree_unesc.Children[1].Children[0].Name, 'B') self.assertEqual(tree_unesc.Children[1].Children[0].Length, 1.0) self.assertEqual(tree_unesc.Children[1].Children[1].Name, 'C') self.assertEqual(tree_unesc.Children[2].Name, 'D_e') self.assertEqual(tree_unesc.Children[2].Length, 0.5) self.assertEqual(tree_esc.Name, 'E') self.assertEqual(tree_esc.Children[0].Name, 'A_a') self.assertEqual(tree_esc.Children[1].Children[0].Name, 'B') self.assertEqual(tree_esc.Children[1].Children[0].Length, 1.0) self.assertEqual(tree_esc.Children[1].Children[1].Name, 'C') self.assertEqual(tree_esc.Children[2].Name, "'D_e'") self.assertEqual(tree_esc.Children[2].Length, 0.5) reload_test = tree_esc.getNewick(with_distances=True, \ escape_name=False) obs = DndParser(reload_test, unescape_name=False) self.assertEqual(obs.getNewick(with_distances=True), \ tree_esc.getNewick(with_distances=True)) reload_test = tree_unesc.getNewick(with_distances=True, \ escape_name=False) obs = DndParser(reload_test, unescape_name=False) self.assertEqual(obs.getNewick(with_distances=True), \ tree_unesc.getNewick(with_distances=True)) class PhyloNodeTests(TestCase): """Check that PhyloNode works the way I think""" def test_gops(self): """Basic PhyloNode operations should work as expected""" p = PhyloNode() self.assertEqual(str(p), ';') p.Name = 'abc' self.assertEqual(str(p), 'abc;') p.Length = 3 
self.assertEqual(str(p), 'abc:3;') #don't suppress branch from root q = PhyloNode() p.append(q) self.assertEqual(str(p), '()abc:3;') r = PhyloNode() q.append(r) self.assertEqual(str(p), '(())abc:3;') r.Name = 'xyz' self.assertEqual(str(p), '((xyz))abc:3;') q.Length = 2 self.assertEqual(str(p), '((xyz):2)abc:3;') if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_parse/test_unigene.py000644 000765 000024 00000014504 12024702176 022623 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for unigene-specific classes """ from cogent.parse.unigene import _read_sts, _read_expression, UniGeneSeqRecord,\ UniGeneProtSimRecord, _read_seq, LinesToUniGene from cogent.parse.record_finder import GbFinder from cogent.util.unit_test import TestCase, main __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class unigeneTests(TestCase): """Tests toplevel functions.""" def test_read_sts(self): """_read_sts should perform correct conversions""" self.assertEqual(_read_sts('ACC=RH128467 UNISTS=211775\n'), \ {'ACC':'RH128467', 'UNISTS':'211775'}) def test_read_expression(self): """_read_expression should perform correct conversions""" self.assertEqual(_read_expression(\ 'embryo ; whole body ; mammary gland ; brain\n'), ['embryo', 'whole body', 'mammary gland', 'brain']) def test_read_seq(self): """_read_seq should perform correct conversions""" #reset the found fields, since we can't guarantee order of test #execution and it's persistent class data UniGeneSeqRecord.found_fields = {} self.assertEqual(_read_seq('ACC=BC025044.1\n'), \ UniGeneSeqRecord({'ACC':'BC025044.1'})) self.assertEqual(_read_seq(\ "ACC=AI842963.1; NID=g5477176; CLONE=UI-M-AO1-aem-f-10-0-UI; END=3'; LID=1944; SEQTYPE=EST; TRACE=158501677\n"), \ UniGeneSeqRecord({ 'ACC':'AI842963.1','NID':'g5477176', 
'CLONE':'UI-M-AO1-aem-f-10-0-UI', 'END':"3'", 'LID':'1944', 'SEQTYPE':'EST', 'TRACE':'158501677'}) ) def test_LinesToUniGene(self): """LinesToUniGene should give expected results on sample data""" fake_file = \ """ID Mm.1 TITLE S100 calcium binder GENE S100a10 CYTOBAND 3 41.7 cM LOCUSLINK 20194 EXPRESS embryo ; whole body ; mammary gland ; brain CHROMOSOME 3 STS ACC=RH128467 UNISTS=211775 STS ACC=M16465 UNISTS= 178878 PROTSIM ORG=Homo sapiens; PROTGI=107251; PROTID=pir:JC1139; PCT=91; ALN=97 PROTSIM ORG=Mus musculus; PROTGI=116487; PROTID=sp:P08207; PCT=100; ALN=97 PROTSIM ORG=Rattus norvegicus; PROTGI=116489; PROTID=sp:P05943; PCT=94; ALN=94 SCOUNT 5 SEQUENCE ACC=BC025044.1; NID=g19263549; PID=g19263550; SEQTYPE=mRNA SEQUENCE ACC=AA471893.1; NID=g2199884; CLONE=IMAGE:872193; END=5'; LID=539; SEQTYPE=EST SEQUENCE ACC=AI842963.1; NID=g5477176; CLONE=UI-M-AO1-aem-f-10-0-UI; END=3'; LID=1944; SEQTYPE=EST; TRACE=158501677 SEQUENCE ACC=CB595147.1; NID=g29513003; CLONE=IMAGE:30300703; END=5'; LID=12885; MGC=6677832; SEQTYPE=EST SEQUENCE ACC=BY144053.1; NID=g26280109; CLONE=L930184D22; END=5'; LID=12267; SEQTYPE=EST // ID Mm.5 TITLE homeo box A10 GENE Hoxa10 CYTOBAND 6 26.33 cM LOCUSLINK 15395 EXPRESS kidney ; colon ; mammary gland CHROMOSOME 6 PROTSIM ORG=Caenorhabditis elegans; PROTGI=7510074; PROTID=pir:T31611; PCT=30; ALN=326 SCOUNT 1 SEQUENCE ACC=AW990320.1; NID=g8185938; CLONE=IMAGE:1513482; END=5'; LID=1043; SEQTYPE=EST; TRACE=94472873 // """ records = list(GbFinder(fake_file.split('\n'))) self.assertEqual(len(records), 2) first, second = map(LinesToUniGene, records) self.assertEqual(first.ID, 'Mm.1') self.assertEqual(first.TITLE, 'S100 calcium binder') self.assertEqual(first.GENE, 'S100a10') self.assertEqual(first.CYTOBAND, '3 41.7 cM') self.assertEqual(first.CHROMOSOME, '3') self.assertEqual(first.LOCUSLINK, 20194) self.assertEqual(first.EXPRESS, ['embryo', 'whole body', \ 'mammary gland', 'brain']) self.assertEqual(first.STS, 
[{'ACC':'RH128467','UNISTS':'211775'}, {'ACC':'M16465', 'UNISTS':'178878'}]) exp_prot_sim = map(UniGeneProtSimRecord, [ {'ORG':'Homo sapiens','PROTGI':'107251', 'PROTID':'pir:JC1139','PCT':'91','ALN':'97'}, {'ORG':'Mus musculus','PROTGI':'116487', 'PROTID':'sp:P08207','PCT':'100','ALN':'97'}, {'ORG':'Rattus norvegicus','PROTGI':'116489', 'PROTID':'sp:P05943','PCT':'94','ALN':'94'},]) for obs, exp in zip(first.PROTSIM, exp_prot_sim): self.assertEqual(obs, exp) self.assertEqual(first.SCOUNT, 5) exp_seqs = map(UniGeneSeqRecord, [ {'ACC':'BC025044.1', 'NID':'g19263549','PID':'g19263550', 'SEQTYPE':'mRNA'}, {'ACC':'AA471893.1','NID':'g2199884','END':"5'", 'CLONE':'IMAGE:872193','LID':'539', 'SEQTYPE':'EST'}, {'ACC':'AI842963.1','NID':'g5477176', 'CLONE':'UI-M-AO1-aem-f-10-0-UI','END':"3'",'LID':'1944', 'SEQTYPE':'EST','TRACE':'158501677'}, {'ACC':'CB595147.1','NID':'g29513003', 'CLONE':'IMAGE:30300703','END':"5'",'LID':'12885', 'MGC':'6677832', 'SEQTYPE':'EST'}, {'ACC':'BY144053.1','NID':'g26280109', 'CLONE':'L930184D22','END':"5'",'LID':'12267', 'SEQTYPE':'EST'}]) for obs, exp in zip(first.SEQUENCE, exp_seqs): self.assertEqual(obs, exp) self.assertEqual(second.ID, 'Mm.5') self.assertEqual(second.TITLE, 'homeo box A10') self.assertEqual(second.GENE, 'Hoxa10') self.assertEqual(second.CYTOBAND, '6 26.33 cM') self.assertEqual(second.LOCUSLINK, 15395) self.assertEqual(second.EXPRESS,['kidney','colon','mammary gland']) self.assertEqual(second.CHROMOSOME, '6') self.assertEqual(second.PROTSIM, map(UniGeneProtSimRecord, [ {'ORG':'Caenorhabditis elegans', 'PROTGI':'7510074', 'PROTID':'pir:T31611','PCT':'30', 'ALN':'326'}])) self.assertEqual(second.SCOUNT, 1) self.assertEqual(second.STS, []) self.assertEqual(second.SEQUENCE, map(UniGeneSeqRecord, [ {'ACC':'AW990320.1','NID':'g8185938', 'CLONE':'IMAGE:1513482','END':"5'",'LID':'1043', 'SEQTYPE':'EST','TRACE':'94472873'}])) #test that the synonym mapping works OK self.assertEqual(second.SequenceIds[0].NucleotideId, 'g8185938') if 
__name__ == '__main__': main()

PyCogent-1.5.3/tests/test_motif/__init__.py

#!/usr/bin/env python
__all__ = ['test_util']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"

PyCogent-1.5.3/tests/test_motif/test_util.py

#!/usr/bin/env python
#file cogent_tests/motif/test_util.py
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.motif.util import Location, ModuleInstance, Module, Motif,\
    MotifResults, MotifFormatter, html_color_to_rgb
from cogent.core.moltype import ASCII

__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"


class LocationTests(TestCase):
    """Tests of Location class for holding module location.
""" def setUp(self): """Setup for Location tests.""" self.location_no_end = Location('seq1',1) self.locations = [ Location('seq1',1,5), Location('seq2',3,54), Location('seq1',5,3), Location('seq1',2,3), Location('seq2',54,2), Location('seq0',1,3), ] self.locations_sorted = [ Location('seq0',1,3), Location('seq1',1,5), Location('seq1',5,3), Location('seq1',2,3), Location('seq2',3,54), Location('seq2',54,2), ] def test_init_no_end(self): """__init__ should properly initialize Location object""" self.assertEqual(self.location_no_end.SeqId, 'seq1') self.assertEqual(self.location_no_end.Start, 1) self.assertEqual(self.location_no_end.End, 2) def test_init_complete(self): """__init__ should properly initialize Location object""" self.assertEqual(self.locations[0].SeqId, 'seq1') self.assertEqual(self.locations[0].Start, 1) self.assertEqual(self.locations[0].End, 5) def test_cmp(self): """Location object should sort properly with __cmp__ overwritten.""" self.locations.sort() self.assertEqual(self.locations, self.locations_sorted) class ModuleInstanceTests(TestCase): """Tests for ModuleInstance class.""" def setUp(self): """Setup function for ModuleInstance tests.""" self.sequences = [ 'accucua', 'caucguu', 'accucua', 'cgacucg', 'cgaucag', 'cuguacc', 'cgcauca', ] self.locations = [ Location('seq0',1,3), Location('seq1',2,3), Location('seq1',1,5), Location('seq1',5,3), Location('seq2',3,54), Location('seq2',54,2), Location('seq3',4,0), ] self.Pvalues = [ .1, .002, .0000000003, .6, .0094, .6, .00201, ] self.Evalues = [ .006, .02, .9, .0200000001, .09, .0000003, .900001, ] self.modules_no_e = [] for i in xrange(7): self.modules_no_e.append(ModuleInstance(self.sequences[i], self.locations[i], self.Pvalues[i])) self.modules_p_and_e = [] for i in xrange(7): self.modules_p_and_e.append(ModuleInstance(self.sequences[i], self.locations[i], self.Pvalues[i], self.Evalues[i])) self.modules_no_e_sorted = [ ModuleInstance(self.sequences[2],self.locations[2],self.Pvalues[2]), 
ModuleInstance(self.sequences[1],self.locations[1],self.Pvalues[1]), ModuleInstance(self.sequences[6],self.locations[6],self.Pvalues[6]), ModuleInstance(self.sequences[4],self.locations[4],self.Pvalues[4]), ModuleInstance(self.sequences[0],self.locations[0],self.Pvalues[0]), ModuleInstance(self.sequences[3],self.locations[3],self.Pvalues[3]), ModuleInstance(self.sequences[5],self.locations[5],self.Pvalues[5]), ] self.modules_p_and_e_sorted = [ ModuleInstance(self.sequences[2],self.locations[2],self.Pvalues[2]), ModuleInstance(self.sequences[1],self.locations[1],self.Pvalues[1]), ModuleInstance(self.sequences[6],self.locations[6],self.Pvalues[6]), ModuleInstance(self.sequences[4],self.locations[4],self.Pvalues[4]), ModuleInstance(self.sequences[0],self.locations[0],self.Pvalues[0]), ModuleInstance(self.sequences[5],self.locations[5],self.Pvalues[5]), ModuleInstance(self.sequences[3],self.locations[3],self.Pvalues[3]), ] def test_init_no_p_e_values(self): """Init should properly initialize ModuleInstance objects.""" module1 = ModuleInstance(self.sequences[0], self.locations[0]) module2 = ModuleInstance(self.sequences[1], self.locations[1]) self.assertEqual(module1.Sequence, 'accucua') self.assertEqual(module1.Location.SeqId, 'seq0') self.assertEqual(module2.Sequence, 'caucguu') self.assertEqual(module2.Location.SeqId, 'seq1') def test_init_no_e_values(self): """Init should properly initialize ModuleInstance objects.""" self.modules_no_e.sort() self.assertEqual(self.modules_no_e, self.modules_no_e_sorted) def test_len(self): """len() should return correct length of the ModuleInstance sequence.""" for module in self.modules_no_e: self.assertEqual(len(module), 7) def test_str(self): """str() should return the correct string for each ModuleInstance.""" for module, seq in zip(self.modules_no_e, self.sequences): self.assertEqual(str(module), seq) def test_cmp(self): """ModuleInstances should sort properly with __cmp__ overwritten.""" self.modules_no_e.sort() 
self.modules_p_and_e.sort() self.assertEqual(map(str,self.modules_no_e), map(str,self.modules_no_e_sorted)) self.assertEqual(map(str,self.modules_p_and_e), map(str,self.modules_p_and_e_sorted)) class ModuleTests(TestCase): """Tests for Module class.""" def setUp(self): """SetUp for Module class tests.""" self.sequences = [ 'accucua', 'caucguu', 'accucua', 'cgacucg', 'cgaucag', 'cuguacc', 'cgcauca', ] self.locations = [ Location('seq0',1,3), Location('seq1',2,3), Location('seq1',1,5), Location('seq1',5,3), Location('seq2',3,54), Location('seq2',54,2), Location('seq3',4,0), ] self.Pvalues = [ .1, .002, .0000000003, .6, .0094, .6, .00201, ] self.Evalues = [ .006, .02, .9, .0200000001, .09, .0000003, .900001, ] self.modules_no_e = [] for i in xrange(7): self.modules_no_e.append(ModuleInstance(self.sequences[i], self.locations[i], self.Pvalues[i])) self.modules_p_and_e = [] for i in xrange(7): self.modules_p_and_e.append(ModuleInstance(self.sequences[i], self.locations[i], self.Pvalues[i], self.Evalues[i])) self.module_no_template = Module( { (self.modules_no_e[0].Location.SeqId, self.modules_no_e[0].Location.Start):self.modules_no_e[0], (self.modules_no_e[1].Location.SeqId, self.modules_no_e[1].Location.Start):self.modules_no_e[1], (self.modules_no_e[2].Location.SeqId, self.modules_no_e[2].Location.Start):self.modules_no_e[2], (self.modules_no_e[3].Location.SeqId, self.modules_no_e[3].Location.Start):self.modules_no_e[3], (self.modules_no_e[4].Location.SeqId, self.modules_no_e[4].Location.Start):self.modules_no_e[4], (self.modules_no_e[5].Location.SeqId, self.modules_no_e[5].Location.Start):self.modules_no_e[5], (self.modules_no_e[6].Location.SeqId, self.modules_no_e[6].Location.Start):self.modules_no_e[6], } ) self.module_with_template = Module( { (self.modules_no_e[0].Location.SeqId, self.modules_no_e[0].Location.Start):self.modules_no_e[0], (self.modules_no_e[1].Location.SeqId, self.modules_no_e[1].Location.Start):self.modules_no_e[1], 
(self.modules_no_e[2].Location.SeqId, self.modules_no_e[2].Location.Start):self.modules_no_e[2], (self.modules_no_e[3].Location.SeqId, self.modules_no_e[3].Location.Start):self.modules_no_e[3], (self.modules_no_e[4].Location.SeqId, self.modules_no_e[4].Location.Start):self.modules_no_e[4], (self.modules_no_e[5].Location.SeqId, self.modules_no_e[5].Location.Start):self.modules_no_e[5], (self.modules_no_e[6].Location.SeqId, self.modules_no_e[6].Location.Start):self.modules_no_e[6], }, Template = 'accgucg' ) def test_init(self): """Init should properly initialize Module object.""" module = Module(data={(self.modules_no_e[0].Location.SeqId, self.modules_no_e[0].Location.Start): \ self.modules_no_e[0]}) self.assertEqual(module.Template, None) self.assertEqual(module.Alphabet, ASCII.Alphabet) self.assertEqual(module.Pvalue, None) self.assertEqual(module.Evalue, None) self.assertEqual(module.keys(),[('seq0',1)]) self.assertEqual(module.values(),[ModuleInstance(self.sequences[0], self.locations[0], self.Pvalues[0])]) def test_cmp(self): """Module objects should sort properly with __cmp__ overwritten.""" pvals_sorted = [3e-010, 0.002, 0.0020100000000000001, 0.0094000000000000004, 0.10000000000000001, 0.59999999999999998, 0.59999999999999998] evals_sorted = [.9, .02, .900001, .09, .006, .0000003, .0200000001, ] modules = [] for instance, pvalue, evalue in zip(self.modules_no_e, self.Pvalues, self.Evalues): modules.append(Module({(instance.Location.SeqId, instance.Location.Start):instance}, Pvalue=pvalue, Evalue=evalue)) modules.sort() for ans, p, e in zip(modules, pvals_sorted, evals_sorted): self.assertEqual(ans.Pvalue, p) self.assertEqual(ans.Evalue, e) def test_LocationDict(self): """LocationDict should return correct dictionary of locations.""" location_dict_ans = { 'seq0':[1], 'seq1':[1,2,3], 'seq2':[2,3], 'seq3':[0], } location_dict = self.module_no_template.LocationDict self.assertEqual(location_dict,location_dict_ans) class MotifTests(TestCase): """Tests for Motif 
class.""" def test_init(self): """Init should properly initialize Motif object.""" module = Module({ ('a',3): ModuleInstance('guc', Location('a',3,5)), ('b',3): ModuleInstance('guc', Location('b',3,5)), ('c',8): ModuleInstance('guc', Location('c',8,10)), }) m = Motif(module) self.assertEqual(m.Modules,[module]) self.assertEqual(m.Info,None) class MotifResultsTests(TestCase): """Tests for MotifResults class.""" def test_init(self): """Init should properly initialize MotifResults object.""" module = Module({ ('a',3): ModuleInstance('guc', Location('a',3,5)), ('b',3): ModuleInstance('guc', Location('b',3,5)), ('c',8): ModuleInstance('guc', Location('c',8,10)), }) motif = Motif([module]) results = {'key1':'value1','key2':'value2'} parameters = {'parameter1':1,'parameter2':2} mr = MotifResults([module],[motif],results,parameters) self.assertEqual(mr.Modules,[module]) self.assertEqual(mr.Motifs,[motif]) self.assertEqual(mr.Results,results) self.assertEqual(mr.parameter1,1) self.assertEqual(mr.parameter2,2) class UtilTests(TestCase): """Tests for utility functions.""" def test_html_color_to_rgb(self): """Tests for html_to_color_rgb.""" html_colors = ['#FF0000','#00FF00','#0000FF','545454'] rgb_colors = [(1.0,0.0,0.0),(0.0,1.0,0.0),(0.0,0.0,1.0),\ (0.32941176470588235,0.32941176470588235,0.32941176470588235)] for html, rgb in zip(html_colors, rgb_colors): self.assertEqual(html_color_to_rgb(html),rgb) def test_make_remap_dict(self): """Tests for make_remap_dict.""" pass class MotifFormatterTests(TestCase): """Tests for MotifFormatter class.""" def setUp(self): """SetUp for MotifFormatter class tests.""" self.sequences = [ 'accucua', 'caucguu', 'accucua', 'cgacucg', 'cgaucag', 'cuguacc', 'cgcauca', ] self.locations = [ Location('seq0',1,3), Location('seq1',2,3), Location('seq1',1,5), Location('seq1',5,3), Location('seq2',3,54), Location('seq2',54,2), Location('seq3',4,0), ] self.Pvalues = [ .1, .002, .0000000003, .6, .0094, .6, .00201, ] self.Evalues = [ .006, .02, .9, 
.0200000001, .09, .0000003, .900001, ] self.modules_no_e = [] for i in xrange(7): self.modules_no_e.append(ModuleInstance(self.sequences[i], self.locations[i], self.Pvalues[i])) self.module_with_template = Module( { (self.modules_no_e[0].Location.SeqId, self.modules_no_e[0].Location.Start):self.modules_no_e[0], (self.modules_no_e[1].Location.SeqId, self.modules_no_e[1].Location.Start):self.modules_no_e[1], (self.modules_no_e[2].Location.SeqId, self.modules_no_e[2].Location.Start):self.modules_no_e[2], (self.modules_no_e[3].Location.SeqId, self.modules_no_e[3].Location.Start):self.modules_no_e[3], (self.modules_no_e[4].Location.SeqId, self.modules_no_e[4].Location.Start):self.modules_no_e[4], (self.modules_no_e[5].Location.SeqId, self.modules_no_e[5].Location.Start):self.modules_no_e[5], (self.modules_no_e[6].Location.SeqId, self.modules_no_e[6].Location.Start):self.modules_no_e[6], }, Template = 'accgucg', ID='1' ) self.modules_with_ids =\ [Module({ ('a',3): ModuleInstance('guc', Location('a',3,5)), ('b',3): ModuleInstance('guc', Location('b',3,5)), ('c',8): ModuleInstance('guc', Location('c',8,10)), },ID='1'), Module({ ('a',7): ModuleInstance('cca', Location('a',7,9)), ('b',7): ModuleInstance('cca', Location('b',7,9)), ('c',11): ModuleInstance('cca',Location('c',11,13)), },ID='2'), Module({ ('a',10): ModuleInstance('gca',Location('a',10,12)), ('b',10): ModuleInstance('gca',Location('b',10,12)), ('c',14): ModuleInstance('gca',Location('c',14,12)), },ID='3'), Module({ ('a',13): ModuleInstance('ggg',Location('a',13,15)), ('b',13): ModuleInstance('ggg',Location('b',13,15)), ('c',18): ModuleInstance('ggg',Location('c',18,20)), },ID='4'), ] self.motifs_with_ids = map(Motif,self.modules_with_ids) self.motif_results = MotifResults(Modules=self.modules_with_ids,\ Motifs=self.motifs_with_ids) self.color_map = {'1':"""background-color: #0000FF; ; font-family: 'Courier New', Courier""", '2':"""background-color: #FFFF00; ; font-family: 'Courier New', Courier""", 
'3':"""background-color: #00FFFF; ; font-family: 'Courier New', Courier""", '4':"""background-color: #FF00FF; ; font-family: 'Courier New', Courier""", } self.color_map_rgb = { 'color_1':(0.0,0.0,1.0), 'color_2':(1.0,1.0,0.0), 'color_3':(0.0,1.0,1.0), 'color_4':(1.0,0.0,1.0), } def test_getColorMapS0(self): """tests for getColorMapS0""" mf = MotifFormatter() module_ids = ['1','2','3','4'] self.assertEqual(mf.getColorMapS0(module_ids),self.color_map) def test_getColorMap(self): """tests for getColorMap""" mf = MotifFormatter() self.assertEqual(mf.getColorMap(self.motif_results),self.color_map) def test_getColorMapRgb(self): """tests for getColorMapRgb""" mf = MotifFormatter() self.assertEqual(mf.getColorMapRgb(self.motif_results),\ self.color_map_rgb) #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_maths/__init__.py000644 000765 000024 00000001154 12024702176 021670 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_fit_function', 'test_geometry', 'test_matrix', 'test_matrix_logarithm', 'test_optimisers', 'test_stats', 'test_unifrac', 'test_spatial'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Peter Maxwell", "Catherine Lozupone", "Gavin Huttley", "Sandra Smit", "Marcin Cieslik", "Antonio Gonzalez Pena"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_maths/test_distance_transform.py000644 000765 000024 00000052105 12024702176 025057 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for distance_transform.py functions. 
""" from __future__ import division from cogent.util.unit_test import TestCase, main from cogent.maths.distance_transform import * from numpy import array, sqrt, shape, ones, diag __author__ = "Justin Kuczynski" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Justin Kuczynski", "Zongzhi Liu", "Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Justin Kuczynski" __email__ = "justinak@gmail.com" __status__ = "Prototype" class functionTests(TestCase): """Tests of top-level functions.""" def setUp(self): self.mat_test = asmatrix([[10, 10, 20], [10, 15, 10], [15, 5, 5]], 'float') self.emptyarray = array([], 'd') self.mtx1 = array([[1, 3], [0.0, 23.1]],'d') self.dense1 = array([[1, 3], [5, 2], [0.1, 22]],'d') self.zeromtx = array([[ 0.0, 0.0, 0.0], [ 0.0, 0.0 , 0.0], [ 0.0, 0.0, 0.0 ], [ 0.0, 0.0, 0.0 ]],'d') self.sparse1 = array([[ 0.0, 0.0, 5.33], [ 0.0, 0.0 , 0.4], [ 1.0, 0.0, 0.0 ], [ 0.0, 0.0, 0.0 ]],'d') self.input_binary_dist_otu_gain1 = array([[2,1,0,0], [1,0,0,1], [0,0,3,0], [0,0,0,1]]) def get_sym_mtx_from_uptri(self, mtx): """helper fn, only for square matrices""" numrows, numcols = shape(mtx) for i in range(numrows): for j in range(i): if i==j: break mtx[i,j] = mtx[j,i] # j < i, so row
upper triangle return mtx def test_dist_canberra(self): """tests dist_canberra tests inputs of empty mtx, zeros, and results compared with calcs done by hand""" self.assertFloatEqual(dist_canberra(self.zeromtx), zeros((4,4),'d')) mtx1expected = array([[ 0.0, 46.2/52.2], [ 46.2/52.2, 0.0 ]],'d') self.assertFloatEqual(dist_canberra(self.mtx1), mtx1expected) sparse1exp = ones((self.sparse1.shape[0],self.sparse1.shape[0])) # remove diagonal sparse1exp[0,0] = sparse1exp[1,1] = sparse1exp[2,2] = sparse1exp[3,3]\ = 0.0 sparse1exp[0,1] = sparse1exp[1,0] = ( (5.33-.4) / (5.33 + .4) ) self.assertFloatEqual(dist_canberra(self.sparse1), sparse1exp) def test_dist_euclidean(self): """tests dist_euclidean tests inputs of empty mtx, zeros, and dense1 compared with calcs done by hand""" self.assertFloatEqual(dist_euclidean(self.zeromtx), zeros((4,4),'d')) dense1expected = array([[ 0.0, sqrt(17.), sqrt(.9**2 + 19**2)], [ sqrt(17.), 0.0 , sqrt(4.9**2 + 20**2)], [ sqrt(.9**2 + 19**2), sqrt(4.9**2 + 20**2), 0.0 ]],'d') self.assertFloatEqual(dist_euclidean(self.dense1), dense1expected) def test_dist_gower(self): """tests dist_gower tests inputs of empty mtx, zeros, and results compared with calcs done by hand""" self.assertFloatEqual(dist_gower(self.zeromtx), zeros((4,4),'d')) mtx1expected = array([[ 0.0, 2.], [ 2., 0.0 ]],'d') self.assertFloatEqual(dist_gower(self.mtx1), mtx1expected) sparse1expected = array([[ 0.0, 4.93/5.33, 2, 1], [ 4.93/5.33 , 0.0 , 1 + .4/5.33, .4/5.33], [ 2, 1 + .4/5.33, 0,1], [1, .4/5.33, 1, 0.0]],'d') self.assertFloatEqual(dist_gower(self.sparse1), sparse1expected) def test_dist_manhattan(self): """tests dist_manhattan tests inputs of empty mtx, zeros, and dense1 compared with calcs done by hand""" self.assertFloatEqual(dist_manhattan(self.zeromtx), zeros((4,4),'d')) dense1expected = array([[ 0.0, 5.0, 019.9], [ 5.0, 0.0 , 24.9], [ 19.9, 24.90, 0.0 ]],'d') self.assertFloatEqual(dist_manhattan(self.dense1), dense1expected) def test_dist_abund_jaccard(self): 
"""dist_abund_jaccard should compute distances for dense1 and mtx1""" mtx1_expected = array([[0, 0.25], [0.25, 0]], 'd') self.assertEqual(dist_abund_jaccard(self.mtx1), mtx1_expected) dense1_expected = zeros((3,3), 'd') self.assertEqual(dist_abund_jaccard(self.dense1), dense1_expected) sparse1_expected = array([ [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 0.0]], 'd') self.assertEqual(dist_abund_jaccard(self.sparse1), sparse1_expected) def test_dist_morisita_horn(self): """tests dist_morisita_horn tests inputs of empty mtx, zeros, and dense1 compared with calcs done by hand""" self.assertFloatEqual(dist_morisita_horn(self.zeromtx), zeros((4,4),'d')) a = 1 - 2*69.3/(26/16. * 23.1 * 4) mtx1expected = array([[0, a], [a,0]],'d') self.assertFloatEqual(dist_morisita_horn(self.mtx1), mtx1expected) def test_dist_bray_curtis(self): """tests dist_bray_curtis tests inputs of empty mtx, zeros, and mtx1 compared with calcs done by hand""" self.assertFloatEqual(dist_manhattan(self.zeromtx), zeros((4,4)*1,'d')) mtx1expected = array([[0, 21.1/27.1], [21.1/27.1, 0]],'d') self.assertFloatEqual(dist_bray_curtis(self.mtx1), mtx1expected) def test_dist_bray_curtis_faith(self): """tests dist_bray_curtis_faith tests inputs of empty mtx, zeros, and mtx1 compared with calcs done by hand""" self.assertFloatEqual(dist_manhattan(self.zeromtx), zeros((4,4)*1,'d')) mtx1expected = array([[0, 21.1/27.1], [21.1/27.1, 0]],'d') self.assertFloatEqual(dist_bray_curtis_faith(self.mtx1), mtx1expected) def test_dist_soergel(self): """tests dist_soergel tests inputs of empty mtx, zeros, and dense1 compared with calcs done by hand/manhattan dist""" self.assertFloatEqual(dist_soergel(self.zeromtx), zeros((4,4)*1,'d')) dense1expected = dist_manhattan(self.dense1) dense1norm = array([[ 1, 8, 23], [8,1,27], [23,27,1]],'d') dense1expected /= dense1norm self.assertFloatEqual(dist_soergel(self.dense1), dense1expected) def test_dist_kulczynski(self): """tests dist_kulczynski 
tests inputs of empty mtx, zeros, and mtx1 compared with calcs done by hand""" self.assertFloatEqual(dist_kulczynski(self.zeromtx), zeros((4,4)*1,'d')) mtx1expected = array([[0, 1.-1./2.*(3./4. + 3./23.1)], [1.-1./2.*(3./4. + 3./23.1), 0]],'d') self.assertFloatEqual(dist_kulczynski(self.mtx1), mtx1expected) def test_dist_pearson(self): """tests dist_pearson tests inputs of empty mtx, zeros, mtx compared with calcs done by hand, and an example from http://davidmlane.com/hyperstat/A56626.html """ self.assertFloatEqual(dist_pearson(self.zeromtx), zeros((4,4),'d')) mtx1expected = array([[0, 0], [0, 0]],'d') self.assertFloatEqual(dist_pearson(self.mtx1), mtx1expected) # example 1 from http://davidmlane.com/hyperstat/A56626.html ex1 = array([[1, 2, 3, ], [2,5,6]],'d') ex1res = 1 - 4./sqrt(2.*(8+2./3.)) ex1expected = array([[0, ex1res], [ex1res, 0]],'d') self.assertFloatEqual(dist_pearson(ex1), ex1expected) def test_dist_spearman_approx(self): """tests dist_spearman_approx tests inputs of empty mtx, zeros, and an example from wikipedia """ self.assertFloatEqual(dist_spearman_approx(self.zeromtx), zeros((4,4)*1,'d')) # ex1 from wikipedia Spearman's_rank_correlation_coefficient 20jan2009 ex1 = array([[106 ,86 ,100 ,101 ,99 ,103 ,97 ,113 ,112 ,110], [7,0,27,50,28,29,20,12,6,17]],'d') ex1res = 6.*194./(10.*99.) 
ex1expected = array([[0, ex1res], [ex1res, 0]],'d') self.assertFloatEqual(dist_spearman_approx(ex1), ex1expected) # now binary fns def test_binary_dist_otu_gain(self): """ binary OTU gain functions as expected """ actual = binary_dist_otu_gain(self.input_binary_dist_otu_gain1) expected = array([[0, 1, 2, 2], [1, 0, 2, 1], [1, 1, 0, 1], [1, 0, 1, 0]]) self.assertEqual(actual,expected) def test_binary_dist_chisq(self): """tests binary_dist_chisq tests inputs of empty mtx, zeros, and mtx1 compared with calcs done by hand""" self.assertFloatEqual(binary_dist_chisq(self.zeromtx), zeros((4,4),'d')) mtx1expected = array([[0,sqrt(9/8.)], [ sqrt(9/8.),0]],'d') self.assertFloatEqual(binary_dist_chisq(self.mtx1), mtx1expected) def test_binary_dist_chord(self): """tests binary_dist_chord tests inputs of empty mtx, zeros, and results compared with calcs done by hand""" self.assertFloatEqual(binary_dist_chord(self.zeromtx), zeros((4,4),'d')) mtx1expected = array([[0,sqrt( 1/2. + (1./sqrt(2.) -1.)**2)], [ sqrt( 1/2. + (1./sqrt(2.) 
-1.)**2),0]],'d') self.assertFloatEqual(binary_dist_chord(self.mtx1), mtx1expected) def test_binary_dist_lennon(self): """tests binary_dist_lennon tests inputs of empty mtx, zeros, and results compared with calcs done by hand""" self.assertFloatEqual(binary_dist_lennon(self.zeromtx), zeros((4,4),'d')) mtxa = array([[5.2,9,0.2], [0,99,1], [0,0.0,8233.1]],'d') self.assertFloatEqual(binary_dist_lennon(mtxa), zeros((3,3),'d') ) mtxb = array([[5.2,0,0.2, 9.2], [0,0,0,1], [0,3.2,0,8233.1]],'d') mtxbexpected = array([[0,0,0.5], [0,0,0], [0.5,0,0]],'d') self.assertFloatEqual(binary_dist_lennon(mtxb), mtxbexpected) def test_binary_dist_pearson(self): """tests binary_dist_pearson tests inputs of empty mtx, zeros, and dense1 compared with calcs done by hand""" self.assertFloatEqual(binary_dist_pearson(self.zeromtx), zeros((4,4),'d')) self.assertFloatEqual(binary_dist_pearson(self.dense1), zeros((3,3))) def test_binary_dist_jaccard(self): """tests binary_dist_jaccard tests inputs of empty mtx, zeros, and sparse1 compared with calcs done by hand""" self.assertFloatEqual(binary_dist_jaccard(self.zeromtx), zeros((4,4),'d')) sparse1expected = array([[0, 0, 1., 1.], [0, 0, 1, 1], [1,1,0,1], [1,1,1,0]],'d') self.assertFloatEqual(binary_dist_jaccard(self.sparse1), sparse1expected) sparse1expected = dist_manhattan(self.sparse1.astype(bool)) sparse1norm = array([[ 1, 1,2,1], [1,1,2,1], [2,2,1,1], [1,1,1,100]],'d') sparse1expected /= sparse1norm self.assertFloatEqual(binary_dist_jaccard(self.sparse1), sparse1expected) def test_binary_dist_ochiai(self): """tests binary_dist_ochiai tests inputs of empty mtx, zeros, and mtx1 compared with calcs done by hand""" self.assertFloatEqual(binary_dist_ochiai(self.zeromtx), zeros((4,4),'d')) mtx1expected = array([[0,1-1/sqrt(2.)], [1-1/sqrt(2.), 0,]],'d') self.assertFloatEqual(binary_dist_ochiai(self.mtx1),mtx1expected) def test_binary_dist_hamming(self): """tests binary_dist_hamming tests inputs of empty mtx, zeros, and mtx1 compared with calcs 
done by hand""" self.assertFloatEqual(binary_dist_hamming(self.zeromtx), zeros((4,4),'d')) mtx1expected = array([[0,1], [1, 0,]],'d') self.assertFloatEqual(binary_dist_hamming(self.mtx1),mtx1expected) def test_binary_dist_sorensen_dice(self): """tests binary_dist_sorensen_dice tests inputs of empty mtx, zeros, and mtx1 compared with calcs done by hand""" self.assertFloatEqual(binary_dist_sorensen_dice(self.zeromtx), zeros((4,4),'d')) mtx1expected = array([[0,1/3.], [1/3., 0,]],'d') self.assertFloatEqual(binary_dist_sorensen_dice(self.mtx1), mtx1expected) sparse1expected = array([[0, 0, 1., 1.], [0, 0, 1, 1], [1,1,0,1], [1,1,1,0]],'d') self.assertFloatEqual(binary_dist_sorensen_dice(self.sparse1), sparse1expected) def test_binary_dist_euclidean(self): """tests binary_dist_euclidean tests two inputs compared with calculations by hand, and runs zeros and an empty input""" dense1expected = array([[ 0.0, 0.0, 0.0], [ 0.0, 0.0 , 0.0], [ 0.0, 0.0, 0.0 ]],'d') sparse1expected = zeros((4,4),'d') sparse1expected[0,2] = sqrt(2) sparse1expected[0,3] = 1.0 sparse1expected[1,2] = sqrt(2) sparse1expected[1,3] = 1.0 sparse1expected[2,3] = 1.0 sparse1expected = self.get_sym_mtx_from_uptri(sparse1expected) self.assertFloatEqual(binary_dist_euclidean(self.dense1), dense1expected) self.assertFloatEqual(binary_dist_euclidean(self.sparse1), sparse1expected) self.assertFloatEqual(binary_dist_euclidean(self.zeromtx), zeros((4,4),'d')) #zj's stuff def test_chord_transform(self): """trans_chord should return the exp result in the ref paper.""" exp = [[ 0.40824829, 0.40824829, 0.81649658], [ 0.48507125, 0.72760688, 0.48507125], [ 0.90453403, 0.30151134, 0.30151134]] res = trans_chord(self.mat_test) self.assertFloatEqual(res, exp) def test_chord_dist(self): """dist_chord should return the exp result.""" self.assertFloatEqual(dist_chord(self.zeromtx), zeros((4,4),'d')) exp = [[ 0. , 0.46662021, 0.72311971], [ 0.46662021, 0. , 0.62546036], [ 0.72311971, 0.62546036, 0. 
]] dist = dist_chord(self.mat_test) self.assertFloatEqual(dist, exp) def test_chisq_transform(self): """trans_chisq should return the exp result in the ref paper.""" exp_m = [[ 0.42257713, 0.45643546, 0.84515425], [ 0.48294529, 0.7824608 , 0.48294529], [ 1.01418511, 0.36514837, 0.3380617 ]] res_m = trans_chisq(self.mat_test) self.assertFloatEqual(res_m, exp_m) def test_chisq_distance(self): """dist_chisq should return the exp result.""" self.assertFloatEqual(dist_chisq(self.zeromtx), zeros((4,4),'d')) exp_d = [[ 0. , 0.4910521 , 0.78452291], [ 0.4910521 , 0. , 0.69091002], [ 0.78452291, 0.69091002, 0. ]] res_d = dist_chisq(self.mat_test) self.assertFloatEqual(res_d, exp_d) def test_hellinger_transform(self): """dist_hellinger should return the exp result in the ref paper.""" exp = [[ 0.5 , 0.5 , 0.70710678], [ 0.53452248, 0.65465367, 0.53452248], [ 0.77459667, 0.4472136 , 0.4472136 ]] res = trans_hellinger(self.mat_test) self.assertFloatEqual(res, exp) def test_hellinger_distance(self): """dist_hellinger should return the exp result.""" self.assertFloatEqual(dist_hellinger(self.zeromtx), zeros((4,4),'d')) exp = [[ 0. , 0.23429661, 0.38175149], [ 0.23429661, 0. , 0.32907422], [ 0.38175149, 0.32907422, 0. ]] dist = dist_hellinger(self.mat_test) self.assertFloatEqual(dist, exp) def test_species_profile_transform(self): """trans_specprof should return the exp result.""" exp = [[ 0.25 , 0.25 , 0.5 ], [ 0.28571429, 0.42857143, 0.28571429], [ 0.6 , 0.2 , 0.2 ]] res = trans_specprof(self.mat_test) self.assertFloatEqual(res, exp) def test_species_profile_distance(self): """dist_specprof should return the exp result.""" self.assertFloatEqual(dist_specprof(self.zeromtx), zeros((4,4),'d')) exp = [[ 0. , 0.28121457, 0.46368092], [ 0.28121457, 0. , 0.39795395], [ 0.46368092, 0.39795395, 0. 
]] dist = dist_specprof(self.mat_test) self.assertFloatEqual(dist, exp) def test_dist_bray_curtis_magurran1(self): """ zero values should return zero dist, or 1 with nonzero samples""" res = dist_bray_curtis_magurran( numpy.array([[0,0,0], [0,0,0], [1,1,1], ])) self.assertFloatEqual(res,numpy.array([ [0,0,1], [0,0,1], [1,1,0], ])) def test_dist_bray_curtis_magurran2(self): """ should match hand-calculated values""" res = dist_bray_curtis_magurran( numpy.array([[1,4,3], [1,3,5], [0,2,0], ])) self.assertFloatEqual(res,numpy.array([ [0,1-14/17,1-(.4)], [1-14/17,0,1-4/11], [1-.4,1-4/11,0], ])) #def test_no_dupes(self): #""" here we check all distance functions in distance_transform for #duplicate #results. Uses an unsafe hack to get all distance functions, #thus disabled by default #The dataset is from Legendre 2001, Ecologically Meaningful... #also, doesn't actually raise an error on failing, just prints to #stdout #""" #import distance_transform ## L19 dataset #L19data = array( #[[7,1,0,0,0,0,0,0,0], #[4,2,0,0,0,1,0,0,0], #[2,4,0,0,0,1,0,0,0], #[1,7,0,0,0,0,0,0,0], #[0,8,0,0,0,0,0,0,0], #[0,7,1,0,0,0,0,0,0], #[0,4,2,0,0,0,2,0,0], #[0,2,4,0,0,0,1,0,0], #[0,1,7,0,0,0,0,0,0], #[0,0,8,0,0,0,0,0,0], #[0,0,7,1,0,0,0,0,0], #[0,0,4,2,0,0,0,3,0], #[0,0,2,4,0,0,0,1,0], #[0,0,1,7,0,0,0,0,0], #[0,0,0,8,0,0,0,0,0], #[0,0,0,7,1,0,0,0,0], #[0,0,0,4,2,0,0,0,4], #[0,0,0,2,4,0,0,0,1], #[0,0,0,1,7,0,0,0,0]], 'd') #distfns = [] #distfn_strs = dir(distance_transform) ## warning: dangerous eval, and might catch bad or not functions #for fnstr in distfn_strs: #if fnstr.find('dist') != -1: #distfns.append(eval('%s' % fnstr)) #dist_results = [] #for distfn in distfns: #dist_results.append(distfn(L19data)) #for i in range(len(dist_results)): #for j in range(i): #try: #self.assertFloatEqual(dist_results[i], dist_results[j]) #except: #pass # should not be equal, so catch error and proceed #else: #print "duplicates found: ", distfns[i], distfns[j] if __name__ == '__main__': main() 
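
The Bray-Curtis tests above check hand-computed values such as 1-14/17 and 1-4/11 against the Magurran formulation, 1 - 2*sum(min)/(S_i + S_j), where S_i and S_j are the row totals. As a sanity check independent of PyCogent, the same expectations can be reproduced with plain numpy (the function name `bray_curtis_dist` is mine, for illustration only; the zero-total guard mirrors the convention in test_dist_bray_curtis_magurran1, where two all-zero samples get distance 0):

```python
import numpy as np

def bray_curtis_dist(m):
    """Pairwise Bray-Curtis dissimilarity: 1 - 2*sum(min)/(S_i + S_j)."""
    m = np.asarray(m, dtype=float)
    n = m.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = np.minimum(m[i], m[j]).sum()  # abundance shared by both samples
            total = m[i].sum() + m[j].sum()        # combined abundance of both samples
            if total == 0:
                d = 0.0  # two empty samples: define distance as 0
            else:
                d = 1.0 - 2.0 * shared / total
            out[i, j] = out[j, i] = d
    return out

data = np.array([[1, 4, 3],
                 [1, 3, 5],
                 [0, 2, 0]], dtype=float)
d = bray_curtis_dist(data)
# d[0,1] = 1 - 14/17, d[0,2] = 1 - 0.4, d[1,2] = 1 - 4/11,
# matching the expectations in test_dist_bray_curtis_magurran2.
```
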
PyCogent-1.5.3/tests/test_maths/test_fit_function.py000644 000765 000024 00000003541 12024702176 023661 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for fit function. """ from numpy import array, arange, exp from numpy.random import rand from cogent.util.unit_test import TestCase, main from cogent.maths.fit_function import fit_function __author__ = "Antonio Gonzalez Pena" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Antonio Gonzalez Pena"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Antonio Gonzalez Pena" __email__ = "antgonza@gmail.com" __status__ = "Prototype" class fit_function_test(TestCase): """Tests of top-level fit functions.""" def test_constant(self): """test constant approximation""" # defining our fitting function def f(x,a): return a[0] exp_params = [2] x = arange(-1,1,.01) y = f(x, exp_params) y_noise = y + rand(len(x)) params = fit_function(x, y_noise, f, 1, 5) self.assertFloatEqual(params, exp_params , .5) def test_linear(self): """test linear approximation""" # defining our fitting function def f(x,a): return (a[0]+x*a[1]) exp_params = [2, 10] x = arange(-1,1,.01) y = f(x, exp_params) y_noise = y + rand(len(y)) params = fit_function(x, y_noise, f, 2, 5) self.assertFloatEqual(params, exp_params , .5) def test_exponential(self): """test exponential approximation""" # defining our fitting function def f(x,a): return exp(a[0]+x*a[1]) exp_params = [2, 10] x = arange(-1,1,.01) y = f(x, exp_params) y_noise = y + rand(len(y)) params = fit_function(x, y_noise, f, 2, 5) self.assertFloatEqual(params, exp_params , .5) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_function_optimisation.py000644 000765 000024 00000005524 12024702176 025621 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for optimisation functions""" from function_optimisation import great_deluge, ga_evolve, _simple_breed,\ _simple_score, _simple_init, _simple_select from 
cogent.util.unit_test import TestCase, main
from numpy import array

__author__ = "Daniel McDonald"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Daniel McDonald"
__email__ = "mcdonadt@colorado.edu"
__status__ = "Production"

class OptimisationFunctionsTestCase(TestCase):
    """Tests for great_deluge and ga_evolve"""

    def test_great_deluge(self):
        """great_deluge should return expected values from foo() obj"""
        class foo:
            def __init__(self, x):
                self.x = x
            def cost(self):
                return self.x
            def perturb(self):
                return self.__class__(self.x - 1)

        observed = [i for i in great_deluge(foo(5), max_total_iters=6)]
        self.assertEqual(observed[0][1].x, 4)
        self.assertEqual(observed[1][1].x, 3)
        self.assertEqual(observed[2][1].x, 2)
        self.assertEqual(observed[3][1].x, 1)
        self.assertEqual(observed[4][1].x, 0)
        self.assertEqual(observed[5][1].x, -1)

    def test_ga_evolve(self):
        """ga_evolve should return expected values when using overloaded funcs"""
        init_f = lambda x,y: [1,1,1]
        score_f = lambda x,y: 5
        breed_f = lambda w,x,y,z: [1,1,1]
        select_f = lambda x,y: 2
        expected = [(0, 2), (1, 2), (2, 2)]
        observed = [i for i in ga_evolve(1, 2, 3, 0.5, score_f, breed_f, \
                    select_f, init_f, None, 3)]
        self.assertEqual(observed, expected)

class PrivateFunctionsTestCase(TestCase):
    """Tests of the private support functions for ga_evolve"""

    def test_simple_breed(self):
        """simple_breed should return expected values with modded parent"""
        f = lambda: 0.5
        obj = lambda: 0
        obj.mutate = lambda: 1
        expected = [1,1,1,1,1]
        observed = _simple_breed([0, obj], 5, 1.0, f)
        self.assertEqual(observed, expected)

    def test_simple_score(self):
        """simple_score should return chosen value with overloaded obj"""
        bar = lambda: 5
        bar.score = lambda x: x
        self.assertEqual(_simple_score(bar,6), 6)

    def test_simple_init(self):
        """simple_init should return a simple list"""
        expected = [array([0]), array([0]), array([0])]
self.assertEqual(_simple_init(array([0]), 3), expected) def test_simple_select(self): """simple_select should return our hand picked selection""" pop = ['a','b','c','d','e'] scores = [5,3,8,6,1] best_expected = (1, 'e') self.assertEqual(_simple_select(pop, scores), best_expected) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_geometry.py000644 000765 000024 00000011335 12024702176 023025 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of the geometry package.""" from numpy import array, take, newaxis from math import sqrt from cogent.util.unit_test import TestCase, main from cogent.maths.geometry import center_of_mass_one_array, \ center_of_mass_two_array, center_of_mass, distance, sphere_points __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class CenterOfMassTests(TestCase): """Tests for the center of mass functions""" def setUp(self): """setUp for all CenterOfMass tests""" self.simple = array([[1, 1, 1], [3, 1, 1], [2, 3, 2]]) self.simple_list = [[1, 1, 1], [3, 1, 1], [2, 3, 2]] self.more_weight = array([[1, 1, 3], [3, 1, 3], [2, 3, 50]]) self.square = array([[1, 1, 25], [3, 1, 25], [3, 3, 25], [1, 3, 25]]) self.square_odd = array([[1, 1, 25], [3, 1, 4], [3, 3, 25], [1, 3, 4]]) self.sec_weight = array([[1, 25, 1], [3, 25, 1], [3, 25, 3], [1, 25, 3]]) def test_center_of_mass_one_array(self): """center_of_mass_one_array should behave correctly""" com1 = center_of_mass_one_array self.assertEqual(com1(self.simple), array([2, 2])) self.assertEqual(com1(self.simple_list), array([2, 2])) self.assertFloatEqual(com1(self.more_weight), array([2, 2.785714])) self.assertEqual(com1(self.square), array([2, 2])) self.assertEqual(com1(self.square_odd), array([2, 2])) self.assertEqual(com1(self.sec_weight, 1), array([2, 
2])) def test_CoM_one_array_wrong(self): """center_of_mass_one_array should fail on wrong input""" com1 = center_of_mass_one_array self.assertRaises(TypeError, com1, self.simple, 'a') #weight_idx wrong self.assertRaises(IndexError, com1, self.simple, 100) #w_idx out of range self.assertRaises(IndexError, com1, [1, 2, 3], 2) #shape[1] out of range def test_center_of_mass_two_array(self): """center_of_mass_two_array should behave correctly""" com2 = center_of_mass_two_array coor = take(self.square_odd, (0, 1), 1) weights = take(self.square_odd, (2,), 1) self.assertEqual(com2(coor, weights), array([2, 2])) weights = weights.ravel() self.assertEqual(com2(coor, weights), array([2, 2])) def test_CoM_two_array_wrong(self): """center_of_mass_two_array should fail on wrong input""" com2 = center_of_mass_two_array weights = [1, 2] self.assertRaises(TypeError, com2, self.simple, 'a') #weight_idx wrong self.assertRaises(ValueError, com2, self.simple, weights) #not aligned def test_center_of_mass(self): """center_of_mass should make right choice between functional methods """ com = center_of_mass com1 = center_of_mass_one_array com2 = center_of_mass_two_array self.assertEqual(com(self.simple), com1(self.simple)) self.assertFloatEqual(com(self.more_weight), com1(self.more_weight)) self.assertEqual(com(self.sec_weight, 1), com1(self.sec_weight, 1)) coor = take(self.square_odd, (0, 1), 1) weights = take(self.square_odd, (2,), 1) self.assertEqual(com(coor, weights), com2(coor, weights)) weights = weights.ravel() self.assertEqual(com(coor, weights), com2(coor, weights)) def test_distance(self): """distance should return Euclidean distance correctly.""" #for single dimension, should return difference a1 = array([3]) a2 = array([-1]) self.assertEqual(distance(a1, a2), 4) #for two dimensions, should work e.g. 
for 3, 4, 5 triangle a1 = array([0, 0]) a2 = array([3, 4]) self.assertEqual(distance(a1, a2), 5) #vector should be the same as itself for any dimensions a1 = array([1.3, 23, 5.4, 2.6, -1.2]) self.assertEqual(distance(a1, a1), 0) #should match hand-calculated case for an array a1 = array([[1, -2], [3, 4]]) a2 = array([[1, 0], [-1, 2.5]]) self.assertEqual(distance(a1, a1), 0) self.assertEqual(distance(a2, a2), 0) self.assertEqual(distance(a1, a2), distance(a2, a1)) self.assertFloatEqual(distance(a1, a2), sqrt(22.25)) def test_sphere_points(self): """tests sphere points""" self.assertEquals(sphere_points(1), array([[ 1., 0., 0.]])) # def test_coords_to_symmetry(self): # """tests symmetry expansion (TODO)""" # pass # # def test_coords_to_crystal(self): # """tests crystal expansion (TODO)""" # pass if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_matrix/000755 000765 000024 00000000000 12024703635 022122 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_maths/test_matrix_logarithm.py000644 000765 000024 00000003677 12024702176 024556 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for matrix logarithm.""" from numpy import array from cogent.util.unit_test import TestCase, main from cogent.maths.matrix_logarithm import logm, logm_taylor __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class logarithm_tests(TestCase): """Tests of top-level matrix logarithm functions.""" def test_logm(self): """logm results should match scipy's""" p = array([[ 0.86758487, 0.05575623, 0.0196798 , 0.0569791 ], [ 0.01827347, 0.93312148, 0.02109664, 0.02750842], [ 0.04782582, 0.1375742 , 0.80046869, 0.01413129], [ 0.23022035, 0.22306947, 0.06995306, 0.47675713]]) q = logm(p) self.assertFloatEqual(q, \ array([[-0.15572053, 
0.04947485, 0.01918653, 0.08705915], [ 0.01405019, -0.07652296, 0.02252941, 0.03994336], [ 0.05365208, 0.15569116, -0.22588966, 0.01654642], [ 0.35144866, 0.31279003, 0.10478999, -0.76902868]])) def test_logm_taylor(self): """logm_taylor should return same result as logm""" q_eig = logm([[ 0.86758487, 0.05575623, 0.0196798 , 0.0569791 ], [ 0.01827347, 0.93312148, 0.02109664, 0.02750842], [ 0.04782582, 0.1375742 , 0.80046869, 0.01413129], [ 0.23022035, 0.22306947, 0.06995306, 0.47675713]]) q_taylor = logm_taylor([[0.86758487, 0.05575623, 0.0196798, 0.0569791], [ 0.01827347, 0.93312148, 0.02109664, 0.02750842], [ 0.04782582, 0.1375742 , 0.80046869, 0.01413129], [ 0.23022035, 0.22306947, 0.06995306, 0.47675713]]) self.assertFloatEqual(q_taylor, q_eig) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_optimisers.py000644 000765 000024 00000006436 12024702176 023376 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division import time, sys, os, numpy from cogent.util.unit_test import TestCase, main from cogent.maths.optimisers import maximise, MaximumEvaluationsReached __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def quartic(x): # Has global maximum at -4 and local maximum at 2 # http://www.wolframalpha.com/input/?i=x**2*%283*x**2%2B8*x-48%29 # Scaled down 10-fold to avoid having to change init_temp return x**2*(3*x**2+8*x-48) class NullFile(object): def write(self, x): pass def isatty(self): return False def quiet(f, *args, **kw): # Checkpointer still has print statements orig = sys.stdout try: sys.stdout = NullFile() result = f(*args, **kw) finally: sys.stdout = orig return result def MakeF(): evals = [0] last = [0] def f(x): evals[0] += 1 last[0] = x # Scaled down 10-fold 
to avoid having to change init_temp return -0.1 * quartic(x) return f, last, evals class OptimiserTestCase(TestCase): def _test_optimisation(self, target=-4, xinit=1.0, bounds=([-10,10]), **kw): local = kw.get('local', None) max_evaluations = kw.get('max_evaluations', None) f, last, evals = MakeF() x = quiet(maximise, f, [xinit], bounds, **kw) self.assertEqual(x, last[0]) # important for Calculator error = abs(x[0] - target) self.assertTrue(error < .0001, (kw, x, target, x)) def test_global(self): # Should find global minimum self._test_optimisation(local=False, seed=1) def test_bounded(self): # Global minimum out of bounds, so find secondary one # numpy.seterr('raise') self._test_optimisation(bounds=([0.0],[10.0]), target=2, seed=1) def test_local(self): # Global minimum not the nearest one self._test_optimisation(local=True, target=2) def test_limited(self): self.assertRaises(MaximumEvaluationsReached, self._test_optimisation, max_evaluations=5) # def test_limited_warning(self): # """optimiser warning if max_evaluations exceeded""" # self._test_optimisation(max_evaluations=5, limit_action='warn') def test_get_max_eval_count(self): """return the evaluation count from optimisation""" f, last, evals = MakeF() x, e = quiet(maximise, f, xinit=[1.0], bounds=([-10,10]), return_eval_count=True) self.assertTrue(e > 500) def test_checkpointing(self): filename = 'checkpoint.tmp.pickle' if os.path.exists(filename): os.remove(filename) self._test_optimisation(filename=filename, seed=1, init_temp=10) self._test_optimisation(filename=filename, seed=1, init_temp=10) self.assertRaises(Exception, self._test_optimisation, filename=filename, seed=1, init_temp=3.21) if os.path.exists(filename): os.remove(filename) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_period.py000644 000765 000024 00000021456 12024702176 022461 0ustar00jrideoutstaff000000 000000 from numpy import arange, convolve, random, sin, pi, exp, array, zeros, float64 from cogent.util.unit_test 
import TestCase, main from cogent.maths.period import ipdft, dft, auto_corr, hybrid, goertzel # because we'll be comparing python and pyrexed implementations of the same # algorithms I'm separating out those imports to make it clear from cogent.maths.period import _ipdft_inner2 as py_ipdft_inner, \ _goertzel_inner as py_goertzel_inner, _autocorr_inner2 as py_autocorr_inner try: from cogent.maths._period import ipdft_inner as pyx_ipdft_inner, \ goertzel_inner as pyx_goertzel_inner, \ autocorr_inner as pyx_autocorr_inner pyrex_available = True except ImportError: pyrex_available = False __author__ = "Hua Ying, Julien Epps and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Julien Epps", "Hua Ying", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "Production" class TestPeriod(TestCase): def setUp(self): t = arange(0, 10, 0.1) n = random.randn(len(t)) nse = convolve(n, exp(-t/0.05))*0.1 nse = nse[:len(t)] self.sig = sin(2*pi*t) + nse self.p = 10 def test_inner_funcs(self): """python and pyrexed implementation should be the same""" if pyrex_available is not True: return x = array([0.04874203, 0.56831373, 0.94267804, 0.95664485, 0.60719478, -0.09037356, -0.69897319, -1.11239811, -0.84127485, -0.56281126, 0.02301213, 0.56250284, 1.0258557 , 1.03906527, 0.69885916, 0.10103556, -0.43248024, -1.03160503, -0.84901545, -0.84934356, 0.00323728, 0.44344594, 0.97736748, 1.01635433, 0.38538423, 0.09869918, -0.60441861, -0.90175391, -1.00166887, -0.66303249, -0.02070569, 0.76520328, 0.93462426, 0.97011673, 0.63199999, 0.0764678 , -0.55680168, -0.92028808, -0.98481451, -0.57600588, 0.0482667 , 0.57572519, 1.02077883, 0.93271663, 0.41581696, -0.07639671, -0.71426286, -0.97730119, -1.0370596 , -0.67919572, 0.03779302, 0.60408759, 0.87826068, 0.79126442, 0.69769622, 0.01419442, -0.42917556, -1.00100485, -0.83945546, -0.55746313, 0.12730859, 
0.60057659, 0.98059721, 0.83275501, 0.69031804, 0.02277554, -0.63982729, -1.23680355, -0.79477887, -0.67773375, -0.05204714, 0.51765381, 0.77691955, 0.8996709 , 0.5153137 , 0.01840839, -0.65124866, -1.13269058, -0.92342177, -0.45673709, 0.11212881, 0.50153941, 1.09329507, 0.96457193, 0.80271578, -0.0041043 , -0.81750772, -0.99259986, -0.92343788, -0.57694955, 0.13982059, 0.56653375, 0.82217563, 0.85162513, 0.3984116 , -0.18937514, -0.65304629, -1.0067146 , -1.0037422 , -0.68011283]) N = 100 period = 10 self.assertFloatEqual(py_goertzel_inner(x, N, period), pyx_goertzel_inner(x, N, period)) ulim = 8 N = 8 x = array([ 0., 1., 0., -1., 0., 1., 0., -1.]) X = zeros(8, dtype='complex128') W = array([1.00000000e+00 +2.44929360e-16j, -1.00000000e+00 -1.22464680e-16j, -5.00000000e-01 -8.66025404e-01j, 6.12323400e-17 -1.00000000e+00j, 3.09016994e-01 -9.51056516e-01j, 5.00000000e-01 -8.66025404e-01j, 6.23489802e-01 -7.81831482e-01j, 7.07106781e-01 -7.07106781e-01j]) py_result = py_ipdft_inner(x, X, W, ulim, N) pyx_result = pyx_ipdft_inner(x, X, W, ulim, N) for i, j in zip(py_result, pyx_result): self.assertFloatEqual(abs(i), abs(j)) x = array([-0.07827614, 0.56637551, 1.01320526, 1.01536245, 0.63548361, 0.08560101, -0.46094955, -0.78065656, -0.8893556 , -0.56514145, 0.02325272, 0.63660719, 0.86291302, 0.82953598, 0.5706848 , 0.11655242, -0.6472655 , -0.86178218, -0.96495057, -0.76098445, -0.18911517, 0.59280646, 1.00248693, 0.89241423, 0.52475111, -0.01620599, -0.60199278, -0.98279829, -1.12469771, -0.61355799, 0.04321191, 0.52784788, 0.68508784, 0.86015123, 0.66825756, -0.0802846 , -0.63626753, -0.93023345, -0.99129547, -0.46891033, 0.04145813, 0.71226518, 1.01499246, 0.94726778, 0.63598143, -0.21920589, -0.48071702, -0.86041579, -0.9046141 , -0.55714746, -0.10052384, 0.69708969, 1.02575789, 1.16524031, 0.49895282, -0.13068573, -0.45770419, -0.86155787, -0.9230734 , -0.6590525 , -0.05072955, 0.52380317, 1.02674335, 0.87778499, 0.4303284 , -0.01855665, -0.62858193, 
-0.93954774, -0.94257301, -0.49692951, 0.00699347, 0.69049074, 0.93906549, 1.06339809, 0.69337543, 0.00252569, -0.57825881, -0.88460603, -0.99259672, -0.73535697, 0.12064751, 0.91159174, 0.88966993, 1.02159917, 0.43479926, -0.06159005, -0.61782651, -0.95284676, -0.8218889 , -0.52166419, 0.021961 , 0.52268762, 0.79428288, 1.01642697, 0.49060377, -0.02183994, -0.52743836, -0.99363909, -1.02963821, -0.64249996]) py_xc = zeros(2*len(x)-1, dtype=float64) pyx_xc = py_xc.copy() N = 100 py_autocorr_inner(x, py_xc, N) pyx_autocorr_inner(x, pyx_xc, N) for i, j in zip(py_xc, pyx_xc): self.assertFloatEqual(i, j) def test_autocorr(self): """correctly compute autocorrelation""" s = [1,1,1,1] X, periods = auto_corr(s, llim=-3, ulim=None) exp_X = array([1,2,3,4,3,2,1], dtype=float) self.assertEqual(X, exp_X) auto_x, auto_periods = auto_corr(self.sig, llim=2, ulim=50) max_idx = list(auto_x).index(max(auto_x)) auto_p = auto_periods[max_idx] self.assertEqual(auto_p, self.p) def test_dft(self): """correctly compute discrete fourier transform""" dft_x, dft_periods = dft(self.sig) dft_x = abs(dft_x) max_idx = list(dft_x).index(max(dft_x)) dft_p = dft_periods[max_idx] self.assertEqual(int(dft_p), self.p) def test_ipdft(self): """correctly compute integer discrete fourier transform""" s = [0, 1, 0, -1, 0, 1, 0, -1] X, periods = ipdft(s, llim=1, ulim=len(s)) exp_X = abs(array([0, 0, -1.5+0.866j, -4j, 2.927-0.951j, 1.5+0.866j, 0.302+0.627j, 0])) X = abs(X) self.assertFloatEqual(X, exp_X, eps=1e-3) ipdft_x, ipdft_periods = ipdft(self.sig, llim=2, ulim=50) ipdft_x = abs(ipdft_x) max_idx = list(ipdft_x).index(max(ipdft_x)) ipdft_p = ipdft_periods[max_idx] self.assertEqual(ipdft_p, self.p) def test_goertzel(self): """goertzel and ipdft should be the same""" ipdft_pwr, ipdft_prd = ipdft(self.sig, llim=10, ulim=10) self.assertFloatEqual(goertzel(self.sig, 10), ipdft_pwr) def test_hybrid(self): """correctly compute hybrid statistic""" hybrid_x, hybrid_periods = hybrid(self.sig, llim=None, ulim=50) 
hybrid_x = abs(hybrid_x) max_idx = list(hybrid_x).index(max(hybrid_x)) hybrid_p = hybrid_periods[max_idx] self.assertEqual(hybrid_p, self.p) def test_hybrid_returns_all(self): """correctly returns hybrid, ipdft and autocorr statistics""" ipdft_pwr, ipdft_prd = ipdft(self.sig, llim=2, ulim=50) auto_x, auto_periods = auto_corr(self.sig, llim=2, ulim=50) hybrid_x, hybrid_periods = hybrid(self.sig, llim=None, ulim=50) hybrid_ipdft_autocorr_stats, hybrid_periods = hybrid(self.sig, llim=None, ulim=50, return_all=True) self.assertEqual(hybrid_ipdft_autocorr_stats[0], hybrid_x) self.assertEqual(hybrid_ipdft_autocorr_stats[1], ipdft_pwr) self.assertEqual(hybrid_ipdft_autocorr_stats[2], auto_x) ipdft_pwr, ipdft_prd = ipdft(self.sig, llim=10, ulim=10) auto_x, auto_periods = auto_corr(self.sig, llim=10, ulim=10) hybrid_x, hybrid_periods = hybrid(self.sig, llim=10, ulim=10) hybrid_ipdft_autocorr_stats, hybrid_periods = hybrid(self.sig, llim=10, ulim=10, return_all=True) self.assertEqual(hybrid_ipdft_autocorr_stats[0], hybrid_x) self.assertEqual(hybrid_ipdft_autocorr_stats[1], ipdft_pwr) self.assertEqual(hybrid_ipdft_autocorr_stats[2], auto_x) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_spatial/000755 000765 000024 00000000000 12024703635 022253 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_maths/test_stats/000755 000765 000024 00000000000 12024703635 021754 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_maths/test_svd.py000644 000765 000024 00000007612 12024702176 021771 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for the svd-supporting functionality.""" from cogent.util.unit_test import TestCase, main from cogent.maths.svd import ratio_two_best, ratio_best_to_sum, \ euclidean_distance, euclidean_norm, _dists_from_mean_slow, \ dists_from_v, weiss, three_item_combos, two_item_combos from numpy import array, sqrt __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" 
__contributors__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

class functionTests(TestCase):
    """Tests of top-level functions."""

    def test_ratio_two_best(self):
        """ratio_two_best should return ratio of two biggest items in list"""
        v = array([3, 2, 5, 2, 4, 10, 3])
        self.assertEqual(ratio_two_best(v), 2)
        #should return 1 if items the same
        v = array([2,2,2,2,2])
        self.assertEqual(ratio_two_best(v), 1)
        #check that it works on floating-point
        v = array([3,2,1])
        self.assertEqual(ratio_two_best(v), 1.5)

    def test_ratio_best_to_sum(self):
        """ratio_best_to_sum should return ratio of biggest item to sum"""
        v = [3, 2, 5, 2, 4, 10, 3]
        self.assertFloatEqual(ratio_best_to_sum(v), 10/29.0)
        v = [2,2,2,2,2]
        self.assertEqual(ratio_best_to_sum(v), 2/10.0)
        #check that it works on floating-point
        v = [3,2,1]
        self.assertEqual(ratio_best_to_sum(v), 0.5)

    def test_euclidean_distance(self):
        """euclidean_distance should return distance between two points"""
        first = array([2, 3, 4])
        second = array([4, 8, 10])
        self.assertEqual(euclidean_distance(first, first), 0)
        self.assertEqual(euclidean_distance(second, second), 0)
        self.assertFloatEqual(euclidean_distance(first, second), sqrt(65))
        self.assertFloatEqual(euclidean_distance(second, first), sqrt(65))

    def test_euclidean_norm(self):
        """euclidean_norm should match hand-calculated results"""
        first = array([3,4])
        self.assertEqual(euclidean_norm(first), 5)

    def test_dists_from_mean_slow(self):
        """_dists_from_mean_slow should return distance of each item from mean"""
        m = [[1,2,3,4],[2,3,4,5],[0,1,2,3]]
        self.assertEqual(_dists_from_mean_slow(m), array([0.0,2.0,2.0]))

    def test_dists_from_v(self):
        """dists_from_v should return distance of each item from v, or mean"""
        m = [[1,2,3,4],[2,3,4,5],[0,1,2,3]]
        #should calculate distances from mean by default
        self.assertEqual(dists_from_v(m), array([0.0,2.0,2.0]))
        #should calculate distances from vector if supplied
        v = array([2,2,2,3])
        self.assertEqual(dists_from_v(m, v), sqrt(array([3,9,5])))

    def test_weiss(self):
        """weiss should perform weiss calculation correctly"""
        e = array([12.0, 5.0, 0.1, 1e-3, 1e-15])
        self.assertFloatEqual(weiss(e), 4.453018506827001)

    def test_three_item_combos(self):
        """three_item_combos should return items in correct order"""
        items = list(three_item_combos('abcde'))
        self.assertEqual(items, map(tuple, \
            ['abc','abd','abe','acd','ace','ade','bcd','bce','bde','cde']))

    def test_two_item_combos(self):
        """two_item_combos should return items in correct order"""
        items = list(two_item_combos('abcd'))
        self.assertEqual(items, map(tuple, ['ab','ac','ad','bc','bd','cd']))

    def test_pca_qs(self):
        """pca_qs not tested b/c it just wraps eigenvalues(corrcoef(qs))"""
        pass

    def test_pca_cov_qs(self):
        """pca_cov_qs not tested b/c it just wraps eigenvalues(cov(qs))"""
        pass

    def test_svd_qs(self):
        """svd_qs not tested b/c it just wraps singular_value_decomposition(qs)"""
        pass

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_maths/test_unifrac/__init__.py
#!/usr/bin/env python
__all__ = ['test_fast_tree', 'test_fast_unifrac']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Micah Hamady"]
__license__ = "All rights reserved"
__version__ = "1.5.3"
__maintainer__ = "Micah Hamady"
__email__ = "hamady@colorado.edu"
__status__ = "Prototype"

PyCogent-1.5.3/tests/test_maths/test_unifrac/test_fast_tree.py
#!/usr/bin/env python
"""Unit tests for fast tree."""
from cogent.util.unit_test import TestCase, main
from cogent.parse.tree import DndParser
from cogent.maths.unifrac.fast_tree import (count_envs, sum_env_dict,
    index_envs, get_branch_lengths, index_tree, bind_to_array,
    bind_to_parent_array, _is_parent_empty, delete_empty_parents,
    traverse_reduce, bool_descendants, sum_descendants, fitch_descendants,
    tip_distances, UniFracTreeNode, FitchCounter, FitchCounterDense,
    permute_selected_rows, prep_items_for_jackknife, jackknife_bool,
    jackknife_int, unifrac, unnormalized_unifrac, PD, G, unnormalized_G,
    unifrac_matrix, unifrac_vector, PD_vector, weighted_unifrac,
    weighted_unifrac_matrix, weighted_unifrac_vector, jackknife_array,
    env_unique_fraction, unifrac_one_sample, weighted_one_sample)
from numpy import (arange, reshape, zeros, logical_or, array, sum, nonzero,
    flatnonzero, newaxis)
from numpy.random import permutation

__author__ = "Rob Knight and Micah Hamady"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Micah Hamady"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight, Micah Hamady"
__email__ = "rob@spot.colorado.edu, hamady@colorado.edu"
__status__ = "Prototype"

class fast_tree_tests(TestCase):
    """Tests of top-level functions"""

    def setUp(self):
        """Define a couple of standard trees"""
        self.t1 = DndParser('(((a,b),c),(d,e))', UniFracTreeNode)
        self.t2 = DndParser('(((a,b),(c,d)),(e,f))', UniFracTreeNode)
        self.t3 = DndParser('(((a,b,c),(d)),(e,f))', UniFracTreeNode)
        self.t4 = DndParser('((c)b,((f,g,h)e,i)d)', UniFracTreeNode)
        self.t4.Name = 'a'
        self.t_str = '((a:1,b:2):4,(c:3,(d:1,e:1):2):3)'
        self.t = DndParser(self.t_str, UniFracTreeNode)
        self.env_str = """
        a   A   1
        a   C   2
        b   A   1
        b   B   1
        c   B   1
        d   B   3
        e   C   1"""
        self.env_counts = count_envs(self.env_str.splitlines())
        self.node_index, self.nodes = index_tree(self.t)
        self.count_array, self.unique_envs, self.env_to_index, \
            self.node_to_index = index_envs(self.env_counts, self.node_index)
        self.branch_lengths = get_branch_lengths(self.node_index)
        self.old_t_str = '((org1:0.11,org2:0.22,(org3:0.12,org4:0.23)g:0.33)b:0.2,(org5:0.44,org6:0.55)c:0.3,org7:0.4)'
        self.old_t = DndParser(self.old_t_str, UniFracTreeNode)
        self.old_env_str = """
        org1    env1    1
        org1    env2    1
        org2    env2    1
        org3    env2    1
        org4    env3    1
        org5    env1    1
        org6    env1    1
        org7    env3    1
        """
        self.old_env_counts = count_envs(self.old_env_str.splitlines())
        self.old_node_index, self.old_nodes = index_tree(self.old_t)
        self.old_count_array, self.old_unique_envs, self.old_env_to_index, \
            self.old_node_to_index = index_envs(self.old_env_counts,
                self.old_node_index)
        self.old_branch_lengths = get_branch_lengths(self.old_node_index)

    def test_traverse(self):
        """traverse should work iterative or recursive"""
        stti = self.t4.traverse
        stt = self.t4.traverse_recursive
        obs = [i.Name for i in stt(self_before=False, self_after=False)]
        exp = [i.Name for i in stti(self_before=False, self_after=False)]
        self.assertEqual(obs, exp)
        obs = [i.Name for i in stt(self_before=True, self_after=False)]
        exp = [i.Name for i in stti(self_before=True, self_after=False)]
        self.assertEqual(obs, exp)
        obs = [i.Name for i in stt(self_before=False, self_after=True)]
        exp = [i.Name for i in stti(self_before=False, self_after=True)]
        self.assertEqual(obs, exp)
        obs = [i.Name for i in stt(self_before=True, self_after=True)]
        exp = [i.Name for i in stti(self_before=True, self_after=True)]
        self.assertEqual(obs, exp)

    def test_count_envs(self):
        """count_envs should return correct counts from lines"""
        envs = """
        a   A   3   some other junk
        a   B
        a   C   1
        b   A   2
        skip
        c   B   d
        b   A   99
        """
        result = count_envs(envs.splitlines())
        self.assertEqual(result, \
            {'a':{'A':3,'B':1,'C':1},'b':{'A':99},'c':{'B':1}})

    def test_sum_env_dict(self):
        """sum_env_dict should return correct counts from env_dict"""
        envs = """
        a   A   3   some other junk
        a   B
        a   C   1
        b   A   2
        skip
        c   B   d
        b   A   99
        """
        result = count_envs(envs.splitlines())
        sum_ = sum_env_dict(result)
        self.assertEqual(sum_, 105)

    def test_index_envs(self):
        """index_envs should map envs and taxa onto indices"""
        self.assertEqual(self.unique_envs, ['A','B','C'])
        self.assertEqual(self.env_to_index, {'A':0, 'B':1, 'C':2})
        self.assertEqual(self.node_to_index,
            {'a':0, 'b':1, 'c':4, 'd':2, 'e':3})
        self.assertEqual(self.count_array, \
            array([[1,0,2],[1,1,0],[0,3,0],[0,0,1], \
            [0,1,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0]]))

    def test_get_branch_lengths(self):
        """get_branch_lengths should make array of branch lengths from index"""
        result = get_branch_lengths(self.node_index)
        self.assertEqual(result, array([1,2,1,1,3,2,4,3,0]))

    def test_env_unique_fraction(self):
        """should report unique fraction of bl in each env"""
        #testing old unique fraction
        cur_count_array = self.count_array.copy()
        bound_indices = bind_to_array(self.nodes, cur_count_array)
        total_bl = sum(self.branch_lengths)
        bool_descendants(bound_indices)
        env_bl_sums, env_bl_ufracs = env_unique_fraction(self.branch_lengths,
            cur_count_array)
        #env A has 0 unique bl, B has 4, C has 1
        self.assertEqual(env_bl_sums, [0,4,1])
        self.assertEqual(env_bl_ufracs, [0, 4/17.0, 1/17.0])

        cur_count_array = self.old_count_array.copy()
        bound_indices = bind_to_array(self.old_nodes, cur_count_array)
        total_bl = sum(self.old_branch_lengths)
        bool_descendants(bound_indices)
        env_bl_sums, env_bl_ufracs = env_unique_fraction(
            self.old_branch_lengths, cur_count_array)
        self.assertEqual(env_bl_sums, [1.29, 0.33999999999999997, 0.63])
        self.assertEqual(env_bl_ufracs,
            [1.29/2.9, 0.33999999999999997/2.9, 0.63/2.9])

    def test_index_tree(self):
        """index_tree should produce correct index and node map"""
        #test for first tree: contains singleton outgroup
        t1 = self.t1
        id_1, child_1 = index_tree(t1)
        nodes_1 = [n._leaf_index for n in t1.traverse(self_before=False, \
            self_after=True)]
        self.assertEqual(nodes_1, [0,1,2,3,6,4,5,7,8])
        self.assertEqual(child_1, [(2,0,1),(6,2,3),(7,4,5),(8,6,7)])
        #test for second tree: strictly bifurcating
        t2 = self.t2
        id_2, child_2 = index_tree(t2)
        nodes_2 = [n._leaf_index for n in t2.traverse(self_before=False, \
            self_after=True)]
        self.assertEqual(nodes_2, [0,1,4,2,3,5,8,6,7,9,10])
        self.assertEqual(child_2, [(4,0,1),(5,2,3),(8,4,5),(9,6,7),(10,8,9)])
        #test for third tree: contains trifurcation and single-child parent
        t3 = self.t3
        id_3, child_3 = index_tree(t3)
        nodes_3 = [n._leaf_index for n in t3.traverse(self_before=False, \
            self_after=True)]
        self.assertEqual(nodes_3, [0,1,2,4,3,5,8,6,7,9,10])
        self.assertEqual(child_3, [(4,0,2),(5,3,3),(8,4,5),(9,6,7),(10,8,9)])

    def test_bind_to_array(self):
        """bind_to_array should return correct array ranges"""
        a = reshape(arange(33), (11,3))
        id_, child = index_tree(self.t3)
        bindings = bind_to_array(child, a)
        self.assertEqual(len(bindings), 5)
        self.assertEqual(bindings[0][0], a[4])
        self.assertEqual(bindings[0][1], a[0:3])
        self.assertEqual(bindings[0][1].shape, (3,3))
        self.assertEqual(bindings[1][0], a[5])
        self.assertEqual(bindings[1][1], a[3:4])
        self.assertEqual(bindings[1][1].shape, (1,3))
        self.assertEqual(bindings[2][0], a[8])
        self.assertEqual(bindings[2][1], a[4:6])
        self.assertEqual(bindings[2][1].shape, (2,3))
        self.assertEqual(bindings[3][0], a[9])
        self.assertEqual(bindings[3][1], a[6:8])
        self.assertEqual(bindings[3][1].shape, (2,3))
        self.assertEqual(bindings[4][0], a[10])
        self.assertEqual(bindings[4][1], a[8:10])
        self.assertEqual(bindings[4][1].shape, (2,3))

    def test_bind_to_parent_array(self):
        """bind_to_parent_array should bind tree to array correctly"""
        a = reshape(arange(33), (11,3))
        index_tree(self.t3)
        bindings = bind_to_parent_array(self.t3, a)
        self.assertEqual(len(bindings), 10)
        self.assertEqual(bindings[0][0], a[8])
        self.assertEqual(bindings[0][1], a[10])
        self.assertEqual(bindings[1][0], a[4])
        self.assertEqual(bindings[1][1], a[8])
        self.assertEqual(bindings[2][0], a[0])
        self.assertEqual(bindings[2][1], a[4])
        self.assertEqual(bindings[3][0], a[1])
        self.assertEqual(bindings[3][1], a[4])
        self.assertEqual(bindings[4][0], a[2])
        self.assertEqual(bindings[4][1], a[4])
        self.assertEqual(bindings[5][0], a[5])
        self.assertEqual(bindings[5][1], a[8])
        self.assertEqual(bindings[6][0], a[3])
        self.assertEqual(bindings[6][1], a[5])
        self.assertEqual(bindings[7][0], a[9])
        self.assertEqual(bindings[7][1], a[10])
        self.assertEqual(bindings[8][0], a[6])
        self.assertEqual(bindings[8][1], a[9])
        self.assertEqual(bindings[9][0], a[7])
        self.assertEqual(bindings[9][1], a[9])

    def test_delete_empty_parents(self):
        """delete_empty_parents should remove empty parents from bound indices"""
        id_to_node, node_first_last = index_tree(self.t)
        bound_indices = bind_to_array(node_first_last, self.count_array[:,0:1])
        bool_descendants(bound_indices)
        self.assertEqual(len(bound_indices), 4)
        deleted = delete_empty_parents(bound_indices)
        self.assertEqual(len(deleted), 2)
        for d in deleted:
            self.assertEqual(d[0][0], 1)

    def test_traverse_reduce(self):
        """traverse_reduce should reduce array in traversal order."""
        id_, child = index_tree(self.t3)
        a = zeros((11,3)) + 99    #fill with junk
        bindings = bind_to_array(child, a)
        #load in leaf envs
        a[0] = a[1] = a[2] = a[7] = [0,1,0]
        a[3] = [1,0,0]
        a[6] = [0,0,1]
        f = logical_or.reduce
        traverse_reduce(bindings, f)
        self.assertEqual(a, \
            array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,1,0],[1,0,0],\
            [0,0,1],[0,1,0],[1,1,0],[0,1,1],[1,1,1]]))
        f = sum
        traverse_reduce(bindings, f)
        self.assertEqual(a, \
            array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,3,0],[1,0,0],\
            [0,0,1],[0,1,0],[1,3,0],[0,1,1],[1,4,1]]))

    def test_bool_descendants(self):
        """bool_descendants should be true if any descendant true"""
        #self.t3 = DndParser('(((a,b,c),(d)),(e,f))', UniFracTreeNode)
        id_, child = index_tree(self.t3)
        a = zeros((11,3)) + 99    #fill with junk
        bindings = bind_to_array(child, a)
        #load in leaf envs
        a[0] = a[1] = a[2] = a[7] = [0,1,0]
        a[3] = [1,0,0]
        a[6] = [0,0,1]
        bool_descendants(bindings)
        self.assertEqual(a, \
            array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,1,0],[1,0,0],\
            [0,0,1],[0,1,0],[1,1,0],[0,1,1],[1,1,1]]))

    def test_sum_descendants(self):
        """sum_descendants should sum total descendants w/ each state"""
        id_, child = index_tree(self.t3)
        a = zeros((11,3)) + 99    #fill with junk
        bindings = bind_to_array(child, a)
        #load in leaf envs
        a[0] = a[1] = a[2] = a[7] = [0,1,0]
        a[3] = [1,0,0]
        a[6] = [0,0,1]
        sum_descendants(bindings)
        self.assertEqual(a, \
            array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,3,0],[1,0,0],\
            [0,0,1],[0,1,0],[1,3,0],[0,1,1],[1,4,1]]))

    def test_fitch_descendants(self):
        """fitch_descendants should assign states by fitch parsimony, ret. #"""
        id_, child = index_tree(self.t3)
        a = zeros((11,3)) + 99    #fill with junk
        bindings = bind_to_array(child, a)
        #load in leaf envs
        a[0] = a[1] = a[2] = a[7] = [0,1,0]
        a[3] = [1,0,0]
        a[6] = [0,0,1]
        changes = fitch_descendants(bindings)
        self.assertEqual(changes, 2)
        self.assertEqual(a, \
            array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,1,0],[1,0,0],\
            [0,0,1],[0,1,0],[1,1,0],[0,1,1],[0,1,0]]))

    def test_fitch_descendants_missing_data(self):
        """fitch_descendants should work with missing data"""
        #tree and envs for testing missing values
        t_str = '(((a:1,b:2):4,(c:3,d:1):2):1,(e:2,f:1):3);'
        env_str = """a   A
b   B
c   D
d   C
e   C
f   D"""
        t = DndParser(t_str, UniFracTreeNode)
        node_index, nodes = index_tree(t)
        env_counts = count_envs(env_str.split('\n'))
        count_array, unique_envs, env_to_index, node_to_index = \
            index_envs(env_counts, node_index)
        branch_lengths = get_branch_lengths(node_index)
        #test just the AB pair
        ab_counts = count_array[:, 0:2]
        bindings = bind_to_array(nodes, ab_counts)
        changes = fitch_descendants(bindings, counter=FitchCounter)
        self.assertEqual(changes, 1)
        orig_result = ab_counts.copy()
        #check that the original Fitch counter gives the expected
        #incorrect parsimony result
        changes = fitch_descendants(bindings, counter=FitchCounterDense)
        self.assertEqual(changes, 5)
        new_result = ab_counts.copy()
        #check that the two versions fill the array with the same values
        self.assertEqual(orig_result, new_result)

    def test_tip_distances(self):
        """tip_distances should set tips to correct distances."""
        t = self.t
        bl = self.branch_lengths.copy()[:,newaxis]
        bindings = bind_to_parent_array(t, bl)
        tips = []
        for n in t.traverse(self_before=False, self_after=True):
            if not n.Children:
                tips.append(n._leaf_index)
        tip_distances(bl, bindings, tips)
        self.assertEqual(bl, array([5,6,6,6,6,0,0,0,0])[:,newaxis])

    def test_permute_selected_rows(self):
        """permute_selected_rows should switch just the selected rows in a"""
        orig = reshape(arange(8), (4,2))
        new = orig.copy()
        fake_permutation = lambda a: range(a)[::-1]    #reverse order
        permute_selected_rows([0,2], orig, new, fake_permutation)
        self.assertEqual(new, array([[4,5],[2,3],[0,1],[6,7]]))
        #make sure we didn't change orig
        self.assertEqual(orig, reshape(arange(8), (4,2)))

    def test_prep_items_for_jackknife(self):
        """prep_items_for_jackknife should expand indices of repeated counts"""
        a = array([0,1,0,1,2,0,3])
        #         0 1 2 3 4 5 6
        result = prep_items_for_jackknife(a)
        exp = array([1,3,4,4,6,6,6])
        self.assertEqual(result, exp)

    def test_jackknife_bool(self):
        """jackknife_bool should make a vector with right number of nonzeros"""
        fake_permutation = lambda a: range(a)[::-1]    #reverse order
        orig_vec = array([0,0,1,0,1,1,0,1,1])
        orig_items = flatnonzero(orig_vec)
        length = len(orig_vec)
        result = jackknife_bool(orig_items, 3, len(orig_vec), fake_permutation)
        self.assertEqual(result, array([0,0,0,0,0,1,0,1,1]))
        #returns the original if trying to take too many
        self.assertEqual(jackknife_bool(orig_items, 20, len(orig_vec)), \
            orig_vec)

    def test_jackknife_int(self):
        """jackknife_int should make a vector with right counts"""
        orig_vec = array([0,2,1,0,3,1])
        orig_items = array([1,1,2,4,4,4,5])
        #                   0 1 2 3 4 5 6
        fake_permutation = lambda a: a == 7 and array([4,6,3,1,2,6,5])
        result = jackknife_int(orig_items, 4, len(orig_vec), fake_permutation)
        self.assertEqual(result, array([0,1,0,0,2,1]))
        #returns the original if trying to take too many
        self.assertEqual(jackknife_int(orig_items, 20, len(orig_vec)), \
            orig_vec)

    def test_jackknife_array(self):
        """jackknife_array should make a new array with right counts"""
        orig_vec1 = array([0,2,2,3,1])
        orig_vec2 = array([2,2,1,2,2])
        test_array = array([orig_vec1, orig_vec2])
        new_mat1 = jackknife_array(test_array, 1, axis=1,
            jackknife_f=jackknife_int, permutation_f=permutation)
        self.assertEqual(new_mat1.sum(axis=0), [1,1,1,1,1])
        new_mat2 = jackknife_array(test_array, 2, axis=1,
            jackknife_f=jackknife_int, permutation_f=permutation)
        self.assertEqual(new_mat2.sum(axis=0), [2,2,2,2,2])
        new_mat3 = jackknife_array(test_array, 2, axis=0,
            jackknife_f=jackknife_int, permutation_f=permutation)
        self.assertEqual(new_mat3.sum(axis=1), [2,2])
        #test that you get orig mat back if too many
        self.assertEqual(jackknife_array(test_array, 20, axis=1), test_array)

    def test_unifrac(self):
        """unifrac should return correct results for model tree"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        self.assertEqual(unifrac(bl, m[:,0], m[:,1]), 10/16.0)
        self.assertEqual(unifrac(bl, m[:,0], m[:,2]), 8/13.0)
        self.assertEqual(unifrac(bl, m[:,1], m[:,2]), 8/17.0)

    def test_unnormalized_unifrac(self):
        """unnormalized unifrac should return correct results for model tree"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        self.assertEqual(unnormalized_unifrac(bl, m[:,0], m[:,1]), 10/17.)
        self.assertEqual(unnormalized_unifrac(bl, m[:,0], m[:,2]), 8/17.)
        self.assertEqual(unnormalized_unifrac(bl, m[:,1], m[:,2]), 8/17.)
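The hand-computed values in `test_unifrac` above (e.g. 10/16 for environments A vs. B) follow from the definition of the unweighted UniFrac metric: the fraction of observed branch length unique to one of the two environments. As an illustrative aside, a minimal standalone sketch (this is an assumption about the metric's definition, not PyCogent's actual `fast_tree.unifrac` implementation) reproduces that value from the test's branch lengths and boolean presence columns:

```python
import numpy as np

def unifrac_sketch(branch_lengths, env_a, env_b):
    """Fraction of observed branch length unique to one environment."""
    bl = np.asarray(branch_lengths, dtype=float)
    a = np.asarray(env_a, dtype=bool)
    b = np.asarray(env_b, dtype=bool)
    observed = bl[a | b].sum()   # branch length leading to either env
    shared = bl[a & b].sum()     # branch length leading to both envs
    return (observed - shared) / observed

# branch lengths and env columns A, B from the model tree in the tests
bl = [1, 2, 1, 1, 3, 2, 4, 3, 0]
env_A = [1, 1, 0, 0, 0, 0, 1, 0, 1]
env_B = [0, 1, 1, 0, 1, 1, 1, 1, 1]
print(unifrac_sketch(bl, env_A, env_B))   # 10/16 = 0.625, as asserted above
```

The union carries 16 units of branch length and the intersection 6, leaving 10 units unique, hence 10/16.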
    def test_PD(self):
        """PD should return correct results for model tree"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        self.assertEqual(PD(bl, m[:,0]), 7)
        self.assertEqual(PD(bl, m[:,1]), 15)
        self.assertEqual(PD(bl, m[:,2]), 11)

    def test_G(self):
        """G should return correct results for model tree"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        self.assertEqual(G(bl, m[:,0], m[:,0]), 0)
        self.assertEqual(G(bl, m[:,0], m[:,1]), 1/16.0)
        self.assertEqual(G(bl, m[:,1], m[:,0]), 9/16.0)

    def test_unnormalized_G(self):
        """unnormalized_G should return correct results for model tree"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        self.assertEqual(unnormalized_G(bl, m[:,0], m[:,0]), 0/17.)
        self.assertEqual(unnormalized_G(bl, m[:,0], m[:,1]), 1/17.)
        self.assertEqual(unnormalized_G(bl, m[:,1], m[:,0]), 9/17.)
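Unlike UniFrac, `G` is asymmetric: `G(bl, A, B)` differs from `G(bl, B, A)`, as the 1/16 vs. 9/16 assertions above show. A small sketch (again an assumption about the metric's definition inferred from the asserted values, not the library source) makes the asymmetry explicit: the numerator is branch length unique to the *first* environment, normalized by the branch length observed in either:

```python
import numpy as np

def G_sketch(branch_lengths, env_a, env_b):
    """Branch length unique to env_a, over total observed branch length."""
    bl = np.asarray(branch_lengths, dtype=float)
    a = np.asarray(env_a, dtype=bool)
    b = np.asarray(env_b, dtype=bool)
    return bl[a & ~b].sum() / bl[a | b].sum()

# branch lengths and env columns A, B from the model tree in the tests
bl = [1, 2, 1, 1, 3, 2, 4, 3, 0]
env_A = [1, 1, 0, 0, 0, 0, 1, 0, 1]
env_B = [0, 1, 1, 0, 1, 1, 1, 1, 1]
print(G_sketch(bl, env_A, env_A))   # 0.0     (nothing unique vs. itself)
print(G_sketch(bl, env_A, env_B))   # 0.0625  = 1/16
print(G_sketch(bl, env_B, env_A))   # 0.5625  = 9/16
```

Swapping the arguments swaps which side's unique branch length is counted, which is why the tests assert both orderings.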
    def test_unifrac_matrix(self):
        """unifrac_matrix should return correct results for model tree"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        result = unifrac_matrix(bl, m)
        self.assertEqual(result, array([[0,10/16.,8/13.],[10/16.,0,8/17.],\
            [8/13.,8/17.,0]]))
        #should work if we tell it the measure is asymmetric
        result = unifrac_matrix(bl, m, is_symmetric=False)
        self.assertEqual(result, array([[0,10/16.,8/13.],[10/16.,0,8/17.],\
            [8/13.,8/17.,0]]))
        #should work if the measure really is asymmetric
        result = unifrac_matrix(bl, m, metric=unnormalized_G,
            is_symmetric=False)
        self.assertEqual(result, array([[0,1/17.,2/17.],[9/17.,0,6/17.],\
            [6/17.,2/17.,0]]))
        #should also match web site calculations
        envs = self.count_array
        bound_indices = bind_to_array(self.nodes, envs)
        bool_descendants(bound_indices)
        result = unifrac_matrix(bl, envs)
        exp = array([[0, 0.6250, 0.6154], [0.6250, 0, 0.4706], \
            [0.6154, 0.4706, 0]])
        assert (abs(result - exp)).max() < 0.001

    def test_unifrac_one_sample(self):
        """unifrac_one_sample should match unifrac_matrix"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        result = unifrac_matrix(bl, m)
        for i in range(len(result)):
            one_sam_res = unifrac_one_sample(i, bl, m)
            self.assertEqual(result[i], one_sam_res)
            self.assertEqual(result[:,i], one_sam_res)
        #should work ok on asymmetric metrics
        result = unifrac_matrix(bl, m, metric=unnormalized_G,
            is_symmetric=False)
        for i in range(len(result)):
            one_sam_res = unifrac_one_sample(i, bl, m, metric=unnormalized_G)
            self.assertEqual(result[i], one_sam_res)
            #only require row match for asymmetric metrics
            #self.assertEqual(result[:,i], one_sam_res)

    def test_unifrac_vector(self):
        """unifrac_vector should return correct results for model tree"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        result = unifrac_vector(bl, m)
        self.assertFloatEqual(result, array([10./17, 6./17, 7./17]))

    def test_PD_vector(self):
        """PD_vector should return correct results for model tree"""
        m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
            [0,1,1],[1,1,1]])
        bl = self.branch_lengths
        result = PD_vector(bl, m)
        self.assertFloatEqual(result, array([7,15,11]))

    def test_weighted_unifrac_matrix(self):
        """weighted unifrac matrix should ret correct results for model tree"""
        #should match web site calculations
        envs = self.count_array
        bound_indices = bind_to_array(self.nodes, envs)
        sum_descendants(bound_indices)
        bl = self.branch_lengths
        tip_indices = [n._leaf_index for n in self.t.tips()]
        result = weighted_unifrac_matrix(bl, envs, tip_indices)
        exp = array([[0, 9.1, 4.5], [9.1, 0, 6.4], [4.5, 6.4, 0]])
        assert (abs(result - exp)).max() < 0.001
        #should work with branch length corrections
        td = bl.copy()[:,newaxis]
        tip_bindings = bind_to_parent_array(self.t, td)
        tips = [n._leaf_index for n in self.t.tips()]
        tip_distances(td, tip_bindings, tips)
        result = weighted_unifrac_matrix(bl, envs, tip_indices,
            bl_correct=True, tip_distances=td)
        exp = array([[0, 9.1/11.5, 4.5/(10.5+1./3)], [9.1/11.5, 0, \
            6.4/(11+1./3)], [4.5/(10.5+1./3), 6.4/(11+1./3), 0]])
        assert (abs(result - exp)).max() < 0.001

    def test_weighted_one_sample(self):
        """weighted one sample should match weighted matrix"""
        #should match web site calculations
        envs = self.count_array
        bound_indices = bind_to_array(self.nodes, envs)
        sum_descendants(bound_indices)
        bl = self.branch_lengths
        tip_indices = [n._leaf_index for n in self.t.tips()]
        result = weighted_unifrac_matrix(bl, envs, tip_indices)
        for i in range(len(result)):
            one_sam_res = weighted_one_sample(i, bl, envs, tip_indices)
            self.assertEqual(result[i], one_sam_res)
            self.assertEqual(result[:,i], one_sam_res)
        #should work with branch length corrections
        td = bl.copy()[:,newaxis]
        tip_bindings = bind_to_parent_array(self.t, td)
        tips = [n._leaf_index for n in self.t.tips()]
        tip_distances(td, tip_bindings, tips)
        result = weighted_unifrac_matrix(bl, envs, tip_indices,
            bl_correct=True, tip_distances=td)
        for i in range(len(result)):
            one_sam_res = weighted_one_sample(i, bl, envs, tip_indices,
                bl_correct=True, tip_distances=td)
            self.assertEqual(result[i], one_sam_res)
            self.assertEqual(result[:,i], one_sam_res)

    def test_weighted_unifrac_vector(self):
        """weighted_unifrac_vector should ret correct results for model tree"""
        envs = self.count_array
        bound_indices = bind_to_array(self.nodes, envs)
        sum_descendants(bound_indices)
        bl = self.branch_lengths
        tip_indices = [n._leaf_index for n in self.t.tips()]
        result = weighted_unifrac_vector(bl, envs, tip_indices)
        self.assertFloatEqual(result[0], sum([
            abs(1./2 - 2./8)*1,
            abs(1./2 - 1./8)*2,
            abs(0 - 1./8)*3,
            abs(0 - 3./8)*1,
            abs(0 - 1./8)*1,
            abs(0 - 4./8)*2,
            abs(2./2 - 3./8)*4,
            abs(0. - 5./8)*3.]))
        self.assertFloatEqual(result[1], sum([
            abs(0 - .6)*1,
            abs(.2 - .2)*2,
            abs(.2 - 0)*3,
            abs(.6 - 0)*1,
            abs(0 - .2)*1,
            abs(.6 - .2)*2,
            abs(.2 - .8)*4,
            abs(.8 - .2)*3]))
        self.assertFloatEqual(result[2], sum([
            abs(2./3 - 1./7)*1,
            abs(0 - 2./7)*2,
            abs(0 - 1./7)*3,
            abs(0 - 3./7)*1,
            abs(1./3 - 0)*1,
            abs(1./3 - 3./7)*2,
            abs(2./3 - 3./7)*4,
            abs(1./3 - 4./7)*3]))

if __name__ == '__main__':
    #run if called from command-line
    main()

PyCogent-1.5.3/tests/test_maths/test_unifrac/test_fast_unifrac.py
#!/usr/bin/env python
"""Unit tests for fast unifrac."""
from __future__ import division
from numpy import array, logical_not, argsort
from cogent.util.unit_test import TestCase, main
from cogent.parse.tree import DndParser
from cogent.maths.unifrac.fast_tree import (count_envs, index_tree,
    index_envs, get_branch_lengths)
from cogent.maths.unifrac.fast_unifrac import (reshape_by_name, meta_unifrac,
    shuffle_tipnames, weight_equally, weight_by_num_tips,
    weight_by_branch_length, weight_by_num_seqs, get_all_env_names,
    consolidate_skipping_missing_matrices, consolidate_missing_zero,
    consolidate_missing_one, consolidate_skipping_missing_values,
    UniFracTreeNode, mcarlo_sig, num_comps, fast_unifrac,
    fast_unifrac_whole_tree, PD_whole_tree, PD_generic_whole_tree,
    TEST_ON_TREE, TEST_ON_ENVS, TEST_ON_PAIRWISE, shared_branch_length,
    shared_branch_length_to_root, fast_unifrac_one_sample)
from numpy.random import permutation

__author__ = "Rob Knight and Micah Hamady"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Micah Hamady", "Daniel McDonald",
    "Justin Kuczynski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight, Micah Hamady"
__email__ = "rob@spot.colorado.edu, hamady@colorado.edu"
__status__ = "Prototype"

class unifrac_tests(TestCase):
    """Tests of top-level functions."""

    def setUp(self):
        """Define some standard trees."""
        self.t_str = '((a:1,b:2):4,(c:3,(d:1,e:1):2):3)'
        self.t = DndParser(self.t_str, UniFracTreeNode)
        self.env_str = """
        a   A   1
        a   C   2
        b   A   1
        b   B   1
        c   B   1
        d   B   3
        e   C   1"""
        self.env_counts = count_envs(self.env_str.splitlines())
        self.missing_env_str = """
        a   A   1
        a   C   2
        e   C   1"""
        self.missing_env_counts = count_envs(
            self.missing_env_str.splitlines())
        self.extra_tip_str = """
        q   A   1
        w   C   2
        e   A   1
        r   B   1
        t   B   1
        y   B   3
        u   C   1"""
        self.extra_tip_counts = count_envs(self.extra_tip_str.splitlines())
        self.wrong_tip_str = """
        q   A   1
        w   C   2
        r   B   1
        t   B   1
        y   B   3
        u   C   1"""
        self.wrong_tip_counts = count_envs(self.wrong_tip_str.splitlines())
        self.t2_str = '(((a:1,b:1):1,c:5):2,d:4)'
        self.t2 = DndParser(self.t2_str, UniFracTreeNode)
        self.env2_str = """
        a   B   1
        b   A   1
        c   A   2
        c   C   2
        d   B   1
        d   C   1"""
        self.env2_counts = count_envs(self.env2_str.splitlines())
        self.trees = [self.t, self.t2]
        self.envs = [self.env_counts, self.env2_counts]
        self.mc_1 = array([.5, .4, .3, .2, .1, .6, .7, .8, .9, 1.0])
        #from old EnvsNode tests
        self.old_t_str = '((org1:0.11,org2:0.22,(org3:0.12,org4:0.23)g:0.33)b:0.2,(org5:0.44,org6:0.55)c:0.3,org7:0.4)'
        self.old_t = DndParser(self.old_t_str, UniFracTreeNode)
        self.old_env_str = """
        org1    env1    1
        org1    env2    1
        org2    env2    1
        org3    env2    1
        org4    env3    1
        org5    env1    1
        org6    env1    1
        org7    env3    1
        """
        self.old_env_counts = count_envs(self.old_env_str.splitlines())
        self.old_node_index, self.old_nodes = index_tree(self.old_t)
        self.old_count_array, self.old_unique_envs, self.old_env_to_index, \
            self.old_node_to_index = index_envs(self.old_env_counts,
                self.old_node_index)
        self.old_branch_lengths = get_branch_lengths(self.old_node_index)

    def test_shared_branch_length(self):
        """Should return the correct shared branch length by env"""
        t_str = "(((a:1,b:2):3,c:4),(d:5,e:6,f:7):8);"
        envs = """
        a   A   1
        b   A   1
        c   A   1
        d   A   1
        e   A   1
        f   B   1
        """
        env_counts = count_envs(envs.splitlines())
        t = DndParser(t_str, UniFracTreeNode)
        exp = {('A',):21.0, ('B',):7.0}
        obs = shared_branch_length(t, env_counts, 1)
        self.assertEqual(obs, exp)
        exp = {('A','B'):8.0}
        obs = shared_branch_length(t, env_counts, 2)
        self.assertEqual(obs, exp)
        self.assertRaises(ValueError, shared_branch_length, t, env_counts, 3)

    def test_shared_branch_length_to_root(self):
        """Should return the correct shared branch length by env to root"""
        t_str = "(((a:1,b:2):3,c:4),(d:5,e:6,f:7):8);"
        envs = """
        a   A   1
        b   A   1
        c   A   1
        d   A   1
        e   A   1
        f   B   1
        """
        env_counts = count_envs(envs.splitlines())
        t = DndParser(t_str, UniFracTreeNode)
        exp = {'A':29.0, 'B':15.0}
        obs = shared_branch_length_to_root(t, env_counts)
        self.assertEqual(obs, exp)

    def test_fast_unifrac(self):
        """Should calc unifrac values for whole tree."""
        #Note: results not tested for correctness here, as detailed tests
        #are in the fast_tree module.
res = fast_unifrac(self.t, self.env_counts) res = fast_unifrac(self.t, self.missing_env_counts) res = fast_unifrac(self.t, self.extra_tip_counts) self.assertRaises(ValueError, fast_unifrac, self.t, \ self.wrong_tip_counts) def test_fast_unifrac_one_sample(self): """ fu one sample should match whole unifrac result, for env 'B'""" # first get full unifrac matrix res = fast_unifrac(self.t, self.env_counts) dmtx, env_order = res['distance_matrix'] dmtx_vec = dmtx[env_order.index('B')] dmtx_vec = dmtx_vec[argsort(env_order)] # then get one sample unifrac vector one_sam_dvec, one_sam_env_order = \ fast_unifrac_one_sample('B', self.t, self.env_counts) one_sam_dvec = one_sam_dvec[argsort(one_sam_env_order)] self.assertFloatEqual(one_sam_dvec, dmtx_vec) def test_fast_unifrac_one_sample2(self): """fu one sam should match whole weighted unifrac result, for env 'B'""" # first get full unifrac matrix res = fast_unifrac(self.t, self.env_counts, weighted=True) dmtx, env_order = res['distance_matrix'] dmtx_vec = dmtx[env_order.index('B')] dmtx_vec = dmtx_vec[argsort(env_order)] # then get one sample unifrac vector one_sam_dvec, one_sam_env_order = \ fast_unifrac_one_sample('B', self.t, self.env_counts,weighted=True) one_sam_dvec = one_sam_dvec[argsort(one_sam_env_order)] self.assertFloatEqual(one_sam_dvec, dmtx_vec) def test_fast_unifrac_one_sample3(self): """fu one sam should match missing env unifrac result, for env 'B'""" # first get full unifrac matrix res = fast_unifrac(self.t, self.missing_env_counts, weighted=False) dmtx, env_order = res['distance_matrix'] dmtx_vec = dmtx[env_order.index('C')] dmtx_vec = dmtx_vec[argsort(env_order)] # then get one sample unifrac vector one_sam_dvec, one_sam_env_order = \ fast_unifrac_one_sample('C', self.t, self.missing_env_counts,weighted=False) one_sam_dvec = one_sam_dvec[argsort(one_sam_env_order)] self.assertFloatEqual(one_sam_dvec, dmtx_vec) # and should raise valueerror when 'B' self.assertRaises(ValueError, fast_unifrac_one_sample, 
            'B', self.t, self.missing_env_counts, weighted=False)

    def test_fast_unifrac_whole_tree(self):
        """should correctly compute one p-val for whole tree"""
        # should test with fake permutation, but using same as old envs node
        # for now
        result = []
        num_to_do = 10
        for i in range(num_to_do):
            real_ufracs, sim_ufracs = fast_unifrac_whole_tree(self.old_t, \
                self.old_env_counts, 1000, permutation_f=permutation)
            rawp, corp = mcarlo_sig(sum(real_ufracs), \
                [sum(x) for x in sim_ufracs], 1, tail='high')
            result.append(rawp)
        self.assertSimilarMeans(result, 0.047)

    def test_unifrac_explicit(self):
        """unifrac should compute the correct values.

        environment M contains only tips not in tree, tip j is in no envs
        values were calculated by hand
        """
        t1 = DndParser('((a:1,b:2):4,((c:3, j:17),(d:1,e:1):2):3)', \
            UniFracTreeNode)    # note c,j is len 0 node
        #           /-------- /-a
        # ---------|          \-b
        #          |          /-------- /-c
        #           \--------|          \-j
        #                     \-------- /-d
        #                               \-e
        env_str = """
        a   A   1
        a   C   2
        b   A   1
        b   B   1
        c   B   1
        d   B   3
        e   C   1
        m   M   88"""
        env_counts = count_envs(env_str.splitlines())
        self.assertFloatEqual(fast_unifrac(t1, env_counts)['distance_matrix'], \
            (array([[0, 10/16, 8/13],
                    [10/16, 0, 8/17],
                    [8/13, 8/17, 0]]), ['A', 'B', 'C']))
        # changing tree topology relative to c,j tips shouldn't change
        # anything
        t2 = DndParser('((a:1,b:2):4,((c:2, j:16):1,(d:1,e:1):2):3)', \
            UniFracTreeNode)
        self.assertFloatEqual(fast_unifrac(t2, env_counts)['distance_matrix'], \
            (array([[0, 10/16, 8/13],
                    [10/16, 0, 8/17],
                    [8/13, 8/17, 0]]), ['A', 'B', 'C']))

    def test_unifrac_make_subtree(self):
        """unifrac result should not depend on make_subtree

        environment M contains only tips not in tree, tips j, k are in no envs
        one clade is missing entirely
        values were calculated by hand
        we also test that we still have a valid tree at the end
        """
        t1 = DndParser('((a:1,b:2):4,((c:3, (j:1,k:2)mt:17),(d:1,e:1):2):3)', \
            UniFracTreeNode)    # note c,j is len 0 node
        #           /-------- /-a
        # ---------|          \-b
        #          |          /-------- /-c
        #           \--------|          \mt------ /-j
        #                    |                    \-k
        #                     \-------- /-d
        #                               \-e
        env_str = """
        a   A   1
        a   C   2
        b   A   1
        b   B   1
        c   B   1
        d   B   3
        e   C   1
        m   M   88"""
        env_counts = count_envs(env_str.splitlines())
        self.assertFloatEqual( \
            fast_unifrac(t1, env_counts, make_subtree=False)['distance_matrix'], \
            (array([[0, 10/16, 8/13],
                    [10/16, 0, 8/17],
                    [8/13, 8/17, 0]]), ['A', 'B', 'C']))
        self.assertFloatEqual( \
            fast_unifrac(t1, env_counts, make_subtree=True)['distance_matrix'], \
            (array([[0, 10/16, 8/13],
                    [10/16, 0, 8/17],
                    [8/13, 8/17, 0]]), ['A', 'B', 'C']))
        # changing tree topology relative to c,j tips shouldn't change anything
        t2 = DndParser('((a:1,b:2):4,((c:2, (j:1,k:2)mt:17):1,(d:1,e:1):2):3)', \
            UniFracTreeNode)
        self.assertFloatEqual( \
            fast_unifrac(t2, env_counts, make_subtree=False)['distance_matrix'], \
            (array([[0, 10/16, 8/13],
                    [10/16, 0, 8/17],
                    [8/13, 8/17, 0]]), ['A', 'B', 'C']))
        self.assertFloatEqual( \
            fast_unifrac(t2, env_counts, make_subtree=True)['distance_matrix'], \
            (array([[0, 10/16, 8/13],
                    [10/16, 0, 8/17],
                    [8/13, 8/17, 0]]), ['A', 'B', 'C']))
        # ensure we haven't meaningfully changed the tree by passing it
        # to unifrac
        t3 = DndParser('((a:1,b:2):4,((c:3, (j:1,k:2)mt:17),(d:1,e:1):2):3)', \
            UniFracTreeNode)    # note c,j is len 0 node
        t1_tips = [tip.Name for tip in t1.tips()]
        t1_tips.sort()
        t3_tips = [tip.Name for tip in t3.tips()]
        t3_tips.sort()
        self.assertEqual(t1_tips, t3_tips)
        tipj3 = t3.getNodeMatchingName('j')
        tipb3 = t3.getNodeMatchingName('b')
        tipj1 = t1.getNodeMatchingName('j')
        tipb1 = t1.getNodeMatchingName('b')
        self.assertFloatEqual(tipj1.distance(tipb1), tipj3.distance(tipb3))

    def test_PD_whole_tree(self):
        """PD_whole_tree should correctly compute PD for test tree.

        environment M contains only tips not in tree, tip j is in no envs
        """
        t1 = DndParser('((a:1,b:2):4,((c:3, j:17),(d:1,e:1):2):3)', \
            UniFracTreeNode)
        env_str = """
        a   A   1
        a   C   2
        b   A   1
        b   B   1
        c   B   1
        d   B   3
        e   C   1
        m   M   88"""
        env_counts = count_envs(env_str.splitlines())
        self.assertEqual(PD_whole_tree(t1, env_counts), \
            (['A', 'B', 'C'], array([7., 15., 11.])))

    def test_PD_generic_whole_tree(self):
        """PD_generic_whole_tree should correctly compute PD for test tree."""
        self.t1 = DndParser('((a:1,b:2):4,(c:3,(d:1,e:1):2):3)', \
            UniFracTreeNode)
        self.env_str = """
        a   A   1
        a   C   2
        b   A   1
        b   B   1
        c   B   1
        d   B   3
        e   C   1"""
        env_counts = count_envs(self.env_str.splitlines())
        self.assertEqual(PD_generic_whole_tree(self.t1, env_counts), \
            (['A', 'B', 'C'], array([7., 15., 11.])))

    def test_mcarlo_sig(self):
        """mcarlo_sig should calculate monte carlo sig high/low"""
        self.assertEqual(mcarlo_sig(.5, self.mc_1, 1, 'high'), (5.0/10, 5.0/10))
        self.assertEqual(mcarlo_sig(.5, self.mc_1, 1, 'low'), (4.0/10, 4.0/10))
        self.assertEqual(mcarlo_sig(.5, self.mc_1, 5, 'high'), (5.0/10, 1.0))
        self.assertEqual(mcarlo_sig(.5, self.mc_1, 5, 'low'), (4.0/10, 1.0))
        self.assertEqual(mcarlo_sig(0, self.mc_1, 1, 'low'), \
            (0.0, "<=%.1e" % (1.0/10)))
        self.assertEqual(mcarlo_sig(100, self.mc_1, 10, 'high'), \
            (0.0, "<=%.1e" % (1.0/10)))

    def test_num_comps(self):
        """num_comps should return the number of pairwise comparisons"""
        self.assertEqual(num_comps(5), sum([i for i in range(1, 5)]))
        self.assertEqual(num_comps(15), sum([i for i in range(1, 15)]))
        self.assertEqual(num_comps(10000), sum([i for i in range(1, 10000)]))
        self.assertEqual(num_comps(1833), sum([i for i in range(1, 1833)]))

    def test_shuffle_tipnames(self):
        """shuffle_tipnames should return copy of tree w/ labels permuted"""
        # Note: this should never fail, but is technically still stochastic.
        # 5! is 120, so repeating 5 times should fail about 1 in 10^10 runs.
        for i in range(5):
            try:
                t = DndParser(self.t_str)
                result = shuffle_tipnames(t)
                orig_names = [n.Name for n in t.tips()]
                new_names = [n.Name for n in result.tips()]
                self.assertIsPermutation(orig_names, new_names)
                return
            except AssertionError:
                continue
        raise AssertionError, "Produced same permutation in 5 tries: broken?"

    def test_weight_equally(self):
        """weight_equally should return unit weight per tree"""
        self.assertEqual(weight_equally(self.trees, self.envs), \
            array([1, 1]))

    def test_weight_by_num_tips(self):
        """weight_by_num_tips should return tips per tree"""
        self.assertEqual(weight_by_num_tips(self.trees, self.envs), \
            array([5, 4]))

    def test_weight_by_branch_length(self):
        """weight_by_branch_length should return branch length per tree"""
        self.assertEqual(weight_by_branch_length(self.trees, self.envs), \
            array([17, 14]))

    def test_weight_by_num_seqs(self):
        """weight_by_num_seqs should return num seqs per tree"""
        self.assertEqual(weight_by_num_seqs(self.trees, self.envs), \
            array([10, 8]))

    def test_get_all_env_names(self):
        """get_all_env_names should get all names from counts"""
        self.assertEqual(get_all_env_names(self.env_counts), set('ABC'))

    def test_consolidate_skipping_missing_matrices(self):
        """consolidate_skipping_missing_matrices should skip matrices missing data"""
        m1 = array([[1,2],[3,4]])
        m2 = array([[1,2,3],[4,5,6],[7,8,9]])
        m3 = array([[2,2,2],[3,3,3],[4,4,4]])
        matrices = [m1, m2, m3]
        env_names = map(list, ['AB', 'ABC', 'ABC'])
        weights = [1, 2, 3]
        all_names = list('ABC')
        result = consolidate_skipping_missing_matrices(matrices, env_names, \
            weights, all_names)
        self.assertFloatEqual(result, .4*m2 + .6*m3)

    def test_consolidate_missing_zero(self):
        """consolidate_missing_zero should fill missing values with zero"""
        m1 = array([[1,2],[3,4]])
        m2 = array([[1,2,3],[4,5,6],[7,8,9]])
        m3 = array([[2,2,2],[3,3,3],[4,4,4]])
        matrices = [m1, m2, m3]
        env_names = map(list, ['AB', 'ABC', 'ABC'])
        weights = [1, 2, 3]
        weights = array(weights, float)
        weights /= weights.sum()
        all_names = list('ABC')
        transformed_m1 = array([[1,2,0],[3,4,0],[0,0,0]])
        result = consolidate_missing_zero(matrices, env_names, weights, \
            all_names)
        self.assertFloatEqual(result, \
            (1/6.)*transformed_m1 + (2/6.)*m2 + (3/6.)*m3)

    def test_consolidate_missing_one(self):
        """consolidate_missing_one should fill missing off-diags with one"""
        m1 = array([[1,2],[3,4]])
        m2 = array([[1,2,3],[4,5,6],[7,8,9]])
        m3 = array([[2,2,2],[3,3,3],[4,4,4]])
        matrices = [m1, m2, m3]
        env_names = map(list, ['AB', 'ABC', 'ABC'])
        weights = [1, 2, 3]
        weights = array(weights, float)
        weights /= weights.sum()
        all_names = list('ABC')
        transformed_m1 = array([[1,2,1],[3,4,1],[1,1,0]])
        result = consolidate_missing_one(matrices, env_names, weights, \
            all_names)
        self.assertFloatEqual(result, \
            (1/6.)*transformed_m1 + (2/6.)*m2 + (3/6.)*m3)

    def test_consolidate_skipping_missing_values(self):
        """consolidate_skipping_missing_values should average over filled values"""
        m1 = array([[1,2],[3,4]])
        m2 = array([[1,2,3],[4,5,6],[7,8,9]])
        m3 = array([[2,2,2],[3,3,3],[4,4,4]])
        matrices = [m1, m2, m3]
        env_names = map(list, ['AB', 'ABC', 'ABC'])
        weights = [1., 2, 3]
        weights = array(weights)
        weights /= weights.sum()
        all_names = list('ABC')
        expected = array([
            [1/6.*1 + 2/6.*1 + 3/6.*2, 1/6.*2 + 2/6.*2 + 3/6.*2,
                2/5.*3 + 3/5.*2],
            [1/6.*3 + 2/6.*4 + 3/6.*3, 1/6.*4 + 2/6.*5 + 3/6.*3,
                2/5.*6 + 3/5.*3],
            [2/5.*7 + 3/5.*4, 2/5.*8 + 3/5.*4, 2/5.*9 + 3/5.*4]])
        result = consolidate_skipping_missing_values(matrices, env_names, \
            weights, all_names)
        self.assertFloatEqual(result, expected)

    def test_reshape_by_name(self):
        """reshape_by_name should reshape matrix from old to new names"""
        old = array([[0,1,2],[3,4,5],[6,7,8]])
        old_names = 'ABC'
        new_names = 'xCyBA'
        exp = array([[0,0,0,0,0],[0,8,0,7,6],[0,0,0,0,0],\
            [0,5,0,4,3],[0,2,0,1,0]])
        self.assertEqual(reshape_by_name(old, old_names, new_names), exp)
        result = reshape_by_name(old, old_names, new_names, masked=True)
        result.fill_value = 0
        self.assertEqual(result._data * logical_not(result._mask), exp)

    def test_meta_unifrac(self):
        """meta_unifrac should give correct result on sample trees"""
        tree_list = [self.t, self.t2]
        envs_list = [self.env_counts, self.env2_counts]
        result = meta_unifrac(tree_list, envs_list, weight_equally, \
            modes=["distance_matrix"])
        u1_distances = array([[0, 10/16., 8/13.], [10/16., 0, 8/17.], \
            [8/13., 8/17., 0]])
        u2_distances = array([[0, 11/14., 6/13.], [11/14., 0, 7/13.], \
            [6/13., 7/13., 0]])
        exp = (u1_distances + u2_distances)/2
        self.assertFloatEqual(result['distance_matrix'], (exp, list('ABC')))

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_maths/test_stats/__init__.py

#!/usr/bin/env python
__all__ = ['test_distribution', 'test_histogram', 'test_special', 'test_ks',
           'test_test']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Catherine Lozupone", "Gavin Huttley",
               "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

PyCogent-1.5.3/tests/test_maths/test_stats/test_alpha_diversity.py

#!/usr/bin/env python
#file test_alpha_diversity.py
from __future__ import division
from numpy import array, log, sqrt, exp
from math import e
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.alpha_diversity import expand_counts, counts, \
    observed_species, singles, doubles, osd, margalef, menhinick, dominance, \
    simpson, simpson_reciprocal, reciprocal_simpson, shannon, equitability, \
    berger_parker_d, mcintosh_d, brillouin_d, strong, kempton_taylor_q, \
    fisher_alpha, mcintosh_e, heip_e, simpson_e, robbins, \
    robbins_confidence, chao1_uncorrected, chao1_bias_corrected, chao1, \
    chao1_var, chao1_confidence, ACE, michaelis_menten_fit

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
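[Editor's note] The hand-calculated values asserted in the diversity tests that follow come straight from the textbook definitions. As a quick cross-check, this standalone sketch reproduces the Shannon entries (it is not the PyCogent `shannon` implementation; the function name and signature here are illustrative only):

```python
from math import log

def shannon_entropy(counts, base=2):
    # H = -sum(p_i * log_base(p_i)) over nonzero counts; illustrative
    # re-derivation of the values asserted in test_shannon below.
    total = float(sum(counts))
    return -sum((c / total) * log(c / total, base)
                for c in counts if c > 0)

print(shannon_entropy([5]))           # a single species has zero entropy
print(shannon_entropy([5, 5]))        # two equally abundant species: 1 bit
print(shannon_entropy([1, 1, 1, 1]))  # four equally abundant species: 2 bits
```

The base-2 default matches the test fixtures; `test_heip_e` below passes `base=e` to get nats instead of bits.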
__credits__ = ["Rob Knight", "Justin Kuczynski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

class diversity_tests(TestCase):
    """Tests of top-level functions"""

    def setUp(self):
        """Set up shared variables"""
        self.TestData = array([0,1,1,4,2,5,2,4,1,2])
        self.NoSingles = array([0,2,2,4,5,0,0,0,0,0])
        self.NoDoubles = array([0,1,1,4,5,0,0,0,0,0])

    def test_expand_counts(self):
        """expand_counts should return correct expanded array"""
        c = array([2,0,1,2])
        self.assertEqual(expand_counts(c), array([0,0,2,3,3]))

    def test_counts(self):
        """counts should return correct array"""
        c = array([5,0,1,1,5,5])
        obs = counts(c)
        exp = array([1,2,0,0,0,3])
        self.assertEqual(obs, exp)
        d = array([2,2,1,0])
        obs = counts(d, obs)
        exp = array([2,3,2,0,0,3])
        self.assertEqual(obs, exp)

    def test_singles(self):
        """singles should return correct # of singles"""
        self.assertEqual(singles(self.TestData), 3)
        self.assertEqual(singles(array([0,3,4])), 0)
        self.assertEqual(singles(array([1])), 1)

    def test_doubles(self):
        """doubles should return correct # of doubles"""
        self.assertEqual(doubles(self.TestData), 3)
        self.assertEqual(doubles(array([0,3,4])), 0)
        self.assertEqual(doubles(array([2])), 1)

    def test_osd(self):
        """osd should return correct # of observeds, singles, doubles"""
        self.assertEqual(osd(self.TestData), (9,3,3))

    def test_margalef(self):
        """margalef should match hand-calculated values"""
        self.assertEqual(margalef(self.TestData), 8/log(22))

    def test_menhinick(self):
        """menhinick should match hand-calculated values"""
        self.assertEqual(menhinick(self.TestData), 9/sqrt(22))

    def test_dominance(self):
        """dominance should match hand-calculated values"""
        c = array([1,0,2,5,2])
        self.assertFloatEqual(dominance(c), .34)
        d = array([5])
        self.assertEqual(dominance(d), 1)

    def test_simpson(self):
        """simpson should match hand-calculated values"""
        c = array([1,0,2,5,2])
        self.assertFloatEqual(simpson(c), .66)
        d = array([5])
        self.assertFloatEqual(simpson(d), 0)

    def test_reciprocal_simpson(self):
        """reciprocal_simpson should match hand-calculated results"""
        c = array([1,0,2,5,2])
        self.assertFloatEqual(reciprocal_simpson(c), 1/.66)

    def test_simpson_reciprocal(self):
        """simpson_reciprocal should match 1/D results"""
        c = array([1,0,2,5,2])
        self.assertFloatEqual(simpson_reciprocal(c), 1./dominance(c))

    def test_shannon(self):
        """shannon should match hand-calculated values"""
        c = array([5])
        self.assertFloatEqual(shannon(c), 0)
        c = array([5,5])
        self.assertFloatEqual(shannon(c), 1)
        c = array([1,1,1,1,0])
        self.assertEqual(shannon(c), 2)

    def test_equitability(self):
        """equitability should match hand-calculated values"""
        c = array([5])
        self.assertFloatEqual(equitability(c), 0)
        c = array([5,5])
        self.assertFloatEqual(equitability(c), 1)
        c = array([1,1,1,1,0])
        self.assertEqual(equitability(c), 1)

    def test_berger_parker_d(self):
        """berger_parker_d should match hand-calculated values"""
        c = array([5])
        self.assertFloatEqual(berger_parker_d(c), 1)
        c = array([5,5])
        self.assertFloatEqual(berger_parker_d(c), 0.5)
        c = array([1,1,1,1,0])
        self.assertEqual(berger_parker_d(c), 0.25)

    def test_mcintosh_d(self):
        """mcintosh_d should match hand-calculated values"""
        c = array([1,2,3])
        self.assertFloatEqual(mcintosh_d(c), 0.636061424871458)

    def test_brillouin_d(self):
        """brillouin_d should match hand-calculated values"""
        c = array([1,2,3,1])
        self.assertFloatEqual(brillouin_d(c), 0.86289353018248782)

    def test_strong(self):
        """strong's dominance index should match hand-calculated values"""
        c = array([1,2,3,1])
        self.assertFloatEqual(strong(c), 0.214285714)

    def test_kempton_taylor_q(self):
        """kempton_taylor_q should approximate Magurran 1998 calculation p143"""
        c = array([2,3,3,3,3,3,4,4,4,6,6,7,7,9,9,11,14,15,15,20,29,33,34,
                   36,37,53,57,138,146,170])
        self.assertFloatEqual(kempton_taylor_q(c), 14/log(34/4))

    def test_fisher_alpha(self):
        """fisher alpha should match hand-calculated value."""
        c = array([4,3,4,0,1,0,2])
        obs = fisher_alpha(c)
        self.assertFloatEqual(obs, 2.7823795367398798)

    def test_mcintosh_e(self):
        """mcintosh e should match hand-calculated value."""
        c = array([1,2,3,1])
        num = sqrt(15)
        den = sqrt(19)
        exp = num/den
        self.assertEqual(mcintosh_e(c), exp)

    def test_heip_e(self):
        """heip e should match hand-calculated value"""
        c = array([1,2,3,1])
        h = shannon(c, base=e)
        expected = exp(h-1)/3
        self.assertEqual(heip_e(c), expected)

    def test_simpson_e(self):
        """simpson e should match hand-calculated value"""
        c = array([1,2,3,1])
        s = simpson(c)
        self.assertEqual((1/s)/4, simpson_e(c))

    def test_robbins(self):
        """robbins metric should match hand-calculated value"""
        c = array([1,2,3,0,1])
        r = robbins(c)
        self.assertEqual(r, 2./7)

    def test_robbins_confidence(self):
        """robbins CI should match hand-calculated value"""
        c = array([1,2,3,0,1])
        r = robbins_confidence(c, 0.05)
        n = 7
        s = 2
        k = sqrt(8/0.05)
        self.assertEqual(r, ((s-k)/(n+1), (s+k)/(n+1)))

    def test_observed_species(self):
        """observed_species should return # observed species"""
        c = array([4,3,4,0,1,0,2])
        obs = observed_species(c)
        exp = 5
        self.assertEqual(obs, exp)
        c = array([0,0,0])
        obs = observed_species(c)
        exp = 0
        self.assertEqual(obs, exp)
        self.assertEqual(observed_species(self.TestData), 9)

    def test_chao1_bias_corrected(self):
        """chao1_bias_corrected should return same result as EstimateS"""
        obs = chao1_bias_corrected(*osd(self.TestData))
        self.assertEqual(obs, 9.75)

    def test_chao1_uncorrected(self):
        """chao1_uncorrected should return same result as EstimateS"""
        obs = chao1_uncorrected(*osd(self.TestData))
        self.assertEqual(obs, 10.5)

    def test_chao1(self):
        """chao1 should use right decision rules"""
        self.assertEqual(chao1(self.TestData), 9.75)
        self.assertEqual(chao1(self.TestData, bias_corrected=False), 10.5)
        self.assertEqual(chao1(self.NoSingles), 4)
        self.assertEqual(chao1(self.NoSingles, bias_corrected=False), 4)
        self.assertEqual(chao1(self.NoDoubles), 5)
        self.assertEqual(chao1(self.NoDoubles, bias_corrected=False), 5)

    def test_chao1_var(self):
        """chao1_var should match observed results from EstimateS"""
        # NOTE: EstimateS reports sd, not var, and rounds to 2 dp
        self.assertFloatEqual(chao1_var(self.TestData), 1.42**2, eps=0.01)
        self.assertFloatEqual(chao1_var(self.TestData, bias_corrected=False), \
            2.29**2, eps=0.01)
        self.assertFloatEqualAbs(chao1_var(self.NoSingles), 0.39**2, eps=0.01)
        self.assertFloatEqualAbs(chao1_var(self.NoSingles, \
            bias_corrected=False), 0.39**2, eps=0.01)
        self.assertFloatEqualAbs(chao1_var(self.NoDoubles), 2.17**2, eps=0.01)
        self.assertFloatEqualAbs(chao1_var(self.NoDoubles, \
            bias_corrected=False), 2.17**2, eps=0.01)

    def test_chao1_confidence(self):
        """chao1_confidence should match observed results from EstimateS"""
        # NOTE: EstimateS rounds to 2 dp
        self.assertFloatEqual(chao1_confidence(self.TestData), (9.07, 17.45), \
            eps=0.01)
        self.assertFloatEqual(chao1_confidence(self.TestData, \
            bias_corrected=False), (9.17, 21.89), eps=0.01)
        self.assertFloatEqualAbs(chao1_confidence(self.NoSingles), \
            (4, 4.95), eps=0.01)
        self.assertFloatEqualAbs(chao1_confidence(self.NoSingles, \
            bias_corrected=False), (4, 4.95), eps=0.01)
        self.assertFloatEqualAbs(chao1_confidence(self.NoDoubles), \
            (4.08, 17.27), eps=0.01)
        self.assertFloatEqualAbs(chao1_confidence(self.NoDoubles, \
            bias_corrected=False), (4.08, 17.27), eps=0.01)

    def test_ACE(self):
        """ACE should match values calculated by hand"""
        self.assertFloatEqual(ACE(array([2,0])), 1.0, eps=0.001)
        # next: just returns the number of species when all are abundant
        self.assertFloatEqual(ACE(array([12,0,9])), 2.0, eps=0.001)
        self.assertFloatEqual(ACE(array([12,2,8])), 3.0, eps=0.001)
        self.assertFloatEqual(ACE(array([12,2,1])), 4.0, eps=0.001)
        self.assertFloatEqual(ACE(array([12,1,2,1])), 7.0, eps=0.001)
        self.assertFloatEqual(ACE(array([12,3,2,1])), 4.6, eps=0.001)
        self.assertFloatEqual(ACE(array([12,3,6,1,10])), 5.62749672, \
            eps=0.001)

    def test_michaelis_menten_fit(self):
        """michaelis_menten_fit should match hand values in limiting cases"""
        res = michaelis_menten_fit([22])
        self.assertFloatEqual(res, 1.0, eps=.01)
        res = michaelis_menten_fit([42])
        self.assertFloatEqual(res, 1.0, eps=.01)
        res = michaelis_menten_fit([34], num_repeats=3, params_guess=[13,13])
        self.assertFloatEqual(res, 1.0, eps=.01)
        res = michaelis_menten_fit([70,70], num_repeats=5)
        self.assertFloatEqual(res, 2.0, eps=.01)

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/

PyCogent-1.5.3/tests/test_maths/test_stats/test_distribution.py

#!/usr/bin/env python
"""Tests of statistical probability distribution integrals.

Currently using tests against calculations in R, spreadsheets being
unreliable.
"""
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.distribution import z_low, z_high, zprob, chi_low, \
    chi_high, t_low, t_high, tprob, poisson_high, poisson_low, poisson_exact, \
    binomial_high, binomial_low, binomial_exact, f_low, f_high, fprob, \
    stdtr, bdtr, bdtrc, pdtr, pdtrc, fdtr, fdtrc, gdtr, gdtrc, chdtri, \
    stdtri, pdtri, bdtri, fdtri, gdtri

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

class DistributionsTests(TestCase):
    """Tests of particular statistical distributions."""

    def setUp(self):
        self.values = [0, 0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 30, 50, 200]
        self.negvalues = [-i for i in self.values]
        self.df = [1, 10, 100]

    def test_z_low(self):
        """z_low should match R's pnorm() function"""
        probs = [
            0.5000000, 0.5039894, 0.5398278, 0.6914625, 0.8413447,
            0.9772499, 0.9999997, 1.0000000, 1.0000000, 1.0000000,
            1.0000000, 1.0000000,
        ]
        negprobs = [
            5.000000e-01, 4.960106e-01, 4.601722e-01, 3.085375e-01,
            1.586553e-01, 2.275013e-02, 2.866516e-07, 7.619853e-24,
            2.753624e-89, 4.906714e-198, 0.000000e+00, 0.000000e+00]
        for z, p in zip(self.values, probs):
            self.assertFloatEqual(z_low(z), p)
        for z, p in zip(self.negvalues, negprobs):
            self.assertFloatEqual(z_low(z), p)

    def test_z_high(self):
        """z_high should match R's pnorm(lower.tail=FALSE) function"""
        negprobs = [
            0.5000000, 0.5039894, 0.5398278, 0.6914625, 0.8413447,
            0.9772499, 0.9999997, 1.0000000, 1.0000000, 1.0000000,
            1.0000000, 1.0000000,
        ]
        probs = [
            5.000000e-01, 4.960106e-01, 4.601722e-01, 3.085375e-01,
            1.586553e-01, 2.275013e-02, 2.866516e-07, 7.619853e-24,
            2.753624e-89, 4.906714e-198, 0.000000e+00, 0.000000e+00]
        for z, p in zip(self.values, probs):
            self.assertFloatEqual(z_high(z), p)
        for z, p in zip(self.negvalues, negprobs):
            self.assertFloatEqual(z_high(z), p)

    def test_zprob(self):
        """zprob should match twice the z_high probability for abs(z)"""
        probs = [2*i for i in [
            5.000000e-01, 4.960106e-01, 4.601722e-01, 3.085375e-01,
            1.586553e-01, 2.275013e-02, 2.866516e-07, 7.619853e-24,
            2.753624e-89, 4.906714e-198, 0.000000e+00, 0.000000e+00]]
        for z, p in zip(self.values, probs):
            self.assertFloatEqual(zprob(z), p)
        for z, p in zip(self.negvalues, probs):
            self.assertFloatEqual(zprob(z), p)

    def test_chi_low(self):
        """chi_low should match R's pchisq() function"""
        probs = {
            1: [0.00000000, 0.07965567, 0.24817037, 0.52049988, 0.68268949,
                0.84270079, 0.97465268, 0.99843460, 0.99999226, 0.99999996,
                1.00000000, 1.00000000],
            10: [0.000000e+00, 2.593339e-14, 2.497951e-09, 6.611711e-06,
                 1.721156e-04, 3.659847e-03, 1.088220e-01, 5.595067e-01,
                 9.707473e-01, 9.991434e-01, 9.999997e-01, 1.000000e+00],
            100: [0.000000e+00, 2.906006e-180, 2.780588e-130, 2.029952e-95,
                  1.788777e-80, 1.233751e-65, 2.238699e-46, 2.181059e-32,
                  1.854727e-19, 9.056126e-13, 6.953305e-06, 1.000000e-00],
        }
        for df in self.df:
            for x, p in zip(self.values, probs[df]):
                self.assertFloatEqual(chi_low(x, df), p)

    def test_chi_high(self):
        """chi_high should match R's pchisq(lower.tail=FALSE) function"""
        probs = {
            1: [1.000000e+00, 9.203443e-01, 7.518296e-01, 4.795001e-01,
                3.173105e-01, 1.572992e-01, 2.534732e-02, 1.565402e-03,
                7.744216e-06, 4.320463e-08, 1.537460e-12, 2.088488e-45],
            10: [1.000000e+00, 1.000000e-00, 1.000000e-00, 9.999934e-01,
                 9.998279e-01, 9.963402e-01, 8.911780e-01, 4.404933e-01,
                 2.925269e-02, 8.566412e-04, 2.669083e-07, 1.613931e-37],
            100: [1.00000e+00, 1.00000e+00, 1.00000e+00, 1.00000e+00,
                  1.00000e+00, 1.00000e+00, 1.00000e+00, 1.00000e+00,
                  1.00000e+00, 1.00000e+00, 9.99993e-01, 1.17845e-08],
        }
        for df in self.df:
            for x, p in zip(self.values, probs[df]):
                self.assertFloatEqual(chi_high(x, df), p)

    def test_t_low(self):
        """t_low should match R's pt() function"""
        probs = {
            1: [0.5000000, 0.5031830, 0.5317255, 0.6475836, 0.7500000,
                0.8524164, 0.9371670, 0.9682745, 0.9840977, 0.9893936,
                0.9936347, 0.9984085],
            10: [0.5000000, 0.5038910, 0.5388396, 0.6860532, 0.8295534,
                 0.9633060, 0.9997313, 0.9999992, 1.0000000, 1.0000000,
                 1.0000000, 1.0000000],
            100: [0.5000000, 0.5039794, 0.5397277, 0.6909132, 0.8401379,
                  0.9758939, 0.9999988, 1.0000000, 1.0000000, 1.0000000,
                  1.0000000, 1.0000000],
        }
        negprobs = {
            1: [0.500000000, 0.496817007, 0.468274483, 0.352416382,
                0.250000000, 0.147583618, 0.062832958, 0.031725517,
                0.015902251, 0.010606402, 0.006365349, 0.001591536],
            10: [5.000000e-01, 4.961090e-01, 4.611604e-01, 3.139468e-01,
                 1.704466e-01, 3.669402e-02, 2.686668e-04, 7.947766e-07,
                 1.073031e-09, 1.980896e-11, 1.237155e-13, 1.200254e-19],
            100: [5.000000e-01, 4.960206e-01, 4.602723e-01, 3.090868e-01,
                  1.598621e-01, 2.410609e-02, 1.225087e-06, 4.950844e-17,
                  4.997134e-37, 4.190166e-52, 7.236082e-73, 2.774197e-132],
        }
        for df in self.df:
            for x, p in zip(self.values, probs[df]):
                self.assertFloatEqualRel(t_low(x, df), p, eps=1e-4)
            for x, p in zip(self.negvalues, negprobs[df]):
                self.assertFloatEqualRel(t_low(x, df), p, eps=1e-4)

    def test_t_high(self):
        """t_high should match R's pt(lower.tail=FALSE) function"""
negprobs = { 1: [ 0.5000000, 0.5031830, 0.5317255, 0.6475836, 0.7500000, 0.8524164, 0.9371670, 0.9682745, 0.9840977, 0.9893936, 0.9936347, 0.9984085, ], 10: [ 0.5000000, 0.5038910, 0.5388396, 0.6860532, 0.8295534, 0.9633060, 0.9997313, 0.9999992, 1.0000000, 1.0000000, 1.0000000, 1.0000000, ], 100:[ 0.5000000, 0.5039794, 0.5397277, 0.6909132, 0.8401379, 0.9758939, 0.9999988, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, ], } probs = { 1: [ 0.500000000, 0.496817007, 0.468274483, 0.352416382, 0.250000000, 0.147583618, 0.062832958, 0.031725517, 0.015902251, 0.010606402, 0.006365349, 0.001591536, ], 10: [ 5.000000e-01, 4.961090e-01, 4.611604e-01, 3.139468e-01, 1.704466e-01, 3.669402e-02, 2.686668e-04, 7.947766e-07, 1.073031e-09, 1.980896e-11, 1.237155e-13, 1.200254e-19, ], 100:[ 5.000000e-01, 4.960206e-01, 4.602723e-01, 3.090868e-01, 1.598621e-01, 2.410609e-02, 1.225087e-06, 4.950844e-17, 4.997134e-37, 4.190166e-52, 7.236082e-73, 2.774197e-132, ], } for df in self.df: for x, p in zip(self.values, probs[df]): self.assertFloatEqualRel(t_high(x, df), p, eps=1e-4) for x, p in zip(self.negvalues, negprobs[df]): self.assertFloatEqualRel(t_high(x, df), p, eps=1e-4) def test_tprob(self): """tprob should match twice the t_high probability for abs(t)""" probs = { 1: [ 2*i for i in [ 0.500000000, 0.496817007, 0.468274483, 0.352416382, 0.250000000, 0.147583618, 0.062832958, 0.031725517, 0.015902251, 0.010606402, 0.006365349, 0.001591536, ]], 10: [ 2*i for i in [ 5.000000e-01, 4.961090e-01, 4.611604e-01, 3.139468e-01, 1.704466e-01, 3.669402e-02, 2.686668e-04, 7.947766e-07, 1.073031e-09, 1.980896e-11, 1.237155e-13, 1.200254e-19, ]], 100:[ 2*i for i in [ 5.000000e-01, 4.960206e-01, 4.602723e-01, 3.090868e-01, 1.598621e-01, 2.410609e-02, 1.225087e-06, 4.950844e-17, 4.997134e-37, 4.190166e-52, 7.236082e-73, 2.774197e-132, ]], } for df in self.df: for x, p in zip(self.values, probs[df]): self.assertFloatEqualRel(tprob(x, df), p, eps=1e-4) def test_poisson_low(self): """Lower 
tail of poisson should match R for integer successes""" #WARNING: Results only guaranteed for integer successes: floating #point _should_ yield reasonable values, but R rounds to int. expected = { (0, 0): 1, (0, 0.75): 0.4723666, (0, 1): 0.3678794, (0, 5): 0.006737947, (0, 113.7): 4.175586e-50, (2, 0): 1, (2, 3): 0.4231901, (2, 17.8): 3.296636e-06, (17, 29.6): 0.008753318, (180, 0): 1, (180, 137.4):0.999784, (180, 318):2.436995e-17, (180, 1024):8.266457e-233, } for (key, value) in expected.items(): self.assertFloatEqual(poisson_low(*key), value) def test_poisson_high(self): """Upper tail of poisson should match R for integer successes""" #WARNING: Results only guaranteed for integer successes: floating #point _should_ yield reasonable values, but R rounds to int. expected = { (0, 0): 0, (0, 0.75): 0.5276334, (0, 1): 0.6321206, (0, 5): 0.993262, (0, 113.7): 1, (2, 0): 0, (2, 3): 0.5768099, (2, 17.8): 0.9999967, (17, 29.6): 0.9912467, (180, 0): 0, (180, 137.4):0.0002159856, (180, 318):1, (180, 1024):1, } for (key, value) in expected.items(): self.assertFloatEqual(poisson_high(*key), value) def test_poisson_exact(self): """Poisson exact should match expected values from R""" expected = { (0, 0): 1, (0, 0.75): 0.4723666, (0, 1): 0.3678794, (0, 5): 0.006737947, (0, 113.7): 4.175586e-50, (2, 0): 0, (2, 3): 0.2240418, (2, 17.8): 2.946919e-06, (17, 29.6): 0.004034353, (180, 0): 0, (180, 137.4):7.287501e-05, (180, 318):1.067247e-17, (180, 1024):6.815085e-233, } for (key, value) in expected.items(): self.assertFloatEqual(poisson_exact(*key), value) def test_binomial_high(self): """Binomial high should match values from R for integer successes""" expected = { (0, 1, 0.5): 0.5, (1, 1, 0.5): 0, (1, 1, 0.0000001): 0, (1, 1, 0.9999999): 0, (3, 5, 0.75):0.6328125, (0, 60, 0.5): 1, (129, 130, 0.5):7.34684e-40, (299, 300, 0.099): 4.904089e-302, (9, 27, 0.0003): 4.958496e-29, (1032, 2050, 0.5): 0.3702155, (-1, 3, 0.1): 1, #if successes less than 0, return 1 (-0.5, 3, 0.1):1, } for 
(key, value) in expected.items(): self.assertFloatEqualRel(binomial_high(*key), value, 1e-4) #should reject if successes > trials or successes < -1 self.assertRaises(ValueError, binomial_high, 7, 5, 0.5) def test_binomial_low(self): """Binomial low should match values from R for integer successes""" expected = { (0, 1, 0.5): 0.5, (1, 1, 0.5): 1, (1, 1, 0.0000001): 1, (1, 1, 0.9999999): 1, (26, 50, .5): 0.6641, (3, 5, 0.75):0.3671875, (0, 60, 0.5): 8.673617e-19, (129, 130, 0.5):1, (299, 300, 0.099): 1, (9, 27, 0.0003): 1, (1032, 2050, 0.5): 0.6297845, } for (key, value) in expected.items(): self.assertFloatEqualRel(binomial_low(*key), value, 1e-4) def test_binomial_series(self): """binomial_exact should match values from R on a whole series""" expected = map(float, "0.0282475249 0.1210608210 0.2334744405 0.2668279320 0.2001209490 0.1029193452 0.0367569090 0.0090016920 0.0014467005 0.0001377810 0.0000059049".split()) for i in range(len(expected)): self.assertFloatEqual(binomial_exact(i, 10, 0.3), expected[i]) def test_binomial_exact(self): """binomial_exact should match values from R for integer successes""" expected = { (0, 1, 0.5): 0.5, (1, 1, 0.5): 0.5, (1, 1, 0.0000001): 1e-07, (1, 1, 0.9999999): 0.9999999, (3, 5, 0.75):0.2636719, (0, 60, 0.5): 8.673617e-19, (129, 130, 0.5):9.550892e-38, (299, 300, 0.099): 1.338965e-298, (9, 27, 0.0003): 9.175389e-26, (1032, 2050, 0.5): 0.01679804, } for (key, value) in expected.items(): self.assertFloatEqualRel(binomial_exact(*key), value, 1e-4) def test_binomial_exact_floats(self): """binomial_exact should be within limits for floating point numbers """ expected = { (18.3, 100, 0.2): (0.09089812, 0.09807429), (2.7,1050,0.006): (0.03615498, 0.07623827), (2.7,1050,0.06): (1.365299e-25, 3.044327e-24), (2,100.5,0.6): (7.303533e-37, 1.789727e-36), (10,100.5,.5):(7.578011e-18,1.365543e-17), (0.2, 60, 0.5): (8.673617e-19, 5.20417e-17), (.5,5,.3):(0.16807,0.36015), } for (key, value) in expected.items(): min_val, max_val = value assert 
min_val < binomial_exact(*key) < max_val #self.assertFloatEqualRel(binomial_exact(*key), value, 1e-4) def test_binomial_exact_errors(self): """binomial_exact should raise errors on invalid input""" self.assertRaises(ValueError, binomial_exact,10.2, 5, 0.33) self.assertRaises(ValueError, binomial_exact,-2, 5, 0.33) self.assertRaises(ValueError, binomial_exact, 10, 50, -2) self.assertRaises(ValueError, binomial_exact, 10, 50, 3) def test_f_high(self): """F high should match values from R for integer successes""" expected = { (1, 1, 0): 1, (1, 1, 1): 0.5, (1, 1, 20): 0.1400487, (1, 1, 1000000): 0.0006366196, (1, 10, 0): 1, (1,10, 5): 0.0493322, (1, 10, 20): 0.001193467, (10, 1, 0):1, (10, 10, 14.7): 0.0001062585, (13.7, 11.9, 3.8): 0.01340347, #test non-integer degrees of freedom #used following series to track down a bug after a failed test case (28, 29, 2): 0.03424088, (28, 29, 10): 1.053019e-08, (28, 29, 20): 1.628245e-12, (28, 29, 300): 5.038791e-29, (28, 35, 1): 0.4946777, (28, 37, 1): 0.4934486, (28, 38, 1): 0.4928721, (28, 38.001, 1): 0.4928716, (28, 38.5, 1): 0.4925927, (28, 39, 1): 0.492319, (28, 39, 10): 1.431901e-10, (28, 39, 20): 1.432014e-15, (28, 39, 30): 1.059964e-18, (28, 39, 50): 8.846678e-23, (28, 39, 10): 1.431901e-10, (28, 39, 300): 1.226935e-37, (28, 39, 50): 8.846678e-23, (28,39,304.7): 9.08154e-38, (28.4, 39.2, 304.7): 5.573927e-38, (1032, 2050, 0): 1, (1032, 2050, 4.15): 1.23535e-165, (1032, 2050, 0.5): 1, (1032, 2050, 0.1): 1, } e = expected.items() e.sort() for (key, value) in e: self.assertFloatEqualRel(f_high(*key), value) def test_f_low(self): """F low should match values from R for integer successes""" expected = { (1, 1, 0): 0, (1, 1, 1): 0.5, (1, 1, 20): 0.8599513, (1, 1, 1000000): 0.9993634, (1, 10, 0): 0, (1,10, 5): 0.9506678, (1, 10, 20): 0.9988065, (10, 1, 0):0, (10, 10, 14.7): 0.9998937, (28.4, 39.2, 304.7): 1, (1032, 2050, 0): 0, (1032, 2050, 4.15): 1, (1032, 2050, 0.5): 7.032663e-35, (1032, 2050, 0.1): 1.70204e-278, } for (key, 
value) in expected.items(): self.assertFloatEqualRel(f_low(*key), value) def test_fprob(self): """fprob should return twice the tail on a particular side""" error = 1e-4 #right-hand side self.assertFloatEqualAbs(fprob(10,10,1.2), 0.7788, eps=error) #left-hand side self.assertFloatEqualAbs(fprob(10,10,1.2, side='left'), 1.2212, eps=error) self.assertRaises(ValueError, fprob, 10,10,-3) self.assertRaises(ValueError, fprob, 10, 10, 1, 'non_valid_side') def test_stdtr(self): """stdtr should match cephes results""" t = [-10, -3.1, -0.5, -0.01, 0, 1, 0.5, 10] k = [2, 10, 100] exp = [ 0.00492622851166, 7.94776587798e-07, 4.9508444923e-17, 0.0451003650651, 0.00562532860804, 0.00125696358826, 0.333333333333, 0.313946802871, 0.309086782915, 0.496464554479, 0.496108987495, 0.496020605117, 0.5, 0.5, 0.5, 0.788675134595, 0.829553433849, 0.840137922108, 0.666666666667, 0.686053197129, 0.690913217085, 0.995073771488, 0.999999205223, 1.0, ] index = 0 for i in t: for j in k: self.assertFloatEqual(stdtr(j,i), exp[index]) index += 1 def test_bdtr(self): """bdtr should match cephes results""" k_s = [0,1,2,3,5] n_s = [5,10,1000] p_s = [1e-10, .1, .5, .9, .999999] exp = [ 0.9999999995, 0.59049, 0.03125, 1e-05, 1.00000000014e-30, 0.999999999, 0.3486784401, 0.0009765625, 1e-10, 1.00000000029e-60, 0.9999999, 1.74787125172e-46, 9.33263618503e-302, 0.0, 0.0, 1.0, 0.91854, 0.1875, 0.00046, 4.99999600058e-24, 1.0, 0.7360989291, 0.0107421875, 9.1e-09, 9.99999100259e-54, 1.0, 1.9595578811e-44, 9.34196882121e-299, 0.0, 0.0, 1.0, 0.99144, 0.5, 0.00856, 9.99998500087e-18, 1.0, 0.9298091736, 0.0546875, 3.736e-07, 4.49999200104e-47, 1.0, 1.09744951737e-42, 4.67099374325e-296, 0.0, 0.0, 1.0, 0.99954, 0.8125, 0.08146, 9.99998000059e-12, 1.0, 0.9872048016, 0.171875, 9.1216e-06, 1.19999685024e-40, 1.0, 4.09381247279e-41, 1.5554471507e-293, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9998530974, 0.623046875, 0.0016349374, 2.51998950038e-28, 1.0, 2.55654569306e-38, 7.7385053063e-289, 0.0, 0.0, ] index = 0 for 
k in k_s: for n in n_s: for p in p_s: self.assertFloatEqual(bdtr(k,n,p), exp[index]) index += 1 def test_bdtrc(self): """bdtrc should give same results as cephes""" k_s = [0,1,2,3,5] n_s = [5,10,1000] p_s = [1e-10, .1, .5, .9, .999999] exp = [ 4.999999999e-10, 0.40951, 0.96875, 0.99999, 1.0, 9.9999999955e-10, 0.6513215599, 0.9990234375, 0.9999999999, 1.0, 9.9999995005e-08, 1.0, 1.0, 1.0, 1.0, 9.999999998e-20, 0.08146, 0.8125, 0.99954, 1.0, 4.4999999976e-19, 0.2639010709, 0.9892578125, 0.9999999909, 1.0, 4.99499966766e-15, 1.0, 1.0, 1.0, 1.0, 9.9999999985e-30, 0.00856, 0.5, 0.99144, 1.0, 1.19999999937e-28, 0.0701908264, 0.9453125, 0.9999996264, 1.0, 1.66166987575e-22, 1.0, 1.0, 1.0, 1.0, 4.9999999996e-40, 0.00046, 0.1875, 0.91854, 0.99999999999, 2.09999999899e-38, 0.0127951984, 0.828125, 0.9999908784, 1.0, 4.14171214499e-30, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.09999999928e-58, 0.0001469026, 0.376953125, 0.9983650626, 1.0, 1.36817318242e-45, 1.0, 1.0, 1.0, 1.0, ] index = 0 for k in k_s: for n in n_s: for p in p_s: self.assertFloatEqual(bdtrc(k,n,p), exp[index]) index += 1 def test_pdtr(self): """pdtr should match cephes results""" k_s = [0,1,2,5,10] m_s = [1e-9, 0.1,0.5,1,2,31] exp = [ 0.999999999 , 0.904837418036 , 0.606530659713 , 0.367879441171 , 0.135335283237 , 3.44247710847e-14 , 1.0 , 0.99532115984 , 0.909795989569 , 0.735758882343 , 0.40600584971 , 1.10159267471e-12 , 1.0 , 0.99984534693 , 0.985612322033 , 0.919698602929 , 0.676676416183 , 1.76426951809e-11 , 1.0 , 0.999999998725 , 0.999985835063 , 0.999405815182 , 0.983436391519 , 9.72616712615e-09 , 1.0 , 1.0 , 0.999999999992 , 0.999999989952 , 0.999991691776 , 1.12519146046e-05 , ] index = 0 for k in k_s: for m in m_s: self.assertFloatEqual(pdtr(k,m), exp[index]) index += 1 def test_pdtrc(self): """pdtrc should match cephes results""" k_s = [0,1,2,5,10] m_s = [1e-9, 0.1,0.5,1,2,31] exp = [ 9.999999995e-10 , 0.095162581964 , 0.393469340287 , 0.632120558829 , 0.864664716763 , 1.0 , 
4.99999999667e-19 , 0.00467884016044 , 0.090204010431 , 0.264241117657 , 0.59399415029 , 0.999999999999 , 1.66666666542e-28 , 0.000154653070265 , 0.014387677967 , 0.0803013970714 , 0.323323583817 , 0.999999999982 , 1.3888888877e-57 , 1.27489869223e-09 , 1.41649373223e-05 , 0.000594184817582 , 0.0165636084806 , 0.999999990274 , 2.50521083625e-107 , 2.28584493079e-19 , 7.74084073923e-12 , 1.00477663757e-08 , 8.30822436848e-06 , 0.999988748085 , ] index = 0 for k in k_s: for m in m_s: self.assertFloatEqual(pdtrc(k,m), exp[index]) index += 1 def test_fdtr(self): """fdtr should match cephes results""" a_s = [1, 2, 10, 1000] b_s = a_s x_s = [0, 0.01, 0.5, 10, 521.4] exp = [ 0.0, 0.0634510348611, 0.391826552031, 0.805017770958, 0.972137685271, 0.0, 0.0705345615859, 0.4472135955, 0.912870929175, 0.998087586699, 0.0, 0.0776792814356, 0.504352495617, 0.989880440265, 0.999999999415, 0.0, 0.0796356309764, 0.520335137562, 0.998387447605, 1.0, 0.0, 0.00985245702333, 0.292893218813, 0.781782109764, 0.96904781206, 0.0, 0.00990099009901, 0.333333333333, 0.909090909091, 0.99808575804, 0.0, 0.00994027888402, 0.379078676941, 0.995884773663, 0.999999999923, 0.0, 0.00995006724716, 0.393317789705, 0.999949891187, 1.0, 0.0, 1.5895531756e-06, 0.18766987087, 0.758331535711, 0.965930763936, 0.0, 2.44851927021e-07, 0.185934432082, 0.90573080983, 0.998084291751, 0.0, 1.15978163168e-08, 0.144845806026, 0.999428447457, 0.999999999997, 0.0, 2.54720538101e-09, 0.109321108726, 1.0, 1.0, 0.0, 1.66707029586e-22, 0.157610464133, 0.751895627261, 0.965077362955, 0.0, 2.56671102571e-40, 0.135876263477, 0.904846465249, 0.998083928382, 0.0, 5.1131107462e-143, 0.030392376141, 0.999825108037, 0.999999999999, 0.0, 0.0, 9.96002853277e-28, 1.0, 1.0, ] index = 0 for a in a_s: for b in b_s: for x in x_s: self.assertFloatEqual(fdtr(a,b,x), exp[index]) index += 1 def test_fdtrc(self): """fdtrc should match cephes results""" a_s = [1, 2, 10, 1000] b_s = a_s x_s = [0, 0.01, 0.5, 10, 521.4] exp = [ 1.0, 
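Likewise, the pdtr/pdtrc expectations are plain Poisson CDF and survival values. A pure-Python sketch of the sum being tested (standard library only, not PyCogent's code):

```python
from math import exp, factorial

def poisson_cdf(k, m):
    """P(X <= k) for X ~ Poisson(m): what cephes' pdtr computes."""
    return exp(-m) * sum(m**i / factorial(i) for i in range(k + 1))

# values exercised in test_pdtr / test_pdtrc above:
print(poisson_cdf(0, 0.1))        # ~0.904837418036
print(poisson_cdf(2, 1.0))        # ~0.919698602929
print(1.0 - poisson_cdf(1, 0.5))  # ~0.090204010431 (pdtrc)
```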
0.936548965139, 0.608173447969, 0.194982229042, 0.0278623147287, 1.0, 0.929465438414, 0.5527864045, 0.0871290708247, 0.00191241330122, 1.0, 0.922320718564, 0.495647504383, 0.0101195597354, 5.85364343244e-10, 1.0, 0.920364369024, 0.479664862438, 0.00161255239482, 3.24963344513e-93, 1.0, 0.990147542977, 0.707106781187, 0.218217890236, 0.0309521879405, 1.0, 0.990099009901, 0.666666666667, 0.0909090909091, 0.00191424196018, 1.0, 0.990059721116, 0.620921323059, 0.00411522633745, 7.73162209771e-11, 1.0, 0.990049932753, 0.606682210295, 5.01088134545e-05, 7.71037335669e-156, 1.0, 0.999998410447, 0.81233012913, 0.241668464289, 0.0340692360638, 1.0, 0.999999755148, 0.814065567918, 0.0942691901701, 0.00191570824928, 1.0, 0.999999988402, 0.855154193974, 0.000571552543402, 3.21796660031e-12, 1.0, 0.999999997453, 0.890678891274, 3.96065609687e-16, 0.0, 1.0, 1.0, 0.842389535866, 0.248104372739, 0.0349226370457, 1.0, 1.0, 0.864123736523, 0.0951535347509, 0.00191607161849, 1.0, 1.0, 0.969607623859, 0.00017489196271, 6.83862415869e-13, 1.0, 1.0, 1.0, 6.68418402018e-243, 0.0, ] index = 0 for a in a_s: for b in b_s: for x in x_s: self.assertFloatEqual(fdtrc(a,b,x), exp[index]) index += 1 def test_gdtr(self): """gdtr should match cephes results""" a_s = [1, 2, 10, 1000] b_s = a_s x_s = [0, 0.01, 0.5, 10, 521.4] exp = [ 0.0, 0.00995016625083, 0.393469340287, 0.99995460007, 1.0, 0.0, 4.96679133403e-05, 0.090204010431, 0.999500600773, 1.0, 0.0, 2.7307942837e-27, 1.70967002935e-10, 0.542070285528, 1.0, 0.0, 0.0, 0.0, 0.0, 2.78154480191e-77, 0.0, 0.0198013266932, 0.632120558829, 0.999999997939, 1.0, 0.0, 0.00019735322711, 0.264241117657, 0.999999956716, 1.0, 0.0, 2.77103020131e-24, 1.11425478339e-07, 0.995004587692, 1.0, 0.0, 0.0, 0.0, 0.0, 0.91070640569, 0.0, 0.095162581964, 0.993262053001, 1.0, 1.0, 0.0, 0.00467884016044, 0.959572318005, 1.0, 1.0, 0.0, 2.51634780677e-17, 0.0318280573062, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.99995460007, 1.0, 1.0, 1.0, 0.0, 0.999500600773, 1.0, 1.0, 
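For integer shape parameters, the gamma CDF tested by test_gdtr reduces to the Erlang closed form, which reproduces several of the expected values above. A hedged pure-Python sketch (the a=rate, b=shape argument order follows the cephes gdtr convention; this is not PyCogent's code):

```python
from math import exp, factorial

def gdtr_int(a, b, x):
    """P(X <= x) for X ~ Gamma(shape=b, rate=a); closed form for integer b
    (the Erlang case), matching cephes' gdtr(a, b, x) argument order."""
    s = sum((a * x)**i / factorial(i) for i in range(b))
    return 1.0 - exp(-a * x) * s

# values exercised in test_gdtr:
print(gdtr_int(1, 1, 0.01))  # ~0.00995016625083  (i.e. 1 - e**-0.01)
print(gdtr_int(2, 1, 0.5))   # ~0.632120558829    (i.e. 1 - e**-1)
```

For non-integer shapes, gdtr falls back on the regularized incomplete gamma function, which has no elementary closed form.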
1.0, 0.0, 0.542070285528, 1.0, 1.0, 1.0, 0.0, 0.0, 3.29827279707e-86, 1.0, 1.0, ] index = 0 for a in a_s: for b in b_s: for x in x_s: self.assertFloatEqual(gdtr(a,b,x), exp[index]) index += 1 def test_gdtrc(self): """gdtrc should match cephes results""" a_s = [1, 2, 10, 1000] b_s = a_s x_s = [0, 0.01, 0.5, 10, 521.4] exp = [ 1.0, 0.990049833749, 0.606530659713, 4.53999297625e-05, 3.62123855523e-227, 1.0, 0.999950332087, 0.909795989569, 0.000499399227387, 1.89173502125e-224, 1.0, 1.0, 0.999999999829, 0.457929714472, 2.89188102723e-208, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.980198673307, 0.367879441171, 2.06115362244e-09, 0.0, 1.0, 0.999802646773, 0.735758882343, 4.32842260712e-08, 0.0, 1.0, 1.0, 0.999999888575, 0.00499541230831, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0892935943104, 1.0, 0.904837418036, 0.00673794699909, 3.72007597602e-44, 0.0, 1.0, 0.99532115984, 0.0404276819945, 3.75727673578e-42, 0.0, 1.0, 1.0, 0.968171942694, 1.12534739608e-31, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 4.53999297625e-05, 7.12457640674e-218, 0.0, 0.0, 1.0, 0.000499399227387, 3.56941277978e-215, 0.0, 0.0, 1.0, 0.457929714472, 3.90479663912e-199, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, ] index = 0 for a in a_s: for b in b_s: for x in x_s: self.assertFloatEqual(gdtrc(a,b,x), exp[index]) index += 1 def test_chdtri(self): """chdtri should match cephes results""" k_s = [1,2,5,10,100] p_s = [1e-50, 1e-9, .02, .5, .8, .99] exp = [ 224.384748319, 37.3248930514, 5.41189443105, 0.45493642312, 0.0641847546673, 0.00015708785791, 230.258509299, 41.4465316739, 7.82404601086, 1.38629436112, 0.446287102628, 0.020100671707, 244.127298027, 50.6921937015, 13.388222599, 4.3514601911, 2.34253430584, 0.554298076728, 262.995620961, 62.9454574206, 21.1607675413, 9.34181776559, 6.17907925604, 2.55821216019, 478.347499744, 209.317598707, 131.141676866, 99.334129236, 87.9453359228, 70.0648949254, ] index = 0 for k in k_s: for p in p_s: self.assertFloatEqual(chdtri(k,p), exp[index]) index += 1 def test_stdtri(self): """stdtri should match 
cephes results""" k_s = [1,2,5,10,100] p_s = [1e-50, 1e-9, .02, .5, .8, .99] exp = [ -3.18309886184e+49, -318309886.184, -15.8945448441, 8.1775627727e-17, 1.37638192049, 31.8205159538, -7.07106781216e+24, -22360.6797414, -4.84873221444, 7.48293180888e-17, 1.06066017178, 6.96455671876, -15683925591.1, -98.9372246484, -2.75650852191, 6.976003623e-17, 0.919543780236, 3.36492999891, -256452.571877, -20.1446977667, -2.35931462368, 6.80574793291e-17, 0.879057828551, 2.76376945745, -28.9584072963, -6.59893982023, -2.08088390123, 6.6546053747e-17, 0.845230424487, 2.3642173659, ] index = 0 for k in k_s: for p in p_s: self.assertFloatEqual(stdtri(k,p), exp[index]) index += 1 def test_pdtri(self): """pdtri should match cephes results""" k_s = [1,2,5,10,100] p_s = [1e-50, 1e-9, .02, .5, .8, .99] exp = [ 119.924420375, 23.9397278656, 5.83392170192, 1.67834699002, 0.824388309033, 0.148554740253, 124.094307191, 26.6722865587, 7.51660387561, 2.67406031372, 1.53504420264, 0.436045165078, 134.901981814, 33.6746016741, 12.0269783451, 5.67016118871, 3.90366383933, 1.7852844853, 150.2138305, 43.627975401, 18.8297496417, 10.6685224038, 8.15701989758, 4.77124616939, 332.371212972, 173.368244558, 122.695978128, 100.666862949, 92.4593447729, 79.0999186597, ] index = 0 for k in k_s: for p in p_s: self.assertFloatEqual(pdtri(k,p), exp[index]) index += 1 def test_bdtri(self): """bdtri should match cephes results""" k_s = [0,1,2,3] n_s = [5,10,1000] p_s = [1e-10, .1, .5, .9, .999999] exp = [ 0.99, 0.36904265552, 0.129449436704, 0.020851637639, 2.00000080006e-07, 0.9, 0.205671765276, 0.0669670084632, 0.0104807417938, 1.00000045003e-07, 0.0227627790442, 0.00229993617745, 0.000692907009547, 0.000105354965434, 1.00000049953e-09, 0.997884361719, 0.58389037462, 0.313810170456, 0.112234958546, 0.000316327821398, 0.939678616058, 0.336847723307, 0.162262728195, 0.0545286199977, 0.00014913049349, 0.0260030189545, 0.0038841043984, 0.00167777786542, 0.000531936197341, 1.415587631e-06, 0.999784533318, 
0.753363546712, 0.5, 0.246636453288, 0.00465241636163, 0.964779035441, 0.449603888674, 0.258574723285, 0.11582527803, 0.00203463563411, 0.0287538329681, 0.00531348536403, 0.00267315927217, 0.00110256069953, 1.82723947076e-05, 0.999996837712, 0.887765041454, 0.686189829544, 0.41610962538, 0.0212382182007, 0.981054188003, 0.551730832384, 0.355099967912, 0.187562296647, 0.00839131408953, 0.0312483560212, 0.00666849533707, 0.00367082709364, 0.00174586632568, 7.10965576424e-05, ] index = 0 for k in k_s: for n in n_s: for p in p_s: self.assertFloatEqual(bdtri(k,n,p), exp[index]) index += 1 def test_gdtri(self): """gdtri should match cephes results""" k_s = [1,2,4,10,100] n_s = k_s p_s = [1e-9, .02, .5, .8, .99] exp = [ 1.0000000005e-09, 0.0202027073175, 0.69314718056, 1.60943791243, 4.60517018599, 4.47220262303e-05, 0.214699095008, 1.67834699002, 2.994308347, 6.63835206799, 0.0124777531242, 1.01623845904, 3.67206074885, 5.51504571515, 10.0451175148, 0.602134838869, 4.61834927756, 9.66871461471, 12.5187528198, 18.7831173933, 51.1433022288, 80.5501391278, 99.6668649193, 108.304391619, 124.722561491, 5.0000000025e-10, 0.0101013536588, 0.34657359028, 0.804718956217, 2.30258509299, 2.23610131152e-05, 0.107349547504, 0.839173495008, 1.4971541735, 3.319176034, 0.00623887656209, 0.50811922952, 1.83603037443, 2.75752285758, 5.02255875742, 0.301067419435, 2.30917463878, 4.83435730736, 6.25937640991, 9.39155869666, 25.5716511144, 40.2750695639, 49.8334324597, 54.1521958095, 62.3612807454, 2.50000000125e-10, 0.00505067682938, 0.17328679514, 0.402359478109, 1.1512925465, 1.11805065576e-05, 0.053674773752, 0.419586747504, 0.748577086751, 1.659588017, 0.00311943828105, 0.25405961476, 0.918015187213, 1.37876142879, 2.51127937871, 0.150533709717, 1.15458731939, 2.41717865368, 3.12968820495, 4.69577934833, 12.7858255572, 20.1375347819, 24.9167162298, 27.0760979048, 31.1806403727, 1.0000000005e-10, 0.00202027073175, 0.069314718056, 0.160943791243, 0.460517018599, 4.47220262303e-06, 
0.0214699095008, 0.167834699002, 0.2994308347, 0.663835206799, 0.00124777531242, 0.101623845904, 0.367206074885, 0.551504571515, 1.00451175148, 0.0602134838869, 0.461834927756, 0.966871461471, 1.25187528198, 1.87831173933, 5.11433022288, 8.05501391278, 9.96668649193, 10.8304391619, 12.4722561491, 1.0000000005e-11, 0.000202027073175, 0.0069314718056, 0.0160943791243, 0.0460517018599, 4.47220262303e-07, 0.00214699095008, 0.0167834699002, 0.02994308347, 0.0663835206799, 0.000124777531242, 0.0101623845904, 0.0367206074885, 0.0551504571515, 0.100451175148, 0.00602134838869, 0.0461834927756, 0.0966871461471, 0.125187528198, 0.187831173933, 0.511433022288, 0.805501391278, 0.996668649193, 1.08304391619, 1.24722561491, ] index = 0 for k in k_s: for n in n_s: for p in p_s: self.assertFloatEqual(gdtri(k,n,p), exp[index]) index += 1 def test_fdtri(self): """fdtri should match cephes results""" k_s = [1,2,4,10,100] n_s = k_s p_s = [1e-50, 1e-9, .02, .5, .8, .99] exp = [ 0.0, 2.46740096071e-18, 0.000987610197427, 1.0, 9.472135955, 4052.18069548, 0.0, 1.99999988687e-18, 0.000800320128051, 0.666666666667, 3.55555555556, 98.5025125628, 0.0, 1.77777767722e-18, 0.000711321880645, 0.548632170413, 2.35072147881, 21.1976895844, 0.0, 1.65119668161e-18, 0.000660638708985, 0.489736921158, 1.88288794493, 10.0442892734, 0.0, 1.57866975531e-18, 0.000631602221127, 0.458262714634, 1.66429288986, 6.89530103058, 0.0, 9.99999973218e-10, 0.0206164098292, 1.5, 12.0, 4999.5, 0.0, 9.99999972718e-10, 0.0204081632653, 1.0, 4.0, 99.0, 0.0, 9.99999972468e-10, 0.0203050891044, 0.828427124746, 2.472135955, 18.0, 0.0, 9.99999972318e-10, 0.0202435772829, 0.743491774985, 1.89864830731, 7.55943215755, 0.0, 9.99999972228e-10, 0.0202067893611, 0.697973989501, 1.63562099482, 4.82390980716, 0.0, 1.29104998825e-05, 0.0712270257663, 1.82271484235, 13.6443218387, 5624.58332963, 0.0, 1.58118880931e-05, 0.082357834815, 1.20710678119, 4.2360679775, 99.2493718553, 0.0, 1.82578627816e-05, 0.0917479893415, 1.0, 
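The inverse routines tested here (chdtri, stdtri, pdtri, bdtri, gdtri, fdtri) all invert a CDF or survival function. For even degrees of freedom the chi-square survival function has a closed form, so chdtri values can be cross-checked with simple bisection (a sketch of the idea, not cephes' actual algorithm):

```python
from math import exp, factorial

def chi2_sf_even(x, k):
    """Survival P(X > x) for chi-square with even df k (closed form)."""
    assert k % 2 == 0
    lam = x / 2.0
    return exp(-lam) * sum(lam**i / factorial(i) for i in range(k // 2))

def invert_sf(f, p, lo=0.0, hi=1e6):
    """Bisection: find x with f(x) == p for a monotone decreasing f."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# chdtri(k, p) returns x with P(chi2_k > x) = p; for k=2, p=0.5 this is 2*ln(2)
x = invert_sf(lambda t: chi2_sf_even(t, 2), 0.5)
print(round(x, 6))  # 1.386294
```

cephes inverts the incomplete gamma function directly rather than bisecting, but the bisection reproduces the tabulated values to high precision.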
2.48261291932, 15.9770248526, 0.0, 2.04128031324e-05, 0.0999726146531, 0.898817134423, 1.82861100515, 5.99433866163, 0.0, 2.21407117017e-05, 0.106518545067, 0.844891468084, 1.5273126184, 3.5126840636, 0.0, 0.00213897888638, 0.130917099116, 2.04191262042, 14.7718897826, 6055.8467074, 0.0, 0.00322083313175, 0.168531162323, 1.34500479177, 4.38216390487, 99.3991959745, 0.0, 0.00448830777955, 0.207656634378, 1.11257336081, 2.45957986729, 14.5459008033, 0.0, 0.00608578074458, 0.251574092492, 1.0, 1.73159473193, 4.84914680208, 0.0, 0.00800159033308, 0.298648905106, 0.940477156977, 1.38089597558, 2.50331112688, 0.0, 3.09672866088e-11, 0.178906118636, 2.18215440197, 15.4973240414, 6334.110036, 0.0, 5.2776234633e-11, 0.2457526061, 1.43271814572, 4.47142755584, 99.4891628084, 0.0, 0.0659164713677, 0.326585865322, 1.18358397235, 2.43020291912, 13.5769915067, 0.0, 0.119865858243, 0.442669184276, 1.06329004653, 1.63265061785, 4.01371941549, 0.0, 0.289673110482, 0.661509869668, 1.0, 1.1839371445, 1.59766912303, ] index = 0 for k in k_s: for n in n_s: for p in p_s: self.assertFloatEqual(fdtri(k,n,p), exp[index]) index += 1 if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_maths/test_stats/test_histogram.py000644 000765 000024 00000007432 12024702176 025367 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides tests for Histogram. 
""" from cogent.util.unit_test import TestCase, main from cogent.maths.stats.histogram import Histogram from cogent.core.location import Span __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class HistogramTests(TestCase): """Tests for Histogram class""" def test_init_no_bins(self): """Histogram should raise error if initialized without bins""" # you deserve an Error if you initialize your histogram # without providing Bins self.assertRaises(AttributeError, Histogram) def test_init_bins(self): """Histogram should set _bins property correctly""" bins = [Span(0,2),Span(2,4),Span(4,6)] bins_only = Histogram(bins=bins) self.assertEqual(bins_only._bins, bins) def test_init_bins_data(self): """Histogram should fill bins with data if supplied""" # most basic histogram, bins and data data = [1,3,5,'A'] bins = [Span(0,2),Span(2,4),Span(4,6)] data_and_bins = Histogram(data=data,bins=bins) self.assertEqual(data_and_bins._bins,bins) self.assertEqual(data_and_bins._values,[[1],[3],[5]]) self.assertEqual(data_and_bins.Other,['A']) def test_call(self): """Histogram __call__ should update with new data""" data = [1,3,5,'A'] bins = [Span(0,2),Span(2,4),Span(4,6)] data_and_bins = Histogram(data=data,bins=bins) #update the histogram data_and_bins([4,5,6,7]) self.assertEqual(data_and_bins._values,[[1],[3],[5,4,5]]) self.assertEqual(data_and_bins.Other,['A',6,7]) def test_mapping(self): """Histogram Mapping should apply correct function to values""" # bins, data, mapping data = ['A','AAA','CCCCC','GGGGGGGGGGGGGG'] bins = [Span(0,2),Span(2,4),Span(4,6)] mapping = Histogram(data=data,bins=bins,Mapping=len) self.assertEqual(mapping._values, [['A'],['AAA'],['CCCCC']]) self.assertEqual(mapping.Other,['GGGGGGGGGGGGGG']) def test_multi(self): """Histogram Multi should allow 
values to match multiple bins""" #bins, data, multi=True bins2 = [Span(0,5),Span(3,8),Span(6,10)] data2 = [0,1,2,3,4,5,6,7,8,9,10] not_multi = Histogram(data2,bins2) self.assertEqual(not_multi._values,[[0,1,2,3,4],[5,6,7],[8,9]]) self.assertEqual(not_multi.Other,[10]) multi = Histogram(data2,bins2,Multi=True) self.assertEqual(multi._values,[[0,1,2,3,4],[3,4,5,6,7],[6,7,8,9]]) self.assertEqual(multi.Other,[10]) def test_toFreqs(self): """Histogram toFreqs() should return a Freqs object""" h = Histogram(range(0,20),bins=[Span(0,3),Span(3,10), Span(10,18),Span(18,20)]) constructor=str f = h.toFreqs() self.assertEqual(f[constructor(Span(0,3))],3) self.assertEqual(f[constructor(Span(3,10))],7) self.assertEqual(f[constructor(Span(10,18))],8) self.assertEqual(f[constructor(Span(18,20))],2) def test_clear(self): """Histogram clear should reset all data""" data = [1,3,5,'A'] bins = [Span(0,2),Span(2,4),Span(4,6)] data_and_bins = Histogram(data=data,bins=bins) self.assertEqual(data_and_bins._bins,bins) self.assertEqual(data_and_bins._values,[[1],[3],[5]]) self.assertEqual(data_and_bins.Other,['A']) data_and_bins.clear() self.assertEqual(data_and_bins._values,[[],[],[]]) self.assertEqual(data_and_bins.Other,[]) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_stats/test_information_criteria.py000644 000765 000024 00000002047 12024702176 027576 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.maths.stats.information_criteria import aic, bic __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class InformationCriteria(TestCase): """Tests calculation of AIC and BIC measures.""" def test_aic(self): """correctly compute AIC from Burnham & Anderson 2002, p102""" 
self.assertFloatEqual(aic(-9.7039, 4), 27.4078) def test_aic_corrected(self): """correctly compute AIC corrected for small sample size""" # from Burnham & Anderson 2002, p102 self.assertFloatEqual(aic(-9.7039, 4, sample_size=13), 32.4078) def test_bic(self): """correctly compute BIC""" # against hand calculated self.assertFloatEqual(bic(-9.7039, 4, 13), 29.6675974298) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_maths/test_stats/test_jackknife.py000644 000765 000024 00000013752 12024702176 025321 0ustar00jrideoutstaff000000 000000 import numpy as np from cogent.util.unit_test import TestCase, main from cogent.maths.stats.jackknife import JackknifeStats __author__ = "Anuj Pahwa, Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Anuj Pahwa", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "Production" def pmcc(data, axis=1): """Compute the Product-moment correlation coefficient. 
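The AIC/BIC expectations asserted above follow directly from the standard definitions. A pure-Python restatement of the arithmetic (argument order mirrors the tests; this is a sketch, not cogent.maths.stats.information_criteria itself):

```python
from math import log

def aic(lnL, nfp):
    """Akaike information criterion: 2k - 2*lnL."""
    return 2 * nfp - 2 * lnL

def aicc(lnL, nfp, n):
    """AIC with the small-sample correction term."""
    return aic(lnL, nfp) + 2 * nfp * (nfp + 1) / (n - nfp - 1)

def bic(lnL, nfp, n):
    """Bayesian information criterion: k*ln(n) - 2*lnL."""
    return nfp * log(n) - 2 * lnL

# the Burnham & Anderson example used in the tests above:
print(round(aic(-9.7039, 4), 4))       # 27.4078
print(round(aicc(-9.7039, 4, 13), 4))  # 32.4078
print(round(bic(-9.7039, 4, 13), 4))   # 29.6676
```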
Expression 15.3 from Biometry by Sokal/Rohlf This code implementation is on the proviso that the data that is provided is two dimensional: [[Y1], [Y2]] (trying to determine the correlation coefficient between data sets Y1 and Y2)""" if axis == 0: data = data.transpose() axis = 1 other_axis = 0 mean = data.mean(axis=axis) data_less_mean = np.array([data[0] - mean[0], data[1] - mean[1]]) sum_squares = np.sum(np.square(data_less_mean), axis=axis) sum_products = np.sum(np.prod(data_less_mean, axis=other_axis)) pmcc = np.divide(sum_products, np.sqrt(np.prod(sum_squares))) z_trans = np.arctanh(pmcc) return z_trans # test data from Box 15.2; Biometry by Sokal/Rohlf data = np.array([[159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210], [14.40, 15.20, 11.30, 2.50, 22.70, 14.90, 1.41, 15.81, 4.19, \ 15.39, 17.25, 9.52]]) # factory function generator for the statistical function of interest def stat_maker(func, data, axis): def calc_stat(coords): subset_data = data.take(coords, axis) return func(subset_data, axis) return calc_stat # function to compute mean of a np array def mean(data, axis): return data.mean(axis=axis) class JackknifeTests(TestCase): def test_proper_initialise(self): """jackknife should initialise correctly""" # Scalar pmcc_stat = stat_maker(pmcc, data, 1) test_knife = JackknifeStats(data.shape[1], pmcc_stat) self.assertEqual(test_knife.n, data.shape[1]) self.assertEqual(test_knife._jackknifed_stat, None) # Vector mean_stat = stat_maker(mean, data, 1) test_knife = JackknifeStats(data.shape[1], mean_stat) self.assertEqual(test_knife.n, data.shape[1]) self.assertEqual(test_knife._jackknifed_stat, None) def test_jackknife_stats(self): """jackknife results should match Sokal & Rohlf example""" # Scalar pmcc_stat = stat_maker(pmcc, data, 1) test_knife = JackknifeStats(data.shape[1], pmcc_stat) self.assertAlmostEquals(test_knife.JackknifedStat, 1.2905845) self.assertAlmostEquals(test_knife.StandardError, 0.2884490) self.assertTrue(test_knife._jackknifed_stat 
is not None) # Vector mean_stat = stat_maker(mean, data, 1) test_knife = JackknifeStats(data.shape[1], mean_stat) expected_jk_stat = data.mean(axis=1) got_jk_stat = test_knife.JackknifedStat expected_standard_err = [30.69509346, 1.87179671] got_standard_err = test_knife.StandardError for index in [0,1]: self.assertAlmostEqual(got_jk_stat[index], expected_jk_stat[index]) self.assertAlmostEqual(got_standard_err[index], expected_standard_err[index]) def test_tables(self): """jackknife should work for calculators return scalars or vectors""" # Scalar pmcc_stat = stat_maker(pmcc, data, 1) test_knife = JackknifeStats(data.shape[1], pmcc_stat) expected_subsample_stats = [1.4151, 1.3946, 1.4314, 1.1889, 1.1323, \ 1.3083, 1.3561, 1.3453, 1.2412, 1.3216, \ 1.2871, 1.3664] expected_pseudovalues = [0.1968, 0.4224, 0.0176, 2.6852, 3.3084, \ 1.3718, 0.8461, 0.9650, 2.1103, 1.2253, \ 1.6049, 0.7333] test_knife.jackknife() got_subsample_stats = test_knife._subset_statistics got_pseudovalues = test_knife._pseudovalues for index in range(data.shape[1]): self.assertAlmostEqual(got_subsample_stats[index], expected_subsample_stats[index], places=4) self.assertAlmostEqual(got_pseudovalues[index], expected_pseudovalues[index], places=4) # Vector mean_stat = stat_maker(mean, data, 1) test_knife = JackknifeStats(data.shape[1], mean_stat) test_knife.jackknife() expected_pseudovalues = data.transpose() expected_subsample_stats = [[ 198.9091, 11.8336], [ 197.0909, 11.7609], [ 204.2727, 12.1155], [ 209.2727, 12.9155], [ 178.4545, 11.0791], [ 192.4545, 11.7882], [ 204.2727, 13.0145], [ 184.2727, 11.7055], [ 206.0909, 12.7618], [ 193.3636, 11.7436], [ 184.2727, 11.5745], [ 194.2727, 12.2773]] got_subsample_stats = test_knife._subset_statistics got_pseudovalues = test_knife._pseudovalues for index1 in range(data.shape[1]): for index2 in range(data.shape[0]): self.assertAlmostEqual(got_subsample_stats[index1][index2], expected_subsample_stats[index1][index2], places=4) 
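The vector case above asserts that, for the mean, the jackknife pseudovalues equal the original data points. A small self-contained sketch of leave-one-out jackknifing (illustrative only, not the JackknifeStats API) shows why:

```python
from statistics import mean

def jackknife_mean(data):
    """Leave-one-out jackknife of the sample mean: returns the jackknifed
    estimate and its standard error computed from the pseudovalues."""
    n = len(data)
    theta = mean(data)
    # statistic recomputed with each observation left out in turn
    loo = [mean(data[:i] + data[i + 1:]) for i in range(n)]
    # pseudovalue i: n*theta - (n-1)*theta_(-i); for the mean this is data[i]
    pseudo = [n * theta - (n - 1) * t for t in loo]
    est = mean(pseudo)
    var = sum((p - est) ** 2 for p in pseudo) / (n - 1)
    return est, (var / n) ** 0.5

est, se = jackknife_mean([2, 4, 6, 8])
print(est)  # 5.0 (for the mean, the pseudovalues are exactly the data)
```

For the mean, the jackknife standard error reduces to the ordinary standard error s/sqrt(n), which is why the vector expectations above can be checked analytically.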
self.assertAlmostEqual(got_pseudovalues[index1][index2], expected_pseudovalues[index1][index2], places=4) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_maths/test_stats/test_ks.py000644 000765 000024 00000010647 12024702176 024011 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.maths.stats.ks import pkolmogorov1x, pkolmogorov2x, pkstwo,\ psmirnov2x from cogent.maths.stats.test import ks_test, ks_boot __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class KSTests(TestCase): """Tests Kolmogorov-Smirnov.""" def setUp(self): self.x1 = [0.09916191, 0.29732882, 0.41475044, 0.68816838, 0.20841367, 0.46129887, 0.22074544, 0.06889561, 0.88264852, 0.87726406, 0.76905072, 0.86178033, 0.42596777, 0.59443782, 0.68852176, 0.66032130, 0.72683791, 0.02363118, 0.82384762, 0.32759965, 0.69231127, 0.50848596, 0.67500888, 0.84919139, 0.70774136, 0.97847465, 0.59784714, 0.82033663, 0.45640039, 0.13054766, 0.01227875, 0.21229238, 0.37054602, 0.80905622, 0.26056527, 0.01662457, 0.76277188, 0.76892495, 0.39186350, 0.61468789, 0.83247770, 0.69946238, 0.80550609, 0.22336814, 0.62491296, 0.03413056, 0.74500251, 0.36008309, 0.19443889, 0.06808133] self.x2 = [1.1177760, 0.9984325, 0.8113576, 0.7247507, 0.9473543, 1.1192222, 1.2577115, 0.6168244, 0.9616475, 1.0677138, 0.5106196, 1.2334833, 0.3750225, 0.9788191, 1.1366872, 0.8212352, 0.7665240, 0.4409294, 0.4447418, 1.1381901, 0.7299300, 1.1307991, 0.5356031, 0.3193794, 1.2476867, 0.7909454, 0.7781800, 0.8438637, 1.1814135, 1.0117055, 0.7433708, 0.7917239, 0.5080752, 0.9014003, 0.5960710, 0.9646521, 0.9263595, 0.7969784, 1.2847108, 0.6393015, 0.6828791, 1.0817340, 0.6586887, 0.7314203, 0.3998812, 0.9988478, 1.0225579, 1.2721428, 0.6465969, 0.9133413] def 
test_pk1x(self): """1 sample 1-sided should match answers from R""" self.assertFloatEqual(pkolmogorov1x(0.06, 30), 0.2248113) def test_pk2x(self): """1 sample 2-sided should match answers from R""" self.assertFloatEqual(pkolmogorov2x(0.7199, 50), (1-6.661e-16)) self.assertFloatEqual(pkolmogorov2x(0.08, 30), 0.01754027) self.assertFloatEqual(pkolmogorov2x(0.03, 300), 0.05753413) def test_ps2x(self): """2 sample 2-sided smirnov should match answers from R""" self.assertFloatEqual(psmirnov2x(0.48, 20, 50), 0.9982277) self.assertFloatEqual(psmirnov2x(0.28, 20, 50), 0.8161612) self.assertFloatEqual(psmirnov2x(0.28, 50, 20), 0.8161612) def tes_pk2x(self): """2 sample 2-sided kolmogorov should match answers from R""" self.assertFloatEqual(pkolmogorov1x(0.058, 50), 0.007530237) self.assertFloatEqual(pkolmogorov1x(0.018, 50), 4.887356e-26) self.assertFloatEqual(pkolmogorov1x(0.018, 5000), 0.922618) def test_pkstwo(self): """kolmogorov asymptotic should match answers from R""" self.assertFloatEqual(pkstwo(2.3),[1-5.084e-05],eps=1e-5) def test_ks2x(self): """KS two-sample, 2-sided should match answers from R""" D, Pval = ks_test(self.x1, self.x2) self.assertFloatEqual((D, Pval), (0.46, 3.801e-05), eps=1e-4) D, Pval = ks_test(self.x1, self.x2, exact=False) self.assertFloatEqual((D, Pval), (0.46, 5.084e-05), eps=1e-4) D, Pval = ks_test(self.x1, self.x2[:20]) self.assertFloatEqual((D,Pval), (0.53, 0.0003576), eps=1e-4) D, Pval = ks_test(self.x2[:20], self.x1) self.assertFloatEqual((D,Pval), (0.53, 0.0003576), eps=1e-4) D, Pval = ks_test(self.x1[:20], self.x2) self.assertFloatEqual((D,Pval), (0.48, 0.001772), eps=1e-4) D, Pval = ks_test(self.x1, self.x2, alt="greater") self.assertFloatEqual((D,Pval), (0.46, 2.542e-05), eps=1e-4) D, Pval = ks_test(self.x1, self.x2, alt="g") self.assertFloatEqual((D,Pval), (0.46, 2.542e-05), eps=1e-4) D, Pval = ks_test(self.x1, self.x2, alt="less") self.assertFloatEqual((D,Pval), (6.9388939039072284e-18, 1.), eps=1e-4) D, Pval = ks_test(self.x2, 
self.x1, alt="l") self.assertFloatEqual((D,Pval), (0.46, 2.542e-05), eps=1e-4) def test_ks_boot(self): """excercising the bootstrapped version of KS""" D, Pval = ks_boot(self.x1[:10], self.x2[:10], num_reps=10) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_maths/test_stats/test_period.py000644 000765 000024 00000012704 12024702176 024652 0ustar00jrideoutstaff000000 000000 import numpy from cogent.util.unit_test import TestCase, main from cogent.maths.stats.period import chi_square, factorial, g_statistic, \ circular_indices, _seq_to_symbols, seq_to_symbols, blockwise_bootstrap, \ SeqToSymbols from cogent.maths.period import ipdft, hybrid, auto_corr, Hybrid, Ipdft, \ AutoCorrelation __author__ = "Hua Ying, Julien Epps and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Julien Epps", "Hua Ying", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "Production" class TestPeriodStat(TestCase): def setUp(self): x = [1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] self.x = numpy.array(x) self.sig = numpy.array(self.x, numpy.float64) self.motifs = ['AA', 'TT', 'TA'] def test_chi_square(self): D, cs_p_val = chi_square(self.x, 10) self.assertEqual('%.4f'%D, '0.4786') self.assertEqual('%.4f'%cs_p_val, '0.4891') def test_factorial(self): self.assertEqual(factorial(1), 1) self.assertEqual(factorial(4), 24) self.assertEqual(factorial(0), 1) def test_g_statitic(self): """calc g-stat correctly""" X, periods = ipdft(self.sig, llim=2, ulim=39) 
g_obs, p_val = g_statistic(X) self.assertFloatEqual(p_val, 0.9997, eps=1e-3) self.assertFloatEqual(g_obs, 0.0577, eps=1e-3) def test_circular_indices(self): v = range(10) self.assertEqual(circular_indices(v, 8, 10, 4), [8,9,0,1]) self.assertEqual(circular_indices(v, 9, 10, 4), [9,0,1,2]) self.assertEqual(circular_indices(v, 4, 10, 4), [4,5,6,7]) def test_seq_to_symbol(self): """both py and pyx seq_to_symbol versions correctly convert a sequence""" motifs = ['AA', 'AT', 'TT'] symbols = _seq_to_symbols('AATGGTTA', motifs, 2) self.assertEqual(symbols, numpy.array([1,1,0,0,0,1,0,0])) symbols = seq_to_symbols('AAGATT', motifs, 2, numpy.zeros(6, numpy.uint8)) self.assertEqual(symbols, numpy.array([1,0,0,1,1,0])) def test_seq_to_symbol_factory(self): """checks factory function for conversion works""" motifs = ['AA', 'AT', 'TT'] seq_to_symbols = SeqToSymbols(motifs) self.assertEqual(seq_to_symbols('AATGGTTA'), numpy.array([1,1,0,0,0,1,0,0])) self.assertEqual(seq_to_symbols('AAGATT'), numpy.array([1,0, 0, 1, 1, 0], numpy.uint8)) def test_permutation(self): s = 'ATCGTTGGGACCGGTTCAAGTTTTGGAACTCGCAAGGGGTGAATGGTCTTCGTCTAACGCTGG'\ 'GGAACCCTGAATCGTTGTAACGCTGGGGTCTTTAACCGTTCTAATTTAACGCTGGGGGGTTCT'\ 'AATTTTTAACCGCGGAATTGCGTC' seq_to_symbol = SeqToSymbols(self.motifs, length=len(s)) hybrid_calc = Hybrid(len(s), llim=2, period = 4) ipdft_calc = Ipdft(len(s), llim=2, period = 4) stat, p = blockwise_bootstrap(s, hybrid_calc, block_size=10, num_reps=1000, seq_to_symbols=seq_to_symbol) # print 's=%.4f; p=%.3f' % (stat, p) stat, p = blockwise_bootstrap(s, ipdft_calc, block_size=10, num_reps=1000, seq_to_symbols=seq_to_symbol) # print 's=%.4f; p=%.3f' % (stat, p) def test_permutation_all(self): """performs permutation test of Hybrid, but considers all stats""" s = 'ATCGTTGGGACCGGTTCAAGTTTTGGAACTCGCAAGGGGTGAATGGTCTTCGTCTAACGCTGG'\ 'GGAACCCTGAATCGTTGTAACGCTGGGGTCTTTAACCGTTCTAATTTAACGCTGGGGGGTTCT'\ 'AATTTTTAACCGCGGAATTGCGTC' seq_to_symbol = SeqToSymbols(self.motifs, length=len(s)) hybrid_calc 
= Hybrid(len(s), period = 4, return_all=True) stat, p = blockwise_bootstrap(s, hybrid_calc, block_size=10, num_reps=1000, seq_to_symbols=seq_to_symbol) # print 's=%s; p=%s' % (stat, p) def test_get_num_stats(self): """calculators should return correct num stats""" hybrid_calc = Hybrid(150, llim=2, period = 4) ipdft_calc = Ipdft(150, llim=2, period = 4) autocorr_calc = AutoCorrelation(150, llim=2, period = 4) self.assertEqual(hybrid_calc.getNumStats(), 1) self.assertEqual(ipdft_calc.getNumStats(), 1) self.assertEqual(autocorr_calc.getNumStats(), 1) hybrid_calc = Hybrid(150, llim=2, period = 4, return_all=True) self.assertEqual(hybrid_calc.getNumStats(), 3) def test_permutation_skips(self): """permutation test correctly handles data without symbols""" s = 'N' * 150 seq_to_symbol = SeqToSymbols(self.motifs, length=len(s)) ipdft_calc = Ipdft(len(s), llim=2, period = 4) stat, p = blockwise_bootstrap(s, ipdft_calc, block_size=10, num_reps=1000, seq_to_symbols=seq_to_symbol, num_stats=1) self.assertEqual(stat, 0.0) self.assertEqual(p, 1.0) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_stats/test_rarefaction.py000644 000765 000024 00000015512 12024702176 025665 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file test_parse.py from numpy import array from cogent.util.unit_test import TestCase, main from cogent.maths.stats.rarefaction import (subsample, naive_histogram, wrap_numpy_histogram, rarefaction, subsample_freq_dist_nonzero, subsample_random, subsample_multinomial) __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class TopLevelTests(TestCase): """Tests of top-level functions""" def test_subsample(self): """subsample should return a random subsample of a vector""" a = array([0,5,0]) self.assertEqual(subsample(a,5), array([0,5,0])) 
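subsample draws a fixed number of counts without replacement from a count vector. A list-based sketch of the same idea (names and internals are illustrative, not cogent.maths.stats.rarefaction's implementation):

```python
import random

def subsample(counts, n):
    """Randomly draw n items without replacement from a count vector and
    return the resulting count vector (rarefaction-style subsampling)."""
    # expand counts into a pool of item indices, e.g. [0, 5, 0] -> [1, 1, 1, 1, 1]
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = random.sample(pool, n)
    out = [0] * len(counts)
    for i in picked:
        out[i] += 1
    return out

print(subsample([0, 5, 0], 2))  # [0, 2, 0] -- only index 1 holds any counts
```

Expanding the counts is fine for the small vectors in these tests; for large counts the multinomial- and frequency-based variants tested below avoid building the full pool.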
        self.assertEqual(subsample(a,2), array([0,2,0]))
        # selecting 2 counts from the vector 1000 times yields each of the
        # two possible results at least once each
        b = array([2,0,1])
        actual = {}
        for i in range(1000):
            e = subsample(b,2)
            actual[tuple(e)] = None
        self.assertEqual(actual, {(1,0,1):None, (2,0,0):None})
        obs = subsample(b,2)
        assert (obs == array([1,0,1])).all() or (obs == array([2,0,0])).all()

    def test_subsample_freq_dist_nonzero(self):
        """subsample_freq_dist_nonzero should return a random subsample of a
        vector"""
        a = array([0,5,0])
        self.assertEqual(subsample_freq_dist_nonzero(a,5), array([0,5,0]))
        self.assertEqual(subsample_freq_dist_nonzero(a,2), array([0,2,0]))
        # selecting 35 counts from the vector 100 times yields at least
        # two different results
        b = array([2,0,1,2,1,8,6,0,3,3,5,0,0,0,5])
        actual = {}
        for i in range(100):
            e = subsample_freq_dist_nonzero(b,35)
            self.assertEqual(e.sum(), 35)
            actual[tuple(e)] = None
        self.assertTrue(len(actual) > 1)
        # selecting 2 counts from the vector 1000 times yields each of the
        # two possible results at least once each (note that an issue with an
        # initial buggy version of subsample_freq_dist_nonzero was detected
        # with this test, so don't remove it)
        b = array([2,0,1])
        actual = {}
        for i in range(1000):
            e = subsample_freq_dist_nonzero(b,2)
            actual[tuple(e)] = None
            self.assertTrue(e.sum() == 2)
        self.assertEqual(actual, {(1,0,1):None, (2,0,0):None})

    def test_subsample_random(self):
        """subsample_random should return a random subsample of a vector"""
        a = array([0,5,0])
        self.assertEqual(subsample_random(a,5), array([0,5,0]))
        self.assertEqual(subsample_random(a,2), array([0,2,0]))
        # selecting 35 counts from the vector 100 times yields at least
        # two different results
        b = array([2,0,1,2,1,8,6,0,3,3,5,0,0,0,5])
        actual = {}
        for i in range(100):
            e = subsample_random(b,35)
            self.assertEqual(e.sum(), 35)
            actual[tuple(e)] = None
        self.assertTrue(len(actual) > 1)
        # selecting 2 counts from the vector 1000 times yields each of the
        # two possible results at least once each
        b = array([2,0,1])
        actual = {}
        for i in range(1000):
            e = subsample_random(b,2)
            actual[tuple(e)] = None
            self.assertTrue(e.sum() == 2)
        self.assertEqual(actual, {(1,0,1):None, (2,0,0):None})

    def test_subsample_multinomial(self):
        """subsample_multinomial should return a random subsample of a vector"""
        # selecting 35 counts from the vector 100 times yields at least
        # two different results
        actual = {}
        for i in range(100):
            b = array([2,0,1,2,1,8,6,0,3,3,5,0,0,0,5])
            e = subsample_multinomial(b,35)
            self.assertEqual(e.sum(), 35)
            actual[tuple(e)] = None
        self.assertTrue(len(actual) > 1)

    def test_naive_histogram(self):
        """naive_histogram should produce expected result"""
        vals = array([1,0,0,3])
        self.assertEqual(naive_histogram(vals), array([2,1,0,1]))
        self.assertEqual(naive_histogram(vals, 4), array([2,1,0,1,0]))

    def test_wrap_numpy_histogram(self):
        """wrap_numpy_histogram should provide expected result"""
        vals = array([1,0,0,3])
        h_f = wrap_numpy_histogram(3)
        self.assertEqual(h_f(vals), array([2,1,0,1]))
        h_f = wrap_numpy_histogram(4)
        self.assertEqual(h_f(vals, 4), array([2,1,0,1,0]))

    def test_rarefaction(self):
        """rarefaction should produce expected curve"""
        vals = array([5,0,0,3,0,10], dtype=int)
        res = [r.copy() for r in rarefaction(vals, stride=1)]
        self.assertEqual(len(res), 18)
        for i, r in enumerate(res):
            self.assertEqual(r.sum(), i+1)
            # make sure we didn't add any bad counts
            for pos in [1,2,4]:
                self.assertEqual(r[pos], 0)
        # when we get to end should recapture orig vals
        self.assertEqual(r, vals)
        res = [r.copy() for r in rarefaction(vals, stride=3)]
        self.assertEqual(len(res), 6)
        for i, r in enumerate(res):
            self.assertEqual(r.sum(), 3*(i+1))
            # make sure we didn't add any bad counts
            for pos in [1,2,4]:
                self.assertEqual(r[pos], 0)
        # when we get to end should recapture orig vals
        self.assertEqual(r, vals)
        # repeat everything above using alt. input format
        orig_vals = vals.copy()
        vals = array([0,0,0,0,0,3,3,3,5,5,5,5,5,5,5,5,5,5], dtype=int)
        res = [r.copy() for r in rarefaction(vals, stride=1, is_counts=False)]
        self.assertEqual(len(res), 18)
        for i, r in enumerate(res):
            self.assertEqual(r.sum(), i+1)
            # make sure we didn't add any bad counts
            for pos in [1,2,4]:
                self.assertEqual(r[pos], 0)
        # when we get to end should recapture orig vals
        self.assertEqual(r, orig_vals)
        res = [r.copy() for r in rarefaction(vals, stride=3, is_counts=False)]
        self.assertEqual(len(res), 6)
        for i, r in enumerate(res):
            self.assertEqual(r.sum(), 3*(i+1))
            # make sure we didn't add any bad counts
            for pos in [1,2,4]:
                self.assertEqual(r[pos], 0)
        # when we get to end should recapture orig vals
        self.assertEqual(r, orig_vals)

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_special.py000644 000765 000024 00000036744 12024702176 025012 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Unit tests for special functions used in statistics.
"""
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.special import permutations, permutations_exact, \
    ln_permutations, combinations, combinations_exact, \
    ln_combinations, ln_binomial, log_one_minus, one_minus_exp, igami, \
    ndtri, incbi, log1p
import math

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

class SpecialTests(TestCase):
    """Tests miscellaneous functions."""

    def test_permutations(self):
        """permutations should return expected results"""
        self.assertEqual(permutations(1,1), 1)
        self.assertEqual(permutations(2,1), 2)
        self.assertEqual(permutations(3,1), 3)
        self.assertEqual(permutations(4,1), 4)
        self.assertEqual(permutations(4,2), 12)
        self.assertEqual(permutations(4,3), 24)
        self.assertEqual(permutations(4,4), 24)
        self.assertFloatEqual(permutations(300, 100), 3.8807387193009318e+239)

    def test_permutations_errors(self):
        """permutations should raise errors on invalid input"""
        self.assertRaises(IndexError, permutations, 10, 50)
        self.assertRaises(IndexError, permutations, -1, 50)
        self.assertRaises(IndexError, permutations, 10, -5)

    def test_permutations_float(self):
        """permutations should use gamma function when floats as input"""
        self.assertFloatEqual(permutations(1.0,1), 1)
        self.assertFloatEqual(permutations(2,1.0), 2)
        self.assertFloatEqual(permutations(3.0,1.0), 3)
        self.assertFloatEqual(permutations(4.0,1), 4)
        self.assertFloatEqual(permutations(4.0,2.0), 12)
        self.assertFloatEqual(permutations(4.0,3.0), 24)
        self.assertFloatEqual(permutations(4,4.0), 24)
        self.assertFloatEqual(permutations(300, 100), 3.8807387193009318e+239)

    def test_permutations_range(self):
        """permutations should increase gradually with increasing k"""
        start = 5   # permutations(10,5) = 30240
        end = 6     # permutations(10,6) = 151200
        step = 0.1
        lower_lim = 30240
        upper_lim = 151200
        previous_value = 30239.9999
        while start <= end:
            obs = permutations(10, start)
            assert lower_lim <= obs <= upper_lim
            assert obs > previous_value
            previous_value = obs
            start += step

    def test_permutations_exact(self):
        """permutations_exact should return expected results"""
        self.assertFloatEqual(permutations_exact(1,1), 1)
        self.assertFloatEqual(permutations_exact(2,1), 2)
        self.assertFloatEqual(permutations_exact(3,1), 3)
        self.assertFloatEqual(permutations_exact(4,1), 4)
        self.assertFloatEqual(permutations_exact(4,2), 12)
        self.assertFloatEqual(permutations_exact(4,3), 24)
        self.assertFloatEqual(permutations_exact(4,4), 24)
        self.assertFloatEqual(permutations_exact(300,100),\
            3.8807387193009318e239)

    def test_ln_permutations(self):
        """ln_permutations should return expected results"""
        self.assertFloatEqual(ln_permutations(1,1), math.log(1))
        self.assertFloatEqual(ln_permutations(2,1), math.log(2))
        self.assertFloatEqual(ln_permutations(3,1.0), math.log(3))
        self.assertFloatEqual(ln_permutations(4,1), math.log(4))
        self.assertFloatEqual(ln_permutations(4.0,2), math.log(12))
        self.assertFloatEqual(ln_permutations(4,3.0), math.log(24))
        self.assertFloatEqual(ln_permutations(4,4), math.log(24))
        self.assertFloatEqual(ln_permutations(300.0,100),\
            math.log(3.8807387193009318e239))

    def test_combinations(self):
        """combinations should return expected results when int as input"""
        self.assertEqual(combinations(1,1), 1)
        self.assertEqual(combinations(2,1), 2)
        self.assertEqual(combinations(3,1), 3)
        self.assertEqual(combinations(4,1), 4)
        self.assertEqual(combinations(4,2), 6)
        self.assertEqual(combinations(4,3), 4)
        self.assertEqual(combinations(4,4), 1)
        self.assertEqual(combinations(20,4), 19*17*15)
        self.assertFloatEqual(combinations(300, 100), 4.1582514632578812e+81)

    def test_combinations_errors(self):
        """combinations should raise errors on invalid input"""
        self.assertRaises(IndexError, combinations, 10, 50)
        self.assertRaises(IndexError, combinations, -1, 50)
        self.assertRaises(IndexError, combinations, 10, -5)

    def test_combinations_float(self):
        """combinations should use gamma function when floats as input"""
        self.assertFloatEqual(combinations(1.0,1.0), 1)
        self.assertFloatEqual(combinations(2.0,1.0), 2)
        self.assertFloatEqual(combinations(3.0,1.0), 3)
        self.assertFloatEqual(combinations(4.0,1.0), 4)
        self.assertFloatEqual(combinations(4.0,2), 6)
        self.assertFloatEqual(combinations(4,3.0), 4)
        self.assertFloatEqual(combinations(4.0,4.0), 1)
        self.assertFloatEqual(combinations(20.0,4.0), 19*17*15)
        self.assertFloatEqual(combinations(300,100.0), 4.1582514632578812e81)

    def test_combinations_range(self):
        """combinations should decrease gradually with increasing k"""
        start = 5   # combinations(10,5) = 252
        end = 6     # combinations(10,6) = 210
        step = 0.1
        lower_lim = 210
        upper_lim = 252
        previous_value = 252.00001
        while start <= end:
            obs = combinations(10, start)
            assert lower_lim <= obs <= upper_lim
            assert obs < previous_value
            previous_value = obs
            start += step

    def test_combinations_exact(self):
        """combinations_exact should return expected results"""
        self.assertEqual(combinations_exact(1,1), 1)
        self.assertEqual(combinations_exact(2,1), 2)
        self.assertEqual(combinations_exact(3,1), 3)
        self.assertEqual(combinations_exact(4,1), 4)
        self.assertEqual(combinations_exact(4,2), 6)
        self.assertEqual(combinations_exact(4,3), 4)
        self.assertEqual(combinations_exact(4,4), 1)
        self.assertEqual(combinations_exact(20,4), 19*17*15)
        self.assertFloatEqual(combinations_exact(300,100), 4.1582514632578812e81)

    def test_ln_combinations(self):
        """ln_combinations should return expected results"""
        self.assertFloatEqual(ln_combinations(1,1), math.log(1))
        self.assertFloatEqual(ln_combinations(2,1), math.log(2))
        self.assertFloatEqual(ln_combinations(3,1), math.log(3))
        self.assertFloatEqual(ln_combinations(4.0,1), math.log(4))
        self.assertFloatEqual(ln_combinations(4,2.0), math.log(6))
        self.assertFloatEqual(ln_combinations(4,3), math.log(4))
        self.assertFloatEqual(ln_combinations(4,4.0), math.log(1))
        self.assertFloatEqual(ln_combinations(20,4), math.log(19*17*15))
        self.assertFloatEqual(ln_combinations(300,100),\
            math.log(4.1582514632578812e+81))

    def test_ln_binomial_integer(self):
        """ln_binomial should match R results for integer values"""
        expected = {
            (10, 60, 0.1): -3.247883,
            (1, 1, 0.5): math.log(0.5),
            (1, 1, 0.0000001): math.log(1e-07),
            (1, 1, 0.9999999): math.log(0.9999999),
            (3, 5, 0.75): math.log(0.2636719),
            (0, 60, 0.5): math.log(8.673617e-19),
            (129, 130, 0.5): math.log(9.550892e-38),
            (299, 300, 0.099): math.log(1.338965e-298),
            (9, 27, 0.0003): math.log(9.175389e-26),
            (1032, 2050, 0.5): math.log(0.01679804),
        }
        for (key, value) in expected.items():
            self.assertFloatEqualRel(ln_binomial(*key), value, 1e-4)

    def test_ln_binomial_floats(self):
        """ln_binomial with float successes should fall between the R values
        for the adjacent integer successes"""
        expected = {
            (18.3, 100, 0.2): (math.log(0.09089812), math.log(0.09807429)),
            (2.7, 1050, 0.006): (math.log(0.03615498), math.log(0.07623827)),
            (2.7, 1050, 0.06): (math.log(1.365299e-25), math.log(3.044327e-24)),
            (2, 100.5, 0.6): (math.log(7.303533e-37), math.log(1.789727e-36)),
            (0.2, 60, 0.5): (math.log(8.673617e-19), math.log(5.20417e-17)),
            (.5, 5, .3): (math.log(0.16807), math.log(0.36015)),
            (10, 100.5, .5): (math.log(7.578011e-18), math.log(1.365543e-17)),
        }
        for (key, value) in expected.items():
            min_val, max_val = value
            assert min_val < ln_binomial(*key) < max_val
            #self.assertFloatEqualRel(binomial_exact(*key), value, 1e-4)

    def test_ln_binomial_range(self):
        """ln_binomial should increase in a monotonically increasing region."""
        start = 0
        end = 1
        step = 0.1
        lower_lim = -1.783375 - 1e-4
        upper_lim = -1.021235 + 1e-4
        previous_value = -1.784
        while start <= end:
            obs = ln_binomial(start, 5, .3)
            assert lower_lim <= obs <= upper_lim
            assert obs > previous_value
            previous_value = obs
            start += step

    def test_log_one_minus_large(self):
        """log_one_minus_x should return math.log(1-x) if x is large"""
        self.assertFloatEqual(log_one_minus(0.2), math.log(1-0.2))

    def test_log_one_minus_small(self):
        """log_one_minus_x should return -x if x is small"""
        self.assertFloatEqualRel(log_one_minus(1e-30), 1e-30)

    def test_one_minus_exp_large(self):
        """one_minus_exp_x should return 1 - math.exp(x) if x is large"""
        self.assertFloatEqual(one_minus_exp(0.2), 1-(math.exp(0.2)))

    def test_one_minus_exp_small(self):
        """one_minus_exp_x should return -x if x is small"""
        self.assertFloatEqual(one_minus_exp(1e-30), -1e-30)

    def test_log1p(self):
        """log1p should give same results as cephes"""
        p_s = [1e-10, 1e-5, 0.1, 0.8, 0.9, 0.95, 0.999, 0.9999999, 1, \
               1.000000001, 1.01, 2]
        exp = [9.9999999995e-11, 9.99995000033e-06, 0.0953101798043,
               0.587786664902, 0.641853886172, 0.667829372576,
               0.692647055518, 0.69314713056, 0.69314718056, 0.69314718106,
               0.698134722071, 1.09861228867,]
        for p, e in zip(p_s, exp):
            self.assertFloatEqual(log1p(p), e)

    def test_igami(self):
        """igami should give same result as cephes implementation"""
        a_vals = [1e-10, 1e-5, 0.5, 1, 10, 200]
        y_vals = range(0, 10, 2)
        obs = [igami(a, y/10.0) for a in a_vals for y in y_vals]
        exp = [1.79769313486e+308, 0.0, 0.0, 0.0, 0.0,
               1.79769313486e+308, 0.0, 0.0, 0.0, 0.0,
               1.79769313486e+308, 0.821187207575, 0.3541631504,
               0.137497948864, 0.0320923773337,
               1.79769313486e+308, 1.60943791243, 0.916290731874,
               0.510825623766, 0.223143551314,
               1.79769313486e+308, 12.5187528198, 10.4756841889,
               8.9044147366, 7.28921960854,
               1.79769313486e+308, 211.794753362, 203.267574402,
               196.108740945, 188.010915412,]
        for o, e in zip(obs, exp):
            self.assertFloatEqual(o, e)

    def test_ndtri(self):
        """ndtri should give same result as implementation in cephes"""
        exp = [-1.79769313486e+308, -2.32634787404, -2.05374891063,
            -1.88079360815, -1.75068607125, -1.64485362695, -1.5547735946,
            -1.47579102818, -1.40507156031, -1.34075503369, -1.28155156554,
            -1.22652812004, -1.17498679207, -1.12639112904, -1.08031934081,
            -1.03643338949, -0.99445788321, -0.954165253146, -0.915365087843,
            -0.877896295051, -0.841621233573, -0.806421247018, -0.772193214189,
            -0.738846849185, -0.70630256284, -0.674489750196, -0.643345405393,
            -0.612812991017, -0.582841507271, -0.553384719556, -0.524400512708,
            -0.495850347347, -0.467698799115, -0.439913165673, -0.412463129441,
            -0.385320466408, -0.358458793251, -0.331853346437, -0.305480788099,
            -0.279319034447, -0.253347103136, -0.227544976641, -0.201893479142,
            -0.176374164781, -0.150969215497, -0.125661346855, -0.100433720511,
            -0.0752698620998, -0.0501535834647, -0.0250689082587, 0.0,
            0.0250689082587, 0.0501535834647, 0.0752698620998, 0.100433720511,
            0.125661346855, 0.150969215497, 0.176374164781, 0.201893479142,
            0.227544976641, 0.253347103136, 0.279319034447, 0.305480788099,
            0.331853346437, 0.358458793251, 0.385320466408, 0.412463129441,
            0.439913165673, 0.467698799115, 0.495850347347, 0.524400512708,
            0.553384719556, 0.582841507271, 0.612812991017, 0.643345405393,
            0.674489750196, 0.70630256284, 0.738846849185, 0.772193214189,
            0.806421247018, 0.841621233573, 0.877896295051, 0.915365087843,
            0.954165253146, 0.99445788321, 1.03643338949, 1.08031934081,
            1.12639112904, 1.17498679207, 1.22652812004, 1.28155156554,
            1.34075503369, 1.40507156031, 1.47579102818, 1.5547735946,
            1.64485362695, 1.75068607125, 1.88079360815, 2.05374891063,
            2.32634787404,
            ]
        obs = [ndtri(i/100.0) for i in range(100)]
        self.assertFloatEqual(obs, exp)

    def test_incbi(self):
        """incbi results should match cephes libraries"""
        aa_range = [0.1, 0.2, 0.5, 1, 2, 5]
        bb_range = aa_range
        yy_range = [0.1, 0.2, 0.5, 0.9]
        exp = [
            8.86928001193e-08, 9.08146855855e-05, 0.5, 0.999999911307,
            4.39887474012e-09, 4.50443299194e-06,
            0.0416524955556, 0.997881005025,
            3.46456275553e-10, 3.54771169012e-07, 0.00337816430373, 0.732777808689,
            1e-10, 1.024e-07, 0.0009765625, 0.3486784401,
            3.85543289443e-11, 3.94796342545e-08, 0.000376636057552, 0.154915841005,
            1.33210087225e-11, 1.36407136078e-08, 0.000130149552409, 0.056682323296,
            0.00211899497509, 0.0646097657259, 0.958347504444, 0.999999995601,
            0.000247764691908, 0.00788804962659, 0.5, 0.999752235308,
            3.09753032747e-05, 0.000990813218262, 0.092990311753, 0.906714634947,
            1e-05, 0.00032, 0.03125, 0.59049,
            4.01878917904e-06, 0.000128614607219, 0.0126923538971, 0.309157452156,
            1.41593162013e-06, 4.5316442592e-05, 0.00449136140034, 0.122896698096,
            0.267222191311, 0.684264602461, 0.996621835696, 0.999999999654,
            0.0932853650529, 0.321847764104, 0.907009688247, 0.999969024697,
            0.0244717418524, 0.0954915028125, 0.5, 0.975528258148,
            0.01, 0.04, 0.25, 0.81,
            0.00445768188762, 0.0179929616503, 0.120614758428, 0.531877433474,
            0.00165851285512, 0.00672409501831, 0.046687245337, 0.247272226803,
            0.6513215599, 0.8926258176, 0.9990234375, 0.9999999999,
            0.40951, 0.67232, 0.96875, 0.99999,
            0.19, 0.36, 0.75, 0.99,
            0.1, 0.2, 0.5, 0.9,
            0.0513167019495, 0.105572809, 0.292893218813, 0.683772233983,
            0.020851637639, 0.04364750021, 0.129449436704, 0.36904265552,
            0.845084158995, 0.956946913164, 0.999623363942, 0.999999999961,
            0.690842547844, 0.850620771098, 0.987307646103, 0.999995981211,
            0.468122566526, 0.629849697132, 0.879385241572, 0.995542318112,
            0.316227766017, 0.4472135955, 0.707106781187, 0.948683298051,
            0.195800105659, 0.287140725417, 0.5, 0.804199894341,
            0.0925952589131, 0.13988068827, 0.264449983296, 0.510316306551,
            0.943317676704, 0.984896695084, 0.999869850448, 0.999999999987,
            0.877103301904, 0.944441767096, 0.9955086386, 0.999998584068,
            0.752727773197, 0.841546267738, 0.953312754663, 0.998341487145,
            0.63095734448, 0.724779663678, 0.870550563296, 0.979148362361,
            0.489683693449, 0.577552475154, 0.735550016704, 0.907404741087,
            0.300968763593, 0.366086516536, 0.5,
            0.699031236407,
            ]
        i = 0
        for a in aa_range:
            for b in bb_range:
                for y in yy_range:
                    result = incbi(a, b, y)
                    e = exp[i]
                    self.assertFloatEqual(e, result)
                    i += 1
        #specific cases that failed elsewhere
        self.assertFloatEqual(incbi(999,2,1e-10), 0.97399698104554944)

#execute tests if called from command line
if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_test.py000644 000765 000024 00000210751 12024702176 024351 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Unit tests for statistical tests and utility functions.
"""
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.test import tail, G_2_by_2, G_fit, likelihoods,\
    posteriors, bayes_updates, t_paired, t_one_sample, t_two_sample, \
    mc_t_two_sample, _permute_observations, t_one_observation, correlation, \
    correlation_test, correlation_matrix, z_test, z_tailed_prob, \
    t_tailed_prob, sign_test, reverse_tails, ZeroExpectedError, combinations, \
    multiple_comparisons, multiple_inverse, multiple_n, fisher, regress, \
    regress_major, f_value, f_two_sample, calc_contingency_expected, \
    G_fit_from_Dict2D, chi_square_from_Dict2D, MonteCarloP, \
    regress_residuals, safe_sum_p_log_p, G_ind, regress_origin, stdev_from_mean, \
    regress_R2, permute_2d, mantel, mantel_test, _flatten_lower_triangle, \
    pearson, spearman, _get_rank, kendall_correlation, std, median, \
    get_values_from_matrix, get_ltm_cells, distance_matrix_permutation_test, \
    ANOVA_one_way, mw_test, mw_boot, is_symmetric_and_hollow
from numpy import array, concatenate, fill_diagonal, reshape, arange, matrix, \
    ones, testing, tril, cov, sqrt
from cogent.util.dict2d import Dict2D
import math
from cogent.maths.stats.util import Numbers

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2011, The Cogent Project"
__credits__ = ["Rob Knight", "Catherine Lozupone", "Gavin Huttley",
               "Sandra Smit", "Daniel McDonald", "Jai Ram Rideout",
               "Michael Dwan"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

class TestsHelper(TestCase):
    """Class with utility methods useful for other tests."""

    def setUp(self):
        """Sets up variables used in the tests."""
        # How many times a p-value should be tested to fall in a given range
        # before failing the test.
        self.p_val_tests = 10

    def assertCorrectPValue(self, exp_min, exp_max, fn, args=None,
                            kwargs=None, p_val_idx=0):
        """Tests that the stochastic p-value falls in the specified range.

        Performs the test self.p_val_tests times and fails if the observed
        p-value does not fall into the specified range at least once. Each
        p-value is also tested that it falls in the range 0.0 to 1.0.

        This method assumes that fn is callable, and will unpack and pass args
        and kwargs to fn if they are provided. It also assumes that fn returns
        a single value (the p-value to be tested) or a tuple of results (any
        length greater than or equal to 1), with the p-value at position
        p_val_idx.

        This is primarily used for testing the Mantel and correlation_test
        functions.
        """
        found_match = False
        for i in range(self.p_val_tests):
            if args is not None and kwargs is not None:
                obs = fn(*args, **kwargs)
            elif args is not None:
                obs = fn(*args)
            elif kwargs is not None:
                obs = fn(**kwargs)
            else:
                obs = fn()
            try:
                p_val = float(obs)
            except TypeError:
                p_val = obs[p_val_idx]
            self.assertIsProb(p_val)
            if p_val >= exp_min and p_val <= exp_max:
                found_match = True
                break
        self.assertTrue(found_match)

class TestsTests(TestCase):
    """Tests miscellaneous functions."""

    def test_std(self):
        """Should produce a standard deviation of 1.0 for a std normal dist"""
        expected = 1.58113883008
        self.assertFloatEqual(std(array([1,2,3,4,5])), expected)
        expected_a = array([expected, expected, expected, expected, expected])
        a = array([[1,2,3,4,5],[5,1,2,3,4],[4,5,1,2,3],[3,4,5,1,2],
                   [2,3,4,5,1]])
        self.assertFloatEqual(std(a,axis=0), expected_a)
        self.assertFloatEqual(std(a,axis=1), expected_a)
        self.assertRaises(ValueError, std, a, 5)

    def test_std_2d(self):
        """Should produce from 2darray the same stdevs as scipy.stats.std"""
        inp = array([[1,2,3],[4,5,6]])
        exps = ( #tuple(scipy_std(inp, ax) for ax in [None, 0, 1])
            1.8708286933869707,
            array([ 2.12132034, 2.12132034, 2.12132034]),
            array([ 1., 1.]))
        results = tuple(std(inp, ax) for ax in [None, 0, 1])
        for obs, exp in zip(results, exps):
            testing.assert_almost_equal(obs, exp)

    def test_std_3d(self):
        """Should produce from 3darray the same std devs as scipy.stats.std"""
        inp3d = array( #2,2,3
            [[[ 0, 2, 2],
              [ 3, 4, 5]],
             [[ 1, 9, 0],
              [ 9, 10, 1]]])
        exp3d = ( #for axis None, 0, 1, 2: calc from scipy.stats.std
            3.63901418552,
            array([[ 0.70710678, 4.94974747, 1.41421356],
                   [ 4.24264069, 4.24264069, 2.82842712]]),
            array([[ 2.12132034, 1.41421356, 2.12132034],
                   [ 5.65685425, 0.70710678, 0.70710678]]),
            array([[ 1.15470054, 1. ],
                   [ 4.93288286, 4.93288286]]))
        res = tuple(std(inp3d, ax) for ax in [None, 0, 1, 2])
        for obs, exp in zip(res, exp3d):
            testing.assert_almost_equal(obs, exp)

    def test_median(self):
        """median should work similarly to numpy.mean (in terms of axis)"""
        m = array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
        expected = 6.5
        observed = median(m, axis=None)
        self.assertEqual(observed, expected)
        expected = array([5.5, 6.5, 7.5])
        observed = median(m, axis=0)
        self.assertEqual(observed, expected)
        expected = array([2.0, 5.0, 8.0, 11.0])
        observed = median(m, axis=1)
        self.assertEqual(observed, expected)
        self.assertRaises(ValueError, median, m, 10)

    def test_tail(self):
        """tail should return x/2 if test is true; 1-(x/2) otherwise"""
        self.assertFloatEqual(tail(0.25, 'a'=='a'), 0.25/2)
        self.assertFloatEqual(tail(0.25, 'a'!='a'), 1-(0.25/2))

    def test_combinations(self):
        """combinations should return correct binomial coefficient"""
        self.assertFloatEqual(combinations(5,3), 10)
        self.assertFloatEqual(combinations(5,2), 10)
        #only one way to pick no items or the same number of items
        self.assertFloatEqual(combinations(123456789, 0), 1)
        self.assertFloatEqual(combinations(123456789, 123456789), 1)
        #n ways to pick one item
        self.assertFloatEqual(combinations(123456789, 1), 123456789)
        #n(n-1)/2 ways to pick 2 items
        self.assertFloatEqual(combinations(123456789, 2), 123456789*123456788/2)
        #check an arbitrary value in R
        self.assertFloatEqual(combinations(1234567, 12), 2.617073e64)

    def test_multiple_comparisons(self):
        """multiple_comparisons should match values from R"""
        self.assertFloatEqual(multiple_comparisons(1e-7, 10000), 1-0.9990005)
        self.assertFloatEqual(multiple_comparisons(0.05, 10), 0.4012631)
        self.assertFloatEqual(multiple_comparisons(1e-20, 1), 1e-20)
        self.assertFloatEqual(multiple_comparisons(1e-300, 1), 1e-300)
        self.assertFloatEqual(multiple_comparisons(0.95, 3), 0.99987499999999996)
        self.assertFloatEqual(multiple_comparisons(0.75, 100), 0.999999999999679)
        self.assertFloatEqual(multiple_comparisons(0.5, 1000), 1)
        self.assertFloatEqual(multiple_comparisons(0.01, 1000), 0.99995682875259)
        self.assertFloatEqual(multiple_comparisons(0.5, 5), 0.96875)
        self.assertFloatEqual(multiple_comparisons(1e-20, 10), 1e-19)

    def test_multiple_inverse(self):
        """multiple_inverse should invert multiple_comparisons results"""
        #NOTE: multiple_inverse not very accurate close to 1
        self.assertFloatEqual(multiple_inverse(1-0.9990005, 10000), 1e-7)
        self.assertFloatEqual(multiple_inverse(0.4012631, 10), 0.05)
        self.assertFloatEqual(multiple_inverse(1e-20, 1), 1e-20)
        self.assertFloatEqual(multiple_inverse(1e-300, 1), 1e-300)
        self.assertFloatEqual(multiple_inverse(0.96875, 5), 0.5)
        self.assertFloatEqual(multiple_inverse(1e-19, 10), 1e-20)

    def test_multiple_n(self):
        """multiple_n should swap parameters in multiple_comparisons"""
        self.assertFloatEqual(multiple_n(1e-7, 1-0.9990005), 10000)
        self.assertFloatEqual(multiple_n(0.05, 0.4012631), 10)
        self.assertFloatEqual(multiple_n(1e-20, 1e-20), 1)
        self.assertFloatEqual(multiple_n(1e-300, 1e-300), 1)
        self.assertFloatEqual(multiple_n(0.95, 0.99987499999999996), 3)
        self.assertFloatEqual(multiple_n(0.5, 0.96875), 5)
        self.assertFloatEqual(multiple_n(1e-20, 1e-19), 10)

    def test_fisher(self):
        """fisher results should match p 795 Sokal and Rohlf"""
        self.assertFloatEqual(fisher([0.073,0.086,0.10,0.080,0.060]),
                              0.0045957946540917905)

    def test_regress(self):
        """regression slope, intercept should match p 459 Sokal and Rohlf"""
        x = [0, 12, 29.5, 43, 53, 62.5, 75.5, 85, 93]
        y = [8.98, 8.14, 6.67, 6.08, 5.90, 5.83, 4.68, 4.20, 3.72]
        self.assertFloatEqual(regress(x, y), (-0.05322, 8.7038), 0.001)
        #higher precision from OpenOffice
        self.assertFloatEqual(regress(x, y), (-0.05322215, 8.70402730))
        #add test to confirm no overflow error with large numbers
        x = [32119, 33831]
        y = [2.28, 2.43]
        exp = (8.761682243E-05, -5.341209112E-01)
        self.assertFloatEqual(regress(x, y), exp, 0.001)

    def test_regress_origin(self):
        """regression slope constrained through origin should match Excel"""
        x = array([1,2,3,4])
        y = array([4,2,6,8])
        self.assertFloatEqual(regress_origin(x, y), (1.9333333, 0))
        #add test to confirm no overflow error with large numbers
        x = [32119, 33831]
        y = [2.28, 2.43]
        exp = (7.1428649481939822e-05, 0)
        self.assertFloatEqual(regress_origin(x, y), exp, 0.001)

    def test_regress_R2(self):
        """regress_R2 returns the R^2 value of a regression"""
        x = [1.0, 2.0, 3.0, 4.0, 5.0]
        y = [2.1, 4.2, 5.9, 8.4, 9.6]
        result = regress_R2(x, y)
        self.assertFloatEqual(result, 0.99171419347896)

    def test_regress_residuals(self):
        """regress_residuals reports error for points in linear regression"""
        x = [1.0, 2.0, 3.0, 4.0, 5.0]
        y = [2.1, 4.2, 5.9, 8.4, 9.6]
        result = regress_residuals(x, y)
        self.assertFloatEqual(result, [-0.1, 0.08, -0.14, 0.44, -0.28])

    def test_stdev_from_mean(self):
        """stdev_from_mean returns num std devs from mean for each val in x"""
        x = [2.1, 4.2, 5.9, 8.4, 9.6]
        result = stdev_from_mean(x)
        self.assertFloatEqual(result, [-1.292463399014413,
            -0.60358696806764478, -0.045925095396451399,
            0.77416589382589174, 1.1678095686526162])

    def test_regress_major(self):
        """major axis regression should match p 589 Sokal and Rohlf"""
        #Note that the Sokal and Rohlf example flips the axes, such that the
        #equation is for explaining x in terms of y, not y in terms of x.
        #Behavior here is the reverse, for easy comparison with regress.
y = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210] x = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90, 1.41, 15.81, 4.19, 15.39, 17.25, 9.52] self.assertFloatEqual(regress_major(x, y), (18.93633,-32.55208)) def test_sign_test(self): """sign_test, should match values from R""" v = [("two sided", 26, 50, 0.88772482734078251), ("less", 26, 50, 0.6641), ("l", 10, 50, 1.193066583837777e-05), ("hi", 30, 50, 0.1013193755322703), ("h", 0, 50, 1.0), ("2", 30, 50, 0.20263875106454063), ("h", 49, 50, 4.5297099404706387e-14), ("h", 50, 50, 8.8817841970012543e-16) ] for alt, success, trials, p in v: result = sign_test(success, trials, alt=alt) self.assertFloatEqual(result, p, eps=1e-5) def test_permute_2d(self): """permute_2d permutes rows and cols of a matrix.""" a = reshape(arange(9), (3,3)) self.assertEqual(permute_2d(a, [0,1,2]), a) self.assertEqual(permute_2d(a, [2,1,0]), \ array([[8,7,6],[5,4,3],[2,1,0]])) self.assertEqual(permute_2d(a, [1,2,0]), \ array([[4,5,3],[7,8,6],[1,2,0]])) class GTests(TestCase): """Tests implementation of the G tests for fit and independence.""" def test_G_2_by_2_2tailed_equal(self): """G_2_by_2 should return 0 if all cell counts are equal""" self.assertFloatEqual(0, G_2_by_2(1, 1, 1, 1, False, False)[0]) self.assertFloatEqual(0, G_2_by_2(100, 100, 100, 100, False, False)[0]) self.assertFloatEqual(0, G_2_by_2(100, 100, 100, 100, True, False)[0]) def test_G_2_by_2_bad_data(self): """G_2_by_2 should raise ValueError if any counts are negative""" self.assertRaises(ValueError, G_2_by_2, 1, -1, 1, 1) def test_G_2_by_2_2tailed_examples(self): """G_2_by_2 values should match examples in Sokal & Rohlf""" #example from p 731, Sokal and Rohlf (1995) #without correction self.assertFloatEqual(G_2_by_2(12, 22, 16, 50, False, False)[0], 1.33249, 0.0001) self.assertFloatEqual(G_2_by_2(12, 22, 16, 50, False, False)[1], 0.24836, 0.0001) #with correction self.assertFloatEqual(G_2_by_2(12, 22, 16, 50, True, False)[0], 1.30277, 0.0001) 
        self.assertFloatEqual(G_2_by_2(12, 22, 16, 50, True, False)[1],
                              0.25371, 0.0001)

    def test_G_2_by_2_1tailed_examples(self):
        """G_2_by_2 values should match values from codon_binding program"""
        #first up...the famous arginine case
        self.assertFloatEqualAbs(G_2_by_2(36, 16, 38, 106), (29.111609, 0),
                                 0.00001)
        #then some other miscellaneous positive and negative values
        self.assertFloatEqualAbs(G_2_by_2(0, 52, 12, 132),
                                 (-7.259930, 0.996474), 0.00001)
        self.assertFloatEqualAbs(G_2_by_2(5, 47, 14, 130),
                                 (-0.000481, 0.508751), 0.00001)
        self.assertFloatEqualAbs(G_2_by_2(5, 47, 36, 108),
                                 (-6.065167, 0.993106), 0.00001)

    def test_calc_contingency_expected(self):
        """calcContingencyExpected returns new matrix with expected freqs"""
        matrix = Dict2D({'rest_of_tree': {'env1': 2, 'env3': 1, 'env2': 0},
                         'b': {'env1': 1, 'env3': 1, 'env2': 3}})
        result = calc_contingency_expected(matrix)
        self.assertFloatEqual(result['rest_of_tree']['env1'], [2, 1.125])
        self.assertFloatEqual(result['rest_of_tree']['env3'], [1, 0.75])
        self.assertFloatEqual(result['rest_of_tree']['env2'], [0, 1.125])
        self.assertFloatEqual(result['b']['env1'], [1, 1.875])
        self.assertFloatEqual(result['b']['env3'], [1, 1.25])
        self.assertFloatEqual(result['b']['env2'], [3, 1.875])

    def test_Gfit_unequal_lists(self):
        """Gfit should raise errors if lists unequal"""
        #lists must be equal
        self.assertRaises(ValueError, G_fit, [1, 2, 3], [1, 2])

    def test_Gfit_negative_observeds(self):
        """Gfit should raise ValueError if any observeds are negative."""
        self.assertRaises(ValueError, G_fit, [-1, 2, 3], [1, 2, 3])

    def test_Gfit_nonpositive_expecteds(self):
        """Gfit should raise ZeroExpectedError if expecteds are zero/negative"""
        self.assertRaises(ZeroExpectedError, G_fit, [1, 2, 3], [0, 1, 2])
        self.assertRaises(ZeroExpectedError, G_fit, [1, 2, 3], [-1, 1, 2])

    def test_Gfit_good_data(self):
        """Gfit tests for fit should match examples in Sokal and Rohlf"""
        #example from p. 699, Sokal and Rohlf (1995)
        obs = [63, 31, 28, 12, 39, 16, 40, 12]
        exp = [67.78125, 22.59375, 22.59375, 7.53125, 45.18750, 15.06250,
               45.18750, 15.06250]
        #without correction
        self.assertFloatEqualAbs(G_fit(obs, exp, False)[0], 8.82397, 0.00002)
        self.assertFloatEqualAbs(G_fit(obs, exp, False)[1], 0.26554, 0.00002)
        #with correction
        self.assertFloatEqualAbs(G_fit(obs, exp)[0], 8.76938, 0.00002)
        self.assertFloatEqualAbs(G_fit(obs, exp)[1], 0.26964, 0.00002)
        #example from p. 700, Sokal and Rohlf (1995)
        obs = [130, 46]
        exp = [132, 44]
        #without correction
        self.assertFloatEqualAbs(G_fit(obs, exp, False)[0], 0.12002, 0.00002)
        self.assertFloatEqualAbs(G_fit(obs, exp, False)[1], 0.72901, 0.00002)
        #with correction
        self.assertFloatEqualAbs(G_fit(obs, exp)[0], 0.11968, 0.00002)
        self.assertFloatEqualAbs(G_fit(obs, exp)[1], 0.72938, 0.00002)

    def test_safe_sum_p_log_p(self):
        """safe_sum_p_log_p should ignore zero elements, not raise error"""
        m = array([2, 4, 0, 8])
        self.assertEqual(safe_sum_p_log_p(m, 2), 2*1 + 4*2 + 8*3)

    def test_G_ind(self):
        """G test for independence should match Sokal and Rohlf p 738 values"""
        a = array([[29, 11], [273, 191], [8, 31], [64, 64]])
        self.assertFloatEqual(G_ind(a)[0], 28.59642)
        self.assertFloatEqual(G_ind(a, True)[0], 28.31244)

    def test_G_fit_from_Dict2D(self):
        """G_fit_from_Dict2D runs G-fit on data in a Dict2D"""
        matrix = Dict2D({'Marl': {'val': [2, 5.2]},
                         'Chalk': {'val': [10, 5.2]},
                         'Sandstone': {'val': [8, 5.2]},
                         'Clay': {'val': [2, 5.2]},
                         'Limestone': {'val': [4, 5.2]}})
        g_val, prob = G_fit_from_Dict2D(matrix)
        self.assertFloatEqual(g_val, 9.84923)
        self.assertFloatEqual(prob, 0.04304536)

    def test_chi_square_from_Dict2D(self):
        """chi_square_from_Dict2D calcs a Chi-Square and p value from Dict2D"""
        #test1
        obs_matrix = Dict2D({'rest_of_tree': {'env1': 2, 'env3': 1, 'env2': 0},
                             'b': {'env1': 1, 'env3': 1, 'env2': 3}})
        input_matrix = calc_contingency_expected(obs_matrix)
        test, csp = chi_square_from_Dict2D(input_matrix)
        self.assertFloatEqual(test, 3.0222222222222221)
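The expected frequencies paired with each observed count in the contingency tests above follow the usual rule for a two-way table: each cell's expectation is its row total times its column total, divided by the grand total. A minimal standalone sketch (the helper name `expected_freqs` is hypothetical, not part of cogent):

```python
def expected_freqs(table):
    """table: dict mapping row label -> dict mapping col label -> count."""
    rows = sorted(table)
    cols = sorted(table[rows[0]])
    row_tot = dict((r, sum(table[r][c] for c in cols)) for r in rows)
    col_tot = dict((c, sum(table[r][c] for r in rows)) for c in cols)
    grand = float(sum(row_tot.values()))
    # expected cell count = row_total * col_total / grand_total
    return dict((r, dict((c, row_tot[r] * col_tot[c] / grand) for c in cols))
                for r in rows)

obs = {'rest_of_tree': {'env1': 2, 'env3': 1, 'env2': 0},
       'b': {'env1': 1, 'env3': 1, 'env2': 3}}
exp = expected_freqs(obs)
# reproduces the expected values asserted above, e.g. 1.125 for
# ('rest_of_tree', 'env1') and 1.875 for ('b', 'env2')
```

With these expectations in hand, the chi-square statistic is just the sum of (observed - expected)**2 / expected over all cells.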
        #test2
        test_matrix_2 = Dict2D({'Marl': {'val': [2, 5.2]},
                                'Chalk': {'val': [10, 5.2]},
                                'Sandstone': {'val': [8, 5.2]},
                                'Clay': {'val': [2, 5.2]},
                                'Limestone': {'val': [4, 5.2]}})
        test2, csp2 = chi_square_from_Dict2D(test_matrix_2)
        self.assertFloatEqual(test2, 10.1538461538)
        self.assertFloatEqual(csp2, 0.0379143890013)
        #test3
        matrix3_obs = Dict2D({'AIDS': {'Males': 4, 'Females': 2, 'Both': 3},
                              'No_AIDS': {'Males': 3, 'Females': 16,
                                          'Both': 2}})
        matrix3 = calc_contingency_expected(matrix3_obs)
        test3, csp3 = chi_square_from_Dict2D(matrix3)
        self.assertFloatEqual(test3, 7.6568405139833722)
        self.assertFloatEqual(csp3, 0.0217439383468)


class LikelihoodTests(TestCase):
    """Tests implementations of likelihood calculations."""

    def test_likelihoods_unequal_list_lengths(self):
        """likelihoods should raise ValueError if input lists unequal length"""
        self.assertRaises(ValueError, likelihoods, [1, 2], [1])

    def test_likelihoods_equal_priors(self):
        """likelihoods should equal Pr(D|H) if priors the same"""
        equal = [0.25, 0.25, 0.25, 0.25]
        unequal = [0.5, 0.25, 0.125, 0.125]
        equal_answer = [1, 1, 1, 1]
        unequal_answer = [2, 1, 0.5, 0.5]
        for obs, exp in zip(likelihoods(equal, equal), equal_answer):
            self.assertFloatEqual(obs, exp)
        for obs, exp in zip(likelihoods(unequal, equal), unequal_answer):
            self.assertFloatEqual(obs, exp)

    def test_likelihoods_equal_evidence(self):
        """likelihoods should return vector of 1's if evidence equal for all"""
        equal = [0.25, 0.25, 0.25, 0.25]
        unequal = [0.5, 0.25, 0.125, 0.125]
        equal_answer = [1, 1, 1, 1]
        unequal_answer = [2, 1, 0.5, 0.5]
        not_unity = [0.7, 0.7, 0.7, 0.7]
        for obs, exp in zip(likelihoods(equal, unequal), equal_answer):
            self.assertFloatEqual(obs, exp)
        #should be the same if evidences don't sum to 1
        for obs, exp in zip(likelihoods(not_unity, unequal), equal_answer):
            self.assertFloatEqual(obs, exp)

    def test_likelihoods_unequal_evidence(self):
        """likelihoods should update based on weighted sum if evidence unequal"""
        not_unity = [1, 0.5, 0.25, 0.25]
        unequal = [0.5, 0.25, 0.125, 0.125]
        products = [1.4545455, 0.7272727, 0.3636364, 0.3636364]
        #if priors and evidence both unequal, likelihoods should change
        #(calculated using StarCalc)
        for obs, exp in zip(likelihoods(not_unity, unequal), products):
            self.assertFloatEqual(obs, exp)

    def test_posteriors_unequal_lists(self):
        """posteriors should raise ValueError if input lists unequal lengths"""
        self.assertRaises(ValueError, posteriors, [1, 2, 3], [1])

    def test_posteriors_good_data(self):
        """posteriors should return products of paired list elements"""
        first = [0, 0.25, 0.5, 1, 0.25]
        second = [0.25, 0.5, 0, 0.1, 1]
        product = [0, 0.125, 0, 0.1, 0.25]
        for obs, exp in zip(posteriors(first, second), product):
            self.assertFloatEqual(obs, exp)


class BayesUpdateTests(TestCase):
    """Tests implementation of Bayes calculations"""

    def setUp(self):
        first = [0.25, 0.25, 0.25]
        second = [0.1, 0.75, 0.3]
        third = [0.95, 1e-10, 0.2]
        fourth = [0.01, 0.9, 0.1]
        bad = [1, 2, 1, 1, 1]
        self.bad = [first, bad, second, third]
        self.test = [first, second, third, fourth]
        self.permuted = [fourth, first, third, second]
        self.deleted = [second, fourth, third]
        self.extra = [first, second, first, third, first, fourth, first]
        #BEWARE: low precision in second item, so need to adjust threshold
        #for assertFloatEqual accordingly (and use assertFloatEqualAbs).
        self.result = [0.136690646154, 0.000000009712, 0.863309344133]

    def test_bayes_updates_bad_data(self):
        """bayes_updates should raise ValueError on unequal-length lists"""
        self.assertRaises(ValueError, bayes_updates, self.bad)

    def test_bayes_updates_good_data(self):
        """bayes_updates should match hand calculations of probability updates"""
        #result for first -> fourth calculated by hand
        for obs, exp in zip(bayes_updates(self.test), self.result):
            self.assertFloatEqualAbs(obs, exp, 1e-11)

    def test_bayes_updates_permuted(self):
        """bayes_updates should not be affected by order of inputs"""
        for obs, exp in zip(bayes_updates(self.permuted), self.result):
            self.assertFloatEqualAbs(obs, exp, 1e-11)

    def test_bayes_update_nondiscriminating(self):
        """bayes_updates should be unaffected by extra nondiscriminating data"""
        #deletion of non-discriminating evidence should not affect result
        for obs, exp in zip(bayes_updates(self.deleted), self.result):
            self.assertFloatEqualAbs(obs, exp, 1e-11)
        #additional non-discriminating evidence should not affect result
        for obs, exp in zip(bayes_updates(self.extra), self.result):
            self.assertFloatEqualAbs(obs, exp, 1e-11)


class StatTests(TestsHelper):
    """Tests that the t and z tests are implemented correctly"""

    def setUp(self):
        super(StatTests, self).setUp()
        self.x = [7.33, 7.49, 7.27, 7.93, 7.56, 7.81, 7.46, 6.94, 7.49, 7.44,
                  7.95, 7.47, 7.04, 7.10, 7.64]
        self.y = [7.53, 7.70, 7.46, 8.21, 7.81, 8.01, 7.72, 7.13, 7.68, 7.66,
                  8.11, 7.66, 7.20, 7.25, 7.79]

    def test_t_paired_2tailed(self):
        """t_paired should match values from Sokal & Rohlf p 353"""
        x, y = self.x, self.y
        #check value of t and the probability for 2-tailed
        self.assertFloatEqual(t_paired(y, x)[0], 19.7203, 1e-4)
        self.assertFloatEqual(t_paired(y, x)[1], 1.301439e-11, 1e-4)

    def test_t_paired_no_variance(self):
        """t_paired should return None if lists are invariant"""
        x = [1, 1, 1]
        y = [0, 0, 0]
        self.assertEqual(t_paired(x, x), (None, None))
        self.assertEqual(t_paired(x, y), (None, None))

    def test_t_paired_1tailed(self):
        """t_paired should match pre-calculated 1-tailed values"""
        x, y = self.x, self.y
        #check probability for 1-tailed low and high
        self.assertFloatEqual(t_paired(y, x, "low")[1],
                              1 - (1.301439e-11 / 2), 1e-4)
        self.assertFloatEqual(t_paired(x, y, "high")[1],
                              1 - (1.301439e-11 / 2), 1e-4)
        self.assertFloatEqual(t_paired(y, x, "high")[1], 1.301439e-11 / 2,
                              1e-4)
        self.assertFloatEqual(t_paired(x, y, "low")[1], 1.301439e-11 / 2,
                              1e-4)

    def test_t_paired_specific_difference(self):
        """t_paired should allow a specific difference to be passed"""
        x, y = self.x, self.y
        #difference is 0.2, so test should be non-significant if 0.2 passed
        self.failIf(t_paired(y, x, exp_diff=0.2)[0] > 1e-10)
        #same, except that reversing list order reverses sign of difference
        self.failIf(t_paired(x, y, exp_diff=-0.2)[0] > 1e-10)
        #check that there's no significant difference from the true mean
        self.assertFloatEqual(t_paired(y, x, exp_diff=0.2)[1], 1, 1e-4)

    def test_t_paired_bad_data(self):
        """t_paired should raise ValueError on lists of different lengths"""
        self.assertRaises(ValueError, t_paired, self.y, [1, 2, 3])

    def test_t_two_sample(self):
        """t_two_sample should match example on p.225 of Sokal and Rohlf"""
        I = array([7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5])
        II = array([8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2])
        self.assertFloatEqual(t_two_sample(I, II), (-0.1184, 0.45385 * 2),
                              0.001)

    def test_t_two_sample_no_variance(self):
        """t_two_sample should return None if lists are invariant"""
        x = array([1, 1, 1])
        y = array([0, 0, 0])
        self.assertEqual(t_two_sample(x, x), (None, None))
        self.assertEqual(t_two_sample(x, y), (None, None))

    def test_t_one_sample(self):
        """t_one_sample results should match those from R"""
        x = array(range(-5, 5))
        y = array(range(-1, 10))
        self.assertFloatEqualAbs(t_one_sample(x), (-0.5222, 0.6141), 1e-4)
        self.assertFloatEqualAbs(t_one_sample(y), (4, 0.002518), 1e-4)
        #do some one-tailed tests as well
        self.assertFloatEqualAbs(t_one_sample(y, tails='low'), (4, 0.9987),
                                 1e-4)
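The one-tailed assertions above all rely on the same conversion rule: when the observed statistic falls in the predicted tail, the one-tailed p-value is half the two-tailed p; otherwise it is one minus that half. A minimal sketch (the helper name `one_tailed_p` is hypothetical, not part of cogent):

```python
def one_tailed_p(two_tailed_p, stat_in_predicted_tail):
    """Convert a two-tailed p-value to a one-tailed p-value."""
    half = two_tailed_p / 2.0
    # in the predicted tail -> p/2; in the opposite tail -> 1 - p/2
    return half if stat_in_predicted_tail else 1.0 - half

p2 = 1.301439e-11  # two-tailed p from t_paired(y, x) above
# t_paired(y, x, "high") corresponds to one_tailed_p(p2, True),
# t_paired(y, x, "low") to one_tailed_p(p2, False)
```

This is exactly the relationship the `tail` helper checked further below encodes as `prob/2` versus `1 - (prob/2)`.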
        self.assertFloatEqualAbs(t_one_sample(y, tails='high'),
                                 (4, 0.001259), 1e-4)

    def test_t_two_sample_switch(self):
        """t_two_sample should call t_one_observation if 1 item in sample."""
        sample = array([4.02, 3.88, 3.34, 3.87, 3.18])
        x = array([3.02])
        self.assertFloatEqual(t_two_sample(x, sample),
                              (-1.5637254, 0.1929248))
        self.assertFloatEqual(t_two_sample(sample, x),
                              (-1.5637254, 0.1929248))
        #can't do the test if both samples have single item
        self.assertEqual(t_two_sample(x, x), (None, None))

    def test_t_one_observation(self):
        """t_one_observation should match p. 228 of Sokal and Rohlf"""
        sample = array([4.02, 3.88, 3.34, 3.87, 3.18])
        x = 3.02
        #note that this differs after the 3rd decimal place from what's in the
        #book, because Sokal and Rohlf round their intermediate steps...
        self.assertFloatEqual(t_one_observation(x, sample),
                              (-1.5637254, 0.1929248))

    def test_mc_t_two_sample(self):
        """Test gives correct results with valid input data."""
        # Verified against R's t.test() and perm.t.test().

        # With numpy array as input.
        exp = (-0.11858541225631833, 0.90756579317867436)
        I = array([7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5])
        II = array([8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2])
        obs = mc_t_two_sample(I, II)
        self.assertFloatEqual(obs[:2], exp)
        self.assertEqual(len(obs[2]), 999)
        self.assertCorrectPValue(0.8, 0.9, mc_t_two_sample, [I, II],
                                 p_val_idx=3)

        # With python list as input.
        exp = (-0.11858541225631833, 0.90756579317867436)
        I = [7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5]
        II = [8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2]
        obs = mc_t_two_sample(I, II)
        self.assertFloatEqual(obs[:2], exp)
        self.assertEqual(len(obs[2]), 999)
        self.assertCorrectPValue(0.8, 0.9, mc_t_two_sample, [I, II],
                                 p_val_idx=3)

        exp = (-0.11858541225631833, 0.45378289658933718)
        obs = mc_t_two_sample(I, II, tails='low')
        self.assertFloatEqual(obs[:2], exp)
        self.assertEqual(len(obs[2]), 999)
        self.assertCorrectPValue(0.4, 0.47, mc_t_two_sample, [I, II],
                                 {'tails': 'low'}, p_val_idx=3)

        exp = (-0.11858541225631833, 0.54621710341066287)
        obs = mc_t_two_sample(I, II, tails='high', permutations=99)
        self.assertFloatEqual(obs[:2], exp)
        self.assertEqual(len(obs[2]), 99)
        self.assertCorrectPValue(0.4, 0.62, mc_t_two_sample, [I, II],
                                 {'tails': 'high', 'permutations': 99},
                                 p_val_idx=3)

        exp = (-2.8855783649036986, 0.99315596652421401)
        obs = mc_t_two_sample(I, II, tails='high', permutations=99,
                              exp_diff=1)
        self.assertFloatEqual(obs[:2], exp)
        self.assertEqual(len(obs[2]), 99)
        self.assertCorrectPValue(0.55, 0.99, mc_t_two_sample, [I, II],
                                 {'tails': 'high', 'permutations': 99,
                                  'exp_diff': 1}, p_val_idx=3)

    def test_mc_t_two_sample_unbalanced_obs(self):
        """Test gives correct results with unequal number of obs per sample."""
        # Verified against R's t.test() and perm.t.test().
        exp = (-0.10302479888889175, 0.91979753020527177)
        I = array([7.2, 7.1, 9.1, 7.2, 7.3, 7.2])
        II = array([8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2])
        obs = mc_t_two_sample(I, II)
        self.assertFloatEqual(obs[:2], exp)
        self.assertEqual(len(obs[2]), 999)
        self.assertCorrectPValue(0.8, 0.9, mc_t_two_sample, [I, II],
                                 p_val_idx=3)

    def test_mc_t_two_sample_single_obs_sample(self):
        """Test works correctly with one sample having a single observation."""
        sample = array([4.02, 3.88, 3.34, 3.87, 3.18])
        x = array([3.02])
        exp = (-1.5637254, 0.1929248)
        obs = mc_t_two_sample(x, sample)
        self.assertFloatEqual(obs[:2], exp)
        self.assertFloatEqual(len(obs[2]), 999)
        self.assertIsProb(obs[3])
        obs = mc_t_two_sample(sample, x)
        self.assertFloatEqual(obs[:2], exp)
        self.assertFloatEqual(len(obs[2]), 999)
        self.assertIsProb(obs[3])

    def test_mc_t_two_sample_no_perms(self):
        """Test gives empty permutation results if no perms are given."""
        exp = (-0.11858541225631833, 0.90756579317867436, [], None)
        I = array([7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5])
        II = array([8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2])
        obs = mc_t_two_sample(I, II, permutations=0)
        self.assertFloatEqual(obs, exp)

    def test_mc_t_two_sample_no_mc(self):
        """Test no MC stats if initial t-test is bad."""
        x = array([1, 1, 1])
        y = array([0, 0, 0])
        self.assertEqual(mc_t_two_sample(x, x), (None, None, [], None))
        self.assertEqual(mc_t_two_sample(x, y), (None, None, [], None))

    def test_mc_t_two_sample_invalid_input(self):
        """Test fails on various invalid input."""
        self.assertRaises(ValueError, mc_t_two_sample, [1, 2, 3],
                          [4., 5., 4.], tails='foo')
        self.assertRaises(ValueError, mc_t_two_sample, [1, 2, 3],
                          [4., 5., 4.], permutations=-1)
        self.assertRaises(ValueError, mc_t_two_sample, [1], [4.])
        self.assertRaises(ValueError, mc_t_two_sample, [1, 2], [])

    def test_permute_observations(self):
        """Test works correctly on small input dataset."""
        I = [10, 20., 1]
        II = [2, 4, 5, 7]
        obs = _permute_observations(I, II, 1)
        self.assertEqual(len(obs[0]), 1)
        self.assertEqual(len(obs[1]), 1)
        self.assertEqual(len(obs[0][0]), len(I))
        self.assertEqual(len(obs[1][0]), len(II))
        self.assertFloatEqual(sorted(concatenate((obs[0][0], obs[1][0]))),
                              sorted(I + II))

    def test_reverse_tails(self):
        """reverse_tails should return 'high' if tails was 'low' or vice versa"""
        self.assertEqual(reverse_tails('high'), 'low')
        self.assertEqual(reverse_tails('low'), 'high')
        self.assertEqual(reverse_tails(None), None)
        self.assertEqual(reverse_tails(3), 3)

    def test_tail(self):
        """tail should return prob/2 if test is true, or 1-(prob/2) if false"""
        self.assertFloatEqual(tail(0.25, True), 0.125)
        self.assertFloatEqual(tail(0.25, False), 0.875)
        self.assertFloatEqual(tail(1, True), 0.5)
        self.assertFloatEqual(tail(1, False), 0.5)
        self.assertFloatEqual(tail(0, True), 0)
        self.assertFloatEqual(tail(0, False), 1)

    def test_z_test(self):
        """z_test should give correct values"""
        sample = array([1, 2, 3, 4, 5])
        self.assertFloatEqual(z_test(sample, 3, 1), (0, 1))
        self.assertFloatEqual(z_test(sample, 3, 2, 'high'), (0, 0.5))
        self.assertFloatEqual(z_test(sample, 3, 2, 'low'), (0, 0.5))
        #check that population mean and variance, and tails, can be set OK.
        self.assertFloatEqual(z_test(sample, 0, 1),
                              (6.7082039324993694, 1.9703444711798951e-11))
        self.assertFloatEqual(z_test(sample, 1, 10),
                              (0.44721359549995793, 0.65472084601857694))
        self.assertFloatEqual(z_test(sample, 1, 10, 'high'),
                              (0.44721359549995793,
                               0.65472084601857694 / 2))
        self.assertFloatEqual(z_test(sample, 1, 10, 'low'),
                              (0.44721359549995793,
                               1 - (0.65472084601857694 / 2)))


class CorrelationTests(TestsHelper):
    """Tests of correlation coefficients and Mantel test."""

    def setUp(self):
        """Sets up variables used in the tests."""
        super(CorrelationTests, self).setUp()
        # For testing spearman and correlation_test using method='spearman'.
        # Taken from the Spearman wikipedia article. Also used for testing
        # Pearson (verified with R).
        self.data1 = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
        self.data2 = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
        # For testing spearman.
        self.a = [1, 2, 4, 3, 1, 6, 7, 8, 10, 4]
        self.b = [2, 10, 20, 1, 3, 7, 5, 11, 6, 13]
        self.c = [7, 1, 20, 13, 3, 57, 5, 121, 2, 9]
        self.r = (1.7, 10, 20, 1.7, 3, 7, 5, 11, 6.5, 13)
        self.x = (1, 2, 4, 3, 1, 6, 7, 8, 10, 4, 100, 2, 3, 77)
        # Ranked copies for testing spearman.
        self.b_ranked = [2, 7, 10, 1, 3, 6, 4, 8, 5, 9]
        self.c_ranked = [5, 1, 8, 7, 3, 9, 4, 10, 2, 6]

    def test_mantel(self):
        """mantel should be significant for same matrix, not for random"""
        a = reshape(arange(25), (5, 5))
        a = tril(a) + tril(a).T
        fill_diagonal(a, 0)
        b = a.copy()
        #closely related -- should be significant
        self.assertCorrectPValue(0.0, 0.049, mantel, (a, b, 1000))
        c = reshape(ones(25), (5, 5))
        c[0, 1] = 3.0
        c[1, 0] = 3.0
        fill_diagonal(c, 0)
        #not related -- should not be significant
        self.assertCorrectPValue(0.06, 1.0, mantel, (a, c, 1000))

    def test_mantel_test_one_sided_greater(self):
        """Test one-sided mantel test (greater)."""
        # This test output was verified by R (their mantel function does a
        # one-sided greater test).
        m1 = array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
        m2 = array([[0, 2, 7], [2, 0, 6], [7, 6, 0]])
        p, stat, perms = mantel_test(m1, m1, 999, alt='greater')
        self.assertFloatEqual(stat, 1.0)
        self.assertEqual(len(perms), 999)
        self.assertCorrectPValue(0.09, 0.25, mantel_test, (m1, m1, 999),
                                 {'alt': 'greater'})
        p, stat, perms = mantel_test(m1, m2, 999, alt='greater')
        self.assertFloatEqual(stat, 0.755928946018)
        self.assertEqual(len(perms), 999)
        self.assertCorrectPValue(0.2, 0.5, mantel_test, (m1, m2, 999),
                                 {'alt': 'greater'})

    def test_mantel_test_one_sided_less(self):
        """Test one-sided mantel test (less)."""
        # This test output was verified by R (their mantel function does a
        # one-sided greater test, but I modified their output to do a
        # one-sided less test).
        m1 = array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
        m2 = array([[0, 2, 7], [2, 0, 6], [7, 6, 0]])
        m3 = array([[0, 0.5, 0.25], [0.5, 0, 0.1], [0.25, 0.1, 0]])
        p, stat, perms = mantel_test(m1, m1, 999, alt='less')
        self.assertFloatEqual(p, 1.0)
        self.assertFloatEqual(stat, 1.0)
        self.assertEqual(len(perms), 999)
        p, stat, perms = mantel_test(m1, m2, 999, alt='less')
        self.assertFloatEqual(stat, 0.755928946018)
        self.assertEqual(len(perms), 999)
        self.assertCorrectPValue(0.6, 1.0, mantel_test, (m1, m2, 999),
                                 {'alt': 'less'})
        p, stat, perms = mantel_test(m1, m3, 999, alt='less')
        self.assertFloatEqual(stat, -0.989743318611)
        self.assertEqual(len(perms), 999)
        self.assertCorrectPValue(0.1, 0.25, mantel_test, (m1, m3, 999),
                                 {'alt': 'less'})

    def test_mantel_test_two_sided(self):
        """Test two-sided mantel test."""
        # This test output was verified by R (their mantel function does a
        # one-sided greater test, but I modified their output to do a
        # two-sided test).
        m1 = array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
        m2 = array([[0, 2, 7], [2, 0, 6], [7, 6, 0]])
        m3 = array([[0, 0.5, 0.25], [0.5, 0, 0.1], [0.25, 0.1, 0]])
        p, stat, perms = mantel_test(m1, m1, 999, alt='two sided')
        self.assertFloatEqual(stat, 1.0)
        self.assertEqual(len(perms), 999)
        self.assertCorrectPValue(0.20, 0.45, mantel_test, (m1, m1, 999),
                                 {'alt': 'two sided'})
        p, stat, perms = mantel_test(m1, m2, 999, alt='two sided')
        self.assertFloatEqual(stat, 0.755928946018)
        self.assertEqual(len(perms), 999)
        self.assertCorrectPValue(0.6, 0.75, mantel_test, (m1, m2, 999),
                                 {'alt': 'two sided'})
        p, stat, perms = mantel_test(m1, m3, 999, alt='two sided')
        self.assertFloatEqual(stat, -0.989743318611)
        self.assertEqual(len(perms), 999)
        self.assertCorrectPValue(0.2, 0.45, mantel_test, (m1, m3, 999),
                                 {'alt': 'two sided'})

    def test_mantel_test_invalid_distance_matrix(self):
        """Test mantel test with invalid distance matrix."""
        # Single asymmetric, non-hollow distance matrix.
        self.assertRaises(ValueError, mantel_test, array([[1, 2], [3, 4]]),
                          array([[0, 0], [0, 0]]), 999)
        # Two asymmetric distance matrices.
        self.assertRaises(ValueError, mantel_test, array([[0, 2], [3, 0]]),
                          array([[0, 1], [0, 0]]), 999)

    def test_mantel_test_invalid_input(self):
        """Test mantel test with invalid input."""
        self.assertRaises(ValueError, mantel_test, array([[1]]),
                          array([[1]]), 999, alt='foo')
        self.assertRaises(ValueError, mantel_test, array([[1]]),
                          array([[1, 2], [3, 4]]), 999)
        self.assertRaises(ValueError, mantel_test, array([[1]]),
                          array([[1]]), 0)
        self.assertRaises(ValueError, mantel_test, array([[1]]),
                          array([[1]]), -1)

    def test_is_symmetric_and_hollow(self):
        """Should correctly test for symmetry and hollowness of dist mats."""
        self.assertTrue(is_symmetric_and_hollow(array([[0, 1], [1, 0]])))
        self.assertTrue(is_symmetric_and_hollow(matrix([[0, 1], [1, 0]])))
        self.assertTrue(is_symmetric_and_hollow(matrix([[0.0, 0],
                                                        [0.0, 0]])))
        self.assertTrue(not is_symmetric_and_hollow(
            array([[0.001, 1], [1, 0]])))
        self.assertTrue(not is_symmetric_and_hollow(
            array([[0, 1.1], [1, 0]])))
        self.assertTrue(not is_symmetric_and_hollow(
            array([[0.5, 1.1], [1, 0]])))

    def test_flatten_lower_triangle(self):
        """Test flattening various dms' lower triangulars."""
        self.assertEqual(_flatten_lower_triangle(array([[8]])), [])
        self.assertEqual(_flatten_lower_triangle(array([[1, 2], [3, 4]])),
                         [3])
        self.assertEqual(_flatten_lower_triangle(array([[1, 2, 3],
                                                        [4, 5, 6],
                                                        [7, 8, 9]])),
                         [4, 7, 8])

    def test_pearson(self):
        """Test pearson correlation method on valid data."""
        # This test output was verified by R.
        self.assertFloatEqual(pearson([1, 2], [1, 2]), 1.0)
        self.assertFloatEqual(pearson([1, 2, 3], [1, 2, 3]), 1.0)
        self.assertFloatEqual(pearson([1, 2, 3], [1, 2, 4]), 0.9819805)

    def test_pearson_invalid_input(self):
        """Test running pearson on bad input."""
        self.assertRaises(ValueError, pearson, [1.4, 2.5], [5.6, 8.8, 9.0])
        self.assertRaises(ValueError, pearson, [1.4], [5.6])

    def test_spearman(self):
        """Test the spearman function with valid input."""
        # One vector has no ties.
        exp = 0.3719581
        obs = spearman(self.a, self.b)
        self.assertFloatEqual(obs, exp)
        # Both vectors have no ties.
        exp = 0.2969697
        obs = spearman(self.b, self.c)
        self.assertFloatEqual(obs, exp)
        # Both vectors have ties.
        exp = 0.388381
        obs = spearman(self.a, self.r)
        self.assertFloatEqual(obs, exp)
        exp = -0.17575757575757578
        obs = spearman(self.data1, self.data2)
        self.assertFloatEqual(obs, exp)

    def test_spearman_no_variation(self):
        """Test the spearman function with a vector having no variation."""
        exp = 0.0
        obs = spearman([1, 1, 1], [1, 2, 3])
        self.assertFloatEqual(obs, exp)

    def test_spearman_ranked(self):
        """Test the spearman function with a vector that is already ranked."""
        exp = 0.2969697
        obs = spearman(self.b_ranked, self.c_ranked)
        self.assertFloatEqual(obs, exp)

    def test_spearman_one_obs(self):
        """Test running spearman on a single observation."""
        self.assertRaises(ValueError, spearman, [1.0], [5.0])

    def test_spearman_invalid_input(self):
        """Test the spearman function with invalid input."""
        self.assertRaises(ValueError, spearman, [], [])
        self.assertRaises(ValueError, spearman, self.a, [])
        self.assertRaises(TypeError, spearman, {0: 2}, [1, 2, 3])

    def test_get_rank(self):
        """Test the _get_rank function with valid input."""
        exp = ([1.5, 3.5, 7.5, 5.5, 1.5, 9.0, 10.0, 11.0, 12.0, 7.5, 14.0,
                3.5, 5.5, 13.0], 4)
        obs = _get_rank(self.x)
        self.assertFloatEqual(exp, obs)
        exp = ([1.5, 3.0, 5.5, 4.0, 1.5, 7.0, 8.0, 9.0, 10.0, 5.5], 2)
        obs = _get_rank(self.a)
        self.assertFloatEqual(exp, obs)
        exp = ([2, 7, 10, 1, 3, 6, 4, 8, 5, 9], 0)
        obs = _get_rank(self.b)
        self.assertFloatEqual(exp, obs)
        exp = ([1.5, 7.0, 10.0, 1.5, 3.0, 6.0, 4.0, 8.0, 5.0, 9.0], 1)
        obs = _get_rank(self.r)
        self.assertFloatEqual(exp, obs)
        exp = ([], 0)
        obs = _get_rank([])
        self.assertEqual(exp, obs)

    def test_get_rank_invalid_input(self):
        """Test the _get_rank function with invalid input."""
        vec = [1, 'a', 3, 2.5, 3, 1]
        self.assertRaises(TypeError, _get_rank, vec)
        vec = [1, 2, {1: 2}, 2.5, 3, 1]
        self.assertRaises(TypeError, _get_rank, vec)
        vec = [1, 2, [23, 1], 2.5, 3, 1]
        self.assertRaises(TypeError, _get_rank, vec)
        vec = [1, 2, (1,), 2.5, 3, 1]
        self.assertRaises(TypeError, _get_rank, vec)

    def test_correlation(self):
        """Correlations and significance should match R's cor.test()"""
        x = [1, 2, 3, 5]
        y = [0, 0, 0, 0]
        z = [1, 1, 1, 1]
        a = [2, 4, 6, 8]
        b = [1.5, 1.4, 1.2, 1.1]
        c = [15, 10, 5, 20]
        bad = [1, 2, 3]
        #originally gave r = 1.0000000002
        self.assertFloatEqual(correlation(x, x), (1, 0))
        self.assertFloatEqual(correlation(x, y), (0, 1))
        self.assertFloatEqual(correlation(y, z), (0, 1))
        self.assertFloatEqualAbs(correlation(x, a), (0.9827076, 0.01729),
                                 1e-5)
        self.assertFloatEqualAbs(correlation(x, b), (-0.9621405, 0.03786),
                                 1e-5)
        self.assertFloatEqualAbs(correlation(x, c), (0.3779645, 0.622), 1e-3)
        self.assertEqual(correlation(bad, bad), (1, 0))

    def test_correlation_test_pearson(self):
        """Test correlation_test using pearson on valid input."""
        # These results were verified with R.

        # Test with non-default confidence level and permutations.
        obs = correlation_test(self.data1, self.data2, method='pearson',
                               confidence_level=0.90, permutations=990)
        self.assertFloatEqual(obs[:2], (-0.03760147, 0.91786297277172868))
        self.assertEqual(len(obs[2]), 990)
        for r in obs[2]:
            self.assertTrue(r >= -1.0 and r <= 1.0)
        self.assertCorrectPValue(0.9, 0.93, correlation_test,
                                 (self.data1, self.data2),
                                 {'method': 'pearson',
                                  'confidence_level': 0.90,
                                  'permutations': 990}, p_val_idx=3)
        self.assertFloatEqual(obs[4], (-0.5779077, 0.5256224))

        # Test with non-default tail type.
        obs = correlation_test(self.data1, self.data2, method='pearson',
                               confidence_level=0.90, permutations=990,
                               tails='low')
        self.assertFloatEqual(obs[:2], (-0.03760147, 0.45893148638586434))
        self.assertEqual(len(obs[2]), 990)
        for r in obs[2]:
            self.assertTrue(r >= -1.0 and r <= 1.0)
        self.assertCorrectPValue(0.41, 0.46, correlation_test,
                                 (self.data1, self.data2),
                                 {'method': 'pearson',
                                  'confidence_level': 0.90,
                                  'permutations': 990, 'tails': 'low'},
                                 p_val_idx=3)
        self.assertFloatEqual(obs[4], (-0.5779077, 0.5256224))

    def test_correlation_test_spearman(self):
        """Test correlation_test using spearman on valid input."""
        # This example taken from Wikipedia page:
        # http://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient
        obs = correlation_test(self.data1, self.data2, method='spearman',
                               tails='high')
        self.assertFloatEqual(obs[:2], (-0.17575757575757578,
                                        0.686405827612))
        self.assertEqual(len(obs[2]), 999)
        for rho in obs[2]:
            self.assertTrue(rho >= -1.0 and rho <= 1.0)
        self.assertCorrectPValue(0.67, 0.7, correlation_test,
                                 (self.data1, self.data2),
                                 {'method': 'spearman', 'tails': 'high'},
                                 p_val_idx=3)
        self.assertFloatEqual(obs[4],
                              (-0.7251388558041697, 0.51034422964834503))

        # The p-value is off because the example uses a one-tailed test,
        # while we use a two-tailed test. Someone confirms the answer that we
        # get here for a two-tailed test:
        # http://stats.stackexchange.com/questions/22816/calculating-p-value-
        # for-spearmans-rank-correlation-coefficient-example-on-wikip
        obs = correlation_test(self.data1, self.data2, method='spearman',
                               tails=None)
        self.assertFloatEqual(obs[:2], (-0.17575757575757578,
                                        0.62718834477648433))
        self.assertEqual(len(obs[2]), 999)
        for rho in obs[2]:
            self.assertTrue(rho >= -1.0 and rho <= 1.0)
        self.assertCorrectPValue(0.60, 0.64, correlation_test,
                                 (self.data1, self.data2),
                                 {'method': 'spearman', 'tails': None},
                                 p_val_idx=3)
        self.assertFloatEqual(obs[4],
                              (-0.7251388558041697, 0.51034422964834503))

    def test_correlation_test_invalid_input(self):
        """Test correlation_test using invalid input."""
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, method='foo')
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, tails='foo')
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, permutations=-1)
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, confidence_level=-1)
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, confidence_level=1.1)
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, confidence_level=0)
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, confidence_level=0.0)
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, confidence_level=1)
        self.assertRaises(ValueError, correlation_test, self.data1,
                          self.data2, confidence_level=1.0)

    def test_correlation_test_no_permutations(self):
        """Test correlation_test with no permutations."""
        # These results were verified with R.
        exp = (-0.2581988897471611, 0.7418011102528389, [], None,
               (-0.97687328610475876, 0.93488023560400879))
        obs = correlation_test([1, 2, 3, 4], [1, 2, 1, 1], permutations=0)
        self.assertFloatEqual(obs, exp)

    def test_correlation_test_perfect_correlation(self):
        """Test correlation_test with perfectly-correlated input vectors."""
        # These results were verified with R.
        obs = correlation_test([1, 2, 3, 4], [1, 2, 3, 4])
        self.assertFloatEqual(obs[:2],
                              (0.99999999999999978, 2.2204460492503131e-16))
        self.assertEqual(len(obs[2]), 999)
        for r in obs[2]:
            self.assertTrue(r >= -1.0 and r <= 1.0)
        self.assertCorrectPValue(0.06, 0.09, correlation_test,
                                 ([1, 2, 3, 4], [1, 2, 3, 4]), p_val_idx=3)
        self.assertFloatEqual(obs[4], (0.99999999999998879, 1.0))

    def test_correlation_test_small_obs(self):
        """Test correlation_test with a small number of observations."""
        # These results were verified with R.
        obs = correlation_test([1, 2, 3], [1, 2, 3])
        self.assertFloatEqual(obs[:2], (1.0, 0))
        self.assertEqual(len(obs[2]), 999)
        for r in obs[2]:
            self.assertTrue(r >= -1.0 and r <= 1.0)
        self.assertCorrectPValue(0.3, 0.4, correlation_test,
                                 ([1, 2, 3], [1, 2, 3]), p_val_idx=3)
        self.assertFloatEqual(obs[4], (None, None))
        obs = correlation_test([1, 2, 3], [1, 2, 3], method='spearman')
        self.assertFloatEqual(obs[:2], (1.0, 0))
        self.assertEqual(len(obs[2]), 999)
        for r in obs[2]:
            self.assertTrue(r >= -1.0 and r <= 1.0)
        self.assertCorrectPValue(0.3, 0.4, correlation_test,
                                 ([1, 2, 3], [1, 2, 3]),
                                 {'method': 'spearman'}, p_val_idx=3)
        self.assertFloatEqual(obs[4], (None, None))

    def test_correlation_matrix(self):
        """Correlations in matrix should match values from R"""
        a = [2, 4, 6, 8]
        b = [1.5, 1.4, 1.2, 1.1]
        c = [15, 10, 5, 20]
        m = correlation_matrix([a, b, c])
        self.assertFloatEqual(m[0, 0], [1.0])
        self.assertFloatEqual([m[1, 0], m[1, 1]],
                              [correlation(b, a)[0], 1.0])
        self.assertFloatEqual(m[2], [correlation(c, a)[0],
                                     correlation(c, b)[0], 1.0])


class Ftest(TestCase):
    """Tests for the F test"""

    def test_f_value(self):
        """f_value: should calculate the correct F value if possible"""
        a = array([1, 3, 5, 7, 9, 8, 6, 4, 2])
        b = array([5, 4, 6, 3, 7, 6, 4, 5])
        self.assertEqual(f_value(a, b), (8, 7, 4.375))
        self.assertFloatEqual(f_value(b, a), (7, 8, 0.2285714))
        too_short = array([4])
        self.assertRaises(ValueError, f_value, too_short, b)

    def test_f_two_sample(self):
        """f_two_sample should match values from R"""
        # The expected values in this test are obtained through R.
        # In R the F test is var.test(x,y); different alternative hypotheses
        # can be specified (two sided, less, or greater).
        # The vectors are random samples from a particular normal
        # distribution (mean and sd specified).

        # a: 50 elem, mean=0 sd=1
        a = [-0.70701689, -1.24788845, -1.65516470,  0.10443876, -0.48526915,
             -0.71820656, -1.02603596,  0.03975982, -2.23404324, -0.21509363,
              0.08438468, -0.01970062, -0.67907971, -0.89853667,  1.11137131,
              0.05960496, -1.51172084, -0.79733957, -1.60040659,  0.80530639,
             -0.81715836, -0.69233474,  0.95750665,  0.99576429, -1.61340216,
             -0.43572590, -1.50862327,  0.92847551, -0.68382338, -1.12523522,
             -0.09147488,  0.66756023, -0.87277588, -1.36539039, -0.11748707,
             -1.63632578, -0.31343078, -0.28176086,  0.33854483, -0.51785630,
              2.25360559, -0.80761191,  1.18983499,  0.57080342, -1.44601700,
             -0.53906955, -0.01975266, -1.37147915, -0.31537616,  0.26877544]

        # b: 50 elem, mean=0, sd=1.2
        b = [ 0.081418743,  0.276571612, -1.864316504,  0.675213612,
             -0.769202643,  0.140372825, -1.426250184,  0.058617884,
             -0.819287409, -0.007701916, -0.782722020, -0.285891593,
              0.661980419,  0.383225191,  0.622444946, -0.192446150,
              0.297150571,  0.408896059, -0.167359383, -0.552381362,
              0.982168338,  1.439730446,  1.967616101, -0.579607307,
              1.095590943,  0.240591302, -1.566937143, -0.199091349,
             -1.232983905,  0.362378169,  1.166061081, -0.604676222,
             -0.536560206, -0.303117595,  1.519222792, -0.319146503,
              2.206220810, -0.566351124, -0.720397392, -0.452001377,
              0.250890097,  0.320685395, -1.014632725, -3.010346273,
             -1.703955054,  0.592587381, -1.237451255,  0.172243366,
             -0.452641122, -0.982148581]

        # c: 60 elem, mean=5, sd=1
        c = [4.654329, 5.242129, 6.272640, 5.781779, 4.391241, 3.800752,
             4.559463, 4.318922, 3.243020, 5.121280, 4.126385, 5.541131,
             4.777480, 5.646913, 6.972584, 3.817172, 6.128700, 4.731467,
             6.762068, 5.082983, 5.298511, 5.491125, 4.532369, 4.265552,
             5.697317, 5.509730, 2.935704, 4.507456, 3.786794, 5.548383,
             3.674487, 5.536556, 5.297847, 2.439642, 4.759836, 5.114649,
             5.986774, 4.517485, 4.579208, 4.579374, 2.502890, 5.190955,
             5.983194, 6.766645, 4.905079, 4.214273, 3.950364, 6.262393,
             8.122084, 6.330007, 4.767943, 5.194029, 3.503136, 6.039079,
             4.485647, 6.116235, 6.302268, 3.596693, 5.743316, 6.860152]

        # d: 30 elem, mean=0, sd=0.05
        d = [ 0.104517366,  0.023039678,  0.005579091,  0.052928250,
              0.020724823, -0.060823243, -0.019000890, -0.064133996,
             -0.016321594, -0.008898334, -0.027626992, -0.051946186,
              0.085269587, -0.031190678,  0.065172938, -0.054628573,
              0.019257306, -0.032427056, -0.058767356,  0.030927400,
              0.052247357, -0.042954937,  0.031842104,  0.094130522,
             -0.024828465,  0.011320453, -0.016195062,  0.015631245,
             -0.050335598, -0.031658335]

        a, b, c, d = map(array, [a, b, c, d])
        self.assertEqual(map(len, [a, b, c, d]), [50, 50, 60, 30])

        # allowed error; this big because results from R are rounded
        # at 4 decimals
        error = 1e-4

        self.assertFloatEqual(f_two_sample(a, a), (49, 49, 1, 1), eps=error)
        self.assertFloatEqual(f_two_sample(a, b), (49, 49, 0.8575, 0.5925),
                              eps=error)
        self.assertFloatEqual(f_two_sample(b, a), (49, 49, 1.1662, 0.5925),
                              eps=error)
        self.assertFloatEqual(f_two_sample(a, b, tails='low'),
                              (49, 49, 0.8575, 0.2963), eps=error)
        self.assertFloatEqual(f_two_sample(a, b, tails='high'),
                              (49, 49, 0.8575, 0.7037), eps=error)
        self.assertFloatEqual(f_two_sample(a, c), (49, 59, 0.6587, 0.1345),
                              eps=error)
        # p value very small, so first check df's and F value
        self.assertFloatEqualAbs(f_two_sample(d, a, tails='low')[0:3],
                                 (29, 49, 0.0028), eps=error)
        assert f_two_sample(d, a, tails='low')[3] < 2.2e-16  # p value

    def test_MonteCarloP(self):
        """MonteCarloP calcs a p-value from a val and list of random vals"""
        val = 3.0
        random_vals = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

        # test for "high" tail (larger values than expected by chance)
        p_val = MonteCarloP(val, random_vals, 'high')
        self.assertEqual(p_val, 0.7)

        # test for "low" tail (smaller values than expected by chance)
        p_val = MonteCarloP(val, random_vals, 'low')
        self.assertEqual(p_val, 0.4)


class MannWhitneyTests(TestCase):
    """check accuracy of Mann-Whitney implementation"""
    x = map(int, "104 109 112 114 116 118 118 119 121 123 125 126"
                 " 126 128 128 128".split())
    y = map(int, "100 105 107 107 108 111 116 120 121 123".split())

    def test_mw_test(self):
        """mann-whitney test results should match Sokal & Rohlf"""
        U, p = mw_test(self.x, self.y)
        self.assertFloatEqual(U, 123.5)
        self.assertTrue(0.02 <= p <= 0.05)

    def test_mw_boot(self):
        """exercising the Monte Carlo variant of mann-whitney"""
        U, p = mw_boot(self.x, self.y, 10)
        self.assertFloatEqual(U, 123.5)
        self.assertTrue(0 <= p <= 0.5)


class KendallTests(TestCase):
    """check accuracy of Kendall tests against values from R"""

    def do_test(self, x, y, alt_expecteds):
        """conducts the tests for each alternate hypothesis against expecteds"""
        for alt, exp_p, exp_tau in alt_expecteds:
            tau, p_val = kendall_correlation(x, y, alt=alt, warn=False)
            self.assertFloatEqual(tau, exp_tau, eps=1e-3)
            self.assertFloatEqual(p_val, exp_p, eps=1e-3)

    def test_exact_calcs(self):
        """calculations of exact probabilities should match R"""
        x = (44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
        y = (2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
        expecteds = [["gt", 0.05972, 0.4444444],
                     ["lt", 0.9624, 0.4444444],
                     ["ts", 0.1194, 0.4444444]]
        self.do_test(x, y, expecteds)

    def test_with_ties(self):
        """tied values calculated from normal approx"""
        # R example with ties in x
        x = (44.4, 45.9, 41.9, 53.3, 44.4, 44.1, 50.7, 45.2, 60.1)
        y = (2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
        expecteds = [  # ["gt", 0.05793, 0.4225771],
                     ["lt", 0.942, 0.4225771],
                     ["ts", 0.1159, 0.4225771]]
        self.do_test(x, y, expecteds)

        # R example with ties in y
        x = (44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
        y = (2.6, 3.1, 2.5, 5.0, 3.1, 4.0, 5.2, 2.8, 3.8)
        expecteds = [["gt", 0.03737, 0.4789207],
                     ["lt", 0.9626, 0.4789207],
                     ["ts", 0.07474, 0.4789207]]
        self.do_test(x, y, expecteds)

        # R example with ties in x and y
        x = (44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 44.4, 60.1)
        y = (2.6, 3.6, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
        expecteds = [["gt", 0.02891, 0.5142857],
                     ["lt", 0.971, 0.5142857],
                     ["ts", 0.05782, 0.5142857]]
        self.do_test(x, y, expecteds)

    def test_bigger_vectors(self):
        """kendall_correlation should handle bigger vectors"""
        # q < expansion
        x = (0.118583104633, 0.227860069338, 0.143856130991, 0.935362617582,
             0.0471303856799, 0.659819202174, 0.739247965907, 0.268929000278,
             0.848250568194, 0.307764819102, 0.733949480141, 0.271662210481,
             0.155903098872)
        y = (0.749762144455, 0.407571703468, 0.934176427266, 0.188638794706,
             0.184844781493, 0.391485553856, 0.735504815302, 0.363655952442,
             0.18489971978, 0.851075466765, 0.139932273818, 0.333675110224,
             0.570250937033)
        expecteds = [["gt", 0.9183, -0.2820513],
                     ["lt", 0.1022, -0.2820513],
                     ["ts", 0.2044, -0.2820513]]
        self.do_test(x, y, expecteds)

        # q > expansion
        x = (0.2602556958, 0.441506392849, 0.930624643531, 0.728461775775,
             0.234341774892, 0.725677256368, 0.354788882728, 0.475882541956,
             0.347533553428, 0.608578046857, 0.144697962102, 0.784502692164,
             0.872607603407)
        y = (0.753056395718, 0.454332072011, 0.791882395707, 0.622853579015,
             0.127030232518, 0.232086215578, 0.586604349918, 0.0139051260749,
             0.579079370051, 0.0550643809812, 0.94798878249, 0.318410679439,
             0.86725134615)
        expecteds = [["gt", 0.4762, 0.02564103],
                     ["lt", 0.5711, 0.02564103],
                     ["ts", 0.9524, 0.02564103]]
        self.do_test(x, y, expecteds)


class TestDistMatrixPermutationTest(TestCase):
    """Tests of distance_matrix_permutation_test"""

    def setUp(self):
        """sets up variables for testing"""
        self.matrix = array([[1, 2, 3, 4], [5, 6, 7, 8],
                             [9, 10, 11, 12], [13, 14, 15, 16]])
        self.cells = [(0, 1), (1, 3)]
        self.cells2 = [(0, 2), (2, 3)]

    def test_get_ltm_cells(self):
        """get_ltm_cells converts indices to be below the diagonal"""
        cells = [(0, 0), (0, 1), (0, 2),
                 (1, 0), (1, 1), (1, 2),
                 (2, 0), (2, 1), (2, 2)]
        result = get_ltm_cells(cells)
        self.assertEqual(result, [(2, 0), (1, 0), (2, 1)])
        cells = [(0, 1), (0, 2)]
        result = get_ltm_cells(cells)
        self.assertEqual(result, [(2, 0), (1, 0)])

    def test_get_values_from_matrix(self):
        """get_values_from_matrix returns special and other values from matrix"""
        matrix = self.matrix
        cells = [(1, 0), (0, 1), (2, 0), (2, 1)]

        # test that it works for a symmetric matrix
        cells_sym = get_ltm_cells(cells)
        special_vals, other_vals = get_values_from_matrix(matrix, cells_sym,
                cells2=None, is_symmetric=True)
        special_vals.sort()
        other_vals.sort()
        self.assertEqual(special_vals, [5, 9, 10])
        self.assertEqual(other_vals, [13, 14, 15])

        # test that it works for a non-symmetric matrix
        special_vals, other_vals = get_values_from_matrix(matrix, cells,
                cells2=None, is_symmetric=False)
        special_vals.sort()
        other_vals.sort()
        self.assertEqual(special_vals, [2, 5, 9, 10])
        self.assertEqual(other_vals, [1, 3, 4, 6, 7, 8, 11, 12, 13, 14, 15, 16])

        # test that it works on a symmetric matrix when cells2 is defined
        cells2 = [(3, 0), (3, 2), (0, 3)]
        cells2_sym = get_ltm_cells(cells2)
        special_vals, other_vals = get_values_from_matrix(matrix, cells_sym,
                cells2=cells2_sym, is_symmetric=True)
        special_vals.sort()
        other_vals.sort()
        self.assertEqual(special_vals, [5, 9, 10])
        self.assertEqual(other_vals, [13, 15])

        # test that it works when cells2 is defined and not symmetric
        special_vals, other_vals = get_values_from_matrix(matrix, cells,
                cells2=cells2, is_symmetric=False)
        special_vals.sort()
        other_vals.sort()
        self.assertEqual(special_vals, [2, 5, 9, 10])
        self.assertEqual(other_vals, [4, 13, 15])

    def test_distance_matrix_permutation_test_non_symmetric(self):
        """evaluate empirical p-values for a non-symmetric matrix

        To test the empirical p-values, we look at a simple 3x3 matrix
        b/c it is easy to see what t score every permutation will
        generate -- there's only 6 permutations. Running dist_matrix_test
        with n=1000, we expect that each permutation will show up 160
        times, so we know how many times to expect to see more extreme
        t scores. We therefore know what the empirical p-values will be.
        (n=1000 was chosen empirically -- smaller values seem to lead to
        much more frequent random failures.)
        """
        def make_result_list(*args, **kwargs):
            return [distance_matrix_permutation_test(*args, **kwargs)[2]
                    for i in range(10)]

        m = arange(9).reshape((3, 3))
        n = 100

        # looks at each possible permutation n times --
        # compare first row to rest
        r = make_result_list(m, [(0, 0), (0, 1), (0, 2)], n=n,
                             is_symmetric=False)
        self.assertSimilarMeans(r, 0./6.)
        r = make_result_list(m, [(0, 0), (0, 1), (0, 2)], n=n,
                             is_symmetric=False, tails='high')
        self.assertSimilarMeans(r, 4./6.)
        r = make_result_list(m, [(0, 0), (0, 1), (0, 2)], n=n,
                             is_symmetric=False, tails='low')
        self.assertSimilarMeans(r, 0./6.)

        # looks at each possible permutation n times --
        # compare last row to rest
        r = make_result_list(m, [(2, 0), (2, 1), (2, 2)], n=n,
                             is_symmetric=False)
        self.assertSimilarMeans(r, 0./6.)
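        # Illustrative sketch (not part of the original suite, added for
        # clarity): the empirical p-value logic described in the docstring
        # above reduces to counting how many permuted statistics are at
        # least as extreme as the observed one, conceptually:
        #
        #     def empirical_p(observed, permuted_scores):
        #         hits = sum(1 for s in permuted_scores if s >= observed)
        #         return hits / float(len(permuted_scores))
        #
        # so with 6 distinct permutations of a 3x3 matrix, the attainable
        # p-values are multiples of 1/6, which is what the assertions
        # below check against.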
        r = make_result_list(m, [(2, 0), (2, 1), (2, 2)], n=n,
                             is_symmetric=False, tails='high')
        self.assertSimilarMeans(r, 0./6.)
        r = make_result_list(m, [(2, 0), (2, 1), (2, 2)], n=n,
                             is_symmetric=False, tails='low')
        self.assertSimilarMeans(r, 4./6.)

    def test_distance_matrix_permutation_test_symmetric(self):
        """evaluate empirical p-values for symmetric matrix

        See test_distance_matrix_permutation_test_non_symmetric
        doc string for a description of how this test works.
        """
        def make_result_list(*args, **kwargs):
            return [distance_matrix_permutation_test(*args)[2]
                    for i in range(10)]

        m = array([[0, 1, 3], [1, 2, 4], [3, 4, 5]])
        n = 100

        # looks at each possible permutation n times --
        # compare first row to rest
        r = make_result_list(m, [(0, 0), (0, 1), (0, 2)], n=n)
        self.assertSimilarMeans(r, 0./6.)
        r = make_result_list(m, [(0, 0), (0, 1), (0, 2)], n=n, tails='high')
        self.assertSimilarMeans(r, 0.77281447417149496, 0)
        r = make_result_list(m, [(0, 0), (0, 1), (0, 2)], n=n, tails='low')
        self.assertSimilarMeans(r, 4./6.)

        ## The following lines are not part of the test code, but are useful
        ## in figuring out what t-scores all of the permutations will yield.
        #permutes = [[0, 1, 2], [0, 2, 1], [1, 0, 2],
        #            [1, 2, 0], [2, 0, 1], [2, 1, 0]]
        #results = []
        #for p in permutes:
        #    p_m = permute_2d(m, p)
        #    results.append(t_two_sample(
        #        [p_m[0, 1], p_m[0, 2]], [p_m[2, 1]], tails='high'))
        #print results

    def test_distance_matrix_permutation_test_alt_stat(self):
        """distance_matrix_permutation_test should accept an alternate statistic"""
        def fake_stat_test(a, b, tails=None):
            return 42., 42.
        m = array([[0, 1, 3], [1, 2, 4], [3, 4, 5]])
        self.assertEqual(distance_matrix_permutation_test(m,
            [(0, 0), (0, 1), (0, 2)], n=5, f=fake_stat_test), (42., 42., 0.))

    def test_distance_matrix_permutation_test_return_scores(self):
        """return_scores=True functions as expected"""
        # use alt statistical test to make results simple
        def fake_stat_test(a, b, tails=None):
            return 42., 42.
        m = array([[0, 1, 3], [1, 2, 4], [3, 4, 5]])
        self.assertEqual(distance_matrix_permutation_test(
            m, [(0, 0), (0, 1), (0, 2)],
            n=5, f=fake_stat_test, return_scores=True),
            (42., 42., 0., [42.] * 5))

    def test_ANOVA_one_way(self):
        """ANOVA one way returns same values as ANOVA on a stats package"""
        g1 = Numbers([10.0, 11.0, 10.0, 5.0, 6.0])
        g2 = Numbers([1.0, 2.0, 3.0, 4.0, 1.0, 2.0])
        g3 = Numbers([6.0, 7.0, 5.0, 6.0, 7.0])
        i = [g1, g2, g3]
        dfn, dfd, F, between_MS, within_MS, group_means, prob = \
            ANOVA_one_way(i)
        self.assertEqual(dfn, 2)
        self.assertEqual(dfd, 13)
        self.assertFloatEqual(F, 18.565450643776831)
        self.assertFloatEqual(between_MS, 55.458333333333343)
        self.assertFloatEqual(within_MS, 2.9871794871794868)
        self.assertFloatEqual(group_means,
            [8.4000000000000004, 2.1666666666666665, 6.2000000000000002])
        self.assertFloatEqual(prob, 0.00015486238993089464)


#execute tests if called from command line
if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_maths/test_stats/test_util.py

#!/usr/bin/env python
"""Tests Numbers and Freqs objects, and their Unsafe versions.
"""
from math import sqrt
import numpy

from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.util import SummaryStatistics, SummaryStatisticsError,\
    Numbers, UnsafeNumbers, Freqs, UnsafeFreqs, NumberFreqs, \
    UnsafeNumberFreqs
from cogent.util.misc import ConstraintError
from operator import add, sub, mul

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"


class SummaryStatisticsTests(TestCase):
    """Tests of summary stats functions."""

    def test_init(self):
        """SummaryStatistics should initialize correctly."""
        # check empty init -- can access private vars, but can't get
        # properties.
        s = SummaryStatistics()
        self.assertEqual(s._count, None)
        self.assertRaises(SummaryStatisticsError, getattr, s, 'Count')
        # check init with one positional parameter
        s = SummaryStatistics(1)
        self.assertEqual(s.Count, 1)
        # check init with all positional parameters.
        # note that inconsistent data can sneak in (c.f. sd vs var)
        s = SummaryStatistics(1, 2, 3, 4, 5, 6)
        self.assertEqual(s.Count, 1)
        self.assertEqual(s.Sum, 2)
        self.assertEqual(s.Mean, 3)
        self.assertEqual(s.StandardDeviation, 4)
        self.assertEqual(s.Variance, 5)
        self.assertEqual(s.SumSquares, 6)

    def test_str(self):
        """SummaryStatistics str should print known fields."""
        s = SummaryStatistics()
        self.assertEqual(str(s), '')
        # note that additional fields will fill in if they can be calculated.
        s = SummaryStatistics(Mean=3, StandardDeviation=2)
        # now expect to print as table
        self.assertEqual(str(s), '==========================\n Statistic Value\n--------------------------\n Mean 3\nStandardDeviation 2\n Variance 4\n--------------------------')

    def test_Count(self):
        """SummaryStatistics Count should work if Count or Sum and Mean ok"""
        s = SummaryStatistics(Count=3)
        self.assertEqual(s.Count, 3)
        s = SummaryStatistics(Sum=10, Mean=5)
        self.assertEqual(s.Count, 2)
        # if inconsistent, believes Count
        s = SummaryStatistics(Count=3, Sum=2, Mean=5)
        self.assertEqual(s.Count, 3)
        # doesn't work with just sum or mean
        s = SummaryStatistics(Mean=3)
        self.assertRaises(SummaryStatisticsError, getattr, s, 'Count')

    def test_Sum(self):
        """SummaryStatistics Sum should work if Sum or Count and Mean ok"""
        s = SummaryStatistics(Sum=3)
        self.assertEqual(s.Sum, 3)
        s = SummaryStatistics(Count=3, Mean=5)
        self.assertEqual(s.Sum, 15)

    def test_Mean(self):
        """SummaryStatistics Mean should work if Mean or Count and Sum ok"""
        s = SummaryStatistics(Mean=3)
        self.assertEqual(s.Mean, 3)
        s = SummaryStatistics(Count=3, Sum=15)
        self.assertEqual(s.Mean, 5)

    def test_StandardDeviation(self):
        """SummaryStatistics StandardDeviation should work if it or variance ok"""
        s = SummaryStatistics(StandardDeviation=3)
        self.assertEqual(s.StandardDeviation, 3)
        self.assertEqual(s.Variance, 9)
        s = SummaryStatistics(Variance=9)
        self.assertEqual(s.StandardDeviation, 3)

    def test_Variance(self):
        """SummaryStatistics Variance should work if it or std dev ok"""
        s = SummaryStatistics(StandardDeviation=3)
        self.assertEqual(s.StandardDeviation, 3)
        self.assertEqual(s.Variance, 9)
        s = SummaryStatistics(Variance=9)
        self.assertEqual(s.StandardDeviation, 3)

    def test_SumSquares(self):
        """SummaryStatistics SumSquares should work if set"""
        s = SummaryStatistics(SumSquares=3)
        self.assertEqual(s.SumSquares, 3)
        s = SummaryStatistics(Sum=3)
        self.assertRaises(SummaryStatisticsError, getattr, s, 'SumSquares')

    def test_cmp(self):
        """SummaryStatistics should sort by count, then sum, then variance"""
        a = SummaryStatistics(Count=3)
        b = SummaryStatistics(Count=4)
        c = SummaryStatistics(Count=3, Sum=5)
        d = SummaryStatistics(Count=3, Sum=10)
        e = SummaryStatistics(Sum=10)
        assert a < b
        assert b > a
        assert a == a
        assert a < b
        assert c < d
        assert e < a
        all = [c, a, d, b, e]
        all.sort()
        self.assertEqual(all, [e, a, c, d, b])


class NumbersTestsI(object):
    """Abstract class with tests for Numbers objects.

    Inherited by safe and unsafe versions to test polymorphism.
    """
    ClassToTest = None

    def test_init_empty(self):
        """Numbers should initialize OK with empty list"""
        self.assertEqual(self.ClassToTest([]), [])

    def test_init_single(self):
        """Numbers should initialize OK with single number"""
        self.assertEqual(self.ClassToTest([5.0]), [5.0])

    def test_init_list(self):
        """Numbers should initialize OK with list of numbers"""
        self.assertEqual(self.ClassToTest([1, 5.0, 3.2]), [1, 5.0, 3.2])

    def test_init_bad_type(self):
        """Numbers should fail with TypeError if input not iterable"""
        self.assertRaises(TypeError, self.ClassToTest, 34)

    def test_add_nonempty(self):
        """Numbers should allow addition of two nonempty Numbers"""
        # test that addition works in the right direction
        self.assertFloatEqual(self.integers + self.floats,
                              Numbers([1, 2, 3, 4, 5, 1.5, 2.7]))
        # test that neither of the things that were added was changed
        self.assertFloatEqual(self.integers, [1, 2, 3, 4, 5])
        self.assertFloatEqual(self.floats, [1.5, 2.7])

    def test_add_empty(self):
        """Numbers should be unchanged on addition of empty list"""
        # test that addition of an empty list works
        self.assertFloatEqual(self.integers + self.empty, self.integers)
        self.assertFloatEqual(self.empty + self.floats, self.floats)

    def test_add_repeated(self):
        """Numbers should support repeated addition, a+b+c"""
        self.assertFloatEqual(self.floats + self.floats + self.floats,
                              [1.5, 2.7] * 3)

    def test_iadd(self):
        """Numbers should support in-place addition"""
        self.floats += [4]
        self.assertFloatEqual(self.floats, [1.5, 2.7, 4.0])

    def test_setitem(self):
        """Numbers should support assignment to positive index"""
        self.floats[0] = 1
        self.assertFloatEqual(self.floats, [1.0, 2.7])

    def test_setitem_negative_index(self):
        """Numbers should support assignment to negative index"""
        self.floats[-1] = 2
        self.assertFloatEqual(self.floats, [1.5, 2.0])

    def test_setslice(self):
        """Numbers should support slice assignment"""
        self.floats[0:1] = [1, 2, 3]
        self.assertFloatEqual(self.floats, [1, 2, 3, 2.7])

    def test_append_good(self):
        """Numbers should support append of a number"""
        self.floats.append(1)
        self.assertFloatEqual(self.floats, [1.5, 2.7, 1.0])

    def test_extend(self):
        """Numbers should support extend with a sequence"""
        self.floats.extend([5, 5, 5])
        self.assertFloatEqual(self.floats, [1.5, 2.7, 5.0, 5.0, 5.0])

    def test_items(self):
        """Numbers should support items() method"""
        self.assertFloatEqual(self.floats.items()[0], (1.5, 1))
        self.assertFloatEqual(self.floats.items()[1], (2.7, 1))

    def test_isValid(self):
        """Numbers isValid should return True if all items numbers"""
        for i in [self.empty, self.integers, self.floats, self.mixed]:
            assert i.isValid()

    def test_toFixedWidth(self):
        """Numbers should be able to convert items to fixed-width string"""
        self.assertEqual(self.floats.toFixedWidth(), " +1.50e+00 +2.70e+00")

    def test_toFixedWidth_empty(self):
        """Numbers should return empty string when converting no items"""
        self.assertEqual(self.empty.toFixedWidth(), '')

    def test_toFixedWidth_mixed(self):
        """Numbers should convert all kinds of floats to fixed precision"""
        self.assertEqual(self.mixed.toFixedWidth(), ''.join([
            ' +0.00e+00', ' +1.00e+00', ' -1.00e+00', ' +1.23e+00',
            ' -1.24e+00', '+1.23e+302', '+1.23e-298', '-1.23e+302',
            '-1.23e-298',
        ]))

    def test_toFixedWidth_specified_precision(self):
        """Numbers should convert all kinds of floats to specified precision"""
        self.assertEqual(self.mixed.toFixedWidth(7), ''.join([
            ' +0e+00', ' +1e+00', ' -1e+00', ' +1e+00', ' -1e+00',
            '+1e+302', '+1e-298', '-1e+302', '-1e-298',
        ]))
        self.assertEqual(self.mixed.toFixedWidth(8), ''.join([
            ' +0e+00', ' +1e+00', ' -1e+00', ' +1e+00', ' -1e+00',
            ' +1e+302', ' +1e-298', ' -1e+302', ' -1e-298',
        ]))
        self.assertEqual(self.mixed.toFixedWidth(12), ''.join([
            ' +0.0000e+00', ' +1.0000e+00', ' -1.0000e+00', ' +1.2346e+00',
            ' -1.2368e+00', '+1.2340e+302', '+1.2340e-298', '-1.2340e+302',
            '-1.2340e-298',
        ]))

    def test_normalize(self):
        """Numbers normalize should return items summing to 1 by default"""
        first = self.ints
        second = self.fracs
        first.normalize()
        second.normalize()
        self.assertFloatEqual(first, second)
        self.assertFloatEqual(first.Sum, 1)
        self.assertFloatEqual(second.Sum, 1)
        empty = self.empty
        empty.normalize()
        self.assertEqual(empty, [])
        zero = self.zero
        zero.normalize()
        self.assertEqual(zero, [0, 0, 0, 0, 0])

    def test_normalize_parameter(self):
        """Numbers normalize(x) should divide items by x"""
        first = self.ClassToTest([0, 1, 2, 3, 4])
        first.normalize(max(first))
        self.assertFloatEqual(first, [0, 1.0/4, 2.0/4, 3.0/4, 4.0/4])
        second = self.ClassToTest([0, 1, 2])
        second.normalize(0.5)
        self.assertFloatEqual(second, [0, 2, 4])

    def test_accumulate(self):
        """Numbers accumulate should do cumulative sum in place"""
        nl = self.ClassToTest([0, 1, 2, 3, 4])
        nl.accumulate()
        self.assertEqual(nl, [0, 1, 3, 6, 10])
        nl = self.ClassToTest()
        nl.accumulate()
        self.assertEqual(nl, [])

    def test_firstIndexLessThan(self):
        """Numbers firstIndexLessThan should return first index less than val"""
        nl = self.ints
        f = nl.firstIndexLessThan
        self.assertEqual(f(-50), None)
        self.assertEqual(f(100), 0)
        self.assertEqual(f(3), 0)
        self.assertEqual(f(1), None)
        self.assertEqual(f(1, inclusive=True), 0)
        self.assertEqual(f(-50, stop_at_ends=True), 4)

    def test_firstIndexGreaterThan(self):
        """Numbers firstIndexGreaterThan should return first index greater than val"""
        nl = self.ints
        f = nl.firstIndexGreaterThan
        self.assertEqual(f(-50), 0)
        self.assertEqual(f(100), None)
        self.assertEqual(f(3), 3)
        self.assertEqual(f(1), 1)
        self.assertEqual(f(1, inclusive=True), 0)
        self.assertEqual(f(2), 2)
        self.assertEqual(f(2, inclusive=True), 1)
        self.assertEqual(f(100, stop_at_ends=True), 4)

        # compatibility tests with old choose():
        # Numbers choose should return correct index
        nl = self.ClassToTest([1, 2, 3, 4, 5])
        nl.normalize()
        nl.accumulate()
        known_values = [
            (-50, 0),
            (0, 0),
            (0.001, 0),
            (1/15.0 - 0.001, 0),
            (1/15.0 + 0.001, 1),
            (3/15.0 + 0.001, 2),
            (1, 4),
            (10, 4),
        ]
        for test, result in known_values:
            self.assertFloatEqual(nl.firstIndexGreaterThan(test,
                inclusive=True, stop_at_ends=True), result)

    def test_lastIndexGreaterThan(self):
        """Numbers lastIndexGreaterThan should return last index > val"""
        nl = self.ints
        f = nl.lastIndexGreaterThan
        self.assertEqual(f(-50), 4)
        self.assertEqual(f(100), None)
        self.assertEqual(f(3), 4)
        self.assertEqual(f(1), 4)
        self.assertEqual(f(1, inclusive=True), 4)
        self.assertEqual(f(100, stop_at_ends=True), 0)

    def test_lastIndexLessThan(self):
        """Numbers lastIndexLessThan should return last index < val"""
        nl = self.ints
        f = nl.lastIndexLessThan
        self.assertEqual(f(-50), None)
        self.assertEqual(f(100), 4)
        self.assertEqual(f(3), 1)
        self.assertEqual(f(1), None)
        self.assertEqual(f(1, inclusive=True), 0)
        self.assertEqual(f(-50, stop_at_ends=True), 0)

    def test_Sum(self):
        """Numbers Sum should be the same as sum()"""
        self.assertEqual(self.ints.Sum, 15)
        self.assertEqual(self.empty.Sum, 0)

    def test_Count(self):
        """Numbers Count should be the same as len()"""
        self.assertEqual(self.ints.Count, 5)
        self.assertEqual(self.empty.Count, 0)

    def test_SumSquares(self):
        """Numbers SumSquares should be sum of squares"""
        self.assertEqual(self.ints.SumSquares, (1*1 + 2*2 + 3*3 + 4*4 + 5*5))
        self.assertEqual(self.empty.SumSquares, 0)

    def test_Variance(self):
        """Numbers Variance should be variance of individual numbers"""
        self.assertEqual(self.empty.Variance, None)
        self.assertEqual(self.zero.Variance, 0)
        self.assertFloatEqual(self.ints.Variance, 2.5)

    def test_StandardDeviation(self):
        """Numbers StandardDeviation should be sd of individual numbers"""
        self.assertEqual(self.empty.StandardDeviation, None)
        self.assertEqual(self.zero.StandardDeviation, 0)
        self.assertFloatEqual(self.ints.StandardDeviation, sqrt(2.5))

    def test_Mean(self):
        """Numbers Mean should be mean of individual numbers"""
        self.assertEqual(self.empty.Mean, None)
        self.assertEqual(self.zero.Mean, 0)
        self.assertEqual(self.ints.Mean, 3)

    def test_NumberQuantiles(self):
        """quantiles should be correct"""
        num = self.ClassToTest(range(1, 11))
        self.assertFloatEqual(num.quantile(.1), 1.9)
        self.assertFloatEqual(num.quantile(.2), 2.8)
        self.assertFloatEqual(num.quantile(.25), 3.25)
        self.assertFloatEqual(num.Median, 5.5)
        self.assertFloatEqual(num.quantile(.75), 7.75)
        self.assertFloatEqual(num.quantile(.77), 7.93)

    def test_summarize(self):
        """Numbers summarize should return SummaryStatistics object"""
        self.assertEqual(self.ints.summarize(),
            SummaryStatistics(Mean=3, Variance=2.5, Count=5))

    def test_choice(self):
        """Numbers choice should return random element from self"""
        nums = [self.ints.choice() for i in range(10)]
        self.assertEqual(len(nums), 10)
        for n in nums:
            assert n in self.ints
        v = Numbers(nums).Variance
        self.assertNotEqual(v, 0)

    def test_randomSequence(self):
        """Numbers randomSequence should return random sequence from self"""
        nums = self.ints.randomSequence(10)
        nums = [self.ints.choice() for i in range(10)]
        self.assertEqual(len(nums), 10)
        for n in nums:
            assert n in self.ints
        v = Numbers(nums).Variance
        self.assertNotEqual(v, 0)

    def test_subset(self):
        """Numbers subset should delete (or keep) selected items"""
        odd = [5, 1, 3]
        nums = self.ints
        nums.extend([1, 1, 1])
        new_nums = nums.copy()
        new_nums.subset(odd)
        self.assertEqual(new_nums, [1, 3, 5, 1, 1, 1])
        new_nums = nums.copy()
        new_nums.subset(odd, keep=False)
        self.assertEqual(new_nums, [2, 4])

    def test_copy(self):
        """Numbers copy should leave class intact (unlike slice)"""
        c = self.ints.copy()
        self.assertEqual(c, self.ints)
        self.assertEqual(c.__class__, self.ints.__class__)

    def test_round(self):
        """Numbers round should round numbers in-place"""
        self.floats.round()
        self.assertEqual(self.floats, [2.0, 3.0])
        for i, f in enumerate(self.floats):
            self.floats[i] = self.floats[i] + 0.101
        self.assertNotEqual(self.floats, [2.0, 3.0])
        self.assertNotEqual(self.floats, [2.1, 3.1])
        self.floats.round(1)
        self.assertEqual(self.floats, [2.1, 3.1])

    def test_Uncertainty(self):
        """Numbers Uncertainty should act via Freqs"""
        self.assertEqual(self.floats.Uncertainty,
                         Freqs(self.floats).Uncertainty)
        self.assertNotEqual(self.floats.Uncertainty, None)

    def test_Mode(self):
        """Numbers Mode should return most common element"""
        self.assertEqual(self.empty.Mode, None)
        self.assertEqual(self.zero.Mode, 0)
        self.ints.extend([1, 2, 2, 3, 3, 3])
        self.assertEqual(self.ints.Mode, 3)


class NumbersTests(TestCase, NumbersTestsI):
    """Tests of the (safe) Numbers class."""
    ClassToTest = Numbers

    def setUp(self):
        """define some standard lists"""
        self.empty = self.ClassToTest([])
        self.integers = self.ClassToTest([1, 2, 3, 4, 5])
        self.floats = self.ClassToTest([1.5, 2.7])
        self.mixed = self.ClassToTest([
            0, 1, -1, 1.234567890, -1.2367890,
            123.4e300, 123.4e-300, -123.4e300, -123.4e-300,
        ])
        self.zero = self.ClassToTest([0, 0, 0, 0, 0])
        self.ints = self.ClassToTest([1, 2, 3, 4, 5])
        self.fracs = self.ClassToTest([0.1, 0.2, 0.3, 0.4, 0.5])

    def test_init_string(self):
        """Numbers should initialize by treating string as list of digits"""
        self.assertEqual(self.ClassToTest('102'), [1.0, 0.0, 2.0])

    def test_init_bad_string(self):
        """Numbers should raise ValueError if float() can't convert string"""
        self.assertRaises(ValueError, self.ClassToTest, '102a')

    def test_append_bad(self):
        """Numbers should reject append of a non-number"""
        self.assertRaises(ValueError, self.floats.append, "abc")


class UnsafeNumbersTests(TestCase, NumbersTestsI):
    """Tests of the UnsafeNumbers class."""
    ClassToTest = UnsafeNumbers

    def setUp(self):
        """define some standard lists"""
        self.empty = self.ClassToTest([])
        self.integers = self.ClassToTest([1, 2, 3, 4, 5])
        self.floats = self.ClassToTest([1.5, 2.7])
        self.mixed = self.ClassToTest([
            0, 1, -1, 1.234567890, -1.2367890,
            123.4e300, 123.4e-300, -123.4e300, -123.4e-300,
        ])
        self.zero = self.ClassToTest([0, 0, 0, 0, 0])
        self.ints = self.ClassToTest([1, 2, 3, 4, 5])
        self.fracs = self.ClassToTest([0.1, 0.2, 0.3, 0.4, 0.5])

    def test_init_string(self):
        """UnsafeNumbers should treat string as list of chars"""
        self.assertEqual(self.ClassToTest('102'), ['1', '0', '2'])

    def test_init_bad_string(self):
        """UnsafeNumbers should silently incorporate unfloatable string"""
        self.assertEqual(self.ClassToTest('102a'), ['1', '0', '2', 'a'])

    def test_append_bad(self):
        """UnsafeNumbers should allow append of a non-number"""
        self.empty.append('abc')
        self.assertEquals(self.empty, ['abc'])

    def test_isValid_bad(self):
        """UnsafeNumbers should return False if invalid"""
        assert self.mixed.isValid()
        self.mixed.append('abc')
        assert not self.mixed.isValid()


class StaticFreqsTestsI(object):
    """Tests of the interface shared by Freqs and UnsafeFreqs (static keys).

    All of these tests assume keys on the alphabet 'abcde'.

    These tests were added to ensure that array-based objects that
    implement a fixed set of keys maintain the appropriate portion of
    the freqs interface.
    """
    ClassToTest = None

    def setUp(self):
        """Standard cases to test."""
        self.Alphabetic = self.ClassToTest({'a':3,'b':2,'c':1,'d':1,'e':1})
        self.Empty = self.ClassToTest({})
        self.Constant = self.ClassToTest({'a':5})

    # The following test various ways of constructing the objects

    def test_fromTuples(self):
        """Freqs fromTuples should add from key, count pairs w/ repeated keys"""
        ct = self.ClassToTest
        f = ct()
        self.assertEqual(f.fromTuples([('a',4),('b',3),('a',2)]),
                         ct({'a':6,'b':3}))
        # note: should be allowed to subtract, as long as freq doesn't go
        # negative.
        f.fromTuples([('b',-1),('c',4.5)])
        self.assertEqual(f, ct({'a':6,'b':2,'c':4.5}))
        # should work with a different operator
        f.fromTuples([('b',7)], op=mul)
        self.assertEqual(f, ct({'a':6, 'b':14, 'c':4.5}))

        # check that it works with something that depends on the key
        def func(key, first, second):
            if key == 'a':
                return first + second
            else:
                return max(second, first * second)

        f = ct()
        self.assertEqual(f.fromTuples([('a',4),('b',3),('a',2),('b',4)],
                         func, uses_key=True), ct({'a':6,'b':12}))

    def test_newFromTuples(self):
        """Freqs newFromTuples should work as expected."""
        ct = self.ClassToTest
        self.assertEqual(ct.newFromTuples([('a',4),('b',3),('a',2)]),
                         ct({'a':6,'b':3}))

    def test_fromDict(self):
        """Freqs fromDict should add from dict of {key:count}"""
        ct = self.ClassToTest
        f = ct()
        self.assertEqual(f.fromDict({'a':6,'b':3}), ct({'a':6,'b':3}))
        # note: should be allowed to subtract, as long as freq doesn't go
        # negative.
        f.fromDict({'b':-1, 'c':4.5})
        self.assertEqual(f, ct({'a':6,'b':2,'c':4.5}))
        # should work with a different operator
        f.fromDict({'b':7}, op=mul)
        self.assertEqual(f, ct({'a':6, 'b':14, 'c':4.5}))

    def test_newFromDict(self):
        """Freqs newFromDict should work as expected."""
        ct = self.ClassToTest
        self.assertEqual(ct.newFromDict({'a':6,'b':3}), ct({'a':6,'b':3}))

    def test_fromDicts(self):
        """Freqs fromDicts should add from list of dicts of {key:count}"""
        ct = self.ClassToTest
        f = ct()
        self.assertEqual(f.fromDicts([{'a':6},{'b':3}]), ct({'a':6,'b':3}))
        # note: should be allowed to subtract, as long as freq doesn't go
        # negative. Also tests add of 1-item dict (note: must be in list)
        f.fromDicts([{'b':-1, 'c':4.5}])
        self.assertEqual(f, ct({'a':6,'b':2,'c':4.5}))
        # should work with a different operator
        f.fromDicts([{'b':2},{'b':3}], op=mul)
        self.assertEqual(f, ct({'a':6, 'b':12, 'c':4.5}))

    def test_newFromDicts(self):
        """Freqs newFromDicts should work as expected."""
        ct = self.ClassToTest
        self.assertEqual(ct.newFromDicts([{'a':6},{'b':3}]),
                         ct({'a':6,'b':3}))

    def test_fromSeq(self):
        """Freqs fromSeq should add items from sequence, according to weight"""
        ct = self.ClassToTest
        f = self.ClassToTest()
        self.assertEqual(f.fromSeq('aaabbbaaa'), ct({'a':6,'b':3}))
        # should be able to change the operator...
        self.assertEqual(f.fromSeq('aab', sub), ct({'a':4,'b':2}))
        # ...or change the weight
        self.assertEqual(f.fromSeq('acc', weight=3.5),
                         ct({'a':7.5,'b':2,'c':7}))

    def test_newFromSeq(self):
        """Freqs newFromSeq should work as expected."""
        ct = self.ClassToTest
        self.assertEqual(ct.newFromSeq('aaabbbaaa'), ct({'a':6,'b':3}))

    def test_fromSeqs(self):
        """Freqs fromSeqs should add items from sequences, according to weight"""
        ct = self.ClassToTest
        f = ct()
        self.assertEqual(f.fromSeqs(['aaa','bbbaaa']), ct({'a':6,'b':3}))
        # should be able to change the operator...
        self.assertEqual(f.fromSeqs(list('aab'), sub), ct({'a':4,'b':2}))
        # ...or change the weight. Note that a string counts as a seq of seqs.
        self.assertEqual(f.fromSeqs('acc', weight=3.5), \
            ct({'a':7.5,'b':2,'c':7}))

    def test_newFromSeqs(self):
        """Freqs newFromSeqs should work as expected."""
        ct = self.ClassToTest
        self.assertEqual(ct.newFromSeqs(['aaa','bbbaaa']), ct({'a':6,'b':3}))

    def test_isValid(self):
        """Freqs isValid should return True if valid"""
        d = self.ClassToTest()
        assert d.isValid()
        d.fromSeq('aaaaaaaaaaaaabb')
        assert d.isValid()

    def test_find_conversion_function(self):
        """Freqs _find_conversion_function should return correct value."""
        d = self.ClassToTest()
        f = d._find_conversion_function
        #should always return None if data empty
        for i in [None, 0, False, {}, [], tuple()]:
            self.assertEqual(f(i), None)
        #should return fromDict for non-empty dict
        self.assertEqual(f({3:4}), d.fromDict)
        #should return fromSeq for string or list of scalars or strings
        for i in ['abc', [1,2,3], (1,2,3), ['ab','bb','cb']]:
            self.assertEqual(f(i), d.fromSeq)
        #should return fromSeqs for sequence of sequences
        for i in [[[1,2,3],[3,4,4]], ([1,2,4],[3,4,4]), [(1,2),[3],[], [4]]]:
            self.assertEqual(f(i), d.fromSeqs)
        #should return fromTuples if possibly key-value pairs
        for i in [[('a',3),('b',-1)], [(1,2),(3,4)]]:
            self.assertEqual(f(i), d.fromTuples)
        #should not be fooled by 2-item seqs that can't be key-value pairs
        self.assertEqual(f(['ab','cd']), d.fromSeq)

    #The following test inheritance of dict properties/methods
    def test_setitem_good(self):
        """Freqs should allow non-negative values to be set"""
        ct = self.ClassToTest
        self.Empty['a'] = 0
        self.assertEqual(self.Empty, ct({'a':0}))
        self.Empty['b'] = 5
        self.assertEqual(self.Empty, ct({'a':0, 'b':5}))

    def test_delitem(self):
        """delitem not applicable to freqs w/ constant keys: not tested"""
        pass

    def test_setdefault_good(self):
        """Freqs setdefault should work with positive values if key present"""
        a = self.Alphabetic.setdefault('a', 200)
        self.assertEqual(a, 3)
        self.assertEqual(self.Alphabetic['a'], 3)

    def test_iadd(self):
        """Freqs += should add in place from any known data type"""
        ct = self.ClassToTest
        f = ct({'a':3, 'b':4})
        f += 'aca'
        self.assertEqual(f, ct({'a':5, 'b':4, 'c':1}))
        f += ['b','b']
        self.assertEqual(f, ct({'a':5, 'b':6, 'c':1}))
        f += {'c':10, 'a':-3}
        self.assertEqual(f, ct({'a':2, 'b':6, 'c':11}))
        f += (('a',3),('b',-2))
        self.assertEqual(f, ct({'a':5, 'b':4, 'c':11}))
        f += [['a', 'b', 'b'],['c', 'c', 'c']]
        self.assertEqual(f, ct({'a':6, 'b':6, 'c':14}))
        #note that list of strings will give implementation-dependent result

    def test_add(self):
        """Freqs + should make new object, adding from any known data type"""
        ct = self.ClassToTest
        orig = {'a':3, 'b':4}
        f = ct(orig)
        r = f + 'aca'
        self.assertEqual(r, ct({'a':5, 'b':4, 'c':1}))
        self.assertEqual(f, orig)
        r = f + ['b','b']
        self.assertEqual(r, ct({'a':3, 'b':6}))
        self.assertEqual(f, orig)
        r = f + {'c':10, 'a':-3}
        self.assertEqual(r, ct({'a':0, 'b':4, 'c':10}))
        self.assertEqual(f, orig)
        r = f + (('a',3),('b',-2))
        self.assertEqual(r, ct({'a':6, 'b':2}))
        self.assertEqual(f, orig)
        r = f + [['a', 'b', 'b'],['c', 'c', 'c']]
        self.assertEqual(r, ct({'a':4, 'b':6, 'c':3}))
        self.assertEqual(f, orig)
        #note that list of strings will give implementation-dependent result

    def test_isub(self):
        """Freqs -= should subtract in place using any known data type"""
        ct = self.ClassToTest
        f = ct({'a':5, 'b':4})
        f -= 'aba'
        self.assertEqual(f, ct({'a':3, 'b':3}))
        f -= ['b','b']
        self.assertEqual(f, ct({'a':3, 'b':1}))
        f -= {'c':-2, 'a':-3}
        self.assertEqual(f, ct({'a':6, 'b':1, 'c':2}))
        f -= (('a',3),('b',-2))
        self.assertEqual(f, ct({'a':3, 'b':3, 'c':2}))
        f -= [['a', 'b', 'b'],['c', 'c']]
        self.assertEqual(f, ct({'a':2, 'b':1, 'c':0}))
        #note that list of strings will give implementation-dependent result

    def test_sub(self):
        """Freqs - should make new object, subtracting using any known data type"""
        orig = {'a':3, 'b':4}
        ct = self.ClassToTest
        f = self.ClassToTest(orig)
        r = f - 'aba'
        self.assertEqual(r, ct({'a':1, 'b':3}))
        self.assertEqual(f, orig)
        r = f - ['b','b']
        self.assertEqual(r, ct({'a':3, 'b':2}))
        self.assertEqual(f, orig)
        r = f - {'c':-10, 'a':3}
        self.assertEqual(r, ct({'a':0, 'b':4, 'c':10}))
        self.assertEqual(f, orig)
        r = f - (('a',3),('b',-2))
        self.assertEqual(r, ct({'a':0, 'b':6}))
        self.assertEqual(f, orig)
        r = f - [['a', 'b', 'b'],['a','a']]
        self.assertEqual(r, ct({'a':0, 'b':2}))
        self.assertEqual(f, orig)
        #note that list of strings will give implementation-dependent results

    def test_copy(self):
        """Freqs copy should preserve class of original"""
        d = {'a':4, 'b':3, 'c':6}
        f = self.ClassToTest(d)
        g = f.copy()
        self.assertEqual(f, g)
        self.assertEqual(f.__class__, g.__class__)

    def test_str(self):
        """Freqs abstract interface doesn't specify string result"""
        pass

    def test_delitem(self):
        """Freqs delitem is implementation-dependent"""
        pass

    #The following test custom methods
    def test_rekey(self):
        """Freqs rekey should map the results onto new keys"""
        #note that what happens to unmapped keys is implementation-dependent
        ct = self.ClassToTest
        d = ct({'a':3, 'b':5, 'c':6, 'd':7, 'e':1})
        #should work with simple rekeying
        f = d.rekey({'a':'d', 'b':'e'})
        self.assertEqual(f['d'], d['a'])
        self.assertEqual(f['e'], d['b'])
        #remaining keys might be absent or 0
        for i in 'abc':
            if i in f:
                self.assertEqual(f[i], 0)
        #should work if many old keys map to the same new key
        f = d.rekey({'a':'d', 'b':'e', 'c':'e'})
        self.assertEqual(f['d'], d['a'])
        self.assertEqual(f['e'], d['b'] + d['c'])
        #remaining keys might be absent or 0
        for i in 'abc':
            if i in f:
                self.assertEqual(f[i], 0)
        #check with explicit constructor and default
        d = self.ClassToTest({'a':3, 'b':5, 'c':6, 'd':7, 'e':1})
        f = d.rekey({'a':'+', 'b':'-', 'c':'+'}, default='x', constructor=dict)
        self.assertEqual(f, {'+':9, '-':5, 'x':8})
        self.assertEqual(f.__class__, dict)

    def test_purge(self):
        """Freqs purge should have no effect if keys are fixed"""
        ct = self.ClassToTest
        orig = {'a':3, 'b':2, 'c':1, 'd':3, 'e':4}
        d = ct(orig)
        d.purge()
        self.assertEqual(d, ct(orig))

    def test_normalize(self):
        """Freqs should allow normalization"""
        ct = self.ClassToTest
        self.Empty.normalize()
        self.assertEqual(self.Empty, ct({}))
        a = self.Alphabetic.copy()
        a.normalize()
        expected = {'a':0.375, 'b':0.25, 'c':0.125, 'd':0.125, 'e':0.125}
        for key, val in expected.items():
            self.assertFloatEqual(a[key], val)

    def test_choice(self):
        """Freqs choice should work as expected"""
        self.Alphabetic.normalize()
        keys = self.Alphabetic.keys()
        vals = Numbers(self.Alphabetic.values())
        vals.accumulate()
        #test first item
        self.assertEqual(self.Alphabetic.choice(-1), keys[0])
        self.assertEqual(self.Alphabetic.choice(-0.0001), keys[0])
        self.assertEqual(self.Alphabetic.choice(-1e300), keys[0])
        self.assertEqual(self.Alphabetic.choice(0), keys[0])
        #test last item
        last_val = vals.pop()
        self.assertEqual(self.Alphabetic.choice(last_val), keys[-1])
        self.assertEqual(self.Alphabetic.choice(1000), keys[-1])
        #test remaining items
        for index, value in enumerate(vals):
            self.assertEqual(self.Alphabetic.choice(value-0.01), keys[index])
            self.assertEqual(self.Alphabetic.choice(value+0.01), keys[index+1])

    def test_randomSequence_good(self):
        """Freqs randomSequence should give correct counts"""
        self.Alphabetic.normalize()
        total = self.Alphabetic.Sum
        keys = self.Alphabetic.keys()
        probs = [float(i)/total for i in self.Alphabetic.values()]
        rand_seq = self.Alphabetic.randomSequence(10000)
        observed = [rand_seq.count(key) for key in keys]
        expected = [prob*10000 for prob in probs]
        self.assertSimilarFreqs(observed, expected)

    def test_randomSequence_bad(self):
        """Empty Freqs should raise error on randomSequence"""
        self.assertRaises(IndexError, self.Empty.randomSequence, 5)

    def test_randomSequence_one_item(self):
        """Freqs randomSequence should work with one key"""
        self.Constant.normalize()
        rand = self.Constant.randomSequence(1000)
        self.assertEqual(rand.count('a'), 1000)
        self.assertEqual(len(rand), 1000)

    def test_subset_preserve(self):
        """Freqs subset should preserve wanted items"""
        ct = self.ClassToTest
        self.Constant.subset('bc')
        self.assertEqual(self.Constant, self.Empty)
        self.Alphabetic.subset('abx')
        self.assertEqual(self.Alphabetic, ct({'a':3,'b':2}))

    def test_subset_remove(self):
        """Freqs subset should delete unwanted items if asked"""
        ct = self.ClassToTest
        self.Alphabetic.subset('abx', keep=False)
        self.assertEqual(self.Alphabetic, ct({'c':1,'d':1,'e':1}))
        self.Constant.subset('bx', keep=False)
        self.assertEqual(self.Constant, ct({'a':5}))

    def test_scale(self):
        """Freqs scale should multiply all values with the given scale"""
        ct = self.ClassToTest
        f = ct({'a':0.25,'b':0.25})
        f.scale(10)
        self.assertEqual(f, ct({'a':2.5,'b':2.5}))
        f.scale(100)
        self.assertEqual(f, ct({'a':250, 'b':250}))
        f.scale(0.001)
        self.assertEqual(f, ct({'a':0.25,'b':0.25}))

    def test_round(self):
        """Freqs round should round all frequencies to integers"""
        ct = self.ClassToTest
        f = ct({'a':23.1, 'b':12.5, 'c':56.7})
        f.round()
        self.assertEqual(f, ct({'a':23, 'b':13, 'c':57}))
        g = ct({'a':23.1356, 'b':12.5731})
        g.round(3)
        self.assertEqual(g, ct({'a':23.136, 'b':12.573}))

    def test_expand(self):
        """Freqs expand should give expected results"""
        ct = self.ClassToTest
        f = ct({'a':3, 'c':5, 'b':2})
        self.assertEqual(f.expand(order='acb'), list('aaacccccbb'))
        self.assertEqual(f.expand(order='dba'), list('bbaaa'))
        self.assertEqual(f.expand(order='cba', convert_to=''.join), 'cccccbbaaa')
        f['c'] = 0
        self.assertEqual(f.expand(order='acb'), list('aaabb'))
        f.normalize()
        self.assertEqual(f.expand(order='cba'), ['a'])
        self.assertEqual(f.expand(convert_to=''.join), 'a')
        f.normalize(total=1.0/20)
        self.assertEqual(f.expand(order='abc'), list('a'*12 + 'b'*8))
        #test expand with scaling
        g = ct({'c':0.5,'d':0.5})
        self.assertEqual(g.expand(order='cd'), ['c','d'])
        self.assertEqual(g.expand(order='cd', scale=10), list(5*'c'+5*'d'))
        self.assertRaises(ValueError, g.expand, scale=33)

    def test_Count(self):
        """Freqs Count should return correct count (number of categories)"""
        self.assertEqual(self.Alphabetic.Count, 5)
        #WARNING: Count of empty categories is implementation-dependent

    def test_Sum(self):
        """Freqs Sum should return sum of item counts in all categories"""
        self.assertEqual(self.Alphabetic.Sum, 8)
        self.assertEqual(self.Empty.Sum, 0)
        self.assertEqual(self.Constant.Sum, 5)

    def test_SumSquares(self):
        """Freqs SumSquared should return sum of squared freq of each category"""
        self.assertEqual(self.Alphabetic.SumSquares, 16)
        self.assertEqual(self.Empty.SumSquares, 0)
        self.assertEqual(self.Constant.SumSquares, 25)

    def test_Variance(self):
        """Freqs Variance should return variance of counts in categories"""
        self.assertFloatEqual(self.Alphabetic.Variance, 0.8)
        self.assertFloatEqual(self.Empty.Variance, None)
        #WARNING: Variance with empty categories is implementation-dependent

    def test_StandardDeviation(self):
        """Freqs StandardDeviation should return stdev of counts in categories"""
        self.assertFloatEqual(self.Alphabetic.StandardDeviation, sqrt(0.8))
        self.assertFloatEqual(self.Empty.StandardDeviation, None)
        #WARNING: Standard deviation with empty categories is implementation-
        #dependent

    def test_Mean(self):
        """Freqs Mean should return mean of counts in categories"""
        self.assertEqual(self.Alphabetic.Mean, 8/5.0)
        self.assertEqual(self.Empty.Mean, None)
        #WARNING: Mean with empty categories is implementation-dependent

    def test_Uncertainty(self):
        """Freqs Shannon uncertainty values should match spreadsheet"""
        self.assertEqual(self.Empty.Uncertainty, 0)
        self.assertFloatEqual(self.Alphabetic.Uncertainty, 2.1556, eps=1e-4)
        #WARNING: Uncertainty with empty categories is implementation-dependent

    def test_mode(self):
        """Freqs mode should return most frequent item"""
        self.assertEqual(self.Empty.Mode, None)
        self.assertEqual(self.Alphabetic.Mode, 'a')
        self.assertEqual(self.Constant.Mode, 'a')

    def test_summarize(self):
        """Freqs summarize should return Summary: Count, Sum, SumSquares, Var"""
        s = self.Alphabetic.summarize()
        self.assertEqual(s.Sum, 8)
        self.assertEqual(s.Count, 5)
        self.assertEqual(s.SumSquares, 16)
        self.assertFloatEqual(s.Variance, 0.8)
        self.assertFloatEqual(s.StandardDeviation, sqrt(0.8))
        self.assertFloatEqual(s.Mean, 8.0/5)

    def test_getSortedList(self):
        """Freqs getSortedList should return sorted list of key, val tuples"""
        #behavior is implementation-defined with empty list, so skip tests.
        a = self.Alphabetic
        a['b'] = 5
        self.assertEqual(a.getSortedList(), \
            [('b',5),('a',3),('e',1),('d',1),('c',1)])
        self.assertEqual(a.getSortedList(descending=True), \
            [('b',5),('a',3),('e',1),('d',1),('c',1)])
        self.assertEqual(a.getSortedList(descending=False), \
            [('c',1),('d',1),('e',1),('a',3),('b',5)])
        self.assertEqual(a.getSortedList(by_val=False), \
            [('e',1),('d',1),('c',1),('b',5),('a',3)])
        self.assertEqual(a.getSortedList(by_val=False, descending=False), \
            [('a',3),('b',5),('c',1),('d',1),('e',1)])

class FreqsStaticTests(StaticFreqsTestsI, TestCase):
    ClassToTest = Freqs

class UnsafeFreqsStaticTests(StaticFreqsTestsI, TestCase):
    ClassToTest = UnsafeFreqs

class FreqsTestsI(object):
    """Tests of the interface shared by Freqs and UnsafeFreqs."""
    ClassToTest = None

    #The following test various ways of constructing the objects
    def test_fromTuples(self):
        """Freqs fromTuples should add from key, count pairs w/ repeated keys"""
        f = self.ClassToTest()
        self.assertEqual(f.fromTuples([('a',4),('b',3),('a',2)]), {'a':6,'b':3})
        #note: should be allowed to subtract, as long as freq doesn't go
        #negative.
        f.fromTuples([('b',-1),('c',4.5)])
        self.assertEqual(f, {'a':6,'b':2,'c':4.5})
        #should work with a different operator
        f.fromTuples([('b',7)], op=mul)
        self.assertEqual(f, {'a':6, 'b':14, 'c':4.5})
        #check that it works with something that depends on the key
        def func(key, first, second):
            if key == 'a':
                return first + second
            else:
                return max(second, first * second)
        f = self.ClassToTest()
        self.assertEqual(f.fromTuples([('a',4),('b',3),('a',2), ('b',4)], \
            func, uses_key=True), {'a':6,'b':12})

    def test_fromDict(self):
        """Freqs fromDict should add from dict of {key:count}"""
        f = self.ClassToTest()
        self.assertEqual(f.fromDict({'a':6,'b':3}), {'a':6,'b':3})
        #note: should be allowed to subtract, as long as freq doesn't go
        #negative.
        f.fromDict({'b':-1, 'c':4.5})
        self.assertEqual(f, {'a':6,'b':2,'c':4.5})
        #should work with a different operator
        f.fromDict({'b':7}, op=mul)
        self.assertEqual(f, {'a':6, 'b':14, 'c':4.5})

    def test_fromDicts(self):
        """Freqs fromDicts should add from list of dicts of {key:count}"""
        f = self.ClassToTest()
        self.assertEqual(f.fromDicts([{'a':6},{'b':3}]), {'a':6,'b':3})
        #note: should be allowed to subtract, as long as freq doesn't go
        #negative. Also tests add of 1-item dict (note: must be in list)
        f.fromDicts([{'b':-1, 'c':4.5}])
        self.assertEqual(f, {'a':6,'b':2,'c':4.5})
        #should work with a different operator
        f.fromDicts([{'b':2},{'b':3}], op=mul)
        self.assertEqual(f, {'a':6, 'b':12, 'c':4.5})

    def test_fromSeq(self):
        """Freqs fromSeq should add items from sequence, according to weight"""
        f = self.ClassToTest()
        self.assertEqual(f.fromSeq('aaabbbaaa'), {'a':6,'b':3})
        #should be able to change the operator...
        self.assertEqual(f.fromSeq('aab', sub), {'a':4,'b':2})
        #...or change the weight
        self.assertEqual(f.fromSeq('acc', weight=3.5), {'a':7.5,'b':2,'c':7})

    def test_fromSeqs(self):
        """Freqs fromSeqs should add items from sequences, according to weight"""
        f = self.ClassToTest()
        self.assertEqual(f.fromSeqs(['aaa','bbbaaa']), {'a':6,'b':3})
        #should be able to change the operator...
        self.assertEqual(f.fromSeqs(list('aab'), sub), {'a':4,'b':2})
        #...or change the weight. Note that a string counts as a seq of seqs.
        self.assertEqual(f.fromSeqs('acc', weight=3.5), {'a':7.5,'b':2,'c':7})

    def test_isValid(self):
        """Freqs isValid should return True if valid"""
        d = self.ClassToTest()
        assert d.isValid()
        d.fromSeq('aaaaaaaaaaaaabb')
        assert d.isValid()

    def test_find_conversion_function(self):
        """Freqs _find_conversion_function should return correct value."""
        d = self.ClassToTest()
        f = d._find_conversion_function
        #should always return None if data empty
        for i in [None, 0, False, {}, [], tuple()]:
            self.assertEqual(f(i), None)
        #should return fromDict for non-empty dict
        self.assertEqual(f({3:4}), d.fromDict)
        #should return fromSeq for string or list of scalars or strings
        for i in ['abc', [1,2,3], (1,2,3), ['ab','bb','cb']]:
            self.assertEqual(f(i), d.fromSeq)
        #should return fromSeqs for sequence of sequences
        for i in [[[1,2,3],[3,4,4]], ([1,2,4],[3,4,4]), [(1,2),[3],[], [4]]]:
            self.assertEqual(f(i), d.fromSeqs)
        #should return fromTuples if possibly key-value pairs
        for i in [[('a',3),('b',-1)], [(1,2),(3,4)]]:
            self.assertEqual(f(i), d.fromTuples)
        #should not be fooled by 2-item seqs that can't be key-value pairs
        self.assertEqual(f(['ab','cd']), d.fromSeq)

    #The following test inheritance of dict properties/methods
    def test_setitem_good(self):
        """Freqs should allow non-negative values to be set"""
        self.Empty[3] = 0
        self.assertEqual(self.Empty, {3:0})
        self.Empty['xyz'] = 5
        self.assertEqual(self.Empty, {3:0, 'xyz':5})

    def test_delitem(self):
        """Freqs should delete all counts of item with del"""
        del self.Alphabetic['a']
        del self.Alphabetic['b']
        del self.Alphabetic['c']
        self.assertEqual(self.Alphabetic, {'d':1,'e':1})

    def test_setdefault_good(self):
        """Freqs setdefault should work with positive values"""
        a = self.Alphabetic.setdefault('a', 200)
        self.assertEqual(a, 3)
        self.assertEqual(self.Alphabetic['a'], 3)
        f = self.Alphabetic.setdefault('f', 0)
        self.assertEqual(f, 0)
        self.assertEqual(self.Alphabetic['f'], 0)
        g = self.Alphabetic.setdefault('g', 1000)
        self.assertEqual(g, 1000)
        self.assertEqual(self.Alphabetic['g'], 1000)

    #The following test overridden operators and methods
    def test_iadd(self):
        """Freqs += should add in place from any known data type"""
        f = self.ClassToTest({'a':3, 'b':4})
        f += 'aca'
        self.assertEqual(f, {'a':5, 'b':4, 'c':1})
        f += ['b','b']
        self.assertEqual(f, {'a':5, 'b':6, 'c':1})
        f += {'c':10, 'a':-3}
        self.assertEqual(f, {'a':2, 'b':6, 'c':11})
        f += (('a',3),('b',-2))
        self.assertEqual(f, {'a':5, 'b':4, 'c':11})
        f += [['a', 'b', 'b'],['c', 'c', 'c']]
        self.assertEqual(f, {'a':6, 'b':6, 'c':14})
        #note that list of strings will use the strings as keys
        f += ['abc', 'def', 'abc']
        self.assertEqual(f, {'a':6, 'b':6, 'c':14, 'abc':2, 'def':1})

    def test_add(self):
        """Freqs + should make new object, adding from any known data type"""
        orig = {'a':3, 'b':4}
        f = self.ClassToTest(orig)
        self.assertEqual(f, orig)
        r = f + 'aca'
        self.assertEqual(r, {'a':5, 'b':4, 'c':1})
        self.assertEqual(f, orig)
        r = f + ['b','b']
        self.assertEqual(r, {'a':3, 'b':6})
        self.assertEqual(f, orig)
        r = f + {'c':10, 'a':-3}
        self.assertEqual(r, {'a':0, 'b':4, 'c':10})
        self.assertEqual(f, orig)
        r = f + (('a',3),('b',-2))
        self.assertEqual(r, {'a':6, 'b':2})
        self.assertEqual(f, orig)
        r = f + [['a', 'b', 'b'],['c', 'c', 'c']]
        self.assertEqual(r, {'a':4, 'b':6, 'c':3})
        self.assertEqual(f, orig)
        #note that list of strings will use the strings as keys
        r = f + ['abc', 'def', 'abc']
        self.assertEqual(r, {'a':3, 'b':4, 'abc':2, 'def':1})
        self.assertEqual(f, orig)

    def test_isub(self):
        """Freqs -= should subtract in place using any known data type"""
        f = self.ClassToTest({'a':5, 'b':4})
        f -= 'aba'
        self.assertEqual(f, {'a':3, 'b':3})
        f -= ['b','b']
        self.assertEqual(f, {'a':3, 'b':1})
        f -= {'c':-2, 'a':-3}
        self.assertEqual(f, {'a':6, 'b':1, 'c':2})
        f -= (('a',3),('b',-2))
        self.assertEqual(f, {'a':3, 'b':3, 'c':2})
        f -= [['a', 'b', 'b'],['c', 'c']]
        self.assertEqual(f, {'a':2, 'b':1, 'c':0})
        f['abc'] = 3
        f['def'] = 10
        #note that list of strings will use the strings as keys
        f -= ['abc', 'def', 'abc']
        self.assertEqual(f, {'a':2, 'b':1, 'c':0, 'abc':1, 'def':9})

    def test_sub(self):
        """Freqs - should make new object, subtracting using any known data type"""
        orig = {'a':3, 'b':4}
        f = self.ClassToTest(orig)
        self.assertEqual(f, orig)
        r = f - 'aba'
        self.assertEqual(r, {'a':1, 'b':3})
        self.assertEqual(f, orig)
        r = f - ['b','b']
        self.assertEqual(r, {'a':3, 'b':2})
        self.assertEqual(f, orig)
        r = f - {'c':-10, 'a':3}
        self.assertEqual(r, {'a':0, 'b':4, 'c':10})
        self.assertEqual(f, orig)
        r = f - (('a',3),('b',-2))
        self.assertEqual(r, {'a':0, 'b':6})
        self.assertEqual(f, orig)
        r = f - [['a', 'b', 'b'],['a','a']]
        self.assertEqual(r, {'a':0, 'b':2})
        self.assertEqual(f, orig)
        #note that list of strings will use the strings as keys
        orig['abc'] = 5
        orig['def'] = 10
        f['abc'] = 5
        f['def'] = 10
        r = f - ['abc', 'def', 'abc']
        self.assertEqual(r, {'a':3, 'b':4, 'abc':3, 'def':9})
        self.assertEqual(f, orig)

    def test_copy(self):
        """Freqs copy should preserve class of original"""
        d = {'a':4, 'b':3, 'c':6}
        f = self.ClassToTest(d)
        g = f.copy()
        self.assertEqual(d, f)
        self.assertEqual(d, g)
        self.assertEqual(f, g)
        self.assertEqual(f.__class__, g.__class__)

    def test_str(self):
        """Freqs should print as tab-delimited table, or 'Empty'"""
        #should work with empty freq distribution
        self.assertEqual(str(self.ClassToTest([])), \
            "Empty frequency distribution")
        #should work with single element
        self.assertEqual(str(self.ClassToTest({'X':1.0})), \
            "Value\tCount\nX\t1.0")
        #should work with multiples of same key
        self.assertEqual(str(self.ClassToTest({1.0:5.0})), \
            "Value\tCount\n1.0\t5.0")
        #should work with different keys
        self.assertEqual(str(self.ClassToTest({0:3.0,1:2.0})), \
            "Value\tCount\n0\t3.0\n1\t2.0")

    def test_delitem(self):
        """Freqs delitem should refuse to delete a required key"""
        a = self.Alphabetic
        del a['a']
        self.assertEqual(a, {'b':2, 'c':1, 'd':1, 'e':1})
        #can't delete RequiredKeys once set
        a.RequiredKeys = 'bcd'
        del a['e']
        self.assertEqual(a, {'b':2,'c':1,'d':1})
        for k in 'bcd':
            self.assertRaises(KeyError, a.__delitem__, k)
        #when RequiredKeys is removed, can delete them again
        a.RequiredKeys = None
        del a['b']
        self.assertEqual(a, {'c':1,'d':1})

    #The following test custom methods
    def test_rekey(self):
        """Freqs rekey should map the results onto new keys."""
        d = self.ClassToTest({'a':3, 'b':5, 'c':6, 'd':7, 'e':1})
        f = d.rekey({'a':'+', 'b':'-', 'c':'+'})
        self.assertEqual(f, {'+':9, '-':5, None:8})
        self.assertEqual(f.__class__, d.__class__)
        #check with explicit constructor and default
        d = self.ClassToTest({'a':3, 'b':5, 'c':6, 'd':7, 'e':1})
        f = d.rekey({'a':'+', 'b':'-', 'c':'+'}, default='x', constructor=dict)
        self.assertEqual(f, {'+':9, '-':5, 'x':8})
        self.assertEqual(f.__class__, dict)

    def test_purge(self):
        """Freqs purge should have no effect unless RequiredKeys set."""
        working = self.PosNeg.copy()
        self.assertEqual(working, self.PosNeg)
        working.purge()
        self.assertEqual(working, self.PosNeg)
        working.RequiredKeys = (-2,-1)
        working[-2] = 3
        working.purge()
        self.assertEqual(working, {-2:3,-1:1})
        #should have no effect if repeated
        working.purge()
        self.assertEqual(working, {-2:3,-1:1})

    def test_normalize(self):
        """Freqs should allow normalization on any type"""
        self.Empty.normalize()
        self.assertEqual(self.Empty, {})
        a = self.Alphabetic.copy()
        a.normalize()
        expected = {'a':0.375, 'b':0.25, 'c':0.125, 'd':0.125, 'e':0.125}
        for key, val in expected.items():
            self.assertFloatEqual(a[key], val)
        self.PosNeg.normalize()
        expected = {-2:0.25, -1:0.25, 1:0.25, 2:0.25}
        for key, val in expected.items():
            self.assertFloatEqual(self.PosNeg[key], val)
        #check that it works when we specify a total
        self.PosNeg.normalize(total=0.2)
        expected = {-2:1.25, -1:1.25, 1:1.25, 2:1.25}
        for key, val in expected.items():
            self.assertFloatEqual(self.PosNeg[key], val)
        #check that purging works
        a = self.Alphabetic.copy()
        a.RequiredKeys = 'ac'
        a.normalize()
        self.assertEqual(a, {'a':0.75, 'c':0.25})
        a = self.Alphabetic.copy()
        a.RequiredKeys = 'ac'
        a.normalize(purge=False)
        self.assertEqual(a, \
            {'a':0.375, 'b':0.25, 'c':0.125, 'd':0.125, 'e':0.125})
        #normalize should also create keys when necessary
        a.RequiredKeys = 'bdex'
        a.normalize(purge=True)
        self.assertEqual(a, {'b':0.5, 'd':0.25, 'e':0.25, 'x':0})

    def test_choice(self):
        """Freqs choice should work as expected"""
        self.Alphabetic.normalize()
        keys = self.Alphabetic.keys()
        vals = Numbers(self.Alphabetic.values())
        vals.accumulate()
        #test first item
        self.assertEqual(self.Alphabetic.choice(-1), keys[0])
        self.assertEqual(self.Alphabetic.choice(-0.0001), keys[0])
        self.assertEqual(self.Alphabetic.choice(-1e300), keys[0])
        self.assertEqual(self.Alphabetic.choice(0), keys[0])
        #test last item
        last_val = vals.pop()
        self.assertEqual(self.Alphabetic.choice(last_val), keys[-1])
        self.assertEqual(self.Alphabetic.choice(1000), keys[-1])
        #test remaining items
        for index, value in enumerate(vals):
            self.assertEqual(self.Alphabetic.choice(value-0.01), keys[index])
            self.assertEqual(self.Alphabetic.choice(value+0.01), keys[index+1])

    def test_randomSequence_good(self):
        """Freqs randomSequence should give correct counts"""
        self.Alphabetic.normalize()
        total = self.Alphabetic.Sum
        keys = self.Alphabetic.keys()
        probs = [float(i)/total for i in self.Alphabetic.values()]
        rand_seq = self.Alphabetic.randomSequence(10000)
        observed = [rand_seq.count(key) for key in keys]
        expected = [prob*10000 for prob in probs]
        self.assertSimilarFreqs(observed, expected)

    def test_randomSequence_bad(self):
        """Empty Freqs should raise error on randomSequence"""
        self.assertRaises(IndexError, self.Empty.randomSequence, 5)

    def test_randomSequence_one_item(self):
        """Freqs randomSequence should work with one key"""
        self.Constant.normalize()
        rand = self.Constant.randomSequence(1000)
        self.assertEqual(rand.count(1), 1000)
        self.assertEqual(len(rand), 1000)

    def test_subset_preserve(self):
        """Freqs subset should preserve wanted items"""
        self.Constant.subset('abc')
        self.assertEqual(self.Constant, self.Empty)
        self.Alphabetic.subset('abx')
        self.assertEqual(self.Alphabetic, Freqs('aaabb'))

    def test_subset_remove(self):
        """Freqs subset should delete unwanted items if asked"""
        self.Alphabetic.subset('abx', keep=False)
        self.assertEqual(self.Alphabetic, Freqs('cde'))
        self.Constant.subset('abx', keep=False)
        self.assertEqual(self.Constant, Freqs([1]*5))

    def test_scale(self):
        """Freqs scale should multiply all values with the given scale"""
        f = self.ClassToTest({'a':0.25,'b':0.25})
        f.scale(10)
        self.assertEqual(f, {'a':2.5,'b':2.5})
        f.scale(100)
        self.assertEqual(f, {'a':250, 'b':250})
        f.scale(0.001)
        self.assertEqual(f, {'a':0.25,'b':0.25})

    def test_round(self):
        """Freqs round should round all frequencies to integers"""
        f = self.ClassToTest({'a':23.1, 'b':12.5, 'c':56.7})
        f.round()
        self.assertEqual(f, {'a':23, 'b':13, 'c':57})
        g = Freqs({'a':23.1356, 'b':12.5731})
        g.round(3)
        self.assertEqual(g, {'a':23.136, 'b':12.573})

    def test_expand(self):
        """Freqs expand should give expected results"""
        f = self.ClassToTest({'U':3, 'A':5, 'C':2})
        self.assertEqual(f.expand(order='UAC'), list('UUUAAAAACC'))
        self.assertEqual(f.expand(order='GCU'), list('CCUUU'))
        self.assertEqual(f.expand(order='ACU', convert_to=''.join), 'AAAAACCUUU')
        del f['A']
        self.assertEqual(f.expand(order='UAC'), list('UUUCC'))
        f.normalize()
        self.assertEqual(f.expand(order='ACU'), ['U'])
        self.assertEqual(f.expand(convert_to=''.join), 'U')
        f.normalize(total=1.0/20)
        self.assertEqual(f.expand(order='UCA'), list('U'*12 + 'C'*8))
        #test expand with scaling
        g = self.ClassToTest({'A':0.5,'G':0.5})
        self.assertEqual(g.expand(order='AG'), ['A','G'])
        self.assertEqual(g.expand(order='AG', scale=10), list(5*'A'+5*'G'))
        self.assertRaises(ValueError, g.expand, scale=33)

    def test_Count(self):
        """Freqs Count should return correct count (number of categories)"""
        self.assertEqual(self.Alphabetic.Count, 5)
        self.assertEqual(self.NumericDuplicated.Count, 3)
        self.assertEqual(self.Empty.Count, 0)
        self.assertEqual(self.Constant.Count, 1)

    def test_Sum(self):
        """Freqs Sum should return sum of item counts in all categories"""
        self.assertEqual(self.Alphabetic.Sum, 8)
        self.assertEqual(self.NumericUnique.Sum, 5)
        self.assertEqual(self.NumericDuplicated.Sum, 4)
        self.assertEqual(self.Empty.Sum, 0)
        # WARNING: For numeric keys, the value of the key is not taken into
        # account (i.e. each key counts as a separate category)
        self.assertEqual(self.PosNeg.Sum, 4)
        self.assertEqual(self.Constant.Sum, 5)

    def test_SumSquares(self):
        """Freqs SumSquared should return sum of squared freq of each category"""
        self.assertEqual(self.Alphabetic.SumSquares, 16)
        self.assertEqual(self.NumericUnique.SumSquares, 5)
        self.assertEqual(self.NumericDuplicated.SumSquares, 6)
        self.assertEqual(self.Empty.SumSquares, 0)
        self.assertEqual(self.PosNeg.SumSquares, 4)
        self.assertEqual(self.Constant.SumSquares, 25)

    def test_Variance(self):
        """Freqs Variance should return variance of counts in categories"""
        self.assertFloatEqual(self.Alphabetic.Variance, 0.8)
        self.assertFloatEqual(self.NumericUnique.Variance, 0)
        self.assertFloatEqual(self.NumericDuplicated.Variance, 1.0/3)
        self.assertFloatEqual(self.Empty.Variance, None)
        self.assertFloatEqual(self.PosNeg.Variance, 0)
        self.assertEqual(self.Constant.Variance, 0)

    def test_StandardDeviation(self):
        """Freqs StandardDeviation should return stdev of counts in categories"""
        self.assertFloatEqual(self.Alphabetic.StandardDeviation, sqrt(0.8))
        self.assertFloatEqual(self.NumericUnique.StandardDeviation, 0)
        self.assertFloatEqual(self.NumericDuplicated.StandardDeviation, \
            sqrt(1.0/3))
        self.assertFloatEqual(self.Empty.StandardDeviation, None)
        self.assertFloatEqual(self.PosNeg.StandardDeviation, 0)
        self.assertEqual(self.Constant.StandardDeviation, 0)

    def test_Mean(self):
        """Freqs Mean should return mean of counts in categories"""
        self.assertEqual(self.Alphabetic.Mean, 8/5.0)
        self.assertEqual(self.NumericUnique.Mean, 1)
        self.assertEqual(self.NumericDuplicated.Mean, 4/3.0)
        self.assertEqual(self.Empty.Mean, None)
        self.assertEqual(self.PosNeg.Mean, 1)
        self.assertEqual(self.Constant.Mean, 5)

    def test_Uncertainty(self):
        """Freqs Shannon uncertainty values should match spreadsheet"""
        self.assertEqual(self.Empty.Uncertainty, 0)
        self.assertFloatEqual(self.Alphabetic.Uncertainty, 2.1556, eps=1e-4)
        self.assertFloatEqual(self.PosNeg.Uncertainty, 2)
        self.assertFloatEqual(self.NumericDuplicated.Uncertainty, 1.5)
        self.assertFloatEqual(self.NumericUnique.Uncertainty, 2.3219, eps=1e-4)
        self.assertFloatEqual(self.Constant.Uncertainty, 0)

    def test_mode(self):
        """Freqs mode should return most frequent item"""
        self.assertEqual(self.Empty.Mode, None)
        self.assertEqual(self.Alphabetic.Mode, 'a')
        assert(self.PosNeg.Mode in self.PosNeg)
        assert(self.NumericUnique.Mode in self.NumericUnique)
        self.assertEqual(self.NumericDuplicated.Mode, 1.5)
        self.assertEqual(self.Constant.Mode, 1)

    def test_summarize(self):
        """Freqs summarize should return Summary: Count, Sum, SumSquares, Var"""
        self.assertEqual(self.Empty.summarize(), SummaryStatistics())
        s = self.Alphabetic.summarize()
        self.assertEqual(s.Sum, 8)
        self.assertEqual(s.Count, 5)
        self.assertEqual(s.SumSquares, 16)
        self.assertFloatEqual(s.Variance, 0.8)
        self.assertFloatEqual(s.StandardDeviation, sqrt(0.8))
        self.assertFloatEqual(s.Mean, 8.0/5)

    def test_getSortedList(self):
        """Freqs getSortedList should return sorted list of key, val tuples"""
        e = self.Empty
        self.assertEqual(e.getSortedList(), [])
        self.assertEqual(e.getSortedList(descending=True), [])
        self.assertEqual(e.getSortedList(descending=False), [])
        self.assertEqual(e.getSortedList(by_val=True), [])
        self.assertEqual(e.getSortedList(by_val=False), [])
        a = self.Alphabetic
        a['b'] = 5
        self.assertEqual(a.getSortedList(), \
            [('b',5),('a',3),('e',1),('d',1),('c',1)])
        self.assertEqual(a.getSortedList(descending=True), \
            [('b',5),('a',3),('e',1),('d',1),('c',1)])
        self.assertEqual(a.getSortedList(descending=False), \
            [('c',1),('d',1),('e',1),('a',3),('b',5)])
        self.assertEqual(a.getSortedList(by_val=False), \
            [('e',1),('d',1),('c',1),('b',5),('a',3)])
        self.assertEqual(a.getSortedList(by_val=False, descending=False), \
            [('a',3),('b',5),('c',1),('d',1),('e',1)])

class FreqsTests(FreqsTestsI, TestCase):
    """Tests of Freqs-specific behavior, mostly validation."""
    ClassToTest = Freqs

    def setUp(self):
        """defines some standard frequency distributions to check"""
        self.Alphabetic = self.ClassToTest('abcdeaab')
        self.NumericUnique = self.ClassToTest([1,2,3,4,5])
        self.NumericDuplicated = self.ClassToTest([1, 1.5, 1.5, 3.5])
        self.Empty = self.ClassToTest('')
        self.PosNeg = self.ClassToTest([-2, -1, 1, 2])
        self.Constant = self.ClassToTest([1]*5)

    def test_isValid_bad(self):
        """Freqs should reject invalid data, so isValid() always True"""
        self.assertRaises(ConstraintError, self.ClassToTest, {'a':3, 'b':-10})

    def test_init_empty(self):
        """Freqs should initialize OK with empty list"""
        self.assertEqual(self.ClassToTest([]), {})

    def test_init_single(self):
        """Freqs should initialize OK with single item"""
        self.assertEqual(self.ClassToTest(['X']), {'X':1.0})

    def test_init_same_key(self):
        """Freqs should initialize OK with duplicate items"""
        self.assertEqual(self.ClassToTest([1]*5), {1:5})

    def test_init_two_keys(self):
        """Freqs should initialize OK with distinct items"""
        self.assertEqual(self.ClassToTest([0,1,0,0,1]), {1:2,0:3})

    def test_init_strings(self):
        """Freqs should initialize OK with characters in string"""
        self.assertEqual(self.ClassToTest('zabcz'), {'z':2,'a':1,'b':1,'c':1})

    def test_init_fails_negative(self):
        """Freqs init should fail on negative frequencies"""
        self.assertRaises(ConstraintError, self.ClassToTest, {'a':3, 'b':-3})

    def test_init_from_dict(self):
        """Freqs should init OK from dictionary"""
        self.assertEqual(self.ClassToTest({'a':3,'b':2}), {'a':3, 'b':2})

    def test_init_from_dicts(self):
        """Freqs should init OK from list of dicts"""
        self.assertEqual(self.ClassToTest([{'a':1,'b':1}, {'a':2,'b':1}]), \
            {'a':3, 'b':2})

    def test_init_from_strings(self):
        """Freqs should init OK from list of strings"""
        self.assertEqual(self.ClassToTest(['abc','def','abc']), \
            {'abc':2,'def':1})

    def test_init_from_tuples(self):
        """Freqs should init OK from list of key-value pairs"""
        self.assertEqual(self.ClassToTest([('a',3),('b',10),('a',2)]), \
            {'a':5,'b':10})

    def test_init_alphabet_success(self):
        """Freqs should init ok with keys matching alphabet"""
        fd = self.ClassToTest('abc', Constraint='abcd')
        self.assertEqual(fd, {'a':1,'b':1,'c':1})
        self.assertRaises(ConstraintError, fd.setdefault, 'x', 1)
        self.assertRaises(ConstraintError, fd.__setitem__, 'x', 1)

    def test_init_alphabet_failure(self):
        """Freqs should fail if keys don't match alphabet"""
        try:
            f = Freqs('abcd', Constraint='abc')
        except ConstraintError:
            pass
        else:
            self.fail()

    def test_setitem_bad(self):
        """Freqs should not allow negative values"""
        self.assertRaises(ConstraintError, self.Empty.__setitem__, 'xyz', -0.01)

    def test_setdefault_bad(self):
        """Freqs setdefault should fail if default < 0"""
        self.assertRaises(ConstraintError, self.Empty.setdefault, 'a', -1)
        self.assertRaises(ConstraintError, self.Empty.setdefault, 'a', -0.00001)
        self.assertRaises(ConstraintError, self.Empty.setdefault, 'a', "-1")
        self.assertRaises(ConstraintError, self.Empty.setdefault, 'a', "xxxx")

class UnsafeFreqsTests(FreqsTestsI, TestCase):
    """Tests of UnsafeFreqs-specific behavior, mostly validation."""
    ClassToTest = UnsafeFreqs

    def setUp(self):
        """defines some standard frequency distributions to check"""
        self.Alphabetic = self.ClassToTest({'a':3,'b':2,'c':1,'d':1,'e':1})
        self.NumericUnique = self.ClassToTest({'1':1,'2':1,'3':1,'4':1,'5':1})
        self.NumericDuplicated = self.ClassToTest({1:1, 1.5:2, 3.5:1})
        self.Empty = self.ClassToTest({})
        self.PosNeg = self.ClassToTest({-2:1, -1:1, 1:1, 2:1})
        self.Constant = self.ClassToTest({1:5})

    def test_isValid_bad(self):
        """UnsafeFreqs should allow invalid data, returning False for isValid"""
        d = self.ClassToTest({'a':3, 'b':'x'})
        assert not d.isValid()

    def test_init_empty(self):
        """UnsafeFreqs should initialize OK with empty list"""
        self.assertEqual(self.ClassToTest([]), {})

    def test_init_single(self):
        """UnsafeFreqs init FAILS with single item"""
        self.assertRaises(ValueError, self.ClassToTest, ['X'])

    def test_init_same_key(self):
        """UnsafeFreqs init FAILS with list of items"""
        self.assertRaises(TypeError, self.ClassToTest, [1]*5)

    def test_init_strings(self):
        """UnsafeFreqs init FAILS with string"""
        self.assertRaises(ValueError, self.ClassToTest, 'zabcz')

    def test_init_negative(self):
        """UnsafeFreqs init should SUCCEED on negative frequencies"""
        self.assertEqual(self.ClassToTest({'a':3, 'b':-3}), {'a':3,'b':-3})

    def test_init_from_dict(self):
        """UnsafeFreqs should init OK from dictionary"""
        self.assertEqual(self.ClassToTest({'a':3,'b':2}), {'a':3, 'b':2})

    def test_init_from_dicts(self):
        """UnsafeFreqs init should init LIKE A DICT from list of dicts"""
        # WARNING: Note the difference between this and Freqs init!
        self.assertEqual(self.ClassToTest([{'a':1,'b':1}, {'a':2,'b':1}]), \
            {'a':'b'})

    def test_init_from_strings(self):
        """UnsafeFreqs init should FAIL from list of strings"""
        self.assertRaises(ValueError, self.ClassToTest, ['abc','def','abc'])

    def test_init_from_tuples(self):
        """UnsafeFreqs should init LIKE A DICT from list of key-value pairs"""
        # WARNING: Note the difference between this and Freqs init!
self.assertEqual(self.ClassToTest([('a',3),('b',10),('a',2)]), \ {'a':2,'b':10}) class FreqsSubclassTests(TestCase): """Freqs subclassing should work correctly, esp. with RequiredKeys.""" class BaseFreqs(Freqs): RequiredKeys = 'UCAG' def test_init(self): """Freqs subclass init should add RequiredKeys""" b = self.BaseFreqs() self.assertEqual(b, {'U':0.0,'C':0.0,'A':0.0,'G':0.0}) self.assertEqual(self.BaseFreqs('UUCCCCAAAabc'), \ {'U':2, 'C':4, 'A':3, 'a':1, 'b':1, 'c':1, 'G':0}) def test_delitem(self): """Freqs subclass delitem shouldn't allow deletion of RequiredKeys""" b = self.BaseFreqs('AAGCg') self.assertEqual(b, {'A':2,'G':1,'C':1,'U':0,'g':1}) del b['g'] self.assertEqual(b, {'A':2,'G':1,'C':1,'U':0}) self.assertRaises(KeyError, b.__delitem__, 'A') def test_purge(self): """Freqs subclass purge should eliminate anything not in RequiredKeys""" b = self.BaseFreqs('AjaknadjkAjnjndfjndCnjdjsfnfdsjkC32478737&#^&@GGGG') b.purge() self.assertEqual(b, {'A':2,'C':2,'G':4, 'U':0}) b.purge() self.assertEqual(b, {'A':2,'C':2,'G':4, 'U':0}) def test_normalize(self): """Freqs subclass normalize should optionally eliminate non-required keys""" b = self.BaseFreqs('UUUCX') b.normalize(purge=False) self.assertEqual(b, {'U':0.6, 'C':0.2, 'X':0.2, 'A':0, 'G':0}) b.normalize(purge=True) self.assertFloatEqual(b, {'U':0.75, 'C':0.25, 'A':0, 'G':0}) b = self.BaseFreqs() b.normalize() self.assertEqual(b, {'U':0, 'C':0, 'A':0, 'G':0}) class NumberFreqsTestsI(object): """Interface for tests of safe and unsafe NumberFreqs classes.""" ClassToTest = None def setUp(self): """defines some standard frequency distributions to check""" self.NumericUnique = self.ClassToTest([1,2,3,4,5]) self.NumericDuplicated = self.ClassToTest([1, 1.5, 1.5, 3.5]) self.Empty = self.ClassToTest('') self.PosNeg = self.ClassToTest([-2, -1, 1, 2]) self.Constant = self.ClassToTest([1]*5) def test_setitem_good(self): """NumberFreqs should allow non-negative values to be set""" self.Empty[3] = 0 
self.assertEqual(self.Empty, {3:0}) def test_add_good(self): """NumberFreqs should allow addition of counts or items""" self.Empty += [1]*5 self.assertEqual(self.Empty, {1:5}) def test_Mean(self): """NumberFreqs means should match hand-calculated values""" self.assertEqual(self.Empty.Mean, None) self.assertFloatEqual(self.NumericUnique.Mean, 15.0/5) self.assertFloatEqual(self.NumericDuplicated.Mean, 7.5/4) self.assertFloatEqual(self.PosNeg.Mean, 0.0) self.assertFloatEqual(self.Constant.Mean, 1.0) def test_Variance(self): """NumberFreqs variance should match values from R.""" self.assertEqual(None, self.Empty.Variance) self.assertFloatEqual(2.5, self.NumericUnique.Variance) self.assertFloatEqual(1.229167, self.NumericDuplicated.Variance) self.assertFloatEqual(10/3.0, self.PosNeg.Variance) self.assertFloatEqual(0, self.Constant.Variance) def test_Sum(self): """NumberFreqs sums should match hand-calculated values""" self.assertEqual(self.Empty.Sum, None) self.assertFloatEqual(self.NumericUnique.Sum, 15) self.assertFloatEqual(self.NumericDuplicated.Sum, 7.5) self.assertFloatEqual(self.PosNeg.Sum, 0.0) self.assertFloatEqual(self.Constant.Sum, 5.0) def test_Count(self): """NumberFreqs counts should match hand-calculated values""" self.assertEqual(self.NumericUnique.Count, 5) self.assertEqual(self.NumericDuplicated.Count, 4) self.assertEqual(self.Empty.Count, 0) self.assertEqual(self.PosNeg.Count, 4) self.assertEqual(self.Constant.Count, 5) def test_Sumsq(self): """NumberFreqs sum of squares should match spreadsheet""" self.assertEqual(self.Empty.SumSquares, None) self.assertFloatEqual(self.NumericUnique.SumSquares, 55.0) self.assertFloatEqual(self.NumericDuplicated.SumSquares, 17.75) self.assertFloatEqual(self.PosNeg.SumSquares, 10.0) self.assertFloatEqual(self.Constant.SumSquares, 5.0) def test_Stdev(self): """NumberFreqs stdev should match spreadsheet""" self.assertEqual(self.Empty.StandardDeviation, None) 
self.assertFloatEqual(self.NumericUnique.StandardDeviation,1.581139) self.assertFloatEqual(self.NumericDuplicated.StandardDeviation,1.108678) self.assertFloatEqual(self.PosNeg.StandardDeviation, 1.825742) self.assertFloatEqual(self.Constant.StandardDeviation, 0.0) def test_NumberFreqsQuantiles(self): """quantiles should match Numbers, including Median""" data={32: 60L, 33: 211L, 34: 141L, 35: 70L, 36: 26L, 10: 30L, 11: 5L, 18: 43L, 19: 10L, 21: 1L, 22: 1L, 23: 58L, 24: 12L, 25: 3L, 26: 74L, 27: 10L, 28: 77L, 29: 20L, 30: 102L, 31: 47L} nums = Numbers(NumberFreqs(data=data).expand()) number_freqs = self.ClassToTest() number_freqs.update(data) for quantile in numpy.arange(0.05, 0.96, 0.05): num_q = nums.quantile(quantile) num_f = number_freqs.quantile(quantile) self.assertFloatEqual(num_f, num_q) self.assertFloatEqual(number_freqs.Median, nums.Median) def test_normalize(self): """NumberFreqs should allow normalization on any type""" self.Empty.normalize() self.assertEqual(self.Empty, {}) # will refuse to normalize if sum is 0 orig = self.PosNeg.copy() self.PosNeg.normalize() self.assertEqual(self.PosNeg, orig) # will normalize OK if total passed in self.PosNeg.normalize(4) #self.PosNeg.Count) expected = {-2:0.25, -1:0.25, 1:0.25, 2:0.25} for key, val in expected.items(): self.assertFloatEqual(self.PosNeg[key], val) def test_Uncertainty(self): """NumberFreqs Shannon entropy values should match spreadsheet""" self.assertEqual(self.Empty.Uncertainty, 0) self.assertEqual(self.PosNeg.Uncertainty, 2) self.assertEqual(self.NumericDuplicated.Uncertainty, 1.5) self.assertEqual("%6.4f" % self.NumericUnique.Uncertainty, '2.3219') self.assertEqual(self.Constant.Uncertainty, 0) def test_Mode(self): """NumberFreqs mode should return most frequent item""" self.assertEqual(self.Empty.Mode, None) assert(self.PosNeg.Mode in self.PosNeg) assert(self.NumericUnique.Mode in self.NumericUnique) self.assertEqual(self.NumericDuplicated.Mode, 1.5) self.assertEqual(self.Constant.Mode, 1) def 
test_randomSequence_one_item(self): """NumberFreqs randomSequence should work with one key""" self.Constant.normalize() rand = self.Constant.randomSequence(1000) self.assertEqual(rand.count(1), 1000) self.assertEqual(len(rand), 1000) class NumberFreqsTests(NumberFreqsTestsI, TestCase): """Tests of (safe) NumberFreqs classes.""" ClassToTest = NumberFreqs def setUp(self): """defines some standard frequency distributions to check""" self.NumericUnique = self.ClassToTest([1,2,3,4,5]) self.NumericDuplicated = self.ClassToTest([1, 1.5, 1.5, 3.5]) self.Empty = self.ClassToTest() self.PosNeg = self.ClassToTest([-2, -1, 1, 2]) self.Constant = self.ClassToTest([1]*5) def test_setitem_bad(self): """NumberFreqs should not allow non-numeric values""" self.assertRaises(ValueError, self.Empty.__setitem__, 'xyz', -0.01) def test_add_bad(self): """NumberFreqs add should fail if key not numeric""" self.assertRaises(ValueError, self.Empty.__iadd__, {'a':-1}) class UnsafeNumberFreqsTests(NumberFreqsTestsI, TestCase): """Tests of UnsafeNumberFreqs classes.""" ClassToTest = UnsafeNumberFreqs def setUp(self): """defines some standard frequency distributions to check""" self.NumericUnique = self.ClassToTest({1:1,2:1,3:1,4:1,5:1}) self.NumericDuplicated = self.ClassToTest({1:1,1.5:2,3.5:1}) self.Empty = self.ClassToTest() self.PosNeg = self.ClassToTest({-2:1,-1:1,1:1,2:1}) self.Constant = self.ClassToTest({1:5}) #execute tests if called from command line if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/__init__.py000644 000765 000024 00000000554 12024702176 025663 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_util', 'test_get_by_cai', 'test_adaptor', ] __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" 
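The spreadsheet and R values asserted in the Freqs/NumberFreqs tests above (e.g. an uncertainty of 2.3219 bits for five equally frequent items, a variance of 2.5 for [1,2,3,4,5]) can be reproduced with stdlib Python alone. A minimal sketch — the helper names here are illustrative, not part of cogent:

```python
from math import log, sqrt

def shannon_uncertainty(freqs):
    """Shannon entropy (bits) of an {item: count} frequency distribution."""
    total = float(sum(freqs.values()))
    if not total:
        return 0.0
    return -sum((n / total) * log(n / total, 2)
                for n in freqs.values() if n)

def sample_variance(freqs):
    """Unbiased (n - 1 denominator) variance of counted values, as in R."""
    n = sum(freqs.values())
    if n < 2:
        return None
    mean = sum(v * c for v, c in freqs.items()) / float(n)
    return sum(c * (v - mean) ** 2 for v, c in freqs.items()) / (n - 1)

# NumericUnique ([1,2,3,4,5]) from the tests above:
unique = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
print(round(shannon_uncertainty(unique), 4))    # 2.3219
print(sample_variance(unique))                  # 2.5
print(round(sqrt(sample_variance(unique)), 6))  # 1.581139
```

The same helpers reproduce the NumericDuplicated values checked against R above (variance 1.229167, standard deviation 1.108678).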
PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/test_adaptor.py000644 000765 000024 00000001404 12024702176 026610 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests of the basic CAI adaptors.""" from cogent.util.unit_test import TestCase, main import cogent.maths.stats.cai.adaptor __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Prototype" class adaptor_tests(TestCase): """Tests of top-level functionality. NOTE: The adaptors are currently tested in an integration test with the drawing modules in test_draw/test_matplotlib/test_codon_usage. There are not individual unit tests at present, although these should possibly be added later. """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/test_get_by_cai.py000644 000765 000024 00000001421 12024702176 027242 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests of the get_by_cai filter classes.""" from cogent.util.unit_test import TestCase, main import cogent.maths.stats.cai.get_by_cai __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Prototype" class get_by_cai_tests(TestCase): """Tests of top-level functionality. NOTE: The adaptors are currently tested in an integration test with the drawing modules in test_draw/test_matplotlib/test_codon_usage. There are not individual unit tests at present, although these should possibly be added later. 
""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/test_util.py000644 000765 000024 00000035076 12024702176 026147 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests of the basic CAI calculations.""" from cogent.util.unit_test import TestCase, main from math import log, exp from operator import mul from cogent.maths.stats.cai.util import cu, as_rna, synonyms_to_rna, \ get_synonyms, sum_codon_freqs, norm_to_max, arithmetic_mean, \ geometric_mean, codon_adaptiveness_all, codon_adaptiveness_blocks, \ valid_codons, set_min, cai_1, cai_2, cai_3 __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def product(x): return reduce(mul, x) def amean(x): return sum(x)/float(len(x)) def gmean(x): return (product(x))**(1./len(x)) class cai_tests(TestCase): """Tests of top-level functionality.""" def test_as_rna(self): """as_rna should do correct conversion to RNA""" self.assertEqual(as_rna('TCGT'), 'UCGU') def test_synonyms_to_rna(self): """synonyms_to_rna should convert as expected""" s = {'*':['TAA','TAG'], 'F':['TTT','TTC']} self.assertEqual(synonyms_to_rna(s), {'*':['UAA','UAG'],'F':['UUU','UUC']}) def test_synonyms(self): """synonyms should produce correct results for standard genetic code. 
NOTE: for standard genetic code, expect the following: - Stop codons are UGA, UAA, UAG - Single-codon blocks are M = 'AUG', W = 'UGG' """ result = get_synonyms() self.assertEqual(len(result), 18) self.assertEqual(''.join(sorted(result)), 'ACDEFGHIKLNPQRSTVY') self.assertEqual(sorted(result['I']), ['AUA','AUC','AUU']) #check that we can do it without eliminating single-codon blocks result = get_synonyms(singles_removed=False) self.assertEqual(len(result), 20) self.assertEqual(''.join(sorted(result)), 'ACDEFGHIKLMNPQRSTVWY') self.assertEqual(sorted(result['I']), ['AUA','AUC','AUU']) self.assertEqual(result['W'], ['UGG']) def test_sum_codon_freqs(self): """sum_codon_freqs should add list of codon freqs together, incl. missing keys""" d = {'x':3, 'UUU':5, 'UAC':3} d2 = {'y':5, 'UUU':1, 'AGG':2} result = sum_codon_freqs([d,d2]) self.assertEqual(len(result), 64) assert 'x' not in result #should exclude bad keys self.assertEqual(result['UUU'], 6.0) self.assertEqual(result['AGG'], 2.0) self.assertEqual(result['UAC'], 3.0) self.assertEqual(sum(result.values()), 11.0) def test_norm_to_max(self): """norm_to_max should normalize vals in list to best val""" a = [1,2,3,4] self.assertEqual(norm_to_max(a), [.25,.5,.75,1]) def test_arithmetic_mean(self): """arithmetic_mean should average a list of means with freqs""" obs = arithmetic_mean([1,2,3],[2,1,3]) exp = sum([1,1,2,3,3,3])/6.0 self.assertEqual(obs, exp) #should also work without freqs self.assertFloatEqual(arithmetic_mean([1,3,7]), 11/3.) def test_geometric_mean(self): """geometric_mean should average a list of means with freqs""" obs = geometric_mean([1,2,3],[2,1,3]) exp = (1*1*2*3*3*3)**(1/6.) 
self.assertEqual(obs, exp) obs = geometric_mean([0.01, 0.2, 0.5], [5, 2, 3]) exp = (.01*.01*.01*.01*.01*.2*.2*.5*.5*.5)**(0.1) self.assertFloatEqual(obs,exp) #should also work without freqs self.assertFloatEqual(geometric_mean([0.01,0.2,0.5]), (.01*.2*.5)**(1/3.)) def test_codon_adaptiveness_all(self): """codon_adaptiveness_all should normalize all codons relative to the best one.""" codons = {'x':4, 'y':3, 'z':2, 'zz':0, 'zzz':4} result = codon_adaptiveness_all(codons) self.assertEqual(result, {'x':1., 'y':.75, 'z':.5, 'zz':0, 'zzz':1.}) def test_codon_adaptiveness_blocks(self): """codon_adaptiveness_blocks should normalize codons by the best in each block""" codons = {'x':4, 'y':1, 'z':2, 'zz':0, 'zzz':2, 'zzzz':1} blocks = {'A': ['x','y','z'], 'B':['zz','zzz','zzzz']} result = codon_adaptiveness_blocks(codons, blocks) self.assertEqual(result, {'x':1., 'y':.25, 'z':.5, 'zz':0, 'zzz':1., 'zzzz':.5}) def test_set_min(self): """set_min should set minimum value to specified threshold.""" codons = {'x':4, 'y':1e-5, 'z':0} set_min(codons, 1) self.assertEqual(codons, {'x':4, 'y':1, 'z':1}) def test_valid_codons(self): """valid_codons should extract all valid codons from blocks""" blocks = {'A':['GCA','GCG'], 'C':['UGU','UGC']} self.assertEqual(list(sorted(valid_codons(blocks))), ['GCA','GCG','UGC','UGU']) def test_cai_1(self): """cai_1 should produce expected results""" ref_freqs = cu.copy() ref_freqs.update({'AGA':4, 'AGG':2, 'CCC':4, 'CCA':1, 'UGG':1}) #tests with arithmetic mean gene_freqs = {'AGA':1} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 1) gene_freqs = {'AGA':5} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 1) gene_freqs = {'AGG':5} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 0.5) gene_freqs = {'AGG':5,'AGA':5} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 0.75) gene_freqs={'AGA':5,'CCC':1} self.assertEqual(cai_1(ref_freqs, gene_freqs, 
average=arithmetic_mean), 1) gene_freqs={'AGA':5,'CCA':5} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 0.625) ref_freqs_2 = cu.copy() ref_freqs_2.update({'AGA':4, 'AGG':2, 'CCC':5, 'CCA':1, 'UGG':1}) ref_freqs_2.update({'UUU':2,'UUC':1}) gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2} obs = cai_1(ref_freqs_2, gene_freqs, average=arithmetic_mean) vals = [.8,.8,.8,.4,1,1,.2,.4,.2,.2] expect = sum(vals)/len(vals) self.assertFloatEqual(obs, expect) #tests with geometric mean gene_freqs = {'AGA':1} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs = {'AGA':5} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs = {'AGG':5} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), 0.5) gene_freqs = {'AGG':5,'AGA':5} self.assertFloatEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), \ (1**5 * 0.5**5)**(0.1)) gene_freqs={'AGA':5,'CCC':1} self.assertEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs={'AGA':5,'CCA':5} self.assertFloatEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), \ (1**5 * 0.25**5)**0.1) ref_freqs_2 = cu.copy() ref_freqs_2.update({'AGA':4, 'AGG':2, 'CCC':5, 'CCA':1, 'UGG':1}) ref_freqs_2.update({'UUU':2,'UUC':1}) gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2} obs = cai_1(ref_freqs_2, gene_freqs, average=geometric_mean) vals = [.8,.8,.8,.4,1,1,.2,.4,.2,.2] expect = (product(vals))**(1./len(vals)) self.assertFloatEqual(obs, expect) def test_cai_2(self): """cai_2 should produce expected results""" ref_freqs = cu.copy() ref_freqs.update({'AGA':4, 'AGG':2, 'CCC':5, 'CCA':1, 'UGG':1}) #tests with arithmetic mean gene_freqs = {'AGA':1} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 1) gene_freqs = {'AGA':5} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 1) gene_freqs = {'AGG':5} self.assertEqual(cai_2(ref_freqs, gene_freqs, 
average=arithmetic_mean), 0.5) gene_freqs = {'AGG':5,'AGA':5} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 0.75) gene_freqs={'AGA':5,'CCC':1} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 1) gene_freqs={'AGA':5,'CCA':5} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 0.6) ref_freqs_2 = ref_freqs.copy() ref_freqs_2.update({'UUU':2,'UUC':1}) gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2} obs = cai_2(ref_freqs_2, gene_freqs, average=arithmetic_mean) vals = [1,1,1,.5,1,1,.2,1,.5,.5] expect = sum(vals)/len(vals) self.assertEqual(obs, expect) #tests with geometric mean gene_freqs = {'AGA':1} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs = {'AGA':5} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs = {'AGG':5} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), 0.5) gene_freqs = {'AGG':5,'AGA':5} self.assertFloatEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), \ (1**5 * 0.5**5)**(0.1)) gene_freqs={'AGA':5,'CCC':1} self.assertEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs={'AGA':5,'CCA':5} self.assertFloatEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), \ (1**5 * 0.2**5)**0.1) ref_freqs_2 = ref_freqs.copy() ref_freqs_2.update({'UUU':2,'UUC':1}) gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2} obs = cai_2(ref_freqs_2, gene_freqs, average=geometric_mean) vals = [1,1,1,.5,1,1,.2,1,.5,.5] expect = (product(vals))**(1./len(vals)) self.assertEqual(obs, expect) #test that results match example on Gang Wu's CAI calculator page ref_freqs = cu.copy() ref_freqs.update({'UUU':78743, 'UUC':56591, 'UUA':51320, 'UUG':45581, \ 'CUU':42704, 'CUC':35873, 'CUA':15275, 'CUG':168885}) gene_freqs={'UUU':6, 'UUC':3, 'CUU':3, 'CUC':2, 'CUG':8} self.assertFloatEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), \ exp((6*log(1) + 
3*log(56591./78743) + 3*log(42704./168885) + \ 2*log(35873./168885)+8*log(1))/22.)) def test_cai_3(self): """cai_3 should produce expected results""" ref_freqs = cu.copy() ref_freqs.update({'AGA':4, 'AGG':2, 'CCC':5, 'CCA':1, 'UGG':1}) #tests with arithmetic mean gene_freqs = {'AGA':1} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 1) gene_freqs = {'AGA':5} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 1) gene_freqs = {'AGG':5} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 0.5) gene_freqs = {'AGG':5,'AGA':5} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 0.75) gene_freqs={'AGA':5,'CCC':1} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 1) gene_freqs={'AGA':5,'CCA':5} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 0.6) ref_freqs_2 = ref_freqs.copy() ref_freqs_2.update({'UUU':2,'UUC':1}) gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2} obs = cai_3(ref_freqs_2, gene_freqs, average=arithmetic_mean) family_vals = [[1,1,1,.5],[1,1,.2],[1,.5,.5]] family_averages = map(amean, family_vals) expect = amean(family_averages) self.assertEqual(obs, expect) #tests with geometric mean gene_freqs = {'AGA':1} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs = {'AGA':5} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs = {'AGG':5} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), 0.5) gene_freqs = {'AGG':5,'AGA':5} self.assertFloatEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), \ (1**5 * 0.5**5)**(0.1)) gene_freqs={'AGA':5,'CCC':1} self.assertEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), 1) gene_freqs={'AGA':5,'CCA':5} self.assertFloatEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), \ (1**5 * 0.2**5)**0.1) ref_freqs_2 = ref_freqs.copy() ref_freqs_2.update({'UUU':2,'UUC':1}) gene_freqs = 
{'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2} obs = cai_3(ref_freqs_2, gene_freqs, average=geometric_mean) family_vals = [[1,1,1,.5],[1,1,.2],[1,.5,.5]] family_averages = map(gmean, family_vals) expect = gmean(family_averages) self.assertEqual(obs, expect) #tests with Eyre-Walker's variant -- should be same as geometric mean gene_freqs = {'AGA':1} self.assertEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), 1) gene_freqs = {'AGA':5} self.assertEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), 1) gene_freqs = {'AGG':5} self.assertEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), 0.5) gene_freqs = {'AGG':5,'AGA':5} self.assertFloatEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), \ (1**5 * 0.5**5)**(0.1)) gene_freqs={'AGA':5,'CCC':1} self.assertEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), 1) gene_freqs={'AGA':5,'CCA':5} self.assertFloatEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), \ (1**5 * 0.2**5)**0.1) ref_freqs_2 = ref_freqs.copy() ref_freqs_2.update({'UUU':2,'UUC':1}) gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2} obs = cai_3(ref_freqs_2, gene_freqs, average='eyre_walker') family_vals = [[1,1,1,.5],[1,1,.2],[1,.5,.5]] family_averages = map(gmean, family_vals) expect = gmean(family_averages) self.assertEqual(obs, expect) #test results for Gang Wu's example (unfortunately, no worked example for #this model) ref_freqs = cu.copy() ref_freqs.update({'UUU':78743, 'UUC':56591, 'UUA':51320, 'UUG':45581, \ 'CUU':42704, 'CUC':35873, 'CUA':15275, 'CUG':168885}) gene_freqs={'UUU':6, 'UUC':3, 'CUU':3, 'CUC':2, 'CUG':8} obs = cai_3(ref_freqs, gene_freqs, average=geometric_mean) family_vals = [6*[1]+3*[56591./78743],\ 3*[42704./168885] + 2*[35873./168885]+8*[1]] family_averages = map(gmean, family_vals) expect = gmean(family_averages) self.assertFloatEqual(obs, expect) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_spatial/__init__.py000644 000765 000024 00000000450 
12024702176 024362 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_ckd3'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" PyCogent-1.5.3/tests/test_maths/test_spatial/test_ckd3.py000644 000765 000024 00000003626 12024702176 024516 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os, tempfile import numpy as np try: from cogent.util.unit_test import TestCase, main except ImportError: from zenpdb.cogent.util.unit_test import TestCase, main __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" class KDTreeTest(TestCase): """Tests KD-Trees""" def setUp(self): self.arr = np.random.random(3000).reshape((1000, 3)) self.point = np.random.random(3) self.center = np.array([0.5, 0.5, 0.5]) def test_0import(self): # sort by name """tests if can import ckd3 cython extension.""" global ckd3 from cogent.maths.spatial import ckd3 assert 'KDTree' in dir(ckd3) def test_instance(self): """check if KDTree instance behaves correctly.""" kdt = ckd3.KDTree(self.arr) self.assertEquals(kdt.dims, 3) def assig(): kdt.dims = 4 self.assertRaises(AttributeError, assig) self.assertEquals(kdt.dims, 3) self.assertEquals(kdt.pnts, 1000) def test_knn(self): """testing k-nearest neighbors. """ sqd = np.sum(np.power((self.arr - self.point), 2), axis=1) sorted_idx = sqd.argsort() kdt = ckd3.KDTree(self.arr) points, dists = kdt.knn(self.point, 5) self.assertEqualItems(sorted_idx[:5], points) def test_rn(self): """testing neighbors within radius. 
""" sqd = np.sum(np.power((self.arr - self.point), 2), axis=1) sqd = sqd[sqd <= 0.05] sqd.sort() kdt = ckd3.KDTree(self.arr) points, dists = kdt.rn(self.point, 0.05) dists.sort() self.assertFloatEqual(dists, sqd) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_maths/test_matrix/__init__.py000644 000765 000024 00000000440 12024702176 024230 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_distance'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_maths/test_matrix/test_distance.py000644 000765 000024 00000037563 12024702176 025342 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for distance matrices. """ from cogent.util.unit_test import TestCase, main from cogent.maths.matrix.distance import DistanceMatrix from cogent.util.dict2d import largest, Dict2DError, Dict2DSparseError from cogent.parse.aaindex import AAIndex1Record from cogent.maths.stats.util import Freqs from copy import deepcopy __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "caporaso@colorado.edu" __status__ = "Production" class DistanceMatrixTests(TestCase): def setUp(self): # v : vector # m : matrix self.default_keys = list('ACDEFGHIKLMNPQRSTVWY') # Set up some matrices v1 = {'A':1, 'B':2, 'C':3} v2 = {'A':4, 'B':5, 'C':6} v3 = {'A':7, 'B':8, 'C':9} self.m1 = {'A':dict(v1),\ 'B':dict(v2),\ 'C':dict(v3)} v4 = {'A':0, 'B':1, 'C':5} v5 = {'A':5, 'B':0, 'C':4, 'X':99} v6 = {'A':5, 'B':8, 'C':0} self.m2 = {'A':dict(v4),\ 'B':dict(v5),\ 'C':dict(v6)} self.matrices = [self.m1,self.m2] aar_data = dict(zip(self.default_keys, [i*.15 for i in range(20)])) # Setup a 
AAIndex1Record for testing purposes self.aar = AAIndex1Record("5", "Some Info",\ "25", "Greg", "A test",\ "something", "This is a test, this is only a test",\ [0.987, 0.783, 1., 0], aar_data) # From test_Dict2D, used in tests at end of this file for # inheritance testing self.empty = {} self.single_same = {'a':{'a':2}} self.single_diff = {'a':{'b':3}} self.square = { 'a':{'a':1,'b':2,'c':3}, 'b':{'a':2,'b':4,'c':6}, 'c':{'a':3,'b':6,'c':9}, } self.top_triangle = { 'a':{'a':1, 'b':2, 'c':3}, 'b':{'b':4, 'c':6}, 'c':{'c':9} } self.bottom_triangle = { 'b':{'a':2}, 'c':{'a':3, 'b':6} } self.sparse = { 'a':{'a':1, 'c':3}, 'd':{'b':2}, } self.dense = { 'a':{'a':1,'b':2,'c':3}, 'b':{'a':2,'b':4,'c':6}, } def test_all_init_parameters(self): """ All parameters to init are handled correctly """ # will fail if any parameters are not recognized d = DistanceMatrix() d = DistanceMatrix(data={}) d = DistanceMatrix(RowOrder=[]) d = DistanceMatrix(ColOrder=[]) d = DistanceMatrix(Pad=True) d = DistanceMatrix(Default=42) d = DistanceMatrix(data={},RowOrder=[],ColOrder=[],Pad=True,Default=42) def test_attribute_init(self): """ Proper initialization of all attributes """ # proper setting to defaults d = DistanceMatrix(data={'a':{'a':1}}) self.assertEqual(d.RowOrder, self.default_keys) self.assertEqual(d.ColOrder, self.default_keys) self.assertEqual(d.Pad, True) self.assertEqual(d.Default, None) self.assertEqual(d.RowConstructor, dict) # differ from defaults d = DistanceMatrix(data={'a':{'b':1}},RowOrder=['a'],\ ColOrder=['b'],Pad=False,Default=42,RowConstructor=Freqs) self.assertEqual(d.RowOrder, ['a']) self.assertEqual(d.ColOrder, ['b']) self.assertEqual(d.Pad, False) self.assertEqual(d.Default, 42) self.assertEqual(d.RowConstructor, Freqs) # differ from defaults and no data d = DistanceMatrix(RowOrder=['a'],\ ColOrder=['b'],Pad=False,Default=42,RowConstructor=Freqs) self.assertEqual(d.RowOrder, ['a']) self.assertEqual(d.ColOrder, ['b']) self.assertEqual(d.Pad, False) 
        self.assertEqual(d.Default, 42)
        self.assertEqual(d.RowConstructor, Freqs)

    def test_Order_defaults(self):
        """ RowOrder and ColOrder are set to default as expected """
        for m in self.matrices:
            dm = DistanceMatrix(data=m)
            self.assertEqual(dm.RowOrder, self.default_keys)
            self.assertEqual(dm.ColOrder, self.default_keys)

    def test_Order_parameters(self):
        """ RowOrder and ColOrder are set to parameters as expected """
        row_order = ['a']
        col_order = ['b']
        for m in self.matrices:
            dm = DistanceMatrix(data=m, RowOrder=row_order,
                ColOrder=col_order)
            self.assertEqual(dm.RowOrder, row_order)
            self.assertEqual(dm.ColOrder, col_order)

    def test_rowKeys(self):
        """ rowKeys functions properly """
        dm = DistanceMatrix(data={'a':{'b':1}})
        goal = self.default_keys + ['a']
        goal.sort()
        actual = dm.rowKeys()
        actual.sort()
        self.assertEqual(actual, goal)

    def test_colKeys(self):
        """ colKeys functions properly """
        dm = DistanceMatrix(data={'a':{'b':1}})
        goal = self.default_keys + ['b']
        goal.sort()
        actual = dm.colKeys()
        actual.sort()
        self.assertEqual(actual, goal)

    def test_sharedColKeys(self):
        """ sharedColKeys functions properly """
        # no shared keys b/c a is not in RowOrder and therefore not padded
        dm = DistanceMatrix(data={'a':{'b':1}})
        self.assertEqual(dm.sharedColKeys(), [])
        # shared should be only self.default_keys b/c 'b' not in ColOrder
        dm = DistanceMatrix(data={'a':{'b':1}},
            RowOrder=self.default_keys + ['a'])
        actual = dm.sharedColKeys()
        actual.sort()
        self.assertEqual(actual, self.default_keys)
        # shared should be self.default_keys + 'b'
        dm = DistanceMatrix(data={'a':{'b':1}},
            RowOrder=self.default_keys + ['a'],
            ColOrder=self.default_keys + ['b'])
        actual = dm.sharedColKeys()
        actual.sort()
        self.assertEqual(actual, self.default_keys + ['b'])

    def test_default_padding(self):
        """ Default padding functions as expected """
        for m in self.matrices:
            dm = DistanceMatrix(data=m)
            for r in self.default_keys:
                for c in self.default_keys:
                    dm[r][c]

    def test_init_data_types(self):
        """ Correct init from varying
data types """ # No data goal = {}.fromkeys(self.default_keys) for r in goal: goal[r] = {}.fromkeys(self.default_keys) dm = DistanceMatrix() self.assertEqual(dm,goal) # data is dict of dicts dm = DistanceMatrix(data={'a':{'b':1}}, Pad=False) self.assertEqual(dm,{'a':{'b':1}}) # data is list of lists dm = DistanceMatrix(data=[[1]],RowOrder=['a'],ColOrder=['b'], Pad=False) self.assertEqual(dm,{'a':{'b':1}}) # data is in Indices form dm = DistanceMatrix(data=[('a','b',1)], Pad=False) self.assertEqual(dm,{'a':{'b':1}}) def test_sparse_init(self): """ Init correctly from a sparse dict """ d = DistanceMatrix(data={'A':{'C':0.}}) for r in self.default_keys: for c in self.default_keys: if (r == 'A') and (c == 'C'): self.assertEqual(d[r][c],0.) else: self.assertEqual(d[r][c],None) def test_dict_integrity(self): """ Integrity of key -> value pairs """ for m in self.matrices: dm = DistanceMatrix(data=m) self.assertEqual(dm['A']['A'], m['A']['A']) self.assertEqual(dm['B']['C'], m['B']['C']) def test_attribute_forwarder_integrity(self): """ Integrity of attribute forwarding """ dm = DistanceMatrix(data=self.m2,info=self.aar) self.assertEqual(dm.ID, '5') self.assertEqual(dm.Correlating, [0.987, 0.783, 1., 0]) self.assertEqual(dm.Data['C'], 0.15) def test_copy(self): """ Copy functions as expected""" dm = DistanceMatrix(data=self.m2, RowOrder=self.m2.keys(), info=self.aar) c = dm.copy() self.assertEqual(c['A']['A'],dm['A']['A']) self.assertEqual(c.RowOrder,dm.RowOrder) self.assertEqual(c.ColOrder,dm.ColOrder) self.assertEqual(c.Pad,dm.Pad) self.assertEqual(c.Power,dm.Power) # Make sure it's a separate object c['A']['A'] = 999 self.assertNotEqual(c['A']['A'],dm['A']['A']) def test_attribute_forwarder_integrity_after_copy(self): """ Integrity of attribute forwarding following a copy()""" dm = DistanceMatrix(data=self.m2, RowOrder=self.m2.keys(), info=self.aar) c = dm.copy() # dm.ID == '5' self.assertEqual(c.ID, dm.ID) self.assertEqual(c.Correlating, dm.Correlating) 
        self.assertEqual(c.Data['R'], dm.Data['R'])
        c.ID = '0'
        self.assertNotEqual(c.ID, dm.ID)

    def test_setDiag(self):
        """ setDiag works as expected """
        for m in self.matrices:
            # create a deep copy so we can test against the original
            # matrix without it being affected by altering the object
            # based on it
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=m.keys())
            # set diag to 42
            dm.setDiag(42)
            # test that diag is 42
            for k in dm:
                self.assertEqual(dm[k][k], 42)
            # test that off-diagonal entries are unchanged
            self.assertEqual(dm['B']['A'], m['B']['A'])
            self.assertEqual(dm['B']['C'], m['B']['C'])

    def test_scale(self):
        """ Scale correctly applies function to all elements """
        for m in self.matrices:
            # Test square all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=m.keys(), Pad=False)
            dm.scale(lambda x: x**2)
            self.assertEqual(dm['A']['A'], m['A']['A']**2)
            self.assertEqual(dm['B']['A'], m['B']['A']**2)
            self.assertEqual(dm['B']['C'], m['B']['C']**2)
            # Test cube all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=m.keys(), Pad=False)
            dm.scale(lambda x: x**3)
            self.assertEqual(dm['A']['A'], m['A']['A']**3)
            self.assertEqual(dm['B']['A'], m['B']['A']**3)
            self.assertEqual(dm['B']['C'], m['B']['C']**3)
            # Test linearize all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=m.keys(), Pad=False)
            dm.scale(lambda x: 10**-(x/10.0))
            self.assertFloatEqual(dm['A']['A'], 10**-(m['A']['A']/10.))
            self.assertFloatEqual(dm['B']['A'], 10**-(m['B']['A']/10.))
            self.assertFloatEqual(dm['B']['C'], 10**-(m['B']['C']/10.))

    def test_elementPow_valid(self):
        """ elementPow correctly scales all elements and updates self.Power"""
        for m in self.matrices:
            # Test square all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(), ColOrder=n.keys(),
                Pad=False)
            dm.elementPow(2)
            self.assertEqual(dm.Power, 2)
            self.assertEqual(dm['A']['A'], m['A']['A']**2)
            self.assertEqual(dm['B']['A'], m['B']['A']**2)
            self.assertEqual(dm['B']['C'], m['B']['C']**2)
            # Test cube then square root of all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(), ColOrder=n.keys(),
                Pad=False)
            dm.elementPow(3)
            dm.elementPow(1./2.)
            self.assertEqual(dm.Power, 3./2.)
            self.assertEqual(dm['A']['A'], m['A']['A']**(3./2.))
            self.assertEqual(dm['B']['A'], m['B']['A']**(3./2.))
            self.assertEqual(dm['B']['C'], m['B']['C']**(3./2.))

    def test_elementPow_ignore_invalid(self):
        """ elementPow correctly detects and ignores invalid data"""
        for m in self.matrices:
            # Test square all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(), ColOrder=n.keys(),
                Pad=False)
            dm['A']['A'] = 'p'
            dm.elementPow(2)
            self.assertEqual(dm.Power, 2.)
            self.assertEqual(dm['A']['A'], 'p')
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(), ColOrder=n.keys(),
                Pad=False)
            dm['A']['A'] = None
            dm.elementPow(2)
            self.assertEqual(dm.Power, 2.)
            self.assertEqual(dm['A']['A'], None)

    def test_elementPow_error_on_invalid(self):
        """ elementPow correctly raises error on invalid data"""
        for m in self.matrices:
            # Test square all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(), ColOrder=n.keys(),
                Pad=False)
            dm['A']['A'] = 'p'
            self.assertRaises(TypeError, dm.elementPow, 2,
                ignore_invalid=False)
            dm['A']['A'] = None
            self.assertRaises(TypeError, dm.elementPow, 2,
                ignore_invalid=False)

    def test_elementPow_invalid_pow(self):
        """ elementPow correctly raises error on invalid power """
        for m in self.matrices:
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(), ColOrder=n.keys(),
                Pad=False)
            self.assertRaises(TypeError, dm.elementPow, None,
                ignore_invalid=False)
            self.assertRaises(TypeError, dm.elementPow, 'a',
                ignore_invalid=False)

    def test_transpose(self):
        """ transpose functions as expected """
        for m in self.matrices:
            d = DistanceMatrix(data=m)
            t = d.copy()
            t.transpose()
            # Note, this line will fail on a matrix where transpose == original
            self.assertNotEqual(t, d)
            for r in t:
                for c in t[r]:
                    self.assertEqual(t[r][c], d[c][r])
            t.transpose()
            self.assertEqual(t, d)

    def test_reflect(self):
        """ reflect functions as expected """
        for m in self.matrices:
            d = DistanceMatrix(data=m)
            n = d.copy()
            # Only testing one method; all others are tested in the
            # superclass, so redundant testing is probably not necessary
            n.reflect(method=largest)
            for r in d.RowOrder:
                for c in d.ColOrder:
                    if d[r][c] > d[c][r]:
                        goal = d[r][c]
                    else:
                        goal = d[c][r]
                    self.assertEqual(n[r][c], goal)
                    self.assertEqual(n[c][r], goal)

    ######
    # Following tests copied (and slightly modified) from test_Dict2D and
    # written by Rob Knight. Intended to test inheritance
    #####

    def test_toDelimited(self):
        """DistanceMatrix toDelimited functions as expected"""
        d = DistanceMatrix(self.square, Pad=False)
        d.RowOrder = d.ColOrder = 'abc'
        self.assertEqual(d.toDelimited(),
            '-\ta\tb\tc\na\t1\t2\t3\nb\t2\t4\t6\nc\t3\t6\t9')
        self.assertEqual(d.toDelimited(headers=False),
            '1\t2\t3\n2\t4\t6\n3\t6\t9')
        # set up a custom formatter...
        def my_formatter(x):
            try:
                return '%1.1f' % x
            except:
                return str(x)
        # ...and use it
        self.assertEqual(d.toDelimited(headers=True, item_delimiter='x',
            row_delimiter='y', formatter=my_formatter),
            '-xaxbxcyax1.0x2.0x3.0ybx2.0x4.0x6.0ycx3.0x6.0x9.0')

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_format/__init__.py
#!/usr/bin/env python
__all__ = ['test_mage', 'test_fasta', 'test_pdb_color', 'test_xyzrn']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Gavin Huttley", "Sandra Smit",
               "Marcin Cieslik", "Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

PyCogent-1.5.3/tests/test_format/test_bedgraph.py
#!/usr/bin/env python
"""Unit tests for bedgraph format writer."""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.util.table import Table
from cogent.format.bedgraph import get_header

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

class FormatBedgraph(TestCase):
    def test_only_required_columns(self):
        """generate bedgraph from minimal data"""
        table = Table(header=['chrom', 'start', 'end', 'value'],
            rows=[['1', 100, i, 0] for i in range(101, 111)] +
                 [['1', 150, i, 10] for i in range(151, 161)])
        bgraph = table.tostring(format='bedgraph', name='test track',
            description='test of bedgraph', color=(255, 0, 0))
        self.assertTrue(bgraph,
            '\n'.join(['track type=bedGraph name="test track" '
                + 'description="test of bedgraph" color=255,0,0',
                '1\t100\t110\t0', '1\t150\t160\t10']))

    def test_merged_overlapping_spans(self):
        """bedgraph merged overlapping spans, one chrom"""
        rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
               [['1', i, i+1, 10] for i in range(150, 161)]
        table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
        bgraph = table.tostring(format='bedgraph', name='test track',
            description='test of bedgraph', color=(255, 0, 0))
        self.assertTrue(bgraph,
            '\n'.join(['track type=bedGraph name="test track" '
                + 'description="test of bedgraph" color=255,0,0',
                '1\t100\t120\t0', '1\t150\t160\t10']))

    def test_merged_overlapping_spans_multichrom(self):
        """bedgraph merged overlapping spans, two chroms"""
        rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
               [['1', i, i+1, 10] for i in range(150, 161)]
        rows += [['2', i, i+1, 0] for i in range(100, 121)]
        table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
        bgraph = table.tostring(format='bedgraph', name='test track',
            description='test of bedgraph', color=(255, 0, 0))
        self.assertTrue(bgraph,
            '\n'.join(['track type=bedGraph name="test track" '
                + 'description="test of bedgraph" color=255,0,0',
                '1\t100\t120\t1', '1\t150\t160\t10', '2\t105\t120\t1']))

    def test_invalid_args_fail(self):
        """incorrect bedgraph args causes RuntimeError"""
        rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
               [['1', i, i+1, 10] for i in range(150, 161)]
        table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
        self.assertRaises(RuntimeError, table.tostring, format='bedgraph',
            name='test track', description='test of bedgraph',
            color=(255, 0, 0), abc=None)

    def test_invalid_table_fails(self):
        """assertion error if table has > 4 columns"""
        rows = [['1', i, i+1, 0, 1] for i in range(100, 121)] +\
               [['1', i, i+1, 10, 1] for i in range(150, 161)]
        table = Table(header=['chrom', 'start', 'end', 'value', 'blah'],
            rows=rows)
        self.assertRaises(AssertionError, table.tostring, format='bedgraph',
            name='test track', description='test of bedgraph',
            color=(255, 0, 0), abc=None)

    def test_boolean_correctly_formatted(self):
        """boolean setting correctly formatted"""
        rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
               [['1', i, i+1, 10] for i in range(150, 161)]
        table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
        bgraph = table.tostring(format='bedgraph', name='test track',
            description='test of bedgraph', color=(255, 0, 0),
            autoScale=True)
        self.assertTrue(bgraph,
            '\n'.join(['track type=bedGraph name="test track" '
                + 'description="test of bedgraph" color=255,0,0 autoScale=on',
                '1\t100\t110\t1', '1\t150\t160\t10']))

    def test_int_correctly_formatted(self):
        """int should be correctly formatted"""
        rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
               [['1', i, i+1, 10] for i in range(150, 161)]
        table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
        bgraph = table.tostring(format='bedgraph', name='test track',
            description='test of bedgraph', color=(255, 0, 0),
            smoothingWindow=10)
        self.assertTrue(bgraph,
            '\n'.join(['track type=bedGraph name="test track" '
                + 'description="test of bedgraph" color=255,0,0 smoothingWindow=10',
                '1\t100\t110\t1', '1\t150\t160\t10']))

    def test_raises_on_incorrect_format_val(self):
        """raise AssertionError when provide incorrect format value"""
        rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
               [['1', i, i+1, 10] for i in range(150, 161)]
        table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
        self.assertRaises(AssertionError, table.tostring, format='bedgraph',
            name='test track', description='test of bedgraph',
            color=(255, 0, 0), windowingFunction='sqrt')

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_format/test_clustal.py
#!/usr/bin/env python
"""Tests for Clustal sequence format writer.
self.alignment_object = Alignment(self.alignment_dict) self.alignment_order = ['2nd','4th','3rd','1st'] self.alignment_object.RowOrder=self.alignment_order self.clustal_with_label=\ """CLUSTAL 1st AAAA 2nd CCCC 3rd GGGG 4th UUUU """ self.clustal_with_label_lw2=\ """CLUSTAL 1st AA 2nd CC 3rd GG 4th UU 1st AA 2nd CC 3rd GG 4th UU """ self.clustal_with_label_reordered=\ """CLUSTAL 2nd CCCC 4th UUUU 3rd GGGG 1st AAAA """ self.clustal_with_label_lw2_reordered=\ """CLUSTAL 2nd CC 4th UU 3rd GG 1st AA 2nd CC 4th UU 3rd GG 1st AA """ def test_clustal_from_alignment_unaligned(self): """should raise error with unaligned seqs.""" self.assertRaises(ValueError,\ clustal_from_alignment,self.unaligned_dict) def test_clustal_from_alignment(self): """should return correct clustal string.""" self.assertEqual(clustal_from_alignment({}),'') self.assertEqual(clustal_from_alignment(self.alignment_dict),\ self.clustal_with_label) self.assertEqual(clustal_from_alignment(self.alignment_dict, interleave_len=2),self.clustal_with_label_lw2) def test_clustal_from_alignment_reordered(self): """should return correct clustal string.""" self.assertEqual(clustal_from_alignment(self.alignment_object),\ self.clustal_with_label_reordered) self.assertEqual(clustal_from_alignment(self.alignment_object, interleave_len=2),self.clustal_with_label_lw2_reordered) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_format/test_fasta.py000644 000765 000024 00000006207 12024702176 022446 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for FASTA sequence format writer. 
""" from cogent.util.unit_test import TestCase, main from cogent.format.fasta import fasta_from_sequences, fasta_from_alignment from cogent.core.alignment import Alignment from cogent.core.sequence import Sequence from cogent.core.info import Info __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" class FastaTests(TestCase): """Tests for Fasta writer. """ def setUp(self): """Setup for Fasta tests.""" self.strings = ['AAAA','CCCC','gggg','uuuu'] self.labels = ['1st','2nd','3rd','4th'] self.infos = ["Dog", "Cat", "Mouse", "Rat"] self.sequences_with_labels = map(Sequence, self.strings) self.sequences_with_names = map(Sequence, self.strings) for l,sl,sn in zip(self.labels,self.sequences_with_labels,\ self.sequences_with_names): sl.Label = l sn.Name = l self.fasta_no_label='>0\nAAAA\n>1\nCCCC\n>2\ngggg\n>3\nuuuu' self.fasta_with_label=\ '>1st\nAAAA\n>2nd\nCCCC\n>3rd\nGGGG\n>4th\nUUUU' self.fasta_with_label_lw2=\ '>1st\nAA\nAA\n>2nd\nCC\nCC\n>3rd\nGG\nGG\n>4th\nUU\nUU' self.alignment_dict = {'1st':'AAAA','2nd':'CCCC','3rd':'GGGG', '4th':'UUUU'} self.alignment_object = Alignment(self.alignment_dict) for label, info in zip(self.labels, self.infos): self.alignment_object.NamedSeqs[label].Info = Info(species=info) self.fasta_with_label_species=\ '>1st:Dog\nAAAA\n>2nd:Cat\nCCCC\n>3rd:Mouse\nGGGG\n>4th:Rat\nUUUU' self.alignment_object.RowOrder = ['1st','2nd','3rd','4th'] def test_fastaFromSequence(self): """should return correct fasta string.""" self.assertEqual(fasta_from_sequences(''),'') self.assertEqual(fasta_from_sequences(self.strings),\ self.fasta_no_label) self.assertEqual(fasta_from_sequences(self.sequences_with_labels),\ self.fasta_with_label) self.assertEqual(fasta_from_sequences(self.sequences_with_names),\ 
self.fasta_with_label) make_seqlabel = lambda seq: "%s:%s" % (seq.Name, seq.Info.species) seqs = [self.alignment_object.NamedSeqs[label] for label in self.labels] self.assertEqual(fasta_from_sequences(seqs, make_seqlabel=make_seqlabel), self.fasta_with_label_species) def test_fasta_from_alignment(self): """should return correct fasta string.""" self.assertEqual(fasta_from_alignment({}),'') self.assertEqual(fasta_from_alignment(self.alignment_dict),\ self.fasta_with_label) self.assertEqual(fasta_from_alignment(self.alignment_dict, line_wrap=2),self.fasta_with_label_lw2) self.assertEqual(fasta_from_alignment(self.alignment_object),\ self.fasta_with_label) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_format/test_mage.py000644 000765 000024 00000050443 12024702176 022262 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for Mage format writer. """ from __future__ import division from numpy import array from copy import deepcopy from cogent.util.unit_test import TestCase, main from cogent.format.mage import MagePoint, MageList, MageGroup, MageHeader, \ Kinemage, MagePointFromBaseFreqs from cogent.core.usage import BaseUsage from cogent.util.misc import Delegator __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class MagePointTests(TestCase): """Tests of the MagePoint class, holding information about points.""" def test_init_empty(self): """MagePoint should init correctly with no data""" m = MagePoint() self.assertEqual(str(m), ' '.join(map(str,[0,0,0]))) def test_init(self): """MagePoint should init correctly with normal cases""" #label and coords m = MagePoint([0.200,0.000,0.800], '0.800') self.assertEqual(str(m), '{0.800} ' + \ ' '.join(map(str, ([0.200,0.000,0.800])))) #coords only m = 
MagePoint([0.200,0.000,0.800]) self.assertEqual(str(m), ' '.join(map(str, ([0.200,0.000,0.800])))) #label only m = MagePoint(Label='abc') self.assertEqual(str(m), '{abc} '+' '.join(map(str,[0,0,0]))) #all fields occupied m = MagePoint(Label='abc', Coordinates=[3, 6, 1.5], Radius=0.5, \ Width=2, State='P', Color='green') self.assertEqual(str(m), \ '{abc} P green width2 r=0.5 3 6 1.5') def test_cmp(self): """MagePoint cmp should compare all fields""" self.assertEqual(MagePoint([0,0,0]), MagePoint([0,0,0])) self.assertNotEqual(MagePoint([0,0,0]), MagePoint([0,0,0], Color='red')) def test_get_coord(self): """MagePoint _get_coord should return coordinate that is asked for""" m = MagePoint([0,1,2]) self.assertEqual(m.X,m.Coordinates[0]) self.assertEqual(m.Y,m.Coordinates[1]) self.assertEqual(m.Z,m.Coordinates[2]) m = MagePoint() self.assertEqual(m.X,m.Coordinates[0]) self.assertEqual(m.Y,m.Coordinates[1]) self.assertEqual(m.Z,m.Coordinates[2]) def test_set_coord(self): """MagePoint _get_coord should return coordinate that is asked for""" m = MagePoint([0,1,2]) m.X, m.Y, m.Z = 2,3,4 self.assertEqual(m.Coordinates,[2,3,4]) m = MagePoint() m.X, m.Y, m.Z = 5,4,3 self.assertEqual(m.Coordinates,[5,4,3]) m = MagePoint() m.X = 5 self.assertEqual(m.Coordinates,[5,0,0]) def test_toCartesian(self): """MagePoint toCartesian() should transform coordinates correctly""" m = MagePoint([.1,.2,.3]) self.assertEqual(m.toCartesian().Coordinates,[.6,.7,.5]) m = MagePoint() self.assertEqual(m.toCartesian().Coordinates,[1,1,1]) m = MagePoint([.25,.25,.25],Color='red',Label='label',State='L') self.assertEqual(m.toCartesian().Coordinates,[.5,.5,.5]) self.assertEqual(m.toCartesian().Color,m.Color) self.assertEqual(m.toCartesian().Label,m.Label) self.assertEqual(m.toCartesian().State,m.State) m = MagePoint([1/3.0,1/3.0,0]) self.assertFloatEqual(m.toCartesian().Coordinates, [2/3.0,1/3.0,2/3.0]) m = MagePoint([1/3.0,1/3.0,1/3.0]) self.assertFloatEqual(m.toCartesian().Coordinates, 
[1/3.0,1/3.0,1/3.0]) m = MagePoint([3,4,5]) self.assertRaises(ValueError,m.toCartesian) def test_fromCartesian(self): """MagePoint fromCartesian should transform coordinates correctly""" mp = MagePoint([2/3.0,1/3.0,2/3.0]) self.assertFloatEqual(mp.fromCartesian().Coordinates,[1/3.0,1/3.0,0]) points = [MagePoint([.1,.2,.3]),MagePoint([.25,.25,.25],Color='red', Label='label',State='L'),MagePoint([1/3,1/3,0]), MagePoint([0,0,0]),MagePoint([1/7,2/7,3/7])] for m in points: b = m.toCartesian().fromCartesian() self.assertFloatEqual(m.Coordinates,b.Coordinates) self.assertEqual(m.Color,b.Color) self.assertEqual(m.Label,b.Label) self.assertEqual(m.State,b.State) #even after multiple iterations? mutant = deepcopy(m) for x in range(10): mutant = mutant.toCartesian().fromCartesian() self.assertFloatEqual(m.Coordinates,mutant.Coordinates) class freqs_label(dict): """dict with Label and Id, for testing MagePointFromBaseFreqs""" def __init__(self, Label, Id, freqs): self.Label = Label self.Id = Id self.update(freqs) class freqs_display(dict): """dict with display properties, for testing MagePointFromBaseFreqs""" def __init__(self, Color, Radius, Id, freqs): self.Color = Color self.Radius = Radius self.Id = Id self.update(freqs) class MagePointFromBaseFreqsTests(TestCase): """Tests of the MagePointFromBaseFreqs factory function.""" def setUp(self): """Define a few standard frequencies""" self.empty = freqs_label(None, None, {}) self.dna = freqs_label('dna', None, {'A':4, 'T':1, 'G':2, 'C':3}) self.rna = freqs_label(None, 'rna', {'U':2, 'A':1, 'G':2}) self.display = freqs_display('green', '0.25', 'xxx', {'A':2}) def test_MagePointFromBaseFreqs(self): """MagePoint should fill itself from base freqs correctly""" e = MagePointFromBaseFreqs(self.empty) self.assertEqual(str(e), '0.0 0.0 0.0') dna = MagePointFromBaseFreqs(self.dna) self.assertEqual(str(dna), '{dna} 0.4 0.3 0.2') rna = MagePointFromBaseFreqs(self.rna) self.assertEqual(str(rna), '{rna} 0.2 0.0 0.4') display = 
MagePointFromBaseFreqs(self.display) self.assertEqual(str(display), \ '{xxx} green r=0.25 1.0 0.0 0.0') def test_MagePointFromBaseFreqs_usage(self): """MagePoint should init correctly from base freqs""" class fake_seq(str, Delegator): def __new__(cls, data, *args): return str.__new__(cls, data) def __init__(self, data, *args): Delegator.__init__(self, *args) self.__dict__['Info'] = self._handler str.__init__(data) class has_species(object): def __init__(self, sp): self.Species = sp s = fake_seq('AAAAACCCTG', has_species('Homo sapiens')) b = BaseUsage(s) p = MagePointFromBaseFreqs(b) self.assertEqual(str(p), '{Homo sapiens} 0.5 0.3 0.1') def test_MagePointFromBaseFreqs_functions(self): """MagePointFromBaseFreqs should apply functions correctly""" def set_color(x): if x.Label == 'dna': return 'green' else: return 'blue' def set_radius(x): if x.Label == 'dna': return 0.25 else: return 0.5 def set_label(x): if x.Id is not None: return 'xxx' else: return 'yyy' self.assertEqual(str(MagePointFromBaseFreqs(self.dna, get_label=set_label)), '{yyy} 0.4 0.3 0.2') self.assertEqual(str(MagePointFromBaseFreqs(self.rna, get_label=set_label)), '{xxx} 0.2 0.0 0.4') self.assertEqual(str(MagePointFromBaseFreqs(self.dna, get_radius=set_radius)), '{dna} r=0.25 0.4 0.3 0.2') self.assertEqual(str(MagePointFromBaseFreqs(self.rna, get_radius=set_radius)), '{rna} r=0.5 0.2 0.0 0.4') self.assertEqual(str(MagePointFromBaseFreqs(self.dna, get_color=set_color)), '{dna} green 0.4 0.3 0.2') self.assertEqual(str(MagePointFromBaseFreqs(self.rna, get_color=set_color)), '{rna} blue 0.2 0.0 0.4') self.assertEqual(str(MagePointFromBaseFreqs(self.rna, get_label=set_label, get_radius=set_radius,get_color=set_color)), '{xxx} blue r=0.5 0.2 0.0 0.4') class MageListTests(TestCase): """Tests of the MageList class, holding a collection of points.""" def setUp(self): """Define a few standard points and lists of points.""" self.null = MagePoint([0,0,0]) self.label = MagePoint([1, 1, 1], 'test') self.properties = 
MagePoint(Width=1, Label='point', State='L',\ Color='blue', Coordinates=[2.0,4.0,6.0]) self.radius1 = MagePoint([2,2,2],Radius=.1) self.radius2 = MagePoint([3,3,3],Radius=.5) self.first_list = [self.null, self.properties] self.empty_list = [] self.minimal_list = [self.null] self.single_list = [self.label] self.multi_list = [self.properties] * 10 self.radii = [self.radius1,self.radius2] def test_init_empty(self): """MageList should init correctly with no data""" m = MageList() self.assertEqual(str(m), "@dotlist") m = MageList(self.empty_list) self.assertEqual(str(m), "@dotlist") def test_init(self): """MageList should init correctly with data""" m = MageList(self.minimal_list) self.assertEqual(str(m), "@dotlist\n" + str(self.null)) m = MageList(self.multi_list,'x',Off=True,Color='green',NoButton=True) self.assertEqual(str(m), "@dotlist {x} off nobutton color=green\n" + \ '\n'.join(10 * [str(self.properties)])) m = MageList(self.first_list,NoButton=True,Color='red', \ Style='vector', Radius=0.03, Width=3, Label='test') self.assertEqual(str(m), "@vectorlist {test} nobutton color=red " + \ "radius=0.03 width=3\n" + str(self.null) + '\n' + str(self.properties)) def test_toArray_radii(self): """MageList toArray should return the correct array""" m = MageList(self.empty_list) self.assertEqual(m.toArray(),array(())) m = MageList(self.first_list,Radius=.3) self.assertEqual(m.toArray(),array([[0,0,0,0.3],[2.0,4.0,6.0,0.3]])) m = MageList(self.radii) self.assertEqual(m.toArray(), array([[2,2,2,.1],[3,3,3,.5]])) m = MageList(self.radii,Radius=.4) self.assertEqual(m.toArray(), array([[2,2,2,.1],[3,3,3,.5]])) m = MageList(self.single_list) #radius = None self.assertRaises(ValueError,m.toArray) def test_toArray_coords_only(self): """MageList toArray should return the correct array""" m = MageList(self.empty_list) self.assertEqual(m.toArray(include_radius=False),array(())) m = MageList(self.first_list,Radius=.3) self.assertEqual(m.toArray(include_radius=False), 
            array([[0,0,0],[2.0,4.0,6.0]]))
        m = MageList(self.radii)
        self.assertEqual(m.toArray(include_radius=False),
            array([[2,2,2],[3,3,3]]))
        m = MageList(self.radii,Radius=.4)
        self.assertEqual(m.toArray(include_radius=False),
            array([[2,2,2],[3,3,3]]))
        m = MageList(self.single_list) #radius = None
        self.assertEqual(m.toArray(include_radius=False),array([[1,1,1]]))

    def test_iterPoints(self):
        """MageList iterPoints should yield all points in self"""
        m = MageList(self.single_list)
        for point in m.iterPoints():
            assert isinstance(point,MagePoint)
        self.assertEqual(len(list(m.iterPoints())),1)
        m = MageList(self.multi_list)
        for point in m.iterPoints():
            assert isinstance(point,MagePoint)
        self.assertEqual(len(list(m.iterPoints())),10)

    def test_toCartesian(self):
        """MageList toCartesian should return new list"""
        m = MageList([self.null],Color='green')
        res = m.toCartesian()
        self.assertEqual(len(m), len(res))
        self.assertEqual(m.Color,res.Color)
        self.assertEqual(res[0].Coordinates,[1,1,1])
        m.Color='red'
        self.assertEqual(res.Color,'green')
        m = MageList([self.properties])
        self.assertRaises(ValueError,m.toCartesian)

    def test_fromCartesian(self):
        """MageList fromCartesian() should return new list with ACG coordinates
        """
        point = MagePoint([.1,.2,.3])
        m = MageList([point]*5,Color='green')
        res = m.toCartesian().fromCartesian()
        self.assertEqual(str(m),str(res))


class MageGroupTests(TestCase):
    """Test cases for the MageGroup class."""

    def setUp(self):
        """Define some standard lists and groups."""
        self.p1 = MagePoint([0, 1, 0], Color='green', Label='x')
        self.p0 = MagePoint([0,0,0])
        self.min_list = MageList([self.p0]*2,'y')
        self.max_list = MageList([self.p1]*5,'z',Color='blue',Off=True, \
            Style='ball')
        self.min_group = MageGroup([self.min_list], Label="min_group")
        self.max_group = MageGroup([self.min_list, self.max_list],
            Color='red', Label="max_group", Style='dot')
        self.nested = MageGroup([self.min_group, self.max_group],
            Label='nest', Color='orange', Radius=0.3, Style='vector')
        self.empty = MageGroup(Label='empty',Color='orange', NoButton=True,
            Style='vector',RecessiveOn=False)

    def test_init(self):
        """Nested MageGroups should set subgroup and cascades correctly."""
        exp_lines = [
            '@group {nest} recessiveon',
            '@subgroup {min_group} recessiveon',
            '@vectorlist {y} color=orange radius=0.3',
            str(self.p0),
            str(self.p0),
            '@subgroup {max_group} recessiveon',
            '@dotlist {y} color=red radius=0.3',
            str(self.p0),
            str(self.p0),
            '@balllist {z} off color=blue radius=0.3',
            str(self.p1),
            str(self.p1),
            str(self.p1),
            str(self.p1),
            str(self.p1),
            ]
        s = str(self.nested).split('\n')
        self.assertEqual(str(self.nested), '\n'.join(exp_lines))
        #check that resetting the cascaded values works OK
        nested = self.nested
        str(nested)
        self.assertEqual(nested,self.nested)
        self.assertEqual(nested[0][0].Color,None)

    def test_str(self):
        """MageGroup str should print correctly"""
        m = self.empty
        self.assertEqual(str(self.empty),'@group {empty} nobutton')
        m = MageGroup(Label='label',Clone='clone_name',Off=True)
        self.assertEqual(str(m),
            '@group {label} off recessiveon clone={clone_name}')
        m = MageGroup()
        self.assertEqual(str(m),'@group recessiveon')

    def test_iterGroups(self):
        """MageGroup iterGroups should behave as expected"""
        groups = list(self.nested.iterGroups())
        self.assertEqual(groups[0],self.min_group)
        self.assertEqual(groups[1],self.max_group)
        self.assertEqual(len(groups),2)

    def test_iterLists(self):
        """MageGroup iterLists should behave as expected"""
        lists = list(self.nested.iterLists())
        self.assertEqual(len(lists),3)
        self.assertEqual(lists[0],self.min_list)
        self.assertEqual(lists[1],self.min_list)
        self.assertEqual(lists[2],self.max_list)

    def test_iterGroupsAndLists(self):
        """MageGroup iterGroupsAndLists should behave as expected"""
        all = list(self.nested.iterGroupsAndLists())
        self.assertEqual(len(all),5)
        self.assertEqual(all[0],self.min_group)
        self.assertEqual(all[4],self.max_list)

    def test_iterPoints(self):
        """MageGroup iterPoints should behave as expected"""
        points = list(self.nested.iterPoints())
        self.assertEqual(len(points),9)
        self.assertEqual(points[1],self.p0)
        self.assertEqual(points[6],self.p1)

    def test_toCartesian(self):
        """MageGroup toCartesian should return a new MageGroup"""
        m = self.nested
        res = m.toCartesian()
        self.assertEqual(len(m),len(res))
        self.assertEqual(m.RecessiveOn,res.RecessiveOn)
        self.assertEqual(m[1][1].Color, res[1][1].Color)
        self.assertEqual(res[1][1][1].Coordinates,[1,0,0])

    def test_fromCartesian(self):
        """MageGroup fromCartesian should return a new MageGroup"""
        point = MagePoint([.1,.2,.3])
        l = MageList([point]*5,Color='red')
        m = MageGroup([l],Radius=0.02,Subgroup=True)
        mg = MageGroup([m])
        res = mg.toCartesian().fromCartesian()
        self.assertEqual(str(mg),str(res))


class MageHeaderTests(TestCase):
    """Tests of the MageHeader class.

    For now, MageHeader does nothing, so just verify that it gets the string.
    """

    def test_init(self):
        """MageHeader should keep the string it was initialized with."""
        m = MageHeader('@perspective')
        self.assertEqual(str(m), '@perspective')


class KinemageTests(TestCase):
    """Tests of the overall Kinemage class."""

    def setUp(self):
        self.point = MagePoint([0,0,0],'x')
        self.ml = MageList([self.point], Label='y',Color='green')
        self.mg1 = MageGroup([self.ml],Label='z')
        self.mg2 = MageGroup([self.ml,self.ml],Label='b')
        self.kin = Kinemage(1)
        self.kin.Groups = [self.mg1,self.mg2]

    def test_init_empty(self):
        """Kinemage empty init should work, but refuse to print"""
        k = Kinemage()
        self.assertEqual(k.Count, None)
        self.assertRaises(ValueError, k.__str__)

    def test_init(self):
        """Kinemage should init with any of its usual fields"""
        k = Kinemage(1)
        self.assertEqual(str(k), '@kinemage 1')
        k.Header = '@perspective'
        self.assertEqual(str(k), '@kinemage 1\n@perspective')
        k.Count = 2
        self.assertEqual(str(k), '@kinemage 2\n@perspective')
        k.Header = ''
        k.Caption = 'test caption'
        self.assertEqual(str(k), '@kinemage 2\n@caption\ntest caption')
        k.Caption = None
        k.Text = 'some text'
        self.assertEqual(str(k), '@kinemage 2\n@text\nsome text')
        k.Groups = [self.mg1]
        k.Header = '@test_header'
        k.Caption = 'This is\nThe caption'
        k.Text = 'some text here'
        self.assertEqual(str(k), '@kinemage 2\n@test_header\n@text\n' +\
            'some text here\n' + \
            '@caption\nThis is\nThe caption\n@group {z} recessiveon\n' + \
            '@dotlist {y} color=green\n{x} 0 0 0')

    def test_iterGroups(self):
        """Kinemage iterGroups should behave as expected"""
        k = self.kin
        groups = list(k.iterGroups())
        self.assertEqual(len(groups),2)
        self.assertEqual(groups[0],self.mg1)
        self.assertEqual(groups[1],self.mg2)

    def test_iterLists(self):
        """Kinemage iterLists should behave as expected"""
        k = self.kin
        lists = list(k.iterLists())
        self.assertEqual(len(lists),3)
        self.assertEqual(lists[0],self.ml)

    def test_iterPoints(self):
        """Kinemage iterPoints should behave as expected"""
        k = self.kin
        points = list(k.iterPoints())
        self.assertEqual(len(points),3)
        self.assertEqual(points[0],self.point)

    def test_iterGroupsAndLists(self):
        """Kinemage iterGroupsAndLists should behave as expected"""
        all = list(self.kin.iterGroupsAndLists())
        self.assertEqual(len(all),5)
        self.assertEqual(all[0],self.mg1)
        self.assertEqual(all[4],self.ml)

    def test_toCartesian(self):
        """Kinemage toCartesian should return new Kinemage with UC,UG,UA coords
        """
        k = self.kin
        res = k.toCartesian()
        self.assertEqual(len(k.Groups),len(res.Groups))
        self.assertEqual(k.Text,res.Text)
        self.assertEqual(k.Groups[1].RecessiveOn,res.Groups[1].RecessiveOn)
        self.assertEqual(res.Groups[0][0][0].Coordinates,[1,1,1])

    def test_fromCartesian(self):
        """Kinemage fromCartesian should return Kinemage with A,C,G(,U) coords
        """
        point = MagePoint([.1,.2,.3])
        l = MageList([point]*5,Color='red')
        m1 = MageGroup([l],Radius=0.02,Subgroup=True)
        m2 = MageGroup([l],Radius=0.02)
        mg = MageGroup([m1])
        k = Kinemage(Count=1,Groups=[mg,m2])
        res = k.toCartesian().fromCartesian()
        self.assertEqual(str(k),str(res))


if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_format/test_pdb_color.py000644 000765 000024
00000141751 12024702176 023317 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.util.misc import app_path
from cogent.format.pdb_color import get_aligned_muscle, make_color_list, \
    ungapped_to_pdb_numbers, get_matching_chains, get_chains, \
    get_best_muscle_hits, chains_to_seqs, align_subject_to_pdb

__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"

"""Tests of the pdb_color module.

Owner: Jeremy Widmann jeremy.widmann@colorado.edu

Revision History:
October 2006 Jeremy Widmann: File created
"""

MUSCLE_PATH = app_path('muscle')

class PdbColorTests(TestCase):
    """Tests for pdb_color functions.
    """

    def setUp(self):
        """Setup for pdb_color tests."""
        #Nucleotide test data results
        self.test_pdb_chains_1 = {'A': [(1, 'G'), (2, 'C'), (3, 'C'),
            (4, 'A'), (5, 'C'), (6, 'C'), (7, 'C'), (8, 'U'), (9, 'G')],
            'B': [(10, 'C'), (11, 'A'), (12, 'G'), (13, 'G'), (14, 'G'),
            (15, 'U'), (16, 'C'), (17, 'G'), (18, 'G'), (19, 'C')]}
        self.ungapped_to_pdb_1 = {'A': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6,
            6: 7, 7: 8, 8: 9},
            'B': {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16, 7: 17,
            8: 18, 9: 19}}
        self.test_pdb_seqs_1 = {'A': 'GCCACCCUG', 'B': 'CAGGGUCGGC'}
        self.test_pdb_types_1 = {'A': 'Nucleotide', 'B': 'Nucleotide'}
        #Protein test data results
        self.test_pdb_chains_2 = {'A': [(1, 'ALA'), (2, 'PRO'), (3, 'ILE'),
            (4, 'LYS'), (5, 'VAL'), (6, 'GLY'), (7, 'ASP'), (8, 'ALA')]}
        self.ungapped_to_pdb_2 = {'A': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6,
            6: 7, 7: 8}}
        self.test_pdb_seqs_2 = {'A': 'APIKVGDA'}
        self.test_pdb_types_2 = {'A': 'Protein'}

    def test_get_aligned_muscle(self):
        """Tests for get_aligned_muscle function.
        """
        if not MUSCLE_PATH:
            return 'skipping test'
        seq1 = 'ACCUG'
        seq2 = 'ACGGUG'
        seq1_aligned_known = 'AC-CUG'
        seq2_aligned_known = 'ACGGUG'
        frac_same_known = 4/5.0
        seq1_aln, seq2_aln, frac_same = get_aligned_muscle(seq1,seq2)
        self.assertEqual(seq1_aln,seq1_aligned_known)
        self.assertEqual(seq2_aln,seq2_aligned_known)
        self.assertEqual(frac_same,frac_same_known)

    def test_get_chains_nucleotide(self):
        """Tests for get_chains function using nucleotide pdb lines.
        """
        chains_nuc = get_chains(TEST_PDB_STRING_1.split('\n'))
        self.assertEqual(chains_nuc, self.test_pdb_chains_1)

    def test_get_chains_protein(self):
        """Tests for get_chains function using protein pdb lines.
        """
        chains_prot = get_chains(TEST_PDB_STRING_2.split('\n'))
        self.assertEqual(chains_prot, self.test_pdb_chains_2)

    def test_ungapped_to_pdb_nucleotide(self):
        """Tests for ungapped_to_pdb function using nucleotide pdb chains.
        """
        for k,v in self.test_pdb_chains_1.items():
            self.assertEqual(ungapped_to_pdb_numbers(v),\
                self.ungapped_to_pdb_1[k])

    def test_ungapped_to_pdb_protein(self):
        """Tests for ungapped_to_pdb function using protein pdb chains.
        """
        for k,v in self.test_pdb_chains_2.items():
            self.assertEqual(ungapped_to_pdb_numbers(v),\
                self.ungapped_to_pdb_2[k])

    def test_chains_to_seqs_nucleotide(self):
        """Tests for chains_to_seqs function using nucleotide pdb chains.
        """
        seqs, seqtypes = chains_to_seqs(self.test_pdb_chains_1)
        self.assertEqual(seqs, self.test_pdb_seqs_1)
        self.assertEqual(seqtypes, self.test_pdb_types_1)

    def test_chains_to_seqs_protein(self):
        """Tests for chains_to_seqs function using protein pdb chains.
        """
        seqs, seqtypes = chains_to_seqs(self.test_pdb_chains_2)
        self.assertEqual(seqs, self.test_pdb_seqs_2)
        self.assertEqual(seqtypes, self.test_pdb_types_2)

    def test_get_best_muscle_hits(self):
        """Tests for get_best_muscle_hits function.
        """
        if not MUSCLE_PATH:
            return 'skipping test'
        subject_seq = 'AACCGGUU'
        query_aln = {1:'CCCCCCCC', 2:'GGGGGGGG', 3:'AAGGGGUU', 4:'AACCGGGU'}
        res_20 = {1:'CCCCCCCC', 2:'GGGGGGGG', 3:'AAGGGGUU', 4:'AACCGGGU'}
        res_50 = {3:'AAGGGGUU', 4:'AACCGGGU'}
        res_80 = {4:'AACCGGGU'}
        res_100 = {}
        self.assertEqual(get_best_muscle_hits(subject_seq,query_aln,.2),res_20)
        self.assertEqual(get_best_muscle_hits(subject_seq,query_aln,.5),res_50)
        self.assertEqual(get_best_muscle_hits(subject_seq,query_aln,.8),res_80)
        self.assertEqual(get_best_muscle_hits(subject_seq,query_aln,1),res_100)

    def test_get_matching_chains(self):
        """Tests for get_matching_chains function.
        """
        if not MUSCLE_PATH:
            return 'skipping test'
        subject_seq = 'GCGACCCUG'
        res_30 = {'A': 'GCCACCCUG', 'B': 'CAGGGUCGGC'}
        res_80 = {'A': 'GCCACCCUG'}
        res_100 = {}
        #Threshold of .3
        test_30, ungapped_to_pdb = get_matching_chains(subject_seq, \
            TEST_PDB_STRING_1.split('\n'),\
            subject_type='Nucleotide',\
            threshold=.3)
        #Threshold of .8
        test_80, ungapped_to_pdb = get_matching_chains(subject_seq, \
            TEST_PDB_STRING_1.split('\n'),\
            subject_type='Nucleotide',\
            threshold=.8)
        #Threshold of 1
        test_100, ungapped_to_pdb = get_matching_chains(subject_seq, \
            TEST_PDB_STRING_1.split('\n'),\
            subject_type='Nucleotide',\
            threshold=1)
        #Incorrect subject_type
        #Threshold of .3
        test_wrong_subject, ungapped_to_pdb = get_matching_chains(subject_seq, \
            TEST_PDB_STRING_1.split('\n'),\
            subject_type='Protein',\
            threshold=.3)
        self.assertEqual(test_30,res_30)
        self.assertEqual(test_80,res_80)
        self.assertEqual(test_100,res_100)
        self.assertEqual(test_wrong_subject,{})

    def test_align_subject_to_pdb(self):
        """Tests for align_subject_to_pdb function.
        """
        if not MUSCLE_PATH:
            return 'skipping test'
        subject_seq = 'GCGACCCUG'
        pdb_matching = {'A': 'GCCACCCUG', 'B': 'CAGGGUCGGC'}
        result = {'A':('GCGACCCUG', 'GCCACCCUG'), \
            'B':('GCGACCCUG-', 'CAGGGUCGGC')}
        self.assertEqual(align_subject_to_pdb(subject_seq,pdb_matching),result)

    def test_make_color_list(self):
        """Tests for make_color_list function.
        """
        colors = [(1.0,1.0,1.0),(1.0,0.0,1.0),(.5,.5,.5)]
        res = [('color_1',(1.0,1.0,1.0)),\
            ('color_2',(1.0,0.0,1.0)),\
            ('color_3',(.5,.5,.5))]
        self.assertEqual(make_color_list(colors),res)

TEST_PDB_STRING_1 = """ HEADER RIBONUCLEIC ACID 04-JAN-00 1DQH TITLE CRYSTAL STRUCTURE OF HELIX II OF THE X. LAEVIS SOMATIC 5S TITLE 2 RRNA WITH A CYTOSINE BULGE IN TWO CONFORMATIONS CRYST1 32.780 32.780 102.500 90.00 90.00 90.00 P 43 21 2 8 ATOM 1 O5* G A 1 38.612 13.536 39.204 1.00 37.41 O ATOM 2 C5* G A 1 39.496 12.419 39.356 1.00 34.43 C ATOM 3 C4* G A 1 38.750 11.165 39.729 1.00 33.33 C ATOM 4 O4* G A 1 38.129 11.341 41.036 1.00 33.01 O ATOM 5 C3* G A 1 37.614 10.780 38.799 1.00 33.57 C ATOM 6 O3* G A 1 38.110 9.968 37.740 1.00 34.48 O ATOM 7 C2* G A 1 36.722 9.952 39.715 1.00 32.60 C ATOM 8 O2* G A 1 37.155 8.606 39.837 1.00 31.59 O ATOM 9 C1* G A 1 36.871 10.693 41.048 1.00 31.18 C ATOM 10 N9 G A 1 35.863 11.709 41.312 1.00 29.41 N ATOM 11 C8 G A 1 36.087 13.049 41.490 1.00 28.46 C ATOM 12 N7 G A 1 34.996 13.725 41.725 1.00 28.77 N ATOM 13 C5 G A 1 33.986 12.775 41.711 1.00 28.42 C ATOM 14 C6 G A 1 32.586 12.918 41.929 1.00 28.30 C ATOM 15 O6 G A 1 31.954 13.943 42.190 1.00 27.86 O ATOM 16 N1 G A 1 31.921 11.699 41.818 1.00 27.81 N ATOM 17 C2 G A 1 32.533 10.496 41.534 1.00 27.95 C ATOM 18 N2 G A 1 31.731 9.428 41.426 1.00 27.06 N ATOM 19 N3 G A 1 33.844 10.357 41.352 1.00 28.40 N ATOM 20 C4 G A 1 34.498 11.525 41.450 1.00 28.58 C ATOM 21 P C A 2 37.584 10.166 36.231 1.00 35.40 P ATOM 22 O1P C A 2 38.502 9.371 35.370 1.00 37.17 O ATOM 23 O2P C A 2 37.371 11.599 35.951 1.00 34.58 O ATOM 24 O5* C A 2 36.188 9.406 36.193 1.00 33.54 O 
ATOM 25 C5* C A 2 36.118 7.997 36.285 1.00 32.07 C ATOM 26 C4* C A 2 34.683 7.567 36.457 1.00 30.69 C ATOM 27 O4* C A 2 34.156 8.055 37.727 1.00 29.94 O ATOM 28 C3* C A 2 33.728 8.137 35.431 1.00 30.66 C ATOM 29 O3* C A 2 33.789 7.366 34.249 1.00 31.59 O ATOM 30 C2* C A 2 32.391 7.983 36.148 1.00 28.79 C ATOM 31 O2* C A 2 31.937 6.639 36.197 1.00 29.01 O ATOM 32 C1* C A 2 32.779 8.394 37.563 1.00 27.55 C ATOM 33 N1 C A 2 32.625 9.836 37.823 1.00 25.64 N ATOM 34 C2 C A 2 31.353 10.331 38.113 1.00 24.78 C ATOM 35 O2 C A 2 30.382 9.563 38.052 1.00 23.25 O ATOM 36 N3 C A 2 31.210 11.641 38.434 1.00 24.08 N ATOM 37 C4 C A 2 32.265 12.446 38.446 1.00 23.49 C ATOM 38 N4 C A 2 32.070 13.718 38.781 1.00 22.55 N ATOM 39 C5 C A 2 33.573 11.981 38.110 1.00 24.21 C ATOM 40 C6 C A 2 33.702 10.676 37.806 1.00 24.76 C ATOM 41 P C A 3 33.724 8.080 32.829 1.00 33.04 P ATOM 42 O1P C A 3 34.035 6.997 31.842 1.00 33.90 O ATOM 43 O2P C A 3 34.513 9.331 32.832 1.00 32.17 O ATOM 44 O5* C A 3 32.184 8.423 32.636 1.00 31.52 O ATOM 45 C5* C A 3 31.244 7.380 32.516 1.00 29.58 C ATOM 46 C4* C A 3 29.856 7.885 32.841 1.00 27.91 C ATOM 47 O4* C A 3 29.809 8.390 34.201 1.00 26.77 O ATOM 48 C3* C A 3 29.352 9.042 32.005 1.00 28.23 C ATOM 49 O3* C A 3 28.917 8.548 30.744 1.00 30.13 O ATOM 50 C2* C A 3 28.212 9.549 32.887 1.00 26.35 C ATOM 51 O2* C A 3 27.079 8.690 32.854 1.00 23.86 O ATOM 52 C1* C A 3 28.855 9.454 34.273 1.00 25.79 C ATOM 53 N1 C A 3 29.575 10.688 34.648 1.00 24.80 N ATOM 54 C2 C A 3 28.832 11.781 35.132 1.00 24.14 C ATOM 55 O2 C A 3 27.577 11.677 35.209 1.00 23.22 O ATOM 56 N3 C A 3 29.482 12.906 35.489 1.00 23.57 N ATOM 57 C4 C A 3 30.820 12.983 35.357 1.00 23.98 C ATOM 58 N4 C A 3 31.421 14.109 35.709 1.00 23.48 N ATOM 59 C5 C A 3 31.596 11.888 34.855 1.00 24.07 C ATOM 60 C6 C A 3 30.938 10.773 34.525 1.00 24.08 C ATOM 61 P A A 4 28.825 9.524 29.474 1.00 32.73 P ATOM 62 O1P A A 4 28.402 8.666 28.346 1.00 34.23 O ATOM 63 O2P A A 4 30.032 10.358 29.329 1.00 32.79 O ATOM 64 O5* A A 
4 27.664 10.569 29.796 1.00 32.19 O ATOM 65 C5* A A 4 26.305 10.158 29.955 1.00 30.34 C ATOM 66 C4* A A 4 25.469 11.306 30.483 1.00 29.48 C ATOM 67 O4* A A 4 25.962 11.694 31.789 1.00 27.46 O ATOM 68 C3* A A 4 25.495 12.610 29.694 1.00 29.92 C ATOM 69 O3* A A 4 24.603 12.575 28.598 1.00 32.61 O ATOM 70 C2* A A 4 25.037 13.609 30.748 1.00 28.81 C ATOM 71 O2* A A 4 23.640 13.574 30.987 1.00 29.76 O ATOM 72 C1* A A 4 25.761 13.085 31.989 1.00 26.51 C ATOM 73 N9 A A 4 27.073 13.686 32.195 1.00 25.35 N ATOM 74 C8 A A 4 28.277 13.137 31.857 1.00 24.29 C ATOM 75 N7 A A 4 29.306 13.870 32.184 1.00 23.88 N ATOM 76 C5 A A 4 28.741 14.980 32.793 1.00 24.49 C ATOM 77 C6 A A 4 29.305 16.106 33.385 1.00 23.89 C ATOM 78 N6 A A 4 30.621 16.294 33.487 1.00 23.21 N ATOM 79 N1 A A 4 28.468 17.036 33.883 1.00 23.83 N ATOM 80 C2 A A 4 27.145 16.832 33.779 1.00 24.11 C ATOM 81 N3 A A 4 26.497 15.805 33.254 1.00 23.77 N ATOM 82 C4 A A 4 27.365 14.897 32.777 1.00 24.53 C ATOM 83 P C A 5 24.910 13.447 27.286 1.00 35.53 P ATOM 84 O1P C A 5 23.998 12.939 26.228 1.00 37.30 O ATOM 85 O2P C A 5 26.366 13.448 27.050 1.00 36.57 O ATOM 86 O5* C A 5 24.512 14.941 27.678 1.00 34.43 O ATOM 87 C5* C A 5 23.191 15.257 28.063 1.00 33.62 C ATOM 88 C4* C A 5 23.144 16.642 28.633 1.00 33.30 C ATOM 89 O4* C A 5 23.865 16.674 29.889 1.00 32.25 O ATOM 90 C3* C A 5 23.834 17.712 27.809 1.00 33.29 C ATOM 91 O3* C A 5 23.016 18.137 26.731 1.00 35.25 O ATOM 92 C2* C A 5 24.027 18.781 28.860 1.00 32.42 C ATOM 93 O2* C A 5 22.811 19.399 29.229 1.00 32.98 O ATOM 94 C1* C A 5 24.494 17.930 30.040 1.00 31.25 C ATOM 95 N1 C A 5 25.956 17.721 30.055 1.00 29.62 N ATOM 96 C2 C A 5 26.739 18.692 30.649 1.00 28.72 C ATOM 97 O2 C A 5 26.182 19.698 31.113 1.00 29.29 O ATOM 98 N3 C A 5 28.074 18.527 30.707 1.00 28.00 N ATOM 99 C4 C A 5 28.632 17.446 30.193 1.00 27.58 C ATOM 100 N4 C A 5 29.955 17.322 30.311 1.00 27.48 N ATOM 101 C5 C A 5 27.862 16.439 29.546 1.00 27.81 C ATOM 102 C6 C A 5 26.532 16.614 29.508 1.00 28.20 C ATOM 
103 P C A 6 23.692 18.693 25.380 1.00 36.34 P ATOM 104 O1P C A 6 22.530 18.995 24.495 1.00 37.64 O ATOM 105 O2P C A 6 24.762 17.807 24.897 1.00 35.91 O ATOM 106 O5* C A 6 24.353 20.071 25.823 1.00 34.59 O ATOM 107 C5* C A 6 23.528 21.138 26.251 1.00 33.84 C ATOM 108 C4* C A 6 24.359 22.254 26.818 1.00 33.26 C ATOM 109 O4* C A 6 25.110 21.734 27.937 1.00 32.57 O ATOM 110 C3* C A 6 25.439 22.842 25.933 1.00 33.36 C ATOM 111 O3* C A 6 24.918 23.787 25.011 1.00 34.78 O ATOM 112 C2* C A 6 26.314 23.522 26.974 1.00 32.29 C ATOM 113 O2* C A 6 25.695 24.687 27.482 1.00 33.47 O ATOM 114 C1* C A 6 26.331 22.447 28.060 1.00 31.49 C ATOM 115 N1 C A 6 27.434 21.495 27.881 1.00 29.26 N ATOM 116 C2 C A 6 28.720 21.876 28.301 1.00 29.08 C ATOM 117 O2 C A 6 28.868 22.999 28.797 1.00 29.07 O ATOM 118 N3 C A 6 29.752 21.012 28.149 1.00 28.04 N ATOM 119 C4 C A 6 29.534 19.814 27.609 1.00 28.05 C ATOM 120 N4 C A 6 30.564 18.975 27.495 1.00 28.92 N ATOM 121 C5 C A 6 28.230 19.407 27.162 1.00 28.53 C ATOM 122 C6 C A 6 27.226 20.270 27.318 1.00 28.47 C ATOM 123 P C A 7 25.633 23.980 23.581 1.00 34.67 P ATOM 124 O1P C A 7 24.754 24.860 22.776 1.00 34.93 O ATOM 125 O2P C A 7 26.052 22.667 23.033 1.00 33.74 O ATOM 126 O5* C A 7 26.947 24.804 23.919 1.00 33.62 O ATOM 127 C5* C A 7 26.834 26.089 24.487 1.00 32.39 C ATOM 128 C4* C A 7 28.195 26.619 24.841 1.00 32.43 C ATOM 129 O4* C A 7 28.796 25.780 25.861 1.00 31.40 O ATOM 130 C3* C A 7 29.216 26.582 23.727 1.00 32.59 C ATOM 131 O3* C A 7 29.039 27.661 22.832 1.00 33.34 O ATOM 132 C2* C A 7 30.510 26.706 24.503 1.00 30.81 C ATOM 133 O2* C A 7 30.730 28.021 24.964 1.00 31.51 O ATOM 134 C1* C A 7 30.207 25.796 25.698 1.00 30.43 C ATOM 135 N1 C A 7 30.670 24.427 25.465 1.00 28.25 N ATOM 136 C2 C A 7 32.021 24.155 25.622 1.00 27.81 C ATOM 137 O2 C A 7 32.798 25.101 25.915 1.00 26.72 O ATOM 138 N3 C A 7 32.458 22.883 25.457 1.00 26.97 N ATOM 139 C4 C A 7 31.591 21.912 25.152 1.00 26.73 C ATOM 140 N4 C A 7 32.056 20.670 25.049 1.00 25.90 N ATOM 141 
C5 C A 7 30.206 22.176 24.955 1.00 26.90 C ATOM 142 C6 C A 7 29.796 23.434 25.113 1.00 28.38 C ATOM 143 P U A 8 29.356 27.448 21.284 1.00 34.70 P ATOM 144 O1P U A 8 28.951 28.708 20.571 1.00 35.62 O ATOM 145 O2P U A 8 28.821 26.145 20.836 1.00 34.46 O ATOM 146 O5* U A 8 30.942 27.354 21.220 1.00 32.95 O ATOM 147 C5* U A 8 31.739 28.423 21.710 1.00 32.04 C ATOM 148 C4* U A 8 33.182 28.009 21.784 1.00 32.28 C ATOM 149 O4* U A 8 33.347 26.969 22.780 1.00 31.32 O ATOM 150 C3* U A 8 33.778 27.374 20.544 1.00 32.35 C ATOM 151 O3* U A 8 34.101 28.349 19.563 1.00 34.03 O ATOM 152 C2* U A 8 35.026 26.728 21.124 1.00 31.88 C ATOM 153 O2* U A 8 36.029 27.682 21.404 1.00 31.73 O ATOM 154 C1* U A 8 34.481 26.181 22.445 1.00 31.32 C ATOM 155 N1 U A 8 34.022 24.800 22.296 1.00 30.13 N ATOM 156 C2 U A 8 34.954 23.806 22.406 1.00 29.56 C ATOM 157 O2 U A 8 36.151 24.027 22.587 1.00 30.22 O ATOM 158 N3 U A 8 34.454 22.540 22.298 1.00 28.90 N ATOM 159 C4 U A 8 33.147 22.177 22.079 1.00 28.56 C ATOM 160 O4 U A 8 32.873 20.991 21.974 1.00 28.38 O ATOM 161 C5 U A 8 32.230 23.269 21.949 1.00 29.20 C ATOM 162 C6 U A 8 32.691 24.519 22.059 1.00 29.91 C ATOM 163 P G A 9 34.056 27.954 18.027 1.00 35.48 P ATOM 164 O1P G A 9 34.253 29.197 17.239 1.00 36.23 O ATOM 165 O2P G A 9 32.850 27.108 17.775 1.00 36.21 O ATOM 166 O5* G A 9 35.348 27.048 17.835 1.00 32.95 O ATOM 167 C5* G A 9 36.637 27.612 18.017 1.00 31.54 C ATOM 168 C4* G A 9 37.696 26.545 17.935 1.00 29.88 C ATOM 169 O4* G A 9 37.533 25.628 19.041 1.00 29.65 O ATOM 170 C3* G A 9 37.696 25.633 16.719 1.00 29.94 C ATOM 171 O3* G A 9 38.321 26.196 15.566 1.00 30.72 O ATOM 172 C2* G A 9 38.513 24.452 17.217 1.00 29.09 C ATOM 173 O2* G A 9 39.906 24.698 17.166 1.00 30.51 O ATOM 174 C1* G A 9 38.035 24.354 18.672 1.00 28.52 C ATOM 175 N9 G A 9 36.951 23.383 18.827 1.00 27.70 N ATOM 176 C8 G A 9 35.600 23.640 18.864 1.00 27.66 C ATOM 177 N7 G A 9 34.873 22.551 18.981 1.00 27.66 N ATOM 178 C5 G A 9 35.807 21.518 19.032 1.00 27.08 C ATOM 179 C6 
G A 9 35.622 20.108 19.171 1.00 26.99 C ATOM 180 O6 G A 9 34.557 19.479 19.295 1.00 27.32 O ATOM 181 N1 G A 9 36.834 19.430 19.166 1.00 25.88 N ATOM 182 C2 G A 9 38.067 20.024 19.072 1.00 26.62 C ATOM 183 N2 G A 9 39.121 19.203 19.094 1.00 26.35 N ATOM 184 N3 G A 9 38.253 21.334 18.960 1.00 26.19 N ATOM 185 C4 G A 9 37.092 22.013 18.942 1.00 26.94 C TER 186 G A 9 ATOM 187 O5* C B 10 37.876 10.866 21.876 1.00 38.18 O ATOM 188 C5* C B 10 39.087 10.527 21.197 1.00 34.41 C ATOM 189 C4* C B 10 39.746 11.780 20.669 1.00 34.90 C ATOM 190 O4* C B 10 38.931 12.392 19.627 1.00 33.24 O ATOM 191 C3* C B 10 39.904 12.927 21.657 1.00 34.06 C ATOM 192 O3* C B 10 40.989 12.675 22.550 1.00 35.88 O ATOM 193 C2* C B 10 40.214 14.067 20.695 1.00 32.72 C ATOM 194 O2* C B 10 41.499 13.919 20.131 1.00 33.32 O ATOM 195 C1* C B 10 39.202 13.792 19.582 1.00 31.82 C ATOM 196 N1 C B 10 37.945 14.536 19.750 1.00 30.40 N ATOM 197 C2 C B 10 37.944 15.911 19.492 1.00 29.71 C ATOM 198 O2 C B 10 39.014 16.463 19.179 1.00 29.35 O ATOM 199 N3 C B 10 36.795 16.602 19.605 1.00 29.04 N ATOM 200 C4 C B 10 35.677 15.980 19.976 1.00 29.04 C ATOM 201 N4 C B 10 34.555 16.703 20.064 1.00 28.12 N ATOM 202 C5 C B 10 35.656 14.584 20.270 1.00 29.17 C ATOM 203 C6 C B 10 36.802 13.911 20.146 1.00 29.79 C ATOM 204 P A B 11 40.901 13.173 24.071 1.00 38.04 P ATOM 205 O1P A B 11 42.040 12.552 24.800 1.00 37.87 O ATOM 206 O2P A B 11 39.521 13.016 24.582 1.00 36.68 O ATOM 207 O5* A B 11 41.192 14.729 23.921 1.00 35.30 O ATOM 208 C5* A B 11 42.473 15.172 23.503 1.00 33.64 C ATOM 209 C4* A B 11 42.479 16.676 23.341 1.00 31.60 C ATOM 210 O4* A B 11 41.587 17.051 22.255 1.00 30.20 O ATOM 211 C3* A B 11 41.947 17.462 24.525 1.00 30.78 C ATOM 212 O3* A B 11 42.937 17.594 25.538 1.00 30.99 O ATOM 213 C2* A B 11 41.593 18.790 23.867 1.00 29.71 C ATOM 214 O2* A B 11 42.736 19.561 23.571 1.00 29.00 O ATOM 215 C1* A B 11 40.990 18.307 22.547 1.00 29.66 C ATOM 216 N9 A B 11 39.547 18.127 22.638 1.00 28.32 N ATOM 217 C8 A B 11 
38.844 16.967 22.817 1.00 28.32 C ATOM 218 N7 A B 11 37.548 17.134 22.849 1.00 28.39 N ATOM 219 C5 A B 11 37.382 18.501 22.678 1.00 27.65 C ATOM 220 C6 A B 11 36.236 19.324 22.601 1.00 27.42 C ATOM 221 N6 A B 11 34.973 18.877 22.692 1.00 27.87 N ATOM 222 N1 A B 11 36.433 20.651 22.423 1.00 26.83 N ATOM 223 C2 A B 11 37.687 21.105 22.330 1.00 27.23 C ATOM 224 N3 A B 11 38.837 20.433 22.378 1.00 27.13 N ATOM 225 C4 A B 11 38.610 19.123 22.553 1.00 27.90 C ATOM 226 P G B 12 42.493 17.715 27.072 1.00 32.74 P ATOM 227 O1P G B 12 43.728 17.509 27.884 1.00 34.58 O ATOM 228 O2P G B 12 41.277 16.922 27.381 1.00 32.04 O ATOM 229 O5* G B 12 42.027 19.234 27.207 1.00 30.99 O ATOM 230 C5* G B 12 42.962 20.289 27.056 1.00 30.37 C ATOM 231 C4* G B 12 42.254 21.614 27.115 1.00 29.14 C ATOM 232 O4* G B 12 41.457 21.794 25.913 1.00 28.24 O ATOM 233 C3* G B 12 41.237 21.787 28.229 1.00 28.75 C ATOM 234 O3* G B 12 41.830 22.073 29.500 1.00 30.19 O ATOM 235 C2* G B 12 40.436 22.958 27.690 1.00 27.60 C ATOM 236 O2* G B 12 41.166 24.177 27.802 1.00 27.01 O ATOM 237 C1* G B 12 40.309 22.557 26.218 1.00 27.19 C ATOM 238 N9 G B 12 39.139 21.706 26.021 1.00 26.10 N ATOM 239 C8 G B 12 39.095 20.333 25.971 1.00 25.61 C ATOM 240 N7 G B 12 37.878 19.869 25.854 1.00 25.39 N ATOM 241 C5 G B 12 37.080 21.003 25.805 1.00 24.51 C ATOM 242 C6 G B 12 35.677 21.130 25.709 1.00 25.12 C ATOM 243 O6 G B 12 34.819 20.224 25.646 1.00 24.39 O ATOM 244 N1 G B 12 35.282 22.463 25.710 1.00 23.14 N ATOM 245 C2 G B 12 36.136 23.544 25.801 1.00 25.12 C ATOM 246 N2 G B 12 35.566 24.781 25.809 1.00 23.32 N ATOM 247 N3 G B 12 37.447 23.430 25.888 1.00 23.89 N ATOM 248 C4 G B 12 37.847 22.143 25.892 1.00 24.92 C ATOM 249 P G B 13 41.050 21.646 30.847 1.00 31.79 P ATOM 250 O1P G B 13 41.913 21.970 32.013 1.00 33.13 O ATOM 251 O2P G B 13 40.502 20.266 30.697 1.00 30.96 O ATOM 252 O5* G B 13 39.754 22.570 30.921 1.00 30.06 O ATOM 253 C5* G B 13 39.862 23.958 31.171 1.00 29.46 C ATOM 254 C4* G B 13 38.543 24.649 30.914 
1.00 28.17 C ATOM 255 O4* G B 13 38.049 24.334 29.586 1.00 27.88 O ATOM 256 C3* G B 13 37.356 24.322 31.810 1.00 27.17 C ATOM 257 O3* G B 13 37.506 24.978 33.065 1.00 27.57 O ATOM 258 C2* G B 13 36.240 24.941 30.983 1.00 27.19 C ATOM 259 O2* G B 13 36.251 26.358 31.018 1.00 26.53 O ATOM 260 C1* G B 13 36.629 24.474 29.573 1.00 26.87 C ATOM 261 N9 G B 13 36.031 23.162 29.330 1.00 26.99 N ATOM 262 C8 G B 13 36.660 21.949 29.286 1.00 25.89 C ATOM 263 N7 G B 13 35.829 20.951 29.099 1.00 26.13 N ATOM 264 C5 G B 13 34.583 21.549 28.997 1.00 25.76 C ATOM 265 C6 G B 13 33.304 20.970 28.786 1.00 25.44 C ATOM 266 O6 G B 13 33.020 19.783 28.643 1.00 26.51 O ATOM 267 N1 G B 13 32.308 21.927 28.756 1.00 26.07 N ATOM 268 C2 G B 13 32.506 23.276 28.917 1.00 26.27 C ATOM 269 N2 G B 13 31.394 24.054 28.887 1.00 26.49 N ATOM 270 N3 G B 13 33.702 23.829 29.104 1.00 25.76 N ATOM 271 C4 G B 13 34.683 22.911 29.131 1.00 25.78 C ATOM 272 P G B 14 36.688 24.480 34.364 1.00 28.04 P ATOM 273 O1P G B 14 37.232 25.265 35.498 1.00 28.24 O ATOM 274 O2P G B 14 36.700 23.014 34.418 1.00 27.78 O ATOM 275 O5* G B 14 35.189 24.948 34.089 1.00 26.64 O ATOM 276 C5* G B 14 34.821 26.309 34.234 1.00 26.48 C ATOM 277 C4* G B 14 33.329 26.470 34.021 1.00 26.73 C ATOM 278 O4* G B 14 32.973 26.023 32.691 1.00 27.34 O ATOM 279 C3* G B 14 32.455 25.631 34.927 1.00 26.32 C ATOM 280 O3* G B 14 32.289 26.283 36.163 1.00 25.18 O ATOM 281 C2* G B 14 31.150 25.594 34.155 1.00 26.28 C ATOM 282 O2* G B 14 30.462 26.832 34.292 1.00 28.24 O ATOM 283 C1* G B 14 31.685 25.398 32.734 1.00 27.44 C ATOM 284 N9 G B 14 31.902 23.979 32.496 1.00 26.19 N ATOM 285 C8 G B 14 33.102 23.325 32.481 1.00 25.54 C ATOM 286 N7 G B 14 32.989 22.052 32.226 1.00 25.02 N ATOM 287 C5 G B 14 31.622 21.853 32.065 1.00 24.97 C ATOM 288 C6 G B 14 30.885 20.664 31.768 1.00 25.65 C ATOM 289 O6 G B 14 31.314 19.514 31.556 1.00 24.40 O ATOM 290 N1 G B 14 29.512 20.911 31.724 1.00 23.81 N ATOM 291 C2 G B 14 28.927 22.133 31.942 1.00 25.97 C ATOM 292 
N2 G B 14 27.586 22.167 31.875 1.00 26.21 N ATOM 293 N3 G B 14 29.603 23.248 32.211 1.00 25.92 N ATOM 294 C4 G B 14 30.936 23.030 32.250 1.00 25.92 C ATOM 295 P U B 15 32.149 25.416 37.480 1.00 25.35 P ATOM 296 O1P U B 15 32.057 26.334 38.616 1.00 27.54 O ATOM 297 O2P U B 15 33.208 24.374 37.459 1.00 26.82 O ATOM 298 O5* U B 15 30.740 24.685 37.321 1.00 24.63 O ATOM 299 C5* U B 15 29.549 25.454 37.213 1.00 25.46 C ATOM 300 C4* U B 15 28.349 24.546 37.085 1.00 24.23 C ATOM 301 O4* U B 15 28.377 23.938 35.774 1.00 23.55 O ATOM 302 C3* U B 15 28.321 23.355 38.026 1.00 24.30 C ATOM 303 O3* U B 15 27.841 23.737 39.309 1.00 25.05 O ATOM 304 C2* U B 15 27.338 22.453 37.289 1.00 23.86 C ATOM 305 O2* U B 15 25.979 22.899 37.359 1.00 23.23 O ATOM 306 C1* U B 15 27.826 22.628 35.851 1.00 24.23 C ATOM 307 N1 U B 15 28.871 21.650 35.545 1.00 23.38 N ATOM 308 C2 U B 15 28.448 20.398 35.155 1.00 24.01 C ATOM 309 O2 U B 15 27.270 20.112 35.029 1.00 22.67 O ATOM 310 N3 U B 15 29.458 19.491 34.928 1.00 24.00 N ATOM 311 C4 U B 15 30.813 19.710 35.044 1.00 24.09 C ATOM 312 O4 U B 15 31.590 18.793 34.769 1.00 24.56 O ATOM 313 C5 U B 15 31.178 21.033 35.445 1.00 24.19 C ATOM 314 C6 U B 15 30.217 21.944 35.673 1.00 23.69 C ATOM 315 P C B 16 28.396 23.004 40.622 1.00 24.91 P ATOM 316 O1P C B 16 29.881 23.085 40.622 1.00 24.84 O ATOM 317 O2P C B 16 27.730 21.661 40.668 1.00 24.89 O ATOM 318 O5* C B 16 27.946 23.943 41.822 1.00 25.30 O ATOM 319 C5* C B 16 26.884 23.591 42.712 1.00 25.79 C ATOM 320 C4* C B 16 25.915 24.750 42.836 1.00 25.83 C ATOM 321 O4* C B 16 26.638 25.986 43.088 1.00 25.96 O ATOM 322 C3* C B 16 25.135 25.023 41.571 1.00 25.48 C ATOM 323 O3* C B 16 23.978 24.188 41.595 1.00 24.15 O ATOM 324 C2* C B 16 24.736 26.489 41.738 1.00 25.72 C ATOM 325 O2* C B 16 23.575 26.576 42.539 1.00 25.74 O ATOM 326 C1* C B 16 25.947 27.061 42.477 1.00 25.87 C ATOM 327 N1 C B 16 26.880 27.811 41.611 1.00 26.24 N ATOM 328 C2 C B 16 26.585 29.152 41.350 1.00 26.56 C ATOM 329 O2 C B 16 25.575 
29.642 41.883 1.00 25.85 O ATOM 330 N3 C B 16 27.407 29.874 40.549 1.00 25.78 N ATOM 331 C4 C B 16 28.502 29.306 40.033 1.00 27.00 C ATOM 332 N4 C B 16 29.310 30.069 39.261 1.00 26.12 N ATOM 333 C5 C B 16 28.832 27.943 40.289 1.00 26.61 C ATOM 334 C6 C B 16 27.998 27.237 41.079 1.00 26.28 C ATOM 335 P G B 17 23.384 23.642 40.245 1.00 23.95 P ATOM 336 O1P G B 17 23.436 24.574 39.098 1.00 25.14 O ATOM 337 O2P G B 17 22.082 23.017 40.615 1.00 27.22 O ATOM 338 O5* G B 17 24.421 22.475 39.840 1.00 26.25 O ATOM 339 C5* G B 17 24.386 21.204 40.495 1.00 26.12 C ATOM 340 C4* G B 17 23.742 20.170 39.592 1.00 24.76 C ATOM 341 O4* G B 17 24.528 20.039 38.380 1.00 24.76 O ATOM 342 C3* G B 17 23.732 18.759 40.174 1.00 25.57 C ATOM 343 O3* G B 17 22.581 18.586 40.985 1.00 25.04 O ATOM 344 C2* G B 17 23.674 17.891 38.931 1.00 23.99 C ATOM 345 O2* G B 17 22.367 17.810 38.389 1.00 26.46 O ATOM 346 C1* G B 17 24.591 18.671 37.983 1.00 24.16 C ATOM 347 N9 G B 17 26.001 18.255 38.013 1.00 22.61 N ATOM 348 C8 G B 17 27.064 18.970 38.508 1.00 22.62 C ATOM 349 N7 G B 17 28.216 18.388 38.321 1.00 22.00 N ATOM 350 C5 G B 17 27.898 17.197 37.680 1.00 22.60 C ATOM 351 C6 G B 17 28.740 16.163 37.189 1.00 22.09 C ATOM 352 O6 G B 17 29.987 16.125 37.154 1.00 22.31 O ATOM 353 N1 G B 17 28.003 15.117 36.663 1.00 20.99 N ATOM 354 C2 G B 17 26.634 15.088 36.560 1.00 21.61 C ATOM 355 N2 G B 17 26.130 13.961 36.055 1.00 21.54 N ATOM 356 N3 G B 17 25.834 16.080 36.938 1.00 20.35 N ATOM 357 C4 G B 17 26.525 17.085 37.502 1.00 22.36 C ATOM 358 P G B 18 22.687 17.714 42.302 1.00 26.50 P ATOM 359 O1P G B 18 21.410 17.993 43.015 1.00 27.27 O ATOM 360 O2P G B 18 23.973 17.912 42.983 1.00 23.81 O ATOM 361 O5* G B 18 22.672 16.216 41.755 1.00 23.89 O ATOM 362 C5* G B 18 21.485 15.647 41.200 1.00 25.00 C ATOM 363 C4* G B 18 21.782 14.282 40.588 1.00 24.30 C ATOM 364 O4* G B 18 22.684 14.431 39.468 1.00 22.31 O ATOM 365 C3* G B 18 22.476 13.275 41.487 1.00 24.94 C ATOM 366 O3* G B 18 21.499 12.593 42.250 1.00 
26.00 O ATOM 367 C2* G B 18 23.118 12.339 40.468 1.00 22.26 C ATOM 368 O2* G B 18 22.152 11.549 39.777 1.00 22.69 O ATOM 369 C1* G B 18 23.616 13.361 39.455 1.00 22.64 C ATOM 370 N9 G B 18 24.946 13.869 39.792 1.00 20.90 N ATOM 371 C8 G B 18 25.300 15.055 40.391 1.00 21.43 C ATOM 372 N7 G B 18 26.600 15.220 40.446 1.00 20.70 N ATOM 373 C5 G B 18 27.116 14.070 39.870 1.00 20.39 C ATOM 374 C6 G B 18 28.475 13.671 39.631 1.00 21.70 C ATOM 375 O6 G B 18 29.522 14.295 39.877 1.00 21.10 O ATOM 376 N1 G B 18 28.545 12.408 39.051 1.00 22.16 N ATOM 377 C2 G B 18 27.475 11.625 38.743 1.00 21.99 C ATOM 378 N2 G B 18 27.770 10.418 38.218 1.00 23.07 N ATOM 379 N3 G B 18 26.205 11.986 38.939 1.00 21.12 N ATOM 380 C4 G B 18 26.110 13.214 39.499 1.00 21.28 C ATOM 381 P C B 19 21.883 12.026 43.669 1.00 27.44 P ATOM 382 O1P C B 19 20.671 11.313 44.167 1.00 28.17 O ATOM 383 O2P C B 19 22.528 13.044 44.526 1.00 26.74 O ATOM 384 O5* C B 19 23.028 10.969 43.369 1.00 27.50 O ATOM 385 C5* C B 19 22.755 9.790 42.650 1.00 28.82 C ATOM 386 C4* C B 19 24.029 8.991 42.502 1.00 29.58 C ATOM 387 O4* C B 19 24.980 9.694 41.662 1.00 29.75 O ATOM 388 C3* C B 19 24.790 8.765 43.794 1.00 29.78 C ATOM 389 O3* C B 19 24.156 7.716 44.580 1.00 32.33 O ATOM 390 C2* C B 19 26.182 8.418 43.280 1.00 29.77 C ATOM 391 O2* C B 19 26.195 7.085 42.797 1.00 31.42 O ATOM 392 C1* C B 19 26.302 9.364 42.073 1.00 28.44 C ATOM 393 N1 C B 19 27.024 10.602 42.393 1.00 27.04 N ATOM 394 C2 C B 19 28.387 10.641 42.164 1.00 26.22 C ATOM 395 O2 C B 19 28.935 9.617 41.731 1.00 25.20 O ATOM 396 N3 C B 19 29.078 11.788 42.418 1.00 25.55 N ATOM 397 C4 C B 19 28.436 12.863 42.885 1.00 25.98 C ATOM 398 N4 C B 19 29.141 14.002 43.089 1.00 25.03 N ATOM 399 C5 C B 19 27.039 12.836 43.162 1.00 26.00 C ATOM 400 C6 C B 19 26.375 11.697 42.903 1.00 26.97 C TER 401 C B 19 MASTER 238 0 0 0 0 0 0 6 456 2 0 2 END """ TEST_PDB_STRING_2 = """ HEADER ANTIOXIDANT ENZYME 06-NOV-00 1HD2 TITLE HUMAN PEROXIREDOXIN 5 ATOM 1 N ALA A 1 -7.101 53.135 
16.957 1.00 88.42 N ANISOU 1 N ALA A 1 12714 7605 13277 523 -2633 3491 N ATOM 2 CA ALA A 1 -8.014 52.075 17.450 1.00 63.39 C ANISOU 2 CA ALA A 1 8225 7477 8383 2990 -789 1435 C ATOM 3 C ALA A 1 -7.241 50.817 17.757 1.00 46.53 C ANISOU 3 C ALA A 1 5793 6074 5811 2042 989 16 C ATOM 4 O ALA A 1 -6.073 50.698 17.346 1.00 53.54 O ANISOU 4 O ALA A 1 5678 8269 6398 1327 1320 1080 O ATOM 5 CB ALA A 1 -9.119 51.791 16.443 1.00 77.54 C ANISOU 5 CB ALA A 1 9616 10505 9342 2125 -2228 2932 C ATOM 6 N PRO A 2 -7.796 49.873 18.488 1.00 34.26 N ANISOU 6 N PRO A 2 4045 4756 4215 1116 109 -2030 N ATOM 7 CA PRO A 2 -6.966 48.670 18.750 1.00 29.30 C ANISOU 7 CA PRO A 2 3413 4041 3677 360 -116 -2345 C ATOM 8 C PRO A 2 -6.707 47.922 17.451 1.00 23.66 C ANISOU 8 C PRO A 2 1982 4034 2972 261 -662 -1762 C ATOM 9 O PRO A 2 -7.549 47.657 16.601 1.00 23.73 O ANISOU 9 O PRO A 2 1855 3960 3202 209 -836 -1320 O ATOM 10 CB PRO A 2 -7.774 47.860 19.732 1.00 30.97 C ANISOU 10 CB PRO A 2 3871 4866 3032 146 -165 -2287 C ATOM 11 CG PRO A 2 -8.779 48.807 20.281 1.00 35.77 C ANISOU 11 CG PRO A 2 3448 6110 4032 639 23 -2099 C ATOM 12 CD PRO A 2 -9.080 49.777 19.155 1.00 36.34 C ANISOU 12 CD PRO A 2 3188 6479 4142 982 -759 -2220 C ATOM 13 N ILE A 3 -5.409 47.566 17.357 1.00 19.45 N ANISOU 13 N ILE A 3 1760 3495 2134 -75 -484 -1027 N ATOM 14 CA ILE A 3 -5.040 46.822 16.152 1.00 18.02 C ANISOU 14 CA ILE A 3 1703 3171 1973 10 -766 -936 C ATOM 15 C ILE A 3 -5.689 45.452 16.192 1.00 18.71 C ANISOU 15 C ILE A 3 1903 3233 1974 -124 -474 -787 C ATOM 16 O ILE A 3 -5.915 44.901 17.260 1.00 20.34 O ANISOU 16 O ILE A 3 2262 3409 2057 315 -565 -507 O ATOM 17 CB ILE A 3 -3.513 46.734 16.025 1.00 17.09 C ANISOU 17 CB ILE A 3 1740 2849 1903 104 -657 -844 C ATOM 18 CG1 ILE A 3 -3.034 46.332 14.628 1.00 20.89 C ANISOU 18 CG1 ILE A 3 2079 3804 2056 -100 -461 -1173 C ATOM 19 CG2 ILE A 3 -2.939 45.866 17.110 1.00 18.41 C ANISOU 19 CG2 ILE A 3 1664 3016 2316 -146 -718 -494 C ATOM 20 CD1 ILE A 3 -1.553 46.546 14.371 1.00 22.47 
C ANISOU 20 CD1 ILE A 3 2492 3346 2701 -814 225 -681 C ATOM 21 N LYS A 4 -6.016 44.945 14.979 1.00 18.15 N ANISOU 21 N LYS A 4 2109 2759 2028 -146 -425 -709 N ATOM 22 CA LYS A 4 -6.669 43.672 14.871 1.00 19.21 C ANISOU 22 CA LYS A 4 1734 2945 2619 -166 -240 -1004 C ATOM 23 C LYS A 4 -6.136 42.905 13.666 1.00 18.35 C ANISOU 23 C LYS A 4 1621 2920 2432 -469 -242 -951 C ATOM 24 O LYS A 4 -5.523 43.516 12.787 1.00 18.36 O ANISOU 24 O LYS A 4 1727 2866 2383 -290 -380 -717 O ATOM 25 CB LYS A 4 -8.179 43.880 14.717 1.00 22.82 C ANISOU 25 CB LYS A 4 1797 3115 3760 78 -321 -1132 C ATOM 26 CG LYS A 4 -8.515 44.601 13.433 1.00 36.33 C ANISOU 26 CG LYS A 4 3420 5560 4825 996 -1411 -178 C ATOM 27 CD LYS A 4 -9.973 44.994 13.306 1.00 47.94 C ANISOU 27 CD LYS A 4 3683 7877 6655 1611 -1898 19 C ATOM 28 CE LYS A 4 -10.268 45.683 11.970 1.00 55.24 C ANISOU 28 CE LYS A 4 4630 8886 7472 2301 -2486 712 C ATOM 29 NZ LYS A 4 -9.697 47.057 11.880 1.00 64.74 N ANISOU 29 NZ LYS A 4 6567 9962 8070 776 -3006 2044 N ATOM 30 N VAL A 5 -6.390 41.610 13.638 1.00 17.46 N ANISOU 30 N VAL A 5 1965 2729 1939 -44 -514 -643 N ATOM 31 CA VAL A 5 -6.062 40.803 12.447 1.00 17.65 C ANISOU 31 CA VAL A 5 2357 2607 1741 11 -466 -478 C ATOM 32 C VAL A 5 -6.706 41.431 11.221 1.00 16.38 C ANISOU 32 C VAL A 5 1801 2477 1947 27 -578 -599 C ATOM 33 O VAL A 5 -7.842 41.860 11.225 1.00 20.73 O ANISOU 33 O VAL A 5 1855 3251 2769 339 -647 -999 O ATOM 34 CB VAL A 5 -6.540 39.355 12.621 1.00 18.16 C ANISOU 34 CB VAL A 5 2548 2687 1666 -199 -700 -506 C ATOM 35 CG1 VAL A 5 -6.490 38.556 11.331 1.00 20.68 C ANISOU 35 CG1 VAL A 5 3654 2669 1532 -286 -788 -390 C ATOM 36 CG2 VAL A 5 -5.643 38.711 13.693 1.00 21.32 C ANISOU 36 CG2 VAL A 5 3412 3031 1657 -320 -951 -136 C ATOM 37 N GLY A 6 -5.884 41.470 10.169 1.00 17.57 N ANISOU 37 N GLY A 6 1818 3021 1838 -5 -684 -127 N ATOM 38 CA GLY A 6 -6.293 42.101 8.926 1.00 17.93 C ANISOU 38 CA GLY A 6 2091 2771 1951 266 -1009 -287 C ATOM 39 C GLY A 6 -5.787 43.509 8.756 1.00 19.23 C 
ANISOU 39 C GLY A 6 2774 2651 1880 339 -804 -321 C ATOM 40 O GLY A 6 -5.730 44.041 7.631 1.00 18.94 O ANISOU 40 O GLY A 6 2552 2787 1858 397 -657 -317 O ATOM 41 N ASP A 7 -5.412 44.192 9.821 1.00 16.89 N ANISOU 41 N ASP A 7 2075 2493 1851 359 -716 -218 N ATOM 42 CA ASP A 7 -4.884 45.527 9.687 1.00 17.92 C ANISOU 42 CA ASP A 7 2197 2589 2023 267 -381 -346 C ATOM 43 C ASP A 7 -3.441 45.489 9.193 1.00 15.85 C ANISOU 43 C ASP A 7 2048 2223 1750 430 -734 -281 C ATOM 44 O ASP A 7 -2.691 44.572 9.409 1.00 18.38 O ANISOU 44 O ASP A 7 2451 2196 2337 656 -1042 -611 O ATOM 45 CB ASP A 7 -4.884 46.242 11.037 1.00 19.75 C ANISOU 45 CB ASP A 7 2401 2759 2345 469 -423 -702 C ATOM 46 CG ASP A 7 -6.256 46.558 11.580 1.00 19.89 C ANISOU 46 CG ASP A 7 2495 3124 1940 591 -278 -147 C ATOM 47 OD1 ASP A 7 -7.246 46.486 10.836 1.00 22.27 O ANISOU 47 OD1 ASP A 7 2413 3699 2348 528 -394 -258 O ATOM 48 OD2 ASP A 7 -6.286 46.895 12.814 1.00 23.63 O ANISOU 48 OD2 ASP A 7 3176 3849 1952 665 -161 -245 O ATOM 49 N ALA A 8 -3.094 46.594 8.534 1.00 16.73 N ANISOU 49 N ALA A 8 2285 2228 1845 255 -469 -376 N ATOM 50 CA ALA A 8 -1.686 46.796 8.224 1.00 19.26 C ANISOU 50 CA ALA A 8 2282 3100 1936 119 -555 -155 C ATOM 51 C ALA A 8 -0.940 47.209 9.477 1.00 18.08 C ANISOU 51 C ALA A 8 2249 2719 1900 181 -453 -242 C ATOM 52 O ALA A 8 -1.418 47.960 10.308 1.00 20.61 O ANISOU 52 O ALA A 8 2470 3061 2299 280 -308 -530 O ATOM 53 CB ALA A 8 -1.558 47.881 7.175 1.00 22.45 C ANISOU 53 CB ALA A 8 2906 3904 1718 16 -78 87 C MASTER 245 0 6 6 7 0 3 6 1429 1 9 13 END """ #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_format/test_stockholm.py000644 000765 000024 00000007264 12024702176 023357 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for Stockholm sequence format writer. 
""" from cogent.util.unit_test import TestCase, main from cogent.format.stockholm import stockholm_from_alignment from cogent.core.alignment import Alignment from cogent.core.sequence import Sequence from cogent.core.info import Info __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" class StockholmTests(TestCase): """Tests for Stockholm writer. """ def setUp(self): """Setup for Stockholm tests.""" self.unaligned_dict = {'1st':'AAA','2nd':'CCCC','3rd':'GGGG', '4th':'UUUU'} self.alignment_dict = {'1st':'AAAA','2nd':'CCCC','3rd':'GGGG', '4th':'UUUU'} #create alignment change order. self.alignment_object = Alignment(self.alignment_dict) self.alignment_order = ['2nd','4th','3rd','1st'] self.alignment_object.RowOrder=self.alignment_order self.gc_annotation = {'SS_cons':'....'} self.stockholm_with_label=\ """# STOCKHOLM 1.0 1st AAAA 2nd CCCC 3rd GGGG 4th UUUU //""" self.stockholm_with_label_lw2=\ """# STOCKHOLM 1.0 1st AA 2nd CC 3rd GG 4th UU 1st AA 2nd CC 3rd GG 4th UU //""" self.stockholm_with_label_struct=\ """# STOCKHOLM 1.0 1st AAAA 2nd CCCC 3rd GGGG 4th UUUU #=GC SS_cons .... //""" self.stockholm_with_label_struct_lw2=\ """# STOCKHOLM 1.0 1st AA 2nd CC 3rd GG 4th UU #=GC SS_cons .. 1st AA 2nd CC 3rd GG 4th UU #=GC SS_cons .. 
//""" self.stockholm_with_label_reordered=\ """# STOCKHOLM 1.0 2nd CCCC 4th UUUU 3rd GGGG 1st AAAA //""" self.stockholm_with_label_lw2_reordered=\ """# STOCKHOLM 1.0 2nd CC 4th UU 3rd GG 1st AA 2nd CC 4th UU 3rd GG 1st AA //""" def test_stockholm_from_alignment_unaligned(self): """should raise error with unaligned seqs.""" self.assertRaises(ValueError,\ stockholm_from_alignment,self.unaligned_dict) def test_stockholm_from_alignment(self): """should return correct stockholm string.""" self.assertEqual(stockholm_from_alignment({}),'') self.assertEqual(stockholm_from_alignment(self.alignment_dict),\ self.stockholm_with_label) self.assertEqual(stockholm_from_alignment(self.alignment_dict, interleave_len=2),self.stockholm_with_label_lw2) def test_stockholm_from_alignment_struct(self): """should return correct stockholm string.""" self.assertEqual(stockholm_from_alignment({},\ GC_annotation=self.gc_annotation),'') self.assertEqual(stockholm_from_alignment(self.alignment_dict,\ GC_annotation=self.gc_annotation),\ self.stockholm_with_label_struct) self.assertEqual(stockholm_from_alignment(self.alignment_dict,\ GC_annotation=self.gc_annotation,\ interleave_len=2),self.stockholm_with_label_struct_lw2) def test_stockholm_from_alignment_reordered(self): """should return correct stockholm string.""" self.assertEqual(stockholm_from_alignment(self.alignment_object),\ self.stockholm_with_label_reordered) self.assertEqual(stockholm_from_alignment(self.alignment_object, interleave_len=2),self.stockholm_with_label_lw2_reordered) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_format/test_xyzrn.py000644 000765 000024 00000002717 12024702176 022544 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os, tempfile from unittest import main from cogent.util.unit_test import TestCase from cogent import FromFilenameStructureParser from cogent.struct.selection import einput from cogent.format import xyzrn __author__ = "Marcin Cieslik" __copyright__ = "Copyright 
2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"

class XyzrnTest(TestCase):
    """Tests conversion of PDB files into the informal xyzrn format."""

    def setUp(self):
        self.structure = FromFilenameStructureParser('data/1A1X.pdb')
        self.residues = einput(self.structure, 'R')
        self.atoms = einput(self.structure, 'A')
        self.residue8 = self.residues.values()[8]
        self.atom17 = self.atoms.values()[17]
        self.atom23 = self.atoms.values()[23]

    def test_write_atom(self):
        fd, fn = tempfile.mkstemp()
        os.close(fd)
        handle = open(fn, 'wb')
        xyzrn.XYZRNWriter(handle, [self.atom17])
        handle.close()
        handle = open(fn, 'rb')
        coords_radius = [float(n) for n in handle.read().split()[:4]]
        self.atom17.setRadius()
        radius = self.atom17.getRadius()
        self.assertFloatEqualRel(self.atom17.coords, coords_radius[:3])
        self.assertFloatEqualRel(radius, coords_radius[3])

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_evolve/__init__.py000644 000765 000024 00000001157 12024702176 022057 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
__all__ = ["test_best_likelihood", "test_bootstrap", "test_likelihood_function",
           "test_motifchange", "test_parameter_controller", "test_scale_rules",
           "test_simulation", "test_substitution_model", "test_coevolution",
           "test_models"]

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell","Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

PyCogent-1.5.3/tests/test_evolve/test_best_likelihood.py000644 000765 000024 00000010444 12024702176 024516 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent import LoadSeqs, DNA
from cogent.evolve.best_likelihood import aligned_columns_to_rows, count_column_freqs, get_ML_probs, \
    get_G93_lnL_from_array, BestLogLikelihood, _transpose, _take
import math

__author__ = "Helen Lindsay"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Helen Lindsay"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Helen Lindsay"
__email__ = "helen.lindsay@anu.edu.au"
__status__ = "Production"

IUPAC_DNA_ambiguities = 'NRYWSKMBDHV'

def makeSampleAlignment(gaps = False, ambiguities = False):
    if gaps:
        seqs_list = ['AAA--CTTTGG-T','CCCCC-TATG-GT','-AACCCTTTGGGT']
    elif ambiguities:
        seqs_list = ['AARNCCTTTGGC','CCNYCCTTTGSG','CAACCCTGWGGG']
    else:
        seqs_list = ['AAACCCGGGTTTA','CCCGGGTTTAAAC','GGGTTTAAACCCG']
    seqs = zip('abc', seqs_list)
    return LoadSeqs(data = seqs)

class TestGoldman93(TestCase):
    def setUp(self):
        self.aln = makeSampleAlignment()
        self.gapped_aln = makeSampleAlignment(gaps = True)
        self.ambig_aln = makeSampleAlignment(ambiguities = True)

    def test_aligned_columns_to_rows(self):
        obs = aligned_columns_to_rows(self.aln[:-1], 3)
        expect = [['AAA','CCC','GGG'],['CCC','GGG','TTT'],
                  ['GGG','TTT','AAA'], ['TTT','AAA','CCC']]
        assert obs == expect, (obs, expect)
        obs = aligned_columns_to_rows(self.aln, 1)
        expect = [['A','C','G'],['A','C','G'],['A','C','G'],
                  ['C','G','T'],['C','G','T'],['C','G','T'],
                  ['G','T','A'],['G','T','A'],['G','T','A'],
                  ['T','A','C'],['T','A','C'],['T','A','C'],
                  ['A','C','G']]
        self.assertEqual(obs, expect)
        obs = aligned_columns_to_rows(self.gapped_aln[:-1], 3, allowed_chars='ACGT')
        expect = [['TTT','TAT','TTT']]
        self.assertEqual(obs, expect)
        obs = aligned_columns_to_rows(self.ambig_aln, 2, exclude_chars=IUPAC_DNA_ambiguities)
        expect = [['AA','CC','CA'],['CC','CC','CC'],['TT','TT','TG']]
        self.assertEqual(obs, expect)

    def test_count_column_freqs(self):
        columns = aligned_columns_to_rows(self.aln, 1)
        obs = count_column_freqs(columns)
        expect = {'A C G' : 4, 'C G T' : 3, 'G T A' : 3, 'T A C' : 3}
        self.assertEqual(obs, expect)
        columns = aligned_columns_to_rows(self.aln[:-1], 2)
        obs = count_column_freqs(columns)
        expect = {'AA CC GG': 1, 'AC CG GT': 1, 'CC GG TT':1,
                  'GG TT AA':1, 'GT TA AC':1, 'TT AA CC':1}
        self.assertEqual(obs, expect)

    def test__transpose(self):
        """test transposing an array"""
        a = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
        e = [[0,3,6,9],[1,4,7,10],[2,5,8,11]]
        self.assertEqual(_transpose(a), e)

    def test__take(self):
        """test taking selected rows from an array"""
        e = [[0,3,6,9],[1,4,7,10],[2,5,8,11]]
        self.assertEqual(_take(e, [0,1]), [[0,3,6,9],[1,4,7,10]])
        self.assertEqual(_take(e, [1,2]), [[1,4,7,10],[2,5,8,11]])
        self.assertEqual(_take(e, [0,2]), [[0,3,6,9],[2,5,8,11]])

    def test_get_ML_probs(self):
        columns = aligned_columns_to_rows(self.aln, 1)
        obs = get_ML_probs(columns, with_patterns=True)
        expect = {'A C G' : 4/13.0, 'C G T' : 3/13.0,
                  'G T A' : 3/13.0, 'T A C' : 3/13.0}
        sum = 0
        for pattern, lnL, freq in obs:
            self.assertFloatEqual(lnL, expect[pattern])
            sum += lnL
            self.assertTrue(lnL >= 0)
        self.assertFloatEqual(sum, 1)

    def test_get_G93_lnL_from_array(self):
        columns = aligned_columns_to_rows(self.aln, 1)
        obs = get_G93_lnL_from_array(columns)
        expect = math.log(math.pow(4/13.0, 4)) + 3*math.log(math.pow(3/13.0, 3))
        self.assertFloatEqual(obs, expect)

    def test_BestLogLikelihood(self):
        obs = BestLogLikelihood(self.aln, DNA.Alphabet)
        expect = math.log(math.pow(4/13.0, 4)) + 3*math.log(math.pow(3/13.0, 3))
        self.assertFloatEqual(obs,expect)
        lnL, l = BestLogLikelihood(self.aln, DNA.Alphabet, return_length=True)
        self.assertEqual(l, len(self.aln))

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_evolve/test_bootstrap.py000644 000765 000024 00000012047 12024702176 023374 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
import sys
import unittest
from cogent.evolve import likelihood_function, \
    parameter_controller, substitution_model, bootstrap
from cogent import LoadSeqs, LoadTree
import os

__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The
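The TestGoldman93 expectations follow from Goldman's (1993) result that the best achievable log-likelihood for an alignment assigns each distinct column pattern a maximum-likelihood probability equal to its observed frequency, giving lnL = Σᵢ nᵢ·log(nᵢ/N). A standalone sketch of that calculation (illustrative only; `best_lnL` is a made-up name, not the cogent.evolve.best_likelihood API):

```python
import math
from collections import Counter

def best_lnL(columns):
    """Best achievable log-likelihood for alignment columns (Goldman 1993).

    Each distinct column pattern's ML probability is its observed
    frequency, so lnL = sum over patterns of n_i * log(n_i / N).
    Standalone sketch, not the cogent.evolve.best_likelihood code.
    """
    counts = Counter(tuple(col) for col in columns)
    total = float(sum(counts.values()))
    return sum(n * math.log(n / total) for n in counts.values())
```

On the ungapped sample alignment above (13 columns: pattern ACG seen 4 times, CGT, GTA, and TAC 3 times each) this reproduces the `expect` value used in test_get_G93_lnL_from_array, log((4/13)⁴) + 3·log((3/13)³).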
Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Matthew Wakefield", "Helen Lindsay", "Andrew Butterfield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" base_path = os.getcwd() data_path = os.path.join(base_path, 'data') seqnames = ['Chimpanzee', 'Rhesus', 'Orangutan', 'Human'] REPLICATES = 2 def float_ge_zero(num, epsilon=1e-6): """compare whether a floating point value is >= zero with epsilon tolerance.""" if num >= 0.0: return True elif abs(num - 0.0) < epsilon: return True else: return False class BootstrapTests(unittest.TestCase): def gettree(self): treeobj = LoadTree(filename=os.path.join(data_path,"murphy.tree")) return treeobj.getSubTree(seqnames) def getsubmod(self,choice = 'F81'): if choice == 'F81': return substitution_model.Nucleotide(model_gaps=True) else: return substitution_model.Nucleotide( model_gaps=True, predicates = {'kappa':'transition'}) def getalignmentobj(self): moltype = self.getsubmod().MolType alignmentobj = LoadSeqs( filename = os.path.join(data_path, "brca1.fasta"), moltype = moltype) return alignmentobj.takeSeqs(seqnames)[:1000] def getcontroller(self,treeobj, submodobj): return submodobj.makeParamController(treeobj) def create_null_controller(self, alignobj): """A null model controller creator. We constrain the human chimp branches to be equal.""" treeobj = self.gettree() submodobj = self.getsubmod() controller = self.getcontroller(treeobj, submodobj) # we are setting a local molecular clock for human/chimp controller.setLocalClock('Human', 'Chimpanzee') return controller def create_alt_controller(self,alignobj): """An alternative model controller. 
Chimp/Human branches are free to vary.""" treeobj = self.gettree() submodobj = self.getsubmod() controller = self.getcontroller(treeobj, submodobj) return controller def calclength(self, likelihood_function): """This extracts the length of the human branch and returns it.""" return likelihood_function.getParamValue("length", 'Human') def test_conf_int(self): """testing estimation of confidence intervals.""" alignobj = self.getalignmentobj() bstrap = bootstrap.EstimateConfidenceIntervals( self.create_null_controller(alignobj), self.calclength, alignobj) bstrap.setNumReplicates(REPLICATES) bstrap.setSeed(1984) bstrap.run(local=True) samplelnL = bstrap.getSamplelnL() for lnL in samplelnL: assert lnL < 0.0, lnL observed_stat = bstrap.getObservedStats() assert float_ge_zero(observed_stat) samplestats = bstrap.getSampleStats() for stat in samplestats: assert float_ge_zero(stat) self.assertEqual(len(samplelnL), REPLICATES) self.assertEqual(len(samplestats), REPLICATES) def test_prob(self): """testing estimation of probability.""" import sys alignobj = self.getalignmentobj() prob_bstrap = bootstrap.EstimateProbability( self.create_null_controller(alignobj), self.create_alt_controller(alignobj), alignobj) prob_bstrap.setNumReplicates(REPLICATES) prob_bstrap.setSeed(1984) prob_bstrap.run(local=True) self.assertEqual(len(prob_bstrap.getSampleLRList()), REPLICATES) assert float_ge_zero(prob_bstrap.getObservedLR()) # check the returned sample LR's for being > 0.0 for sample_LR in prob_bstrap.getSampleLRList(): #print sample_LR assert float_ge_zero(sample_LR), sample_LR # check the returned observed lnL fulfill this assertion too, really # testing their order null, alt = prob_bstrap.getObservedlnL() assert float_ge_zero(2 * (alt - null)) # now check the structure of the returned sample for snull, salt in prob_bstrap.getSamplelnL(): #print salt, snull, 2*(salt-snull) assert float_ge_zero(2 * (salt - snull)) # be sure we get something back from getprob if proc rank is 0 assert 
float_ge_zero(prob_bstrap.getEstimatedProb()) if __name__ == "__main__": unittest.main() PyCogent-1.5.3/tests/test_evolve/test_coevolution.py000755 000765 000024 00001630436 12024702176 023741 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # Authors: Greg Caporaso (gregcaporaso@gmail.com), Brett Easton, Gavin Huttley # test_coevolution.py """ Description File created on 22 May 2007. """ from __future__ import division from tempfile import mktemp from os import remove, environ from os.path import exists from numpy import zeros, ones, array, transpose, arange, nan, log, e, sqrt,\ greater_equal, less_equal from cogent.util.unit_test import TestCase, main from cogent import DNA, RNA, PROTEIN, LoadTree, LoadSeqs from cogent.core.alphabet import CharAlphabet from cogent.maths.stats.util import Freqs from cogent.core.profile import Profile from cogent.core.alphabet import CharAlphabet, Alphabet from cogent.maths.stats.distribution import binomial_exact from cogent.core.alignment import DenseAlignment from cogent.seqsim.tree import RandomTree from cogent.app.util import get_tmp_filename from cogent.evolve.models import DSO78_matrix, DSO78_freqs from cogent.evolve.substitution_model import SubstitutionModel, Empirical from cogent.app.gctmpca import gctmpca_aa_order,\ default_gctmpca_aa_sub_matrix from cogent.util.misc import app_path from cogent.evolve.coevolution import mi_alignment, nmi_alignment,\ resampled_mi_alignment, sca_alignment, make_weights,\ parse_gctmpca_result_line, gDefaultNullValue, create_gctmpca_input,\ build_rate_matrix, coevolve_pair, validate_position, validate_alphabet,\ validate_alignment, unpickle_coevolution_result, mi,\ parse_gctmpca_result, sca_pair, csv_to_coevolution_matrix, sca_position,\ coevolve_position, sca_input_validation, coevolve_alignment, \ probs_from_dict, pickle_coevolution_result, \ parse_coevolution_matrix_filepath, normalized_mi, n_random_seqs, \ mi_position, mi_pair, calc_pair_scale, coevolve_alignments, protein_dict,\ 
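The bootstrap test above checks that every replicate likelihood-ratio statistic 2·(lnL_alt − lnL_null) is non-negative and that getEstimatedProb() returns something sensible. The p-value logic being exercised can be sketched as follows (the function and argument names here are illustrative, not the cogent.evolve.bootstrap API; the "fraction of replicates at least as extreme" rule is the standard parametric-bootstrap estimate and is an assumption about the internals):

```python
def estimate_prob(observed_lr, sample_lrs):
    """Bootstrap p-value: fraction of replicate LR statistics >= observed.

    LR = 2 * (lnL_alt - lnL_null), which the test asserts is non-negative
    for every replicate.  Illustrative sketch only.
    """
    if not sample_lrs:
        raise ValueError("need at least one bootstrap replicate")
    hits = sum(1 for lr in sample_lrs if lr >= observed_lr)
    return hits / float(len(sample_lrs))
```

With only REPLICATES = 2 replicates, as in the test, the returned probability is necessarily one of 0.0, 0.5, or 1.0 — which is why the test only asserts it is >= 0 rather than a specific value.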
ignore_excludes, merge_alignments, ltm_to_symmetric, join_positions,\ is_parsimony_informative, identify_aln_positions_above_threshold, \ get_subalignments, get_positional_probabilities, \ get_positional_frequencies, get_dgg, get_dg, get_allowed_perturbations, \ freqs_to_array, freqs_from_aln, \ filter_threshold_based_multiple_interdependency, \ filter_non_parsimony_informative, filter_exclude_positions, \ coevolution_matrix_to_csv, count_le_threshold, count_ge_threshold, \ nmi_position, nmi_pair, AAGapless, ancestral_state_position, \ ancestral_state_pair, coevolve_alignments_validation, \ ancestral_state_alignment, nmi, build_coevolution_matrix_filepath,\ aln_position_pairs_cmp_threshold, validate_tree, validate_ancestral_seqs,\ validate_ancestral_seqs, get_ancestral_seqs, \ ancestral_states_input_validation, ancestral_state_pair, gctmpca_alignment,\ aln_position_pairs_ge_threshold, aln_position_pairs_ge_threshold,\ aln_position_pairs_le_threshold, gctmpca_pair __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Beta" class CoevolutionTests(TestCase): """ Tests of coevolution.py """ def setUp(self): """Set up variables for us in tests """ self.run_slow_tests = int(environ.get('TEST_SLOW_APPC',0)) self.run_gctmpca_tests = app_path('calculate_likelihood') ## Data used in SCA tests self.dna_aln = DenseAlignment(data=zip(\ range(4),['ACGT','AGCT','ACCC','TAGG']),MolType=DNA) self.rna_aln = DenseAlignment(data=zip(\ range(4),['ACGU','AGCU','ACCC','UAGG']),MolType=RNA) self.protein_aln = DenseAlignment(data=zip(\ range(4),['ACGP','AGCT','ACCC','TAGG']),MolType=PROTEIN) self.dna_aln_gapped = DenseAlignment(data=zip(range(4),\ ['A-CGT','AGC-T','-ACCC','TAGG-']),MolType=DNA) self.freq = DenseAlignment(data=zip(range(20),\ ['TCT', 'CCT', 'CCC', 'CCC',\ 'CCG', 'CC-', 'AC-', 'AC-', 
'AA-', 'AA-', 'GA-', 'GA-', 'GA-', 'GA-',\ 'GA-', 'G--', 'G--', 'G--', 'G--', 'G--',]),MolType=PROTEIN) self.two_pos = DenseAlignment(data=zip(map(str,range(20)),\ ['TC', 'CC', 'CC', 'CC', 'CC', 'CC', 'AC', 'AC', \ 'AA', 'AA', 'GA', 'GA', 'GA', 'GA', 'GA', 'GT', \ 'GT', 'GT', 'GT', 'GT']),MolType=PROTEIN) self.tree20 = LoadTree(treestring=tree20_string) self.gpcr_aln = gpcr_aln self.myos_aln = myos_aln # a made-up dict of base frequencies to use as the natural freqs # for SCA calcs on DNA seqs self.dna_base_freqs = dict(zip('ACGT',[0.25]*4)) self.rna_base_freqs = dict(zip('ACGU',[0.25]*4)) self.run_slow_gctmpca_tests = False self.protein_aln4 = DenseAlignment([('A1','AACF'),('A12','AADF'),\ ('A123','ADCF'),('A111','AAD-')],\ MolType=PROTEIN) self.rna_aln4 = DenseAlignment([('A1','AAUU'),('A12','ACGU'),\ ('A123','UUAA'),('A111','AAA-')],\ MolType=RNA) self.dna_aln4 = DenseAlignment([('A1','AATT'),('A12','ACGT'),\ ('A123','TTAA'),('A111','AAA?')],\ MolType=DNA) self.tree4 = LoadTree(treestring=\ "((A1:0.5,A111:0.5):0.5,(A12:0.5,A123:0.5):0.5);") def test_alignment_analyses_moltype_protein(self): """ alignment methods work with moltype = PROTEIN """ r = mi_alignment(self.protein_aln4) self.assertEqual(r.shape,(4,4)) r = nmi_alignment(self.protein_aln4) self.assertEqual(r.shape,(4,4)) r = sca_alignment(self.protein_aln4,cutoff=0.75) self.assertEqual(r.shape,(4,4)) r = ancestral_state_alignment(self.protein_aln4,self.tree4) self.assertEqual(r.shape,(4,4)) # check if we're running the GCTMPCA tests and the slow tests if self.run_slow_tests and self.run_gctmpca_tests: r = gctmpca_alignment(self.protein_aln4,self.tree4,epsilon=0.7) self.assertEqual(r.shape,(4,4)) def test_alignment_analyses_moltype_rna(self): """ alignment methods work with moltype = RNA """ r = mi_alignment(self.rna_aln4) self.assertEqual(r.shape,(4,4)) r = nmi_alignment(self.rna_aln4) self.assertEqual(r.shape,(4,4)) r = sca_alignment(self.rna_aln4,cutoff=0.75,alphabet='ACGU',\ 
background_freqs=self.rna_base_freqs) self.assertEqual(r.shape,(4,4)) r = ancestral_state_alignment(self.rna_aln4,self.tree4) self.assertEqual(r.shape,(4,4)) # check if we're running the GCTMPCA tests and the slow tests if self.run_slow_tests and self.run_gctmpca_tests: r = gctmpca_alignment(self.rna_aln4,self.tree4,epsilon=0.7) self.assertEqual(r.shape,(4,4)) def test_alignment_analyses_moltype_dna(self): """ alignment methods work with moltype = DNA """ r = mi_alignment(self.dna_aln4) self.assertEqual(r.shape,(4,4)) r = nmi_alignment(self.dna_aln4) self.assertEqual(r.shape,(4,4)) r = sca_alignment(self.dna_aln4,cutoff=0.75,alphabet='ACGT',\ background_freqs=self.dna_base_freqs) self.assertEqual(r.shape,(4,4)) r = ancestral_state_alignment(self.dna_aln4,self.tree4) self.assertEqual(r.shape,(4,4)) # check if we're running the GCTMPCA tests and the slow tests if self.run_slow_tests and self.run_gctmpca_tests: # Gctmpca method doesn't support DNA alignments. self.assertRaises(ValueError,gctmpca_alignment,self.dna_aln4,\ self.tree4,epsilon=0.7) def test_join_positions(self): """ join_positions functions as expected """ self.assertEqual(join_positions(list('ABCD'),list('WXYZ')),\ ['AW','BX','CY','DZ']) self.assertEqual(join_positions(list('AAA'),list('BBB')),\ ['AB','AB','AB']) self.assertEqual(join_positions([],[]),[]) def test_mi(self): """ mi calculations function as expected with valid data""" self.assertFloatEqual(mi(1.0,1.0,1.0),1.0) self.assertFloatEqual(mi(1.0,1.0,2.0),0.0) self.assertFloatEqual(mi(1.0,1.0,1.5),0.5) def test_normalized_mi(self): """ normalized mi calculations function as expected with valid data""" self.assertFloatEqual(normalized_mi(1.0,1.0,1.0),1.0) self.assertFloatEqual(normalized_mi(1.0,1.0,2.0),0.0) self.assertFloatEqual(normalized_mi(1.0,1.0,1.5),0.3333,3) def test_mi_pair(self): """ mi_pair calculates mi from a pair of columns """ aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN) 
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), 0.0) aln = DenseAlignment(data={'1':'AB','2':'BA'},MolType=PROTEIN) self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), 1.0) # order of positions doesn't matter (when it shouldn't) aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN) self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1),\ mi_pair(aln,pos1=1,pos2=0)) aln = DenseAlignment(data={'1':'AB','2':'BA'},MolType=PROTEIN) self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), \ mi_pair(aln,pos1=1,pos2=0)) def test_wrapper_functions_handle_invalid_parameters(self): """coevolve_*: functions error on missing parameters""" # missing cutoff aln = DenseAlignment(data={'1':'AC','2':'AC'},MolType=PROTEIN) self.assertRaises(ValueError,coevolve_pair,sca_pair,aln,0,1) self.assertRaises(ValueError,coevolve_position,sca_position,aln,0) self.assertRaises(ValueError,coevolve_alignment,sca_alignment,aln) self.assertRaises(ValueError,coevolve_alignments,sca_alignment,aln,aln) def test_coevolve_pair(self): """coevolve_pair: returns same as pair methods called directly """ aln = DenseAlignment(data={'1':'AC','2':'AC'},MolType=PROTEIN) t = LoadTree(treestring='(1:0.5,2:0.5);') cutoff = 0.50 # mi_pair == coevolve_pair(mi_pair,...) self.assertFloatEqual(coevolve_pair(mi_pair,aln,pos1=0,pos2=1),\ mi_pair(aln,pos1=0,pos2=1)) self.assertFloatEqual(coevolve_pair(nmi_pair,aln,pos1=0,pos2=1),\ nmi_pair(aln,pos1=0,pos2=1)) self.assertFloatEqual(coevolve_pair(ancestral_state_pair,aln,pos1=0,\ pos2=1,tree=t),ancestral_state_pair(aln,pos1=0,pos2=1,tree=t)) self.assertFloatEqual(coevolve_pair(sca_pair,aln,pos1=0,\ pos2=1,cutoff=cutoff),sca_pair(aln,pos1=0,pos2=1,cutoff=cutoff)) def test_coevolve_position(self): """coevolve_position: returns same as position methods called directly """ aln = DenseAlignment(data={'1':'AC','2':'AC'},MolType=PROTEIN) t = LoadTree(treestring='(1:0.5,2:0.5);') cutoff = 0.50 # mi_position == coevolve_position(mi_position,...) 
self.assertFloatEqual(coevolve_position(mi_position,aln,position=0),\ mi_position(aln,position=0)) self.assertFloatEqual(coevolve_position(nmi_position,aln,position=0),\ nmi_position(aln,position=0)) self.assertFloatEqual(coevolve_position(\ ancestral_state_position,aln,position=0,\ tree=t),ancestral_state_position(aln,position=0,tree=t)) self.assertFloatEqual(coevolve_position(sca_position,aln,position=0,\ cutoff=cutoff),sca_position(aln,position=0,cutoff=cutoff)) def test_coevolve_alignment(self): """coevolve_alignment: returns same as alignment methods""" aln = DenseAlignment(data={'1':'AC','2':'AC'},MolType=PROTEIN) t = LoadTree(treestring='(1:0.5,2:0.5);') cutoff = 0.50 # mi_alignment == coevolve_alignment(mi_alignment,...) self.assertFloatEqual(coevolve_alignment(mi_alignment,aln),\ mi_alignment(aln)) self.assertFloatEqual(coevolve_alignment(mip_alignment,aln),\ mip_alignment(aln)) self.assertFloatEqual(coevolve_alignment(mia_alignment,aln),\ mia_alignment(aln)) self.assertFloatEqual(coevolve_alignment(nmi_alignment,aln),\ nmi_alignment(aln)) self.assertFloatEqual(coevolve_alignment(ancestral_state_alignment,aln,\ tree=t),ancestral_state_alignment(aln,tree=t)) self.assertFloatEqual(coevolve_alignment(sca_alignment,aln,\ cutoff=cutoff),sca_alignment(aln,cutoff=cutoff)) def test_coevolve_alignments_validation_idenifiers(self): """coevolve_alignments_validation: seq/tree validation functions """ method = sca_alignment aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) t = LoadTree(treestring='(1:0.5,2:0.5);') # OK w/ no tree coevolve_alignments_validation(method,aln1,aln2,2,None) # OK w/ tree coevolve_alignments_validation(method,aln1,aln2,2,None,tree=t) # If there is a plus present in identifiers, we only care about the # text before the colon aln1 = DenseAlignment(data={'1+a':'AC','2+b':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(\ data={'1 + c':'EFW','2 + 
d':'EGY'},MolType=PROTEIN) t = LoadTree(treestring='(1+e:0.5,2 + f:0.5);') # OK w/ no tree coevolve_alignments_validation(method,aln1,aln2,2,None) # OK w/ tree coevolve_alignments_validation(method,aln1,aln2,2,None,tree=t) # mismatch b/w alignments seq names aln1 = DenseAlignment(data={'3':'AC','2':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) t = LoadTree(treestring='(1:0.5,2:0.5);') self.assertRaises(AssertionError,coevolve_alignments_validation,\ method,aln1,aln2,2,None,tree=t) # mismatch b/w alignments and tree seq names aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) t = LoadTree(treestring='(3:0.5,2:0.5);') self.assertRaises(AssertionError,\ coevolve_alignments_validation,method,\ aln1,aln2,2,None,tree=t) # mismatch b/w alignments in number of seqs aln1 = DenseAlignment(\ data={'1':'AC','2':'AD','3':'AA'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) t = LoadTree(treestring='(1:0.5,2:0.5);') self.assertRaises(AssertionError,coevolve_alignments_validation,\ method,aln1,aln2,2,None) self.assertRaises(AssertionError,coevolve_alignments_validation,\ method,aln1,aln2,2,None,tree=t) # mismatch b/w alignments & tree in number of seqs aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) t = LoadTree(treestring='(1:0.5,(2:0.5,3:0.25));') self.assertRaises(AssertionError,coevolve_alignments_validation,\ method,aln1,aln2,2,None,tree=t) def test_coevolve_alignments_validation_min_num_seqs(self): """coevolve_alignments_validation: ValueError on fewer than min_num_seqs """ method = mi_alignment # too few sequences -> ValueError aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) coevolve_alignments_validation(method,aln1,aln2,1,None) 
coevolve_alignments_validation(method,aln1,aln2,2,None) self.assertRaises(ValueError,\ coevolve_alignments_validation,method,aln1,aln2,3,None) def test_coevolve_alignments_validation_max_num_seqs(self): """coevolve_alignments_validation: min_num_seqs <= max_num_seqs """ method = mi_alignment # min_num_seqs > max_num_seqs-> ValueError aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) coevolve_alignments_validation(method,aln1,aln2,1,None) coevolve_alignments_validation(method,aln1,aln2,1,3) coevolve_alignments_validation(method,aln1,aln2,2,3) self.assertRaises(ValueError,\ coevolve_alignments_validation,method,aln1,aln2,3,2) def test_coevolve_alignments_validation_moltypes(self): """coevolve_alignments_validation: valid for acceptable MolTypes """ aln1 = DenseAlignment(data={'1':'AC','2':'AU'},MolType=RNA) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) # different MolType coevolve_alignments_validation(mi_alignment,aln1,aln2,2,None) coevolve_alignments_validation(nmi_alignment,aln1,aln2,2,None) coevolve_alignments_validation(\ resampled_mi_alignment,aln1,aln2,2,None) self.assertRaises(AssertionError,coevolve_alignments_validation,\ sca_alignment,aln1,aln2,2,None) self.assertRaises(AssertionError,coevolve_alignments_validation,\ ancestral_state_alignment,aln1,aln2,2,None) self.assertRaises(AssertionError,coevolve_alignments_validation,\ gctmpca_alignment,aln1,aln2,2,None) def test_coevolve_alignments(self): """ coevolve_alignments: returns correct len(aln1) x len(aln2) matrix """ aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) combined_aln =\ DenseAlignment(data={'1':'ACEFW','2':'ADEGY'},MolType=PROTEIN) t = LoadTree(treestring='(1:0.5,2:0.5);') cutoff = 0.50 # MI m = mi_alignment(combined_aln) expected = array([[m[2,0],m[2,1]],\ [m[3,0],m[3,1]],[m[4,0],m[4,1]]]) 
self.assertFloatEqual(coevolve_alignments(mi_alignment,aln1,aln2),\ expected) # MI (return_full=True) self.assertFloatEqual(coevolve_alignments(mi_alignment,aln1,aln2,\ return_full=True),m) # NMI m = nmi_alignment(combined_aln) expected = array([[m[2,0],m[2,1]],\ [m[3,0],m[3,1]],[m[4,0],m[4,1]]]) self.assertFloatEqual(coevolve_alignments(nmi_alignment,aln1,aln2),\ expected) # AS m = ancestral_state_alignment(combined_aln,tree=t) expected = array([[m[2,0],m[2,1]],\ [m[3,0],m[3,1]],[m[4,0],m[4,1]]]) self.assertFloatEqual(\ coevolve_alignments(ancestral_state_alignment,aln1,aln2,\ tree=t),expected) # SCA m = sca_alignment(combined_aln,cutoff=cutoff) expected = array([[m[2,0],m[2,1]],\ [m[3,0],m[3,1]],[m[4,0],m[4,1]]]) self.assertFloatEqual(coevolve_alignments(sca_alignment,aln1,aln2,\ cutoff=cutoff),expected) def test_coevolve_alignments_watches_min_num_seqs(self): """ coevolve_alignments: error on too few sequences """ aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) coevolve_alignments(mi_alignment,aln1,aln2) coevolve_alignments(mi_alignment,aln1,aln2,min_num_seqs=0) coevolve_alignments(mi_alignment,aln1,aln2,min_num_seqs=1) coevolve_alignments(mi_alignment,aln1,aln2,min_num_seqs=2) self.assertRaises(ValueError,\ coevolve_alignments,mi_alignment,aln1,aln2,min_num_seqs=3) self.assertRaises(ValueError,\ coevolve_alignments,mi_alignment,aln1,aln2,min_num_seqs=50) def test_coevolve_alignments_watches_max_num_seqs(self): """ coevolve_alignments: filtering or error on too many sequences """ aln1 = DenseAlignment(data={'1':'AC','2':'AD','3':'YP'},\ MolType=PROTEIN) aln2 = DenseAlignment(data={'1':'ACP','2':'EAD','3':'PYP'},\ MolType=PROTEIN) # keep all seqs tmp_filepath = get_tmp_filename(\ prefix='tmp_test_coevolution',suffix='.fasta') coevolve_alignments(mi_alignment,aln1,aln2,max_num_seqs=3,\ merged_aln_filepath=tmp_filepath) self.assertEqual(LoadSeqs(tmp_filepath).getNumSeqs(),3) # keep 2 
seqs coevolve_alignments(mi_alignment,aln1,aln2,max_num_seqs=2,\ merged_aln_filepath=tmp_filepath) self.assertEqual(LoadSeqs(tmp_filepath).getNumSeqs(),2) # error if no sequence filter self.assertRaises(ValueError,\ coevolve_alignments,mi_alignment,aln1,aln2,max_num_seqs=2,\ merged_aln_filepath=tmp_filepath,sequence_filter=None) # clean up the temporary file remove(tmp_filepath) def test_coevolve_alignments_different_MolType(self): """ coevolve_alignments: different MolTypes supported """ aln1 = DenseAlignment(data={'1':'AC','2':'AU'},MolType=RNA) aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN) combined_aln = DenseAlignment(data={'1':'ACEFW','2':'AUEGY'}) t = LoadTree(treestring='(1:0.5,2:0.5);') cutoff = 0.50 # MI m = mi_alignment(combined_aln) expected = array([[m[2,0],m[2,1]],\ [m[3,0],m[3,1]],[m[4,0],m[4,1]]]) self.assertFloatEqual(coevolve_alignments(mi_alignment,aln1,aln2),\ expected) # MI (return_full=True) self.assertFloatEqual(coevolve_alignments(mi_alignment,aln1,aln2,\ return_full=True),m) # NMI m = nmi_alignment(combined_aln) expected = array([[m[2,0],m[2,1]],\ [m[3,0],m[3,1]],[m[4,0],m[4,1]]]) self.assertFloatEqual(coevolve_alignments(nmi_alignment,aln1,aln2),\ expected) def test_mi_pair_cols_default_exclude_handling(self): """ mi_pair returns null_value on excluded by default """ aln = DenseAlignment(data={'1':'AB','2':'-B'},MolType=PROTEIN) self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), gDefaultNullValue) aln = DenseAlignment(data={'1':'-B','2':'-B'},MolType=PROTEIN) self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), gDefaultNullValue) aln = DenseAlignment(data={'1':'AA','2':'-B'},MolType=PROTEIN) self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), gDefaultNullValue) aln = DenseAlignment(data={'1':'AA','2':'PB'},MolType=PROTEIN) self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,excludes='P'),\ gDefaultNullValue) def test_mi_pair_cols_non_default_exclude_handling(self): """ mi_pair uses non-default exclude_handler when provided""" aln 
= DenseAlignment(data={'1':'A-','2':'A-'},MolType=PROTEIN)
        self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), gDefaultNullValue)
        self.assertFloatEqual(\
            mi_pair(aln,pos1=0,pos2=1,exclude_handler=ignore_excludes),0.0)

    def test_mi_pair_cols_and_entropies(self):
        """ mi_pair calculates mi from a pair of columns and precalc entropies
        """
        aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN)
        self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,h1=0.0,h2=0.0), 0.0)
        aln = DenseAlignment(data={'1':'AB','2':'BA'},MolType=PROTEIN)
        self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,h1=1.0,h2=1.0), 1.0)
        # incorrect positional entropies provided to ensure that the
        # precalculated values are used, and that entropies are not
        # calculated on-the-fly.
        aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN)
        self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,h1=1.0,h2=1.0), 2.0)

    def test_mi_pair_alt_calculator(self):
        """ mi_pair uses alternate mi_calculator when provided """
        aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN)
        self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1),0.0)
        self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,\
            mi_calculator=normalized_mi),gDefaultNullValue)

    def test_mi_position_valid_input(self):
        """ mi_position functions with varied valid input """
        aln = DenseAlignment(data={'1':'ACG','2':'GAC'},MolType=PROTEIN)
        self.assertFloatEqual(mi_position(aln,0),array([1.0,1.0,1.0]))
        aln = DenseAlignment(data={'1':'ACG','2':'ACG'},MolType=PROTEIN)
        self.assertFloatEqual(mi_position(aln,0),array([0.0,0.0,0.0]))
        aln = DenseAlignment(data={'1':'ACG','2':'ACG'},MolType=PROTEIN)
        self.assertFloatEqual(mi_position(aln,2),array([0.0,0.0,0.0]))

    def test_mi_position_from_alignment_nmi(self):
        """mi_position functions w/ alternate mi_calculator """
        aln = DenseAlignment(data={'1':'ACG','2':'ACG'},MolType=PROTEIN)
        self.assertFloatEqual(mi_position(aln,0),array([0.0,0.0,0.0]))
        aln = DenseAlignment(data={'1':'ACG','2':'ACG'},MolType=PROTEIN)
        self.assertFloatEqual(mi_position(aln,0,mi_calculator=normalized_mi),\
            array([gDefaultNullValue,gDefaultNullValue,gDefaultNullValue]))

    def test_mi_position_from_alignment_default_exclude_handling(self):
        """ mi_position handles excludes by setting to null_value"""
        aln = DenseAlignment(data={'1':'ACG','2':'G-C'},MolType=PROTEIN)
        self.assertFloatEqual(mi_position(aln,0),\
            array([1.0,gDefaultNullValue,1.0]))
        aln = DenseAlignment(data={'1':'ACG','2':'GPC'},MolType=PROTEIN)
        self.assertFloatEqual(mi_position(aln,0,excludes='P'),\
            array([1.0,gDefaultNullValue,1.0]))

    def test_mi_position_from_alignment_non_default_exclude_handling(self):
        """ mi_position handles excludes w/ non-default method"""
        aln = DenseAlignment(data={'1':'ACG','2':'G-C'},MolType=PROTEIN)
        self.assertFloatEqual(\
            mi_position(aln,0,exclude_handler=ignore_excludes),\
            array([1.0,1.0,1.0]))

    def test_mi_alignment_excludes(self):
        """ mi_alignment handles excludes properly """
        expected = array([[0.0, gDefaultNullValue, 0.0],\
            [gDefaultNullValue,gDefaultNullValue,gDefaultNullValue],\
            [0.0,gDefaultNullValue,0.0]])
        # gap in second column
        aln = DenseAlignment(data={'1':'ACG','2':'A-G'},MolType=PROTEIN)
        self.assertFloatEqual(mi_alignment(aln),expected)
        # excludes = 'P'
        aln = DenseAlignment(data={'1':'ACG','2':'APG'},MolType=PROTEIN)
        self.assertFloatEqual(mi_alignment(aln,excludes='P'),\
            expected)
        # gap in first column
        expected = array([\
            [gDefaultNullValue, gDefaultNullValue, gDefaultNullValue],\
            [gDefaultNullValue,0.0,0.0], [gDefaultNullValue,0.0,0.0]])
        aln = DenseAlignment(data={'1':'-CG','2':'ACG'},MolType=PROTEIN)
        self.assertFloatEqual(mi_alignment(aln),expected)

    def test_mi_alignment_high(self):
        """ mi_alignment detects perfectly correlated columns """
        expected = [[1.0, 1.0],[1.0,1.0]]
        aln = DenseAlignment(data={'1':'AG','2':'GA'},MolType=PROTEIN)
        self.assertFloatEqual(mi_alignment(aln),expected)

    def test_mi_alignment_low(self):
        """ mi_alignment detects perfectly uncorrelated columns """
        expected = [[0.0,
0.0],[0.0,1.0]] aln = DenseAlignment(data={'1':'AG','2':'AC'},MolType=PROTEIN) self.assertFloatEqual(mi_alignment(aln),expected) def test_resampled_mi_alignment(self): """ resampled_mi_alignment returns without error """ aln = DenseAlignment(data={'1':'ACDEF','2':'ACFEF','3':'ACGEF'},\ MolType=PROTEIN) resampled_mi_alignment(aln) aln = DenseAlignment(data={'1':'ACDEF','2':'ACF-F','3':'ACGEF'},\ MolType=PROTEIN) resampled_mi_alignment(aln) def test_coevolve_alignment(self): """ coevolve_alignment functions as expected with varied input """ aln1 = DenseAlignment(data={'1':'ACDEF','2':'ACFEF','3':'ACGEF'},\ MolType=PROTEIN) # no kwargs passed self.assertFloatEqual(coevolve_alignment(mi_alignment,aln1),\ mi_alignment(aln1)) # different method passed self.assertFloatEqual(coevolve_alignment(nmi_alignment,aln1),\ nmi_alignment(aln1)) # kwargs passed self.assertFloatEqual(coevolve_alignment(mi_alignment,aln1,\ mi_calculator=nmi),nmi_alignment(aln1)) def test_build_coevolution_matrix_filepath(self): """ build_coevolution_matrix_filepath functions w/ varied input """ self.assertEqual(build_coevolution_matrix_filepath(\ './blah.fasta'),'./blah') self.assertEqual(build_coevolution_matrix_filepath(\ 'blah.fasta'),'./blah') self.assertEqual(build_coevolution_matrix_filepath('blah'),'./blah') self.assertEqual(build_coevolution_matrix_filepath('./blah'),'./blah') self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\ output_dir='./duh/',method='xx',alphabet='yyy'),\ './duh/blah.yyy.xx') self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\ output_dir='./duh/',method='xx',alphabet='yyy',\ parameter=0.25),\ './duh/blah.yyy.xx') self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\ output_dir='./duh/',method='xx'),'./duh/blah.xx') self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\ output_dir='./duh/',method='sca',parameter=0.25),\ './duh/blah.sca_25') self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\ 
output_dir='./duh/',method='sca',parameter=0.25,\ alphabet='xx'),'./duh/blah.xx.sca_25') # no trailing / to output_dir self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\ output_dir='./duh',method='sca',parameter=0.25,\ alphabet='xx'),'./duh/blah.xx.sca_25') self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\ output_dir='./duh/',method='gctmpca',parameter=0.25),\ './duh/blah.gctmpca_25') self.assertRaises(ValueError,build_coevolution_matrix_filepath,\ './blah.fasta','./duh/','gctmpca') self.assertRaises(ValueError,build_coevolution_matrix_filepath,\ './blah.fasta','./duh/','sca') self.assertRaises(ValueError,build_coevolution_matrix_filepath,\ './blah.fasta','./duh/','sca','xx') def test_pickle_coevolution_result_error(self): """pickle matrix: IOError handled correctly""" m = array([[1,2],[3,4]]) self.assertRaises(IOError,pickle_coevolution_result,m,'') def test_unpickle_coevolution_result_error(self): """unpickle matrix: IOError handled correctly""" self.assertRaises(IOError,unpickle_coevolution_result,\ 'invalid/file/path.pkl') def test_pickle_and_unpickle(self): """unpickle(pickle(matrix)) == matrix""" for expected in [4.5,array([1.2,4.3,5.5]),\ array([[1.4,2.2],[3.0,0.4]])]: filepath = mktemp() pickle_coevolution_result(expected,filepath) actual = unpickle_coevolution_result(filepath) self.assertFloatEqual(actual,expected) remove(filepath) def test_csv_coevolution_result_error(self): """matrix -> csv: IOError handled correctly""" m = array([[1,2],[3,4]]) self.assertRaises(IOError,coevolution_matrix_to_csv,m,'') def test_uncsv_coevolution_result_error(self): """csv -> matrix: IOError handled correctly""" self.assertRaises(IOError,csv_to_coevolution_matrix,\ 'invalid/file/path.pkl') def test_csv_and_uncsv(self): """converting to/from csv matrix results in correct coevolution matrix """ expected = array([[1.4,2.2],[gDefaultNullValue,0.4]]) filepath = mktemp() coevolution_matrix_to_csv(expected,filepath) actual = 
csv_to_coevolution_matrix(filepath) self.assertFloatEqual(actual,expected) remove(filepath) def test_parse_coevolution_matrix_filepath(self): """Parsing matrix filepaths works as expected. """ expected = ('myosin_995', 'a1_4', 'nmi') self.assertEqual(\ parse_coevolution_matrix_filepath('pkls/myosin_995.a1_4.nmi.pkl'),\ expected) self.assertEqual(\ parse_coevolution_matrix_filepath('pkls/myosin_995.a1_4.nmi.csv'),\ expected) expected = ('p53','orig','mi') self.assertEqual(\ parse_coevolution_matrix_filepath('p53.orig.mi.pkl'),\ expected) self.assertEqual(\ parse_coevolution_matrix_filepath('p53.orig.mi.csv'),\ expected) def test_parse_coevolution_matrix_filepath_error(self): """Parsing matrix file paths handles invalid filepaths """ self.assertRaises(ValueError,\ parse_coevolution_matrix_filepath,'pkls/myosin_995.nmi.pkl') self.assertRaises(ValueError,\ parse_coevolution_matrix_filepath,'pkls/myosin_995.pkl') self.assertRaises(ValueError,\ parse_coevolution_matrix_filepath,'pkls/myosin_995') self.assertRaises(ValueError,\ parse_coevolution_matrix_filepath,'') def test_identify_aln_positions_above_threshold(self): """Extracting scores above threshold works as expected """ m = array([\ [gDefaultNullValue,gDefaultNullValue,\ gDefaultNullValue,gDefaultNullValue],\ [0.3, 1.0,gDefaultNullValue,gDefaultNullValue],\ [0.25,0.75,1.0,gDefaultNullValue], [0.9,0.751,0.8,1.0]]) self.assertEqual(identify_aln_positions_above_threshold(m,0.75,0),[]) self.assertEqual(identify_aln_positions_above_threshold(m,0.75,1),\ [1]) self.assertEqual(identify_aln_positions_above_threshold(m,0.75,2),\ [1,2]) self.assertEqual(identify_aln_positions_above_threshold(m,0.75,3),\ [0,1,2,3]) m = ltm_to_symmetric(m) self.assertEqual(identify_aln_positions_above_threshold(m,0.75,0),\ [3]) self.assertEqual(identify_aln_positions_above_threshold(m,0.75,1),\ [1,2,3]) self.assertEqual(identify_aln_positions_above_threshold(m,0.75,2),\ [1,2,3]) 
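The identify_aln_positions_above_threshold assertions above imply a simple row scan: for one alignment position (one row of the coevolution matrix), collect the indices of all partner positions whose score meets the threshold, skipping null entries. A minimal sketch of that semantics follows; the function name and the NULL sentinel are illustrative stand-ins, not the PyCogent implementation (which uses gDefaultNullValue).

```python
NULL = -1e10  # illustrative stand-in for gDefaultNullValue

def positions_above_threshold(matrix_row, threshold, null=NULL):
    # Return indices of all non-null scores that meet the threshold
    # (the tests above show the comparison is inclusive, i.e. >=).
    return [j for j, score in enumerate(matrix_row)
            if score != null and score >= threshold]
```

For the lower-triangular matrices used above, only the filled half of a row carries scores; symmetrizing the matrix first (as ltm_to_symmetric does) makes the whole row meaningful.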
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,3),\ [0,1,2,3]) self.assertEqual(identify_aln_positions_above_threshold(m,1.1,0),\ []) self.assertEqual(identify_aln_positions_above_threshold(m,-5.,0),\ [1,2,3]) self.assertEqual(identify_aln_positions_above_threshold(m,-5.,1),\ [0,1,2,3]) def test_count_ge_threshold(self): """count_ge_threshold works as expected """ m = array([[gDefaultNullValue]*3]*3) self.assertEqual(count_ge_threshold(m,1.0),(0,0)) self.assertEqual(count_ge_threshold(m,\ gDefaultNullValue,gDefaultNullValue),(0,0)) self.assertEqual(count_ge_threshold(m,1.0,42),(0,9)) m = array([[0,1,2],[3,4,5],[6,7,8]]) self.assertEqual(count_ge_threshold(m,4),(5,9)) self.assertEqual(count_ge_threshold(m,8),(1,9)) self.assertEqual(count_ge_threshold(m,9),(0,9)) m = array([[0,gDefaultNullValue,gDefaultNullValue],\ [gDefaultNullValue,4,5],[6,7,8]]) self.assertEqual(count_ge_threshold(m,4),(5,6)) self.assertEqual(count_ge_threshold(m,8),(1,6)) self.assertEqual(count_ge_threshold(m,9),(0,6)) def test_count_le_threshold(self): """count_le_threshold works as expected """ m = array([[gDefaultNullValue]*3]*3) self.assertEqual(count_le_threshold(m,1.0),(0,0)) self.assertEqual(count_le_threshold(m,\ gDefaultNullValue,gDefaultNullValue),(0,0)) self.assertEqual(count_le_threshold(m,1.0,42),(0,9)) m = array([[0,1,2],[3,4,5],[6,7,8]]) self.assertEqual(count_le_threshold(m,4),(5,9)) self.assertEqual(count_le_threshold(m,8),(9,9)) self.assertEqual(count_le_threshold(m,9),(9,9)) m = array([[0,gDefaultNullValue,gDefaultNullValue],\ [gDefaultNullValue,4,5],[6,7,8]]) self.assertEqual(count_le_threshold(m,4),(2,6)) self.assertEqual(count_le_threshold(m,8),(6,6)) self.assertEqual(count_le_threshold(m,9),(6,6)) def test_count_ge_threshold_symmetric_ignore_diagonal(self): """count_ge_threshold works with symmetric and/or ignoring diag = True """ # no good scores, varied null value m = array([[gDefaultNullValue]*3]*3) 
self.assertEqual(count_ge_threshold(m,1.0,symmetric=True),(0,0)) self.assertEqual(count_ge_threshold(m,1.0,symmetric=True),(0,0)) self.assertEqual(count_ge_threshold(m,1.0,42,symmetric=True),(0,6)) self.assertEqual(count_ge_threshold(m,1.0,ignore_diagonal=True),(0,0)) self.assertEqual(count_ge_threshold(m,1.0,ignore_diagonal=True),(0,0)) self.assertEqual(count_ge_threshold(m,1.0,42,\ ignore_diagonal=True),(0,6)) self.assertEqual(count_ge_threshold(m,1.0,\ ignore_diagonal=True,symmetric=True),(0,0)) self.assertEqual(count_ge_threshold(m,1.0,\ ignore_diagonal=True,symmetric=True),(0,0)) self.assertEqual(count_ge_threshold(m,1.0,42,\ ignore_diagonal=True,symmetric=True),(0,3)) # no null values, varied other values m = array([[0,1,2],[3,4,5],[6,7,8]]) self.assertEqual(count_ge_threshold(m,4),(5,9)) self.assertEqual(count_ge_threshold(m,4,symmetric=True),(4,6)) self.assertEqual(count_ge_threshold(m,4,ignore_diagonal=True),(3,6)) self.assertEqual(count_ge_threshold(m,4,symmetric=True,\ ignore_diagonal=True),(2,3)) # null and mixed values m = array([\ [0,gDefaultNullValue,gDefaultNullValue],\ [3,4,gDefaultNullValue],\ [gDefaultNullValue,7,8]]) self.assertEqual(count_ge_threshold(m,4),(3,5)) self.assertEqual(count_ge_threshold(m,4,symmetric=True),(3,5)) self.assertEqual(count_ge_threshold(m,4,ignore_diagonal=True),(1,2)) self.assertEqual(count_ge_threshold(m,4,symmetric=True,\ ignore_diagonal=True),(1,2)) def test_count_le_threshold_symmetric_ignore_diagonal(self): """count_le_threshold works with symmetric and/or ignoring diag = True """ # varied null value m = array([[gDefaultNullValue]*3]*3) self.assertEqual(count_le_threshold(m,1.0,symmetric=True),(0,0)) self.assertEqual(count_le_threshold(m,1.0,symmetric=True),(0,0)) self.assertEqual(count_le_threshold(m,1.0,42,symmetric=True),(0,6)) self.assertEqual(count_le_threshold(m,1.0,ignore_diagonal=True),(0,0)) self.assertEqual(count_le_threshold(m,1.0,ignore_diagonal=True),(0,0)) 
self.assertEqual(count_le_threshold(m,1.0,42,\ ignore_diagonal=True),(0,6)) self.assertEqual(count_le_threshold(m,1.0,\ ignore_diagonal=True,symmetric=True),(0,0)) self.assertEqual(count_le_threshold(m,1.0,\ ignore_diagonal=True,symmetric=True),(0,0)) self.assertEqual(count_le_threshold(m,1.0,42,\ ignore_diagonal=True,symmetric=True),(0,3)) # no null values, varied other values m = array([[0,1,2],[3,4,5],[6,7,8]]) self.assertEqual(count_le_threshold(m,4),(5,9)) self.assertEqual(count_le_threshold(m,4,symmetric=True),(3,6)) self.assertEqual(count_le_threshold(m,4,ignore_diagonal=True),(3,6)) self.assertEqual(count_le_threshold(m,4,symmetric=True,\ ignore_diagonal=True),(1,3)) # null and mixed values m = array([\ [0,gDefaultNullValue,gDefaultNullValue],\ [3,4,gDefaultNullValue],\ [gDefaultNullValue,7,8]]) self.assertEqual(count_le_threshold(m,4),(3,5)) self.assertEqual(count_le_threshold(m,4,symmetric=True),(3,5)) self.assertEqual(count_le_threshold(m,4,ignore_diagonal=True),(1,2)) self.assertEqual(count_le_threshold(m,4,symmetric=True,\ ignore_diagonal=True),(1,2)) def test_aln_position_pairs_cmp_threshold_intramolecular(self): """aln_position_pairs_ge_threshold: intramolecular matrix """ m = array([\ [0,gDefaultNullValue,gDefaultNullValue],\ [3,4,gDefaultNullValue],\ [gDefaultNullValue,7,8]]) # cmp_function = ge self.assertEqual(aln_position_pairs_cmp_threshold(m,3.5,greater_equal),\ [(1,1),(2,1),(2,2)]) # cmp_function = greater_equal, alt null_value self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,greater_equal,null_value=4),\ [(2,1),(2,2)]) # cmp_function = le self.assertEqual(aln_position_pairs_cmp_threshold(m,3.5,less_equal),\ [(0,0),(1,0)]) # results equal results with wrapper functions self.assertEqual(aln_position_pairs_cmp_threshold(m,3.5,greater_equal),\ aln_position_pairs_ge_threshold(m,3.5)) self.assertEqual(aln_position_pairs_cmp_threshold(m,3.5,less_equal),\ aln_position_pairs_le_threshold(m,3.5)) 
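The intramolecular aln_position_pairs_cmp_threshold assertions above are consistent with a scan of the lower triangle (diagonal included) of a square matrix, keeping (row, col) pairs whose non-null score passes the supplied comparison. This is a hedged sketch of that behaviour; the name, argument order, and NULL sentinel are assumptions, not the module's actual signature.

```python
from operator import ge, le

NULL = -1e10  # illustrative stand-in for gDefaultNullValue

def position_pairs_cmp_threshold(m, threshold, cmp_function, null=NULL):
    # Scan the lower triangle (j <= i) of a square intramolecular matrix
    # in row-major order, skipping null entries.
    return [(i, j) for i in range(len(m)) for j in range(i + 1)
            if m[i][j] != null and cmp_function(m[i][j], threshold)]
```

Passing operator.ge or operator.le as cmp_function reproduces the ge/le wrapper behaviour the tests compare against.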
self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,greater_equal,null_value=4),\ aln_position_pairs_ge_threshold(m,3.5,null_value=4)) self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,less_equal,null_value=0),\ aln_position_pairs_le_threshold(m,3.5,null_value=0)) def test_aln_position_pairs_ge_threshold_intermolecular(self): """aln_position_pairs_ge_threshold: intermolecular matrix """ m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]]) # error if failed to specify intermolecular_data_only=True self.assertRaises(AssertionError,aln_position_pairs_cmp_threshold,\ m,3.5,greater_equal) # cmp_function = ge self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,greater_equal,intermolecular_data_only=True),\ [(1,4),(2,4),(0,5),(1,5),(2,5),(3,5)]) # cmp_function = greater_equal, alt null_value self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,greater_equal,null_value=18.,intermolecular_data_only=True),\ [(1,4),(2,4),(0,5),(2,5),(3,5)]) # cmp_function = le self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,less_equal,intermolecular_data_only=True),\ [(0,4),(3,4)]) # results equal results with wrapper functions self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,greater_equal,intermolecular_data_only=True),\ aln_position_pairs_ge_threshold(m,3.5,intermolecular_data_only=True)) self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,less_equal,intermolecular_data_only=True),\ aln_position_pairs_le_threshold(m,3.5,intermolecular_data_only=True)) self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,greater_equal,null_value=4.,intermolecular_data_only=True),\ aln_position_pairs_ge_threshold(m,3.5,null_value=4.,\ intermolecular_data_only=True)) self.assertEqual(aln_position_pairs_cmp_threshold(\ m,3.5,less_equal,null_value=18.,intermolecular_data_only=True),\ aln_position_pairs_le_threshold(m,3.5,null_value=18.,\ intermolecular_data_only=True)) def test_is_parsimony_informative_strict(self): """ is_parsimony_informative functions as expected 
with strict=True """
        freqs = {'A':25}
        self.assertFalse(is_parsimony_informative(freqs,strict=True))
        freqs = {'A':25,'-':25}
        self.assertFalse(is_parsimony_informative(freqs,strict=True))
        freqs = {'A':25,'?':25}
        self.assertFalse(is_parsimony_informative(freqs,strict=True))
        freqs = {'A':25,'B':1}
        self.assertFalse(is_parsimony_informative(freqs,strict=True))
        freqs = {'A':1,'B':1,'C':1,'D':1,'E':1}
        self.assertFalse(is_parsimony_informative(freqs,strict=True))
        freqs = {'A':2,'B':1,'C':1,'D':1,'E':1}
        self.assertFalse(is_parsimony_informative(freqs,strict=True))
        freqs = {'A':2,'B':2,'C':1,'D':1,'E':1}
        self.assertFalse(is_parsimony_informative(freqs,strict=True))
        freqs = {'A':25,'B':2}
        self.assertTrue(is_parsimony_informative(freqs,strict=True))
        freqs = {'A':2,'B':2,'C':2,'D':2,'E':2}
        self.assertTrue(is_parsimony_informative(freqs,strict=True))

    def test_is_parsimony_informative_non_strict(self):
        """ is_parsimony_informative functions as expected with strict=False
        """
        freqs = {'A':25}
        self.assertFalse(is_parsimony_informative(freqs,strict=False))
        freqs = {'A':25,'-':25}
        self.assertFalse(is_parsimony_informative(freqs,strict=False))
        freqs = {'A':25,'?':25}
        self.assertFalse(is_parsimony_informative(freqs,strict=False))
        freqs = {'A':25,'B':1}
        self.assertFalse(is_parsimony_informative(freqs,strict=False))
        freqs = {'A':1,'B':1,'C':1,'D':1,'E':1}
        self.assertFalse(is_parsimony_informative(freqs,strict=False))
        freqs = {'A':2,'B':1,'C':1,'D':1,'E':1}
        self.assertFalse(is_parsimony_informative(freqs,strict=False))
        freqs = {'A':2,'B':2,'C':1,'D':1,'E':1}
        self.assertTrue(is_parsimony_informative(freqs,strict=False))
        freqs = {'A':25,'B':2}
        self.assertTrue(is_parsimony_informative(freqs,strict=False))
        freqs = {'A':2,'B':2,'C':2,'D':2,'E':2}
        self.assertTrue(is_parsimony_informative(freqs,strict=False))

    def test_is_parsimony_informative_non_default(self):
        """ is_parsimony_informative functions with non-default parameters """
        ## NEED TO UPDATE THESE TESTS BASED ON MY ERROR IN THE
        ## DEFINITION OF PARSIMONY INFORMATIVE.
        # changed minimum_count
        freqs = {'A':25,'B':2}
        self.assertFalse(is_parsimony_informative(freqs,\
            minimum_count=3,strict=False))
        freqs = {'A':25,'B':1}
        self.assertTrue(is_parsimony_informative(freqs,\
            minimum_count=1,strict=False))
        # different value of strict yields different results
        freqs = {'A':25,'B':2,'C':3}
        self.assertTrue(is_parsimony_informative(freqs,\
            minimum_count=3,strict=False))
        self.assertFalse(is_parsimony_informative(freqs,\
            minimum_count=3,strict=True))
        # changed minimum_differences
        freqs = {'A':25,'B':25}
        self.assertFalse(is_parsimony_informative(\
            freqs,minimum_differences=3,strict=False))
        freqs = {'A':25}
        self.assertTrue(is_parsimony_informative(\
            freqs,minimum_differences=1,strict=False))
        # changed ignored
        freqs = {'A':25,'-':25,'?':25}
        self.assertTrue(is_parsimony_informative(freqs,ignored=None,\
            strict=False))
        freqs = {'A':25,'?':25}
        self.assertTrue(is_parsimony_informative(freqs,ignored='',\
            strict=False))
        freqs = {'A':25,'-':25}
        self.assertTrue(is_parsimony_informative(freqs,ignored=None,\
            strict=False))
        freqs = {'A':25,'C':25}
        self.assertFalse(is_parsimony_informative(freqs,ignored='A',\
            strict=False))

    def test_filter_non_parsimony_informative_intramolecular(self):
        """ non-parsimony informative sites in intramolecular matrix -> null
        """
        aln = LoadSeqs(data={'1':'ACDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[gDefaultNullValue]*4]*4)
        filter_non_parsimony_informative(aln,m)
        self.assertFloatEqual(m,expected)
        aln = LoadSeqs(data={'1':'ACDE','2':'FCDE','3':'ACDE','4':'FCDE'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[42.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[gDefaultNullValue]*4]*4)
        expected[0,0] = 42.
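The assertions around is_parsimony_informative above pin down a simple rule: after dropping ignored characters (gaps and missing data by default), a site needs at least minimum_differences states; non-strict mode requires that many states to each reach minimum_count, while strict mode requires every observed state to reach it. The sketch below is a hypothetical reimplementation consistent with those assertions — the function name, defaults, and structure are assumptions, not the actual cogent code.

```python
def is_informative(freqs, minimum_count=2, minimum_differences=2,
                   ignored='-?', strict=False):
    # freqs: mapping of character -> count at one alignment position.
    # Drop ignored characters (gaps '-' and missing data '?' by default).
    counts = [n for char, n in freqs.items()
              if not (ignored and char in ignored)]
    if len(counts) < minimum_differences:
        return False
    if strict:
        # strict: every observed state must reach the minimum count
        return all(n >= minimum_count for n in counts)
    # non-strict: enough distinct states must reach the minimum count
    return sum(n >= minimum_count for n in counts) >= minimum_differences
```

Note the strict/non-strict split only matters once some, but not all, states reach minimum_count, e.g. {'A': 2, 'B': 2, 'C': 1}.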
filter_non_parsimony_informative(aln,m) self.assertFloatEqual(m,expected) def test_filter_non_parsimony_informative_intermolecular(self): """ non-parsimony informative sites in intermolecular matrix -> null """ # all non-parsimony informative aln = LoadSeqs(data={'1':'ACDEWQ','2':'ACDEWQ','3':'ACDEWQ','4':'ACDEWQ'},\ moltype=PROTEIN,aligned=DenseAlignment) m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]]) expected = array([[gDefaultNullValue]*4]*2) filter_non_parsimony_informative(aln,m,intermolecular_data_only=True) self.assertFloatEqual(m,expected) # one non-parsimony informative pair of positions aln = LoadSeqs(data={'1':'FCDEWD','2':'ACDEWQ','3':'ACDEWD','4':'FCDEWQ'},\ moltype=PROTEIN,aligned=DenseAlignment) m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]]) expected = array([[gDefaultNullValue]*4]*2) expected[1,0] = 9. filter_non_parsimony_informative(aln,m,intermolecular_data_only=True) self.assertFloatEqual(m,expected) # all parsimony informative aln = LoadSeqs(data={'1':'FFFFFF','2':'FFFFFF','3':'GGGGGG','4':'GGGGGG'},\ moltype=PROTEIN,aligned=DenseAlignment) m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]]) expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.]]) filter_non_parsimony_informative(aln,m,intermolecular_data_only=True) self.assertFloatEqual(m,expected) def test_filter_exclude_positions_intramolecular(self): """filter_exclude_positions: functions for intramolecular data """ # filter zero positions (no excludes) aln = LoadSeqs(data={'1':'WCDE','2':'ACDE','3':'ACDE','4':'ACDE'},\ moltype=PROTEIN,aligned=DenseAlignment) m = array([[1.,10.,4.,3.],[9.,18.,5.,6.], [4.,1.,3.,2.],[21.,0.,1.,33.]]) expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.], [4.,1.,3.,2.],[21.,0.,1.,33.]]) filter_exclude_positions(aln,m) self.assertFloatEqual(m,expected) # filter zero positions (max_exclude_percentage = percent exclude) aln = LoadSeqs(data={'1':'-CDE','2':'A-DE','3':'AC-E','4':'ACD-'},\ moltype=PROTEIN,aligned=DenseAlignment) m = array([[1.,10.,4.,3.],[9.,18.,5.,6.], 
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        filter_exclude_positions(aln,m,max_exclude_percent=0.25)
        self.assertFloatEqual(m,expected)
        # filter zero positions (max_exclude_percentage too high)
        aln = LoadSeqs(data={'1':'-CDE','2':'A-DE','3':'AC-E','4':'ACD-'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        filter_exclude_positions(aln,m,max_exclude_percent=0.5)
        self.assertFloatEqual(m,expected)
        # filter one position (default max_exclude_percentage)
        aln = LoadSeqs(data={'1':'-CDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[gDefaultNullValue]*4,[gDefaultNullValue,18.,5.,6.],
            [gDefaultNullValue,1.,3.,2.],[gDefaultNullValue,0.,1.,33.]])
        filter_exclude_positions(aln,m)
        self.assertFloatEqual(m,expected)
        # filter one position (non-default max_exclude_percentage)
        aln = LoadSeqs(data={'1':'-CDE','2':'ACDE','3':'ACDE','4':'-CDE'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[gDefaultNullValue]*4,[gDefaultNullValue,18.,5.,6.],
            [gDefaultNullValue,1.,3.,2.],[gDefaultNullValue,0.,1.,33.]])
        filter_exclude_positions(aln,m,max_exclude_percent=0.49)
        self.assertFloatEqual(m,expected)
        # filter all positions (default max_exclude_percentage)
        aln = LoadSeqs(data={'1':'----','2':'ACDE','3':'ACDE','4':'ACDE'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[gDefaultNullValue]*4]*4)
        filter_exclude_positions(aln,m)
        self.assertFloatEqual(m,expected)
        # filter all positions (non-default max_exclude_percentage)
        aln = LoadSeqs(data={'1':'----','2':'A-DE','3':'AC--','4':'-CDE'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,3.]])
        expected = array([[gDefaultNullValue]*4]*4)
        filter_exclude_positions(aln,m,max_exclude_percent=0.49)
        self.assertFloatEqual(m,expected)
        # filter one position (default max_exclude_percentage,
        # non-default excludes)
        aln = LoadSeqs(data={'1':'WCDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[gDefaultNullValue]*4,[gDefaultNullValue,18.,5.,6.],
            [gDefaultNullValue,1.,3.,2.],[gDefaultNullValue,0.,1.,33.]])
        filter_exclude_positions(aln,m,excludes='W')
        self.assertFloatEqual(m,expected)
        # filter one position (default max_exclude_percentage,
        # non-default null_value)
        aln = LoadSeqs(data={'1':'-CDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
            moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
            [4.,1.,3.,2.],[21.,0.,1.,33.]])
        expected = array([[999.]*4,[999.,18.,5.,6.],
            [999.,1.,3.,2.],[999.,0.,1.,33.]])
        filter_exclude_positions(aln,m,null_value=999.)
        self.assertFloatEqual(m,expected)

    def test_filter_exclude_positions_intermolecular(self):
        """filter_exclude_positions: functions for intermolecular data """
        # these tests correspond to alignments of length 4 and 2 positions
        # respectively, hence a coevolution_matrix with shape = (2,4)

        # filter zero positions (no excludes)
        merged_aln = LoadSeqs(data={'1':'WCDEDE','2':'ACDEDE',\
            '3':'ACDEDE','4':'ACDEDE'},moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
        expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
        filter_exclude_positions(merged_aln,m,intermolecular_data_only=True)
        self.assertFloatEqual(m,expected)
        # filter one position (aln1)
        merged_aln = LoadSeqs(data={'1':'WC-EDE','2':'ACDEDE',\
            '3':'ACDEDE','4':'ACDEDE'},moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
        expected = array([[1.,10.,gDefaultNullValue,3.],\
            [9.,18.,gDefaultNullValue,6.]])
        filter_exclude_positions(merged_aln,m,intermolecular_data_only=True)
        self.assertFloatEqual(m,expected)
        # filter one position (aln2)
        merged_aln = LoadSeqs(data={'1':'WCEEDE','2':'ACDEDE',\
            '3':'ACDEDE','4':'ACDED-'},moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
        expected = array([[1.,10.,4.,3.],\
            [gDefaultNullValue]*4])
        filter_exclude_positions(merged_aln,m,intermolecular_data_only=True)
        self.assertFloatEqual(m,expected)
        # filter two positions (aln1 & aln2)
        merged_aln = LoadSeqs(data={'1':'-CEEDE','2':'ACDEDE',\
            '3':'ACDEDE','4':'ACDED-'},moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
        expected = array([[gDefaultNullValue,10.,4.,3.],\
            [gDefaultNullValue]*4])
        filter_exclude_positions(merged_aln,m,intermolecular_data_only=True)
        self.assertFloatEqual(m,expected)
        # filter two positions (aln1 & aln2, alt excludes)
        merged_aln = LoadSeqs(data={'1':'WCEEDE','2':'ACDEDE',\
            '3':'ACDEDE','4':'ACDEDW'},moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
        expected = array([[gDefaultNullValue,10.,4.,3.],\
            [gDefaultNullValue]*4])
        filter_exclude_positions(merged_aln,m,intermolecular_data_only=True,\
            excludes='W')
        self.assertFloatEqual(m,expected)
        # filter two positions (aln1 & aln2, alt null_value)
        merged_aln = LoadSeqs(data={'1':'-CEEDE','2':'ACDEDE',\
            '3':'ACDEDE','4':'ACDED-'},moltype=PROTEIN,aligned=DenseAlignment)
        m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
        expected = array([[999.,10.,4.,3.],\
            [999.]*4])
        filter_exclude_positions(merged_aln,m,intermolecular_data_only=True,\
            null_value=999.)
        self.assertFloatEqual(m,expected)

    def test_filter_threshold_based_multiple_interdependency_intermolecular(self):
        "multiple interdependency filter functions with intermolecular data "
        ## cmp_function = ge
        # lower boundary
        null = gDefaultNullValue
        m = array([[0.63,0.00,null],\
                   [0.75,0.10,0.45],\
                   [0.95,0.32,0.33],\
                   [1.00,0.95,0.11]])
        expected = array([[null,null,null],\
                          [null,null,0.45],\
                          [null,null,null],\
                          [null,null,null]])
        actual = filter_threshold_based_multiple_interdependency(\
            None,m,0.95,0,greater_equal,True)
        self.assertFloatEqual(actual,expected)
        # realistic test case
        m = array([[0.63,0.00,null],\
                   [0.75,0.10,0.45],\
                   [0.95,0.32,0.33],\
                   [1.00,0.95,0.11]])
        expected = array([[null,0.00,null],\
                          [null,0.10,0.45],\
                          [null,0.32,0.33],\
                          [null,null,null]])
        actual = filter_threshold_based_multiple_interdependency(\
            None,m,0.95,1,greater_equal,True)
        self.assertFloatEqual(actual,expected)
        # upper boundary, nothing filtered
        null = gDefaultNullValue
        m = array([[0.63,0.00,null],\
                   [0.75,0.10,0.45],\
                   [0.95,0.32,0.33],\
                   [1.00,0.95,0.11]])
        expected = m
        actual = filter_threshold_based_multiple_interdependency(\
            None,m,0.95,5,greater_equal,True)
        self.assertFloatEqual(actual,expected)
        # cmp_function = less_equal, realistic test case
        m = array([[0.63,0.00,null],\
                   [0.75,0.10,0.45],\
                   [0.95,0.32,0.33],\
                   [1.00,0.95,0.11]])
        expected = array([[0.63,null,null],\
                          [0.75,null,null],\
                          [null,null,null],\
                          [1.00,null,null]])
        actual = filter_threshold_based_multiple_interdependency(\
            None,m,0.35,1,less_equal,True)
        self.assertFloatEqual(actual,expected)

    def test_filter_threshold_based_multiple_interdependency_intramolecular(self):
        "multiple interdependency filter functions with intramolecular data "
        null = gDefaultNullValue
        ## cmp_function = ge
        # lower bound, everything filtered
        m = array([[0.63,0.75,0.95,1.00],\
                   [0.75,0.10,null,0.95],\
                   [0.95,null,0.33,0.11],\
                   [1.00,0.95,0.11,1.00]])
        expected = array([[null,null,null,null],\
                          [null,null,null,null],\
                          [null,null,null,null],\
                          [null,null,null,null]])
        actual = filter_threshold_based_multiple_interdependency(\
            None,m,0.95,0,greater_equal)
        self.assertFloatEqual(actual,expected)
        # realistic test case
        m = array([[0.63,0.75,0.95,1.00],\
                   [0.75,0.10,null,0.95],\
                   [0.95,null,0.33,0.11],\
                   [1.00,0.95,0.11,1.00]])
        expected = array([[null,null,null,null],\
                          [null,0.10,null,null],\
                          [null,null,0.33,null],\
                          [null,null,null,null]])
        actual = filter_threshold_based_multiple_interdependency(\
            None,m,0.95,1,greater_equal)
        self.assertFloatEqual(actual,expected)
        # upper boundary, nothing filtered
        m = array([[0.63,0.75,0.95,1.00],\
                   [0.75,0.10,null,0.95],\
                   [0.95,null,0.33,0.11],\
                   [1.00,0.95,0.11,1.00]])
        expected = m
        actual = filter_threshold_based_multiple_interdependency(\
            None,m,0.95,5,greater_equal)
        self.assertFloatEqual(actual,expected)
        ## cmp_function = le
        # realistic test case
        m = array([[0.63,0.75,0.95,1.00],\
                   [0.75,0.10,null,0.95],\
                   [0.95,null,0.33,0.11],\
                   [1.00,0.95,0.11,1.00]])
        expected = array([[0.63,0.75,null,1.00],\
                          [0.75,0.10,null,0.95],\
                          [null,null,null,null],\
                          [1.00,0.95,null,1.00]])
        actual = filter_threshold_based_multiple_interdependency(\
            None,m,0.33,1,less_equal)
        self.assertFloatEqual(actual,expected)

    def test_probs_from_dict(self):
        """probs_from_dict: dict of probs -> list of probs in alphabet's order
        """
        d = {'A':0.25,'D':0.52,'C':0.23}
        a = list('ACD')
        self.assertFloatEqual(probs_from_dict(d,a),[0.25,0.23,0.52])
        a = list('ADC')
        self.assertFloatEqual(probs_from_dict(d,a),[0.25,0.52,0.23])
        a = list('DCA')
        self.assertFloatEqual(probs_from_dict(d,a),[0.52,0.23,0.25])
        a = CharAlphabet('DCA')
        self.assertFloatEqual(probs_from_dict(d,a),[0.52,0.23,0.25])
        # protein natural probs
        l = probs_from_dict(protein_dict,AAGapless)
        for i in range(20):
            self.assertFloatEqual(l[i],protein_dict[AAGapless[i]],0.001)

    def test_freqs_from_aln(self):
        """freqs_from_aln: freqs of alphabet chars in aln is calc'ed correctly
        """
        # non-default scaled_aln_size
        aln = DenseAlignment(data=zip(range(4),['ACGT','AGCT','ACCC','TAGG']),\
            MolType=PROTEIN)
        alphabet = 'ACGT'
        expected = [4,5,4,3]
        self.assertEqual(freqs_from_aln(aln,alphabet,16),expected)
        # change the order of the alphabet
        alphabet = 'TGCA'
        expected = [3,4,5,4]
        self.assertEqual(freqs_from_aln(aln,alphabet,16),expected)
        # default scaled_aln_size, sums of freqs == 100
        alphabet = 'ACGT'
        expected = [25.,31.25,25,18.75]
        self.assertEqual(freqs_from_aln(aln,alphabet),expected)
        # alphabet char which doesn't show up gets zero freq
        alphabet = 'ACGTW'
        expected = [25.,31.25,25,18.75,0]
        self.assertEqual(freqs_from_aln(aln,alphabet),expected)
        # alignment char which doesn't show up is silently ignored
        aln = DenseAlignment(data=zip(range(4),['ACGT','AGCT','ACCC','TWGG']),\
            MolType=PROTEIN)
        alphabet = 'ACGT'
        expected = [18.75,31.25,25,18.75]
        self.assertEqual(freqs_from_aln(aln,alphabet),expected)

    def test_freqs_to_array(self):
        """freqs_to_array: should convert Freqs object to array"""
        #should work with empty object
        f = Freqs()
        f2a = freqs_to_array
        self.assertFloatEqual(f2a(f, AAGapless), zeros(20))
        #should work with full object, omitting unwanted keys
        f = Freqs({'A':20, 'Q':30, 'X':20})
        expected = zeros(20)
        expected[AAGapless.index('A')] = 20
        expected[AAGapless.index('Q')] = 30
        self.assertFloatEqual(expected, f2a(f, AAGapless))
        #should work for normal dict and any alphabet
        d = {'A':3,'D':1,'C':5,'E':2}
        alpha = "ABCD"
        exp = array([3,0,5,1])
        self.assertFloatEqual(f2a(d,alpha),exp)

    def test_get_allowed_perturbations(self):
        """get_allowed_perturbations: should work for different cutoff values
        """
        counts = [50,40,10,0]
        a = list('ACGT')
        self.assertEqual(get_allowed_perturbations(counts,1.0,a),[])
        self.assertEqual(get_allowed_perturbations(counts,0.51,a),[])
        self.assertEqual(get_allowed_perturbations(counts,0.5,a),['A'])
        self.assertEqual(get_allowed_perturbations(counts,0.49,a),['A'])
        self.assertEqual(get_allowed_perturbations(counts,0.401,a),['A'])
        self.assertEqual(get_allowed_perturbations(counts,0.40,a),['A','C'])
        self.assertEqual(get_allowed_perturbations(counts,0.399,a),['A','C'])
        self.assertEqual(get_allowed_perturbations(counts,0.10,a),\
            ['A','C','G'])
        self.assertEqual(get_allowed_perturbations(counts,0.0,a),a)

    def test_get_subalignments(self):
        """get_subalignments: works with different alignment sizes and cutoffs
        """
        aln = DenseAlignment(\
            data={1:'AAAA',2:'AAAC',3:'AACG',4:'ACCT',5:'ACG-'},\
            MolType=PROTEIN)
        sub_aln_0A = DenseAlignment(\
            data={1:'AAAA',2:'AAAC',3:'AACG',4:'ACCT',5:'ACG-'},\
            MolType=PROTEIN)
        sub_aln_0C = {}
        sub_aln_1A = DenseAlignment(data={1:'AAAA',2:'AAAC',3:'AACG'},\
            MolType=PROTEIN)
        sub_aln_1C = DenseAlignment(data={4:'ACCT',5:'ACG-'},MolType=PROTEIN)
        sub_aln_2G = DenseAlignment(data={5:'ACG-'},MolType=PROTEIN)
        self.assertEqual(get_subalignments(aln,0,['A']),[sub_aln_0A])
        self.assertEqual(get_subalignments(aln,0,['C']),[sub_aln_0C])
        self.assertEqual(get_subalignments(aln,1,['A']),[sub_aln_1A])
        self.assertEqual(get_subalignments(aln,1,['C']),[sub_aln_1C])
        self.assertEqual(get_subalignments(aln,1,['A','C']),\
            [sub_aln_1A,sub_aln_1C])
        self.assertEqual(get_subalignments(aln,2,['G']),[sub_aln_2G])
        self.assertEqual(get_subalignments(aln,3,['-']),[sub_aln_2G])

    def test_get_positional_frequencies_w_scale(self):
        """get_positional_frequencies: works with default scaled_aln_size"""
        aln = DenseAlignment(data={1:'ACDE',2:'ADDE',3:'AEED',4:'AFEF'},\
            MolType=PROTEIN)
        expected_0 = array([100.,0.,0.,0.,0.])
        expected_1 = array([0.,25.,25.,25.,25.])
        expected_2 = array([0.,0.,50.,50.,0.])
        expected_3 = array([0.,0.,25.,50.,25.])
        self.assertFloatEqual(get_positional_frequencies(aln,0,'ACDEF'),expected_0)
        self.assertFloatEqual(get_positional_frequencies(aln,1,'ACDEF'),expected_1)
        self.assertFloatEqual(get_positional_frequencies(aln,2,'ACDEF'),expected_2)
        self.assertFloatEqual(get_positional_frequencies(aln,3,'ACDEF'),expected_3)
        # extra characters (W) are silently ignored -- is this the desired
        # behavior?
        aln = DenseAlignment(data={1:'WCDE',2:'ADDE',3:'AEED',4:'AFEF'},\
            MolType=PROTEIN)
        expected_0 = array([75.,0.,0.,0.,0.])
        self.assertFloatEqual(get_positional_frequencies(aln,0,'ACDEF'),expected_0)
        # 20 residue amino acid alphabet
        aln = DenseAlignment(data={1:'ACDE',2:'ADDE',3:'AEED',4:'AFEF'},\
            MolType=PROTEIN)
        expected = array([100.] + [0.]*19)
        self.assertFloatEqual(get_positional_frequencies(aln,0,AAGapless),expected)

    def test_get_positional_frequencies(self):
        """get_positional_frequencies: works with non-default scaled_aln_size
        """
        aln = DenseAlignment(data={1:'ACDE',2:'ADDE',3:'AEED',4:'AFEF'},\
            MolType=PROTEIN)
        expected_0 = array([4.,0.,0.,0.,0.])
        expected_1 = array([0.,1.,1.,1.,1.])
        expected_2 = array([0.,0.,2.,2.,0.])
        expected_3 = array([0.,0.,1.,2.,1.])
        self.assertFloatEqual(get_positional_frequencies(aln,0,'ACDEF',4),\
            expected_0)
        self.assertFloatEqual(get_positional_frequencies(aln,1,'ACDEF',4),\
            expected_1)
        self.assertFloatEqual(get_positional_frequencies(aln,2,'ACDEF',4),\
            expected_2)
        self.assertFloatEqual(get_positional_frequencies(aln,3,'ACDEF',4),\
            expected_3)
        # extra characters (W) are silently ignored -- is this the desired
        # behavior?
        aln = DenseAlignment(data={1:'WCDE',2:'ADDE',3:'AEED',4:'AFEF'},\
            MolType=PROTEIN)
        expected_0 = array([3.,0.,0.,0.,0.])
        self.assertFloatEqual(get_positional_frequencies(aln,0,'ACDEF',4),\
            expected_0)
        # 20 residue amino acid alphabet
        aln = DenseAlignment(data={1:'ACDE',2:'ADDE',3:'AEED',4:'AFEF'},\
            MolType=PROTEIN)
        expected = array([4.] + [0.]*19)
        self.assertFloatEqual(get_positional_frequencies(aln,0,AAGapless,4),\
            expected)

    def test_validate_alphabet_invalid(self):
        """validate_alphabet: raises error on incompatible alphabet and freqs
        """
        # len(alpha) > len(freqs)
        self.assertRaises(ValueError,validate_alphabet,\
            'ABC',{'A':0.5,'B':0.5})
        self.assertRaises(ValueError,validate_alphabet,\
            'ABCD',{'A':0.5,'B':0.5})
        # len(alpha) == len(freqs)
        self.assertRaises(ValueError,validate_alphabet,\
            'AC',{'A':0.5,'B':0.5})
        # len(alpha) < len(freqs)
        self.assertRaises(ValueError,validate_alphabet,\
            'A',{'A':0.5,'B':0.5})
        self.assertRaises(ValueError,validate_alphabet,'',\
            {'A':0.5,'B':0.5})
        # different values, len(alpha) > len(freqs)
        self.assertRaises(ValueError,validate_alphabet,[1,42,3],\
            {42:0.5,1:0.5})
        self.assertRaises(ValueError,validate_alphabet,CharAlphabet('ABC'),\
            {'A':0.5,'C':0.5})

    def test_validate_alphabet_valid(self):
        """validate_alphabet: does nothing on compatible alphabet and freqs
        """
        validate_alphabet('AB',{'A':0.5,'B':0.5})
        validate_alphabet(CharAlphabet('AB'),{'A':0.5,'B':0.5})
        validate_alphabet([1,42,8],{1:0.5,42:0.25,8:0.25})

    def test_validate_position_invalid(self):
        """validate_position: raises error on invalid position
        """
        self.assertRaises(ValueError,validate_position,self.dna_aln,4)
        self.assertRaises(ValueError,validate_position,self.dna_aln,42)
        self.assertRaises(ValueError,validate_position,self.dna_aln,-1)
        self.assertRaises(ValueError,validate_position,self.dna_aln,-199)

    def test_validate_position_valid(self):
        """validate_position: does nothing on valid position
        """
        validate_position(self.dna_aln,0)
        validate_position(self.dna_aln,1)
        validate_position(self.dna_aln,2)
        validate_position(self.dna_aln,3)

    def test_validate_alignment(self):
        """validate_alignment: ValueError on bad alignment characters"""
        # ambiguous characters
        aln = DenseAlignment(data={0:'BA',1:'AC',2:'CG',3:'CT',4:'TA'},\
            MolType=PROTEIN)
        self.assertRaises(ValueError,validate_alignment,aln)
        aln = DenseAlignment(data={0:'NA',1:'AC',2:'CG',3:'CT',4:'TA'},\
            MolType=DNA)
        self.assertRaises(ValueError,validate_alignment,aln)
        aln = DenseAlignment(data={0:'YA',1:'AC',2:'CG',3:'CU',4:'UA'},\
            MolType=RNA)
        self.assertRaises(ValueError,validate_alignment,aln)
        aln = DenseAlignment(data={0:'AA',1:'AC',2:'CG',3:'CT',4:'TA'},\
            MolType=PROTEIN)
        validate_alignment(aln)
        aln = DenseAlignment(data={0:'AA',1:'AC',2:'CG',3:'CT',4:'TA'},\
            MolType=DNA)
        validate_alignment(aln)
        aln = DenseAlignment(data={0:'AA',1:'AC',2:'CG',3:'CU',4:'UA'},\
            MolType=RNA)
        validate_alignment(aln)

    def test_coevolve_functions_validate_alignment(self):
        """coevolve_*: functions run validate alignment"""
        aln = DenseAlignment(\
            data={'0':'BA','1':'AC','2':'CG','3':'CT','4':'TA'},\
            MolType=PROTEIN)
        self.assertRaises(ValueError,coevolve_pair,mi_pair,aln,0,1)
        self.assertRaises(ValueError,coevolve_position,mi_position,aln,0)
        self.assertRaises(ValueError,coevolve_alignment,mi_alignment,aln)
        self.assertRaises(ValueError,coevolve_alignments,mi_alignment,aln,aln)

    def test_get_positional_probabilities_w_non_def_num_seqs(self):
        """get_positional_probabilities: works w/ non-def num_seqs"""
        freqs = [1.,2.,0.]
        probs = [0.33,0.33,0.33]
        expected = array([0.444411,0.218889,0.300763])
        self.assertFloatEqual(get_positional_probabilities(freqs,probs,3),\
            expected)

    def test_get_dg(self):
        """get_dg: returns delta_g vector"""
        p = [0.1,0.2,0.3]
        a = [0.5,0.6,0.7]
        expected = [log(0.1/0.5),log(0.2/0.6),log(0.3/0.7)]
        self.assertFloatEqual(get_dg(p,a),expected)

    def test_get_dgg(self):
        """get_dgg: returns delta_delta_g value given two delta_g vectors
        """
        v1 = array([0.05,0.5,0.1])
        v2 = array([0.03,0.05,0.1])
        expected = sqrt(sum((v1 - v2) * (v1 - v2)))/100 * e
        self.assertFloatEqual(get_dgg(v1,v2),expected)

    def test_get_positional_probabilities_w_def_num_seqs(self):
        """get_positional_probabilities: works w/ num_seqs scaled to 100 (def)
        """
        freqs = [15.,33.,52.]
        probs = [0.33,0.33,0.33]
        expected = array([2.4990e-5,0.0846,3.8350e-5])
        self.assertFloatEqual(get_positional_probabilities(freqs,probs),\
            expected,0.001)

    def test_get_positional_probs_handles_rounding_error_in_freqs(self):
        """get_positional_probabilities: works w/ rounding error in freqs"""
        # Since freqs are scaled to scaled_aln_size, rounding error can cause
        # errors for positions that are perfectly controlled. Testing here
        # that that ValueError is handled.
        # default scaled_aln_size
        freqs = [100.0000000001,0.,0.]
        probs = [0.33,0.33,0.33]
        expected = array([7.102218e-49,4.05024e-18,4.05024e-18])
        self.assertFloatEqual(get_positional_probabilities(freqs,probs),\
            expected)
        # value that is truly over raises an error
        freqs = [101.0000000001,0.,0.]
        probs = [0.33,0.33,0.33]
        self.assertRaises(ValueError,get_positional_probabilities,freqs,probs)
        # non-default scaled_aln_size
        freqs = [50.0000000001,0.,0.]
        probs = [0.33,0.33,0.33]
        expected = array([8.42747e-25,2.01252e-9,2.01252e-9])
        self.assertFloatEqual(get_positional_probabilities(freqs,probs,50),\
            expected)
        # value that is truly over raises an error
        freqs = [51.0000000001,0.,0.]
        probs = [0.33,0.33,0.33]
        self.assertRaises(ValueError,get_positional_probabilities,\
            freqs,probs,50)

    def test_sca_input_validation(self):
        """sca_input_validation: handles sca-specific validation steps
        """
        # MolType != PROTEIN makes background freqs required
        self.assertRaises(ValueError,sca_input_validation,\
            self.dna_aln,cutoff=0.4)
        self.assertRaises(ValueError,sca_input_validation,\
            self.rna_aln,cutoff=0.4)
        # no cutoff -> ValueError
        self.assertRaises(ValueError,sca_input_validation,self.protein_aln)
        # low cutoff -> ValueError
        self.assertRaises(ValueError,sca_input_validation,\
            self.protein_aln,cutoff=-0.001)
        # high cutoff -> ValueError
        self.assertRaises(ValueError,sca_input_validation,\
            self.protein_aln,cutoff=1.001)
        # good cut-off -> no error
        sca_input_validation(self.protein_aln,cutoff=0.50)
        sca_input_validation(self.protein_aln,cutoff=0.0)
        sca_input_validation(self.protein_aln,cutoff=1.0)
        # only bad alphabet -> ValueError
        self.assertRaises(ValueError,sca_input_validation,\
            self.dna_aln,cutoff=0.5,alphabet='ABC')
        # only bad background_freqs -> ValueError
        self.assertRaises(ValueError,sca_input_validation,\
            self.dna_aln,cutoff=0.5,
            background_freqs={'A':0.25, 'C':0.75})
        # incompatible background_freqs & alphabet provided -> ValueError
        self.assertRaises(ValueError,sca_input_validation,\
            self.dna_aln,cutoff=0.5, alphabet='ABC', \
            background_freqs={'A':0.25, 'C':0.75})
        # default alphabet, background_freqs -> no error
        sca_input_validation(self.protein_aln,cutoff=0.50)
        # compatible non-default alphabet, background_freqs -> no error
        sca_input_validation(self.dna_aln,cutoff=0.50,alphabet='A',\
            background_freqs={'A':1.0})
        ## Note: don't need a full set of tests of validate_alphabet here --
        ## it's tested on its own.
    def test_sca_pair_no_error(self):
        """sca_pair: returns w/o error
        """
        r = sca_pair(self.dna_aln,1,0,cutoff=0.50,alphabet='ACGT',\
            background_freqs=self.dna_base_freqs)
        r = coevolve_pair(sca_pair,self.dna_aln,1,0,cutoff=0.50,\
            alphabet='ACGT',background_freqs=self.dna_base_freqs)

    def test_sca_pair_return_all(self):
        """sca_pair: handles return_all by returning lists of proper length
        """
        # two allowed_perturbations
        a = 'ACGT'
        aln = DenseAlignment(data={0:'AA',1:'AC',2:'CG',3:'CT',4:'TA'},\
            MolType=DNA)
        actual = sca_pair(aln,0,1,cutoff=0.33,return_all=True,alphabet=a,\
            background_freqs=self.dna_base_freqs)
        self.assertEqual(len(actual),2)
        self.assertEqual(actual[0][0], 'A')
        self.assertEqual(actual[1][0], 'C')
        # one allowed_perturbations
        a = 'ACGT'
        aln = DenseAlignment(data={0:'AA',1:'AC',2:'AG',3:'CT',4:'TA'},\
            MolType=DNA)
        actual = sca_pair(aln,0,1,0.33,return_all=True,alphabet=a,\
            background_freqs=self.dna_base_freqs)
        self.assertEqual(len(actual),1)
        self.assertEqual(actual[0][0], 'A')
        # zero allowed_perturbations
        actual = sca_pair(aln,0,1,1.0,return_all=True,alphabet=a,\
            background_freqs=self.dna_base_freqs)
        #expected = [('A',-1),('C',-1)]
        expected = gDefaultNullValue
        self.assertFloatEqual(actual,expected)
        # pos1 == pos2
        actual = sca_pair(aln,0,0,0.33,return_all=True,alphabet=a,\
            background_freqs=self.dna_base_freqs)
        #expected = [('A',-1),('C',-1)]
        expected = [('A', 2.40381185618)]
        self.assertFloatEqual(actual,expected)

    def test_sca_pair_error(self):
        """sca_pair: returns w/ error when appropriate
        """
        a = 'ACGT'
        # pos1 out of range
        self.assertRaises(ValueError,coevolve_pair,sca_pair,self.dna_aln,\
            100,1,cutoff=0.50,alphabet=a,background_freqs=self.dna_base_freqs)
        # pos2 out of range
        self.assertRaises(ValueError,coevolve_pair,sca_pair,self.dna_aln,\
            0,100,cutoff=0.50,alphabet=a,background_freqs=self.dna_base_freqs)
        # pos1 & pos2 out of range
        self.assertRaises(ValueError,coevolve_pair,sca_pair,self.dna_aln,\
            100,100,cutoff=0.50,alphabet=a,\
            background_freqs=self.dna_base_freqs)
        # bad cut-off
        self.assertRaises(ValueError,coevolve_pair,sca_pair,\
            self.dna_aln,0,1,cutoff=1.2,\
            alphabet=a,background_freqs=self.dna_base_freqs)
        # incompatible alphabet and background freqs
        self.assertRaises(ValueError,coevolve_pair,sca_pair,\
            self.dna_aln,0,1,cutoff=0.2,alphabet=a)
        self.assertRaises(ValueError,coevolve_pair,sca_pair,\
            self.dna_aln,0,1,cutoff=0.2,alphabet='ACGTBC',\
            background_freqs=self.dna_base_freqs)

    def test_sca_position_no_error(self):
        """sca_position: returns w/o error
        """
        r = sca_position(self.dna_aln,1,0.50,alphabet='ACGT',\
            background_freqs=self.dna_base_freqs)
        # sanity check -- coupling w/ self
        self.assertFloatEqual(r[1],3.087,0.01)
        r = sca_position(self.dna_aln_gapped,1,0.50,\
            alphabet='ACGT',background_freqs=self.dna_base_freqs)
        self.assertFloatEqual(r[1],3.387,0.01)
        ## same tests, but called via coevolve_position
        r = coevolve_position(sca_position,self.dna_aln,1,cutoff=0.50,\
            alphabet='ACGT',background_freqs=self.dna_base_freqs)
        # sanity check -- coupling w/ self
        self.assertFloatEqual(r[1],3.087,0.01)
        r = coevolve_position(sca_position,self.dna_aln_gapped,1,cutoff=0.50,\
            alphabet='ACGT',background_freqs=self.dna_base_freqs)
        # sanity check -- coupling w/ self
        self.assertFloatEqual(r[1],3.387,0.01)

    def test_sca_position_error(self):
        """sca_position: returns w/ error when appropriate
        """
        a = 'ACGT'
        # position out of range
        self.assertRaises(ValueError,coevolve_position,sca_position,\
            self.dna_aln,100,cutoff=0.50,alphabet=a,\
            background_freqs=self.dna_base_freqs)
        # bad cutoff
        self.assertRaises(\
            ValueError,coevolve_position,sca_position,self.dna_aln,\
            1,cutoff=-8.2,alphabet=a,background_freqs=self.dna_base_freqs)
        # incompatible alphabet and background freqs
        self.assertRaises(ValueError,coevolve_position,sca_position,\
            self.dna_aln,0,cutoff=0.2,alphabet=a)
        self.assertRaises(ValueError,coevolve_position,sca_position,\
            self.dna_aln,0,cutoff=0.2,alphabet='ACGTBC',\
            background_freqs=self.dna_base_freqs)
    def test_sca_position_returns_same_as_sca_pair(self):
        """sca_position: returns same as sca_pair called on each pos
        """
        expected = []
        for i in range(len(self.dna_aln)):
            expected.append(sca_pair(self.dna_aln,1,i,0.50,\
                alphabet='ACGT',background_freqs=self.dna_base_freqs))
        actual = sca_position(self.dna_aln,1,0.50,\
            alphabet='ACGT',background_freqs=self.dna_base_freqs)
        self.assertFloatEqual(actual,expected)
        # change some of the defaults to make sure they make it through
        bg_freqs = {'A':0.50,'C':0.50}
        expected = []
        for i in range(len(self.dna_aln)):
            expected.append(sca_pair(self.dna_aln,1,i,0.50,\
                alphabet='AC',null_value=52.,scaled_aln_size=20,\
                background_freqs=bg_freqs))
        actual = sca_position(self.dna_aln,1,0.50,alphabet='AC',\
            null_value=52.,scaled_aln_size=20,background_freqs=bg_freqs)
        self.assertFloatEqual(actual,expected)

    def test_sca_alignment_no_error(self):
        """sca_alignment: returns w/o error
        """
        r = sca_alignment(self.dna_aln,0.50,alphabet='ACGT',\
            background_freqs=self.dna_base_freqs)
        # sanity check -- coupling w/ self
        self.assertFloatEqual(r[0][0],2.32222608171)
        ## same test, but called via coevolve_alignment
        r = coevolve_alignment(sca_alignment,self.dna_aln,\
            cutoff=0.50,alphabet='ACGT',\
            background_freqs=self.dna_base_freqs)
        # sanity check -- coupling w/ self
        self.assertFloatEqual(r[0][0],2.32222608171)

    def test_sca_alignment_error(self):
        """sca_alignment: returns w/ error when appropriate
        """
        a = 'ACGT'
        # incompatible alphabet and background freqs
        self.assertRaises(ValueError,coevolve_position,sca_position,\
            self.dna_aln,0,cutoff=0.2,alphabet=a)
        self.assertRaises(ValueError,coevolve_position,sca_position,\
            self.dna_aln,0,cutoff=0.2,alphabet='ACGTBC',\
            background_freqs=self.dna_base_freqs)

    def test_sca_alignment_returns_same_as_sca_position(self):
        """sca_alignment: returns same as sca_position on every position"""
        expected = []
        for i in range(len(self.dna_aln)):
            expected.append(\
                sca_position(self.dna_aln,i,0.50,alphabet='ACGT',\
                background_freqs=self.dna_base_freqs))
        actual = sca_alignment(self.dna_aln,0.50,alphabet='ACGT',\
            background_freqs=self.dna_base_freqs)
        self.assertFloatEqual(actual,expected)
        # change some of the defaults to make sure they make it through
        bg_freqs = {'A':0.50,'C':0.50}
        expected = []
        for i in range(len(self.dna_aln)):
            expected.append(\
                sca_position(self.dna_aln,i,0.50,alphabet='AC',\
                null_value=52.0,scaled_aln_size=20,background_freqs=bg_freqs))
        actual = sca_alignment(self.dna_aln,0.50,alphabet='AC',\
            null_value=52.0,scaled_aln_size=20,background_freqs=bg_freqs)
        self.assertFloatEqual(actual,expected)

    def test_sca_pair_gpcr(self):
        """sca_pair: reproduces several GPCR data from Suel et al., 2003
        """
        self.assertFloatEqual(sca_pair(self.gpcr_aln,295,18,0.32),0.12,0.1)
        self.assertFloatEqual(sca_pair(self.gpcr_aln,295,124,0.32),1.86,0.1)
        self.assertFloatEqual(sca_pair(self.gpcr_aln,295,304,0.32),0.3,0.1)
        # covariation w/ self
        self.assertFloatEqual(sca_pair(self.gpcr_aln,295,295,0.32),7.70358628)

    def test_sca_position_gpcr(self):
        """sca_position: reproduces several GPCR data from Suel et al., 2003
        """
        if not self.run_slow_tests:
            return
        vector = sca_position(self.gpcr_aln,295,0.32)
        self.assertFloatEqual(vector[18],0.12,0.1)
        self.assertFloatEqual(vector[124],1.86,0.1)
        self.assertFloatEqual(vector[304],0.3,0.1)
        # covariation w/ self == null_value
        self.assertFloatEqual(vector[295],nan)

    def test_ltm_to_symmetric(self):
        """ltm_to_symmetric: making ltm matrices symmetric functions"""
        m = arange(9).reshape((3,3))
        expected = [[0,3,6],[3,4,7],[6,7,8]]
        self.assertEqual(ltm_to_symmetric(m),expected)
        # non-square matrices not supported
        self.assertRaises(AssertionError,\
            ltm_to_symmetric,arange(10).reshape(5,2))
        self.assertRaises(AssertionError,\
            ltm_to_symmetric,arange(10).reshape(2,5))

    def test_merge_alignments(self):
        """ merging alignments of same moltype functions as expected"""
        # PROTEIN
        aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
        aln2 = DenseAlignment(data={'1':'EF','2':'EG'},MolType=PROTEIN)
        combined_aln = DenseAlignment(\
            data={'1':'ACEF','2':'ADEG'},MolType=PROTEIN)
        actual = merge_alignments(aln1,aln2)
        self.assertEqual(actual,combined_aln)
        self.assertEqual(actual.MolType,PROTEIN)
        # RNA
        aln1 = DenseAlignment(data={'1':'AC','2':'AU'},MolType=RNA)
        aln2 = DenseAlignment(data={'1':'GG','2':'UG'},MolType=RNA)
        combined_aln = DenseAlignment(data={'1':'ACGG','2':'AUUG'},MolType=RNA)
        actual = merge_alignments(aln1,aln2)
        self.assertEqual(actual,combined_aln)
        self.assertEqual(actual.MolType,RNA)
        # DNA
        aln1 = DenseAlignment(data={'1':'AC','2':'AT'},MolType=DNA)
        aln2 = DenseAlignment(data={'1':'GG','2':'TG'},MolType=DNA)
        combined_aln = DenseAlignment(data={'1':'ACGG','2':'ATTG'},MolType=DNA)
        actual = merge_alignments(aln1,aln2)
        self.assertEqual(actual,combined_aln)
        self.assertEqual(actual.MolType,DNA)

    def test_merge_alignments_ignores_id_following_plus(self):
        """ merge_alignments ignores all seq id characters after '+' """
        aln1 = DenseAlignment(data={'1+a':'AC','2+b':'AD'},MolType=PROTEIN)
        aln2 = DenseAlignment(\
            data={'1 + c':'EFW','2 + d':'EGY'},MolType=PROTEIN)
        combined_aln = DenseAlignment(\
            data={'1':'ACEFW','2':'ADEGY'},MolType=PROTEIN)
        self.assertEqual(merge_alignments(aln1,aln2),combined_aln)
        # not all ids have a +
        aln1 = DenseAlignment(data={'1':'AC','2+b':'AD'},MolType=PROTEIN)
        aln2 = DenseAlignment(data={'1+c':'EFW','2':'EGY'},MolType=PROTEIN)
        combined_aln = DenseAlignment(\
            data={'1':'ACEFW','2':'ADEGY'},MolType=PROTEIN)
        self.assertEqual(merge_alignments(aln1,aln2),combined_aln)

    def test_merge_alignments_different_moltype(self):
        """ merging alignments of different moltype functions as expected"""
        aln1 = DenseAlignment(data={'1':'AC','2':'AU'},MolType=RNA)
        aln2 = DenseAlignment(data={'1':'EF','2':'EG'},MolType=PROTEIN)
        combined_aln = DenseAlignment(data={'1':'ACEF','2':'AUEG'})
        self.assertEqual(merge_alignments(aln1,aln2),combined_aln)
        aln1 = DenseAlignment(data={'1':'AC','2':'AT'},MolType=DNA)
        aln2 = DenseAlignment(data={'1':'EF','2':'EG'},MolType=PROTEIN)
        combined_aln = DenseAlignment(data={'1':'ACEF','2':'ATEG'})
        self.assertEqual(merge_alignments(aln1,aln2),combined_aln)
        aln1 = DenseAlignment(data={'1':'AC','2':'AT'},MolType=DNA)
        aln2 = DenseAlignment(data={'1':'UC','2':'UG'},MolType=RNA)
        combined_aln = DenseAlignment(data={'1':'ACUC','2':'ATUG'})
        self.assertEqual(merge_alignments(aln1,aln2),combined_aln)

    def test_n_random_seqs(self):
        """n_random_seqs: functions as expected"""
        aln1 = LoadSeqs(data=zip(list('abcd'),['AA','AC','DD','GG']),\
            moltype=PROTEIN,aligned=DenseAlignment)
        # Number of returned sequences correct
        self.assertEqual(n_random_seqs(aln1,1).getNumSeqs(),1)
        self.assertEqual(n_random_seqs(aln1,2).getNumSeqs(),2)
        self.assertEqual(n_random_seqs(aln1,3).getNumSeqs(),3)
        self.assertEqual(n_random_seqs(aln1,4).getNumSeqs(),4)
        # Sequences are correct
        new_aln = n_random_seqs(aln1,3)
        self.assertEqual(new_aln.getNumSeqs(),3)
        for n in new_aln.Names:
            self.assertEqual(new_aln.getSeq(n),aln1.getSeq(n))
        # Objects are equal when all are requested
        self.assertEqual(n_random_seqs(aln1,4),aln1)
        # Objects are not equal when a subset is requested
        self.assertNotEqual(n_random_seqs(aln1,3),aln1)
        # In 1000 iterations, we get at least one different alignment --
        # this tests the random selection
        different = False
        new_aln = n_random_seqs(aln1,2)
        for i in range(1000):
            new_aln2 = n_random_seqs(aln1,2)
            if new_aln != new_aln2:
                different = True
                break
        self.assertTrue(different)

class AncestorCoevolve(TestCase):
    """ Tests of the ancestral state method for detecting coevolution """

    def setUp(self):
        """ """
        # t1, ancestral_states1, and aln1_* are used to test that when
        # alternate seqs are used with the same tree and ancestral_states,
        # the results vary when appropriate
        self.t1 = LoadTree(treestring=\
            '((A:0.5,B:0.5):0.5,(C:0.5,(D:0.5,E:0.5):0.5):0.5);')
        self.ancestral_states1 = DenseAlignment(data={'root':'AAA',\
            'edge.0':'AAA','edge.1':'AAA','edge.2':'AAA'},MolType=PROTEIN)
        self.ancestral_states1_w_gaps = DenseAlignment(data={'root':'AAA',\
            'edge.0':'AAA','edge.1':'A-A','edge.2':'AA-'},MolType=PROTEIN)
        # no correlated changes count
        self.aln1_1 = DenseAlignment(data={'A':'AAC','B':'AAD','C':'AAA',\
            'D':'AAE','E':'AFA'},MolType=PROTEIN)
        # 1 correlated change count
        self.aln1_2 = DenseAlignment(data={'A':'AAC','B':'AAD','C':'AAA',\
            'D':'AEE','E':'AFF'},MolType=PROTEIN)
        # 1 different correlated change count
        self.aln1_3 = DenseAlignment(data={'A':'AAC','B':'AAD','C':'AAA',\
            'D':'AGE','E':'AFH'},MolType=PROTEIN)
        # 3 correlated change counts
        self.aln1_4 = DenseAlignment(data={'A':'AAC','B':'AGD','C':'AAA',\
            'D':'AGE','E':'AFH'},MolType=PROTEIN)
        # 8 correlated change counts
        self.aln1_5 = DenseAlignment(data={'A':'YYC','B':'HGD','C':'AAA',\
            'D':'AGE','E':'AFH'},MolType=PROTEIN)
        self.aln1_w_gaps = DenseAlignment(data={'A':'AAC','B':'AAD','C':'AAA',\
            'D':'AG-','E':'A-H'},MolType=PROTEIN)
        # t2, ancestral_states2_*, and aln2 are used to test that when
        # alternate ancestral states are used with the same aln and tree,
        # the results vary when appropriate
        self.t2 = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);')
        self.ancestral_states2_1 = DenseAlignment(data={'root':'AA'},\
            MolType=PROTEIN)
        self.ancestral_states2_2 = DenseAlignment(data={'root':'CC'},\
            MolType=PROTEIN)
        self.ancestral_states2_3 = DenseAlignment(data={'root':'EF'},\
            MolType=PROTEIN)
        self.aln2 = DenseAlignment(data={'A':'AA','B':'CC','C':'CA'},\
            MolType=PROTEIN)
        # t3_*, ancestral_states3, and aln3 are used to test that when
        # alternate trees are used with the same aln and ancestral_states,
        # the results vary when appropriate
        self.t3_1 = LoadTree(treestring='(A:0.5,(B:0.5,C:0.5):0.5);')
        self.t3_2 = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
        self.ancestral_states3 = DenseAlignment(\
            data={'root':'CC','edge.0':'AD'},MolType=PROTEIN)
        self.aln3 = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},\
            MolType=PROTEIN)

    def test_validate_ancestral_seqs_invalid(self):
        """validate_ancestral_seqs: ValueError on incompatible anc. seqs & tree
        """
        # edge missing
        aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},MolType=PROTEIN)
        self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
            tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
            ancestral_seqs=DenseAlignment(data={'root':'AA'},MolType=PROTEIN))
        # root missing
        self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
            tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
            ancestral_seqs=DenseAlignment(data={'edge.0':'AA'},MolType=PROTEIN))
        # correct numSeqs but wrong names
        self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
            tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
            ancestral_seqs=DenseAlignment(data={'root':'AA','edge.1':'AA'},\
            MolType=PROTEIN))
        self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
            tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
            ancestral_seqs=DenseAlignment(data={'r':'AA','edge.0':'AA'},\
            MolType=PROTEIN))
        self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
            tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
            ancestral_seqs=DenseAlignment(data={'r':'AA','e':'AA'},\
            MolType=PROTEIN))
        # different tree: invalid
        aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
            MolType=PROTEIN)
        self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
            tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);'),\
            ancestral_seqs=DenseAlignment(\
            data={'root':'AA','e':'AA','edge.1':'AA'},MolType=PROTEIN))

    def test_validate_ancestral_seqs_valid(self):
        """validate_ancestral_seqs: does nothing on compatible anc. seqs & tree
        """
        aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},MolType=PROTEIN)
        # valid data -> no error
        validate_ancestral_seqs(aln,tree=LoadTree(\
            treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
            ancestral_seqs=DenseAlignment(data={'root':'AA','edge.0':'AA'},\
            MolType=PROTEIN))
        # different tree: valid
        aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
            MolType=PROTEIN)
        validate_ancestral_seqs(aln,tree=LoadTree(\
            treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);'),\
            ancestral_seqs=DenseAlignment(data=\
            {'root':'AA','edge.0':'AA','edge.1':'AA'},MolType=PROTEIN))

    def test_ancestral_states_input_validation(self):
        """ancestral_states_input_validation: all validation steps performed"""
        aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
            MolType=PROTEIN)
        # incompatible tree and ancestral states (more thorough testing in
        # test_validate_ancestral_seqs)
        self.assertRaises(ValueError,ancestral_states_input_validation,aln,\
            tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);'),\
            ancestral_seqs=DenseAlignment(data={'root':'AA','e':'AA',\
            'edge.1':'AA'},MolType=PROTEIN))
        # no tree provided
        self.assertRaises(ValueError,ancestral_states_input_validation,aln,\
            ancestral_seqs=DenseAlignment(data={'root':'AA','e':'AA',\
            'edge.1':'AA'},MolType=PROTEIN))
        # incompatible tree and alignment (more tests in test_validate_tree)
        aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},MolType=PROTEIN)
        self.assertRaises(ValueError,ancestral_states_input_validation,aln,\
            tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);'))

    def test_validate_tree_valid(self):
        """validate_tree: does nothing on compatible tree and aln
        """
        t = LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);')
        aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
            MolType=PROTEIN)
        validate_tree(aln,t)
        t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
        aln = DenseAlignment(\
            data={'A':'AC','B':'CA','C':'CC'},MolType=PROTEIN)
        validate_tree(aln,t)

    def 
test_validate_tree_invalid(self): """validate_tree: raises ValueError on incompatible tree and aln """ # different scale tree and aln t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);') aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\ MolType=PROTEIN) self.assertRaises(ValueError,validate_tree,aln,t) t = LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);') aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},\ MolType=PROTEIN) self.assertRaises(ValueError,validate_tree,aln,t) # same scale tree and aln, but different names t = LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,Dee:0.5):0.5);') aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\ MolType=PROTEIN) self.assertRaises(ValueError,validate_tree,aln,t) def test_get_ancestral_seqs(self): """get_ancestral_seqs: returns valid collection of ancestral seqs """ t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);') aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN) expected = DenseAlignment(data={'root':'AA','edge.0':'AA'},\ MolType=PROTEIN) self.assertEqual(get_ancestral_seqs(aln,t, optimise=False),expected) t = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);') aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},\ MolType=PROTEIN) expected = DenseAlignment(data={'root':'AA'},MolType=PROTEIN) self.assertEqual(get_ancestral_seqs(aln,t, optimise=False),expected) t = LoadTree(treestring='(((A1:0.5,A2:0.5):0.5,B:0.5):0.5,\ (C:0.5,D:0.5):0.5);') aln = DenseAlignment(data={'A1':'AD','A2':'AD','B':'AC',\ 'C':'AC','D':'AC'},MolType=PROTEIN) expected = DenseAlignment(data={'root':'AC','edge.0':'AD',\ 'edge.1':'AC','edge.2':'AC'},MolType=PROTEIN) self.assertEqual(get_ancestral_seqs(aln,t, optimise=False),expected) def test_get_ancestral_seqs_handles_gaps(self): """get_ancestral_seqs: handles gaps """ # Gaps handled OK t = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);') aln = DenseAlignment(data={'A':'A-','B':'AA','C':'AA'},MolType=PROTEIN) expected = 
DenseAlignment(data={'root':'AA'},MolType=PROTEIN) self.assertEqual(get_ancestral_seqs(aln,t, optimise=False),expected) def test_get_ancestral_seqs_handles_ambiguous_residues(self): """get_ancestral_seqs: handles ambiguous residues """ # Non-canonical residues handled OK t = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);') aln = DenseAlignment(data={'A':'AX','B':'Z-','C':'BC'},MolType=PROTEIN) actual = get_ancestral_seqs(aln,t, optimise=False) self.assertEqual(len(actual),2) self.assertEqual(actual.getNumSeqs(),1) def test_ancestral_state_alignment_handles_ancestral_state_calc(self): """ancestral_state_alignment: functions when calc'ing ancestral states """ t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);') aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN) self.assertEqual(ancestral_state_alignment(aln,t),[[0,0],[0,2]]) # non-bifurcating tree t = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);') aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN) self.assertEqual(ancestral_state_alignment(aln,t),[[0,0],[0,2]]) def test_ancestral_state_position_handles_ancestral_state_calc(self): """ancestral_state_position: functions when calc'ing ancestral states """ t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);') aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN) self.assertEqual(ancestral_state_position(aln,t,0),[0,0]) self.assertEqual(ancestral_state_position(aln,t,1),[0,2]) def test_ancestral_state_pair_handles_ancestral_state_calc(self): """ancestral_state_position: functions when calc'ing ancestral states """ t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);') aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN) self.assertEqual(ancestral_state_pair(aln,t,0,0),0) self.assertEqual(ancestral_state_pair(aln,t,0,1),0) self.assertEqual(ancestral_state_pair(aln,t,1,1),2) self.assertEqual(ancestral_state_pair(aln,t,1,0),0) def test_ancestral_state_alignment_no_error_on_gap(self): 
"""ancestral_state_alignment: return w/o error with gapped seqs """ ancestral_state_alignment(self.aln1_w_gaps,self.t1,\ self.ancestral_states1) ancestral_state_alignment(self.aln1_1,self.t1,\ self.ancestral_states1_w_gaps) def test_ancestral_state_methods_handle_bad_ancestor_aln(self): """ancestral state methods raise error on bad ancestor alignment """ # bad length and seq names self.assertRaises(ValueError,coevolve_alignment,\ ancestral_state_alignment,self.aln1_2,\ tree=self.t1,ancestral_seqs=self.ancestral_states2_1) self.assertRaises(ValueError,coevolve_position,\ ancestral_state_position,self.aln1_2,0,\ tree=self.t1,ancestral_seqs=self.ancestral_states2_1) self.assertRaises(ValueError,coevolve_pair,\ ancestral_state_pair,self.aln1_2,0,1,\ tree=self.t1,ancestral_seqs=self.ancestral_states2_1) # bad seq names self.assertRaises(ValueError,coevolve_alignment,\ ancestral_state_alignment,self.aln1_2,\ tree=self.t1,ancestral_seqs=self.aln1_2) self.assertRaises(ValueError,coevolve_position,\ ancestral_state_position,self.aln1_2,0,\ tree=self.t1,ancestral_seqs=self.aln1_2) self.assertRaises(ValueError,coevolve_pair,\ ancestral_state_pair,self.aln1_2,0,1,\ tree=self.t1,ancestral_seqs=self.aln1_2) # bad length a = DenseAlignment(data={'root':'AC','edge.0':'AD','edge.1':'AA',\ 'edge.2':'EE'}) self.assertRaises(ValueError,coevolve_alignment,\ ancestral_state_alignment,self.aln1_2,\ tree=self.t1,ancestral_seqs=a) self.assertRaises(ValueError,coevolve_position,\ ancestral_state_position,self.aln1_2,0,\ tree=self.t1,ancestral_seqs=a) self.assertRaises(ValueError,coevolve_pair,\ ancestral_state_pair,self.aln1_2,0,1,\ tree=self.t1,ancestral_seqs=a) def test_ancestral_states_methods_handle_bad_position_numbers(self): """coevolve_* w/ ancestral_states raise ValueError on bad position """ self.assertRaises(ValueError,coevolve_position,\ ancestral_state_position,self.aln1_2,\ 42,tree=self.t1,ancestral_states=self.ancestral_states2_1) self.assertRaises(ValueError,coevolve_pair,\ 
ancestral_state_pair,self.aln1_2,\ 0,42,tree=self.t1,ancestral_states=self.ancestral_states2_1) self.assertRaises(ValueError,coevolve_pair,\ ancestral_state_pair,self.aln1_2,\ 42,0,tree=self.t1,ancestral_states=self.ancestral_states2_1) def test_ancestral_state_alignment_non_bifurcating_tree(self): """ancestral_state_alignment: handles non-bifurcating tree correctly """ self.assertEqual(ancestral_state_alignment(self.aln2,\ self.t2,self.ancestral_states2_3),[[9,9],[9,9]]) def test_ancestral_state_alignment_bifurcating_tree(self): """ancestral_state_alignment: handles bifurcating tree correctly """ self.assertFloatEqual(ancestral_state_alignment(self.aln1_5,\ self.t1,self.ancestral_states1),\ [[5,5,5],[5,11.6,11.6],[5,11.6,11.6]]) def test_ancestral_state_alignment_ancestor_difference(self): """ancestral_state_alignment: different ancestor -> different result """ # ancestral_states2_1 self.assertEqual(ancestral_state_alignment(self.aln2,\ self.t2,self.ancestral_states2_1),[[5,2],[2,2]]) # ancestral_states2_2 self.assertEqual(ancestral_state_alignment(self.aln2,\ self.t2,self.ancestral_states2_2),[[2,2],[2,5]]) # ancestral_states2_3 self.assertEqual(ancestral_state_alignment(self.aln2,\ self.t2,self.ancestral_states2_3),[[9,9],[9,9]]) def test_ancestral_state_position_ancestor_difference(self): """ancestral_state_position: difference_ancestor -> different result """ # ancestral_states2_1 self.assertEqual(ancestral_state_position(self.aln2,\ self.t2,0,self.ancestral_states2_1),[5,2]) self.assertEqual(ancestral_state_position(self.aln2,\ self.t2,1,self.ancestral_states2_1),[2,2]) # ancestral_states2_2 self.assertEqual(ancestral_state_position(self.aln2,\ self.t2,0,self.ancestral_states2_2),[2,2]) self.assertEqual(ancestral_state_position(self.aln2,\ self.t2,1,self.ancestral_states2_2),[2,5]) # ancestral_states2_3 self.assertEqual(ancestral_state_position(self.aln2,\ self.t2,0,self.ancestral_states2_3),[9,9]) self.assertEqual(ancestral_state_position(self.aln2,\ 
self.t2,1,self.ancestral_states2_3),[9,9]) def test_ancestral_state_pair_ancestor_difference(self): """ancestral_state_pair: difference_ancestor -> different result """ # ancestral_states2_1 self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,0,0,self.ancestral_states2_1),5) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,0,1,self.ancestral_states2_1),2) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,1,1,self.ancestral_states2_1),2) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,1,0,self.ancestral_states2_1),2) # ancestral_states2_2 self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,0,0,self.ancestral_states2_2),2) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,0,1,self.ancestral_states2_2),2) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,1,1,self.ancestral_states2_2),5) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,1,0,self.ancestral_states2_2),2) # ancestral_states2_3 self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,0,0,self.ancestral_states2_3),9) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,0,1,self.ancestral_states2_3),9) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,1,1,self.ancestral_states2_3),9) self.assertEqual(ancestral_state_pair(self.aln2,\ self.t2,1,0,self.ancestral_states2_3),9) def test_ancestral_state_alignment_tree_difference(self): """ancestral_state_alignment: different result on different tree """ # tree: t3_1 self.assertEqual(ancestral_state_alignment(self.aln3,\ self.t3_1,self.ancestral_states3),[[7,5],[5,5]]) # tree: t3_2 self.assertEqual(ancestral_state_alignment(self.aln3,\ self.t3_2,self.ancestral_states3),[[2,2],[2,5]]) def test_ancestral_state_position_tree_difference(self): """ancestral_state_position: different result on different tree """ # tree: t3_1 self.assertEqual(ancestral_state_position(self.aln3,\ self.t3_1,0,self.ancestral_states3),[7,5]) self.assertEqual(ancestral_state_position(self.aln3,\ 
self.t3_1,1,self.ancestral_states3),[5,5]) # tree: t3_2 self.assertEqual(ancestral_state_position(self.aln3,\ self.t3_2,0,self.ancestral_states3),[2,2]) self.assertEqual(ancestral_state_position(self.aln3,\ self.t3_2,1,self.ancestral_states3),[2,5]) def test_ancestral_state_pair_tree_difference(self): """ancestral_state_pair: different result on different tree """ # tree: t3_1 self.assertFloatEqual(ancestral_state_pair(self.aln3,\ self.t3_1,0,1,self.ancestral_states3),5) self.assertFloatEqual(ancestral_state_pair(self.aln3,\ self.t3_1,1,0,self.ancestral_states3),5) self.assertFloatEqual(ancestral_state_pair(self.aln3,\ self.t3_1,0,0,self.ancestral_states3),7) self.assertFloatEqual(ancestral_state_pair(self.aln3,\ self.t3_1,1,1,self.ancestral_states3),5) # tree: t3_2 self.assertFloatEqual(ancestral_state_pair(self.aln3,\ self.t3_2,0,1,self.ancestral_states3),2) self.assertFloatEqual(ancestral_state_pair(self.aln3,\ self.t3_2,1,0,self.ancestral_states3),2) self.assertFloatEqual(ancestral_state_pair(self.aln3,\ self.t3_2,0,0,self.ancestral_states3),2) self.assertFloatEqual(ancestral_state_pair(self.aln3,\ self.t3_2,1,1,self.ancestral_states3),5) def test_ancestral_state_alignment_aln_difference(self): """ancestral_state_alignment: difference aln -> different result """ expected = [[0,0,0],[0,2,0],[0,0,7.8]] actual = ancestral_state_alignment(self.aln1_1,\ self.t1,self.ancestral_states1) self.assertFloatEqual(actual,expected) expected = [[5,5,5],[5,11.6,11.6],[5,11.6,11.6]] actual = ancestral_state_alignment(self.aln1_5,\ self.t1,self.ancestral_states1) self.assertFloatEqual(actual,expected) def test_ancestral_state_position_aln_difference(self): """ancestral_state_position: difference aln -> different result """ expected = [0,0,0] actual = ancestral_state_position(self.aln1_1,\ self.t1,0,self.ancestral_states1) self.assertFloatEqual(actual,expected) expected = [0,2,0] actual = ancestral_state_position(self.aln1_1,\ self.t1,1,self.ancestral_states1) 
        self.assertFloatEqual(actual, expected)
        expected = [0,0,7.8]
        actual = ancestral_state_position(self.aln1_1,
            self.t1, 2, self.ancestral_states1)
        self.assertFloatEqual(actual, expected)
        expected = [5,5,5]
        actual = ancestral_state_position(self.aln1_5,
            self.t1, 0, self.ancestral_states1)
        self.assertFloatEqual(actual, expected)
        expected = [5,11.6,11.6]
        actual = ancestral_state_position(self.aln1_5,
            self.t1, 1, self.ancestral_states1)
        self.assertFloatEqual(actual, expected)
        expected = [5,11.6,11.6]
        actual = ancestral_state_position(self.aln1_5,
            self.t1, 2, self.ancestral_states1)
        self.assertFloatEqual(actual, expected)

    def test_ancestral_state_pair_aln_difference(self):
        """ancestral_state_pair: different aln -> different result
        """
        self.assertFloatEqual(ancestral_state_pair(self.aln1_1, self.t1, 0, 0,
            self.ancestral_states1), 0)
        self.assertFloatEqual(ancestral_state_pair(self.aln1_1, self.t1, 1, 1,
            self.ancestral_states1), 2)
        self.assertFloatEqual(ancestral_state_pair(self.aln1_1, self.t1, 2, 2,
            self.ancestral_states1), 7.8)
        self.assertFloatEqual(ancestral_state_pair(self.aln1_5, self.t1, 0, 1,
            self.ancestral_states1), 5)
        self.assertFloatEqual(ancestral_state_pair(self.aln1_5, self.t1, 0, 2,
            self.ancestral_states1), 5)
        self.assertFloatEqual(ancestral_state_pair(self.aln1_5, self.t1, 1, 2,
            self.ancestral_states1), 11.6)

    def test_ancestral_state_pair_symmetry(self):
        """ancestral_state_pair: value[i,j] == value[j,i]
        """
        self.assertFloatEqual(ancestral_state_pair(self.aln1_5, self.t1, 0, 1,
            self.ancestral_states1), ancestral_state_pair(
            self.aln1_5, self.t1, 1, 0, self.ancestral_states1))
        self.assertFloatEqual(ancestral_state_pair(self.aln1_5, self.t1, 0, 2,
            self.ancestral_states1), ancestral_state_pair(
            self.aln1_5, self.t1, 2, 0, self.ancestral_states1))
        self.assertFloatEqual(ancestral_state_pair(self.aln1_5, self.t1, 1, 2,
            self.ancestral_states1), ancestral_state_pair(
            self.aln1_5, self.t1, 2, 1, self.ancestral_states1))

    def est_ancestral_state_methods_handle_alt_null_value(self):
        """ancestral state methods handle non-default null value
        """
        # need to rewrite a test of this -- right now there's no way to get
        # null values into the ancestral states result, but that will change
        # when I fix the exclude handling
        pass


class GctmpcaTests(TestCase):

    def setUp(self):
        self.run_slow_tests = int(environ.get('TEST_SLOW_APPC', 0))
        self.run_gctmpca_tests = app_path('calculate_likelihood')
        # Data used by Gctmpca tests
        self.l1 = "1\t2\t42.60\n"
        self.l2 = "2\t3\t0.60"
        self.lines = ["Position 1\tPosition 2\tScore\n", self.l1, self.l2]
        self.aln = DenseAlignment(
            [('A1','AACF'),('A12','AADF'),('A123','ADCF')], MolType=PROTEIN)
        self.rna_aln = DenseAlignment(
            [('A1','AACU'),('A12','AAGG'),('A123','ADCA')], MolType=RNA)
        self.dna_aln = DenseAlignment(
            [('A1','AACT'),('A12','AAGG'),('A123','ADCA')], MolType=DNA)
        self.tree = LoadTree(treestring="(A1:0.5,(A12:0.5,A123:0.5):0.5);")
        self.aln4 = DenseAlignment([('A','AACF'),('AB','AADF'),
            ('ABC','ADCF'),('AAA','AADE')], MolType=PROTEIN)
        self.tree4 = LoadTree(
            treestring="((A:0.5,AAA:0.5):0.5,(AB:0.5,ABC:0.5):0.5);")

    def test_parse_gctmpca_result_line(self):
        """Gctmpca: result line parsing functions as expected
        """
        exp1 = (0, 1, 42.60)
        exp2 = (1, 2, 0.60)
        self.assertFloatEqual(parse_gctmpca_result_line(self.l1), exp1)
        self.assertFloatEqual(parse_gctmpca_result_line(self.l2), exp2)

    def test_parse_gctmpca_result(self):
        """Gctmpca: result (as list) yields correct matrix
        """
        exp1 = array([
            [gDefaultNullValue, 42.60, gDefaultNullValue],
            [42.60, gDefaultNullValue, 0.60],
            [gDefaultNullValue, 0.60, gDefaultNullValue]])
        self.assertFloatEqual(parse_gctmpca_result(self.lines, 3), exp1)
        exp2 = array([
            [gDefaultNullValue, 42.60, gDefaultNullValue, gDefaultNullValue],
            [42.60, gDefaultNullValue, 0.60, gDefaultNullValue],
            [gDefaultNullValue, 0.60, gDefaultNullValue, gDefaultNullValue],
            [gDefaultNullValue, gDefaultNullValue,
             gDefaultNullValue, gDefaultNullValue]])
        self.assertFloatEqual(parse_gctmpca_result(self.lines, 4), exp2)
        self.assertRaises(ValueError, parse_gctmpca_result, self.lines, 2)

    def test_create_gctmpca_input(self):
        """Gctmpca: create_gctmpca_input generates proper data
        """
        seqs1, tree1, seq_names, seq_to_species1 = \
            create_gctmpca_input(self.aln, self.tree)
        exp_seqs1 = ['3 4', 'A1.. AACF', 'A12. AADF', 'A123 ADCF', '\n']
        exp_tree1 = ["(A1..:0.5,(A12.:0.5,A123:0.5):0.5);", '\n']
        exp_seq_names = ["A1..", "A12.", "A123", '\n']
        exp_seq_to_species1 = ["A1..\tA1..", "A12.\tA12.", "A123\tA123", '\n']
        self.assertEqual(seqs1, exp_seqs1)
        self.assertEqual(tree1, exp_tree1)
        self.assertEqual(seq_names, exp_seq_names)
        self.assertEqual(seq_to_species1, exp_seq_to_species1)

    def test_gctmpca_pair(self):
        """Gctmpca: pair method works on trivial data
        """
        if not self.run_slow_tests: return
        if not self.run_gctmpca_tests: return
        # Note: the values in here are derived from the results of running
        # this on the example data from the command line.
        # More extensive tests are performed in the app controller test --
        # here I just want to make sure we're getting the values out.
        actual = gctmpca_pair(self.aln, self.tree, 2, 3)
        expected = 0.483244
        self.assertFloatEqual(actual, expected)
        actual = gctmpca_pair(self.aln4, self.tree4, 2, 3)
        expected = 0.164630
        self.assertFloatEqual(actual, expected)
        # mol type = RNA
        actual = gctmpca_pair(self.rna_aln, self.tree, 3, 4)
        expected = 0.237138
        self.assertFloatEqual(actual, expected)
        # mol type = DNA => error
        self.assertRaises(ValueError, gctmpca_pair,
            self.dna_aln, self.tree, 2, 3)

    def test_gctmpca_alignment(self):
        """Gctmpca: alignment method works on trivial data
        """
        if not self.run_slow_tests: return
        if not self.run_gctmpca_tests: return
        # Note: the values in here are derived from the results of running
        # this on the example data from the command line.
        # More extensive tests are performed in the app controller test --
        # here I just want to make sure we're getting the values out.
        actual = gctmpca_alignment(self.aln, self.tree)
        expected = array([
            [gDefaultNullValue, gDefaultNullValue,
             gDefaultNullValue, gDefaultNullValue],
            [gDefaultNullValue, gDefaultNullValue,
             0.483244, gDefaultNullValue],
            [gDefaultNullValue, 0.483244, gDefaultNullValue, 1.373131],
            [gDefaultNullValue, gDefaultNullValue,
             1.373131, gDefaultNullValue]])
        self.assertFloatEqual(actual, expected)

    def test_build_q_yields_roughly_gctmpca_default(self):
        """build_rate_matrix: from DSO78 data yields Yeang's default Q
        """
        # Note: This doesn't reproduce the exact values, which I expect is
        # due to the two Dayhoff matrices in the PAML data. I think they're
        # using the second one (which is from Dayhoff et al., 1978) and we're
        # using the first one. What is the difference between these two?
        aa_order = 'ACDEFGHIKLMNPQRSTVWY'
        q = build_rate_matrix(DSO78_matrix, DSO78_freqs, aa_order=aa_order)
        expected = []
        for row_aa in aa_order:
            expected.append([default_gctmpca_aa_sub_matrix[row_aa][col_aa]
                for col_aa in aa_order])
        self.assertFloatEqual(q, expected, 3)

    def test_build_q_ignores_zero_counts(self):
        """build_rate_matrix: recoded counts (i.e., w/ many 0s) yields right Q
        """
        # Test that when working with reduced counts, counts and freqs
        # that equal 0.0 don't affect the calculation of Q.
        aa_order_3 = 'ACD'
        count_matrix_3 = [[0,3,9],[3,0,6],[9,6,0]]
        aa_freqs_3 = {'A':0.5,'C':0.3,'D':0.2}
        sm = Empirical(
            rate_matrix=array(count_matrix_3),
            motif_probs=aa_freqs_3,
            alphabet=Alphabet(aa_order_3), recode_gaps=True, do_scaling=True,
            name="", optimise_motif_probs=False)
        wprobs = array([aa_freqs_3[aa] for aa in aa_order_3])
        mprobs_matrix = ones((wprobs.shape[0], wprobs.shape[0]), float)*wprobs
        q3 = sm.calcQ(wprobs, mprobs_matrix)

        aa_freqs_20 = {}.fromkeys('ACDEFGHIKLMNPQRSTVWY', 0.0)
        aa_freqs_20['A'] = 0.5
        aa_freqs_20['C'] = 0.3
        aa_freqs_20['D'] = 0.2
        count_matrix_20 = zeros(400).reshape(20, 20)
        count_matrix_20[0,1] = count_matrix_20[1,0] = 3
        count_matrix_20[0,2] = count_matrix_20[2,0] = 9
        count_matrix_20[1,2] = count_matrix_20[2,1] = 6
        q_20 = build_rate_matrix(array(count_matrix_20), aa_freqs_20)
        for i in range(20):
            for j in range(20):
                try:
                    # rates in q_3 and q_20 are identical
                    self.assertEqual(q3[i,j], q_20[i,j])
                except IndexError:
                    # and everything not in q_3 is zero
                    self.assertEqual(q_20[i,j], 0.)
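# The zero-count equivalence exercised by test_build_q_ignores_zero_counts
# follows from how a reversible rate matrix is assembled from symmetric
# exchange counts and stationary frequencies: a state with zero counts and
# zero frequency contributes only zero rates, so it cannot perturb the rest
# of the matrix. A minimal, self-contained sketch of that construction
# (illustrative only -- toy_rate_matrix is NOT PyCogent's build_rate_matrix,
# and no rate scaling is applied):

```python
def toy_rate_matrix(counts, freqs):
    """Toy reversible rate matrix: q[i][j] = counts[i][j] * freqs[j] for
    i != j, with each diagonal set so rows sum to zero (a CTMC requirement).
    """
    n = len(freqs)
    q = [[counts[i][j] * freqs[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q

q3 = toy_rate_matrix([[0, 3, 9], [3, 0, 6], [9, 6, 0]], [0.5, 0.3, 0.2])
# padding with a never-observed fourth state leaves the 3x3 block unchanged
q4 = toy_rate_matrix(
    [[0, 3, 9, 0], [3, 0, 6, 0], [9, 6, 0, 0], [0, 0, 0, 0]],
    [0.5, 0.3, 0.2, 0.0])
assert all(q3[i][j] == q4[i][j] for i in range(3) for j in range(3))
```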
# following are support funcs for ResampledMiTests
def make_freqs(c12):
    c1, c2 = Freqs(), Freqs()
    for a, b in c12.expand():
        c1 += a
        c2 += b
    return c1, c2

def make_sample(freqs):
    d = []
    for i, s in enumerate(freqs.expand()):
        d += [("s%d" % i, s)]
    return LoadSeqs(data=d)

def _calc_mi():
    """one mutual info hand calc"""
    from math import log
    i = 37/42 * -log(37/42, 2) - (5/42 * log(5/42, 2))
    j = 39/42 * -log(39/42, 2) - (3/42 * log(3/42, 2))
    k = 34/42 * -log(34/42, 2) - (3/42 * log(3/42, 2)) - (5/42 * log(5/42, 2))
    return i + j - k

class ResampledMiTests(TestCase):

    def setUp(self):
        self.c12 = Freqs()
        self.c12 += ['AA']*2
        self.c12 += ['BB']*2
        self.c12 += ['BC']
        self.c1, self.c2 = make_freqs(self.c12)
        self.aln = make_sample(self.c12)

    def test_calc_weights(self):
        """resampled mi weights should be correctly computed"""
        w1 = make_weights(self.c1, 5)
        w2 = make_weights(self.c2, 5)
        e = [('A', {'C': 0.033333333333333333, 'B': 0.066666666666666666}),
             ('C', {'A': 0.050000000000000003, 'B': 0.050000000000000003}),
             ('B', {'A': 0.066666666666666666, 'C': 0.033333333333333333})]
        weights = []
        for w in w1, w2:
            for k, d in w:
                weights += d.values()
        self.assertFloatEqual(sum(weights), 0.5)
        self.assertEqual(w2, e)

    def test_scaled_mi(self):
        """resampled mi should match hand calc"""
        def calc_scaled(data, expected_smi):
            col_i, col_j = Freqs(), Freqs()
            for i, j in data:
                col_i += i
                col_j += j
            pair_freqs = Freqs(data)
            weights_i = make_weights(col_i.copy(), col_i.Sum)
            weights_j = make_weights(col_j.copy(), col_j.Sum)
            entropy = mi(col_i.Uncertainty, col_j.Uncertainty,
                pair_freqs.Uncertainty)
            self.assertFloatEqual(entropy, _calc_mi())
            scales = calc_pair_scale(data, col_i, col_j, weights_i, weights_j)
            scaled_mi = 1 - sum([w*pair_freqs[pr] for pr, e, w in scales
                if entropy <= e])
            self.assertFloatEqual(scaled_mi, expected_smi)
        data = ['BN', 'BN', 'BP', 'BN', 'PN', 'BN', 'BN', 'BN', 'BN', 'BN',
                'BN', 'BN', 'BN', 'PN', 'BN', 'PN', 'BN', 'BN', 'BN', 'BN',
                'BN', 'BP', 'BN', 'BN', 'BN', 'BN', 'BP', 'BN', 'BN', 'BN',
                'BN', 'PN', 'PN', 'BN', 'BN', 'BN', 'BN', 'BN', 'BN', 'BN',
                'BN', 'BN']
        calc_scaled(data, 8/42)

    def test_resampled_mi_interface(self):
        """resampled_mi_alignment should correctly compute statistic from
        alignment"""
        arr = resampled_mi_alignment(self.aln)
        # expected value from hand calculation
        self.assertFloatEqual(arr.tolist(),
            [[1., 0.78333333], [0.78333333, 1.]])

ALN_FILE = \
"""Seq_1 ACDEFG
Seq_2 STVWY-
Seq_3 WY.ZBX"""

# J in here
ALN_FILE_WRONG_KEY = \
"""Seq_1 ACDKLM
Seq_2 JINCK-
Seq_3 VX.MAB"""

# last seq too long
ALN_FILE_INC_SHAPE = \
"""Seq_1 ACDKLM
Seq_2 LINCK-
Seq_3 VX.MABN"""

gpcr_ungapped = """>OPSD_SPAAU
MNGTEGPFFYVPMVNTSGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWIMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFICHFSIPLTIVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIMMVIAFLVCWLPYAGVAWWIFTHQGSEFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>B1AR_CANFA
LPDGAATAARLLVPASPSASPLAPTSEGPAPLSQQWTAGIGLLMALIVLLIVAGNVLVIAAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVMRGRWEYGSFLCELWTSVDVLCVTASIETLCVIALDRYLAITAPFYQSLLTRARARALVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRRAFQRLLCCARRAARGSHGAAG------PPPSPG
>OPSB_HUMAN
---MRKMSEEEFYLFKNISSVGPWDGPQYHIAPVWAFYLQAAFMGTVFLIGFPLNAMVLVATLRYKKLRQPLNYILVNVSFGGFLLCIFSVFPVFVASCNGYFVFGRHVCALEGFLGTVAGLVTGWSLAFLAFERYIVICKPFGNFRFSSKHALTVVLATWTIGIGVSIPPFFGWSRFIPEGLQCSCGPDWYTVGTKYRSESYTWFLFIFCFIVPLSLICFSYTQLLRALKAVAAQQQESATTQKAEREVSRMVVVMVGSFCVCYVPYAAFAMYMVNNRNHGLDLRLVTIPSFFSKSACIYNPIIYCFMNKQFQACIMKMVCGKAMTD---ESDTCSTVSSTQVGPN-
>OPSD_PROCL
NP-Y-GNFTVVDMAPKDILHMIHPHWYQYPPMNPMMYPLLLIFMLFTGILCLAGNFVTIWVFMNTKSLRTPANLLVVNLAMSDFLMMFTMFPPMMVTCYYHTWTLGPTFCQVYAFLGNLCGCASIWTMVFITFDRYNVIVKGVAGEPLSTKKASLWILTIWVLSITWCIAPFFGWNRYVPEGNLTGCGTD--YLSEDILSRSYLYDYSTWVYYLPLLP-IYCYVSIIKAVAAHMGIRNEEAQKTSAECRLAKIAMTTVALWFIAWTPYLLINWVGMFARSY-LSPVYTIWGYVFAKANAVYNPIVYAISHPKYRAAMEKKLPCLSCKTESDDVSESAEEKAESA---- >BRB1_RABIT ASQGPLELQPSNQSQLAPPNATSC--SGAPDAWDLLHRLLPTFIIAIFTLGLLGNSFVLSVFLLARRRLSVAEIYLANLAASDLVFVLGLPFWAENVRNQFDWPFGAALCRIVNGVIKANLFISIFLVVAISQDRYSVLVHPMSRRGRRRRQAQATCALIWLAGGLLSTPTFVLRSVRA-LN--SACILL---LPHEAWHWLRMVELNLLGFLLPLAAILFFNCHILASLRRR---RVPSRCGGPRDSKSTALILTLVASFLVCWAPYHFFAFLECLWQVHEFTDLGLQLSNFSAFVNSCLNPVIYVFVGRLFRTKVWELCQQCSPR---------LAPV-------S >5H7_.ENLA NLLPSEFMTERPLNTTEQDLTKPDCGKELLLYGDTEKIVIGVVLSIITLFTIAGNALVIISVCIVKKLRQPSNYLVVSLAAADLSVAVAVMPFVIITDLVGGWLFGKVFCNVFIAMDVMCCTASIMTLCVISVDRYLGITRPLYPARQNGKLMAKMVFIVWLLSASITLPPLFGWAK--N-V--RVCLIS--------QDFGYTVYSTAVAFYIPMTVMLVMYQRIFVAAKISSKLDRKNISIFKREQKAARTLGIIVGAFTFCWLPFFLLSTARPFICGICMPLRLERTLLWLGYTNSLINPLIYAFFNRDLRTTFWNLLRCKYTNINRRLSAASTERHEGIL---- >B3AR_FELCA MAPWPHGNGSLASWPDAPTLTPNTANTSGLPGVPWAVALAGALLALAVLATVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSKWWRVQ-R--HCCAFA--------SNIPYALLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALRTLGLIMGTFSLCWLPFFVANVVRALGGPS-VPSPAFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCRLEERHAAASGAAALTRPAESGLP >TLR1_DROME IIDNRDNLESINEAKDFLTECLFPSPTRPYELPWEQKTIWAIIFGLMMFVAIAGNGIVLWIVTGHRSMRTVTNYFLLNLSIADLLMSSLNCVFNFIFMLNSDWPFGSIYCTINNFVANVTVSTSVFTLVAISFDRYIAIVDPL-KRRTSRRKVRIILVLIWALSCVLSAPCLLYSS----SR--TVCFMMDGRYPTSMADYAYNLIILVLTTGIPMIVMLICYSLMGRVPGGSSIGTDRQMESMKSKRKVVRMFIAIVSIFAICWLPYHLFFIYAYHNNQVKYVQHMYLGFYWLAMSNAMVNPLIYYWMNKRFRMYFQRIICCCCVGLTRHRFDSPNSSNRHTRAETK >ITR_CATCO 
EQDFWSFNESSRNSTVGNETF-GNQTVNPLKRNEEVAKVEVTVLALVLFLALAGNLCVLIAIYTAKHTQSRMYYLMKHLSIADLVVAVFQVLPQLIWDITFRFYGPDFLCRLVKYLQTVGMFASTYMLVLMSIDRCIAICQPL--RSLHKRKDRCYVIVSWALSLVFSVPQVYIFSLRE----VYDCWGD---FVQPWGAKAYITWISLTIYIIPVAILGGCYGLISFKIWQNANAVSSVKLVSKAKITTVKMTFVIVLAYIVCWTPFFFVQMWSAWDPEA-REAMPFIISMLLASLNSCCNPWIYMFFAGHLFHDLKQSLLCCSTLYLKSSQCRCKSNCSTYVIKST >NK1R_RANCA ----MNSNISAQNDSALNSTIQNGTKINQFIQPPWQIALWSVAYSIIVIVSLVGNIIVMWIIIAHKRMRTVTNYFLVNLAFAEASMSAFNTVINFTYAIHNHWYYGLIYCKFHNFFPISAVFTSIYSMTAIALDRYMAIIHPL-KPRLSATATKIVICVIWSFSFCMAFPLGYYAD---GG---DICYLNPDSEENRKYEQVYQVLVFCLIYILPLLVIGCAYTFIGMTLWAS--PSDRYHEQVVAKRKVVKMMIVVVCTFAICWLPFHIFFLLQTLHEMTKFYQQFYLAIMWLAMSSTMYNPIIYCCLNDRFRIGFKHVFRWCPFIRAGEY----STRYLQTQSSMY >A2AA_MOUSE ----MGSLQPDAGNSSWNGTEAPGGGTRATPYSLQVTLTLVCLAGLLMLFTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKVWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISIEKKGQ-P--PSCKIN--------DQKWYVISSSIGSFFAPCLIMILVYVRIYQIAKRRRGGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLIAVGCP--VPSQLFNFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------ >MSHR_VULVU SGQGPQRRLLGSPNATSPTTPHFKLAANQTGPRCLEVSIPNGLFLSLGLVSVVENVLVVAAIAKNRNLHSPMYYFIGCLAVSDLLVSVTNVLETAVMLLVEAAAVVQQLDDIIDVLICGSMVSSLCFLGAIAVDRYLSIFYALYHSIVTLPRAWRAISAIWVASVLSSTLFIAYYY----------------------NNHTAVLLCLVSFFVAMLVLMAVLYVHMLARARQHIARKRQHSVHQGFGLKGAATLTILLGIFFLCWGPFFLHLSLMVLCPQHGCVFQNFNLFLTLIICNSIIDPFIYAFRSQELRKTLQEVVLCSW----------------------- >OPSD_BOVIN MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA >OPSG_MOUSE 
DHYEDSTHASIFTYTNSNSTKGPFEGPNYHIAPRWVYHLTSTWMILVVVASVFTNGLVLAATMRFKKLRHPLNWILVNLAVADLAETIIASTISVVNQIYGYFVLGHPLCVIEGYIVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLATVGIVFSWVWAAIWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCIFPLSIIVLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVFAYCLCWGPYTFFACFATAHPGYAFHPLVASLPSYFAKSATIYNPIIYVFMNRQFRNCILHLFGKKVDDS-----SELVSSV--SSVSPA >TA2R_RAT --MWLNS-----------TSLGACFRPVNITLQERRAIASPWFAASFCALGLGSNLLALSVLAGARPPRSSFLALLCGLVLTDFLGLLVTGAVVASQHAALLTDPGCRLCHFMGAAMVFFGLCPLLLGAAMAAERFVGITRPFSRPAATSRRAWATVGLVWVGAGTLGLLPLLGLGRYSVQYPGSWCFLT----LGAERGDVAFGLMFALLGSVSVGLSLLLNTVSVATLCRVYHAREATQRPRDCEVEMMVQLVGIMVVATVCWMPLLVFILQTLLQTLPRTTERQLLIYLRVATWNQILDPWVYILFRRSVLRRLHPRFTSQLQAVSLHSPPTQ------------ >CCR3_HUMAN LENFSSSYDYGENESDSCCTSPPC---PQDFSLNFDRAFLPALYSLLFLLGLLGNGAVAAVLLSRRTALSSTDTFLLHLAVADTLLVLTLPLWAVDAAVQ--WVFGSGLCKVAGALFNINFYAGALLLACISFDRYLNIVHATLYRRGPPARVTLTCLAVWGLCLLFALPDFIFLSAHH-RL-ATHCQYN----FPQVGRTALRVLQLVAGFLLPLLVMAYCYAHILAVL---------LVSRGQRRLRAMRLVVVVVVAFALCWTPYHLVVLVDILMDLGSRVDVAKSVTSGLGYMHCCLNPLLYAFVGVKFRERMWMLLLRLGCPN----QRGL--------PSSS >THRR_CRILO LPEGRAIYLNKSHSPPAPLAPFISEDASGYLTSPWLRLFIPSVYTFVFVVSLPLNILAIAVFVLKMKVKKPAVVYMLHLAMADVLFVSVLPLKISYYFSGSDWQFGSGMCRFATAAFYCNMYASIMLMTVISIDRFLAVVYPISLSWRTLGRANFTCLVIWVMAIMGVVPLLLKEQTTR--N--TTCHDVLNETLLQGFYSYYFSAFSAVFFLVPLIISTICYMSIIRCL------SSSSVANRSKKSRALFLSAAVFCVFIVCFGPTNVLLIMHYLLLSDEKAYFAYLLCVCVSSVSCCIDPLIYYYASSECQRHLYGILCCKESSDPNSYNSTGDTCS-------- >AA2A_CAVPO ----------------------------------MSSSVYITVELVIAVLAILGNVLVCWAVWINSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTG--FCAACHGCLFFACFVLVLTQSSIFSLLTITIDRYIAIRIPLYNGLVTCTRAKGIIAICWVLSFAIGLTPMLGWNNCSS-E-QVTCLFE-----DVVPMNYMVYYNFFAFVLVPLLLMLGIYLRIFLAARRQESQGERTRSTLQKEVHPAKSLAIIVGLFALCCLPLNIINCFTFFCPECHAPPWLMYLTIILSHGNSVVNPLIYAYRIREFRQTFRKIIRSHILRRRELFKAGGAHSPEGEQVSLR >C3AR_MOUSE 
--------------MESFDADTNSTDLHSRPLFQPQDIASMVILGLTCLLGLLGNGLVLWVAGVKMK-TTVNTVWFLHLTLADFLCCLSLPFSLAHLILQGHWPYGLFLCKLIPSIIILNMFASVFLLTAISLDRCLIVHKPICQNHRNVRTAFAICGCVWVVAFVMCVPVFVYRDLFI-ED-DYVDQFT-YDNHVPTPLMAITITRLVVGFLVPFFIMVICYSLIVFRM--------RKTNFTKSRNKTFRVAVAVVTVFFICWTPYHLVGVLLLITDPEEAVMSWDHMSIALASANSCFNPFLYALLGKDFRKKARQSIKGILEAAFSEELTHSASS--------- >TA2R_BOVIN --MWPNA-----------SSLGPCFRPMNITLEERRLIASPWFAASFCLVGLASNLLALSVLMGARQSRSSFLTFLCGLVLTDFMGLLVTGAIVVTQHFVLFVDPGCSLCHFMGVIMVFFGLCPLLLGAAMASERFLGITRPFRPATASQRRAWTTVGLVWASALALGLLPLLGVGHYTVQYPGSWCFLT----LGTDPGDVAFGLLFALLGSISVGMSFLLNTISVATLCHVYHGATAQQRPRDCEVEMMVQLMGIMVVASICWMPLLVFIAQTVLQSPPRLTERQLLIYLRVATWNQILDPWVYILFRRAVIQRFYPRLSTRSRSLSLQPQLTR------------ >OPSD_RABIT MNGTEGPDFYIPMSNQTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTTTLYTSLHGYFVFGPTGCNVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWIMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATASKTETSQVAPA >OPSD_PHOVI MNGTEGPNFYVPFSNKTGVVRSPFEFPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVGFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTLPAFFAKAASIYNPVIYIMMNKQFRTCMITTLCCGKNPLGDDEVSASASKTETSQVAPA >D3DR_RAT --------MAPLSQISTHLNSTCGAENSTGVNRARPHAYYALSYCALILAIIFGNGLVCAAVLRERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGWNFSRICCDVFVTLDVMMCTASILNLCAISIDRYTAVVMPVGTGQSSCRRVALMITAVWVLAFAVSCPLLFGFN---D-P--SICSI---------SNPDFVIYSSVVSFYVPFGVTVLVYARIYIVLRQRTSLPLQPRGVPLREKKATQMVVIVLGAFIVCWLPFFLTHVLNTHCQACHVSPELYRATTWLGYVNSALNPVIYTTFNVEFRKAFLKILSC------------------------- >DADR_.ENLA 
---------------MTFNITSMDEDVLLTERESSFRVLTGCFLSVLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-TFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKVAFIMIGVAWTLSVLISFIPVQLNWHKALN-TMDNCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAAKQLDCESSLKTSFKRETKVLKTLSVIMGVFVCCWLPFFILNCIVPFCDPSCISSTTFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSNLLGCYRLCPTSNNIIETAVVYSCQ----- >PAR2_HUMAN ---VDGTSHVTGKGVTVETVFSVDEFSASVLTGKLTTVFLPIVYTIVFVVGLPSNGMALWVFLFRTKKKHPAVIYMANLALADLLSVIWFPLKIAYHIHGNNWIYGEALCNVLIGFFYGNMYCSILFMTCLSVQRYWVIVNPMGHSRKKANIAIGISLAIWLLILLVTIPLYVVKQTIF--N--TTCHDVLPEQLLVGDMFNYFLSLAIGVFLFPAFLTASAYVLMIRML------SAMDENSEKKRKRAIKLIVTVLAMYLICFTPSNLLLVVHYFLIKSSHVYALYIVALCLSTLNSCIDPFVYYFVSHDFRDHAKNALLCRSVRTVKQMQVSLKSSS-------- >GPRA_RAT TPANQSAEASESNVSATVPRAAAVTPFQSLQLVHQLKGLIVMLYSIVVVVGLVGNCLLVLVIARVRRLHNVTNFLIGNLALSDVLMCAACVPLTLAYAFEPRWVFGGGLCHLVFFLQPVTVYVSVFTLTTIAVDRYVVLVHPL-RRRISLKLSAYAVLGIWALSAVLALPAAVHTYHVEHD--VRLCEEF--WGSQERQRQIYAWGLLLGTYLLPLLAILLSYVRVSVKLRNRPGS-SQADWDRARRRRTFCLLVVVVVVFALCWLPLHIFNLLRDLDPRAYAFGLVQLLCHWLAMSSACYNPFIYAWLHDSFREELRKMLLSWPRKIVPHGQNMT------------ >ET1R_PIG EFSLVVTTHRPTNLALPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRHDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPKT--HKTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRGSL-IALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDESFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPWKNHEQNNHNTE >NTR2_MOUSE -----METSSLWPPRPSPSAGLSLEARLGVDTRLWAKVLFTALYSLIFALGTAGNALSVHVVLKARTRPGRLRYHVLSLALSALLLLLISVPMELYNFVWSHWVFGDLGCRGYYFVRELCAYATVLSVASLSAERCLAVCQPLARRLLTPRRTCRLLSLVWVASLGLALPMAVIMGQKH--AASRVCTVL---VSRASSRSTFQVKRAGLLRSPLWELTAILNGITVNHLVALVQARHKDASQIRSLQHSAQVLRAIVAVYVICWLPYHARRLMYCYIPDDDFYHYFYMVTNTLFYVSSAVTPVLYNAVSSSFRKLFLESLSSLCGEQRSVVPLPQSTYSFRLWGSPR >CKR7_MOUSE 
QDEVTDDYIGENTTVDYTLYESVC---FKKDVRNFKAWFLPLMYSVICFVGLLGNGLVILTYIYFKRLKTMTDTYLLNLAVADILFLLILPFWAYSEAKS--WIFGVYLCKGIFGIYKLSFFSGMLLLLCISIDRYVAIVQAVRHRARVLLISKLSCVGIWMLALFLSIPELLYSGLQK-GE-TLRCSLV---SAQVEALITIQVAQMVFGFLVPMLAMSFCYLIIIRTL---------LQARNFERNKAIKVIIAVVVVFIVFQLPYNGVVLAQTVANFNKQLNIAYDVTYSLASVRCCVNPFLYAFIGVKFRSDLFKLFKDLGCLSQERLRHWS--------HVRN >NY5R_HUMAN -------------NTAATRNSDFPVWDDYKSSVDDLQYFLIGLYTFVSLLGFMGNLLILMALMKKRNQKTTVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKVMCHIMPFLQCVSVLVSTLILISIAIVRYHMIKHPI-SNNLTANHGYFLIATVWTLGFAICSPLPVFHSLVESS--RYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRSISCGVHEKRSVTRIKKRSRSVFYRLTILILVFAVSWMPLHLFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLVSLIHCLHM---------------------- >VG74_HSVSA VKLDFSSEDFSNYSYNYSGDIYYGDVAPCVVNFLISESALAFIYVLMFLCNAIGNSLVLRTFLKYRA-QAQSFDYLMMGFCLNSLFLAGYLLMRLLRMFE--IFMNTELCKLEAFFLNLSIYWSPFILVFISVLRCLLIFCATRLWVKKTLIGQVFLCCSFVLACFGALPHVMVTSYYE----PSSCIEE---VLTEQLRTKLNTFHTWYSFAGPLFITVICYSMSCYKL---------FKTKLSKRAEVVTIITMTTLLFIVFCIPYYIMESIDTLLRVGSAIVYGIQCTYMLLVLYYCMLPLMFAMFGSLFRQRMAAWCKTICHC--------------------- >CCKR_HUMAN DSLLVNGSNITPPCELGLENETLFCLDQPRPSKEWQPAVQILLYSLIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAVSDLMLCLFCMPFNLIPNLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICKPLSRVWQTKSHALKVIAATWCLSFTIMTPYPIYN----NNQTANMCRFL---LPNDVMQQSWHTFLLLILFLIPGIVMMVAYGLISLELYQGRANSNSSAANLMAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTASRLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGPPGARGEVTTGASLSRFSYS >OPSR_CARAU GD--ETTRESMFVYTNSNNTRDPFEGPNYHIAPRWVYNLATVWMFFVVVASTFTNGLVLVATAKFKKLRHPLNWILVNLAVADLAETLLASTISVTNQFFGYFILGHPMCIFEGFTVSVCGIAGLWSLTVISWERWVVVCKPFGNVKFDAKWASAGIIFSWVWSAIWCAPPIFGWSRFWPHGLKTSCGPDVFSGSEDPGVQSYMIVLMITCCIIPLAIIILCYIAVWLAIRTVAQQQKDSESTQKAEKEVSRMVVVMIFAYCFCWGPYTFCACFAAANPGYAFHPLAAAMPAYFAKSATIYNPIIYVFMNRQFRVCIMQLFGKKVDDG-----SEVSS------VAPA >OPS2_LIMPO 
PN-----ASVVDTMPKEMLYMIHEHWYAFPPMNPLWYSILGVAMIILGIICVLGNGMVIYLMMTTKSLRTPTNLLVVNLAFSDFCMMAFMMPTMASNCFAETWILGPFMCEVYGMAGSLFGCASIWSMVMITLDRYNVIVRGMAAAPLTHKKATLLLLFVWIWSGGWTILPFFGWSRYVPEGNLTSCTVD--YLTKDWSSASYVIIYGLAVYFLPLITMIYCYFFIVHAVAEHNVAANADQQKQSAECRLAKVAMMTVGLWFMAWTPYLIIAWAGVFSSGTRLTPLATIWGSVFAKANSCYNPIVYGISHPRYKAALYQRFPSLACGSGESGSDVKTMEEKPKSPEA- >O5I1_HUMAN -----------MEFTDRNYTLVTEFILLGFPTRPELQIVLFLMFLTLYAIILIGNIGLMLLIRIDPHLQTPMYFFLSNLSFVDLCYFSDIVPKMLVNFLSENKSISYYGCALQFYFFCTFADTESFILAAMAYDRYVAICNPLYTVVMSRGICMRLIVLSYLGGNMSSLVHTSFAFIL---KNHFFCDLPKLSCTDTTINEWLLSTYGSSVEIICFIIIIISYFFILLSV--------LKIRSFSGRKKTFSTCASHLTSVTIYQGTLLFIYSRPSYLY---SPNTDKIISVFYTIFIPVLNPLIYSLRNKDVKDAAEKVLRSKVDSS-------------------- >OAR_HELVI --TEEVIEDDRDACAVADDPKYPSSFGITLAVPEWEAICTAIVLTLIIISTIVGNILVILSVFTYKPLRIVQNFFIVSLAVADLTVAILVLPLNVAYSILGQWVFGIYVCKMWLTCDIMCCTSSILNLCAIALDRYWAITDPIYAQKRTLERVLLMIGVVWVLSLIISSPPLLGWNDW-E-P--TPCRLT--------SQPGFVIFSSSGSFYIPLVIMTVVYFEIYLATKKRAVYEEKQRISLTRERRAARTLGIIMGVFVVCWLPFFVIYLVIPFCASCCLSNKFINFITWLGYCNSALNPLIYTIFNMDFRRAFKKLLCMKP----------------------- >O1D4_HUMAN -------------MDGDNQSENSQFLLLGISESPEQQQILFWMFLSMYLVTVLGNVLIILAISSDSHLHTPMYFFLANLSFTDLFFVTNTIPKMLVNFQSQNKAISYAGCLTQLYFLVSLVTLDNLILAVMAYDRYVAICCPLYVTAMSPGLCVLLLSLCWGLSVLYGLLLTFLLTRV---THYLFCDMYWLACSNTHIIHTALIATGWFIFLTLLGFMTTSYVRIVRTI--------LQMPSASKKYKTFSTCASHLGVVSLFYGTLAMVYLQPLHTY----SMKDSVATVMYAVLTPMMNPFIYSLRNKDMHGAPGRVLWRPFQRP-------------------- >OPSD_APIME AR-F-NNQTVVDKVPPDMLHLIDANWYQYPPLNPMWHGILGFVIGMLGFVSAMGNGMVVYIFLSTKSLRTPSNLFVINLAISNFLMMFCMSPPMVINCYYETWVLGPLFCQIYAMLGSLFGCGSIWTMTMIAFDRYNVIVKGLSGKPLSINGALIRIIAIWLFSLGWTIAPMFGWNRYVPEGNMTACGTD--YFNRGLLSASYLVCYGIWVYFVPLFLIIYSYWFIIQAVAAHMNVRSSENQNTSAECKLAKVALMTISLWFMAWTPYLVINFSGIFNLVK-ISPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFAKFPSLACAAEPSSDAVSVTDNEKSNA--- >TRFR_CHICK 
----------MENGTGDEQNHTGLLLSSQEFVTAEYQVVTILLVLLICGLGIVGNIMVVLVVLRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITESLYKSWVYGYVGCLCITYLQYLGINASSFSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWSFASVYCMLWFFLLDLN--DT-VVSCGYK------RSYYSPIYMMDFGIFYVLPMVLATVLYGLIARILFLNVNSNKSFNSTIASRRQVTKMLAVVVVLFAFLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCHLKRDKKPANYSVKESDHFSSEIED >OPSR_HUMAN DSYEDSTQSSIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSVWMIFVVTASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETVIASTISIVNQVSGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIVGIAFSWIWSAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMVTCCIIPLAIIMLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMIFAYCVCWGPYTFFACFAAANPGYAFHPLMAALPAYFAKSATIYNPVIYVFMNRQFRNCILQLFGKKVDDG-----SELVSSV--SSVSPA >CB2R_MOUSE ----MEGCRETEVTNGSNGGLEFNPMKEYMILSSGQQIAVAVLCTLMGLLSALENMAVLYIILSSRRRRKPSYLFISSLAGADFLASVIFACNFVIFHVFHG-VDSNAIFLLKIGSVTMTFTASVGSLLLTAVDRYLCLCYPPYKALVTRGRALVALCVMWVLSALISYLPLMGWTC-----CPSPCSEL------FPLIPNDYLLGWLLFIAILFSGIIYTYGYVLWKAHRHAEHQVPGIARMRLDVRLAKTLGLVLAVLLICWFPALALMGHSLVTTLSDQVKEAFAFCSMLCLVNSMVNPIIYALRSGEIRSAAQHCLIGWKKYLQGLGPEGKVTETEADVKTT- >PE21_HUMAN --MSPCGPLNLSLAGEATTCAA---PWVPNTSAVPPSGASPALPIFSMTLGAVSNLLALALLAQAAGSATTFLLFVASLLATDLAGHVIPGALVLRLYTAG-RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTRPLHAARVSVARARLALAAVAAVALAVALLPLARVGRYELQYPGTWCFIG--LGPPGGWRQALLAGLFASLGLVALLAALVCNTLSGLALHRSRRRAHGPRRARAHDVEMVGQLVGIMVVSCICWSPMLVLVALAVGGWSSTSLQRPLFLAVRLASWNQILDPWVYILLRQAVLRQLLRLLPPRAGAKGGPAGLGLSSLRSSRHSGLS >OPR._CAVPO GSHLQGNLSLLSPNHSGLPPHLLLNASHSAFLPLGLKVTIVGLYLAVCIGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQATDILLGF-WPFGNTLCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALALVVGVPVAIMGSAQVEE---IECLVE-IPDPQDYWGPVFAVSIFLFSFIIPVLIISVCYSLMIRRLHGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLVQGLGVQPETTVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCASALHREMQVSDRVALGCKTTETVPR >OPSV_CHICK 
-----MSSDDDFYLFTNGSVPGPWDGPQYHIAPPWAFYLQTAFMGIVFAVGTPLNAVVLWVTVRYKRLRQPLNYILVNISASGFVSCVLSVFVVFVASARGYFVFGKRVCELEAFVGTHGGLVTGWSLAFLAFERYIVICKPFGNFRFSSRHALLVVVATWLIGVGVGLPPFFGWSRYMPEGLQCSCGPDWYTVGTKYRSEYYTWFLFIFCFIVPLSLIIFSYSQLLSALRAVAAQQQESATTQKAEREVSRMVVVMVGSFCLCYVPYAALAMYMVNNRDHGLDLRLVTIPAFFSKSACVYNPIIYCFMNKQFRACIMETVCGKPLTD--DSDASTSSVSSSQVGPT- >YKR5_CAEEL MNSENGLDSVTQIMYDMKKYNIVNDVLPPPNHEDLHVVIMAVSYLLLFLLGTCGNVAVLTTIYHVIRTLDNTLIYVIVLSCVDFGVCLSLPITVIDQILGF-WMFGKIPCKLHAVFENFGKILSALILTAMSFDRYAGVC---------HPQRKRLRSRNFAITILLAPGMLTRM-------KIEKCTVD----IDSQMFTAFTIYQFILCYCTPLVLIAFFYTKLLSKLRE----TRTFKSSQIPFLHISLYTLAVACFYFLCWTPFWMATLFAVYLENSPVFVYIMYFIHALPFTNSAINWILYGRVFLETVS--------------------------------- >5H1B_MOUSE CAPPPPAASQTGVPLTNLSHNSADGYIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPKRAAIMIVLVWVFSISISLPPFFWR----E-E--LDCFVN-------TDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHMAIFDFFNWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCAG--------------------- >P2YR_HUMAN AAFLAGPGSSWGNSTVASTAAVSSSFKCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLSLGRLKKKNAICISVLVWLIVVVAISPILFYSGTG----KTITCYDT-TSDEYLRSYFIYSMCTTVAMFCVPLVLILGCYGLIVRALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKASRRSEANLQSKSLPEFKQNGDTSL >5H4_CAVPO ------------------MDKLDANVSSKEGFGSVEKVVLLTFLSAVILMAILGNLLVMVAVCRDRQRKIKTNYFIVSLAFADLLVSVLVMPFGAIELVQDIWVYGEMFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPYRNKMTPLRIALMLGGCWVIPMFISFLPIMQGWNNIRK-NSTYCVFM--------VNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHGRPDQHSTHRMRTETKAAKTLCIIMGCFCLCWAPFFVTNIVDPFIDYT-VPGQLWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYRRPSILGQTINGSTHVLR-- >MC4R_PIG 
GMHTSLHFWNRSTYGLHSNASEPLGKGYSEGGCYEQLFVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETIVITLLNSQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALYHNIMTVKRVGIIISCIWAVCTVSGVLFIIYYS----------------------DDSSAVIICLITVFFTMLALMASLYVHMFLMARLH-RIPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY------------ >SSR1_MOUSE GEGACSRGPGSGAADGMEEPGRNASQNGTLSEGQGSAILISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLLRH-WPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIAARYRRPTVAKVVNLGVWVLSLLVILPIVVFSRTAADG--TVACNML-MPEPAQRWLVGFVLYTFLMGFLLPVGAICLCYVLIIAKMRMVALK-AGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQ--DDATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLS-----WMDNAAETALKSRAYSVED >CML1_HUMAN EDEDYNTSISYGDEYPDYLDSIVVLEDLSPLEARVTRIFLVVVYSIVCFLGILGNGLVIIIATFKMK-KTVNMVWFLNLAVADFLFNVFLPIHITYAAMDYHWVFGTAMCKISNFLLIHNMFTSVFLLTIISSDRCISVLLPVSQNHRSVRLAYMACMVIWVLAFFLSSPSLVFRDTAN-SS--WPTHSQ-MDPVGYSRHMVVTVTRFLCGFLVPVLIITACYLTIVCKL---------HRNRLAKTKKPFKIIVTIIITFFLCWCPYHTLNLLELHHTAMSVFSLGLPLATALAIANSCMNPILYVFMGQDFKK-FKVALFSRLVNALSEDTGHSFTKMSSMNERTS >ACM1_RAT ------------MNTSVPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL-AGQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTVNPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPSRQC------- >MRG_HUMAN QNPNLVSQLCGVFLQNETNETIHMQMSMAVGQQALPLNIIAPKAVLVSLCGVLLNGTVFWLLCCGAT--NPYMVYILHLVAADVIYLCCSAVGFLQVTLLTYHGVVFFIPDFLAILSPFSFEVCLCLLVAISTERCVCVLFPIYRCHRPKYTSNVVCTLIWGLPFCINIVKSLFLTYWK-------------------KACVIFLKLSGLFHAILSLVMCVSSLTLLIRFL--------CCSQQQKATRVYAVVQISAPMFLLWALPLSVAPLITDF----KMFVTTSYLISLFLIINSSANPIIYFFVGSLRKKRLKESLRVILQRALADKPEVGIDPMEQPHSTQH >A2AC_RAT 
AEGPNGSDAGEWGSGGGANASGTDWGPPPGQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAVISFPPLVSFYR-------PQCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKLRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFSYSLYGICREAQLPEPLFKFFFWIGYCNSSLNPVIYTVFNQDFRRSFKHILFRRRRRGFRQ----------------- >OPSD_SEPOF ---MGRDIPDNETWWYNPTMEVHPHWKQFNQVPDAVYYSLGIFIGICGIIGCTGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFIKWVFGMAACKVYGFIGGIFGLMSIMTMSMISIDRYNVIGRPMASKKMSHRRAFLMIIFVWMWSTLWSIGPIFGWGAYVLEGVLCNCSFD--YITRDSATRSNIVCMYIFAFCFPILIIFFCYFNIVMAVSNHRLNLRKAQAGASAEMKLAKISIVIVTQFLLSWSPYAVVALLAQFGPIEWVTPYAAQLPVMFAKASAIHNPLIYSVSHPKFREAIAENFPWIITCCQFDEKEVEEIPATEQS-GGE >SSR3_HUMAN SVSTTSEPENASSAWPPDATLGNVSAGPSPAGLAVSGVLIPLVYLVVCVVGLLGNSLVIYVVLRHTASPSVTNVYILNLALADELFMLGLPFLAAQNALSY-WPFGSLMCRLVMAVDGINQFTSIFCLTVMSVDRYLAVVHPTSARWRTAPVARTVSAAVWVASAVVVLPVVVFSGVPR-----STCHMQ-WPEPAAAWRAGFIIYTAALGFFGPLLVICLCYLLIVVKVRSARVWAPSCQRRRRSERRVTRMVVAVVALFVLCWMPFYVLNIVNVVCPLPPAFFGLYFLVVALPYANSCANPILYGFLSYRFKQGFRRVLLRPSRRVRSQEPTVGEDEEEEDG---E >O.1R_HUMAN MGVPPGSREPSPVPPDYED-EFLRYLWRDYLYPKQYEWVLIAAYVAVFVVALVGNTLVCLAVWRNHHMRTVTNYFIVNLSLADVLVTAICLPASLLVDITESWLFGHALCKVIPYLQAVSVSVAVLTLSFIALDRWYAICHPL-LFKSTARRARGSILGIWAVSLAIMVPQAAVMECSSRTRLFSVCDER---WADDLYPKIYHSCFFIVTYLAPLGLMAMAYFQIFRKLWGRQPRFLAEVKQMRARRKTAKMLMVVLLVFALCYLPISVLNVLKRVFGMFEAVYACFTFSHWLVYANSAANPIIYNFLSGKFREQFKAAFSCCLPGLGPCGSLKASHKS---LSLQS >O2C1_HUMAN -------------MDGVNDSSLQGFVLMSISDHPQLEMIFFIAILFSYLLTLLGNSTIILLSRLEARLHTPMYFFLSNLSSLDLAFATSSVPQMLINLWGPGKTISYGGCITQLYVFLWLGATECILLVVMAFDRYVAVCRPLYTAIMNPQLCWLLAVIAWLGGLGNSVIQSTFTLQL---PEGFLCEVPKLACGDTSLNQAVLNGVCTFFTAVPLSIIVISYCLIAQAV--------LKIHSAEGRRKAFNTCLSHLLVVFLFYGSASYGYLLPAKNS---KQDQGKFISLFYSLVTPMVNPLIYTLRNMEVKGALRRLLGKGREVG-------------------- >OPSD_MOUSE 
MNGTEGPNFYVPFSNVTGVGRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVVFTWIMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIFFLICWLPYASVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMLNKQFRNCMLTTLCCGKNPLGDDDASATASKTETSQVAPA >CB1R_TARGR EFFNRSVSTFKENDDNLKCGENFMDMECFMILTASQQLIIAVLSLTLGTFTVLENFLVLCVILQSRTRCRPSYHFIGSLAVADLLGSVIFVYSFLDFHVFHR-KDSSNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRTKAVIAFCVMWTIAIIIAVLPLLGWNCK-K--LKSVCSDI------FPLIDENYLMFWIGVTSILLLFIVYAYVYILWKAHSHSEDQITRPEQTRMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNNPIKTVFAFCSMLCLMDSTVNPIIYALRSQDLRHAFLEQCPPCEGTSQPLDNSMEGNN-AGNVHRAA >B2AR_HUMAN MGQPGNGSAFLLAPNRSHAPDH----DVTQQRDEVWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQGRTLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IRKEVYILLNWIGYVNSGFNPLIYCRSP-DFRIAFQELLCLRRSSLKAYGNGYSGNTGEQSGYHVE >OPSD_ZOSOP MNGTEGPFFYIPMVNTTGIVRSPYEYPQYYLVNPAAYACLGAYMFFLILVGFPVNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGVAFTWFMASACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFIVHFCIPLAVVGFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVIGFLVCWLPYASVAWYIFTHQGSEFGPPFMTVPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTVSSSSVSPAA-- >GP37_HUMAN LAQNGSLGEGIHEPGGPRRGNRLKNPFYPLTQESYGAYAVMCLSVVIFGTGIIGNLAVMCIVCHNYYMRSISNSLLANLAFWDFLIIFFCLPLVIFHELTKKWLLEDFSCKIVPYIEVASLGVTTFTLCALCIDRFRAATNVQYEMIENCSSTTAKLAVIWVGALLLALPEVVLRQLSKI-K--KISPDLTIYVLALTYDSARLWWYFGCYFCLPTLFTITCSLVTARKIRKAEKATRGNKRQIQLESQMNCTVVALTILYGFCIIPENICNIVTAYMATGQTMDLLNIISQFLLFFKSCVTPVLLFCLCKPFSRAFMECCCCCCEECIQKSSTVTYTTELELSPFST >OPSR_CHICK 
HEEEDTTRDSVFTYTNSNNTRGPFEGPNYHIAPRWVYNLTSVWMIFVVAASVFTNGLVLVATWKFKKLRHPLNWILVNLAVADLGETVIASTISVINQISGYFILGHPMCVVEGYTVSACGITALWSLAIISWERWFVVCKPFGNIKFDGKLAVAGILFSWLWSCAWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSDPGVQSYMVVLMVTCCFFPLAIIILCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIVAYCFCWGPYTFFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVDDG-----SEVVSSVSNSSVSPA >IL8B_BOVIN EGFEDEFGNYSGTPPTEDYDYSPC----EISTETLNKYAVVVIDALVFLLSLLGNSLVMLVILYSRIGRSVTDVYLLNLAMADLLFAMTLPIWTASKAKG--WVFGTPLCKVVSLLKEVNFYSGILLLACISMDRYLAIVHATRTLTQKWHWVKFICLGIWALSVILALPIFIFREAYQ-YS-DLVCYED-LGANTTKWRMIMRVLPQTFGFLLPLLVMLFCYGFTLRTL---------FSAQMGHKHRAMRVIFAVVLVFLLCWLPYNLVLIADTLMRAHNDIGRALDATEILGFLHSCLNPLIYVFIGQKFRHGLLKIMAIHGLISKEFLAKDG------------ >OPSD_MACFA MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNAEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLFGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEARAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSASIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA >5H2B_RAT ILQKTCDHLILTDRSGLKAESAAEEMKQTAENQGNTVHWAALLIFAVIIPTIGGNILVILAVSLEKRLQYATNYFLMSLAVADLLVGLFVMPIALLTIMFEAWPLPLALCPAWLFLDVLFSTASIMHLCAISLDRYIAIKKPIANQCNSRTTAFVKITVVWLISIGIAIPVPIKGIEA-NA---ITCELT------KDRFGSFMLFGSLAAFFAPLTIMIVTYFLTIHALRKKRRMGKKPAQTISNEQRASKVLGIVFLFFLLMWCPFFITNVTLALCDSCTTLKTLLQIFVWVGYVSSGVNPLIYTLFNKTFREAFGRYITCNYQATKSVKVLRKGNSMVENSKFFT >OPSR_ANOCA NDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITSVWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISGYFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSWVWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDG-----SELVSSVSNSSVSPA >5H1A_MOUSE 
-MDMFSLGQGNNTTTSLEPFGTGGNDTGLSNVTFSYQVITSLLLGTLIFCAVLGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQVTCDLFIALDVLCCTSSILHLCAIALDRYWAITDPIYVNKRTPRRAAALISLTWLIGFLISIPPMLGWRA---NP--NECTIS--------KDHGYTIYSTFGAFYIPLLLMLVLYGRIFRAARFRKNEEAKRKMALARERKTVKTLGIIMGTFILCWLPFFIVALVLPFCESSHMPELLGAIINWLGYSNSLLNPVIYAYFNKDFQNAFKKIIKCKFCR--------------------- >CCR4_HUMAN IYTSDNYTEEMG-SGDYDSMKEPC---FREENANFNKIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFANVSE-DD-RYICDRF---YPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH >CKR3_MACMU MTTSLDTVETFGPTSYDDDMGLLC---EKADVGALIAQFVPPLYSLVFMVGLLGNVVVVMILIKYRRLRIMTNIYLLNLAISDLLFLFTLPFWIHYVRERN-WVFSHGMCKVLSGFYHTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIVTWGLAVLAALPEFIFYGTEK-LF-KTLCSAIYPQDTVYSWRHFHTLKMTILCLALPLLVMAICYTGIIKTL---------LRCPSKKKYKAIRLIFVIMAVFFIFWTPYNVAILISTYQSVLKHLDLFVLATEVIAYSHCCVNPVIYAFVGERFRKYLRHFFHRHVLMHLGKYIPFLT-------SSVS >OPSD_GLOME MNGTEGLNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSVLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYIPLNLAVANLFMVFGGFTTTLYTSLHAYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWVMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTSRQEVNNESFVIYMFVVHFTIPLVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWVPYASVAFYIFTHQGSDFGPIFMTIPSFFAKSSSIYNPVIYIMMNKQLRNCMLTTLCCGRNPLGDDEASTTASKTETSQVAPA >FMLR_MOUSE ---MDTNMSLLMNKSAVNLMNVSGSTQSVSAGYIVLDVFSYLIFAVTFVLGVLGNGLVIWVAGFRMK-HTVTTISYLNLAIADFCFTSTLPFYIASMVMGGHWPFGWFMCKFIYTVIDINLFGSVFLIALIALDRCICVLHPVAQNHRTVSLAKKVIIVPWICAFLLTLPVIIRLTTVPSPWPVEKRKVA------VTMLTVRGIIRFIIGFSTPMSIVAICYGLITTKI---------HRQGLIKSSRPLRVLSFVVAAFFLCWCPFQVVALISTIQVREPGIVTALKITSPLAFFNSCLNPMLYVFMGQDFRERLIHSLPASLERALTEDSAQTGT---------- >OPSD_LAMJA 
MNGTEGDNFYVPFSNKTGLARSPYEYPQYYLAEPWKYSALAAYMFFLILVGFPVNFLTLFVTVQHKKLRTPLNYILLNLAMANLFMVLFGFTVTMYTSMNGYFVFGPTMCSIEGFFATLGGEVALWSLVVLAIERYIVICKPMGNFRFGNTHAIMGVAFTWIMALACAAPPLVGWSRYIPEGMQCSCGPDYYTLNPNFNNESYVVYMFVVHFLVPFVIIFFCYGRLLCTVKEAAAAQQESASTQKAEKEVTRMVVLMVIGFLVCWVPYASVAFYIFTHQGSDFGATFMTLPAFFAKSSALYNPVIYILMNKQFRNCMITTLCCGKNPLGDDESGASVSSVSTSPVSPA >C5AR_CAVPO ---MMVTVSYDYDYNSTFLPDGFVD--NYVERLSFGDLVAVVIMVVVFLVGVPGNALVVWVTACEAR-RHINAIWFLNLAAADLLSCLALPILLVSTVHLNHWYFGDTACKVLPSLILLNMYTSILLLATISADRLLLVLSPICQRFRGGCLAWTACGLAWVLALLLSSPSFLYRRTHN-SF--VYCVTD-YG-RDISKERAVALVRLLVGFIVPLITLTACYTFLLLRT---------WSRKATRSAKTVKVVVAVVSSFFVFWLPYQVTGILLAWHSPNRNTKALDAVCVAFAYINCCINPIIYVVAGHGFQGRLLKSLPSVLRNVLTEESLDKSTVD-------- >P2Y8_.ENLA ATSYPTFLTTPYLPMKLLMNLTNDTEDICVFDEGFKFLLLPVSYSAVFMVGLPLNIAAMWIFIAKMRPWNPTTVYMFNLALSDTLYVLSLPTLVYYYADKNNWPFGEVLCKLVRFLFYANLYSSILFLTCISVHRYRGVCHPISLRRMNAKHAYVICALVWLSVTLCLVPNLIFVTVS-----NTICHDT-TRPEDFARYVEYSTAIMCLLFGIPCLIIAGCYGLMTRELMKP---SGNQQTLPSYKKRSIKTIIFVMIAFAICFMPFHITRTLYYYARLLNVINVTYKVTRPLASANSCIDPILYFLANDRYRRRLIRTVRRRSSVPNRRCMHTNMTAGPLPVISAE >5H1D_FUGRU ELDNNSLDYFSSNFTDIPSN--TTVAHWTEATLLGLQISVSVVLAIVTLATMLSNAFVIATIFLTRKLHTPANFLIGSLAVTDMLVSILVMPISIVYTVSKTWSLGQIVCDIWLSSDITFCTASILHLCVIALDRYWAITDALYSKRRTMRRAAVMVAVVWVISISISMPPLFWR----H-E--KECMVN-------TDQISYTLYSTFGAFYVPTVLLIILYGRIYVAARSRKLALERKRLCAARERKATKTLGIILGAFIICWLPFFVVTLVWAICKEC-FDPLLFDVFTWLGYLNSLINPVIYTVFNDEFKQAFQKLIKFRR----------------------- >5HT_LYMST TGQFINGSHSSRSRDNASANDTDDRYWSLTVYSHEHLVLTSVILGLFVLCCIIGNCFVIAAVMLERSLHNVANYLILSLAVADLMVAVLVMPLSVVSEISKVWFLHSEVCDMWISVDVLCCTASILHLVAIAMDRYWAVTSI-YIRRRSARRILLMIMVVWIVALFISIPPLFGWRD-PD-K--GTCIIS--------QDKGYTIFSTVGAFYLPMLVMMIIYIRIWLVARSRNDTRTREKLELKRERKAARTLAIITGAFLICWLPFFIIALIGPFVDPEGIPPFARSFVLWLGYFNSLLNPIIYTIFSPEFRSAFQKILFGKYRRGHR------------------ >5H6_HUMAN 
-----------MVPEPGPTANSTPAWGAGPPSAPGGSGWVAAALCVVIALTAAANSLLIALICTQPARNTS-NFFLVSLFTSDLMVGLVVMPPAMLNALYGRWVLARGLCLLWTAFDVMCCSASILNLCLISLDRYLLILSPLYKLRMTPLRALALVLGAWSLAALASFLPLLLGWH---E-VPGQCRLL--------ASLPFVLVASGLTFFLPSGAICFTYCRILLAARKQVESRRLATKHSRKALKASLTLGILLGMFFVTWLPFFVANIVQAVCDC--ISPGLFDVLTWLGYCNSTMNPIIYPLFMRDFKRALGRFLPCPRCPRERQASLASSGPRPG-LSLQQ >AG2R_MERUN ------MALNSSADDGIKRIQDDC---PKAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPVWAVYTAMEYRWPFGNHLCKIASAGISFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCVVIWLLAGLASLPAVIHRNVYF-TN--TVCAFH-YESQNSTLPVGLGLTKNILGFMFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFRIIMAIVLFFFFSWIPHQIFTFLDVLIQLGDVVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSD-------N >GRHR_HORSE --MANSDSLEQDPNHCSAINNSIPLIQGKLPTLTVSGKIRVTVTFFLFLLSTAFNASFLLKLQKWTQKLSRMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGEFLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITQPL-AVQSNSKLEQSMISLAWILSIVFAGPQLYIFRMIYTV--FSQCVTH--SFPQWWHQAFYNFFTFGCLFIIPLLIMLICNAKIIFALTRVPRKNQSKNNIPRARLRTLKMTVAFATSFVVCWTPYYVLGIWYWFDPEMRVSDPVNHFFFLFAFLNPCFDPLIYGYFSL------------------------------------- >REIS_TODPA ---------------------MFGNPAMTGLHQFTMWEHYFTGSIYLVLGCVVFSLCGMCIIFLARQKPRRKYAILIHVLITAMAVNGGDPAHASSSIVGR-WLYGSVGCQLMGFWGFFGGMSHIWMLFAFAMERYMAVCHREFYQQMPSVYYSIIVGLMYTFGTFWATMPLLGWASYGLEVHGTSCTIN--YSVSDESYQSYVFFLAIFSFIFPMVSGWYAISKAWSGLSAIPD-EKEKDKDILSEEQLTALAGAFILISLISWSGFGYVAIYSALTHGGQLSHLRGHVPPIMSKTGCALFPLLIFLLTARSLPKSDTKKP-------------------------- >GRPR_RAT NCSHLNLEVDPFLSCNNTFNQTLSPPKMDNWFHPGIIYVIPAVYGLIIVIGLIGNITLIKIFCTVKSMRNVPNLFISSLALGDLLLLVTCAPVDASKYLADRWLFGRIGCKLIPFIQLTSVGVSVFTLTALSADRYKAIVRPMIQASHALMKICLKAALIWIVSMLLAIPEAVFSDLHPQT--FISCAPY---HSNELHPKIHSMASFLVFYIIPLSIISVYYYFIARNLIQSLPVNIHVKKQIESRKRLAKTVLVFVGLFAFCWLPNHVIYLYRSYHYSEMLHFITSICARLLAFTNSCVNPFALYLLSKSFRKQFNTQLLCCQPSLLNR--SHSMTSFKSTNP-SA >CKR5_MACMU 
----MDYQVSSPTYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLLFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKMVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >B4AR_MELGA --------------MTPLPAGNGSVPNCSWAAVLSRQWAVGAALSITILVIVAGNLLVIVAIAKTPRLQTMTNVFVTSLACADLVMGLLVVPPGATILLSGHWPYGTVVCELWTSLDVLCVTASIETLCAIAVDRYLAITAPLYEALVTKGRAWAVVCMVWAISAFISFLPIMNHWWRDV-R--RCCDFV--------TNMTYAIVSSTVSFYVPLLVMIFVYVRVFAVATRHSRGRRPSRLLAIKEHKALKTLGIIMGTFTLCWLPFFVANIIKVFCRPL-VPDQLFLFLNWLGYVNSAFNPIIYCRSP-DFRSAFRKLLCCPRRADRRLHAAPQAFSPRGDPMEDS >OPSV_.ENLA -----MLEEEDFYLFKNVSNVSPFDGPQYHIAPKWAFTLQAIFMGMVFLIGTPLNFIVLLVTIKYKKLRQPLNYILVNITVGGFLMCIFSIFPVFVSSSQGYFFFGRIACSIDAFVGTLTGLVTGWSLAFLAFERYIVICKPMGNFNFSSSHALAVVICTWIIGIVVSVPPFLGWSRYMPEGLQCSCGPDWYTVGTKYRSEYYTWFIFIFCFVIPLSLICFSYGRLLGALRAVAAQQQESASTQKAEREVSRMVIFMVGSFCLCYVPYAAMAMYMVTNRNHGLDLRLVTIPAFFSKSSCVYNPIIYSFMNKQFRGCIMETVCGRPMSD--DSSVSSSTVSSSQVSPA- >OPSG_ORYLA ENGTEGKNFYIPMNNRTGLVRSPYEYPQYYLADPWQFKLLGIYMFFLILTGFPINALTLVVTAQNKKLRQPLNFILVNLAVAGLIMVCFGFTVCIYSCMVGYFSLGPLGCTIEGFMATLGGQVSLWSLVVLAIERYIVVCKPMGSFKFTATHSAAGCAFTWIMASSCAVPPLVGWSRYIPEGIQVSCGPDYYTLAPGFNNESFVMYMFSCHFCVPVFTIFFTYGSLVMTVKAAAAQQQDSASTQKAEKEVTRMCFLMVLGFLLAWVPYASYAAWIFFNRGAAFSAMSMAIPSFFSKSSALFNPIIYILLNKQFRNCMLATIGMGG-----------VSTSKTEVSTAA >NY6R_RABIT --MEVSLNDPASNKTSAKSNSSAFFYFESCQSPSLALLLLLIAYTVVLIMGICGNLSLITIIFKKQRAQNVTNILIANLSLSDILVCVMCIPFTAIYTLMDRWIFGNTMCKLTSYVQSVSISVSIFSLVLIAIERYQLIVNPR-GWKPSASHAYWGIMLIWLFSLLLSIPLLLSYHLTDSH--HVVCVEH---WPSKTNQLLYSTSLIMLQYFVPLGFMFICYLKIVICLHKRNSKRRENESRLTENKRINTMLISIVVTFAACWLPLNTFNVIFDWYHEVCHHDLVFAICHLVAMVSTCINPLFYGFLNRNFQKDLVVLIHHCLCFALRERY---TLHTDESKGSLR >CCR4_BOVIN 
IFTSDNYTEDDLGSGDYDSMKEPC---FREENAHFNRIFLPTVYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVLTLPFWAVDAVAN--WYFGKFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQKPRKLLAEKVVYVGVWLPAVLLTIPDLIFADIKE-DE-RYICDRF---YPSDLWLVVFQFQHIVVGLLLPGIVILSCYCIIISKL---------SHSKGYQKRKALKTTVILILTFFACWLPYYIGISIDSFILLESTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH >OPS2_SCHGR YESSVGLPLLGWNVPTEHLDLVHPHWRSFQVPNKYWHFGLAFVYFMLMCMSSLGNGIVLWIYATTKSIRTPSNMFIVNLALFDVLMLLEMPMLVVSSLFYQR-PVWELGCDIYAALGSVAGIGSAINNAAIAFDRYRTISCPI-DGRLTQGQVLALIAGTWVWTLPFTLMPLLRIWSRFAEGFLTTCSFD--YLTDDEDTKVFVGCIFAWSYAFPLCLICCFYYRLIGAVREHNVKSNADTEAQSAEIRIAKVALTIFFLFLCSWTPYAVVAMIGAFGNRAALTPLSTMIPAVTAKIVSCIDPWVYAINHPRFRAEVQKRMKWLHLGEDARSSKSDRTVGNVSASA-- >DADR_HUMAN --------------MRTLNTSAMDGTGLVVERDFSVRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLSWHKAGN-TIDNCDSS--------LSRTYAISSSVISFYIPVAIMIVTYTRIYRIAQKQVECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCILPFCGSGCIDSNTFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSTLLGCYRLCPATNNAIETAAMFSSH----- >THRR_HUMAN LTEYRLVSINKSSPLQKQLPAFISEDASGYLTSSWLTLFVPSVYTGVFVVSLPLNIMAIVVFILKMKVKKPAVVYMLHLATADVLFVSVLPFKISYYFSGSDWQFGSELCRFVTAAFYCNMYASILLMTVISIDRFLAVVYPMSLSWRTLGRASFTCLAIWALAIAGVVPLVLKEQTIQ--N--TTCHDVLNETLLEGYYAYYFSAFSAVFFFVPLIISTVCYVSIIRCL------SSSAVANRSKKSRALFLSAAVFCIFIICFGPTNVLLIAHYSFLSHEAAYFAYLLCVCVSSISSCIDPLIYYYASSECQRYVYSILCCKESSDPSSYNSSGDTCS-------- >V2R_HUMAN MLMASTTSAVPGHPSLPSLPSNSSQERPLDTRDPLLARAELALLSIVFVAVALSNGLVLAALARRGRHWAPIHVFIGHLCLADLAVALFQVLPQLAWKATDRFRGPDALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMAYRHGSGAHWNRPVLVAWAFSLLLSLPQLFIFAQRNSG--VTDCWAC---FAEPWGRRTYVTWIALMVFVAPTLGIAACQVLIFREIHASGRRPGEGAHVSAAVAKTVRMTLVIVVVYVLCWAPFFLVQLWAAWDPEA-LEGAPFVLLMLLASLNSCTNPWIYASFSSSVSSELRSLLCCARGRTPPSLGPQDSSLAKDTSS--- >GALR_HUMAN 
----MELAVGNLSEGNASCPEPPAPEPGPLFGIGVENFVTLVVFGLIFALGVLGNSLVITVLARSKPPRSTTNLFILNLSIADLAYLLFCIPFQATVYALPTWVLGAFICKFIHYFFTVSMLVSIFTLAAMSVDRYVAIVHSRSSSLRVSRNALLGVGCIWALSIAMASPVAYHQGLF-SN--QTFCWEQ---WPDPRHKKAYVVCTFVFGYLLPLLLICFCYAKVLNHLHKKLK--NMSKKSEASKKKTAQTVLVVVVVFGISWLPHHIIHLWAEFGVFPPASFLFRITAHCLAYSNSSVNPIIYAFLSENFRKAYKQVFKCHIRKDSHLSDTKEPPSTNCTHV--- >5H7_RAT SSWMPHLLSGFLEVTASPAPTNVSGCGEQINYGRVEKVVIGSILTLITLLTIAGNCLVVISVCFVKKLRQPSNYLIVSLALADLSVAVAVMPFVSVTDLIGGWIFGHFFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLYPVRQNGKCMAKMILSVWLLSASITLPPLFGWAQ--N-D--KVCLIS--------QDFGYTIYSTAVAFYIPMSVMLFMYYQIYKAARKSSRLERKNISIFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTCIPLWVERTCLWLGYANSLINPFIYAFFNRDLRTTYRSLLQCQYRNINRKLSAAGAERPERSEFVLQ >UL33_RCMVM ----MDVLLGTEELEDELHQLHFNYTCVPSLGLSVARDAETAVNFLIVLVGGPMNFLVLATQMLSNRSVSTPTLYMTNLYLANLLTVATLPFLMLSNRGL--VGSSPEGCKIAALAYYATCTAGFATLMLIAINRYR-VIHQRRSGAGSKRQTYAVLAVTWLASLMCASPAPLYATVMAA-DAFETCIIYSYDQVK-TVLATFKILITMIWGITPVVMMSWFYVFFYRRL---------KLTSYRRRSQTLTFVTTLMLSFLVVQTPFVAIMSYDSYGVLNNKRDAVSMLARVVPNFHCLLNPVLYAFLGRDFNKRFILCISGKLFSRRRALRERAGPVCALP---SK >O.YR_MOUSE GTPAANWSIELDLGSGVPPGAEGNLTAGPPRRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLATWLGCLVASVPQVHIFSLRE----VFDCWAV---FIQPWGPKAYVTWITLAVYIVPVIVLAACYGLISFKIWQNRAAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDVNA-KEASAFIIAMLLASLNSCCNPWIYMLFTGHLFHELVQRFLCCSARYLKGSRPGENSSTFVLSRCSS >PF2R_RAT --MSINS---------SKQPASSAAGLIANTTCQTENRLSVFFSIIFMTVGIVSNSLAIAILMKAYQSKASFLLLASGLVITDFFGHLINGGIAVFVYASDKFDQSNILCSVFGISMVFSGLCPLFLGSTMAIERCIGVTNPLHSTKITSKHVKMILSGVCMFAVFVALLPILGHRDYQIQASRTWCFYN--TEHIEDWEDRFYLLFFSSLGLLALGISFSCNAVTGVTLLRVKFRSQQHRQGRSHHLEMVIQLLAIMCVSCVCWSPFLVTMANIAINGNNPVTCETTLFALRMATWNQILDPWVYILLRKAVLRNLYKLASRCCGVNIISLHIWELKVAAISESPAA >D1DR_FUGRU 
--------------MAQNFSTVGDGKQMLLERDSSKRVLTGCFLSLLIFTTLLGNTLVCVAVTKFRHRSKVTNFFVISLAISDLLVAILVMPWKAATEIMGFWPFG-EFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKVACLMISVAWTLSVLISFIPVQLNWHKALN-PPDNCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAQKQSLSECSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCEADCISSTTFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSILLGCHRLCPGNS-AIEIAPLSNPSCQYQP >CKR2_RAT HSLFPRSIQELDEGATTPYDYDDGEPCHKTSVKQIGAWILPPLYSLVFIFGFVGNMLVIIILISCKKLKSMTDIYLFNLAISDLLFLLTLPFWAHYAANE--WVFGNIMCKLFTGLYHIGYFGGIFFIILLTIDRYLAIVHAVALKARTVTFGVITSVVTWVVAVFASLPGIIFTKSEQ-ED-QHTCGPY----FPTIWKNFQTIMRNILSLILPLLVMVICYSGILHTL--------FRCRNEKKRHRAVRLIFAIMIVYFLFWTPYNIVLFLTTFQEFLMHLDQAMQVTETLGMTHCCVNPIIYAFVGEKFRRYLSIFFRKHIAKNLCKQCPVFVSSTFTPSTGEQ >OPRK_HUMAN PPNSSAWFPGWAEPDSNGSAGSEDAQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNS-WPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPLKAKIINICIWLLSSSVGISAIVLGGTKVDVD-VIECSLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVCYTLMILRLKSVRLL-SGSREKDRNLRRITRLVLVVVAVFVVCWTPIHIFILVEALGSTSTAALSSYYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPLKMRMERQSTSRVAYLRDIDGMNKP >ACM3_BOVIN N---------ISRAAGNLSSPNGTTSDPLGGHTIWQVVFIAFLTGVLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWVISFILWAPAILFWQYFV-VP-PGECFIQ------FLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRKTRTKRKRMSLIKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFCDSC-IPKTYWNLGYWLCYINSTVNPVCYALCNKTFRNTFKMLLLCQCDKRKRRKQQYQHKRVPEQAL--- >C3.1_HUMAN ---MDQFPESVTENFEYDDLAEAC---YIGDIVVFGTVFLSIFYSVIFAIGLVGNLLVVFALTNSKKPKSVTDIYLLNLALSDLLFVATLPFWTHYLINEKG--LHNAMCKFTTAFFFIGFFGSIFFITVISIDRYLAIVLAASMNNRTVQHGVTISLGVWAAAILVAAPQFMFTKQKE-----NECLGDYPEVLQEIWPVLRNVETNFLGFLLPLLIMSYCYFRIIQTL---------FSCKNHKKAKAIKLILLVVIVFFLFWTPYNVMIFLETLKLYDKDLRLALSVTETVAFSHCCLNPLIYAFAGEKFRRYLYHLYGKCLAVLCGRSVHVDRSRHGSVLSSNF >5H4_HUMAN 
------------------MDKLDANVSSEEGFGSVEKVVLLTFLSTVILMAILGNLLVMVAVCWDRQRKIKTNYFIVSLAFADLLVSVLVMPFGAIELVQDIWIYGEVFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPYRNKMTPLRIALMLGGCWVIPTFISFLPIMQGWNNIRK-NSTYCVFM--------VNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHSRPDQHSTHRMRTETKAAKTLCIIMGCFCLCWAPFFVTNIVDPFIDYT-VPGQVWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYRRPSILGQTINGSTHVLRDA >5H2A_PIG NTSDAFNWTVDSENRTNLSCEGCLSPPCFSLLHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHRRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEREPGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILAYKSSQLQTGQK >OPSD_DIPVU MNGTEGPYFYVPMVNTSGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWLMALACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFICHFSIPLLVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIMMVIAFLVCWLPYASVAWWIFTHQGSDFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >HH1R_HUMAN ----------MSLPNSSCLLEDKMCEGNKTTMASPQLMPLVVVLSTICLVTVGLNLLVLYAVRSERKLHTVGNLYIVSLSVADLIVGAVVMPMNILYLLMSKWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLKYRTKTRASATILGAWFLSFLWVIPILGWNHFM--VR-EDKCETD------FYDVTWFKVMTAIINFYLPTLLMLWFYAKIYKAVRQHLRSQYVSGLHMNRERKAAKQLGFIMAAFILCWIPYFIFFMVIAFCKNC-CNEHLHMFTIWLGYINSTLNPLIYPLCNENFKKTFKRILHIRS----------------------- >OPSD_DELDE MNGTEGLNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSVLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVFGGFTTTLYTSLHAYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWIMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTLSPEVNNESFVIYMFVVHFTIPLVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWVPYASVAFYIFTHQGSDFGPIFMTIPSFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGRNPLGDDEASTTASKTETSQVAPA >OPS1_DROME 
PLS---NGSVVDKVTPDMAHLISPYWNQFPAMDPIWAKILTAYMIMIGMISWCGNGVVIYIFATTKSLRTPANLLVINLAISDFGIMITNTPMMGINLYFETWVLGPMMCDIYAGLGSAFGCSSIWSMCMISLDRYQVIVKGMAGRPMTIPLALGKIAYIWFMSSIWCLAPAFGWSRYVPEGNLTSCGID--YLERDWNPRSYLIFYSIFVYYIPLFLICYSYWFIIAAVSAHMNVRSSEDAEKSAEGKLAKVALVTITLWFMAWTPYLVINCMGLFKFEG-LTPLNTIWGACFAKSAACYNPIVYGISHPKYRLALKEKCPCCVFGKVDDGKSSDSEAESKA----- >RGR_BOVIN ---------------------MAESGTLPTGFGELEVLAVGTVLLVEALSGLSLNILTILSFCKTPELRTPSHLLVLSLALADSGIS-LNALVAATSSLLRRWPYGSEGCQAHGFQGFVTALASICSSAAVAWGRYHHFCTRS---RLDWNTAVSLVFFVWLSSAFWAALPLLGWGHYDYEPLGTCCTLD--YSRGDRNFTSFLFTMAFFNFLLPLFITVVSYRLME--------------QKLGKTSRPPVNTVLPARTLLLGWGPYALLYLYATIADATSISPKLQMVPALIAKAVPTVNAMNYALGSEMVHRGIWQCLSPQRREHSREQ---------------- >VU51_HSV6U ----------------MEKETKSLAWPATAEFYGWVFIFSSIQLCTMVLLTVRFNSFKVGR-E--------YAVFTFAGMSFNCFLLPIKMGLLSGH-----WSLPRDFCAILLYIDDFSIYFSSWSLVFMAIERINHFCYSTLLNENSKALAKVCFPIVWIISGVQALQMLNNYKATA---ETPQCFLA--------FRSGYDMWLMLVYSVMIPVMLVFIYIYSKNFM-----------LLKDELSTVTTYLCIYLLLGTIAHLPKAGLSEIESD----KIFYGLRDIFMALPVLKVYYIPVMAYCMACDDHTVPVRLCSIWLVNLCKKCFSCTLEVGIKMLK--- >A1AA_ORYLA -----------MTPSSVTLNCSNCSHVLAPELNTVKAVVLGMVLGIFILFGVIGNILVILSVVCHRHLQTVTYYFIVNLAVADLLLSSTVLPFSAIFEILDRWVFGRVFCNIWAAVDVLCCTASIMSLCVISVDRYIGVSYPLYPAIMTKRRALLAVMLLWVLSVIISIGPLFGWKEP--AP--TVCKIT--------EEPGYAIFSAVGSFYLPLAIILAMYCRVYVVAQKELRSFALRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVLPIGSIFPAYRPSDTVFKITFWLGYFNSCINPIIYLCSNQEFKKAFQSLLGVHCLRMTPRAHHHHTQGHSLT----- >B1AR_BOVIN VPDGAATAARLLVPASPPASLLTSASEGPPLPSQQWTAGMGLLMAFIVLLIVVGNVLVLVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARALVCTVWAISALVSFLPIFMQWWGDS-R--ECCDFI--------INEGYAITSSVVSFYVPLCIMAFVYLRVFREAQKQPGRRRPPRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRAACGSHAAAGCLAVARPSPSPG >OPSH_ASTFA 
NE--DTTRESAFVYTNANNTRDPFEGPNYHIAPRWVYNVSSLWMIFVVIASVFTNGLVIVATAKFKKLRHPLNWILVNLAIADLGETVLASTISVINQIFGYFILGHPMCVFEGWTVSVCGITALWSLTIISWERWVVVCKPFGNVKFDGKWAAGGIIFSWVWAIIWCTPPIFGWSRYWPHGLKTSCGPDVFSGSEDPGVASYMITLMLTCCILPLSIIIICYIFVWSAIHQVAQQQKDSESTQKAEKEVSRMVVVMILAFIVCWGPYASFATFSAVNPGYAWHPLAAAMPAYFAKSATIYNPIIYVFMNRQFRSCIMQLFGKKVEDA-----SEVSTAS-------- >OL15_MOUSE -------------MEVDSNSSSGTFILMGVSDHPHLEIIFFAVILASYLLTLVGNLTIILLSRLDARLHTPMYFFLSNLSSLDLAFTTSSVPQMLKNLWGPDKTISYGGCVTQLYVFLWLGATECILLVVMAFDRYVAVCRPLYMTVMNPRLCWGLAAISWLGGLGNSVIQSTFTLQL---PDNFLCEVPKLACGDTSLNEAVLNGVCTFFTVVPVSVILVSYCFIAQAV--------MKIRSVEGRRKAFNTCVSHLVVVFLFYGSAIYGYLLPAKSS---NQSQGKFISLFYSVVTPMVNPLIYTLRNKEVKGALGRLLGKGRGAS-------------------- >ETBR_HUMAN SLAPAEVPKGDRTAGSPPRTISPPPCQGPIEIKETFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIVIDIPINVYKLLAEDWPFGAEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAIGFDIITRI--LRICLLHQKTAFMQFYKTAKDWWLFSFYFCLPLAITAFFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYNQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQSFE-EKQSLEFKANDHGYDNFR >PE23_MOUSE ---------MASMWAPEHSAEAHSNLS---STTDDCGSVSVAFPITMMVTGFVGNALAMLLVSRSYRRKKSFLLCIGWLALTDLVGQLLTSPVVILVYLSQRLDPSGRLCTFFGLTMTVFGLSSLLVASAMAVERALAIRAPHYASHMKTRAT-PVLLGVWLSVLAFALLPVLGVG----RYSGTWCFISNETDPAREPGSVAFASAFACLGLLALVVTFACNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNQMSEKECNSFLIAVRLASLNQILDPWVYLLLRKILLRKFCQIRDHTN-YASSSTSLPCWSDQLER----- >TA2R_MOUSE --MWPNG-----------TSLGACFRPVNITLQERRAIASPWFAASFCALGLGSNLLALSVLAGARPPRSSFLALLCGLVLTDFLGLLVTGAIVASQHAALLTDPSCRLCYFMGVAMVFFGLCPLLLGAAMASERFVGITRPFSRPTATSRRAWATVGLVWVAAGALGLLPLLGLGRYSVQYPGSWCFLT----LGTQRGDVVFGLIFALLGSASVGLSLLLNTVSVATLCRVYHTREATQRPRDCEVEMMVQLVGIMVVATVCWMPLLVFIMQTLLQTPPRATEHQLLIYLRVATWNQILDPWVYILFRRSVLRRLHPRFSSQLQAVSLRRPPAQ------------ >ACM3_HUMAN 
N---------VSRAAGNFSSPDGTTDDPLGGHTVWQVVFIAFLTGILALVTIIGNILVIVSFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLAIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFV-VP-PGECFIQ------FLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRKTRTKRKRMSLVKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFCDSC-IPKTFWNLGYWLCYINSTVNPVCYALCNKTFRTTFKMLLLCQCDKKKRRKQQYQHKRAPEQAL--- >CCR4_FELCA IYPSDNYTEDDLGSGDYDSMKEPC---FREENAHFNRIFLPTVYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVLTLPFWAVDAVAN--WYFGKFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFANVRE-DG-RYICDRF---YPSDSWLVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGYQKRKALKTTVILILAFFACWLPYYIGISIDSFILLESTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH >GPRJ_MOUSE AEAAEALLPHGLMGLHEEHSWMSNRTELQYELNPGEVATASIFFGALWLFSIFGNSLVCLVIHRSRRTQSTTNYFVVSMACADLLISVASTPFVVLQFTTGRWTLGSAMCKVVRYFQYLTPGVQIYVLLSICIDRFYTIVYPL-SFKVSREKAKKMIAASWILDAAFVTPVFFFYG---SNW-HCNYFLP-----PSWEGTAYTVIHFLVGFVIPSILIILFYQKVIKYIWRIDGR-RTMNIVPRTKVKTVKMFLLLNLVFLFSWLPFHVAQLWHPHEQDYKKSSLVFTAVTWVSFSSSASKPTLYSIYNANFRRGMKETFCMSSMKCYRSNAYTIKRNYVGISEIPP >AA2B_MOUSE ------------------------------MQLETQDALYVALELVIAALAVAGNVLVCAAVGASSALQTPTNYFLVSLATADVAVGLFAIPFAITISLG--FCTDFHGCLFLACFVLVLTQSSIFSLLAVAVDRYLAIRVPLYKGLVTGTRARGIIAVLWVLAFGIGLTPFLGWNSKDI-A-PLTCLFE-----NVVPMSYMVYFNFFGCVLPPLLIMLVIYIKIFMVACKQ---MDHSRTTLQREIHAAKSLAMIVGIFALCWLPVHAINCITLFHPALDKPKWVMNVAILLSHANSVVNPIVYAYRNRDFRYSFHKIISRYVLCQAETKGGSGLSLGL------- >MAS_HUMAN -----MDGSNVTSFVVEEPTNISTGRNASVGNAHRQIPIVHWVIMSISPVGFVENGILLWFLCFRMR-RNPFTVYITHLSIADISLLFCIFILSIDYALDYESSGHYYTIVTLSVTFLFGYNTGLYLLTAISVERCLSVLYPIYRCHRPKYQSALVCALLWALSCLVTTMEYVMCI-DRHS--RNDC-----------RAVIIFIAILSFLVFTPLMLVSSTILVVKIRK----------NTWASHSSKLYIVIMVTIIIFLIFAMPMRLLYLLYYEYW--STFGNLHHISLLFSTINSSANPFIYFFVGSSKKKRFKESLKVVLTRAFKDEMQPRTVTVETVV---- >TSHR_SHEEP 
LQAFDNHYDYTVCGGSEEMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLVILLTSHYKLTVPRFLMCNLAFADFCMGLYLLLIASVDLYTQSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLWHAYVIMLGGWVCCFLLALLPLVGISSY----KVSICLPM-----TETPLALAYIILVLLLNIIAFIIVCACYVKIYITVRNP------HYNPGDKDTRIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFMLLSKFGICKRQAQAYRGSTGIRVQKVPPD >NK2R_BOVIN ----MGACVVMTDINISSGLDSNATGITAFSMPGWQLALWTAAYLALVLVAVMGNATVIWIILAHQRMRTVTNYFIVNLALADLCMAAFNAAFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPGTRAVIAGIWLVALALAFPQCFYST---GA---TKCVVAWPEDSGGKMLLLYHLIVIALIYFLPLVVMFVAYSVIGLTLWRRGHQHGANLRHLQAKKKFVKTMVLVVVTFAICWLPYHLYFILGTFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTEE----YTPSLSTRVNRC >5H2C_MOUSE LLVWQFDISISPVAAIVTDTFNSSDGGRLFQFPDGVQNWPALSIVVIIIMTIGGNILVIMAVSMEKKLHNATNYFLMSLAIADMLVGLLVMPLSLLAILYDYWPLPRYLCPVWISLDVLFSTASIMHLCAISLDRYVAVRSPVHSRFNSRTKAIMKIAIVWAISIGVSVPIPVIGLRD-VF--NTTCVL---------NDPNFVLIGSFVAFFIPLTIMVITYFLTIYVLRRQKKKPRGTMQAINNEKKASKVLGIVFFVFLIMWCPFFITNILSVLCGKAKLMEKLLNVFVWIGYVCSGINPLVYTLFNKIYRRAFSKYLRCDYKPDKKPPVRQIALSGRELNVNIY >OLF1_CHICK -------------MASGNCTTPTTFILSGLTDNPGLQMPLFMVFLAIYTITLLTNLGLIALISVDLHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERKTISYVGCILQYFSFVLLTVTESLLLAVMAYDRYVAICKPLYPSIMTKAVCWRLVESLYFLAFLNSLVHTSGLLKL---SNHFFCDISQISSSSIAISELLVIISGSLFVMSSIIIILISYVFIILTV--------VMIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRLTATTFGFIDSKAVQ-------------- >O.2R_RAT ASELNETQEPFLNPTDYDDEEFLRYLWREYLHPKEYEWVLIAGYIIVFVVALIGNVLVCVAVWKNHHMRTVTNYFIVNLSLADVLVTITCLPATLVVDITETWFFGQSLCKVIPYLQTVSVSVSVLTLSCIALDRWYAICHPL-MFKSTAKRARNSIVVIWIVSCIIMIPQAIVMERSSKTTLFTVCDER---WGGEVYPKMYHICFFLVTYMAPLCLMVLAYLQIFRKLWCRKARVAAEIKQIRARRKTARMLMVVLLVFAICYLPISILNVLKRVFGMFETVYAWFTFSHWLVYANSAANPIIYNFLSGKFREEFKAAFSCCLG-VHRRQGDRLESRKSLTTQISN >CKR8_MOUSE 
DYTMEPNVTMT--DYYPDFFTAPC---DAEFLLRGSMLYLAILYCVLFVLGLLGNSLVILVLVGCKKLRSITDIYLLNLAASDLLFVLSIPFQTHNLLDQ--WVFGTAMCKVVSGLYYIGFFSSMFFITLMSVDRYLAIVHAVAIKVRTASVGTALSLTVWLAAVTATIPLMVFYQVAS-ED-MLQCFQF-YEEQSLRWKLFTHFEINALGLLLPFAILLFCYVRILQQL---------RGCLNHNRTRAIKLVLTVVIVSLLFWVPFNVALFLTSLHDLHQRLALAIHVTEVISFTHCCVNPVIYAFIGEKFKKHLMDVFQKS-CSHIFLYLGRQRQ------LSSN >GPRF_MACNE ----MDPEETSVYLDYYYATSPNPDIRETHSHVPYTSVFLPVFYTAVFLTGVLGNLVLMGALHFKPGSRRLIDIFIINLAASDFIFLVTLPLWVDKEASLGLWRTGSFLCKGSSYMISVNMHCSVFLLTCMSVDRYLAIVCPVSRKFRRTDCAYVVCASIWFISCLLGLPTLLSRELT-IDD-KPYCAEK----KATPLKLIWSLVALIFTFFVPLLSIVTCYCCIARKLCAH---YQQSGKHNKKLKKSIKIIFIVVAAFLVSWLPFNTSKLLAIVSGLQAILQLGMEVSGPLAFANSCVNPFIYYIFDSYIRRAIVHCLCPCLKNYDFGSSTETALSTFIHAEDFT >MC3R_MOUSE NSSCCLSSVSPMLPNLSEHPAAPPASNRSGSGFCEQVFIKPEVFLALGIVSLMENILVILAVVRNGNLHSPMYFFLCSLAAADMLVSLSNSLETIMIAVINSDQFIQHMDNIFDSMICISLVASICNLLAIAIDRYVTIFYALYHSIMTVRKALTLIGVIWVCCGICGVMFIIYYS----------------------EESKMVIVCLITMFFAMVLLMGTLYIHMFLFARLHIAVAGVVAPQQHSCMKGAVTITILLGVFIFCWAPFFLHLVLIITCPTNICYTAHFNTYLVLIMCNSVIDPLIYAFRSLELRNTFKEILCGCNSMNLG------------------ >CCR5_RAT DDLYKELAIYSNSTEIPLQDSIFCSTEEGPLLTSFKTIFMPVAYSLIFLLGMMGNILVLVILERHRHTRSSTETFLFHLAVADLLLVFILPFAVAEGSVG--WVLGTFLCKTVIALHKINFYCSSLLLACIAVDRYLAIVHAVAYRRRRLLSIHITCSTIWLAGFLFALPELLFAKVVQ-NE-LPQCIFSQENEAETRAWFASRFLYHTGGFLLPMLVMAWCYVGVVHRL--------LQAQRRPQRQKAVRVAILVTSIFLLCWSPYHIVIFLDTLERLKGYLSVAITLCEFLGLAHCCLNPMLYTFAGVKFRSDLSRLLTKLGCAG---PASLC--------PGWR >OPSD_LOLFO ---MGRDIPDNETWWYNPYMDIHPHWKQFDQVPAAVYYSLGIFIAICGIIGCVGNGVVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFMKWVFGNAACKVYGLIGGIFGLMSIMTMTMISIDRYNVIGRPMASKKMSHRKAFIMIIFVWIWSTIWAIGPIFGWGAYTLEGVLCNCSFD--YITRDTTTRSNILCMYIFAFMCPIVVIFFCYFNIVMSVSNHRLNLRKAQAGANAEMKLAKISIVIVTQFLLSWSPYAVVALLAQFGPIEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFRERIASNFPWILTCCQYDEKEIEEIPAGEQS-GGE >OPSD_TODPA 
----GRDLRDNETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKWIFGFAACKVYGFIGGIFGFMSIMTMAMISIDRYNVIGRPMASKKMSHRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFD--YISRDSTTRSNILCMFILGFFGPILIIFFCYFNIVMSVSNHRLNLRKAQAGANAEMRLAKISIVIVSQFLLSWSPYAVVALLAQFGPLEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEEIPAGESSDAAP >OPRK_RAT LPNSSSWFPNWAESDSNGSVGSEDQQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSAVYLMNS-WPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPLKAKIINICIWLLASSVGISAIVLGGTKVDVD-VIECSLQFPDDEYSWWDLFMKICVFVFAFVIPVLIIIVCYTLMILRLKSVRLL-SGSREKDRNLRRITKLVLVVVAVFIICWTPIHIFILVEALGSTSTAVLSSYYFCIALGYTNSSLNPVLYAFLDENFKRCFRDFCFPIKMRMERQSTNRVASMRDVGGMNKP >ACM4_RAT ------M-NFTPVNGSSANQSVRLVTAAHNHLETVEMVFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLGCADLIIGAFSMNLYTLYIIKGYWPLGAVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPARRTTKMAGLMIAAAWVLSFVLWAPAILFWQFVV-VP-DNQCFIQ------FLSNPAVTFGTAIAAFYLPVVIMTVLYIHISLASRSRSIAVRKKRQMAARERKVTRTIFAILLAFILTWTPYNVMVLVNTFCQSC-IPERVWSIGYWLCYVNSTINPACYALCNATFKKTFRHLLLCQYRNIGTAR---------------- >SSR3_RAT SVPTTLDPGNASSAWPLDTSLGNASAGTSLAGLAVSGILISLVYLVVCVVGLLGNSLVIYVVLRHTSSPSVTSVYILNLALADELFMLGLPFLAAQNALSY-WPFGSLMCRLVMAVDGINQFTSIFCLTVMSVDRYLAVVHPTSARWRTAPVARMVSAAVWVASAVVVLPVVVFSGVPR-----STCHMQ-WPEPAAAWRTAFIIYTAALGFFGPLLVICLCYLLIVVKVRSTSCQAPACQRRRRSERRVTRMVVAVVALFVLCWMPFYLLNIVNVVCPLPPAFFGLYFLVVALPYANSCANPILYGFLSYRFKQGFRRILLRPSRRVRSQEPGSGEEDEEEEERREE >OPSD_HUMAN MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA >OPSG_GECGE 
RDDDDTTRGSVFTYTNTNNTRGPFEGPNYHIAPRWVYNLVSFFMIIVVIASCFTNGLVLVATAKFKKLRHPLNWILVNLAFVDLVETLVASTISVFNQIFGYFILGHPLCVIEGYVVSSCGITGLWSLAIISWERWFVVCKPFGNIKFDSKLAIIGIVFSWVWAWGWSAPPIFGWSRYWPHGLKTSCGPDVFSGSVELGCQSFMLTLMITCCFLPLFIIIVCYLQVWMAIRAVAAQQKESESTQKAEREVSRMVVVMIVAFCICWGPYASFVSFAAANPGYAFHPLAAALPAYFAKSATIYNPVIYVFMNRQFRNCIMQLFGKKVDDG-----SEAVSSVSNSSVAPA >OAR_BOMMO TEIYDVIEDEKDVCAVADEPNIPCSFGISLAVPEWEAICTAIILTMIIISTVVGNILVILSVFTYKPLRIVQNFFIVSLAVADLTVAILVLPLNVAYSILGQWVFGIYVCKMWLTCDIMCCTSSILNLCAIALDRYWAITDPIYAQKRTLERVLFMIGIVWILSLVISSPPLLGWNDW-E-P--TPCRLT--------SQPGFVIFSSSGSFYIPLVIMTVVYFEIYLATKKRAVYEEKQRISLTRERRAARTLGIIMGVFVVCWLPFFVIYLVIPFCVSCCLSNKFINFITWLGYVNSALNPLIYTIFNMDFRRAFKKLLFIKC----------------------- >YQH2_CAEEL EQSTPARENLPNREIYQIFQFTLVYALPLSNHDNSSLMLIAGFYALLFMFGTCGNAAILAVVHHVKGRHNTTLTYICILSIVDFLSMLPIPMTIIDQILGF-WMFDTFACKLFRLLEHIGKIFSTFILVAFSIDRYCAVCHPLQVRVRNQRTVFVFLGIMFFVTCVMLSPILLYAHSKVTRMHLYKCVDD----LGRELFVVFTLYSFVLAYLMPLLFMIYFYYEMLIRLFKQLVGGGEEKKLTIPVGHIAIYTLAICSFHFICWTPYWISILYSLYEELYYAFIYFMYGVHALPYINSASNFILYGLLNRQLHNAPERKYTRNGVGGRQMSHALTSELIAIPSSSCR >UL33_HSV7J ----MICYSFAKNVTFAFLIILQNFFSQHDEEYKYNYTCITPTVRKAQRLESVINGIMLTLILPVSTKQTITSPYLITLFISDSLHSLTVLLLTLNREAL--TNLNQALCQCVLFVYSASCTYSLCMLAVISTIRYR-TLQRRTLNDKNNNHIKRNVGILFLSSAMCAIPAVLYVQVEKK-KNYGKCNIHSTQKAY-DLFIGIKIVYCFLWGIFPTVIFSYFYVIFGKTL---------RALTQSKHNKTLSFISLLILSFLCIQIPNLLVMSVEIFFLYIIQREIVQIISRLMPEIHCLSNPLVYAFTRTDFRLRFYDFIKCNLCNSSLKRKRNP------------ >C5AR_RAT DPISNDSSEITYDYSDGTPNPDMPADGVYIPKMEPGDIAALIIYLAVFLVGVTGNALVVWVTAFEAK-RTVNAIWFLNLAVADLLSCLALPILFTSIVKHNHWPFGDQACIVLPSLILLNMYSSILLLATISADRFLLVFKPICQKFRRPGLAWMACGVTWVLALLLTIPSFVFRRIHK-SD--ILCNID-YSKGPFFIEKAIAILRLMVGFVLPLLTLNICYTFLLIRT---------WSRKATRSTKTLKVVMAVVTCFFVFWLPYQVTGVILAWLPRSQSVERLNSLCVSLAYINCCVNPIIYVMAGQGFHGRLRRSLPSIIRNVLSEDSLGRSTMD-------- >CKR6_HUMAN 
DSSEDYFVSVNTSYYSVDSEMLLC---SLQEVRQFSRLFVPIAYSLICVFGLLGNILVVITFAFYKKARSMTDVYLLNMAIADILFVLTLPFWAVSHATGA-WVFSNATCKLLKGIYAINFNCGMLLLTCISMDRYIAIVQATRLRSRTLPRSKIICLVVWGLSVIISSSTFVFNQKYN-TQ-SDVCEPKQTVSEPIRWKLLMLGLELLFGFFIPLMFMIFCYTFIVKTL---------VQAQNSKRHKAIRVIIAVVLVFLACQIPHNMVLLVTAANLGKKLIGYTKTVTEVLAFLHCCLNPVLYAFIGQKFRNYFLKILKDLWCVRRKYKSSGFEN------ISRQ >B2AR_MOUSE MGPHGNDSDFLLAPNGSRAPDH----DVTQERDEAWVVGMAILMSVIVLAIVFGNVLVITAIAKFERLQTVTNYFIISLACADLVMGLAVVPFGASHILMKMWNFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYVAITSPFYQSLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-D--TCCDFF--------TNQAYAIASSIVSFYVPLCVMVFVYSRVFQVAKRQGRSLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIRDNL-IPKEVYILLNWLGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSSKTYGNGYSDYTGEPNTCQLG >DADR_DIDMA ---------------MPLNDTTMDRRGLVVERDFSFRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLNWHKAGN-TMDNCDSS--------LSRTYAISSSLISFYIPVAIMIVTYTRIYRIAQKQVECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCESDCIDSITFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSTLLGCYRLCPTANNAIETGAVFSSH----- >O.YR_MACMU GELAANWSTEAVNSSAAPPGAEGNCTAGPPRRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLATWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLAACYGLISFKIWQNRMAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDANA-KEASAFIIVMLLASLNSCCNPWIYMLFTGHLFHELVQRFLCCSASYLKGNRLGENSSSFVLSHRSS >AG2R_MELGA ------MVPNYSTEETVKRIHVDC---PVSGRHSYIYIMVPTVYSIIFIIGIFGNSLVVIVIYCYMKLKTVASIFLLNLALADLCFLITLPLWAAYTAMEYQWPFGNCLCKLASAGISFNLYASVFLLTCLSIDRYLAIVHPVSRIRRTMFVARVTCIVIWLLAGVASLPVIIHRNIFF-LN--TVCGFR-YDNNNTTLRVGLGLSKNLLGFLIPFLIILTSYTLIWKTLKKA----YQIQRNKTRNDDIFKMIVAIVFFFFFSWIPHQVFTFLDVLIQLHDIVDTAMPFTICIAYFNNCLNPFFYVFFGKNFKKYFLQLIKYIPPNVSTHPSLTTRPPE-------N >TSHR_MOUSE 
LQAFESHYDYTVCGDNEDMVCTPKSDEFNPCEDIMGYRFLRIVVWFVSLLALLGNIFVLLILLTSHYKLTVPRFLMCNLAFADFCMGVYLLLIASVDLYTHSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLRHAYTIMAGGWVSCFLLALLPMVGISSY----KVSICLPM-----TDTPLALAYIVLVLLLNVVAFVVVCSCYVKIYITVRNP------QYNPRDKDTKIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFILLSKFGICKRQAQAYQGSTGIQIQKIPQD >CCR3_MOUSE FAFLLENSTSPYDYGENESDFSDSPPCPQDFSLNFDRTFLPALYSLLFLLGLLGNGAVAAVLLSQRTALSSTDTFLLHLAVADVLLVLTLPLWAVDAAVQ--WVFGPGLCKVAGALFNINFYAGAFLLACISFDRYLSIVHATIYRRDPRVRVALTCIVVWGLCLLFALPDFIYLSANY-RL-ATHCQYN----FPQVGRTALRVLQLVAGFLLPLLVMAYCYAHILAVL---------LVSRGQRRFRAMRLVVVVVAAFAVCWTPYHLVVLVDILMDVGSHVDVAKSVTSGMGYMHCCLNPLLYAFVGVKFREKMWMLFTRLGRSDQRGPQRQPSWSETTEASYLG >AA2B_RAT ------------------------------MQLETQDALYVALELVIAALAVAGNVLVCAAVGASSALQTPTNYFLVSLATADVAVGLFAIPFAITISLG--FCTDFHSCLFLACFVLVLTQSSIFSLLAVAVDRYLAIRVPLYKGLVTGTRARGIIAVLWVLAFGIGLTPFLGWNSKDI-T-PVKCLFE-----NVVPMSYMVYFNFFGCVLPPLLIMMVIYIKIFMVACKQ---MEHSRTTLQREIHAAKSLAMIVGIFALCWLPVHAINCITLFHPALDKPKWVMNVAILLSHANSVVNPIVYAYRNRDFRYSFHRIISRYVLCQTDTKGGSGFSLSL------- >MSHR_RANTA PVLGSQRRLLGSLNCTPPATFPLMLAPNRTGPQCLEVSIPNGLFLSLGLVSLVENVLVVAAIAKNSNLHSPMYYFICCLAVSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHRGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >CKR3_RAT EEELKTVVETFETTPYEYEWAPPC---EKVSIRELGSWLLPPLYSLVFIVGLLGNMMVVLILIKYRKLQIMTNIYLLNLAISDLLFLFTVPFWIHYVLWNE-WGFGHCMCKMLSGLYYLALYSEIFFIILLTIDRYLAIVHAVALRARTVTFATITSIITWGFAVLAALPEFIFHESQD-NF-DLSCSPRYPEGEEDSWKRFHALRMNIFGLALPLLIMVICYSGIIKTL---------LRCPNKKKHKAIQLIFVVMIVFFIFWTPYNLVLLLSAFHSTFIHLDLAMQVTEVITHTHCCINPIIYAFVGERFRKHLRLFFHRNVAIYLRKYISFLT-------SSVS >OPS2_HEMSA 
DFGYPEGVSIVDFVRPEIKPYVHQHWYNYPPVNPMWHYLLGVIYLFLGTVSIFGNGLVIYLFNKSAALRTPANILVVNLALSDLIMLTTNVPFFTYNCFSGGWMFSPQYCEIYACLGAITGVCSIWLLCMISFDRYNIICNGFNGPKLTTGKAVVFALISWVIAIGCALPPFFGWGNYILEGILDSCSYD--YLTQDFNTFSYNIFIFVFDYFLPAAIIVFSYVFIVKAIFAHMNVRSNEADAQRAEIRIAKTALVNVSLWFICWTPYALISLKGVMGDTSGITPLVSTLPALLAKSCSCYNPFVYAISHPKYRLAITQHLPWFCVHETETKSNDDAQDKA------- >AG2R_RABIT ------MMLNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLAVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPAIIHRNVFF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSNLSTRPSD-------N >5H2A_HUMAN NTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEEPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILAYKSSQLQMGQK >AA2A_HUMAN -------------------------------MPIMGSSVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTG--FCAACHGCLFIACFVLVLTQSSIFSLLAIAIDRYIAIRIPLYNGLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGS-Q-QVACLFE-----DVVPMNYMVYFNFFACVLVPLLLMLGVYLRIFLAARRQESQGERARSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPDCHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFRQTFRKIIRSHVLRQQEPFKAAGAHGSDGEQVSLR >HH2R_CAVPO -------------------MAFNGTVPSFCMDFTVYKVTISVILIILILVTVAGNVVVCLAVGLNRRLRSLTNCFIVSLAVTDLLLGLLVLPFSAIYQLSCKWSFSKVFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLYPVLITPARVAISLVFIWVISITLSFLSIHLGWN--RN-TIVKCKVQ--------VNEVYGLVDGLVTFYLPLLIMCITYFRIFKIAREQR--IGSWKAATIREHKATVTLAAVMGAFIICWFPYFTVFVYRGLKGDD-VNEVFEDVVLWLGYANSALNPILYAALNRDFRTAYHQLFCCRLASHNSHETSLRRSQCQEPRW--- >IL8B_HUMAN 
KGEDLSNYSYSSTLPPFLLDAAPC----EPESLEINKYFVVIIYALVFLLSLLGNSLVMLVILYSRVGRSVTDVYLLNLALADLLFALTLPIWAASKVNG--WIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRYLVKFICLSIWGLSLLLALPVLLFRRTVY-NV-SPACYED-MGNNTANWRMLLRILPQSFGFIVPLLIMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQNHIDRALDATEILGILHSCLNPLIYAFIGQKFRHGLLKILAIHGLISKDSLPKDS------------ >OLF5_RAT -------------MSSTNQSSVTEFLLLGLSRQPQQQQLLFLLFLIMYLATVLGNLLIILAIGTDSRLHTPMYFFLSNLSFVDVCFSSTTVPKVLANHILGSQAISFSGCLTQLYFLAVFGNMDNFLLAVMSYDRFVAICHPLYTTKMTRQLCVLLVVGSWVVANMNCLLHILLMARL---SPHFFCDGTKLSCSDTHLNELMILTEGAVVMVTPFVCILISYIHITCAV--------LRVSSPRGGWKSFSTCGSHLAVVCLFYGTVIAVYFNPSSSH---LAGRDMAAAVMYAVVTPMLNPFIYSLRNSDMKAALRKVLAMRFPSKQ------------------- >OPSB_CHICK TDLPEDFYIPMALDAPNITALSPFLVPQTHLGSPGLFRAMAAFMFLLIALGVPINTLTIFCTARFRKLRSHLNYILVNLALANLLVILVGSTTACYSFSQMYFALGPTACKIEGFAATLGGMVSLWSLAVVAFERFLVICKPLGNFTFRGSHAVLGCVATWVLGFVASAPPLFGWSRYIPEGLQCSCGPDWYTTDNKWHNESYVLFLFTFCFGVPLAIIVFSYGRLLITLRAVARQQEQSATTQKADREVTKMVVVMVLGFLVCWAPYTAFALWVVTHRGRSFEVGLASIPSVFSKSSTVYNPVIYVLMNKQFRSCMLKLLFCGRSPFGDDEDVSGSSVSSSHVAPA- >AA2A_RAT ----------------------------------MGSSVYITVELAIAVLAILGNVLVCWAVWINSNLQNVTNFFVVSLAAADIAVGVLAIPFAITISTG--FCAACHGCLFFACFVLVLTQSSIFSLLAIAIDRYIAIRIPLYNGLVTGVRAKGIIAICWVLSFAIGLTPMLGWNNCST-K-RVTCLFE-----DVVPMNYMVYYNFFAFVLLPLLLMLAIYLRIFLAARRQESQGERTRSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCSTCHAPPWLMYLAIILSHSNSVVNPFIYAYRIREFRQTFRKIIRTHVLRRQEPFQAGGAHSTEGEQVSLR >OLF6_CHICK -------------MASGNCTTPTTFILSGLTDNPGLQMPLFMVFLAIYTITLLTNLGLIALISIDLQLQTPMYIFLQNLSFTDAVYSTVITPKMLATFLEETKTISYVGCILQYFSFVLLTVRECLLLAVMAYDRYAAICKPLYPAIMTKAVCWRLVKGLYSLAFLNFLVHTSGLLKL---SNHFFCDNSQISSSSTALNELLVFIFGSLFVMSSIITILISYVFIILTV--------VRIRSKERKYKAFSTCTSHLMAVSLFHGTIVFMYFQPANNF---SLDKDKIMSLFYTVVIPMLNPLIYSWRNKEVKDALHRAIATAVLFH-------------------- >P2Y6_HUMAN 
-----------MEWDNGTGQALGLPPTTCVYRENFKQLLLPPVYSAVLAAGLPLNICVITQICTSRRALTRTAVYTLNLALADLLYACSLPLLIYNYAQGDHWPFGDFACRLVRFLFYANLHGSILFLTCISFQRYLGICHPLWHKRGGRRAAWLVCVAVWLAVTTQCLPTAIFAATG-----RTVCYDL-SPPALATHYMPYGMALTVIGFLLPFAALLACYCLLACRLCRQ---GPAEPVAQERRGKAARMAVVVAAAFAISFLPFHITKTAYLAVRSTEAFAAAYKGTRPFASANSVLDPILFYFTQKKFRRRPHELLQKLTAKWQRQGR--------------- >OPS5_DROME SLGDGSVFPMGHGYPAEYQHMVHAHWRGFREAPIYYHAGFYIAFIVLMLSSIFGNGLVIWIFSTSKSLRTPSNLLILNLAIFDLFMCTN-MPHYLINATVGYIVGGDLGCDIYALNGGISGMGASITNAFIAFDRYKTISNPI-DGRLSYGQIVLLILFTWLWATPFSVLPLFQIWGRYPEGFLTTCSFD--YLTNTDENRLFVRTIFVWSYVIPMTMILVSYYKLFTHVRVHNVKANANADNMSVELRIAKAALIIYMLFILAWTPYSVVALIGCFGEQQLITPFVSMLPCLACKSVSCLDPWVYATSHPKYRLELERRLPWLGIREKHATSGTSSVSGDTLALSVQ >YR13_CAEEL --------------------MSNIFSVPLDPMSVAVGIPYVCFFIILSVVGIIGNVIVIYAIAGDRNRKSVMNILLLNLAVADLANLIFTIPEWIPPVFFGSWLFPSFLCPVCRYLECVFLFASISTQMIVCIERYIAIVLPMARQLCSRRNVLITVLVDWIFVACFASPYAVWHSVK-LFQLSATCSNT---VGKSTWWQGYKLTEFLAFYFVPCFIITVVYTKVAKCLWCKCLDSSRSSDALRTRRNVVKMLIACVAVYFVCYSPIQVIFLSKAVLNVTHPPYDFILLMNALAMTCSASNPLLYTLFSQKFRRRLRDVLYCPSDVENETKTYYSGPRASF------ >5H1B_CRIGR CAPPPPAASQTGVPLVNLSHNSAESHIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPVSTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYAAKRTPKRAAIMIALVWVFSISISLPPFFWR----E-E--LTCLVN-------TDHVLYTVYSTGGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHMATLDFFNWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCAG--------------------- >D2DR_BOVIN ---MDPLNLSWYDDDPESRNWSRPFNGSEGKADRPPYNYYAMLLTLLIFVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWKFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMIAIVWVLSFTISCPMLFGLN---Q----NECII---------ANPAFVVYSSIVSFYVPFIVTLLVYIKIYIVLRRRRTSMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN-IPPVLYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFLKILHC------------------------- >VU51_HSV6Z 
----------------MEKETKSLAWPATAEFYGWVFIFSSIQLCTVVFLTVRFNGFKVGR-E--------YAVFTFAGMSFNCFLLPIKMGLLSGH-----WTLPRDFCAILLYIDDFSAYFSSWSLVFMAIERINYFCYSTLLNENSKALAKVCFPIVWVVSGVQALQMLNNYKATA---ETGQCFLA--------FRSGHDMWLMLVYSVVIPVMLVFFYLYSKNFM-----------LLKDELSSVTTYLCIYLLLGTIAHLPKAALSEIESD----KIFYGLRDIFMALPVLKVYYISAMAYCMACDDHTVPVRLCSIWLVNLCKKCFSCTLEVGIKMLK--- >D2DR_MOUSE ---MDPLNLSWYDDDLERQNWSRPFNGSEGKADRPHYNYYAMLLTLLIFIIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWKFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMIAIVWVLSFTISCPLLFGLN---Q----NECII---------ANPAFVVYSSIVSFYVPFIVTLLVYIKIYIVLRKRRTSMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN-IPPVLYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFMKILHC------------------------- >CCR4_MACFA IYTSDNYTEEMG-SGDYDSIKEPC---FREENAHFNRIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLYVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFASVSE-DD-RYICDRF---YPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALGFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH >HH1R_BOVIN ---------MTCPNSSCVFEDKMCQGNKTAPANDAQLTPLVVVLSTISLVTVGLNLLVLYAVRSERKLHTVGNLYIVSLSVADLIVGVVVMPMNILYLLMSRWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLRYRTKTRASITILAAWFLSFLWIIPILGWRHFQ--EP-EDKCETD------FYNVTWFKVMTAIINFYLPTLLMLWFYAKIYKAVRQHLRSQYVSGLHMNRERKAAKQLGFIMAAFIICWIPYFIFFMVIAFCESC-CNQHVHMFTIWLGYINSTLNPLIYPLCNENFKKTFKKILHIRS----------------------- >OPS2_DROPS AQTG-GNRSVLDNVLPDMAPLVNPHWSRFAPMDPTMSKILGLFTLVILIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIINFYYETWVLGPLWCDIYAACGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKIAFIWMMAVFWTIMPLIGWSSYVPEGNLTACSID--YMTRQWNPRSYLITYSLFVYYTPLFMICYSYWFIIATVAAHMNVRSSEDCDKSAENKLAKVALTTISLWFMAWTPYLIICYFGLFKIDG-LTPLTTIWGATFAKTSAVYNPIVYGISHPNDRLVLKEKCPMCVCGTTDEPKPDATSEAESKD---- >OPSD_SCYCA 
MNGTEGENFYIPMSNKTGVVRSPFDYPQYYLAEPWKFSVLAAYMFFLIIAGFPVNFLTLYVTIQHKKLRQPLNYILLNLAVADLFMIFGGFPSTMITSMNGYFVFGPSGCNFEGFFATLGGEIGLWSLVVLAIERYVVVCKPMSNFRFGSQHAFMGVGLTWIMAMACAFPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFSIPLTIIFFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVIIMVIAFLICWLPYASVAFFIFCNQGSEFGPIFMTIPAFFAKAASLYNPLIYILMNKQFRNCMITTICCGKNPFEEEESTSASSVSSSQVAPAA >OPSD_BUFBU MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWQYSILCAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFILGATGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAVMGVAFTWIMALSCAVPPLLGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVFFLICWVPYASVAFFIFSNQGSEFGPIFMTVPAFFAKSSSIYNPVIYIMLNKQFRNCMITTLCCGKNPFGEDDASSAASSVSSSQVSPA >ACM4_CHICK AQPWQAKMANLTYDNVTLSNRSEVAIQPPTNYKTVELVFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLACADLIIGVFSMNLYTVYIIKGYWPLGAVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPARRTTKMAGLMIAAAWILSFILWAPAILFWQFIV-VH-ERECYIQ------FLSNPAVTFGTAIAAFYLPVVIMTVLYIHISLASRSRSIAVRKKRQMAAREKKVTRTIFAILLAFILTWTPYNVMVLINTFCETC-VPETVWSIGYWLCYVNSTINPACYALCNATFKKTFKHLLMCQYRNIGTAR---------------- >GRHR_RAT --MANNASLEQDQNHCSAINNSIPLTQGKLPTLTLSGKIRVTVTFFLFLLSTAFNASFLVKLQRWTQKLSRMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGEFLCKVLSYLKLFSMYAPAFMMVVISLDRSLAVTQPL-AVQSKSKLERSMTSLAWILSIVFAGPQLYIFRMIYAV--FSQCVTH--SFPQWWHEAFYNFFTFSCLFIIPLLIMLICNAKIIFALTRVPRKNQSKNNIPRARLRTLKMTVAFGTSFVICWTPYYVLGIWYWFDPEMRVSEPVNHFFFLFAFLNPCFDPLIYGYFSL------------------------------------- >C5AR_MOUSE ---MNSSFEINYDHY-GTMDPNIPADGIHLPKRQPGDVAALIIYSVVFLVGVPGNALVVWVTAFEPD-GPSNAIWFLNLAVADLLSCLAMPVLFTTVLNHNYWYFDATACIVLPSLILLNMYASILLLATISADRFLLVFKPICQKVRGTGLAWMACGVAWVLALLLTIPSFVYREAYK-SE--TVCGIN-YGGGSFPKEKAVAILRLMVGFVLPLLTLNICYTFLLLRT---------WSRKATRSTKTLKVVMAVVICFFIFWLPYQVTGVMIAWLPPSKRVEKLNSLCVSLAYINCCVNPIIYVMAGQGFHGRLLRSLPSIIRNALSEDSVGRSTDD-------- >5H2A_MACMU 
NTSDAFNWTVESENRTNLSCEGCLSPSCLSLLHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEDPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILAYKSSQLQMGQK >LSHR_BOVIN ESELSGWDYDYGFCLPKTLQCAPEPDAFNPCEDIMGYNFLRVLIWLINILAITGNVTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDAQTKGWQTG-SGCSAAGFFTVFASELSVYTLTVITLERWHTITYAILDQKLRLKHAIPVMLGGWLFSTLIAVLPLVGVSNY----KVSICLPM-----VESTLSQVYILTILILNVMAFIIICACYIKIYFAVQNP------ELMATNKDTKIAKKMAVLIFTDFTCMAPISFFAISAAFKVPLITVTNSKVLLVLFYPVNSCANPFLYAIFTKAFQRDFFLLLSKFGCCKYRAELYRRSNCKNGFTGSNK >SSR5_HUMAN LFPASTPSWNASSPGAASGGGDNRTLVGPAPSAGARAVLVPVLYLLVCAAGLGGNTLVIYVVLRFAKMKTVTNIYILNLAVADVLYMLGLPFLATQNAASF-WPFGPVLCRLVMTLDGVNQFTSVFCLTVMSVDRYLAVVHPLSARWRRPRVAKLASAAAWVLSLCMSLPLLVFADVQE-----GTCNAS-WPEPVGLWGAVFIIYTAVLGFFAPLLVICLCYLLIVVKVRAAGV--RVGCVRRRSERKVTRMVLVVVLVFAGCWLPFFTVNIVNLAVALPPASAGLYFFVVILSYANSCANPVLYGFLSDNFRQSFQKVLCLRKGSGAKDADATEQQQEATRPRTAA >B2AR_PIG MGQPGNRSVFLLAPNGSHAPDQ----DVPQERDEAWVVGMAIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWTFGSFWCEFWISIDVLCVTASIETLCVIAVDRYLAITSPFYQCLLTKNKARVVILMVWVVSGLISFLPIKMHWYQAL-N--ACCDFF--------TNQPYAIASSIVSFYLPLVVMVFVYSRVFQVARRQGRSHRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHGIHDNL-IPKEVYILLNWVGYVNSAFNPLIYCRSP-DFRMAFQELLCLHRSSLKAYGNGCSDYTGEQSGCYLG >FSHR_RAT SDMMYNEFDYDLCNEVVDVTCSPKPDAFNPCEDIMGYNILRVLIWFISILAITGNTTVLVVLTTSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLECKVQLRHAASVMVLGWTFAFAAALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMALLVLNVLAFVVICGCYTHIYLTVRNP------TIVSSSSDTKIAKRMATLIFTDFLCMAPISFFAISASLKVPLITVSKAKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEMQAQIYRTNFHARKSHCSSA >D4DR_MOUSE 
---MGNSSATEDGGLLAGRGP---ESLGTGAGLGGAGAAALVGGVLLIGLVLAGNSLVCVSVASERTLQTPTNYFIVSLAAADLLLAVLVLPLFVYSEVQGGWLLSPRLCDTLMAMDVMLCTASIFNLCAISVDRFVAVTVPL-RYNQQGQCQLLLIAATWLLSAAVASPVVCGLN---G-R--AVCCL---------ENRDYVVYSSVCSFFLPCPLMLLLYWATFRGLRRWPEPRRRGAKITGRERKAMRVLPVVVGAFLVCWTPFFVVHITRALCPACFVSPRLVSAVTWLGYVNSALNPIIYTIFNAEFRSVFRKTLRLRC----------------------- >GALT_MOUSE --------------------MADIQNISLDSPGSVGAVAVPVVFALIFLLGMVGNGLVLAVLLQPGPPGSTTDLFILNLAVADLCFILCCVPFQAAIYTLDAWLFGAFVCKTVHLLIYLTMYASSFTLAAVSVDRYLAVRHPLSRALRTPRNARAAVGLVWLLAALFSAPYLSYYG---GA--LELCVPA----WEDARRRALDVATFAAGYLLPVTVVSLAYGRTLCFLWAAGP-AAAAEARRRATGRAGRAMLAVAALYALCWGPHHALILCFWYGRFAPATYACRLASHCLAYANSCLNPLVYSLASRHFRARFRRLWPCGHRRHRHHHHRLHPASSGPAGYPGD >AG2R_RAT ------MALNSSAEDGIKRIQDDC---PKAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNHLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLMAGLASLPAVIHRNVYF-TN--TVCAFH-YESRNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFRIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSD-------N >NMBR_RAT PNLSLPTEASESELEPEVWENDFLPDSDGTTAELVIRCVIPSLYLIIISVGLLGNIMLVKIFLTNSTMRSVPNIFISNLAAGDLLLLLTCVPVDASRYFFDEWVFGKLGCKLIPAIQLTSVGVSVFTLTALSADRYRAIVNPMMQTSGVVLWTSLKAVGIWVVSVLLAVPEAVFSEVARSS--FTACIPY---QTDELHPKIHSVLIFLVYFLIPLVIISIYYYHIAKTLIRSLPGNEHTKKQMETRKRLAKIVLVFVGCFVFCWFPNHILYLYRSFNYKELGHMIVTLVARVLSFSNSCVNPFALYLLSESFRKHFNSQLCCGQKSYPERSTSYLMTSLKSNAKNVV >NY1R_CANFA TSFSQVENHSIFCNFSE-NSQFLAFESDDCHLPLAMIFTLALAYGAVIILGVTGNLALIMIILKQKEMRNVTNILIVNLSFSDLLVAIMCLPFTFVYTLMDHWVFGEAMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYVGIAVIWVLAVVSSLPFLIYQVLTDKD--KYVCFDK---FPSDSHRLSYTTLLLMLQYFGPLCFIFICYFKIYIRLKRRNMMMRDNKYRSSETKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK >A2AD_HUMAN 
AAGPNASGAGERGSGGVANASGASWGPPRGQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAVISFPPLVSLYR-------PQCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKLRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFSYSLYGICREAQVPGPLFKFFFWIGYCNSSLNPVIYTVFNQDFRRSFKHILFRRRRRGFRQ----------------- >V1AR_SHEEP SSRWWPLDAGDANTSGDLAGLGEDGGPQADTRNEELAKLEIAVLAVIFVVAVLGNSSVLLALHRTPRKTSRMHLFIRHLSLADLAVAFFQVLPQLGWDITYRFRGPDGLCRVVKHMQVFAMFASAYMLVVMTADRYIAVCHPLKTLQQPARRSRLMIAAAWVLSFVLSTPQYFVFSMVETK--TYDCWAN---FIHPWGLPAYVTWMTGSVFVAPVVILGTCYGFICYHIWRKVLHVSSVKTISRAKIRTVKMTFVIVTAYIVCWAPFFIIQMWSAWDKNFESENPATAIPALLASLNSCCNPWIYMFFSGHLLQDCAQSFPCCQNVKRTFTREGSTSFTNNRSPTNS >FML1_MOUSE -----------MESNYSIHLNGSEVVVYDSTISRVLWILSMVVVSITFFLGVLGNGLVIWVAGFRMP-HTVTTIWYLNLALADFSFTATLPFLLVEMAMKEKWPFGWFLCKLVHIAVDVNLFGSVFLIAVIALDRCICVLHPVAQNHRTVSLARNVVVGSWIFALILTLPLFLFLTTVRVSWVEERLNTA------ITFVTTRGIIRFIVSFSLPMSFVAICYGLITTKI---------HKKAFVNSSRPFRVLTGVVASFFICWFPFQLVALLGTVWLKEKIIGRLVNPTSSLAFFNSCLNPILYVFMGQDFQERLIHSLSSRLQRALSEDSGHIAS---------- >OPSD_PETMA MNGTEGENFYIPFSNKTGLARSPFEYPQYYLAEPWKYSVLAAYMFFLILVGFPVNFLTLFVTVQHKKLRTPLNYILLNLAVANLFMVLFGFTLTMYSSMNGYFVFGPTMCNFEGFFATLGGEMSLWSLVVLAIERYIVICKPMGNFRFGSTHAYMGVAFTWFMALSCAAPPLVGWSRYLPEGMQCSCGPDYYTLNPNFNNESFVIYMFLVHFIIPFIVIFFCYGRLLCTVKEAAAAQQESASTQKAEKEVTRMVVLMVIGFLVCWVPYASVAFYIFTHQGSDFGATFMTVPAFFAKTSALYNPIIYILMNKQFRNCMITTLCCGKNPLGDEDSGASVSSVSTSQVSPA >B2AR_MACMU MGQPGNGSAFLLAPNGSHAPDH----DVTQERDEAWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQGRTLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IPKEVYILLNWVGYVNSGFNPLIYCRSP-DFRIAFQELLCLRRSSLKACGNGYSGNTGEQSGYHLE >AG2R_SHEEP 
------MILNSSTEDGIKRIQDDC---PKAGRHNYIFIMIPTLYSIIFVVGLFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASGSVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPTIIHRNVFF-TN--TVCAFHVYESQNSTLPVGLGLTKNILGFLFPFLIILTSYTLIWKTLKKA----YEIQKNKPRKDDIFKIILAIVLFFFFSWVPHQIFTFMDVLIQLGDIVDTAMPITICLAYFNNCLNPPFYGFLGKKFKKYFLQLLKYIPPKAKSHSNLSTRPSE-------N >AG2R_BOVIN ------MILNSSTEDGIKRIQDDC---PKAGRHNYIFIMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPTIIHRNVFF-TN--TVCAFH-YESQNSTLPVGLGLTKNILGFLFPFLIILTSYTLIWKTLKKA----YEIQKNKPRKDDIFKIILAIVLFFFFSWVPHQIFTFMDVLIQLGDIVDTAMPITICLAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSNLSTRPSE-------N >ACM2_HUMAN --------------MNNSTNSSNNSLALTSPYKTFEVVFIVLVAGSLSLVTIIGNILVMVSIKVNRHLQTVNNYFLFSLACADLIIGVFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIV-VE-DGECYIQ------FFSNAAVTFGTAIAAFYLPVIIMTVLYWHISRASKSRVKMPAKKKPPPSREKKVTRTILAILLAFIITWAPYNVMVLINTFCAPC-IPNTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMCHYKNIGATR---------------- >PF2R_HUMAN --MSMNN---------SKQLVSPAAALLSNTTCQTENRLSVFFSVIFMTVGILSNSLAIAILMKAYQSKASFLLLASGLVITDFFGHLINGAIAVFVYASDKFDQSNVLCSIFGICMVFSGLCPLLLGSVMAIERCIGVTKPIHSTKITSKHVKMMLSGVCLFAVFIALLPILGHRDYKIQASRTWCFYN--TEDIKDWEDRFYLLLFSFLGLLALGVSLLCNAITGITLLRVKFKSQQHRQGRSHHLEMVIQLLAIMCVSCICWSPFLVTMANIGINGNHLETCETTLFALRMATWNQILDPWVYILLRKAVLKNLYKLASQCCGVHVISLHIWELKVAAISESPVA >US28_HCMVA ----MTPTTTTAELTTEFDYDEDATPCVFTDVLNQSKPVTLFLYGVVFLFGSIGNFLVIFTITWRRRIQCSGDVYFINLAAADLLFVCTLPLWMQYLLDH--NSLASVPCTLLTACFYVAMFASLCFITEIALDRYYAIVYMR---YRPVKQACLFSIFWWIFAVIIAIPHFMVVTKK-----DNQCMTD-YDYLEVSYPIILNVELMLGAFVIPLSVISYCYYRISRIV---------AVSQSRHKGRIVRVLIAVVLVFIIFWLPYHLTLFVDTLKLLKRSLKRALILTESLAFCHCCLNPLLYVFVGTKFRQELHCLLAEFRQRLFSRDVSWYRSSPSRRETSSD >NK4R_HUMAN 
NLTSSPAPTASPSPAPSWTPSPRPGPAHPFLQPPWAVALWSLAYGAVVAVAVLGNLVVIWIVLAHKRMRTVTNSFLVNLAFADAAMAALNALVNFIYALHE-WYFGANYCRFQNFFPITAVFASIYSMTAIAVDRYMAIIDPL-KPRLSATATRIVIGSIWILAFLLAFPQCLYSK---GR---TLCYVQ--WPEGSRQHFTYHMIVIVLVYCFPLLIMGITYTIVGITLWGG--PCDKYQEQLKAKRKVVKMMIIVVVTFAICWLPYHIYFILTAIYQQLKYIQQVYLASFWLAMSSTMYNPIIYCCLNKRFRAGFKRAFRWCPFIHVSSY----ATRLHPMRQSSL >MC3R_RAT NSSCCPSSSYPTLPNLSQHPAAPSASNRSGSGFCEQVFIKPEVFLALGIVSLMENILVILAVVRNGNLHSPMYFFLLSLLQADMLVSLSNSLETIMIVVINSDQFIQHMDNIFDSMICISLVASICNLLAIAVDRYVTIFYALYHSIMTVRKALSLIVAIWVCCGICGVMFIVYYS----------------------EESKMVIVCLITMFFAMVLLMGTLYIHMFLFARLHIAAADGVAPQQHSCMKGAVTITILLGVFIFCWAPFFLHLVLIITCPTNICYTAHFNTYLVLIMCNSVIDPLIYAFRSLELRNTFKEILCGCNGMNVG------------------ >OPS1_HEMSA TFGYPEGMTVADFVPDRVKHMVLDHWYNYPPVNPMWHYLLGVVYLFLGVISIAGNGLVIYLYMKSQALKTPANMLIVNLALSDLIMLTTNFPPFCYNCFSGGWMFSGTYCEIYAALGAITGVCSIWTLCMISFDRYNIICNGFNGPKLTQGKATFMCGLAWVISVGWSLPPFFGWGSYTLEGILDSCSYD--YFTRDMNTITYNICIFIFDFFLPASVIVFSYVFIVKAIFAHMNVRSNEAETQRAEIRIAKTALVNVSLWFICWTPYAAITIQGLLGNAEGITPLLTTLPALLAKSCSCYNPFVYAISHPKFRLAITQHLPWFCVHEKDPNDVEETQEKS------- >AA3R_RAT -----------------------MKANNTTTSALWLQITYITMEAAIGLCAVVGNMLVIWVVKLNRTLRTTTFYFIVSLALADIAVGVLVIPLAIAVSLE--VQMHFYACLFMSCVLLVFTHASIMSLLAIAVDRYLRVKLTVYRTVTTQRRIWLFLGLCWLVSFLVGLTPMFGWN-RKE-L-TLSCHFR-----SVVGLDYMVFFSFITWILIPLVVMCIIYLDIFYIIRNK--NFRETRAFYGREFKTAKSLFLVLFLFALCWLPLSIINFVSYFNVK--IPEIAMCLGILLSHANSMMNPIVYACKIKKFKETYFVILRACRLCQTSDSLDSN------------ >SSR4_RAT GED----TTWTPGINASWAPDEEEDAVRSDGTGTAGMVTIQCIYALVCLVGLVGNALVIFVILRYAKMKTATNIYLLNLAVADELFMLSVPFVASAAALRH-WPFGAVLCRAVLSVDGLNMFTSVFCLTVLSVDRYVAVVHPLAATYRRPSVAKLINLGVWLASLLVTLPIAVFADTRPGGE-AVACNLH---WPHPAWSAVFVIYTFLLGFLLPVLAIGLCYLLIVGKMRAVALR-AGWQQRRRSEKKITRLVLMVVTVFVLCWMPFYVVQLLNLFVTS--LDATVNHVSLILSYANSCANPILYGFLSDNFRRSFQRVLCLRCCLLETTGGAEETALKSRGGPGCI >NK1R_RAT 
-----MDNVLPMDSDLFPNISTNTSESNQFVQPTWQIVLWAAAYTVIVVTSVVGNVVVIWIILAHKRMRTVTNYFLVNLAFAEACMAAFNTVVNFTYAVHNVWYYGLFYCKFHNFFPIAALFASIYSMTAVAFDRYMAIIHPL-QPRLSATATKVVIFVIWVLALLLAFPQGYYST---SR---VVCMIEWPEHPNRTYEKAYHICVTVLIYFLPLLVIGYAYTVVGITLWAS-IPSDRYHEQVSAKRKVVKMMIVVVCTFAICWLPFHVFFLLPYINPDLKFIQQVYLASMWLAMSSTMYNPIIYCCLNDRFRLGFKHAFRCCPFISAGDY----STRYLQTQSSVY >OPSD_SARDI MNGTEGPFFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAILGAYMFFLIIVGFPVNFMTLYVTLEHKKLRTPLNYILLNLAVADLFMVIGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGMISLWSLAVLAIERWVVVCKPISNFRFGENHAIMGVSLTWGMALACTVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVLYMFFCHFTIPLTIIFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIIMVIGFLVCWLPYASVAWFIFTHQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMITTLFCGKNPF---EGEEETEASSASSVSPA >FSHR_HORSE FDMMYSEFEYDLCNEVVDVTCSPKPDAFNPCEDIMGYDILRVLIWFISILAITGNIIVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLECKVQLRHAASVMLVGWIFAFAVALLPIFGISTY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYIHIYLTVRNP------NIVSSSSDTKIAKRMAILIFTDFLCMAPISFFAISASLKVPLITVSKSKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEMQAQLYRTISHPRNGHCPPT >OPSG_CARAU MNGTEGKNFYVPMSNRTGLVRSPFEYPQYYLAEPWQFKILALYLFFLMSMGLPINGLTLVVTAQHKKLRQPLNFILVNLAVAGTIMVCFGFTVTFYTAINGYFVLGPTGCAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSSSHAFAGIAFTWVMALACAAPPLFGWSRYIPEGMQCSCGPDYYTLNPDYNNESYVIYMFVCHFILPVAVIFFTYGRLVCTVKAAAAQQQDSASTQKAEREVTKMVILMVFGFLIAWTPYATVAAWIFFNKGADFSAKFMAIPAFFSKSSALYNPVIYVLLNKQFRNCMLTTIFCGKNPLGDDESS--TSKTEVSSVSPA >AA3R_RABIT ------------------------MPDNSTTLFLAIRASYIVFEIVIGVCAVVGNVLVIWVIKLNPSLKTTTFYFIFSLALADIAVGFLVMPLAIVISLG--ITIGFYSCLVMSCLLLVFTHASIMSLLAIAVDRYLRVKLTVYRRVTTQRRIWLALGLCWVVSLLVGFTPMFGWN-MKE-S-DFQCKFD-----SVIPMEYMVFFSFFTWILIPLLLMCALYVYIFYIIRNK--SFKETGAFYRREFKTAKSLFLVLALFAGCWLPLSIINCVTYFKCK--VPDVVLLVGILLSHANSMMNPIVYACKIQKFKETYLLIFKARVTCQPSDSLDPS------------ >5H1D_HUMAN 
SPLNQSAEGLPQEASNRSLNATETSEAWDPRTLQALKISLAVVLSVITLATVLSNAFVLTTILLTRKLHTPANYLIGSLATTDLLVSILVMPISIAYTITHTWNFGQILCDIWLSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAATMIAIVWAISICISIPPLFWR----Q-E--SDCLVN-------TSQISYTIYSTCGAFYIPSVLLIILYGRIYRAARNRKLALERKRISAARERKATKILGIILGAFIICWLPFFVVSLVLPICRDSWIHPALFDFFTWLGYLNSLINPIIYTVFNEEFRQAFQKIVPFRKAS--------------------- >CCR4_MACMU IYTSDNYTEEMG-SGDYDSIKEPC---FREENAHFNRIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQKPRKLLAEKVVYVGVWIPALLLTIPDFIFASVSE-DD-RYICDRF---YPNDLWVVVFQFQHIMVGLILPGIDILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH >OPSP_CHICK ------MSSNSSQAPPNGTPGPFDGPQWPYQAPQSTYVGVAVLMGTVVACASVVNGLVIVVSICYKKLRSPLNYILVNLAVADLLVTLCGSSVSLSNNINGFFVFGRRMCELEGFMVSLTGIVGLWSLAILALERYVVVCKPLGDFQFQRRHAVSGCAFTWGWALLWSAPPLLGWSSYVPEGLRTSCGPN--WYTGGSNNNSYILSLFVTCFVLPLSLILFSYTNLLLTLRAAAAQQKEADTTQRAEREVTRMVIVMVMAFLLCWLPYSTFALVVATHKGIIIQPVLASLPSYFSKTATVYNPIIYVFMNKQFQSCLLEMLCCGYQPQRTGKASPGVTAAGLRNKVMP >O7C1_HUMAN -------------METGNQTHAQEFLLLGFSATSEIQFILFGLFLSMYLVTFTGNLLIILAICSDSHLHTPMYFFLSNLSFADLCFTSTTVPKMLLNILTQNKFITYAGCLSQIFFFTSFGCLDNLLLTVMAYDRFVAVCHPLYTVIMNPQLCGLLVLGSWCISVMGSLLETLTVLRL---SPHFFCDLLKLACSDTFINNVVIYFATGVLGVISFTGIFFSYYKIVFSI--------LRISSAGRKHKAFSTCGSHLSVVTLFYGTGFGVYLSSAATP---SSRTSLVASVMYTMVTPMLNPFIYSLRNTDMKRALGRLLSRATFFNGDITAGLS------------ >OPR._PIG GSPLQGNLSLLSPNHSLLPPHLLLNASHGAFLPLGLKVTIVGLYLAVCVGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTAVLLTLPFQGTDVLLGF-WPFGNALCKAVIAIDYYNMFTSAFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALASIVGVPVAIMGSAQVEE---IECLVE-IPAPQDYWGPVFAVCIFLFSFVIPVLIISVCYSLMVRRLRGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLVQGLGVQPETAVAVLRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCAPTRRREMQVSDRVALACKTSETVPR >OPSD_RAT 
MNGTEGPNFYVPFSNITGVVRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIGLWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIFFLICWLPYASVAMYIFTHQGSNFGPIFMTLPAFFAKTASIYNPIIYIMMNKQFRNCMLTSLCCGKNPLGDDEASATASKTETSQVAPA >P2UR_HUMAN ----MAADLGPWNDTINGTWDGDELGYRCRFNEDFKYVLLPVSYGVVCVLGLCLNAVALYIFLCRLKTWNASTTYMFHLAVSDALYAASLPLLVYYYARGDHWPFSTVLCKLVRFLFYTNLYCSILFLTCISVHRCLGVLRPLSLRWGRARYARRVAGAVWVLVLACQAPVLYFVTTS-----RVTCHDT-SAPELFSRFVAYSSVMLGLLFAVPFAVILVCYVLMARRLLKP---YGTSGGLPRAKRKSVRTIAVVLAVFALCFLPFHVTRTLYYSFRSLNAINMAYKVTRPLASANSCLDPVLYFLAGQRLVRFARDAKPPTGPSPATPARRRLTDMQRIGDVLGS >D1DR_OREMO -MEIFTTTRGTSAGPEPAPGGHGGTDSPR-TSDLSLRALTGCVLCILIVSTLLGNALVCAAVIKFRHRSKVTNAFVISLAVSDLFVAVLVMPWRAVSEVAGVWLFG-AFCDTWVAFDIMCSTASILHLCIISMDRYWAISSPFYERRMTPRFGCVMIGVAWTLSVLISFIPVQLNWHARR--DPGDCNAS--------LNRTYAISSSLISFYIPVLIMVGTYTRIFRIGRTQPALESSLKTSFRRETKVLKTLSVIMGVFVFCWLPFFVLNCMVPFCRLECVSDTTFSVFVWFGWANSSLNPVIYAFNA-DFRKAFSTILGCSRYCRTSAVEAVDYHHDTTLQK--- >OPSP_PETMA HHSLPSALPSATGGNGTVATMHNPFERPLEGIAPWNFTMLAALMGTITALSLGENFAVIVVTARFRQLRQPLNYVLVNLAAADLLVSAIGGSVSFFTNIKGYFFLGVHACVLEGFAVTYFGVVALWSLALLAFERYFVICRPLGNFRLQSKHAVLGLAVVWVFSLACTLPPVLGWSSYRPSMIGTTCEPN--WYSGELHDHTFILMFFSTCFIFPLAVIFFSYGKLIQKLKKASETQRGLESTRRAEQQVTRMVVVMILAFLVCWMPYATFSIVVTACPTI---PLLAAVPAFFSKTATVYNPVIYIFMNKQFRDCFVQVLPCKGLKKVSATQTAGASVNTQSPGNRH >DADR_RAT ---------------MAPNTSTMDEAGLPAERDFSFRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPLG-PFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLSWHKAGN-EDDNCDTR--------LSRTYAISSSLISFYIPVAIMIVTYTSIYRIAQKQVECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFISNCMVPFCGSECIDSITFDVFVWFGWANSSLNPIIYAFNA-DFQKAFSTLLGCYRLCPTTNNAIETAVVFSSH----- >GALS_RAT 
------------MNGSGSQGAENTSQEGGSGGWQPEAVLVPLFFALIFLVGTVGNALVLAVLLRGGQAVSTTNLFILNLGVADLCFILCCVPFQATIYTLDDWVFGSLLCKAVHFLIFLTMHASSFTLAAVSLDRYLAIRYPLSRELRTPRNALAAIGLIWGLALLFSGPYLSYYR---AN--LTVCHPA----WSAPRRRAMDLCTFVFSYLLPVLVLSLTYARTLRYLWRTDP-VTAGSGSQRAKRKVTRMIIIVAVLFCLCWMPHHALILCVWFGRFPRATYALRILSHLVSYANSCVNPIVYALVSKHFRKGFRKICAGLLRPAPRRASGRVHSGSMLEQESTD >5H7_MOUSE SSWMPHLLSGFPEVTASPAPTNVSGCGEQINYGRVEKVVIGSILTLITLLTIAGNCLVVISVCFVKNVRQPSNYLIVSLALADLSVAVAVMPFVSVTDLIGGWIFGHFFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLYPVRQNGKCMAKMILSVWPLSASITLPPLFGWAQ--N-D--KVCLIS--------QDFGYTIYSTAVAFYIPMSVMLFMYYQIYKAARKSSRLERKNISSFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTCIPLWVERTCLWLGYANSLINPFIYSFFNRDLRTTYRSLLQCQYRNINRKLSAAGAERPERSEFVLQ >EBI2_HUMAN ------MDIQMANNFTPPSATPQGNDCDLYAHHSTARIVMPLHYSLVFIIGLVGNLLALVVIVQNRKKINSTTLYSTNLVISDILFTTALPTRIAYYAMGFDWRIGDALCRITALVFYINTYAGVNFMTCLSIDRFIAVVHPLYNKIKRIEHAKGVCIFVWILVFAQTLPLLINPMS--KQEERITCMEY--NFEETKSLPWILLGACFIGYVLPLIIILICYSQICCKLFRTAK-QNPLTEKSGVNKKALNTIILIIVVFVLCFTPYHVAIIQHMIKKLRHSFQISLHFTVCLMNFNCCMDPFIYFFACKGYKRKVMRMLKRQVSVSISSAVKSAMTETQMMIHSKS >TSHR_CANFA LQAFDSHYDYTVCGGNEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLIVLLTSHYKLTVPRFLMCNLAFADFCMGMYLLLIASVDLYTHSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLRHAYAIMVGGWVCCFLLALLPLVGISSY----KVSICLPM-----TETPLALAYIILVLLLNIVAFIIVCSCYVKIYITVRNP------QYNPGDKDTKIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFILLSKFGICKRQAQAYRGSAGIQIQKVTRD >NY4R_RAT HLMASLSPAFLQGKNGTNPLDSLYNLSDGCQDSADLLAFIITTYSVETVLGVLGNLCLIFVTTRQKEKSNVTNLLIANLAFSDFLMCLICQPLTVTYTIMDYWIFGEVLCKMLTFIQCMSVTVSILSLVLVALERHQLIINPT-GWKPSISQAYLGIVVIWFISCFLSLPFLANSILNDED--KVVCFVS---WSSDHHRLIYTTFLLLFQYCVPLAFILVCYMRIYQRLQRQRA-THTCSSRVGQMKRINGMLMAMVTAFAVLWLPLHVFNTLEDWYQEACHGNLIFLMCHLFAMASTCVNPFIYGFLNINFKKDIKALVLTCRCRPPQGEP---TVHTDLSKGSMR >CKR5_PAPHA 
----MDYQVSSPTYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLLFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >OLF2_RAT ------------MESGNSTRRFSSFFLLGFTENPQLHFLIFALFLSMYLVTVLGNLLIIMAIITQSHLHTPMYFFLANLSFVDICFTSTTIPKMLVNIYTQSKSITYEDCISQMCVFLVFAELGNFLLAVMAYDRYVA-CHPLYTVIVNHRLCILLLLLSWVISIFHAFIQSLIVLQL---TPHFFCELNQLTCSDNFPSHLIMNLVPVMLAAISFSGILYSYFKIVSSI--------HSISTVQGKYKAFSTCASHLSIVSLFYSTGLGVYVSSAVVQ---SSHSAASASVMYTVVTPMLNPFIYSLRNKDVKRALERLLEGNCKVHHWTG---------------- >HH2R_HUMAN -------------------MAPNGTASSFCLDSTACKITITVVLAVLILITVAGNVVVCLAVGLNRRLRNLTNCFIVSLAITDLLLGLLVLPFSAIYQLSCKWSFGKVFCNIYTSLDVMLCTASILNLFMISLDRYCAVMDPLYPVLVTPVRVAISLVLIWVISITLSFLSIHLGWN--RN-TTSKCKVQ--------VNEVYGLVDGLVTFYLPLLIMCITYYRIFKVARDQR--ISSWKAATIREHKATVTLAAVMGAFIICWFPYFTAFVYRGLRGDD-INEVLEAIVLWLGYANSALNPILYAALNRDFRTGYQQLFCCRLANRNSHKTSLRRTQSREPRQ--- >CKR5_HYLLE ----MDYQVSSPTYDIDYDTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNMLVILVLINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKHFCKCCSIFA-------SSVY >B3AR_HUMAN MAPWPHENSSLAPWPDLPTLAPNTANTSGLPGVPWEAALAGALLALAVLATVGGNLLVIVAIAWTPRLQTMTNVFVTSLAAADLVMGLLVVPPAATLALTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRCARTAVVLVWVVSAAVSFAPIMSQWWRVQ-R--RCCAFA--------SNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALCTLGLIMGTFTLCWLPFFLANVLRALGGPS-VPGPAFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCGRRLPPEPCAAAGVPAARSSPAQP >NY6R_MOUSE 
--MEVLTNQPTPNKTSGKSNNSAFFYFESCQPPFLAILLLLIAYTVILIMGIFGNLSLIIIIFKKQRAQNVTNILIANLSLSDILVCVMCIPFTVIYTLMDHWVFGNTMCKLTSYVQSVSVSVSIFSLVLIAIERYQLIVNPR-GWKPRVAHAYWGIILIWLISLTLSIPLFLSYHLTNTH--QVACVEI---WPSKLNQLLFSTSLFMLQYFVPLGFILICYLKIVLCLRKRTRQRKENKSRLNENKRVNVMLISIVVTFGACWLPLNIFNVIFDWYHEMCHHDLVFVVCHLIAMVSTCINPLFYGFLNKNFQKDLMMLIHHCWCGEPQESY---TMHTDESKGSLK >OLF4_RAT -------------MTGNNQTLILEFLLLGLPIPSEYHLLFYALFLAMYLTIILGNLLIIVLVRLDSHLHMPMYLFLSNLSFSDLCFSSVTMPKLLQNMQSQVPSISYTGCLTQLYFFMVFGDMESFLLVVMAYDRYVAICFPLYTTIMSTKFCASLVLLLWMLTMTHALLHTLLIARL---SLHFFCDISKLSCSDIYVNELMIYILGGLIIIIPFLLIVMSYVRIFFSI--------LKFPSIQDIYKVFSTCGSHLSVVTLFYGTIFGIYLCPSGNN---STVKEIAMAMMYTVVTPMLNPFIYSLRNRDMKRALIRVICTKKISL-------------------- >OPSD_ZEUFA MNGTEGPDFYVPMVNTTGIVRSPYDYPQYYLVNPAAFSMLAAYMFFLILVGFPVNFLTLYVTMEHKKLRTPLNYILLNLAVANLFMVIGGFTTTMYTSMHGYFVLGRTGCNLEGFFATLGGEIALWSLVVLAVERWVVVCKPISNFRFGENHAVMGVSFTWLMACACSVPPLFGWSRYIPEGMQCSCGIDYYTRAPGYNNESFVIYMFVCHFSIPLTIIFFCYGRLLCAVKDAAAAQQESETTQRAEREVSRMVVIMVIGFLICWLPYASVAWFIFTHQGSEFGPVFMTIPAFFAKSSAIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSHVSPA >OPSD_CYPCA MNGTEGPMFYVPMSNATGVVKSPYDYPQYYLVAPWAYGCLAAYMFFLIITGFPINFLTLYVTIEHKKLRTPLNYILLNLAISDLFMVFGGFTTTMYTSLHGYFVFGRIGCNLEGFFATLGGEMGLWSLVVLAFERWMVVCKPVSNFRFGENHAIMGVVFTWFMACTCAVPPLVGWSRYIPEGMQCSCGVDYYTRAPGYNNESFVIYMFLVHFIIPLIVIFFCYGRLVCTVKDAAAQQQESETTQRAEREVTRMVVIMVIGFLICWIPYASVAWYIFTHQGSEFGPVFMTVPAFFAKSAAVYNPCIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA >A2AR_LABOS -----MDPLNATGMDAFTAIHLNASWSADSGYSLAAIASIAALVSFLILFTVVGNILVVIAVLTSRALKAPQNLFLVSLATADILVATLVMPFSLANELMGYWYFGKVWCGIYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPKRVKCIIVIVWLISAFISSPPLLSIDS--I-S--PQCMLN--------DDTWYILSSSMASFFAPCLIMILVYIRIYQVAKTRKRRIAEKKVSQAREKRFTFVLAVVMGVFVVCWFPFFFSYSLHAVCRDYKIPDTLFK-FFWIGYCNSSLNPAIYTIFNRDFRRAFQKILCKSWKKSF------------------- >D3DR_HUMAN 
--------MASLSQLSSHLNYTCGAENSTGASQARPHAYYALSYCALILAIVFGNGLVCMAVLKERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGWNFSRICCDVFVTLDVMMCTASILNLCAISIDRYTAVVMPVGTGQSSCRRVALMITAVWVLAFAVSCPLLFGFN---D-P--TVCSI---------SNPDFVIYSSVVSFYLPFGVTVLVYARIYVVLKQRTSLPLQPRGVPLREKKATQMVAIVLGAFIVCWLPFFLTHVLNTHCQTCHVSPELYSATTWLGYVNSALNPVIYTTFNIEFRKAFLKILSC------------------------- >D2DR_FUGRU -----MDVFTQYAYNDSIFDNGTWSANETTKDETHPYNYYAMLLTLLIFVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWRFSKIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSRRRVTVMISVVWVLSFAISCPLLFGLN---T-R--SLCFI---------ANPAFVVYSSIVSFYVPFIVTLLVYVQIYVVLRKRQTSLSKRKISQQKEKKATQMLAIVLGVFIICWLPFFITHILNTHCTRCKVPAEMYNAFTWLGYVNSAVNPIIYTTFNVEFRKAFIKILHC------------------------- >P2Y4_HUMAN ASTESSLLRSLGLSPGPGS---SEVELDCWFDEDFKFILLPVSYAVVFVLGLGLNAPTLWLFIFRLRPWDATATYMFHLALSDTLYVLSLPTLIYYYAAHNHWPFGTEICKFVRFLFYWNLYCSVLFLTCISVHRYLGICHPLALRWGRPRLAGLLCLAVWLVVAGCLVPNLFFVTTS-----TVLCHDT-TRPEEFDHYVHFSSAVMGLLFGVPCLVTLVCYGLMARRLYQP----LPGSAQSSSRLRSLRTIAVVLTVFAVCFVPFHITRTIYYLARLLNIVNVVYKVTRPLASANSCLDPVLYLLTGDKYRRQLRQLCGGGKPQPRTAASSLASSCRWAATPQDS >GALR_MOUSE ----MELAMVNLSEGNGSDPEPPAPESRPLFGIGVENFITLVVFGLIFAMGVLGNSLVITVLARSKPPRSTTNLFILNLSIADLAYLLFCIPFQATVYALPTWVLGAFICKFIHYFFTVSMLVSIFTLAAMSVDRYVAIVHSRSSSLRVSRNALLGVGFIWALSIAMASPVAYHQRLF-SN--QTFCWEQ---WPNKLHKKAYVVCTFVFGYLLPLLLICFCYAKVLNHLHKKLK--NMSKKSEASKKKTAQTVLVVVVVFGISWLPHHVVHLWAEFGAFPPASFFFRITAHCLAYSNSSVNPIIYAFLSENFRKAYKQVFKCHVCDESPRSETKEPPSTNCTHV--- >OAR1_LOCMI SSAAEEPQDALVGGDACGGRRPPSVLGVRLAVPEWEVAVTAVSLSLIILITIVGNVLVVLSVFTYKPLRIVQNFFIVSLAVADLTVAVLVMPFNVAYSLIQRWVFGIVVCKMWLTCDVLCCTASILNLCAIALDRYWAITDPIYAQKRTLRRVLAMIAGVWLLSGVISSPPLIGWNDW-N-D--TPCQLT--------EEQGYVIYSSLGSFFIPLFIMTIVYVEIFIATKRRPVYEEKQRISLSKERRAARTLGIIMGVFVVCWLPFFLMYVIVPFCNPSKPSPKLVNFITWLGYINSALNPIIYTIFNLDFRRAFKKLLHFKT----------------------- >ML1A_SHEEP 
GSPGGTPKGNGSSALLNVSQAAPGAGDGVRPRPSWLAATLASILIFTIVVDIVGNLLVVLSVYRNKKLRNAGNVFVVSLAVADLLVAVYPYPLALASIVNNGWSLSSLHCQLSGFLMGLSVIGSVFSITGIAINRYCCICHSLYGKLYSGTNSLCYVFLIWTLTLVAIVPNLCVGT-LQYDP-IYSCTFT------QSVSSAYTIAVVVFHFIVPMLVVVFCYLRIWALVLQVWK-PDNKPKLKPQDFRNFVTMFVVFVLFAICWAPLNFIGLVVASDPASRIPEWLFVASYYMAYFNSCLNAIIYGLLNQNFRQEYRKIIVSLCTTKMFFVDSSNRKPSPLIANHNL >OPSD_SOLSO MNGTEGPYFYIPMLNTTGIVRSPYEYPQYYLVNPAAYAALCAYMFLLILLGFPINFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIGLWSLVVLAVERWMVVCKPISNFRFTENHAIMGLGFTWFAASACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVVYMFVCHFLIPLIVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIAFLICWCPYAGVAWYIFSNQGSEFGPLFMTIPAFFAKSSSIYNPLIYIFMNKQFRHCMITTLCCGKNPFEEEEGSTTSASSSSVSPAA- >ML1A_MOUSE -------MKGNVSELLNATQQAPGGGEGGRPRPSWLASTLAFILIFTIVVDILGNLLVILSVYRNKKLRNSGNIFVVSLAVADLVVAVYPYPLVLTSILNNGWNLGYLHCQVSAFLMGLSVIGSIFNITGIAMNRYCYICHSLYDKIYSNKNSLCYVFLIWMLTLIAIMPNLQTGT-LQYDP-IYSCTFT------QSVSSAYTIAVVVFHFIVPMIIVIFCYLRIWVLVLQVRR-PDNKPKLKPQDFRNFVTMFVVFVLFAICWAPLNLIGLIVASDPATRIPEWLFVASYYLAYFNSCLNAIIYGLLNQNFRKEYKKIIVSLCTAKMFFVESSNCKPSPLIPNNNL >OPSD_POERE MNGTEGPYFYVPMVNTTGIVRSPYEYPQYYLVSPAAYACLGAYMFFLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVLGRLGCNLEGYFATLGGEIGLWSLVVLAVERWLVVCKPISNFRFSENHAIMGLVFTWIMANSCAAPPLLGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFICHFCIPLIVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIGFLVCWIPYASVAWYIFTHQGSEFGPLFMTVPAFFAKSASIYNPLIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA >PE22_HUMAN ----------------MGNASNDSQSEDCETRQWLPPGESPAISSVMFSAGVLGNLIALALLARRWRSLSLFHVLVTELVFTDLLGTCLISPVVLASYARNQLAPESRACTYFAFAMTFFSLATMLMLFAMALERYLSIGHPYYQRRVSASGGLAVLPVIYAVSLLFCSLPLLDYGQYVQYCPGTWCFIR--------HGRTAYLQLYATLLLLLIVSVLACNFSVILNLIRMGGPRRGERVSMAEETDHLILLAIMTITFAVCSLPFTIFAYMNETSS---RKEKWDLQALRFLSINSIIDPWVFAILRPPVLRLMRSVLCCRISLRTQDATQTSSKQADL------ >CKR3_HUMAN 
MTTSLDTVETFGTTSYYDDVGLLC---EKADTRALMAQFVPPLYSLVFTVGLLGNVVVVMILIKYRRLRIMTNIYLLNLAISDLLFLVTLPFWIHYVRGHN-WVFGHGMCKLLSGFYHTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIVTWGLAVLAALPEFIFYETEE-LF-ETLCSALYPEDTVYSWRHFHTLRMTIFCLVLPLLVMAICYTGIIKTL---------LRCPSKKKYKAIRLIFVIMAVFFIFWTPYNVAILLSSYQSILKHLDLVMLVTEVIAYSHCCMNPVIYAFVGERFRKYLRHFFHRHLLMHLGRYIPFLT-------SSVS >GPRA_HUMAN TPANQSAEASAGNGSVAGADAPAVTPFQSLQLVHQLKGLIVLLYSVVVVVGLVGNCLLVLVIARVPRLHNVTNFLIGNLALSDVLMCTACVPLTLAYAFEPRWVFGGGLCHLVFFLQPVTVYVSVFTLTTIAVDRYVVLVHPL--RRASRCASAYAVLAIWALSAVLALPPAVHTYHVEHD--VRLCEEF--WGSQERQRQLYAWGLLLVTYLLPLLVILLSYVRVSVKLRNRPGC-SQADWDRARRRRTFCLLVVVVVVFAVCWLPLHVFNLLRDLDPHAYAFGLVQLLCHWLAMSSACYNPFIYAWLHDSFREELRKLLVAWPRKIAPHGQNMT------------ >OPSD_LIZAU MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILIGFPVNFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIALWSLVVLAVERWMVVCKPISNFRFGEDHAIMGLAFTWVMAAACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVVAFLVCWCPYAGVAWYIFTHQGSEFGPLFMTFPAFFAKSSSIYNPMIYICMNKQFRQCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >PF2R_BOVIN --MSTNS---------SIQPVSPESELLSNTTCQLEEDLSISFSIIFMTVGILSNSLAIAILMKAYQYKSSFLLLASALVITDFFGHLINGTIAVFVYASDKFDKSNILCSIFGICMVFSGLCPLFLGSLMAIERCIGVTKPIHSTKITTKHVKMMLSGVCFFAVFVALLPILGHRDYKIQASRTWCFYK--TDEIKDWEDRFYLLLFAFLGLLALGISFVCNAITGISLLKVKFRSQQHRQGRSHHFEMVIQLLGIMCVSCICWSPFLVTMASIGMNIQDKDSCERTLFTLRMATWNQILDPWVYILLRKAVLRNLYVCTRRCCGVHVISLHVWELKVAAISDLPVT >GALR_RAT ----MELAPVNLSEGNGSDPEPP-AEPRPLFGIGVENFITLVVFGLIFAMGVLGNSLVITVLARSKPPRSTTNLFILNLSIADLAYLLFCIPFQATVYALPTWVLGAFICKFIHYFFTVSMLVSIFTLAAMSVDRYVAIVHSRSSSLRVSRNALLGVGFIWALSIAMASPVAYYQRLF-SN--QTFCWEH---WPNQLHKKAYVVCTFVFGYLLPLLLICFCYAKVLNHLHKKLK--NMSKKSEASKKKTAQTVLVVVVVFGISWLPHHVIHLWAEFGAFPPASFFFRITAHCLAYSNSSVNPIIYAFLSENFRKAYKQVFKCRVCNESPHGDAKEPPSTNCTHV--- >OPSU_BRARE 
MNGTEGPAFYVPMSNATGVVRSPYEYPQYYLVAPWAYGFVAAYMFFLIITGFPVNFLTLYVTIEHKKLRTPLNYILLNLAIADLFMVFGGFTTTMYTSLHGYFVFGRLGCNLEGFFATLGGEMGLKSLVVLAIERWMVVCKPVSNFRFGENHAIMGVAFTWVMACSCAVPPLVGWSRYIPEGMQCSCGVDYYTRTPGVNNESFVIYMFIVHFFIPLIVIFFCYGRLVCTVKEAARQQQESETTQRAEREVTRMVIIMVIAFLICWLPYAGVAWYIFTHQGSEFGPVFMTLPAFFAKTSAVYNPCIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA >GPR6_RAT GPPAASAALGGGGGPNGSLELSSQLPAGPSGLLLSAVNPWDVLLCVSGTVIAGENALVVALIASTPALRTPMFVLVGSLATADLLAG-CGLILHFVFQ-Y--VVPSETVSLLMVGFLVASFAASVSSLLAITVDRYLSLYNALYYSRRTLLGVHLLLAATWTVSLGLGLLPVLGWNCL-A--DRASCSVV-------RPLTRSHVALLSTSFFVVFGIMLHLYVRICQVVWRHIALHCLAPPHLAATRKGVGTLAVVLGTFGASWLPFAIYCVVGSQ----EDPAIYTYATLLPATYNSMINPIIYAFRNQEIQRALWLLFCGCFQSKVPFRSRSP------------ >ACM3_CHICK DSPETTESFPFSTVETTNSSLNATIKDPLGGHAVWQVVLIAFLTGIIALVTIIGNILVIVSFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMGHWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWIISFVLWAPAILFWQYFV-VP-LDECFIQ------FLSEPIITFGTAIAAFYLPVTIMSILYWRIYKETEKRKTRTKRKRMSLIKEKKAAQTLSAILFAFIITWTPYNIMVLVNTFCDC--VPKTVWNLGYWLCYINSTVNPVCYALCNKMFRNTFKMLLLCQCDKRKRRKQQYQHKRIPREAS--- >NY5R_CANFA -------------NTAATRNSDFPVWDDYKSSVDDLQYFLIGLYTFVSLLGFMGNLLILMALMRKRNQKTMVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKVMCHIMPFLQCVSVLVSTLILISIAIVRYHMIKHPI-SNNLTANHGYFLIATVWTLGFAICSPLPVFHSLVESS--RYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRSISCGVHDNRSIMRIKKRSRSVFYRLTILILVFAVSWMPLHLFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLISLIQCLHMS--------------------- >MC5R_SHEEP -MNSSFHLHFLDLGLNATEGNLSGLSVRNASSPCEDMGIAVEVFLALGLISLLENILVIGAIVRNRNLHIPMYFFVGSLAVADMLVSLSNFWETITIYLLTNDASVRHLDNVFDSMICISVVASMCSLLAIAVDRYVTIFCRLYQRIMTGRRSGAIIAGIWAFCTSCGTVFIVYYY----------------------EESTYVVVCLIAMFLTMLLLMASLYTHMFLLARTH-RIPGHSSVRQRTGVKGAITLAMLLGVFIICWAPFFLHLILMISCPQNSCFMSHFNMYLILIMCNSVIDPLIYAFRSQEMRKTFKEIVCFQGFRTPCRFPSTY------------ >V2R_RAT 
MLLVSTVSAVPGLFSPPSSPSNSSQEELLDDRDPLLVRAELALLSTIFVAVALSNGLVLGALIRRGRRWAPMHVFISHLCLADLAVALFQVLPQLAWDATDRFHGPDALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMAYRHGGGARWNRPVLVAWAFSLLLSLPQLFIFAQRDSG--VFDCWAR---FAEPWGLRAYVTWIALMVFVAPALGIAACQVLIFREIHASGRRPSEGAHVSAAMAKTVRMTLVIVIVYVLCWAPFFLVQLWAAWDPEA-LERPPFVLLMLLASLNSCTNPWIYASFSSSVSSELRSLLCCAQRHTTHSLGPQDSSLMKDTPS--- >EDG2_MOUSE QPQFTAMNEQQCFYNESIAFFYNRSGKYLATEWNTVSKLVMGLGITVCVFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAG-LAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTSLTASVANLLAIAIERHITVFRMQLHTRMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCI-C--DIDHCSNM------APLYSDSYLVFWAIFNLVTFVVMVVLYAHIFGYVRQRRMSSSGPRRNRDTMMSLLKTVVIVLGAFIVCWTPGLVLLLLDVCCPQC-DVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILCCQRNENPNGPTEGSNHTILAGVHSND >EDG1_RAT VKALRSQVSDYGNYDIIVRHYNYTGKLNIGVEKDHGIKLTSVVFILICCLIILENIFVLLTIWKTKKFHRPMYYFIGNLALSDLLAG-VAYTANLLLSGATTYKLTPAQWFLREGSMFVALSASVFSLLAIAIERYITMLKMKLHNGSNSSRSFLLISACWVISLILGGLPIMGWNCI-S--SLSSCSTV------LPLYHKHYILFCTTVFTLLLLSIVILYCRIYSLVRTRLTFISKASRSSEKSLALLKTVIIVLSVFIACWAPLFILLLLDVGCKAKCDILYKAEYFLVLAVLNSGTNPIIYTLTNKEMRRAFIRIISCCKCPNGDSAGKFKEFSRSKSDNSSH >CKR8_HUMAN DYTLDLSVTTVTDYYYPDIFSSPC---DAELIQTNGKLLLAVFYCLLFVFSLLGNSLVILVLVVCKKLRSITDVYLLNLALSDLLFVFSFPFQTYYLLDQ--WVFGTVMCKVVSGFYYIGFYSSMFFITLMSVDRYLAVVHAVALKVRTIRMGTTLCLAVWLTAIMATIPLLVFYQVAS-ED-VLQCYSF-YNQQTLKWKIFTNFKMNILGLLIPFTIFMFCYIKILHQL---------KRCQNHNKTKAIRLVLIVVIASLLFWVPFNVVLFLTSLHSMHQQLTYATHVTEIISFTHCCVNPVIYAFVGEKFKKHLSEIFQKS-CSQIFNYLGRQKS------SSCQ >B1AR_MACMU LPDGVATAARLLVPASPPASLLPPASEGPEPLSQQWTAGMGLLMALIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARGLVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHREL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRNAFQRLLCCARRAARRRHAAHGCLARPGPPPSPG >DBDR_.ENLA 
FQHLDSDQVASWQSPEMLMNKSVSRESQRRKELVAGQIVTGSLLLLLIFWTLFGNILVCTAVMRFRHRSRVTNIFIVSLAVSDLLVALLVMPWKAVAEVAGHWPFG-AFCDIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTQRVALLMISTAWALSVLISFIPVQLSWHKSDH-STGNCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAQIQSCSQTSLRTSIKKETKVLKTLSIIMGVFVCCWLPFFILNCMVPFCDRSCVSETTFDIFVWFGWANSSLNPIIYAFNA-DFRKVFSSLLGCGHWCSTTPVETVNYNQDTLFHK--- >O00155 PTEPWSPSPGSAPWDYSGLDGLEELELCPAGDLPYGYVYIPALYLAAFAVGLLGNAFVVWLLAGRRGPRRLVDTFVLHLAAADLGFVLTLPLWAAAAARRP-WPFGDGLCKLSTFALAGTRSAGALLLAGMSVDRYLAVVKLLARPLRTPRCAVASCCGVWAVALLAGLPSLVYRGLQPLPG-DSQCGE-----EPSHAFQGLSLLLLLLTFVLPLVVTLFCYCRISRRL-------RRPPHVGRARRNSLRIIFAIESTFVGSWLPFSALRAVFHLARLGLALRWGLTIATCLAFVNSCANPLIYLLLDRSFRARALDGACGRTGRLARRISSASSVFRCRAQAANT >NY4R_HUMAN HLLALLLPKSPQGENRSKPLGTPYNFSEHCQDSVDVMVFIVTSYSIETVVGVLGNLCLMCVTVRQKEKANVTNLLIANLAFSDFLMCLLCQPLTAVYTIMDYWIFGETLCKMSAFIQCMSVTVSILSLVLVALERHQLIINPT-GWKPSISQAYLGIVLIWVIACVLSLPFLANSILENAD--KVVCTES---WPLAHHRTIYTTFLLLFQYCLPLGFILVCYARIYRRLQRQGRVKGTYSLRAGHMKQVNVVLVVMVVAFAVLWLPLHVFNSLEDWHHEACHGNLIFLVCHLLAMASTCVNPFIYGFLNTNFKKEIKALVLTCQQSAPLEES---TVHTEVSKGSLR >SSR2_MOUSE QLNGSQVWVSSPFDLNGSLGPSNGSNQTEPYYDMTSNAVLTFIYFVVCVVGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMINVAVWCVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYAFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSVAISPALKGMFDFVVILTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGTEDGERSDLNETTETQRTLL >EDG2_SHEEP QPQFTAMNEPQCFYNESIAFFYNRSGKYLATEWNTVSKLVMGLGITVCIFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAG-LAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTTVTASVANLLAIAIERHITVFRMQLHTRMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCI-C--DIENCSNM------APLYSDSYLVFWAIFNLVTFVVMVVLYAHIFGYVRQRRMSSSGPRRNRDTMMSLLKTVVIVLGAFIICWTPGLVLLLLDVCCPQC-DVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILCCQRSENTSGPTEGSNHTILAGVHSND >5H6_MOUSE 
-----------MVPEPGPVNSSTPAWGPGPPPAPGGSGWVAAALCVVIVLTAAANSLLIALICTQPALRNTSNFFLVSLFTSDLMVGLVVMPPAMLNALYGRWVLARGLCLLWTAFDVMCCSASILNLCLISLDRYLLILSPLYKLRMTAPRALALILGAWSLAALASFLPLLLGWH---E-APGQCRLL--------ASLPYVLVASGVTFFLPSGAICFTYCRILLAARKQMESRRLTTKHSRKALKASLTLGILLSMFFVTWLPFFVASIAQAVCDC--ISPGLFDVLTWLGYCNSTMNPIIYPLFMRDFKRALGRFVPCVHCPPEHRASPASSGARPGLSLQQV >ACTR_CAVPO -------------MKHIIHASGNVNGTARNNSDCPHVALPEEIFFIISITGVLENLIIILAVIKNKNLQFPMYFFICSLAISDMLGSLYKILESILIMFRNMGSFETTTDDIIDTMFILSLLGSIFSLLAIAVDRYITIFHALYHSIVTMHRTIAVLSIIWTFCIGSGITMVLFFS----------------------HHHVPTVLTFTSLFPLMLVFILCLYVHMFLMARSH---ARNISTLPRGNMRGAITLTILLGVFIFCWAPFILHILLVTFCPNNTCYISLFHVNGMLIMCNAVIDPFIYAFRSPELRSAFRRMISYSKCL--------------------- >OPSD_ALLMI MNGTEGPDFYIPFSNKTGVVRSPFEYPQYYLAEPWKYSALAAYMFMLIILGFPINFLTLYVTVQHKKLRSPLNYILLNLAVADLFMVLGGFTTTLYTSMNGYFVFGVTGCYFEGFFATLGGEVALWCLVVLAIERYIVVCKPMSNFRFGENHAIMGVVFTWIMALTCAAPPLVGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFAIPLAVIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVSFLICWVPYASVAFYIFSNQGSDFGPVFMTIPAFFAKSSAIYNPVIYIVMNKQFRNCMITTLCCGKNPLGDDETATGTSSVSTSQVSPA >OPSD_RANTE MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWKYSILAAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTITLYTSLHGYFVFGQSGCYFEGFFATLGGEIALWSLVALAIERYIVVCKPMSNFRFGENHAMMGVAFTWIMALACAVPPLFGWSRYIPEGMQCSCGVDYYTLKPEINNESFVIYMFVVHFLIPLIIITFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVIFFLICWVPYAYVAFYIFCNQGSEFGPIFMTVPAFFAKSSAIYNPVIYIMLNKQFRNCMITTLCCGKNPFGDDDASSAATSVSTSQVSPA >A2AC_CAVPO GPNASGAGEG----GGGVNASGAVWGPPPSQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAIISFPPLVSFYR-------PRCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKLRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFSYSLYGICREAQLPTPLFKFFFWIGYCNSSLNPVIYTIFNQDFRRSFKHILFRRRRRGFRQ----------------- >PE23_RAT 
---------MAGVWAPEHSVEAHSNQS---SAADGCGSVSVAFPITMMVTGFVGNALAMLLVVRSYRRKKSFLLCIGWLALTDLVGQLLTSPVVILVYLSQRLDPSGRLCTFFGLTMTVFGLSSLLVASAMAVERALAIRAPHYASHMKTRAT-PVLLGVWLSVLAFALLPVLGVG----RYSGTWCFISNETDSAREPGSVAFASAFACLGLLALVVTFACNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNQMSEKECNSFLIAVRLASLNQILDPWVYLLLRKILLRKFCQIRDHTN-YASSSTSLPCWSDQLER----- >OPSF_ANGAN MNGTEGPNFYVPMSNVTGVVRSPFEYPQYYLAEPWAYSALAAYMFFLIIAGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVFGPTGCNIEGFFATLGGEIALWCLVVLAVERWMVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLFGWSRYIPEGMQCSCGMDHYAPNPETYNESFVIYMFICHFTIPLTVISFCYGRLVCTVKEATAQQQESETTQRAEREVTRMVIIMVISFLVCWVPYASVAWYIFTHQGSSFGPIFMTIPAFFAKSSSLYNPLIYICMNKQSRNCMITTLCCGKNPFEEEEGASTASSVSS--VSPA >OPSR_ORYLA NE--DTTRGSAFTYTNSNHTRDPFEGPNYHIAPRWVYNLATLWMFFVVVLSVFTNGLVLVATAKFKKLRHPLNWILSNLAIADLGETVFASTISVCNQFFGYFILGHPMCVFEGYVVSTCGIAALWSLTIISWERWVVVCKPFGNVKFDAKWAIGGIVFSWVWSAVWCAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVQSYMIVLMITCCIIPLAIIILCYLAVWLAIRAVAMQQKESESTQKAEREVSRMVVVMIVAYCVCWGPYTFFACFAAANPGYAFHPLAAAMPAYFAKSATIYNPVIYVFMNRQFRTCIMQLFGKQVDDG-----SEVSS------VAPA >CKR5_CERAE ----MDYQVSSPTYDIDNYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLLFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPRIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >D2DR_CERAE ---MDPLNLSWYDDDLERQNWSRPFNGSDGKADRPHYNYYATLLTLLIAVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWKFSKIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMIAIVWVLSFTISCPLLFGLN---Q----NECII---------ANPAFVVYSSIVSFYVPFIVTLLVYIKIYIVLRRRRTSMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN-IPPVLYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFLKILHC------------------------- >AA3R_SHEEP 
-------------------------MPVNSTAVSWTSVTYITVEILIGLCAIVGNVLVIWVVKLNPSLQTTTFYFIVSLALADIAVGVLVMPLAIVISLG--VTIHFYSCLFMTCLMLIFTHASIMSLLAIAVDRYLRVKLTVYRRVTTQRRIWLALGLCWLVSFLVGLTPMFGWN-MKA-D-FLPCRFR-----SVMRMDYMVYFSFFLWILVPLVVMCAIYFDIFYIIRNR--SSRETGAFYGREFKTAKSLLLVLFLFALCWLPLSIINCILYFDGQ--VPQTVLYLGILLSHANSMMNPIVYAYKIKKFKETYLLILKACVMCQPSKSMDPS------------ >IL8B_RAT SGDIDSYN-YSSDPPFTLSDAAPC----PSANLDINRYAVVVIYVLVTLLSLVGNSLVMLVILYNRSTCSVTDVYLLNLAIADLFFALTLPVWAASKVNG--WIFGSFLCKVFSFLQEITFYSSVLLLACISMDRYLAIVHATSTLIQKRHLVKFVCITMWFLSLVLSLPIFILRTTVK-PS-TVVCYEN-IGNNTSKWRVVLRILPQTYGFLLPLLIMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLVFLLCWLPYNIVLFTDTLMRTKNEINKALEATEILGFLHSCLNPIIYAFIGQKFRHGLLKIMANYGLVSKEFLAKEG------------ >NY5R_MOUSE ASPAWEDYRGTENNTSAARNTAFPVWEDYRGSVDDLQYFLIGLYTFVSLLGFMGNLLILMAVMKKRNQKTTVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKAMCHIMPFLQCVSVLVSTLILISIAIVRYHMIKHPI-SNNLTANHGYFLIATVWTLGFAICSPFPVFHSLVESS--KYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRSISCGAQEKRSLTRIKKRSRSVFYRLTILILVFAVSWMPLHVFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLRALIHCLHMS--------------------- >OPS3_DROPS ARLSAESRLLGWNVPPDELRHIPEHWLIYPEPPESMNYLLGTLYIFFTVISMIGNGLVMWVFSAAKSLRTPSNILVINLAFCDFMMMIKTPIFIYNSFHQG-YALGHLGCQIFGVIGSYTGIAAGATNAFIAYDRYNVITRPM-EGKMTHGKAIAMIIFIYLYATPWVVACYTESWGRFPEGYLTSCTFD--YLTDNFDTRLFVACIFFFSFVCPTTMITYYYSQIVGHVFSHNVDSNVDKSKEAAEIRIAKAAITICFLFFASWTPYGVMSLIGAFGDKTLLTPGATMIPACTCKMVACIDPFVYAISHPRYRMELQKRCPWLAISEKAPESRAAEQQQTTAA---- >B1AR_MOUSE LPDGAATAARLLVLASPPASLLPPASEGSAPLSQQWTAGMGLLVALIVLLIVVGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARALVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRAACRRRAAHGCLARAGPPPSPG >O6A1_HUMAN 
------------MEWRNHSGRVSEFVLLGFPAPVPIQVILFALLLLAYVLVLTENTLIIMAIRNHSTLHKPMYFFLANMSFLEIWYVTVTIPKMLAGFVGSKQLISFEGCMTQLYFFLGLGCTECVLLAVMANDRYMAICYLLNPVIVSGRLCVQMAAGSWAGGFGISMVKVFLISGL---SNHFFCDVSNLSCTDMSTAELTDFILAIFILLGPLSVTGASYVAITGAV--------MHIPSAAGRYKAFSTCASHFNVVIIFYAASIFIYARPKALS---AFDTNKLVSVLYAVIVPLLNPIIYCLRNQEVKRALCCILHLYQHQDPDPKKGSR------------ >GPR8_HUMAN PLDSRGSFSLPTMGANVSQDNGTGHNATFSEPLPFLYVLLPAVYSGICAVGLTGNTAVILVILRAPKMKTVTNVFILNLAVADGLFTLVLPVNIAEHLLQY-WPFGELLCKLVLAVDHYNIFSSIYFLAVMSVDRYLVVLATVHMPWRTYRGAKVASLCVWLGVTVLVLPFFSFAGVYS-LQ-VPSCGLS-FPWPERVWFKASRVYTLVLGFVLPVCTICVLYTDLLRRLRAVR--RSGAKALGKARRKVTVLVLVVLAVCLLCWTPFHLASVVALTTDLPPLVISMSYVITSLTYANSCLNPFLYAFLDDNFRKNFRSILRC------------------------- >CCKR_.ENLA SSTNGTHNLTTANWPPWNLNCTPILDRKKPSPSDLNLWVRIVMYSVIFLLSVFGNTLIIIVLVMNKRLRTITNSFLLSLALSDLMVAVLCMPFTLIPNLMENFIFGEVICRAAAYFMGLSVSVSTFNLVAISIERYSAICNPLSRVWQTRSHAYRVIAATWVLSSIIMIPYLVYK----DRRVGHQCRLV---WPSKQVQQAWYVLLLTILFFIPGVVMIVAYGLISRELYRGKMDINNSEAKLMAKKRVIRMLIVIVAMFFICWMPIFVANTWKAFDELSTLTGAPISFIHLLSYTSACVNPLIYCFMNKRFRKAFLGTFSSCIKP----CRNFRATGASLSKFSYT >ETBR_RAT SSAPAEVTKGGRVAGVPPRS-FPPPCQRKIEINKTFKYINTIVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINAYKLLAGDWPFGAEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAIGFDVITRV--LRVCMLNQKTAFMQFYKTAKDWWLFSFYFCLPLAITAIFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYDQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQTFE-EKQSLEFKANDHGYDNFR >EDG2_HUMAN QPQFTAMNEPQCFYNESIAFFYNRSGKHLATEWNTVSKLVMGLGITVCIFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAG-LAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTSLTASVANLLAIAIERHITVFRMQLHTRMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCI-C--DIENCSNM------APLYSDSYLVFWAIFNLVTFVVMVVLYAHIFGYVRQRRMSSSGPRRNRDTMMSLLKTVVIVLGAFIICWTPGLVLLLLDVCCPQC-DVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILCCQRSENPTGPTESSNHTILAGVHSND >5H1D_RAT 
SLPNQSLEGLPQEASNRSLNAT---GAWDPEVLQALRISLVVVLSIITLATVLSNAFVLTTILLTKKLHTPANYLIGSLATTDLLVSILVMPISIAYTTTRTWNFGQILCDIWVSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAAAMIAAVWAISICISIPPLFWR----H-E--SDCLVN-------TSQISYTIYSTCGAFYIPSILLIILYGRIYVAARSRKLALERKRISAARERKATKTLGIILGAFIICWLPFFVVSLVLPICRDSWIHPALFDFFTWLGYLNSLINPVIYTVFNEDFRQAFQRVVHFRKAS--------------------- >ET1R_BOVIN ELSFVVTTHQPTNLALPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRNDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPRT--HRTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRGSL-IALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDESFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPWKNHEQNNHNTE >GRE2_BALAM ----MSGGEASITGRTAPELN-ASAAPLDDERELGETVAATALLLAIILVTIVGNSLVIISVFTYRPLRSVQNFFVVSLAVADLTVALFVLPLNVAYRLLNQWLLGSYLCQMWLTCDILCCTSSILNLCVIALDRYWAITDPIYAQKRTIRRVNTMIAAVWALSLVISVPPLLGWNDW-T-E--TPCTLT--------QR-LFVVYSSSGSFFIPLIIMSVVYAKIFFATKRRSVHEEKQRISLSKERKAARVLGVIMGVFVVCWLPFFLMYAIVPFCTNCPPSQRVVDFVTWLGYVNSSLNPIIYTIYNKDFRTAFSRLLRCDRRMSA------------------- >LSHR_MOUSE ENELSGWDYDYDFCSPKTLQCTPEPDAFNPCEDIMGYAFLRVLIWLINILAIFGNLTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDSQTKGWQTG-SGCSAAGFFTVFASELSVYTLTVITLERWHTITYAVLDQKLRLRHAIPIMLGGWIFSTLMATLPLVGVSSY----KVSICLPM-----VESTLSQVYILSILLLNAVAFVVICACYVRIYFAVQNP------ELTAPNKDTKIAKKMAILIFTDFTCMAPISFFAISAAFKVPLITVTNSKVLLVLFYPVNSCANPFLYAVFTKAFQRDFFLLLSRFGCCKHRAELYRRFNSKNGFPRSSK >MSHR_MOUSE STQEPQKSLLGSLNSN--ATSHLGLATNQSEPWCLYVSIPDGLFLSLGLVSLVENVLVVIAITKNRNLHSPMYYFICCLALSDLMVSVSIVLETTIILLLEVVALVQQLDNLIDVLICGSMVSSLCFLGIIAIDRYISIFYALYHSIVTLPRARRAVVGIWMVSIVSSTLFITYYY----------------------KKHTAVLLCLVTFFLAMLALMAILYAHMFTRACQHIAQKRRRSIRQGFCLKGAATLTILLGIFFLCWGPFFLHLLLIVLCPQHSCIFKNFNLFLLLIVLSSTVDPLIYAFRSQELRMTLKEVLLCSW----------------------- >5H5A_RAT 
LPINLTSFSLSTPSTLEPNRSDTEALRTSQSFLSAFRVLVLTLLGFLAAATFTWNLLVLATILRVRTFHRVPHNLVASMAISDVLVAVLVMPLSLVHELSGRWQLGRRLCQLWIACDVLCCTASIWNVTAIALDRYWSITRHLYTLRARKRVSNVMILLTWALSAVISLAPLLFGWGE-S-E-SEECQVS--------REPSYTVFSTVGAFYLPLCVVLFVYWKIYKAAKFRATVTEGDTWREQKEQRAALMVGILIGVFVLCWFPFFVTELISPLCSW-DIPALWKSIFLWLGYSNSFFNPLIYTAFNRSYSSAFKVFFSKQQ----------------------- >BRS3_HUMAN QTLISITNDTESSSSVVSNDNTNKGWSGDNSPGIEALCAIYITYAVIISVGILGNAILIKVFFKTKSMQTVPNIFITSLAFGDLLLLLTCVPVDATHYLAEGWLFGRIGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLRQPSNAILKTCVKAGCVWIVSMIFALPEAIFSNVYTMT--FESCTSY---VSKKLLQEIHSLLCFLVFYIIPLSIISVYYSLIARTLYKSIPTQSHARKQIESRKRIARTVLVLVALFALCWLPNHLLYLYHSFTSQTAMHFIFTIFSRVLAFSNSCVNPFALYWLSKSFQKHFKAQLFCCKAERPEPPVADTMGTVPGTGSIQM >OPSD_CATBO GGGF-GNQTVVDKVPPEMLHLVDAHWYQFPPMNPLWHAILGFVIGILGMISVIGNGMVIYIFTTTKSLRTPSNLLVINLAISDFLMMLSMSPAMVINCYYETWVLGPLVCELYGLTGSLFGCGSIWTMTMIAFDRYNVIVKGLSAKPMTINGALLRILGIWFFSLGWTIAPMFGWNRYVPEGNMTACGTD--YLTKDLLSRSYILVYSFFCYFLPLFLIIYSYFFIIQAVAAHMNVRSAENQSTSAECKLAKVALMTISLWFMAWTPYLVINYAGIFETVK-INPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFQRFPSLACSSGPAG-ADTTEGTEKPAA--- >GASR_PRANA GSSLCHPGVSLLNSSSAGNLSCEPPRIRGTGTRELELAIRITLYAVIFLMSIGGNMLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAVSYLMGVSVSVSTLNLVAIALERYSAICRPLARVWQTRSHAARVILATWLLSGLLMVPYPVYTVVQP---V-LQCMHR---WPSARVRQTWSVLLLMLLFFIPGVVMAVAYGLISRELYLGTPGASANQAKLLAKKRVVRMLLVIVLLFFLCWLPIYSANTWCAFDGPGALSGAPISFIHLLSYASACVNPLVYCFMHRRFRQACLDTCARCCPRPPRARPRPLPSIASLSRLSYT >P2Y3_CHICK ----------------MSMANFTGGRNSCTFHEEFKQVLLPLVYSVVFLLGLPLNAVVIGQIWLARKALTRTTIYMLNLAMADLLYVCSLPLLIYNYTQKDYWPFGDFTCKFVRFQFYTNLHGSILFLTCISVQRYMGICHPLWHKKKGKKLTWLVCAAVWFIVIAQCLPTFVFASTG-----RTVCYDL-SPPDRSTSYFPYGITLTITGFLLPFAAILACYCSMARILCQK---ELIGLAVHKKKDKAVRMIIIVVIVFSISFFPFHLTKTIYLIVRSSQAFAIAYKCTRPFASMNSVLDPILFYFTQRKFRESTRYLLDKMSSKWRQDHCISY------------ >AG2R_.ENLA 
-----MSNASTVETSDVERIAVNC---SKSGMHNYIFIAIPIIYSTIFVVGVFGNSMVVIVIYSYMKMKTVASIFLMNLALSDLCFVITLPLWAAYTAMHYHWPFGNFLCKVASTAITLNLYTTVFLLTCLSIDRYSAIVHPMSRIWRTAMVARLTCVGIWLVAFLASMPSIIYRQIYL-TN--TVCAIV-YDSGHIYFMVGMSLAKNIVGFLIPFLIILTSYTLIGKTLKEV------YRAQRARNDDIFKMIVAVVLLFFFCWIPYQVFTFLDVLIQMDDIVDTGMPITICIAYFNSCLNPFLYGFFGKNFRKHFLQLIKYIPPKMRTHASVNTSLSD-------T >FSHR_PIG FDTMYSEFDYDLCNEVVDVICSPEPDTFNPCEDIMGHDILRVLIWFISILAITGNIIVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKTWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLQCKVQLRHAASIMLVGWIFAFTVALFPIFGISSY----KVSICLPM-----IDSPLSQLYVVSLLVLNVLAFVVICGCYTHIYLTVRNP------NIMSSSSDTKIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKSKILLVLFYPINSCANPFLYAIFTKNFRRDVFILLSKFGCYEMQAQTYRTNIHPRNGHCPPA >MSHR_DAMDA PVLGSQRRLLGSLNCTPPATFPLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >ETBR_HORSE LSAPPQMPKAGRTAGAQRRTLPPPPCERTIEIKETFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINVYKLLAEDWPFGVEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAVGFDMITRI--LRICLLHQKTAFMQFYKNAKDWWLFSFYFCLPLAITAFFYTLETCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKHTLYDQSFLLVLEYIGINMASLNSCINPIALYLVSKRFKNCFKWCLCCWCQSFE-EKQSLEFKANDHGYDNFR >MSHR_CAPHI PALGSPRRLLGSLNCTPPATLPLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICSSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSVLSITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >AG2S_RAT 
------MTLNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNHLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLMAGLASLPAVIYRNVYF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFVFPFLIILTSYTLIWKALKKA----YKIQKNTPRNDDIFRIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPTAKSHAGLSTRPSD-------N >D5DR_FUGRU NFYNETEPTEPRGGVDPLRVVTAAEDVPAPVGGVSVRALTGCVLCALIVSTLLGNTLVCAAVIKFRHRSKVTNAFVVSLAVSDLFVAVLVMPWRAVSEVAGVWLFG-RFCDTWVAFDIMCSTASILNLCVISMDRYWAISNPFYERRMTRRFAFLMIAVAWTLSVLISFIPVQLNWHRASS-EQGDCNAS--------LNRTYAISSSLISFYIPVLIMVGTYTRIFRIAQTQRASESALKTSFKRETKVLKTLSVIMGVFVFCWLPFFVLNCVVPFCDVDCVSDTTFNIFVWFGWANSSLNPVIYAFNA-DFRKAFTTILGCSKFCSSSAVQAVDYHHDTTLQK--- >PI2R_BOVIN ------------------------MADSCRNLTYVRDSVGPATSTLMFVAGVVGNGLALGILGARRHRPSAFAVLVTGLGVTDLLGTCFLSPAVFAAYARNSARGRPALCDAFAFAMTFFGLASTLILFAMAVERCLALSHPYYAQLDGPRRARLALPAIYAFCTIFCSLPFLGLGQHQQYCPGSWCFIR---MRSAEPGGCAFLLAYASLVALLVAAIVLCNGSVTLSLCRMQRRRCPRPRAGEDEVDHLILLALMTGIMAVCSLPLTPQIRGFTQAIAPDSSEMGDLLAFRFNAFNPILDPWVFILFRKSVFQRLKLWFCCLYSRPAQGDSRTSRKDSSAPPALEG >A1AA_RABIT ----------MVFLSGNASDSSNCT-HPPAPVNISKAILLGVILGGLILFGVLGNILVILSVACHRHLHSVTHYYIVNLAVADLLLTSTVLPFSAIFEILGYWAFGRVFCNIWAAVDVLCCTASIISLCVISIDRYIGVSYPLYPTIVTQRRGLRALLCVWAFSLVISVGPLFGWRQP--AP--TICQIN--------EEPGYVLFSALGSFYVPLTIILAMYCRVYVVAKREAKNFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPDFKPPETVFKIVFWLGYLNSCINPIIYPCSSQEFKKAFQNVLKIQCLRRKQSSKHALSQ---------- >OPS4_DROPS SSGSDELQFLGWNVPPDQIQYIPEHWLTQLEPPASMHYMLGVFYIFLFFASTLGNGMVIWIFSTSKSLRTPSNMFVLNLAVFDLIMCLKAPIFIYNSFHRG-FALGNTWCQIFASIGSYSGIGAGMTNAAIGYDRYNVITKPM-NRNMTFTKAVIMNIIIWLYCTPWVVLPLTQFWDRFPEGYLTSCSFD--YLSDNFDTRLFVGTIFLFSFVVPTLMILYYYSQIVGHVFNHNVESNVDKSKETAEIRIAKAAITICFLFFVSWTPYGVMSLIGAFGDKSLLTPGATMIPACTCKLVACIEPFVYAISHPRYRMELQKRCPWLGVNEKSGEASSAQTQQTSAA---- >5H1F_RAT 
-------------MDFLNSSD-QNLTSEELLNRMPSKILVSLTLSGLALMTTTINCLVITAIIVTRKLHHPANYLICSLAVTDFLVAVLVMPFSIVYIVRESWIMGQGLCDLWLSVDIICCTCSILHLSAIALDRYRAITDAVYARKRTPRHAGITITTVWVISVFISVPPLFWR----S-R--DQCIIK-------HDHIVSTIYSTFGAFYIPLVLILILYYKIYRAARTLLKHWRRQKISGTRERKAATTLGLILGAFVICWLPFFVKELVVNICEKCKISEEMSNFLAWLGYLNSLINPLIYTIFNEDFKKAFQKLVRCRN----------------------- >NK2R_CAVPO ----MGACVIVTNTNISSGLESNTTGITAFSMPTWQLALWATAYLALVLVAVTGNATVTWIILAHQRMRTVTNYFIVNLALADLCMAAFNAAFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAIDRYMAIVHPF-QPRLSAPSTKAVIGGIWLVALALAFPQCFYST---GA---TKCVVAWPEDSRDKSLLLYHLVVIVLIYLLPLTVMFVAYSIIGLTLWRRRHQHGANLRHLQAKKKFVKTMVLVVVTFAICWLPYHLYFILGSFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNRRFRSGFRLAFRCCPWVTPTEE----HTPSFSLRVNRC >5H1A_RAT -MDVFSFGQGNNTTASQEPFGTGGNVTSISDVTFSYQVITSLLLGTLIFCAVLGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQVTCDLFIALDVLCCTSSILHLCAIALDRYWAITDPIYVNKRTPRRAAALISLTWLIGFLISIPPMLGWRT---DP--DACTIS--------KDHGYTIYSTFGAFYIPLLLMLVLYGRIFRAARFRKNEEAKRKMALARERKTVKTLGIIMGTFILCWLPFFIVALVLPFCESSHMPALLGAIINWLGYSNSLLNPVIYAYFNKDFQNAFKKIIKCKFCRR-------------------- >ACM2_PIG --------------MNNSTNSSNSGLALTSPYKTFEVVFIVLVAGSLSLVTIIGNILVMVSIKVNRHLQTVNNYFLFSLACADLIIGVFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIV-VE-DGECYIQ------FFSNAAVTFGTAIAAFYLPVIIMTVLYWHISRASKSRVKMPAKKKPPPSREKKVTRTILAILLAFIITWAPYNVMVLINTFCAPC-IPNTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMCHYKNIGATR---------------- >GPRV_HUMAN -----------------------MPFPNCSAPSTVVATAVGVLLGLECGLGLLGNAVALWTFLFRVRVWKPYAVYLLNLALADLLLAACLPFLAAFYLSLQAWHLGRVGCWALRFLLDLSRSVGMAFLAAVALDRYLRVVHPRKVNLLSPQAALGVSGLVWLLMVALTCPGLLISEAAQ--S--TRCHSF-YSRADGSFSIIWQEALSCLQFVLPFGLIVFCNAGIIRALQKR----LREPEKQPKLQRAQALVTLVVVLFALCFLPCFLARVLMHIFQNLCAVAHTSDVTGSLTYLHSVVNPVVYCFSSPTFRSSYRRVFHTLRGKGQAAEPPDF------------ >YT66_CAEEL 
----------------------------------------MGVTFHPGIVGNITNLMVLASRR-------LRAMYLRALAVADLLCMLFVLVFVSTEYLAKNKLYQIYQCHLMLTLINWALGAGVYVVVALSLERYISIVFPMFRTWNSPQRATRAIVIAFLIPAIFYVPYAITRYKGK---VTIYSMDD---IYTTFYWQIYKWTREAILRFLPIIILTVLNIQIMIAFRKRMFQNKRKEQGTQKDDTLMYMLGGTVLMSLVCNIPAAINLLLIDETLKKLDYQIFRAVANILEITNHASQFYVFCACSTDYRTTFLQKFPCFKTDYANRDRLRSVIQKQGSVEHTT >RGR_MOUSE ---------------------MAATRALPAGLGELEVLAVGTVLLMEALSGISLNGLTIFSFCKTPDLRTPSNLLVLSLALADTGISLNALVAAVSSLLRR-WPHGSEGCQVHGFQGFATALASICGSAAVAWGRYHHYCTRR---QLAWDTAIPLVLFVWMSSAFWASLPLMGWGHYDYEPVGTCCTLD--YSRGDRNFISFLFTMAFFNFLVPLFITHTSYRFME--------------QKFSRSGHLPVNTTLPGRMLLLGWGPYALLYLYAAIADVSFISPKLQMVPALIAKTMPTINAINYALHREMVCRGTWQCLSPQKSKKDRTQA--------------- >A2AA_PIG ----MGSLQPEAGNASWNGTEAPGGGARATPYSLQVTLTLVCLAGLLMLFTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKAWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISIEKKAQ-P--PRCEIN--------DQKWYVISSCIGSFFAPCLIMILVYVRIYQIAKRRRGGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLTAVGCS--VPPTLFKFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------ >PI2R_HUMAN ------------------------MADSCRNLTYVRGSVGPATSTLMFVAGVVGNGLALGILSARRPRPSAFAVLVTGLAATDLLGTSFLSPAVFVAYARNSARGGPALCDAFAFAMTFFGLASMLILFAMAVERCLALSHPYYAQLDGPRCARLALPAIYAFCVLFCALPLLGLGQHQQYCPGSWCFLR---MRWAQPGGAAFSLAYAGLVALLVAAIFLCNGSVTLSLCRMKRHLGPRPRTGEDEVDHLILLALMTVVMAVCSLPLTIRCFTQAVAPDS-SSEMGDLLAFRFYAFNPILDPWVFILFRKAVFQRLKLWVCCLCLGPAHGDSQTPRRDPRAPSAPVG >AA3R_CANFA -------------------------MAVNGTALLLANVTYITVEILIGLCAIVGNVLVIWVVKLNPSLQTTTFYFIVSLALADIAVGVLVMPLAIVISLG--ITIQFYNCLFMTCLLLIFTHASIMSLLAIAVDRYLRVKLTVYRRVTTQRRIWLALGLCWLVSFLVGLTPMFGWN-MKE-H-FLSCQFS-----SVMRMDYMVYFSFFTWILIPLVVMCAIYLDIFYVIRNK--NSKETGAFYGREFKTAKSLFLVLFLFAFSWLPLSIINCITYFHGE--VPQIILYLGILLSHANSMMNPIVYAYKIKKFKETYLLIFKTYMICQSSDSLDSS------------ >CB1R_MOUSE 
EFYNKSLSSFKENEDNIQCGENFMDMECFMILNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFVDFHVFHR-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRPKAVVAFCLMWTIAIVIAVLPLLGWNCK-K--LQSVCSDI------FPLIDETYLMFWIGVTSVLLLFIVYAYMYILWKAHSHSEDQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTVFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPSCEGTAQPLDNSMGHANNTASMHRAA >OPSD_CANFA MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGMQCSCGIDYYTLKPEINNESFVIYMFVVHFAIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSDFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMITTLCCGKNPLGDDEASASASKTETSQVAPA >BRS3_CAVPO QTLISITNDTESSSSVVSNDTTNKGWTGDNSPGIEALCAIYITYAVIISVGILGNAILIKVFFKTKSMQTVPNIFITSLALGDLLLLLTCVPVDATHYLAEGWLFGRIGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLRQPSNAILKTCAKAGCIWIMSMIFALPEAIFSNVHTMT--SEWCAFY---VSEKLLQEIHALLSFLVFYIIPLSIISVYYSLIARTLYKSIPTQSHARKQVESRKRIAKTVLVLVALFALCWLPNHLLNLYHSFTHKAAIHFIVTIFSRVLAFSNSCVNPFALYWLSKTFQKQFKAQLFCCKGELPEPPLAATMGRVSGTENTHI >OPSD_DIPAN MNGTEGPFFYVPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVLGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWTMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFICHFTIPLTVVFFCYGRLLCAVKEAAAAQQESETTQRAEKEVTRMVIMMVIAFLVCWLPYASVAWYIFTHQGSEFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >AG2R_MOUSE ------MALNSSTEDGIKRIQDDC---PRAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNHLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLMAGLASLPAVIHRNVYF-TN--TVCAFH-YESRNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFRIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSD-------N >YMJC_CAEEL 
NLRLNESPYKYVMSNNTTIPSCLTDRQMSLSVSSTEGVLIGTIIPILVLFGISGNILNLTVLLAPNL-RTRSNQLLACLAVADIVSLVVILPHSMAHYETFERKFYGKYKFQIIAMTNWSIATATWLVFVICLERLIIIKYPLPRNVVTIIVVTTFILTSYNHVSHACAEKLFCNGTQY--S-RWFRNEPPNSEFMKSVVRVAPQVNAIFVVLIPVVLVIIFNVMLILTLRQRKTISQFTQLQSKTEHKVTITVTAIVTCFTITQSPSAFVTFLSSYVH--RDWVTLSAICTILVVLGKALNFVLFCLSSASFRQRLLMQTKQGILRKSTRYTSVA------------ >OPSG_ASTFA NE--ETTRESAFVYTNANNTRDPFEGPNYHIAPRWVYNLASLWMIIVVIASIFTNSLVIVATAKFKKLRHPLNWILVNLAIADLGETVLASTISVFNQVFGYFVLGHPMCIFEGWTVSVCGITALWSLTIISWERWVVVCKPFGNVKFDGKWAAGGIIFAWTWAIIWCTPPIFGWSRYWPHGLKTSCGPDVFSGSEDPGVASYMVTLLLTCCILPLSVIIICYIFVWNAIHQVAQQQKDSESTQKAEKEVSRMVVVMILAFILCWGPYASFATFSALNPGYAWHPLAAALPAYFAKSATIYNPIIYVFMNRQFRSCIMQLFGKKVEDA-----SEVSTAS-------- >DUFF_HUMAN FEDVWNSSYGVNDSFPDGDYDANLEAAAPCHSCNLLDDSALPFFILTSVLGILASSTVLFMLFRPLFQLCPGWPVLAQLAVGSALFSIVVPVLAPGLGST--RSSALCSLGYCVWYGSAFAQALLLGCHASLGHRLGAGQ--------VPGLTLGLTVGIWGVAALLTLPVTLASG------SGGLCTLI-YSTELKALQATHTVACLAIFVLLPLGLFGAKGLKKA-------------------LGMGPGPWMNILWAWFIFWWPHGVVLGLDFLVRSKQALDLLLNLAEALAILHCVATPLLLALFCHQATRTLLPSLPLPEGWSSHLDTLGS------------ >OPR._RAT GSHFQGNLSLLN---ETVPHHLLLNASHSAFLPLGLKVTIVGLYLAVCIGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQGTDILLGF-WPFGNALCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALASVVGVPVAIMGSAQVEE---IECLVE-IPAPQDYWGPVFAICIFLFSFIIPVLIISVCYSLMIRRLRGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLVQGLGVQPETAVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCASSLHREMQVSDRVGLGCKTSETVPR >TRFR_SHEEP ------------MENETGSELNQTQLQPRAVVALEYQVVTILLVLIICGLGIVGNIMVVLVVMRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSIYCMLWFFLLDLN--DA-SCGYKIS------RNYYSPIYLMDFGVFYVVPMILATVLYGFIARILFLSLNSNRYFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPVEKPANYSVKESDHFSTELDD >ACTR_MESAU 
-------------MKHIITPYEHTNDTARNNSDCPDVVLPEEIFFTISIIGVLENLIVLLAVVKNKNLQCPMYFFICSLAISDMLGSLYKILENILIMFRNRGNFESTADDIIDCMFILSLLGSIFSLSVIAADRYITIFHALYHSIVTMRRTIITLTVIWIFCTGSGIAMVIFFS----------------------HHHVPTVLTFTSLFPLMLVFILCLYIHMFLLARSH---ARKISTLPRANMKGAITLTILLGVFIFCWAPFILHVLLMTFCPNNVCYMSLFQINGMLIMCNAVIDPFIYAFRSPELRDAFKKMFSCHRYQ--------------------- >CKR7_HUMAN QDEVTDDYIGDNTTVDYTLFESLC---SKKDVRNFKAWFLPIMYSIICFVGLLGNGLVVLTYIYFKRLKTMTDTYLLNLAVADILFLLTLPFWAYSAAKS--WVFGVHFCKLIFAIYKMSFFSGMLLLLCISIDRYVAIVQAVRHRARVLLISKLSCVGIWILATVLSIPELLYSDLQR-SE-AMRCSLI---TEHVEAFITIQVAQMVIGFLVPLLAMSFCYLVIIRTL---------LQARNFERNKAIKVIIAVVVVFIVFQLPYNGVVLAQTVANFNKQLNIAYDVTYSLACVRCCVNPFLYAFIGVKFRNDLFKLFKDLGCLSQEQLRQWS--------HIRR >O.YR_SHEEP GAFAANWSAEAVNGSAAPPGTEGNRTAGPPQRNEALARVEVAVLSLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLAVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLATWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLAACYGLISFKIWQNRAAVSNVKLISKAKIRTVKMTFIVVLAFIVCWTPFFFKQMWSVWDADA-KEASAFIIAMLLASLNSCCNPWIYMLFTGHLFQDLVQRFLCCSFRRLKGSQLGEHSYTFVLSRHSS >GPR7_HUMAN MDNASFSEPWPANASGPDPALSCSNASTLAPLPAPLAVAVPVVYAVICAVGLAGNSAVLYVLLRAPRMKTVTNLFILNLAIADELFTLVLPINIADFLLRQ-WPFGELMCKLIVAIDQYNTFSSLYFLTVMSADRYLVVLATARVAGRTYSAARAVSLAVWGIVTLVVLPFAVFARLD--QG-RRQCVLV-FPQPEAFWWRASRLYTLVLGFAIPVSTICVLYTTLLCRLHAMR--DSHAKALERAKKRVTFLVVAILAVCLLCWTPYHLSTVVALTTDLPPLVIAISYFITSLTYANSCLNPFLYAFLDASFRRNLRQLITCRAAA--------------------- >GPRJ_HUMAN TETATPLPSQYLMELSEEHSWMSNQTDLHYVLKPGEVATASIFFGILWLFSIFGNSLVCLVIHRSRRTQSTTNYFVVSMACADLLISVASTPFVLLQFTTGRWTLGSATCKVVRYFQYLTPGVQIYVLLSICIDRFYTIVYPL-SFKVSREKAKKMIAASWIFDAGFVTPVLFFYG---SNW-HCNYFLP-----SSWEGTAYTVIHFLVGFVIPSVLIILFYQKVIKYIWRIDGR-RTMNIVPRTKVKTIKMFLILNLLFLLSWLPFHVAQLWHPHEQDYKKSSLVFTAITWISFSSSASKPTLYSIYNANFRRGMKETFCMSSMKCYRSNAYTIKKNYVGISEIPS >OAJ1_HUMAN 
--MLLCFRFGNQSMKRENFTLITDFVFQGFSSFHEQQITLFGVFLALYILTLAGNIIIVTIIRIDLHLHTPMYFFLSMLSTSETVYTLVILPRMLSSLVGMSQPMSLAGCATQMFFFVTFGITNCFLLTAMGYDRYVAICNPLYMVIMNKRLRIQLVLGACSIGLIVAITQVTSVFRL---PPHFFCDIRKLSCIDTTVNEILTLIISVLVLVVPMGLVFISYVLIISTI--------LKIASVEGRKKAFATCASHLTVVIVHYSCASIAYLKPKSEN---TREHDQLISVTYTVITPLLNPVVYTLRNKEVKDALCRAVGGKFS---------------------- >RDC1_CANFA YAEPGNFSDISWPCNSSDCIVVDTVLCPNMPNKSVLLYTLSFIYIFIFVIGMIANSVVVWVNIQAKTTGYDTHCYILNLAIADLWVVVTIPVWVVSLVQHNQWPMGELTCKITHLIFSINLFGSIFFLTCMSVDRYLSITYFATSSRRKKVVRRAVCVLVWLLAFCVSLPDTYYLKTVTNNE--TYCRSFYPEHSVKEWLISMELVSVVLGFAIPFCVIAVFYCLLARAI---------SASSDQEKQSSRKIIFSYVVVFLVCWLPYHVVVLLDIFSILHNFLFTALHVTQCLSLVHCCVNPVLYSFINRNYRYELMKAFIFKYSAKTGLTKLIDEYSALEQNAK-- >CML1_RAT EYEGYNDSSIYGEEYSDGSDYIVDLEEAGPLEAKVAEVFLVVIYSLVCFLGILGNGLVIVIATFKMK-KTVNTVWFVNLAVADFLFNIFLPIHITYAAMDYHWVFGKAMCKISSFLLSHNMYTSVFLLTVISFDRCISVLLPVSQNHRSVRLAYMTCVVVWVWLSSESPPSLVFGHVST-SF--HSTHPR-TDPVGYSRHVAVTVTRFLCGFLIPVFIITACYLTIVFKL---------QRNRQAKTKKPFKIIITIIITFFLCWCPYHTLYLLELHHTAVSVFSLGLPLATAVAIANSCMNPILYVFMGHDFKK-FKVALFSRLVNALSEDTGPSFTKMSSLIEKAS >FSHR_EQUAS --MMYSEFDYDLCNEVVDVTCSPKPDAFNPCEDIMGYDILRVLIWFISILAITGNIIVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFGSELSVYTLTAITLERWHTITHAMLECKVQLRHAASVMLVGWIFGFGVGLLPIFGISTY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNP------NIVSSSSDTKIAKRMGILIFTDFLCMAPISFFGISASLKVALITVSKSKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEMQAQTYRTISHPKNGPCPPT >OPSD_MULSU MNGTEGPYFYIPMVNTTGIVRSPYDYPQYYLVNPAAYAALGAYMFFLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIALWSLVVLAVERWMVVCKPISNFRFGENHAIMGLAMTWLMASACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVVYMFCCHFMIPLIIVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIAFLVCWLPYASVAWWIFTHQGSEFGPVFMTIPAFFAKSSSIYNPMIYICMNKQFRNCMITTLCCGKNPFEEEEGASSSSVSSSSVSPAA >NK3R_HUMAN 
GNLSSSPSALGLPVASPAPSQPWANLTNQFVQPSWRIALWSLAYGVVVAVAVLGNLIVIWIILAHKRMRTVTNYFLVNLAFSDASMAAFNTLVNFIYALHSEWYFGANYCRFQNFFPITAVFASIYSMTAIAVDRYMAIIDPL-KPRLSATATKIVIGSIWILAFLLAFPQCLYSK---GR---TLCFVQ--WPEGPKQHFTYHIIVIILVYCFPLLIMGITYTIVGITLWGG--PCDKYHEQLKAKRKVVKMMIIVVMTFAICWLPYHIYFILTAIYQQLKYIQQVYLASFWLAMSSTMYNPIIYCCLNKRFRAGFKRAFRWCPFIKVSSY----TTRFHPNRQSSM >CKR5_CERTO ----MDYQVSSPTYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLLFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSPHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >PI2R_MOUSE DGHPGPPSVTPGSPLSAGGREWQGMAGSCWNITYVQDSVGPATSTLMFVAGVVGNGLALGILGARRRHPSAFAVLVTGLAVTDLLGTCFLSPAVFVAYARNSAHGGTMLCDTFAFAMTFFGLASTLILFAMAVERCLALSHPYYAQLDGPRCARFALPSIYAFCCLFCSLPLLGLGEHQQYCPGSWCFIR---MRSAQPGGCAFSLAYASLMALLVTSIFFCNGSVTLSLYHMRRHFVPTSRAREDEVYHLILLALMTVIMAVCSLPLMIRGFTQAIAPDS--REMGDLLAFRFNAFNPILDPWVFILFRKAVFQRLKFWLCCLCARSVHGDLQAPRRDPPAPTSLQA >FSHR_CHICK FGPVENEFDYGLCNEVVDFVCSPKPDAFNPCEDIMGYNVLRVLIWFINILAITGNTTVLIILISSQYKLTVPRFLMCNLAFADLCIGIYLLFIASVDIQTKSWQTG-AGCNAAGFFTVFASELSVYTLTVITLERWHTITYAMLNRKVRLRHAVIIMVFGWMFAFTVALLPIFGISSY----KVSICLPM-----IETPFSQAYVIFLLVLNVLAFVIICICYICIYFTVRNP------NVISSNSDTKIAKRMAILIFTDFLCMAPISFFAISASLRVPLITVSKSKILLVLFYPINSCANPFLYAIFTKTFRRDFFILLSKFGCCEMQAQIYRTNFHTRNGHYPTA >OPRM_BOVIN FSHLEGNLSDPCGPNRTELGGSDRLCPSAGSPSMITAIIIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGTILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDLRTPRNAKIINICNWILSSAIGLPVMFMATTKYGS---IDCTLT-FSHPTWYWENLLKICVFIFAFIMPILIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSTIEQQNSTRIPSTANTVDRTNH >P2Y5_HUMAN 
--------------------MVSVNSSHCFYNDSFKYTLYGCMFSMVFVLGLVSNCVAIYIFICVLKVRNETTTYMINLAMSDLLFVFTLPFRIFYFTTRN-WPFGDLLCKISVMLFYTNMYGSILFLTCISVDRFLAIVYPFSKTLRTKRNAKIVCTGVWLTVIGGSAPAVFVQSTHS---ASEACFENFPEATWKTYLSRIVIFIEIVGFFIPLILNVTCSSMVLKTLTKP----VTLSRSKINKTKVLKMIFVHLIIFCFCFVPYNINLILYSLVRTQAAVRTMYPITLCIAVSNCCFDPIVYYFTSDTIQNSIKMKNWSVRRSDFRFSEVHGNLQTLKSKIFDN >HM74_HUMAN ----------MNRHHLQDHFLEIDKKNCCVFRDDFIAKVLPPVLGLEFIFGLLGNGLALWIFCFHLKSWKSSRIFLFNLAVADFLLIICLPFVMDYYVRRSDWNFGDIPCRLVLFMFAMNRQGSIIFLTVVAVDRYFRVVHPHALNKISNWTAAIISCLLWGITVGLTVHLLKKKLLIQ-PA--NVCISF-----SICHTFRWHEAMFLLEFLLPLGIILFCSARIIWSLRQR------QMDRHAKIKRAITFIMVVAIVFVICFLPSVVVRIRIFWLLHTRSVDLAFFITLSFTYMNSMLDPVVYYFSSPSFPNFFSTLINRCLQRKMTGEPDNNTGDPNKTRGAPE >OPSD_DICLA MNGTEGPFFYVPMVNTTGIVRSPYDYPQYYLVSPAAYAALGAYMFLLILLGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNMEGFFATLGGEIGLWSLVVLAVERWLVVCKPISNFRFGENHAIMGLAFTWVMACSCAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFACHFIIPMCVVFFCYGRLLCAVKEAAAAQQESETTQRAEKEVTRMVVIMGIAFLICWCPYASVAWYIFTHQGSEFGPVFMTLPAFFAKTSSVYNPLIYILMNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >GPRW_HUMAN GTRGCSDRQPGVLTRDRSCSRKMNSSGCLSEEVGSLRPLTVVILSASIVVGVLGNGLVLWMTVFRMA-RTVSTVCFFHLALADFMLSLSLPIAMYYIVSRQ-WLLGEWACKLYITFVFLSYFASNCLLVFISVDRCISVLYPVALNHRTVQRASWLAFGVWLLAAALCSAHLKFRTTR-FNSNETAQIWI-----VVEGHIIGTIGHFLLGFLGPLAIIGTCAHLIRAKL---------LREGWVHANRPKRLLLVLVSAFFIFWSPFNVVLLVHLWRRV-PRMLLILQASFALGCVNSSLNPFLYVFVGRDFQEKFFQSLTSALARAFGEEEFLSPRE--------- >UL33_HSV6U --------------------MDTVIELSKLQFKGNASCTSTPTLKTARIMESAVTGITLTTSIPMIIKHNATSFYVITLFASDFVLMWCVFFMTVNRKQL--FSFNRFFCQLVYFIYHAVCSYSISMLAIIATIRYK-TLHRRKKTESKTSSTGRNIGILLLASSMCAIPTALFVKTNGM-KKTGKCVVYSSKKAY-ELFLAVKIVFSFIWGVLPTMVFSFFYVIFCKAL---------HDVTEKKYKKTLFFIRILLLSFLLIQIPYIAILICEIAFLYMARVEILQLIIRLMPQVHCFSNPLVYAFTGGELRNRFTACFQSFFPKTLCSTQKRKDQNSKSKASVEK >A1AB_HUMAN 
HNTSAPAHWGELKNANFTGPNQTSSNSTLPQLDITRAISVGLVLGAFILFAIVGNILVILSVACNRHLRTPTNYFIVNLAMADLLLSFTVLPFSAALEVLGYWVLGRIFCDIWAAVDVLCCTASILSLCAISIDRYIGVRYSLYPTLVTRRKAILALLSVWVLSTVISIGPLLGWKEP--AP--KECGVT--------EEPFYALFSSLGSFYIPLAVILVMYCRVYIVAKRTHNPIAVKLFKFSREKKAAKTLGIVVGMFILCWLPFFIALPLGSLFSTLKPPDAVFKVVFWLGYFNSCLNPIIYPCSSKEFKRAFVRILGCQCRGRRRRRRRRRTYRPWTRGGSLE >OPSD_SPHSP GG-Y-GNQTVVDKVLPEMLHLIDPHWYQFPPMNPLWHGLLGFVIGCLGFVSVVGNGMVIYIFSTTKGLRTPSNLLVVNLAFSDFLMMLSMSPPMVINCYYETWVLGPFMCELYALLGSLFGCGSIWTMVMIALDRYNVIVKGLAAKPMTNKTAMLRILGIWAMSIAWTVFPLFGWNRYVPEGNMTACGTD--YLNKEWVSRSYILVYSVFVYFLPLATIIYSYWFIVQAVSAHMNVRSAENANTSAECKLAKVALMTISLWFFAWTPYLVTDFSGIFEWGK-ISPLATIWCSLFAKANAVYNPIVYGISHPKYRAALNKKFPSLACASEPDDTASQSDEKSASA---- >ET1R_RAT EFNFLGTTLQPPNLALPSNGSMHGYCPQQTKITTAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRNDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSVQGIGIPLITAIEIVSIWILSFILAIPEAIGFVMVPRT--HRTCMLNATTKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRGSL-IALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDESFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCHQSKSLMTSVPWKNQEQN-HNTE >B1AR_PIG LPDGAATAARLLVPASPPASLLTPASEGSVQLSQQWTAGMGLLMALIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRAAR-ALVCTVWAISALVSFLPILMHWWRDR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRVARGSCAAAGCLAVARPPPSPG >OLF3_RAT -------------MDSSNRTRVSEFLLLGFVENKDLQPLIYGLFLSMYLVTVIGNISIIVAIISDPCLHTPMYFFLSNLSFVDICFISTTVPKMLVNIQTQNNVITYAGCITQIYFFLLFVELDNFLLTIMAYDRYVAICHPMYTVIMNYKLCGFLVLVSWIVSVLHALFQSLMMLAL---PPHYFCEPNQLTCSDAFLNDLVIYFTLVLLATVPLAGIFYSYFKIVSSI--------CAISSVHGKYKAFSTCASHLSVVSLFYCTGLGVYLSSAANN---SSQASATASVMYTVVTPMVNPFIYSLRNKDVKSVLKKTLCEEVIRSPPSLLHFFCFIFCY------ >5H5A_HUMAN 
LPVNLTSFSLSTPSPLETNHSGKDDLRPSSPLLSVFGVLILTLLGFLVAATFAWNLLVLATILRVRTFHRVPHNLVASMAVSDVLVAALVMPLSLVHELSGRWQLGRRLCQLWIACDVLCCTASIWNVTAIALDRYWSITRHMYTLRTRKCVSNVMIALTWALSAVISLAPLLFGWGE-S-E-SEECQVS--------REPSYAVFSTVGAFYLPLCVVLFVYWKIYKAAKFRATVPEGDTWREQKEQRAALMVGILIGVFVLCWIPFFLTELISPLCSC-DIPAIWKSIFLWLGYSNSFFNPLIYTAFNKNYNSAFKNFFSRQH----------------------- >OPRD_RAT FSLLANVSDTFPSAFPSASANASGSPGARSASSLALAIAITALYSAVCAVGLLGNVLVMFGIVRYTKLKTATNIYIFNLALADALATSTLPFQSAKYLMET-WPFGELLCKAVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPAKAKLINICIWVLASGVGVPIMVMAVTQPGA---VVCTLQ-FPSPSWYWDTVTKICVFLFAFVVPILIITVCYGLMLLRLRSVRLL-SGSKEKDRSLRRITRMVLVVVGAFVVCWAPIHIFVIVWTLVDINPLVVAALHLCIALGYANSSLNPVLYAFLDENFKRCFRQLCRAPCGGQEPGSLRRPRVTACTPSDGPG >CKRV_MOUSE EIPAVTEPSYNTVAKNDFMSGFLC---FSINVRAFGITVPTPLYSLVFIIGVIGHVLVVLVLIQHKRLRNMTSIYLFNLAISDLVFLSTLPFWVDYIMKGD-WIFGNAMCKFVSGFYYLGLYSDMFFITLLTIDRYLAVVHVVALRARTVTFGIISSIITWVLAALVSIPCLYVFKSQM-EF-YHTCRAILPRKSLIRFLRFQALTMNILGLILPLLAMIICYTRIINVL---------HRRPNKKKAKVMRLIFVITLLFFLLLAPYYLAAFVSAFEDVLQQVDLSLMITEALAYTHCCVNPVIYVFVGKRFRKYLWQLFRRHTAITLPQWLPFLA-------SARL >THRR_RAT PLEGRAVYLNKSRFPPMPPPPFISEDASGYLTSPWLTLFIPSVYTFVFIVSLPLNILAIAVFVFRMKVKKPAVVYMLHLAMADVLFVSVLPFKISYYFSGTDWQFGSGMCRFATAACYCNMYASIMLMTVISIDRFLAVVYPISLSWRTLGRANFTCVVIWVMAIMGVVPLLLKEQTTQ--N--TTCHDVLNETLLHGFYSYYFSAFSAIFFLVPLIISTVCYTSIIRCL------SSSAVANRSKKSRALFLSAAVFCIFIVCFGPTNVLLIVHYLLLSDETAYFAYLLCVCVTSVASCIDPLIYYYASSECQKHLYSILCCRESSDSNSCNSTGDTCS-------- >DADR_PIG --------------MRTLNTSTMDGTGLVVERDFSFRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLSWHKAGN-TTHNCDSS--------LSRTYAISSSLISFYIPVAIMIVTYTRIYRIAQKQAECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCGSGCIDSITFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSTLLGCYRLCPTSTNAIETAVVFSSH----- >GRHR_BOVIN 
--MANSDSPEQNENHCSAINSSIPLTPGSLPTLTLSGKIRVTVTFFLFLLSTIFNTSFLLKLQNWTQKLSRMKLLLKHLTLANLLETLIVMPLDGMWNITVQWYAGELLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITKPL-AVKSNSKLGQFMIGLAWLLSSIFAGPQLYIFGMIHEG--FSQCVTH--SFPQWWHQAFYNFFTFSCLFIIPLLIMVICNAKIIFTLTRVPHKNQSKNNIPRARLRTLKMTVAFATSFTVCWTPYYVLGIWYWFDPDMRVSDPVNHFFFLFAFLNPCFDPLIYGYFSL------------------------------------- >OLF3_CHICK -------------MASGNCTTPTTFILSGLTDNPGLQMPLFMVFLAIYTITLLTNLGLIRLISVDLHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERKTISYVGCILQYFSFVLLTTSECLLLAVMAYDRYVAICKPLYPAIMTKAVCWRLVESLYFLAFLNSLVHTCGLLKL---SNHFFCDISQISSSSIAISELLVIISGSLFVMSSIIIILISYVFIILTV--------VMIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRLTATTFGFIDSKAVQ-------------- >ML1A_PHOSU -------MKGNGSTLLNASQQAPGVGEGGGPRPSWLASTLAFILIFTIVVDILGNLLVILSVYRNKKLRNAGNIFVVSLAIADLVVAIYPYPLVLTSIFNNGWNLGYLHCQISAFLMGLSVIGSIFNITGIAINRYCYICHSLYDRLYSNKNSLCYVFLIWVLTLVAIMPNLQTGT-LQYDP-IYSCTFT------QSVSSAYTIAVVVFHFIVPMIIVIFCYLRIWILVLQVRR-PDSKPRLKPQDFRNFVTMFVVFVLFAICWAPLNFIGLIVASDPATRIPEWLFVASYYMAYFNSCLNAIIYGLLNQNFRQEYKRILVSLFTAKMCFVDSSNCKPAPLIANNNL >NY2R_HUMAN EEMKVEQYGP-QTTPRGELVPDPEPELIDSTKLIEVQVVLILAYCSIILLGVIGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTITLTVIALDRHRCIVYHL-ESKISKRISFLIIGLAWGISALLASPLAIFREYSLFE--IVACTEKWPGEEKSIYGTVYSLSSLLILYVLPLGIISFSYTRIWSKLKNHSPG-AANDHYHQRRQKTTKMLVCVVVVFAVSWLPLHAFQLAVDIDSQVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCEQRLDAIHSE---AKKNLEVRKNSG >FMLR_HUMAN -----------METNSSLPTNISGGTPAVSAGYLFLDIITYLVFAVTFVLGVLGNGLVIWVAGFRMT-HTVTTISYLNLAVADFCFTSTLPFFMVRKAMGGHWPFGWFLCKFLFTIVDINLFGSVFLIALIALDRCVCVLHPVTQNHRTVSLAKKVIIGPWVMALLLTLPVIIRVTTVPSPWPKERINVA------VAMLTVRGIIRFIIGFSAPMSIVAVSYGLIATKI---------HKQGLIKSSRPLRVLSFVAAAFFLCWSPYQVVALIATVRIREKEIGIAVDVTSALAFFNSCLNPMLYVFMGQDFRERLIHALPASLERALTEDSTQTTL---------- >NK1R_CAVPO 
-----MDNVLPVDSDLFPNISTNTSEPNQFVQPAWQIVLWAAAYTVIVVTSVVGNVVVMWIILAHKRMRTVTNYFLVNLAFAEASMAAFNTVVNFTYAVHNEWYYGLFYCKFHNFFPIAAVFASIYSMTAVAFDRYMAIIHPL-QPRLSATATKVVICVIWVLALLLAFPQGYYST---GR---VVCMIEWPSHPDKIYEKVYHICVTVLIYFLPLLVIGYAYTVVGITLE---IPSDRYHEQVSAKRKVVKMMIVVVCTFAICWLPFHIFFLLPYINPDLKFIQQVYLAIMWLAMSSTMYNPIIYCCLNDRFRLGFKHAFRCCPFISAADY----STRYFQTQGSVY >ML1B_HUMAN NGSFANCCEAGGWAVRPGWSGAGSARPSRTPRPPWVAPALSAVLIVTTAVDVVGNLLVILSVLRNRKLRNAGNLFLVSLALADLVVAFYPYPLILVAIFYDGWALGEEHCKASAFVMGLSVIGSVFNITAIAINRYCYICHSMYHRIYRRWHTPLHICLIWLLTVVALLPNFFVGS-LEYDP-IYSCTFI------QTASTQYTAAVVVIHFLLPIAVVSFCYLRIWVLVLQARK-PESRLCLKPSDLRSFLTMFVVFVIFAICWAPLNCIGLAVAINPQEQIPEGLFVTSYLLAYFNSCLNAIVYGLLNQNFRREYKRILLALWNPRHCIQDASKQSPAPPIIGVQH >C3AR_RAT --------------MESFTADTNSTDLHSRPLFKPQDIASMVILSLTCLLGLPGNGLVLWVAGVKMK-RTVNTVWFLHLTLADFLCCLSLPFSVAHLILRGHWPYGLFLCKLIPSVIILNMFASVFLLTAISLDRCLMVHKPICQNHRSVRTAFAVCGCVWVVTFVMCIPVFVYRDLLV-ED-DYFDQLM-YGNHAWTPQVAITISRLVVGFLVPFFIMITCYSLIVFRM--------RKTNLTKSRNKTLRVAVAVVTVFFVCWIPYHIVGILLVITDQEEVVLPWDHMSIALASANSCFNPFLYALLGKDFRKKARQSVKGILEAAFSEELTHSAPS--------- >OPSB_GECGE MNGTEGINFYVPLSNKTGLVRSPFEYPQYYLADPWKFKVLSFYMFFLIAAGMPLNGLTLFVTFQHKKLRQPLNYILVNLAAANLVTVCCGFTVTFYASWYAYFVFGPIGCAIEGFFATIGGQVALWSLVVLAIERYIVICKPMGNFRFSATHAIMGIAFTWFMALACAGPPLFGWSRFIPEGMQCSCGPDYYTLNPDFHNESYVIYMFIVHFTVPMVVIFFSYGRLVCKVREAAAQQQESATTQKAEKEVTRMVILMVLGFLLAWTPYAATAIWIFTNRGAAFSVTFMTIPAFFSKSSSIYNPIIYVLLNKQFRNCMVTTICCGKNPFGDEDVSSSVSSVSSSQVAPA >SSR5_RAT LSLASTPSWNAS---AASSGNHNWSLVGSASPMGARAVLVPVLYLLVCTVGLSGNTLVIYVVLRHAKMKTVTNVYILNLAVADVLFMLGLPFLATQNAVVSYWPFGSFLCRLVMTLDGINQFTSIFCLMVMSVDRYLAVVHPLSARWRRPRVAKMASAAVWVFSLLMSLPLLVFADVQE-----GTCNLS-WPEPVGLWGAAFITYTSVLGFFGPLLVICLCYLLIVVKVKAAMR--VGSSRRRRSEPKVTRMVVVVVLVFVGCWLPFFIVNIVNLAFTLPPTSAGLYFFVVVLSYANSCANPLLYGFLSDNFRQSFRKVLCLRRGYGMEDADAIERPQATLPTRSCE >B2AR_CANFA 
MGQPANRSVFLLAPNGSHAPDQ----GDSQERSEAWVVGMGIVMSLIVLAIVFGNVLVITAIARFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFYQSLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQAYAIASSIVSFYLPLVVMVFVYSRVFQVAQRQGRSHRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IPKEVYILLNWVGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSLKAYGNGYSDYAGEHSGCHLG >CKR6_MOUSE FGTDDYDN---TEYYSIPPDHGPC---SLEEVRNFTKVFVPIAYSLICVFGLLGNIMVVMTFAFYKKARSMTDVYLLNMAITDILFVLTLPFWAVTHATNT-WVFSDALCKLMKGTYAVNFNCGMLLLACISMDRYIAIVQATRVRSRTLTHSKVICVAVWFISIIISSPTFIFNKKYE-LQ-RDVCEPRRSVSEPITWKLLGMGLELFFGFFTPLLFMVFCYLFIIKTL---------VQAQNSKRHRAIRVVIAVVLVFLACQIPHNMVLLVTAVNTGKKVLAYTRNVAEVLAFLHCCLNPVLYAFIGQKFRNYFMKIMKDVWCMRRKNKMPGFESY-----ISRQ >NYR_DROME TLSGLQFETYNITVMMNFSCDDYDLLSEDMWSSAYFKIIVYMLYIPIFIFALIGNGTVCYIVYSTPRMRTVTNYFIASLAIGDILMSFFCEPSSFISLFILNWPFGLALCHFVNYSQAVSVLVSAYTLVAISIDRYIAIMWPL-KPRITKRYATFIIAGVWFIALATALPIPIVSGLDI---WHTKCEKYREMWPSRSQEYYYTLSLFALQFVVPLGVLIFTYARITIRVWAKPGETNRDQRMARSKRKMVKMMLTVVIVFTCCWLPFNILQLLLNDEEFADPLPYVWFAFHWLAMSHCCYNPIIYCYMNARFRSGFVQLMHRMPGLRRWCCLRSVSGTGPALPLNRM >HH2R_MOUSE -------------------MEPNGTVHSCCLDSIALKVTISVVLTTLIFITVAGNVVVCLAVSLNRRLRSLTNCFIVSLAATDLLLGLLVMPFSAIYQLSFKWRFGQVFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLYPVLVTPVRVAISLVFIWVISITLSFLSIHLGWN--RN-TF-KCKVQ--------VNEVYGLVDGMVTFYLPLLIMCVTYYRIFKIAREQR--ISSWKAATIREHKATVTLAAVMGAFIVCWFPYFTAFVYRGLRGDD-VNEVVEGIVLWLGYANSALNPILYATLNRDFRMAYQQLFHCKLASHNSHKTSLRRSQSREGRW--- >TRFR_BOVIN -----------MENETGSELN-QTQLQPRAVVALEYQVVTILLVLIICGLGIVGNIMVVLVVMRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSIYCMLWFFLLDLN--DA-SCGYKIS------RNYYSPIYLMDFGVFYVVPMILATVLYGFIARILFLNLNSNRYFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPVEKPANYSVKESDRFSTELDD >OPSP_ICTPU 
-------MASIILINFSETDTLHLGSVNDHIMPRIGYTILSIIMALSSTFGIILNMVVIIVTVRYKQLRQPLNYALVNLAVADLGCPVFGGLLTAVTNAMGYFSLGRVGCVLEGFAVAFFGIAGLCSVAVIAVDRYMVVCRPLGAVMFQTKHALAGVVFSWVWSFIWNTPPLFGWGSYQLEGVMTSCAPN--WYRRDPVNVSYILCYFMLCFALPFATIIFSYMHLLHTLWQVAKLVADSGSTAKVEVQVARMVVIMVMAFLLTWLPYAAFALTVIIDSNIYINPVIGTIPAYLAKSSTVFNPIIYIFMNRQFRDYALPCLLCGKNPWAAKEGRDSTVSKNTSVSPL- >OPSR_.ENLA NDDDDTTRSSVFTYTNSNNTRGPFEGPNYHIAPRWVYNLTSIWMIFVVFASVFTNGLVIVATLKFKKLRHPLNWILVNMAIADLGETVIASTISVFNQIFGYFILGHPMCVLEGFTVSTCGITALWSLTVIAWERWFVVCKPFGNIKFDEKLAATGIIFSWVWSAGWCAPPMFGWSRFWPHGLKTSCGPDVFSGSSDPGVQSYMLVLMITCCIIPLAIIILCYLHVWWTIRQVAQQQKESESTQKAEREVSRMVVVMIVAYIFCWGPYTFFACFAAFSPGYSFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIYQMFGKKVDDG-----SEVVSSVSNSSVSPA >ML1._MOUSE ---------MGPTKAVPTPFGCIGCKLPKPDYPPALIIFMFCAMVITVVVDLIGNSMVILAVTKNKKLRNSGNIFVASLSVADMLVAIYPYPLMLYAMSVGGWDLSQLQCQMVGLVTGLSVVGSIFNITAIAINRYCYICHSLYKRIFSLRNTCIYLVVTWVMTVLAVLPNMYIGT-IEYDP-TYTCIFN------YVNNPAFTVTIVCIHFVLPLIIVGYCYTKIWIKVLAARD-AGQNPDNQFAEVRNFLTMFVIFLLFAVCWCPVNVLTVLVAVIPKEKIPNWLYLAAYCIAYFNSCLNAIIYGILNESFRREYWTIFHAMRHPILFISHLISTRALTRARVRAR >OPSG_CHICK MNGTEGINFYVPMSNKTGVVRSPFEYPQYYLAEPWKYRLVCCYIFFLISTGLPINLLTLLVTFKHKKLRQPLNYILVNLAVADLFMACFGFTVTFYTAWNGYFVFGPVGCAVEGFFATLGGQVALWSLVVLAIERYIVVCKPMGNFRFSATHAMMGIAFTWVMAFSCAAPPLFGWSRYMPEGMQCSCGPDYYTHNPDYHNESYVLYMFVIHFIIPVVVIFFSYGRLICKVREAAAQQQESATTQKAEKEVTRMVILMVLGFMLAWTPYAVVAFWIFTNKGADFTATLMAVPAFFSKSSSLYNPIIYVLMNKQFRNCMITTICCGKNPFGDEDVSSTVSSVSSSQVSPA >GRHR_HUMAN --MANSASPEQNQNHCSAINNSIPLMQGNLPTLTLSGKIRVTVTFFLFLLSATFNASFLLKLQKWTQKLSRMKLLLKHLTLANLLETLIVMPLDGMWNITVQWYAGELLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITRPL-ALKSNSKVGQSMVGLAWILSSVFAGPQLYIFRMIHKV--FSQCVTH--SFSQWWHQAFYNFFTFSCLFIIPLFIMLICNAKIIFTLTRVPHENQSKNNIPRARLKTLKMTVAFATSFTVCWTPYYVLGIWYWFDPEMRLSDPVNHFFFLFAFLNPCFDPLIYGYFSL------------------------------------- >5HT2_APLCA 
----------------MLCGRLRHTMNSTTCFFSHRTVLIGIVGSLIIAVSVVGNVLVCLAIFTEPISHSKSKFFIVSLAVADLLLALLVMTFALVNSLYGYWLFGETFCFIWMSADVMCETASIFSICVISYNRLKQVQKPLYEEFMTTTRALLIIASLWICSFVVSFVPFFLEWHELGD-PKPECLFD--------VHFIYSVIYSLFCFYIPCTLMLRNYLRLFLIAKKH-RIHRLHRNQGTQGSKAARTLTIITGTFLACWLPFFIINPIEAVDEHL-IPLECFMVTIWLGYFNSCVNPIIYGTSNSKFRAAFQRLLRCRSVKSTVSSISPVSWIRPSLLDGP- >OPSD_ASTFA MNGTEGPYFYVPMSNATGVVRSPYEYPQYYLAPPWAYACLAAYMFFLILVGFPVNFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSLNGYFVFGRLGCNLEGFFATFGGINSLWCLVVLSIERWVVVCKPMSNFRFGENHAIMGVAFTWFMALACTVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVVHFLTPLFVITFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVILMFIAYLVCWLPYASVSWWIFTNQGSEFGPIFMTVPAFFAKSSSIYNPVIYICLNKQFRHCMITTLCCGKNPF--EEEEGATEASSVSSVSPA >5H2C_HUMAN GLLVWQCDISVSPVAAIVTDIFNTSDGGRFKFPDGVQNWPALSIVIIIIMTIGGNILVIMAVSMEKKLHNATNYFLMSLAIADMLVGLLVMPLSLLAILYDYWPLPRYLCPVWISLDVLFSTASIMHLCAISLDRYVAIRNPIHSRFNSRTKAIMKIAIVWAISIGVSVPIPVIGLRD-VF--NTTCVL---------NDPNFVLIGSFVAFFIPLTIMVITYCLTIYVLRRQKKKPRGTMQAINNERKASKVLGIVFFVFLIMWCPFFITNILSVLCEKSKLMEKLLNVFVWIGYVCSGINPLVYTLFNKIYRRAFSNYLRCNYKVEKKPPVRQIALSGRELNVNIY >DBDR_RAT PGRNRTAQPARLGLQRQLAQVDAPAG--SATPLGPAQVVTAGLLTLLIVWTLLGNVLVCAAIVRSRHRAKMTNIFIVSLAVSDLFVALLVMPWKAVAEVAGYWPFG-TFCDIWVAFDIMCSTASILNLCIISVDRYWAISRPFYERKMTQRVALVMVGLAWTLSILISFIPVQLNWHRDEG-RTENCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAQVQRGADPSLRASIKKETKVFKTLSMIMGVFVCCWLPFFILNCMVPFCSSGCVSETTFDIFVWFGWANSSLNPIIYAFNA-DFRKVFAQLLGCSHFCFRTPVQTVNYNQDTVFHK--- >EDG1_MOUSE VKALRSSVSDYGNYDIIVRHYNYTGKLNIGAEKDHGIKLTSVVFILICCFIILENIFVLLTIWKTKKFHRPMYYFIGNLALSDLLAG-VAYTANLLLSGATTYKLTPAQWFLREGSMFVALSASVFSLLAIAIERYITMLKMKLHNGSNSSRSFLLISACWVISLILGGLPSMGWNCI-S--SLSSCSTV------LPLYHKHYILFCTTVFTLLLLSIAILYCRIYSLVRTRLTFISKGSRSSEKSLALLKTVIIVLSVFIACWAPLFILLLLDVGCKAKCDILYKAEYFLVLAVLNSGTNPIIYTLTNKEMRRAFIRIVSCCKCPNGDSAGKFKEFSRSKSDNSSH >B3AR_BOVIN 
MAPWPPGNSSLTPWPDIPTLAPNTANASGLPGVPWAVALAGALLALAVLATVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGHWPLGVTGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRALAAVVLVWVVSAAVSFAPIMSKWWRIQ-R--RCCTFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALRTLGLIMGTFTLCWLPFFVVNVVRALGGPS-VSGPTFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCR---PEEHLAAAGAPTALTSPAGP >NY2R_CAVPO EEIKVEPYGPGHTTPRGELAPDPEPELIDSTKLTEVRVVLILAYCSIILLGVVGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTVTLTVIALDRHRCIVYHL-DSKISKQNSFLIIGLAWGISALLASPLAIFREYSLFE--IVACTEKWPGEEKSIYGTVYSLSSLLILYVLPLGIISVSYVRIWSKLKNHSPG-AANDHYHQRRQKTTKMLVFVVVVFAVSWLPLHAFQLAVDIDSQVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCQQRLDAIQSE---AKTNVEVEKNHG >PE22_RAT ---------------MDNSFNDSRRVENCESRQYLLSDESPAISSVMFTAGVLGNLIALALLARRWRSISLFHVLVTELVLTDLLGTCLISPVVLASYSRNQLAPESRACTYFAFTMTFFSLATMLMLFAMALERYLAIGHPYYRRRVSRRGGLAVLPAIYGVSLLFCSLPLLNYGEYVQYCPGTWCFIQ--------HGRTAYLQLYATVLLLLIVAVLGCNISVILNLIRMRGPRRGERTSMAEETDHLILLAIMTITFAVCSLPFTIFAYMDETSS---RKEKWDLRALRFLSVNSIIDPWVFVILRPPVLRLMRSVLCCRTSLRAPEAPGAS--QTDLCGQL-- >A2AB_MOUSE --------------------MSGPAMVHQEPYSVQATAAIASAITFLILFTIFGNALVILAVLTSRSLRAPQNLFLVSLAAADILVATLIIPFSLANELLGYWYFWRAWCEVYLALDVLFCTSSIVHLCAISLDRYWAVSRALYNSKRTPRRIKCIILTVWLIAAVISLPPLIYKGD-Q-----PQCELN--------QEAWYILASSIGSFFAPCLIMILVYLRIYVIAKRSGVAWWRRRTQLSREKRFTFVLAVVIGVFVVCWFPFFFSYSLGAICPQHKVPHGLFQFFFWIGYCNSSLNPVIYTIFNQDFRRAFRRILCRQWTQTGW------------------ >AG2S_HUMAN ------MILNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPAIIHRNVFF-TN--TVCAFH-YESRNSTLPIGLGLTKNILGSCFPFLIILTSYTLIWKALKKA----YEIQKNNPRNDDIFRIIMAIVLFFFFSWIPHQIFTFLDVLIQQGDIVDTAMPITIWIAYFNNCLNPLFYGFLGKKFKKDILQLLKYIPPKAKSHSNLSTRPSD-------N >B2AR_BOVIN 
MGQPGNRSVFLLAPNASHAPDQ----NVTLERDEAWVVGMGILMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGACHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYLAITSPFYQCLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQPYAIASSIVSFYLPLVVMVFVYSRVFQVAKRQGRSQRRTSKFYLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIKDNL-IRKEIYILLNWLGYINSAFNPLIYCRSP-DFRIAFQELLCLRRSSLKAYGNGCSDYTGEQSGYHLG >IL8B_RABIT G--DFSNYSYSTDLPPTLLDSAPC----RSESLETNSYVVLITYILVFLLSLLGNSLVMLVILYSRSTCSVTDVYLLNLAIADLLFATTLPIWAASKVHG--WTFGTPLCKVVSLVKEVNFYSGILLLACISVDRYLAIVHATRTMIQKRHLVKFICLSMWGVSLILSLPILLFRNAIF-NS-SPVCYED-MGNSTAKWRMVLRILPQTFGFILPLLVMLFCYVFTLRTL---------FQAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLTDTLMRTHNDIDRALDATEILGFLHSCLNPIIYAFIGQKFRYGLLKILAAHGLISKEFLAKES------------ >GASR_RAT GSSLCRPGVSLLNSSSAGNLSCDPPRIRGTGTRELEMAIRITLYAVIFLMSVGGNVLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAISYLMGVSVSVSTLNLVAIALERYSAICRPLARVWQTRSHAARVILATWLLSGLLMVPYPVYTMVQP---V-LQCMHR---WPSARVQQTWSVLLLLLLFFIPGVVIAVAYGLISRELYLGPGPPRPNQAKLLAKKRVVRMLLVIVLLFFLCWLPVYSVNTWRAFDGPGALSGAPISFIHLLSYVSACVNPLVYCFMHRRFRQACLDTCARCCPRPPRARPQPLPSIASLSRLSYT >A1AB_MOUSE HNTSAPAHWGELKDANFTGPNQTSSNSTLPQLDVTRAISVGCLG-AFILFAIVGNILVILSVACNRHLRTPTNYFIVNLAIADLLLSFTDLPFSATLEVLGYWVLGRIFCDIWAAVDVLCCTASILSLCAISIDRYIGVRYSLYPTLVTRRKAILALLSVWVLSTVISIGPLLGWKEP--AP--KECGVT--------EEPFYALFSSLGSFYIPLAVILVMYCRVYIVAKRTHNPIAVKLFKFSREKKAAKTLGIVVGMFILCWLPFFIALPLGSLFSTLKPPDAVFKVVFWLGYFNSCLNPIIYPCSSKEFKRAFMRILGCQCRGGRRRRRRRRTYRPWTRGGSLE >OPSR_FELCA AGLEDSTRASIFTYTNSNATRGPFEGPNYHIAPRWVYHVTSAWMIFVVIASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETIIASTISVVNQIYGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIAGIAFSWIWAAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMITCCIIPLSVIVLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVMVMIFAYCVCWGPYTFFACFAAAHPGYAFHPLVAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDG-----SELASSV--SSVSPA >P2Y7_HUMAN 
-------------------MNTTSSAAPPSLGVEFISLLAIILLSVALAVGLPGNSFVVWSILKRMQKRSVTALMVLNLALADLAVLLTAPFFLHFLAQGT-WSFGLAGCRLCHYVCGVSMYASVLLITAMSLDRSLAVARPFSQKLRTKAMARRVLAGIWVLSFLLATPVLAYRTVVPK--NMSLCFPR---YPSEGHRAFHLIFEAVTGFLLPFLAVVASYSDIGRRL---------QARRFRRSRRTGRLVVLIILTFAAFWLPYHVVNLAEAGRALAKRLSLARNVLIALAFLSSSVNPVLYACAGGGLLRSAGVGFVAKLLEGTGSEASSTQTARSGPAALEP >GPRL_HUMAN ---------MNSTLDGNQSSHPFCLLAFGYLETVNFCLLEVLIIVFLTVLIISGNIIVIFVFHCAPLNHHTTSYFIQTMAYADLFVGVSCVVPSLSLLHHPLPVEESLTCQIFGFVVSVLKSVSMASLACISIDRYIAITKPLYNTLVTPWRLRLCIFLIWLYSTLVFLPSFFHWG---K-P-VFQWCAE-----SWHTDSYFTLFIVMMLYAPAALIVCFTYFNIFRICQQHRFSGETGEVQACPDKRYAMVLFRITSVFYILWLPYIIYFLLESSTGHS--NRFASFLTTWLAISNSFCNCVIYSLSNSVFQRGLKRLSGAMCTSCASQTTANDGPLNGCHI---- >CKR3_MOUSE TDEIKTVVESFETTPYEYEWAPPC---EKVRIKELGSWLLPPLYSLVFIIGLLGNMMVVLILIKYRKLQIMTNIYLFNLAISDLLFLFTVPFWIHYVLWNE-WGFGHYMCKMLSGFYYLALYSEIFFIILLTIDRYLAIVHAVALRARTVTFATITSIITWGLAGLAALPEFIFHESQD-SF-EFSCSPRYPEGEEDSWKRFHALRMNIFGLALPLLVMVICYSGIIKTL---------LRCPNKKKHKAIRLIFVVMIVFFIFWTPYNLVLLFSAFHRTFKHLDLAMQVTEVIAYTHCCVNPVIYAFVGERFRKHLRLFFHRNVAVYLGKYIPFLT-------SSVS >HH1R_MOUSE ----------MRLPNTSSASEDKMCEGNRTAMASPQLLPLVVVLSSISLVTVGLNLGVLYAVRSERKLHTVGNLYIVSLSVADLIVGAIVMPMNILYLIMTKWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLRYRTKTRASATILGAWFLSFLWVIPILGWHHFT--EL-EDKCETD------FYNVTWFKIMTAIINFYLPTLLMLWFYVKIYNGVRRHLRSQYVSGLHLNRERKAAKQLGCIMAAFILCWIPYFIFFMVIAFCNSC-CSEPVHMFTIWLGYINSTLNPLIYPLCNENFKKTFKKILHIRS----------------------- >SSR1_HUMAN GEGGGSRGPGAGAADGMEEPGRNASQNGTLSEGQGSAILISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLLRH-WPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIAARYRRPTVAKVVNLGVWVLSLLVILPIVVFSRTAADG--TVACNML-MPEPAQRWLVGFVLYTFLMGFLLPVGAICLCYVLIIAKMRMVALK-AGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQ--DDATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLS-----WMDNAAETALKSRAYSVED >MC5R_MOUSE 
-MNSSSTLTVLNLTLNASEDGILGSNVKNKSLACEEMGIAVEVFLTLGLVSLLENILVIGAIVKNKNLHSPMYFFVGSLAVADMLVSMSNAWETVTIYLLNNDTFVRHIDNVFDSMICISVVASMCSLLAIAVDRYITIFYALYHHIMTARRSGVIIACIWTFCISCGIVFIIYYY----------------------EESKYVIICLISMFFTMLFFMVSLYIHMFLLARNH-RIPRYNSVRQRTSMKGAITLTMLLGIFIVCWSPFFLHLILMISCPQNSCFMSYFNMYLILIMCNSVIDPLIYALRSQEMRRTFKEIVCCHGFRRPCRLLGGY------------ >OPSD_CHELA MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPVNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVLGGFTTTMYTSMHGYFVLGRLGCNVEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFSEDHAIMGLAFTWVMASACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVVAFLVCWCPYAGVAWYIFTHQGSEFGPLFMTFPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >OLF6_RAT ----------MAWSTGQNLSTPGPFILLGFPGPRSMRIGLFLLFLVMYLLTVVGNLAIISLVGAHRCLQTPMYFFLCNLSFLEIWFTTACVPKTLATFAPRGGVISLAGCATQMYFVFSLGCTEYFLLAVMAYDRYLAICLPLYGGIMTPGLAMRLALGSWLCGFSAITVPATLIARL---SNHFFCDISVLSCTDTQVVELVSFGIAFCVILGSCGITLVSYAYIITTI--------IKIPSARGRHRAFSTCSSHLTVVLIWYGSTIFLHVRTSVES---SLDLTKAITVLNTIVTPVLNPFIYTLRNKDVKEALRRTVKGK------------------------ >D4DR_HUMAN ---MGNRSTADADGLLAGRGPAAGASAGASAGLAGQGAAALVGGVLLIGAVLAGNSLVCVSVATERALQTPTNSFIVSLAAADLLLALLVLPLFVYSEVQGGWLLSPRLCDALMAMDVMLCTASIFNLCAISVDRFVAVAVPLYNRQGGSRRQLLLIGATWLLSAAVAAPVLCGLN---G-R--AVCRL---------EDRDYVVYSSVCSFFLPCPLMLLLYWATFRGLQRWPPPRRRRAKITGRERKAMRVLPVVVGAFLLCWTPFFVVHITQALCPACSVPPRLVSAVTWLGYVNSALNPVIYTVFNAEFRNVFRKALRACC----------------------- >NY2R_MOUSE VEVKVEPYGPGHTTPRGELPPDPEPELIDSTKLVEVQVILILAYCSIILLGVVGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTITLTVIALDRHRCIVYHL-ESKISKRISFLIIGLAWGISALLASPLAIFREYSLFE--IVACTEKWPGEEKSVYGTVYSLSTLLILYVLPLGIISFSYTRIWSKLRNHSPG-AASDHYHQRRHKMTKMLVCVVVVFAVSWLPLHAFQLAVDIDSHVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCEQRLDAIHSE---AKKNLEVKKNNG >ETBR_MOUSE 
SSAPAEVTKGGRGAGVPPRS-FPPPCQRNIEISKTFKYINTIVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINTYKLLAEDWPFGAEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAIGFDMITRV--LRVCMLNQKTAFMQFYKTAKDWWLFSFYFCLPLAITAVFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYDQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQTFE-EKQSLEFKANDHGYDNFR >NK2R_RAT ----MGTRAIVSDANILSGLESNATGVTAFSMPGWQLALWATAYLALVLVAVTGNATVIWIILAHERMRTVTNYFIINLALADLCMAAFNATFNFIYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPSTKAIIAGIWLVALALASPQCFYST---GA---TKCVVAWPNDNGGKMLLLYHLVVFVLIYFLPLLVMFGAYSVIGLTLWKRPRHHGANLRHLQAKKKFVKAMVLVVLTFAICWLPYHLYFILGTFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTEE----HTPSLSRRVNRC >OPR._MOUSE GSHFQGNLSLLN---ETVPHHLLLNASHSAFLPLGLKVTIVGLYLAVCIGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQGTDILLGF-WPFGNALCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALASVVGVPVAIMGSAQVEE---IECLVE-IPAPQDYWGPVFAICIFLFSFIIPVLIISVCYSLMIRRLRGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLVQGLGVQPETAVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCASALHREMQVSDRVGLGCKTSETVPR >YLD1_CAEEL --------------------------MIIFYLYVATQVFVAIAFVLLMATAIIGNSVVMWIIYQHKVMHYGFNYFLFNMAFADLLIALFNVGTSWTYNLYYDWWYG-DLCTLTSFFGIAPTTVSVCSMMALSWDRCQAVVNPLQKRPLSRKRSVIAILIIWVVSTVTALPFAIAASVNSVTSKAHVCSAP--------VNTFFEKVLFGIQYALPIIILGSTFTRIAVAFRATEATSSLKNNHTRAKSKAVKMLFLMVVAFVVCWLPYHIYHAFALEEFFDARGKYAYLLIYWIAMSSCAYNPIIYCFANERFRIGFRYVFRWIPVIDCKKEQYEYMRSMAISLQKGR >5HT_BOMMO LPLQNCSWNSTGWEPNWNVTVWQASAPFDTPAALVRAAAKAVVLGLLILATVVGNVFVIAAILLERHLRSAANNLILSLAVADLLVACLVMPLGAVYEVVQRWTLGPELCDMWTSGDVLCCTASILHLVAIALDRYWAVTNI-YIHASTAKRVGMMIACVWTVSFFVCIAQLLGWKDPDS-E--LRCVVS--------QDVGYQIFATASSFYVPVLIILILYWRIYQTARKRPSLKPKEAADSKRERKAAKTLAIITGAFVACWLPFFVLAILVPTCDC-EVSPVLTSLSLWLGYFNSTLNPVIYTVFSPEFRHAFQRLLCGRRVRRRRAPQ--------------- >OPSD_SALPV 
MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFFLILLGFPINFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIGLWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWIMACACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVVYMFTCHFCIPLTIIGFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVILMVVGFLVCWLPYASVAWYIFSNQGSQFGPLFMTIPAFFAKSSSVYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTSSVSSSSVSPAA >OPSD_LITMO MNGTEGPYFYVPMVNTSGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWLMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLMVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIAFLICWCPYAGVAWWIFTHQGSDFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >P2UR_RAT ----MAAGLDSWNSTINGTWEGDELGYKCRFNEDFKYVLLPVSYGVVCVLGLCLNVVALYIFLCRLKTWNASTTYMFHLAVSDSLYAASLPLLVYYYAQGDHWPFSTVLCKLVRFLFYTNLYCSILFLTCISVHRCLGVLRPLSLSWGHARYARRVAAVVWVLVLACQAPVLYFVTTS-----RITCHDT-SARELFSHFVAYSSVMLGLLFAVPFSIILVCYVLMARRLLKP---AYGTTGLPRAKRKSVRTIALVLAVFALCFLPFHVTRTLYYSFRSLNAINMAYKITRPLASANSCLDPVLYFLAGQRLVRFARDAKPATEPTPSPQARRKLTDTVRKDLSISS >OLF4_CHICK -------------MASGNCTTPTTFILSGLTDNPGLQMPLFMVFLAIYTITLLTNLGLIALISVDLHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERKTISYIGCILQYFSFVLLTVTESLLLAVMAYDRYVAICKPLYPSIMTKAVCWRLVKGLYSLAFLNSLVHTSGLLKL---SNHFFCDNSQISSSSTTLNELLVFIFGSLFAMSSIITILISYVFIILTV--------VRIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRVIATNVWIH-------------------- >LSHR_RAT ENELSGWDYDYGFCSPKTLQCAPEPDAFNPCEDIMGYAFLRVLIWLINILAIFGNLTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDSQTKGWQTG-SGCGAAGFFTVFASELSVYTLTVITLERWHTITYAVLDQKLRLRHAIPIMLGGWLFSTLIATMPLVGISNY----KVSICLPM-----VESTLSQVYILSILILNVVAFVVICACYIRIYFAVQNP------ELTAPNKDTKIAKKMAILIFTDFTCMAPISFFAISAAFKVPLITVTNSKILLVLFYPVNSCANPFLYAIFTKAFQRDFLLLLSRFGCCKRRAELYRRSNCKNGFPGASK >MSHR_BOVIN 
PALGSQRRLLGSLNCTPPATLPFTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAVSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHKVILLCLVGLFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >PE24_RAT --------------------MSIPGVNASFSSTPERLNSPVTIPAVMFIFGVVGNLVAIVVLCKSRKKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGDQALCDYSTFILLFFGLSGLSIICAMSIERYLAINHAYYSHYVDKRLAGLTLFAVYASNVLFCALPNMGLGRSERQYPGTWCFID---WTTNVTAYAAFSYMYAGFSSFLILATVLCNVLVCGALLRMAAAVASFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFINQLYQPSDISRNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSGRDGSRRTSSAMSGHSR >5H1B_FUGRU ----MEGTNNTTGWTHFDSTSNRTSKSFDEEVKLSYQVVTSFLLGALILCSIFGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNRWTLGQIPCDIFISLDMLCCTSSILHLCVIALDRYWAITEPIYMKKRTPRRAAVLISVTWLVGFSISIPPMLIMRSQPA-N--KQCKIT--------QDPWYTIYSTFGAFYIPLTLMLVLYGRIFKAARFRRHEETKRKIALARERKTVKTLGIIMGTFILCWLPFFIVALVMPFCQESFMPHWLKDVINWLGYSNSLLNPIIYAYFNKDFQSAFKKIIKCHFCRA-------------------- >GP39_HUMAN -------MASPSLPGSDCSQIIDHSHVPEFEVATWIKITLILVYLIIFVMGLLGNSATIRVTQVLQKLQKEVTDHMVSLACSDILVFLIGMPMEFYSIIWNPTSSYTLSCKLHTFLFEACSYATLLHVLTLSFERYIAICHPFYKAVSGPCQVKLLIGFVWVTSALVALPLLFAMGTEYETSNMSICTNL-----SRWTVFQSSIFGAFVVYLVVLLSVAFMCWNMMQVLMKSRPPKSESEESRTARRQTIIFLRLIVVTLAVCWMPNQIRRIMAAAKPKHRAYMILLPFSETFFYLSSVINPLLYTVSSQQFRRVFVQVLCCRLSLQHANHEKRLTDSARFVQRPLL >BRB2_HUMAN FSADMLNVTLQGPTLNGTFAQSKC---PQVEWLGWLNTIQPPFLWVLFVLATLENIFVLSVFCLHKSSCTVAEIYLGNLAAADLILACGLPFWAITISNNFDWLFGETLCRVVNAIISMNLYSSICFLMLVSIDRYLALVKTMMGRMRGVRWAKLYSLVIWGCTLLLSSPMLVFRTMKE-HN--TACVIS---YPSLIWEVFTNMLLNVVGFLLPLSVITFCTMQIMQVLRNN---EMQKFKEIQTERRATVLVLVVLLLFIICWLPFQISTFLDTLHRLGRIIDVITQIASFMAYSNSCLNPLVYVIVGKRFRKKSWEVYQGVCQKGGCRSEPIQLRTS-------I >RDC1_HUMAN 
YAEPGNFSDISWPCNSSDCIVVDTVMCPNMPNKSVLLYTLSFIYIFIFVIGMIANSVVVWVNIQAKTTGYDTHCYILNLAIADLWVVLTIPVWVVSLVQHNQWPMGELTCKVTHLIFSINLFSGIFFLTCMSVDRYLSITYFTTPSSRKKMVRRVVCILVWLLAFCVSLPDTYYLKTVTNNE--TYCRSFYPEHSIKEWLIGMELVSVVLGFAVPFSIIAVFYFLLARAI---------SASSDQEKHSSRKIIFSYVVVFLVCWLPYHVAVLLDIFSILHHALFTALHVTQCLSLVHCCVNPVLYSFINRNYRYELMKAFIFKYSAKTGLTKLIDEYSALEQNAK-- >OPSB_ASTFA QEFQEDFYIPIPLDTNNITALSPFLVPQDHLGGSGIFMIMTVFMLFLFIGGTSINVLTIVCTVQYKKLRSHLNYILVNLAISNLLVSTVGSFTAFVSFLNRYFIFGPTACKIEGFVATLGGMVSLWSLSVVAFERWLVICKPVGNFSFKGTHAIIGCALTWFFALLASTPPLFGWSRYIPEGLQCSCGPDWYTTENKYNNESYVMFLFCFCFGFPFTVILFCYGQLLFTLKSAAKAQADSASTQKAEREVTKMVVVMVMGFLVCWLPYASFALWVVFNRGQSFDLRLGTIPSCFSKASTVYNPVIYVFMNKQFRSCMMKLIFCGKSPFGDDEEASSSSVGPEK----- >CB1B_FUGRU TNASDFPLSNGSGEATQCGEDIVDNMECFMILTPAQQLVIVILAITLGTFTVLENFVVLCVILHSHTRSRPSYHFIGSLAVADLIGSIIFVYSFLDFHVLHR-KDSPSIFLFKLAGVIASFTASVGSLFLTAIDRYVSIHRPMYKRIITKTKAVIAFSVMWAISIEFSLLPLLGWNCK-R--LHSVCSDI------FPLIDEKYLMFWIGMTTVLLLFIIYAYMFILWKSHHHSEGQTVRPEQARMDLRLAKTLVLILVALIICWGPLLAIMVYDLFGRVNDFIKTVFAFCSMLCLLNSTINPVIYAMRSKDLRRAFVNICHMCRGTTQSLDSSAEVRSTGGRAGKDR >5H2B_HUMAN ILQSTFVHVISSNWSGLQTESIPEEMKQIVEEQGNKLHWAALLILMVIIPTIGGNTLVILAVSLEKKLQYATNYFLMSLAVADLLVGLFVMPIALLTIMFEAWPLPLVLCPAWLFLDVLFSTASIMHLCAISVDRYIAIKKPIANQYNSRATAFIKITVVWLISIGIAIPVPIKGIET-NP---ITCVLT------KERFGDFMLFGSLAAFFTPLAIMIVTYFLTIHALQKKRRTGKKSVQTISNEQRASKVLGIVFFLFLLMWCPFFITNITLVLCDSCTTLQMLLEIFVWIGYVSSGVNPLVYTLFNKTFRDAFGRYITCNYRATKSVKTLRKRNPMAENSKFFK >OAR2_LYMST RNFSVSADVWLCGANFSQEWQLMQPVCSTKYDSITIFITVAVVLTLITLWTILGNFFVLMALYRYGTLRTMSNCLIGNLAISDLLLAVTVLPISTVHDLLGYWVFGEFTCTLWLCMDVLYCTASIWGLCTVAFDRYLATVYPVYHDQRSVRKAVGCIVFVWIFSIVISFAPFIGWQHM-S-F--YQCILF--------TSSSYVLYSSMGSFVIPAILMAFMYVRIFVVLHNQRNKLSMKRRFELREQRATKRMLLIMACFCVCWMPFLFMYILRSVCDTCHMNQHFVAAIIWLGYVNSSLNPVLYTLFNDDFKVAFKRLIGARSPSAYRSPGPRR------------ >GRPR_HUMAN 
DCFLLNLEVDHFMHCN-ISSHSADLPVNDDWSHPGILYVIPAVYGVIILIGLIGNITLIKIFCTVKSMRNVPNLFISSLALGDLLLLITCAPVDASRYLADRWLFGRIGCKLIPFIQLTSVGVSVFTLTALSADRYKAIVRPMIQASHALMKICLKAAFIWIISMLLAIPEAVFSDLHPQT--FISCAPY---HSNELHPKIHSMASFLVFYVIPLSIISVYYYFIAKNLIQSLPVNIHVKKQIESRKRLAKTVLVFVGLFAFCWLPNHVIYLYRSYHYSEMLHFVTSICARLLAFTNSCVNPFALYLLSKSFRKQFNTQLLCCQPGLIIR--SHSMTSLKSTNPSVA >DBDR_HUMAN PGSNGTAYPGQFALYQQLAQGNAVGGSAGAPPLGPSQVVTACLLTLLIIWTLLGNVLVCAAIVRSRHRANMTNVFIVSLAVSDLFVALLVMPWKAVAEVAGYWPFG-AFCDVWVAFDIMCSTASILNLCVISVDRYWAISRPFYKRKMTQRMALVMVGLAWTLSILISFIPVQLNWHRDED-NAENCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAQVQSAADTSLRASIKKETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCSGHCVSETTFDVFVWFGWANSSLNPVIYAFNA-DFQKVFAQLLGCSHFCSRTPVETVNYNQDIVFHK--- >B2AR_MESAU MGPPGNDSDFLLTTNGSHVPDH----DVTEERDEAWVVGMAILMSVIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWNFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYIAITSPFYQSLLTKNKARMVILMVWIVSGLTSFLPIQMHWYRAI-D--TCCDFF--------TNQAYAIASSIVSFYVPLVVMVFVYSRVFQVAKRQGRSLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IPKEVYILLNWLGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSSKAYGNGYSDYMGEASGCQLG >5H1B_HUMAN PAGSETWVPQANLSSAPSQNCSAKDYIYQDSISLPWKVLLVMLLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASILHLCVIALDRYWAITDAVYSAKRTPKRAAVMIALVWVFSISISLPPFFWR----E-E--SECVVN-------TDHILYTVYSTVGAFYFPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHLAIFDFFTWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCTS--------------------- >GPRH_HUMAN ------MNGLEVAPPGLITNFSLATAEQCGQETPLENMLFASFYLLDFILALVGNTLALWLFIRDHKSGTPANVFLMHLAVADLSCVLVLPTRLVYHFSGNHWPFGEIACRLTGFLFYLNMYASIYFLTCISADRFLAIVHPVSLKLRRPLYAHLACAFLWVVVAVAMAPLLVSPQTVQVCL-QLYRE----------KASHHALVSLAVAFTFPFITTVTCYLLIIRSLR------QGLRVEKRLKTKAVRMIAIVLAIFLVCFVPYHVNRSVYVLHYRSRILALANRITSCLTSLNGALDPIMYFFVAEKFRHALCNLLCGKRLKGPPPSFEGKAKSEL------- >ACM1_MOUSE 
------------MNTSVPPAVSPNITVLAPGKGPWQVAFIGSTTGLLSLATVTGNLLVLISIKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL-AGQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTVNPMCYASCNKAFRDHFRLLLLCRWDKRRWRKIPKRPSRQC------- >B3AR_SHEEP MAPWPPGNSFLTPWPDIPTLAPNTANASGLPGVPWAVALAGALLALAVLATVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGHWPLGVTGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSKWWRVQ-R--RCCTFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVDTRQGVPRRPARLLPLREHRALRTLGLIMGTFTLCWLPFFVVNVVRALGGPS-VSGPTFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCPPEEHLAAASPPTVLTSPAGPRQP >A1AD_MOUSE TGSGEDNQSSTAEAGAAASGEVNGSAAVGGLVVSAQGVGVGVFLAAFILTAVAGNLLVILSVACNRHLQTVTNYFIVNLAVADLLLSAAVLPFSATMEVLGFWPFGRTFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLYPAIMTERKAAAILALLWAVALVVSVGPLLGWKEP--VP--RFCGIT--------EEVGYAIFSSVCSFYLPMAVIVVMYCRVYVVARSTHTLLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLKPSEGVFKVIFWLGYFNSCVNPLIYPCSSREFKRAFLRLLRCQCRRRRRR--LWPSLDRR-PALRLC >C5AR_CANFA SMNFSPPEYPDYG-TATLDPNIFVDESLNTPKLSVPDMIALVIFVMVFLVGVPGNFLVVWVTGFEVR-RTINAIWFLNLAVADLLSCLALPILFSSIVQQGYWPFGNAACRILPSLILLNMYASILLLTTISADRFVLVFNPICQNYRGPQLAWAACSVAWAVALLLTVPSFIFRGVHT-PF--MTCGVD-YSGVGVLVERGVAILRLLMGFLGPLVILSICYTFLLIRT---------WSRKATRSTKTLKVVVAVVVSFFVLWLPYQVTGMMMALFYKHRRVSRLDSLCVAVAYINCCINPIIYVLAAQGFHSRFLKSLPARLRQVLAEESVGRSTVD-------- >CKRA_HUMAN QVSWGHYSGDEEDAYSAEPLPELC---YKADVQAFSRAFQPSVSLTVAALGLAGNGLVLATHLAARRARSPTSAHLLQLALADLLLALTLPFAAAGALQG--WSLGSATCRTISGLYSASFHAGFLFLACISADRYVAIARALGPRPSTPGRAHLVSVIVWLLSLLLALPALLFS--QD-RE-QRRCRLIFPEGLTQTVKGASAVAQVALGFALPLGVMVACYALLGRTL---------LAARGPERRRALRVVVALVAAFVVLQLPYSLALLLDTADLLAKRKDVALLVTSGLALARCGLNPVLYAFLGLRFRQDLRRLLRGGSSPSGPQPRRGC--------PRLS >MTR_BUFMA 
LNLDCSELPNSSWVNSSMENQSSNSTRDPLKRNEEVAKVEVTVLALILFLALAGNICVLLGIYINRHKHSRMYFFMKHLSIADLVVAIFQVLPQLIWDITFRFYAPDLVCRLVTYLQVVGMFASTYMLLLMSLDRCLAICQPL--RSLHRRSDCVYVLFTWILSFLLSTPQTVIFSLTE----VYDCRAD---FIQPWGPKAYITWITLAVYIIPVMILSVCYGLISYKIWQNRATVSSVRLISKAKIRTVKMTFIIVLAYIVCWTPFFFVQMWSVWDPNP-KEASLFIIAMLLGSLNSCCNPWIYMLFTGHLFHDLLQSFLCCSARYLKTQQQGSSNSSTFVLSRKS >B1AR_HUMAN LPDGAATAARLLVPASPPASLLPPASESPEPLSQQWTAGMGLLMALIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARGLVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHREL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQGLLCCARRAARRRHATHGCLARPGPPPSPG >ACM2_RAT --------------MNNSTNSSNNGLAITSPYKTFEVVFIVLVAGSLSLVTIIGNILVMVSIKVSRHLQTVNNYFLFSLACADLIIGVFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIV-VE-DGECYIQ------FFSNAAVTFGTAIAAFYLPVIIMTVLYWHISRASKSRVKMPAKKKPPPSREKKVTRTILAILLAFIITWAPYNVMVLINTFCAPC-IPNTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMCHYKNIGATR---------------- >AG22_MERUN AATSRNITSSLPFVNLNMSGTNDLIFNCSHKPSDKHLEAIPVLYYLIFVIGFAVNIIVVSLFCCQKGPKKVSSIYIFNLAVADLLLLATLPLWATYYSYRYDWLFGPVMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNPWQASYVVPLVWCMACLSSLPTFYFRDVRT-LG--NACVMAFPPEKYAQWSAGIALMKNVLGFIIPLIFIATCYFGIRKHLLKT----NSYGKNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALSWMGAVIDLALPFAILLGFTNSCVNPFLYCFVGNRFQQKLRSMFRVPITWLQGKRETMSREMDTFVS---- >NY5R_PIG -------------NTVATRNSGFPVWEDYKGSVDDLQYFLIGLYTFVSLLGFMGNLLILMAVMRKRNQKTTVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKVMCHIMPFLQCVTVLVSTLILISIAIVRYHMIKHPV-SNNLTANHGYFLIATVWTLGLAICSPLPVFHSLVESS--RYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRTISCGVPESRSIMRLRKRSRSVFYRLTVLILVFAVSWMPLHLFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLMSLIHCLHVS--------------------- >CB1R_HUMAN 
EFYNKSLSSFKENEENIQCGENFMDIECFMVLNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFIDFHVFHR-KDSRNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRPKAVVAFCLMWTIAIVIAVLPLLGWNCE-K--LQSVCSDI------FPHIDETYLMFWIGVTSVLLLFIVYAYMYILWKAHSHSEDQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTVFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPSCEGTAQPLDNSMGKHANNAASVHRA >D3DR_MOUSE --------MAPLSQISSHINSTCGAENSTGVNRARPHAYYALSYCALILAIIFGNGLVCAAVLRERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGWNFSRICCDVFVTLDVMMCTASILNLCAISIDRYTAVVMPVGTGQSSCRRVALMITAVWVLAFAVSCPLLFGFN---D-P--SICSI---------SNPDFVIYSSVVSFYVPFGVTVLVYARIYMVLRQRTSLPLQPRGVPLREKKATQMVVIVLGAFIVCWLPFFLTHVLNTHCQACHVSPELYRATTWLGYVNSALNPVIYTTFNIEFRKAFLKILSC------------------------- >GPRF_MACMU ----MDPEETSVYLDYYYATSPNPDIRETHSHVPYTSVFLPVFYTAVFLTGVLGNLVLMGALHFKPGSRRLIDIFIINLAASDFIFLVTLPLWVDKEASLGLWRTGSFLCKGSSYMISVNMHCSVFLLTCMSVDRYLAIVCPVSRKFRRTDCAYVVCASIWFISCLLGLPTLLSRELT-IDD-KPYCAEK----KATPLKLIWSLVALIFTFFVPLLSIVTCYCCIARKLCAH---YQQSGKHNKKLKKSIKIIFIVVAAFLVSWLPFNTFKLLAIVSGLQAMLQLGMEVSGPLAFANSCVNPFIYYIFDSYIRRAIVHCLCPCLKNYDFGSSTETALSTFIHAEDFT >CB1R_RAT EFYNKSLSSFKENEENIQCGENFMDMECFMILNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFVDFHVFHR-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRPKAVVAFCLMWTIAIVIAVLPLLGWNCK-K--LQSVCSDI------FPLIDETYLMFWIGVTSVLLLFIVYAYMYILWKAHSHSEDQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTVFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPSCEGTAQPLDNSMGHANNTASMHRAA >CKR4_HUMAN IADTTLDESIYSNYYLYESIPKPC---TKEGIKAFGELFLPPLYSLVFVFGLLGNSVVVLVLFKYKRLRSMTDVYLLNLAISDLLFVFSLPFWGYYAADQ--WVFGLGLCKMISWMYLVGFYSGIFFVMLMSIDRYLAIVHAVSLRARTLTYGVITSLATWSVAVFASLPGFLFSTCYT-ER-HTYCKTK-YSLNSTTWKVLSSLEINILGLVIPLGIMLFCYSMIIRTL---------QHCKNEKKNKAVKMIFAVVVLFLGFWTPYNIVLFLETLVELERYLDYAIQATETLAFVHCCLNPIIYFFLGEKFRKYILQLFKTCRGLFVLCQYCGLTP------SSSY >PD2R_MOUSE 
----------------------MNESYRCQTSTWVERGSSATMGAVLFGAGLLGNLLALVLLARSGLPPSVFYVLVCGLTVTDLLGKCLISPMVLAAYAQNQPASGNQLCETFAFLMSFFGLASTLQLLAMAVECWLSLGHPFYQRHVTLRRGVLVAPVVAAFCLAFCALPFAGFGKFVQYCPGTWCFIQ-MIHKERSFSVIGFSVLYSSLMALLVLATVVCNLGAMYNLYDMAQSYRHGSLHPLEELDHFVLLALMTVLFTMCSLPLIYRAYYGAFKL--AEGDSEDLQALRFLSVISIVDPWIFIIFRTSVFRMLFHKVFTRPLIYRNWSSHSQL----------- >OPSD_BUFMA MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWQYSVLCAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFVFGQTGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAIMGVAFTWIMALACAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFLIPLIIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVFFLICWVPYASVAFFIFTHQGSEFGPVFMTIPAFFAKSSSIYNPVIYIMLNKQFRNCMITTLCCGKNPFGDEDASSAASSVSSSQVSPA >5H1F_MOUSE -------------MDFLNASD-QNLTSEELLNRMPSKILVSLTLSGLALMTTTINSLVIAAIIVTRKLHHPANYLICSLAVTDFLVAVLVMPFSIVYIVRESWIMGQVLCDIWLSVDIICCTCSILHLSAIALDRYRAITDAVYARKRTPRHAGIMITIVWVISVFISMPPLFWR----S-R--DECVIK-------HDHIVSTIYSTFGAFYIPLVLILILYYKIYRAARTLLKHWRRQKISGTRERKAATTLGLILGAFVICWLPFFVKELVVNVCEKCKISEEMSNFLAWLGYLNSLINPLIYTIFNEDFKKAFQKLVRCRY----------------------- >B2AR_RAT MEPHGNDSDFLLAPNGSRAPGH----DITQERDEAWVVGMAILMSVIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWNFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYVAITSPFYQSLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-D--TCCDFF--------TNQAYAIASSIVSFYVPLVVMVFVYSRVFQVAKRQGRSLRSSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIRANL-IPKEVYILLNWLGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSSKTYGNGYSDYTGEQSAYQLG >O.YR_PIG GVLAANWSAEAVNSSAAPPEAEGNRTAGPPQRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RALRRPADRLAVLATWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLAACYGLISFKIWQNRAAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDADA-KEASAFIIAMLLASLNSCCNPWIYMLFTGHLFHELVQRFLCCSSSHLKTSRPGENSSTFVLSQHSS >OPSB_CARAU 
PEFHEDFYIPIPLDINNLSAYSPFLVPQDHLGNQGIFMAMSVFMFFIFIGGASINILTILCTIQFKKLRSHLNYILVNLSIANLFVAIFGSPLSFYSFFNRYFIFGATACKIEGFLATLGGMVGLWSLAVVAFERWLVICKPLGNFTFKTPHAIAGCILPWISALAASLPPLFGWSRYIPEGLQCSCGPDWYTTNNKYNNESYVMFLFCFCFAVPFGTIVFCYGQLLITLKLAAKAQADSASTQKAEREVTKMVVVMVLGFLVCWAPYASFSLWIVSHRGEEFDLRMATIPSCLSKASTVYNPVIYVLMNKQFRSCMMKMVCGKNIEE---DEASTSSVAPEK----- >5H1B_RAT CAPPPPATSQTGVPLANLSHNSADDYIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPKRAAIMIVLVWVFSISISLPPFFWR----E-E--LDCFVN-------TDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHMAIFDFFNWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCTG--------------------- >H963_HUMAN -----------------------MTNSSFFCPVYKDLEPFTYFFYLVFLVGIIGSCFATWAFIQKNTNHRCVSIYLINLLTADFLLTLALPVKIVVDLGVAPWKLKIFHCQVTACLIYINMYLSIIFLAFVSIDRCLQLTHSCIYRIQEPGFAKMISTVVWLMVLLIMVPNMMIPIKD-SN---VGCMEF--KKEFGRNWHLLTNFICVAIFLNFSAIILISNCLVIRQLYR-----NKDNENYPNVKKALINILLVTTGYIICFVPYHIVRIPYTLSQTEISLFKAKEATLLLAVSNLCFDPILYYHLSKAFRSKVTETFASPKETKAQKEKLRC------------ >OPS3_DROME ARLSAETRLLGWNVPPEELRHIPEHWLTYPEPPESMNYLLGTLYIFFTLMSMLGNGLVIWVFSAAKSLRTPSNILVINLAFCDFMMMVKTPIFIYNSFHQG-YALGHLGCQIFGIIGSYTGIAAGATNAFIAYDRFNVITRPM-EGKMTHGKAIAMIIFIYMYATPWVVACYTETWGRFPEGYLTSCTFD--YLTDNFDTRLFVACIFFFSFVCPTTMITYYYSQIVGHVFSHNVESNVDKNKETAEIRIAKAAITICFLFFCSWTPYGVMSLIGAFGDKTLLTPGATMIPACACKMVACIDPFVYAISHPRYRMELQKRCPWLALNEKAPESSAVEPQQTTAA---- >CCR5_MOUSE DDLYKELAFYSNSTEIPLQDSNFCSTVEGPLLTSFKAVFMPVAYSLIFLLGMMGNILVLVILERHRHTRSSTETFLFHLAVADLLLVFILPFAVAEGSVG--WVLGTFLCKTVIALHKINFYCSSLLVACIAVDRYLAIVHAVAYRRRRLLSIHITCTAIWLAGFLFALPELLFAKVGQ-ND-LPQCTFSQENEAETRAWFTSRFLYHIGGFLLPMLVMGWCYVGVVHRL--------LQAQRRPQRQKAVRVAILVTSIFFLCWSPYHIVIFLDTLERLKGYLSVAITLCEFLGLAHCCLNPMLYTFAGVKFRSDLSRLLTKLGCAG---PASLC--------PNWR >5H1F_CAVPO 
-------------MDFLNSSD-QNLTSEELLHRMPSKILVSLTLSGLALMTTTINSLVIAAIIVTRKLHHPANYLICSLAVTDFLVAVLVMPFSIVYIVRESWIMGQVLCDIWLSVDIICCTCSILHLSAIALDRYRAITDAVYARKRTPKQAGIMITIVWIISVFISMPPLFWR----S-R--DECIIK-------HDHIVSTIYSTFGAFYIPLVLILILYYKIYKAAKTLLRHWRRQKISGTRERKAATTLGLILGAFVICWLPFFVKELVVNVCEKCKISEEMANFLAWLGYLNSLINPLIYTIFNEDFKKAFQKLVRCQY----------------------- >OPSD_MESBI MNGTEGLNFYVPFSNHTGVVRSPFEYPQYYLAEPWQFSVLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVLGGFTTTLYTSMHAYFIFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWIMALACAAPPLVGWSRYIPEGMQCSCGVDYYTPSPEVNNESFVVYMFVVHFSIPMVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVVIMVVAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPSFFAKSSAIYNPVIYIMMNKQFRNCMLTTLCCGRNPLGDDEVSTTASKTETSQVAPA >PI2R_RAT GRPDGPPSITPESPLIVGGREWQGMAGSCWNITYVQDSVGPATSTLMFVAGVVGNGLALGILGARRRHPSAFAVLVTGLAVTDLLGTCFLSPAVFVAYARNSAHGGTMLCDTFAFAMTFFGLASTLILFAMAVERCLALSHPYYAQLDGPRCARLALPAIYAFCCLFCSLPLLGLGEHQQYCPGSWCFIR---MRSPQPGGCAFSLAYASLMALLVTSIFFCNGSVTLSLCHMRRHFVPTSRAREDEVYHLILLALMTGIMAVCSLPLTIRGFTQAIAPDS--REMGDLHAFRFNAFNPILDPWVFILFRKAVFQRLKFWLCCLCARSVHGDLQTPRRDTLAPDSLQA >IL8A_PANTR PQMWDFDDLNFTGMPPTDEGYSPC----RLETETLNKYVVIITYALVFLLSLLGNSLVMLVILYSRVGRSVTDVYLLNLALADLLFALTLPIWAASKVNG--WIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFVCLGCWGLSMNLSLPFFLFRQAYH-NS-SPVCYEV-LGNDTAKWRMVLRILPHTFGFIVPLFVMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQNNIGRALDATEILGFLHSCLNPIIYAFIGQNFRHGFLKILAMHGLVSKEFLARHR------------ >CCR5_HUMAN EDLFWELDRLDNYNDTSLVENHLCPATEGPLMASFKAVFVPVAYSLIFLLGVIGNVLVLVILERHRQTRSSTETFLFHLAVADLLLVFILPFAVAEGSVG--WVLGTFLCKTVIALHKVNFYCSSLLLACIAVDRYLAIVHAVAYRHRRLLSIHITCGTIWLVGFLLALPEILFAKVSQ-NN-LPRCTFSQENQAETHAWFTSRFLYHVAGFLLPMLVMGWCYVGVVHRL--------RQAQRRPQRQKAVRVAILVTSIFFLCWSPYHIVIFLDTLARLKGSLPVAITMCEFLGLAHCCLNPMLYTFAGVKFRSDLSRLLTKLGCTG---PASLC--------PSWR >CB2R_HUMAN 
----MEECWVTEIANGSKDGLDSNPMKDYMILSGPQKTAVAVLCTLLGLLSALENVAVLYLILSSHQRRKPSYLFIGSLAGADFLASVVFACSFVNFHVFHG-VDSKAVFLLKIGSVTMTFTASVGSLLLTAIDRYLCLRYPPYKALLTRGRALVTLGIMWVLSALVSYLPLMGWTC-----CPRPCSEL------FPLIPNDYLLSWLLFIAFLFSGIIYTYGHVLWKAHQHSGHQVPGMARMRLDVRLAKTLGLVLAVLLICWFPVLALMAHSLATTLSDQVKKAFAFCSMLCLINSMVNPVIYALRSGEIRSSAHHCLAHWKKCVRGLGSEAKVTETEADGKITP >AA2B_HUMAN ------------------------------MLLETQDALYVALELVIAALSVAGNVLVCAAVGTANTLQTPTNYFLVSLAAADVAVGLFAIPFAITISLG--FCTDFYGCLFLACFVLVLTQSSIFSLLAVAVDRYLAICVPLYKSLVTGTRARGVIAVLWVLAFGIGLTPFLGWNSKDT-T-LVKCLFE-----NVVPMSYMVYFNFFGCVLPPLLIMLVIYIKIFLVACRQ---MDHSRTTLQREIHAAKSLAMIVGIFALCWLPVHAVNCVTLFQPAQNKPKWAMNMAILLSHANSVVNPIVYAYRNRDFRYTFHKIISRYLLCQADVKSGNGLGVGL------- >MSHR_HUMAN AVQGSQRRLLGSLNSTPTAIPQLGLAANQTGARCLEVSISDGLFLSLGLVSLVENALVVATIAKNRNLHSPMYCFICCLALSDLLVSGTNVLETAVILLLEAAAVLQQLDNVIDVITCSSMLSSLCFLGAIAVDRYISIFYALYHSIVTLPRAPRAVAAIWVASVVFSTLFIAYYY----------------------DDHVAVLLCLVVFFLAMLVLMAVLYVHMLARACQHIARKRQRPVHQGFGLKGAVTLTILLGIFFLCWGPFFLHLTLIVLCPEHGCIFKNFNLFLALIICNAIIDPLIYAFHSQELRRTLKEVLTCSW----------------------- >ETBR_PIG SSSPPQMPKGGRMAGPPARTLTPPPCEGPIEIKDTFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINVYKLLAEDWPFGVEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEALGFDMITRI--LRICLLHQKTAFMQFYKTAKDWWLFSFYFCLPLAITAFFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYDQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQSFE-EKQSLEFKANDHGYDNFR >OAR2_LOCMI SSAAEEPQDALVGGDACGGRRPPSVLGVRLAVPEWEVAVTAVSLSLIILITIVGNVLVVLSVFTYKPLRIVQNFFIVSLAVADLTVAVLVMPFNVAYSLIQRWVFGIVVCKMWLTCDVLCCTASILNLCAIALDRYWAITDPIYAQKRTLRRVLAMIAGVWLLSGVISSPPLIGWNDW-N-D--TPCQLT--------EEQGYVIYSSLGSFFIPLFIMTIVYVEIFIATKRRPVYEEKQRISLSKERRAARTLGIIMGVFVVCWLPFFLMYVIVPFCNPSKPSPKLVNFITWLGYINSALNPIIYTIFNLDFRRAFKKLLHFKT----------------------- >MC5R_HUMAN 
-MNSSFHLHFLDLNLNATEGNLSGPNVKNKSSPCEDMGIAVEVFLTLGVISLLENILVIGAIVKNKNLHSPMYFFVCSLAVADMLVSMSSAWETITIYLLNNDAFVRHIDNVFDSMICISVVASMCSLLAIAVDRYVTIFYALYHHIMTARRSGAIIAGIWAFCTGCGIVFILYYS----------------------EESTYVILCLISMFFAMLFLLVSLYIHMFLLARTH-RIPGASSARQRTSMQGAVTVTMLLGVFTVCWAPFFLHLTLMLSCPQNSRFMSHFNMYLILIMCNSVMDPLIYAFRSQEMRKTFKEIICCRGFRIACSFPRRD------------ >5H6_RAT -----------MVPEPGPVNSSTPAWGPGPPPAPGGSGWVAAALCVVIVLTAAANSLLIVLICTQPARNTS-NFFLVSLFTSDLMVGLVVMPPAMLNALYGRWVLARGLCLLWTAFDVMCCSASILNLCLISLDRYLLILSPLYKLRMTAPRALALILGAWSLAALASFLPLLLGWH---E-APGQCRLL--------ASLPFVLVASGVTFFLPSGAICFTYCRILLAARKQMESRRLATKHSRKALKASLTLGILLGMFFVTWLPFFVANIAQAVCDC--ISPGLFDVLTWLGYCNSTMNPIIYPLFMRDFKRALGRFLPCVHCPPEHRPALPPAVPDQASACSRC >5H1D_CAVPO SPPNQSEEGLPQEASNRSLNATETPGDWDPGLLQALKVSLVVVLSIITLATVLSNAFVLTTILLTRKLHTPANYLIGSLATTDLLVSILVMPISIAYTTTRTWNFGQILCDIWVSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAGAMIAAVWVISICISIPPLFWR----Q-E--SDCLVN-------TSQISYTIYSTCGAFYIPSVLLIILYSRIYRAARSRKLALERKRISAARERKATKTLGIILGAFIVCWLPFFVVSLVLPICRDSWIHPALFDFFTWLGYLNSLINPIIYTVFNEDFRQAFQKVVHFRKAS--------------------- >NMBR_MOUSE SNLSFPTEANESELVPEVWEKDFLPDSDGTTAELVIRCVIPSLYLIIISVGLLGNIMLVKIFLTNSAMRNVPNIFISNLAAGDLLLLLTCVPVDASRYFFDEWVFGKLGCKLIPAIQLTSVGVSVFTLTALSADRYRAIVNPMMQTSGVLLWTSLKAVGIWVVSVLLAVPEAVFSEVARSS--FTACIPY---QTDELHPKIHSVLIFLVYFLIPLVIISIYYYHIAKTLIKSLPGNEHTKKQMETRKRLAKIVLVFVGCFVFCWFPNHVLYLYRSFNYKELGHMIVTLVARVLSFSNSCVNPFALYLLSESFRKHFNSQLCCGRKSYPERSTSYLMTSLKSNTKNVV >OPSD_SHEEP MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPQGMQCSCGALYFTLKPEINNESFVIYMFVVHFSIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKSSSVYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTVSKTETSQVAPA >OPSD_GALML 
MNGTEGENFYVPMSNKTGVVRNPFEYPQYYLADHWMFAVLAAYMFFLIITGFPVNFLTLFVTIQNKKLRQPLNYILLNLAVANLFMVFGGFTTTLITSMNGYFVFGSTGCNLEGFFATLGGEISLWSLVVLAIERYVVVCKPMSNFRFGSQHAIAGVSLTWVMAMACAAPPLVGWSRYIPEGLQCSCGIDYYTPKPEINNVSFVIYMFVVHFSIPLTIIFFCYGRLVCTVKAAAAQQQESETTQRAEREVTRMVVIMVIGFLICWLPYASVALYIFNNQGSEFGPVFMTIPSFFAKSSALYNPLIYILMNKQFRNCMITTLCCGKNPFEEEESTSASSVSSSQVSPAA >A1AA_RAT ----------MVLLSENASEGSNCT-HPPAPVNISKAILLGVILGGLIIFGVLGNILVILSVACHRHLHSVTHYYIVNLAVADLLLTSTVLPFSAIFEILGYWAFGRVFCNIWAAVDVLCCTASIMGLCIISIDRYIGVSYPLYPTIVTQRRGVRALLCVWVLSLVISIGPLFGWRQP--AP--TICQIN--------EEPGYVLFSALGSFYVPLAIILVMYCRVYVVAKREAKNFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPDFKPSETVFKIVFWLGYLNSCINPIIYPCSSQEFKKAFQNVLRIQCLRRRQSSKHALSQ---------- >GHSR_RAT SEEPEP-NVTLDLDWDASPGNDSLPDELLPLFPAPLLAGVTATCVALFVVGISGNLLTMLVVSRFRELRTTTNLYLSSMAFSDLLIFLCMPLDLVRLWQYRPWNFGDLLCKLFQFVSESCTYATVLTITALSVERYFAICFPLAKVVVTKGRVKLVILVIWAVAFCSAGPIFVLVG---T-D-TNECRAT---FAVRSGLLTVMVWVSSVFFFLPVFCLTVLYSLIGRKLWRRD--AVGASLRDQNHKQTVKMLAVVVFAFILCWLPFHVGRYLFSKSFEPQISQYCNLVSFVLFYLSAAINPILYNIMSKKYRVAVFKLLGFESFSQRKLSTLKDKSSINT------ >O1E2_HUMAN -------------MMGQNQTSISDFLLLGLPIQPEQQNLCYALFLAMYLTTLLGNLLIIVLIRLDSHLHTPVYLFLSNLSFSDLCFSSVTMPKLLQNMQNQDPSIPYADCLTQMYFFLYFSDLESFLLVAMAYDRYVAICFPMYTAIMSPMLCLSVVALSWVLTTFHAMLHTLLMARL---CPHFFCDMSKLACSDTRVNEWVIFIMGGLILVIPFLLILGSYARIVSSI--------LKVPSSKGICKAFSTCGSHLSVVSLFYGTVIGLYLCPSANS---STLKDTVMAMMYTVVTPMLTPFIYSLRNRDMKGALERVICKRKNPFLL------------------ >OPSD_SARSL MNGTEGPYFYVPMVNTSGIVRSPYEYPQYYLVNPAAYARLGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWLMALACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFTVPLMVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIMMVVAFLVCWLPYASVAWWIFTHQGSEFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >DOP2_DROME 
SATSATLSPAMVATGGGGTTTPEPDLSEFLEALPNDRVGLLAFLFLFSFATVFGNSLVILAVIRERYLHTATNYFITSLAVADCLVGLVVMPFSALYEVLENWFFGTDWCDIWRSLDVLFSTASILNLCVISLDRYWAITDPFYPMRMTVKRAAGLIAAVWICSSAISFPAIVWWRAARP----YKCTFT--------EHLGYLVFSSTISFYLPLLVMVFTYCRIYRAAVIQMGKLSRKLAKFAKEKKAAKTLGIVMGVFIICWLPFFVVNLLSGFCIECEHEEIVSAIVTWLGWINSCMNPVIYACWSRDFRRAFVRLLCMCCPRKIRRKYQPTFATRRCYSTCSL >GASR_RABIT VASLCRPGGPLLNNSGTGNLSCEPPRIRGAGTRELELAIRVTLYAVIFLMSVGGNILIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAVSYLMGVSVSVSTLSLVAIALERYSAICRPLARVWQTRSHAARVILATWLLSGLLMVPYPVYTAVQP---V-LQCVHR---WPSARVRQTWSVLLLLLLFFVPGVVMAVAYGLISRELYLGSGPPRPAQAKLLAKKRVVRMLLVIVVLFFMCWLPVYSANTWRAFDGPGALSGAPISFIHLLSYASACVNPLVYCFMHRRFRQACLDTCARCCPRPPRARPRPLPSIASLSRLSYT >IL8B_CANFA G--DIDNYTYNTEMPIIPADSAPC----RPESLDINKYAVVVIYVLVFVLNLLGNSLVIMVVLYSRVSHSVTDVYLLNLAIADLLFALTLPIWAVSKVKG--WIFGTPLCKIVSLLKEVNFYSGILLLASISMDRYLAIVHATRRLTQKKHWVKFICLGIWALSLILSLPIFVFRRAIN-YS-SPVCYED-MGTNTTKLRIVMRALPQTFGFIVPLMIMLFCYGLTLRTL---------FEAHMGQKHRAMRVIFAVVLVFLLCWLPYNLVADTLMRLQAINDIGRALDATEILGFFHSCLNPLIYAFIGQKFRHGLLKIMAFHGLISKEYLPKDS------------ >KI01_RAT ---------------MDNTTTTEPPKQPCTRNTLITQQIIPMLYCVVFITGVLLNGISGWIFFYVPS-SKSFIIYLKNIVVADFLMGLTFPFKVLSDSGLGPWQLNVFVFRVSAVIFYVNMYVSIAFFGLISFDRYYKIVKPLVSIVQSVNYSKVLSVLVWVLMLLLAVPNIILTNQS-TN---IQCMEL--KNELGRKWHKASNYVFVSIFWIVFLLLTVFYMAITRKIFK----LKSRKNSISVKRKSSRNIFSIVLAFVACFAPYHVARIPYTKSQTEETLLYTKEFTLLLSAANVCLDPISISSYASRLEKS-------------------------------- >O18793 RSRFIRNTNGSGEEVTTFFDYDYGAPCHKFDVKQIGAQLLPPLYSLVFIFGFVGNMLVVLILINCKKLKSLTDIYLLNLAISDLLFLITLPLWAHSAANE--WVFGNAMCKLFTGLYHIGYLGGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWLVAVFASVPGIIFTKCQE-ED-VYICGPY----FPRGWNNFHTIMRNILGLVLPLLIMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWTPYNIVILLNTFQEFFRQLDQATQVTETLGMTHCCINPIIYAFVGEKFRRYLSMFFRKYITKRFCKQCPVFVTSTNTPSTAEQ >NY1R_HUMAN 
TLFSQVENHSVHSNFSEKNAQLLAFENDDCHLPLAMIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAIMCLPFTFVYTLMDHWVFGEAMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYVGIAVIWVLAVASSLPFLIYQVMTDKD--KYVCFDQ---FPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMMRDNKYRSSETKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK >B2AR_FELCA ----MGQPGNRSVFLLAPNGSHAPDQDGTQERNDAWVVGMGIVMSLIVLAIVFGNVLVITAIARFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFYQSLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQAYAIASSIVSFYLPLVVMVFVYSRVFQVAQRQDGRHRRASKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IPKEVYILLNWVGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSLKAYGNGYSDYAGEHSGGPLG >OLF1_RAT -------------MTEENQTVISQFLLLFLPIPSEHQHVFYALFLSMYLTTVLGNLIIIILIHLDSHLHTPMYLFLSNLSFSDLCFSSVTMPKLLQNMQSQVPSIPFAGCLTQLYFYLYFADLESFLLVAMAYDRYVAICFPLYMSIMSPKLCVSLVVLSWVLTTFHAMLHTLLMARL---SPHFFCDISKLSCSDTHVNELVIFVMGGLVIVIPFVLIIVSYARVVASI--------LKVPSVRGIHKIFSTCGSHLSVVSLFYGTIIGLYLCPSANN---STVKETVMAMMYTVVTPMLNPFIYSLRNRDMKEALIRVLCKKKITFCL------------------ >GASR_HUMAN GASLCRPGAPLLNSSSVGNLSCEPPRIRGAGTRELELAIRITLYAVIFLMSVGGNMLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAVSYLMGVSVSVSTLSLVAIALERYSAICRPLARVWQTRSHAARVIVATWLLSGLLMVPYPVYTVVQP---V-LQCVHR---WPSARVRQTWSVLLLLLLFFIPGVVMAVAYGLISRELYLGPGPSRPTQAKLLAKKRVVRMLLVIVVLFFLCWLPVYSANTWRAFDGPGALSGAPISFIHLLSYASACVNPLVYCFMHRRFRQACLETCARCCPRPPRARPRALPSIASLSRLSYT >CKR2_MOUSE FTRSIQELDEGATTPYDYDDGEPC---HKTSVKQIGAWILPPLYSLVFIFGFVGNMLVIIILIGCKKLKSMTDIYLLNLAISDLLFLLTLPFWAHYAANE--WVFGNIMCKVFTGLYHIGYFGGIFFIILLTIDRYLAIVHAVALKARTVTFGVITSVVTWVVAVFASLPGIIFTKSKQ-DD-HYTCGPY----FTQLWKNFQTIMRNILSLILPLLVMVICYSGILHTL--------FRCRNEKKRHRAVRLIFAIMIVYFLFWTPYNIVLFLTTFQESLKHLDQAMQVTETLGMTHCCINPVIYAFVGEKFRRYLSIFFRKHIAKRLCKQCPVFV-------SSTF >CKRA_MOUSE 
PTEQVSWGLYSGYDEEAYSVGPLPELCYKADVQAFSRAFQPSVSLMVAVLGLAGNGLVLATHLAARRTRSPTSVHLLQLALADLLLALTLPFAAAGALQG--WNLGSTTCRAISGLYSASFHAGFLFLACINADRYVAIARALGQRPSTPSRAHLVSVFVWLLSLFLALPALLFS--RD-RE-QRRCRLIFPESLTQTVKGASAVAQVVLGFALPLGVMAACYALLGRTL---------LAARGPERRRALRVVVALVVAFVVLQLPYSLALLLDTADLLAKRKDLALLVTGGLTLVRCSLNPVLYAFLGLRFRRDLRRLLQGGGCSPKPNPRGRCSCSAPTETHSLS >OPSD_GOBNI MNGTEGPFFYIPMVNTTGVVRSPYEYPQYYLVNPAAYACLGAYMFFLILVGFPVNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGVAFTWFMASACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFTVHFCIPLAVVGFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVIGFLVCWLPYASVAWYIFTHQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTVSSSSVSPAA-- >AG2R_PIG ------MILNSSTEDSIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPTIIHRNVFF-TN--TVCAFH-YESQNSTLPVGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICLAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSE-------N >BONZ_MACMU -----MAEYDHYEDDGFLNSFNDSSQEEHQDFLQFRKVFLPCMYLVVFVCGLVGNSLVLVISIFYHKLQSLTDVFLVNLPLADLVFVCTLPFWAYAGIHE--WIFGQVMCKTLLGVYTINFYTSMLILTCITVDRFIVVVKATNQQAKRMTWGKVICLLIWVISLLVSLPQIIYGNVFNLD--KLICGYH-----DEEISTVVLATQMTLGFFLPLLAMIVCYSVIIKTL---------LHAGGFQKHRSLKIIFLVMAVFLLTQTPFNLVKLIRSTHWEYTSFHYTIIVTEAIAYLRACLNPVLYAFVSLKFRKNFWKLVKDIGCLPYLGVSHQWKTFSASHNVEAT >PE24_RABIT --------------------MSTPVANASASSMPELLNNPVTIPAVMFIFGVVGNLVAIVVLCKSRKKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGGQALCDYSTFILLFFGLSGLSIICAMSIERYLAINHAYYSHYVDKRLAGLTLFAVYASNVLFCALPNMGLGRSRLQFPDTWCFID---WRTNVTAHAAFSYMYAGFSSFLILATVLCNVLVCGALLRMAAAAASFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFINQLYQPDEISQNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSRRDRSRRTSSAMSTHSR >FSHR_MACFA 
FDMTYAEFDYDLCNEVVDVTCSPKPDAFNPCEDILGYNILRVLIWFISILAITGNIIVLVTLTTSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLDCKVHVRHAASVMVMGWIFAFAAALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNP------NIVSSSSDTRIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKAKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEMQAQIYRTNSHPRNGHCSSA >ACM1_DROME -------------MYGNQTNGTIGFETKGPRYSLASMVVMGFVAAILSTVTVAGNVMVMISFKIDKQLQTISNYFLFSLAIADFAIGTISMPLFAVTTILGYWPLGPIVCDTWLALDYLASNASVLNLLIINFDRYFSVTRPLYRAKGTTNRAAVMIG-AWGISLLLWPPWIYSWPYIE-VP-KDECYIQ-----FIETNQYITFGTALAAFYFPVTIMCFLYWRIWRETKKRNPNKKKKSQEKRQESKAAKTLSAILLSFIITWTPYNILVLIKPLTTCSCIPTELWDFFYALCYINSTINPMSYALCNATFRRTYVRILTCKWHTRNREGMVRG------------ >BRS3_MOUSE QTLISITNDTETSSSVVSNDTTHKGWTGDNSPGIEALCAIYITYAGIISVGILGNAILIKVFFKTKSMQTVPNIFITSLAFGDLLLLLTCVPVDATHYLAEGWLFGKVGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLRQPPNAILKTCAKAGGIWIVSMIFALPEAIFSNVYTVT--FESCNSY---ISERLLQEIHSLLCFLVFYIIPLSIISVYYSLIARTLYKSIPTQSHARKQIESRKRIAKTVLVLVALFALCWLPNHLLYLYHSFTYESDVPFVIIIFSRVLAFSNSCVNPFALYWLSKTFQQHFKAQLCCLKAEQPEPPLGDIMGRVPATGSAHV >OPSD_MUGCE MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILLGFPINFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIALWSLVVLAVERWMVVCKPISNFRFGENHAIMGLAFTWVMASACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVVAFLICWCPYAGVAWYIFTHQGSEFGPLFMTFPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >C5AR_HUMAN SFNYTTPDYGHYDDKDTLDLNTPVD--KTSNTLRVPDILALVIFAVVFLVGVLGNALVVWVTAFEAK-RTINAIWFLNLAVADFLSCLALPILFTSIVQHHHWPFGGAACSILPSLILLNMYASILLLATISADRFLLVFKPICQNFRGAGLAWIACAVAWGLALLLTIPSFLYRVVRE-PP--VLCGVD-YS-HDKRRERAVAIVRLVLGFLWPLLTLTICYTFILLRT---------WSRRATRSTKTLKVVVAVVASFFIFWLPYQVTGIMMSFLEPSLLLNKLDSLCVSFAYINCCINPIIYVVAGQGFQGRLRKSLPSLLRNVLTEESVVRSTVD-------- >MAS_RAT 
------MDQSNMTSFAEEKAMNTSSRNASLGTSHPPIPIVHWVIMSISPLGFVENGILLWFLCFRMR-RNPFTVYITHLSIADISLLFCIFILSIDYALDYESSGHYYTIVTLSVTFLFGYNTGLYLLTAISVERCLSVLYPIYRCHRPKHQSAFVCALLWALSCLVTTMEYVMCI-DSHS--QSDC-----------RAVIIFIAILSFLVFTPLMLVSSTILVVKIRK----------NTWASHSSKLYIVIMVTIIIFLIFAMPMRVLYLLYYEYW--STFGNLHNISLLFSTINSSANPFIYFFVGSSKKKRFRESLKVVLTRAFKDEMQPRTVSIETVV---- >GPRC_RAT NLSGLPRDCIEAGTPENISAAVPSQGSVVESEPELVVNPWDIVLCSSGTLICCENAVVVLIIFHSPSLRAPMFLLIGSLALADLLAG-LGLIINFVFA-Y--LLQSEATKLVTIGLIVASFSASVCSLLAITVDRYLSLYYALYHSERTVTFTYVMLVMLWGTSTCLGLLPVMGWNCL-R--DESTCSVV-------RPLTKNNAAILSISFLFMFALMLQLYIQICKIVMRHIALHFLATSHYVTTRKGISTLALILGTFAACWMPFTLYSLIADY----TYPSIYTYATLLPATYNSIINPVIYAFRNQEIQKALCLICCGCIPNTLSQRARSP------------ >V1AR_HUMAN SSPWWPLATGAGNTSREAEALGEGNGPPRDVRNEELAKLEIAVLAVTFAVAVLGNSSVLLALHRTPRKTSRMHLFIRHLSLADLAVAFFQVLPQMCWDITYRFRGPDWLCRVVKHLQVFGMFASAYMLVVMTADRYIAVCHPLKTLQQPARRSRLMIAAAWVLSFVLSTPQYFVFSMIETK--ARDCWAT---FIQPWGSRAYVTWMTGGIFVAPVVILGTCYGFICYNIWCNFLLVSSVKSISRAKIRTVKMTFVIVTAYIVCWAPFFIIQMWSVWDPMSESENPTITITALLGSLNSCCNPWIYMFFSGHLLQDCVQSFPCCQNMKEKFNKEDTTFYSNNRSPTNS >A1AA_HUMAN ----------MVFLSGNASDSSNCT-QPPAPVNISKAILLGVILGGLILFGVLGNILVILSVACHRHLHSVTHYYIVNLAVADLLLTSTVLPFSAIFEVLGYWAFGRVFCNIWAAVDVLCCTASIMGLCIISIDRYIGVSYPLYPTIVTQRRGLMALLCVWALSLVISIGPLFGWRQP--AP--TICQIN--------EEPGYVLFSALGSFYLPLAIILVMYCRVYVVAKREAKTFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPDFKPSETVFKIVFWLGYLNSCINPIIYPCSSQEFKKAFQNVLRIQCLCRKQSSKHALSQ---------- >CML2_RAT PSNSTPLALNLSLALREDAPGNLTGDLSEHQQYVIALFLSCLYTIFLFPIGFVGNILILVVNISFREKMTIPDLYFINLAAADLILVADSLIEVFNLDEQ--YYDIAVLCTFMSLFLQINMYSSVFFLTWMSFDRYLALAKAMCGLFRTKHHARLSCGLIWMASVSATLVPFTAVHLR-A----CFCFA---------DVREVQWLEVTLGFIVPFAIIGLCYSLIVRALIR----AHRHRGLRPRRQKALRMIFAVVLVFFICWLPENVFISVHLLQWAQHAYPLTGHIVNLAAFSNSCLSPLIYSFLGETFRDKLRLYVAQKTSLPALNRFCHADSTEQSDVKFSS >GPRF_HUMAN 
----MDPEETSVYLDYYYATSPNSDIRETHSHVPYTSVFLPVFYTAVFLTGVLGNLVLMGALHFKPGSRRLIDIFIINLAASDFIFLVTLPLWVDKEASLGLWRTGSFLCKGSSYMISVNMHCSVLLLTCMSVDRYLAIVWPVSRKFRRTDCAYVVCASIWFISCLLGLPTLLSRELT-IDD-KPYCAEK----KATPIKLIWSLVALIFTFFVPLLSIVTCYCCIARKLCAH---YQQSGKHNKKLKKSIKIIFIVVAAFLVSWLPFNTFKFLAIVSGLRAILQLGMEVSGPLAFANSCVNPFIYYIFDSYIRRAIVHCLCPCLKNYDFGSSTETALSTFIHAEDFA >CKR5_TRAFR ----MDYQVSSPTYDIDYYTSEPC---QKVNVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >GP68_HUMAN ----------------MGNITADNSSMSCTIDHTIHQTLAPVVYVTVLVVGFPANCLSLYFGYLQIKARNELGVYLCNLTVADLFYICSLPFWLQYVLQHDNWSHGDLSCQVCGILLYENIYISVGFLCCISVDRYLAVAHPFFHQFRTLKAAVGVSVVIWAKELLTSIYFLMHEEV--QH---RVCFEHPIQAWQRAIQRAINYYRFLVGFLFPICLLLASYQGILRAVRR------SHGTQKSRKDQIQRLVLSTVVIFLACFLPYHVLLLVRSVWEASKGVFNAYHFSLLLTSFNCVADPVLYCFVSETTHRDLARLRGACLAFLTCSRTGRAAPEASGKSGAQG >YYI3_CAEEL SDPNAEDLYITMTPSVSTENDTTVWATEEPAAIVWRHPLLAIALFSICLLTVAGNCLVVIAVCTKKYIWVTRLYLIISLAIADLIVGVIVMPMNSLFEIANHWLFGLMMCDVFHAMDILASTASIWNLCVISLDRYMAGQDPIYRDKVSKRRILMAILSVWVLSAILSFPGIIWWWRTSP-H--SQCLFT--------DSKMYVSFSSLVSFYIPLFLILFAYGKVYIIATRHKLNKSRQMMRYVHEQRAARTLSIVVGAFILCWTPFFVFTPLTAFCESCSNKETIFTFVTWAGHLNSMLNPLIYSRFSRDFRRAFKQILTCQRQQKVKTAFKTPLISVTQMAPRFS >CKR1_MACMU METPNTTEDYDMITEFDYGDATPC---HKVNERAILAQLLPPLYSLVFVIGVVGNLLVVLVLVQYKRLKNMTNIYLLNLAISDLLFLFTLPFLIYYKSTDD-WIFGDAMCKILSGFYYTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIIIWALAILASSPLMYFSKTQW-NI-RHSCNIHFPYESFQQWKLFQALKLNLFGLVLPLLVMIVCYTGIIKIL---------LRRPNEKKSKAVRLIFVIMIIFFLFWTPYNLTELISVFQEFLRQLDLAMEVTEVIANMHCCVNPVIYAFAGERFRKYLRQLFHRRVAVHLVKWLPFLV-------SSTS >BRB2_RAT 
IEMFNITTQALGSAHNGTFSEVNC---PDTEWWSWLNAIQAPFLWVLFLLAALENIFVLSVFCLHKTNCTVAEIYLGNLAAADLILACGLPFWAITIANNFDWLFGEVLCRVVNTMIYMNLYSSICFLMLVSIDRYLALVKTMMGRMRGVRWAKLYSLVIWSCTLLLSSPMLVFRTMKD-HN--TACVIV---YPSRSWEVFTNMLLNLVGFLLPLSIITFCTVRIMQVLRNN---EMKKFKEVQTEKKATVLVLAVLGLFVLCWFPFQISTFLDTLLRLGRAVDIVTQISSYVAYSNSCLNPLVYVIVGKRFRKKSREVYQAICRKGGCMGESVQLRTS-------I >A1AD_RAT TGSGEDNQSSTGEPGAAASGEVNGSAAVGGLVVSAQGVGVGVFLAAFILTAVAGNLLVILSVACNRHLQTVTNYFIVNLAVADLLLSAAVLPFSATMEVLGFWAFGRTFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLYPAIMTERKAAAILALLWAVALVVSVGPLLGWKEP--VP--RFCGIT--------EEVGYAIFSSVCSFYLPMAVIVVMYCRVYVVARSTHTLLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLKPSEGVFKVIFWLGYFNSCVNPLIYPCSSREFKRAFLRLLRCQCRRRRRR--LWAASTGD-ARSDCA >TSHR_BOVIN LQAFDSHYDYTVCGGSEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLVILLTSHYKLTVPRFLMCNLAFADFCMGLYLLLIASVDLYTQSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWHAITFAMLDRKIRLWHAYVIMLGGWVCCFLLALLPLVGISSY----KVSICLPM-----TETPLALAYIILVLLLNIIAFIIVCACYVKIYITVRNP------HYNPGDKDTRIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFMLLSKFGICKRQAQAYRGSTGIRVQKVPPD >MC3R_HUMAN NASCCLPSVQPTLPNGSEHLQAPFFSNQSSSAFCEQVFIKPEIFLSLGIVSLLENILVILAVVRNGNLHSPMYFFLCSLAVADMLVSVSNALETIMIAIVHSDQFIQHMDNIFDSMICISLVASICNLLAIAVDRYVTIFYALYHSIMTVRKALTLIVAIWVCCGVCGVVFIVYYS----------------------EESKMVIVCLITMFFAMMLLMGTLYVHMFLFARLHIAAADGVAPQQHSCMKGAVTITILLGVFIFCWAPFFLHLVLIITCPTNICYTAHFNTYLVLIMCNSVIDPLIYAFRSLELRNTFREILCGCNGMNLG------------------ >MSHR_SHEEP PVLGSQRRLLGSLNCTPPATLPLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICSSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSVLSITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >CKR1_MOUSE 
MEISDFTEAYPTTTEFDYGDSTPC---QKTAVRAFGAGLLPPLYSLVFIIGVVGNVLMILVLMQHRRLQSMTSIYLFNLAVSDLVFLFTLPFWIDYKLKDD-WIFGDAMCKLLSGFYYLGLYSEIFFIILLTIDRYLAIVHAVALRARTVTLGIITSIITWALAILASMPALYFFKAQW-EF-HRTCSPHFPYKSLKQWKRFQALKLNLLGLILPLLVMIICYAGIIRIL---------LRRPSEKKVKAVRLIFAITLLFFLLWTPYNLSVFVSAFQDVLKHLDLAMQVTEVIAYTHCCVNPIIYVFVGERFWKYLRQLFQRHVAIPLAKWLPFLT-------SSIS >A2AA_BOVIN ----MGSLQPDAGNASWNGTEAPGGGARATPYSLQVTLTLVCLAGLLMLFTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKAWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISFEKKRP-S--PRCEIN--------DQKWYVISSSIGSFFAPCLIMILVYVRIYQIAKRRSGGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLTAIGCP--VPPTLFKFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------ >SSR3_MOUSE SEPMTLDPGNTSSTWPLDTTLGNTSAGASLTGLAVSGILISLVYLVVCVVGLLGNSLVIYVVLRHTSSPSVTSVYILNLALADELFMLGLPFLAAQNALSY-WPFGSLMCRLVMAVDGINQFTSIFCLTVMSVDRYLAVVHPTSARWRTAPVARTVSRAVWVASAVVVLPVVVFSGVPR-----STCHMQ-WPEPAAAWRTAFIIYMAALGFFGPLLVICLCYLLIVVKVRSTSCQAPACQRRRRSERRVTRMVVAVVALFVLCWMPFYLLNIVNVVCPLPPAFFGLYFLVVALPYANSCANPILYGFLSYRFKQGFRRILLRPSRRIRSQEPGSGEEDEEEEERREE >P2YR_BOVIN TAFLADPGSPWGNSTVTSTAAVASPFKCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLSLGRLKKKNAVYISVLVWLIVVVGISPILFYSGTG----KTITCYDT-TSDEYLRSYFIYSMCTTVAMFCVPLVLILGCYGLIVRALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKASRRSEANLQSKSLSEFKQNGDTSL >5H5B_RAT TPGIAFPPGPESCSDSPSSGRGGLILSGREPPFSAFTVLVVTLLVLLIAATFLWNLLVLVTILRVRAFHRVPHNLVASTAVSDVLVAALVMPLSLVSELSAGWQLGRSLCHVWISFDVLCCTASIWNVAAIALDRYWTITRHLYTLRTRRRASALMIAITWALSALIALAPLLFGWGE-D-A-LQRCQVS--------QEPSYAVFSTCGAFYVPLAVVLFVYWKIYKAAKFRATVTSGDSWREQKEKRAAMMVGILIGVFVLCWIPFFLTELVSPLCAC-SLPPIWKSIFLWLGYSNSFFNPLIYTAFNKNYNNAFKSLFTKQR----------------------- >OPSD_CHICK 
MNGTEGQDFYVPMSNKTGVVRSPFEYPQYYLAEPWKFSALAAYMFMLILLGFPVNFLTLYVTIQHKKLRTPLNYILLNLVVADLFMVFGGFTTTMYTSMNGYFVFGVTGCYIEGFFATLGGEIALWSLVVLAVERYVVVCKPMSNFRFGENHAIMGVAFSWIMAMACAAPPLFGWSRYIPEGMQCSCGIDYYTLKPEINNESFVIYMFVVHFMIPLAVIFFCYGNLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTNQGSDFGPIFMTIPAFFAKSSAIYNPVIYIVMNKQFRNCMITTLCCGKNPLGDEDTSAGTSSVSTSQVSPA >B3AR_MOUSE MAPWPHRNGSLALWSDAPTLDPSAAN---TSGLPGVPWAAALAGALLALATVGGNLLVIIAIARTPRLQTITNVFVTSLAAADLVVGLLVMPPGATLALTGHWPLGETGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGTLVTKRRARAAVVLVWIVSAAVSFAPIMSQWWRVQ-E--RCCSFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVAKRQGVPRRPARLLPLREHRALRTLGLIMGIFSLCWLPFFLANVLRALAGPS-VPSGVFIALNWLGYANSAFNPVIYCRSP-DFRDAFRRLLCSYGGRGPEEPRAVTARQSPPLNRFDG >SSR1_RAT GEGVCSRGPGSGAADGMEEPGRNSSQNGTLSEGQGSAILISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLLRH-WPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIAARYRRPTVAKVVNLGVWVLSLLVILPIVVFSRTAADG--TVACNML-MPEPAQRWLVGFVLYTFLMGFLLPVGAICLCYVLIIAKMRMVALK-AGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQ--DDATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLS-----WMDNAAETALKSRAYSVED >AA1R_HUMAN ----------------------------MPPSISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PQTYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKMVVTPRRAAVAIAGCWILSFVVGLTPMFGWNNLSN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRKQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPSCHKPSILTYIAIFLTHGNSAMNPIVYAFRIQKFRVTFLKIWNDHFRCQPAPPID--PDD--------- >PE23_HUMAN PFCTRLNHSYTGMWAPERSAEARGNLTRPPGSGEDCGSVSVAFPITMLLTGFVGNALAMLLVSRSYRRKKSFLLCIGWLALTDLVGQLLTTPVVIVVYLSKQIDPSGRLCTFFGLTMTVFGLSSLFIASAMAVERALAIRAPHYASHMKTRATRAVLLGVWLAVLAFALLPVLGVG----QYTGTWCFISNGTSSSHNWGNLFFASAFAFLGLLALTVTFSCNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNQTSQKECNFFLIAVRLASLNQILDPWVYLLLRKILLRKFCQIRYHTNNYASSSTSLPCWSDHLER----- >OLF0_RAT 
---------------MNNQTFITQFLLLGLPIPEEHQHLFYALFLVMYLTTILGNLLIIVLVQLDSQLHTPMYLFLSNLSFSDLCFSSVTMPKLLQNMRSQDTSIPYGGCLAQTYFFMVFGDMESFLLVAMAYDRYVAICFPLYTSIMSPKLCTCLVLLLWMLTTSHAMMHTLLAARL---SLNFFCDLFKLACSDTYINELMIFIMSTLLIIIPFFLIVMSYARIISSI--------LKVPSTQGICKVFSTCGSHLSVVSLFYGTIIGLYLCPAGNN---STVKEMVMAMMYTVVTPMLNPFIYSLRNRDMKRALIRVICSMKITL-------------------- >BRB2_RABIT -MLNITSQVLAPALNGSVSQSSGC---PNTEWSGWLNVIQAPFLWVLFVLATLENLFVLSVFCLHKSSCTVAEVYLGNLAAADLILACGLPFWAVTIANHFDWLFGEALCRVVNTMIYMNLYSSICFLMLVSIDRYLALVKTMIGRMRRVRWAKLYSLVIWGCTLLLSSPMLVFRTMKD-YN--TACIID---YPSRSWEVFTNVLLNLVGFLLPLSVITFCTVQILQVLRNN---EMQKFKEIQTERRATVLVLAVLLLFVVCWLPFQVSTFLDTLLKLGHVIDVITQVGSFMGYSNSCLNPLVYVIVGKRFRKKSREVYRAACPKAGCVLEPVQLRTS-------I >CCKR_MOUSE DSLLMNGSNITPPCELGLENETLFCLDQPQPSKEWQSAVQILLYSFIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAVSDLMLCLFCMPFNLIPNLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICRPLSRVWQTKSHALKVIAATWCLSFTIMTPYPIYN----NNQTANMCRFL---LPSDAMQQSWQTFLLLILFLIPGVVMVVAYGLISLELYQGRINSSGSAANLIAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTVSHLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGPTGVRGEVTIRASLSRYSYS >ACM3_RAT N---------ISQETGNFSS-NDTSSDPLGGHTIWQVVFIAFLTGFLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFV-VP-PGECFIQ------FLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRKTRTKRKRMSLIKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFCDSC-IPKTYWNLGYWLCYINSTVNPVCYALCNKTFRTTFKTLLLCQCDKRKRRKQQYQHKRVPEQAL--- >5H1E_HUMAN -------------MNITNCT--TEASMAIRPKTITEKMLICMTLVVITTLTTLLNLAVIMAIGTTKKLHQPANYLICSLAVTDLLVAVLVMPLSIIYIVMDRWKLGYFLCEVWLSVDMTCCTCSILHLCVIALDRYWAITNAIYARKRTAKRAALMILTVWTISIFISMPPLFWRS---S-P--SQCTIQ-------HDHVIYTIYSTLGAFYIPLTLILILYYRIYHAAKSLNDLGERQQISSTRERKAARILGLILGAFILSWLPFFIKELIVGLSIYT-VSSEVADFLTWLGYVNSLINPLLYTSFNEDFKLAFKKLIRCREHT--------------------- >OPS1_CALVI 
ALT---NGSVTDKVTPDMAHLVHPYWNQFPAMEPKWAKFLAAYMVLIATISWCGNGVVIYIFSTTKSLRTPANLLVINLAISDFGIMITNTPMMGINLFYETWVLGPLMCDIYGGLGSAFGCSSILSMCMISLDRYNVIVKGMAGQPMTIKLAIMKIALIWFMASIWTLAPVFGWSRYVPEGNLTSCGID--YLERDWNPRSYLIFYSIFVYYLPLFLICYSYWFIIAAVSAHMNVRSSEDADKSAEGKLAKVALVTISLWFMAWTPYTIINTLGLFKYEG-LTPLNTIWGACFAKSAACYNPIVYGISHPKYGIALKEKCPCCVFGKVDDGK-ASNNESETKA---- >V1BR_RAT ---MNSEPSWTATPSPGGTLPVPNATTPWLGRDEELAKVEIGILATVLVLATGGNLAVLLTLGRHGHKRSRMHLFVLHLALTDLGVALFQVLPQLLWDITYRFQGSDLLCRAVKYLQVLSMFASTYMLLAMTLDRYLAVCHPLRSLRQPSQSTYPLIAAPWLLAAILSLPQVFIFSLRESG--VLDCWAD---FYFSWGPRAYITWTTMAIFVLPVAVLSACYGLICHEIYKNRGLVSSISTISRAKIRTVKMTFVIVLAYIACWAPFFSVQMWSVWDENADSTNVAFTISMLLGNLSSCCNPWIYMGFNSRLLPRSLSHHACCTGSKPQVHRQLSRTTLLTHACGSP >OPSD_SARPI MNGTEGPFFYIPMSNATGLVRSPYDYPQYYLVPPWGYACLAAYMFLLILTGFPVNFLTLYVTIEHKKLRSPLNYILLNLAVADLFMVIGGFTTTMWTSLNGYFVFGRMGCNIEGFFATLGGEIALWSLVVLSMERWIVVCKPISNFRFGENHAVMGVAFSWFMAAACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVVHFTCPLTIITFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVIIMFVAFLACWVPYASVAWYIFTHQGSEFGPVFMTIPAFFAKSSAVYNPVIYICLNKQFRHCMITTLCCGKNPFEEEEGSTTSVCSVSPAA--- >V1BR_MOUSE ---MDSEPSWTATPSPGGTLFVPNTTTPWLGRDEELAKVEIGILATVLVLATGGNLAVLLILGLQGHKRSRMHLFVLHLALTDLGVALFQVLPQLLWDITYRFQGSDLLCRAVKYLQVLSMFASTYMLLAMTLDRYLAVCHPLRSLQQPSQSTYPLIAAPWLLAAILSLPQVFIFSLRESG--VLDCWAD---FYFSWGPRAYITWTTMAIFVLPVVVLTACYGLICHEIYKNATLVSSISTISRAKIRTVKMTFVIVLAYIACWAPFFSVQMWSVWDENADSTNVAFTISMLLGNLSSCCNPWIYMGFNSHLLPRSLSHRACCRGSKPRVHRQLSRTTLLTHTCGPS >CCKR_CAVPO DSLFVNGSNITSACELGFENETLFCLDRPRPSKEWQPAVQILLYSLIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAVSDLMLCLFCMPFNLIPSLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICKPLSRVWQTKSHALKVIAATWCLSFTIMTPYPIYN----NNQTGNMCRFL---LPNDVMQQTWHTFLLLILFLIPGIVMMVAYGLISLELYQGRINSSSSTANLMAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTVSHLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGTPGVRGEMTTGASLSRYSYS >OPRM_RAT 
LSHVDGNQSDPCGLNRTGLGGNDSLCPQTGSPSMVTAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGTILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDFRTPRNAKIVNVCNWILSSAIGLPVMFMATTKYGS---IDCTLT-FSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSTIEQQNSTRVPSTANTVDRTNH >FMLR_RABIT -----------MDSNASLPLNVSGGTQATPAGLVVLDVFSYLILVVTFVLGVLGNGLVIWVTGFRMT-HTVTTISYLNLALADFSFTSTLPFFIVTKALGGHWPFGWFLCKFVFTIVDINLFGSVFLIALIALDRCICVLHPVAQNHRNVSLAKKVIVGPWICALLLTLPVIIRVTTLSSPWPAEKLKVA------ISMFMVRGIIRFIIGFSTPMSIVAVCYGLIATKI---------HRQGLIKSSRPLRVLSFVVASFLLCWSPYQIAALIATVRIREKDLRIVLDVTSFVAFFNSCLNPMLYVFMGQDFRERLIHSLPASLERALSEDSAQTTS---------- >OPSP_COLLI ----MDPTNSPQEPPHTSTPGPFDGPQWPHQAPRGMYLSVAVLMGIVVISASVVNGLVIVVSIRYKKLRSPLNYILVNLAMADLLVTLCGSSVSFSNNINGFFVFGKRLCELEGFMVSLTGIVGLWSLAILALERYVVVCRPLGDFRFQHRHAVTGCAFTWVWSLLWTTPPLLGWSSYVPEGLRTSCGPN--WYTGGSNNNSYILTLFVTCFVMPLSLILFSYANLLMTLRAAAAQQQESDTTQQAERQVTRMVVAMVMAFLICWLPYTTFALVVATNKDIAIQPALASLPSYFSKTATVYNPIIYVFMNKQFQSCLLKMLCCGHHPRGTGRTAPAGLRNKVTPSHPV >O1D2_HUMAN -------------MDGGNQSEGSEFLLLGMSESPEQQQILFWMFLSMYLVTVVGNVLIILAISSDSRLHTPVYFFLANLSFTDLFFVTNTIPKMLVNLQSHNKAISYAGCLTQLYFLVSLVALDNLILAVMAYDRYVAICCPLYTTAMSPKLCILLLSLCWVLSVLYGLIHTLLMTRV---THYIFCEMYRMACSNIQINHTVLIATGCFIFLIPFGFVIISYVLIIRAI--------LRIPSVSKKYKAFSTCASHLGAVSLFYGTLCMVYLKPLHTY----SVKDSVATVMYAVVTPMMNPFIYSLRNKDMHGALGRLLDKHFKRLT------------------- >IL8A_HUMAN PQMWDFDDLNFTGMPPADEDYSPC----MLETETLNKYVVIIAYALVFLLSLLGNSLVMLVILYSRVGRSVTDVYLLNLALADLLFALTLPIWAASKVNG--WIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFVCLGCWGLSMNLSLPFFLFRQAYH-NS-SPVCYEV-LGNDTAKWRMVLRILPHTFGFIVPLFVMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQNNIGRALDATEILGFLHSCLNPIIYAFIGQNFRHGFLKILAMHGLVSKEFLARHR------------ >EBP2_HUMAN 
NPDKDGGTPDSGQELRGNLTGQIQNPLYPVTESSYSAYAIMLLALVVFAVGIVGNLSVMCIVWHSYYLKSAWNSILASLALWDFLVLFFCLPIVIFNEITKQRLLGDVSCRAVPFMEVSSLGVTTFSLCALGIDRFHVATSTLVRPIERCQSILAKLAVIWVGSMTLAVPELLLWQLAQM----KPSASLSLYSLVMTYQNARMWWYFGCYFCLPILFTVTCQLVTWRVRGPPRKS-CRASKHEQCESQLNSTVVGLTVVYAFCTLPENVCNIVVAYLSTEQTLDLLGLINQFSTFFKGAITPVLLLCICRPLGQAFLDCCCCCCCEECGGASEASKLKTEVSSSIYF >5H5B_MOUSE TPGLAFPPGPESCSDSPSSGRGGLILPGREPPFSAFTVLVVTLLVLLIAATFLWNLLVLVTILRVRAFHRVPHNLVASTAVSDVLVAVLVMPLSLVSELSAGWQLGRSLCHVWISFDVLCCTASIWNVAAIALDRYWTITRHLYTLRTRSRASALMIAITWALSALIALAPLLFGWGE-D-A-LQRCQVS--------QEPSYAVFSTCGAFYLPLAVVLFVYWKIYKAAKFRATVTSGDSWREQKEKRAAMMVGILIGVFVLCWIPFFLTELISPLCAC-SLPPIWKSIFLWLGYSNSFFNPLIYTAFNKNYNNAFKSLFTKQR----------------------- >V2R_BOVIN MFMASTTSAVPWHLSQPTPAGNGSEGELLTARDPLLAQAELALLSTVFVAVALSNGLVLGALVRRGRRWAPMHVFIGHLCLADLAVALFQVLPQLAWDATDRFRGPDALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMAHRHGGGTHWNRPVLLAWAFSLLFSLPQLFIFAQRDSG--VLDCWAR---FAEPWGLRAYVTWIALMVFVAPALGIAACQVLIFREIHASGCRPAEGARVSAAVAKTVKMTLVIVIVYVLCWAPFFLVQLWAAWDPEA-REGPPFVLLMLLASLNSCTNPWIYASFSSSISSELRSLLCCTWRRAPPSPGPQESFLAKDTPS--- >GRHR_MOUSE --MANNASLEQDPNHCSAINNSIPLIQGKLPTLTVSGKIRVTVTFFLFLLSTAFNASFLLKLQKWTQKLSRMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGEFLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITQPL-AVQSNSKLEQSMISLAWILSIVFAGPQLYIFRMIYTV--FSQCVTH--SFPQWWHQAFYNFFTFGCLFIIPLLIMLICNAKIIFALTRVPRKNQSKNNIPRARLRTLKMTVAFATSFVVCWTPYYVLGIWYWFDPEMRVSEPVNHFFFLFAFLNPCFDPLIYGYFSL------------------------------------- >OLF1_CANFA --------------MDGNYTLVTEFILLGFPTRPELQIVLFLVFLTLYGIILTGNIGLMMLIRTDPHLQTPMYFFLSNLSFADLCFSSAIVPKMLVNFLSENKSISLYGCALQFYFSCAFADTESFILAAMAYDRYVAICNPLYTVVMSRGICVWLIVLSYIGGNMSSLVHTSFAFIL---KNHFFCDLPKLSCTDTSVNEWLLSTYGSSVEIFCFIVIVISYYFILRSV--------LRIRSSSGRKKTFSTCASHLTSVAIYQGTLLFIYSRPTYLY---TPNTDKIISVFYTIIIPVLNPLIYSLRNKDVKDAAKRAVRLKVDSS-------------------- >GPRY_HUMAN 
SVSSWPYSSHRMRFITNHSDQATPNVTTCPMDEKLLSTVLTTSYSVIFIVGLVGNIIALYVFLGIHRKRNSIQIYLLNVAIADLLLIFCLPFRIMYHINQNKWTLGVILCKVVGTLFYMNMYISIILLGFISLDRYIKINRSIQRKAITTKQSIYVCCIVWMLALGGFLTMIILTLKK-----STMCFHY--RDKHNAKGEAIFNFILVVMFWLIFLLIILSYIKIGKNLLRISKR--SKFPNSGKYATTARNSFIVLIIFTICFVPYHAFRFIYISSQLNEIVHKTNEIMLVLSSFNSCLDPVMYFLMSSNIRKIMCQLLFRRFQGEPSRSESTSLHDTSVAVKIQS >OPSD_RANPI MNGTEGPNFYIPMSNKTGVVRSPFDYPQYYLAEPWKYSVLAAYMFLLILLGLPINFMTLYVTIQHKKLRTPLNYILLNLGVCNHFMVLCGFTITMYTSLHGYFVFGQTGCYFEGFFATLGGEIALWSLVVLAIERYIVVCKPMSNFRFGENHAMMGVAFTWIMALACAVPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFLIPLIIISFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVIFFLICWVPYAYVAFYIFTHQGSEFGPIFMTVPAFFAKSSAIYNPVIYIMLNKQFRNCMITTLCCGKNPFGDDDASSAATSVSTSQVSPA >GP38_HUMAN GSPWNGSDGPEGAREPPWPALPPCDERRCSPFPLGALVPVTAVCLCLFVVGVSGNVVTVMLIGRYRDMRTTTNLYLGSMAVSDLLILLGLPFDLYRLWRSRPWVFGPLLCRLSLYVGEGCTYATLLHMTALSVERYLAICRPLARVLVTRRRVRALIAVLWAVALLSAGPFLFLVG---AEAFSRECRPS----PAQLGALRVMLWVTTAYFFLPFLCLSILYGLIGRELWSSPLR--AASGRERGHRQTVRVLLVVVLAFIICWLPFHVGRIIYINTEDSYFSQYFNIVALQLFYLSASINPILYNLISKKYRAAAFKLLLARKSRPRGFHRSRDDTGGDTVGYTET >GPRM_HUMAN PILEINMQSESNITVRDDIDDINTNMYQPLSYPLSFQVSLTGFLMLEIVLGLGSNLTVLVLYCMKSNINSVSNIITMNLHVLDVIICVGCIPLTIVILLLSLESNTALICCFHEACVSFASVSTAINVFAITLDRYDISVKPA-NRILTMGRAVMLMISIWIFSFFSFLIPFIEVNFFS--TKTLLCVST---EYYTELGMYYHLLVQIPIFFFTVVVMLITYTKILQALNIRTISQHEARERRERQKRVFRMSLLIISTFLLCWTPISVLNTTILCLGPSDLLVKLRLCFLVMAYGTTIFHPLLYAFTRQKFQKVLKSKMKKRVVSIVEADPLPNWIDPKRNKKITF >GPR3_HUMAN GAGSPLAWLSAGSGNVNVSSVGPAEGPTGPAAPLPSPKAWDVVLCISGTLVSCENALVVAIIVGTPAFRAPMFLLVGSLAVADLLAG-LGLVLHFAAV-F--CIGSAEMSLVLVGVLAMAFTASIGSLLAITVDRYLSLYNALYYSETTVTRTYVMLALVWGGALGLGLLPVLAWNCL-D--GLTTCGVV-------YPLSKNHLVVLAIAFFMVFGIMLQLYAQICRIVCRHIALHLLPASHYVATRKGIATLAVVLGAFAACWLPFTVYCLLGDA----HSPPLYTYLTLLPATYNSMINPIIYAFRNQDVQKVLWAVCCCCSSSKIPFRSRSP------------ >CCKR_RAT 
DSLLMNGSNITPPCELGLENETLFCLDQPQPSKEWQSALQILLYSIIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAVSDLMLCLFCMPFNLIPNLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICRPLSRVWQTKSHALKVIAATWCLSFTIMTPYPIYN----NNQTANMCRFL---LPSDAMQQSWQTFLLLILFLLPGIVMVVAYGLISLELYQGRLNSSSSAANLIAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTVSHLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGPPGVRGEVTIRALLSRYSYS >RDC1_MOUSE YAEPGNYSDINWPCNSSDCIVVDTVQCPTMPNKNVLLYTLSFIYIFIFVIGMIANSVVVWVNIQAKTTGYDTHCYILNLAIADLWVVITTPVWVVSLVQHNQWPMGELTCKITHLIFSINLFGSIFFLACMSVDRYLSITYFTTSSYKKKMVRRVVCILVWLLAFFVSLPDTYYLKAVTNNE--TYCRSFYPEHSIKEWLIGMELVSVILGFAVPFTIIAIFYFLLARAM---------SASGDQEKHSSRKIIFSYVVVFLVCWLPYHFVVLLDIFSILHNVLFTALHVTQCLSLVHCCVNPVLYSFINRNYRYELMKAFIFKYSAKTGLTKLIDEYSALEQNTK-- >OLF9_RAT -------------MTRRNQTAISQFFLLGLPFPPEYQHLFYALFLAMYLTTLLGNLIIIILILLDSHLHTPMYLFLSNLSFADLCFSSVTMPKLLQNMQSQVPSIPYAGCLAQIYFFLFFGDLGNFLLVAMAYDRYVAICFPLYMSIMSPKLCVSLVVLSWVLTTFHAMLHTLLMARL---SPHYFCDMSKVACSDTHDNELAIFILGGPIVVLPFLLIIVSYARIVSSI--------FKVPSSQSIHKAFSTCGSHLSVVSLFYGTVIGLYLCPSANN---STVKETVMSLMYTMVTPMLNPFIYSLRNRDIKDALEKIMCKKQIPSFL------------------ >5H1D_RABIT SPSNQSAEGLPQEAANRSLNATGTPEAWDPGTLQALKISLAVVLSIITVATVLSNTFVLTTILLTRKLHTPANYLIGSLATTDLLVSILVMPISIAYTITHTWNFGQVLCDIWVSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAAAMIAVVWAISICISIPPLFWR----H-E--SDCLVN-------TSQISYTIYSTCGAFYIPSVLLIVLYGRIYMAARNRKLALERKRISAARERKATKTLGIILGAFIGCWLPFFVASLVLPICRDSWMPPGLFDFFTWLGYLNSLINPIIYTVFNEDFRQAFQRVIHFRKAF--------------------- >ML1C_.ENLA -----MMEVNSTCLDCRTPGTIRTEQDAQDSASQGLTSALAVVLIFTIVVDVLGNILVILSVLRNKKLQNAGNLFVVSLSIADLVVAVYPYPVILIAIFQNGWTLGNIHCQISGFLMGLSVIGSVFNITAIAINRYCYICHSLYDKLYNQRSTWCYLGLTWILTIIAIVPNFFVGS-LQYDP-IFSCTFA------QTVSSSYTITVVVVHFIVPLSVVTFCYLRIWVLVIQVHR-QDFKQKLTQTDLRNFLTMFVVFVLFAVCWAPLNFIGLAVAINPFHKIPEWLFVLSYFMAYFNSCLNAVIYGVLNQNFRKEYKRILMSLLTPRLLFLDTSRSKPSPAVTNNNQ >SSR2_PIG 
LLNGSQPWLSSPFDLNGSVATANSSNQTEPYYDLTSNAVLTFIYFVVCIIGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMINVAVWGVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYAFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSVAISPALKGMFDFVVVLTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGTDDGERSDLNETTETQRTLL >NTR1_HUMAN EEALLAPGFGNASGNASERVLAAPSSELDVNTDIYSKVLVTAVYLALFVVGTVGNTVTAFTLARKKSLQSTVHYHLGSLALSDLLTLLLAMPVELYNFIWVHWAFGDAGCRGYYFLRDACTYATALNVASLSVERYLAICHPFAKTLMSRSRTKKFISAIWLASALLTVPMLFTMGEQN--SGGLVCTPT---IHTATVKVVIQVN-TFMSFIFPMVVISVLNTIIANKLTVMEHSMAIEPGRVQALRHGVRVLRAVVIAFVVCWLPYHVRRLMFCYISDEDFYHYFYMVTNALFYVSSTINPILYNLVSANFRHIFLATLACLCPVWRRRRK-RPSVSSNHTLSSNA >AG2R_CAVPO ------MILNSSTEDGIKRIQDDC---PKAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADICFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCVIIWLMAGLASLPAVIHRNVFF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFMFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSTLSTRPSD-------N >PE23_RABIT PFCTRLNHSYPGMWAP----EARGNLTRPPGPGEDCGSVSVAFPITMLITGFVGNALAMLLVSRSYRRKKSFLLCIGWLALTDLVGQLLTSPVVILVYLSKQLDPSGRLCTFFGLTMTVFGLSSLFIASAMAVERALAIRAPHYASHMKTRATRAVLLGVWLAVLAFALLPVLGVG----QYTGTWCFISNGTSSSHNWGNLFFASTFAFLGLLALAITFTCNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNQTSQKECNFFLIAVRLASLNQILDPWVYLLLRKILLRKFCQVIHENNEQKDEIQRENRHEEARDSEKSKT >CB1A_FUGRU TDLFGNRNTTRD-ENSIQCGENFMDMECFMILTPSQQLAVAVLSLTLGTFTVLENLVVLCVIFQSRTRCRPSYHFIGSLAVADLLGSVIFVYSFLDFHVFHK-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYRRIVTRTKAVIAFCMMWTISIIIAVLPLLGWNCK-R--LNSVCSDI------FPLIDENYLMFWIGVTSVLVLFIIYAYIYILWKAHHHAEGQTTRPEQTRMDIRLAKTLVLILAVLVICWGPLLAIMVYDLFWKMDDNIKTVFAFCSMLCLLNSTVNPIIYALRSRDLRHAFLSSCHACRGSAQQLDNSLENVN--ISANRAA >CKR9_HUMAN 
DDYGSESTSSMEDYVNFNFTDFYC---EKNNVRQFASHFLPPLYWLVFIVGALGNSLVILVYWYCTRVKTMTDMFLLNLAIADLLFLVTLPFWAIAAADQ--WKFQTFMCKVVNSMYKMNFYSCVLLIMCISVDRYIAIAQAMTWREKRLLYSKMVCFTIWVLAAALCIPEILYSQIKE-G--IAICTMVYPSDESTKLKSAVLTLKVILGFFLPFVVMACCYTIIIHTL---------IQAKKSSKHKALKVTITVLTVFVLSQFPYNCILLVQTIDAYATNIDICFQVTQTIAFFHSCLNPVLYVFVGERFRRDLVKTLKNLGCISQAQWVSFT--------GSLK >V1BR_HUMAN ---MDSGPLWDANPTPRGTLSAPNATTPWLGRDEELAKVEIGVLATVLVLATGGNLAVLLTLGQLGRKRSRMHLFVLHLALTDLAVALFQVLPQLLWDITYRFQGPDLLCRAVKYLQVLSMFASTYMLLAMTLDRYLAVCHPLRSLQQPGQSTYLLIAAPWLLAAIFSLPQVFIFSLRESG--VLDCWAD---FGFPWGPRAYLTWTTLAIFVLPVTMLTACYSLICHEICKNRGLVSSINTISRAKIRTVKMTFVIVLAYIACWAPFFSVQMWSVWDKNADSTNVAFTISMLLGNLNSCCNPWIYMGFNSHLLPRPLRHLACCGGPQPRMRRRLSHTTLLTRSSCPA >5H4_RAT ------------------MDRLDANVSSNEGFGSVEKVVLLTFFAMVILMAILGNLLVMVAVCRDRQRKIKTNYFIVSLAFADLLVSVLVNAFGAIELVQDIWFYGEMFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPYRNKMTPLRIALMLGGCWVIPMFISFLPIMQGWNNIRK-NSTFCVFM--------VNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHSRPDQHSTHRMRTETKAAKTLCVIMGCFCFCWAPFFVTNIVDPFIDYT-VPEKVWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYKRPPILGQTINGSTHVLR-- >5HTA_DROME DSSLFGEMLANRSGHLDLINGTTSKVAEDDFTQLLRMAVTSVLLGLMILVTIIGNVFVIAAIILERNLQNVANYLVASLAVADLFVACLVMPLGAVYEISQGWILGPELCDIWTSCDVLCCTASILHLVAIAVDRYWAVTNI-YIHSRTSNRVFMMIFCVWTAAVIVSLAPQFGWKDPDE-Q--QKCMVS--------QDVSYQVFATCCTFYVPLMVILALYWKIYQTARKRPMQKRKETLEAKRERKAAKTLAIITGAFVVCWLPFFVMALTMPLCAACQISDSVASLFLWLGYFNSTLNPVIYTIFSPEFRQAFKRILFGGHRPVHYRSGKL------------- >HH1R_CAVPO -MSFLPGMTPVTLSNFSWALEDRMLEGNSTTTPTRQLMPLVVVLSSVSLVTVALNLLVLYAVRSERKLHTVGNLYIVSLSVADLIVGAVVMPMSILYLHRSAWILGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLRYRTKTRASATILGAWLLSFLWVIPILGWHHFM--EP-EKKCETD------FYDVTWFKVMTAIINFYLPTLLMLWFYIRIYKAVRRHLRSQYTSGLHLNRERKAAKQLGCIMAAFILCWIPYFVFFMVIAFCKSC-SNEPVHMFTIWLGYLNSTLNPLIYPLCNENFRKTFKRILRIPP----------------------- >OPSD_PHOGR 
MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVGLTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGFNFGPIFMTLPAFFAKAAAIYNPVIYIMMNKQFRTCMITTLCCGKNPLGDDEVSASASKTETSQVAPA >MSHR_CANFA VWQGPQRRLLGSLNGTSPATPHFELAANQTGPRCLEVSIPNGLFLSLGLVSVVENVLVVAAIAKNRNLHSPMYYFIGCLAVSDLLVSVTNVLETAVMLLVEAAAVVQQLDDIIDVLICGSMVSSLCFLGAIAVDRYLSIFYALYHSIVTLPRAWRAISAIWVASVLSSTLFIAYYY----------------------NNHTAVLLCLVSFFVAMLVLMAVLYVHMLARARQH-IAKRQHSVHQGFGLKGAATLTILLGIFFLCWGPFFLHLSLMVLCPQHGCVFQNFNLFLTLIICNSIIDPFIYAFRSQELRKTLQEVVLCSWA---------------------- >A2AC_HUMAN AAGPNASGAGERGSGGVANASGASWGPPRGQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAVISFPPLVSLYR-------PQCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKRRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFIYSLYGICREAQVPGPLFKFFFWIGYCNSSLNPVIYTVFNQDFRPSFKHILFRRRRRGFRQ----------------- >B1AR_SHEEP VPDGAATAARLLVP-SPLRLAADLGQRGTPLLSQQWTVGMGLLMAFIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARALVCTVWAISALVSFLPIFMQWWGDS-R--ECCDFI--------INEGYAITSSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRAACGSHGAAGCLAVARPSPSPG >OPS4_DROME S-GNGDLQFLGWNVPPDQIQYIPEHWLTQLEPPASMHYMLGVFYIFLFCASTVGNGMVIWIFSTSKSLRTPSNMFVLNLAVFDLIMCLKAPIFNSFHRGFA-IYLGNTWCQIFASIGSYSGIGAGMTNAAIGYDRYNVITKPM-NRNMTFTKAVIMNIIIWLYCTPWVVLPLTQFWDRFPEGYLTSCSFD--YLSDNFDTRLFVGTIFFFSFVCPTLMILYYYSQIVGHVFSHNVESNVDKSKETAEIRIAKAAITICFLFFVSWTPYGVMSLIGAFGDKSLLTQGATMIPACTCKLVACIDPFVYAISHPRYRLELQKRCPWLGVNEKSGEISSAQQQTTAA----- >O1E1_HUMAN 
-------------MMGQNQTSISDFLLLGLPIQPEQQNLCYALFLAMYLTTLLGNLLIIVLIRLDSHLHTPMYLFLSNLSFSDLCFSSVTIPKLLQNMQNQDPSIPYADCLTQMYFFLLFGDLESFLLVAMAYDRYVAICFPLYTAIMSPMLCLALVALSWVLTTFHAMLHTLLMARL---CPHFFCDMSKLAFSDTRVNEWVIFIMGGLILVIPFLLILGSYARIVSSI--------LKVPSSKGICKAFSTCGSHLSVVSLFYGTVIGLYLCSSANS---STLKDTVMAMMYTVVTPMLNPFIYSLRNRDMKGALSRVIHQKKTFFSL------------------ >MSHR_CHICK -MSMLAPLRLVREPWNASEGNQSNATAGAGGAWCQGLDIPNELFLTLGLVSLVENLLVVAAILKNRNLHSPTYYFICCLAVSDMLVSVSNLAKTLFMLLMEHASIVRHMDNVIDMLICSSVVSSLSFLGVIAVDRYITIFYALYHSIMTLQRAVVTMASVWLASTVSSTVLITYYY----------------------RRNNAILLCLIGFFLFMLVLMLVLYIHMFALACHHSISQKQPTIYRTSSLKGAVTLTILLGVFFICWGPFFFHLILIVTCPTNTCFFSYFNLFLILIICNSVVDPLIYAFRSQELRRTLREVVLCSW----------------------- >A1AA_BOVIN ----------MVFLSGNASDSSNCT-HPPPPVNISKAILLGVILGGLILFGVLGNILVILSVACHRHLHSVTHYYIVNLAVADLLLTSTVLPFSAIFEILGYWAFGRVFCNVWAAVDVLCCTASIMGLCIISIDRYIGVSYPLYPTIVTQKRGLMALLCVWALSLVISIGPLFGWRQP--AP--TICQIN--------EEPGYVLFSALGSFYVPLTIILVMYCRVYVVAKREAKNFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPDFRPSETVFKIAFWLGYLNSCINPIIYPCSSQEFKKAFQNVLRIQCLRRKQSSKHTLSH---------- >CKR5_PYGBI ----MDYQVSSPTYDIDYYTSEPC---QKVNVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCYIFA-------SSVY >OPSO_SALSA TLRIAVNGVSYNEASEIYKPHADPFTGPITNLAPWNFAVLATLMFVITSLSLFENFTVMLATYKFKQLRQPLNYIIVNLSLADFLVSLTGGTISFLTNARGYFFLGNWACVLEGFAVTYFGIVAMWSLAVLSFERYFVICRPLGNVRLRGKHAALGLLFVWTFSFIWTIPPVFGWCSYTVSKIGTTCEPN--WYSNNIWNHTYIITFFVTCFIMPLGMIIYCYGKLLQKLRKVSHD--RLGNAKKPERQVSRMVVVMIVAYLVGWTPYAAFSIIVTACPTIYLDPRLAAAPAFFSKTAAVYNPVIYVFMNKQVSTQLNWGFWSRA----------------------- >NY1R_RAT 
TLFSRVENYSVHYNVSE-NSPFLAFENDDCHLPLAVIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAVMCLPFTFVYTLMDHWVFGETMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYIGITVIWVLAVASSLPFVIYQILTDKD--KYVCFDK---FPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMIRDSKYRSSETKRINVMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK >OLF3_CANFA -------------MGTGNQTWVREFVLLGLSSDWDTEVSLFVLFLITYMVTVLGNFLIILLIRLDSRLHTPMYFFLTNLSLVDVSYATSIIPQMLAHLLAAHKAIPFVSCAAQLFFSLGLGGIEFVLLAVMAYDRYVAVCDPLYSVIMHGGLCTRLAITSWVSGSMNSLMQTVITFQL---PDHISCELLRLACVDTSSNEIAIMVSSIVLLMTPFCLVLLSYIQIISTI--------LKIQSTEGRKKAFHTCASHLTVVVLCYGMAIFTYIQPRSSP---SVLQEKLISLFYSVLTPMLNPMIYSVRNKEVKGAWQKLLGQLTGITSKLAT--------------- >P2Y9_HUMAN DRRFIDFQFQDSNSSLRPRLGNATANNTCIVDDSFKYNLNGAVYSVVFILGLITNSVSLFVFCFRMKMRSETAIFITNLAVSDLLFVCTLPFKIFYNFNRH-WPFGDTLCKISGTAFLTNIYGSMLFLTCISVDRFLAIVYPFSRTIRTRRNSAIVCAGVWILVLSGGISASLFSTTN----ATTTCFEGFSKRVWKTYLSKITIFIEVVGFIIPLILNVSCSSVVLRTLRKP----ATLSQIGTNKKKVLKMITVHMAVFVVCFVPYNSVLFLYALVRSQRFAKIMYPITLCLATLNCCFDPFIYYFTLESFQKSFYINAHIRMESLFKTETPLTIQEEVSDQTTNN >MC5R_BOVIN -MNSSFHLHFLDLGLNTTDGNLSGLSVQNASSLCEDMGIAVEVFLALGLISLLENILVIGAIVRNRNLHTPMYFFVGSLAVADMLVSLSNSWETITIYLLTNDASVRHLDNVFDSMICISVVASMCSLLAIAVDRYVTIFCALYQRIMTGRRSGAIIGGIWAFCASCGTVFIVYYY----------------------EESTYVVICLIAMFLTMLLLMASLYTHMFLLARTH-RIPGHSSVRQRTGVKGAITLAMLLGVFIVCWAPFFLHLILMISCPHNSCFMSHFNMYLILIMCNSVIDPLIYAFRSQEMRKTFKEIVCFQSFRTPCRFPSRY------------ >CKR5_TRAPH ----MDYQVSSPTYDIDYYTSEPC---QKVNVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >OPSB_SAIBB 
--MSKMPEEEEFYLFKNISSVGPWDGPQYHIAPVWAFQLQAAFMGIVFLAGLPLNSMVLVATVRYKKLRHPLNYVLVNVSVGGFLLCIFSVLPVFVNSCNGYFVFGRHVCALEGFLGTVAGLVTGWSLAFLAFERYIVICKPFGNFRFSSKHALMVVLTTWTIGIGVSIPPFFGWSRYIAEGLQCSCGPDWYTVGTKYRSEYYTWFLFIFCFIVPLSLICFSYAQLLRALKAVAAQQQESATTQKAEREVSRMVVVMVGSFCVCYVPYAALAMYMVNNRNHGLDLRLVSIPAFFSKSSCIYNPIIYCFMNKQFRACIMEMVCGKAMTD---ESDISSTVSSSQVGPN- >OAR_DROME LSTAQADKDSAGECEGAVEELHASILGLQLAVPEWEALLTALVLSVIIVLTIIGNILVILSVFTYKPLRIVQNFFIVSLAVADLTVALLVLPFNVAYSILGRWEFGIHLCKLWLTCDVLCCTSSILNLCAIALDRYWAITDPIYAQKRTVGRVLLLISGVWLLSLLISSPPLIGWNDW-T-S--TPCELT--------SQRGYVIYSSLGSFFIPLAIMTIVYIEIFVATRRRGVNEEKQKISLSKERRAARTLGIIMGVFVICWLPFFLMYVILPFCQTCCPTNKFKNFITWLGYINSGLNPVIYTIFNLDYRRAFKRLLGLN------------------------ >PAR3_HUMAN -WTGATITVKIKCPEESASHLHVKNATMGYLTSSLSTKLIPAIYLLVFVVGVPANAVTLWMLFFRTR-SICTTVFYTNLAIADFLFCVTLPFKIAYHLNGNNWVFGEVLCRATTVIFYGNMYCSILLLACISINRYLAIVHPFYRGLPKHTYALVTCGLVWATVFLYMLPFFILKQEYY--D--TTCHDVNTCESSSPFQLYYFISLAFFGFLIPFVLIIYCYAAIIRTL----------NAYDHRWLWYVKASLLILVIFTICFAPSNIILIIHHANYYYDGLYFIYLIALCLGSLNSCLDPFLYFLMSKTRNHSTAYLTK-------------------------- >NMBR_HUMAN SNLSVTTGANESGSVPEGWERDFLPASDGTTTELVIRCVIPSLYLLIITVGLLGNIMLVKIFITNSAMRSVPNIFISNLAAGDLLLLLTCVPVDASRYFFDEWMFGKVGCKLIPVIQLTSVGVSVFTLTALSADRYRAIVNPMMQTSGALLRTCVKAMGIWVVSVLLAVPEAVFSEVARSS--FTACIPY---QTDELHPKIHSVLIFLVYFLIPLAIISIYYYHIAKTLIKSLPGNEHTKKQMETRKRLAKIVLVFVGCFIFCWFPNHILYMYRSFNYNELGHMIVTLVARVLSFGNSCVNPFALYLLSESFRRHFNSQLCCGRKSYQERGTSYLMTSLKSNAKNMV >AA1R_BOVIN ----------------------------MPPSISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PRTYFHTCLKVACPVLILTQSSILALLAMAVDRYLRVKIPLYKTVVTPRRAVVAITGCWILSFVVGLTPMFGWNNLSN-G-VIECQFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYMEVFYLIRKQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPSCHMPRILIYIAIFLSHGNSAMNPIVYAFRIQKFRVTFLKIWNDHFRCQPAPPID--PDD--------- >C.C1_MOUSE 
----------MESSTAFYDYHDKLSLLCENNVIFFSTISTIVLYSLVFLLSLVGNSLVLWVLVKYENLESLTNIFILNLCLSDLMFSCLLPVLISAQWS---WFLGDFFCKFFNMIFGISLYSSIFFLTIMTIHRYLSVVSPITLGIHTLRCRVLVTSCVWAASILFSIPDAVFHKVIS-----LNCKYS------EHHGFLASVYQHNIFFLLSMGIILFCYVQILRTL---------FRTRSRQRHRTVRLIFTVVVAYFLSWAPYNLTLFLKTGIIQQQQLDIAMIICRHLAFSHCCFNPVLYVFVGIKFRRHLKHLFQQVWLCRKTSSTVPCEGPSFY------ >OPS1_PATYE NGTLNRSMTPNTGWEGPYDMSVHLHWTQFPPVTEEWHYIIGVYITIVGLLGIMGNTTVVYIFSNTKSLRSPSNLFVVNLAVSDLIFSAVNGFPLLTVSSFHQWIFGSLFCQLYGFVGGVFGLMSINTLTAISIDRYVVITKPLASQTMTRRKVHLMIVIVWVLSILLSIPPFFGWGAYIPEGFQTSCTFD--YLTKTARTRTYIVVLYLFGFLIPLIIIGVCYVLIIRGVRRHSMKARANNKRARSELRISKIAMTVTCLFIISWSPYAIIALIAQFGPAHWITPLVSELPMMLAKSSSMHNPVVYALSHPKFRKALYQRVPWLFCCCKPKEKADFRSVTRTESVNSD >5H2B_MOUSE ILQKTCDHLILTNRSGLETDSVAEEMKQTVEGQGHTVHWAALLILAVIIPTIGGNILVILAVALEKRLQYATNYFLMSLAIADLLVGLFVMPIALLTIMFEAWPLPLALCPAWLFLDVLFSTASIMHLCAISLDRYIAIKKPIANQCNTRATAFIKITVVWLISIGIAIPVPIKGIET-NP---VTCELT------KDRFGSFMVFGSLAAFFVPLTIMVVTYFLTIHTLQKKRRMGKRSAQTISNEQRASKALGVVFFLFLLMWCPFFITNLTLALCDSCTTLKTLLEIFVWIGYVSSGVNPLIYTLFNKTFREAFGRYITCNYRATKSVKALRKGNSMVENSKFFT >OPSG_HUMAN DSYEDSTQSSIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSVWMIFVVIASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETVIASTISVVNQVYGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWMVVCKPFGNVRFDAKLAIVGIAFSWIWAAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMVTCCITPLSIIVLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVLAFCFCWGPYAFFACFAAANPGYPFHPLMAALPAFFAKSATIYNPVIYVFMNRQFRNCILQLFGKKVDDG-----SELVSSV--SSVSPA >ACM4_MOUSE ------MANFTPVNGSSANQSVRLVTTAHNHLETVEMVFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLACADLIIGAFSMNLYTLYIIKGYWPLGAVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPARRTTKMAGLMIAAAWVLSFVLWAPAILFWQFVV-VP-DNQCFIQ------FLSNPAVTFGTAIAAFYLPVVIMTVLYIHISLASRSRSIAVRKKRQMAARERKVTRTIFAILLAFILTWTPYNVMVLVNTFCQSC-IPERVWSIGYWLCYVNSTINPACYALCNATFKKTFRHLLLCQYRNIGTAR---------------- >AA1R_RAT 
----------------------------MPPYISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PQTYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKTVVTQRRAAVAIAGCWILSLVVGLTPMFGWNNLSN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRKQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPTCQKPSILIYIAIFLTHGNSAMNPIVYAFRIHKFRVTFLKIWNDHFRCQPKPPID--AED--------- >AA1R_CHICK ----------------------------MAQSVTAFQAAYISIEVLIALVSVPGNILVIWAVKMNQALRDATFCFIVSLAVADVAVGALVIPLAIIINIG--PQTEFYSCLMMACPVLILTESSILALLAIAVDRYLRVKIPVYKSVVTPRRAAVAIACCWIVSFLVGLTPMFGWNNLNN-V-VIKCQFE-----TVISMEYMVYFNFFVWVLPPLLLMLLIYLEVFNLIRTQK-VSNDPQKYYGKELKIAKSLALVLFLFALSWLPLHILNCITLFCPSCKTPHILTYIAIFLTHGNSAMNPIVYAFRIKKFRTAFLQIWNQYFCCKTNKSSS--VN---------- >P2YR_MELGA PELLAG-----------GWAAGNASTKCSLTKTGFQFYYLPTVYILVFITGFLGNSVAIWMFVFHMRPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISVHRYTGVVHPLSLGRLKKKNAVYVSSLVWALVVAVIAPILFYSGTG----KTITCYDT-TADEYLRSYFVYSMCTTVFMFCIPFIVILGCYGLIVKALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYLPFHVMKTLNLRARLDDKVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKSSRRSEPNVQSKSLTEYKQNGDTSL >GRE1_BALAM ----MEGPPLSPAPADNVTLNVSCGRPATLFDWADHRLISLLALAFLNLMVVAGNLLVVMAVFVHSKLRTVTNLFIVSLACADLLVGMLVLPFSATLEVLDVWLYGDVWCSVWLAVDVWMCTSSILNLCAISLDRYLAVSQPIYPSLMSTRRAKQLIAAVWVLSFVICFPPLVGWNDR-G-S--LTCELT--------NERGYVIYSALGSFFLPSTVMLFFYGRIYRTAVSTRVSVRHQARRFRMETKAAKTVGIIVGLFILCWLPFFVCYLVRGFCADC-VPPLLFSVFFWLGYCNSAVNPCVYALCSRDFRFAFSSILCKCVCRRGAMERRFRRSQTEEDCEVAD >TDA8_MOUSE ---------------------MAMNSMCIEEQRHLEHYLFPVVYIIVFIVSVPANIGSLCVSFLQAKKENELGIYLFSLSLSDLLYALTLPLWINYTWNKDNWTFSPTLCKGSVFFTYMNFYSSTAFLTCIALDRYLAVVYPLFSFLRTRRFAFITSLSIWILESFFNSMLLWKDETS-DKSNFTLCYDK---YPLEKWQINLNLFRTCMGYAIPLITIMICNHKVYRAV------RHNQATENSEKRRIIKLLASITLTFVLCFTPFHVMVLIRCVLERDWQTFTVYRVTVALTSLNCVADPILYCFVTETGRADMWNILKLCTRKHNRHQGKKRRDAVELEIID-- >MSHR_CEREL 
PVLGSQRRLLGSLNCTPPATFPLTLAPNRTGPQCLEVAIPDGLFLSLGLVSLVENVLVVAAIAKNRNLQSPMYYFICCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >NY1R_MOUSE TLFSKVENHSIHYNASE-NSPLLAFENDDCHLPLAVIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAVMCLPFTFVYTLMDHWVFGETMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYIGITVIWVLAVASSLPFVIYQILTDKD--KYVCFDK---FPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMIRDSKYRSSETKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK >GPR1_MACMU EDLEETLFEEFENYSYALDYYSLESDLEEKVQLGVVHWVSLVLYCLSFVLGIPGNAIVIWFTGFKWKKTVS-TLWFLNLAIADFIFLLFLPLYISYVVMNFHWPFGIWLCKANSFTAQLNMFASVFFLTVISLDHYIHLIHPVSHRHRTLKNSLIVIIFIWLLASLIGGPALYFR--D--NN-HTLCYNNHDPDLTVIRHHVLTWVKFIVGYLFPLLTMSICYLCLIFKV---------KKRSILISSRHFWTILAVVVAFVVCWTPYHLFSIWELTIHHNHVMQAGIPLSTGLAFLNSCLNPILYVLISKKFQARFRSSVAEILKYTLWEVSCSGNSETKNLCLLET >ACM1_PIG ------------MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL-AGQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFIVTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTINPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPSRQC------- >RTA_RAT EAHSTNQNKMCPGMSEALELYSRGFLTIEQIATLPPPAVTNYIFLLLCLCGLVGNGLVLWFFGFSIK-RTPFSIYFLHLASADGIYLFSKAVIALLNMGTFLGSFPDYVRRVSRIVGLCTFFAGVSLLPAISIERCVSVIFPMYWRRRPKRLSAGVCALLWLLSFLVTSIHNYFCM-FL---SGTACL-----------NMDISLGILLFFLFCPLMVLPCLALILHVE---------CRARRRQRSAKLNHVVLAIVSVFLVSSIYLGIDWFLFWVFQ--IPAPFPEYVTDLCICINSSAKPIVYFLAGRDKSQRLWEPLRVVFQRALRDGAEPGNTVTMEMQCPSG >GP43_HUMAN 
------------------------------MLPDWKSSLILMAYIIIFLTGLPANLLALRAFVGRIRQPAPVHILLLSLTLADLLLLLLLPFKIIEAASNFRWYLPKVVCALTSFGFYSSIYCSTWLLAGISIERYLGVAFPVYKLSRRPLYGVIAALVAWVMSFGHCTIVIIVQYLNT--N--ITCYEN-FTDNQLDVVLPVRLELCLVLFFIPMAVTIFCYWRFVWIMLSQP------LVGAQRRRRAVGLAVVTLLNFLVCFGPYNVSHLVGYHQR---KSPWWRSIAVVFSSLNASLDPLLFYFSSSVVRRAFGRGLQVLRNQGSSLLGRRG---------TAE >CKR1_HUMAN METPNTTEDYDTTTEFDYGDATPC---QKVNERAFGAQLLPPLYSLVFVIGLVGNILVVLVLVQYKRLKNMTSIYLLNLAISDLLFLFTLPFWIDYKLKDD-WVFGDAMCKILSGFYYTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIIIWALAILASMPGLYFSKTQW-EF-HHTCSLHFPHESLREWKLFQALKLNLFGLVLPLLVMIICYTGIIKIL---------LRRPNEKKSKAVRLIFVIMIIFFLFWTPYNLTILISVFQDFLRHLDLAVQVTEVIAYTHCCVNPVIYAFVGERFRKYLRQLFHRRVAVHLVKWLPFLV-------SSTS >VK02_SPVKA YEYSTITDYYNTINNDITSSSVIKAFDNNCTFLEDTKYHIIVIHIILFLLGSIGNIFVVSLIAFKRN-KSITDIYILNLSMSDCIFVFQIPFIVYSKLDQ--WIFGNILCKIMSVLYYVGFFSNMFIITLMSIDRYFAIVHPIRQPYRTKRIGILMCCSAWLLSLILSSPVSKLYENIP--MDIYQCTLTENDSIIAFIKRLMQIEITILGFLIPIIIFVYCYYRIFTTV---------VRLRNRRKYKSIKIVLMIVVCSLICWIPLYIVLMIATIVSLYLNLAYAITFSETISLARCCINPIIYTLIGEHVRSRISSICSCIYRDNRIRKKLFSNII--------- >AA1R_CAVPO ----------------------------MPHSVSAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIASLAVADVAVGALVIPLAILINIG--PQTYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKTVVTPRRAAVAIAGCWILSLVVGLTPMFGWNNLSN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRKQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPTCHKPTILTYIAIFLTHGNSAMNPIVYAFRIQKFRVTFLKIWNDHFRCQPEPPID--VDD--------- >P2Y6_RAT -----------MERDNGTIQAPGLPPTTCVYREDFKRLLLPPVYSVVLVVGLPLNVCVIAQICASRRTLTRSAVYTLNLALADLLYACSLPLLIYNYARGDHWPFGDLACRLVRFLFYANLHGSILFLTCISFQRYLGICHPLWHKRGGRRAAWVVCGVVWLVVTAQCLPTAVFAATG-----RTVCYDL-SPPILSTRYLPYGMALTVIGFLLPFTALLACYCRMARRLCRQ---GPAGPVAQERRSKAARMAVVVAAVFVISFLPFHITKTAYLAVRSTETFAAAYKGTRPFASANSVLDPILFYFTQQKFRRQPHDLLQKLTAKWQRQRV--------------- >O2H3_HUMAN 
---------------MDNQSSTPGFLLLGFSEHPGLGRTLFVDVITSYLLTLVGNTLIILLSALDTKLHSPMYFFLSNLSFLDLCFTTSCVPQMLANLWGPKKTISFLDCSVQIFIFLSLGTTECILMKVMAFDRYVAVCQPLYATIIHPRLCWQLASVAWVIGLVGSVVQTPSTLHL---PDDFVCEVPRLSCEDTSYNEIQVAVASVFILVVPLSLILVSYGAITWAV--------LRINSATAWRKAFGTCSSHLTVVTLFYSSVIAVYLQPKNPY---AQGRGKFFGLFYAVGTPSLNPLVYTLRNKEIKRALRRLLGKERDSRESWRAA-------------- >OPRK_MOUSE LPNSSSWFPNWAESDSNGSVGSEDQQLESAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSAVYLMNS-WPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPLKAKIINICIWLLASSVGISAIVLGGTKVDVD-VIECSLQFPDDEYSWWDLFMKICVFVFAFVIPVLIIIVCYTLMILRLKSVRLL-SGSREKDRNLRRITKLVLVVVAVFIICWTPIHIFILVEALGSTSTAALSSYYFCIALGYTNSSLNPVLYAFLDENFKRCFRDFCFPIKMRMERQSTNRVASMRDVGGMNKP >5HTB_DROME TTSNLSQIVWNRSVNGNGNSNDEQERAAVEFWLLVKMIAMAVVLGLMILVTIIGNVFVIAAIILERNLQNVANYLVASLAVADLFVACLVMPLGAVYEISNGWILGPELCDIWTSCDVLCCTASILHLVAIAADRYWTVTNI-YNNLRTPRRVFLMIFCVWFAALIVSLAPQFGWKDPDE-E--QHCMVS--------QDVGYQIFATCCTFYVPLLVILFLYWKIYIIARKRPHQKRRQLLEAKRERKAAQTLAIITGAFVICWLPFFVMALTMSLCKECEIHTAVASLFLWLGYFNSTLNPVIYTIFNPEFRRAFKRILFGRKAAARARSAKI------------- >OPR._HUMAN GSHLQGNLSLLSPNHSLLPPHLLLNASHGAFLPLGLKVTIVGLYLAVCVGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQGTDILLGF-WPFGNALCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALASVVGVPVAIMGSAQVEE---IECLVE-IPTPQDYWGPVFAICIFLFSFIVPVLVISVCYSLMIRRLRGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLAQGLGVQPETAVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCASALRRDVQVSDRVALACKTSETVPR >VG74_KSHV LDDDESWNETLNMSGYDYSGNFSLEVSVCEMTTVVPYTWNVGILSLIFLINVLGNGLVTYIFCKHRS-RAGAIDILLLGICLNSLCLSISLLAEVLM--FLFNIISTGLCRLEIFFYYLYVYLDIFSVVCVSLVRYLLVAYSTSWPKKQSLGWVLTSAALLIALVLSGDACRHRSRVVD-PVKQAMCYEN---NMTADWRLHVRTVSVTAGFLLPLALLILFYALTWCVV---------RRTKLQARRKVRGVIVAVVLLFFVFCFPYHVLNLLDTLLRRRGLINVGLAVTSLLQALYSAVVPLIYSCLGSLFRQRMYGLFQSLRQSFMSGATT-------------- >PE22_MOUSE 
---------------MDNFLNDSKLMEDCKSRQWLLSGESPAISSVMFSAGVLGNLIALALLARRWRSISLFHVLVTELVLTDLLGTCLISPVVLASYSRNQLAPESHACTYFAFTMTFFSLATMLMLFAMALERYLSIGYPYYRRHLSRRGGLAVLPVIYGASLLFCSLPLLNYGEYVQYCPGTWCFIR--------HGRTAYLQLYATMLLLLIVAVLACNISVILNLIRMRGPRRGERTSMAEETDHLILLAIMTITFAICSLPFTIFAYMDETSS---LKEKWDLRALRFLSVNSIIDPWVFAILRPPVLRLMRSVLCCRTSLRTQEAQQTSSKQTDLCGQL-- >A1AD_RABIT GSGEDNRSSAGEPGGAGGGGEVNGTAAVGGLVVSAQSVGVGVFLAAFILTAVAGNLLVILSVACNRHLQTVTNYFIVNLAVADLLLSATVLPFSATMEVLGFWAFGRAFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLYPAIMTERKAAAILALLWAVALVVSMGPLLGWKEP--VP--RFCGIT--------EEVGYAVFSSLCSFYLPMAVIVVMYCRVYVVARSTHTFLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLKPSEGVFKVIFWLGYFNSCVNPLIYPCSSREFKRAFLRLLRCQCRRRRRRRPLWRASAGGGPHPDCA >OPSD_CRIGR MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVICKPMSNFRFGENHAIMGVVFTWIMALACAAPPLVGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVILMVVFFLICWFPYAGVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNILGDDEASATASKTETSQVAPA >AA1R_CANFA ----------------------------MPPAISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PRTYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKTVVTPRRAAVAIAGCWILSFVVGLTPLFGWNRLGN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRRQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPSCRKPSILMYIAIFLTHGNSAMNPIVYAFRIQKFRVTFLKIWNDHFRCQPTPPVD--PHD--------- >DCDR_.ENLA ---------MENFSIFNVTVNVWHADLDVGNSDLSLRALTGLLLSLLILSTLLGNTLVCLAVIKFRHRSKVTNFFVISLAVSDLFVALLVMPWKAVTEVAGFWVFG-DFCDTWVAFDIMCSTASILNLCIISLDRYWAIASPFYERKMTQRVAFIMIGVAWTLSILISFIPVQLSWHKSEE-HTENCDSS--------LNRTYAISSSLISFYIPVVIMIGTYTRIYRIAQTQSSRENSLKTSFRKETKVLKTLSIIMGVFVFCWLPFFVLNCMIPFCHMNCVSETTFNIFVWFGWANSSLNPVIYAFNA-DFRKAFTTILGCNRFCSSNNVEAVNYHHDTTFQK--- >O.YR_BOVIN 
GAFAANWSAEAVNGSAAPPGTEGNRTAGPPQRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLSRRTDRLAVLVTWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLATCYGLISFKIWQNRAIVSNVKLISKAKIRTVKMTFIVVLAFIVCWTPFFFVQMWSVWDADA-KEASPFIIAMLLASLNSCCNPWIYMLFTGHLFQELVQRFLCCSFRRLKGSRPGENSSTFVLSQYSS >YQNJ_CAEEL RDSVINASSAVSTTTLPPLDIPMTSMKPPSIIPTVELVLGTITYLVIIAMTVVGNTLVVVAVFSYRPLKKVQNYFLVSLAASDLAVAIFVMPLHVVTFLAGQWLLGVTVCQFFTTADILLCTSSILNLCAIALDRYWAIHNPIYAQKRTTKFVCIVIVIVWILSMLISVPPIIGWNNW-M---EDSCGLS--------TEKAFVVFSAAGSFFLPLLVMVVVYVKIFISARQRNPTRKREKISVAKEKRAAKTIAVIIFVFSFCWLPFFVAYVIRPFCETCTLVQQVEQAFTWLGYINSSLNPFLYGILNLEFRRAFKKILCPKAVLEQRRRRMSA------------ >OPS2_PATYE ------------------MPFPLNRTDTALVISPSEFRIIGIFISICCIIGVLGNLLIIIVFAKRRSVRRPINFFVLNLAVSDLIVALLGYPMTAASAFSNRWIFDNIGCKIYAFLCFNSGVISIMTHAALSFCRYIIICQYGYRKKITQTTVLRTLFSIWSFAMFWTLSPLFGWSSYVIEVVPVSCSVN--WYGHGLGDVSYTISVIVAVYVFPLSIIVFSYGMILQEKVCGIRARYTPRFIQDIEQRVTFISFLMMAAFMVAWTPYAIMSALAIGSFN--VENSFAALPTLFAKASCAYNPFIYAFTNANFRDTVVEIMAPWTTRRVGVSTLPWRRRTSAVNTTDI >RGR_HUMAN ---------------------MAETSALPTGFGELEVLAVGMVLLVEALSGLSLNTLTIFSFCKTPELRTPCHLLVLSLALADSGIS-LNALVAATSSLLRRWPYGSDGCQAHGFQGFVTALASICSSAAIAWGRYHHYCTRS---QLAWNSAVSLVLFVWLSSAFWAALPLLGWGHYDYEPLGTCCTLD--YSKGDRNFTSFLFTMSFFNFAMPLFITITSYSLME--------------QKLGKSGHLQVNTTLPARTLLLGWGPYAILYLYAVIADVTSISPKLQMVPALIAKMVPTINAINYALGNEMVCRGIWQCLSPQKREKDRTK---------------- >PE23_PIG ------------MWAPERSAEEQGNLTRSLGSSEDCGSVSVVFPMTMLITGFVGNALAMLLVSQSYRRKKSFLLCIGWLALTDMVGQLLTSPVVIVLYLSHQLDPSGRLCTFFGLTMTAFGLSSLFIASAMAVERALAIRAPHYSSHMKTSATRAVLLGVWLAVLAFALLPVLGVG----QYTGTWCFISNETSSENNWGNIFFASAFSFLGLSALVVTFACNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKTIFNQTSQNECNFFLIAVRLASLNQILDPWVYLLLRKILLQKFCQAVSQKQREEAATLIFTHPGEARVLFSKSK >ACM1_MACMU 
------------MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL--GQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTINPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPSRQC------- >GPRF_CERAE ----MDPEETSVYLDYYYATSPNPDIRETHSHVPYTSVFLPVFYIAVFLTGVLGNLVLMGALHFKPGSRRLIDIFIINLAASDFIFLVTLPLWVDKEASLGLWRTGSFLCKGSSYMISVNMHCSVFLLTCMSVDRYLAIVCPVSRKFRRTDCAYVVCASIWFISCLLGLPTLLSRELT-IDD-KPYCAEK----KATPLKLIWSLVALIFTFFVPLLSIVTCYCRIARKLCAH---YQQSGKHNKKLKKSIKIIFIVVAAFLVSWLPFNTSKLLAIVSGLQAILQLGMEVSGPLAFANSCVNPFIYYIFDSYIRRAIVHCLCPCLKNYDFGSSTETALSTFIHAEDFT >NK3R_RAT -G--NFSSALGLPATTQAPSQVRANLTNQFVQPSWRIALWSLAYGLVVAVAVFGNLIVIWIILAHKRMRTVTNYFLVNLAFSDASVAAFNTLINFIYGLHSEWYFGANYCRFQNFFPITAVFASIYSMTAIAVDRYMAIIDPL-KPRLSATATKIVIGSIWILAFLLAFPQCLYSK---GR---TLCYVQ--WPEGPKQHFTYHIIVIILVYCFPLLIMGVTYTIVGITLWGG--PCDKYHEQLKAKRKVVKMMIIVVVTFAICWLPYHVYFILTAIYQQLKYIQQVYLASFWLAMSSTMYNPIIYCCLNKRFRAGFKRAFRWCPFIQVSSY----TTRFHPTRQSSL >CKR3_CERAE MTTSLYTVETFGPTSYDDDMGLLC---EKADVGALIAQFVPPLYSLVFTVGLLGNVVVVMILIKYRRLRIMTNIYLLNLAISDLLFLFTLPFWIHYVREHN-WVFSHGMCKVLSGFYHTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIVTWGLAVLVALPEFIFYGTEE-LF-ETLCSAIYPQDTVYSWRHFHTLKMTILCLALPLLVMAICYTGIIKTL---------LKCPSKKKYKAIRLIFVIMAVFFIFWTPYNVAILISTYQSILKHVDLVVLVTEVIAYSHCCVNPVIYAFVGERFRKYLRHFFHRHVLMHLGRYIPFLT-------SSVS >PE24_HUMAN --------------------MSTPGVNSSASLSPDRLNSPVTIPAVMFIFGVVGNLVAIVVLCKSRKKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGGQPLCEYSTFILLFFSLSGLSIICAMSVERYLAINHAYYSHYVDKRLAGLTLFAVYASNVLFCALPNMGLGSSRLQYPDTWCFID---WTTNVTAHAAYSYMYAGFSSFLILATVLCNVLVCGALLRMAAAASSFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFVNQLYQPSEVSKNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSRRERSQRTSSAMSGHSR >GPR4_PIG 
--------------------MGNGTWEGCHVDSRVDHLFPPSLYIFVIGVGLPTNCLRLWAAYRQVRQRNELGVYLMNLSIADLLYICTLPLWVDYFLHHDNWIHGPGSCKLFGFIFYTNIYISIAFLCCISVDRYLAVAHPLFARLRRVKTAVAVSSVVWATELGANSVPLFHDEL--NH---TFCFEKPMEGWVAWMVAWMNLYRVFVGFLFPWALMLLSYRGILRAVRG------SVSTERQEKAKIKRLALSLIAIVLVCFAPYHVLLLSRSAVYLGERVFSAYHSSLAFTSLNCVADPILYCLVNEGARSDVAKALHNLLRFLTSDKPQEMDTPLTSKRNSMA >PE23_BOVIN PFCTRFNHSDPGIWAAERAVEAPNNLTLPPEPSEDCGSVSVAFSMTMMITGFVGNALAITLVSKSYRRKKSFLLCIGWLALTDMVGQLLTSPVVIVLYLSHQLDPSGRLCTFFGLTMTVFGLSSLFIASAMAVERALATRAPHYSSHMKTSVTRAVLLGVWLAVLAFALLPVLGVG----QYTGTWCFISNGTNSRQNWGNVFFASAFAILGLSALVVTFACNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNHTSQDECNFFLIAVRLASLNQILDPWVYLLLRKILLQKFCQLLKGHSYGLDTEGGTENNLYISNLSRFFI >OPSD_RANCA MNGTEGPNFYVPMSNKTGIVRSPFEYPQYYLAEPWKYSVLAAYMFLLILLGLPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTITMYTSLHGYFVFGQTGCYFEGFFATLGGEIALWSLVVLAIERYIVVCKPMSNFRFGENHAMMGVAFTWIMALACAVPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFLIPLIIISFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVVIMVIFFLICWVPYAYVAFYIFTHQGSEFGPIFMTVPAFFAKSSAIYNPVIYIMLNKQFRNCMITTLCCGKNPFGDEDASSAATSVSTSQVSPA >OPS6_DROME GR----NLSLAESVPAEIMHMVDPYWYQWPPLEPMWFGIIGFVIAILGTMSLAGNFIVMYIFTSSKGLRTPSNMFVVNLAFSDFMMMFTMFPPVVLNGFYGTWIMGPFLCELYGMFGSLFGCVSIWSMTLIAYDRYCVIVKGMARKPLTATAAVLRLMVVWTICGAWALMPLFGWNRYVPEGNMTACGTD--YFAKDWWNRSYIIVYSLWVYLTPLLTIIFSYWHIMKAVAAHNVANSEADKSKAIEIKLAKVALTTISLWFFAWTPYTIINYAGIFESMH-LSPLSTICGSVFAKANAVCNPIVYGLSHPKYKQVLREKMPCLACGKDDLTSDSRSESQA------- >OPRM_HUMAN LSHLDGNLSDPCGPNRTDLGGRDSLCPPTGSPSMITAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGTILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDFRTPRNAKIINVCNWILSSAIGLPVMFMATTKYGS---IDCTLT-FSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALVTIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSNIEQQNSTRIPSTANTVDRTNH >AA2A_MOUSE 
----------------------------------MGSSVYIMVELAIAVLAILGNVLVCWAVWINSNLQNVTNFFVVSLAAADIAVGVLAIPFAITISTG--FCAACHGCLFIACFVLVLTQSSIFSLLAIAIDRYIAIRIPLYNGLVTGMKAKGIIAICWVLSFAIGLTPMLGWNNCST-K-RVTCLFE-----DVVPMNYMVYYNFFAFVLLPLLLMLAIYLRIFLAARRQESQGERTRSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCSTCHAPPWLMYLAIILSHSNSVVNPFIYAYRIREFRQTFRKIIRTHVLRRQEPFRAGGAHSTEGEQVSLR >OPSG_CAVPO DAYEDSTQASLFTYTNSNNTRGPFEGPNYHIAPRWVYHLTSAWMTIVVIASIFTNGLVLVATMRFKKLRHPLNWILVNLAVADLAETVIASTISVVNQVYGYFVLGHPLCVVEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIVGIVFSWVWSAVWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCITPLSIIVLCYLHVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVLAYCLCWGPYAFFACFATANPGYSFHPLVAALPAYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVEDSSELSSTSRSVSPAA------ >CCR4_CERTO IYTSDNYTEEMG-SGDYDSIKEPC---FREKNAHFNRIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQKPRKLLAEKVVYVGVWIPALLLTIPGFIFASVSE-DD-RFICDRF---YPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH >OPSB_ANOCA MNGTEGINFYVPLSNKTGLVRSPFEYPQYYLAEPWKYKVVCCYIFFLIFTGLPINILTLLVTFKHKKLRQPLNYILVNLAVADLFMACFGFTVTFYTAWNGYFIFGPIGCAIEGFFATLGGQVALWSLVVLAIERYIVVCKPMGNFRFSATHALMGISFTWFMSFSCAAPPLLGWSRYIPEGMQCSCGPDYYTLNPDYHNESYVLYMFGVHFVIPVVVIFFSYGRLICKVREAAAQQQESASTQKAEREVTRMVILMVLGFLLAWTPYAMVAFWIFTNKGVDFSATLMSVPAFFSKSSSLYNPIIYVLMNKQFRNCMITTICCGKNPFGDEDVSSSVSSVSSSQVSPA >O1A2_HUMAN -------------MKKENQSFNLDFILLGVTSQQEQNNVFFVIFLCIYPITLTGNLLIILAICADIRLHNPMYFLLANLSLVDIIFSSVTIPKVLANHLLGSKFISFGGCLMQMYFMIALAKADSYTLAAMAYDRAVAISCPLYTTIMSPRSCILLIAGSWVIGNTSALPHTLLTASL---SANFYCDIMKLSCSDVVFFNVKMMYLGVGVFSLPLLCIIVSYVQVFSTV--------FQVPSTKSLFKAFCTCGSHLTVVFLYYGTTMGMYFRPLTSY----SPKDAVITVMYVAVTPALNPFIYSLRNWDMKAALQKLFSKRISS--------------------- >A2AC_MOUSE 
AEGPNGSDAGEWGSGGGANASGTDWVPPPGQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAVISFPPLVSFYR-------PQCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKLRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFSYSLYGICREAQLPEPLFKFFFWIGYCNSSLNPVIYTVFNQDFRRSFKHILFRRRRRGFRQ----------------- >5H2C_RAT LLVWQFDISISPVAAIVTDTFNSSDGGRLFQFPDGVQNWPALSIVVIIIMTIGGNILVIMAVSMEKKLHNATNYFLMSLAIADMLVGLLVMPLSLLAILYDYWPLPRYLCPVWISLDVLFSTASIMHLCAISLDRYVAIRNPIHSRFNSRTKAIMKIAIVWAISIGVSVPIPVIGLRD-VF--NTTCVL---------NDPNFVLIGSFVAFFIPLTIMVITYFLTIYVLRRQKKKPRGTMQAINNEKKASKVLGIVFFVFLIMWCPFFITNILSVLCGKAKLMEKLLNVFVWIGYVCSGINPLVYTLFNKIYRRAFSKYLRCDYKPDKKPPVRQIALSGRELNVNIY >OPRM_PIG FSHLEGNLSDPCIRNRTELGGSDSLCPPTGSPSMVTAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGTILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDFRTPRNAKIINVCNWILSSAIGLPVMFMATTKYGS---IDCALT-FSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSTIEQQNSARIPSTANTVDRTNH >B1AR_FELCA LPDGAATAARLLVPASPSASPLTPTSEGPAPLSQQWTAGIGLLMALIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVMRGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARALVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCFARRAARGGHAAAGCLPGTRPPPSPG >BONZ_HUMAN ------MAEHDYHEDYGFSSFNDSSQEEHQDFLQFSKVFLPCMYLVVFVCGLVGNSLVLVISIFYHKLQSLTDVFLVNLPLADLVFVCTLPFWAYAGIHE--WVFGQVMCKSLLGIYTINFYTSMLILTCITVDRFIVVVKATNQQAKRMTWGKVTSLLIWVISLLVSLPQIIYGNVFNLD--KLICGYH-----DEAISTVVLATQMTLGFFLPLLTMIVCYSVIIKTL---------LHAGGFQKHRSLKIIFLVMAVFLLTQMPFNLMKFIRSTHWEYTSFHYTIMVTEAIAYLRACLNPVLYAFVSLKFRKNFWKLVKDIGCLPYLGVSHQWKTFSASHNVEAT >CH23_HUMAN 
EDEDYNTSISYGDEYPDYLDSIVVLEDLSPLEARVTRIFLVVVYSIVCFLGILGNGLVIIIATFKMK-KTVNMVWFLNLAVADFLFNVFLPIHITYAAMDYHWVFGTAMCKISNFLLIHNMFTSVFLLTIISSDRCISVLLPVSQNHRSVRLAYMACMVIWVLAFFLSSPSLVFRDTAN-SS--WPTHSQ-MDPVGYSRHMVVTVTRFLCGFLVPVLIITACYLTIVCKL---------QRNRLAKTKKPFKIIVTIIITFFLCWCPYHTLNLLELHHTAMSVFSLGLPLATALAIANSCMNPILYVFMGQDFKK-FKVALFSRLVNALSEDTGHSFTKMSSMNERTS >TSHR_HUMAN LQAFDSHYDYTICGDSEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLLILLTSHYKLNVPRFLMCNLAFADFCMGMYLLLIASVDLYTHSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLRHACAIMVGGWVCCFLLALLPLVGISSY----KVSICLPM-----TETPLALAYIVFVLTLNIVAFVIVCCCHVKIYITVRNP------QYNPGDKDTKIAKRMAVLIFTDFICMAPISFYALSAILNKPLITVSNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFILLSKFGICKRQAQAYRGSTDIQVQKVTHD >GHSR_PIG SEEPGPNLTLPDLGWDAPPENDSLVEELLPLFPTPLLAGVTATCVALFVVGIAGNLLTMLVVSRFREMRTTTNLYLSSMAFSDLLIFLCMPLDLFRLWQYRPWNLGNLLCKLFQFVSESCTYATVLTITALSVERYFAICFPLAKVVVTKGRVKLVILVIWAVAFCSAGPIFVLVG---T-D-TNECRAT---FAVRSGLLTVMVWVSSVFFFLPVFCLTVLYSLIGRKLWRRGE-AVGSSLRDQNHKQTVKMLAVVVFAFILCWLPFHVGRYLFSKSLEPQISQYCNLVSFVLFYLSAAINPILYNIMSKKYRVAVFKLLGFEPFSQRKLSTLKDESSINT------ >B1AR_RAT LPDGAATAARLLVLASPPASLLPPASEGSAPLSQQWTAGMGLLLALIVLLIVVGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITLPFYQSLLTRARARALVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRAACRRRAAHGCLARAGPPPSPG >ACM2_CHICK -----------MNNSTYINSSSENVIALESPYKTIEVVFIVLVAGSLSLVTIIGNILVMVSIKVNRHLQTVNNYFLFSLACADLIIGIFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIV-VP-DKDCYIQ------FFSNPAVTFGTAIAAFYLPVIIMTVLYWQISRASKSRVKMPAKKKPPPSREKKVTRTILAILLAFIITWTPYNVMVLINSFCASC-IPGTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMCHYKNIGATR---------------- >GPRK_HUMAN 
ATAVTTVRTNASGLEVPLFHLFARLDEELHGTFPGLCVALMAVHGAIFLAGLVLNGLALYVFCCRTRAKTPSVIYTINLVVTDLLVGLSLPTRFAVYYGA---RGCLRCAFPHVLGYFLNMHCSILFLTCICVDRYLAIVRPEPAACRQPACARAVCAFVWLAAGAVTLSVLGVTG----------S-----------RPCCRVFALTVLEFLLPLLVISVFTGRIMCALSRP----GLLHQGRQRRVRAMQLLLTVLIIFLVCFTPFHARQVAVALWPDMHTSLVVYHVAVTLSSLNSCMDPIVYCFVTSGFQATVRGLFGQHGEREPSSGDVVSSGRHHILSAGPH >ACM1_HUMAN ------------MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL-AGQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTINPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPSRQC------- >ML1A_HUMAN -------MQGNGSALPNASQ---PVLRGDGARPSWLASALACVLIFTIVVDILGNLLVILSVYRNKKLRNAGNIFVVSLAVADLVVAIYPYPLVLMSIFNNGWNLGYLHCQVSGFLMGLSVIGSIFNITGIAINRYCYICHSLYDKLYSSKNSLCYVLLIWLLTLAAVLPNLRAGT-LQYDP-IYSCTFA------QSVSSAYTIAVVVFHFLVPMIIVIFCYLRIWILVLQVQR-PDRKPKLKPQDFRNFVTMFVVFVLFAICWAPLNFIGLAVASDPASRIPEWLFVASYYMAYFNSCLNAIIYGLLNQNFRKEYRRIIVSLCTARVFFVDSSNWKPSPLMTNNNV >TSHR_RAT LQAFDSHYDYTVCGDNEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSPMALLGNVFVLFVLLTSHYKLTVPRFLMCNLAFADFCMGVYLLLIASVDLYTHTWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLRHAYTIMAGGWVSCFLLALLPMVGISSY----KVSICLPM-----TDTPLALAYIALVLLLNVVAFVIVCSCYVKIYITVRNP------QYNPRDKDTKIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSGVLLVLFYPLNSCANPFLYAIFTKAFQRDVFILLSKFGLCKHQAQAYQANTGIQIQKIPQD >OPS2_DROME AQSS-GNGSVLDNVLPDMAHLVNPYWSRFAPMDPMMSKILGLFTLAIMIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIINFYYETWVLGPLWCDIYAGCGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKILFIWMMAVFWTVMPLIGWSAYVPEGNLTACSID--YMTRMWNPRSYLITYSLFVYYTPLFLICYSYWFIIAAVAAHMNVRSSEDCDKSAEGKLAKVALTTISLWFMAWTPYLVICYFGLFKIDG-LTPLTTIWGATFAKTSAVYNPIVYGISHPKYRIVLKEKCPMCVFGNTDEPKPDATSEADSKA---- >OPRD_HUMAN 
PPLFANASDAYPSACPSAGANASGPPGARSASSLALAIAITALYSAVCAVGLLGNVLVMFGIVRYTKMKTATNIYIFNLALADALATSTLPFQSAKYLMET-WPFGELLCKAVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPAKAKLINICIWVLASGVGVPIMVMAVTRPGA---VVCMLQ-FPSPSWYWDTVTKICVFLFAFVVPILIITVCYGLMLLRLRSVRLL-SGSKEKDRSLRRITRMVLVVVGAFVVCWAPIHIFVIVWTLVDIDPLVVAALHLCIALGYANSSLNPVLYAFLDENFKRCFRQLCRKPCGRPDPSSFSRARVTACTPSDGPG >OPSD_.ENLA MNGTEGPNFYVPMSNKTGVVRSPFDYPQYYLAEPWQYSALAAYMFLLILLGLPINFMTLFVTIQHKKLRTPLNYILLNLVFANHFMVLCGFTVTMYTSMHGYFIFGPTGCYIEGFFATLGGEVALWSLVVLAVERYIVVCKPMANFRFGENHAIMGVAFTWIMALSCAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFIVHFTIPLIVIFFCYGRLLCTVKEAAAQQQESLTTQKAEKEVTRMVVIMVVFFLICWVPYAYVAFYIFTHQGSNFGPVFMTVPAFFAKSSAIYNPVIYIVLNKQFRNCLITTLCCGKNPFGDEDGSSAASSVSSSQVSPA >PAFR_HUMAN ----------------------MEPHDSSHMDSEFRYTLFPIVYSIIFVLGVIANGYVLWVFARLYPKFNEIKIFMVNLTMADMLFLITLPLWIVYYQNQGNWILPKFLCNVAGCLFFINTYCSVAFLGVITYNRFQAVTRPITAQANTRKRGISLSLVIWVAIVGAASYFLILDSTN-G--NVTRCFEH---YEKGSVPVLIIHIFIVFSFFLVFLIILFCNLVIIRTLLMQP---VQQQRNAEVKRRALWMVCTVLAVFIICFVPHHVVQLPWTLAELGQAINDAHQVTLCLLSTNCVLDPVIYCFLTKKFRKHLTEKFYSMRSSRKCSRATTDPFNQIPGNSLKN >OPSG_SCICA DSHEDSTQSSIFTYTNSNATRGPFEGPNYHIAPRWVYHITSTWMIIVVIASVFTNGLVLVATMKFKKLRHPLNWILVNLAIADLAETVIASTISVVNQLYGYFVLGHPLCVVEGYTVSVCGITGLWSLAIISWERWLVVCKPFGNMRFDAKLAIVGIAFSWIWSAVWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCIIPLSIIILCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVFAYCLCWGPYTFFACFATANPGYAFHPLVAALPAYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVDDTSELSSASKSVSPAA------ >HH2R_RAT -------------------MEPNGTVHSCCLDSMALKVTISVVLTTLILITIAGNVVVCLAVSLNRRLRSLTNCFIVSLAATDLLLGLLVLPFSAIYQLSFTWSFGHVFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLYPVLVTPVRVAISLVFIWVISITLSFLSIHLGWN--RN-TF-KCKVQ--------VNEVYGLVDGLVTFYLPLLIMCVTYYRIFKIAREQR--ISSWKAATIREHKATVTLAAVMGAFIICWFPYFTAFVYRGLRGDD-INEAVEGIVLWLGYANSALNPILYAALNRDFRTAYQQLFHCKFASHNSHKTSLRRSQSREGRW--- >V1AR_RAT 
SSPWWPLTTEGSNGSQEAARLGEGDSPLGDVRNEELAKLEIAVLAVIFVVAVLGNSSVLLALHRTPRKTSRMHLFIRHLSLADLAVAFFQVLPQLCWDITSSFRGPDWLCRVVKHLQVFAMFASAYMLVVMTADRYIAVCHPLKTLQQPARRSRLMIATSWVLSFILSTPQYFIFSVIETK--TQDCWAT---FIQPWGTRAYVTWMTSGVFVAPVVVLGTCYGFICYHIWRNLLVVSSVKSISRAKIRTVKMTFVIVSAYILCWAPFFIVQMWSVWDENFDSENPSITITALLASLNSCCNPWIYMFFSGHLLQDCVQSFPCCHSMAQKFAKDDSTSYSNNRSPTNS >GPR1_HUMAN EDLEETLFEEFENYSYDLDYYSLESDLEEKVQLGVVHWVSLVLYCLAFVLGIPGNAIVIWFTGLKWK-KTVTTLWFLNLAIADFIFLLFLPLYISYVAMNFHWPFGIWLCKANSFTAQLNMFASVFFLTVISLDHYIHLIHPVSHRHRTLKNSLIVIIFIWLLASLIGGPALYFR--D--NN-HTLCYNNHDPDLTLIRHHVLTWVKFIIGYLFPLLTMSICYLCLIFKV---------KKRTVLISSRHFWTILVVVVAFVVCWTPYHLFSIWELTIHHNHVMQAGIPLSTGLAFLNSCLNPILYVLISKKFQARFRSSVAEILKYTLWEVSCSGNSETKNLCLLET >DADR_MACMU --------------MRTLNTSAMDGTGLVVERDFSVRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLSWHKAGN-TIDNCDSS--------LSRTYAISSSVISFYIPVAIMIVTYTRIYRIAQKQVECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCILPFCGSGCIDSITFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSTLLGCYRLCPATNNAIETAAMFSSH----- >AG2R_CANFA ------MILNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYVAIVHPMSPVRRTMLMAKVTCIIIWLLAGLASLPTIIHRNVFF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKTLKRA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSD-------H >OPSD_ATHBO MNGTEGPYFYIPMLNTTGVVRSPYEYPQYYLVNPAAYAVLGAYMFFLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVLGRLGCNVEGFSATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGVAFTWFMAAACAVPPLFGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFTCHFCIPLMVVFFCYGRLVCAVKEAAAAQQESETTQRAEREVTRMVIIMVVSFLVSWVPYASVAWYIFTHQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASSSSVSSSSVSPAA >RDC1_RAT 
YVEPGNYSDSNWPCNSSDCIVVDTVQCPAMPNKNVLLYTLSFIYIFIFVIGMIANSVVVWVNIQAKTTGYDTHCYILNLAIADLWVVITIPVWVVSLVQHNQWPMGELTCKITHLIFSINLFGSIFFLACMSVDRYLSITYFTTSSYKKKMVLRVVCVLVWLLAFFVSLPDTYYLKTVTNNE--TYCRSFYPEHSIKEWLIGMELVSVILGFAVPFTIIAIFYFLLARAM---------SASGDQEKHSSRKIIFSYVVVFLVCWLPYHFVVLLDIFSILHNVLFTALHVTQCLSLVHCCVNPVLYSFINRNYRYELMKAFIFKYSAKTGLTKLIDEYSALEQNAKA- >IL8A_RAT EGDFEEEFGNITRMLPTGEYFSPC----KR-VPMTNRQAVVVFYALVFLLSLLGNSLVMLVILYRRRTRSVTDVYVLNLAIADLLFSLTLPFLAVSKWKG--WIFGTPLCKMVSLLKEVNFFSGILLLACISVDRYLAIVHATRTLTRKRYLVKFVCMGTWGLSLVLSLPFAIFRQAYK-RS-GTVCYEV-LGEATADLRITLRGLSHIFGFLLPLFIMLVCYGLTLRTL---------FKAHMRQKRRAMWVIFAVVLVFLLCCLPYNLVLLSDTLLGAHNNIDQALYITEILGFSHSCLNPVIYAFVGQSFRHEFLKILAN--LVHKEVLTHHS------------ >CB2R_RAT ----MEGCRELELTNGSNGGLEFNPMKEYMILSDAQQIAVAVLCTLMGLLSALENVAVLYLILSSQRRRKPSYLFIGSLAGADFLASVIFACNFVIFHVFHG-VDSRNIFLLKIGSVTMTFTASVGSLLLTAVDRYLCLCYPPYKALVTRGRALVALGVMWVLSALISYLPLMGWTC-----CPSPCSEL------FPLIPNDYLLGWLLFIAILFSGIIYTYGYVLWKAHQHTEHQVPGIARMRLDVRLAKTLGLVMAVLLICWFPALALMGHSLVTTLSDKVKEAFAFCSMLCLVNSMVNPIIYALRSGEIRSAAQHCLTGWKKYLQGLGSEGKVTETEAEVKTTT >LSHR_HUMAN ESELSGWDYEYGFCLPKTPRCAPEPDAFNPCEDIMGYDFLRVLIWLINILAIMGNMTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDSQTKGWQTG-SGCSTAGFFTVFASELSVYTLTVITLERWHTITYAILDQKLRLRHAILIMLGGWLFSSLIAMLPLVGVSNY----KVSICFPM-----VETTLSQVYILTILILNVVAFFIICACYIKIYFAVRNP------ELMATNKDTKIAKKMAILIFTDFTCMAPISFFAISAAFKVPLITVTNSKVLLVLFYPINSCANPFLYAIFTKTFQRDFFLLLSKFGCCKRRAELYRRSNCKNGFTGSNK >OPSD_LIZSA MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIALWSLVVLAIERWMVVCKPISNFRFGEDHAIMGLAFTWVMAAACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVVAFLICWCPYAGVAWYIFTHQGSEFGPLFMTFPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA- >CB1R_FELCA 
EFYNKSLSSYKENEENIQCGENFMDMECFMILNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFVDFHVFHR-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKKIVTRPKAVVAFCLMWTIAIVIAVLPLLGWNCK-K--LQSVCSDI------FPLIDETYLMFWIGVTSVLLLFIVYAYMYILWKAHIHSEDQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTVFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPSCEGTAQPLDNSMGHANNTANVHRAA >EDG3_HUMAN TALPPRLQPVRGNETLREHYQYVGKLAGRLKEASEGSTLTTVLFLVICSFIVLENLMVLIAIWKNNKFHNRMYFFIGNLALCDLLAG-IAYKVNILMSGKKTFSLSPTVWFLREGSMFVALGASTCSLLAIAIERHLTMIKMRPYDANKRHRVFLLIGMCWLIAFTLGALPILGWNCL-H--NLPDCSTI------LPLYSKKYIAFCISIFTAILVTIVILYARIYFLVKSS---KVANHNNSERSMALLRTVVIVVSVFIACWSPLFILFLIDVACRVQCPILFKAQWFIVLAVLNSAMNPVIYTLASKEMRRAFFRLVCNCLVRGRGARASPIRSKSSSSNNSSH >UR2R_HUMAN AATGSSVPEPPGGPNATLNSSWASPTEPSSLEDLVATGTIGTLLSAMGVVGVVGNAYTLVVTCRSLRAVASMYVYVVNLALADLLYLLSIPFIVATYVTKE-WHFGDVGCRVLFGLDFLTMHASIFTLTVMSSERYAAVLRPLDTVQRPKGYRKLLALGTWLLALLLTLPVMLAMR---GP--KSLCLPA----WGPRAHRAYLTLLFATSIAGPGLLIGLLYARLARAYRRSQR--ASFKRARRPGARALRLVLGIVLLFWACFLPFWLWQLLAQYHQAPRTARIVNYLTTCLTYGNSCANPFLYTLLTRNYRDHLRGRVRGPGSGGGRGPVPSLRCSGRSLSSCSP >AA2B_CHICK --------------------------------MNTMKTTYIVLELIIAVLSIAGNVLVCWAVAINSTLKNATNYFLVSLAVADIAVGLLAIPFAITISIG--FQVDFHSCLFFACFVLVLTQSSIFSLLAVAIDRYLAIKIPLYNSLVTGKRARGLIAVLWLLSFVIGLTPLMGWNKAMG-A-FISCLFE-----NVVTMSYMVYFNFFGCVLLPLIIMLGIYIKIFMVACKQ---MGNSRTTLQKEVHAAKSLAIIVGLFAFCWLPLHILNCITHFHEEFSKPEWVMYVAIILSHANSVINPIIYAYRIRDFRYTFHKIISKILCKTDDFPKCTTVTNVNAPAASVT >5H2A_MOUSE NTSEASNWTIDAENRTNLSCEGYLPPTCLSILHLQEKNWSALLTTVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEEPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESNVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILAYKSSQLQVGQK >US27_HCMVA 
-------MTTSTNNQTLTQVSNMTNHTLNSTEIYQLFEYTRLGVWLMCIVGTFLNVLVITTILYYRRKKSPSDTYICNLAVADLLIVVGLPFFLEYAKHHP-KLSREVVCSGLNACFYICLFAGVCFLINLSMDRYCVIVWGVLNRVRNNKRATCWVVIFWILAVLMGMPHYLMYSHT-----NNECVGE-ANETSGWFPVFLNTKVNICGYLAPIALMAYTYNRMVRFI---------INYVGKWHMQTLHVLLVVVVSFASFWFPFNLALFLESIRLLANVIIFCLYVGQFLAYVRACLNPGIYILVGTQMRKDMWTTLRVFACCCVKQEIPYQKDIQRRAKHTKR >NK1R_HUMAN -----MDNVLPVDSDLSPNISTNTSEPNQFVQPAWQIVLWAAAYTVIVVTSVVGNVVVMWIILAHKRMRTVTNYFLVNLAFAEASMAAFNTVVNFTYAVHNEWYYGLFYCKFHNFFPIAAVFASIYSMTAVAFDRYMAIIHPL-QPRLSATATKVVICVIWVLALLLAFPQGYYST---SR---VVCMIEWPEHPNKIYEKVYHICVTVLIYFLPLLVIGYAYTVVGITLE---IPSDRYHEQVSAKRKVVKMMIVVVCTFAICWLPFHIFFLLPYINPDLKFIQQVYLAIMWLAMSSTMYNPIIYCCLNDRFRLGFKHAFRCCPFISAGDY----STRYLQTQGSVY >5H1A_HUMAN -MDVLSPGQGNNTTSPPAPFETGGNTTGISDVTVSYQVITSLLLGTLIFCAVLGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQVTCDLFIALDVLCCTSSILHLCAIALDRYWAITDPIYVNKRTPRRAAALISLTWLIGFLISIPPMLGW-----DP--DACTIS--------KDHGYTIYSTFGAFYIPLLLMLVLYGRIFRAARFRKNEEAKRKMALARERKTVKTLGIIMGTFILCWLPFFIVALVLPFCESSHMPTLLGAIINWLGYSNSLLNPVIYAYFNKDFQNAFKKIIKCKFCRQ-------------------- >IL8B_MOUSE SGDLDIFN-YSSGMPSILPDAVPC----HSENLEINSYAVVVIYVLVTLLSLVGNSLVMLVILYNRSTCSVTDVYLLNLAIADLFFALTLPVWAASKVNG--WTFGSTLCKIFSYVKEVTFYSSVLLLACISMDRYLAIVHATSTLIQKRHLVKFVCIAMWLLSVILALPILILRNPVK-LS-TLVCYED-VGNNTSRLRVVLRILPQTFGFLVPLLIMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLVFLLCWLPYNLVLFTDTLMRTKDDIDKALNATEILGFLHSCLNPIIYAFIGQKFRHGLLKIMATYGLVSKEFLAKEG------------ >GPRY_MOUSE LCSSHGMHFITNYSDQASQNFGVPNVTSCPMDEKLLSTVLTTFYSVIFLVGLVGNIIALYVFLGIHRKRNSIQIYLLNVAVADLLLIFCLPFRIMYHINQNKWTLGVILCKVVGTLFYMNMYISIILLGFISLDRYIKINRSIQRRAITTKQSIYVCCIVWTVALAGFLTMIILTLKK-----STMCFHY--RDRHNAKGEAIFNFVLVVMFWLIFLLIILSYIKIGKNLLRISKR--SKFPNSGKYATTARNSFIVLIIFTICFVPYHAFRFIYISSQLNEIIHKTNEIMLVFSSFNSCLDPVMYFLMSSNIRKIMCQLLFRRFQSEASRSESTSLHDLSVTVKMPQ >PAR2_RAT 
---LDTPPPITGKGAPVEPGFSVDEFSASVLTGKLTTVFLPVIYIIVFVIGLPSNGMALWVFFFRTKKKHPAVIYMANLALADLLSVIWFPLKISYHLHGNDWTYGDALCKVLIGFFYGNMYCSILFMTCLSVQRYWVIVNPMGHSRKRANIAVGVSLAIWLLIFLVTIPLYVMRQTIY--N--TTCHDVLPEEVLVGDMFSYFLSLAIGVFLFPALLTASAYVLMIKTL------SAMDEHSEKKRRRAIRLIITVLSMYFICFAPSNVLLVVHYFLIKSSHVYALYLVALCLSTLNSCIDPFVYYFVSKDFRDQARNALLCRSVRTVKRMQISLKSSS-------- >5H5A_MOUSE LPVNLTSFSLSTPSSLEPNRSDTEVLRPSRPFLSAFRVLVLTLLGFLAAATFTWNLLVLATILKVRTFHRVPHNLVASMAISDVLVAVLVMPLSLVHELSGRWQLGRRLCQLWIACDVLCCTASIWNVTAIALDRYWSITRHLYTLRTRKRVSNVMILLTWALSTVISLAPLLFGWGE-S-E-SEECQVS--------REPSYTVFSTVGAFYLPLWLVLFVYWKIYRAAKFRATVTEGDTWREQKEQRAALMVGILIGVFVLCWFPFFVTELISPLCSW-DVPAIWKSIFLWLGYSNSFFNPLIYTAFNRSYSSAFKVFFSKQQ----------------------- >IL8A_GORGO PQMWDFDDLNFTGMPPIDEDYSPC----RLETETLNKYVVIITYALAFLLSLLGNSLVMLVILYSRGGRSVTDVYLLNLALADLLFALTLPIWAASKVNG--WIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFVCLGCWGLSMILSLPFFLFRQAYH-NS-SPVCYEV-LGNDTAKWRMVLRILPHTFGFIVPLFVMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQNNVSLALDATEILGFLHSCLNPIIYAFIGQNFRHGFLKILAMHGLVSKEFLARHR------------ >5H2A_RAT NTSEASNWTIDAENRTNLSCEGYLPPTCLSILHLQEKNWSALLTTVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAIWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEEPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESNVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILAYKSSQLQVGQK >SSR2_BOVIN ELNETQPWLTTPFDLNGSVGAANISNQTEPYYDLASNVVLTFIYFVVCIIGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMINVAVWGVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYAFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSVAISPALKGMFDFVVVLTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGTDDGERSDLNETTETQRTLL >GP42_HUMAN 
-----------------------MDTGPDQSYFSGNHWFVFSVYLLTFLVGLPLNLLALVVFVGKLRRPVAVDVLLLNLTASDLLLLLFLPFRMVEAANGMHWPLPFILCPLSGFIFFTTIYLTALFLAAVSIERFLSVAHPLYKTRPRLGQAGLVSVACWLLASAHCSVVYVIEFSGD--T--GTCYLE-FWKDQLAILLPVRLEMAVVLFVVPLIITSYCYSRLVWILGR--------GGSHRRQRRVAGLVAATLLNFLVCFGPYNVSHVVGYICG---ESPVWRIYVTLLSTLNSCVDPFVYYFSSSGFQADFHELLRRLCGLWGQWQQESSGGEEQRADRPAE >APJ_MACMU ---------MEEGGDFDNYYGADNQSECEYTDWKSSGALIPAIYMLVFLLGTTGNGLVLWTVFRSSRKRRSADIFIASLAVADLTFVVTLPLWATYTYRDYDWPFGTFSCKLSSYLIFVNMYASVFCLTGLSFDRYLAIVRPVNARLRLRVSGAVATAVLWVLAALLAMPVMVFRTTGDQCY-MDYSMVA-TVSSDWAWEVGLGVSSTTVGFVVPFTIMLTCYFFIAQTIAGHFR--KERIEGLRKRRRLLSIIVVLVVTFALCWMPYHLVKTLYMLGSLLLFLMNVFPYCTCISYVNSCLNPFLYAFFDPRFRQACTSMLCCGQSRCAGTSHSSSSSGHSQGPGPNM >AG22_MOUSE RNITSSRPFDNLNATGTNESAFNC----SHKPSDKHLEAIPVLYYMIFVIGFAVNIVVVSLFCCQKGPKKVSSIYIFNLALADLLLLATLPLWATYYSYRYDWLFGPVMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNPWQASYVVPLVWCMACLSSLPTFYFRDVRT-LG--NACIMAFPPEKYAQWSAGIALMKNILGFIIPLIFIATCYFGIRKHLLKT----NSYGKNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALTWMGAVIDLALPFAILLGFTNSCVNPFLYCFVGNRFQQKLRSVFRVPITWLQGKRETMSREMD-------T >FSHR_SHEEP FDMMYSEFDYDLCSEVVDVTCSPEPDAFNPCEDIMGYDILRVLIWFISILAITGNILVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDVHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLECKVHVRHAASIMLVGWVFAFAVALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNP------NITSSSSDTKIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKSKILLVLFYPINSCANPFLYAIFTRNFRRDFFILLSKFGCYEVQAQTYRSNFHPRNGHCPPA >YN84_CAEEL EVFHHISTTNKIFQKMFDKRNFSTDYTFNPKTFPGYRTYVASTYISFNVVGFVINAWVLYVVAPLLFVPKSILFYIFALCVGDLMTMIAMLLLVIELVFG--TWQFS-SMVCTSYLIFDSMNKFMAPMIVFLISRTCYSTVCLGEKAATLKYAIIQFCIAFAFVMILLWPVFAYSQVFTQEVVMRKCGFF----PPPQIEFWFNLIACITSYAVPLFGIIYWYVSVPFFLKRR---LVASSSMDAALRKVITTVLLLTVIYVLCWTPYWVSMFANRIWIMEKSIIIISYFIHLLPYISCVAYPLIFTLLNRGIRSAHAKIVADQRRRFRSLTDEASRTIPGTKMKKNE >5H1D_CANFA 
SPPNQSLEGLLQEASNRSLNATETPEAWGPETLQALKISLALLLSIITMATALSNAFVLTTIFLTRKLHTPANYLIGSLAMTDLLVSILVMPISIAYTTTRTWSFGQILCDIWLSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGRAAVMIATVWVISICISIPPLFWR----Q-E--SDCQVN-------TSQISYTIYSTCGAFYIPSVLLIILYGRIYVAARNRKLALERKRISAARERKATKTLGIILGAFIVCWLPFFVASLVLPICRASWLHPALFDFFTWLGYLNSLINPIIYTVFNEEFRQAFQRVVHVRKAS--------------------- >GASR_BOVIN GASLCRSGGPLLNGSGTGNLSCEPPRIRGAGTRELELAIRVTLYAVIFLMSVGGNVLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVVCKAVSYFMGVSVSVSTLSLVAIALERYSAICRPLARVWQTRSHAARVIVATWMLSGLLMVPYPVYTAVQP---V-LQCMHR---WPSARVRQTWSVLLLLLLFFVPGVVMAVAYGLISRELYLGPGPTRPAQAKLLAKKRVVRMLLVIVVLFFLCWLPVYSANTWRAFDGPGALSGAPISFIHLLTYASACVNPLVYCFMHRRFRQACLDTCTRCCPRPPRARPRPLPSIASLSRLSYT >BRS4_BOMOR QTLPSAISSIAHLESLNDSFILGAKQSEDVSPGLEILALISVTYAVIISVGILGNTILIKVFFKIKSMQTVPNIFITSLAFGDLLLLLTCVPVDASRYIVDTWMFGRAGCKIISFIQLTSVGVSVFTLTVLSADRYRAIVKPLLQTSDAVLKTCGKAVCVWIISMLLAAPEAVFSDLYETT--FEACAPY---VSEKILQETHSLICFLVFYIVPLSIISAYYFLIAKTLYKSMPAHTHARKQIESRKRVAKTVLVLVALFAVCWLPNHMLYLYRSFTYHSAFHLSATIFARVLAFSNSCVNPFALYWLSRSFRQHFKKQVYCCKTEPPAS--QQSTGITAVKGNIQM >OPSB_RAT -----MSGE-EFYLFQNISSVGPWDGPQYHIAPVWAFHLQAAFMGFVFFAGTPLNATVLVATLHYKKLRQPLNYILVNVSLGGFLFCIFSVFTVFIASCHGYFLFGRHVCALEAFLGSVAGLVTGWSLAFLAFERYLVICKPFGNIRFNSKHALTVVLITWTIGIGVSIPPFFGWSRFIPEGLQCSCGPDWYTVGTKYRSEHYTWFLFIFCFIIPLSLICFSYFQLLRTLRAVAAQQQESATTQKAEREVSHMVVVMVGSFCLCYVPYAALAMYMVNNRNHGLYLRLVTIPAFFSKSSCVYNPIIYCFMNKQFRACILEMVCRKPMTD---ESDMSSTVSSSKVGPH- >NK2R_RABIT ----MGACDIVTEANISSDIDSNATGVTAFSMPGWQLALWATAYLALVLVAVVGNATVIWIILAHRRMRTVTNYFIVNLALADLCMATFNAAFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSGPGTKAVIAGIWLVALALAFPQCFYST---GA---TKCVVAWPEDSGGKMLLLYHLTVIALIYFLPLVVMFVAYSVIGFKLWRRPGHHGANLRHLRAKKKFVKTMVLVVVTFAVCWLPYHLYFLLGHFQDDIKFIQQVYLVLFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTEE----HTPSLSVRVNRC >PAR2_MOUSE 
---LETQPPITGKGVPVEPGFSIDEFSASILTGKLTTVFLPVVYIIVFVIGLPSNGMALWIFLFRTKKKHPAVIYMANLALADLLSVIWFPLKISYHLHGNNWVYGEALCKVLIGFFYGNMYCSILFMTCLSVQRYWVIVNPMGHPRKKANIAVGVSLAIWLLIFLVTIPLYVMKQTIY--N--TTCHDVLPEEVLVGDMFNYFLSLAIGVFLFPALLTASAYVLMIKTL------SAMDEHSEKKRQRAIRLIITVLAMYFICFAPSNLLLVVHYFLIKTSHVYALYLVALCLSTLNSCIDPFVYYFVSKDFRDHARNALLCRSVRTVNRMQISLKSGS-------- >OPSD_CARAU MNGTEGDMFYVPMSNATGIVRSPYDYPQYYLVAPWAYACLAAYMFFLIITGFPVNFLTLYVTIEHKKLRTPLNYILLNLAISDLFMVFGGFTTTMYTSLHGYFVFGRVGCNPEGFFATLGGEMGLWSLVVLAFERWMVVCKPVSNFRFGENHAIMGVVFTWFMACTCAVPPLVGWSRYIPEGMQCSCGVDYYTRPQAYNNESFVIYMFIVHFIIPLIVIFFCYGRLVCTVKEAAAQHEESETTQRAEREVTRMVVIMVIGFLICWIPYASVAWYIFTHQGSEFGPVFMTLPAFFAKTAAVYNPCIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA >TRFR_HUMAN ------------MENETVSELNQTQLQPRAVVALEYQVVTILLVLIICGLGIVGNIMVVLVVMRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSLYCMLWFFLLDLN--DA-SCGYKIS------RNYYSPIYLMDFGVFYVVPMILATVLYGFIARILFLNLNVNRCFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPTEKPANYSVKESDHFSTELDD >PE24_MOUSE GTIPRSNRELQRCVLLTTTIMSIPGVNASFSSTPERLNSPVTIPAVMFIFGVVGNLVAIVVLCKSRKKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGDQALCDYSTFILLFFGLSGLSIICAMSIERYLAINHAYYSHYVDKRLAGLTLFAIYASNVLFCALPNMGLGRSERQYPGTWCFID---WTTNVTAYAAFSYMYAGFSSFLILATVLCNVLVCGALLRMAAAVASFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFINQLYQPNDISRNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSGRDSSRRTSSAMSGHSR >C3AR_CAVPO --------------MESSSAETNSTGLHLEPQYQPETILAMAILGLTFVLGLPGNGLVLWVAGLKMR-RTVNTVWFLHLTVADFVCCLSLPFSMAHLALRGYWPYGEILCKFIPTVIIFNMFASVFLLTAISLDRCLMVLKPICQNHRNVRTACIICGCIWLVAFVLCIPVFVYRETFT-EE-DDLSPFT-HEYRTPRLLKVITFTRLVVGFLLPMIIMVACYTLIIFRM--------RRVRVVKSWNKALHLAMVVVTIFLICWAPYHVFGVLILFINPEAALLSWDHVSIALASANSCFNPFLYALLGRDLRKRVRQSMKGILEAAFSEDISKSAFS--------- >GPR3_MOUSE 
GAGSSMAWFSAGSGSVNVSSVDPVEEPTGPATLLPSPRAWDVVLCISGTLVSCENALVVAIIVGTPAFRAPMFLLVGSLAVADLLAG-LGLVLHFAAD-F--CIGSPEMSLMLVGVLAMAFTASIGSLLAITVDRYLSLYNALYYSETTVTRTYVMLALVWVGALGLGLVPVLAWNCR-D--GLTTCGVV-------YPLSKNHLVVLAIAFFMVFGIMLQLYAQICRIVCRHIALHLLPASHYVATRKGIATLAVVLGAFAACWLPFTVYCLLGDA----DSPRLYTYLTLLPATYNSMINPVIYAFRNQDVQKVLWAICCCCSTSKIPFRSRSP------------ >OPSD_OCTDO --MVESTTLVNQTWWYNPTVDIHPHWAKFDPIPDAVYYSVGIFIGVVGIIGILGNGVVIYLFSKTKSLQTPANMFIINLAMSDLSFSAINGFPLKTISAFMKWIFGKVACQLYGLLGGIFGFMSINTMAMISIDRYNVIGRPMASKKMSHRRAFLMIIFVWMWSIVWSVGPVFNWGAYVPEGILTSCSFD--YLSTDPSTRSFILCMYFCGFMLPIIIIAFCYFNIVMSVSNHRLNLRKAQAGASAEMKLAKISMVIITQFMLSWSPYAIIALLAQFGPAEWVTPYAAELPVLFAKASAIHNPIVYSVSHPKFREAIQTTFPWLLTCCQFDEKECEEVVASERG-GES >B1AR_MELGA WLPPDCGPHNRSGGGGATAAPTGSRQVSAELLSQQWEAGMSLLMALVVLLIVAGNVLVIAAIGRTQRLQTLTNLFITSLACADLVMGLLVVPFGATLVVRGTWLWGSFLCECWTSLDVLCVTASIETLCVIAIDRYLAITSPFYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDL-K--GCCDFV--------TNRAYAIASSIISFYIPLLIMIFVYLRVYREAKEQNGRRKTSRVMAMREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDL-VPDWLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFKRLLCFPRKADRRLHAGGQFISTLGSPEHSP >5H1B_CAVPO PAVLGSQTGLPHANVSAPPNNAP-SHIYQDSIALPWKVLLVVLLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAFTDLLVSILVMPISTMYTVTGRWTLGQALCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPRRAAGMIALVWVFSICISLPPFFWR----E-E--LDCLVN-------TDHVLYTVYSTGGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGVILGAFIVCWLPFFIISLVMPICKDAWFHMAIFDFFTWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCTT--------------------- >BRB2_MOUSE IEMFNVTTQVLGSALNGTLSKDNC---PDTEWWSWLNAIQAPFLWVLFLLAALENLFVLSVFFLHKNSCTVAEIYLGNLAAADLILACGLPFWAITIANNFDWVFGEVLCRVVNTMIYMNLYSSICFLMLVSIDRYLALVKTMMGRMRGVRWAKLYSLVIWGCTLLLSSPMLVFRTMRE-HN--TACVIV---YPSRSWEVFTNVLLNLVGFLLPLSVITFCTVRILQVLRNN---EMKKFKEVQTERKATVLVLAVLGLFVLCWVPFQISTFLDTLLRLGHAVDVITQISSYVAYSNSGLNPLVYVIVGKRFRKKSREVYRVLCQKGGCMGEPVQLRTS-------I >OPSB_APIME 
YVPSMREKFLGWNVPPEYSDLVRPHWRAFPAPGKHFHIGLAIIYSMLLIMSLVGNCCVIWIFSTSKSLRTPSNMFIVSLAIFDIIMAFEMPMLVISSFMERM--GWEIGCDVYSVFGSISGMGQAMTNAAIAFDRYRTISCPI-DGRLNSKQAAVIIAFTWFWVTPFTVLPLLKVWGRYTEGFLTTCSFD--FLTDDEDTKVFVTCIFIWAYVIPLIFIILFYSRLLSSIRNHNVKSNQDKER-SAEVRIAKVAFTIFFLFLLAWTPYATVALIGVYGNRELLTPVSTMLPAVFAKTVSCIDPWIYAINHPRYRQELQKRCKWMGIHEPETTSDATKTDE-------- >O.YR_HUMAN GALAANWSAEAANASAAPPGAEGNRTAGPPRRNEALARVEVAVLCLILLLALSGNACVLLALRTTRQKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLATWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLATCYGLISFKIWQNRVAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDANA-KEASAFIIVMLLASLNSCCNPWIYMLFTGHLFHELVQRFLCCSASYLKGRRLGENSSSFVLSHRSS >OPSR_CAPHI ANFEESTQGSIFTYTNSNSTRDPFEGPNYHIAPRWVYHLTSAWMVFVVIASVFTNGLVLAATMRFKKLRHPLNWILVNLAIADLAETIIASTISVVNQMYGYFVLGHPLCVVEGYTVSLCGITGLWSLAIISWERWMVVCKPFGNVRFDAKLATAGIAFSWIWAAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMITCCFIPLSVIILCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVMVMIFAYCLCWGPYTFFACFAAAHPGYAFHPLVAALPAYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVDDS-----SELASSV--SSVSPA >APJ_HUMAN ---------MEEGGDFDNYYGADNQSECEYTDWKSSGALIPAIYMLVFLLGTTGNGLVLWTVFRSSRKRRSADIFIASLAVADLTFVVTLPLWATYTYRDYDWPFGTFFCKLSSYLIFVNMYASVFCLTGLSFDRYLAIVRPVNARLRLRVSGAVATAVLWVLAALLAMPVMVLRTTGDQCY-MDYSMVA-TVSSEWAWEVGLGVSSTTVGFVVPFTIMLTCYFFIAQTIAGHFR--KERIEGLRKRRRLLSIIVVLVVTFALCWMPYHLVKTLYMLGSLLLFLMNIFPYCTCISYVNSCLNPFLYAFFDPRFRQACTSMLCCGQSRCAGTSHSSSSSGHSQGPGPNM >NY1R_PIG TLSSQVENHSIYYNFSEKNSQFLAFENDDCHLPLAMIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAIMCLPFTFVYTLMDHWVFGEVMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPSNRHAYVGIAVIWVLAVASSLPFLIYQVLTDKD--KYVCFDK---FLSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMMRDNKYRSSETKRINVMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCINPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK >OAR1_LYMST 
---------MSRDIFMKRLRLHLLFDEVAMVTHIVGDVLSSVLLCAVVLLVLVGNTLVVAAVATSRKLRTVTNVFIVNLACADLLLGVLVLPFSAVNEIKDVWIFGHVWCQVWLAVDVWLCTASILNLCCISLDRYLAITRPIYPGLMSAKRAKTLVAGVWLFSFVICCPPLIGWNDGGT-Y--TTCELT--------NSRGYRIYAALGSFFIPMLVMVFFYLQIYRAAVKTHKPMRLHMQKFNREKKAAKTLAIIVGAFIMCWMPFFTIYLVGAFCENC-ISPIVFSVAFWLGYCNSAMNPCVYALFSRDFRFAFRKLLTCSCKAWSKNRSFRPIQLHCATQDDAK >OLF5_CHICK -------------MALGNCTTPTTFILSGLTDNPRLQMPLFMVFLAIYTITLLANLGLIALISVDFHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERRTISYVGCILQYFSFVLLTSSECLLLAVMAYDRYVAICKPLYPAIMTKAVCWRLVEGLYSLAFLNSLVHTSGLLKL---SNHFFCDNSQISSSSTTLNELLVFIFGSWFAMSSIITTPISYVFIILTV--------VRIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRVIATNVWIH-------------------- >A2AR_CARAU ----------MDVTQSNATKDDANITVTPWPYTETAAAFIILVVSVIILVSIVGNVLVIVAVLTSRALRAPQNLFLVSLACADILVATLVIPFSLANEIMGYWFFGSTWCAFYLALDVLFCTSSIVHLCAISLDRYWSVTKAVYNLKRTPKRIKSMIAVVWVISAVISFPPLIMTKH-------KECLIN--------DETWYILSSSLVSFFAPGFIMITVYCKIYRVAKQRSKQASKTKVAQMREKRFTFVLTVVMGVFVLCWFPFFFTYSLHAICGDSEPPEALFKLFFWIGYCNSSVNPIIYTIFNRDFRKAFKKICLLDCAAHLRDSCLGTCIFECHQKSNQE >VQ3L_CAPVK SNYTTAYNTTYYSDDYDDYEVSIVDIPHCDDGVDTTSFGLITLYSTIFFLGLFGNIIVLTVLRKYKI-KTIQDMFLLNLTLSDLIFVLVFPFNLYDSIAKQ-WSLGDCLCKFKAMFYFVGFYNSMSFITLMSIDRYLAVVHPVSMPIRTKRYGIVLSMVVWIVSTIESFPIMLFYETKK-VY-ITYCHVF-YNDNAKIWKLFINFEINIFGMIIPLTILLYCYYKILNTL----------KTSQTKNKKAIKMVFLIVICSVLFLLPFSVTVFVSSLYLLNRFVNLAVHVAEIVSLCHCFINPLIYAFCSREFTKKLLRLRTTSSAGSISIG---------------- >GP40_HUMAN --------------------------------MDLPPQLSFGLYVAAFALGFPLNVLAIRGATAHARRLTPSLVYALNLGCSDLLLTVSLPLKAVEALASGAWPLPASLCPVFAVAHFFPLYAGGGFLAALSAGRYLGAAFPLYQAFRRPCYSWGVCAAIWALVLCHLGLVFGLEAPGGTPVGSPVCLEA----WDPASAGPARFSLSLLLFFLPLAITAFCYVGCLRAL-------ARSGLTHRRKLRAAWVAGGALLTLLLCVGPYNASNVASFLYP--NLGGSWRKLGLITGAWSVVLNPLVTGYLGRGPGLKTVCAARTQGGKSQK------------------ >CKR8_MACMU 
--MDYTLDPSMTTMTDYYYPDSLSSPCDGELIQRNDKLLLAVFYCLLFVFSLLGNSLVILVLVVCKKLRNITDIYLLNLALSDLLFVFSFPFQTYYQLDQ--WVFGTVMCKVVSGFYYIGFYSSMFFITLMSVDRYLAVVHAVIKVRTIRMGTTTLSLLVWLTAIMATIPLLVFYQVAS-ED-VLQCYSF-YNQQTLKWKIFTNFEMNILGLLIPFTIFMFCYIKILHQL---------KRCQNHNKTKAIRLVLIVVIASLLFWVPFNVVLFLTSLHSMHQQLNYATHVTEIISFTHCCVNPVIYAFVGEKFKKHLSEIFQKSCSHIFIYLGRQMSSSCQQHSFRSS >NTR1_RAT EATFLALSLSNGSGNTSESDTAGPNSDLDVNTDIYSKVLVTAIYLALFVVGTVGNSVTAFTLARKKSLQSTVHYHLGSLALSDLLILLLAMPVELYNFIWVHWAFGDAGCRGYYFLRDACTYATALNVASLSVERYLAICHPFAKTLMSRSRTKKFISAIWLASALLAIPMLFTMGLQN--SGGLVCTPI---VDTATVKVVIQVN-TFMSFLFPMLVISILNTVIANKLTVMEHSMTIEPGRVQALRHGVLVLRAVVIAFVVCWLPYHVRRLMFCYISDEDFYHYFYMLTNALFYVSSAINPILYNLVSANFRQVFLSTLACLCPGWRHRRKKRPSMSSNHAFSTSA >B3AR_MACMU MAPWPHGNSSLVPWPDVPTLAPNTANTSGLPGVPWAAALAGALLALAVLATVGGNLLVIVAITRTPRLQTMTNVFVTSLAAADLVMGLLVVPPAATLVLTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSQWWRVQ-R--RCCAFA--------SNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALCTLGLIMGTFTLCWLPFFLANVLRALGGPS-VPDPAFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCHCGGRLPREPCAADAPLRPGPAPRSP >GALS_MOUSE ------------MNGSDSQGAEDSSQE-GGGGWQPEAVLVPLFFALIFLVGAVGNALVLAVLLRGGQAVSTTNLFILNLGVADLCFILCCVPFQATIYTLDDWVFGSLLCKAVHFLIFLTMHASSFTLAAVSLDRYLAIRYPMSRELRTPRNALAAIGLIWGLALLFSGPYLSYYS---AN--LTVCHPA----WSAPRRRAMDLCTFVFSYLLPVLVLSLTYARTLHYLWRTDP-VAAGSGSQRAKRKVTRMIVIVAVLFCLCWMPHHALILCVWFGRFPRATYALRILSHLVSYANSCVNPIVYALVSKHFRKGFRKICAGLLRRAPRRASGRVHSGGMLEPESTD >P2Y3_MELGA ----------------MSMANFTAGRNSCTFQEEFKQVLLPLVYSVVFLLGLPLNAVVIGQIWLARKALTRTTIYMLNLATADLLYVCSLPLLIYNYTQKDYWPFGDFTCKFVRFQFYTNLHGSILFLTCISVQRYMGICHPLWHKKKGKKLTWLVCAAVWFIVIAQCLPTFVFASTG-----RTVCYDL-SPPDRSASYFPYGITLTITGFLLPFAAILACYCSMARILCQK---ELIGLAVHKKKDKAVRMIIIVVIVFSISFFPFHLTKTIYLIVRSSQAFAIAYKCTRPFASMNSVLDPILFYFTQRKFRESTRYLLDKMSSKWRHDHCITY------------ >OPSD_TURTR 
MNGTEGLNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSVLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVFGGFTTTLYTSLHAYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWIMAMACAAAPLVGWSRYIPEGMQCSCGIDYYTSRQEVNNESFVIYMFVVHFTIPLVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWVPYASVAFYIFTHQGSDFGPIFMTIPSFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGRNPLGDDEASTTASKTETSQVAPA >A2AB_CAVPO -------------------------MDHQEPYSVQATAAIAAVITFLILFTIFGNALVILAVLTSRSLPAPQNLFLVSLAAADILVATLIIPFSLANELLGYWYFWRTWCEVYLALDVLFCTSSIVHLCAISLDRYWAVSRALYNSKRTPRRIKCIILTVWLIAAVISLPPLIYKGD-Q-----PQCKIN--------QEAWYILASSIGSFFAPCLIMILVYLRIYLIAKRSGAVWWRRRTQMTREKRFTFVLAVVIGVFVLCWFPFFFTYSLGAICPQHKVPHGLFQFFFWIGYCNSSLNPVIYTIFNQDFRRAFRRILCRQWTQTAW------------------ >CML1_MOUSE EYDAYNDSGIYDDEYSDGFGYFVDLEEASPWEAKVAPVFLVVIYSLVCFLGLLGNGLVIVIATFKMK-KTVNTVWFVNLAVADFLFNIFLPMHITYAAMDYHWVFGKAMCKISNFLLSHNMYTSVFLLTVISFDRCISVLLPVSQNHRSIRLAYMTCSAVWVLAFFLSSPSLVFRDTAN-SS--HPAHSQ-VVSTGYSRHVAVTVTRFLCGFLIPVFIITACYLTIVFKL---------QRNRLAKNKKPFKIIITIIITFFLCWCPYHTLYLLELHHTAVSVFSLGLPLATAVAIANSCMNPILYVFMGHDFRK-FKVALFSRLANALSEDTGPSFTKMSSLNEKAS >5HT1_APLCA -MKSLKSSTHDVPHPEHVVWAPPAYDEQHHLFFSHGTVLIGIVGSLIITVAVVGNVLVCLAIFTEPISHSKSNFFIVSLAVADLLLALLVMTFALVNDMYGYWLFGETFCFIWMSADVMCETASIFSICVISYDRLKQVQKPLYEEFMTTTRALLIIACLWICSFVLSFVPIFLEWHELGD-AKHVCLFD--------VHFTYSVIYSFICFYVPCTLMLTNYLRLFLIAQTHQLRASSYRNQGTQGSKAARTLTIITGTFLACWLPFFIINPIAAADEHL-IPLECFMVTIWLGYFNSSVNPIIYGTSNSKFRAAFKRLLRCRSVKSVVGSISPVSWIRPSRLDLSS >5H1A_FUGRU NDSNATSGYSDTAAVDWDEGENATGSGSLPDPELSYQIITSLFLGALILCSIFGNSCVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQDICDLFIALDVLCCTSSILHLCAIALDRYWAITDPIYVNKRTPRRAAVLISVTWLIGFSISIPPMLGWRS---NP--DACIIS--------QDPGYTIYSTFGAFYIPLILMLVLYGRIFKAARFRINEGTRRKIALARERKTVKTLGIIMGTFIFCWLPFFIVALVLPFCAENYMPEWLGAVINWLGYSNSLLNPIIYAYFNKDFQSAFKKILRCKFHRH-------------------- >ACTR_BOVIN 
-------------MKHILNLYENINSTARNNSDCPAVILPEEIFFTVSIVGVLENLMVLLAVAKNKSLQSPMYFFICSLAISDMLGSLYKILENVLIMFKNMGSFESTADDVVDSLFILSLLGSICSLSVIAADRYITIFHALYHRIMTPHRALVILTVLWAGCTGSGITIVTFFS----------------------HHHVPTVIAFTALFPLMLAFILCLYVHMFLLARSH---TRRTPSLPKANMRGAVTLTVLLGVFIFCWAPFVLHVLLMTFCPADACYMSLFQVNGVLIMCNAIIDPFIYAFRSPELRVAFKKMVICNCYQ--------------------- >CKR9_MOUSE DDFSYDSTASTDDYMNLNFSSFFC---KKNNVRQFASHFLPPLYWLVFIVGTLGNSLVILVYWYCTRVKTMTDMFLLNLAIADLLFLATLPFWAIAAAGQ--WMFQTFMCKVVNSMYKMNFYSCVLLIMCISVDRYIAIVQAMVWRQKRLLYSKMVCITIWVMAAVLCTPEILYSQVSG-G--IATCTMVYPKDKNAKLKSAVLILKVTLGFFLPFMVMAFCYTIIIHTL---------VQAKKSSKHKALKVTITVLTVFIMSQFPYNSILVVQAVDAYATNIDICFQVTQTIAFFHSCLNPVLYVFVGERFRRDLVKTLKNLGCISQAQWVSFT--------GSLK >5H1B_SPAEH CAPPPPAGSQTQTPSSNLSHNSADSYIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPRRAAVMIALVWVFSISISLPRFFWR----E-E--LDCLVN-------TDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHMAIFDFFNWLGYLNSLINPIIYTMPNEDFKQAFHKLIRFKCTG--------------------- >OPS4_DROVI VSGNGDLQFLGWNVPPDQIQHIPEHWLTQLEPPASMHYMLGVFYIFLFCASTVGNGMVIWIFSTSKALRTPSNMFVLNLAVFDFIMCLKAPIFIYNSFHRG-FALGNTGCQIFAAIGSYSGIGAGMTNAAIGYDRLNVITKPM-NRNMTFTKAIIMNVIIWLYCTPWVVLPLTQFWDRFPEGYLTSCTFD--YLTDNFDTRLFVGTIFFFSFVCPTLMIIYYYSQIVGHVFSHNVESNVDKSKDTAEIRIAKAAITICFLFFVSWTPYGVMSLIGAFGDKSLLTPGATMIPACTCKLVACIDPFVYAISHPRYRMELQKRCPWLAIDEKAPESSSAEQQQTTAA---- >OPSG_RABIT ESHEDSTQASIFTYTNSNSTRGPFEGPNFHIAPRWVYHLTSAWMILVVIASVFTNGLVLVATMRFKKLRHPLNWILVNLAVADLAETVIASTISVVNQFYGYFVLGHPLCVVEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIAGIAFSWIWAAVWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCIIPLSVIVLCYLQVWMAIRTVAKQQKESESTQKAEKEVTRMVVVMVFAYCLCWGPYTFFACFATAHPGYSFHPLVAAIPSYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVEDS-----SELASSV--SSVSPA >NK3R_RABIT 
LVQAGNLSSSLPSSVPGLPTTPRANLTNQFVQPSWRIALWSLAYGVVVAVAVFGNLIVIWIILAHKRMRTVTNYFLVNLAFSDASMAAFNTLVNFIYALHSEWYFGANYCRFQNFFPITAVFASIYSMTAIAVDRYMAIIDPL-KPRLSATATKIVIGSIWILAFLLALPQCLYSK---GR---TLCYVQ--WPEGPKQHFIYHIIVIILVYCFPLLIMGITYTIVGITLWGG--PCDKYHEQLKAKRKVVKMMIIVVVTFAICWLPYHIYFILTAIYQQLKYIQQVYLASFWLAMSSTMYNPIIYCCLNKRFRAGFKRAFRWCPFIQVSSYDELEPTRQSSLYTVTR >ACM5_HUMAN -------MEGDSYHNATTVNGTPVNHQPLERHRLWEVITIAAVTAVVSLITIVGNVLVMISFKVNSQLKTVNNYYLLSLACADLIIGIFSMNLYTTYILMGRWALGSLACDLWLALDYVASNASVMNLLVISFDRYFSITRPLYRAKRTPKRAGIMIGLAWLISFILWAPAILCWQYLV-VP-LDECQIQ------FLSEPTITFGTAIAAFYIPVSVMTILYCRIYRETEKRNPSTKRKRVVLVKERKAAQTLSAILLAFIITWTPYNIMVLVSTFCDKC-VPVTLWHLGYWLCYVNSTVNPICYALCNRTFRKTFKMLLLCRWKKKKVEEKLYW------------ >O3A1_HUMAN ----------MQPESGANGTVIAEFILLGLLEAPGLQPVVFVLFLFAYLVTVRGNLSILAAVLVEPKLHTPMYFFLGNLSVLDVGCISVTVPSMLSRLLSRKRAVPCGACLTQLFFFHLFVGVDCFLLTAMAYDQFLAICRPLYSTRMSQTVQRMLVAASWACAFTNALTHTVAMSTL---NNHFYCDLPQLSCSSTQLNELLLFAVGFIMAGTPMALIVISYIHVAAAV--------LRIRSVEGRKKAFSTCGSHLTVVAIFYGSGIFNYMRLGSTK---LSDKDKAVGIFNTVINPMLNPIIYSFRNPDVQSAIWRMLTGRRSLA-------------------- >CKRB_BOVIN YNQSTDYYYEENEMNDTHDYSQYEVICIKEEVRKFAKVFLPAFFTIAFIIGLAGNSTVVAIYAYYKKRRTKTDVYILNLAVADLFLLFTLPFWAVNAVHG--WVLGKIMCKVTSALYTVNFVSGMQFLACISTDRYWAVTKAP-SQSGVGKPCWVICFCVWVAAILLSIPQLVFYTVN----HKARCVPI--YHLGTSMKASIQILEICIGFIIPFLIMAVCYFITAKTL---------IKMPNIKKSQPLKVLFTVVIVFIVTQLPYNIVKFCQAIDIIYKRMDVAIQITESIALFHSCLNPVLYVFMGTSFKNYIMKVAKKYGSW---------NVEEIPFESEDA >TA2R_HUMAN --MWPNG-----------SSLGPCFRPTNITLEERRLIASPWFAASFCVVGLASNLLALSVLAGARQTRSSFLTFLCGLVLTDFLGLLVTGTIVVSQHAALFVDPGCRLCRFMGVVMIFFGLSPLLLGAAMASERYLGITRPFRPAVASQRRAWATVGLVWAAALALGLLPLLGVGRYTVQYPGSWCFLT----LGAESGDVAFGLLFSMLGGLSVGLSFLLNTVSVATLCHVYHGEAAQQRPRDSEVEMMAQLLGIMVVASVCWLPLLVFIAQTVLRNPPRTTEKELLIYLRVATWNQILDPWVYILFRRAVLRRLQPRLSTRPRRVSLCGPAWSTATSASRVQAIL >AG2R_CHICK 
------MVPNYSTEETVKRIHVDC---PVSGRHSYIYIMVPTVYSIIFIIGIFGNSLVVIVIYCYMKLKTVASIFLLNLALADLCFLITLPLWAAYTAMEYQWPFGNCLCKLASAGISFNLYASVFLLTCLSIDRYLAIVHPVSRIRRTMFVARVTCIVIWLLAGVASLPVIIHRNIFF-LN--TVCGFR-YDNNNTTLRVGLGLSKNLLGFLIPFLIILTSYTLIWKTLKKA----YQIQRNKTRNDDIFKMIVAIVFFFFFSWIPHQVFTFLDVLIQLHDIVDTAMPFTICIAYFNNCLNPFFYVFFGKNFKKYFLQLIKYIPPNVSTHPSLTTRPPE-------N >P2YR_MOUSE AAFLAGLGSLWGNSTVASTAAVSSSFQCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLSLGRLKKKNAIYVSVLVWLIVVVAISPILFYSGTG----KTVTCYDT-TSNDYLRSYFIYSMCTTVAMFCIPLVLILGCYGLIVKALIY------NDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKASRRSEANLQSKSLSEFKQNGDTSL >NY5R_RAT -------------NTAAARNAAFPAWEDYRGSVDDLQYFLIGLYTFVSLLGFMGNLLILMAVMKKRNQKTTVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKAMCHIMPFLQCVSVLVSTLILISIAIVRYHMIKHPI-SNNLTANHGYFLIATVWTLGFAICSPLPVFHSLVESS--KYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRSISCGAHEKRSITRIKKRSRSVFYRLTILILVFAVSWMPLHVFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLRALIHCLHMS--------------------- >THRR_PAPHA LTEYRLVSINKSSPLQKPLPAFISEDASGYLTSSWLTLFVPSVYTGVFVVSLPVNIMAIVVFILKMKVKKPAVVYMLHLATADVLFVSVLPFKISYYLSGSDWQFGSELCRFVTAAFYCNMYASILLMTVISIDRFLAVVYPMSLSWRTLGRASFTCLAIWALAIAGVVPLLLKEQTIQ--N--TTCHDVLNETLLEGYYAYYFSAFSAVFFFVPLIISTVCYVSIIRCL------SSSTVANRSKKSRALFLSAAVFCIFIICFGPTNILLIAHYSFLSHEAAYFAYLLCVCVSSISCCIDPLIYYYASSECQRYVYSILCCKESSDPSSSNSSGDTCS-------- >FSHR_HUMAN FDMTYTEFDYDLCNEVVDVTCSPKPDAFNPCEDIMGYNILRVLIWFISILAITGNIIVLVILTTSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLDCKVQLRHAASVMVMGWIFAFAAALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYIHIYLTVRNP------NIVSSSSDTRIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKAKILLVLFHPINSCANPFLYAIFTKNFRRDFFILLSKCGCYEMQAQIYRTNTHPRNGHCSSA >OPSB_CONCO 
MNGTEGPNFYVPMSNATGVVRSPFEYPQYYLAEPWAFSILAAYMFFLIITGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVFGETGCNLEGYFATLGGEISLWSLVVLAIERWVVVCKPISNFRFGENHAIMGLTLTWVMANACAMPPLFGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFLVHFTIPLTIISFCYGRLVCAVKEAAAQQQESETTQRAEREVTRMVVIMVISFLVCWIPYASVAWYIFTHQGSTFGPIFMTVPSFFAKSSSIYNPMIYICMNKQFRNCMITTLFCGKNPFEGEE--EGASAVSS--VSPA >A2AA_HUMAN ----MGSLQPDAGNASWNGTEAPGGGARATPYSLQVTLTLVCLAGLLMLLTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKAWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIITVWVISAVISFPPLISIEKKGQ-P--PRCEIN--------DQKWYVISSCIGSFFAPCLIMILVYVRIYQIAKRRRVGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLTAVGCS--VPRTLFKFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------ >OPSD_TRIMA MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTLPAFFAKSASIYNPVIYIMMNKQFRNCMLTTICCGKNPFAEEEGATTVSKTETSQVAPA >AA3R_HUMAN -------------------------MPNNSTALSLANVTYITMEIFIGLCAIVGNVLVICVVKLNPSLQTTTFYFIVSLALADIAVGVLVMPLAIVVSLG--ITIHFYSCLFMTCLLLIFTHASIMSLLAIAVDRYLRVKLTVYKRVTTHRRIWLALGLCWLVSFLVGLTPMFGWN-MKE-Y-FLSCQFV-----SVMRMDYMVYFSFLTWIFIPLVVMCAIYLDIFYIIRNK--NSKETGAFYGREFKTAKSLFLVLFLFALSWLPLSIINCIIYFNGE--VPQLVLYMGILLSHANSMMNPIVYAYKIKKFKETYLLILKACVVCHPSDSLDTS------------ >ET1R_HUMAN ELSFLVTTHQPTNLVLPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRNDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPKT--HKTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRGSL-IALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYNESFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPWKNHDQNNHNTD >GALS_HUMAN 
------------MNVSGCPGAGNASQAGGGGGWHPEAVIVPLLFALIFLVGTVGNTLVLAVLLRGGQAVSTTNLFILNLGVADLCFILCCVPFQATIYTLDGWVFGSLLCKAVHFLIFLTMHASSFTLAAVSLDRYLAIRYPLSRELRTPRNALAAIGLIWGLSLLFSGPYLSYYR---AN--LTVCHPA----WSAPRRRAMDICTFVFSYLLPVLVLGLTYARTLRYLWRADP-VAAGSGARRAKRKVTRMILIVAALFCLCWMPHHALILCVWFGQFPRATYALRILSHLVSYANSCVNPIVYALVSKHFRKGFRTICAGLLGRAPGRASGRVHSGSVLERESSD >OPSV_ORYLA -------MGKYFYLYENISKVGPYDGPQYYLAPTWAFYLQAAFMGFVFFVGTPLNFVVLLATAKYKKLRVPLNYILVNITFAGFIFVTFSVSQVFLASVRGYYFFGQTLCALEAAVGAVAGLVTSWSLAVLSFERYLVICKPFGAFKFGSNHALAAVIFTWFMGV-VRCPPFFGWSRYIPEGLGCSCGPDWYTNCEEFSCASYSKFLLVTCFICPITIIIFSYSQLLGALRAVAAQQAESASTQKAEKEVSRMIIVMVASFVTCYGPYALTAQYYAYSQDENKDYRLVTIPAFFSKSSCVYNPLIYAFMNKQFNGCIMEMVFGKKMEE-----ASESTDS-------- >YWO1_CAEEL ----------------------------MNHVDYVAHVIVMPIVLSIGMINQCLNVCTLLHIRT------SIFLYLKASAIADILSIVAFIPFLFRHAKLIDELGMFYHAHLELPLINALISASALNIVAMTVDRYVSVCHPINETKPSRRRTMLIIVMIYFIALMIYFPSVFQKKLG-VTTIYTIVRNE---VEALQVFKFYLIVRECICRWGPVLLLVILNMCVVRGLRKIRTEQRQLRSPRDDRSRISVLLFVTSATFIICNIPASVISFFVRRVSGSLFWQIFRAIANLLQVTSYLYNFYLYALCSSEYRHAFLRLFGCRSSLSPTSTGDSPGKRCHQAVVLLG >CKR5_RAT --MDFQGSIPTYIYDIDYSMSAPC---QKVNVKQIAAQLLPPLYSLVFIFGFVGNMMVFLILISCKKLKSMTDIYLFNLAISDLLFLLTLPFWAHYAANE--WVFGNIMCKLFTGIYHIGYFGGIFFIILLTIDRYLAIVHAVAIKARTVNFGVITSVVTWVVAVFVSLPEIIFMRSQK-EG-HYTCSPHFLHIQYRFWKHFQTLKMVILSLILPLLVMVICYSGILNTL--------FRCRNEKKRHRAVRLIFAIMIVYFLFWTPYNIVLLLTTFQEYFNRLDQAMQVTETLGMTHCCLNPVIYAFVGEKFRNYLSVFFRKHIVKRFCKHCSIFV-------SSVY >5HT_HELVI TSSEWFDGSNCSWVDAVSWGCTSTNATSTDVTSFVLMAVTSVVLALIILATIVGNVFVIAAIIIERNLQNVANYLVASLAVADLMVACLVMPLGAVYEVSQGWILGPELCDMWTSSDVLCSSASILHLVAIATDRYWAVTDV-YIHIRNEKRIFTMIVLVWGAALVVSLAPQLGWKDPDT-Q--QKCLVS--------QDLAYQIFATMSTFYVPLAVILILYWKIFQTARRRPAPEKKESLEAKRERKAAKTLAIITGAFVFCWLPFFIMALVMPICQTCVISDYLASFFLWLGYFNSTLNPVIYTIFSPDFRQAFARILFGTHRRRRYKKF--------------- >NTR2_HUMAN 
-----METSSPRPPRPSSNPGLSLDARLGVDTRLWAKVLFTALYALIWALGAAGNALSVHVVLKARARAGRLRHHVLSLALAGLLLLLVGVPVELYSFVWFHWVFGDLGCRGYYFVHELCAYATVLSVAGLSAERCLAVCQPLARSLLTPRRTRWLVALSWAASLGLALPMAVIMGQKH--AASRVCTVL---VVSRTALQVFIQVNVLVSFVLPLALTAFLNGVTVSHLLALGQVRHKDVRRIRSLQRSVQVLRAIVVMYVICWLPYHARRLMYCYVPDDNFYHYFYMVTNTLFYVSSAVTPLLYNAVSSSFRKLFLEAVSSLCGEHHPMKRLPPMDTASGFGDPPE >NY4R_MOUSE HFLAPLFPGSLQGKNGTNPLDSPYNFSDGCQDSAELLAFIITTYSIETILGVLGNLCLIFVTTRQKEKSNVTNLLIANLAFSDFLMCLICQPLTVTYTIMDYWIFGEVLCKMLTFIQCMSVTVSILSLVLVALERHQLIINPT-GWKPSIFQAYLGIVVIWFISCFLSLPFLANSTLNDED--KVVCFVS---WSSDHHRLIYTTFLLLFQYCIPLAFILVCYIRIYQRLQRQKHVAHACSSRAGQMKRINSMLMTMVTAFAVLWLPLHVFNTLEDWYQEACHGNLIFLMCHLLAMASTCVNPFIYGFLNINFKKDIKALVLTCHCRSPQGES---TVHTDLSKGSMR >PE21_RAT --MSPYG-LNLSLVDEATTCVTPRVPNTSVVLPTGGNGTSPALPIFSMTLGAVSNVLALALLAQVAGSTATFLLFVASLLAIDLAGHVIPGALVLRLYTAG-RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTQPLHAARVSVARARLALALLAAMALAVALLPLVHVGHYELQYPGTWCFIS--LGPPGGWRQALLAGLFAGLGLAALLAALVCNTLSGLALLRDRRRSRGLRRVHAHDVEMVGQLVGIMVVSCICWSPLLVLVVLAIGGWNSNSLQRPLFLAVRLASWNQILDPWVYILLRQAMLRQLLRLLPLRVSAKGGPTELSLSSLRSSRHSGFS >OPSG_RAT DHYEDSTQASIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSTWMILVVIASVFTNGLVLAATMRFKKLRHPLNWILVNLAVADLAETIIASTISVVNQIYGYFVLGHPLCVIEGYIVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLATVGIVFSWVWAAVWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCIFPLSIIVLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVFAYCLCWGPYTFFACFATAHPGYAFHPLVASLPSYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVDDS-----SELVSSV--SSVSPA >P2UR_MOUSE ----MAADLEPWNSTINGTWEGDELGYKCRFNEDFKYVLLPVSYGVVCVLGLCLNVVALYIFLCRLKTWNASTTYMFHLAVSDSLYAASLPLLVYYYARGDHWPFSTVLCKLVRFLFYTNLYCSILFLTCISVHRCLGVLRPLSLRWGRARYARRVAAVVWVLVLACQAPVLYFVTTS-----RITCHDT-SARELFSHFVAYSSVMLGLLFAVPFSVILVCYVLMARRLLKP---YGTTGGLPRAKRKSVRTIALVLAVFALCFLPFHVTRTLYYSFRSLNAINMAYKITRPLASANSCLDPVLYFLAGQRLVRFARDAKPPTEPTPSPQARRKL--TVRKDLSVSS >A2AC_DIDMA 
-MDLQLTTNSTDSGDRGGSSNESLQRQPPSQYSPAEVAGLAAVVSFLIVFTIVGNVLVVIPVLTSRALKAPQNLFLVSLASADILVATLVMPFSLANELMNYWYFGKVWCDIYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRIKGIIVTVWLISAVISFPPLISLYR-------PQCELN--------DETWYILSSCIGSFFAPCIIMVLVYVRIYRVAKLRRRKLCRRKVTQAREKRFTFVLAVVMGVFVVCWFPFFFTYSLYGICREAQVPETLFKFFFWFGYCNSSLNPVIYTIFNQDFRRSFKHILFKKKKKTSLQ----------------- >GPR1_RAT EVSREMLFEELDNYSYALEYYSQEPDAEENVYPGIVHWISLLLYALAFVLGIPGNAIVIWFMGFKWK-KTVTTLWFLNLAIADFIFVLFLPLYISYVALSFHWPFGRWLCKLNSFIAQLNMFSSVFFLTVISLDRYIHLIHPGSHPHRTLKNSLLVVLFVWLLASLLGGPTLYFR--D--NN-RIICYNNQEYELTLMRHHVLTWVKFLFGYLLPLLTMSSCYLCLIFKT---------KKQNILISSKHLWMILSVVIAFMVCWTPFHLFSIWELSIHHNNVLQGGIPLSTGLAFLNSCLNPILYVIISKKFQARFRASVAEVLKRSLWEASCSGSAETKSLSLLET >CCR4_PAPAN IYTSDNYTEEMG-SGDYDSIKEPC---FREENAHFNRIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFASVSE-DD-RYICDRF---YPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH >D2D1_.ENLA ---MDPQNLSMYNDD------INNGTNGTAVDQKPHYNYYAMLLTLLVFVIVFGNVLVCIAVSREKALQTTTNYLIVSLAVADLLVATLVMPWAVYMEVVGEWRFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMISVVWVLSFAISCPLLFGLN---S----KVCII---------DNPAFVIYSSIVSFYVPFIVTLLVYVQIYIVLRKRRTSMSKKKLSQHKEKKATQMLAIVLGVFIICWLPFFIIHILNMHCNCN-IPQALYSAFTWLGYVNSAVNPIIYTTFNVEFRKAFIKILHC------------------------- >OLF8_RAT ---------------MNNKTVITHFLLLGLPIPPEHQQLFFALFLIMYLTTFLGNLLIVVLVQLDSHLHTPMYLFLSNLSFSDLCFSSVTMLKLLQNIQSQVPSISYAGCLTQIFFFLLFGYLGNFLLVAMAYDRYVAICFPLYTNIMSHKLCTCLLLVFWIMTSSHAMMHTLLAARL---SLNFFCDLFKLACSDTYVNELMIHIMGVIIIVIPFVLIVISYAKIISSI--------LKVPSTQSIHKVFSTCGSHLSVVSLFYGTIIGLYLCPSGDN---FSLKGSAMAMMYTVVTPMLNPFIYSLRNRDMKQALIRVTCSKKISLPW------------------ >GASR_MOUSE 
GSSLCRPGVSLLNSSSAGNLSCETPRIRGTGTRELELTIRITLYAVIFLMSVGGNVLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAVSYLMGVSVSVSTLNLAAIALERYSAICRPLARVWQTRSHAARVILATWLLSGLLMVPYPVYTVVQP---I-LQCMHL---WPSERVQQMWSVLLLILLFFIPGVVMAVAYGLISRELYLGTGPPRPNQAKLLAKKRVVRMLLVIVLLFFVCWLPVYSANTWRAFDGPGALAGAPISFIHLLSYTSACANPLVYCFMHRRFRQACLDTCARCCPRPPRARPRPLPSIASLSRLSYT >OPRK_CAVPO LPNGSAWLPGWAEPDGNGSAGPQDEQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNS-WPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPLKAKIINICIWLLSSSVGISAIILGGTKVDVD-IIECSLQFPDDDYSWWDLFMKICVFVFAFVIPVLIIIVCYTLMILRLKSVRLL-SGSREKDRNLRRITRLVLVVVAVFIICWTPIHIFILVEALGSTSTAALSSYYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPIKMRMERQSTSRVAYMRNVDGVNKP >OLFD_CANFA -------------MTEKNQTVVSEFVLLGLPIDPDQRDLFYALFLAMYVTTILGNLLIIVLIQLDSHLHTPMYLFLSNLSFSDLCFSSVTMPKLLQNMQSQVPSIPYAGCLTQMYFFLFFGDLESFLLVAMAYDRYVAICFPLYTTIMSPKLCFSLLVLSWVLTMFHAVLHTLLMARL---CPHFFCDMSKLACSDTQVNELVIFIMGGLILVIPFLLIITSYARIVSSI--------LKVPSAIGICKVFSTCGSHLSVVSLFYGTVIGLYLCPSANN---STVKETIMAMMYTVVTPMLNPFIYSLRNKDMKGALRRVICRKKITFSV------------------ >O2F1_HUMAN -------------MGTDNQTWVSEFILLGLSSDWDTRVSLFVLFLVMYVVTVLGNCLIVLLIRLDSRLHTPMYFFLTNLSLVDVSYATSVVPQLLAHFLAEHKAIPFQSCAAQLFFSLALGGIEFVLLAVMAYDRYVAVCDALYSAIMHGGLCARLAITSWVSGFISSPVQTAITFQL---PDHISCELLRLACVDTSSNEVTIMVSSIVLLMTPFCLVLLSYIQIISTI--------LKIQSREGRKKAFHTCASHLTVVALCYGVAIFTYIQPHSSP---SVLQEKLFSVFYAILTPMLNPMIYSLRNKEVKGAWQKLLWKFSGLTSKLAT--------------- >OPS._MOUSE ------------MLSEASDFNSSGSRSEGSVFSRTEHSVIAAYLIVAGITSILSNVVVLGIFIKYKELRTPTNAVIINLAFTDIGVSSIGYPMSAASDLHGSWKFGHAGCQIYAGLNIFFGMVSIGLLTVVAMDRYLTISCPDVGRRMTTNTYLSMILGAWINGLFWALMPIIGWASYA--PTGATCTIN--WRNNDTSFVSYTMMVIVVNFIVPLTVMFYCYYHVSRSLRLYAS-TAHLHRDWADQADVTKMSVIMILMFLLAWSPYSIVCLWACFGNPKKIPPSMAIIAPLFAKSSTFYNPCIYVAAHKKFRKAMLAMFKCQPHLAVPEPSTLPLAPVRI------ >A1AB_RAT 
HNTSAPAHWGELKDDNFTGPNQTSSNSTLPQLDVTRAISVGLVLGAFILFAIVGNILVILSVACNRHLRTPTNYFIVNLAIADLLLSFTVLPFSATLEVLGYWVLGRIFCDIWAAVDVLCCTASILSLCAISIDRYIGVRYSLYPTLVTRRKAILALLSVWVLSTVISIGPLLGWKEP--AP--KECGVT--------EEPFYALFSSLGSFYIPLAVILVMYCRVYIVAKRTHNPIAVKLFKFSREKKAAKTLGIVVGMFILCWLPFFIALPLGSLFSTLKPPDAVFKVVFWLGYFNSCLNPIIYPCSSKEFKRAFMRILGCQCRGGRRRRRRRRTYRPWTRGGSLE >D4DR_RAT ---MGNSSATGDGGLLAGRGP---ESLGTGTGLGGAGAAALVGGVLLIGMVLAGNSLVCVSVASERILQTPTNYFIVSLAAADLLLAVLVLPLFVYSEVQGGWLLSPRLCDTLMAMDVMLCTASIFNLCAISVDRFVAVTVPL-RYNQQGQCQLLLIAATWLLSAAVAAPVVCGLN---G-R--TVCCL---------EDRDYVVYSSICSFFLPCPLMLLLYWATFRGLRRWPAPRKRGAKITGRERKAMRVLPVVVGAFLMCWTPFFVVHITRALCPACFVSPRLVSAVTWLGYVNSALNPIIYTIFNAEFRSVFRKTLRLRC----------------------- >GP52_HUMAN SRWTEWRILNMSSGIVNASERHSCPLGFGHYSVVDVCIFETVVIVLLTFLIIAGNLTVIFAFHCAPLHHYTTSYFIQTMAYADLFVGVSCLVPTLSLLHYSTGVHESLTCRVFGYIISVLKSVSMACLACISVDRYLAITKPLYNQLVTPCRLRICIILIWIYSCLIFLPSFFGWG---PGY-IFEWCAT-----SWLTSAYFTGFIVCLLYAPAAFVVCFTYFHIFKICRQHFPSDSSRETGHSPDRRYAMVLFRITSVFYMLWLPYIIYFLLESSRV--LDNPTLSFLTTWLAVSNSFCNCVIYSLSNGVFRLGLRRLFETMCTSCMCVKDQEARANSCSI----- >ACTR_MOUSE -------------MKHIINSYEHTNDTARNNSDCPDVVLPEEIFFTISVIGILENLIVLLAVIKNKNLQSPMYFFICSLAISDMLGSLYKILENILIMFRNMGSFESTADDIIDCMFILSLLGSIFSLSVIAADRYITIFHALYHSIVTMRRTIITLTIIWMFCTGSGITMVIFFS----------------------HHHIPTVLTFTSLFPLMLVFILCLYIHMFLLARSH---ARKISTLPRTNMKGAMTLTILLGVFIFCWAPFVLHVLLMTFCPNNVCYMSLFQVNGMLIMCNAVIDPFIYAFRSPELRDAFKRMLFCNRY---------------------- >SSR4_HUMAN GEEGLGTAWPSAANASSAPAEAEEAVAGPGDARAAGMVAIQCIYALVCLVGLVGNALVIFVILRYAKMKTATNIYLLNLAVADELFMLSVPFVASSAALRH-WPFGSVLCRAVLSVDGLNMFTSVFCLTVLSVDRYVAVVHPLAATYRRPSVAKLINLGVWLASLLVTLPIAIFADTRPGGQ-AVACNLQ---WPHPAWSAVFVVYTFLLGFLLPVLAIGLCYLLIVGKMRAVALR-AGWQQRRRSEKKITRLVLMVVVVFVLCWMPFYVVQLLNLVVTS--LDATVNHVSLILSYANSCANPILYGFLSDNFRRSFQRVLCLRCCLLEGAGGAEETALKSKGGAGCM >OPSD_PIG 
MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFMLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWVMALACAAPPLVGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFSIPLVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWLPYASVAFYIFTHQGSDFGPIFMTIPAFFAKSASIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTTSKTETSQVAPA >GRHR_SHEEP --MANGDSPDQNENHCSAINSSILLTPGSLPTLTLSGKIRVTVTFFLFLLSTIFNTSFLLKLQNWTQKLSKMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGELLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITRPL-AVKSNSKLGQFMIGLAWLLSSIFAGPQLYIFGMIHEG--FSQCVTH--SFPQWWHQAFYNFFTFSCLFIIPLLIMLICNAKIIFTLTRVPHKNQSKNNIPQARLRTLKMTVAFATSFTVCWTPYYVLGIWYWFDPDMRVSDPVNHFFFLFAFLNPCFDPLIYGYFSL------------------------------------- >DOP1_DROME YDGTTLTSFYNESSWTNASEMDTIVGEEPEPLSLVSIVVVGIFLSVLIFLSVAGNILVCLAIYTDGSPRIG-NLFLASLAIADLFVASLVMTFAGVNDLLGYWIFGAQFCDTWVAFDVMCSTASILNLCAISMDRYIHIKDPLYGRWVTRRVAVITIAAIWLLAAFVSFVPISLGIHRPLI-KYPTCALD--------LTPTYAVVSSCISFYFPCVVMIGIYCRLYCYAQKHFKVHTHSSPYHVSDHKAAVTVGVIMGVFLICWVPFFCVNITAAFCKTC-IGGQTFKILTWLGYSNSAFNPIIYSIFNKEFRDAFKRILTMRNP---------------------- >AA1R_RABIT ----------------------------MPPSISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PETYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKAVVTPRRAAVAIAGCWILSLVVGLTPMFGWNNLRN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRRQK-ASGDPHKYYGKELKIAKSLALILFLFALSWLPLHILNCVTLFCPSCQKPSILVYTAIFLTHGNSAMNPIVYAFRIHKFRVTFLKIWNDHFRCRPAPAGDGDPND--------- >OPSI_ASTFA LNGFEGDNFYIPMSNRTGLVRDPFVYEQYYLAEPWQFKLLACYMFFLICLGLPINGFTLFVTAQHKKLQQPLNFILVNLAVAGMIMVCFGFTITISSAVNGYFYFGPTACAIEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSASHALGGIGFTWFMAMTCAAPPLVGWSRYIPEGLQCSCGPDYYTLNPKYNNESYVIYMFVVHFIVPVTVIFFTYGRLVCTVKSAAAAQQDSASTQKAEKEVTRMVILMVVGFLVAWTPYATVAAWIFFNKGAAFTAQFMAVPAFFSKSSALFNPIIYVLLNKQFRNCMLTTLFCGKNPLGDEESSTVTVSS----VSPA >ACTR_HUMAN 
-------------MKHIINSYENINNTARNNSDCPRVVLPEEIFFTISIVGVLENLIVLLAVFKNKNLQAPMYFFICSLAISDMLGSLYKILENILIILRNMGSFETTADDIIDSLFVLSLLGSIFSLSVIAADRYITIFHALYHSIVTMRRTVVVLTVIWTFCTGTGITMVIFFS----------------------HHHVPTVITFTSLFPLMLVFILCLYVHMFLLARSH---TRKISTLPRANMKGAITLTILLGVFIFCWAPFVLHVLLMTFCPSNACYMSLFQVNGMLIMCNAVIDPFIYAFRSPELRDAFKKMIFCSRYW--------------------- >CCR4_RAT IYTSDNYSEEVG-SGDYDSNKEPC---FRDENENFNRIFLPTIYFIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAMAD--WYFGKFLCKAVHIIYTVNLYSSVLILAFISLDRYLAIVHATSQSARKLLAEKAVYVGVWIPALLLTIPDIIFADVSQ-DG-RYICDRL---YPDSLWMVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYVGISIDSFILLESVVHKWISITEALAFFHCCLNPILYAFLGAKFKSSAQHALNSMSRGSSLKILSKG----------GH >OPSD_ANOCA MNGTEGQNFYVPMSNKTGVVRNPFEYPQYYLADPWQFSALAAYMFLLILLGFPINFLTLFVTIQHKKLRTPLNYILLNLAVANLFMVLMGFTTTMYTSMNGYFIFGTVGCNIEGFFATLGGEMGLWSLVVLAVERYVVICKPMSNFRFGETHALIGVSCTWIMALACAGPPLLGWSRYIPEGMQCSCGVDYYTPTPEVHNESFVIYMFLVHFVTPLTIIFFCYGRLVCTVKAAAAQQQESATTQKAEREVTRMVVIMVISFLVCWVPYASVAFYIFTHQGSDFGPVFMTIPAFFAKSSAIYNPVIYILMNKQFRNCMIMTLCCGKNPLGDEETSAGTSTVSTSQVSPA >ML1A_CHICK -------MRANGSELNGTVLPRDPPAEGSPRRPPWVTSTLATILIFTIVVDLLGNLLVILSVYRNKKLRNAGNIFVVSLAIADLVVAIYPYPLVLTSVFHNGWNLGYLHCQISGFLMGLSVIGSIFNITGIAINRYCYICHSLYDKLYSDKNSLCYVGLIWVLTVVAIVPNLFVGS-LQYDP-IYSCTFA------QSVSSAYTIAVVFFHFILPIAIVTYCYLRIWILVIQVRR-PDNNPRLKPHDFRNFVTMFVVFVLFAVCWAPLNFIGLAVAVDPETRIPEWLFVSSYYMAYFNSCLNAIIYGLLNQNFRREYKKIVVSFCTAKAFFQDSSNSKPSPLITNNNQ >EDG1_HUMAN VKAHRSSVSDYVNYDIIVRHYNYTGKLNISADKENSIKLTSVVFILICCFIILENIFVLLTIWKTKKFHRPMYYFIGNLALSDLLAG-VAYTANLLLSGATTYKLTPAQWFLREGSMFVALSASVFSLLAIAIERYITMLKMKLHNGSNNFRLFLLISACWVISLILGGLPIMGWNCI-S--ALSSCSTV------LPLYHKHYILFCTTVFTLLLLSIVILYCRIYSLVRTRLTFISKASRSSE-NVALLKTVIIVLSVFIACWAPLFILLLLDVGCKVKCDILFRAEYFLVLAVLNSGTNPIIYTLTNKEMRRAFIRIMSCCKCPSGDSAGKFKEFSRSKSDNSSH >OPS1_DROPS 
APS---NGSVVDKVTPDMAHLISPYWDQFPAMDPIWAKILTAYMIIIGMISWCGNGVVIYIFATTKSLRTPANLLVINLAISDFGIMITNTPMMGINLYFETWVLGPMMCDIYAGLGSAFGCSSIWSMCMISLDRYQVIVKGMAGRPMTIPLALGKIAYIWFMSTIWCCLAPVFWSRYVPEGNLTSCGID--YLERDWNPRSYLIFYSIFVYYIPLFLICYSYWFIIAAVSAHMNVRSSEDADKSAEGKLAKVALVTISLWFMAWTPYLVINCMGLFKFEG-LTPLNTIWGACFAKSAACYNPIVYGISHPKYRLALKEKCPCCVFGKVDDGK-SSTSEAESKA---- >PAFR_CAVPO ----------------------MELNSSSRVDSEFRYTLFPIVYSIIFVLGIIANGYVLWVFARLYPKLNEIKIFMVNLTVADLLFLITLPLWIVYYSNQGNWFLPKFLCNLAGCLFFINTYCSVAFLGVITYNRFQAVKYPITAQATTRKRGIALSLVIWVAIVAAASYFLVMDSTN-G--NITRCFEH---YEKGSKPVLIIHICIVLGFFIVFLLILFCNLVIIHTLLRQP---VKQQRNAEVRRRALWMVCTVLAVFVICFVPHHMVQLPWTLAELGQAINDAHQVTLCLLSTNCVLDPVIYCFLTKKFRKHLSEKLNIMRSSQKCSRVTTDPINHTPVNPIKN >OPSB_MOUSE -----MSGEDDFYLFQNISSVGPWDGPQYHLAPVWAFRLQAAFMGFVFFVGTPLNAIVLVATLHYKKLRQPLNYILVNVSLGGFLFCIFSVFTVFIASCHGYFLFGRHVCALEAFLGSVAGLVTGWSLAFLAFERYVVICKPFGSIRFNSKHALMVVLATWIIGIGVSIPPFFGWSRFIPEGLQCSCGPDWYTVGTKYRSEYYTWFLFIFCFIIPLSLICFSYSQLLRTLRAVAAQQQESATTQKAEREVSHMVVVMVGSFCLCYVPYAALAMYMVNNRNHGLDLRLVTIPAFFSKSSCVYNPIIYCFMNKQFRACILEMVCRKPMAD---ESDVSSTVSSSKVGPH- >HH2R_CANFA -------------------MISNGTGSSFCLDSPPCRITVSVVLTVLILITIAGNVVVCLAVGLNRRLRSLTNCFIVSLAITDLLLGLLVLPFSAFYQLSCRWSFGKVFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLYPVLITPVRVAVSLVLIWVISITLSFLSIHLGWN--RN-TIPKCKVQ--------VNLVYGLVDGLVTFYLPLLVMCITYYRIFKIARDQR--MGSWKAATIGEHKATVTLAAVMGAFIICWFPYFTVFVYRGLKGDD-INEAFEAVVLWLGYANSALNPILYATLNRDFRTAYQQLFRCRPASHNAQETSLRRNQSREPMR--- >C3.1_MOUSE --MSTSFPELDLENFEYDDSAEAC---YLGDIVAFGTIFLSVFYALVFTFGLVGNLLVVLALTNSRKPKSITDIYLLNLALSDLLFVATLPFWTHYLISHEG--LHNAMCKLTTAFFFIGFFGGIFFITVISIDRYLAIVLAASMNNRTVQHGVTISLGVWAAAILVASPQFMFTKRKD-----NECLGDYPEVLQEMWPVLRNSEVNILGFALPLLIMSFCYFRIIQTL---------FSCKNRKKARAVRLILLVVFAFFLFWTPYNIMIFLETLKFYNRDLRLALSVTETVAFSHCCLNPFIYAFAGEKFRRYLGHLYRKCLAVLCGHPVHTGRSRQDSILSS-F >ML1._SHEEP 
---------MGRTLAVPTPYGCIGCKLPQPDYPPALIVFMFCAMVITIVVDLIGNSMVILAVSKNKKLRNSGNVFVVSLSVADMLVAIYPYPLMLHAMAIGGWDLSKLQCQMVGFITGLSVVGSIFNIMAIAINRYCYICHSLYERIFSVRNTCIYLAVTWIMTVLAVLPNMYIGT-IEYDP-TYTCIFN------YVNNPAFAVTIVCIHFVLPLLIVGFCYVKIWTKVLAARD-AGQNPDNQLAEVRNFLTMFVIFLLFAVCWCPINALTVLVAVNPKEKIPNWVYLAAYFIAYFNSCLNAVIYGVLNENFRREYWTIFHAMRHPVLFLSGLLTAQAHTHARARAR >GRPR_MOUSE NCSHLNLDVDPFLSCNDTFNQSLSPPKMDNWFHPGFIYVIPAVYGLIIVIGLIGNITLIKIFCTVKSMRNVPNLFISSLALGDLLLLVTCAPVDASKYLADRWLFGRIGCKLIPFIQLTSVGVSVFTLTALSADRYKAIVRPMIQASHALMKICLKAALIWIVSMLLAIPEAVFSDLHPQT--FISCAPY---HSNELHPKIHSMASFLVFYVIPLAIISVYYYFIARNLIQSLPVNIHVKKQIESRKRLAKTVLVFVGLFAFCWLPNHVIYLYRSYHYSEMLHFVTSICARLLAFTNSCVNPFALYLLSKSFRKQFNTQLLCCQPGLMNR--SHSMTSFKSTNP-SA >OPS._HUMAN ------------MLRNNLGNSSDSKNEDGSVFSQTEHNIVATYLIMAGMISIISNIIVLGIFIKYKELRTPTNAIIINLAVTDIGVSSIGYPMSAASDLYGSWKFGYAGCQVYAGLNIFFGMASIGLLTVVAVDRYLTICLPDVGRRMTTNTYIGLILGAWINGLFWALMPIIGWASYA--PTGATCTIN--WRKNDRSFVSYTMTVIAINFIVPLTVMFYCYYHVTLSIKHHSD-TESLNRDWSDQIDVTKMSVIMICMFLVAWSPYSIVCLWASFGDPKKIPPPMAIIAPLFAKSSTFYNPCIYVVANKKFRRAMLAMFKCQTHQTMPVTSILPLASGRI------ >5HT1_DROME VSYQGITSSNLGDSNTTLVPLLEEFAAGEFVLPPLTSIFVSIVLLIVILGTVVGNVLVCIAVCMVRKLRRPCNYLLVSLALSDLCVALLVMPMALLYEVLEKWNFGPLLCDIWVSFDVLCCTASILNLCAISVDRYLAITKPLYGVKRTPRRMMLCVGIVWLAAACISLPPLLILGN--E-G--PICTVC--------QNFAYQIYATLGSFYIPLSVMLFVYYQIFRAARRILLGHKKLRFQLAKEKKASTTLGIIMSAFTVCWLPFFILALIRPFETM-HVPASLSSLFLWLGYANSLLNPIIYATLNRDFRKPFQEILYFRCSSLNTMMRENYPPSQRVMLGDER >CRH2_MOUSE ------MANVTLKPLCPLLEEMVQLPNHSNSSLRYIDHVSVLLHGLASLLGLVENGLILFVVGCRMR-QTVVTTWVLHLALSDLLAAASLPFFTYFLAVGHSWELGTTFCKLHSSVFFLNMFASGFLLSAISLDRCLQVVRPVAQNHRTVAVAHRVCLMLWALAVLNTIPYFVFRDTIPWNPRDTTCDY---------RQKALAVSKFLLAFMVPLAIIASSHVAVSLRL---------HHRGRQRTGRFVRLVAAIVVAFVLCWGPYHIFSLLEARAHS-QLASRGLPFVTSLAFFNSVVNPLLYVFTCPDMLYKLRRSLRAVLESVLVEDSDQSRRASSTATPAST >ML1C_CHICK 
-------------MERPGSNGSCSGCRLEGGPAARAASGLAAVLIVTIVVDVLGNALVILSVLRNKKLRNAGNIFVVSLSVADLVVAVYPYPLILSAIFHNGWTMGNIHCQISGFLMGLSVIGSIFNITAIAINRYCYICHSLYDKLFNLKNTCCYICLTWTLTVVAIVPNFFVGS-LQYDP-IYSCTFA------QTVSTSYTITVVVVHFIVPLSIVTFCYLRIWILVIQVHR-QDCKQKIRAADIRNFLTMFVVFVLFAVCWGPLNFIGLAVSINPSKHIPEWLFVLSYFMAYFNSCLNAVIYGLLNQNFRKEYKRILLMLRTPRLLFIDVSKSKPSPAVTNNNQ >5H1F_HUMAN -------------MDFLNSSD-QNLTSEELLNRMPSKILVSLTLSGLALMTTTINSLVIAAIIVTRKLHHPANYLICSLAVTDFLVAVLVMPFSIVYIVRESWIMGQVVCDIWLSVDITCCTCSILHLSAIALDRYRAITDAVYARKRTPKHAGIMITIVWIISVFISMPPLFWR----S-R--DECIIK-------HDHIVSTIYSTFGAFYIPLALILILYYKIYRAAKTLFKHWRRQKISGTRERKAATTLGLILGAFVICWLPFFVKELVVNVCDKCKISEEMSNFLAWLGYLNSLINPLIYTIFNEDFKKAFQKLVRCRC----------------------- >GRHR_PIG --MANSASPEQNQNHCSAINSSILLTQGNLPTLTLSPNIRVTVTFFLFLLSTAFNASFLLKLQKWTQKLSRMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGEFLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITRPL-AVKSNSRLGRFMIGLAWLLSSIFAGPQLYIFRMIHEG--FSQCVTH--SFPQWWHQAFYDFFTFSCLFIIPLLIMLICNAKIMFTLTRVPHNNQSKNNIPRARLRTLKMTVAFAASFIVCWTPYLVLGIWYWFDPEMRVSDPVNHFFFLFAFLNPCFDPLIYGYFSL------------------------------------- >MAS_MOUSE ------MDQSNMTSLAEEKAMNTSSRNASLGSSHPPIPIVHWVIMSISPLGFVENGILLWFLCFRMR-RNPFTVYITHLSMADISLLFCIFILSIDYALDYESSGHHYTIVTLSVTFLFGYNTGLYLLTAISVERCLSVLYPIYTSHRPKHQSAFVCALLCALSCLVTTMEYVMCI-DSHS--RSDC-----------RAVIIFIAILSFLVFTPLMLVSSSILVVKIRK----------NTWASHSSKLYIVIMVTIIIFLIFAMPMRVLYLLYYEYW--SAFGNLHNISLLFSTINSSANPFIYFFVGSSKKKRFRESLKVVLTRAFKDEMQPRTVSIETVV---- >UL33_HCMVA ----------------------------MTGPLFAIRTTEAVLNTFIIFVGGPLNAIVLITQLLTNRGYSTPTIYMTNLYSTNFLTLTVLPFIVLSNQ-WL-LPAGVASCKFLSVIYYSSCTVGFATVALIAADRYR-VLHKRTYARQSYRSTYMILLLTWLAGLIFSVPAAVYTTVVMN-NGHATCVLYVAEEVH-TVLLSWKVLLTMVWGAAPVIMMTWFYAFFYSTV---------QRTSQKQRSRTLTFVSVLLISFVALQTPYVSLMIFNSYATTATLRRTIGTLARVVPHLHCLINPILYALLGHDFLQRMRQCFRGQLLDRRAFLRSQQTNLAAGNNSQSV >CKR5_GORGO 
----MDYQVSSPTYDIDYYTSEPC---QKTNVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >PE21_MOUSE --MSPCG-LNLSLADEAATCATPRLPNTSVVLPTGDNGTSPALPIFSMTLGAVSNVLALALLAQVAGSAATFLLFVASLLAIDLAGHVIPGALVLRLYTAG-RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTQPLHAARVSVARARLALAVLAAMALAVALLPLVHVGRYELQYPGTWCFIS--LGPRGGWRQALLAGLFAGLGLAALLAALVCNTLSGLALLRDRRRSRGPRRVHAHDVEMVGQLVGIMVVSCICWSPLLVLVVLAIGGWNSNSLQRPLFLAVRLASWNQILDPWVYILLRQAMLRQLLRLLPLRVSAKGGPTELGLSSLRSSRHSGFS >A1AB_MESAU HNTSAPAQWGELKDANFTGPNQTSSNSTLPQLDVTRAISVGLVLGAFILFAIVGNILVILSVACNRHLRTPTNYFIVNLAIADLLLSFTVLPFSATLEVLGYWVLGRIFCDIWAAVDVLCCTASILSLCAISIDRYIGVRYSLYPTLVTRRKAILALLSVWVLSTVISIGPLLGWKEP--AP--KECGVT--------EEPFYALFSSLGSFYIPLAVILVMYCRVYIVAKRTHNPIAVKLFKFSREKKAAKTLGIVVGMFILCWLPFFIALPLGSLFSTLKPPDAVFKVVFWLGYFNSCLNPIIYPCSSKEFKRAFMRILGCQCRSGRRRRRRRRTYRPWTRGGSLE >O3A2_HUMAN ----------MEPEAGTNRTAVAEFILLGLVQTEEMQPVVFVLLLFAYLVTTGGNLSILAAVLVEPKLHAPMYFFLGNLSVLDVGCITVTVPAMLGRLLSHKSTISYDACLSQLFFFHLLAGMDCFLLTAMAYDRLLAICQPLYSTRMSQTVQRMLVAASLACAFTNALTHTVAMSTL---NNHFYCDLPQLSCSSTQLNELLLFAVGFIMAGTPLVLIITAYSHVAAAV--------LRIRSVEGRKKAFSTCGSHLTVVCLFFGRGIFNYMRLGSEE---ASDKDKGVGVFNTVINPMLNPLIYSLRNPDVQGALWQIFLGRRSLTA------------------- >OPSU_CARAU -------MDAWTYQFGNLSKISPFEGPQYHLAPKWAFYLQAAFMGFVFFVGTPLNAIVLFVTMKYKKLRQPLNYILVNISLGGFIFDTFSVSQVFFSALRGYYFFGYTLCAMEAAMGSIAGLVTGWSLAVLAFERYVVICKPFGSFKFGQSQALGAVALTWIIGIGCATPPFWGWSRYIPEGIGTACGPDWYTKNEEYNTESYTYFLLVSCFMMPIMIITFSYSQLLGALRAVAAQQAESASTQKAEKEVSRMVVVMVGSFVVCYGPYAITALYFSYAEDSNKDYRLVAIPSLFSKSSCVYNPLIYAFMNKQFNACIMETVFGKKIDE-----SSESSVSA------- >PAFR_RAT 
----------------------MEQNGSFRVDSEFRYTLFPIVYSVIFVLGVVANGYVLWVFATLYPKLNEIKIFMVNLTVADLLFLMTLPLWIVYYSNEGDWIVHKFLCNLAGCLFFINTYCSVAFLGVITYNRYQAVAYPITAQATTRKRGITLSLVIWISIAATASYFLATDSTN-G--NITRCFEH---YEPYSVPILVVHIFITSCFFLVFFLIFYCNMVIIHTLLTRP---VRQQRKPEVKRRALWMVCTVLAVFVICFVPHHVVQLPWTLAELGQAINDAHQITLCLLSTNCVLDPVIYCFLTKKFRKHLSEKFYSMRSSRKCSRATSDPANQTPVLPLKN >YYO1_CAEEL ISPNASNYLTYPFDGLCLQKFFYQLQTSLRRFTPYEEIIYTTVYIIISVAAVIGNGLVIMAVVRKKTMRTNRNVLILNLALSNLILAITNIPFLWLPSIDFEFPYSRFFCKFANVLPGSNIYCSTLTISVMAIDRYYSVKKLKASNRKQCFHAVLVSLAIWIVSFILSLPLLLYYETS--MQEVRQCRLVRLPDITQSIQLLMSILQVAFLYIVPLFVLSIFNVKLTRFLKTNDSHLKNNNKTNQRTNRTTSLLIAMAGSYAALWFPFTLITFLIDFELIINLVERIDQTCKMVSMLSICVNPFLYGFLNTNFRHEFSDIYYRYIRCETKSQPAGRIAHHRQDSVYND >OPS1_LIMPO PN-----ASVVDTMPKEMLYMIHEHWYAFPPMNPLWYSILGVAMIILGIICVLGNGMVIYLMMTTKSLRTPTNLLVVNLAFSDFCMMAFMMPTMTSNCFAETWILGPFMCEVYGMAGSLFGCASIWSMVMITLDRYNVIVRGMAAAPLTHKKATLLLLFVWIWSGGWTILPFFGWSRYVPEGNLTSCTVD--YLTKDWSSASYVVIYGLAVYFLPLITMIYCYFFIVHAVAEHNVAANADQQKQSAECRLAKVAMMTVGLWFMAWTPYLIISWAGVFSSGTRLTPLATIWGSVFAKANSCYNPIVYGISHPRYKAALYQRFPSLACGSGESGSDVKTMEEKPKIPEA- >OPSV_APIME YLPAGPPRLLGWNVPAEELIHIPEHWLVYPEPNPSLHYLLALLYILFTFLALLGNGLVIWIFCAAKSLRTPSNMFVVNLAICDFFMMIKTPIFIYNSFNTG-FALGNLGCQIFAVIGSLTGIGAAITNAAIAYDRYSTIARPL-DGKLSRGQVILFIVLIWTYTIPWALMPVMGVWGRFPEGFLTSCSFD--YLTDTNEIRIFVATIFTFSYCIPMILIIYYYSQIVSHVVNHNVDSNANTSSQSAEIRIAKAAITICFLYVLSWTPYGVMSMIGAFGNKALLTPGVTMIPACTCKAVACLDPYVYAISHPKYRLELQKRLPWLELQEKPISDSTSTPPASS------ >5H1B_DIDMA PPASGSLTSSQTNHSTFPNPNAPDLEPYQDSIALPWKVLLATFLGLITLGTTLSNAFVIATVSRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASILHLCVIALDRYWAITDAVYSAKRTPKRAAGMIIMVWVFSVSISMPPLFWR------E--ADCSVN-------TDHILYTVYSTVGAFYFPTLLLIALYGRIYVEARSRKVSLEKKKLMAARERKATRTLGIILGAFIVCWLPFFIISLALPICDDAWFHLAIFDFFNWLGYLNSLINPIIYTKSNDDFKQAFQKLMRFRRTS--------------------- >NY2R_PIG 
EEMKMEPSGPGHTTPRGELAPDSEPELKDSTKLIEVQIILILAYCSIILLGVVGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTITLTVIALDRHRCIVYHL-ESKISKRISFLIIGLAWGISALLASPLAIFREYS-FE--IVACTEKWPGEEKSIYGTVYSLSSLLILYVLPLGIISFSYARIWSKLKNHSPG-GVNDHYHQRRQKTTKMLVCVVVVFAVSWLPLHAFQLAVDIDSQVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCEQRLDAIHSE---AKKNLEATKNGG >B1AR_.ENLA WGPMEC--RNRS----GTPTTVPSPMHPLPELTHQWTMGMTMFMAAIILLIVMGNIMVIVAIGRNQRLQTLTNVFITSLACADLIMGLFVVPLGATLVVSGRWLYGSIFCEFWTSVDVLCVTASIETLCVISIDRYIAITSPFYQSLLTKGRAKGIVCSVWGISALVSFLPIMMHWWRDM-K--GCCDFV--------TNRAYAIASSIISFYFPLIIMIFVYIRVFKEAQKQHGRRILSKILVAKEQKALKTLGIIMGTFTLCWLPFFLANVVNVFYRNL-IPDKLFLFLNWLGYANSAFNPIIYCRSP-DFRKAFKRLLCCPKKADRHLHTTGEFVNSLDTN---A >D2DR_HUMAN ---MDPLNLSWYDDDLERQNWSRPFNGSDGKADRPHYNYYATLLTLLIAVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWKFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMISIVWVLSFTISCPLLFGLN---Q----NECII---------ANPAFVVYSSIVSFYVPFIVTLLVYIKIYIVLRRRRTSMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN-IPPVLYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFLKILHC------------------------- >FML2_HUMAN -----------METNFSIPLNETEEVLPEPAGHTVLWIFSLLVHGVTFVFGVLGNGLVIWVAGFRMT-RTVNTICYLNLALADFSFSAILPFRMVSVAMREKWPFASFLCKLVHVMIDINLFVSVYLITIIALDRCICVLHPAAQNHRTMSLAKRVMTGLWIFTIVLTLPNFIFWTTISAFWAVERLNVF------ITMAKVFLILHFIIGFTVPMSIITVCYGIIAAKI---------HRNHMIKSSRPLRVFAAVVASFFICWFPYELIGILMAVWLKEKIILVLINPTSSLAFFNSCLNPILYVFMGRNFQERLIRSLPTSLERALTEVPDSATSAS-------- >ACM5_MACMU -------MEGDSYHNATTVNGTPVYHQPLERHRLWEVISIAAVTAVVSLITIVGNVLVMISFKVNSQLKTVNNYYLLSLACADLIIGIFSMNLYTTYILMGRWALGSLACDLWLALDYVASNASVMNLLVISFDRYFSITRPLYRAKRTPKRAGVMIGLAWLISFILWAPAILCWQYLV-VP-LDECQIQ------FLSEPTITFGTAIAAFYIPVSVMTILYCRIYRETEKRNPSTKRKRMVLVKERKAAQTLSAILLAFIITWTPYNIMVLVSTFCDKC-VPVTLWHLGYWLCYVNSTVNPICYALCNRTFRKTFKMLLLCRWKKKKVEEKLYW------------ >PE22_CANFA 
----------------MGSISNNSGSEDCESREWLPSGESPAISSAMFSAGVLGNLIALALLARRWRSISLFHVLVTELVFTDLLGTCLISPVVLASYARNQLEPERRACTYFAFAMTFFSLATMLMLFAMALERYLSIGRPYYQRHVTRRGGLAVLPTIYTVSLLFCSLPLLGYGQYVQYCPGTWCFIR--------HGRTAYLQLYATLLLLLIVAVLACNFSVILNLIRMDGSRRGERVSVAEETDHLILLAIMTITFAICSLPFTIFAYMNETSS---RREKWDLQALRFLSINSIIDPWVFAIFRPPVLRLMRSVLCCRVSLRAQDATQTSSRLTFVDTS--- >5H1B_RABIT AAGSQIAVPQANLSAAHSHNCSAEGYIYQDSIALPWKVLLVLLLALFTLATTLSNAFVVATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDLWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPKRAAIMIRLVWVFSICISLPPFFWR----E-E--SECLVN-------TDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGVFIVCWLPFFIISLVMPICKDAWFHQAIFDFFTWLGYVNSLINPIIYTMSNEDFKQAFHKLIRFKCTS--------------------- >GLHR_ANTEL HSNHTPNGTQFHQCSKIPVQCVPKSDAFHPCEDIMGYVWLTVVSFMVGAVALVANLVVALVLLTSQRRLNVTRFLMCNLAFADFILGLYIFILTSVSAVTRGWQNG-AGCKILGFLAVFSSELSLFTLVMMTIERFYAIVHAMMNARLSFRKTVRFMIGGWIFALVMAVVPLTGVSGY----KVAICLPF-----VSDATSTAYVAFLLLVNGASFISVMYLYSRMLYVVVSG---GDMEGAPKRNDSKVAKRMAILVFTDMLCWAPIAFFGLLAAFGQTLLTVTQSKILLVFFFPINSICNPFLYAFFTKAFKRELFTALSRIGFCKFRALKYNGSRSRRHHSTVNA >MSHR_OVIMO PALGSQRRLLGSLNCTPPATLPLTLAPNRTGPQCLEVSIPNGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFVCCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICSSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSVLSITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACRHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >P2YR_CHICK PELLAG-----------GWAAGNATTKCSLTKTGFQFYYLPTVYILVFITGFLGNSVAIWMFVFHMRPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISVHRYTGVVHPLSLGRLKKKNAVYVSSLVWALVVAVIAPILFYSGTG----KTITCYDT-TADEYLRSYFVYSMCTTVFMFCIPFIVILGCYGLIVKALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYLPFHVMKTLNLRARLDDKVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKSSRRSEPNVQSKSLTEYKQNGDTSL >ADMR_MOUSE 
LEPDNDFRDIHNWTELLHLFNQTFTDCHIEFNENTKHVVLFVFYLAIFVVGLVENVLVICVNCRRSGRVGMLNLYILNMAIADLGIILSLPVWMLEVMLYETWLWGSFSCRFIHYFYLVNMYSSIFFLTCLSIDRYVTLTNTSSWQRHQHRIRRAVCAGVWVLSAIIPLPEVVHIQLLD--E--PMCLFLAPFETYSAWALAVALSATILGFLLPFLLIAVFNILTACRL---------RRQRQTESRRHCLLMWAYIVVFAICWLPYQVTMLLLTLHGTHNLLYFFYEIIDCFSMLHCVANPILYNFLSPSFRGRLLSLVVRYLPKEQARAAGGRQHSIIITKEGSL >FSHR_BOVIN FDVMYSEFDYDLCNEVVDVTCSPEPDAFNPCEDIMGDDILRVLIWFISILAITGNILVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDVHTKTWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLECKVQLRHAASIMLVGWIFAFAVALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNP------NITSSSSDTKIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKSKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEVQAQTYRSNFHPRNGHCPPA >B3AR_RAT MAPWPHKNGSLAFWSDAPTLDPSAAN---TSGLPGVPWAAALAGALLALATVGGNLLVITAIARTPRLQTITNVFVTSLATADLVVGLLVMPPGATLALTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGTLVTKRRARAAVVLVWIVSATVSFAPIMSQWWRVQ-E--RCCSFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVAKRQGVPRRPARLLPLGEHRALRTLGLIMGIFSLCWLPFFLANVLRALVGPS-VPSGVFIALNWLGYANSAFNPLIYCRSP-DFRDAFRRLLCSYGGRGPEEPRVVTSRQNSPLNRFDG >D3DR_CERAE --------MAPLSQLSGHLNYTCGVENSTGASQARPHAYYALSYCALILAIVFGNGLVCMAVLKERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGWNFSRVCCDVFVTLDVMMCTASILNLCAISIDRYTAVVMPVGTGQSSCRRVTLMITAVWVLAFAVSCPLLFGFN---D-P--TVCSI---------SNPDFVIYSSVVSFYLPFGVTVLVYARIYVVLKQRTSLPLQPRGVPLREKKATQMVAIVLGAFIVCWLPFFLTHVLNTHCQTCHVSPELYSATTWLGYVNSALNPVIYTTFNIEFRKAFLKILSC------------------------- >NK2R_MOUSE ----MGAHASVTDTNILSGLESNATGVTAFSMPGWQLALWATAYLALVLVAVTGNATVIWIILAHERMRTVTNYFIINLALADLCMAAFNATFNFIYASHNIWYFGSTFCYFQNLFPVTAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPSTKAVIAVIWLVALALASPQCFYST---GA---TKCVVAWPNDNGGKMLLLYHLVVFVLIYFLPLVVMFAAYSVIGLTLWKRPRHHGANLRHLQAKKKFVKAMVLVVVTFAICWLPYHLYFILGTFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWGTPTEE----HTPSISRRVNRC >BONZ_MACNE 
------MAEHDYHEDYGLNSFNDSSQEEHQDFLQFRKVFLPCMYLVVFVCGLVGNSLVLVISIFYHKLQSLTDVFLVNLPLADLVFVCTLPFWAYAGIHE--WIFGQVMCKTLLGVYTINFYTSMLILTCITVDRFIVVVKATNQQAKRMTWGKVICLLIWVISLLVSLPQIIYGNVFNLD--KLICGYH-----DKEISTVVLATQMTLGFFLPLLAMIVCYSVIIKTL---------LHAGGFQKHRSLKIIFLVMAVFLLTQTPFNLVKLIRSTHWEYTSFHYTIIVTEAIAYLRACLNPVLYAFVSLKFRKNFWKLVKDIGCLPYLGVSHQWKTFSASHNVEAT >NK1R_MOUSE -----MDNVLPVDSDLFPNTSTNTSESNQFVQPTWQIVLWAAAYTVIVVTSVVGNVVVIWIILAHKRMRTVTNYFLVNLAFAEACMAAFNTVVNFTYAVHNVWYYGLFYCKFHNFFPIAALFASIYSMTAVAFDRYMAIIHPL-QPRLSATATKVVIFVIWVLALLLAFPQGYYST---SR---VVCMIEWPEHPNRTYEKAYHICVTVLIYFLPLLVIGYAYTVVGITLE---IPSDRYHEQVSAKRKVVKMMIVVVCTFAICWLPFHIFFLLPYINPDLKFIQQVYLASMWLAMSSTMYNPIIYCCLNDRFRLGFKHAFRCCPFISAGDY----STRYLQTQSSVY >O.2R_HUMAN ASELNETQEPFLNPTDYDDEEFLRYLWREYLHPKEYEWVLIAGYIIVFVVALIGNVLVCVAVWKNHHMRTVTNYFIVNLSLADVLVTITCLPATLVVDITETWFFGQSLCKVIPYLQTVSVSVSVLTLSCIALDRWYAICHPL-MFKSTAKRARNSIVIIWIVSCIIMIPQAIVMECSTKTTLFTVCDER---WGGEIYPKMYHICFFLVTYMAPLCLMVLAYLQIFRKLWCRKSRVAAEIKQIRARRKTARMLMVVLLVFAICYLPISILNVLKRVFGMFETVYAWFTFSHWLVYANSAANPIIYNFLSGKFREEFKAAFSCCCLGVHHRQEDRLESRKSLTTQISN >IL8A_RABIT WTWFEDEFANATGMPPVEKDYSPC----LVVTQTLNKYVVVVIYALVFLLSLLGNSLVMLVILYSRSNRSVTDVYLLNLAMADLLFALTMPIWAVSKEKG--WIFGTPLCKVVSLVKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFICLGIWALSLILSLPFFLFRQVFS-NS-SPVCYED-LGHNTAKWRMVLRILPHTFGFILPLLVMLFCYGFTLRTL---------FQAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTHNDIDRALDATEILGFLHSCLNPIIYAFIGQNFRNGFLKMLAARGLISKEFLTRHR------------ >AG2S_MOUSE ------MILNSSIEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYQWPFGNHLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLMAGLASLPAVIHRNVYF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFVFPFVIILTSYTLIWKALKKA----YKIQKNTPRNDDIFRIIMAIVLFFFFSWVPHQIFSFLDVLIQLGDVVDTAMPITICIAYFNNCLNPLFYGFLGKKFKRYFLQLLKYIPPKARSHAGLSTRPSD-------N >OPSD_ANGAN 
MNGTEGPNFYIPMSNITGVVRSPFEYPQYYLAEPWAYTILAAYMFTLILLGFPVNFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTVYTSMHGYFVFGETGCNLEGYFATLGGEISLWSLVVLAIERWVVVCKPMSNFRFGENHAIMGLAFTWIMANSCAMPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFIVHFSVPLTIISFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVVIMVIAFLVCWVPYASVAWYIFTHQGSTFGPVFMTVPSFFAKSSAIYNPLIYICLNSQFRNCMITTLFCGKNPFQEEEGASTASSVSS--VSPA >O3A3_HUMAN ----MSLQKLMEPEAGTNRTAVAEFILLGLVQTEEMQPVVFVLLLFAYLVTIGGNLSILAAVLVEPKLHAPMYFFLGNLSVLDVGCITVTVPAMLGRLLSHKSTISYDACLSQLFFFHLLAGMDCFLLTAMAYDRLLAICQPLYSTRMSQTVQRMLVAASWACAFTNALTHTVAMSTL---NNHFYCDLPQLSCSSTQLNELLLFVAAAFMAVAPLVFISVSYAHVVAAV--------LQIRSAEGRKKAFSTCGSHLTVVGIFYGTGVFSYMRLGSVE---SSDKDKGVGVFMTVINPMLNPLIYSLRNTDVQGALCQLLVGERSLT-------------------- >GPRC_MOUSE NLSGLPRDCIDAGAPENISAAVPSQGSVAESEPELVVNPWDIVLCSSGTLICCENAVVVLIIFHSPSLRAPMFLLIGSLALADLLAG-LGLIINFVFA-Y--LLQSEATKLVTIGLIVASFSASVCSLLAITVDRYLSLYYALYHSERTVTFTYVMLVMLWGTSICLGLLPVMGWNCL-R--DESTCSVV-------RPLTKNNAAILSISFLFMFALMLQLYIQICKIVMRHIALHFLATSHYVTTRKGVSTLALILGTFAACWMPFTLYSLIADY----TYPSIYTYATLLPATYNSIINPVIYAFRNQEIQKALCLICCGCIPSSLSQRARSP------------ >D1DR_CARAU -------------MAVLDLNLTTVIDSGFMESDRSVRVLTGCFLSVLILSTLLGNTLVCAAVTKFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVTEVAGFWPFG-AFCDIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPRVAFVMISGAWTLSVLISFIPVQLKWHKAVN-PTDNCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTQIYRIAQKQGSNESSFKLSFKRETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCKRTCISPTTFDVFVWFGWANSSLNPIIYAFNA-DFRRAFAILLGCQRLCPGSI-SMET------------ >OPS1_SCHGR GG-F-ANQTVVDKVPPEMLYLVDPHWYQFPPMNPLWHGLLGFVIGVLGVISVIGNGMVIYIFSTTKSLRTPSNLLVVNLAFSDFLMMFTMSAPMGINCYYETWVLGPFMCELYALFGSLFGCGSIWTMTMIALDRYNVIVKGLSAKPMTNKTAMLRILFIWAFSVAWTIMPLFGWNRYVPEGNMTACGTD--YLTKDWVSRSYILVYSFFVYLLPLGTIIYSYFFILQAVSAHMNVRSAEASQTSAECKLAKVALMTISLWFFGWTPYLIINFTGIFETMK-ISPLLTIWGSLFAKANAVFNPIVYGISHPKYRAALEKKFPSLACASSSDDNTSVSDEKSEKSASA- >MSHR_CAPCA 
PVLGSQRRLLGSLNCTPPATFPLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAVSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDMLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >O3A4_HUMAN -----------MDLGNSGNDSVTKFVLLGLTETAALQPILFVIFLLAYVTTIGGTLSILAAILMETKLHSPMYFFLGNLSLPDVGCVSVTVPAMLSHFISNDRSIPYKACLSELFFFHLLAGADCFLLTIMAYDRYLAICQSLYSSRMSWGIQQALVGMSWVFSFTNALTQTVALSPL---NNHFYCDLPQLSCASVHLNGQLLFVAAAFMGVAPLVLITVSYAHVAAAV--------LRIRSAEGKKKAFSTCSSHLTVVGIFYGTGVFSYTRLGSVE---SSDKDKGIGILNTVISPMLNPLIYWTSLLDVGCISHCSSDAGVSPGPPVQSSLCLSPPPGWGGLSP >P70658 IYTSDNYSEEVG-SGDYDSNKEPC---FRDENVHFNRIFLPTIYFIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAMAD--WYFGKFLCKAVHIIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKAVYVGVWIPALLLTIPDFIFADVSQGDD-RYICDRL---YPDSLWMVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYVGISIDSFILLGSIVHKWISITEALAFFHCCLNPILYAFLGAKFKSSAQHALNSMSRGSSLKILSKG----------GH >OPSD_AMBTI MNGTEGPNFYVPFSNKSGVVRSPFEYPQYYLAEPWQYSVLAAYMFLLILLGFPVNFLTLYVTIQHKKLRTPLNYILLNLAFANHFMVFGGFPVTMYSSMHGYFVFGQTGCYIEGFFATMGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVMMTWIMALACAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFLVHFTIPLMIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWVPYASVAFYIFSNQGTDFGPIFMTVPAFFAKSSAIYNPVIYIVLNKQFRNCMITTICCGKNPFGDDETTSAASSVSSSQVSPA >V2R_PIG -MLRATTSAVPRALSWPAAPGNGSEREPLDDRDPLLARVELALLSTVFVAVALSNGLVLGALVRRGRRWAPMHVFIGHLCLADLAVALFQVLPQLAWDATYRFRGPDALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMAYRHGGGARWNRPVLVAWAFSLLLSLPQLFIFAQRDSG--VLDCWAS---FAEPWGLRAYVTWIALMVFVAPALGIAACQVLIFREIHTSGRRPREGARVSAAMAKTARMTLVIVAVYVLCWAPFFLVQLWSVWDPKA-REGPPFVLLMLLASLNSCTNPWIYASFSSSISSELRSLLCCPRRRTPPSLRPQESFSARDTSS--- >5H7_HUMAN 
GSWAPHLLS---EVTASPAPTNASGCGEQINYGRVEKVVIGSILTLITLLTIAGNCLVVISVCFVKKLRQPSNYLIVSLALADLSVAVAVMPFVSVTDLIGGWIFGHFFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLYPVRQNGKCMAKMILSVWLLSASITLPPLFGWAQ--N-D--KVCLIS--------QDFGYTIYSTAVAFYIPMSVMLFMYYQIYKAARKSSRLERKNISIFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTCIPLWVERTFLWLGYANSLINPFIYAFFNRDLRTTYRSLLQCQYRNINRKLSAAGAERPERPEFVLR >NTR2_RAT -----METSSPWPPRPSPSAGLSLEARLGVDTRLWAKVLFTALYSLIFAFGTAGNALSVHVVLKARARPGRLRYHVLSLALSALLLLLVSMPMELYNFVWSHWVFGDLGCRGYYFVRELCAYATVLSVASLSAERCLAVCQPLARRLLTPRRTRRLLSLVWVASLGLALPMAVIMGQKH--AASRVCTVL---VSRATLQVFIQVN-VLVSFALPLALTAFLNGITVNHLMALVQARHKDASQIRSLQHSAQVLRAIVAVYVICWLPYHARRLMYCYIPDDDFYHYFYMVTNTLFYVSSAVTPILYNAVSSSFRKLFLESLGSLCGEQHSLVPLPQSTYSFRLWGSPR >OPSD_RAJER MNGTEGENFYVPMSNKTGVVRSPFDYPQYYLGEPWMFSALAAYMFFLILTGLPVNFLTLFVTIQHKKLRQPLNYILLNLAVSDLFMVFGGFTTTIITSMNGYFIFGPAGCNFEGFFATLGGEVGLWCLVVLAIERYMVVCKPMANFRFGSQHAIIGVVFTWIMALSCAGPPLVGWSRYIPEGLQCSCGVDYYTMKPEVNNESFVIYMFVVHFTIPLIVIFFCYGRLVCTVKEAAAQQQESESTQRAEREVTRMVIIMVVAFLICWVPYASVAFYIFINQGCDFTPFFMTVPAFFAKSSAVYNPLIYILMNKQFRNCMITTICLGKNPFEEEESTSAASSVSSSQVAPA >OPSD_CAMAB GGGF-GNQTVVDKVPPEMLHMVDAHWYQFPPMNPLWHALLGFVIGVLGVISVIGNGMVIYIFTTTKSLRTPSNLLVVNLAISDFLMMLCMSPAMVINCYYETWVLGPLFCELYGLAGSLFGCASIWTMTMIAFDRYNVIVKGLSAKPMTINGALIRILTIWFFTLAWTIAPMFGWNRYVPEGNMTACGTD--YLTKDLFSRSYILIYSIFVYFTPLFLIIYSYFFIIQAVAAHMNVRSAENQSTSAECKLAKVALMTISLWFMAWTPYLVINYSGIFETTK-ISPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFQKFPSLACTTEPTG-ADTTEGNEKPAA--- >SSR4_MOUSE VED----TTWTPGINASWAPEQEEDAMGSDGTGTAGMVTIQCIYALVCLVGLVGNALVIFVILRYAKMKTATNIYLLNLAVADELFMLSVPFVRSAAALRH-WPFGAVLCRAVLSVDGLNMFTSVFCLTVLSVDRYVAVVHPLTATYRRPSVAKLINLGVWLASLLVTLPIAVFADTRPGGE-AVACNLH---WPHPAWSAVFVIYTFLLGFLPPVLAIGLCYLLIVGKMRAVALR-GGWQQRRRSEKKITRLVLMVVTVFVLCWMPFYVVQLLNLFVTS--LDATVNHVSLILSYANSCANPILYGFLSDNFRRSFQRVLCLRCCLLETTGGAEETALKSRGGAGCI >NY1R_.ENLA 
NFSTYFENLSVPNNISG---NITFPISEDCALPLPMIFTLALAYGAVIILGLSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLATIMCLPFTLIYTLMDHWIFGEVMCKLNEYIQCVSVTVSIFSLVLIAIERHQLIINPR-GWRPNNRHACFGITVIWGFAMACSTPLMMYSVLTDIG--KYVCLED---FPEDKFRLSYTTLLFILQYLGPLCFIFVCYTKIFLRLKRRNMMIRDNKYRSSETKRINIMLLSIVVGFALCWLPFFIFNLVFDWNHEACNHNLLFLICHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSREDDY---TMHTDVSKTSLK >LSHR_PIG ESELSDWDYDYGFCSPKTLQCAPEPDAFNPCEDIMGYDFLRVLIWLINILAIMGNVTVLFVLLTSHYKLTVPRFLMCNLSFADFCMGLYLLLIASVDAQTKGWQTG-NGCSVAGFFTVFASELSVYTLTVITLERWHTITYAILDQKLRLRHAIPIMLGGWLFSTLIAMLPLVGVSSY----KVSICLPM-----VETTLSQVYILTILILNVVAFIIICACYIKIYFAVQNP------ELMATNKDTKIAKKMAVLIFTDFTCMAPISFFAISAALKVPLITVTNSKVLLVLFYPVNSCANPFLYAIFTKAFRRDFFLLLSKSGCCKHQAELYRRK---NGFTGSNK >GPR4_HUMAN --------------------MGNHTWEGCHVDSRVDHLFPPSLYIFVIGVGLPTNCLALWAAYRQVQQRNELGVYLMNLSIADLLYICTLPLWVDYFLHHDNWIHGPGSCKLFGFIFYTNIYISIAFLCCISVDRYLAVAHPLFARLRRVKTAVAVSSVVWATELGANSAPLFHDEL--NH---TFCFEKPMEGWVAWMVAWMNLYRVFVGFLFPWALMLLSYRGILRAVRG------SVSTERQEKAKIKRLALSLIAIVLVCFAPYHVLLLSRSAIYLGERVFSAYHSSLAFTSLNCVADPILYCLVNEGARSDVAKALHNLLRFLASDKPQEMETPLTSKRNSTA >OPSD_POMMI MNGTEGPFFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFFLILTGFPINFLTLYVTLEHKKLRTALNLILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNVEGFFATLGGEIALWSLVVLAVERWVVVCKPISNFRFTENHAIMGVAFSWIMAATCAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFIVHFLAPLIVIFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIIMVIGFLTSWLPYASVAWYIFTHQGTEFGPLFMTIPAFFAKSSALYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA >CB1R_POEGU EFYNKSLSTFKDNEENIQCGENFMDMECFMILNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFVDFHVFHR-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRPKAVVAFCVMWTIAIVIAVLPLLGWNCK-K--LNSVCSDI------FPLIDETYLMFWIGVTSILLLFIVYAYMYILWKAHSHTEDQITRPDQTRMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTIFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPTCEGTAQPLDNSMEANN-AGNVHRAA >ADMR_HUMAN 
AVPTSDLGEIHNWTELLDLFNHTLSECHVELSQSTKRVVLFALYLAMFVVGLVENLLVICVNWRGSGRAGLMNLYILNMAIADLGIVLSLPVWMLEVTLDYTWLWGSFSCRFTHYFYFVNMYSSIFFLVCLSVDRYVTLTSASSWQRYQHRVRRAMCAGIWVLSAIIPLPEVVHIQLVE--E--PMCLFMAPFETYSTWALAVALSTTILGFLLPFPLITVFNVLTACRL---------RQPGQPKSRRHCLLLCAYVAVFVMCWLPYHVTLLLLTLHGTHHLLYFFYDVIDCFSMLHCVINPILYNFLSPHFRGRLLNAVVHYLPKDQTKAGTCAQHSIIITKGDSQ >GPR6_HUMAN G-PPAAAALGAGGGANGSLELSSQLSAGPPGLLLPAVNPWDVLLCVSGTVIAGENALVVALIASTPALRTPMFVLVGSLATADLLAG-CGLILHFVFQ-Y--LVPSETVSLLTVGFLVASFAASVSSLLAITVDRYLSLYNALYYSRRTLLGVHLLLAATWTVSLGLGLLPVLGWNCL-A--ERAACSVV-------RPLARSHVALLSAAFFMVFGIMLHLYVRICQVVWRHIALHCLAPPHLAATRKGVGTLAVVLGTFGASWLPFAIYCVVGSH----EDPAVYTYATLLPATYNSMINPIIYAFRNQEIQRALWLLLCGCFQSKVPFRSRSP------------ >5H1D_MOUSE SLPNQSLEGLPQEASNRSLNAT---GAWDPEVLQALRISLVVVLSVITLATVLSNAFVLTTILLTKKLHTPANYLIGSLATTDLLVSILVMPISIAYTTTRTWNFGQILCDIWVSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAAAMIAAVWIISICISIPPLFWR----H-E--SDCLVN-------TSQISYTIYSTCGAFYIPSILLIILYGRIYVAARSRKLALERKRISAARERKATKTLGIILGAFIICWLPFFVVSLVLPICRDSWIHPALFDFFTWLGYLNSLINPVIYTVFNEDFRQAFQKVVHFRKAS--------------------- >OPSD_ORYLA MNGTEGPYFNVPMVNTTGIVRSPYEYPQYYLVSPAAYAALGAYMFFLILVGFPINFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIGLWSLVVLAIERWVVVCKPISNFRFGENHAIMGLVFTWIMAASCAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVVYMFVCHFLIPLIVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIGFLVCWLPYASVAWYIFTNQGSEFGPLFMTIPAFFAKSSSIYNPAIYICMNKQFRNCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA >YDBM_CAEEL EGAGEDVDHHSLFCPKKLVGNLKGFIRNQYHQHETIQILKGSALFLLVLWTIFANSLVFIVLYKNPRLQTVPNLLVGNLAFSDLALGLIVLPLSSVYAIAGEWVFPDALCEVFVSADILCSTASIWNLSIVGLDRYWAITSPVYMSKRNKRTAGIMILSVWISSALISLAPLLGWKQ--Q-TTVRQCTFL--------DLPSYTVYSATGSFFIPTLLMFFVYFKIYQAFAKHYRKRKPKAISAAKERRGVKVLGIILGCFTVCWAPFFTMYVLVQFCKDCSPNAHIEMFITWLGYSNSAMNPIIYTVFNRDYQIALKRLFTSEKKPSSTSRV--------------- >THRR_MOUSE 
LLEGRAVYLNISLPPHTPPPPFISEDASGYLTSPWLTLFMPSVYTIVFIVSLPLNVLAIAVFVLRMKVKKPAVVYMLHLAMADVLFVSVLPFKISYYFSGTDWQFGSGMCRFATAAFYGNMYASIMLMTVISIDRFLAVVYPISLSWRTLGRANFTCVVIWVMAIMGVVPLLLKEQTTR--N--TTCHDVLSENLMQGFYSYYFSAFSAIFFLVPLIVSTVCYTSIIRCL------SSSAVANRSKKSRALFLSAAVFCIFIVCFGPTNVLLIVHYLFLSDEAAYFAYLLCVCVSSVSCCIDPLIYYYASSECQRHLYSILCCKESSDPNSCNSTGDTCS-------- >H218_RAT ----MGGLYSEYLNPEKVQEHYNYTKETLDMQETPSRKVASAFIIILCCAIVVENLLVLIAVARNSKFHSAMYLFLGNLAASDLLAG-VAFVANTLLSGPVTLSLTPLQWFAREGSAFITLSASVFSLLAIAIERQVAIAKVKLYGSDKSCRMLMLIGASWLISLILGGLPILGWNCL-D--HLEACSTV------LPLYAKHYVLCVVTIFSVILLAIVALYVRIYFVVRSS-----HADVAGPQTLALLKTVTIVLGVFIICWLPAFSILLLDSTCPVRCPVLYKAHYFFAFATLNSLLNPVIYTWRSRDLRREVLRPLLCWRQGKGATGRRGGPLRSSSSLERGL >CKR5_MOUSE --MDFQGSVPTYIYDIDYGMSAPC---QKINVKQIAAQLLPPLYSLVFIFGFVGNMMVFLILISCKKLKSVTDIYLLNLAISDLLFLLTLPFWAHYAANE--WIFGNIMCKVFTGVYHIGYFGGIFFIILLTIDRYLAIVHAVALKVRTVNFGVITSVVTWVVAVFASLPEIIFTRSQK-EG-HYTCSPHFPHTQYHFWKSFQTLKMVILSLILPLLVMIICYSGILHTL--------FRCRNEKKRHRAVRLIFAIMIVYFLFWTPYNIVLLLTTFQEFFNRLDQAMQATETLGMTHCCLNPVIYAFVGEKFRSYLSVFFRKHIVKRFCKRCSIFV-------SSVY >B3AR_CANFA MAPWPHGNGSVASWPAAPTPTPDAANTSGLPGAPWAVALAGALLALEVLATVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGRWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSKWWRVQ-R--HCCAFA--------SNIPYALLSSSVSFYLPLLVMLFVYARVFLVATRQAVPLRPARLLPLREHRALRTLGLIVGTFTLCWLPFFVANVMRALGGPS-VPSPALLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCR---REEHRAAAAAPAALTSPAES >P2Y5_CHICK -----------------------MVSSNCSTEDSFKYTLYGCVFSMVFVLGLIANCVAIYIFTFTLKVRNETTTYMLNLAISDLLFVFTLPFRIYYFVVRN-WPFGDVLCKISVTLFYTNMYGSILFLTCISVDRFLAIVHPFSKTLRTKRNARIVCVAVWITVLAGSTPASFFQSTNR---EQRTCFENFPESTWKTYLSRIVIFIEIVGFFIPLILNVTCSTMVLRTLNKP----LTLSRNKLSKKKVLKMIFVHLVIFCFCFVPYNITLILYSLMRTQTAVRTMYPVTLCIAVSNCCFDPIVYYFTSDTNSELDKKQQVHQNT---------------------- >GPRO_HUMAN 
HASRMSVLRAKPMSNSQRLLLLSPGSPPRTGSISYINIIMPSVFGTICLLGIIGNSTVIFAVVKKSKCNNVPDIFIINLSVVDLLFLLGMPFMIHQLMGNGVWHFGETMCTLITAMDANSQFTSTYILTAMAIDRYLATVHPISTKFRKPSVATLVICLLWALSFISITPVWLYARLIP-FP-AVGCGIR--LPNPDTDLYWFTLYQFFLAFALPFVVITAAYVRILQRMTSSV--PASQRSIRLRTKRVTRTAIAICLVFFVCWAPYYVLQLTQLSISRPTFVYLYNAAIS-LGYANSCLNPFVYIVLCETFRKRLVLSVKPAAQGQLRAVSNAQESKGT------- >THRR_.ENLA SGEGSGDQAPVSRSARKPIRRNITKEAEQYLSSQWLTKFVPSLYTVVFIVGLPLNLLAIIIFLFKMKVRKPAVVYMLNLAIADVFFVSVLPFKIAYHLSGNDWLFGPGMCRIVTAIFYCNMYCSVLLIASISVDRFLAVVYPMSLSWRTMSRAYMACSFIWLISIASTIPLLVTEQTQK--D--TTCHDVLDLKDLKDFYIYYFSSFCLLFFFVPFIITTICYIGIIRSL------SSSSIENSCKKTRALFLAVVVLCVFIICFGPTNVLFLTHYLQEANEFLYFAYILSACVGSVSCCLDPLIYYYASSQCQRYLYSLLCCRKVSEPGSSTGQLDNCS-------- >GHSR_HUMAN SEEPGFNLTLADLDWDASPGNDSLGDELLQLFPAPLLAGVTATCVALFVVGIAGNLLTMLVVSRFRELRTTTNLYLSSMAFSDLLIFLCMPLDLVRLWQYRPWNFGDLLCKLFQFVSESCTYATVLTITALSVERYFAICFPLAKVVVTKGRVKLVIFVIWAVAFCSAGPIFVLVG---T-D-TNECRPT---FAVRSGLLTVMVWVSSIFFFLPVFCLTVLYSLIGRKLWRRGD-VVGASLRDQNHKQTVKMLAVVVFAFILCWLPFHVGRYLFSKSFEPQISQYCNLVSFVLFYLSAAINPILYNIMSKKYRVAVFRLLGFEPFSQRKLSTLKDESSINT------ >CKR3_CAVPO PEEAELETEFPGTTFYDYEFAQPC---FKVSITDLGAQFLPSLFSLVFIVGLLGNITVIVVLTKYQKLKIMTNIYLLNLAISDLLFLFTLPFWTYYVHWNK-WVFGHFMCKIISGLYYVGLFSEIFFIILLTIDRYLAIVHAVALRTRTVTFGIITSVITWVLAVLAALPEFMFYGTQG-HF-VLFCGPSYPEKKEHHWKRFQALRMNIFGLALPLLIMIICYTGIIKTL---------LRCPSKKKYKAIRLIFVIMVVFFVFWTPYNLLLLFSAFDLSFKQLDMAKHVTEVIAHTHCCINPIIYAFVGERFQKYLRHFLHRNVTMHLSKYIPFFS-------SSIS >ADMR_RAT LAPDNDFREIHNWTELLHLFNQTFSDCRMELNENTKQVVLFVFYLAIFVVGLVENVLVICVNCRRSGRVGMLNLYILNMAVADLGIILSLPVWMLEVMLYETWLWGSFSCRFIHYFYLANMYSSIFFLTCLSIDRYVTLTNTSSWQRHQHRIRRAVCAGVWVLSAIIPLPEVVHIQLLD--E--PMCLFLAPFETYSAWALAVALSATILGFLLPFPLIAVFNILSACRL---------RRQGQTESRRHCLLMWAYIVVFAICWLPYHVTMLLLTLHTTHNFLYFFYEITDCFSMLHCVANPILYNFLSPSFRGRLLSLVVRYLPKEQARAAGGRQHSIIITKEGSL >GPR._MOUSE 
--------MDLINSSTHVINVSTSLTNSTGVPTPAPKTIIAASLFMAFIIGVISNGLYLWMLQFKMQ-RTVNTLLFFHLILSYFISTLILPFMATSFLQDNHWVFGSVLCKAFNSTLSVSMFASVFFLSAISVARYYLILHPVSQQHRTPHWASRIALQIWISATILSIPYLVFRTTHD-IS--DWESKE-HQTLGQWIHAACFVGRFLLGFLLPFLVIIFCYKRVATKM---------KEKGLFKSSKPFKVMVTAVISFFVCWMPYHVHSGLVLTKSQP-PLHLTLGLAVVTISFNTVVSPVLYLFTGENFKV-FKKSILALFNSTFSDISSTEETEI-------- >CML2_HUMAN APNTTSPELNLSHPLLGTALANGTGELSEHQQYVIGLFLSCLYTIFLFPIGFVGNILILVVNISFREKMTIPDLYFINLAVADLILVADSLIEVFNLHER--YYDIAVLCTFMSLFLQVNMYSSVFFLTWMSFDRYIALARAMCSLFRTKHHARLSCGLIWMASVSATLVPFTAVHLQ-A----CFCFA---------DVREVQWLEVTLGFIVPFAIIGLCYSLIVRVLVR----AHRHRGLRPRRQKALRMILAVVLVFFVCWLPENVFISVHLLQRTQHAHPLTGHIVNLAAFSNSCLNPLIYSFLGETFRDKLRLYIEQKTNLPALNRFCHADSTEQSDVRFSS >AA2A_CANFA -------------------------------MSTMGSWVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTG--FCAACHNCLFFACFVLVLTQSSIFSLLAIAIDRYIAIRIPLYNGLVTGTRAKGIIAVCWVLSFAIGLTPMLGWNNCSS-Q-QVACLFE-----DVVPMNYMVYYNFFAFVLVPLLLMLGVYLRIFLAARRQESQGERARSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPECHAPLWLMYLTIVLSHTNSVVNPFIYAYRIREFRQTFRKIIRSHVLRRREPFKAGGAHGSDGEQISLR >NK2R_MESAU ----MGGRAIVTDTNIFSGLESNTTGVTAFSMPAWQLALWATAYLGLVLVAVTGNATVIWIILAHERMRTVTNYFIINLALADLCMAAFNATFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPITKATIAGIWLVALALASPQCFYST---GA---TKCVVAWPNDNGGKMLLLYHLVVFVLVYFLPLVVMFVAYSVIGLTLWKRPRHHGANLRHLHAKKKFVKAMVLVVLTFAICWLPYHLYFILGSFQKDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTEE----RTPSLSRRVNRC >PAFR_MOUSE ----------------------MEHNGSFRVDSEFRYTLFPIVYSVIFILGVVANGYVLWVFANLYPKLNEIKIFMVNLTMADLLFLITLPLWIVYYYNEGDWILPNFLCNVAGCLFFINTYCSVAFLGVITYNRYQAVAYPITAQATTRKRGISLSLIIWVSIVATASYFLATDSTN-G--NITRCFEH---YEPYSVPILVVHVFIAFCFFLVFFLIFYCNLVIIHTLLTQP---MRQQRKAGVKRRALWMVCTVLAVFIICFVPHHVVQLPWTLAELGQAINDAHQITLCLLSTNCVLDPVIYCFLTKKFRKHLSEKFYSMRSSRKCSRATSDPANQTPIVSLKN >GU27_RAT 
--------------------------------MILNCNPFSGLFLSMYLVTVLGNLLIILAVSSNSHLHNLMYFFLSNLSFVDICFISTTIPKMLVNIHSQTKDISYIECLSQVYFLTTFGGMDNFLLTLMACDRYVAICHPLYTVIMNLQLCALLILMFWLIMFCVSLIHVLLMNEL---NPHFFCELAKVANSDTHINNVFMYVVTSLLGLIPMTGILMSYSQIASSL--------LKMSSSVSKYKAFSTCGSHLCVVSLFYGSATIVYFCSSVLHS---THKKMIASLMYTVISPMLNPFIYSLRNKDVKGALGKLFIRVASCPLWSKDFRPRQSL-------- >ET3R_.ENLA GNVLNMSPP------------PPSPCLSRAKIRHAFKYVTTILSCVIFLVGIVGNSTLLRIIYKNKCMRNGPNVLIASLALGDLFYILIAIPIISISFWLS----TGHSEYIYQLVHLYRARVYSLSLCALSIDRYRAVASWNIRSIGIPVRKAIELTLIWAVAIIVAVPEAIAFNLVELV--ILVCMLPQTSDFMRFYQEVKVWWLFGFYFCLPLACTGVFYTLMSCEMLSINGM-IALNDHMKQRREVAKTVFCLVVIFALCWLPLHVSSIFVRLSATVQLLMVMNYTGINMASLNSCIGPVALYFVSRKFKNCFQSCLCCWCHRPTLTITPMDWKANGHDLDLDR >B3AR_CAPHI MAPWPPRNSSLTPWPDIPTLAPNTANASGLPGVPWAVALAGALLALAVLAIVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGHWPLGVTGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSKWWRVQ-R--RCCTFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALRTLGLIMGTFTLCWLPFFVVNVVRALGGPS-VSGPTFLALNWLGYANSAFNPLIYCRSP-DFQSAFRRLLCRCRPEEHLAAASPPRVLTSPAGPRQP >GP41_HUMAN -----------------------MDTGPDQSYFSGNHWFVFSVYLLTFLVGLPLNLLALVVFVGKLQRPVAVDVLLLNLTASDLLLLLFLPFRMVEAANGMHWPLPFILCPLSGFIFFTTIYLTALFLAAVSIERFLSVAHPLYKTRPRLGQAGLVSVACWLLASAHCSVVYVIEFSGD--T--GTCYLE-FRKDQLAILLPVRLEMAVVLFVVPLIITSYCYSRLVWILGR--------GGSHRRQRRVAGLLAATLLNFLVCFGPYNVSHVVGYICG---ESPAWRIYVTLLSTLNSCVDPFVYYFSSSGFQADFHELLRRLCGLWGQWQQESSGGEEQRADRPAE >O.YR_RAT GTPAANWSVELDLGSGVPPGEEGNRTAGPPQRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIRDLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLGTWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYVTWITLAVYIVPVIVLAACYGLISFKIWQNRAAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDVNA-KEASAFIIAMLLASLNSCCNPWIYMLFTGHLFHELVQRFFCCSARYLKGSRPGENSSTFVLSRRSS >OLF2_CHICK 
-------------MASGNCTTPTTFILSGLTDNPRLQMPLFMVFLVIYTTTLLTNLGLIALIGMDLHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERRTISYVGCILQYFSFVLLTTSEWLLLAVMAYDRYVAICKPLYPSIMTKAVCWRLVKGLYSLAFLNSLVHTSGLLKL---SNHFFCDNRQISSSSTTLNELLVIISGSLFVMSSIITILISYVFIILTV--------VMIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRSVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRLTATSVWLH-------------------- >NTR1_MOUSE PHPQFGLETMLLALSLSNGSGLEPNSNLDVNTDIYSKVLVTAVYLALFVVGTVGNSVTAFTLARKKSLQSTVHYHLGSLALSDLLILLLAMPVELYNFIWVHWAFGDAGCRGYYFLRDACTYATALNVASLSVERYLAICHPFAKTLMSRSRTKKFISAIWLASALLAVPMLFTMGLQN--SGGLVCTPT---VVDTATVKVVIQVNTFMSFLFPMLIISILNTVIANKLTVMEHSMSIEPGRVQALRHGVLVLRAVVIAFVVCWLPYHVRRLMFCYISDEDFYHYFYMLTNALFYVSSAINPILYNLVSANFRQVFLSTLACLCPGWRRRRKKRPSMSSNHAFSTSA >OLF7_RAT ------------MERRNHSGRVSEFVLLGFPAPAPLRVLLFFLSLL-YVLVLTENMLIIIAIRNHPTLHKPMYFFLANMSFLEIWYVTVTIPKMLAGFIGSKQLISFEACMTQLYFFLGLGCTECVLLAVMAYDRYVAICHPLYPVIVSSRLCVQMAAGSWAGGFGISMVKVFLISRL---SNHFFCDVSNLSCTDMSTAELTDFVLAIFILLGPLSVTGASYMAITGAV--------MRIPSAAGRHKAFSTCASHLTVVIIFYAASIFIYARPKALS---AFDTNKLVSVLYAVIVPLFNPIIYCLRNQDVKRALRRTLHLAQDQEANTNKGSK------------ >MC5R_RAT -MNSSSHLTLLDLTLNASEDNILGQNVNNKSSACEDMGIAVEVFLTLGLVSLLENILVIGAIVKNKNLHSPMYFFVGSLAVADMLVSMSNAWETITIYLINNDTFVRHIDNVFDSMICISVVASMCSLLAIAVDRYITIFYALYHHIMTARRSGVIIACIWTFCISCGIVFIIYYY----------------------EESKYVIVCLISMFFTMLFFMVSLYIHMFLLARNH-RIPRYNSVRQRASMKGAITLTMLLGIFIVCWSPFFLHLILMISCPQNACFMSYFNMYLILIMCNSVIDPLIYALRSQEMRRTFKEIICCHGFRRTCTLLGRY------------ >BONZ_CERAE ------MAEYDHYEDNGFNSFNDSSQEEHQDFLQFSKVFLPCMYLVVFVCGLVGNSLVLVISIFYHKLQSLTDVFLVNLPLADLVFVCTLPFWAYAGIHE--WIFGQVMCKTLLGIYTINFYTSMLILTCITVDRFIVVVKATNQQAKKMTWGKVICLLIWVISLLVSLPQIIYGNVFNLD--KLICGYH-----DEEISTVVLATQMTLGFFLPLLAMIVCYSVIIKTL---------LHAGGFQKHRSLKIIFLVMAVFLLTQTPFNLVKLIRSTHWEYTSFHYTIIVTEAIAYLRACLNPVLYAFVSLKFRKNFWKLVKDIGCLPYLGVSHQWKTFSASHNVEAT >SSR5_MOUSE 
LSLASTPSWNAS---AASSGSHNWSLVDPVSPMGARAVLVPVLYLLVCTVGLGGNTLVIYVVLRYAKMKTVTNVYILNLAVADVLFMLGLPFLATQNAVSY-WPFGSFLCRLVMTLDGINQFTSIFCLMVMSVDRYLAVVHPLSARWRRPRVAKLASAAVWVFSLLMSLPLLVFADVQE-----GTCNLS-WPEPVGLWGAAFITYTSVLGFFGPLLVICLCYLLIVVKVKAAMR--VGSSRRRRSERKVTRMVVVVVLVFVGCWLPFFIVNIVNLAFTLPPTSAGLYFFVVVLSYANSCANPLLYGFLSDNFRQSFRKALCLRRGYGVEDADAIERPQTTLPTRSCE >OPSR_ASTFA GD--DTTREAAFTYTNSNNTKDPFEGPNYHIAPRWVYNLATCWMFFVVVASTVTNGLVLVASAKFKKLRHPLNWILVNLAIADLLETLLASTISVCNQFFGYFILGHPMCVFEGFTVATCGIAGLWSLTVISWERWVVVCKPFGNVKFDGKMATAGIVFTWVWSAVWCAPPIFGWSRYWPHGLKTSCGPDVFSGSEDPGVQSYMIVLMITCCFIPLGIIILCYIAVWWAIRTVAQQQKDSESTQKAEKEVSRMVVVMIMAYCFCWGPYTFFACFAAANPGYAFHPLAAAMPAYFAKSATIYNPVIYVFMNRQFRVCIMQLFGKKVDDG-----SEVSS------VAPA >C.C1_HUMAN ------MESSGNPESTTFFYYDLQSQPCENQAWVFATLATTVLYCLVFLLSLVGNSLVLWVLVKYESLESLTNIFILNLCLSDLVFACLLPVWISPYHWG--WVLGDFLCKLLNMIFSISLYSSIFFLTIMTIHRYLSVVSPLTLRVPTLRCRVLVTMAVWVASILSSILDTIFHKVLS-----SGCDYS------ELTWYLTSVYQHNLFFLLSLGIILFCYVEILRTL---------FRSRSKRRHRTVKLIFAIVVAYFLSWGPYNFTLFLQTLFRTQQQLEYALLICRNLAFSHCCFNPVLYVFVGVKFRTHLKHVLRQFWFCRLQAPSPASFAYEGASFY--- >GALT_RAT --------------------MADIQNISLDSPGSVGAVAVPVIFALIFLLGMVGNGLVLAVLLQPGPPRSTTDLFILNLAVADLCFILCCVPFQAAIYTLDAWLFGAFVCKTVHLLIYLTMYASSFTLAAVSLDRYLAVRHPLSRALRTPRNARAAVGLVWLLAALFSAPYLSYYG---GA--LELCVPA----WEDARRRALDVATFAAGYLLPVAVVSLAYGRTLCFLWAAP--AAAAEARRRATGRAGRAMLAVAALYALCWGPHHALILCFWYGRFAPATYACRLASHCLAYANSCLNPLVYSLASRHFRARFRRLWPCGRRRHRHHHR-AHPASSGPAGYPGD >ACM4_HUMAN -----MANFTPVNGSSGNQSVRLVTSSSHNRYETVEMVFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLACADLIIGAFSMNLYTVYIIKGYWPLGAVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPARRTTKMAGLMIAAAWVLSFVLWAPAILFWQFVV-VP-DNHCFIQ------FLSNPAVTFGTAIAAFYLPVVIMTVLYIHISLASRSRSIAVRKKRQMAARERKVTRTIFAILLAFILTWTPYNVMVLVNTFCQSC-IPDTVWSIGYWLCYVNSTINPACYALCNATFKKTFRHLLLCQYRNIGTAR---------------- >AG22_RAT 
RNITSSLPFDNLNATGTNESAFNC----SHKPADKHLEAIPVLYYMIFVIGFAVNIVVVSLFCCQKGPKKVSSIYIFNLAVADLLLLATLPLWATYYSYRYDWLFGPVMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNPWQASYVVPLVWCMACLSSLPTFYFRDVRT-LG--NACIMAFPPEKYAQWSAGIALMKNILGFIIPLIFIATCYFGIRKHLLKT----NSYGKNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALTWMGAVIDLALPFAILLGFTNSCVNPFLYCFVGNRFQQKLRSVFRVPITWLQGKRETMSREMD-------T >OPRM_MOUSE LSHVDGNQSDPCGPNRTGLGGSHSLCPQTGSPSMVTAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGNILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDFRTPRNAKIVNVCNWILSSAIGLPVMFMATTKYGS---IDCTLT-FSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSTIEQQNSARIPSTANTVDRTNH >MC4R_HUMAN GMHTSLHLWNRSSYRLHSNASESLGKGYSDGGCYEQLFVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETIIITLLNSQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALYHNIMTVKRVGIIISCIWAACTVSGILFIIYYS----------------------DDSSAVIICLITMFFTMLALMASLYVHMFLMARLH-RIPGTGAIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY------------ >5H7_CAVPO STWTPRLLSGVPEVAASPSPSNVSGCGEQINYGRAEKVVIGSILTLITLLTIAGNCLVVISVCFVKKLRQPSNYLIVSLALADLSVAVAVIPFVSVTDLIGGWIFGHFFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLYPVRQNGKCMPKMILSVWLLSASITLPPLFGWAQ--N-D--KVCLIS--------QDFGYTIYSTAVAFYIPMSVMLFMYYRIYKAARKSSRLERKNISIFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTCIPLWVERTCLWLGYANSLINPFIYAFFNRDLRTTYRSLLQCQYRNINRKLSAAGAERPERPECVLQ >NK2R_HUMAN ----MGTCDIVTEANISSGPESNTTGITAFSMPSWQLALWAPAYLALVLVAVTGNAIVIWIILAHRRMRTVTNYFIVNLALADLCMAAFNAAFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPSTKAVIAGIWLVALALASPQCFYST---GA---TKCVVAWPEDSGGKTLLLYHLVVIALIYFLPLAVMFVAYSVIGLTLWRRPGHHGANLRHLQAKKKFVKTMVLVVLTFAICWLPYHLYFILGSFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTKE----PTTSLSTRVNRC >NY1R_CAVPO 
TSFSQLENHSVHYNLSEEKPSFFAFENDDCHLPLAVIFTLALAYGAVIILGVSGNLALILIILKQKEMRNVTNILIVNLSFSDLLVAIMCLPFTFVYTLMDHWIFGEIMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYIGIAVIWVLAVASSLPFMIYQVLTDKD--KLVCFDQ---FPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMMRDSKYRSSESKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK >O.1R_RAT PGVPTSSGEPFHLPPDYED-EFLRYLWRDYLYPKQYEWVLIAAYVAVFLIALVGNTLVCLAVWRNHHMRTVTNYFIVNLSLADVLVTAICLPASLLVDITESWLFGHALCKVIPYLQAVSVSVAVLTLSFIALDRWYAICHPL-LFKSTARRARGSILGIWAVSLAVMVPQAAVMECSSRTRLFSVCDER---WADELYPKIYHSCFFFVTYLAPLGLMGMAYFQIFRKLWGPQPRFLAEVKQMRARRKTAKMLMVVLLVFALCYLPISVLNVLKRVFGMFEAVYACFTFSHWLVYANSAANPIIYNFLSGKFREQFKAAFSCCLPGLGPS-----RHKS---LSLQS >YTJ5_CAEEL -------------MPNYTVPPDPADTSWDSPYSIPVQIVVWIIIIVLSLETIIGNAMVVMAYRIERNSKQVSNRYIVSLAISDLIIGIEGFPFFTVYVLNGDWPLGWVACQTWLFLDYTLCLVSILTVLLITADRYLSVCHTAYLKWQSPTKTQLLIVMSWLLPAIIFGIMIYGWQAMTQSTSGAECSAP------FLSNPYVNMGMYVAYYWTTLVAMLILYKVFSSGYQKKSQPDRLAPPNKTDTFLSASGTITFIVGFFAILWSPYYIMATVYGFCKG--IPSFLYTLSYYMCYLNSSGNPFAYALANRQFRSAFMRMFRGNFNKVA------------------ >KI01_HUMAN ---------------MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPS-SKSFIIYLKNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRYYKIVKPLTSFIQSVSYSKLLSVIVWMLMLLLAVPNIILTNQS-TQ---IKCIEL--KSELGRKWHKASNYIFVAIFWIVFLLLIVFYTAITKKIFK----LKSSRNSTSVKKKSSRNIFSIVFVFFVCFVPYHIARIPYTKSQTEEILRYMKEFTLLLSAANVCLDPIIYFFLCQPFREILCKKLHIPLKAQNDLDISRIESTDTL------ >PD2R_HUMAN ---------------------MKSPFYRCQNTTSVEKGNSAVMGGVLFSTGLLGNLLALGLLARSGLLPSVFYMLVCGLTVTDLLGKCLLSPVVLAAYAQNRPALDNSLCQAFAFFMSFFGLSSTLQLLAMALECWLSLGHPFYRRHITLRLGALVAPVVSAFSLAFCALPFMGFGKFVQYCPGTWCFIQ-MVHEEGSLSVLGYSVLYSSLMALLVLATVLCNLGAMRNLYAMAEPGREASPQPLEELDHLLLLALMTVLFTMCSLPVIYRAYYGAFKDV-TSEEAEDLRALRFLSVISIVDPWIFIIFRSPVFRIFFHKIFIRPLRYRSRCSNST------------ >C3.1_RAT 
--MPTSFPELDLENFEYDDSAEAC---YLGDIVAFGTIFLSIFYSLVFTFGLVGNLLVVLALTNSRKSKSITDIYLLNLALSDLLFVATLPFWTHYLISHEG--LHNAMCKLTTAFFFIGFFGGIFFITVISIDRYLAIVLAASMNNRTVQHGVTISLGVWAAAILVASPQFMFTKRKD-----NECLGDYPEVLQEIWPVLRNSEVNILGFVLPLLIMSFCYFRIVRTL---------FSCKNRKKARAIRLILLVVVVFFLFWTPYNIVIFLETLKFYNRDLRWALSVTETVAFSHCCLNPFIYAFAGEKFRRYLRHLYNKCLAVLCGRPVHAGRSRQDSILSS-L >UR2R_RAT TVSGSTVTELPGDSNVSLNSSWSGPTDPSSLKDLVATGVIGAVLSAMGVVGMVGNVYTLVVMCRFLRASASMYVYVVNLALADLLYLLSIPFIIATYVTKD-WHFGDVGCRVLFSLDFLTMHASIFTLTIMSSERYAAVLRPLDTVQRSKGYRKLLVLGTWLLALLLTLPMMLAIQ---GS--KSLCLPA----WGPRAHRTYLTLLFGTSIVGPGLVIGLLYVRLARAYWLS---ASFKQTRRLPNPRVLYLILGIVLLFWACFLPFWLWQLLAQYHEAMETARIVNYLTTCLTYGNSCINPFLYTLLTKNYREYLRGRQRSLGSSCHSPGSPGSLQQDSGRSLSSS >5H4_MOUSE ------------------MDKLDANVSSNEGFRSVEKVVLLTFLAVVILMAILGNLLVMVAVCRDRQRKIKTNYFIVSLAFADLLVSVLVMPFGAIELVQDIWAYGEMFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPYRNKMTPLRIALMLGGCWVLPMFISFLPIMQGWNNIRK-NSTWCVFM--------VNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHSRPDQHSTHRMRTETKAAKTLCVIMGCFCFCWAPFFVTNIVDPFIDYT-VPEQVWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYKRPPILGQTINGSTHVLR-- >MC4R_RAT GMYTSLHLWNRSSHGLHGNASESLGKGHSDGGCYEQLFVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETIVITLLNSQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALYHNIMTVRRVGIIISCIWAACTVSGVLFIIYYS----------------------DDSSAVIICLITMFFTMLVLMASLYVHMFLMARLH-RIPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLLFYISCPQNVCFMSHFNLYLILIMCNAVIDPLIYALRSQELRKTFKEIICFYPLGGICELPGRY------------ >OLF2_CANFA -------------MDGKNCSSVNEFLLVGISNKPGVKVTLFITFLIVYLIILVANLGMIILIRMDSQLHTPMYFFLSHLSFSDARYSTAVGPRMLVGFIAKNKSIPFYSCAMQWLVFCTFVDSECLLLAVMAFDRYKAISHPLYTVSMSSRVCSLLMAGVYLVGIMDASVNTILTFRL---CNHFFCDVPLLSCSDTQVNELVIFTIFGFIELITLSGLFVSYCYIILAV--------RKINSAEGRFKAFSTCTSHLTAVAIFQGTMLFMYFRPSSSY---SLDQDKIISLFYSLVIPMLNPLIYSLRNKDVKEALKKLKNKKWFH--------------------- >AG2S_.ENLA 
----MLSNISAGENSEVEKIVVKC---SKSGMHNYIFITIPIIYSTIFVVGVFGNSLVVIVIYSYMKMKTMASVFLMNLALSDLCFVITLPLWAVYTAMHYHWPFGDLLCKIASTAITLNLYTTVFLLTCLSIDRYSAIVHPMSRIRRTVMVARLTCVGIWLVAFLASLPSVIYRQIFI-TN--TVCALV-YHSGHIYFMVGMSLVKNIVGFFIPFVIILTSYTLIGKTLKEV------YRAQRARNDDIFKMIVAVVLLFFFCWIPHQVFTFLDVLIQMDDIVDTGMPITICIAYFNSCLNPFLYGFFGKKFRKHFLQLIKYIPPKMRTHASVNTRLSD-------T >OPRD_MOUSE SSPLVNLSDAFPSAFPSAGANASGSPGARSASSLALAIAITALYSAVCAVGLLGNVLVMFGIVRYTKLKTATNIYIFNLALADALATSTLPFQSAKYLMET-WPFGELLCKAVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPAKAKLINICIWVLASGVGVPIMVMAVTQPGA---VVCMLQ-FPSPSWYWDTVTKICVFLFAFVVPILIITVCYGLMLLRLRSVRLL-SGSKEKDRSLRRITRMVLVVVGAFVVCWAPIHIFVIVWTLVDINPLVVAALHLCIALGYANSSLNPVLYAFLDENFKRCFRQLCRTPCGRQEPGSLRRPRVTACTPSDGPG >CKR5_PYGNE ----MDYQVSSPTYDIDYYTSEPC---QKVNVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLIMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >OPSD_GAMAF MNGTEGPYFYVPMVNTTGIVRSPYEYPQYYLVSPAAYACLGAYMFFLILVGFPVNFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVLGRLGCNLEGYFATLGGEIGLWSLVVLAVERWLVVCKPISNFRFTENHAIMGLVFTWIMANACAAPPLLGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFICHFCIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVILVIGFLVCWTPYASVAWYIFSNQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA >LSHR_CALJA ESGQSGWDYDYGFHLPKTPRCAPEPDAFNPCEDIMGYDFLRVLIWLINILAIMGNMTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDSQTKGWQTG-SGCNTAGFFTVFASELSVYTLTVITLERWHTITYAILDQKLRLRHAILIMLGGWLFSSLIAMLPLVGVSNY----KVSICFPM-----VETTLSQIYILTILILNVVAFIIICACYIKIYFAVRNP------ELMATNKDTKIAKKMAILIFTDFTCMAPISFFAISAAFKMPLITVTNSKVLLVLFYPINSCANPFLYAIFTKTFRRDFFLLLGKFGCCKHRAELYRRSNYKNGFTGSSK >O1F1_HUMAN 
-------------MSGTNQSSVSEFLLLGLSRQPQQQHLLFVFFLSMYLATVLGNLLIILSVSIDSCLHTPMYFFLSNLSFVDICFSFTTVPKMLANHILETQTISFCGCLTQMYFVFMFVDMDNFLLAVMAYDHFVAVCHPLYTAKMTHQLCALLVAGLWVVANLNVLLHTLLMAPL---STHFFCDVTKLSCSDTHLNEVIILSEGALVMITPFLCILASYMHITCTV--------LKVPSTKGRWKAFSTCGSHLAVVLLFYSTIIAVYFNPLSSH---SAEKDTMATVLYTVVTPMLNPFIYSLRNRYLKGALKKVVGRVVFSV-------------------- >GALT_HUMAN --------------------MADAQNISLDSPGSVGAVAVPVVFALIFLLGTVGNGLVLAVLLQPGPPGSTTDLFILNLAVADLCFILCCVPFQATIYTLDAWLFGALVCKAVHLLIYLTMYASSFTLAAVSVDRYLAVRHPLSRALRTPRNARAAVGLVWLLAALFSAPYLSYYG---GA--LELCVPA----WEDARRRALDVATFAAGYLLPVAVVSLAYGRTLRFLWAAP--AAAAEARRRATGRAGRAMLAVAALYALCWGPHHALILCFWYGRFAPATYACRLASHCLAYANSCLNPLVYALASRHFRARFRRLWPCGRRRRHRARRALRGPPGCPGDARPS >A2AA_CAVPO ----MGSLQPDSGNASWNGTEGPGGGTRATPYSLQVTVTLVCLVGLLILLTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKAWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISFEK-AQ-P--PRCEIN--------DQKWYVISSSIGSFFAPCLIMILVYVRIYQIAKRRRGGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLTAVGCS--VPRTLFKFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------ >A2AA_RAT ----MGSLQPDAGNSSWNGTEAPGGGTRATPYSLQVTLTLVCLAGLLMLFTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKVWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISIEKKGQ-P--PSCKIN--------DQKWYVISSSIGSFFAPCLIMILVYVRIYQIAKRRRAGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLIAVGCP--VPYQLFNFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------ >OLF4_CANFA -------------MELENDTRIPEFLLLGFSEEPKLQPFLFGLFLSMYLVTILGNLLLILAVSSDSHLHTPMYFFLANLSFVDICFTCTTIPKMLVNIQTQRKVITYESCIIQMYFFELFAGIDNFLLTVMAYDRYMAICYPLYMVIMNPQLCSLLLLVSWIMSALHSLLQTLMVLRL---SPHFFCELNQLACSDTFLNNMMLYFAAILLGVAPLVGVLYSYFKIVSSI--------RGISSAHSKYKAFSTCASHLSVVSLFYCTSLGVYLSSAAPQ---STHTSSVASVMYTVVTPMLNPFIYSLRNKDIKGALNVFFRGKP----------------------- >SSR2_RAT 
QFNGSQVWIPSPFDLNGSLGPSNGSNQTEPYYDMTSNAVLTFIYFVVCVVGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMINVAVWGVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYAFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSVAISPALKGMFDFVVILTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGAEDGERSDLNETTETQRTLL >BRS3_SHEEP QTLISTTNDTESSSSVVPNDSTNKRRTGDNSPGIEALCAIYITYAVIISVGILGNAILIKVFFKTKSMQTVPNIFITSLAFGDLLLLLTCVPVDVTHYLAEGWLFGRIGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLRQPPNAILKTCAKAGCIWIMSMIIALPEAIFSNVYTVT--FKACASY---VSERLLQEIHSLLCFLVFYIIPLSIISVYYSLIARTLYKSIPTQRHARKQIESRKRIAKTVLVLVALFALCWLPNHLLYLYRSFTSQTTVHLFVTIISRILAFSNSCVNPFALYWLSNTFQQHFKAQLFCCKAGRPDPTAANTMGRVPGAASTQM >ETBR_BOVIN SSATPQIPRGGRMAGIPPR--TPPPCDGPIEIKETFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINTYKLLAKDWPFGVEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAVGFDIITRI--LRICLLHQKTAFMQFYKTAKDWWLFSFYFCLPLAITALFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYDQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQSFE-EKQSLEFKANDHGYDNFR >FML1_HUMAN -----------METNFSTPLNEYEEVSYESAGYTVLRILPLVVLGVTFVLGVLGNGLVIWVAGFRMT-RTVTTICYLNLALADFSFTATLPFLIVSMAMGEKWPFGWFLCKLIHIVVDINLFGSVFLIGFIALDRCICVLHPVAQNHRTVSLAMKVIVGPWILALVLTLPVFLFLTTVTASWPEERLKVA------ITMLTARGIIRFVIGFSLPMSIVAICYGLIAAKI---------HKKGMIKSSRPLRVLTAVVASFFICWFPFQLVALLGTVWLKEKIIDILVNPTSSLAFFNSCLNPMLYVFVGQDFRERLIHSLPTSLERALSEDSAPTAS---------- >V1AR_MOUSE SSPWWPLTTEGANSSREAAGLGEGGSPPGDVRNEELAKLEVTVLAVIFVVAVLGNSSVLLALHRTPRKTSRMHLFIRHLSLADLAVAFFQVLPQLCWDITYRFRGPDWLCRVVKHLQVFAMFASSYMLVVMTADRYIAVCHPLKTLQQPARRSRLMIAASWGLSFVLSIPQYFIFSVIETK--AQDCWAT---FIPPWGTRAYVTWMTSGVFVVPVIILGTCYGFICYHIWRNLLVVSSVKSISRAKIRTVKMTFVIVSAYILCWTPFFIVQMWSVWDTNFDSENPSTTITALLASLNSCCNPWIYMFFSGHLLQDCVQSFPCCQSIAQKFAKDDSTSYSNNRSPTNS >OPSB_BOVIN 
--MSKMSEEEEFLLFKNISLVGPWDGPQYHLAPVWAFHLQAVFMGFVFFVGTPLNATVLVATLRYRKLRQPLNYILVNVSLGGFIYCIFSVFIVFITSCYGYFVFGRHVCALEAFLGCTAGLVTGWSLAFLAFERYIIICKPFGNFRFSSKHALMVVVATWTIGIGVSIPPFFGWSRFVPEGLQCSCGPDWYTVGTKYYSEYYTWFLFIFCYIVPLSLICFSYSQLLGALRAVAAQQQESASTQKAEREVSHMVVVMVGSFCLCYTPYAALAMYIVNNRNHGVDLRLVTIPAFFSKSACVYNPIIYCFMNKQFRACIMEMVCGKPMTD---ESELSSTVSSSQVGPN- >APJ_MOUSE -----------MEDDGYNYYGADNQSECDYADWKPSGALIPAIYMLVFLLGTTGNGLVLWTVFRTSRKRRSADIFIASLAVADLTFVVTLPLWATYTYREFDWPFGTFSCKLSSYLIFVNMYASVFCLTGLSFDRYLAIVRPVNARLRLRVSGAVATAVLWVLAALLAVPVMVFRSTDAQCY-MDYSMVA-TSNSEWAWEVGLGVSSTAVGFVVPFTIMLTCYFFIAQTIAGHFR--KERIEGLRKRRRLLSIIVVLVVTFALCWMPYHLVKTLYMLGSLLIFLMNVFPYCTCISYVNSCLNPFLYAFFDPRFRQACTSMLCCDQSGCKGTPHSSSSSGHSQGPGPNM >O1G1_HUMAN -------------MEGKNLTSISECFLLGFSEQLEEQKPLFGSFLFMYLVTVAGNLLIILVIITDTQLHTPMYFFLANLSLADACFVSTTVPKMLANIQIQSQAISYSGCLLQLYFFMLFVMLEAFLLAVMAYDCYVAICHPLYILIMSPGLCIFLVSASWIMNALHSLLHTLLMNSL---SPHFFCDINSLSCTDPFTNELVIFITGGLTGLICVLCLIISYTNVFSTI--------LKIPSAQGKRKAFSTCSSHLSVVSLFFGTSFCVDFSSPSTH---SAQKDTVASVMYTVVTPMLNPFIYSLRNQEIKSSLRKLIWVRKIHSP------------------- >CKR2_HUMAN FIRNTNESGEEVTTFFDYDYGAPC---HKFDVKQIGAQLLPPLYSLVFIFGFVGNMLVVLILINCKKLKCLTDIYLLNLAISDLLFLITLPLWAHSAANE--WVFGNAMCKLFTGLYHIGYFGGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWLVAVFASVPGIIFTKCQK-ED-VYVCGPY----FPRGWNNFHTIMRNILGLVLPLLIMVICYSGILKTL--------LRCRNEKKRHRAVRVIFTIMIVYFLFWTPYNIVILLNTFQEFFSQLDQATQVTETLGMTHCCINPIIYAFVGEKFRSLFHIALGCRIAPLQKPVCGGPV-------KVTT >5H2A_CRIGR NSSDASNWTIDGENRTNLSFEGYLPPTCLSILHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGVSMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEEPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESHVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILAYKSSQLQAGQN >TRFR_MOUSE 
------------MENDTVSEMNQTELQPQAAVALEYQVVTILLVVIICGLGIVGNIMVVLVVMRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSIYCMLWFFLLDLN--NA-SCGYKIS------RNYYSPIYLMDFGVFYVVPMILATVLYGFIARILFLNLNLNRCFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPTEKAANYSVKESDRFSTELED >OPSH_CARAU MNGTEGNNFYVPLSNRTGLVRSPFEYPQYYLAEPWQFKLLAVYMFFLICLGLPINGLTLICTAQHKKLRQPLNFILVNLAVAGAIMVCFGFTVTFYTAINGYFALGPTGCAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSSTHASAGIAFTWVMAMACAAPPLVGWSRYIPEGIQCSCGPDYYTLNPEYNNESYVLYMFICHFILPVTIIFFTYGRLVCTVKAAAAQQQDSASTQKAEREVTKMVILMVLGFLVAWTPYATVAAWIFFNKGAAFSAQFMAIPAFFSKTSALYNPVIYVLLNKQFRSCMLTTLFCGKNPLGDEESSTVSS------VSPA >A2AB_RAT --------------------MSGPTMDHQEPYSVQATAAIASAITFLILFTIFGNALVILAVLTSRSLRAPQNLFLVSLAAADILVATLIIPFSLANELLGYWYFWRAWCEVYLALDVLFCTSSIVHLCAISLDRYWAVSRALYNSKRTPCRIKCIILTVWLIAAVISLPPLIYKGD-Q-----PQCELN--------QEAWYILASSIGSFFAPCLIMILVYLRIYVIAKRSGVAWWRRRTQLSREKRFTFVLAVVIGVFVVCWFPFFFSYSLGAICPQHKVPHGLFQFFFWIGYCNSSLNPVIYTVFNQDFRRAFRRILCRPWTQTGW------------------ >TA2R_CERAE --MWPNG-----------SSLGPCFRPTNITLEERRLIASPWFAASFCVVGLASNLLALSVLAGARQTRSSFLTFLCGLVLTDFLGLLVTGAIVVSQHAALFVDPGCRLCRFMGVVMIFFGLSPLLLGATMASERFLGITRPFRPVVTSQRRAWATVGLVWAAALALGLLPLLGLGRYTVQYPGSWCFLT----LGAESGDVAFGLLFSMLGGLSVGLSFLLNTVSVATLCHVYHGEAAQQRPRDSEVEMMAQLLGIMLVASVCWLPLLVFIAQTVLRNPPRATEQELLIYLRVATWNQILDPWVYILFRRAVLRRLQPRLSTRPRSLSLQPQLTQ------------ >SSR2_HUMAN PLNGSHTWLSIPFDLNGSVVSTNTSNQTEPYYDLTSNAVLTFIYFVVCIIGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMITMAVWGVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYTFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSMAISPALKGMFDFVVVLTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGTDDGERSDLNETTETQRTLL >ACM4_.ENLA 
----MENDTWENESSASNHSIDETIVEIPGKYQTMEMIFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLACADLIIGVFSMNLYSLYIIKGYWPLGPIVCDLWLALDYVVSNASVMNLLIISLER-FCVTKPLYPARRTTKMAGLMIAAAWLLSFELWAPAILFWQFIV-VP-SGECYIQ------FLSNPAVTFGTAIAAFYLPVVIMTILYIHISLASRSRSIAVRKKRQMAAREKKVTRTIFAILLAFIITWTPYNVMVLINTFCQTC-IPETIWYIGYWLCYVNSTINPACYALCNATFKKTFKHLLMCQYKSIGTAR---------------- >Y..5_CAEEL SVNESCDNYVEIFNKINYFFRDDQVINGTEYSPKEFGYFITFAYMLIILFGAIGNFLTIIVVILNPAMRTTRNFFILNLALSDFFVCIVTAPTTLYTVLYMFWPFSRTLCKIAGSLQGFNIFLSTFSIASIAVDRYVLIIFPT-KRERQQNLSFCFFIMIWVISLILAVPLLQASDLTPCDLALYICHEQEIWEKMIISKGTYTLAVLITQYAFPLFSLVFAYSRIAHRMKLRTTNSQRRRSVVERQRRTHLLLVCVVAVFAVAWLPLNVFHIFNTFELVN-FSVTTFSICHCLAMCSACLNPLIYAFFNHNFRIEFMHLFDRVGLRSLRVVIFGEMRTEFRSRGGCK >GRHR_CLAGA TLLLSNPTNVLDNSSVLNVSVSPPVLKWETPTFTTAARFRVAATLVLFVFRAASNLSVLLSVTRGRGLASHLRPLIASLASADLVMTFVVMPLDAVWNVTVQWYAGDAMCKLMCFLKLFAMHSAAFILVVVSLDRHHAILHPL-DTLDAGRRNRRMLLTAWILSLLLASPQLFIFRAIKVD--FVQCATH--SFQQHWQETAYNMFHFVTLYVFPLLVMSLCYTRILVEINRQGEPRSGTDMIPKARMKTLKMTIIIVASFVICWTPYYLLGIWYWFQPQMVIPDYVHHVFFVFGNLNTCCDPVIYGFFTPSFRADLSRCFCWRNQNASAKSLPHFSGEAESDLGSGD >CKR5_HUMAN ----MDYQVSSPIYDINYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAVVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >OPSD_NEOSA MNGTEGPYFYVPMVNTTGVVRSPYEYPQYYLVNPAAFAVLGAYMFFLIIFGFPINFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVIGGFTTTMYSSMHGYFVLGRLGCNLEGFSATLGGMISLWSLAVLAIERWVVVCKPTSNFRFGENHAIMGVSLTWTMALACTVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVLYMFFCHFMVPLIIIFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVILMVIGYLVCWLPYASVAWFIFTHQGSEFGPLFMTIPAFFAKSSSIYNPVIYICMNKQFRNCMITTLFCGKNPF---EGEEETEASSASSVSPA >A2AB_HUMAN 
-------------------------MDHQDPYSVQATAAIAAAITFLILFTIFGNALVILAVLTSRSLRAPQNLFLVSLAAADILVATLIIPFSLANELLGYWYFRRTWCEVYLALDVLFCTSSIVHLCAISLDRYWAVSRALYNSKRTPRRIKCIILTVWLIAAVISLPPLIYKGD-Q-----PQCKLN--------QEAWYILASSIGSFFAPCLIMILVYLRIYLIAKRSGAIWWRRRAHVTREKRFTFVLAVVIGVFVLCWFPFFFSYSLGAICPKHKVPHGLFQFFFWIGYCNSSLNPVIYTIFNQDFRRAFRRILCRPWTQTAW------------------ >CKR4_MOUSE VTDTTQDETVYNSYYFYESMPKPC---TKEGIKAFGEVFLPPLYSLVFLLGLFGNSVVVLVLFKYKRLKSMTDVYLLNLAISDLLFVLSLPFWGYYAADQ--WVFGLGLCKIVSWMYLVGFYSGIFFIMLMSIDRYLAIVHAVSLKARTLTYGVITSLITWSVAVFASLPGLLFSTCYT-EH-HTYCKTQ-YSVNSTTWKVLSSLEINVLGLLIPLGIMLFWYSMIIRTL---------QHCKNEKKNRAVRMIFGVVVLFLGFWTPYNVVLFLETLVELERYLDYAIQATETLGFIHCCLNPVIYFFLGEKFRKYITQLFRTCRGPLVLCKHCDFMS------SSSY >A1AD_HUMAN GSGEDNRSSAGEPGSAGAGGDVNGTAAVGGLVVSAQGVGVGVFLAAFILMAVAGNLLVILSVACNRHLQTVTNYFIVNLAVADLLLSATVLPFSATMEVLGFWAFGRAFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLYPAIMTERKAAAILALLWVVALVVSVGPLLGWKEP--VP--RFCGIT--------EEAGYAVFSSVCSFYLPMAVIVVMYCRVYVVARSTHTFLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLKPSEGVFKVIFWLGYFNSCVNPLIYPCSSREFKRAFLRLLRCQCRRRRRRRPLWRASTSG-LRQDCA >C3AR_HUMAN --------------MASFSAETNSTDLLSQPWNEPPVILSMVILSLTFLLGLPGNGLVLWVAGLKMQ-RTVNTIWFLHLTLADLLCCLSLPFSLAHLALQGQWPYGRFLCKLIPSIIVLNMFASVFLLTAISLDRCLVVFKPICQNHRNVGMACSICGCIWVVAFVMCIPVFVYREIFT-ED-YNLGQFT-DDDQVPTPLVAITITRLVVGFLLPSVIMIACYSFIVFRM--------QRGRFAKSQSKTFRVAVVVVAVFLVCWTPYHIFGVLSLLTDPEKTLMSWDHVCIALASANSCFNPFLYALLGKDFRKKARQSIQGILEAAFSEELTRSVIS--------- >AG2R_HUMAN ------MILNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPAIIHRNVFF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWIPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKRYFLQLLKYIPPKAKSHSNLSTRPSD-------N >CKR5_PANTR 
----MDYQVSSPIYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >PF2R_MOUSE --MSMNS---------SKQPVSPAAGLIANTTCQTENRLSVFFSIIFMTVGILSNSLAIAILMKAYQSKASFLLLASGLVITDFFGHLINGGIAVFVYASDKFDQSNILCSIFGISMVFSGLCPLFLGSAMAIERCIGVTNPIHSTKITSKHVKMILSGVCMFAVFVAVLPILGHRDYQIQASRTWCFYN--TEHIEDWEDRFYLLFFSFLGLLALGVSFSCNAVTGVTLLRVKFRSQQHRQGRSHHLEMIIQLLAIMCVSCVCWSPFLVTMANIAINGNNPVTCETTLFALRMATWNQILDPWVYILLRKAVLRNLYKLASRCCGVNIISLHIWELKVAAISESPAA >OPSB_ORYLA VEFPDDFWIPIPLDTNNVTALSPFLVPQDHLGSPTIFYSMSALMFVLFVAGTAINLLTIACTLQYKKLRSHLNYILVNMAVANLIVASTGSSTCFVCFAFKYMVLGPLGCKIEGFTAALGGMVSLWSLAVIAFERWLVICKPLGNFVFKSEHALLCCALTWVCGLCASVPPLVGWSRYIPEGMQCSCGPDWYTTGNKFNNESFVMFLFCFCFAVPFSIIVFCYSQLLFTLKMAAKAQADSASTQKAEKEVTRMVVVMVVAFLVCYVPYASFALWVINNRGQTFDLRLATIPSCVSKASTVYNPVIYVLLNKQFRLCMKKMLGMSADED---EESSTSKVGPS------ >CCKR_RABIT ASLLGNASGIPPPCELGLDNETLFCLDQPPPSKEWQPAVQILLYSLIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAISDLMLCLFCMPFNLIPNLLKDFIFGSALCKTTTYLMGTSVSVSTLNLVAISLERYGAICKPLSRVWQTKSHALKVIAATWCLSFAIMTPYPIYN----NNQTANMCRFL---LPSDVMQQAWHTFLLLILFLIPGIVMMVAYGMISLELYQGRVSSSSSAATLMAKKRVIRMLMVIVVLFFLCWMPIFSANAWRAYDTVSRLSGTPISFILLLSYTSSCVNPIIYCFMNRRFRLGFMATFPCCPNPGPPGPRAEATTRASLSRYSYS >BRB1_HUMAN SSWPPLELQSSNQSQLFPQNATAC--DNAPEAWDLLHRVLPTFIISICFFGLLGNLFVLLVFLLPRRQLNVAEIYLANLAASDLVFVLGLPFWAENIWNQFNWPFGALLCRVINGVIKANLFISIFLVVAISQDRYRVLVHPMSGRQQRRRQARVTCVLIWVVGGLLSIPTFLLRSIQA-LN--TACILL---LPHEAWHFARIVELNILGFLLPLAAIVFFNYHILASLRTR---VSRTRVRGPKDSKTTALILTLVVAFLVCWAPYHFFAFLEFLFQVQDFIDLGLQLANFFAFTNSSLNPVIYVFVGRLFRTKVWELYKQCTPK---------LAPI-------S >TRFR_RAT 
------------MENETVSELNQTELPPQVAVALEYQVVTILLVVVICGLGIVGNIMVVLVVMRTKHMRTATNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSIYCMLWFFLLDLN--DA-SCGYKIS------RNYYSPIYLMDFGVFYVMPMILATVLYGFIARILFLNMNLNRCFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPTEKAANYSVKESDRFSTELDD >AG22_HUMAN KNITSGLHFGLVNISGNNESTLNC----SQKPSDKHLDAIPILYYIIFVIGFLVNIVVVTLFCCQKGPKKVSSIYIFNLAVADLLLLATLPLWATYYSYRYDWLFGPVMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNPWQASYIVPLVWCMACLSSLPTFYFRDVRT-LG--NACIMAFPPEKYAQWSAGIALMKNILGFIIPLIFIATCYFGIRKHLLKT----NSYGKNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALAWMGAVIDLALPFAILLGFTNSCVNPFLYCFVGNRFQQKLRSVFRVPITWLQGKRESMSREME-------T >VU51_HSV7J -----------------MKNIDLTNWKLLAEIYEYLFFFSFFFLCLLVIIVVKFNNSTVGR-E--------YTFSTFSGMLVYILLLPVKMGMLTKM-----WDVSTDYCIILMFLSDFSFIFSSWALTLLALERINNFSFSEIKVNETKILKQMSFPIIWVTSIFQAVQISMKYKKSQ---EDDYCLLA--------IRSAEEAWILLMYTVVIPTFIVFFYVLNKRFL-----------FLERDLNSIVTHLSLFLFFGALCFFPASVLNEFNCN----RLFYGLHELLIVCLELKIFYVPTMTYIISCENYRLAAKAFFCKCFKPCFLMPSLRSTQF-------- >PAR3_MOUSE -WTGATTTIKAECPEDSISTLHVNNATIGYLRSSLSTQVIPAIYILLFVVGVPSNIVTLWKLSLRTK-SISLVIFHTNLAIADLLFCVTLPFKIAYHLNGNNWVFGEVMCRITTVVFYGNMYCAILILTCMGINRYLATAHPFYQKLPKRSFSLLMCGIVWVMVFLYMLPFVILKQEYH--E--TTCHDVDACESPSSFRFYYFVSLAFFGFLIPFVIIIFCYTTLIHKL----------KSKDRIWLGYIKAVLLILVIFTICFAPTNIILVIHHANYYYDSLYFMYLIALCLGSLNSCLDPFLYFVMSKVVDQLNP------------------------------ >GPRC_HUMAN NLSGLPRDYLDAAAAENISAAVSSRVPAVEPEPELVVNPWDIVLCTSGTLISCENAIVVLIIFHNPSLRAPMFLLIGSLALADLLAG-IGLITNFVFA-Y--LLQSEATKLVTIGLIVASFSASVCSLLAITVDRYLSLYYALYHSERTVTFTYVMLVMLWGTSICLGLLPVMGWNCL-R--DESTCSVV-------RPLTKNNAAILSVSFLFMFALMLQLYIQICKIVMRHIALHFLATSHYVTTRKGVSTLAIILGTFAACWMPFTLYSLIADY----TYPSIYTYATLLPATYNSIINPVIYAFRNQEIQKALCLICCGCIPSSLAQRARSP------------ >BRB2_CAVPO 
---MFNITSQVSALNATLAQGNSC---LDAEWWSWLNTIQAPFLWVLFVLAVLENIFVLSVFFLHKSSCTVAEIYLGNLAVADLILAFGLPFWAITIANNFDWLFGEVLCRMVNTMIQMNMYSSICFLMLVSIDRYLALVKTMMGRMRGVRWAKLYSLVIWGCALLLSSPMLVFRTMKD-HN--TACLII---YPSLTWQVFTNVLLNLVGFLLPLSIITFCTVQIMQVLRNN---EMQKFKEIQTERRATVLVLAVLLLFVVCWLPFQIGTFLDTLRLLGHVIDLITQISSYLAYSNSCLNPLVYVIVGKRFRKKSREVYHGLCRSGGCVSEPAQLRTS-------I >P2YR_RAT AAFLAGLGSLWGNSTIASTAAVSSSFRCALIKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLSLGRLKKKNAIYVSVLVWLIVVVAISPILFYSGTG----KTVTCYDS-TSDEYLRSYFIYSMCTTVAMFCIPLVLILGCYGLIVRALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKASRRSEANLQSKSLSEFKQNGDTSL >PF2R_SHEEP --MSTNN---------SVQPVSPASELLSNTTCQLEEDLSISFSIIFMTVGILSNSLAIAILMKAYQYKSSFLLLASALVITDFFGHLINGTIAVFVYASDKFDKSNILCSIFGICMVFSGLCPLFLGSLMAIERCIGVTKPIHSTKITTKHVKMMLSGVCFFAVFVALLPILGHRDYKIQASRTWCFYK--TDQIKDWEDRFYLLLFAFLGLLALGISFVCNAITGISLLKVKFRSQQHRQGRSHHFEMVIQLLGIMCVSCICWSPFLVTMASIGMNIQDKDSCERTLFTLRMATWNQILDPWVYILLRKAVLRNLYVCTRRCCGVHVISLHVWELKVAAISDLPVT >GPRO_RAT QTSLLSTGPNASNISDGQDNLTLPGSPPRTGSVSYINIIMPSVFGTICLLGIVGNSTVIFAVVKKSKCSNVPDIFIINLSVVDLLFLLGMPFMIHQLMGNGVWHFGETMCTLITAMDANSQFTSTYILTAMTIDRYLATVHPISTKFRKPSMATLVICLLWALSFISITPVWLYARLIP-FP-AVGCGIR--LPNPDTDLYWFTLYQFFLAFALPFVVITAAYVKILQRMTSSV--PASQRSIRLRTKRVTRTAIAICLVFFVCWAPYYVLQLTQLSISRPTFVYLYNAAIS-LGYANSCLNPFVYIVLCETFRKRLVLSVKPAAQGQLRTVSNAQESKGT------- >ML1._HUMAN ---------MGPTLAVPTPYGCIGCKLPQPEYPPALIIFMFCAMVITIVVDLIGNSMVILAVTKNKKLRNSGNIFVVSLSVADMLVAIYPYPLMLHAMSIGGWDLSQLQCQMVGFITGLSVVGSIFNIVAIAINRYCYICHSLYERIFSVRNTCIYLVITWIMTVLAVLPNMYIGT-IEYDP-TYTCIFN------YLNNPVFTVTIVCIHFVLPLLIVGFCYVRIWTKVLAARD-AGQNPDNQLAEVRNFLTMFVIFLLFAVCWCPINVLTVLVAVSPKEKIPNWLYLAAYFIAYFNSCLNAVIYGLLNENFRREYWTIFHAMRHPIIFFPGLISARTLARARAHAR >AVT_CATCO 
-------------MGRIANQTTASNDTDPFGRNEEVAKMEITVLSVTFFVAVIGNLSVLLAMHNTKKKSSRMHLFIKHLSLADMVVAFFQVLPQLCWEITFRFYGPDFLCRIVKHLQVLGMFASTYMMVMMTLDRYIAICHPLKTLQQPTQRAYIMIGSTWLCSLLLSTPQYFIFSLSESY--VYDCWGH---FIEPWGIRAYITWITVGIFLIPVIILMICYGFICHSIWKNMIGVSSVTIISRAKLRTVKMTLVIVLAYIVCWAPFFIVQMWSVWDENFDSENAAVTLSALLASLNSCCNPWIYMLFSGHLLYDFLRCFPCCKKPRNMLQKEDSTLLTKLAAGRMT >ACM3_PIG N---------ISQAAGNFSSPNGTTSDPLGGHTIWQVVFIAFLTGILALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWVISFILWAPAILFWQYFV-VP-PGECFIQ------FLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRKTRTKRKRMSLIKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFCDSC-IPKTYWNLGYWLCYINSTVNPVCYALCNKTFRTTFKMLLLCQCDKRKRRKQQYQHKRVPEQAL--- >MSHR_ALCAA PVLGSQRRLLGSLNCTPPATFSLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAVSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAMDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAILYVHMLARACQHIARKRQHPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW----------------------- >GP72_MOUSE TGPNASSHFWANYTFSDWQNFVGRRRYGAESQNPTVKALLIVAYSFTIVFSLFGNVLVCHVIFKNQRMHSATSLFIVNLAVADIMITLLNTPFTLVRFVNSTWVFGKGMCHVSRFAQYCSLHVSALTLTAIAVDRHQVIMHPL-KPRISITKGVIYIAVIWVMATFFSLPHAICQKLFT--EVRSLCLPD-FPEPADLFWKYLDLATFILLYLLPLFIISVAYARVAKKLWLCGDVTEQYLALRRKKKTTVKMLVLVVVLFALCWFPLNCYVLLLSSKAI-HTNNALYFAFHWFAMSSTCYNPFIYCWLNENFRVELKALLSMCQRPPKPQEDRLPVAWTEKSHGRRA >ACM5_RAT --------MEGESYNESTVNGTPVNHQALERHGLWEVITIAVVTAVVSLMTIVGNVLVMISFKVNSQLKTVNNYYLLSLACADLIIGIFSMNLYTTYILMGRWVLGSLACDLWLALDYVASNASVMNLLVISFDRYFSITRPLYRAKRTPKRAGIMIGLAWLVSFILWAPAILCWQYLV-VP-PDECQIQ------FLSEPTITFGTAIAAFYIPVSVMTILYCRIYRETEKRNLSTKRKRMVLVKERKAAQTLSAILLAFIITWTPYNIMVLVSTFCDKC-VPVTLWHLGYWLCYVNSTINPICYALCNRTFRKTFKLLLLCRWKKKKVEEKLYW------------ >CKR5_PONPY 
----MDYQVSSPTYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY >GASR_CANFA GASLCRAGGALLNSSGAGNLSCEPPRLRGAGTRELELAIRVTLYAVIFLMSVGGNVLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVVCKAVSYLMGVSVSVSTLSLVAIALERYSAICRPLARVWQTRSHAARVIIATWMLSGLLMVPYPVYTAVQP---A-LQCVHR---WPSARVRQTWSVLLLLLLFFVPGVVMAVAYGLISRELYLGPGPPRPYQAKLLAKKRVVRMLLVIVVLFFLCWLPLYSANTWRAFDSSGALSGAPISFIHLLSYASACVNPLVYCFMHRRFRQACLETCARCCPRPPRARPRPLPSIASLSRLSYT >TLR2_DROME TLSTDQPAVGDVEDAAEDAAASMETGSFAFVVPWWRQVLWSILFGGMVIVATGGNLIVVWIVMTTKRMRTVTNYFIVNLSIADAMVSSLNVTFNYYYMLDSDWPFGEFYCKLSQFIAMLSICASVFTLMAISIDRYVAIIRPL-QPRMSKRCNLAIAAVIWLASTLISCPMMIIYR----NR--TVCYPEDGPTNHSTMESLYNILIIILTYFLPIVSMTVTYSRVGIELWGSTIGTPRQVENVRSKRRVVKMMIVVVLIFAICWLPFHSYFIITSCYPAIPFIQELYLAIYWLAMSNSMYNPIIYCWMNSRFRYGFKMVFRWCLFVRVGTEPFSRYSCSGSPDHNRI >D2DR_MELGA ------MDPLNLSWYNTGDRNWSEPVNESSADQKPQYNYYAVLLTLLIFVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWRFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAAAMPMNTRYSSKRRVTVMIACVWVLSFAISSPILFGLN---E----RECII---------ANPAFVVYSSVVSFYVPFIVTLLVYVQIYMVLRRRSTLMNRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNMHCDCN-IPPAMYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFMKILHC------------------------- >EDG2_BOVIN QPQFTAMNEQQCFSNESIAFFYNRSGKYLATEWNTVTKLVMGLGITVCIFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAG-LAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTSLTVSVANLLAIAIERHITVFRMQLHARMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCI-C--DIENCSNM------APLYSDSYLVFWAIFNLVTFVVMVVLYAHIFGYVRQRRMSSSGPRRNRDTMMSLLKTVVIVLGAFIICWTPGLVLLLLDVCCPQC-DVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILCCQRSENTSGPTEGSNHTILAGVHSND >NY2R_BOVIN 
EEMKVDQFGPGHTTLPGELAPDSEPELIDSTKLIEVQVVLILAYCSIILLGVIGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTITLTVIALDRHRCIVYHL-ESKISKQISFLIIGLAWGVSALLASPLAIFREYSLFE--IVACTEKWPGEEKGIYGTIYSLSSLLILYVLPLGIISFSYTRIWSKLKNHSPG-AAHDHYHQRRQKTTKMLVCVVVVFAVSWLPLHAFQLAVDIDSHVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCEQRLDAIHSE---AKKHLQVTKNNG >HH1R_RAT ----------MSFANTSSTFEDKMCEGNRTAMASPQLLPLVVVLSSISLVTVGLNLLVLYAVHSERKLHTVGNLYIVSLSVADLIVGAVVMPMNILYLIMTKWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLRYRTKTRASATILGAWFFSFLWVIPILGWHHFM--EL-EDKCETD------FYNVTWFKIMTAIINFYLPTLLMLWFYVKIYKAVRRHLRSQYVSGLHLNRERKAAKQLGFIMAAFILCWIPYFIFFMVIAFCKSC-CSEPMHMFTIWLGYINSTLNPLIYPLCNENFKKTFKKILHIRS-----------------------""" gpcr_aln = DenseAlignment(data=gpcr_ungapped.split('\n'),MolType=PROTEIN) myos_data = """>gi|107137|pir||A37102 LSRIITRIQA >gi|11024712|ref|NP-060003.1| LAQLITRTQA >gi|11276950|pir||A59286 LSRIITRIQA >gi|11276952|pir||A59293 LAQLITRTQA >gi|11276954|pir||A59234 LSLIITRIQA >gi|11276955|pir||A59236 LSSIFKLIQA >gi|11321579|ref|NP-003793.1| LVTLMTSTQA >gi|11342672|ref|NP-002461.1| LAKLITRTQA >gi|1197168|dbj|BAA08111.1| LSSIFKLIQA >gi|12003423|gb|AAG43570.1|AF21 LVTLMTRTQA >gi|12003425|gb|AAG43571.1|AF21 LVTLMTRTQA >gi|12003427|gb|AAG43572.1|AF21 LVTLMTRTQA >gi|12053672|emb|CAC20413.1| LSRIITRIQA >gi|12060489|dbj|BAB20630.1| LSRIITRIQA >gi|12657350|emb|CAC27776.1| LASLVTLTQA >gi|12657354|emb|CAC27778.1| LAKLVTMTQA >gi|127741|sp|P02563|MYH6-RAT LSRIITRIQA >gi|127748|sp|P02564|MYH7-RAT LSRIITRIQA >gi|127755|sp|P12847|MYH3-RAT LAKLITRTQA >gi|1289512|gb|AAC59911.1| LSLIITRIQA >gi|1289514|gb|AAC59912.1| LSLIITRIQA >gi|13431707|sp|Q28641|MYH4-RAB LAQLITRTQA >gi|13431711|sp|Q90339|MYSS-CYP LALLVTMTQA >gi|13431716|sp|Q9UKX2|MYH2-HUM LAQLITRTQA >gi|13431717|sp|Q9UKX3|MYHD-HUM LVTLMTSTQA >gi|13431724|sp|Q9Y623|MYH4-HUM LAQLITRTQA >gi|1346637|sp|P02565|MYH3-CHIC LAQLITRTQA >gi|13560269|dbj|BAB40920.1| LAQLMTRTQA >gi|13560273|dbj|BAB40922.1| LSRIITRIQA >gi|13638390|sp|P12882|MYH1-HUM 
LAQLITRTQA >gi|14017756|dbj|BAB47399.1| LSLIITRIQA >gi|15384839|emb|CAC59753.1| LATLVTMTQA >gi|1581130|prf||2116354A LSRIITRIQA >gi|1619328|emb|CAA27817.1| LAKLITRTQA >gi|16508127|gb|AAL17913.1| LSRIITRIQA >gi|1698895|gb|AAB37320.1| LSRIITRIQA >gi|17907763|dbj|BAB79445.1| LAKILTMLQA >gi|179508|gb|AAA51837.1| LSRIITRIQA >gi|179510|gb|AAA62830.1| LSRIITRIQA >gi|18859641|ref|NP-542766.1| LSRIITRIQA >gi|191618|gb|AAA37159.1| LSRIITRIQA >gi|191620|gb|AAA37160.1| LSRIITRIQA >gi|191622|gb|AAA37161.1| LSRIITRIQA >gi|191624|gb|AAA37162.1| LSRIITRIQA >gi|2119306|pir||I49464 LSRIITRIQA >gi|2119307|pir||I48175 LSRIITRIQA >gi|2119308|pir||I48153 LSRIITRIQA >gi|212376|gb|AAA48972.1| LAQLITRTQA >gi|21623523|dbj|BAC00871.1| LAALVGMVQA >gi|21743235|dbj|BAB40921.2| LAQLITRTQA >gi|21907898|dbj|BAC05679.1| LAQIITRTQA >gi|21907900|dbj|BAC05680.1| LAQIITRTQA >gi|21907902|dbj|BAC05681.1| LSRIITRIQA >gi|219524|dbj|BAA00791.1| LSRIITRIQA >gi|22121649|gb|AAM88909.1| LAQIITRTQA >gi|23379831|gb|AAM88910.1| LAQLITRTQA >gi|2351219|dbj|BAA22067.1| LSHLVTMTQA >gi|2351221|dbj|BAA22068.1| LVNLVTMTQA >gi|2351223|dbj|BAA22069.1| LALLVTMTQA >gi|27764861|ref|NP-002462.1| LSRIITRMQA >gi|297024|emb|CAA79675.1| LSRIITRMQA >gi|3024204|sp|Q02566|MYH6-MOUS LSRIITRIQA >gi|3041706|sp|P13533|MYH6-HUMA LSRIITRMQA >gi|3041708|sp|P13540|MYH7-MESA LSRIITRIQA >gi|3043372|sp|P11055|MYH3-HUMA LAKLITRTQA >gi|34870884|ref|XP-213345.2| LAQLITRTQA >gi|34870892|ref|XP-340820.1| LAQIITRTQA >gi|37720046|gb|AAN71741.1| LARILTGIQA >gi|38091410|ref|XP-354614.1| LAKLITRTQA >gi|38091413|ref|XP-354615.1| LAQLITRTQA >gi|38177589|gb|AAF00096.2|AF11 LSLIISGIQA >gi|38347761|dbj|BAD01606.1| LSLLLTRTQA >gi|38347763|dbj|BAD01607.1| LSLLLTRTQA >gi|38488753|ref|NP-942118.1| LARILTGIQA >gi|3915779|sp|P13539|MYH6-MESA LSRIITRIQA >gi|402372|gb|AAA62313.1| LSRIITRIQA >gi|402374|gb|AAB59701.1| LSRIITRIQA >gi|41350446|gb|AAS00505.1| LAALVTMTQA >gi|41386691|ref|NP-776542.1| LAQLITRTQA >gi|41386711|ref|NP-777152.1| LSRIITRIQA 
>gi|42476190|ref|NP-060004.2| LAQLITRTQA >gi|42662294|ref|XP-371398.2| LAKVLTLLQA >gi|45382109|ref|NP-990097.1| LAKILTMIQA >gi|45383005|ref|NP-989918.1| LAKILTMLQA >gi|45383668|ref|NP-989559.1| LAQLITRTQA >gi|4557773|ref|NP-000248.1| LSRIITRIQA >gi|45595719|gb|AAH67305.1| LAQLITRTQA >gi|476355|pir||A46762 LSRIITRIQA >gi|4808809|gb|AAD29948.1| LVTLMTSTQA >gi|4808811|gb|AAD29949.1| LAQLITRTQA >gi|4808813|gb|AAD29950.1| LAQLITRTQA >gi|4808815|gb|AAD29951.1| LAQLITRTQA >gi|5360746|dbj|BAA82144.1| LAQLITRTQA >gi|5360748|dbj|BAA82145.1| LAQLITRTQA >gi|5360750|dbj|BAA82146.1| LAQLITRTQA >gi|547966|sp|P12883|MYH7-HUMAN LSRIITRIQA >gi|56655|emb|CAA34064.1| LSRIITRIQA >gi|6093461|sp|P79293|MYH7-PIG LSRIITRIQA >gi|6683485|dbj|BAA89233.1| LAQLITRTQA >gi|6708502|gb|AAD09454.2| LAKIMTMLQC >gi|7209643|dbj|BAA92289.1| LATLVTMTQA >gi|7248371|dbj|BAA92710.1| LAKILTMIQA >gi|7669506|ref|NP-005954.2| LAQLITRTQA >gi|8393804|ref|NP-058935.1| LSRIITRIQA >gi|8393807|ref|NP-058936.1| LSRIITRIQA >gi|86358|pir||A29320 LAQLITRTQA >gi|88201|pir||S04090 LAKLITRTQA >gi|92498|pir||S06005 LSRIITRIQA >gi|92499|pir||S06006 LSRIITRIQA >gi|92509|pir||A24922 LAKLITRTQA >gi|940233|gb|AAA74199.1| LAQLITRTQA >gi|9800486|gb|AAF99314.1|AF272 LAQLITRTQA >gi|9800488|gb|AAF99315.1|AF272 LAQLITRTQA >gi|9971579|dbj|BAB12571.1| LAALVTMTQA""" myos_aln = DenseAlignment(data=myos_data.split('\n'),MolType=PROTEIN) # a randomly generated tree to use in tests tree20_string='(((0:0.5,1:0.5):0.5,(((2:0.5,3:0.5):0.5,(4:0.5,(5:0.5,6:0.5):0.5):0.5):0.5,((7:0.5,8:0.5):0.5,((9:0.5,((10:0.5,11:0.5):0.5,12:0.5):0.5):0.5,13:0.5):0.5):0.5):0.5):0.5,(((14:0.5,(15:0.5,16:0.5):0.5):0.5,17:0.5):0.5,(18:0.5,19:0.5):0.5):0.5);' default_gctmpca_aa_sub_matrix_lines = default_gctmpca_aa_sub_matrix.split('\n') default_gctmpca_aa_sub_matrix = {} for aa,line in zip(gctmpca_aa_order,default_gctmpca_aa_sub_matrix_lines): default_gctmpca_aa_sub_matrix[aa] = dict([(col_aa,float(rate)/100.) 
        for col_aa, rate in zip(gctmpca_aa_order, line.split())])

if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_evolve/test_likelihood_function.py000644 000765 000024 00000057305 12024702176 025415 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""
Some tests for the likelihood function class.

tests to do:
    setting of parameters, by coord, by for-all, checking pars sets
    testing the likelihood for specified pars
    getting ancestral probs
    simulating sequence (not possible to verify values as random)
    checking that the object resets on tree change, model change, etc
"""
import warnings
warnings.filterwarnings("ignore", "Motif probs overspecified")
warnings.filterwarnings("ignore", "Model not reversible")
warnings.filterwarnings("ignore", "Ignoring tree edge lengths")

import os

from numpy import ones, dot

from cogent.evolve import substitution_model, predicate
from cogent import DNA, LoadSeqs, LoadTree
from cogent.util.unit_test import TestCase, main
from cogent.maths.matrix_exponentiation import PadeExponentiator as expm
from cogent.maths.stats.information_criteria import aic, bic
from cogent.evolve.models import JTT92

Nucleotide = substitution_model.Nucleotide
MotifChange = predicate.MotifChange

__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight",
               "Matthew Wakefield", "Brett Easton"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

base_path = os.getcwd()
data_path = os.path.join(base_path, 'data')

ALIGNMENT = LoadSeqs(
    moltype=DNA,
    filename=os.path.join(data_path, 'brca1.fasta'))

OTU_NAMES = ["Human", "Mouse", "HowlerMon"]

########################################################
# some funcs for assembling Q-matrices for 'manual' calc

def isTransition(motif1, motif2):
    position = getposition(motif1, motif2)
    a, b = motif1[position], motif2[position]
    transitions = {('A', 'G'): 1, ('C', 'T'): 1}
    pair = (min(a, b), max(a, b))
    return transitions.has_key(pair)

def numdiffs_position(motif1, motif2):
    assert len(motif1) == len(motif2),\
        "motif1[%s] & motif2[%s] have inconsistent length" %\
        (motif1, motif2)
    ndiffs, position = 0, -1
    for i in range(len(motif1)):
        if motif1[i] != motif2[i]:
            position = i
            ndiffs += 1
    return ndiffs == 1, position

def isinstantaneous(motif1, motif2):
    if motif1 != motif2 and (motif1 == '-' * len(motif1) or \
            motif2 == '-' * len(motif1)):
        return True
    ndiffs, position = numdiffs_position(motif1, motif2)
    return ndiffs

def getposition(motif1, motif2):
    ndiffs, position = numdiffs_position(motif1, motif2)
    return position

##############################################################
# funcs for testing the monomer weighted substitution matrices

_root_probs = lambda x: dict([(n1 + n2, p1 * p2) \
    for n1, p1 in x.items() for n2, p2 in x.items()])

def make_p(length, coord, val):
    """returns a probability matrix with value set at coordinate in
    instantaneous rate matrix"""
    Q = ones((4, 4), float) * 0.25  # assumes equi-frequent mprobs at root
    for i in range(4):
        Q[i, i] = 0.0
    Q[coord] *= val
    row_sum = Q.sum(axis=1)
    scale = 1 / (.25 * row_sum).sum()
    for i in range(4):
        Q[i, i] -= row_sum[i]
    Q *= scale
    return expm(Q)(length)

class LikelihoodCalcs(TestCase):
    """tests ability to calculate log-likelihoods for several
    substitution models."""

    def setUp(self):
        self.alignment = ALIGNMENT.takeSeqs(OTU_NAMES)[0: 42]
        self.tree = LoadTree(tip_names=OTU_NAMES)

    def _makeLikelihoodFunction(self, submod, translate=False, **kw):
        alignment = self.alignment
        if translate:
            alignment = alignment.getTranslation()
        calc = submod.makeLikelihoodFunction(self.tree, **kw)
        calc.setAlignment(alignment)
        calc.setParamRule('length', value=1.0, is_constant=True)
        if not translate:
            calc.setParamRule('kappa', value=3.0, is_constant=True)
        return calc

    def test_no_seq_named_root(self):
        """root is a reserved name"""
        aln = \
            self.alignment.takeSeqs(self.alignment.Names[:4])
        aln = aln.todict()
        one = aln.pop(aln.keys()[0])
        aln["root"] = one
        aln = LoadSeqs(data=aln)
        submod = Nucleotide()
        tree = LoadTree(treestring="%s" % str(tuple(aln.Names)))
        lf = submod.makeLikelihoodFunction(tree)
        try:
            lf.setAlignment(aln)
        except AssertionError:
            pass
        collection = aln.degap().NamedSeqs
        collection.pop("Human")
        tree = LoadTree(treestring="%s" % str(tuple(collection.keys())))
        lf = submod.makeLikelihoodFunction(tree, aligned=False)
        try:
            lf.setSequences(collection)
        except AssertionError:
            pass

    def test_binned_gamma(self):
        """just rate is gamma distributed"""
        submod = substitution_model.Codon(
            predicates={'kappa': 'transition', 'omega': 'replacement'},
            ordered_param='rate', distribution='gamma', mprob_model='tuple')
        lf = self._makeLikelihoodFunction(submod, bins=3)
        try:
            values = lf.getParamValueDict(['bin'])['omega_factor'].values()
        except KeyError:
            # there shouldn't be an omega factor
            pass
        values = lf.getParamValueDict(['bin'])['rate'].values()
        obs = round(sum(values) / len(values), 6)
        self.assertEqual(obs, 1.0)
        self.assertEqual(len(values), 3)
        shape = lf.getParamValue('rate_shape')

    def test_binned_gamma_ordered_param(self):
        """rate is gamma distributed omega follows"""
        submod = substitution_model.Codon(
            predicates={'kappa': 'transition', 'omega': 'replacement'},
            ordered_param='rate', partitioned_params='omega',
            distribution='gamma', mprob_model='tuple')
        lf = self._makeLikelihoodFunction(submod, bins=3)
        values = lf.getParamValueDict(['bin'])['omega_factor'].values()
        self.assertEqual(round(sum(values) / len(values), 6), 1.0)
        self.assertEqual(len(values), 3)
        shape = lf.getParamValue('rate_shape')

    def test_binned_partition(self):
        submod = substitution_model.Codon(
            predicates={'kappa': 'transition', 'omega': 'replacement'},
            ordered_param='rate', partitioned_params='omega',
            distribution='free', mprob_model='tuple')
        lf = self._makeLikelihoodFunction(submod, bins=3)
        values = \
            lf.getParamValueDict(['bin'])['omega_factor'].values()
        self.assertEqual(round(sum(values) / len(values), 6), 1.0)
        self.assertEqual(len(values), 3)

    def test_complex_binned_partition(self):
        submod = substitution_model.Codon(
            predicates={'kappa': 'transition', 'omega': 'replacement'},
            ordered_param='kappa', partitioned_params=['omega'],
            mprob_model='tuple')
        lf = self._makeLikelihoodFunction(submod, bins=['slow', 'fast'])
        lf.setParamRule('kappa', value=1.0, is_constant=True)
        lf.setParamRule('kappa', edge="Human", init=1.0, is_constant=False)
        values = lf.getParamValueDict(['bin'])['kappa_factor'].values()
        self.assertEqual(round(sum(values) / len(values), 6), 1.0)
        self.assertEqual(len(values), 2)

    def test_codon(self):
        """test a three taxa codon model."""
        submod = substitution_model.Codon(
            equal_motif_probs=True, do_scaling=False, motif_probs=None,
            predicates={'kappa': 'transition', 'omega': 'replacement'},
            mprob_model='tuple')
        likelihood_function = self._makeLikelihoodFunction(submod)
        likelihood_function.setParamRule('omega', value=0.5,
                                         is_constant=True)
        evolve_lnL = likelihood_function.getLogLikelihood()
        self.assertFloatEqual(evolve_lnL, -80.67069614541883)

    def test_nucleotide(self):
        """test a nucleotide model."""
        submod = Nucleotide(
            equal_motif_probs=True, do_scaling=False, motif_probs=None,
            predicates={'kappa': 'transition'})
        # now do using the evolve
        likelihood_function = self._makeLikelihoodFunction(submod)
        self.assertEqual(likelihood_function.getNumFreeParams(), 0)
        evolve_lnL = likelihood_function.getLogLikelihood()
        self.assertFloatEqual(evolve_lnL, -157.49363874840455)

    def test_discrete_nucleotide(self):
        """test that partially discrete nucleotide model can be constructed,
        differs from continuous, and has the expected number of free params"""
        submod = Nucleotide(
            equal_motif_probs=True, do_scaling=False, motif_probs=None,
            predicates={'kappa': 'transition'})
        likelihood_function = self._makeLikelihoodFunction(
            submod, discrete_edges=['Human'])
        self.assertEqual(likelihood_function.getNumFreeParams(), 12)
        evolve_lnL = likelihood_function.getLogLikelihood()
        self.assertNotEqual(evolve_lnL, -157.49363874840455)

    def test_dinucleotide(self):
        """test a dinucleotide model."""
        submod = substitution_model.Dinucleotide(
            equal_motif_probs=True, do_scaling=False, motif_probs=None,
            predicates={'kappa': 'transition'}, mprob_model='tuple')
        likelihood_function = self._makeLikelihoodFunction(submod)
        evolve_lnL = likelihood_function.getLogLikelihood()
        self.assertFloatEqual(evolve_lnL, -102.48145536663735)

    def test_protein(self):
        """test a protein model."""
        submod = substitution_model.Protein(
            do_scaling=False, equal_motif_probs=True)
        likelihood_function = self._makeLikelihoodFunction(submod,
                                                           translate=True)
        evolve_lnL = likelihood_function.getLogLikelihood()
        self.assertFloatEqual(evolve_lnL, -89.830370754876185)

class LikelihoodFunctionTests(TestCase):
    """tests for a tree analysis class. Various tests to create a tree
    analysis class, set parameters, and test various functions.
""" def setUp(self): self.submodel = Nucleotide( do_scaling=True, model_gaps=False, equal_motif_probs=True, predicates = {'beta': 'transition'}) self.data = LoadSeqs( filename = os.path.join(data_path, 'brca1_5.paml'), moltype = self.submodel.MolType) self.tree = LoadTree( filename = os.path.join(data_path, 'brca1_5.tree')) def _makeLikelihoodFunction(self, **kw): lf = self.submodel.makeLikelihoodFunction(self.tree, **kw) lf.setParamRule('beta', is_independent=True) lf.setAlignment(self.data) return lf def _setLengthsAndBetas(self, likelihood_function): for (species, length) in [ ("DogFaced", 0.1), ("NineBande", 0.2), ("Human", 0.3), ("HowlerMon", 0.4), ("Mouse", 0.5)]: likelihood_function.setParamRule("length", value=length, edge=species, is_constant=True) for (species1, species2, length) in [ ("Human", "HowlerMon", 0.7), ("Human", "Mouse", 0.6)]: LCA = self.tree.getConnectingNode(species1, species2).Name likelihood_function.setParamRule("length", value=length, edge=LCA, is_constant=True) likelihood_function.setParamRule("beta", value=4.0, is_constant=True) def test_information_criteria(self): """test get information criteria from a model.""" lf = self._makeLikelihoodFunction() nfp = lf.getNumFreeParams() lnL = lf.getLogLikelihood() l = len(self.data) self.assertFloatEqual(lf.getAic(), aic(lnL, nfp)) self.assertFloatEqual(lf.getAic(second_order=True), aic(lnL, nfp, l)) self.assertFloatEqual(lf.getBic(), bic(lnL, nfp, l)) def test_result_str(self): # actualy more a test of self._setLengthsAndBetas() likelihood_function = self._makeLikelihoodFunction() self._setLengthsAndBetas(likelihood_function) self.assertEqual(str(likelihood_function), \ """Likelihood Function Table\n\ ====== beta ------ 4.0000 ------ ============================= edge parent length ----------------------------- Human edge.0 0.3000 HowlerMon edge.0 0.4000 edge.0 edge.1 0.7000 Mouse edge.1 0.5000 edge.1 root 0.6000 NineBande root 0.2000 DogFaced root 0.1000 ----------------------------- 
=============== motif mprobs --------------- T 0.2500 C 0.2500 A 0.2500 G 0.2500 ---------------""")
        likelihood_function = self._makeLikelihoodFunction(digits=2, space=2)
        self.assertEqual(str(likelihood_function), \
"""Likelihood Function Table\n\ =============================== edge parent length beta ------------------------------- Human edge.0 1.00 1.00 HowlerMon edge.0 1.00 1.00 edge.0 edge.1 1.00 1.00 Mouse edge.1 1.00 1.00 edge.1 root 1.00 1.00 NineBande root 1.00 1.00 DogFaced root 1.00 1.00 ------------------------------- ============= motif mprobs ------------- T 0.25 C 0.25 A 0.25 G 0.25 -------------""")

    def test_calclikelihood(self):
        likelihood_function = self._makeLikelihoodFunction()
        self._setLengthsAndBetas(likelihood_function)
        self.assertAlmostEquals(-250.686745262,
                                likelihood_function.getLogLikelihood(),
                                places=9)

    def test_g_statistic(self):
        likelihood_function = self._makeLikelihoodFunction()
        self._setLengthsAndBetas(likelihood_function)
        self.assertAlmostEquals(230.77670557,
                                likelihood_function.getGStatistic(),
                                places=6)

    def test_ancestralsequences(self):
        likelihood_function = self._makeLikelihoodFunction()
        self._setLengthsAndBetas(likelihood_function)
        result = likelihood_function.reconstructAncestralSeqs()['edge.0']
        a_column_with_mostly_Ts = -1
        motif_G = 2
        self.assertAlmostEquals(2.28460181711e-05,
                                result[a_column_with_mostly_Ts][motif_G],
                                places=8)
        lf = self.submodel.makeLikelihoodFunction(self.tree,
                                                  bins=['low', 'high'])
        lf.setParamRule('beta', bin='low', value=0.1)
        lf.setParamRule('beta', bin='high', value=10.0)
        lf.setAlignment(self.data)
        result = lf.reconstructAncestralSeqs()

    def test_likely_ancestral(self):
        """exercising the most likely ancestral sequences"""
        likelihood_function = self._makeLikelihoodFunction()
        self._setLengthsAndBetas(likelihood_function)
        result = likelihood_function.likelyAncestralSeqs()

    def test_simulateAlignment(self):
        "Simulate DNA alignment"
        likelihood_function = self._makeLikelihoodFunction()
        self._setLengthsAndBetas(likelihood_function)
        simulated_alignment = likelihood_function.simulateAlignment(
            20, exclude_internal=False)
        self.assertEqual(len(simulated_alignment), 20)
        self.assertEqual(len(simulated_alignment.getSeqNames()), 8)

    def test_simulateHetergeneousAlignment(self):
        "Simulate substitution-heterogeneous DNA alignment"
        lf = self.submodel.makeLikelihoodFunction(self.tree,
                                                  bins=['low', 'high'])
        lf.setParamRule('beta', bin='low', value=0.1)
        lf.setParamRule('beta', bin='high', value=10.0)
        simulated_alignment = lf.simulateAlignment(100)

    def test_simulatePatchyHetergeneousAlignment(self):
        "Simulate patchy substitution-heterogeneous DNA alignment"
        lf = self.submodel.makeLikelihoodFunction(self.tree,
            bins=['low', 'high'], sites_independent=False)
        lf.setParamRule('beta', bin='low', value=0.1)
        lf.setParamRule('beta', bin='high', value=10.0)
        simulated_alignment = lf.simulateAlignment(100)

    def test_simulateAlignment2(self):
        "Simulate alignment with dinucleotide model"
        al = LoadSeqs(data={'a': 'ggaatt', 'c': 'cctaat'})
        t = LoadTree(treestring="(a,c);")
        sm = substitution_model.Dinucleotide(mprob_model='tuple')
        lf = sm.makeParamController(t)
        lf.setAlignment(al)
        simalign = lf.simulateAlignment()
        self.assertEqual(len(simalign), 6)

    def test_simulateAlignment3(self):
        """Simulated alignment with gap-induced ambiguous positions
        preserved"""
        t = LoadTree(treestring='(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1)root;')
        al = LoadSeqs(data={
            'a': 'g--cactat?',
            'b': '---c-ctcct',
            'c': '-a-c-ctat-',
            'd': '-a-c-ctat-'})
        sm = Nucleotide(recode_gaps=True)
        lf = sm.makeParamController(t)
        #pc.setConstantLengths()
        lf.setAlignment(al)
        #print lf.simulateAlignment(sequence_length=10)
        simulated = lf.simulateAlignment()
        self.assertEqual(len(simulated.getSeqNames()), 4)
        import re
        self.assertEqual(
            re.sub('[ATCG]', 'x', simulated.todict()['a']),
            'x??xxxxxx?')

    def test_simulateAlignment_root_sequence(self):
        """provide a root sequence for simulating an alignment"""
        def use_root_seq(root_sequence):
            al = \
                LoadSeqs(data={'a': 'ggaatt', 'c': 'cctaat'})
            t = LoadTree(treestring="(a,c);")
            sm = substitution_model.Dinucleotide(mprob_model='tuple')
            lf = sm.makeParamController(t)
            lf.setAlignment(al)
            simalign = lf.simulateAlignment(exclude_internal=False,
                                            root_sequence=root_sequence)
            root = simalign.NamedSeqs['root']
            self.assertEqual(str(root), str(root_sequence))
        root_sequence = DNA.makeSequence('GTAATT')
        use_root_seq(root_sequence)  # as a sequence instance
        use_root_seq('GTAATC')  # as a string

    def test_pc_initial_parameters(self):
        """Default parameter values from original annotated tree"""
        likelihood_function = self._makeLikelihoodFunction()
        self._setLengthsAndBetas(likelihood_function)
        tree = likelihood_function.getAnnotatedTree()
        lf = self.submodel.makeParamController(tree)
        lf.setAlignment(self.data)
        self.assertEqual(lf.getParamValue("length", "Human"), 0.3)
        self.assertEqual(lf.getParamValue("beta", "Human"), 4.0)

    def test_set_par_all(self):
        likelihood_function = self._makeLikelihoodFunction()
        likelihood_function.setParamRule("length", value=4.0,
                                         is_constant=True)
        likelihood_function.setParamRule("beta", value=6.0,
                                         is_constant=True)
        self.assertEqual(str(likelihood_function), \
"""Likelihood Function Table ====== beta ------ 6.0000 ------ ============================= edge parent length ----------------------------- Human edge.0 4.0000 HowlerMon edge.0 4.0000 edge.0 edge.1 4.0000 Mouse edge.1 4.0000 edge.1 root
4.0000 NineBande root 4.0000 DogFaced root 4.0000 ----------------------------- =============== motif mprobs --------------- T 0.2500 C 0.2500 A 0.2500 G 0.2500 ---------------""")
        #self.submodel.setScaleRule("ts",['beta'])
        #self.submodel.setScaleRule("tv",['beta'], exclude_pars = True)
        self.assertEqual(str(likelihood_function),\
"""Likelihood Function Table ====== beta ------ 6.0000 ------ ============================= edge parent length ----------------------------- Human edge.0 4.0000 HowlerMon edge.0 4.0000 edge.0 edge.1 4.0000 Mouse edge.1 4.0000 edge.1 root 4.0000 NineBande root 4.0000 DogFaced root 4.0000 ----------------------------- =============== motif mprobs --------------- T 0.2500 C 0.2500 A 0.2500 G 0.2500 ---------------""")

    def test_getMotifProbs(self):
        likelihood_function = self._makeLikelihoodFunction()
        mprobs = likelihood_function.getMotifProbs()
        assert hasattr(mprobs, 'keys'), mprobs
        keys = mprobs.keys()
        keys.sort()
        obs = self.submodel.getMotifs()
        obs.sort()
        self.assertEqual(obs, keys)

    def test_getAnnotatedTree(self):
        likelihood_function = self._makeLikelihoodFunction()
        likelihood_function.setParamRule("length", value=4.0, edge="Human",
                                         is_constant=True)
        result = likelihood_function.getAnnotatedTree()
        self.assertEqual(
            result.getNodeMatchingName('Human').params['length'], 4.0)
        self.assertEqual(result.getNodeMatchingName('Human').Length, 4.0)

    def test_getparamsasdict(self):
        likelihood_function = self._makeLikelihoodFunction()
        likelihood_function.setName("TEST")
        self.assertEqual(str(likelihood_function),\
"""TEST ======================================= edge parent length beta --------------------------------------- Human edge.0 1.0000 1.0000 HowlerMon edge.0 1.0000 1.0000 edge.0 edge.1 1.0000 1.0000 Mouse edge.1 1.0000 1.0000 edge.1 root 1.0000 1.0000 NineBande root 1.0000 1.0000 DogFaced root 1.0000 1.0000 --------------------------------------- =============== motif mprobs --------------- T 0.2500 C 0.2500 A 0.2500 G 0.2500 ---------------""")
        self.assertEqual(likelihood_function.getParamValueDict(['edge']), {
            'beta': {'NineBande': 1.0, 'edge.1': 1.0, 'DogFaced': 1.0,
                     'Human': 1.0, 'edge.0': 1.0, 'Mouse': 1.0,
                     'HowlerMon': 1.0},
            'length': {'NineBande': 1.0, 'edge.1': 1.0, 'DogFaced': 1.0,
                       'Human': 1.0, 'edge.0': 1.0, 'Mouse': 1.0,
                       'HowlerMon': 1.0}})

    def test_get_statistics_from_empirical_model(self):
        """should return valid dict from an empirical substitution model"""
        submod = JTT92()
        aln = self.data.getTranslation()
        lf = submod.makeLikelihoodFunction(self.tree)
        lf.setAlignment(aln)
        stats = \
            lf.getParamValueDict(['edge'], params=['length'])

    def test_constant_to_free(self):
        """exercise setting a constant param rule, then freeing it"""
        # checks by just trying to make the calculator
        lf = self.submodel.makeLikelihoodFunction(self.tree)
        lf.setAlignment(self.data)
        lf.setParamRule('beta', is_constant=True, value=2.0,
                        edges=['NineBande', 'DogFaced'], is_clade=True)
        lf.setParamRule('beta', init=2.0, is_constant=False,
                        edges=['NineBande', 'DogFaced'], is_clade=True)

    def test_get_psub_rate_matrix(self):
        """lf should return consistent rate matrix and psub"""
        lf = self.submodel.makeLikelihoodFunction(self.tree)
        lf.setAlignment(self.data)
        Q = lf.getRateMatrixForEdge('NineBande')
        P = lf.getPsubForEdge('NineBande')
        self.assertFloatEqual(expm(Q.array)(1.0), P.array)
        # should fail for a discrete Markov model
        dm = substitution_model.DiscreteSubstitutionModel(DNA.Alphabet)
        lf = dm.makeLikelihoodFunction(self.tree)
        lf.setAlignment(self.data)
        self.assertRaises(Exception, lf.getRateMatrixForEdge, 'NineBande')

    def test_make_discrete_markov(self):
        """lf ignores tree lengths if a discrete Markov model"""
        t = LoadTree(treestring='(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1)root;')
        dm = substitution_model.DiscreteSubstitutionModel(DNA.Alphabet)
        lf = dm.makeLikelihoodFunction(t)

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_evolve/test_models.py000644 000765 000024 00000003623 12024702176 022642 0ustar00jrideoutstaff000000 000000 from cogent.util.unit_test import TestCase, main
from cogent.evolve.models import JC69, F81, HKY85, TN93, GTR, \
    MG94HKY, MG94GTR, GY94, H04G, H04GK, H04GGK, \
    DSO78, AH96, AH96_mtmammals, JTT92, WG01, CNFGTR, CNFHKY, \
    WG01_matrix, WG01_freqs

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

class CannedModelsTest(TestCase):
    """Check each
canned model can actually be instantiated."""

    def _instantiate_models(self, models, **kwargs):
        for model in models:
            model(**kwargs)

    def test_nuc_models(self):
        """exercising nucleotide model construction"""
        self._instantiate_models([JC69, F81, HKY85, GTR])

    def test_codon_models(self):
        """exercising codon model construction"""
        self._instantiate_models([CNFGTR, CNFHKY, MG94HKY, MG94GTR, GY94,
                                  H04G, H04GK, H04GGK])

    def test_aa_models(self):
        """exercising aa model construction"""
        self._instantiate_models([DSO78, AH96, AH96_mtmammals, JTT92, WG01])

    def test_bin_options(self):
        kwargs = dict(with_rate=True, distribution='gamma')
        model = WG01(**kwargs)
        model = GTR(**kwargs)

    def test_empirical_values_roundtrip(self):
        model = WG01()
        assert model.getMotifProbs() == WG01_freqs
        assert (model.calcExchangeabilityMatrix('dummy_mprobs')
                == WG01_matrix).all()

    def test_solved_models(self):
        for klass in [TN93, HKY85, F81]:
            for scaled in [True, False]:
                model = klass(rate_matrix_required=False, do_scaling=scaled)
                model.checkPsubCalculationsMatch()

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_evolve/test_motifchange.py000644 000765 000024 00000013272 12024702176 023644 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
import unittest

from cogent.evolve.predicate import MotifChange
from cogent.core.moltype import CodonAlphabet

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight",
               "Matthew Wakefield", "Brett Easton"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

class FakeModel(object):
    def __init__(self, alphabet):
        self.alphabet = alphabet
        self.MolType = alphabet.MolType

    def getAlphabet(self):
        return self.alphabet

class TestPredicates(unittest.TestCase):
    def setUp(self):
        self.alphabet = CodonAlphabet()
        self.model = FakeModel(self.alphabet)

    def _makeMotifChange(self, *args,
**kw): pred = MotifChange(*args, **kw) return pred.interpret(self.model) def assertMatch(self, pred, seq1, seq2): assert pred(seq1, seq2), (pred.__doc__, (seq1, seq2)) def assertNoMatch(self, pred, seq1, seq2): assert not pred(seq1, seq2), ('not ' + pred.__doc__, (seq1, seq2)) def test_indels(self): indel = self._makeMotifChange('---', 'NNN') self.assertMatch(indel, '---', 'AAA') def test_impossible_change(self): self.assertRaises(Exception, self._makeMotifChange, '----', 'NNNN') def test_isfromcpg(self): isFromCpG = self._makeMotifChange('CG', forward_only = True) self.assertMatch(isFromCpG, 'CG', 'CA') self.assertMatch(isFromCpG, 'CG', 'TG') self.assertMatch(isFromCpG, 'ACG', 'ATG') self.assertMatch(isFromCpG, 'CGT', 'CTT') self.assertNoMatch(isFromCpG, 'CTT', 'CGT') self.assertNoMatch(isFromCpG, 'C', 'G') def test_isfromtocpg(self): isFromToCpG = self._makeMotifChange('CG') self.assertMatch(isFromToCpG, 'CG', 'CA') self.assertMatch(isFromToCpG, 'CG', 'TG') self.assertMatch(isFromToCpG, 'ACG', 'ATG') self.assertMatch(isFromToCpG, 'CGT', 'CTT') self.assertMatch(isFromToCpG, 'CTT', 'CGT') def test_isFromToCpA_C_only(self): isFromToCpA_C_only = self._makeMotifChange('CA', diff_at = 0) self.assertMatch(isFromToCpA_C_only, 'CA', 'TA') self.assertMatch(isFromToCpA_C_only, 'TCA', 'TTA') self.assertMatch(isFromToCpA_C_only, 'TAA', 'CAA') self.assertNoMatch(isFromToCpA_C_only, 'TCA', 'TCT') def test_isFromCpA_C_only(self): isFromCpA_C_only = self._makeMotifChange('CA', forward_only = True, diff_at = 0) self.assertMatch(isFromCpA_C_only, 'CA', 'TA') self.assertMatch(isFromCpA_C_only, 'TCA', 'TTA') self.assertNoMatch(isFromCpA_C_only, 'TAA', 'CAA') def test_isCpT_T_only(self): isCpT_T_only = self._makeMotifChange('CT', diff_at = 1) self.assertMatch(isCpT_T_only, 'CT', 'CA') self.assertMatch(isCpT_T_only, 'TCA', 'TCT') self.assertNoMatch(isCpT_T_only, 'TTA', 'TCA') self.assertNoMatch(isCpT_T_only, 'TA', 'CT') def test_isCCC(self): isCCC = self._makeMotifChange('CCC') 
self.assertNoMatch(isCCC, 'CC', 'CT') def test_isC(self): isC = self._makeMotifChange('C') self.assertMatch(isC, 'C', 'T') self.assertNoMatch(isC, 'CA', 'CT') self.assertMatch(isC, 'CA', 'CC') self.assertMatch(isC, 'CAT', 'GAT') self.assertMatch(isC, 'CAT', 'CCT') self.assertMatch(isC, 'CAT', 'CAC') self.assertNoMatch(isC, 'CAT', 'CAA') self.assertNoMatch(isC, 'C', 'C') def test_isCtoT(self): isCtoT = self._makeMotifChange('C', 'T') self.assertMatch(isCtoT, 'C', 'T') self.assertMatch(isCtoT, 'T', 'C') self.assertNoMatch(isCtoT, 'T', 'A') isCtoT = self._makeMotifChange('C', 'T', forward_only = True) self.assertMatch(isCtoT, 'C', 'T') self.assertNoMatch(isCtoT, 'T', 'C') def test_isCGtoCA(self): isCG_CA = self._makeMotifChange('CG', 'CA') self.assertMatch(isCG_CA, 'CG', 'CA') self.assertMatch(isCG_CA, 'CA', 'CG') self.assertMatch(isCG_CA, 'CAT', 'CGT') self.assertMatch(isCG_CA, 'CGT', 'CAT') self.assertMatch(isCG_CA, 'TCA', 'TCG') self.assertNoMatch(isCG_CA, 'TCT', 'TCG') self.assertMatch(isCG_CA, 'CGTT', 'CATT') self.assertMatch(isCG_CA, 'TCGT', 'TCAT') self.assertMatch(isCG_CA, 'TTCG', 'TTCA') self.assertMatch(isCG_CA, 'CATT', 'CGTT') self.assertMatch(isCG_CA, 'TCAT', 'TCGT') self.assertMatch(isCG_CA, 'TTCA', 'TTCG') isCG_CA = self._makeMotifChange('CG', 'CA', forward_only = True) self.assertMatch(isCG_CA, 'CGTT', 'CATT') self.assertMatch(isCG_CA, 'TCGT', 'TCAT') self.assertMatch(isCG_CA, 'TTCG', 'TTCA') self.assertNoMatch(isCG_CA, 'CATT', 'CGTT') self.assertNoMatch(isCG_CA, 'TCAT', 'TCGT') self.assertNoMatch(isCG_CA, 'TTCA', 'TTCG') isCG = self._makeMotifChange('CG', diff_at = 1) self.assertMatch(isCG, 'CGTT', 'CATT') self.assertMatch(isCG, 'TCGT', 'TCAT') self.assertMatch(isCG, 'TTCG', 'TTCA') self.assertNoMatch(isCG, 'CGTT', 'TGTT') self.assertNoMatch(isCG, 'TCGT', 'TAGT') self.assertNoMatch(isCG, 'TTCG', '--GG') def test_wildcards(self): isCG_CN = self._makeMotifChange('CG', 'CN') self.assertMatch(isCG_CN, 'CG', 'CA') self.assertNoMatch(isCG_CN, 'CG', 'CG') 
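The predicates exercised in these tests support directionality (`forward_only`) and restriction of the changed position (`diff_at`). As a rough standalone sketch of that logic for exact motifs — a simplified stand-in, not PyCogent's actual `MotifChange` implementation, which additionally handles wildcards and scanning within longer sequences:

```python
def motif_change(old, new, seq1, seq2, forward_only=False, diff_at=None):
    """True if seq1 -> seq2 is a change between motifs old and new.

    Simplified stand-in: exact equal-length motifs only, single
    differing position, optionally constrained to index diff_at.
    """
    def single_diff(a, b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) != 1:
            return False
        return diff_at is None or diffs[0] == diff_at

    fwd = seq1 == old and seq2 == new and single_diff(seq1, seq2)
    rev = seq1 == new and seq2 == old and single_diff(seq1, seq2)
    return fwd or (rev and not forward_only)

assert motif_change('C', 'T', 'C', 'T')
assert motif_change('C', 'T', 'T', 'C')  # reversible by default
assert not motif_change('C', 'T', 'T', 'C', forward_only=True)
assert motif_change('CA', 'TA', 'CA', 'TA', diff_at=0)
assert not motif_change('CA', 'TA', 'CA', 'TA', diff_at=1)
```

The `forward_only=True` branch mirrors the `isFromCpG`-style predicates above, which match only the `old -> new` direction.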
self.assertNoMatch(isCG_CN, 'CG', 'C-') if __name__ == '__main__': unittest.main() PyCogent-1.5.3/tests/test_evolve/test_newq.py000644 000765 000024 00000043010 12024702176 022323 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import warnings warnings.filterwarnings("ignore", "Motif probs overspecified") warnings.filterwarnings("ignore", "Model not reversible") from numpy import ones, dot, array from cogent import LoadSeqs, DNA, LoadTree, LoadTable from cogent.evolve.substitution_model import Nucleotide, General, \ GeneralStationary from cogent.evolve.discrete_markov import DiscreteSubstitutionModel from cogent.evolve.predicate import MotifChange from cogent.util.unit_test import TestCase, main from cogent.maths.matrix_exponentiation import PadeExponentiator as expm __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def _dinuc_root_probs(x,y=None): if y is None: y = x return dict([(n1+n2, p1*p2) for n1,p1 in x.items() for n2,p2 in y.items()]) def _trinuc_root_probs(x,y,z): return dict([(n1+n2+n3, p1*p2*p3) for n1,p1 in x.items() for n2,p2 in y.items() for n3,p3 in z.items()]) def make_p(length, coord, val): """returns a probability matrix with value set at coordinate in instantaneous rate matrix""" Q = ones((4,4), float)*0.25 # assumes equi-frequent mprobs at root for i in range(4): Q[i,i] = 0.0 Q[coord] *= val row_sum = Q.sum(axis=1) scale = 1/(.25*row_sum).sum() for i in range(4): Q[i,i] -= row_sum[i] Q *= scale return expm(Q)(length) class NewQ(TestCase): aln = LoadSeqs(data={ 'seq1': 'TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACT', 'seq2': 'TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACT'}, moltype=DNA) tree = LoadTree(tip_names=['seq1', 'seq2']) symm_nuc_probs = dict(A=0.25,T=0.25,C=0.25,G=0.25) 
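The `_dinuc_root_probs` helper above builds dinucleotide root probabilities as the outer product of two monomer distributions. A self-contained sketch of the same idea (the helper name here is illustrative):

```python
def dinuc_root_probs(x, y=None):
    # outer product of two monomer distributions over all dinucleotides
    if y is None:
        y = x
    return dict((n1 + n2, p1 * p2)
                for n1, p1 in x.items() for n2, p2 in y.items())

sym = dict(A=0.25, T=0.25, C=0.25, G=0.25)
asym = dict(A=0.1, T=0.1, C=0.4, G=0.4)
probs = dinuc_root_probs(sym, asym)
assert len(probs) == 16                        # all dinucleotides covered
assert abs(sum(probs.values()) - 1.0) < 1e-9   # still a distribution
```

Because the product factorises position-wise, these root probabilities are exactly what the `'monomers'` mprob model assumes; the `cond_root_probs` case above deliberately breaks this factorisation.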
symm_root_probs = _dinuc_root_probs(symm_nuc_probs) asymm_nuc_probs = dict(A=0.1,T=0.1,C=0.4,G=0.4) asymm_root_probs = _dinuc_root_probs(asymm_nuc_probs) posn_root_probs = _dinuc_root_probs(symm_nuc_probs, asymm_nuc_probs) cond_root_probs = dict([(n1+n2, p1*[.1, .7][n1==n2]) for n1,p1 in asymm_nuc_probs.items() for n2 in 'ATCG']) # Each of these (data, model) pairs should give a result different # from any of the simpler models applied to the same data. ordered_by_complexity = [ # P(AA) == P(GG) == P(AG) [symm_root_probs, 'tuple'], # P(GA) == P(AG) but P(AA) != P(GG) [asymm_root_probs, 'monomer'], # P(AG) == P(A?)*P(?G) but P(A?) != P(?A) [posn_root_probs, 'monomers'], # P(AG) != P(A?)*P(?G) [cond_root_probs, 'conditional'], ] def test_newQ_is_nuc_process(self): """newQ is an extension of an independent nucleotide process""" nuc = Nucleotide(motif_probs = self.asymm_nuc_probs) new_di = Nucleotide(motif_length=2, mprob_model='monomer', motif_probs = self.asymm_root_probs) nuc_lf = nuc.makeLikelihoodFunction(self.tree) new_di_lf = new_di.makeLikelihoodFunction(self.tree) # newQ branch length is exactly motif_length*nuc branch length nuc_lf.setParamRule('length', is_independent=False, init=0.2) new_di_lf.setParamRule('length', is_independent=False, init=0.4) nuc_lf.setAlignment(self.aln) new_di_lf.setAlignment(self.aln) self.assertFloatEqual(nuc_lf.getLogLikelihood(), new_di_lf.getLogLikelihood()) def test_lf_display(self): """str of likelihood functions should not fail""" for (dummy, model) in self.ordered_by_complexity: di = Nucleotide(motif_length=2, mprob_model=model) di.adaptMotifProbs(self.cond_root_probs, auto=True) lf = di.makeLikelihoodFunction(self.tree) s = str(lf) def test_get_statistics(self): """get statistics should correctly apply arguments""" for (mprobs, model) in self.ordered_by_complexity: di = Nucleotide(motif_length=2, motif_probs=mprobs, mprob_model=model) lf = di.makeLikelihoodFunction(self.tree) for wm, wt in [(True, True), (True, False), 
(False, True), (False, False)]: stats = lf.getStatistics(with_motif_probs=wm, with_titles=wt) def test_sim_alignment(self): """should be able to simulate an alignment under all models""" for (mprobs, model) in self.ordered_by_complexity: di = Nucleotide(motif_length=2, motif_probs=mprobs, mprob_model=model) lf = di.makeLikelihoodFunction(self.tree) lf.setParamRule('length', is_independent=False, init=0.4) lf.setAlignment(self.aln) sim = lf.simulateAlignment() def test_reconstruct_ancestor(self): """should be able to reconstruct ancestral sequences under all models""" for (mprobs, model) in self.ordered_by_complexity: di = Nucleotide(motif_length=2, mprob_model=model) di.adaptMotifProbs(mprobs, auto=True) lf = di.makeLikelihoodFunction(self.tree) lf.setParamRule('length', is_independent=False, init=0.4) lf.setAlignment(self.aln) ancestor = lf.reconstructAncestralSeqs() def test_results_different(self): for (i, (mprobs, dummy)) in enumerate(self.ordered_by_complexity): results = [] for (dummy, model) in self.ordered_by_complexity: di = Nucleotide(motif_length=2, motif_probs=mprobs, mprob_model=model) lf = di.makeLikelihoodFunction(self.tree) lf.setParamRule('length', is_independent=False, init=0.4) lf.setAlignment(self.aln) lh = lf.getLogLikelihood() for other in results[:i]: self.failIfAlmostEqual(other, lh, places=2) for other in results[i:]: self.assertFloatEqual(other, lh) results.append(lh) def test_position_specific_mprobs(self): """correctly compute likelihood when positions have distinct probabilities""" aln_len = len(self.aln) posn1 = [] posn2 = [] for name, seq in self.aln.todict().items(): p1 = [seq[i] for i in range(0,aln_len,2)] p2 = [seq[i] for i in range(1,aln_len,2)] posn1.append([name, ''.join(p1)]) posn2.append([name, ''.join(p2)]) # the position specific alignments posn1 = LoadSeqs(data=posn1) posn2 = LoadSeqs(data=posn2) # a newQ dinucleotide model sm = Nucleotide(motif_length=2, mprob_model='monomer', do_scaling=False) lf = 
sm.makeLikelihoodFunction(self.tree) lf.setAlignment(posn1) posn1_lnL = lf.getLogLikelihood() lf.setAlignment(posn2) posn2_lnL = lf.getLogLikelihood() expect_lnL = posn1_lnL+posn2_lnL # the joint model lf.setAlignment(self.aln) aln_lnL = lf.getLogLikelihood() # setting the full alignment, which has different motif probs, should # produce a different lnL self.failIfAlmostEqual(expect_lnL, aln_lnL) # set the arguments for taking position specific mprobs sm = Nucleotide(motif_length=2, mprob_model='monomers', do_scaling=False) lf = sm.makeLikelihoodFunction(self.tree) lf.setAlignment(self.aln) posn12_lnL = lf.getLogLikelihood() self.assertFloatEqual(expect_lnL, posn12_lnL) def test_compute_conditional_mprobs(self): """equal likelihood from position specific and conditional mprobs""" def compare_models(motif_probs, motif_length): # if the 1st and 2nd position motifs are independent of each other # then conditional is the same as positional ps = Nucleotide(motif_length=motif_length, motif_probs=motif_probs, mprob_model='monomers') cd = Nucleotide(motif_length=motif_length,motif_probs=motif_probs, mprob_model='conditional') ps_lf = ps.makeLikelihoodFunction(self.tree) ps_lf.setParamRule('length', is_independent=False, init=0.4) ps_lf.setAlignment(self.aln) cd_lf = cd.makeLikelihoodFunction(self.tree) cd_lf.setParamRule('length', is_independent=False, init=0.4) cd_lf.setAlignment(self.aln) self.assertFloatEqual(cd_lf.getLogLikelihood(), ps_lf.getLogLikelihood()) compare_models(self.posn_root_probs, 2) # trinucleotide trinuc_mprobs = _trinuc_root_probs(self.asymm_nuc_probs, self.asymm_nuc_probs, self.asymm_nuc_probs) compare_models(trinuc_mprobs, 3) def test_cond_pos_differ(self): """lnL should differ when motif probs are not multiplicative""" dinuc_probs = {'AA': 0.088506666666666664, 'AC': 0.044746666666666664, 'GT': 0.056693333333333332, 'AG': 0.070199999999999999, 'CC': 0.048653333333333333, 'TT': 0.10678666666666667, 'CG': 0.0093600000000000003, 'GG': 
0.049853333333333333, 'GC': 0.040253333333333335, 'AT': 0.078880000000000006, 'GA': 0.058639999999999998, 'TG': 0.081626666666666667, 'TA': 0.068573333333333333, 'CA': 0.06661333333333333, 'TC': 0.060866666666666666, 'CT': 0.069746666666666665} mg = Nucleotide(motif_length=2, motif_probs=dinuc_probs, mprob_model='monomer') mg_lf = mg.makeLikelihoodFunction(self.tree) mg_lf.setParamRule('length', is_independent=False, init=0.4) mg_lf.setAlignment(self.aln) cd = Nucleotide(motif_length=2, motif_probs=dinuc_probs, mprob_model='conditional') cd_lf = cd.makeLikelihoodFunction(self.tree) cd_lf.setParamRule('length', is_independent=False, init=0.4) cd_lf.setAlignment(self.aln) self.assertNotAlmostEqual(mg_lf.getLogLikelihood(), cd_lf.getLogLikelihood()) def test_getting_node_mprobs(self): """return correct motif probability vector for tree nodes""" tree = LoadTree(treestring='(a:.2,b:.2,(c:.1,d:.1):.1)') aln = LoadSeqs(data={ 'a': 'TGTG', 'b': 'TGTG', 'c': 'TGTG', 'd': 'TGTG', }) motifs = ['T', 'C', 'A', 'G'] aX = MotifChange(motifs[0], motifs[3], forward_only=True).aliased('aX') bX = MotifChange(motifs[3], motifs[0], forward_only=True).aliased('bX') edX = MotifChange(motifs[1], motifs[2], forward_only=True).aliased('edX') cX = MotifChange(motifs[2], motifs[1], forward_only=True).aliased('cX') sm = Nucleotide(predicates=[aX, bX, edX, cX], equal_motif_probs=True) lf = sm.makeLikelihoodFunction(tree) lf.setParamRule('aX', edge='a', value=8.0) lf.setParamRule('bX', edge='b', value=8.0) lf.setParamRule('edX', edge='edge.0', value=2.0) lf.setParamRule('cX', edge='c', value=0.5) lf.setParamRule('edX', edge='d', value=4.0) lf.setAlignment(aln) # we construct the hand calc variants mprobs = ones(4, float) * .25 a = make_p(.2, (0,3), 8) a = dot(mprobs, a) b = make_p(.2, (3, 0), 8) b = dot(mprobs, b) e = make_p(.1, (1, 2), 2) e = dot(mprobs, e) c = make_p(.1, (2, 1), 0.5) c = dot(e, c) d = make_p(.1, (1, 2), 4) d = dot(e, d) prob_vectors = lf.getMotifProbsByNode() 
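The hand calculation in `test_getting_node_mprobs` propagates the root motif distribution through each edge's substitution probability matrix: `pi_child = dot(pi_parent, expm(Q)(t))`, chaining `e` into `c` and `d`. A minimal numpy sketch of that propagation, using a truncated Taylor series in place of the Padé exponentiator the tests import (illustrative only):

```python
import numpy

def expm_series(Q, t, terms=30):
    # matrix exponential via truncated Taylor series; adequate for small
    # Q*t, whereas PyCogent uses a Pade approximant
    A = Q * t
    P = numpy.eye(Q.shape[0])
    term = numpy.eye(Q.shape[0])
    for k in range(1, terms):
        term = term.dot(A) / k
        P = P + term
    return P

# a toy rate matrix with equal rates and equal base frequencies
Q = numpy.ones((4, 4)) / 4.0
numpy.fill_diagonal(Q, 0.0)
numpy.fill_diagonal(Q, -Q.sum(axis=1))  # rows of Q sum to zero

pi = numpy.ones(4) / 4.0
P = expm_series(Q, 0.2)
child = pi.dot(P)  # motif probs at the child end of the edge

assert numpy.allclose(P.sum(axis=1), 1.0)  # psub rows sum to 1
assert numpy.allclose(child.sum(), 1.0)
```

For this symmetric toy matrix the uniform distribution is stationary, so `child` equals `pi`; the test above uses asymmetric predicates precisely so that the node distributions drift away from the root's.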
self.assertFloatEqual(prob_vectors['a'].array, a) self.assertFloatEqual(prob_vectors['b'].array, b) self.assertFloatEqual(prob_vectors['c'].array, c) self.assertFloatEqual(prob_vectors['d'].array, d) self.assertFloatEqual(prob_vectors['edge.0'].array, e) def _make_likelihood(model, tree, results, is_discrete=False): """creates the likelihood function""" # discrete model fails to make a likelihood function if tree has # lengths if is_discrete: kwargs={} else: kwargs=dict(expm='pade') lf = model.makeLikelihoodFunction(tree, optimise_motif_probs=True, **kwargs) if not is_discrete: for param in lf.getParamNames(): if param in ('length', 'mprobs'): continue lf.setParamRule(param, is_independent=True, upper=5) lf.setAlignment(results['aln']) return lf def MakeCachedObjects(model, tree, seq_length, opt_args): """simulates an alignment under F81, all models should be the same""" lf = model.makeLikelihoodFunction(tree) lf.setMotifProbs(dict(A=0.1,C=0.2,G=0.3,T=0.4)) aln = lf.simulateAlignment(seq_length) results = dict(aln=aln) discrete_tree = LoadTree(tip_names=aln.Names) def fit_general(results=results): if 'general' in results: return gen = General(DNA.Alphabet) gen_lf = _make_likelihood(gen, tree, results) gen_lf.optimise(**opt_args) results['general'] = gen_lf return def fit_gen_stat(results=results): if 'gen_stat' in results: return gen_stat = GeneralStationary(DNA.Alphabet) gen_stat_lf = _make_likelihood(gen_stat, tree, results) gen_stat_lf.optimise(**opt_args) results['gen_stat'] = gen_stat_lf def fit_constructed_gen(results=results): if 'constructed_gen' in results: return preds = [MotifChange(a,b, forward_only=True) for a,b in [['A', 'C'], ['A', 'G'], ['A', 'T'], ['C', 'A'], ['C', 'G'], ['C', 'T'], ['G', 'C'], ['G', 'T'], ['T', 'A'], ['T', 'C'], ['T', 'G']]] nuc = Nucleotide(predicates=preds) nuc_lf = _make_likelihood(nuc, tree, results) nuc_lf.optimise(**opt_args) results['constructed_gen'] = nuc_lf def fit_discrete(results=results): if 'discrete' in results: 
return dis_lf = _make_likelihood(DiscreteSubstitutionModel(DNA.Alphabet), discrete_tree, results, is_discrete=True) dis_lf.optimise(**opt_args) results['discrete'] = dis_lf funcs = dict(general=fit_general, gen_stat=fit_gen_stat, discrete=fit_discrete, constructed_gen=fit_constructed_gen) def call(self, obj_name): if obj_name not in results: funcs[obj_name]() return results[obj_name] return call # class DiscreteGeneral(TestCase): # """test discrete and general markov""" # tree = LoadTree(treestring='(a:0.4,b:0.4,c:0.6)') # opt_args = dict(max_restarts=1, local=True) # make_cached = MakeCachedObjects(Nucleotide(), tree, 100000, opt_args) # # def _setup_discrete_from_general(self, gen_lf): # dis_lf = self.make_cached('discrete') # for edge in self.tree: # init = gen_lf.getPsubForEdge(edge.Name) # dis_lf.setParamRule('psubs', edge=edge.Name, init=init) # dis_lf.setMotifProbs(gen_lf.getMotifProbs()) # return dis_lf # # def test_discrete_vs_general1(self): # """compares fully general models""" # gen_lf = self.make_cached('general') # gen_lnL = gen_lf.getLogLikelihood() # dis_lf = self._setup_discrete_from_general(gen_lf) # self.assertFloatEqual(gen_lnL, dis_lf.getLogLikelihood()) # # def test_general_vs_constructed_general(self): # """a constructed general lnL should be identical to General""" # sm_lf = self.make_cached('constructed_gen') # sm_lnL = sm_lf.getLogLikelihood() # gen_lf = self.make_cached('general') # gen_lnL = gen_lf.getLogLikelihood() # self.assertFloatEqualAbs(sm_lnL, gen_lnL, eps=0.1) # # def test_general_stationary(self): # """General stationary should be close to General""" # gen_stat_lf = self.make_cached('gen_stat') # gen_lf = self.make_cached('general') # gen_stat_lnL = gen_stat_lf.getLogLikelihood() # gen_lnL = gen_lf.getLogLikelihood() # self.assertLessThan(gen_stat_lnL, gen_lnL) # # def test_general_stationary_is_stationary(self): # """should be stationary""" # gen_stat_lf = self.make_cached('gen_stat') # mprobs = gen_stat_lf.getMotifProbs() # 
mprobs = array([mprobs[nuc] for nuc in DNA.Alphabet]) # for edge in self.tree: # psub = gen_stat_lf.getPsubForEdge(edge.Name) # pi = dot(mprobs, psub.array) # self.assertFloatEqual(mprobs, pi) # # def test_general_is_not_stationary(self): # """should not be stationary""" # gen_lf = self.make_cached('general') # mprobs = gen_lf.getMotifProbs() # mprobs = array([mprobs[nuc] for nuc in DNA.Alphabet]) # for edge in self.tree: # psub = gen_lf.getPsubForEdge(edge.Name) # pi = dot(mprobs, psub.array) # try: # self.assertFloatEqual(mprobs, pi) # except AssertionError: # pass if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_evolve/test_pairwise_distance.py000644 000765 000024 00000026426 12024702176 025062 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import warnings warnings.filterwarnings('ignore', 'Not using MPI as mpi4py not found') import numpy # hides the warning from taking log of -ve determinant numpy.seterr(invalid='ignore') from cogent.util.unit_test import TestCase, main from cogent import LoadSeqs, DNA, RNA, PROTEIN from cogent.evolve.pairwise_distance import get_moltype_index_array, \ seq_to_indices, _fill_diversity_matrix, \ _jc69_from_matrix, JC69Pair, _tn93_from_matrix, TN93Pair, LogDetPair from cogent.evolve._pairwise_distance import \ _fill_diversity_matrix as pyx_fill_diversity_matrix import math __author__ = "Gavin Huttley and Yicheng Zhu" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Yicheng Zhu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "Production" class TestPair(TestCase): dna_char_indices = get_moltype_index_array(DNA) rna_char_indices = get_moltype_index_array(RNA) alignment = LoadSeqs(data=[('s1', 'ACGTACGTAC'), ('s2', 'GTGTACGTAC')], moltype=DNA) ambig_alignment = LoadSeqs(data=[('s1', 'RACGTACGTACN'), ('s2', 'AGTGTACGTACA')], moltype=DNA) diff_alignment = LoadSeqs(data=[('s1', 'ACGTACGTTT'), 
('s2', 'GTGTACGTAC')], moltype=DNA) def est_char_to_index(self): """should correctly recode a DNA & RNA seqs into indices""" seq = 'TCAGRNY?-' expected = [0, 1, 2, 3, -9, -9, -9, -9, -9] indices = seq_to_indices(seq, self.dna_char_indices) self.assertEquals(indices, expected) seq = 'UCAGRNY?-' indices = seq_to_indices(seq, self.rna_char_indices) self.assertEquals(indices, expected) def est_fill_diversity_matrix_all(self): """make correct diversity matrix when all chars valid""" s1 = seq_to_indices('ACGTACGTAC', self.dna_char_indices) s2 = seq_to_indices('GTGTACGTAC', self.dna_char_indices) matrix = numpy.zeros((4,4), float) # self-self should just be an identity matrix _fill_diversity_matrix(matrix, s1, s1) self.assertEquals(matrix.sum(), len(s1)) self.assertEquals(matrix, numpy.array([[2,0,0,0], [0,3,0,0], [0,0,3,0], [0,0,0,2]], float)) # small diffs matrix.fill(0) _fill_diversity_matrix(matrix, s1, s2) self.assertEquals(matrix, numpy.array([[2,0,0,0], [1,2,0,0], [0,0,2,1], [0,0,0,2]], float)) def est_fill_diversity_matrix_some(self): """make correct diversity matrix when not all chars valid""" s1 = seq_to_indices('RACGTACGTACN', self.dna_char_indices) s2 = seq_to_indices('AGTGTACGTACA', self.dna_char_indices) matrix = numpy.zeros((4,4), float) # small diffs matrix.fill(0) _fill_diversity_matrix(matrix, s1, s2) self.assertEquals(matrix, numpy.array([[2,0,0,0], [1,2,0,0], [0,0,2,1], [0,0,0,2]], float)) def est_python_vs_cython_fill_matrix(self): """python & cython fill_diversity_matrix give same answer""" s1 = seq_to_indices('RACGTACGTACN', self.dna_char_indices) s2 = seq_to_indices('AGTGTACGTACA', self.dna_char_indices) matrix1 = numpy.zeros((4,4), float) _fill_diversity_matrix(matrix1, s1, s2) matrix2 = numpy.zeros((4,4), float) pyx_fill_diversity_matrix(matrix2, s1, s2) self.assertFloatEqual(matrix1, matrix2) def est_jc69_from_matrix(self): """compute JC69 from diversity matrix""" s1 = seq_to_indices('ACGTACGTAC', self.dna_char_indices) s2 = 
seq_to_indices('GTGTACGTAC', self.dna_char_indices) matrix = numpy.zeros((4,4), float) _fill_diversity_matrix(matrix, s1, s2) total, p, dist, var = _jc69_from_matrix(matrix) self.assertEquals(total, 10.0) self.assertEquals(p, 0.2) def est_jc69_from_alignment(self): """compute JC69 dists from an alignment""" calc = JC69Pair(DNA, alignment=self.alignment) calc.run() self.assertEquals(calc.Lengths['s1', 's2'], 10) self.assertEquals(calc.Proportions['s1', 's2'], 0.2) # value from OSX MEGA 5 self.assertFloatEqual(calc.Dists['s1', 's2'], 0.2326161962) # value**2 from OSX MEGA 5 self.assertFloatEqual(calc.Variances['s1', 's2'], 0.029752066125078681) # value from OSX MEGA 5 self.assertFloatEqual(calc.StdErr['s1', 's2'], 0.1724878724) # same answer when using ambiguous alignment calc.run(self.ambig_alignment) self.assertFloatEqual(calc.Dists['s1', 's2'], 0.2326161962) # but different answer if subsequent alignment is different calc.run(self.diff_alignment) self.assertTrue(calc.Dists['s1', 's2'] != 0.2326161962) def est_tn93_from_matrix(self): """compute TN93 distances""" calc = TN93Pair(DNA, alignment=self.alignment) calc.run() self.assertEquals(calc.Lengths['s1', 's2'], 10) self.assertEquals(calc.Proportions['s1', 's2'], 0.2) # value from OSX MEGA 5 self.assertFloatEqual(calc.Dists['s1', 's2'], 0.2554128119) # value**2 from OSX MEGA 5 self.assertFloatEqual(calc.Variances['s1', 's2'], 0.04444444445376601) # value from OSX MEGA 5 self.assertFloatEqual(calc.StdErr['s1', 's2'], 0.2108185107) # same answer when using ambiguous alignment calc.run(self.ambig_alignment) self.assertFloatEqual(calc.Dists['s1', 's2'], 0.2554128119) # but different answer if subsequent alignment is different calc.run(self.diff_alignment) self.assertTrue(calc.Dists['s1', 's2'] != 0.2554128119) def est_distance_pair(self): """get distances dict""" calc = TN93Pair(DNA, alignment=self.alignment) calc.run() dists = calc.getPairwiseDistances() dist = 0.2554128119 expect = {('s1', 's2'): dist, ('s2', 's1'): 
dist} self.assertEquals(dists.keys(), expect.keys()) self.assertFloatEqual(dists.values(), expect.values()) def est_logdet_pair_dna(self): """logdet should produce distances that match MEGA""" aln = LoadSeqs('data/brca1_5.paml', moltype=DNA) logdet_calc = LogDetPair(moltype=DNA, alignment=aln) logdet_calc.run(use_tk_adjustment=True) dists = logdet_calc.getPairwiseDistances() all_expected = {('Human', 'NineBande'): 0.075336929999999996, ('NineBande', 'DogFaced'): 0.0898575452, ('DogFaced', 'Human'): 0.1061747919, ('HowlerMon', 'DogFaced'): 0.0934480008, ('Mouse', 'HowlerMon'): 0.26422862920000001, ('NineBande', 'Human'): 0.075336929999999996, ('HowlerMon', 'NineBande'): 0.062202897899999998, ('DogFaced', 'NineBande'): 0.0898575452, ('DogFaced', 'HowlerMon'): 0.0934480008, ('Human', 'DogFaced'): 0.1061747919, ('Mouse', 'Human'): 0.26539976700000001, ('NineBande', 'HowlerMon'): 0.062202897899999998, ('HowlerMon', 'Human'): 0.036571181899999999, ('DogFaced', 'Mouse'): 0.2652555144, ('HowlerMon', 'Mouse'): 0.26422862920000001, ('Mouse', 'DogFaced'): 0.2652555144, ('NineBande', 'Mouse'): 0.22754789210000001, ('Mouse', 'NineBande'): 0.22754789210000001, ('Human', 'Mouse'): 0.26539976700000001, ('Human', 'HowlerMon'): 0.036571181899999999} for pair in dists: got = dists[pair] expected = all_expected[pair] self.assertFloatEqual(got, expected) def est_logdet_tk_adjustment(self): """logdet using tamura kumar differs from classic""" aln = LoadSeqs('data/brca1_5.paml', moltype=DNA) logdet_calc = LogDetPair(moltype=DNA, alignment=aln) logdet_calc.run(use_tk_adjustment=True, show_progress=False) tk = logdet_calc.getPairwiseDistances() logdet_calc.run(use_tk_adjustment=False, show_progress=False) not_tk = logdet_calc.getPairwiseDistances() self.assertNotEqual(tk, not_tk) def est_logdet_pair_aa(self): """logdet shouldn't fail to produce distances for aa seqs""" aln = LoadSeqs('data/brca1_5.paml', moltype=DNA) aln = aln.getTranslation() logdet_calc = LogDetPair(moltype=PROTEIN, 
alignment=aln) logdet_calc.run(use_tk_adjustment=True, show_progress=False) dists = logdet_calc.getPairwiseDistances() def test_logdet_missing_states(self): """should calculate logdet measurement with missing states""" data = [('seq1', "GGGGGGGGGGGCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGCGGTTTTTTTTTTTTTTTTTT"), ('seq2', "TAAAAAAAAAAGGGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCC")] aln = LoadSeqs(data=data, moltype=DNA) logdet_calc = LogDetPair(moltype=DNA, alignment=aln) logdet_calc.run(use_tk_adjustment=True, show_progress=False) dists = logdet_calc.getPairwiseDistances() self.assertTrue(dists.values()[0] is not None) logdet_calc.run(use_tk_adjustment=False, show_progress=False) dists = logdet_calc.getPairwiseDistances() self.assertTrue(dists.values()[0] is not None) def test_logdet_variance(self): """calculate logdet variance consistent with hand calculation""" data = [('seq1', "GGGGGGGGGGGCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGCGGTTTTTTTTTTTTTTTTTT"), ('seq2', "TAAAAAAAAAAGGGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCC")] aln = LoadSeqs(data=data, moltype=DNA) logdet_calc = LogDetPair(moltype=DNA, alignment=aln) logdet_calc.run(use_tk_adjustment=True, show_progress=False) self.assertFloatEqual(logdet_calc.Variances[1,1], 0.5267, eps=1e-3) logdet_calc.run(use_tk_adjustment=False, show_progress=False) dists = logdet_calc.getPairwiseDistances() self.assertFloatEqual(logdet_calc.Variances[1,1], 0.4797, eps=1e-3) def est_logdet_for_determinant_lte_zero(self): """returns distance of None if the determinant is <= 0""" data = dict(seq1="AGGGGGGGGGGCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGCGGTTTTTTTTTTTTTTTTTT", seq2="TAAAAAAAAAAGGGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCC") aln = LoadSeqs(data=data, moltype=DNA) logdet_calc = LogDetPair(moltype=DNA, alignment=aln) logdet_calc.run(use_tk_adjustment=True, show_progress=False) dists = logdet_calc.getPairwiseDistances() self.assertTrue(dists.values()[0] is None) logdet_calc.run(use_tk_adjustment=False, show_progress=False) 
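The LogDet tests check that a non-positive determinant yields a distance of `None`. For intuition, a simplified paralinear/LogDet computation on a joint divergence matrix — an illustrative sketch under the usual textbook form, not `LogDetPair`'s implementation (which also supports the Tamura-Kumar adjustment exercised above):

```python
import math
import numpy

def logdet_distance(J):
    """Paralinear/LogDet distance from a 4x4 joint divergence matrix J
    (entries sum to 1); None when the determinant is non-positive."""
    det = numpy.linalg.det(J)
    if det <= 0:
        return None
    f = J.sum(axis=1)  # marginal base frequencies, one sequence
    g = J.sum(axis=0)  # marginal base frequencies, the other
    return -0.25 * (math.log(det) - 0.5 * math.log(numpy.prod(f * g)))

identical = numpy.diag([0.25] * 4)       # no divergence
assert abs(logdet_distance(identical)) < 1e-9
saturated = numpy.ones((4, 4)) / 16.0    # rank 1, det == 0
assert logdet_distance(saturated) is None
```

The `None` branch corresponds to the saturated/degenerate case tested in `est_logdet_for_determinant_lte_zero` above.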
dists = logdet_calc.getPairwiseDistances()
        self.assertTrue(dists.values()[0] is None)

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_evolve/test_parameter_controller.py000644 000765 000024 00000020112 12024702176 025572 0ustar00jrideoutstaff000000 000000 #! /usr/bin/env python
# Matthew Wakefield Feb 2004

from __future__ import with_statement

import unittest
import os
import warnings

from cogent import LoadSeqs, LoadTree
import cogent.evolve.parameter_controller, cogent.evolve.substitution_model
from cogent.maths import optimisers

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

base_path = os.getcwd()
data_path = os.path.join(base_path, 'data')

good_rule_sets = [
    [
        {'par_name': 'length', 'is_independent': True},
    ],
    [
        {'par_name': 'length', 'is_independent': True},
    ],
    [
        {'par_name': 'length', 'is_clade': True, 'is_independent': True,
         'edges': ['a', 'b']},
    ],
    [
        {'par_name': 'length', 'is_independent': True,
         'edges': ['a', 'c', 'e']},
    ],
    [
        {'par_name': 'length', 'is_independent': True, 'edge': 'a'},
    ],
]
bad_rule_sets = [
    [
        {'par_name': 'length', 'is_clade': True, 'edges': ['b', 'f']},
    ],
]

class test_parameter_controller(unittest.TestCase):
    """Testing Parameter Controller"""
    def setUp(self):
        # length all edges 1 except c=2.
# b&d transitions all other transversions
        self.al = LoadSeqs(
            data={'a': 'tata', 'b': 'tgtc', 'c': 'gcga', 'd': 'gaac',
                  'e': 'gagc'})
        self.tree = LoadTree(treestring='((a,b),(c,d),e);')
        self.model = cogent.evolve.substitution_model.Nucleotide(
            do_scaling=True, equal_motif_probs=True, model_gaps=True)

    def test_scoped_local(self):
        model = cogent.evolve.substitution_model.Nucleotide(
            do_scaling=True, equal_motif_probs=True, model_gaps=True,
            predicates={'kappa': 'transition'})
        lf = model.makeLikelihoodFunction(self.tree)
        lf.setConstantLengths()
        lf.setAlignment(self.al)
        null = lf.getNumFreeParams()
        lf.setParamRule(par_name='kappa', is_independent=True,
                        edges=['b', 'd'])
        self.assertEqual(null + 2, lf.getNumFreeParams())

    def test_setMotifProbs(self):
        """Mprobs supplied to the parameter controller"""
        model = cogent.evolve.substitution_model.Nucleotide(
            model_gaps=True, motif_probs=None)
        lf = model.makeLikelihoodFunction(self.tree,
            motif_probs_from_align=False)
        mprobs = {'A': 0.1, 'C': 0.2, 'G': 0.2, 'T': 0.5, '-': 0.0}
        lf.setMotifProbs(mprobs)
        self.assertEqual(lf.getMotifProbs(), mprobs)
        lf.setMotifProbsFromData(self.al[:1], is_constant=True)
        self.assertEqual(lf.getMotifProbs()['G'], 0.6)
        lf.setMotifProbsFromData(self.al[:1], pseudocount=1)
        self.assertNotEqual(lf.getMotifProbs()['G'], 0.6)
        # test with consideration of ambiguous states
        al = LoadSeqs(data={'seq1': 'ACGTAAGNA', 'seq2': 'ACGTANGTC',
                            'seq3': 'ACGTACGTG'})
        lf.setMotifProbsFromData(al, include_ambiguity=True,
                                 is_constant=True)
        motif_probs = dict(lf.getMotifProbs())
        correct_probs = {'A': 8.5/27, 'C': 5.5/27, '-': 0.0,
                         'T': 5.5/27, 'G': 7.5/27}
        self.assertEqual(motif_probs, correct_probs)
        self.assertEqual(sum(motif_probs.values()), 1.0)

    def test_setMultiLocus(self):
        """2 loci each with own mprobs"""
        model = cogent.evolve.substitution_model.Nucleotide(motif_probs=None)
        lf = model.makeLikelihoodFunction(self.tree,
            motif_probs_from_align=False, loci=["a", "b"])
        mprobs_a = dict(A=.2, T=.2, C=.3, G=.3)
        mprobs_b = dict(A=.1, T=.2, C=.3,
                        G=.4)
        for is_constant in [False, True]:
            lf.setMotifProbs(mprobs_a, is_constant=is_constant)
            s = str(lf)
            lf.setMotifProbs(mprobs_b, locus="b")
            self.assertEqual(lf.getMotifProbs(locus="a"), mprobs_a)
            self.assertEqual(lf.getMotifProbs(locus="b"), mprobs_b)
            s = str(lf)
        #lf.setParamRule('mprobs', is_independent=False)

    def test_setParamRules(self):
        lf = self.model.makeLikelihoodFunction(self.tree)
        def do_rules(rule_set):
            for rule in rule_set:
                lf.setParamRule(**rule)
        for rule_set in good_rule_sets:
            lf.setDefaultParamRules()
            do_rules(rule_set)
        for rule_set in bad_rule_sets:
            lf.setDefaultParamRules()
            self.assertRaises((KeyError, TypeError, AssertionError,
                               ValueError), do_rules, rule_set)

    def test_setLocalClock(self):
        pass

    def test_setConstantLengths(self):
        t = LoadTree(treestring='((a:1,b:2):3,(c:4,d:5):6,e:7);')
        lf = self.model.makeLikelihoodFunction(t)
        lf.setParamRule('length', is_constant=True)
        # lf.setConstantLengths(t)
        lf.setAlignment(self.al)
        self.assertEqual(lf.getParamValue('length', 'b'), 2)
        self.assertEqual(lf.getParamValue('length', 'd'), 5)

    def test_pairwise_clock(self):
        al = LoadSeqs(data={'a': 'agct', 'b': 'ggct'})
        tree = LoadTree(treestring='(a,b);')
        model = cogent.evolve.substitution_model.Dinucleotide(
            do_scaling=True, equal_motif_probs=True, model_gaps=True,
            mprob_model='tuple')
        lf = model.makeLikelihoodFunction(tree)
        lf.setLocalClock('a', 'b')
        lf.setAlignment(al)
        lf.optimise(local=True)
        rd = lf.getParamValueDict(['edge'], params=['length'])
        self.assertAlmostEquals(lf.getLogLikelihood(), -10.1774488956)
        self.assertEqual(rd['length']['a'], rd['length']['b'])

    def test_local_clock(self):
        lf = self.model.makeLikelihoodFunction(self.tree)
        lf.setLocalClock('c', 'd')
        lf.setAlignment(self.al)
        lf.optimise(local=True, tolerance=1e-8, max_restarts=2)
        rd = lf.getParamValueDict(['edge'], params=['length'])
        self.assertAlmostEquals(lf.getLogLikelihood(), -27.84254174)
        self.assertEqual(rd['length']['c'], rd['length']['d'])
        self.assertNotEqual(rd['length']['a'], rd['length']['e'])

    def test_complex_parameter_rules(self):
        # This test has many local minima and so does not cope
        # with changes to optimiser details.
        model = cogent.evolve.substitution_model.Nucleotide(
            do_scaling=True, equal_motif_probs=True, model_gaps=True,
            predicates={'kappa': 'transition'})
        lf = model.makeLikelihoodFunction(self.tree)
        lf.setParamRule(par_name='kappa', is_independent=True)
        lf.setParamRule(par_name='kappa', is_independent=False,
                        edges=['b', 'd'])
        lf.setConstantLengths(LoadTree(
            treestring='((a:1,b:1):1,(c:2,d:1):1,e:1);'))
        #print self.pc
        lf.setAlignment(self.al)
        lf.optimise(local=True)
        rd = lf.getParamValueDict(['edge'], params=['kappa'])
        self.assertAlmostEquals(lf.getLogLikelihood(), -27.3252, 3)
        self.assertEqual(rd['kappa']['b'], rd['kappa']['d'])
        self.assertNotEqual(rd['kappa']['a'], rd['kappa']['b'])

    def test_bounds(self):
        """Test setting upper and lower bounds for parameters"""
        lf = self.model.makeLikelihoodFunction(self.tree)
        lf.setParamRule('length', value=3, lower=0, upper=5)

        # Out of bounds value should warn and keep bounded
        with warnings.catch_warnings(record=True) as w:
            lf.setParamRule('length', lower=0, upper=2)
            self.assertTrue(len(w), 'No warning issued')
        self.assertEqual(lf.getParamValue('length', edge='a'), 2)

        # upper < lower bounds should fail
        self.assertRaises(ValueError, lf.setParamRule, 'length',
                          lower=2, upper=0)

if __name__ == '__main__':
    unittest.main()

PyCogent-1.5.3/tests/test_evolve/test_scale_rules.py
#!/usr/bin/env python

import unittest

from cogent import LoadTree
from cogent.evolve import substitution_model

def a_c(x, y):
    return (x == 'A' and y == 'C') or (x == 'C' and y == 'A')

from cogent.evolve.predicate import MotifChange, replacement

__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
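The scale-rule tests below split a branch length into per-class expected substitutions (e.g. `ts`/`tv`). As a rough, self-contained illustration of the arithmetic they check (pure Python, independent of cogent; the names `is_transition` and `scaled_class_lengths` are invented for this sketch, not the cogent API):

```python
# Hypothetical stand-ins for MotifChange-style predicates; not the cogent API.
def is_transition(x, y):
    # A<->G or C<->T are the transitions
    return {x, y} in ({'A', 'G'}, {'C', 'T'})

def scaled_class_lengths(kappa, length):
    """Split a branch length into transition/transversion components.

    With equal motif probs each nucleotide has 1 transition partner
    (relative rate kappa) and 2 transversion partners (rate 1), so the
    expected fractions are kappa/(kappa + 2) and 2/(kappa + 2).
    """
    total = kappa + 2.0
    return {'ts': length * kappa / total, 'tv': length * 2.0 / total}

# Mirrors test_scaled below: kappa=6, length=4 -> {'ts': 3.0, 'tv': 1.0}
print(scaled_class_lengths(6.0, 4.0))
```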
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

a_c = MotifChange('A', 'C')
trans = MotifChange('A', 'G') | MotifChange('T', 'C')

TREE = LoadTree(tip_names='ab')

class ScaleRuleTests(unittest.TestCase):
    def _makeModel(self, do_scaling, predicates, scale_rules=[]):
        return substitution_model.Nucleotide(
            do_scaling=do_scaling, equal_motif_probs=True,
            model_gaps=False, predicates=predicates,
            scales=scale_rules)

    def _getScaledLengths(self, model, params):
        LF = model.makeLikelihoodFunction(TREE)
        for param in params:
            LF.setParamRule(param, value=params[param], is_constant=True)
        result = {}
        for predicate in model.scale_masks:
            result[predicate] = LF.getScaledLengths(predicate)['a']
        return result

    def test_scaled(self):
        """Scale rule requiring matrix entries to have all pars specified"""
        model = self._makeModel(True, {'k': trans},
                                {'ts': trans, 'tv': ~trans})
        self.assertEqual(
            self._getScaledLengths(model, {'k': 6.0, 'length': 4.0}),
            {'ts': 3.0, 'tv': 1.0})

    def test_binned(self):
        model = self._makeModel(True, {'k': trans},
                                {'ts': trans, 'tv': ~trans})
        LF = model.makeLikelihoodFunction(TREE, bins=2)
        LF.setParamRule('length', value=4.0, is_constant=True)
        LF.setParamRule('k', value=6.0, bin='bin0', is_constant=True)
        LF.setParamRule('k', value=1.0, bin='bin1', is_constant=True)
        for (bin, expected) in [('bin0', 3.0), ('bin1', 4.0/3),
                                (None, 13.0/6)]:
            self.assertEqual(LF.getScaledLengths('ts', bin=bin)['a'],
                             expected)

    def test_unscaled(self):
        """Scale rule on a model which has scaling performed after
        calculation rather than during it"""
        model = self._makeModel(False, {'k': trans},
                                {'ts': trans, 'tv': ~trans})
        self.assertEqual(
            self._getScaledLengths(model, {'k': 6.0, 'length': 2.0}),
            {'ts': 3.0, 'tv': 1.0})

    def test_scaled_or(self):
        """Scale rule where matrix entries can have any of the pars
        specified"""
        model = self._makeModel(True, {'k': trans, 'ac': a_c},
                                {'or': (trans | a_c),
                                 'not': ~(trans |
                                          a_c)})
        self.assertEqual(
            self._getScaledLengths(model,
                                   {'k': 6.0, 'length': 6.0, 'ac': 3.0}),
            {'or': 5.0, 'not': 1.0})

    def test_scaling(self):
        """Testing scaling calculations using Dn and Ds as an example."""
        model = substitution_model.Codon(
            do_scaling=True, model_gaps=False, recode_gaps=True,
            predicates={'k': trans, 'r': replacement},
            motif_probs={
                'TAT': 0.0088813702685557206, 'TGT': 0.020511736096426307,
                'TCT': 0.024529498836963416, 'TTT': 0.019454430112074435,
                'TGC': 0.0010573059843518714, 'TGG': 0.0042292239374074857,
                'TAC': 0.002326073165574117, 'TTC': 0.0086699090716853451,
                'TCG': 0.0010573059843518714, 'TTA': 0.020723197293296681,
                'TTG': 0.01036159864664834, 'TCC': 0.0082469866779445976,
                'TCA': 0.022414886868259674, 'GCA': 0.015648128568407697,
                'GTA': 0.014590822584055826, 'GCC': 0.0095157538591668436,
                'GTC': 0.0063438359061112285, 'GCG': 0.0016916895749629942,
                'GTG': 0.0067667582998519769, 'CAA': 0.018185662930852189,
                'GTT': 0.021569042080778176, 'GCT': 0.014167900190315077,
                'ACC': 0.0042292239374074857, 'GGT': 0.014167900190315077,
                'CGA': 0.0012687671812222456, 'CGC': 0.0010573059843518714,
                'GAT': 0.030238951152463524, 'AAG': 0.034891097483611758,
                'CGG': 0.002326073165574117, 'ACT': 0.028758722774370905,
                'GGG': 0.0071896806935927262, 'GGA': 0.016282512159018821,
                'GGC': 0.0090928314654260944, 'GAG': 0.031296257136815393,
                'AAA': 0.05476844998942694, 'GAC': 0.011207443434129837,
                'CGT': 0.0033833791499259885, 'GAA': 0.076337492070205112,
                'CTT': 0.010573059843518714, 'ATG': 0.012687671812222457,
                'ACA': 0.021991964474518927, 'ACG': 0.00084584478748149711,
                'ATC': 0.0076126030873334746, 'AAC': 0.022837809262000422,
                'ATA': 0.017762740537111441, 'AGG': 0.013533516599703954,
                'CCT': 0.025586804821315288, 'AGC': 0.029393106364982026,
                'AGA': 0.021991964474518927, 'CAT': 0.021357580883907802,
                'AAT': 0.05772890674561218, 'ATT': 0.019031507718333687,
                'CTG': 0.012899133009092831, 'CTA': 0.013744977796574329,
                'CTC': 0.0078240642842038483, 'CAC': 0.0050750687248889825,
                'CCG': 0.00021146119687037428, 'AGT': 0.03742863184605625,
                'CAG': 0.024106576443222668, 'CCA': 0.021357580883907802,
                'CCC': 0.0069782194967223515},
            scales={'dN': replacement, 'dS': ~replacement},
            mprob_model='tuple',
            )
        length = 0.1115
        a = self._getScaledLengths(model,
                                   {'k': 3.6491, 'r': 0.6317,
                                    'length': length})
        b = self._getScaledLengths(model,
                                   {'k': 3.6491, 'r': 1.0,
                                    'length': length})
        dN = length * a['dN'] / (3.0 * b['dN'])
        dS = length * a['dS'] / (3.0 * b['dS'])
        # following are results from PAML
        self.assertEqual('%.4f' % dN, '0.0325')
        self.assertEqual('%.4f' % dS, '0.0514')

if __name__ == '__main__':
    unittest.main()

PyCogent-1.5.3/tests/test_evolve/test_simulation.py
#!/usr/bin/env python

"""Testing the alignment simulation code. We first create a simple
Jukes-Cantor model using a four-taxon tree with very different branch
lengths, and a Kimura two (really one) parameter model. The test is to
re-estimate the parameter values as accurately as possible."""
import sys

from cogent.core import alignment, tree
from cogent.evolve import substitution_model

__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

# specify the 4 taxon tree, and a 'dummy' alignment
t = tree.LoadTree(treestring='(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1)root;')
#al = alignments.LoadSeqs(data={'a':'a','b':'a','c':'a','d':'a'})

# how long the simulated alignments should be
# at 1000000 the estimates get nice and close
length_of_align = 10000

#########################
#
# For a Jukes-Cantor model
#
#########################

sm = substitution_model.Nucleotide()
lf = sm.makeLikelihoodFunction(t)
lf.setConstantLengths()
lf.setName('True JC model')
print lf
simulated = \
    lf.simulateAlignment(sequence_length=length_of_align)
print simulated

new_lf = sm.makeLikelihoodFunction(t)
# setAlignment() modifies the function in place (it returns None), so the
# result must not be assigned back to new_lf
new_lf.setAlignment(simulated)
new_lf.optimise(tolerance=1.0)
new_lf.optimise(local=True)
new_lf.setName('Estimated JC model')
print new_lf

#########################
#
# a Kimura model
#
#########################

# has a ts/tv term, different values for every edge
sm = substitution_model.Nucleotide(predicates={'kappa': 'transition'})
lf = sm.makeLikelihoodFunction(t)
lf.setConstantLengths()
lf.setParamRule('kappa', is_constant=True, value=4.0, edge_name='a')
lf.setParamRule('kappa', is_constant=True, value=0.5, edge_name='b')
lf.setParamRule('kappa', is_constant=True, value=0.2, edge_name='c')
lf.setParamRule('kappa', is_constant=True, value=3.0, edge_name='d')
lf.setParamRule('kappa', is_constant=True, value=2.0, edge_name='edge.0')
lf.setName('True Kappa model')
print lf
simulated = lf.simulateAlignment(sequence_length=length_of_align)
print simulated

new_lf = sm.makeLikelihoodFunction(t)
new_lf.setParamRule('kappa', is_independent=True)
new_lf.setAlignment(simulated)
new_lf.optimise(tolerance=1.0)
new_lf.optimise(local=True)
new_lf.setName('Estimated Kappa model')
print new_lf

PyCogent-1.5.3/tests/test_evolve/test_substitution_model.py
#!/usr/bin/env python

import os

from cogent import LoadSeqs, CodonAlphabet, DNA, LoadTable
from cogent.core import genetic_code
from cogent.evolve import substitution_model, substitution_calculation
from cogent.util.unit_test import TestCase, main

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

base_path = os.getcwd()
data_path = os.path.join(base_path, 'data')

class \
NucleotideModelTestMethods(TestCase):
    def setUp(self):
        self.submodel = substitution_model.Nucleotide(
            do_scaling=True, model_gaps=False)

    def test_isTransition(self):
        """testing isTransition"""
        isTransition = self.submodel.getPredefinedPredicate('transition')
        assert isTransition('A', 'G')
        assert isTransition('C', 'T')
        assert not isTransition('A', 'T')
        assert not isTransition('C', 'G')

    def test_isTransversion(self):
        """testing isTransversion"""
        isTransversion = self.submodel.getPredefinedPredicate('transversion')
        assert not isTransversion('A', 'G')
        assert not isTransversion('C', 'T')
        assert isTransversion('A', 'T')
        assert isTransversion('C', 'G')

    def test_isIndel(self):
        """testing indel comparison nucleotide model"""
        model = substitution_model.Nucleotide(
            do_scaling=True, model_gaps=True)
        isIndel = model.getPredefinedPredicate('indel')
        assert isIndel('A', '-')
        assert isIndel('-', 'G')
        #assert not self.submodel.isIndel('-', '-')
        assert not isIndel('a', 't')

    def test_PredicateChecks(self):
        # overparameterisation
        self.assertRaises(ValueError, substitution_model.Nucleotide,
                          model_gaps=False,
                          predicates=['transition', 'transversion'])

class MultiLetterMotifSubstModelTests(TestCase):
    def setUp(self):
        self.submodel = substitution_model.Dinucleotide(
            do_scaling=True, model_gaps=True, mprob_model='tuple')

    def test_asciiArt(self):
        model = substitution_model.Dinucleotide(mprob_model='tuple',
                                                predicates=['k:transition'])
        model.asciiArt()
        model = substitution_model.Dinucleotide(mprob_model='tuple')
        model.asciiArt()

    def test_isIndel(self):
        """testing indel comparison for dinucleotide model"""
        # these are non-instantaneous
        isIndel = self.submodel.getPredefinedPredicate('indel')
        assert not isIndel('AA', '--')
        assert not isIndel('--', 'CT')
        #assert not self.submodel.isIndel('--', '--')
        assert not isIndel('AT', 'AA')
        assert isIndel('AA', 'A-')
        assert isIndel('-A', 'CA')
        assert isIndel('TA', '-A')
        # isIndel can now assume it won't get any non-instantaneous pairs
        # assert
        #     self.submodel.isIndel('-a', 'a-') == 0

class TupleModelMotifProbFuncs(TestCase):
    dinucs = ('TT', 'CT', 'AT', 'GT',
              'TC', 'CC', 'AC', 'GC',
              'TA', 'CA', 'AA', 'GA',
              'TG', 'CG', 'AG', 'GG')
    nuc_probs = [('T', 0.1), ('C', 0.2), ('A', 0.3), ('G', 0.4)]
    dinuc_probs = [(m2 + m1, p1 * p2) for m1, p1 in nuc_probs
                   for m2, p2 in nuc_probs]
    mat_indices = dict(
        C=set([(0,1),(0,4),(1,5),(2,1),(2,6),(3,1),(3,7),(4,5),(6,5),
               (7,5),(8,4),(8,9),(9,5),(10,6),(10,9),(11,7),(11,9),(12,4),
               (12,13),(13,5),(14,6),(14,13),(15,7),(15,13)]),
        A=set([(0,2),(0,8),(1,2),(1,9),(2,10),(3,2),(3,11),(4,6),(4,8),
               (5,6),(5,9),(6,10),(7,6),(7,11),(8,10),(9,10),(11,10),
               (12,8),(12,14),(13,9),(13,14),(14,10),(15,11),(15,14)]),
        G=set([(0,3),(0,12),(1,3),(1,13),(2,3),(2,14),(3,15),(4,7),(4,12),
               (5,7),(5,13),(6,7),(6,14),(7,15),(8,11),(8,12),(9,11),
               (9,13),(10,11),(10,14),(11,15),(12,15),(13,15),(14,15)]),
        T=set([(1,0),(2,0),(3,0),(4,0),(5,1),(5,4),(6,2),(6,4),(7,3),
               (7,4),(8,0),(9,1),(9,8),(10,2),(10,8),(11,3),(11,8),(12,0),
               (13,1),(13,12),(14,2),(14,12),(15,3),(15,12)]),
        )

class ThreeLetterMotifSubstModelTests(TestCase):
    def setUp(self):
        self.submodel = substitution_model.Nucleotide(motif_length=3,
                                                      mprob_model='tuple')

    def test_isIndel(self):
        """testing indel comparison for trinucleotide model"""
        isIndel = self.submodel.getPredefinedPredicate('indel')
        assert isIndel('AAA', 'AA-')
        assert isIndel('-CA', 'CCA')
        assert isIndel('TAC', 'T-C')
        # isIndel can now assume it won't get any non-instantaneous pairs
        assert not isIndel('AAA', '---')
        assert not isIndel('---', 'CTT')
        assert not isIndel('AAA', '--A')
        assert not isIndel('C--', 'CTT')

class CodonSubstModelTests(TestCase):
    def setUp(self):
        self.standardcode = substitution_model.Codon(model_gaps=True, gc=1,
                                                     mprob_model='tuple')
        self.mitocode = substitution_model.Codon(model_gaps=False, gc=2,
                                                 mprob_model='tuple')

    def test_isTransition(self):
        """testing codon isTransition"""
        isTransition = self.standardcode.getPredefinedPredicate('transition')
        # first position
        assert \
            isTransition('TGC', 'CGC')
        assert isTransition('GGC', 'AGC')
        # second position
        assert isTransition('CTT', 'CCT')
        assert isTransition('CAT', 'CGT')
        # third position
        assert isTransition('CTT', 'CTC')
        assert isTransition('CTA', 'CTG')
        # mito code
        assert isTransition('CTT', 'CTC')
        assert isTransition('CTA', 'CTG')
        assert not isTransition('GAG', 'GTG')
        assert not isTransition('CCC', 'CGC')

    def test_isReplacement(self):
        """test isReplacement for the two major genetic codes"""
        isReplacement = self.standardcode.getPredefinedPredicate(
            'replacement')
        # for the standard code, a replacement
        assert isReplacement('CTG', 'ATG')
        assert not isReplacement('AGT', 'TCC')
        assert not isReplacement('CTG', '---')
        assert not isReplacement('---', 'CTA')
        # for the vertebrate mitochondrial code, instantaneous replacement
        isReplacement = self.mitocode.getPredefinedPredicate('replacement')
        assert isReplacement('AAA', 'AAC')

    def test_isSilent(self):
        """testing isSilent for the two major genetic codes"""
        isSilent = self.standardcode.getPredefinedPredicate('silent')
        assert isSilent('CTA', 'CTG')
        assert not isSilent('AGT', 'AAG')
        assert not isSilent('CTG', '---')
        assert not isSilent('---', 'CTG')
        # for the vertebrate mitochondrial code
        isSilent = self.mitocode.getPredefinedPredicate('silent')
        assert isSilent('TCC', 'TCA')

    def test_isIndel(self):
        """test isIndel for codon model"""
        isIndel = self.standardcode.getPredefinedPredicate('indel')
        assert isIndel('CTA', '---')
        assert not isIndel('---', '---')
        assert isIndel('---', 'TTC')

    def test_str_(self):
        """str() and repr() of a substitution model"""
        s = str(self.standardcode)
        r = repr(self.standardcode)

class ModelDataInteractionTestMethods(TestCase):
    def test_excludinggaps(self):
        """testing excluding gaps from model"""
        model = substitution_model.Nucleotide(model_gaps=False)
        assert len(model.getAlphabet()) == 4

    def test_includinggaps(self):
        """testing including gaps in model"""
        model = substitution_model.Nucleotide(model_gaps=True)
        assert len(model.getAlphabet()) == 5

    def \
test_getMotifs(self):
        """testing return of motifs"""
        model_motifs = substitution_model.Nucleotide().getMotifs()

    def test_getParamList(self):
        """testing getting the parameter list"""
        model = substitution_model.Nucleotide()
        self.assertEqual(model.getParamList(), [])

        model = substitution_model.Nucleotide(
            predicates=['beta:transition'])
        self.assertEqual(model.getParamList(), ['beta'])

    # need to ensure entering motif probs that sum to 1, and that motif
    # sets are the same

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_draw/__init__.py
#!/usr/bin/env python

__all__ = ['test_distribution_plots']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Sandra Smit", "Gavin Huttley",
               "Rob Knight", "Zongzhi Liu", "Amanda Birmingham",
               "Greg Caporaso", "Jai Ram Rideout"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

PyCogent-1.5.3/tests/test_draw/test_distribution_plots.py
#!/usr/bin/env python
"""Tests public and private functions in the distribution_plots module."""

__author__ = "Jai Ram Rideout"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jai Ram Rideout"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jai Ram Rideout"
__email__ = "jai.rideout@gmail.com"
__status__ = "Production"

from matplotlib import use
use('Agg', warn=False)
from StringIO import StringIO
import sys
import matplotlib.colors as colors
from matplotlib.pyplot import boxplot
from numpy import array
from cogent.draw.distribution_plots import _validate_input,\
    _get_distribution_markers, _validate_x_values, _create_plot,\
    _calc_data_point_locations, _set_axes_options, generate_box_plots,\
    generate_comparative_plots, _calc_data_point_ticks, _plot_bar_data,\
    _plot_scatter_data, _plot_box_data, _color_box_plot,\
    _create_legend, _set_figure_size
from cogent.util.unit_test import TestCase, main

class DistributionPlotsTests(TestCase):
    """Tests of the distribution_plots module."""
    def setUp(self):
        """Create some data to be used in the tests."""
        # Test null data list.
        self.Null = None

        # Test empty data list.
        self.Empty = []

        # Test nested empty data list.
        self.EmptyNested = [[]]

        # Test nested empty data list (for bar/scatter plots).
        self.EmptyDeeplyNested = [[[]]]

        # Test invalid number of samples in data list (for bar/scatter
        # plots).
        self.InvalidNumSamples = [[[1, 2, 3, 4, 5]],
                                  [[4, 5, 6, 7, 8], [2, 3, 2]],
                                  [[4, 7, 10, 33, 32, 6, 7, 8]]]

        # Test valid data with one sample (for bar/scatter plots).
        self.ValidSingleSampleData = [[[1, 2, 3, 4, 5]],
                                      [[4, 5, 6, 7, 8]],
                                      [[4, 7, 10, 33, 32, 6, 7, 8]]]

        # Test valid data with three samples and four data points
        # (for bar/scatter plots).
        self.ValidTypicalData = [[[1.0, 2, 3.5, 5], [2, 3, 5, 6], [2, 3, 8]],
                                 [[4, 7, 8], [8, 9, 10, 11], [9.0, 4, 1, 1]],
                                 [[4, 33, 32, 6, 8], [5, 4, 8, 13],
                                  [1, 1, 2]],
                                 [[2, 2, 2, 2], [3, 9, 8],
                                  [2, 1, 6, 7, 4, 5]]]

        # Test typical data to be plotted by the boxplot function.
        self.ValidTypicalBoxData = [[3.4, 10, 11.67, 12.0, 2, 2, 99.99],
                                    [2.3, 4, 5, 88, 9, 10, 11, 1, 0, 3, -8],
                                    [2, 9, 7, 5, 6]]

    def test_validate_input_null(self):
        """_validate_input() should raise a ValueError if null data is
        passed to it."""
        self.assertRaises(ValueError, _validate_input, self.Null, None,
                          None, None)

    def test_validate_input_empty(self):
        """_validate_input() should raise a ValueError if empty data is
        passed to it."""
        self.assertRaises(ValueError, _validate_input, self.Empty, None,
                          None, None)

    def test_validate_input_empty_nested(self):
        """_validate_input() should raise a ValueError if empty nested
        data is passed to it."""
        self.assertRaises(ValueError, _validate_input, self.EmptyNested,
                          None, None, None)

    def test_validate_input_empty_deeply_nested(self):
        """_validate_input() should pass for deeply nested empty data."""
        num_points, num_samples = _validate_input(self.EmptyDeeplyNested,
                                                  None, None, None)
        self.assertEqual(num_points, 1)
        self.assertEqual(num_samples, 1)

    def test_validate_input_invalid_num_samples(self):
        """_validate_input() should raise a ValueError if an inconsistent
        number of samples is included in the data."""
        self.assertRaises(ValueError, _validate_input,
                          self.InvalidNumSamples, None, None, None)

    def test_validate_x_values_invalid_x_values(self):
        """_validate_x_values() should raise a ValueError on an invalid
        number of x_values."""
        self.assertRaises(ValueError, _validate_x_values, [1, 2, 3, 4],
                          ["T0", "T1", "T2"],
                          len(self.ValidSingleSampleData))

    def test_validate_x_values_invalid_x_tick_labels(self):
        """_validate_x_values() should raise a ValueError on an invalid
        number of x_tick_labels."""
        self.assertRaises(ValueError, _validate_x_values, None, ["T0"],
                          len(self.ValidSingleSampleData))

    def test_validate_x_values_nonnumber_x_values(self):
        """_validate_x_values() should raise a ValueError on x_values that
        aren't numbers."""
        self.assertRaises(ValueError, _validate_x_values, ["foo", 2, 3],
                          None, len(self.ValidSingleSampleData))

    def \
test_validate_x_values_valid_x_values(self):
        """_validate_x_values() should not throw an exception."""
        _validate_x_values([1, 2.0, 3], None, 3)

    def test_validate_input_invalid_data_point_names(self):
        """_validate_input() should raise a ValueError on data_point_names
        that are an invalid length."""
        self.assertRaises(ValueError, _validate_input,
                          self.ValidSingleSampleData, None, ["T0", "T1"],
                          None)

    def test_validate_input_invalid_sample_names(self):
        """_validate_input() should raise a ValueError on sample_names
        that are an invalid length."""
        self.assertRaises(ValueError, _validate_input,
                          self.ValidSingleSampleData, None, None,
                          ["Men", "Women"])

    def test_validate_input_all_valid_input(self):
        """_validate_input() should return valid information about the
        data without throwing an exception."""
        self.assertEqual(_validate_input(self.ValidTypicalData,
                                         [1, 3, 4, 8],
                                         ["T0", "T1", "T2", "T3"],
                                         ["Infants", "Children", "Teens"]),
                         (4, 3))

    def test_get_distribution_markers_null_marker_list(self):
        """_get_distribution_markers() should return a list of predefined
        matplotlib markers."""
        self.assertEqual(_get_distribution_markers('colors', None, 5),
                         ['b', 'g', 'r', 'c', 'm'])

    def test_get_distribution_markers_empty_marker_list(self):
        """_get_distribution_markers() should return a list of predefined
        matplotlib markers."""
        self.assertEqual(_get_distribution_markers('colors', None, 4),
                         ['b', 'g', 'r', 'c'])

    def test_get_distribution_markers_insufficient_markers(self):
        """_get_distribution_markers() should return a wrapped list of
        predefined markers."""
        # Save stdout and replace it with something that will capture the
        # print statement.
        # Note: this code was taken from here:
        # http://stackoverflow.com/questions/4219717/how-to-assert-output-
        # with-nosetest-unittest-in-python/4220278#4220278
        saved_stdout = sys.stdout
        try:
            out = StringIO()
            sys.stdout = out
            self.assertEqual(_get_distribution_markers('colors', None, 10),
                             ['b', 'g', 'r', 'c', 'm', 'y', 'w', 'b', 'g',
                              'r'])
            self.assertEqual(_get_distribution_markers('symbols',
                                                       ['^', '>', '<'], 5),
                             ['^', '>', '<', '^', '>'])
            output = out.getvalue().strip()
            self.assertEqual(output, "There are not enough markers to "
                             "uniquely represent each distribution in your dataset. "
                             "You may want to provide a list of markers that is at "
                             "least as large as the number of distributions in your "
                             "dataset.\nThere are not enough markers to "
                             "uniquely represent each distribution in your dataset. "
                             "You may want to provide a list of markers that is at "
                             "least as large as the number of distributions in your "
                             "dataset.")
        finally:
            sys.stdout = saved_stdout

    def test_get_distribution_markers_bad_marker_type(self):
        """_get_distribution_markers() should raise a ValueError."""
        self.assertRaises(ValueError, _get_distribution_markers, 'shapes',
                          [], 3)

    def test_get_distribution_markers_zero_markers(self):
        """_get_distribution_markers() should return an empty list."""
        self.assertEqual(_get_distribution_markers('symbols', None, 0), [])
        self.assertEqual(_get_distribution_markers('symbols', ['^'], 0), [])

    def test_create_plot(self):
        """_create_plot() should return a tuple containing a Figure and
        Axes."""
        fig, ax = _create_plot()
        self.assertEqual(fig.__class__.__name__, "Figure")
        self.assertEqual(ax.__class__.__name__, "AxesSubplot")

    def test_plot_bar_data(self):
        """_plot_bar_data() should return a list of Rectangle objects."""
        fig, ax = _create_plot()
        result = _plot_bar_data(ax, [1, 2, 3], 'red', 0.5, 3.75, 1.5, 'stdv')
        self.assertEqual(result[0].__class__.__name__, "Rectangle")
        self.assertEqual(len(result), 1)
        self.assertFloatEqual(result[0].get_width(), 0.5)
        self.assertFloatEqual(result[0].get_facecolor(),
                              (1.0, 0.0, 0.0, 1.0))
        self.assertFloatEqual(result[0].get_height(), 2.0)

        fig, ax = _create_plot()
        result = _plot_bar_data(ax, [1, 2, 3], 'red', 0.5, 3.75, 1.5, 'sem')
        self.assertEqual(result[0].__class__.__name__, "Rectangle")
        self.assertEqual(len(result), 1)
        self.assertFloatEqual(result[0].get_width(), 0.5)
        self.assertFloatEqual(result[0].get_facecolor(),
                              (1.0, 0.0, 0.0, 1.0))
        self.assertFloatEqual(result[0].get_height(), 2.0)

    def test_plot_bar_data_bad_error_bar_type(self):
        """_plot_bar_data() should raise an exception on bad error bar
        type."""
        fig, ax = _create_plot()
        self.assertRaises(ValueError, _plot_bar_data, ax, [1, 2, 3], 'red',
                          0.5, 3.75, 1.5, 'var')

    def test_plot_bar_data_empty(self):
        """_plot_bar_data() should not error when given an empty list of
        data, but should not plot anything."""
        fig, ax = _create_plot()
        result = _plot_bar_data(ax, [], 'red', 0.5, 3.75, 1.5, 'stdv')
        self.assertTrue(result is None)

        fig, ax = _create_plot()
        result = _plot_bar_data(ax, [], 'red', 0.5, 3.75, 1.5, 'sem')
        self.assertTrue(result is None)

    def test_plot_scatter_data(self):
        """_plot_scatter_data() should return a Collection instance."""
        fig, ax = _create_plot()
        result = _plot_scatter_data(ax, [1, 2, 3], '^', 0.77, 1, 1.5,
                                    'stdv')
        self.assertFloatEqual(result.get_sizes(), 20)

    def test_plot_scatter_data_empty(self):
        """_plot_scatter_data() should not error when given an empty list
        of data, but should not plot anything."""
        fig, ax = _create_plot()
        result = _plot_scatter_data(ax, [], '^', 0.77, 1, 1.5, 'stdv')
        self.assertTrue(result is None)

    def test_plot_box_data(self):
        """_plot_box_data() should return a dictionary of Line2D's."""
        fig, ax = _create_plot()
        result = _plot_box_data(ax, [0, 0, 7, 8, -3, 44], 'blue', 0.33, 55,
                                1.5, 'stdv')
        self.assertEqual(result.__class__.__name__, "dict")
        self.assertEqual(len(result['boxes']), 1)
        self.assertEqual(len(result['medians']), 1)
        self.assertEqual(len(result['whiskers']), 2)
        self.assertEqual(len(result['fliers']), 2)
        self.assertEqual(len(result['caps']), 2)

    def test_plot_box_data_empty(self):
        """_plot_box_data() should not error when given an empty list of
        data, but should not plot anything."""
        fig, ax = _create_plot()
        result = _plot_box_data(ax, [], 'blue', 0.33, 55, 1.5, 'stdv')
        self.assertEqual(result.__class__.__name__, "dict")
        self.assertEqual(len(result['boxes']), 0)
        self.assertEqual(len(result['medians']), 0)
        self.assertEqual(len(result['whiskers']), 0)
        self.assertEqual(len(result['fliers']), 0)
        self.assertEqual(len(result['caps']), 0)

    def test_calc_data_point_locations_invalid_widths(self):
        """_calc_data_point_locations() should raise a ValueError when it
        encounters bad widths."""
        self.assertRaises(ValueError, _calc_data_point_locations,
                          [1, 2, 3], 3, 2, -2, 0.5)
        self.assertRaises(ValueError, _calc_data_point_locations,
                          [1, 2, 3], 3, 2, 2, -0.5)

    def test_calc_data_point_locations_default_spacing(self):
        """_calc_data_point_locations() should return an array containing
        the x-axis locations for each data point, evenly spaced from
        1..n."""
        locs = _calc_data_point_locations(None, 4, 2, 0.25, 0.5)
        self.assertEqual(locs, array([1.0, 2.0, 3.0, 4.0]))

    def test_calc_data_point_locations_custom_spacing(self):
        """_calc_data_point_locations() should return an array containing
        the x-axis locations for each data point, spaced according to a
        custom spacing scheme."""
        locs = _calc_data_point_locations([3, 4, 10, 12], 4, 2, 0.25, 0.75)
        self.assertEqual(locs, array([3.75, 5.0, 12.5, 15.0]))

    def test_calc_data_point_ticks(self):
        """_calc_data_point_ticks() should return an array containing the
        x-axis locations for each data point tick."""
        ticks = _calc_data_point_ticks(array([1, 5, 9, 11]), 1, 0.5, False)
        self.assertFloatEqual(ticks, array([1.25, 5.25, 9.25, 11.25]))

        ticks = _calc_data_point_ticks(array([0]), 3, 0.5, False)
        self.assertFloatEqual(ticks, array([0.75]))

    def test_set_axes_options(self):
        """_set_axes_options() should set the
        labels on the axes and not raise any exceptions."""
        fig, ax = _create_plot()
        _set_axes_options(ax, "Plot Title", "x-axis label", "y-axis label",
                          x_tick_labels=["T0", "T1"])
        self.assertEqual(ax.get_title(), "Plot Title")
        self.assertEqual(ax.get_ylabel(), "y-axis label")
        self.assertEqual(ax.get_xticklabels()[0].get_text(), "T0")
        self.assertEqual(ax.get_xticklabels()[1].get_text(), "T1")

    def test_set_axes_options_ylim(self):
        """_set_axes_options() should set the y-axis limits."""
        fig, ax = _create_plot()
        _set_axes_options(ax, "Plot Title", "x-axis label", "y-axis label",
                          x_tick_labels=["T0", "T1", "T2"], y_min=0,
                          y_max=1)
        self.assertEqual(ax.get_title(), "Plot Title")
        self.assertEqual(ax.get_ylabel(), "y-axis label")
        self.assertEqual(ax.get_xticklabels()[0].get_text(), "T0")
        self.assertEqual(ax.get_xticklabels()[1].get_text(), "T1")
        self.assertFloatEqual(ax.get_ylim(), [0, 1])

    def test_set_axes_options_bad_ylim(self):
        """_set_axes_options() should raise an exception when given
        non-numeric y limits."""
        fig, ax = _create_plot()
        self.assertRaises(ValueError, _set_axes_options, ax, "Plot Title",
                          "x-axis label", "y-axis label",
                          x_tick_labels=["T0", "T1", "T2"], y_min='car',
                          y_max=30)

    def test_create_legend(self):
        """_create_legend() should create a legend on valid input."""
        fig, ax = _create_plot()
        _create_legend(ax, ['b', 'r'], ['dist1', 'dist2'], 'colors')
        self.assertEqual(len(ax.get_legend().get_texts()), 2)

        fig, ax = _create_plot()
        _create_legend(ax, ['^', '<', '>'], ['dist1', 'dist2', 'dist3'],
                       'symbols')
        self.assertEqual(len(ax.get_legend().get_texts()), 3)

    def test_create_legend_invalid_input(self):
        """Test raises error on bad input."""
        fig, ax = _create_plot()
        self.assertRaises(ValueError, _create_legend, ax,
                          ['^', '<', '>'], ['dist1', 'dist2'], 'symbols')
        self.assertRaises(ValueError, _create_legend, ax, ['^', '<', '>'],
                          ['dist1', 'dist2', 'dist3'], 'foo')

    def test_generate_box_plots(self):
        """generate_box_plots() should return a valid Figure object."""
fig = generate_box_plots(self.ValidTypicalBoxData, [1, 4, 10], ["Data 1", "Data 2", "Data 3"], "Test", "x-axis label", "y-axis label") ax = fig.get_axes()[0] self.assertEqual(ax.get_title(), "Test") self.assertEqual(ax.get_xlabel(), "x-axis label") self.assertEqual(ax.get_ylabel(), "y-axis label") self.assertEqual(len(ax.get_xticklabels()), 3) self.assertFloatEqual(ax.get_xticks(), [1, 4, 10]) def test_generate_comparative_plots_bar(self): """generate_comparative_plots() should return a valid barchart Figure object.""" fig = generate_comparative_plots('bar', self.ValidTypicalData, [1, 4, 10, 11], ["T0", "T1", "T2", "T3"], ["Infants", "Children", "Teens"], ['b', 'r', 'g'], "x-axis label", "y-axis label", "Test") ax = fig.get_axes()[0] self.assertEqual(ax.get_title(), "Test") self.assertEqual(ax.get_xlabel(), "x-axis label") self.assertEqual(ax.get_ylabel(), "y-axis label") self.assertEqual(len(ax.get_xticklabels()), 4) self.assertFloatEqual(ax.get_xticks(), [2.3, 7.4, 17.6, 19.3]) def test_generate_comparative_plots_insufficient_colors(self): """generate_comparative_plots() should work even when there aren't enough colors. We should capture a print statement that warns the users.""" saved_stdout = sys.stdout try: out = StringIO() sys.stdout = out generate_comparative_plots('bar', self.ValidTypicalData, [1, 4, 10, 11], ["T0", "T1", "T2", "T3"], ["Infants", "Children", "Teens"], ['b', 'r'], "x-axis label", "y-axis label", "Test") output = out.getvalue().strip() self.assertEqual(output, "There are not enough markers to " "uniquely represent each distribution in your dataset. 
" "You may want to provide a list of markers that is at " "least as large as the number of distributions in your " "dataset.") finally: sys.stdout = saved_stdout def test_generate_comparative_plots_scatter(self): """generate_comparative_plots() should return a valid scatterplot Figure object.""" fig = generate_comparative_plots('scatter', self.ValidTypicalData, [1, 4, 10, 11], ["T0", "T1", "T2", "T3"], ["Infants", "Children", "Teens"], ['^', '>', '<'], "x-axis label", "y-axis label", "Test") ax = fig.get_axes()[0] self.assertEqual(ax.get_title(), "Test") self.assertEqual(ax.get_xlabel(), "x-axis label") self.assertEqual(ax.get_ylabel(), "y-axis label") self.assertEqual(len(ax.get_xticklabels()), 4) self.assertFloatEqual(ax.get_xticks(), [2.1, 7.2, 17.4, 19.1]) def test_generate_comparative_plots_insufficient_symbols(self): """generate_comparative_plots() should work even when there aren't enough symbols. We should capture a print statement that warns the users.""" saved_stdout = sys.stdout try: out = StringIO() sys.stdout = out generate_comparative_plots('scatter', self.ValidTypicalData, [1, 4, 10, 11], ["T0", "T1", "T2", "T3"], ["Infants", "Children", "Teens"], ['^'], "x-axis label", "y-axis label", "Test") output = out.getvalue().strip() self.assertEqual(output, "There are not enough markers to " "uniquely represent each distribution in your dataset. 
" "You may want to provide a list of markers that is at " "least as large as the number of distributions in your " "dataset.") finally: sys.stdout = saved_stdout def test_generate_comparative_plots_empty_marker_list(self): """generate_comparative_plots() should use the predefined list of markers if an empty list is provided by the user.""" generate_comparative_plots('scatter', self.ValidTypicalData, [1, 4, 10, 11], ["T0", "T1", "T2", "T3"], ["Infants", "Children", "Teens"], [], "x-axis label", "y-axis label", "Test") def test_generate_comparative_plots_box(self): """generate_comparative_plots() should return a valid boxplot Figure object.""" fig = generate_comparative_plots('box', self.ValidTypicalData, [1, 4, 10, 11], ["T0", "T1", "T2", "T3"], ["Infants", "Children", "Teens"], ['b', 'g', 'y'], "x-axis label", "y-axis label", "Test") ax = fig.get_axes()[0] self.assertEqual(ax.get_title(), "Test") self.assertEqual(ax.get_xlabel(), "x-axis label") self.assertEqual(ax.get_ylabel(), "y-axis label") self.assertEqual(len(ax.get_xticklabels()), 4) self.assertFloatEqual(ax.get_xticks(), [2.1, 7.2, 17.4, 19.1]) def test_generate_comparative_plots_error(self): """generate_comparative_plots() should raise a ValueError for an invalid plot type.""" self.assertRaises(ValueError, generate_comparative_plots, 'pie', self.ValidTypicalData, [1, 4, 10, 11], ["T0", "T1", "T2", "T3"], ["Infants", "Children", "Teens"], ['b', 'g', 'y'], "x-axis label", "y-axis label", "Test") def test_color_box_plot(self): """_color_box_plot() should not throw an exception when passed the proper input.""" fig, ax = _create_plot() box_plot = boxplot(self.ValidTypicalBoxData) _color_box_plot(ax, box_plot, 'blue') def test_set_figure_size(self): """Test setting a valid figure size.""" fig, ax = _create_plot() _set_axes_options(ax, 'foo', 'x_foo', 'y_foo', x_tick_labels=['foofoofoo', 'barbarbar'], x_tick_labels_orientation='vertical') _set_figure_size(fig, 3, 4) self.assertFloatEqual(fig.get_size_inches(), 
(3, 4)) def test_set_figure_size_defaults(self): """Test setting a figure size using matplotlib defaults.""" fig, ax = _create_plot() _set_axes_options(ax, 'foo', 'x_foo', 'y_foo', x_tick_labels=['foofoofoo', 'barbarbar'], x_tick_labels_orientation='vertical') orig_fig_size = fig.get_size_inches() _set_figure_size(fig) self.assertFloatEqual(fig.get_size_inches(), orig_fig_size) def test_set_figure_size_invalid(self): """Test setting a figure size using invalid dimensions.""" fig, ax = _create_plot() _set_axes_options(ax, 'foo', 'x_foo', 'y_foo', x_tick_labels=['foofoofoo', 'barbarbar'], x_tick_labels_orientation='vertical') orig_fig_size = fig.get_size_inches() _set_figure_size(fig, -1, 0) self.assertFloatEqual(fig.get_size_inches(), orig_fig_size) def test_set_figure_size_long_labels(self): """Test setting a figure size that has really long labels.""" saved_stdout = sys.stdout try: out = StringIO() sys.stdout = out fig, ax = _create_plot() _set_axes_options(ax, 'foo', 'x_foo', 'y_foo', x_tick_labels=['foofoofooooooooooooooooooooooooo' 'ooooooooooooooooooooooooooooooooooooooooooooooo' 'ooooooooooooooooooooo', 'barbarbar'], x_tick_labels_orientation='vertical') _set_figure_size(fig, 3, 3) self.assertFloatEqual(fig.get_size_inches(), (3, 3)) output = out.getvalue().strip() self.assertEqual(output, "Warning: could not automatically resize plot to make room for " "axes labels and plot title. This can happen if the labels or " "title are extremely long and the plot size is too small. Your " "plot may have its labels and/or title cut-off. 
To fix this, " "try increasing the plot's size (in inches) and try again.") finally: sys.stdout = saved_stdout if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_draw/test_matplotlib/000755 000765 000024 00000000000 12024703633 022604 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_draw/test_matplotlib/test_arrow_rates.py000644 000765 000024 00000001602 12024702176 026545 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.draw.arrow_rates import make_arrow_plot, sample_data from cogent.util.unit_test import TestCase, main from os import remove __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class arrow_rates_tests(TestCase): """Tests of top-level function, primarily checking that it writes the file. WARNING: must visually inspect output to check correctness! """ def test_make_arrow_plot(self): """arrow_plot should write correct file and not raise exception""" make_arrow_plot(sample_data, graph_name='arrows.png') #uncomment the line below to clean up the output file #remove('arrows.png') if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_draw/test_matplotlib/test_codon_usage.py000644 000765 000024 00000023234 12024702176 026510 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of the codon usage graphs. Note: currently, this must be invoked manually because all the output is graphical. Expects to be invoked from the test directory.
""" from cogent.maths.stats.cai.adaptor import data_from_file, adapt_p12, \ adapt_p123gc, adapt_cai_p3, adapt_cai_histogram, adapt_fingerprint, \ adapt_pr2_bias, file_to_codon_list, \ bin_by_p3 from cogent.draw.codon_usage import plot_cai_p3_scatter, \ plot_p12_p3, plot_p123_gc, plot_fingerprint, plot_pr2_bias, \ plot_p12_p3_contour, \ plot_p12_p3_contourlines, aa_labels from cogent.draw.util import as_species, \ plot_scatter_with_histograms, plot_histograms, \ plot_filled_contour, format_contour_array from pylab import gca, clf from numpy import transpose from sys import argv from os import getcwd from os.path import sep, join test_path = getcwd().split(sep) index = test_path.index('tests') fields = test_path[:index+1] + ["data"] test_path = sep + join(*fields) test_file_name = join(test_path, 'Homo_sapiens_codon_usage.pri') __author__ = "Stephanie Wilson" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Stephanie Wilson"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def insert_before_extension(name, item): """Inserts item before extension in name.""" last_dot = name.rfind('.') return name[:last_dot+1]+str(item)+'.'+ name[last_dot+1:] #TESTS USING ADAPTORS def make_generic_adaptor_test(adaptor_f, plot_f, default_outfilename): """Makes adaptor test for generic graphs.""" def adaptor_test(codons, infilename, name, outfilename=default_outfilename): print "=> outfile:", outfilename print "from file:", infilename graph_data = adaptor_f(codons) plot_f(graph_data, num_genes=len(codons), graph_name=outfilename,\ title=name) gca().clear() clf() return adaptor_test def make_contour_adaptor_test(adaptor_f, plot_f, default_outfilename): """Makes adaptor test for contour graphs.""" def adaptor_test(codons, infilename, name, outfilename=default_outfilename): print "=> outfile:", outfilename print "from file:", infilename xy_data = adaptor_f(codons) x, 
y, data = format_contour_array(xy_data) plot_f(x, y, data, xy_data, num_genes=len(codons), \ graph_name=outfilename, title=name) gca().clear() clf() return adaptor_test def make_pr2bias_adaptor_test(adaptor_f, plot_f, default_outfilename): """Makes adaptor test for pr2bias graphs.""" def adaptor_test(codons, infilename, name, outfilename=default_outfilename): print "=> base outfile:", outfilename print "from file:", infilename for aa, triplet in aa_labels.items(): triplet = triplet.replace('T','U') graph_data = adaptor_f(codons, block=triplet[:2]) curr_outfilename = insert_before_extension(outfilename, triplet) plot_f(graph_data, num_genes=len(codons), \ graph_name=curr_outfilename, title=aa) gca().clear() clf() return adaptor_test def make_gc_gradient_adaptor_test(adaptor_f, plot_f, default_outfilename, \ one_graph=False, one_series=False): """Makes adaptor test for replicated graphs over e.g. a GC gradient""" def adaptor_test(codons, infilename, name, outfilename=default_outfilename): min_gene_threshold=10 #suppress bins with few genes print "=>base outfile:", outfilename print "from file:", infilename print "one graph:", one_graph print "one series:", one_series gc_bins = bin_by_p3(codons) if one_series: #assume we want to adapt the list of codon usages data = adaptor_f(gc_bins) plot_f(data, num_genes=len(codons), graph_name=outfilename,\ title=name) else: data = [] for b in gc_bins: try: data.append(adaptor_f(b)) except: data.append([]) if one_graph: #assume downstream f copes with list of data total_genes = len(codons) alpha = [float(len(b))/total_genes for b in gc_bins] plot_f(data, num_genes=total_genes, graph_name=outfilename, \ alpha=alpha, title=name, multiple=True) else: #multiple graphs: feed one at a time for i, c in enumerate(gc_bins): curr_p3 = i*0.1 if len(c) > min_gene_threshold: curr_outfilename = insert_before_extension(\ outfilename, curr_p3) graph_data = adaptor_f(c) plot_f(graph_data, num_genes=len(c), \ graph_name=curr_outfilename, \ 
title=name+' '+str(curr_p3)+'-'+str(curr_p3+0.1)) gca().clear() clf() gca().clear() clf() return adaptor_test mgat = make_generic_adaptor_test #save typing in what follows mcat = make_contour_adaptor_test #ditto mpat = make_pr2bias_adaptor_test mggat = make_gc_gradient_adaptor_test #scatterplot adaptors test_p12_p3_adaptor = mgat(adapt_p12, plot_p12_p3, 'test_p12_p3_A.png') test_p123_gc_adaptor = mgat(adapt_p123gc, plot_p123_gc, \ 'test_p123_gc_A.png') def plot_p12_p3_from_gc(*args, **kwargs): return plot_p123_gc(use_p3_as_x=True, graph_shape='sqr', *args, **kwargs) test_p12_p3gc_adaptor = mgat(adapt_p123gc, plot_p12_p3_from_gc, \ 'test_p12_from_gc_A.png') test_cai_p3_adaptor = mgat(adapt_cai_p3, plot_cai_p3_scatter, \ 'test_p3_cai_A.png') def adapt_cai_p3_twoseries(*args, **kwargs): return adapt_cai_p3(both_series=True, *args, **kwargs) test_cai_p3_twoseries_adaptor = mgat(adapt_cai_p3_twoseries, \ plot_cai_p3_scatter, 'test_p3_cai_twoseries_A.png') def scat_hist_cai_p3(data, *args, **kwargs): return plot_scatter_with_histograms(data, x_label='$P_3$', y_label='CAI',\ *args, **kwargs) test_cai_p3_twoseries_adaptor_hist = mgat(adapt_cai_p3_twoseries, \ scat_hist_cai_p3, 'test_p3_cai_twoseries_hist_A.png') #hist adaptors def cai_histogram(data, *args, **kwargs): return plot_histograms(data, x_label='CAI', \ series_names=['others', 'ribosomal'], show_legend=True, \ colors=['white','red'], linecolors=['black','red'], alpha=0.7, \ *args, **kwargs) test_cai_hist_adaptor = mgat(adapt_cai_histogram, cai_histogram,\ 'test_cai_hist.png') #fingerprint adaptors test_fingerprint_adaptor = mgat(adapt_fingerprint, plot_fingerprint, \ 'test_fingerprint_A.png') test_fingerprint_gradient_adaptor = mggat(adapt_fingerprint, plot_fingerprint,\ 'test_fingerprint_gradient_A.png') test_fingerprint_gradient_adaptor_one_graph = mggat(adapt_fingerprint, \ plot_fingerprint,
'test_fingerprint_gradient_onegraph_A.png', one_graph=True) #pr2 bias adaptors test_pr2bias_adaptor = mpat(adapt_pr2_bias, plot_pr2_bias, \ 'test_pr2_bias_A.png') #contour adaptors test_p12_p3_contour_adaptor = mcat(adapt_p12, plot_p12_p3_contour, \ 'test_p12_p3_contour_A.png') test_p12_p3_contourlines_adaptor = mcat(adapt_p12, plot_p12_p3_contourlines, \ 'test_p12_p3_contourlines_A.png') scatter_adaptor = [test_p12_p3_adaptor, test_p123_gc_adaptor, \ test_p12_p3gc_adaptor, test_cai_p3_adaptor, test_cai_p3_twoseries_adaptor,\ test_cai_p3_twoseries_adaptor_hist] hist_adaptor=[test_cai_hist_adaptor] fingerprint_adaptor = [test_fingerprint_adaptor, test_fingerprint_gradient_adaptor, \ test_fingerprint_gradient_adaptor_one_graph] pr2bias_adaptor = [test_pr2bias_adaptor] contour_adaptor = [test_p12_p3_contour_adaptor, \ test_p12_p3_contourlines_adaptor] all_adaptor = scatter_adaptor + fingerprint_adaptor + pr2bias_adaptor \ + hist_adaptor + contour_adaptor #take in pre-constructed codon usage objects, output requested graphs def codons_to_graph(codons, as_file, species, which_tests): """Function for directly passing in codons instead of reading them in from a file""" adaptor_tests= all_adaptor codon_data_fname=as_file print "Running adaptor tests..." codon_data = codons if which_tests: for i in which_tests: print "doing test %s" % i a = adaptor_tests[i] a(codon_data, codon_data_fname, species) else: for i, a in enumerate(adaptor_tests): print "doing test %s" % i a(codon_data,codon_data_fname, species) if __name__ == '__main__': """Tests if the graphs will all compile and outputs the graph from the current version of code: test_fingerprint.png """ adaptor_tests = all_adaptor if len(argv) > 1: codon_data_fname = argv[1] else: codon_data_fname = test_file_name if len(argv) > 2: which_tests = map(int, argv[2].split(',')) else: which_tests = None print "Running adaptor tests..." 
if codon_data_fname.endswith('.nuc'): #assume FASTA from KEGG; NOTE: kegg_fasta_to_codon_list is not imported above, so this branch needs that import before it can run codon_data = kegg_fasta_to_codon_list(open(codon_data_fname)) else: codon_data = file_to_codon_list(codon_data_fname) if which_tests: for i in which_tests: print "doing test %s" % i a = adaptor_tests[i] a(codon_data, codon_data_fname, as_species(codon_data_fname)) else: for i, a in enumerate(adaptor_tests): print "doing test %s" % i a(codon_data, codon_data_fname, as_species(codon_data_fname)) PyCogent-1.5.3/tests/test_draw/test_matplotlib/test_dinuc.py000644 000765 000024 00000005152 12024702176 025323 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.draw.dinuc import dinuc_plot from numpy import array, clip from numpy.random import random from cogent.util.unit_test import TestCase, main from os import remove __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def add_random(a): """Adds a small random component to gene a, clipping the result to [0, 1].""" r = (random(a.shape)-.5)/10 return clip(a + r, 0, 1) class dinuc_tests(TestCase): """Tests of top-level function, primarily checking that it writes the file. WARNING: must visually inspect output to check correctness!
""" def test_dinuc(self): """dinuc_plot should write correct file and not raise exception""" spec_a_ave = array([0.25, 0.20, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.40, 0.55, 0.60, 0.65, 0.70, 0.60, 0.70]) spec_b_ave = array([0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.20, 0.15]) spec_c_ave = array([0.40, 0.50, 0.55, 0.50, 0.45, 0.50, 0.45, 0.50, 0.45, 0.55, 0.45, 0.45, 0.45, 0.55, 0.50, 0.55]) spec_a_ht_gene = array([0.41, 0.52, 0.53, 0.53, 0.45, 0.51, 0.46, 0.52, 0.43, 0.55, 0.45, 0.45, 0.45, 0.55, 0.53, 0.55]) spec_b_ht_gene = array([0.26, 0.22, 0.17, 0.19, 0.23, 0.32, 0.36, 0.41, 0.44, 0.41, 0.53, 0.62, 0.63, 0.71, 0.64, 0.72]) spec_b_rb_gene = array([0.82, 0.74, 0.73, 0.63, 0.62, 0.54, 0.51, 0.43, 0.41, 0.34, 0.32, 0.24, 0.23, 0.17, 0.28, 0.19]) spec_c_rb_gene = array([0.43, 0.54, 0.56, 0.51, 0.44, 0.51, 0.42, 0.53, 0.44, 0.53, 0.43, 0.47, 0.46, 0.57, 0.53, 0.52]) a_data = {'hgt':[spec_a_ht_gene], \ None: map(add_random, [spec_a_ave.copy() for i in range(10)])} b_data = {'hgt':[spec_b_ht_gene], 'ribosomal':[spec_b_rb_gene], \ None:[spec_b_ave]} c_data = {'ribosomal':[spec_b_rb_gene], None:[spec_c_ave]} data = {'Species A': a_data, 'Species B': b_data, 'Species C':c_data} dinuc_plot(data, avg_formats={'markersize':5}, \ point_formats={'s':2, 'alpha':0.2}, graph_name='test.png') #note:comment out the next line to see the test file #remove('test.png') if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_draw/test_matplotlib/test_multivariate_plot.py000644 000765 000024 00000010511 12024702176 027760 0ustar00jrideoutstaff000000 000000 #/usr/bin/env python from cogent.draw.multivariate_plot import (plot_ordination, map_colors, text_points, scatter_points, plot_points, arrows) from cogent.util.unit_test import TestCase, main import os, pylab from tempfile import mkstemp from numpy import asarray, c_ from pdb import set_trace __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" 
__credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" ###### # util class class TestCasePlot(TestCase): Debug = True def fig(self, fname=None): if self.Debug: pylab.show() else: if not fname: fd, fname = mkstemp(prefix='PlotTest_', suffix='.png') pylab.savefig(fname) pylab.clf() os.remove(fname) def p(self, obj): if self.Debug: print obj class TestI(TestCasePlot): def setUp(self): self.points = [(0, 0), (1,1), (1,2), (2, 2)] #adjust canvas #pylab.xlim(-0.5, 2.5) #pylab.ylim(-0.5, 2.5) class FunctionsTests(TestI): def test_map_colors(self): #default self.assertEqual(map_colors(range(3)), ['#000080', '#7dff7a', '#800000']) #alternative cmap self.assertEqual(map_colors(range(3), cmap='hot'), ['#0b0000', '#ff5c00', '#ffffff']) #return tuples self.assertFloatEqual(map_colors(range(3), mode='tuples'), [(0.0, 0.0, 0.5), (0.490196, 1.0, 0.477546), (0.5, 0.0, 0.0)]) def test_text_points(self): #same text for all the points text_points(self.points, 'X') self.fig() def test_text_points_diff_texts(self): #diff text for all the points text_points(self.points, ['A', 'B', 'C', 'A']) self.fig() def test_plot_points(self): plot_points(self.points, label='X') self.fig() class scatter_points_tests(TestI): def test_basic(self): scatter_points(self.points, label='X') self.fig() def test_color_list(self): scatter_points(self.points, c=['k', 'red', '#00FF00', (0, 0, 1)], s=[100, 200, 300, 400], marker=['o', 's', 'd', 'h']) pylab.xlim(-0.5, 2.5) pylab.ylim(-0.5, 2.5) self.fig() def test_color_shades(self): #self.Debug = True scatter_points(self.points, c=[1, 2, 3, 4], s=[100, 200, 300, 400], marker=['o', 's', 'd', 'h']) pylab.xlim(-0.5, 2.5) pylab.ylim(-0.5, 2.5) self.fig() class arrows_tests(TestI): def test_arrows(self): points = self.points arrows([[0,0]]*len(points), points) self.fig() class plot_ordination_tests(TestCasePlot): def setUp(self): self.points = points_3d self.keys 
= ['eigvals', 'samples', 'species', 'centroids', 'biplot'] self.values = [ [0.3, 0.1, 0, -0.5], #eigvals self.points, #samples self.points + 0.2, #species [(0, 1, 0), (2, -2, 0)], #centroids [(0.5, 0.5, 1), (-1, 1.5, 1)], #biplot ] def test_basic(self): res = dict(zip(self.keys[:2], self.values)) plot_ordination(res) self.fig() def test_species(self): res = dict(zip(self.keys[:3], self.values)) plot_ordination(res) self.fig() def test_centroids(self): res = dict(zip(self.keys[:4], self.values)) plot_ordination(res) self.fig() def test_biplot(self): res = dict(zip(self.keys, self.values)) plot_ordination(res, species_kw={'label': 'sp'}, biplot_kw={'label':['b1', 'b2']}, samples_kw={'label': 'sa'}) self.fig() def test_choices(self): res = dict(zip(self.keys, self.values)) plot_ordination(res, choices=[2,3]) self.fig() def test_axis_names(self): #self.Debug = True res = dict(zip(self.keys[:2], self.values)) plot_ordination(res, axis_names=['CCA1', 'CA1', 'CA2'], constrained_names='CCA') self.fig() ##### # test data points = asarray([(0,0), (-1, 1), (1, -2), (2, 3)], float) points_3d = c_[points, [[3]]*len(points)] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_db/__init__.py000644 000765 000024 00000000522 12024702176 021137 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_util', 'test_ncbi', 'test_pdb', 'test_rfam', 'test_ensembl'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_db/test_ensembl/000755 000765 000024 00000000000 12024703632 021511 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_db/test_ncbi.py000644 000765 000024 00000025560 12024702176 021363 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of data retrieval from NCBI.""" from cogent.util.unit_test import TestCase, main 
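The NCBI tests below exercise parse_taxonomy_using_elementtree_xml_parse, which pulls Taxon fields out of EUtils taxonomy XML. The core of that parse can be sketched with the standard library (an illustrative reimplementation, not the cogent code; the tag names follow NCBI's TaxaSet vocabulary, and the sample lineage is abbreviated toy data):

```python
# Sketch: each <Taxon> element in an NCBI TaxaSet document becomes a dict
# of its child-tag texts, which is what the taxonomy tests inspect.
import xml.etree.ElementTree as ET

TAXON_XML = (
    '<TaxaSet><Taxon>'
    '<TaxId>28901</TaxId>'
    '<ScientificName>Salmonella enterica</ScientificName>'
    '<Rank>species</Rank>'
    '<Lineage>cellular organisms; Bacteria</Lineage>'
    '</Taxon></TaxaSet>'
)

def parse_taxa(xml_text, fields=('TaxId', 'ScientificName', 'Rank', 'Lineage')):
    """Return one dict per <Taxon> element, keyed by the requested child tags."""
    root = ET.fromstring(xml_text)
    return [dict((f, taxon.findtext(f)) for f in fields)
            for taxon in root.findall('Taxon')]
```

With this sketch, `parse_taxa(TAXON_XML)[0]['ScientificName']` recovers the organism name the same way the tests sort and compare ScientificName values.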
from cogent.db.ncbi import EUtils, ESearch, EFetch, ELink, ESearchResultParser,\ ELinkResultParser, get_primary_ids, ids_to_taxon_ids, \ taxon_lineage_extractor, taxon_ids_to_lineages, taxon_ids_to_names, \ taxon_ids_to_names_and_lineages, \ get_unique_lineages, get_unique_taxa, parse_taxonomy_using_elementtree_xml_parse from string import strip from StringIO import StringIO __author__ = "Mike Robeson" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Mike Robeson", "Rob Knight", "Zongzhi Liu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Mike Robeson" __email__ = "mike.robeson@colorado.edu" __status__ = "Production" class EUtilsTests(TestCase): """Tests of the EUtils class.""" def test_simple_get(self): """EUtils simple access of an item should work""" g = EUtils(db='protein',rettype='gp') result = g['NP_003320'].read() assert result.startswith('LOCUS') assert 'NP_003320' in result def test_get_slice(self): """EUtils access of a slice should work""" g = EUtils(db='protein',rettype='gp', retmax=1) result = g['NP_003320':'NP_003322'].read() lines = result.splitlines() is_locus = lambda x: x.startswith('LOCUS') loci = filter(is_locus, lines) self.assertEqual(len(loci), 3) #EUtils access of a slice should work, while limiting #the esearch term length g = EUtils(db='protein',rettype='gp', retmax=1, url_limit=2) result = g['NP_003320':'NP_003322'].read() lines = result.splitlines() is_locus = lambda x: x.startswith('LOCUS') loci = filter(is_locus, lines) self.assertEqual(len(loci), 3) def test_get_list(self): """EUtils access of a list should work""" g = EUtils(db='protein',rettype='gp') result = g['NP_003320','NP_003321','NP_003322'].read() lines = result.splitlines() is_locus = lambda x: x.startswith('LOCUS') loci = filter(is_locus, lines) self.assertEqual(len(loci), 3) #EUtils access of a slice should work, while limiting #the esearch term length g = EUtils(db='protein',rettype='gp',url_limit=2) result = 
g['NP_003320','NP_003321','NP_003322'].read() lines = result.splitlines() is_locus = lambda x: x.startswith('LOCUS') loci = filter(is_locus, lines) self.assertEqual(len(loci), 3) # def test_get_from_taxonomy_db(self): # """EUtils access from taxonomy database should work""" # #note: this is more fragile than the nucleotide databases # g = EUtils(db='taxonomy', rettype='Brief', retmode='text') # ids = '9606[taxid] OR 28901[taxid]' # result = sorted(g[ids].read().splitlines()) # self.assertEqual(result, ['Homo sapiens', 'Salmonella enterica']) def test_get_from_taxonomy_db(self): """EUtils access from taxonomy database should work""" #note: this is more fragile than the nucleotide databases g = EUtils(db='taxonomy', rettype='xml', retmode='xml') ids = '9606[taxid] OR 28901[taxid]' fh = StringIO() fh.write(g[ids].read()) fh.seek(0) data = parse_taxonomy_using_elementtree_xml_parse(fh) result = sorted([item['ScientificName'] for item in data]) self.assertEqual(result, ['Homo sapiens', 'Salmonella enterica']) def test_query(self): """EUtils access via a query should work""" g = EUtils(db='protein', rettype='gi', retmax=100) result = g['homo[organism] AND erf1[ti]'].read().splitlines() assert '5499721' in result #gi of human eRF1 #note: successfully retrieved 841,821 ids on a query for 'rrna', #although it took about 40 min so not in the tests. RK 1/3/07. 
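The retmax/max_recs tests that follow check that EUtils joins results fetched retmax at a time and stops once max_recs records are collected. That paging loop can be sketched as follows (fetch_page is a hypothetical stand-in for a single ESearch request; the real EUtils passes retstart/retmax to NCBI):

```python
# Minimal sketch of retmax paging: fetch at most `retmax` ids per request,
# join the pages, and stop early once `max_recs` records are collected.
def paged_ids(fetch_page, total, retmax, max_recs=None):
    limit = total if max_recs is None else min(total, max_recs)
    ids = []
    start = 0
    while start < limit:
        # ask for at most retmax ids, and never past the record limit
        ids.extend(fetch_page(start, min(retmax, limit - start)))
        start += retmax
    return ids[:limit]

# toy backend: pretend the database holds sequential integer ids
backend = lambda start, count: list(range(start, start + count))
```

For example, `paged_ids(backend, 10, 3, max_recs=5)` issues two page requests and truncates to five ids, mirroring the max_recs < retmax and max_recs > retmax cases tested below.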
def test_query_retmax(self): """EUtils should join results taken retmax at a time""" g = EUtils(db='protein', rettype='gi', retmax=3, DEBUG=False) result = g['homo[organism] AND myh7'].read().splitlines() assert len(result) > 1 assert '83304912' in result #gi of human myh7 def test_query_max_recs(self): """EUtils should stop query at max_recs when max_recs < retmax""" g = EUtils(db='protein', rettype='gi', max_recs=5, DEBUG=False, retmax=100) result = g['homo[organism] AND myh7'].read().splitlines() self.assertEqual(len(result), 5) def test_query_max_recs_gt_retmax(self): """EUtils should stop query at max_recs when max_recs > retmax""" g = EUtils(db='protein', rettype='gi', max_recs=5, DEBUG=False, retmax=3) result = g['homo[organism] AND myh7'].read().splitlines() self.assertEqual(len(result), 5) class ESearchTests(TestCase): """Tests of the ESearch class: gets primary ids from search.""" def test_simple_search(self): """ESearch Access via a query should return accessions""" s = ESearch(db='protein', rettype='gi', retmax=1000, term='homo[organism] AND myh7') result = s.read() parsed = ESearchResultParser(result) assert '83304912' in parsed.IdList #gi of human cardiac beta myh7 class ELinkTests(TestCase): """Tests of the ELink class: converts ids between databases""" def test_simple_elink(self): """ELink should retrieve a link from a single id""" l = ELink(db='taxonomy', dbfrom='protein', id='83304912') result = l.read() parsed = ELinkResultParser(result) self.assertEqual(parsed, ['9606']) #human sequence def test_multiple_elink(self): """ELink should find unique links in a set of ids""" l = ELink(db='taxonomy', dbfrom='protein', id='83304912 115496169 119586556 111309484') result = l.read() parsed = ELinkResultParser(result) self.assertEqual(sorted(parsed), ['10090', '9606']) #human and mouse sequences class EFetchTests(TestCase): """Tests of the EFetch class: gets records using primary ids.""" def test_simple_efetch(self): """EFetch should return records from 
list of ids""" f = EFetch(db='protein', rettype='fasta', retmode='text', id='111309484') result = f.read().splitlines() assert result[0].startswith('>') assert result[1].startswith('madaemaafg'.upper()) class NcbiTests(TestCase): """Tests of top-level convenience wrappers.""" def setUp(self): """Define some lengthy data.""" self.mouse_taxonomy = map(strip, 'cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus'.split(';')) self.human_taxonomy = map(strip, 'cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo'.split(';')) def test_get_primary_ids(self): """get_primary_ids should get primary ids from query""" res = get_primary_ids('homo[orgn] AND myh7[ti]', retmax=5, max_recs=7) self.assertEqual(len(res), 7) res = get_primary_ids('homo[orgn] AND myh7[ti]', retmax=5, max_recs=2) self.assertEqual(len(res), 2) res = get_primary_ids('homo[orgn] AND myh7[ti]', retmax=100) assert '115496168' in res def test_ids_to_taxon_ids(self): """ids_to_taxonomy should get taxon ids from primary ids""" ids = ['83304912', '115496169', '119586556', '111309484'] result = ids_to_taxon_ids(ids, db='protein') self.assertEqual(sorted(result), ['10090', '9606']) def test_taxon_lineage_extractor(self): """taxon_lineage_extractor should find lineage lines""" lines = """ignore xxx;yyy ignore aaa;bbb """ self.assertEqual(list(taxon_lineage_extractor(lines.splitlines())), [['xxx','yyy'],['aaa','bbb']]) def 
test_parse_taxonomy_using_elementtree_xml_parse(self): """parse_taxonomy_using_elementtree_xml_parse should return taxonomy associated information""" g = EUtils(db='taxonomy', rettype='xml', retmode='xml') ids = '28901[taxid]' fh = StringIO() fh.write(g[ids].read()) fh.seek(0) data = parse_taxonomy_using_elementtree_xml_parse(fh)[0] obs = (data['Lineage'],data['TaxId'],data['ScientificName'],\ data['Rank']) exp = ('cellular organisms; Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Salmonella',\ '28901','Salmonella enterica','species') self.assertEqual(obs,exp) def test_taxon_ids_to_lineages(self): """taxon_ids_to_lineages should return lineages from taxon ids""" taxon_ids = ['10090', '9606'] result = [self.mouse_taxonomy, self.human_taxonomy] self.assertEqualItems(list(taxon_ids_to_lineages(taxon_ids)), result) # def test_taxon_ids_to_names(self): # """taxon_ids_to_names should return names from taxon ids""" # taxon_ids = ['10090', '9606'] # result = set(['Mus musculus', 'Homo sapiens']) # self.assertEqual(set(taxon_ids_to_names(taxon_ids)), result) def test_taxon_ids_to_names(self): """taxon_ids_to_names should return names from taxon ids""" taxon_ids = ['10090', '9606'] result = set(['Mus musculus', 'Homo sapiens']) self.assertEqual(set(taxon_ids_to_names(taxon_ids)), result) def test_taxon_ids_to_names_and_lineages(self): """taxon_ids_to_names should return names/lineages from taxon ids""" taxon_ids = ['10090', '9606'] exp = [('10090', 'Mus musculus', '; '.join(self.mouse_taxonomy)), ('9606', 'Homo sapiens', '; '.join(self.human_taxonomy))] obs = list(taxon_ids_to_names_and_lineages(taxon_ids)) self.assertEqualItems(obs, exp) def test_get_unique_lineages(self): """get_unique_lineages should return all lineages from a query""" result = get_unique_lineages('angiotensin[ti] AND rodents[orgn]') assert tuple(self.mouse_taxonomy) in result assert len(result) > 2 def test_get_unique_taxa(self): """get_unique_taxa should return all 
taxa from a query""" result = get_unique_taxa('angiotensin[ti] AND primate[orgn]') assert 'Homo sapiens' in result assert len(result) > 2 if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_db/test_pdb.py000644 000765 000024 00000001470 12024702176 021207 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of data retrieval from PDB.""" from cogent.util.unit_test import TestCase, main from cogent.db.pdb import Pdb __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class PdbTests(TestCase): """Tests of the Pdb class.""" def test_simple_get(self): """Simple access of an item should work.""" rec = Pdb() result = rec['1RMN'].read() assert result.startswith('HEADER') assert result.rstrip().endswith('END') #note: trailing whitespace assert 'HAMMERHEAD RIBOZYME' in result assert '1RMN' in result if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_db/test_rfam.py000644 000765 000024 00000001351 12024702176 021365 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of data retrieval from PDB.""" from cogent.util.unit_test import TestCase, main from cogent.db.rfam import Rfam __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight","Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class RfamTests(TestCase): """Tests of the Rfam class.""" def test_simple_get(self): """Simple access of an item should work.""" rec = Rfam() result = rec['rf00100'].read() assert result.startswith('# STOCKHOLM') assert 'AM773434.1/1-304' in result if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_db/test_util.py000644 000765 000024 00000005534 12024702176 021424 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
"""Tests of the db utility functions and classes."""
from cogent.util.unit_test import TestCase, main
from cogent.db.util import UrlGetter, expand_slice, last_nondigit_index, \
    make_lists_of_expanded_slices_of_set_size, \
    make_lists_of_accessions_of_set_size
from os import remove

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"


class db_util_tests(TestCase):
    """Tests of top-level functions."""

    def test_last_nondigit_index(self):
        """last_nondigit_index should return i such that s[i:] is numeric"""
        ldi = last_nondigit_index
        self.assertEqual(ldi('3'), 0)
        self.assertEqual(ldi('345'), 0)
        self.assertEqual(ldi('a34'), 1)
        self.assertEqual(ldi('abc234'), 3)
        self.assertEqual(ldi('abcd'), None)

    def test_expand_slice(self):
        """expand_slice should get accession range"""
        self.assertEqual(expand_slice(slice('AF1001', 'AF1003')),
                         ['AF1001', 'AF1002', 'AF1003'])
        # can't expand if accession prefixes differ
        self.assertRaises(TypeError, expand_slice, slice('AF100:', 'AG1002'))
        # should keep leading zeros
        self.assertEqual(expand_slice(slice('AF0001', 'AF0003')),
                         ['AF0001', 'AF0002', 'AF0003'])

    def test_make_lists_of_expanded_slices_of_set_size(self):
        """make_lists_of_expanded_slices_of_set_size: should return a list of lists"""
        expected_list = ['HM780503 HM780504 HM780505', 'HM780506']
        observed = make_lists_of_expanded_slices_of_set_size(
            slice('HM780503', 'HM780506'), size_limit=3)
        self.assertEqual(observed, expected_list)

    def test_make_lists_of_accessions_of_set_size(self):
        """make_lists_of_accessions_of_set_size: should return a list of lists"""
        expected_list = ['HM780503 HM780506 HM780660 HM780780']
        observed = make_lists_of_accessions_of_set_size(
            ['HM780503', 'HM780506', 'HM780660', 'HM780780'], size_limit=3)
        self.assertEqual(observed, expected_list)


class UrlGetterTests(TestCase):
    """Tests 
of the UrlGetter class""" def retrieval_test(self): """Urlgetter should init, read and retrieve""" class Google(UrlGetter): BaseUrl='http://www.google.com' g = Google() #test URL construction self.assertEqual(str(g), 'http://www.google.com') #test reading text = g.read() assert 'Google' in text #test file getting fname = '/tmp/google_test' g.retrieve(fname) g_file = open(fname) g_text = g_file.read() self.assertEqual(g_text, text) g_text.close() remove(fname) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_db/test_ensembl/__init__.py000644 000765 000024 00000000637 12024702176 023632 0ustar00jrideoutstaff000000 000000 __all__ = ['test_assembly', 'test_compara', 'test_database', 'test_feature_level', 'test_genome', 'test_host', 'test_species'] __author__ = "Gavin Huttley, Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" PyCogent-1.5.3/tests/test_db/test_ensembl/test_assembly.py000644 000765 000024 00000011576 12024702176 024755 0ustar00jrideoutstaff000000 000000 import os from cogent.util.unit_test import TestCase, main from cogent.db.ensembl.host import HostAccount, get_ensembl_account from cogent.db.ensembl.assembly import Coordinate, CoordSystem, \ get_coord_conversion from cogent.db.ensembl.genome import Genome __author__ = "Gavin Huttley, Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" Release = 68 if 'ENSEMBL_ACCOUNT' in os.environ: args = os.environ['ENSEMBL_ACCOUNT'].split() host, username, password = args[0:3] kwargs = {} if len(args) > 3: kwargs['port'] = int(args[3]) account = HostAccount(host, username, password, **kwargs) else: account = 
get_ensembl_account(release=Release) human = Genome(Species = 'human', Release=Release, account=account) platypus = Genome(Species = 'platypus', Release=Release, account=account) class TestLocation(TestCase): def test_init(self): human_loc = Coordinate(CoordName='x', Start=1000, End=10000, Strand=-1, genome = human) # TODO: complete test for platpus self.assertEqual(human_loc.CoordType, 'chromosome') self.assertEqual(human_loc.CoordName, 'x') self.assertEqual(human_loc.Start, 1000) self.assertEqual(human_loc.End, 10000) self.assertEqual(human_loc.Strand, -1) self.assertEqual(human_loc.Species, "Homo sapiens") self.assertEqual(human_loc.seq_region_id, 27516) def test_get_coord_conversion(self): """should correctly map between different coordinate levels""" # not really testing the contig coordinates are correct CoordName, Start, End, Strand = '1', 1000, 1000000, 1 human_loc = Coordinate(CoordName = CoordName, Start = Start, End = End, Strand = Strand, genome = human) results = get_coord_conversion(human_loc, 'contig', human.CoreDb) for result in results: self.assertTrue(result[0].CoordName == CoordName) self.assertTrue(result[0].Start >= Start) self.assertTrue(result[0].End <= End) self.assertTrue(result[0].Strand == Strand) def test_coord_shift(self): """adding coordinates should produce correct results""" CoordName, Start, End, Strand = '1', 1000, 1000000, 1 loc1 = Coordinate(CoordName = CoordName, Start = Start, End = End, Strand = Strand, genome = human) for shift in [100, -100]: loc2 = loc1.shifted(shift) self.assertEqual(loc2.Start, loc1.Start+shift) self.assertEqual(loc2.End, loc1.End+shift) self.assertEqual(id(loc1.genome), id(loc2.genome)) self.assertNotEqual(id(loc1), id(loc2)) def test_coord_resize(self): """resizing should work""" CoordName, Start, End, Strand = '1', 1000, 1000000, 1 loc1 = Coordinate(CoordName = CoordName, Start = Start, End = End, Strand = Strand, genome = human) front_shift = -100 back_shift = 100 loc2 = loc1.resized(front_shift, 
back_shift) self.assertEqual(len(loc2), len(loc1)+200) self.assertEqual(loc2.Start, loc1.Start+front_shift) self.assertEqual(loc2.End, loc1.End+back_shift) self.assertEqual(loc1.Strand, loc2.Strand) def test_adopted(self): """coordinate should correctly adopt seq_region_id properties of provided coordinate""" CoordName, Start, End, Strand = '1', 1000, 1000000, 1 c1 = Coordinate(CoordName = CoordName, Start = Start, End = End, Strand = Strand, genome = human) CoordName, Start, End, Strand = '2', 2000, 2000000, 1 c2 = Coordinate(CoordName = CoordName, Start = Start, End = End, Strand = Strand, genome = human) c3 = c1.adopted(c2) self.assertEqual(c3.CoordName, c2.CoordName) self.assertEqual(c3.CoordType, c2.CoordType) self.assertEqual(c3.seq_region_id, c2.seq_region_id) self.assertEqual(c3.Start, c1.Start) self.assertEqual(c3.End, c1.End) self.assertEqual(c3.Strand, c1.Strand) c3 = c1.adopted(c2, shift = 100) self.assertEqual(c3.Start, c1.Start+100) self.assertEqual(c3.End, c1.End+100) class TestCoordSystem(TestCase): def test_call(self): human_chrom = CoordSystem('chromosome', core_db = human.CoreDb, species = 'human') human_contig = CoordSystem(1, species = 'human') self.assertEqual(human_chrom.coord_system_id, 2) self.assertEqual(human_contig.name, 'contig') self.assertEqual(human_contig.attr, 'default_version, sequence_level') if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_db/test_ensembl/test_compara.py000644 000765 000024 00000025354 12024702176 024557 0ustar00jrideoutstaff000000 000000 from __future__ import division import os from cogent.util.unit_test import TestCase, main from cogent.db.ensembl.host import HostAccount, get_ensembl_account from cogent.db.ensembl.compara import Compara __author__ = "Gavin Huttley, Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ 
= "alpha" Release = 68 if 'ENSEMBL_ACCOUNT' in os.environ: args = os.environ['ENSEMBL_ACCOUNT'].split() host, username, password = args[0:3] kwargs = {} if len(args) > 3: kwargs['port'] = int(args[3]) account = HostAccount(host, username, password, **kwargs) else: account = get_ensembl_account(release=Release) def calc_slope(x1, y1, x2, y2): """computes the slope from two coordinate sets, assigning a delta of 1 when values are identical""" delta_y = y2-y1 delta_x = x2-x1 delta_y = [delta_y, 1][delta_y == 0] delta_x = [delta_x, 1][delta_x == 0] return delta_y/delta_x class ComparaTestBase(TestCase): comp = Compara(['human', 'mouse', 'rat', 'platypus'], Release=Release, account=account) class TestCompara(ComparaTestBase): def test_query_genome(self): """compara should attach valid genome attributes by common name""" brca2 = self.comp.Mouse.getGeneByStableId("ENSMUSG00000041147") self.assertEquals(brca2.Symbol.lower(), 'brca2') def test_get_related_genes(self): """should correctly return the related gene regions from each genome""" brca2 = self.comp.Mouse.getGeneByStableId("ENSMUSG00000041147") Orthologs = self.comp.getRelatedGenes(gene_region=brca2, Relationship="ortholog_one2one") self.assertEquals("ortholog_one2one", Orthologs.Relationships[0]) def test_get_related_genes2(self): """should handle case where gene is absent from one of the genomes""" clec2d = self.comp.Mouse.getGeneByStableId( StableId='ENSMUSG00000030157') orthologs = self.comp.getRelatedGenes(gene_region=clec2d, Relationship='ortholog_one2many') self.assertTrue(len(orthologs.Members) < 4) def test_get_collection(self): brca2 = self.comp.Human.getGeneByStableId(StableId="ENSG00000139618") Orthologs = self.comp.getRelatedGenes(gene_region=brca2, Relationship="ortholog_one2one") collection = Orthologs.getSeqCollection() self.assertTrue(len(collection.Seqs[0])> 1000) def test_getting_alignment(self): mid = "ENSMUSG00000041147" brca2 = self.comp.Mouse.getGeneByStableId(StableId=mid) result = 
list(self.comp.getSyntenicRegions(region=brca2, align_method='PECAN', align_clade='vertebrates'))[0] aln = result.getAlignment(feature_types='gene') self.assertTrue(len(aln) > 1000) def test_generate_method_clade_data(self): """should correctly determine the align_method align_clade options for a group of species""" # we should correctly infer the method_species_links, which is a # cogent.util.Table instance self.assertTrue(self.comp.method_species_links.Shape > (0,0)) def test_no_method_clade_data(self): """generate a Table with no rows if no alignment data""" compara = Compara(['S.cerevisiae'], Release=Release, account=account) self.assertEquals(compara.method_species_links.Shape[0], 0) def test_get_syntenic_returns_nothing(self): """should correctly return None for a SyntenicRegion with golden-path assembly gap""" Start = 100000 End = Start + 100000 related = list(self.comp.getSyntenicRegions(Species='mouse', CoordName='1', Start=Start, End=End, align_method='PECAN', align_clade='vertebrates')) self.assertEquals(related, []) def test_get_species_set(self): """should return the correct set of species""" expect = set(['Homo sapiens', 'Ornithorhynchus anatinus', 'Mus musculus', 'Rattus norvegicus']) brca2 = self.comp.Human.getGeneByStableId(StableId="ENSG00000139618") Orthologs = self.comp.getRelatedGenes(gene_region=brca2, Relationship="ortholog_one2one") self.assertEquals(Orthologs.getSpeciesSet(), expect) def test_pool_connection(self): """excercising ability to specify pool connection""" dog = Compara(['chimp', 'dog'], Release=Release, account=account, pool_recycle=1000) class TestSyntenicRegions(TestCase): comp = Compara(['human', 'chimp', 'macaque'], account=account, Release=Release) def test_correct_alignments(self): """should return the correct alignments""" # following cases have a mixture of strand between ref seq and others coords_expected = [ [{'CoordName': 4, 'End': 78099, 'Species': 'human', 'Start': 77999, 'Strand':-1}, {'Homo 
sapiens:chromosome:4:77999-78099:-1': 'ATGTAAATCAAAACCAAAGTCTGCATTTATTTGCGGAAAGAGATGCTACATGTTCAAAGATAAATATGGAACATTTTTTAAAAGCATTCATGACTTAGAA', 'Macaca mulatta:chromosome:1:3891064-3891163:1': 'ATGTCAATCAAAACCAAAGTCTGTATTTATTTGCAGAAAGAGATACTGCATGTTCAAAGATAAATATGGAAC-TTTTTAAAAAGCATTAATGACTTATAC', 'Pan troglodytes:chromosome:4:102056-102156:-1': 'ATGTAAATCAAAACCAAAGTCTGCATTTATTTGCGGAAAGAGATGCTACATGTTCAAAGATAAATATGGAACATTTTTAAAAAGCATTCATGACTTAGAA'}], [{'CoordName': 18, 'End': 213739, 'Species': 'human', 'Start': 213639, 'Strand':-1}, {'Homo sapiens:chromosome:18:213639-213739:-1': 'ATAAGCATTTCCCTTTAGGGCTCTAAGATGAGGTCATCATCGTTTTTAATCCTGAAGAAGGGCTACTGAGTGAGTGCAGATTATTCGGTAAACACT----CTTA', 'Macaca mulatta:chromosome:18:13858303-13858397:1': '------GTTTCCCTTTAGGGCTCTAAGATGAGGTCATCATTGTTTTTAATCCTGAAGAAGGGCTACTGA----GTGCAGATTATTCTGTAAATGTGCTTACTTG', 'Pan troglodytes:chromosome:18:16601082-16601182:1': 'ATAAGCATTTCCCTTTAGGGCTCTAAGATGAGGTCATCATCGTTTTTAATCCTGAAGAAGGGCTACTGA----GTGCAGATTATTCTGTAAACACTCACTCTTA'}], [{'CoordName': 5, 'End': 204974, 'Species': 'human', 'Start': 204874, 'Strand':1}, {'Homo sapiens:chromosome:5:204874-204974:1': 'AACACTTGGTATTT----CCCCTTTATGGAGTGAGAGAGATCTTTAAAATATAAACCCTTGATAATATAATATTACTACTTCCTATTA---CCTGTTATGCAGTTCT', 'Macaca mulatta:chromosome:6:1297736-1297840:-1': 'AACTCTTGGTGTTTCCTTCCCCTTTATGG---GAGAGAGATCTTTAAAATAAAAAACCTTGATAATATAATATTACTACTTTCTATTATCATCTGTTATGCAGTTCT', 'Pan troglodytes:chromosome:5:335911-336011:1': 'AACACTTGGTAGTT----CCCCTTTATGGAGTGAGAGAGATCTTTAAAATATAAACCCTTGATAATATAATATTACTACTTTCTATTA---CCTGTTATGCAGTTCT'}], [{'CoordName': 18, 'End': 203270, 'Species': 'human', 'Start': 203170, 'Strand':-1}, {'Homo sapiens:chromosome:18:203170-203270:-1': 'GGAATAATGAAAGCAATTGTGAGTTAGCAATTACCTTCAAAGAATTACATTTCTTATACAAAGTAAAGTTCATTACTAACCTTAAGAACTTTGGCATTCA', 'Pan troglodytes:chromosome:18:16611584-16611684:1': 'GGAATAATGAAAGCAATTGTAAGTTAGCAATTACCTTCAAAGAATTACATTTCTTATACAAAGTAAAGTTCATTACTAACCTTAAGAACTTTGGCATTCA'}], [{'CoordName': 2, 'End': 
46445, 'Species': 'human', 'Start': 46345, 'Strand':-1}, {'Homo sapiens:chromosome:2:46345-46445:-1': 'CTACCACTCGAGCGCGTCTCCGCTGGACCCGGAACCCCGGTCGGTCCATTCCCCGCGAAGATGCGCGCCCTGGCGGCCCTGAGCGCGCCCCCGAACGAGC', 'Macaca mulatta:chromosome:13:43921-44021:-1': 'CTGCCACTCCAGCGCGTCTCCGCTGCACCCGGAGCGCCGGCCGGTCCATTCCCCGCGAGGATGCGCGCCCTGGCGGCCCTGAACACGTCGGCGAGAGAGC', 'Pan troglodytes:chromosome:2a:36792-36892:-1': 'CTACCACTCGAGCGCGTCTCCGCTGGACCCGGAACCCCAGTCGGTCCATTCCCCGCGAAGATGCGCGCCCTGGCGGCCCTGAACGCGCCCCCGAACGAGC'}], [{'CoordName': 18, 'End': 268049, 'Species': 'human', 'Start': 267949, 'Strand':-1}, {'Homo sapiens:chromosome:18:267949-268049:-1': 'GCGCAGTGGCGGGCACGCGCAGCCGAGAAGATGTCTCCGACGCCGCCGCTCTTCAGTTTGCCCGAAGCGCGGACGCGGTTTACGGTGAGCTGTAGAGGGG', 'Macaca mulatta:chromosome:18:13805604-13805703:1': 'GCGCAG-GGCGGGCACGCGCAGCCGAGAAGATGTCTCCGACGCCGCCGCTCTTCAGTTTGCCCGAAGCGCGGACGCGGTTTACGGTGAGCTGTAGGCGGG', 'Pan troglodytes:chromosome:18:16546800-16546900:1': 'GCGCAGTGGCGGGCACGCGCAGCCGAGAAGATGTCTCCGACGCCGCCGCTCTTCAGTTTGCCCGAAGCGCGGACGCGGTTTACGGTGAGCTGTAGCGGGG'}], [{'CoordName': 16, 'End': 107443, 'Species': 'human', 'Start': 107343, 'Strand':-1}, {'Homo sapiens:chromosome:16:107343-107443:-1': 'AAGAAGCAAACAGGTTTATTTTATACAGTGGGCCAGGCCGTGGGTCTGCCATGTGACTAGGGCATTTGGACCTAGGGAGAGGTCAGTCTCAGGCCAAGTA', 'Pan troglodytes:chromosome:16:48943-49032:-1': 'AAGAAGCAAACAGGTTTATTTTATACACTGGGCCAGGCCGTGGGTCTGCCATGTGACTAGGGAATTTGGACC-----------CAGTCTCAGGCCAAGTA'}] ] # print self.comp.method_species_links for coord, expect in coords_expected[1:]: syntenic = list( self.comp.getSyntenicRegions(method_clade_id=548, **coord))[0] # check the slope computed from the expected and returned # coordinates is ~ 1 got_names = dict([(n.split(':')[0], n.split(':')) for n in syntenic.getAlignment().Names]) exp_names = dict([(n.split(':')[0], n.split(':')) for n in expect.keys()]) for species in exp_names: exp_chrom = exp_names[species][2] got_chrom = got_names[species][2] self.assertEquals(exp_chrom.lower(), 
got_chrom.lower()) exp_start, exp_end = map(int, exp_names[species][3].split('-')) got_start, got_end = map(int, got_names[species][3].split('-')) slope = calc_slope(exp_start, exp_end, got_start, got_end) self.assertFloatEqual(abs(slope), 1.0, eps=1e-3) def test_failing_region(self): """should correctly handle queries where multiple Ensembl have genome block associations for multiple coord systems""" gene = self.comp.Human.getGeneByStableId(StableId='ENSG00000188554') # this should simply not raise any exceptions syntenic_regions = list(self.comp.getSyntenicRegions(region=gene, align_method='PECAN', align_clade='vertebrates')) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_db/test_ensembl/test_database.py000644 000765 000024 00000007404 12024702176 024675 0ustar00jrideoutstaff000000 000000 import os from cogent.util.unit_test import TestCase, main from cogent.db.ensembl.host import HostAccount, get_ensembl_account from cogent.db.ensembl.database import Database __author__ = "Gavin Huttley, Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" Release = 68 if 'ENSEMBL_ACCOUNT' in os.environ: args = os.environ['ENSEMBL_ACCOUNT'].split() host, username, password = args[0:3] kwargs = {} if len(args) > 3: kwargs['port'] = int(args[3]) account = HostAccount(host, username, password, **kwargs) else: account = get_ensembl_account(release=Release) class TestDatabase(TestCase): def test_connect(self): human = Database(account=account, release=Release, species='human', db_type='core') gene = human.getTable('gene') def test_get_distinct(self): """should return list of strings""" db = Database(account=account, release=Release, species='human', db_type='variation') tn, tc = 'variation_feature', 'consequence_types' expected = set(('3_prime_UTR_variant', 
'splice_acceptor_variant', '5_prime_UTR_variant')) got = db.getDistinct(tn, tc) self.assertNotEquals(set(got) & expected, set()) db = Database(account=account, release=Release, species='human', db_type='core') tn, tc = 'gene', 'biotype' expected = set(['protein_coding', 'pseudogene', 'processed_transcript', 'Mt_tRNA', 'Mt_rRNA', 'IG_V_gene', 'IG_J_gene', 'IG_C_gene', 'IG_D_gene', 'miRNA', 'misc_RNA', 'snoRNA', 'snRNA', 'rRNA']) got = set(db.getDistinct(tn, tc)) self.assertNotEquals(set(got) & expected, set()) db = Database(account=account, release=Release, db_type='compara') got = set(db.getDistinct('homology', 'description')) expected = set(['apparent_ortholog_one2one', 'ortholog_many2many', 'ortholog_one2many', 'ortholog_one2one', 'within_species_paralog']) self.assertEquals(len(got&expected), len(expected)) def test_get_table_row_counts(self): """should return correct row counts for some tables""" expect = {'homo_sapiens_core_68_37.analysis': 61L, 'homo_sapiens_core_68_37.seq_region': 55616L, 'homo_sapiens_core_68_37.assembly': 102090L, 'homo_sapiens_core_68_37.qtl': 0L} human = Database(account=account, release=Release, species='human', db_type='core') table_names = [n.split('.')[1] for n in expect] got = dict(human.getTablesRowCount(table_names).getRawData()) for dbname in expect: self.assertTrue(got[dbname] >= expect[dbname]) def test_table_has_column(self): """return correct values for whether a Table has a column""" account = get_ensembl_account(release=Release) var61 = Database(account=account, release=61, species='human', db_type='variation') var62 = Database(account=account, release=62, species='human', db_type='variation') self.assertTrue(var61.tableHasColumn('transcript_variation', 'peptide_allele_string')) self.assertFalse(var61.tableHasColumn('transcript_variation', 'pep_allele_string')) self.assertTrue(var62.tableHasColumn('transcript_variation', 'pep_allele_string')) self.assertFalse(var62.tableHasColumn('transcript_variation', 
'peptide_allele_string')) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_db/test_ensembl/test_feature_level.py000644 000765 000024 00000006171 12024702176 025753 0ustar00jrideoutstaff000000 000000 import os from cogent import DNA from cogent.util.unit_test import TestCase, main from cogent.db.ensembl.host import HostAccount, get_ensembl_account from cogent.db.ensembl.genome import Genome from cogent.db.ensembl.assembly import CoordSystem, Coordinate, get_coord_conversion from cogent.db.ensembl.feature_level import FeatureCoordLevels __author__ = "Gavin Huttley, Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" Release = 68 if 'ENSEMBL_ACCOUNT' in os.environ: args = os.environ['ENSEMBL_ACCOUNT'].split() host, username, password = args[0:3] kwargs = {} if len(args) > 3: kwargs['port'] = int(args[3]) account = HostAccount(host, username, password, **kwargs) else: account = get_ensembl_account(release=Release) class TestFeatureCoordLevels(TestCase): def setUp(self): self.chicken = Genome(Species='chicken', Release=Release, account=account) def test_feature_levels(self): ChickenFeatureLevels = FeatureCoordLevels('chicken') chicken_feature_levels = ChickenFeatureLevels( feature_types=['gene', 'cpg', 'est'], core_db=self.chicken.CoreDb, otherfeature_db=self.chicken.OtherFeaturesDb) self.assertEquals(chicken_feature_levels['repeat'].levels, ['contig']) self.assertEquals(set(chicken_feature_levels['cpg'].levels),\ set(['contig', 'supercontig', 'chromosome'])) def test_repeat(self): # use chicken genome as it need to do conversion # chicken coordinate correspondent toRefSeq human IL2A region coord = dict(CoordName=9, Start=23817146, End=23818935) region = self.chicken.getRegion(**coord) # repeat is recorded at contig level, strand is 0 repeats = 
region.getFeatures(feature_types = 'repeat') expect = [("9", 23817293, 23817321), ("9", 23817803, 23817812), ("9", 23817963, 23817972)] obs = [] for repeat in repeats: loc = repeat.Location obs.append((loc.CoordName, loc.Start, loc.End)) self.assertEquals(set(obs), set(expect)) def test_cpg(self): # contain 3 CpG island recorded at chromosome level coord1 = dict(CoordName=26, Start=110000, End=190000) cpgs1 = self.chicken.getFeatures(feature_types = 'cpg', **coord1) exp = [("26", 116969, 117955), ("26", 139769, 140694), ("26", 184546, 185881)] obs = [] for cpg in cpgs1: loc = cpg.Location obs.append((loc.CoordName, loc.Start, loc.End)) self.assertEquals(set(exp), set(obs)) # test cpg features record at supercontig level: coord2 = dict(CoordName='Un_random', Start=29434117, End=29439117) cpgs2 = self.chicken.getFeatures(feature_types='cpg', **coord2) self.assertEquals(len(list(cpgs2)), 1) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_db/test_ensembl/test_genome.py000644 000765 000024 00000076673 12024702176 024421 0ustar00jrideoutstaff000000 000000 import os from cogent import DNA from cogent.util.unit_test import TestCase, main from cogent.db.ensembl.host import HostAccount, get_ensembl_account from cogent.db.ensembl.util import convert_strand from cogent.db.ensembl.genome import Genome from cogent.db.ensembl.sequence import _assemble_seq from cogent.db.ensembl.util import asserted_one __author__ = "Gavin Huttley, Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" Release = 67 if 'ENSEMBL_ACCOUNT' in os.environ: args = os.environ['ENSEMBL_ACCOUNT'].split() host, username, password = args[0:3] kwargs = {} if len(args) > 3: kwargs['port'] = int(args[3]) account = HostAccount(host, username, password, **kwargs) else: account = 
get_ensembl_account(release=Release) class GenomeTestBase(TestCase): human = Genome(Species="human", Release=Release, account=account) mouse = Genome(Species="mouse", Release=Release, account=account) rat = Genome(Species="rat", Release=Release, account=account) macaq = Genome(Species="macaque", Release=Release, account=account) gorilla = Genome(Species="gorilla", Release=Release, account=account) brca2 = human.getGeneByStableId(StableId="ENSG00000139618") class TestGenome(GenomeTestBase): def test_other_features(self): """should correctly return record for ENSESTG00000035043""" est = self.human.getEstMatching(StableId='ENSESTG00000035043') direct = list(est)[0] ests = self.human.getFeatures(feature_types='est', CoordName=8, Start=121470000, End=121600000) stable_ids = [est.StableId for est in ests] self.assertContains(stable_ids, direct.StableId) def test_genome_comparison(self): """different genome instances with same CoreDb connection are equal""" h2 = Genome(Species='human', Release=Release, account=account) self.assertEquals(self.human, h2) def test_make_location(self): """should correctly make a location for an entire chromosome""" loc = self.human.makeLocation(CoordName=1) self.assertEquals(len(loc), 249250621) def test_get_region(self): """should return a generic region that extracts correct sequence""" chrom = 1 Start = 11137 End = Start+20 region = self.human.getRegion(CoordName=chrom, Start=Start, End=End, ensembl_coord=True) self.assertEquals(region.Location.Start, Start-1) self.assertEquals(region.Location.End, End) self.assertEquals(region.Location.CoordName, str(chrom)) self.assertEquals(region.Location.CoordType, 'chromosome') self.assertEquals(region.Seq, 'ACCTCAGTAATCCGAAAAGCC') def test_get_assembly_exception_region(self): """should return correct sequence for region with an assembly exception""" ##old:chrY:57767412-57767433; New: chrY:59358024-59358045 region = self.human.getRegion(CoordName = "Y", Start = 59358024, End = 59358045, Strand = 1, 
ensembl_coord = True) self.assertEquals(str(region.Seq), 'CGAGGACGACTGGGAATCCTAG') def test_no_assembly(self): """return N's for coordinates with no assembly""" krat = Genome('Kangaroo rat', Release=58) Start=24385 End=Start+100 region = krat.getRegion(CoordName='scaffold_13754', Start=Start, End=End) self.assertEquals(str(region.Seq), 'N' * (End-Start)) def test_getting_annotated_seq(self): """a region should return a sequence with the correct annotation""" new_loc = self.brca2.Location.resized(-100, 100) region = self.human.getRegion(region=new_loc) annot_seq = region.getAnnotatedSeq(feature_types='gene') gene_annots = annot_seq.getAnnotationsMatching('gene') self.assertEquals(gene_annots[0].Name, self.brca2.Symbol) def test_correct_feature_type_id_cache(self): """should obtain the feature type identifiers without failure""" self.assertNotEquals(self.human._feature_type_ids.CpGisland, None) def test_strand_conversion(self): """should consistently convert strand info""" self.assertEquals(convert_strand(None), 1) self.assertEquals(convert_strand(-1), -1) self.assertEquals(convert_strand(1), 1) self.assertEquals(convert_strand('-'), -1) self.assertEquals(convert_strand('+'), 1) self.assertEquals(convert_strand(-1.0), -1) self.assertEquals(convert_strand(1.0), 1) def test_pool_connection(self): """excercising ability to specify pool connection""" dog = Genome(Species="dog", Release=Release, account=account, pool_recycle=1000) def test_gorilla(self): """should correctly return a gorilla gene""" self.gorilla = Genome(Species="gorilla", Release=Release, account=account) gene = self.gorilla.getGeneByStableId('ENSGGOG00000005730') self.assertEquals(str(gene.Seq[:10]), 'TGGGAGTCCA') def test_diff_strand_contig_chrom(self): """get correct sequence when contig and chromosome strands differ""" gene = self.gorilla.getGeneByStableId('ENSGGOG00000001953') cds = gene.CanonicalTranscript.Cds self.assertEquals(str(cds), 'ATGGCCCAGGATCTCAGCGAGAAGGACCTGTTGAAGATG' 
'GAGGTGGAGCAGCTGAAGAAAGAAGTGAAAAACACAAGAATTCCGATTTCCAAAGCGGGAAAGGAAAT' 'CAAAGAGTACGTGGAGGCCCAAGCAGGAAACGATCCTTTTCTCAAAGGCATCCCTGAGGACAAGAATC' 'CCTTCAAGGAGAAAGGTGGCTGTCTGATAAGCTGA') def test_get_distinct_biotype(self): """Genome instance getDistinct should work on all genomes""" for genome in self.gorilla, self.human, self.mouse, self.rat, self.macaq: biotypes = genome.getDistinct('biotype') class TestGene(GenomeTestBase): def _eval_brca2(self, brca2): """tests all attributes correctly created""" self.assertEquals(brca2.Symbol.lower(), 'brca2') self.assertEquals(brca2.StableId, 'ENSG00000139618') self.assertEquals(brca2.BioType.lower(), 'protein_coding') self.assertContains(brca2.Description.lower(), 'breast cancer') self.assertEquals(brca2.Status, 'KNOWN') self.assertEquals(brca2.CanonicalTranscript.StableId, 'ENST00000380152') # note length can change between genome builds self.assertGreaterThan(len(brca2), 83700) transcript = brca2.getMember('ENST00000380152') self.assertEquals(transcript.getCdsLength(),len(transcript.Cds)) def test_get_genes_by_stable_id(self): """if get gene by stable_id, attributes should be correctly constructed""" self._eval_brca2(self.brca2) def test_get_exons(self): """transcript should return correct exons for brca2""" transcript = self.brca2.getMember('ENST00000380152') self.assertEquals(len(transcript.TranslatedExons), 26) self.assertEquals(len(transcript.Cds), 3419*3) self.assertEquals(len(transcript.ProteinSeq), 3418) def test_translated_exons(self): """should correctly translate a gene with 2 exons but 1st exon transcribed""" gene = self.mouse.getGeneByStableId(StableId='ENSMUSG00000036136') transcript = gene.getMember('ENSMUST00000041133') self.assertTrue(len(transcript.ProteinSeq) > 0) # now one on the - strand gene = self.mouse.getGeneByStableId(StableId='ENSMUSG00000045912') transcript = gene.Transcripts[0] self.assertTrue(len(transcript.ProteinSeq) > 0) def test_failed_ensembl_annotation(self): """we demonstrate a failed 
annotation by ensembl""" # I'm including this to demonstrate that Ensembl coords are # complex. This case has a macaque gene which we correctly # infer the CDS boundaries for according to Ensembl, but the CDS # length is not divisible by 3. gene = self.macaq.getGeneByStableId(StableId='ENSMMUG00000001551') transcript = gene.getMember('ENSMMUT00000002194') # the following works because we enforce the length being divisble by 3 # in producing ProteinSeq prot_seq = transcript.ProteinSeq # BUT if you work off the Cds you will need to slice the CDS to be # divisible by 3 to get the same protein sequence l = transcript.getCdsLength() trunc_cds = transcript.Cds[: l - (l % 3)] prot_seq = trunc_cds.getTranslation() self.assertEquals(str(prot_seq), 'MPSSPLRVAVVCSSNQNRSMEAHNILSKRGFSVRSFGTGTHVKLPGPAPDKPNVYDFKTT'\ 'YDQMYNDLLRKDKELYTQNGILHMLDRNKRIKPRPERFQNCKDLFDLILTCEERVY') def test_exon_phases(self): """correctly identify phase for an exon""" stable_id = 'ENSG00000171408' gene = self.human.getGeneByStableId(StableId=stable_id) exon1 = gene.Transcripts[1].Exons[0] # first two bases of codon missing self.assertEquals(exon1.PhaseStart, 2) # last two bases of codon missing self.assertEquals(exon1.PhaseEnd, 1) # can translate the sequence if we take those into account seq = exon1.Seq[1:-1].getTranslation() self.assertEquals(str(seq), 'HMLSKVGMWDFDIFLFDRLTN') def test_cds_from_outofphase(self): """return a translatable Cds sequence from out-of-phase start""" # canonical transcript phase end_phase # ENSG00000111729 ENST00000229332 -1 -1 # ENSG00000177151 ENST00000317450 0 -1 # ENSG00000249624 ENST00000433395 1 -1 # ENSG00000237276 ENST00000442385 2 -1 # ENSG00000167744 ENST00000301411 -1 0 canon_ids = 'ENSG00000111729 ENSG00000177151 ENSG00000237276 ENSG00000167744 ENSG00000251184'.split() for index, stable_id in enumerate(canon_ids): gene = self.human.getGeneByStableId(StableId=stable_id) transcript = gene.CanonicalTranscript prot_seq = transcript.ProteinSeq def 
test_gene_transcripts(self): """should return multiple transcripts""" stable_id = 'ENSG00000012048' gene = self.human.getGeneByStableId(StableId=stable_id) self.assertTrue(len(gene.Transcripts) > 1) # .. and correctly construct the Cds and location for transcript in gene.Transcripts: self.assertTrue(transcript.getCdsLength()>0) self.assertEquals(transcript.Location.CoordName,'17') def test_get_longest_cds_transcript2(self): """should correctly return transcript with longest cds""" # ENSG00000123552 is protein coding, ENSG00000206629 is ncRNA for stable_id, max_cds_length in [('ENSG00000123552', 2445), ('ENSG00000206629', 164)]: gene = self.human.getGeneByStableId(StableId=stable_id) ts = gene.getLongestCdsTranscript() self.assertEquals(len(ts.Cds), max_cds_length) self.assertEquals(ts.getCdsLength(), max(gene.getCdsLengths())) def test_get_longest_cds_transcript1(self): """should correctly return transcript with longest cds""" stable_id = 'ENSG00000178591' gene = self.human.getGeneByStableId(StableId=stable_id) ts = gene.getLongestCdsTranscript() self.assertEquals(ts.getCdsLength(), max(gene.getCdsLengths())) def test_rna_transcript_cds(self): """should return a Cds for an RNA gene too""" rna_gene = self.human.getGeneByStableId(StableId='ENSG00000210049') self.assertTrue(rna_gene.Transcripts[0].getCdsLength() > 0) def test_gene_annotation(self): """should correctly annotate a sequence""" annot_seq = self.brca2.getAnnotatedSeq(feature_types='gene') gene_annots = annot_seq.getAnnotationsMatching('gene') self.assertEquals(gene_annots[0].Name, self.brca2.Symbol) def test_get_by_symbol(self): """selecting a gene by its HGNC symbol should correctly populate all specified attributes""" results = self.human.getGenesMatching(Symbol="BRCA2") for gene in results: self._eval_brca2(gene) def test_get_by_symbol_synonym(self): """return the correct gene when provided a synonym, rather than a symbol""" synonym = 'FOXO1A' gene = list(self.human.getGenesMatching(Symbol=synonym))[0] 
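The out-of-phase and longest-CDS tests above all rely on trimming a CDS to a whole number of codons before translation; this is a minimal standalone sketch of that trimming step over plain strings (the `trim_to_codons` helper is hypothetical, not part of cogent, which does this internally when producing ProteinSeq):

```python
def trim_to_codons(cds):
    """Truncate a CDS string so its length is a whole number of codons."""
    return cds[: len(cds) - (len(cds) % 3)]

# a 7 nt CDS keeps only its first two complete codons
assert trim_to_codons("ATGGCCA") == "ATGGCC"
# an in-frame CDS passes through unchanged
assert trim_to_codons("ATGTAA") == "ATGTAA"
```

This mirrors the `transcript.Cds[: l - (l % 3)]` slicing used in test_failed_ensembl_annotation.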
self.assertEquals(gene.Symbol, 'FOXO1') def test_get_by_description(self): """getting a gene by description should correctly construct all attributes""" description='breast cancer 2' results = list(self.human.getGenesMatching(Description=description)) self.assertEquals(len(results), 1) self._eval_brca2(results[0]) def test_get_member(self): """should return correct exon and translated exon""" transcript = self.brca2.getMember('ENST00000380152') # just returns the first exon_id = 'ENSE00001484009' exon = transcript.getMember(exon_id) trans_exon = transcript.getMember(exon_id,'TranslatedExons') self.assertEquals(exon.StableId, exon_id) self.assertEquals(trans_exon.StableId, exon_id) # we check we got Exon in the first call and TranslatedExon in the # second using the fact that the Exons entry is longer than the # TranslatedExons one self.assertGreaterThan(len(exon), len(trans_exon)) def test_get_by_biotype(self): """should return the correct number of genes for a biotype query""" results = list(self.human.getGenesMatching(BioType='Mt_tRNA', like=False)) self.assertEquals(len(results), 22) results = list(self.human.getGenesMatching(BioType='Mt_tRNA', like=True)) self.assertEquals(len(results), 607) def test_get_by_descr_biotype(self): """combining the description and biotype should return a result""" results = list(self.human.getGenesMatching(BioType="protein_coding", Description="cancer")) self.assertTrue(len(results) > 50) def test_variant(self): """variant attribute correctly constructed""" self.assertTrue(len(self.brca2.Variants) > 880) def test_get_gene_by_stable_id(self): """should correctly handle getting gene by stable_id""" stable_id = 'ENSG00000012048' gene = self.human.getGeneByStableId(StableId=stable_id) self.assertEquals(gene.StableId, stable_id) # if invalid stable_id, should just return None stable_id = 'ENSG00000XXXXX' gene = self.human.getGeneByStableId(StableId=stable_id) self.assertEquals(gene, None) def test_intron_number(self): """number of introns should be correct""" for gene_id, transcript_id, exp_number in [ 
('ENSG00000227268', 'ENST00000445946', 0), ('ENSG00000132199', 'ENST00000319815', 8), ('ENSG00000132199', 'ENST00000383578', 15)]: gene = asserted_one(self.human.getGenesMatching(StableId=gene_id)) transcript = asserted_one( [t for t in gene.Transcripts if t.StableId==transcript_id]) if exp_number == 0: self.assertEqual(transcript.Introns, None) else: self.assertEqual(len(transcript.Introns), exp_number) def test_intron(self): """should get correct Intron sequence, regardless of strand""" # IL2 is on - strand, IL13 is on + strand, both have three introns IL2_exp_introns = [ (1, 123377358, 123377448, 'gtaagtatat', 'actttcttag'), (2, 123375008, 123377298, 'gtaagtacaa', 'attattctag'), (3, 123373017,123374864, 'gtaaggcatt', 'tcttttatag')] IL13_exp_introns = [ (1, 131994052, 131995109, 'gtgagtgtcg', 'gctcccacag'), (2, 131995163, 131995415, 'gtaaggacct', 'ctccccacag'), (3, 131995520, 131995866, 'gtaaggcatc', 'tgtcctgcag')] for symbol, stable_id, exp_introns in [ ('IL2', 'ENST00000226730', IL2_exp_introns), ('IL13', 'ENST00000304506', IL13_exp_introns)]: gene = asserted_one(self.human.getGenesMatching(Symbol=symbol)) strand = gene.Location.Strand transcript = asserted_one( [t for t in gene.Transcripts if t.StableId==stable_id]) introns = transcript.Introns self.assertEqual(len(introns), len(exp_introns)) idx = 0 for intron in introns: loc = intron.Location start, end = loc.Start, loc.End seq = str(intron.Seq) exp_rank, exp_start, exp_end, exp_seq5, \ exp_seq3 = exp_introns[idx] self.assertEqual(loc.Strand, strand) # test the order using rank self.assertEqual(intron.Rank, exp_rank) # test position self.assertEqual(start, exp_start) self.assertEqual(end, exp_end) # test sequence self.assertEqual(seq[:10], exp_seq5.upper()) self.assertEqual(seq[-10:], exp_seq3.upper()) idx += 1 def test_intron_annotation(self): """sequences annotated with Introns should return correct seq""" for symbol, stable_id, rank, exp_seq5, exp_seq3 in [ ('IL2', 'ENST00000226730', 1, 'gtaagtatat', 
'actttcttag'), ('IL13', 'ENST00000304506', 3, 'gtaaggcatc', 'tgtcctgcag')]: gene = asserted_one(self.human.getGenesMatching(Symbol=symbol)) seq = gene.getAnnotatedSeq(feature_types='gene') intron = asserted_one(seq.getAnnotationsMatching('intron', '%s-%d'%(stable_id, rank))) intron_seq = str(seq.getRegionCoveringAll(intron).getSlice()) self.assertEqual(intron_seq[:10], exp_seq5.upper()) self.assertEqual(intron_seq[-10:], exp_seq3.upper()) class TestVariation(GenomeTestBase): snp_names = ['rs34213141', 'rs12791610', 'rs10792769', 'rs11545807', 'rs11270496'] snp_nt_alleles = ['G/A', 'C/T', 'A/G', 'C/A', 'CAGCTCCAGCTC/-'] snp_aa_alleles = ['G/R', 'P/L', 'Y/C', "V/F", "GAGAV/V"] snp_effects = ['non_synonymous_codon']*3+[['2KB_upstream_variant', '5KB_upstream_variant', 'non_synonymous_codon']]+['non_synonymous_codon'] snp_nt_len = [1, 1, 1, 1, 12] map_weights = [1,1,1,1,1] snp_flanks = [ ('CTGAGGTGAGCCAGCGTTGGAGCTGTTTTTCCTTTCAGTATGAATTCCACAAGGAAATCATCTCAGGAGGAAGGGCTCATACTTGGATCCAGAAAATATCAACATAGCCAAAGAAAAACAATCAAGACATACCTCCAGGAGCTGTGTAACAGCAACCGGAAAGAGAAACAATGGTGTGTTCCTATGTGGGATATAAAGAGCCGGGGCTCAGGGGGCTCCACACCTGCACCTCCTTCTCACCTGCTCCTCTACCTGCTCCACCCTCAATCCACCAGAACCATGGGCTGCTGTGGCTGCTCC', 'GAGGCTGTGGCTCCAGCTGTGGAGGCTGTGACTCCAGCTGTGGGAGCTGTGGCTCTGGCTGCAGGGGCTGTGGCCCCAGCTGCTGTGCACCCGTCTACTGCTGCAAGCCCGTGTGCTGCTGTGTTCCAGCCTGTTCCTGCTCTAGCTGTGGCAAGCGGGGCTGTGGCTCCTGTGGGGGCTCCAAGGGAGGCTGTGGTTCTTGTGGCTGCTCCCAGTGCAGTTGCTGCAAGCCCTGCTGTTGCTCTTCAGGCTGTGGGTCATCCTGCTGCCAGTGCAGCTGCTGCAAGCCCTACTGCTCCC'), ('GAAAATATCAACATAGCCAAAGAAAAACAATCAAGACATACCTCCAGGAGCTGTGTAACAGCAACCGGAAAGAGAAACAATGGTGTGTTCCTATGTGGGATATAAAGAGCCGGGGCTCAGGGGGCTCCACACCTGCACCTCCTTCTCACCTGCTCCTCTACCTGCTCCACCCTCAATCCACCAGAACCATGGGCTGCTGTGGCTGCTCCGGAGGCTGTGGCTCCAGCTGTGGAGGCTGTGACTCCAGCTGTGGGAGCTGTGGCTCTGGCTGCAGGGGCTGTGGCCCCAGCTGCTGTGCAC', 
'CGTCTACTGCTGCAAGCCCGTGTGCTGCTGTGTTCCAGCCTGTTCCTGCTCTAGCTGTGGCAAGCGGGGCTGTGGCTCCTGTGGGGGCTCCAAGGGAGGCTGTGGTTCTTGTGGCTGCTCCCAGTGCAGTTGCTGCAAGCCCTGCTGTTGCTCTTCAGGCTGTGGGTCATCCTGCTGCCAGTGCAGCTGCTGCAAGCCCTACTGCTCCCAGTGCAGCTGCTGTAAGCCCTGTTGCTCCTCCTCGGGTCGTGGGTCATCCTGCTGCCAATCCAGCTGCTGCAAGCCCTGCTGCTCATCCTC'), ('ATCAACATAGCCAAAGAAAAACAATCAAGACATACCTCCAGGAGCTGTGTAACAGCAACCGGAAAGAGAAACAATGGTGTGTTCCTATGTGGGATATAAAGAGCCGGGGCTCAGGGGGCTCCACACCTGCACCTCCTTCTCACCTGCTCCTCTACCTGCTCCACCCTCAATCCACCAGAACCATGGGCTGCTGTGGCTGCTCCGGAGGCTGTGGCTCCAGCTGTGGAGGCTGTGACTCCAGCTGTGGGAGCTGTGGCTCTGGCTGCAGGGGCTGTGGCCCCAGCTGCTGTGCACCCGTCT', 'CTGCTGCAAGCCCGTGTGCTGCTGTGTTCCAGCCTGTTCCTGCTCTAGCTGTGGCAAGCGGGGCTGTGGCTCCTGTGGGGGCTCCAAGGGAGGCTGTGGTTCTTGTGGCTGCTCCCAGTGCAGTTGCTGCAAGCCCTGCTGTTGCTCTTCAGGCTGTGGGTCATCCTGCTGCCAGTGCAGCTGCTGCAAGCCCTACTGCTCCCAGTGCAGCTGCTGTAAGCCCTGTTGCTCCTCCTCGGGTCGTGGGTCATCCTGCTGCCAATCCAGCTGCTGCAAGCCCTGCTGCTCATCCTCAGGCTG'), ('GCTGAAGAAACCATTTCAAACAGGATTGGAATAGGGAAACCCGGCACTCAGCTCGGCGCAAGCCGGCGGTGCCTTCAGACTAGAGAGCCTCTCCTCCGGTGCGCTGCAAGTAGGGCCTCGGCTCGAGGTCAACATTCTAGTTGTCCAGCGCTCCCTCTCCGGCACCTCGGTGAGGCTAGTTGACCCGACAGGCGCGGATCATGAGCAGCTGCAGGAGAATGAAGAGCGGGGACGTAATGAGGCCGAACCAGAGCTCCCGAGTCTGCTCCGCCAGCTTCTGGCACAACAGCATCTCGAAGA', 'GAACTTGAGACTCAGGACCGTAAGTACCCAGAAAAGGCGGAGCACCGCCAGCCGCTTCTCTCCATCCTGGAAGAGGCGCACGGACACGATGGTGGTGAAGTAGGTGCTGAGCCCGTCAGCGGCGAAGAAAGGCACGAACACGTTCCACCAGGAGAGGCCCGGGACCAGGCCATCCACACGCAGTGCCAGCAGCACAGAGAACACCAACAGGGCCAGCAGGTGCACGAAGATCTCGAAGGTGGCGAAGCCTAGCCACTGCACCAGCTCCCGGAGCGAGAAGAGCATCGCGCCCGTTGAGCG')] def test_get_variation_by_symbol(self): """should return correct snp when query genome by symbol""" # supplement this test with some synonymous snp's, where they have no # peptide alleles for i in range(4): snp = list(self.human.getVariation(Symbol=self.snp_names[i]))[0] self.assertEquals(snp.Symbol, self.snp_names[i]) self.assertEquals(snp.Effect, self.snp_effects[i]) self.assertEquals(snp.Alleles, self.snp_nt_alleles[i]) self.assertEquals(snp.MapWeight, 
self.map_weights[i]) def test_num_alleles(self): """should correctly infer the number of alleles""" for i in range(4): snp = list(self.human.getVariation(Symbol=self.snp_names[i]))[0] self.assertEquals(len(snp), self.snp_nt_len[i]) def test_get_peptide_alleles(self): """should correctly infer the peptide alleles""" for i in range(4): snp = list(self.human.getVariation(Symbol=self.snp_names[i]))[0] if snp.Effect == 'INTRONIC': continue self.assertEquals(snp.PeptideAlleles, self.snp_aa_alleles[i]) def test_get_peptide_location(self): """should return correct location for aa variants""" index = self.snp_names.index('rs11545807') snp = list(self.human.getVariation(Symbol=self.snp_names[index]))[0] self.assertEquals(snp.TranslationLocation, 95) def test_validation_status(self): """should return correct validation status""" def func(x): if type(x) == str or x is None: x = [x] return set(x) data = (('rs34213141', set(['freq']), func), ('rs12791610', set(['cluster', 'freq']), func), ('rs10792769', set(['cluster', 'freq', '1000Genome', 'hapmap', 'doublehit']), func)) for name, status, conv in data: snp = list(self.human.getVariation(Symbol=name))[0] self.assertTrue(status <= conv(snp.Validation)) def test_get_flanking_seq(self): """should correctly get the flanking sequence""" for i in range(4): # only have flanking sequence for 3 snp = list(self.human.getVariation(Symbol=self.snp_names[i]))[0] self.assertEquals(snp.FlankingSeq, self.snp_flanks[i]) def test_variation_seq(self): """should return the sequence for a Variation snp if asked""" snp = list(self.human.getVariation(Symbol=self.snp_names[0]))[0] self.assertContains(snp.Alleles, str(snp.Seq)) def test_get_validation_condition(self): """simple test of SNP validation status""" snp_status = [('rs94', False), ('rs90', True)] for symbol, status in snp_status: snp = list(self.human.getVariation(Symbol=symbol, validated=True)) self.assertEquals(snp != [], status) def test_allele_freqs(self): """exercising getting AlleleFreq 
data""" snp = list(self.human.getVariation(Symbol='rs34213141'))[0] expect = set([('A', '0.0303'), ('G', '0.9697')]) allele_freqs = snp.AlleleFreqs allele_freqs = set((a, '%.4f' % f) for a, f in allele_freqs.getRawData(['allele', 'freq'])) self.assertTrue(expect.issubset(allele_freqs)) class TestFeatures(GenomeTestBase): def setUp(self): self.igf2 = self.human.getGeneByStableId(StableId='ENSG00000167244') def test_CpG_island(self): """should return correct CpG islands""" CpGislands = self.human.getFeatures(region=self.igf2, feature_types='CpG') expected_stats = [(630, 757), (652, 537), (3254, 3533)] obs_stats = [(int(island.Score), len(island)) \ for island in CpGislands] obs_stats.sort() self.assertTrue(set(expected_stats) & set(obs_stats) != set()) def test_get_multiple_features(self): """should not fail to get multiple feature types""" regions =\ self.human.getFeatures(feature_types=['repeat','gene','cpg'], CoordName=1, Start=869936,End=901867) for region in regions: pass def test_repeats(self): """should correctly return a repeat""" loc = self.igf2.Location.resized(-1000, 1000) repeats = list(self.human.getFeatures( region=loc, feature_types='repeat')) self.assertTrue(len(repeats) >= 4) def test_genes(self): """should correctly identify igf2 within a region""" loc = self.igf2.Location.resized(-1000, 1000) genes = self.human.getFeatures(region=loc, feature_types='gene') symbols = [g.Symbol.lower() for g in genes] self.assertContains(symbols, self.igf2.Symbol.lower()) def test_other_genes(self): """should return gene annotations for regions from other genomes""" mouse = self.mouse.getRegion(CoordName='5', Start=150791005, End=150838512, Strand='-') rat = self.rat.getRegion(CoordName='12', Start=4282534, End=4324019, Strand='+') for region in [mouse, rat]: features = region.getFeatures(feature_types=['gene']) ann_seq = region.getAnnotatedSeq(feature_types='gene') genes = ann_seq.getAnnotationsMatching('gene') self.assertTrue(genes != []) def test_get_variation_feature(self): """should correctly 
return variation features within a region""" snps = self.human.getFeatures(feature_types='variation', region=self.brca2) # snp coordname, start, end should satisfy constraints of brca2 loc c = 0 loc = self.brca2.Location for snp in snps: self.assertEquals(snp.Location.CoordName, loc.CoordName) self.assertTrue(loc.Start < snp.Location.Start < loc.End) c += 1 if c == 2: break def test_gene_feature_data_correct(self): """gene feature data should be consistent with strand, and slicing the Cogent sequence annotations should return the same result""" plus = list(self.human.getFeatures(feature_types='gene', CoordName=13, Start=31787610, End=31871820))[0] minus = plus.Location.copy() minus.Strand *= -1 minus = self.human.getRegion(region=minus) # get Sequence plus_seq = plus.getAnnotatedSeq(feature_types='gene') minus_seq = minus.getAnnotatedSeq(feature_types='gene') # the seqs should be the rc of each other self.assertEquals(str(plus_seq), str(minus_seq.rc())) # the Cds, however, from the annotated sequences should be identical plus_cds = plus_seq.getAnnotationsMatching('CDS')[0] minus_cds = minus_seq.getAnnotationsMatching('CDS')[0] self.assertEquals(str(plus_cds.getSlice()),str(minus_cds.getSlice())) def test_other_feature_data_correct(self): """should apply CpG feature data in a manner consistent with strand""" human = self.human coord = dict(CoordName=11, Start=2165124,End=2165724) exp_coord = dict(CoordName=11, Start=2165136, End=2165672) exp_loc = human.getRegion(Strand=1, ensembl_coord=True, **exp_coord) exp = exp_loc.Seq ps_feat = human.getRegion(Strand=1, **coord) ms_feat = human.getRegion(Strand=-1, **coord) ps_seq = ps_feat.getAnnotatedSeq(feature_types='CpG') ps_cgi = ps_seq.getAnnotationsMatching('CpGisland')[0] self.assertEquals(ps_feat.Seq, ms_feat.Seq.rc()) self.assertEquals(ps_cgi.getSlice().rc(), exp) ms_seq = ms_feat.getAnnotatedSeq(feature_types='CpG') ms_cgi = ms_seq.getAnnotationsMatching('CpGisland')[0] 
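The strand-consistency checks in these feature tests all rest on one invariant: a feature sliced from the minus strand equals the reverse complement of the same feature sliced from the plus strand. A minimal sketch of that invariant over plain DNA strings (the `rc` helper is hypothetical, standing in for cogent's `Seq.rc()` method):

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def rc(seq):
    """Reverse complement of a plain DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

plus_slice = "GTAAGTATAT"
minus_slice = rc(plus_slice)
# applying rc twice recovers the plus-strand slice exactly
assert rc(minus_slice) == plus_slice
```

This is the relationship `assertEquals(ps_feat.Seq, ms_feat.Seq.rc())` exercises against real genome data.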
self.assertEquals(ms_cgi.getSlice(), ps_cgi.getSlice()) def test_other_repeat(self): """should apply repeat feature data in a manner consistent with strand""" coord=dict(CoordName=13, Start=32890200, End=32890500) ps_repeat = self.human.getRegion(Strand=1, **coord) ms_repeat = self.human.getRegion(Strand=-1, **coord) exp = DNA.makeSequence('CTTACTGTGAGGATGGGAACATTTTACAGCTGTGCTG'\ 'TCCAAACCGGTGCCACTAGCCACATTAAGCACTCGAAACGTGGCTAGTGCGACTAGAGAAGAGGA'\ 'TTTTCATACGATTTAGTTTCAATCACGCTAACCAGTGACGCGTGGCTAGTGG') self.assertEquals(ms_repeat.Seq, ps_repeat.Seq.rc()) ps_annot_seq = ps_repeat.getAnnotatedSeq(feature_types='repeat') ms_annot_seq = ms_repeat.getAnnotatedSeq(feature_types='repeat') ps_seq = ps_annot_seq.getAnnotationsMatching('repeat')[0] ms_seq = ms_annot_seq.getAnnotationsMatching('repeat')[0] self.assertEquals(ms_seq.getSlice(), ps_seq.getSlice()) self.assertEquals(ps_seq.getSlice(), exp) def test_get_features_from_nt(self): """should correctly return the encompassing gene from 1nt""" snp = list(self.human.getVariation(Symbol='rs34213141'))[0] gene=list(self.human.getFeatures(feature_types='gene',region=snp))[0] self.assertEquals(gene.StableId, 'ENSG00000254997') class TestAssembly(TestCase): def test_assemble_seq(self): """should correctly fill in a sequence with N's""" expect = DNA.makeSequence("NAAAAANNCCCCCNNGGGNNN") frags = ["AAAAA","CCCCC","GGG"] positions = [(11, 16), (18, 23), (25, 28)] self.assertEqual(_assemble_seq(frags, 10, 31, positions), expect) positions = [(1, 6), (8, 13), (15, 18)] self.assertEqual(_assemble_seq(frags, 0, 21, positions), expect) # should work with: # start matches first frag start expect = DNA.makeSequence("AAAAANNCCCCCNNGGGNNN") positions = [(0, 5), (7, 12), (14, 17)] self.assertEqual(_assemble_seq(frags, 0, 20, positions), expect) # end matches last frag_end expect = DNA.makeSequence("NAAAAANNCCCCCNNGGG") positions = [(11, 16), (18, 23), (25, 28)] self.assertEqual(_assemble_seq(frags, 10, 28, positions), expect) # both start 
and end matched expect = DNA.makeSequence("AAAAANNCCCCCNNGGG") positions = [(10, 15), (17, 22), (24, 27)] self.assertEqual(_assemble_seq(frags, 10, 27, positions), expect) # one frag expect = DNA.makeSequence(''.join(frags)) positions = [(10, 23)] self.assertEqual(_assemble_seq([''.join(frags)],10,23,positions), expect) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_db/test_ensembl/test_host.py000644 000765 000024 00000011317 12024702176 024104 0ustar00jrideoutstaff000000 000000 import os from cogent.util.unit_test import TestCase, main from cogent.db.ensembl.name import EnsemblDbName from cogent.db.ensembl.host import get_db_name, get_latest_release,\ DbConnection, HostAccount, get_ensembl_account from cogent.db.ensembl.species import Species __author__ = "Gavin Huttley, Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" Release = 68 if 'ENSEMBL_ACCOUNT' in os.environ: args = os.environ['ENSEMBL_ACCOUNT'].split() host, username, password = args[0:3] kwargs = {} if len(args) > 3: kwargs['port'] = int(args[3]) account = HostAccount(host, username, password, **kwargs) else: account = get_ensembl_account(release=Release) class TestEnsemblDbName(TestCase): def test_cmp_name(self): """should validly compare names by attributes""" n1 = EnsemblDbName('homo_sapiens_core_46_36h') n2 = EnsemblDbName('homo_sapiens_core_46_36h') self.assertEqual(n1, n2) def test_name_without_build(self): """should correctly handle a db name without a build""" n = EnsemblDbName("pongo_pygmaeus_core_49_1") self.assertEqual(n.Prefix, "pongo_pygmaeus") self.assertEqual(n.Type, "core") self.assertEqual(n.Build, '1') def test_ensemblgenomes_names(self): """correctly handle the ensemblgenomes naming system""" n = EnsemblDbName('aedes_aegypti_core_5_58_1e') self.assertEqual(n.Prefix, 
'aedes_aegypti') self.assertEqual(n.Type, 'core') self.assertEqual(n.Release, '5') self.assertEqual(n.GeneralRelease, '58') self.assertEqual(n.Build, '1e') n = EnsemblDbName('ensembl_compara_metazoa_6_59') self.assertEqual(n.Release, '6') self.assertEqual(n.GeneralRelease, '59') self.assertEqual(n.Type, 'compara') class TestDBconnects(TestCase): def test_get_ensembl_account(self): """return a HostAccount with correct port""" for release in [48, '48', None]: act_new = get_ensembl_account(release=release) self.assertEqual(act_new.port, 5306) for release in [45, '45']: act_old = get_ensembl_account(release=release) self.assertEqual(act_old.port, 3306) def test_getdb(self): """should discover human entries correctly""" for name, db_name in [("human", "homo_sapiens_core_49_36k"), ("mouse", "mus_musculus_core_49_37b"), ("rat", "rattus_norvegicus_core_49_34s"), ("platypus", "ornithorhynchus_anatinus_core_49_1f")]: result = get_db_name(species=name, db_type="core", release='49') self.assertEqual(len(result), 1) result = result[0] self.assertEqual(result.Name, db_name) self.assertEqual(result.Release, '49') def test_latest_release_number(self): """should correctly return the latest release number""" self.assertGreaterThan(get_latest_release(), "53") def test_get_all_available(self): """should return a listing of all the available databases on the indicated server""" available = get_db_name() # make sure we have a compara db present -- a crude check on # correctness one_valid = False for db in available: if db.Type == "compara": one_valid = True break self.assertEqual(one_valid, True) # now check that when we request available under a specific version # that we only receive valid ones back available = get_db_name(release="46") for db in available: self.assertEqual(db.Release, '46') def test_active_connections(self): """connecting to a database on a specified server should be done once only, but the same database on a different server should create a new connection""" ensembl_acct = 
get_ensembl_account(release='46') engine1 = DbConnection(account=ensembl_acct, db_name="homo_sapiens_core_46_36h") engine2 = DbConnection(account=ensembl_acct, db_name="homo_sapiens_core_46_36h") self.assertEqual(engine1, engine2) def test_pool_recycle_option(self): """exercising ability to specify a pool recycle option""" ensembl_acct = get_ensembl_account(release='56') engine1 = DbConnection(account=ensembl_acct, db_name="homo_sapiens_core_46_36h", pool_recycle=1000) if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_db/test_ensembl/test_metazoa.py000644 000765 000024 00000005364 12024702176 024574 0ustar00jrideoutstaff000000 000000 from cogent.db.ensembl.host import HostAccount, get_ensembl_account from cogent.db.ensembl.compara import Compara, Genome from cogent.util.unit_test import TestCase, main __author__ = "Jason Merkin" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" Release = 12 account = HostAccount('mysql.ebi.ac.uk','anonymous', '', port=4157) class MZ_ComparaTestBase(TestCase): comp = Compara(['D.grimshawi', 'D.melanogaster'], Release=Release, account=account, division='metazoa') class MZ_TestCompara(MZ_ComparaTestBase): def test_query_genome(self): """compara should attach valid genome attributes by common name""" brca2 = self.comp.Dmelanogaster.getGeneByStableId("FBgn0050169") self.assertEquals(brca2.Symbol.lower(), 'brca2') def test_get_related_genes(self): """should correctly return the related gene regions from each genome""" # using sc35, a splicing factor sc35 = self.comp.Dmelanogaster.getGeneByStableId("FBgn0040286") Orthologs = self.comp.getRelatedGenes(gene_region=sc35, Relationship="ortholog_one2one") self.assertEquals("ortholog_one2one", Orthologs.Relationships[0]) def test_get_related_genes2(self): """should handle case where gene is absent 
from one of the genomes""" # here, it is brca2 brca2 = self.comp.Dmelanogaster.getGeneByStableId( StableId='FBgn0050169') orthologs = self.comp.getRelatedGenes(gene_region=brca2, Relationship='ortholog_one2one') self.assertEquals(len(orthologs.Members),2) def test_get_collection(self): sc35 = self.comp.Dmelanogaster.getGeneByStableId(StableId="FBgn0040286") Orthologs = self.comp.getRelatedGenes(gene_region=sc35, Relationship="ortholog_one2one") collection = Orthologs.getSeqCollection() self.assertTrue(len(collection.Seqs[0])> 1000) class MZ_Genome(TestCase): def test_get_general_release(self): """should correctly infer the general release""" rel_lt_65 = Genome('D.melanogaster', Release=11, account=account) self.assertEqual(rel_lt_65.GeneralRelease, 64) self.assertEqual(rel_lt_65.CoreDb.db_name, 'drosophila_melanogaster_core_11_64_539') rel_gt_65 = Genome('D.melanogaster', Release=13, account=account) self.assertEqual(rel_gt_65.GeneralRelease, 66) self.assertEqual(rel_gt_65.CoreDb.db_name, 'drosophila_melanogaster_core_13_66_539') if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_db/test_ensembl/test_species.py000644 000765 000024 00000006732 12024702176 024567 0ustar00jrideoutstaff000000 000000 from cogent.util.unit_test import TestCase, main from cogent.db.ensembl.species import Species __author__ = "Gavin Huttley, Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" class TestSpeciesNamemaps(TestCase): def test_get_name_type(self): """should return the (latin|common) name given a latin, common or ensembl db prefix names""" self.assertEqual(Species.getSpeciesName("human"), "Homo sapiens") self.assertEqual(Species.getSpeciesName("homo_sapiens"), "Homo sapiens") self.assertEqual(Species.getCommonName("Mus musculus"), "Mouse") 
self.assertEqual(Species.getCommonName("mus_musculus"), "Mouse") def test_get_ensembl_format(self): """should take common or latin names and return the corresponding ensembl db prefix""" self.assertEqual(Species.getEnsemblDbPrefix("human"), "homo_sapiens") self.assertEqual(Species.getEnsemblDbPrefix("mouse"), "mus_musculus") self.assertEqual(Species.getEnsemblDbPrefix("Mus musculus"), "mus_musculus") def test_add_new_species(self): """should correctly add a new species/common combination and infer the correct ensembl prefix""" species_name, common_name = "Otolemur garnettii", "Bushbaby" Species.amendSpecies(species_name, common_name) self.assertEqual(Species.getSpeciesName(species_name), species_name) self.assertEqual(Species.getSpeciesName("Bushbaby"), species_name) self.assertEqual(Species.getSpeciesName(common_name), species_name) self.assertEqual(Species.getCommonName(species_name), common_name) self.assertEqual(Species.getCommonName("Bushbaby"), common_name) self.assertEqual(Species.getEnsemblDbPrefix("Bushbaby"), "otolemur_garnettii") self.assertEqual(Species.getEnsemblDbPrefix(species_name), "otolemur_garnettii") self.assertEqual(Species.getEnsemblDbPrefix(common_name), "otolemur_garnettii") def test_amend_existing(self): """should correctly amend an existing species""" species_name = 'Ochotona princeps' common_name1 = 'american pika' common_name2 = 'pika' ensembl_pref = 'ochotona_princeps' Species.amendSpecies(species_name, common_name1) self.assertEqual(Species.getCommonName(species_name),common_name1) Species.amendSpecies(species_name, common_name2) self.assertEqual(Species.getSpeciesName(common_name2), species_name) self.assertEqual(Species.getSpeciesName(ensembl_pref), species_name) self.assertEqual(Species.getCommonName(species_name), common_name2) self.assertEqual(Species.getCommonName(ensembl_pref), common_name2) self.assertEqual(Species.getEnsemblDbPrefix(species_name), ensembl_pref) self.assertEqual(Species.getEnsemblDbPrefix(common_name2), 
ensembl_pref) def test_get_compara_name(self): """should correctly form valid names for assignment onto objects""" self.assertEqual(Species.getComparaName('pika'), 'Pika') self.assertEqual(Species.getComparaName('C.elegans'), 'Celegans') self.assertEqual(Species.getComparaName('Caenorhabditis elegans'), 'Celegans') if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_data/__init__.py000644 000765 000024 00000000450 12024702176 021463 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_molecular_weight'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_data/test_molecular_weight.py000644 000765 000024 00000002104 12024702176 024313 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for molecular weight. """ from cogent.util.unit_test import TestCase, main from cogent.data.molecular_weight import WeightCalculator, DnaMW, RnaMW, \ ProteinMW __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class WeightCalculatorTests(TestCase): """Tests for WeightCalculator, which should calculate molecular weights. 
""" def test_call(self): """WeightCalculator should return correct molecular weight""" r = RnaMW p = ProteinMW self.assertEqual(p(''), 0) self.assertEqual(r(''), 0) self.assertFloatEqual(p('A'), 107.09) self.assertFloatEqual(r('A'), 375.17) self.assertFloatEqual(p('AAA'), 285.27) self.assertFloatEqual(r('AAA'), 1001.59) self.assertFloatEqual(r('AAACCCA'), 2182.37) #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_core/__init__.py000644 000765 000024 00000001566 12024702176 021513 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_alignment', 'test_alphabet', 'test_annotation', 'test_bitvector', 'test_core_standalone', 'test_entity', 'test_genetic_code', 'test_info', 'test_location', 'test_maps', 'test_moltype', 'test_profile', 'test_sequence', 'test_tree', 'test_tree2', 'test_usage'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Catherine Lozupone", "Peter Maxwell", "Rob Knight", "Gavin Huttley", "Jeremy Widmann", "Greg Caporaso", "Sandra Smit", "Justin Kuczynski", "Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_core/test_alignment.py000644 000765 000024 00000247514 12024702176 022776 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.core.sequence import RnaSequence, frac_same, ModelSequence, Sequence from cogent.maths.stats.util import Freqs, Numbers from cogent.core.moltype import RNA, DNA, PROTEIN, BYTES from cogent.struct.rna2d import ViennaStructure from cogent.core.alignment import SequenceCollection, \ make_gap_filter, coerce_to_string, \ seqs_from_array, seqs_from_model_seqs, seqs_from_generic, seqs_from_fasta, \ seqs_from_dict, seqs_from_aln, seqs_from_kv_pairs, seqs_from_empty, \ aln_from_array, aln_from_model_seqs, aln_from_collection,\ 
aln_from_generic, aln_from_fasta, aln_from_dense_aln, aln_from_empty, \ DenseAlignment, Alignment, DataError from cogent.core.moltype import AB, DNA from cogent.parse.fasta import MinimalFastaParser from numpy import array, arange, transpose from tempfile import mktemp from os import remove import re __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Catherine Lozuopone", "Gavin Huttley", "Rob Knight", "Daniel McDonald", "Jan Kosinski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class alignment_tests(TestCase): """Tests of top-level functions.""" def test_seqs_from_array(self): """seqs_from_array should return chars, and successive indices.""" a = array([[0,1,2],[2,1,0]]) #three 2-char seqs obs_a, obs_labels = seqs_from_array(a) #note transposition self.assertEqual(obs_a, [array([0,2]), array([1,1]), array([2,0])]) self.assertEqual(obs_labels, None) def test_seqs_from_model_seqs(self): """seqs_from_model_seqs should return model seqs + names.""" s1 = ModelSequence('ABC', Name='a') s2 = ModelSequence('DEF', Name='b') obs_a, obs_labels = seqs_from_model_seqs([s1, s2]) self.assertEqual(obs_a, [s1,s2]) #seq -> numbers self.assertEqual(obs_labels, ['a','b']) def test_seqs_from_generic(self): """seqs_from_generic should initialize seqs from list of lists, etc.""" s1 = 'ABC' s2 = 'DEF' obs_a, obs_labels = seqs_from_generic([s1, s2]) self.assertEqual(obs_a, ['ABC','DEF']) self.assertEqual(obs_labels, [None, None]) def test_seqs_from_fasta(self): """seqs_from_fasta should initialize seqs from fasta-format string""" s = '>aa\nAB\nC\n>bb\nDE\nF\n' obs_a, obs_labels = seqs_from_fasta(s) self.assertEqual(obs_a, ['ABC','DEF']) self.assertEqual(obs_labels, ['aa','bb']) def test_seqs_from_aln(self): """seqs_from_aln should initialize from existing alignment""" c = SequenceCollection(['abc','def']) obs_a, obs_labels = 
seqs_from_aln(c) self.assertEqual(obs_a, ['abc','def']) self.assertEqual(obs_labels, ['seq_0','seq_1']) def test_seqs_from_kv_pairs(self): """seqs_from_kv_pairs should initialize from key-value pairs""" c = [['a', 'abc'], ['b', 'def']] obs_a, obs_labels = seqs_from_kv_pairs(c) self.assertEqual(obs_a, ['abc','def']) self.assertEqual(obs_labels, ['a','b']) def test_seqs_from_empty(self): """seqs_from_empty should always raise ValueError""" self.assertRaises(ValueError, seqs_from_empty, 'xyz') def test_aln_from_array(self): """aln_from_array should return same array, and successive indices.""" a = array([[0,1,2],[3,4,5]]) #three 2-char seqs obs_a, obs_labels = aln_from_array(a) self.assertEqual(obs_a, transpose(a)) self.assertEqual(obs_labels, None) def test_aln_from_model_seqs(self): """aln_from_model_seqs should initialize aln from sequence objects.""" s1 = ModelSequence('ACC', Name='a', Alphabet=RNA.Alphabet) s2 = ModelSequence('GGU', Name='b', Alphabet=RNA.Alphabet) obs_a, obs_labels = aln_from_model_seqs([s1, s2], \ Alphabet=BYTES.Alphabet) self.assertEqual(obs_a, array([[2,1,1],[3,3,0]], 'b')) #seq -> numbers self.assertEqual(obs_labels, ['a','b']) def test_aln_from_generic(self): """aln_from_generic should initialize aln from list of lists, etc.""" s1 = 'AAA' s2 = 'GGG' obs_a, obs_labels = aln_from_generic([s1, s2], 'b', \ Alphabet=RNA.Alphabet) #specify array type self.assertEqual(obs_a, array([[2,2,2],[3,3,3]], 'b')) #str -> chars self.assertEqual(obs_labels, [None, None]) def test_aln_from_fasta(self): """aln_from_fasta should initialize aln from fasta-format string""" s = '>aa\nAB\nC\n>bb\nDE\nF\n' obs_a, obs_labels = aln_from_fasta(s.splitlines()) self.assertEqual(obs_a, array(['ABC','DEF'], 'c').view('B')) #seq -> numbers self.assertEqual(obs_labels, ['aa','bb']) def test_aln_from_dense_aln(self): """aln_from_dense_aln should initialize from existing alignment""" a = DenseAlignment(array([[0,1,2],[3,4,5]]), conversion_f=aln_from_array) obs_a, obs_labels = 
aln_from_dense_aln(a) self.assertEqual(obs_a, a.SeqData) self.assertEqual(obs_labels, a.Names) def test_aln_from_collection(self): """aln_from_collection should initialize from existing alignment""" a = SequenceCollection(['AAA','GGG']) obs_a, obs_labels = aln_from_collection(a, Alphabet=RNA.Alphabet) self.assertEqual(a.toFasta(), '>seq_0\nAAA\n>seq_1\nGGG') self.assertEqual(obs_a, array([[2,2,2],[3,3,3]])) def test_aln_from_empty(self): """aln_from_empty should always raise ValueError""" self.assertRaises(ValueError, aln_from_empty, 'xyz') class SequenceCollectionBaseTests(object): """Base class for testing the SequenceCollection object. Unlike Alignments, SequenceCollections can have sequences that are not equal length. This module contains all the code that _doesn't_ depend on being able to look at "ragged" SequenceCollections. It is intended that all classes that inherit from SequenceCollection should have test classes that inherit from this class, but that the SequenceCollection tests themselves will additionally contain code to deal with SequenceCollections of unequal length. set self.Class in subclasses to generate the rught constructor. 
""" Class = SequenceCollection def setUp(self): """Define some standard SequenceCollection objects.""" self.one_seq = self.Class({'a':'AAAAA'}) self.ragged_padded = self.Class({'a':'AAAAAA','b':'AAA---', \ 'c':'AAAA--'}) self.identical = self.Class({'a':'AAAA','b':'AAAA'}) self.gaps = self.Class({'a':'AAAAAAA','b':'A--A-AA', \ 'c':'AA-----'}) self.gaps_rna = self.Class({'a':RnaSequence('AAAAAAA'), \ 'b':RnaSequence('A--A-AA'), \ 'c':RnaSequence('AA-----')}) self.unordered = self.Class({'a':'AAAAA','b':'BBBBB'}) self.ordered1 = self.Class({'a':'AAAAA','b':'BBBBB'}, \ Names=['a','b']) self.ordered2 = self.Class({'a':'AAAAA','b':'BBBBB'}, \ Names=['b','a']) self.mixed = self.Class({'a':'ABCDE', 'b':'LMNOP'}) self.end_gaps = self.Class({'a':'--A-BC-', 'b':'-CB-A--', \ 'c':'--D-EF-'}, Names=['a','b','c']) self.many = self.Class({ 'a': RnaSequence('UCAGUCAGUU'), 'b': RnaSequence('UCCGUCAAUU'), 'c': RnaSequence('ACCAUCAGUC'), 'd': RnaSequence('UCAAUCGGUU'), 'e': RnaSequence('UUGGUUGGGU'), 'f': RnaSequence('CCGGGCGGCC'), 'g': RnaSequence('UCAACCGGAA'), }) #Additional SequenceCollections for tests added 6/4/04 by Jeremy Widmann self.sequences = self.Class(map(RnaSequence, ['UCAG', 'UCAG', 'UCAG'])) self.structures = self.Class(map(ViennaStructure, ['(())..', '......', '(....)']), MolType=BYTES) self.labeled = self.Class(['ABC', 'DEF'], ['1st', '2nd']) #Additional SequenceCollection for tests added 1/30/06 by Cathy Lozupone self.omitSeqsTemplate_aln = self.Class({ 's1':RnaSequence('UC-----CU---C'), 's2':RnaSequence('UC------U---C'), 's3':RnaSequence('UUCCUUCUU-UUC'), 's4':RnaSequence('UU-UUUU-UUUUC'), 's5':RnaSequence('-------------') }) self.a = DenseAlignment(['AAA','AAA']) self.b = Alignment(['AAA','AAA']) self.c = SequenceCollection(['AAA','AAA']) def test_guess_input_type(self): """SequenceCollection _guess_input_type should figure out data type correctly""" git = self.a._guess_input_type self.assertEqual(git(self.a), 'dense_aln') self.assertEqual(git(self.b), 'aln') 
        self.assertEqual(git(self.c), 'collection')
        self.assertEqual(git('>ab\nabc'), 'fasta')
        self.assertEqual(git(['>ab','abc']), 'fasta')
        self.assertEqual(git(['abc','def']), 'generic')
        self.assertEqual(git([[1,2],[4,5]]), 'kv_pairs') #precedence over generic
        self.assertEqual(git([[1,2,3],[4,5,6]]), 'generic')
        self.assertEqual(git([ModelSequence('abc')]), 'model_seqs')
        self.assertEqual(git(array([[1,2,3],[4,5,6]])), 'array')
        self.assertEqual(git({'a':'aca'}), 'dict')
        self.assertEqual(git([]), 'empty')

    def test_init_aln(self):
        """ SequenceCollection should init from existing alignments"""
        exp = self.Class(['AAA','AAA'])
        x = self.Class(self.a)
        y = self.Class(self.b)
        z = self.Class(self.c)
        self.assertEqual(x, exp)
        self.assertEqual(z, exp)
        self.assertEqual(y, exp)
    test_init_aln.__doc__ = Class.__name__ + test_init_aln.__doc__

    def test_init_dict(self):
        """SequenceCollection init from dict should work as expected"""
        d = {'a':'AAAAA', 'b':'BBBBB'}
        a = self.Class(d)
        self.assertEqual(a, d)
        self.assertEqual(a.NamedSeqs.items(), d.items())

    def test_init_name_mapped(self):
        """SequenceCollection init should allow name mapping function"""
        d = {'a':'AAAAA', 'b':'BBBBB'}
        f = lambda x: x.upper()
        a = self.Class(d, label_to_name=f)
        self.assertNotEqual(a, d)
        self.assertNotEqual(a.NamedSeqs.items(), d.items())
        d_upper = {'A':'AAAAA','B':'BBBBB'}
        self.assertEqual(a, d_upper)
        self.assertEqual(a.NamedSeqs.items(), d_upper.items())

    def test_init_seq(self):
        """SequenceCollection init from list of sequences should use indices as keys"""
        seqs = ['AAAAA', 'BBBBB', 'CCCCC']
        a = self.Class(seqs)
        self.assertEqual(len(a.NamedSeqs), 3)
        self.assertEqual(a.NamedSeqs['seq_0'], 'AAAAA')
        self.assertEqual(a.NamedSeqs['seq_1'], 'BBBBB')
        self.assertEqual(a.NamedSeqs['seq_2'], 'CCCCC')
        self.assertEqual(a.Names, ['seq_0','seq_1','seq_2'])
        self.assertEqual(list(a.Seqs), ['AAAAA','BBBBB','CCCCC'])

    def test_init_annotated_seq(self):
        """SequenceCollection init from seqs w/ Info should preserve data"""
        a = Sequence('AAA', Name='a', Info={'x':3})
        b = Sequence('CCC', Name='b', Info={'x':4})
        c = Sequence('GGG', Name='c', Info={'x':5})
        seqs = [c,b,a]
        a = self.Class(seqs)
        self.assertEqual(list(a.Names), ['c','b','a'])
        self.assertEqual(map(str, a.Seqs), ['GGG','CCC','AAA'])
        if self.Class is not DenseAlignment:
            #DenseAlignment is allowed to strip Info objects
            self.assertEqual([i.Info.x for i in a.Seqs], [5,4,3])
        #check it still works if constructed from same class
        b = self.Class(a)
        self.assertEqual(list(b.Names), ['c','b','a'])
        self.assertEqual(map(str, b.Seqs), ['GGG','CCC','AAA'])
        if self.Class is not DenseAlignment:
            #DenseAlignment is allowed to strip Info objects
            self.assertEqual([i.Info.x for i in b.Seqs], [5,4,3])

    def test_init_pairs(self):
        """SequenceCollection init from list of (key,val) pairs should work correctly"""
        seqs = [['x', 'XXX'], ['b','BBB'], ['c','CCC']]
        a = self.Class(seqs)
        self.assertEqual(len(a.NamedSeqs), 3)
        self.assertEqual(a.NamedSeqs['x'], 'XXX')
        self.assertEqual(a.NamedSeqs['b'], 'BBB')
        self.assertEqual(a.NamedSeqs['c'], 'CCC')
        self.assertEqual(a.Names, ['x','b','c'])
        self.assertEqual(list(a.Seqs), ['XXX','BBB','CCC'])

    def test_init_duplicate_keys(self):
        """SequenceCollection init from (key, val) pairs should fail on dup. keys"""
        seqs = [['x', 'XXX'], ['b','BBB'],['x','CCC'], ['d','DDD'], ['a','AAA']]
        self.assertRaises(ValueError, self.Class, seqs)
        aln = self.Class(seqs, remove_duplicate_names=True)
        self.assertEqual(str(self.Class(seqs, remove_duplicate_names=True)),
            '>x\nXXX\n>b\nBBB\n>d\nDDD\n>a\nAAA\n')

    def test_init_ordered(self):
        """SequenceCollection should iterate over seqs correctly even if ordered"""
        first = self.ordered1
        sec = self.ordered2
        un = self.unordered
        self.assertEqual(first.Names, ['a','b'])
        self.assertEqual(sec.Names, ['b', 'a'])
        self.assertEqual(un.Names, un.NamedSeqs.keys())
        first_list = list(first.Seqs)
        sec_list = list(sec.Seqs)
        un_list = list(un.Seqs)
        self.assertEqual(first_list, ['AAAAA','BBBBB'])
        self.assertEqual(sec_list, ['BBBBB','AAAAA'])
        #check that the unordered seq matches one of the lists
        self.assertTrue((un_list == first_list) or (un_list == sec_list))
        self.assertNotEqual(first_list, sec_list)

    def test_init_ambig(self):
        """SequenceCollection should tolerate ambiguous chars"""
        aln = self.Class(['AAA','CCC'],MolType=DNA)
        aln = self.Class(['ANS','CWC'],MolType=DNA)
        aln = self.Class(['A-A','CC-'],MolType=DNA)
        aln = self.Class(['A?A','CC-'],MolType=DNA)

    def test_aln_from_fasta_parser(self):
        """aln_from_fasta_parser should init from iterator"""
        s = '>aa\nAC\n>bb\nAA\n>c\nGG\n'.splitlines()
        p = MinimalFastaParser(s)
        aln = self.Class(p, MolType=DNA)
        self.assertEqual(aln.NamedSeqs['aa'], 'AC')
        self.assertEqual(aln.toFasta(), '>aa\nAC\n>bb\nAA\n>c\nGG')
        s2_ORIG = '>x\nCA\n>b\nAA\n>>xx\nGG'
        s2 = '>aa\nAC\n>bb\nAA\n>c\nGG\n'
        d = DenseAlignment(MinimalFastaParser(s2.splitlines()))
        self.assertEqual(d.toFasta(), aln.toFasta())

    def test_aln_from_fasta(self):
        """SequenceCollection should init from fasta-format string"""
        s = '>aa\nAC\n>bb\nAA\n>c\nGG\n'
        aln = self.Class(s)
        self.assertEqual(aln.toFasta(), s.strip())

    def test_SeqLen_get(self):
        """SequenceCollection SeqLen should return length of longest seq"""
        self.assertEqual(self.one_seq.SeqLen, 5)
        self.assertEqual(self.identical.SeqLen, 4)
        self.assertEqual(self.gaps.SeqLen, 7)

    def test_Seqs(self):
        """SequenceCollection Seqs property should return seqs in correct order."""
        first = self.ordered1
        sec = self.ordered2
        un = self.unordered
        first_list = list(first.Seqs)
        sec_list = list(sec.Seqs)
        un_list = list(un.Seqs)
        self.assertEqual(first_list, ['AAAAA','BBBBB'])
        self.assertEqual(sec_list, ['BBBBB','AAAAA'])
        #check that the unordered seq matches one of the lists
        self.assertTrue((un_list == first_list) or (un_list == sec_list))
        self.assertNotEqual(first_list, sec_list)

    def test_iterSeqs(self):
        """SequenceCollection iterSeqs() method should support reordering of seqs"""
        self.ragged_padded = self.Class(self.ragged_padded.NamedSeqs, \
            Names=['a','b','c'])
        seqs = list(self.ragged_padded.iterSeqs())
        self.assertEqual(seqs, ['AAAAAA', 'AAA---', 'AAAA--'])
        seqs = list(self.ragged_padded.iterSeqs(seq_order=['b','a','a']))
        self.assertEqual(seqs, ['AAA---', 'AAAAAA', 'AAAAAA'])
        self.assertSameObj(seqs[1], seqs[2])
        self.assertSameObj(seqs[0], self.ragged_padded.NamedSeqs['b'])

    def test_Items(self):
        """SequenceCollection Items should iterate over items in specified order."""
        #should work if one row
        self.assertEqual(list(self.one_seq.Items), ['A']*5)
        #should take order into account
        self.assertEqual(list(self.ordered1.Items), ['A']*5 + ['B']*5)
        self.assertEqual(list(self.ordered2.Items), ['B']*5 + ['A']*5)

    def test_iterItems(self):
        """SequenceCollection iterItems() should iterate over items in correct order"""
        #should work if one row
        self.assertEqual(list(self.one_seq.iterItems()), ['A']*5)
        #should take order into account
        self.assertEqual(list(self.ordered1.iterItems()), ['A']*5 + ['B']*5)
        self.assertEqual(list(self.ordered2.iterItems()), ['B']*5 + ['A']*5)
        #should allow row and/or col specification
        r = self.ragged_padded
        self.assertEqual(list(r.iterItems(seq_order=['c','b'], \
            pos_order=[5,1,3])), list('-AA-A-'))
        #should not interfere with superclass iteritems()
        i = list(r.NamedSeqs.iteritems())
        i.sort()
        self.assertEqual(i, [('a','AAAAAA'),('b','AAA---'),('c','AAAA--')])

    def test_takeSeqs(self):
        """SequenceCollection takeSeqs should return new SequenceCollection with selected seqs."""
        a = self.ragged_padded.takeSeqs('bc')
        self.assertTrue(isinstance(a, SequenceCollection))
        self.assertEqual(a, {'b':'AAA---','c':'AAAA--'})
        #should be able to negate
        a = self.ragged_padded.takeSeqs('bc', negate=True)
        self.assertEqual(a, {'a':'AAAAAA'})

    def test_takeSeqs_moltype(self):
        """takeSeqs should preserve the MolType"""
        orig = self.Class(data={'a':'CCCCCC','b':'AAA---', 'c':'AAAA--'},
            MolType=DNA)
        subset = orig.takeSeqs('ab')
        self.assertEqual(set(subset.MolType), set(orig.MolType))

    def test_getSeqIndices(self):
        """SequenceCollection getSeqIndices should return names of seqs where f(row) is True"""
        srp = self.ragged_padded
        is_long = lambda x: len(x) > 10
        is_med = lambda x: len(str(x).replace('-','')) > 3 #strips gaps
        is_any = lambda x: len(x) > 0
        self.assertEqual(srp.getSeqIndices(is_long), [])
        srp.Names = 'cba'
        self.assertEqual(srp.getSeqIndices(is_med), ['c','a'])
        srp.Names = 'bac'
        self.assertEqual(srp.getSeqIndices(is_med), ['a','c'])
        self.assertEqual(srp.getSeqIndices(is_any),['b','a','c'])
        #should be able to negate
        self.assertEqual(srp.getSeqIndices(is_med, negate=True), ['b'])
        self.assertEqual(srp.getSeqIndices(is_any, negate=True), [])

    def test_takeSeqsIf(self):
        """SequenceCollection takeSeqsIf should return seqs where f(row) is True"""
        is_long = lambda x: len(x) > 10
        is_med = lambda x: len(str(x).replace('-','')) > 3
        is_any = lambda x: len(x) > 0
        srp = self.ragged_padded
        self.assertEqual(srp.takeSeqsIf(is_long), {})
        srp.Names = 'cba'
        self.assertEqual(srp.takeSeqsIf(is_med), \
            {'c':'AAAA--','a':'AAAAAA'})
        srp.Names = srp.NamedSeqs.keys()
        self.assertEqual(srp.takeSeqsIf(is_med), \
            {'c':'AAAA--','a':'AAAAAA'})
        self.assertEqual(srp.takeSeqsIf(is_any), srp)
        self.assertTrue(isinstance(srp.takeSeqsIf(is_med), SequenceCollection))
        #should be able to negate
        self.assertEqual(srp.takeSeqsIf(is_med, negate=True), \
            {'b':'AAA---'})

    def test_getItems(self):
        """SequenceCollection getItems should return list of items from k,v pairs"""
        self.assertEqual(self.mixed.getItems([('a',3),('b',4),('a',0)]), \
            ['D','P','A'])
        self.assertRaises(KeyError, self.mixed.getItems, [('x','y')])
        self.assertRaises(IndexError, self.mixed.getItems, [('a',1000)])
        #should be able to negate -- note that results will have seqs in
        #arbitrary order
        self.assertEqualItems(self.mixed.getItems([('a',3),('b',4),('a',0)], \
            negate=True), ['B','C','E','L','M','N','O'])

    def test_getItemIndices(self):
        """SequenceCollection getItemIndices should return coordinates of matching items"""
        is_vowel = lambda x: x in 'AEIOU'
        #reverse name order to test that it's not alphabetical
        self.mixed = self.Class(self.mixed.NamedSeqs, Names=['b','a'])
        self.assertEqual(self.mixed.getItemIndices(is_vowel), \
            [('b',3),('a',0),('a',4)])
        is_lower = lambda x: x.islower()
        self.assertEqual(self.ragged_padded.getItemIndices(is_lower), [])
        #should be able to negate
        self.assertEqualItems(self.mixed.getItemIndices(is_vowel, negate=True),\
            [('a',1),('a',2),('a',3),('b',0),('b',1),('b',2),('b',4)])

    def test_getItemsIf(self):
        """SequenceCollection getItemsIf should return matching items"""
        is_vowel = lambda x: x in 'AEIOU'
        #reverse name order to test that it's not alphabetical
        self.mixed = self.Class(self.mixed.NamedSeqs, Names=['b','a'])
        self.assertEqual(self.mixed.getItemsIf(is_vowel), ['O','A','E'])
        self.assertEqual(self.one_seq.getItemsIf(is_vowel), list('AAAAA'))
        #should be able to negate
        self.assertEqualItems(self.mixed.getItemsIf(is_vowel, negate=True), \
            list('BCDLMNP'))

    def test_getSimilar(self):
        """SequenceCollection getSimilar should get all sequences close to target seq"""
        aln = self.many
        x = RnaSequence('GGGGGGGGGG')
        y = RnaSequence('----------')
        #test min and max similarity ranges
        result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.4,\
            max_similarity=0.7)
        for seq in 'cefg':
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 4)
        result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.95, \
            max_similarity=1)
        for seq in 'a':
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 1)
        result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.75, \
            max_similarity=0.85)
        for seq in 'bd':
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 2)
        result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0, \
            max_similarity=0.2)
        self.assertEqual(result, {})
        #test some sequence transformations
        transform = lambda s: s[1:4]
        result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.5, \
            transform=transform)
        for seq in 'abdfg':
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 5)
        transform = lambda s: s[-3:]
        result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.5, \
            transform=transform)
        for seq in 'abcde':
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 5)
        #test a different distance metric
        metric = lambda x, y: str(x).count('G') + str(y).count('G')
        result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=5, \
            max_similarity=10, metric=metric)
        for seq in 'ef':
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 2)
        #test the combination of a transform and a distance metric
        aln = self.Class(dict(enumerate(map(RnaSequence, \
            ['GA-GU','A-GAC','GG-GG']))), MolType=RNA)
        transform = lambda s: RnaSequence(str(s).replace('G','A'\
            ).replace('U','C'))
        metric = RnaSequence.fracSameNonGaps
        null_transform = lambda s: RnaSequence(str(s))
        #first, do it without the transformation
        try:
            result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.5, \
                metric=metric)
        except TypeError: #need to coerce to RNA seq w/ null_transform
            result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.5, \
                metric=metric, transform=null_transform)
        for seq in [0,2]:
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 2)
        #repeat with higher similarity
        try:
            result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.8, \
                metric=metric)
        except TypeError: #need to coerce to RNA
            result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.8, \
                metric=metric, transform=null_transform)
        for seq in [0]:
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 1)
        #then, verify that the transform changes the results
        result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.5, \
            metric=metric, transform=transform)
        for seq in [0,1,2]:
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 3)
        result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.8, \
            metric=metric, transform=transform)
        for seq in [0,1]:
            self.assertContains(result.NamedSeqs, seq)
            self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
        self.assertEqual(len(result.NamedSeqs), 2)

    def test_distanceMatrix(self):
        """SequenceCollection distanceMatrix should produce correct scores"""
        self.assertEqual(self.one_seq.distanceMatrix(frac_same), {'a':{'a':1}})
        self.assertEqual(self.gaps.distanceMatrix(frac_same),
            {
            'a':{'a':7/7.0,'b':4/7.0,'c':2/7.0},
            'b':{'a':4/7.0,'b':7/7.0,'c':3/7.0},
            'c':{'a':2/7.0,'b':3/7.0,'c':7/7.0},
            })

    def test_isRagged(self):
        """SequenceCollection isRagged should return true if ragged alignment"""
        assert(not self.identical.isRagged())
        assert(not self.gaps.isRagged())

    def test_toPhylip(self):
        """SequenceCollection should return PHYLIP string format correctly"""
        align_norm = self.Class( ['ACDEFGHIKLMNPQRSTUVWY-',
                                  'ACDEFGHIKLMNPQRSUUVWF-',
                                  'ACDEFGHIKLMNPERSKUVWC-',
                                  'ACNEFGHIKLMNPQRS-UVWP-',
                                  ])
        phylip_str, id_map = align_norm.toPhylip()
        self.assertEqual(phylip_str, """4 22\nseq0000001 ACDEFGHIKLMNPQRSTUVWY-\nseq0000002 ACDEFGHIKLMNPQRSUUVWF-\nseq0000003 ACDEFGHIKLMNPERSKUVWC-\nseq0000004 ACNEFGHIKLMNPQRS-UVWP-""")
        self.assertEqual(id_map, {'seq0000004':'seq_3', 'seq0000001':'seq_0', \
            'seq0000003': 'seq_2', 'seq0000002': 'seq_1'})

    def test_toFasta(self):
        """SequenceCollection should return correct FASTA string"""
        aln = self.Class(['AAA','CCC'])
        self.assertEqual(aln.toFasta(), '>seq_0\nAAA\n>seq_1\nCCC')
        #NOTE THE FOLLOWING SURPRISING BEHAVIOR BECAUSE OF THE TWO-ITEM
        #SEQUENCE RULE:
        aln = self.Class(['AA','CC'])
        self.assertEqual(aln.toFasta(), '>A\nA\n>C\nC')

    def test_toNexus(self):
        """SequenceCollection should return correct Nexus string format"""
        align_norm = self.Class( ['ACDEFGHIKLMNPQRSTUVWY-',
                                  'ACDEFGHIKLMNPQRSUUVWF-',
                                  'ACDEFGHIKLMNPERSKUVWC-',
                                  'ACNEFGHIKLMNPQRS-UVWP-'])
        expect = '#NEXUS\n\nbegin data;\n dimensions ntax=4 nchar=22;\n'+\
            ' format datatype=protein interleave=yes missing=? gap=-;\n'+\
            ' matrix\n seq_1 ACDEFGHIKLMNPQRSUUVWF-\n seq_0'+\
            ' ACDEFGHIKLMNPQRSTUVWY-\n seq_3 ACNEFGHIKLMNPQRS-UVWP-\n '+\
            ' seq_2 ACDEFGHIKLMNPERSKUVWC-\n\n ;\nend;'
        self.assertEqual(align_norm.toNexus('protein'), expect)

    def test_getIntMap(self):
        """SequenceCollection.getIntMap should return correct mapping."""
        aln = self.Class({'seq1':'ACGU','seq2':'CGUA','seq3':'CCGU'})
        int_keys = {'seq_0':'seq1','seq_1':'seq2','seq_2':'seq3'}
        int_map = {'seq_0':'ACGU','seq_1':'CGUA','seq_2':'CCGU'}
        im,ik = aln.getIntMap()
        self.assertEqual(ik,int_keys)
        self.assertEqual(im,int_map)
        #test change prefix from default 'seq_'
        prefix='seqn_'
        int_keys = {'seqn_0':'seq1','seqn_1':'seq2','seqn_2':'seq3'}
        int_map = {'seqn_0':'ACGU','seqn_1':'CGUA','seqn_2':'CCGU'}
        im,ik = aln.getIntMap(prefix=prefix)
        self.assertEqual(ik,int_keys)
        self.assertEqual(im,int_map)

    def test_getNumSeqs(self):
        """SequenceCollection.getNumSeqs should count seqs."""
        aln = self.Class({'seq1':'ACGU','seq2':'CGUA','seq3':'CCGU'})
        self.assertEqual(aln.getNumSeqs(), 3)

    def test_copyAnnotations(self):
        """SequenceCollection copyAnnotations should copy from seq objects"""
        aln = self.Class({'seq1':'ACGU','seq2':'CGUA','seq3':'CCGU'})
        seq_1 = Sequence('ACGU', Name='seq1')
        seq_1.addFeature('xyz','abc', [(1,2)])
        seq_5 = Sequence('ACGUAAAAAA', Name='seq5')
        seq_5.addFeature('xyzzz','abc', [(1,2)])
        annot = {'seq1': seq_1, 'seq5':seq_5}
        aln.copyAnnotations(annot)
        aln_seq_1 = aln.NamedSeqs['seq1']
        if not hasattr(aln_seq_1, 'annotations'):
            aln_seq_1 = aln_seq_1.data
        aln_seq_2 = aln.NamedSeqs['seq2']
        if not hasattr(aln_seq_2, 'annotations'):
            aln_seq_2 = aln_seq_2.data
        self.assertEqual(len(aln_seq_1.annotations), 1)
        self.assertEqual(aln_seq_1.annotations[0].Name,'abc')
        self.assertEqual(len(aln_seq_2.annotations), 0)

    def test_annotateFromGff(self):
        """SequenceCollection.annotateFromGff should read gff features"""
        aln = self.Class({'seq1':'ACGU','seq2':'CGUA','seq3':'CCGU'})
        gff = [
            ['seq1', 'prog1', 'snp', '1', '2', '1.0', '+', '1','"abc"'],
            ['seq5', 'prog2', 'snp', '2', '3', '1.0', '+', '1','"yyy"'],
            ]
        gff = map('\t'.join, gff)
        aln.annotateFromGff(gff)
        aln_seq_1 = aln.NamedSeqs['seq1']
        if not hasattr(aln_seq_1, 'annotations'):
            aln_seq_1 = aln_seq_1.data
        aln_seq_2 = aln.NamedSeqs['seq2']
        if not hasattr(aln_seq_2, 'annotations'):
            aln_seq_2 = aln_seq_2.data
        self.assertEqual(len(aln_seq_1.annotations), 1)
        self.assertEqual(aln_seq_1.annotations[0].Name,'abc')
        self.assertEqual(len(aln_seq_2.annotations), 0)

    def test_replaceSeqs(self):
        """replaceSeqs should replace 1-letter w/ 3-letter seqs"""
        a = Alignment({'seq1':'ACGU','seq2':'C-UA','seq3':'C---'})
        seqs = {'seq1':'AAACCCGGGUUU','seq2':'CCCUUUAAA','seq3':'CCC'}
        result = a.replaceSeqs(seqs)
        self.assertEqual(result.toFasta(), \
            ">seq1\nAAACCCGGGUUU\n>seq2\nCCC---UUUAAA\n>seq3\nCCC---------")

    def test_getGappedSeq(self):
        """SequenceCollection.getGappedSeq should return seq, with gaps"""
        aln = self.Class({'seq1': '--TTT?', 'seq2': 'GATC??'})
        self.assertEqual(str(aln.getGappedSeq('seq1')), '--TTT?')

    def test_add(self):
        """__add__ should concatenate sequence data, by name"""
        align1= self.Class({'a': 'AAAA', 'b': 'TTTT', 'c': 'CCCC'})
        align2 = self.Class({'a': 'GGGG', 'b': '----', 'c': 'NNNN'})
        align = align1 + align2
        concatdict = align.todict()
        self.assertEqual(concatdict,
            {'a': 'AAAAGGGG', 'b': 'TTTT----', 'c': 'CCCCNNNN'})

    def test_addSeqs(self):
        """addSeqs should return an alignment with the new sequences appended or inserted"""
        data = [('name1', 'AAA'), ('name2', 'AAA'), ('name3', 'AAA'),
            ('name4', 'AAA')]
        data1 = [('name1', 'AAA'), ('name2', 'AAA')]
        data2 = [('name3', 'AAA'), ('name4', 'AAA')]
        data3 = [('name5', 'BBB'), ('name6', 'CCC')]
        aln = self.Class(data)
        aln3 = self.Class(data3)
        out_aln = aln.addSeqs(aln3)
        self.assertEqual(str(out_aln), str(self.Class(data+data3))) #test append at the end
        out_aln = aln.addSeqs(aln3, before_name='name3')
        self.assertEqual(str(out_aln), str(self.Class(data1+data3+data2))) # test insert before
        out_aln = aln.addSeqs(aln3, after_name='name2')
        self.assertEqual(str(out_aln), str(self.Class(data1+data3+data2))) # test insert after
        out_aln = aln.addSeqs(aln3, before_name='name1')
        self.assertEqual(str(out_aln), str(self.Class(data3+data))) #test if insert before first seq works
        out_aln = aln.addSeqs(aln3, after_name='name4')
        self.assertEqual(str(out_aln), str(self.Class(data+data3))) #test if insert after last seq works
        self.assertRaises(ValueError, aln.addSeqs, aln3,
            before_name='name5') #wrong after/before name
        self.assertRaises(ValueError, aln.addSeqs, aln3,
            after_name='name5') #wrong after/before name
        if isinstance(aln, Alignment) or isinstance(aln, DenseAlignment):
            self.assertRaises((DataError, ValueError), aln.addSeqs, aln3+aln3)
        else:
            exp = set([seq for name, seq in data])
            exp.update([seq+seq for name, seq in data3])
            got = set()
            for seq in aln.addSeqs(aln3+aln3).Seqs:
                got.update([str(seq).strip()])
            self.assertEqual(got, exp)

    def test_writeToFile(self):
        """SequenceCollection.writeToFile should write in correct format"""
        aln = self.Class([('a','AAAA'),( 'b','TTTT'),('c','CCCC')])
        fn = mktemp(suffix='.fasta')
        aln.writeToFile(fn)
        result = open(fn, 'U').read()
        self.assertEqual(result, '>a\nAAAA\n>b\nTTTT\n>c\nCCCC\n')
        remove(fn)

    def test_len(self):
        """len(SequenceCollection) returns length of longest sequence"""
        aln = self.Class([('a','AAAA'),( 'b','TTTT'),('c','CCCC')])
        self.assertEqual(len(aln), 4)

    def test_getTranslation(self):
        """SequenceCollection.getTranslation translates each seq"""
        for seqs in [
                {'seq1': 'GATTTT', 'seq2': 'GATC??'},
                {'seq1': 'GAT---', 'seq2': '?GATCT'}]:
            alignment = self.Class(data=seqs, MolType=DNA)
            self.assertEqual(len(alignment.getTranslation()), 2)
            # check for a failure when no moltype specified
            alignment = self.Class(data=seqs)
            try:
                peps = alignment.getTranslation()
            except AttributeError:
                pass

    def test_getSeq(self):
        """SequenceCollection.getSeq should return specified seq"""
        aln = self.Class({'seq1': 'GATTTT', 'seq2': 'GATC??'})
        self.assertEqual(aln.getSeq('seq1'), 'GATTTT')
        self.assertRaises(KeyError, aln.getSeq, 'seqx')

    def test_todict(self):
        """SequenceCollection.todict should return dict of strings (not obj)"""
        aln = self.Class({'seq1': 'GATTTT', 'seq2': 'GATC??'})
        self.assertEqual(aln.todict(), {'seq1':'GATTTT','seq2':'GATC??'})
        for i in aln.todict().values():
            assert isinstance(i, str)

    def test_getPerSequenceAmbiguousPositions(self):
        """SequenceCollection.getPerSequenceAmbiguousPositions should return pos"""
        aln = self.Class({'s1':'ATGRY?','s2':'T-AG??'}, MolType=DNA)
        self.assertEqual(aln.getPerSequenceAmbiguousPositions(), \
            {'s2': {4: '?', 5: '?'}, 's1': {3: 'R', 4: 'Y', 5: '?'}})

    def test_degap(self):
        """SequenceCollection.degap should strip gaps from each seq"""
        aln = self.Class({'s1':'ATGRY?','s2':'T-AG??'}, MolType=DNA)
        self.assertEqual(aln.degap(), {'s1':'ATGRY','s2':'TAG'})

    def test_withModifiedTermini(self):
        """SequenceCollection.withModifiedTermini should code trailing gaps as ?"""
        aln = self.Class({'s1':'AATGR--','s2':'-T-AG?-'}, MolType=DNA)
        self.assertEqual(aln.withModifiedTermini(), \
            {'s1':'AATGR??','s2':'?T-AG??'})

    def test_omitSeqsTemplate(self):
        """SequenceCollection.omitSeqsTemplate returns new aln with well-aln to temp"""
        aln = self.omitSeqsTemplate_aln
        result = aln.omitSeqsTemplate('s3', 0.9, 5)
        self.assertEqual(result, {'s3': 'UUCCUUCUU-UUC', \
            's4': 'UU-UUUU-UUUUC'})
        result2 = aln.omitSeqsTemplate('s4', 0.9, 4)
        self.assertEqual(result2, {'s3': 'UUCCUUCUU-UUC', \
            's4': 'UU-UUUU-UUUUC'})
        result3 = aln.omitSeqsTemplate('s1', 0.9, 4)
        self.assertEqual(result3, {'s2': 'UC------U---C', \
            's1': 'UC-----CU---C', 's5': '-------------'})
        result4 = aln.omitSeqsTemplate('s3', 0.5, 13)
        self.assertEqual(result4, {'s3': 'UUCCUUCUU-UUC', \
            's4': 'UU-UUUU-UUUUC'})

    def test_make_gap_filter(self):
        """make_gap_filter returns f(seq) -> True if aligned ok w/ query"""
        s1 = RnaSequence('UC-----CU---C')
        s3 = RnaSequence('UUCCUUCUU-UUC')
        s4 = RnaSequence('UU-UUUU-UUUUC')
        #check that the behavior is ok for gap runs
        f1 = make_gap_filter(s1, 0.9, 5)
        f3 = make_gap_filter(s3, 0.9, 5)
        #Should return False since s1 has gap run >= 5 with respect to s3
        self.assertEqual(f3(s1), False)
        #Should return False since s3 has an insertion run >= 5 to s1
        self.assertEqual(f1(s3), False)
        #Should return True since s4 does not have a long enough gap or ins run
        self.assertEqual(f3(s4), True)
        f3 = make_gap_filter(s3, 0.9, 6)
        self.assertEqual(f3(s1), True)
        #Check that behavior is ok for gap_fractions
        f1 = make_gap_filter(s1, 0.5, 6)
        f3 = make_gap_filter(s3, 0.5, 6)
        #Should return False since a 0.53 fraction of positions differ for gaps
        self.assertEqual(f3(s1), False)
        self.assertEqual(f1(s3), False)
        self.assertEqual(f3(s4), True)

    def test_omitGapSeqs(self):
        """SequenceCollection omitGapSeqs should return alignment w/o seqs with gaps"""
        #check default params
        self.assertEqual(self.gaps.omitGapSeqs(), self.gaps.omitGapSeqs(0))
        #check for boundary effects
        self.assertEqual(self.gaps.omitGapSeqs(-1), {})
        self.assertEqual(self.gaps.omitGapSeqs(0), {'a':'AAAAAAA'})
        self.assertEqual(self.gaps.omitGapSeqs(0.1), {'a':'AAAAAAA'})
        self.assertEqual(self.gaps.omitGapSeqs(3.0/7 - 0.01), {'a':'AAAAAAA'})
        self.assertEqual(self.gaps.omitGapSeqs(3.0/7), \
            {'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps.omitGapSeqs(3.0/7 + 0.01), \
            {'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps.omitGapSeqs(5.0/7 - 0.01), \
            {'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps.omitGapSeqs(5.0/7 + 0.01), self.gaps)
        self.assertEqual(self.gaps.omitGapSeqs(0.99), self.gaps)
        #check new object creation
        self.assertNotSameObj(self.gaps.omitGapSeqs(0.99), self.gaps)
        self.assertTrue(isinstance(self.gaps.omitGapSeqs(3.0/7),
            SequenceCollection))
        #repeat tests for object that supplies its own gaps
        self.assertEqual(self.gaps_rna.omitGapSeqs(-1), {})
        self.assertEqual(self.gaps_rna.omitGapSeqs(0), {'a':'AAAAAAA'})
        self.assertEqual(self.gaps_rna.omitGapSeqs(0.1), {'a':'AAAAAAA'})
        self.assertEqual(self.gaps_rna.omitGapSeqs(3.0/7 - 0.01), \
            {'a':'AAAAAAA'})
        self.assertEqual(self.gaps_rna.omitGapSeqs(3.0/7), \
            {'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps_rna.omitGapSeqs(3.0/7 + 0.01), \
            {'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps_rna.omitGapSeqs(5.0/7 - 0.01), \
            {'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps_rna.omitGapSeqs(5.0/7 + 0.01), self.gaps_rna)
        self.assertEqual(self.gaps_rna.omitGapSeqs(0.99), self.gaps_rna)
        self.assertNotSameObj(self.gaps_rna.omitGapSeqs(0.99), self.gaps_rna)
        self.assertTrue(isinstance(self.gaps_rna.omitGapSeqs(3.0/7),
            SequenceCollection))

    def test_omitGapRuns(self):
        """SequenceCollection omitGapRuns should return alignment w/o runs of gaps"""
        #negative value will still let through ungapped sequences
        self.assertEqual(self.gaps.omitGapRuns(-5), {'a':'AAAAAAA'})
        #test edge effects
        self.assertEqual(self.gaps.omitGapRuns(0), {'a':'AAAAAAA'})
        self.assertEqual(self.gaps.omitGapRuns(1), {'a':'AAAAAAA'})
        self.assertEqual(self.gaps.omitGapRuns(2),{'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps.omitGapRuns(3),{'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps.omitGapRuns(4),{'a':'AAAAAAA','b':'A--A-AA'})
        self.assertEqual(self.gaps.omitGapRuns(5), self.gaps)
        self.assertEqual(self.gaps.omitGapRuns(6), self.gaps)
        self.assertEqual(self.gaps.omitGapRuns(1000), self.gaps)
        #test new object creation
        self.assertNotSameObj(self.gaps.omitGapRuns(6), self.gaps)
        self.assertTrue(isinstance(self.gaps.omitGapRuns(6),
            SequenceCollection))

    def test_consistent_gap_degen_handling(self):
        """gap degen character should be treated consistently"""
        # the degen character '?' can be a gap, so when we strip gaps it should
        # be gone too
        raw_seq = "---??-??TC-GGCG-GCA-G-GC-?-C-TAN-GCGC-CCTC-AGGA?-???-??--"
        raw_ungapped = re.sub("[-?]", "", raw_seq)
        raw_no_ambigs = re.sub("[N?]+", "", raw_seq)
        dna = DNA.makeSequence(raw_seq)
        aln = self.Class(data=[("a", dna),("b", dna)])
        expect = self.Class(data=[("a", raw_ungapped),
            ("b", raw_ungapped)]).toFasta()
        self.assertEqual(aln.degap().toFasta(), expect)
        seqs = self.Class(data=[("a", dna),("b", dna)])
        self.assertEqual(seqs.degap().toFasta(), expect)

    def test_padSeqs(self):
        """SequenceCollection padSeqs should work on alignment."""
        #pad to max length
        padded1 = self.ragged_padded.padSeqs()
        seqs1 = list(padded1.iterSeqs(seq_order=['a','b','c']))
        self.assertEqual(map(str,seqs1),['AAAAAA', 'AAA---', 'AAAA--'])
        #pad to alternate length
        padded1 = self.ragged_padded.padSeqs(pad_length=10)
        seqs1 = list(padded1.iterSeqs(seq_order=['a','b','c']))
        self.assertEqual(map(str,seqs1),['AAAAAA----', 'AAA-------',\
            'AAAA------'])
        #assertRaises error when pad_length is less than max seq length
        self.assertRaises(ValueError, self.ragged_padded.padSeqs, 5)


class SequenceCollectionTests(SequenceCollectionBaseTests, TestCase):
    """Tests of the SequenceCollection object. Includes ragged collection tests.

    Should not test alignment-specific features.
    """

    def setUp(self):
        """Adds self.ragged for ragged collection tests."""
        self.ragged = SequenceCollection({'a':'AAAAAA', 'b':'AAA', 'c':'AAAA'})
        super(SequenceCollectionTests, self).setUp()

    def test_SeqLen_get_ragged(self):
        """SequenceCollection SeqLen get should work for ragged seqs"""
        self.assertEqual(self.ragged.SeqLen, 6)

    def test_isRagged_ragged(self):
        """SequenceCollection isRagged should return True if ragged"""
        self.assertTrue(self.ragged.isRagged())

    def test_Seqs_ragged(self):
        """SequenceCollection Seqs should work on ragged alignment"""
        self.ragged.Names = 'bac'
        self.assertEqual(list(self.ragged.Seqs), ['AAA', 'AAAAAA', 'AAAA'])

    def test_iterSeqs_ragged(self):
        """SequenceCollection iterSeqs() method should support reordering of seqs"""
        self.ragged.Names = ['a','b','c']
        seqs = list(self.ragged.iterSeqs())
        self.assertEqual(seqs, ['AAAAAA', 'AAA', 'AAAA'])
        seqs = list(self.ragged.iterSeqs(seq_order=['b','a','a']))
        self.assertEqual(seqs, ['AAA', 'AAAAAA', 'AAAAAA'])
        self.assertSameObj(seqs[1], seqs[2])
        self.assertSameObj(seqs[0], self.ragged.NamedSeqs['b'])

    def test_toPHYLIP_ragged(self):
        """SequenceCollection should refuse to convert ragged seqs to phylip"""
        align_rag = self.Class( ['ACDEFGHIKLMNPQRSTUVWY-',
                                 'ACDEFGHIKLMNPQRSUUVWF-',
                                 'ACDEFGHIKLMNPERSKUVWC-',
                                 'ACNEFGHIKLMNUVWP-',
                                 ])
        self.assertRaises(ValueError, align_rag.toPhylip)

    def test_padSeqs_ragged(self):
        """SequenceCollection padSeqs should work on ragged alignment."""
        #pad to max length
        padded1 = self.ragged.padSeqs()
        seqs1 = list(padded1.iterSeqs(seq_order=['a','b','c']))
        self.assertEqual(map(str,seqs1),['AAAAAA', 'AAA---', 'AAAA--'])
        #pad to alternate length
        padded1 = self.ragged.padSeqs(pad_length=10)
        seqs1 = list(padded1.iterSeqs(seq_order=['a','b','c']))
        self.assertEqual(map(str,seqs1),['AAAAAA----', 'AAA-------',\
            'AAAA------'])
        #assertRaises error when pad_length is less than max seq length
        self.assertRaises(ValueError, self.ragged.padSeqs, 5)


class
AlignmentBaseTests(SequenceCollectionBaseTests):
    """Tests of basic Alignment functionality. All Alignments should pass these.

    Note that this is not a TestCase: need to subclass to test each specific
    type of Alignment. Override self.Constructor with your alignment class as a
    constructor.
    """
    def test_Positions(self):
        """SequenceCollection Positions property should iterate over positions, using self.Names"""
        r = self.Class({'a':'AAAAAA','b':'AAA---','c':'AAAA--'})
        r.Names = ['a','b','c']
        self.assertEqual(list(r.Positions), map(list, \
            ['AAA','AAA','AAA', 'A-A', 'A--', 'A--']))

    def test_iterPositions(self):
        """SequenceCollection iterPositions() method should support reordering of cols"""
        r = self.Class(self.ragged_padded.NamedSeqs, Names=['c','b'])
        self.assertEqual(list(r.iterPositions(pos_order=[5,1,3])),\
            map(list,['--','AA','A-']))
        #reorder names
        r = self.Class(self.ragged_padded.NamedSeqs, Names=['a','b','c'])
        cols = list(r.iterPositions())
        self.assertEqual(cols, map(list, ['AAA','AAA','AAA','A-A','A--','A--']))

    def test_takePositions(self):
        """SequenceCollection takePositions should return new alignment w/ specified pos"""
        self.assertEqual(self.gaps.takePositions([5,4,0], \
            seq_constructor=coerce_to_string), \
            {'a':'AAA','b':'A-A','c':'--A'})
        self.assertTrue(isinstance(self.gaps.takePositions([0]),
            SequenceCollection))
        #should be able to negate
        self.assertEqual(self.gaps.takePositions([5,4,0], negate=True, \
            seq_constructor=coerce_to_string), \
            {'a':'AAAA','b':'--AA','c':'A---'})

    def test_getPositionIndices(self):
        """SequenceCollection getPositionIndices should return names of cols where f(col)"""
        gap_1st = lambda x: x[0] == '-'
        gap_2nd = lambda x: x[1] == '-'
        gap_3rd = lambda x: x[2] == '-'
        is_list = lambda x: isinstance(x, list)
        self.gaps = self.Class(self.gaps.NamedSeqs, Names=['a','b','c'])
        self.assertEqual(self.gaps.getPositionIndices(gap_1st), [])
        self.assertEqual(self.gaps.getPositionIndices(gap_2nd), [1,2,4])
self.assertEqual(self.gaps.getPositionIndices(gap_3rd), [2,3,4,5,6]) self.assertEqual(self.gaps.getPositionIndices(is_list), [0,1,2,3,4,5,6]) #should be able to negate self.assertEqual(self.gaps.getPositionIndices(gap_2nd, negate=True), \ [0,3,5,6]) self.assertEqual(self.gaps.getPositionIndices(gap_1st, negate=True), \ [0,1,2,3,4,5,6]) self.assertEqual(self.gaps.getPositionIndices(is_list, negate=True), []) def test_takePositionsIf(self): """SequenceCollection takePositionsIf should return cols where f(col) is True""" gap_1st = lambda x: x[0] == '-' gap_2nd = lambda x: x[1] == '-' gap_3rd = lambda x: x[2] == '-' is_list = lambda x: isinstance(x, list) self.gaps.Names = 'abc' self.assertEqual(self.gaps.takePositionsIf(gap_1st,seq_constructor=coerce_to_string),\ {'a':'', 'b':'', 'c':''}) self.assertEqual(self.gaps.takePositionsIf(gap_2nd,seq_constructor=coerce_to_string),\ {'a':'AAA', 'b':'---', 'c':'A--'}) self.assertEqual(self.gaps.takePositionsIf(gap_3rd,seq_constructor=coerce_to_string),\ {'a':'AAAAA', 'b':'-A-AA', 'c':'-----'}) self.assertEqual(self.gaps.takePositionsIf(is_list,seq_constructor=coerce_to_string),\ self.gaps) self.assertTrue(isinstance(self.gaps.takePositionsIf(gap_1st), SequenceCollection)) #should be able to negate self.assertEqual(self.gaps.takePositionsIf(gap_1st, seq_constructor=coerce_to_string,\ negate=True), self.gaps) self.assertEqual(self.gaps.takePositionsIf(gap_2nd, seq_constructor=coerce_to_string,\ negate=True), {'a':'AAAA','b':'AAAA','c':'A---'}) self.assertEqual(self.gaps.takePositionsIf(gap_3rd, seq_constructor=coerce_to_string,\ negate=True), {'a':'AA','b':'A-','c':'AA'}) def test_omitGapPositions(self): """SequenceCollection omitGapPositions should return alignment w/o positions of gaps""" aln = self.end_gaps #first, check behavior when we're just acting on the cols (and not #trying to delete the naughty seqs). 
#default should strip out cols that are 100% gaps
        self.assertEqual(aln.omitGapPositions(seq_constructor=coerce_to_string), \
            {'a':'-ABC', 'b':'CBA-', 'c':'-DEF'})
        #if allowed_gap_frac is 1, shouldn't delete anything
        self.assertEqual(aln.omitGapPositions(1, seq_constructor=coerce_to_string), \
            {'a':'--A-BC-', 'b':'-CB-A--', 'c':'--D-EF-'})
        #if allowed_gap_frac is 0, should strip out any cols containing gaps
        self.assertEqual(aln.omitGapPositions(0, seq_constructor=coerce_to_string), \
            {'a':'AB', 'b':'BA', 'c':'DE'})
        #intermediate numbers should work as expected
        self.assertEqual(aln.omitGapPositions(0.4, seq_constructor=coerce_to_string), \
            {'a':'ABC', 'b':'BA-', 'c':'DEF'})
        self.assertEqual(aln.omitGapPositions(0.7, seq_constructor=coerce_to_string), \
            {'a':'-ABC', 'b':'CBA-', 'c':'-DEF'})
        #second, need to check behavior when the naughty seqs should be
        #deleted as well.
        #default should strip out cols that are 100% gaps
        self.assertEqual(aln.omitGapPositions(seq_constructor=coerce_to_string, \
            del_seqs=True), {'a':'-ABC', 'b':'CBA-', 'c':'-DEF'})
        #if allowed_gap_frac is 1, shouldn't delete anything
        self.assertEqual(aln.omitGapPositions(1, seq_constructor=coerce_to_string, \
            del_seqs=True), {'a':'--A-BC-', 'b':'-CB-A--', 'c':'--D-EF-'})
        #if allowed_gap_frac is 0, should strip out any cols containing gaps
        self.assertEqual(aln.omitGapPositions(0, seq_constructor=coerce_to_string, \
            del_seqs=True), {})  #everything has at least one naughty non-gap
        #intermediate numbers should work as expected
        self.assertEqual(aln.omitGapPositions(0.4,
            seq_constructor=coerce_to_string, del_seqs=True),
            {'a':'ABC', 'c':'DEF'})  #b has a naughty non-gap
        #check that b is not deleted if allowed_frac_bad_cols is higher than 0.14
        self.assertEqual(aln.omitGapPositions(0.4,
            seq_constructor=coerce_to_string, del_seqs=True,
            allowed_frac_bad_cols=0.2), \
            {'a':'ABC', 'b':'BA-','c':'DEF'})
        self.assertEqual(aln.omitGapPositions(0.4,
            seq_constructor=coerce_to_string, del_seqs=True),
            {'a':'ABC', 'c':'DEF'})  #b
has a naughty non-gap
        self.assertEqual(aln.omitGapPositions(0.7,
            seq_constructor=coerce_to_string, del_seqs=True),
            {'a':'-ABC', 'b':'CBA-', 'c':'-DEF'})  #all ok
        #when we increase the number of sequences to 6, more differences
        #start to appear.
        new_aln_data = aln.NamedSeqs.copy()
        new_aln_data['d'] = '-------'
        new_aln_data['e'] = 'XYZXYZX'
        new_aln_data['f'] = 'AB-CDEF'
        aln = self.Class(new_aln_data)
        #if no gaps are allowed, everything is deleted...
        result = aln.omitGapPositions(seq_constructor=coerce_to_string)
        self.assertEqual(aln.omitGapPositions(0, del_seqs=False), \
            {'a':'', 'b':'', 'c':'', 'd':'', 'e':'', 'f':''})
        #...though not a sequence that's all gaps, since it has no positions
        #that are not gaps. This 'feature' should possibly be considered a bug.
        self.assertEqual(aln.omitGapPositions(0, del_seqs=True), {'d':''})
        #if we're deleting only full positions of gaps, del_seqs does nothing.
        self.assertEqual(aln.omitGapPositions(del_seqs=True, \
            seq_constructor=coerce_to_string), aln)
        #at 50%, should delete a bunch of minority sequences
        self.assertEqual(aln.omitGapPositions(0.5, del_seqs=True, \
            seq_constructor=coerce_to_string), \
            {'a':'-ABC','b':'CBA-','c':'-DEF','d':'----'})
        #shouldn't depend on order of seqs
        aln.Names = 'fadbec'
        self.assertEqual(aln.omitGapPositions(0.5, del_seqs=True, \
            seq_constructor=coerce_to_string), \
            {'a':'-ABC','b':'CBA-','c':'-DEF','d':'----'})

    def test_IUPACConsensus_RNA(self):
        """SequenceCollection IUPACConsensus should use RNA IUPAC symbols correctly"""
        alignmentUpper = self.Class( ['UCAGN-UCAGN-UCAGN-UCAGAGCAUN-',
                                      'UUCCAAGGNN--UUCCAAGGNNAGCAG--',
                                      'UUCCAAGGNN--UUCCAAGGNNAGCUA--',
                                      'UUUUCCCCAAAAGGGGNNNN--AGCUA--',
                                      'UUUUCCCCAAAAGGGGNNNN--AGCUA--',
                                      ], MolType=RNA)
        #following IUPAC consensus calculated by hand
        #Test all upper
        self.assertEqual(alignmentUpper.IUPACConsensus(),
                         'UYHBN?BSNN??KBVSN?NN??AGCWD?-')

    def test_IUPACConsensus_DNA(self):
        """SequenceCollection IUPACConsensus should use DNA IUPAC symbols correctly"""
        alignmentUpper =
self.Class( ['TCAGN-TCAGN-TCAGN-TCAGAGCATN-',
                                      'TTCCAAGGNN--TTCCAAGGNNAGCAG--',
                                      'TTCCAAGGNN--TTCCAAGGNNAGCTA--',
                                      'TTTTCCCCAAAAGGGGNNNN--AGCTA--',
                                      'TTTTCCCCAAAAGGGGNNNN--AGCTA--',
                                      ])
        #following IUPAC consensus calculated by hand
        #Test all upper
        self.assertEqual(alignmentUpper.IUPACConsensus(DNA),
                         'TYHBN?BSNN??KBVSN?NN??AGCWD?-')

    def test_IUPACConsensus_Protein(self):
        """SequenceCollection IUPACConsensus should use protein IUPAC symbols correctly"""
        alignmentUpper = self.Class( ['ACDEFGHIKLMNPQRSTUVWY-',
                                      'ACDEFGHIKLMNPQRSUUVWF-',
                                      'ACDEFGHIKLMNPERSKUVWC-',
                                      'ACNEFGHIKLMNPQRS-UVWP-',
                                      ])
        #following IUPAC consensus calculated by hand
        #Test all upper
        self.assertEqual(alignmentUpper.IUPACConsensus(PROTEIN),
                         'ACBEFGHIKLMNPZRS?UVWX-')

    def test_isRagged(self):
        """SequenceCollection isRagged should return true if ragged alignment"""
        assert(not self.identical.isRagged())
        assert(not self.gaps.isRagged())

    def test_columnProbs(self):
        """SequenceCollection.columnProbs should find Pr(symbol) in each column"""
        #make an alignment with 4 seqs (easy to calculate probabilities)
        align = self.Class(["AAA", "ACA", "GGG", "GUC"])
        cp = align.columnProbs()
        #check that the column probs match the counts we expect
        self.assertEqual(cp, map(Freqs, [
            {'A':0.5, 'G':0.5},
            {'A':0.25, 'C':0.25, 'G':0.25, 'U':0.25},
            {'A':0.5, 'G':0.25, 'C':0.25},
            ]))

    def test_majorityConsensus(self):
        """SequenceCollection.majorityConsensus should return commonest symbol per column"""
        #Check the exact strings expected from string transform
        self.assertEqual(self.sequences.majorityConsensus(str), 'UCAG')
        self.assertEqual(self.structures.majorityConsensus(str), '(.....')

    def test_uncertainties(self):
        """SequenceCollection.uncertainties should match hand-calculated values"""
        aln = self.Class(['ABC', 'AXC'])
        obs = aln.uncertainties()
        self.assertFloatEqual(obs, [0, 1, 0])
        #check what happens with only one input sequence
        aln = self.Class(['ABC'])
        obs = aln.uncertainties()
        self.assertFloatEqual(obs, [0, 0, 0])
        #check that we can
screen out bad items OK
        aln = self.Class(['ABC', 'DEF', 'GHI', 'JKL', '333'], MolType=BYTES)
        obs = aln.uncertainties('ABCDEFGHIJKLMNOP')
        self.assertFloatEqual(obs, [2.0] * 3)

    def test_columnFreqs(self):
        """Alignment.columnFreqs should count symbols in each column"""
        #calculate by hand what the first and last positions should look like in
        #each case
        firstvalues = [
            [self.sequences, Freqs('UUU')],
            [self.structures, Freqs('(.(')],
            ]
        lastvalues = [
            [self.sequences, Freqs('GGG')],
            [self.structures, Freqs('..)')],
            ]
        #check that the first positions are what we expected
        for obj, result in firstvalues:
            freqs = obj.columnFreqs()
            self.assertEqual(str(freqs[0]), str(result))
        #check that the last positions are what we expected
        for obj, result in lastvalues:
            freqs = obj.columnFreqs()
            self.assertEqual(str(freqs[-1]), str(result))

    def test_scoreMatrix(self):
        """Alignment scoreMatrix should produce position specific score matrix."""
        scoreMatrix = {
            0:{'A':1.0,'C':1.0,'U':5.0},
            1:{'C':6.0,'U':1.0},
            2:{'A':3.0,'C':2.0,'G':2.0},
            3:{'A':3.0,'G':4.0},
            4:{'C':1.0,'G':1.0,'U':5.0},
            5:{'C':6.0,'U':1.0},
            6:{'A':3.0,'G':4.0},
            7:{'A':1.0,'G':6.0},
            8:{'A':1.0,'C':1.0,'G':1.0,'U':4.0},
            9:{'A':1.0,'C':2.0,'U':4.0},
            }
        self.assertEqual(self.many.scoreMatrix(), scoreMatrix)

    def test_sample(self):
        """Alignment.sample should permute alignment by default"""
        alignment = self.Class({'seq1': 'ABCDEFGHIJKLMNOP',
                                'seq2': 'ABCDEFGHIJKLMNOP'})
        # effectively permute columns, preserving length
        shuffled = alignment.sample()
        # ensure length correct
        sample = alignment.sample(10)
        self.assertEqual(len(sample), 10)
        # test columns alignment preserved
        seqs = sample.todict().values()
        self.assertEqual(seqs[0], seqs[1])
        # ensure each char occurs once as sampling without replacement
        for char in seqs[0]:
            self.assertEqual(seqs[0].count(char), 1)

    def test_sample_with_replacement(self):
        #test with replacement -- just verify that it runs
        alignment = self.Class({'seq1': 'gatc', 'seq2': 'gatc'})
        sample = alignment.sample(1000,
with_replacement=True)
        self.assertEqual(len(sample), 1000)
        # ensure that sampling with replacement works on single col alignment
        alignment1 = self.Class({'seq1': 'A', 'seq2': 'A'})
        result = alignment1.sample(with_replacement=True)
        self.assertEqual(len(result), 1)

    def test_sample_tuples(self):
        ##### test with motif size != 1 #####
        alignment = self.Class({'seq1': 'AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPP',
                                'seq2': 'AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPP'})
        shuffled = alignment.sample(motif_length=2)
        # ensure length correct
        sample = alignment.sample(10, motif_length=2)
        self.assertEqual(len(sample), 20)
        # test columns alignment preserved
        seqs = sample.todict().values()
        self.assertEqual(seqs[0], seqs[1])
        # ensure each char occurs twice as sampling dinucs without replacement
        for char in seqs[0]:
            self.assertEqual(seqs[0].count(char), 2)

    def test_copy(self):
        """correctly copy an alignment"""
        aln = self.Class(data=[('a', 'AC-GT'), ('b', 'ACCGT')])
        copied = aln.copy()
        self.assertEqual(type(aln), type(copied))
        self.assertEqual(aln.todict(), copied.todict())
        self.assertEqual(id(aln.MolType), id(copied.MolType))
        aln = self.Class(data=[('a', 'AC-GT'), ('b', 'ACCGT')],
                         Info={'check': True})
        copied = aln.copy()
        self.assertEqual(aln.Info, copied.Info)

class DenseAlignmentTests(AlignmentBaseTests, TestCase):
    Class = DenseAlignment

    def test_get_freqs(self):
        """DenseAlignment _get_freqs: should work on positions and sequences
        """
        s1 = DNA.Sequence('TCAG', Name='s1')
        s2 = DNA.Sequence('CCAC', Name='s2')
        s3 = DNA.Sequence('AGAT', Name='s3')
        da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
        seq_exp = array([[1,1,1,1],[0,3,1,0],[1,0,2,1]])
        pos_exp = array([[1,1,1,0],[0,2,0,1],[0,0,3,0],[1,1,0,1]])
        self.assertEqual(da._get_freqs(index=1), pos_exp)
        self.assertEqual(da._get_freqs(index=0), seq_exp)

    def test_getSeqFreqs(self):
        """DenseAlignment getSeqFreqs: should work with DnaSequences and strings
        """
        exp = array([[1,1,1,1],[0,3,1,0],[1,0,2,1]])
        s1 = DNA.Sequence('TCAG', Name='s1')
        s2 =
DNA.Sequence('CCAC', Name='s2') s3 = DNA.Sequence('AGAT', Name='s3') da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) obs = da.getSeqFreqs() self.assertEqual(obs.Data, exp) self.assertEqual(obs.Alphabet, DNA.Alphabet) self.assertEqual(obs.CharOrder, list("TCAG")) s1 = 'TCAG' s2 = 'CCAC' s3 = 'AGAT' da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) obs = da.getSeqFreqs() self.assertEqual(obs.Data, exp) self.assertEqual(obs.Alphabet, DNA.Alphabet) self.assertEqual(obs.CharOrder, list("TCAG")) def test_getPosFreqs_sequence(self): """DenseAlignment getPosFreqs: should work with DnaSequences and strings """ exp = array([[1,1,1,0],[0,2,0,1],[0,0,3,0],[1,1,0,1]]) s1 = DNA.Sequence('TCAG', Name='s1') s2 = DNA.Sequence('CCAC', Name='s2') s3 = DNA.Sequence('AGAT', Name='s3') da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) obs = da.getPosFreqs() self.assertEqual(obs.Data, exp) self.assertEqual(obs.Alphabet, DNA.Alphabet) self.assertEqual(obs.CharOrder, list("TCAG")) s1 = 'TCAG' s2 = 'CCAC' s3 = 'AGAT' da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) obs = da.getPosFreqs() self.assertEqual(obs.Data, exp) self.assertEqual(obs.Alphabet, DNA.Alphabet) self.assertEqual(obs.CharOrder, list("TCAG")) class AlignmentTests(AlignmentBaseTests, TestCase): Class = Alignment def test_get_freqs(self): """Alignment _get_freqs: should work on positions and sequences """ s1 = DNA.Sequence('TCAG', Name='s1') s2 = DNA.Sequence('CCAC', Name='s2') s3 = DNA.Sequence('AGAT', Name='s3') aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) seq_exp = array([[1,1,1,1],[0,3,1,0],[1,0,2,1]]) pos_exp = array([[1,1,1,0],[0,2,0,1],[0,0,3,0],[1,1,0,1]]) self.assertEqual(aln._get_freqs(index=1), pos_exp) self.assertEqual(aln._get_freqs(index=0), seq_exp) def test_getSeqFreqs(self): """Alignment getSeqFreqs: should work with DnaSequences and strings """ exp = array([[1,1,1,1],[0,3,1,0],[1,0,2,1]]) s1 = DNA.Sequence('TCAG', 
Name='s1') s2 = DNA.Sequence('CCAC', Name='s2') s3 = DNA.Sequence('AGAT', Name='s3') aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) obs = aln.getSeqFreqs() self.assertEqual(obs.Data, exp) self.assertEqual(obs.Alphabet, DNA.Alphabet) self.assertEqual(obs.CharOrder, list("TCAG")) s1 = 'TCAG' s2 = 'CCAC' s3 = 'AGAT' aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) obs = aln.getSeqFreqs() self.assertEqual(obs.Data, exp) self.assertEqual(obs.Alphabet, DNA.Alphabet) self.assertEqual(obs.CharOrder, list("TCAG")) def test_getPosFreqs(self): """Alignment getPosFreqs: should work with DnaSequences and strings """ exp = array([[1,1,1,0],[0,2,0,1],[0,0,3,0],[1,1,0,1]]) s1 = DNA.Sequence('TCAG', Name='s1') s2 = DNA.Sequence('CCAC', Name='s2') s3 = DNA.Sequence('AGAT', Name='s3') aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) obs = aln.getPosFreqs() self.assertEqual(obs.Data, exp) self.assertEqual(obs.Alphabet, DNA.Alphabet) self.assertEqual(obs.CharOrder, list("TCAG")) s1 = 'TCAG' s2 = 'CCAC' s3 = 'AGAT' aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet) obs = aln.getPosFreqs() self.assertEqual(obs.Data, exp) self.assertEqual(obs.Alphabet, DNA.Alphabet) self.assertEqual(obs.CharOrder, list("TCAG")) def make_and_filter(self, raw, expected, motif_length): # a simple filter func func = lambda x: re.findall("[-N?]", " ".join(x)) == [] aln = self.Class(raw) result = aln.filtered(func,motif_length=motif_length,log_warnings=False) self.assertEqual(result.todict(), expected) def test_filtered(self): """filtered should return new alignment with positions consistent with provided callback function""" # a simple filter option raw = {'a':'ACGACGACG', 'b':'CCC---CCC', 'c':'AAAA--AAA'} self.make_and_filter(raw, {'a':'ACGACG','b':'CCCCCC','c':'AAAAAA'}, 1) # check with motif_length = 2 self.make_and_filter(raw, {'a':'ACAC','b':'CCCC','c':'AAAA'}, 2) # check with motif_length = 3 self.make_and_filter(raw, 
{'a':'ACGACG','b':'CCCCCC','c':'AAAAAA'}, 3) def test_slidingWindows(self): """slidingWindows should return slices of alignments.""" alignment = self.Class({'seq1': 'ACGTACGT', 'seq2': 'ACGTACGT', 'seq3': 'ACGTACGT'}) result = [] for bit in alignment.slidingWindows(5,2): result+=[bit] self.assertEqual(result[0].todict(), {'seq3': 'ACGTA', 'seq2': 'ACGTA', 'seq1': 'ACGTA'}) self.assertEqual(result[1].todict(), {'seq3': 'GTACG', 'seq2': 'GTACG', 'seq1': 'GTACG'}) result = [] for bit in alignment.slidingWindows(5,1): result+=[bit] self.assertEqual(result[0].todict(), {'seq3': 'ACGTA', 'seq2': 'ACGTA', 'seq1': 'ACGTA'}) self.assertEqual(result[1].todict(), {'seq3': 'CGTAC', 'seq2': 'CGTAC', 'seq1': 'CGTAC'}) self.assertEqual(result[2].todict(), {'seq3': 'GTACG', 'seq2': 'GTACG', 'seq1': 'GTACG'}) self.assertEqual(result[3].todict(), {'seq3': 'TACGT', 'seq2': 'TACGT', 'seq1': 'TACGT'}) def test_withGapsFrom(self): """withGapsFrom should overwrite with gaps.""" gapless = self.Class({'seq1': 'TCG', 'seq2': 'TCG'}) pregapped = self.Class({'seq1': '-CG', 'seq2': 'TCG'}) template = self.Class({'seq1': 'A-?', 'seq2': 'ACG'}) r1 = gapless.withGapsFrom(template).todict() r2 = pregapped.withGapsFrom(template).todict() self.assertEqual(r1, {'seq1': 'T-G', 'seq2': 'TCG'}) self.assertEqual(r2, {'seq1': '--G', 'seq2': 'TCG'}) def test_getDegappedRelativeTo(self): """should remove all columns with a gap in sequence with given name""" aln = self.Class([ ['name1', '-AC-DEFGHI---'], ['name2', 'XXXXXX--XXXXX'], ['name3', 'YYYY-YYYYYYYY'], ['name4', '-KL---MNPR---'], ]) out_aln = self.Class([ ['name1', 'ACDEFGHI'], ['name2', 'XXXX--XX'], ['name3', 'YY-YYYYY'], ['name4', 'KL--MNPR'], ]) self.assertEqual(aln.getDegappedRelativeTo('name1'), out_aln) self.assertRaises(ValueError, aln.getDegappedRelativeTo, 'nameX') def test_addFromReferenceAln(self): """should add or insert seqs based on align to reference""" aln1 = self.Class([ ['name1', '-AC-DEFGHI---'], ['name2', 'XXXXXX--XXXXX'], 
['name3', 'YYYY-YYYYYYYY'], ]) aln2 = self.Class([ ['name1', 'ACDEFGHI'], ['name4', 'KL--MNPR'], ['name5', 'KLACMNPR'], ['name6', 'KL--MNPR'], ]) aligned_to_ref_out_aln_inserted = self.Class([ ['name1', '-AC-DEFGHI---'], ['name4', '-KL---MNPR---'], ['name5', '-KL-ACMNPR---'], ['name6', '-KL---MNPR---'], ['name2', 'XXXXXX--XXXXX'], ['name3', 'YYYY-YYYYYYYY'], ]) aln2_wrong_refseq = self.Class(( ('name1', 'ACDXFGHI'), ('name4', 'KL--MNPR'), )) aln2_wrong_refseq_name = self.Class([ ['nameY', 'ACDEFGHI'], ['name4', 'KL--MNPR'], ]) aln2_different_aln_class = DenseAlignment([ ['name1', 'ACDEFGHI'], ['name4', 'KL--MNPR'], ]) aln2_list = [ ['name1', 'ACDEFGHI'], ['name4', 'KL--MNPR'], ] aligned_to_ref_out_aln = self.Class([ ['name1', '-AC-DEFGHI---'], ['name2', 'XXXXXX--XXXXX'], ['name3', 'YYYY-YYYYYYYY'], ['name4', '-KL---MNPR---'], ]) out_aln = aln1.addFromReferenceAln(aln2, after_name='name1') self.assertEqual(str(aligned_to_ref_out_aln_inserted), str(out_aln)) #test insert_after out_aln = aln1.addFromReferenceAln(aln2, before_name='name2') self.assertEqual(aligned_to_ref_out_aln_inserted, out_aln) #test insert_before self.assertRaises(ValueError, aln1.addFromReferenceAln, aln2_wrong_refseq_name) #test wrong_refseq_name aln = aln1.addFromReferenceAln(aln2_different_aln_class) self.assertEqual(aligned_to_ref_out_aln, aln) #test_align_to_refseq_different_aln_class aln = aln1.addFromReferenceAln(aln2_list) self.assertEqual(aligned_to_ref_out_aln, aln) #test from_list self.assertRaises(ValueError, aln1.addFromReferenceAln, aln2_wrong_refseq) #test wrong_refseq class DenseAlignmentSpecificTests(TestCase): """Tests of the DenseAlignment object and its methods""" def setUp(self): """Define some standard alignments.""" self.a = DenseAlignment(array([[0,1,2],[3,4,5]]), \ conversion_f=aln_from_array) self.a2 = DenseAlignment(['ABC','DEF'], Names=['x','y']) class ABModelSequence(ModelSequence): Alphabet = AB.Alphabet self.ABModelSequence = ABModelSequence self.a = 
DenseAlignment(map(ABModelSequence, ['abaa','abbb']), \ Alphabet=AB.Alphabet) self.b = Alignment(['ABC','DEF']) self.c = SequenceCollection(['ABC','DEF']) def test_init(self): """DenseAlignment init should work from a sequence""" a = DenseAlignment(array([[0,1,2],[3,4,5]]), conversion_f=aln_from_array) self.assertEqual(a.SeqData, array([[0,3],[1,4],[2,5]], 'B')) self.assertEqual(a.ArrayPositions, array([[0,1,2],[3,4,5]], 'B')) self.assertEqual(a.Names, ['seq_0','seq_1','seq_2']) def test_guess_input_type(self): """DenseAlignment _guess_input_type should figure out data type correctly""" git = self.a._guess_input_type self.assertEqual(git(self.a), 'dense_aln') self.assertEqual(git(self.b), 'aln') self.assertEqual(git(self.c), 'collection') self.assertEqual(git('>ab\nabc'), 'fasta') self.assertEqual(git(['>ab','abc']), 'fasta') self.assertEqual(git(['abc','def']), 'generic') self.assertEqual(git([[1,2],[4,5]]), 'kv_pairs') #precedence over generic self.assertEqual(git([[1,2,3],[4,5,6]]), 'generic') self.assertEqual(git([ModelSequence('abc')]), 'model_seqs') self.assertEqual(git(array([[1,2,3],[4,5,6]])), 'array') self.assertEqual(git({'a':'aca'}), 'dict') self.assertEqual(git([]), 'empty') def test_init_seqs(self): """DenseAlignment init should work from ModelSequence objects.""" s = map(ModelSequence, ['abc','def']) a = DenseAlignment(s) self.assertEqual(a.SeqData, array(['abc','def'], 'c').view('B')) def test_init_generic(self): """DenseAlignment init should work from generic objects.""" s = ['abc','def'] a = DenseAlignment(s) self.assertEqual(a.SeqData, array(['abc','def'], 'c').view('B')) def test_init_aln(self): """DenseAlignment init should work from another alignment.""" s = ['abc','def'] a = DenseAlignment(s) b = DenseAlignment(a) self.assertNotSameObj(a.SeqData, b.SeqData) self.assertEqual(b.SeqData, array(['abc','def'], 'c').view('B')) def test_init_dict(self): """DenseAlignment init should work from dict.""" s = {'abc':'aaaccc','xyz':'gcgcgc'} a = 
DenseAlignment(s) self.assertEqual(a.SeqData, array(['aaaccc','gcgcgc'], 'c').view('B')) self.assertEqual(tuple(a.Names), ('abc','xyz')) def test_init_empty(self): """DenseAlignment init should fail if empty.""" self.assertRaises(TypeError, DenseAlignment) self.assertRaises(ValueError, DenseAlignment, 3) def test_get_alphabet_and_moltype(self): """DenseAlignment should figure out correct alphabet and moltype""" s1 = 'A' s2 = RNA.Sequence('AA') d = DenseAlignment(s1) self.assertSameObj(d.MolType, BYTES) self.assertSameObj(d.Alphabet, BYTES.Alphabet) d = DenseAlignment(s1, MolType=RNA) self.assertSameObj(d.MolType, RNA) self.assertSameObj(d.Alphabet, RNA.Alphabets.DegenGapped) d = DenseAlignment(s1, Alphabet=RNA.Alphabet) self.assertSameObj(d.MolType, RNA) self.assertSameObj(d.Alphabet, RNA.Alphabet) d = DenseAlignment(s2) self.assertSameObj(d.MolType, RNA) self.assertSameObj(d.Alphabet, RNA.Alphabets.DegenGapped) d = DenseAlignment(s2, MolType=DNA) self.assertSameObj(d.MolType, DNA) self.assertSameObj(d.Alphabet, DNA.Alphabets.DegenGapped) #checks for containers d = DenseAlignment([s2]) self.assertSameObj(d.MolType, RNA) d = DenseAlignment({'x':s2}) self.assertSameObj(d.MolType, RNA) d = DenseAlignment(set([s2])) self.assertSameObj(d.MolType, RNA) def test_iter(self): """DenseAlignment iter should iterate over positions""" result = list(iter(self.a2)) for i, j in zip(result, [list(i) for i in ['AD', 'BE', 'CF']]): self.assertEqual(i,j) def test_getitem(self): """DenseAlignment getitem should default to positions as chars""" a2 = self.a2 self.assertEqual(a2[1], ['B','E']) self.assertEqual(a2[1:], [['B','E'],['C','F']]) def test_getSubAlignment(self): """DenseAlignment getSubAlignment should get requested part of alignment.""" a = DenseAlignment('>x ABCE >y FGHI >z JKLM'.split()) #passing in positions should keep all seqs, but just selected positions b = DenseAlignment('>x BC >y GH >z KL'.split()) a_1 = a.getSubAlignment(pos=[1,2]) self.assertEqual(a_1.Names, b.Names) 
self.assertEqual(a_1.Seqs, b.Seqs)
        #...and with invert_pos, should keep all except the positions passed in
        a_2 = a.getSubAlignment(pos=[0,3], invert_pos=True)
        self.assertEqual(a_2.Seqs, b.Seqs)
        self.assertEqual(a_2.Names, b.Names)
        #passing in seqs should keep all positions, but just selected seqs
        c = DenseAlignment('>x ABCE >z JKLM'.split())
        a_3 = a.getSubAlignment(seqs=[0,2])
        self.assertEqual(a_3.Seqs, c.Seqs)
        #check that labels were updated as well...
        self.assertEqual(a_3.Names, c.Names)
        #...and should work with invert_seqs to exclude just selected seqs
        a_4 = a.getSubAlignment(seqs=[1], invert_seqs=True)
        self.assertEqual(a_4.Seqs, c.Seqs)
        self.assertEqual(a_4.Names, c.Names)
        #should be able to do both seqs and positions simultaneously
        d = DenseAlignment('>x BC >z KL'.split())
        a_5 = a.getSubAlignment(seqs=[0,2], pos=[1,2])
        self.assertEqual(a_5.Seqs, d.Seqs)
        self.assertEqual(a_5.Names, d.Names)

    def test_str(self):
        """DenseAlignment str should return FASTA representation of aln"""
        self.assertEqual(str(self.a2), '>x\nABC\n>y\nDEF\n')
        #should work if labels diff length
        self.a2.Names[-1] = 'yyy'
        self.assertEqual(str(self.a2), '>x\nABC\n>yyy\nDEF\n')

    def test_get_freqs(self):
        """DenseAlignment _get_freqs should get row or col freqs"""
        ABModelSequence = self.ABModelSequence
        a = self.a
        self.assertEqual(a._get_freqs(0), array([[3,1],[1,3]]))
        self.assertEqual(a._get_freqs(1), array([[2,0],[0,2],[1,1],[1,1]]))

    def test_getSeqFreqs(self):
        """DenseAlignment getSeqFreqs should get profile of freqs in each seq"""
        ABModelSequence = self.ABModelSequence
        a = self.a
        f = a.getSeqFreqs()
        self.assertEqual(f.Data, array([[3,1],[1,3]]))

    def test_getPosFreqs(self):
        """DenseAlignment getPosFreqs should get profile of freqs at each pos"""
        ABModelSequence = self.ABModelSequence
        a = self.a
        f = a.getPosFreqs()
        self.assertEqual(f.Data, array([[2,0],[0,2],[1,1],[1,1]]))

    def test_getSeqEntropy(self):
        """DenseAlignment getSeqEntropy should get entropy of each seq"""
        ABModelSequence =
self.ABModelSequence
        a = DenseAlignment(map(ABModelSequence, ['abab','bbbb','abbb']), \
            Alphabet=AB.Alphabet)
        f = a.getSeqEntropy()
        e = 0.81127812445913283  #-sum(p log_2 p) for p = 0.25, 0.75
        self.assertFloatEqual(f, array([1,0,e]))

    def test_getPosEntropy(self):
        """DenseAlignment getPosEntropy should get entropy of each pos"""
        ABModelSequence = self.ABModelSequence
        a = self.a
        f = a.getPosEntropy()
        e = array([0,0,1,1])
        self.assertEqual(f, e)

class IntegrationTests(TestCase):
    """Test for integration between regular and model seqs and alns"""

    def setUp(self):
        """Initialize some standard sequences"""
        self.r1 = RNA.Sequence('AAA', Name='x')
        self.r2 = RNA.Sequence('CCC', Name='y')
        self.m1 = RNA.ModelSeq('AAA', Name='xx')
        self.m2 = RNA.ModelSeq('CCC', Name='yy')

    def test_model_to_model(self):
        """Model seq should work with dense alignment"""
        a = DenseAlignment([self.m1, self.m2])
        self.assertEqual(str(a), '>xx\nAAA\n>yy\nCCC\n')
        a = DenseAlignment([self.m1, self.m2], MolType=DNA)
        self.assertEqual(str(a), '>xx\nAAA\n>yy\nCCC\n')
        self.assertEqual(self.m1.Name, 'xx')

    def test_regular_to_model(self):
        """Regular seq should work with dense alignment"""
        a = DenseAlignment([self.r1, self.r2])
        self.assertEqual(str(a), '>x\nAAA\n>y\nCCC\n')
        a = DenseAlignment([self.r1, self.r2], MolType=DNA)
        self.assertEqual(str(a), '>x\nAAA\n>y\nCCC\n')
        self.assertEqual(self.r1.Name, 'x')

    def test_model_to_regular(self):
        """Model seq should work with regular alignment"""
        a = Alignment([self.m1, self.m2])
        self.assertEqual(str(a), '>xx\nAAA\n>yy\nCCC\n')
        a = Alignment([self.m1, self.m2], MolType=DNA)
        self.assertEqual(str(a), '>xx\nAAA\n>yy\nCCC\n')
        self.assertEqual(self.m1.Name, 'xx')

    def test_regular_to_regular(self):
        """Regular seq should work with regular alignment"""
        a = Alignment([self.r1, self.r2])
        self.assertEqual(str(a), '>x\nAAA\n>y\nCCC\n')
        a = Alignment([self.r1, self.r2], MolType=DNA)
        self.assertEqual(str(a), '>x\nAAA\n>y\nCCC\n')
        self.assertEqual(self.r1.Name, 'x')

    def
test_model_aln_to_regular_aln(self):
        """Dense aln should convert to regular aln"""
        a = DenseAlignment([self.r1, self.r2])
        d = Alignment(a)
        self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
        d = Alignment(a, MolType=DNA)
        self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
        self.assertEqual(self.r1.Name, 'x')

    def test_regular_aln_to_model_aln(self):
        """Regular aln should convert to model aln"""
        a = Alignment([self.r1, self.r2])
        d = DenseAlignment(a)
        self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
        d = DenseAlignment(a, MolType=DNA)
        self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
        self.assertEqual(self.r1.Name, 'x')

    def test_regular_aln_to_regular_aln(self):
        """Regular aln should convert to regular aln"""
        a = Alignment([self.r1, self.r2])
        d = Alignment(a)
        self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
        d = Alignment(a, MolType=DNA)
        self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
        self.assertEqual(self.r1.Name, 'x')

    def test_model_aln_to_model_aln(self):
        """Model aln should convert to model aln"""
        a = DenseAlignment([self.r1, self.r2])
        d = DenseAlignment(a)
        self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
        d = DenseAlignment(a, MolType=DNA)
        self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
        self.assertEqual(self.r1.Name, 'x')

#run tests if invoked from command line
if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_core/test_alphabet.py

#!/usr/bin/env python
"""Tests of the Enumeration and Alphabet objects.

Note: individual Alphabets are typically in MolType and are tested there.
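For orientation, the chars-to-indices round trip that test_make_translation_tables below exercises can be sketched in a few lines. This is a hypothetical Python 3 stand-in (`make_translation_tables`) for cogent's `_make_translation_tables`; the original file targets Python 2, where `str.translate` used a 256-char table.

```python
# Python 3 sketch (hypothetical helper, not cogent's implementation) of
# translation tables mapping alphabet chars to small integer indices and
# back, as verified by test_make_translation_tables below.
def make_translation_tables(alphabet):
    indices = bytes(range(len(alphabet)))
    atoi = bytes.maketrans(alphabet.encode(), indices)  # chars -> indices
    itoa = bytes.maketrans(indices, alphabet.encode())  # indices -> chars
    return itoa, atoi

itoa, atoi = make_translation_tables('ucag')
obs = b'ggacu'.translate(atoi)   # indices for each base
orig = obs.translate(itoa)       # round-trips back to the original bytes
```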
""" from cogent.core.alphabet import Enumeration, get_array_type, \ uint8, uint16, uint32, array, JointEnumeration, CharAlphabet, \ _make_translation_tables, _make_complement_array from cogent.core.moltype import RNA from cogent.util.unit_test import TestCase, main DnaBases = CharAlphabet('TCAG') RnaBases = CharAlphabet('UCAG') AminoAcids = CharAlphabet('ACDEFGHIKLMNPQRSTVWY') __author__ = "Rob Knight, Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Rob Knight", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class translation_table_tests(TestCase): """Tests of top-level translation table functions""" def test_make_translation_tables(self): """_make_translation_tables should translate from chars to indices""" a = 'ucag' itoa, atoi = _make_translation_tables(a) s = 'ggacu' obs = s.translate(atoi) self.assertEqual(obs, '\x03\x03\x02\x01\x00') orig = obs.translate(itoa) self.assertEqual(orig, s) def test_make_complement_array(self): """_make_complement_array should identify complements correctly""" complement_array = _make_complement_array(RNA.Alphabet, RNA.Complements) test = 'UCAG' test_array = [RNA.Alphabet.index(i) for i in test] complements = complement_array.take(test_array) result = ''.join([RNA.Alphabet[i] for i in complements]) self.assertEqual(result, 'AGUC') class get_array_type_tests(TestCase): """Tests of the get_array_type top-level function.""" def test_get_array_type(self): """get_array_type should return unsigned type that fits elements.""" self.assertEqual(get_array_type(0), uint8) self.assertEqual(get_array_type(100), uint8) self.assertEqual(get_array_type(256), uint8) #boundary case self.assertEqual(get_array_type(257), uint16) #boundary case self.assertEqual(get_array_type(10000), uint16) self.assertEqual(get_array_type(65536), uint16) 
self.assertEqual(get_array_type(65537), uint32) class EnumerationTests(TestCase): """Tests of the Enumeration object.""" def test_init(self): """Enumeration init should work from any sequence""" a = Enumeration('abc') self.assertEqual(a.index('a'), 0) self.assertEqual(a.index('b'), 1) self.assertEqual(a.index('c'), 2) self.assertEqual(a[0], 'a') self.assertEqual(a[1], 'b') self.assertEqual(a[2], 'c') self.assertEqual(a.ArrayType, uint8) a = Enumeration('bca') self.assertEqual(a.index('b'), 0) self.assertEqual(a.index('c'), 1) self.assertEqual(a.index('a'), 2) self.assertEqual(a[0], 'b') self.assertEqual(a[1], 'c') self.assertEqual(a[2], 'a') a = Enumeration([1,'2']) self.assertEqual(a.index(1), 0) self.assertEqual(a.index('2'), 1) self.assertRaises(KeyError, a.index, '1') #check that it works with gaps a = Enumeration('ab-', '-') self.assertEqual(a.Gap, '-') self.assertEqual(a.GapIndex, 2) a = Enumeration(range(257)) #too big to fit in uint8 self.assertEqual(a.ArrayType, uint16) def test_index(self): """Enumeration index should return first index of item""" a = Enumeration('bca') self.assertEqual(a.index('b'), 0) self.assertEqual(a.index('c'), 1) self.assertEqual(a.index('a'), 2) def test_getitem(self): """Enumeration[i] should return character at i""" a = Enumeration('bca') self.assertEqual(a[0], 'b') self.assertEqual(a[1], 'c') self.assertEqual(a[2], 'a') def test_toIndices(self): """Enumeration toIndices should return indices from elements""" a = Enumeration('bca') self.assertEqual(a.toIndices(''), []) self.assertEqual(a.toIndices('ccabac'), [1,1,2,0,2,1]) def test_isValid(self): """Enumeration isValid should return True for valid sequence""" a = Enumeration('bca') self.assertEqual(a.isValid(''), True) self.assertEqual(a.isValid('bbb'), True) self.assertEqual(a.isValid('bbbaac'), True) self.assertEqual(a.isValid('bbd'), False) self.assertEqual(a.isValid('d'), False) self.assertEqual(a.isValid(['a', 'b']), True) self.assertEqual(a.isValid(['a', None]), False) def 
test_fromIndices(self): """Enumeration fromIndices should return elements from indices""" a = Enumeration('bca') self.assertEqual(a.fromIndices([]), []) self.assertEqual(a.fromIndices([1,1,2,0,2,1]), list('ccabac')) def test_pow(self): """Enumeration pow should produce JointEnumeration with n copies""" a = AminoAcids**3 self.assertEqual(a[0], (AminoAcids[0],)*3) self.assertEqual(a[-1], (AminoAcids[-1],)*3) self.assertEqual(len(a), len(AminoAcids)**3) self.assertEqual(a.ArrayType, uint16) #check that it works with gaps a = Enumeration('a-b', '-') b = a**3 self.assertEqual(len(b), 27) self.assertEqual(b.Gap, ('-','-','-')) self.assertEqual(b.GapIndex, 13) self.assertEqual(b.ArrayType, uint8) #check that array type is set correctly if needed b = a**6 #too big to fit in char self.assertEqual(b.ArrayType, uint16) def test_mul(self): """Enumeration mul should produce correct JointEnumeration""" a = DnaBases * RnaBases self.assertEqual(len(a), 16) self.assertEqual(a[0], ('T','U')) self.assertEqual(a[-1], ('G','G')) #check that it works with gaps a = Enumeration('ab-','-') b = Enumeration('xz','z') x = a*b self.assertEqual(x.Gap, ('-','z')) self.assertEqual(x.GapIndex, 5) self.assertEqual(len(x), 6) self.assertEqual(x, (('a','x'),('a','z'),('b','x'),('b','z'),('-','x'),\ ('-','z'))) #check that it doesn't work when only one seq has gaps c = Enumeration('c') x = a*c self.assertEqual(x.Gap, None) def test_counts(self): """Enumeration counts should count freqs in array""" a = DnaBases f = array([[0,0,1,0,0,3]]) self.assertEqual(a.counts(f), array([4,1,0,1])) #check that it works with byte array f = array([[0,0,1,0,0,3]], 'B') self.assertEqual(a.counts(f), array([4,1,0,1])) #should ignore out-of-bounds items g = [0,4] self.assertEqual(a.counts(g), array([1,0,0,0])) #make sure it works for long sequences, i.e. 
no wraparound at 255 h = [0, 3] * 70000 self.assertEqual(a.counts(h), array([70000,0,0,70000])) h2 = array(h).astype('B') self.assertEqual(a.counts(h2), array([70000,0,0,70000])) i = array([0,3] * 75000) self.assertEqual(a.counts(i), array([75000,0,0,75000])) #make sure it works for long _binary_ sequences, e.g. the results #of array comparisons. a = array([0,1,2,3]*10000) b = array([0,0,0,0]*10000) same = (a==b) class CharAlphabetTests(TestCase): """Tests of CharAlphabets.""" def test_init(self): """CharAlphabet init should make correct translation tables""" r = CharAlphabet('UCAG') i2c, c2i = r._indices_to_chars, r._chars_to_indices s = array([0,0,1,0,3,2], 'b').tostring() self.assertEqual(s.translate(i2c), 'UUCUGA') self.assertEqual('UUCUGA'.translate(c2i), '\000\000\001\000\003\002') def test_fromString(self): """CharAlphabet fromString should return correct array""" r = CharAlphabet('UCAG') self.assertEqual(r.fromString('UUCUGA'), array([0,0,1,0,3,2],'B')) def test_isValid(self): """CharAlphabet isValid should return True for valid sequence""" a = CharAlphabet('bca') self.assertEqual(a.isValid(''), True) self.assertEqual(a.isValid('bbb'), True) self.assertEqual(a.isValid('bbbaac'), True) self.assertEqual(a.isValid('bbd'), False) self.assertEqual(a.isValid('d'), False) self.assertEqual(a.isValid(['a', 'b']), True) self.assertEqual(a.isValid(['a', None]), False) def test_fromArray(self): """CharAlphabet fromArray should return correct array""" r = CharAlphabet('UCAG') self.assertEqual(r.fromArray(array(['UUC','UGA'], 'c')), \ array([[0,0,1],[0,3,2]], 'B')) def test_toChars(self): """CharAlphabet toChars should convert an input array to chars""" r = CharAlphabet('UCAG') c = r.toChars(array([[0,0,1],[0,3,2]], 'B')) self.assertEqual(c, \ array(['UUC','UGA'], 'c')) def test_toString(self): """CharAlphabet toString should convert an input array to string""" r = CharAlphabet('UCAG') self.assertEqual(r.toString(array([[0,0,1],[0,3,2]], 'B')), 'UUC\nUGA') #should work 
with single seq self.assertEqual(r.toString(array([[0,0,1,0,3,2]], 'B')), 'UUCUGA') #should work with single seq self.assertEqual(r.toString(array([0,0,1,0,3,2], 'B')), 'UUCUGA') #should work with empty seq self.assertEqual(r.toString(array([], 'B')), '') def test_pairs(self): """pairs should cache the same object.""" r = CharAlphabet('UCAG') rp = r.Pairs self.assertEqual(len(rp), 16) rp2 = r.Pairs self.assertSameObj(rp, rp2) def test_triples(self): """triples should cache the same object.""" r = CharAlphabet('UCAG') rt = r.Triples self.assertEqual(len(rt), 64) rt2 = r.Triples self.assertSameObj(rt, rt2) class JointEnumerationTests(TestCase): """Tests of JointEnumerations.""" def test_init(self): """JointEnumeration init should work as expected""" #should work for alphabet object a = JointEnumeration([DnaBases, RnaBases]) self.assertEqual(len(a), 16) self.assertEqual(a.Shape, (4,4)) self.assertEqual(a[0], ('T','U')) self.assertEqual(a[-1], ('G','G')) self.assertEqual(a._sub_enum_factors, array([[4],[1]])) #should work for arbitrary sequences a = JointEnumeration(['TCAG', 'UCAG']) self.assertEqual(len(a), 16) self.assertEqual(a[0], ('T','U')) self.assertEqual(a[-1], ('G','G')) self.assertEqual(a._sub_enum_factors, array([[4],[1]])) #should work for different length sequences a = JointEnumeration(['TCA', 'UCAG']) self.assertEqual(a.Shape, (3,4)) self.assertEqual(len(a), 12) self.assertEqual(a[0], ('T','U')) self.assertEqual(a[-1], ('A','G')) self.assertEqual(a._sub_enum_factors, \ array([[4],[1]])) #note: _not_ [3,1] def test_toIndices(self): """JointEnumeration toIndices should convert tuples correctly""" a = JointEnumeration(['TCAG','UCAG']) i = a.toIndices([('T','U'),('G','G'),('G','G')]) self.assertEqual(i, [0, 15, 15]) def test_fromIndices(self): """JointEnumeration fromIndices should return correct tuples""" a = JointEnumeration(['TCAG','UCAG']) i = a.fromIndices([0, 15, 15]) self.assertEqual(i, [('T','U'),('G','G'),('G','G')]) def test_packArrays(self): 
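packArrays/unpackArrays amount to mixed-radix place-value arithmetic: for sub-alphabet sizes (3, 4, 2) the place-value factors are (8, 2, 1), so an index triple (i, j, k) packs to i*8 + j*2 + k. A minimal numpy sketch with hypothetical `pack`/`unpack` helpers (not the cogent API), using the same data as test_packArrays below:

```python
import numpy as np

# Mixed-radix packing for sub-alphabets of sizes (3, 4, 2): factors are
# the running products of the sizes to the right of each position.
sizes = np.array([3, 4, 2])
factors = np.append(np.cumprod(sizes[::-1])[-2::-1], 1)  # -> [8, 2, 1]

def pack(rows):
    # rows[d][n] is the index in sub-alphabet d of symbol n
    return (factors[:, None] * np.asarray(rows)).sum(axis=0)

def unpack(packed):
    # integer-divide by each factor, then wrap to that sub-alphabet's size
    return (np.asarray(packed)[None, :] // factors[:, None]) % sizes[:, None]
```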
"""JointEnumeration packArrays should return correct array.""" a = JointEnumeration(['xyz', 'abcd', 'ef']) v = [[0,1,2,0],[3,3,1,0], [1,1,0,0]] result = a.packArrays(v) self.assertEqual(result, array([7,15,18,0])) def test_unpackArrays(self): """JointEnumeration unpackArrays should return correct arrays.""" a = JointEnumeration(['xyz', 'abcd', 'ef']) v = [7,15,18,0] result = a.unpackArrays(v) self.assertEqual(result, array([[0,1,2,0],[3,3,1,0], [1,1,0,0]])) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_core/test_annotation.py000644 000765 000024 00000020663 12024702176 023164 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import unittest from cogent import DNA, LoadSeqs from cogent.core.annotation import Feature, Variable from cogent.core.location import Map, Span __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def makeSampleSequence(with_gaps=False): raw_seq = 'AACCCAAAATTTTTTGGGGGGGGGGCCCC' cds = (15, 25) utr = (12, 15) if with_gaps: raw_seq = raw_seq[:5] + '-----' +raw_seq[10:-2] + '--' seq = DNA.makeSequence(raw_seq) seq.addAnnotation(Feature, 'CDS', 'CDS', [cds]) seq.addAnnotation(Feature, "5'UTR", "5' UTR", [utr]) return seq def makeSampleAlignment(): seq1 = makeSampleSequence() seq2 = makeSampleSequence(with_gaps=True) seqs = {'FAKE01': seq1, 'FAKE02': seq2} aln = LoadSeqs(data = seqs) aln.addAnnotation(Feature, 'misc_feature', 'misc', [(12,25)]) aln.addAnnotation(Feature, 'CDS', 'blue', [(15, 25)]) aln.addAnnotation(Feature, "5'UTR", 'red', [(2, 4)]) aln.addAnnotation(Feature, "LTR", "fake", [(2,15)]) return aln class TestAnnotations(unittest.TestCase): def setUp(self): self.seq = makeSampleSequence() self.aln = makeSampleAlignment() def test_slice_seq_with_annotations(self): newseq = self.seq[:5] + self.seq[10:] for 
annot_type in ["CDS", "5'UTR"]: orig = str(list(self.seq.getByAnnotation(annot_type))[0]) new = str(list(newseq.getByAnnotation(annot_type))[0]) assert orig == new, (annot_type, orig, new) def test_aln_annotations(self): """test that annotations to alignment and its' sequences""" aln_expecteds = {"misc_feature":{'FAKE01': 'TTTGGGGGGGGGG', 'FAKE02': 'TTTGGGGGGGGGG'}, "CDS": {'FAKE01': 'GGGGGGGGGG', 'FAKE02': 'GGGGGGGGGG'}, "5'UTR": {'FAKE01': 'CC', 'FAKE02': 'CC'}, "LTR" : {"FAKE01": "CCCAAAATTTTTT", "FAKE02": "CCC-----TTTTT"} } seq_expecteds = {"CDS": {"FAKE01": "GGGGGGGGGG", "FAKE02": "GGGGGGGGGG"}, "5'UTR": {"FAKE01": "TTT", "FAKE02": "TTT"}} for annot_type in ["misc_feature", "CDS", "5'UTR", "LTR"]: observed = list(self.aln.getByAnnotation(annot_type))[0].todict() expected = aln_expecteds[annot_type] assert observed == expected, (annot_type, expected, observed) if annot_type in ["misc_feature", "LTR"]: continue # because seqs haven't been annotated with it for name in self.aln.Names: observed = list(self.aln.NamedSeqs[name].data.\ getByAnnotation(annot_type))[0] observed = str(observed) expected = seq_expecteds[annot_type][name] assert str(observed) == expected, (annot_type, name, expected, observed) def test_slice_aln_with_annotations(self): """test that annotations of sequences and alignments survive alignment slicing.""" aln_expecteds = {"misc_feature":{'FAKE01': 'TTTGGGGGGGGGG', 'FAKE02': 'TTTGGGGGGGGGG'}, "CDS": {'FAKE01': 'GGGGGGGGGG', 'FAKE02': 'GGGGGGGGGG'}, "5'UTR": {'FAKE01': 'CC', 'FAKE02': 'CC'}, "LTR" : {"FAKE01": "CCCTTTTT", "FAKE02": "CCCTTTTT"}} newaln = self.aln[:5]+self.aln[10:] feature_list = newaln.getAnnotationsMatching("LTR") for annot_type in ["LTR", "misc_feature", "CDS", "5'UTR"]: feature_list = newaln.getAnnotationsMatching(annot_type) new = newaln.getRegionCoveringAll(feature_list).getSlice().todict() expected = aln_expecteds[annot_type] assert expected == new, (annot_type, expected, new) if annot_type in ["misc_feature", "LTR"]: 
continue # because seqs haven't been annotated with it for name in self.aln.Names: orig = str(list(self.aln.getAnnotationsFromSequence(name, annot_type))[0].getSlice()) new = str(list(newaln.getAnnotationsFromSequence(name, annot_type))[0].getSlice()) assert orig == new, (name, annot_type, orig, new) def test_feature_projection(self): expecteds = {"FAKE01": "CCCAAAATTTTTT", "FAKE02": "CCC-----TTTTT"} aln_ltr = self.aln.getAnnotationsMatching('LTR')[0] for seq_name in ['FAKE01', 'FAKE02']: expected = expecteds[seq_name] seq_ltr = self.aln.projectAnnotation(seq_name, aln_ltr) if '-' in expected: self.assertRaises(ValueError, seq_ltr.getSlice) seq_ltr = seq_ltr.withoutLostSpans() expected = expected.replace('-', '') self.assertEqual(seq_ltr.getSlice(), expected) def test_reversecomplement(self): """test correct translation of annotations on reverse complement.""" aln_expecteds = {"misc_feature":{'FAKE01': 'TTTGGGGGGGGGG', 'FAKE02': 'TTTGGGGGGGGGG'}, "CDS": {'FAKE01': 'GGGGGGGGGG', 'FAKE02': 'GGGGGGGGGG'}, "5'UTR": {'FAKE01': 'CC', 'FAKE02': 'CC'}, "LTR" : {"FAKE01": "CCCAAAATTTTTT", "FAKE02": "CCC-----TTTTT"} } seq_expecteds = {"CDS": {"FAKE01": "GGGGGGGGGG", "FAKE02": "GGGGGGGGGG"}, "5'UTR": {"FAKE01": "TTT", "FAKE02": "TTT"}} rc = self.aln.rc() # rc'ing an Alignment or Sequence rc's their annotations too. 
This means # slicing returns the same sequence as the non-rc'd alignment/seq for annot_type in ["misc_feature", "CDS", "5'UTR", "LTR"]: observed = list(self.aln.getByAnnotation(annot_type))[0].todict() expected = aln_expecteds[annot_type] assert observed == expected, ("+", annot_type, expected, observed) observed = list(rc.getByAnnotation(annot_type))[0].todict() expected = aln_expecteds[annot_type] assert observed == expected, ("-", annot_type, expected, observed) if annot_type in ["misc_feature", "LTR"]: continue # because seqs haven't been annotated with it for name in self.aln.Names: observed = list(self.aln.NamedSeqs[name].data.\ getByAnnotation(annot_type))[0] observed = str(observed) expected = seq_expecteds[annot_type][name] assert str(observed) == expected, ("+", annot_type, name, expected, observed) observed = list(rc.NamedSeqs[name].data.\ getByAnnotation(annot_type))[0] observed = str(observed) expected = seq_expecteds[annot_type][name] assert str(observed) == expected, ("-", annot_type, name, expected, observed) class TestMapSpans(unittest.TestCase): """Test attributes of Map & Spans classes critical to annotation manipulation.""" def test_span(self): length = 100 forward = Span(20, 30) reverse = Span(70, 80, Reverse=True) assert forward.reversedRelativeTo(100) == reverse assert reverse.reversedRelativeTo(100) == forward def test_map(self): """reversing a map with multiple spans should preserve span relative order""" forward = [Span(20,30), Span(40,50)] fmap = Map(spans=forward, parent_length=100) fmap_reversed = fmap.nucleicReversed() reverse = [Span(70,80, Reverse=True), Span(50,60, Reverse=True)] rmap = Map(spans=reverse, parent_length=100) for i in range(2): self.assertEquals(fmap_reversed.spans[i], rmap.spans[i]) if __name__ == '__main__': unittest.main() PyCogent-1.5.3/tests/test_core/test_bitvector.py000644 000765 000024 00000125020 12024702176 023004 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of the bitvector module. 
""" from cogent.util.unit_test import TestCase, main from cogent.core.bitvector import is_nonzero_string_char, is_nonzero_char, \ seq_to_bitstring, is_nonzero_string_int, is_nonzero_int, seq_to_bitlist,\ num_to_bitstring, bitcount, Bitvector, MutableBitvector, \ ImmutableBitvector, VectorFromCases, VectorFromMatches, VectorFromRuns, \ VectorFromSpans, VectorFromPositions, PackedBases, \ LongBitvector, ShortBitvector import re __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" class bitvectorTests(TestCase): """Tests of top-level functions.""" def test_is_nonzero_string_char(self): """is_nonzero_string_char should return '1' for anything but '0', ''""" self.assertEqual(is_nonzero_string_char('0'), '0') self.assertEqual(is_nonzero_string_char(''), '0') for char in "QWERTYUIOPASDFGHJKL:ZXCGHJMK?{|!@#$%^&*()12345678": self.assertEqual(is_nonzero_string_char(char), '1') def test_is_nonzero_char(self): """is_nonzero_char should return '0' for any False item or '0'""" zero = ['', 0, '0', [], {}, None, 0L, 0.0, False] for z in zero: self.assertEqual(is_nonzero_char(z), '0') nonzero = ['z', '1', '00', ' ', 1, -1, 1e-30, [''], {'':None}, True] for n in nonzero: self.assertEqual(is_nonzero_char(n), '1') def test_seq_to_bitstring(self): """seq_to_bitstring should provide expected results""" zero = ['', 0, '0', [], {}, None, 0L, 0.0, False] self.assertEqual(seq_to_bitstring(zero), '0'*9) nonzero = ['z', '1', '00', ' ', 1, -1, 1e-30, [''], {'':None}, True] self.assertEqual(seq_to_bitstring(nonzero), '1'*10) self.assertEqual(seq_to_bitstring(''), '') self.assertEqual(seq_to_bitstring('305'), '101') self.assertEqual(seq_to_bitstring(''), '') def test_is_nonzero_string_int(self): """is_nonzero_string_int should return 1 for anything but '0', ''""" 
self.assertEqual(is_nonzero_string_int('0'), 0) self.assertEqual(is_nonzero_string_int(''), 0) for char in "QWERTYUIOPASDFGHJKL:ZXCGHJMK?{|!@#$%^&*()12345678": self.assertEqual(is_nonzero_string_int(char), 1) def test_is_nonzero_int(self): """is_nonzero_int should return 0 for any False item or '0'""" zero = ['', 0, '0', [], {}, None, 0L, 0.0, False] for z in zero: self.assertEqual(is_nonzero_int(z), 0) nonzero = ['z', '1', '00', ' ', 1, -1, 1e-30, [''], {'':None}, True] for n in nonzero: self.assertEqual(is_nonzero_int(n), 1) def test_seq_to_bitlist(self): """seq_to_bitlist should provide expected results""" zero = ['', 0, '0', [], {}, None, 0L, 0.0, False] self.assertEqual(seq_to_bitlist(zero), [0]*9) nonzero = ['z', '1', '00', ' ', 1, -1, 1e-30, [''], {'':None}, True] self.assertEqual(seq_to_bitlist(nonzero), [1]*10) self.assertEqual(seq_to_bitlist(''), []) self.assertEqual(seq_to_bitlist('305'), [1,0,1]) self.assertEqual(seq_to_bitlist(''), []) def test_number_to_bitstring(self): """number_to_bitstring should provide expected results""" numbers = [0, 1, 2, 7, 8, 1024, 814715L] for n in numbers: self.assertEqual(num_to_bitstring(n, 0), '') single_results = list('0101001') for exp, num in zip(single_results, numbers): self.assertEqual(num_to_bitstring(num, 1), exp) three_results = ['000','001','010','111','000','000','011'] for exp, num in zip(three_results, numbers): self.assertEqual(num_to_bitstring(num, 3), exp) #should pad or truncate to the correct length self.assertEqual(num_to_bitstring(814715, 20),'11000110111001111011') self.assertEqual(num_to_bitstring(814715, 10),'1001111011') self.assertEqual(num_to_bitstring(8, 10),'0000001000') def test_bitcount(self): """bitcount should provide expected results""" numbers = [0, 1, 2, 7, 8, 1024, 814715L] twenty_results = [0, 1, 1, 3, 1, 1, 13] for exp, num in zip(twenty_results, numbers): self.assertEqual(bitcount(num, 20), exp) self.assertEqual(bitcount(num, 20, 1), exp) self.assertEqual(bitcount(num, 20, 0), 20 - 
exp) three_results = [0,1,1,3,0,0,2] for exp, num in zip(three_results, numbers): self.assertEqual(bitcount(num, 3), exp) self.assertEqual(bitcount(num, 3, 1), exp) self.assertEqual(bitcount(num, 3, 0), 3 - exp) for num in numbers: self.assertEqual(bitcount(num, 0), 0) self.assertEqual(bitcount(num, 0, 0), 0) self.assertEqual(bitcount(num, 0, 1), 0) class BitvectorTests(TestCase): """Tests of the (immutable) Bitvector class.""" def setUp(self): """Define a few standard strings and vectors.""" self.strings = ['', '0', '1', '00', '01', '10', '11'] self.vectors = map(Bitvector, self.strings) def test_init(self): """Bitvector init should give expected results.""" self.assertEqual(Bitvector(), 0) self.assertEqual(Bitvector('1001'), 9) self.assertEqual(Bitvector(['1','0','0','0']), 8) self.assertEqual(Bitvector([]), 0) #if passing in non-sequence, must specify length self.assertRaises(TypeError, Bitvector, 1024) self.assertEqual(Bitvector(1024, 10), 1024) bv = Bitvector(10, 3) self.assertEqual(bv, 10) self.assertEqual(len(bv), 3) self.assertEqual(len(Bitvector('1'*1000)), 1000) #check that initializing a bv from itself preserves length bv2 = Bitvector(bv) self.assertEqual(bv2, 10) self.assertEqual(len(bv2), 3) def test_len(self): """Bitvector len should match initialized length""" self.assertEqual(len(Bitvector()), 0) self.assertEqual(len(Bitvector('010')), 3) self.assertEqual(len(Bitvector(1024, 5)), 5) self.assertEqual(len(Bitvector(1024, 0)), 0) self.assertEqual(len(Bitvector('1'*1000)), 1000) def test_str(self): """Bitvector str should match expected results""" vecs = [Bitvector(i, 0) for i in [0, 1, 2, 7, 8, 1024, 814715L]] for v in vecs: self.assertEqual(str(v), '') vecs = [Bitvector(i, 1) for i in [0, 1, 2, 7, 8, 1024, 814715L,'1'*50]] single_results = list('01010011') for exp, vec in zip(single_results, vecs): self.assertEqual(str(vec), exp) vecs = [Bitvector(i, 3) for i in [0, 1, 2, 7, 8, 1024, 814715L,'1'*50]] three_results = 
['000','001','010','111','000','000','011','111'] for exp, vec in zip(three_results, vecs): self.assertEqual(str(vec), exp) #should pad or truncate to the correct length self.assertEqual(str(Bitvector(814715, 20)),'11000110111001111011') self.assertEqual(str(Bitvector(814715, 10)),'1001111011') self.assertEqual(str(Bitvector(8, 10)),'0000001000') self.assertEqual(str(Bitvector('1'*50)), '1'*50) def test_or(self): """Bitvector A|B should return 1 for each position that is 1 in A or B""" results = [ ['', '', '', '', '', '', ''], #'' or x ['', '0', '1', '0', '0', '1', '1'], #'0' or x ['', '1', '1', '1', '1', '1', '1'], #'1' or x ['', '0', '1', '00', '01', '10', '11'], #'00' or x ['', '0', '1', '01', '01', '11', '11'], #'01' or x ['', '1', '1', '10', '11', '10', '11'], #'10' or x ['', '1', '1', '11', '11', '11', '11'], #'11' or x ] vectors = self.vectors for first_pos, first in enumerate(vectors): for second_pos, second in enumerate(vectors): self.assertEqual( str(first | second), results[first_pos][second_pos]) #test chaining expected = Bitvector('1110') observed = Bitvector('1000') | Bitvector('0100') | Bitvector('0110') self.assertEqual(observed, expected) #test long self.assertEqual(Bitvector('10'*50) | Bitvector('01'*50), \ Bitvector('11'*50)) def test_and(self): """Bitvector A&B should return 0 for each position that is 0 in A and B""" results = [ ['', '', '', '', '', '', ''], #'' and x ['', '0', '0', '0', '0', '0', '0'], #'0' and x ['', '0', '1', '0', '0', '1', '1'], #'1' and x ['', '0', '0', '00', '00', '00', '00'], #'00' and x ['', '0', '0', '00', '01', '00', '01'], #'01' and x ['', '0', '1', '00', '00', '10', '10'], #'10' and x ['', '0', '1', '00', '01', '10', '11'], #'11' and x ] vectors = self.vectors for first_pos, first in enumerate(vectors): for second_pos, second in enumerate(vectors): self.assertEqual( str(first & second), results[first_pos][second_pos]) #test chaining expected = Bitvector('0110') observed = Bitvector('1110') & Bitvector('1111') & 
Bitvector('0111')
        self.assertEqual(observed, expected)
        #test long
        self.assertEqual(Bitvector('10'*50) & Bitvector('11'*50), \
            Bitvector('10'*50))

    def test_xor(self):
        """Bitvector A^B should return 0 for each identical position in A and B"""
        results = [
            ['', '', '', '', '', '', ''], #'' xor x
            ['', '0', '1', '0', '0', '1', '1'], #'0' xor x
            ['', '1', '0', '1', '1', '0', '0'], #'1' xor x
            ['', '0', '1', '00', '01', '10', '11'], #'00' xor x
            ['', '0', '1', '01', '00', '11', '10'], #'01' xor x
            ['', '1', '0', '10', '11', '00', '01'], #'10' xor x
            ['', '1', '0', '11', '10', '01', '00'], #'11' xor x
            ]
        vectors = self.vectors
        for first_pos, first in enumerate(vectors):
            for second_pos, second in enumerate(vectors):
                self.assertEqual(str(first ^ second),
                    results[first_pos][second_pos])
        #test chaining
        expected = Bitvector('0110')
        observed = Bitvector('1111') ^ Bitvector('0110') ^ Bitvector('1111')
        self.assertEqual(observed, expected)
        #test long
        self.assertEqual(Bitvector('11'*50) ^ Bitvector('01'*50), \
            Bitvector('10'*50))

    def test_invert(self):
        """Bitvector ~A should return a vector exchanging 1's for 0's"""
        results = map(Bitvector, ['', '1', '0', '11', '10', '01', '00'])
        for data, result in zip(self.vectors, results):
            self.assertEqual(~data, result)
            if len(data):
                self.assertNotEqual(data, result)
            else:
                self.assertEqual(data, result)
            #test chaining
            self.assertEqual(~~data, data) #inverting twice should give original
            self.assertEqual(~~~data, ~data)
        #test long
        self.assertEqual(~Bitvector('10'*50), Bitvector('01'*50))
        self.assertEqual(str(~Bitvector('10'*50)), str(Bitvector('01'*50)))

    def test_getitem(self):
        """Bitvector getitem should return states at specified position(s)"""
        vec_strings = ['', '0', '1', '10', '10001101', '101'*50]
        vecs = map(Bitvector, vec_strings)
        for vec_string, vec in zip(vec_strings, vecs):
            for char, item in zip(vec_string, vec):
                self.assertEqual(char, str(item))
        #test some 2- and 3-item slices as well
        vec = Bitvector('1001000101001')
        self.assertEqual(vec[3:7], Bitvector('1000'))
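Under the hood, a fixed-width bitvector is just an int plus a remembered length, which is why `~` must be masked to that length and `str()` zero-pads or truncates (keeping the low-order bits). A minimal sketch with hypothetical `bv_*` helpers, not the Bitvector API itself:

```python
# Plain-int sketch of fixed-width bitvector semantics: value carries the
# bits, length fixes the width for display, inversion, and counting.
def bv_str(value, length):
    # zero-pad short values, truncate long ones to the low `length` bits
    return format(value, 'b').zfill(length)[-length:] if length else ''

def bv_invert(value, length):
    # Python ints are unbounded, so ~ must be masked to the vector width
    return ~value & ((1 << length) - 1)

def bv_bitcount(value, length, state=1):
    ones = bv_str(value, length).count('1')
    return ones if state else length - ones
```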
self.assertEqual(vec[:4], Bitvector('1001')) self.assertEqual(vec[7:], Bitvector('101001')) self.assertEqual(vec[1:11:2], Bitvector('01011')) def test_bitcount(self): """Bitvector bitcount should correctly count 1's or 0's""" vec_strings = ['', '0', '1', '10', '10001101', '101'*50] vecs = map(Bitvector, vec_strings) one_counts = [0, 0, 1, 1, 4, 100] zero_counts = [0, 1, 0, 1, 4, 50] for v, o, z in zip(vecs, one_counts, zero_counts): self.assertEqual(v.bitcount(), o) self.assertEqual(v.bitcount(1), o) self.assertEqual(v.bitcount(0), z) def test_repr(self): """Bitvector repr should look like a normal object""" v = Bitvector(3, 10) v_id = str(hex(id(v))) expected = '>> from cogent import DNA >>> from cogent.core.annotation import Feature >>> s = DNA.makeSequence("AAGAAGAAGACCCCCAAAAAAAAAATTTTTTTTTTAAAAAAAAAAAAA", ... Name="Orig") >>> exon1 = s.addAnnotation(Feature, 'exon', 'fred', [(10,15)]) >>> exon2 = s.addAnnotation(Feature, 'exon', 'trev', [(30,40)]) The corresponding sequence can be extracted either with slice notation or by asking the feature to do it, since the feature knows what sequence it belongs to. .. doctest:: >>> s[exon1] DnaSequence(CCCCC) >>> exon1.getSlice() DnaSequence(CCCCC) Usually the only way to get a ``Feature`` object like ``exon1`` is to ask the sequence for it. There is one method for querying annotations by type and optionally by name: .. doctest:: >>> exons = s.getAnnotationsMatching('exon') >>> print exons [exon "fred" at [10:15]/48, exon "trev" at [30:40]/48] If the sequence does not have a matching feature you get back an empty list, and slicing the sequence with that returns a sequence of length 0. .. doctest:: >>> dont_exist = s.getAnnotationsMatching('dont_exist') >>> dont_exist [] >>> s[dont_exist] DnaSequence() To construct a pseudo-feature covering (or excluding) multiple features, use ``getRegionCoveringAll``: .. 
doctest:: >>> print s.getRegionCoveringAll(exons) region "exon" at [10:15, 30:40]/48 >>> print s.getRegionCoveringAll(exons).getShadow() region "not exon" at [0:10, 15:30, 40:48]/48 eg: all the exon sequence: .. doctest:: >>> s.getRegionCoveringAll(exons).getSlice() DnaSequence(CCCCCTT... 15) or with slice notation: .. doctest:: >>> s[exon1, exon2] DnaSequence(CCCCCTT... 15) Though ``.getRegionCoveringAll`` also guarantees no overlaps within the result, slicing does not: .. doctest:: >>> print s.getRegionCoveringAll(exons+exons) region "exon" at [10:15, 30:40]/48 >>> s[exon1, exon1, exon1, exon1, exon1] Traceback (most recent call last): ValueError: Uninvertable. Overlap: 10 < 15 You can use features, maps, slices or integers, but non-monotonic slices are not allowed: .. doctest:: >>> s[15:20, 5:16] Traceback (most recent call last): ValueError: Uninvertable. Overlap: 15 < 16 Features are themselves sliceable: .. doctest:: >>> exon1[0:3].getSlice() DnaSequence(CCC) When sequences are concatenated they keep their (non-overlapping) annotations: .. doctest:: >>> c = s[exon1[4:]]+s >>> print len(c) 49 >>> for feat in c.annotations: ... print feat ... exon "fred" at [-4-, 0:1]/49 exon "fred" at [11:16]/49 exon "trev" at [31:41]/49 Since features know their parents you can't use a feature from one sequence to slice another: .. doctest:: >>> print c[exon1] Traceback (most recent call last): ValueError: Can't map exon "fred" at [10:15]/48 onto ... Features are generally attached to the thing they annotate, but in those cases where a free-floating feature is created it can later be attached: .. doctest:: >>> len(s.annotations) 2 >>> region = s.getRegionCoveringAll(exons) >>> len(s.annotations) 2 >>> region.attach() >>> len(s.annotations) 3 >>> region.detach() >>> len(s.annotations) 2 When dealing with sequences that can be reverse complemented (e.g. ``DnaSequence``) features are **not** reversed. 
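The coordinate arithmetic behind this is simple: a span [start, end) on a parent of length L maps to [L - end, L - start) on the other strand, which is what keeps a feature attached to its original bases after reverse complementing. A minimal sketch, with `reverse_span` as a hypothetical helper rather than part of the cogent API:

```python
# Reversing a half-open span relative to a parent sequence of length L:
# the interval is mirrored, its length preserved, and the strand flips.
def reverse_span(start, end, parent_length):
    return parent_length - end, parent_length - start
```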
Features are considered to have strand-specific meaning (e.g. CDS, exons) and so stay on their original strands. We create a sequence with a CDS that spans multiple exons, and show that after getting the reverse complement we have exactly the same result from getting the CDS annotation. .. doctest:: >>> plus = DNA.makeSequence("AAGGGGAAAACCCCCAAAAAAAAAATTTTTTTTTTAAA", ... Name="plus") >>> plus_cds = plus.addAnnotation(Feature, 'CDS', 'gene', ... [(2,6),(10,15),(25,35)]) >>> print plus_cds.getSlice() GGGGCCCCCTTTTTTTTTT >>> minus = plus.rc() >>> minus_cds = minus.getAnnotationsMatching('CDS')[0] >>> print minus_cds.getSlice() GGGGCCCCCTTTTTTTTTT Sequence features can be accessed via a containing ``Alignment``: .. doctest:: >>> from cogent import LoadSeqs >>> aln = LoadSeqs(data=[['x','-AAAAAAAAA'], ['y','TTTT--TTTT']]) >>> print aln >x -AAAAAAAAA >y TTTT--TTTT >>> exon = aln.getSeq('x').addAnnotation(Feature, 'exon', 'fred', [(3,8)]) >>> aln_exons = aln.getAnnotationsFromSequence('x', 'exon') >>> aln_exons = aln.getAnnotationsFromAnySequence('exon') But these will be returned as **alignment** features with locations in alignment coordinates. .. doctest:: >>> print exon exon "fred" at [3:8]/9 >>> print aln_exons[0] exon "fred" at [4:9]/10 >>> print aln_exons[0].getSlice() >x AAAAA >y --TTT >>> aln_exons[0].attach() >>> len(aln.annotations) 1 Similarly alignment features can be projected onto the aligned sequences, where they may end up falling across gaps: .. doctest:: >>> exons = aln.getProjectedAnnotations('y', 'exon') >>> print exons [exon "fred" at [-2-, 4:7]/8] >>> print aln.getSeq('y')[exons[0].map.withoutGaps()] TTT We copy the annotations from another sequence, ..
doctest:: >>> aln = LoadSeqs(data=[['x', '-AAAAAAAAA'], ['y', 'TTTT--CCCC']]) >>> s = DNA.makeSequence("AAAAAAAAA", Name="x") >>> exon = s.addAnnotation(Feature, 'exon', 'fred', [(3,8)]) >>> exon = aln.getSeq('x').copyAnnotations(s) >>> aln_exons = list(aln.getAnnotationsFromSequence('x', 'exon')) >>> print aln_exons [exon "fred" at [4:9]/10] even if the name is different. .. doctest:: >>> exon = aln.getSeq('y').copyAnnotations(s) >>> aln_exons = list(aln.getAnnotationsFromSequence('y', 'exon')) >>> print aln_exons [exon "fred" at [3:4, 6:10]/10] >>> print aln[aln_exons] >x AAAAA >y TCCCC If the feature lies outside the sequence being copied to, you get a lost span .. doctest:: >>> aln = LoadSeqs(data=[['x', '-AAAA'], ['y', 'TTTTT']]) >>> seq = DNA.makeSequence('CCCCCCCCCCCCCCCCCCCC', 'x') >>> exon = seq.addFeature('exon', 'A', [(5,8)]) >>> aln.getSeq('x').copyAnnotations(seq) >>> copied = list(aln.getAnnotationsFromSequence('x', 'exon')) >>> copied [exon "A" at [5:5, -4-]/5] >>> copied[0].getSlice() 2 x 4 text alignment: x[----], y[----] You can copy to a sequence with a different name, in a different alignment if the feature lies within the length .. doctest:: >>> aln = LoadSeqs(data=[['x', '-AAAAAAAAA'], ['y', 'TTTT--TTTT']]) >>> seq = DNA.makeSequence('CCCCCCCCCCCCCCCCCCCC', 'x') >>> match_exon = seq.addFeature('exon', 'A', [(5,8)]) >>> aln.getSeq('y').copyAnnotations(seq) >>> copied = list(aln.getAnnotationsFromSequence('y', 'exon')) >>> copied [exon "A" at [7:10]/10] If the sequence is shorter, again you get a lost span. .. doctest:: >>> aln = LoadSeqs(data=[['x', '-AAAAAAAAA'], ['y', 'TTTT--TTTT']]) >>> diff_len_seq = DNA.makeSequence('CCCCCCCCCCCCCCCCCCCCCCCCCCCC', 'x') >>> nonmatch = diff_len_seq.addFeature('repeat', 'A', [(12,14)]) >>> aln.getSeq('y').copyAnnotations(diff_len_seq) >>> copied = list(aln.getAnnotationsFromSequence('y', 'repeat')) >>> copied [repeat "A" at [10:10, -6-]/10] We consider cases where there are terminal gaps. .. 
doctest:: >>> aln = LoadSeqs(data=[['x', '-AAAAAAAAA'], ['y', '------TTTT']]) >>> exon = aln.getSeq('x').addFeature('exon', 'fred', [(3,8)]) >>> aln_exons = list(aln.getAnnotationsFromSequence('x', 'exon')) >>> print aln_exons [exon "fred" at [4:9]/10] >>> print aln_exons[0].getSlice() >x AAAAA >y --TTT >>> aln = LoadSeqs(data=[['x', '-AAAAAAAAA'], ['y', 'TTTT--T---']]) >>> exon = aln.getSeq('x').addFeature('exon', 'fred', [(3,8)]) >>> aln_exons = list(aln.getAnnotationsFromSequence('x', 'exon')) >>> print aln_exons[0].getSlice() >x AAAAA >y --T-- In this case, only those residues included within the feature are covered - note the omission of the T in ``y`` opposite the gap in ``x``. .. doctest:: >>> aln = LoadSeqs(data=[['x', 'C-CCCAAAAA'], ['y', '-T----TTTT']], ... moltype=DNA) >>> print aln >x C-CCCAAAAA >y -T----TTTT >>> exon = aln.getSeq('x').addFeature('exon', 'ex1', [(0,4)]) >>> print exon exon "ex1" at [0:4]/9 >>> print exon.getSlice() CCCC >>> aln_exons = list(aln.getAnnotationsFromSequence('x', 'exon')) >>> print aln_exons [exon "ex1" at [0:1, 2:5]/10] >>> print aln_exons[0].getSlice() >x CCCC >y ---- ``Feature.asOneSpan()`` can be applied to the exon that straddles the gap in ``x``; the result preserves the feature as a single span, including the intervening gap. .. doctest:: >>> print aln_exons[0].asOneSpan().getSlice() >x C-CCC >y -T--- These properties are also consistently replicated with reverse-complemented sequences. .. doctest:: >>> aln_rc = aln.rc() >>> rc_exons = list(aln_rc.getAnnotationsFromAnySequence('exon')) >>> print aln_rc[rc_exons] # not using asOneSpan, so gap removed from x >x CCCC >y ---- >>> print aln_rc[rc_exons[0].asOneSpan()] >x C-CCC >y -T--- Features can provide their coordinates, useful for custom analyses. ..
doctest:: >>> all_exons = aln.getRegionCoveringAll(aln_exons) >>> coords = all_exons.getCoordinates() >>> assert coords == [(0,1),(2,5)] Annotated regions can be masked (observed sequence characters replaced by another), either through the sequence on which they reside or by projection from the alignment. Note that ``mask_char`` must be a valid character for the sequence ``MolType``. Either the features (multiple can be named), or their shadow, can be masked. We create an alignment with a sequence that has two different annotation types. .. doctest:: >>> aln = LoadSeqs(data=[['x', 'C-CCCAAAAAGGGAA'], ['y', '-T----TTTTG-GTT']]) >>> print aln >x C-CCCAAAAAGGGAA >y -T----TTTTG-GTT >>> exon = aln.getSeq('x').addFeature('exon', 'norwegian', [(0,4)]) >>> print exon.getSlice() CCCC >>> repeat = aln.getSeq('x').addFeature('repeat', 'blue', [(9,12)]) >>> print repeat.getSlice() GGG >>> repeat = aln.getSeq('y').addFeature('repeat', 'frog', [(5,7)]) >>> print repeat.getSlice() GG Each sequence should correctly mask either a single feature, its shadow, multiple features, or their shadow. .. doctest:: >>> print aln.getSeq('x').withMaskedAnnotations('exon', mask_char='?') ????AAAAAGGGAA >>> print aln.getSeq('x').withMaskedAnnotations('exon', mask_char='?', ... shadow=True) CCCC?????????? >>> print aln.getSeq('x').withMaskedAnnotations(['exon', 'repeat'], ... mask_char='?') ????AAAAA???AA >>> print aln.getSeq('x').withMaskedAnnotations(['exon', 'repeat'], ... mask_char='?', shadow=True) CCCC?????GGG?? >>> print aln.getSeq('y').withMaskedAnnotations('exon', mask_char='?') TTTTTGGTT >>> print aln.getSeq('y').withMaskedAnnotations('repeat', mask_char='?') TTTTT??TT >>> print aln.getSeq('y').withMaskedAnnotations('repeat', mask_char='?', ... shadow=True) ?????GG?? The same methods can be applied to annotated Alignments. ..
doctest:: >>> print aln.withMaskedAnnotations('exon', mask_char='?') >x ?-???AAAAAGGGAA >y -T----TTTTG-GTT >>> print aln.withMaskedAnnotations('exon', mask_char='?', shadow=True) >x C-CCC?????????? >y -?----?????-??? >>> print aln.withMaskedAnnotations('repeat', mask_char='?') >x C-CCCAAAAA???AA >y -T----TTTT?-?TT >>> print aln.withMaskedAnnotations('repeat', mask_char='?', shadow=True) >x ?-????????GGG?? >y -?----????G-G?? >>> print aln.withMaskedAnnotations(['repeat', 'exon'], mask_char='?') >x ?-???AAAAA???AA >y -T----TTTT?-?TT >>> print aln.withMaskedAnnotations(['repeat', 'exon'],shadow=True) >x C-CCC?????GGG?? >y -?----????G-G?? It shouldn't matter whether annotated coordinates are entered separately, or as a series. .. doctest:: >>> data = [['human', 'CGAAACGTTT'], ['mouse', 'CTAAACGTCG']] >>> as_series = LoadSeqs(data = data) >>> as_items = LoadSeqs(data = data) We add annotations to the sequences as a series. .. doctest:: >>> as_series.getSeq('human').addFeature('cpgsite', 'cpg', [(0,2), (5,7)]) cpgsite "cpg" at [0:2, 5:7]/10 >>> as_series.getSeq('mouse').addFeature('cpgsite', 'cpg', [(5,7), (8,10)]) cpgsite "cpg" at [5:7, 8:10]/10 We add the annotations to the sequences one segment at a time. .. doctest:: >>> as_items.getSeq('human').addFeature('cpgsite', 'cpg', [(0,2)]) cpgsite "cpg" at [0:2]/10 >>> as_items.getSeq('human').addFeature('cpgsite', 'cpg', [(5,7)]) cpgsite "cpg" at [5:7]/10 >>> as_items.getSeq('mouse').addFeature('cpgsite', 'cpg', [(5,7)]) cpgsite "cpg" at [5:7]/10 >>> as_items.getSeq('mouse').addFeature('cpgsite', 'cpg', [(8,10)]) cpgsite "cpg" at [8:10]/10 These different constructions should generate the same output. .. doctest:: >>> serial = as_series.withMaskedAnnotations(['cpgsite']) >>> print serial >human ??AAA??TTT >mouse CTAAA??T?? >>> itemwise = as_items.withMaskedAnnotations(['cpgsite']) >>> print itemwise >human ??AAA??TTT >mouse CTAAA??T?? 
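The masking behaviour above reduces to a simple rule over half-open spans: replace every character covered by a feature (or, with ``shadow=True``, every character *not* covered) with the mask character. A standalone sketch — ``mask_spans`` is a hypothetical helper for illustration, not the cogent API:

```python
def mask_spans(seq, spans, mask_char='?', shadow=False):
    """Mask characters of seq that fall inside (or, with shadow=True,
    outside) the given half-open (start, end) spans."""
    covered = [False] * len(seq)
    for start, end in spans:
        for i in range(start, end):
            covered[i] = True
    # covered[i] != shadow is True for in-span chars normally,
    # and for out-of-span chars when shadow=True
    return ''.join(mask_char if covered[i] != shadow else c
                   for i, c in enumerate(seq))

# the ungapped 'x' sequence with its exon (0,4) and repeat (9,12) features
seq = 'CCCCAAAAAGGGAA'
print(mask_spans(seq, [(0, 4)]))                        # ????AAAAAGGGAA
print(mask_spans(seq, [(0, 4)], shadow=True))           # CCCC??????????
print(mask_spans(seq, [(0, 4), (9, 12)]))               # ????AAAAA???AA
print(mask_spans(seq, [(0, 4), (9, 12)], shadow=True))  # CCCC?????GGG??
```

The real ``withMaskedAnnotations`` additionally projects spans through alignment gaps, which this sketch ignores.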
Annotations should be correctly masked, whether the sequence has been reverse complemented or not. We use the plus/minus strand CDS containing sequences created above. .. doctest:: >>> print plus.withMaskedAnnotations("CDS") AA????AAAA?????AAAAAAAAAA??????????AAA >>> print minus.withMaskedAnnotations("CDS") TTT??????????TTTTTTTTTT?????TTTT????TT .. todo:: Not documented, Source features. PyCogent-1.5.3/tests/test_core/test_genetic_code.py000644 000765 000024 00000032550 12024702176 023420 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Unit tests for Genetic Code classes. """ from cogent import RNA, DNA from cogent.core.genetic_code import GeneticCode, GeneticCodeInitError,\ InvalidCodonError, GeneticCodes from cogent.util.unit_test import TestCase, main __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Rob Knight", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "caporaso@colorado.edu" __status__ = "Production" class GeneticCodeTests(TestCase): """Tests of the GeneticCode class.""" def setUp(self): """Set up some standard genetic code representations.""" self.SGC = \ "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG" self.mt = \ "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG" self.AllG= \ "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG" self.WrongLength= [ "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG" "", "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG", ] self.NcbiStandard = [ 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG', 1, 'Standard Nuclear', '---M---------------M---------------M----------------------------', ] def test_init(self): """GeneticCode init should work with correct-length sequences""" sgc = GeneticCode(self.SGC) self.assertEqual(sgc['UUU'], 'F') mt = GeneticCode(self.mt) self.assertEqual(mt['UUU'], 'F') allg = GeneticCode(self.AllG) 
self.assertEqual(allg['UUU'], 'G') for i in self.WrongLength: self.assertRaises(GeneticCodeInitError, GeneticCode, i) def test_standard_code(self): """Standard genetic code from NCBI should have correct properties""" sgc = GeneticCode(*self.NcbiStandard) self.assertEqual(sgc.CodeSequence, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG') self.assertEqual(sgc.StartCodonSequence, '---M---------------M---------------M----------------------------') self.assertEqual(sgc.StartCodons, {'TTG':'M', 'CTG':'M', 'ATG':'M'}) self.assertEqual(sgc.ID, 1) self.assertEqual(sgc.Name, 'Standard Nuclear') self.assertEqual(sgc['UUU'], 'F') self.assertEqual(sgc.isStart('ATG'), True) self.assertEqual(sgc.isStart('AAA'), False) self.assertEqual(sgc.isStop('UAA'), True) self.assertEqual(sgc.isStop('AAA'), False) self.assertEqual(len(sgc.SenseCodons), 61) self.assertContains(sgc.SenseCodons, 'AAA') self.assertNotContains(sgc.SenseCodons, 'TGA') def test_standard_code_lookup(self): """GeneticCodes should hold codes keyed by id as string and number""" sgc_new = GeneticCode(*self.NcbiStandard) sgc_number = GeneticCodes[1] sgc_string = GeneticCodes['1'] for sgc in sgc_new, sgc_number, sgc_string: self.assertEqual(sgc.CodeSequence, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG') self.assertEqual(sgc.StartCodonSequence, '---M---------------M---------------M----------------------------') self.assertEqual(sgc.StartCodons, {'TTG':'M', 'CTG':'M', 'ATG':'M'}) self.assertEqual(sgc.ID, 1) self.assertEqual(sgc.Name, 'Standard Nuclear') self.assertEqual(sgc['TTT'], 'F') self.assertEqual(sgc.isStart('ATG'), True) self.assertEqual(sgc.isStart('AAA'), False) self.assertEqual(sgc.isStop('TAA'), True) self.assertEqual(sgc.isStop('AAA'), False) mtgc = GeneticCodes[2] self.assertEqual(mtgc.Name, 'Vertebrate Mitochondrial') self.assertEqual(mtgc.isStart('AUU'), True) self.assertEqual(mtgc.isStop('UGA'), False) self.assertEqual(sgc_new.changes(mtgc), {'AGA':'R*', 'AGG':'R*', 
'ATA':'IM', 'TGA':'*W'}) self.assertEqual(mtgc.changes(sgc_new), {'AGA':'*R', 'AGG':'*R', 'ATA':'MI', 'TGA':'W*'}) self.assertEqual(mtgc.changes(mtgc), {}) self.assertEqual(mtgc.changes( 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'), {'AGA':'*R', 'AGG':'*R', 'ATA':'MI', 'TGA':'W*'}) def test_str(self): """GeneticCode str() should return its code string""" code_strings = self.SGC, self.mt, self.AllG codes = map(GeneticCode, code_strings) for code, string in zip(codes, code_strings): self.assertEqual(str(code), string) #check an example directly in case strings are bad self.assertEqual(str(self.SGC), \ "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG") def test_cmp(self): """GeneticCode cmp() should act on code strings""" sgc_1 = GeneticCode(self.SGC) sgc_2 = GeneticCode(self.SGC) self.assertEqual(sgc_1 is sgc_2, False) #ensure different objects #self.assertNotEqual(sgc_1, sgc_2) # GREG self.assertEqual(sgc_1, sgc_2) mtgc = GeneticCode(self.mt) self.assertNotEqual(sgc_1, mtgc) def test_getitem_codon(self): """GeneticCode getitem should return amino acid for codon""" #specific checks of a particular codon in the standard code variant_codons = ['AUU', 'AUU', 'AUU', 'ATT', 'ATU', 'ATU'] sgc = GeneticCode(self.SGC) for i in variant_codons: self.assertEqual(sgc[i], 'I') #full check for the standard code codons = [a+b+c for a in 'UCAG' for b in 'TCAG' for c in 'UCAG'] for codon, aa in zip(codons, self.SGC): self.assertEqual(sgc[codon], aa) #full check for another code allg = GeneticCode(self.AllG) for codon, aa in zip(codons, self.AllG): self.assertEqual(allg[codon], aa) #check that degenerate codon returns X self.assertEqual(sgc['NNN'], 'X') def test_getitem_aa(self): """GeneticCode getitem should return codon set for aa""" #for all G, should return all the codons (in some order) allg = GeneticCode(self.AllG) codons = [a+b+c for a in 'TCAG' for b in 'TCAG' for c in 'TCAG'] g_codons = allg['G'] codons_copy = codons[:] 
self.assertEqual(g_codons, codons_copy) #check some known cases in the standard genetic code sgc = GeneticCode(self.SGC) exp_ile = ['ATT', 'ATC', 'ATA'] obs_ile = sgc['I'] self.assertEqual(obs_ile, exp_ile) exp_arg = ['AGA', 'AGG', 'CGT', 'CGC', 'CGA', 'CGG'] obs_arg = sgc['R'] self.assertEqual(obs_arg, exp_arg) exp_leu = ['TTA','TTG','CTT','CTC','CTA','CTG'] obs_leu = sgc['L'] self.assertEqual(obs_leu, exp_leu) exp_met = ['ATG'] obs_met = sgc['M'] self.assertEqual(obs_met, exp_met) #unknown aa should return [] self.assertEqual(sgc['U'], []) def test_getitem_invalid_length(self): """GeneticCode getitem should raise InvalidCodonError on wrong length""" sgc = GeneticCode(self.SGC) self.assertRaises(InvalidCodonError, sgc.__getitem__, 'AAAA') self.assertRaises(InvalidCodonError, sgc.__getitem__, 'AA') def test_Blocks(self): """GeneticCode Blocks should return correct list""" sgc = GeneticCode(self.SGC) exp_blocks = [ ['TTT', 'TTC',], ['TTA', 'TTG',], ['TCT','TCC','TCA','TCG'], ['TAT','TAC'], ['TAA','TAG'], ['TGT','TGC'], ['TGA'], ['TGG'], ['CTT','CTC','CTA','CTG'], ['CCT','CCC','CCA','CCG'], ['CAT','CAC'], ['CAA','CAG'], ['CGT','CGC','CGA','CGG'], ['ATT','ATC'], ['ATA',], ['ATG',], ['ACT','ACC','ACA','ACG'], ['AAT','AAC'], ['AAA','AAG'], ['AGT','AGC'], ['AGA','AGG'], ['GTT','GTC','GTA','GTG'], ['GCT','GCC','GCA','GCG'], ['GAT','GAC'], ['GAA','GAG'], ['GGT','GGC','GGA','GGG'], ] self.assertEqual(sgc.Blocks, exp_blocks) def test_Anticodons(self): """GeneticCode Anticodons should return correct list""" sgc = GeneticCode(self.SGC) exp_anticodons = { 'F': ['AAA', 'GAA',], 'L': ['TAA', 'CAA', 'AAG','GAG','TAG','CAG'], 'Y': ['ATA','GTA'], '*': ['TTA','CTA', 'TCA'], 'C': ['ACA','GCA'], 'W': ['CCA'], 'S': ['AGA','GGA','TGA','CGA','ACT','GCT'], 'P': ['AGG','GGG','TGG','CGG'], 'H': ['ATG','GTG'], 'Q': ['TTG','CTG'], 'R': ['ACG','GCG','TCG','CCG','TCT','CCT'], 'I': ['AAT','GAT','TAT'], 'M': ['CAT',], 'T': ['AGT','GGT','TGT','CGT'], 'N': ['ATT','GTT'], 'K': ['TTT','CTT'], 'V':
['AAC','GAC','TAC','CAC'], 'A': ['AGC','GGC','TGC','CGC'], 'D': ['ATC','GTC'], 'E': ['TTC','CTC'], 'G': ['ACC','GCC','TCC','CCC'], } self.assertEqual(sgc.Anticodons, exp_anticodons) def test_translate(self): """GeneticCode translate should return correct amino acid string""" allg = GeneticCode(self.AllG) sgc = GeneticCode(self.SGC) mt = GeneticCode(self.mt) seq = 'AUGCAUGACUUUUGA' # . . . . . markers for codon start self.assertEqual(allg.translate(seq), 'GGGGG') self.assertEqual(allg.translate(seq, 1), 'GGGG') self.assertEqual(allg.translate(seq, 2), 'GGGG') self.assertEqual(allg.translate(seq, 3), 'GGGG') self.assertEqual(allg.translate(seq, 4), 'GGG') self.assertEqual(allg.translate(seq, 12), 'G') self.assertEqual(allg.translate(seq, 14), '') self.assertRaises(ValueError, allg.translate, seq, 15) self.assertRaises(ValueError, allg.translate, seq, 20) self.assertEqual(sgc.translate(seq), 'MHDF*') self.assertEqual(sgc.translate(seq, 3), 'HDF*') self.assertEqual(sgc.translate(seq, 6), 'DF*') self.assertEqual(sgc.translate(seq, 9), 'F*') self.assertEqual(sgc.translate(seq, 12), '*') self.assertEqual(sgc.translate(seq, 14), '') #check shortest translatable sequences self.assertEqual(sgc.translate('AAA'), 'K') self.assertEqual(sgc.translate(''), '') #check that different code gives different results self.assertEqual(mt.translate(seq), 'MHDFW') #check translation with invalid codon(s) self.assertEqual(sgc.translate('AAANNNCNC123UUU'), 'KXXXF') def test_sixframes(self): """GeneticCode sixframes should provide six-frame translation""" class fake_rna(str): """Fake RNA class with reverse-complement""" def __new__(cls, seq, rev): return str.__new__(cls, seq) def __init__(self, seq, rev): self.seq = seq self.rev = rev def rc(self): return self.rev test_rna = fake_rna('AUGCUAACAUAAA', 'UUUAUGUUAGCAU') # . . . . . . . . . . 
sgc = GeneticCode(self.SGC) self.assertEqual(sgc.sixframes(test_rna), [ 'MLT*', 'C*HK', 'ANI', 'FMLA', 'LC*H', 'YVS']) # should also actually work with an RNA or DNA sequence!!! test_rna = RNA.makeSequence('AUGCUAACAUAAA') self.assertEqual(sgc.sixframes(test_rna), [ 'MLT*', 'C*HK', 'ANI', 'FMLA', 'LC*H', 'YVS']) def test_stop_indexes(self): """should return stop codon indexes for a specified frame""" sgc = GeneticCode(self.SGC) seq = DNA.makeSequence('ATGCTAACATAAA') expected = [[9], [4], []] for frame, expect in enumerate(expected): got = sgc.getStopIndices(seq, start=frame) self.assertEqual(got, expect) def test_Synonyms(self): """GeneticCode Synonyms should return aa -> codon set mapping.""" expected_synonyms = { 'A':['GCT','GCC','GCA','GCG'], 'C':['TGT', 'TGC'], 'D':['GAT', 'GAC'], 'E':['GAA','GAG'], 'F':['TTT','TTC'], 'G':['GGT','GGC','GGA','GGG'], 'H':['CAT','CAC'], 'I':['ATT','ATC','ATA'], 'K':['AAA','AAG'], 'L':['TTA','TTG','CTT','CTC','CTA','CTG'], 'M':['ATG'], 'N':['AAT','AAC'], 'P':['CCT','CCC','CCA','CCG'], 'Q':['CAA','CAG'], 'R':['AGA','AGG','CGT','CGC','CGA','CGG'], 'S':['TCT','TCC','TCA','TCG','AGT','AGC'], 'T':['ACT','ACC','ACA','ACG'], 'V':['GTT','GTC','GTA','GTG'], 'W':['TGG'], 'Y':['TAT','TAC'], '*':['TAA','TAG', 'TGA'], } obs_synonyms = GeneticCode(self.SGC).Synonyms #note that the lists will be arbitrary-order for i in expected_synonyms: self.assertEqualItems(obs_synonyms[i], expected_synonyms[i]) #Run tests if called from command line if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_core/test_info.py000644 000765 000024 00000011645 12024702176 021745 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for Info class and associated objects (DbRef, DbRefs, etc.). 
""" from cogent.util.unit_test import TestCase, main from cogent.core.info import DbRef, DbRefs, Info, _make_list __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class DbRefTests(TestCase): """Tests of the DbRef object.""" def setUp(self): """Define a standard DbRef object""" self.data = dict(Accession='xyz',Db='abc',Name='qwe',Description='blah', Data = range(20)) self.db = DbRef(**self.data) def test_init_minimal(self): """DbRef minimal init should fill fields as expected""" d = DbRef('abc') self.assertEqual(d.Accession, 'abc') self.assertEqual(d.Db, '') self.assertEqual(d.Name, '') self.assertEqual(d.Description, '') self.assertEqual(d.Data, None) #empty init not allowed self.assertRaises(TypeError, DbRef) def test_init(self): """DbRef init should insert correct data""" for attr, val in self.data.items(): self.assertEqual(getattr(self.db, attr), val) def test_str(self): """DbRef str should be the same as the accession str""" self.assertEqual(str(self.db), 'xyz') self.db.Accession = 12345 self.assertEqual(str(self.db), '12345') def test_int(self): """DbRef int should be the same as the accession int""" self.assertRaises(ValueError, int, self.db) self.db.Accession = '12345' self.assertEqual(int(self.db), 12345) def test_cmp(self): """DbRef cmp should first try numeric, then alphabetic, cmp.""" self.assertLessThan(DbRef('abc'), DbRef('xyz')) self.assertEqual(DbRef('abc'), DbRef('abc')) self.assertGreaterThan(DbRef('123'), DbRef('14')) self.assertLessThan(DbRef('123'), DbRef('abc')) #check that it ignores other attributes self.assertEqual(DbRef('x','y','z','a','b'), DbRef('x')) class infoTests(TestCase): """Tests of top-level functions.""" def test_make_list(self): """_make_list should always return a list""" self.assertEqual(_make_list('abc'), ['abc']) 
self.assertEqual(_make_list([]), []) self.assertEqual(_make_list(None), [None]) self.assertEqual(_make_list({'x':'y'}), [{'x':'y'}]) self.assertEqual(_make_list([1,2,3]), [1,2,3]) class DbRefsTests(TestCase): """Tests of the DbRefs class.""" def test_init_empty(self): """DbRefs empty init should work as expected""" self.assertEqual(DbRefs(), {}) def test_init_data(self): """DbRefs init with data should produce expected results""" d = DbRefs({'GenBank':'ab', 'GO':(3,44), 'PDB':['asdf','ghjk']}) self.assertEqual(d,{'GenBank':['ab'],'GO':[3,44],'PDB':['asdf','ghjk']}) d.GenBank = 'xyz' self.assertEqual(d['GenBank'], ['xyz']) class InfoTests(TestCase): """Tests of the Info class.""" def test_init_empty(self): """Info empty init should work as expected""" d = Info() self.assertEqual(len(d), 1) self.assertContains(d, 'Refs') self.assertEqual(d.Refs, DbRefs()) self.assertTrue(isinstance(d.Refs, DbRefs)) def test_init_data(self): """Info init with data should put items in correct places""" #need to check init, setting, and resetting of attributes that belong #in the Info object and attributes that belong in Info.Refs. Also need #to check __getitem__, __setitem__, and __contains__. 
d = Info({'x':3, 'GO':12345}) self.assertEqual(d.x, 3) self.assertEqual(d.GO, [12345]) self.assertEqual(d.Refs.GO, [12345]) try: del d.Refs except AttributeError: pass else: raise Exception, "Failed to prevent deletion of required key Refs" d.GenBank = ('qaz', 'wsx') self.assertEqual(d.GenBank, ['qaz', 'wsx']) self.assertContains(d.Refs, 'GenBank') self.assertContains(d, 'GenBank') d.GenBank = 'xyz' self.assertEqual(d.GenBank, ['xyz']) self.assertSameObj(d.GenBank, d.Refs.GenBank) d.GO = 'x' self.assertEqual(d.GO, ['x']) d.GO.append('y') self.assertEqual(d.GO, ['x', 'y']) d.ZZZ = 'zzz' self.assertEqual(d.ZZZ, 'zzz') self.assertNotContains(d.Refs, 'ZZZ') self.assertNotContains(d, 'XXX') self.assertEqual(d.XXX, None) def test_identity(self): """Info should get its own new Refs when created""" i = Info() j = Info() self.assertNotSameObj(i, j) self.assertNotSameObj(i.Refs, j.Refs) #run the following if invoked from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_core/test_location.py000644 000765 000024 00000045576 12024702176 022624 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Unit tests for Range, Span and Point classes. NOTE: Map not currently tested.
""" from cogent.util.unit_test import TestCase, main from cogent.core.location import Span, Range, Point, RangeFromString __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class SpanTests(TestCase): """Tests of the Span object.""" def setUp(self): """Define some standard Spans""" self.empty = Span(0, 0) self.full = Span(35, 30) #will convert to (30, 35) internally self.overlapping = Span(32, 36) self.inside = Span(31, 32) self.before = Span(25, 30) self.after = Span(35, 40) self.reverse = Span(30, 35, Reverse=True) self.spans_zero = Span(-5, 5) def test_init(self): """Span object should init with Start, End, and Length""" s = Span(0) self.assertEqual(s.Start, 0) self.assertEqual(s.End, 1) self.assertEqual(s.Reverse, False) #to get an empty interval, must specify start and end explicitly t = Span(0, 0) self.assertEqual(t.Start, 0) self.assertEqual(t.End, 0) self.assertEqual(t.Reverse, False) #should be able to specify direction also u = Span(5, 15, Reverse=True) self.assertEqual(u.Start, 5) self.assertEqual(u.End, 15) self.assertEqual(u.Reverse, True) #should be able to init from another span v = Span(u) self.assertEqual(v.Start, 5) self.assertEqual(v.End, 15) self.assertEqual(v.Reverse, True) def test_contains(self): """Span object contains its start but not its end""" self.assertNotContains(self.empty, 0) self.assertContains(self.full, 30) self.assertContains(self.full, 34) self.assertNotContains(self.full, 35) self.assertContains(self.full, self.inside) self.assertNotContains(self.full, self.overlapping) self.assertContains(self.spans_zero, 0) self.assertContains(self.spans_zero, -5) self.assertNotContains(self.spans_zero, 5) def test_overlaps(self): """Span objects should be able to overlap points or spans""" self.assertTrue(self.full.overlaps(self.overlapping)) 
self.assertFalse(self.full.overlaps(self.before)) self.assertFalse(self.before.overlaps(self.overlapping)) self.assertFalse(self.full.overlaps(self.after)) self.assertFalse(self.after.overlaps(self.before)) self.assertTrue(self.full.overlaps(self.inside)) self.assertTrue(self.spans_zero.overlaps(self.empty)) self.assertTrue(self.empty.overlaps(self.spans_zero)) def test_reverse(self): """Span reverse should change direction""" self.assertFalse(self.empty.Reverse) self.empty.reverse() self.assertTrue(self.empty.Reverse) self.empty.reverse() self.assertFalse(self.empty.Reverse) self.assertTrue(self.reverse.Reverse) self.reverse.reverse() self.assertFalse(self.reverse.Reverse) def test_iter(self): """Span iter should loop through (integer) contents""" self.assertEqual(list(iter(self.empty)), []) self.assertEqual(list(iter(self.full)), [30,31,32,33,34]) self.assertEqual(list(iter(self.spans_zero)),[-5,-4,-3,-2,-1,0,1,2,3,4]) self.assertEqual(list(iter(self.inside)), [31]) self.assertEqual(list(self.reverse), [34,33,32,31,30]) def test_str(self): """Span str should print start, stop, reverse""" self.assertEqual(str(self.empty), '(0,0,False)') self.assertEqual(str(self.full), '(30,35,False)') self.assertEqual(str(self.reverse), '(30,35,True)') def test_len(self): """Span len should return difference between start and end""" self.assertEqual(len(self.empty), 0) self.assertEqual(len(self.full), 5) self.assertEqual(len(self.inside),1) self.assertEqual(len(self.spans_zero), 10) def test_cmp(self): """Span cmp should support sort by 1st/2nd index and direction""" s, e, f, r, i, o = self.spans_zero, self.empty, self.full, \ self.reverse, self.inside, self.overlapping n = Span(30, 36) expected_order = [s, e, f, r, n, i, o] first = expected_order[:] first.sort() second = [r, o, f, s, e, i, n] second.sort() for i, j in zip(first, second): self.assertSameObj(i, j) for i, j in zip(first, expected_order): self.assertSameObj(i, j) def test_startsBefore(self): """Span startsBefore 
should match hand-calculated results""" e, f = self.empty, self.full self.assertTrue(e.startsBefore(f)) self.assertFalse(f.startsBefore(e)) self.assertTrue(e.startsBefore(1)) self.assertTrue(e.startsBefore(1000)) self.assertFalse(e.startsBefore(0)) self.assertFalse(e.startsBefore(-1)) self.assertFalse(f.startsBefore(30)) self.assertTrue(f.startsBefore(31)) self.assertTrue(f.startsBefore(1000)) def test_startsAfter(self): """Span startsAfter should match hand-calculated results""" e, f = self.empty, self.full self.assertFalse(e.startsAfter(f)) self.assertTrue(f.startsAfter(e)) self.assertFalse(e.startsAfter(1)) self.assertFalse(e.startsAfter(1000)) self.assertFalse(e.startsAfter(0)) self.assertTrue(e.startsAfter(-1)) self.assertTrue(f.startsAfter(29)) self.assertFalse(f.startsAfter(30)) self.assertFalse(f.startsAfter(31)) self.assertFalse(f.startsAfter(1000)) def test_startsAt(self): """Span startsAt should return True if input matches""" e, f = self.empty, self.full s = Span(30, 1000) self.assertTrue(e.startsAt(0)) self.assertTrue(f.startsAt(30)) self.assertTrue(s.startsAt(30)) self.assertTrue(f.startsAt(s)) self.assertTrue(s.startsAt(f)) self.assertFalse(e.startsAt(f)) self.assertFalse(e.startsAt(-1)) self.assertFalse(e.startsAt(1)) self.assertFalse(f.startsAt(29)) def test_startsInside(self): """Span startsInside should return True if input starts inside span""" e, f, i, o = self.empty, self.full, self.inside, self.overlapping self.assertFalse(e.startsInside(0)) self.assertFalse(f.startsInside(30)) self.assertFalse(e.startsInside(f)) self.assertTrue(i.startsInside(f)) self.assertFalse(f.startsInside(i)) self.assertTrue(o.startsInside(f)) self.assertFalse(o.endsInside(i)) def test_endsBefore(self): """Span endsBefore should match hand-calculated results""" e, f = self.empty, self.full self.assertTrue(e.endsBefore(f)) self.assertFalse(f.endsBefore(e)) self.assertTrue(e.endsBefore(1)) self.assertTrue(e.endsBefore(1000)) self.assertFalse(e.endsBefore(0)) 
self.assertFalse(e.endsBefore(-1)) self.assertFalse(f.endsBefore(30)) self.assertFalse(f.endsBefore(31)) self.assertTrue(f.endsBefore(1000)) def test_endsAfter(self): """Span endsAfter should match hand-calculated results""" e, f = self.empty, self.full self.assertFalse(e.endsAfter(f)) self.assertTrue(f.endsAfter(e)) self.assertFalse(e.endsAfter(1)) self.assertFalse(e.endsAfter(1000)) self.assertFalse(e.endsAfter(0)) self.assertTrue(e.endsAfter(-1)) self.assertTrue(f.endsAfter(29)) self.assertTrue(f.endsAfter(30)) self.assertTrue(f.endsAfter(34)) self.assertFalse(f.endsAfter(35)) self.assertFalse(f.endsAfter(1000)) def test_endsAt(self): """Span endsAt should return True if input matches""" e, f = self.empty, self.full s = Span(30, 1000) t = Span(-100, 35) self.assertTrue(e.endsAt(0)) self.assertTrue(f.endsAt(35)) self.assertTrue(s.endsAt(1000)) self.assertFalse(f.endsAt(s)) self.assertFalse(s.endsAt(f)) self.assertTrue(f.endsAt(t)) self.assertTrue(t.endsAt(f)) def test_endsInside(self): """Span endsInside should return True if input ends inside span""" e, f, i, o = self.empty, self.full, self.inside, self.overlapping self.assertFalse(e.endsInside(0)) self.assertFalse(f.endsInside(30)) self.assertFalse(f.endsInside(34)) self.assertFalse(f.endsInside(35)) self.assertFalse(e.endsInside(f)) self.assertTrue(i.endsInside(f)) self.assertFalse(f.endsInside(i)) self.assertFalse(o.endsInside(f)) self.assertFalse(o.endsInside(i)) self.assertTrue(e.endsInside(Span(-1,1))) self.assertTrue(e.endsInside(Span(0,1))) self.assertFalse(e.endsInside(Span(-1,0))) class RangeInterfaceTests(object): #SpanTests): """A single-element Range should behave like the corresponding Span.""" def setUp(self): """Define some standard Spans""" self.empty = Range(Span(0, 0)) self.full = Range(Span(30, 35)) self.overlapping = Range(Span(32, 36)) self.inside = Range(Span(31, 32)) self.before = Range(Span(25, 30)) self.after = Range(Span(35, 40)) self.reverse = Range(Span(30, 35, Reverse=True)) 
self.spans_zero = Range(Span(-5, 5)) def test_str(self): """Range str should print start, stop, reverse for each Span""" #note that the Range adds an extra level of parens, since it can #contain more than one Span. self.assertEqual(str(self.empty), '((0,0,False))') self.assertEqual(str(self.full), '((30,35,False))') self.assertEqual(str(self.reverse), '((30,35,True))') class RangeTests(TestCase): """Tests of the Range object.""" def setUp(self): """Set up a few standard ranges.""" self.one = Range(Span(0,100)) self.two = Range([Span(3,5), Span(8, 11)]) self.three = Range([Span(6,7), Span(15, 17), Span(30, 35)]) self.overlapping = Range([Span(6, 10), Span(7,3)]) self.single = Range(0) self.singles = Range([3, 11]) self.twocopy = Range(self.two) self.twothree = Range([self.two, self.three]) self.empty = Range([Span(6,6), Span(8,8)]) def test_init(self): """Range init from Spans, numbers, or Ranges should work OK.""" #single span self.assertEqual(self.one, Span(0,100)) #list of spans self.assertEqual(self.two.Spans, [Span(3,5), Span(8,11)]) #another range self.assertEqual(self.two, self.twocopy) #list of ranges self.assertEqual(self.twothree.Spans, [Span(3,5), Span(8,11), Span(6,7), Span(15,17), Span(30,35)]) #list of numbers self.assertEqual(self.singles.Spans, [Span(3,4), Span(11,12)]) #single number self.assertEqual(self.single.Spans, [Span(0,1)]) #nothing self.assertEqual(Range().Spans, []) def test_str(self): """Range str should print nested with parens""" self.assertEqual(str(self.one), '((0,100,False))') self.assertEqual(str(self.twothree), '((3,5,False),(8,11,False),(6,7,False),(15,17,False),(30,35,False))') self.assertEqual(str(self.single), '((0,1,False))') def test_len(self): """Range len should sum span lengths""" self.assertEqual(len(self.one), 100) self.assertEqual(len(self.single), 1) self.assertEqual(len(self.empty), 0) self.assertEqual(len(self.three), 8) def test_cmp(self): """Ranges should compare equal if they have the same spans""" 
self.assertEqual(self.twothree, Range([Span(3,5), Span(8, 11), Span(6,7), Span(15, 17), Span(30, 35)])) self.assertEqual(Range(), Range()) def test_start_end(self): """Range Start and End should behave as expected""" self.assertEqual(self.one.Start, 0) self.assertEqual(self.one.End, 100) self.assertEqual(self.overlapping.Start, 3) self.assertEqual(self.overlapping.End, 10) self.assertEqual(self.three.Start, 6) self.assertEqual(self.three.End, 35) def test_reverse(self): """Range reverse method should reverse each span""" for s in self.overlapping.Spans: self.assertFalse(s.Reverse) self.overlapping.reverse() for s in self.overlapping.Spans: self.assertTrue(s.Reverse) self.overlapping.Spans.append(Span(0, 100)) self.overlapping.reverse() for s in self.overlapping.Spans[0:1]: self.assertFalse(s.Reverse) self.assertTrue(self.overlapping.Spans[-1].Reverse) def test_Reverse(self): """Range Reverse property should return True if any span reversed""" self.assertFalse(self.one.Reverse) self.one.reverse() self.assertTrue(self.one.Reverse) self.assertFalse(self.two.Reverse) self.two.Spans.append(Span(0,100,Reverse=True)) self.assertTrue(self.two.Reverse) self.two.reverse() self.assertTrue(self.two.Reverse) def test_contains(self): """Range contains an item if any span contains it""" self.assertContains(self.one, 50) self.assertContains(self.one, 0) self.assertContains(self.one, 99) self.assertNotContains(self.one, 100) self.assertContains(self.three, 6) self.assertNotContains(self.three, 7) self.assertNotContains(self.three, 8) self.assertNotContains(self.three, 14) self.assertContains(self.three, 15) self.assertNotContains(self.three, 29) self.assertContains(self.three, 30) self.assertContains(self.three, 34) self.assertNotContains(self.three, 35) self.assertNotContains(self.three, 40) #should work if a span is added self.three.Spans.append(40) self.assertContains(self.three, 40) #should work for spans self.assertContains(self.three, Span(31, 33)) 
self.assertNotContains(self.three, Span(31, 37)) #span contains itself self.assertContains(self.two, self.twocopy) #should work for ranges self.assertContains(self.three, Range([6, Span(15,16), Span(30,33)])) #should work for copy, except when extra piece added threecopy = Range(self.three) self.assertContains(self.three, threecopy) threecopy.Spans.append(1000) self.assertNotContains(self.three, threecopy) self.three.Spans.append(Span(950, 1050)) self.assertContains(self.three, threecopy) self.assertNotContains(threecopy, self.three) def test_overlaps(self): """Range overlaps should return true if any component overlapping""" self.assertTrue(self.two.overlaps(self.one)) self.assertTrue(self.one.overlaps(self.two)) self.assertTrue(self.three.overlaps(self.one)) #two and three are interleaved but not overlapping self.assertFalse(self.two.overlaps(self.three)) self.assertFalse(self.three.overlaps(self.two)) self.assertTrue(self.one.overlaps(self.empty)) self.assertTrue(self.empty.overlaps(self.one)) self.assertTrue(self.singles.overlaps(self.two)) def test_overlapsExtent(self): """Range overlapsExtent should return true for interleaved ranges""" self.assertTrue(self.two.overlapsExtent(self.three)) self.assertTrue(self.three.overlapsExtent(self.two)) self.assertFalse(self.single.overlapsExtent(self.two)) self.assertFalse(self.single.overlapsExtent(self.three)) self.assertTrue(self.one.overlapsExtent(self.three)) def test_sort(self): """Range sort should sort component spans""" one = self.one one.sort() self.assertEqual(one.Spans, [Span(100,0)]) one.Spans.append(Span(-20,-10)) self.assertEqual(one.Spans, [Span(0,100),Span(-20,-10)]) one.sort() self.assertEqual(one.Spans, [Span(-20,-10),Span(0,100)]) one.Spans.append(Span(-20, -10, Reverse=True)) self.assertEqual(one.Spans, [Span(-20,-10),Span(0,100), Span(-20,-10,Reverse=True)]) one.sort() self.assertEqual(one.Spans, [Span(-20,-10),Span(-20,-10,Reverse=True), Span(0,100)]) def test_iter(self): """Range iter should 
iterate through each span in turn""" self.assertEqual(list(iter(self.two)), [3,4,8,9,10]) self.two.Spans.insert(1, Span(103, 101, Reverse=True)) self.assertEqual(list(iter(self.two)), [3,4,102,101,8,9,10]) def test_Extent(self): """Range extent should span limits of range""" self.assertEqual(self.one.Extent, Span(0,100)) self.assertEqual(self.three.Extent, Span(6,35)) self.assertEqual(self.singles.Extent, Span(3, 12)) self.assertEqual(self.single.Extent, Span(0,1)) self.three.Spans.append(Span(100, 105, Reverse=True)) self.assertEqual(self.three.Extent, Span(6,105)) self.three.Spans.append(Span(-100, -1000)) self.assertEqual(self.three.Extent, Span(-1000,105)) def test_simplify(self): """Range reduce should group overlapping ranges""" #consolidate should have no effect when no overlap r = self.two r.simplify() self.assertEqual(r.Spans, [Span(3,5), Span(8,11)]) #should consolidate an overlap of the same direction r.Spans.append(Span(-1, 4)) r.simplify() self.assertEqual(r.Spans, [Span(-1,5), Span(8,11)]) #should also consolidate _adjacent_ spans of the same direction r.Spans.append(Span(11,14)) r.simplify() self.assertEqual(r.Spans, [Span(-1,5), Span(8,14)]) #bridge should cause consolidations s = Range(r) s.Spans.append(Span(5,8)) s.simplify() self.assertEqual(s.Spans, [Span(-1,14)]) #ditto for bridge that overlaps everything s = Range(r) s.Spans.append(Span(-100, 100)) s.simplify() self.assertEqual(s.Spans, [Span(-100,100)]) #however, can't consolidate span in other orientation s = Range(r) s.Spans.append(Span(-100, 100, Reverse=True)) self.assertEqual(s.Spans, [Span(-1,5), Span(8,14), \ Span(-100,100,Reverse=True)]) class RangeFromStringTests(TestCase): """Tests of the RangeFromString factory function.""" def test_init(self): self.assertEqual(RangeFromString(''), Range()) self.assertEqual(RangeFromString(' 3 , 4\t, ,, 10 ,'), Range([3,4,10])) self.assertEqual(RangeFromString('3,4-10,1-5'), Range([Span(3), Span(4,10), Span(1,5)])) #run the following if invoked 
from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_core/test_maps.py #!/usr/bin/env python import unittest from cogent.core.location import Map, Span from cogent.core.annotation import _Annotatable, Feature, _Feature from cogent import LoadSeqs, DNA __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight", "Matthew Wakefield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def SimpleAnnotation(parent, locations, Name): return Feature(parent, '', Name, locations) def annotate(parent, start, end, Name): annot = parent.addAnnotation(SimpleAnnotation, locations=[(start, end)], Name=Name) return annot def structure(a, depth=0): annots = [structure(annot, depth+1) for annot in a.annotations] if not isinstance(a, _Feature): return ('seq', len(a), annots) elif annots: return (a.Name, repr(a.map), annots) else: return (a.Name, repr(a.map)) class MapTest(unittest.TestCase): """Testing annotation of and by maps""" def test_spans(self): # a simple two part map of length 10 map = Map([(0,5), (5,10)], parent_length=10) # try different spans on the above map for ((start, end), expected) in [ ((0, 4), "[0:4]"), ((0, 5), "[0:5]"), ((0, 6), "[0:5, 5:6]"), ((5, 10), "[5:10]"), ((-1, 10), "[-1-, 0:5, 5:10]"), ((5, 11), "[5:10, -1-]"), ((0, 10), "[0:5, 5:10]"), ((10, 0), "[10:5, 5:0]"), ]: r = repr(Span(start, end, Reverse=start>end).remapWith(map)) #print (start, end), r, if r != expected: self.fail(repr((r, expected))) def test_maps_on_maps(self): seq = DNA.makeSequence('ATCGATCGAT' * 5, Name='base') feat1 = annotate(seq, 10, 20, 'fake') feat2 = annotate(feat1, 3, 5, 'fake2') feat3 = annotate(seq, 1, 3, 'left') seq2 = seq[5:] self.assertEqual(structure(seq), ('seq', 50, [('fake',
'[10:20]/50', [('fake2', '[3:5]/10')]), ('left', '[1:3]/50')]) ) self.assertEqual(structure(seq2), ('seq', 45, [('fake', '[5:15]/45', [('fake2', '[3:5]/10')])]) ) def test_getByAnnotation(self): seq = DNA.makeSequence('ATCGATCGAT' * 5, Name='base') seq.addAnnotation(Feature, 'test_type', 'test_label', [(5,10)]) seq.addAnnotation(Feature, 'test_type', 'test_label2', [(15,18)]) answer = list(seq.getByAnnotation('test_type')) self.assertEqual( len(answer), 2) self.assertEqual( str(answer[0]), 'TCGAT') self.assertEqual( str(answer[1]), 'TCG') answer = list(seq.getByAnnotation('test_type', 'test_label')) self.assertEqual( len(answer), 1) self.assertEqual( str(answer[0]), 'TCGAT') # test ignoring of a partial annotation sliced_seq = seq[:17] answer = list(sliced_seq.getByAnnotation('test_type', ignore_partial=True)) self.assertEqual(len(answer), 1) self.assertEqual( str(answer[0]), 'TCGAT') def test_getBySequenceAnnotation(self): aln = LoadSeqs(data={ 'a': 'ATCGAAATCGAT', 'b': 'ATCGA--TCGAT'}) b = aln.getSeq('b') b.addAnnotation(Feature, 'test_type', 'test_label', [(4,6)]) answer = aln.getBySequenceAnnotation('b', 'test_type')[0].todict() self.assertEqual(answer, {'b':'A--T', 'a':'AAAT'}) if 0: # old, needs fixes # Maps a = Map([(10,20)], parent_length=100) for (desc, map, expected) in [ ('a ', a, "Map([10:20] on base)"), ('i ', a.inverse(), "Map([-10-, 0:10, -80-] on Map([10:20] on base))"), ('1 ', a[5:], "Map([5:10] on Map([10:20] on base))"), ('1r', a[5:].relative_to(b), "Map([15:20] on base)"), ('2 ', a[:5], "Map([0:5] on Map([10:20] on base))"), ('2r', a[:5].relative_to(b), "Map([10:15] on base)"), ('r ', a.relative_to(a[5:]), "Map([-5-, 0:5] on Map([5:10] on Map([10:20] on base)))"), ('r ', a[2:4].relative_to(a[2:6]), "Map([0:2] on Map([2:6] on Map([10:20] on base)))"), ('r ', a[2:4].relative_to(a[2:6][0:3]), "Map([0:2] on Map([0:3] on Map([2:6] on Map([10:20] on base))))")]: print desc, repr(map), if repr(map) == expected: print else: print ' <--- ', expected bad 
= True if __name__ == '__main__': unittest.main() PyCogent-1.5.3/tests/test_core/test_moltype.py #!/usr/bin/env python from cogent.core import moltype, sequence from cogent.core.moltype import AlphabetError, \ CoreObjectGroup, AlphabetGroup, make_matches, make_pairs, \ array, MolType, RNA, DNA, PROTEIN, STANDARD_CODON,\ IUPAC_RNA_chars, \ IUPAC_RNA_ambiguities, IUPAC_RNA_ambiguities_complements, \ IUPAC_DNA_chars, IUPAC_DNA_ambiguities, IUPAC_DNA_ambiguities_complements, \ RnaStandardPairs, DnaStandardPairs from cogent.util.unit_test import TestCase, main from cogent.data.molecular_weight import DnaMW, RnaMW, ProteinMW __author__ = "Gavin Huttley, Peter Maxwell, and Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Gavin Huttley", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" #bind some of the standard alphabets to reduce typing RnaBases = RNA.Alphabets.Base DnaBases = DNA.Alphabets.Base AminoAcids = PROTEIN.Alphabets.Base #the following classes are to preserve compatibility for older test code #that assumes mixed-case is OK.
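The make_matches tests below pin down a three-part contract: monomers strictly match only themselves, every gap character matches every gap character, and degenerate symbols weakly match any symbol whose expansion overlaps theirs. A minimal standalone sketch of that contract (a hypothetical reimplementation for illustration, not cogent's own make_matches):

```python
# Hypothetical standalone sketch of the matching contract exercised by the
# make_matches tests; this is NOT cogent's implementation.
def sketch_make_matches(monomers=None, gaps=None, degenerates=None):
    """Return a {(x, y): required} dict describing which symbols can match.

    True means the pair must match (identical monomers, gap vs gap);
    False means the pair merely might match (a degenerate symbol is
    involved and the expansions overlap); absent pairs can never match.
    """
    matches = {}
    # each monomer strictly matches only itself
    for m in monomers or '':
        matches[(m, m)] = True
    # every gap character matches every gap character
    for g in gaps or '':
        for h in gaps or '':
            matches[(g, h)] = True
    degenerates = degenerates or {}
    for d, expansion in degenerates.items():
        # a degenerate weakly matches a monomer inside its expansion...
        for m in monomers or '':
            if m in expansion:
                matches[(d, m)] = matches[(m, d)] = False
        # ...and weakly matches any degenerate whose expansion overlaps
        for e, other in degenerates.items():
            if set(expansion) & set(other):
                matches[(d, e)] = False
    return matches
```

Here True marks a required match and False a merely possible one, mirroring the dict values the tests assert.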
RnaMolType = MolType( Sequence = sequence.RnaSequence, motifset = IUPAC_RNA_chars, Ambiguities = IUPAC_RNA_ambiguities, label = "rna_with_lowercase", MWCalculator = RnaMW, Complements = IUPAC_RNA_ambiguities_complements, Pairs = RnaStandardPairs, add_lower=True, preserve_existing_moltypes=True, make_alphabet_group=True, ) DnaMolType = MolType( Sequence = sequence.DnaSequence, motifset = IUPAC_DNA_chars, Ambiguities = IUPAC_DNA_ambiguities, label = "dna_with_lowercase", MWCalculator = DnaMW, Complements = IUPAC_DNA_ambiguities_complements, Pairs = DnaStandardPairs, add_lower=True, preserve_existing_moltypes=True, make_alphabet_group=True, ) ProteinMolType = PROTEIN class make_matches_tests(TestCase): """Tests of the make_matches top-level function""" def test_init_empty(self): """make_matches should init ok with no parameters""" self.assertEqual(make_matches(), {}) def test_init_monomers(self): """make_matches with only monomers should produce {(i,i):True}""" m = make_matches('') self.assertEqual(m, {}) m = make_matches('qaz') self.assertEqual(m, {('q','q'):True,('a','a'):True,('z','z'):True}) def test_init_gaps(self): """make_matches with only gaps should match all gaps to each other""" m = make_matches('', '~!') self.assertEqual(m, {('~','~'):True,('!','!'):True,('!','~'):True, ('~','!'):True}) def test_init_degen(self): """make_matches with only degen should work as expected""" m = make_matches(None, None, {'x':'ab','y':'bc','z':'cd', 'n':'bcd'}) self.assertEqual(m, {('x','x'):False, ('x','y'):False, ('x','n'):False, ('y','x'):False, ('y','y'):False, ('y','z'):False, ('y','n'):False, ('z','y'):False, ('z','z'):False, ('z','n'):False, ('n','x'):False, ('n','y'):False, ('n','z'):False, ('n','n'):False}) self.assertNotContains(m, ('x','z')) def test_init_all(self): """make_matches with everything should produce correct dict""" m = make_matches('ABC',('-','~'),{'X':'AB','Y':('B','C'),'N':list('ABC')}) exp = { ('-','-'):True, ('~','~'):True, ('-','~'):True, 
('~','-'):True, ('A','A'):True, ('B','B'):True, ('C','C'):True, ('A','X'):False, ('X','A'):False, ('B','X'):False, ('X','B'):False, ('B','Y'):False, ('Y','B'):False, ('C','Y'):False, ('Y','C'):False, ('A','N'):False, ('N','A'):False, ('B','N'):False, ('N','B'):False, ('C','N'):False, ('N','C'):False, ('X','X'):False, ('Y','Y'):False, ('N','N'):False, ('X','Y'):False, ('Y','X'):False, ('X','N'):False, ('N','X'):False, ('Y','N'):False, ('N','Y'):False, } self.assertEqual(m, exp) class make_pairs_tests(TestCase): """Tests of the top-level make_pairs factory function.""" def setUp(self): """Define some standard pairs and other data""" self.pairs = {('U','A'):True, ('A','U'):True, ('G','U'):False} def test_init_empty(self): """make_pairs should init ok with no parameters""" self.assertEqual(make_pairs(), {}) def test_init_pairs(self): """make_pairs with just pairs should equal the original""" self.assertEqual(make_pairs(self.pairs), self.pairs) self.assertNotSameObj(make_pairs(self.pairs), self.pairs) def test_init_monomers(self): """make_pairs with pairs and monomers should equal just the pairs""" self.assertEqual(make_pairs(self.pairs, 'ABCDEFG'), self.pairs) self.assertNotSameObj(make_pairs(self.pairs, 'ABCDEFG'), self.pairs) def test_init_gaps(self): """make_pairs should add all combinations of gaps as weak pairs""" p = make_pairs(self.pairs, None, '-~') self.assertNotEqual(p, self.pairs) self.pairs.update({('~','~'):False,('-','~'):False,('-','-'):False, ('~','-'):False}) self.assertEqual(p, self.pairs) def test_init_degen(self): """make_pairs should add in degenerate combinations as weak pairs""" p = make_pairs(self.pairs, 'AUG','-', {'R':'AG','Y':'CU','W':'AU'}) self.assertNotEqual(p, self.pairs) self.pairs.update({ ('-','-'):False, ('A','Y'):False, ('Y','A'):False, ('A','W'):False, ('W','A'):False, ('U','R'):False, ('R','U'):False, ('U','W'):False, ('W','U'):False, ('G','Y'):False, ('G','W'):False, ('R','Y'):False, ('R','W'):False, ('Y','R'):False, 
('Y','W'):False, ('W','R'):False, ('W','Y'):False, ('W','W'):False, }) self.assertEqual(p, self.pairs) class CoreObjectGroupTests(TestCase): """Tests of the CoreObjectGroup class.""" def test_init(self): """CoreObjectGroup should init with basic list of objects.""" class o(object): def __init__(self, s): self.s = s base = o('base') c = CoreObjectGroup(base) self.assertSameObj(c.Base, base) self.assertSameObj(c.Degen, None) self.assertSameObj(c.Base.Degen, None) base, degen, gap, degengap = map(o, ['base','degen','gap','degengap']) c = CoreObjectGroup(base, degen, gap, degengap) self.assertSameObj(c.Base, base) self.assertSameObj(c.Base.Degen, degen) self.assertSameObj(c.Degen.Gapped, degengap) class AlphabetGroupTests(TestCase): """Tests of the AlphabetGroup class.""" def test_init(self): """AlphabetGroup should initialize successfully""" chars = 'AB' degens = {'C':'AB'} g = AlphabetGroup(chars, degens) self.assertEqual(''.join(g.Base), 'AB') self.assertEqual(''.join(g.Degen), 'ABC') self.assertEqual(''.join(g.Gapped), 'AB-') self.assertEqual(''.join(g.DegenGapped), 'AB-C?') class MolTypeTests(TestCase): """Tests of the MolType class. 
Should support same API as old Alphabet.""" def test_init_minimal(self): """MolType should init OK with just monomers""" a = MolType('Abc') self.assertContains(a.Alphabet, 'A') self.assertNotContains(a.Alphabet, 'a') # case-sensitive self.assertContains(a.Alphabet, 'b') self.assertNotContains(a.Alphabet, 'B') self.assertNotContains(a.Alphabet, 'x') def test_init_everything(self): """MolType should init OK with all parameters set""" k = dict.fromkeys a = MolType(k('Abc'), Ambiguities={'d':'bc'}, Gaps=k('~'), \ Complements={'b':'c','c':'b'}, Pairs={}, add_lower=False) for i in 'Abcd~': self.assertContains(a, i) self.assertEqual(a.complement('b'), 'c') self.assertEqual(a.complement('AbcAA'), 'AcbAA') self.assertEqual(a.firstDegenerate('AbcdA'), 3) self.assertEqual(a.firstGap('a~c'), 1) self.assertEqual(a.firstInvalid('Abcx'), 3) def test_stripDegenerate(self): """MolType stripDegenerate should remove any degenerate bases""" s = RnaMolType.stripDegenerate self.assertEqual(s('UCAG-'), 'UCAG-') self.assertEqual(s('NRYSW'), '') self.assertEqual(s('USNG'), 'UG') def test_stripBad(self): """MolType stripBad should remove any non-base, non-gap chars""" s = RnaMolType.stripBad self.assertEqual(s('UCAGwsnyrHBND-D'), 'UCAGwsnyrHBND-D') self.assertEqual(s('@#^*($@!#&()!@QZX'), '') self.assertEqual(s('aaa ggg ---!ccc'), 'aaaggg---ccc') def test_stripBadAndGaps(self): """MolType stripBadAndGaps should remove gaps and bad chars""" s = RnaMolType.stripBadAndGaps self.assertEqual(s('UCAGwsnyrHBND-D'), 'UCAGwsnyrHBNDD') self.assertEqual(s('@#^*($@!#&()!@QZX'), '') self.assertEqual(s('aaa ggg ---!ccc'), 'aaagggccc') def test_complement(self): """MolType complement should correctly complement sequence""" self.assertEqual(RnaMolType.complement('UauCG-NR'), 'AuaGC-NY') self.assertEqual(DnaMolType.complement('TatCG-NR'), 'AtaGC-NY') self.assertEqual(RnaMolType.complement(''), '') self.assertRaises(TypeError, ProteinMolType.complement, 'ACD') #if it wasn't a string, result should be a list 
self.assertEqual(RnaMolType.complement(list('UauCG-NR')), list('AuaGC-NY')) self.assertEqual(RnaMolType.complement(('a','c')), ('u','g')) #constructor should fail for a dict self.assertRaises(ValueError, RnaMolType.complement, {'a':'c'}) def test_rc(self): """MolType rc should correctly reverse-complement sequence""" self.assertEqual(RnaMolType.rc('U'), 'A') self.assertEqual(RnaMolType.rc(''), '') self.assertEqual(RnaMolType.rc('R'), 'Y') self.assertEqual(RnaMolType.rc('UauCG-NR'), 'YN-CGauA') self.assertEqual(DnaMolType.rc('TatCG-NR'), 'YN-CGatA') self.assertRaises(TypeError, ProteinMolType.rc, 'ACD') #if it wasn't a string, result should be a list self.assertEqual(RnaMolType.rc(list('UauCG-NR')), list('YN-CGauA')) self.assertEqual(RnaMolType.rc(('a','c')), ('g','u')) #constructor should fail for a dict self.assertRaises(ValueError, RnaMolType.rc, {'a':'c'}) def test_contains(self): """MolType contains should return correct result""" for i in 'UCAGWSMKRYBDHVN-' + 'UCAGWSMKRYBDHVN-'.lower(): self.assertContains(RnaMolType, i) for i in 'x!@#$%^&ZzQq': self.assertNotContains(RnaMolType, i) a = MolType(dict.fromkeys('ABC'), add_lower=True) for i in 'abcABC': self.assertContains(a, i) self.assertNotContains(a, 'x') b = MolType(dict.fromkeys('ABC'), add_lower=False) for i in 'ABC': self.assertContains(b, i) for i in 'abc': self.assertNotContains(b, i) def test_iter(self): """MolType iter should iterate over monomer order""" self.assertEqual(list(RnaMolType), ['U','C','A','G', 'u','c','a','g']) a = MolType('ZXCV') self.assertEqual(list(a), ['Z','X','C','V']) def test_isGapped(self): """MolType isGapped should return True if gaps in seq""" g = RnaMolType.isGapped self.assertFalse(g('')) self.assertFalse(g('ACGUCAGUACGUCAGNRCGAUcaguaguacYRNRYRN')) self.assertTrue(g('-')) self.assertTrue(g('--')) self.assertTrue(g('CAGUCGUACGUCAGUACGUacucauacgac-caguACUG')) self.assertTrue(g('CA--CGUAUGCA-----g')) self.assertTrue(g('CAGU-')) def test_isGap(self): """MolType isGap should 
return True if char is a gap""" g = RnaMolType.isGap #True for the empty string self.assertFalse(g('')) #True for all the standard and degenerate symbols s = 'ACGUCAGUACGUCAGNRCGAUcaguaguacYRNRYRN' self.assertFalse(g(s)) for i in s: self.assertFalse(g(i)) #should be true for a single gap self.assertTrue(g('-')) #note that it _shouldn't_ be true for a run of gaps: use a.isGapped() self.assertFalse(g('--')) def test_isDegenerate(self): """MolType isDegenerate should return True if degen symbol in seq""" d = RnaMolType.isDegenerate self.assertFalse(d('')) self.assertFalse(d('UACGCUACAUGuacgucaguGCUAGCUA---ACGUCAG')) self.assertTrue(d('N')) self.assertTrue(d('R')) self.assertTrue(d('y')) self.assertTrue(d('GCAUguagcucgUCAGUCAGUACgUgcasCUAG')) self.assertTrue(d('ACGYAUGCUGYEWEWNFMNfuwbybcwuybcjwbeiwfub')) def test_isValid(self): """MolType isValid should return True if any unknown symbol in seq""" v = RnaMolType.isValid self.assertFalse(v(3)) self.assertFalse(v(None)) self.assertTrue(v('ACGUGCAUGUCAYCAYGUACGcaugacyugc----RYNCYRNC')) self.assertTrue(v('')) self.assertTrue(v('a')) self.assertFalse(v('ACIUBHFWUIXZKLNJUCIHBICNSOWMOINJ')) self.assertFalse(v('CAGUCAGUCACA---GACCAUG-_--cgau')) def test_isStrict(self): """MolType isStrict should return True if all symbols in Monomers""" s = RnaMolType.isStrict self.assertFalse(s(3)) self.assertFalse(s(None)) self.assertTrue(s('')) self.assertTrue(s('A')) self.assertTrue(s('UAGCACUgcaugcauGCAUGACuacguACAUG')) self.assertFalse(s('CAGUCGAUCA-cgaucagUCGAUGAC')) self.assertFalse(s('ACGUGCAUXCAGUCAG')) def test_firstGap(self): """MolType firstGap should return index of first gap symbol, or None""" g = RnaMolType.firstGap self.assertEqual(g(''), None) self.assertEqual(g('a'), None) self.assertEqual(g('uhacucHuhacUIUIhacan'), None) self.assertEqual(g('-abc'), 0) self.assertEqual(g('b-ac'), 1) self.assertEqual(g('abcd-'), 4) def test_firstDegenerate(self): """MolType firstDegenerate should return index of first degen symbol""" d = 
RnaMolType.firstDegenerate self.assertEqual(d(''), None) self.assertEqual(d('a'), None) self.assertEqual(d('UCGACA--CU-gacucaguacgua'), None) self.assertEqual(d('nCAGU'), 0) self.assertEqual(d('CUGguagvAUG'), 7) self.assertEqual(d('ACUGCUAacgud'), 11) def test_firstInvalid(self): """MolType firstInvalid should return index of first invalid symbol""" i = RnaMolType.firstInvalid self.assertEqual(i(''), None) self.assertEqual(i('A'), None) self.assertEqual(i('ACGUNVBuacg-wskmWSMKYRryNnN--'), None) self.assertEqual(i('x'), 0) self.assertEqual(i('rx'), 1) self.assertEqual(i('CAGUNacgunRYWSwx'), 15) def test_firstNonStrict(self): """MolType firstNonStrict should return index of first non-strict symbol""" s = RnaMolType.firstNonStrict self.assertEqual(s(''), None) self.assertEqual(s('A'), None) self.assertEqual(s('ACGUACGUcgaucagu'), None) self.assertEqual(s('N'), 0) self.assertEqual(s('-'), 0) self.assertEqual(s('x'), 0) self.assertEqual(s('ACGUcgAUGUGCAUcaguX'), 18) self.assertEqual(s('ACGUcgAUGUGCAUcaguX-38243829'), 18) def test_disambiguate(self): """MolType disambiguate should remove degenerate bases""" d = RnaMolType.disambiguate self.assertEqual(d(''), '') self.assertEqual(d('AGCUGAUGUA--CAGU'),'AGCUGAUGUA--CAGU') self.assertEqual(d('AUn-yrs-wkmCGwmrNMWRKY', 'strip'), 'AU--CG') self.assertEqual(d(tuple('AUn-yrs-wkmCGwmrNMWRKY'), 'strip'), \ tuple('AU--CG')) s = 'AUn-yrs-wkmCGwmrNMWRKY' t = d(s, 'random') u = d(s, 'random') for i, j in zip(s, t): if i in RnaMolType.Degenerates: self.assertContains(RnaMolType.Degenerates[i], j) else: self.assertEquals(i, j) self.assertNotEqual(t, u) self.assertEqual(d(tuple('UCAG'), 'random'), tuple('UCAG')) self.assertEqual(len(s), len(t)) self.assertSameObj(RnaMolType.firstDegenerate(t), None) #should raise exception on unknown disambiguation method self.assertRaises(NotImplementedError, d, s, 'xyz') def test_degap(self): """MolType degap should remove all gaps from sequence""" g = RnaMolType.degap self.assertEqual(g(''), '') 
self.assertEqual(g('GUCAGUCgcaugcnvuincdks'), 'GUCAGUCgcaugcnvuincdks') self.assertEqual(g('----------------'), '') self.assertEqual(g('gcuauacg-'), 'gcuauacg') self.assertEqual(g('-CUAGUCA'), 'CUAGUCA') self.assertEqual(g('---a---c---u----g---'), 'acug') self.assertEqual(g(tuple('---a---c---u----g---')), tuple('acug')) def test_gapList(self): """MolType gapList should return correct gap positions""" g = RnaMolType.gapList self.assertEqual(g(''), []) self.assertEqual(g('ACUGUCAGUACGHFSDKJCUICDNINS'), []) self.assertEqual(g('GUACGUIACAKJDC-SDFHJDSFK'), [14]) self.assertEqual(g('-DSHFUHDSF'), [0]) self.assertEqual(g('UACHASJAIDS-'), [11]) self.assertEqual(g('---CGAUgCAU---ACGHc---ACGUCAGU---'), \ [0,1,2,11,12,13,19,20,21,30,31,32]) a = MolType({'A':1}, Gaps=dict.fromkeys('!@#$%')) g = a.gapList self.assertEqual(g(''), []) self.assertEqual(g('!!!'), [0,1,2]) self.assertEqual(g('!@#$!@#$!@#$'), range(12)) self.assertEqual(g('cguua!cgcuagua@cguasguadc#'), [5,14,25]) def test_gapVector(self): """MolType gapVector should return correct gap positions""" g = RnaMolType.gapVector self.assertEqual(g(''), []) self.assertEqual(g('ACUGUCAGUACGHFSDKJCUICDNINS'), [False]*27) self.assertEqual(g('GUACGUIACAKJDC-SDFHJDSFK'), map(bool, map(int,'000000000000001000000000'))) self.assertEqual(g('-DSHFUHDSF'), map(bool, map(int,'1000000000'))) self.assertEqual(g('UACHASJAIDS-'), map(bool, map(int,'000000000001'))) self.assertEqual(g('---CGAUgCAU---ACGHc---ACGUCAGU---'), \ map(bool, map(int,'111000000001110000011100000000111'))) a = MolType({'A':1}, Gaps=dict.fromkeys('!@#$%')) g = a.gapVector self.assertEqual(g(''), []) self.assertEqual(g('!!!'), map(bool, [1,1,1])) self.assertEqual(g('!@#$!@#$!@#$'), [True] * 12) self.assertEqual(g('cguua!cgcuagua@cguasguadc#'), map(bool, map(int,'00000100000000100000000001'))) def test_gapMaps(self): """MolType gapMaps should return dicts mapping gapped/ungapped pos""" empty = '' no_gaps = 'aaa' all_gaps = '---' start_gaps = '--abc' end_gaps = 'ab---' 
mid_gaps = '--a--b-cd---' gm = RnaMolType.gapMaps self.assertEqual(gm(empty), ({},{})) self.assertEqual(gm(no_gaps), ({0:0,1:1,2:2}, {0:0,1:1,2:2})) self.assertEqual(gm(all_gaps), ({},{})) self.assertEqual(gm(start_gaps), ({0:2,1:3,2:4},{2:0,3:1,4:2})) self.assertEqual(gm(end_gaps), ({0:0,1:1},{0:0,1:1})) self.assertEqual(gm(mid_gaps), ({0:2,1:5,2:7,3:8},{2:0,5:1,7:2,8:3})) def test_countGaps(self): """MolType countGaps should return correct gap count""" c = RnaMolType.countGaps self.assertEqual(c(''), 0) self.assertEqual(c('ACUGUCAGUACGHFSDKJCUICDNINS'), 0) self.assertEqual(c('GUACGUIACAKJDC-SDFHJDSFK'), 1) self.assertEqual(c('-DSHFUHDSF'), 1) self.assertEqual(c('UACHASJAIDS-'), 1) self.assertEqual(c('---CGAUgCAU---ACGHc---ACGUCAGU---'), 12) a = MolType({'A':1}, Gaps=dict.fromkeys('!@#$%')) c = a.countGaps self.assertEqual(c(''), 0) self.assertEqual(c('!!!'), 3) self.assertEqual(c('!@#$!@#$!@#$'), 12) self.assertEqual(c('cguua!cgcuagua@cguasguadc#'), 3) def test_countDegenerate(self): """MolType countDegenerate should return correct degen base count""" d = RnaMolType.countDegenerate self.assertEqual(d(''), 0) self.assertEqual(d('GACUGCAUGCAUCGUACGUCAGUACCGA'), 0) self.assertEqual(d('N'), 1) self.assertEqual(d('NRY'), 3) self.assertEqual(d('ACGUAVCUAGCAUNUCAGUCAGyUACGUCAGS'), 4) def test_possibilities(self): """MolType possibilities should return correct # possible sequences""" p = RnaMolType.possibilities self.assertEqual(p(''), 1) self.assertEqual(p('ACGUgcaucagUCGuGCAU'), 1) self.assertEqual(p('N'), 4) self.assertEqual(p('R'), 2) self.assertEqual(p('H'), 3) self.assertEqual(p('nRh'), 24) self.assertEqual(p('AUGCnGUCAg-aurGauc--gauhcgauacgws'), 96) def test_MW(self): """MolType MW should return correct molecular weight""" r = RnaMolType.MW p = ProteinMolType.MW self.assertEqual(p(''), 0) self.assertEqual(r(''), 0) self.assertFloatEqual(p('A'), 107.09) self.assertFloatEqual(r('A'), 375.17) self.assertFloatEqual(p('AAA'), 285.27) self.assertFloatEqual(r('AAA'),
1001.59) self.assertFloatEqual(r('AAACCCA'), 2182.37) def test_canMatch(self): """MolType canMatch should return True if all positions can match""" m = RnaMolType.canMatch self.assertTrue(m('', '')) self.assertTrue(m('UCAG', 'UCAG')) self.assertFalse(m('UCAG', 'ucag')) self.assertTrue(m('UCAG', 'NNNN')) self.assertTrue(m('NNNN', 'UCAG')) self.assertTrue(m('NNNN', 'NNNN')) self.assertFalse(m('N', 'x')) self.assertFalse(m('N', '-')) self.assertTrue(m('UCAG', 'YYRR')) self.assertTrue(m('UCAG', 'KMWS')) def test_canMismatch(self): """MolType canMismatch should return True on any possible mismatch""" m = RnaMolType.canMismatch self.assertFalse(m('','')) self.assertTrue(m('N', 'N')) self.assertTrue(m('R', 'R')) self.assertTrue(m('N', 'r')) self.assertTrue(m('CGUACGCAN', 'CGUACGCAN')) self.assertTrue(m('U', 'C')) self.assertTrue(m('UUU', 'UUC')) self.assertTrue(m('UUU', 'UUY')) self.assertFalse(m('UUU', 'UUU')) self.assertFalse(m('UCAG', 'UCAG')) self.assertFalse(m('U--', 'U--')) def test_mustMatch(self): """MolType mustMatch should return True when no possible mismatches""" m = RnaMolType.mustMatch self.assertTrue(m('','')) self.assertFalse(m('N', 'N')) self.assertFalse(m('R', 'R')) self.assertFalse(m('N', 'r')) self.assertFalse(m('CGUACGCAN', 'CGUACGCAN')) self.assertFalse(m('U', 'C')) self.assertFalse(m('UUU', 'UUC')) self.assertFalse(m('UUU', 'UUY')) self.assertTrue(m('UU-', 'UU-')) self.assertTrue(m('UCAG', 'UCAG')) def test_canPair(self): """MolType canPair should return True if all positions can pair""" p = RnaMolType.canPair self.assertTrue(p('', '')) self.assertFalse(p('UCAG', 'UCAG')) self.assertTrue(p('UCAG', 'CUGA')) self.assertFalse(p('UCAG', 'cuga')) self.assertTrue(p('UCAG', 'NNNN')) self.assertTrue(p('NNNN', 'UCAG')) self.assertTrue(p('NNNN', 'NNNN')) self.assertFalse(p('N', 'x')) self.assertFalse(p('N', '-')) self.assertTrue(p('-', '-')) self.assertTrue(p('UCAGU', 'KYYRR')) self.assertTrue(p('UCAG', 'KKRS')) self.assertTrue(p('U', 'G')) d = 
DnaMolType.canPair self.assertFalse(d('T', 'G')) def test_canMispair(self): """MolType canMispair should return True on any possible mispair""" m = RnaMolType.canMispair self.assertFalse(m('','')) self.assertTrue(m('N', 'N')) self.assertTrue(m('R', 'Y')) self.assertTrue(m('N', 'r')) self.assertTrue(m('CGUACGCAN', 'NUHCHUACH')) self.assertTrue(m('U', 'C')) self.assertTrue(m('U', 'R')) self.assertTrue(m('UUU', 'AAR')) self.assertTrue(m('UUU', 'GAG')) self.assertFalse(m('UUU', 'AAA')) self.assertFalse(m('UCAG', 'CUGA')) self.assertTrue(m('U--', '--U')) d = DnaMolType.canPair self.assertTrue(d('TCCAAAGRYY', 'RRYCTTTGGA')) def test_mustPair(self): """MolType mustPair should return True when no possible mispairs""" m = RnaMolType.mustPair self.assertTrue(m('','')) self.assertFalse(m('N', 'N')) self.assertFalse(m('R', 'Y')) self.assertFalse(m('A', 'A')) self.assertFalse(m('CGUACGCAN', 'NUGCGUACG')) self.assertFalse(m('U', 'C')) self.assertFalse(m('UUU', 'AAR')) self.assertFalse(m('UUU', 'RAA')) self.assertFalse(m('UU-', '-AA')) self.assertTrue(m('UCAG', 'CUGA')) d = DnaMolType.mustPair self.assertTrue(d('TCCAGGG', 'CCCTGGA')) self.assertTrue(d('tccaggg', 'ccctgga')) self.assertFalse(d('TCCAGGG', 'NCCTGGA')) class RnaMolTypeTests(TestCase): """Spot-checks of alphabet functionality applied to RNA alphabet.""" def test_contains(self): """RnaMolType should __contain__ the expected symbols.""" keys = 'ucagrymkwsbhvdn?-' for k in keys: self.assertContains(RnaMolType, k) for k in keys.upper(): self.assertContains(RnaMolType, k) self.assertNotContains(RnaMolType, 'X') def test_degenerateFromSequence(self): """RnaMolType degenerateFromSequence should give correct results""" d = RnaMolType.degenerateFromSequence #check monomers self.assertEqual(d('a'), 'a') self.assertEqual(d('C'), 'C') #check seq of monomers self.assertEqual(d('aaaaa'), 'a') #check some 2- to 4-way cases self.assertEqual(d('aaaaag'), 'r') self.assertEqual(d('ccgcgcgcggcc'), 's') 
self.assertEqual(d('accgcgcgcggcc'), 'v') self.assertEqual(d('aaaaagcuuu'), 'n') #check some cases with gaps self.assertEqual(d('aa---aaagcuuu'), '?') self.assertEqual(d('aaaaaaaaaaaaaaa-'), '?') self.assertEqual(d('----------------'), '-') #check mixed case example self.assertEqual(d('AaAAaa'), 'A') #check example with degenerate symbols in set self.assertEqual(d('RS'), 'V') self.assertEqual(d('RN-'), '?') #check that it works for proteins as well p = ProteinMolType.degenerateFromSequence self.assertEqual(p('A'), 'A') self.assertEqual(p('AAA'), 'A') self.assertEqual(p('DN'), 'B') self.assertEqual(p('---'), '-') self.assertEqual(p('ACD'), 'X') self.assertEqual(p('ABD'), 'X') self.assertEqual(p('ACX'), 'X') self.assertEqual(p('AC-'), '?') class _AlphabetTestCase(TestCase): def assertEqualSeqs(self, a, b): """For when we don't care about the type, just the elements""" self.assertEqual(list(a), list(b)) def assertEqualSets(self, a, b): self.assertEqual(set(a), set(b)) class DNAAlphabet(_AlphabetTestCase): def setUp(self): self.alpha = DNA.Alphabet def test_exclude(self): """Nucleotide alphabet testing excluding gap motif""" self.assertEqualSeqs(self.alpha, ['T','C','A','G']) def test_include(self): """Nucleotide alphabet testing including gap motif""" self.assertEqualSets(self.alpha.withGapMotif(), ['A','C','G','T','-']) def test_usesubset(self): """testing using a subset of motifs.""" self.assertEqualSets(self.alpha.withGapMotif(), ['A','C','G','T','-']) alpha = self.alpha.getSubset(motif_subset = ['A']) self.assertEqualSets(alpha, ['A']) #self.assertRaises(AlphabetError, self.alpha.getSubset, ['A','C']) alpha = DNA.Alphabet self.assertEqualSets(alpha, ['T','C','A','G']) alpha = alpha.getSubset(motif_subset = ['A','T','G']) self.assertEqualSets(alpha, ['A','G','T']) class DinucAlphabet(_AlphabetTestCase): def setUp(self): self.alpha = DNA.Alphabet.withGapMotif().getWordAlphabet(2) def test_exclude(self): """Dinucleotide alphabet testing excluding gap motif""" 
expected = ['-A', '-C', '-G', '-T', 'A-', 'AA', 'AC', 'AG', 'AT', 'C-', 'CA', 'CC', 'CG', 'CT', 'G-', 'GA', 'GC', 'GG', 'GT', 'T-', 'TA', 'TC', 'TG', 'TT'] self.assertEqualSets( self.alpha.getSubset(['--'], excluded=True) , expected) def test_include(self): """Dinucleotide alphabet testing including gap motif""" expected = ['--', '-A', '-C', '-G', '-T', 'A-', 'AA', 'AC', 'AG', 'AT', 'C-', 'CA', 'CC', 'CG', 'CT', 'G-', 'GA', 'GC', 'GG', 'GT', 'T-', 'TA', 'TC', 'TG', 'TT'] self.assertEqualSets(self.alpha, expected) def test_usesubset(self): """testing using a subset of motifs.""" alpha = self.alpha.getSubset(motif_subset = ['AA', 'CA','GT']) self.assertEqualSeqs(alpha, ['AA', 'CA','GT']) self.assertRaises(AlphabetError, alpha.getSubset, motif_subset = ['AA','CA','GT', 'TT']) def test_usesubsetbyfreq(self): """testing using a subset of motifs by using motif probs.""" motif_freqs = {'--':0, '-A': 0.0, '-C': 0, '-G': 0, '-T': 0, 'A-': 0, 'AA': 1.0, 'AC': 0.0, 'AG': 0, 'AT': 0, 'C-': 0, 'CA': 1, 'CC': 0, 'CG': 0, 'CT': 0, 'G-': 0, 'GA': 0, 'GC': 0, 'GG': 0, 'GT': 1, 'T-': 0, 'TA': 0, 'TC': 0, 'TG': 0, 'TT': 0} alpha = self.alpha.getSubset(motif_freqs) self.assertEqualSets(alpha, ['AA', 'CA', 'GT']) class CodonAlphabet(_AlphabetTestCase): def setUp(self): self.alpha = STANDARD_CODON def test_ambiguous_gaps(self): alpha = self.alpha.withGapMotif() self.assertEqual(len(alpha.resolveAmbiguity('AT?')), 4) self.assertRaises(Exception, alpha.resolveAmbiguity, 'at-') self.assertEqual(len(alpha.resolveAmbiguity('???')), 62) self.assertEqual(len(alpha.resolveAmbiguity('---')), 1) alpha = self.alpha self.assertEqual(len(alpha.resolveAmbiguity('AT?')), 4) self.assertRaises(Exception, alpha.resolveAmbiguity, 'at-') self.assertEqual(len(alpha.resolveAmbiguity('???')), 61) self.assertRaises(Exception, alpha.resolveAmbiguity, '---') def test_exclude(self): """testing excluding gap motif""" alpha = self.alpha expected = ['AAA', 'AAC', 'AAG', 'AAT', 'ACA', 'ACC', 'ACG', 'ACT', 'AGA', 
'AGC', 'AGG', 'AGT', 'ATA', 'ATC', 'ATG', 'ATT', 'CAA', 'CAC', 'CAG',
            'CAT', 'CCA', 'CCC', 'CCG', 'CCT', 'CGA', 'CGC', 'CGG', 'CGT',
            'CTA', 'CTC', 'CTG', 'CTT', 'GAA', 'GAC', 'GAG', 'GAT', 'GCA',
            'GCC', 'GCG', 'GCT', 'GGA', 'GGC', 'GGG', 'GGT', 'GTA', 'GTC',
            'GTG', 'GTT', 'TAC', 'TAT', 'TCA', 'TCC', 'TCG', 'TCT', 'TGC',
            'TGG', 'TGT', 'TTA', 'TTC', 'TTG', 'TTT']
        self.assertEqualSets(alpha, expected)

    def test_include(self):
        """testing including gap motif"""
        alpha = self.alpha.withGapMotif()
        expected = ['---', 'AAA', 'AAC', 'AAG', 'AAT', 'ACA', 'ACC', 'ACG',
            'ACT', 'AGA', 'AGC', 'AGG', 'AGT', 'ATA', 'ATC', 'ATG', 'ATT',
            'CAA', 'CAC', 'CAG', 'CAT', 'CCA', 'CCC', 'CCG', 'CCT', 'CGA',
            'CGC', 'CGG', 'CGT', 'CTA', 'CTC', 'CTG', 'CTT', 'GAA', 'GAC',
            'GAG', 'GAT', 'GCA', 'GCC', 'GCG', 'GCT', 'GGA', 'GGC', 'GGG',
            'GGT', 'GTA', 'GTC', 'GTG', 'GTT', 'TAC', 'TAT', 'TCA', 'TCC',
            'TCG', 'TCT', 'TGC', 'TGG', 'TGT', 'TTA', 'TTC', 'TTG', 'TTT']
        self.assertEqualSets(alpha, expected)

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_core/test_profile.py

#!/usr/bin/env python
"""Provides tests for classes and functions in profile.py
"""
from __future__ import division
from string import translate
from numpy import array, sum, sqrt, transpose, add, subtract, multiply,\
    divide, zeros
from numpy.random import random
from cogent.util.unit_test import TestCase, main#, numpy_err
from cogent.core.moltype import DNA
from cogent.core.sequence import ModelSequence
from cogent.core.profile import Profile, ProfileError, CharMeaningProfile
from cogent.core.alignment import DenseAlignment as Alignment

__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Gavin Huttley", "Rob Knight", "Peter Maxwell"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"

class
ProfileTests(TestCase):
    """Tests for Profile object"""

    def setUp(self):
        """setUp method for all Profile tests"""
        self.full = Profile(array([[2,4],[3,5],[4,8]]),"AB")
        self.empty = Profile(array([[]]),"AB")
        self.empty_row = Profile(array([[1,1],[0,0]]), "AB")
        self.empty_col = Profile(array([[0,1],[0,1]]), "AB")
        self.consensus = Profile(array([[.2,0,.8,0],[0,.1,.2,.7],[0,0,0,1],\
            [.2,.3,.4,.1],[.5,.5,0,0]]),\
            Alphabet=DNA, CharOrder="TCAG")
        self.not_same_value = Profile(array([[.3,.5,.1,.1],[.4,.6,0,.7],\
            [.3,.2,0,0],[0,0,4,0]]),Alphabet=DNA, CharOrder="TCAG")
        self.zero_entry = Profile(array([[.3,.2,0,.5],[0,0,.8,.2]]),\
            Alphabet="UCAG")
        self.score1 = Profile(Data=array([[-1,0,1,2],[-2,2,0,0],[-3,5,1,0]]),\
            Alphabet=DNA, CharOrder="ATGC")
        self.score2 = Profile(array([[.2,.4,.4,0],[.1,0,.9,0],[.1,.2,.3,.4]]),\
            Alphabet="TCAG")
        self.oned = Profile(array([.25,.25,.25,.25]),"ABCD")
        self.pp = Profile(array([[1,2,3,4],[5,6,7,8],[9,10,11,12]]),"ABCD")

    def test_init(self):
        """__init__: should set all attributes correctly"""
        self.assertRaises(TypeError, Profile)
        self.assertRaises(TypeError, Profile, array([[2,3]]))
        #only alphabet
        p = Profile(array([[.2,.8],[.7,.3]]),"AB")
        self.assertEqual(p.Data, [[.2,.8],[.7,.3]])
        self.assertEqual(p.Alphabet, "AB")
        self.assertEqual(p.CharOrder, list("AB"))
        self.assertEqual(translate("ABBA",p._translation_table),
            "\x00\x01\x01\x00")
        #alphabet and char order
        p = Profile(array([[.1,.2],[.4,.3]]),Alphabet=DNA, CharOrder="AG")
        self.assertEqual(p.CharOrder,"AG")
        assert p.Alphabet is DNA
        #non-character alphabet
        p = Profile(array([[.1,.2],[.4,.3]]),Alphabet=[7,3], CharOrder=[3,7])
        self.assertEqual(p.CharOrder,[3,7])
        self.assertEqual(p.Alphabet, [7,3])
        self.assertEqual(p.Data, [[.1,.2],[.4,.3]])

    def test_str(self):
        """__str__: should return string representation of data in profile
        """
        self.assertEqual(str(self.empty_row),str(array([[1,1],[0,0]])))

    def test_make_translation_table(self):
        """_make_translation_table: should return correct table from char
order """ p = Profile(array([[.2,.8],[.7,.3]]),"ABCDE","AB") self.assertEqual(translate("ABBA",p._translation_table), "\x00\x01\x01\x00") def test_hasValidData(self): """hasValidData: should work on full and empty profiles""" full = self.full.copy() full.normalizePositions() self.assertEqual(full.hasValidData(),True) self.assertEqual(self.empty_row.hasValidData(),False) self.assertEqual(self.empty.hasValidData(),False) def test_hasValidAttributes(self): """hasValidAttributes: should work for different alphabets/char orders """ p = Profile(array([[1,2],[3,4]]),Alphabet="ABCD", CharOrder="BAC") #self.Data doesn't match len(CharOrder) self.assertEqual(p.hasValidAttributes(),False) p = Profile(array([[1,2],[3,4]]),Alphabet="ABCD", CharOrder="AX") #not all chars in CharOrder in Alphabet self.assertEqual(p.hasValidAttributes(),False) p = Profile(array([[1,2],[3,4]]),Alphabet="ABCD", CharOrder="CB") #should be fine self.assertEqual(p.hasValidAttributes(),True) def test_isValid(self): """isValid: should work as expected""" #everything valid p1 = Profile(array([[.3,.7],[.8,.2]]),Alphabet="AB",CharOrder="AB") #invalid data, valid attributes p2 = Profile(array([[1,2],[3,4]]),Alphabet="ABCD", CharOrder="BA") #invalid attributes, valid data p3 = Profile(array([[.3,.7],[.8,.2]]),Alphabet="ABCD",CharOrder="AF") self.assertEqual(p1.isValid(),True) self.assertEqual(p2.isValid(),False) self.assertEqual(p3.isValid(),False) def test_dataAt(self): """dataAt: should work on valid position and character""" p = Profile(array([[.2,.4,.4,0],[.1,0,.9,0],[.1,.2,.3,.4]]),\ Alphabet="TCAG") self.assertEqual(p.dataAt(0,'C'),.4) self.assertEqual(p.dataAt(1,'T'),.1) self.assertRaises(ProfileError, p.dataAt, 1, 'U') self.assertRaises(ProfileError, p.dataAt, -2, 'T') self.assertRaises(ProfileError, p.dataAt, 5, 'T') def test_copy(self): """copy: should act as expected while rebinding/modifying attributes """ p = Profile(array([[1,1],[.7,.3]]),{'A':'A','G':'G','R':'AG'},"AG") p_copy = p.copy() assert 
p.Data is p_copy.Data assert p.Alphabet is p_copy.Alphabet assert p.CharOrder is p_copy.CharOrder #modifying p.Data modifies p_copy.Data p.Data[1,1] = 100 assert p.Alphabet is p_copy.Alphabet #normalizing p.Data rebinds it, so p_copy.Data is unchanged p.normalizePositions() assert not p.Data is p_copy.Data #Adding something to the alphabet changes both p and p_copy p.Alphabet['Y']='TC' assert p.Alphabet is p_copy.Alphabet #Rebinding the CharOrder does only change the original p.CharOrder='XX' assert not p.CharOrder is p_copy.CharOrder def test_normalizePositions(self): """normalizePositions: should normalize or raise appropriate error """ p = self.full.copy() p.normalizePositions() self.assertEqual(p.Data,array([[2/6,4/6],[3/8,5/8],[4/12,8/12]])) self.assertEqual(sum(p.Data,1),[1,1,1]) p = self.empty_col.copy() p.normalizePositions() self.assertEqual(p.Data,array([[0,1],[0,1]])) p = self.empty_row.copy() self.assertRaises(ProfileError,p.normalizePositions) p = Profile(array([[0.0,0.0]]),"AB") self.assertRaises(ProfileError,p.normalizePositions) #negative numbers!!!!!! p1 = Profile(array([[3,-2],[4,-3]]),"AB") p1.normalizePositions() self.assertEqual(p1.Data,array([[3,-2],[4,-3]])) p2 = Profile(array([[3,-3],[4,-3]]),"AB") self.assertRaises(ProfileError,p2.normalizePositions) def test_normalizeSequences(self): """normalizeSequences: should normalize or raise appropriate error """ p = self.full.copy() p.normalizeSequences() self.assertEqual(p.Data,array([[2/9,4/17],[3/9,5/17],[4/9,8/17]])) self.assertEqual(sum(p.Data, axis=0),[1,1]) p = self.empty_row.copy() p.normalizeSequences() self.assertEqual(p.Data,array([[1,1],[0,0]])) p = self.empty_col.copy() self.assertRaises(ProfileError,p.normalizeSequences) p = Profile(array([[0.0],[0.0]]),"AB") self.assertRaises(ProfileError,p.normalizeSequences) #negative numbers!!!!!! 
p1 = Profile(array([[3,4],[-2,-3]]),"AB")
        p1.normalizeSequences()
        self.assertEqual(p1.Data,array([[3,4],[-2,-3]]))
        p2 = Profile(array([[3,4],[-3,-3]]),"AB")
        self.assertRaises(ProfileError,p2.normalizeSequences)

    def test_prettyPrint_without_parameters(self):
        """prettyPrint: should work without parameters passed in"""
        p = self.full
        self.assertEqual(p.prettyPrint(),"2\t4\n3\t5\n4\t8")
        self.assertEqual(p.prettyPrint(include_header=True),\
            "A\tB\n2\t4\n3\t5\n4\t8")
        self.assertEqual(p.prettyPrint(transpose_data=True),\
            "2\t3\t4\n4\t5\t8")
        self.assertEqual(p.prettyPrint(include_header=True,\
            transpose_data=True),"A\t2\t3\t4\nB\t4\t5\t8")
        #empty
        self.assertEqual(self.empty.prettyPrint(),"")
        self.assertEqual(self.empty.prettyPrint(transpose_data=True),"")
        #it will still print with invalid data (e.g. if len(CharOrder)
        #doesn't match the data)
        p = self.full.copy()
        p.CharOrder="ABC"
        self.assertEqual(p.prettyPrint(include_header=True),\
            "A\tB\tC\n2\t4\t \n3\t5\t \n4\t8\t ")
        #it will truncate the CharOrder if data is transposed
        #and CharOrder is longer than the number of rows in the
        #transposed data
        self.assertEqual(p.prettyPrint(include_header=True,\
            transpose_data=True),"A\t2\t3\t4\nB\t4\t5\t8")

    def test_prettyPrint_four_cases(self):
        """prettyPrint: with/without header/transpose/limit"""
        p = self.full
        p = self.pp
        self.assertEqual(p.prettyPrint(),\
            "1\t 2\t 3\t 4\n5\t 6\t 7\t 8\n9\t10\t11\t12")
        self.assertEqual(p.prettyPrint(column_limit=3),\
            "1\t 2\t 3\n5\t 6\t 7\n9\t10\t11")
        self.assertEqual(p.prettyPrint(column_limit=3, include_header=True),\
            "A\t B\t C\n1\t 2\t 3\n5\t 6\t 7\n9\t10\t11")
        self.assertEqual(p.prettyPrint(column_limit=3, include_header=False,\
            transpose_data=True),\
            "1\t5\t 9\n2\t6\t10\n3\t7\t11\n4\t8\t12")
        self.assertEqual(p.prettyPrint(column_limit=2, include_header=False,\
            transpose_data=True),\
            "1\t5\n2\t6\n3\t7\n4\t8")
        self.assertEqual(p.prettyPrint(column_limit=3, include_header=True,\
            transpose_data=True),\
            "A\t1\t5\nB\t2\t6\nC\t3\t7\nD\t4\t8")

    def
test_reduce_wrong_size(self): """reduce: should fail when profiles have different sizes""" p1 = Profile(array([[1,0],[0,1]]),Alphabet="AB") p2 = Profile(array([[1,0,0],[1,0,0]]),Alphabet="ABC") self.assertRaises(ProfileError,p1.reduce,p2) def test_reduce_normalization_error(self): """reduce: fails when input or output can't be normalized""" #Will raise errors when input data can't be normalized self.assertRaises(ProfileError,self.empty.reduce,self.empty,add) self.assertRaises(ProfileError,self.full.reduce,self.empty_row,add) #don't normalize input, but do normalize output #fails when one row adds up to zero p1 = Profile(array([[3,3],[4,4]]),"AB") p2 = Profile(array([[3,3],[-4,-4]]),"AB") self.assertRaises(ProfileError,p1.reduce,p2,add,False,True) def test_reduce_operators(self): """reduce: should work fine with different operators """ #different operators, normalize input, don't normalize output p1 = Profile(array([[1,0,0],[0,1,0]]),Alphabet="ABC") p2 = Profile(array([[1,0,0],[0,0,1]]),Alphabet="ABC") self.assertEqual(p1.reduce(p2).Data,array([[1,0,0],[0,.5,.5]])) self.assertEqual(p1.reduce(p2,add,normalize_input=True,\ normalize_output=False).Data,array([[2,0,0],[0,1,1]])) self.assertEqual(p1.reduce(p2,subtract,normalize_input=True,\ normalize_output=False).Data,array([[0,0,0],[0,1,-1]])) self.assertEqual(p1.reduce(p2,multiply,normalize_input=True,\ normalize_output=False).Data,array([[1,0,0],[0,0,0]])) self.assertRaises(ProfileError,p1.reduce,p2,divide,\ normalize_input=True,normalize_output=False) #don't normalize and normalize only input p3 = Profile(array([[1,2],[3,4]]),Alphabet="AB") p4 = Profile(array([[4,3],[2,1]]),Alphabet="AB") self.assertEqual(p3.reduce(p4,add,normalize_input=False,\ normalize_output=False).Data,array([[5,5],[5,5]])) self.assertFloatEqual(p3.reduce(p4,add,normalize_input=True,\ normalize_output=False).Data,array([[19/21,23/21],[23/21,19/21]])) #normalize input and output p5 = Profile(array([[1,1,0,0],[1,1,1,1]]),Alphabet="ABCD") p6 = 
Profile(array([[1,0,0,0],[1,0,0,1]]),Alphabet="ABCD") self.assertEqual(p5.reduce(p6,add,normalize_input=True,\ normalize_output=True).Data,array([[.75,.25,0,0],\ [.375,.125,.125,.375]])) #it can collapse empty profiles when normalizing is turned off self.assertEqual(self.empty.reduce(self.empty,\ normalize_input=False,normalize_output=False).Data.tolist(),[[]]) #more specific tests of the operators will be in the #separate functions def test__add_(self): """__add__: should not normalize input or output, just add""" p1 = Profile(array([[.3,.4,.1,0],[.1,.1,.1,.7]]),Alphabet="ABCD") p2 = Profile(array([[1,0,0,0],[1,0,0,1]]),Alphabet="ABCD") self.assertEqual((p1+p2).Data, array([[1.3,.4,.1,0],[1.1,.1,.1,1.7]])) self.assertRaises(ProfileError,self.empty.__add__, p1) self.assertEqual((self.empty + self.empty).Data.tolist(),[[]]) def test__sub_(self): """__sub__: should subtract two profiles, no normalization""" p1 = Profile(array([[.3,.4,.1,0],[.1,.1,.1,.7]]),Alphabet="ABCD") p2 = Profile(array([[1,0,0,0],[1,0,0,1]]),Alphabet="ABCD") self.assertFloatEqual((p1-p2).Data, array([[-.7,.4,.1,0],\ [-.9,.1,.1,-.3]])) def test__mul_(self): """__mul__: should multiply two profiles, no normalization""" p1 = Profile(array([[1,-2,3,0],[1,1,1,.5]]),Alphabet="ABCD") p2 = Profile(array([[1,0,0,0],[1,0,3,2]]),Alphabet="ABCD") self.assertEqual((p1*p2).Data, array([[1,0,0,0],\ [1,0,3,1]])) def test__div_(self): """__div__ and __truediv__: always true division b/c __future__.division """ p1 = Profile(array([[2,3],[4,5]]),"AB") p2 = Profile(array([[1,0],[4,5]]),"AB") #Int 0 p3 = Profile(array([[1,0.0],[4,5]]),"AB") #Float 0.0 p4 = Profile(array([[1,2],[8.0,5]]),"AB") #Float 0.0 self.assertRaises(ProfileError, p1.__truediv__,p2) #infinity in result data self.assertRaises(ProfileError, p1.__div__, p3) self.assertFloatEqual((p1.__div__(p4)).Data, array([[2,1.5],[0.5,1]])) def test_distance(self): """distance: should return correct distance between the profiles """ p1 = 
Profile(array([[2,4],[3,1]]), "AB") p2 = Profile(array([[4,6],[5,3]]), "AB") p3 = Profile(array([[4,6],[5,3],[1,1]]), "AB") p4 = Profile(array([2,2]),"AB") p5 = Profile(array([2,2,2]),"AB") p6 = Profile(array([[]]),"AB") self.assertEqual(p1.distance(p2),4) self.assertEqual(p2.distance(p1),4) self.assertEqual(p1.distance(p4),sqrt(6)) self.assertEqual(p6.distance(p6),0) #Raises error when frames are not aligned self.assertRaises(ProfileError, p1.distance,p3) self.assertRaises(ProfileError,p1.distance,p5) def test_toOddsMatrix(self): """toOddsMatrix: should work on valid data or raise an error """ p = Profile(array([[.1,.3,.5,.1],[.25,.25,.25,.25],\ [.05,.8,.05,.1],[.7,.1,.1,.1],[.6,.15,.05,.2]]),\ Alphabet="ACTG") p_exp = Profile(array([[.4, 1.2, 2, .4],[1,1,1,1],[.2,3.2,.2,.4],\ [2.8,.4,.4,.4],[2.4,.6,.2,.8]]),Alphabet="ACTG") self.assertEqual(p.toOddsMatrix().Data,p_exp.Data) assert p.Alphabet is p.toOddsMatrix().Alphabet self.assertEqual(p.toOddsMatrix([.25,.25,.25,.25]).Data,p_exp.Data) #fails if symbol_freqs has wrong size self.assertRaises(ProfileError, p.toOddsMatrix,\ [.25,.25,.25,.25,.25,.25]) self.assertRaises(ProfileError, self.zero_entry.toOddsMatrix,\ [.1,.2,.3]) #works on empty profile self.assertEqual(self.empty.toOddsMatrix().Data.tolist(),[[]]) #works with different input self.assertEqual(self.zero_entry.toOddsMatrix().Data,\ array([[1.2,.8,0,2],[0,0,3.2,.8]])) self.assertFloatEqual(self.zero_entry.toOddsMatrix([.1,.2,.3,.4]).Data,\ array([[3,1,0,1.25],[0,0,2.667,.5]]),1e-3) #fails when one of the background frequencies is 0 self.assertRaises(ProfileError, self.zero_entry.toOddsMatrix,\ [.1,.2,.3,0]) def test_toLogOddsMatrix(self): """toLogOddsMatrix: should work as expected""" #This test can be short, because it mainly depends on toOddsMatrix #for which everything has been tested p = Profile(array([[.1,.3,.5,.1],[.25,.25,.25,.25],\ [.05,.8,.05,.1],[.7,.1,.1,.1],[.6,.15,.05,.2]]),\ Alphabet="ACTG") p_exp = Profile(array(\ [[-1.322, 0.263, 1., 
-1.322],\ [ 0., 0., 0., 0.],\ [-2.322, 1.678, -2.322, -1.322],\ [ 1.485, -1.322, -1.322, -1.322],\ [ 1.263, -0.737, -2.322, -0.322]]),\ Alphabet="ACTG") self.assertFloatEqual(p.toLogOddsMatrix().Data,p_exp.Data,eps=1e-3) #works on empty matrix self.assertEqual(self.empty.toLogOddsMatrix().Data.tolist(),[[]]) def test__score_indices(self): """_score_indices: should work on valid input""" self.assertEqual(self.score1._score_indices(array([0,1,1,3,0,3]),\ offset=0),[6,2,-3,0]) self.assertFloatEqual(self.score2._score_indices(\ array([3,1,2,0,2,2,3]), offset=0),[.3,1.4,.8,1.4,1.7]) self.assertFloatEqual(self.score2._score_indices(\ array([3,1,2,0,2,2,3]), offset=3),[1.4,1.7]) #Errors will be raised on invalid input. Errors are not handled #in this method. Validation of the input is done elsewhere self.assertRaises(IndexError,self.score2._score_indices,\ array([3,1,63,0,4,2,3]), offset=3) def test__score_profile(self): """_score_profile: should work on valid input""" p1 = Profile(array([[1,0,0,0],[0,1,0,0],[0,0,.5,.5],[0,0,0,1],\ [.25,.25,.25,.25]]),"TCAG") p2 = Profile(array([[0,1,0,0],[.2,0,.8,0],[0,0,.5,.5],[1/3,1/3,0,1/3],\ [.25,.25,.25,.25]]),"TCAG") self.assertFloatEqual(self.score2._score_profile(p1,offset=0),\ [.55,1.25,.45]) self.assertFloatEqual(self.score2._score_profile(p1,offset=2),\ [.45]) self.assertFloatEqual(self.score2._score_profile(p2,offset=0),\ [1.49,1.043,.483],1e-3) #Errors will be raised on invalid input. Errors are not handled #in this method. 
#Validation of the input is done elsewhere
        #In this case you don't get an error, but certainly an unexpected
        #result
        self.assertFloatEqual(self.score2._score_profile(p1,offset=3).tolist(),\
            [])

    def test_score_sequence(self):
        """score: should work correctly for Sequence as input
        """
        #works on normal valid data
        s1 = self.score1.score("ATTCAC",offset=0)
        self.assertEqual(s1,\
            [6,2,-3,0])
        self.assertFloatEqual(self.score2.score("TCAAGT",offset=0),
            [.5,1.6,1.7,0.5])
        #works with different offset
        self.assertFloatEqual(self.score2.score("TCAAGT",offset=2),
            [1.7,0.5])
        self.assertFloatEqual(self.score2.score("TCAAGT",offset=3),
            [0.5])
        #raises error on invalid offset
        self.assertRaises(ProfileError,self.score2.score,\
            "TCAAGT",offset=4)
        #works on seq of minimal length
        self.assertFloatEqual(self.score2.score("AGT",offset=0),
            [0.5])
        #raises error when sequence is too short
        self.assertRaises(ProfileError, self.score2.score,"",offset=0)
        #raises error on empty profile
        self.assertRaises(ProfileError,self.empty.score,"ACGT")
        #raises error when sequence contains characters that
        #are not in the character order
        self.assertRaises(ProfileError,self.score2.score,"ACBRT")

    def test_score_sequence_object(self):
        """score: should work correctly on Sequence object as input
        """
        # DnaSequence object
        ds = self.score1.score(DNA.Sequence("ATTCAC"),offset=0)
        self.assertEqual(ds, [6,2,-3,0])
        # ModelSequence object
        ms = self.score1.score(ModelSequence("ATTCAC", Alphabet=DNA.Alphabet),\
            offset=0)
        self.assertEqual(ms, [6,2,-3,0])

    def test_score_no_trans_table(self):
        """score: should work when no translation table is present
        """
        p = Profile(Data=array([[-1,0,1,2],[-2,2,0,0],[-3,5,1,0]]),\
            Alphabet=DNA, CharOrder="ATGC")
        # remove translation table
        del p.__dict__['_translation_table']
        # then score the profile
        s1 = p.score(DNA.Sequence("ATTCAC"),offset=0)
        self.assertEqual(s1, [6,2,-3,0])

    def test_score_profile(self):
        """score: should work correctly for Profile as input
        """
        p1 =
Profile(array([[1,0,0,0],[0,1,0,0],[0,0,.5,.5],[0,0,0,1],\ [.25,.25,.25,.25]]),"TCAG") p2 = Profile(array([[0,1,0,0],[.2,0,.8,0],[0,0,.5,.5],[1/3,1/3,0,1/3],\ [.25,.25,.25,.25]]),"TCAG") p3 = Profile(array([[1,0,0,0],[0,1,0,0],[0,0,0,1]]),"TCAG") p4 = Profile(array([[1,0,0,0],[0,1,0,0]]),"TCAG") p5 = Profile(array([[1,0,0,0],[0,1,0,0],[0,0,0,1]]),"AGTC") #works on normal valid data self.assertFloatEqual(self.score2.score(p1,offset=0),\ [.55,1.25,.45]) self.assertFloatEqual(self.score2.score(p2,offset=0), [1.49,1.043,.483],1e-3) #works with different offset self.assertFloatEqual(self.score2.score(p1,offset=1), [1.25,0.45]) self.assertFloatEqual(self.score2.score(p1,offset=2), [0.45]) #raises error on invalid offset self.assertRaises(ProfileError,self.score2.score,\ p1,offset=3) #works on profile of minimal length self.assertFloatEqual(self.score2.score(p3,offset=0), [0.6]) #raises error when profile is too short self.assertRaises(ProfileError, self.score2.score,p4,offset=0) #raises error on empty profile self.assertRaises(ProfileError,self.empty.score,p1) #raises error when character order doesn't match self.assertRaises(ProfileError,self.score2.score,p5) def test_rowUncertainty(self): """rowUncertainty: should handle full and empty profiles """ p = Profile(array([[.25,.25,.25,.25],[.5,.5,0,0]]),"ABCD") self.assertEqual(p.rowUncertainty(),[2,1]) #for empty rows 0 is returned as the uncertainty self.assertEqual(self.empty.rowUncertainty().tolist(),[]) p = Profile(array([[],[],[]]),"") self.assertEqual(p.rowUncertainty().tolist(),[]) #doesn't work on 1D array self.assertRaises(ProfileError,self.oned.rowUncertainty) def test_columnUncertainty(self): """columnUncertainty: should handle full and empty profiles """ p = Profile(array([[.25,.5],[.25,.5],[.25,0],[.25,0]]),"AB") self.assertEqual(p.columnUncertainty(),[2,1]) #for empty cols nothing is returned as the uncertainty self.assertEqual(self.empty.columnUncertainty().tolist(),[]) p = Profile(array([[],[],[]]),"") 
self.assertEqual(p.columnUncertainty().tolist(),[])
        #doesn't work on 1D array
        self.assertRaises(ProfileError,self.oned.columnUncertainty)

    def test_rowDegeneracy(self):
        """rowDegeneracy: should work as expected"""
        p1 = self.consensus
        p2 = self.not_same_value
        self.assertEqual(p1.rowDegeneracy(),[1,1,1,2,1])
        self.assertEqual(p1.rowDegeneracy(cutoff=.5),[1,1,1,2,1])
        self.assertEqual(p1.rowDegeneracy(cutoff=.75),[1,2,1,3,2])
        #when a row seems to add up to the cutoff value, it's not
        #always found because of floating point error. E.g. second row
        #in this example
        self.assertEqual(p1.rowDegeneracy(cutoff=1),[2,4,1,4,2])
        #when the cutoff can't be found, the number of columns in the
        #profile is returned (for each row)
        self.assertEqual(p1.rowDegeneracy(cutoff=1.5),[4,4,4,4,4])
        self.assertEqual(p2.rowDegeneracy(cutoff=.95),[4,2,4,1])
        self.assertEqual(p2.rowDegeneracy(cutoff=1.4),[4,3,4,1])
        self.assertEqual(self.empty.rowDegeneracy(),[])

    def test_columnDegeneracy(self):
        """columnDegeneracy: should work as expected"""
        p1 = self.consensus
        p1.Data = transpose(p1.Data)
        p2 = self.not_same_value
        p2.Data = transpose(p2.Data)
        p1d = p1.columnDegeneracy()
        self.assertEqual(p1d,[1,1,1,2,1])
        self.assertEqual(p1.columnDegeneracy(cutoff=.5),[1,1,1,2,1])
        self.assertEqual(p1.columnDegeneracy(cutoff=.75),[1,2,1,3,2])
        #when a column seems to add up to the cutoff value, it's not
        #always found because of floating point error. E.g. second column
        #in this example
        self.assertEqual(p1.columnDegeneracy(cutoff=1),[2,4,1,4,2])
        #when the cutoff can't be found, the number of rows in the
        #profile is returned (for each column)
        self.assertEqual(p1.columnDegeneracy(cutoff=1.5),[4,4,4,4,4])
        self.assertEqual(p2.columnDegeneracy(cutoff=.95),[4,2,4,1])
        self.assertEqual(p2.columnDegeneracy(cutoff=1.4),[4,3,4,1])
        self.assertEqual(self.empty.columnDegeneracy(),[])

    def test_rowMax(self):
        """rowMax should return max value in each row"""
        p1 = self.consensus
        obs = p1.rowMax()
        self.assertEqual(obs, array([.8, .7, 1, .4, .5]))

    def test_toConsensus(self):
        """toConsensus: should work with all the different options
        """
        p = self.consensus
        self.assertEqual(p.toConsensus(fully_degenerate=False),"AGGAT")
        self.assertEqual(p.toConsensus(fully_degenerate=True),"WVGNY")
        self.assertEqual(p.toConsensus(cutoff=0.75),"ARGHY")
        self.assertEqual(p.toConsensus(cutoff=0.95),"WVGNY")
        self.assertEqual(p.toConsensus(cutoff=2),"WVGNY")
        p = self.not_same_value
        self.assertEqual(p.toConsensus(fully_degenerate=False),"CGTA")
        self.assertEqual(p.toConsensus(fully_degenerate=True),"NBYA")
        self.assertEqual(p.toConsensus(cutoff=0.75),"YSYA")
        self.assertEqual(p.toConsensus(cutoff=2),"NBYA")
        self.assertEqual(p.toConsensus(cutoff=5),"NBYA")
        #when you specify both fully_degenerate and a cutoff value
        #the cutoff takes priority and is used in the calculation
        self.assertEqual(p.toConsensus(cutoff=0.75,fully_degenerate=True),\
            "YSYA")
        #raises AttributeError when Alphabet doesn't have Degenerates
        p = Profile(array([[.2,.8],[.7,.3]]),"AB")
        self.assertRaises(AttributeError,p.toConsensus,cutoff=.5)

    def test_toConsensus_include_all(self):
        """toConsensus: Should include all possibilities when include_all=True
        """
        p1 = Profile(array([[.2,0,.8,0],[0,.1,.2,.7],[0,0,0,1],\
            [.2,.3,.4,.1],[.5,.5,0,0]]),\
            Alphabet=DNA, CharOrder="TCAG")
        self.assertEqual(p1.toConsensus(cutoff=0.4, include_all=True),\
            "AGGAY")
        p2 = Profile(array([[.25,0.25,.25,0.25],[0.1,.1,.1,0],\
[.4,0,.4,0],[0,.2,0.2,0.3]]),\ Alphabet=DNA, CharOrder="TCAG") self.assertEqual(p2.toConsensus(cutoff=0.4,\ include_all=True), "NHWV") def test_randomIndices(self): """randomIndices: 99% of new frequencies should be within 3*SD """ r_num, c_num = 100,20 num_elements = r_num*c_num r = random([r_num,c_num]) p = Profile(r,"A"*c_num) p.normalizePositions() d = p.Data n = 1000 #Test only works on normalized profile, b/c of 1-d below means = n*d three_stds = sqrt(d*(1-d)*n)*3 result = [p.randomIndices() for x in range(n)] a = Alignment(transpose(result)) def absoluteProfile(alignment,char_order): f = a.columnFreqs() res = zeros([len(f),len(char_order)]) for row, freq in enumerate(f): for i in freq: res[row, ord(i)] = freq[i] return res ap = absoluteProfile(a,p.CharOrder) failure = abs(ap-means) > three_stds assert sum(sum(failure))/num_elements <= 0.01 def test_randomSequence(self): """randomSequence: 99% of new frequencies should be within 3*SD""" r_num, c_num = 100,20 num_elements = r_num*c_num alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" r = random([r_num,c_num]) p = Profile(r,alpha[:c_num]) p.normalizePositions() d = p.Data n = 1000 #Test only works on normalized profile, b/c of 1-d below means = n*d three_stds = sqrt(d*(1-d)*n)*3 a = Alignment([p.randomSequence() for x in range(n)]) def absoluteProfile(alignment,char_order): f = a.columnFreqs() res = zeros([len(f),len(char_order)]) for row, freq in enumerate(f): for i in freq: col = char_order.index(i) res[row, col] = freq[i] return res ap = absoluteProfile(a,p.CharOrder) failure = abs(ap-means) > three_stds assert sum(sum(failure))/num_elements <= 0.01 class ModuleLevelFunctionsTest(TestCase): """Contains tests for the module level functions in profile.py""" def setUp(self): """setUp to change the alphabet for testing general CharMeaningProfile """ self.alt_dna = DNA DnaDegenerateSymbols = {'R':'AG','N':'TCAG','Y':'TC','?':'TCAG-'} self.alt_dna.Degenerates = DnaDegenerateSymbols def test_CharMeaningProfile(self): 
"""CharMeaningProfile: should work as expected """ p1 = CharMeaningProfile(self.alt_dna,"AGCT") p1_exp = [('A',[1,0,0,0]),('G',[0,1,0,0]),('C',[0,0,1,0]),\ ('T',[0,0,0,1])] p2 = CharMeaningProfile(self.alt_dna,"TCAG") p2_exp = [('A',[0,0,1,0]),('G',[0,0,0,1]),('C',[0,1,0,0]),\ ('T',[1,0,0,0])] #split_degen, but only whose chars are all in char order #so ? is ignored right now p3 = CharMeaningProfile(self.alt_dna,"TCAG",split_degenerates=True) p3_exp = [('A',[0,0,1,0]),('G',[0,0,0,1]),('C',[0,1,0,0]),\ ('T',[1,0,0,0]),('R',[0,0,.5,.5]),('Y',[.5,.5,0,0]),\ ('N',[.25,.25,.25,.25])] #if we add '-' to the character order, ? is split up as well p4 = CharMeaningProfile(self.alt_dna,"TCAG-",split_degenerates=True) p4_exp = [('A',[0,0,1,0,0]),('G',[0,0,0,1,0]),('C',[0,1,0,0,0]),\ ('T',[1,0,0,0,0]),('R',[0,0,.5,.5,0]),('Y',[.5,.5,0,0,0]),\ ('N',[.25,.25,.25,.25,0]),('-',[0,0,0,0,1]),('?',[.2,.2,.2,.2,.2])] #Degenerate characters in the character order, when split_degenerates #is True, won't be split up, they'll get a 1 in their own column. 
        p5 = CharMeaningProfile(self.alt_dna,"AGN",split_degenerates=True)
        p5_exp = [('A',[1,0,0]),('G',[0,1,0]),('N',[0,0,1]),\
            ('R',[.5,.5,0])]
        #defaults char_order to list(alphabet)
        p6 = CharMeaningProfile(self.alt_dna)
        p6_exp = [('A',[0,0,1,0]),('G',[0,0,0,1]),('C',[0,1,0,0]),\
            ('T',[1,0,0,0])]
        #also accepts empty char_order -> set to list(alphabet)
        p7 = CharMeaningProfile(self.alt_dna,"")
        p7_exp = [('A',[0,0,1,0]),('G',[0,0,0,1]),('C',[0,1,0,0]),\
            ('T',[1,0,0,0])]
        for obs,exp in [(p1,p1_exp),(p2,p2_exp),(p3,p3_exp),(p4,p4_exp),\
                (p5,p5_exp),(p6,p6_exp),(p7,p7_exp)]:
            nz = [(chr(i),r.tolist()) for i,r in enumerate(obs.Data) if r.any()]
            self.assertEqualItems(nz, exp)
        self.assertRaises(ValueError,CharMeaningProfile,self.alt_dna,\
            "AGNX",split_degenerates=True)

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_core/test_seq_aln_integration.py

#!/usr/bin/env python
from __future__ import division
from numpy import array, transpose, alltrue
from cogent.util.unit_test import TestCase, main
from cogent.core.moltype import RNA
from cogent.core.sequence import RnaSequence, Sequence, ModelSequence
from cogent.core.alignment import Alignment, DenseAlignment

__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"

class AllTests(TestCase):

    def setUp(self):
        """setUp method for all tests"""
        # named sequences
        self.rna1 = RnaSequence('UCAGGG', Name='rna1')
        self.rna2 = RnaSequence('YCU-RG', Name='rna2')
        self.rna3 = RnaSequence('CAA-NR', Name='rna3')
        self.model1 = ModelSequence('UCAGGG', Name='rna1',\
            Alphabet=RNA.Alphabets.DegenGapped)
        self.model2 = ModelSequence('YCU-RG', Name='rna2',\
            Alphabet=RNA.Alphabets.DegenGapped)
        self.model3 = ModelSequence('CAA-NR', Name='rna3',\
Alphabet=RNA.Alphabets.DegenGapped) self.aln = Alignment([self.rna1, self.rna2, self.rna3], MolType=RNA) self.da = DenseAlignment([self.model1, self.model2, self.model3],\ MolType=RNA, Alphabet=RNA.Alphabets.DegenGapped) # seqs no name self.nn_rna1 = RnaSequence('UCAGGG') self.nn_rna2 = RnaSequence('YCU-RG') self.nn_rna3 = RnaSequence('CAA-NR') self.nn_model1 = ModelSequence('UCAGGG',\ Alphabet=RNA.Alphabets.DegenGapped) self.nn_model2 = ModelSequence('YCU-RG',\ Alphabet=RNA.Alphabets.DegenGapped) self.nn_model3 = ModelSequence('CAA-NR',\ Alphabet=RNA.Alphabets.DegenGapped) self.nn_aln = Alignment([self.nn_rna1, self.nn_rna2, self.nn_rna3],\ MolType=RNA) self.nn_da = DenseAlignment([self.nn_model1, self.nn_model2,\ self.nn_model3], MolType=RNA, Alphabet=RNA.Alphabets.DegenGapped) def test_printing_named_seqs(self): """Printing named seqs should work the same on Aln and DenseAln""" #Note: the newline trailing each sequence is intentional, because #we want each FASTA-format record to be separated. exp_lines_general = ['>rna1','UCAGGG','>rna2','YCU-RG','>rna3','CAA-NR'] self.assertEqual(str(self.aln), '\n'.join(exp_lines_general) + '\n') self.assertEqual(str(self.da), '\n'.join(exp_lines_general) + '\n') def test_printing_unnamed_seqs(self): """Printing unnamed sequences should work the same on Aln and DenseAln """ exp_lines_gen = ['>seq_0','UCAGGG','>seq_1','YCU-RG','>seq_2','CAA-NR\n'] self.assertEqual(str(self.nn_aln),'\n'.join(exp_lines_gen)) self.assertEqual(str(self.nn_da),'\n'.join(exp_lines_gen)) def test_DenseAlignment_without_moltype(self): """Expect MolType to be picked up from the sequences.""" m1 = ModelSequence('UCAG',Alphabet=RNA.Alphabets.DegenGapped,\ Name='rna1') m2 = ModelSequence('CCCR',Alphabet=RNA.Alphabets.DegenGapped,\ Name='rna2') da = DenseAlignment([m1, m2]) exp_lines = ['>rna1','UCAG','>rna2','CCCR'] self.assertEqual(str(da), '\n'.join(exp_lines) + '\n') def test_names(self): # Should both alignments handle names the same way? 
self.assertEqual(self.aln.Names, ['rna1','rna2','rna3']) self.assertEqual(self.da.Names, ['rna1','rna2','rna3']) # On unnamed sequences the behavior is now the same. self.assertEqual(self.nn_aln.Names, ['seq_0','seq_1','seq_2']) self.assertEqual(self.nn_da.Names, ['seq_0','seq_1','seq_2']) def test_seqFreqs(self): """seqFreqs should work the same on Alignment and DenseAlignment""" # Used alphabet: ('U', 'C', 'A', 'G', '-', 'B', 'D', 'H',\ # 'K', 'M', 'N', 'S', 'R', 'W', 'V', 'Y') exp = [[1,1,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0],\ [1,1,0,1,1,0,0,0,0,0,0,0,1,0,0,1,0],\ [0,1,2,0,1,0,0,0,0,0,1,0,1,0,0,0,0]] # This works self.assertEqual(self.da.getSeqFreqs().Data, exp) # This used to raise an error, but now works self.assertEqual(self.aln.getSeqFreqs().Data, exp) def test_subset_positions_DenseAlignment(self): model1 = ModelSequence('UCG', Name='rna1',\ Alphabet=RNA.Alphabets.DegenGapped) model2 = ModelSequence('YCG', Name='rna2',\ Alphabet=RNA.Alphabets.DegenGapped) model3 = ModelSequence('CAR', Name='rna3',\ Alphabet=RNA.Alphabets.DegenGapped) sub_da = DenseAlignment([model1, model2, model3],\ MolType=RNA, Alphabet=RNA.Alphabets.DegenGapped) full_data = array([[0,1,2,3,3,3],[15,1,0,4,12,3],[1,2,2,4,10,12]]) sub_data = array([[0,1,3],[15,1,3],[1,2,12]]) # First check some data self.assertEqual(self.da.ArraySeqs, full_data) self.assertEqual(self.da.ArrayPositions, transpose(full_data)) self.assertEqual(sub_da.ArraySeqs, sub_data) self.assertEqual(sub_da.ArrayPositions, transpose(sub_data)) obs_sub_da_TP = self.da.takePositions([0,1,5]) obs_sub_da_SA = self.da.getSubAlignment(pos=[0,1,5]) # When using the getSubAlignment method the data is right self.assertEqual(obs_sub_da_SA, sub_da) self.failIfEqual(obs_sub_da_SA, self.da) self.assertEqual(obs_sub_da_SA.ArraySeqs, sub_data) self.assertEqual(obs_sub_da_SA.ArrayPositions, transpose(sub_data)) # For the takePositions method: Why does this work self.assertEqual(obs_sub_da_TP, sub_da) self.failIfEqual(obs_sub_da_TP, self.da) # 
If the data doesn't match? self.assertEqual(obs_sub_da_TP.ArraySeqs, sub_data) self.assertEqual(obs_sub_da_TP.ArrayPositions, transpose(sub_data)) # Shouldn't the __eq__ method check the data at least? def test_subset_positions_Alignment(self): rna1 = RnaSequence('UCG', Name='rna1') rna2 = RnaSequence('YCG', Name='rna2') rna3 = RnaSequence('CAR', Name='rna3') sub_aln = Alignment([rna1, rna2, rna3], MolType=RNA) obs_sub_aln = self.aln.takePositions([0,1,5]) self.assertEqual(obs_sub_aln, sub_aln) self.failIfEqual(obs_sub_aln, self.aln) # string representations should be the same. This fails right # now, because sequence order is not maintained. See separate test. self.assertEqual(str(obs_sub_aln), str(sub_aln)) def test_takePositions_sequence_order(self): """Alignment takePositions should maintain seq order""" #This works self.assertEqual(self.da.Names,['rna1','rna2','rna3']) sub_da = self.da.getSubAlignment(pos=[0,1,5]) self.assertEqual(sub_da.Names,['rna1','rna2','rna3']) # seq order not maintained in Alignment self.assertEqual(self.aln.Names,['rna1','rna2','rna3']) sub_aln = self.aln.takePositions([0,1,5]) self.assertEqual(sub_aln.Names,['rna1','rna2','rna3']) def test_subset_seqs_Alignment(self): rna1 = RnaSequence('UCG', Name='rna1') rna2 = RnaSequence('YCG', Name='rna2') rna3 = RnaSequence('CAR', Name='rna3') sub_aln = Alignment([rna2, rna3], MolType=RNA) aln = Alignment([rna1, rna2, rna3], MolType=RNA) obs_sub_aln = aln.takeSeqs(['rna2','rna3']) self.assertEqual(obs_sub_aln, sub_aln) self.assertEqual(str(obs_sub_aln), str(sub_aln)) # Selected sequences should be in specified order? 
obs_sub_aln_1 = self.aln.takeSeqs(['rna3','rna2']) obs_sub_aln_2 = self.aln.takeSeqs(['rna2','rna3']) self.failIfEqual(str(obs_sub_aln_1), str(obs_sub_aln_2)) def test_subset_seqs_DenseAlignment(self): model1 = ModelSequence('UCG', Name='rna1',\ Alphabet=RNA.Alphabets.DegenGapped) model2 = ModelSequence('YCG', Name='rna2',\ Alphabet=RNA.Alphabets.DegenGapped) model3 = ModelSequence('CAR', Name='rna3',\ Alphabet=RNA.Alphabets.DegenGapped) sub_da = DenseAlignment([model1, model2, model3],\ MolType=RNA, Alphabet=RNA.Alphabets.DegenGapped) # takeSeqs by name should have the same effect as # getSubAlignment by seq idx? obs_sub_da_TS = self.da.takeSeqs(['rna1']) obs_sub_da_SA = self.da.getSubAlignment(seqs=[0]) # These two are now the same. Fixed mapping of key to char array. self.assertEqual(obs_sub_da_TS, obs_sub_da_SA) self.assertEqual(str(obs_sub_da_TS), str(obs_sub_da_SA)) def test_aln_equality(self): # When does something compare equal? self.assertEqual(self.da == self.da, True) # one sequence less other_da1 = DenseAlignment([self.model1, self.model2],\ MolType=RNA, Alphabet=RNA.Alphabets.DegenGapped) self.assertEqual(self.da == other_da1, False) # seqs in different order -- doesn't matter other_da2 = DenseAlignment([self.model1, self.model3, self.model2],\ MolType=RNA, Alphabet=RNA.Alphabets.DegenGapped) self.assertEqual(self.da == other_da2, True) # seqs in different encoding -- doesn't matter, only looks at data other_da3 = DenseAlignment([self.model1, self.model2, self.model3]) # Should this compare False even though the data is exactly the same? # The MolType is different... self.assertEqual(self.da == other_da3, True) assert alltrue(map(alltrue,self.da.ArraySeqs == other_da3.ArraySeqs)) def test_seq_equality(self): model1 = ModelSequence('UCG', Name='rna1',\ Alphabet=RNA.Alphabets.DegenGapped) model2 = ModelSequence('UCG', Name='rna1',\ Alphabet=RNA.Alphabets.DegenGapped) # Shouldn't the above two sequences be equal? 
        self.assertEqual(model1, model2)
        # string comparison is True
        self.assertEqual(str(model1), str(model2))

    def test_seq_ungapping(self):
        rna1 = RnaSequence('U-C-A-G-', Name='rna1')
        model1 = ModelSequence('U-C-A-G-', Name='rna1',\
            Alphabet=RNA.Alphabets.DegenGapped)
        self.assertEqual(rna1, 'U-C-A-G-')
        self.assertEqual(rna1.degap(), 'UCAG')
        # check it produces the right string from the beginning
        self.assertEqual(str(model1), 'U-C-A-G-')
        self.assertEqual(model1._data, [0,4,1,4,2,4,3,4])
        # ModelSequence should maybe have the same degap method as normal Seq
        self.assertEqual(str(model1.degap()), 'UCAG')

    def test_the_rest_of_ModelSequence(self):
        """The class ModelSequence has 14 methods, but only 2 unittests.
        You might want to add some tests there..."""
        #note: mostly these are tested in derived classes, for convenience.
        pass

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_core/test_sequence.py

#!/usr/bin/env python
"""Unit tests for Sequence class and its subclasses.
""" from cogent.core.sequence import Sequence, RnaSequence, DnaSequence, \ ProteinSequence, ModelSequenceBase, \ ModelSequence, ModelNucleicAcidSequence, ModelRnaSequence, \ ModelDnaSequence, ModelProteinSequence, ModelCodonSequence, \ ModelDnaCodonSequence, ModelRnaCodonSequence from cogent.core.moltype import RNA, DNA, PROTEIN, ASCII, BYTES, AlphabetError from cogent.util.unit_test import TestCase, main import re from pickle import dumps from numpy import array __author__ = "Rob Knight, Gavin Huttley and Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Gavin Huttley", "Peter Maxwell", "Matthew Wakefield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class SequenceTests(TestCase): """Tests of the Sequence class.""" SEQ = Sequence RNA = RnaSequence DNA = DnaSequence PROT = ProteinSequence def test_init_empty(self): """Sequence and subclasses should init correctly.""" #NOTE: ModelSequences can't be initialized empty because it screws up #the dimensions of the array, and not worth special-casing. 
s = self.SEQ() self.assertEqual(s, '') assert s.MolType in (ASCII, BYTES) r = self.RNA() assert r.MolType is RNA def test_init_data(self): """Sequence init with data should set data in correct location""" r = self.RNA('ucagg') #no longer preserves case self.assertEqual(r, 'UCAGG') def test_init_other_seq(self): """Sequence init with other seq should preserve name and info.""" r = self.RNA('UCAGG', Name='x', Info={'z':3}) s = Sequence(r) self.assertEqual(s._seq, 'UCAGG') self.assertEqual(s.Name, 'x') self.assertEqual(s.Info.z, 3) def test_compare_to_string(self): """Sequence should compare equal to same string.""" r = self.RNA('UCC') self.assertEqual(r, 'UCC') def test_slice(self): """Sequence slicing should work as expected""" r = self.RNA('UCAGG') self.assertEqual(r[0], 'U') self.assertEqual(r[-1], 'G') self.assertEqual(r[1:3], 'CA') def test_conversion(self): """Should convert t to u automatically""" r = self.RNA('TCAtu') self.assertEqual(str(r), 'UCAUU') d = self.DNA('UCAtu') self.assertEqual(str(d), 'TCATT') def test_toDna(self): """Returns copy of self as DNA.""" r = self.RNA('TCA') self.assertEqual(str(r), 'UCA') self.assertEqual(str(r.toDna()), 'TCA') def test_toRna(self): """Returns copy of self as RNA.""" r = self.DNA('UCA') self.assertEqual(str(r), 'TCA') self.assertEqual(str(r.toRna()), 'UCA') def test_toFasta(self): """Sequence toFasta() should return Fasta-format string""" even = 'TCAGAT' odd = even + 'AAA' even_dna = self.SEQ(even, Name='even') odd_dna = self.SEQ(odd, Name='odd') self.assertEqual(even_dna.toFasta(), '>even\nTCAGAT') #set line wrap to small number so we can test that it works even_dna.LineWrap = 2 self.assertEqual(even_dna.toFasta(), '>even\nTC\nAG\nAT') odd_dna.LineWrap = 2 self.assertEqual(odd_dna.toFasta(), '>odd\nTC\nAG\nAT\nAA\nA') #check that changing the linewrap again works even_dna.LineWrap = 4 self.assertEqual(even_dna.toFasta(), '>even\nTCAG\nAT') def test_serialize(self): """Sequence should be serializable""" r = 
self.RNA('ugagg') assert dumps(r) def test_stripDegenerate(self): """Sequence stripDegenerate should remove any degenerate bases""" self.assertEqual(self.RNA('UCAG-').stripDegenerate(), 'UCAG-') self.assertEqual(self.RNA('NRYSW').stripDegenerate(), '') self.assertEqual(self.RNA('USNG').stripDegenerate(), 'UG') def test_stripBad(self): """Sequence stripBad should remove any non-base, non-gap chars""" #have to turn off check to get bad data in; no longer preserves case self.assertEqual(self.RNA('UCxxxAGwsnyrHBNzzzD-D', check=False\ ).stripBad(), 'UCAGWSNYRHBND-D') self.assertEqual(self.RNA('@#^*($@!#&()!@QZX', check=False \ ).stripBad(), '') self.assertEqual(self.RNA('aaaxggg---!ccc', check=False).stripBad(), 'AAAGGG---CCC') def test_stripBadAndGaps(self): """Sequence stripBadAndGaps should remove gaps and bad chars""" #have to turn off check to get bad data in; no longer preserves case self.assertEqual(self.RNA('UxxCAGwsnyrHBNz#!D-D', check=False \ ).stripBadAndGaps(), 'UCAGWSNYRHBNDD') self.assertEqual(self.RNA('@#^*($@!#&()!@QZX', check=False \ ).stripBadAndGaps(), '') self.assertEqual(self.RNA('aaa ggg ---!ccc', check=False \ ).stripBadAndGaps(), 'AAAGGGCCC') def test_shuffle(self): """Sequence shuffle should return new random sequence w/ same monomers""" r = self.RNA('UUUUCCCCAAAAGGGG') s = r.shuffle() self.assertNotEqual(r, s) self.assertEqualItems(r, s) def test_complement(self): """Sequence complement should correctly complement sequence""" self.assertEqual(self.RNA('UAUCG-NR').complement(), 'AUAGC-NY') self.assertEqual(self.DNA('TATCG-NR').complement(), 'ATAGC-NY') self.assertEqual(self.DNA('').complement(), '') self.assertRaises(TypeError, self.PROT('ACD').complement) def test_rc(self): """Sequence rc should correctly reverse-complement sequence""" #no longer preserves case! 
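The rc tests that follow exercise reverse complementation with degenerate codes and case normalisation (input case is not preserved). A self-contained sketch of that behaviour, assuming a translation table that covers only the symbols these tests use (it is deliberately partial, not cogent's full degenerate alphabet):

```python
# Sketch: uppercase, complement via a character translation table
# (degenerate-aware: R<->Y, N->N, gap stays a gap), then reverse.
RNA_COMPLEMENT = str.maketrans('UCAGRYN-', 'AGUCYRN-')

def rc(seq):
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]

print(rc('UauCG-NR'))  # YN-CGAUA
```

This matches the assertion above it: mixed-case 'UauCG-NR' reverse-complements to 'YN-CGAUA'.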
self.assertEqual(self.RNA('UauCG-NR').rc(), 'YN-CGAUA') self.assertEqual(self.DNA('TatCG-NR').rc(), 'YN-CGATA') self.assertEqual(self.RNA('').rc(), '') self.assertEqual(self.RNA('A').rc(), 'U') self.assertRaises(TypeError, self.PROT('ACD').rc) def test_contains(self): """Sequence contains should return correct result""" r = self.RNA('UCA') assert 'U' in r assert 'CA' in r assert 'X' not in r assert 'G' not in r def test_iter(self): """Sequence iter should iterate over sequence""" p = self.PROT('QWE') self.assertEqual(list(p), ['Q','W','E']) def test_isGapped(self): """Sequence isGapped should return True if gaps in seq""" assert not self.RNA('').isGapped() assert not self.RNA('ACGUCAGUACGUCAGNRCGAUcaguaguacYRNRYRN').isGapped() assert self.RNA('-').isGapped() assert self.PROT('--').isGapped() assert self.RNA('CAGUCGUACGUCAGUACGUacucauacgac-caguACUG').isGapped() assert self.RNA('CA--CGUAUGCA-----g').isGapped() assert self.RNA('CAGU-').isGapped() def test_isGap(self): """Sequence isGap should return True if char is a valid gap char""" r = self.RNA('ACGUCAGUACGUCAGNRCGAUcaguaguacYRNRYRN') for char in 'qwertyuiopasdfghjklzxcvbnmQWERTYUIOASDFGHJKLZXCVBNM': assert not r.isGap(char) assert r.isGap('-') #only works on a single literal that's a gap, not on a sequence. #possibly, this behavior should change? 
assert not r.isGap('---') #check behaviour on self assert not self.RNA('CGAUACGUACGACU').isGap() assert not self.RNA('---CGAUA----CGUACG---ACU---').isGap() assert self.RNA('').isGap() assert self.RNA('----------').isGap() def test_isDegenerate(self): """Sequence isDegenerate should return True if degen symbol in seq""" assert not self.RNA('').isDegenerate() assert not self.RNA('UACGCUACAUGuacgucaguGCUAGCUA---ACGUCAG').isDegenerate() assert self.RNA('N').isDegenerate() assert self.RNA('R').isDegenerate() assert self.RNA('y').isDegenerate() assert self.RNA('GCAUguagcucgUCAGUCAGUACgUgcasCUAG').isDegenerate() assert self.RNA('ACGYAUGCUGYWWNMNuwbycwuybcwbwub').isDegenerate() def test_isStrict(self): """Sequence isStrict should return True if all symbols in Monomers""" assert self.RNA('').isStrict() assert self.PROT('A').isStrict() assert self.RNA('UAGCACUgcaugcauGCAUGACuacguACAUG').isStrict() assert not self.RNA('CAGUCGAUCA-cgaucagUCGAUGAC').isStrict() def test_firstGap(self): """Sequence firstGap should return index of first gap symbol, or None""" self.assertEqual(self.RNA('').firstGap(), None) self.assertEqual(self.RNA('a').firstGap(), None) self.assertEqual(self.RNA('uhacucHuhacUUhacan').firstGap(), None) self.assertEqual(self.RNA('-abc').firstGap(), 0) self.assertEqual(self.RNA('b-ac').firstGap(), 1) self.assertEqual(self.RNA('abcd-').firstGap(), 4) def test_firstDegenerate(self): """Sequence firstDegenerate should return index of first degen symbol""" self.assertEqual(self.RNA('').firstDegenerate(), None) self.assertEqual(self.RNA('a').firstDegenerate(), None) self.assertEqual(self.RNA('UCGACA--CU-gacucaguacgua' ).firstDegenerate(), None) self.assertEqual(self.RNA('nCAGU').firstDegenerate(), 0) self.assertEqual(self.RNA('CUGguagvAUG').firstDegenerate(), 7) self.assertEqual(self.RNA('ACUGCUAacgud').firstDegenerate(), 11) def test_firstNonStrict(self): """Sequence firstNonStrict should return index of first non-strict symbol""" 
self.assertEqual(self.RNA('').firstNonStrict(), None) self.assertEqual(self.RNA('A').firstNonStrict(), None) self.assertEqual(self.RNA('ACGUACGUcgaucagu').firstNonStrict(), None) self.assertEqual(self.RNA('N').firstNonStrict(), 0) self.assertEqual(self.RNA('-').firstNonStrict(), 0) self.assertEqual(self.RNA('ACGUcgAUGUGCAUcagu-').firstNonStrict(),18) def test_disambiguate(self): """Sequence disambiguate should remove degenerate bases""" self.assertEqual(self.RNA('').disambiguate(), '') self.assertEqual(self.RNA('AGCUGAUGUA--CAGU').disambiguate(), 'AGCUGAUGUA--CAGU') self.assertEqual(self.RNA('AUn-yrs-wkmCGwmrNMWRKY').disambiguate( 'strip'), 'AU--CG') s = self.RNA('AUn-yrs-wkmCGwmrNMWRKY') t = s.disambiguate('random') u = s.disambiguate('random') for i, j in zip(str(s), str(t)): if i in s.MolType.Degenerates: assert j in s.MolType.Degenerates[i] else: assert i == j self.assertNotEqual(t, u) self.assertEqual(len(s), len(t)) def test_degap(self): """Sequence degap should remove all gaps from sequence""" #doesn't preserve case self.assertEqual(self.RNA('').degap(), '') self.assertEqual(self.RNA('GUCAGUCgcaugcnvuncdks').degap(), 'GUCAGUCGCAUGCNVUNCDKS') self.assertEqual(self.RNA('----------------').degap(), '') self.assertEqual(self.RNA('gcuauacg-').degap(), 'GCUAUACG') self.assertEqual(self.RNA('-CUAGUCA').degap(), 'CUAGUCA') self.assertEqual(self.RNA('---a---c---u----g---').degap(), 'ACUG') self.assertEqual(self.RNA('?a-').degap(), 'A') def test_gapList(self): """Sequence gapList should return correct gap positions""" self.assertEqual(self.RNA('').gapList(), []) self.assertEqual(self.RNA('ACUGUCAGUACGHSDKCUCDNNS').gapList(),[]) self.assertEqual(self.RNA('GUACGUACAKDC-SDHDSK').gapList(),[12]) self.assertEqual(self.RNA('-DSHUHDS').gapList(), [0]) self.assertEqual(self.RNA('UACHASADS-').gapList(), [9]) self.assertEqual(self.RNA('---CGAUgCAU---ACGHc---ACGUCAGU---' ).gapList(), [0,1,2,11,12,13,19,20,21,30,31,32]) def test_gapVector(self): """Sequence gapVector should 
return correct gap positions""" g = lambda x: self.RNA(x).gapVector() self.assertEqual(g(''), []) self.assertEqual(g('ACUGUCAGUACGHCSDKCCUCCDNCNS'), [False]*27) self.assertEqual(g('GUACGUAACAKADC-SDAHADSAK'), map(bool, map(int,'000000000000001000000000'))) self.assertEqual(g('-DSHSUHDSS'), map(bool, map(int,'1000000000'))) self.assertEqual(g('UACHASCAGDS-'), map(bool, map(int,'000000000001'))) self.assertEqual(g('---CGAUgCAU---ACGHc---ACGUCAGU--?'), \ map(bool, map(int,'111000000001110000011100000000111'))) def test_gapMaps(self): """Sequence gapMaps should return dicts mapping gapped/ungapped pos""" empty = '' no_gaps = 'aaa' all_gaps = '---' start_gaps = '--abc' end_gaps = 'ab---' mid_gaps = '--a--b-cd---' gm = lambda x: self.RNA(x).gapMaps() self.assertEqual(gm(empty), ({},{})) self.assertEqual(gm(no_gaps), ({0:0,1:1,2:2}, {0:0,1:1,2:2})) self.assertEqual(gm(all_gaps), ({},{})) self.assertEqual(gm(start_gaps), ({0:2,1:3,2:4},{2:0,3:1,4:2})) self.assertEqual(gm(end_gaps), ({0:0,1:1},{0:0,1:1})) self.assertEqual(gm(mid_gaps), ({0:2,1:5,2:7,3:8},{2:0,5:1,7:2,8:3})) def test_countGaps(self): """Sequence countGaps should return correct gap count""" self.assertEqual(self.RNA('').countGaps(), 0) self.assertEqual(self.RNA('ACUGUCAGUACGHSDKCUCDNNS').countGaps(), 0) self.assertEqual(self.RNA('GUACGUACAKDC-SDHDSK').countGaps(), 1) self.assertEqual(self.RNA('-DSHUHDS').countGaps(), 1) self.assertEqual(self.RNA('UACHASADS-').countGaps(), 1) self.assertEqual(self.RNA('---CGAUgCAU---ACGHc---ACGUCAGU---' ).countGaps(), 12) def test_countDegenerate(self): """Sequence countDegenerate should return correct degen base count""" self.assertEqual(self.RNA('').countDegenerate(), 0) self.assertEqual(self.RNA('GACUGCAUGCAUCGUACGUCAGUACCGA' ).countDegenerate(), 0) self.assertEqual(self.RNA('N').countDegenerate(), 1) self.assertEqual(self.PROT('N').countDegenerate(), 0) self.assertEqual(self.RNA('NRY').countDegenerate(), 3) self.assertEqual(self.RNA('ACGUAVCUAGCAUNUCAGUCAGyUACGUCAGS' 
).countDegenerate(), 4) def test_possibilites(self): """Sequence possibilities should return correct # possible sequences""" self.assertEqual(self.RNA('').possibilities(), 1) self.assertEqual(self.RNA('ACGUgcaucagUCGuGCAU').possibilities(), 1) self.assertEqual(self.RNA('N').possibilities(), 4) self.assertEqual(self.RNA('R').possibilities(), 2) self.assertEqual(self.RNA('H').possibilities(), 3) self.assertEqual(self.RNA('nRh').possibilities(), 24) self.assertEqual(self.RNA('AUGCnGUCAg-aurGauc--gauhcgauacgws' ).possibilities(), 96) def test_MW(self): """Sequence MW should return correct molecular weight""" self.assertEqual(self.PROT('').MW(), 0) self.assertEqual(self.RNA('').MW(), 0) self.assertFloatEqual(self.PROT('A').MW(), 107.09) self.assertFloatEqual(self.RNA('A').MW(), 375.17) self.assertFloatEqual(self.PROT('AAA').MW(), 285.27) self.assertFloatEqual(self.RNA('AAA').MW(), 1001.59) self.assertFloatEqual(self.RNA('AAACCCA').MW(), 2182.37) def test_canMatch(self): """Sequence canMatch should return True if all positions can match""" assert self.RNA('').canMatch('') assert self.RNA('UCAG').canMatch('UCAG') assert not self.RNA('UCAG').canMatch('ucag') assert self.RNA('UCAG').canMatch('NNNN') assert self.RNA('NNNN').canMatch('UCAG') assert self.RNA('NNNN').canMatch('NNNN') assert not self.RNA('N').canMatch('x') assert not self.RNA('N').canMatch('-') assert self.RNA('UCAG').canMatch('YYRR') assert self.RNA('UCAG').canMatch('KMWS') def test_canMismatch(self): """Sequence canMismatch should return True on any possible mismatch""" assert not self.RNA('').canMismatch('') assert self.RNA('N').canMismatch('N') assert self.RNA('R').canMismatch('R') assert self.RNA('N').canMismatch('r') assert self.RNA('CGUACGCAN').canMismatch('CGUACGCAN') assert self.RNA('U').canMismatch('C') assert self.RNA('UUU').canMismatch('UUC') assert self.RNA('UUU').canMismatch('UUY') assert not self.RNA('UUU').canMismatch('UUU') assert not self.RNA('UCAG').canMismatch('UCAG') assert not 
self.RNA('U--').canMismatch('U--') def test_mustMatch(self): """Sequence mustMatch should return True when no possible mismatches""" assert self.RNA('').mustMatch('') assert not self.RNA('N').mustMatch('N') assert not self.RNA('R').mustMatch('R') assert not self.RNA('N').mustMatch('r') assert not self.RNA('CGUACGCAN').mustMatch('CGUACGCAN') assert not self.RNA('U').mustMatch('C') assert not self.RNA('UUU').mustMatch('UUC') assert not self.RNA('UUU').mustMatch('UUY') assert self.RNA('UU-').mustMatch('UU-') assert self.RNA('UCAG').mustMatch('UCAG') def test_canPair(self): """Sequence canPair should return True if all positions can pair""" assert self.RNA('').canPair('') assert not self.RNA('UCAG').canPair('UCAG') assert self.RNA('UCAG').canPair('CUGA') assert not self.RNA('UCAG').canPair('cuga') assert self.RNA('UCAG').canPair('NNNN') assert self.RNA('NNNN').canPair('UCAG') assert self.RNA('NNNN').canPair('NNNN') assert not self.RNA('N').canPair('x') assert not self.RNA('N').canPair('-') assert self.RNA('-').canPair('-') assert self.RNA('UCAGU').canPair('KYYRR') assert self.RNA('UCAG').canPair('KKRS') assert self.RNA('U').canPair('G') assert not self.DNA('T').canPair('G') def test_canMispair(self): """Sequence canMispair should return True on any possible mispair""" assert not self.RNA('').canMispair('') assert self.RNA('N').canMispair('N') assert self.RNA('R').canMispair('Y') assert self.RNA('N').canMispair('r') assert self.RNA('CGUACGCAN').canMispair('NUHCHUACH') assert self.RNA('U').canMispair('C') assert self.RNA('U').canMispair('R') assert self.RNA('UUU').canMispair('AAR') assert self.RNA('UUU').canMispair('GAG') assert not self.RNA('UUU').canMispair('AAA') assert not self.RNA('UCAG').canMispair('CUGA') assert self.RNA('U--').canMispair('--U') assert self.DNA('TCCAAAGRYY').canMispair('RRYCTTTGGA') def test_mustPair(self): """Sequence mustPair should return True when no possible mispairs""" assert self.RNA('').mustPair('') assert not self.RNA('N').mustPair('N') 
assert not self.RNA('R').mustPair('Y') assert not self.RNA('A').mustPair('A') assert not self.RNA('CGUACGCAN').mustPair('NUGCGUACG') assert not self.RNA('U').mustPair('C') assert not self.RNA('UUU').mustPair('AAR') assert not self.RNA('UUU').mustPair('RAA') assert not self.RNA('UU-').mustPair('-AA') assert self.RNA('UCAG').mustPair('CUGA') assert self.DNA('TCCAGGG').mustPair('CCCTGGA') assert self.DNA('tccaggg').mustPair(self.DNA('ccctgga')) assert not self.DNA('TCCAGGG').mustPair('NCCTGGA') def test_diff(self): """Sequence diff should count 1 for each difference between sequences""" self.assertEqual(self.RNA('UGCUGCUC').diff(''), 0) self.assertEqual(self.RNA('UGCUGCUC').diff('U'), 0) self.assertEqual(self.RNA('UGCUGCUC').diff('UCCCCCUC'), 3) #case-sensitive! self.assertEqual(self.RNA('AAAAA').diff('CCCCC'), 5) #raises TypeError if other not iterable self.assertRaises(TypeError, self.RNA('AAAAA').diff, 5) def test_distance(self): """Sequence distance should calculate correctly based on function""" def f(a, b): if a == b: return 0 if (a in 'UC' and b in 'UC') or (a in 'AG' and b in 'AG'): return 1 else: return 10 #uses identity function by default self.assertEqual(self.RNA('UGCUGCUC').distance(''), 0) self.assertEqual(self.RNA('UGCUGCUC').distance('U'), 0) self.assertEqual(self.RNA('UGCUGCUC').distance('UCCCCCUC'), 3) #case-sensitive! self.assertEqual(self.RNA('AAAAA').distance('CCCCC'), 5) #should use function if supplied self.assertEqual(self.RNA('UGCUGCUC').distance('', f), 0) self.assertEqual(self.RNA('UGCUGCUC').distance('U', f), 0) self.assertEqual(self.RNA('UGCUGCUC').distance('C', f), 1) self.assertEqual(self.RNA('UGCUGCUC').distance('G', f), 10) self.assertEqual(self.RNA('UGCUGCUC').distance('UCCCCCUC', f), 21) #case-sensitive! 
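The distance tests here show the contract: a per-position scoring function summed over the truncating zip of the two sequences, with simple inequality as the default. A minimal sketch of that contract (plain Python; the function names are illustrative, and transition_score mirrors the local helper f defined in the test above):

```python
# Sketch: pairwise distance as the sum of a per-position score over
# zip(seq1, seq2); zip truncates to the shorter sequence, so comparing
# against '' always gives 0, exactly as the tests expect.
def distance(seq1, seq2, f=lambda a, b: a != b):
    return sum(f(a, b) for a, b in zip(seq1, seq2))

def transition_score(a, b):
    if a == b:
        return 0
    if (a in 'UC' and b in 'UC') or (a in 'AG' and b in 'AG'):
        return 1  # transition within pyrimidines or within purines
    return 10     # transversion

print(distance('UGCUGCUC', 'UCCCCCUC'))                    # 3
print(distance('UGCUGCUC', 'UCCCCCUC', transition_score))  # 21
```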
self.assertEqual(self.RNA('AAAAA').distance('CCCCC', f), 50) def test_matrixDistance(self): """Sequence matrixDistance should look up distances from a matrix""" #note that the score matrix must contain 'diagonal' elements m[i][i] #to avoid failure when the sequences match. m = {'U':{'U':0, 'C':1, 'A':5}, 'C':{'C':0, 'A':2,'G':4}} self.assertEqual(self.RNA('UUUCCC').matrixDistance('UCACGG', m), 14) self.assertEqual(self.RNA('UUUCCC').matrixDistance('', m), 0) self.assertEqual(self.RNA('UUU').matrixDistance('CAC', m), 7) self.assertRaises(KeyError, self.RNA('UUU').matrixDistance, 'CAG', m) def test_fracSame(self): """Sequence fracSame should return similarity between sequences""" s1 = self.RNA('ACGU') s2 = self.RNA('AACG') s3 = self.RNA('GG') s4 = self.RNA('A') e = self.RNA('') self.assertEqual(s1.fracSame(e), 0) self.assertEqual(s1.fracSame(s2), 0.25) self.assertEqual(s1.fracSame(s3), 0) self.assertEqual(s1.fracSame(s4), 1.0) #note truncation def test_fracDiff(self): """Sequence fracDiff should return difference between sequences""" s1 = self.RNA('ACGU') s2 = self.RNA('AACG') s3 = self.RNA('GG') s4 = self.RNA('A') e = self.RNA('') self.assertEqual(s1.fracDiff(e), 0) self.assertEqual(s1.fracDiff(s2), 0.75) self.assertEqual(s1.fracDiff(s3), 1) self.assertEqual(s1.fracDiff(s4), 0) #note truncation def test_fracSameGaps(self): """Sequence fracSameGaps should return similarity in gap positions""" s1 = self.RNA('AAAA') s2 = self.RNA('GGGG') s3 = self.RNA('----') s4 = self.RNA('A-A-') s5 = self.RNA('-G-G') s6 = self.RNA('UU--') s7 = self.RNA('-') s8 = self.RNA('GGG') e = self.RNA('') self.assertEqual(s1.fracSameGaps(s1), 1) self.assertEqual(s1.fracSameGaps(s2), 1) self.assertEqual(s1.fracSameGaps(s3), 0) self.assertEqual(s1.fracSameGaps(s4), 0.5) self.assertEqual(s1.fracSameGaps(s5), 0.5) self.assertEqual(s1.fracSameGaps(s6), 0.5) self.assertEqual(s1.fracSameGaps(s7), 0) self.assertEqual(s1.fracSameGaps(e), 0) self.assertEqual(s3.fracSameGaps(s3), 1) 
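The fracSame/fracDiff tests above rely on the same two conventions: comparison over the truncating zip, and an empty comparison defined as 0 rather than raising a division error. A minimal sketch of the similarity half (plain Python; the function name is illustrative):

```python
# Sketch: fraction of identical positions over the truncating zip of
# two sequences; an empty overlap is defined as 0.0, not an error.
def frac_same(seq1, seq2):
    pairs = list(zip(seq1, seq2))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

print(frac_same('ACGU', 'AACG'))  # 0.25
print(frac_same('ACGU', 'A'))     # 1.0  (note truncation)
```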
self.assertEqual(s3.fracSameGaps(s4), 0.5) self.assertEqual(s3.fracSameGaps(s7), 1.0) self.assertEqual(e.fracSameGaps(e), 0.0) self.assertEqual(s4.fracSameGaps(s5), 0.0) self.assertEqual(s4.fracSameGaps(s6), 0.5) self.assertFloatEqual(s6.fracSameGaps(s8), 2/3.0) def test_fracDiffGaps(self): """Sequence fracDiffGaps should return difference in gap positions""" s1 = self.RNA('AAAA') s2 = self.RNA('GGGG') s3 = self.RNA('----') s4 = self.RNA('A-A-') s5 = self.RNA('-G-G') s6 = self.RNA('UU--') s7 = self.RNA('-') s8 = self.RNA('GGG') e = self.RNA('') self.assertEqual(s1.fracDiffGaps(s1), 0) self.assertEqual(s1.fracDiffGaps(s2), 0) self.assertEqual(s1.fracDiffGaps(s3), 1) self.assertEqual(s1.fracDiffGaps(s4), 0.5) self.assertEqual(s1.fracDiffGaps(s5), 0.5) self.assertEqual(s1.fracDiffGaps(s6), 0.5) self.assertEqual(s1.fracDiffGaps(s7), 1) self.assertEqual(s1.fracDiffGaps(e), 0) self.assertEqual(s3.fracDiffGaps(s3), 0) self.assertEqual(s3.fracDiffGaps(s4), 0.5) self.assertEqual(s3.fracDiffGaps(s7), 0.0) self.assertEqual(e.fracDiffGaps(e), 0.0) self.assertEqual(s4.fracDiffGaps(s5), 1.0) self.assertEqual(s4.fracDiffGaps(s6), 0.5) self.assertFloatEqual(s6.fracDiffGaps(s8), 1/3.0) def test_fracSameNonGaps(self): """Sequence fracSameNonGaps should return similarities at non-gaps""" s1 = self.RNA('AAAA') s2 = self.RNA('AGGG') s3 = self.RNA('GGGG') s4 = self.RNA('AG--GA-G') s5 = self.RNA('CU--CU-C') s6 = self.RNA('AC--GC-G') s7 = self.RNA('--------') s8 = self.RNA('AAAA----') s9 = self.RNA('A-GG-A-C') e = self.RNA('') test = lambda x, y, z: self.assertFloatEqual(x.fracSameNonGaps(y), z) test(s1, s2, 0.25) test(s1, s3, 0) test(s2, s3, 0.75) test(s1, s4, 0.5) test(s4, s5, 0) test(s4, s6, 0.6) test(s4, s7, 0) test(s4, s8, 0.5) test(s4, s9, 2/3.0) test(e, s4, 0) def test_fracDiffNonGaps(self): """Sequence fracDiffNonGaps should return differences at non-gaps""" s1 = self.RNA('AAAA') s2 = self.RNA('AGGG') s3 = self.RNA('GGGG') s4 = self.RNA('AG--GA-G') s5 = self.RNA('CU--CU-C') s6 = 
self.RNA('AC--GC-G') s7 = self.RNA('--------') s8 = self.RNA('AAAA----') s9 = self.RNA('A-GG-A-C') e = self.RNA('') test = lambda x, y, z: self.assertFloatEqual(x.fracDiffNonGaps(y), z) test(s1, s2, 0.75) test(s1, s3, 1) test(s2, s3, 0.25) test(s1, s4, 0.5) test(s4, s5, 1) test(s4, s6, 0.4) test(s4, s7, 0) test(s4, s8, 0.5) test(s4, s9, 1/3.0) test(e, s4, 0) def test_fracSimilar(self): """Sequence fracSimilar should return the fraction similarity""" transitions = dict.fromkeys([ \ ('A','A'), ('A','G'), ('G','A'), ('G','G'), ('U','U'), ('U','C'), ('C','U'), ('C','C')]) s1 = self.RNA('UCAGGCAA') s2 = self.RNA('CCAAAUGC') s3 = self.RNA('GGGGGGGG') e = self.RNA('') test = lambda x, y, z: self.assertFloatEqual( \ x.fracSimilar(y, transitions), z) test(e, e, 0) test(s1, e, 0) test(s1, s1, 1) test(s1, s2, 7.0/8) test(s1, s3, 5.0/8) test(s2,s3, 4.0/8) def test_withTerminiUnknown(self): """withTerminiUnknown should reset termini to unknown char""" s1 = self.RNA('-?--AC--?-') s2 = self.RNA('AC') self.assertEqual(s1.withTerminiUnknown(), '????AC????') self.assertEqual(s2.withTerminiUnknown(), 'AC') def test_consistent_gap_degen_handling(self): """gap degen character should be treated consistently""" # the degen character '?' 
can be a gap, so when we strip either gaps or # degen characters it should be gone too raw_seq = "---??-??TC-GGCG-GCA-G-GC-?-C-TAN-GCGC-CCTC-AGGA?-???-??--" raw_ungapped = re.sub("[-?]", "", raw_seq) raw_no_ambigs = re.sub("[N?]+", "", raw_seq) dna = self.DNA(raw_seq) self.assertEqual(dna.degap(), raw_ungapped) self.assertEqual(dna.stripDegenerate(), raw_no_ambigs) self.assertEqual(dna.stripBadAndGaps(), raw_ungapped) class SequenceSubclassTests(TestCase): """Only one general set of tests, since the subclasses are very thin.""" def test_DnaSequence(self): """DnaSequence should behave as expected""" x = DnaSequence('tcag') #note: no longer preserves case self.assertEqual(x, 'TCAG') x = DnaSequence('aaa') + DnaSequence('ccc') #note: doesn't preserve case self.assertEqual(x, 'AAACCC') assert x.MolType is DNA self.assertRaises(AlphabetError, x.__add__, 'z') self.assertEqual(DnaSequence('TTTAc').rc(), 'GTAAA') class ModelSequenceTests(object): """Base class for tests of specific ModelSequence objects.""" SequenceClass = None #override in derived classes def test_toFasta(self): """Sequence toFasta() should return Fasta-format string""" even = 'TCAGAT' odd = even + 'AAA' even_dna = self.SequenceClass(even, Name='even') odd_dna = self.SequenceClass(odd, Name='odd') self.assertEqual(even_dna.toFasta(), '>even\nTCAGAT') #set line wrap to small number so we can test that it works even_dna.LineWrap = 2 self.assertEqual(even_dna.toFasta(), '>even\nTC\nAG\nAT') odd_dna.LineWrap = 2 self.assertEqual(odd_dna.toFasta(), '>odd\nTC\nAG\nAT\nAA\nA') #check that changing the linewrap again works even_dna.LineWrap = 4 self.assertEqual(even_dna.toFasta(), '>even\nTCAG\nAT') def test_toPhylip(self): """Sequence toPhylip() should return one-line phylip string""" s = self.SequenceClass('ACG', Name='xyz') self.assertEqual(s.toPhylip(), 'xyz'+' '*27+'ACG') class DnaSequenceTests(ModelSequenceTests, TestCase): class SequenceClass(ModelNucleicAcidSequence): Alphabet = DNA.Alphabets.Base def 
test_init(self): """Sequence should do round-trip from string""" orig = '' r = self.SequenceClass(orig) self.assertEqual(str(r), orig) orig = 'TCAGGA' r = self.SequenceClass(orig) self.assertEqual(r._data, array([0,1,2,3,3,2])) self.assertEqual(str(r), orig) def test_toKwords(self): """Sequence toKwords should give expected counts""" orig = 'ATCCCTAGC' r = self.SequenceClass(orig) #if we use k = 1, should just get the characters w = r.toKwords(1) self.assertEqual(w, r._data) w = r.toKwords(1, overlapping=False) self.assertEqual(w, r._data) #if we use k = 2, should get overlapping or nonoverlapping k-words w = r.toKwords(2) self.assertEqual(w, array([8,1,5,5,4,2,11,13])) w = r.toKwords(2, overlapping=False) self.assertEqual(w, array([8,5,4,11])) #check a case with k = 3, i.e. codons w = r.toKwords(3, overlapping=False) self.assertEqual(w, array([33,20,45])) class CodonSequenceTests(SequenceTests, TestCase): class SequenceClass(ModelCodonSequence): Alphabet = DNA.Alphabets.Base.Triples def test_init(self): """Sequence should do round-trip from string""" orig = '' r = self.SequenceClass(orig) self.assertEqual(str(r), orig) orig = 'TCAGGA' r = self.SequenceClass(orig) self.assertEqual(r._data, array([6,62])) self.assertEqual(str(r), orig) def test_toKwords(self): """Sequence toKwords should give expected counts""" orig = 'ATCCCTAGC' r = self.SequenceClass(orig) #if we use k = 1, should just get the characters w = r.toKwords(1) self.assertEqual(w, r._data) w = r.toKwords(1, overlapping=False) self.assertEqual(w, r._data) #if we use k = 2, should get overlapping or nonoverlapping k-words w = r.toKwords(2) self.assertEqual(w, array([2132,1325])) w = r.toKwords(2, overlapping=False) self.assertEqual(w, array([2132])) class DnaSequenceGapTests(TestCase): """Tests of gapped DNA sequences.""" class SequenceClass(ModelNucleicAcidSequence): Alphabet = DNA.Alphabets.Gapped Gap = '-' def test_init(self): """Gapped sequence should init ok""" orig = 'TC---' seq = 
self.SequenceClass(orig) self.assertEqual(str(seq), orig) def test_gaps(self): """Gapped sequence gaps() should return correct array""" sc = self.SequenceClass self.assertEqual(sc('TC').gaps(), array([0,0])) self.assertEqual(sc('T-').gaps(), array([0,1])) def test_degap(self): """Gapped sequence degap() should return correct array""" sc = self.SequenceClass self.assertEqual(sc('T-').degap(), sc('T')) def test_nongaps(self): """Gapped sequence nongaps() should return correct array""" sc = self.SequenceClass self.assertEqual(sc('TC').nongaps(), array([1,1])) self.assertEqual(sc('T-').nongaps(), array([1,0])) def test_regap(self): """Gapped sequence regap() should return correct sequence""" sc = self.SequenceClass self.assertEqual(str(sc('TC').regap(sc('A---A-'))), 'T---C-') class SequenceIntegrationTests(TestCase): """Should be able to convert regular to model sequences, and back""" def test_regular_to_model(self): """Regular sequence should convert to model sequence""" r = RNA.Sequence('AAA', Name='x') s = RNA.ModelSeq(r) self.assertEqual(str(s), 'AAA') self.assertEqual(s.MolType, RNA) self.assertEqual(s.Name, 'x') def test_model_to_regular(self): """Model sequence should convert to regular sequence""" r = RNA.ModelSeq('AAA', Name='x') s = RNA.Sequence(r) self.assertEqual(str(s), 'AAA') self.assertEqual(s.MolType, RNA) self.assertEqual(s.Name, 'x') def test_regular_to_regular(self): """Regular sequence should convert to regular sequence""" r = RNA.Sequence('AAA', Name='x') s = RNA.Sequence(r) self.assertEqual(str(s), 'AAA') self.assertEqual(s.MolType, RNA) self.assertEqual(s.Name, 'x') def test_model_to_model(self): """Model sequence should convert to model sequence""" r = RNA.ModelSeq('AAA', Name='x') s = RNA.ModelSeq(r) self.assertEqual(str(s), 'AAA') self.assertEqual(s.MolType, RNA) self.assertEqual(s.Name, 'x') def test_ModelDnaCodonSequence(self): """ModelDnaCodonSequence should behave as expected""" d = ModelDnaCodonSequence('UUUCGU') self.assertEqual(str(d), 
'TTTCGT') self.assertEqual(d._data, array([0,28])) self.assertEqual(str(d.toRna()), 'UUUCGU') self.assertEqual(str(d.toDna()), 'TTTCGT') def test_ModelRnaCodonSequence(self): """ModelRnaCodonSequence should behave as expected""" r = ModelRnaCodonSequence('UUUCGU') self.assertEqual(str(r), 'UUUCGU') self.assertEqual(r._data, array([0,28])) self.assertEqual(str(r.toRna()), 'UUUCGU') self.assertEqual(str(r.toDna()), 'TTTCGT') class ModelSequenceTests(SequenceTests): """Tests of the ModelSequence class's inheritance of SequenceI.""" SEQ = ModelSequence RNA = ModelRnaSequence DNA = ModelDnaSequence PROT = ModelProteinSequence def test_distance_indices(self): """ModelSequence distance should work with function of indices""" s1 = self.RNA('AUGC') s2 = self.RNA('AAGC') def f(x,y): if x == 2 or y == 2: return 10 return 0 self.assertEqual(s1.distance(s2, f, use_indices=True), 20) def test_stripBad(self): """Sequence stripBad should remove any non-base, non-gap chars""" #have to turn off check to get bad data in; no longer preserves case r = self.RNA('UCAGRYU') r._data[0] = 31 r._data[2] = 55 self.assertEqual(r.stripBad(), 'CGRYU') def test_stripBadAndGaps(self): """Sequence stripBadAndGaps should remove gaps and bad chars""" #have to turn off check to get bad data in; no longer preserves case r = self.RNA('ACG--GRN?') self.assertEqual(r.stripBadAndGaps(), 'ACGGRN') r._data[0] = 99 self.assertEqual(r.stripBadAndGaps(), 'CGGRN') def test_gapArray(self): """Sequence gapArray should return array of gaps""" r = self.RNA('-?A-?NRY-') v = r.gapArray() self.assertEqual(v, array([1,1,0,1,1,0,0,0,1])) r = self.RNA('AC') v = r.gapArray() self.assertEqual(v, array([0,0])) r = self.RNA('-?') v = r.gapArray() self.assertEqual(v, array([1,1])) def test_gapIndices(self): """Sequence gapIndices should return positions of gaps""" r = self.RNA('-?A-?NRY-') v = r.gapIndices() self.assertEqual(v, array([0,1,3,4,8])) r = self.RNA('AC') v = r.gapIndices() self.assertEqual(v, array([])) #note: 
always returns array r = self.RNA('-?') v = r.gapIndices() self.assertEqual(v, array([0,1])) #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_core/test_tree.py #!/usr/bin/env python """Tests of classes for dealing with trees and phylogeny. """ from copy import copy, deepcopy from cogent import LoadTree from cogent.core.tree import TreeNode, PhyloNode, TreeError from cogent.parse.tree import DndParser from cogent.maths.stats.test import correlation from cogent.util.unit_test import TestCase, main from numpy import array, arange __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Catherine Lozupone", "Daniel McDonald", "Peter Maxwell", "Gavin Huttley", "Andrew Butterfield", "Matthew Wakefield", "Justin Kuczynski","Jens Reeder", "Jose Carlos Clemente Litran"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class TreeTests(TestCase): """Tests of top-level functions.""" def test_LoadTree(self): """LoadTree should load a tree from a file or a string""" #NOTE: This method now sits in cogent.__init__ t_str = '(a_a:10,(b_b:2,c_c:4):5);' #NOTE: Tree quotes these labels because they have underscores in them. result_str = "('a_a':10.0,('b_b':2.0,'c_c':4.0):5.0);" t = LoadTree(treestring=t_str) #t = DndParser(t_str) names = [i.Name for i in t.tips()] self.assertEqual(names, ['a_a', 'b_b', 'c_c']) self.assertEqual(str(t),result_str) self.assertEqual(t.getNewick(with_distances=True), result_str) t_str = '(a_a:10.0,(b_b:2.0,c_c:4.0):5.0);' #NOTE: Tree silently converts spaces to underscores (only for output), #presumably for Newick compatibility. 
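The quoting behaviour asserted in test_LoadTree follows the Newick convention that an unquoted underscore is read back as a space, so a writer must single-quote any label containing underscores (or other reserved characters) to preserve it literally. An illustrative helper showing the rule (`newick_label` is a hypothetical name, not PyCogent's serializer):

```python
def newick_label(name):
    """Quote a Newick label when it contains characters that need escaping."""
    # Underscores in unquoted labels denote spaces, so labels containing
    # them (or structural characters) must be wrapped in single quotes.
    special = set("_ ()[]':;,")
    if any(c in special for c in name):
        return "'%s'" % name.replace("'", "''")  # double embedded quotes
    return name
```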
result_str = "(a_a:10.0,(b_b:2.0,c_c:4.0):5.0);" t = LoadTree(treestring=t_str, underscore_unmunge=True) #t = DndParser(t_str, unescape_name=True) names = [i.Name for i in t.tips()] self.assertEqual(names, ['a a', 'b b', 'c c']) self.assertEqual(str(t),result_str) self.assertEqual(t.getNewick(with_distances=True),result_str) def _new_child(old_node, constructor): """Returns new_node which has old_node as its parent.""" new_node = constructor() new_node.Parent = old_node if old_node is not None: if new_node not in old_node.Children: old_node.Children.append(new_node) return new_node tree_std = """\ ((a:1, b:2, c:3)abc:0.1, (d:4, (e:5, f:6)ef:0.2)def:0.3); """ tree_std_dist = \ [[ 0. , 3. , 4. , 5.4, 6.6, 7.6], [ 3. , 0. , 5. , 6.4, 7.6, 8.6], [ 4. , 5. , 0. , 7.4, 8.6, 9.6], [ 5.4, 6.4, 7.4, 0. , 9.2, 10.2], [ 6.6, 7.6, 8.6, 9.2, 0. , 11. ], [ 7.6, 8.6, 9.6, 10.2, 11. , 0. ]] tree_std_tips = ['a', 'b', 'c', 'd', 'e', 'f'] tree_one_level = """(a:1, b:2, c:3)abc;""" tree_two_level = """((a:1, b:2, c:3)abc:0.1, d:0.3)abcd;""" tree_one_child = """((a:1, b:2, c:3)abc:0.1, (d:0.2)d_:0.3)abcd;""" tree_one_child_dist = \ [[ 0. , 3. , 4. , 1.6], [ 3. , 0. , 5. , 2.6], [ 4. , 5. , 0. , 3.6], [ 1.6, 2.6, 3.6, 0. 
]] tree_one_child_tips = ['a', 'b', 'c', 'd'] class TreeNodeTests(TestCase): """Tests of the TreeNode class.""" def setUp(self): """Define some standard TreeNode for testing""" self.Empty = TreeNode() self.Single = TreeNode(Name='a') self.Child = TreeNode(Name='b') self.OneChild = TreeNode(Name='a', Children=[self.Child]) self.Multi = TreeNode(Name = 'a', Children='bcd') self.Repeated = TreeNode(Name='x', Children='aaa') self.BigName = map(TreeNode, '0123456789') self.BigParent = TreeNode(Name = 'x', Children = self.BigName) self.Comparisons = map(TreeNode, 'aab') nodes = dict([(x, TreeNode(x)) for x in 'abcdefgh']) nodes['a'].append(nodes['b']) nodes['b'].append(nodes['c']) nodes['c'].append(nodes['d']) nodes['c'].append(nodes['e']) nodes['c'].append(nodes['f']) nodes['f'].append(nodes['g']) nodes['a'].append(nodes['h']) self.TreeNode = nodes self.TreeRoot = nodes['a'] self.s = '((H,G),(R,M));' self.t = DndParser(self.s, TreeNode) self.s2 = '(((H,G),R),M);' self.t2 = DndParser(self.s2, TreeNode) self.s4 = '(((H,G),(O,R)),X);' self.t4 = DndParser(self.s4, TreeNode) def test_init_empty(self): """Empty TreeNode should init OK""" t = self.Empty self.assertEqual(t.Name, None) self.assertEqual(t.Parent, None) self.assertEqual(len(t), 0) def test_init_full(self): """TreeNode should init OK with parent, data, and children""" t = self.Empty u = TreeNode(Parent=t, Name='abc', Children='xyz') self.assertEqual(u.Name, 'abc') assert u.Parent is t assert u in t self.assertEqual(u[0].Name, 'x') self.assertEqual(u[1].Name, 'y') self.assertEqual(u[2].Name, 'z') self.assertEqual(len(u), 3) def test_str(self): """TreeNode str should give Newick-style representation""" #note: name suppressed if None self.assertEqual(str(self.Empty), ';') self.assertEqual(str(self.OneChild), '(b)a;') self.assertEqual(str(self.BigParent), '(0,1,2,3,4,5,6,7,8,9)x;') self.BigParent[-1].extend('abc') self.assertEqual(str(self.BigParent), '(0,1,2,3,4,5,6,7,8,(a,b,c)9)x;') def test_getNewick(self): 
"""Should return Newick-style representation""" self.assertEqual(self.Empty.getNewick(), ';') self.assertEqual(self.OneChild.getNewick(), '(b)a;') self.assertEqual(self.BigParent.getNewick(), \ '(0,1,2,3,4,5,6,7,8,9)x;') self.BigParent[-1].extend('abc') self.assertEqual(self.BigParent.getNewick(), \ '(0,1,2,3,4,5,6,7,8,(a,b,c)9)x;') def test_multifurcating(self): """Coerces nodes to have <= n children""" t_str = "((a:1,b:2,c:3)d:4,(e:5,f:6,g:7)h:8,(i:9,j:10,k:11)l:12)m:14;" t = DndParser(t_str) # can't break up easily... sorry 80char exp_str = "((a:1.0,(b:2.0,c:3.0):0.0)d:4.0,((e:5.0,(f:6.0,g:7.0):0.0)h:8.0,(i:9.0,(j:10.0,k:11.0):0.0)l:12.0):0.0)m:14.0;" obs = t.multifurcating(2) self.assertEqual(obs.getNewick(with_distances=True), exp_str) self.assertNotEqual(t.getNewick(with_distances=True), obs.getNewick(with_distances=True)) obs = t.multifurcating(2, 0.5) exp_str = "((a:1.0,(b:2.0,c:3.0):0.5)d:4.0,((e:5.0,(f:6.0,g:7.0):0.5)h:8.0,(i:9.0,(j:10.0,k:11.0):0.5)l:12.0):0.5)m:14.0;" self.assertEqual(obs.getNewick(with_distances=True), exp_str) t_str = "((a,b,c)d,(e,f,g)h,(i,j,k)l)m;" exp_str = "((a,(b,c))d,((e,(f,g))h,(i,(j,k))l))m;" t = DndParser(t_str, constructor=TreeNode) obs = t.multifurcating(2) self.assertEqual(obs.getNewick(with_distances=True), exp_str) obs = t.multifurcating(2, eps=10) # no effect on TreeNode type self.assertEqual(obs.getNewick(with_distances=True), exp_str) self.assertRaises(TreeError, t.multifurcating, 1) def test_multifurcating_nameunnamed(self): """Coerces nodes to have <= n children""" t_str = "((a:1,b:2,c:3)d:4,(e:5,f:6,g:7)h:8,(i:9,j:10,k:11)l:12)m:14;" t = DndParser(t_str) exp_str = "((a:1.0,(b:2.0,c:3.0):0.0)d:4.0,((e:5.0,(f:6.0,g:7.0):0.0)h:8.0,(i:9.0,(j:10.0,k:11.0):0.0)l:12.0):0.0)m:14.0;" obs = t.multifurcating(2, name_unnamed=True) c0,c1 = obs.Children self.assertTrue(c0.Children[1].Name.startswith('AUTO')) self.assertTrue(c1.Name.startswith('AUTO')) self.assertTrue(c1.Children[0].Children[1].Name.startswith('AUTO')) 
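The multifurcating(2) expectations above follow a simple rewrite: a node with more than two children keeps its first child and pushes the remainder into a fresh internal node, recursively, so (a,b,c) becomes (a,(b,c)). A hedged sketch of that grouping on plain label lists (an illustration of the idea, not the TreeNode method):

```python
def binarize(names):
    """Nest a flat child list into two-child Newick groups, left to right."""
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return '(%s,%s)' % (names[0], names[1])
    # keep the first child; group the remainder under a new internal node
    return '(%s,%s)' % (names[0], binarize(names[1:]))
```

For example, `binarize(['a', 'b', 'c'])` yields `'(a,(b,c))'`, matching the shape of the d clade in exp_str above (branch lengths aside).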
self.assertTrue(c1.Children[1].Children[1].Name.startswith('AUTO')) self.assertEqual(len(c0.Children[1].Name), 22) self.assertEqual(len(c1.Name), 22) self.assertEqual(len(c1.Children[0].Children[1].Name), 22) self.assertEqual(len(c1.Children[1].Children[1].Name), 22) names = [n.Name for n in t.nontips()] self.assertEqual(len(names), len(set(names))) def test_bifurcating(self): """Coerces nodes to have <= 2 children""" t_str = "((a:1,b:2,c:3)d:4,(e:5,f:6,g:7)h:8,(i:9,j:10,k:11)l:12)m:14;" t = DndParser(t_str) # can't break up easily... sorry 80char exp_str = "((a:1.0,(b:2.0,c:3.0):0.0)d:4.0,((e:5.0,(f:6.0,g:7.0):0.0)h:8.0,(i:9.0,(j:10.0,k:11.0):0.0)l:12.0):0.0)m:14.0;" obs = t.bifurcating() def test_cmp(self): """TreeNode cmp should compare using id""" nodes = self.TreeNode self.assertEqual(cmp(nodes['a'], nodes['a']), 0) self.assertNotEqual(cmp(nodes['b'], nodes['a']), 0) self.assertNotEqual(cmp(nodes['a'], nodes['b']), 0) def test_compareName(self): """Compare names between TreeNodes""" nodes = self.TreeNode self.assertEqual(nodes['a'].compareName(nodes['a']), 0) self.assertEqual(nodes['a'].compareName(nodes['b']), -1) self.assertEqual(nodes['b'].compareName(nodes['a']), 1) def test_compareByNames(self): """Compare names between trees""" self.assertTrue(self.t.compareByNames(self.t2)) self.assertTrue(self.t.compareByNames(self.t)) self.assertFalse(self.t.compareByNames(self.t4)) def test_eq(self): """TreeNode should compare equal if same id""" t, u, v = self.Comparisons self.assertEqual(t, t) assert t is not u self.assertNotEqual(t, u) self.assertNotEqual(t, v) f = TreeNode(1.0) g = TreeNode(1) self.assertNotEqual(f, g) f.Name += 0.1 self.assertNotEqual(f, g) #however, two TreeNodes that have no name should not compare equal f = TreeNode() g = TreeNode() self.assertNotEqual(f,g) f = TreeNode(Name='foo') g = f.copy() self.assertNotEqual(f, g) def test_ne(self): """TreeNode should compare ne by id or data""" t, u, v = self.Comparisons self.assertFalse(t != t) 
self.assertTrue(t != u) f = TreeNode(Name='foo') g = f.copy() self.assertTrue(f != g) def test_append(self): """TreeNode append should add item to end of self""" self.OneChild.append(TreeNode('c')) self.assertEqual(len(self.OneChild), 2) self.assertEqual(self.OneChild[-1].Name, 'c') self.OneChild.append(6) self.assertEqual(len(self.OneChild), 3) self.assertEqual(self.OneChild[-1].Name, 6) #check that refs are updated when moved from one tree to another empty = TreeNode() empty.append(self.OneChild[-1]) self.assertEqual(len(empty), 1) self.assertEqual(empty[0].Name, 6) self.assertEqual(empty[0].Parent, empty) self.assertEqual(self.OneChild[-1].Name, 'c') def test_extend(self): """TreeNode extend should add many items to end of self""" self.Empty.extend('abcdefgh') data = ''.join([i.Name for i in self.Empty]) self.assertEqual(data, 'abcdefgh') def test_insert(self): """TreeNode insert should insert item at specified index""" parent, nodes = self.BigParent, self.BigName self.assertEqual(len(parent), 10) parent.insert(3, 5) self.assertEqual(len(parent), 11) self.assertEqual(parent[3].Name, 5) self.assertEqual(parent[4].Name, '3') parent.insert(-1, 123) self.assertEqual(len(parent), 12) self.assertEqual(parent[-1].Name, '9') self.assertEqual(parent[-2].Name, 123) def test_pop(self): """TreeNode pop should remove and return child at specified index""" parent, nodes = self.BigParent, self.BigName self.assertEqual(len(parent), 10) last = parent.pop() assert last is nodes[-1] assert last.Parent is None self.assertEqual(len(parent), 9) assert parent[-1] is nodes[-2] first = parent.pop(0) assert first is nodes[0] assert first.Parent is None self.assertEqual(len(parent), 8) assert parent[0] is nodes[1] second_to_last = parent.pop(-2) assert second_to_last is nodes[-3] def test_remove(self): """TreeNode remove should remove first match by value, not id""" nodes = map(TreeNode, 'abc'*3) parent = TreeNode(Children=nodes) self.assertEqual(len(parent), 9) parent.remove('a') 
self.assertEqual(len(parent), 8) self.assertEqual(''.join([i.Name for i in parent]), 'bcabcabc') new_node = TreeNode('a') parent.remove(new_node) self.assertEqual(len(parent), 7) self.assertEqual(''.join([i.Name for i in parent]), 'bcbcabc') def test_getitem(self): """TreeNode getitem should return item or slice""" r = self.TreeRoot n = self.TreeNode assert r[0] is n['b'] items = n['c'][0:1] self.assertEqual(len(items), 1) assert items[0] is n['d'] items = n['c'][0:2] self.assertEqual(len(items), 2) assert items[0] is n['d'] assert items[1] is n['e'] items = n['c'][:] self.assertEqual(len(items), 3) assert items[0] is n['d'] assert items[-1] is n['f'] def test_slice(self): """TreeNode slicing should return list, not TreeNode""" nodes = self.TreeNode c, d, e, f = nodes['c'],nodes['d'],nodes['e'],nodes['f'] assert c[:] is not c self.assertEqual(c[:], [d,e,f]) self.assertEqual(c[1:2], [e]) self.assertEqual(c[0:3:2], [d,f]) def test_setitem(self): """TreeNode setitem should set item or extended slice of nodes""" parent, nodes = self.BigParent, self.BigName t = TreeNode(1) parent[0] = t assert parent[0] is t assert t.Parent is parent assert nodes[0].Parent is None u = TreeNode(2) parent[-2] = u assert parent[8] is u assert u.Parent is parent assert nodes[8].Parent is None parent[1:6:2] = 'xyz' for i in [1,3,5]: assert nodes[i].Parent is None self.assertEqual(parent[1].Name, 'x') self.assertEqual(parent[3].Name, 'y') self.assertEqual(parent[5].Name, 'z') for i in parent: assert i.Parent is parent def test_setslice(self): """TreeNode setslice should set old-style slice of nodes""" parent, nodes = self.BigParent, self.BigName self.assertEqual(len(parent), 10) parent[5:] = [] self.assertEqual(len(parent), 5) for i in range(5, 10): assert nodes[i].Parent is None parent[1:3] = 'abcd' self.assertEqual(len(parent), 7) for i in parent: assert i.Parent is parent data_list = [i.Name for i in parent] self.assertEqual(data_list, list('0abcd34')) parent[1:3] = parent[2:3] data_list = 
[i.Name for i in parent] self.assertEqual(data_list, list('0bcd34')) def test_delitem(self): """TreeNode __delitem__ should delete item and set parent to None""" self.assertEqual(self.Child.Parent, self.OneChild) self.assertEqual(len(self.OneChild), 1) del self.OneChild[0] self.assertEqual(self.OneChild.Parent, None) self.assertEqual(len(self.OneChild), 0) nodes = self.BigName parent = self.BigParent self.assertEqual(len(parent), 10) for n in nodes: assert n.Parent is parent del parent[-1] self.assertEqual(nodes[-1].Parent, None) self.assertEqual(len(parent), 9) del parent[1:6:2] self.assertEqual(len(parent), 6) for i, n in enumerate(nodes): if i in [0,2,4,6,7,8]: assert n.Parent is parent else: assert n.Parent is None def test_delslice(self): """TreeNode __delslice__ should delete items from start to end""" parent = self.BigParent nodes = self.BigName self.assertEqual(len(parent), 10) del parent[3:-2] self.assertEqual(len(parent), 5) for i, n in enumerate(nodes): if i in [3,4,5,6,7]: assert n.Parent is None else: assert n.Parent is parent def test_iter(self): """TreeNode iter should iterate over children""" r = self.TreeRoot n = self.TreeNode items = list(r) assert items[0] is n['b'] assert items[1] is n['h'] self.assertEqual(len(items), 2) def test_len(self): """TreeNode len should return number of children""" r = self.TreeRoot self.assertEqual(len(r), 2) def test_copyRecursive(self): """TreeNode.copyRecursive() should produce deep copy""" t = TreeNode(['t']) u = TreeNode(['u']) t.append(u) c = u.copy() assert c is not u assert c.Name == u.Name assert c.Name is not u.Name #note: Name _is_ same object if it's immutable, e.g. a string. #deepcopy doesn't copy data for immutable objects. 
#need to check that we also copy arbitrary attributes t.XYZ = [3] c = t.copy() assert c is not t assert c[0] is not u assert c[0].Name is not u.Name assert c[0].Name == u.Name assert c.XYZ == t.XYZ assert c.XYZ is not t.XYZ t = self.TreeRoot c = t.copy() self.assertEqual(str(c), str(t)) def test_copy(self): """TreeNode.copy() should work on deep trees""" t = comb_tree(1024) # should break recursion limit on regular copy t.Name = 'foo' t.XYZ = [3] t2 = t.copy() t3 = t.copy() t3.Name = 'bar' self.assertEqual(len(t.tips()), 1024) self.assertEqual(len(t2.tips()), 1024) self.assertEqual(len(t3.tips()), 1024) self.assertNotSameObj(t, t2) self.assertEqual(t.Name, t2.Name) self.assertNotEqual(t.Name, t3.Name) self.assertEqual(t.XYZ, t2.XYZ) self.assertNotSameObj(t.XYZ, t2.XYZ) self.assertEqual(t.getNewick(), t2.getNewick()) t_simple = TreeNode(['t']) u_simple = TreeNode(['u']) t_simple.append(u_simple) self.assertEqual(str(t_simple.copy()), str(t_simple.copy())) def test_copyTopology(self): """TreeNode.copyTopology() should produce deep copy ignoring attrs""" t = TreeNode(['t']) u = TreeNode(['u']) t.append(u) c = u.copyTopology() assert c is not u self.assertEqual(c.Name, u.Name) #note: Name _is_ same object if it's immutable, e.g. a string. #deepcopy doesn't copy data for immutable objects. 
#need to check that we do not also copy arbitrary attributes t.XYZ = [3] c = t.copyTopology() assert c is not t assert c[0] is not u assert c[0].Name is not u.Name assert c[0].Name == u.Name assert not hasattr(c, 'XYZ') t = self.TreeRoot c = t.copy() self.assertEqual(str(c), str(t)) def _test_copy_copy(self): """copy.copy should raise TypeError on TreeNode""" t = TreeNode('t') u = TreeNode('u') t.append(u) self.assertRaises(TypeError, copy, t) self.assertRaises(TypeError, copy, u) def test_deepcopy(self): """copy.deepcopy should work on TreeNode""" t = TreeNode(['t']) u = TreeNode(['u']) t.append(u) c = deepcopy(u) assert c is not u assert c.Name == u.Name assert c.Name is not u.Name #note: Name _is_ same object if it's immutable, e.g. a string. #deepcopy doesn't copy data for immutable objects. #need to check that we also copy arbitrary attributes t.XYZ = [3] c = deepcopy(t) assert c is not t assert c[0] is not u assert c[0].Name is not u.Name assert c[0].Name == u.Name assert c.XYZ == t.XYZ assert c.XYZ is not t.XYZ t = self.TreeRoot c = deepcopy(t) self.assertEqual(str(c), str(t)) def test_Parent(self): """TreeNode Parent should hold correct data and be mutable""" #check initial conditions self.assertEqual(self.Single.Parent, None) #set parent and check parent/child relations self.Single.Parent = self.Empty assert self.Single.Parent is self.Empty self.assertEqual(self.Empty[0], self.Single) assert self.Single in self.Empty self.assertEqual(len(self.Empty), 1) #reset parent and check parent/child relations self.Single.Parent = self.OneChild assert self.Single.Parent is self.OneChild assert self.Single not in self.Empty assert self.Single is self.OneChild[-1] #following is added to check that we don't screw up when there are #nodes with different ids that still compare equal for i in self.Repeated: assert i.Parent is self.Repeated last = self.Repeated[-1] last.Parent = self.OneChild self.assertEqual(len(self.Repeated), 2) for i in self.Repeated: assert i.Parent is 
self.Repeated assert last.Parent is self.OneChild def test_indexInParent(self): """TreeNode indexInParent should hold correct data""" first = TreeNode('a') second = TreeNode('b') third = TreeNode('c') fourth = TreeNode('0', Children=[first, second, third]) self.assertEqual(len(fourth), 3) self.assertEqual(first.indexInParent(), 0) self.assertEqual(second.indexInParent(), 1) self.assertEqual(third.indexInParent(), 2) del fourth[0] self.assertEqual(second.indexInParent(), 0) self.assertEqual(third.indexInParent(), 1) self.assertEqual(len(fourth), 2) assert first.Parent is None def test_isTip(self): """TreeNode isTip should return True if node is a tip""" tips = 'degh' for n in self.TreeNode.values(): if n.Name in tips: self.assertEqual(n.isTip(), True) else: self.assertEqual(n.isTip(), False) def test_isRoot(self): """TreeNode isRoot should return True if parent is None""" r = 'a' for n in self.TreeNode.values(): if n.Name in r: self.assertEqual(n.isRoot(), True) else: self.assertEqual(n.isRoot(), False) def test_traverse(self): """TreeNode traverse should iterate over nodes in tree.""" e = self.Empty s = self.Single o = self.OneChild m = self.Multi r = self.TreeRoot self.assertEqual([i.Name for i in e.traverse()], [None]) self.assertEqual([i.Name for i in e.traverse(False, False)], [None]) self.assertEqual([i.Name for i in e.traverse(True, True)], [None]) self.assertEqual([i.Name for i in s.traverse()], ['a']) self.assertEqual([i.Name for i in s.traverse(True, True)], ['a']) self.assertEqual([i.Name for i in s.traverse(True, False)], ['a']) self.assertEqual([i.Name for i in s.traverse(False, True)], ['a']) self.assertEqual([i.Name for i in s.traverse(False, False)], ['a']) self.assertEqual([i.Name for i in o.traverse()], ['a','b']) self.assertEqual([i.Name for i in o.traverse(True, True)],['a','b','a']) self.assertEqual([i.Name for i in o.traverse(True, False)], ['a', 'b']) self.assertEqual([i.Name for i in o.traverse(False, True)], ['b', 'a']) 
self.assertEqual([i.Name for i in o.traverse(False, False)], ['b']) self.assertEqual([i.Name for i in m.traverse()], ['a','b','c','d']) self.assertEqual([i.Name for i in m.traverse(True, True)],\ ['a','b','c','d','a']) self.assertEqual([i.Name for i in m.traverse(True, False)], \ ['a', 'b','c','d']) self.assertEqual([i.Name for i in m.traverse(False, True)], \ ['b', 'c', 'd', 'a']) self.assertEqual([i.Name for i in m.traverse(False, False)], \ ['b', 'c', 'd']) self.assertEqual([i.Name for i in r.traverse()], \ ['a','b','c','d', 'e', 'f', 'g', 'h']) self.assertEqual([i.Name for i in r.traverse(True, True)],\ ['a','b','c','d','e','f','g','f','c','b','h','a']) self.assertEqual([i.Name for i in r.traverse(True, False)], \ ['a', 'b','c','d','e','f','g','h']) self.assertEqual([i.Name for i in r.traverse(False, True)], \ ['d','e','g','f','c','b','h','a']) self.assertEqual([i.Name for i in r.traverse(False, False)], \ ['d','e','g','h']) self.assertEqual([i.Name for i in r.traverse(True, True, False)],\ ['b','c','d','e','f','g','f','c','b','h']) self.assertEqual([i.Name for i in r.traverse(True, False, False)], \ ['b','c','d','e','f','g','h']) self.assertEqual([i.Name for i in r.traverse(False, True, False)], \ ['d','e','g','f','c','b','h']) self.assertEqual([i.Name for i in r.traverse(False, False, False)], \ ['d','e','g','h']) #this previously failed t = DndParser('((a:6,(b:1,c:2):8):12,(d:3,(e:1,f:1):4):10);') t0 = t.Children[0] list(t0.traverse(self_before=False, self_after=True)) list(t0.traverse(self_before=True, self_after=True)) def test_levelorder(self): t = DndParser("(((A,B)C,(D,E)F,(G,H)I)J,(K,L)M)N;") exp = ['N','J','M','C','F','I','K','L','A','B','D','E','G','H'] names = [n.Name for n in t.levelorder()] self.assertEqual(names,exp) def test_ancestors(self): """TreeNode ancestors should provide list of ancestors, deepest first""" nodes, tree = self.TreeNode, self.TreeRoot self.assertEqual(nodes['a'].ancestors(), []) self.assertEqual(nodes['b'].ancestors(), 
[nodes['a']]) self.assertEqual(nodes['d'].ancestors(), nodes['f'].ancestors()) self.assertEqual(nodes['g'].ancestors(), \ [nodes['f'], nodes['c'], nodes['b'], nodes['a']]) def test_root(self): """TreeNode root() should find root of tree""" nodes, root = self.TreeNode, self.TreeRoot for i in nodes.values(): assert i.root() is root def test_children(self): """TreeNode Children should allow getting/setting children""" nodes = self.TreeNode for n in nodes: node = nodes[n] self.assertEqual(list(node), node.Children) t = TreeNode(Children='abc') self.assertEqual(len(t), 3) u, v = TreeNode('u'), TreeNode('v') #WARNING: If you set Children directly, Parent refs will _not_ update! t.Children = [u,v] assert t[0] is u assert t[1] is v self.assertEqual(len(t), 2) def test_siblings(self): """TreeNode siblings() should return all siblings, not self""" self.assertEqual(self.Empty.siblings(), []) self.assertEqual(self.Child.siblings(), []) self.assertEqual(self.OneChild.siblings(), []) nodes, tree = self.TreeNode, self.TreeRoot a = nodes['a'] b = nodes['b'] c = nodes['c'] d = nodes['d'] e = nodes['e'] f = nodes['f'] g = nodes['g'] h = nodes['h'] self.assertEqual(g.siblings(), []) self.assertEqual(f.siblings(), [d,e]) self.assertEqual(e.siblings(), [d,f]) self.assertEqual(d.siblings(), [e,f]) self.assertEqual(c.siblings(), []) self.assertEqual(b.siblings(), [h]) self.assertEqual(h.siblings(), [b]) self.assertEqual(a.siblings(), []) def test_tips(self): """TreeNode tips should return all terminal descendants""" self.assertEqual(self.Empty.tips(), []) self.assertEqual(self.Child.tips(), []) self.assertEqual(self.OneChild.tips(), [self.Child]) nodes, tree = self.TreeNode, self.TreeRoot a = nodes['a'] b = nodes['b'] c = nodes['c'] d = nodes['d'] e = nodes['e'] f = nodes['f'] g = nodes['g'] h = nodes['h'] self.assertEqual(g.tips(), []) self.assertEqual(f.tips(), [g]) self.assertEqual(e.tips(), []) self.assertEqual(d.tips(), []) self.assertEqual(c.tips(), [d,e,g]) 
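tips() as tested here returns terminal *descendants* only — a tip node itself reports an empty list. A small recursive sketch over (name, children) tuples (illustrative, not the TreeNode implementation):

```python
def tips(node):
    """Collect names of terminal descendants; a tip has no descendant tips."""
    name, children = node
    out = []
    for child in children:
        child_name, grandchildren = child
        if grandchildren:
            out.extend(tips(child))   # recurse into internal children
        else:
            out.append(child_name)    # leaf child: record its name
    return out
```

Mirroring the fixture tree, `tips(('f', [('g', [])]))` yields `['g']` while `tips(('g', []))` yields `[]`, as in the assertions above.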
        self.assertEqual(b.tips(), [d, e, g])
        self.assertEqual(h.tips(), [])
        self.assertEqual(a.tips(), [d, e, g, h])

    def test_itertips(self):
        """TreeNode itertips should iterate over terminal descendants"""
        tree = self.TreeRoot
        self.assertEqual([i.Name for i in tree.iterTips()], list('degh'))

    def test_nontips(self):
        """TreeNode nontips should return all non-terminal descendants"""
        tree = self.TreeRoot
        self.assertEqual([i.Name for i in tree.nontips()], list('bcf'))

    def test_iterNonTips(self):
        """TreeNode iterNontips should iterate over non-terminal descendants"""
        tree = self.TreeRoot
        self.assertEqual([i.Name for i in tree.iterNontips()], list('bcf'))

    def test_tipChildren(self):
        """TreeNode tipChildren should return all terminal children"""
        self.assertEqual(self.Empty.tipChildren(), [])
        self.assertEqual(self.Child.tipChildren(), [])
        self.assertEqual(self.OneChild.tipChildren(), [self.Child])
        nodes, tree = self.TreeNode, self.TreeRoot
        a, b, c, d, e, f, g, h = [nodes[x] for x in 'abcdefgh']
        self.assertEqual(g.tipChildren(), [])
        self.assertEqual(f.tipChildren(), [g])
        self.assertEqual(e.tipChildren(), [])
        self.assertEqual(d.tipChildren(), [])
        self.assertEqual(c.tipChildren(), [d, e])
        self.assertEqual(b.tipChildren(), [])
        self.assertEqual(h.tipChildren(), [])
        self.assertEqual(a.tipChildren(), [h])

    def test_nonTipChildren(self):
        """TreeNode nonTipChildren should return all non-terminal children"""
        self.assertEqual(self.Empty.nonTipChildren(), [])
        self.assertEqual(self.Child.nonTipChildren(), [])
        self.assertEqual(self.OneChild.nonTipChildren(), [])
        nodes, tree = self.TreeNode, self.TreeRoot
        a, b, c, d, e, f, g, h = [nodes[x] for x in 'abcdefgh']
        self.assertEqual(g.nonTipChildren(), [])
        self.assertEqual(f.nonTipChildren(), [])
        self.assertEqual(e.nonTipChildren(), [])
        self.assertEqual(d.nonTipChildren(), [])
        self.assertEqual(c.nonTipChildren(), [f])
        self.assertEqual(b.nonTipChildren(), [c])
        self.assertEqual(h.nonTipChildren(), [])
        self.assertEqual(a.nonTipChildren(), [b])

    def test_childGroups(self):
        """TreeNode childGroups should divide children by grandchild presence"""
        parent = TreeNode(Children='aababbbaaabbbababbb')
        for node in parent:
            if node.Name == 'a':
                node.append('def')
        groups = parent.childGroups()
        self.assertEqual(len(groups), 10)
        exp_group_sizes = [2,1,1,3,3,3,1,1,1,3]
        obs_group_sizes = [len(i) for i in groups]
        self.assertEqual(obs_group_sizes, exp_group_sizes)
        parent = TreeNode(Children='aab')
        for node in parent:
            if node.Name == 'a':
                node.append('def')
        groups = parent.childGroups()
        self.assertEqual(len(groups), 2)
        self.assertEqual([len(i) for i in groups], [2,1])
        parent = TreeNode(Children='aaaaa')
        groups = parent.childGroups()
        self.assertEqual(len(groups), 1)
        self.assertEqual(len(groups[0]), 5)
        parent = TreeNode(Children='aaba')
        for node in parent:
            if node.Name == 'a':
                node.append('def')
        groups = parent.childGroups()
        self.assertEqual(len(groups), 3)
        self.assertEqual([len(i) for i in groups], [2,1,1])

    def test_removeNode(self):
        """TreeNode removeNode should delete node by id, not value"""
        parent = self.Repeated
        children = list(self.Repeated)
        self.assertEqual(len(parent), 3)
        self.assertEqual(parent.removeNode(children[1]), True)
        self.assertEqual(len(parent), 2)
        assert children[0].Parent is parent
        assert children[1].Parent is None
        assert children[2].Parent is parent
        self.assertEqual(children[0].compareName(children[1]), 0)
        self.assertEqual(parent.removeNode(children[1]), False)
        self.assertEqual(len(parent), 2)
        self.assertEqual(parent.removeNode(children[0]), True)
        self.assertEqual(len(parent), 1)

    def test_lowestCommonAncestor(self):
        """TreeNode lowestCommonAncestor should return LCA for set of tips"""
        t1 = DndParser("((a,(b,c)d)e,f,(g,h)i)j;")
        t2 = t1.copy()
        t3 = t1.copy()
        t4 = t1.copy()
        input1 = ['a'] # return self
        input2 = ['a','b'] # return e
        input3 = ['b','c'] # return d
        input4 = ['a','h','g'] # return j
        exp1 = t1.getNodeMatchingName('a')
        exp2 = t2.getNodeMatchingName('e')
        exp3 = t3.getNodeMatchingName('d')
        exp4 = t4
        obs1 = t1.lowestCommonAncestor(input1)
        obs2 = t2.lowestCommonAncestor(input2)
        obs3 = t3.lowestCommonAncestor(input3)
        obs4 = t4.lowestCommonAncestor(input4)
        self.assertEqual(obs1, exp1)
        self.assertEqual(obs2, exp2)
        self.assertEqual(obs3, exp3)
        self.assertEqual(obs4, exp4)
        # verify multiple calls work
        t_mul = t1.copy()
        exp_1 = t_mul.getNodeMatchingName('d')
        exp_2 = t_mul.getNodeMatchingName('i')
        obs_1 = t_mul.lowestCommonAncestor(['b','c'])
        obs_2 = t_mul.lowestCommonAncestor(['g','h'])
        self.assertEqual(obs_1, exp_1)
        self.assertEqual(obs_2, exp_2)

    def test_lastCommonAncestor(self):
        """TreeNode LastCommonAncestor should provide last common ancestor"""
        nodes, tree = self.TreeNode, self.TreeRoot
        a, b, c, d, e, f, g, h = [nodes[x] for x in 'abcdefgh']
        self.assertEqual(a.lastCommonAncestor(a), a)
        self.assertEqual(a.lastCommonAncestor(b), a)
        self.assertEqual(a.lastCommonAncestor(g), a)
        self.assertEqual(a.lastCommonAncestor(h), a)
        self.assertEqual(b.lastCommonAncestor(g), b)
        self.assertEqual(b.lastCommonAncestor(d), b)
        self.assertEqual(b.lastCommonAncestor(a), a)
        self.assertEqual(b.lastCommonAncestor(h), a)
        self.assertEqual(d.lastCommonAncestor(f), c)
        self.assertEqual(d.lastCommonAncestor(g), c)
        self.assertEqual(d.lastCommonAncestor(a), a)
        self.assertEqual(d.lastCommonAncestor(h), a)
        self.assertEqual(g.lastCommonAncestor(g), g)
        self.assertEqual(g.lastCommonAncestor(f), f)
        self.assertEqual(g.lastCommonAncestor(e), c)
        self.assertEqual(g.lastCommonAncestor(c), c)
        self.assertEqual(g.lastCommonAncestor(b), b)
        self.assertEqual(g.lastCommonAncestor(a), a)
        self.assertEqual(g.lastCommonAncestor(h), a)
        t = TreeNode('h')
        for i in [a, b, c, d, e, f, g, h]:
            self.assertEqual(i.lastCommonAncestor(t), None)
            self.assertEqual(t.lastCommonAncestor(i), None)
        u = TreeNode('a', Children=[t])

    def test_separation(self):
"""TreeNode separation should return correct number of edges""" nodes, tree = self.TreeNode, self.TreeRoot a = nodes['a'] b = nodes['b'] c = nodes['c'] d = nodes['d'] e = nodes['e'] f = nodes['f'] g = nodes['g'] h = nodes['h'] self.assertEqual(a.separation(a), 0) self.assertEqual(c.separation(c), 0) self.assertEqual(a.separation(b), 1) self.assertEqual(a.separation(h), 1) self.assertEqual(g.separation(h), 5) self.assertEqual(f.separation(d), 2) self.assertEqual(f.separation(c), 1) self.assertEqual(c.separation(f), 1) def test_nameUnnamedNodes(self): """nameUnnamedNodes assigns an arbitrary value when Name == None""" tree, tree_nodes = self.TreeRoot, self.TreeNode tree_nodes['b'].Name = 'node2' tree_nodes['c'].Name = None tree_nodes['f'].Name = None tree_nodes['e'].Name = 'node3' tree.nameUnnamedNodes() self.assertEqual(tree_nodes['c'].Name, 'node1') self.assertEqual(tree_nodes['f'].Name, 'node4') def test_makeTreeArray(self): """makeTreeArray maps nodes to the descendants in them""" tree = self.TreeRoot result, node_list = tree.makeTreeArray() self.assertEqual(result, \ array([[1,1,1,1], [1,1,1,0], [1,1,1,0],[0,0,1,0]])) nodes = [node.Name for node in node_list] self.assertEqual(nodes, ['a', 'b', 'c', 'f']) #test if works with a dec_list supplied dec_list = ['d', 'added', 'e', 'g', 'h'] result2, node_list = tree.makeTreeArray(dec_list) self.assertEqual(result2, \ array([[1,0,1,1,1], [1,0,1,1,0], [1,0,1,1,0], [0,0,0,1,0]])) def test_reassignNames(self): """reassignNames should rename node names based on dict mapping""" t = self.TreeRoot mapping = dict([(x, str(i)) for i,x in enumerate('abfg')]) exp_names = ['0','1','2','3','c','d','e','h'] t.reassignNames(mapping) obs_names = sorted(t.getNodeNames()) self.assertEqual(obs_names, exp_names) def test_reassignNames_specific_nodes(self): """reassignNames should rename nodes based on dict mapping""" t = self.TreeRoot nodes = [self.TreeNode['a'], self.TreeNode['b']] mapping = dict([(x, str(i)) for i,x in 
enumerate('abfg')]) exp_names = ['0','1','c','d','e','f','g','h'] t.reassignNames(mapping, nodes) obs_names = sorted(t.getNodeNames()) self.assertEqual(obs_names, exp_names) def test_getNodesDict(self): """getNodesDict returns a dict keyed by name, value is node""" t = self.TreeRoot nodes = self.TreeNode self.assertEqual(t.getNodesDict(), nodes) def test_getNodesDict_nonunique_names(self): """getNodesDict raises if non unique names are in tree""" t = self.TreeRoot t.Children[0].Name = 'same' t.Children[0].Children[0].Name = 'same' self.assertRaises(TreeError, t.getNodesDict) def test_removeDeleted(self): """removeDeleted should remove all nodes where is_deleted tests true.""" tree = DndParser('((a:3,(b:2,(c:1,d:1):1):1):2,(e:3,f:3):2);', constructor=TreeNode) result_not_deleted = deepcopy(tree) tree.removeDeleted(lambda x: x.Name in []) self.assertEqual(str(tree),str(result_not_deleted)) deleted = set(['b','d','e','f']) result_tree = DndParser('((a:3,((c:1):1):1):2);',constructor=TreeNode) is_deleted = lambda x: x.Name in deleted tree.removeDeleted(is_deleted) self.assertEqual(str(tree),str(result_tree)) def test_prune(self): """prune should reconstruct correct topology of tree.""" tree = DndParser('((a:3,((c:1):1):1):2);',constructor=TreeNode) tree.prune() result_tree = DndParser('((a:3,c:1));',constructor=TreeNode) self.assertEqual(str(tree),str(result_tree)) samename_bug = DndParser("((A,B)SAMENAME,((C,D)SAMENAME));") samename_bug.prune() exp_tree_str = '((A,B)SAMENAME,(C,D)SAMENAME);' self.assertEqual(str(samename_bug), exp_tree_str) def test_getNodeMatchingName(self): """TreeNode getNodeMatchingName should return node that matches name""" nodes = self.TreeNode root = self.TreeRoot assert root.getNodeMatchingName('g') is nodes['g'] def test_subset(self): """subset should return set of leaves that descends from node""" t = self.t self.assertEqual(t.subset(), frozenset('HGRM')) c = t.Children[0] self.assertEqual(c.subset(), frozenset('HG')) leaf = c.Children[1] 
        self.assertEqual(leaf.subset(), frozenset(''))

    def test_subsets(self):
        """subsets should return all subsets descending from a set"""
        t = self.t
        self.assertEqual(t.subsets(), frozenset(
            [frozenset('HG'), frozenset('RM')]))

    def test_compareBySubsets(self):
        """compareBySubsets should return the fraction of shared subsets"""
        result = self.t.compareBySubsets(self.t)
        self.assertEqual(result, 0)
        result = self.t2.compareBySubsets(self.t2)
        self.assertEqual(result, 0)
        result = self.t.compareBySubsets(self.t2)
        self.assertEqual(result, 0.5)
        result = self.t.compareBySubsets(self.t4)
        self.assertEqual(result, 1-2./5)
        result = self.t.compareBySubsets(self.t4, exclude_absent_taxa=True)
        self.assertEqual(result, 1-2./3)
        result = self.t.compareBySubsets(self.TreeRoot, exclude_absent_taxa=True)
        self.assertEqual(result, 1)
        result = self.t.compareBySubsets(self.TreeRoot)
        self.assertEqual(result, 1)

class PhyloNodeTests(TestCase):
    """Tests of phylogeny-specific methods."""

    def setUp(self):
        """Creates a standard tree"""
        nodes = dict([(x, PhyloNode(x)) for x in 'abcdefgh'])
        nodes['a'].append(nodes['b'])
        nodes['b'].append(nodes['c'])
        nodes['c'].append(nodes['d'])
        nodes['c'].append(nodes['e'])
        nodes['c'].append(nodes['f'])
        nodes['f'].append(nodes['g'])
        nodes['a'].append(nodes['h'])
        self.TreeNode = nodes
        self.TreeRoot = nodes['a']
        nodes['a'].Length = None
        nodes['b'].Length = 0
        nodes['c'].Length = 3
        nodes['d'].Length = 1
        nodes['e'].Length = 4
        nodes['f'].Length = 2
        nodes['g'].Length = 3
        nodes['h'].Length = 2
        self.s = '((H:1,G:1):2,(R:0.5,M:0.7):3);'
        self.t = DndParser(self.s, PhyloNode)
        self.s3 = '(((H:1,G:1,O:1):2,R:3):1,X:4);'
        self.t3 = DndParser(self.s3, PhyloNode)

    def test_init(self):
        """Check PhyloNode constructor"""
        n = PhyloNode('foo', Length=10)
        self.assertEqual(n.Name, 'foo')
        self.assertEqual(n.Length, 10)
        n = PhyloNode('bar')
        self.assertEqual(n.Name, 'bar')
        self.assertEqual(n.Length, None)
        n = PhyloNode()
        self.assertEqual(n.Name, None)
        self.assertEqual(n.Length, None)

    def test_totalDescendingBranchLength(self):
        """totalDescendingBranchLength returns total branchlength below self"""
        t = self.TreeRoot
        exp = 15
        obs = t.totalDescendingBranchLength()
        self.assertEqual(obs, exp)
        node_c = self.TreeNode['c']
        exp = 10
        obs = node_c.totalDescendingBranchLength()
        self.assertEqual(obs, exp)

    def test_tipsWithinDistance(self):
        """tipsWithinDistance returns tips that are within distance from self"""
        t_str = "(A:1,B:2,(C:3,D:3)E:2,(F,((G:1,H:2)I:2)J:3)K:2)L;"
        t = DndParser(t_str, constructor=PhyloNode)
        nodes = t.getNodesDict()
        e_node = nodes['E']
        exp_at_dist_2 = []
        exp_at_dist_3 = ['A','C','D']
        exp_at_dist_4 = ['A','B','C','D','F']
        obs_at_dist_2 = sorted([n.Name for n in e_node.tipsWithinDistance(2)])
        obs_at_dist_3 = sorted([n.Name for n in e_node.tipsWithinDistance(3)])
        obs_at_dist_4 = sorted([n.Name for n in e_node.tipsWithinDistance(4)])
        self.assertEqual(obs_at_dist_2, exp_at_dist_2)
        self.assertEqual(obs_at_dist_3, exp_at_dist_3)
        self.assertEqual(obs_at_dist_4, exp_at_dist_4)

    def test_tipsWithinDistance_nodistances(self):
        """tipsWithinDistance returns tips that are within distance from self"""
        t_str = "(A,B,(C,D)E,(F,((G,H)I)J)K)L;"
        t = DndParser(t_str, constructor=PhyloNode)
        nodes = t.getNodesDict()
        e_node = nodes['E']
        exp = sorted([n.Name for n in t.tips()])
        obs = sorted([n.Name for n in e_node.tipsWithinDistance(0)])
        self.assertEqual(obs, exp)

    def test_distance(self):
        """PhyloNode Distance should report correct distance between nodes"""
        nodes, tree = self.TreeNode, self.TreeRoot
        a, b, c, d, e, f, g, h = [nodes[x] for x in 'abcdefgh']
        self.assertEqual(a.distance(a), 0)
        self.assertEqual(a.distance(b), 0)
        self.assertEqual(a.distance(c), 3)
        self.assertEqual(a.distance(d), 4)
        self.assertEqual(a.distance(e), 7)
        self.assertEqual(a.distance(f), 5)
        self.assertEqual(a.distance(g), 8)
        self.assertEqual(a.distance(h), 2)
        self.assertEqual(b.distance(a), 0)
        self.assertEqual(b.distance(b), 0)
        self.assertEqual(b.distance(c), 3)
        self.assertEqual(b.distance(d), 4)
        self.assertEqual(b.distance(e), 7)
        self.assertEqual(b.distance(f), 5)
        self.assertEqual(b.distance(g), 8)
        self.assertEqual(b.distance(h), 2)
        self.assertEqual(c.distance(a), 3)
        self.assertEqual(c.distance(b), 3)
        self.assertEqual(c.distance(c), 0)
        self.assertEqual(c.distance(d), 1)
        self.assertEqual(c.distance(e), 4)
        self.assertEqual(c.distance(f), 2)
        self.assertEqual(c.distance(g), 5)
        self.assertEqual(c.distance(h), 5)
        self.assertEqual(d.distance(a), 4)
        self.assertEqual(d.distance(b), 4)
        self.assertEqual(d.distance(c), 1)
        self.assertEqual(d.distance(d), 0)
        self.assertEqual(d.distance(e), 5)
        self.assertEqual(d.distance(f), 3)
        self.assertEqual(d.distance(g), 6)
        self.assertEqual(d.distance(h), 6)
        self.assertEqual(e.distance(a), 7)
        self.assertEqual(e.distance(b), 7)
        self.assertEqual(e.distance(c), 4)
        self.assertEqual(e.distance(d), 5)
        self.assertEqual(e.distance(e), 0)
        self.assertEqual(e.distance(f), 6)
        self.assertEqual(e.distance(g), 9)
        self.assertEqual(e.distance(h), 9)
        self.assertEqual(f.distance(a), 5)
        self.assertEqual(f.distance(b), 5)
        self.assertEqual(f.distance(c), 2)
        self.assertEqual(f.distance(d), 3)
        self.assertEqual(f.distance(e), 6)
        self.assertEqual(f.distance(f), 0)
        self.assertEqual(f.distance(g), 3)
        self.assertEqual(f.distance(h), 7)
        self.assertEqual(g.distance(a), 8)
        self.assertEqual(g.distance(b), 8)
        self.assertEqual(g.distance(c), 5)
        self.assertEqual(g.distance(d), 6)
        self.assertEqual(g.distance(e), 9)
        self.assertEqual(g.distance(f), 3)
        self.assertEqual(g.distance(g), 0)
        self.assertEqual(g.distance(h), 10)
        self.assertEqual(h.distance(a), 2)
        self.assertEqual(h.distance(b), 2)
        self.assertEqual(h.distance(c), 5)
        self.assertEqual(h.distance(d), 6)
        self.assertEqual(h.distance(e), 9)
        self.assertEqual(h.distance(f), 7)
        self.assertEqual(h.distance(g), 10)
        self.assertEqual(h.distance(h), 0)

    def test_compareByTipDistances(self):
        obs = self.t.compareByTipDistances(self.t3)
        #note: common taxa are H, G, R (only)
        m1 = array([[0,2,6.5],[2,0,6.5],[6.5,6.5,0]])
        m2 = array([[0,2,6],[2,0,6],[6,6,0]])
        r = correlation(m1.flat, m2.flat)[0]
        self.assertEqual(obs, (1-r)/2)

    def test_compareByTipDistances_sample(self):
        obs = self.t.compareByTipDistances(self.t3, sample=3, shuffle_f=sorted)
        #note: common taxa are H, G, R (only)
        m1 = array([[0,2,6.5],[2,0,6.5],[6.5,6.5,0]])
        m2 = array([[0,2,6],[2,0,6],[6,6,0]])
        r = correlation(m1.flat, m2.flat)[0]
        self.assertEqual(obs, (1-r)/2)
        # 4 common taxa, still picking H, G, R
        s = '((H:1,G:1):2,(R:0.5,M:0.7,Q:5):3);'
        t = DndParser(s, PhyloNode)
        s3 = '(((H:1,G:1,O:1):2,R:3,Q:10):1,X:4);'
        t3 = DndParser(s3, PhyloNode)
        obs = t.compareByTipDistances(t3, sample=3, shuffle_f=sorted)

    def test_tipToTipDistances_endpoints(self):
        """Test getting specific tip distances with tipToTipDistances"""
        nodes = [self.t.getNodeMatchingName('H'),
                 self.t.getNodeMatchingName('G'),
                 self.t.getNodeMatchingName('M')]
        names = ['H','G','M']
        exp = (array([[0,2.0,6.7],[2.0,0,6.7],[6.7,6.7,0.0]]), nodes)
        obs = self.t.tipToTipDistances(endpoints=names)
        self.assertEqual(obs, exp)
        obs = self.t.tipToTipDistances(endpoints=nodes)
        self.assertEqual(obs, exp)

    def test_prune(self):
        """prune should reconstruct correct topology and Lengths of tree."""
        tree = DndParser('((a:3,((c:1):1):1):2);', constructor=PhyloNode)
        tree.prune()
        result_tree = DndParser('((a:3.0,c:3.0):2.0);', constructor=PhyloNode)
        self.assertEqual(str(tree), str(result_tree))

    def test_str(self):
        """PhyloNode str should give expected results"""
        nodes, tree = self.TreeNode, self.TreeRoot
        a, b, c, d, e, f, g, h = [nodes[x] for x in 'abcdefgh']
        self.assertEqual(str(h), 'h:2;')
        self.assertEqual(str(f), '(g:3)f:2;')
        self.assertEqual(str(a), '(((d:1,e:4,(g:3)f:2)c:3)b:0,h:2)a;')
        #check that None isn't converted any more
        h.Length = None
        c.Length = None
        #need to test both leaf and internal node
        self.assertEqual(str(a), '(((d:1,e:4,(g:3)f:2)c)b:0,h)a;')
    def test_getMaxTipTipDistance(self):
        """getMaxTipTipDistance should get max tip distance across tree"""
        nodes, tree = self.TreeNode, self.TreeRoot
        dist, names, node = tree.getMaxTipTipDistance()
        self.assertEqual(dist, 15.0) # due to nodes with single descendents!!
        self.assertEqual(sorted(names), ['e','g'])
        self.assertEqual(node.Name, 'b')

    def test_setMaxTipTipDistance(self):
        """setMaxTipTipDistance sets MaxDistTips across tree"""
        nodes, tree = self.TreeNode, self.TreeRoot
        tree.setMaxTipTipDistance()
        tip_a, tip_b = tree.MaxDistTips
        self.assertEqual(tip_a[0] + tip_b[0], 10)
        self.assertEqual(sorted([tip_a[1], tip_b[1]]), ['g','h'])

    def test_maxTipTipDistance(self):
        """maxTipTipDistance returns the max dist between any pair of tips"""
        nodes, tree = self.TreeNode, self.TreeRoot
        max_dist, tip_pair = tree.maxTipTipDistance()
        self.assertEqual(max_dist, 10)
        try:
            self.assertEqual(tip_pair, ('h', 'g'))
        except AssertionError:
            self.assertEqual(tip_pair, ('g', 'h'))

    def test__find_midpoint_nodes(self):
        """_find_midpoint_nodes should return nodes surrounding the midpoint"""
        nodes, tree = self.TreeNode, self.TreeRoot
        max_dist = 10
        tip_pair = ('g', 'h')
        result = tree._find_midpoint_nodes(max_dist, tip_pair)
        self.assertEqual(result, (nodes['b'], nodes['c']))
        tip_pair = ('h', 'g')
        result = tree._find_midpoint_nodes(max_dist, tip_pair)
        self.assertEqual(result, (nodes['f'], nodes['c']))

    def test_rootAtMidpoint(self):
        """rootAtMidpoint performs midpoint rooting"""
        nodes, tree = self.TreeNode, self.TreeRoot
        #works when the midpoint falls on an existing edge
        tree1 = deepcopy(tree)
        result = tree1.rootAtMidpoint()
        self.assertEqual(result.distance(result.getNodeMatchingName('e')), 4)
        self.assertEqual(result.getDistances(), tree1.getDistances())
        #works when the midpoint falls between two existing edges
        nodes['f'].Length = 1
        nodes['c'].Length = 4
        result = tree.rootAtMidpoint()
        self.assertEqual(result.distance(result.getNodeMatchingName('e')), 5.0)
        self.assertEqual(result.distance(result.getNodeMatchingName('g')), 5.0)
        self.assertEqual(result.distance(result.getNodeMatchingName('h')), 5.0)
        self.assertEqual(result.distance(result.getNodeMatchingName('d')), 2.0)
        self.assertEqual(result.getDistances(), tree.getDistances())

    def test_rootAtMidpoint2(self):
        """rootAtMidpoint works when midpoint is on both sides of root"""
        #also checks whether it works if the midpoint is adjacent to a tip
        nodes, tree = self.TreeNode, self.TreeRoot
        nodes['h'].Length = 20
        result = tree.rootAtMidpoint()
        self.assertEqual(result.distance(result.getNodeMatchingName('h')), 14)
        self.assertEqual(result.getDistances(), tree.getDistances())

    def test_rootAtMidpoint3(self):
        """midpoint between nodes should behave correctly"""
        tree = DndParser('(a:1,((c:1,d:2.5)n3:1,b:1)n2:1)rt;')
        tmid = tree.rootAtMidpoint()
        self.assertEqual(tmid.getDistances(), tree.getDistances())
        tipnames = tree.getTipNames()
        nontipnames = [t.Name for t in tree.nontips()]
        self.assertTrue(tmid.isRoot())
        self.assertEqual(tmid.distance(tmid.getNodeMatchingName('d')), 2.75)

    def test_rootAtMidpoint4(self):
        """midpoint should be selected correctly when it is an internal node"""
        tree = DndParser('(a:1,((c:1,d:3)n3:1,b:1)n2:1)rt;')
        tmid = tree.rootAtMidpoint()
        self.assertEqual(tmid.getDistances(), tree.getDistances())
        tipnames = tree.getTipNames()
        nontipnames = [t.Name for t in tree.nontips()]
        # for tipname in tipnames:
        #     tmid_tip = tmid.getNodeMatchingName(tipname)
        #     orig_tip = tree.getNodeMatchingName(tipname)
        #     for nontipname in nontipnames:
        #         tmid_dist = \
        #             tmid.getNodeMatchingName(nontipname).distance(tmid_tip)
        #         orig_dist = \
        #             tree.getNodeMatchingName(nontipname).distance(orig_tip)
        #         print nontipname, tipname, 'assert'
        #         self.assertEqual(tmid_dist, orig_dist)
        self.assertTrue(tmid.isRoot())
        self.assertEqual(tmid.distance(tmid.getNodeMatchingName('d')), 3)

    def test_rootAtMidpoint5(self):
        """midpoint should be selected correctly when on an even 2tip tree"""
        tree = DndParser('''(BLO_1:0.649351,BLO_2:0.649351):0.0;''')
        tmid = tree.rootAtMidpoint()
        self.assertEqual(tmid.getDistances(), tree.getDistances())
        tipnames = tree.getTipNames()
        nontipnames = [t.Name for t in tree.nontips()]
        self.assertTrue(tmid.isRoot())
        self.assertFloatEqual(
            tmid.distance(tmid.getNodeMatchingName('BLO_2')), 0.649351)
        self.assertFloatEqual(
            tmid.distance(tmid.getNodeMatchingName('BLO_1')), 0.649351)
        self.assertFloatEqual(tmid[0].distance(tmid[1]), 2.0 * 0.649351)

    def test_setTipDistances(self):
        """setTipDistances should correctly set tip distances."""
        tree = DndParser(
            '(((A1:.1,B1:.1):.1,(A2:.1,B2:.1):.1):.3,'
            '((A3:.1,B3:.1):.1,(A4:.1,B4:.1):.1):.3);',
            constructor=PhyloNode)
        #expected distances for a post order traversal
        expected_tip_distances = [0,0,0.1,0,0,0.1,0.2,0,0,0.1,0,0,0.1,0.2,0.5]
        #tips should have distance of 0
        tree.setTipDistances()
        for node in tree.tips():
            self.assertEqual(node.TipDistance, 0)
        idx = 0
        for node in tree.traverse(self_before=False, self_after=True):
            self.assertEqual(node.TipDistance, expected_tip_distances[idx])
            idx += 1

    def test_scaleBranchLengths(self):
        """scaleBranchLengths should correctly scale branch lengths."""
        tree = DndParser(
            '(((A1:.1,B1:.1):.1,(A2:.1,B2:.1):.1):.3,'
            '((A3:.1,B3:.1):.1,(A4:.1,B4:.1):.1):.3);',
            constructor=PhyloNode)
        tree.scaleBranchLengths(max_length=100, ultrametric=True)
        expected_tree = ('(((A1:20,B1:20):20,(A2:20,B2:20):20):60,'
            '((A3:20,B3:20):20,(A4:20,B4:20):20):60);')
        self.assertEqual(str(tree), expected_tree)

    def test_unrooted(self):
        """unrooted should preserve tips, drop a node"""
        rooted = LoadTree(treestring="(B:0.2,(C:0.2,D:0.2)F:0.2)G;")
        unrooted = rooted.unrooted()
        self.assertEqual(sorted(rooted.getTipNames()),
            sorted(unrooted.getTipNames()))
        self.assertLessThan(len(unrooted.getNodeNames()),
            len(rooted.getNodeNames()))

class Test_tip_tip_distances_I(object):
    """Abstract class for testing different implementations of tip_to_tip."""

    def setUp(self):
        """Define a few standard trees"""
        constructor = PhyloNode
        self.root_std = DndParser(tree_std, constructor)
        self.root_one_level = DndParser(tree_one_level, constructor)
        self.root_two_level = DndParser(tree_two_level, constructor)
        self.root_one_child = DndParser(tree_one_child, constructor)

    def test_one_level(self):
        """tip_to_tip should work for one-level multifurcating tree"""
        matrix, order = self.fun(self.root_one_level)
        self.assertEqual([i.Name for i in order], list('abc'))
        self.assertEqual(matrix, array([[0,3,4],[3,0,5],[4,5,0]]))

    def test_two_level(self):
        """tip_to_tip should work for two-level tree"""
        matrix, order = self.fun(self.root_two_level)
        self.assertEqual([i.Name for i in order], list('abcd'))
        self.assertFloatEqual(matrix,
            array([[0,3,4,1.4],[3,0,5,2.4],[4,5,0,3.4],[1.4,2.4,3.4,0]]))

class Test_tip_tip_distances_array(Test_tip_tip_distances_I, TestCase):
    """Tests for the array implementation of tip_to_tip distances"""

    def setUp(self):
        """Specify which method to call."""
        self.fun = lambda x: x.tipToTipDistances()
        super(Test_tip_tip_distances_array, self).setUp()

    def test_std(self):
        """tip_to_tip should work for small but complex tree"""
        dist, tips = self.fun(self.root_std)
        tips = [tip.Name for tip in tips]
        self.assertEqual(dist, tree_std_dist)
        self.assertEqual(tips, tree_std_tips)

    def test_one_child(self):
        """tip_to_tip should work for tree with a single child"""
        dist, tips = self.fun(self.root_one_child)
        tips = [tip.Name for tip in tips]
        self.assertEqual(dist, tree_one_child_dist)
        self.assertEqual(tips, tree_one_child_tips)

# for use with testing iterative copy method
def comb_tree(num_leaves):
    """Returns a comb node_class tree."""
    branch_child = 1
    root = TreeNode()
    curr = root
    for i in range(num_leaves-1):
        curr.Children[:] = [TreeNode(Parent=curr), TreeNode(Parent=curr)]
        curr = curr.Children[branch_child]
    return root

# Moved from test_tree2.py during code sprint on 04/14/10
# Missing tests: edge attributes (Name, Length, Children) only get tested
# in passing by some of these tests. See also xxx's

class TreeInterfaceForLikelihoodFunction(TestCase):

    default_newick = "((A:1,B:2)ab:3,((C:4,D:5)cd,E:6)cde:7)"

    def _maketree(self, treestring=None):
        if treestring is None:
            treestring = self.default_newick
        return LoadTree(treestring=treestring, underscore_unmunge=True)

    def setUp(self):
        self.default_tree = self._maketree()

    def test_getEdgeNames_with_outgroup(self):
        tree = self._maketree()
        for (a, b, outgroup, result) in [
                ('A', 'B', None, ['A', 'B']),
                ('E', 'C', None, ['C', 'D', 'cd', 'E']),
                ('C', 'D', 'E', ['C', 'D'])]:
            self.assertEqual(tree.getEdgeNames(a, b, True, False, outgroup),
                result)

    def test_parser(self):
        """nasty newick"""
        nasty = "( (A :1.0,'B (b)': 2) [com\nment]pair:3,'longer name''s':4)dash_ed;"
        nice = "((A:1.0,'B (b)':2.0)pair:3.0,'longer name''s':4.0)dash_ed;"
        tree = self._maketree(nasty)
        tidied = tree.getNewick(with_distances=1)
        self.assertEqual(tidied, nice)

    # Likelihood Function Interface

    def test_getEdgeNames(self):
        tree = self.default_tree
        clade = tree.getEdgeNames('C', 'E', getstem=0, getclade=1)
        clade.sort()
        self.assertEqual(clade, ['C', 'D', 'E', 'cd'])
        all = tree.getEdgeNames('C', 'E', getstem=1, getclade=1)
        all.sort()
        self.assertEqual(all, ['C', 'D', 'E', 'cd', 'cde'])
        stem = tree.getEdgeNames('C', 'E', getstem=1, getclade=0)
        self.assertEqual(stem, ['cde'])

    def test_getEdgeNamesUseOutgroup(self):
        t1 = LoadTree(treestring="((A,B)ab,(F,(C,D)cd)cdf,E)root;")
        # a, e, ogroup f
        t2 = LoadTree(treestring="((E,(A,B)ab)abe,F,(C,D)cd)root;")
        expected = ['A', 'B', 'E', 'ab']
        for t in [t1, t2]:
            edges = t.getEdgeNames('A', 'E', getstem=False, getclade=True,
                outgroup_name="F")
            edges.sort()
            self.assertEqual(expected, edges)

    def test_getConnectingNode(self):
        tree = self.default_tree
        self.assertEqual(tree.getConnectingNode('A', 'B').Name, 'ab')
        self.assertEqual(tree.getConnectingNode('A', 'C').Name, 'root')

    def test_getNodeMatchingName(self):
        tree = self.default_tree
        for (name, expect_tip) in [('A', True), ('ab', False)]:
            edge = tree.getNodeMatchingName(name)
            self.assertEqual(edge.Name, name)
            self.assertEqual(edge.istip(), expect_tip)

    def test_getEdgeVector(self):
        tree = self.default_tree
        names = [e.Name for e in tree.getEdgeVector()]
        self.assertEqual(names,
            ['A', 'B', 'ab', 'C', 'D', 'cd', 'E', 'cde', 'root'])

    def test_getNewickRecursive(self):
        orig = "((A:1.0,B:2.0)ab:3.0,((C:4.0,D:5.0)cd:6.0,E:7.0)cde:8.0)all;"
        unlen = "((A,B)ab,((C,D)cd,E)cde)all;"
        tree = self._maketree(orig)
        self.assertEqual(tree.getNewickRecursive(with_distances=1), orig)
        self.assertEqual(tree.getNewickRecursive(), unlen)
        tree.Name = "a'l"
        ugly_name = "((A,B)ab,((C,D)cd,E)cde)a'l;"
        ugly_name_esc = "((A,B)ab,((C,D)cd,E)cde)'a''l';"
        self.assertEqual(tree.getNewickRecursive(escape_name=True),
            ugly_name_esc)
        self.assertEqual(tree.getNewickRecursive(escape_name=False), ugly_name)
        tree.Name = "a_l"
        ugly_name = "((A,B)ab,((C,D)cd,E)cde)a_l;"
        ugly_name_esc = "((A,B)ab,((C,D)cd,E)cde)'a_l';"
        self.assertEqual(tree.getNewickRecursive(escape_name=True),
            ugly_name_esc)
        self.assertEqual(tree.getNewickRecursive(escape_name=False), ugly_name)
        tree.Name = "a l"
        ugly_name = "((A,B)ab,((C,D)cd,E)cde)a l;"
        ugly_name_esc = "((A,B)ab,((C,D)cd,E)cde)a_l;"
        self.assertEqual(tree.getNewickRecursive(escape_name=True),
            ugly_name_esc)
        self.assertEqual(tree.getNewickRecursive(escape_name=False), ugly_name)
        tree.Name = "'a l'"
        quoted_name = "((A,B)ab,((C,D)cd,E)cde)'a l';"
        quoted_name_esc = "((A,B)ab,((C,D)cd,E)cde)'a l';"
        self.assertEqual(tree.getNewickRecursive(escape_name=True),
            quoted_name_esc)
        self.assertEqual(tree.getNewickRecursive(escape_name=False),
            quoted_name)

    def test_getNewick(self):
        orig = "((A:1.0,B:2.0)ab:3.0,((C:4.0,D:5.0)cd:6.0,E:7.0)cde:8.0)all;"
        unlen = "((A,B)ab,((C,D)cd,E)cde)all;"
        tree = self._maketree(orig)
        self.assertEqual(tree.getNewick(with_distances=1), orig)
        self.assertEqual(tree.getNewick(), unlen)
        tree.Name = "a'l"
        ugly_name = "((A,B)ab,((C,D)cd,E)cde)a'l;"
        ugly_name_esc = "((A,B)ab,((C,D)cd,E)cde)'a''l';"
        self.assertEqual(tree.getNewick(escape_name=True), ugly_name_esc)
        self.assertEqual(tree.getNewick(escape_name=False), ugly_name)
        tree.Name = "a_l"
        ugly_name = "((A,B)ab,((C,D)cd,E)cde)a_l;"
        ugly_name_esc = "((A,B)ab,((C,D)cd,E)cde)'a_l';"
        self.assertEqual(tree.getNewick(escape_name=True), ugly_name_esc)
        self.assertEqual(tree.getNewick(escape_name=False), ugly_name)
        tree.Name = "a l"
        ugly_name = "((A,B)ab,((C,D)cd,E)cde)a l;"
        ugly_name_esc = "((A,B)ab,((C,D)cd,E)cde)a_l;"
        self.assertEqual(tree.getNewick(escape_name=True), ugly_name_esc)
        self.assertEqual(tree.getNewick(escape_name=False), ugly_name)
        tree.Name = "'a l'"
        quoted_name = "((A,B)ab,((C,D)cd,E)cde)'a l';"
        quoted_name_esc = "((A,B)ab,((C,D)cd,E)cde)'a l';"
        self.assertEqual(tree.getNewick(escape_name=True), quoted_name_esc)
        self.assertEqual(tree.getNewick(escape_name=False), quoted_name)

    def test_XML(self):
        # should add some non-length parameters
        orig = self.default_tree
        xml = orig.getXML()
        parsed = LoadTree(treestring=xml)
        self.assertEqual(str(orig), str(parsed))

    # Magic methods

    def test_str(self):
        """testing (well, exercising at least), __str__"""
        str(self.default_tree)

    def test_repr(self):
        """testing (well, exercising at least), __repr__"""
        repr(self.default_tree)

    def test_eq(self):
        """testing (well, exercising at least), __eq__"""
        # xxx not good enough!
        t1 = self._maketree()
        t2 = self._maketree()
        self.assertTrue(t1 == t1)
        self.assertFalse(t1 == t2)

    def test_balanced(self):
        """balancing an unrooted tree"""
        t = LoadTree(treestring='((a,b),((c1,(c2,(c3,(c4,(c5,(c6,c7)))))),(d,e)),f)')
        b = LoadTree(treestring='(c1,(c2,(c3,(c4,(c5,(c6,c7))))),((d,e),((a,b),f)))')
        self.assertEqual(str(t.balanced()), str(b))

    def test_params_merge(self):
        t = LoadTree(treestring='((((a,b)ab,c)abc),d)')
        for (label, length, beta) in [('a',1,20), ('b',3,2.0), ('ab',4,5.0)]:
            t.getNodeMatchingName(label).params = {'length':length, 'beta':beta}
        t = t.getSubTree(['b', 'c', 'd'])
        self.assertEqual(t.getNodeMatchingName('b').params,
            {'length':7, 'beta':float(2*3+4*5)/(3+4)})
        self.assertRaises(ValueError, t.getSubTree, ['b','c','xxx'])
        self.assertEqual(str(t.getSubTree(['b','c','xxx'], ignore_missing=True)),
            '(b:7,c)root;')

    def test_making_from_list(self):
        tipnames_with_spaces = ['a_b','a b',"T'lk"]
        tipnames_with_spaces.sort()
        t = LoadTree(tip_names=tipnames_with_spaces)
        result = t.getTipNames()
        result.sort()
        assert result == tipnames_with_spaces

    def test_getsetParamValue(self):
        """test getting, setting of param values"""
        t = LoadTree(treestring='((((a:.2,b:.3)ab:.1,c:.3)abc:.4),d:.6)')
        self.assertEqual(t.getParamValue('length', 'ab'), 0.1)
        t.setParamValue('zz', 'ab', 4.321)
        node = t.getNodeMatchingName('ab')
        self.assertEqual(4.321, node.params['zz'])

class SmallTreeReshapeTestClass(TestCase):

    def test_rootswaps(self):
        """testing (well, exercising at least), unrooted"""
        new_tree = LoadTree(treestring="((a,b),(c,d))")
        new_tree = new_tree.unrooted()
        self.assert_(len(new_tree.Children) > 2, 'not unrooted right')

    def test_reroot(self):
        tree = LoadTree(treestring="((a,b),(c,d),e)")
        tree2 = tree.rootedWithTip('b')
        self.assertEqual(tree2.getNewick(), "(a,b,((c,d),e));")

    def test_sameShape(self):
        """test topology assessment"""
        t1 = LoadTree(treestring="(((s1,s5),s3),s2,s4);")
        t2 = LoadTree(treestring="((s1,s5),(s2,s4),s3);")
        t3 = LoadTree(treestring="((s1,s4),(s2,s5),s3);")
        assert t1.sameTopology(t2), (t1, t2)
        assert not t1.sameTopology(t3), (t1, t3)
        assert not t2.sameTopology(t3), (t2, t3)

#=============================================================================
# these are tests involving tree manipulation methods
# hence, testing them for small and big trees
# the tests are written once for the small tree, the big tree
# tests are performed by inheriting from this class, but over-riding
# the setUp.

class TestTree(TestCase):
    """tests for a single tree-type"""

    def setUp(self):
        self.name = 'small tree - '
        self.otu_names = ['NineBande', 'Mouse', 'HowlerMon', 'DogFaced']
        self.otu_names.sort()
        self.newick = '(((Human,HowlerMon),Mouse),NineBande,DogFaced);'
        self.newick_sorted = '(DogFaced,((HowlerMon,Human),Mouse),NineBande);'
        self.newick_reduced = '((HowlerMon,Mouse),NineBande,DogFaced);'
        self.tree = LoadTree(treestring=self.newick)

    def test_sorttree(self):
        """testing (well, exercising at least) treesort"""
        new_tree = self.tree.sorted()
        if hasattr(self, 'newick_sorted'):
            self.assertEqual(
                self.newick_sorted,
                new_tree.getNewick(with_distances=0))

    def test_getsubtree(self):
        """testing getting a subtree"""
        subtree = self.tree.unrooted().getSubTree(self.otu_names)
        new_tree = LoadTree(treestring=self.newick_reduced).unrooted()
        # check we get the same names
        self.assertEqual(*[len(t.Children) for t in (subtree, new_tree)])
        self.assertEqual(str(subtree), str(new_tree))

    def test_getsubtree_2(self):
        """tree.getSubTree() has same pairwise tip dists as tree (len0 node)"""
        t1 = DndParser('((a:1,b:2):4,((c:3, j:17.2):0,(d:1,e:1):2):3)',
            PhyloNode) # note c,j is len 0 node
        orig_dists = t1.getDistances()
        subtree = t1.getSubTree(set(['a','b','d','e','c']))
        sub_dists = subtree.getDistances()
        for pair, dist in sub_dists.items():
            self.assertEqual((pair, dist), (pair, orig_dists[pair]))

    def test_getsubtree_3(self):
        """tree.getSubTree() has same pairwise tip dists as tree (nonzero nodes)"""
        t1 = DndParser('((a:1,b:2):4,((c:3, j:17):0,(d:1,e:1):2):3)',
            PhyloNode) # note c,j is len 0 node
        orig_dists = t1.getDistances()
        subtree = t1.getSubTree(set(['a','b','d','e','c']))
        sub_dists = subtree.getDistances()
        # for pair, dist in sub_dists.items():
        #     self.assertEqual((pair,dist), (pair,orig_dists[pair]))
        t2 = DndParser('((a:1,b:2):4,((c:2, j:16):1,(d:1,e:1):2):3)',
            PhyloNode) # note c,j similar to above
        t2_dists = t2.getDistances()
        # ensure t2 is same as t1, except j->c or c->j
        for pair, dist in t2_dists.items():
            if (pair == ('c','j')) or (pair == ('j','c')):
                continue
            self.assertEqual((pair, dist), (pair, orig_dists[pair]))
        sub2 = t2.getSubTree(set(['a','b','d','e','c']))
        sub2_dists = sub2.getDistances()
        for pair, dist in sub2_dists.items():
            self.assertEqual((pair, dist), (pair, orig_dists[pair]))

    def test_getsubtree_4(self):
        """tree.getSubTree() handles keep_root correctly"""
        t1 = DndParser('((a:1,b:2):4,(((c:2)cparent:1, j:17):0,(d:1,e:4):2):3)')
        #          /----4--- /--1-a
        # ---------|         \--2-b
        #          |    /----0--- /-1---cparent---2---c
        #          \---3----|      \--17-j
        #                   \----2--- /--1--d
        #                              \--4--e
        # note c,j is len 0 node
        true_dists = {
            ('a', 'b'): 3.0, ('a', 'c'): 11.0, ('a', 'd'): 11.0,
            ('a', 'e'): 14.0, ('a', 'j'): 25.0,
            ('b', 'a'): 3.0, ('b', 'c'): 12.0, ('b', 'd'): 12.0,
            ('b', 'e'): 15.0, ('b', 'j'): 26.0,
            ('c', 'a'): 11.0, ('c', 'b'): 12.0, ('c', 'd'): 6.0,
            ('c', 'e'): 9.0, ('c', 'j'): 20.0,
            ('d', 'a'): 11.0, ('d', 'b'): 12.0, ('d', 'c'): 6.0,
            ('d', 'e'): 5.0, ('d', 'j'): 20.0,
            ('e', 'a'): 14.0, ('e', 'b'): 15.0, ('e', 'c'): 9.0,
            ('e', 'd'): 5.0, ('e', 'j'): 23.0,
            ('j', 'a'): 25.0, ('j', 'b'): 26.0, ('j', 'c'): 20.0,
            ('j', 'd'): 20.0, ('j', 'e'): 23.0}
        true_root_dists = {'a':5, 'b':6, 'c':6, 'j':20, 'd':6, 'e':9}
        t1_dists = t1.getDistances()
        subtree = t1.getSubTree(set(['d','e','c']))
        sub_dists = subtree.getDistances()
        true_sub_root_dists = {'c':3, 'd':3, 'e':6}
        sub_sameroot = t1.getSubTree(set(['d','e','c']), keep_root=True)
        sub_sameroot_dists = sub_sameroot.getDistances()
        sub_sameroot2
= t1.getSubTree(set(['j','c']), keep_root=True) sub_sameroot_dists2 = sub_sameroot2.getDistances() # tip to tip dists should be the same for tip_pair in sub_dists.keys(): self.assertEqual(sub_dists[tip_pair],true_dists[tip_pair]) for tip_pair in t1_dists.keys(): self.assertEqual(t1_dists[tip_pair],true_dists[tip_pair]) for tip_pair in sub_sameroot_dists.keys(): self.assertEqual(sub_sameroot_dists[tip_pair], true_dists[tip_pair]) for tip_pair in sub_sameroot_dists2.keys(): self.assertEqual(sub_sameroot_dists2[tip_pair], true_dists[tip_pair]) # sameroot should have longer root to tip dists for tip in t1.tips(): self.assertFloatEqual(t1.distance(tip), true_root_dists[tip.Name]) for tip in subtree.tips(): self.assertFloatEqual(subtree.distance(tip), true_sub_root_dists[tip.Name]) for tip in sub_sameroot.tips(): self.assertFloatEqual(sub_sameroot.distance(tip), true_root_dists[tip.Name]) for tip in sub_sameroot2.tips(): self.assertFloatEqual(sub_sameroot2.distance(tip), true_root_dists[tip.Name]) def test_ascii(self): self.tree.asciiArt() # unlabeled internal node tr = DndParser("(B:0.2,(C:0.3,D:0.4):0.6)F;") obs = tr.asciiArt(show_internal=True, compact=False) exp = """ /-B\n-F-------|\n | /-C\n \\--------|\n \\-D""" self.assertEqual(obs, exp) obs = tr.asciiArt(show_internal=True, compact=True) exp = """-F------- /-B\n \-------- /-C\n \-D""" self.assertEqual(obs, exp) obs = tr.asciiArt(show_internal=False, compact=False) exp = """ /-B\n---------|\n | /-C\n \\--------|\n \\-D""" self.assertEqual(obs, exp) # the following class repeats the above tests but using a big tree and big data-set class BigTreeSingleTests(TestTree): """using the big-tree for single-tree tests""" def setUp(self): self.name = 'big tree - ' self.otu_names = ['Horse', 'TombBat', 'Rhino', 'Pig', 'AsianElep', 'SpermWhal', 'Cat', 'Gorilla', 'Orangutan', 'bandicoot', 'Hedgehog', 'Sloth', 'HairyArma', 'Manatee', 'GoldenMol', 'Pangolin'] self.otu_names.sort() self.newick = 
'((((((((FlyingFox,DogFaced),((FreeTaile,LittleBro),(TombBat,RoundEare))),(FalseVamp,LeafNose)),(((Horse,Rhino),(Pangolin,(Cat,Dog))),(Llama,(Pig,(Cow,(Hippo,(SpermWhal,HumpbackW))))))),(Mole,Hedgehog)),(TreeShrew,(FlyingLem,((Jackrabbit,(FlyingSqu,(OldWorld,(Mouse,Rat)))),(Galago,(HowlerMon,(Rhesus,(Orangutan,(Gorilla,(Human,Chimpanzee)))))))))),(((NineBande,HairyArma),(Anteater,Sloth)),(((Dugong,Manatee),((AfricanEl,AsianElep),(RockHyrax,TreeHyrax))),(Aardvark,((GoldenMol,(Madagascar,Tenrec)),(LesserEle,GiantElep)))))),(caenolest,(phascogale,(wombat,bandicoot))));' self.newick_reduced = '(((((TombBat,(((Horse,Rhino),(Pangolin,Cat)),(Pig,SpermWhal))),Hedgehog),(Orangutan,Gorilla)),((HairyArma,Sloth),((Manatee,AsianElep),GoldenMol))),bandicoot);' self.tree = LoadTree(treestring = self.newick) def test_getEdgeNames(self): """testing (well, exercising at least), getedgenames""" # Fell over on small tree because "stem descended from root # joiner was a tip" a,b = self.otu_names[:2] clade = self.tree.getEdgeNames(a, b, True, False) def test_getTipNames(self): """testing (well, exercising at least), getTipNames""" a,b = self.otu_names[:2] tips = self.tree.getTipNames() self.assertEqual(len(tips), 55) #run if called from command line if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_core/test_usage.py000644 000765 000024 00000066017 12024702176 022121 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of classes dealing with base, codon, amino acid usage. 
""" from __future__ import division from cogent.util.unit_test import TestCase, main from cogent.util.misc import FunctionWrapper from cogent.core.usage import InfoFreqs, AminoAcidUsage, BaseUsage, CodonUsage,\ PositionalBaseUsage, UnsafeBaseUsage, EqualBases, DinucUsage from cogent.core.genetic_code import GeneticCodes, GeneticCode __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class InfoFreqsTests(TestCase): """Tests of the InfoFreqs base class""" class kp_empty(InfoFreqs): pass class kp(InfoFreqs): Mask = FunctionWrapper(int) RequiredKeys = dict.fromkeys([1,2,3]) class has_info(list): def __new__(cls, data, *args): return list.__new__(cls, data) def __init__(self, data, info): list.__init__(self, data) self.Info = info def test_init_empty(self): """InfoFreqs empty init should include only required items""" self.assertEqual(self.kp_empty(), {}) self.assertEqual(self.kp(), {1:0, 2:0, 3:0}) def test_init_data(self): """InfoFreqs init with data should count freqs""" self.assertEqual(self.kp_empty('1qazaz'), {'1':1,'q':1,'a':2,'z':2}) self.assertEqual(self.kp([1,2,3,1]), {1:2, 2:1, 3:1}) #type of exception on invalid conversion currently not specified self.assertRaises(Exception, self.kp, '123q') def test_init_info(self): """InfoFreqs init from object with Info should get ref to Info""" s = self.has_info([1,1,1,2,2,3], {'test':'data'}) obj = self.kp(s) self.assertEqual(obj, {1:3, 2:2, 3:1}) self.assertEqual(obj.Info, {'test':'data'}) #should use its own info if supplied, though obj = self.kp(s, {'new':'info'}) self.assertEqual(obj, {1:3, 2:2, 3:1}) self.assertNotEqual(obj.Info, {'test':'data'}) self.assertEqual(obj.Info, {'new':'info'}) def test_info(self): """InfoFreqs info changes should work as expected.""" class has_attr(object): def __init__(self, 
attr): self.Attr = attr s = self.has_info([1,1,1,2,2,3,3], {'test':'data'}) obj = self.kp(s) self.assertEqual(obj, {1:3, 2:2, 3:2}) self.assertEqual(obj.Info, {'test':'data'}) self.assertRaises(AttributeError, obj.__getattr__, 'Attr') obj.Info = has_attr('Test') self.assertEqual(obj.Attr, 'Test') def test_normalize(self): """InfoFreqs normalize should delete non-required keys, if any required""" e = self.kp_empty('1qaa') self.assertEqual(e, {'1':1,'q':1,'a':2,}) e.normalize() #should not delete any keys, since kp_empty has no required keys self.assertEqual(e, {'1':0.25, 'q':0.25, 'a':0.5}) k = self.kp([1,2,3,1,4,4,4]) self.assertEqual(k, {1:2, 2:1, 3:1, 4:3}) k.normalize() #should delete 4, since it's not required self.assertEqual(k, {1:0.5, 2:0.25, 3:0.25}) class BaseUsageTests(TestCase): """Tests of the BaseUsage class.""" def test_init_empty(self): """BaseUsage should init with empty freqs""" b = BaseUsage() self.assertEqual(len(b), 4) for nt in 'UTACGutacg': assert nt in b self.assertEqual(b[nt], 0) for nt in 'UCAG': assert nt in b.keys() items = list(iter(b)) items.sort() self.assertEqual(items, ['A', 'C', 'G', 'U']) def test_init_data(self): """BaseUsage should init with arbitrary data""" b = BaseUsage('UUUUUGGGCA') self.assertEqual(b, {'U':5, 'G':3, 'C':1, 'A':1}) b.normalize() self.assertEqual(b, {'U':0.5, 'G':0.3, 'C':0.1, 'A':0.1}) def test_setitem(self): """BaseUsage should map keys on setitem""" b = BaseUsage() b['t'] = 3 b['G'] = 3 b.normalize() i = b.items() i.sort() self.assertEqual(i, [('A',0.0),('C',0.0),('G',0.5),('U',0.5)]) def test_getitem(self): """BaseUsage should map key on getitem""" b = BaseUsage({'a':3, 'T':2, 'X':1}) self.assertEqual(b['X'], 1) self.assertEqual(b['A'], 3) self.assertEqual(b['U'], 2) self.assertEqual(b['a'], 3) self.assertEqual(b['t'], 2) b.normalize() assert 'X' not in b self.assertFloatEqual(b['A'], 0.6) self.assertFloatEqual(b['u'], 0.4) def test_getitem_multi(self): """BaseUsage should get total frequency on getitem 
with 2-letter key""" b = BaseUsage({'a':3, 'T':2, 'C':5, 'X':1}) self.assertEqual(b['AT'], 0.5) self.assertEqual(b['AC'], 0.8) self.assertEqual(b['TC'], 0.7) self.assertEqual(b['AX'], 0.3) def test_copy(self): """BaseUsage copy should work correctly""" b = BaseUsage({'a':3, 'T':2, 'C':5, 'X':1}) c = b.copy() self.assertEqual(c['AT'], 0.5) def test_bases(self): """BaseUsage bases() should return same object""" b = BaseUsage({'a':3, 'T':2, 'C':5, 'X':1}) c = b.bases() assert b is c def test_codons(self): """BaseUsage codons should return most likely codon freqs""" b = BaseUsage({'a':3, 'T':2, 'X':1}) c = b.codons() known = { 'AAA' : .6 * .6 * .6, 'AAU' : .6 * .6 * .4, 'AUA' : .6 * .4 * .6, 'AUU' : .6 * .4 * .4, 'UAA' : .4 * .6 * .6, 'UAU' : .4 * .6 * .4, 'UUA' : .4 * .4 * .6, 'UUU' : .4 * .4 * .4, } for codon in c: if codon in known: self.assertFloatEqual(c[codon], known[codon]) else: self.assertEqual(c[codon], 0) def test_positionalBases(self): """BaseUsage positionalBases should have copy of self at each position""" b = BaseUsage('A') p = b.positionalBases() for i in p: assert i is not b self.assertEqual(i, {'A':1,'U':0,'C':0,'G':0}) def test_aminoAcids(self): """BaseUsage aminoAcids should give the same results as the codons""" known_data = { 'AAA' : .6 * .6 * .6, 'AAU' : .6 * .6 * .4, 'AUA' : .6 * .4 * .6, 'AUU' : .6 * .4 * .4, 'UAA' : .4 * .6 * .6, 'UAU' : .4 * .6 * .4, 'UUA' : .4 * .4 * .6, 'UUU' : .4 * .4 * .4, } known = CodonUsage(known_data) b = BaseUsage({'a':3, 'T':2, 'X':1}) self.assertEqual(b.aminoAcids(), known.aminoAcids()) #check that the genetic code is passed through correctly all_g = GeneticCode('G'*64) self.assertEqual(b.aminoAcids(all_g), AminoAcidUsage({'G':1})) def test_normalize(self): """BaseUsage normalize should work when empty""" b = BaseUsage() b.normalize() self.assertEqual(b, {'U':0,'C':0,'A':0,'G':0}) b = BaseUsage('AACG') b.normalize() self.assertEqual(b, {'U':0, 'C':0.25, 'A':0.5, 'G':0.25}) def test_distance(self): """BaseUsage 
distance() should return dist between two BUs""" #absolute numbers, will normalize to calculate distance self.assertFloatEqual(BaseUsage('GC').distance(BaseUsage('AU')),1) self.assertFloatEqual(BaseUsage('AU').distance(BaseUsage('GC')),1) self.assertFloatEqual(BaseUsage('GCAU').distance(BaseUsage('AUAU')),.5) #should work even against dict with 'T's self.assertFloatEqual(BaseUsage('GCAU').distance(\ BaseUsage({'A':2,'T':2,'C':0,'G':0,})),.5) #rounding error self.assertEqual(BaseUsage('ACG').distance(BaseUsage('CCGGAA')),0) #normalized - as in unit simplex ag = BaseUsage('AG') ag.normalize() uc = BaseUsage('UC') uc.normalize() self.assertFloatEqual(ag.distance(uc),1) self.assertFloatEqual(BaseUsage({'A':0.4,'G':0.1,'C':0.4,'U':0.1})\ .distance(BaseUsage({'A':0.1,'G':0.4,'U':0.4,'C':0.1})),0.6) self.assertFloatEqual(BaseUsage({'A':0.25,'G':0.25,'C':0.25,'U':0.25})\ .distance(BaseUsage({'A':0.25,'G':0.25,'U':0.25,'C':0.25})),0.0) self.assertFloatEqual(BaseUsage({'A':0.245,'G':0.255,'C':0.245,\ 'U':0.255}).distance(BaseUsage({'A':0.255,'G':0.245,'U':0.245,\ 'C':0.255})),0.02) self.assertFloatEqual(BaseUsage({'A':0.245,'G':0.255,'C':0.245,\ 'U':0.255}).distance(BaseUsage({'A':0.25,'G':0.25,'U':0.25,\ 'C':0.25})),0.01) self.assertFloatEqual(BaseUsage({'A':0.248,'G':0.252,'C':0.248,\ 'U':0.252}).distance(BaseUsage({'A':0.25,'G':0.25,'U':0.25,\ 'C':0.25})),0.004) def test_content(self): """BaseUsage content should return sum of specified bases.""" b = BaseUsage('UUUUUCCCAG') #should work for combinations self.assertEqual(b.content('UCAG'), 10) self.assertEqual(b.content('GC'), 4) self.assertEqual(b.content('AU'), 6) #should map T to U self.assertEqual(b.content('AT'), 6) #should work for single bases self.assertEqual(b.content('U'), 5) #shouldn't complain about invalid bases self.assertEqual(b.content('X'), 0) def test_add(self): """BaseUsage add should sum two base usages""" b = BaseUsage('U') b2 = BaseUsage('C') self.assertEqual(b+b2, BaseUsage('UC')) b += b2 
self.assertEqual(b, BaseUsage('UC')) def test_toCartesian(self): """BaseUsage toCartesian should return x, y, z from instance""" b = BaseUsage('ACGU') self.assertEqual(b.toCartesian(), (0.5,0.5,0.5)) b = BaseUsage('A') self.assertEqual(b.toCartesian(), (0,0,1)) b = BaseUsage('CGA') self.assertEqual(b.toCartesian(), (1/3.0,1/3.0,1/3.0)) def test_fromCartesian(self): """BaseUsage fromCartesian should init instance from x,y,z""" b = BaseUsage.fromCartesian(0.5,.5,.5) self.assertFloatEqual(b['A'], 0.25) self.assertFloatEqual(b['C'], 0.25) self.assertFloatEqual(b['G'], 0.25) self.assertFloatEqual(b['U'], 0.25) b = BaseUsage.fromCartesian(1/3.0, 1/3.0, 1/3.0) self.assertEqual(b['U'], 0) self.assertEqual(b['A'], 1/3.0) class UnsafeBaseUsageTests(TestCase): """Tests of the UnsafeBaseUsage class.""" def test_init(self): """UnsafeFreqs uses dict init, so must use += after creation for others""" self.assertRaises(ValueError, UnsafeBaseUsage, 'acguc') u = UnsafeBaseUsage() u += 'acgtc' #note lack of conversion to uppercase RNA! 
self.assertEqual(u, {'a':1,'c':2,'g':1,'t':1}) def test_normalize(self): """UnsafeFreqs should normalize based on the uppercase RNA alphabet""" u = UnsafeBaseUsage() #note that the lower-case u will be discarded, not converted u += 'AAACCGGGGGu' u.normalize() self.assertFloatEqual(u, {'A':0.3, 'C':0.2, 'G':0.5, 'U':0.0}) class CodonUsageTests(TestCase): """Tests of the CodonUsage class.""" def test_init_empty(self): """Empty CodonUsage init should have 64 codons, all 0""" u = CodonUsage() self.assertEqual(len(u), 64) for i in u: self.assertEqual(u[i], 0) #check that the genetic code is the default assert u.GeneticCode is GeneticCodes[1] def test_init_string(self): """CodonUsage should count codons in string""" u = CodonUsage('UUUCCCUUUUUUGA') self.assertEqual(u, CodonUsage({'UUU':3, 'CCC':1, 'GA':1})) u.normalize() self.assertEqual(u, CodonUsage({'UUU':0.75, 'CCC':0.25})) def test_getitem(self): """CodonUsage should allow lookup as RNA or DNA, case-insensitive""" u = CodonUsage() rna, dna, lc = 'UCAG', 'TCAG', 'ucag' for a in [rna, dna, lc]: codons = [i+j+k for i in a for j in a for k in a] for c in codons: self.assertEqual(u[c], 0) def test_bases(self): """CodonUsage bases should count bases correctly""" u = CodonUsage('UUUCCCUAGCCCGGGAA') b = u.bases() self.assertEqual(b, BaseUsage('UUUCCCUAGCCCGGGAA')) #purge_unwanted should get rid of bad codons b = u.bases(purge_unwanted=True) self.assertEqual(b, BaseUsage('UUUCCCCCCGGG')) def test_codons(self): """CodonUsage codons should return same object""" u = CodonUsage('abc') c = u.codons() assert u is c def test_aminoAcids(self): """CodonUsage aminoAcids should correctly count amino acids""" freqs = {'UUC':5, 'AUA':10, 'AUG':10, 'CGC':3, 'AGG':2, 'XYZ':8, 'UAA':2, 'UGA':1} u = CodonUsage(freqs, "test") self.assertEqual(u.Info, 'test') for key, val in u.items(): if key in freqs: self.assertEqual(val, freqs[key]) else: self.assertEqual(val, 0) aa = u.aminoAcids() self.assertEqual(aa, 
AminoAcidUsage({'F':5,'I':10,'M':10,'R':5,'*':3,'X':8})) #check that it works with a different genetic code u.GeneticCode = GeneticCodes['2'] aa = u.aminoAcids() self.assertEqual(aa, AminoAcidUsage({'F':5,'I':0,'M':20,'R':3,'*':4,'W':1,'X':8})) #check that it works if a genetic code is supplied explicitly u.GeneticCode = GeneticCodes[1] aa = u.aminoAcids() self.assertEqual(aa, AminoAcidUsage({'F':5,'I':10,'M':10,'R':5,'*':3,'X':8})) aa_2 = u.aminoAcids(2) self.assertEqual(aa_2, AminoAcidUsage({'F':5,'I':0,'M':20,'R':3,'*':4,'W':1,'X':8})) #check that we held onto the info object through the above self.assertEqual(aa_2.Info, 'test') def test_positionalBases(self): """CodonUsage bases should count bases at each position correctly""" freqs = {'UUC':5, 'AUA':10, 'AUG':10, 'CGC':3, 'AGG':2, 'XYZ':8, 'UAA':2, 'UGA':1} u = CodonUsage(freqs) b = u.positionalBases() assert isinstance(b, PositionalBaseUsage) first, second, third = b self.assertEqual(first, BaseUsage({'U':8,'C':3,'A':22,'X':8})) self.assertEqual(second, BaseUsage({'U':25,'C':0,'A':2,'G':6,'Y':8})) self.assertEqual(third, BaseUsage({'C':8,'A':13,'G':12,'Z':8})) #check that it also works when we purge p = u.positionalBases(purge_unwanted=True) first, second, third = p self.assertEqual(first, BaseUsage({'U':5,'C':3,'A':2})) self.assertEqual(second, BaseUsage({'U':5,'G':5})) self.assertEqual(third, BaseUsage({'C':8,'G':2})) #check that it also works with a different genetic code, and, #incidentally, that the purging didn't affect the original object u.GeneticCode = GeneticCodes[2] #mt code: different stop codons p = u.positionalBases(purge_unwanted=True) first, second, third = p self.assertEqual(first, BaseUsage({'U':6,'C':3,'A':20})) self.assertEqual(second, BaseUsage({'U':25,'G':4})) self.assertEqual(third, BaseUsage({'C':8,'A':11,'G':10})) def test_positionalGC(self): """CodonUsage positionalGC should give correct GC contents.""" c = EqualBases.codons() self.assertEqual(c.positionalGC(False), 
[0.5,0.5,0.5,0.5]) c = EqualBases.codons() self.assertNotEqual(c.positionalGC(True), [0.5,0.5,0.5,0.5]) def test_fingerprint(self): """CodonUsage fingerprint should give correct ratios.""" c = EqualBases.codons() f = c.fingerprint() self.assertEqual(len(f), 9) self.assertEqual(f, \ [[.5,.5,.125] for i in range(8)] + [[.5,.5,1]]) #should be able to omit mean... f = c.fingerprint(include_mean=False) self.assertEqual(f, [[.5,.5,.125] for i in range(8)]) #...or use all doublets f = c.fingerprint(include_mean=False, which_blocks='all') self.assertEqual(len(f), 16) #...or do just the non-quartet ones f = c.fingerprint(include_mean=False, which_blocks='split') self.assertEqual(len(f), 6) #check that it doesn't fail on an empty codon usage c = CodonUsage('') f = c.fingerprint() self.assertEqual(f[0], [0.5, 0.5, 0]) def test_pr2bias(self): """CodonUsage pr2bias should give correct ratios.""" c = EqualBases.codons() b = c.pr2bias('UU') self.assertEqual(len(b), 6) self.assertEqual(b, tuple([.5]*6)) c = CodonUsage() c['ACU'] = 10 c['ACC'] = 5 c['ACA'] = 15 c['ACG'] = 20 self.assertEqual(c.pr2bias('AC'), (20/25,15/25,20/35,5/15,20/30,5/20)) def test_add(self): """CodonUsage add should sum two base usages""" c = CodonUsage('UUU') c2 = CodonUsage('CCC') self.assertEqual(c+c2, CodonUsage('UUUCCC')) c += c2 self.assertEqual(c, CodonUsage('UUUCCC')) def test_rscu(self): """CodonUsage rscu should calculate synonymous usage correctly""" c = CodonUsage({'UUU':3,'UUC':1,'ACA':1}) c.rscu() self.assertEqual(c['UUU'], 0.75) self.assertEqual(c['UUC'], 0.25) self.assertEqual(c['ACA'], 1) self.assertEqual(c['GGG'], 0) class PositionalBaseUsageTests(TestCase): """Tests of the PositionalBaseUsage class.""" def test_init_empty(self): """PositionalBaseUsage init when empty should set all freqs to 0""" p = PositionalBaseUsage() assert p[0] is not p[1] assert p[1] is not p[2] assert p[0] is not p[2] for i in p: self.assertEqual(i, BaseUsage()) def test_info(self): """PositionalBaseUsage info should 
work as expected""" #test a cycle of setting the Info and setting it back again p = PositionalBaseUsage() self.assertRaises(AttributeError, getattr, p, 'upper') p.Info = 'xyz' self.assertEqual(p.upper(), 'XYZ') p.Info = None self.assertRaises(AttributeError, getattr, p, 'upper') def test_getitem(self): """PositionalBaseUsage getitem should return 1st, 2nd, 3rd in order""" a, c, g = BaseUsage('A'), BaseUsage('C'), BaseUsage('G') p = PositionalBaseUsage(a, c, g) assert p.First is a assert p.Second is c assert p.Third is g #make sure they're not all the same object assert p.First is not g #test positive indices assert p[0] is p.First assert p[1] is p.Second assert p[2] is p.Third #test negative indices assert p[-1] is p.Third assert p[-2] is p.Second assert p[-3] is p.First try: x = p[3] except IndexError: pass else: self.fail("Failed to raise IndexError on bad index") #test iteration for o, e in zip(p, [a, c, g]): assert o is e def test_normalize(self): """PositionalBaseUsage normalize should normalize each position""" a, c, g = BaseUsage('AAGC'), BaseUsage('CCGA'), BaseUsage('GGCA') p = PositionalBaseUsage(a, c, g) self.assertEqual(p[0], {'A':2, 'C':1, 'G':1, 'U':0}) p.normalize() self.assertEqual(p[0], {'A':0.5, 'C':0.25, 'G':0.25, 'U':0}) self.assertEqual(p[1], {'A':0.25, 'C':0.5, 'G':0.25, 'U':0}) self.assertEqual(p[2], {'A':0.25, 'C':0.25, 'G':0.5, 'U':0}) def test_bases(self): """PositionalBaseUsage bases should sum bases at each position""" a, c, g = BaseUsage('AAGC'), BaseUsage('CCGA'), BaseUsage('GGCA') p = PositionalBaseUsage(a, c, g) b = p.bases() self.assertEqual(b, BaseUsage('AAGCCCGAGGCA')) def test_codons(self): """PositionalBaseUsage codons should give expected codon freqs""" #one of each base should give freqs if 1/64 for everything orig = CodonUsage('UUUCCCAAAGGG') b = orig.positionalBases() final = b.codons() self.assertEqual(len(final), 64) for i in final: self.assertFloatEqual(final[i], 1.0/64) #two bases at each position should give correct 
freqs orig = CodonUsage('UCGAGUUCGUCG') final = orig.positionalBases().codons() exp = { 'UCG': 0.75 * 0.75 * 0.75, 'UCU': 0.75 * 0.75 * 0.25, 'UGG': 0.75 * 0.25 * 0.75, 'UGU': 0.75 * 0.25 * 0.25, 'ACG': 0.25 * 0.75 * 0.75, 'ACU': 0.25 * 0.75 * 0.25, 'AGG': 0.25 * 0.25 * 0.75, 'AGU': 0.25 * 0.25 * 0.25, } for f in final: if f in exp: self.assertFloatEqual(final[f], exp[f]) else: self.assertEqual(final[f], 0) def test_positionalBases(self): """PositionalBaseUsage positionalBases should return same object""" p = PositionalBaseUsage() x = p.positionalBases() assert p is x def test_aminoAcids(self): """PositionalBaseUsage aminoAcids should return correct amino acids""" #check hand-calculated values on a particular sequence orig = CodonUsage('UCGAGUUCGUCG') final = orig.positionalBases().aminoAcids() exp = { 'S': 0.75 * 0.75 * 0.75 + 0.75 * 0.75 * 0.25 + 0.25*0.25*0.25, 'W': 0.75 * 0.25 * 0.75, 'C': 0.75 * 0.25 * 0.25, 'T': 0.25 * 0.75 * 0.75 + 0.25 * 0.75 * 0.25, 'R': 0.25 * 0.25 * 0.75, } for f in final: if f in exp: self.assertFloatEqual(final[f], exp[f]) else: self.assertEqual(final[f], 0) #test for unbiased freqs on a couple of different genetic codes orig = CodonUsage('UUUCCCAAAGGG') final = orig.positionalBases().aminoAcids() SGC = GeneticCodes[1] for aa in final: self.assertEqual(final[aa], len(SGC[aa])/64.0) mt = GeneticCodes[2] final_mt = orig.positionalBases().aminoAcids(mt) self.assertNotEqual(final, final_mt) for aa in final_mt: self.assertEqual(final_mt[aa], len(mt[aa])/64.0) class AminoAcidUsageTests(TestCase): """Tests of the AminoAcidUsage class.""" def test_init_empty(self): """AminoAcidUsage should init with empty freqs""" a = AminoAcidUsage() for key, val in a.items(): self.assertEqual(val, 0) self.assertEqual(len(a), 21) assert 'A' in a assert 'a' in a def test_init_data(self): """AminoAcidUsage should init with data""" a = AminoAcidUsage('aadddx') self.assertEqual(a['A'], 2) self.assertEqual(a['d'], 3) self.assertEqual(a['X'], 1) a.normalize() 
self.assertEqual(a['a'], 0.4) self.assertEqual(a['d'], 0.6) assert 'x' not in a def test_bases(self): """AminoAcidUsage bases should return most likely base freqs""" a = AminoAcidUsage('GGG') self.assertEqual(a.bases(), {'G':9.0/12,'U':1.0/12,'C':1.0/12,'A':1.0/12}) a = AminoAcidUsage('CAGTWERQWE') exp = a.codons().bases() exp.normalize() self.assertFloatEqual(a.bases(), exp) def test_codons(self): """AminoAcidUsage codons should return most likely codon freqs""" a = AminoAcidUsage('GGG') c = CodonUsage('GGUGGCGGAGGG') c.normalize() self.assertEqual(a.codons(), c) a = AminoAcidUsage('D') c = CodonUsage('GAUGAC') c.normalize() self.assertEqual(a.codons(), c) a = AminoAcidUsage('GDDFMM') c = CodonUsage('GGUGGCGGAGGG'+'GAUGAC'*4+'UUUUUC'*2+'AUG'*8) c.normalize() self.assertEqual(a.codons(), c) a = AminoAcidUsage('II*') c = CodonUsage('AUUAUCAUA'*2+'UAAUAGUGA') c.normalize() self.assertEqual(a.codons(), c) #check that it works with a nonstandard code code = GeneticCode('A'*4+'C'*28+'G'*32) a = AminoAcidUsage('AAA') c = CodonUsage('UUUUUCUUAUUG') c.normalize() self.assertEqual(a.codons(code), c) #check that it works with unequal codon frequencies unequal = CodonUsage({'GGU':5,'GGC':2,'GGA':2,'GGG':1,'UUU':3,'UUC':1}) a = AminoAcidUsage('GFFF') exp = { 'GGU':0.5*0.25, 'GGC':0.2*0.25, 'GGA':0.2*0.25, 'GGG':0.1*0.25, 'UUU':0.75*0.75, 'UUC':0.25*0.75 } obs = a.codons(codon_usage=unequal) for codon, freq in obs.items(): self.assertFloatEqual(freq, exp.get(codon, 0)) def test_positionalBases(self): """AminoAcidUsage positionalBases should return best positional bases""" a = AminoAcidUsage('WQRSFADDQW') exp = a.codons().positionalBases() obs = a.positionalBases() for o, e in zip(obs, exp): self.assertFloatEqual(o, e) def test_aminoAcids(self): """AminoAcidUsage aminoAcids should return same object""" a = AminoAcidUsage('REWQDFTDSF') b = a.aminoAcids() assert a is b class DinucUsageTests(TestCase): """Tests of the DinucUsage class.""" def test_init_from_seq(self): """DinucUsage 
should init correctly from string."""
        s1 = 'AAAAA'
        s2 = 'ACTACG'
        fd = filter_dict
        self.assertEqual(fd(DinucUsage(s1)), {'AA':4})
        #NOTE: will map DNA seq to RNA.
        self.assertEqual(fd(DinucUsage(s2)), {'AC':2,'CU':1,'UA':1,'CG':1})
        #check that it works for non-overlapping
        self.assertEqual(fd(DinucUsage(s1, Overlapping=False)), {'AA':2})
        self.assertEqual(fd(DinucUsage(s2, Overlapping=False)), \
            {'AC':1,'UA':1,'CG':1})
        #check that it works for the 3-1 case
        self.assertEqual(fd(DinucUsage(s1, Overlapping='3-1')), {'AA':1})
        self.assertEqual(fd(DinucUsage(s2, Overlapping='3-1')), \
            {'UA':1})
        s3 = 'ACG'*5
        self.assertEqual(fd(DinucUsage(s3, Overlapping='3-1')), \
            {'GA':4})
        s4 = s3 + 'GAA'
        self.assertEqual(fd(DinucUsage(s4, Overlapping='3-1')), \
            {'GA':4,'GG':1})

    def test_distance(self):
        """Dinuc distance should calculate Euclidean dist. correctly"""
        s1 = 'AA'+'GG'*10
        s2 = 'AA'*5 + 'GG'*7
        d1 = DinucUsage(s1, Overlapping=False)
        d2 = DinucUsage(s2, Overlapping=False)
        self.assertEqual(d1.distance(d1), 0)
        self.assertEqual(d1.distance(d2), 5)
        self.assertEqual(d2.distance(d1), 5)

def filter_dict(d):
    """Removes zero keys from dict-like object."""
    result = dict(d)
    for k, v in d.items():
        if not v:
            del result[k]
    return result

#run if called from command-line
if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_cluster/__init__.py000644 000765 000024 00000000700 12024702176 022231 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
__all__ = ['test_goodness_of_fit', 'test_UPGMA', 'test_metric_scaling',
           'test_procrustes', 'test_nmds', 'test_approximate_mds']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Catherine Lozuopone", "Peter Maxwell", "Rob Knight",
               "Justin Kuczynski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/tests/test_cluster/test_approximate_mds.py000644 000765 000024 00000413537 12024702176 024745
0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Unit tests for fast metric scaling functions"""
from cogent.util.unit_test import TestCase, main
from cogent.cluster import goodness_of_fit
from cogent.cluster.approximate_mds \
    import nystrom
from cogent.cluster.approximate_mds \
    import calc_matrix_a, calc_matrix_b, build_seed_matrix
from cogent.cluster.approximate_mds import rowmeans, \
    affine_mapping, adjust_mds_to_ref, recenter, combine_mds, \
    cmds_tzeng, CombineMds
from numpy import array, matrix, random, argsort

__author__ = "Andreas Wilm"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Andreas Wilm"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Andreas Wilm"
__email__ = "andreas.wilm@ucd.ie"
__status__ = "FIXME"

PRINT_STRESS = False

# Biggish symmetrical matrix for testing: The following is a distance
# matrix of 100 points making up a 16-dimensional spiral. Idea was
# copied from Tzeng et al. 2008 (PMID 18394154).
#
# Note: the objects are ordered, i.e. permuting the distances will
# give better MDS approximations
#
FULL_SYM_MATRIX = array([ [ 0.
, 9.94319, 7.11402, 7.30016, 9.1998 , 7.52661, 10.1122 , 11.71544, 11.72171, 12.23729, 16.55706, 15.07764, 17.41182, 19.23286, 22.03428, 20.37969, 21.79021, 23.31804, 25.27884, 25.33384, 25.51721, 25.23346, 27.13757, 27.77099, 31.89744, 30.12906, 30.853 , 31.25053, 32.01368, 32.92915, 34.80687, 34.67094, 34.61157, 33.95445, 36.09099, 34.39821, 35.29278, 33.43954, 35.50327, 34.96861, 34.61776, 31.52205, 31.5479 , 31.86234, 30.85453, 29.69611, 29.41756, 28.83014, 30.11403, 28.76504, 23.7291 , 25.6983 , 25.97412, 25.13513, 24.45972, 21.55045, 21.69278, 19.95303, 16.26348, 17.11013, 14.32254, 17.62265, 15.44168, 17.37134, 13.90914, 11.73735, 12.23879, 12.84236, 16.11693, 14.78883, 14.85241, 16.81082, 15.84987, 19.49531, 19.10495, 20.82333, 22.2353 , 24.10629, 25.58022, 26.94421, 29.15922, 29.35633, 30.44926, 33.44155, 33.20634, 35.10944, 37.58041, 36.97612, 37.4974 , 39.19126, 41.96371, 39.49224, 44.24451, 44.49713, 46.20998, 45.17665, 43.34276, 45.20143, 42.75318, 46.74111], [ 9.94319, 0. , 9.15502, 9.50864, 4.67066, 8.73786, 10.65396, 7.07033, 7.65245, 11.7303 , 13.97535, 14.44678, 14.4366 , 17.67632, 17.78924, 17.34483, 20.92378, 19.64563, 22.37953, 22.50652, 22.59182, 23.16428, 25.36422, 24.53657, 28.53754, 28.8102 , 28.67001, 29.99149, 29.0992 , 30.78134, 32.60362, 33.06432, 32.35053, 31.31204, 33.96212, 32.70361, 34.0075 , 31.79899, 34.1394 , 34.08447, 34.40636, 32.09843, 31.55098, 32.21933, 29.68935, 30.64989, 29.84656, 30.31266, 30.95073, 29.50086, 25.13342, 26.92125, 26.77764, 25.45479, 26.64139, 24.28835, 23.54393, 21.14658, 19.0042 , 19.69926, 17.01569, 17.70308, 16.76036, 18.85088, 15.67933, 13.11373, 14.57384, 14.49709, 13.93834, 13.97627, 15.28133, 15.43603, 15.38908, 19.14731, 18.0164 , 19.25691, 20.79821, 21.17772, 23.34925, 24.69467, 25.91663, 26.0043 , 27.48027, 31.09767, 29.88905, 31.87226, 35.07482, 34.08161, 34.96767, 36.59085, 39.56174, 37.35437, 42.00158, 42.65467, 44.21824, 42.7808 , 41.49634, 42.89946, 41.51576, 45.2993 ], [ 7.11402, 9.15502, 
0. , 6.86568, 9.13056, 6.40397, 9.71158, 8.69917, 8.75907, 9.94964, 12.60852, 10.83371, 13.24922, 15.56278, 18.00935, 17.28358, 18.64376, 18.74751, 21.30424, 20.79329, 22.52019, 21.17382, 24.35227, 24.04257, 27.79547, 27.14583, 27.9384 , 28.31876, 29.19352, 30.31387, 32.49497, 32.53298, 32.54342, 31.6677 , 34.4333 , 33.06171, 33.8104 , 32.32825, 34.54666, 33.66703, 34.21985, 31.60984, 31.44096, 32.23245, 30.68488, 30.737 , 30.20034, 30.13288, 31.62192, 30.13293, 25.84322, 27.91135, 28.12311, 27.30813, 27.01853, 24.69093, 24.94734, 23.21836, 20.37644, 20.92257, 17.85377, 20.91217, 19.36837, 20.86528, 17.38101, 14.14065, 15.49992, 14.33989, 17.43602, 15.89444, 14.88863, 15.37154, 15.57648, 18.73803, 17.66276, 19.14902, 19.51857, 21.56308, 22.21288, 23.47894, 25.86362, 25.36028, 26.33286, 29.98075, 29.33639, 31.30981, 33.73965, 33.6759 , 34.04663, 35.63369, 38.24726, 36.06871, 40.78921, 41.34813, 43.00012, 42.36618, 40.24461, 42.30763, 40.55066, 44.09355], [ 7.30016, 9.50864, 6.86568, 0. , 7.99096, 7.60197, 10.10639, 9.22165, 8.8801 , 9.49238, 13.66406, 12.57744, 13.13858, 15.46775, 17.73952, 16.56614, 18.40617, 19.64056, 21.45177, 22.03097, 21.88416, 22.61383, 24.92398, 25.28642, 29.08492, 27.8259 , 29.18638, 29.3961 , 30.42798, 31.81182, 33.20848, 33.36339, 34.03547, 33.03364, 35.82444, 34.7116 , 35.8487 , 33.89059, 36.02136, 35.60052, 36.08047, 33.37669, 33.34017, 33.79942, 32.42774, 31.97527, 31.72587, 31.85386, 33.66607, 32.0357 , 26.92882, 29.3656 , 29.6113 , 28.77098, 27.98344, 25.49869, 26.08835, 24.3511 , 21.22304, 21.25266, 18.87943, 21.25316, 19.09945, 20.88305, 16.98734, 14.43121, 14.87293, 12.60412, 16.25858, 14.52592, 14.10743, 13.84321, 13.02127, 16.58691, 15.13994, 17.45726, 18.68005, 20.17601, 21.41715, 23.0355 , 25.07801, 24.73817, 26.67714, 29.31584, 29.02044, 31.28387, 33.55744, 33.39091, 34.49413, 36.29427, 39.09789, 36.46602, 41.84873, 41.95262, 44.21519, 43.15904, 41.86844, 43.70448, 41.85398, 45.9426 ], [ 9.1998 , 4.67066, 9.13056, 7.99096, 0. 
, 8.10903, 8.12191, 6.68991, 6.01745, 8.81851, 12.76517, 13.05449, 13.25358, 16.08166, 16.85701, 14.96082, 18.58137, 18.78881, 21.03104, 21.37792, 20.97864, 22.24007, 23.27987, 23.47925, 27.51623, 27.08492, 27.63842, 28.49791, 28.08892, 29.94971, 31.60411, 32.35099, 31.64065, 30.7586 , 33.58783, 32.51966, 33.71852, 31.59205, 34.09468, 34.22325, 34.16007, 32.3283 , 31.89116, 32.43132, 30.52192, 31.02961, 30.1106 , 31.0687 , 31.53677, 30.2114 , 25.7518 , 27.6139 , 28.20037, 27.20175, 27.57982, 25.10337, 25.22312, 22.84998, 20.28434, 20.93006, 18.50018, 19.27925, 17.75401, 20.49716, 16.24356, 13.95706, 14.68697, 14.86848, 14.61363, 14.13096, 14.51717, 15.68778, 14.51018, 17.43174, 16.76305, 18.13449, 19.68793, 19.75357, 21.78947, 23.32496, 24.54554, 24.18059, 26.19812, 29.52194, 28.20267, 30.01717, 33.36934, 32.26088, 33.40864, 34.83176, 38.18059, 35.81903, 40.65885, 41.40321, 43.17206, 41.53149, 40.71829, 42.49124, 40.79031, 44.66045], [ 7.52661, 8.73786, 6.40397, 7.60197, 8.10903, 0. , 6.16569, 7.32563, 7.75387, 7.69682, 10.69552, 9.88725, 11.3961 , 14.00163, 16.23874, 15.6289 , 16.73334, 17.44048, 19.99725, 19.9533 , 20.46002, 19.84823, 22.72976, 23.22363, 27.33355, 26.83688, 27.09796, 28.27618, 28.3494 , 29.75656, 31.77772, 32.50386, 32.5482 , 31.99727, 34.67355, 33.19062, 34.46902, 33.13995, 35.80184, 35.25591, 35.33808, 33.38177, 33.04059, 33.91149, 32.75336, 32.5993 , 32.43664, 32.23116, 33.38874, 32.62237, 28.36209, 30.55605, 30.3347 , 29.56765, 29.58556, 27.53405, 26.86667, 25.00712, 21.74171, 23.02809, 19.62897, 22.0497 , 20.71073, 20.89719, 18.9401 , 14.67542, 15.87359, 14.84921, 15.9765 , 14.6316 , 13.43337, 14.09887, 13.393 , 15.35339, 15.15473, 15.57783, 16.38066, 18.12999, 19.33544, 20.47957, 22.53012, 23.41117, 23.94749, 27.14785, 27.12172, 29.01223, 32.1432 , 31.53822, 31.97442, 34.20046, 36.96213, 35.11644, 39.86917, 40.59916, 42.25278, 42.35218, 40.45489, 42.53407, 40.64863, 44.57547], [ 10.1122 , 10.65396, 9.71158, 10.10639, 8.12191, 6.16569, 0. 
, 8.95464, 7.95483, 6.96854, 10.2321 , 9.1072 , 11.40191, 13.88653, 16.49484, 12.77356, 14.76539, 17.37815, 19.05223, 19.09134, 19.48543, 19.25793, 20.07018, 21.44923, 26.16354, 24.71872, 25.7368 , 26.60446, 26.8787 , 27.97926, 30.01728, 31.59176, 31.2204 , 30.74726, 33.36662, 32.19782, 33.24557, 32.26498, 35.20755, 35.27066, 34.3224 , 33.45572, 33.21501, 33.77282, 32.98931, 33.29691, 32.48639, 32.89237, 33.31269, 33.17576, 29.06571, 31.25932, 31.50259, 31.19405, 31.33608, 28.94501, 28.98748, 26.83509, 23.67883, 25.4293 , 22.19619, 23.92894, 22.48331, 23.70778, 20.61034, 16.80414, 17.54135, 17.64634, 18.13698, 16.73796, 15.0992 , 16.73965, 15.39915, 15.3146 , 16.21818, 15.90665, 16.80116, 17.15962, 18.1004 , 20.22697, 21.04317, 22.01882, 23.10616, 25.70386, 25.33855, 26.90823, 30.27089, 29.34631, 29.92087, 31.66778, 35.17056, 33.31352, 37.56416, 39.27059, 40.62456, 40.2062 , 39.40489, 41.81369, 39.33236, 43.20546], [ 11.71544, 7.07033, 8.69917, 9.22165, 6.68991, 7.32563, 8.95464, 0. , 3.7505 , 6.85568, 7.69523, 9.7299 , 8.8292 , 12.08808, 12.84075, 12.41397, 15.7665 , 13.80821, 17.10528, 17.0151 , 18.03523, 18.49492, 21.4005 , 20.91953, 24.56468, 24.98275, 25.20224, 26.47583, 26.01966, 28.37676, 30.43507, 31.12544, 31.08021, 29.96861, 33.06401, 32.57304, 33.997 , 32.11079, 34.68809, 34.12659, 35.42161, 33.62916, 33.16394, 34.14988, 31.96731, 33.13473, 32.68089, 33.34788, 34.37908, 32.97516, 29.38917, 31.30679, 31.86587, 30.67301, 31.14565, 29.06828, 29.04268, 27.11562, 24.63125, 25.08637, 22.31814, 23.71491, 22.41495, 23.87865, 20.79483, 17.13811, 18.05839, 16.67094, 16.67337, 15.59013, 15.29917, 14.46026, 14.31921, 16.68845, 14.41634, 16.10611, 16.76691, 16.68857, 18.02822, 19.23566, 20.91886, 19.93191, 21.22647, 25.60087, 24.01222, 26.23519, 29.55426, 29.01695, 29.67471, 31.85062, 34.7836 , 32.77038, 38.06911, 38.63774, 40.78472, 39.75408, 38.43139, 40.51321, 39.43324, 43.19846], [ 11.72171, 7.65245, 8.75907, 8.8801 , 6.01745, 7.75387, 7.95483, 3.7505 , 0. 
, 6.25554, 7.94184, 9.40809, 9.37058, 12.64537, 13.25671, 11.69312, 15.73247, 14.29708, 17.17558, 17.07269, 18.18613, 18.66698, 20.99351, 20.83302, 24.42795, 24.65435, 25.16361, 26.40833, 26.00333, 28.48816, 30.39505, 31.12778, 31.04052, 30.02038, 33.06101, 32.51077, 33.91746, 32.16109, 34.72522, 34.26106, 35.29431, 33.7662 , 33.40285, 34.11146, 32.13639, 33.33042, 32.66608, 33.38514, 34.29769, 32.89254, 29.26526, 31.06269, 32.01605, 30.75263, 31.17672, 28.89607, 29.20958, 27.32181, 24.53975, 25.21252, 22.16274, 23.60181, 22.19147, 23.8951 , 20.57865, 17.20223, 17.67657, 16.61788, 17.05001, 15.68349, 15.30447, 14.95815, 14.4235 , 16.70277, 14.6234 , 16.24885, 16.87216, 16.49137, 17.85663, 19.18505, 20.91421, 19.70914, 21.64589, 25.6927 , 24.18448, 26.08005, 29.35354, 29.00965, 29.70327, 31.74665, 34.68077, 32.64932, 37.91062, 38.78521, 40.83826, 39.61926, 38.5705 , 40.69567, 39.56802, 43.04025], [ 12.23729, 11.7303 , 9.94964, 9.49238, 8.81851, 7.69682, 6.96854, 6.85568, 6.25554, 0. , 7.3016 , 8.07212, 8.2089 , 9.98656, 12.42781, 10.62705, 12.28722, 13.48977, 16.04661, 16.19534, 16.3577 , 17.32154, 19.0178 , 20.5046 , 23.95953, 23.43756, 24.50878, 25.26128, 25.60517, 28.0269 , 30.02525, 31.11886, 31.15891, 30.49865, 33.86459, 33.37975, 34.65544, 33.14589, 36.03802, 35.51766, 36.24468, 35.04257, 34.67094, 35.65553, 34.59658, 34.79422, 34.36677, 35.45895, 36.3779 , 35.3327 , 31.54706, 33.65312, 34.74652, 33.95725, 33.40518, 31.44167, 31.93944, 30.05815, 26.96615, 27.64338, 24.84473, 26.74278, 24.89471, 26.39705, 22.80323, 19.30856, 19.43576, 18.22819, 18.64491, 17.05169, 14.82973, 15.51568, 14.01205, 14.33961, 13.3435 , 14.52817, 14.70497, 14.77414, 15.56051, 16.74031, 18.93591, 17.75143, 19.27456, 22.94456, 21.91537, 23.84675, 27.42168, 26.76934, 27.66108, 29.75946, 33.0935 , 30.98471, 36.65424, 37.30058, 39.65202, 39.12975, 38.02632, 40.69416, 39.21815, 43.02089], [ 16.55706, 13.97535, 12.60852, 13.66406, 12.76517, 10.69552, 10.2321 , 7.69523, 7.94184, 7.3016 , 0. 
, 6.34024, 6.29442, 9.31874, 10.60831, 9.48152, 11.679 , 9.86008, 12.78319, 12.47705, 15.20036, 14.79868, 18.06249, 17.95878, 21.80304, 21.94482, 22.94088, 24.29869, 24.32754, 26.55776, 29.0597 , 30.47032, 31.03937, 29.92437, 33.13454, 33.19895, 34.69406, 33.79095, 36.57048, 36.01451, 37.16413, 36.42303, 36.18591, 37.09342, 35.61312, 37.27428, 36.78719, 37.18274, 38.417 , 37.62287, 34.72634, 36.87088, 37.5226 , 36.63014, 37.07125, 35.13839, 35.36933, 33.57977, 30.77296, 31.75893, 28.54674, 30.22502, 28.99081, 29.49656, 27.01958, 22.64011, 23.11653, 21.4745 , 21.88627, 19.79573, 18.19257, 16.9958 , 16.94848, 16.33478, 14.10467, 14.58976, 14.07395, 12.99207, 12.74353, 13.92433, 15.69868, 14.25548, 15.27649, 19.74988, 18.25877, 20.80091, 23.91183, 24.08524, 24.27307, 27.16596, 30.02343, 28.55089, 33.83448, 35.47091, 37.51809, 37.42083, 36.17692, 39.05643, 38.09928, 41.43184], [ 15.07764, 14.44678, 10.83371, 12.57744, 13.05449, 9.88725, 9.1072 , 9.7299 , 9.40809, 8.07212, 6.34024, 0. , 6.9511 , 8.1817 , 12.01962, 9.83166, 10.03616, 10.82597, 12.22157, 12.42638, 14.81562, 13.24075, 15.91179, 16.17176, 20.81994, 19.82228, 21.65753, 22.19242, 23.10866, 24.91478, 26.51025, 27.84524, 28.94428, 28.23706, 31.0608 , 30.49635, 31.58452, 31.13131, 34.00834, 33.61703, 34.00148, 33.33828, 33.20841, 33.84176, 33.36215, 34.29352, 34.00194, 34.24137, 35.53305, 35.23732, 32.32311, 34.68689, 34.89579, 34.4585 , 34.33839, 33.07445, 33.23999, 31.99141, 28.83607, 30.27728, 27.08538, 29.42086, 28.13654, 28.78396, 26.43445, 22.12388, 23.17097, 21.90119, 23.31711, 21.43302, 19.21266, 18.09438, 18.37291, 17.67573, 16.69031, 16.42476, 15.74665, 15.65608, 14.72969, 16.18704, 17.54084, 16.84414, 17.82539, 20.46911, 20.57828, 21.7296 , 24.02904, 24.20871, 24.33682, 26.71437, 29.08875, 27.8246 , 32.39523, 33.86684, 35.84499, 36.03478, 34.76023, 37.39742, 35.57779, 39.06011], [ 17.41182, 14.4366 , 13.24922, 13.13858, 13.25358, 11.3961 , 11.40191, 8.8292 , 9.37058, 8.2089 , 6.29442, 6.9511 , 0. 
, 5.80269, 7.34823, 7.61967, 8.88048, 8.03171, 10.84203, 11.15163, 12.21275, 12.34706, 16.11929, 16.0975 , 19.45663, 20.33958, 21.24043, 22.37157, 22.50594, 24.89152, 26.47231, 28.19409, 29.03612, 28.11855, 31.78424, 31.68759, 33.16023, 32.00993, 35.12134, 34.61314, 36.07251, 35.37259, 34.85009, 36.19004, 34.77939, 36.03767, 35.84771, 37.05029, 38.33912, 37.71305, 34.34605, 36.90215, 37.11372, 36.56462, 36.48557, 35.26936, 35.32999, 33.72941, 31.33676, 32.09544, 29.33582, 31.07569, 29.78355, 30.69899, 27.88792, 23.6909 , 24.95077, 22.44497, 22.73515, 21.53386, 19.39791, 17.15086, 17.26961, 16.50911, 14.97885, 14.84698, 14.0348 , 13.16584, 12.28025, 14.13143, 14.59832, 13.82806, 14.44424, 17.8433 , 16.98162, 19.04357, 22.24141, 22.05366, 22.98183, 25.33998, 28.35413, 26.6982 , 32.11611, 32.88203, 35.32819, 35.61998, 34.74136, 37.0847 , 36.06984, 40.03816], [ 19.23286, 17.67632, 15.56278, 15.46775, 16.08166, 14.00163, 13.88653, 12.08808, 12.64537, 9.98656, 9.31874, 8.1817 , 5.80269, 0. , 7.38877, 8.41906, 6.8504 , 7.29774, 9.2088 , 10.31162, 9.70575, 10.45564, 13.73927, 14.84133, 17.61261, 18.12427, 19.27123, 19.82935, 20.31297, 23.84286, 24.56468, 25.91998, 27.59428, 26.89288, 30.55842, 30.76624, 32.23747, 30.8686 , 34.10976, 33.50914, 35.42056, 34.64547, 34.19767, 35.53408, 35.0925 , 35.58014, 35.82126, 37.49003, 38.6495 , 38.10808, 35.42106, 37.82413, 38.53678, 38.03388, 37.13566, 36.62481, 36.8622 , 35.93997, 33.1206 , 33.81829, 31.46152, 33.72103, 32.12859, 33.34453, 30.4796 , 26.68134, 27.47971, 25.54979, 25.74229, 24.75989, 22.16485, 20.06477, 20.06272, 19.02226, 17.40359, 17.54345, 16.56831, 15.07742, 13.83543, 14.50581, 15.50799, 13.55086, 13.77702, 16.34443, 16.4955 , 17.40327, 20.09447, 19.53993, 20.64083, 23.14301, 25.73473, 23.91935, 30.1514 , 29.62866, 33.12758, 33.46496, 32.47884, 35.00942, 33.91675, 38.06303], [ 22.03428, 17.78924, 18.00935, 17.73952, 16.85701, 16.23874, 16.49484, 12.84075, 13.25671, 12.42781, 10.60831, 12.01962, 7.34823, 7.38877, 0. 
, 9.98466, 10.45042, 7.01399, 9.5126 , 10.22698, 8.29332, 10.75866, 15.09764, 14.55 , 16.07545, 19.17678, 18.69286, 20.74564, 19.36248, 23.2144 , 24.62428, 26.05861, 27.01352, 25.91131, 30.18725, 30.6706 , 32.78994, 31.0901 , 34.2731 , 33.85408, 36.35145, 35.76726, 35.06246, 36.69763, 35.34281, 36.84744, 36.8362 , 38.82735, 39.99487, 39.19622, 36.6623 , 39.10442, 39.88191, 38.95773, 38.89602, 38.43435, 38.31651, 37.10258, 34.93565, 35.54241, 33.22748, 34.83971, 33.64244, 34.72378, 32.11209, 28.69735, 29.37637, 27.27617, 26.28932, 25.69016, 23.88593, 21.52732, 21.55272, 20.82185, 18.79088, 18.59526, 17.84671, 15.16618, 14.82818, 14.49505, 15.15638, 12.82821, 12.71296, 15.97897, 14.73869, 16.7043 , 19.42872, 18.50832, 20.11721, 22.55747, 25.22857, 23.0396 , 29.71055, 29.41731, 32.39176, 32.91178, 31.62541, 33.79263, 34.18837, 37.85652], [ 20.37969, 17.34483, 17.28358, 16.56614, 14.96082, 15.6289 , 12.77356, 12.41397, 11.69312, 10.62705, 9.48152, 9.83166, 7.61967, 8.41906, 9.98466, 0. , 7.63615, 9.92392, 10.04376, 10.49189, 11.12076, 12.59852, 12.42193, 13.54166, 17.25956, 16.80266, 18.81578, 19.22611, 19.70099, 22.40398, 23.49591, 25.94766, 26.54727, 25.58126, 29.12877, 29.7836 , 31.07416, 29.98562, 33.17048, 33.35028, 34.18506, 34.59722, 34.20947, 35.18721, 34.12158, 35.88742, 35.13336, 37.20982, 37.74652, 37.47408, 34.70021, 37.06721, 38.15456, 37.88232, 37.77484, 36.36806, 37.32521, 35.80232, 33.52616, 34.40962, 32.12772, 33.40582, 31.97045, 34.19075, 30.2786 , 26.83978, 27.6592 , 26.30551, 26.20587, 25.12388, 23.22648, 22.09752, 21.38396, 19.8089 , 18.83186, 18.86371, 18.34015, 15.42575, 14.24007, 16.84456, 15.52977, 13.17945, 15.12571, 17.63489, 15.48099, 16.55105, 19.58731, 18.96449, 20.24747, 21.6386 , 25.58849, 23.63285, 28.70622, 30.21742, 32.68447, 31.55284, 32.34425, 35.03969, 33.47656, 37.35049], [ 21.79021, 20.92378, 18.64376, 18.40617, 18.58137, 16.73334, 14.76539, 15.7665 , 15.73247, 12.28722, 11.679 , 10.03616, 8.88048, 6.8504 , 10.45042, 7.63615, 0. 
, 10.17132, 8.68748, 9.45034, 8.94115, 9.44003, 9.64568, 12.4589 , 15.94741, 14.65608, 17.17587, 17.02335, 18.67632, 20.67866, 21.7796 , 24.38931, 25.59948, 25.17219, 28.8489 , 29.1434 , 30.36341, 29.95065, 33.33854, 33.29967, 33.88558, 34.48264, 34.10065, 35.42542, 35.45233, 36.19829, 36.03964, 37.86692, 38.88208, 39.07201, 36.40368, 39.23049, 39.92573, 40.05261, 39.20976, 38.61636, 39.27939, 38.1003 , 35.6578 , 36.74788, 34.53888, 36.56434, 35.07799, 36.59537, 33.28937, 29.56233, 30.64956, 29.02811, 29.40351, 28.14274, 25.25644, 24.3272 , 23.56296, 20.85461, 20.80185, 19.89348, 18.69289, 17.03854, 14.62848, 16.26612, 15.56155, 13.83998, 13.72169, 14.82471, 14.34678, 15.01213, 17.11687, 16.34701, 17.47794, 19.00668, 22.61177, 20.87814, 25.97138, 27.04819, 29.54295, 30.34684, 30.10012, 33.17501, 31.31277, 35.47796], [ 23.31804, 19.64563, 18.74751, 19.64056, 18.78881, 17.44048, 17.37815, 13.80821, 14.29708, 13.48977, 9.86008, 10.82597, 8.03171, 7.29774, 7.01399, 9.92392, 10.17132, 0. , 6.48226, 6.68 , 9.57466, 9.48408, 13.97397, 13.15679, 15.03983, 17.80094, 18.12113, 19.85319, 18.97979, 23.36183, 24.33233, 25.80959, 27.19217, 26.53318, 30.11932, 30.64921, 32.34859, 31.49555, 34.6749 , 33.71667, 36.43731, 36.39199, 35.67965, 37.13664, 35.96695, 37.67898, 37.95828, 39.3333 , 40.59642, 39.92405, 37.99061, 40.17349, 40.90994, 39.98119, 39.92003, 39.6808 , 39.62528, 38.71391, 36.21198, 37.08048, 34.29651, 36.10593, 34.99984, 35.8496 , 33.74472, 29.52144, 30.75883, 28.63502, 28.32294, 27.26777, 25.1948 , 22.50728, 23.05361, 22.12662, 19.74132, 19.63274, 17.89876, 15.79792, 14.13342, 14.08034, 14.92898, 11.06895, 11.02782, 15.65033, 14.30857, 14.46629, 17.74227, 17.97385, 18.06633, 21.34884, 22.96009, 22.50932, 28.30391, 28.4628 , 31.64773, 32.1907 , 31.04573, 33.47638, 33.5043 , 36.83706], [ 25.27884, 22.37953, 21.30424, 21.45177, 21.03104, 19.99725, 19.05223, 17.10528, 17.17558, 16.04661, 12.78319, 12.22157, 10.84203, 9.2088 , 9.5126 , 10.04376, 8.68748, 6.48226, 0. 
, 5.22655, 6.84299, 8.01445, 10.22964, 9.35169, 12.50172, 13.03384, 15.11671, 16.15319, 16.32212, 20.01859, 20.19876, 22.03234, 24.49275, 23.76956, 26.89441, 27.73295, 29.57591, 29.31235, 32.28865, 31.99324, 33.95671, 34.64837, 34.25079, 35.22869, 34.85416, 36.69074, 37.04425, 38.1781 , 39.77615, 39.60791, 38.021 , 40.62842, 41.3098 , 40.72945, 40.56478, 40.68159, 40.97394, 40.4026 , 37.91905, 39.08311, 36.66186, 38.48701, 37.36496, 38.4644 , 36.32379, 32.42384, 33.65172, 31.82761, 32.1881 , 30.80285, 28.93245, 26.4393 , 26.65776, 25.03495, 23.12291, 22.70202, 21.31299, 18.64901, 16.46835, 16.81846, 16.44223, 12.42447, 12.98025, 14.62265, 13.85173, 13.32047, 13.81794, 14.45423, 14.59398, 17.7774 , 18.83897, 18.33698, 23.71274, 24.45456, 27.512 , 27.97503, 27.44909, 29.9473 , 29.47687, 32.87905], [ 25.33384, 22.50652, 20.79329, 22.03097, 21.37792, 19.9533 , 19.09134, 17.0151 , 17.07269, 16.19534, 12.47705, 12.42638, 11.15163, 10.31162, 10.22698, 10.49189, 9.45034, 6.68 , 5.22655, 0. , 9.1845 , 6.24097, 10.68858, 8.9892 , 10.75227, 12.5549 , 13.42965, 14.83639, 15.19843, 18.37361, 20.11514, 21.80086, 23.21553, 22.42534, 25.94406, 26.90792, 28.49847, 28.38986, 31.34822, 30.44753, 32.95536, 33.76841, 33.23805, 34.76968, 33.79277, 36.13666, 36.16784, 37.57777, 39.01639, 38.73035, 37.37737, 39.86314, 40.9369 , 40.37831, 40.3905 , 40.18733, 40.75477, 40.04084, 38.03854, 39.0819 , 36.62329, 38.68787, 37.66226, 39.09107, 36.48663, 32.61225, 34.05121, 32.23 , 32.67538, 31.35308, 29.36674, 27.44256, 27.60226, 26.2458 , 24.4287 , 24.17496, 22.32386, 20.26882, 17.63621, 18.1981 , 18.06242, 13.71252, 12.60611, 16.39878, 13.84417, 13.90252, 14.58239, 15.61773, 14.72379, 17.00739, 18.61744, 17.9266 , 22.92496, 24.15302, 26.40181, 27.03924, 25.82034, 28.7832 , 28.60458, 31.58328], [ 25.51721, 22.59182, 22.52019, 21.88416, 20.97864, 20.46002, 19.48543, 18.03523, 18.18613, 16.3577 , 15.20036, 14.81562, 12.21275, 9.70575, 8.29332, 11.12076, 8.94115, 9.57466, 6.84299, 9.1845 , 0. 
, 8.42136, 9.37678, 10.39484, 12.03637, 13.28049, 13.52418, 15.33503, 14.2314 , 18.60678, 18.65654, 20.52313, 22.48976, 21.9616 , 25.69566, 26.5576 , 28.88904, 27.73913, 31.08385, 31.06195, 33.13063, 33.66469, 33.0211 , 34.43139, 34.32978, 35.45315, 35.945 , 38.07486, 39.28353, 39.22602, 37.42741, 40.16763, 41.16595, 40.59288, 40.07303, 40.5409 , 40.65161, 40.1262 , 37.85329, 38.93651, 37.02688, 38.83153, 37.52198, 39.07799, 36.60449, 33.43265, 34.30145, 32.81646, 32.20425, 31.53711, 29.60455, 27.68888, 27.2861 , 25.60395, 24.28611, 23.68602, 22.83922, 19.6321 , 18.43043, 18.32631, 17.7009 , 14.75941, 14.24738, 14.81143, 14.39659, 14.09233, 14.84183, 13.11227, 14.67681, 17.10835, 19.10648, 17.05555, 23.72982, 22.93954, 26.36839, 26.99544, 26.43948, 28.60162, 28.39343, 32.42605], [ 25.23346, 23.16428, 21.17382, 22.61383, 22.24007, 19.84823, 19.25793, 18.49492, 18.66698, 17.32154, 14.79868, 13.24075, 12.34706, 10.45564, 10.75866, 12.59852, 9.44003, 9.48408, 8.01445, 6.24097, 8.42136, 0. , 9.19567, 8.28483, 9.68535, 11.98782, 11.05565, 13.27899, 12.97288, 15.95134, 17.47779, 19.26536, 20.81256, 20.47412, 24.07951, 24.48853, 26.31214, 26.12805, 29.5516 , 28.71938, 30.9913 , 31.6734 , 31.05312, 32.90775, 32.68055, 34.31807, 34.6565 , 36.33305, 37.55324, 37.84719, 36.54117, 39.18647, 39.96285, 39.6461 , 39.38249, 39.77171, 39.84257, 39.44919, 37.48819, 38.87541, 36.52095, 39.03388, 38.04894, 39.2854 , 37.05129, 33.33446, 34.86519, 33.37436, 33.51843, 32.6792 , 30.40374, 28.64096, 28.81112, 27.30426, 26.30338, 25.2772 , 23.65439, 21.82505, 19.44844, 19.69742, 19.3097 , 17.16471, 14.47088, 16.52024, 15.93936, 15.41702, 15.60782, 15.3568 , 14.63636, 16.4597 , 17.9459 , 16.78672, 21.70309, 21.98327, 24.07691, 26.11801, 24.12467, 26.69992, 26.15777, 29.63547], [ 27.13757, 25.36422, 24.35227, 24.92398, 23.27987, 22.72976, 20.07018, 21.4005 , 20.99351, 19.0178 , 18.06249, 15.91179, 16.11929, 13.73927, 15.09764, 12.42193, 9.64568, 13.97397, 10.22964, 10.68858, 9.37678, 9.19567, 
0. , 6.87271, 10.56088, 8.63342, 11.01072, 10.93865, 11.45914, 14.5144 , 14.09155, 17.70246, 18.65296, 19.02075, 22.11715, 22.68976, 23.99425, 24.33084, 28.06753, 28.67934, 28.75474, 31.08502, 30.53155, 31.69899, 32.62122, 33.92971, 33.80767, 36.2285 , 36.50334, 37.53278, 36.5014 , 39.07491, 40.15724, 40.40674, 39.85148, 40.40163, 40.9679 , 40.5153 , 38.42327, 40.1212 , 38.22838, 40.08247, 38.99665, 41.12947, 38.17204, 34.9892 , 36.38785, 35.95878, 35.6227 , 34.90092, 32.5457 , 31.91806, 31.28384, 28.86779, 28.98518, 27.6702 , 26.58669, 23.92989, 21.86083, 22.75265, 21.11808, 18.7693 , 18.18955, 18.2176 , 17.45544, 15.00264, 15.51133, 13.46033, 13.79958, 13.68436, 16.58338, 15.70851, 18.99807, 20.56008, 22.44251, 22.97965, 23.45606, 26.30876, 24.09575, 27.8713 ], [ 27.77099, 24.53657, 24.04257, 25.28642, 23.47925, 23.22363, 21.44923, 20.91953, 20.83302, 20.5046 , 17.95878, 16.17176, 16.0975 , 14.84133, 14.55 , 13.54166, 12.4589 , 13.15679, 9.35169, 8.9892 , 10.39484, 8.28483, 6.87271, 0. 
, 7.72073, 8.27576, 9.50557, 10.31176, 9.92433, 12.30625, 12.50632, 15.65258, 16.85322, 16.28059, 19.40831, 20.28063, 21.80124, 22.2785 , 25.59015, 26.20645, 27.03954, 29.20724, 28.64062, 29.85524, 29.8061 , 32.57415, 32.29469, 34.31277, 34.99807, 35.82 , 35.31492, 37.94267, 38.60944, 38.67121, 38.95962, 39.63135, 39.99892, 39.64319, 38.13686, 39.73796, 37.88513, 39.57072, 38.9161 , 40.94059, 38.2891 , 35.07866, 36.89862, 36.20281, 36.05376, 35.34433, 33.67934, 32.25061, 32.36628, 30.7808 , 30.11484, 29.01616, 27.92706, 25.32419, 23.29639, 24.12436, 22.42396, 19.7537 , 19.01006, 19.94131, 18.33339, 16.81857, 15.99741, 15.0617 , 14.77595, 14.59106, 16.3054 , 15.56181, 17.98122, 20.15022, 21.20001, 21.39029, 21.13788, 23.4717 , 22.23873, 25.49112], [ 31.89744, 28.53754, 27.79547, 29.08492, 27.51623, 27.33355, 26.16354, 24.56468, 24.42795, 23.95953, 21.80304, 20.81994, 19.45663, 17.61261, 16.07545, 17.25956, 15.94741, 15.03983, 12.50172, 10.75227, 12.03637, 9.68535, 10.56088, 7.72073, 0. , 9.87641, 7.6598 , 9.49731, 7.49547, 12.71031, 13.12007, 15.09318, 15.76199, 15.39935, 19.70394, 21.41588, 23.13548, 22.85749, 26.18616, 25.68635, 28.9516 , 30.93716, 30.0345 , 32.15204, 31.65812, 34.38793, 34.38638, 37.50663, 38.17473, 38.39304, 38.36668, 40.71544, 42.31539, 42.04724, 41.86037, 42.90939, 43.56974, 43.4618 , 42.34263, 43.41177, 41.79768, 43.79152, 42.99372, 45.43309, 42.39361, 39.51926, 41.2064 , 40.11474, 39.74911, 39.34168, 37.45432, 35.8801 , 35.90349, 34.72309, 33.54021, 32.87432, 31.3026 , 28.63784, 26.36769, 26.4844 , 25.34082, 21.25983, 19.4387 , 21.35453, 18.66931, 16.58331, 15.29044, 14.5311 , 14.07159, 12.69685, 13.75561, 12.45704, 16.48731, 16.18279, 18.19724, 18.82112, 17.69826, 20.07196, 20.9814 , 23.64271], [ 30.12906, 28.8102 , 27.14583, 27.8259 , 27.08492, 26.83688, 24.71872, 24.98275, 24.65435, 23.43756, 21.94482, 19.82228, 20.33958, 18.12427, 19.17678, 16.80266, 14.65608, 17.80094, 13.03384, 12.5549 , 13.28049, 11.98782, 8.63342, 8.27576, 9.87641, 
0. , 7.98874, 5.38369, 9.95786, 10.82054, 11.0462 , 12.89802, 15.44491, 14.85573, 17.38129, 19.47045, 20.77089, 21.32264, 24.07172, 24.28534, 25.596 , 27.93375, 27.86256, 28.87872, 29.7082 , 31.88982, 31.87346, 34.12199, 35.33819, 36.0731 , 35.88703, 38.71968, 40.48097, 40.65492, 40.04445, 40.78432, 42.24041, 42.36186, 40.86445, 42.22587, 40.89454, 43.17864, 41.97252, 44.85097, 41.45324, 38.91288, 40.27944, 39.6453 , 40.59724, 39.49418, 37.70761, 36.65188, 36.33936, 34.78051, 33.80123, 33.49258, 32.35566, 29.75799, 27.30116, 28.38011, 27.09472, 23.18208, 22.41711, 22.75371, 20.88887, 19.50611, 15.96133, 15.86852, 14.98222, 13.66391, 14.86599, 12.41707, 15.52539, 16.68747, 18.30724, 16.95519, 17.37892, 20.49274, 18.30611, 21.75684], [ 30.853 , 28.67001, 27.9384 , 29.18638, 27.63842, 27.09796, 25.7368 , 25.20224, 25.16361, 24.50878, 22.94088, 21.65753, 21.24043, 19.27123, 18.69286, 18.81578, 17.17587, 18.12113, 15.11671, 13.42965, 13.52418, 11.05565, 11.01072, 9.50557, 7.6598 , 7.98874, 0. 
, 6.64528, 4.59801, 8.21652, 10.31466, 11.16178, 12.08679, 11.83932, 15.09664, 17.18703, 19.54388, 19.20711, 22.42597, 21.78585, 24.86467, 26.82595, 26.17402, 28.22608, 28.33647, 30.77347, 31.09911, 33.82732, 34.63993, 35.32224, 35.64687, 38.19558, 40.17273, 39.9043 , 39.71271, 40.82968, 41.59044, 41.89523, 40.82556, 42.24267, 40.88634, 43.2909 , 42.33337, 45.05862, 42.17539, 39.71804, 41.17056, 40.67256, 40.65653, 40.19097, 38.65435, 37.60561, 37.35235, 36.34129, 35.23204, 34.77337, 33.6225 , 30.92734, 29.00015, 29.40958, 28.36382, 25.0391 , 22.59333, 24.07796, 21.93502, 20.62995, 18.16191, 17.02272, 15.54819, 14.56199, 15.3057 , 12.64367, 16.45475, 15.71403, 16.86429, 16.85299, 15.00269, 17.43522, 17.0517 , 20.42138], [ 31.25053, 29.99149, 28.31876, 29.3961 , 28.49791, 28.27618, 26.60446, 26.47583, 26.40833, 25.26128, 24.29869, 22.19242, 22.37157, 19.82935, 20.74564, 19.22611, 17.02335, 19.85319, 16.15319, 14.83639, 15.33503, 13.27899, 10.93865, 10.31176, 9.49731, 5.38369, 6.64528, 0. 
, 7.92917, 9.13679, 9.53346, 10.47996, 12.16654, 11.7129 , 15.01867, 17.3945 , 18.42663, 18.36001, 21.25014, 21.14056, 23.39958, 25.64935, 25.29954, 27.11021, 27.88541, 29.88329, 29.87818, 33.15122, 33.91166, 34.60562, 34.9862 , 37.6223 , 39.74094, 40.05704, 39.15768, 40.26411, 41.81453, 42.1735 , 41.17397, 42.24356, 41.4135 , 43.87194, 42.72225, 46.08496, 42.30557, 40.15065, 41.69059, 41.24556, 41.85591, 41.23647, 39.41798, 38.64347, 38.26693, 37.14527, 36.28322, 36.14132, 34.9677 , 32.65462, 30.32245, 31.15585, 30.05495, 26.14201, 24.6254 , 25.64368, 23.38049, 21.94394, 18.90934, 18.07516, 17.34202, 14.70928, 16.17006, 13.04202, 15.86513, 15.06125, 16.67207, 15.02326, 14.43571, 17.62774, 15.52057, 19.39125], [ 32.01368, 29.0992 , 29.19352, 30.42798, 28.08892, 28.3494 , 26.8787 , 26.01966, 26.00333, 25.60517, 24.32754, 23.10866, 22.50594, 20.31297, 19.36248, 19.70099, 18.67632, 18.97979, 16.32212, 15.19843, 14.2314 , 12.97288, 11.45914, 9.92433, 7.49547, 9.95786, 4.59801, 7.92917, 0. , 9.46759, 9.53989, 10.77149, 10.67123, 10.77957, 14.38107, 16.55356, 18.86377, 17.93757, 21.53923, 21.5 , 24.50229, 26.72856, 25.8231 , 27.87236, 28.08681, 30.54159, 30.74741, 34.21383, 34.31615, 35.0456 , 35.81848, 38.00399, 40.05607, 39.79457, 39.61693, 41.10429, 41.69482, 42.04974, 41.08039, 42.4485 , 41.31527, 43.35611, 42.53551, 45.50379, 42.54538, 40.30517, 41.79641, 41.68827, 40.95449, 40.88777, 39.44077, 38.47113, 38.21591, 37.33843, 36.35944, 35.80693, 34.8564 , 31.85679, 30.32951, 30.47848, 29.27892, 25.8884 , 23.90214, 25.36182, 23.14462, 21.09032, 19.56871, 17.40934, 16.72036, 15.08113, 16.01239, 13.78221, 17.09382, 15.61569, 17.24015, 16.33925, 15.18723, 17.0286 , 16.96968, 20.31666], [ 32.92915, 30.78134, 30.31387, 31.81182, 29.94971, 29.75656, 27.97926, 28.37676, 28.48816, 28.0269 , 26.55776, 24.91478, 24.89152, 23.84286, 23.2144 , 22.40398, 20.67866, 23.36183, 20.01859, 18.37361, 18.60678, 15.95134, 14.5144 , 12.30625, 12.71031, 10.82054, 8.21652, 9.13679, 9.46759, 0. 
, 8.89237, 11.56402, 9.66796, 8.7428 , 11.95979, 14.08575, 15.80045, 16.65181, 19.74296, 20.61442, 21.15351, 24.45494, 23.9113 , 26.09147, 26.17295, 29.57049, 28.96665, 31.93696, 32.42409, 34.06939, 34.49619, 37.51745, 38.87821, 39.44799, 39.80638, 40.83166, 41.71964, 41.67128, 41.61213, 43.2428 , 42.36568, 44.39476, 43.77865, 46.84558, 43.53456, 41.47749, 43.3667 , 43.24129, 43.27906, 42.89426, 41.53307, 41.05796, 40.77052, 39.41463, 39.1773 , 38.1895 , 37.36602, 34.97812, 33.00745, 34.10653, 32.18818, 29.92886, 27.45379, 28.27133, 25.75202, 25.34647, 22.44188, 20.82485, 19.82573, 16.70077, 18.83691, 15.85836, 15.60841, 17.8028 , 15.28365, 15.19425, 13.84108, 16.11862, 13.96013, 17.63429], [ 34.80687, 32.60362, 32.49497, 33.20848, 31.60411, 31.77772, 30.01728, 30.43507, 30.39505, 30.02525, 29.0597 , 26.51025, 26.47231, 24.56468, 24.62428, 23.49591, 21.7796 , 24.33233, 20.19876, 20.11514, 18.65654, 17.47779, 14.09155, 12.50632, 13.12007, 11.0462 , 10.31466, 9.53346, 9.53989, 8.89237, 0. 
, 7.18087, 9.42639, 10.38858, 11.58877, 12.91252, 14.59769, 15.59285, 18.95488, 20.67684, 20.8958 , 24.70048, 24.0165 , 25.43853, 26.95084, 29.21772, 29.55268, 32.62558, 32.93689, 34.88785, 35.60229, 38.44528, 39.2932 , 39.98048, 39.73092, 41.86321, 42.40135, 43.0919 , 42.45519, 44.1862 , 43.47623, 45.37599, 44.76722, 47.773 , 44.99764, 42.85482, 45.0001 , 44.84355, 44.80852, 44.71783, 43.31817, 42.17357, 42.00564, 40.73932, 40.54667, 39.4803 , 38.72058, 36.16405, 34.29426, 35.27955, 33.02234, 30.89456, 29.48465, 28.8475 , 27.86781, 25.48297, 22.56247, 20.68579, 20.34994, 17.57016, 17.81885, 16.39775, 15.34486, 15.15469, 14.83316, 14.19401, 15.20368, 15.44059, 11.80224, 16.78251], [ 34.67094, 33.06432, 32.53298, 33.36339, 32.35099, 32.50386, 31.59176, 31.12544, 31.12778, 31.11886, 30.47032, 27.84524, 28.19409, 25.91998, 26.05861, 25.94766, 24.38931, 25.80959, 22.03234, 21.80086, 20.52313, 19.26536, 17.70246, 15.65258, 15.09318, 12.89802, 11.16178, 10.47996, 10.77149, 11.56402, 7.18087, 0. 
, 8.74195, 9.19296, 8.9676 , 10.89087, 13.36331, 12.96807, 15.25494, 15.85912, 18.74728, 20.99385, 20.58536, 21.90675, 23.7641 , 25.59202, 26.79648, 29.58366, 30.67949, 31.89102, 33.4665 , 36.09133, 37.70993, 37.82863, 37.1605 , 39.86674, 40.65224, 42.17263, 41.48734, 42.91311, 42.56542, 45.03955, 44.26307, 47.54766, 44.95504, 43.36693, 45.22628, 45.15693, 45.74134, 45.58144, 44.57787, 43.24457, 43.23535, 43.03554, 41.97252, 41.7482 , 41.05749, 38.72517, 37.14338, 37.61494, 36.50124, 33.46255, 32.11582, 32.19497, 31.24603, 29.25163, 25.58682, 24.49213, 23.73966, 21.92186, 20.66609, 18.42641, 19.3634 , 16.24486, 17.24548, 15.15315, 13.59916, 12.73676, 10.23459, 14.84621], [ 34.61157, 32.35053, 32.54342, 34.03547, 31.64065, 32.5482 , 31.2204 , 31.08021, 31.04052, 31.15891, 31.03937, 28.94428, 29.03612, 27.59428, 27.01352, 26.54727, 25.59948, 27.19217, 24.49275, 23.21553, 22.48976, 20.81256, 18.65296, 16.85322, 15.76199, 15.44491, 12.08679, 12.16654, 10.67123, 9.66796, 9.42639, 8.74195, 0. 
, 4.97625, 7.39172, 8.51026, 10.14648, 9.50175, 13.1454 , 14.11243, 15.8578 , 19.62747, 18.29823, 20.97263, 21.55663, 24.21139, 24.16038, 28.57247, 28.2877 , 29.74339, 31.3773 , 33.75635, 35.79133, 36.21335, 36.02599, 38.43108, 39.29054, 40.11502, 40.44974, 41.86014, 41.7082 , 43.69002, 43.19493, 47.14492, 43.81483, 42.6301 , 44.81877, 45.24839, 44.99179, 45.27901, 44.24326, 44.00885, 43.62399, 43.42716, 43.097 , 42.57723, 41.90128, 39.82617, 38.47462, 39.17248, 37.86386, 34.99443, 33.29005, 34.59372, 32.25062, 30.55199, 28.63669, 26.61145, 26.02174, 22.74625, 23.58481, 21.28107, 21.00584, 19.89187, 18.45627, 16.11799, 14.52011, 14.22711, 12.12231, 15.73701], [ 33.95445, 31.31204, 31.6677 , 33.03364, 30.7586 , 31.99727, 30.74726, 29.96861, 30.02038, 30.49865, 29.92437, 28.23706, 28.11855, 26.89288, 25.91131, 25.58126, 25.17219, 26.53318, 23.76956, 22.42534, 21.9616 , 20.47412, 19.02075, 16.28059, 15.39935, 14.85573, 11.83932, 11.7129 , 10.77957, 8.7428 , 10.38858, 9.19296, 4.97625, 0. , 6.72148, 10.12496, 12.00924, 10.13169, 12.83076, 14.2167 , 16.72792, 19.37301, 18.59613, 21.00058, 20.66568, 24.48285, 23.84352, 28.26891, 28.32281, 29.3094 , 30.90888, 33.39326, 35.6437 , 35.93431, 36.08269, 37.85734, 39.11931, 39.73362, 40.26473, 41.36942, 41.39021, 43.28881, 42.72848, 46.85017, 43.10439, 42.13604, 43.99537, 44.34038, 44.26734, 44.39004, 43.65807, 43.18833, 42.95227, 42.93121, 42.14583, 41.96176, 41.49189, 39.07646, 37.79461, 38.59134, 37.2521 , 34.09993, 32.53784, 33.99566, 31.15287, 30.43379, 27.85509, 26.08889, 25.78955, 22.49509, 23.83891, 20.37192, 20.98593, 20.20119, 18.99761, 15.27863, 13.7462 , 13.92431, 12.80535, 16.20137], [ 36.09099, 33.96212, 34.4333 , 35.82444, 33.58783, 34.67355, 33.36662, 33.06401, 33.06101, 33.86459, 33.13454, 31.0608 , 31.78424, 30.55842, 30.18725, 29.12877, 28.8489 , 30.11932, 26.89441, 25.94406, 25.69566, 24.07951, 22.11715, 19.40831, 19.70394, 17.38129, 15.09664, 15.01867, 14.38107, 11.95979, 11.58877, 8.9676 , 7.39172, 
6.72148, 0. , 6.43367, 8.98397, 8.67729, 10.0527 , 12.31282, 13.61151, 17.29523, 16.952 , 18.20422, 19.0518 , 23.06074, 23.05218, 26.13901, 26.38234, 28.09777, 30.76486, 33.11761, 35.1361 , 35.41829, 35.86916, 38.17342, 39.25199, 40.55388, 40.93513, 42.5099 , 42.56349, 44.52465, 44.12784, 48.09543, 45.13455, 44.19635, 46.17291, 46.91914, 47.23808, 47.21417, 46.86937, 46.30617, 46.18952, 46.3353 , 45.46555, 45.29483, 44.96068, 42.56432, 41.36339, 42.23577, 40.86225, 37.9703 , 36.62617, 37.63698, 35.45185, 34.21791, 31.08457, 29.69294, 28.68309, 26.09517, 26.05857, 23.59887, 22.7285 , 22.24654, 20.53253, 16.26812, 15.10008, 13.99427, 10.96839, 13.95765], [ 34.39821, 32.70361, 33.06171, 34.7116 , 32.51966, 33.19062, 32.19782, 32.57304, 32.51077, 33.37975, 33.19895, 30.49635, 31.68759, 30.76624, 30.6706 , 29.7836 , 29.1434 , 30.64921, 27.73295, 26.90792, 26.5576 , 24.48853, 22.68976, 20.28063, 21.41588, 19.47045, 17.18703, 17.3945 , 16.55356, 14.08575, 12.91252, 10.89087, 8.51026, 10.12496, 6.43367, 0. 
, 5.13455, 7.19981, 9.26742, 11.82888, 10.17125, 14.78769, 13.87506, 15.22203, 17.04526, 19.97496, 20.45104, 22.81966, 22.94849, 25.40346, 27.90575, 30.36558, 31.45698, 32.01961, 32.50931, 35.50764, 35.83888, 37.37873, 37.77828, 39.79016, 39.73051, 41.74234, 41.66611, 45.21051, 42.9827 , 41.96344, 44.45906, 45.37185, 45.70899, 45.86452, 45.49593, 45.23056, 45.08133, 45.40294, 45.13354, 44.55667, 44.24469, 42.6187 , 41.59934, 42.40437, 41.24915, 39.18714, 37.89513, 38.80147, 37.34563, 35.75256, 33.50089, 31.99846, 30.98915, 28.78806, 28.57053, 27.01304, 25.56683, 25.4267 , 23.23683, 21.19257, 19.15064, 17.6646 , 13.95903, 16.88417], [ 35.29278, 34.0075 , 33.8104 , 35.8487 , 33.71852, 34.46902, 33.24557, 33.997 , 33.91746, 34.65544, 34.69406, 31.58452, 33.16023, 32.23747, 32.78994, 31.07416, 30.36341, 32.34859, 29.57591, 28.49847, 28.88904, 26.31214, 23.99425, 21.80124, 23.13548, 20.77089, 19.54388, 18.42663, 18.86377, 15.80045, 14.59769, 13.36331, 10.14648, 12.00924, 8.98397, 5.13455, 0. 
, 7.16407, 9.04734, 11.89485, 7.77198, 14.16043, 13.10023, 14.59801, 16.76365, 19.57846, 19.38376, 22.39702, 21.86462, 24.72382, 27.4199 , 29.752 , 30.72862, 31.93212, 32.14663, 35.16599, 35.88763, 37.34873, 38.05632, 39.9811 , 40.13104, 42.11747, 42.08783, 46.05367, 43.28691, 42.38342, 45.19904, 46.24403, 46.80226, 47.00371, 46.37995, 46.44777, 46.32251, 46.58154, 46.66496, 46.07288, 45.61367, 44.385 , 43.10472, 44.24015, 42.9754 , 40.85532, 39.73088, 40.79164, 39.14055, 37.3534 , 35.4027 , 33.98145, 32.97956, 30.08218, 30.29887, 29.00976, 26.63001, 27.03646, 24.48703, 22.05865, 20.62081, 19.57784, 14.73243, 17.36648], [ 33.43954, 31.79899, 32.32825, 33.89059, 31.59205, 33.13995, 32.26498, 32.11079, 32.16109, 33.14589, 33.79095, 31.13131, 32.00993, 30.8686 , 31.0901 , 29.98562, 29.95065, 31.49555, 29.31235, 28.38986, 27.73913, 26.12805, 24.33084, 22.2785 , 22.85749, 21.32264, 19.20711, 18.36001, 17.93757, 16.65181, 15.59285, 12.96807, 9.50175, 10.13169, 8.67729, 7.19981, 7.16407, 0. , 5.29396, 8.67893, 9.00802, 11.50156, 10.13767, 12.58519, 13.75226, 16.33162, 16.16096, 21.27586, 20.36335, 22.08521, 24.78172, 26.80699, 28.87533, 29.58085, 29.35632, 32.37612, 33.42273, 35.02844, 35.98014, 37.25958, 38.02517, 40.01821, 39.69333, 44.39596, 40.98925, 40.81211, 43.15676, 44.3192 , 44.52264, 45.11777, 44.80329, 44.7664 , 44.57585, 45.54288, 45.13382, 45.13769, 45.036 , 43.45157, 42.69738, 43.68518, 42.70208, 40.23943, 39.23902, 40.71097, 38.75821, 37.46691, 35.83383, 33.96711, 33.65132, 30.88072, 31.70992, 29.12309, 28.97938, 27.45576, 26.55069, 22.69609, 21.31464, 19.94906, 16.52504, 20.00963], [ 35.50327, 34.1394 , 34.54666, 36.02136, 34.09468, 35.80184, 35.20755, 34.68809, 34.72522, 36.03802, 36.57048, 34.00834, 35.12134, 34.10976, 34.2731 , 33.17048, 33.33854, 34.6749 , 32.28865, 31.34822, 31.08385, 29.5516 , 28.06753, 25.59015, 26.18616, 24.07172, 22.42597, 21.25014, 21.53923, 19.74296, 18.95488, 15.25494, 13.1454 , 12.83076, 10.0527 , 9.26742, 9.04734, 5.29396, 0. 
, 7.02491, 7.91346, 9.49832, 8.97779, 10.53213, 11.64105, 15.08541, 14.93272, 19.4389 , 19.22241, 20.51852, 24.28367, 26.21266, 28.62454, 29.21067, 29.13434, 32.23565, 33.74681, 35.77881, 37.02082, 38.04921, 39.25933, 41.2876 , 41.02127, 45.92867, 42.53998, 42.80247, 45.02922, 46.2601 , 46.99824, 47.39798, 47.50405, 47.43771, 47.28364, 48.66654, 47.8922 , 48.31191, 48.29145, 46.78708, 46.0658 , 47.00178, 46.27576, 43.40774, 42.66151, 44.23342, 42.09818, 41.14137, 38.8066 , 37.42541, 36.96229, 34.28354, 34.64723, 31.85059, 31.58249, 30.04064, 28.92462, 24.27898, 22.53438, 20.85881, 17.8264 , 20.38759], [ 34.96861, 34.08447, 33.66703, 35.60052, 34.22325, 35.25591, 35.27066, 34.12659, 34.26106, 35.51766, 36.01451, 33.61703, 34.61314, 33.50914, 33.85408, 33.35028, 33.29967, 33.71667, 31.99324, 30.44753, 31.06195, 28.71938, 28.67934, 26.20645, 25.68635, 24.28534, 21.78585, 21.14056, 21.5 , 20.61442, 20.67684, 15.85912, 14.11243, 14.2167 , 12.31282, 11.82888, 11.89485, 8.67893, 7.02491, 0. , 11.6549 , 10.73557, 9.71187, 12.80952, 12.4755 , 15.06586, 16.27814, 20.16314, 21.00477, 21.04933, 24.7121 , 26.38835, 29.5294 , 29.42525, 28.82948, 32.05309, 33.66472, 36.01266, 37.14096, 37.87527, 38.78613, 41.51105, 41.02762, 45.8564 , 42.62709, 42.66475, 44.95803, 45.70734, 46.90492, 47.20804, 47.14969, 46.86 , 46.8637 , 48.70376, 47.41032, 48.23058, 47.81376, 46.72845, 45.78045, 46.55168, 46.44799, 43.06789, 41.80521, 44.2389 , 41.92408, 40.96925, 38.73615, 37.92498, 36.83463, 34.6773 , 34.46336, 31.84197, 32.48614, 29.86623, 29.37943, 25.34313, 22.18623, 20.58748, 18.68724, 20.88898], [ 34.61776, 34.40636, 34.21985, 36.08047, 34.16007, 35.33808, 34.3224 , 35.42161, 35.29431, 36.24468, 37.16413, 34.00148, 36.07251, 35.42056, 36.35145, 34.18506, 33.88558, 36.43731, 33.95671, 32.95536, 33.13063, 30.9913 , 28.75474, 27.03954, 28.9516 , 25.596 , 24.86467, 23.39958, 24.50229, 21.15351, 20.8958 , 18.74728, 15.8578 , 16.72792, 13.61151, 10.17125, 7.77198, 9.00802, 7.91346, 11.6549 , 0. 
, 8.82796, 8.51924, 8.99769, 12.41161, 14.38248, 13.54233, 16.59245, 15.72496, 18.98626, 22.36856, 24.60879, 26.09397, 27.63374, 27.67937, 30.67328, 31.99623, 33.83917, 35.09395, 36.84287, 37.83474, 39.87093, 39.81846, 44.45979, 41.3634 , 41.50962, 44.11761, 45.79608, 46.84759, 47.06841, 46.96281, 47.70312, 47.32522, 48.293 , 48.42568, 48.33951, 48.32094, 47.48303, 46.66074, 47.9282 , 47.12264, 45.05578, 44.3409 , 45.66062, 43.92436, 42.83903, 40.98259, 39.58649, 38.84249, 36.13211, 36.84814, 34.73249, 33.17741, 33.3556 , 31.06652, 27.70613, 26.09503, 25.22437, 20.37247, 22.75393], [ 31.52205, 32.09843, 31.60984, 33.37669, 32.3283 , 33.38177, 33.45572, 33.62916, 33.7662 , 35.04257, 36.42303, 33.33828, 35.37259, 34.64547, 35.76726, 34.59722, 34.48264, 36.39199, 34.64837, 33.76841, 33.66469, 31.6734 , 31.08502, 29.20724, 30.93716, 27.93375, 26.82595, 25.64935, 26.72856, 24.45494, 24.70048, 20.99385, 19.62747, 19.37301, 17.29523, 14.78769, 14.16043, 11.50156, 9.49832, 10.73557, 8.82796, 0. , 3.87115, 4.46637, 8.17529, 8.09024, 8.56917, 11.67969, 12.43841, 13.85593, 17.57202, 19.71272, 21.99594, 22.85153, 22.16445, 25.45666, 27.04794, 29.67531, 31.04348, 32.13853, 33.7628 , 36.48198, 36.14058, 41.01457, 37.91637, 38.79497, 40.96397, 42.64609, 44.26794, 44.55381, 44.89301, 45.34457, 45.22525, 47.30472, 46.71893, 47.3816 , 47.69802, 47.15203, 46.77432, 47.78838, 47.81124, 45.64705, 44.93698, 46.70672, 45.20662, 44.90814, 43.03585, 41.8455 , 41.35057, 39.45815, 40.11168, 37.08534, 37.64673, 36.20666, 35.37051, 31.85248, 29.10527, 28.10747, 24.43796, 27.26505], [ 31.5479 , 31.55098, 31.44096, 33.34017, 31.89116, 33.04059, 33.21501, 33.16394, 33.40285, 34.67094, 36.18591, 33.20841, 34.85009, 34.19767, 35.06246, 34.20947, 34.10065, 35.67965, 34.25079, 33.23805, 33.0211 , 31.05312, 30.53155, 28.64062, 30.0345 , 27.86256, 26.17402, 25.29954, 25.8231 , 23.9113 , 24.0165 , 20.58536, 18.29823, 18.59613, 16.952 , 13.87506, 13.10023, 10.13767, 8.97779, 9.71187, 8.51924, 3.87115, 0. 
, 6.12594, 7.96356, 7.57526, 8.45868, 12.97798, 12.90918, 14.31407, 17.64249, 19.73502, 21.71174, 22.56943, 21.94117, 25.69766, 26.75678, 29.24043, 30.87281, 31.9451 , 33.52238, 36.04839, 35.83875, 40.75469, 37.69173, 38.4327 , 40.95094, 42.4804 , 43.60742, 44.22534, 44.42682, 44.91905, 44.77389, 46.85295, 46.45624, 46.96242, 47.13319, 46.71132, 46.36086, 47.34189, 47.27088, 45.18382, 44.30438, 46.30564, 44.68842, 44.1482 , 42.80237, 41.36941, 40.89437, 38.88697, 39.62182, 36.88549, 37.35607, 35.71787, 34.81861, 31.64177, 28.85684, 27.61326, 24.23459, 27.25467], [ 31.86234, 32.21933, 32.23245, 33.79942, 32.43132, 33.91149, 33.77282, 34.14988, 34.11146, 35.65553, 37.09342, 33.84176, 36.19004, 35.53408, 36.69763, 35.18721, 35.42542, 37.13664, 35.22869, 34.76968, 34.43139, 32.90775, 31.69899, 29.85524, 32.15204, 28.87872, 28.22608, 27.11021, 27.87236, 26.09147, 25.43853, 21.90675, 20.97263, 21.00058, 18.20422, 15.22203, 14.59801, 12.58519, 10.53213, 12.80952, 8.99769, 4.46637, 6.12594, 0. , 8.44401, 7.97982, 8.83251, 10.38464, 10.82867, 12.8995 , 17.10962, 18.97275, 20.8149 , 21.56576, 21.19765, 24.98392, 26.22855, 29.18584, 30.1669 , 31.64604, 33.19261, 35.58516, 35.3853 , 40.18287, 37.57579, 38.53515, 40.71094, 42.71524, 44.28372, 44.57255, 45.13821, 45.55069, 45.47875, 47.62256, 47.08008, 47.68002, 48.15121, 47.49823, 47.35146, 48.3686 , 48.33511, 46.28713, 46.10429, 47.57535, 46.38933, 45.72293, 43.99917, 42.7842 , 42.32976, 40.70953, 41.06713, 38.47232, 38.78739, 37.64475, 36.9269 , 33.11543, 30.98764, 29.72708, 25.83973, 28.4507 ], [ 30.85453, 29.68935, 30.68488, 32.42774, 30.52192, 32.75336, 32.98931, 31.96731, 32.13639, 34.59658, 35.61312, 33.36215, 34.77939, 35.0925 , 35.34281, 34.12158, 35.45233, 35.96695, 34.85416, 33.79277, 34.32978, 32.68055, 32.62122, 29.8061 , 31.65812, 29.7082 , 28.33647, 27.88541, 28.08681, 26.17295, 26.95084, 23.7641 , 21.55663, 20.66568, 19.0518 , 17.04526, 16.76365, 13.75226, 11.64105, 12.4755 , 12.41161, 8.17529, 7.96356, 8.44401, 
0. , 9.43444, 8.12104, 10.99449, 11.78021, 11.37028, 14.90543, 16.80241, 19.058 , 19.18607, 20.42547, 22.96086, 24.54862, 26.69301, 29.07142, 29.81017, 31.49645, 33.32921, 33.38918, 38.71385, 35.4746 , 36.61224, 39.11164, 40.61639, 42.02245, 42.49394, 43.59005, 43.80397, 43.89376, 46.64783, 45.65345, 46.63616, 47.08335, 46.48834, 46.45531, 47.75785, 47.61879, 45.29908, 45.03967, 47.61192, 45.31081, 45.41523, 44.16385, 43.17606, 42.72795, 41.05225, 42.00461, 39.14698, 39.86511, 39.12616, 38.19473, 33.86518, 31.76617, 30.50207, 27.94986, 30.43803], [ 29.69611, 30.64989, 30.737 , 31.97527, 31.02961, 32.5993 , 33.29691, 33.13473, 33.33042, 34.79422, 37.27428, 34.29352, 36.03767, 35.58014, 36.84744, 35.88742, 36.19829, 37.67898, 36.69074, 36.13666, 35.45315, 34.31807, 33.92971, 32.57415, 34.38793, 31.88982, 30.77347, 29.88329, 30.54159, 29.57049, 29.21772, 25.59202, 24.21139, 24.48285, 23.06074, 19.97496, 19.57846, 16.33162, 15.08541, 15.06586, 14.38248, 8.09024, 7.57526, 7.97982, 9.43444, 0. 
, 6.60121, 9.85696, 10.89332, 10.39258, 12.86851, 14.75476, 17.21257, 17.61602, 15.71376, 20.40996, 21.66767, 25.03361, 26.60748, 27.25091, 29.58843, 32.22904, 31.80973, 37.16025, 34.28055, 35.884 , 38.25288, 39.99024, 41.63577, 42.34816, 43.03802, 43.69836, 43.42436, 46.41609, 45.84523, 46.92581, 47.45754, 47.51112, 47.68894, 48.68155, 49.1594 , 47.14392, 46.92675, 49.05016, 47.7489 , 47.3948 , 46.55753, 45.24034, 45.04445, 43.71142, 44.46337, 41.62923, 43.11111, 40.95347, 40.97874, 37.50522, 35.00647, 33.76313, 30.48797, 33.78487], [ 29.41756, 29.84656, 30.20034, 31.72587, 30.1106 , 32.43664, 32.48639, 32.68089, 32.66608, 34.36677, 36.78719, 34.00194, 35.84771, 35.82126, 36.8362 , 35.13336, 36.03964, 37.95828, 37.04425, 36.16784, 35.945 , 34.6565 , 33.80767, 32.29469, 34.38638, 31.87346, 31.09911, 29.87818, 30.74741, 28.96665, 29.55268, 26.79648, 24.16038, 23.84352, 23.05218, 20.45104, 19.38376, 16.16096, 14.93272, 16.27814, 13.54233, 8.56917, 8.45868, 8.83251, 8.12104, 6.60121, 0. , 10.17757, 8.53509, 8.32038, 10.85168, 12.70132, 15.99258, 17.25635, 16.51621, 19.26168, 21.60589, 23.78436, 26.22532, 26.73652, 29.27758, 31.33041, 31.04823, 37.12763, 32.98749, 35.06947, 37.28719, 39.461 , 40.85783, 41.59448, 42.32162, 43.57751, 43.1737 , 45.94993, 45.7186 , 46.70514, 47.35142, 47.22145, 47.41203, 48.74309, 48.88315, 46.89426, 46.8966 , 49.20629, 47.28112, 47.34997, 46.70348, 45.24956, 45.31637, 43.29447, 44.9994 , 41.76229, 42.8692 , 41.88106, 41.23793, 37.14262, 35.24283, 34.57233, 31.19456, 34.11927], [ 28.83014, 30.31266, 30.13288, 31.85386, 31.0687 , 32.23116, 32.89237, 33.34788, 33.38514, 35.45895, 37.18274, 34.24137, 37.05029, 37.49003, 38.82735, 37.20982, 37.86692, 39.3333 , 38.1781 , 37.57777, 38.07486, 36.33305, 36.2285 , 34.31277, 37.50663, 34.12199, 33.82732, 33.15122, 34.21383, 31.93696, 32.62558, 29.58366, 28.57247, 28.26891, 26.13901, 22.81966, 22.39702, 21.27586, 19.4389 , 20.16314, 16.59245, 11.67969, 12.97798, 10.38464, 10.99449, 9.85696, 10.17757, 
0. , 7.83858, 8.41089, 11.19326, 13.48331, 14.05563, 14.64482, 15.51254, 18.33828, 19.49451, 22.64066, 24.21162, 25.72832, 27.49107, 29.94017, 30.09775, 34.51412, 32.6424 , 34.1041 , 36.48373, 38.65803, 41.22222, 41.17685, 42.52243, 43.43884, 43.36506, 46.36946, 45.84661, 46.8006 , 47.56391, 47.96634, 48.21386, 49.40067, 49.99774, 48.4731 , 48.58388, 50.70632, 49.48718, 49.72993, 48.60454, 47.90274, 47.29044, 46.48555, 47.1459 , 44.6641 , 45.39504, 45.2002 , 44.23507, 41.08682, 38.38866, 37.70556, 34.12645, 36.59337], [ 30.11403, 30.95073, 31.62192, 33.66607, 31.53677, 33.38874, 33.31269, 34.37908, 34.29769, 36.3779 , 38.417 , 35.53305, 38.33912, 38.6495 , 39.99487, 37.74652, 38.88208, 40.59642, 39.77615, 39.01639, 39.28353, 37.55324, 36.50334, 34.99807, 38.17473, 35.33819, 34.63993, 33.91166, 34.31615, 32.42409, 32.93689, 30.67949, 28.2877 , 28.32281, 26.38234, 22.94849, 21.86462, 20.36335, 19.22241, 21.00477, 15.72496, 12.43841, 12.90918, 10.82867, 11.78021, 10.89332, 8.53509, 7.83858, 0. 
, 6.82238, 11.4934 , 11.88195, 13.2132 , 14.82213, 16.05954, 18.55392, 19.55994, 22.19041, 24.20944, 25.95339, 27.98701, 29.72961, 30.14266, 35.2683 , 32.68231, 34.49442, 36.84283, 39.96906, 41.25252, 41.86267, 43.03579, 44.67317, 44.36291, 47.1119 , 47.23168, 47.84897, 48.73064, 48.82056, 49.2878 , 50.56509, 50.75426, 49.53957, 49.65108, 51.8766 , 50.44647, 50.26925, 49.99599, 48.57102, 48.20799, 46.79335, 48.1779 , 45.75658, 46.03133, 46.10588, 44.94675, 41.40908, 39.40758, 38.86882, 34.90814, 37.3594 ], [ 28.76504, 29.50086, 30.13293, 32.0357 , 30.2114 , 32.62237, 33.17576, 32.97516, 32.89254, 35.3327 , 37.62287, 35.23732, 37.71305, 38.10808, 39.19622, 37.47408, 39.07201, 39.92405, 39.60791, 38.73035, 39.22602, 37.84719, 37.53278, 35.82 , 38.39304, 36.0731 , 35.32224, 34.60562, 35.0456 , 34.06939, 34.88785, 31.89102, 29.74339, 29.3094 , 28.09777, 25.40346, 24.72382, 22.08521, 20.51852, 21.04933, 18.98626, 13.85593, 14.31407, 12.8995 , 11.37028, 10.39258, 8.32038, 8.41089, 6.82238, 0. 
, 8.71731, 7.76443, 12.45335, 12.2652 , 12.90017, 15.09061, 17.41787, 20.38048, 22.61928, 23.14501, 25.76323, 27.71115, 27.8258 , 33.66319, 30.48156, 32.98226, 34.95939, 37.85604, 39.53608, 40.08616, 41.63043, 43.15796, 42.86376, 46.50258, 45.86231, 47.29211, 48.17182, 48.3848 , 49.04825, 50.09211, 50.93521, 48.90653, 49.34591, 52.28837, 50.39015, 50.58997, 50.40001, 49.3039 , 49.05992, 47.89894, 49.234 , 46.43048, 47.85874, 47.20917, 46.79585, 42.67964, 40.30443, 39.8812 , 36.95114, 39.24277], [ 23.7291 , 25.13342, 25.84322, 26.92882, 25.7518 , 28.36209, 29.06571, 29.38917, 29.26526, 31.54706, 34.72634, 32.32311, 34.34605, 35.42106, 36.6623 , 34.70021, 36.40368, 37.99061, 38.021 , 37.37737, 37.42741, 36.54117, 36.5014 , 35.31492, 38.36668, 35.88703, 35.64687, 34.9862 , 35.81848, 34.49619, 35.60229, 33.4665 , 31.3773 , 30.90888, 30.76486, 27.90575, 27.4199 , 24.78172, 24.28367, 24.7121 , 22.36856, 17.57202, 17.64249, 17.10962, 14.90543, 12.86851, 10.85168, 11.19326, 11.4934 , 8.71731, 0. 
, 5.57175, 8.63071, 9.25549, 9.12862, 9.36863, 12.44362, 14.25313, 17.25268, 17.45389, 20.41051, 22.31093, 22.02977, 28.4554 , 24.40348, 27.41917, 29.65847, 32.03859, 34.25637, 34.87194, 36.40332, 38.15373, 37.65245, 41.47684, 41.28355, 42.71965, 43.82151, 44.61234, 45.47536, 47.05548, 47.78609, 46.48006, 47.17699, 49.92586, 48.24647, 49.03127, 49.42958, 48.33632, 48.54484, 47.60107, 49.68564, 46.61442, 48.6812 , 48.31756, 48.12534, 44.56241, 42.7187 , 42.53511, 39.36061, 42.55081], [ 25.6983 , 26.92125, 27.91135, 29.3656 , 27.6139 , 30.55605, 31.25932, 31.30679, 31.06269, 33.65312, 36.87088, 34.68689, 36.90215, 37.82413, 39.10442, 37.06721, 39.23049, 40.17349, 40.62842, 39.86314, 40.16763, 39.18647, 39.07491, 37.94267, 40.71544, 38.71968, 38.19558, 37.6223 , 38.00399, 37.51745, 38.44528, 36.09133, 33.75635, 33.39326, 33.11761, 30.36558, 29.752 , 26.80699, 26.21266, 26.38835, 24.60879, 19.71272, 19.73502, 18.97275, 16.80241, 14.75476, 12.70132, 13.48331, 11.88195, 7.76443, 5.57175, 0. , 9.15475, 8.56349, 8.47478, 8.49044, 11.99477, 14.44966, 17.32829, 17.19026, 20.41925, 22.11879, 22.02562, 28.80888, 24.85666, 28.34247, 30.26402, 33.37055, 35.15239, 36.01863, 37.78748, 39.81268, 39.32254, 43.59766, 43.22273, 44.89829, 46.04871, 46.77149, 47.87025, 49.21038, 50.23529, 48.6807 , 49.53981, 52.72328, 50.9 , 51.39158, 52.21003, 51.04052, 51.18106, 50.27656, 52.28684, 49.36971, 51.53462, 50.93333, 50.99637, 47.03863, 45.1501 , 44.98107, 42.0262 , 44.83658], [ 25.97412, 26.77764, 28.12311, 29.6113 , 28.20037, 30.3347 , 31.50259, 31.86587, 32.01605, 34.74652, 37.5226 , 34.89579, 37.11372, 38.53678, 39.88191, 38.15456, 39.92573, 40.90994, 41.3098 , 40.9369 , 41.16595, 39.96285, 40.15724, 38.60944, 42.31539, 40.48097, 40.17273, 39.74094, 40.05607, 38.87821, 39.2932 , 37.70993, 35.79133, 35.6437 , 35.1361 , 31.45698, 30.72862, 28.87533, 28.62454, 29.5294 , 26.09397, 21.99594, 21.71174, 20.8149 , 19.058 , 17.21257, 15.99258, 14.05563, 13.2132 , 12.45335, 8.63071, 9.15475, 0. 
, 5.69041, 8.86444, 10.50088, 9.15299, 11.56439, 15.31776, 16.37099, 19.20324, 20.11261, 21.17791, 26.23138, 24.1575 , 26.74297, 29.9642 , 32.63265, 34.33692, 35.4078 , 37.27436, 38.82324, 38.79999, 42.81333, 43.14353, 44.0194 , 45.29074, 46.65033, 47.77203, 49.32322, 49.88537, 49.61055, 50.40164, 52.96687, 51.90866, 52.35132, 53.40064, 52.18807, 52.39054, 51.76294, 53.60591, 51.35824, 52.66349, 52.69263, 52.29538, 49.4784 , 47.59661, 47.05231, 43.63606, 46.86818], [ 25.13513, 25.45479, 27.30813, 28.77098, 27.20175, 29.56765, 31.19405, 30.67301, 30.75263, 33.95725, 36.63014, 34.4585 , 36.56462, 38.03388, 38.95773, 37.88232, 40.05261, 39.98119, 40.72945, 40.37831, 40.59288, 39.6461 , 40.40674, 38.67121, 42.04724, 40.65492, 39.9043 , 40.05704, 39.79457, 39.44799, 39.98048, 37.82863, 36.21335, 35.93431, 35.41829, 32.01961, 31.93212, 29.58085, 29.21067, 29.42525, 27.63374, 22.85153, 22.56943, 21.56576, 19.18607, 17.61602, 17.25635, 14.64482, 14.82213, 12.2652 , 9.25549, 8.56349, 5.69041, 0. 
, 7.69056, 9.58872, 7.27847, 11.07301, 13.97117, 14.85007, 17.3385 , 18.54959, 19.51911, 24.47804, 23.22298, 25.99818, 28.67918, 31.37243, 33.06567, 34.08613, 36.44189, 37.65905, 37.7809 , 42.43773, 42.01543, 43.33093, 44.64622, 45.80397, 47.25139, 48.43046, 49.49756, 48.83554, 49.72558, 52.67093, 51.57562, 52.08085, 53.1481 , 52.11595, 52.16829, 52.09858, 53.53316, 51.2622 , 53.32599, 52.90649, 52.95174, 49.96028, 47.72114, 47.06461, 44.40938, 47.37281], [ 24.45972, 26.64139, 27.01853, 27.98344, 27.57982, 29.58556, 31.33608, 31.14565, 31.17672, 33.40518, 37.07125, 34.33839, 36.48557, 37.13566, 38.89602, 37.77484, 39.20976, 39.92003, 40.56478, 40.3905 , 40.07303, 39.38249, 39.85148, 38.95962, 41.86037, 40.04445, 39.71271, 39.15768, 39.61693, 39.80638, 39.73092, 37.1605 , 36.02599, 36.08269, 35.86916, 32.50931, 32.14663, 29.35632, 29.13434, 28.82948, 27.67937, 22.16445, 21.94117, 21.19765, 20.42547, 15.71376, 16.51621, 15.51254, 16.05954, 12.90017, 9.12862, 8.47478, 8.86444, 7.69056, 0. , 8.27573, 8.48397, 13.22617, 14.5959 , 14.29243, 17.74386, 20.29587, 20.04847, 25.72337, 23.44249, 26.57937, 28.95484, 31.3744 , 33.67799, 34.68713, 36.28302, 37.6514 , 37.46012, 42.17949, 41.8008 , 43.46606, 44.62028, 46.06782, 47.29764, 48.35452, 49.81537, 48.72589, 49.68722, 52.42257, 51.63792, 51.8305 , 52.84663, 51.78041, 52.05071, 51.83186, 53.22438, 50.71905, 53.35513, 51.86422, 52.68216, 49.67751, 47.40003, 46.87143, 43.90372, 47.26843], [ 21.55045, 24.28835, 24.69093, 25.49869, 25.10337, 27.53405, 28.94501, 29.06828, 28.89607, 31.44167, 35.13839, 33.07445, 35.26936, 36.62481, 38.43435, 36.36806, 38.61636, 39.6808 , 40.68159, 40.18733, 40.5409 , 39.77171, 40.40163, 39.63135, 42.90939, 40.78432, 40.82968, 40.26411, 41.10429, 40.83166, 41.86321, 39.86674, 38.43108, 37.85734, 38.17342, 35.50764, 35.16599, 32.37612, 32.23565, 32.05309, 30.67328, 25.45666, 25.69766, 24.98392, 22.96086, 20.40996, 19.26168, 18.33828, 18.55392, 15.09061, 9.36863, 8.49044, 10.50088, 9.58872, 8.27573, 
0. , 7.81203, 9.63841, 11.81341, 10.38626, 14.34691, 16.65127, 16.01583, 22.86644, 18.67511, 22.78851, 24.49932, 27.34599, 30.249 , 30.89222, 32.98648, 35.07533, 34.59757, 39.54874, 39.14111, 41.24033, 42.66911, 44.16304, 45.54425, 47.06384, 48.51593, 47.55693, 48.77945, 51.948 , 50.60804, 51.67431, 52.94916, 52.10773, 52.46879, 52.27757, 54.53359, 51.56906, 54.55259, 53.99784, 54.65517, 51.26881, 49.40055, 49.64106, 46.62981, 49.95592], [ 21.69278, 23.54393, 24.94734, 26.08835, 25.22312, 26.86667, 28.98748, 29.04268, 29.20958, 31.93944, 35.36933, 33.23999, 35.32999, 36.8622 , 38.31651, 37.32521, 39.27939, 39.62528, 40.97394, 40.75477, 40.65161, 39.84257, 40.9679 , 39.99892, 43.56974, 42.24041, 41.59044, 41.81453, 41.69482, 41.71964, 42.40135, 40.65224, 39.29054, 39.11931, 39.25199, 35.83888, 35.88763, 33.42273, 33.74681, 33.66472, 31.99623, 27.04794, 26.75678, 26.22855, 24.54862, 21.66767, 21.60589, 19.49451, 19.55994, 17.41787, 12.44362, 11.99477, 9.15299, 7.27847, 8.48397, 7.81203, 0. 
, 6.39087, 8.34311, 9.09498, 12.01051, 14.03886, 14.66736, 19.35195, 18.37439, 21.53225, 24.01021, 26.91727, 28.70851, 29.95311, 32.23778, 33.96847, 33.87564, 38.86784, 38.83989, 40.22175, 41.76582, 43.56084, 45.32808, 46.48406, 47.93843, 47.98325, 48.86364, 51.79319, 51.20293, 51.97604, 53.74581, 52.60551, 52.8835 , 53.24671, 55.09278, 52.73899, 55.52021, 54.91969, 55.45876, 53.12569, 50.8073 , 50.61611, 47.78862, 51.26349], [ 19.95303, 21.14658, 23.21836, 24.3511 , 22.84998, 25.00712, 26.83509, 27.11562, 27.32181, 30.05815, 33.57977, 31.99141, 33.72941, 35.93997, 37.10258, 35.80232, 38.1003 , 38.71391, 40.4026 , 40.04084, 40.1262 , 39.44919, 40.5153 , 39.64319, 43.4618 , 42.36186, 41.89523, 42.1735 , 42.04974, 41.67128, 43.0919 , 42.17263, 40.11502, 39.73362, 40.55388, 37.37873, 37.34873, 35.02844, 35.77881, 36.01266, 33.83917, 29.67531, 29.24043, 29.18584, 26.69301, 25.03361, 23.78436, 22.64066, 22.19041, 20.38048, 14.25313, 14.44966, 11.56439, 11.07301, 13.22617, 9.63841, 6.39087, 0. 
, 7.5008 , 7.8237 , 10.40305, 10.70596, 11.78864, 17.10573, 14.52165, 18.11827, 20.90729, 23.93647, 25.20954, 26.57818, 28.98693, 31.46102, 31.12039, 35.92599, 36.4427 , 37.60689, 39.32328, 41.32062, 43.23691, 44.71995, 45.92899, 46.36053, 47.3359 , 50.52313, 49.48781, 50.80266, 53.15541, 51.87092, 52.39965, 52.58596, 55.23165, 52.76786, 55.52018, 55.73155, 55.94518, 53.73325, 51.74464, 51.99425, 49.21381, 52.79329], [ 16.26348, 19.0042 , 20.37644, 21.22304, 20.28434, 21.74171, 23.67883, 24.63125, 24.53975, 26.96615, 30.77296, 28.83607, 31.33676, 33.1206 , 34.93565, 33.52616, 35.6578 , 36.21198, 37.91905, 38.03854, 37.85329, 37.48819, 38.42327, 38.13686, 42.34263, 40.86445, 40.82556, 41.17397, 41.08039, 41.61213, 42.45519, 41.48734, 40.44974, 40.26473, 40.93513, 37.77828, 38.05632, 35.98014, 37.02082, 37.14096, 35.09395, 31.04348, 30.87281, 30.1669 , 29.07142, 26.60748, 26.22532, 24.21162, 24.20944, 22.61928, 17.25268, 17.32829, 15.31776, 13.97117, 14.5959 , 11.81341, 8.34311, 7.5008 , 0. 
, 5.83413, 5.46876, 8.07889, 8.10377, 12.28344, 11.70756, 14.58617, 16.50391, 20.31629, 22.0821 , 22.97453, 25.272 , 27.72545, 27.36277, 32.25835, 32.61786, 33.89607, 35.73057, 37.77336, 39.95231, 41.07814, 42.85021, 43.44198, 44.85383, 47.6269 , 47.47827, 48.41191, 50.90451, 49.78713, 50.24847, 51.22179, 53.41837, 51.29034, 54.5929 , 54.6199 , 55.51699, 53.60744, 51.63282, 52.1621 , 49.27814, 52.86302], [ 17.11013, 19.69926, 20.92257, 21.25266, 20.93006, 23.02809, 25.4293 , 25.08637, 25.21252, 27.64338, 31.75893, 30.27728, 32.09544, 33.81829, 35.54241, 34.40962, 36.74788, 37.08048, 39.08311, 39.0819 , 38.93651, 38.87541, 40.1212 , 39.73796, 43.41177, 42.22587, 42.24267, 42.24356, 42.4485 , 43.2428 , 44.1862 , 42.91311, 41.86014, 41.36942, 42.5099 , 39.79016, 39.9811 , 37.25958, 38.04921, 37.87527, 36.84287, 32.13853, 31.9451 , 31.64604, 29.81017, 27.25091, 26.73652, 25.72832, 25.95339, 23.14501, 17.45389, 17.19026, 16.37099, 14.85007, 14.29243, 10.38626, 9.09498, 7.8237 , 5.83413, 0. , 6.41631, 8.54307, 7.54798, 13.83722, 10.47169, 14.98237, 16.51711, 19.64403, 21.76762, 22.8833 , 25.40245, 27.78937, 27.30865, 32.95778, 32.79115, 34.89289, 36.70972, 38.8788 , 41.05635, 42.15574, 44.18684, 44.13238, 45.71765, 48.95971, 48.28665, 49.59359, 52.1083 , 51.08147, 51.81738, 52.5753 , 55.01953, 52.46332, 56.32081, 55.78484, 57.09844, 54.71573, 52.72249, 53.32 , 50.79307, 54.50537], [ 14.32254, 17.01569, 17.85377, 18.87943, 18.50018, 19.62897, 22.19619, 22.31814, 22.16274, 24.84473, 28.54674, 27.08538, 29.33582, 31.46152, 33.22748, 32.12772, 34.53888, 34.29651, 36.66186, 36.62329, 37.02688, 36.52095, 38.22838, 37.88513, 41.79768, 40.89454, 40.88634, 41.4135 , 41.31527, 42.36568, 43.47623, 42.56542, 41.7082 , 41.39021, 42.56349, 39.73051, 40.13104, 38.02517, 39.25933, 38.78613, 37.83474, 33.7628 , 33.52238, 33.19261, 31.49645, 29.58843, 29.27758, 27.49107, 27.98701, 25.76323, 20.41051, 20.41925, 19.20324, 17.3385 , 17.74386, 14.34691, 12.01051, 10.40305, 5.46876, 6.41631, 
0. , 6.56254, 5.82652, 9.42892, 8.76253, 11.01716, 12.97536, 16.14407, 18.65999, 19.26615, 21.70973, 23.98596, 23.83313, 29.46502, 29.31874, 31.04096, 32.71498, 35.20378, 37.43086, 38.47346, 40.74117, 41.15986, 42.72724, 46.03943, 45.7643 , 46.92375, 49.81399, 49.07544, 49.48923, 50.83285, 53.02921, 51.02381, 54.85708, 54.85699, 56.10978, 54.39723, 52.29678, 53.08869, 50.68305, 54.16762], [ 17.62265, 17.70308, 20.91217, 21.25316, 19.27925, 22.0497 , 23.92894, 23.71491, 23.60181, 26.74278, 30.22502, 29.42086, 31.07569, 33.72103, 34.83971, 33.40582, 36.56434, 36.10593, 38.48701, 38.68787, 38.83153, 39.03388, 40.08247, 39.57072, 43.79152, 43.17864, 43.2909 , 43.87194, 43.35611, 44.39476, 45.37599, 45.03955, 43.69002, 43.28881, 44.52465, 41.74234, 42.11747, 40.01821, 41.2876 , 41.51105, 39.87093, 36.48198, 36.04839, 35.58516, 33.32921, 32.22904, 31.33041, 29.94017, 29.72961, 27.71115, 22.31093, 22.11879, 20.11261, 18.54959, 20.29587, 16.65127, 14.03886, 10.70596, 8.07889, 8.54307, 6.56254, 0. 
, 4.33833, 8.76183, 7.88595, 10.96004, 13.3731 , 17.00054, 17.6708 , 18.90909, 22.24521, 24.80605, 24.44948, 29.94102, 30.21988, 31.68858, 33.75574, 35.95484, 38.6216 , 39.86389, 41.58399, 42.28369, 44.38077, 47.67845, 47.02336, 48.24653, 51.67294, 50.62617, 51.37421, 52.55434, 55.1481 , 53.295 , 56.77537, 57.33105, 58.35886, 56.29047, 54.80877, 55.50578, 53.14199, 56.6705 ], [ 15.44168, 16.76036, 19.36837, 19.09945, 17.75401, 20.71073, 22.48331, 22.41495, 22.19147, 24.89471, 28.99081, 28.13654, 29.78355, 32.12859, 33.64244, 31.97045, 35.07799, 34.99984, 37.36496, 37.66226, 37.52198, 38.04894, 38.99665, 38.9161 , 42.99372, 41.97252, 42.33337, 42.72225, 42.53551, 43.77865, 44.76722, 44.26307, 43.19493, 42.72848, 44.12784, 41.66611, 42.08783, 39.69333, 41.02127, 41.02762, 39.81846, 36.14058, 35.83875, 35.3853 , 33.38918, 31.80973, 31.04823, 30.09775, 30.14266, 27.8258 , 22.02977, 22.02562, 21.17791, 19.51911, 20.04847, 16.01583, 14.66736, 11.78864, 8.10377, 7.54798, 5.82652, 4.33833, 0. , 9.49269, 5.60066, 10.13634, 11.30708, 15.06618, 16.60802, 17.5898 , 20.66387, 23.36406, 22.74647, 28.44725, 28.42199, 30.43171, 32.53739, 34.61086, 37.28059, 38.61696, 40.54339, 40.91422, 43.13823, 46.37182, 45.76609, 47.10957, 50.41253, 49.41508, 50.26066, 51.50842, 54.21112, 52.03406, 56.08765, 56.26153, 57.72116, 55.39809, 54.07596, 54.94413, 52.52346, 56.22352], [ 17.37134, 18.85088, 20.86528, 20.88305, 20.49716, 20.89719, 23.70778, 23.87865, 23.8951 , 26.39705, 29.49656, 28.78396, 30.69899, 33.34453, 34.72378, 34.19075, 36.59537, 35.8496 , 38.4644 , 39.09107, 39.07799, 39.2854 , 41.12947, 40.94059, 45.43309, 44.85097, 45.05862, 46.08496, 45.50379, 46.84558, 47.773 , 47.54766, 47.14492, 46.85017, 48.09543, 45.21051, 46.05367, 44.39596, 45.92867, 45.8564 , 44.45979, 41.01457, 40.75469, 40.18287, 38.71385, 37.16025, 37.12763, 34.51412, 35.2683 , 33.66319, 28.4554 , 28.80888, 26.23138, 24.47804, 25.72337, 22.86644, 19.35195, 17.10573, 12.28344, 13.83722, 9.42892, 8.76183, 9.49269, 
0. , 10.77901, 10.05965, 11.4491 , 14.23453, 15.65154, 15.89031, 19.35493, 21.37071, 21.35245, 26.88469, 27.14093, 28.27884, 30.5466 , 33.50676, 36.4934 , 37.19752, 39.64894, 41.31134, 43.32302, 46.1664 , 46.66713, 48.02232, 51.58657, 50.8496 , 51.44928, 53.70231, 55.7896 , 54.36972, 58.30467, 58.85498, 60.2241 , 59.29777, 57.3284 , 58.22332, 56.01087, 59.61459], [ 13.90914, 15.67933, 17.38101, 16.98734, 16.24356, 18.9401 , 20.61034, 20.79483, 20.57865, 22.80323, 27.01958, 26.43445, 27.88792, 30.4796 , 32.11209, 30.2786 , 33.28937, 33.74472, 36.32379, 36.48663, 36.60449, 37.05129, 38.17204, 38.2891 , 42.39361, 41.45324, 42.17539, 42.30557, 42.54538, 43.53456, 44.99764, 44.95504, 43.81483, 43.10439, 45.13455, 42.9827 , 43.28691, 40.98925, 42.53998, 42.62709, 41.3634 , 37.91637, 37.69173, 37.57579, 35.4746 , 34.28055, 32.98749, 32.6424 , 32.68231, 30.48156, 24.40348, 24.85666, 24.1575 , 23.22298, 23.44249, 18.67511, 18.37439, 14.52165, 11.70756, 10.47169, 8.76253, 7.88595, 5.60066, 10.77901, 0. 
, 7.37848, 7.94763, 11.60285, 13.60009, 14.22662, 16.93863, 20.48684, 19.57347, 25.10732, 25.49894, 27.55298, 29.7128 , 32.13686, 34.70808, 36.23865, 38.20019, 38.77183, 41.07158, 44.37704, 43.59581, 45.48707, 49.01208, 48.08088, 49.19229, 50.28702, 53.56762, 51.16563, 55.41252, 55.96844, 57.4112 , 55.34975, 54.10888, 55.4712 , 53.09032, 56.86156], [ 11.73735, 13.11373, 14.14065, 14.43121, 13.95706, 14.67542, 16.80414, 17.13811, 17.20223, 19.30856, 22.64011, 22.12388, 23.6909 , 26.68134, 28.69735, 26.83978, 29.56233, 29.52144, 32.42384, 32.61225, 33.43265, 33.33446, 34.9892 , 35.07866, 39.51926, 38.91288, 39.71804, 40.15065, 40.30517, 41.47749, 42.85482, 43.36693, 42.6301 , 42.13604, 44.19635, 41.96344, 42.38342, 40.81211, 42.80247, 42.66475, 41.50962, 38.79497, 38.4327 , 38.53515, 36.61224, 35.884 , 35.06947, 34.1041 , 34.49442, 32.98226, 27.41917, 28.34247, 26.74297, 25.99818, 26.57937, 22.78851, 21.53225, 18.11827, 14.58617, 14.98237, 11.01716, 10.96004, 10.13634, 10.05965, 7.37848, 0. 
, 6.85229, 8.33375, 10.64638, 10.34292, 12.47002, 15.48307, 15.00604, 20.17726, 20.87708, 22.2614 , 24.11224, 27.27657, 29.58244, 31.1899 , 33.13339, 34.40247, 36.49848, 39.87239, 39.52611, 41.18787, 45.18565, 44.54115, 45.31625, 47.01201, 50.01015, 48.42501, 52.3525 , 53.37068, 54.80933, 53.68554, 52.36036, 53.92575, 51.51095, 55.33784], [ 12.23879, 14.57384, 15.49992, 14.87293, 14.68697, 15.87359, 17.54135, 18.05839, 17.67657, 19.43576, 23.11653, 23.17097, 24.95077, 27.47971, 29.37637, 27.6592 , 30.64956, 30.75883, 33.65172, 34.05121, 34.30145, 34.86519, 36.38785, 36.89862, 41.2064 , 40.27944, 41.17056, 41.69059, 41.79641, 43.3667 , 45.0001 , 45.22628, 44.81877, 43.99537, 46.17291, 44.45906, 45.19904, 43.15676, 45.02922, 44.95803, 44.11761, 40.96397, 40.95094, 40.71094, 39.11164, 38.25288, 37.28719, 36.48373, 36.84283, 34.95939, 29.65847, 30.26402, 29.9642 , 28.67918, 28.95484, 24.49932, 24.01021, 20.90729, 16.50391, 16.51711, 12.97536, 13.3731 , 11.30708, 11.4491 , 7.94763, 6.85229, 0. 
, 7.42181, 9.81149, 8.48963, 11.54921, 15.39613, 14.38347, 19.86861, 19.77106, 22.06307, 24.50788, 26.82158, 29.77616, 30.96037, 33.5255 , 34.52259, 36.99302, 40.18245, 39.99496, 42.1269 , 45.88592, 45.21708, 46.21675, 48.21974, 51.36139, 49.18705, 54.05609, 54.84453, 56.72943, 55.20246, 53.91346, 55.78262, 53.64081, 57.31579], [ 12.84236, 14.49709, 14.33989, 12.60412, 14.86848, 14.84921, 17.64634, 16.67094, 16.61788, 18.22819, 21.4745 , 21.90119, 22.44497, 25.54979, 27.27617, 26.30551, 29.02811, 28.63502, 31.82761, 32.23 , 32.81646, 33.37436, 35.95878, 36.20281, 40.11474, 39.6453 , 40.67256, 41.24556, 41.68827, 43.24129, 44.84355, 45.15693, 45.24839, 44.34038, 46.91914, 45.37185, 46.24403, 44.3192 , 46.2601 , 45.70734, 45.79608, 42.64609, 42.4804 , 42.71524, 40.61639, 39.99024, 39.461 , 38.65803, 39.96906, 37.85604, 32.03859, 33.37055, 32.63265, 31.37243, 31.3744 , 27.34599, 26.91727, 23.93647, 20.31629, 19.64403, 16.14407, 17.00054, 15.06618, 14.23453, 11.60285, 8.33375, 7.42181, 0. , 9.27097, 6.7636 , 9.28897, 10.82079, 10.31003, 16.92184, 15.97901, 18.91758, 20.92638, 24.2509 , 26.79505, 28.28992, 30.96807, 32.00602, 34.44939, 37.75691, 37.62064, 40.16677, 43.88795, 43.79662, 44.82826, 47.13516, 50.10628, 48.05767, 53.25517, 53.82784, 55.9775 , 54.96514, 53.63403, 55.43629, 53.67147, 57.5573 ], [ 16.11693, 13.93834, 17.43602, 16.25858, 14.61363, 15.9765 , 18.13698, 16.67337, 17.05001, 18.64491, 21.88627, 23.31711, 22.73515, 25.74229, 26.28932, 26.20587, 29.40351, 28.32294, 32.1881 , 32.67538, 32.20425, 33.51843, 35.6227 , 36.05376, 39.74911, 40.59724, 40.65653, 41.85591, 40.95449, 43.27906, 44.80852, 45.74134, 44.99179, 44.26734, 47.23808, 45.70899, 46.80226, 44.52264, 46.99824, 46.90492, 46.84759, 44.26794, 43.60742, 44.28372, 42.02245, 41.63577, 40.85783, 41.22222, 41.25252, 39.53608, 34.25637, 35.15239, 34.33692, 33.06567, 33.67799, 30.249 , 28.70851, 25.20954, 22.0821 , 21.76762, 18.65999, 17.6708 , 16.60802, 15.65154, 13.60009, 10.64638, 9.81149, 9.27097, 
0. , 5.4579 , 8.31392, 11.37399, 10.06637, 15.80099, 16.19604, 17.81404, 20.35224, 22.68455, 26.33321, 27.22424, 29.39485, 31.26857, 33.44303, 36.96176, 36.64784, 38.92403, 43.95773, 42.77504, 44.24492, 46.45457, 49.95585, 48.08682, 53.30444, 53.83046, 55.9695 , 55.1559 , 54.03066, 55.76022, 54.37677, 58.38252], [ 14.78883, 13.97627, 15.89444, 14.52592, 14.13096, 14.6316 , 16.73796, 15.59013, 15.68349, 17.05169, 19.79573, 21.43302, 21.53386, 24.75989, 25.69016, 25.12388, 28.14274, 27.26777, 30.80285, 31.35308, 31.53711, 32.6792 , 34.90092, 35.34433, 39.34168, 39.49418, 40.19097, 41.23647, 40.88777, 42.89426, 44.71783, 45.58144, 45.27901, 44.39004, 47.21417, 45.86452, 47.00371, 45.11777, 47.39798, 47.20804, 47.06841, 44.55381, 44.22534, 44.57255, 42.49394, 42.34816, 41.59448, 41.17685, 41.86267, 40.08616, 34.87194, 36.01863, 35.4078 , 34.08613, 34.68713, 30.89222, 29.95311, 26.57818, 22.97453, 22.8833 , 19.26615, 18.90909, 17.5898 , 15.89031, 14.22662, 10.34292, 8.48963, 6.7636 , 5.4579 , 0. 
, 6.20725, 9.55298, 8.01646, 13.88004, 13.47562, 15.73658, 18.32946, 21.00988, 24.41039, 25.357 , 28.05329, 29.56878, 32.10165, 35.5989 , 35.30488, 37.94608, 42.47888, 41.88941, 43.10642, 45.66534, 48.94918, 47.14171, 52.39688, 53.39353, 55.47101, 54.71155, 53.48973, 55.54874, 54.10548, 57.89195], [ 14.85241, 15.28133, 14.88863, 14.10743, 14.51717, 13.43337, 15.0992 , 15.29917, 15.30447, 14.82973, 18.19257, 19.21266, 19.39791, 22.16485, 23.88593, 23.22648, 25.25644, 25.1948 , 28.93245, 29.36674, 29.60455, 30.40374, 32.5457 , 33.67934, 37.45432, 37.70761, 38.65435, 39.41798, 39.44077, 41.53307, 43.31817, 44.57787, 44.24326, 43.65807, 46.86937, 45.49593, 46.37995, 44.80329, 47.50405, 47.14969, 46.96281, 44.89301, 44.42682, 45.13821, 43.59005, 43.03802, 42.32162, 42.52243, 43.03579, 41.63043, 36.40332, 37.78748, 37.27436, 36.44189, 36.28302, 32.98648, 32.23778, 28.98693, 25.272 , 25.40245, 21.70973, 22.24521, 20.66387, 19.35493, 16.93863, 12.47002, 11.54921, 9.28897, 8.31392, 6.20725, 0. , 8.05145, 5.69215, 9.9497 , 11.36293, 12.71091, 14.59198, 18.34641, 21.18762, 22.23008, 25.12185, 26.96778, 29.22174, 32.55304, 32.67088, 35.01998, 40.03592, 39.3736 , 40.65807, 43.06865, 46.5909 , 44.99134, 50.2539 , 51.23541, 53.43403, 53.31284, 52.11044, 54.48838, 52.87226, 56.74007], [ 16.81082, 15.43603, 15.37154, 13.84321, 15.68778, 14.09887, 16.73965, 14.46026, 14.95815, 15.51568, 16.9958 , 18.09438, 17.15086, 20.06477, 21.52732, 22.09752, 24.3272 , 22.50728, 26.4393 , 27.44256, 27.68888, 28.64096, 31.91806, 32.25061, 35.8801 , 36.65188, 37.60561, 38.64347, 38.47113, 41.05796, 42.17357, 43.24457, 44.00885, 43.18833, 46.30617, 45.23056, 46.44777, 44.7664 , 47.43771, 46.86 , 47.70312, 45.34457, 44.91905, 45.55069, 43.80397, 43.69836, 43.57751, 43.43884, 44.67317, 43.15796, 38.15373, 39.81268, 38.82324, 37.65905, 37.6514 , 35.07533, 33.96847, 31.46102, 27.72545, 27.78937, 23.98596, 24.80605, 23.36406, 21.37071, 20.48684, 15.48307, 15.39613, 10.82079, 11.37399, 9.55298, 8.05145, 
0. , 5.15882, 10.42401, 8.2677 , 10.41686, 12.23247, 15.69246, 18.50621, 19.6109 , 22.38789, 24.44886, 26.89026, 29.8127 , 30.67153, 32.99654, 37.56207, 37.45707, 38.73435, 41.86474, 44.60558, 43.27327, 48.91205, 49.33654, 52.18163, 52.20071, 51.11608, 53.03856, 51.85091, 55.85643], [ 15.84987, 15.38908, 15.57648, 13.02127, 14.51018, 13.393 , 15.39915, 14.31921, 14.4235 , 14.01205, 16.94848, 18.37291, 17.26961, 20.06272, 21.55272, 21.38396, 23.56296, 23.05361, 26.65776, 27.60226, 27.2861 , 28.81112, 31.28384, 32.36628, 35.90349, 36.33936, 37.35235, 38.26693, 38.21591, 40.77052, 42.00564, 43.23535, 43.62399, 42.95227, 46.18952, 45.08133, 46.32251, 44.57585, 47.28364, 46.8637 , 47.32522, 45.22525, 44.77389, 45.47875, 43.89376, 43.42436, 43.1737 , 43.36506, 44.36291, 42.86376, 37.65245, 39.32254, 38.79999, 37.7809 , 37.46012, 34.59757, 33.87564, 31.12039, 27.36277, 27.30865, 23.83313, 24.44948, 22.74647, 21.35245, 19.57347, 15.00604, 14.38347, 10.31003, 10.06637, 8.01646, 5.69215, 5.15882, 0. 
, 7.97303, 7.41391, 9.92165, 12.10268, 15.51953, 18.54691, 19.63637, 22.4477 , 24.28556, 26.89341, 29.828 , 30.30008, 32.67363, 37.5192 , 37.10411, 38.59252, 41.46073, 44.64997, 43.08337, 48.79588, 49.29285, 52.03523, 51.96351, 51.03286, 53.18936, 51.82288, 55.93366], [ 19.49531, 19.14731, 18.73803, 16.58691, 17.43174, 15.35339, 15.3146 , 16.68845, 16.70277, 14.33961, 16.33478, 17.67573, 16.50911, 19.02226, 20.82185, 19.8089 , 20.85461, 22.12662, 25.03495, 26.2458 , 25.60395, 27.30426, 28.86779, 30.7808 , 34.72309, 34.78051, 36.34129, 37.14527, 37.33843, 39.41463, 40.73932, 43.03554, 43.42716, 42.93121, 46.3353 , 45.40294, 46.58154, 45.54288, 48.66654, 48.70376, 48.293 , 47.30472, 46.85295, 47.62256, 46.64783, 46.41609, 45.94993, 46.36946, 47.1119 , 46.50258, 41.47684, 43.59766, 42.81333, 42.43773, 42.17949, 39.54874, 38.86784, 35.92599, 32.25835, 32.95778, 29.46502, 29.94102, 28.44725, 26.88469, 25.10732, 20.17726, 19.86861, 16.92184, 15.80099, 13.88004, 9.9497 , 10.42401, 7.97303, 0. , 7.5317 , 5.31432, 7.67385, 11.29427, 13.88518, 15.65731, 17.41581, 21.03678, 23.50462, 25.23083, 26.39159, 28.81917, 34.13332, 33.45212, 35.14618, 38.06477, 41.77849, 40.61566, 45.83126, 47.30105, 49.72211, 50.48444, 50.07272, 52.69689, 50.98398, 55.16405], [ 19.10495, 18.0164 , 17.66276, 15.13994, 16.76305, 15.15473, 16.21818, 14.41634, 14.6234 , 13.3435 , 14.10467, 16.69031, 14.97885, 17.40359, 18.79088, 18.83186, 20.80185, 19.74132, 23.12291, 24.4287 , 24.28611, 26.30338, 28.98518, 30.11484, 33.54021, 33.80123, 35.23204, 36.28322, 36.35944, 39.1773 , 40.54667, 41.97252, 43.097 , 42.14583, 45.46555, 45.13354, 46.66496, 45.13382, 47.8922 , 47.41032, 48.42568, 46.71893, 46.45624, 47.08008, 45.65345, 45.84523, 45.7186 , 45.84661, 47.23168, 45.86231, 41.28355, 43.22273, 43.14353, 42.01543, 41.8008 , 39.14111, 38.83989, 36.4427 , 32.61786, 32.79115, 29.31874, 30.21988, 28.42199, 27.14093, 25.49894, 20.87708, 19.77106, 15.97901, 16.19604, 13.47562, 11.36293, 8.2677 , 7.41391, 7.5317 , 
0. , 6.63347, 8.51215, 10.2156 , 13.17845, 14.2086 , 17.41713, 18.82641, 21.82783, 24.56628, 25.20903, 28.13747, 32.64308, 32.72029, 34.13216, 37.76093, 40.74152, 39.28459, 45.54548, 46.22894, 49.39464, 49.45139, 48.79067, 51.31535, 50.31884, 54.28067], [ 20.82333, 19.25691, 19.14902, 17.45726, 18.13449, 15.57783, 15.90665, 16.10611, 16.24885, 14.52817, 14.58976, 16.42476, 14.84698, 17.54345, 18.59526, 18.86371, 19.89348, 19.63274, 22.70202, 24.17496, 23.68602, 25.2772 , 27.6702 , 29.01616, 32.87432, 33.49258, 34.77337, 36.14132, 35.80693, 38.1895 , 39.4803 , 41.7482 , 42.57723, 41.96176, 45.29483, 44.55667, 46.07288, 45.13769, 48.31191, 48.23058, 48.33951, 47.3816 , 46.96242, 47.68002, 46.63616, 46.92581, 46.70514, 46.8006 , 47.84897, 47.29211, 42.71965, 44.89829, 44.0194 , 43.33093, 43.46606, 41.24033, 40.22175, 37.60689, 33.89607, 34.89289, 31.04096, 31.68858, 30.43171, 28.27884, 27.55298, 22.2614 , 22.06307, 18.91758, 17.81404, 15.73658, 12.71091, 10.41686, 9.92165, 5.31432, 6.63347, 0. 
, 4.70483, 7.62293, 10.77344, 12.04317, 14.09367, 18.36526, 20.64699, 22.23753, 23.95215, 26.43932, 31.57022, 31.17569, 32.66191, 36.26804, 39.40728, 38.57361, 43.9427 , 45.43841, 47.97621, 49.1094 , 48.50627, 50.97824, 49.73738, 53.71283], [ 22.2353 , 20.79821, 19.51857, 18.68005, 19.68793, 16.38066, 16.80116, 16.76691, 16.87216, 14.70497, 14.07395, 15.74665, 14.0348 , 16.56831, 17.84671, 18.34015, 18.69289, 17.89876, 21.31299, 22.32386, 22.83922, 23.65439, 26.58669, 27.92706, 31.3026 , 32.35566, 33.6225 , 34.9677 , 34.8564 , 37.36602, 38.72058, 41.05749, 41.90128, 41.49189, 44.96068, 44.24469, 45.61367, 45.036 , 48.29145, 47.81376, 48.32094, 47.69802, 47.13319, 48.15121, 47.08335, 47.45754, 47.35142, 47.56391, 48.73064, 48.17182, 43.82151, 46.04871, 45.29074, 44.64622, 44.62028, 42.66911, 41.76582, 39.32328, 35.73057, 36.70972, 32.71498, 33.75574, 32.53739, 30.5466 , 29.7128 , 24.11224, 24.50788, 20.92638, 20.35224, 18.32946, 14.59198, 12.23247, 12.10268, 7.67385, 8.51215, 4.70483, 0. 
, 7.52323, 8.61132, 9.93048, 12.5578 , 16.40777, 18.25766, 20.46284, 22.14114, 24.25493, 29.68935, 29.77053, 30.84631, 34.5401 , 37.42484, 37.09933, 42.31915, 43.87867, 46.39273, 48.0183 , 47.23087, 49.8669 , 48.80525, 52.57763], [ 24.10629, 21.17772, 21.56308, 20.17601, 19.75357, 18.12999, 17.15962, 16.68857, 16.49137, 14.77414, 12.99207, 15.65608, 13.16584, 15.07742, 15.16618, 15.42575, 17.03854, 15.79792, 18.64901, 20.26882, 19.6321 , 21.82505, 23.92989, 25.32419, 28.63784, 29.75799, 30.92734, 32.65462, 31.85679, 34.97812, 36.16405, 38.72517, 39.82617, 39.07646, 42.56432, 42.6187 , 44.385 , 43.45157, 46.78708, 46.72845, 47.48303, 47.15203, 46.71132, 47.49823, 46.48834, 47.51112, 47.22145, 47.96634, 48.82056, 48.3848 , 44.61234, 46.77149, 46.65033, 45.80397, 46.06782, 44.16304, 43.56084, 41.32062, 37.77336, 38.8788 , 35.20378, 35.95484, 34.61086, 33.50676, 32.13686, 27.27657, 26.82158, 24.2509 , 22.68455, 21.00988, 18.34641, 15.69246, 15.51953, 11.29427, 10.2156 , 7.62293, 7.52323, 0. 
, 5.61253, 7.21393, 8.09939, 12.2139 , 14.96738, 16.37581, 17.99627, 20.35805, 25.70473, 25.34467, 26.89611, 30.90733, 34.04257, 33.31594, 39.15504, 40.74492, 43.65432, 44.59624, 44.58934, 47.23448, 46.50554, 50.1937 ], [ 25.58022, 23.34925, 22.21288, 21.41715, 21.78947, 19.33544, 18.1004 , 18.02822, 17.85663, 15.56051, 12.74353, 14.72969, 12.28025, 13.83543, 14.82818, 14.24007, 14.62848, 14.13342, 16.46835, 17.63621, 18.43043, 19.44844, 21.86083, 23.29639, 26.36769, 27.30116, 29.00015, 30.32245, 30.32951, 33.00745, 34.29426, 37.14338, 38.47462, 37.79461, 41.36339, 41.59934, 43.10472, 42.69738, 46.0658 , 45.78045, 46.66074, 46.77432, 46.36086, 47.35146, 46.45531, 47.68894, 47.41203, 48.21386, 49.2878 , 49.04825, 45.47536, 47.87025, 47.77203, 47.25139, 47.29764, 45.54425, 45.32808, 43.23691, 39.95231, 41.05635, 37.43086, 38.6216 , 37.28059, 36.4934 , 34.70808, 29.58244, 29.77616, 26.79505, 26.33321, 24.41039, 21.18762, 18.50621, 18.54691, 13.88518, 13.17845, 10.77344, 8.61132, 5.61253, 0. 
, 6.28487, 5.75247, 9.62946, 11.75449, 13.17307, 14.78111, 17.10354, 22.23733, 22.66218, 23.80291, 27.72151, 30.83746, 30.51189, 35.9164 , 37.93316, 40.70448, 42.1526 , 42.22162, 45.16766, 44.28871, 47.87966], [ 26.94421, 24.69467, 23.47894, 23.0355 , 23.32496, 20.47957, 20.22697, 19.23566, 19.18505, 16.74031, 13.92433, 16.18704, 14.13143, 14.50581, 14.49505, 16.84456, 16.26612, 14.08034, 16.81846, 18.1981 , 18.32631, 19.69742, 22.75265, 24.12436, 26.4844 , 28.38011, 29.40958, 31.15585, 30.47848, 34.10653, 35.27955, 37.61494, 39.17248, 38.59134, 42.23577, 42.40437, 44.24015, 43.68518, 47.00178, 46.55168, 47.9282 , 47.78838, 47.34189, 48.3686 , 47.75785, 48.68155, 48.74309, 49.40067, 50.56509, 50.09211, 47.05548, 49.21038, 49.32322, 48.43046, 48.35452, 47.06384, 46.48406, 44.71995, 41.07814, 42.15574, 38.47346, 39.86389, 38.61696, 37.19752, 36.23865, 31.1899 , 30.96037, 28.28992, 27.22424, 25.357 , 22.23008, 19.6109 , 19.63637, 15.65731, 14.2086 , 12.04317, 9.93048, 7.21393, 6.28487, 0. , 7.47856, 9.53171, 11.18018, 13.01788, 15.47366, 17.28446, 22.4102 , 22.68226, 23.79207, 28.38873, 30.63132, 30.55106, 36.61026, 37.86975, 41.02305, 43.05683, 42.18162, 45.17244, 45.04898, 48.29225], [ 29.15922, 25.91663, 25.86362, 25.07801, 24.54554, 22.53012, 21.04317, 20.91886, 20.91421, 18.93591, 15.69868, 17.54084, 14.59832, 15.50799, 15.15638, 15.52977, 15.56155, 14.92898, 16.44223, 18.06242, 17.7009 , 19.3097 , 21.11808, 22.42396, 25.34082, 27.09472, 28.36382, 30.05495, 29.27892, 32.18818, 33.02234, 36.50124, 37.86386, 37.2521 , 40.86225, 41.24915, 42.9754 , 42.70208, 46.27576, 46.44799, 47.12264, 47.81124, 47.27088, 48.33511, 47.61879, 49.1594 , 48.88315, 49.99774, 50.75426, 50.93521, 47.78609, 50.23529, 49.88537, 49.49756, 49.81537, 48.51593, 47.93843, 45.92899, 42.85021, 44.18684, 40.74117, 41.58399, 40.54339, 39.64894, 38.20019, 33.13339, 33.5255 , 30.96807, 29.39485, 28.05329, 25.12185, 22.38789, 22.4477 , 17.41581, 17.41713, 14.09367, 12.5578 , 8.09939, 5.75247, 
7.47856, 0. , 9.46928, 10.77391, 10.07175, 12.78621, 14.47174, 20.14208, 19.79391, 21.44302, 25.49215, 28.64581, 28.7825 , 33.76637, 36.05478, 38.72334, 40.7433 , 41.16185, 43.95332, 43.33299, 46.9185 ], [ 29.35633, 26.0043 , 25.36028, 24.73817, 24.18059, 23.41117, 22.01882, 19.93191, 19.70914, 17.75143, 14.25548, 16.84414, 13.82806, 13.55086, 12.82821, 13.17945, 13.83998, 11.06895, 12.42447, 13.71252, 14.75941, 17.16471, 18.7693 , 19.7537 , 21.25983, 23.18208, 25.0391 , 26.14201, 25.8884 , 29.92886, 30.89456, 33.46255, 34.99443, 34.09993, 37.9703 , 39.18714, 40.85532, 40.23943, 43.40774, 43.06789, 45.05578, 45.64705, 45.18382, 46.28713, 45.29908, 47.14392, 46.89426, 48.4731 , 49.53957, 48.90653, 46.48006, 48.6807 , 49.61055, 48.83554, 48.72589, 47.55693, 47.98325, 46.36053, 43.44198, 44.13238, 41.15986, 42.28369, 40.91422, 41.31134, 38.77183, 34.40247, 34.52259, 32.00602, 31.26857, 29.56878, 26.96778, 24.44886, 24.28556, 21.03678, 18.82641, 18.36526, 16.40777, 12.2139 , 9.62946, 9.53171, 9.46928, 0. 
, 7.70034, 11.03984, 9.3173 , 11.00972, 16.06011, 16.8982 , 18.3223 , 22.22843, 24.9604 , 24.69864, 30.94594, 32.41159, 35.92687, 36.56528, 36.93542, 40.15198, 40.34494, 43.38612], [ 30.44926, 27.48027, 26.33286, 26.67714, 26.19812, 23.94749, 23.10616, 21.22647, 21.64589, 19.27456, 15.27649, 17.82539, 14.44424, 13.77702, 12.71296, 15.12571, 13.72169, 11.02782, 12.98025, 12.60611, 14.24738, 14.47088, 18.18955, 19.01006, 19.4387 , 22.41711, 22.59333, 24.6254 , 23.90214, 27.45379, 29.48465, 32.11582, 33.29005, 32.53784, 36.62617, 37.89513, 39.73088, 39.23902, 42.66151, 41.80521, 44.3409 , 44.93698, 44.30438, 46.10429, 45.03967, 46.92675, 46.8966 , 48.58388, 49.65108, 49.34591, 47.17699, 49.53981, 50.40164, 49.72558, 49.68722, 48.77945, 48.86364, 47.3359 , 44.85383, 45.71765, 42.72724, 44.38077, 43.13823, 43.32302, 41.07158, 36.49848, 36.99302, 34.44939, 33.44303, 32.10165, 29.22174, 26.89026, 26.89341, 23.50462, 21.82783, 20.64699, 18.25766, 14.96738, 11.75449, 11.18018, 10.77391, 7.70034, 0. 
, 9.07328, 6.77896, 9.73201, 14.62719, 15.04701, 15.37031, 19.6617 , 22.50593, 22.25054, 28.56357, 29.72752, 32.81706, 34.96093, 34.29057, 37.67062, 38.17835, 41.31302], [ 33.44155, 31.09767, 29.98075, 29.31584, 29.52194, 27.14785, 25.70386, 25.60087, 25.6927 , 22.94456, 19.74988, 20.46911, 17.8433 , 16.34443, 15.97897, 17.63489, 14.82471, 15.65033, 14.62265, 16.39878, 14.81143, 16.52024, 18.2176 , 19.94131, 21.35453, 22.75371, 24.07796, 25.64368, 25.36182, 28.27133, 28.8475 , 32.19497, 34.59372, 33.99566, 37.63698, 38.80147, 40.79164, 40.71097, 44.23342, 44.2389 , 45.66062, 46.70672, 46.30564, 47.57535, 47.61192, 49.05016, 49.20629, 50.70632, 51.8766 , 52.28837, 49.92586, 52.72328, 52.96687, 52.67093, 52.42257, 51.948 , 51.79319, 50.52313, 47.6269 , 48.95971, 46.03943, 47.67845, 46.37182, 46.1664 , 44.37704, 39.87239, 40.18245, 37.75691, 36.96176, 35.5989 , 32.55304, 29.8127 , 29.828 , 25.23083, 24.56628, 22.23753, 20.46284, 16.37581, 13.17307, 13.01788, 10.07175, 11.03984, 9.07328, 0. 
, 8.87214, 9.85187, 12.31734, 12.2625 , 14.04632, 18.81453, 21.19061, 21.12454, 27.04242, 28.49102, 31.8115 , 34.8203 , 35.12966, 38.15204, 38.01118, 41.47549], [ 33.20634, 29.88905, 29.33639, 29.02044, 28.20267, 27.12172, 25.33855, 24.01222, 24.18448, 21.91537, 18.25877, 20.57828, 16.98162, 16.4955 , 14.73869, 15.48099, 14.34678, 14.30857, 13.85173, 13.84417, 14.39659, 15.93936, 17.45544, 18.33339, 18.66931, 20.88887, 21.93502, 23.38049, 23.14462, 25.75202, 27.86781, 31.24603, 32.25062, 31.15287, 35.45185, 37.34563, 39.14055, 38.75821, 42.09818, 41.92408, 43.92436, 45.20662, 44.68842, 46.38933, 45.31081, 47.7489 , 47.28112, 49.48718, 50.44647, 50.39015, 48.24647, 50.9 , 51.90866, 51.57562, 51.63792, 50.60804, 51.20293, 49.48781, 47.47827, 48.28665, 45.7643 , 47.02336, 45.76609, 46.66713, 43.59581, 39.52611, 39.99496, 37.62064, 36.64784, 35.30488, 32.67088, 30.67153, 30.30008, 26.39159, 25.20903, 23.95215, 22.14114, 17.99627, 14.78111, 15.47366, 12.78621, 9.3173 , 6.77896, 8.87214, 0. , 8.06218, 11.429 , 11.46571, 13.13148, 16.10375, 20.41288, 19.36076, 25.3713 , 27.60118, 30.25563, 31.87386, 32.4665 , 36.03614, 36.48471, 39.63076], [ 35.10944, 31.87226, 31.30981, 31.28387, 30.01717, 29.01223, 26.90823, 26.23519, 26.08005, 23.84675, 20.80091, 21.7296 , 19.04357, 17.40327, 16.7043 , 16.55105, 15.01213, 14.46629, 13.32047, 13.90252, 14.09233, 15.41702, 15.00264, 16.81857, 16.58331, 19.50611, 20.62995, 21.94394, 21.09032, 25.34647, 25.48297, 29.25163, 30.55199, 30.43379, 34.21791, 35.75256, 37.3534 , 37.46691, 41.14137, 40.96925, 42.83903, 44.90814, 44.1482 , 45.72293, 45.41523, 47.3948 , 47.34997, 49.72993, 50.26925, 50.58997, 49.03127, 51.39158, 52.35132, 52.08085, 51.8305 , 51.67431, 51.97604, 50.80266, 48.41191, 49.59359, 46.92375, 48.24653, 47.10957, 48.02232, 45.48707, 41.18787, 42.1269 , 40.16677, 38.92403, 37.94608, 35.01998, 32.99654, 32.67363, 28.81917, 28.13747, 26.43932, 24.25493, 20.35805, 17.10354, 17.28446, 14.47174, 11.00972, 9.73201, 9.85187, 8.06218, 
0. , 9.67155, 9.04695, 10.17736, 13.59906, 16.54233, 17.9564 , 22.847 , 24.60167, 27.9245 , 29.98479, 31.1793 , 34.37541, 34.48999, 37.48156], [ 37.58041, 35.07482, 33.73965, 33.55744, 33.36934, 32.1432 , 30.27089, 29.55426, 29.35354, 27.42168, 23.91183, 24.02904, 22.24141, 20.09447, 19.42872, 19.58731, 17.11687, 17.74227, 13.81794, 14.58239, 14.84183, 15.60782, 15.51133, 15.99741, 15.29044, 15.96133, 18.16191, 18.90934, 19.56871, 22.44188, 22.56247, 25.58682, 28.63669, 27.85509, 31.08457, 33.50089, 35.4027 , 35.83383, 38.8066 , 38.73615, 40.98259, 43.03585, 42.80237, 43.99917, 44.16385, 46.55753, 46.70348, 48.60454, 49.99599, 50.40001, 49.42958, 52.21003, 53.40064, 53.1481 , 52.84663, 52.94916, 53.74581, 53.15541, 50.90451, 52.1083 , 49.81399, 51.67294, 50.41253, 51.58657, 49.01208, 45.18565, 45.88592, 43.88795, 43.95773, 42.47888, 40.03592, 37.56207, 37.5192 , 34.13332, 32.64308, 31.57022, 29.68935, 25.70473, 22.23733, 22.4102 , 20.14208, 16.06011, 14.62719, 12.31734, 11.429 , 9.67155, 0. 
, 6.56064, 6.38088, 10.37223, 11.02495, 11.31594, 17.26136, 19.1381 , 22.65982, 24.77406, 25.9365 , 29.3083 , 29.80644, 32.34705], [ 36.97612, 34.08161, 33.6759 , 33.39091, 32.26088, 31.53822, 29.34631, 29.01695, 29.00965, 26.76934, 24.08524, 24.20871, 22.05366, 19.53993, 18.50832, 18.96449, 16.34701, 17.97385, 14.45423, 15.61773, 13.11227, 15.3568 , 13.46033, 15.0617 , 14.5311 , 15.86852, 17.02272, 18.07516, 17.40934, 20.82485, 20.68579, 24.49213, 26.61145, 26.08889, 29.69294, 31.99846, 33.98145, 33.96711, 37.42541, 37.92498, 39.58649, 41.8455 , 41.36941, 42.7842 , 43.17606, 45.24034, 45.24956, 47.90274, 48.57102, 49.3039 , 48.33632, 51.04052, 52.18807, 52.11595, 51.78041, 52.10773, 52.60551, 51.87092, 49.78713, 51.08147, 49.07544, 50.62617, 49.41508, 50.8496 , 48.08088, 44.54115, 45.21708, 43.79662, 42.77504, 41.88941, 39.3736 , 37.45707, 37.10411, 33.45212, 32.72029, 31.17569, 29.77053, 25.34467, 22.66218, 22.68226, 19.79391, 16.8982 , 15.04701, 12.2625 , 11.46571, 9.04695, 6.56064, 0. 
, 5.98419, 8.35946, 11.68013, 10.99161, 17.01812, 18.26248, 21.8055 , 23.90279, 25.4595 , 28.64282, 28.83705, 32.02712], [ 37.4974 , 34.96767, 34.04663, 34.49413, 33.40864, 31.97442, 29.92087, 29.67471, 29.70327, 27.66108, 24.27307, 24.33682, 22.98183, 20.64083, 20.11721, 20.24747, 17.47794, 18.06633, 14.59398, 14.72379, 14.67681, 14.63636, 13.79958, 14.77595, 14.07159, 14.98222, 15.54819, 17.34202, 16.72036, 19.82573, 20.34994, 23.73966, 26.02174, 25.78955, 28.68309, 30.98915, 32.97956, 33.65132, 36.96229, 36.83463, 38.84249, 41.35057, 40.89437, 42.32976, 42.72795, 45.04445, 45.31637, 47.29044, 48.20799, 49.05992, 48.54484, 51.18106, 52.39054, 52.16829, 52.05071, 52.46879, 52.8835 , 52.39965, 50.24847, 51.81738, 49.48923, 51.37421, 50.26066, 51.44928, 49.19229, 45.31625, 46.21675, 44.82826, 44.24492, 43.10642, 40.65807, 38.73435, 38.59252, 35.14618, 34.13216, 32.66191, 30.84631, 26.89611, 23.80291, 23.79207, 21.44302, 18.3223 , 15.37031, 14.04632, 13.13148, 10.17736, 6.38088, 5.98419, 0. 
, 7.87813, 8.5264 , 10.22297, 15.17486, 17.26896, 20.15363, 23.01967, 23.76411, 27.20938, 27.40981, 30.03519], [ 39.19126, 36.59085, 35.63369, 36.29427, 34.83176, 34.20046, 31.66778, 31.85062, 31.74665, 29.75946, 27.16596, 26.71437, 25.33998, 23.14301, 22.55747, 21.6386 , 19.00668, 21.34884, 17.7774 , 17.00739, 17.10835, 16.4597 , 13.68436, 14.59106, 12.69685, 13.66391, 14.56199, 14.70928, 15.08113, 16.70077, 17.57016, 21.92186, 22.74625, 22.49509, 26.09517, 28.78806, 30.08218, 30.88072, 34.28354, 34.6773 , 36.13211, 39.45815, 38.88697, 40.70953, 41.05225, 43.71142, 43.29447, 46.48555, 46.79335, 47.89894, 47.60107, 50.27656, 51.76294, 52.09858, 51.83186, 52.27757, 53.24671, 52.58596, 51.22179, 52.5753 , 50.83285, 52.55434, 51.50842, 53.70231, 50.28702, 47.01201, 48.21974, 47.13516, 46.45457, 45.66534, 43.06865, 41.86474, 41.46073, 38.06477, 37.76093, 36.26804, 34.5401 , 30.90733, 27.72151, 28.38873, 25.49215, 22.22843, 19.6617 , 18.81453, 16.10375, 13.59906, 10.37223, 8.35946, 7.87813, 0. 
, 8.67141, 8.22265, 10.68938, 14.39115, 15.81638, 18.19258, 20.24196, 24.13398, 23.94398, 26.50431], [ 41.96371, 39.56174, 38.24726, 39.09789, 38.18059, 36.96213, 35.17056, 34.7836 , 34.68077, 33.0935 , 30.02343, 29.08875, 28.35413, 25.73473, 25.22857, 25.58849, 22.61177, 22.96009, 18.83897, 18.61744, 19.10648, 17.9459 , 16.58338, 16.3054 , 13.75561, 14.86599, 15.3057 , 16.17006, 16.01239, 18.83691, 17.81885, 20.66609, 23.58481, 23.83891, 26.05857, 28.57053, 30.29887, 31.70992, 34.64723, 34.46336, 36.84814, 40.11168, 39.62182, 41.06713, 42.00461, 44.46337, 44.9994 , 47.1459 , 48.1779 , 49.234 , 49.68564, 52.28684, 53.60591, 53.53316, 53.22438, 54.53359, 55.09278, 55.23165, 53.41837, 55.01953, 53.02921, 55.1481 , 54.21112, 55.7896 , 53.56762, 50.01015, 51.36139, 50.10628, 49.95585, 48.94918, 46.5909 , 44.60558, 44.64997, 41.77849, 40.74152, 39.40728, 37.42484, 34.04257, 30.83746, 30.63132, 28.64581, 24.9604 , 22.50593, 21.19061, 20.41288, 16.54233, 11.02495, 11.68013, 8.5264 , 8.67141, 0. 
, 8.20946, 9.70989, 11.26803, 13.99325, 18.01633, 18.62262, 21.68822, 22.49918, 24.09327], [ 39.49224, 37.35437, 36.06871, 36.46602, 35.81903, 35.11644, 33.31352, 32.77038, 32.64932, 30.98471, 28.55089, 27.8246 , 26.6982 , 23.91935, 23.0396 , 23.63285, 20.87814, 22.50932, 18.33698, 17.9266 , 17.05555, 16.78672, 15.70851, 15.56181, 12.45704, 12.41707, 12.64367, 13.04202, 13.78221, 15.85836, 16.39775, 18.42641, 21.28107, 20.37192, 23.59887, 27.01304, 29.00976, 29.12309, 31.85059, 31.84197, 34.73249, 37.08534, 36.88549, 38.47232, 39.14698, 41.62923, 41.76229, 44.6641 , 45.75658, 46.43048, 46.61442, 49.36971, 51.35824, 51.2622 , 50.71905, 51.56906, 52.73899, 52.76786, 51.29034, 52.46332, 51.02381, 53.295 , 52.03406, 54.36972, 51.16563, 48.42501, 49.18705, 48.05767, 48.08682, 47.14171, 44.99134, 43.27327, 43.08337, 40.61566, 39.28459, 38.57361, 37.09933, 33.31594, 30.51189, 30.55106, 28.7825 , 24.69864, 22.25054, 21.12454, 19.36076, 17.9564 , 11.31594, 10.99161, 10.22297, 8.22265, 8.20946, 0. 
, 10.97595, 10.25467, 13.79501, 15.37994, 15.98122, 19.6484 , 20.7999 , 23.08624], [ 44.24451, 42.00158, 40.78921, 41.84873, 40.65885, 39.86917, 37.56416, 38.06911, 37.91062, 36.65424, 33.83448, 32.39523, 32.11611, 30.1514 , 29.71055, 28.70622, 25.97138, 28.30391, 23.71274, 22.92496, 23.72982, 21.70309, 18.99807, 17.98122, 16.48731, 15.52539, 16.45475, 15.86513, 17.09382, 15.60841, 15.34486, 19.3634 , 21.00584, 20.98593, 22.7285 , 25.56683, 26.63001, 28.97938, 31.58249, 32.48614, 33.17741, 37.64673, 37.35607, 38.78739, 39.86511, 43.11111, 42.8692 , 45.39504, 46.03133, 47.85874, 48.6812 , 51.53462, 52.66349, 53.32599, 53.35513, 54.55259, 55.52021, 55.52018, 54.5929 , 56.32081, 54.85708, 56.77537, 56.08765, 58.30467, 55.41252, 52.3525 , 54.05609, 53.25517, 53.30444, 52.39688, 50.2539 , 48.91205, 48.79588, 45.83126, 45.54548, 43.9427 , 42.31915, 39.15504, 35.9164 , 36.61026, 33.76637, 30.94594, 28.56357, 27.04242, 25.3713 , 22.847 , 17.26136, 17.01812, 15.17486, 10.68938, 9.70989, 10.97595, 0. 
, 11.45306, 8.11042, 12.83722, 15.11178, 18.41451, 17.69405, 18.89004], [ 44.49713, 42.65467, 41.34813, 41.95262, 41.40321, 40.59916, 39.27059, 38.63774, 38.78521, 37.30058, 35.47091, 33.86684, 32.88203, 29.62866, 29.41731, 30.21742, 27.04819, 28.4628 , 24.45456, 24.15302, 22.93954, 21.98327, 20.56008, 20.15022, 16.18279, 16.68747, 15.71403, 15.06125, 15.61569, 17.8028 , 15.15469, 16.24486, 19.89187, 20.20119, 22.24654, 25.4267 , 27.03646, 27.45576, 30.04064, 29.86623, 33.3556 , 36.20666, 35.71787, 37.64475, 39.12616, 40.95347, 41.88106, 45.2002 , 46.10588, 47.20917, 48.31756, 50.93333, 52.69263, 52.90649, 51.86422, 53.99784, 54.91969, 55.73155, 54.6199 , 55.78484, 54.85699, 57.33105, 56.26153, 58.85498, 55.96844, 53.37068, 54.84453, 53.82784, 53.83046, 53.39353, 51.23541, 49.33654, 49.29285, 47.30105, 46.22894, 45.43841, 43.87867, 40.74492, 37.93316, 37.86975, 36.05478, 32.41159, 29.72752, 28.49102, 27.60118, 24.60167, 19.1381 , 18.26248, 17.26896, 14.39115, 11.26803, 10.25467, 11.45306, 0. 
, 9.27356, 12.57009, 12.95849, 14.74353, 15.81852, 18.51871], [ 46.20998, 44.21824, 43.00012, 44.21519, 43.17206, 42.25278, 40.62456, 40.78472, 40.83826, 39.65202, 37.51809, 35.84499, 35.32819, 33.12758, 32.39176, 32.68447, 29.54295, 31.64773, 27.512 , 26.40181, 26.36839, 24.07691, 22.44251, 21.20001, 18.19724, 18.30724, 16.86429, 16.67207, 17.24015, 15.28365, 14.83316, 17.24548, 18.45627, 18.99761, 20.53253, 23.23683, 24.48703, 26.55069, 28.92462, 29.37943, 31.06652, 35.37051, 34.81861, 36.9269 , 38.19473, 40.97874, 41.23793, 44.23507, 44.94675, 46.79585, 48.12534, 50.99637, 52.29538, 52.95174, 52.68216, 54.65517, 55.45876, 55.94518, 55.51699, 57.09844, 56.10978, 58.35886, 57.72116, 60.2241 , 57.4112 , 54.80933, 56.72943, 55.9775 , 55.9695 , 55.47101, 53.43403, 52.18163, 52.03523, 49.72211, 49.39464, 47.97621, 46.39273, 43.65432, 40.70448, 41.02305, 38.72334, 35.92687, 32.81706, 31.8115 , 30.25563, 27.9245 , 22.65982, 21.8055 , 20.15363, 15.81638, 13.99325, 13.79501, 8.11042, 9.27356, 0. 
, 11.3311 , 10.74328, 12.86819, 13.30844, 14.52224], [ 45.17665, 42.7808 , 42.36618, 43.15904, 41.53149, 42.35218, 40.2062 , 39.75408, 39.61926, 39.12975, 37.42083, 36.03478, 35.61998, 33.46496, 32.91178, 31.55284, 30.34684, 32.1907 , 27.97503, 27.03924, 26.99544, 26.11801, 22.97965, 21.39029, 18.82112, 16.95519, 16.85299, 15.02326, 16.33925, 15.19425, 14.19401, 15.15315, 16.11799, 15.27863, 16.26812, 21.19257, 22.05865, 22.69609, 24.27898, 25.34313, 27.70613, 31.85248, 31.64177, 33.11543, 33.86518, 37.50522, 37.14262, 41.08682, 41.40908, 42.67964, 44.56241, 47.03863, 49.4784 , 49.96028, 49.67751, 51.26881, 53.12569, 53.73325, 53.60744, 54.71573, 54.39723, 56.29047, 55.39809, 59.29777, 55.34975, 53.68554, 55.20246, 54.96514, 55.1559 , 54.71155, 53.31284, 52.20071, 51.96351, 50.48444, 49.45139, 49.1094 , 48.0183 , 44.59624, 42.1526 , 43.05683, 40.7433 , 36.56528, 34.96093, 34.8203 , 31.87386, 29.98479, 24.77406, 23.90279, 23.01967, 18.19258, 18.01633, 15.37994, 12.83722, 12.57009, 11.3311 , 0. 
, 9.90814, 11.78426, 11.47244, 12.76638], [ 43.34276, 41.49634, 40.24461, 41.86844, 40.71829, 40.45489, 39.40489, 38.43139, 38.5705 , 38.02632, 36.17692, 34.76023, 34.74136, 32.47884, 31.62541, 32.34425, 30.10012, 31.04573, 27.44909, 25.82034, 26.43948, 24.12467, 23.45606, 21.13788, 17.69826, 17.37892, 15.00269, 14.43571, 15.18723, 13.84108, 15.20368, 13.59916, 14.52011, 13.7462 , 15.10008, 19.15064, 20.62081, 21.31464, 22.53438, 22.18623, 26.09503, 29.10527, 28.85684, 30.98764, 31.76617, 35.00647, 35.24283, 38.38866, 39.40758, 40.30443, 42.7187 , 45.1501 , 47.59661, 47.72114, 47.40003, 49.40055, 50.8073 , 51.74464, 51.63282, 52.72249, 52.29678, 54.80877, 54.07596, 57.3284 , 54.10888, 52.36036, 53.91346, 53.63403, 54.03066, 53.48973, 52.11044, 51.11608, 51.03286, 50.07272, 48.79067, 48.50627, 47.23087, 44.58934, 42.22162, 42.18162, 41.16185, 36.93542, 34.29057, 35.12966, 32.4665 , 31.1793 , 25.9365 , 25.4595 , 23.76411, 20.24196, 18.62262, 15.98122, 15.11178, 12.95849, 10.74328, 9.90814, 0. 
, 6.63421, 10.22093, 9.98474], [ 45.20143, 42.89946, 42.30763, 43.70448, 42.49124, 42.53407, 41.81369, 40.51321, 40.69567, 40.69416, 39.05643, 37.39742, 37.0847 , 35.00942, 33.79263, 35.03969, 33.17501, 33.47638, 29.9473 , 28.7832 , 28.60162, 26.69992, 26.30876, 23.4717 , 20.07196, 20.49274, 17.43522, 17.62774, 17.0286 , 16.11862, 15.44059, 12.73676, 14.22711, 13.92431, 13.99427, 17.6646 , 19.57784, 19.94906, 20.85881, 20.58748, 25.22437, 28.10747, 27.61326, 29.72708, 30.50207, 33.76313, 34.57233, 37.70556, 38.86882, 39.8812 , 42.53511, 44.98107, 47.05231, 47.06461, 46.87143, 49.64106, 50.61611, 51.99425, 52.1621 , 53.32 , 53.08869, 55.50578, 54.94413, 58.22332, 55.4712 , 53.92575, 55.78262, 55.43629, 55.76022, 55.54874, 54.48838, 53.03856, 53.18936, 52.69689, 51.31535, 50.97824, 49.8669 , 47.23448, 45.16766, 45.17244, 43.95332, 40.15198, 37.67062, 38.15204, 36.03614, 34.37541, 29.3083 , 28.64282, 27.20938, 24.13398, 21.68822, 19.6484 , 18.41451, 14.74353, 12.86819, 11.78426, 6.63421, 0. 
, 8.83043, 8.80103], [ 42.75318, 41.51576, 40.55066, 41.85398, 40.79031, 40.64863, 39.33236, 39.43324, 39.56802, 39.21815, 38.09928, 35.57779, 36.06984, 33.91675, 34.18837, 33.47656, 31.31277, 33.5043 , 29.47687, 28.60458, 28.39343, 26.15777, 24.09575, 22.23873, 20.9814 , 18.30611, 17.0517 , 15.52057, 16.96968, 13.96013, 11.80224, 10.23459, 12.12231, 12.80535, 10.96839, 13.95903, 14.73243, 16.52504, 17.8264 , 18.68724, 20.37247, 24.43796, 24.23459, 25.83973, 27.94986, 30.48797, 31.19456, 34.12645, 34.90814, 36.95114, 39.36061, 42.0262 , 43.63606, 44.40938, 43.90372, 46.62981, 47.78862, 49.21381, 49.27814, 50.79307, 50.68305, 53.14199, 52.52346, 56.01087, 53.09032, 51.51095, 53.64081, 53.67147, 54.37677, 54.10548, 52.87226, 51.85091, 51.82288, 50.98398, 50.31884, 49.73738, 48.80525, 46.50554, 44.28871, 45.04898, 43.33299, 40.34494, 38.17835, 38.01118, 36.48471, 34.48999, 29.80644, 28.83705, 27.40981, 23.94398, 22.49918, 20.7999 , 17.69405, 15.81852, 13.30844, 11.47244, 10.22093, 8.83043, 0. 
, 7.46263], [ 46.74111, 45.2993 , 44.09355, 45.9426 , 44.66045, 44.57547, 43.20546, 43.19846, 43.04025, 43.02089, 41.43184, 39.06011, 40.03816, 38.06303, 37.85652, 37.35049, 35.47796, 36.83706, 32.87905, 31.58328, 32.42605, 29.63547, 27.8713 , 25.49112, 23.64271, 21.75684, 20.42138, 19.39125, 20.31666, 17.63429, 16.78251, 14.84621, 15.73701, 16.20137, 13.95765, 16.88417, 17.36648, 20.00963, 20.38759, 20.88898, 22.75393, 27.26505, 27.25467, 28.4507 , 30.43803, 33.78487, 34.11927, 36.59337, 37.3594 , 39.24277, 42.55081, 44.83658, 46.86818, 47.37281, 47.26843, 49.95592, 51.26349, 52.79329, 52.86302, 54.50537, 54.16762, 56.6705 , 56.22352, 59.61459, 56.86156, 55.33784, 57.31579, 57.5573 , 58.38252, 57.89195, 56.74007, 55.85643, 55.93366, 55.16405, 54.28067, 53.71283, 52.57763, 50.1937 , 47.87966, 48.29225, 46.9185 , 43.38612, 41.31302, 41.47549, 39.63076, 37.48156, 32.34705, 32.02712, 30.03519, 26.50431, 24.09327, 23.08624, 18.89004, 18.51871, 14.52224, 12.76638, 9.98474, 8.80103, 7.46263, 0. 
]])


class FastMetricScmdsScalingTests(TestCase):
    """test the functions to do metric scaling"""

    def setUp(self):
        """creates inputs"""
        self.dist_func = lambda x, y: (FULL_SYM_MATRIX[x, y])
        self.num_objects = FULL_SYM_MATRIX.shape[0]

    def test_scmds_cmds_tzeng(self):
        """cmds_tzeng() should return eigenvectors and eigenvalues,
        sorted by the eigenvalues
        """
        dim = 3
        (eigvec, eigval) = cmds_tzeng(FULL_SYM_MATRIX, dim)
        self.assertTrue(len(eigval) == dim)
        self.assertTrue(eigvec.shape == (FULL_SYM_MATRIX.shape[0], dim))
        self.assertTrue(sorted(eigval, reverse=True) == eigval.tolist())
        self.assertFloatEqual(eigval[0], 27336.883436)
        self.assertFloatEqual(eigval[-1], 536.736247)
        self.assertFloatEqual(eigvec[0, 0], -14.978621)
        self.assertFloatEqual(eigvec[-1, -1], 0.673001)

        # full pcoa
        dim = FULL_SYM_MATRIX.shape[0]
        (eigvec, eigval) = cmds_tzeng(FULL_SYM_MATRIX, dim)
        self.assertTrue(len(eigval) == dim)
        self.assertTrue(eigvec.shape == (FULL_SYM_MATRIX.shape[0], dim))
        self.assertTrue(sorted(eigval, reverse=True) == eigval.tolist())
        self.assertFloatEqual(eigval[0], 27336.883436043929)
        self.assertFloatEqual(eigval[-1], 0.000000)
        self.assertFloatEqual(eigvec[0, 0], -14.978621)
        self.assertFloatEqual(eigvec[-1, -1], 0.000000)

    def test_scmds_rowmeans(self):
        """rowmeans() should return a vector of row-means for a 2d matrix."""
        rm = rowmeans(FULL_SYM_MATRIX[:10])
        self.assertTrue(rm.shape[0] == 10)
        self.assertFloatEqual(float(rm[0]), 25.983320)
        self.assertFloatEqual(float(rm[-1]), 23.967050)

    def test_scmds_recenter(self):
        """recenter() should recenter an mds solution"""
        mds_coords = matrix([
            [-3.78333558, -2.90925004, 2.75333034],
            [5.18887751, -1.2130882, -0.86476508],
            [-3.10404298, -0.6620052, -3.91668873],
            [-2.53758526, 3.99102424, 0.86289149],
            [4.23608631, 0.7933192, 1.16523198]])
        centered_coords = recenter(mds_coords)
        self.assertTrue(centered_coords.shape == mds_coords.shape)
        center_of_gravity = sum([centered_coords[:, x].sum()
                                 for x in range(centered_coords.shape[1])])
        self.assertFloatEqual(center_of_gravity, 0.0)

    def test_scmds_affine_mapping(self):
        """affine_mapping() should return a tuple of two matrices of
        certain shape
        """
        matrix_x = matrix([
            [-24.03457111, 10.10355666, -23.17039728, 28.48438894,
             22.57322482],
            [0.62716392, 20.84502664, 6.42317521, -7.66901011,
             14.37923852],
            [-2.60793417, 2.83532649, 2.91024821, 1.37414959,
             -4.22916659]])
        matrix_y = matrix([
            [-29.81089477, -2.01312927, -30.5925487, 23.05985801,
             11.68581751],
            [-4.68879117, 23.62633294, 1.07934315, 0.57461989,
             20.52800221],
            [-0.51503505, 1.18377044, 0.83671471, 1.81751358,
             -3.28812925]])
        dim = matrix_x.shape[0]
        (tu, tb) = affine_mapping(matrix_x, matrix_y)
        self.assertTrue(tu.shape[0] == dim and tb.shape[0] == dim)
        self.assertTrue(tu.shape[0] == tu.shape[1])
        self.assertFloatEqual(tu[0, 0], 0.966653)
        self.assertFloatEqual(tu[-1, -1], 0.994816)
        self.assertFloatEqual(tb[0, 0], -6.480975)
        self.assertFloatEqual(tb[-1, -1], 0.686521)

    def test_scmds_adjust_mds_to_ref(self):
        """adjust_mds_to_ref() should return an adjusted mds solution"""
        overlap = 5
        dim = 3
        size = 10
        fake_mds_coords_ref = FULL_SYM_MATRIX[:size, :dim]
        fake_mds_coords_add = FULL_SYM_MATRIX[overlap:size+overlap, :dim]
        mds_adj = adjust_mds_to_ref(fake_mds_coords_ref,
                                    fake_mds_coords_add, overlap)
        self.assertTrue(mds_adj.shape == fake_mds_coords_add.shape)
        self.assertFloatEqual(mds_adj[0, 0], 7.526609)
        self.assertFloatEqual(mds_adj[-1, -1], 18.009350)

    def test_scmds_combine_mds(self):
        """combine_mds() should merge two mds solutions"""
        overlap = 3
        dim = 3
        mds_coords_1 = matrix([
            [-3.78333558, -2.90925004, 2.75333034],
            [5.18887751, -1.2130882, -0.86476508],
            [-3.10404298, -0.6620052, -3.91668873],
            [-2.53758526, 3.99102424, 0.86289149],
            [4.23608631, 0.7933192, 1.16523198]])
        mds_coords_2 = matrix([
            [-3.78333558, -2.90925004, 2.75333034],
            [5.18887751, -1.2130882, -0.86476508],
            [-3.10404298, -0.6620052, -3.91668873],
            [-2.53758526, 3.99102424, 0.86289149],
            [4.23608631, 0.7933192, 1.16523198]])
        comb_mds = combine_mds(mds_coords_1, mds_coords_2, overlap)
        self.assertTrue(comb_mds.shape == (
            mds_coords_1.shape[0]*2 - overlap, dim))
        self.assertFloatEqual(comb_mds[0, 0], -3.783335)
        #self.assertFloatEqual(comb_mds[-1, -1], 0.349951)

    def test_scmds_class_combinemds(self):
        """class CombineMds() should be able to join MDS solutions"""
        # tmp note
        # tile1 = FULL_SYM_MATRIX[0:5, 0:5]
        # tile2 = FULL_SYM_MATRIX[2:7, 2:7]
        # mds_coords_1 = cmds_tzeng(tile1, 3)
        # mds_coords_2 = cmds_tzeng(tile2, 3)
        overlap = 3
        mds_coords_1 = matrix([
            [-3.78333558, -2.90925004, 2.75333034],
            [5.18887751, -1.2130882, -0.86476508],
            [-3.10404298, -0.6620052, -3.91668873],
            [-2.53758526, 3.99102424, 0.86289149],
            [4.23608631, 0.7933192, 1.16523198]])
        mds_coords_2 = matrix([
            [-3.78333558, -2.90925004, 2.75333034],
            [5.18887751, -1.2130882, -0.86476508],
            [-3.10404298, -0.6620052, -3.91668873],
            [-2.53758526, 3.99102424, 0.86289149],
            [4.23608631, 0.7933192, 1.16523198]])
        comb_mds = CombineMds()
        comb_mds.add(mds_coords_1, overlap)
        comb_mds.add(mds_coords_2, overlap)
        final_mds = comb_mds.getFinalMDS()
        self.assertTrue(final_mds.shape == (mds_coords_1.shape[0]*2 - overlap,
                                            mds_coords_1.shape[1]))
        #self.assertFloatEqual(final_mds[0, 0], 0.0393279)
        #self.assertFloatEqual(final_mds[-1, -1], -5.322599)


class FastMetricNystromScalingTests(TestCase):
    """test the functions to do metric scaling"""

    def setUp(self):
        """creates inputs"""
        self.big_seed_matrix = FULL_SYM_MATRIX[:49]
        self.small_seed_matrix = FULL_SYM_MATRIX[:25]

    def test_calc_matrix_a(self):
        """calc_matrix_a should calculate a k x k matrix of (predefined)
        association matrix K of certain (predefined) value"""
        nseeds = self.small_seed_matrix.shape[0]
        matrix_e = self.small_seed_matrix[:, 0:nseeds]
        matrix_a = calc_matrix_a(matrix_e)
        self.assertFloatEqual(matrix_a[0, 0], 250.032270)
        self.assertFloatEqual(matrix_a[-1][-1], 316.875461)

    def test_nystrom_build_seed_matrix(self):
        """build_seed_matrix() should return a seedmatrix and an order"""
        seedmat_dim = 10
        dist_func = lambda x, y: (FULL_SYM_MATRIX[x, y])
        (seedmat, order) = build_seed_matrix(
            FULL_SYM_MATRIX.shape[0], seedmat_dim, dist_func)
        self.assertTrue(len(order) == FULL_SYM_MATRIX.shape[0])
        self.assertTrue(sorted(order) == range(FULL_SYM_MATRIX.shape[0]))
        self.assertTrue(seedmat.shape == (
            seedmat_dim, FULL_SYM_MATRIX.shape[0]))

        # build_seed_matrix randomises order
        ind = argsort(order)
        i = random.randint(0, seedmat.shape[0])
        j = random.randint(0, seedmat.shape[1])
        self.assertFloatEqual(seedmat[i, j], FULL_SYM_MATRIX[ind[i], ind[j]])

    def test_nystrom(self):
        """nystrom() should return an MDS approximation"""
        dim = 3
        mds_coords = nystrom(self.big_seed_matrix, dim)
        self.assertTrue(len(mds_coords.shape) == 2)
        self.assertTrue(mds_coords.shape[0] == self.big_seed_matrix.shape[1])
        self.assertTrue(mds_coords.shape[1] == dim)
        self.assertFloatEqual(mds_coords[0, 0], -10.709626)
        self.assertFloatEqual(mds_coords[-1, -1], -1.778160)

    def test_nystrom_seed_number(self):
        """nystrom() should give better MDS approximations the more seeds
        were used"""
        dim = 3
        mds_coords = nystrom(self.big_seed_matrix, dim)
        stress = goodness_of_fit.Stress(FULL_SYM_MATRIX, mds_coords)
        kruskal_stress_big_mat = stress.calcKruskalStress()
        if PRINT_STRESS:
            print("INFO: Kruskal stress for Nystrom MDS "
                  "(big_seed_matrix, dim=%d) = %f" %
                  (dim, kruskal_stress_big_mat))
        self.assertTrue(kruskal_stress_big_mat < 0.04)

        mds_coords = nystrom(self.small_seed_matrix, dim)
        stress = goodness_of_fit.Stress(FULL_SYM_MATRIX, mds_coords)
        kruskal_stress_small_mat = stress.calcKruskalStress()
        if PRINT_STRESS:
            print("INFO: Kruskal stress for Nystrom MDS "
                  "(small_seed_matrix, dim=%d) = %f" %
                  (dim, kruskal_stress_small_mat))
        self.assertTrue(kruskal_stress_small_mat < 0.06)
        self.assertTrue(kruskal_stress_small_mat > kruskal_stress_big_mat)

    def test_calc_matrix_b(self):
        """calc_matrix_b should calculate a k x n-k matrix of association
        matrix K
        """
        nseeds = self.small_seed_matrix.shape[0]
        matrix_e = self.small_seed_matrix[:, 0:nseeds]
        matrix_f = self.small_seed_matrix[:, nseeds:]
        matrix_b = calc_matrix_b(matrix_e, matrix_f)
        self.assertTrue(matrix_b.shape == matrix_f.shape)
        self.assertFloatEqual(matrix_b[0, 0], -272.711227)
        self.assertFloatEqual(matrix_b[-1, -1], -64.898372)


#run if called from the command line
if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_cluster/test_goodness_of_fit.py000644 000765 000024 00000011273 12024702176 024707 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
import unittest
import numpy

import cogent.cluster.goodness_of_fit as goodness_of_fit

__author__ = "Andreas Wilm"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Andreas Wilm"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Andreas Wilm"
__email__ = "andreas.wilm@ucd.ie"
__status__ = "Production"


def example_distmat_and_mdscoords():
    """Return an example distance matrix and corresponding MDS coordinates

    Arguments:
    * None

    Returns:
    * Tuple of:
    ** a distance matrix as numpy.matrix and
    ** MDS coordinates as numpy.array
    """
    distmat = numpy.array([
        [0., 0.039806, 0.056853, 0.21595, 0.056853,
         0.0138, 0.203862, 0.219002, 0.056853, 0.064283],
        [0.039806, 0., 0.025505, 0.203862, 0.0208,
         0.039806, 0.194917, 0.21291, 0.0208, 0.027869],
        [0.056853, 0.025505, 0., 0.197887, 0.018459,
         0.056853, 0.191958, 0.203862, 0.018459, 0.025505],
        [0.21595, 0.203862, 0.197887, 0., 0.206866,
         0.206866, 0.07956, 0.066935, 0.203862, 0.206866],
        [0.056853, 0.0208, 0.018459, 0.206866, 0.,
         0.056853, 0.203862, 0.21595, 0.0138, 0.0208],
        [0.0138, 0.039806, 0.056853, 0.206866, 0.056853,
         0., 0.197887, 0.209882, 0.056853, 0.064283],
        [0.203862, 0.194917, 0.191958, 0.07956, 0.203862,
         0.197887, 0., 0.030311, 0.200869, 0.206866],
        [0.219002, 0.21291, 0.203862, 0.066935, 0.21595,
         0.209882, 0.030311, 0., 0.21291, 0.219002],
        [0.056853, 0.0208, 0.018459, 0.203862, 0.0138,
         0.056853, 0.200869, 0.21291, 0., 0.011481],
        [0.064283, 0.027869, 0.025505, 0.206866, 0.0208,
         0.064283, 0.206866, 0.219002, 0.011481, 0.]])
    mds_coords = numpy.array([
        [0.065233, 0.035019, 0.015413],
        [0.059604, 0.00168, -0.003254],
        [0.052371, -0.010959, -0.014047],
        [-0.13804, -0.036031, 0.031628],
        [0.063703, -0.015483, -0.00751],
        [0.056803, 0.031762, 0.021767],
        [-0.135082, 0.023552, -0.021006],
        [-0.150323, 0.011935, -0.010013],
        [0.06072, -0.01622, -0.007721],
        [0.065009, -0.025254, -0.005257]])
    return (distmat, mds_coords)


class GoodnessOfFitTestCase(unittest.TestCase):

    def setUp(self):
        """ set up """
        (self.distmat, self.mds_coords) = example_distmat_and_mdscoords()
        self.stress = goodness_of_fit.Stress(self.distmat, self.mds_coords)

    def test_kruskalstress1(self):
        """ testing goodness_of_fit.calcKruskalStress() """
        val = "%0.6f" % self.stress.calcKruskalStress()
        self.assertEqual(val, '0.022555')

    def test_sstress(self):
        """ testing goodness_of_fit.calcSstress() """
        val = "%0.6f" % self.stress.calcSstress()
        self.assertEqual(val, '0.008832')

    def test_calc_pwdist(self):
        """ testing (private) goodness_of_fit._calc_pwdist """
        # this is a square in 2D
        square_mds = numpy.array([[0, 0], [1, 0], [1, 1], [0, 1]])
        # this is what the distance matrix should look like
        square_distmat = numpy.array([
            [0., 1., 1.41421356, 1.],
            [1., 0., 1., 1.41421356],
            [1.41421356, 1., 0., 1.],
            [1., 1.41421356, 1., 0.]])
        derived_distmat = self.stress._calc_pwdist(square_mds)
        # check if derived and original array are (more or less) the same
        self.assert_((derived_distmat - square_distmat).sum() < 0.000001)

    def test_argument_mixup_exception(self):
        """ test that a mix-up of the mds_coords and distmat arguments is
        detected """
        self.assertRaises(AssertionError, goodness_of_fit.Stress,
                          self.mds_coords, self.distmat)
        # should give something like
        # AssertionError: orig_distmat shape bigger than mds_coords shape.
        # Possible argument mixup

    def test_size_exception(self):
        """ test if check on number of rows works """
        self.assertRaises(AssertionError, goodness_of_fit.Stress,
                          self.distmat, self.mds_coords.transpose())
        # should give something like
        # AssertionError: orig_distmat and mds_coords do not have the same
        # number of rows/objects.


if __name__ == '__main__':
    unittest.main()
PyCogent-1.5.3/tests/test_cluster/test_metric_scaling.py000644 000765 000024 00000021251 12024702176 024520 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.cluster.metric_scaling import make_E_matrix, \
    make_F_matrix, run_eig, get_principal_coordinates, \
    principal_coordinates_analysis, output_pca, PCoA
from numpy import array
import numpy

Float = numpy.core.numerictypes.sctype2char(float)

__author__ = "Catherine Lozupone"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Catherine Lozupone", "Peter Maxwell", "Rob Knight",
               "Justin Kuczynski", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Catherine Lozupone"
__email__ = "lozupone@colorado.edu"
__status__ = "Production"


class MetricScalingTests(TestCase):
    """test the functions to do metric scaling"""

    def setUp(self):
        """creates inputs"""
        #create a test input matrix
        self.matrix = array([[1, 2, 3], [4, 5, 6]], Float)
        #create a
        # symmetrical matrix
        self.sym_matrix = array([[0.5, 1.0, 1.5, 2.0],
                                 [1.0, 2.0, 3.0, 4.0],
                                 [1.5, 3.0, 2.0, 1.0],
                                 [2.0, 4.0, 1.0, 0.5]])
        self.tree_dnd = """((org1:0.11, org2:0.22,(org3:0.12, org4:0.23):0.33) :0.2,(org5:0.44, org6:0.55):0.3, org7:0.4);"""
        self.all_envs = ['env1', 'env2', 'env3']
        self.envs_dict = {"org1": array([1, 1, 0]),
                          "org2": array([0, 1, 0]),
                          "org3": array([0, 1, 0]),
                          "org4": array([0, 0, 1]),
                          "org5": array([1, 1, 0]),
                          "org6": array([1, 1, 0]),
                          "org7": array([0, 0, 1]),
                          }
        #sample data set from page 111 of W.J Krzanowski. Principles of
        #multivariate analysis, 2000, Oxford University Press
        self.real_matrix = array(
            [[0, 0.099, 0.033, 0.183, 0.148, 0.198, 0.462, 0.628,
              0.113, 0.173, 0.434, 0.762, 0.53, 0.586],
             [0.099, 0, 0.022, 0.114, 0.224, 0.039, 0.266, 0.442,
              0.07, 0.119, 0.419, 0.633, 0.389, 0.435],
             [0.033, 0.022, 0, 0.042, 0.059, 0.053, 0.322, 0.444,
              0.046, 0.162, 0.339, 0.781, 0.482, 0.55],
             [0.183, 0.114, 0.042, 0, 0.068, 0.085, 0.435, 0.406,
              0.047, 0.331, 0.505, 0.7, 0.579, 0.53],
             [0.148, 0.224, 0.059, 0.068, 0, 0.051, 0.268, 0.24,
              0.034, 0.177, 0.469, 0.758, 0.597, 0.552],
             [0.198, 0.039, 0.053, 0.085, 0.051, 0, 0.025, 0.129,
              0.002, 0.039, 0.39, 0.625, 0.498, 0.509],
             [0.462, 0.266, 0.322, 0.435, 0.268, 0.025, 0, 0.014,
              0.106, 0.089, 0.315, 0.469, 0.374, 0.369],
             [0.628, 0.442, 0.444, 0.406, 0.24, 0.129, 0.014, 0,
              0.129, 0.237, 0.349, 0.618, 0.562, 0.471],
             [0.113, 0.07, 0.046, 0.047, 0.034, 0.002, 0.106, 0.129,
              0, 0.071, 0.151, 0.44, 0.247, 0.234],
             [0.173, 0.119, 0.162, 0.331, 0.177, 0.039, 0.089, 0.237,
              0.071, 0, 0.43, 0.538, 0.383, 0.346],
             [0.434, 0.419, 0.339, 0.505, 0.469, 0.39, 0.315, 0.349,
              0.151, 0.43, 0, 0.607, 0.387, 0.456],
             [0.762, 0.633, 0.781, 0.7, 0.758, 0.625, 0.469, 0.618,
              0.44, 0.538, 0.607, 0, 0.084, 0.09],
             [0.53, 0.389, 0.482, 0.579, 0.597, 0.498, 0.374, 0.562,
              0.247, 0.383, 0.387, 0.084, 0, 0.038],
             [0.586, 0.435, 0.55, 0.53, 0.552, 0.509, 0.369, 0.471,
              0.234, 0.346, 0.456, 0.09, 0.038, 0]])
        #test tree
        dnd = """ """

    def test_principal_coordinate_analysis(self):
        """principal_coordinate_analysis returns array of principal coords"""
        #I took the example in the book (see intro info), and did the
        #principal coordinates analysis, plotted the data and it looked
        #right
        matrix = self.real_matrix
        pcs, eigvals = principal_coordinates_analysis(matrix)
        bigfirstorder = eigvals.argsort()[::-1]
        pcs = pcs[bigfirstorder]
        eigvals = eigvals[bigfirstorder]
        self.assertEqual(len(pcs), 14)
        self.assertFloatEqual(abs(pcs[0, 0]), 0.240788133045)
        self.assertFloatEqual(abs(pcs[1, 0]), 0.233677162)

    def test_PCoA(self):
        """PCoA returns a cogent Table result"""
        matrix = self.real_matrix
        names = ['a', 'b', 'c', 'd', 'e', 'f', 'g',
                 'h', 'i', 'j', 'k', 'l', 'm', 'n']
        pairwise_dist = {}
        for i, name1 in enumerate(names):
            for j, name2 in enumerate(names):
                combo = (name1, name2)
                if combo not in pairwise_dist:
                    pairwise_dist[combo] = matrix[i, j]
        #perform with the PCoA function
        result = PCoA(pairwise_dist)
        self.assertEqual(result[7, 1], 'a')
        self.assertFloatEqual(abs(result[7, 2]), 0.240788133045)

    def test_make_E_matrix(self):
        """make_E_matrix converts a distance matrix to an E matrix"""
        matrix = self.matrix
        E_matrix = make_E_matrix(matrix)
        self.assertFloatEqual(E_matrix[0, 0], -0.5)
        self.assertFloatEqual(E_matrix[0, 1], -2.0)
        self.assertFloatEqual(E_matrix[1, 0], -8.0)
        self.assertFloatEqual(E_matrix[1, 2], -18.0)

    def test_make_F_matrix(self):
        """make_F_matrix converts an E_matrix to an F_matrix"""
        #matrix = self.matrix
        matrix = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
        F_matrix = make_F_matrix(matrix)
        self.assertEqual(F_matrix, array([[0.0, -2.0, -4.0],
                                          [2.0, 0.0, -2.0],
                                          [4.0, 2.0, 0.0]]))

    def test_run_eig(self):
        """run_eig returns eigenvectors and values"""
        matrix = self.sym_matrix
        eigvals, eigvecs = run_eig(matrix)
        #make sure that the number of eigvecs and eigvals is equal to dims
        self.assertEqual(len(eigvals), 4)
        self.assertEqual(len(eigvecs), 4)

    def test_get_principal_coordinates(self):
        """get_principal_coordinates normalizes eigvecs with eigvalues"""
        matrix = array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
        vec = array([0, 1, -4])
        result = get_principal_coordinates(vec, matrix)
        self.assertEqual(result[0], array([0, 0, 0]))
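The E/F-matrix steps these tests exercise amount to Gower's double centering of a squared-distance matrix (the `make_E_matrix` values above are consistent with `E = -0.5 * d**2`). A minimal numpy-only sketch, independent of cogent — the helper name `double_center` is mine, and for the symmetric matrices used here it matches the tested behaviour, though cogent's exact implementation may differ in detail:

```python
import numpy as np

def double_center(d):
    """Gower double centering of a distance matrix d:
    E = -0.5 * d**2, then F = E - row_means - col_means + grand_mean."""
    e = -0.5 * d ** 2
    row = e.mean(axis=1, keepdims=True)   # per-row means of E
    col = e.mean(axis=0, keepdims=True)   # per-column means of E
    return e - row - col + e.mean()       # double-centered F matrix

d = np.array([[0., 3., 4.],
              [3., 0., 5.],
              [4., 5., 0.]])
f = double_center(d)
# every row and column of a double-centered matrix sums to ~0
assert np.allclose(f.sum(axis=0), 0)
assert np.allclose(f.sum(axis=1), 0)
```

The zero row/column sums are what make the subsequent eigendecomposition yield coordinates centered at the origin.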
        self.assertEqual(result[1], array([2, 2, 2]))
        self.assertEqual(result[2], array([6, 6, 6]))

    def test_output_pca(self):
        """output_pca creates a Table output for pcs results"""
        #make arbitrary values for inputs
        eigvals = array([4.2, 3.2, 5.2])
        names = ['env1', 'env2', 'env3']
        pca_matrix = array([[-0.34, -0.22, 0.57],
                            [-0.12, 0.14, -0.018],
                            [1.8, 1.9, 2.0]])
        output_table = output_pca(pca_matrix, eigvals, names)
        self.assertEqual(str(output_table), '''\
================================================================
Type Label vec_num-0 vec_num-1 vec_num-2
----------------------------------------------------------------
Eigenvectors env1 1.80 -0.34 -0.12
Eigenvectors env2 1.90 -0.22 0.14
Eigenvectors env3 2.00 0.57 -0.02
Eigenvalues eigenvalues 5.20 4.20 3.20
Eigenvalues var explained (%) 41.27 33.33 25.40
----------------------------------------------------------------''')

    def test_integration(self):
        """Integration test of PCoA should work correctly"""
        u = array([
            [0., 0.7123611, 0.76849198, 0.80018563, 0.68248524,
             0.74634629],
            [0.7123611, 0., 0.86645691, 0.80485282, 0.83381301,
             0.73881726],
            [0.76849198, 0.86645691, 0., 0.82308396, 0.77451746,
             0.76498872],
            [0.80018563, 0.80485282, 0.82308396, 0., 0.84167365,
             0.77614366],
            [0.68248524, 0.83381301, 0.77451746, 0.84167365, 0.,
             0.72661163],
            [0.74634629, 0.73881726, 0.76498872, 0.77614366, 0.72661163,
             0.]])
        e = make_E_matrix(u)
        f = make_F_matrix(e)
        eigvals, eigvecs = run_eig(f)
        bigfirstorder = eigvals.argsort()[::-1]
        #eigvecs = eigvecs[bigfirstorder]
        #eigvals = eigvals[bigfirstorder]
        principal_coords = get_principal_coordinates(eigvals, eigvecs)
        principal_coords = principal_coords[bigfirstorder]
        expected = array([
            [2.85970001e-02, -3.74940557e-01, 3.35175925e-01,
             -2.54123944e-01, 2.82568441e-01, -1.72768652e-02],
            [2.29038532e-01, 2.23340550e-01, -2.38559794e-01,
             -4.12346397e-01, 1.86069108e-01, 1.24580018e-02],
            [7.05527166e-02, -2.08929136e-01, -3.09988697e-01,
             2.33436419e-01, 2.88756308e-01, -7.38276099e-02]])
        self.assertFloatEqual(abs(principal_coords[[0, 1, 2]]),
                              abs(expected))


#run if called from the command line
if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_cluster/test_nmds.py000644 000765 000024 00000010233 12024702176 022474 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from numpy import array, sqrt, size
from cogent.cluster.nmds import NMDS, metaNMDS
from cogent.maths.distance_transform import dist_euclidean

__author__ = "Justin Kuczynski"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Justin Kuczynski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Justin Kuczynski"
__email__ = "justinak@gmail.com"
__status__ = "Development"


class NMDSTests(TestCase):
    """test the nonmetric_scaling module, using floating point numpy arrays
    """

    def setUp(self):
        """creates inputs"""
        self.mtx = array([[0, 3, 4, 8],
                          [3, 0, 1, 27],
                          [4, 1, 0, 3.5],
                          [8, 27, 3.5, 0]], 'd')
        self.nm = NMDS(self.mtx, verbosity=0)

    def test_getStress(self):
        """stress should be small

        this is preliminary, better to check for convergence to similar
        states with random starting points enabled"""
        stress = self.nm.getStress()
        self.assertLessThan(stress, 1e-1)

    def test_getPoints(self):
        """points should be of the right number and dimensionality

        this is preliminary, better to check for convergence to similar
        states with random starting points enabled"""
        pts = self.nm.getPoints()
        self.assertEqual(size(pts, 0), 4)
        self.assertEqual(size(pts, 1), 2)

    def test_2(self):
        """l19 data should give stress below .13"""
        ptmtx = array(
            [[7, 1, 0, 0, 0, 0, 0, 0, 0],
             [4, 2, 0, 0, 0, 1, 0, 0, 0],
             [2, 4, 0, 0, 0, 1, 0, 0, 0],
             [1, 7, 0, 0, 0, 0, 0, 0, 0],
             [0, 8, 0, 0, 0, 0, 0, 0, 0],
             [0, 7, 1, 0, 0, 0, 0, 0, 0],  #idx 5
             [0, 4, 2, 0, 0, 0, 2, 0, 0],
             [0, 2, 4, 0, 0, 0, 1, 0, 0],
             [0, 1, 7, 0, 0, 0, 0, 0, 0],
             [0, 0, 8, 0, 0, 0, 0, 0, 0],
             [0, 0, 7, 1, 0, 0, 0, 0, 0],  #idx 10
             [0, 0, 4, 2, 0, 0, 0, 3, 0],
             [0, 0, 2, 4, 0, 0, 0, 1, 0],
             [0, 0, 1, 7, 0, 0, 0, 0, 0],
             [0, 0, 0, 8, 0, 0, 0, 0, 0],
             [0, 0, 0, 7, 1, 0, 0, 0, 0],  #idx 15
             [0, 0, 0, 4, 2, 0, 0, 0, 4],
             [0, 0, 0, 2, 4, 0, 0, 0, 1],
             [0, 0, 0, 1, 7, 0, 0, 0, 0]], 'float')
        distmtx = dist_euclidean(ptmtx)
        nm = NMDS(distmtx, verbosity=0)
        self.assertLessThan(nm.getStress(), .13)

    def test_3(self):
        """l19 data should give stress below .13 in multi-D"""
        ptmtx = array(
            [[7, 1, 0, 0, 0, 0, 0, 0, 0],
             [4, 2, 0, 0, 0, 1, 0, 0, 0],
             [2, 4, 0, 0, 0, 1, 0, 0, 0],
             [1, 7, 0, 0, 0, 0, 0, 0, 0],
             [0, 8, 0, 0, 0, 0, 0, 0, 0],
             [0, 7, 1, 0, 0, 0, 0, 0, 0],  #idx 5
             [0, 4, 2, 0, 0, 0, 2, 0, 0],
             [0, 2, 4, 0, 0, 0, 1, 0, 0],
             [0, 1, 7, 0, 0, 0, 0, 0, 0],
             [0, 0, 8, 0, 0, 0, 0, 0, 0],
             [0, 0, 7, 1, 0, 0, 0, 0, 0],  #idx 10
             [0, 0, 4, 2, 0, 0, 0, 3, 0],
             [0, 0, 2, 4, 0, 0, 0, 1, 0],
             [0, 0, 1, 7, 0, 0, 0, 0, 0],
             [0, 0, 0, 8, 0, 0, 0, 0, 0],
             [0, 0, 0, 7, 1, 0, 0, 0, 0],  #idx 15
             [0, 0, 0, 4, 2, 0, 0, 0, 4],
             [0, 0, 0, 2, 4, 0, 0, 0, 1],
             [0, 0, 0, 1, 7, 0, 0, 0, 0]], 'float')
        distmtx = dist_euclidean(ptmtx)
        for dim in range(3, 18):
            nm = NMDS(distmtx, verbosity=0, dimension=dim)
            self.assertLessThan(nm.getStress(), .13)

    def test_metaNMDS(self):
        """l19 data should give stress below .13"""
        ptmtx = array(
            [[7, 1, 0, 0, 0, 0, 0, 0, 0],
             [4, 2, 0, 0, 0, 1, 0, 0, 0],
             [2, 4, 0, 0, 0, 1, 0, 0, 0],
             [1, 7, 0, 0, 0, 0, 0, 0, 0],
             [0, 8, 0, 0, 0, 0, 0, 0, 0],
             [0, 7, 1, 0, 0, 0, 0, 0, 0],  #idx 5
             [0, 4, 2, 0, 0, 0, 2, 0, 0],
             [0, 2, 4, 0, 0, 0, 1, 0, 0],
             [0, 1, 7, 0, 0, 0, 0, 0, 0],
             [0, 0, 8, 0, 0, 0, 0, 0, 0],
             [0, 0, 7, 1, 0, 0, 0, 0, 0],  #idx 10
             [0, 0, 4, 2, 0, 0, 0, 3, 0],
             [0, 0, 2, 4, 0, 0, 0, 1, 0],
             [0, 0, 1, 7, 0, 0, 0, 0, 0],
             [0, 0, 0, 8, 0, 0, 0, 0, 0],
             [0, 0, 0, 7, 1, 0, 0, 0, 0],  #idx 15
             [0, 0, 0, 4, 2, 0, 0, 0, 4],
             [0, 0, 0, 2, 4, 0, 0, 0, 1],
             [0, 0, 0, 1, 7, 0, 0, 0, 0]], 'float')
        distmtx = dist_euclidean(ptmtx)
        nm = metaNMDS(1, distmtx, verbosity=0)
        self.assertLessThan(nm.getStress(), .13)


if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_cluster/test_procrustes.py000644 000765 000024 00000012352 12024702176 023750 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from numpy import array, sqrt, dot, trace, transpose, pi, cos, sin, append
from cogent.cluster.procrustes import procrustes, get_disparity, center, \
    normalize

__author__ = "Justin Kuczynski"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Justin Kuczynski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Justin Kuczynski"
__email__ = "justinak@gmail.com"
__status__ = "Production"


class procrustesTests(TestCase):
    """test the procrustes module, using floating point numpy arrays
    """

    def setUp(self):
        """creates inputs"""
        # an L
        self.data1 = array([[1, 3], [1, 2], [1, 1], [2, 1]], 'd')

        # a larger, shifted, mirrored L
        self.data2 = array([[4, -2], [4, -4], [4, -6], [2, -6]], 'd')

        # an L shifted up 1, right 1, and with point 4 shifted an extra .5
        # to the right
        # pointwise distance disparity with data1: 3*(2) + (1 + 1.5^2)
        self.data3 = array([[2, 4], [2, 3], [2, 2], [3, 2.5]], 'd')

        # data4, data5 are standardized (trace(A*A') = 1).
        # procrustes should return an identical copy if they are used
        # as the first matrix argument.
        shiftangle = pi / 8
        self.data4 = array([[1, 0], [0, 1], [-1, 0], [0, -1]], 'd') / sqrt(4)
        self.data5 = array([
            [cos(shiftangle), sin(shiftangle)],
            [cos(pi/2 - shiftangle), sin(pi/2 - shiftangle)],
            [-cos(shiftangle), -sin(shiftangle)],
            [-cos(pi/2 - shiftangle), -sin(pi/2 - shiftangle)]],
            'd') / sqrt(4)

    def test_procrustes(self):
        """tests procrustes' ability to match two matrices.

        the second matrix is a rotated, shifted, scaled, and mirrored
        version of the first, in two dimensions only
        """
        # can shift, mirror, and scale an 'L'?
        a, b, disparity = procrustes(self.data1, self.data2)
        self.assertFloatEqual(b, a)
        self.assertFloatEqual(disparity, 0.)
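The invariance checked here — zero disparity when one matrix is a shifted, scaled, and mirrored copy of the other — can be reproduced with a numpy-only orthogonal-Procrustes sketch via SVD. This is the textbook construction, not cogent's implementation, and the helper name `procrustes_disparity` is mine:

```python
import numpy as np

def procrustes_disparity(a, b):
    """Center both point sets, scale each to unit Frobenius norm, then
    find the orthogonal map (rotation/reflection) and scale taking b onto
    a via SVD; return the residual sum of squares."""
    a = a - a.mean(axis=0)            # remove translation
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)         # remove overall scale
    b = b / np.linalg.norm(b)
    u, s, vt = np.linalg.svd(b.T @ a)
    r = u @ vt                        # best orthogonal map for b @ r ~ a
    b_fit = s.sum() * (b @ r)         # s.sum() is the optimal scale here
    return np.sum((a - b_fit) ** 2)

# an 'L' versus a larger, shifted, mirrored 'L' (the data1/data2 pair
# used in the tests): a similarity transform, so disparity is ~0
data1 = np.array([[1., 3.], [1., 2.], [1., 1.], [2., 1.]])
data2 = np.array([[4., -2.], [4., -4.], [4., -6.], [2., -6.]])
assert procrustes_disparity(data1, data2) < 1e-9
```

Because the SVD-derived map is any orthogonal matrix (determinant +1 or -1), mirroring is matched as well as rotation, which is why the mirrored L still yields zero disparity.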
        # if first mtx is standardized, leaves first mtx unchanged?
        m4, m5, disp45 = procrustes(self.data4, self.data5)
        self.assertFloatEqual(m4, self.data4)

        # at worst, data3 is an 'L' with one point off by .5
        m1, m3, disp13 = procrustes(self.data1, self.data3)
        self.assertLessThan(disp13, .5**2)

    def test_procrustes2(self):
        """procrustes disparity should not depend on order of matrices"""
        m1, m3, disp13 = procrustes(self.data1, self.data3)
        m3_2, m1_2, disp31 = procrustes(self.data3, self.data1)
        self.assertFloatEqual(disp13, disp31)

        # try with 3d, 8 pts per
        rand1 = array([[2.61955202, 0.30522265, 0.55515826],
                       [0.41124708, -0.03966978, -0.31854548],
                       [0.91910318, 1.39451809, -0.15295084],
                       [2.00452023, 0.50150048, 0.29485268],
                       [0.09453595, 0.67528885, 0.03283872],
                       [0.07015232, 2.18892599, -1.67266852],
                       [0.65029688, 1.60551637, 0.80013549],
                       [-0.6607528, 0.53644208, 0.17033891]])
        rand3 = array([[0.0809969, 0.09731461, -0.173442],
                       [-1.84888465, -0.92589646, -1.29335743],
                       [0.67031855, -1.35957463, 0.41938621],
                       [0.73967209, -0.20230757, 0.52418027],
                       [0.17752796, 0.09065607, 0.29827466],
                       [0.47999368, -0.88455717, -0.57547934],
                       [-0.11486344, -0.12608506, -0.3395779],
                       [-0.86106154, -0.28687488, 0.9644429]])
        res1, res3, disp13 = procrustes(rand1, rand3)
        res3_2, res1_2, disp31 = procrustes(rand3, rand1)
        self.assertFloatEqual(disp13, disp31)

    def test_get_disparity(self):
        """tests get_disparity"""
        disp = get_disparity(self.data1, self.data3)
        disp2 = get_disparity(self.data3, self.data1)
        self.assertFloatEqual(disp, disp2)
        self.assertFloatEqual(disp, (3.*2. + (1. + 1.5**2)))

        d1 = append(self.data1, self.data1, 0)
        d3 = append(self.data3, self.data3, 0)
        disp3 = get_disparity(d1, d3)
        disp4 = get_disparity(d3, d1)
        self.assertFloatEqual(disp3, disp4)

        # 2x points in same configuration should give 2x disparity
        self.assertFloatEqual(disp3, 2.*disp)

    def test_center(self):
        centered_mtx = center(self.data1)
        column_means = centered_mtx.mean(0)
        for col_mean in column_means:
            self.assertFloatEqual(col_mean, 0.)

    def test_normalize(self):
        norm_mtx = normalize(self.data1)
        self.assertFloatEqual(trace(dot(norm_mtx, transpose(norm_mtx))), 1.)

    # match_points isn't yet tested, as it's almost a private function
    # and test_procrustes() tests it implicitly.


#run if called from command line
if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_cluster/test_UPGMA.py000644 000765 000024 00000013662 12024702176 022415 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.tree import PhyloNode
from numpy import array
import numpy

Float = numpy.core.numerictypes.sctype2char(float)

from cogent.cluster.UPGMA import find_smallest_index, condense_matrix, \
    condense_node_order, UPGMA_cluster, inputs_from_dict2D, upgma
from cogent.util.dict2d import Dict2D

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"


class UPGMATests(TestCase):
    """test the functions to cluster using UPGMA using numpy"""

    def setUp(self):
        """creates inputs"""
        self.pairwise_distances = {('a', 'b'): 1.0, ('a', 'c'): 4.0,
                                   ('a', 'd'): 20.0, ('a', 'e'): 22.0,
                                   ('b', 'c'): 5.0, ('b', 'd'): 21.0,
                                   ('b', 'e'): 23.0, ('c', 'd'): 10.0,
                                   ('c', 'e'): 12.0, ('d', 'e'): 2.0}
        #create a list of PhyloNode objects
        a, b, c, d, e = map(PhyloNode, 'abcde')
        self.node_order = [a, b, c, d, e]
        #create a numpy matrix object to cluster
        self.matrix = array(([9999999, 1, 4, 20, 22],
                             [1, 9999999, 5, 21, 23],
                             [4, 5, 9999999, 10, 12],
                             [20, 21, 10, 9999999, 2],
                             [22, 23, 12, 2, 9999999]), Float)
        #create a numpy matrix with zero diagonals to test diagonal mask
        self.matrix_zeros = array(([0, 1, 4, 20, 22],
                                   [1, 0, 5, 21, 23],
                                   [4, 5, 0, 10, 12],
                                   [20, 21, 10, 0, 2],
                                   [22, 23, 12, 2, 0]), Float)
        #create a numpy matrix with fives on the diagonal to test the
        #diagonal mask
        self.matrix_five = array(([5, 1, 4, 20, 22],
                                  [1, 5, 5, 21, 23],
                                  [4, 5, 5, 10, 12],
                                  [20, 21, 10, 5, 2],
                                  [22, 23, 12, 2, 5]), Float)

    def test_UPGMA_cluster(self):
        """upgma works on pairwise distance dict
        """
        pairwise_dist = self.pairwise_distances
        cluster = upgma(pairwise_dist)
        self.assertEqual(str(cluster),
            '(((b:0.5,a:0.5)edge.1:1.75,c:2.25)edge.0:5.875,(d:1.0,e:1.0)edge.2:7.125)root;')

    def test_find_smallest_index(self):
        """find_smallest_index returns the index of smallest value in array
        """
        matrix = self.matrix
        index = find_smallest_index(matrix)
        self.assertEqual(index, (0, 1))

    def test_condense_matrix(self):
        """condense_array joins two rows and columns identified by indices
        """
        matrix = self.matrix
        index = find_smallest_index(matrix)
        result = condense_matrix(matrix, index, 9999999999)
        self.assertFloatEqual(result[0, 0], 5000000.0)
        self.assertEqual(result[1, 4], 9999999999)
        self.assertEqual(result[0, 1], 9999999999)
        self.assertEqual(result[0, 2], 4.5)
        self.assertEqual(result[2, 0], 4.5)
        self.assertEqual(result[0, 4], 22.5)
        self.assertEqual(result[4, 4], 9999999)
        self.assertEqual(result[4, 0], 22.5)

    def test_condense_node_order(self):
        """condense_node_order condenses nodes in list based on index info
        """
        matrix = self.matrix
        index = find_smallest_index(matrix)
        node_order = self.node_order
        node_order = condense_node_order(matrix, index, node_order)
        self.assertEqual(node_order[1], None)
        self.assertEqual(node_order[0].__str__(), '(a:0.5,b:0.5);')
        self.assertEqual(node_order[2].__str__(), 'c;')
        self.assertEqual(node_order[3].__str__(), 'd;')
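The condense step tested here — replace the two closest rows/columns with their average and retire one of them — is the core move of the UPGMA loop. A hypothetical numpy-only sketch (helper names are mine, not cogent's API, and this uses a plain average, whereas a full UPGMA step would weight the average by cluster sizes on later merges):

```python
import numpy as np

def find_smallest_off_diagonal(dists):
    """Index (i, j) of the smallest off-diagonal entry, mirroring what
    these tests expect from find_smallest_index."""
    masked = dists + np.diag([np.inf] * len(dists))  # ignore self-distances
    return np.unravel_index(np.argmin(masked), masked.shape)

def merge_closest_pair(dists):
    """One condense step: average the two closest rows/columns, then drop
    one of them, shrinking the matrix by one row and column."""
    i, j = find_smallest_off_diagonal(dists)
    merged = (dists[i] + dists[j]) / 2.0
    out = dists.copy()
    out[i, :] = merged
    out[:, i] = merged
    out[i, i] = 0.0
    keep = [k for k in range(len(out)) if k != j]
    return out[np.ix_(keep, keep)], (i, j)

d = np.array([[0., 1., 4., 20.],
              [1., 0., 5., 21.],
              [4., 5., 0., 10.],
              [20., 21., 10., 0.]])
reduced, pair = merge_closest_pair(d)
assert pair == (0, 1)                 # a and b are the closest pair
assert reduced[0, 1] == 4.5           # average of d(a, c)=4 and d(b, c)=5
```

This also shows why the test's `condense_matrix(matrix, index, 9999999999)` expects `result[0, 2] == 4.5`: merging a and b averages their distances to c.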
        self.assertEqual(node_order[4].__str__(), 'e;')

    def test_upgma_cluster(self):
        """UPGMA_cluster clusters nodes based on info in a matrix with UPGMA
        """
        matrix = self.matrix
        node_order = self.node_order
        large_number = 9999999999
        tree = UPGMA_cluster(matrix, node_order, large_number)
        self.assertEqual(str(tree),
            '(((a:0.5,b:0.5):1.75,c:2.25):5.875,(d:1.0,e:1.0):7.125);')

    def test_UPGMA_cluster_diag(self):
        """UPGMA_cluster works when the diagonal has lowest values
        """
        #test that checking the diagonal works
        matrix = self.matrix_zeros
        node_order = self.node_order
        large_number = 9999999999
        tree = UPGMA_cluster(matrix, node_order, large_number)
        self.assertEqual(str(tree),
            '(((a:0.5,b:0.5):1.75,c:2.25):5.875,(d:1.0,e:1.0):7.125);')

    def test_UPGMA_cluster_diag_intermediate(self):
        """UPGMA_cluster works when the diagonal has intermediate values
        """
        #test that checking the diagonal works
        matrix = self.matrix_five
        node_order = self.node_order
        large_number = 9999999999
        tree = UPGMA_cluster(matrix, node_order, large_number)
        self.assertEqual(str(tree),
            '(((a:0.5,b:0.5):1.75,c:2.25):5.875,(d:1.0,e:1.0):7.125);')

    def test_inputs_from_dict2D(self):
        """inputs_from_dict2D makes an array object and PhyloNode list"""
        matrix = [('1', '2', 0.86), ('2', '1', 0.86),
                  ('1', '3', 0.92), ('3', '1', 0.92),
                  ('2', '3', 0.67), ('3', '2', 0.67)]
        row_order = ['3', '2', '1']
        matrix_d2d = Dict2D(matrix, RowOrder=row_order,
                            ColOrder=row_order, Pad=True,
                            Default=999999999999999)
        matrix_array, PhyloNode_order = inputs_from_dict2D(matrix_d2d)
        self.assertFloatEqual(matrix_array[0][2], 0.92)
        self.assertFloatEqual(matrix_array[1][0], 0.67)
        self.assertEqual(PhyloNode_order[0].Name, '3')
        self.assertEqual(PhyloNode_order[2].Name, '1')


#run if called from command line
if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_app/__init__.py000644 000765 000024 00000001726 12024702176 021341 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
__all__ = """ test_dynalign test_raxml test_stride test_blast
test_foldalign test_rnaalifold
test_carnac test_ilm test_rnaforester
test_clustalw test_knetfold test_rnaview
test_cmfinder test_mfold test_sfold
test_comrna test_muscle test_unafold
test_consan test_nupack test_util
test_contrafold test_parameters test_vienna_package
test_cove test_pfold test_gctmpca
test_dialign test_pknotsrg test_fasttree
test_msms""".split()

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Micah Hamady", "Rob Knight", "Catherine Lozupone",
               "Sandra Smit", "Gavin Huttley", "Greg Caporaso",
               "Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

PyCogent-1.5.3/tests/test_app/test_blast.py

#!/usr/bin/env python
from string import split, strip
from os import popen, remove
from glob import glob
from cogent.util.unit_test import TestCase, main
from cogent.parse.blast import QMEBlast9
from cogent.app.blast import seqs_to_stream,\
    make_subject_match_scorer, make_shotgun_scorer, keep_everything_scorer, \
    ids_from_seq_lower_threshold, PsiBlast, psiblast_n_neighbors

__author__ = "Micah Hamady"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Micah Hamady", "Rob Knight", "Catherine Lozupone"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Micah Hamady"
__email__ = "hamady@colorado.edu"
__status__ = "Prototype"

class BlastTests(TestCase):
    """Tests of top-level functions"""

    def setUp(self):
        """Define some standard data"""
        self.rec = """# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-06 52.8
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 2
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0
ece:Z4181 sfl:CP0138 33.98 103 57 2 8 110 6 97 6e-06 50.5
ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8
ece:Z4181 sec:SC2804 37.50 72 45 0 39 110 30 101 1e-05 49.8
ece:Z4181 stm:STM2872 37.50 72 45 0 39 110 30 101 1e-05 49.8
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4182
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4182 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4182 ecs:ECs3718 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4182 cvi:CV2422 41.67 72 42 0 39 110 29 100 2e-06 52.8""".split('\n')

        self.rec2 = """# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 2
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4182
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4182 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4182 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-06 52.8""".split('\n')

        self.rec3 = """# BLASTP 2.2.10 [Oct-19-2004]
# BLASTP 2.2.10 [Oct-19-2004]
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4182
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4182 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4182 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-06 52.8
# BLASTP 2.2.10 [Oct-19-2004]
# Query: ece:Z4183
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4183 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4183 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4183 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0""".split('\n')

        self.query_1 = """>gi|100002553| Bd2556c Bd2556c two-component system sensor histidine kinase 3092017:3094158 reverse MW:81963
MRLKNRLNNWISIRMGMVIVIFLGVSCGSMRSSTPPPAKDRLTEIDSLERLLPDCPTIASTLPLLRRLAFLYQQQSEMKVYNERLYENAMAVDSISVAYLGLKNLAEYYYDQSVRDSLEYYCSLVDSIAKARHEYPNVLFDVKSLSSQDLLWLGNYELAMSEAMDLYRLASNLDHRYGLLRCSETLGLIYQRIRRDSDAVVSFQESLDLLKDIKDVPDIMDTKVRLTSYQLESSVRTKQYASTERILGQYMALLDEQYKIYQEKNDLLSIKREYWLLYSFYTSFYLSQGDLENAKRSLDQASSYADSNWVEGDYAINTYLTVKARYHKAAGDIPLALHCINEVLETERLPEDIQFKADILKEQGQLGEVMALYDELYSTLTKRRGTSFLRQVNQLRTLHELHEKELKETELKEAGQRIARKQDLLIFILSISVVLLILLYVLFLYYRHLRSLKNQLQREKELLLESQRQLIKEKTRAEEASLMKSAFLANMSHEVRTPLNAIVGFSGLLVEPSTDEEERKEYSSIIRNNTDLMLNLVNDVLDLSRMETGDLHFDIKDHLLLVCCQMALESVRHRIPDGVKLTFSPAGEPIVVHVDNLRLQQLLTNLLTNAAKFTEKGEINLSFQLEPDRKKVCIAVTDTGAGIPLEKQATIFNRFEKLDDYKPGVGLGLSICLLIAERLDGALFIDSSYTDGARFVLILSCEIDSSIYNPPIEV"""

        self.query_2 = """>gi|100002557| Bd2560c Bd2560c conserved hypothetical protein 3097971:3098210 reverse MW:8927
MGKNQLIHGNEFHLLKQAEIHKATGKLVESLNLAAGSTGGFDIYKVVEAYFTDLEKRKEINDLLGISEPCETRVTEECFS
"""

        self.fasta_recs = """>gi|100002550| Bd2553c Bd2553c conserved hypothetical protein 3090609:3091013 reverse MW:14682
MMDFISVPLVVGIVCAGIYGLFELFVRKRERLAIIEKIGDKLDTSAFDGKLGLPNYMRNFSFSSLKAGCLLAGIGLGLLVGFIINMCMATNSYYDDGWYRHEVAGTAYGASVLLFGGIGLIIAFVIELKLGKNNK
>gi|100002551| Bd2554 Bd2554 RNA polymerase ECF-type sigma factor 3091112:3091717 forward MW:23408
LLPQVVTYLPGLRPLSTMELYTDTYYIQRIQAGDVACFACLLDKYSRPIHSLILKVVRSQEEAEELAQDTFMKVFKNLASFKGDCSFSTWIYRIAYNTAISSVRKKRYEFLAIEETTLENVSEEEITNLFGQTESTEQVQRLEVALEQLLPDERALILLFYWKEKTIEELVSITGLTASNIKVKLHRIRKKLFVLLNGMDHE
>gi|100002552| Bd2555 Bd2555 conserved hypothetical protein 3091713:3092066 forward MW:13332
MSKINTNKEQPDLLGDLFKRIPEEELPASFRSNVMRQIMLESAKAKKRDERFSLLAAIVASLIMISLAIVSFVYMEIPKIAIPTISTSALAFYLYIGAITLILLLADYKLRNLFHKKG
>gi|100002553| Bd2556c Bd2556c two-component system sensor histidine kinase 3092017:3094158 reverse MW:81963
MRLKNRLNNWISIRMGMVIVIFLGVSCGSMRSSTPPPAKDRLTEIDSLERLLPDCPTIASTLPLLRRLAFLYQQQSEMKVYNERLYENAMAVDSISVAYLGLKNLAEYYYDQSVRDSLEYYCSLVDSIAKARHEYPNVLFDVKSLSSQDLLWLGNYELAMSEAMDLYRLASNLDHRYGLLRCSETLGLIYQRIRRDSDAVVSFQESLDLLKDIKDVPDIMDTKVRLTSYQLESSVRTKQYASTERILGQYMALLDEQYKIYQEKNDLLSIKREYWLLYSFYTSFYLSQGDLENAKRSLDQASSYADSNWVEGDYAINTYLTVKARYHKAAGDIPLALHCINEVLETERLPEDIQFKADILKEQGQLGEVMALYDELYSTLTKRRGTSFLRQVNQLRTLHELHEKELKETELKEAGQRIARKQDLLIFILSISVVLLILLYVLFLYYRHLRSLKNQLQREKELLLESQRQLIKEKTRAEEASLMKSAFLANMSHEVRTPLNAIVGFSGLLVEPSTDEEERKEYSSIIRNNTDLMLNLVNDVLDLSRMETGDLHFDIKDHLLLVCCQMALESVRHRIPDGVKLTFSPAGEPIVVHVDNLRLQQLLTNLLTNAAKFTEKGEINLSFQLEPDRKKVCIAVTDTGAGIPLEKQATIFNRFEKLDDYKPGVGLGLSICLLIAERLDGALFIDSSYTDGARFVLILSCEIDSSIYNPPIEV
>gi|100002554| Bd2557c Bd2557c two-component system sensor histidine kinase 3094158:3095507 reverse MW:51247
LERKYNGEGKIFPVKRHRCLMSCYYCELYTMKGNSGKAQAYLDQATAYLDSSFGDRVEAQYLRTKSFYYWKEKDYRHALSAVNLALKINRDLDKLEMKKAVLQSSGQLQEAVTIYEEIINKTETINTDAFDRQIEQLRVLNDLNDLEKQDRELKLKSEQEALKQKQIVVSIGLLLVLMGLLYMLWRIYMHTKRLRNELLQEKDSLTASEKQLRVVTKEAEAANKKKSAFIANISHEVRTPLNAIVGFSELLASSEYSEEEKIRFAGEVNHSSELLLNLVNDVLDLSRLESGKIKFSVKPNDLVACCQRALDSIRHRVKPGVRLTFTPSIESYTLNTDALRLQQLLTNLLSNAAKFTSEGEINLSFTVDEGKEEVCFSVTDTGCGIPEDKCEKIFERFEKLDDFIQGTGLGLSVCQIISEQLNGSLSVDISYKDGARFVFIHPTNLIETPI
>gi|100002555| Bd2558c Bd2558c hypothetical protein 3095527:3095985 reverse MW:17134
LRGKNIHLGRVGCNYGKLLIFIDIYFVSLRIVSDKSMSRGFLRKSSVNTFIGIVWILFAVGTSAQNAVSKFRADSIRQSLSRIQKPQDKIPLLKELIGLYWQLPEEVLALKEIIDIAMPLDSIGIVYDAMAGLSRYYPAIRTFVRVGGALETV
>gi|100002556| Bd2559 Bd2559 30S ribosomal protein S1 3096095:3097882 forward MW:67092
MENLKNIQPVEDFNWDAFEQGETYTEVSKDDLVKTYDETLNTVKDKEVVMGTVTSMNKREVVVNIGFKSDGVVPMSEFRYNPDLKIGDEVEVYIESQEDKKGQLILSHKKARATRSWDRVNEALEKDEIIKGYIKCRTKGGMIVDVFGIEAFLPGSQIDVKPIRDYDVFVGKTMEFKIVKINQEFKNVVVSHKALIEAELEQQKKDIISKLEKGQVLEGTVKNITSYGVFIDLGGVDGLIHITDLSWGRVSHPEEIVQLDQKINVVILDFDDEKKRIALGLKQLTPHPWDALDTNLKVGDKVKGKVVVMADYGAFIEIAPGVEGLIHVSEMSWTQHLRSAQDFMKVGDEIEAVILTLDRDERKMSLGIKQLKADPWENIEERFPVGSRHAAKVRNFTNFGVFVEIEEGVDGLIHISDLSWTKKIKHPSEFTQIGAEIEVQVLEIDKENRRLSLGHKQLEENPWDVFETIFTVGSIHEGTIIEVLDKGAVISLPYGVEGFATPKHLVKEDGSQAQVDEKLSFKVIEFNKEAKRIILSHSRIFEDEQKGAKATSEKKASSKRGGKKEEESGMVTGPVEKTTLGDIEELAALKEKLSGK
>gi|100002557| Bd2560c Bd2560c conserved hypothetical protein 3097971:3098210 reverse MW:8927
MGKNQLIHGNEFHLLKQAEIHKATGKLVESLNLAAGSTGGFDIYKVVEAYFTDLEKRKEINDLLGISEPCETRVTEECFS
>gi|100002558| Bd2561 Bd2561 phosphoglycolate phosphatase 3098389:3099033 forward MW:24182
MKKLVIFDLDGTLLNTIADLAHSTNHALRQNGFPTHDVKEYNFFVGNGINKLFERALPEGEKTAENILKVREEFLKHYDLHNTDRSVPYPGVPELLALLQERGIKLAVASNKYQAATRKLIAHFFPSIQFTEVLGQREGVKAKPDPSIVNEIVERASISKESTLYVGDSDVDMQTAINSEVTSCGVTWGFRPRTELEKYAPDHIAEKAEDILKFI
>gi|100002559| Bd2562 Bd2562 conserved hypothetical protein 3099382:3100299 forward MW:35872
MSGNIKKIVEPNSGIDYSLEKDFKIFTLSKELPITTYPSYIRLGIVIYCVKGNAKIDIYSNKHIITPKELIIILPGQLVALTDVSVDFQIRYFTITESFYSDILSGISRFSPHFFFYMRQHYYFKMEDVETLSFVDFFELLIRKAVDPENQYRRESVILLLRILFLDIYNHYKVNSLDSTATIDVHKKELTHKFFQLVMSNYKVNRSVTFYANSLCITPKYLTMVVKEVSGKSAKDWITEYMILELKGLLTNSTLNIQEIVEKTQFSNQSSLGRFFRRHTGLSPLQYRKKYLTTEQRTNFSKNNTI
"""

    def test_seqs_to_stream(self):
        """seqs_to_stream should iterate over seqs"""
        sts = seqs_to_stream
        self.assertEqual(list(sts('>a\nTG\n>b\nWW\n', \
            '_input_as_multiline_string')),\
            [['>a','TG'],['>b','WW']])
        #skipping test for file open
        self.assertEqual(list(sts(['TG','WW'], '_input_as_seqs')), \
            [['>0','TG'],['>1','WW']])
        self.assertEqual(list(sts(['>a','TG','>b','WW'], \
            '_input_as_lines')),\
            [['>a','TG'],['>b','WW']])
        self.assertRaises(TypeError, sts, 'abc', 'xyz')

    def test_make_subject_match_scorer(self):
        """make_subject_match_scorer should keep ids matching n queries"""
        qm1 = make_subject_match_scorer(1)
        qm3 = make_subject_match_scorer(3)
        qm5 = make_subject_match_scorer(5)
        qmes = wrap_qmes(QMEBlast9(self.rec3))
        self.assertEqualItems(qm1(qmes), ['ece:Z4181','ece:Z4182','ece:Z4183'])
        self.assertEqualItems(qm3(qmes), ['ece:Z4181','ece:Z4183'])
        self.assertEqualItems(qm5(qmes), [])

    def test_make_shotgun_scorer(self):
        """make_shotgun_scorer should keep ids matching n queries"""
        sg1 = make_shotgun_scorer(1)
        sg2 = make_shotgun_scorer(2)
        sg3 = make_shotgun_scorer(3)
        sg4 = make_shotgun_scorer(4)
        sg5 = make_shotgun_scorer(5)
        qmes = wrap_qmes(QMEBlast9(self.rec3))
        self.assertEqualItems(sg1(qmes), keep_everything_scorer(qmes))
        self.assertEqualItems(sg2(qmes), \
            ['ece:Z4181','ece:Z4182','ece:Z4183','cvi:CV2421','ecs:ECs3717'])
        self.assertEqualItems(sg3(qmes), \
            ['ece:Z4181','ece:Z4182','ece:Z4183'])
        self.assertEqualItems(sg4(qmes), \
            ['ece:Z4182'])
        self.assertEqualItems(sg5(qmes), [])

    def test_keep_everything_scorer(self):
        """keep_everything_scorer should keep all ids found."""
        k = keep_everything_scorer(wrap_qmes(QMEBlast9(self.rec2)))
        self.assertEqualItems(k, \
            ['ece:Z4181','ecs:ECs3717','spt:SPA2730','cvi:CV2421','ece:Z4182'])

    def test_ids_from_seq_lower_threshold(self):
        "ids_from_seq_lower_threshold returns psiblast hits, decreasing sens"
        bdb_seqs = self.fasta_recs
        f = open('test_bdb', 'w')
        f.write(bdb_seqs)
        f.close()
        temp = popen('formatdb -i test_bdb -o T -p T')
        params = {'-j':2, '-d':'test_bdb'}
        query = self.query_1.split('\n')
        app = PsiBlast(params=params, InputHandler='_input_as_lines')
        #the command below should result in finding itself and 2554
        #it should run for max_iterations
        result = ids_from_seq_lower_threshold(query, n=12, \
            max_iterations=4, app=app, core_threshold=1e-50, \
            lower_threshold=1e-20, step=10000)
        self.assertEqual(result[0],\
            [('gi|100002553', '0.0'), ('gi|100002554', '0.0')])
        self.assertEqual(result[1], 4)
        #if n=2, it should find the same sequences but only run for 1 iteration
        #since it would hit n after the first blast search
        result = ids_from_seq_lower_threshold(query, n=2, \
            max_iterations=4, app=app, core_threshold=1e-50, \
            lower_threshold=1e-20, step=10000)
        self.assertEqual(result[0],\
            [('gi|100002553', '0.0'), ('gi|100002554', '0.0')])
        self.assertEqual(result[1], 1)
        query = self.query_2.split('\n')
        #query_2's e-value for itself is 9e-47, so it should not be found
        #with the lower_threshold set to 1e-48
        result = ids_from_seq_lower_threshold(query, n=12, \
            max_iterations=4, app=app, core_threshold=1e-50, \
            lower_threshold=1e-48, step=10000)
        self.assertEqual(result[0], [])
        #it also should not be found if max_iterations is set to 1
        result = ids_from_seq_lower_threshold(query, n=12, \
            max_iterations=1, app=app, core_threshold=1e-50, \
            lower_threshold=1e-20, step=10000)
        self.assertEqual(result[0], [])
        for fname in ['formatdb.log'] + glob('test_bdb*'):
            remove(fname)

    def test_psiblast_n_neighbors(self):
        "psiblast_n_neighbors psiblasts and stops when n neighbors are reached"
        bdb_seqs = self.fasta_recs
        f = open('test_bdb', 'w')
        f.write(bdb_seqs)
        f.close()
        temp = popen('formatdb -i test_bdb -o T -p T')
        params = {'-j':11}
        lines = bdb_seqs.split('\n')
        results = psiblast_n_neighbors(lines, n=12, blast_db='test_bdb', \
            method='lower_threshold', params=params,\
            core_threshold=1e-50, step=10000)
        #there should be 10 result entries since there were 10 queries
        self.assertEqual(len(results), 10)
        for i in results:
            #each query should at least find itself
            self.failUnless(len(results[i][0]) >= 1)
            #each query should run the full 11 iterations (-j 11) since it
            #can never collect n=12 neighbors from this database
            self.assertEqual(results[i][1], 11)
        for fname in ['formatdb.log'] + glob('test_bdb*'):
            remove(fname)

def wrap_qmes(qmes):
    """Converts qmes into a dict of {q:{m:e}}"""
    d = {}
    for q, m, e in qmes:
        if q not in d:
            d[q] = {}
        d[q][m] = e
    return d

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_app/test_blat.py
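For reference, the module-level `wrap_qmes` helper above is what the scorer tests feed their `QMEBlast9` output through. A minimal standalone sketch of the `{query: {match: e-value}}` shape it builds, using made-up (query, match, e-value) triples rather than data from the tests:

```python
def wrap_qmes(qmes):
    """Converts (query, match, e-value) triples into {q: {m: e}}."""
    d = {}
    for q, m, e in qmes:
        if q not in d:
            d[q] = {}
        d[q][m] = e
    return d

# hypothetical triples; a repeated (query, match) pair keeps the last e-value,
# which is how later PSI-BLAST iterations override earlier ones
triples = [('q1', 'm1', '3e-47'), ('q1', 'm2', '1e-05'),
           ('q2', 'm1', '2e-06'), ('q1', 'm1', '3e-54')]
print(wrap_qmes(triples))
# {'q1': {'m1': '3e-54', 'm2': '1e-05'}, 'q2': {'m1': '2e-06'}}
```

The nested-dict form makes "how many queries matched this subject" a simple scan over the inner keys, which is exactly what the subject-match and shotgun scorers exploit.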
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.app.util import get_tmp_filename
from cogent.app.blat import Blat, assign_reads_to_database, \
    assign_dna_reads_to_dna_database, \
    assign_dna_reads_to_protein_database
from os.path import join, exists
from os import remove
from re import search

__author__ = "Adam Robbins-Pianka"
__copyright__ = "Copyright 2007-2012, The QIIME Project"
__credits__ = ["Adam Robbins-Pianka", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Adam Robbins-Pianka"
__email__ = "adam.robbinspianka@colorado.edu"
__status__ = "Prototype"

class BlatTests(TestCase):
    files_to_remove = []

    def setUp(self):
        """Sets up files for testing."""
        self.test_db_prot_filename = get_tmp_filename().replace('"', '')
        self.test_db_prot = open(self.test_db_prot_filename, 'w')
        self.test_db_dna_filename = get_tmp_filename().replace('"', '')
        self.test_db_dna = open(self.test_db_dna_filename, 'w')
        self.test_query_filename = get_tmp_filename().replace('"', '')
        self.test_query = open(self.test_query_filename, 'w')

        # write the global variables at the bottom of this file to the
        # temporary test files. Can't use file-like objects because the
        # external application needs actual files.
        self.test_db_prot.write('\n'.join(test_db_prot))
        self.test_db_dna.write('\n'.join(test_db_dna))
        self.test_query.write('\n'.join(test_query))

        # close the files
        self.test_db_prot.close()
        self.test_db_dna.close()
        self.test_query.close()

        # prepare output file path
        self.testout = get_tmp_filename().replace('"', '')

        self.files_to_remove += [self.test_db_prot_filename,
                                 self.test_db_dna_filename,
                                 self.test_query_filename,
                                 self.testout]

    def tearDown(self):
        """Removes temporary files created during the tests"""
        for filename in self.files_to_remove:
            if exists(filename):
                remove(filename)

    def test_assign_reads_to_database(self):
        """Tests that assign_reads_to_database works as expected.

        Checks the output file against the expected result when known
        database and query files are used.
        """
        exp = [l for l in assign_reads_exp if not l.startswith('#')]
        obs_lines = assign_reads_to_database(self.test_query_filename,
                                             self.test_db_dna_filename,
                                             self.testout).read().splitlines()
        obs = [l for l in obs_lines if not l.startswith('#')]
        self.assertEqual(obs, exp)

    def test_assign_dna_reads_to_dna_database(self):
        """Tests that assign_dna_reads_to_dna_database works as expected.

        Checks the output file against the expected result when known
        database and query files are used.
        """
        exp = [l for l in assign_reads_exp if not l.startswith('#')]
        obs_lines = assign_dna_reads_to_dna_database(self.test_query_filename,
                                             self.test_db_dna_filename,
                                             self.testout).read().splitlines()
        obs = [l for l in obs_lines if not l.startswith('#')]
        self.assertEqual(obs, exp)

    def test_assign_dna_reads_to_protein_database(self):
        """Tests that assign_dna_reads_to_protein_database works as expected.

        Checks the output file against the expected result when known
        database and query files are used.
        """
        exp = [l for l in assign_reads_prot_exp if not l.startswith('#')]
        obs_lines = assign_dna_reads_to_protein_database(
                                             self.test_query_filename,
                                             self.test_db_prot_filename,
                                             self.testout).read().splitlines()
        obs = [l for l in obs_lines if not l.startswith('#')]
        self.assertEqual(obs, exp)

    def test_get_base_command(self):
        """Tests that _get_base_command generates the proper command given
        various inputs.
        """
        test_parameters_blank = {}
        files = (self.test_query_filename, self.test_db_dna_filename,
                 self.testout)
        exp_blank = 'blat %s %s %s' % (files[1], files[0], files[2])

        # initialize a Blat instance with these parameters and get the
        # command string
        b = Blat(params = {}, HALT_EXEC=True)
        # need to set the positional parameters' values
        b._input_as_list(files)
        cmd = b._get_base_command()
        # find the end of the cd command and trim the base command
        cmd_index = search('cd ".+"; ', cmd).end()
        cmd = cmd[cmd_index:]
        self.assertEqual(cmd, exp_blank)

        test_parameters_1 = {
            '-t': 'dna', '-q': 'dna', '-ooc': '11.ooc', '-tileSize': 1,
            '-stepSize': 2, '-oneOff': 1, '-minMatch': 2, '-minScore': 3,
            '-minIdentity': 4, '-maxGap': 5, '-makeOoc': 'N.ooc',
            '-repMatch': 6, '-mask': 'lower', '-qMask': 'lower',
            '-repeats': 'lower', '-minRepDivergence': 7, '-dots': 8,
            '-out': 'psl', '-maxIntron': 9}
        exp_1 = 'blat %s %s ' % (files[1], files[0]) + \
            '-dots=8 -makeOoc="N.ooc" -mask=lower -maxGap=5 ' + \
            '-maxIntron=9 -minIdentity=4 -minMatch=2 ' + \
            '-minRepDivergence=7 -minScore=3 -oneOff=1 -ooc="11.ooc" ' + \
            '-out=psl -q=dna -qMask=lower -repMatch=6 -repeats=lower ' + \
            '-stepSize=2 -t=dna -tileSize=1 %s' % files[2]

        # initialize a Blat instance with these parameters and get the
        # command string
        b = Blat(params = test_parameters_1, HALT_EXEC=True)
        # need to set the positional parameters' values
        b._input_as_list(files)
        cmd = b._get_base_command()
        # find the end of the cd command and trim the base command
        cmd_index = search('cd ".+"; ', cmd).end()
        cmd = cmd[cmd_index:]
        self.assertEqual(cmd, exp_1)

        test_parameters_2 = {
            '-tileSize': 1, '-stepSize': 2, '-minMatch': 2, '-minScore': 3,
            '-minIdentity': 4, '-maxGap': 5, '-makeOoc': 'N.ooc',
            '-out': 'psl', '-maxIntron': 9}
        exp_2 = 'blat %s %s ' % (files[1], files[0]) + \
            '-makeOoc="N.ooc" -maxGap=5 -maxIntron=9 -minIdentity=4 ' + \
            '-minMatch=2 -minScore=3 -out=psl -stepSize=2 ' + \
            '-tileSize=1 %s' % files[2]

        # initialize a Blat instance with these parameters and get the
        # command string
        b = Blat(params = test_parameters_2, HALT_EXEC=True)
        # need to set the positional parameters' values
        b._input_as_list(files)
        cmd = b._get_base_command()
        # find the end of the cd command and trim the base command
        cmd_index = search('cd ".+"; ', cmd).end()
        cmd = cmd[cmd_index:]
        self.assertEqual(cmd, exp_2)

assign_reads_exp = """# BLAT 34 [2006/03/10]
# Query: NZ_GG770509_647533119
# Database: test_db.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_GG770509_647533119 NZ_GG770509_647533119 100.00 1371 0 0 1 1371 1 1371 0.0e+00 2187.0
NZ_GG770509_647533119 NZ_ACIZ01000148_643886127 85.49 634 92 0 336 969 337 970 4.5e-234 807.0
NZ_GG770509_647533119 NZ_ACIZ01000148_643886127 86.08 237 33 0 1135 1371 1137 1373 1.2e-77 287.0
NZ_GG770509_647533119 NZ_ACIZ01000148_643886127 83.12 154 26 0 976 1129 977 1130 2.2e-48 190.0
NZ_GG770509_647533119 NZ_GG739926_647533195 78.42 329 71 0 656 984 657 985 4.8e-97 351.0
NZ_GG770509_647533119 NZ_GG739926_647533195 89.09 110 11 1 1138 1246 1141 1250 1.1e-30 131.0
NZ_GG770509_647533119 NZ_GG739926_647533195 86.96 69 9 0 1021 1089 1023 1091 3.2e-20 96.0
NZ_GG770509_647533119 NZ_GG739926_647533195 75.26 97 22 2 356 450 356 452 2.3e-13 73.0
NZ_GG770509_647533119 NZ_GG739926_647533195 90.57 53 5 0 1319 1371 1315 1367 2.5e-10 63.0
NZ_GG770509_647533119 NZ_GG739926_647533195 81.82 22 4 0 989 1010 992 1013 1.5e+02 24.0
# BLAT 34 [2006/03/10]
# Query: NZ_GG739926_647533195
# Database: test_db.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_GG739926_647533195 NZ_GG739926_647533195 100.00 1367 0 0 1 1367 1 1367 0.0e+00 2235.0
NZ_GG739926_647533195 NZ_ACIZ01000148_643886127 76.22 572 136 0 414 985 414 985 1.7e-158 556.0
NZ_GG739926_647533195 NZ_ACIZ01000148_643886127 76.80 181 42 0 1023 1203 1022 1202 6.4e-53 205.0
NZ_GG739926_647533195 NZ_ACIZ01000148_643886127 96.00 50 2 0 1209 1258 1207 1256 6.4e-14 75.0
NZ_GG739926_647533195 NZ_ACIZ01000148_643886127 88.68 53 6 0 1315 1367 1321 1373 1.6e-09 61.0
NZ_GG739926_647533195 NZ_ACIZ01000148_643886127 77.27 22 5 0 992 1013 990 1011 8.5e+02 22.0
NZ_GG739926_647533195 NZ_GG770509_647533119 79.29 280 58 0 657 936 656 935 9.9e-82 301.0
NZ_GG739926_647533195 NZ_GG770509_647533119 89.09 110 11 1 1141 1250 1138 1246 1.1e-30 131.0
NZ_GG739926_647533195 NZ_GG770509_647533119 86.96 69 9 0 1023 1091 1021 1089 3.2e-20 96.0
NZ_GG739926_647533195 NZ_GG770509_647533119 75.26 97 22 2 356 452 356 450 2.3e-13 73.0
NZ_GG739926_647533195 NZ_GG770509_647533119 90.57 53 5 0 1315 1367 1319 1371 2.5e-10 63.0
NZ_GG739926_647533195 NZ_GG770509_647533119 80.00 30 6 0 956 985 955 984 1.2e-03 41.0
NZ_GG739926_647533195 NZ_GG770509_647533119 81.82 22 4 0 992 1013 989 1010 1.5e+02 24.0
# BLAT 34 [2006/03/10]
# Query: NZ_ACIZ01000148_643886127
# Database: test_db.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_ACIZ01000148_643886127 NZ_ACIZ01000148_643886127 100.00 1373 0 0 1 1373 1 1373 0.0e+00 2165.0
NZ_ACIZ01000148_643886127 NZ_GG770509_647533119 85.49 634 92 0 337 970 336 969 4.5e-234 807.0
NZ_ACIZ01000148_643886127 NZ_GG770509_647533119 86.08 237 33 0 1137 1373 1135 1371 1.2e-77 287.0
NZ_ACIZ01000148_643886127 NZ_GG770509_647533119 83.12 154 26 0 977 1130 976 1129 2.2e-48 190.0
NZ_ACIZ01000148_643886127 NZ_GG739926_647533195 76.22 572 136 0 414 985 414 985 1.7e-158 556.0
NZ_ACIZ01000148_643886127 NZ_GG739926_647533195 76.80 181 42 0 1022 1202 1023 1203 6.4e-53 205.0
NZ_ACIZ01000148_643886127 NZ_GG739926_647533195 96.00 50 2 0 1207 1256 1209 1258 6.4e-14 75.0
NZ_ACIZ01000148_643886127 NZ_GG739926_647533195 88.68 53 6 0 1321 1373 1315 1367 1.6e-09 61.0
NZ_ACIZ01000148_643886127 NZ_GG739926_647533195 77.27 22 5 0 990 1011 992 1013 8.5e+02 22.0
""".splitlines()

assign_reads_prot_exp = """# BLAT 34x13 [2009/02/26]
# Query: NZ_GG770509_647533119_frame_1
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_GG770509_647533119_frame_1 NZ_GG770509_647533119 96.83 441 0 7 1 427 1 441 8.9e-254 872.0
# BLAT 34x13 [2009/02/26]
# Query: NZ_GG770509_647533119_frame_2
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_GG770509_647533119_frame_2 NZ_ACIZ01000148_643886127 85.37 41 6 0 359 399 362 402 8.0e-13 72.0
NZ_GG770509_647533119_frame_2 NZ_ACIZ01000148_643886127 93.75 16 1 0 419 434 421 436 1.3e+00 31.0
NZ_GG770509_647533119_frame_2 NZ_GG739926_647533195 75.86 29 7 0 320 348 326 354 2.9e-04 43.0
# BLAT 34x13 [2009/02/26]
# Query: NZ_GG770509_647533119_frame_3
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_GG770509_647533119_frame_3 NZ_ACIZ01000148_643886127 80.61 98 19 0 210 307 209 306 7.5e-39 158.0
NZ_GG770509_647533119_frame_3 NZ_ACIZ01000148_643886127 66.33 98 33 0 43 140 44 141 8.9e-27 118.0
NZ_GG770509_647533119_frame_3 NZ_ACIZ01000148_643886127 78.95 38 8 0 310 347 308 345 2.3e-08 57.0
NZ_GG770509_647533119_frame_3 NZ_ACIZ01000148_643886127 66.67 30 10 0 178 207 178 207 2.5e-01 33.0
NZ_GG770509_647533119_frame_3 NZ_GG739926_647533195 53.00 100 47 0 131 230 134 233 1.9e-18 90.0
NZ_GG770509_647533119_frame_3 NZ_GG739926_647533195 68.89 45 14 0 238 282 241 285 5.9e-09 59.0
NZ_GG770509_647533119_frame_3 NZ_GG739926_647533195 72.09 43 12 0 63 105 66 108 3.0e-08 56.0
# BLAT 34x13 [2009/02/26]
# Query: NZ_GG739926_647533195_frame_1
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_GG739926_647533195_frame_1 NZ_GG739926_647533195 100.00 437 0 0 1 437 1 437 1.7e-263 904.0
NZ_GG739926_647533195_frame_1 NZ_ACIZ01000148_643886127 69.86 73 22 0 213 285 209 281 1.1e-20 98.0
NZ_GG739926_647533195_frame_1 NZ_ACIZ01000148_643886127 53.33 60 28 0 148 207 145 204 1.3e-06 51.0
NZ_GG739926_647533195_frame_1 NZ_ACIZ01000148_643886127 60.53 38 15 0 66 103 64 101 1.9e-03 41.0
NZ_GG739926_647533195_frame_1 NZ_ACIZ01000148_643886127 76.92 26 6 0 2 27 3 28 9.7e-03 38.0
NZ_GG739926_647533195_frame_1 NZ_ACIZ01000148_643886127 69.57 23 7 0 288 310 285 307 4.8e+00 29.0
NZ_GG739926_647533195_frame_1 NZ_ACIZ01000148_643886127 90.00 10 1 0 134 143 132 141 1.6e+04 18.0
# BLAT 34x13 [2009/02/26]
# Query: NZ_GG739926_647533195_frame_2
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_GG739926_647533195_frame_2 NZ_GG770509_647533119 66.67 42 14 0 270 311 276 317 2.3e-08 57.0
NZ_GG739926_647533195_frame_2 NZ_GG770509_647533119 60.00 45 18 0 185 229 188 232 3.9e-06 49.0
NZ_GG739926_647533195_frame_2 NZ_GG770509_647533119 80.00 20 4 0 247 266 251 270 5.6e-01 32.0
# BLAT 34x13 [2009/02/26]
# Query: NZ_GG739926_647533195_frame_3
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_GG739926_647533195_frame_3 NZ_ACIZ01000148_643886127 94.44 18 1 0 390 407 385 402 4.3e-03 39.0
# BLAT 34x13 [2009/02/26]
# Query: NZ_ACIZ01000148_643886127_frame_1
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_ACIZ01000148_643886127_frame_1 NZ_ACIZ01000148_643886127 100.00 436 0 0 1 436 1 436 2.1e-261 897.0
NZ_ACIZ01000148_643886127_frame_1 NZ_GG739926_647533195 78.57 42 9 0 240 281 244 285 4.0e-10 63.0
NZ_ACIZ01000148_643886127_frame_1 NZ_GG739926_647533195 60.53 38 15 0 64 101 66 103 1.9e-03 41.0
NZ_ACIZ01000148_643886127_frame_1 NZ_GG739926_647533195 76.92 26 6 0 3 28 2 27 9.7e-03 38.0
NZ_ACIZ01000148_643886127_frame_1 NZ_GG739926_647533195 69.57 23 7 0 285 307 288 310 4.8e+00 29.0
# BLAT 34x13 [2009/02/26]
# Query: NZ_ACIZ01000148_643886127_frame_2
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_ACIZ01000148_643886127_frame_2 NZ_GG770509_647533119 79.59 147 26 2 182 324 189 335 2.3e-61 233.0
NZ_ACIZ01000148_643886127_frame_2 NZ_GG770509_647533119 72.73 33 9 0 128 160 137 169 5.0e-04 42.0
NZ_ACIZ01000148_643886127_frame_2 NZ_GG770509_647533119 90.91 22 2 0 70 91 76 97 2.5e-03 40.0
# BLAT 34x13 [2009/02/26]
# Query: NZ_ACIZ01000148_643886127_frame_3
# Database: /home/adro2179/metagenome/test_db_prot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
NZ_ACIZ01000148_643886127_frame_3 NZ_GG770509_647533119 84.21 38 4 1 360 395 367 404 3.0e-08 56.0
NZ_ACIZ01000148_643886127_frame_3 NZ_GG770509_647533119 94.12 17 1 0 413 429 425 441 1.6e+00 31.0
NZ_ACIZ01000148_643886127_frame_3 NZ_GG739926_647533195 78.57 28 5 1 321 347 326 353 1.5e-03 41.0"""
assign_reads_prot_exp = assign_reads_prot_exp.splitlines()

test_db_prot = """>NZ_GG770509_647533119
YLEFDPGSERTLAAGLTHASRASGRRVSNAWERTICYGITQGNLCYRMetWKVGKSARVGLASWWGKGSPRRRSIAGLRGSATLGLRHGPDSYGRQQWGILDNGRKPDPAMetPRERPGCKALSPVKMetTVTGEEAPANFVPAAAVIRRGLALFGFTGRKAHVGGLLSQGNPGAQPRNCLYWKSVWRVEFRVRNSIFGGTPVAKAAHWTNRGAKAWGANRIRYPGSPRRKRMetLAVGASVAQLTHTFRLGSAVARLKLKGIDGGPHKRWSMetWFNSKQRAEPYQPLTSTGAAWLSSARVVRCWVKSRNERNPRPLPAWALGDCRAGGRWGRQVLMetALTGWATHVLQWWSVGSEHASVSSPPSQFGCTLQLECRSWNRSRISMetPRIRSRALYTPPVTPWELVLPEGACAGDHGRVSDWGEVVTRPGNLRLDHLLS
>NZ_GG739926_647533195
WEFDPGSGTLATGLTHASRGTGARVSNAYPTFPRPRDNLPKGRLIPYVQSRSRMGMRPISLLAGQRPTKASIGRGSERKAPHTGTETRSRLLREAAVRNIGQWAEATSQVACRTTAYGLTAFMRGYAGTAIRTGFRASSRGNTEGPGVIRIYWVRERRPPCKRAVKSSGPTAALRRELLGLSAPEAGGIRGVAVKCLDITKNPDCEGSPLWRLTLRLEGAGIEQDIPWSARTMDTRCPALGGQAKALSIPPGEYAGNGETQRNRGPAQAEEHVVFDDTRGTLPGLELRCCMVVVSSCREVSAQVPRAQPLSAVAIGRALCGHCRRKVEEGGDDVKSARPLRPGPHTCYNGRQRAVRAQVRVNPLRSQFGWGLQPDPRSWIRSRISHGAVNTFPGLVHTARQAMKAGGASPCRPRAKPVIGAKSQGSRTGRCGWNTSF
>NZ_ACIZ01000148_643886127
NMEFDPGSGTLAACLIHASRTSGGRVSNTWVTCPVGDNIWKQMLIPHKESRFWMDPRRISLVRRLTKAMIRSRTERLIGHIGTETRPKLLREAAVGNLPQWTQVWSNAAVKKAFGSNSVVGEDDGIQPESHGLRASSRGNTVASVIRIYWASERRRFFKSDVKALGLTEEVHRKLGNLSAEEDSGTPCVAVKCVDIWKNTSGEGGCLVLTLRLESMGSEQDIPWSMPTMNARCWSFSAAANALSIPPGEYDRKVETQRNRGPAQAVEHVVFEATRRTLPGLDIDRWCMVVVSSCREMLGVPQRAQPLLVASMGTLVRLPVTNRRKVGMTSNHHAPYDLGYTRATMDGNELRDREVKLISSILSSDVGCNSPTEVGIASNRGSARRGEYVPGPCTHRPSHHESLHPKPVRSEPSKVGQMIRVKSQGSRRRTCGWITS"""
test_db_prot = test_db_prot.splitlines()

test_db_dna = """>NZ_GG770509_647533119
UACUUGGAGUUUGAUCCUGGCUCAGAACGAACGCUGGCGGCAGGCUUAACACAUGCAAGUCGAGCGAGCGGCAGACGGGUGAGUAACGCGUGGGAACGUACCAUUUGCUACGGAAUAACUCAGGGAAACUUGUGCUAAUACCGUAUGUGGAAAGUCGGCAAAUGAUCGGCCCGCGUUGGAUUAGCUAGUUGGUGGGGUAAAGGCUCACCAAGGCGACGAUCCAUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGCAAGCCUGAUCCAGCCAUGCCGCGUGAGUGAUGAAGGCCCUAGGGUUGUAAAGCUCUUUCACCGGUGAAGAUGACGGUAACCGGAGAAGAAGCCCCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAAGGGGGCUAGCGUUGUUCGGAUUUACUGGGCGUAAAGCGCACGUAGGCGGACUUUUAAGUCAGGGGUGAAAUCCCGGGGCUCAACCCCGGAACUGCCUUUGAUACUGGAAGUCUUGAGUAUGGUAGAGGUGAGUGGAAUUCCGAGUGUAGAGGUGAAAUUCGUAGAUAUUCGGAGGAACACCAGUGGCGAAGGCGGCUCACUGGACCAACUGACGCUGAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAAUGUUAGCCGUCGGGGCUUCGGUGGCGCAGCUAACGCAUUAAACAUUCCGCCUGGGGAGUGCGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGCAGAACCUUACCAGCCCUUGACAUCGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGCCCUUAGUUGCCAGCAUGGGCACUCUAAGGGGACUGCCGGUGAUAAGCCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUACGGGCUGGGCUACACACGUGCUACAAUGGUGGUCAGUGGGCAGCGAGCACGCGAGUGUGAGCUAAUCUCCGCCAUCUCAGUUCGGAUGCACUCUGCAACUCGAGUGCAGAAGUUGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUUGGUUUUACCCGAAGGCGCUUGCUAGGCAGGCGACCACGGUAGGGUCAGCGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCGGCUGGAUCACCUCCUUUCU >NZ_GG739926_647533195 
UAAUGGGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCUACAGGCUUAACACAUGCAAGUCGAGGGACCGGCGCACGGGUGAGUAACGCGUAUCCAACCUUCCCGCGACCAAGGGAUAACCUGCCGAAAGGCAGACUAAUACCUUAUGUCCAAAGUCGGUCACGGAUGGGGAUGCGUCCGAUUAGCUUGUUGGCGGGGCAACGGCCCACCAAGGCAUCGAUCGGUAGGGGUUCUGAGAGGAAGGCCCCCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGAGGAAUAUUGGUCAAUGGGCGGAAGCCUGAACCAGCCAAGUAGCGUGCAGGACGACGGCCUACGGGUUGUAAACUGCUUUUAUGCGGGGAUAUGCAGGUACCGCAUGAAUAAGGACCGGCUAAUUCCGUGCCAGCAGCCGCGGUAAUACGGAAGGUCCGGGCGUUAUCCGGAUUUAUUGGGUUUAAAGGGAGCGCAGGCCGCCGUGCAAGCGUGCCGUGAAAAGCAGCGGCCCAACCGCUGCCCUGCGGCGCGAACUGCUUGGCUUGAGUGCGCCGGAAGCGGGCGGAAUUCGUGGUGUAGCGGUGAAAUGCUUAGAUAUCACGAAGAACCCCGAUUGCGAAGGCAGCCCGCUGUGGCGACUGACGCUGAGGCUCGAAGGUGCGGGUAUCGAACAGGAUUAGAUACCCUGGUAGUCCGCACGGUAAACGAUGGAUACCCGCUGUCCGGCUCUGGGCGGCCAAGCGAAAGCGUUAAGUAUCCCACCUGGGGAGUACGCCGGCAACGGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGAGGAACAUGUGGUUUAAUUCGAUGAUACGCGAGGAACCUUACCCGGGCUUGAAUUGUGAAGGUGCUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUCGGCUCAAGUGCCAUAACGAGCGCAACCCCUCUCCGCAGUUGCCAUCGGCCGGGCACUCUGCGGACACUGCCGCCGCAAGGUGGAGGAAGGUGGGGAUGACGUCAAAUCAGCACGGCCCUUACGUCCGGGGCCACACACGUGUUACAAUGGCCGGCAGAGGGCUGUCCGCGCGCAAGUGCGGGUGAAUCCCCUCCGGUCCCAGUUCGGAUGGGGUCUGCAACCCGACCCCAGAAGCUGGAUUCGCUAGUAAUCGCGCAUCAGCCAUGGCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCAAGCCAUGAAAGCCGGGGGUGCCUGAAGUCCGUGUCGGCCUAGGGCAAAACCGGUGAUUGGGGCUAAGUCGUAACAAGGUAGCCGUACCGGAAGGUGCGGCUGGAACACCUCCUUUCU >NZ_ACIZ01000148_643886127 
AAUAUGGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAACGAGUGGCGGACGGGUGAGUAACACGUGGGUAACCUGCCCUUAAGUGGGGGAUAACAUUUGGAAACAGAUGCUAAUACCGCAUAAAGAAAGUCGCUUUUGGAUGGACCCGCGGCGUAUUAGCUAGUUGGUGAGGUAACGGCUCACCAAGGCAAUGAUACGUAGCCGAACUGAGAGGUUGAUCGGCCACAUUGGGACUGAGACACGGCCCAAACUCCUACGGGAGGCAGCAGUAGGGAAUCUUCCACAAUGGACGCAAGUCUGAUGGAGCAACGCCGCGUGAGUGAAGAAGGCUUUCGGGUCGUAAAACUCUGUUGUUGGAGAAGAUGACGGUAUCCAACCAGAAAGCCACGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGUGGCAAGCGUUAUCCGGAUUUAUUGGGCGUAAAGCGAGCGCAGGCGGUUUUUUAAGUCUGAUGUGAAAGCCCUCGGCUUAACCGAGGAAGUGCAUCGGAAACUGGGAAACUUGAGUGCAGAAGAGGACAGUGGAACUCCAUGUGUAGCGGUGAAAUGCGUAGAUAUAUGGAAGAACACCAGUGGCGAAGGCGGCUGUCUGGUCUGACUGACGCUGAGGCUCGAAAGCAUGGGUAGCGAACAGGAUUAGAUACCCUGGUAGUCCAUGCCGUAAACGAUGAAUGCUAGGUGUUGGAGCUUCAGUGCCGCAGCUAACGCAUUAAGCAUUCCGCCUGGGGAGUACGACCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAGGUCUUGACAUCGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUGACUAGUUGCCAGCAUGGGCACUCUAGUAAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGACCUGGGCUACACACGUGCUACAAUGGAUGGCAACGAGUUGCGAGACCGCGAGGUCAAGCUAAUCUCUUCCAUUCUCAGUUCGGAUGUAGGCUGCAACUCGCCUACAGAAGUCGGAAUCGCUAGUAAUCGCGGAUCAGCACGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGAGAGUUUGUAACACCCGAAGCCGGUGCGUAGCGAGCCGUCUAAGGUGGGACAAAUGAUUAGGGUGAAGUCGUAACAAGGUAGCCGUAGGAGAACCUGCGGCUGGAUCACCUCCUUUCU""" test_db_dna = test_db_dna.splitlines() test_query = """>NZ_GG770509_647533119 
UACUUGGAGUUUGAUCCUGGCUCAGAACGAACGCUGGCGGCAGGCUUAACACAUGCAAGUCGAGCGAGCGGCAGACGGGUGAGUAACGCGUGGGAACGUACCAUUUGCUACGGAAUAACUCAGGGAAACUUGUGCUAAUACCGUAUGUGGAAAGUCGGCAAAUGAUCGGCCCGCGUUGGAUUAGCUAGUUGGUGGGGUAAAGGCUCACCAAGGCGACGAUCCAUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGCAAGCCUGAUCCAGCCAUGCCGCGUGAGUGAUGAAGGCCCUAGGGUUGUAAAGCUCUUUCACCGGUGAAGAUGACGGUAACCGGAGAAGAAGCCCCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAAGGGGGCUAGCGUUGUUCGGAUUUACUGGGCGUAAAGCGCACGUAGGCGGACUUUUAAGUCAGGGGUGAAAUCCCGGGGCUCAACCCCGGAACUGCCUUUGAUACUGGAAGUCUUGAGUAUGGUAGAGGUGAGUGGAAUUCCGAGUGUAGAGGUGAAAUUCGUAGAUAUUCGGAGGAACACCAGUGGCGAAGGCGGCUCACUGGACCAACUGACGCUGAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAAUGUUAGCCGUCGGGGCUUCGGUGGCGCAGCUAACGCAUUAAACAUUCCGCCUGGGGAGUGCGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGCAGAACCUUACCAGCCCUUGACAUCGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGCCCUUAGUUGCCAGCAUGGGCACUCUAAGGGGACUGCCGGUGAUAAGCCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUACGGGCUGGGCUACACACGUGCUACAAUGGUGGUCAGUGGGCAGCGAGCACGCGAGUGUGAGCUAAUCUCCGCCAUCUCAGUUCGGAUGCACUCUGCAACUCGAGUGCAGAAGUUGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUUGGUUUUACCCGAAGGCGCUUGCUAGGCAGGCGACCACGGUAGGGUCAGCGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCGGCUGGAUCACCUCCUUUCU >NZ_GG739926_647533195 
UAAUGGGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCUACAGGCUUAACACAUGCAAGUCGAGGGACCGGCGCACGGGUGAGUAACGCGUAUCCAACCUUCCCGCGACCAAGGGAUAACCUGCCGAAAGGCAGACUAAUACCUUAUGUCCAAAGUCGGUCACGGAUGGGGAUGCGUCCGAUUAGCUUGUUGGCGGGGCAACGGCCCACCAAGGCAUCGAUCGGUAGGGGUUCUGAGAGGAAGGCCCCCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGAGGAAUAUUGGUCAAUGGGCGGAAGCCUGAACCAGCCAAGUAGCGUGCAGGACGACGGCCUACGGGUUGUAAACUGCUUUUAUGCGGGGAUAUGCAGGUACCGCAUGAAUAAGGACCGGCUAAUUCCGUGCCAGCAGCCGCGGUAAUACGGAAGGUCCGGGCGUUAUCCGGAUUUAUUGGGUUUAAAGGGAGCGCAGGCCGCCGUGCAAGCGUGCCGUGAAAAGCAGCGGCCCAACCGCUGCCCUGCGGCGCGAACUGCUUGGCUUGAGUGCGCCGGAAGCGGGCGGAAUUCGUGGUGUAGCGGUGAAAUGCUUAGAUAUCACGAAGAACCCCGAUUGCGAAGGCAGCCCGCUGUGGCGACUGACGCUGAGGCUCGAAGGUGCGGGUAUCGAACAGGAUUAGAUACCCUGGUAGUCCGCACGGUAAACGAUGGAUACCCGCUGUCCGGCUCUGGGCGGCCAAGCGAAAGCGUUAAGUAUCCCACCUGGGGAGUACGCCGGCAACGGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGAGGAACAUGUGGUUUAAUUCGAUGAUACGCGAGGAACCUUACCCGGGCUUGAAUUGUGAAGGUGCUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUCGGCUCAAGUGCCAUAACGAGCGCAACCCCUCUCCGCAGUUGCCAUCGGCCGGGCACUCUGCGGACACUGCCGCCGCAAGGUGGAGGAAGGUGGGGAUGACGUCAAAUCAGCACGGCCCUUACGUCCGGGGCCACACACGUGUUACAAUGGCCGGCAGAGGGCUGUCCGCGCGCAAGUGCGGGUGAAUCCCCUCCGGUCCCAGUUCGGAUGGGGUCUGCAACCCGACCCCAGAAGCUGGAUUCGCUAGUAAUCGCGCAUCAGCCAUGGCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCAAGCCAUGAAAGCCGGGGGUGCCUGAAGUCCGUGUCGGCCUAGGGCAAAACCGGUGAUUGGGGCUAAGUCGUAACAAGGUAGCCGUACCGGAAGGUGCGGCUGGAACACCUCCUUUCU >NZ_ACIZ01000148_643886127 
AAUAUGGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAACGAGUGGCGGACGGGUGAGUAACACGUGGGUAACCUGCCCUUAAGUGGGGGAUAACAUUUGGAAACAGAUGCUAAUACCGCAUAAAGAAAGUCGCUUUUGGAUGGACCCGCGGCGUAUUAGCUAGUUGGUGAGGUAACGGCUCACCAAGGCAAUGAUACGUAGCCGAACUGAGAGGUUGAUCGGCCACAUUGGGACUGAGACACGGCCCAAACUCCUACGGGAGGCAGCAGUAGGGAAUCUUCCACAAUGGACGCAAGUCUGAUGGAGCAACGCCGCGUGAGUGAAGAAGGCUUUCGGGUCGUAAAACUCUGUUGUUGGAGAAGAUGACGGUAUCCAACCAGAAAGCCACGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGUGGCAAGCGUUAUCCGGAUUUAUUGGGCGUAAAGCGAGCGCAGGCGGUUUUUUAAGUCUGAUGUGAAAGCCCUCGGCUUAACCGAGGAAGUGCAUCGGAAACUGGGAAACUUGAGUGCAGAAGAGGACAGUGGAACUCCAUGUGUAGCGGUGAAAUGCGUAGAUAUAUGGAAGAACACCAGUGGCGAAGGCGGCUGUCUGGUCUGACUGACGCUGAGGCUCGAAAGCAUGGGUAGCGAACAGGAUUAGAUACCCUGGUAGUCCAUGCCGUAAACGAUGAAUGCUAGGUGUUGGAGCUUCAGUGCCGCAGCUAACGCAUUAAGCAUUCCGCCUGGGGAGUACGACCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAGGUCUUGACAUCGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUGACUAGUUGCCAGCAUGGGCACUCUAGUAAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGACCUGGGCUACACACGUGCUACAAUGGAUGGCAACGAGUUGCGAGACCGCGAGGUCAAGCUAAUCUCUUCCAUUCUCAGUUCGGAUGUAGGCUGCAACUCGCCUACAGAAGUCGGAAUCGCUAGUAAUCGCGGAUCAGCACGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGAGAGUUUGUAACACCCGAAGCCGGUGCGUAGCGAGCCGUCUAAGGUGGGACAAAUGAUUAGGGUGAAGUCGUAACAAGGUAGCCGUAGGAGAACCUGCGGCUGGAUCACCUCCUUUCU""" test_query = test_query.splitlines() if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_bwa.py000644 000765 000024 00000042272 12024702176 021413 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from os.path import join, exists from os import remove, getcwd from cogent.app.bwa import BWA, BWA_index, BWA_aln, BWA_samse, \ BWA_sampe, BWA_bwasw, create_bwa_index_from_fasta_file, \ assign_reads_to_database from cogent.app.util import get_tmp_filename, ApplicationError __author__ = "Adam Robbins-Pianka" __copyright__ = "Copyright 2007-2012, The 
Cogent Project" __credits__ = ["Adam Robbins-Pianka", "Daniel McDonald", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Adam Robbins-Pianka" __email__ = "adam.robbinspianka@colorado.edu" __status__ = "Production" class BWAtests(TestCase): """Tests for the BWA app controller """ # keeps track of which files are created during the tests so that they # can be removed during tearDown files_to_remove = [] def setUp(self): """Performs setup for the tests. Nothing to set up for these tests. """ pass def tearDown(self): """Properly and politely terminates the test. Removes files created during the tests. """ for f in self.files_to_remove: if exists(f): remove(f) def test_check_arguments(self): """Tests the "check_arguments" method of the BWA base class. Arguments passed to certain parameters of the various subcommands can take only certain values. The check_arguments function enforces these constraints. This function ensures that the rules are being enforced as expected. 
""" # set up test parameters # should pass index_params_is = {'-a': 'is'} # should pass index_params_bwtsw = {'-a': 'bwtsw'} # should fail, -a must be one of "is" or "bwtsw" index_params_invalid = {'-a': 'invalid'} # should fail, -p must specify a prefix that is an absolute path index_params_invalid_prefix = {'-p': 'invalid'} # should pass index_params_valid_prefix = {'-p': '/prefix'} # instantiate objects built from the above parameters index_is = BWA_index(params=index_params_is, HALT_EXEC=True) index_bwtsw = BWA_index(params=index_params_bwtsw, HALT_EXEC=True) index_invalid = BWA_index(params=index_params_invalid, HALT_EXEC=True) index_invalid_prefix = BWA_index(params=index_params_invalid_prefix, \ HALT_EXEC=True) index_valid_prefix = BWA_index(params=index_params_valid_prefix, \ HALT_EXEC=True) # Should not be allowed self.assertRaisesRegexp(ApplicationError, "Invalid argument", index_invalid.check_arguments) self.assertRaisesRegexp(ApplicationError, "Invalid argument", index_invalid_prefix.check_arguments) # Should execute and not raise any exceptions index_is.check_arguments() index_bwtsw.check_arguments() index_valid_prefix.check_arguments() # The rest of the _valid_arguments are for checking is_int and is_float # and they all use the same function from the base-class, so testing # just one of the subcommands should suffice # -n must be a float (expressed either as a float or as a string) # -o must be an int (expressed either as an int or as a string) # pass, both valid aln_params_valid = {'-n': 3.0, '-o': 5, '-f':'/sai_out'} # fail, second invalid aln_params_invalid1 = {'-n': 3.0, '-o': 'nope', '-f':'/sai_out'} # fail, first invalid aln_params_invalid2 = {'-n': '3.5.1', '-o': 4, '-f':'/sai_out'} # fail, did not specify -f aln_params_invalid3 = {'-n': 3.0, '-o': 5} # instantiate objects aln_valid = BWA_aln(params=aln_params_valid, HALT_EXEC=True) aln_invalid1 = BWA_aln(params=aln_params_invalid1, HALT_EXEC=True) aln_invalid2 = 
BWA_aln(params=aln_params_invalid2, HALT_EXEC=True) aln_invalid3 = BWA_aln(params=aln_params_invalid3, HALT_EXEC=True) test_paths = {'prefix': '/fa_in', 'fastq_in': '/fq_in'} # Should Halt Exec (AssertionError) right before execution self.assertRaisesRegexp(AssertionError, 'Halted exec', aln_valid, test_paths) # also need to make sure the base command is correct self.assertIn('; bwa aln -f /sai_out -n 3.0 -o 5 /fa_in /fq_in', aln_valid.BaseCommand) # Should fail self.assertRaisesRegexp(ApplicationError, "Invalid argument", aln_invalid1, test_paths) self.assertRaisesRegexp(ApplicationError, "Invalid argument", aln_invalid2, test_paths) self.assertRaisesRegexp(ApplicationError, "Please specify an output file", aln_invalid3, test_paths) def test_input_as_dict(self): """Tests the input handler (_input_as_dict) The input handler should throw exceptions if there are not enough arguments, or if there are unrecognized arguments, or if a file path appears to be a relative filepath. """ # Arguments for BWA_bwasw, which was chosen since it is the only one # that also has an optional argument (optional arguments are denoted # by a leading underscore) missing = {'prefix':'/fa_in', '_query_fasta_2': '/mate'} extra = {'prefix':'/fa_in', 'query_fasta':'/query_fasta', 'extra':'/param'} rel_fp = {'prefix':'fa_in', 'query_fasta':'/query_fasta'} valid = {'prefix':'/fa_in', 'query_fasta':'/query_fasta'} valid_with_mate = {'prefix':'/fa_in', 'query_fasta':'/query_fasta', '_query_fasta_2':'/mate'} # instantiate the object bwasw = BWA_bwasw(params={'-f':'/sam_out'}, HALT_EXEC=True) # should raise ApplicationError for wrong I/O files; failure self.assertRaisesRegexp(ApplicationError, "Missing required input", bwasw, missing) self.assertRaisesRegexp(ApplicationError, "Invalid input arguments", bwasw, extra) self.assertRaisesRegexp(ApplicationError, "Only absolute paths", bwasw, rel_fp) # should raise AssertionError (Halt Exec); success # tests valid arguments with and without the optional # 
_query_fasta_2 argument self.assertRaisesRegexp(AssertionError, 'Halted exec', bwasw, valid) self.assertRaisesRegexp(AssertionError, 'Halted exec', bwasw, valid_with_mate) def test_get_base_command(self): """Tests the function that generates the command string. Tests whether an object can be instantiated and then called using one set of files, and then another set of files. Since the structure of the various subclasses is consistent, testing that the correct command is generated by one of the subclasses should suffice here. """ # instantiate one instance aln = BWA_aln(params = {'-n': 1.0, '-f':'/sai_out'}, HALT_EXEC=True) # set up two different sets of files first_files = {'prefix':'/fa_in1', 'fastq_in':'/fq_in1'} second_files = {'prefix':'/fa_in2', 'fastq_in':'/fq_in2'} # make sure both sets run, and that the command appears to be correct self.assertRaisesRegexp(AssertionError, 'Halted exec', aln, first_files) self.assertIn('; bwa aln -f /sai_out -n 1.0 /fa_in1 /fq_in1', aln.BaseCommand) self.assertRaisesRegexp(AssertionError, 'Halted exec', aln, second_files) self.assertIn('; bwa aln -f /sai_out -n 1.0 /fa_in2 /fq_in2', aln.BaseCommand) # instantiate another object, to test that there is no cross-talk # between instances with the same base class aln2 = BWA_aln(params = {'-n': 2.5, '-o': 7, '-f':'/sai_out'}, HALT_EXEC=True) self.assertRaisesRegexp(AssertionError, 'Halted exec', aln2, first_files) self.assertIn('; bwa aln -f /sai_out -n 2.5 -o 7 /fa_in1 /fq_in1', aln2.BaseCommand) def test_get_result_paths(self): """Tests the function that retrieves the result paths. aln, sampe, samse, bwasw return only one file. 
BWA_index returns 5 files, and the name depends on whether or not the -p option is on or not """ # instantiate objects index = BWA_index(params = {}, HALT_EXEC=True) index2 = BWA_index(params = {'-p':'/prefix'}, HALT_EXEC=True) aln = BWA_aln(params = {'-f':'/sai_out'}, HALT_EXEC=True) samse = BWA_samse(params = {'-f':'/sam_out'}, HALT_EXEC=True) sampe = BWA_sampe(params = {'-f':'/sam_out'}, HALT_EXEC=True) bwasw = BWA_bwasw(params = {'-f':'/sam_out'}, HALT_EXEC=True) # pass in the data, and make sure the output paths are as expected. # -p is off here index_data = {'fasta_in':'/fa_in'} results = index._get_result_paths(index_data) self.assertEqual(results['.amb'].Path, '/fa_in.amb') self.assertEqual(results['.ann'].Path, '/fa_in.ann') self.assertEqual(results['.bwt'].Path, '/fa_in.bwt') self.assertEqual(results['.pac'].Path, '/fa_in.pac') self.assertEqual(results['.sa'].Path, '/fa_in.sa') # pass in the data, and make sure the output paths are as expected. # -p is on here results = index2._get_result_paths(index_data) self.assertEqual(results['.amb'].Path, '/prefix.amb') self.assertEqual(results['.ann'].Path, '/prefix.ann') self.assertEqual(results['.bwt'].Path, '/prefix.bwt') self.assertEqual(results['.pac'].Path, '/prefix.pac') self.assertEqual(results['.sa'].Path, '/prefix.sa') # pass in the data, and make sure the output path is as expected aln_data = {'prefix':'/fa_in', 'fastq_in':'/fq_in'} results = aln._get_result_paths(aln_data) self.assertEqual(results['output'].Path, '/sai_out') samse_data = {'prefix':'/fa_in', 'sai_in':'/sai_in', 'fastq_in':'/fq_in'} results = samse._get_result_paths(samse_data) self.assertEqual(results['output'].Path, '/sam_out') sampe_data = {'prefix':'/fa_in', 'sai1_in':'/sai1_in', 'sai2_in':'/sai2_in', 'fastq1_in':'/fq1_in', 'fastq2_in':'/fq2_in'} results = sampe._get_result_paths(sampe_data) self.assertEqual(results['output'].Path, '/sam_out') def test_create_bwa_index_from_fasta_file(self): """Test create_bwa_index_from_fasta_file 
Makes sure that the file paths are as expected. """ # get a new temp file for the input fasta fasta_in = get_tmp_filename(suffix=".fna") # write the test fasta (see end of this file) to the temp file fasta = open(fasta_in, 'w') fasta.write(test_fasta) fasta.close() # make sure to remove this fasta file upon tearDown self.files_to_remove.append(fasta_in) # run the function results = create_bwa_index_from_fasta_file(fasta_in, {}) # for each of the 5 output files (not counting stdout, stderr, and # the exitStatus), make sure the file paths are as expected. for filetype, result in results.iteritems(): if filetype not in ('ExitStatus',): # be sure to remove these 5 files self.files_to_remove.append(result.name) if filetype not in ('StdOut', 'ExitStatus', 'StdErr'): self.assertEqual(fasta_in + filetype, result.name) def test_assign_reads_to_database(self): """Tests for proper failure in assign_reads_to_database """ # sets of params that should cause failure no_alg = {} wrong_alg = {'algorithm': 'not_an_algorithm'} no_aln_params = {'algorithm': 'bwa-short'} # dummy files -- checking for failure as expected, so the function # won't get as far as actually running the program database = '/db' query = '/query' out = '/sam' self.assertRaisesRegexp(ApplicationError, "Must specify which algorithm", assign_reads_to_database, query, database, out, no_alg) self.assertRaisesRegexp(ApplicationError, "Unknown algorithm", assign_reads_to_database, query, database, out, wrong_alg) self.assertRaisesRegexp(ApplicationError, "aln is an intermediate step", assign_reads_to_database, query, database, out, no_aln_params) test_fasta = '''>NZ_GG770509_647533119 
UACUUGGAGUUUGAUCCUGGCUCAGAACGAACGCUGGCGGCAGGCUUAACACAUGCAAGUCGAGCGAGCGGCAGACGGGUGAGUAACGCGUGGGAACGUACCAUUUGCUACGGAAUAACUCAGGGAAACUUGUGCUAAUACCGUAUGUGGAAAGUCGGCAAAUGAUCGGCCCGCGUUGGAUUAGCUAGUUGGUGGGGUAAAGGCUCACCAAGGCGACGAUCCAUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGCAAGCCUGAUCCAGCCAUGCCGCGUGAGUGAUGAAGGCCCUAGGGUUGUAAAGCUCUUUCACCGGUGAAGAUGACGGUAACCGGAGAAGAAGCCCCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAAGGGGGCUAGCGUUGUUCGGAUUUACUGGGCGUAAAGCGCACGUAGGCGGACUUUUAAGUCAGGGGUGAAAUCCCGGGGCUCAACCCCGGAACUGCCUUUGAUACUGGAAGUCUUGAGUAUGGUAGAGGUGAGUGGAAUUCCGAGUGUAGAGGUGAAAUUCGUAGAUAUUCGGAGGAACACCAGUGGCGAAGGCGGCUCACUGGACCAACUGACGCUGAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAAUGUUAGCCGUCGGGGCUUCGGUGGCGCAGCUAACGCAUUAAACAUUCCGCCUGGGGAGUGCGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGCAGAACCUUACCAGCCCUUGACAUCGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGCCCUUAGUUGCCAGCAUGGGCACUCUAAGGGGACUGCCGGUGAUAAGCCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUACGGGCUGGGCUACACACGUGCUACAAUGGUGGUCAGUGGGCAGCGAGCACGCGAGUGUGAGCUAAUCUCCGCCAUCUCAGUUCGGAUGCACUCUGCAACUCGAGUGCAGAAGUUGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUUGGUUUUACCCGAAGGCGCUUGCUAGGCAGGCGACCACGGUAGGGUCAGCGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCGGCUGGAUCACCUCCUUUCU >NZ_GG739926_647533195 
UAAUGGGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCUACAGGCUUAACACAUGCAAGUCGAGGGACCGGCGCACGGGUGAGUAACGCGUAUCCAACCUUCCCGCGACCAAGGGAUAACCUGCCGAAAGGCAGACUAAUACCUUAUGUCCAAAGUCGGUCACGGAUGGGGAUGCGUCCGAUUAGCUUGUUGGCGGGGCAACGGCCCACCAAGGCAUCGAUCGGUAGGGGUUCUGAGAGGAAGGCCCCCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGAGGAAUAUUGGUCAAUGGGCGGAAGCCUGAACCAGCCAAGUAGCGUGCAGGACGACGGCCUACGGGUUGUAAACUGCUUUUAUGCGGGGAUAUGCAGGUACCGCAUGAAUAAGGACCGGCUAAUUCCGUGCCAGCAGCCGCGGUAAUACGGAAGGUCCGGGCGUUAUCCGGAUUUAUUGGGUUUAAAGGGAGCGCAGGCCGCCGUGCAAGCGUGCCGUGAAAAGCAGCGGCCCAACCGCUGCCCUGCGGCGCGAACUGCUUGGCUUGAGUGCGCCGGAAGCGGGCGGAAUUCGUGGUGUAGCGGUGAAAUGCUUAGAUAUCACGAAGAACCCCGAUUGCGAAGGCAGCCCGCUGUGGCGACUGACGCUGAGGCUCGAAGGUGCGGGUAUCGAACAGGAUUAGAUACCCUGGUAGUCCGCACGGUAAACGAUGGAUACCCGCUGUCCGGCUCUGGGCGGCCAAGCGAAAGCGUUAAGUAUCCCACCUGGGGAGUACGCCGGCAACGGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGAGGAACAUGUGGUUUAAUUCGAUGAUACGCGAGGAACCUUACCCGGGCUUGAAUUGUGAAGGUGCUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUCGGCUCAAGUGCCAUAACGAGCGCAACCCCUCUCCGCAGUUGCCAUCGGCCGGGCACUCUGCGGACACUGCCGCCGCAAGGUGGAGGAAGGUGGGGAUGACGUCAAAUCAGCACGGCCCUUACGUCCGGGGCCACACACGUGUUACAAUGGCCGGCAGAGGGCUGUCCGCGCGCAAGUGCGGGUGAAUCCCCUCCGGUCCCAGUUCGGAUGGGGUCUGCAACCCGACCCCAGAAGCUGGAUUCGCUAGUAAUCGCGCAUCAGCCAUGGCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCAAGCCAUGAAAGCCGGGGGUGCCUGAAGUCCGUGUCGGCCUAGGGCAAAACCGGUGAUUGGGGCUAAGUCGUAACAAGGUAGCCGUACCGGAAGGUGCGGCUGGAACACCUCCUUUCU >NZ_ACIZ01000148_643886127 
AAUAUGGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAACGAGUGGCGGACGGGUGAGUAACACGUGGGUAACCUGCCCUUAAGUGGGGGAUAACAUUUGGAAACAGAUGCUAAUACCGCAUAAAGAAAGUCGCUUUUGGAUGGACCCGCGGCGUAUUAGCUAGUUGGUGAGGUAACGGCUCACCAAGGCAAUGAUACGUAGCCGAACUGAGAGGUUGAUCGGCCACAUUGGGACUGAGACACGGCCCAAACUCCUACGGGAGGCAGCAGUAGGGAAUCUUCCACAAUGGACGCAAGUCUGAUGGAGCAACGCCGCGUGAGUGAAGAAGGCUUUCGGGUCGUAAAACUCUGUUGUUGGAGAAGAUGACGGUAUCCAACCAGAAAGCCACGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGUGGCAAGCGUUAUCCGGAUUUAUUGGGCGUAAAGCGAGCGCAGGCGGUUUUUUAAGUCUGAUGUGAAAGCCCUCGGCUUAACCGAGGAAGUGCAUCGGAAACUGGGAAACUUGAGUGCAGAAGAGGACAGUGGAACUCCAUGUGUAGCGGUGAAAUGCGUAGAUAUAUGGAAGAACACCAGUGGCGAAGGCGGCUGUCUGGUCUGACUGACGCUGAGGCUCGAAAGCAUGGGUAGCGAACAGGAUUAGAUACCCUGGUAGUCCAUGCCGUAAACGAUGAAUGCUAGGUGUUGGAGCUUCAGUGCCGCAGCUAACGCAUUAAGCAUUCCGCCUGGGGAGUACGACCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAGGUCUUGACAUCGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUGACUAGUUGCCAGCAUGGGCACUCUAGUAAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGACCUGGGCUACACACGUGCUACAAUGGAUGGCAACGAGUUGCGAGACCGCGAGGUCAAGCUAAUCUCUUCCAUUCUCAGUUCGGAUGUAGGCUGCAACUCGCCUACAGAAGUCGGAAUCGCUAGUAAUCGCGGAUCAGCACGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGAGAGUUUGUAACACCCGAAGCCGGUGCGUAGCGAGCCGUCUAAGGUGGGACAAAUGAUUAGGGUGAAGUCGUAACAAGGUAGCCGUAGGAGAACCUGCGGCUGGAUCACCUCCUUUCU''' if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_app/test_carnac.py000644 000765 000024 00000006563 12024702176 022074 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.app.carnac import Carnac __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class CarnacTest(TestCase): """Tests for Carnac application 
controller""" def setUp(self): self.input = carnac_input def test_stdout_input_as_lines(self): """Test carnac stdout input as lines If error check computation time in carnac_stdout!! Usually 00:00:00 but on slower systems may be different""" c = Carnac(InputHandler='_input_as_lines') exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in carnac_stdout]) res = c(self.input) obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() def test_stdout_input_as_string(self): """Test carnac stdout input as string If error check computation time in carnac_stdout!! Usually 00:00:00 but on slower systems may be different""" c = Carnac() exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in carnac_stdout]) f = open('/tmp/input.fasta','w') txt = '\n'.join([str(i).strip('\n') for i in self.input]) f.write(txt) f.close() res = c('/tmp/input.fasta') obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() remove('/tmp/input.fasta') def test_get_result_path(self): """Tests carnac result path""" c = Carnac(InputHandler='_input_as_lines') res = c(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus',\ 'ct1','eq1','ct2','eq2','out_seq1','out_seq2','graph','align']) self.assertEqual(res['ExitStatus'],0) assert res['ct1'] is not None assert res['eq1'] is not None assert res['out_seq1'] is not None res.cleanUp() carnac_input = ['>seq1\n', 'GGCCACGTAGCTCAGTCGGTAGAGCAAAGGACTGAAAATCCTTGTGTCGTTGGTTCAATTCCAACCGTGGCCACCA\n', '>seq2\n', 'GCCAGATAGCTCAGTCGGTAGAGCGTTCGCCTGAAAAGTGAAAGGTCGCCGGTTCGATCCCGGCTCTGGCCACCA\n'] carnac_stdout = [' Sequences \n', '\n', ' sequence 1 (length 76, gc 52): seq1\n', ' sequence 2 (length 75, gc 60): seq2\n', '\n', ' Finding all potential stems \n', '\n', ' sequence 1 : 12 potential stems\n', ' sequence 2 : 18 potential stems\n', '\n', ' Pairwise foldings \n', '\n', ' seq 1 / seq 2: 3 vs 3 stems\n', '\n', ' Combination of pairwise foldings: 6 classes of stems in 3 connex components\n', '\n', ' sequence 1: 1 cofoldings with 12 
stems -> 3 selected stems + 2 remaining stems \n', ' sequence 2: 1 cofoldings with 18 stems -> 3 selected stems + 1 remaining stems \n', '\n', ' Overall computation time : 00:00:00\n', '\n', ' Parameter values :\n', '\n', ' AP_THD 8\n', ' SUB -5\n', ' SIZE_HAIRPIN 3\n', ' CORRECT_THD 1\n', ' INI_THD -500\n', ' DIST_1 50\n', ' DIST_2 300\n', ' Energy tresholds : -800 - -1300\n', ' Allowing single hairpins : no\n', ' SIZE_MAX_HP 8\n', ' THD_HP -1500\n', ' FLT 1\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_cd_hit.py000644 000765 000024 00000020513 12024702176 022066 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import getcwd, rmdir from cogent.core.moltype import PROTEIN, DNA from cogent.util.unit_test import TestCase, main from cogent.app.cd_hit import CD_HIT, CD_HIT_EST, cdhit_from_seqs, \ cdhit_clusters_from_seqs, clean_cluster_seq_id, parse_cdhit_clstr_file __author__ = "Daniel McDonald" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "mcdonadt@colorado.edu" __status__ = "Development" class CD_HIT_Tests(TestCase): """Tests for the CD-HIT application controller""" def test_base_command(self): """CD_HIT BaseCommand should return the correct BaseCommand""" c = CD_HIT() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cd-hit'])) c.Parameters['-i'].on('seq.txt') self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cd-hit -i "seq.txt"'])) c.Parameters['-c'].on(0.8) self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cd-hit -c 0.8' + ' -i "seq.txt"'])) def test_changing_working_dir(self): """CD_HIT BaseCommand should change according to WorkingDir""" c = CD_HIT(WorkingDir='/tmp/cdhit_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cdhit_test','/"; ','cd-hit'])) c = CD_HIT() c.WorkingDir = '/tmp/cdhit_test2' 
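The cd-hit tests further down exercise `parse_cdhit_clstr_file`, which turns cd-hit's `.clstr` output into lists of clustered sequence ids: each `>Cluster N` header opens a group, and each member line (e.g. `0	2214aa, >seq1... at 80%`) contributes the id between the `>` and the trailing `...`. A minimal standalone sketch of parsing that format (this is an illustration, not PyCogent's actual `cogent.app.cd_hit` implementation; `parse_clstr` is a hypothetical helper) could look like:

```python
def parse_clstr(lines):
    """Group cd-hit .clstr member lines into clusters of sequence ids.

    Member lines look like "0\t2214aa, >seq1... at 80%"; the id sits
    between the ">" and the trailing "...".
    """
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>Cluster'):
            # a cluster header opens a new, empty cluster
            clusters.append([])
        else:
            # take the text after ">" and before the "..." marker
            seq_id = line.split('>', 1)[1].split('...', 1)[0]
            clusters[-1].append(seq_id)
    return clusters


sample = """>Cluster 0
0\t2799aa, >seq0... *
>Cluster 1
0\t2214aa, >seq1... at 80%
1\t2217aa, >seq3... *""".splitlines()

print(parse_clstr(sample))  # [['seq0'], ['seq1', 'seq3']]
```

The `cdhit_clstr_file` fixture used by `test_parse_cdhit_clstr_file` below follows exactly this layout, with representative sequences marked by a trailing `*` instead of a percent identity.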
self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cdhit_test2','/"; ','cd-hit'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cdhit_test') rmdir('/tmp/cdhit_test2') def test_cdhit_from_seqs(self): """CD_HIT should return expected seqs""" res = cdhit_from_seqs(protein_seqs, PROTEIN, {'-c':0.8}) self.assertEqual(res.toFasta(), protein_expected) class CD_HIT_EST_Tests(TestCase): """Tests for the CD-HIT application controller""" def test_base_command(self): """CD_HIT_EST BaseCommand should return the correct BaseCommand""" c = CD_HIT_EST() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cd-hit-est'])) c.Parameters['-i'].on('seq.txt') self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cd-hit-est -i "seq.txt"'])) c.Parameters['-c'].on(0.8) self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cd-hit-est -c 0.8' + ' -i "seq.txt"'])) def test_changing_working_dir(self): """CD_HIT_EST BaseCommand should change according to WorkingDir""" c = CD_HIT_EST(WorkingDir='/tmp/cdhitest_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cdhitest_test','/"; ','cd-hit-est'])) c = CD_HIT_EST() c.WorkingDir = '/tmp/cdhitest_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cdhitest_test2','/"; ','cd-hit-est'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cdhitest_test') rmdir('/tmp/cdhitest_test2') def test_cdhit_from_seqs(self): """CD_HIT should return expected seqs""" res = cdhit_from_seqs(dna_seqs, DNA, {'-c':0.8}) self.assertEqual(res.toFasta(), dna_expected) def test_cdhit_from_seqs_synonym(self): """CD_HIT should return expected seqs with -c synonym""" res = cdhit_from_seqs(dna_seqs, DNA, {'Similarity':0.8}) self.assertEqual(res.toFasta(), dna_expected) class CD_HIT_SupportMethodTests(TestCase): """Tests for supporting methods""" def 
test_clean_cluster_seq_id(self): """clean_cluster_seq_id returns a cleaned sequence id""" data = ">foobar..." exp = "foobar" obs = clean_cluster_seq_id(data) self.assertEqual(obs, exp) def test_parse_cdhit_clstr_file(self): """parse_cdhit_clstr_file returns the correct clusters""" data = cdhit_clstr_file.split('\n') exp = [['seq0'],['seq1','seq10','seq3','seq23','seq145'],\ ['seq7','seq17','seq69','seq1231']] obs = parse_cdhit_clstr_file(data) self.assertEqual(obs, exp) def test_cdhit_clusters_from_seqs(self): """cdhit_clusters_from_seqs returns expected clusters""" exp = [['cdhit_test_seqs_0'],['cdhit_test_seqs_1'],\ ['cdhit_test_seqs_2'],['cdhit_test_seqs_3'],\ ['cdhit_test_seqs_4'],['cdhit_test_seqs_5'],\ ['cdhit_test_seqs_6','cdhit_test_seqs_8'],\ ['cdhit_test_seqs_7'],['cdhit_test_seqs_9']] obs = cdhit_clusters_from_seqs(dna_seqs, DNA) self.assertEqual(obs, exp) dna_seqs = """>cdhit_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >cdhit_test_seqs_1 ACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT >cdhit_test_seqs_2 CCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT >cdhit_test_seqs_3 CCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT >cdhit_test_seqs_4 GCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >cdhit_test_seqs_5 CCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA >cdhit_test_seqs_6 CGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT >cdhit_test_seqs_7 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT >cdhit_test_seqs_8 CGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT >cdhit_test_seqs_9 GGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA""" dna_expected = """>cdhit_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >cdhit_test_seqs_1 
ACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT >cdhit_test_seqs_2 CCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT >cdhit_test_seqs_4 GCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >cdhit_test_seqs_5 CCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA >cdhit_test_seqs_7 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT""" protein_seqs = """>seq1 MGNKWSKSWPQVRDRMRRAAPAPAADGVGAVSQDLAKHGAITSSNTAATNDDCAWLEAQTEEEVGFPVRPQVPLRPMTYK >seq2 MGGKWSKSSIVGWSTVRERMRKTPPAADGVGAVSQDLDKHGAVTSSNTAFNNPDCAWLEAQEDEDVGFPVRPQVPLRPT >seq3 MGGKWSKSSIVGWPAIRERMRRARPAADRVGTQPAADGVGAVSQDLARHGAVTSSNTSHNNPDCAWLEAQEEEEVGVR >seq4 MGKIWSKSSIVGWPEIRERMRRQRPHEPAVEPAVGVGAASQDLANRGALTTSNTRTNNPTVAWVEAQEEEGEVVRPQ >seq5 MGKIWSKSSLVGWPEIRERMRRQTQEPAVEPAVGAGAASQDLANRGAITIRNTRDNNESIAWLEAQEEEFPVRPQV >seq6 MGKIWSKSSLVGWPEIRERIRRQTPEPAVGVGAVSQDLANRGAITTSNTKDNNQTVAWLEAQEEPVRPQVPLRPM >seq7 MGNALRKGKFEGWAAVRERMRRTRTFPESEPCAPGVGQISRELAARGGIPSSHTPQNNESHQEEEVGFPVAPQV >seq8 MGNAWSKSKFAGWSEVRDRMRRSSSDPQQPCAPGVGAVSRELATRGGISSSALAFLDSHKDEDVGFPVRPQVP >seq9 MGNVLGKDKFKGWAAVRERMRKTSSDPDPQPCAPGVGPVSRELSYTPQNNAALAFLESHEDEDVGFPVXPQV >seq10 MGNVLGKDKFKGWSAVRERMRKTSPEPEPCAPGVRGGISNSHTPQNNAALAFLESHQDEDVGFPVRPQVPL""" protein_expected = """>seq1 MGNKWSKSWPQVRDRMRRAAPAPAADGVGAVSQDLAKHGAITSSNTAATNDDCAWLEAQTEEEVGFPVRPQVPLRPMTYK >seq2 MGGKWSKSSIVGWSTVRERMRKTPPAADGVGAVSQDLDKHGAVTSSNTAFNNPDCAWLEAQEDEDVGFPVRPQVPLRPT >seq3 MGGKWSKSSIVGWPAIRERMRRARPAADRVGTQPAADGVGAVSQDLARHGAVTSSNTSHNNPDCAWLEAQEEEEVGVR >seq4 MGKIWSKSSIVGWPEIRERMRRQRPHEPAVEPAVGVGAASQDLANRGALTTSNTRTNNPTVAWVEAQEEEGEVVRPQ >seq5 MGKIWSKSSLVGWPEIRERMRRQTQEPAVEPAVGAGAASQDLANRGAITIRNTRDNNESIAWLEAQEEEFPVRPQV >seq7 MGNALRKGKFEGWAAVRERMRRTRTFPESEPCAPGVGQISRELAARGGIPSSHTPQNNESHQEEEVGFPVAPQV >seq8 MGNAWSKSKFAGWSEVRDRMRRSSSDPQQPCAPGVGAVSRELATRGGISSSALAFLDSHKDEDVGFPVRPQVP >seq9 
MGNVLGKDKFKGWAAVRERMRKTSSDPDPQPCAPGVGPVSRELSYTPQNNAALAFLESHEDEDVGFPVXPQV""" cdhit_clstr_file = """>Cluster 0 0 2799aa, >seq0... * >Cluster 1 0 2214aa, >seq1... at 80% 1 2215aa, >seq10... at 84% 2 2217aa, >seq3... * 3 2216aa, >seq23... at 84% 4 527aa, >seq145... at 63% >Cluster 2 0 2202aa, >seq7... at 60% 1 2208aa, >seq17... * 2 2207aa, >seq69... at 73% 3 2208aa, >seq1231... at 69%""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_clearcut.py000644 000765 000024 00000024203 12024702176 022436 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import getcwd, remove, rmdir, mkdir, path import tempfile, shutil from cogent.core.moltype import DNA, RNA, PROTEIN from cogent.core.alignment import DataError from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.clearcut import Clearcut, build_tree_from_alignment,\ _matrix_input_from_dict2d, build_tree_from_distance_matrix from cogent.util.dict2d import Dict2D __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" class GeneralSetUp(TestCase): def setUp(self): """Clearcut general setUp method for all tests""" self.seqs1 = ['ACUGCUAGCUAGUAGCGUACGUA','GCUACGUAGCUAC', 'GCGGCUAUUAGAUCGUA'] self.labels1 = ['>1','>2','>3'] self.lines1 = flatten(zip(self.labels1,self.seqs1)) self.seqs2=['UAGGCUCUGAUAUAAUAGCUCUC','UAUCGCUUCGACGAUUCUCUGAUAGAGA', 'UGACUACGCAU'] self.labels2=['>a','>b','>c'] self.lines2 = flatten(zip(self.labels2,self.seqs2)) self.temp_dir = tempfile.mkdtemp() #self.temp_dir_spaces = '/tmp/test for clearcut/' #try: # mkdir(self.temp_dir_spaces) #except OSError: # pass try: #create sequence files f = open(path.join(self.temp_dir, 'seq1.txt'),'w') f.write('\n'.join(self.lines1)) f.close() g = open(path.join(self.temp_dir, 
'seq2.txt'),'w') g.write('\n'.join(self.lines2)) g.close() except OSError: pass class ClearcutTests(GeneralSetUp): """Tests for the Clearcut application controller""" def test_base_command(self): """Clearcut BaseCommand should return the correct BaseCommand""" c = Clearcut() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','clearcut -d -q'])) c.Parameters['--in'].on('seq.txt') self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','clearcut -d --in="seq.txt" -q'])) def test_changing_working_dir(self): """Clearcut BaseCommand should change according to WorkingDir""" c = Clearcut(WorkingDir='/tmp/clearcut_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/clearcut_test','/"; ','clearcut -d -q'])) c = Clearcut() c.WorkingDir = '/tmp/clearcut_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/clearcut_test2','/"; ','clearcut -d -q'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/clearcut_test') rmdir('/tmp/clearcut_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) #shutil.rmtree(self.temp_dir_spaces) def test_build_tree_from_alignment(self): """Clearcut should return a tree built from the passed alignment""" tree_short = build_tree_from_alignment(build_tree_seqs_short,\ moltype=DNA) num_seqs = flatten(build_tree_seqs_short).count('>') self.assertEqual(len(tree_short.tips()), num_seqs) tree_long = build_tree_from_alignment(build_tree_seqs_long, moltype=DNA) seq_names = [] for line in build_tree_seqs_long.split('\n'): if line.startswith('>'): seq_names.append(line[1:]) for node in tree_long.tips(): if node.Name not in seq_names: self.fail() #repeat with best_tree = True tree_long = build_tree_from_alignment(build_tree_seqs_long,\ best_tree=True,\ moltype=DNA) seq_names = [] for line in build_tree_seqs_long.split('\n'): if 
line.startswith('>'): seq_names.append(line[1:]) for node in tree_long.tips(): if node.Name not in seq_names: self.fail() #build_tree_from_alignment should raise DataError when constructing # an Alignment from unaligned sequences. Clearcut only allows aligned # or a distance matrix as input. self.assertRaises(DataError,build_tree_from_alignment,\ build_tree_seqs_unaligned,DNA) def test_matrix_input_from_dict2d(self): """matrix_input_from_dict2d formats dict2d object into distance matrix """ data = [('sample1aaaaaaa', 'sample2', 1.438), ('sample2', 'sample1aaaaaaa', 1.438), ('sample1aaaaaaa', 'sample3', 2.45678), ('sample3', 'sample1aaaaaaa', 2.45678), ('sample2', 'sample3', 2.7), ('sample3', 'sample2', 2.7)] data_dict2d = Dict2D(data, Pad=True, Default=0.0) matrix, int_map = _matrix_input_from_dict2d(data_dict2d) #of = open('temp.txt', 'w') #of.write(matrix) #of.close() matrix = matrix.split('\n') self.assertEqual(matrix[0], ' 3') self.assertEqual(matrix[1], 'env_0 0.0 1.438 2.45678') self.assertEqual(matrix[2], 'env_1 1.438 0.0 2.7') self.assertEqual(matrix[3], 'env_2 2.45678 2.7 0.0') self.assertEqual(int_map['env_1'], 'sample2') self.assertEqual(int_map['env_0'], 'sample1aaaaaaa') self.assertEqual(int_map['env_2'], 'sample3') def test_build_tree_from_distance_matrix(self): """build_tree_from_distance_matrix builds a tree from a dict2d """ data = [('sample1aaaaaaa', 'sample2', 1.438), ('sample2', 'sample1aaaaaaa', 1.438), ('sample1aaaaaaa', 'sample3', 2.45678), ('sample3', 'sample1aaaaaaa', 2.45678), ('sample2', 'sample3', 2.7), ('sample3', 'sample2', 2.7)] data_dict2d = Dict2D(data, Pad=True, Default=0.0) result = build_tree_from_distance_matrix(data_dict2d) self.assertEqual(str(result), '((sample1aaaaaaa:0.59739,sample2:0.84061),sample3:1.85939);') align1 = ">seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n---GCUACGUAGCUAC-------\n>seq_2\nGCGGCUAUUAGAUCGUA------" build_tree_seqs_short = """>clearcut_test_seqs_0 
AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AGCTTTAAATCATGCCAGTG >clearcut_test_seqs_1 GACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >clearcut_test_seqs_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >clearcut_test_seqs_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >clearcut_test_seqs_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >clearcut_test_seqs_5 AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >clearcut_test_seqs_6 GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT TACTTTAGATCATGCCGGTG >clearcut_test_seqs_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >clearcut_test_seqs_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >clearcut_test_seqs_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ build_tree_seqs_long = """>clearcut_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AGCTTTAAATCATGCCAGTG >clearcut_test_seqsaaaaaaaa_1 GACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >clearcut_test_seqsaaaaaaaa_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >clearcut_test_seqsaaaaaaaa_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >clearcut_test_seqsaaaaaaaa_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >clearcut_test_seqsaaaaaaaa_5 
AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >clearcut_test_seqsaaaaaaaa_6 GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT TACTTTAGATCATGCCGGTG >clearcut_test_seqsaaaaaaaa_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >clearcut_test_seqsaaaaaaaa_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >clearcut_test_seqsaaaaaaaa_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ #Unaligned seqs. First two sequences are 3 nucleotides shorter. build_tree_seqs_unaligned = """>clearcut_test_seqs_0 CCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AGCTTTAAATCATGCCAGTG >clearcut_test_seqs_1 CCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >clearcut_test_seqs_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >clearcut_test_seqs_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >clearcut_test_seqs_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >clearcut_test_seqs_5 AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >clearcut_test_seqs_6 GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT TACTTTAGATCATGCCGGTG >clearcut_test_seqs_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >clearcut_test_seqs_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >clearcut_test_seqs_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ if __name__ == '__main__': 
main() PyCogent-1.5.3/tests/test_app/test_clustalw.py000644 000765 000024 00000057360 12024702176 022504 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for application controller for ClustalW v1.83""" import re from os import getcwd, remove, rmdir, mkdir, path import shutil from cogent.core.alignment import Alignment from cogent.core.moltype import RNA from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.clustalw import Clustalw, alignUnalignedSeqsFromFile,\ alignUnalignedSeqs, alignTwoAlignments, addSeqsToAlignment,\ buildTreeFromAlignment, build_tree_from_alignment, \ bootstrap_tree_from_alignment, align_unaligned_seqs, \ align_and_build_tree, add_seqs_to_alignment, align_two_alignments from cogent.parse.fasta import MinimalFastaParser __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight", "Daniel McDonald",\ "Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" cw_vers = re.compile("CLUSTAL W [(]1\.8[1-3][.\d]*[)]") class GeneralSetUp(TestCase): def setUp(self): """Clustalw general setUp method for all tests""" self.seqs1 = ['ACUGCUAGCUAGUAGCGUACGUA','GCUACGUAGCUAC', 'GCGGCUAUUAGAUCGUA'] self.aln1_fasta = ALIGN1_FASTA self.labels1 = ['>1','>2','>3'] self.lines1 = flatten(zip(self.labels1,self.seqs1)) self.stdout1 = STDOUT1 self.aln1 = ALIGN1 self.dnd1 = DND1 self.multiline1 = '\n'.join(flatten(zip(self.labels1, self.seqs1))) self.seqs2=['UAGGCUCUGAUAUAAUAGCUCUC','UAUCGCUUCGACGAUUCUCUGAUAGAGA', 'UGACUACGCAU'] self.labels2=['>a','>b','>c'] self.lines2 = flatten(zip(self.labels2,self.seqs2)) self.aln2 = ALIGN2 self.dnd2 = DND2 self.twoalign = TWOALIGN self.alignseqs = ALIGNSEQS self.treeduringalignseqs = TREEDURINGALIGNSEQS self.treefromalignseqs = TREEFROMALIGNSEQS self.temp_dir_space = "/tmp/clustalw test" 
self.build_tree_seqs_short = """>clustal_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AGCTTTAAATCATGCCAGTG >clustal_test_seqs_1 GACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >clustal_test_seqs_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >clustal_test_seqs_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >clustal_test_seqs_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >clustal_test_seqs_5 AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >clustal_test_seqs_6 GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT TACTTTAGATCATGCCGGTG >clustal_test_seqs_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >clustal_test_seqs_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >clustal_test_seqs_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ self.build_tree_seqs_long = """>clustal_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AGCTTTAAATCATGCCAGTG >clustal_test_seqsaaaaaaaa_1 GACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >clustal_test_seqsaaaaaaaa_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >clustal_test_seqsaaaaaaaa_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >clustal_test_seqsaaaaaaaa_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >clustal_test_seqsaaaaaaaa_5 
AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >clustal_test_seqsaaaaaaaa_6 GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT TACTTTAGATCATGCCGGTG >clustal_test_seqsaaaaaaaa_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >clustal_test_seqsaaaaaaaa_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >clustal_test_seqsaaaaaaaa_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ try: mkdir('/tmp/ct') except OSError: #dir already exists pass try: #create sequence files f = open('/tmp/ct/seq1.txt','w') f.write('\n'.join(self.lines1)) f.close() g = open('/tmp/ct/seq2.txt','w') g.write('\n'.join(self.lines2)) g.close() #create alignment files f = open('/tmp/ct/align1','w') f.write(self.aln1) f.close() g = open('/tmp/ct/align2','w') g.write(self.aln2) g.close() #create tree file f = open('/tmp/ct/tree1','w') f.write(DND1) f.close() except OSError: pass class ClustalwTests(GeneralSetUp): """Tests for the Clustalw application controller""" def test_base_command(self): """Clustalw BaseCommand should return the correct BaseCommand""" c = Clustalw() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','clustalw -align'])) c.Parameters['-infile'].on('seq.txt') self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ',\ 'clustalw -infile="seq.txt" -align'])) c.Parameters['-align'].off() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','clustalw -infile="seq.txt"'])) c.Parameters['-nopgap'].on() c.Parameters['-infile'].off() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','clustalw -nopgap'])) def test_changing_working_dir(self): """Clustalw BaseCommand should change according to WorkingDir""" c = Clustalw(WorkingDir='/tmp/clustaltest') self.assertEqual(c.BaseCommand,\ ''.join(['cd 
"','/tmp/clustaltest','/"; ','clustalw -align']))
        c = Clustalw(WorkingDir='/tmp/clustaltest/')
        self.assertEqual(c.BaseCommand,\
            ''.join(['cd "','/tmp/clustaltest/','/"; ','clustalw -align']))
        c = Clustalw()
        c.WorkingDir = '/tmp/clustaltest2/'
        self.assertEqual(c.BaseCommand,\
            ''.join(['cd "','/tmp/clustaltest2/','/"; ','clustalw -align']))
        #removing the dirs is proof that they were created at the same time
        #if the dirs are not there, an OSError will be raised
        rmdir('/tmp/clustaltest')
        rmdir('/tmp/clustaltest2')

    def test_stdout_input_as_string(self):
        """Clustalw input_as_string should function as expected"""
        c = Clustalw(WorkingDir='/tmp/ct')
        res = c('/tmp/ct/seq1.txt')
        self.assertEqual(cw_vers.sub("", res['StdOut'].read()),
            cw_vers.sub("", self.stdout1))
        self.assertEqual(res['StdErr'].read(),'')
        self.assertEqual(cw_vers.sub("", res['Align'].read()),
            cw_vers.sub("", self.aln1))
        self.assertEqual(res['Dendro'].read(),self.dnd1)
        res.cleanUp()

    def test_stdout_input_as_lines(self):
        """Clustalw input_as_lines should function as expected"""
        c = Clustalw(InputHandler='_input_as_lines',WorkingDir='/tmp/ct')
        res = c(self.lines1)
        #get info on input file name and change output accordingly
        name = c.Parameters['-infile'].Value
        out = self.stdout1.split('\n')
        out[16] =\
            'Guide tree file created: ['+name.rsplit(".")[0]+'.dnd]'
        out[23] =\
            'CLUSTAL-Alignment file created ['+name.rsplit(".")[0]+'.aln]'
        self.assertEqual(cw_vers.sub("", res['StdOut'].read()),
            cw_vers.sub("", '\n'.join(out)))
        self.assertEqual(res['StdErr'].read(),'')
        self.assertEqual(cw_vers.sub("", res['Align'].read()),
            cw_vers.sub("", self.aln1))
        self.assertEqual(res['Dendro'].read(),self.dnd1)
        res.cleanUp()

    def test_stdout_input_as_lines_local(self):
        """Clustalw input_as_lines should function as expected"""
        c = Clustalw(InputHandler='_input_as_lines',
            WorkingDir=self.temp_dir_space)
        res = c(self.lines1)
        #get info on input file name and change output accordingly
        name = c.Parameters['-infile'].Value
        out =
self.stdout1.split('\n') out[16] =\ 'Guide tree file created: ['+name.rsplit(".")[0]+'.dnd]' out[23] =\ 'CLUSTAL-Alignment file created ['+name.rsplit(".")[0]+'.aln]' self.assertEqual(cw_vers.sub("", res['StdOut'].read()), cw_vers.sub("", '\n'.join(out))) self.assertEqual(res['StdErr'].read(),'') self.assertEqual(cw_vers.sub("", res['Align'].read()), cw_vers.sub("", self.aln1)) self.assertEqual(res['Dendro'].read(),self.dnd1) res.cleanUp() def test_stdout_input_as_seqs(self): """Clustalw input_as_seqs should function as expected""" c = Clustalw(InputHandler='_input_as_seqs',WorkingDir='/tmp/ct') res = c(self.seqs1) #get info on input file name and change output accordingly name = c.Parameters['-infile'].Value out = self.stdout1.split('\n') out[16] =\ 'Guide tree file created: ['+name.rsplit(".")[0]+'.dnd]' out[23] =\ 'CLUSTAL-Alignment file created ['+name.rsplit(".")[0]+'.aln]' self.assertEqual(cw_vers.sub("", res['StdOut'].read()), cw_vers.sub("", '\n'.join(out))) self.assertEqual(res['StdErr'].read(),'') self.assertEqual(cw_vers.sub("", res['Align'].read()), cw_vers.sub("", self.aln1)) self.assertEqual(res['Dendro'].read(),self.dnd1) res.cleanUp() def test_stdout_input_as_multiline_string(self): """Clustalw input_as_multiline_string should function as expected""" c = Clustalw(InputHandler='_input_as_multiline_string',\ WorkingDir='/tmp/ct') res = c(self.multiline1) name = c.Parameters['-infile'].Value out = self.stdout1.split('\n') out[16] =\ 'Guide tree file created: ['+name.rsplit(".")[0]+'.dnd]' out[23] =\ 'CLUSTAL-Alignment file created ['+name.rsplit(".")[0]+'.aln]' self.assertEqual(cw_vers.sub("", res['StdOut'].read()), cw_vers.sub("", '\n'.join(out))) self.assertEqual(res['StdErr'].read(),'') self.assertEqual(cw_vers.sub("", res['Align'].read()), cw_vers.sub("", self.aln1)) self.assertEqual(res['Dendro'].read(),self.dnd1) res.cleanUp() def test_alignment_trees(self): """Clustalw alignment should work correctly with new/usetree""" c = 
Clustalw(params={'-quicktree':True,'-type':'DNA','-gapopen':10},\
            WorkingDir='/tmp/ct')
        res = c('/tmp/ct/seq1.txt')
        self.assertEqual(res['Align'].name,'/tmp/ct/seq1.aln')
        self.assertEqual(res['Dendro'].name,'/tmp/ct/seq1.dnd')
        res.cleanUp()
        c.Parameters['-usetree'].on('/tmp/ct/tree1')
        c.Parameters['-output'].on('PHYLIP')
        res = c('/tmp/ct/seq1.txt')
        self.assertEqual(res['Align'].name,'/tmp/ct/seq1.phy')
        self.assertEqual(res['Dendro'].name,'/tmp/ct/tree1')
        res.cleanUp()
        c.Parameters['-newtree'].on('newtree')
        c.Parameters['-outfile'].on('outfile')
        res = c('/tmp/ct/seq1.txt')
        self.assertEqual(res['Align'].name, c.WorkingDir + 'outfile')
        self.assertEqual(res['Dendro'].name, c.WorkingDir + 'newtree')
        res.cleanUp()

    def test_profile_newtree(self):
        """Clustalw profile should work correctly with new/usetree"""
        c = Clustalw(params={'-profile':None,'-profile1':'/tmp/ct/seq1.txt',\
            '-profile2':'/tmp/ct/seq2.txt','-newtree1':'lala'},\
            WorkingDir='/tmp/ct')
        c.Parameters['-align'].off()
        res = c()
        self.assertEqual(res['Align'],None)
        self.assertEqual(res['Dendro1'].name,'/tmp/ct/lala')
        self.assertEqual(res['Dendro2'].name,'/tmp/ct/seq2.dnd')
        res.cleanUp()

    def test_sequences_newtree(self):
        """Clustalw sequences should work correctly with new/usetree"""
        c = Clustalw(params={'-sequences':None,'-newtree':'lala',\
            '-profile1':'/tmp/ct/align1','-profile2':'/tmp/ct/seq2.txt'},\
            WorkingDir='/tmp/ct')
        c.Parameters['-align'].off()
        res = c()
        self.assertEqual(res['Align'],None)
        self.assertEqual(res['Dendro'].name,'/tmp/ct/lala')
        res.cleanUp()
        #is this a bug in clustal? It's creating an empty file 'seq2.aln'
        #but doesn't report it in the stdout
        remove('/tmp/ct/seq2.aln')

    def test_tree_outputtree(self):
        """Clustalw tree should work correctly with outputtree"""
        c = Clustalw(params={'-tree':None,'-outputtree':'dist',\
            '-infile':'/tmp/ct/align1'},WorkingDir='/tmp/ct/')
        c.Parameters['-align'].off()
        res = c()
        self.assertEqual(res['Tree'].name,'/tmp/ct/align1.ph')
        self.assertEqual(res['TreeInfo'].name,'/tmp/ct/align1.dst')
        res.cleanUp()

class clustalwTests(GeneralSetUp):
    """Tests for module level functions in clustalw.py"""

    def test_alignUnalignedSeqs(self):
        """Clustalw alignUnalignedSeqs should work as expected"""
        res = alignUnalignedSeqs(self.seqs1,WorkingDir='/tmp/ct')
        self.assertNotEqual(res['StdErr'],None)
        self.assertEqual(cw_vers.sub("", res['Align'].read()),
            cw_vers.sub("", self.aln1))
        self.assertEqual(res['Dendro'].read(),self.dnd1)
        res.cleanUp()
        #suppress stderr and stdout
        res = alignUnalignedSeqs(self.seqs1,WorkingDir='/tmp/ct',\
            SuppressStderr=True,SuppressStdout=True)
        self.assertEqual(res['StdOut'],None)
        self.assertEqual(res['StdErr'],None)
        self.assertEqual(cw_vers.sub("", res['Align'].read()),
            cw_vers.sub("", self.aln1))
        self.assertEqual(res['Dendro'].read(),self.dnd1)
        res.cleanUp()

    def test_alignUnalignedSeqsFromFile(self):
        """Clustalw alignUnalignedSeqsFromFile should work as expected"""
        #make temp file
        res = alignUnalignedSeqsFromFile('/tmp/ct/seq1.txt')
        self.assertEqual(cw_vers.sub("", res['StdOut'].read()),
            cw_vers.sub("", self.stdout1))
        self.assertEqual(res['StdErr'].read(),'')
        self.assertEqual(cw_vers.sub("", res['Align'].read()),
            cw_vers.sub("", self.aln1))
        self.assertEqual(res['Dendro'].read(),self.dnd1)
        res.cleanUp()
        #suppress stderr and stdout
        res = alignUnalignedSeqsFromFile('/tmp/ct/seq1.txt',\
            SuppressStderr=True, SuppressStdout=True)
        self.assertEqual(res['StdOut'],None)
        self.assertEqual(res['StdErr'],None)
        self.assertEqual(cw_vers.sub("", res['Align'].read()),
            cw_vers.sub("", self.aln1))
        self.assertEqual(res['Dendro'].read(),self.dnd1)
        res.cleanUp()

    def test_alignTwoAlignments(self):
        """Clustalw alignTwoAlignments should work as expected"""
        res = alignTwoAlignments('/tmp/ct/align1','/tmp/ct/align2',\
            'twoalign.aln')
        self.assertEqual(cw_vers.sub("", res['Align'].read()),
            cw_vers.sub("", self.twoalign))
        self.assertNotEqual(res['Dendro1'],None)
        self.assertNotEqual(res['Dendro2'],None)
        #are there new trees created during the profiling?
        #the produced trees are not the same as when aligning individually
        #self.assertEqual(res['Dendro1'].read(),self.dnd)
        #self.assertEqual(res['Dendro2'].read(),self.dnd2)
        res.cleanUp()

    def test_addSeqsToAlignment(self):
        """Clustalw addSeqsToAlignment should work as expected"""
        res = addSeqsToAlignment('/tmp/ct/align1','/tmp/ct/seq2.txt',\
            'alignseqs')
        self.assertEqual(cw_vers.sub("", res['Align'].read()),
            cw_vers.sub("", self.alignseqs))
        self.assertEqual(res['Dendro'].read(),self.treeduringalignseqs)
        res.cleanUp()

    def test_buildTreeFromAlignment(self):
        """Clustalw buildTreeFromAlignment should work as expected"""
        pre_res = addSeqsToAlignment('/tmp/ct/align1','/tmp/ct/seq2.txt',\
            'alignseqs',WorkingDir='/tmp/ct')
        res = buildTreeFromAlignment('/tmp/ct/alignseqs',WorkingDir='/tmp/ct')
        self.assertEqual(res['Tree'].read(),self.treefromalignseqs)
        res.cleanUp()
        pre_res.cleanUp()

    def test_build_tree_from_alignment(self):
        """Clustalw should return a tree built from the passed alignment"""
        tree_short = build_tree_from_alignment(self.build_tree_seqs_short, \
            RNA, best_tree=False)
        num_seqs = flatten(self.build_tree_seqs_short).count('>')
        self.assertEqual(len(tree_short.tips()), num_seqs)
        tree_long = build_tree_from_alignment(self.build_tree_seqs_long, \
            RNA, best_tree=False)
        seq_names = []
        for line in self.build_tree_seqs_long.split('\n'):
            if line.startswith('>'):
                seq_names.append(line[1:])
        for node in tree_long.tips():
            if node.Name not in seq_names:
                self.fail()
        tree_short = build_tree_from_alignment(self.build_tree_seqs_short, \
            RNA,
best_tree=True, params={'-bootstrap':3}) num_seqs = flatten(self.build_tree_seqs_short).count('>') self.assertEqual(len(tree_short.tips()), num_seqs) def test_align_unaligned_seqs(self): """Clustalw align_unaligned_seqs should work as expected""" res = align_unaligned_seqs(self.seqs1, RNA) self.assertEqual(res.toFasta(), self.aln1_fasta) def test_bootstrap_tree_from_alignment(self): """Clustalw should return a bootstrapped tree from the passed aln""" tree_short = bootstrap_tree_from_alignment(self.build_tree_seqs_short) num_seqs = flatten(self.build_tree_seqs_short).count('>') self.assertEqual(len(tree_short.tips()), num_seqs) tree_long = bootstrap_tree_from_alignment(self.build_tree_seqs_long) seq_names = [] for line in self.build_tree_seqs_long.split('\n'): if line.startswith('>'): seq_names.append(line[1:]) for node in tree_long.tips(): if node.Name not in seq_names: self.fail() def test_align_and_build_tree(self): """Aligns and builds a tree for a set of sequences""" res = align_and_build_tree(self.seqs1, RNA) self.assertEqual(res['Align'].toFasta(), self.aln1_fasta) tree = res['Tree'] seq_names = [] for line in self.aln1_fasta.split('\n'): if line.startswith('>'): seq_names.append(line[1:]) for node in tree.tips(): if node.Name not in seq_names: self.fail() def test_add_seqs_to_alignment(self): """Clustalw add_seqs_to_alignment should work as expected.""" seq2 = dict(MinimalFastaParser(self.lines2)) align1 = dict(MinimalFastaParser(ALIGN1_FASTA.split('\n'))) res = add_seqs_to_alignment(seq2,align1,RNA) self.assertEqual(res.toFasta(), SEQ_PROFILE_ALIGN) def test_align_two_alignments(self): """Clustalw align_two_alignments should work as expected.""" align1 = dict(MinimalFastaParser(ALIGN1_FASTA.split('\n'))) align2 = dict(MinimalFastaParser(ALIGN2_FASTA.split('\n'))) res = align_two_alignments(align1,align2,RNA) self.assertEqual(res.toFasta(), PROFILE_PROFILE_ALIGN) def test_zzz_general_cleanUp(self): """Last test executed: cleans up all files initially 
created""" remove('/tmp/ct/seq1.txt') remove('/tmp/ct/seq2.txt') remove('/tmp/ct/align1') remove('/tmp/ct/align2') remove('/tmp/ct/tree1') rmdir('/tmp/ct') shutil.rmtree(self.temp_dir_space) STDOUT1=\ """ CLUSTAL W (1.83) Multiple Sequence Alignments Sequence format is Pearson Sequence 1: 1 23 bp Sequence 2: 2 13 bp Sequence 3: 3 17 bp Start of Pairwise alignments Aligning... Sequences (1:2) Aligned. Score: 46 Sequences (1:3) Aligned. Score: 41 Sequences (2:3) Aligned. Score: 30 Guide tree file created: [/tmp/ct/seq1.dnd] Start of Multiple Alignment There are 2 groups Aligning... Group 1: Sequences: 2 Score:171 Group 2: Sequences: 3 Score:162 Alignment Score 33 CLUSTAL-Alignment file created [/tmp/ct/seq1.aln] """ ALIGN1=\ """CLUSTAL W (1.83) multiple sequence alignment 1 ACUGCUAGCUAGUAGCGUACGUA 2 ---GCUACGUAGCUAC------- 3 GCGGCUAUUAGAUCGUA------ **** """ ALIGN1_FASTA = ">seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n---GCUACGUAGCUAC-------\n>seq_2\nGCGGCUAUUAGAUCGUA------" DND1=\ """( 1:0.21719, 2:0.32127, 3:0.37104); """ ALIGN2 =\ """CLUSTAL W (1.83) multiple sequence alignment a UAGGCUCUGAUAUAAUAGCUCUC--------- b ----UAUCGCUUCGACGAUUCUCUGAUAGAGA c ------------UGACUACGCAU--------- * * """ ALIGN2_FASTA = ">a\nUAGGCUCUGAUAUAAUAGCUCUC---------\n>b\n----UAUCGCUUCGACGAUUCUCUGAUAGAGA\n>c\n------------UGACUACGCAU---------" DND2=\ """( a:0.30435, b:0.30435, c:0.33202); """ TWOALIGN=\ """CLUSTAL W (1.83) multiple sequence alignment 1 ---ACUGCUAGCUAGUAGCGUACGUA------ 2 ------GCUACGUAGCUAC------------- 3 ---GCGGCUAUUAGAUCGUA------------ a UAGGCUCUGAUAUAAUAGCUCUC--------- b ----UAUCGCUUCGACGAUUCUCUGAUAGAGA c ------------UGACUACGCAU--------- """ ALIGNSEQS=\ """CLUSTAL W (1.83) multiple sequence alignment 1 ----------ACUGCUAGCUAGUAGCGUACGUA 2 -------------GCUACGUAGCUAC------- 3 ----------GCGGCUAUUAGAUCGUA------ a -------UAGGCUCUGAUAUAAUAGCUCUC--- c -------------------UGACUACGCAU--- b UAUCGCUUCGACGAUUCUCUGAUAGAGA----- """ TREEDURINGALIGNSEQS=\ """( 1:0.34511, ( 2:0.25283, ( ( 
3:0.21486, a:0.19691) :0.11084, b:0.31115) :0.06785) :0.02780, c:0.20035); """ TREEFROMALIGNSEQS=\ """( ( ( 1:0.17223, ( 2:0.14749, c:0.13822) :0.19541) :0.07161, a:0.25531) :0.03600, 3:0.29438, b:0.23503); """ SEQ_PROFILE_ALIGN = """>a\n-------UAGGCUCUGAUAUAAUAGCUCUC---\n>b\nUAUCGCUUCGACGAUUCUCUGAUAGAGA-----\n>c\n-------------------UGACUACGCAU---\n>seq_0\n----------ACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n-------------GCUACGUAGCUAC-------\n>seq_2\n----------GCGGCUAUUAGAUCGUA------""" PROFILE_PROFILE_ALIGN = """>a\nUAGGCUCUGAUAUAAUAGCUCUC---------\n>b\n----UAUCGCUUCGACGAUUCUCUGAUAGAGA\n>c\n------------UGACUACGCAU---------\n>seq_0\n---ACUGCUAGCUAGUAGCGUACGUA------\n>seq_1\n------GCUACGUAGCUAC-------------\n>seq_2\n---GCGGCUAUUAGAUCGUA------------""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_cmfinder.py000644 000765 000024 00000146657 12024702176 022445 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.app.cmfinder import CMfinder,CombMotif __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class CmfinderTest(TestCase): """Tests for Cmfinder application controller""" def setUp(self): self.input = cmfinder_input def test_input_as_lines(self): """Test Cmfinder stdout input as lines""" c = CMfinder(InputHandler='_input_as_lines') res = c(self.input,remove_tmp=False) #Filename in stdout, can't compare stdout since impossible to #predict filename self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_stdout_input_as_string(self): """Test Cmfinder stdout input as string""" c = CMfinder() exp ='%s\n' % '\n'.join([str(i).strip('\n') for i in cmfinder_stdout]) f = open('input.fasta','w') txt = 
'\n'.join([str(i).strip('\n') for i in self.input])
        f.write(txt)
        f.close()
        res = c('input.fasta',remove_tmp=False)
        obs = res['StdOut'].read()
        self.assertEqual(obs,exp)
        self.assertEqual(res['ExitStatus'],0)
        res.cleanUp()
        remove('input.fasta')

    def test_get_result_path(self):
        """Tests cmfinder result path"""
        c = CMfinder(InputHandler='_input_as_lines')
        res = c(self.input,remove_tmp=False)
        self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus',
            'cm_1','cm_2','cm_3','motif_1','motif_2','motif_3','latest',
            '_input_filename'])
        self.assertEqual(res['ExitStatus'],0)
        assert res['cm_1'] is not None
        assert res['cm_2'] is not None
        assert res['cm_3'] is not None
        assert res['motif_1'] is not None
        assert res['motif_2'] is not None
        assert res['motif_3'] is not None
        res.cleanUp()

class CombMotifTest(TestCase):
    """Tests for CombMotif application controller.

    _input_as_lines function not used.
    """

    def setUp(self):
        """ """
        a = open('input.fasta.cm.h1.1','w')
        txt = '\n'.join([str(i).strip('\n') for i in cm1])
        a.write(txt)
        a.close()
        b = open('input.fasta.cm.h1.2','w')
        txt = '\n'.join([str(i).strip('\n') for i in cm2])
        b.write(txt)
        b.close()
        c = open('input.fasta.cm.h1.3','w')
        txt = '\n'.join([str(i).strip('\n') for i in cm3])
        c.write(txt)
        c.close()
        d = open('input.fasta.motif.h1.1','w')
        txt = '\n'.join([str(i).strip('\n') for i in motif1])
        d.write(txt)
        d.close()
        e = open('input.fasta.motif.h1.2','w')
        txt = '\n'.join([str(i).strip('\n') for i in motif2])
        e.write(txt)
        e.close()
        f = open('input.fasta.motif.h1.3','w')
        txt = '\n'.join([str(i).strip('\n') for i in motif3])
        f.write(txt)
        f.close()
        g = open('input.fasta','w')
        txt = '\n'.join([str(i).strip('\n') for i in cmfinder_input])
        g.write(txt)
        g.close()

    def test_combmotif(self):
        """Tests CombMotif result path & input as string"""
        cbm = CombMotif()
        res = cbm('input.fasta')
        self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus',
            'comb1','latest'])
        self.assertEqual(res['ExitStatus'],0)
        assert res['comb1'] is not None
res.cleanUp() remove('input.fasta') remove('input.fasta.cm.h1.1') remove('input.fasta.cm.h1.2') remove('input.fasta.cm.h1.3') remove('input.fasta.motif.h1.1') remove('input.fasta.motif.h1.2') remove('input.fasta.motif.h1.3') remove('input.fasta.cm.h1.2.3') cmfinder_input = ['>seq1\n', 'GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC\n', '>seq2\n', 'GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC\n', '>seq3\n', 'GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC\n', '>seq4\n', 'GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC'] cmfinder_stdout = ['Seq_0_Cand0_20_69 -0.000000\n', 'UAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUG\n', '((((((...((((((((...((..........))))))))))..))))))\n', 'Seq_0_Cand1_29_61 -0.000000\n', 'GGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCC\n', '((((((((...((..........))))))))))\n', 'Seq_0_Cand2_36_68 -0.000000\n', 'AGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCU\n', '(((.((....((((......))))...)).)))\n', 'Alignment saved in file input.fasta.align.h1.1\n', 'Alignment saved in file input.fasta.align.h1.2\n', 'Alignment saved in file input.fasta.align.h1.3\n'] cm1 = ['INFERNAL-1 [0.7]\n', 'NAME (null)\n', 'STATES 154\n', 'NODES 40\n', 'W 200\n', 'el_selfsc 0.000000\n', 'NULL -0.363 -0.170 0.415 0.000 \n', 'MODEL:\n', '\t\t\t\t[ ROOT 0 ]\n', ' S 0 -1 0 1 4 -7.615 -7.822 -0.033 -6.236 \n', ' IL 1 1 2 1 4 -1.686 -2.369 -1.117 -4.855 1.023 -0.442 -0.708 -0.076 \n', ' IR 2 2 3 2 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 1 ]\n', ' ML 3 2 3 5 3 -8.950 -0.012 -7.267 -1.824 -2.170 -3.369 1.788 \n', ' D 4 2 3 5 3 -5.620 -0.734 -1.403 \n', ' IL 5 5 3 5 3 -1.925 -0.554 -4.164 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATR 2 ]\n', ' MR 6 5 3 8 5 -7.656 -0.026 -7.471 -7.683 -8.575 -1.639 -3.134 1.386 -2.615 \n', ' D 7 5 3 8 5 -5.352 -0.707 -2.978 -4.409 -2.404 \n', ' IR 8 8 3 8 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ 
MATP 3 ]\n', ' MP 9 8 3 13 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 10 8 3 13 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 11 8 3 13 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 12 8 3 13 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 13 13 5 13 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 14 14 6 14 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 4 ]\n', ' MP 15 14 6 19 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 16 14 6 19 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 17 14 6 19 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 18 14 6 19 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 19 19 5 19 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 20 20 6 20 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 5 ]\n', ' MP 21 20 6 25 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 22 20 6 25 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 23 20 6 25 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 24 20 6 25 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 25 25 5 25 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 26 26 6 26 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 6 ]\n', ' MP 27 26 6 31 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 
-5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 28 26 6 31 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 29 26 6 31 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 30 26 6 31 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 31 31 5 31 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 32 32 6 32 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 7 ]\n', ' MP 33 32 6 37 4 -7.615 -7.822 -0.033 -6.236 -5.208 -5.325 -4.370 -0.894 -3.401 -4.617 3.483 -4.720 -7.134 -1.459 -5.156 -3.274 0.375 -6.308 -1.634 -4.773 \n', ' ML 34 32 6 37 4 -3.758 -3.940 -0.507 -2.670 1.023 -0.442 -0.708 -0.076 \n', ' MR 35 32 6 37 4 -4.809 -3.838 -1.706 -0.766 1.023 -0.442 -0.708 -0.076 \n', ' D 36 32 6 37 4 -4.568 -4.250 -2.265 -0.520 \n', ' IL 37 37 5 37 4 -1.686 -2.369 -1.117 -4.855 1.023 -0.442 -0.708 -0.076 \n', ' IR 38 38 6 38 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 8 ]\n', ' ML 39 38 6 41 3 -8.593 -0.013 -7.247 2.232 -3.180 -3.632 -2.819 \n', ' D 40 38 6 41 3 -6.174 -1.687 -0.566 \n', ' IL 41 41 3 41 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 9 ]\n', ' ML 42 41 3 44 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 43 41 3 44 3 -6.174 -1.687 -0.566 \n', ' IL 44 44 3 44 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 10 ]\n', ' ML 45 44 3 47 3 -8.950 -0.012 -7.267 2.232 -3.180 -3.632 -2.819 \n', ' D 46 44 3 47 3 -5.620 -0.734 -1.403 \n', ' IL 47 47 3 47 3 -1.925 -0.554 -4.164 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATR 11 ]\n', ' MR 48 47 3 50 3 -8.950 -0.012 -7.267 -1.639 -3.134 1.386 -2.615 \n', ' D 49 47 3 50 3 -6.390 -1.568 -0.620 \n', ' IR 50 50 3 50 3 -1.925 -0.554 -4.164 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATR 12 ]\n', ' MR 51 50 3 53 5 -7.656 -0.026 -7.471 -7.683 -8.575 -2.124 1.984 -3.766 -2.279 \n', ' D 52 50 3 53 5 -5.352 
-0.707 -2.978 -4.409 -2.404 \n', ' IR 53 53 3 53 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 13 ]\n', ' MP 54 53 3 58 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 55 53 3 58 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 56 53 3 58 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 57 53 3 58 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 58 58 5 58 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 59 59 6 59 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 14 ]\n', ' MP 60 59 6 64 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 61 59 6 64 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 62 59 6 64 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 63 59 6 64 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 64 64 5 64 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 65 65 6 65 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 15 ]\n', ' MP 66 65 6 70 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 67 65 6 70 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 68 65 6 70 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 69 65 6 70 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 70 70 5 70 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 71 71 6 71 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', 
'\t\t\t\t[ MATP 16 ]\n', ' MP 72 71 6 76 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.687 -4.001 -5.006 -0.279 -3.289 -5.690 -0.341 -4.598 -4.707 -1.022 -5.894 -2.187 4.037 -4.560 -1.931 -3.750 \n', ' ML 73 71 6 76 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 74 71 6 76 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 75 71 6 76 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 76 76 5 76 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 77 77 6 77 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 17 ]\n', ' MP 78 77 6 82 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -4.629 -4.281 -5.803 -0.660 -4.305 -6.778 -1.192 -5.008 -5.286 -1.451 -6.272 -2.306 -0.329 -5.298 3.329 -4.356 \n', ' ML 79 77 6 82 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 80 77 6 82 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 81 77 6 82 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 82 82 5 82 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 83 83 6 83 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 18 ]\n', ' MP 84 83 6 88 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 85 83 6 88 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 86 83 6 88 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 87 83 6 88 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 88 88 5 88 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 89 89 6 89 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 19 ]\n', ' MP 90 89 6 94 4 -7.615 -7.822 -0.033 -6.236 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 
-4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 91 89 6 94 4 -3.758 -3.940 -0.507 -2.670 1.023 -0.442 -0.708 -0.076 \n', ' MR 92 89 6 94 4 -4.809 -3.838 -1.706 -0.766 1.023 -0.442 -0.708 -0.076 \n', ' D 93 89 6 94 4 -4.568 -4.250 -2.265 -0.520 \n', ' IL 94 94 5 94 4 -1.686 -2.369 -1.117 -4.855 1.023 -0.442 -0.708 -0.076 \n', ' IR 95 95 6 95 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 20 ]\n', ' ML 96 95 6 98 3 -8.593 -0.013 -7.247 2.232 -3.180 -3.632 -2.819 \n', ' D 97 95 6 98 3 -6.174 -1.687 -0.566 \n', ' IL 98 98 3 98 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 21 ]\n', ' ML 99 98 3 101 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 100 98 3 101 3 -6.174 -1.687 -0.566 \n', ' IL 101 101 3 101 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 22 ]\n', ' ML 102 101 3 104 3 -8.593 -0.013 -7.247 2.232 -3.180 -3.632 -2.819 \n', ' D 103 101 3 104 3 -6.174 -1.687 -0.566 \n', ' IL 104 104 3 104 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 23 ]\n', ' ML 105 104 3 107 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 106 104 3 107 3 -6.174 -1.687 -0.566 \n', ' IL 107 107 3 107 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 24 ]\n', ' ML 108 107 3 110 3 -8.593 -0.013 -7.247 -2.124 1.984 -3.766 -2.279 \n', ' D 109 107 3 110 3 -6.174 -1.687 -0.566 \n', ' IL 110 110 3 110 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 25 ]\n', ' ML 111 110 3 113 3 -8.593 -0.013 -7.247 -2.124 1.984 -3.766 -2.279 \n', ' D 112 110 3 113 3 -6.174 -1.687 -0.566 \n', ' IL 113 113 3 113 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 26 ]\n', ' ML 114 113 3 116 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 115 113 3 116 3 -6.174 -1.687 -0.566 \n', ' IL 116 116 3 116 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 27 ]\n', ' ML 117 116 3 119 3 -8.593 
-0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 118 116 3 119 3 -6.174 -1.687 -0.566 \n', ' IL 119 119 3 119 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 28 ]\n', ' ML 120 119 3 122 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 121 119 3 122 3 -6.174 -1.687 -0.566 \n', ' IL 122 122 3 122 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 29 ]\n', ' ML 123 122 3 125 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 124 122 3 125 3 -6.174 -1.687 -0.566 \n', ' IL 125 125 3 125 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 30 ]\n', ' ML 126 125 3 128 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 127 125 3 128 3 -6.174 -1.687 -0.566 \n', ' IL 128 128 3 128 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 31 ]\n', ' ML 129 128 3 131 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 130 128 3 131 3 -6.174 -1.687 -0.566 \n', ' IL 131 131 3 131 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 32 ]\n', ' ML 132 131 3 134 3 -8.593 -0.013 -7.247 -2.124 1.984 -3.766 -2.279 \n', ' D 133 131 3 134 3 -6.174 -1.687 -0.566 \n', ' IL 134 134 3 134 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 33 ]\n', ' ML 135 134 3 137 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 136 134 3 137 3 -6.174 -1.687 -0.566 \n', ' IL 137 137 3 137 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 34 ]\n', ' ML 138 137 3 140 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 139 137 3 140 3 -6.174 -1.687 -0.566 \n', ' IL 140 140 3 140 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 35 ]\n', ' ML 141 140 3 143 3 -8.593 -0.013 -7.247 -2.124 1.984 -3.766 -2.279 \n', ' D 142 140 3 143 3 -6.174 -1.687 -0.566 \n', ' IL 143 143 3 143 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 36 ]\n', ' ML 144 143 3 146 3 -8.593 -0.013 -7.247 -1.639 -3.134 
1.386 -2.615 \n', ' D 145 143 3 146 3 -6.174 -1.687 -0.566 \n', ' IL 146 146 3 146 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 37 ]\n', ' ML 147 146 3 149 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 148 146 3 149 3 -6.174 -1.687 -0.566 \n', ' IL 149 149 3 149 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 38 ]\n', ' ML 150 149 3 152 2 -9.084 -0.003 -1.824 -2.170 -3.369 1.788 \n', ' D 151 149 3 152 2 -8.445 -0.004 \n', ' IL 152 152 3 152 2 -1.823 -0.479 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ END 39 ]\n', ' E 153 152 3 -1 0 \n', '//\n'] cm2 = ['INFERNAL-1 [0.7]\n', 'NAME (null)\n', 'STATES 103\n', 'NODES 24\n', 'W 200\n', 'el_selfsc 0.000000\n', 'NULL -0.363 -0.170 0.415 0.000 \n', 'MODEL:\n', '\t\t\t\t[ ROOT 0 ]\n', ' S 0 -1 0 1 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 \n', ' IL 1 1 2 1 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 2 2 3 2 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 1 ]\n', ' MP 3 2 3 7 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 4 2 3 7 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 5 2 3 7 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 6 2 3 7 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 7 7 5 7 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 8 8 6 8 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 2 ]\n', ' MP 9 8 6 13 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 10 8 6 13 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 11 8 6 13 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 
-0.708 -0.076 \n', ' D 12 8 6 13 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 13 13 5 13 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 14 14 6 14 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 3 ]\n', ' MP 15 14 6 19 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 16 14 6 19 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 17 14 6 19 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 18 14 6 19 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 19 19 5 19 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 20 20 6 20 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 4 ]\n', ' MP 21 20 6 25 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.687 -4.001 -5.006 -0.279 -3.289 -5.690 -0.341 -4.598 -4.707 -1.022 -5.894 -2.187 4.037 -4.560 -1.931 -3.750 \n', ' ML 22 20 6 25 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 23 20 6 25 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 24 20 6 25 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 25 25 5 25 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 26 26 6 26 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 5 ]\n', ' MP 27 26 6 31 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -4.629 -4.281 -5.803 -0.660 -4.305 -6.778 -1.192 -5.008 -5.286 -1.451 -6.272 -2.306 -0.329 -5.298 3.329 -4.356 \n', ' ML 28 26 6 31 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 29 26 6 31 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 30 26 6 31 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 31 31 5 31 6 -2.579 -2.842 -0.760 
-4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 32 32 6 32 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 6 ]\n', ' MP 33 32 6 37 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 34 32 6 37 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 35 32 6 37 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 36 32 6 37 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 37 37 5 37 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 38 38 6 38 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 7 ]\n', ' MP 39 38 6 43 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 40 38 6 43 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 41 38 6 43 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 42 38 6 43 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 43 43 5 43 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 44 44 6 44 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 8 ]\n', ' MP 45 44 6 49 4 -7.487 -8.733 -0.052 -5.186 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 46 44 6 49 4 -2.408 -4.532 -1.293 -1.473 1.023 -0.442 -0.708 -0.076 \n', ' MR 47 44 6 49 4 -4.102 -12.528 -0.390 -2.485 1.023 -0.442 -0.708 -0.076 \n', ' D 48 44 6 49 4 -12.737 -14.007 -2.036 -0.404 \n', ' IL 49 49 5 49 4 -2.817 -4.319 -0.613 -2.698 1.023 -0.442 -0.708 -0.076 \n', ' IR 50 50 6 50 3 -1.925 -0.554 -4.164 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATR 9 ]\n', ' MR 51 50 6 53 3 -8.950 -0.012 -7.267 
-1.639 -3.134 1.386 -2.615 \n', ' D 52 50 6 53 3 -6.390 -1.568 -0.620 \n', ' IR 53 53 3 53 3 -1.925 -0.554 -4.164 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATR 10 ]\n', ' MR 54 53 3 56 5 -7.656 -0.026 -7.471 -7.683 -8.575 -1.639 -3.134 1.386 -2.615 \n', ' D 55 53 3 56 5 -5.352 -0.707 -2.978 -4.409 -2.404 \n', ' IR 56 56 3 56 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 11 ]\n', ' MP 57 56 3 61 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 58 56 3 61 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 59 56 3 61 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 60 56 3 61 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 61 61 5 61 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 62 62 6 62 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 12 ]\n', ' MP 63 62 6 67 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 64 62 6 67 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 65 62 6 67 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 66 62 6 67 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 67 67 5 67 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 68 68 6 68 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 13 ]\n', ' MP 69 68 6 73 4 -7.615 -7.822 -0.033 -6.236 -4.629 -4.281 -5.803 -0.660 -4.305 -6.778 -1.192 -5.008 -5.286 -1.451 -6.272 -2.306 -0.329 -5.298 3.329 -4.356 \n', ' ML 70 68 6 73 4 -3.758 -3.940 -0.507 -2.670 1.023 -0.442 -0.708 -0.076 \n', ' MR 71 68 6 73 4 -4.809 -3.838 -1.706 -0.766 1.023 -0.442 -0.708 -0.076 \n', ' D 72 
68 6 73 4 -4.568 -4.250 -2.265 -0.520 \n', ' IL 73 73 5 73 4 -1.686 -2.369 -1.117 -4.855 1.023 -0.442 -0.708 -0.076 \n', ' IR 74 74 6 74 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 14 ]\n', ' ML 75 74 6 77 3 -8.593 -0.013 -7.247 -2.124 1.984 -3.766 -2.279 \n', ' D 76 74 6 77 3 -6.174 -1.687 -0.566 \n', ' IL 77 77 3 77 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 15 ]\n', ' ML 78 77 3 80 3 -8.593 -0.013 -7.247 -2.124 1.984 -3.766 -2.279 \n', ' D 79 77 3 80 3 -6.174 -1.687 -0.566 \n', ' IL 80 80 3 80 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 16 ]\n', ' ML 81 80 3 83 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 82 80 3 83 3 -6.174 -1.687 -0.566 \n', ' IL 83 83 3 83 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 17 ]\n', ' ML 84 83 3 86 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 85 83 3 86 3 -6.174 -1.687 -0.566 \n', ' IL 86 86 3 86 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 18 ]\n', ' ML 87 86 3 89 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 88 86 3 89 3 -6.174 -1.687 -0.566 \n', ' IL 89 89 3 89 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 19 ]\n', ' ML 90 89 3 92 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 91 89 3 92 3 -6.174 -1.687 -0.566 \n', ' IL 92 92 3 92 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 20 ]\n', ' ML 93 92 3 95 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 94 92 3 95 3 -6.174 -1.687 -0.566 \n', ' IL 95 95 3 95 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 21 ]\n', ' ML 96 95 3 98 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 97 95 3 98 3 -6.174 -1.687 -0.566 \n', ' IL 98 98 3 98 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 22 ]\n', ' ML 99 98 3 101 2 -9.084 -0.003 -2.124 1.984 -3.766 -2.279 \n', ' D 100 98 3 101 2 -8.445 -0.004 \n', 
' IL 101 101 3 101 2 -1.823 -0.479 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ END 23 ]\n', ' E 102 101 3 -1 0 \n', '//\n'] cm3 = ['INFERNAL-1 [0.7]\n', 'NAME (null)\n', 'STATES 103\n', 'NODES 24\n', 'W 200\n', 'el_selfsc 0.000000\n', 'NULL -0.363 -0.170 0.415 0.000 \n', 'MODEL:\n', '\t\t\t\t[ ROOT 0 ]\n', ' S 0 -1 0 1 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 \n', ' IL 1 1 2 1 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 2 2 3 2 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 1 ]\n', ' MP 3 2 3 7 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 4 2 3 7 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 5 2 3 7 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 6 2 3 7 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 7 7 5 7 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 8 8 6 8 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 2 ]\n', ' MP 9 8 6 13 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 10 8 6 13 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 11 8 6 13 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 12 8 6 13 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 13 13 5 13 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 14 14 6 14 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 3 ]\n', ' MP 15 14 6 19 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 16 14 6 19 6 -6.250 -6.596 
-1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 17 14 6 19 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 18 14 6 19 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 19 19 5 19 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 20 20 6 20 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 4 ]\n', ' MP 21 20 6 25 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.687 -4.001 -5.006 -0.279 -3.289 -5.690 -0.341 -4.598 -4.707 -1.022 -5.894 -2.187 4.037 -4.560 -1.931 -3.750 \n', ' ML 22 20 6 25 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 23 20 6 25 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 24 20 6 25 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 25 25 5 25 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 26 26 6 26 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 5 ]\n', ' MP 27 26 6 31 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -4.629 -4.281 -5.803 -0.660 -4.305 -6.778 -1.192 -5.008 -5.286 -1.451 -6.272 -2.306 -0.329 -5.298 3.329 -4.356 \n', ' ML 28 26 6 31 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 29 26 6 31 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 30 26 6 31 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 31 31 5 31 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 32 32 6 32 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 6 ]\n', ' MP 33 32 6 37 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 -3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 34 32 6 37 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 35 32 6 37 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 
1.023 -0.442 -0.708 -0.076 \n', ' D 36 32 6 37 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 37 37 5 37 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 38 38 6 38 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 7 ]\n', ' MP 39 38 6 43 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 40 38 6 43 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 41 38 6 43 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 42 38 6 43 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 43 43 5 43 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 44 44 6 44 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 8 ]\n', ' MP 45 44 6 49 4 -7.487 -8.733 -0.052 -5.186 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 46 44 6 49 4 -2.408 -4.532 -1.293 -1.473 1.023 -0.442 -0.708 -0.076 \n', ' MR 47 44 6 49 4 -4.102 -12.528 -0.390 -2.485 1.023 -0.442 -0.708 -0.076 \n', ' D 48 44 6 49 4 -12.737 -14.007 -2.036 -0.404 \n', ' IL 49 49 5 49 4 -2.817 -4.319 -0.613 -2.698 1.023 -0.442 -0.708 -0.076 \n', ' IR 50 50 6 50 3 -1.925 -0.554 -4.164 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATR 9 ]\n', ' MR 51 50 6 53 3 -8.950 -0.012 -7.267 -1.639 -3.134 1.386 -2.615 \n', ' D 52 50 6 53 3 -6.390 -1.568 -0.620 \n', ' IR 53 53 3 53 3 -1.925 -0.554 -4.164 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATR 10 ]\n', ' MR 54 53 3 56 5 -7.656 -0.026 -7.471 -7.683 -8.575 -1.639 -3.134 1.386 -2.615 \n', ' D 55 53 3 56 5 -5.352 -0.707 -2.978 -4.409 -2.404 \n', ' IR 56 56 3 56 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 11 ]\n', ' MP 57 56 3 61 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -6.059 
-3.399 -7.491 -0.050 -6.698 -4.090 -1.804 -6.314 -5.110 3.545 -5.426 -1.969 -1.163 -4.250 -3.694 -5.400 \n', ' ML 58 56 3 61 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 59 56 3 61 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 60 56 3 61 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 61 61 5 61 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 62 62 6 62 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 12 ]\n', ' MP 63 62 6 67 6 -9.400 -9.339 -0.017 -8.115 -8.395 -8.790 -3.884 -3.075 -5.228 4.065 -4.750 -5.113 -1.228 -4.059 -4.861 -0.314 -6.063 -1.931 -0.504 -4.796 -2.719 -3.753 \n', ' ML 64 62 6 67 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 1.023 -0.442 -0.708 -0.076 \n', ' MR 65 62 6 67 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 1.023 -0.442 -0.708 -0.076 \n', ' D 66 62 6 67 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 \n', ' IL 67 67 5 67 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 1.023 -0.442 -0.708 -0.076 \n', ' IR 68 68 6 68 5 -2.408 -0.496 -5.920 -4.087 -5.193 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATP 13 ]\n', ' MP 69 68 6 73 4 -7.615 -7.822 -0.033 -6.236 -4.629 -4.281 -5.803 -0.660 -4.305 -6.778 -1.192 -5.008 -5.286 -1.451 -6.272 -2.306 -0.329 -5.298 3.329 -4.356 \n', ' ML 70 68 6 73 4 -3.758 -3.940 -0.507 -2.670 1.023 -0.442 -0.708 -0.076 \n', ' MR 71 68 6 73 4 -4.809 -3.838 -1.706 -0.766 1.023 -0.442 -0.708 -0.076 \n', ' D 72 68 6 73 4 -4.568 -4.250 -2.265 -0.520 \n', ' IL 73 73 5 73 4 -1.686 -2.369 -1.117 -4.855 1.023 -0.442 -0.708 -0.076 \n', ' IR 74 74 6 74 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 14 ]\n', ' ML 75 74 6 77 3 -8.593 -0.013 -7.247 -2.124 1.984 -3.766 -2.279 \n', ' D 76 74 6 77 3 -6.174 -1.687 -0.566 \n', ' IL 77 77 3 77 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 15 ]\n', ' ML 78 77 3 80 3 -8.593 -0.013 -7.247 -2.124 1.984 -3.766 
-2.279 \n', ' D 79 77 3 80 3 -6.174 -1.687 -0.566 \n', ' IL 80 80 3 80 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 16 ]\n', ' ML 81 80 3 83 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 82 80 3 83 3 -6.174 -1.687 -0.566 \n', ' IL 83 83 3 83 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 17 ]\n', ' ML 84 83 3 86 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 85 83 3 86 3 -6.174 -1.687 -0.566 \n', ' IL 86 86 3 86 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 18 ]\n', ' ML 87 86 3 89 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 88 86 3 89 3 -6.174 -1.687 -0.566 \n', ' IL 89 89 3 89 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 19 ]\n', ' ML 90 89 3 92 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 91 89 3 92 3 -6.174 -1.687 -0.566 \n', ' IL 92 92 3 92 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 20 ]\n', ' ML 93 92 3 95 3 -8.593 -0.013 -7.247 -1.639 -3.134 1.386 -2.615 \n', ' D 94 92 3 95 3 -6.174 -1.687 -0.566 \n', ' IL 95 95 3 95 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 21 ]\n', ' ML 96 95 3 98 3 -8.593 -0.013 -7.247 -1.824 -2.170 -3.369 1.788 \n', ' D 97 95 3 98 3 -6.174 -1.687 -0.566 \n', ' IL 98 98 3 98 3 -1.442 -0.798 -4.142 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ MATL 22 ]\n', ' ML 99 98 3 101 2 -9.084 -0.003 -2.124 1.984 -3.766 -2.279 \n', ' D 100 98 3 101 2 -8.445 -0.004 \n', ' IL 101 101 3 101 2 -1.823 -0.479 1.023 -0.442 -0.708 -0.076 \n', '\t\t\t\t[ END 23 ]\n', ' E 102 101 3 -1 0 \n', '//\n'] motif1 = ['# STOCKHOLM 1.0\n', '#=GF AU CMfinder 0.2\n', '\n', '#=GS seq1 WT 1.00\n', '#=GS seq2 WT 1.00\n', '#=GS seq3 WT 1.00\n', '#=GS seq4 WT 1.00\n', '\n', '#=GS seq1 DE 20.. 69\t89.835533\n', '#=GS seq2 DE 20.. 69\t89.835533\n', '#=GS seq3 DE 20.. 69\t89.835533\n', '#=GS seq4 DE 20.. 
69\t89.835533\n', '\n', 'seq1 UAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUG\n', '#=GR seq1 SS .<<<<<...<<<<<<<...................>>>>>>>..>>>>>.\n', 'seq2 UAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUG\n', '#=GR seq2 SS .<<<<<...<<<<<<<...................>>>>>>>..>>>>>.\n', 'seq3 UAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUG\n', '#=GR seq3 SS .<<<<<...<<<<<<<...................>>>>>>>..>>>>>.\n', 'seq4 UAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUG\n', '#=GR seq4 SS .<<<<<...<<<<<<<...................>>>>>>>..>>>>>.\n', '#=GC SS_cons :<<<<<---<<<<<<<___________________>>>>>>>-->>>>>:\n', '#=GC RF UAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUG\n', '//\n'] motif2 = ['# STOCKHOLM 1.0\n', '#=GF AU CMfinder 0.2\n', '\n', '#=GS seq1 WT 1.00\n', '#=GS seq2 WT 1.00\n', '#=GS seq3 WT 1.00\n', '#=GS seq4 WT 1.00\n', '\n', '#=GS seq1 DE 29.. 61\t59.381535\n', '#=GS seq2 DE 29.. 61\t59.381535\n', '#=GS seq3 DE 29.. 61\t59.381535\n', '#=GS seq4 DE 29.. 61\t59.381535\n', '\n', 'seq1 GGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCC\n', '#=GR seq1 SS <<<<<<<<<<<.........>>>..>>>>>>>>\n', 'seq2 GGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCC\n', '#=GR seq2 SS <<<<<<<<<<<.........>>>..>>>>>>>>\n', 'seq3 GGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCC\n', '#=GR seq3 SS <<<<<<<<<<<.........>>>..>>>>>>>>\n', 'seq4 GGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCC\n', '#=GR seq4 SS <<<<<<<<<<<.........>>>..>>>>>>>>\n', '#=GC SS_cons <<<<<<<<<<<_________>>>-->>>>>>>>\n', '#=GC RF GGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCC\n', '//\n'] motif3 = ['# STOCKHOLM 1.0\n', '#=GF AU CMfinder 0.2\n', '\n', '#=GS seq1 WT 1.00\n', '#=GS seq2 WT 1.00\n', '#=GS seq3 WT 1.00\n', '#=GS seq4 WT 1.00\n', '\n', '#=GS seq1 DE 36.. 68\t59.230255\n', '#=GS seq2 DE 36.. 68\t59.230255\n', '#=GS seq3 DE 36.. 68\t59.230255\n', '#=GS seq4 DE 36.. 
68\t59.230255\n',
 '\n',
 'seq1 AGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCU\n',
 '#=GR seq1 SS .<<........>><<<<<.......>>>>>...\n',
 'seq2 AGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCU\n',
 '#=GR seq2 SS .<<........>><<<<<.......>>>>>...\n',
 'seq3 AGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCU\n',
 '#=GR seq3 SS .<<........>><<<<<.......>>>>>...\n',
 'seq4 AGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCU\n',
 '#=GR seq4 SS .<<........>><<<<<.......>>>>>...\n',
 '#=GC SS_cons :<<________>><<<<<_______>>>>>:::\n',
 '#=GC RF AGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCU\n',
 '//\n']

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_app/test_comrna.py000644 000765 000024 00000004311 12024702176 022111 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
from os import remove
from cogent.util.unit_test import TestCase, main
from cogent.app.comrna import comRNA

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class ComrnaTest(TestCase):
    """Tests for comRNA application controller"""

    def setUp(self):
        self.input = comrna_input

    def test_input_as_lines(self):
        """Test comrna input as lines"""
        c = comRNA(InputHandler='_input_as_lines')
        res = c(self.input)
        #Can't compare stdout since comRNA app controller uses tmp filenames
        #that are impossible to predict.
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

    def test_input_as_string(self):
        """Test comrna input as string"""
        c = comRNA()
        f = open('/tmp/single.fasta','w')
        txt = '\n'.join([str(i).strip('\n') for i in self.input])
        f.write(txt)
        f.close()
        res = c('/tmp/single.fasta')
        #Can't compare stdout since comRNA app controller uses tmp filenames
        #that are impossible to predict.
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()
        remove('/tmp/single.fasta')

    def test_get_result_path(self):
        """Tests comrna result path"""
        c = comRNA(InputHandler='_input_as_lines')
        res = c(self.input)
        self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus'])
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

comrna_input = ['>seq1\n',
 'GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC\n',
 '>seq2\n',
 'GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC\n',
 '>seq3\n',
 'GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC\n',
 '>seq4\n',
 'GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC']

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_app/test_consan.py000644 000765 000024 00000004561 12024702176 022122 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
from os import remove
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.app.consan import Consan

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class ConsanTest(TestCase):
    """Tests for Consan application controller"""

    def setUp(self):
        self.input1 = consan_input1
        self.input2 = consan_input2

    def test_stdout_input_as_lines(self):
        """Test Consan stdout input as lines"""
        c = Consan(InputHandler='_input_as_lines')
        input = self.input1
        input.extend(self.input2)
        res = c(input)
        #Impossible to compare stdout since computation time is in the output
        #which may differ between runs
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

    def test_input_as_string(self):
        """Test Consan stdout input as string"""
        c = Consan()
        f = open('/tmp/seq1.fasta','w')
        txt = 
'\n'.join([str(i).strip('\n') for i in self.input1])
        f.write(txt)
        f.close()
        s = open('/tmp/seq2.fasta','w')
        txt = '\n'.join([str(i).strip('\n') for i in self.input2])
        s.write(txt)
        s.close()
        res = c(['/tmp/seq1.fasta','/tmp/seq2.fasta'])
        #Impossible to compare stdout since computation time is in the output
        #which may differ between runs
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

    def test_get_result_path(self):
        """Tests Consan result path"""
        c = Consan(InputHandler='_input_as_lines')
        input = self.input1
        input.extend(self.input2)
        res = c(input)
        self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus'])
        self.assertEqual(res['ExitStatus'],0)
        res.cleanUp()

consan_input1 = ['>seq1\n',
 'GGCCACGTAGCTCAGTCGGTAGAGCAAAGGACTGAAAATCCTTGTGTCGTTGGTTCAATTCCAACCGTGGCCACCA']
consan_input2 = ['>seq2\n',
 'GCCAGATAGCTCAGTCGGTAGAGCGTTCGCCTGAAAAGTGAAAGGTCGCCGGTTCGATCCCGGCTCTGGCCACCA']

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_app/test_contrafold.py000644 000765 000024 00000005463 12024702176 022776 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
from os import remove
from cogent.util.unit_test import TestCase, main
from cogent.app.contrafold import Contrafold

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class ContrafoldTest(TestCase):
    """Tests for Contrafold application controller"""

    def setUp(self):
        self.input = contrafold_input

    def test_stdout_input_as_lines(self):
        """Test contrafold stdout input as lines"""
        c = Contrafold(InputHandler='_input_as_lines')
        exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in contrafold_stdout])
        res = c(self.input)
        obs = res['StdOut'].read()
        self.assertEqual(obs,exp)
        res.cleanUp()

    def test_stdout_input_as_string(self):
        """Test contrafold stdout input as string"""
        c = 
Contrafold()
        exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in contrafold_stdout])
        f = open('/tmp/single.fasta','w')
        txt = '\n'.join([str(i).strip('\n') for i in self.input])
        f.write(txt)
        f.close()
        res = c('/tmp/single.fasta')
        obs = res['StdOut'].read()
        self.assertEqual(obs,exp)
        res.cleanUp()
        remove('/tmp/single.fasta')

    def test_get_result_path(self):
        """Tests contrafold result path"""
        c = Contrafold(InputHandler='_input_as_lines')
        res = c(self.input)
        self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus'])
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

contrafold_input = ['>seq1\n',
 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n',
 '\n']

contrafold_stdout = ['1 G 0\n', '2 G 71\n', '3 C 70\n', '4 U 9\n', '5 A 0\n',
 '6 G 0\n', '7 A 0\n', '8 U 0\n', '9 A 4\n', '10 G 0\n', '11 C 0\n',
 '12 U 0\n', '13 C 0\n', '14 A 0\n', '15 G 0\n', '16 A 0\n', '17 U 0\n',
 '18 G 0\n', '19 G 0\n', '20 U 69\n', '21 A 68\n', '22 G 67\n', '23 A 66\n',
 '24 G 65\n', '25 C 64\n', '26 A 0\n', '27 G 62\n', '28 A 0\n', '29 G 61\n',
 '30 G 60\n', '31 A 59\n', '32 U 58\n', '33 U 57\n', '34 G 56\n', '35 A 55\n',
 '36 A 54\n', '37 G 51\n', '38 A 50\n', '39 U 49\n', '40 C 0\n', '41 C 0\n',
 '42 U 0\n', '43 U 0\n', '44 G 0\n', '45 U 0\n', '46 G 0\n', '47 U 0\n',
 '48 C 0\n', '49 G 39\n', '50 U 38\n', '51 C 37\n', '52 G 0\n', '53 G 0\n',
 '54 U 36\n', '55 U 35\n', '56 C 34\n', '57 G 33\n', '58 A 32\n', '59 U 31\n',
 '60 C 30\n', '61 C 29\n', '62 C 27\n', '63 G 0\n', '64 G 25\n', '65 C 24\n',
 '66 U 23\n', '67 C 22\n', '68 U 21\n', '69 G 20\n', '70 G 3\n', '71 C 2\n']

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_app/test_cove.py000644 000765 000024 00000170752 12024702176 021573 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
from os import remove
from cogent.util.unit_test import TestCase, main
from cogent.app.cove import Coves,Covet

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The 
Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class CovetTest(TestCase):
    """Tests for Covet application controller"""

    def setUp(self):
        self.input = cove_input

    def test_stdout_input_as_lines(self):
        """Test covet stdout input as lines"""
        c = Covet(InputHandler='_input_as_lines')
        res = c(self.input,remove_tmp=False)
        #Can't test for stdout since different models are created each run
        self.assertEqual(res['ExitStatus'],0)
        res.cleanUp()

    def test_stdout_input_as_string(self):
        """Test covet stdout input as string"""
        c = Covet()
        f = open('single.fasta','w')
        txt = '\n'.join([str(i).strip('\n') for i in self.input])
        f.write(txt)
        f.close()
        res = c('single.fasta',remove_tmp=False)
        #Can't test for stdout since different models are created each run
        self.assertEqual(res['ExitStatus'],0)
        res.cleanUp()
        remove('single.fasta')

    def test_get_result_path(self):
        """Tests covet result path"""
        c = Covet(InputHandler='_input_as_lines')
        res = c(self.input,remove_tmp=False)
        self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus','cm',
                                          '_input_filename'])
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

class CovesTest(TestCase):
    """Tests for Coves application controller"""

    def setUp(self):
        self.input = cove_input

    def test_stdout_input_as_string(self):
        """Test coves stdout input as string"""
        c = Coves()
        exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in coves_stdout])
        f = open('single.fasta','w')
        txt = '\n'.join([str(i).strip('\n') for i in self.input])
        f.write(txt)
        f.close()
        #Create cm file needed for coves
        s = open('single.fasta.cm','w')
        txt = '\n'.join([str(i).strip('\n') for i in cove_cm])
        s.write(txt)
        s.close()
        res = c('single.fasta')
        obs = res['StdOut'].read()
        self.assertEqual(obs,exp)
        res.cleanUp()
        remove('single.fasta')
        remove('single.fasta.cm')

    def test_get_result_path(self):
        """Tests coves result 
path"""
        c = Coves()
        f = open('single.fasta','w')
        txt = '\n'.join([str(i).strip('\n') for i in self.input])
        f.write(txt)
        f.close()
        #Create cm file needed for coves
        s = open('single.fasta.cm','w')
        txt = '\n'.join([str(i).strip('\n') for i in cove_cm])
        s.write(txt)
        s.close()
        res = c('single.fasta')
        self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus'])
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

cove_input = ['>seq1\n',
 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n',
 '\n']

coves_stdout = ['coves - scoring and structure prediction of RNA sequences\n',
 ' using a covariance model\n',
 ' version 2.4.4, January 1996\n',
 '\n',
 '---------------------------------------------------\n',
 'Database to search/score: single.fasta\n',
 'Model: single.fasta.cm\n',
 'GC% of background model: 50%\n',
 '---------------------------------------------------\n',
 '\n',
 '-32.55 bits : seq1\n',
 ' seq1 GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUC\n',
 ' seq1 >.>..>.>.>.......>.>.>.>..>.>>...........<<.<..<.<.<.<......\n',
 '\n',
 ' seq1 CCGGCUCUGGC\n',
 ' seq1 .<.<.<..<.<\n',
 '\n']

cove_cm = 
['\xb0\xb2\xed\xe3%\x00\x00\x00\x06\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x01\x00\x00\x00\x02\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\x03\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\
x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\x04\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1
e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\x05\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x0
0\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\x06\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\x07\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UU
UUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\x08\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\
x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\t\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00
\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\n', '\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\x0b\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUU
UUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\x0c\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00
\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x01\x00\x00\x00\r\x00\x00\x00\xff\xff\xff\xffUUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x92$I\x92$I\xc2?\x92$I\x92$I\xd2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?\x92$I\x92$I\xc2?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?UUUUUU\xc5?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x00\x00\x00\x00\x00\x00\xd0?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xbe?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x1e\x1e\x1e\x1e\x1e\x1e\xae?\x00\x
\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x9a\x99\x99\x99\x99\x99\xc9?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_dialign.py000644 000765 000024 00000010501 12024702176 022237 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import getcwd, remove, rmdir, mkdir, path import tempfile, shutil from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.dialign import Dialign __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class GeneralSetUp(TestCase): def setUp(self): """Dialign general setUp method for all tests""" self.seqs1 = ['LDTAPCLFSDGSPQKAAYVLWDQTILQQDITPLPSHETHSAQKGELLALICGLRAAK', 'PDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALDAGTSAQRAELIALTQALKM', 'RPGLCQVFADATPTGWGLVMGHQRMRGTFSAPLPIHTAELLAACFARSRSGANIIGTDNSVV', 'MLKQVEIFTDGSCLGNPGPGGYGAILRYRGREKTFSAGYTRTTNNRMELMAAIV'] self.labels1 = ['>HTL2','>MMLV', '>HEPB', '>ECOL'] self.lines1 = flatten(zip(self.labels1,self.seqs1)) self.out = \ """ DIALIGN 2.2.1 ************* Program code written by Burkhard Morgenstern and Said Abdeddaim e-mail contact: dialign (at) gobics (dot) de Published research assisted by DIALIGN 2 should cite: Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211 - 218. 
For more information, please visit the DIALIGN home page at http://bibiserv.techfak.uni-bielefeld.de/dialign/ ************************************************************ program call: dialign2-2 -fa -fn /tmp/di/seq1.fasta /tmp/di/seq1.txt Aligned sequences: length: ================== ======= 1) HTL2 57 2) MMLV 58 3) HEPB 62 4) ECOL 54 Average seq. length: 57.8 Please note that only upper-case letters are considered to be aligned. Alignment (DIALIGN format): =========================== HTL2 1 ldtapC-LFS DGS------P QKAAYVL--- ----WDQTIL QQDITPLPSH MMLV 1 pdadhtw-YT DGSSLLQEGQ RKAGAAVtte teviWa---- KALDAG---T HEPB 1 rpgl-CQVFA DAT------P TGWGLVM--- ----GHQRMR GTFSAPLPIH ECOL 1 mlkqv-EIFT DGSCLGNPGP GGYGAIL--- ----RYRGRE KTFSAGytrT 0000000588 8882222229 9999999000 0000666666 6666633334 HTL2 37 ethSAQKGEL LALICGLRAa k--------- --- MMLV 43 ---SAQRAEL IALTQALKm- ---------- --- HEPB 37 t------AEL LAA-CFARSr sganiigtdn svv ECOL 43 ---TNNRMEL MAAIv----- ---------- --- 0003333455 5533333300 0000000000 000 Sequence tree: ============== Tree constructed using UPGMAbased on DIALIGN fragment weight scores ((HTL2 :0.130254MMLV :0.130254):0.067788(HEPB :0.120520ECOL :0.120520):0.077521); """ self.temp_dir = tempfile.mkdtemp() try: #create sequence files f = open(path.join(self.temp_dir, 'seq1.txt'),'w') f.write('\n'.join(self.lines1)) f.close() except OSError: pass class DialignTests(GeneralSetUp): """Tests for the Dialign application controller""" def test_base_command(self): """Dialign BaseCommand should return the correct BaseCommand""" c = Dialign() self.assertEqual(c.BaseCommand, ''.join(['cd ','"%s/"; ' % getcwd(),'dialign2-2'])) c.Parameters["-fa"].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd ','"%s/"; ' % getcwd(),'dialign2-2 -fa'])) def test_align(self): """test aligning samples""" c = Dialign(WorkingDir=self.temp_dir, params={"-fn":path.join(self.temp_dir,"seq1.txt")}) c.Parameters["-fa"].on() res = c(path.join(self.temp_dir, 'seq1.txt')) align = 
"".join(res["Align"].readlines()) def test_cleaning_up(self): """not a test, just removes the temp files""" shutil.rmtree(self.temp_dir) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_dotur.py000644 000765 000024 00000012721 12024702176 021773 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # test_dotur.py from os import getcwd, remove, rmdir, mkdir, path import tempfile, shutil from cogent.core.moltype import DNA, RNA, PROTEIN from cogent.core.sequence import DnaSequence, RnaSequence from cogent.core.alignment import DataError from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.dotur import Dotur, dotur_from_alignment, dotur_from_file,\ remap_seq_names from cogent.parse.dotur import OtuListParser __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" def rna_distance(first,second): first = RnaSequence(first) return first.fracDiff(second) class DoturTests(TestCase): def setUp(self): """Dotur general setUp method for all tests""" self.seqs1_unaligned = {'1':'ACUGCUAGCUAGUAGCGUACGUA',\ '2':'GCUACGUAGCUAC',\ '3':'GCGGCUAUUAGAUCGUA'} self.seqs2_aligned = {'a': 'UAGGCUCUGAUAUAAUAGCUCUC---------',\ 'c': '------------UGACUACGCAU---------',\ 'b': '----UAUCGCUUCGACGAUUCUCUGAUAGAGA'} self.seqs2_unaligned = {'a': 'UAGGCUCUGAUAUAAUAGCUCUC',\ 'c': 'UGACUACGCAU',\ 'b': 'UAUCGCUUCGACGAUUCUCUGAUAGAGA'} #self.seqs1 aligned to self.seqs2 with self.seqs2 included. 
self.seqs1_and_seqs2_aligned = \ {'a': 'UAGGCUCUGAUAUAAUAGC-UCUC---------',\ 'b': '----UAUCGCUUCGACGAU-UCUCUGAUAGAGA',\ 'c': '------------UGACUAC-GCAU---------',\ '1': '-ACUGCUAGCUAGUAGCGUACGUA---------',\ '2': '----------GCUACGUAG-CUAC---------',\ '3': '-----GCGGCUAUUAG-AU-CGUA---------',\ } self.otu_list_string = \ """unique 3 a b c 0.00 3 a b c 0.59 2 a,c b 0.78 1 a,c,b """ self.otu_res_list = [ [0.0,3,[['a'],['b'],['c']]],\ [0.0,3,[['a'],['b'],['c']]],\ [float(0.59),2,[['a','c'],['b']]],\ [float(0.78),1,[['a','c','b']]],\ ] self.distance_matrix_string = \ """ 3 a 0.0 0.78125 0.59375 b 0.78125 0.0 0.71875 c 0.59375 0.71875 0.0 """ self.int_keys = {'seq_1': 'b', 'seq_0': 'a', 'seq_2': 'c'} self.otu_lists_unmapped = [\ [['seq_0'], ['seq_1'], ['seq_2']], [['seq_0'], ['seq_1'], ['seq_2']], [['seq_0', 'seq_2'], ['seq_1']], [['seq_0', 'seq_2', 'seq_1']], ] self.otu_lists_mapped = [\ [['a'], ['b'], ['c']], [['a'], ['b'], ['c']], [['a', 'c'], ['b']], [['a', 'c', 'b']], ] self.temp_dir = tempfile.mkdtemp() self.temp_dir_spaces = '/tmp/test_for_dotur/' try: mkdir(self.temp_dir_spaces) except OSError: pass try: #create sequence files f = open(path.join(self.temp_dir, 'seqs1.sto'),'w') f.write('') f.close() self.d_mat_file = path.join(self.temp_dir, 'dmat.txt') d_mat = open(self.d_mat_file,'w') d_mat.write(self.distance_matrix_string) d_mat.close() except OSError: pass def test_base_command(self): """Dotur BaseCommand should return the correct BaseCommand""" c = Dotur() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','dotur'])) c.Parameters['-l'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','dotur -l'])) def test_changing_working_dir(self): """Dotur BaseCommand should change according to WorkingDir""" c = Dotur(WorkingDir='/tmp/dotur_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/dotur_test','/"; ','dotur'])) c = Dotur() c.WorkingDir = '/tmp/dotur_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd 
"','/tmp/dotur_test2','/"; ','dotur'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/dotur_test') rmdir('/tmp/dotur_test2') def test_remap_seq_names(self): """remap_seq_names should function as expected.""" for unmapped, mapped in zip(self.otu_lists_unmapped,\ self.otu_lists_mapped): self.assertEqual(remap_seq_names(unmapped,self.int_keys),mapped) def test_dotur_from_alignment(self): """dotur_from_alignment should behave correctly.""" res = dotur_from_alignment(aln=self.seqs2_aligned,moltype=RNA,\ distance_function=rna_distance) self.assertEqual(res,self.otu_res_list) def test_dotur_from_file(self): """dotur_from_file should behave correctly.""" res = dotur_from_file(self.d_mat_file) self.assertEqual(res,self.otu_res_list) def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_dynalign.py000644 000765 000024 00000004373 12024702176 022447 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.app.dynalign import Dynalign __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class DynalignTest(TestCase): """Tests for Dynalign application controller""" def setUp(self): self.input1 = dynalign_input1 self.input2 = dynalign_input2 def test_stdout_input_as_lines(self): """Test Dynalign stdout input as lines""" d = Dynalign(InputHandler='_input_as_lines') exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in dynalign_stdout]) res = 
d([self.input1,self.input2]) obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() def test_stdout_input_as_string(self): """Test Dynalign stdout input as string""" d = Dynalign() exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in dynalign_stdout]) f = open('/tmp/dyn1','w') txt = '\n'.join([str(i).strip('\n') for i in self.input1]) f.write(txt) f.close() s = open('/tmp/dyn2','w') txt = '\n'.join([str(i).strip('\n') for i in self.input2]) s.write(txt) s.close() res = d(['/tmp/dyn1','/tmp/dyn2']) obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() def test_get_result_path(self): """Tests Dynalign result path""" d = Dynalign(InputHandler='_input_as_lines') res = d([self.input1,self.input2]) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus',\ 'seq_1_ct','seq_2_ct','alignment']) self.assertEqual(res['ExitStatus'],0) res.cleanUp() dynalign_input1 = [';\n', 'testSeq1\n', '\n', 'GGCTAGATAGCTCAGGTCGGTTCGATCCCGGCTCTGGCC1'] dynalign_input2 = [';\n', 'testSeq2\n', '\n', 'GGCTAGATAGCTGTGTCGTCGGTTCGATCCCGGCTCTGGCC1'] dynalign_stdout = ['12%\n', '25%\n', '38%\n', '51%\n', '64%\n', '76%\n', '89%\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_fasttree.py000644 000765 000024 00000022403 12024702176 022451 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for FastTree v1.1 application controller. 
Also functions on v2.0.1, v2.1.0 and v2.1.3""" from shutil import rmtree from os import getcwd from cogent.util.unit_test import TestCase, main from cogent.app.fasttree import FastTree, build_tree_from_alignment from cogent.core.alignment import Alignment from cogent.parse.fasta import MinimalFastaParser from cogent.parse.tree import DndParser from cogent.core.moltype import DNA __author__ = "Daniel McDonald" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald", "Justin Kuczynski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "mcdonadt@colorado.edu" __status__ = "Development" class FastTreeTests(TestCase): def setUp(self): self.seqs = Alignment(dict(MinimalFastaParser(test_seqs.split()))) def test_base_command(self): app = FastTree() self.assertEqual(app.BaseCommand, \ ''.join(['cd "',getcwd(),'/"; ','FastTree'])) app.Parameters['-nt'].on() self.assertEqual(app.BaseCommand, \ ''.join(['cd "',getcwd(),'/"; ','FastTree -nt'])) def test_change_working_dir(self): app = FastTree(WorkingDir='/tmp/FastTreeTest') self.assertEqual(app.BaseCommand, \ ''.join(['cd "','/tmp/FastTreeTest','/"; ','FastTree'])) rmtree('/tmp/FastTreeTest') def test_build_tree_from_alignment(self): tree = build_tree_from_alignment(self.seqs, DNA) # test expected output for fasttree 1.1 and 2.0.1 try: for o,e in zip(tree.traverse(), DndParser(exp_tree).traverse()): self.assertEqual(o.Name,e.Name) self.assertFloatEqual(o.Length,e.Length) except AssertionError: for o,e in zip(tree.traverse(), DndParser(exp_tree_201).traverse()): self.assertEqual(o.Name,e.Name) self.assertFloatEqual(o.Length,e.Length) test_seqs = """>test_set1_0 GGTAGATGGGACTACCTCATGACATGAAACTGCAGTCTGTTCTTTTATAGAAGCTTCATACTTGGAGATGTATACTATTA CTTAGGACTATGGAGGTATA >test_set1_1 GGTTGATGGGACTACGTAGTGACATGAAATTGCAGTCTGTGCTTTTATAGAAGTTTGATACTTGGAGCTCTCTACTATTA CTTAGGACTATGGAGGTATA >test_set1_2 
GGTTGATGGGCCTACCTCATGACAATAAACTGAAGTCTGTGCTTTTATAGAGGCTTGATACTTGGAGCTCTATACTATTA CTTAGGATTATGGAGGTCTA >test_set1_3 GGTTGATGGGACTACCTCATGACATGAAACTGCAGTCTGTGCTTTTATAGAAGCTTGATACTTGGAGATCTATACTATTA CTTAGGACTATGGAGGTCAC >test_set1_4 GGTTGGTGGGACTACCTCATGACATGAAGATGCAGTCTGTGCTTGTATAGAAGCTTGAAACTTGGATATCTATACTATTA CTTAAGACTATGGAGGTCTA >test_set1_5 GGTTGATGCGACTACCTCATGACATGAGACTGCAGTCTGTGCTTTTACTGAAGCTTGATACTTGGAGATCTATACTATTA CTTAGGACTATGGAGGTTTA >test_set1_6 GGTTGATGGGACTACCTCATGACATGAAAATGCAGTCTGTCCTTTTATAGAAGCTTGATACTTGTAGATCTATACTGTTA CTTAGGACTATGGAGGTCTA >test_set1_7 GGTTGATGGGACTCCCTCATGACATAAAACTGCAGTCTGTGCTTTTACAGAAGCTTGATACTTGGAGATCTATACTATTA CATAGGACTATGGAGGTCTA >test_set1_8 GGTTGATGGCACTACCTCATGAGATGAAACTGCAGTCTGTGCTTTTATAGAAGCTTGATACTTGGATATCTATACTATAA CTTAGTACTATGGAGGCCTA >test_set1_9 GGTTTATGTTACTACCTCATGACATGAAACGGCAGCATGTGCTTTTATAGAAGCTTGATACTTGGAGATCTAAACTATTA CTTAGGACTATGGAGGTCTA >test_set2_0 AGCGAATCATACTCTGGAAAGAAAAGGACGACTCCTTTGCTCGCGGTCTAGCTGCTACAGCTTCACCGAGTACATCTGAA TGATGGTTGAACCGGGTTCA >test_set2_1 AGAGAATAGTACTCTGGAAAGACAAGGACGACTCCTTTGATCGCGGTCTAGCTGCTACAGCTTCACCGAGTACATCTGAA TGATGGTTGAACCGGATTCA >test_set2_2 AGAGTATAATACTCTGGAAAGAAAAGGACGACTCCTTTGATCGCGGTCTAGCTGCTACAGCTTCACCGAGTACATCTTAA TGATGGTTGAACCGGGGTCA >test_set2_3 AGAGAATCATACTCTGGAAAGAAATGGACGACTCCTTTGATCGCGGTCCAGCTGCTACAGCTTCACCGAGTACATCTGAA TGATGGTTGGACCGGGTTCA >test_set2_4 AGAGAATAATAGTCTGGAAAGAAAAGGACGACTCCTTTGTTCCCGGTCTAGCTGCTACAGCTTCCCCGAGTACATCTGAA TGATGGTTGAACCGGGTTCA >test_set2_5 ACAGAATACTACTCTGGAAAGAAAAGGCCGACTCCTTTGATCGCTGTCTAGCTGCGACAGCTGCACGGAGTCCATCCGAA TGATGGTTGAACCGGGTTCA >test_set2_6 AGAGAATAATACTCTGGACAGAAATGGACGACTCCTTTGATCGCGGTCTAGCTGCTACAGCTTCACCGAGTACATCTGAA TGATGGCTGAACCGGGTTCA >test_set2_7 AGAGAATATTACTCTGGAAAGAAAAGGACGACTCCTTGGATCGCGGTCTAGCTGCTACAGCTTCAGCGAGTACATCGGAA TGATGGTTTAACCGGGTTCA >test_set2_8 AGTGAATAATACTCTGGAAAGAAAAGGACGACTCCTTTGATCGCGGTCTAGCTGCTAGAGCTTCACCGAGTACATCTGAA TGATGGTTGAACCGGGTTCA >test_set2_9 
AGAGATTAATACTCTGGATAGAAAATGACGACTCCTTTGATCGCGGTCTAGCTGCTACAGATTGACCTATTACATCTGAA TGATGGTTGAACCGGGTTCA >test_set3_0 TTGTCTCCATTGAGCACTCTAATCTTGCCGTGTATTCAGGAAAGGAGGATAGAACTCGGACAGTATTCTGAACATTACAG AATCGCCGTATTTACGGTGT >test_set3_1 TTGTCTCCATTGAGCACTCTAATCATGCCGTGTATTCAGGAACGGAGGAGAGGACTCGGTCAGTATTCGGAACATTACAG AATGGCGTTATTTACGGTGT >test_set3_2 TTGTCTCCATTGAGCACTCTAATCTTGCCGTGTATTCAGGAACGGAGGATAGAACTCGGACAGAATCCTGAATATTACAA AATCGGGTTATTTACGGTGT >test_set3_3 TTGTCTCCATTGAGCACTCTAATCTTGCCGTGTTTTCAGGAACGGAGGATAGAACTCGGACAGTAGCCTGAACATTACAG AATCCCGTTATTTACGGTGT >test_set3_4 TTGTCTCCATCGAGCACTCTAATCTTGCCGTGTATTCAGGAACGGAGGATTGAACTCGGACAGTATCCTGAACATTACAG AATCGCGTTATTTACGGTGT >test_set3_5 TTGTCTCCATTGAGCACGCTAAGCTTGCCGTGTATTCAGGAACGGAGGATAGAACTCGGACAGTATCCTGAACATTACAG AATCGCGTTATTTACGGTGT >test_set3_6 TTGTCGTCATTGAGCACTCTAATCTTGCCGTGTATTCAGGAACGAAGGATAGAACTCGGACAGTATCCTGAACTTTGCAA AATCGCGTTATTTACGGTGT >test_set3_7 TTGTCTCCATTGAGCACTCTAATCTAGCCGTGTAGTCAGGAACGGAGGATGGAACGCGCACAGTATCCTGAACATAACAG AATCGCGTTATTTACGGTGT >test_set3_8 TTGTCTCCATTGAGCACTCTAATCTTGCCGTATATTCCCGAACGGAGGATAGAACTCGGACAGTAGCCTGAACAGTACAG AATCGCGTTATTTACGGTGT >test_set3_9 TTGTCTCCCTTGAGCACTCTAATCTTGCCGTGTATTCAGGAACGGAGGATAGAACTCGGACAGTATCCTGAACATTACAG AATCGCGTTATTTACGGTGT >test_set4_0 CTTTTACCGGGCTGCCCGAGAGCACTATCTGCGTCGTGCCCTGCTTCGATGCCCACACTACCATCATACTATTCGTGAAT TTGCGGCCGCTAAGATCCGA >test_set4_1 CTTTTATCGGGGTGCCTGATAGCACCATCTGCGTCGTGCCCTGCTTCGATGCCTAAACCACCGTCATGCTATTTGTGAAT TTGAGGTCGCTAAGAGCCCA >test_set4_2 CTTTTATCGGGGTGCCCGAGAGCACCATCTGCGTCGTGCCCTGCTTCGATGCCCAGGCCACCATCATACTATTTGTGGCT TAGGGGTCGCTAAGAGCCGA >test_set4_3 CTTTTATCGGGGGGCCCGAGAGCACCACCTGCGTCGTGCCCTGCTTCGATGCCCAAACCACCATCATACTATTTGTGAAT TTGGGGTCGCTAAGAGCCGA >test_set4_4 CTTTTATAGGGGTGCCCGAGAGCACCATCTGCGTCGTGCCCAGCTTCGATTTCCAAACCACCATCATACTATTTGTGAAC TTGGGGACGTTAAGAGCCGA >test_set4_5 CTTTTCGCGGGGTGCCCGAGAGCACCATCTGCGTCGCGCCCTGCTTCGGTGCCCATACCACCATCATAATATTTGGGAAA TTGGGATCGCTAAGAGTCGA >test_set4_6 
CTTTTCTCGGGGTGCCCGAGAGCCCCATCTGCGTTGTGCCCTGCTACTATGCCCAAACCACCATCATACTATTTGTGAAT GTGGCGTCGCTCAGAGCCGA >test_set4_7 CTTTTATCGGGGTGCCCGAGAGCACCATCTGCGTCGTGCCCTGCTTCGATGCCCACGTCACCATACTACTATTTGTGAAT TTGGGGTCGCTAATAGCCGA >test_set4_8 CTTTTATCGGGGGGCCCGAGAGCATCATCTGCGTCGTGCCCTGCTTCGATGCCCAAACTACCATCATACTATTTGTGAAT TTGGGGTTTCTAAGAGCCGA >test_set4_9 CTTTTACCGGGGTGACCGAGAGCACCATCTGCGCCGTGCCCTGCTTCGAGGCCCAAACCACCATCATACTGTTTGTGAAT CAGGGGTTGCTAAGAGCCGA""" exp_tree = """((test_set2_0:0.02121,(test_set2_8:-0.03148,(((test_set3_6:0.05123,(test_set3_5:0.01878,((test_set3_0:0.03155,test_set3_1:0.06432)0.664:0.01096,(((test_set3_3:0.02014,test_set3_8:0.04240)0.880:0.01129,(test_set3_7:0.05900,test_set3_4:0.01449)0.756:0.00571)0.514:0.00038,test_set3_9:0.00907)0.515:0.00020)0.834:0.00164)0.708:0.01349)0.754:0.19207,test_set3_2:-0.16026)0.999:1.34181,(test_set1_2:0.00324,((test_set1_0:0.04356,test_set1_1:0.07539)0.393:0.00223,((test_set1_3:0.01998,(test_set1_9:0.07362,((test_set1_4:0.06701,test_set1_8:0.05195)0.397:0.00350,(((test_set4_4:0.06931,(((test_set4_2:0.03637,test_set4_7:0.04823)0.726:0.01237,((test_set4_5:0.09845,test_set4_6:0.08151)0.593:0.00959,((test_set4_3:0.01520,test_set4_8:0.03654)0.590:0.00869,test_set4_9:0.07865)0.499:0.00229)0.479:0.00187)0.430:0.00179,test_set4_0:0.08643)0.651:0.00975)0.478:0.04249,test_set4_1:0.03754)1.000:1.66272,test_set1_6:-0.12006)0.803:0.15777)0.490:0.00569)0.562:0.00182)0.879:0.00579,(test_set1_7:0.03234,test_set1_5:0.04114)0.520:0.00487)0.567:0.00688)0.651:0.06887)0.923:0.48284)0.994:1.24321)0.517:0.05040)0.522:0.00306,test_set2_4:0.03835,((test_set2_9:0.07472,(test_set2_3:0.03380,test_set2_6:0.01794)0.540:0.00679)0.583:0.00234,(test_set2_2:0.03055,((test_set2_5:0.08864,test_set2_7:0.04212)0.724:0.00563,test_set2_1:0.02522)0.905:0.00645)0.566:0.00081)0.642:0.00394);""" # for FastTree version 2.0.1 exp_tree_201 = 
"""(((test_set2_8:0.00039,(((test_set3_6:0.05278,(test_set3_5:0.02030,(((test_set3_0:0.03166,test_set3_1:0.06412)0.783:0.00945,(test_set3_7:0.06330,test_set3_4:0.02026)0.896:0.00014)0.911:0.00014,((test_set3_3:0.02053,test_set3_8:0.04149)0.790:0.00995,test_set3_9:0.01011)0.927:0.00015)0.922:0.00015)0.780:0.00976)0.763:0.03112,test_set3_2:0.00014)0.881:1.40572,(((((test_set1_9:0.07378,(test_set1_7:0.03123,test_set1_5:0.04198)0.756:0.00995)0.883:0.00016,(test_set1_3:0.02027,((test_set1_0:0.04231,test_set1_1:0.07523)0.377:0.00928,test_set1_2:0.07433)0.868:0.00016)0.131:0.00015)0.872:0.00016,test_set1_8:0.06287)0.438:0.00975,(test_set1_6:0.00014,((((test_set4_4:0.07405,(test_set4_6:0.07814,test_set4_5:0.10163)0.688:0.00645)1.000:0.00015,((test_set4_2:0.03960,test_set4_7:0.05092)0.776:0.01382,(test_set4_9:0.07780,test_set4_0:0.08964)0.197:0.00703)0.862:0.00014)0.798:0.00014,(test_set4_3:0.01000,test_set4_8:0.04167)0.782:0.01024)1.000:0.07368,test_set4_1:0.00014)1.000:1.73333)0.598:0.03127)0.122:0.00091,test_set1_4:0.06300)0.634:0.46513)0.918:1.50492)0.600:0.01987,test_set2_0:0.03067)0.466:0.00015,test_set2_4:0.04129,((test_set2_2:0.03073,(test_set2_1:0.03068,(test_set2_5:0.09729,test_set2_7:0.05209)0.421:0.00015)0.851:0.00015)0.771:0.00015,(test_set2_9:0.07415,(test_set2_3:0.03110,test_set2_6:0.02061)0.776:0.00997)0.985:0.00016)0.879:0.00015);""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_fasttree_v1.py000644 000765 000024 00000017033 12024702176 023062 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for FastTree v1.0.0 application controller""" from shutil import rmtree from os import getcwd from cogent.util.unit_test import TestCase, main from cogent.app.fasttree_v1 import FastTree, build_tree_from_alignment from cogent.core.alignment import Alignment from cogent.parse.fasta import MinimalFastaParser from cogent.parse.tree import DndParser from cogent.core.moltype import DNA __author__ = "Daniel McDonald" __copyright__ = 
"Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "mcdonadt@colorado.edu" __status__ = "Development" class FastTreeTests(TestCase): def setUp(self): self.seqs = Alignment(dict(MinimalFastaParser(test_seqs.split()))) def test_base_command(self): app = FastTree() self.assertEqual(app.BaseCommand, \ ''.join(['cd "',getcwd(),'/"; ','FastTree'])) app.Parameters['-nt'].on() self.assertEqual(app.BaseCommand, \ ''.join(['cd "',getcwd(),'/"; ','FastTree -nt'])) def test_change_working_dir(self): app = FastTree(WorkingDir='/tmp/FastTreeTest') self.assertEqual(app.BaseCommand, \ ''.join(['cd "','/tmp/FastTreeTest','/"; ','FastTree'])) rmtree('/tmp/FastTreeTest') def test_build_tree_from_alignment(self): tree = build_tree_from_alignment(self.seqs, DNA) for o,e in zip(tree.traverse(), DndParser(exp_tree).traverse()): self.assertEqual(o.Name,e.Name) self.assertFloatEqual(o.Length,e.Length) test_seqs = """>test_set1_0 GGTAGATGGGACTACCTCATGACATGAAACTGCAGTCTGTTCTTTTATAGAAGCTTCATACTTGGAGATGTATACTATTA CTTAGGACTATGGAGGTATA >test_set1_1 GGTTGATGGGACTACGTAGTGACATGAAATTGCAGTCTGTGCTTTTATAGAAGTTTGATACTTGGAGCTCTCTACTATTA CTTAGGACTATGGAGGTATA >test_set1_2 GGTTGATGGGCCTACCTCATGACAATAAACTGAAGTCTGTGCTTTTATAGAGGCTTGATACTTGGAGCTCTATACTATTA CTTAGGATTATGGAGGTCTA >test_set1_3 GGTTGATGGGACTACCTCATGACATGAAACTGCAGTCTGTGCTTTTATAGAAGCTTGATACTTGGAGATCTATACTATTA CTTAGGACTATGGAGGTCAC >test_set1_4 GGTTGGTGGGACTACCTCATGACATGAAGATGCAGTCTGTGCTTGTATAGAAGCTTGAAACTTGGATATCTATACTATTA CTTAAGACTATGGAGGTCTA >test_set1_5 GGTTGATGCGACTACCTCATGACATGAGACTGCAGTCTGTGCTTTTACTGAAGCTTGATACTTGGAGATCTATACTATTA CTTAGGACTATGGAGGTTTA >test_set1_6 GGTTGATGGGACTACCTCATGACATGAAAATGCAGTCTGTCCTTTTATAGAAGCTTGATACTTGTAGATCTATACTGTTA CTTAGGACTATGGAGGTCTA >test_set1_7 GGTTGATGGGACTCCCTCATGACATAAAACTGCAGTCTGTGCTTTTACAGAAGCTTGATACTTGGAGATCTATACTATTA CATAGGACTATGGAGGTCTA >test_set1_8 
GGTTGATGGCACTACCTCATGAGATGAAACTGCAGTCTGTGCTTTTATAGAAGCTTGATACTTGGATATCTATACTATAA CTTAGTACTATGGAGGCCTA >test_set1_9 GGTTTATGTTACTACCTCATGACATGAAACGGCAGCATGTGCTTTTATAGAAGCTTGATACTTGGAGATCTAAACTATTA CTTAGGACTATGGAGGTCTA >test_set2_0 AGCGAATCATACTCTGGAAAGAAAAGGACGACTCCTTTGCTCGCGGTCTAGCTGCTACAGCTTCACCGAGTACATCTGAA TGATGGTTGAACCGGGTTCA >test_set2_1 AGAGAATAGTACTCTGGAAAGACAAGGACGACTCCTTTGATCGCGGTCTAGCTGCTACAGCTTCACCGAGTACATCTGAA TGATGGTTGAACCGGATTCA >test_set2_2 AGAGTATAATACTCTGGAAAGAAAAGGACGACTCCTTTGATCGCGGTCTAGCTGCTACAGCTTCACCGAGTACATCTTAA TGATGGTTGAACCGGGGTCA >test_set2_3 AGAGAATCATACTCTGGAAAGAAATGGACGACTCCTTTGATCGCGGTCCAGCTGCTACAGCTTCACCGAGTACATCTGAA TGATGGTTGGACCGGGTTCA >test_set2_4 AGAGAATAATAGTCTGGAAAGAAAAGGACGACTCCTTTGTTCCCGGTCTAGCTGCTACAGCTTCCCCGAGTACATCTGAA TGATGGTTGAACCGGGTTCA >test_set2_5 ACAGAATACTACTCTGGAAAGAAAAGGCCGACTCCTTTGATCGCTGTCTAGCTGCGACAGCTGCACGGAGTCCATCCGAA TGATGGTTGAACCGGGTTCA >test_set2_6 AGAGAATAATACTCTGGACAGAAATGGACGACTCCTTTGATCGCGGTCTAGCTGCTACAGCTTCACCGAGTACATCTGAA TGATGGCTGAACCGGGTTCA >test_set2_7 AGAGAATATTACTCTGGAAAGAAAAGGACGACTCCTTGGATCGCGGTCTAGCTGCTACAGCTTCAGCGAGTACATCGGAA TGATGGTTTAACCGGGTTCA >test_set2_8 AGTGAATAATACTCTGGAAAGAAAAGGACGACTCCTTTGATCGCGGTCTAGCTGCTAGAGCTTCACCGAGTACATCTGAA TGATGGTTGAACCGGGTTCA >test_set2_9 AGAGATTAATACTCTGGATAGAAAATGACGACTCCTTTGATCGCGGTCTAGCTGCTACAGATTGACCTATTACATCTGAA TGATGGTTGAACCGGGTTCA >test_set3_0 TTGTCTCCATTGAGCACTCTAATCTTGCCGTGTATTCAGGAAAGGAGGATAGAACTCGGACAGTATTCTGAACATTACAG AATCGCCGTATTTACGGTGT >test_set3_1 TTGTCTCCATTGAGCACTCTAATCATGCCGTGTATTCAGGAACGGAGGAGAGGACTCGGTCAGTATTCGGAACATTACAG AATGGCGTTATTTACGGTGT >test_set3_2 TTGTCTCCATTGAGCACTCTAATCTTGCCGTGTATTCAGGAACGGAGGATAGAACTCGGACAGAATCCTGAATATTACAA AATCGGGTTATTTACGGTGT >test_set3_3 TTGTCTCCATTGAGCACTCTAATCTTGCCGTGTTTTCAGGAACGGAGGATAGAACTCGGACAGTAGCCTGAACATTACAG AATCCCGTTATTTACGGTGT >test_set3_4 TTGTCTCCATCGAGCACTCTAATCTTGCCGTGTATTCAGGAACGGAGGATTGAACTCGGACAGTATCCTGAACATTACAG AATCGCGTTATTTACGGTGT >test_set3_5 
TTGTCTCCATTGAGCACGCTAAGCTTGCCGTGTATTCAGGAACGGAGGATAGAACTCGGACAGTATCCTGAACATTACAG AATCGCGTTATTTACGGTGT >test_set3_6 TTGTCGTCATTGAGCACTCTAATCTTGCCGTGTATTCAGGAACGAAGGATAGAACTCGGACAGTATCCTGAACTTTGCAA AATCGCGTTATTTACGGTGT >test_set3_7 TTGTCTCCATTGAGCACTCTAATCTAGCCGTGTAGTCAGGAACGGAGGATGGAACGCGCACAGTATCCTGAACATAACAG AATCGCGTTATTTACGGTGT >test_set3_8 TTGTCTCCATTGAGCACTCTAATCTTGCCGTATATTCCCGAACGGAGGATAGAACTCGGACAGTAGCCTGAACAGTACAG AATCGCGTTATTTACGGTGT >test_set3_9 TTGTCTCCCTTGAGCACTCTAATCTTGCCGTGTATTCAGGAACGGAGGATAGAACTCGGACAGTATCCTGAACATTACAG AATCGCGTTATTTACGGTGT >test_set4_0 CTTTTACCGGGCTGCCCGAGAGCACTATCTGCGTCGTGCCCTGCTTCGATGCCCACACTACCATCATACTATTCGTGAAT TTGCGGCCGCTAAGATCCGA >test_set4_1 CTTTTATCGGGGTGCCTGATAGCACCATCTGCGTCGTGCCCTGCTTCGATGCCTAAACCACCGTCATGCTATTTGTGAAT TTGAGGTCGCTAAGAGCCCA >test_set4_2 CTTTTATCGGGGTGCCCGAGAGCACCATCTGCGTCGTGCCCTGCTTCGATGCCCAGGCCACCATCATACTATTTGTGGCT TAGGGGTCGCTAAGAGCCGA >test_set4_3 CTTTTATCGGGGGGCCCGAGAGCACCACCTGCGTCGTGCCCTGCTTCGATGCCCAAACCACCATCATACTATTTGTGAAT TTGGGGTCGCTAAGAGCCGA >test_set4_4 CTTTTATAGGGGTGCCCGAGAGCACCATCTGCGTCGTGCCCAGCTTCGATTTCCAAACCACCATCATACTATTTGTGAAC TTGGGGACGTTAAGAGCCGA >test_set4_5 CTTTTCGCGGGGTGCCCGAGAGCACCATCTGCGTCGCGCCCTGCTTCGGTGCCCATACCACCATCATAATATTTGGGAAA TTGGGATCGCTAAGAGTCGA >test_set4_6 CTTTTCTCGGGGTGCCCGAGAGCCCCATCTGCGTTGTGCCCTGCTACTATGCCCAAACCACCATCATACTATTTGTGAAT GTGGCGTCGCTCAGAGCCGA >test_set4_7 CTTTTATCGGGGTGCCCGAGAGCACCATCTGCGTCGTGCCCTGCTTCGATGCCCACGTCACCATACTACTATTTGTGAAT TTGGGGTCGCTAATAGCCGA >test_set4_8 CTTTTATCGGGGGGCCCGAGAGCATCATCTGCGTCGTGCCCTGCTTCGATGCCCAAACTACCATCATACTATTTGTGAAT TTGGGGTTTCTAAGAGCCGA >test_set4_9 CTTTTACCGGGGTGACCGAGAGCACCATCTGCGCCGTGCCCTGCTTCGAGGCCCAAACCACCATCATACTGTTTGTGAAT CAGGGGTTGCTAAGAGCCGA""" exp_tree = 
"""(test_set1_3:0.02062,(test_set1_8:0.05983,test_set1_9:0.07093)0.652:0.00422,((test_set1_5:0.04140,test_set1_7:0.03208)0.634:0.00995,((test_set1_0:0.04748,(test_set1_1:0.07025,(test_set1_2:-0.00367,((((((test_set3_4:0.01485,test_set3_7:0.05863)0.862:0.00569,(test_set3_5:0.02048,(test_set3_3:0.02036,test_set3_8:0.04218)0.724:0.01088)0.397:0.00005)0.519:0.00018,((test_set3_0:0.03139,test_set3_1:0.06448)0.699:0.01095,test_set3_9:0.00940)0.505:0.00036)0.721:0.01080,test_set3_6:0.05333)0.808:0.16470,test_set3_2:-0.12547)1.000:1.42295,((test_set4_1:-0.24707,((test_set4_3:0.00903,test_set4_8:0.04272)0.650:0.00955,(test_set4_4:0.07633,((test_set4_5:0.09853,test_set4_6:0.08143)0.577:0.00748,((test_set4_0:0.08822,test_set4_9:0.07914)0.371:0.00130,(test_set4_2:0.03477,test_set4_7:0.04983)0.764:0.01228)0.665:0.00370)0.535:0.00147)0.633:0.00116)0.850:0.32065)0.998:1.45908,(((test_set2_2:0.03125,((test_set2_8:0.01770,(test_set2_0:0.02172,test_set2_4:0.04082)0.482:0.00228)0.575:0.00249,(test_set2_9:0.07291,(test_set2_3:0.03201,test_set2_6:0.01973)0.538:0.00746)0.690:0.00319)0.746:0.00073)0.788:0.00569,(test_set2_5:0.08827,test_set2_7:0.04250)0.717:0.00533)0.504:0.04532,test_set2_1:-0.01642)0.997:1.14023)0.625:0.41298)0.822:0.20836)0.783:0.07781)0.487:0.00591)0.469:0.00151,(test_set1_4:0.06606,test_set1_6:0.02981)0.703:0.00956)0.560:0.00216)0.468:0.00050);""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_foldalign.py000644 000765 000024 00000003453 12024702176 022577 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.app.foldalign import foldalign __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class FoldalignTest(TestCase): """Tests for Foldalign application 
controller""" def setUp(self): self.input = FOLDALIGN_INPUT def test_input_as_lines(self): """Test foldalign stdout input as lines""" f = foldalign(InputHandler='_input_as_lines') res = f(self.input) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_input_as_string(self): """Test foldalign stdout input as string""" f = foldalign() t = open('/tmp/single.col','w') t.write('\n'.join(self.input)) t.close() res = f('/tmp/single.col') self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() remove('/tmp/single.col') def test_get_result_path(self): """Tests foldalign result path""" f = foldalign(InputHandler='_input_as_lines') res = f(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus']) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() FOLDALIGN_INPUT = ['>seq1\n', 'GGCCACGTAGCTCAGTCGGTAGAGCAAAGGACTGAAAATCCTTGTGTCGTTGGTTCAATTCCAACCGTGGCCACCA','>seq2\n', 'GCCAGATAGCTCAGTCGGTAGAGCGTTCGCCTGAAAAGTGAAAGGTCGCCGGTTCGATCCCGGCTCTGGCCACCA'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_formatdb.py000755 000765 000024 00000065300 12024702176 022440 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # Author: Greg Caporaso (gregcaporaso@gmail.com) # test_formatdb.py """ Description File created on 16 Sep 2009. 
""" from __future__ import division from os.path import split, exists from cogent import LoadSeqs from cogent.util.unit_test import TestCase, main from cogent.app.util import get_tmp_filename from cogent.util.misc import remove_files from cogent.app.blast import blastn from cogent.app.formatdb import FormatDb, build_blast_db_from_seqs,\ build_blast_db_from_fasta_path, build_blast_db_from_fasta_file __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Production" class FormatDbTests(TestCase): def setUp(self): self.in_seqs1_fp =\ get_tmp_filename(prefix='FormatDbTests',suffix='.fasta') self.in_seqs1_file = open(self.in_seqs1_fp,'w') self.in_seqs1_file.write(in_seqs1) self.in_seqs1_file.close() self.in_seqs1 = LoadSeqs(self.in_seqs1_fp,aligned=False) self.test_seq = test_seq self.in_aln1_fp =\ get_tmp_filename(prefix='FormatDbTests',suffix='.fasta') self.in_aln1_file = open(self.in_aln1_fp,'w') self.in_aln1_file.write(in_aln1) self.in_aln1_file.close() self.in_aln1 = LoadSeqs(self.in_aln1_fp) self.files_to_remove = [self.in_seqs1_fp,self.in_aln1_fp] def tearDown(self): remove_files(self.files_to_remove) def test_call(self): """FormatDb: Calling on nucleotide data functions as expected """ fdb = FormatDb(WorkingDir='/tmp') result = fdb(self.in_seqs1_fp) # test successful run self.assertEqual(result['ExitStatus'],0) expected_result_keys = set(\ ['log','nhr','nin','nsd','nsi','nsq','ExitStatus','StdOut','StdErr']) self.assertEqual(set(result.keys()),expected_result_keys) inputfile_basename = split(self.in_seqs1_fp)[1] # got all the expected out files, and filepaths are as expected outpaths = [] for ext in ['log','nhr','nin','nsd','nsi','nsq']: outpath = '/tmp/%s.%s' % (inputfile_basename,ext) outpaths.append(outpath) self.assertEqual(result[ext].name,outpath) result.cleanUp() # 
all created files are cleaned up for outpath in outpaths: self.assertFalse(exists(outpath),\ "%s was not cleaned up." % outpath) def test_blast_against_new_db(self): """FormatDb: blastall against a newly created DB functions as expected """ fdb = FormatDb(WorkingDir='/tmp') result = fdb(self.in_seqs1_fp) blast_res = blastn(self.test_seq,blast_db=self.in_seqs1_fp) result.cleanUp() # Test that a blast result was returned self.assertTrue('s1' in blast_res,\ "Not getting any blast results.") # Test that the sequence we expect was a good blast hit subject_ids = [r['SUBJECT ID'] for r in blast_res['s1'][0]] self.assertTrue('11472384' in subject_ids,\ "Not getting expected blast results.") def test_build_blast_db_from_seqs(self): """build_blast_db_from_seqs convenience function works as expected """ blast_db, db_files = build_blast_db_from_seqs(self.in_seqs1,output_dir='/tmp') self.assertTrue(blast_db.startswith('/tmp/Blast_tmp_db')) self.assertTrue(blast_db.endswith('.fasta')) expected_db_files = set([blast_db + ext\ for ext in ['.nhr','.nin','.nsq','.nsd','.nsi','.log']]) self.assertEqual(set(db_files),expected_db_files) # result returned when blasting against new db self.assertEqual(\ len(blastn(self.test_seq,blast_db=blast_db)),1) # Make sure all db_files exist for fp in db_files: self.assertTrue(exists(fp)) # Remove all db_files remove_files(db_files) # Make sure nothing weird happened in the remove for fp in db_files: self.assertFalse(exists(fp)) def test_build_blast_db_from_fasta_path(self): """build_blast_db_from_fasta_path convenience function works as expected """ blast_db, db_files = \ build_blast_db_from_fasta_path(self.in_seqs1_fp) self.assertEqual(blast_db,self.in_seqs1_fp) expected_db_files = set([self.in_seqs1_fp + ext\ for ext in ['.nhr','.nin','.nsq','.nsd','.nsi','.log']]) self.assertEqual(set(db_files),expected_db_files) # result returned when blasting against new db self.assertEqual(\ len(blastn(self.test_seq,blast_db=blast_db)),1) # Make sure 
all db_files exist for fp in db_files: self.assertTrue(exists(fp)) # Remove all db_files remove_files(db_files) # Make sure nothing weird happened in the remove for fp in db_files: self.assertFalse(exists(fp)) def test_build_blast_db_from_fasta_path_aln(self): """build_blast_db_from_fasta_path works with alignment as input """ blast_db, db_files = build_blast_db_from_fasta_path(self.in_aln1_fp) self.assertEqual(blast_db,self.in_aln1_fp) expected_db_files = set([blast_db + ext\ for ext in ['.nhr','.nin','.nsq','.nsd','.nsi','.log']]) self.assertEqual(set(db_files),expected_db_files) # result returned when blasting against new db self.assertEqual(\ len(blastn(self.test_seq,blast_db=blast_db,e_value=0.0)),1) # Make sure all db_files exist for fp in db_files: self.assertTrue(exists(fp)) # Remove all db_files remove_files(db_files) # Make sure nothing weird happened in the remove for fp in db_files: self.assertFalse(exists(fp)) def test_build_blast_db_from_fasta_file(self): """build_blast_db_from_fasta_file works with open files as input """ blast_db, db_files = \ build_blast_db_from_fasta_file(open(self.in_aln1_fp),output_dir='/tmp/') self.assertTrue(blast_db.startswith('/tmp/BLAST_temp_db')) self.assertTrue(blast_db.endswith('.fasta')) expected_db_files = set([blast_db] + [blast_db + ext\ for ext in ['.nhr','.nin','.nsq','.nsd','.nsi','.log']]) self.assertEqual(set(db_files),expected_db_files) # result returned when blasting against new db self.assertEqual(\ len(blastn(self.test_seq,blast_db=blast_db,e_value=0.0)),1) # Make sure all db_files exist for fp in db_files: self.assertTrue(exists(fp)) # Remove all db_files remove_files(db_files) # Make sure nothing weird happened in the remove for fp in db_files: self.assertFalse(exists(fp)) in_seqs1 = """>11472286 
GATGAACGCTGGCGGCATGCTTAACACATGCAAGTCGAACGGAACACTTTGTGTTTTGAGTTAATAGTTCGATAGTAGATAGTAAATAGTGAACACTATGAACTAGTAAACTATTTAACTAGAAACTCTTAAACGCAGAGCGTTTAGTGGCGAACGGGTGAGTAATACATTGGTATCTACCTCGGAGAAGGACATAGCCTGCCGAAAGGTGGGGTAATTTCCTATAGTCCCCGCACATATTTGTTCTTAAATCTGTTAAAATGATTATATGTTTTATGTTTATTTGATAAAAAGCAGCAAGACAAATGAGTTTTATATTGGTTATACAGCAGATTTAAAAAATAGAATTAGGTCTCATAATCAGGGAGAAAACAAATCAACTAAATCTAAAATACCTTGGGAATTGGTTTACTATGAAGCCTACAAAAACCAAACATCAGCAAGGGTTAGAGAATCAAAGTTGAAACATTATGGGCAATCATTAACTAGACTTAAGAGAAGAATTGGTTTTTGAGAACAAATATGTGCGGGGTAAAGCAGCAATGCGCTCCGAGAGGAACCTCTGTCCTATCAGCTTGTTGGTAAGGTAATGGCTTACCAAGGCGACGACGGGTAGCTGGTGTGAGAGCACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGAGGAATTTTCCACAATGGGCGCAAGCCTGATGGAGCAATGCCGCGTGAAGGATGAAGATTTTCGGATTGTAAACTTCTTTTAAGTAGGAAGATTATGACGGTACTACTTGAATAAGCATCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGATGCAAGCGTTATCCGGAATTACTGGGCGTAAAGCGTGTGTAGGTGGTTTATTAAGTTAAATGTTAAATTTTCAGGCTTAACTTGGAAACCGCATTTAATACTGGTAGACTTTGAGGACAAGAGAGGCAGGCGGAATTAGCGGAGTAGCGGTGAAATGCGTAGATATCGCTAAGAACACCAATGGCGAAGGCAGCCTGCTGGTTTGCACCTGACACTGAGATACGAAAGCGTGGGGAGCGAACGGGATTAGATACCCCGGTAGTCCACGCCGTAAACGATGGTCACTAGCTGTTAGGGGCTCGACCCCTTTAGTAGCGAAGCTAACGCGTTAAGTGACCCGCCTGGGGAGTACGATCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAACGTGAGGTTTAATTCGTCTCTAAGCGAAAAACCTTACCGAGGCTTGACATCTCCGGAAGACCTTAGAAATAAGGTTGTGCCCGAAAGGGAGCCGGATGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTCGGTTAAGTCCGTTAACGAGCGCAACCCTTGCTGTGTGTTGTATTTTTCACACAGGACTATCCTGGTCAACAGGGAGGAAGGTGGGGATGACGTCAAGTCAGCATGGCTCTTACGCCTCGGGCTACACTCGCGTTACAATGGCCGGTACAATGGGCTGCCAACTCGTAAGGGGGAGCTAATCCCATCAAAACCGGTCCCAGTTCGGATTGAGGGCTGCAATTCGCCCTCATGAAGTCGGAATCGCTAGTAACCGCGAATCAGCACGTCGCGGTGAATGCGTTCTCGGGTCTTGTACACACTGCCCGTCACACCACGAAAGTTAGTAACGCCCGAAGTGCCCTGTATGGGGTCCTAAGGTGGGGCTAGCGATTGGGGTG >11472384 
AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGCTCTACGGAGGAAAGGGGGGGATCTTAGGACCTCCCGCTACAGGGGCGGCCGATGGCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTTGTCCGGAAAGAAAACGCCGTGGTTAATACCCGTGGCGGATGACGGTACCGGAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTCCGCTAAGACAGATGTGAAATCCCCGGGCTTAACCTGGGAACTGCATTTGTGACTGGCGGGCTAGAGTATGGCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAATACCGATGGCGAAGGCAGCCCCCTGGGCCAATACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCAACTAGTTGTCGGGTCTTCATTGACTTGGTAACGTAGCTAACGCGTGAAGTTGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTATGGAATCCTGCTGAGAGGTGGGAGTGCCCGAAAGGGAGCCATAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCCTAGTTGCTACGCAAGAGCACTCTAGGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATGGGTAGGGCTTCACACGTCATACAATGGTCGGAACAGAGGGTCGCCAACCCGCGAGGGGGAGCCAATCCCAGAAAACCGATCGTAGTCCGGATCGCACTCTGCAACTCGAGTGCGTGAAGCTGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTTTACCAGAAGTGGCTAGTCTAACCGCAAGGAGGACGGTCACCACGGTAGGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGATCACCTCCTTTCTCGAGCGAACGTGTCGAACGTTGAGCGCTCACGCTTATCGGCTGTGAAATTAGGACAGTAAGTCAGACAGACTGAGGGGTCTGTAGCTCAGTCGGTTAGAGCACCGTCTTGATAAGGCGGGGGTCGATGGTTCGAATCCATCCAGACCCACCATTGTCT >11468680 
TAAACTGAAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGTGCTTGCACCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACATGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGATCTACGGATGAAAGCGGGGGACCTTCGGGCCTCGCGCTATAGGGTTGGCCGATGGCTGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCAGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGAAAGCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTTGTCCGGAAAGAAATCCTTGGCTCTAATACAGTCGGGGGATGACGGTACCGGAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTGCTAAGACCGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTGGTGACTGGCAGGCTAGAGTATGGCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAATACCGATGGCGAAGGCAGCCCCCTGGGCCAATACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCAACTAGTTGTTGGGGATTCATTTCCTTAGTAACGTAGCTAACGCGTGAAGTTGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGGTCGGAATCCCGCTGAGAGGTGGGAGTGCTCGAAAGAGAACCGGCGCACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTTAGTTGCTACGCAAGAGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATGGGTAGGGCTTCACACGTCATACAATGGTCGGAACAGAGGGTTGCCAACCCGCGAGGGGGAGCTAATCCCAGAAAACCGATCGTAGTCCGGATTGCACTCTGCAACTCGAGTGCATGAAGCTGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTTTACCAGAAGTGGCTAGTCTAACCGCAAGGAGGACGGTCACCACGGTAGGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGATCACCTCCTTTCCAGAGCTATCTCGCAAAGTTGAGCGCTCACGCTTATCGGCTGTAAATTTAAAGACAGACTCAGGGGTCTGTAGCTCAGTCGGTTAGAGCACCGTCTTGATAAGGCGGGGGTCGTTGGTTCGAATCCAACCAGACCCACCATTGTCTG >11458037 
GACGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAACGGTTTCGAAGATCGGACTTCGAATTTCGAATTTCGATCATCGAGATAGTGGCGGACGGGTGAGTAACGCGTGGGTAACCTACCCATAAAGCCGGGACAACCCTTGGAAACGAGGGCTAATACCGGATAAGCTTGAGAAGTGGCATCACTTTTTAAGGAAAGGTGGCCGATGAGAATGCTGCCGATTATGGATGGACCCGCGTCTGATTAGCTGGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCAGTAGCCGGCCTGAGAGGGTGAACGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGTATGATGAAGGTTTTCGGATTGTAAAGTACTGTCTATGGGGAAGAATGGTGTGCTTGAGAATATTAAGTACAAATGACGGTACCCAAGGAGGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCGCGTAGGCGGATAGTTAAGTCCGGTGTGAAAGATCAGGGCTCAACCCTGAGAGTGCATCGGAAACTGGGTATCTTGAGGACAGGAGAGGAAAGTGGAATTCCACGTGTAGCGGTGAAATGCGTAGATATGTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGACTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATGAGTGCTAGGTGTAGAGGGTATCGACCCCTTCTGTGCCGCAGTTAACACAATAAGCACTCCGCCTGGGGAGTACGGCCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGACGCAACGCGAAGAACCTTACCAGGGCTTGACATCCTCTGAACTTGCTGGAAACAGGAAGGTGCCCTTCGGGGAGCAGAGAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAATCCCGCAACGAGCGCAACCCCTGTATTTAGTTGCTAACGCGTAGAGGCGAGCACTCTGGATAGACTGCCGGTGATAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTTCTGGGCTACACACGTGCTACAATGGCCGGTACAGACGGAAGCGAAGCCGCGAGGCGGAGCAAATCCGAGAAAGCCGGTCTCAGTTCGGATTGCAGGCTGCAACTCGCCTGCATGAAGTCGGAATCGCTAGTAATCGCAGGTCAGCATACTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCACGAAAGTCTGCAACACCCGAAGCCGGTGAGGTAACCGACTCGAGATTCGAGGCTCGAAGTTCGAGGATCGAAGTGTAAGCGAAATTAATAAGTCTTAGTAAAGCTAAAAAGCATTAAGACCGATAAGATGATCTTGCAATCGAACATCGAACATCGAATTTCGAACCTCGAGTTGGAGCTAGCCGTCGAAGGTGGGGCCGATAATTGGGGTG >11469739 
AGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAACGAGAAGCTAACTTCTGATTCCTTCGGGATGATGAGGTTAGCAGAAAGTGGCGAACGGGTGAGTAACGCGTGGGTAATCTACCCTGTAAGTGGGGGATAACCCTCCGAAAGGAGGGCTAATACCGCATAATATCTTTATCCCAAAAGAGGTAAAGATTAAAGATGGCCTCTATACTATGCTATCGCTTCAGGATGAGTCCGCGTCCTATTAGTTAGTTGGTGGGGTAATGGCCTACCAAGACGACAATGGGTAGCCGGTCTGAGAGGATGTACGGCCACACTGGGACTGAGATACGGCCCAGACTCCTACGGGAGACAGCAGTGGGGAATATTGCGCAATGGGGGAAACCCTGACGCAGCGACGCCGCGTGGATGATGAAGGCCCTTGGGTTGTAAAATCCTGTTCTGGGGGAAGAAAGCTTAAAGGTCCAATAAACCCTTAAGCCTGACGGTACCCCAAGAGAAAGCTCCGGCTAATTATGTGCCAGCAGCCGCGGTAATACATAAGGAGCAAGCGTTATCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTCTTAAAAGTCAGTTGTGAAATTATCAGGCTCAACCTGATAAGGTCATCTGAAACTCTAAGACTTGAGGTTAGAAGAGGAAAGTGGAATTCCCGGTGTAGCGGTGAAATGCGTAGATATCGGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGTCTATCTCTGACGCTGAGGAGCGAAAGCTAGGGGAGCAAACGGGATTAGATACCCCGGTAGTCCTAGCTGTAAACGATGGATACTAGGTGTGGGAGGTATCGACCCCTTCTGTGCCGTAGCTAACGCATTAAGTATCCCGCCTGGGGAGTACGGTCGCAAGGCTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGACGCAACGCGAAGAACCTTACCGGGACTTGACATTATCTTGCCCGTCTAAGAAATTAGATCTTCTTCCTTTGGAAGACAGGATAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCACAACGAGCGCAACCCTTGTGCTTAGTTGCTAACTTGTTTTACAAGTGCACTCTAGGCAGACTGCCGCAGATAATGCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTACGTCCCGGGCTACACACGTGCTACAATGGCCTGTACAGAGGGTAGCGAAAGAGCGATCTTAAGCCAATCCCAAAAAGCAGGCCCCAGTTCGGATTGGAGGCTGCAACTCGCCTCCATGAAGTAGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCACGAAAGTTGGCGATACCTGAAGTTACTAGGCTAACCTGGCACTCAACTAAGTTCACTAACTTATTTGCTTAAAATAAGGCTTAATGTGCTTAGTTGAGTGCCGGGAGGCAGGTACCGAAGGTATGGCTGGCGATTGGGGTGAAGTCGTAACAAGGTGGAAA >11469752 
AGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGGCAGCGAGTTCCTCACCGAGGTTCGGAACAGTTGACAGTAAACAGTTGACAGTAAACAGTAACTTCAGAAATGAAGCGGACTGTGAACTGTTTACTGTAACCTGTTAGCTATTATTTCGAGCTTTAGTGAGGAATGTCGGCGAGCGGCGGACGGCTGAGTAACGCGTAGGAACGTACCCCAAACTGAGGGATAAGCACCAGAAATGGTGTCTAATACCGCATATGGCCCAGCACCTTTTTTAATCAACCACGACCCTAAAATCGTGAATAATTGGTAGGAAAAGGTGTTGGGTTAAAGCTTCGGCGGTTTGGGAACGGCCTGCGTATGATTAGCTTGTTGGTGAGGTAAAAGCTCACCAAGGCGACGATCATTAGCTGGTCTGAGAGGATGATCAGCCAGACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGGCGAAAGCCTGATGGAGCAACGCCGTGTGCAGGATGAAAGCCTTCGGGTCGTAAACTGCTTTTATATGTGAAGACTTCGACGGTAGCATATGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTCATACGGAGGATCCAAGCGTTATCCGGAATTACTGGGCGTAAAGAGTTGCGTAGGTGGCATAGTAAGTTGGTAGTGAAATTGTGTGGCTCAACCATACACCCATTACTAAAACTGCTAAGCTAGAGTATATGAGAGGTAGCTGGAATTCCTAGTGTAGGAGTGAAATCCGTANATATTAGGAGGAACACCGATGGCGTAGGCAGGCTACTGGCATATTACTGACACTAAGGCACGAAAGCGTGGGGAGCGAACGGGATTAGATACCCCGGTAGTCCACGCTGTAAACGATGGATGCTAGCTGTTATGAGTATCGACCCTTGTAGTAGCGAAGCTAACGCGTTAAGCATCCCGCCTGTGGAGTACGAGCGCAAGCTTAAAACATAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAGCGTGTTGTTTAATTCGATGATAAGCGAAGAACCTTACCAAGGCTTGACATCCCTGGAATTTCTCCGAAAGGAGAGAGTGCCTTCGGGAATCAGGTGACAGGTGATGCATGGCCGTCGTCAGCTCGTGTCGTGAGATGTTTGGTTAAGTCCATTAACGAGCGCAACCCTTGTAAATAGTTGGATTTTTCTATTTAGACTGCCTCGGTAACGGGGAGGAAGGAGGGGATGATGTCAGGTCAGTATTTCTCTTACGCCTTGGGCTACAAACACGCTACAATGGCCGGTACAAAGGGCAGCCAACCCGCGAGGGGGAGCAAATCCCATCAAAGCCGGTCTCAGTTCGGATAGCAGGCTGAAATTCGCCTGCTTGAAGTCGGAATCGCTAGTAACGGTGAGTCAGCTATATTACCGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCAAGGCATGAAAGTCATCAATACCTGACGTCTGGATTTATTCTGGCCTAAGGTAGGGGCGATGATTGGGCCTAAGTCGTAACAAGGTAA >11460523 
AGAGTTTGATCCTGGCTCAGAACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGCGAAATCGGGCACTCAATTTTGCTTTTCAAACATTAACTGATGAAACGACCAGAGAGATTGTTCCAGTTTAAAGAGTGAAAAGCAGGCTTGAGTGCCTGAGAGTAGAGTGGCGCACGGGTGAGTAACGCGTAAATAATCTACCCCTGCATCTGGGATAACCCACCGAAAGGTGAGCTAATACCGGATACGTTCTTTTAACCGCGAGGTTTTAAGAAGAAAGGTGGCCTCTGATATAAGCTACTGTGCGGGGAGGAGTTTGCGTACCATTAGCTAGTTGGTAGGGTAATGGCCTACCAAGGCATCGATGGTTAGCGGGTCTGAGAGGATGATCCGCCACACTGGAACTGGAACACGGACCAGACTCCTACGGGAGGCAGCAGTGAGGAATATTGCGCAATGGGGGCAACCCTGACGCAGCGACGCCGCGTGGATGATGAAGGCCTTCGGGTCGTAAAATCCTGTCAGATGGAAAGAAGTGTTATATGGATAATACCTGTATAGCTTGACGGTACCATCAAAGGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTCTGTTATGTCAGATGTGAAAGTCCACGGCTCAACCGTGGAAGTGCATTTGAAACTGACAGACTTGAGTACTGGAGGGGGTGGTGGAATTCCCGGTGTAGAGGTGAAATTCGTAGATATCGGGAGGAATACCGGTGGCGAAGGCGACCACCTGGCCAGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCAACTAGGTGTTGGGATGGTTAATCGTCTCATTGCCGGAGCTAACGCATTAAGTTGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGTATGTGGTTTAATTCGACGCAACGCGCAGAACCTTACCTGGTCTTGACATCCCGAGAATCTCAAGGAAACTTGAGAGTGCCTCTTGAGGAACTCGGTGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCTTTAGTTGCCATCATTAAGTTGGGCACTCTAAAGAGACTGCCGGTGTCAAACCGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCTTTATGACCAGGGCTACACACGTACTACAATGGCATAGACAAAGGGCAGCGACATCGCGAGGTGAAGCGAATCCCATAAACCATGTCTCAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTTGGAATCGCTAGTAATCGTAGATCAGCATGCTACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCACGGGAGTTGGTTGTACCAGAAGCAGTTGAGCGAACTATTCGTAGACGCAGGCTGCCAAGGTATGATTGGTAACTGGGGTGAAGTCGTAACAAGGTAACC >11460543 
TGGTTTGATCCTGGCTCAGGACAAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAACGAGAAGCCAGCTTTTGATTCCTTCGGGATGAGAAAGCAGGTAGAAAGTGGCGAACGGGTGAGTAACGCGTGGGTAATCTACCCTGTAAGTAGGGGATAACCCTCTGAAAAGAGGGCTAATACCGCATAATATCTTTACCCCATAAGAAGTAAAGATTAAAGATGGCCTCTGTATATGCTATCGCTTCAGGATGAGCCCGCGTCCTATTAGTTAGTTGGTAAGGTAATGGCTTACCAAGACCACGATGGGTAGCCGGTCTGAGAGGATGTACGGCCACACTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCGCAATGGGGGAAACCCTGACGCAGCGACGCCGCGTGGATGATGAAGGCCTTCGGGTTGTAAAATCCTGTTTTGGGGGACGAAACCTTAAGGGTCCAATAAACCCTTAAATTGACGGTACCCCAAGAGAAAGCTCCGGCTAATTATGTGCCAGCAGCCGCGGTAATACATAAGGAGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGTTCGTAGGCGGTCTTAAAAGTCAGGTGTGAAATTATCAGGCTTAACCTGATACGGTCATCTGAAACTTTAAGACTTGAGGTTAGGAGAGGAAAGTGGAATTCCCGGTGTAGCGGTGAAATGCGTAGATATCGGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGCCTAACTCTGACGCTGAGGAACGAAAGCTAGGGGAGCAAACGGGATTAGATACCCCGGTAGTCCTAGCTGTAAACGATGGATACTAGGTGTGGGAGGTATCGACCCCTTCTGTGCCGWCACTAACGCATTAAGTATCCCGCCTGGGGAGTACGGTCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGACGCAACGCGAAGAACCTTACCGGGGCTTGACATTGTCTTGCCCGTTTAAGAAATTAAATTTTCTTCCCTTTTAGGGAAGACAAGATAACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCACAACGAGCGCAACCCTTATTCTTAGTTGCTAGTTTGTTTACAAACGCACTCTAAAGAGACTGCCGCAGATAATGCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTACGTCCCGGGCTACACACGTGCTACAATGGCCTGTACAGAGGGTAGCGAAAGAGCGATCTCAAGCTAATCCCTTAAAACAGGTCTCAGTTCGGATTGGAGGCTGCAACTCGCCTCCATGAAGTCGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGAAAGTTGGCGATACCTGAAGTTACTGTGCTAACCCGGCACTCAACTAAGTACATTAAGTCTTATTTTAAGCTATTGTATTTAGTTGAGTGCCGGGAGGCAGGTACCTAAGGTATGGCTAGCGATTGGGGTGAAGTCGTAACAAGGTAGCCG >11480235 
TGGTTTGATCCTGGCTCAGGATTAACGCTGGCGGCGCGCCTTATACATGCAAGTCGAACGAGCCTTGTGCTTCGCACAAGGAAATTCCAAGCACCAAGCACCAAATCTCAAACAAATCCCAATGACCAAAATTCCAAAAACCTAAACATTTTAAATGTTTAGAATTTGGAAAATTGGAATTTGGAATTTATTTGTTATTTGGAATTTATGATTTGGGATTTTCTCGCGCGGAGANCNTNAGTGGCGAACGGGTGAGTAATACGTTGGTATCTACCCCAAAGTAGAGAATAAGCCCGAGAAATCGGGGTTAATACTCTATGTGTTCGAAAGAACAAAGACTTCGGTTGCTTTGGGAAGAACCTGCGGCCTATCAGCTTGTTGGTAAGGTAACGGCTTACCAAGGCTTTGACGGGTAGCTGGTCTGGGAAGACGACCAGCCACAATGGGACTTAGACACGGCCCATACTCCTACGGGAGGCAGCAGTAGGGAATCTTCGGCAATGCCCGAAAGGTGACCGAGCGACGCCGCGTAGAGGAAGAAGATCTTTGGATTGTAAACTCTTTTTCTCCTAGACAAAGTTCTGATTGTATAGGAGGAATAAGGGGTTTCTAAACTCGTGCCAGCAGAAGCGGTAATACGAGTGCCCCAAGCGTTATCCGGAATCATTGGGCGTAGAGCGTTGTATAGGTGGTTTAAAAAGTCCAAAATTAAATCTTTAGGCTCAACCTAAAATCTGTTTTGGAAACTTTTAGACTTGAATAAAATCGACGSGAGTGGAACTTCCAGAGTAGGGGTTACATCCGTTGATACTGGAAGGAACGCCGAAGGCGAAAGCAACTCGCGAGATTTTATTGACGCCGCGTACACGAAAGCGTGGGGAGCGAAAAGTATTAGATACACTTGTAGTCCACGCCGTAAACTATGGATACTAGCAATTTGAAGCTTCGACCCTTCAAGTTGCGGACTAACGCGTTAAGTATCTCGCCTGGGAAGTACGGCCGCAAGGCTAAAACTCAAAGGAATAGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGACGATAAGCGTGGAACCTTACCAGGGCTTAGACGTACAGAGAATTCCTTGGAAACAAGGAAGTGCTTCGGGAACTCTGTACTCAGGTACTGCATGGCTGTCGTCAGTATGTACTGTGAAGCACTCCCTTAATTGGGGCAACATACGCAACCCCTATCCTAAGTTAGAAATGTCTTAGGAAACCGCTTCGATTCATCGGAGAGGAAGATGGGGACGACGTCAAGTCAGCATGGTCCTTGATGTCCTGGGCGACACACGTGCTACAATGGCTAGTATAACGGGATGCGTAGGTGCGAACCGAAGCTAATCCTTAAAAAACTAGTCTAAGTTCGGATTGAAGTCTGCAACTCGACTTCATGAAGCCGGAATCGCTAGTAACCGCAAATCAGCCACGTTGCGGTGAATACGTTCTCGGGCCTTGTACTCACTGCCCGTCACGTCAAAAAAGTCGGTAATACCCGAAGCACCCTTTTAAAGGGTTCTAAGGTAGGACCGATGATTGGGACGAAGTCGTAACAAGGTAGCCG >11480408 
AATTTAGCGGCCGCGAATTCGCCCTTGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGGGATATCCGAGCGGAAGGTTTCGGCCGGAAGGTTGGGTATTCGAGTGGCGGACGGGTGAGTAACGCGTGAGCAATCTGTCCCGGACAGGGGGATAACACTTGGAAACAGGTGCTAATACCGCATAAGACCACAGCATCGCATGGTGCAGGGGTAAAAGGAGCGATCCGGTCTGGGGTGAGCTCGCGTCCGATTAGATAGTTGGTGAGGTAACGGCCCACCAAGTCAACGATCGGTAGCCGACCTGAGAGGGTGATCGGCCACATTGGAACTGAGAGACGGTCCAAACTCCTACGGGAGGCAGCAGTGGGGAATATTGGGCAATGGGCGAAAGCCTGACCCAGCAACGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTGTTATGCGAGACGAAGGAAGTGACGGTATCGCATAAGGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCGAGCGTTGTCCGGAATGACTGGGCGTAAAGGGCGTGTAGGCGGCCGTTTAAGTATGGAGTGAAAGTCCATTTTTCAAGGATGGAATTGCTTTGTAGACTGGATGGCTTGAGTGCGGAAGAGGTAAGTGGAATTCCCAGTGTAGCGGTGAAATGCGTAGAGATTGGGAGGAACACCAGTGGCGAAGGCGACTTACTGGGCCGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCGGTAAACGATGAATGCTAGGTGTTGCGGGTATCGACCCCTGCAGTGCCGGAGTAAACACAATAAGCATTCCGCCTGGGGAGTACGGCCGCAAGGTTGAAACTCAAGGGAATTGACGGGGGCCCGCACAAGCAGCGGAGCATGTTGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCAGTTAAGCTCATAGAGATATGAGGTCCCTTCGGGGGAACTGAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATGGTCAGTTACTAACGCGTGAAGGCGAGGACTCTGACGAGACTGCCGGGGACAACTCGGAGGAAGGTGGGGACGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACAAACGTGCTACAATGGTGACTACAAAGAGGAGCGAGACTGTAAAGTGGAGCGGATCTCAAAAAAGTCATCCCAGTTCGGATTGTGGGCTGCAACCCGCCCACATGAAGTTGGAGTTGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTTGGGAGCACCCGAAGTCAGTGAGGTAACCGGAAGGAGCCAGCTGCCGAAGGTGAGACCGATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGATCACCTCCTTAAGGGCGAATTCGTTTAAACCTGCAGGACTAG """ test_seq = """>s1 (11472384) 
AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGCTCTACGGAGGAAAGGGGGGGATCTTAGGACCTCCCGCTACAGGGGCGGCCGATGGCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTTGTCCGGAAAGAAAACGCCGTGGTTAATACCCGTGGCGGATGACGGTACCGGAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTCCGCTAAGACAGATGTGAAATCCCCGGGCTTAACCTGGGAACTGCATTTGTGACTGGCGGGCTAGAGTATGGCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAATACCGATGGCGAAGGCAGCCCCCTGGGCCAATACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCAACTAGTTGTCGGGTCTTCATTGACTTGGTAACGTAGCTAACGCGTGAAGTTGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTATGGAATCCTGCTGAGAGGTGGGAGTGCCCGAAAGGGAGCCATAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCCTAGTTGCTACGCAAGAGCACTCTAGGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATGGGTAGGGCTTCACACGTCATACAATGGTCGGAACAGAGGGTCGCCAACCCGCGAGGGGGAGCCAATCCCAGAAAACCGATCGTAGTCCGGATCGCACTCTGCAACTCGAGTGCGTGAAGCTGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTTTACCAGAAGTGGCTAGTCTAACCGCAAGGAGGACGGTCACCACGGTAGGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGATCACCTCCTTTCTCGAGCGAACGTGTCGAACGTTGAGCGCTCACGCTTATCGGCTGTGAAATTAGGACAGTAAGTCAGACAGACTGAGGGGTCTGTAGCTCAGTCGGTTAGAGCACCGTCTTGATAAGGCGGGGGTCGATGGTTCGAATCCATCCAGACCCACCATTGTCT """ in_aln1 = """>a1 
AAACCTTT----TTTTAAATTCCGAAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGCTCTACGGAGGAAAGGGGGGGATCTTAGGACCTCCCGCTACAGGGGCGGCCGATGGCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTTC >a2 AAACCTTT----TTTTAAATTCCGCAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGCTCTACGGAGGAAAGGGGGGGATCTTAGGACCTCCCGCTACAGGGGCGGCCGATGGCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTTC >a3 AAACCTTT----TTTTAAATTCCGGAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGCTCTACGGAGGAAAGGGGGGGATCTTAGGACCTCCCGCTACAGGGGCGGCCGATGGCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTTC >a4 AAACCTTT----TTTTAAATTCCGTAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACGTGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGCTCTACGGAGGAAAGGGGGGGATCTTAGGACCTCCCGCTACAGGGGCGGCCGATGGCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTTC """ if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_app/test_gctmpca.py000755 000765 000024 00000772543 12024702176 022276 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # Author: Greg Caporaso (gregcaporaso@gmail.com) # 
test_gctmpca.py """ Tests of the Generalized Continuous-Time Markov Process Coevolutionary Algorithm (GCTMPCA) application controller. """ from __future__ import division from os import environ from cogent.util.unit_test import TestCase, main from cogent.app.util import ApplicationError from cogent.util.misc import app_path from cogent.app.gctmpca import Gctmpca, gctmpca_aa_order, \ default_gctmpca_aa_priors, gctmpca_base_order __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Beta" class GctmpcaTests(TestCase): """ Tests of the Gctmpca coevolution detection app controller """ def setUp(self): """ """ self.run_slow_tests = int(environ.get('TEST_SLOW_APPC',0)) self.sample_seqs = sample_seqs.split('\n') self.sample_tree = sample_tree.split('\n') self.sample_species_names = sample_species_names.split('\n') self.sample_priors = sample_priors self.sample_sub_matrix = sample_sub_matrix self.sample_seq_to_species = sample_seq_to_species.split('\n') self.sample_data = {'mol_type':'rna','seqs1':self.sample_seqs,\ 'tree1':self.sample_tree,'seq_names':self.sample_species_names,\ 'species_tree':self.sample_tree,\ 'seq_to_species1':self.sample_seq_to_species,\ 'char_priors':self.sample_priors,'sub_matrix':self.sample_sub_matrix, 'single_pair_only':1,'pos1':1265,'pos2':1270,'epsilon':0.1} self.trivial_seqs = trivial_seqs.split('\n') self.trivial_tree = trivial_tree.split('\n') self.trivial_species_names = trivial_species_names.split('\n') self.trivial_seq_to_species = trivial_seq_to_species.split('\n') #self.trivial_priors = trivial_priors.split('\n') #self.trivial_sub_matrix = trivial_sub_matrix.split('\n') self.trivial_data = {'mol_type':'protein','seqs1':self.trivial_seqs,\ 'tree1':self.trivial_tree,'seq_names':self.trivial_species_names,\ 'species_tree':self.trivial_tree,\ 
'seq_to_species1':self.trivial_seq_to_species,\ 'single_pair_only':0,'epsilon':'0.7'} self.myog_seqs = myog_seqs.split('\n') self.myog_tree = myog_tree.split('\n') self.myog_species_names = myog_species_names.split('\n') self.myog_seq_to_species = myog_seq_to_species.split('\n') #self.myog_priors = myog_priors.split('\n') #self.myog_sub_matrix = myog_sub_matrix.split('\n') self.myog_data = {'mol_type':'protein','seqs1':self.myog_seqs,\ 'tree1':self.myog_tree,'seq_names':self.myog_species_names,\ 'species_tree':self.myog_tree,\ 'seq_to_species1':self.myog_seq_to_species,\ 'single_pair_only':0,'epsilon':'0.1'} self.app = Gctmpca() def test_calculate_likelihood_in_path(self): """Gctmpca: calculate_likelihood binary in path """ assert app_path('calculate_likelihood'),\ "calculate_likelihood is not in your path -- other tests will fail!" def test_trivial_data(self): """Gctmpca: full computation works with trivial example """ if not self.run_slow_tests: return expected = ['position 1\tposition 2\tscore\n','2\t3\t3.705879\n'] #app = Gctmpca(HALT_EXEC=True) #app(self.trivial_data) #exit(-1) actual = list(self.app(self.trivial_data)['output']) self.assertEqual(actual,expected) def test_trivial_data_pairwise(self): """Gctmpca: pairwise computation works with trivial example """ if not self.run_slow_tests: return expected = ['position 1\tposition 2\tscore\n'] self.trivial_data['single_pair_only'] = 1 self.trivial_data['pos1'] = 1 self.trivial_data['pos2'] = 2 actual = list(self.app(self.trivial_data)['output']) self.assertEqual(actual,expected) def test_myog_data_pairwise(self): """Gctmpca: pairwise computation works with myog example """ if not self.run_slow_tests: return expected = ['position 1\tposition 2\tscore\n','44\t108\t0.364928\n'] self.myog_data['single_pair_only'] = 1 self.myog_data['pos1'] = 44 self.myog_data['pos2'] = 108 actual = list(self.app(self.myog_data)['output']) self.assertEqual(actual,expected) def test_rna_sample_data(self): """Gctmpca: works with 
provided sample data """ expected = ['position 1\tposition 2\tscore\n',\ '1265\t1270\t33.862890\n'] actual = list(self.app(self.sample_data)['output']) self.assertEqual(actual,expected) def test_amino_acid_defaults(self): """Gctmpca: cogent defaults match those presented by Yeang et al. """ expected_aa_order = 'ARNDCQEGHILKMFPSTWYV' self.assertEqual(gctmpca_aa_order,expected_aa_order) gctmpca_priors_string = \ "0.087127\t0.040904\t0.040432\t0.046872\t0.033474\t0.038255\t" +\ "0.049530\t0.088612\t0.033618\t0.036886\t0.085357\t0.080482\t" +\ "0.014753\t0.039772\t0.050680\t0.069577\t0.058542\t0.010494\t" +\ "0.029916\t0.064718\n" gctmpca_priors = dict(zip(gctmpca_aa_order,\ map(float,gctmpca_priors_string.strip().split('\t')))) cogent_priors = default_gctmpca_aa_priors self.assertFloatEqual(cogent_priors,gctmpca_priors) def test_default_sub_matrix_and_priors_used(self): """Gctmpca: works with provided sample data mixed w/ defaults""" # Works with default priors and sub matrix expected = ['position 1\tposition 2\tscore\n',\ '1265\t1270\t33.862890\n'] del self.sample_data['char_priors'] del self.sample_data['sub_matrix'] actual = list(self.app(self.sample_data)['output']) self.assertEqual(actual,expected) def test_gctmpca_cl_input_result_sanity(self): """Gctmpca: correct number of parameters on command line""" actual = self.app._gctmpca_cl_input(self.sample_data) self.assertTrue(actual.startswith('0 0 "/tmp/tmp')) # Many of the intermediate paramters are randomly generated # file names. 
self.assertTrue(actual.endswith('.txt" 1 - 1265 1270')) parameter_count = len(actual.split()) self.assertEqual(parameter_count,22) def test_gctmpca_cl_input_result_sanity_w_some_default(self): """Gctmpca: correct number of parameters on command line w some defaults """ # remove some parameters and make sure the result changes del self.sample_data['single_pair_only'] del self.sample_data['pos1'] del self.sample_data['pos2'] actual = self.app._gctmpca_cl_input(self.sample_data) self.assertTrue(actual.startswith('0 0 "/tmp/tmp')) # Many of the intermediate parameters are randomly generated # file names. self.assertTrue(actual.endswith('.txt" 0 -')) parameter_count = len(actual.split()) self.assertEqual(parameter_count,20) def test_input_as_gctmpca_char_priors(self): """Gctmpca: priors input handler works as expected """ priors = {'A':0.50,'C':0.25,'G':0.15,'U':0.10} expected = ['0.5\t0.25\t0.15\t0.1',''] actual = self.app._input_as_gctmpca_char_priors(priors,'ACGU') self.assertEqual(actual,expected) expected = ['0.1\t0.15\t0.25\t0.5',''] actual = self.app._input_as_gctmpca_char_priors(priors,'UGCA') self.assertEqual(actual,expected) def test_input_as_gctmpca_rate_matrix(self): """Gctmpca: rate matrix input handler works as expected """ m = {'A':{'A':-13.4,'C':3.2,'D':2.2},\ 'C':{'A':3.4,'C':-3.2,'D':22.2},\ 'D':{'A':0.4,'C':0.0,'D':-1.}} expected = ['-13.4\t3.2\t2.2','3.4\t-3.2\t22.2','0.4\t0.0\t-1.0'] actual = self.app._input_as_gctmpca_rate_matrix(m,'ACD') self.assertEqual(actual,expected) expected = ['-1.0\t0.4\t0.0','2.2\t-13.4\t3.2','22.2\t3.4\t-3.2'] actual = self.app._input_as_gctmpca_rate_matrix(m,'DAC') self.assertEqual(actual,expected) def test_unsupported_function_raises_error(self): """Gctmpca: attempting an intermolecular experiment causes error """ self.sample_data['comparison_type'] = 1 self.assertRaises(NotImplementedError,self.app,self.sample_data) def test_gctmpca_cl_input_error(self): """Gctmpca: missing required data causes ApplicationError """ # 
missing all self.assertRaises(ApplicationError,self.app,{}) # missing one # mol_type data = {'seqs1':self.sample_seqs,\ 'tree1':self.sample_tree,'seq_names':self.sample_species_names,\ 'species_tree':self.sample_tree,\ 'seq_to_species1':self.sample_seq_to_species} self.assertRaises(ApplicationError,self.app,data) # seqs1 data = {'mol_type':'rna',\ 'tree1':self.sample_tree,'seq_names':self.sample_species_names,\ 'species_tree':self.sample_tree,\ 'seq_to_species1':self.sample_seq_to_species} self.assertRaises(ApplicationError,self.app,data) # tree1 data = {'mol_type':'rna','seqs1':self.sample_seqs,\ 'seq_names':self.sample_species_names,\ 'species_tree':self.sample_tree,\ 'seq_to_species1':self.sample_seq_to_species} self.assertRaises(ApplicationError,self.app,data) # 'seq_names' data = {'mol_type':'rna','seqs1':self.sample_seqs,\ 'tree1':self.sample_tree,\ 'species_tree':self.sample_tree,\ 'seq_to_species1':self.sample_seq_to_species} self.assertRaises(ApplicationError,self.app,data) # species_tree data = {'mol_type':'rna','seqs1':self.sample_seqs,\ 'tree1':self.sample_tree,'seq_names':self.sample_species_names,\ 'seq_to_species1':self.sample_seq_to_species} self.assertRaises(ApplicationError,self.app,data) # seq_to_species1 data = {'mol_type':'rna','seqs1':self.sample_seqs,\ 'tree1':self.sample_tree,'seq_names':self.sample_species_names,\ 'species_tree':self.sample_tree} self.assertRaises(ApplicationError,self.app,data) # pos1 if single_pair_only == 1 data = {'mol_type':'rna','seqs1':self.sample_seqs,\ 'tree1':self.sample_tree,'seq_names':self.sample_species_names,\ 'species_tree':self.sample_tree,\ 'seq_to_species1':self.sample_seq_to_species,\ 'single_pair_only':1,'pos2':5} self.assertRaises(ApplicationError,self.app,data) # pos2 if single_pair_only == 1 data = {'mol_type':'rna','seqs1':self.sample_seqs,\ 'tree1':self.sample_tree,'seq_names':self.sample_species_names,\ 'species_tree':self.sample_tree,\ 'seq_to_species1':self.sample_seq_to_species,\ 
'single_pair_only':1,'pos1':5} self.assertRaises(ApplicationError,self.app,data) # pos1 & pos2 if single_pair_only == 2 data = {'mol_type':'rna','seqs1':self.sample_seqs,\ 'tree1':self.sample_tree,'seq_names':self.sample_species_names,\ 'species_tree':self.sample_tree,\ 'seq_to_species1':self.sample_seq_to_species,\ 'single_pair_only':1} self.assertRaises(ApplicationError,self.app,data) sample_seqs = """146 1542 cy.Mic.aer CCGAGCUCGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGCGUGCCUAACACAUGCAAGUCGAACGGGAAUC--------UUCG-------GAUU-CUAGUGGCGGACGGGUGAGUAACGCGUAAGAAUCUAACUUCAGGACGGGGACAACAGUUGGAAACGACUGCUAAUACCCGAUG-UGCCGCAGGGUGAAACC--------UAAU------UGGCCUGGAGAAGAGCUUGCGUCUGAUUAGCUAGUUGGUGGGGUAAGGGCCUACCAAGGCGACGAUCAGUAGCUGGUCUGAG-GGAUGAGCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGCCUGACGGAGCAACGCCGCGUGAGGGAGGAAGGUCUUUGGAUUGUAAACCUCUUUUCUCAAGGAAGAAG----------UUCU-------------AACGGUUCUUGAGGAAUCAGCCUCGGCUAACUCCGUGCCAGCAGCCCCGGUAAUACGGGGGAGGCAAGCGUUAUCCGGAAUUAUUGGGCGUAAAGCGUCCGCAGGUGGUCAGCCAAGUCUGCCGUCAAAUCAGGUUGCUUAACGACCUAAAGGCGGUGGAAACUGGCAGACUAGAGAGCAGUAGGGGUAGCAGGAAUUCCCAGUGUAGCAGUGAAAUGCGUAGAGAUUGGGAAGAACAUCGGUGGCGAAAGCGUGCUACUGGGCUGUCUCUGACACUCAGGGACGAAAGCUUGGGGAGCGAACGGGAUUAGAUACCCCUGUAGUCCUAGCCGUAAACGAUGGAUACUAGGUGUGGGUUGUAUCGCCCGAGGCGGGCCAAAGAUAACGCGUUAUGUAUCCCCCCUGGAGAGUACGCACGCAAGUUUGAAUCUCAAAGGUAUUGACGGGGGCCCGCACAAGCGGUGGAGUAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCAAG-AUUGACAUGUCGCGAACCCUGGUGAAAGCUGGGGGUGCCUUCGGGAGCGCCAACACAGAUGGUGUAUAGCUGUCGCCAGCUCGUGUCGUGAGAUGUUGGGUUAUGUCCCGCAAAGAGCGCACCCCUCGUUCUUAGUUGCCAGCAUUAAG-UGGACGACUCUAAGGAGACUGCCGGUGACAAACCGGAGGAAGGUGGUGAUGACGUCAAGUCAGCAUGCCCCUUACGUCUUGGGCGACACACGUACUACAGUCGUCGGAACAAAAGGCAGUCAACUCGCGAGAGCCAGCGUAUCCCGCAAACCCGGCCUCAGUUCAGAUUGCAGGCUGCAACUCGCCUGCAUGAAGGAGGAAUCGCUAGUAAUCGCCGGUCAGCAUACGGCGGUGAAUUCGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGAAGCUGGUCACGCCCGAAGUCAUUACCUCAACCGCAAGAGGGGGGAUGCCUAAGGCAGGGCUAGUGACUGGGGUmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- cy.Osc.aga 
-----mmmmmmmmmmmmmmmmmmmmmmGAUGAACGCUGGCGGUCUGCUUAACACAUGCAAGUCGAACGGAAUCC--------UUCG--------GGAUUUAGUGGCGGACGGGUGAGUAACACGUAAGAACCUGCCUCUAGGACGGGGACAACAGUUGGAAACGACUGCUAAACCCGGAUG-AGCCGAAAGGUAAAAGA--------UUAA------UCGCCUAGAGAGGGGCUUGCGUCUGAUUAGCUAGUUGGUGAGGUAAGAGCCCACCAAGGCGACGAUCAGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGC-UGACGGAGCAAGACCGCGUGGGGGAGGAAGGUUCUUGGAUUGUCAACCCCUUUUCUCAGGGAAGAACA---------CAAU-------------GACGGUACCUGAGGAAAAAGCAUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGGGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCGUAAAGAGUCCGUAGGUAGUCAUCCAAGUCUGCUGUUAAAGAGCGAGGCUUAACCUCGUAAAGGCAGUGGAAACUGGAAGACUAGAGUGUAGUAGGGGCAGAGGGAAUUCCUGGUGUAGCGGUGAAAUGCGUAGAGAUCAGGAAGAACACCGGUGGCGAAGGCGCUCUGCUGGGCUAUAACUGACACUGAGGGACGAAAGCUAGGGGAGCGAAUGGGAUUAGAUACCCCAGUAGUCCUAGCGGUAAACGAUGGAAACUAGGUGUGGCCUGUAUCGCCCGGGCCGUGCCGAAGCAAACGCGUUAAGUUUCCCGCCUGGGGAGUACGCACGCAAGUGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGUAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCAGGACUUGACAUCUCUGGAAUCUCCUUGAAAGGGGAGAGUGCCGUAAGGAACCAGAAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUCGUUAGUUGCCAUCAUUAAG-UUGGGAACUCUAGCGAGACUGCCGGUGACAAACCGGAGGAAGGUGAGGAUGACGUCAAGUCAGCAUGGCCCUUACGUCCUGGGCGACACACGUACUACAAUGCGAGGGACAGAGAGCAGCCAACCCGCGAGGGAGAGCGAAUCUCAUAAACCCUGGCACAGUUCAGAUUGCAGGCUGCAACUCGCCUGCAUGAAGGAGGAAUCGCUAGUAAUCGGAGGUCAGCAUACUGCGGUGAAUCCGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGAAGUGAGCCACGCCCGAAGUCAUUACUCUAACCGAAAGGAGGAGGGUGCCGAAGGCGGGGUUGAUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUACCGGAAGGUGUGGCUGGAUCACCUCCUUU cy.Pho.amb 
-----mmmmmmmmmmmmmmmmmmmmmmGAUGAACGCUGGCGGUCUGCUUAACACAUGCAAGUCGAACGAACUC---------UUCG---------GAGUUAGUGGCGGACGGGUGAGUAACGCGUGAGAGUCUAGCUUCAGAACGGGGACAACAGAGGGAAACCACUGCUAAUACCCGAUG-UGCCGAGAGGUGAAAGA--------UUUA------UCAUCUGAAGAUGAGCUCGCGUCCGAUUAGCUAGUUGGUAAGGUAAGAGCUUACCAAGGCGACGAUCGGUAGCUGGUCUGAGAGGAUGACCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGC-UGACGGAGCAAGACCGCGUGAGGGAGGAAGGCUCUUGGGUUGUAAACCUCUUUUCUCUGGGAAGAAG----------AUCU-------------GACGGUACCAGAGGAAUCAGCAUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAUGCAAGCGUUAUCCGGAAUUAUUGGGCGUAAAGCGUCCGUAGGUGGCGGUUCAAGUCUGCCGUUAAAACCAGUAGCUUAACUACUGACAGGCGGUGGAAACUGAACAGCUAGAGUGUGGUAGGGGUAGAGGGAAUUCCCAGUGUAGCGGUGAAAUGCGUAGAGAUUGGGAAGAACACCGGUGGCGAAAGCGCUCUGCUGGACCACAACUGACACUCACGGACGAAAGCUAGGGGAGCGAAAGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAGGUGUAGUAGGUAUCGCCCCUACUGUGCCGUAGCUAACGCGUUAAGUAUCCCGCCUGGGGAGUACGGUCGCAAGAUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGUAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCAGGACUUGACAUGUCGCGAAUCUCUCUGAAAGGAGAGAGUGCCUUCGGGAGCGCGAACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCACGUUUUUAGUUGCCAUCAUUAAG-UUGGGCACUCUAGAGAGACUGCCGGUGACAAACCGGAGGAAGGUGUGGAUGACGUCAAGUCAGCAUGCCCCUUACGUCCUGGGCUACACACGUACUACAAUGGCUGGGACAAAGAGCUGCAAGCGAGCGAUCGCAAGCCAAUCUCAUAAACCCAGUCUUAGUUCAGAUCGCAGGCUGCAACUCGCCUGCGUGAAGGCGGAAUCGCUAGUAAUCGCAGAUCAGAAUGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGAAGCUGGUCACGCCCGAAGUCGUUAUCCUAACCUUCGGGAGGGAGAUGCCUAAGGCAGGGCUGGUGACUGGGGUGAAGUCGUAACAAGGUAGC-GUACCGGAAGGmmmmmmmmmmmmmmmm----- cy.Tri.ten 
-----mmmmmmmmmmmmmmmmmmmmmmmAUGAACGCUGGCGGUCUGCUUAACACAUGCAAGUCGAACGGACUC---------UUCG---------GAGUUAGUGGCGGACGGGUGAGUAACGCGUGAGAAUCUGCCUUCAGGUCUGGGACAACAGAAGGAAACUUCCGCUAAUCCCGGAUG-AGCCUUAGGGUAAAAGA--------UAAA------UUGCCUGGAGAUGAGCUUGCGUCUGAUUAGCUAGUUGGUGUGGUAAAAGCAUACCAAGGCAACGAUCAGUAGCUGGUCUGAGAGGAUGAGCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGCAAGCCUGACGGAGCAAGACCGCGUGGGGGAGGAAGGCUCUAGGGUUGUAAACCCCUUUUCUUUGGGAAGAAG----------AAAU-------------GACGGUACCAAAGGAAUCAGCCUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAGGCAAGCGUUAUCCCGAAUGAUUGGGCGUAAAGCGUCCGCAGGUGGCCAUGUAAGUCUGCUGUUAAAACCCAGGGCUUAACUCUGGUCAGGCAGUGGAAACUACAAAGCUAGAGUUCGGUAGGGGCAAAGGGAAUUCCCGGUGUAGCGGUGAAAUGCGUAGAUAUCGGGAAGAACAUCGGUGGCGAAAGCGCUUUGCUAGACCGAAACUGACGCUCAGGGACGAAAGCUAGGGGAGCGAAUGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAGGUGUUGCCUGUAUCGCCCAGGCAGUGCCGUAGCUAACGCGUUAAGUAUCCCGCCUGGGGAGUACGCACGCAAGUGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGUAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCAGGGCUUGACAUGUCGCGAAUCUCAAGGAAACUUGAGAGUGCCUUCGGGAGCGCGAACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUCUUUAGUUGCCAUCAUUAAG-UUGGGCACUCUGGAGAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCAGCAUGCCCCUUACGCUCUGGGCUACACACGUACUACAAUGGUUGGGACAGAGGGCAGCGAACCCGAGAGGGGAAGCCAAUCCCCAAAACCCAGCCUCAGUUCAGAUUGCAGGCUGCAACUCGCCUGCAUGAAGGAGGAAUCGCUAGUAAUCGCAGGUCAGCAUACUGCGGUGAAUCCGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGAAGUUGGCCACGCCCGAAGUCAUUACUCUAACCGAAAGGAGGGGGAUGCCGAAGGCAGGGCUGAUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUACCGGAAmmmmmmmmmmmmmmmmmm----- pl.Zea.may 
-----UACCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUGCAAGUAUGAACUAA-----CGAA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUGUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCAAAACCCUCC-GGGCGGAUCGCA--CGGUUCGCGGCG-ACG-CAUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGGGGCCUACCAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACCG-GGCGCGUUAGUGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGC-CGGGC---------CGGG-UG-CC-GCCGUGGCA----G--AACCA------CC-----AUUAAUAGGGCAGUGGGGGCACGUAUUUCAUAGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACAACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGAUCAGCGG-UAAUCCCCGCUGGCCCUUAUGAGAAAUCA--AAGUCGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGCGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---CUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGUCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAACUAGCUAUGCCAUC--UUAGCUUCUUAGAGGGACUAUGGCCGUUUAGGCCGCG-AAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGUAUCCAACGAGUACCUUGGCCGACAGGCC-GG-UAAUCUUGGAAAUUUCAUCGUGAUGGGGAUAGAUCAUUGCAAUUGUUGGUCUUCAACGAGGAAUGCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGGUCCGGUGA-AGUGUUCGGACUCGGUUCGCCGGAAGUCCAUUGAACCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUG----- pl.Pan.gin 
----CAACCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUGUAAGUAUGAACUAA-----CAGA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUGUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCAAAACCCUCU-GGACGGAUCGCA--CGGCUCGCGGCG-ACG-CAUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACUAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACCG-GGCUG-AUUCAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGU-UGGGUC-----GGCCGGUCCG-CC--UCUCGGUG---UG--CACCA------UC-----AUUAACAGGGCAGUGGGGGCACGUAUUUCAUAGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACAACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGAUCAGUGGAUUUUCUCCACUGGCCCUUAUGAGAAAUCA--AAGUUGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---CUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAAAUAGCUAUGUUAUA--CCAGCUUCUUAGAGGGACUAUGGCCUUUUAGGCCACGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGUAUUCAACGAGUACCUUGGCCGACAGGCCCGGGUAAUCUU-GAAAUUUCAUCGUGAUGGGGAUAGAUCAUUGCAAUUGUUGGUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGGUCCGGUGA-AGUGUUCGGAUGCGGUUCGAAAGAAGUCCACUGAACCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCAGAAGGAUCAGNN----- pl.Ara.tha 
-----UACCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUGUAAGUAUGAACGAA-----CAGA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUGUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCAAAACCCUAU-GGACGGAUCGCA--UGGUCUGUGGCG-ACG-CAUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUGGUAACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACCG-GGCUCUUUCGAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGA-UGGGUC-----GGCCGGUCCG-CC--UUU-GGUG---UG--CAUUG------UC-----AUUAACAGGGCAGUGGGGGCACGUAUUUCAUAGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACAACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGAUCAGCGGAUUAUCUCCGCUGGCCCUUAUGAGAAAUCA--AAGUUGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---CUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAACUAGCUACGUCAUC--CCGGCUUCUUAGAGGGACUAUGGCCGUUUAGGCCAAGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGUAUUCAACGAGACCCUU-GCCGACAGGCCCGGGUAAUCUU-GAAAUUUCAUCGUGAUGGGGAUAGAUCAUUGCAAUUGUUGGUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGAUCCGGUGA-AGUGUUCGGACGCGGUUCGCGAGAAGUCCACUAAACCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUG----- pl.Gly.max 
-----UACUUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUGUAAGUAUGAACUAA-----CAGA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUGUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCAAAACCCUCU-GGACGGAUCGCA--CGGUUUGCGGCG-ACG-CAUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACCG-GGCUC-AUUGAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAUGAUCCAUUGAAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGU-UGGGUC-----GAUCGGUCCG-CC--UCC-GGUG---UG--CACCG------UC-----AUUAACAGGGCAGUGGGGGCACGUAUUUCAUAGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACAACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGAUCAGCGGAUUUUCUCCGCUGGCCCUUAUGAGAAAUCA--AAGUCGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCU-AAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---CUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAAAUAGCUAUGUUAAC--CCAGCUUCUUAGAGGGACUAUGGCCGCUUAGGCCACGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGUAUUCAACGAGUACCUUGGCCGACAGGUCCGGGUAAUCUU-GAAAUUUCAUCGUGAUGGGGAUAGAUCAUUGCAAUUGUUGGUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGGUCCGGUGA-AGUGUUCGGAUGCGGUUCGUGAGAAGUCCACUGAACCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUG----- pl.Pis.sat 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNNNNNNNNAAGCCAUGCAUGUGUAAGUAUGAACUAA-----CAGA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUGUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCAAAACCCUUU-GGACGGAUCGCA--CGGUUUGUGGCG-ACG-CAUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUAACAC-GGGGAGGUAGUGACAAUAAAUCAAUACCG-GGCUC-AUUGAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGU-UGGGUU-----GAUCGGUCCG-CC--UCU-GGUG---UG--CACCG------UU-----AUUAACAGGGCAGUGGGGGCACGUAUUUCAUAGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACAACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGAUCAGCGGAUUUUCUCCGCUGGCCCUUAUGAGAAAUCA--AAGUCGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---CUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAAAUAGCUAUGUUAAC--CCAGCUUCUUAGAGGGACUAUGGCCGCUUAGGCCACGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGUAUUCAACGAGUACCUUGGCCGACAGGCCCGGGUAAUCUU-GAAAUUUCAUCGUGAUGGGGAUAGAUCAUUGCAAUUGUUGGUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGGUCCGGUGA-AGUGUUCGGAUGCGGUUCGUGAGAAGUCCACUGAACCUUAUCAUUUAGAGGAAGGAGAAGUCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- pl.Lyc.esc 
-----UACCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUGUAAGUAUGAACAAA-----CAGA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUGUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCAAAACCCUCUGGGACGGAUCGCA--CGGAUCGCGGCG-ACG-CAUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACCG-GGCUC-UAUGAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGA-UGGGCC-----GGCCGGUCCG-CC--CUA-GGUG-UG----CACCG------UC-----AUUAACAGGGCAGUGGGGGCACGUAUUUCAUAGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACAACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGAUCGGCGGAUUUUCUCCGCCGGCCCUUAUGAGAAAUCA--AAGUUGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---CUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAACUAGCUAUGCUAUC--CCAGCUUCUUAGAGGGACUA-GCCU-UUUAGGCCGCGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCC-CACGCGCGCUACACUGAUGUAUUCAACGAGUACCUU-GCCGACAGGCCCGGGUAAUCUU-GAAAUUUCAUCGUGAUGGGGAUAGAUCAUUGCAAUUGUUGGUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGAUCCGGUGA-AAUGUUCGGACGCGGUUCGCGAGAAGUCCAUUGAACCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUG----- pl.Sol.tub 
-----UACCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUGUAAGUAUGAACAAA-----CAGA--------UGU-GAAACUGCGAAUGGCUCAU-AAAUCAGUUAAGUUUGUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCAAAACCCUCU-GGACGGAUCGCA--CGGAUCGCGGCG-ACG-CAUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACCG-GGCUC-UAUGAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGA-UGGCAC-----GGCCGGUCCG-CC--CUA-GGUG---UG--CACCG------UC-----AUUAACAGGGCAGUGGGGGCACGUAUUUCAUAGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACAACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGACCGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGGUCGGCGGAUUUUCUCCGCCGGCCCUUAUGAGAAAUCA--AAGUUGGUUCCGGGG-GAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---CUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAACUAGCUAUGCUAUC--CCAGCUUCUUAGAGGGACUACGGCCUUUUAGGCCGCGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGUAUUCAACGAGUACCUUGGCCGACAGGCCCGGGUAAUCUU-GAAAUUUCAUCGUGAUGGGGAUAGAUCAUUGCAAUUGUUGGUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGAUCCGGUGA-AAUGUUCGGACGCGGUUCGCGAGAAGUCCAUUGAACCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUG----- pl.Gin.bil 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNUCAAAAUAAGCCAUGCAUGUGUAAGUAUGAACUCU-----CAGA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUCUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCACAAUCCUCU-GGACGGAUCGCA--CGGCUGGCGGCG-ACG-CUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGGCCUACCAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACUG-GGCUC-AUCGAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGC-CGGGUC-----GGCCGGUCCG-CC-UUUUCGGUG---UG--CACCG------CC-----AUUAAUAGGGCGGUGGGGGCACGUAUUUCAUUGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACCACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACUAGGGAUCGGCGGAUUUACUCCGCCGGCCCUUGUGAGAAAUCA--AAGUUGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAACUAGCUAUGCUUCG--CCAGCUUCUUAGAGGGACUAUGGCCCUUCAGGCCAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGUAUUCAACGAGUACCUGGGCCGAGAGGCCCGGGAAAUCUGCGAAAUUUCAUCGUGAUGGGGAUAGAUCAUUGCAAUUAUUGAUCUUAAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAACUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGAUCCGGUGA-AGUGUUCGGACGCGCUUCGCGAGAAGUUCAUUGAACCUUAUCAUUUAGAGGAAGGAGAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- pl.Pin.luc 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNUCAAAAUAAGCCAUGCAUGUCUAAGUAUGAACAU------CAGA--------UGU-GAAACUGCGGAUGGCUCAUUAAAUCAGUUAAGUUUCUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCACAAUCCCUU-GGACGGAUCGCA--CGGUUUGUGGCG-ACG-CUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGGCCUACCAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACUG-GGCUC-AUCGAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGA-CGUCAC-----GGUCGGUCCG-CC-UUCUUGGUG---UG--CACUG------CC-----AUUAAUAGGGCUGUGGGGGCACGUAUUUCAUUGUCAGAGGUGAAAUUCUUGGAUUUAUGGAAGACGAACCACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGAUCGGCGGAUCUACUCCGCCAGCCCUUCUGAGAAAUCA--GAGUGGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAACUAGCUACGCUUCC--CCAGCUUCUUAGAGGGACUAUGGCCUCCUAGGCCAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGCAACCAACGAGCUCCUGGCCCGAAAGGUUUGGGAAAUCUUCCAAAUUGCAUCGUGAUGGGGAUAGACCAUUGCAAUUAUUGAUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGAUCCGGUGA-AGUGUUCGGACGCGCUUC-CGAGAAGUUCAUUGAACCUUAUCAUUUAGAGGAAGGAGAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- pl.Tai.cry 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNUCAAAAUAAGCCAUGCAUGUCUAAGUAUGAACUAU-----CAGA--------UGU-GAAACUGCGGAUGGCUCAUUAAAUCAGUUAAGUUUCUUUGAUGGACUCGGAUAACCGUAGUAAUCUAGAGCUAAUACGUGCACAAUCCCUU-GGACGGAUCGCA--CGGUCUGCGGCG-ACG-CUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGGCCUACCAUGGUGGUGACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACUG-GGCUC-AUCGAGUUGGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGU-CGUCAC-----GGUCGGUCCG-CC-ACUC-GGUG---UG--CACUG------CC-----AUUAAUAGGGCUGUGGGGGCACGUAUUUCAUUGUCAGAGGUGAAAUUCUUGGAUUUAUGGAAGACGAACCACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUCAGAUACCGUCCUAGUCUCAACCAUAAACGAUGCCGACCAGGGAUCGGCGGAUCUACUCCGCCAGCCCUUCUGAGAAAUCA--GAGUGGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACGCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUCUAACUAGCUACGCUUCC--CCAGCUUCUUAGAGGGACUAUGGCCGUUUAGGCCAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGCAACCAACGAGCUCCUGGCCCGAAAGGUUCGGGAAAUCUUCCAAAUUGCAUCGUGAUGGGGAUAGACCAUUGCAAUUAUUGAUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGAUCCGGUGA-AGUGUUCGGACGCGCUUU-CGAGAAGUUCAUUGAACCUUAUCAUUUAGAGGAAGGAGAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- pr.Sar.neu 
-----AACCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUUAAAAUAAGCCAUGCAUGUCUAAGUAUAAGCU-------UAUA--------GGC-GAAACUGCGAAUGGCUCAUUAAAACAGUUAAGUUUAUUUGAUGGACAUGGAUAACCGUGGUAAUCUAUGGCUAAUACAUGCGCAUACCCUCGGGUCGGAUCGCAU-UGGUUUUCGGCG-AUG-GAUCAUUCAAGUUUCUGACCUAUCAGCUUGACGGUACUGUAUUGGACUACCGUGGCAGUGACGGGUACGGGGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCUA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACUC-AGGGAGGUAGUGACAAGAAAUCAACACUG-GAAAU-UAUAUUUUAGUGAUUGAUGAUGGGAAUCCAAACC------------------------------CCUUCAGAGUAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGUGCU-GGAAGC-GCCAGUCCCCC--------UUUG-------------------CACUUGGG--AUUAAUAGGGCAGUGGGGGCACGUAUUUAACUGUCAGAGGUGAAAUUCUUAGAUUUGUUAAAGACGAACUACUGCGAAAGCAUGCCAAAGAUGUUUUAUU---AAUCAAGAACGAAAGUUAGGGGCUCGAAGACGAUCAGAUACCGUCGUAGUCUUAACCAUAAACUAUGCCGACUAGAGAUAGGAAAAUCCUCUUCUCCUGCCCUUAUGAGAAAUCA--AAGUCGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGCGUGGAGC-UGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAG-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUUAACCUCUAAAUAGGGUCAAUUUG-UAUCACUUCUUAGAGGGACUUUGCGUGUCUAACGCAAGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCUGCACGCGCGCUACACUGAUGCAUCCAACGAGCACCCUGGCCGACAGGUCUAGGUAAUCUUUGAGUAUGCAUCGUGAUGGGGAUAGAUUAUUGCAAUUAUUAAUCUUCAACGAGGAAUGCCUAGUAGGCGCAAGUCAGCAGCUUGCGCCGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAGUGUUCCGGUGA-AUUAUUCGGAUGUUCUACCUGGGAAGUUUUGUGAACCUUAACACUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCAGAAGGAUCANNN----- pr.Pla.mal 
-----AACCUGGUUGAUCUUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAAGUGAAAGUAUAUGCAUAU----U--------AUAUGUAGAAACUGCGAACGGCUCAUUAAAACAGUUAAGUCUACUUGACAUAUAAGGAUAACUACGGAAAACUGUAGCUAAUACUUGCUUAAUACUUAUGUAAACACAUAA--UAAUUCG-UAGU-GUG-UAUCAAUCGAGUUUCUGACCUAUCAGCUUGAUGUUAGGGUAUUGGCCUAACAUGGCUAUGACGGGUACGGGGAAUUAGAGUUCGAUUCCGGAGAGGGAGCCUGAGAAAUAGCUACCACAUCUA-GGAAGGCAGCAGGCGCGUAAAUUACCCAAUUCUAAAGA-AGAGAGGUAGUGACAAGAAAUCAAUGCAA-GGCCA-AUUUGGUUUGCAAUUGAUGAUGGGAAUUUAAAAC------------------------------CUCCCAGA-AGGCAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAAUUGUUGCAGUUAAAACGCUCGUAGUUGAAGG-AUC-AAU-AUUUUAAUAAU--GCUU-GCCAUU-UAAUAUCUUCUU-AUU---AUAA--AAUUAAUAGGAUAGCGGGGGCAUGUAUUCAGAUGUCAGAGGUGAAAUUCUUAGAUUUUCUGGAGACAAGCAACUGCGAAAGCAUGCCUAAAAUACUUCAUU---AAUCAAGAACGAAAGUUAAGGGAGUGAAGACGAUCAGAUACCGUCGUAAUCUUAACCAUAAACUAUGCCGACUAGGUGUUGGAUGAAUAUUUCCUUCAGUCCUUAUGAGAAAUCA--AAGUCGGUUCUGGGGCGAGUAUUCGCGCAAGCGAGAAAGUUAAAAGAACCGACGGAAGGGGACAC-AGGCGUGGAGCUUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACUAGUUUAAGACAG-UA---UUAA-------------------------U--UU--CUUGGAUGGUGAUGCAUGGCCGUUUUUAGUUCGUGAAAUGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGAUCUUAACCUCUAAUUAGCGGUAAUAAU---AAGCUUCUUAGAGGAACGAUGUGUGUCUAACACAAGGAAGUUUAAGGCAACAACAGGUCGUGAUGUCCUUAGUGAACUAGGCUGCACGCGUGCUACACUGAUAUGUAUAACGAGGUUGCUCACUGAA-AGUGUAGGUAAUCUUUCAAUAUAUAUCGUGAUGGGGAUAGAUUAUUGCAAUUAUUAAUCUUGAACGAGGAAUGCCUAGUAAGCAUGAUUCAUCAGAUUGUGCUGACUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAAGAUAUGAUGA-AUUGUUUGGAAAGAAUUAUCUGGAAAAAUCGUAAAUCCUAUCUUUUAAAGGAAGGAGAAGUCGUAACAAGGUUUCCGUCGGUGAACCUGCGGAAGGAUCAUUA----- pr.Sti.hel 
-----NNNNNNNNNNNNNNNNNNNNNNNUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAACUG------UAUA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUAUUUGAUGGACUCGGAUAACUGUAGUAAUCUACAGCUAAUACGUGCUUAGUCCUCU-GGACGAAUCGCA--UAGCUUGUGGCG-AUG-UUUCAUUCAAACUUCUGCCCUAUCAACUUGACGGUAGGAUAGAGGCCUACCAUGGUGGUGACGGGUACGGAGGAUUAGGGUUCGGUUCCGGAGAGGGAGCCUGAGAAAUGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-AGGGAGGUAGUGACAAUAAAUCAAUACCG-GGCAU-UAAAUGUUGGUAAUUGAUGAUAACGAUCUAAACC------------------------------CAUAUAGAGUAUCCAUUGGAGGCAAGUCUGGUGCCNNCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUUNAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGU-GGCUUUC-----UGCGGUCCG-C---UUC-GGUG---UG--CACUG------CU-----AUUAAGAGGGCAGUGGGGGCACGUAUUUCAUUGUCAGAGGUGAAAUUCUUGGAUUUAUGAAAGACGAACUUCUGCGAAAGCAUGCCAAGUAUGUCUUAUU---AAUCAAGAACGAAAGUUGGGGGCUCGAAGACGAUUAGAUACCGUCGUAGUCUCAACCAUAAACGAUGCCGACUAGGGAUCGGUGGGUUCGCUCCUCCGGCCCUUAUGAGAAAUCA--AAGUCGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGCGUGGANNCUGCGGCUUAAUUUGACUCAACACGGGAAAACUUACCAGGUCCAGACAA-GA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGGUUGCCUUGUCAGGUUGAUUCC-GGAACGAACGAGACCUCAGCCUCUAAAUAGUNNNUGUCUU-GCCGACUUCUUAGAGGGACUAUUGUCGUUUAGGCAAUGGAAGUAUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGCAUUCAACGAGUACCUUGGCCGAGAGGCC-GGGUAAUCUU-GAAACUGCAUCGUGAUGGGGAUAGAUUAUUGCAAUUAUUAGUCUUCAACGAGGAAUGCCUAGUAAGCGCGAGUCAUCAGCUCGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGGGUGUGCUGGUGA-AGNNUUCGGAUGAGAUCU-CGAGAAGUUCGUUAAACCCUCCCACCUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCNNNNNNNNNNNNNNNNNNNNNNNNNNN----- pr.Mal.mat 
-----CACCUGGUUGAUCCUGCCAGUAG-CAUACGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAACAAC-----UGUA--------UGU-GAAACUGCGAAUGGCUCAUUAUAUCAGUUAAGUUUAUUUGAUGGACAUGGAUAACCGUAGUAAUCUAGAGCUAAUACAUGCAUAAGCCUUC-GGCCGGAUCGAU--C--UUCGGAUCG-AUG-CAUCAUUCAAGUUUCUGCCCUAUCAGCUUGAUGGUAGGGUAUUGGCCUACCAUGGCUUUAACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAAUGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGUAAAUUACCCAAUCCUGACAC-AGGGAGGUAGUGACAAUAAAUCAAUGCCG-GGCUUAUUUAAGUUGGCAAUUGAUGAGAACAAUUUAAAUC------------------------------CCUAUCGAGGAUCAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUACUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGUGGU-UGUGGCA-----AACAGUCCG-CCC-AACAGGUGU---GU--GCUA---------U-G-AUUAAUAGGGUAGUGGGGGUACGUAUUCAAUUGUCAGAGGUGAAAUUCUUGGAUUUAUGGAAGACGAACUACUGCGAAAGCAUACCAAAUAUGUUUUAUU---AAUCAAGAACGAAAGUUAGGGGAUCGAAGAUGAUUAGAUACCAUCGUAGUCUUAACCAUAAACUAUGCCGACUAGGGAUUGGUGGCGUUACUCCAUCAGCCCUUAUGAGAAAUCA--AAGUCGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGAAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-GA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCCCCGCCUCUAAAUAGUCGUACUUAGUGUCGGCUUCUUAGAGGGACUUUUGGUGACUAACCGAAGGAAGUUGGGGGCAAUAACAGGUCGUGAUGCCCUUAGUGUCCUGGGCCGCACGCGCGCUACACUGACACACGCAACGAG-UCCUUGAUUGAAAAAUCCGGGUAAUCUUUAAAUGUGUGUCGUGAUAGGGAUAGAUUAUUGCAAUUAUUAAUCUUGAACGAGGAAUUCCUAGUAAAUGCGAGUCAUCAGCUCGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCACCUACCGAUGAAUGAUUCGGUGA-AAAUUCCGGAUGUGGGCAAUAGGAAGUUAUUUAAACCUCAUCAUUUAGAGGAAGGUGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCAGAAGAUCCAGCUUCCUC pr.Cyc.gla 
-----NNNNNNNNNNNNNNNNNNNNUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAAU--------AAUA--------AGU-GAAACUGCGAAUGGCUCAUUAAAACAGUUAAGUUUAUUUGAUGAACAUGGAUAACCGUGGUAAUCUAGAGCUAAUACAUGCAUAGGCCUCACGGCCGGAUCCCC-----AACU-GGGG-AUAAGAUCANUCAAGUUUCUGCCCUAUCAGCUUGAUGGUAGUGUAUUGGACUACCAUGGCAGUAACGGGUACGGAGAAUUAGGGUUCGGUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCUA-GGAAGGCAGCAGGCGCGUAAAUUACCCAAUGCAGAUUC-UGCGAGGUAGUGACAAUACAUCAACCUGG-GGUC-UCACGACUACGGGAUUGAUGAGAACAAUUUAAAAA------------------------------CCUAACGAGGAACAAGUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCACUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGUGGU-UGNGCGC------AGUGGUCG-GUC-AUC--GAUC------GCCCG-------------AUUAAUAGGGCAGUGGGGGCAAGUAUUUAAUUGUCAGAGGUGAAAUUCUUGGAUUUAUUAAAGACUAACUUAUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUAGGGGAUCAAAGACGAUCAGAUACCGUCCUAGUCUUAACUAUAAACUAUACCGACUCGGUUUUGGUGGGGUAUCUCUAUCAGCCCGUAUGAGAAAUCA--AAGUCGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGUGUGGACGCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCAAAACAG-UG---UUGA-------------------------U--UC--UGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGCUUAAUUGC-GUAACGAACGAGACCUUAACCUCUUAAUAGAUGCCUGUAC--UAUUCUUCUUAGAGGGACUAUGCGUUUCAAGCGCAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCCUAGUGUCCUGGGCCGCACGCGCGCUACACUGACCCACUCAGCAAGCUCCUGGCCUGGAAAGGUUGGGUAAUCUUGCAAUAUGGGUCGUGUUAGGGAUCGAUCUUUGCAAUUAUAGAUCUUGAACGAGGAAUACCUAGUAAGUGCAAGUCAUCAGCUUGUACUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGGUCCGGUGA-ACCUUCUGGACAGUCGCAACGCGAAGUUAAGUAAACCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUNNNNNNNNNNNNNNNNNNNNNNNNN----- pr.Tet.bor 
-----AACCUGGUUGAUCCUGCCAGUA--CAUAUGCU-UGUCUUAAAAUAACCCAUGCAUGUGCCAGUUCA-GU--------UGAA--------AGC-GAAACUGCGAAUGGCUCAUUAAAACAGUUAAGUUUAUUUGAUAAACAUGGAUAACCGAGCUAAUGUUGGGCUAAUACAUGCUUAAUUCCCUGGAACGAAUCGAA--G--CUUGCUUCG-AUA-AAUCAUCUAAGUUUCUGCCCUAUCAGCUCGAUGGUAGUGUAUUGGACUACCAUGGCAGUCACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAAGGAGCCUGAGAAACGGCUACUACAACUC-GGUCGGCAGCAGGGAAGAAAAUUGGCCAAUCCUAAUUC-AGGGAGCCAGUGACAAGAAAUCAAGCUGG-GAAACCUA-GUUUUCGGCAUUGAUGAGAAAAGUGUUAAUC------------------------------UCUAGCGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGUGUU-CAGGUUCU-----UUCGA--------CUCG------------UCG----------G-GAAUUAAUAGGGCAGUGGGGGCAAGUAUUUAAUAGUCAGAGGUGAAAUUCUUGGAUUUAUUAAGGACUAACUAAUGCGAAAGCAUGCCAAAGAUGUUUUAUU---AAUCAAGAACGAAAGUUAGGGGAUCAAAGACGAUCAGAUACCGUCGUAGUCUUAACUAUAAACUAUACCGACUCGGGAUCGGCUGGAUAAUCCAGUCGGCCCGUAUGAGAAAUCA--AAGUCGGUUCUGGGGGAAGUAUGGACGCAAGUCUGAAACUUAAAGGAAUUGACGGAACAGCACACCAGAAGUGGAACCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACGAGCGCAAGACAA-AG---UUGA-------------------------U--UC--UUUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUUAACCUCUAACUAGUCUGCUAACA--UGUACUUCUUAGAGGGACUAUUGGCAAGAAGCCAAUGGAAGUUUAAGGCAAUAACAGGUCGUGAUGCCCCUAGCGUGCUCGGCCGCACGCGCGUUACAAUGACUGGCGCAGAAAGUUCCUGUCCUGGGAAGGUAGGGUAAUCUUUUAAUACCAGUCGUGUUAGGGAUAGUUCUUUGGAAUUGUGGAUCUUGAACGAGGAAUUUCUAGUAAGUGCAAGUCAUCAGCUUGCGUUGAUUAUGUCCCUGCCGUUUGUACACACCGCCCGUCGCUUGUAGUAACGAAUGGUCUGGUGA-ACCUUCUGGAUGCGGGCAACGGAAAAAUAAGUAAACCCUACCAUUUGGAACAACAAGAAGUCGUAACAAGGUAUCUGUAGGUGAACCUGCAGAUGGAUCAUUA----- pr.Rho.sal 
-----UACCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUUAGUGUAAAUA-------UGU---------UAU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUACGUUUAUUUGAUGGACAUGGAUAACCGUAGUAAUCUAGAGCUAAUACAUGCACAGGUCUUACGACCGAACACAG--C--CUUGAGUGU-GUG-AUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGGCCUACCAUGGUUUUAACGGGUACGGAGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAGACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCC-GACUU-GGGGAGGUAGUGACAAUAAAUCAAUACAG-GGCU--UAC-AGUUUGUUAUUGAUGAGAACAAUUUAAAUC------------------------------UCUUACGAGGAUCAAUUAGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCUAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUCGGGGC-UCGGGGA-----GGCUGUCGG-C---UUCG--------GUGGACGC-----------AUAUUAACAGGGCAGUGGGGCCGUAUAUUUCGUUGUCAGAGGUGAAAUUCUUGGAUUUACGAAAGAUAAACUUCUGCGAAAGCACGGCAAGGAUGUUUUAUU---GAUCAAGAACGAAAGUUAGGGGAUCGAAGACGAUCAGAUACCGUCGUAGUCUUAACCAUAAACUAUGCCGACUAGGGAUCAGUGGAUUUGCUCCAUUGGCCCUUGUGAGAAAUCA--AAGUUGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCUUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCAGCCUGUAACUAGUGACGCUUU--GCCCACUUCUUAGAGGGACUAUUUGUGUUUAAUGAAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGAAUGCAACGAGCUCCUUAUUCGAAAGAAUCGGGUAAACUUUGAAAAUUCAUCGUGAUGGGGAUAGAUUAUUGCAAUUAUUAAUCUUCAACGAGGAAUUCCUAGUAAGCGCGAGUCAUCAGCUCGUGCUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGGUCCGGUGA-AAUCUUCGGAUGCAGUUAUCGAGAAGUUGAUUGAACCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUG----- pr.Por.aca 
-----NNNNNNNNNNNNNNNNNNNNNAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAACGCC-----UAUA-------UCGU-GAAACUGCGAAUGGCUCAUUAAAACAGUUAAGUUCCUUUGGGAAACUUGGAUAGCCGUAGUAAUCUAGAGCUAAUACAUGCCGACGCCUCACGGUCGGAUCGCA--CGGCAUGUGGCG-ACG-CCUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGAGUAUUGGCCUACCAUGGUGUCGACGGGUACGGGGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCCAACAC-GGGGAGGUAGUGACAAAAAAUCAAUAGGG-GGCCCUUUACGGUCUCUAAUUGGUGAGAACAAUUUAAAUU------------------------------CCUAUCAAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAACGCUCGUAGUCGGGAG-CACCUC-------CACGGGCC-GGGUUCUGUUGC---CAG-UUGGG-------------AUUAAUAGGGCAGUGGGGGCACGUAUUUCAUUGUCAGAGGUGAAAUUCUUGGAUUGAUGGAAGACGCACAACUGCGAAAGCAUGCCAUGGAUGUUUUAUU---GAUCAAGAACGAAAGUUAGGGGAUCGAAGACGAUCAGAUACCGUCGUAGUCUUAACCAUAAACGAUGCCGACUGGGGAUUGGCGGGUUUUCUCCGUCAGCCCCUAUGGGAAACCA--AAGUUGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGAAAACUUACCAGGUCCGGACAA-GA---CUGA-------------------------U--UU--UUUGGUUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GUAACGAACGAGACCUCGUCCUCUAAAUAGGUGCGCCCCGUCUUACCUUCUUAGAGGGACUAUCCGCGUCUAGCGUAUGGAAGAUUGAGGCAAUAAUAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGCAUUCAACAAGAGCCUGGACCGGAAGGCUCGGGUAACCUUUGAAACUGCAUCGUGAUGGGGAUAGAUCAUUGCAAUUAUUGAUCUUGAACGAGGAAUUCCUUGUAGGCGUAGGUCACUAGCCUGCGCCGAAUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUCCUACCGAUGAAUGGUCCGAUGA-AGUGUUCGGAUGCGGUUA-UUAGAAGUUCAUUAAAUCUUAUCAUUUAGAGGAAGGAGAAGUCGUAACAAGGUUUCCNNNNNNNNNNNNNNNNNNNNNNNNNNN----- pr.Bla.hom 
-----NNNNNNNNNNNNNNNNNNNNNNGUCAUACGCU-CGUCUCAAAAUAAGCCAUGCAUGUGUAAGUGUAAAU--------ACUA--------UUU-GGAACUGCGAAUGGCUCAUUAUAUCAGUUAAGUUUAUUUGGUGAACUUGGAUAACCGUAGUAAUCUAGGGCUAAUACAUGAGAGUCCUUGGUAGGCGUAUCGCA--UGCUUAAUAGCG-AUG-AGUCUUUCAAGUUUCUGCCCUAUCAGCUUGAUGGUAGUAUAUGGGCCUACCAUGGCAGUAACGGGUACGAAGAAUUUGGGUUCGAUUUCGGAGAGGGAGCCUGAGAGAUGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGUAAAUUACCCAAUCCUGACAC-AGGGAGGUAGUGACAAUAAAUCAAUGCGG-GACU--AUC-AGUUUGCAAUUGUUGAGAACAAUGUACAAC------------------------------UCUAUCGAUAAGCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAACGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGU-GAUCGCU-----GUUGUGAGA-C---UUCGUCUCU---CG--ACAU-------------GUUAAAAGGACAGUGGGGGUACAUAUUCACUAGUUAGAGGUGAAAUUCUCGGAUUUAUGGAAGAUGAACAAGUGCGAAAGCAUACCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGCUAGGGGAUCGAAGAGGAUUAGAUACCCUCGUAGUCUUAGCUAUAAACGAUACCGACUAGGGGUUAGUAGA-AAGCUUUAUUAGUCCUUAUGAGAAAUCA--AAGUCGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUUACCAGGUCCAGACAA-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGCUAAUUCC-GAAACGAACGAGACCUCCGCCUUUAACUAGUGACGUUGUG---UUGCUUCUUAUAGGGACACUAUAUGUAAAAUGUAGGGAAGCUGGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCUGCACGCGCGCGACACUGAUCUAUUCAACGAG--GCUGGGUCGAGAGACUUGGCAAAUCUUUGAAAGUAGAUCGUGAUGGGGAUUGAUGCUUGUAAUUGUUCAUCAUGAACGAGGAAUUCCUAGUAAACGCAAGUCAUCAACUUGCAUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCACCUACCGAUGAAUGGUCCGGUGA-ACACUUUGGAUUGAGUAAAGGAGAAGUCGUGUAAAUCUUAUCAUUUAGAGGAAGGUGAAGUCGUAACAAGGUUUCCNNNNNNNNNNNNNNNNNNNNNNNNNNN----- pr.Try.equ 
-----NNNNNNNNUGAUUCUGCCAGUAGUCAUAUGCU-UGUUUCAAGAUUAGCCAUGCAUGCCUCAGAAUCACUGC------U----------GCAG-GAAUCUGCGCAUGGCUCAUUACAUCAGACGAAUCUGCGCCAAAAUACUGGAUAACUUGGCGAAAGCCAAGCUAAUACAUGAACUCGGAUUA-UCCGAAAGCCG---GC-CUUGCUCGG-CGUCUACUGAC-GAACAACUGCCCUAUCAGCCAGAUGGCCGUGUAGUGGACUGCCAUGGCGUUGACGGGAGCGGGGGAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAAUAGCUACCACUUCUC-GGAGGGCAGCAGGCGCGCAAAUUGCCCAAUGUCGAAAACGAUGAGGCAGCGAAAAGAAAUGAGCGACC-GUGCCCUAGUGCAGGUUGUUUUAUGGGGGAUACUCAAACC--------------------------CAUCCAUAUCGAGUAACAAUUGGAGACAAGUCUGGUGCCAGCACCCGCGGUAAUUCCAGCUCCAAAAGCGUAUAUUAAUGCUGUUGCUGUUAAAGGGUUCGUAGUUGGGGC-CACGUAG-------UGUGCC--GUGACCUC-GGACGUGUUGACCCACGC--------CCUGAAGGAGGGAGUUGGGGGAGCGUACUGGUGCGUCAGAGGUGAAAUUCUUAGACCGCACCAAGACGAACUACAGCGAAGGCACUUCAAGGAUACCUUCUC---AAUCAAGAACCAAAGUGUGGGGAUCAAAGAUGAUUAGAGACCAUUGUAGUCCACACUGCAAACCAUGACACCCAUGAAUUGGGGAA----UU-UCUUACUCUUCACGCGAAAGCUUGGAGGUGUCUCAGGGGGGAGUACGUUCGCAAGAGUGAAACUUAAAGAAAUUGACGGAAUGGCACACAAGACGUGGAGCGUGCGGUUUAAUUUGACUCAACACGGGGAACUUUACCAGAUCCGGACAG-GA---UGGA-------------------------U--CC--CCUGAAUGGUGGUGCAUGGCCGCUUUUGGUCGGUGGAGUGAUUUGUUUGGUUGAUUCC-GUAACGGACGAGAUCCAAGCUGCCCAGUAGGUGCCGUUUU-GGCCCCUUCUCUGCGGGAUUCCUUGCUUUCGGCAAGGUGAGAUUUUGGGCAACAGCAGGUCGUGAUGCUCCUCAUGUUCUGGGCGACACGCGCACUACAAUGUCAGUGAGAACAGUCCCUUG-AUCAAAAGAGC-GGGGAAACCAAAUCACGUAGACCCACUUGGGACCGAGUAUUGCAAUUAUUGGUCGCCAACGAGGAAUGUCUCGUAGGCGCAGCUCAUCAAACUGUGCCGAUUACGUCCCUGCCAUUUGUACACACCGCCCGUCGUUGUUUCCGAUGAUGGUGCAAUACA-GGUGAUCGGACGUCGUCU-CCGAAAGUUCACCGAUAUUGCUUCAAUAGAGGAAGCAAAAGUCGUAACAAGGUAGCUGUAGNNNNNNNNNNNNNNNNNNNNNNN----- an.Dro.mel 
-----AUUCUGGUUGAUCCUGCCAGUAGUUAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUACACACG-------UUAA--------AGU-GAAACCGCAAAAGGCUCAUUAUAUCAGUUAGGUUCCUUAGAUCGACUUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCAAAACAUUUAUGUGCAGAUCGUA--UGGCUUGCGACG-ACA-GAUCUUUCAAAUGUCUGCCCUAUCAACUUGAUGGUAGUAUCUAGGACUACCAUGGUUGCAACGGGUACGGGGAAUCAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCUA-GGAAGGCAGCAGGCGCGUAAAUUACCCACUCCCAGCUC-GGGGAGGUAGUGACGAAAAAUCAAUACAG-GACUCAUCCGAGCCUGUAAUUGAUGAGUACACUUUAAAUC------------------------------CUUAACAAGGACCAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCGGUUAAAACGUUCGUAGUUGUGCU-UCAUAC-----GGGUAGUACA-ACU-AAUUGUUA-GU-----ACUU----ACCUUUUG-AUUAAUAGAACAGUGGGGGCAAGUAUUACGACGCGAGAGGUGAAAUUCUUGGACCGUCGUAAGACUAACUUAAGCGAAAGCAUGCCAAAGAUGUUUUAUU---AAUCAAGAACGAAAGUUAGAGGUUCGAAGGCGAUCAGAUACCGCCCUAGUUCUAACCAUAAACGAUGCCAGCUAGCAAUUGGGUGUUUUUCUCUCUCAGUGCUUCCGGGAAACCA--AAGCUGGCUCCGGGGGAAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGAAAACUUACCAGGUC-GAACAA-UG---UUGA-------------------------U--CC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUCGUGGAGUGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACUCAAAUAUUUAAAUAGAUAC--UGUGCACUAGCUUCUUAAAUGGACAAAUUGCGUCUAGCAAUAUGAGA-UU-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUCCUGGGCUGCACGCGCGCUACAAUGAAAGUAUCAACGUGUUCCUAGACCGAGAGGUCCGGGUAAACCGUGAACCACUUUCAUGCUUGGGAUUGUGAACUGAAACUG-UUCACAUGAACUUGGAAUUCCCAGUAAGUGUGAGUCAUUAACUCGCAUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUUAUUUAGUGA-GGUCUCCGGAGUGAUCUUGGCAAAAGUUGACCGAACUUGAUUAUUUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUA----- an.Lam.aep 
-----NACCUGGUUGAUCCUGCCAGUAG-CAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUGCGUGCAAACGG------GUUA--------AGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAGGUUCCUUUGAUCGGCUUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCUGAGCGCUCCGGCGCAGAUCGCA--CGGCUGGUGGCG-ACG-UAUCUUUCGAAUGUCUGCCCUAUCAACUUGAUGGCAGGCUCCGUGCCUACCAUGGUGACCACGGGUACGGGGAAUCAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCACUCCCGACUC-GGGGAGGUAGUGACGAAAAAUCAAUACAG-GACUCUUUCGAGCCUGUAAUUGAUGAGUACACUUUAAAUC------------------------------CUUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGCUGUUGCAGUUAAAAAGCUCGUAGUUGGGGA-AGGGGCU-----UGCGGUCCG-CC--GCGAGGUGU--GU--CACUC-------CUG--CAUUAAGAGGGCUGCGGGGGCACGUAUUGUGCCGUUAGAGGUGAAAUUCUUGGAUCGGCGCAAGACGAGCGAAAGCGAAAGCAUGCCAAGAAUGUCUUAUU---AAUCAAGAACGAAAGUCGGAGGUUCGAAGGCGAUCAGAUACCGCCCUAGUUCCGACCAUAAACGAUGCCAACUGGCGAUCAGGCGGUCCCCCUGCCUGGCGCUUGCGGGAAACCA--AAGUGGGUUCCGGGGGAAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGAAAACUCACCCGGCCCGGACAG-GA---UUGA-------------------------U--CC--CGUGGGUGGUGGUGCAUGGCCGUUCGUAGUUGGUGGAGCGAUUUGUCUGGUUCAUUCC-GAAACGAACGAGACUCCGACGUCUAACUAGCUACGCUGUG--UGAGCUUCUUAGAGGGACGAGUGGCUUUCAGCCACACGAGA-UU-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUCCGGGGCCGCACGCGCGCUACACUGAAUGGAUCAGCGUGCGCCUUCGCCGGCAGGCGUGGGUAACCCGUGAAACCCAUUCGUGAUAGGGAUUGGGGAUUGGAACUGUUCCCCAUGAACGAGGAAUUCCCAGUAAGCGCGAGUCAUAAGCUUGCGUUGAUUAAGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGGAUGGUUUAGUGA-GGUCCUCGGAUGGCCGCGAUGAGAAGACGAUCGAACUUGACUGUCUAGAGGAAGUAAAAGUCGUNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- an.Xen.lae 
-----UACCUGGUUGAUCCUGCCAGUAG-CAUAUGCU-UGUCUCAAAAUAAGCCAUGCACGUGUAAGUACGCACGG------GGUA--------AGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAGGUUCCUUUGAUCGACUUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCCGAGCGCCCCAGCGCCGAUCGCA--CGUCCCGCGGCG-ACG-AUACAUUCGGAUGUCUGCCCUAUCAACUUGAUGGUACUUUCUGCGCCUACCAUGGUGACCACGGGUACGGGGAAUCAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCACUCCCGACGC-GGGGAGGUAGUGACGAAAAAUCAAUACAG-GACUCUUUCGAGCCUGUAAUUGAUGAGUACACUUUAAAUC------------------------------CUUAACGAGGAUCUAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGCUGCAGUUAAAAAGCUCGUAGUUGGGGA-UCGAGCU-----GGCGGUCCG-CC--GCGAGGCG---GC--UACCC-------CUG--CAUUAAGAGGGCGGCGGGGGCACGUAUUGUGCCGCUAGAGGUGAAAUUCUUGGACCGGCGCAAGACGAACCAAAGCGAAAGCAUGCCAAGAAUGUUUUAUU---AAUCAAGAACGAAAGUCGGAGGUUCGAAGACGAUCAGAUACCGUCGUAGUUCCGACCAUAAACGAUGCCGACUAGCGAUCCGGCGGUCCCCCCGCCGAGCGCUUCCGGGAAACCA--AAGUCGGUUCCGGGGGGAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGAAACCUCACCCGGCCCGGACAG-AA---UUGA-------------------------U--CC--UGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACUCCUCCAUCUAACUAGUUACGCCCGG--CCAACUUCUUAGAGGGACAAGUGGCGUUCAGCCACACGAGA-UC-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUCCGGGGCUGCACGCGCGCUACACUGAACGGAUCAGCGUGUACCUGCGCCGACAGGUGCGGGUAACCCGUGAACCCCGUUCGUGAUAGGGAUCGGGGAUUGCAAUUAUUUCCCAUGAACGAGGAAUUCCCAGUAAGUGCGGGUCAUAAGCUCGCGUUGAUUAAGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGGAUGGUUUAGUGA-GGUCCUCGGACGGCCGCCACGAGAAGACGAUCAAACUUGACUAUCUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUA----- an.Hom.sap 
-----UACCUGGUUGAUCCUGCCAGUAG-CAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUACGCACGG------GGUA--------AGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAGGUUCCUUUGGUCGACUUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCCGGGCGC-CGCGCGCCGAUCGCA--CGCCCCGCGGCG-ACG-ACCCAUUCGAACGUCUGCCCUAUCAACUUGAUGGUAGUCGCCGUGCCUACCAUGGUGACCACGGGUACGGGGAAUCAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCACUCCCGACCC-GGGGAGGUAGUGACGAAAAAUCAAUACAG-GACUCUUUCGAGCCUGUAAUUGAUGAGUCCACUUUAAAUC------------------------------CUUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGCUGCAGUUAAAAAGCUCGUAGUUGGGGA-GCGGGC-----GGGCGGUCCG-CC--GCGAGGCG-AGCC---ACCC------CCGU---AUUAAGAGGGCGGCGGGGGCACGUAUUGCGCCGCUAGAGGUGAAAUUCUUGGACCGGCGCAAGACGGACCAGAGCGAAAGCAUGCCAAGAAUGUUUUAUU---AAUCAAGAACGAAAGUCGGAGGUUCGAAGACGAUCAGAUACCGUCGUAGUUCCGACCAUAAACGAUGCCGACCGGCGAUGCGGCGGUCCCCCCGCCGGGCGCUUCCGGGAAACCA--AAGUCGGUUCCGGGGGGAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGAAACCUCACCCGGCCCGGACAG-CA---UUGA-------------------------U--CC--CGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACUCUGGCAUCUAACUAGUUACGCCGAGCCCCAACUUCUUAGAGGGACAAGUGGCGUUCAGCCACCCGAGA-UU-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUCCGGGGCUGCACGCGCGCUACACUGACUGGCUCAGCGUGUACCUACGCCGGCAGGCGCGGGUAACCCGUGAACCCCAUUCGUGAUGGGGAUCGGGGAUUGCAAUUAUUCCCCAUGAACGAGGAAUUCCCAGUAAGUGCGGGUCAUAAGCUUGCGUUGAUUAAGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGGAUGGUUUAGUGA-GGCCCUCGGACGGCCGCCCUGAGAAGACGGUCGAACUUGACUAUCUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUA----- an.Mus.mus 
-----UACCUGGUUGAUCCUGCCAGUAG-CAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUACGCACGG------GGUA--------AGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAGGUUCCUUUGGUCGACUUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCCGGGCGCUCCCGCGCCGAUCGCA--CGCCCCGCGGCG-ACG-ACCCAUUCGAACGUCUGCCCUAUCAACUUGAUGGUAGUCGCCGUGCCUACCAUGGUGACCACGGGUACGGGGAAUCAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCACUCCCGACCC-GGGGAGGUAGUGACGAAAAAUCAAUACAG-GACUCUUUCGAGCCUGUAAUUGAUGAGUCCACUUUAAAUC------------------------------CUUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGCUGCAGUUAAAAAGCUCGUAGUUGGGGA-GCGGGC-----GGGCGGUCCG-CC--GCGAGGCGA--GU--CACCC------CCGU--CAUUAAGAGGGCGGCGGGGGCACGUAUUGCGCCGCUAGAGGUGAAAUUCUUGGACCGGCGCAAGACGGACCAGAGCGAAAGCAUGCCAAGAAUGUUUUAUU---AAUCAAGAACGAAAGUCGGAGGUUCGAAGACGAUCAGAUACCGUCGUAGUUCCGACCAUAAACGAUGCCGACUGGCGAUGCGGCGGUCCCCCCGCCGGGCGCUUCCGGGAAACCA--AAGUCGGUUCCGGGGGGAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGG-GCCUGCGGCUUAAUUUGACUCAACACGGGAAACCUCACCCGGCCCGGACAG-CA---UUGA-------------------------U--CC--CGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACUCUGGCAUCUAACUAGUUACGCCGAGCCCCAACUUCUUAGAGGGACAAGUGGCGUUCAGCCACCCGAGA-UU-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUCCGGGGCUGCACGCGCGCUACACUGACUGGCUCAGCGUGUACCUGCGCCGGCAGGCGCGGGUAACCCGUGAACCCCAUUCGUGAUGGGGAUCGGGGAUUGCAAUUAUUCCCCAUGAACGAGGAAUUCCCAGUAAGUGCGGGUCAUAAGCUUGCGUUGAUUAAGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGGAUGGUUUAGUGA-GGCCCUCGGACGGCCGCCCUGAGAAGACGGUCGAACUUGACUAUCUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUA----- an.Rat.nor 
-----UACCUGGUUGAUCCUGCCAGUAG-CAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUACGCACGG------GGUA--------AGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAGGUUCCUUUG-UCGACUUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCCGGGCGCUCCCGCGCCGAUCGCA--CGCUCCGCGGCG-ACG-ACCCAUUCGAACGUCUGCCCUAUCAACUUGAUGGUAGUCGCCGUGCCUACCAUGGUGACCACGGGUACGGGGAAUCAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCACUCCCGACCC-GGGGAGGUAGUGACGAAAAAUCAAUACAG-GACUCUUUCGAGCCUGUAAUUGAUGAGUCCACUUUAAAUC------------------------------CUUAACGAGGAUCCAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGCUGCAGUUAAAAAGCUCGUAGUUGGGGA-GCGGGC-----GGGCGGUCCG-CC--GCGAGGCGA--GU--CACCC-------CCG--CAUUAAGAGGGCGGCGGGGGCACGUAUUGCGCCGCUAGAGGUGAAAUUCUUGGACCGGCGCAAGACGGACCAGAGCGAAAGCAUGCCAAGAAUGUUUUAUU---AAUCAAGAACGAAAGUCGGAGGUUCGAAGACGAUCAGAUACCGUCGUAGUUCCGACCAUAAACGAUGCCGACUGGCGAUGCGGCGGUCCCCCCGCCGGGCGCUUCCGGGAAACCA--AAGUCGGUUCCGGGGGGAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGG-GCCUGCGGCUUAAUUUGACUCAACACGGGAAACCUCACCCGGCCCGGACAG-CA---UUGA-------------------------U--CC--CGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACUCUGGCAUCUAACUAGUUACGCCGAGCCCCAACUUCUUAGAGGGACAAGUGGCGUUCAGCCACCCGAGA-UU-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUCCGGGGCUGCACGCGCGCUACACUGACUGGCUCAGCGUGUACCUACGCCGGCAGGCGCGGGUAACCCGUGAACCCCAUUCGUGAUGGGGAUCGGGGAUUGCAAUUAUUCCCCAUGAACGAGGAAUUCCCAGUAAGUGCGGGUCAUAAGCUUGCGUUGAUUAAGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGGAUGGUUUAGUGA-GGCCCUCGGACGGCCGCCCUGAGAAGACGGUCGAACUUGACUAUCUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUA----- an.Ast.amu 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-------NNAA--------AGU-GAAACUGCGGACGGCUCAUUAAAUCAGUUAGGUUCCUUGGAGCGACAUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCAAAGCGCUUACGCGCCGAUCGCA--CGGUCUGCGGCG-ACG-GAUCCUUCGAAUGUCUGCCCUAUCAACUUGAUGGUACGUUAUGCGCCUACCAUGGUCGUAACGGGUACGGAGAAUCAGGGUUCGAUUCCGGAGAGGGAGCUUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCACUCC-GACAC-GGGGAGGUAGUGACGAAAAAUCAAUACAG-GACUCUUUCGAGCCGGUAAUUGAUGAGUACACUUUAAAUC------------------------------CUUAACGAGGAUCUAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAAUUGCUGCAGUUAAAAAGCUCGUAGUUGGGGC-GCAGCGU-----GGCGGUCCG-C---AAG--GCC---G---CACUC-------CCG---AUUAAGAGGGCUGAGGGGGCACGUAUUGCGGUGUGAGAGGUGAAAUUCUUGGAUCGCCGCAAGACGACCGACUGCGAAAGCAUGCCAAGAAUGUUUUAUU---AAUCAAGAACGAAAGUUAGAGGUUCGAAGGCGAUCAGAUACCGCCCUAGUUCUAACCAUAAACGAUGCCGACU--CGAUC-GCCGGUCCACGCGGCGGGCGUCUGCGGGAAACCA--AAGUCGGUUCCGGGGGAAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCCGGCCCGGACAA-GA---UUGA-------------------------U--UC--UGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACUCUGGCUUCUAAAUAGUCGCGCGCCG--CUGACUUCUUAGAGGGACAAGUGCGGUUCAGCCACGCGAGA-UU-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCGGGGC-GCACGCGCGCUACACUGAAGGGAUCAGCGGGCGCCUUGUCUGGAAGGCCUGGGCAAUCCGUGAACCCCCUUCGUGCUUGGGAUAGGGACUUGCAAUUGUCUCCCUUAAACGAGGAAUUCCCAGUAAGCGCAAGUCAUCAGCUCGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGUUUAGUGA-GAUCCUCGGAGGGCCUCUGCCGGAAGACCAUCGAACUUNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- an.Str.int 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-------NNNN--------CGACGAAACUGCGGAUGGCUCAUUAAAUCAGUUAGGUUCAUUGGAUCGACAUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCGUAGCGCUUCCGCGCCGAUCGCA--CGGUUUGCGGCG-ACG-GAUCCUUCGAAUGUCUGCCCUAUCAACUUGAUGGUACGUUAUGCGCCUACCAUGGUCGUCACGGGUACGGAGAAUCAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCACUCCCGACAC-GGGGAGGUAGUGACGAAAAAUCAAUACAG-GACUCUUUCGAGCCUGUAAUUGAUGAGUACACUUUAAAUC------------------------------CUUAACGAGGAUCCACUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAGUAGCGUAUAUUAAAGCUGUUGCAGUUAAAAAGCUCGUAGUUG-GGGCCCAGGC------UGCGGUCCG-C--CGUGU-GUG---U----ACUC-------A-G---AUUAAGAGGGCUGAGGGGGCACGUAUUGCGGUGUGAGAGGUGAAAUUCUUGGAUCGCCGCAAGACGAACGACUGCGAAAGCAUGCCAAGAAUGUUUUAUU---AAUCAAGAACGAAAGUUAGAGGUUCGAAGGCGAUCAGAUACCGCCCUAGUUCUAACCAUAAACGAUGCCGACUGACGAUCCGCCGGUCCCCGCGGCGGGCGUCUAAGGGAAACCA--AAGUCGGUUCCGGGGGAAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGAAAACUCACCCGGCCCGGACAA-GA---UUGA-------------------------U--UC--UGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACUCUGGCUUCUAAAUAGUUGCGCGCCG--UCAACUUCUUAGAGGGACAAGUGGCGUUUAGCCAGGCGAGA-UU-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCGGGGC-GCACGCGCGCUACACUGGCGGAAUCAGCGGGUGCCUUGGCCGGAAGGUCUGGGUAAUCCGUGAACCUCCUCCGUGAUGGGGAUAGGGAGUUGCAAUUAUCUCCCUUGAACGAGGAAUUCCCAGUAAGCGCGAGUCAUCAGCUCGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGUUUAGUGA-GAUCCUCGGACGUCGUUUGCGAGAAGACGAUCAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- an.Cae.ele 
----AUACCUGAUUGAUUCUGUCAGCGCG-AUAUGCUAAGU--AAAAAUAAGCCAUGCAUGCUUUGAUUCA-----------AA------------U-GAAAUUGCGUACGGCUCAUUAGAGCAGAUACACCUUAUCCGGGAAUAUGGAUAACUGCGGAAAUCUGGAGCUAAUACAUGCAAUACCCGCAAGGGUUUACUGU---CAGUUCGUGACU-CUA-UCCGGAAAGGGUGUCUGCCCUUUCAACUAGAUGGUAGUUUAUUGGACUACCAUGGUUGUUACGGGUACGGAGAAUAAGGGUUCGACUCCGGAGAGGGAGCCUUAGAAACGGCUACCACGUCCA-GGAAGGCAGCAGGCGCGAAACUUAUCCACUGUUG-AGU--AUGAGAUAGUGACUAAAAAUAAAGACUC-AUCC-UUUU-GG-GAGUUAUUUAUGAGUUGAAUACAAAUG------------------------------AUCUUCGAGUAGCAAGGAGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCUCCUAGUGUAUCUCGUUAUUGCUGCGGUUAAAAAGCUCGUAGUUGGGUU-CGUGCCGAGUUCGC------------AUU----------------UGCGU-CAAC-GU-GUUAAGAGGGCAAAGGGGGCACGUAUCAUUACGCGAGAGGUGAAAUUCGUGGACCGUAGUGAGACGCCCAACAGCGAAAGCAUGCCAAGAAUGUCUUAUU---AAUCAAGAACGAAAGUCAGAGGUUCGAAGGCGAUUAGAUACCGCCCUAGUUCUGACCGUAAACGAUGCCAUCUCGCGAUUCGG-AGUUUUCCUGCCGAGGGCUAUCCGGAAACGA--AAGUCGGUUCCGGGGGUAGUAUGGUUGCAAAGCUGAAACUUAAAGAAAUUGACGGAAGGGCACACAAGGCGUGGAGCUUGCGGCUUAAUUUGACUCAACACGGGAAAACUCACCCGGUCCGGACAC-UA---UUGA-------------------------U--CU--GGUGGUUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUUAUUCC-GAAACGAGCGAGACUCUAGCCUCUAAAUAGUUGGCGUUCG--AUAACUUCUUAGAGGGAUAAGCGGUGUUUAGCCGCACGAGA-UU-GAGCGAUAACAGGUCGUGAUGCCCUUAGUGUCCGGGGCUGCACGCGUGCUACACUGGUGGAGUCAGCGGGUUCCUAUGCCGAAAGGUAUCGGUAAACCGUGAAAUUCUUCCAUGUCCGGGAUAGGGUAUUGUAAUUAUUGCCCUUAAACGAGGAAUGCCUAGUAAGUGUGAGUCAUCAGCUCACGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUAUCCGGGACGAACUGAUUCGAGA-AGAGUGGGGAUGUCGUAACCGGAAACCAUUUUUAUCGCAUUGGUUUGAACCGGGUAAAAGUCGUAACAAGGUAGCUGUAGGUGAACCUGCAGCUGGAUCAUCG----- an.Bra.pli 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNNNNNNNNNNNNNAUGCAUGUCUAAGUACAUACC-------AGCA--------GGU-GAAACCGCGAAUGGCUCAUUAAAUCAGUUAGGUUCCUUAGAUCGACAUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCAAAGCUCGUAUGAGCUGAUCGCA--UGGCUAGCGGCG-ACG-UAUCUUUCAAGUGUCUGCCCUAUCAACUUGAUGGUAAGCGAUUUGCCUACCAUGGUUGUAACGGGUACGGGGAAUCAGGGUUCGAUUCCGGAGAGGGAGCAUGAGAAACGGCUACCACAUCUC-GGAAGGCAGCAGGCGCGCAAAUUACCCACUCCUAGAAC-GGGGAGGUAGUGACGAAAAAUCAAUACCG-GACUCAAU-GAGCCGGUAAUUGAUGAGUACAGUUUAAAAC------------------------------CCUAACGAGGAUCUAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGUAUAGCAUGGGUACGC------------UUUU---------------GCUGU-U-AC-ACAAUUAAUAGGGCAGAGGGGGCACGUAUUGCGGUGUUAGAGGUGAAAUUCUUGGAUCGCCGCAAGACGAACAACUGCGAAAGCAUGCCAAGAAUGUUUUAUU---AAUCAAGAACGAAAGUUGGAGGUUCGAAGACGAUUAGAUACCGUCCUAGUUCCAACCAUAAACGAUGCCAACUAGCGAUUAGCUGCUUUUCACAGCUAGCGCUUCCGGGAAACCA--AAGUUGGUUCCGGGGGAAGUAUGGUUGCAAAGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGUAGCCUGCG-CUUAAUUUGACUCAACACGGGAAAUCUCACCCGGCCCGGACAU-AA---UUGA-------------------------U--UC--GGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGCGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACUCUAGCCUCUAAAUAGUACGUCCUUUUGGGUACUUCUUAGAGGGACAAGUAGCGGUAAGCUACACGAAA-UU-GAGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCGGGGGCAUACGCGCGCUACACUGAAGGGAUAAGCGUGUUCCUGCUCCGAAAGGAGUGGGUAAUCCGUGAAA-CCCUUCGUGAUUGGGAUCGGGGCUUGAAAUUAUUCUCCGUGAACGAGGAAUUCCCAGUAAGCGCGAGUCAUAAGCUCGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGUUUAGUGA-GAUCUUCGGAUGGUCGCAACGAGAAGAUGAUCAAACUUGAUCAUUUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUANNNNNNNNNNNNNNNNNNNNNNNN----- fu.Sch.pom 
-----UACCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAGCAAU-----UGUA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUACGUUUAUUUGAUAGACUUGGAUAACCGUGGUAAUCUAGAGCUAAUACAUGCUAAAUCCUUUUGGACGAAUCGCA--UGGCUUGCGGCG-AUG-GUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGGCCUACCAUGGUUUUAACGGGUACGGGGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCCGACAC-GGGGAGGUAGUGACAAGAAAUCAAUGCAG-GGCCCUUUCGGGUUUGUAAUUGAUGAGUACAAUGUAAAUA------------------------------CCUAACGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGAGC-CUGGUCG-----ACUGGUCCG-CCG--AAGCGUG---UU--UACUG-------UCA--UAUUAAUAGGGUAGUGGGGGCACGUAUUCAAUUGUCAGAGGUGAAAUUCUUGGAUUUAUUGAAGACGAACUACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUAGGGGAUCGAAGACGAUCAGAUACCGUCGUAGUCUUAACCAUAAACUAUGCCGACUAGGGAUCGGGCAAUUUACUUGCUCGGCCCUUACGAGAAAUCA--AAGUCGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACAAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAA-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGCUUAAUUGC-GAAACGAACGAGACCUUAACCUCUAAAUAGCUGGAUUUU---UUAGCUUCUUAGAGGGACUAUUGGCAUAAAGCCAAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGACGGAGCCAACGAGUACCUUGGCCGGAAGGUCUGGGUAAUCUUUUAAACUCCGUCGUGCUGGGGAUAGAGCAUUGCAAUUAUUGCUCUUCAACGAGGAAUUCCUAGUAAGCGCAAGUCAUCAGCUUGCGUUGAAUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGCUUAGUGA-GGCCUCUGGAUGGCUGCAACGAGAAGUUGGACAAACUUGGUCAUUUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCAGAAGGAUCAUUA----- fu.Asp.nid 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNNNNNNNNNNNNNNNNNNNNNNNAAGUAUAAGCAA------UAUA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUACGUUUAUUUGAUAGACAUGGAUACCUGUGGUAAUCUAGAGCUAAUACAUGCUAAACCCUUCGGGGCGAAUCGCA--UGGCUUGCGGCG-AUG-GUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUGGCAACGGGUACGGGGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAA-CC-GACAC-GGGGAGGUAGUGACAAUAAAUUGAUACGG-GGCUCUUUUGGGUUCGUAAUUGAUGAGAACAAUUUAAAUC------------------------------CCUAACGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGU-CUGGCUG-----GCCGGUCCG-C--CUCAC-GCG---AG--UACUG-------UC---CAUUAAUAGGGUAGUGGGGGCGAGUAUUCAGCUGUCAGAGGUGAAAUUCUUGGAUUUGCUGAAGACUAACUACUGCGAAAGCACGCCAAGGAUGUUUUAUU---AAUCAGGAACGAAAGUUAGGGGAUCGAAGACGAUCAGAUACCGUCGUAGUCUUAACCAUGAACUAUGCCGACUAGGGAUCGGGCGGUUUUCCCGCUCGGCCCUUACGAGAAAUCA--AAGUUGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGAAAUUGACGGAAGGGCACACAAGGCGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAA-AA---UUGA-------------------------U--UU--UUUGGAUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGCUUAAUUGC-GAAACGAACGAGACCUCGGCCCUUAAAUAGCCCGGUGUCC--CUGGCUUCUUAGGGGGACUAUCGCC-UCAAGCCGAUGGAAGUGCGCGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGACAGGGGCAGCGAGUCCCUUGGUCGAGAGGCCCGGGUAAUCUUUUAAACCCUGUCGUGCUGGGGAUAGAGCAUUGCAAUUAUUGCUCUUCAACGAGGAAUGCCUAGUAGGCACGAGUCAUCAGCUGCUGCCGAUUACGUCCCUGCCCUUUGUACACACCGCCUGUCGCUACUACCGAUGAAUGGCUCGGUGA-GGCCUCNGGAUGNCUGCAACGGAAAGCUGGUUAAACCCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- fu.Asp.tam 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNNNAAAAUAAGCCAUGCAUGUCUAAGUAUAAGCAC------UAUA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUACGUUUAUUUGAUAGACAUGGAUACCUGUGGUAAUCUAGAGCUAAUACAUGCUAAACCUUUCGGGGCGAAUCGCA--UGGCUUGCGGCG-AUG-GUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUGGCAACGGGUACGGGGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCCGACAC-GGGGAGGUAGUGACAAUAAAUUGAUACGG-GGCUCUUUUGGGUUCGUAAUUGAUGAGUACAAUCUAAAUC------------------------------CCUAACGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGU-CUGGCUG-----GCCGGUCCG-C--CUCAC-GCG---AG--UACUG-------UC---CAUUAAUAGGGUAGUGGGGGCGAGUAUUCAGCUGUCAGAGGUGAAAUUCUUGGAUUUGCUGAAGACUAACUACUGCGAAAGCACGCCAAGGAUGUUUUAUU---AAUCAGGAACGAAAGUUAGGGGAUCGAAGACGAUCAGAUACCGUCGUAGUCUUAACCAUAAACUAUGCCGACUAGGGAUCGGGCGGUAUGCCCGCUCGGCCCUUACGAGAAAUCA--AAGUUGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGAAAUUGACGGAAGGGCACACAAGGCGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAA-AA---UUGA-------------------------U--UU--UUUGGAUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGCUUAAUUGC-GAAACGAACGAGACCUCGGCCCUUAAAUAGCCCGGUGUUU--CUGGCUUCUUAGGGGGACUAUCGGC-UCAAGCCGAUGGAAGUGCGCGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGACAGGGCCAGCGAGUCCCUUGACCGAGAGGUCCGGGUAAUCUUUUAAACCCUGUCGUGCUGGGGAUAGAGCAUUGCAAUUAUUGCUCUUCAACGAGGAAUGCCUAGUAGGCACGAGUCAUCAGCUCGUGCCGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGCUCGGUGA-GGCCUUCGGAUGGCCGCAACGGAAAGUUGGUCAAACCCGGUCAUUUAGAGGAAGUAAAAGUCGUAACAAGGUUUCNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- fu.Neu.cra 
-----UACCUGGUUGAUUCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUUUAAGCA-------UAAA--------CGC-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUAUUUGAUAGACAUGGAUAACCGUGGUAAUCUAGAGCUAAUACAUGCUAAACCCUUCGGGGCGAAUCGCA--UGGCUUGUGGCG-AUG-GUUCAUUCAAAUUUCUGCCCUAUCAACUUGACGGCUGGGUCUUGGCCAGCCAUGGUGACAACGGGUACGGAGGGUUAGGGCUCGACCCCGGAGAAGGAGCCUGAGAAACGGCUACUACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCCGACAC-GGGGAGGUAGUGACAAUAAAUUGAUACAG-GGCUCUUUUGGGUUUGUAAUUGAUGAGUACAAUUUAAAUC------------------------------CCUAACGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGAGGUUAAAAAGCUCGUAGUUGGGGC-UCGGCC-----GU-CGGUCCG-C--CUCAC-GCG---UG--CACUA-------CU---GAUUAAUAGGGCAGUGGGGGCAAGUAUUCAAUUGUCAGAGGUGAAAUUCUUGGAUUUAUUGAAGACUAACUACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAGGAACGAAAGUUAGGGGAUCGAAGACGAUCAGAUACCGUCGUAGUCUUAACCAUAAACUAUGCCGAUUAGGGAUCGGACGGUUUUCCCGUUCGGCCCUUACGAUAAAUCA--AAAUGGGCUCCUGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGAAAUUGACGGAAGGGCACACCAGGGGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAG-GA---UUGA-------------------------U--UU--CGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGCUUAAUUGC-GAAACGAACGAGACCUUAACCUCUAAAUAGCCCGUAUUUG--CUGGCUUCUUAGAGGGACUAUCGGC-UCAAGCCGAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGACACAGCCAGCGAGCUCCUUGGCCGGAAGGUCCGGGUAAUCUUUUAAACUGUGUCGUGCUGGGGAUAGAGCAUUGCAAUUAUUGCUCUUCAACGAGGAAUCCCUAGUAAGCGCAAGUCAUCAGCUUGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGCUCAGUGA-GGCUUCCGGAUGGCCGCAACGGAAAGCUAUCCAAACUCGGUCAUUUAGAGGAAGUAAAAGUCGUAACAAGGUCUCCGUUGGUGAACCAGCGGAGGGAUCAUUA----- fu.Can.gla 
-----UAUCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAGCAA------UAUA--------AGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUACGUUUAUUUGAUAGACAUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCUUAAUCUUCUUAGACGAAUCGCA--UGGCUUGUGGCG-AUG-GUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUUUCAACGGGUACGGGGAAUAAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUGACAC-AGGGAGGUAGUGACAAUAAAUCGAUACAG-GGCCCAUUCGGGUUUGUAAUUGAUGAGUACAAUGUAAAUA------------------------------CCUAACGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGAUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGC-CUGGGU-----AGCCGGUCCG-A--UUUUU-UCG---UG--UACUG------AAU--GCAUUAAUAGGGCGGUGGGGGCAAGUAUUCAAUUGUCAGAGGUGAAAUUCUUGGAUUUAUUGAAGACUAACUACUGCGAAAGCAUGCCAAGGACGUUUUAUU---AAUCAAGAACGAAAGUUAGGGGAUCGAAGAUGAUCAGAUACCGUCGUAGUCUUAACCAUAAACUAUGCCGACUAGGGAUCGGGUGGUUUACCCACUCGGCCCUUACGAGAAAUCA--AAGUCGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCG-CUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAA-AA---UUGA-------------------------U--UU--UGUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGCUUAAUUGC-GAAACGAACGAGACCUUAACCUCUAAAUAGUGGUGCAUUU--UCCACUUCUUAGAGGGACUAUCGGUUUCAAGCCGAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGCGUUCUGGGCCGCACGCGCGCUACACUGACGGAGCCAGCGAGUACCUUGGCCGAGAGGUCUUGGUAAUCUUUGAAACUCCGUCGUGCUGGGGAUAGAGCAUUGUAAUUAUUGCUCUUCAACGAGGAAUUCCUAGUAAGCGCAAGUCAUCAGCUUGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUAGUACCGAUGAAUGGCUUAGUGA-GGCCUCAGGACUGCUGCGAGGAGAAUCUGGUCAAACUUGGUCAUUUAGAGGAACUAAAAGUCGUAACAAGGUUUCCGUAGGUGAA-CUGCGGAAGGAUCAUUA----- fu.Sac.cer 
-----UAUCUGGUUGAUCCUGCCAGUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAGCAA------UAUA--------AGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUACGUUUAUUUGAUAGACAUGGAUAACCGUGGUAAUCUAGAGCUAAUACAUGCUUAAUCUCUUUAGACGAAUCGCA--UGGCUUGUGGCG-AUG-GUUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGUGGCCUACCAUGGUUUCAACGGGUACGGGGAAUAAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCUAAUUC-AGGGAGGUAGUGACAAUAAAUCGAUACAG-GGCCCAUUCGGGUUUGUAAUUGAUGAGUACAAUGUAAAUA------------------------------CCUAACGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGC-CCGGUU-----GGCCGGUCCG-A--UUUUU-UCG---UG--UACUG-----AUUUC--CAUUAAUAGGGCGGUGGGGGCAGGUAUUCAAUUGUC-GAGGUGAAAUUCUUGGAUUUAUUGAAGACUAACUACUGCGAAAGCAUGCCAAGGACGUUUUAUU---AAUCAAGAACGAAAGUUAGGGGAUCGAAGAUGAUCUGGUACCGUCGUAGUCUUAACCAUAAACUAUGCCGACUAG--AUCGGGUGGUUUACCCACUCGGUCCUUACGAGAAAUCA--AAGUCGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACUAGGAGUGGAGCCUGCGGC-UAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAA-AA---UUGA-------------------------U--UU--UGUGGGUGGUGGUGCAUGGCCGUUCUCAGUUGGUGGAGUGAUUUGUCUGCUUAAUUGC-GAAACGAACGAGACCUUAACCUCUAAAUAGUGGUGCAUUU--UCCACUUCUUAGAGGGACUAUCGGUUUCAAGCCGAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGCGUUCUGGGCCGCACGCGCGCUACACUGACGGAGCCAGCGAGUACCUUGGCCGAGAGGUCUUGGUAAUCUUUGAAACUCCGUCGUGCUGGGGAUAGAGCAUUGUAAUUAUUGCUCUUCAACGAGGAAUUCCUAGUAAGCGCAAGUCAUCAGCUUGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUAGUACCGAUGAAUGGCUUAGUGA-GGCCUCAGGACUGCUGCAAGGAGAAUUUGGACAAACUUGGUCAUUUAGAGGAACUAAAAGUCGUAACAAGGUUUCCGUAGGUGAACCUGCGGAAGGAUCAUUA----- fu.Bul.hui 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN-NNNNNCAAAAUAAGCCAUGCAUGUCUAAGUAUAAACAAA-----CAUA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUAUUUGAUGGACAUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCUGAGCCCUCU-GGGCGAAUCGUA--UGGCUUGCGACG-AUG-CUUCAUUCAAAUAUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGGCCUACCAUGGUAUCAACGGGUACGGGGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCCGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACAG-GGCUCUAUUGGGUUUGUAAUUGAUGAGUACAAUUUAAAUC------------------------------CCUAACGAGGAACAACUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAGUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUCGGGGC-CCGGCGG-----GACGGUCCG-C---CUUAGGUG---UG--UACUU-----CU-----UAUUAAUAGGGCGGUGGGGGCAAGUAUUCCGUUGCUAGAGGUGAAAUUCUUAGGUUUACGGGAGACUAACUACUGCGAAAGCAUGCCAAGGACGUUUUAUU---GAUCAAGAACGAAGGUUAGGGGAUCAAAAACGAUUAGAUACCGUUGUAGUCUUAACAGUAAACUAUGCCGACUAGGGAUCGGUUCCAAUAAGGAAUCGGCCCUUACGAGAAAUCA--AAGUCGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGUGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAA-GA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACCUUAACCUCUAAAUAGCCCGGCUUUG--AUGGCUUCUUAGAGGGACUAACGGCGUUUAGCCGUUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGACUGAGCCAGCGAGUCCCUUGGCCGAGAGGUCUGGGUAAUCUUUGAAACUCAGUCGUGCUGGGGAUAGAGCAUUGCAAUUAUUGCUCUUCAACGAGGAAUACCUAGUAAGCGUGAGUCAUCAGCUCGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGCUUAGUGA-GAUCUCCGGAUGGCGGCAAUGAGAAGUUGAUCAAACUUGGUCAUUUAGAGGAAGUAAAAGUCGUAACAAGGUUNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- fu.Den.sul 
-----NNNNNNNNNNNNNNNNNNNNUAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAACAAG-----UGUA--------UGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUAUUUGAUGGACAUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCAAAGCCCUCGCGGGCGAAUCGCA--UGGCUUGCGGCG-AUG-CUUCAUUCAAAUAUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGGCCUACCAUGGUUUCAACGGGUACGGGGAAUAAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCCGACAC-GGGGAGGUAGUGACAAUAAAUCAAUAUAG-GGCUCUUUUGGGUUUAUAAUUGAUGAGUACAAUUUAAAUC------------------------------UCUAACGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGAGAC-CUGGCCG-----GGCGGUCUG-CC--UCACGGUA---UG--UACUU--------C---UAUUAAUAGGGUAGUGGGGGCAAGUAUUCAGUUGCUAGAGGUGAAAUUCUUGGAUUUACUGAAGACUAACUACUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAGGUUAGGGGAUCGAAAACGAUCAGAUACCGUUGUAGUCUUAACAGUAAACUAUGCCGACUAGGGAUCGGGCGAUCUUGUCGCUCGGCCCUUACGAGAAAUCA--AAGUCGGUUCUGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGAGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAG-UA---UUGA-------------------------A--UU--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACCUUAACCUCUUAAUAGCCAGGCUUUU--CCGGCUUCUUAGAGGGACUGUCUGCGUCUAGCAAACGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGACAGAGCCAGCGAGCACCUUGGCCGGAAGGUCUGGGUAAUCUUUGAAACUCUGUCGUGCUGGGGAUAGAGCAUUGCAAUUAUCGCUCUUCAACGAGGAAUUCCUAGUAAGCGUGAGUCAUCAGCUCGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGCUUAGUGA-GGUCUUGGGAUGGCUGCAAUGAGAACUUGAUCAAACUUGGUCAUUUAGAGGAA-UAAAA-UCGUAACAAGG-UUCC-UAGNNNNNNNNNNNNNNNNNNNNNNN----- fu.Fel.oga 
-----NNNNNNNNNNNNNNNNNNNNNAGUCAUAUGCU-UGUCUCAAAAUAAGCCAUGCAUGUCUAAGUAUAAACAAA-----CAUA--------GGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUAUUUGAUGGACAUGGAUAACUGUGGUAAUCUAGAGCUAAUACAUGCUGAGCCCUCU-GGGCGAAUCGCA--UGGCUUGCGGCG-AUG-CUUCAUUCAAAUAUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGACCUACCAUGGUAUCAAC-GGUACGGGGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCCGACAC-GGGGAGGUAGUGACAAUAAAUCAAUACAG-GGCUCUAAUGGGUUUGUAAUUGAUGAGUACAAUUUAAAUC------------------------------CCUAACGAGGAACAACUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAGUACCGUAUAUUAAAGUUGUUGCAGUUAAAACGCUCGUAGUCGGGGC-CUGGCGG-----GGUGGUCCG-CC--UUACGGUG---UG--UACUU--------C---CAUUAAUAGGGCGGUGGGGGCAAGUAUUCCGUUGCUAGAGGUGAAAUUCUUAGAUUUACGGAAGACUAACUUCUGCGAAAGCAUGCCAAGGACGUUUUAUU---GAUCAAGAACGAAGGUUAGGGGAUCAAAAACGAUUAGAUACCGUUGUAGUCUUAACAGUAAACUAUGCCGACUAGGGAUCGG-CCA-UUCCUG-CUCGGCCCU-ACGAG-AAUCA--AAGUCGGUUCUGGGGG-AGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGUGUGGAGCCUGCGGCUUAAUUUGACUCAACACGGGGAAACUCACCAGGUCCAGACAA-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGCCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACCUUAACCUCUAAAUAGUCAGGCUCCG--UCGACUUCUUAGAGGGACUGUCGGCGUUUAGCCGACGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGACCGCACGCGCGCUACACUGACUGAGCCAGCGAGUCCCUUGUCCGAGAGGUCUGGGUAAUCUUUG-AA-UCAGUCGUGCUG-GGAUAGAGCAUUGCAAUUAUUGCUCUUCAACGAGGAAUACCUAGUAAGCGUGAGUCAUCAACUCGCGUUGAUUACGUCCCUGCCCUUUGUACACACCGUCUGUCGCUACUACCGAUGAAUGGCUUAGUGA-GGUCUCCGGAUGGCAGCAAUGGGAAGUUGAACAAACUUGGUCAUUUAGAGGAAGUAAAAGUCAUAACAAGGUUUCCGUANNNNNNNNNNNNNNNNNNNNNNNN----- fu.Gig.mar 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCU-UGUCUCAAAAUAAGCCAUGCAUGUCU-AGUAUAAUCG-------AUAC--------GGU-GAAACUGCGAAUGGCUCAUUAAAUCAGUUAAGUUUAUUUGAUAGACUUGGAUAACCGUGGUAAUCUAGAGCUAAUACAUGCUAAAUCCUCU-GGACGAAUCGUA--UGGCUUGUGACG-AUG-UAUCAUUCAAAUUUCUGCCCUAUCAACUUGAUGGUAGGAUAGAGGCCUACCAUGGUUUUAACGGGUACGGGGAAUUAGGGUUCGAUUCCGGAGAGGGAGCCUGAGAAACGGCUACCACAUCCA-GGAAGGCAGCAGGCGCGCAAAUUACCCAAUCCCGAUAC-GGGGAGGUAGUGACAAUAAAUCAAUACAG-GGCUCUUAUGGGUUUGUAAUUGAUGAGUACAAUUUAAAUC------------------------------UCUAACGAGGAACAAUUGGAGGCAAGUCUGGUGCCAGCAGCCGCGGUAAUUCCAGCUCCAAUAGCGUAUAUUAAAGUUGUUGCAGUUAAAAAGCUCGUAGUUGGGGG-UUCUACC-----GUUGGUCGG-GC--AAUUGUCU----G--UACUGC------G----UAUUAAUAGGGUAGUGGGGGCAAGUAUUCAAUUGUCAGAGGUGAAAUUCUUGGAUUUAUUGAAGACUAACUUCUGCGAAAGCAUGCCAAGGAUGUUUUAUU---AAUCAAGAACGAAAGUUAGGGGAUCGAAGACGAUCAGAUACCGUCGUAGUCUUAACCAUAAACUAUGCCGACUAGGGAUCGGACGAUUUUCUCGUUCGGC-CUUACGGGAAACCA--AAGUGGGUUCCGGGGGGAGUAUGGUCGCAAGGCUGAAACUUAAAGGAAUUGACGGAAGGGCACACCAGGGGUGGACCGUGCGGCUUAAUUUGACUCAACACGGGAAAACUCACCAGGUCCAGACAA-AA---UUGA-------------------------U--UC--UAUGGGUGGUGGUGCAUGGCCGUUCUUAGUUGGUGGAGUGAUUUGUCUGGUUAAUUCC-GAAACGAACGAGACCUUAACCUCUAAAUAGUCAGGCUUUU--ACGACUUCUUAGAGGGACUAUCGGCGUUUAGCCGAUGGAAGUUUGAGGCAAUAACAGGUCGUGAUGCCCUUAGUGUUCUGGGCCGCACGCGCGCUACACUGAUGAAGUCAUCGAGUUCCUUUACCGGAAGGUAUGGGUAAUCUUUGAAACUUCAUCGUGAUGGGGAUAGAGCAUUGCAAUUAUUGCUCUUCAACGAGGAAUCCCUAGUAAGCAUGAGUCAUCAGCUCGUGCUGAUUACGUCCCUGCCCUUUGUACACACCGCCCGUCGCUACUACCGAUGAAUGGCUUAGUGA-GACCCUCGGACGAUAUCACUGAGAAGUUGGUCAAACUUGGUCAUUUAGAGGAAGUAAAAGUCGUAACAAGGUUUCCGUNNNNNNNNNNNNNNNNNNNNNNNNN----- fl.Bac.spl 
NCUAUGAAGAGUUUGAUCCUGGCUCAGGAUNAACGCUAGCGACAGGCUUAACACAUGCAAGUCGAGGGGCAUCAUGAGGUA-GCAA-UACCUUGAUGGCGACCGGCGCACGGGUGCGUAACGCGUAUGAACCUGCCUGAUACCGGGGUAUAGCCCAUGGAAACGUGGAUUAACACCCCAUAGUACUGCAUAGUUAAAU---------GUUY------AAGGUAUCGGAUGGGCAUGCGUCCUAUUAGUUAGUUGGCGGGGUAACAGCCCACCAAGACGAUGAUAGGUAGGGGUUCUGAGAGGAAGGUCCCCCACAUUGGAACUGAGACACGGUCCAAACUCCUACGGGAGGCAGCAGUGAGGAAUAUUGGUCAAUGGACGUAAGUCUGAACCAGCCAAGUCGCGUGAGGGAAGACUGCCUNUGGGUUGUAAACCNCUUUUAUAAGGGAAGAAUAAGUUCUA--CGUG----UAGAAUGAUGCCUGUACCUUAUGAAUAAGCAUCGGCUNACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAUGCGAGCGUUAUCCGGAUUUAUUGGGUUUAAAGGGUGCGUAGGCGGUUUAUUAAGUUAGUGGUUAAAUAUUUGAGCUCAACUC-AAUUGUGCCAUUAAUACUGGUAAACUGGAGUACAGGCGAGGUAGGCGGAAUAAGUUAAGUAGCGGUGAAAUGCAUAGAUAUAACUUAGAACUCCGAUUGCGAAGGCAGCUUACCAGACUGUAACUGACGCUGAUGCACGAGAGCGUGGGUAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGCUCACUNGUUCUSUGCGA-UAUA--UUGUACGGGAUUAAGCGAAAGUAUUAAGUGAGCCACCUGGGGAGUACGUCGGCAACGAUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGAGGAACAUGUGGUUUAAUUCGAUGAUACGCGAGGAACCUUACCUGGGUUUAAA-UGAAAUGCCGUAUUUGGAAACAGAUAUUCUC-UUCG-GAGCUUUUUCR-AGGUGCUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUCGGGUUAAGUCCCAUAACGAGCGCAACCCUUACCGUUAGUUGCUAGCAUAAUG-AUGAGCACUCUAACGGGACUGCCACCGUAA-GGUGGAGGAAGGCGGGGAUGACGUCAAAUCAGCACGGCCCUUACACCCAGGGCUACACACGUGUUACAAUGGCCGGUACAGAGGGCCGCUACCAGGUGACUGGAUGCCAAUCUC-AAAAGCCGGUCGUAGUUCGGAUUGGAGUCUGUAACCCGACUCCAUGAAGUUGGAUUCGCUAGUAAUCGCGCAUCAGCCUGGCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCAAGCCAUGGAAGCCGGGGGUGCCUGAAGUCCGUAACC-----GCGA----GGAUCGGCCUAGGGCAAAACUGGUAACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- fl.Cyt.mar 
ACGAUGAAGAGUUUGAUCCUGGCUCAGGAUNAACGCUAGCGGCAGGCCUAACACAUGCAAGUCGAACGGUAACAGGGAAAAGCUUGCUUUUCUGCUGACGAGUGGCGCACGGGUGCGUAACGCGUAUGAAUCUGCCUUGUACUGGAGUAUAGCCCAGGGAAACUUGGAUUAAUCCUCCAUAGUCUAGCAUUAGUAAAG---------GUUA-------CGGUACAAGAUGAGCAUGCGUCCUAUUAGCUAGUAGGUGUGGUAACGGCACACCUAGGCAACGAUAGGUAGGGGUCCUGAGAGGGAGAUCCCCCACACUGGUACUGAGACACGGACCAGACUCCUACGGGAGGCAGCAGUGAGGAAUAUUGGACAAUGGGCGCAAGCCNNAUCCAGCCAUGCCGCGUGCAGGAAGACUGCCUAUGGGUUGUAAACUNCUUUUAUACGGGAAGAAUAAGGUCUA--CGAG----UAGGCUGAUGACGGUACCGUAAGAAUAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGNNNAUACGGAGNGUGCNAGCGUUAUCCGGAAUCAUUGGGUUUAAAGGGUCCGUAGGUGGACUAAUAAGUCAGUGGUGAAAGUCUGCAGCUUAACUGUAGAAUUGCCAUUGAUACUGUUAGUCUUGAAUUGUUAUGAAGUAACUAGAAUAUGUAGUGUAGCGGUGAAAUGCAUAGAUAUUACAUAGAAUACCGAUUGCGAAGGCAGGUUACUAAUAAUAYAUUGACACUGAUGGACGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGGUUACUAGCUGUCCGGCCCAUUGAGGGCUGGGUGGCCAAGCGAAAGUGAUAAGUAACCCACCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGGCCNGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGAUACGCGAGGAACCUUACCAGGGCUUAAA-UGGUCUGACAGGAGUGGAAACAUUCUUUUC--UUCG--GACGAUUACA-AGGUGCUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUCAGGUUAAGUCCUAUAACGAGCGCAACCCCUGUGGUUAGUUGCCAUCGAAAUG-UCGGGAACUCUAGCCAGACUGCCAGUGCAA-ACUGGAGGAAGGUGGGGAUGACGUCAAAUCAUCACGGCCCUUACGUCCUGGGCCACACACGUGCUACAAUGGUAGGUACAGAGAGCAGCCACUUUGCAAAGAGGAGCGAAUCUA-UAAAACCUAUCACAGUUCGGAUCGGGGUCUGCAACUCGACCCCGUGAAGCUGGAAUCGCUAGUAAUCGCAUAUCAGCCUGAUGCGGUGAAUACGUUCCCGGGCNUUGUACACACCGCCCGUCAAGNCAUGGAAGCNNGGGGUACCUGAAGUCCGUCACC-----GUAA----GGAGCGGCCUAGGGUAAGACCGGUAACUGGGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- fl.Fla.xan 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNAACGCUAGCGGCAGGCUUAACACAUGCAAGUCGAGGGGUAUAGUUC-----UUCG-----GAAUUAGAGACCGGCGCACGGGUGCGUAACGCGUAUGAAUCUACCUUUCACAGAGGGAUAGCCCAGAGAAAUUUGGAUUAAUACCUUAUAGUAUAGCAUUAUUAAAGA--------UUUA------UCGGUGAAAGAUGAGCAUGCGUCCCAUUAGUUAGUUGGUAAGGUAACGGCUUACCAAGGCAACGAUGGUUAGGGGUCCUGAGAGGGAGAUCCCCCACAUUGGUACUGAGACACGGACCAGACUCCUACGGGAGGCACCAGUGAGGAAUAUUGGACAAUGGGCGCAAGCCUGACCCAGCCAUGCCGCGUGCAGGAUGACGGUCUAUGGAUUGUAAACUGCUUUUAUACAGGAAGAAACAGUUCUA--CGUG----UAGAACCUUGACGGUACUGUAAGAAUAAGGAUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAUCCCAAGCUUAUCCGGAAUCAUUGGGUUUAAAGGGUCCGUAGGCGGUCAGAUAAGUCAGUGGUGAAAGCCCAUCGCUCAACGGUGGAACGGCCAUUGAUACUGUCUGACUUGAAUUAUUAGGAAGUAACUAGAAUAUGUAGUGUAGCGGUGAAAUGCUUAGAGAUUACAUGGAAUACCAAUUGCGAAGGCAGGUUACUACUAAUAUAUUGACGCUGAUGGACGAAAGCGUGGGUAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGGAUACUAGCUGUUGGGC--GCAA---GUUCAGUGGCUAAGCGAAAGUGAUAAGUAUCCCACCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGAUACGCGAGGAACCUUACCAAGGCUUAAA-UGGAUUGACCGGUUUGGAAACAGACUUUUC--GCAA--GACAUUUACA-AGGUGCUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUCAGGUUAAGUCCUAUAACGAGCGCAACCCCUGUUGUUAGUUGCCAGCGACAAG-UCGGGAACUCUAACAAGACUGCCAGUGCAA-ACUGGAAGAAAGUGGGGAUGACGUCAAAUCAUCACGGCCCUUACGCCUUGGGCUACACACGUGCUACAAUGGCCGGUACAGAGAGCAGCCACUGGGCGAGCAGGAGCGAAUCUA-UAAAACCGGUCACAGUUCGGAUCGGAGUCUGCAACUCGACUCCGUGAAGCUGGGAUCGCUAGUAAUCGCAGAUCAGCCUGAUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCAAGCCAUGGAAGCUGGGGGUGCCUGAANUCGGUAACC-----GCAA----GGAGCUGCCUANGGUAAAACUGGUAACUAGGGCUAAGUCGUAACAAGGUANCCNUANNNNNNNNNNNNNNNNNNNNNNNN----- fl.Mar.psy 
-----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNCCUAAUACAUGCAAGUCGAGGGGUGGCCAGUGUC--CCGN----GNUACAGNNCACCGGCGCACGGGUGNGUAACGCAUAUAAAUCUACCUCGUACUGAGGGAUAACCCACAGAAAUAUGGNUUAAUACCUCAUANUAUAGCCUGAUCAAAG---------UUCG-------CGGUACAAGAUGAGUAUGCGUUCUAUUAGCUAGAUGGUGGGGUAACGGCUCACCAUGACAGCGAUAAAUAGGGGCCCUGAGAGGGGGAUCCCCCACACUGGUACUGAGACACGGACCAGACUCCUACGGGAGGCAGCAGUGAGGAAUAUUGGACAAUGGGCGAGAGCCUGAUCCAGCCAUGCCGCGUGCAGGANGACUGCCUAUGGGUUGUAAACUGCUUUUAUACAGGAAGAAACCCCUCCA--CNUG----UG-AGGAUUGACGGUACNGUAGGAAUAAGGAUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAUCCAAGCGUUAUCCGGAAUCAUUGGGUUUAAAGGGUCCGUAGGCGGGACAAUCAGUCAGCGGUGAAAGUCUGUGGCUCAACCAUAGAAUUGCCAUUGAUACUGUUGUUCUUGAAUACUUAUGAAGUGGUUGGAAUAUGUAGUGUAGCGGUGAAAUGCAUAGAUAUUACAUAGAACACCUAUUGCGAAGGCAGGUCACUAAUAAGACACUGACGCUGAUGGACGAAAGCGUGGGUAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGGUUACUAGCUGUUCGGGA-UUCG--GACUGAGUGGCUAAGCGAAAGUGAUAAGUAUUCCACCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGGNCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGAUACGCGAGGAACCUUACCAGGGCUUAAA-UGUAUUGACAGGUUUAGAGAUAUACUUUUC--UUCG--GACAUUUACA-AGGUGCUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUUAGGUUAAGUCCUAUAACGAGCGCAACCCCUGUUGUUAGUUGCCAGCGGAAUG-UCGGGAACUCUAACAAGACUGCCCGUGCAA-ACUGGAGGAAGGUGGGGAUGACGUCAAAUCAUCACGGCCCUUACGUCAUGGGCUACACACGUGCUACAAUGGUAGGGACAGAGAGCAGCCACUUGGCGAGAAGGAGCGAAUCUA-UAAACCCUAUCACAGUUCGGAUCGGAGUCUGCAACUCGACUCCGUGAAGCUGGAAUCGCUAGUAAUCGCAUAUCAGCCUGAUNCGGUGAAUACGUUCCCGGACCUUGUACACACCGCCCGUCAAGCCAUGGAAGCCGGGAGUGCCUGAAGUCCAUCACC-----GCAA----GGAGCGGCCUAGGGUAAGAUCCGUGACUAGGGNNAAGUCGUAACAAGGUANNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- fl.Pre.eno 
-----NNNNNNNNNNNNNNNNNNNNNNGAUGAACGCUAGCUACAGGCCUAACACAUGCAAGUCGAGGGGCAUCA-UGAUGAUCUUGAUCGUU-GAUGGCGACCGGCGCACGGGUGAGUAACGCGUAUCAACCUACCCAUGACUAAGGGAUAACCCGUCGAAAGUCGGCCUAAGACCUUAUGUAAUCGCAUGAUGAAAGA--------UUAA------UUGGUGAUGGAUGGGGAUGCGUCUGAUUAGCUUGUCGGUGAGGUAACGGCUCACCGAGGCAACGAUCAGUAGGGGUUCUGAGAGGAAGGUCCCCCACAUUGGAACUGAGACACGGUCCAAACUCCUACGGGAGGCAGCAGUGAGGAAUAUUGGUCAAUGGACGAUAGUCUGAACCAGCCAAGUAGCGUGCAGGAUGACGGCCUAUGGGUUGUAAACUGCUUUUAUGCGGGGAUAAAGUGAGGGA--CGUG----UCCCUCAUUGCAUGUACCGCAUGAAUAAGGACCGGCUAAUUCCGUGCCAGCAGCCGCGGUAAUACGGAAGGUCCAGGCGUUAUCCGGAUUUAUUGGGUUUAAAGGGAGCGUAGGCUGCCGUUUAAGCGUGUUGUGAAAUGUACCGGCUCAACCGGUGAUGUGCAGCGCGAACUGGAUGGCUUGAGUACGAAGAGGGAAUGCGGAACUCGUGGUGUAGCGGUGAAAUGCUUAGAUAUCACGAGGAACUCCGAUCGCGAAGGCAGCAUUCCGUUUCGUGACUGACGCUGAUGCUCGAAAGUGCGGGUAUCGAACAGGAUUAGAUACCCUGGUAGUCCGCACGGUAAACGAUGGAUGCUCGCUGUUGGAUA-UUUU--UAUUCAGUGGCUAAGUGAAAGCAUUAAGCAUCCCACCUGGGGAGUACGCCGGCAACGGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACNAGCGGAGGAACAUGUGGUUUAAUUCGAUGAUACGCGAGGAAC-UUACCCGGGCUUGAA-UUAGAGGAAAGAUCCAGAGAUGGUGAUGCCC-UUCG-GGGUUCUGUGA-AGGUGCUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUCGGCUUAAGUGCCAUAACGAGCGCAACCCUUUUCCUUAGUUGCCAUCGGCAAG-CCGGGCACUCUGAGGAUACUGCCUCCGCAA-GGAGGAGGAAGGUGGGGAUGACGUCAAAUCAGCACGGCCCUUACGUCCGGGGCUACACACGUGUUACAAUGGCGCGUACAGAGAGCUGGGUUUGCGCAAGCACUCUCAAAUCUU-UAAAACGCGUCUCAGUUCGGACCGGGGUCUGCAACCCGACCCCGCGAAGCUGGAUUCGCUAGUAAUCGCGCAUCAGCCUGGCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCAAGCCAUGAAAGCNGGGGGCGCCUGAAGUCCGUGACC-----GCAA----GGAUCGGCCUAGGCGAGACCCGGUGAUUNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN----- fu.Fus.gon 
CGAACGAAGAGUUUGAUCCUGGCUCAGGAUGAACGCUGACAGAAUGCUUAACACAUGCAAGUCG-ACUCGAGUC--------UUCG--------GACUUGGGUNGCGGACGGGUGAGUAACGCGUAAAAACUUGCCUCAUAGUCUGGGACAACAUCUGGAAACGGAUGCUAAUACCGGAUAUUAUGGCAUUAUGAAAGC--------UAUA-----UGCGCUAUGAGAGAGCUUUGCGUCCCAUUAGCUAGUUGGUGAGGUAACGGCCCACCAAGGCGAUGAUGGGUAGCCGGCCUGAGAGGGUUAACGGCCACAAGGGGACUGAGACACGGCCCNUACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGACCAAAGUCUGAUCCAGCAAUUCUGUGUGCACGAUGACGUUUUUCGGAAUGUAAAGUGCNUUCAGUCGGGAAGAAGC---------AAGU-------------GACGGUACCGACAGAAGAAGCGACGGCUAAAUACGUGCCAGCAGCCGCGGUAAUACGUAUGUCGCNAGCGUUAUCCGGAUUUAUUGGGCGUAAAGCGCGUCUAGGCGGCAAGGAAAGUCUGAUGUGAAAAUGCGGGGCUCAACUCCGUAUU-GCGUUGGAAACUGCCUUNCUAGAGUNCUGGAGAGGUAGGCGGAACUACAAGUGUAGAGGUGAAAUUCGUAGAUAUUUGUAGGAAUGCNNAUGGGGAAGCCAGCCUACUGGACAGAUACUGACGCUAAAGCGCNAAAGCGUGGGUAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCUGUAAACGAUGAUUACUAGGUGUUGGGG--GUCA-AACCUCAGCGCCNAAGCUAACGCGAUAAGUAAUCCGCCUGGGGAGUACGUACGCAAGUAUGAAACUCAAAGGAAUUGACGGGGACCNGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGANNNAACGCGAGGAACCUUACCAGCGUUUGACAUCCUACAAAGAGUGCAGAGAUGCGCUUGUGCUUUCGAGAAUGUAGUGACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUAUCGUAUGUUACCAGCCUUUAG-UUGGGGACUCAUGCGAUACUGCCUGCGACGAGCAGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGCCCCUUAUACGCUGGGCUACACACGUGCUACAAUGGGUAGUACAGAGAGCGGCGAACCCGCGAGGGGGAGCAAAUCUCAGAAAACUAUUCUUAGUUCGGAUUGUACUCUGCAACUCGAGUACAUGAAGUUGGAAUCGCUAGUAAUCGCAAAUCAGCAUGUUGCGGUGAAUACGUUCUCGGGUCUUGUACACACCGCCCGUCACACCACGAGAGUUGGUUGCACCUGAAGUAGCAGGCCUAACCGUAAGGAAGGAUGCUCCGAGGGUGUGGUUAGCGANUGGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- fu.Fus.nul 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-mmmmmmmmmmmmmmm--mmmm--mmmmmmmmmmmGGGUGGCGGACGGGUGAGUAACGCGUAAAAACUUGCCUCACAGCUAGGGACAACAUUUAGAAAUGAAUGCUAAUNCCUNAUAUUAUGGCAUUAUGAAAGC--------UAUA-----AGCACUGUGAGAGAGCUUUGCGUCCCAUUAGCUAGUUGGAGAGGUAACAGCUCACCAAGGCGAUGAUGGGUAGCCGGCCUGAGAGGGUGAACGGCCACAAGGGGACUGAGACACGGCCCNNNCUCCUACGGGAGGCNGCAGUGGGGAAUAUUGGACAAUGGACCGAGGUCUGAUCCAGCAAUUCUGUGUGCACGAUGAAGUUUUUCGGAAUGUAAAGUGCUUUCAGUUGGGAAGAAAG---------AAAU-------------GACGGUACCAACAGAAGAAGUGACGGCUAAAUACGUGCCAGCAGCCGCGGUAAUACGUAUGUCACGAGCGUUAUCCGGAUUUAUUGGGCGUAAAGCGCGUCUAGGUGGUUAUGUAAGUCUGAUGUGAAAAUGCAGGGCUCAACUCUGUAUU-GCGUUGGAAACUGUAUAACUAGAGUACUGGAGAGGUAAGCGGAACUACAAGUGUAGAGGUGAAAUUCGUAGAUAUUUGUAGGAAUGCCGAUGGGGAAGCCAGCUUACUGGACAGAUACUGACGCUAAAGCGCGAAAGCGUGGGUAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAUUACUAGGUGUUGGGG--GUCG-AACCUCAGCGCCCAAGCUAACGCGAUAAGUNAUCCGCCUGGGGAGUACGUACGCAAGUAUGAAACUCAAAGGAAUUGACGGGGAC-CGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGACGNAACGCGAGGAACCUUACCAGCGUUUGACAUCUUAGGAAUGAGAUAGAGAUNUUUCAGUGUCUUCGGAAACCUAAAGACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUNUCGUAUGUUACCAUCAUUAAG-UUGGGGACUCAUGCGAUACUGCCUNCGAUGAGUAGGAGGAAGGUGGGGAUGACGUNNAGUCAUCAUGCCCCUNAUACGCUGGGCUACACACGUGCUACAAUGGGUAGAACAGAGAGUUGCAAAGCCGUGAGGUGGAGCUAAUCUCAGAAAACUNUUCUUAGUUCGGAUUGUACUCUGCAACUCGAGUACAUGAAGUUGGAAUCGCUAGUAAUCGCGAAUCAGCAUGUCGCGGUGAAUACGUUCUCGGGUCUUGUACACACCGCCCGUCACACCACGAGAGUUGGUUGCACCUGAAGUAGCAGGCCUAACCGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- fu.Lep.buc 
-----mmmmmmmmmmAUCCUGGCUCAGGAUGAACGCUGACAGAAUGCUUAACACAUGCAAGUCUUUGGC--RAAUCUGUG--CUUG--CACAGCCUAGC-CAAGGCGGACGGGUGAGUAACGCGUAAAAACUUGCCCUGCAGACAGGGAUAACAGACGGAAACGACUGAUAAUACCUGAUAYAAUUGCAUAAUGAAAA---------GUGA-------UGCUGCAGGAGAGCUUUGCGUCCUAUUAGCUUGUUGGUGAGGUAANGGCUCACCAAGGCGAUGAUAGGUAGCCGGCCUGAGAGGGUGAACGGCCACAAGGGGACUGAGAUACGGCCCUUACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGGGCANCCCUGAUCCAGCAAUUCUGUGUGCACGAAGAAGGUUUUCGGAUUGUAAAGUGCUUUCAGCAGGGAAGAAGA---------AAGU-------------GACGGUACCUGCAGAAGAAGCGACGGCUAAAUACGUGCCAGCAGCCGCGGUAAUACGUAUGUCGCAAGCGUUAUCCGGAAUUAUUGGGCAUAAAGGGCAUCUAGGCGGCCAGACAAGUCUGGGGUGAAAACUUGCGGCUCAACCGCAAGCCUGCCCUGGAAACUGUUUGGCUAGAGUGCUGGAGAGGUGGACGGAACUGCACGAGUAGAGGUGAAAUUCGUAGAUAUGUGCAGGAAUGCCGAUGAUGAAGAUAGUUCACUGGACGGNAACUGACGCUGWAGUGCGVAARCUGGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCCACCCGUAAACGAUGAUUACUGGGUGUGGGCAU-GAAG-AGUGUCCGUGCCGAAGCUAAUGCGAUAAGUAAUCCGCCUGGGGAGUACGKCCGCAAGGCUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGACGCAACGCGAGGAACCUUACCAGAUCUUGACAUCCUACGAAUGCCUGUGAGAACAGGCAGUGCCUUCGGGAACGUWGAGACAGGUGGUGCAUGGCUGUCGACAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUAUCGCUAGUUGCCAUCAUUAAG-UUGGGGACUCUAGCGAGACUGCCUGCGAAGAGCAGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGCCCCUUAUGAUCUGGGCUACACACGUGCUACAAUGGCCGGNACAAAGAGCUGCAAAACGGUAACGUUUAGCCAAUCUU-UAAAGCCGGUCCAAGUUCGGAUUGAAGUCUGCAACUCGACUUCAUGAAGCCGGAAUCGCUAGUAAUCGCAGAUCAGCAUGCUGCGGUGAAUACGUUCUCGGGUCUUGUACACACCGCCCGUCACACCACGAGAGUUGUUUGNACCUGAAGCCGCCGGUCCAACCGUAAGGAGGAAGGCGUCUAAGGUGUGGAUAGUGAUUGGGGUGAAGUCGUAACAAGGUAAGUCGmmmmmmmmmmmmmmmmmmmmmmmmm----- fu.Lep.mic 
-----mmmmmmmmmmmmmmmGGCUCAGGAUAAACGCUGACAGAAUGCUUAACACAUGCAAGUCG-AUGAUGGGAGCUAG---CUUG---CUAGAAGAAGUCAUGGCGGACGGGUGAGUAACGUGUAAAAACUUACCAUAUAGACUGGGAUAACAGAGGGAAACUUCUGAUAAUACUGGAUA-AGUUGCAUAAUGAAAGUA-------GCAA-----UACGCUAUAUGAGAGCUUUGCAUCCUAUUAGCUAGUUGGUGGGGUAAAAGCCUACCAAGGCGAUGAUAGGUAGCCGGCCUGAGAGGGUGGACGGCCACAAGGGGACUGAGAUACGGCCCUUACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGAGGCAACUCUGAUCCAGCAAUUCUGUGUGUGUGAAGAAGGUUUUAGGACUGUAAAACACUUUUAGUAGGGAAGAAAG---------AAAU-------------GACGGUACCUACAGAAGAAGCGACGGCUAAAUACGUGCCAGCAGCCGCGGUAAUACGUAUGUCGCGAGCGUUAUCCGGAAUUAUUGGGCUUAAAGGGCAUCUAGGCGGUUAAACAAGUUGAAGGUGAAAACCUGUGGCUCAACCAUAGGCUUGCCUACAAAACUGUAUAACUAGAGUACUGGAAAGGUGGGUGGAACUACACGAGUAGAGGUGAAAUUCGUAGAUAUGUGUAGGAAUGCCGAUGAUGAAGAUAACUCACUGGACAGCAACUGACGCUGAAGUGCGAAAGCUAGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCUAGCUGUAAACGAUGAUCACUGGGUGUGGGGAU-UCGA-AGUCUCUGUGCCGAANCAAAAGCGAUAAGUGAUCCGCCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGUGGUGGAGCAUGUGGUUUAAUUCGACGCAACGCGAGGAACCUUACCAGAUCUUGACAUCCUCCGAAGAGCAUAGAAGUAUGCUUGUGCCUACGGGAACGGAGAGACAGGUGGUGCAUGGCUGUCGACAGCUCGUGUUGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGAAACCCCUAUCAUUAGUUACCAUCAUUAAG-UUGGGGACUCUAAUGAAACUGCCUACGAAGAGUAGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGCCCCUUAUGAUCUGGGCUACACACGUGCUACAAUGGAUAGUACAAAGAGAAGCUUUGURGCGAUACAUGNCNANACUAAGAAAGCUAUUCUUAGUUCGGAUUGAAGUCUGCAACUCGACUUCAUGAAGUUGGAAUCACUAGUAAUCGUGAAUCAGCAUGUCACGGUGAAKACGUUNUCGGGUNUUGUACACACCGCCCGUCACACCACGAGAGUUGUUUGCACCUGAAAUUACUGGCCUAACUGUAAAGGGGGAGGUACUGAAGGUGUGGAUAGCGAUUGGGGUGAAGUCGUAACAAGGUAAmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- gr.The.ros 
GGGAUGGAGAGUUUGAUCCUGGCUCAGGGGGAACGCUGGCGGCGUGCCUAAUGCAUGCAAGUCGGACGGGANGCACGCN---CUCU---GGCGUGCCGACCGUGGCGGACGGGUGCGUAACACGUGGGAACCCUCCCGGGUGCGGGGGAUAACCCGGGGAAACUCGGGCUAAUACCCCAUACGCUUGGUGAAGGAAAGGCGCAG---GCGA-CUGUGCUGUGCUCGGAGGGCCCUGCGGCCUAUCAGCUAGACGGUAGGGUAACGGCCUACCGUGGCGAUGACGGGUAGCUGGUCUGAGAGGAUGGCCAGCCACACGGGCACUGAGACACGGGCCCGACUCCUACGGGAGGCAGCAGCAGGGAAUCUUCCGCAAUGGGGGCAACCCUGACGGAGCGACGCCGCGUGCGGGAGGAAGCCCUUCGGGGUGUAAACCGCUGUUCGGGGGGACGAUC----------GAGC-------------GACGGUACCCUCGGAGCAAGUCCCGGCUAACUACGUGCCAGCAGCCGCGGUAAGACGUAGGGGGCGAGCGUUACCCGGAGUCACUGGGCGUAAAGGGCGUGUAGGCGGCUGGGUACGCCGCGUGUGAAAGUCCCCGGCUCAACCGGGGAGGGUCGCGCGGGACGGCCUGGCUCGAGGGCGGGAGAGGCGGGUGGAAUUCCCGGUGUAGCGGUGAAAUGCGUAGAGAUCGGGAGGAACGCCGGUGGCGAAGGCGGCCCGCUGGCCCGUACCUGACGCUGAGGCGCGAAGGCGUGGGGAGCGAACCGGAUUAGAUACCCGGGUAGUCCACGCAGUAAACGAUGCGGGCGAGGUGUGGGUGGUUUGACCCCAUCCGUGCCGGCGCCAACGCAGUAAGCCCGCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCAGCGGAGCGUGUGGUUUAAUUCGACGCAACGCGAAGAACCUUACCAGGGCUUGACAUGCACCGAACCUGGCUGAAAGGCUGGGGUGCCGUGAGGAGCGGUGGCACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGGGGUCAGUUAC------GCGG-----GUG-UCUGACCCGACUGCCGGGGAAAGCCCGGAGGAAGGAGGGGAUGACGUCAAGUCAGCAUGGCCCUGACGCCCUGGGCGACACACACGCUACAGUGACCGGGACAGUGGGCAGCGAAGGGGCGACCUGGAGCCAAUCCCGCAAACCCGGUCGUGGUGGGGAUCGCAGGCUGCAACCCGCCUGCGUGAACGCGGAGUUGCUAGUAACCGCCGGUCAGCCUACGGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACGUCACGAAAGCUGGCUUCACCUGAAGCUGGUGGGCCAACCGCACGGGGGCAGCCGUCGAGGGUGGGGCUGGUGAUUGGGACGAAGUCGUAACAAGGUAGCCGUACCGGAAGGNNNNGGUGGAUCACCUCCUUU pt.Iso.pal 
NNUCCGAAGGGUUUGAUCCUGGCUCAGAAUGAACGUUGGCGGCGUGGAUUAGGCAUGCAAGUCGAACGGCC-----------GCAA----------GGCUAGUGGCGAAAGGGUGAGUAAGGCGACGGAACCUACCCCGAGGUUGG-CAUAGCCGGGGGAAACUCCGGGUAAUUCCCAGCGAUGUCGCAUCGC-AAAGGU-------GCAA----UUCCGCCUCGGGACGGGCCGUNGUGGUAUUAGGUAGUUGGUGGGGUCACGGCCCACNAAGCCGACGAUGCCUACCGGGCGUGCGAGCGUGGCCCGGCACACUGGGACUGAGACACUGCCCAGACUCCUACGGGAGGCUGCAGUCGAG-AUCUUCGGCAAUGGGCGCAAGCCUGACCGAGCGACGCGCNGUGGAGGAAGACGGCCCUUGGGUUGUAAACUCCUGNCNNGGGGAAGGAAGGGUCGGC---GAAG---AGCCNAUCUUGACCGC-UCCCUGGAGGAAGCACGGGCUAAGUUCGUGCCAGCAGCCGCGGUAAGACGAACCGUGCAAACGUUAUUCGGAAUCACUGGGCUUAAAGGGCGCGUAGGCGGAAGGGCGCGUCGGCGUUGAAAUNCCCCGGCUCAACCGGGNNAGCGGCGUCGAAACGGCCCUUCNGGAGGGGCGUAGAGGGACUCGGAACUUCCGGUGGAGCGGUGAUAUGCGUUGAGAUCGGAAGGAANGCCCGUGGCGAAAGCGGAGUCCUGGACGCUUACUGACGCUGAGGCGCGAAAGCCAGGGGAGCAAACGGGAUUAGAUACCCCGGUAGUCCUGGCUGUAAACRAUGGGCACUUGGCAGUGGGUUCUCGA-GGGUCCACUGCCNNNGGGAAACCGUGAAGUGCCNNGCCUGGGGAGUAUGGUCGCAAGGCUGAAACUCAAAGGAAUUCACGGGGGCUCACACAAGCGGUGGAGGAUGUGGCUUAAUUCGAANNUACGCGAAAAACCUUACCAGGGCUUGACAUGAGGGAUAGCCGGCGGAAACGUCGGUGCGCCGCAAGUGGAACCUAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGCCGUGAGGUGUCGGGUUAAGUCCCUUAACGAGCGAAACCCCUGCGGCNAGUUGCCAACACUCUG-GUGGGGACUCU-GCCAGACCGCCGGCGUGAAGCCGGAGGAAGGCGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUGCCNAGGGCUGCACACGUGCUACUAUGGCGGGGACAAAGCGUCGCCACGCCGUAAGGCCGAGCCAACCGCGUAAACCCCGCCCCAGUUCGGAUCGAGGGCUGCAACCCGCCCUCGUGAAGCCGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUGUGUUCCUGAGCCUUGUACACACCGCCCGUCAAGNCACCAAAGGGGGGGGCACCCGAAGUCGAAGA-UC----UCAC---GACGGGCGCCGAAGGUGAAACUCCNNAUGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- pt.Pla.bra 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-------mmmm-------mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-------mmmm-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmCAAUGGGCGAAAGCCUGACCAAGCGAUGCCGCGUGCGGGAUGAAGGCCCUUGGGUUGUAAACCGCUGUCAGGGGGGAUGAAACU--------UUCG----------GUUGACAGA-GCCCCAGAGGAAGCACGGGCUAAGUACGUGCCAGCAGCCGCGGUAACACGUACUGUGCGAACGUUAUUCGGAAUCACUGGGCUUAAAGGGUGCGUAGGCGGCCUGGAUAGUCAGAUGUGAAAUCCCGCGGCUCAACCGUGGAACUGCAUUUGAAACUGCCAGGCUUGAGUGAGACAGGGGUGUGUGGAACUUCUCGUGGAGCGGUGAAAUGUGUUGAUAUGAGAAGGAACACCGGUGGCGAAAGCGACACACUGGGUCUUAACUGACGCUGAGGCACGAAAGCCAGGGGAGCGAACGGGAUUAGAUACCCCGGUAGUCCUGGCUGUAAACGUUGAGUACUAGUUGGUGGGAACUUCG-GUUCUCACGGACGUAGCAAAAGUGUUAAGUACUCCGCCUGGGGAGUAUGGUCGCAAGGCUGAAACUCAAAGGAAUUGACGGGGGCUCACACAAGCGGUGGAGCAUGUGGCUUAAUUCGAGGCAACGCGAAGAACCUUACCUAGACUUGACAUGAUGGAUAGCUGGCUGAAAGGUCAGUGACGCUUCGGUGGAACAUGCACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCUUGAACGAGCGCAACCCCUGUCGCCAGUUGCCAGCAAUAAUGUUGGGGACUCUGGCGAGACCGCCGGUGUUAAACCGGAGGAAGGUGGGGACGACGUCAAGUCAUCAUGGCCUUUAUGUCUAGGGCUGCACACGUGCUACAAUGCGGCGUACAAAGGGAAGCC-ACCCGCGAGGGGGAGCAAAUCUCAGAAAGCGCCGCUCAGUUCGGAUUGUAGGCUGCAACUCGCCUACAUGAAGCUGGAAUCGCUAGUAAUCGCAGGUCAGCAUACUGCGGUGAAUGUGUUCCUGAGCCUUGUACACACCGCCCGUCAAGCCACGAAAGGGGGGGGCAUCCUAAGUCACUGAGCUA---AUC---UGGCAGGUGCCUAAGAUGAACUCCCUGAUUGGGACUAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCGGCUGGAUCACCUCCUU- pt.Pla.mar 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-------mmmm--------mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-------mmmm----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmCUAGGGGGUGUGAGAGCAUGGCCCCCACCACUGGGACUGAGACACUGCCCAGACACCUACGGGUGGCUGCAGUCGAGAAUCUUCGGCAAUGGACGCAAGUCUGACCGAGCGACGCCGCGUGCGGGAUGAAGGCCCUUGGGUUGUAAACCGCUGUCAGAGGGGAUGAAAUGCAGAAGGGUUAU--CCCUUUUGUUUGACAGA-GCCUCAGAGGAAGCACGGGCUAAGUCCGUGCCAGCAGCCGCGGUAACACGUACUGUGCGAACGUUAUUCGGAAUCACUGGGCUUAAAGGGUGCGUAGGCGGUUUAGUAAGUAGGGUGUGAAAUGCCAGGGCUCAACCUUGGCACGGCGCUCUAAACUGCUAAACUUGAGUGAGAUAGGGGUGUACGGAACUUCCGGUGGAGCGGUGAAAUGCGUUGAUAUCGGAAGGAACACCGGUGGCGAAAGCGGUACACUGGGUCUUAACUGACGCUGAGGCACGAAAGCUAGGGUAGCGAACGGGAUUAGAUACCCCGGUAGUCCUAGCCGUAAACGAUGAGUACUAGUUGGGAGGAGCUUCG-GCUCAUCCGGACGUAGCGAAAGCAUUAAGUACUCCGCCUGGGGAGUAUGGUCGCAAGGCUGAAACUCAAAGGAAUUGACGGGGGCUCACACAAGCGGUGGAGCAUGUGGCUUAAUUCGAGGCAACGCGAAGAACCUUACCUGGAUUUGACAUGUUGUAUAGCUCUGUGAAAGCAGAGUGACGCUUCGGUGGAACUUGCACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUCGCGUUAAGUCGCUGAACGAGCGAAACCCCUAUCCUUAGUUGCCAGCACUCAUGGUGGGGACUCUAAGGAGACUGCCGGUGUCAAACCGGAGGAAGGUGGGGACGACGUCAAGUCAUCAUGGCCUUUAUGUCCAGGGCUGCACACGUGCUACAAUGCGGCGUACAAAGGGAAGCAAAAUCGCGAGAUCAAGCAAAUCCCAAAAAGCNUCGCUCAGUUCGGAUUGCAGGCUGCAACUCGCCUGCAUGAAGUUGGAAUCGCUAGUAAUCGCAGGUCAGCUUACUGCGGUGAAUAUGAACCUGAGCCUUGUACACACCGCCCGUCAAGCCACGAAAGCGGGGGGCGUCCAAAGUmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ch.Zea.may 
CUCAUGGAGAGUUCGAUCCUGGCUCAGGAUGAACGCUGGCGGCAUGCUUAACACAUGCAAGUCGAACGGGAAGU--------GGU---------GUUUCCAGUGGCGAACGGGUGAGUAACGCGUAAGAACCUGCCCUUGGGAGGGGAACAACAACUGGAAACGGUUGCUAAUACCCCGUA-GGCUGAGGAGCAAAAGGA-------GAAA-----UCCGCCCAAGGAGGGGCUCGCGUCUGAUUAGCUAGUUGGUGAGGCAAUAGCUUACCAAGGCGAUGAUCAGUAGCUGGUCCGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUC-GGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGCCUGACGGAGCAAUGCCGCGUGGAGGUGGAAGGCCUACGGGUCGUCAACUUCUUUUCUCGGAGAAGAA-----------ACAA------------UGACGGUAUCUGAGGAAUAAGCAUCGGCUAACUCUGUGCCAGCAGCCGCGGUAAGACAGAGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCGUAAAGCGUCUGUAGGUGGCUUUUCAAGUCCGCCGUCAAAUCCCAGGGCUCAACCCUGGACAGGCGGUGGAAACUACCAAGCUGGAGUACGGUAGGGGCAGAGGGAAUUUCCGGUGGAGCGGUGAAAUGCAUUGAGAUCGGAAAGAACACCAACGGCGAAAGCACUCUGCUGGGCCGACACUGACACUGAGAGACGAAAGCUAGGGGAGCAAAUGGGAUUAGAGACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAGGUGCUGUGCGACUCGCCCGUGCAGUGCUGUAGCUAACGCGUUAAGUAUCCCGCCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAAGGCGAAGAACCUUACCAGGGCUUGACAUGCCGCGAAUCCUCUUGAAAGAGAGGGGUGCCCUCGGGAACGCGGACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGCCGUAAGGUGUUGGGUUAAGUCUCGCAACGAGCGCAACCCUCGUGUUUAGUUGCCACUA-UGAG-UUUGGAACCCUGAACAGACCGCCGGUGUUAAGCCGGAGGAAGGAGAGGAUGAGGCCAAGUCAUCAUGCCCCUUAUGCCCUGGGCGACACACGUGCUACAAUGGGCGGGACAAAGGGUCGCGAUCUCGCGAGGGUGAGCUAACUCCAAAAACCCGUCCUCAGUUCGGAUUGCAGGCUGCAACUCGCCUGCAUGAAGCAGGAAUCGCUAGUAAUCGCCGGUCAGCCUACGGCGGCGAAUCCGUUCCCGGGCCUUGUACACACCGCCCGUCACACUAUAGGAGCUGGCCAGGUUUGAAGUCAUUAC-CUAACCGUAAGGAGGGGGAUGCCUAAGGCUAGGCUUGCGACUGGAGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGCGGCUGGAUCACCUCCUUU ch.Dau.car 
CUCAUGGAGAGUUCGAUCCUGGCUCAGGAUGAACGCUGGCGGCAUGGAAAACACAUGCAAGUCGGACGGGAAGU--------GGU---------GUUUCCAGUGGCGGACGGGACUGUAACGCGUAAGAACCUGCCCUUGGGUGGGGAACAACAGCUGGAAACGGCUGCUAAUACCCCGUA-GGCUGAGGAGCAAAAGGA-------GGAA-----UCCGCCCGAGGGAGGGCUCGCGUCUGAU-AGCUAGUUGGUGAGGCAAUAGCUUACCAAGGCGAUGAUCAGUAGCUGGUCCGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUC-GGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGC--GACGGAGCAAUGCCGCGUGGAGGUAGAAGGCCCACGGGUCGUGAACUUCUUUUCCCGGAGAAGAA-----------GCAA------------UGACGGUAUCUGGGGAAUAAGCAUCGGCUAACUCUGUGCCAGCAGCCGCGGUAAUACAGAGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCGUAAAGCGUCUGUAGGUGGCUUUUUAAGUCCGCCGUCAAAUCCCAGGGCUCAACCCUGGACAGGCGGUGGAAACUACCAAGCUGGAGUACGGUAGGGGCAGAGGGAAUUUCCGGUGGAGCGGUGAAAUGCGUAGAGAUCGGAAAGAACACCAACGGCGAAAGCACUCUGCUGGGCCGACAUUGACACUGAGAGACGAAAGCUAGGGGAGCGAAUGGGAUUAAAUACCCCAUUAGUCCUAGCCGUAAACGAUGGAUACUAGGCGCUGUGCG-AUCGCCCGUGCAGUGCUGUAACUACCGCGUUAAGUAUCCCGCCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGGCGCGCACAAUCGGUGGAGCAUGUGGUUUAAUUCGAUGCAAAGCGAAGAACCUUACCAUGGCUUGACAUGCCGCGAAUCCUCUUGAAAGAGAGGGGUGCCUUCGGGAACGCGGACACAGGUGGUGAAUGGCUGUCGUCAGCUCGUGCCGUAAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUGUUUAGUUGCCAUCG-UUGAGUUUGGAACCCUGAACAGACUGCCGGUGAUAAGCCGGAGGAAGGUGAGGAUGACGUCAAGUCAUCAUGCCCCUUAUGCCCUUGGCGACACUCGUGCUACAAUGGCCGGGACAAAGGGUUGCGAUCCCGCGAGGGUGAGCUAACCCCAAAAACCCGUCCUCAGUUGGGAUUGCAGGCUGCAACUCGCCUGCAUGAAGCCGGAAUCGCUAGUAAUCGCCGGUCAGCCUACGGCGGUGAAUUCGGUACCGGGCCUUGUACACACCGCCCGUCACACUAUGGGAGCUGGCCAUGCCCGAAGUCGUUAC-CUAACCGCAAGG-GGGGGAUGCCGAA-GCAGGGCUAGUGACUGGAGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGCGGCUGGAUCACCUCCUmm ch.Gly.max 
CUCAUGGAGAGUUCGAUCCUGGCUCAGGAUGAACGCUGGCGGCAUGCCUUACACAUGCAAGUCGGACGGGAAGU--------GGU---------GUUUCCAGUGGCGGACGGGUGAGUAACGCGUAAGAACCUACCCUUGGGAGGGGAACAACAGCUGGAAACGGCUGCUAAUACCCCGUA-GGCUGAGGAGCAAAAGGA--------GAA-----UCCGCCCGAGGAGGGGCUCGCGUCUGAUUAGCUAGUUGGUGAGGC-AUAGCUUACCAAGGCGAUGAUCAGUAGCUGGUCCGAGA-GAUGAUCAGCCACACUGGGACUGAGACGAGGCCCAGACUCUUC-GGGAGGCAGCAGUGGGGAAUUUUCCGCAAUG-GCGAAAGC-UGACGGAGCAAUGCCGCGUGAAGGUAGAAGGCCUAC-GGUCAUGAACUUCUUUUCCCGGAGAAGAA-----------GCAA------------UGACGGUAUCCGGGGAAUAAGCAUCGGCUAACUCUGUGCCAGCGGCCGCGGUAAGACAGAGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCG-AAAG-GUCUGUAGGUGGCUUUUUAAGUUCGCCGUCAAAUCCCAGGGCUCAACCCUGGACAGGCGGUGGAAACUACCAAGCUGGAGUACGGUAGGGGCAGAGGGAAUUUCCGGUGGAGCGGUGAAAUGCGUAGAGAUCGGAAAGAACACCAACGGCGAAAGCACUCUGCUGGGCCGACACUGACACUGAGAGACGAAACUUAGGGGAGCGAAUGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAGGCGCUGUGCGUAUCGCCCGUGCAAUGCUGUA-CUAACGCGUUAAGUAUCCCGCCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGG-CCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAAAGCGAAGAACCUUACCAGGGCUUGACAUGCCGCGAAUCCUCUUGAAAGAGAGGGGUGC-UUCGGGAACGCGGACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGCCGUAAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUGUUUAGUUGCCAACA-UUUAGUUUGGAACCCUGAGCAGACUGCCGGUGAUAAGCCGGAGGAAGGUGAGGAUGACGUCGAGUCAUCAUGCCCCUUAUGCCCUGGGCGACACACGUGCUACAAU-----GGACAAAGGAUCGCGAUCCCGCGA-GGUGAGCUAACUCCAAAAACCCGUCCUCAGUUCGGAUUGUAGGCUGCAACUCGCCUGCAUGAAGCCGGAAUCGCUAGUAAUCGCCGGUCAGCCUACGGCGGUGAAUUCGUUCCCGGGCCUUGUACACACCGCCCGUCACACUAUGGGAGCUGGCCAUGC-CGAAGUCGUUAC-CUAACCGCAA-GAGGGGGAUGCCGAA-G--GGGCUAGUGACUGGAGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGCGGCUGGAUCACCUCCUUm ch.Pis.sat 
CUCAUGGAGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGCAUGCUUUACACAUGCAAGUCGGACGGGAAGU--------GGU---------GUUUCCAGUGGCGAACGGGUGAGUAACGCGUAAGAACCUGCCCUUGGGAGGGGGACAACAGCUGGAAACGGCUGCUAAUACCCCGUA-GGCUGAGGAGCGAAAGGA-------GGAA-----UCCGCCCAAGGAGGGGCUCGCGUCUGAUUAGCUAGUUGGUGAGGUAAUAGCUUACCAAGGCGAUGAUCAGUAGCUGGUCCGAGAGGAUGAUCAGCCACACUGGGACUGAGACAAGGUCCAGACUCCUC-GGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGCCUGACGGAGCAAUGCCGCGUGGAGGUAGA-GGCCCCUGGGUCAUGAACUUCUUUUCCCGGAGAAGAA-----------AAAA------------UGACGGUAUCCGGGGAAUAAGCAUCGGCUAACUCUGUGCCAGCAGCCGCGGUAAGACAGAGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCGUAAAGCGUCUGUAGGUGGCUUUUUAAGUUCGCUGUCAAAUACCAGGGCUCAACCCUGGACAGGUGGUGAAAACUACUAAGCUAGAGUACGGUAGGGGCAGAGGGAAUUUCCGGUGGAGCGAUGAAAUGCGUAGAGAUCGGAAGGAACACCAACGGCGAAAGAACUCUGCUGGGCCGACACUGACACUGAGAGACGAAAGCUAGGGGAGCGAAUGGGAUUAGAGACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAAGUGCUGUGCGAAUCGCCCGUGCAACGCUGUA-CUAACGCGUUAAGUAUCCCGCCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGG-CCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAAAGCGAAGAACCUUACCAGGGCUUGACAUGC-GCGAAUCCUCUUGAAAGAGAGGAGUGCCUUCGGGAAUGC-GACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUGUUUAGUUGCCAACG-UUUAGUUUGGAACUCUGAACAGACUGCCGGUGAUAAGCCGGAGGAAGGUGAGGAUGACGUCAAGUCAUCAUGCCCCUUAUGCCCUGGGCUACACAC-UGCUACAAUGGACCGGACAAAGGAUCGCGACCCCGCGAGGGUGAGCUAACUUCAAAAACCUGUCCUCAGUUCGGAUUGUAGGCUGCAACUCGCCUACAUGAAGCCGGAAUCGCUAGUAAUCGCCGGUCAGCCUACGGCGGUGAAUUCGUUCCCGGGCCUUGUACACACCGCCCGUCACACUAUGGGAGCUGGCCAUGCCCGAAGUCAUUAC-CUAACCGCAAGGAGGGGGAUGCCGAAGGCAGGGCUAGUGACUGGAGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGCGACUGGAUCACCUCCUUU ch.Nic.tab 
CUCAUGGAGAGUUCGAUCCUGGCUCAGGAUGAACGCUGGCGGCAUGCUUAACACAUGCAAGUCGGACGGGAAGU--------GGU---------GUUUCCAGUGGCGGACGGGUGAGUAACGCGUAAGAACCUGCCCUUGGGAGGGGAACAACAGCUGGAAACGGCUGCUAAUACCCCGUA-GGCUGAGGAGCAAAAGGA-------GGAA-----UCCGCCCGAGGAGGGGCUCGCGUCUGAUUAGCUAGUUGGUGAGGCAAUAGCUUACCAAGGCGAUGAUCAGUAGCUGGUCCGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUC-GGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGC--GACGGAGCAAUGCCGCGUGGAGGUAGAAGGCCCACGGGUCGUGAACUUCUUUUCCCGGAGAAGAA-----------GCAA------------UGACGGUAUCUGGGGAAUAAGCAUCGGCUAACUCUGUGCCAGCAGCCGCGGUAAUACAGAGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCGUAAAGCGUCUGUAGGUGGCUUUUUAAGUCCGCCGUCAAAUCCCAGGGCUCAACCCUGGACAGGCGGUGGAAACUACCAAGCUGGAGUACGGUAGGGGCAGAGGGAAUUUCCGGUGGAGCGGUGAAAUGCGUAGAGAUCGGAAAGAACACCAACGGCGAAAGCACUCUGCUGGGCCGACACUGACACUGAGAGACGAAAGCUAGGGGAGCGAAUGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAGGCGCUGUGCG-AUCGCCCGUGCAGUGCUGUAGCUAACGCGUUAAGUAUCCCGCCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGGCGCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAAAGCGAAGAACCUUACCAUGGCUUGACAUGCCGCGAAUCCUCUUGAAAGAGAGGGGUGCCUUCGGGAACGCGGACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGCCGUAAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUGUUUAGUUGCCAUCG-UUGAGUUUGGAACCCUGAACAGACUGCCGGUGUUAAGCCGGAGGAAGGUGAGGAUGACGUCAAGUCAUCAUGCCCCUUAUGCCCUUGGCGACACACGUGCUACAAUGGCCGGGACAAAGGGUCGCGAUCCCGCGAGGGUGAGCUAACCCCAAAAACCCGUCCUCAGUUCGGAUUGCAGGCUGCAACUCGCCUGCAUGAAGCCGGAAUCGCUAGUAAUCGCCGGUCAGCCUACGGCGGUGAAUUCGUUCCCGGGCCUUGUACACACCGCCCGUCACACUAUGGGAGCUGGCCAUGCCCGAAGUCGUUAC-CUAACCGCAAGG-GGGGGAUGCCGAA-GCGGGGCUAGUGACUGGAGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGCGGCUGGAUCACCUCCUUU ch.Pin.thu 
CUCAUGGAGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGCAUGCUUAACACAUGCAAGUCGGACGGGAAGU--------GGU---------GUUUCCAGUGGCGGACGGGUGAGUAACGCGUAAGAACCUGCCCUUGGGAGGGGAACAACAGCUGGAAACGGCUGCUAAUACCCCAUA-GGCUGAGGAGCAAAAGGA-------GGAA-----UCCGCCCAAGGAGGGGCUCGCGUCUGAUUAGUUAGUUGGUGAGGCAAUGGCUUACCAAGGCGACGAUCAGUAGCUGGUCCGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUC-GGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGCCUGACGGAGCAAUGCCGCGUGAAGGCAGAAGGCCCACGGGUCAUGAACUUCUUUUCUCGGAGAAGAA-----------AAAA------------UGACGGUAUCUGAGGAAUAAGCAUCGGCUAACUCUGUGCCAGCAGCCGCGGUAAGACAGAGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCGUAAAGCGUCUGUAGGUGGCUUUUCAAGUCCGCCGUCAAAUCCCAGGGCUCAACCCUGGACAGGCGGUGGAAACUACCAAGCUGGAGUACGGUAGGGGCAGAGGGAAUUUCCGGUGGAGCGGUGAAAUGCGUUGAGAUCGGAAAGAACACCAACGGCGAAAGCACUCUGCUGGGCCGACACUGACACUGAGAGACGAAAGCUAGGGGAGCAAAUGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAAGUGCUGUGCGUAUCGCCCGCGCAGUGCUGUAGCUAACGCGUUAAGUAUCCCGCCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUCAAUUCGAUGCAAAGCGAAGAACCUUACCAGGGCUUGACAUGCCGUGAAUCCUCCCGAAAGAGAGGAGUGCCUUCGGGAACGCGGACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGCCGUAAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUGUUUAGUUGCCAGCA-UUUAGUUUGGAACCCUGAACAGACUGCCGGUGAUAAGCCGGAGGAAGGUGAGGAUGACGUCAAGUCAUCAUGCCCCUUACGCCCUGGGCGACACACGUGCUACAAUGACCGGGACAAAGGGUCGCGACCCCGCGAGGGCAAGCUAACCUCAAAAACCCGGCCUCAGUUCGGAUUGCAGGCUGCAACUCGCCUGCAUGAAGCCGGAAUCGCUAGUAAUCGCCGGUCAGCCUACGGCGGUGAAUCCGUUCCCGGGCCUUGUACACACCGCCCGUCACACUAUGGGAGCUGGCCAUGCCCCAAGUCGUUAC-CUAACCGCAAGGAGGGGGAUGCCGAAGGCUGGGCUAGUGACUGGAGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGCGGCUGGAUCACCUCCUUU ch.Chl.mir 
CCUGCAGAGAGUUYGAUCCUGGCUCAGGAUGAACGCUGGCGGCAUGCUUAACACAUGCAAGUCGAACGGAGAUUUUAUGUUUCUUGAAACAUAAAAUCUUAGUGGCGGACGGGUGAGGAACGCGUAAGAACCUACCUUUAGGGGAGGGACAACAACUGGAAACGGUUGCUAAUACCUCAUA-UGCUGAGGAGUAAAAGGGUUAUAAAUACGAUUAUUCCGCCUAAAGAUGGGCUUGCGUCUGAUUAGCUUGUUGGUGGGGUAAUUGCUUACCAAGGCGACGAUCAGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUC-GGGAGGCAGCAGURAGGAAUUUUCCGCAAUGGGCGAAAGCCUGACGGAGCAAUGCCGCGUGGAGGAUGACAGCCUGUGGGUCGUAAACUCCUUUUCUCAGAAAAGAA-----------GAUC------------UGACGGUAUCUGAGGAAUAAGCAUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCGUAAAGCGUCUGUAGGUGGUUUAUUAAGUCUACUGUUAAAGAUCAGGGCUUAACCCUGAGCCGGCAGUGGAAACUAAUAAGCUUGAGUACGGUAGGGGCAGAGGGAAUUCCCGGUGUAGCGGUGAAAUGCGUAGAGAUCGGGAAGAACACCAAUGGCGAAAGCACUCUGCUGGGCCGAAACUGACACUGAGAGACGAAAGCUAGGGGAGCGAAUGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAAGUGCUGUGCGACUCAACCGUCCAGUACUGUAGCUAACGCGUGAAGUAUCCCGCCUGGGGAGUAUGCUCGCAAGGGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGYUUAAUUCGAUGCAACGCGAAGAACCUUACCAGGGCUUGACAUGCCAUUUAGUCUUUUGAAAAAAAGACAUAUUUUUAGAUUGGUGGACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCUUUACUUGCCAUUACUAGCUGUUGGGAAAUUAGAGAGACUGCCGGUGAGAAGCCGGRGGAAGGUGAGGAUGACGUCAAGUCAGCAUGCCCCUUAUGCCCUGGGCGACACACGUGCUACAAUGGCCGGGACAAUGAGAUGCAACCUCGCGAGAGCAAGCUAACCUCAAAAACCCGGUCUCAGUUCGGAUUGCAGGCUGCAACUCGCCUGCAUGAAGUCGGAAUCGCUAGUAAUCGCUGGUCAGCCUACAGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGCUGGCUAUGCCCAAAGUCGUUAC-CCAACCGUAAGGAGGGGGAUGCCUAAGGCAGAGCUAGUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGNGGNUGGAUCACCUCCUUG ch.Eug.gra 
GAAAUGACGAGUUUGAUCCUUGCUCAGGGUGAACGCUGGCGGUAUGCUUAACACAUGCAAGUUGAACGAAAUUACUA-----GCAA-----UAGUAAUUUAGUGGCGGACGGGUGAGUAAUAUGUAAGAAUCUGCGCUUGGGUGAGGAAUAACAGAUGGAAACGUUUGCUAAUGCCUCAUAUUACUGUGAAGUUAAAGA--------GAAU-----UUCGCCUAGGCAUGAGCUUGCAUCUGAUUAGCUUGUUGGUGAGGUAAAGGCUUACCAAGGCGACGAUCAGUAGCUGAUUUGAGAGGAUGAUCAGCCACACUGGGAUUGAGA-ACGGAACAGACUUUUC-GGAAGGCAGCAGUGAGGAAUUUUCCGCAAUGGGCGCAAGCCUGACGGAGCAAUACCGCGUGAAGGAAGAAGGCCUUUGGGUUGUAAACUUCUUUUCUCAAAGAAGAA-----------GAAA------------UGACGGUAUUUGAGGAAUAAGCAUCGGCUAAUUCCGUGCCAGCAGCCGCGGUAAUACGGGAGAUGCGAGCGUUAUCCGGAAUUAUUGGGCGUAAAGAGUUUGUAGGCGGUCAAGUGUGUUUAAUGUUAAAAGUCAAAGCUUAACUUUGGAAGGGCAUUAAAAACUGCUAGACUUGAGUAUGGUAGGGGUGAAGGGAAUUUCCAGUGUAGCGGUGAAAUGCGUAGAGAUUGGAAAGAACACCAAUGGCGAAGGCACUUUUCUAGGCCAAUACUGACGCUGAGAAACGAAAGCUGAGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCUUGGCCGUAAACUAUGGAUACUAAGUGGUGCUG--AAAG----UGCACUGCUGUAGUUAACACGUUAAGUAUCCCGCCUGGGGAGUACGCUUGCACAAGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACACGAAGAACCUUACCAGGAUUUGACAGGCUAGGAGGAAGUUUGAAAGAACGCAGUACCUUCGGGUAUCUAGACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUUUUUUUAAUUAAC-----GCUU-----GUCAUUUAGAAAUACUGCUGGUUAUU-ACCAGAGGAAGGUGAGGACGACGUCAAGUCAUCAUGCCCCUUAUAUCCUGGGCUACACACGUGCUACAAUGGUUAAGACAAUAAGUUGCAAUUUUGUGAAAAUGAGCUAAUCUU-AAAACUUAGCCUAAGUUCGGAUUGUAGGCUGAAACUCGCCUACAUGAAGCCGGAAUCGCUAGUAAUCGCCGGUCAGCUUACGGCGGUGAAUACGUUCUCGGGCCUUGUACACACCGCCCGUCACACCAUGGAAGUUGGCUGUGCCCGAAGUUAUUAU-CUGCCUGAAAAGAGGGAAAUACCUAAGGCCUGGCUGGUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGUGGCUGGAACAAUUCCCmm ch.Cya.par 
ACCAUGGAGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGUAUGCUUAACACAUGCAAGUCGAACGAAGAUC--------GCAA--------GAUCUUAGUGGCGGACGGGUGAGUAACGCGUGAGAAUCUACCCUUAGGAGGGGGACAACAGUUGGAAACGACUGCUAAUACCCCAUA-UGCCUUCGGGUGAAAAGA-------GUAA-----UCUGCCUGAGGAAGAGCUCGCGUCUGAUUAGCUAGUUGGUGGGGUAAAGGCCUACCAAGGCGACGAUCAGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGAUACGGCCCAGACUCCUC-GGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGCCUGACGGAGCAAUACCGCGUGAGGGAAGACGGCCUGUGGGUUGUAAACCUCUUUUCUUAGGGAAGAA-----------UCAA------------UGACGGUACCUAAGGAAUAAGCAUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAUGCAAGCGUUAUCCGGAAUCAUUGGGCGUAAAGAGUUCGUAGGUGGCUAAGCAAGUCUGUUGUUAAAGGCUGGGGCUUAACCCCAAAAAGGCAAUGGAAACUGUUUGGCUUGAGUACGGUAGGGGCAGAGGGAAUUCCUGGUGUAGCGGUGAAAUGCGUAGAUAUCAGGAAGAACACCGAUGGCGAAAGCACUCUGCUGGGCCGUUACUGACACUGAGGAACGAAAGCUAGGGUAGCAAAUGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACUAUGGAUACUAGGUGUUGUGCGUAUCGCCCGUACAGUACCGUAGCUAACGCGUUAAGUAUCCCGCCUGGGGAGUACGCUCGCAAGAGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGUAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCAGGGUUUGACAUGUCGCGAAUUUUCUUGAAAGAGAAAAGUGCCUUCGGGAACGCGAACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUUUUUAGUUGCCAUCA-UCAG-UUGGGCACUCUAAAGAGACUGCCGGUGACAAGCCGGAGGAAGGUGGGGAUGACGUCAAGUCAGCAUGCCCCUUAUACCCUGGGCUACACACGUACUACAAUGGUCGGGACAAUAGGUUGCCAACUUGCGAAAGUGAGCUAAUCCGUUAAACCCGGCCUCAGUUCAGAUUGCAGGCUGCAACUCGCCUGCAUGAAGGUGGAAUCGCUAGUAAUCGCCGGUCAGCUUACGGCGGUGAAUUCGUUCCCGGGCCUUGUACACACCGCCCGUCACACCACGGGAGUCGGCCAUGCCCGAAGUCGUUAC-CUAACCAUUUGGAGGGGGAUGCCUAAGGCAGGGCUGGUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGCGGCUGGAUCACCUCCUUU ch.Cya.cal 
AUAAUGGAGAGUUUGAUCCUGGCUCAGGAUUAACGCUGGCGGUAUGCCUAACACAUGCAAGUCGUACGAGAAU----------UUU---------AUUCUAGUGGCGGACGGGUGAGUAACACGUGAGAAUCUACCUCUAGGAGGGGGAUAACAGUUGGAAACGAUUGCUAAAACCCCAUA-UGCCUUAUGGUGAAAAGA-------UUUA-----UCUGCCUGGAGAUGAGCUCGCGGCUGAUUAGCUAGUUGGUAGGGUAAUGGCUUACCAAGGCAACGAUCAGUAGCUGGUCUUAGAGGAUGAUCAGCCACACUGGAACUGAGAUACGGUCCAGACUCCUC-GGGAGGCAGCAGUGGGGAAUUUUCCACAAUGGGCGAAAGCCUGAUGGAGCAAUACCGUGUGAGGGAUGAAGGCCUGUGGGUUGUAAACCUCUUUUUUCAGGAAAGAA-----------ACUU------------UGACGGUACCUGAAGAAUAAGCAUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAUGCAAGCGUUAUCCGGAAUCACUGGGCGUAAAGCGUCUGUAGGUGGUUUAUCAAGUCUGCUGUUAAAGCUUGAGGCUUAACCUCAAAAAAGCAGUGGAAACUGAUAGACUAGAGAAUGGUAGGGGCAGAGAGAAUUCUCAGUGUAGCGGUGAAAUGCGUAGAUAUUGAGAAGAAUACCGAUAGCGAAGGCGCUCUGCUGGGCCAUUACUGACACUCAGAGACGAAAGCUAGGGGAGCAAAUGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACUAUGGAUACUAGAUGUUGUGUGAGUAAAUUGUGCAGUAUCGAAGCUAACGCGUUAAGUAUCCCGCCUGGGAAGUACGCUCGCAAGAGUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGC-GUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCAGGACUUGACAUGUUACUAAUUUCCUUGAAAGAGGAAAGUGCCUUUGGGAAAGUAAACACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCUUUAGUUACCAUCA-UUAAGUUGGGGACUCUAAAGAGACUGCCGGUGAUAAACCGGAGGAAGGUAAGGAUGAGGUCAAGUCAUCAUGCCCCUUAUGUCCUGGGCUACACACGUGCUACAAUGGUUAGGACAAUAAGUCGCAAAUUCGUGAGAACUAGCUAAUCUUAUAAACCUAAUCUCAGUACGGAUUGUAGGCUGCAACUCGCCUACAUGAAGACGGAAUCGCUAGUAAUCGCUGGUCAGCUCACAGCGGUGAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCAUGGGAGCUGGCCAUGUCCGAAAUCAUUAC-UUAACUUAAUGGAGGAGGAUGCUUAAGGCAGGGCUAGUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUACUGGAAGGUGCGGCUGGAUUACCUCCUUU ga.Esc.col 
AAAUUGAAGAGUUUGAUCAUGGCUCAGAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAACGGUAACAGGAAGAAGCUUGCUUCUUUGCUGACGAGUGGCGGACGGGUGAGUAAUGUCUGGGAAACUGCCUGAUGGAGGGGGAUAACUACUGGAAACGGUAGCUAAUACCGCAUAACGUCGCAAGACCAAAGAGGGGGACCUUCGGGCCUCUUGCCAUCGGAUGUGCCCAGAUGGGAUUAGCUAGUAGGUGGGGUAACGGCUCACCUAGGCGACGAUCCCUAGCUGGUCUGAGAGGAUGACCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCCAUGCCGCGUGUAUGAAGAAGGCCUUCGGGUUGUAAAGUACUUUCAGCGGGGAGGAAGGGAGUAAAGUUAAUACCUUUGCUCAUUGACGUUACCCGCAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGCACGCAGGCGGUUUGUUAAGUCAGAUGUGAAAUCCCCGGGCUCAACCUGGGAACUGCAUCUGAUACUGGCAAGCUUGAGUCUCGUAGAGGGGGGUAGAAUUCCAGGUGUAGCGGUGAAAUGCGUAGAGAUCUGGAGGAAUACCGGUGGCGAAGGCGGCCCCCUGGACGAAGACUGACGCUCAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGUCGACUUGGAGGUUGUGCCCUUGAGGCGUGGCUUCCGGAGCUAACGCGUUAAGUCGACCGCCUGGGGAGUACGGCCGCAAGGUUAAAACUCAAAUGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGUCUUGACAUCCACGGAAGUUUUCAGAGAUGAGAAUGUGCCUUCGGGAACCGUGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUUGUGAAAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUCCUUUGUUGCCAGCGGUCCGGCCGGGAACUCAAAGGAGACUGCCAGUGAUAAACUGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACGACCAGGGCUACACACGUGCUACAAUGGCGCAUACAAAGAGAAGCGACCUCGCGAGAGCAAGCGGACCUCAUAAAGUGCGUCGUAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGUGGAUCAGAAUGCCACGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGUUGCAAAAGAAGUAGGUAGCUUAACCUUCGGGAGGGCGCUUACCACUUUGUGAUUCAUGACUGGGGUGAAGUCGUAACAAGGUAACCGUAGGGGAACCUGCGGUUGGAUCACCUCCUUA ga.Hae.par 
--ACUGAAGAGUUUGAUCNUGGCUCAGAUUGAACGCUGGCGGCAGGCUUAACACAUGCAAGUCGAACGGUAACGGGUUGAAACUUGUUUCAAUGCUGACGAGUGGCGGACGGGUGAGUAAUGCUUGGGAAUCUGGCUUAUGGAGGGGGAUAACCAUUGGAAACGAUGGCUAAUACCGCAUAGAAUCGGAAGAUUAAAGGGUGGGACUUUUUAGCCACCUGCCAUAAGAUGAGCCCAAGUGGGAUUAGGUNGUUGGUGAGGUNAAGGCUCACCAAGCCUNCGAUCUCUAGCUNGUCUNAGAGGAUGGCCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCGCNAUGGGGGGAACCCUGACGCAGCNAUGCCGCGUGAAUGAAGAAGGCCUUCGGGUUGUAAAGUUCUUUCGGUGGUGAGGAAGGUUNGUGUGUUAAUAGCACACUAAUUUGACGUUAGCCACAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUNAUACGGAGGGUGCGAGCGUUNAUCGGAAUAACUGGGCUUAAAGGGCACGCAGGCGGUAAAUUAAGUGAGAUGUGAAAUCCCCGAGCUUAACUUAGGAAUUGCAUUUCAGACUNGUUUACUAGAGUACUUUAGGGAGGGNUAGAAUUCCACGUGUAGCGGUGAAAUGCGUAGAGAUGUGGAGGAAUACCGAAGGCGAAGGCAGCCCCUUGGGAAGCUACUGACGCUCAUGUGCNNAAGCGUGGGGAGCAAACAGGAUUNGAUACCCUGGUAGUCCACGCUGUAAACGCUGUCGAUUUGGGGAUUGGGC-UUNN-GGCUUGGUGCCCGUAGCUAACGUGAUAAAUCGACCGCCUNGGGAGUACGGCCGCAAGGUUAAAACUCAAAUNAAUUGACGGGGGCCCGCACNAGCGGUGGAGCAUGUGGUUUNAUUCGANNNAACGCGAAGAACCUUACCUACUCUUGACAUCCUAAGAAUCCUGUAGAGAUACGGGAGUGCCUUCGGGAGCUUAGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUUGUGAAAUGUUGGGUUNAGUCCCGCAACGAGCGCAACCCUUAUCCUUUGUUGCCAGCACUUCGGGUGGGAACUCAAAGGAGACUGCCAGUGAUNAACUGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACGAGUAGGGCUACACACGUGCUACAAUGGUGCAUACAGAGGGAAGCGAGCCUGCGAGGGGGAGCGAAUCUCAGAAAGUGCAUCUAAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGCAAAUCAGAAUGUUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGUUGUACCAGAAGUAGAUAGCUUAACCUUCGGGAGGGCGUUUACCACGGUAUGAUUCAUGACUmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ga.Leg.lyt 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmACGCUGGCGGCAUGCUUAACACAUGCAAGUCGAACGGCAGCACAGUCUAGCUUGCUAGACGGGUGGCGAGUGGCGAACGGGUGAGUAACGCGUAGGAAUAUACCUUGAAGAGGGGGACAACUUGGGGAAACUCAAGCUAAUACCGCAUAAUGUCUGAGGACGAAAGCCGGGGACCGUAAGGCCUGGCGCUUUAAGAUUAGCCUGCGUCCGAUUAGCUAGUUGGUAGGGAAAGGGCCUACCAAGGCGACGAUCGGUAGCUGGUCUGAGAGGAUGACCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGGGGAACCCUGAUCCAGCAAUGCCGCGUGUGUGAAGAAGGCCUGAGGGUUGUAAAGCACUUUCAGUGGGGAGGAGGCUUGUUAGGUUAAGAGCUAAAUAAGUGGACGUUACCCACAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGAGUGCGUAGGUGGUUUGGUAAGUUAUCUGUGAAAUCCCUGGGCUCAACCUGGGCAGGUCAGAUAAUACUGCUGAACUCGAGUAUGGGAGAGGGUAGUGGAAUUUCCGGUGUAGCGGUGAAAUGCGUAGAGAUCGGAAGGAACACCAGUGGCGAAGGCGGCUACCUGGCCUAAUACUGACACUGAGGCACGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGUCAACUAGCUGUUGGUUAUAUGAUAUAAUUAGUGGCGCAGCAAACGCGAUAAGUUGACCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCUACCCUUGACAUACAGUGAACUUUGCAGAGAUGCAUUGGUGCCUUCGGGAACACUGAUACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGUAACGAGCGCAACCCUUAUUGUUCGUUGCCAGCGCUAAUGGCGGGAACUCUAAUGAGACUGCCGGUGACAAACCGGAGGAAGGCGGGGACGACGUCAAGUCAUCAUGGCC-UUACGGGUAGGGCUAGACACGUGCUACAAUGGUUGAUACAGAGGGAAGCGAAGGAGCGAUCUGGAGC-AAUCUUAGAAAGUCAAUCGUAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGCGAAUCAGCAUGUCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGCUG-ACCAGAAGUAGAUAGUCUAACCGCAAGGGGGACGUUUAmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ga.Mar.psy 
-----mmAGAGUUUGAUCAUGGCUCAGAUUGAACGCUGGCGGCAGGCUUAACACAUGCAAGUCGAACGGUAACAGGAAAUAGCUUGCUAUUUUGCUGACGAGUGGCGGACGGGUGAGUAAUGCUUGGGAAUUUGCCGAAAGGUGGGGGACAACAGUUGGAAACGACUGCUAAUACCGCAUAAUGUCUACGGACCAAAGGUGGCCUCUUUUAAUGCUAUCGCCUUUCGAUGAGCCCAAGUGGGAUUAGCUAGUUGGUAAGGUAAUGGCUUACCAAGGCUUCGAUCCCUAGCUGGUCUUAGAGGAUGACCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGAGGAAACUCUGAUGCAGCCAUGCCGCGUGUGUGAAGAAGGCUUUCGGGUUGUAAAGCACUUUCAGCGAGGAGGAAAGGGUGUUGGUUAAUAUCCAAUAUCUGUGACGUUACUCGCAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAAGGUGCGAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGCGCGUAGGCGGUUUAAUAAGUCAGAUGUGAAAGCCCAGGGCUCAACCUGGGAACUGCAUUUUGAACUGUUAAACUAGAGUUUUGUAGAGGANGGUAGAAUUUCAGGUGUAGCGGUGAAAUGCGUAGAGAUCUGAAGGAAUACCAGUGGCGAAGGCGGCCACCUGGACAAAGACUGACACUGAGGCGCGAAAGCGUGGGUAGCAAACGGGAUUAGAUACCCCGGUAGUCCACGCAGUAAACGAUGUCUAUUAGAAGUUUGUGGCUAUAUGCCGUGGGUUUCAAAGUUAACGCAUUAAAUAGACCGCCUGGGGAGUACGGCCGCAAGGUUAAAACUCAAAUGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCAUCCCUUGACAUCCAGAGAAUCACCUAGAGAUAGAUGAGUGCCUUCGGGAACUCUGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUUGUGAAAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUCCUUAGUUGCCAGCACUAAUGGUGGGAACUCUAGGGAGACUGCCGGUGAUAAACCGGAGGAAGGUGGGGACGACGUCAAGUCAUCAUGGCCCUUACGGGAUGGGCUACACACGUGCUACAAUGGCAAAUACAAAGGGUUGCUAACCUGCGAGGGUAUGCGAAUCUCAUAAAGUUUGUCGUAGUCCGGAUCGGAGUCUGCAACUCGACUCCGUGAAGUUGGAAUCGCUAGUAAUCGUGGAUCAGAAUGCCACGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGCUGCACCAGAAGUCAUUAGCUUAACCUUCGGGAUGGCGAUGACCACGGUGUGGUUCAUGACUGGGGUGAAGUCGUAACAAGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ga.Oce.pus 
CAACUUGAGAGUUUGAUCCUGGCUCAGAACGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAGGGAGAAGCUAUC----UUCG----GAUAGUGGAGACCGGCAGACGGGUGAGUAACACGUGGGAACRUACCGAGUAGUGGGGGAUAACAGUUGGAAACGACUGCUAAUACCGCAUACGCCCUUCGGGGGAAAGA--------UUUA------UCGCUAUUCGAUUGGCCCGCGUUAGAUUAGCUAGUUGGUAAGGUAACGGCUUACCAAGGCGACGAUCUAUAGCUGGUUUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCGCAAUGGAGGAAACUCUGACGCAGCCAUGCCGCGUGAGUGAAGAAGGCCUUAGGGUUGUAAAGCUCUUUCAGACGUGAU--------------GAUG-----------AUGACAGUAGCGUCAAAAGAAGUUCCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAAGGGAACUAGCGUUGUUCGGAUUUACUGGGCGUAAAGAGCAUGUAGGCGGAUUGGACAGUUGAGGGUGAAAUCCCAGAGCUCAACUCUGGAACGGCCUUCAAUACUUCCAGUCUAGAGUCCGUAAGGGGGUGGUGGAAUUCCGAGUGUAGAGGUGAAAUUCGUAGAUAUUCGGAGGAACACCAGUGGCGAAGGCGACCACCUGGUACGGUACUGACGCUGAGAUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAGUGCUAGUUGUCAGGAUGUUUA-CAUCUUGGUGACGCAGCUAACGCAUUAAGCACUCCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGGCCNGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAANNAACGCGAAGAACCUUACCNAUUCUUGACAUACCUGUCGAUUUCCAGAGAUGGAUUUCUCAG-UCGCUGGACAGGAUACAGGUGCUGCAUGGCNGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCACCCCUAGUUGCCAGCAUUUAG-UUGGGCACUCUAUGGGAACUGCCGGUGACAAGCNGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACGGAUUGGGCUACACACGUGCUACAAUGGUAACUACAGUGGGCAGCGACGUCGCGAGGCGAAGCAAAUCUC-CAAAAGUUAUCUCAGUUCGGAUUGUUCUCUGCAACUCGAGAGCAUGAAGUCGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUUGGUUUUACCCGAAGACGGUGGGCUAACCUUUAGGAGGCAGCCGGCCACGGUAAGGUCAGCGACUGGGNmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ga.Pho.pho 
-----mmmmmmmmmmmmmmmmmmmmmmAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAACGGUAACAGAU-GANACUUGUNUCAU-GCUGACGAACGGCGGACGGGUGAGUAAUGCCUGGGAAUAUACCCUGAUGUGGGGGAUAACUAUUGGAAACGAUAGCUAAUACCGCAUAAUCUCUUCGGAGCAAAGAGGGGGACCUUCGGGCCUCUCGCGUCAGGAUUAGCCCAGGUGGGAUUAGCUAGUUGGUGGGGUAAUGGCUCACCAAGGCGACGAUCCCUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGGGAAACCCUGAUGCAGCCAUGCCGCGUGUAUGAAGAAGGCCUUCGGGUUGUAAAGUACUUUCAGUUGUGAGGAAGGCGUUGGAGUUAAUAGCUUCAGCGCUUGACGUUAGCAACAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGUUCCGAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGCAUGCAGGCGGUCUGUUAAGCAAGAUGUGAAAGCCCGGGGCUCAACCUCGGAACAGCAUUUUGAACUGGCAGACUAGAGUCUUGUAGAGGGGGGUAGAAUUUCAGGUGUAGCGGUGAAAUGCGUAGAGAUCUGAAGGAAUACCGGUGGCGAAGGCGGCCCCCUGGACAAAGACUGACGCUCAGAUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGUCUACUUGAAGGUUGUGGCCUUGAGCCGUGGCUUUCGGAGCUAACGCGUUAAGUAGACCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAUGAAUUGACGGAGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCUACUCUUGACAUCCAGAGAAUUCGCUAGAGAUAGCUUAGUGCCUUCGGGAACUCUGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUUGUGAAAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUCCUUGUUUGCCAGCACUAAUGGUGGGAACUCCAGGGAGACUGCCGGUGAUAAACCGGAGGAAGGUGGGGACGACGUCAAGUCAUCAUGGCCCUUACGAGUAGGGCUACACACGUGCUAGAAUGGCGUAUAGAGAGGGCUGCAAGCUAGCGAUAGUGAGCGAAUCCGAGAAAGUACGUCGUAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGUGAAUCAGAAUGUCACGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGCUGCACCAGAAGUAGAUAGCUUAACCUUCGGGAGGGCGUUUACCACGGUGUGGUUCAUGACUGGGGUGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ga.Pse.asp 
-----mmmmmmmmmmmmmmmmmmmmmmAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAGCGGAUGAGAAGAG---CUUG---CUCUUCGAUUCAGCGGCGGACGGGUGAGUAAUGCCUAGGAAUCUGCCUGGUAGUGGGGGACAACGUUUCGAAAGGAACGCUAAUACCGCAUACGUCCUACGGGAGAAAGCAGGGGACCUUCGGGCCUUGCGCUAUCAGAUGAGCCUAGGUCGGAUUAGCUAGUUGGUGAGGUAAUGGCUCACCAAGGCGACGAUCCGUAACUGGUCUGAGAGGAUGAUCAGUCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGAAAGCCUGAUCCAGCCAUGCCGCGUGUGUGAAGAAGGUCUUCGGAUUGUAAAGCACUUUAAGUUGGGAGGAAGGGCAUUAACCUAAUACGUUAGUGUUUUGACGUUACCGACAGAAUAAGCACCGGCUAACUCUGUGCCAGCAGCCGCGGUAAUACAGAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGCGCGUAGGUGGUUCGUUAAGUUGGAUGUGAAAUCCCCGGGCUCAACCUGGGAACUGCAUCCAAAACUGGCGAGCUAGAGUAGGGUAGAGGGUGGUGGAAUUUCCUGUGUAGCGGUGAAAUGCGUAGAUAUAGGAAGGAACACCAGUGGCGAAGGCGACCACCUGGACUCAUACUGACACUGAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGUCAACUAGCCGUUGGGAACCUUGAGUUCUUAGUGGCGCAGCUAACGCAUUAAGUUGACCGCCUGGGGAGUACGGCCGCAAGGUUAAAACUCAAAUGUAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAGGCCUUGACAUGCAGAGAACUUUCCAGAGAUGGAUUGGUGCCUUCGGGAACUCUGACACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGUAACGAGCGCAACCCUUGUCCUUAGUUACCAGCACUCAUGGUGG-CACUCUAAGGAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACGGCCUGGGCUACACACGUGCUACAAUGGUCGGUACAGAGGGUUGCCAAGCCGCGAGGUGGAGCUAAUCCCACAAAACCGAUCGUAGUCCGGAUCGCAGUCUGCAACUCGACUGCGUGAAGUCGGAAUCGCUAGUAAUCRCGAAUCAGAAUGUCGCGGUGAAUACGUUCCCGGGSCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGUUGGACCAGAAGUAGCUAGUCUAACCUUCGGGAGGACGGUUACCACGGUGUGAUUCAUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCmmmmmmmmmmmmm----- ga.Sal.typ 
-----mmAGAGUUUGAUCCUGGCUCAGAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAACGGUAACAGGAAGCAGCUUGCUGCUUUGCUGACGAGUGGCGGACGGGUGAGUAAUGUCUGGGAAACUGCCUGAUGGAGGGGGAUAACUACUGGAAACGGUGGCUAAUACCGCAUAACGUCGCAAGACCAAAGAGGGGGACCUUCGGGCCUCUUGCCAUCAGAUGUGCCCAGAUGGGAUUAGCUUGUUGGUGAGGUAACGGCUCACCAAGGCGACGAUCCCUAGCUGGUCUGAGAGGAUGACCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCCAUGCCGCGUGUAUGAAGAAGGCCUUCGGGUUGUAAAGUACUUUCAGCGGGGAGGAAGGUGUUGUGGUUAAUAACCGCAGCAAUUGACGUUACCCGCAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGCACGCAGGCGCUCUGUCAAGUCGGAUGUGAAAUCCCCGGGCUCAACCUGGGAACUGCAUUCGAAACUGGCAGGCUUGAGUCUUGUAGAGGGGGGUAGAAUUCCAGGUGUAGCGGUGAAAUGCGUAGAGAUCUGGAGGAAUACCGGUGGCGAAGGCGGCCCCCUGGACAAAGACUGACGCUCAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGUCUACUUGGAGGUUGUGCCCUUGAGGCGUGGCUUCCGGAGCUAACGCGUUAAGUAGACCGCCUGGGGAGUACGGCCGCAAGGUUAAAACUCAAAUGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGUCUUGACAUCCACAGAACUUUCCAGAGAUGGAUUGGUUCCUUCGGGAACUGUGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUUGUGAAAUGUCGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUCCUUUGUUGCCAGCGGUUAGGCCGGGAACUCAAAGGAGACUGCCAGUGAUAAACUGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACGACCAGGGCUACACACGUGCUACAAUGGCGCAUACAAAGAGAAGCGACCUCGCGAGAGCAAGCGGACCUCAUAAAGUGCGUCGUAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGUGGAUCAGAAUGCCACGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGUUGCAAAAGAAGUAGGUAGCUUAACCUUCGGGAGGGCGCUUACCACUUUGUGAUUCAUGACUGGGGUGAAGUCGUAACAAGGUAACCGUAGGGGAACCUGCGGCUGGAUCACCUCCUU- ga.Vib.cho 
-----mmAGAGUUUGAUNNUGGCUCAGAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAGCGGCAGCACAGAGGAACUUGUUCCUUGGGUGGCGAGCGGCGGACGGGUGAGUAAUGCCUGGGAAAUUGCCCGGUAGAGGGGGAUAACCAUUGGAAACGAUGGCUAAUACCGCAUAACCUCGCAAGAGCAAAGCAGGGGACCUUCGGGCCUUGCGCUACCGGAUAUGCCCAGGUGGGAUUAGCUAGUUGGUGAGGUAAGGGCUCACCAAGGCGACGAUCCCUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCCAUGCCGCGUGUAUGAAGAAGGCCUUCGGGUUGUAAAGUACUUUCAGUAGGGAGGAAGGUGGUUAAGUUAAUACCUUAAUCAUUUGACGUUACCUACAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGCAUGCAGGUGGUUUGUUAAGUCAGAUGUGAAAGCCCUGGGCUCAACCUAGGAAUCGCAUUUGAAACUGACAAGCUAGAGUACUGUAGAGGGGGGUAGAAUUUCAGGUGUAGCGGUGAAAUGCGUAGAGAUCUGAAGGAAUACCGGUGGCGAAGGCGGCCCCCUGGACAGAUACUGACACUCAGAUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGUCUACUUGGAGGUUGUGCCCUAGAGGCGUGGCUUUCGGAGCUAACGCGUUAAGUAGACCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAUGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCUACUCUUGACAUCCAGAGAAUCUAGCGGAGACGCUGGAGUGCCUUCGGGAGCUCUGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUUGUGAAAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUCCUUGUUUGCCAGCACUAAUGGUGGGAACUCCAGGGAGACUGCCGGUGAUAAACCGGAGGAAGGUGGGGACGACGUCAAGUCAUCAUGGCCCUUACGAGUAGGGCUACACACGUGCUACAAUGGCGUAUACAGAGGGCAGCGAUACCGCGAGGUGGAGCGAAUCUCACAAAGUACGUCGUAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGCAAAUCAGAAUGUUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGCUGCAAAAGAAGCAGGUAGUUUAACCUUCGGGAGGACGCUUGCCACUUUGUGGUUCAUGACUGGGGUGAAGUCGUAACAAGGUAGCGCUAGGGGAACCUGGCGCUGGAUCACCUCCUUU ga.Xan.pis 
-----mmmmmmmmmmmmmmmmmmmmmmAGUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAACGGCAGCACAGUAAGACUUGUCUUAUGGGUGGCGAGUGGCGGACGGGUGAGGAAUACAUCGGAAUCUACUCUUUCGUGGGGGAUAACGUAGGGAAACUUACGCUAAUACCGCAUACGACCUACGGGUGAAAGCGGAGGACCUUCGGGCUUCGCGCGAUUGAAUGAGCCGAUGUCGGAUUAGCUAGUUGGCGGGGUAAAGGCCCACCAAGGCGACGAUCCGUAGCUGGUCUGAGAGGAUGAUCARCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGCAAGCCUGAUCCAGCCAUGCCGCGUGGGUGAAGAAGGCCUUCGGGUUGUAAAGCCCUUUUGUUGGGAAAGAAAAGCAGUCGGUUAAUACCCGAUUGUUCUGACGGUACCCAAAGAAUAAGCACCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAAGGGUGCAAGCGUUACUCGGAAUUACUGGGCGUAAAGCGUGCGUAGGUGGUGGUUUAAGUCUGUUGUGAAAGCCCUGGGCUCAACCUGGGAAUUGCAGUGGAUACUGGGUCACUAGAGUGUGGUAGAGGGUAGCGGAAUUCCCGGUGUAGCAGUGAAAUGCGUAGAGAUCGGGAGGAACAUCCGUGGCGAAGGCGGCUACCUGGACCAACACUGACACUGAGGCACGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGCGAACUGGAUGUUGGGUGCUUUG-GCACGCAGUAUCGAAGCUAACGCGUUAAGUUCGCCGCCUGGGGAGUACGGUCGCAAGACUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGUAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGUCUUGACAUCCACGGAACUUUCCAGAGAUGGAUUGGUGCCUUCGGGAACCGUGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCCUUAGUUGCCAGCACUAAUGGUGGGAACUCUAAGGAGACCGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACGACCAGGGCUACACACGUACUACAAUGGUAGGGACAGAGGGCUGCAAACCCGCGAGGGCAAGCCAAUCCCAGAAACCCUAUCUCAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGCAGAUCAGCAUGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUUUGUUGCACCAGAAGCAGGUAGCUUAACCUUCGGGAGGGCGCUUGCCACGGUGUGGCCGAUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAUCGGAAGGUGCmmmmmmmmmmmmm----- ra.Dei.mur 
-----mmmmmmmmmmmmmmmmmmmmAGGGUGAACGCUGGCGGCGUGCUUAAGACAUGCAAGUCGAACGGCCCU---------UUCG--------GAGGGCAGUGGCGCACGGGUGAGUAACGCGUAACGACCUGCCCCCAAGUCUGGAAUAACCCUCCGAAAGGAGGGCUAAUACCGGAUGUGCUGCUUGCAGUAAAGGC-------GCGA-----GCCGCUUGGGGAUGGGGUUGCGUUCCAUCAGCUAGAUGGUGGGGUAAAGGCCUACCAAGGCGACGACGGAUAACCGGCCUGAGAGGGUGGCCGGUCACAGGGGCACUGAGACACGGGUCCCACUCCUACGGGAGGCAGCAGUUAGGAAUCUUCCCCAAUGGACGAAAGUCUGAGGGAGCGACGCCGCGUGAGGGAUGAAGGUCUUCGGAUUGUAAACCUCUGA-AUCAGGGACGAAAGACGCU----UUA------GGCGGGAUGACGGUACCUGAG-UAACAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUACCCGGAAUCACUGGGCGUAAAGGGCGUGUAGGCGGUACUUUAAGUCUGACUUUAAAGACCGUGGCUGAACCACGGAAGUGGGUUGGAUACUGGCGUGCUGGACCUCUGGAGAGACAACCGGAAUUCCUGGUGUAGCGGUGGAAUGCGUAGAUACCAGGAGGAACACCGAUGGCGAAGGCAGGUUCUUGGACAGAAGGUGACGCUGAGGCGCGAAAGUGUGGGGAGCGAACCGGAUUAGAUACCCGGGUAGUCCACACCCUAAACGAUGUACGUUGGCUGAUGGCGG-GAUG--CCGUCAUGGGCGAAGCUAACGCGAUAAACGUACCGCCUGGGAAGUACGGCCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAGGUCUUGACAUCCCAUGAACCUUGCAGAGAUGUGAGGGUGCCUUCGGGAACAUGGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGCCUUGUGUUGCCAGCAGUUCGGCUGGGCACUCACGAGGGACUGCCUGUGA-AAGCAGGAGGAAGGCGGGGAUGACGUCUAGUCAGCAUGGUCCUUACGACCUGGGCGACACACGUGCUACAAUGGAUGGUACAACGCGCAGCCAAGUCGCGAGACUGAGCGAAUCGCUGAAAGCCAUCCCCAGUUCAGAUCGGAGUCUGCAACUCGACUCCGUGAAGUUGGAAUCGCUAGUAAUCGCGGGUCAGCAUACCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUAGAUUGCAGCUGAAACCGCCGGGAGCU--UCAC---GGCAGGCGUCUAGGCUGUGGUUUAUGACUGGGGUGAAGUCGUAACAAGGUAACUGUACCGGAAGGUGCGGCUGGAmmmmmm----- ra.Dei.rad 
UUUAUGGAGAGUUUGAUCCUGGCUCAGGGUGAACGCUGGCGGCGUGCUUAAGACAUGCAAGUCGAACGCGGUC---------UUCG---------GACCGAGUGGCGCACGGGUGAGUAACACGUAACGACCUACCCAGAAGUCAUGAAUAACUGGCCGAAAGGUCAGCUAAUACGUGAUGUGGUGCGUGCACUAAAGA--------UUUA------UCGCUUCUGGAUGGGGUUGCGUUCCAUCAGCUGGUUGGUGGGGUAAAGGCCUACCAAGGCGACGACGGAUAGCCGGCCUGAGAGGGUGGCCGGCCACAGGGGCACUGAGACACGGGUCCCACUCCUACGGGAGGCAGCAGUUAGGAAUCUUCCACAAUGGGCGCAAGCCUGAUGGAGCGACGCCGCGUGAGGGAUGAAGGUUUUCGGAUCGUAAACCUCUGA-AUCUGGGACGAAAGAGCC-----UUCG------GGCAGAUGACGGUACCAGAG-UAAUAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUACCCGGAAUCACUGGGCGUAAAGGGCGUGUAGGCGGAUAUUUAAGUCUGGUUUUAAAGACCGAGGCUCAACCUCGGGAGUGGACUGGAUACUGGAUGUCUUGACCUCUGGAGAGGUAACUGGAAUUCCUGGUGUAGCGGUGGAAUGCGUAGAUACCAGGAGGAACACCAAUGGCGAAGGCAAGUUACUGGACAGAAGGUGACGCUGAGGCGCGAAAGUGUGGGGAGCAAACCGGAUUAGAUACCCGGGUAGUCCACACCCUAAACGAUGUACGUUGGCUAAGCGCAG-GAUG--CUGUGCUUGGCGAAGCUAACGCGAUAAACGUACCGCCUGGGAAGUACGGCCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAGGUCUUGACAUGCUAGGAAGGCGCUGGAGACAGCGCCGUGCCUUCGGGAACCUAGACACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGCUUCCAGUUGCCAGCA-UUCAGUUGGGCACUCUGGAGGGACUGCCUGUGA-AAGCAGGAGGAAGGCGGGGAUGACGUCUAGUCAGCAUGGUCCUUACGUCCUGGGCUACACACGUGCUACAAUGGAUAGGACAACGCGCAGCAAACAUGUGAGUGUAAGCGAAUCGCUGAAACCUAUCCCCAGUUCAGAUCGGAGUCUGCAACUCGACUCCGUGAAGUUGGAAUCGCUAGUAAUCGCGGGUCAGCAUACCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUAGAUUGCAGUUGAAACCGCCGGGAGCC--UCAC---GGCAGGCGUCUAGACUGUGGUUUAUGACUGGGGUGAAGUCGUAACAAGGUAACUGUACCGGAAGGUGCGGUUGGAUCACCUCCUUU sp.Bor.par 
-----mmmmAGUUUGAUCCUGGCUUAGAACUAACGCUGGCAGUGCGUCUUAAGCAUGCAAGUCAGACGGAAUGUA-------GCAA-------UACAUUCAGUGGCGAACGGGUGAGUAACGCGUGGAAAUCUACCUAUGAGAUGGGGAUAACUAUUAGAAAUAGUAGCUAAUACCGAAUAAGGUCGAUGGAUGAAAGAAGCCU---UUAA--AGCUUCGCUUGUAGAUGAGUCUGCGUCUUAUUAGCUAGUUGGUAGGGUAAGAGCCUACCAAGGCUAUGAUAAGUAACCGGCCUGAGAGGGUGAACGGUCACACUGGAACUGAGAUACGGUCCAGACUCCUACGGGAGGCAGCAGCUAAGAAUCUUCCGCAAUGGGCGAAAG-CUGACGGAGCGACACUGCGUGAACGAAGAAGGUCGAAAGAUUGUAAAGUUCUUUUAUAAAUGAGGAAUAAGCUUUGUAGAAA-UGACAAGGUGAUGACGUUAAUUUAUGAAUAAGCCCCGGCUAAUUACGUGCCAGCAGCCGCGGUAAUACGUAAGGGGCGAGCGUUGUUCGGGAUUAUUGGGCGUAAAGGGUGAGUAGGCGGAUAUGUAAGUCAUGCGUAAAAUACCACAGCUCAACUGUGGAACUAUGCUGGAAACUGCAUGACUAGAGUCUGAUAGGGGAAGUUAGAAUUCCUGGUGUAAGGGUGGAAUCUGUUGAUAUCAGGAAGAAUACCAGAGGCGAAGGCGAACUUCUGGGUCAAGACUGACGCUGAGUCACGAAAGCGUAGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCUACGCUGUAAACGAUGCACACUUGGUGUUAAUC--GAAA---GGUUAGUACCGAAGCUAACGUGUUAAGUGUGCCGCCUGGGGAGUAUGCUCGCAAGAGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGAUACGCGAGGAACCUUACCAGGGCUUGACAUAACAGGAUGUAGUUAGAGAUAACUAUUCCCCGUUUGGGGUCUGUAUACAGGUGCUGCAUGGUUGUCGUCAGCUCGUGCUGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUUGUCUGUUACCAGCAUUAAAGAUGGGGACUCAGACGAGACUGCCGGUGAUAAGCCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGGCCCUUAUGUCCUGGGCUACACACGUGCUACAAUGGCCUGUACAAAGCGAUGCGAAACAGUGAUGUGAAGCAAAACGCAUAAAGCAGGUCUCAGUCCAGAUUGAAGUCUGAAACUCGACUUCAUGAAGUUGGAAUCGCUAGUAAUCGUAUAUCAGAAUGAUACGGUGAAUACGUUCUCGGGCCUUGUACACACCGCCCGUCACACCACCCGAGUUGAGGAUACCCGAAGCUAUUAUUCUAACCGCAAGGAGGAAGGUAUCUAAGGUAUGUUUAGUGAGGGGGGUGAAGUCGUAACAAGGUAGCCGUACUGGAAAGUGUGGCUGGAUCACCU----- sp.Lep.mey 
-----mmmmmmmmmmmmmmmmmmmmmmmmSUARCGCUGGCGGCGCGUCUUAAACAUCCAA-UCAAGCGGAGUA---------GCAA---------UACUCAGCGGCGAACGGGUGAGUAACACGUGGGAAUCUUCCUCCGAGUCUGGGAUAACUUUUCGAAAGGGAAGCUAAUACUGGAUAGUCCCAUAAGGGUAAAGA--------UUCA------UUGCUUGGAGAUGAGCCCGCGUCCGAUUAGCUAGUUGGUGAGGUAAUGGCUCACCAAGGCGACGAUCGGUAGCCGGCCUGAGAGGGUGUUCGGCCACAAUGGAACUGAGACACGGUCCAUACUCCUACGGGAGGCAGCAGUUAAGAAUCUUGCUCAAUGGGGGGAACCCUGAAGCAGCGACGCCGCGUGAACGAUGAAGGUCUUCGGAUUGUAAAGUUCAAUAAGCAGGGAAAAAUAAGCA-----GCAA-------UGUGAUGAUGGUACCUGCCUA--AAGCACCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAUGGUGCAAGCGUUGUUCGGAAUCAUUGGGCGUAAAGGGUGCGUAGGCGGACAUGUAAGUCAGGUGUGAAAACUGGGGGCUCAACUCCCAGCCUGCACUUGAAACUAUGUGUCUGGAGUUUGGGAGAGGCAAGUGGAAUUCCAGGUGUAGCGGUGAAAUGCGUAGAUAUCUGGAGGAACACCAGUGGCGAAGGCGACUUGCUGGCCUAAAACUGACGCUGAGGCACGAAAGCGUGGGUAGUGAACGGGAUUAGAUACCCCGGUAAUCCACGCCCUAAACGUUGUCUACCAGUUGUUGGGGGUUUAA--CCCUCAGUAACGAACCUAACGGAUUAAGUAGACCGCCUGGGGACUAUGCUCGCAAGAGUGAAACUCAAAGGAAUUGACGGGGGUCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGAUACGCGAAAAACCUCACCUAGGCUUGACAUGAGUGGAAUCAUGUAGAGAUACAUGAGCC--UUCG--GGCCGCUUCACAGGUGCUGCAUGGUUGUAAUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCACCUUAUGUUGCCAUCA-UUCAGUUGGGCACUCGUAAGGAACUGCCGGUGACAAACCGGAGGAAGGCGGGGAUGACGUCAAAUCCUCAUGGCCUUUAUGUCUAGGGCAACACACGUGCUACAAUGGCCGGUACAAAGGGUAGCCAACUCGCGAGGGGGAGCUAAUCUCAAAAACCCGGUCCCAGUUCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGACCUUGUACACACC-CCCGUCACACCACCUGAGUGGGGAGCACCCGAAGUGGUCUUGCCAACCGCAAGGAAGCAGACUACUAAGGUGAAACUCGUGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- sp.Tre.soc 
-----mmmmmGUUUGAUCCUGGCUCAGAACGAACGCUGGCGGCGCGUCUUAACCAUGCAAGUCGAGCGGCAGGCA-------GCAA-------UGCCGAGAGCGGCGGACUGGUGAGUAACACGUGGAAACGAACCCCGAUGCCCGGGACAGCCUGUAGAAAUAGAGGGUAAUACCGGAUAGAUCAGGAACGGGAAAGGAGC-----UUCG---GCUCCGCGCCGGGAUCGGUCUGCGGCCCAUCAGCUAGACGGCGGGGUAAGGGCCCGCCGUGGCGAGGACGGGUAUCCGGCCUGAGAGGGCGGACGGACACAUUGGGACUGAGAUACGGCCCAGACUCCUACGGGAGGCAGCAGGUAAGAAUAUUCCGCAAUGGGGGGAACCCUGACGGAGCGACGCCGCGUGAACGAAGAAGGCCGGAAGGUUGUAAAGUUCUUUUCUGUCCGAGGAAUAAGUGUAG-GGAAA-UGCCUGCAUGGUGACGGUAGGGCAGGAAUAAGCACCGGCUAAUUACGUGCCAGCAGCCGCGGUAACACGUAAGGUGCGAGCGUUGUUCGGAAUUAUUGGGCGUAAAGGGCAUGCAGGCGGGUCGCCAAGCUUGGUAAGAAAUACCGGGGCUCAACUCCGGAGCUAUAUUGAGAACUGGCGAGCUAGAGUUGCCGAAGGGUAUCCGGAAUUCCGCGUGAAGGGGUGAAAUCUGUAGAUAUGCGGAAGAACACCGAUGGCGAAGGCAGGAUACCGGCGGACGACUGACGCUGAGGUGCGAAGGUGCGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCGCACAGUCAACGAUGUACACUGGGCGUGUGCGC-AAGA--GCGUGCGUGCCGAAGCAAACGCGAUAAGUGUACCGCCUGGGGAGUAUGCCCGCAAGGGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGGUACGCGAGGAACCUUACCUGGGUUUGACAUCAGAGGGAUCAUAUAGAGAUAUGUGAGCGUAGCAAUACGGCUCUUGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUACUGCCGGUUACUAACAGUAAUGCUGAGGACUCAGGCGGAACUGCCUGCGACAAGCAGGAGGAAGGCGGGGACGACGUCAAGUCAUCAUGGCCCUUAUGUCCAGGGCUACACACGUGCUACAAUGGCCGCCACAGAGCGGGGCGAAGCCGAGAGGCGGAGCAGAACGCAGAAAAGCGGUCGUAGUCCGGAUUGAAGUCUGAAACUCGACUUCAUGAAGCUGGAAUCGCUAGUAACCGCACAUCAGCACGGCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUCCGAGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- sp.Tre.suc 
GGCAUGGAGAGUUCGAUCCUGGCUCAGAACGAACGCUGGCGGCGCGUCUUAAGCAUGCAAGUCGGGCGGGAUUCACGCG---CUUG---CGCGUGGUGAGAGCGGCGGACUGGCGAGUAACACGUGGGGACGCGCCCUCCGGACGGGAAUAGCCUGCAGAAAUGCAGGGUAAUGCCGGAUGCGAACUGGAGUGGAAAGCCCC-----CACG---GGGGCGCCGGAGGAGCGGCCCGCGGCCCAUCAGCUGGUAGGCGGUGCAAGGGACCACCUAGGCUACGACGGGUACCCGGCNUAAGAGGGCGGACGGGCACAUUGGGACUGAGAUACGGCCCAGACUCNUACGGGAGGCAGCAGCUAAGAAUAUUCCGCAAUGGGGGGAACNNUGACGGAGCGACGCCGCGUGGGCGAGGAAGGCCGGAAGGUUGUAAAGCCCUUUUGCGCGCGAGGAAUNAGGGGAG-GGGAA-UGCCUUCCCGGUGACUGUAGCGCGCGAAUAAGCGCCGGCUAAUUACGUGCCAGCAGCCGCGGUNACACGUAAGGCGCNAGCGUUGUNCGGAAUNAUUGGGCGUAAAGGGCGUGUAGGCGGCCCUGCAAGCCUNGCGUGAAAUCCCGGGGCNCAACCCCGNAACCGCGCUGGGAACUGCUGGGCUUGAGCCGCUGUGGCGCAGCCGGAAUUNCAGGUGUAGGGGUGAAAUCUGUAGAUAUCUGGAAGAACACCGAUGGCGAAGGCAGGCUGCGAGCGGACGGCUGACGCUGAGGCGCGAAGGCGCGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCGCGCAGUAAACGAUGCACACUGGGUGNUCGGGC-AUGA--GCCCGGNUGCCGAAGCGAACGCGUUAAGUGUGCCGCCUGGGGAGUAUGCCCGCAAGGGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUAUAAUUCGANNNNUAGCGAGGAACCUUACCUGGGCUUGACAUAACAGGGACCGCCUGGAGACAGGCGGACGCAGCAAUGCGCCUGUGAACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUGCCGCCAGUUGCCAGCAUUAGAGGUGGGCACUCUGGCGGAACUGCCGGCGACAAGCAGGAGGAAGGCGGGGACGACGUCAAGUCAUCAUGGCCCUUAUGUCCAGGGCUACACACGUGCUACAAUGGCAGGCACAGAGUGAAGCGAGGCCGCGAGGCGGAGCGAAACGCAGAAAACCUGCCGUAGUCCGGAUCGGAGUCUGAAACCCGACUCCGUGAAGCUGGAAUCGCUAGUAAUCGCGCAUCAGCACGGCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUCCGAGUCGGGGAUUCCCGAAGCGGGCAGGCCAACCGAAAGGGGGCNNCUCUNNAAGGUNUGCUUmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- th.The.afr 
UUCAUGGAGGGUUUGAUCCUGGCUCAGGGUGAACGCUGGCGGCGUGCCUUACACAUGCAAGUCGAGCGAAGCUGCUGGUGGAUUCGANGCCNGUAGACUNAGCGGCGGACGGGUGAGUAACGCGUAGGGACCUACCCCUCAGAGGGGGAUAACUGGGGGAAACCUCAGCUAAUACCCNAUACGUUCUAAGGAAGAAAGGAGC-----AAUA---GCUCUGCUGAGNNNGGGGCCUGCGACCCAUCAGGUAGUUGGUGAGGUAACGGCUCACNAAGCCUACGACGGGUAGCCGGUCUGAGAGGAUGGCCGNCNACAAGGGCACUGAGACACGGGCCNNACUCCUACGGGAGGCAGCAGUGGGGGAUUUUGGACAAUGGGCGAAAGCCUNAUCCAGCGACGCCGNGUGCGGGACGAAGCCCUUCGGGGUGUAAACCGCUGUGGUGGGAGACGAAUAAGGUGAG-GGGAA-AGCCUCACUGAUGACGGUAUCCCACUAGAAAGCCCCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGGGCNAGCGUUACCCGGAUUUACUGGGCGUAAAGGGGGCGUAGGCGGCCGUGAAAGUCCGGUGUGAAAACUCACGGCUCAACCGUGGGACUGCGCUGGAAACUACACGGCUUGAGGACGGUAGAGGGAGACGGAACUGCUGGUGUAGGGGUGAAAUCCGUAGAUAUCAGCAGGAACGCCGGUGGAGAAGUCGGUCUCCUGGGCCGUNCCUGACGCUGAGGCCCNAAAGCUAGGGGAGCAAACCGGAUUAGAUACCCGGGUAGUCCUAGCCGUAAACGAUGCCCNCUAGGUGUGGGGGAG-UAA-UUCCUCCGUGCUGAAGCUAACGCGAUAAGUGGGCCGCCUGGGGAGUAUGCCCGCAAGGGUGAAACUCAAAGGAAUUGACGGGACCCNGCACAAGCGGUGGAGCGUGUGGUUUAAUUGGACGCUAAGCCAAGAACCUUACCAGGGCUUGACAUGUGGUGUACUGNCNCGAAAGGGGAGGGCCUGGUUACGGGAGCCAGCACAGGUGGUGCACGGUCGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUGCCCUUAGUUGCCAGCGGUUCGGCCGGGCACUCUAAGGGNACUGCCGGCGACGAGCCGGAGGAAGGAGGGGAUGACGUCAGAUACUCGUGCNCCUUAUGCCCUNGGCGACACACGCGCUACAAUGGGUGGGACAGCGGGAAGCGAGCCAGCGAUGGUGAGCGAAGCCCUUAAACCCACCCUCAGUUCGGAUUGCAGGCUGAAACCCGCCUGCAUGAAGCCGGAAUCGCUAGUAAUCGCGGAUCAGCCCGCCGCGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----mmmm---mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- th.The.hyp 
-----mmmmGGUUUGAUCCUGGCUCAGGGUGAACGCUGGCGGCGUGCCUAACACAUGCAAGUCGAGCGGGUACCC-------GCAA-------GGGUACCAGCGGCGCACGGGUGAGUAACACGUGGGAAUCUACCCCUCGGAGGGGGAUAACCGGGGGAAACCCCGGCUAAUACCCCAUACGAUCGAAGGA-GAAAGGGGC-----GUUU---CCUCCGCCGAGGGAUGAGCCCGCGUCCCAUCAGGUAGUUGGUGGGGUAAUGGCCCACCAAGCCUACGACGGGUAGCCGGCCUGAGAGGGUGGUCGGCCACAAGGGCACUGAGACACGGGCCCUACUCCUACGGGAGGCAGCAGUGGGGAAUCUUGGACAAUGGGCGAAAGCCUGAUCCAGCGACGCCGCGUGGGGGAUGAAGCCCUUCGGGGUGUAAACCCCUGUUGCGAGGGAGGAAUAAGGCCUG-GGAAA-UGCCAGGCCGAUGACUGUACCUCGCGAGGAAGCCCCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGGGCGAGCGUUACCCGGAUUCACUGGGCGUAAAGCGGGUGUAGGCGGCUCGGUAAGUCGGGUGUGAAAUCCCACAGCUCAACUGUGGAAUUGCGCCCGAAACUGCUGAGCUUGGGGCCGGUAGAGGGAGACGGAACUGCCGGUGUAGGGGUGAAAUCCGUAGAUAUCGGCAGGAACGCCGGUGGGGAAGCCGGUCUCCUGGGCCGCGCCCGACGCUGAGACCCGAAAGCUAGGGGAGCAAACCGGAUUAGAUACCCGGGUAGUCCUAGCCGUAAACGAUGCCCACUAGGUGUGGGGGA-UUAA-UUCCUCCGUGCUGUAGCUAACGCGUUAAGUGGNCCGCCUGGGGAGUACGCCCGCAAGGGUGAAACUCAAAGGAAUUGGCGGGACCCCGCACAAGCGGUGGAGCGCGUGGUUUAAUUGGAUGCUGAGCCAAGAACCUUACCAGGGUUUGACAUGAGGUGUACCGACCCGAAAGGGGAGGGCCCCUUUUGGGGAGGCUGCACAGGUGGUGCACGGCCGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAAUCCCUGCCCUUAGUUGACAACGGUUCGGCCGGGUACUCUAAGGGGACUGCCGGUGACGANUCGGAGGAAGGAGGGGACGACGUCAGGUACUCGUUCCCUUUAUGCCCUGGGCUACACACGCGCUACACUGGGUGGUACAAUGGGUUGCGACCCCNCGAGGGGGAGCCAAUCCC-UAAAACCACCCCCAGUUCGGAUCGCAGGCUGCAACCCGCCUGCGUGAAGCCGGAAUCGCUAGUAAUCGCGGAUCAACCCGCCGCGGUGAAUACGUUCCCGGGGUUUGCACACACCGCCCGUCAAACCACCCGAGUCGGGGGCACCCGAAGACGCUCUCCUAACCGAAAGGAGGGAGGCGUUGAGGGUGAACCUGGUGAGGGGGGCUAAGUCGUAACAAGGUAACCGUACUGGAAGGUGCmmmmmmmmmmmmm----- th.The.sub 
-----mmmmmmmmmmmmmmmmmmmmAGGGUGAACGCUGGCGGCGUGCUUAACACAUGCAAGUCGCGCGGGGA-AACCCC---UUCG---GGGGGAGUACCAGCGGCGCACGGGUGAGUAACACGUGGGAACCUACCCCUCAGCGGGGGAUAACCGGGGGAAACUCCGGCUAAUACCCCAUAUUAUCGACAGAUGAAAGGAGC-----GUUU---GCUUCGUUGAGGGAUGGGCCCGCGGCCCAUCAGGUAGUUGGUGAGGUAAUGGCUCACCAAGCCUACGACGGGUAGCCGACCUGAGAGGGUGACCGGCCACAAGGGCACUGAGACACGGGCCCUACUCCUACGGGAGGCAGCAGUGGGGAAUUUUGGACAAUGGGCGAAAGCCUGAUCCAGCGACGCCGCGUGAGGGACGAAGCCCUUCGGGGUGUAAACCUCUGUUGUGAGGGACGAAUAAGAUCUG-GGAAA-UGCCAGAUCGAUGACGGUACCUCACGAGAAAGCCCCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGGGCAAGCGUUACCCGGAUUCACUGGGCGUAAAGGGGGCGCAGGCGAUCCAGUAUGUCGGGUGUGAAAUCCCACAGCUCAACUGUGGAAUCGCACCCGAAACUACUGGGCUUGGGGCUGGUAGAGGGAGACGGAACUGCUGGUGUAGGGGUGAAAUCCGUAGAUAUCAGCAGGAACGCCGGUGGGGAAGCCGGUCUCCUGGGCCAAGCCCGACGCUGAGGCCCGAAAGCUAGGGGAGCAAACCGGAUUAGAUACCCGGGUAGUCCUAGCCGUAAACGAUGCCCACUAGGUGUGGGGGAG-UCA-UUCCUCCGUGCUGUAGCUAACGCGUUAAGUGGGCCGCCUGGGGAGUACGCCCGCAAGGGUGAAACUCAAAGGAAUUGGCGGGGACCCGCACAAGCGGUGGAGCGUGUGGUUUAAUUGGAUGCUAAGCCAAGAACCUUACCAGGGCUUGACAUGAGGUGUACCAACCCGAAAGGGGAAGGCCCUUUUUAGGGAGCCUGCACAGGUGGUGCACGGCCGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUGCCCUU-GUUGCCAGCGGUUCGGCCGGGCACUCUAAGGGGACCGCCGGCGACGAGCCGGAGGAAGGAGGGGACGACGUCAGGUACUCGUGCCCUUUAUGCCCUGGGCUACACACGCGCUACAAUGGGUGGUACAGUGGGUCGCGACCUCGCGAGAGGGAGCCAAUCCC-CAAAACCAUCCUCAGUUCAGAUCGCAGGCUGCAACCCGCCUGCGUGAAGCCGGAAUCGCUAGUAAUCGCGGAUCAGCCAGCCGCGGUGAAUACGUUCCCGGGGUUUGCACACACCGCCCGUCAAGCCACCCGAGUCGGGGGCACCUGAAGACGCCUUCCUAACCGAAAGGAGGGAGGUGGUGAAGGUGAAUCUGGCGAGGGGGGCUAAmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ar.Sul.sol 
-----AUUCCGGUUGAUCCUGCCGGACCC-GACCGCUAUCGGGGUAGGUAAGCCAUGGGAGUCUUACACUCCCG--------GUAA---------GGGAGUGUGGCGGACGGCUGAGUAACACGUGGCAACCUACCCUCGGGACGGGGAUAACCCCGGGAAACUGGGGAUAAUCCCCGAUA-GGGAGGAAUCCUAAAGGCUAUAGGCUUUCUUGUAGCCGCCCGAGGAUGGGGCUACGGCCCAUCAGGCUGUCGGUGGGGUAAAGGCCCACCGAACCUAUAACGGGUAGGGGCCGUGGAAGCGGGAGCCUCCAGUUGGGCACUGAGACAAGGGCCCAGGCCCUACGGGGCGCACCAGGCGCGAAACGUCCCCAAUGCGCGAAAGCGUGAGGGCGCUACCCCGAGUGCCC----------GCAA-------G-AGGCUUUUCCCCGCUCU--------------AAAA-------------------GGCGGGGGAAUAAGCGGGGGCAAGUCUGGUGUCAGCCGCCGCGGUAAUACCAGCUCCGCGAGUGGUCGGGGUGAUUACUGGGCCUAAAGCGCCUGUAGCCGGCCCACCAAGUCGCCCCUUAAAGUCCCCGGCUCAACCGGGGAACUGGGGGCGAUACUGGUGGGCUAGGGGGCGGGAGAGGCGGGGGGUACUCCCGGAGUAGGGGCGAAAUCCUUAGAUACCGGGAGGACCACCAGUGGCGGAAGCGCCCCGCUAGAACGCGCCCGACGGUGAGAGGCGAAAGCCGGGGCAGCAAACGGGAUUAGAUACCCCGGUAGUCCCGGCUGUAAACGAUGCGGGCUAGGUGUCGAGUAGUUAGCCUACUCGGUGCCGCAGGGAAGCCGUUAAGCCCGCCGCCUGGGGAGUACGGUCGCAAGACUGAAACUUAAAGGAAUUGGCGGGGGAGCACACAAGGGGUGGAACCUGCGGCUCAAUUGGAGUCAACGCCUGGAAUCUUACCGGGGG-AGACCC-UA---------------CAAC-------------C--UC--GCGGAGAGGAGGUGCAUGGCCGUCGCCAGCUCGUGUUGUGAAAUGUCCGGUUAAGUCCGGCAACGAGCGAGACCCCCACCCCUAGUUGGUAUUCUUCCG-AGAACCACACUAGGGGGACUGCCGGCGUAA-GCCGGAGGAAGGAGGGGGCCACGGCAGGUCAGCAUGCCCCGAAACUCCCGGGCCGCACGCGGGUUACAAUGGCAGGGACAACGGGAUGCUACCUCGAAAGGGGGAGCCAAUCCU-UAAACCCUGCCGCAGUUGGGAUCGAGGGCUGAAACCCGCCCUCGUGAACGAGGAAUCCCUAGUAACCGCGGGUCAACAACCCGCGGUGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUCGCUCCACCCGAGCGCGAAAGGGGUGA-GGUCCCUGC------GAUA-----AGGGGAUCGAACUCCUUUCCCGCGAGGGGGGAGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCGGCUGGAUCACCUCAm-- ar.The.pen 
GCCAGACUCCGGUUGAUCCUGCCGGACCC-GACCGCUAUCGGGGUGGGUAACCCAUGGAAGUCUAGGAGCCGGGC-------UACG--------GCCGGCUCCGGCGGACGGCUCAGUAGCACGUGGCAACCUACCCUCGGGAGGGGGAUAACCCCGGGAAACUGGGGAUAAACCCCCAUA-GGCGGGAACGCGAAAGGCCACGGUACAUGCCGUGGCCGCCCGAGGAUGGGGCUGCGCCCUAUCAGGUAGUUGGCGGGGUAACGGCCCGCCAAGCCGAUAACGGGUGGGGGCCGUGAGAGCGGGAGCCCCGAGAUGGGCACUGAGACAAGGGCCCAGGCCCUACGGGGUGCACCAGGGGCGAAACUUCCGCAAUGCGGGAAACCGUGACGGAGUCACCCCGAGUGCCC----------GAAG-------GGUGGCUUUUGCCCGGUCU--------------AAAA-------------------GCCGGGCGAAUAAGCGGGGGCAAGCUUGGUGUCAGCCGCCGCGGUAAUACCAACCCCGCGAGUGGUCGGGACGUUUAUUGGGCCUAAAGCGUCCGUAGCCGGCCCGGUAAGUCCCUCCUUAAAGCCCACGGCUCAACCGUGGGAGCGGAGG-GAUACUGCCGGGCUAGGGGGCGGGAGAGGCCGGGGGUACUCCUGGGGUAGGGGCGAAAUCCUAUAAUCCCAGGAGGACCACCAGUGGCGAAGGCGCCCGGCUAGCACGCGCCCGACGGUGAGGGACGAAAGCUGGGGGAGCAAAGGGGAUUAGAUACCCCCGUAGUCCCAGCUGUAAACGAUGCGGGCUAGGUGUUGGACGGUUCGCCCGUCCAGUGCCGUAGGGAAGCCGUUAAGCCCGCCGCCUGGGGAGUACGGCCGCAAGGCUGAAACUUAAAGGAAUUGGCGGGGGAGCACACAAGGGGUGAAGCUUGCGGUUUAAUUGGAGUCAACGCCGGAAACCUUACCGGGGG-CGACAC-GA---------------CGAC-------------C--AA--GCUGAGAGGAGGUGCAUGGCCGUCGCCGGCUCGUGCCGUGAGGUGUCCUGUUAAGUCAGGGAACGAGCGAGACCCCCGCCCCUAGUUGCUACCCAUUCG-UGGGGCACUCUAGGGGGACUGCCGGCGAUAAGCCGGAGGAAGGUGGGGGCUACGGCAGGUCAGUAUGCCCCGAAACCCCCGGGCUACACGCGAGCUGCAAUGGCGGGGACAGCGGGUUCCGACCCCGAAAGGGGGAGGUAAUCCCUUAAACCCCGCCUCAGUAGGAAUCGAGGGCUGCAACUCGCCCUCGUGAACGUGGAAUCCCUAGUAACCGCGUGUCACCAACGCGCGGUGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUCGCUCCACCCGAGGGAGGCCUAGGUGA-GGCCUCCGCC-----GACG----AGGGAGGUCGAACCUGGGCCUCCCAAGGGGGGAGAAGUCGUAACAAGGUGGCCGUAGGGGAACCUGCGGCCGGAUCACCUCmm-- ar.Hal.val 
-----AUUCCGGUUGAUCCUGCCGGAGGC-CAUUGCUAUCGGAGUCCAUUAGCCAUGCUAGUUGCACGAGU-----------UUAG-----------ACUCGUAGCAUAUAGCUCAGUAACACGUGGCAAACUACCCUACAGACCGCAAUAACCUCGGGAAACUGAGGCCAAUAGCGGAUA-UAACGGAGGUUGAAAGU--------UCCG------GCGCUGUAGGAUGUGGCUGCGGCCGAUUAGGUAGAUGGUGGGGUAACGGCCCACCAUGCCGAUAAUCGGUACAGGUUGUGAGAGCAAGAGCCUGGAGACGGUAUCUGAGACAAGAUACCGGGCCCUACGGGGCGCAGCAGGCGCGAAACCUUUACACUGCACGACAGUGCGAUAGGGGGACUCCGAGUGUGG----------AUAU------ACCUCGCUUUUCUGUACCGU--------------AAGG-------------------GGUACAGGAACAAGGACUGGCAAGACCGGUGCCAGCCGCCGCGGUAAUACCGGCAGUCCGAGUGAUGGCCGAUAUUAUUGGGCCUAAAGCGUCCGUAGCUUGCUGUGUAAGUCCAUUGGGAAAUCCACCAGCUCAACUGGUCGGCGUCCGGGGAAACUACACAGCUUGGGGCCGAGAGACUCAACGGGUACGUCCGGGGUAGGAGUGAAAUCCUGUAAUCCUGGACGGACCACCAAUGGGGAAACCACGUUGAGAGACCGGACCCGACAGUGAGGGACGAAAGCCAGGGUCUCGAACCGGAUUAGAUACCCGGGUAGUCCUGGCUGUAAACAAUGCUCGCUAGGUAU-GUCAGCCAUGCACGUAAUGUGCCGUAGUGAAGACGAUAAGCGAGCCGCCUGGGAAGUACGUCCGCAAGGAUGAAACUUAAAGGAAUUGGCGGGGGAGCACACAACCCGAGGAGCCUGCGGUUUAAUUGGACUCAACGCCGGACAUCUCACCGGUCC-CGACAU-UAA--------------UGAC-------------U--CC--ACUGAGAGGAGGUGCAUGGCCGCCGUCAGCUCGUACCGUGAGGCGUCCUGUUAAGUCAGGCAACGAGCGAGACCCACACUUCUAGUUGCCAGCAACCUG-UUGGGUACACUAGGAGGACUGCCAUUGCUAAAAUGGAGGAAGGAAUGGGCAACGGUAGGUCAGUAUGCCCCGAAUGGACCGGGCAACACGCGGGCUACAAUGGCUAUGACAGUGGGAUGCAACGCCGAAAGGCGACGCUAAUCUC-CAAACGUAGUCGUAGUUCGGAUUUCGGGCUGAAACCCGCCCGCAUGAAGCUGGAUUCGGUAGUAAUCGCGUGUCAGAAGCGCGCGGUGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUUAAAGCACCCGAGUGGGGUCCGGAUGA-GGCCGUC--------AUGC-------ACGGUUGAAUCUGG-CUCCGCAAGGGGGCUUAAGUCGUAACAAGGUAGCCGUAGAGGAAUCUGCGGCUGGAUCACCUCCU-- ar.Hal.sac 
-----AUUCCGGUUGAUCCUGCCGGAGGC-CAUUGCUAUUGGGAUUCAUUAGCCAUGCUAGUCGCACGAGU-----------UCAG-----------ACUCGUGGCGAAUAGCUCAGUAACACGUGGCAAACUACCCUUCGGAGCUCCAUACCCUCGGGAAACUGAGGCUAAUAGAGCAUA-CCACGGAAGUGCAAAGC--------UCCG------GCGCCGAAGGAUGUGGCUGCGGCCGAUUAGGUAGACGGUGGGGUAACGGCCCACCGUGCCAAUAAUCGGUACGGGUCAUGAGAGUGAGAACCCGGAGACGGAAUCUGAGACAAGAUUCCGGGCCCUACGGGGCGCAGCAGGCGCGAAACCUUUACACUGCACGACAGUGCGAUAAGGGAAUCCCAAGUGCGG----------AUAG------ACUGCGCUUUUGUACACCGU--------------AGGG-------------------GGUGUACGAAUAAGGGCUGGCAAGACCGGUGCCAGCCGCCGCGGUAAUACCGGCAGCCCGAGUGAUGGCCGAUCUUAUUGGGCCUAAAGCGUCCGUAGCUGGCCGCACAAGUCCAUCGGAAAAUCCACCCGCUCAACGGGUGGGCGUCCGGGGAAACUGUGUGGCUUGGGACCGGAAGGCGCGACGGGUACGUCCGGGGUAGGAGUGAAAUCCCGUAAUCCUGGACGGACCGCCGAUGGCGAAAGCACGUCGCGAGGACGGAUCCGACAGUGAGGGACGAAAGCCAGGGUCUCGAACCGGAUUAGAUACCCGGGUAGUCCUGGCCGUAAACAAUGUCUGUUAGGUGU-GGCUCCUACGUGGGUGCUGUGCCGUAGGGAAGCCGCUAAACAGACCGCCUGGGAAGUACGUCCGCAAGGAUGAAACUUAAAGGAAUUGGCGGGGGAGCAUACAACCGGAGGAGCCUGCGGUUUAAUUGGACUCAACGCCGGACAUCUCACCAGCAU-CGACUU-UAA--------------UGAU-------------C--CU--UCAGAGAGGAGGUGCAUGGCCGCCGUCAGCUCGUACCGUGAGGCGUCCUGUUAAGUCAGGCAACGAGCGAGACCCGCAUCCUUACUUGCCAGCAGGCGA-CUGGGGACAGUAGGGAGACCGCCGUGGCCAACACGGAGGAAGGAACGGGCAACGGUAGGUCAGUAUGCCCCGAAUGUGCUGGGCAACACGCGGGCUACAAUGGUCGAGACAAAGGGUUCCAACUCCGAAAGGAGACGGUAAUCUCAGAAACUCGAUCGUAGUUCGGAUUGUGGGCUGCAACUCGCCCACAUGAAGCUGGAUUCGGUAGUAAUCGCGUGUCACAAGCGCGCGGUGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUCAAAGCACCCGAGUGAGGUCCGGAUGA-GGCGCGU---------UCC-------CGCGUCGAAUCUGG-CUUCGCAAGGGGGCUUAAGUCGUAACAAGGUAGCCGUAGGGGAAUCUGCGGCUGGAUCACCUCCU-- ar.Nat.mag 
-----AUUCCGGUUGAUCCUGCCGGAGGU-CAUUGCUAUUGGAGUCCAUUAGCCAUGCUAGUUGUACGAGU-----------UUAG-----------ACUCGUAGCAGAUAGCUCAGUAACACGUGGCAAACUACCCUAUGGAUCCAGACAACCUCGG-AAACUGAGGCUAAUCUGGAAUA-CGACGGAGGUCGAAAGC--------UCCG------GCGCCAUAGGAUGUGGC--CGGCCGAUUAGGUAGACGGUGGGGUAACGGCCCACCGUGCCGAUAAUCGGUACGGGUUGUGAGAGCAAGAGCCCGGAGACGGUAUCUGAGACAAGAUACCGGGCCCUACGGGGCGCAGCAGGCGCGAAACCUUUACACUGCACGCCAGUGCGAUAAGGGGACUCCAAGUGCGG----------AUAU------ACCUCGCUUUUUGCGACCGU--------------AAGG-------------------GGUCGCGGAAUAAGUGCUGGCAAGACCGGU-CCAGCCGCCGCGGUAA-ACCGGCAGCACGAG--AUGACCGGUGUUAUUGGGCCUAAAGCGUCCGUAGCUGGCCGCGCAAGUCUAUCGGGAAAUCUCUUCGCUUAACGGAGAGGCGUCCGGGGAAAA-AUGUGGCUUGGGACCGGAAGACCAGAGGGGUACGUCUGGGGUAGGAGUGAAAUCCCGUAAUCCUGGACGGACCACCGGUGGCGAAAGCGCCUCUGGAAGACGGAUCCGACGGUGAGGGACGAAAGCUCGGGUCACGAACCGGAUUAGAUACCCGGGUAGUCCGAGCUGUAAACGAUGUCUGCUAGGUGU-GACAAGUACGCCUGUGUUGUGCCGUAGGGAAGCCGUGAAGCAGACCGCCUGGGAAGUACGUCCGCAAGGAUGAAACUUAAAGGAAUUGGCGGGGGAGCAUACAACCGGAGGAGCCUGCGGUUUAAUUGGACUCAACGCCGGACAUCUCACCAGCAC-CGACAU-UAA--------------GGAU-------------C-UGC--AUUGAGAGGAGGUGCAUGGCCGCCGUCAGCUCGUACCGUGAGGCGUCCUGUUAAGUCAGGCAACGAGCGAGACCCGCUCUCCUAAUUGCCAGCAACUUU-UUGGGUACAUUAGGAGGACUGCCGCUGCCAAAGCGGAGGAAGGAACGGGCAACGGUAGGUCAGUAUGCCCCGAAUGUGCUGGGCAACACGCGGGCUACAAUGGCCACGACAGUGGGAUGCAACGCCGAAAGGCGACGCUAAUCUCCUAAACGUGGUCGUACUUCGGAUUGAGGGCUGAAACUCGCCCUCAUGAAGCUGGAUUCGGUAGUAAUCGCGCCUCAGAAGGGCGCGGUGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUCAAAGCACCCGAGUGGGGUCCGGAUGA-GGCCCGG--------UUUA-------CGGGUCGAAUCUGG-CUCCGCAAGGGGGCUUAAGUCGUAACAAGGUAGCCGUAGGGGAAUCUGCGGCUGGAUCACCUCCA-- ar.Met.fer 
CGCCAACUCCGUUUGAUCCUGGCGGAGGC-CACUGCUAUGGGGGUCCAUAAGCCAUGCAAGUCGAACGGGC-----------CUUG----------UGCCCGUGGCGAACGGCUCAGUAACACGUGGAAACCUACCCUGGGGUCCGGGAUAACCCCGGGAAACUGGGGCUAAUCCCGGAUA-GGCGGGAACGCGAAAGGUC------UUUU----GACCGCCCCAGGAUGGGUCUGCGGCCGAUUAGGUAGUUGGUAGGGUAACGGCCUACCAAGCCUACGAUCGGUACGGGUUGUGAGAGCAAGAGCCCGGAGACGGGGCCUGAGACAAGGCCCCGGGCCCUACGGGGCGCAGCAGGCGCGAAAACUCCGCAAUGCGCGAAAGCGCGACGGGGGGACCCCCAGUGCCU----------GUAA-------AGUGGCUUUUCCGGAGUGU--------------AAAA-------------------GCUCCGGGAAUAAGGGCUGGCAAGACCGGUGCCAGCCGCCGCGGUAACACCGGCAGCCCGAGUGGUGGCCGCGUUUAUUGGGCCUAAAGCGUCCGUAGCCGGUCCGGUAAGUCUCCGGUGAAAGCCCGCAGCUCAACUGCGGGAGUAGCCGAGAUACUGCCGGACUUGGGGCCGGGAGAGGCCGGAGGUACCCCCGGGGUAGGGGUGAAAUCCUGUAAUCCCGGGGGGACCACCUGUGGCGAAGGCGUCCGGCUGGAACGGGCCCGACGGUGAGGGACGAAAGCCAGGGGAGCGAACCGGAUUAGAUACCCGGGUAGUCCUGGCCGUAAACGAUGCGGACUUGGUGUUGGGGCACUCGUUGCCCCAGUGCCGAAGGGAAGCCGUUAAGUCCGCCGCCUGGGGAGUACGGCCGCAAGGCUGAAACUUAAAGGAAUUGGCGGGGGAGCACACAACGCGUGGAGCCUGCGGUUUAAUUGGAUUCAACGCCGGACACCUCACCGGGGG-CGACGC-GA---------------UGAU-------------C--UA--GCCGAGAGGAGGUGCAUGGCCGCCGUCAGCUCGUACCGUGAGGCGUCCUGUUAAGUCAGGCAACGAGCGAGACCCGCGCCCCUAGUUGCCAGCGGGUAA-CCGGGCACACUAGGGGGACCGCCAGCGAUAAGCUGGAGGAAGGUGCGGGCGACGGUAGGUCCGUAUGCCCCGAAACCCCCGGGCUACACGCGGGCUACAAUGGCCGGGACAAUGGGUACCGACCCCGAAAGGGGGAGGUAAUCCCAUAAACCCGGCCGUAGUUCGGAUCGAGGGCUGCAACUCGCCCUCGUGAAGCUGGAAUGCGUAGUAAUCGCGGGUCACUAUCCCGCGGUGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUCACGCCACCCAAACGGGGUUCGGAUGA-GGCCAUGCC------UCU-------GAUGGUCGAAUCCGGGCCCCGUGAGGAGGGCGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCGGCUGGAUCACmmmmm-- ar.Met.jan 
-----AUUCCGGUUGAUCCUGCCGGAGGC-CACUGCUAUCGGGGUCCAUAAGCCAUGCGAGUCA-AGGGGCUCC--------UUCG--------GGGAGCACCGGCGCACGGCUCAGUAACACGUGGCAACCUACCCUCGGGUGGGGGAUAACCUCGGGAAACUGAGGCUAAUCCCCCAUA-GGGGGGAACCCGAAAGGC-------GUAA-----GCUGCCCGAGGAUGGGGCUGCGGCGGAUUAGGUAGUUGGUGGGGUAACGGCCCACCAAGCCUACGAUCCGUACGGGCCCUGAGAGGGGGAGCCCGGAGAUGGACACUGAGACACGGGUCCAGGCC-UACGGGGCGCAGCAGGCGCGAAACCUCCGCAAUGCGCGAAAGCGCGACGGGGGGACCCCGAGUGCCC----------CCU--------GUGGGCUUUUCCGGAGUGU--------------AAAC-------------------GCUCCGGGAAUAAGGGCUGGCAAGUCCGGUGCCAGCAGCCGCGGUAAUACCGGCGGCCCAAGUGGUGGCCACUGUUAUUGGGCCUAAAGCGUCCGUAGCCGGCCCGGUAAGUCUCUGCUUAAAU-CUGCGGCUCAACCGCAGGGCUGGCA-AGAUACUGCCGGGCUUGGGACCGGGAGAGGCCGGGGGUACCCCAGGGGUAGCGGUGAAAUGCGUUGAUCCCUGGGGGACCACCUGUGGCGAAGGCGCCCGGCUGGAACGGGUCCGACGGUGAGGGACGAAGGCCAGGGGAGCAAACCGGAUUAGAUACCCGGGUAGUCCUGGCUGUAAACUCUGCGGACUAGGUGUCGCGUCGUUCGCCGACGCGGUGCCGAAGGGAAGCCGUUAAGUCCGCCGCCUGGGGAGUACGGUCGCAAGACUGAAACUUAAAGGAAUUGGCGGGGGAGCAUACAACGGGUGGAGCCUGCGGUUUAAUUGGAUUCAACGCCGGGCAUCUUACCAGGGG-CGACGC-GA---------------UGAC-------------C--AC--GCCGAGAGGUGGUGCAUGGCCGUCGUCAGCUCGUACCGUGAGGCGUCCUGUUAAGUCAGGUAACGAGCGAGACCCGUGCCCCAUGUUGCUACCUCUCCG-GAGGGCACUCAUGGGGGACCGCCGGCGCUAAGCCGGAGGAAGGUGCGGGCAACGACAGGUCCGCAUGCCCCGAAUCCCCUGGGCUACACGCGGGCUACAAUGGCCGGGACAAUGGGACGCGACCCCGAAAGGGGGAGCGAAUCCCCUAAACCCGGUCGUAGUCCGGAUCGAGGGCUGUAACUCGCCCUCGUGAAGCCGGAAUCCGUAGUAAUCGCGCCUCACCAUGGCGCGGUGAAUGCGUCCCUGCUCCUUGCACACACCGCCCGUCACGCCACCCGAGUUGAGCCCAAGUGA-GGCCCUGCC------GCAA------GAGGGUCGAACUUGGGUUCAGCGAGGGGGGCGAAGUCGUAACAAGGUAGCCGUAGGGGAA-CUGCGGCUGGAUCACCUCCm-- ar.Met.ore 
-----NUUCUGGUUGAUCCUGCCAGAGGU-CACUGCUAUCAGUGUUCAUAAGCCAUGCGAGUCAAAUGUUC-----------UUCG----------UGAACAUGGCGUACUGCUCAGUAACNNGUGGAAAUCUGCCCUAAGGUCUGGCAUAACRCCGGGAAACUGAUGAUAAUUCCAGAUG-GACCGGAAGGUAAAAGC--------UCCG------GCGCCUUAGGAUGAAUCUGCGGCCUAUCAGGUUGUAGUGGGUGUAACGUACCUACUAGCCGACGACGGGUACGGGNUGUGAGAGCAAGAGCCCGGAGAUGGAUUCUGAGACAUGAAUCCAGGCCCUACGGGGCGCAGCAGGCGCGAAAACUUUACAAUGCGGGAAACCGCGAUAAGGGGACACUGAGUGCCC----------CUU--------GUUGGC-UGUCCCAUGUAU--------------AAAU-------------------GCAUGUGUUACAAGGGCCGGCAAGACCGGUGCCAGCCGCCGCGGUAACACCGGCGGCCCGAGUGGUGGCCACUAUUAUUGGGUCUAAAGGGUCCGUAGCCGGUUUGAUCAGUCUUCCGGGAAAUCUGACAGCUCAACUGUUAGGCUUCCGGGGAUACUGUCAGGCUUGGGACCGGGAGAGGUAAGAGGUACUACAGGGGUAGGAGUGAAAUCUUGUAAUCCCUGUGGGACCACCAGUGGCGAAGGCGUCUUACCAGAACGGGUCCGACGGUGAGGGACGAAAGCUGGGGGCACGAACCGGAUUAGAUACCCGGGUAGUCCCAGCCGUAAACGAUGCUCGCUAGGUGUCUGGGAUUGCGCGUUUCAGGUGCCGCAGGGAAGCCGUGAAGCGAGCCACCUGGGAAGUACGGCCGCAAGGCUGAAACUUAAAGGAAUUGGCGGGGGAGCAUACAACGGGUGGAGCCUGCGGUUUAAUUGGNNNNAACGCCGGAAAACUCACCUGGGG-CGACAC-UA---------------CGAA-------------C--UC--GCUGAGAGGANGUGCAUGGCCGUCGUCAGUUCGUACUGUGAAGCAUCCUGUUAAGUCAGGCAACGAGCGAGACCCGUGCCCACUGUUGCCAGCAUUUCG-AUGGGUACUCUGUGGGNACCGCCGCUGCUAAAGCGGAGGAAGGUGCGGGCUACGGUAGGUCAGUAUGCCCCGAAUCUCCAGGGCUACACGCGGGCUACAAUGGACGGGACAAUGGGCUCCUACCCCGAAAGGGGNUGGCAAUCUCACAAACCCGGCCGUAGUUCGGAUCGAGGGCUGUAACUCGCCCUCGUGAAGCUGGAAUCCGUAGUAAUCGCGUUUCAAUAUAGCGCGGUGAAUACGUCCCUGCUCCUUGCACACACCGCCCGNCAAACCACCCGAGUGAGGUAUGGGUGA-GGGCACAAC------AAA------GUGUGUUCGAACCUAAAUUUCGCAAGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-- ar.The.chi 
-----mmmmmmmmmmmmmmmmmmGGAGGC-CACUGCUAUGGGGGUCCAUAAGCCAUGCGAGUCAAGGGGGCGUCC-------UUCU------GGGAC-CCACCGGCGGACGGCUCAGUAACACGUCGGAACCUACCCUCGGGAGGGGGAUAACCCCGGGAAACUGGGGCUAAUCCCCCAUA-GGCCGGAAGGCGAAAGGGACC----GUAA--GGUCCCGCCCGAGGAUGGGCCGGCGGCCGAUUAGGUAGUUGGUGGGGUAACGGCCCACCAAGCCGAAGAUCGGUACGGGCCGUGAGAGCGGGAGCCCGGAGAUGGACACUGAGACACGGGUCCAGGNCCUACGGGGCGCAGCAGGCGCGAAACCUCCGCAAUGCGGGAAACCGCGACGGGGGGACCCCCAGUGCCUC---------UCU--------CACGGCUUUUCCGGAGUGU--------------AAAA-------------------GCUCCGGGAAUAAGGGCUGGCNAGGCCGGUGNNAGCCGCCGCGGUAAUACCGGCGGCCCGAGUGGUGGCCACUAUUAUUGGGCCUAAAGCGGCCGUAGCCGGGCCCGUAAGUCCCUGGCGAAAUCCCACGGCUCAACCGUGGGGCUCGCUGGGAUACUGCGGGCCUUGGGACCGGGAGAGGCCGGGGGUACCCCCGGGGUAGGGGUGAAAUCCUAUAAUCCCGGGGGGACCGCCAGUGGCGAAGGCGCCCGGCUGGAACGGGUCCGACGGUGAGGGCCGAAGGCCAGGGGAGCGAACCGGAUUAGAUACCCGGGUAGUCCUGGCUGUAAAGGAUGCGGGCUAGGUGUCGGGCNAUUCG-NCGCCCGGUGCCGUAGGGAAGCCGUUAAGCCCGCCGCCUGGGGAGUACGGCCGCAAGGCUGAAACUUAAAGGAAUUGGCGGGGGAGCAUACAAGGGGUGGAGCGUGCGGUUUAAUUGGAUUCAACGCCGGGAACCUCACCGGGGG-CGACGC-GA---------------CGAA-------------C--GC--GCCGAGAGGAGGUGCAUGGNCGCCGUCAGCUCGUACCGUGAGGCGUCCACUUAAGUGUGGUAACGAGCGAGACCCGCGCCCCCAGUUGCCAGUNCGCU--GGAGGCACUCUGGGGGGACUGCCGGCGAUAAGCCGGAGGAAGGGGCGGGCGACGGUAGGUCAGUAUGCCCCGAAACCCCCGGGCUACACGCGCGCUACAAUGGGCGGGACAAUGGGACCCGACCCCGAAAGGGGAAGGGAAUCCCCUAAACCCGCCCUCAGUUCGGAUCGCGGGCUGCAACUCGCCCGCGUGAAGCUGGAAUCCCUAGUACCCGCGCGUCAUCAUCGCGCGGCGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUCACUCCACCCGAGCGGGGCCCGGGUGA-GGCCCGACUC-----UUCG----GGACGGGUCGAGCCUGGGCUCCGUGAGGGGGGAGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-- ar.The.ste 
-----mmmmmmmmmmmmmmmmmmmmmmmm-mmmmmmmmmmGAGGUCCAUAACCCAUGCGAGUCAUGGGGCGCG---------UCUG---------CGCGCACCGGCGGACGGCUCAGUAACACGUCGGAACCUACCCUCGGGAGGGGGAUAACCCCGGGAAACUGGGGCUAAUCCCCCAUA-GGCCGGAAGGCGAAAGGUC-------UCU----GACCGCCCGAGGAUGGGCCGGCGGCCGAUUAGGUAGUUGGUGGGGUAACGGCCCACCAAGCCGAAGAUCGGUACGGGCCAUGAGAGUGGGAGCCCGGAGAUGGACACUGAGACACGGGUCCAGGCCCUACGGGGCGCAGCAGGCGCGAAACCUCCGCAAUGCGGGCAACCGCGACGGGGGGACCCCCAGUGCCG----------AUA--------CACGGCUUUUCCGGAGUGU--------------AAAA-------------------GCUCCGGGAAUAAGGGCUGGCAAGGCCGGUGGCAGCCGCCGCGGUAAUACCGGCGGCCCGAGUGGUGGCCGCUAUUAUUGGGCCUAAAGCGUCCGUAGCCGGGCCCGUAAGUCCCUGGCGAAAUCCCACGGCUCAACCGUGGGGCUUGCUGGGAUACUGCGGGCCUUGGGACCGGGAGAGGCCGGGGGUACCCCUGGGGUAGGGGUGAAAUCCUAUAAUCCCAGGGGGACCGCCAGUGGCGAAGGCGCCCGGCUGGAACGGGUCCGACGGUGAGGGACGAAGGCCAGGGGAGCGAACCGGAUUAGAUACCCGGGUAGUCCUGGCUGUAAAGGAUGCGGGCUAGGUGUCGGGCGAUUCGCUCGCCCGGUGCCGGAGGGAAGCCGUUAAGCCCGCCGCCUGGGGAGUACGGCCGCAAGGCUGAAACUUAAAGGAAUUGGCGGGGGAGCAUACAAGGGGUGGAGCGUGCGGUUUAAUUGGAUUCAACGCCGGGAACCUCACCGGGGG-CGACGC-GA---------------CGAA-------------C--GC--GCCGAGAGGAGGUGCAUGGCCGCCGUCAGCUCGUACCGUGAGGCGUCCACUUAAGUGUGGUAACGAGCGAGACCCGCGCCCCCAGUUGCCAGUCUGCU--GGAGGCACUCUGGGGGGACCGCCGGCGAUAAGCCGGAGGAAGGAGCGGGCGACGGUAGGUCAGUAUGCCCCGAAACCCCCGGGCUACACGCGCGCUACAAUGGGCGGGACAAUGGGAUCCGACCCCGAAAGGGGAAGGGAAUCCCCUAAACCCGCCCUCAGUUCGGAUUGCGGGCUGCAACUCGCCCGCAUGAAGCUGGAAUCCCUAGUACCCGCGUGUCAUCAUCGCGCGGCGAAUACGUCCCUGCUCCUUGCACACACCGCCCGUCACUCCACCCGAGCGGGGUCCGGAUGA-GGCCUGGCUCC----UUCG---GGGACGGGUCGAGUCUGGGCUCCGUGAGGGGGGAGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUACGGCUCGAUCACCUCCUAU hi.Act.eur 
-----mmmmmmmmmmmmCCUGGCUCAGGACGAACGCUGGCGGCGUGCUUAACACAUGCAAGUCGAACGGGAUCCAAAGGC--UUUU-UGUUUUUGGUGAGAGUGGCGAACGGGUGAGUAACACGUGAGAACCUGCCCCCUUCUUUUGGAUAACCGCAUGAAAGUGUGGCUAAUACAGGAUAUUCCAGCAUUGGGAAAGGU-------UUGG-----UCUGGUGGGGGAUGGGCUCGCGGCCUAUCAGCUUGUUGGUGGGGUGAUGGCCUACCAAGGCGGUGACGGGUAGCCGGCCUGAGAGGGUGGGCGGUCACACUGGGACUGAGAUACGNCCCAGANUCCUACGGGAGGCAGCAGUGGGGGAUUUUGCACAAUGGGCGCAAGCCUGAUGCAGCGACGCCGCGUGAGGGAUGAAGGCCUUCGGGUUGUAAACCUCUUUCGCUGGGUUGAAAGGCCAUGCU---UUG----GGUGUGGUUGAUUUGAACUGGUAAAGAAGUACCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUACUAGCGUUGUCCGGAUUUAUUGGGCGUAAAGGGCUUGUAGGUGGUUUGUCGCGUCUGUCGUGAAAUCCUGUGGCUUAACCAUGGGCUUGCGGUGGGUACGGGCAGGCUUGAGUGCGGUAGGGGAGACUGGAAUUCCUGGUGUAGCGGUGGAAUGCGCAGAUAUCAGGAGGAACACCGGUGGCGAAGGCGGGUCUCUGGGCCGUUACUGACACUGAGGAGCGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCAUGCUGUAAACGUUGGGCACUAGGUGUGGGGGCCGUGU-GGUUUCUGCGCCGUAGCUAACGCAUUAAGUGUCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGCGGAGCAUGCGGAUUAAUUCGAUGCAACGCGAAGAACCUUACCAAGGCUUGACAUGACCGGUACAGUGUAGAGAUAUGCUGGCC--UUUU-UGGCUGGUGUGCAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCUCGUGUUGCCAGCAUUUGG-UUGGGGACUCUCGAGAGACUGCCGGGGUUAACUCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGUCUUGGGCUUCACGCAUGCUACAAUGGAUGGUACAGAGGGUUGCGAUACUGUGAGGUGGAGCUAAUCCCUUAAAGCUGUUCUCAGUUCGGAUCGUAGUCUGCAACUCGACUACGUGAAGGUGGAGUCGCUAGUAAUCGCAGAUCAGCACGCUGCGGUGAAUACGUUCUCGGGUCUUGUACACACCGCCCGUCACGUCACGAAAGUUGGUAACACCCGAAmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- hi.Bif.inf 
UUUGUGGAGGGUUCGAUUCUGGCUCAGGAUGAACGCUGGCGGCGUGCUUNACNCNUGCNAGUCGAACGGGAUCCAUCGGGC-UUUG--CUUGGUGGUGAGAGUGGCGAACGGGNGAGUNAUGCGUGACGACCUGCCCCAUACACCGGAAUAGCUCCUGGAAACGGGUGGUAAUGCCGGAUGUUCCAGCAUUGGGAAAGU--------UUC-------GCGGUAUGGGAUGGGGUCGCGUCCUAUCAGCUUGACGGCGGGGUAACGGCCCACCGUGGCUUCGACGGGUAGCCGGCCUGAGAGGGCGACCGGCCACNUUGGGACUGAGAUNCGGCCNNGNCUCCUACGGGAGGCNGCNGUGGGNNAUAUUGCACNAUGGGCGCAAGCCUNAUGCAGCGACGCNGCGUGAGGGAUGGAGGCCUUCGGGUUGUNAACCUCUUUUNUCGGGGAGCAAGC---------GUGA-----------GUGAGUUUACCCNUUGAAUNAGCACCGGCUAACUACGUGCCAGCNGCCGCGGUAAUACGUAGGGUGCNAGCGUUAUCCGGAAUUAUUGGGCGUNAAGGGCUCGUAGGCGGUUCGUCGCGUCCGGUGUGAAAGUCCAUCGCUUAACGGUGNAUCCGCGCCGNGUACGGGCGNGCUUGAGUGCGGUAGGGGAGACUGGAAUUCCCGGUGUAACGGUGGAAUGUGUAGAUAUCGGGAAGAACACCAAUGGCGAAGGCAGGUCUCUGGGCUGUUACUGACGCUGAGGAGCNAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUNAACGGUGGAUGCUGGAUGUGGGGCCNUUCCCGGGUUCCGUGUCGGAGCUAACGCGUUAAGCAUCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGAAAUUGACGGGGGCCNGCACAAGCGGCGGAGCAUGCGGAUUNAUUCGANNNAACGCGAAGAACCUUACCUGGGCUUGACNUGUCCCGACGAUCCCAGAGAUGGNNNN-UCC-UUCG-GGNCGGGNUCACNGGUGGCGCAUGGUCGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGCCCCGUGUUGCCAGCGGUGUG-CCGGNNACUCACGGGNNACCGCCGGGGUUNACUCGGAGGAAGGUGGGGAUGACGUCAGAUCAUCAUGCCCCUUACGUCCAGGGCUUCACGCAUGCUACAAUGGCCGGUACAACGGGAUGCGACGCGGCGACGCGGAGCGGAUCCCUGAAAACCGGNCUCAGUUCGGAUCGCAGUCUGCAACUCGACUGCGUGAAGGCGGAGUCGCUAGUAAUCGCGAAUCAGCACGUCGCGGUGAAUGCGUUCCCNGGCNUUGUACACACNGCCCGUCAAGUCAUGAAAGUGGGCAGCACCCGAAGCCGGUGGCCUAACCCUUGGGANGGAGCCGUCUNAGGUGAGGCUCGUGANNGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- hi.Cel.cel 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmGGCGUGCUUAACACAUGCAAGUCGAACGGUGACGACGGGG--CUUG--CCCUGUCUGAUCAGUGGCGAACGGGUGAGUAACACGUGAGAACCUGCCCUUGACUCUGGGAUAACCGCGGGAAACGGCGGCUAAUACCGGAUAUGAGAGCAUUCUGAAAGA--------UUUA------UCGGUCAAGGAUGGGCUCGCGGCCUAUCAGCUUGUUGGUGGGGUGAUGGCCUACCAAGGCGACGACGGGUAGCCGGCCUGAGAGGGCGACCGGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCGACGCCGCGUGAGGGAUGAAGGCCUUCGGGUUGUAAACCUCUUUCAGCAGGGAAGAAGC---------GAAA-----------GUGACGGUACCUGCAGAAGAAGCGCCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGCGCAAGCGUUGUCCGGAAUUAUUGGGCGUAAAGAGCUCGUAGGCGGUUUGUCGCGUCUGCUGUGAAAACCUGAGGCUCAACCUCGGGCUUGCAGUGGGUACGGGCAGACUAGAGUGCGGUAGGGGUGACUGGAAUUCCUGGUGUAGCGGUGGAAUGCGCAGAUAUCAGGAGGAACACCGAUGGCGAAGGCAGGUCACUGGGCCGCAACUGACGCUGAGGAGCGAAAGCAUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCAUGCCGUAAACGUUGGGCACUAGGUGUGGGGCUCUUCCCGAGUUCCGUGCCGCAGCAAACGCAUUAAGUGCCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGAAAUUGACGGGGGCCCGCACAAGCGGCGGAGCAUGCGGAUUAAUUCGAUGCAACGCGAAGAACCUUACCAAGGCUUGACAUAACCGGAAAAGUGCAGAGAUGUGCUCCCC--GCAA--GGUCGGUGUACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUCCUAUGUUGCCAGCACCAUG-GUGGGGACUCAUAGGAGACUGCCGGGGUCAACUCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGUCUUGGGCUUCACGCAUGCUACAAUGGCCGGUACAAAGGGCUGCGAUACCGCGAGGUGGAGCGAAUCCCAAAAAGCCGGUCUCAGUUCGGAUUGGGGUCUGCAACUCGACCCCAUGAAGUCGGAGUCGCUAGUAAUCGCAGAUCAGCACGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCAAGUCACGAAAGUCGGUAACACCCGAAGCCGGUGGCCCAACCCUUGGGAGGGAGCCGUCGAAGGUGGGACUGGCGAUUGGGACUAAGUCGUAACAAGGUAGCCGUmmmmmmmmmmmmmmmmmmmmmmmmm----- hi.Cor.oti 
-----mmmmmmmmmmmmmmmmmmmmmmmmCGAACGCUGNCGGCGUGCUUAACACAUGCAAGUCGAACGGAAAGGCCUACUUUCUUGAUUGCGGGUGCUCGAGUGGCGAACGGGUGAGUAACACGUGAGGAUCUGCCCCCAACUUGGGNAUAAGCCUGGGAAACUGGGUCUAAUUCCCGAUAGGACUUGGUGGUGAAAACGA------UUUU---CUAGUGGUUGGGGAUGAGCUCGCGGCCUAUCAGCUUGUUGGUGGGGUAAUGGCCUACCAAGGCGGCGACGGGUAGCCGGCCUGAGAGGGUGGACGGCCACAUUGGGACUGAGAUACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGGAAGCCUGAUGCAGCGACGCCGCGUGGGGGAUGACGGCCUUCGGGUUGUAAACUCCUUUCGACCGCGAGGAAGCCGCCUGG--UUGG----AAGGGUGGUGACGGUAGUGGUAGAAGAAGCACCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUGCGAGCGUUGUCCGGAUUUACUGGGCGUAAAGAGCUCGUAGGUGGCUUGUCGCGUCGUCUGUGAAAGUCUGGGGCUUAACUCCGGGUGUGCAGGCGAUACGGGCUGGCUUGAGUGCUGUAGGGGAGACUGGAAUUCCUGGUGUAGCGGUGGAAUGCGCAGAUAUCAGGAGGAACACCGAUGGCGAAGGCAGGUCUCUGGGCAGUCACUGACGCUGAGGAGCGAGAGCAUGGGUAGCGAACAGGAUUAGAUACCCUGGUAGUCCAUGCUGUAAACGGUGGGCGCUAGGUGUGGGGACUUUCCUGGUUUCCGUGUCCUAGCUAACGCGUUAAGCGCCCCGCCUGGGGAGUACGNCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGCGGAGCAUGUGGAUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGGUUUGACAUGACUAGAUUAGGCGAGAGAUCGUCUGUCCC-UUUG-UGGCUGGUGUGCAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGUAACCCUUGUCUUAUGUUGCCAGCACUGUG-GUGGGGACUCGUGAGAGACUGCCGGGGUGAACUCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGUCCAGGGCUUCACACAUGCUACAAUGGCCGGUACAGUAGGUUGCGAGACCGUGAGGUGGAGCUAAUCCUGUAAAGCUGGUCGUAGUUCGGAUUGGGGUCUGCAACUCGACCCCAUGAAGUCGGAGUCGCUAGUAAUCGCAGAUCACCAUGCUGCGGUGAAUACGUUCUCGGGCCUUGUACACACCGCCCGUCACGUCAUGAAAGUUGGUAACACCCGAAGCCGNUUGCUCC-ACUUAGNUGGGGUGGUNUCGAAGGUGGGAUCGGCGAUmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- hi.Mic.par 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmGCAAGUCGAGCGGAAAGGCCC-----UUCG----GGGGUACUCGAGCGGCGAACGGGUGAGUAACACGUGAGAACCUGCCCCUGACUCUGGGAUAAGCCUGGGAAACCGGGUCUAAUACCGGAUAUGACAGCAUUGUGAAAGU--------UUUU------UCGGUUGGGGAUGGGCUCGCGGCCUAUCAGCUUGUUGGUGGGGUGAUGGCCUACCAAGGCGACGACGGGUAGCCGGCCUGAGAGGGCGACCGGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCGCAAUGGGCGGAAGCCUGAGCCAGCGACGCCGCGUGGGGGAUGACGGCCUUCGGGUUGUAAACCUCUUUCAGCAGGGACGAAGUU--------GACG----------------UGUACCUGUAGAAGAAGCGCCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGCGCGAGCGUUGUCCGGAAUUAUUGGGCGUAAAGAGCUCGUAGGUGGCUUGUUGCGUCUGCCGUGAAAGCCCGUGGCUUAACUACGGGUCUGCGGUGGAUACGGGCAGGCUAGAGGCUGGUAGGGGCAAGCGGAAUUCCUGGUGUAGCGGUGAAAUGCGCAGAUAUCAGGAGGAACACCGGUGGCGAAGGCGGCUUGCUGGGCCAGUUCUGACGCUGAGGAGCGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCUGUAAACGUUGGGCGCUAGGUGUGGGGGUCUUCCCGAUUCCUGUGCCGUAGCUAACGCAUUAAGCGCCCCGCCUGGGGAGUACGGC-GCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGCGGAGCAUGUUGCUUAAUUCGAGCCAACGCGAAGAACCUUACCAAGGUUUGACAUAACCGGAAACACUCAGAGACGGGUGCCUCC-UUU--GGACUGGUGUACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUUCCAUGUUGCCAGCACUUUGGGUGGGGACUCAUGGGAGACUGCCGGGGUCAACUCGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGCCCCUUAUGUCUUGGGCUGCAAACAUGCUACAAUGGUCGGUACAGAGGGUUGCGAUACCGUGAGGUGGAGCGAAUCCCUAAAAGCCGGUCUCAGUUCGGAUUGGGGUCUGCAACUCGACCCCAUGAAGUCGGAGUCGCUAGUAAUCGCAGAUCAGCACGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACGUCACGAAAGUCGGCAACACCCGAAGCCCGUGGCCCAACCCUUGGGGGGGAGCGGUCGAAGGUGGGGCUGGCGAUUGGGACGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- hi.Mic.ech 
-----mmmmmmmmUGAUCCUGGCGCAGGACGAACGCUGGCGGCGUGCUUAACACAUGCAAGUCGAGCGGAAAGGCCC-----UUCG-----GGGUACUCGAGCGGCGAACGGGUGAGUAACACGUGAGAACCUGCCCUAGGCUUUGGGAUAACCCCGGGAAACCGGGCCUAAUACCGAAUAGGACCGCAUGGUGAAAG---------UUUU------UCGGCCUGGGAUGGGCUCGCGGCCUAUCAGCUUGUUGGUGGGGUGAUGGCCUACCAAGCCGACGACGGGUAGCCGGCCUGAGAGGGCGACCGGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGGAAGCCUGAUGCAGCGACGCCGCGUGAGGGAUGACGGCCUUCGGGUUGUAAACCUCUUUCAGCAGGGACGAAGC---------GUAA-----------GUGACGGUACCUGCAGAAGAAGCGCCAGCCAACUACGUGCCAGCAGCCGCGGUAAGACGUAGGGCGCGAGCGUUGUCCGGAUUUAUUGGGCGUAAAGAGCUCGUAGGCGGCUUGUCGCGUCGACUGUGAAAACCCGCAGCUCAACUGCGGGCCUGCAGUCGAUACGGGCAGGCUAGAGUUCGGUAGGGGAGACUGGAAUUCCUGGUGUAGCGGUGAAAUGCGCAGAUAUCAGGAGGAACACCGGUGGCGAAGGCGGGUCUCUGGGCCGAUACUGACGCUGAGGAGCGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCUGUAAACGUUGGGCGCUAGGUGUGGGGGGCCUCCGGUUCUCUGUGCCGCAGCUAACGCAUUAAGCGCCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGCGGAGCAUGCGGAUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGGUUUGACAUGCCGCAAAACCGGCAGAGAUGUCGGGUCC--UUCG--GGGGCGGUCACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUUCGAUGUUGCCAGCGCUAUG-GCGGGGACUCAUCGAAGACUGCCGGGGUCAACUCGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGCCCCUUAUGUCCAGGGCUUCACGCAUGCUACAAUGGCCGGUACAAUGGGCUGCGAUACCGUGAGGUGGAGCGAAUCCCAAAAAGCCGGUCUCAGUUCGGAUCGGGGUCUGCAACUCGACCCCGUGAAGUCGGAGUCGCUAGUAAUCGCAGAUCAGCACGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACGUCACGAAAGUCGGCAACACCCGAAGCCGGUGGCCCAACCCUUGGGAGGGAGCCGUCGAAGGUGGUGCUGGCGAUUGGGACGAAGUCGUAACAAGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- hi.Myc.cel 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmAACACAUGCAAGUCGAACGGAAAGGCCU-----UUNG----GGGGUGCUCGAGUGGCGAACGGGUGGGUAACACGUGGGGAUCUGCCCUGCACUUCGGGAUAAGCUUGGGAAACUGGGUCUAAUACCGGAUAGGACCGCAUGGUGAAAGC--------UUUU------GCGGUGUGGGAUGGGCCCGCGGCCUAUCAGCUUGUUGGUGGGGUGAUGGCCUACCAAGGCGACGACGGGUAGCCGGCCUGAGAGGGUGUCCGGCCACACUGGGACUGAGAUACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCGACGCCGCGUGGGGGAUGACGGCCUUCGGGUUGUAAACCUCUUUCACCAUCGACGAAGCUGCCGG---UUUU-----CCGGUGGUGACGGUAGGUGGAGAAGAAGCACCGGCCAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUGCGAGCGUUGUCCGGAAUUACUGGGCGUAAAGAGCUCGUAGGUGGUUUGUCGCGUUGUUCGUGAAAUUCCCUGGCUUAACUGGGGGCGUGCGGGCGAUACGGGCAGACUGGAGUACUGCAGGGGAGACUGGAAUUCCUGGUGUAGCGGUGGAAUGCGCAGAUAUCAGGAGGAACACCGGUGGCGAAGGCGGGUCUCUGGGCAGUAACUGACGCUGAGGAGCGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGGUGGGUACUAGGUGUGGGUUUCUUCCUGGGAUCCGUGCCGUAGCUAACGCAUUAAGUACCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGCGGAGCAUGUGGAUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGGUUUGACAUGACAGGACGACUGCAGAGAUGUGGUUUCC--CUUG-UGGCCUGUGUGCAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCUCAUGUUGCCAGCGCGAUG-GCGGGGACUCGUGAGAGACUGCCGGGGUCAACUCGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGCCCCUUAUGUCCAGGGCUUCACACAUGCUACAAUGGCCGGUACAAAGGGCUGCGAUGCCGUGAGGUUUAGCGAAUCCUUUAAAGCCGGUCUCAGUUCGGAUCGGGGUCUGCAACUCGACCCCGUGAAGUCGGAGUCGCUAGUAAUCGCAGAUCAGCAUGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACGUCAUGAAAGUCGGUAACACCCGAAGCCAGUGGCCUAACCGCAA-GAGGGAGCUGUCGAAGGUGGGAUCGGCGAUUGGGACGAAGUCGUAACAAGGUAGCCGUACCGGAAGUGmmmmmmmmmmmmmmm----- hi.Myc.int 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmGGCGGCGUGCUUAACACAUGCAAGUCGAACGGAAAGGCCCC----UUCG----GGGGUACUCGAGUGGCGAACGGGUGAGUAACACGUGGGAAUCUGCCCUGCACACUGGGAUAAGCCUGGGAAACUGGGUCUAAUACCGGAUAGGACCGCAUGGUGAAAGC--------UUUU------GCGGUGUGGGAUGGGCCCGCGGCCUAUCAGCUUGUUGGUGGGGUGACGGCCUACCAAGGCGACGACGGGUAGCCGGCCUGAGAGGGUGUCCGGCCACACUGGGACUGAGAUACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCGACGCCGCGUGGGGGAUGACGGCCUUCGGGUUGUAAACCUCUUUCAGCAGGGACGAAGC---------GCAA-----------GUGACGGUACCUGCAGAAGAAGCACCGGCCAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUGCGAGCGUUGUCCGGAAUUACUGGGCGUAAAGAGCUCGUAGGUGGUUUGUCGCGUUGUUCGUGAAAUCUCACGGCUUAACUGUGAGCGUGCGGGCGAUACGGGCAGACUAGAGUACUGCAGGGGAGACUGGAAUUCCUGGUGUAGCGGUGGAAUGCGCAGAUAUCAGGAGGAACACCGGUGGCGAAGGCGGGUCUCUGGGCAGUAACUGACGCUGAGGAGCGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGGUGGGUACUAGGUGUGGGUUUCUUCCUGGGAUCCGUGCCGUAGCUAACGCAUUAAGUACCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGCGGAGCAUGUGGAUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGGUUUGACAUGACAGGACGCGUCUAGAGAUAGGCGUUCC--CUUG-UGGCCUGUGUGCAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCUCAUGUUGCCAGCGGAAUG-CCGGGGACUCGUGAGAGACUGCCGGGGUCAACUCGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGCCCCUUAUGUCCAGGGCUUCACACAUGCUACAAUGGCCGGUACAAAGGGCUGCGAUGCCGCGAGGUUAAGCGAAUCCUUUAAAGCCGGUCUCAGUUCGGAUCGGGGUCUGCAACUCGACCCCGUGAAGUCGGAGUCGCUAGUAAUCGCAGAUCAGCACGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACGUCAUGAAAGUCGGUAACACCCGAAGCCAGUGGCCUAACCUUUGGGAGGGAGCUGUCGAAGGUGGGAUCGGCGAUUGGGACGAAGUCGUAAmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- hi.Noc.fla 
-----mmmmmmmmmmmmmmmmmmmmmmGACGAACGCUGGCGGCGUGCUUAACACAUGCAAGUCGAGCGGAAAGGCCC-----UUCG-----GGGUACUCGAGCGGCGAACGGGUGAGUAACACGUGAGAAUCUGCCCCAGGCUCUGGGAUAGCCACCGGAAACGGUGAUUAAUACCGGAUACGACAGCAUUGUGAAAG---------UUUU------UCGGCCUGGGAUGUGCUCGCGGCCUAUCAGCUUGUUGGUGAGGUAAUGGCUCACCAAGGCUUCGACGGGUAGCCGGCCUGAGAGGGUGACCGGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGGAAGCCUGAUCCAGCAACGCCGCGUGAGGGAUGACGGCCUUCGGGUUGUAAACCUCUUUCAGCACAGACGAAGC---------GCAA-----------GUGACGGUAUGUGCAGAAGAAGGACCGGCCAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUCCGAGCGUUGUCCGGAAUUAUUGGGCGUAAAGGGCUCGUAGGCGGUCUGUCGCGUCGGGAGUGAAAACCAGGUGCUUAACACCUGGCCUGCUUUCGAUACGGGCAGACUAGAGGUACUCAGGGGAGAAUGGAAUUCCUGGUGUAGCGGUGAAAUGCGCAGAUAUCAGGAGGAACACCGGUGGCGAAGGCGGUUCUCUGGGAGUAUCCUGACGCUGAGGAGCGAAAGUGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACACCGUAAACGUUGGGCGCUAGGUGUGGGAUCCUUCCCGGGUUCCGUGCCGCAGCUAACGCAUUAAGCGCCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGCGGAGCAUGCGGAUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGGUUUGACAUAACCGGAAAGCUGCAGAGAUGUAGCCCCU--UUU---AGUCGGUGUACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUCCUAUGUUGCCAGCAAUUCGGUUGGGGACUCAUAGGAGACUGCCGGGGUCAACUCGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGCCCCUUAUGUCCAGGGCUUCACGCAUGCUACAAUGGCCGGUACAAAGGGCUGCGAUCCCGUGAGGGUGAGCGAAUCCCAAAAAGCCGGUCUCAGUUCGGAUUGGGGUCUGCAACUCGACCCCAUGAAGUCGGAGUCGCUAGUAAUCGCAGAUCAGCACGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACGUCACGAAAGUCGGCAACACCCGAAGCCAGUGGCCCAACCCUUGGGGGGGAGCUGUCGAAGGUGGGGCUGGCGAUUGGGACGAAGUCGUAACAAGGUAGCCGUACCGGAAGGUGCmmmmmmmmmmmmm----- hi.Str.rim 
-UCACGGAGAGUUUGAUCCUGGCUCAGGACGAACGCUGGCGGCGUGCUUAACACAUGCAAGUCGAACGAUGAAGCCC-----UUCG-----GGGUGGAUUAGUGGCGAACGGGUGAGUAACACGUGGGAAUCUGCCCUGCGCUCUGGGACAAGCCCUGGAAACGGGGUCUAAUACCGGAUAUGACAGCAUUGUGAAAGC--------UCCG------GCGGUGCAGGAUGAGCCCGCGGCCUAUCAGCUUGUUGGUUAGGUAAUG-CCUACCAAGGCGGCGACGGGUAGCCGGCCUGAGAGGGCGACCGGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCGACGCCGCGUGAGGGAUGACGGCCUUCGGGUUGUAAACCUCUUUCAGCAGGGAAGAAGC---------GCAA-----------GUGACGGUACCUGCAGAAGAAGCGCCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGCGCAAGCGUUGUCCGGAAUUAUUGGGCGUAAAGAGCUCGUAGGCGGCUUGUCGCGUCGGAUGUGAAAGCCCGGGGCUUAACCCCGGGUCUGCAUUCGAUACGGGCAGGCUAGAGUUCGGUAGGGGAGAUCGGAAUUCCUGGUGUAGCGGUGAAAUGCGCAGAUAUCAGGAGGAACGCCGGUGGCGAAGGCGGAUCUCUGGGCCGAUACUGACGCUGAGGAGCGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGUUGGGAACUAGGUGUGGGCGACUUCCCGUCGUCCGUGCCGCAGCUAACGCAUUAAUUGCCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGCGGAGCAUGUGGCUUAAUUCGACGCAACGCGAAGAACCUUACCAAGGCUUGACAUAACCGGAAACCUCUGGAGACAGGGGCCCC--CUUG-UGGUCGGUGUACAGGUGGUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUG-GCUCUGUUGCCAGCAUUUCGGAUGGGGACUCACAGGAGACCGCCGGGGUCAACUCGGAGGAAGGUGGGGACGACCUCAAGUCAUCAUGCCCCUUAUGUCUUGGGCUGCACACGUGCUACAAUGGCCGGUACAAUGAGCUGCGAUACCGCGAGGUGGAGCGAAUCUCAAAAAGCCGGUCUCAGUUCGGAUUGGGGUCUGCAACUCGACCCCAUGAAGUCGGAGUCGCUAGUAAUCGCAGAUCAGCAUGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACGUCACGAAAGUCGGUAACACCCGAAGCCGGUGGCCCAACCCUUGGGAGGGAAUCGUCGAAGGUGGGACUGGCGAUUGGGACGAAGUCGUAACAAGGUAGCCGUACCGGAAGGUGCGGCUGGAUCACCUCCUUU lo.Ach.lai 
UAUAUGGAGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAACGAAGCAUC-------UUCG-------GAUGCUUAGUGGCGAACGGGUGAGUAACACGUAGAAACCUACCUUUAACUCGAGGAUAACUCCGGGAAACUGGAGCUAAUACUGGAUA-GGAUGAA-AUUUAAAGA--------UUUA------UCGGUUUAAGAGGGGUCUGCGGCGCAUUAGUUAGUUGGUGGGGUAAGAGCCUACCAAGACGAUGAAUCGUAGCCGGACUGAGAGGUCUACCGGCCACAUUGGGACUGAGA-ACGGCCCAAACUCCUACGGGAGGCAGCAGUAGGGAAUUUUCGGCAAUGGGGGAAACCCUGACCGAGCAACGCCGCGUGAACGACGAAGUACUUCGGUAUGUAAAGUUCUUUUAUAUGGGAAGAAAAAUU------AAA----------AAUUGACGGUACCAUAUGAAUAAGCCCCGGCUAACUAUGUGCCAGCAGCCGCGGUAAUACAUAGGGGGCGAGCGUUAUCCGGAUUUACUGGGCGUAAAGGGUGCGUAGGUGGUUAUAAAAGUUUGUGGUGUAAGUGCAGUGCUUAACGCUGUGA-GGCUAUGAAAACUAUAUAACUAGAGUGAGACAGAGGCAAGUGGAAUUCCAUGUGUAGCGGUAAAAUGCGUAAAUAUAUGGAGGAACACCAGUGGCGAAGGCGGCUUGCUGGGUCUAUACUGACACUGAUGCACGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAGAACUAAGUGUUGGCC--AUAA---GGUCAGUGCUGCAGUUAACGCAUUAAGUUCUCCGCCUGAGUAGUACGUACGCAAGUAUGAAACUCAAAGGAAUUGACGGGACCCCGCACAAGCGGUGGAUCAUGUUGUUUAAUUCGAAGAUACACGAAAAACCUUACCAGGUCUUGACAUACUCUGAAAGGCUUAGAAAUAAGUUC-GG--AGG---CUCAGAUGUACAGGUGGUGCACGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUUGCUAGUUACCAUCA-UUAAGUUGGGGACUCUAGCGAGACUGCCAGUGAUAAAUUGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGACCUGGGCUACAAACGUGAUACAAUGGCUGGAACAAAGAGAAGCGAUAGGGUGACCUGGAGCGAAACUCACAAAAACAGUCUCAGUUCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGCAAAUCAGCAUGUUGCGGUGAAUACGUUCUCGGGGUUUGUACACACCGCCCGUCAAACCACGAAAGUGGGCAAUACCCAACGCCGGUGGCCUAACCGAAAGGAGGGAGCCGUCUAAGGUAGGGUCCAUGAUUGGGGUUAAGUCGUAACAAGGUAUCCCUACGGmmmmmmmmmmmmmmmmmmmmm----- lo.Bac.alc 
-----mmmmmmmUUGAUCCUGGCUCAGGACGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAGCGGACCAAAGGGAG--CUUG--CUCCCAGAGGUUAGCGGCGGACGGGUGAGUAACACGUGGGAACCUGCCCUGUAGACUGGGAUAACAUCGAGAAAUCGGUGCUAAUACCGGAUA-AUCAACAUUGUAAAAGAUGGC----UCCG--GCUAUCACUAACAGAU-GGCCUGCGGCGCUUUAGCUAGUUGGUAAGGUAAUGGCUUACCAAGGCGACGAUGCGUAGCCGACCUGAGAGGGUGAUCGGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUCUUCCGCAAUGGACGAAAGUCUGACGGAGCAACGCCGCGUGAGUGAUGAAGGUUUUCGGAUCGUAAAGCUCUGUUGUUAGGGAAGAACAAGUGCCGUUAAUA-GGUCGGCACCUUGACGGUACCUAACCAGAAAGCCACGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGUGGCAAGCGUUGUCCGGAAUUAUUGGGCGUAAAGCGCGCGCAGGCGGUCUUUUAAGUCUGAUGUGAAAUAUCGGGGCUCAACCCCGAGGGGUCAUUGGAAACUGGGA-GCUUGAGUACAGAAGAGGAGAGUGGAAUUCCNCGUGUNGCGGUGAAAUGCGUAGAUAUGUGGAGGAACACCAGUGGCGAAGGCGACUCUCUGGUCUGUAACUGACGCUGAGGCGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAGUGCUAGGUGUUAGGGGUUUCGUGCCCUUAGUGCCGAAGUUAACACAUUAAGCACUCCGCCUGGGGAGUACGGCCGCAAGGCUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCAGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAGGUCUUGACAUCCUUUGACCACUCUAGAGAUNGAGC--UUUCUUCGGGGACAAAGUGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGAUCUUAGUUGCCAGCA-UUUAGUUGGGCACUCUAAGGUGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUNAUGCCCCUUAUGACCUNGGCUACACACGUGCUACAAUGGAUGGUACAAAGGGCAGCGAAACCGCGAGGUCGAGCCAAUCCCAUAAAGCCAUUCUCAGUUCGGAUUGUAGGCUGCAACUCGCCUACAUGAAGCCGGAAUUGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCACGAGAGUUUGUAACACCCmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- lo.Bac.sub 
UUAUCGGAGAGUUUGAUCCUGGCUCAGGACGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAGCGGACAG-GUGGGAG-CUUG-CUCC---GAUGUUAGCGGCGGACGGGUGAGUAACACGUGGGAACCUGCCUGUAAGACUGGGAUAACUCCGGGAAACCGGGGCUAAUACCGGAUG-GUUGGCAUCAUAAAAGGUGGC----UUCG--GCUACCACUUACAGAUGGACCCGCGGCGCAUUAGCUAGUUGGUGAGGUAACGGCUCACCAAGGCAACGAUGCGUAGCCGACCUGAGAGGGUGAUCGGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUCUUCCGCAAUGGACGAAAGUCUGACGGAGCAACGCCGCGUGAGUGAUGAAGGUUUUCGGAUCGUAAAGCUCUGUUGUUAGGGAAGAACAAGUACCGUUAACA-GGGCGGUACCUUGACGGUACCUAACCAGAAAGCCACGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGUGGCAAGCGUUUUCCGGAAUUAUUGGGCGUAAAGGGCUCGCAGGCGGUUUCUUAAGUCUGAUGUGAAAGCCCCCGGCUCAACCGGGGAGGGUCAUUGGAAACUGGGGAACUUGAGUGCAGAAGAGGAGAGUGGAAUUCCACGUGUAGCGGUGAAAUGCGUAGAGAUGUGGAGGAACACCAGUGGCGAAGGCGACUCUCUGGUCUGUAACUGACGCUGAGGAGCGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAGUGCUAAGUGUUAGGGGGUUCCGCCCCUUAGUGCUGCAGCUAACGCAUUGAGCACUCCGCCUGGGGAGUACGGUCGCAAGACUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACUAGGUCUUGACAUCCUCUGACAAUCCUAGAGAUAGGACGUCCC-UUCG-GGGCAGAGUGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGAUCUUAGUUGCCAGCA-UUCAGUUGGGCACUCUAAGGUGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGACUUGGGCUACACACGUGCUACAAUGGACAGAACAAAGGGCAGCG-AACCGCGAGGUUAAGCCAAUCCCACAAAUCUGUUCUCAGUUCGGAUCGCAGUCUGCAACUCGACUGCGUGAAGCUGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCACGAGAGUUUGUAACACCCGAAGUCGGUGAGGUAACCUUU-GGAGCCAGCCGCCGAAGGUGGGACAGAUGAUUGGGGUGAAGUCGUAACAAGGUAGCCGUAUCGGAAGGUGCGGCUGGAUCACCUCCUUU lo.Car.mob 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmGGACGGGUNAGUAACACGUGGGNACCNGCCCAUAAGUGGGGGAUAACAUUCGGAAACGGAUGCUAAUACCGCAUA-ACUCUCCUGAUAAAAGACGGU----UUCC--GCUGUCGCUNAUNGAUNGACCCGCGGCGUNUUAGCUNGUUGGUGAGGUAAUGGCUCACCAAGGCNAUGAUGCNUAGCCGACCUGAGAGGGUNAUCGGCNACACUGGGACUGAGACACGGCCNAGACUCCUACGGGAGGCAGCNGUAGGGAAUCUUCCGCAAUGGACGAAAGUCUGACGGAGCNAUGCCGCGUGAGUGAAGAAGGUUUUCGGAUCGUAAAACUNUGUUGUUAGAGAAGAACNAGGAUGAGCUAAC-UNCUCAUCCCCUGACGGUAUCUNACCAGAAAGCCAUGGCUAACUACGUGCCAGCAGCCGCGGUNAUACGUAGAUGGCNAGCGUUGUCCGGAUUUAUUGGGCGUNAAGCGAGCGCAGGCGGUUCUUUAAGUCUNAUGUGAAAGCCCCCAGCUCNACUGGGNAAGGUCAUUGGAAACUGGGGAACUUGAGUGCAGAAGAGGAGAGUGGAAUUCCACGUGUAGCGGUGAAAUGCGUAGAUAUGUGGAGGAACACCAGUGGCGAAGGCGACUCUCUNGUCUGUNACUGACGCUNAGGCUCGNAAGCGUNGGGAGCAAACAGGAUUAGAUACCCUNGUNGUCCACGCCGUAAACGAUGAGUGCUNAGUGUUGGAGGGUUCCGCCCUUCAGUGCUGCAGCUNACGCAUUAAGCACUNCGCCUNGGGAGUACGGCCGCAAGGCUGAAACUCAAAGGAAUUGACGGGGA-CCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGNAACGCGAAGAACCUUACCAGGUCUUGACAUCCUUUGACCACUCUAGAGAUAGAGCU-UUCCUUCGGGGACAAAGUGACAGGUGGNGCAUNGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUNAAGUCCCGCAACGAGCGCAACCCUNAUUACUAGUUGCCAGCA-UUCAGUUGGGCACUCUAGUGAGACUGCCGGUGAUAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGACCUNGGCUACACACGUGCUACAAUGGAUGGUACAACGAGUCGCAAGACCGCGAGGUCAAGCUAAUCUCUUAAAGCCAUUCUCNGUUCGGAUUGCAGGCUGCNACUCGCCUNCAUGAAGCCGGAAUCGCUAGUAAUCGCGGAUCNGAACGCCGCGGUNAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCACGAGAGUUUGUAACACCCGAAGUCGGUGAGGUNACCUUUUGGAGCCAGCCGCCUAAGGUNGGAUAGAUmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- lo.Eub.lac 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmCGGCGUGCUUNACMCRUGCGAGUCGAGCGAAGCACUUAGGWAAUUCGUUCCUAUUUGACUGAGCGGCGGACGGGUGAGUAACGCGUGGGAACCUCCCUCAUACAGGGGGAUAACAGUUAGAAAUGACUGCUAAUACCGCAUAAGACCGCAUGGUAAAAAC--------UCCG------GUGGUAUGAGAUGGACCCGCGUCUGAUUAGUUAGUUGGUGGGGUAACGGCCUACCAAGGCGACGAUCAGUAGCCGACCUGAGAGGGUGACCGGCCACAUUGGGACUGAGACACGGCCCAAACUCCUACGGGAGGCAGNAGNGGGGAAUAUUGCACAAUGGGGGANACCCUGAUGCAGCGACGCCGCGUGAGCGAAGAAGUAUUUCGGUAUGUAAAGCUCUAUCAGCAGGGAAG-------------AAAA------------UGACGGUACCUGACUAAGAAGCCCCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGGGCAAGCGUUAUCCGGAUUUACUGGGUGUAAAGGGAGCGUAGACGGAGCAGCAAGUCUGAUGUGAAAACCCGGGGCUCAACCCCGGGACUGCAUUGGAAACUGUUGAUCUGGAGUGCCGGAGAGGUAAGCGGAAUUCCUAGUGUAGCGGUGAAAUGCGUAGAUAUUAGGAGGAACACCAGUGGCGAAGGCGGCUUACUGGACGGUAACUGACGUUGAGGCUCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGACUACUAGGUGUCGGGUGGAAA-GCCAUUCGGUGCCGCAGC-AACGCAAUAAGUAGUCCACCUGGGGAGUACGUUCGCAAGAAUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAANAACCUUACCUGCUCUUGACAUCCGGUGACGGCAGAGUAAUGUCUGCUUUUC-UUCG-GAACCCGGUGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUAUCUUCAGUAGCCAGCG-GUAAGCCGGGCACUCUGGAGAGACUGCCAGGGAUAACCUGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGAGCAGGGCUACACACGUGCUACAAUGGCGUAAACAAAGGGAAGCGAACCCGCGAGGGUGGGCAAAUCCCANAAAUAACGUCUCAGUUCGGAUUGUAGUCUGCAACUCGACUACAUGAAGCUGGAAUCGCUAGUAAUCGCGAAUCAGAAUGUCGCGGUGAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCAUGGGAGUCAGUAACGCCCGAAGUCAGUGACCCAACCGUmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- lo.Lac.car 
UUAUAUGAGAGUUUGAUCCUGGCUCAGGACGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAACGCACGAAGUUGAAGACUUGUCUUUAACCAAGUGAGUGGCGGACGGGUGAGUAACACGUGGGAACCUGCCCAUUAGAGGGGGAUAACAUUCGGAAACGGAUGCUAAUACCGCAUA-GUUUGCAUAAGGAAAGGUGGC----UUCG--GCUACCACUAAUGGAUGGACCCNCGGCGUAUUAGCUAGUUGGUGAGGUAAUGGCUCACNAAGGCAAUGAUACGUAGCCGACCUGAGAGGGUGAUCGGCCACACUGGGACUGAGACACGGCNNNNACUCCUACGGGAGGCAGCAGUAGGGAAUCUUCCGCAAUGGACGAAAGUCUGACGGAGCAACGCCGCGUGAGUGAAGAAGGUUUUCGGAUCGUAAAACUCUNUUGUUAAAGAAGAACAAGGAUGAGAUAAC-UGCUCAUCCCUNNACGGUAUUUAACCAGAAAGCCACGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGUGGCAAGCGUUGUCCGGAUUUAUUGGGCGUAAAGCGAGCGCAGGCGGUUCUUUAAGUCUNAUGUGAAAGCCCNCNGCUCAACCGGGNAGGGUCAUUGGAAACUGGAGAACUUGAGUGCAGAAGAGGAGAGUGGAAUUCCACGUGUAGCGGUGAAAUGCGUAGAUAUGUGGAGGAACACCAGUGGCGAAGGCGACUCUCUNGUCUGUAACUGACGCUGAGGCUCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAGUGCUAAGUGUUGGAGGGUUCCGCCCUUCAGUGCUGCAGCUAACGCAUUAAGCACUCCGCCUGGGGAGUACGGCCGCAAGGCNNNAACUCAAAGGAAUUGNCGGGGACCNGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAANNAACGCGAAGAACCUUACCAGGUCUUGACAUCCUUUGACCACUCUAGAGAUAGAGCUUUCCCUUCGGGGACAAAGUGACAGGUGGNGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUUACUAGUUGCCAGCA-UUUAGUUGGGCACUCUAGUGAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGACCUGGNCUACACACGUGCUACAAUGGAUGGUACAACGAGUCGCAAGGUCGCGAGGCCAAGCUAAUCUCUUAAAGCCAUUCUCAGUUCGGAUUGUAGGCUGCAACUCGCCUNCAUGAAGCCGGAAUCGCUAGUAAUCGCGGAUCAGAACGCCGCGGUGAAUACGUUCCCGGGUCUUGUACACACCGCNCGUCACACCACGAGAGUUUGUAACACCCGAAGCCGGUGAGGUAACCUUU-GGAGCCAGCCGUCUAAGGUGGGAUAGAUAAUUNGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- lo.Myc.cap 
-AAAAUGAGAGUUUGAUCCUGGCUCAGGAUAAACGCUGGCGGCUGGCCUAAUACAUGCAAGUCGAACGGGGGUG--------CUUG--------CACCUCAGUGGCGAACGGGUGAGUAACACGUAUCAACU-ACCUUAUAGCGGGGGAUAACUUUUGGAAACGAAAGAUAAUACCGCAUGUAGAUGCAUAUCAAAAGAACC-----GUUU---GGUUCACUAUGAGAUGGGGAUGCGGCGUAUUAGCUAGUAGGUGAGAUAAUAGCCCACCUAGGCGAUGAUACGUAGCCGAACUGAGAGGUUGAUCGGCCACAUUGGGACUGAAAUACGGCCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUUUUUCACAAUGGACGAAAGUCUGAUGAAGCAAUGCCGCGUGAGUGAUGACGGCCUUCGGGUUGUAAAGCUCUGUUGUAAGGGAAGAAAAAAUAGAGUAGAAA-UGACUUUAUCUUGACAGUACCUUACCAGAAAGCCACGGCUAACUAUGUGCCAGCAGCCGCGGUAAUACAUAGGUGGCAAGCGUUAUCCGGAUUUAUUGGGCGUAUAGGGUGCGUAGGCGGUUUUGCAAGUUUGAGGUUAAAGUCCGGAGCUCAACUCCGGUU-CGCCUUGAAGACUGUUUUACUAGAAUGCAAGAGAGGUAAGCGGAAUUCCAUGUGUAGCGGUGAAAUGCGUAGAUAUAUGGAAGAACACCUGUGGCGAAAGCGGCUUACUGGCUUGUUAUUGACGCUGAGGCACGAAAGCGUGGGGAGCAAAUAGGAUUAGAUACCCUAGUAGUCCACGCCGUAAACGAUGAGUACUAAGUGUUGGG---GUAA----CUCAGCGCUGCAGCUAACGCAUUAAGUACUCCGCCUGAGUAGUAUGCUCGCAAGAGUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGUGGUGGAGCAUGUGGUUUAAUUCGAAGCAACACGAAGAACCUUACCAGGGCUGACA-UCCAGUGAAAGCUAUAGAGAUAUAGU--AG--AGG---UUCAUUGAGACAGGUGGUGCAUGGUUGUCGUCAGUUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAACGCAACCCUUGUCGUUAGUUACUAACA-UUAAGUUGAGAACUCUAACGAGACUGCUAGUG-UAAGCUAGAGGAAGGUGGGGAUGACGUUAAACUACUAUGCCCUUUAUGUCCUGGGCUACACACGUGCUACAAUGGCUGGUACAAAGAGUUGCAAUCCUGUGAAGGGGAGCUAAUCUCAAAAAACCAGUCUCAGUUCGGAUUGAAGUCUGCAACUCGACUUCAUGAAGCCGGAAUCACUAGUAAUCGCGAAUCAGCUUGUCGCGGUGAAUACGUUCUCGGGUCUUGUACACACCGCCCGUCACACCAUGAGAGUUGGUAAUACCAGAAGUAGGUAGCUUAACCAUUUGGAGAGCGCUUCCCAAGGUAGGACUAGCGAUUGGGGUGAAGUCGUAACAAGGUAUCCGUACGGGAACGUGCGGAUGGAUCACCUCCUUU lo.Myc.myc 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmCUGGCGGCAUGCCUAAUACAUGCAAGUCGAACGGAGGUG--------CUUG--------CACCUCAGUGGCGAACGGGUGAGUAACACGUAUCAACCUACCUCAUAGCGGGGGAUAACUUUUGGAAACGAAAGAUAAUACCGCAUGUAGAUGCAUAUCAAAAGAACC-----GUUU---GGUUCACUAUGAGAUGGGGAUGCGGCGUAUUAGCUAGUAGGUGAGAUAAUAGCCCACCUAGGCGAUGAUACGUAGCCGAACUGAGAGGUUGAUCGGCCACAUUGGGACUGAGAUACGGCCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUUUUUCACAAUGGACGAAAGUCUGAUGAAGCAAUGCCGCGUGAGUGAUGACGGCCUUCGGGUUGUAAAGCUCUGUUGUAAGGGAAGAAAAAAUAAAGUAGAAA-UGACUUUAUCUUGACAGUACCUUACCAGAAAGCCACGGCUAACUAUGUGCCAGCAGCCGCGGUAAUACAUAGGUGGCAAGCGUUAUCCGGAUUUAUUGGGCGUAUAGGGUGCGUAGGCGGUUUUGCAAGUUUGAGGUUAAAGUCCGGAGCUCAACUCCGGUU-CGCCUUGAAAACUGUAUUACUAGAAUGCAAGAGAGGUAAGCGGAAUUCCAUGUGUAGCGGUGAAAUGCGUAGAUAUAUGGAAGAACACCUGUGGCGAAAGCGGCUUACUGGCUUGUUAUUGACGCUGAGGCACGAAAGCGUGGGGAGCAAAUAGGAUUAGAUACCCUAGUAGUCCACGCCGUAAACGAUGAGUACUAAGUGUUGGG---GUAA----CUCAGCGCUGCAGCUAACGCAUUAAGUACUCCGCCUGAGUAGUAUGCUCGCAAGAGUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGUGGUGGAGCAUGUGGUUUAAUUCGAAGCAACACGAAGAACCUUACCAGGGCUUGACAUCCAGUGAAAGCUAUAGAGAUAUAGU--AG--AGG---UUCAUUGAGACAGGUGGUGCAUGGUUGUCGUCAGUUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAACGCAACCCUUGUCGUUAGUUACUAACA-UUAAGUUGAGAACUCUAACGAGACUGCUAGUG-UAAGCUAGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGUCCUGGGCUACACACGUGCUACAAUGGCUGGUACAAAGAGUUGCAAUCCUGUGAAGGGGAGCUAAUCUCAAAAAACCAGUCUCAGUUCGGAUUGAAGUCUGCAACUCGACUUCAUGAAGCCGGAAUCACUAGUAAUCGCGAAUCAGCUUGUCGCGGUGAAUACGUUCUCGGGUCUUGUACACACCGCCCGUCACACCAUGAGAGUUGGUAAUACCAGAAGUAGGUAGCUUAACCGUUUGGAGAGCGCUUCCCAAGGUAGGACUAGCGAUUGGGGUGAAGUCGUAACAAGGUAUCCGUACGGGAACmmmmmmmmmmmmmmmmm----- lo.Spi.cit 
UUUAAUGAGAGUUUGAUCCUGGCUCAGGAUGAACGCUGGCGGCAUGCCUAAUACAUGCAAGUCGAACGGGGUG---------CUUG---------CACCCAGUGGCGAACGGGUGAGUAACACGUAUCAAUCUACCCAUUAGCGGGGGAUAACAGUUGGAAACGACUGAUAAUACCGCAUA-CGACGCAUGUUAAAAGGUCC-----GUUU---GGAUCACUAAUGGAUGAGGAUGCGGCGUAUUAGUUAGUUGGUGGGGUAAUGGCCUACCAAGACAAUGAUACGUAGCCGAACUGAGAGGUUGAUCGGCCACAUCGGGACUGAGACACGGCCCGAACUCCUACGGGAGGCAGCAGUAGGGAAUUUUUCACAAUGG-CGAAAGCCUGAUGGAGCAAUGCCGCGUGACUGAAGACGGUCUUCGGAUUGUAAAAGUCUGUUGUAAGGGAAGAACAGUAAGUAUAGAAA-UGAUACUUAUUUGACGGUACCUUACCAGAAAGCCACGGCUAACUAUGUGCCAGCAGCCGCGGUAAUACAUAGGUGGCAAGCGUUAUCCGGAUUUAUUGGGCGUAAAGCGUGCGCAGACGGUUUAACAAGUUUGGGGUCAAAUCCUGGAGCUCAACUCCAGUU-CGCCUUGAAAACUGUUAAGCUAGAGUGUAGGAAAGGUCGAUGGAAUUCCAUGUGUAGCGGUGAAAUGCGUAGAUAUAUGGAGGAACACCAGUGGCGAAGGCGGUCGACUGGCCUAUCACUGACGUUUAGGCACGAAAGCGUAGGGAGCAAAUAGGAUUAGAUACCCUAGUAGUCUACGCCGUAAACGAUGAGUACUAAGUGUCGGAC---UAA---GUUCGGUGCUGCAGCUAACGCAUUAAGUACUCCGCCUGAGUAGUAUGCUCGCAAGAGUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAAGGCUUGACAUCCAGUGAAAGCUGUAGAAAUACAGU--GG--AGG---UUCAUUGAGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGCCGUGAGGUGUUUGGUUAAGUCCAGUAACGAGCGCAACCCUUGCCGUUAGUUACUC-CA-UUAAUUGAGAUACUCUAACAGGACUGCUAGUG-UAAGCUAGAGGAAGGUGGGGAUGACGUCAAAUCAGCAUGCCCCUUAUAUCCUGGGCUACACACGUGCUACAAUGGUCGGUACAAACAGUUGCGAUCUCGUAAGAGGGAGCUAAUCUGAAAAAGCCGAUCUCAGUUCGGAUUGAGGGCUGCAACUCGCCCUCAUGAAGCCGGAAUCGCUAGUAAUCGCGAAUCAGCAUGUCGCGGUGAAUACGUUCUCGGGUCUUGUACACACCGCCCGUCACACCAUGAGAGUUGAUAAUACCAGAAGUCGGUAUUCUAACCGCAAGGAGGAAGCCGCCCAAGGUAGGAUUGAUGAUUAGGGUGAAGUCGUAACAAGGUAUCCGUACGAGAACGUGCGGAUGGAUCACCUCCUUU lo.Sta.hom 
-----mmAGAGUUUGAUNNUGGCUCAGGAUGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAGCGAACAGACGAGGAG-CUUG-CUCCUUUGACGUUAGCGGCGGACGGGUGAGUAACACGUGGGAACCUACCUAUAAGACUGGGAUAACUUCGGGAAACCGGAGCUAAUACCGGAUA-AUAUGCAUAGUGAAAGAUGGC----UUU---GCUAUCACUUAUAGAUGGACCUGCGCCGUAUUAGCUAGUUGGUAAGGUAACGGCUUACCAAGGCAACGAUACGUAGCCGACCUGAGAGGGUGAUCGGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUCUUCCGCAAUGGGCGAAAGCCUGACGGAGCAACGCCGCGUGAGUGAUGAAGGUCUUCGGAUCGUAAAACUCUGUUAUUAGGGAAGAACAAACGUGUAAUAAC-UGUGCACGUCUUGACGGUACCUAAUCAGAAAGCCACGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGUGGCAAGCGUUAUCCGGAAUUAUUGGGCGUAAAGCGCGCGUAGGCGGUUUUUUAAGUCUGAUGUGAAAGCCCACGGCUCAACCGUGGAGGGUCAUUGGAAACUGGAAAACUUGAGUGCAGAAGAGGAAAGUGGAAUUCCAUGUGUAGCGGUGAAAUGCGCAGAGAUAUGGAGGAACACCAGUGGCGAAGGCGACUUUCUGGUCUCUAACUGACGCUGAUGUGCGAAAGCGUGGGGAUCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAGUGCUAAGUGUUAGGGGGUUCCGCCCCUUAGUGCUGCAGCUAACGCAUUAAGCACUCCGCCUGGGGAGUACGACCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAAAUCUUGACAUCCUUUGACCCUUCUAGAGAUAGAAG--UUUCUUCGGGGACAAAGUGACAGGUGGUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAAGCUUAGUUGCCAUCA-UUAAGUUGGGCACUCUAAGUUGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGCCCCUUAUGAUUUGGGCUACACACGUGCUACAAUGGACAAUACAAAGGGCAGCGAAACCGCGAGGUCAAGCAAAUCCCAUAAAGUUGUUCUCAGUUCGGAUUGUAGUCUGCAACUCGACUACAUGAAGCUGGAAUCGCUAGUAAUCGUAGAUCAGCAUGCUACGGUGAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCACGAGAGUUUGUAACACCCGAAGCCGGUGGAGUAACCAUUUGGAGCUAGCCGUCGAAGGUGGGACAAAUGAUUGGGGUGAAGUCGUAACAAGGUAGCCGUAUCGGAAGGUGCGGCUGGAUCACCUCCUUU mt.Dro.mel 
-----UUAAAGUUUUAUUUUGGCUUAAAA-AUUUGUUAUUAAUUUGAUU--UAUAUGUAAAUUU--------------------------------------UUGU------GUGAUU-UUA-UAUU-AUUUAAA-----------------------------------------------------------------------------------------AAAU--AAAUA------U-------------------------------AAA-----AUU---------------------------UAUC--------------------------------------GCAGUAAUUAAUAUUAUU-AAUUA-AAAAAUUUAGAAAUAGCAAUAUAAA---------------------------------------------------------------------------------------------AAAGUAUUGACUAAAUUGGUGCCAGCAGUCGCGGUUAUACCAUUAAUACAAAUAAAUUUUU---------UUAGUAUUAAUUAAAUUAAU---------------------------------------------------------------------------UUAU-------UAA--AAUUAAAUAUAU-A---GUUUUUA-UAUU-----------------UAAUUA----AU-AAAAA------------AAUUGAGUUAAA-UUAA---UAAAAAACUAGGAAUACCC-UA--UAUU-UAA--AAU-GUAAAUAAAUUG--------------------------------------------------------CUAAAG--UAGUA-AUAGUAU-UUCUGAAACUUAAAAAAUUUGGCGGUAUUUUAUCUA-CUAGA--GA-CC-U--GUUUUAUCGUAAUCC-ACGA-GGACCUUACUUAAAU-GUAAU-------------------------------------------------AGUUUAUAUACCGUCGUUAUCAGAAUAUUUU--UAGA-A---AUAAU---AU--------UCAAUAAUUUUAAU--------------------------------------------------------------AAAAA-UUUAUAUCAGAUCAAGGUGUAGCUUAUAUUUAAG--UAAUAAUGGGUUA-AA----UAAA-------------------------------------UUUAUA------------UGAAUAAAAUUAUGAAAA---AAUUUUUGAAAGUGGAUUUAGUAGUAAAAUUAU-AGA---UAAUAAUUUGAUUUUAGCUCUAAAAUAUGUACACAUCGCCCGUCGCUCUUAUU---------------------------------AUUA------------------------------GGUAAGAUAAGUCGUAACAUAGUAGAUGUACUGGAAAGUGUAUCUAGAAUGA------- mt.Gal.gal 
-----AAAAGACUUAGUCCUAACCUUUCU-AUUGGUUUUUGCUAGACAUA-UACAUGCAAGUAU--------------------------------------CCGCAUCCCAGUGAAA-AUGCCCCCACCUUUCUUCCC-------------------------------------------------------------------------------------AAGC-AAAAGGAGCAGGUAUCGGCA-CACU-------------------CAGU----AGCCC---AAGACGCCUU-GCU---UAAGCCAC-----------------------ACCCCCC-GGGUACUAGCAGUAAUUAACCUUAAGCAAUAA-GUUAAACUUGACUUAGCCAUAGCAA--------------------------------------------------------------------------------------------CCCAGGGUUGGUAAAUCUUGUGCCAGCCACCGCGGUCAUACAAGAAACCCAAAUCAAUAGCU-----ACCCGGCGUAAAGAGUGGCCACAU--------------------------------------------------------------------GUUAUCUCACC-------AAA--AUGCAACCAAGCUAUAAGCGAUCC-AC--------CUAACCCAAC--CCAA-----AU-CUUAG------------AAUUUAACCACGAAGG----ACCCAAACUGGGAAUACCC-CA--UAUG-CCU--AGCCCUAAAUCUAGAUACCU-----------------------------------C-CAAC--ACAUGUAUCCGCCU---GAGAACGAGCCAAAGCUUAAAACUCUAAGGACUUGGCGGUGCCCCACCCA-CUAGA--GA-CC-U--GCUAUAUCGUAAUCC-ACGA-UCACCCAACCACCCC-GCCA---GCA--------------------------------------------AGCCUACAUACCGCCGUCGCCAGCCCACCUCAAUGAA-G---ACAACA--GUG------AGCUCAAUAGCCCCU---------------------------------------------------------------CGCU-AAUAAGACAGGUCAAGGUAUAGCCUAUGGGGUGG--GAGAAAUGGGCUA-AUU---UUCUA------------------------------------AGAACA----------AACGAAAAAGGAUGUGAAACC-CGCCCUUAGAAGGAGGAUUUAGCAGUAAAGUGAGAUUACCCAGCUCACUUUAAGACGGCUCUGAGGCACGUACAUACCGCCCGUCACCCUCUUCACAAGCCAUCAACAUCAAUA-------------AAUA--------UAUACUCCCC-UCCCGGCUAAAGACGAGGCAAGUCGUAACAAGGUAAGUGUACCGGAAGGUGCACUUAGACUACN------ mt.Bos.tau 
-----CAUAGGUUUGGUCCCAGCCUUCCU-GUUAACUCUUAAUAAACUUA-CACAUGCAAGCAU--------------------------------------CUACACCCCAGUGAGA-AUGCCCUCUGGUUAUUAA---------------------------------------------------------------------------------------AACU-AAGAGGAGCUGGCAUCAGCA-CACA-------------------CUGU----AGCUC---ACGACGCCUU-GC----UUAACCAC-----------------------A-CCCCA-GGGAAACAGCAGUGACAAAAAUUAAGCCAUAA-ACAAAGUUUGACUAAGUUAUAUUAA---------------------------------------------------------------------------------------------UUAGGGUUGGUAAAUCUCGUGCCAGCCACCGCGGUCAUACGAUUAACCCAAGCUAACAGGA-----GUACGGCGUAAAACGUGUUAAAGC----------------------------------------------------------------------ACCAUCCAA-------AAA--UUCUAACUAAGCUAAAAGCAUUAA-AA--------UAAAAAUAAAUGACAAA----AC-CCUAC------------AGCCGA-GCACUAAAG----ACCCAAACUGGGAAUACCC-CA--UAUG-CUU--AGCCCUAAACACAGAUAAUU-----------------------------------C-AUAC--AAAAUUAUUCGCCA---GAGUACUAGC-AC-GCUUAAAACUCAAAGGACUUGGCGGUGCUUUAUCCU-CUAGA--GA-CC-U--GCUAUAUCGUAAACC-CCGA-AAACCUCACCAAUUC-GCUA---AUA--------------------------------------------AGUCUAUAUACCGCCAUCUUCAGCAAACCCU--AAAG-G---AAAAAA--GUA------AGCGUAAUUAUGAUA---------------------------------------------------------------CAUA-AAAACGUUAGGUCAAGGUGUAACCUAUGAAAUGG-GAAGAAAUGGGCUA-AUU---CUCU-------------------------------------AGAGAA------UAAGCACGAAAGUUAUUAUGAAAC--CAAUAACCAAAGGAGGAUUUAGCAGUAAACUAAGAAGA---UGCUUAGUUGAAUUAGGCCAUGAAGCACGCACACACCGCCCGUCACCCUCCUCAAAUAGAUUCAGUGCAU-CUAA-----------CCUA---------UUAAACGCACUAGCUACAUGAGAGGAGACAAGUCGUAACAAGGUAAGCAUACUGGAAAGUGUGCUUGGAUAAAU------ mt.Hom.sap 
-----AAUAGGUUUGGUCCUAGCCUUUCU-AUUAGCUCUUAGUAAGAUUA-CACAUGCAAGCAU--------------------------------------CCCCGUUCCAGUGAGU-UCACCCUCUAAUCACCAC---------------------------------------------------------------------------------------GAUC-AAAAGGAACAAGCAUCAGCA-CGCA-------------------AUGC----AGCUC---AAAACGCUUA-GC----CUAGCCAC-----------------------ACCCCCC-GGGAAACAGCAGUGAUUAACCUUUAGCAAUAA-ACAAAGUUUAACUAAGCUAUACUAA-------------------------------------------------------------------------------------------CCCCAGGGUUGGUCAAUUUCGUGCCAGCCACCGCGGUCACACGAUUAACCCAAGUCAAUAGAA-----UC-CGGCGUAAAGAGUGUUUUAGA----------------------------------------------------------------------UCACCCCUCCA-----AAA--ACUCACCUGAGUUAAAAACGUUGA-CA--------C-AAAAUAGACUACAAA----GC-UUUAA------------AUCUGAAACAGAAAAG----ACCCAAACUGGGAAUACCC-CA--UAUG-CUU--AGCCCUAAACCUCAACAGUU-----------------------------------A-AUAC--AAAACUGCUCGCCA---GAACACGAGC-AC-GCUUAAAACUCAAAGGACCUGGCGGUGCUUCAUCCC-CUAGA--GA-CC-U--GCUGUAUCGUAAACC-CCGA-CAACCUCACCACCUC-GCU---------------------------------------------------AGCCUAUAUACCGCCAUCUUCAGCAAACCCU-AUGAG-G---ACAAA---GUA------AGCGCAAGUACCCA----------------------------------------------------------------CGUA-AAGACGUUAGGUCAAGGUGUAGCCCAUGAGGUGG-CAAGAAAUGGGCUA-AUU---UUCU-------------------------------------AGAAAA---------CUACGAUAGCCCUUAUGAAACU-UAAGGGUCGAAGGUGGAUUUAGCAGUAAACUAAGAGGA---UGCUUAGUUGAACAGGGCCCUGAAGCGCGUACACACCGCCCGUCACCCUCCUCAAGUAUACUUCAAAGGACAUUU-----------ACUA---------AAACCCUACGCAUUUAUAU-AGAGGAGACAAGUCGUAACAUGGUAAGUGUACUGGAAAGUGCACUUGGACGAAC------ mt.Mus.mus 
-----NAAAGGUUUGGUCCUGGCCUUAUA-AUUAAUUAGAGGUAAAAUUA-CACAUGCAAACCU--------------------------------------CCAUAGACCGGUGUAA-AAUCCCUUAAUUUACUUA---------------------------------------------------------------------------------------AAAU-UUAAGGAGAGGGUAUCAGCA-CAUU-------------------AAAU----AGCUU---AAGACACCUU-GC----CUAGCCAC-----------------------ACCCCCC-GGGACUCAGCAGUGAUAAAUAUUAAGCAAUAA-ACAAAGUUUGACUAAGUUAUACCUC--------------------------------------------------------------------------------------------U-UAGGGUUGGUAAAUUUCGUGCCAGCCACCGCGGUCAUACGAUUAACCCAAACUAAUUAUC------UUCGGCGUAAAACGUGUCAACUA---------------------------------------------------------------------UAAAUAAUAA-------AAA--AUCCAACUUAUAUAAAAUUGUUAG-GA--------CCAAACUCAAUAACAAA----AU-UCUAG------------UUAU-AAACACGAAAG----ACCCAAACUGGGAAUACCC-CA--UAUG-CUU--AGCCAUAAACCUAAAUAAUU----------------------------------AA-UUAC--AAAACUAUUUGCCA---GAGAACUAGC-AU-GCUUAAAACUCAAAGGACUUGGCGGUACUUUAUCCA-CUAGA--GA-CC-U--GCUAUAUCGUAAACC-CCGC-CUACCUCACCAUCUC-GCUA---AUU--------------------------------------------AGCCUAUAUACCGCCAUCUUCAGCAAACCCU--AAAG-G---AUUAAA--GUA------AGCAAAAGAAUCA-A---------------------------------------------------------------CAUA-AAAACGUUAGGUCAAGGUGUAGCCAAUGAAAUGG-GAAGAAAUGGGCUA-AUU---UUCUU------------------------------------AGAACA---------UUACUAUACCCUUUAUGAAAC--UAAAGGACUAAGGAGGAUUUAGUAGUAAAUUAAGAAGA---AGCUUAAUUGAAUUGAGCAAUGAAGUACGCACACACCGCCCGUCACCCUCCUCAAAUUAAAUUAAACUUAACAUA-----------UUAA--------UUUCUAGACA-UCCGUUUAUGAGAGGAGAUAAGUCGUAACAAGGUAAGCAUACUGGAAAGUGUGCUUGGAAUAAU------ mt.Sac.cer 
AUUUAUAAGAAUAUGAUGUUGUUUCAGAUUAAGCGCUAAAUAAGGACAUGACGCAUACGAGUCAU-CGU-UU-AUUA-UU------GAUAAUA-AAU--AUGUGGUGUAAACGUGAGUAUUUAUUAGGAAUUAAAACUAUAG-----------AUAAGCUAAACUUAA------UAUA----UUAUAUAAAUA-AAAG-GAUA---UAUA--A-UAUUUACUAUAGUC-AAGCCAAUA-UGGUUUGGUA-GUAGGUUUAUUAGAGUUAAACCUAGCCACGAUCCAU-AAUCGUAAUGAAAGUUAGAACGAUCACGUUGACUCUGAAAUAU-AGUCAA-UAUCUU-AAGAUACAGCAGUGAGGAAUAUUGGACAAUGA-UCAAAGAUUGAUCCAGUUACUUAUUA-GGAUGAU-UAUAAA-AUAUUUUAUUUUAUUUUAUUAAUAUUUAUUAAUAAUAAUAAUA--AAUUAA-UUUAUUAUAUUAAUAUUAUUUAAU---GUCCUGACUAAAUUUGUGCCAGCAGUCGCGGUAACACAAAGAGGGCGAGCGUUAAUCAUAA-----UGGUUUAAAGGAUCCGUAGAUGAAUUAUA------------------------UAU-------------------------------UAGAGUUAAAAAA-------AGAAUUAUAAUAGUAAAGAAAUAAAUAAU-UAUA------AGACUAAUAUAUGUAAA-A--AA-UUAAAUAU-UA------CAUUGA-GGAUUAAGAG---U-GCG-AAACGGAAUAUUC-GU--UAGUUUCU--AGUAGUAAACUAUGAAUAC-AAUUAUUUAUAAU-UAUUAUAUAUAAAAAUAA---U-GAAUG-AAAGUAUUCCACCUGAAGAGUACGUUA-CA-UAAUGAAACUCAAAACAAUAGACGGUUACAGAUUAA-GCAGU--GA-CA-U--GAUUUUUCGUAAUCC-ACGA-UAACUUUACCAUAUU-G----A-AUAU----UAUAAU---AUAUU-AU---------UU---AUA-UUA-AGGCGUUACAUUGUUGUCUUUAGUUCGUGCU--CAAGUU--AGAUUAAA-UGUGCAACGAGCAAAACUCCAUAUAUAAUU--UUAU--------------------AUUA--UUUAU----UA--A-UAUAAAG--AAAGGAAUUAAGACAAAUCAUAAUGAUCCUUAUAAUAUGGGUAAUAGACGUGCUA-A-AUAAAAUGAUAAUAAAUAAUUUAAUU--U-AAUAUAAC----UUUUAAUAUAUUUUUUUUAUAUGAAUUAUAAUCUGAAAUUCGAUUAUAUGAAAAAAGAAUUGCUAGUAAUCGUAAAUGU---UGUUACGGUGAAUAUUC-UAACUGUUUCGCACUAAUCACUCAUCAGGCGUUGAAACAUAUUAUUAUCUUAUUAUUAUAUAAAUUU-UAA-UAUAUUUAUUUAUAUCAGAAAUAAUAUGAAUUAAUGCGAAGUUG-AAUACAGUUACCGUAGGGGAACCUGCGGUGGGCUUAUAAAUAUC mt.Zea.may 
CAAAAUCUGAGUUUGAUCCUGGCUCAGAAGGAACGCUAGCUAUAUGCUUAACACAUGCAAGUCGA-CGU-UGUUUUC-GGGGUUGAGUUGAGA-ACA--AAGUGGCGAACGGGUGCGUAACGCGUGGGAAUCUGCCGAACAG-----------UCGGGCCAAUCCUGAA-----GAAA----GCU-CAAAGC----------------------------CUGUUUGAUGAGCCUGCGUAGUAUUGGUA-GUUGGUCAGGUAAAGGCUGACCAAGCCAUGAUGCUU-AGCUGGUCUUUUCGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCGGACUCCCC-G-GGGGCAGCAGUGGGGAAUCUUGGACAAUGG-GCAAAGCCCGAUCCAGCAAUAUCGC--GAGUGAA-GAAGGCAAUGGCUUGUAAAGCUCUUUCGUCGAGUGCGCGAUC-----------------------AUGACAGGACUCGAGGAA-AAGCCCCGGCUAACUCCGUGCCAGCAGCCGCGGUAAGACGGGGGGGGCAAGUGUUCUUCGGAAUGACUGGGCGUAAAGGGCACGUAGGCGGUGAAUCGGGUUGAAAGUGA-AAGUCGCCAAAAAGGCGGA---A--UGCUU-------CGA-AC-UUGAGUGAGCAGA-------GGAAUUUCGUGUGUAGGGAAAUCCAUCUA-CGAA------GGAACGCCAAAAGCAAG-A--UC-UCUGGGUCCCU------CGCUGG-GUGCGAUGGG---GAGCG-AACAGGAAUACCC-UG--UAGU-CCA--UGCCGUAAACGAUGAGUGUUCGCCCUUGGUCUGUACGGCGGAUCAGGGCCCA---C-UAGCGUGAAACACUCCGCCUGGGGAGUACGGUC-CA-GACCGAAACUCAAAGGAAUUGACGGGGGCCUGACAA-GCGGU--GA-CA-U--GGUUUUUCGUACAAC-GCGC-AAACCUUACCAGCCC-GACA-A-UGAACAAAACCUGU---UAACG-GGA--------ACU--UUC-AUA-AGGUGCUGCAUGGCUGUCGUCAGCUCGUGUC--UGGAUG--UGGUCAAGUCCUAUAACGAGCGAAACCCUCGUUUUGUGUUGCUGAGAAG----UGCCGCACUCACGAGGGACUGCCAGU-GA--ACUGGAGGAAGGUGGGGAUGACGUCAAGUCCGCAUGGCCCUUAUGGGCUGGGCCACACACGUGCUA-AAUGGCAAUGACAAUGGGAAGCAAGGCU--AAGGCGGAGC----UCCGGAAAGAUUGCCUCAGUUCGGAUUGUUCUCUGCAACUCGGGAACAUGAAGUAGAAAUCGCUAGUAAUCGCGGAUGC---UGCCGCGGUGAAUAUGUACCCGGGCCCUGUACACACCGCCCGUCACACCCUGGGAAUUGGUUUCGCCCGAAGCUCGGACCAUGAUCACUAUAUUGGCGCAUACCACGGUGGGGUCUUCGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGUGGCUGGAUUGAAUCC--- mt.Gly.max 
CAAAAUAAGAGUUUGAUCCUGGCUCAGAAGGAACGCUAGCUAUAUGCUUAACACAUGCAAGUCGA-CGU-UGUUUUC-GGGGUGC-GUUGAAA-ACA--AAGUGGCGAACGGGUGCGUAAUGCGUGGGAAUCUGCCGAACAG-----------UCGGGCCAAUCCUGAA-----GAAA----GCU-AAAAGC----------------------------CUGUUUGAUGAGCCUGCGUAGUAUUGGUA-GUUGGUCAGGUAAAGGCUGACCAAGCCAUGAUGCUU-AGCUGGUCUUUUCGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCGGACUCCCC-G-GGGGCAGCAGUGGGGAAUCUUGGACAAUGG-GCAAAGCCCGAUCCAGCAAUAUCGC--GAGUGAA-GAAGGCAAUGGCUUGUAAAGCUCUUUCGUCGAGUGCGCGAUC-----------------------AUGACAGGACUCGAGGAA-AAGCCCCGGCUAACUCCGUGCCAGCAGCCGCGGUAAGACGGGGGGGGCAAGUGUUCUUCGGAAUGACUGCGCGUAAAGGGCACGUAGGCGGUGAAUCGGGUUGAAAGUGA-AAGUCGCCAAAAACGUGGA---A--UGCUU-------CGA-AC-UUGAGUGAGCAGA-------GGAAUUUCGUGUGUAGGGAAAUCCAUCUA-CGAA------GGAACGCCAAAAGCAAG-A--UC-UCUGGGUCCCU------CGCUGG-GUGCGAUGGG---GAGCG-AACGGGAAUACCC-UG--UAGU-CCA--UGCCGUAAACGAUGAGUGUUCGCCCUUGGUCU--ACG--GGAUCAGGGCCCA---C-UAGCGUGAAACACUCCGCCUGGGGAGUACGGUC-CA-GACCGAAACUCAAAGGAAUUGACGGGGGCCUGACAA-GCGGU--GA-CA-U--GGUUUUUCGUACAAC-GCGC-AAACCUUACCAGCCC-GACA-A-UGAACAAAACCAGU---UAACG-GGA--------ACU--UUC-AUA-AGGUGCUGCAUGGCUGUCGUCAGCUCGUGUC--UGGAUG--UGGUCAAGUCCUAUAACGAGCGAAACCCUCGUUUUGUGUUGCUGAGAAGAAU-UGCCGCACUCACGAGGGACUGCCAGU-UA--ACUGGAGGAAGGUGGGGAUGACGUCAAGUCCGCAUGGCCCUUAUGGGCUGGGCCACACACGUGCUA-AAUGGCAAUUACAAUGGGAAGCAAGGCU--AAGGCGGAGC----UCCGGAAAGAUUGCCUCAGUUCGGAUUGUUCUCUGCAACUCGGGAACAUGAAGUUGGAAUCGCUAGUAAUCGCGGAUGC---UGCCGCGGUGAAUAUGUACCCGGGCCCUGUACACACCGCCCGUCACACCCUGGGAAUUGGUUUCGCCCGAAGCUCGGACCAUGAUCACUAUAUUGGCGCAUACCACGGUGGGGUCUUCGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGUGGCUGGAUGGAAUCC--- mt.Tet.pyr 
AAUUUAUUGAGUGUGGUCAUGGCUCGGGAUUAACGCUAAUUAGACGCCUAACACAUGCGAGUU---UA--UAUAUUA-GAU-----UC---UA-AAU--GUAUAGCGUCAAGGUGAGUAAAUACA--UAAUAUGCCUUUAAGAAAUAAAGUUAAUUUA-AAAAUACAAA--UGCAUAAA---UUAUAAACUAA----G-AUUA--AU-UAA-G-UAAUUAUUUCUAAAAUGUUUAAU--AAAUAAA-U--AUUUAUUUUU---AGUAAAAUA-AGCCUGAUGUUU--AGUUGAAUU-UAGAGUUGUUCAACCACAUACGGGUUGAAAAC-UACCCUA-UCUAU----UGAC-AGCAGUGAGGA-UUUUGG-CAAUAU-GCAAAACA--A-CCAGCGAACUAAU--AAACGAAGAAGUAGCAA-UUGU-A-AAGUUUAAUUAACAAAUG---------------------------------AGUA-UAU-CUAGA-GAAGCUCUGGCUAAACAUGUACCAGCAGCCGCGGUAAUACAUGAGGAGCGAGCGUUACUCAUAUUGACUGAGUGUAUUAAAUACGUAGGAUGUUAUAAUUACUACUUUAAA-AAUGUAAU-AAAAA----UAAAU-C-UGU-------GUUA--U-AAUAAUUAUAUUA-------AGA-CUUAUAGAACAACGAAAUGUACACU-AUAU------AGCUAGCCAUAAACAAG-A--UC-UCUAAUUA-UA------AUCUGA-GUAUGACGGG---UAUCG-AUGAGGAUCUCCC-UA--UAGU-CCG--UACUGUAAAAGAUGAAUACUUUAUGAA---------C-----UUCAGAAU-A---C-UAGCA--AAGUAUUCUGCUUGGGGAGUAUUAUC-CA-GAUUAAAACUUAACUGAAUUGGCGGGAAUUUGUCGA-ACGGU--GA-CA-U--GGUUUUGCGUAAUCC-ACGC-AAAUCUUACCAACGU-UAG--C-UUUAUUAAUAUGGU----CACUA-UA--------AG---UAA-AGU-CGGAAUUGCAUGGCUGUCGUCAGUUCGUGCU--UGAGUU--GGAUUAAGUUCUUUAACGAAUGCAACCCUAUAAAUAAGUCUUUAAUAG-----UUUAUUAUUCUUGUUUAAUCUAUAUCAAUAUGAAUUAGGGGUUUUAGGCUGAAGUCAAGUCCCUAUGGUCUUUAUACGUUGGGCUACACACGUGUUA-AA-GGUAAAAACAAAGAGACGCAAUAGA--AAUCUGGAGC----CUC-AAAAAUUACCU-CAGUUCAGAUUGUCUCAUGAAAUUCUGAGGCAUGAAGAUGAAAUCGUUAGUAAUUGUAAAUAA---UGUUACAGUGAAAUAUUAGUCAAAUUUUGCACACAC-GCCCAUCACGCUCGGAAAGUCAAUAUUAGCGGAAG---AUUUGAAAUCUUUA-GGAAAGUAGUAUCUAAUCUAAUAUUGGUAAUCUGAGUGAAGUUGACACAAGGUACUGGUAGGGGAACCUGUUGGUGGAAUAUAUUAUUA al.Aci.cry 
-----mmmmmmmmmmmmmmmmmmmmmmAGCGAACGCUGGCGGCAUGCUUAACACAUGCAAGUCGCACGGGCAGG--------GCAA--------CCUGUCAGUGGCGGACGGGUGAGUAACGCGUAGGAAUCUAUCCUUGGGUGGGGGACAACCGUGGGAAACUACGGCUAAUACCGCAUGAUCCCUGAGGGGCAAAGGC-------GAAA-----GUCGCCUGAGGAGGAGCCUGCGUCUGAUUAGGUAGUUGGUGGGGUAAAGGCCUACCAAGCCUGCGAUCAGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGAAAGCCUGAUCCAGCAAUGCCGCGUGGAUGAAGAAGGUCUUCGGAUUGUAAAGUCCUUUUGGCGGGGACGA------------UGAU-------------GACGGUACCCGCAGAAUAAGCUCCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAAGGGGGCUAGCGUUGCUCGGAAUGACUGGGCGUAAAGGGCGCGUAGGCGGACGGCACAGUCAGGCGUGAAAUUCCUGGGCUCAACCUGGGGACUGCGUCUGAGACGUGUUGUCUUGAGUAUGGAAGAGGGUUGUGGAAUUUCCAGUGUAGAGGUGAAAUUCGUAGAUAUUGGAAAGAACACCGGUGGCGAAGGCGGCAACCUGGUCCAUUACUGACGCUGAGGCGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCUGUAAACGAUGUGUGCUGGAUGUUGGGGUGUUA-GCACUUCAGUGUCGUAGCUAACGCGGUAAGCACACCGCCUGGGGAGUACGGCCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGCAGAACCUUACCAGGAUUUGACAUGGGAGUACCGGUCCAGAGAUGGACUUUCCC-GCAA-GGGGUCCCGCACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGCCUUCAGUUGCCAGCAUUUUGGGUGGGCACUCUGAAGGAACUGCCGGUGACAAGCCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUGUCCUGGGCUACACACGUGCUACAAUGGCGGUGACAGUGGGAAGCCAGGUGGUGACACCGAGCUGAUCUCAAAAA-GCCGUCUCAGUUCGGAUUGCACUCUGCAACUCGAGUGCAUGAAGGUGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAUUUGGUUUGACCUUAAGUUGGUGCGCUAACCGCAAGGAGGCAGCCAACCACGGUCGGGUCAGAGACUGGGGUGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- al.Met.rho 
-----mmmmmmmmmmmmmmmmmmmmmmAGCGAACGCUGUCGGCAGGCUUAUCACAUGUAAGUCGAGAGGGCACC--------UUCG--------GGUGUCAGCGGCAGACGGGUGAGUAACACGUGGGAACGUGCCCUUCGGUUCGGAAUAACUCAGGGAAACUUGAGCUAAUACCGGAUACGCCCUUAUGGGGAAAGG--------UUUA------CCGCCGAAGGAUCGGCCCGCGUCUGAUUAGCUUGUUGGUGGGGUAACGGCCUACCAAGGCGACGAUCAGUAGCUGGUCUGAGAGGAUCAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGCAAGCCUGAUCCAGCCAUGCCGCGUGAGUGAUGAAGGCCUUAGGGUUGUAAAGCUCUUUUGUCCGGGACGA------------UAAU-------------GACGGUACCGGAAGAAUAAGCCCCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAAGGGGGCUAGCGUUGCUCGGAAUCACUGGGCGUAAAGGGCGCGUAGGCGGCUUUUUAAGUCGGGGGUGAAAGCCUGUGGCUCAACCACAGAAUUGCCUUCGAUACUGGAAAGCUUGAGACCGGAAGAGGACAGCGGAACUGCGAGUGUAGAGGUGAAAUUCGUAGAUAUUCGCAAGAACACCAGUGGCGAAGGCGGCUGUCUGGUCCGGUUCUGACGCUGAGGCGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAAUGCCAGCCGUUGGCCUGCUUG-CAGGUCAGUGGCGCCGCUAACGCAUUAAGCAUUCCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGCAGAACCUUACCAUCCCUUGACAUGGCGUG-UACCCAGGGAGACUUGGGAUCCUCUUCGGAGGCGCGCACACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCACGUCCUUAGUUGCCAUCAUUCAG-UUGGGCACUCUAGGGAGACUGCCGGUGAUAAGCCGGAGGAAGGUGUGGAUGACGUCAAGUCCUCAUGGGCCUUACGGGAUGGGCUACACACGUGCUACAAUGGCGGUGACAGUGGGACGCGAAGGAGCGAUCUGGAGCAAAUCCCCAAAA-GCCGUCUCAGUUCGGAUUGCACUCUGCAACUCGGGUGCAUGAAGGCGGAAUCGCUAGUAAUCGUGGAUCAGCAUGCCACGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUUGGUCUUACCCGACGGCGCUGCGCCAACCGCAAGGAGGCAGGCGACCACGGUAGGGUCAGCGACUGGGGUGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- al.Rhi.fre 
-AACUUGAGAGUUUGAUCCUGGCUCAGAACGAACGCUGGCGGCAGGCUUAACACAUGCAAGUCGAGCGCCCC----------GCAA----------GNNNAGCGGCAGACGGGUGAGUAACGCGUGGGAAUCUACCCUUUUCUACGGAAUAACGCAGGGAAACUUGUGCUAAUACCGUAUGAGCCCUUCGGGGGAAAGA--------UUUA------UCGGGAAAGGAUGAGCCCGCGUUGGAUUAGCUAGUUGGUGGGGUAAAGGCCUACCAAGGCGACGAUCCAUAGCUGGUCUGAGAGGAUGAUCAGCCACAUUGGGACUGAGACACGGCCCAAACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGCAAGCCUGAUCCAGCCAUGCCGCGUGAGUGAUGAAGGCCCUAGGGUUGUAAAGCUCUUUCACCGGUGAAGA------------UAAU-------------GACGGUAACCGGAGAAGAAGCCCCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAAGGGGGCUAGCGUUGUUCGGAAUUACUGGGCGUAAAGCGCACGUAGGCGGACAUUUAAGUCAGGGGUGAAAUCCCGGGGCUCAACCCCGGAACUGCCUUUGAUACUGGGUGUCUAGAGUCCGGAAGAGGUGAGUGGAAUUCCGAGUGUAGAGGUGAAAUUCGUAGAUAUUCGGAGGAACACCAGUGGCGAAGGCGGCUCACUGGUCCGGUACUGACGCUGAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAAUGUUAGCCGUCGGGCAGUUUA-CUGUUCGGUGGCGCACGUAACGCAUUAAACAUUCCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGCAGAACCUUACCAGCCCUUGACAUCCCGAUCGGAUACGAGAGAUCGUAUCCUCAGUUCGCUGGAUCGGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGCCCUUAGUUGCCAGCAUUUAG-UUGGGCACUCUAAGGGGACUGCCGGUGAUAAGCCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUACGGGCUGGGCUACACACGUGCUACAAUGGUGGUGACAGUGGGCAGCGAGACCGCGAGGUCGAGCUAAUCUCCAAAA-GCCAUCUCAGUUCGGAUUGCACUCUGCAACUCGAGUGCAUGAAGUUGGAAUCGCUAGUAAUCGCAGAUCAGCAUGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUUGGUUCUACCCGAAGGUAGUGCGCUAACCGCAAGGAGGCAGCUAACCACGGUAGGGUCAGCGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCGGCUGGAUCACCUCC--- al.Ric.pea 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmGCUUAACACAUGCAAGUCGAACGGACUAAUU---GGGCUUG-CUCCAA-UUAGUUAGUGGCAGACGGGUGAGUAACACGUGGGAAUCUACCCAUCAGUACGGAAUAACUUUUAGAAAUAAAAGCUAAUACCGUAUAUUCUCUGCGGAGGAAAGA--------UUUA------UCGCUGAUGGAUGAGCCCGCGUCAGAUUAGGUAGUUGGUGAGGUAAUGGCUCACCAAGCCGACGAUCUGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGAAAGCCUGAUCCAGCAAUACCGAGUGAGUGAUGAAGGCCUUAGGGUUGUAAAGCUCUUUUAGCAAGGAAGA------------UAAU-------------GACGUUACUUGCAGAAAAAGCCCCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGGGCUAGCGUUGUUCGGAAUUACUGGGCGUAAAGAGUGCGUAGGCGGUUUCGUAAGUUGGAAGUGAAAGCCCGGGGCUUAACCUCGGAAUUGCUUUCAAAACUACUAAUCUAGAGUGUAGUAGGGGAUGAUGGAAUUCCUAGUGUAGAGGUGAAAUUCUUAGAUAUUAGGAGGAACACCGGUGGCGAAGGCGGUCAUCUGGGCUACAACUGACGCUGAUGCACGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAGUGCUAGAUAUCGGAAGAUUC--UCUUUCGGUUUCGCAGCUAACGCAUUAAGCACUCCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGGCUCGCACAAGCGGUGGAGCAUGCGGUUUAAUUCGAUGUUACGCGAAAAACCUUACCAACCCUUGACAUGGUGGUCGGAUCGCAGAGAUGCUUUUCUCAGCUCGCUGGACCACACACAGGUGUUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCAUUCUUAUUUGCCAGCGGUAAUGCCGGGAACUAUAAGAAAACUGCCGGUGAUAAGCCGGAGGAAGGUGGGGACGACGUCAAGUCAUCAUGGCCCUUACGGGUUGGGCUACACGCGUGCUACAAUGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm-mmmm-mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- al.Sph.ter 
-----mmmmAGUUUGAUCCUGGCUCAGAACGAACGCUGGCGGCAUGCCUAACACAUGCAAGUCGAACGAGAUC---------UUCG---------GAUCUAGUGGCGCACGGGUGCGUAACGCGUGGGAAUCUGCCCUUGGGUUCGGAAUAACUCAGAGAAAUUUGUGCUAAUACCGUAUAAUGUCUUCGGACCAAAGA--------UUUA------UCGCCCAAGGAUGAGCCCGCGUAGGAUUAGCUAGUUGGUGGGGUAAAGGCUCACCAAGGCGACGAUCCAUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGGACAAUGGGCGAAAGCCUGAUCCAGCAAUGCCGCGUGAGUGAUGAAGGCCCUAGGGUUGUAAAGCUCUUUUACCCGGGAUGA------------UAAU-------------GACAGUACCGGGAGAAUAAGCUCCGGCUAACUUCGUGCCAGCAGCCGCGGUAAUACGAGGGGAGCUAGCGUUGUUCGGAAUUACUGGGCGUAAAGCGCGCGUAGGCGGUUUUUUAAGUCAGAGGUGAAAGCCCGGGGCUCAACCCCGGAAUAGCCUUUGAAACUGGAAAGCUAGAAUCUUGGAGAGGUCAGUGGAAUUCCGAGUGUAGAGGUGAAAUUCGUAGAUAUUCGGAAGAACACCAGUUGCGAAGGCGGCUGACUGGACUGGUAUUGACGCUGAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGAUAACUAGCUGUCCGGGCU-AUAGAGCUUGGGUGGCGCACGUAACGCAUUAAGUUAUCCGCCUGGGGAGUACGGCCGCAAGGUUAAAACUCAAAGGAAUUGACGGGGGCCUGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGCAGAACCUUACCAGCGUUUGACAUCCUGAGCGGUUACCAGAGAUGGUUUCCUUAGUUCGCUGGAUAGGUGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCAUCCCUAGUUGCCAUCAUUAAG-UUGGGCACUCUAAGGAAACUGCCGGUGAUAAGCCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUACGCGCUGGGCUACACACGUGCUACAAUGGCGGUGACAGUGGGCAGCAACCUCGCGAGAGGUAGCUAAUCUCCAAAA-GCCGUCUCAGUUCGGAUUGUUCUCUGCAACUCGAGAGCAUGAAGGCGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCAGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUUGGUUUCACCCGAAGGCCCAGAGCCAACCGCACGGAGGAAAGG------UGAAUGCCUAGUUA-UGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- be.Hyd.pal 
-----mmmmmmmmmmmmmmmmmmmmmmAUUGAACGCUGGCGGCAUGCUUUACACAUGCAAGUCGAACGGUAACAGGCC----GCAA----GGUGCUGACGAGUGGCGAACGGGUGAGUAAUGCAUCGGAACGUGCCCAGUCGUGGGGGAUAACGCAGCGAAAGCUGCGCUAAUACCGCAUACGAUCUAUGGAUGAAAGCGGGGGACCGUAAGGCCUCGCGCGAUUGGAGCGGCCGAUGUCAGAUUAGGUAGUUGGUGGGGUAAAGGCUCACCAAGCCAACGAUCUGUAGCUGGUCUGAGAGGACGACCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUGGACAAUGGGCGCMAGCCUGAUCCAGCAAUGCCGCGUGCAGGAAGAAGGCCUUCGGGUUGUAAACUGCUUUUGUACGGAACGAAAAGGCUCUGGUUAAUACCUGGGGCACAUGACGGUACCGUAAGAAUAAGCACCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGUGCGCAGGCGGUUUUGUAAGACAGGCGUGAAAUCCCCGGGCUCAACCUGGGAAUUGCGCUUGUGACUGCAAGGCUGGAGUGCGGCAGAGGGGGAUGGAAUUCCGCGUGUAGCAGUGAAAUGCGUAGAUAUGCGGAGGAACACCGAUGGCGAAGGCAAUCCCCUGGGCCUGCACUGACGCUCAUGCACGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGUCAACUGGUUGUUGGGUCUCUUC-UGACUCAGUAACGAAGCUAACGCGUGAAGUUGACCGCCUGGGGAGUACGGCCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGUGGAUGAUGUGGUUUAAUUCGAUGCAACGCGAAAAACCUUACCCACCUUUGACAUGUACGGAAUUUGCCAGAGAUGGCUUAGUGCUGAAAAGAGCCGUAACACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCAUUAGUUGCUAC---GAAA---GGGCACUCUAAUGAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUAGGUGGGGCUACACACGUCAUACAAUGGCCGGUACAAAGGGUCGCAAACCCGCGAGGGGGAGCUAAUCCAUCAAAGCCGGUCGUAGUCCGGAUCGCAGUCUGCAACUCGACUGCGUGAAGUCGGAAUCGCUAGUAAUCGUGGAUCAGCAUGUCACGGUGAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCAUGGGAGCGGGUCUCGCCAGAAGUAGUUAGCCUAACCGCAAGGAGGGCGAUUACCACGGCGGGGUUCGUGACUGGGGUGGAGUCGUAACAAGGUAGCCGUAUCGGAAGGmmmmmmmmmmmmmmmm----- be.Nei.fla 
-AACAUAAGAGUUUGAUCCUGGCUCAGAUUGAACGCUGGCGGCAUGCUUUACACAUGCAAGUCGAACGGCAGCACAGAGAAGCUUGCUUCUUGGGUGGCGAGUGGCGAACGGGUGAGUAAUAUAUCGGAACGUACCGAGUAAUGGGGGAUAACUAAUCGAAAGAUUAGCUAAUACCGCAUAUUCUCUGAGGAGGAAAGCAGGGGACCUUCGGGCCNUGCGUUAUUCGAGCGGCCGAUAUCUGAUUAGUUAGUUGGUGGGGUAAAGGCCUACCAAGGCGACGAUCAGUAGCGGGUCUGAGAGGAUGAUCCGCCACACUGGGACUGAGACACGGCCNNGNCUCCUACGGGAGGCAGCAGUGGGGAAUUUUGGACAAUGGGCGCAAGCCUNAUCCAGCCAUGCCGCGUGUCUGAAGAAGGCCUUCGGGUUGUAAAGGACUUUUGUCAGGGAAGAAAAGGCUGUUGCUAAUAUCGACAGCUGAUGACGGUACCUGAAGAAUAAGCACCGGCUAACUACGUGCCAGCAGCCGCGNNNAUACGUAGGGUGCNAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGAGCGCAGACGGUUACUUAAGCAGGAUGUGAAAU-CCCGGGCUCAACCCGGNNACUGCGUUCUGAACUGGGUAACUAGAGUGUGUCAGAGGGAGGUAGAAUUCCACGUGUAGCAGUGAAAUGCGUAGAGAUGUGGAGGAAUACCGAUGGCGAAGGCAGCCUCCUGGGAUAACACUGACGUUCAUGCUCGAAAGCGUGGGUAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGUCAAUUAGCUGUUGGGCAACUUGAUUGCUUAGUAGCGUAGCUAACGCGUGAAAUUGACCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGUGGAUGAUGUGGAUUAAUUCGAUGNAACGCGAAGAACCUUACCUGGUCUUGACAUGUACGGAAUCCUCCGGAGACGGAGGNGUGCCUUCGGGAGCCGUAACACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCRACGAGCGCAACCCUUGUCAUUAGUUGCCAUCAUUUAG-UUGGGCACUCUAAUGAGACUGCCGGUGACAAGCCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUGACCAGGGCUUCACACGUCAUACAAUGGUCGGUACAGAGGGUAGCCAAGCCGCGAGGUGGAGCCAAUCUCACAAAACCGAUCGUAGUCCGGAUUGCACUCUGCAACUCGAGUGCAUGAAGUCGGAAUCGCUAGUAAUCGCAGGUCAGCAUACUGCGGUGAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGGGAUACCAGAAGUAGGUAGGGUAACCGCAAGGAGUCCGCUUACCACGGUAUGCUUCAUGACUGGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- be.Pse.syz 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmAACGCUGGCGGCAUGCCUUACACAUGCAAGUCGAACGGCAGCGGGGGUAG-CUUG--CUACCUCCGGCGAGUGGCGAACGGGUGAGUAAUACAUCGGAACGUGCCCUGUAGUGGGGGAUAACUA-UCGAAAGACUGGCUAAUACCGCAUACGACCUGAGGGUGAAAGUGGGGC-CCGCAAGGCCUCAUGCUAUAGGAGCGGCCGAUGUCUGAUUAGCUAGUUGGUGGGGUAAAGGCCCACCAAGGCGACGAUCAGUAGCUGGUCUGAGAGGACGAUCAGCCACACUGGGACUGAGAGACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUGGACAAUGGGGGCAACCCUGAUCCAGCAAUGCCGCGUGUGUGAAGAAGGCCUUCGGGUUGUAAAGCACUUUUGUCCGGAAAGAAAUCGCUUCGGUUAAUACCUGGAGUGGAUGACGGUACCGGAAGAAUAAGGACCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUCCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGUGCGCAGGCGGUUGUGCAAGACCGAUGUGAAAUCCCCGGGCUUAACCUGGGAAUUGCAUUGGUGACUGCACGGCUAGAGUGUGUCAGAGGGGGGUAGAAUUCCACGUGUAGCAGUGAAAUGCGUAGAGAUGUGGAGGAAUACCGAUGGCGAAGGCAGCCCCCUGGGAUAACACUGACGCUCAUGCACGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGUCAACUAGUUGUUGGGGA-UUCAUUUCCUUAGUAACGUAGCUAACGCGUGAAGUUGACCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGUGGAUGAUGUGGAUUAAUUCGAUGCAACGCGAAAAACCUUACCUACCCUUGACAUGCCACUAACGAAGCAGAGAUGCAUUAGUG-UGAAAAGAAAGUGGACACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCUCUAGUUGCUA-C--GAAA---GGGCACUCUAGAGAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUGGGUAGGGCUUCACACGUCAUACAAUGGUGCAUACAGAGGGUUGCCAAGCCGCGAGGUGGAGCUAAUCCCAGAAAAUGCAUCGUAGUCCGGAUCGUAGUCUGCAACUCGACUACGUGAAGCUGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGCUUUACCAGAAGUAGUUAGCCUAACCGCAAGGAGGGCGAUUACCAC-GUAGGGUUCAUGACUGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- be.Rho.pur 
GAACUGAAGAGUUUGAUCCUGGCUCAGAUUGAACGCUGGCGGCAUGCCUUACACAUGCAAGUCGAACGGUAACGGGNCC---UUCG---GGCGCCGAACGAGUGGCGAACGGGUGAGUAAUGCAUCGGAACAUGCCCUGAAGUGGGGGAUAACGUAGCGAAAGUUACGCUAAUACCGCAUAUUCUGUGAGCAGGAAAGCAGGGGACCUUCGGGCCUUGCGCUUUGGGAGUGGCCGAUGUCGGAUUAGCUAGUUGGUGGGGUAAAAGCCUACCAAGGCAACGAUCCGUAGCGGGUCUGAGAGGAUGAUCCGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUGGACAAUGGGCGAAAGCCUGAUCCAGCCAUGCCGCGUGAGUGAAGAAGGCCUUCGGGUUGUAAAGCUCUUUCGGCGGGGAAGAAAUCGGGUUUCCUAAUACGGAACCCGGAUGACGGUACCCGAAGAAGAAGCACCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUGCNAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGUGCGCAGGCGGUUGUGUAAGACAGACGUGAAAUCCCCGGGCUCAACCUGGGAACUGCGUUUGUGACUGCACAGCUAGAGUACGGCAGAGGGGGGUGGAAUUCCACGUGUAGCAGUGAAAUGCGUAGAGAUGUGGAGGAACACCGAUGGCGAAGGCAGCCCCCUGGGCCAAUACUGACGCUCAUGCACGNAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGUCAACUAGGUGUUGGUGGGGUUAACCCAUUAGUGCCGUAGCUAACGCGUGAAGUUGACCGCCUGGGGAGUACGGCGGCAAGGUUAAAACUCAAAGGAAUUGACGGGGANCCGCACAAGCGGUGGAUGAUGUGGAUUAAUUCGAUGCAACGCGAAAAACCUUACCUACCCUUGACAUGUCAGGAAUCCUGAGGAGACUCGGGAGUGCCGAAAGGNACCUGAACACAGGUGCUGCAUGGCNGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCAUUAAUUGCCAUCAUUCAG-UUGGGCACUUUAAUGAAACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUGGGUAGGGCUUCACACGUCAUACAAUGGUCGGUCCAUAGGGUUGCNAACCCGCGAGGGGGAGCUAAUCCCAGAAAGCCGAUCGUAGUCCGGAUUGCAGUCUGCAACUCGACUGCAUGAAGUCGGAAUCGCUAGUAAUCGCGGAUCAGCAUGUCGCGGUGAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCAUGGGAGCGGGUUCUGCCAGAAGUAGUUAGCCUAACCGCAAGGAGGGCGAUUACCACGGCAGCGUUCGUmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- be.Tha.ter 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmAACGCUGGCGGCAUGCUUUACACAUGCAAGUCGAACGGCAGCACGGGC---UUCG---GCCUGGUGGCGAGUGGCGAACGGGUGAGUAAUGCAUCGGAACGUGCCCAUUUGUGGGGGAUAACUACGCGAAAGCGUGGCUAAUACCGCAUACGCCCUGAGGGGGAAAGCGGGGGAUUUUCGAACCUCGCGCAAUUGGAGCGGCCGAUGUCAGAUUAGCUUGUUGGUGAGGUAAAGGCUCACCAAGGCGACGAUCUGUAGCGGGUCUGAGAGGAUGAUCCGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUGGACAAUGGGGGCAACCCUGAUCCAGCCAUGCCGCGUGAGUGAAGAAGGCCUUCGGGUUGUAAAGCUCUUUCAGCCGGGAAGAAAACGCACUCUCUAACAUAGGGUGUGGAUGACGGUACCGGAAGAAGAAGCACCGGCUAACUACGUGCCAGCAGCCGCGGUAAUACGUAGGGUGCGAGCGUUAAUCGGAAUUACUGGGCGUAAAGGGUGCGCAGGCGGUUUUGUAAGACAGAUGUGAAAUCCCCGGGCUUAACCUGGGAACUGCGUUUGUGACUGCAAGGCUAGAGUACGGCAGAGGGGGGUGGAAUUCCGCGUGUAGCAGUGAAAUGCGUAGAUAUGCGGAGGAACACCGAUGGCGAAGGCAACCCCCUGGGCCUGUACUGACGCUCAUGCACGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGUCAACUAGUCGUUCGGAGCGCAAUGCACUGAGUGACGCAGCUAACGCGUGAAGUUGACCGCCUGGGGAGUACGGCCGCAAGGUUAAAACUCAAAGGAAUUGACGGGGACCCGCACAAGCGGUGGAUGAUGUGGAUUAAUUCGAUGCAACGCGAAAAACCUUACCUACCCUUGACAUGUCUGGAACCUUGGUGAGAGCCGAGGGUGCCUUCGGGAGCCAGAACACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCAUUAGUUGCCAUCAUUUAG-UUGGGCACUCUAAUGAGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUGGGUAGGGCUUCACACGUCAUACAAUGGUCGGUACAGAGGGUUGCCAAGCCGCGAGGUGGAGCCAAUCCCACAAAGCCGAUCGUAGUCCGGAUCGUAGUCUGCAACUCGACUACGUGAAGUCGGAAUCGCUAGUAAUCGCAGAUCAGCAUGCUGCGGUGAAUACGUUCCCGGGUCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGUUUCACCAGAAGUAGGUAGCUUAACCUUCGGGAGGGCGCUUACCACGGUGAGAUUCAUGACUGGGGUGAAGUCGUAACAAGGUAGCCGUAUCGGAAmmmmmmmmmmmmmmmmmm----- de.Des.vib 
-----mmmmmmmmmmmmmmUGGCUCAGAAUGAACGCUGGCGGCGUGCUUAACACAUGCAAGUCGAACGAGAAAGGGAUUG--CUUG-CAAUCCUGAGUAGAGUGGCGCACGGGUGAGUAACACGUAGAAAUCUGUCUCCAAGCCUGGGAUAACUAUUCGAAAGGGUAGCUAAUACCGGAUAAAGUUAUAAAAUGAAAGAUUGCCUCUCNUGAAGCAAUUGUUUGGGGGUGAGUUUGCGUACCAUUAGCUUGUUGGUGGGGUAAAGGCCUACCAAGGCAACGAUGGUUAGCUGGUCUGAGAGGAUGAUCAGUCACACUGGAACUGGAACACGGUCCAGACUCCUACGGGAGGCAGCAGUGAGGAAUUUUGCGCAAUGGGGGCAACCCUGACGCAGCAACGCCGCGUGAGUGAAGAAGGCCUUUGGGUCGUAAAGCUCUGUCAACAGGGAAGAAGUUAUUAUCUAUAAUAGUGGAUACUAUUGACGGUACCUGUGGAGGAAGCGCCGGCUAACUCCGUGCCAGCAGCCGCGGUAACACGGGGGGCGCAAGCGUUAUUCGGAAUUAUUGGGCGUAAAGGGCGCGUAGGCGGUCUUGUCCGUCAGGUGUGAAAGCCCGGGGCUCAACCCCGGAAGUGCACUUGAAACAGCAAGACUUGAAUACGGGAGAGGAAAGCGGAAUUCCUGGUGUAGAGGUGAAAUUCGUAGAUAUCAGGAGGAACACCGAUGGCGAAGGCAGCUUUCUGGACCGAUAUUGACGCUGAGGCGCGAAGGCGUGGGUAGCGAACGGGAUUAGAUACCCCGGUAGUCCACGCAGUAAACGUUGUACACUCGGUGUGGCGGAUAUUAAAUCUGCUGUGCCCAAGCUAACGCAUUAAGUGUACCGCCUGGGAAGUACGGUCGCAAGACUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGACGCAACGCGAAGAACCUUACCUGGGUUUGACAUCCUGUGAAUAUUGGGUAAUUGCCAUAGUGCCUUCGGGAGCACAGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUUGGUUAAGUCCAGCAACGAGCGCAACCCUUAUCGUCAGUUGCCAGCACUAAUGGUGGGAACUCUGGCGAGACUGCCCCGGUCAACGGGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUAUCCAGGGCUACACACGUGCUACAAUGGUAGGUACAAAGGGCAGCGACUCUGCAAAAAGAAGCGAAUCCCAAAA-GCCUAUCUCAGUCCGGAUUGGGGUCUGCAACUCGACCCCAUGAAGUUGGAAUCGCUAGUAAUCGCGGAUCAGCAUGCCGCGGUGAAUAUGUUCCCGGGCCUUGUACACACCGCCCGUCACACCAUGGAAGUUGAUUAUACCCGACGUCGCUGGGCUAACUUUUAAGGGGCAGGCGCCUAAGGUAUGGUCGAUAACUGGGGUGAAGUCGUAACAAGGUAGCCGUUGGAGmmmmmmmmmmmmmmmmmmmm----- de.Des.ter 
-----mmmmmmmmmmmmmmmmmmmmmmAUUGAACGCUGGCGGCGUGCCUAACACAUGCAAGUCGUGCGUGAAAGGGC-----UUCG----GCCUGAGUAAAGCGGCGCACGGGUGAGUAACGCGUGGAGAUCUGCCCAUGAGUUGGGAAUAACGGCUGGAAACGGUCGCUAAUACCGAAUACGCUC-CAUGGGGAAAGGUGGCCUCUCUUGAAGCUACCGCUCAUGGAUGAGUCCGCGUCCCAUUAGCUUGUUGGCGGGGUAAUGGCCCACCAAGGCGACGAUGGGUAGCCGACCUGAGAGGGUGAUCGGCCACACUGGGACUGGAACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCGCAAUGGGCGAAAGCCUGACGCAGCGACGCCGCGUGAGGGAUGAAGGCCUUCGGGUCGUAAACCUCUGUCAGGAGGGAAGAACCGCCAUGGUGUAAU-CAGCCAUGGUCUGACGGUACCUCCAAAGGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUAAUCGGAAUCACUGGGCGUAAAGCGCACGUAGGCUGUUUGGUAAGUCAGGGGUGAAAUCCCGCAGCUCAACUGCGGGAUUGCCCUUGAUACUGCUGGACUUGAGUUCGGGAGAGGGUGGCGGAAUUCCAGGUGUAGGAGUGAAAUCCGUAGAUAUCUGGAGGAACAUCAGUGGCGAAGGCGGCCACCUGGACCGAUACUGACGCUGAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCUGUAAACGAUGGAUGCUAGGUGUCGGGGC-CUUG-AGCUUCGGUNCCGCAGCUAACGCGUNAAGCAUCCCGCCUGGGGAGUACGGUCGNNAGGNUGAAACUCAAAGAAAUUGACGGGNNCCCGCACAAGC-GUGGAGUAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCCAGGCUUGACAUCCGGGAAACCCUCCCGAAAAGGAGGGGUGCUUUCGAGAAUCCCGAGACAGGUGCUGCNUGGNUGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUGUUCAUAGUUGCUACCAGUAAUGCUGGGCACUCUAUGGAGNNNNCCCCGGUUAACGGGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACNNCUNNGGCUACACACGUACUACAAUGGCGCACACAAAGGGCAGCGAUACCGUGAGGUGGAGCCAAUCCCAAAAAAUGCGUCCCAGUCCGGAUUGCAGUCUGCAACUCGACUGCAUGAAGUUGGAAUCGCUAGUAAUUCGAGAUCAGCAUGCUCGGGUGAAUGCGUUCCCGGGCCUUGUACACACCGCCCGUCACACCACGAAAGUCGGUUUUACCCGAUACCGGUGAGCCAACCGCAAGGAGGCAGCCGUCUACGGUAGGGCCGAUGAUUGGGGUGAAGUCGUAACNNNGUAGCCGUAGGGGAACCUGCGGCUmmmmmmmmm----- de.Des.suc 
-----mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmGUAUGCCUAACACAUGCAAGUCGAACGCGAACGGAAUC---UUCG--GAUUCCNAGUAGAGUGGCGCACGGGUGAGUAACACGUGGAAACCUGCCCUAGAGUCUGGGAUAACACUCCGAAAGGAGUGCUAAUACCGGAUAAGACCCUCGGGGAAAAGA--------UUUA------UUGCUCUAGGAUGGGUCCGCGGUCCAUUAGCUAGUUGGUGGGGUAAUGGCCUACCAAGGCUGCGAUGGAUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUGCGCAAUGGGCGAAAGCCUGACGCAGCAAUACCGCGUGAAGGAUGAAGGCCCUUGGGUCGUAAACUUCUGUCAGAGGGGAAGAAAUACUC-----GCAA------GAGUGCUGACGGUACCCUCAAAGGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUGUUCGGAAUUAUUGGGCGUAAAGAGCAUGUAGGCGGUCUGUUAAGUCUGGUGUGAAAGCCCGGGGCUCAACCCCGGAAGUGCAUUGGAUACUGGC-GACUUGAGUAUGGGAGAGGAAAGUGGAAUUCCGAGUGUAGGAGUGAAAUCCGUAGAUAUUCGGAGGAACACCAGUGGCGAAGGCGGCUUUCUGGACCAAUACUGACGCUGAGAUGCGAAAGCGUGGGGAGCGAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGGGUACUAGGUGUUGCGGGUAACCCUCCUGCAGUGCCGCAGCUAACGCAUUAAGUACCCCGCCUGGGGAGUACGGCCGCAAGGCUAAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGACGCAACGCGAAGAACCUUACCUAGGCUUGACAUCCCGAUCACUCUAUGGAAAAAUAGAGGUCAGCUUGCUGGAUCGGUGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUGUCCUUAGUUGCCAUCAUUAAG-UUGGGCACUCUAGGGAGACUGCCGGUGUUAAACCGGAGGAAGGUGGGGAUGACGUCAAGUCCUCAUGGCCCUUAUGCCUAGGGCUACACACGUGCUACAAUGGCCGGUACAAAGGGCAGCAAUACCGCGAGGUGGAGCGAAUCCCAAAAAGCCGGUCUCAGUUCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUUGGAAUCGCUAGUAAUCGCGUAUCAGCAUGACGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCACGGGAGUCUAUUGUACCGGAAACCGGUGGGCUAACCUUCGGGGAGCAGCCGUUUAUGGUAUGAUCGGUAACUGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ep.Alv.pom 
-----mmAGAGUUUGAUCAUGGCUCAGAGUGAACGCUGGCGGCGUGCUUAACACAUGCAAGUCGAACGGUAACAGGUC----UUCG---GAGUGCUGACGAGUGGCGAACGGGUGAGUAAUGUAUAGUAAUUUUCCCCUUGGAGAGGGAUAGCCACUGGAAACGGUGAUUAAUACCUCAUACUCCUAAAAAGGGAAAGG--------UUUA----UUCUGCCAAGGGAUAAGACUAUAUCCUAUCAGCUAGUUGGUAGUGUAAGGGACUACCAAGGCAAUGACGGGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGGGAAACCCUGAUGCAGCAACGCCGCGUGGAGGAAGAAGCAUUUAGGUGUGUAAACUCCUUUUAUCAAGGAAGAA-----------GA-C-------------GACGGUACUUGAUGAAUAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUACUCGGAAUCACUGGGCGUAAAGCGCAUGUAGGCGGUUGAUUAAGUUAGAAGUGAAAGCCUACGGCUCAACCGUAGAACAGCUUCUAAAACUGAUUAACUAGAGUCUGGGAGAGGAAGAUGGAAUUAGUAGUGUAGGGGUAAAAUCCGUAGAGAUUACUAGGAAUACCGAAAGCGAAGGCGAUCUUCUGGAACAGGACUGACGCUGAGAUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGAAUGUUAGUCGUUGGAGUGCUAGCCACUUCAGCGAUGCAGCUAACGCAUUAAACAUUCCGCCUGGGGAGUACGSCCGCAAGGUUAAAACUCAAAGGAAUAGACGGGGACCCGCACAAGUGGUGGAGCAUGUGGUUUAAUUCGAAGAUACGCGAAGAACCUUACCUGGCCUUGACAUAAUCAGAACCCACCAGAGAUGGUGGGGUGCCUUCGGGAGCUGAUAUACAGGUGCUGCACGGCUGUCGUCAGCUCGUGUCGUGUGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUGACUAGUUACUAACAGUUAGGCUGAGGACUCUAGUCAGACUGCCUUCG-UAAGGAGGAGGAAGGUAAGGACGACGUCAAGUCAUCAUGGCCCUUACGGCCAGGGCGACACACGUGCUACAAUGGGAAGGACAAAGAGAAGCAAAACUGCGAGGUGGAGYAAAUCU-AUAAACCUUCUCUCAGUUCGGAUUGCAGUCUGCAACUCGACUGCAUGAAGCUGGAAUCACUAGUAAUCGUGAAUCAGCCUGUCACGGUGAAUACGUUCCCGGGUCUUGUACUCACCGCCCGUCACACCAUGGGAGUUGAUUUCACCCGAAGUGGGGAUGCUA---AAUA---GGCUACCCACCACGGUGGAAUUAGCGACUGGGGUGAAGUCGUAACAAGGUAACCGUAGGAGAACCUGCGGCUGGAUCACCUCCUUA ep.Cam.fet 
-UUAUGGAGAGUUUGAUCCUGGCUCAGAGUGAACGCNGGCGGCGUGCCUAAUACAUGCAAGUCGAACGGAGUAUUAAGAGAGCUUGNU--NUUAAUACNUAGUGGCGCACGGGUGAGUAAUGUAUAGUAAUCUGCCCUACACUGGAGGACAACAGUUAGAAAUGACUGCUAAUACUCCAUACUCCUAUAACGGGAAAG---------UNUU------UCGGUGUAGGAUGAGACUAUAUUGUAUCAGCUAGUUGGUAAGGUAAUGGCUUACCAAGGCUNUGACGCAUAACUGGUCUGAGAGGAUGAUCAGUCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUAUUGCUCAAUGGGGGAAACCCUGAAGCAGCAACGCCGCGUGGAGGAUGACACUUUUCGGAGCGUAAACUCCNUUUGUUAGGGAAGAA-----------CCAU-------------GACGGUACCUAACGAAUAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCNAGCGUUACUCGGAAUCACUGGGCGUAAAGGACGCGUAGGCGGAUUAUCAAGUCUUUUGUGAAAUCUAACAGCUAAACUGUUAAACUGCUUGAGAAACUGAUAAUCUAGAGUGAGGGAGAGGCAGAUGGAAUUGGUGGUGUAGGGGUAAAAUCCGUAGAGAUCACCAGGAAUACCCAUUGCGAAGGCGAUCUGCUGGAACUCAACUGACGCUAAUGCGUGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCUCUAAACGAUGUAUACUAGUUGUUGCUGUGCUAGUCACGGCAGUAAUGCACCUAACGGAUUAAGUAUACCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUAGACGGGGACCCGCACAAGCGGUGGAGNNNGUGGUUUAAUUCGANNNUACGCGAAGAACCUUACCUGGGCUUGAUAUCCAACUAAUCUCUUAGAGAUAAGAGAGUGCUCUUGAGAAAGUUGAGACAGGUGCUGCACGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCACGUAUUUAGUUGCUAACAGUUCGGCUGAGCACUCUAAAUAGACUGCCUUCG-CAAGGAGGAGGAAGGUGUGGACGACGUCAAGUCAUCAUGGCCCUUAUGCCCAGGGCGACACACGUGCUACAAUGGCAUAUACAAUGAGAUGCAAUAUCGCGAGAUGGAGCAAAUCUAUAAA-AUAUGUCCCAGUUCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGCCGGAAUCGCUAGUAAUCGUAGAUCAGCCUGCUACGGUGAAUACGUUCCCGGGUCUUGUACUCACCGCCCGUCACACCAUGGGAGUUGAUUUCACUCGAAGUCGGAAUGCUA---AAC---UAGCUACCGCCCACAGUGGAAUCAGCGACUGGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ep.Cam.lar 
-UUAUGGAGAGUUUGAUCCUGGCUCAGAGUGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGNACGAUGAAGCGA-CUAGCUUGAG---UUGUGGAUUAGUGGCGCACGGGUGAGUAAGGUAUAGUAAUCUGCCCUACACAAGAGGACAACAGUUGGAAACGACUGCUNAUACUCUAUACUCCUACAAAGGGAAAG---------UUUU------UCGGUGUAGGAUGAGACUAUAUAGUAUCAGCUAGUUGGUGAGGUAAUGGCUCACCAAGGCUNUGACGCUUAACUGGUCUGAGAGGAUGAUCAGUCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUAUUGCGCAAUGGGGGAAACCCUGACGCAGCAACGCCGCGUGGAGGAUGACACUUUUCGGAGCGUAAACUCCUUUUCUUAGGGAAGAA-----------UUCU-------------GACGGUACCUAAGGAAUAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCNAGCGUUACUCGGAAUCACUGGGCGUAAAGGGCGCGUAGGCGGAUUAUCAAGUCUCUUGUGAAAUCCAACGGCUUAACCGUUGAACUGCUUGGGAAACUGGUAAUCUAGAGUGGGGGAGAGGCAGAUGGAAUUGGUGGUGUAGGGGUAAAAUCCGUAGAUAUCACCAAGAAUACCCAUUGCGAAGGCGAUCUGCUGGAACUUAACUGACGCUAAGGCGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGUAUGCUAGUUGUUGGGGUGCUAGUCAUCUCAGUAAUGCAGCUAACGCAUUAAGCAUACCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUAGACGGGGACCCGCACAAGCGGUGGAGCAUGUGGUUUNAUNCGAAGAUACGCGAAGAACCUUACCUGGGCUUGAUAUCCUAAGAACCUUAUAGAGAUAUGAGGGUGCUCUUGAGAACUUAGAGACAGGUGCUGCACGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCACGUAUUUAGUUGCUAACACUUCGGGUGAGCACUCUAAAUAGACUGCCUUCG-UAAGGAGGAGGAAGGUGUGGACGACGUCAAGUCAUCAUGGCCCUUAUGCCCAGGGCGACACACGUGCUACAAUGGCAUAUACAAUGAGACGCAAUACCGCGAGGUGGAGCAAAUCUAUAAA-AUAUGUCCCAGUUCGGAUUGUUCUCUGCAACUCGAGAGCAUGAAGCCGGAAUCGCUAGUAAUCGUAGAUCAGCCUGCUACGGUGAAUACGUUCCCGGGUCUUGUACUCACCGCCCGUCACACCAUGGGAGUUGAUUUCACUCGAAGCCGGAAUACUA---AAC---UAGUUACCGUCCACAGUGGAAUCAGCGACUGGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ep.Hel.pyl 
UUUAUGGAGAGUUUGAUCCUGGCUCAGAGUGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAACGAUGAAGCUU-CUAGCUUGAG--AGUGCUGAUUAGUGGCGCACGGGUGAGUAACGCAUAGGCAUGUGCCUCUUAGUUUGGGAUAGCCAUUGGAAACGAUGAUUAAUACCAGAUACUCC-UACGGGGGAAAGA--------UUUA------UCGCUAAGAGAUCAGCCUAUGUCCUAUCAGCUUGUUGGUAAGGUAAUGGCUUACCAAGGCUAUGACGGGUAUCCGGCCUGAGAGGGUGAACGGACACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUAUUGCUCAAUGGGGGAAACCCUGAAGCAGCAACGCCGCGUGGAGGAUGAAGGUUUUAGGAUUGUAAACUCCUUUUGUUAGAGAAGA------------UAAU-------------GACGGUAUCUAACGAAUAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUACUCGGAAUCACUGGGCGUAAAGAGCGCGUAGGCGGGAUAGUCAGUCAGGUGUGAAAUCCUAUGGCUUAACCAUAGAACUGCAUUUGAAACUACUAUUCUAGAGUGUGGGAGAGGUAGGUGGAAUUCUUGGUGUAGGGGUAAAAUCCGUAGAGAUCAAGAGGAAUACUCAUUGCGAAGGCGACCUGCUGGAACAUUACUGACGCUGAUGCGCUAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGGAUGCUAGUUGUUGGAGGGUUAGUCUCUCCAGUAAUGCAGCUAACGCAUUAAGCAUCCCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUAGACGGGGACCCGCACAAGCGGUGGAGCANGUGGUUUAAUUCGANNNNACACGAAGAACCUUACCUAGGCUUGACAUUGAGAGAAUCCGCUAGAAAUAGUGGAGGUCUCUUGAGACCUUGAAAACAGGUGCUGCACGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCNUUUCUUAGUUGCUAACAGUUAUGCUGAGAACUCUAAGGAUACUGCCUCCG-UAAGGAGGAGGAAGGUGGGGACGACGUCAAGUCAUCAUGGCCCUUACGCCUAGGGCUACACACGUGCUACAAUGGGGUGCACAAAGAGAAGCAAUACUGUGAAGUGGAGCCAAUCUUCAAA-ACACCUCUCAGUUCGGAUUGUAGGCUGCAACUCGCCUGCAUGAAGCUGGAAUCGCUAGUAAUCGCAAAUCAGCCUGUUGCGGUGAAUACGUUCCCGGGUCUUGUACUCACCGCCCGUCACACCAUGGGAGUUGUGUUUGCCUUAAGUCAGGAUGCUA---AAU---UGGCUACUGCCCACGGCACACACAGCGACUGGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- ep.Hel.tro 
-----mmmmmmmmmmmmmmmmmmUCAGAGUGAACGCUGGCGGCGUGCCUAAUACAUGCAAGUCGAACGAUGAAGCUU-UUAGCUUGAG---AAGUGGAUUAGUGGCGCACGGGUGAGUAAUGCAUAGGAACAUGCCCCAUAGUCUGGGAUAGCCACUGGAAACGGUGAUUAAUACCGGAUACUCCUUACGAGGGAAAG---------UUUU------UCGCUAUGGGAUUGGCCUAUGUCCUAUCAGCUUGUUGGUGAGGUAAUGGCUCACCAAGGCUAUGACGGGUAUCCGGCCUGAGAGGGUGAUCGGACACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUAGGGAAUAUUGCUCAAUGGGCGAAAGCCUGAAGCAGCAACGCCGCGUGGAGGAUGAAGGUUUUAGGAUUGUAAACUCCUUUUCUAAGAGAAGA------------UUAU-------------GACGGUAUCUUAGGAAUAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUACUCGGAAUCACUGGGCGUAAAGAGCGCGUAGGCGGGGUAAUAAGUCAGAUGUGAAAUCCUGUAGCUUAACUACAGAACUGCAUUUGAAACUGUUAUUCUAGAGUGUGGGAGAGGUAGGUGGAAUUCUUGGUGUAGGGGUAAAAUCCGUAGAGAUCAAGAGGAAUACUCAUUGCGAAGGCGACCUGCUGGAACAUUACUGACGCUGAUGCGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGAUGAAUGCUAGUUGUUGCCCUGCUUGUCAGGGCAGUAAUGCAGCUAACGCAUUAAGCAUUCCGCCUGGGGAGUACGGUCGCAAGAUUAAAACUCAAAGGAAUAGACGGGGACCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGAUACGCGAAGAACCUUACCUAGGCUUGACAUUGAUAGAAUCUGCUAGAGAUAGCGGAGUGCCUUCGGGAGCUUGAAAACAGGUGCUGCACGGCUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUCGUCCUUAGUUGCUAGCAGUUCGGCUGAGCACUCUAAGGAGACUGCCUUCG-UAAGGAGGAGGAAGGUGAGGACGACGUCAAGUCAUCAUGGCCCUUACGCCUAGGGCUACACACGUGCUACAAUGGGGCGCACAAAGAGGAGCAAUAUCGUGAGAUGGAGCAAAUCUCAAAA-ACGUCUCUCAGUUCGGAUUGUAGUCUGCAACUCGACUACAUAAAGCUGGAAUCGCUAGUAAUCGCAAAUCAGCAUGUUGCGGUGAAUACGUUCCCGGGUCUUGUACUCACCGCCCGUCACACCAUGGGAGUUGUAUUCGCCUUAAGUCGGAAUGCCA---AAC---UGGCUACCGCCCACGGCGGAUGCAGCGACUGGGGUGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- un.Aer.mar 
-----mmmmmmmmmmmmmmUGGCUCAGGACGAACGCUGGCGGCGUGCCUAAGACAUGCAAGUCGAGCGGGGAGAUUGGGGAGCUUGCUCUCUGGUCUCCUAGCGGCGGACGGGUGAGUAACACGUGGGAACCUGCCCGGCAGUGGGGGAUAACCCUGGGAAACUGGGGCUAAUACCGCAUACGGUCGCAUGAAGAAAGGCCGUC---GUGA-GGCGGUCGCUGCCGGAGGGGCCCGCGGCCCAUCAGCUCGUUGGUGGGGUAACGGCCCACCAAGGCGACGACGGGUAGCCGGCCUGAGAGGGUGGUCGGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUCGGGAAUCUUGCGCAAUGGGCGAAAGCCUGACGCAGCGACGCCGCGUGGGGGAGGAAGCCCUUCGGGGUGUAAACCCCUGUCGUCCGGGACGAAGGUGGGGGG-UGAAA--GGCCCUCUGCUGACGGUACCGGAGGAGGAAGCCCCGGCUAACUACNUGCCAGCAGCCGCGGUAAGACGUAGGGGGCGAGCGUUGUCCGGAAUCACUGGGCGUAAAGGGCGCGUAGGCGGCCUGGCAAGUCGGAUGUGAAAGGUCCCGGCUCAACCGGGGAGGUGCAUUCGAAACUGCCGGGCUUGAGGGCAGGAGAGGGCAGCGGAAUUCCCGGUGGAGCGGUGAAAUGCGUAGAGAUCGGGAGGAACACCAGUGGCGAAGGCGGCUGCCUGGCCUGGCCCUGACGCUGAGGCGCGACAGCGUGGGGAGCGAACGGGAUUAGAUACCCCGGUAGUCCACGCCGUAAACGAUGGGUGCUAGGUGUGGGAGGUUCGACCCCUUCCGUGCCGGAGCUAACGCACUAAGCACCCCGCCUGGGGAGUACGGCCGCAAGGCUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCUGGGCUUGACAUCCCGCGAACCUGGCCGAAAGGCUGGGGUGCCUUCGGGAGCGCGGUGACAGGUGCUGCAUGGUUGUCGUCAGCUCGUGUCGUGAGAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCCGCCCUGUGUUGCCAGCGGUACGGCCGGGCACUCACAGGGGACUGCCGGUGACAAACCGGAGGAAGGUGGGGAUGACGUCAAAUCAUCAUGGCCCUUAUGCCCAGGGCUACACACGUGCUACAAUGGCCGGUACAACGGGUUGCGAACCCGCGAGGGGGAGCCAAUCCCUAAAAGCCGGUCUCAGUUCGGAUCGCAGGCUGCAACUCGCCUGCGUGAAGCCGGAAUCGCUAGUAAUCGCGGAUCAGAAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCACGGGAGCCGGCAACACCCGAAGCCGGUGGCCCAACCGUAAGGAGGGAGCCGUCGAAGGUGGGGCCGGUGACUGGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- un.Fle.sin 
UGGAUGGAGAGUUUGAUCCUGGCUCAGAACGAACGCUGGCGGCGUGCUUAACACAUGCAAGUCAGGGGGAAUCCUCCC----UUCG--GGGAGGUAGUACNCCGGCGCACGGGUGAGUAACGCGUGAGGACCUACCCAUUAGACCUGGAUAACCCGGCGAAAGUUGGGCUAAGACAGGAUGUGUUAUCAUUGAUAAAGCAGGU----UUCG--GCUUGUGCUAAUGGAUGGGCUCGCGUCUGAUUAGCUAGUUGGUAAGGUAAAGGCUUACCAAGGCGACGAUCAUUAGGCGGCCUGAGAGGGUGGUCGCCCACACUGGAACUGAGACACGGUCCAGNCUCCUACGGGAGGCAGCAGUGGGGAAUUUUGCGCAAUGGGCGAGAGCCUGACGCAGCGACGCCGCGUGGACGAGGAAGGCCUUCGGGUCGUAAAGUCCNUUUUUACGGGAAGAAAGUUAUUAACAUAAC-UGGUUAAUAUUUGACGGUACCGUAAGAAUAAGCCCCGGCUAACUCCGUGCCAGCAGCCGNNNUNAUACGGAGGGGGCNAGCGUUGUUCGGAAUUACUGGGCGUAAAGCGUACGUAGGCGGCGUGGUAAGUCGGAGGUUAAAGGCUACGGCUNAACCGUAGUAGGGCNUUUGAUACUAUCAUGCUAGAGUGUCGGAGGGGGNAGCGGAAUUCCCUGUGUAGCGGUGAAAUGCGUAGAUAUAGGGAAGAACNCCAGUAGCGAAGGCGGCUACCUGGCCGAUCACUGACGCUGAGGUACGAGAGCGUGGGUAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCUGUAAACGAUGGACGCUAGGUGUUGGAGGAACCGCCCCUUCAGUGCCGAAGCCAACGCGUUAAGCGUCCNGCCUGGGGAGUACGACCGCAAGGUUAAAACUCAAAGGAAUUGACGGGGGCCNGCACAAGCGGUGGAGCACGUNGUUUAAUUCGAUGNUAACCGAAGAACCUUACCUGGUCUUGACAUCCUCCGGAGGGACUAGAGAUAGUUCUGNUGCNUNGGNAGCGGAGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCUAUUCAUAGUUGCCAUCGGUAAGGCCGGGCACUCUAUGGAUACUGCCCGUGAUAAACGGGAGGAAGGUGGGGACGACGUCAAGUCAUCAUGGCCUUUAUGACCAGGGCUACACACGUGCUACAAUGGUACGUACAGAGGGCAGCGAAGCCGCGAGGUGNAGCGAAUCCCUAAAAGCGUGCCUUAGUUCGGAUUGUAGUCUGCAAUUCGACUACAUGAAGGCGGAAUCGCUAGUAAUCGCAGGUCAGCAAACUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCNCNUCACACCACGGGAGUUAGUUGUACCUGAAGCCGGUGGCCCAAC-UGCG-GAGGGAGCCGNUUAUGGUAUGACUGGCAACNGGGGmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm----- un.Mag.bav 
-----mmAGAGUUUGAUUUUGGCUCAGAGCGAACGCUGGCGGCGUGCCUAACACAUGCAAGUCGAACGGAUCAUGAGUC---GCAA---GAUUCAUGGUUAGUGGCGCACGGGUGAGUAACGCGUAGGAAUCUGCCUUCAGGACCGGAACAACCAUUGGAAACGAUGGCUAAUCCCGGAUAAGACCGCAAGGUAAAAGGA-------GCAA-----UUCACCUGGAGAUGAGCCUGCGUCCUAUCAGGUAGUUGGUAAGGUAACGGCUUACCAAGCCUAAGACGGGUAGCCGGUCUGAGGGGAUGAACGGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGAGGAAUUUUGCGCAAUGGGGGAAACCCUGACGCAGCGACGCCGCGUGGAGGAAGAAGGCCUUCGGGUUGUAAACUCCUUUUGUAGGGAAAGA------------UGAU-------------GACGGUACCUUACGAAUAAGCCACGGCUAACUCUGUGCCAGCAGCCGCGGUAAGACAGAGGUGGCAAGCGUUGCUCGGAAUUACUGGGCUUAAAGGGUGCGUAGGCGGUUAGAUAAGUUUGGGGUGGAAUGCUCGGGCUCAACCCGGGAAUUGCCUUGAAAACUGUUUAACUUGAGUAAGCGAGGGGAUGGCGGAAUUCCUGGUGUAGCGGUGAAAUGCGUAGAUAUCAGGAGGAAGGCCGGUGGCGAAGGCGGCCAUCAGGCGCUUAACUGACGCUGAGGCACGAAAGCGUGGGGAUCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCCUAAACGGUGGGUACUAGGUAUAGGGCUCGAUA-GGGUUCUGUGCCGAAGGGAAACCAAUAAGUACCCCGCCUGGGAAGUACGGCCGCAAGGUUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGACGCAACGCGAGGAACCUUACCUGGGUUUGACAUGAAGUGUAGGAGUCCGAAAGGAUAACGCUCCGCAAGGAGAGCUUGCACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUUCUCUGUUGCCAUCGGUAAAGCCGGGCGCUCUGAGAAAACUGCCGGCGAUGAGUUGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUAUGUCCAGGGCUACACACGUGCUACAAUGGCUAUUACAGAGGGAAGCAAGAUCGCGAGGUGGAGCAAAUCCC-UAAAAAUAGUCUUAGUUGGGAUCGGAGUCUGCAACUCGACUCCGUGAACGUGGAAUCGCUAGUAAUCGCAGAUCACUACGCUGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACACCACGAAAGUCUGCUGUACCCGAAGCGGGUGAGCUAACCGCAAGGAGGCAGCUCACGAAGGUAUGGUAGGUGAUUGGGGUGAAGUCGUAACAAGGUAGCCGUAGGGGAACCUGCGGCUGGAUCACCUCCUUU """ sample_tree = 
"""((((ch.Cya.cal:0.01909,ch.Cya.par:0.00974):0.00562,((ch.Eug.gra:0.03679,ch.Chl.mir:0.03448):0.00792,(ch.Pin.thu:0.00289,(((ch.Pis.sat:0.00728,ch.Gly.max:0.00560):0.00207,(ch.Nic.tab:0.00032,ch.Dau.car:0.00404):0.00191):0.00189,ch.Zea.may:0.00644):0.00207):0.00992):0.00674):0.00976,(((al.Aci.cry:0.02022,(al.Met.rho:0.01428,((al.Sph.ter:0.01856,al.Rhi.fre:0.01560):0.00577,(al.Ric.pea:0.03909,ga.Oce.pus:0.02219):0.00897):0.00740):0.00884):0.01161,(((be.Rho.pur:0.01507,be.Nei.fla:0.01667):0.00857,(be.Tha.ter:0.01119,(be.Pse.syz:0.01745,be.Hyd.pal:0.01564):0.00716):0.00536):0.01090,(ga.Xan.pis:0.01723,(ga.Pse.asp:0.01325,(ga.Leg.lyt:0.02104,(ga.Mar.psy:0.01684,((ga.Pho.pho:0.01267,ga.Hae.par:0.02272):0.00386,(ga.Vib.cho:0.01167,(ga.Sal.typ:0.00374,ga.Esc.col:0.00568):0.00939):0.00726):0.00652):0.01339):0.00892):0.00829):0.00892):0.01675):0.00927,((((de.Des.suc:0.02147,(de.Des.ter:0.02837,de.Des.vib:0.02735):0.01258):0.01172,(ra.Dei.rad:0.01597,ra.Dei.mur:0.01346):0.02678):0.00841,(((lo.Eub.lac:0.03050,sp.Lep.mey:0.03062):0.01102,((((lo.Lac.car:0.00581,lo.Car.mob:0.01332):0.01048,(lo.Sta.hom:0.01349,lo.Bac.sub:0.01148):0.01015):0.00661,lo.Bac.alc:0.01312):0.01687,((hi.Bif.inf:0.02108,((((hi.Myc.int:0.00581,hi.Myc.cel:0.00922):0.00960,hi.Cor.oti:0.01993):0.00689,(hi.Mic.par:0.01467,(hi.Mic.ech:0.00986,(hi.Noc.fla:0.01240,(hi.Str.rim:0.01912,hi.Cel.cel:0.01370):0.00521):0.00437):0.00594):0.00601):0.01040,hi.Act.eur:0.02348):0.00923):0.01618,((un.Fle.sin:0.03037,(((th.The.sub:0.01159,th.The.hyp:0.01299):0.01239,th.The.afr:0.02464):0.02101,(sp.Tre.suc:0.02298,sp.Tre.soc:0.02704):0.02202):0.01363):0.01096,(un.Aer.mar:0.01630,(((pt.Pla.mar:0.01738,pt.Pla.bra:0.02107):0.03072,pt.Iso.pal:0.03183):0.02494,gr.The.ros:0.03608):0.01201):0.01061):0.00862):0.01055):0.00806):0.00855,(((lo.Spi.cit:0.02091,(lo.Myc.myc:0.00530,lo.Myc.cap:0.00444):0.01268):0.01693,lo.Ach.lai:0.02594):0.01425,((fu.Lep.mic:0.01902,fu.Lep.buc:0.01886):0.01380,(fu.Fus.nul:0.01817,fu.Fus.gon:0.01127):0.01341)
:0.01792):0.00949):0.00816):0.00698,(((un.Mag.bav:0.02918,sp.Bor.par:0.03882):0.01142,((fl.Fla.xan:0.01205,(fl.Mar.psy:0.01497,fl.Cyt.mar:0.01590):0.00762):0.01329,(fl.Pre.eno:0.03031,fl.Bac.spl:0.02532):0.01223):0.02551):0.01015,((((ep.Hel.tro:0.01074,ep.Hel.pyl:0.01136):0.01095,(ep.Cam.lar:0.00615,ep.Cam.fet:0.00894):0.01470):0.01449,ep.Alv.pom:0.01917):0.01597,(((mt.Gly.max:0.02673,mt.Zea.may:0.02336):0.10722,((mt.Tet.pyr:0.10091,mt.Sac.cer:0.10584):0.04143,(((mt.Hom.sap:0.01752,(mt.Mus.mus:0.02097,mt.Bos.tau:0.01663):0.01043):0.01436,mt.Gal.gal:0.02956):0.02679,mt.Dro.mel:0.05506):0.07927):0.04096):0.05555,((ar.Met.ore:0.02764,((ar.Hal.sac:0.01549,(ar.Nat.mag:0.01515,ar.Hal.val:0.01643):0.00712):0.02421,(ar.Met.fer:0.01448,((ar.The.ste:0.00873,ar.The.chi:0.00962):0.01420,(ar.Met.jan:0.01611,(ar.The.pen:0.01914,ar.Sul.sol:0.02566):0.02002):0.00655):0.00998):0.01266):0.01176):0.03812,(pr.Try.equ:0.13823,((pr.Tet.bor:0.03717,pr.Cyc.gla:0.02400):0.01721,((pr.Pla.mal:0.11785,pr.Sar.neu:0.02921):0.01926,(pr.Por.aca:0.03748,((pr.Rho.sal:0.02632,(pr.Bla.hom:0.03851,pr.Mal.mat:0.03042):0.01478):0.00979,(((fu.Gig.mar:0.01786,((fu.Den.sul:0.00925,(fu.Fel.oga:0.00867,fu.Bul.hui:0.00910):0.00830):0.00765,(((fu.Sac.cer:0.00875,fu.Can.gla:0.00407):0.01253,(fu.Neu.cra:0.01565,(fu.Asp.tam:0.00163,fu.Asp.nid:0.00344):0.01020):0.00840):0.00546,fu.Sch.pom:0.01836):0.00563):0.00700):0.00993,((an.Bra.pli:0.02731,an.Cae.ele:0.06299):0.01757,(((an.Str.int:0.01379,an.Ast.amu:0.01332):0.01183,((((an.Rat.nor:0.00272,an.Mus.mus:0.00282):0.00216,an.Hom.sap:0.00628):0.01405,an.Xen.lae:0.00769):0.00993,an.Lam.aep:0.01238):0.01437):0.01557,an.Dro.mel:0.06939):0.01211):0.01858):0.00917,(pr.Sti.hel:0.02282,(((pl.Tai.cry:0.00186,pl.Pin.luc:0.00267):0.00699,pl.Gin.bil:0.00367):0.00467,((pl.Ara.tha:0.00614,((pl.Pis.sat:0.00081,pl.Gly.max:0.00234):0.00290,((pl.Sol.tub:0.00344,pl.Lyc.esc:0.00378):0.00372,pl.Pan.gin:0.00361):0.00180):0.00148):0.00251,pl.Zea.may:0.01435):0.00597):0.01283):0.01173):0.00
837):0.01080):0.01141):0.01545):0.04435):0.08751):0.03645):0.02229):0.00860):0.00878):0.00657):0.01503):0.00749,(cy.Pho.amb:0.01236,(cy.Tri.ten:0.01069,cy.Osc.aga:0.01588):0.00501):0.00887,cy.Mic.aer:0.03275); """ sample_species_names = """cy.Mic.aer cy.Osc.aga cy.Pho.amb cy.Tri.ten pl.Zea.may pl.Pan.gin pl.Ara.tha pl.Gly.max pl.Pis.sat pl.Lyc.esc pl.Sol.tub pl.Gin.bil pl.Pin.luc pl.Tai.cry pr.Sar.neu pr.Pla.mal pr.Sti.hel pr.Mal.mat pr.Cyc.gla pr.Tet.bor pr.Rho.sal pr.Por.aca pr.Bla.hom pr.Try.equ an.Dro.mel an.Lam.aep an.Xen.lae an.Hom.sap an.Mus.mus an.Rat.nor an.Ast.amu an.Str.int an.Cae.ele an.Bra.pli fu.Sch.pom fu.Asp.nid fu.Asp.tam fu.Neu.cra fu.Can.gla fu.Sac.cer fu.Bul.hui fu.Den.sul fu.Fel.oga fu.Gig.mar fl.Bac.spl fl.Cyt.mar fl.Fla.xan fl.Mar.psy fl.Pre.eno fu.Fus.gon fu.Fus.nul fu.Lep.buc fu.Lep.mic gr.The.ros pt.Iso.pal pt.Pla.bra pt.Pla.mar ch.Zea.may ch.Dau.car ch.Gly.max ch.Pis.sat ch.Nic.tab ch.Pin.thu ch.Chl.mir ch.Eug.gra ch.Cya.par ch.Cya.cal ga.Esc.col ga.Hae.par ga.Leg.lyt ga.Mar.psy ga.Oce.pus ga.Pho.pho ga.Pse.asp ga.Sal.typ ga.Vib.cho ga.Xan.pis ra.Dei.mur ra.Dei.rad sp.Bor.par sp.Lep.mey sp.Tre.soc sp.Tre.suc th.The.afr th.The.hyp th.The.sub ar.Sul.sol ar.The.pen ar.Hal.val ar.Hal.sac ar.Nat.mag ar.Met.fer ar.Met.jan ar.Met.ore ar.The.chi ar.The.ste hi.Act.eur hi.Bif.inf hi.Cel.cel hi.Cor.oti hi.Mic.par hi.Mic.ech hi.Myc.cel hi.Myc.int hi.Noc.fla hi.Str.rim lo.Ach.lai lo.Bac.alc lo.Bac.sub lo.Car.mob lo.Eub.lac lo.Lac.car lo.Myc.cap lo.Myc.myc lo.Spi.cit lo.Sta.hom mt.Dro.mel mt.Gal.gal mt.Bos.tau mt.Hom.sap mt.Mus.mus mt.Sac.cer mt.Zea.may mt.Gly.max mt.Tet.pyr al.Aci.cry al.Met.rho al.Rhi.fre al.Ric.pea al.Sph.ter be.Hyd.pal be.Nei.fla be.Pse.syz be.Rho.pur be.Tha.ter de.Des.vib de.Des.ter de.Des.suc ep.Alv.pom ep.Cam.fet ep.Cam.lar ep.Hel.pyl ep.Hel.tro un.Aer.mar un.Fle.sin un.Mag.bav """ sample_seq_to_species = """cy.Mic.aer\tcy.Mic.aer cy.Osc.aga\tcy.Osc.aga cy.Pho.amb\tcy.Pho.amb cy.Tri.ten\tcy.Tri.ten pl.Zea.may\tpl.Zea.may 
pl.Pan.gin\tpl.Pan.gin
pl.Ara.tha\tpl.Ara.tha
pl.Gly.max\tpl.Gly.max
pl.Pis.sat\tpl.Pis.sat
pl.Lyc.esc\tpl.Lyc.esc
pl.Sol.tub\tpl.Sol.tub
pl.Gin.bil\tpl.Gin.bil
pl.Pin.luc\tpl.Pin.luc
pl.Tai.cry\tpl.Tai.cry
pr.Sar.neu\tpr.Sar.neu
pr.Pla.mal\tpr.Pla.mal
pr.Sti.hel\tpr.Sti.hel
pr.Mal.mat\tpr.Mal.mat
pr.Cyc.gla\tpr.Cyc.gla
pr.Tet.bor\tpr.Tet.bor
pr.Rho.sal\tpr.Rho.sal
pr.Por.aca\tpr.Por.aca
pr.Bla.hom\tpr.Bla.hom
pr.Try.equ\tpr.Try.equ
an.Dro.mel\tan.Dro.mel
an.Lam.aep\tan.Lam.aep
an.Xen.lae\tan.Xen.lae
an.Hom.sap\tan.Hom.sap
an.Mus.mus\tan.Mus.mus
an.Rat.nor\tan.Rat.nor
an.Ast.amu\tan.Ast.amu
an.Str.int\tan.Str.int
an.Cae.ele\tan.Cae.ele
an.Bra.pli\tan.Bra.pli
fu.Sch.pom\tfu.Sch.pom
fu.Asp.nid\tfu.Asp.nid
fu.Asp.tam\tfu.Asp.tam
fu.Neu.cra\tfu.Neu.cra
fu.Can.gla\tfu.Can.gla
fu.Sac.cer\tfu.Sac.cer
fu.Bul.hui\tfu.Bul.hui
fu.Den.sul\tfu.Den.sul
fu.Fel.oga\tfu.Fel.oga
fu.Gig.mar\tfu.Gig.mar
fl.Bac.spl\tfl.Bac.spl
fl.Cyt.mar\tfl.Cyt.mar
fl.Fla.xan\tfl.Fla.xan
fl.Mar.psy\tfl.Mar.psy
fl.Pre.eno\tfl.Pre.eno
fu.Fus.gon\tfu.Fus.gon
fu.Fus.nul\tfu.Fus.nul
fu.Lep.buc\tfu.Lep.buc
fu.Lep.mic\tfu.Lep.mic
gr.The.ros\tgr.The.ros
pt.Iso.pal\tpt.Iso.pal
pt.Pla.bra\tpt.Pla.bra
pt.Pla.mar\tpt.Pla.mar
ch.Zea.may\tch.Zea.may
ch.Dau.car\tch.Dau.car
ch.Gly.max\tch.Gly.max
ch.Pis.sat\tch.Pis.sat
ch.Nic.tab\tch.Nic.tab
ch.Pin.thu\tch.Pin.thu
ch.Chl.mir\tch.Chl.mir
ch.Eug.gra\tch.Eug.gra
ch.Cya.par\tch.Cya.par
ch.Cya.cal\tch.Cya.cal
ga.Esc.col\tga.Esc.col
ga.Hae.par\tga.Hae.par
ga.Leg.lyt\tga.Leg.lyt
ga.Mar.psy\tga.Mar.psy
ga.Oce.pus\tga.Oce.pus
ga.Pho.pho\tga.Pho.pho
ga.Pse.asp\tga.Pse.asp
ga.Sal.typ\tga.Sal.typ
ga.Vib.cho\tga.Vib.cho
ga.Xan.pis\tga.Xan.pis
ra.Dei.mur\tra.Dei.mur
ra.Dei.rad\tra.Dei.rad
sp.Bor.par\tsp.Bor.par
sp.Lep.mey\tsp.Lep.mey
sp.Tre.soc\tsp.Tre.soc
sp.Tre.suc\tsp.Tre.suc
th.The.afr\tth.The.afr
th.The.hyp\tth.The.hyp
th.The.sub\tth.The.sub
ar.Sul.sol\tar.Sul.sol
ar.The.pen\tar.The.pen
ar.Hal.val\tar.Hal.val
ar.Hal.sac\tar.Hal.sac
ar.Nat.mag\tar.Nat.mag
ar.Met.fer\tar.Met.fer
ar.Met.jan\tar.Met.jan
ar.Met.ore\tar.Met.ore
ar.The.chi\tar.The.chi
ar.The.ste\tar.The.ste
hi.Act.eur\thi.Act.eur
hi.Bif.inf\thi.Bif.inf
hi.Cel.cel\thi.Cel.cel
hi.Cor.oti\thi.Cor.oti
hi.Mic.par\thi.Mic.par
hi.Mic.ech\thi.Mic.ech
hi.Myc.cel\thi.Myc.cel
hi.Myc.int\thi.Myc.int
hi.Noc.fla\thi.Noc.fla
hi.Str.rim\thi.Str.rim
lo.Ach.lai\tlo.Ach.lai
lo.Bac.alc\tlo.Bac.alc
lo.Bac.sub\tlo.Bac.sub
lo.Car.mob\tlo.Car.mob
lo.Eub.lac\tlo.Eub.lac
lo.Lac.car\tlo.Lac.car
lo.Myc.cap\tlo.Myc.cap
lo.Myc.myc\tlo.Myc.myc
lo.Spi.cit\tlo.Spi.cit
lo.Sta.hom\tlo.Sta.hom
mt.Dro.mel\tmt.Dro.mel
mt.Gal.gal\tmt.Gal.gal
mt.Bos.tau\tmt.Bos.tau
mt.Hom.sap\tmt.Hom.sap
mt.Mus.mus\tmt.Mus.mus
mt.Sac.cer\tmt.Sac.cer
mt.Zea.may\tmt.Zea.may
mt.Gly.max\tmt.Gly.max
mt.Tet.pyr\tmt.Tet.pyr
al.Aci.cry\tal.Aci.cry
al.Met.rho\tal.Met.rho
al.Rhi.fre\tal.Rhi.fre
al.Ric.pea\tal.Ric.pea
al.Sph.ter\tal.Sph.ter
be.Hyd.pal\tbe.Hyd.pal
be.Nei.fla\tbe.Nei.fla
be.Pse.syz\tbe.Pse.syz
be.Rho.pur\tbe.Rho.pur
be.Tha.ter\tbe.Tha.ter
de.Des.vib\tde.Des.vib
de.Des.ter\tde.Des.ter
de.Des.suc\tde.Des.suc
ep.Alv.pom\tep.Alv.pom
ep.Cam.fet\tep.Cam.fet
ep.Cam.lar\tep.Cam.lar
ep.Hel.pyl\tep.Hel.pyl
ep.Hel.tro\tep.Hel.tro
un.Aer.mar\tun.Aer.mar
un.Fle.sin\tun.Fle.sin
un.Mag.bav\tun.Mag.bav
"""

sample_priors = {'A':0.2528,'C':0.2372,'G':0.3099,'U':0.2001}

sub_matrix = """-1.4150\t0.2372\t0.9777\t0.2001
0.2528\t-1.1940\t0.3099\t0.6313
0.7976\t0.2372\t-1.2349\t0.2001
0.2528\t0.7484\t0.3099\t-1.3111
"""

sample_sub_matrix = {}
for row_c,row in zip(gctmpca_base_order,sub_matrix.split('\n')):
    sample_sub_matrix[row_c] = dict(zip(gctmpca_base_order,row.split()))

trivial_seqs = """3 4
A1.. AACF
A12. AADF
A123 ADCF
"""

trivial_tree = """(A1..:0.5,(A12.,A123):0.5);
"""

trivial_species_names = """A1..
A12.
A123
"""

trivial_seq_to_species = """A1..\tA1..
A12.\tA12.
A123\tA123
"""

myog_seqs = """42 153
Alligator......
ELSDQEWKHVLDIWTKVESKLPEHGHEVIIRLLQEHPETQERFEKFKHMKTADEMKSSEKMKQHGNTVFTALGNILKQKGNHAEVLKPLAKSHALEHKIPVKYLEFISEIIVKVIAEKYPADFGADSQAAMRKALELFRNDMASKYKEFGYQG Aptenodytes.... GLNDQEWQQVLTMWGKVESDLAGHGHAVLMRLFKSHPETMDRFDKFRGLKTPDEMRGSEDMKKHGVTVLT-LGQILKKKGHHEAELKPLSQTHATKHKVPVKYLEFISEAIMKVIAQKHASNFGADAQEAMKKALELFRNDMASKYKEFGFQG Balaenoptera... VLTDAEWHLVLNIWAKVEADVAGHGQDILISLFKGHPETLEKFDKFKHLKTEAEMKASEDLKKHGNTVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSRHPADFGADAQAAMNKALELFRKDIAAKYKELGFQG Bos............ GLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKIPVKYLEFISDAIIHVLHAKHPSDFGADAQAAMSKALELFRNDMAAQYKVLGFHG Callithrix..... GLSDGEWQLVLNVWGKVEADIPSHGQEVLISLFKGHPETLEKFDKFKHLKSEDEMKASEELKKHGVTVLTALGGILKKKGHHEAELKPLAQSHATKHKIPVKYLEFISDAIVHVLQKKHPGDFGADAQGAMKKALELFRNDMAAKYKELGFQG Canis.......... GLSDGEWQIVLNIWGKVETDLAGHGQEVLIRLFKNHPETLDKFDKFKHLKTEDEMKGSEDLKKHGNTVLTALGGILKKKGHHEAELKPLAQSHATKHKIPVKYLEFISDAIIQVLQSKHSGDFHADTEAAMKKALELFRNDIAAKYKELGFQG Caretta........ GLSDDEWNHVLGIWAKVEPDLSAHGQEVIIRLFQLHPETQERFAKFKNLTTIDALKSSEEVKKHGTTVLTALGRILKQKNNHEQELKPLAESHATKHKIPVKYLEFICEIIVKVIAEKHPSDFGADSQAAMKKALELFRNDMASKYKEFGFQG Castor......... GLSDGEWQLVLHVWGKVEADLAGHGQEVLIRLFKGHPETLEKFNKFKHIKSEDEMKASEDLKKHGVTVLTALGGVLKKKGHHEAEIKPLAQSHATKHKIPIKYLEFISEAIIHVLQSKHPGBFGADABGAMNKALELFRKDIAAKYKELGFQG Ctenodactylus.. GLSDGEWQLVLNAWGKVETDIGGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGTTVLTALGNILKKKGQHEAELAPLAQSHATKHKIPVKYLEFISEAIIQVLESKHPGDFGADAQGAMSKALELFRNDIAAKYKELGFQG Didelphis...... GLSDGEWQLVLNAWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGNILKKKGNHEAELKPLAQSHATKHKISVQFLEFISEAIIQVIQSKHPGDFGGDAQAAMGKALELFRNDMAAKYKELGFQG Elephas........ GLSDGEWELVLKTWGKVEADIPGHGETVFVRLFTGHPETLEKFDKFKHLKTEGEMKASEDLKKQGVTVLTALGGILKKKGHHEAEIQPLAQSHATKHKIPIKYLEFISDAIIHVLQSKHPAEFGADAQGAMKKALELFRNDIAAKYKELGFQG Equus.......... 
GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG Gallus......... GLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKIPVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEFGFQG Graptemys...... GLSDDEWHHVLGIWAKVEPDLSAHGQEVIIRLFQVHPETQERFAKFKNLKTIDELRSSEEVKKHGTTVLTALGRILKLKNNHEPELKPLAESHATKHKIPVKYLEFICEIIVKVIAEKHPSDFGADSQAAMRKALELFRNDMASKYKEFGFQG Inia........... GLSDGEWQLVLNIWGKVEADLAGHGQDVLIRLFKGHPETLEKFDKFKHLKTEAEMKASEDLKKHGNTVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQAAMNKALELFRKDIAAKYKELGFHG Kogia.......... VLSEGEWQLVLHVWAKVEADIAGHGQDILIRLFKHHPETLEKFDRFKHLKSEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPADFGADAQGAMSKALELFRKDIAAKYKELGYQG Lagostomus..... GLSDGEWQLVLNVWGKVEADLGGHGQEVLIRLFKGHPETLEKFDKFKHLKAEDEMRASEDLKKHGTTVLTALGGILKKRGQHAAELAPLAQSHATKHKIPVKYLEFISEAIIQVLQSKHPGDFGADAQAAMSKALELFRNDIAAKYKELGFQG Lagothrix...... GLSDGEWQLVLNIWGKVEADIPSHGQEVLISLFKGHPETLEKFDKFKHLKSEDEMKASEELKKHGVTVLTALGGILKKKGQHEAELKPLAQSHATKHKIPVKYLEFISDAIIHALQKKHPGDFGADAQGAMKKALELFRNDMAAKYKELGFQG Lutra.......... GLSDGEWQLVLNVWGKVEADLAGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKGSEDLKKHGNTVLTALGGILKKKGKHEAELKPLAQSHATKHKIPIKYLEFISEAIIQVLQSKHPGBFGADAQGAMKRALELFRNDIAAKYKELGFQG Macropus....... GLSDGEWQLVLNIWGKVETDEGGHGKDVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGITVLTALGNILKKKGHHEAELKPLAQSHATKHKIPVQFLEFISDAIIQVIQSKHAGNFGADAQAAMKKALELFRHDMAAKYKEFGFQG Meles.......... GLSDGEWQLVLNVWGKVEADLAGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKGSEDLKKHGNTVLTALGGILKKKGHQEAELKPLAQSHATKHKIPVKYLEFISDAIAQVLQSKHPGNFAAEAQGAMKKALELFRNDIAAKYKELGFQG Mus............ GLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSEDLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRHSGDFGADAQGAMSKALELFRNDIAAKYKELGFQG Ochotona....... 
GLSDGEWQLVLNVWGKVEADLAGHGQEVLIRLFKNHPETLEKFDKFKNLKSEDEMKGSDDLKKHGNTVLSALGGILKKKGQHEAELKPLAQSHATKHKIPVKYLEFISEAIIQVLQSKHPGDFGADAQGAMSKALELFRNDMAAKYKELGFQG Ondatra........ GLSDGEWQLVLHVWGKVEADLAGHGQDVLIRLFKAHPETLEKFDKFKHIKSEDEMKGSEDLKKHGBTVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPIKYLEFISEAIIHVLZSKHPSBFGADVZGAMKRALELFRNDIAAKYKELGFQG Orcinus........ GLSDGEWQLVLNVWGKVEADLAGHGQDILIRLFKGHPETLEKFDKFKHLKTEADMKASEDLKKHGNTVLTALGAILKKKGHHDAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPAEFGADAQGAMNKALELFRKDIAAKYKELGFHG Ornithorhynchus GLSDGEWQLVLKVWGKVEGDLPGHGQEVLIRLFKTHPETLEKFDKFKGLKTEDEMKASADLKKHGGTVLTALGNILKKKGQHEAELKPLAQSHATKHKISIKFLEYISEAIIHVLQSKHSADFGADAQAAMGKALELFRNDMAAKYKEFGFQG Orycteropus.... GLSDAEWQLVLNVWGKVEADIPGHGQDVLIRLFKGHPETLEKFDRFKHLKTEDEMKASEDLKKHGTTVLTALGGILKKKGQHEAEIQPLAQSHATKHKIPVKYLEFISEAIIQVIQSKHSGDFGADAQGAMSKALELFRNDIAAKYKELGFQG Ovis........... GLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKIPVKYLEFISDAIIHVLHAKHPSNFGADAQGAMSKALELFRNDMAAEYKVLGFQG Pan............ GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLHSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG Papio.......... GLSDGEWQLVLNVWGKVEADIPSHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLELISESIIQVLQSKHPGDFGADAQGAMNKALELFRNDMAAKYKELGFQG Phocoenoides... GLSEGEWQLVLNVWGKVEADLAGHGQDVLIRLFKGHPETLEKFDKFKHLKTEAEMKASEDLKKHGNTVLTALGGILKKKGHHDAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPAEFGADAQGAMNKALELFRKDIATKYKELGFHG Physeter....... VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG Pongo.......... GLSDGEWQLVLNVWGKVEADIPSHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISESIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG Proechimys..... 
GLSDGEWQLVLNVWGKVEGDLSGHGQEVLIRLFKGHPETLEKFDKFKHLKAEDEMRASEELKKHGTTVLTALGGILKKKGQHAAELAPLAQSHATKHKIPVKYLEFISEAIIQVLQSKHPGDFGADAQGAMSKALELFRNDIAAKYKELGFQG Rousettus...... GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGQHEAQLKPLAQSHATKHKIPVKYLEFISEVIIQVLQSKHPGDFGADAQGAMGKALELFRNDIAAKYKELGFQG Saimiri........ GLSDGEWQLVLNIWGKVEADIPSHGQEVLISLFKGHPETLEKFDKFKHLKSEDEMKASEELKKHGTTVLTALGGILKKKGQHEAELKPLAQSHATKHKIPVKYLELISDAIVHVLQKKHPGDFGADAQGAMKKALELFRNDMAAKYKELGFQG Spalax......... GLSDGEWQLVLNVWGKVEGDLAGHGQEVLIKLFKNHPETLEKFDKFKHLKSEDEMKGSEDLKKHGNTVLTALGGILKKKGQHAAEIQPLAQSHATKHKIPIKYLEFISEAIIQVLQSKHPGDFGADAQGAMSKALELFRNDIAAKYKELGFQG Tachyglossus... GLSDGEWQLVLKVWGKVETDITGHGQDVLIRLFKTHPETLEKFDKFKHLKTEDEMKASADLKKHGGVVLTALGSILKKKGQHEAELKPLAQSHATKHKISIKFLEFISEAIIHVLQSKHSADFGADAQAAMGKALELFRNDMATKYKEFGFQG Tupaia......... GLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHLKTEDEMKASEDLKKHGNTVLSALGGILKKKGQHEAEIKPLAQSHATKHKIPVKYLEFISEAIIQVLQSKHPGDFGADAQAAMSKALELFRNDIAAKYKELGFQG Varanus........ GLSDEEWKKVVDIWGKVEPDLPSHGQEVIIRMFQNHPETQDRFAKFKNLKTLDEMKNSEDLKKHGTTVLTALGRILKQKGHHEAEIAPLAQTHANTHKIPIKYLEFICEVIVGVIAEKHSADFGADSQEAMRKALELFRNDMASRYKELGFQG Zalophus....... GLSDGEWQLVLNIWGKVEADLVGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKRSEDLKKHGKTVLTALGGILKKKGHHDAELKPLAQSHATKHKIPIKYLEFISEAIIHVLQSKHPGDFGADTHAAMKKALELFRNDIAAKYRELGFQG Ziphius........ GLSEAEWQLVLHVWAKVEADLSGHGQEILIRLFKGHPETLEKFDKFKHLKSEAEMKASEDLKKHGHTVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSRHPSDFGADAQAAMTKALELFRKDIAAKYKELGFHG """ myog_seq_to_species = """Alligator...... Alligator...... Aptenodytes.... Aptenodytes.... Balaenoptera... Balaenoptera... Bos............ Bos............ Callithrix..... Callithrix..... Canis.......... Canis.......... Caretta........ Caretta........ Castor......... Castor......... Ctenodactylus.. Ctenodactylus.. Didelphis...... Didelphis...... Elephas........ Elephas........ Equus.......... Equus.......... Gallus......... Gallus......... Graptemys...... Graptemys...... 
Inia........... Inia........... Kogia.......... Kogia.......... Lagostomus..... Lagostomus..... Lagothrix...... Lagothrix...... Lutra.......... Lutra.......... Macropus....... Macropus....... Meles.......... Meles.......... Mus............ Mus............ Ochotona....... Ochotona....... Ondatra........ Ondatra........ Orcinus........ Orcinus........ Ornithorhynchus Ornithorhynchus Orycteropus.... Orycteropus.... Ovis........... Ovis........... Pan............ Pan............ Papio.......... Papio.......... Phocoenoides... Phocoenoides... Physeter....... Physeter....... Pongo.......... Pongo.......... Proechimys..... Proechimys..... Rousettus...... Rousettus...... Saimiri........ Saimiri........ Spalax......... Spalax......... Tachyglossus... Tachyglossus... Tupaia......... Tupaia......... Varanus........ Varanus........ Zalophus....... Zalophus....... Ziphius........ Ziphius........ """ myog_tree = """((((((((Alligator......:0.14157,(Caretta........:0.02868,Graptemys......:0.02361):0.07084):0.02380,Varanus........:0.11672):0.04036,(Aptenodytes....:0.09769,Gallus.........:0.07336):0.03201):0.06365,(Ornithorhynchus:0.02648,Tachyglossus...:0.03235):0.05190):0.00632,(Didelphis......:0.03986,Macropus.......:0.06472):0.01338):0.01350,((((((Balaenoptera...:0.05296,(Kogia..........:0.01638,Physeter.......:0.01630):0.03528):0.00341,Ziphius........:0.04397):0.01061,(Inia...........:0.01297,(Orcinus........:0.01876,Phocoenoides...:0.01392):0.01971):0.00818):0.02315,((Bos............:0.01434,Ovis...........:0.01180):0.06631,Equus..........:0.03500):0.00381):0.00467,Elephas........:0.09030):0.01398,((Canis..........:0.05867,Zalophus.......:0.04591):0.00949,(((Castor.........:0.03609,Ondatra........:0.04888):0.01934,Lutra..........:0.01988):0.00609,Meles..........:0.04293):0.00277):0.00653):0.00260):0.00531,(((Callithrix.....:0.00920,(Lagothrix......:0.01063,Saimiri........:0.01551):0.00387):0.03812,((Pan............:0.00842,Pongo..........:0.01118):0.01307,Papio..........:0.0130
7):0.01907):0.00421,Rousettus......:0.02439):0.00691):0.00513,((Ctenodactylus..:0.03010,(Lagostomus.....:0.01528,Proechimys.....:0.01740):0.01892):0.00914,Orycteropus....:0.04642):0.00078,(((Mus............:0.07859,Spalax.........:0.02599):0.01114,Ochotona.......:0.03134):0.00969,Tupaia.........:0.02462):0.00576);
"""

myog_species_names = """Alligator......
Aptenodytes....
Balaenoptera...
Bos............
Callithrix.....
Canis..........
Caretta........
Castor.........
Ctenodactylus..
Didelphis......
Elephas........
Equus..........
Gallus.........
Graptemys......
Inia...........
Kogia..........
Lagostomus.....
Lagothrix......
Lutra..........
Macropus.......
Meles..........
Mus............
Ochotona.......
Ondatra........
Orcinus........
Ornithorhynchus
Orycteropus....
Ovis...........
Pan............
Papio..........
Phocoenoides...
Physeter.......
Pongo..........
Proechimys.....
Rousettus......
Saimiri........
Spalax.........
Tachyglossus...
Tupaia.........
Varanus........
Zalophus.......
Ziphius........
"""

if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_app/test_guppy.py000644 000765 000024 00000010172 12024702176 022000 0ustar00jrideoutstaff000000 000000
#!/bin/env python
__author__ = "Jesse Stombaugh"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jesse Stombaugh"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jesse Stombaugh"
__email__ = "jesse.stombaugh@colorado.edu"
__status__ = "Production"

from os import getcwd, remove, rmdir, mkdir
from os.path import splitext
from cogent.util.unit_test import TestCase, main
from cogent.util.misc import flatten
from random import randint
from cogent.app.guppy import Guppy, build_tree_from_json_using_params
from cogent.app.util import ApplicationError,get_tmp_filename
from cogent.core.tree import PhyloNode
from cogent.parse.tree import DndParser

class Genericguppy(TestCase):
    def setUp(self):
        '''setup the files for testing guppy'''
        # create a list of files to cleanup
        self._paths_to_clean_up = []
        self._dirs_to_clean_up = []

        # get a tmp filename to use
        basename=splitext(get_tmp_filename())[0]

        # create and write out the json input file
        self.json_fname=basename+'.json'
        json_out=open(self.json_fname,'w')
        json_out.write(JSON_RESULT)
        json_out.close()
        self._paths_to_clean_up.append(self.json_fname)

    def tearDown(self):
        """cleans up all files initially created"""
        # remove the tempdir and contents
        map(remove,self._paths_to_clean_up)
        map(rmdir,self._dirs_to_clean_up)

class guppyTests(Genericguppy):
    """Tests for the guppy application controller"""

    def test_guppy(self):
        """Base command-calls"""
        app=Guppy()
        self.assertEqual(app.BaseCommand, \
            ''.join(['cd "',getcwd(),'/"; ','guppy']))

        app.Parameters['--help'].on()
        self.assertEqual(app.BaseCommand, \
            ''.join(['cd "',getcwd(),'/"; ','guppy --help']))

    def test_change_working_dir(self):
        """Change working dir"""
        # define working directory for output
        working_dir='/tmp/Guppy'
        self._dirs_to_clean_up.append(working_dir)

        app = Guppy(WorkingDir=working_dir)
        self.assertEqual(app.BaseCommand, \
            ''.join(['cd "','/tmp/Guppy','/"; ','guppy']))

    def test_build_tree_from_alignment_using_params(self):
        """Builds a tree from a json file"""
        # define working directory for output
        outdir='/tmp/'

        # set params
        params={}
        params["tog"] = None

        # build tree
        tree = build_tree_from_json_using_params(self.json_fname,
                                                 output_dir=outdir,
                                                 params=params)
        self.assertEqual(tree.getNewick(),
                         DndParser(TREE_RESULT,
                                   constructor=PhyloNode).getNewick())

JSON_RESULT="""\
{"tree": "((seq0000004:0.08408[0],seq0000005:0.13713[1])0.609:0.00215[2],seq0000003:0.02032[3],(seq0000001:0.00014[4],seq0000002:0.00014[5])0.766:0.00015[6]):0[7];",
 "placements": [
  {"p": [[0, -113.210938, 0.713818, 0.064504, 0.000006],
         [1, -114.929894, 0.127954, 0.137122, 0.000007],
         [2, -114.932766, 0.127587, 0.000008, 0.000006],
         [6, -117.743534, 0.007675, 0.000141, 0.027211],
         [3, -117.743759, 0.007674, 0.020310, 0.027207],
         [4, -117.747386, 0.007646, 0.000131, 0.027266],
         [5, -117.747396, 0.007646, 0.000131, 0.027266]
        ],
   "n": ["seq0000006"]
  },
  {"p": [[0, -113.476305, 1.000000, 0.035395, 0.000006]],
   "n": ["seq0000007"]
  }
 ],
 "metadata": {"invocation": "guppy -t %s -r %s -s %s --out-dir \/tmp %s"},
 "version": 1,
 "fields": ["edge_num", "likelihood", "like_weight_ratio", "distal_length",
            "pendant_length"]
}
""".replace('\n','').replace(' ','')

TREE_RESULT="""\
((((seq0000004:0.035395,seq0000007:6e-06)\
:0.029109,seq0000006:6e-06)\
:0.019576,seq0000005:0.13713)\
0.609:0.00215,seq0000003:0.02032,(seq0000001:0.00014,seq0000002:0.00014)\
0.766:0.00015)\
:0;
"""

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_app/test_ilm.py000644 000765 000024 00000235564 12024702176 021433 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python

from os import remove
from cogent.util.unit_test import TestCase, main
from cogent.app.ilm import ILM, xhlxplot, hlxplot

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class xhlxplotTest(TestCase):
    """Tests for xhlxplot application controller"""

    def setUp(self):
        self.input = xhlx_input

    def test_input_as_lines(self):
        """Test xhlxplot stdout input as lines"""
        x = xhlxplot(InputHandler='_input_as_lines')
        res = x(self.input)
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

    def test_input_as_string(self):
        """Test xhlxplot input as string"""
        x = xhlxplot()
        f = open('/tmp/ilm.mwm','w')
        f.write('\n'.join(self.input))
        f.close()
        res = x('/tmp/ilm.mwm')
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()
        remove('/tmp/ilm.mwm')

    def test_get_result_path(self):
        """Tests xhlxplot result path"""
        x = xhlxplot(InputHandler='_input_as_lines')
        res = x(self.input)
        self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus'])
        self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

class IlmTest(TestCase):
    """Tests for ilm application controller"""

    def setUp(self):
        self.input = ilm_input

    def test_stdout_input_as_lines(self):
        """Test ilm input as lines"""
        i = ILM(InputHandler='_input_as_lines')
        res = i(self.input)
        exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in ilm_stdout])
        obs = res['StdOut'].read()
        self.assertEqual(obs,exp)
        #Exitstatus platform dependent?? Works but gives exitstatus 224
        #self.assertEqual(res['ExitStatus'],0)
        assert res['StdOut'] is not None
        res.cleanUp()

    def test_stdout_input_as_string(self):
        """Test ilm input as string"""
        ilm = ILM()
        f = open('/tmp/ilm.mwm','w')
        txt = '\n'.join([str(i).strip('\n') for i in self.input])
        f.write(txt)
        f.close()
        res = ilm('/tmp/ilm.mwm')
        exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in ilm_stdout])
        obs = res['StdOut'].read()
        self.assertEqual(obs,exp)
        #Exitstatus platform dependent??
#self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() remove('/tmp/ilm.mwm') def test_get_result_path(self): """Tests ilm result path""" i = ILM(InputHandler='_input_as_lines') res = i(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus']) #self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() class hlxplotTest(TestCase): """Tests for hlxplot application controller""" def setUp(self): self.input = hlx_input def test_input_as_lines(self): """Test hlxplot input as lines""" h = hlxplot(InputHandler='_input_as_lines') res = h(self.input) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_input_as_string(self): """Test hlxplot stdout input as string""" h = hlxplot() f = open('/tmp/ilm.mwm','w') txt = '\n'.join([str(i).strip('\n') for i in self.input]) f.write(txt) f.close() res = h('/tmp/ilm.mwm') self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() remove('/tmp/ilm.mwm') def test_get_result_path(self): """Tests hlxplot result path""" h = hlxplot(InputHandler='_input_as_lines') res = h(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus']) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() ilm_stdout =['\n', 'Final Matching:\n', '1 72\n', '2 71\n', '3 70\n', '4 69\n', '5 68\n', '6 67\n', '7 66\n', '8 0\n', '9 0\n', '10 25\n', '11 24\n', '12 23\n', '13 22\n', '14 0\n', '15 0\n', '16 0\n', '17 0\n', '18 48\n', '19 47\n', '20 46\n', '21 45\n', '22 13\n', '23 12\n', '24 11\n', '25 10\n', '26 0\n', '27 43\n', '28 42\n', '29 41\n', '30 40\n', '31 39\n', '32 38\n', '33 37\n', '34 56\n', '35 55\n', '36 54\n', '37 33\n', '38 32\n', '39 31\n', '40 30\n', '41 29\n', '42 28\n', '43 27\n', '44 0\n', '45 21\n', '46 20\n', '47 19\n', '48 18\n', '49 65\n', '50 64\n', '51 63\n', '52 62\n', '53 61\n', '54 36\n', '55 35\n', '56 34\n', '57 0\n', '58 0\n', '59 0\n', '60 
0\n', '61 53\n', '62 52\n', '63 51\n', '64 50\n', '65 49\n', '66 7\n', '67 6\n', '68 5\n', '69 4\n', '70 3\n', '71 2\n', '72 1\n'] hlx_stdout = ['\x07\x08d\x96\x04\x00\x00\x00H\x00\x00\x00\xfc\t\x00\x00,\x01\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xffd\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xb4\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd8\xff\xff\xff,\x01\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xffd\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xffd\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\
xff\xff\xffd\x00\x00\x00\xd8\xff\xff\xff\xb4\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xdc\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xb4\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff,\x01\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xdc\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xffd\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xffd\x00\x00\x00d\x00\x00\x00d\x00\x00\x00\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xffd\x00\x00\x00\xd8\xff\xff\xff\xb4\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xdc\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xb4\x00\x00\x00\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xffd\x00\x00\x00d\x00\x00\x00\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff,\x01\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xdc\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\x8c\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\x04\x01\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xdc\x00\x00\x00\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd8\xff\xff\xff\xd
00\x00\x00\xd8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'] xhlx_stdout = ['\x07\x08d\x96\x04\x00\x00\x00H\x00\x00\x00\xfc\t\x00\x00$\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x81\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x8e\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x13\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff:\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff$\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff6\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x81\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x87\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x8e\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x88\x00\x00\x008\xff\xff\xff\'\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xbe\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x13\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\
xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff:\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff$\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff6\x01\x00\x00_\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffO\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x87\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x02\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x80\x00\x00\x00+\x01\x00\x00\xb2\x00\x00\x00\x02\x00\x00\x008\xff\xff\xff8\xff\xff\xffV\x00\x00\x008\xff\xff\xff\xf5\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x8c\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe1\x00\x00\x00\x04\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb9\x00\x00\x002\x01\x00\x00\x93\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff:\x02\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf2\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xbf\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x04\x01\x00\x00\x91\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf3\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xf7\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xd5\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x02\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb2\x00\x00\x00+\x01\x00\x00\x80\x00\x00\x00\x02\x00\x00\x008\xff\xff\xff\x10\x00\x00\x008\xff\xff\xff8\xff\xff\xff\'\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xbe\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe1\x00\x00\x006\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x002\x01\x00\x00\x93\x01\x00\x008\xff\xff\xf
f8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xce\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x06\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xd3\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffs\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf3\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x0b\x02\x00\x008\xff\xff\xff\xcd\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xb7\x01\x00\x008\xff\xff\xff\x18\x01\x00\x00\x04\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x02\x00\x00\x008\xff\xff\xff\x07\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x02\x00\x00\x008\xff\xff\xff\x10\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xf5\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x8c\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x04\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffa\x01\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcf\x00\x00\x00\x00\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff$\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf1\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x91\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xcf\x00\x00\x00%\x01\x00\x008\xff\xff\xff8\xff\xff\xff)\x02\x00\x008\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xd5\x01\x00\x008\xff\xff\xff6\x01\x00\x006\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xba\x00\x00\x008\xff\xff\xff8\xff\xff\xff%\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xcf\x00\x00\x00\xbe\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00c\x01\x00\x00\x9d\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x1b\x01\x00\x008\xff\xff\xff\x06\x02\x00\
x008\xff\xff\xffh\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x1b\x01\x00\x008\xff\xff\xffs\x01\x00\x008\xff\xff\xff\x1c\x01\x00\x00\x9d\x00\x00\x008\xff\xff\xff\x1b\x01\x00\x008\xff\xff\xff\x0b\x02\x00\x008\xff\xff\xff8\xff\xff\xff\x1b\x01\x00\x008\xff\xff\xff\xb7\x01\x00\x00\x1c\x01\x00\x00\x04\x01\x00\x00\x18\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9c\x00\x00\x00\x1b\x01\x00\x008\xff\xff\xff\x07\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xdc\x00\x00\x00\x9d\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\x95\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff9\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x86\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff9\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xffN\x01\x00\x008\xff\xff\xff8\xff\xff\xff9\x01\x00\x008\xff\xff\xff)\x02\x00\x008\xff\xff\xff8\xff\xff\xff9\x01\x00\x008\xff\xff\xff\xd5\x01\x00\x00N\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff9\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x88\x00\x00\x008\xff\xff\xff\x86\x01\x00\x008\xff\xff\xff8\xff\xff\xff*\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x0e\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x95\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x86\x01\x00\x00\x80\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x1c\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff)\x
02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x80\x00\x00\x00\xf9\x00\x00\x00\x81\x00\x00\x00c\x00\x00\x008\xff\xff\xff8\xff\xff\xffV\x00\x00\x008\xff\xff\xffT\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xf8\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xdc\x00\x00\x00\x83\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xffc\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffT\x01\x00\x00\xb2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb2\x00\x00\x00\xf9\x00\x00\x00O\x00\x00\x00c\x00\x00\x008\xff\xff\xff\x88\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x86\x01\x00\x008\xff\xff\xff8\xff\xff\xff*\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xdc\x00\x00\x00\xb5\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffj\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff@\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x95\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x95\x00\x00\x008\xff\xff\xff\xba\x00\x00\x008\xff\xff\xff8\xff\xff\xff
\x86\x01\x00\x008\xff\xff\xff8\xff\xff\xff*\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb5\x01\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x01\x00\x00@\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffq\x01\x00\x00j\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xea\x01\x00\x00@\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xaa\x01\x00\x00*\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcf\x00\x00\x00@\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x1a\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xda\x00\x00\x008\xff\xff\xff8\xff\xff\xffB\x01\x00\x00q\x01\x00\x008\xff\xff\xff8\xff\xff\xff\x07\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x07\x01\x00\x008\xff\xff\xff\xcf\x00\x00\x00\xea\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xba\x00\x00\x008\xff\xff\xff8\xff\xff\xffT\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffH\x01\x00\x00\xaa\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\x9d\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xfc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xbc\x00\x00\x008
\xff\xff\xff8\xff\xff\xff\x10\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe9\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe9\x00\x00\x008\xff\xff\xff\x9d\x00\x00\x00\xcc\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9c\x00\x00\x008\xff\xff\xff8\xff\xff\xff6\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffH\x00\x00\x008\xff\xff\xff\xb8\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xa9\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x16\x01\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xfc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xc6\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xbc\x00\x00\x008\xff\xff\xff8\xff\xff\xff$\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xcc\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff6\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xea\x00\x00\x008\xff\xff\xff8\xff\xff\xff\\\x00\x00\x008\xff\xff\xff\xcc\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xbd\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff*\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe4\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x04\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb8\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffT\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xd6\x00\x00\x008\xff\xff\xff\xf2\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xea\x01\x00\x008\xff
\xff\xff8\xff\xff\xff\xdb\x01\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcf\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff8\xff\xff\xff6\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xa4\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf7\x00\x00\x008\xff\xff\xff\x89\x01\x00\x00\xeb\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xea\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x08\x01\x00\x008\xff\xff\xff$\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xea\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xdb\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00i\x00\x00\x00\x9d\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xb1\x00\x00\x00\xa8\x00\x00\x00\xcd\x00\x00\x008\xff\xff\xff\x04\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb1\x00\x00\x00i\x00\x00\x00\xcd\x00\x00\x008\xff\xff\xff\x07\x00\x00\x00\x04\x01\x00\x008\xff\xff\xff\xb1\x00\x00\x00O\x00\x00\x00\x86\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xb1\x00\x00\x00O\x00\x00\x00\xd9\x00\x00\x00\xdf\xff\xff\xffW\x01\x00\x00\xcd\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x9d\x00\x00\x00\xcc\x01\x00\x00\xb1\x00\x00\x00\xc8\x00\x00\x00\xcd\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xffi\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9d\x00\x00\x00\xbc\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xf0\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9d\x00\x00\x00}\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x07\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x9d\x00\x00\x00c\x00\x00\x00r\x01\x00\x008\xff\xff\xff8\xff\xff\xff\x
9d\x00\x00\x00c\x00\x00\x00\xc5\x00\x00\x00\xdf\xff\xff\xffk\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb1\x00\x00\x00\xcc\x01\x00\x00\x9d\x00\x00\x00\xdc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffV\x00\x00\x008\xff\xff\xff\xff\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x9b\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xa8\x00\x00\x008\xff\xff\xff8\xff\xff\xff"\x01\x00\x00\x80\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffi\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x07\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xa4\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffW\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xc8\x00\x00\x00\\\x01\x00\x00\xba\x00\x00\x00\xb9\x00\x00\x008\xff\xff\xff8\xff\xff\xffV\x00\x00\x008\xff\xff\xff\xff\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xffi\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf0\x00\x00\x00\xb2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xdc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffW\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xfa\x00\x00\x00\\\x01\x00\x00\x88\x00\x00\x00\xb9\x00\x00\x008\xff\xff\xff\xf2\x00\x00\x008\xff\xff\xff8\xff\xff\xff1\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Y\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xc9\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff`\x01\x00\x008\xff\xff\xff
8\xff\xff\xff\x0e\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xba\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xba\x00\x00\x008\xff\xff\xff\x1d\x01\x00\x00\xc9\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x89\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xf7\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff$\x01\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\'\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xab\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xb2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffB\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xdc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9c\x00\x00\x00\xb2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9c\x00\x00\x008\xff\xff\xff\xeb\x00\x00\x00\xab\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xffk\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xd9\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x96\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xffY\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x97\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xe4\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff.\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xdc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe4\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x97\x00\x00\x00\n', 
"\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xc5\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf2\x00\x00\x008\xff\xff\xff\x96\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x81\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xd7\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xd2\x00\x00\x008\xff\xff\xff8\xff\xff\xff`\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xba\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xba\x00\x00\x008\xff\xff\xff\x1d\x01\x00\x00\xf1\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff<\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xf1\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff$\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xffO\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb9\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xe3\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb4\x00\x00\x008\xff\xff\xff8\xff\xff\xff.\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9c\x00\x00\x00\xe8\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9c\x00\x00\x008\xff\xff\xff\xeb\x00\x00\x00\xd3\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x1e\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xd3\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x81\x01\x00\x00\x17\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xa5\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x15\x01\x00\x008\xff\xff\xff\xb2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xa0\x00\x00\x008\xff\xff\xff8\xff\xff\xff.\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x1a\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xbf\x00\x00\x00V\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8
\xff\xff\xff8\xff\xff\xff\xbf\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\xe3\x00\x00\x008\xff\xff\xff8\xff\xff\xff+\x00\x00\x008\xff\xff\xff\x91\x00\x00\x008\xff\xff\xff\xb9\x00\x00\x00\xe3\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x94\x00\x00\x008\xff\xff\xff\x05\x00\x00\x008\xff\xff\xff\xa0\x00\x00\x00x\x00\x00\x008\xff\xff\xffB\x01\x00\x008\xff\xff\xff8\xff\xff\xff\x13\x01\x00\x008\xff\xff\xff\xfc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\\\x00\x00\x008\xff\xff\xff8\xff\xff\xffV\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\\\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x15\x01\x00\x00i\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff}\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x15\x01\x00\x008\xff\xff\xff\xf1\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf1\xff\xff\xff8\xff\xff\xff8\xff\xff\xffx\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffE\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffH\x00\x00\x00\xa8\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffH\x00\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x91\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff}\x00\x00\x00O\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x91\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xd3\x00\x00\x00O\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xf1\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x8c\x00\x00\x00\xed\x00\x00\x00c\x00\x00\x008\xff\xff\xff8\xff\xff\xff'\x01\x00\x00\xdf\x01\x00\x00i\x01\x00\x00\x91\xff\xff\xff\x96\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xa8\x00\x00\x00G\x01\x00\x00\xb8\x00\x00\x00\x96\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xce\x00\x00\x00\x91\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xcd\x00\x00\x00i\x00\x00\x00c
\x00\x00\x008\xff\xff\xff\xcd\x00\x00\x00}\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xcd\x00\x00\x00\xbf\x00\x00\x00c\x00\x00\x008\xff\xff\xff\xce\x00\x00\x00\xf1\xff\xff\xff8\xff\xff\xff\xcd\x00\x00\x00x\x00\x00\x00\x01\x01\x00\x00O\x00\x00\x008\xff\xff\xff\xcd\x00\x00\x00\x13\x01\x00\x00\xf3\x01\x00\x00U\x01\x00\x00\x91\xff\xff\xff\xaa\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xbc\x00\x00\x00G\x01\x00\x00\xa4\x00\x00\x00\xaa\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x01\x00\x00\x00\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff8\xff\xff\xffq\x01\x00\x00\x00\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff\x1f\x01\x00\x008\xff\xff\xff8\xff\xff\xff\xeb\x00\x00\x008\xff\xff\xff\x11\x02\x00\x00\x87\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffe\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcf\x00\x00\x00@\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x1a\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xda\x00\x00\x008\xff\xff\xff8\xff\xff\xffB\x01\x00\x00q\x01\x00\x008\xff\xff\xff8\xff\xff\xff\x07\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff9\x01\x00\x008\xff\xff\xff\xf2\x00\x00\x00\x11\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x01\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\x9d\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xfc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xbc\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x10\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe9\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x1b\x01\x00\x008\xff\xff\xff\
xc0\x00\x00\x00\xf3\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xfc\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xc6\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xbc\x00\x00\x008\xff\xff\xff8\xff\xff\xff$\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf3\x01\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xb2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb2\x00\x00\x008\xff\xff\xff\xe4\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xdf\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x94\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x94\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff'\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x
00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xffi\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xffi\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffE\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xffU\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x85\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x0e\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x91\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff}\x00\x00\x00\xb9\x00\x00\x008\xff\xff\xff8\xff\xff\xff}\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff'\x01\x00\x00z\x00\x00\x008\xff\xff\xff8\xff\xff\xffU\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x99\x00\x00\x00\xce\x00\x00\x00\x9c\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xf0\x00\x00\x00\xce\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\x91\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffi\x00\x00\x00\xcd\x00\x00\x008\xff\xff\xff8\xff\xff\xffi\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x13\x01\x00\x00\x8e\x00\x00\x008\xff\xff\xff8\xff\xff\xffU\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x85\x00\x00\x00\xe2\x00\x00\x00\x88\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xdc\x00\x00\x00\xe2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xcd\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x14\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x8e\x00\x00\x008\xff\xff\xff8\xff\xff\xffi\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x81\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x81\x00\x00\x00\x80\x00\x00\x002\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff
8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x01\x00\x00O\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffB\x01\x00\x00O\x00\x00\x00\xb2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff$\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xb2\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb2\x00\x00\x008\xff\xff\xffB\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x94\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x94\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff>\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\x9e\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x12\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\\\x01\x00\x00\xdd\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x9e\x01\x00\x008\xff\xff\xff8
\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x12\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xab\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x80\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xf4\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xd4\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe4\x00\x00\x008\xff\xff\xff\x12\x02\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xb6\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x7f\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xc6\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\xff\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xb6\x00\x00\x008\xff\xff\xff\x86\x00\x00\x008\xff\xff\xff\x7f\x00\x00\x00\xe8\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff1\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xffr\x00\x00\x008\xff\xff\xff8\xff\xff\xff\x1a\x01\x00\x00\x00\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xff\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x96\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xe8\x00\x00\x002\x01\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x13\x01\x00\x008\xff\xff\xff\x9c\x00\x00\x008\xff\xff\xff
\xaa\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x9c\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xf9\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xdb\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0e\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc2\x01\x00\x00\xdc\x00\x00\x008\xff\xff\xff8\xff\xff\xff)\x01\x00\x008\xff\xff\xff\xe2\x00\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\xf4\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xffG\x01\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff\xf4\x01\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\xd6\x01\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff8\xff\xff\xff8\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x008\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"] ilm_input = xhlx_stdout hlx_input = ['seq1:GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC\n'] xhlx_input = 
['seq1:GGCTAGATAGCTCAGATGGTAGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_infernal.py000644 000765 000024 00000070446 12024702176 022444 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # test_infernal.py from os import getcwd, remove, rmdir, mkdir, path import tempfile, shutil from cogent.core.moltype import DNA, RNA, PROTEIN from cogent.core.alignment import DataError from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.infernal import Cmalign, Cmbuild, Cmcalibrate, Cmemit, Cmscore,\ Cmsearch, Cmstat, cmbuild_from_alignment, cmbuild_from_file, \ cmalign_from_alignment, cmalign_from_file, cmsearch_from_alignment,\ cmsearch_from_file from cogent.parse.rfam import MinimalRfamParser, ChangedRnaSequence, \ ChangedSequence from cogent.format.stockholm import stockholm_from_alignment from cogent.struct.rna2d import ViennaStructure, wuss_to_vienna __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" class GeneralSetUp(TestCase): def setUp(self): """Infernal general setUp method for all tests""" self.seqs1_unaligned = {'1':'ACUGCUAGCUAGUAGCGUACGUA',\ '2':'GCUACGUAGCUAC',\ '3':'GCGGCUAUUAGAUCGUA'} self.struct1_unaligned_string = '....(((...)))....' self.seqs1_unaligned_gaps = {'1':'ACUGCUAGCUAGU-AGCGUAC--GUA',\ '2':'--GCUACGUAGCUAC',\ '3':'GCGGCUAUUAGAUCGUA--'} self.seqs2_aligned = {'a': 'UAGGCUCUGAUAUAAUAGCUCUC---------',\ 'c': '------------UGACUACGCAU---------',\ 'b': '----UAUCGCUUCGACGAUUCUCUGAUAGAGA'} self.seqs2_unaligned = {'a': 'UAGGCUCUGAUAUAAUAGCUCUC',\ 'c': 'UGACUACGCAU',\ 'b': 'UAUCGCUUCGACGAUUCUCUGAUAGAGA'} self.struct2_aligned_string = '............((.(...)))..........' 
self.struct2_aligned_dict = {'SS_cons':self.struct2_aligned_string} self.lines2 = stockholm_from_alignment(aln=self.seqs2_aligned,\ GC_annotation=self.struct2_aligned_dict) #self.seqs1 aligned to self.seqs2 with self.seqs2 included. self.seqs1_and_seqs2_aligned = \ {'a': 'UAGGCUCUGAUAUAAUAGC-UCUC---------',\ 'b': '----UAUCGCUUCGACGAU-UCUCUGAUAGAGA',\ 'c': '------------UGACUAC-GCAU---------',\ '1': '-ACUGCUAGCUAGUAGCGUACGUA---------',\ '2': '----------GCUACGUAG-CUAC---------',\ '3': '-----GCGGCUAUUAG-AU-CGUA---------',\ } self.seqs1_and_seqs2_aligned_struct_string = \ '............((.(....)))..........' #self.seqs1 aligned to self.seqs2 without self.seqs2 included. self.seqs1_aligned = \ {'1': 'ACUGCUAGCUAGUAGCGUACGUA',\ '2': '---------GCUACGUAG-CUAC',\ '3': '----GCGGCUAUUAG-AU-CGUA',\ } self.seqs1_aligned_struct_string = \ '...........((.(....))).' self.temp_dir = tempfile.mkdtemp() self.temp_dir_spaces = '/tmp/test for infernal/' try: mkdir(self.temp_dir_spaces) except OSError: pass try: #create sequence files f = open(path.join(self.temp_dir, 'seqs1.sto'),'w') f.write(self.lines2) f.close() #create cm file. self.cmfile = path.join(self.temp_dir, 'aln2.cm') cm = open(self.cmfile,'w') cm.write(ALN1_CM) cm.close() #create alignment file used to create cm file. 
self.aln2_file = path.join(self.temp_dir, 'aln2.sto') af = open(self.aln2_file,'w') af.write(self.lines2) af.close() except OSError: pass class CmalignTests(GeneralSetUp): """Tests for the Cmalign application controller""" def test_base_command(self): """Infernal BaseCommand should return the correct BaseCommand""" c = Cmalign() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmalign'])) c.Parameters['-l'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmalign -l'])) def test_changing_working_dir(self): """Infernal BaseCommand should change according to WorkingDir""" c = Cmalign(WorkingDir='/tmp/cmalign_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmalign_test','/"; ','cmalign'])) c = Cmalign() c.WorkingDir = '/tmp/cmalign_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmalign_test2','/"; ','cmalign'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cmalign_test') rmdir('/tmp/cmalign_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) def test_cmalign_from_alignment(self): """cmalign_from_alignment should work as expected. """ #Align with cmalign_from_alignment without original alignment. aln, struct = cmalign_from_alignment(aln=self.seqs2_aligned,\ structure_string=self.struct2_aligned_string,\ seqs=self.seqs1_unaligned_gaps,moltype=RNA,include_aln=False) #Check correct alignment self.assertEqual(aln.todict(),self.seqs1_aligned) #Check correct struct self.assertEqual(wuss_to_vienna(str(struct)),\ self.seqs1_aligned_struct_string) #should work with gapped seqs. Need to test this is taken care of # since cmalign segfaults when there are gaps in the seqs to be aligned. 
aln, struct = cmalign_from_alignment(aln=self.seqs2_aligned,\ structure_string=self.struct2_aligned_string,\ seqs=self.seqs1_unaligned_gaps,moltype=RNA) #alignment should be correct self.assertEqual(aln.todict(),self.seqs1_and_seqs2_aligned) #structure should be correct self.assertEqual(wuss_to_vienna(str(struct)),\ self.seqs1_and_seqs2_aligned_struct_string) #should work with ungapped seqs. aln, struct = cmalign_from_alignment(aln=self.seqs2_aligned,\ structure_string=self.struct2_aligned_string,\ seqs=self.seqs1_unaligned,moltype=RNA) #alignment should be correct self.assertEqual(aln.todict(),self.seqs1_and_seqs2_aligned) #structure should be correct self.assertEqual(wuss_to_vienna(str(struct)),\ self.seqs1_and_seqs2_aligned_struct_string) #should return standard out aln, struct,stdout = cmalign_from_alignment(aln=self.seqs2_aligned,\ structure_string=self.struct2_aligned_string,\ seqs=self.seqs1_unaligned_gaps,moltype=RNA,\ return_stdout=True) #Test that standard out is same length as expected self.assertEqual(len(stdout.split('\n')),\ len(CMALIGN_STDOUT.split('\n'))) def test_cmalign_from_file(self): """cmalign_from_file should work as expected. """ #Align with cmalign_from_file without original alignment. aln,struct = cmalign_from_file(cm_file_path=self.cmfile,\ seqs=self.seqs1_unaligned,\ moltype=RNA) #Check correct alignment self.assertEqual(aln.todict(),self.seqs1_aligned) #Check correct struct self.assertEqual(wuss_to_vienna(str(struct)),\ self.seqs1_aligned_struct_string) #Align with cmalign_from_file using original alignment.
aln,struct = cmalign_from_file(cm_file_path=self.cmfile,\ seqs=self.seqs1_unaligned,\ moltype=RNA,\ alignment_file_path=self.aln2_file,\ include_aln=True) #alignment should be correct self.assertEqual(aln.todict(),self.seqs1_and_seqs2_aligned) #structure should be correct self.assertEqual(wuss_to_vienna(str(struct)),\ self.seqs1_and_seqs2_aligned_struct_string) class CmbuildTests(GeneralSetUp): """Tests for the Cmbuild application controller""" def test_base_command(self): """Infernal BaseCommand should return the correct BaseCommand""" c = Cmbuild() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmbuild'])) c.Parameters['-A'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmbuild -A'])) def test_changing_working_dir(self): """Infernal BaseCommand should change according to WorkingDir""" c = Cmbuild(WorkingDir='/tmp/cmbuild_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmbuild_test','/"; ','cmbuild'])) c = Cmbuild() c.WorkingDir = '/tmp/cmbuild_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmbuild_test2','/"; ','cmbuild'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cmbuild_test') rmdir('/tmp/cmbuild_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) def test_cmbuild_from_alignment(self): """cmbuild_from_alignment should work as expected. """ #Test unaligned seqs and unaligned struct fail. #DataError should be raised when Alignment is constructed self.assertRaises(DataError,cmbuild_from_alignment,\ self.seqs1_unaligned,self.struct1_unaligned_string) #Test aligned seqs and unaligned struct fail. self.assertRaises(ValueError,cmbuild_from_alignment,\ self.seqs2_aligned,self.struct1_unaligned_string) #Test get cm back without alignment.
cm_res = cmbuild_from_alignment(self.seqs2_aligned,\ self.struct2_aligned_string) cm_lines = cm_res.split('\n') ALN1_CM_lines = ALN1_CM.split('\n') #Check that the same number of lines are in both CMs self.assertEqual(len(cm_lines),len(ALN1_CM_lines)) #The first 13 lines are unique to the specific run. The rest of the # CM should be the same, since built from the same data. self.assertEqual(cm_lines[13:],ALN1_CM_lines[13:]) #Make sure same alignment is returned if return_alignment=True cm_res, cm_aln = cmbuild_from_alignment(self.seqs2_aligned,\ self.struct2_aligned_string,return_alignment=True) self.assertEqual(cm_aln,self.lines2) def test_cmbuild_from_file(self): """cmbuild_from_file should work as expected. """ cm_res = cmbuild_from_file(self.temp_dir+'/seqs1.sto') cm_lines = cm_res.split('\n') ALN1_CM_lines = ALN1_CM.split('\n') #Check that the same number of lines are in both CMs self.assertEqual(len(cm_lines),len(ALN1_CM_lines)) #The first 13 lines are unique to the specific run. The rest of the # CM should be the same, since built from the same data.
self.assertEqual(cm_lines[13:],ALN1_CM_lines[13:]) #Make sure same alignment is returned if return_alignment=True cm_res, cm_aln = cmbuild_from_alignment(self.seqs2_aligned,\ self.struct2_aligned_string,return_alignment=True) self.assertEqual(cm_aln,self.lines2) class CmcalibrateTests(GeneralSetUp): """Tests for the Cmcalibrate application controller""" def test_base_command(self): """Infernal BaseCommand should return the correct BaseCommand""" c = Cmcalibrate() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmcalibrate'])) c.Parameters['--mpi'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmcalibrate --mpi'])) def test_changing_working_dir(self): """Infernal BaseCommand should change according to WorkingDir""" c = Cmcalibrate(WorkingDir='/tmp/cmcalibrate_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmcalibrate_test','/"; ','cmcalibrate'])) c = Cmcalibrate() c.WorkingDir = '/tmp/cmcalibrate_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmcalibrate_test2','/"; ','cmcalibrate'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cmcalibrate_test') rmdir('/tmp/cmcalibrate_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) class CmemitTests(GeneralSetUp): """Tests for the Cmemit application controller""" def test_base_command(self): """Infernal BaseCommand should return the correct BaseCommand""" c = Cmemit() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmemit'])) c.Parameters['-u'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmemit -u'])) def test_changing_working_dir(self): """Infernal BaseCommand should change according to WorkingDir""" c = Cmemit(WorkingDir='/tmp/cmemit_test') self.assertEqual(c.BaseCommand,\ 
''.join(['cd "','/tmp/cmemit_test','/"; ','cmemit'])) c = Cmemit() c.WorkingDir = '/tmp/cmemit_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmemit_test2','/"; ','cmemit'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cmemit_test') rmdir('/tmp/cmemit_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) class CmscoreTests(GeneralSetUp): """Tests for the Cmscore application controller""" def test_base_command(self): """Infernal BaseCommand should return the correct BaseCommand""" c = Cmscore() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmscore'])) c.Parameters['-l'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmscore -l'])) def test_changing_working_dir(self): """Infernal BaseCommand should change according to WorkingDir""" c = Cmscore(WorkingDir='/tmp/cmscore_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmscore_test','/"; ','cmscore'])) c = Cmscore() c.WorkingDir = '/tmp/cmscore_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmscore_test2','/"; ','cmscore'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cmscore_test') rmdir('/tmp/cmscore_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) class CmsearchTests(GeneralSetUp): """Tests for the Cmsearch application controller""" def test_base_command(self): """Infernal BaseCommand should return the correct BaseCommand""" c = Cmsearch() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmsearch'])) c.Parameters['-p'].on() 
self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmsearch -p'])) def test_changing_working_dir(self): """Infernal BaseCommand should change according to WorkingDir""" c = Cmsearch(WorkingDir='/tmp/cmsearch_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmsearch_test','/"; ','cmsearch'])) c = Cmsearch() c.WorkingDir = '/tmp/cmsearch_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmsearch_test2','/"; ','cmsearch'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cmsearch_test') rmdir('/tmp/cmsearch_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) def test_cmsearch_from_alignment_no_hits(self): """cmsearch_from_alignment should work as expected """ search_res = cmsearch_from_alignment(aln=self.seqs2_aligned,\ structure_string=self.struct2_aligned_string,\ seqs=self.seqs1_unaligned,moltype=RNA) self.assertEqual(search_res,[]) def test_cmsearch_from_alignment(self): """cmsearch_from_alignment should work as expected """ exp_search_res = [['a', 5, 23, 1, 19, 12.85, '-', 37],\ ['b', 1, 19, 1, 19, 14.359999999999999, '-', 47]] search_res = cmsearch_from_alignment(aln=self.seqs2_aligned,\ structure_string=self.struct2_aligned_string,\ seqs=self.seqs2_unaligned,moltype=RNA) for search, exp in zip(search_res, exp_search_res): self.assertEqual(search[1:],exp) def test_cmsearch_from_file_no_hits(self): """cmsearch_from_file should work as expected """ search_res = cmsearch_from_file(cm_file_path=self.cmfile,\ seqs=self.seqs1_unaligned,moltype=RNA) self.assertEqual(search_res,[]) def test_cmsearch_from_file(self): """cmsearch_from_file should work as expected """ exp_search_res = [['a', 5, 23, 1, 19, 12.85, '-', 37],\ ['b', 1, 19, 1, 19, 14.359999999999999, '-', 47]] search_res = 
cmsearch_from_file(cm_file_path=self.cmfile,\ seqs=self.seqs2_unaligned,moltype=RNA) for search, exp in zip(search_res, exp_search_res): self.assertEqual(search[1:],exp) class CmstatTests(GeneralSetUp): """Tests for the Cmstat application controller""" def test_base_command(self): """Infernal BaseCommand should return the correct BaseCommand""" c = Cmstat() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmstat'])) c.Parameters['-g'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','cmstat -g'])) def test_changing_working_dir(self): """Infernal BaseCommand should change according to WorkingDir""" c = Cmstat(WorkingDir='/tmp/cmstat_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmstat_test','/"; ','cmstat'])) c = Cmstat() c.WorkingDir = '/tmp/cmstat_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/cmstat_test2','/"; ','cmstat'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/cmstat_test') rmdir('/tmp/cmstat_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) ALN1_CM = """INFERNAL-1 [1.0rc1] NAME aln1-1 STATES 61 NODES 18 ALPHABET 1 ELSELF -0.08926734 WBETA 1e-07 NSEQ 3 EFFNSEQ 3.000 CLEN 19 BCOM cmbuild aln1.cm aln1.sto BDATE Sun Oct 5 18:45:35 2008 NULL 0.000 0.000 0.000 0.000 MODEL: [ ROOT 0 ] S 0 -1 0 1 4 -2.071 -2.210 -1.649 -2.140 IL 1 1 2 1 4 -0.556 -5.022 -1.818 -7.508 0.000 0.000 0.000 0.000 IR 2 2 3 2 3 -0.310 -2.439 -6.805 0.000 0.000 0.000 0.000 [ MATL 1 ] ML 3 2 3 5 3 -8.003 -0.020 -6.657 -0.389 0.377 -1.236 0.597 D 4 2 3 5 3 -7.923 -3.436 -0.146 IL 5 5 3 5 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 2 ] ML 6 5 3 8 3 -8.003 -0.020 -6.657 0.711 -1.015 -1.162 0.507 D 7 5 3 8 3 -7.923 -3.436 -0.146 IL 8 8 3 8 3 -1.442 -0.798 -4.142 0.000 0.000 
0.000 0.000 [ MATL 3 ] ML 9 8 3 11 3 -8.003 -0.020 -6.657 -0.389 0.377 -1.236 0.597 D 10 8 3 11 3 -7.923 -3.436 -0.146 IL 11 11 3 11 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 4 ] ML 12 11 3 14 3 -8.003 -0.020 -6.657 -0.392 0.246 -1.238 0.703 D 13 11 3 14 3 -7.923 -3.436 -0.146 IL 14 14 3 14 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 5 ] ML 15 14 3 17 3 -8.003 -0.020 -6.657 -1.340 -2.411 1.644 -1.777 D 16 14 3 17 3 -7.923 -3.436 -0.146 IL 17 17 3 17 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 6 ] ML 18 17 3 20 3 -8.003 -0.020 -6.657 0.830 0.106 -1.204 -0.492 D 19 17 3 20 3 -7.923 -3.436 -0.146 IL 20 20 3 20 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 7 ] ML 21 20 3 23 3 -8.003 -0.020 -6.657 -1.143 -1.575 -1.925 1.560 D 22 20 3 23 3 -7.923 -3.436 -0.146 IL 23 23 3 23 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 8 ] ML 24 23 3 26 3 -8.391 -0.018 -6.709 0.821 -1.044 -1.178 0.385 D 25 23 3 26 3 -6.905 -0.258 -2.688 IL 26 26 3 26 3 -1.925 -0.554 -4.164 0.000 0.000 0.000 0.000 [ MATR 9 ] MR 27 26 3 29 5 -7.411 -0.031 -7.227 -7.439 -8.330 -0.726 0.967 -1.567 0.142 D 28 26 3 29 5 -5.352 -0.707 -2.978 -4.409 -2.404 IR 29 29 3 29 5 -2.408 -0.496 -5.920 -4.087 -5.193 0.000 0.000 0.000 0.000 [ MATP 10 ] MP 30 29 3 34 6 -9.266 -9.205 -0.019 -7.982 -8.261 -8.656 -1.570 -1.865 -1.898 0.327 -1.331 -2.318 0.651 0.994 -1.872 0.282 -2.224 -0.666 1.972 -1.608 -0.242 1.187 ML 31 29 3 34 6 -6.250 -6.596 -1.310 -1.005 -6.446 -3.975 0.660 -0.612 -0.293 -0.076 MR 32 29 3 34 6 -6.988 -5.717 -1.625 -5.695 -0.829 -3.908 0.660 -0.612 -0.293 -0.076 D 33 29 3 34 6 -9.049 -7.747 -3.544 -4.226 -4.244 -0.319 IL 34 34 5 34 6 -2.579 -2.842 -0.760 -4.497 -5.274 -4.934 0.000 0.000 0.000 0.000 IR 35 35 6 35 5 -2.408 -0.496 -5.920 -4.087 -5.193 0.000 0.000 0.000 0.000 [ MATP 11 ] MP 36 35 6 40 4 -7.331 -7.538 -0.041 -5.952 -4.114 0.397 -4.664 0.815 -4.665 -4.015 -0.462 -4.315 -3.939 3.331 -3.732 -0.830 -0.398 -3.640 -1.958 -3.517 ML 37 35 6 40 4 -3.758 
-3.940 -0.507 -2.670 0.660 -0.612 -0.293 -0.076 MR 38 35 6 40 4 -4.809 -3.838 -1.706 -0.766 0.660 -0.612 -0.293 -0.076 D 39 35 6 40 4 -4.568 -4.250 -2.265 -0.520 IL 40 40 5 40 4 -1.686 -2.369 -1.117 -4.855 0.000 0.000 0.000 0.000 IR 41 41 6 41 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 12 ] ML 42 41 6 44 5 -7.411 -0.031 -7.227 -7.439 -8.330 1.826 -2.947 -2.856 -2.413 D 43 41 6 44 5 -4.959 -0.803 -4.221 -2.596 -2.508 IL 44 44 3 44 5 -2.408 -0.496 -4.087 -5.920 -5.193 0.000 0.000 0.000 0.000 [ MATP 13 ] MP 45 44 3 49 4 -7.331 -7.538 -0.041 -5.952 -1.592 -1.722 -1.807 0.471 -1.387 -2.146 1.822 0.774 -1.836 0.505 -2.076 -0.521 1.055 -1.515 -0.260 0.958 ML 46 44 3 49 4 -3.758 -3.940 -0.507 -2.670 0.660 -0.612 -0.293 -0.076 MR 47 44 3 49 4 -4.809 -3.838 -1.706 -0.766 0.660 -0.612 -0.293 -0.076 D 48 44 3 49 4 -4.568 -4.250 -2.265 -0.520 IL 49 49 5 49 4 -1.686 -2.369 -1.117 -4.855 0.000 0.000 0.000 0.000 IR 50 50 6 50 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 14 ] ML 51 50 6 53 3 -8.323 -0.016 -6.977 0.481 -1.091 -0.011 0.192 D 52 50 6 53 3 -6.174 -1.687 -0.566 IL 53 53 3 53 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 15 ] ML 54 53 3 56 3 -8.323 -0.016 -6.977 1.148 -1.570 -0.075 -1.007 D 55 53 3 56 3 -6.174 -1.687 -0.566 IL 56 56 3 56 3 -1.442 -0.798 -4.142 0.000 0.000 0.000 0.000 [ MATL 16 ] ML 57 56 3 59 2 * 0.000 -0.726 0.967 -1.567 0.142 D 58 56 3 59 2 * 0.000 IL 59 59 3 59 2 -1.823 -0.479 0.000 0.000 0.000 0.000 [ END 17 ] E 60 59 3 -1 0 // """ CMALIGN_STDOUT = """# cmalign :: align sequences to an RNA CM # INFERNAL 1.0rc1 (June 2008) # Copyright 2007-2009 (C) 2008 HHMI Janelia Farm Research Campus # Freely distributed under the GNU General Public License (GPL) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # command: cmalign --withali aln1.sto -o all_aligned.sto aln1.cm seqs1.fasta # date: Sun Oct 5 22:04:30 2008 # # cm name algorithm config sub bands tau # ------------------------- --------- ------ --- 
----- ------ # aln1-1 opt acc global no hmm 1e-07 # # bit scores # ------------------ # seq idx seq name len total struct avg prob elapsed # ------- -------- ----- -------- -------- -------- ----------- 1 1 23 -9.98 5.71 0.260 00:00:00.01 2 2 13 -6.79 6.73 0.710 00:00:00.00 3 3 17 -7.43 5.86 0.754 00:00:00.01 # Alignment saved in file all_aligned.sto. # # CPU time: 0.02u 0.00s 00:00:00.02 Elapsed: 00:00:00 """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_knetfold.py000644 000765 000024 00000003666 12024702176 022454 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.app.knetfold import Knetfold __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class KnetfoldTest(TestCase): """Tests for Knetfold application controller""" def setUp(self): self.input = knetfold_input def test_input_as_lines(self): """Test Knetfold stdout input as lines""" k = Knetfold(InputHandler='_input_as_lines') res = k(self.input) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_input_as_string(self): """Test Knetfold stdout input as string""" k = Knetfold() f = open('/tmp/single.fasta','w') f.write('\n'.join(self.input)) f.close() res = k('/tmp/single.fasta') self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() remove('/tmp/single.fasta') def test_get_result_path(self): """Tests knetfold result path""" k = Knetfold(InputHandler='_input_as_lines') res = k(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus', 'ct','coll','sec','fasta','pdf','mx0','mx1','mx2','mx3']) self.assertEqual(res['ExitStatus'],0) assert res['ct'] is not None assert 
res['coll'] is not None assert res['sec'] is not None assert res['fasta'] is not None res.cleanUp() knetfold_input = ['>seq1\n', 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n', '\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_mafft.py000644 000765 000024 00000012340 12024702176 021730 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import getcwd, remove, rmdir, mkdir, path import tempfile, shutil from cogent.core.moltype import RNA from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.mafft import Mafft, align_unaligned_seqs, \ add_seqs_to_alignment, align_two_alignments __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" class GeneralSetUp(TestCase): def setUp(self): """Mafft general setUp method for all tests""" self.seqs1 = ['ACUGCUAGCUAGUAGCGUACGUA','GCUACGUAGCUAC', 'GCGGCUAUUAGAUCGUA'] self.labels1 = ['>1','>2','>3'] self.lines1 = flatten(zip(self.labels1,self.seqs1)) self.aligned1 = {'1': 'acugcuagcuaguagcguacgua',\ '2': 'gcuacguagcuac----------',\ '3': 'gcggcuauuagau------cgua',\ } self.seqs2=['UAGGCUCUGAUAUAAUAGCUCUC','UAUCGCUUCGACGAUUCUCUGAUAGAGA', 'UGACUACGCAU'] self.labels2=['>a','>b','>c'] self.lines2 = flatten(zip(self.labels2,self.seqs2)) self.aligned2 = {'a': 'UAGGCUCUGAUAUAAUAGCUCUC---------',\ 'b': 'UA----UCGCUUCGACGAUUCUCUGAUAGAGA',\ 'c': 'UG------------ACUACGCAU---------',\ } self.temp_dir = tempfile.mkdtemp() self.temp_dir_spaces = '/tmp/test for mafft/' try: mkdir(self.temp_dir_spaces) except OSError: pass try: #create sequence files f = open(path.join(self.temp_dir, 'seq1.txt'),'w') f.write('\n'.join(self.lines1)) f.close() g = open(path.join(self.temp_dir, 'seq2.txt'),'w') g.write('\n'.join(self.lines2)) g.close() 
except OSError: pass class MafftTests(GeneralSetUp): """Tests for the Mafft application controller""" def test_base_command(self): """Mafft BaseCommand should return the correct BaseCommand""" c = Mafft() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','mafft'])) c.Parameters['--quiet'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','mafft --quiet'])) c.Parameters['--globalpair'].on() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','mafft --globalpair --quiet'])) c.Parameters['--maxiterate'].on(1000) self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ',"""mafft --maxiterate 1000 --globalpair --quiet"""])) def test_changing_working_dir(self): """Mafft BaseCommand should change according to WorkingDir""" c = Mafft(WorkingDir='/tmp/mafft_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/mafft_test','/"; ','mafft'])) c = Mafft() c.WorkingDir = '/tmp/mafft_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/mafft_test2','/"; ','mafft'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/mafft_test') rmdir('/tmp/mafft_test2') def test_general_cleanUp(self): """Last test executed: cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) def test_align_unaligned_seqs(self): """align_unaligned_seqs should work as expected""" res = align_unaligned_seqs(self.seqs1, RNA) self.assertEqual(res.toFasta(), align1) res = align_unaligned_seqs(self.lines2, RNA) self.assertEqual(res.toFasta(), align2) def test_add_seqs_to_alignment(self): """add_seqs_to_alignment should work as expected.""" res = add_seqs_to_alignment(self.lines1,self.aligned2, RNA) self.assertEqual(res.toFasta(), add_seqs_align) def test_align_two_alignments(self): """align_two_alignments should work as expected.""" res = align_two_alignments(self.aligned1, 
self.aligned2, RNA) self.assertEqual(res.toFasta(), align_two_align) align1 = ">seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\nGCUACGUAGCUAC----------\n>seq_2\nGCGGCUAUUAGAU------CGUA" align2 = ">a\nUAGGCUCUGAUAUAAUAGCUCUC---------\n>b\nUA----UCGCUUCGACGAUUCUCUGAUAGAGA\n>c\nUG------------ACUACGCAU---------" add_seqs_align = """>1\nACUGC-UAGCUAGUAGCGUACGUA--------\n>2\nGCUACGUAGCUA-----------C--------\n>3\nGCGGCUAUUAGAUCGUA---------------\n>a\nUAGGCUCUGAUAUAAUAGCUCUC---------\n>b\nUA----UCGCUUCGACGAUUCUCUGAUAGAGA\n>c\nUG------------ACUACGCAU---------""" align_two_align = """>1\nACUGCUAGCUAGUAGCGUACGUA---------\n>2\nGCUACGUAGCUAC-------------------\n>3\nGCGGCUAUUAGAU------CGUA---------\n>a\nUAGGCUCUGAUAUAAUAGCUCUC---------\n>b\nUA----UCGCUUCGACGAUUCUCUGAUAGAGA\n>c\nUG------------ACUACGCAU---------""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_mfold.py000644 000765 000024 00000005203 12024702176 021734 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.app.mfold import Mfold __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class MfoldTest(TestCase): """Tests for Mfold application controller""" def setUp(self): self.input = mfold_input def test_stdout_input_as_lines(self): """Test Mfold stdout input as lines""" m = Mfold(InputHandler='_input_as_lines') res = m(self.input) #Impossible to compare stdout since tmp filenames in app controller #can't be predicted and are in stdout #Test exitstatus = 0 and stdout is not none self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_stdout_input_as_string(self): """Test Mfold stdout input as string""" m = Mfold() f = 
open('/tmp/single.fasta','w') f.write('\n'.join(self.input)) f.close() res = m('/tmp/single.fasta') #Impossible to compare stdout since tmp filenames in app controller #can't be predicted and are in stdout #Test exitstatus = 0 and stdout is not none assert res['StdOut'] is not None self.assertEqual(res['ExitStatus'],0) res.cleanUp() remove('/tmp/single.fasta') def test_get_result_path(self): """Tests mfold result path""" m = Mfold(InputHandler='_input_as_lines') res = m(self.input) self.assertEqualItems(res.keys(),['ps','_1.ps','_1.ss','StdOut', 'StdErr', 'ExitStatus', 'ct_all', 'ct1', 'log', 'ann', 'h-num', 'det', 'pnt', 'sav', 'ss-count', '-local.seq', 'rnaml', 'out', 'plot','pdf1']) self.assertEqual(res['ExitStatus'],0) assert res['ct_all'] is not None assert res['log'] is not None assert res['ann'] is not None assert res['h-num'] is not None assert res['det'] is not None assert res['pnt'] is not None assert res['sav'] is not None assert res['ss-count'] is not None assert res['-local.seq'] is not None assert res['rnaml'] is not None assert res['out'] is not None assert res['plot'] is not None res.cleanUp() mfold_input = ['>seq1\n', 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n', '\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_mothur.py000644 000765 000024 00000013467 12024702176 022164 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # test_mothur.py from __future__ import with_statement from cStringIO import StringIO from os import remove, rmdir from tempfile import mkdtemp, mkstemp from cogent.util.unit_test import TestCase, main from cogent.app.mothur import Mothur, mothur_from_file __author__ = "Kyle Bittinger" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Kyle Bittinger", "Jose Carlos Clemente Litran"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kyle Bittinger" __email__ = "kylebittinger@gmail.com" __status__ = "Development" class 
MothurTests(TestCase): def setUp(self): self.small_fasta = ( '>aaaaaa\nTAGGCTCTGATATAATAGCTCTC---------\n' '>cccccc\n------------TGACTACGCAT---------\n' '>bbbbbb\n----TATCGCTTCGACGATTCTCTGATAGAGA\n' ) self.small_otus = ( 'unique\t3\taaaaaa\tcccccc\tbbbbbb\t\n' '0.62\t2\taaaaaa\tbbbbbb,cccccc\t\n' '0.67\t1\tbbbbbb,cccccc,aaaaaa\t\n' ) self.small_otus_parsed = [ (float('0'), [['aaaaaa'], ['cccccc'], ['bbbbbb']]), (float('0.62'), [['aaaaaa'], ['bbbbbb', 'cccccc']]), (float('0.67'), [['bbbbbb', 'cccccc', 'aaaaaa']]), ] self.complement_fasta = ( '>a\n--AGGGGTAATAA--\n' '>b\n--TTATTACCCCT--\n' '>c\n-------AAAAAA--\n' ) self.complement_otus = ( 'unique\t3\ta\tb\tc\t\n' '0.43\t2\tc,a\tb\t\n' '1.00\t1\tb,c,a\t\n' ) def test_get_help(self): """Mothur.getHelp() should return help string""" expected_help = ( 'See manual, available on the MOTHUR wiki:\n' 'http://schloss.micro.umass.edu/mothur/' ) self.assertEqual(Mothur.getHelp(), expected_help) def test_compile_mothur_script(self): """Mothur._compile_mothur_script() should return valid Mothur script""" app = Mothur() app._input_filename = 'test.fasta' observed_script = app._compile_mothur_script() expected_script = ( '#unique.seqs(fasta=test.fasta); ' 'dist.seqs(fasta=test.unique.fasta); ' 'read.dist(column=test.unique.dist, name=test.names); ' 'cluster(method=furthest)') self.assertEqual(observed_script, expected_script) def test_get_result_paths(self): """Mothur._get_result_paths() should guess correct output paths""" app = Mothur() app._input_filename = 'test.fasta' observed_paths = { 'distance matrix': app._derive_dist_path(), 'otu list': app._derive_list_path(), 'rank abundance': app._derive_rank_abundance_path(), 'species abundance': app._derive_species_abundance_path(), 'unique names': app._derive_names_path(), 'unique seqs': app._derive_unique_path(), } expected_paths = { 'distance matrix': 'test.unique.dist', 'otu list': 'test.unique.fn.list', 'rank abundance': 'test.unique.fn.rabund', 'species abundance': 
'test.unique.fn.sabund', 'unique names': 'test.names', 'unique seqs': 'test.unique.fasta', } self.assertEqual(observed_paths, expected_paths) def test_working_directory(self): """Mothur.WorkingDir attribute should not be cast to FilePath object""" app = Mothur(WorkingDir='/tmp') self.assertEquals(str(app.WorkingDir), '/tmp') def test_call_with_multiline_string(self): """Mothur.__call__() should return correct otu's for input as single string""" app = Mothur() result = app(self.small_fasta) observed_otus = result['otu list'].read() self.assertEquals(observed_otus, self.small_otus) result.cleanUp() def test_call_with_lines(self): """Mothur.__call__() should return correct otu's for input as lines""" lines = self.small_fasta.split('\n') app = Mothur(InputHandler='_input_as_lines') result = app(lines) observed_otus = result['otu list'].read() self.assertEquals(observed_otus, self.small_otus) result.cleanUp() def test_call_with_path(self): """Mothur.__call__() should return correct otu's for input as path""" working_dir = mkdtemp() _, filename = mkstemp(dir=working_dir, suffix='.fasta') with open(filename, 'w') as f: f.write(self.small_fasta) app = Mothur(InputHandler='_input_as_path', WorkingDir=working_dir) result = app(filename) observed_otus = result['otu list'].read() self.assertEquals(observed_otus, self.small_otus) remove(filename) result.cleanUp() rmdir(working_dir) def test_call_with_working_dir(self): """Mothur.__call__() should return correct otu's when input dir is changed""" working_dir = mkdtemp() app = Mothur(WorkingDir=working_dir) result = app(self.small_fasta) observed_otus = result['otu list'].read() self.assertEquals(observed_otus, self.small_otus) result.cleanUp() rmdir(working_dir) def test_call_with_complement(self): """Mothur.__call__() should return correct otu's for input sequences which are reverse complements""" app = Mothur() result = app(self.complement_fasta) observed_otus = result['otu list'].read() self.assertEquals(observed_otus, 
self.complement_otus) result.cleanUp() def test_mothur_from_file(self): """mothur_from_file() should return parsed otus""" f = StringIO(self.small_fasta) f.seek(0) parsed_otus = mothur_from_file(f) self.assertEquals(parsed_otus, self.small_otus_parsed) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_msms.py000644 000765 000024 00000005026 12024702176 021615 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os, tempfile from cogent.util.unit_test import TestCase, main from cogent.parse.pdb import PDBParser from cogent.app.msms import Msms, surface_xtra from cogent.struct.selection import einput __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" class MsmsTest(TestCase): """Tests for Msms application controller""" def setUp(self): self.input_file = os.path.join('data', '2E12.pdb') self.input_structure = PDBParser(open(self.input_file)) def test_stdout_input_from_entity(self): """Test Msms when input is an entity""" s = Msms() res = s(self.input_structure) self.assertEqual(res['ExitStatus'], 0) stdout = res['StdOut'].read() assert stdout.find('1634 spheres 0 collision only, radii 1.600 to 1.850') != -1 assert not res['StdOut'].read() assert res['ExitStatus'] == 0 assert list(sorted(res.keys())) == list(sorted(['FaceFile', 'StdOut', 'AreaFile', 'StdErr', 'VertFile', 'ExitStatus'])) af = res['AreaFile'].readlines() assert len(af) == 1635, len(af) ff = res['FaceFile'].readlines() assert ff[1].strip() == "#faces #sphere density probe_r" assert ff[2].strip() == "25310 1634 1.00 1.50" or \ ff[2].strip() == "51712 1634 1.00 1.50" assert len(ff) == 25313 or len(ff) == 51715 vf = res['VertFile'].readlines() assert vf[1] == '#vertex #sphere density probe_r\n' assert vf[2].strip() == '12657 1634 1.00 1.50' or \ vf[2].strip() == '25858
1634 1.00 1.50' assert len(vf) == 12660 or len(vf) == 25861 res.cleanUp() def test_surface_xtra(self): res = surface_xtra(self.input_structure) assert res.shape == (12658, 3) or res.shape == (25859, 3) assert res is self.input_structure.xtra['MSMS_SURFACE'] chains = einput(self.input_structure, 'C') chainA, chainB = chains.sortedvalues() resA = surface_xtra(chainA) assert len(resA) == 6223 or len(resA) == 12965 resB = surface_xtra(chainB) assert len(resB) == 6620 or len(resB) == 13390 assert chainB.xtra['MSMS_SURFACE'] is resB is \ self.input_structure[(0,)][('B',)].xtra['MSMS_SURFACE'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_muscle.py000644 000765 000024 00000026175 12024702176 022136 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import getcwd, remove, rmdir, mkdir, path from subprocess import Popen, PIPE, STDOUT import tempfile, shutil from cogent.core.moltype import RNA, DNA from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.muscle import Muscle, muscle_seqs, aln_tree_seqs, \ align_unaligned_seqs, build_tree_from_alignment, \ align_and_build_tree, add_seqs_to_alignment, align_two_alignments __author__ = "Catherine Lozupone" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Catherine Lozupone", "Rob Knight", "Daniel McDonald", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Catherine Lozupone" __email__ = "lozupone@colorado.edu" __status__ = "Development" class GeneralSetUp(TestCase): def setUp(self): """Muscle general setUp method for all tests""" # Check if muscle version is supported for this test acceptable_version = (3,6) command = "muscle -version" proc = Popen(command,shell=True,universal_newlines=True,\ stdout=PIPE,stderr=STDOUT) stdout = proc.stdout.read() version_string = stdout.strip().split(' ')[1].strip()[1:] try: version = tuple(map(int,version_string.split('.'))) pass_test = version[:2] == 
acceptable_version except ValueError: pass_test = False version_string = stdout self.assertTrue(pass_test,\ "Unsupported muscle version. %s is required, but running %s." \ % ('.'.join(map(str,acceptable_version)), version_string)) self.seqs1 = ['ACUGCUAGCUAGUAGCGUACGUA','GCUACGUAGCUAC', 'GCGGCUAUUAGAUCGUA'] self.labels1 = ['>1','>2','>3'] self.lines1 = flatten(zip(self.labels1,self.seqs1)) self.seqs2=['UAGGCUCUGAUAUAAUAGCUCUC','UAUCGCUUCGACGAUUCUCUGAUAGAGA', 'UGACUACGCAU'] self.labels2=['>a','>b','>c'] self.lines2 = flatten(zip(self.labels2,self.seqs2)) self.temp_dir = tempfile.mkdtemp() self.temp_dir_spaces = '/tmp/test for muscle/' try: mkdir(self.temp_dir_spaces) except OSError: pass try: #create sequence files f = open(path.join(self.temp_dir, 'seq1.txt'),'w') f.write('\n'.join(self.lines1)) f.close() g = open(path.join(self.temp_dir, 'seq2.txt'),'w') g.write('\n'.join(self.lines2)) g.close() except OSError: pass def tearDown(self): """cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) class MuscleTests(GeneralSetUp): """Tests for the Muscle application controller""" def test_base_command(self): """Muscle BaseCommand should return the correct BaseCommand""" c = Muscle() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','muscle'])) c.Parameters['-in'].on('seq.txt') self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','muscle -in "seq.txt"'])) c.Parameters['-cluster2'].on('neighborjoining') self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','muscle -cluster2 neighborjoining' + ' -in "seq.txt"'])) def test_maxmb(self): """maxmb option should not break Muscle""" app = Muscle() app.Parameters['-maxmb'].on('250') outfile = tempfile.NamedTemporaryFile() app.Parameters['-out'].on(outfile.name) infile = tempfile.NamedTemporaryFile() infile.write( ">Seq1\nAAAGGGTTTCCCCT\n" ">Seq2\nAAAGGGGGTTTCCACT\n") infile.flush() result = app(infile.name) 
observed = result['MuscleOut'].read() expected = ( ">Seq1\nAAA--GGGTTTCCCCT\n" ">Seq2\nAAAGGGGGTTTCCACT\n" ) self.assertEqual(observed, expected) def test_changing_working_dir(self): """Muscle BaseCommand should change according to WorkingDir""" c = Muscle(WorkingDir='/tmp/muscle_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/muscle_test','/"; ','muscle'])) c = Muscle() c.WorkingDir = '/tmp/muscle_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/muscle_test2','/"; ','muscle'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/muscle_test') rmdir('/tmp/muscle_test2') def test_aln_tree_seqs(self): "aln_tree_seqs returns the muscle alignment and tree from iteration2" tree, aln = aln_tree_seqs(path.join(self.temp_dir, 'seq1.txt'), tree_type="neighborjoining", WorkingDir=self.temp_dir, clean_up=True) self.assertEqual(str(tree), '((1:1.125,2:1.125):0.375,3:1.5);') self.assertEqual(len(aln), 6) self.assertEqual(aln[-2], '>3\n') self.assertEqual(aln[-1], 'GCGGCUAUUAGAUCGUA------\n') def test_aln_tree_seqs_spaces(self): "aln_tree_seqs should work on filename with spaces" try: #create sequence files f = open(path.join(self.temp_dir_spaces, 'muscle_test_seq1.txt'),'w') f.write('\n'.join(self.lines1)) f.close() except OSError: pass tree, aln = aln_tree_seqs(path.join(self.temp_dir_spaces,\ 'muscle_test_seq1.txt'), tree_type="neighborjoining", WorkingDir=getcwd(), clean_up=True) self.assertEqual(str(tree), '((1:1.125,2:1.125):0.375,3:1.5);') self.assertEqual(len(aln), 6) self.assertEqual(aln[-2], '>3\n') self.assertEqual(aln[-1], 'GCGGCUAUUAGAUCGUA------\n') remove(self.temp_dir_spaces+'/muscle_test_seq1.txt') def test_align_unaligned_seqs(self): """align_unaligned_seqs should work as expected""" res = align_unaligned_seqs(self.seqs1, RNA) self.assertEqual(res.toFasta(), align1) def test_build_tree_from_alignment(self): """Muscle should return a tree built from the 
passed alignment""" tree_short = build_tree_from_alignment(build_tree_seqs_short, DNA) num_seqs = flatten(build_tree_seqs_short).count('>') self.assertEqual(len(tree_short.tips()), num_seqs) tree_long = build_tree_from_alignment(build_tree_seqs_long, DNA) seq_names = [] for line in build_tree_seqs_long.split('\n'): if line.startswith('>'): seq_names.append(line[1:]) for node in tree_long.tips(): if node.Name not in seq_names: self.fail() def test_align_and_build_tree(self): """Should align and build a tree from a set of sequences""" res = align_and_build_tree(self.seqs1, RNA) self.assertEqual(res['Align'].toFasta(), align1) tree = res['Tree'] seq_names = [] for line in align1.split('\n'): if line.startswith('>'): seq_names.append(line[1:]) for node in tree.tips(): if node.Name not in seq_names: self.fail() def test_add_seqs_to_alignment(self): """Should add sequences to an alignment""" res = add_seqs_to_alignment(seqs_to_add, align1) self.assertEqual(res.toFasta(), added_align_result) def test_align_two_alignments(self): """Should align two multiple sequence alignments""" res = align_two_alignments(align1, aln_to_merge) self.assertEqual(res.toFasta(), merged_align_result) align1 = ">seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n---GCUACGUAGCUAC-------\n>seq_2\nGCGGCUAUUAGAUCGUA------" # for use in test_add_seqs_to_alignment() seqs_to_add = ">foo\nGCUACGUAGCU\n>bar\nGCUACGUAGCC" added_align_result = ">bar\n---GCUACGUAGCC---------\n>foo\n---GCUACGUAGCU---------\n>seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n---GCUACGUAGCUAC-------\n>seq_2\nGCGGCUAUUAGAUCGUA------" # for use in test_align_two_alignments() aln_to_merge = ">foo\nGCUACGUAGCU\n>bar\n--UACGUAGCC" merged_align_result = ">bar\n-----UACGUAGCC---------\n>foo\n---GCUACGUAGCU---------\n>seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n---GCUACGUAGCUAC-------\n>seq_2\nGCGGCUAUUAGAUCGUA------" build_tree_seqs_short = """>muscle_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT
AGCTTTAAATCATGCCAGTG >muscle_test_seqs_1 GACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >muscle_test_seqs_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >muscle_test_seqs_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >muscle_test_seqs_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >muscle_test_seqs_5 AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >muscle_test_seqs_6 GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT TACTTTAGATCATGCCGGTG >muscle_test_seqs_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >muscle_test_seqs_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >muscle_test_seqs_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ build_tree_seqs_long = """>muscle_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AGCTTTAAATCATGCCAGTG >muscle_test_seqsaaaaaaaa_1 GACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >muscle_test_seqsaaaaaaaa_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >muscle_test_seqsaaaaaaaa_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >muscle_test_seqsaaaaaaaa_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >muscle_test_seqsaaaaaaaa_5 AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >muscle_test_seqsaaaaaaaa_6 
GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT TACTTTAGATCATGCCGGTG >muscle_test_seqsaaaaaaaa_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >muscle_test_seqsaaaaaaaa_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >muscle_test_seqsaaaaaaaa_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_muscle_v38.py000644 000765 000024 00000026177 12024702176 022640 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import getcwd, remove, rmdir, mkdir, path from subprocess import Popen, PIPE, STDOUT import tempfile, shutil from cogent.core.moltype import RNA, DNA from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.muscle_v38 import Muscle, muscle_seqs, aln_tree_seqs, \ align_unaligned_seqs, build_tree_from_alignment, \ align_and_build_tree, add_seqs_to_alignment, align_two_alignments __author__ = "Catherine Lozupone" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Catherine Lozupone", "Rob Knight", "Daniel McDonald", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Catherine Lozupone" __email__ = "lozupone@colorado.edu" __status__ = "Production" class GeneralSetUp(TestCase): def setUp(self): """Muscle general setUp method for all tests""" # Check if muscle version is supported for this test acceptable_version = (3,8) command = "muscle -version" proc = Popen(command,shell=True,universal_newlines=True,\ stdout=PIPE,stderr=STDOUT) stdout = proc.stdout.read() version_string = stdout.strip().split(' ')[1].strip()[1:] try: version = tuple(map(int,version_string.split('.'))) pass_test = version[:2] == acceptable_version except ValueError: pass_test = False version_string = stdout 
self.assertTrue(pass_test,\ "Unsupported muscle version. %s is required, but running %s." \ % ('.'.join(map(str,acceptable_version)), version_string)) self.seqs1 = ['ACUGCUAGCUAGUAGCGUACGUA','GCUACGUAGCUAC', 'GCGGCUAUUAGAUCGUA'] self.labels1 = ['>1','>2','>3'] self.lines1 = flatten(zip(self.labels1,self.seqs1)) self.seqs2=['UAGGCUCUGAUAUAAUAGCUCUC','UAUCGCUUCGACGAUUCUCUGAUAGAGA', 'UGACUACGCAU'] self.labels2=['>a','>b','>c'] self.lines2 = flatten(zip(self.labels2,self.seqs2)) self.temp_dir = tempfile.mkdtemp() self.temp_dir_spaces = '/tmp/test for muscle/' try: mkdir(self.temp_dir_spaces) except OSError: pass try: #create sequence files f = open(path.join(self.temp_dir, 'seq1.txt'),'w') f.write('\n'.join(self.lines1)) f.close() g = open(path.join(self.temp_dir, 'seq2.txt'),'w') g.write('\n'.join(self.lines2)) g.close() except OSError: pass def tearDown(self): """cleans up all files initially created""" # remove the tempdir and contents shutil.rmtree(self.temp_dir) shutil.rmtree(self.temp_dir_spaces) class MuscleTests(GeneralSetUp): """Tests for the Muscle application controller""" def test_base_command(self): """Muscle BaseCommand should return the correct BaseCommand""" c = Muscle() self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','muscle'])) c.Parameters['-in'].on('seq.txt') self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','muscle -in "seq.txt"'])) c.Parameters['-cluster2'].on('neighborjoining') self.assertEqual(c.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','muscle -cluster2 neighborjoining' + ' -in "seq.txt"'])) def test_maxmb(self): """maxmb option should not break Muscle""" app = Muscle() app.Parameters['-maxmb'].on('250') outfile = tempfile.NamedTemporaryFile() app.Parameters['-out'].on(outfile.name) infile = tempfile.NamedTemporaryFile() infile.write( ">Seq1\nAAAGGGTTTCCCCT\n" ">Seq2\nAAAGGGGGTTTCCACT\n") infile.flush() result = app(infile.name) observed = result['MuscleOut'].read() expected = ( ">Seq1\nAAA--GGGTTTCCCCT\n" 
">Seq2\nAAAGGGGGTTTCCACT\n" ) self.assertEqual(observed, expected) def test_changing_working_dir(self): """Muscle BaseCommand should change according to WorkingDir""" c = Muscle(WorkingDir='/tmp/muscle_test') self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/muscle_test','/"; ','muscle'])) c = Muscle() c.WorkingDir = '/tmp/muscle_test2' self.assertEqual(c.BaseCommand,\ ''.join(['cd "','/tmp/muscle_test2','/"; ','muscle'])) #removing the dirs is proof that they were created at the same time #if the dirs are not there, an OSError will be raised rmdir('/tmp/muscle_test') rmdir('/tmp/muscle_test2') def test_aln_tree_seqs(self): "aln_tree_seqs returns the muscle alignment and tree from iteration2" tree, aln = aln_tree_seqs(path.join(self.temp_dir, 'seq1.txt'), tree_type="neighborjoining", WorkingDir=self.temp_dir, clean_up=True) self.assertEqual(str(tree), '((1:1.125,2:1.125):0.375,3:1.5);') self.assertEqual(len(aln), 6) self.assertEqual(aln[-2], '>3\n') self.assertEqual(aln[-1], 'GCGGCUAUUAGAUCGUA------\n') def test_aln_tree_seqs_spaces(self): "aln_tree_seqs should work on filename with spaces" try: #create sequence files f = open(path.join(self.temp_dir_spaces, 'muscle_test_seq1.txt'),'w') f.write('\n'.join(self.lines1)) f.close() except OSError: pass tree, aln = aln_tree_seqs(path.join(self.temp_dir_spaces,\ 'muscle_test_seq1.txt'), tree_type="neighborjoining", WorkingDir=getcwd(), clean_up=True) self.assertEqual(str(tree), '((1:1.125,2:1.125):0.375,3:1.5);') self.assertEqual(len(aln), 6) self.assertEqual(aln[-2], '>3\n') self.assertEqual(aln[-1], 'GCGGCUAUUAGAUCGUA------\n') remove(self.temp_dir_spaces+'/muscle_test_seq1.txt') def test_align_unaligned_seqs(self): """align_unaligned_seqs should work as expected""" res = align_unaligned_seqs(self.seqs1, RNA) self.assertEqual(res.toFasta(), align1) def test_build_tree_from_alignment(self): """Muscle should return a tree built from the passed alignment""" tree_short = 
build_tree_from_alignment(build_tree_seqs_short, DNA) num_seqs = flatten(build_tree_seqs_short).count('>') self.assertEqual(len(tree_short.tips()), num_seqs) tree_long = build_tree_from_alignment(build_tree_seqs_long, DNA) seq_names = [] for line in build_tree_seqs_long.split('\n'): if line.startswith('>'): seq_names.append(line[1:]) for node in tree_long.tips(): if node.Name not in seq_names: self.fail() def test_align_and_build_tree(self): """Should align and build a tree from a set of sequences""" res = align_and_build_tree(self.seqs1, RNA) self.assertEqual(res['Align'].toFasta(), align1) tree = res['Tree'] seq_names = [] for line in align1.split('\n'): if line.startswith('>'): seq_names.append(line[1:]) for node in tree.tips(): if node.Name not in seq_names: self.fail() def test_add_seqs_to_alignment(self): """Should add sequences to an alignment""" res = add_seqs_to_alignment(seqs_to_add, align1) self.assertEqual(res.toFasta(), added_align_result) def test_align_two_alignments(self): """Should align two multiple sequence alignments""" res = align_two_alignments(align1, aln_to_merge) self.assertEqual(res.toFasta(), merged_align_result) align1 = ">seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n---GCUACGUAGCUAC-------\n>seq_2\nGCGGCUAUUAGAUCGUA------" # for use in test_add_seqs_to_alignment() seqs_to_add = ">foo\nGCUACGUAGCU\n>bar\nGCUACGUAGCC" added_align_result = ">bar\n---GCUACGUAGCC---------\n>foo\n---GCUACGUAGCU---------\n>seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n---GCUACGUAGCUAC-------\n>seq_2\nGCGGCUAUUAGAUCGUA------" # for use in test_align_two_alignments() aln_to_merge = ">foo\nGCUACGUAGCU\n>bar\n--UACGUAGCC" merged_align_result = ">bar\n-----UACGUAGCC---------\n>foo\n---GCUACGUAGCU---------\n>seq_0\nACUGCUAGCUAGUAGCGUACGUA\n>seq_1\n---GCUACGUAGCUAC-------\n>seq_2\nGCGGCUAUUAGAUCGUA------" build_tree_seqs_short = """>muscle_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AGCTTTAAATCATGCCAGTG >muscle_test_seqs_1
GACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >muscle_test_seqs_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >muscle_test_seqs_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >muscle_test_seqs_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >muscle_test_seqs_5 AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >muscle_test_seqs_6 GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT TACTTTAGATCATGCCGGTG >muscle_test_seqs_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >muscle_test_seqs_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >muscle_test_seqs_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ build_tree_seqs_long = """>muscle_test_seqs_0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AGCTTTAAATCATGCCAGTG >muscle_test_seqsaaaaaaaa_1 GACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT TGCTTTCAATAATGCCAGTG >muscle_test_seqsaaaaaaaa_2 AACCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT TGCTTTGAATCATGCCAGTA >muscle_test_seqsaaaaaaaa_3 AAACCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT TGCTTTACATCATGCAAGTG >muscle_test_seqsaaaaaaaa_4 AACCGCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT TGCTTTAAATCATGCCAGTG >muscle_test_seqsaaaaaaaa_5 AACCCCCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA TGCTTTAAATCATGCCAGTT >muscle_test_seqsaaaaaaaa_6 GACCCCCGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT 
TACTTTAGATCATGCCGGTG >muscle_test_seqsaaaaaaaa_7 AACCCCCACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT TGCTTTAAATCATGCCAGTG >muscle_test_seqsaaaaaaaa_8 AACCCCCACGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT TGCATTAAATCATGCCAGTG >muscle_test_seqsaaaaaaaa_9 AAGCCCCACGGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA TGCTTTAAATCCTGACAGCG """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_nupack.py000644 000765 000024 00000005150 12024702176 022115 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.app.nupack import Nupack __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class NupackTest(TestCase): """Tests for Nupack application controller""" def setUp(self): self.input = nupack_input def test_input_as_lines(self): """Test nupack input as lines""" n = Nupack(InputHandler='_input_as_lines') exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in nupack_stdout]) res = n(self.input) obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() def test_input_as_string(self): """Test nupack input as string""" n = Nupack() exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in nupack_stdout]) f = open('/tmp/single.fasta','w') txt = '\n'.join([str(i).strip('\n') for i in self.input]) f.write(txt) f.close() res = n('/tmp/single.fasta') obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() remove('/tmp/single.fasta') def test_get_result_path(self): """Tests nupack result path""" n = Nupack(InputHandler='_input_as_lines') res = n(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus', 'pair','ene']) assert res['pair'] is not None assert res['ene'] is 
not None res.cleanUp() nupack_input = ['>seq1\n', 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n', '\n'] nupack_stdout = ['****************************************************************\n', 'NUPACK 1.2\n', 'Copyright 2003, 2004 by Robert M. Dirks & Niles A. Pierce\n', 'California Institute of Technology\n', 'Pasadena, CA 91125 USA\n', '\n', 'Last Modified: 03/18/2004\n', '****************************************************************\n', '\n', '\n', 'Fold.out Version 1.2: Complexity O(N^5) (pseudoknots enabled)\n', 'Reading Input File...\n', 'Sequence Read.\n', 'Energy Parameters Loaded\n', 'SeqLength = 71\n', 'Sequence and a Minimum Energy Structure:\n', 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n', '.....((..((.((...{.{{{{{{.{.{{{{{{{{......)).))..))..}}}}}}}}}.}}}}}}.}\n', 'mfe = -23.30 kcal/mol\n', 'pseudoknotted!\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_parameters.py000644 000765 000024 00000061276 12024702176 023012 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters, _find_synonym, ParameterError, FilePath __author__ = "Sandra Smit and Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Greg Caporaso", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" class FlagParameterTests(TestCase): """ Tests of the FlagParameter class """ def setUp(self): """Setup some variables for the tests to use """ self.p_modify_prefix = [FlagParameter(Name='d',Prefix='-'),\ FlagParameter(Name='d',Prefix='--'),\ FlagParameter(Name='d',Prefix='')] self.p_modify_name = [FlagParameter(Name='d',Prefix='-'),\ FlagParameter(Name='D',Prefix='-'),\ FlagParameter(Name=4,Prefix='-'),\ 
FlagParameter(Name='abcdef',Prefix='-')]
        self.p_On = [FlagParameter(Name='d',Prefix='-',Value=True),
                     FlagParameter(Name='d',Prefix='-',Value=5),
                     FlagParameter(Name='d',Prefix='-',Value=[1]),
                     FlagParameter(Name='d',Prefix='-',Value='F')]
        self.p_Off = [FlagParameter(Name='d',Prefix='-',Value=False),
                      FlagParameter(Name='d',Prefix='-',Value=None),
                      FlagParameter(Name='d',Prefix='-',Value=[]),
                      FlagParameter(Name='d',Prefix='-',Value=0),
                      FlagParameter(Name='d',Prefix='-',Value='')]
        self.ID_tests = [FlagParameter(Name='d',Prefix='-'),
                         FlagParameter(Name='d',Prefix=''),
                         FlagParameter(Name='',Prefix='-'),
                         FlagParameter(Name=4,Prefix='-'),
                         FlagParameter(Name=None,Prefix='-'),
                         FlagParameter(Name=4,Prefix=None),
                         FlagParameter(Name='abcdef',Prefix='-')]

    def test_init(self):
        """FlagParameter: init functions as expected """
        param = FlagParameter(Name='a',Prefix='-',Value=42)
        self.assertEqual(param.Name,'a')
        self.assertEqual(param.Prefix,'-')
        self.assertEqual(param.Value,42)
        self.assertEqual(param.Delimiter,None)
        self.assertEqual(param.Quote,None)
        self.assertEqual(param.Id,'-a')

    def test_init_defaults(self):
        """FlagParameter: init functions as expected with default values"""
        p = FlagParameter(Name='a',Prefix='-')
        self.assertEqual(p.Name,'a')
        self.assertEqual(p.Prefix,'-')
        self.assertEqual(p.Value,False)
        self.assertEqual(p.Delimiter,None)
        self.assertEqual(p.Quote,None)
        self.assertEqual(p.Id,'-a')

    def test_get_id(self):
        """FlagParameter: _get_id functions as expected """
        expected_results = ['-d','d','-','-4','-','4','-abcdef']
        for param,exp in zip(self.ID_tests,expected_results):
            self.assertEqual(param._get_id(),exp)

    def test_eq(self):
        """FlagParameter: eq functions as expected """
        p1 = FlagParameter(Name='a',Prefix='-',Value=True)
        p2 = FlagParameter(Name='a',Prefix='-',Value=True)
        p3 = FlagParameter(Name='a',Prefix='-')
        p4 = FlagParameter(Name='i',Prefix='-',Value=True)
        p5 = FlagParameter(Name='a',Prefix='--',Value=True)
        assert p1 == p2
        assert not p1 == p3
        assert not p1 == p4
        assert not p1 == p5
        assert not p3 == p4
        assert not p3 == p5
        assert not p4 == p5

    def test_ne(self):
        """FlagParameter: ne functions as expected """
        p1 = FlagParameter(Name='a',Prefix='-',Value=True)
        p2 = FlagParameter(Name='a',Prefix='-',Value=True)
        p3 = FlagParameter(Name='a',Prefix='-')
        p4 = FlagParameter(Name='i',Prefix='-',Value=True)
        p5 = FlagParameter(Name='a',Prefix='--',Value=True)
        assert not p1 != p2
        assert p1 != p3
        assert p1 != p4
        assert p1 != p5
        assert p3 != p4
        assert p3 != p5
        assert p4 != p5

    def test_isOn_True(self):
        """FlagParameter: isOn functions as expected with True Values """
        for param in self.p_On:
            assert param.isOn()

    def test_isOn_False(self):
        """FlagParameter: isOn functions as expected with False Values """
        for param in self.p_Off:
            assert not param.isOn()

    def test_isOff_True(self):
        """FlagParameter: isOff functions as expected with True values """
        for param in self.p_Off:
            assert param.isOff()

    def test_isOff_False(self):
        """FlagParameter: isOff functions as expected with False values """
        for param in self.p_On:
            assert not param.isOff()

    def test_on(self):
        """FlagParameter: on functions as expected """
        for param in self.p_On + self.p_Off:
            param.on()
            assert param.isOn()

    def test_off(self):
        """FlagParameter: off functions as expected """
        for param in self.p_On + self.p_Off:
            param.off()
            assert param.isOff()

    def test_str_modify_prefix(self):
        """FlagParameter: str functions as expected with different prefixes """
        expected_results = ['-d','--d','d']
        for param,exp in zip(self.p_modify_prefix,expected_results):
            param.on()
            self.assertEqual(str(param),exp)

    def test_str_modify_name(self):
        """FlagParameter: str functions as expected with different names """
        expected_results = ['-d','-D','-4','-abcdef']
        for param,exp in zip(self.p_modify_name,expected_results):
            param.on()
            self.assertEqual(str(param),exp)


class ValuedParameterTests(TestCase):
    """ Tests of the ValuedParameter class """
    constructor = ValuedParameter
    s = 'Valued'

    def setUp(self):
        """Setup
some variables for the tests to use """
        self.p_modify_prefix = [self.constructor(Name='d',Prefix='-'),
                                self.constructor(Name='d',Prefix='--'),
                                self.constructor(Name='d',Prefix='')]
        self.p_modify_name = [self.constructor(Name='d',Prefix='-'),
                              self.constructor(Name='D',Prefix='-'),
                              self.constructor(Name=4,Prefix='-'),
                              self.constructor(Name='abcdef',Prefix='-')]
        self.p_On = [self.constructor(Name='d',Prefix='-',Value=True),
                     self.constructor(Name='d',Prefix='-',Value=5),
                     self.constructor(Name='d',Prefix='-',Value=[1]),
                     self.constructor(Name='d',Prefix='-',Value=False),
                     self.constructor(Name='d',Prefix='-',Value='F')]
        self.p_Off = [self.constructor(Name='d',Prefix='-',Value=None)]
        self.p_full = [self.constructor(Name='a',Prefix='-',
                                        Value=42,Delimiter=' ',Quote='\'')]
        self.p_default = [self.constructor(Name='a',Prefix='-')]
        self.p_modified_prefix = [self.constructor(Name='d',Prefix='-'),
                                  self.constructor(Name='d',Prefix='--'),
                                  self.constructor(Name='d',Prefix='')]
        self.p_modified_name = [self.constructor(Name='d',Prefix='-'),
                                self.constructor(Name='D',Prefix='-'),
                                self.constructor(Name=4,Prefix='-'),
                                self.constructor(Name='abcdef',Prefix='-')]
        self.p_modified_delimiter = [
            self.constructor(Name='d',Prefix='-',Value=42),
            self.constructor(Name='d',Prefix='-',Value=42,Delimiter=''),
            self.constructor(Name='d',Prefix='-',Value=42,Delimiter=' '),
            self.constructor(Name='d',Prefix='-',Value=42,Delimiter=9),
            self.constructor(Name='d',Prefix='-',Value=42,Delimiter='=')]
        self.p_modified_value = [
            self.constructor(Name='d',Prefix='-',Value=42,Delimiter=' '),
            self.constructor(Name='d',Prefix='-',Value='pbl',Delimiter=' '),
            self.constructor(Name='d',Prefix='-',Value='2-2',Delimiter=' '),
            self.constructor(Name='d',Prefix='-',Value='evo/t.txt',
                             Delimiter=' '),
            self.constructor(Name='d',Prefix='-',Value='\'',Delimiter=' ')]
        self.p_modified_quote = [
            self.constructor(Name='d',Prefix='-',Value=42,Quote=''),
            self.constructor(Name='d',Prefix='-',Value=42),
            self.constructor(Name='d',Prefix='-',Value=42,Quote=' '),
            self.constructor(Name='d',Prefix='-',Value=42,Quote='\''),
            self.constructor(Name='d',Prefix='-',Value=42,Quote='\"'),
            self.constructor(Name='d',Prefix='-',Value=42,Quote='x')]
        self.ID_tests = [self.constructor(Name='d',Prefix='-'),
                         self.constructor(Name='d',Prefix=''),
                         self.constructor(Name='',Prefix='-'),
                         self.constructor(Name=4,Prefix='-'),
                         self.constructor(Name=None,Prefix='-'),
                         self.constructor(Name=4,Prefix=None),
                         self.constructor(Name='abcdef',Prefix='-')]
        self.p_modified_is_path = [
            self.constructor(Name='d',Prefix='-',Delimiter=' ',
                             Value='test.txt',IsPath=True),
            self.constructor(Name='d',Prefix='-',Delimiter=' ',
                             Value='test.txt',IsPath=False),
            self.constructor(Name='d',Prefix='-',Delimiter=' ',
                             Value='test.txt',Quote='"',IsPath=True)]

    def test_init(self):
        """Parameter: init functions as expected """
        for param in self.p_full:
            self.assertEqual(param.Name,'a')
            self.assertEqual(param.Prefix,'-')
            self.assertEqual(param.Value,42)
            self.assertEqual(param.Delimiter,' ')
            self.assertEqual(param.Quote,'\'')
            self.assertEqual(param.Id,'-a')

    def test_init_defaults(self):
        """Parameter: init functions as expected with default values"""
        for p in self.p_default:
            self.assertEqual(p.Name,'a')
            self.assertEqual(p.Prefix,'-')
            self.assertEqual(p.Value,None)
            self.assertEqual(p.Delimiter,None)
            self.assertEqual(p.Quote,None)
            self.assertEqual(p.Id,'-a')

    def test_get_id(self):
        """Parameter: _get_id functions as expected """
        expected_results = ['-d','d','-','-4','-','4','-abcdef']
        for param,exp in zip(self.ID_tests,expected_results):
            self.assertEqual(param._get_id(),exp)

    def test_eq(self):
        """Parameter: eq functions as expected """
        p1 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=')
        p2 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=')
        p3 = self.constructor(Name='dsf',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=')
        p4 = \
self.constructor(Name='a',Prefix='--',Value=42,Quote='\'',
                              Delimiter='=')
        p5 = self.constructor(Name='a',Prefix='-',Value=942,Quote='\'',
                              Delimiter='=')
        p6 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\"',
                              Delimiter='=')
        p7 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='!!!')
        p8 = self.constructor(Name='wwwww',Prefix='-------')
        p9 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=',IsPath=True)
        assert p1 == p2
        assert not p1 == p3
        assert not p1 == p4
        assert not p1 == p5
        assert not p1 == p6
        assert not p1 == p7
        assert not p1 == p8
        assert not p1 == p9
        # test default setting
        p5.Value = 42
        assert not p1 == p5

    def test_ne(self):
        """Parameter: ne functions as expected """
        p1 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=')
        p2 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=')
        p3 = self.constructor(Name='dsf',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=')
        p4 = self.constructor(Name='a',Prefix='--',Value=42,Quote='\'',
                              Delimiter='=')
        p5 = self.constructor(Name='a',Prefix='-',Value=942,Quote='\'',
                              Delimiter='=')
        p6 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\"',
                              Delimiter='=')
        p7 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='!!!')
        p8 = self.constructor(Name='wwwww',Prefix='-------')
        p9 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=',IsPath=True)
        assert not p1 != p2
        assert p1 != p3
        assert p1 != p4
        assert p1 != p5
        assert p1 != p6
        assert p1 != p7
        assert p1 != p8
        assert p1 != p9
        # test default setting
        p5.Value = 42
        assert p1 != p5

    def test_get_default(self):
        """Parameter: default behaves as expected """
        p1 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=')
        self.assertEqual(p1._get_default(),42)
        p1.Value = 43
        self.assertEqual(p1._get_default(),42)

    def test_get_default_w_IsPath(self):
        """Parameter: default is a FilePath object when IsPath is set """
        p = self.constructor(Name='a',Prefix='-',Value='test.txt',Quote='\'',
                             Delimiter='=',IsPath=True)
        self.assertEqual(p._get_default(),'test.txt')
        self.assertEqual(p.Default,'test.txt')
        p.Value = 'test2.txt'
        self.assertEqual(p._get_default(),'test.txt')
        self.assertEqual(p.Default,'test.txt')
        assert isinstance(p._get_default(),FilePath)
        assert isinstance(p.Default,FilePath)

    def test_reset(self):
        """Parameter: reset correctly sets Value to _default """
        p1 = self.constructor(Name='a',Prefix='-',Value=42,Quote='\'',
                              Delimiter='=')
        p1.Value = 43
        self.assertNotEqual(p1.Default,p1.Value)
        p1.reset()
        self.assertEqual(p1.Default,p1.Value)

    def test_isOn_True(self):
        """Parameter: isOn functions as expected with True Values """
        for param in self.p_On:
            assert param.isOn()

    def test_isOn_False(self):
        """Parameter: isOn functions as expected with False Values """
        for param in self.p_Off:
            assert not param.isOn()

    def test_isOff_True(self):
        """Parameter: isOff functions as expected with True values """
        for param in self.p_Off:
            assert param.isOff()

    def test_isOff_False(self):
        """Parameter: isOff functions as expected with False values """
        for param in self.p_On:
            assert not param.isOff()

    def test_on(self):
        """Parameter: on functions as expected """
        for param in self.p_On + self.p_Off:
            param.on('a')
            assert param.isOn()
        p = self.p_On[0]
        self.assertRaises(ParameterError,p.on,None)

    def test_off(self):
        """Parameter: off functions as expected """
        for param in self.p_On + self.p_Off:
            param.off()
            assert param.isOff()

    def test_str_off(self):
        """Parameter: str() prints empty string when off """
        for p in self.p_Off:
            self.assertEqual(str(p),'')

    def test_str_modify_prefix(self):
        """Parameter: str functions as expected with different prefixes """
        expected_results = ['-d','--d','d']
        for param,exp in zip(self.p_modified_prefix,expected_results):
            param.on('')
            self.assertEqual(str(param),exp)

    def test_str_modify_name(self):
        """Parameter: str functions as expected with different names """
        expected_results = \
['-d','-D','-4','-abcdef']
        for param,exp in zip(self.p_modified_name,expected_results):
            param.on('')
            self.assertEqual(str(param),exp)

    def test_str_modify_delimiter(self):
        """Parameter: str functions as expected with different delimiter """
        expected_results = ['-d42','-d42','-d 42','-d942','-d=42']
        for param,exp in zip(self.p_modified_delimiter,expected_results):
            self.assertEqual(str(param),exp)

    def test_str_modify_values(self):
        """Parameter: str functions as expected with different values """
        expected_results = ['-d 42','-d pbl','-d 2-2','-d evo/t.txt','-d \'']
        for param,exp in zip(self.p_modified_value,expected_results):
            self.assertEqual(str(param),exp)

    def test_str_modify_quotes(self):
        """Parameter: str functions as expected with different quotes """
        expected_results = ['-d42','-d42','-d 42 ','-d\'42\'',
                            '-d\"42\"','-dx42x']
        for param,exp in zip(self.p_modified_quote,expected_results):
            self.assertEqual(str(param),exp)

    def test_str_modify_is_path(self):
        """Parameter: str functions as expected with different IsPath """
        expected_results = ['-d "test.txt"','-d test.txt','-d "test.txt"']
        for param,exp in zip(self.p_modified_is_path,expected_results):
            self.assertEqual(str(param),exp)

    def test_str_full(self):
        """Parameter: str functions as expected with all values non-default """
        for p in self.p_full:
            self.assertEqual(str(p),'-a \'42\'')


class MixedParameterTests(ValuedParameterTests):
    """ Tests of the MixedParameter class """
    constructor = MixedParameter

    def setUp(self):
        """Setup some variables for the tests to use """
        super(MixedParameterTests,self).setUp()
        self.p_On = [self.constructor(Name='d',Prefix='-',Value=True),
                     self.constructor(Name='d',Prefix='-',Value=5),
                     self.constructor(Name='d',Prefix='-',Value=[1]),
                     self.constructor(Name='d',Prefix='-',Value=None),
                     self.constructor(Name='d',Prefix='-',Value='F')]
        self.p_Off = [self.constructor(Name='d',Prefix='-',Value=False)]
        # This is different from the superclass variable b/c we need to make
        # sure that specifying IsPath with Value=None functions as expected
        self.p_modified_is_path = [
            self.constructor(Name='d',Prefix='-',Delimiter=' ',
                             Value='test.txt',IsPath=True),
            self.constructor(Name='d',Prefix='-',Delimiter=' ',
                             Value='test.txt',Quote='"',IsPath=True),
            self.constructor(Name='d',Prefix='-',Delimiter=' ',
                             Value='test.txt',IsPath=False),
            self.constructor(Name='d',Prefix='-',Delimiter=' ',
                             Value=None,IsPath=True),
            self.constructor(Name='d',Prefix='-',Delimiter=' ',
                             Value=None,IsPath=False)]

    def test_on(self):
        """Parameter: on functions as expected """
        for param in self.p_On + self.p_Off:
            param.on('a')
            assert param.isOn()
        p = self.p_On[0]
        self.assertRaises(ParameterError,p.on,False)

    def test_init_defaults(self):
        """MixedParameter: init functions as expected with default values"""
        for p in self.p_default:
            self.assertEqual(p.Name,'a')
            self.assertEqual(p.Prefix,'-')
            self.assertEqual(p.Value,False)
            self.assertEqual(p.Delimiter,None)
            self.assertEqual(p.Quote,None)
            self.assertEqual(p.Id,'-a')
            self.assertEqual(p.IsPath,False)

    def test_str_all_modes(self):
        """MixedParameter: str() functions in various modes """
        p = MixedParameter(Prefix='-',Name='d',Delimiter='=',Quote=']')
        self.assertEqual(str(p),'')
        p.on()
        self.assertEqual(str(p),'-d')
        p.on('a')
        self.assertEqual(str(p),'-d=]a]')

    def test_str_modify_is_path(self):
        """MixedParameter: str functions as expected with different IsPath """
        # This is different from the superclass test b/c we need to make
        # sure that specifying IsPath with Value=None functions as expected
        expected_results = ['-d "test.txt"','-d "test.txt"',
                            '-d test.txt','-d','-d']
        for param,exp in zip(self.p_modified_is_path,expected_results):
            self.assertEqual(str(param),exp)


class ParametersTests(TestCase):
    """Tests of the Parameters class"""

    def setUp(self):
        self.fp = FlagParameter(Prefix='-',Name='d')
        self.vp = ValuedParameter(Name='p',Prefix='-',Value=[1])
        self.mp = MixedParameter(Prefix='--',Name='k',Delimiter=' ')
        self.all_params = \
{self.fp.Id:self.fp,
                           self.vp.Id:self.vp,
                           self.mp.Id:self.mp}
        self.p1 = Parameters()
        self.p2 = Parameters(self.all_params)
        self._synonyms = {'Pino':'-p','K':'k'}
        self.p3 = Parameters(self.all_params,self._synonyms)

    def test_init(self):
        """Parameters: init functions as expected"""
        self.assertEqualItems(self.p1,{})
        self.assertEqualItems(self.p2,self.all_params)
        self.assertEqualItems(self.p3,self.all_params)

    def test_lookup(self):
        """Parameters: test ability to lookup """
        self.assertEqual(self.p2['-p'],self.vp)
        self.assertEqual(self.p3['Pino'],self.vp)

    def test_immutability(self):
        """Parameters: attempt to modify object raises error """
        self.assertRaises(NotImplementedError,self.p2.__setitem__,9)
        self.assertRaises(NotImplementedError,self.p2.setdefault,9)
        self.assertRaises(NotImplementedError,self.p2.update,{9:0})
        self.assertRaises(NotImplementedError,self.p2.__delitem__,'-p')

    def test_all_off(self):
        """Parameters: all_off() should turn all parameters off"""
        p = self.p2
        for v in p.values():
            try:
                v.on(3)
            except TypeError:
                v.on()
            assert v.isOn()
        p.all_off()
        for v in p.values():
            assert v.isOff()


class parametersTests(TestCase):
    """Tests of top-level functions """

    def test_find_synonym(self):
        """_find_synonym() functions as expected """
        f = _find_synonym({'a':'-a'})
        self.assertEqual(f('a'),'-a')


class FilePathTests(TestCase):
    """ Tests of the FilePath class """

    def setUp(self):
        """ Initialize variables to be used by tests """
        self.filename = 'filename.txt'
        self.relative_dir_path = 'a/relative/path/'
        self.relative_dir_path_no_trailing_slash = 'a/relative/path'
        self.relative_file_path = 'a/relative/filepath.txt'
        self.absolute_dir_path = '/absolute/path/'
        self.absolute_file_path = '/absolute/filepath.txt'
        self.all_paths = [self.filename, self.relative_dir_path,
                          self.relative_file_path, self.absolute_dir_path,
                          self.absolute_file_path]

    def test_init(self):
        """FilePath: initialization returns w/o error """
        for p in self.all_paths:
            self.assertEqual(FilePath(p),p)
        self.assertEqual(FilePath(''),'')

    def test_str(self):
        """FilePath: str wraps path in quotes """
        # Do one explicit test (for sanity), then automatically run
        # through the examples
        self.assertEqual(str(FilePath(self.filename)),'"filename.txt"')
        for p in self.all_paths:
            self.assertEqual(str(FilePath(p)),'"' + p + '"')

    def test_str_path_is_None(self):
        """FilePath: str returns empty string when path is None """
        self.assertEqual(str(FilePath(None)),'')

    def test_add(self):
        """FilePath: add (or joining of paths) functions as expected """
        actual = FilePath(self.relative_dir_path) + FilePath(self.filename)
        expected = FilePath('a/relative/path/filename.txt')
        self.assertEqual(actual,expected)
        # result is a FilePath
        assert isinstance(actual,FilePath)
        # appending a string to a FilePath results in a FilePath
        actual = FilePath(self.relative_dir_path) + 'filename.txt'
        expected = FilePath('a/relative/path/filename.txt')
        self.assertEqual(actual,expected)
        # result is a FilePath
        assert isinstance(actual,FilePath)

    def test_FilePath_identity_preserved(self):
        """FilePath: trivial actions on FilePaths yield original FilePath """
        p = FilePath(self.filename)
        # Creating FilePath from FilePath results in FilePath
        # equal to original
        self.assertEqual(FilePath(p),p)
        for p in self.all_paths:
            self.assertEqual(FilePath(p),p)
            # Appending an empty FilePath to a FilePath results in FilePath
            # equal to original
            self.assertEqual(p+FilePath(''),p)


if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_app/test_parsinsert.py

#!/usr/bin/env python
"""Tests for ParsInsert v1.03 application controller."""

__author__ = "Jesse Stombaugh"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jesse Stombaugh"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jesse Stombaugh"
__email__ = "jesse.stombaugh@colorado.edu"
__status__ = "Production"

from shutil import rmtree
from \
cogent.util.unit_test import TestCase, main
from cogent.app.parsinsert import ParsInsert, insert_sequences_into_tree
from cogent.core.alignment import Alignment
from cogent.parse.fasta import MinimalFastaParser
from cogent.parse.tree import DndParser
from cogent.core.moltype import DNA
from cogent.app.util import get_tmp_filename
from os.path import splitext
from os import getcwd, remove, rmdir, mkdir


class ParsInsertTests(TestCase):
    def setUp(self):
        # create a list of files to cleanup
        self._paths_to_clean_up = []
        self._dirs_to_clean_up = []

        # load query seqs
        self.seqs = Alignment(MinimalFastaParser(QUERY_SEQS.split()))

        # generate temp filename
        tmp_dir = '/tmp'
        self.outfile = get_tmp_filename(tmp_dir)

        # create and write out reference sequence file
        self.outfasta = splitext(self.outfile)[0]+'.fasta'
        fastaout = open(self.outfasta,'w')
        fastaout.write(REF_SEQS)
        fastaout.close()
        self._paths_to_clean_up.append(self.outfasta)

        # create and write out starting tree file
        self.outtree = splitext(self.outfile)[0]+'.tree'
        treeout = open(self.outtree,'w')
        treeout.write(REF_TREE)
        treeout.close()
        self._paths_to_clean_up.append(self.outtree)

    def tearDown(self):
        """cleans up all files initially created"""
        # remove the tempdir and contents
        map(remove,self._paths_to_clean_up)
        map(rmdir,self._dirs_to_clean_up)

    def test_base_command(self):
        """Base command-calls"""
        app = ParsInsert()
        self.assertEqual(app.BaseCommand,
                         ''.join(['cd "',getcwd(),'/"; ','ParsInsert']))

    def test_change_working_dir(self):
        """Change working dir"""
        app = ParsInsert(WorkingDir='/tmp/ParsInsertTest')
        self.assertEqual(app.BaseCommand,
                         ''.join(['cd "','/tmp/ParsInsertTest','/"; ',
                                  'ParsInsert']))
        rmtree('/tmp/ParsInsertTest')

    def test_insert_sequences_into_tree(self):
        """Inserts sequences into Tree"""
        # define log fp
        log_fp = '/tmp/parsinsert.log'
        self._paths_to_clean_up.append(log_fp)

        # define tax assignment values fp
        tax_assign_fp = '/tmp/tax_assignments.log'
        self._paths_to_clean_up.append(tax_assign_fp)

        # set the reference alignment and starting tree
        param = {
            '-t': self.outtree,
            '-s': self.outfasta,
            '-l': log_fp,
            '-o': tax_assign_fp,
        }

        seqs, align_map = self.seqs.toPhylip()

        # insert sequences into tree
        tree = insert_sequences_into_tree(seqs, DNA, params=param)

        # rename tips back to query names
        for node in tree.tips():
            if node.Name in align_map:
                node.Name = align_map[node.Name]

        self.assertEqual(tree.getNewick(with_distances=True),exp_tree)


QUERY_SEQS = """\
>6
TGCATGTCAGTATAGCTTTGGTGAAACTGCGAATGGCTCATTAAATCAGT
>7
TGCATGTCAGTATAACTTTGGTGAAACTGCGAATGGCTCATTAAATCAGT
"""

REF_SEQS = """\
>seq0000011
TGCATGTCAGTATAGCTTTAGTGAAACTGCGAATGGCTCATTAAATCAGT
>seq0000012
TGCATGTCAGTATAGCTTTAGTGAAACTGCGAATGGCTNNTTAAATCAGT
>seq0000013
TGCATGTCAGTATAGCATTAGTGAAACTGCGAATGGCTCATTAAATCAGT
>seq0000014
TCCATGTCAGTATAACTTTGGTGAAACTGCGAATGGCTCATTAAATCAGG
>seq0000015
NNNNNNNNNNTATATCTTATGTGAAACTTCGAATGCCTCATTAAATCAGT
"""

REF_TREE = """((seq0000014:0.08408,seq0000015:0.13713)0.609:0.00215,seq0000013:0.02032,(seq0000011:0.00014,seq0000012:0.00014)0.766:0.00015);
"""

exp_tree = """((seq0000014:0.08408,seq0000015:0.13713,7:0.02027):0.00215,seq0000013:0.02032,(seq0000011:0.00014,seq0000012:0.00014,6:0.02027):0.00015):0.0;"""

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_app/test_pfold.py

#!/usr/bin/env python
"""Provides Tests for pfold application controller.

IMPORTANT!!!!!
'dir' variable must be set in pfold app controller file
for application to work!
""" from os import remove from cogent.util.unit_test import TestCase, main from cogent.app.pfold import fasta2col,findphyl,mltree,scfg __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class Fasta2colTest(TestCase): """Tests for fasta2col application controller""" def setUp(self): self.input = f2c_input def test_stdout_input_as_lines(self): """Test fasta2col stdout input as lines""" c = fasta2col(InputHandler='_input_as_lines') res = c(self.input) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_stdout_input_as_string(self): """Test fasta2col stdout input as string""" c = fasta2col() f = open('/tmp/single.col','w') txt = '\n'.join([str(i).strip('\n') for i in self.input]) f.write(txt) f.close() res = c('/tmp/single.col') self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() remove('/tmp/single.col') def test_get_result_path(self): """Tests fasta2col result path""" c = fasta2col(InputHandler='_input_as_lines') res = c(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus']) self.assertEqual(res['ExitStatus'],0) res.cleanUp() class FindphylTest(TestCase): """Tests for findphyl application controller""" def setUp(self): self.input = fp_input def test_input_as_lines(self): """Test findphyl input as lines""" fp = findphyl(InputHandler='_input_as_lines') res = fp(self.input) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_input_as_string(self): """Test findphyl input as string""" fp = findphyl() f = open('/tmp/single.col','w') txt = '\n'.join([str(i).strip('\n') for i in self.input]) f.write(txt) f.close() res = fp('/tmp/single.col') self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() 
remove('/tmp/single.col') def test_get_result_path(self): """Tests findphyl result path""" fp = findphyl(InputHandler='_input_as_lines') res = fp(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus']) self.assertEqual(res['ExitStatus'],0) res.cleanUp() class MltreeTest(TestCase): """Tests for Pfold application controller""" def setUp(self): self.input = ml_input def test_input_as_lines(self): """Test mltree input as lines""" m = mltree(InputHandler='_input_as_lines') res = m(self.input) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_input_as_string(self): """Test mltree input as string""" m = mltree() f = open('/tmp/single.col','w') txt = '\n'.join([str(i).strip('\n') for i in self.input]) f.write(txt) f.close() res = m('/tmp/single.col') self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() remove('/tmp/single.col') def test_get_result_path(self): """Tests mltree result path""" m = mltree(InputHandler='_input_as_lines') res = m(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus']) self.assertEqual(res['ExitStatus'],0) res.cleanUp() class ScfgTest(TestCase): """Tests for scfg application controller""" def setUp(self): self.input = scfg_input def test_input_as_lines(self): """Test scfg input as lines""" s = scfg(InputHandler='_input_as_lines') res = s(self.input) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_input_as_string(self): """Test scfg input as string""" s = scfg() f = open('/tmp/single.col','w') txt = '\n'.join([str(i).strip('\n') for i in self.input]) f.write(txt) f.close() res = s('/tmp/single.col') self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() remove('/tmp/single.col') def test_get_result_path(self): """Tests scfg result path""" s = scfg(InputHandler='_input_as_lines') res = s(self.input) 
self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus']) self.assertEqual(res['ExitStatus'],0) res.cleanUp() f2c_input = ['>seq1\n', 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n', '\n'] f2c_stdout = ['; generated by fasta2col\n', '; ============================================================\n', '; TYPE RNA\n', '; COL 1 label\n', '; COL 2 residue\n', '; COL 3 seqpos\n', '; COL 4 alignpos\n', '; ENTRY seq1\n', '; ----------\n', 'N G 1 1\n', 'N G 2 2\n', 'N C 3 3\n', 'N U 4 4\n', 'N A 5 5\n', 'N G 6 6\n', 'N A 7 7\n', 'N U 8 8\n', 'N A 9 9\n', 'N G 10 10\n', 'N C 11 11\n', 'N U 12 12\n', 'N C 13 13\n', 'N A 14 14\n', 'N G 15 15\n', 'N A 16 16\n', 'N U 17 17\n', 'N G 18 18\n', 'N G 19 19\n', 'N U 20 20\n', 'N A 21 21\n', 'N G 22 22\n', 'N A 23 23\n', 'N G 24 24\n', 'N C 25 25\n', 'N A 26 26\n', 'N G 27 27\n', 'N A 28 28\n', 'N G 29 29\n', 'N G 30 30\n', 'N A 31 31\n', 'N U 32 32\n', 'N U 33 33\n', 'N G 34 34\n', 'N A 35 35\n', 'N A 36 36\n', 'N G 37 37\n', 'N A 38 38\n', 'N U 39 39\n', 'N C 40 40\n', 'N C 41 41\n', 'N U 42 42\n', 'N U 43 43\n', 'N G 44 44\n', 'N U 45 45\n', 'N G 46 46\n', 'N U 47 47\n', 'N C 48 48\n', 'N G 49 49\n', 'N U 50 50\n', 'N C 51 51\n', 'N G 52 52\n', 'N G 53 53\n', 'N U 54 54\n', 'N U 55 55\n', 'N C 56 56\n', 'N G 57 57\n', 'N A 58 58\n', 'N U 59 59\n', 'N C 60 60\n', 'N C 61 61\n', 'N C 62 62\n', 'N G 63 63\n', 'N G 64 64\n', 'N C 65 65\n', 'N U 66 66\n', 'N C 67 67\n', 'N U 68 68\n', 'N G 69 69\n', 'N G 70 70\n', 'N C 71 71\n', '; **********\n'] fp_input = f2c_stdout fp_stdout = ['; generated by fasta2col\n', '; ============================================================\n', '; TYPE TREE\n', '; COL 1 label\n', '; COL 2 number\n', '; COL 3 name\n', '; COL 4 uplen\n', '; COL 5 child\n', '; COL 6 brother\n', '; ENTRY tree\n', '; root 1\n', '; ----------\n', ' N 1 seq1 0.000000 . 
.\n', '; **********\n', '; TYPE RNA\n', '; COL 1 label\n', '; COL 2 residue\n', '; COL 3 seqpos\n', '; COL 4 alignpos\n', '; ENTRY seq1\n', '; ----------\n', 'N G 1 1\n', 'N G 2 2\n', 'N C 3 3\n', 'N U 4 4\n', 'N A 5 5\n', 'N G 6 6\n', 'N A 7 7\n', 'N U 8 8\n', 'N A 9 9\n', 'N G 10 10\n', 'N C 11 11\n', 'N U 12 12\n', 'N C 13 13\n', 'N A 14 14\n', 'N G 15 15\n', 'N A 16 16\n', 'N U 17 17\n', 'N G 18 18\n', 'N G 19 19\n', 'N U 20 20\n', 'N A 21 21\n', 'N G 22 22\n', 'N A 23 23\n', 'N G 24 24\n', 'N C 25 25\n', 'N A 26 26\n', 'N G 27 27\n', 'N A 28 28\n', 'N G 29 29\n', 'N G 30 30\n', 'N A 31 31\n', 'N U 32 32\n', 'N U 33 33\n', 'N G 34 34\n', 'N A 35 35\n', 'N A 36 36\n', 'N G 37 37\n', 'N A 38 38\n', 'N U 39 39\n', 'N C 40 40\n', 'N C 41 41\n', 'N U 42 42\n', 'N U 43 43\n', 'N G 44 44\n', 'N U 45 45\n', 'N G 46 46\n', 'N U 47 47\n', 'N C 48 48\n', 'N G 49 49\n', 'N U 50 50\n', 'N C 51 51\n', 'N G 52 52\n', 'N G 53 53\n', 'N U 54 54\n', 'N U 55 55\n', 'N C 56 56\n', 'N G 57 57\n', 'N A 58 58\n', 'N U 59 59\n', 'N C 60 60\n', 'N C 61 61\n', 'N C 62 62\n', 'N G 63 63\n', 'N G 64 64\n', 'N C 65 65\n', 'N U 66 66\n', 'N C 67 67\n', 'N U 68 68\n', 'N G 69 69\n', 'N G 70 70\n', 'N C 71 71\n', '; **********\n'] ml_input = fp_stdout ml_stdout = ['; generated by fasta2col\n', '; ============================================================\n', '; TYPE TREE\n', '; COL 1 label\n', '; COL 2 number\n', '; COL 3 name\n', '; COL 4 uplen\n', '; COL 5 child\n', '; COL 6 brother\n', '; ENTRY tree\n', '; root 1\n', '; ----------\n', ' N 1 seq1 0.001000 . 
.\n', '; **********\n', '; TYPE RNA\n', '; COL 1 label\n', '; COL 2 residue\n', '; COL 3 seqpos\n', '; COL 4 alignpos\n', '; ENTRY seq1\n', '; ----------\n', 'N G 1 1\n', 'N G 2 2\n', 'N C 3 3\n', 'N U 4 4\n', 'N A 5 5\n', 'N G 6 6\n', 'N A 7 7\n', 'N U 8 8\n', 'N A 9 9\n', 'N G 10 10\n', 'N C 11 11\n', 'N U 12 12\n', 'N C 13 13\n', 'N A 14 14\n', 'N G 15 15\n', 'N A 16 16\n', 'N U 17 17\n', 'N G 18 18\n', 'N G 19 19\n', 'N U 20 20\n', 'N A 21 21\n', 'N G 22 22\n', 'N A 23 23\n', 'N G 24 24\n', 'N C 25 25\n', 'N A 26 26\n', 'N G 27 27\n', 'N A 28 28\n', 'N G 29 29\n', 'N G 30 30\n', 'N A 31 31\n', 'N U 32 32\n', 'N U 33 33\n', 'N G 34 34\n', 'N A 35 35\n', 'N A 36 36\n', 'N G 37 37\n', 'N A 38 38\n', 'N U 39 39\n', 'N C 40 40\n', 'N C 41 41\n', 'N U 42 42\n', 'N U 43 43\n', 'N G 44 44\n', 'N U 45 45\n', 'N G 46 46\n', 'N U 47 47\n', 'N C 48 48\n', 'N G 49 49\n', 'N U 50 50\n', 'N C 51 51\n', 'N G 52 52\n', 'N G 53 53\n', 'N U 54 54\n', 'N U 55 55\n', 'N C 56 56\n', 'N G 57 57\n', 'N A 58 58\n', 'N U 59 59\n', 'N C 60 60\n', 'N C 61 61\n', 'N C 62 62\n', 'N G 63 63\n', 'N G 64 64\n', 'N C 65 65\n', 'N U 66 66\n', 'N C 67 67\n', 'N U 68 68\n', 'N G 69 69\n', 'N G 70 70\n', 'N C 71 71\n', '; **********\n'] scfg_input = ml_stdout scfg_stdout = ['; generated by fasta2col\n', '; ============================================================\n', '; TYPE TREE\n', '; COL 1 label\n', '; COL 2 number\n', '; COL 3 name\n', '; COL 4 uplen\n', '; COL 5 child\n', '; COL 6 brother\n', '; ENTRY tree\n', '; root 1\n', '; ----------\n', ' N 1 seq1 0.001000 . .\n', '; **********\n', '; TYPE RNA\n', '; COL 1 label\n', '; COL 2 residue\n', '; COL 3 seqpos\n', '; COL 4 alignpos\n', '; COL 5 align_bp\n', '; COL 6 certainty\n', '; ENTRY seq1\n', '; ----------\n', 'N G 1 1 . 0.8723\n', 'N G 2 2 71 0.5212\n', 'N C 3 3 70 0.5697\n', 'N U 4 4 69 0.5377\n', 'N A 5 5 68 0.5193\n', 'N G 6 6 67 0.4899\n', 'N A 7 7 . 0.6499\n', 'N U 8 8 . 0.9159\n', 'N A 9 9 . 0.7860\n', 'N G 10 10 . 
0.4070\n', 'N C 11 11 . 0.3208\n', 'N U 12 12 . 0.4499\n', 'N C 13 13 . 0.5170\n', 'N A 14 14 . 0.8507\n', 'N G 15 15 . 0.8156\n', 'N A 16 16 . 0.8715\n', 'N U 17 17 . 0.8722\n', 'N G 18 18 . 0.7489\n', 'N G 19 19 . 0.7477\n', 'N U 20 20 . 0.7500\n', 'N A 21 21 . 0.7109\n', 'N G 22 22 . 0.3999\n', 'N A 23 23 . 0.3800\n', 'N G 24 24 . 0.3241\n', 'N C 25 25 . 0.3072\n', 'N A 26 26 . 0.7266\n', 'N G 27 27 . 0.5672\n', 'N A 28 28 . 0.4695\n', 'N G 29 29 41 0.5402\n', 'N G 30 30 40 0.5552\n', 'N A 31 31 39 0.5167\n', 'N U 32 32 38 0.4277\n', 'N U 33 33 . 0.4967\n', 'N G 34 34 . 0.7119\n', 'N A 35 35 . 0.7504\n', 'N A 36 36 . 0.8149\n', 'N G 37 37 . 0.6416\n', 'N A 38 38 32 0.4277\n', 'N U 39 39 31 0.5167\n', 'N C 40 40 30 0.5552\n', 'N C 41 41 29 0.5402\n', 'N U 42 42 . 0.4754\n', 'N U 43 43 . 0.6645\n', 'N G 44 44 . 0.7777\n', 'N U 45 45 . 0.8122\n', 'N G 46 46 . 0.6969\n', 'N U 47 47 . 0.6836\n', 'N C 48 48 . 0.5963\n', 'N G 49 49 . 0.4748\n', 'N U 50 50 . 0.5209\n', 'N C 51 51 . 0.4149\n', 'N G 52 52 . 0.3754\n', 'N G 53 53 . 0.4969\n', 'N U 54 54 . 0.7206\n', 'N U 55 55 . 0.7112\n', 'N C 56 56 . 0.5039\n', 'N G 57 57 . 0.5171\n', 'N A 58 58 . 0.5683\n', 'N U 59 59 . 0.6093\n', 'N C 60 60 . 0.5462\n', 'N C 61 61 . 0.3332\n', 'N C 62 62 . 0.3746\n', 'N G 63 63 . 0.3898\n', 'N G 64 64 . 0.3047\n', 'N C 65 65 . 0.2899\n', 'N U 66 66 . 
0.3037\n', 'N C 67 67 6 0.4899\n', 'N U 68 68 5 0.5193\n', 'N G 69 69 4 0.5377\n', 'N G 70 70 3 0.5697\n', 'N C 71 71 2 0.5212\n', '; **********\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_pknotsrg.py000644 000765 000024 00000004057 12024702176 022510 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.app.pknotsrg import PknotsRG __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class PknotsrgTest(TestCase): """Tests for Pknotsrg application controller""" def setUp(self): self.input = pknotsrg_input def test_stdout_input_as_lines(self): """Test pknotsrg stdout input as lines""" p = PknotsRG(InputHandler='_input_as_lines') exp= '%s\n' % '\n'.join([str(i).strip('\n') for i in pknotsrg_stdout]) res = p(self.input) obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() def test_stdout_input_as_string(self): """Test pknotsrg stdout input as string""" p = PknotsRG() exp= '%s\n' % '\n'.join([str(i).strip('\n') for i in pknotsrg_stdout]) f = open('/tmp/single.plain','w') txt = '\n'.join([str(i).strip('\n') for i in self.input]) f.write(txt) f.close() res = p('/tmp/single.plain') obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() remove('/tmp/single.plain') def test_get_result_path(self): """Tests pknotsrg result path""" p = PknotsRG(InputHandler='_input_as_lines') res = p(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus']) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() pknotsrg_input = ['GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n'] pknotsrg_stdout = ['GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n', 
'.((((((..((((........)))).(((((((...))))))).....(((((.......))))))))))) (-22.40)\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_pplacer.py000644 000765 000024 00000017363 12024702176 022273 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Jesse Stombaugh" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Stombaugh"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Stombaugh" __email__ = "jesse.stombaugh@colorado.edu" __status__ = "Production" from os import getcwd, remove, rmdir, mkdir from os.path import splitext from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from random import randint from cogent.app.pplacer import Pplacer, insert_sequences_into_tree from cogent.app.util import ApplicationError,get_tmp_filename from cogent.parse.fasta import MinimalFastaParser from cogent.core.tree import PhyloNode from cogent.core.moltype import RNA,DNA from StringIO import StringIO from cogent.core.alignment import Alignment class Genericpplacer(TestCase): def setUp(self): '''setup the files for testing pplacer''' # create a list of files to cleanup self._paths_to_clean_up = [] self._dirs_to_clean_up = [] # get a tmp filename to use basename=splitext(get_tmp_filename())[0] # create and write out RAxML stats file self.stats_fname=basename+'.stats' stats_out=open(self.stats_fname,'w') stats_out.write(RAXML_STATS) stats_out.close() self._paths_to_clean_up.append(self.stats_fname) # create and write out reference sequence file self.refseq_fname=basename+'_refseqs.fasta' refseq_out=open(self.refseq_fname,'w') refseq_out.write(REF_SEQS) refseq_out.close() self._paths_to_clean_up.append(self.refseq_fname) # create and write out query sequence file self.query_fname=basename+'_queryseqs.fasta' query_out=open(self.query_fname,'w') query_out.write(QUERY_SEQS) query_out.close() self._paths_to_clean_up.append(self.query_fname) # create and write out starting
tree file self.tree_fname=basename+'.tre' tree_out=open(self.tree_fname,'w') tree_out.write(REF_TREE) tree_out.close() self._paths_to_clean_up.append(self.tree_fname) def writeTmp(self, outname): """Write data to temp file""" t = open(outname, "w+") t.write(PHYLIP_FILE) t.close() # def tearDown(self): """cleans up all files initially created""" # remove the tempdir and contents map(remove,self._paths_to_clean_up) map(rmdir,self._dirs_to_clean_up) class pplacerTests(Genericpplacer): """Tests for the pplacer application controller""" def test_pplacer(self): """Base command-calls""" app=Pplacer() self.assertEqual(app.BaseCommand, \ ''.join(['cd "',getcwd(),'/"; ','pplacer'])) app.Parameters['--help'].on() self.assertEqual(app.BaseCommand, \ ''.join(['cd "',getcwd(),'/"; ','pplacer --help'])) def test_change_working_dir(self): """Change working dir""" working_dir='/tmp/Pplacer' self._dirs_to_clean_up.append(working_dir) # define working directory for output app = Pplacer(WorkingDir=working_dir) self.assertEqual(app.BaseCommand, \ ''.join(['cd "','/tmp/Pplacer','/"; ','pplacer'])) def test_insert_sequences_into_tree(self): """Inserts sequences into Tree""" params={} # generate temp filename for output params["-r"] = self.refseq_fname params["-t"] = self.tree_fname params["-s"] = self.stats_fname params["--out-dir"] = "/tmp" aln_ref_query=MinimalFastaParser(StringIO(QUERY_SEQS)) aln = Alignment(aln_ref_query) seqs, align_map = aln.toPhylip() tree = insert_sequences_into_tree(seqs, DNA, params=params, write_log=False) # rename tips back to query names for node in tree.tips(): if node.Name in align_map: node.Name = align_map[node.Name] self.assertEqual(tree.getNewick(with_distances=True), RESULT_TREE) JSON_RESULT="""\ {"tree": "((seq0000004:0.08408[0],seq0000005:0.13713[1])0.609:0.00215[2],seq0000003:0.02032[3],(seq0000001:0.00014[4],seq0000002:0.00014[5])0.766:0.00015[6]):0[7];", "placements": [ {"p": [[0, -113.210938, 0.713818, 0.064504, 0.000006], [1, -114.929894, 
0.127954, 0.137122, 0.000007], [2, -114.932766, 0.127587, 0.000008, 0.000006], [6, -117.743534, 0.007675, 0.000141, 0.027211], [3, -117.743759, 0.007674, 0.020310, 0.027207], [4, -117.747386, 0.007646, 0.000131, 0.027266], [5, -117.747396, 0.007646, 0.000131, 0.027266] ], "n": ["seq0000006"] }, {"p": [[0, -113.476305, 1.000000, 0.035395, 0.000006]], "n": ["seq0000007"] } ], "metadata": {"invocation": "pplacer -t %s -r %s -s %s --out-dir \/tmp %s" }, "version": 1, "fields": ["edge_num", "likelihood", "like_weight_ratio", "distal_length", "pendant_length" ] } """.replace('\n','').replace(' ','') QUERY_SEQS= """\ >6 TGCATGTCAGTATAGCTTTGGTGAAACTGCGAATGGCTCATTAAATCAGT >7 TGCATGTCAGTATAACTTTGGTGAAACTGCGAATGGCTCATTAAATCAGT """ REF_SEQS= """\ >seq0000011 TGCATGTCAGTATAGCTTTAGTGAAACTGCGAATGGCTCATTAAATCAGT >seq0000012 TGCATGTCAGTATAGCTTTAGTGAAACTGCGAATGGCTNNTTAAATCAGT >seq0000013 TGCATGTCAGTATAGCATTAGTGAAACTGCGAATGGCTCATTAAATCAGT >seq0000014 TCCATGTCAGTATAACTTTGGTGAAACTGCGAATGGCTCATTAAATCAGG >seq0000015 NNNNNNNNNNTATATCTTATGTGAAACTTCGAATGCCTCATTAAATCAGT """ REF_TREE="""((seq0000014:0.08408,seq0000015:0.13713)0.609:0.00215,seq0000013:0.02032,(seq0000011:0.00014,seq0000012:0.00014)0.766:0.00015); """ RESULT_TREE="""((((seq0000014:0.0353946,7:6.11352e-06):0.0291093,6:6.11352e-06):0.019576,seq0000015:0.13713)0.609:0.00215,seq0000013:0.02032,(seq0000011:0.00014,seq0000012:0.00014)0.766:0.00015);""" RAXML_STATS=""" This is RAxML version 7.2.6 released by Alexandros Stamatakis in February 2010. 
With greatly appreciated code contributions by: Andre Aberer (TUM) Simon Berger (TUM) John Cazes (TACC) Michael Ott (TUM) Nick Pattengale (UNM) Wayne Pfeiffer (SDSC) Alignment has 18 distinct alignment patterns Proportion of gaps and completely undetermined characters in this alignment: 4.80% RAxML rapid hill-climbing mode Using 1 distinct models/data partitions with joint branch length optimization Executing 1 inferences on the original alignment using 1 distinct randomized MP trees All free model parameters will be estimated by RAxML ML estimate of 25 per site rate categories Likelihood of final tree will be evaluated and optimized under GAMMA GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units Partition: 0 Alignment Patterns: 18 Name: No Name Provided DataType: DNA Substitution Matrix: GTR RAxML was called as follows: raxmlHPC -m GTRCAT -s test_raxml.phy -n results Inference[0]: Time 0.072128 CAT-based likelihood -85.425107, best rearrangement setting 2 alpha[0]: 1.000000 rates[0] ac ag at cg ct gt: 0.000017 0.037400 0.859448 1.304301 0.000017 1.000000 Conducting final model optimizations on all 1 trees under GAMMA-based models .... Inference[0] final GAMMA-based Likelihood: -107.575676 tree written to file /home/RAxML_result.results Starting final GAMMA-based thorough Optimization on tree 0 likelihood -107.575676 .... 
Final GAMMA-based Score of best tree -107.575676 Program execution info written to /home/RAxML_info.results Best-scoring ML tree written to: /home/RAxML_bestTree.results Overall execution time: 0.078965 secs or 0.000022 hours or 0.000001 days """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_raxml.py000644 000765 000024 00000013263 12024702176 021763 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import getcwd, remove, rmdir, mkdir from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from cogent.app.raxml import Raxml,raxml_alignment, build_tree_from_alignment from cogent.app.util import ApplicationError from cogent.parse.phylip import get_align_for_phylip from cogent.core.tree import PhyloNode from cogent.core.moltype import RNA from StringIO import StringIO from cogent.util.misc import app_path from subprocess import Popen, PIPE, STDOUT __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "Micah Hamady" __status__ = "Development" class GenericRaxml(TestCase): def setUp(self): """Check if Raxml version is supported for this test""" acceptable_version = (7,0,3) self.assertTrue(app_path('raxmlHPC'), "raxmlHPC not found. This may or may not be a problem depending on "+\ "which components of QIIME you plan to use.") command = "raxmlHPC -v | grep version" proc = Popen(command,shell=True,universal_newlines=True,\ stdout=PIPE,stderr=STDOUT) stdout = proc.stdout.read() version_string = stdout.strip().split(' ')[4].strip() try: version = tuple(map(int,version_string.split('.'))) pass_test = version == acceptable_version except ValueError: pass_test = False version_string = stdout self.assertTrue(pass_test,\ "Unsupported raxmlHPC version. %s is required, but running %s."
\ % ('.'.join(map(str,acceptable_version)), version_string)) """Setup data for raxml tests""" self.seqs1 = ['ACUGCUAGCUAGUAGCGUACGUA','GCUACGUAGCUAC', 'GCGGCUAUUAGAUCGUA'] self.labels1 = ['>1','>2','>3'] self.lines1 = flatten(zip(self.labels1,self.seqs1)) self.test_model = "GTRCAT" self.align1 = get_align_for_phylip(StringIO(PHYLIP_FILE)) self.test_fn1 = "/tmp/raxml_test1.txt" self.test_fn2 = "raxml_test1.txt" self.test_fn1_space = "/tmp/raxml test1.txt" def writeTmp(self, outname): """Write data to temp file""" t = open(outname, "w+") t.write(PHYLIP_FILE) t.close() class RaxmlTests(GenericRaxml): """Tests for the Raxml application controller""" def test_raxml(self): """raxml BaseCommand should return the correct BaseCommand""" r = Raxml() self.assertEqual(r.BaseCommand, \ ''.join(['cd \"',getcwd(),'/\"; ','raxmlHPC -e 0.1 -f d -c 50'])) r.Parameters['-s'].on('seq.nexus') self.assertEqual(r.BaseCommand,\ ''.join(['cd \"',getcwd(),'/\"; ',\ 'raxmlHPC -e 0.1 -f d -c 50 -s seq.nexus'])) def test_raxml_params(self): """raxml should raise exception if missing required params""" r = Raxml(WorkingDir="/tmp") r.SuppressStdout = True r.SuppressStderr = True # raise error by default self.assertRaises(ValueError, r) # specify output name r.Parameters['-n'].on("test_name") self.assertRaises(ApplicationError, r) # specify model r.Parameters['-m'].on("GTRCAT") self.assertRaises(ApplicationError, r) r.Parameters['-s'].on(self.test_fn1) self.assertRaises(ApplicationError, r) self.writeTmp(self.test_fn1) o = r() o.cleanUp() remove(self.test_fn1) def test_raxml_from_file(self): """raxml should run correctly using filename""" r = Raxml(WorkingDir="/tmp") r.Parameters['-s'].on(self.test_fn1) r.Parameters['-m'].on("GTRCAT") r.Parameters['-n'].on("test_me") # test with abs filename cur_out = self.test_fn1 self.writeTmp(cur_out) out = r() out.cleanUp() remove(cur_out) # test with rel + working dir r.Parameters['-s'].on(self.test_fn2) r.Parameters['-n'].on("test_me2") 
r.Parameters['-w'].on("/tmp/") self.writeTmp(self.test_fn1) out = r() out.cleanUp() remove(self.test_fn1) r.Parameters['-s'].on("\"%s\"" % self.test_fn1_space) r.Parameters['-n'].on("test_me3") r.Parameters['-w'].on("/tmp/") #print r.BaseCommand self.writeTmp(self.test_fn1_space) out = r() out.cleanUp() remove(self.test_fn1_space) def test_raxml_alignment(self): """raxml_alignment should work as expected""" phy_node, parsimony_phy_node, log_likelihood, total_exec \ = raxml_alignment(self.align1) def test_build_tree_from_alignment(self): """Builds a tree from an alignment""" tree = build_tree_from_alignment(self.align1, RNA, False) self.assertTrue(isinstance(tree, PhyloNode)) self.assertEqual(len(tree.tips()), 7) self.assertRaises(NotImplementedError, build_tree_from_alignment, \ self.align1, RNA, True) PHYLIP_FILE= """ 7 50 Species001 UGCAUGUCAG UAUAGCUUUA GUGAAACUGC GAAUGGCUCA UUAAAUCAGU Species002 UGCAUGUCAG UAUAGCUUUA GUGAAACUGC GAAUGGCUNN UUAAAUCAGU Species003 UGCAUGUCAG UAUAGCAUUA GUGAAACUGC GAAUGGCUCA UUAAAUCAGU Species004 UCCAUGUCAG UAUAACUUUG GUGAAACUGC GAAUGGCUCA UUAAAUCAGG Species005 NNNNNNNNNN UAUAUCUUAU GUGAAACUUC GAAUGCCUCA UUAAAUCAGU Species006 UGCAUGUCAG UAUAGCUUUG GUGAAACUGC GAAUGGCUCA UUAAAUCAGU Species007 UGCAUGUCAG UAUAACUUUG GUGAAACUGC GAAUGGCUCA UUAAAUCAGU """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_raxml_v730.py000644 000765 000024 00000021231 12024702176 022534 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import getcwd, remove, rmdir, mkdir from os.path import splitext from cogent.util.unit_test import TestCase, main from cogent.util.misc import flatten from random import randint from cogent.app.raxml_v730 import (Raxml,raxml_alignment,\ build_tree_from_alignment,\ insert_sequences_into_tree) from cogent.app.util import ApplicationError,get_tmp_filename from cogent.parse.phylip import get_align_for_phylip from cogent.core.tree import PhyloNode from cogent.core.moltype import RNA,DNA from 
StringIO import StringIO from cogent.util.misc import app_path from subprocess import Popen, PIPE, STDOUT from cogent.core.alignment import Alignment import re from random import choice, randint __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Rob Knight", "Daniel McDonald","Jesse Stombaugh"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Stombaugh" __email__ = "jesse.stombaugh@colorado.edu" __status__ = "Production" class GenericRaxml(TestCase): def setUp(self): """Check if Raxml version is supported for this test""" acceptable_version = (7,3,0) self.assertTrue(app_path('raxmlHPC'), "raxmlHPC not found. This may or may not be a problem depending on "+\ "which components of QIIME you plan to use.") command = "raxmlHPC -v | grep version" proc = Popen(command,shell=True,universal_newlines=True,\ stdout=PIPE,stderr=STDOUT) stdout = proc.stdout.read() version_string = stdout.strip().split(' ')[4].strip() try: version = tuple(map(int,version_string.split('.'))) pass_test = version == acceptable_version except ValueError: pass_test = False version_string = stdout self.assertTrue(pass_test,\ "Unsupported raxmlHPC version. %s is required, but running %s." 
\ % ('.'.join(map(str,acceptable_version)), version_string)) """Setup data for raxml tests""" self.seqs1 = ['ACUGCUAGCUAGUAGCGUACGUA','GCUACGUAGCUAC', 'GCGGCUAUUAGAUCGUA'] self.labels1 = ['>1','>2','>3'] self.lines1 = flatten(zip(self.labels1,self.seqs1)) self.test_model = "GTRCAT" self.align1 = get_align_for_phylip(StringIO(PHYLIP_FILE)) self.test_fn1 = "/tmp/raxml_test1.txt" self.test_fn2 = "raxml_test1.txt" self.test_fn1_space = "/tmp/raxml test1.txt" def writeTmp(self, outname): """Write data to temp file""" t = open(outname, "w+") t.write(PHYLIP_FILE) t.close() class RaxmlTests(GenericRaxml): """Tests for the Raxml application controller""" def test_raxml(self): """raxml BaseCommand should return the correct BaseCommand""" r = Raxml() self.assertEqual(r.BaseCommand, \ ''.join(['cd \"',getcwd(),'/\"; ','raxmlHPC -f d -# 1'])) r.Parameters['-s'].on('seq.nexus') self.assertEqual(r.BaseCommand,\ ''.join(['cd \"',getcwd(),'/\"; ',\ 'raxmlHPC -f d -s seq.nexus -# 1'])) def test_raxml_params(self): """raxml should raise exception if missing required params""" r = Raxml(WorkingDir="/tmp") r.SuppressStdout = True r.SuppressStderr = True # raise error by default self.assertRaises(ValueError, r) # specify output name r.Parameters['-n'].on("test_name") r.Parameters["-p"].on(randint(1,100000)) self.assertRaises(ApplicationError, r) # specify model r.Parameters['-m'].on("GTRCAT") self.assertRaises(ApplicationError, r) r.Parameters['-s'].on(self.test_fn1) self.assertRaises(ApplicationError, r) self.writeTmp(self.test_fn1) o = r() o.cleanUp() remove(self.test_fn1) def test_raxml_from_file(self): """raxml should run correctly using filename""" r = Raxml(WorkingDir="/tmp") r.Parameters['-s'].on(self.test_fn1) r.Parameters['-m'].on("GTRCAT") r.Parameters['-n'].on("test_me") r.Parameters["-p"].on(randint(1,100000)) # test with abs filename cur_out = self.test_fn1 self.writeTmp(cur_out) out = r() out.cleanUp() remove(cur_out) # test with rel + working dir 
r.Parameters['-s'].on(self.test_fn2) r.Parameters['-n'].on("test_me2") r.Parameters['-w'].on("/tmp/") r.Parameters["-p"].on(randint(1,100000)) self.writeTmp(self.test_fn1) out = r() out.cleanUp() remove(self.test_fn1) r.Parameters['-s'].on("\"%s\"" % self.test_fn1_space) r.Parameters['-n'].on("test_me3") r.Parameters['-w'].on("/tmp/") r.Parameters["-p"].on(randint(1,100000)) #print r.BaseCommand self.writeTmp(self.test_fn1_space) out = r() out.cleanUp() remove(self.test_fn1_space) def test_raxml_alignment(self): """raxml_alignment should work as expected""" phy_node, parsimony_phy_node, log_likelihood, total_exec \ = raxml_alignment(self.align1) def test_build_tree_from_alignment(self): """Builds a tree from an alignment""" tree = build_tree_from_alignment(self.align1, RNA, False) self.assertTrue(isinstance(tree, PhyloNode)) self.assertEqual(len(tree.tips()), 7) self.assertRaises(NotImplementedError, build_tree_from_alignment, \ self.align1, RNA, True) def test_insert_sequences_into_tree(self): """Inserts sequences into Tree using params - test handles tree-insertion""" # generate temp filename for output outfname=splitext(get_tmp_filename('/tmp/'))[0] # create starting tree outtreefname=outfname+'.tre' outtree=open(outtreefname,'w') outtree.write(REF_TREE) outtree.close() # set params for tree-insertion params={} params["-w"]="/tmp/" params["-n"] = get_tmp_filename().split("/")[-1] params["-f"] = 'v' #params["-G"] = '0.25' params["-t"] = outtreefname params["-m"] = 'GTRGAMMA' aln_ref_query=get_align_for_phylip(StringIO(PHYLIP_FILE_DNA_REF_QUERY)) aln = Alignment(aln_ref_query) seqs, align_map = aln.toPhylip() tree = insert_sequences_into_tree(seqs, DNA, params=params, write_log=False) for node in tree.tips(): removed_query_str=re.sub('QUERY___','',str(node.Name)) new_node_name=re.sub('___\d+','',str(removed_query_str)) if new_node_name in align_map: node.Name = align_map[new_node_name] self.assertTrue(isinstance(tree, PhyloNode)) 
self.assertEqual(tree.getNewick(with_distances=True),RESULT_TREE) self.assertEqual(len(tree.tips()), 7) self.assertRaises(NotImplementedError, build_tree_from_alignment, \ self.align1, RNA, True) remove(outtreefname) PHYLIP_FILE= """ 7 50 Species001 UGCAUGUCAG UAUAGCUUUA GUGAAACUGC GAAUGGCUCA UUAAAUCAGU Species002 UGCAUGUCAG UAUAGCUUUA GUGAAACUGC GAAUGGCUNN UUAAAUCAGU Species003 UGCAUGUCAG UAUAGCAUUA GUGAAACUGC GAAUGGCUCA UUAAAUCAGU Species004 UCCAUGUCAG UAUAACUUUG GUGAAACUGC GAAUGGCUCA UUAAAUCAGG Species005 NNNNNNNNNN UAUAUCUUAU GUGAAACUUC GAAUGCCUCA UUAAAUCAGU Species006 UGCAUGUCAG UAUAGCUUUG GUGAAACUGC GAAUGGCUCA UUAAAUCAGU Species007 UGCAUGUCAG UAUAACUUUG GUGAAACUGC GAAUGGCUCA UUAAAUCAGU """ PHYLIP_FILE_DNA_REF_QUERY= """ 7 50 Species001 TGCATGTCAG TATAGCTTTA GTGAAACTGC GAATGGCTCA TTAAATCAGT Species002 TGCATGTCAG TATAGCTTTA GTGAAACTGC GAATGGCTNN TTAAATCAGT Species003 TGCATGTCAG TATAGCATTA GTGAAACTGC GAATGGCTCA TTAAATCAGT Species004 TCCATGTCAG TATAACTTTG GTGAAACTGC GAATGGCTCA TTAAATCAGG Species005 NNNNNNNNNN TATATCTTAT GTGAAACTTC GAATGCCTCA TTAAATCAGT Species006 TGCATGTCAG TATAGCTTTG GTGAAACTGC GAATGGCTCA TTAAATCAGT Species007 TGCATGTCAG TATAACTTTG GTGAAACTGC GAATGGCTCA TTAAATCAGT """ REF_TREE="""((seq0000004:0.08408,seq0000005:0.13713)0.609:0.00215,seq0000003:0.02032,(seq0000001:0.00014,seq0000002:0.00014)0.766:0.00015); """ RESULT_TREE="""(Species003:0.0194919169324,(Species001:4.34281710439e-07,Species002:4.34281710439e-07):4.34281710439e-07,(((Species006:0.0,Species007:0.0):0.0,Species004:0.0438017433031):0.0438017433031,Species005:0.171345128781):0.00331197405878);""" if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_rdp_classifier.py000644 000765 000024 00000054226 12024702176 023635 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests for the rdp_classifier_2.0.1 application controller""" import os from cStringIO import StringIO from os import getcwd, environ, remove, listdir from shutil import rmtree import tempfile from 
cogent.app.util import ApplicationNotFoundError, ApplicationError,\ get_tmp_filename from cogent.app.rdp_classifier import ( RdpClassifier, RdpTrainer, assign_taxonomy, train_rdp_classifier, train_rdp_classifier_and_assign_taxonomy, parse_rdp_assignment ) from cogent.util.unit_test import TestCase, main __author__ = "Kyle Bittinger" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Kyle Bittinger"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kyle Bittinger" __email__ = "kylebittinger@gmail.com" __status__ = "Prototype" class RdpClassifierTests(TestCase): def setUp(self): # fetch user's RDP_JAR_PATH if 'RDP_JAR_PATH' in environ: self.user_rdp_jar_path = environ['RDP_JAR_PATH'] else: self.user_rdp_jar_path = 'rdp_classifier-2.2.jar' self.output_file = tempfile.NamedTemporaryFile() def test_default_java_vm_parameters(self): """RdpClassifier should store default arguments to Java VM.""" a = RdpClassifier() self.assertContains(a.Parameters, '-Xmx') self.assertEqual(a.Parameters['-Xmx'].Value, '1000m') def test_parameters_list(self): a = RdpClassifier() parameters = a.Parameters.keys() parameters.sort() self.assertEqual(parameters, ['-Xmx', '-f', '-o', '-t']) def test_assign_jvm_parameters(self): """RdpClassifier should pass alternate parameters to Java VM.""" app = RdpClassifier() app.Parameters['-Xmx'].on('75M') exp = ''.join([ 'cd "', getcwd(), '/"; java -Xmx75M -jar "', self.user_rdp_jar_path, '" -q']) self.assertEqual(app.BaseCommand, exp) def test_basecommand_property(self): """RdpClassifier BaseCommand property should use overridden method.""" app = RdpClassifier() self.assertEqual(app.BaseCommand, app._get_base_command()) def test_base_command(self): """RdpClassifier should return expected shell command.""" app = RdpClassifier() exp = ''.join([ 'cd "', getcwd(), '/"; java -Xmx1000m -jar "', self.user_rdp_jar_path, '" -q']) self.assertEqual(app.BaseCommand, exp) def test_change_working_dir(self): """RdpClassifier should run
program in expected working directory.""" test_dir = '/tmp/RdpTest' app = RdpClassifier(WorkingDir=test_dir) exp = ''.join([ 'cd "', test_dir, '/"; java -Xmx1000m -jar "', self.user_rdp_jar_path, '" -q']) self.assertEqual(app.BaseCommand, exp) rmtree(test_dir) def test_sample_fasta(self): """RdpClassifier should classify its own sample data correctly""" test_dir = '/tmp/RdpTest' app = RdpClassifier(WorkingDir=test_dir) _, output_fp = tempfile.mkstemp(dir=test_dir) app.Parameters['-o'].on(output_fp) results = app(StringIO(rdp_sample_fasta)) assignment_toks = results['Assignments'].readline().split('\t') self.assertEqual(assignment_toks[0], 'X67228') lineage = [x.strip('"') for x in assignment_toks[2::3]] self.assertEqual(lineage, [ 'Root', 'Bacteria', 'Proteobacteria', 'Alphaproteobacteria', 'Rhizobiales', 'Rhizobiaceae', 'Rhizobium']) rmtree(test_dir) class RdpTrainerTests(TestCase): """Tests of the trainer for the RdpClassifier app """ def setUp(self): self.reference_file = StringIO(rdp_training_sequences) self.reference_file.seek(0) self.taxonomy_file = tempfile.NamedTemporaryFile( prefix="RdpTaxonomy", suffix=".txt") self.taxonomy_file.write(rdp_training_taxonomy) self.taxonomy_file.seek(0) self.training_dir = tempfile.mkdtemp(prefix='RdpTrainer_output_') def tearDown(self): rmtree(self.training_dir) def test_call(self): app = RdpTrainer() app.Parameters['taxonomy_file'].on(self.taxonomy_file.name) app.Parameters['model_output_dir'].on(self.training_dir) results = app(self.reference_file) exp_file_list = [ 'bergeyTrainingTree.xml', 'genus_wordConditionalProbList.txt', 'logWordPrior.txt', 'RdpClassifier.properties', 'wordConditionalProbIndexArr.txt', ] obs_file_list = listdir(self.training_dir) exp_file_list.sort() obs_file_list.sort() self.assertEqual(obs_file_list, exp_file_list) autogenerated_headers = { 'bergeyTree': 'bergeyTrainingTree', 'probabilityList': 'genus_wordConditionalProbList', 'wordPrior': 'logWordPrior', 'probabilityIndex': 
'wordConditionalProbIndexArr', } for id, basename in autogenerated_headers.iteritems(): obs_header = results[id].readline() exp_header = exp_training_header_template % basename self.assertEqual(exp_header, obs_header) class RdpWrapperTests(TestCase): """ Tests of RDP classifier wrapper functions """ def setUp(self): self.num_trials = 10 self.test_input1 = rdp_test_fasta.split('\n') self.expected_assignments1 = rdp_expected_out # Files for training self.reference_file = StringIO(rdp_training_sequences) self.reference_file.seek(0) self.taxonomy_file = StringIO(rdp_training_taxonomy) self.taxonomy_file.seek(0) self.training_dir = tempfile.mkdtemp(prefix='RdpTrainer_output_') # Sequences for trained classifier self.test_trained_input = rdp_trained_fasta.split("\n") def tearDown(self): rmtree(self.training_dir) def test_parse_rdp_assignment(self): seqid, direction, assignments = parse_rdp_assignment( "X67228\t\t" "Root\tnorank\t1.0\t" "Bacteria\tdomain\t1.0\t" "\"Proteobacteria\"\tphylum\t1.0\t" "Alphaproteobacteria\tclass\t0.9\t" "Rhizobiales\torder\t0.9\t" "Rhizobiaceae\tfamily\t0.47\t" "Rhizobium\tgenus\t0.46") self.assertEqual(seqid, "X67228") def test_assign_taxonomy_short_sequence(self): """assign_taxonomy should return Unassignable if sequence is too short """ assignments = assign_taxonomy([ '>MySeq 1', 'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGA', ]) self.assertEqual(assignments, {'MySeq 1': ('Unassignable', 1.0)}) def test_assign_taxonomy(self): """assign_taxonomy wrapper functions as expected This test may fail periodically, but failure should be rare.
""" unverified_seq_ids = set(self.expected_assignments1.keys()) for i in range(self.num_trials): obs_assignments = assign_taxonomy(self.test_input1) for seq_id in list(unverified_seq_ids): obs_lineage, obs_confidence = obs_assignments[seq_id] exp_lineage = self.expected_assignments1[seq_id] if (obs_lineage == exp_lineage): unverified_seq_ids.remove(seq_id) if not unverified_seq_ids: break messages = [] for seq_id in unverified_seq_ids: messages.append("Unable to verify %s trials" % self.num_trials) messages.append(" Sequence ID: %s" % seq_id) messages.append(" Expected: %s" % self.expected_assignments1[seq_id]) messages.append(" Observed: %s" % obs_assignments[seq_id][0]) messages.append(" Confidence: %s" % obs_assignments[seq_id][1]) # make sure all taxonomic results were correct at least once self.assertFalse(unverified_seq_ids, msg='\n'.join(messages)) def test_assign_taxonomy_alt_confidence(self): """assign_taxonomy wrapper functions as expected with alt confidence """ obs_assignments = assign_taxonomy( self.test_input1, min_confidence=0.95) for seq_id, assignment in obs_assignments.items(): obs_lineage, obs_confidence = assignment exp_lineage = self.expected_assignments1[seq_id] message = "Sequence ID: %s, assignment: %s" % (seq_id, assignment) self.assertTrue( exp_lineage.startswith(obs_lineage) or \ (obs_lineage == "Unclassified"), msg=message, ) self.assertTrue(obs_confidence >= 0.95, msg=message) def test_assign_taxonomy_file_output(self): """ assign_taxonomy wrapper writes correct file output when requested This function tests for sucessful completion of assign_taxonomy when writing to file, that the lines in the file roughly look correct by verifying how many are written (by zipping with expected), and that each line starts with the correct seq id. Actual testing of taxonomy data is performed elsewhere. 
""" output_fp = get_tmp_filename(\ prefix='RDPAssignTaxonomyTests',suffix='.txt') # convert the expected dict to a list of lines to match # file output expected_file_headers = self.expected_assignments1.keys() expected_file_headers.sort() actual_return_value = assign_taxonomy(\ self.test_input1,min_confidence=0.95,output_fp=output_fp) actual_file_output = list(open(output_fp)) actual_file_output.sort() # remove the output_fp before running the tests, so if they # fail the output file is still cleaned-up remove(output_fp) # None return value on write to file self.assertEqual(actual_return_value,None) # check that each line starts with the correct seq_id -- not # checking the taxonomies or confidences here as these are variable and # tested elsewhere for a,e in zip(actual_file_output,expected_file_headers): self.assertTrue(a.startswith(e)) def test_train_rdp_classifier(self): results = train_rdp_classifier( self.reference_file, self.taxonomy_file, self.training_dir) exp_file_list = [ 'bergeyTrainingTree.xml', 'genus_wordConditionalProbList.txt', 'logWordPrior.txt', 'RdpClassifier.properties', 'wordConditionalProbIndexArr.txt', ] obs_file_list = listdir(self.training_dir) exp_file_list.sort() obs_file_list.sort() self.assertEqual(obs_file_list, exp_file_list) autogenerated_headers = { 'bergeyTree': 'bergeyTrainingTree', 'probabilityList': 'genus_wordConditionalProbList', 'wordPrior': 'logWordPrior', 'probabilityIndex': 'wordConditionalProbIndexArr', } for id, basename in autogenerated_headers.iteritems(): obs_header = results[id].readline() exp_header = exp_training_header_template % basename self.assertEqual(exp_header, obs_header) def test_train_rdp_classifier_and_assign_taxonomy(self): obs = train_rdp_classifier_and_assign_taxonomy(self.reference_file, self.taxonomy_file, self.test_trained_input, min_confidence=0.80, model_output_dir=self.training_dir) exp = {'X67228': ( 'Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;' 'Rhizobiaceae;Rhizobium', 1.0 )} 
self.assertEqual(obs, exp) # Sample data copied from rdp_classifier-2.0, which is licensed under # the GPL 2.0 and Copyright 2008 Michigan State University Board of # Trustees rdp_training_sequences = """>X67228 Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Rhizobiaceae;Rhizobium aacgaacgctggcggcaggcttaacacatgcaagtcgaacgctccgcaaggagagtggcagacgggtgagtaacgcgtgggaatctacccaaccctgcggaatagctctgggaaactggaattaataccgcatacgccctacgggggaaagatttatcggggatggatgagcccgcgttggattagctagttggtggggtaaaggcctaccaaggcgacgatccatagctggtctgagaggatgatcagccacattgggactgagacacggcccaaa >X73443 Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium nnnnnnngagatttgatcctggctcaggatgaacgctggccggccgtgcttacacatgcagtcgaacgaagcgcttaaactggatttcttcggattgaagtttttgctgactgagtggcggacgggtgagtaacgcgtgggtaacctgcctcatacagggggataacagttagaaatgactgctaataccnnataagcgcacagtgctgcatggcacagtgtaaaaactccggtggtatgagatggacccgcgtctgattagctagttggtggggt >AB004750 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Enterobacter acgctggcggcaggcctaacacatgcaagtcgaacggtagcagaaagaagcttgcttctttgctgacgagtggcggacgggtgagtaatgtctgggaaactgcccgatggagggggataactactggaaacggtagctaataccgcataacgtcttcggaccaaagagggggaccttcgggcctcttgccatcggatgtgcccagatgggattagctagtaggtggggtaacggctcacctaggcgacgatccctagctggtctgagaggatgaccagccacactggaactgagacacggtccagactcctacgggaggcagcagtggggaatattgca >xxxxxx Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas ttgaacgctggcggcaggcctaacacatgcaagtcgagcggcagcannnncttcgggaggctggcgagcggcggacgggtgagtaacgcatgggaacttacccagtagtgggggatagcccggggaaacccggattaataccgcatacgccctgagggggaaagcgggctccggtcgcgctattggatgggcccatgtcggattagttagttggtggggtaatggcctaccaaggcgacgatccgtagctggtctgagaggatgatcagccacaccgggactgagacacggcccggactcctacgggaggcagcagtggggaatattggacaatgggggcaaccctgatccagccatgccg >AB004748 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Enterobacter 
acgctggcggcaggcctaacacatgcaagtcgaacggtagcagaaagaagcttgcttctttgctgacgagtggcggacgggtgagtaatgtctgggaaactgcccgatggagggggataactactggaaacggtagctaataccgcataacgtcttcggaccaaagagggggaccttcgggcctcttgccatcggatgtgcccagatgggattagctagtaggtggggtaacggctcacctaggcgacgatccctagctggtctgagaggatgaccagccacactggaactgagacacggtccagactcctacgggaggcagcagtggggaatattgcacaatgggcgcaagcctgatgcagccatgccgcgtgtatgaagaaggccttcgggttg >AB000278 Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Photobacterium caggcctaacacatgcaagtcgaacggtaanagattgatagcttgctatcaatgctgacgancggcggacgggtgagtaatgcctgggaatataccctgatgtgggggataactattggaaacgatagctaataccgcataatctcttcggagcaaagagggggaccttcgggcctctcgcgtcaggattagcccaggtgggattagctagttggtggggtaatggctcaccaaggcgacgatccctagctggtctgagaggatgatcagccacactggaactgagacacggtccagactcctacgggaggcagcagtggggaatattgcacaatgggggaaaccctgatgcagccatgccgcgtgta >AB000390 Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Vibrio tggctcagattgaacgctggcggcaggcctaacacatgcaagtcgagcggaaacgantnntntgaaccttcggggnacgatnacggcgtcgagcggcggacgggtgagtaatgcctgggaaattgccctgatgtgggggataactattggaaacgatagctaataccgcataatgtctacggaccaaagagggggaccttcgggcctctcgcttcaggatatgcccaggtgggattagctagttggtgaggtaatggctcaccaaggcgacgatccctagctggtctgagaggatgatcagccacactggaactgag """ rdp_training_taxonomy = """\ 1*Bacteria*0*0*domain 765*Firmicutes*1*1*phylum 766*Clostridia*765*2*class 767*Clostridiales*766*3*order 768*Clostridiaceae*767*4*family 769*Clostridium*768*5*genus 160*Proteobacteria*1*1*phylum 433*Gammaproteobacteria*160*2*class 586*Vibrionales*433*3*order 587*Vibrionaceae*586*4*family 588*Vibrio*587*5*genus 592*Photobacterium*587*5*genus 552*Pseudomonadales*433*3*order 553*Pseudomonadaceae*552*4*family 554*Pseudomonas*553*5*genus 604*Enterobacteriales*433*3*order 605*Enterobacteriaceae*604*4*family 617*Enterobacter*605*5*genus 161*Alphaproteobacteria*160*2*class 260*Rhizobiales*161*3*order 261*Rhizobiaceae*260*4*family 262*Rhizobium*261*5*genus""" exp_training_header_template = 
"1version1cogent%s\n" rdp_trained_fasta = """>X67228 aacgaacgctggcggcaggcttaacacatgcaagtcgaacgctccgcaaggagagtggcagacgggtgagtaacgcgtgggaatctacccaaccctgcggaatagctctgggaaactggaattaataccgcatacgccctacgggggaaagatttatcggggatggatgagcccgcgttggattagctagttggtggggtaaaggcctaccaaggcgacgatccatagctggtctgagaggatgatcagccacattgggactgagacacggcccaaa """ rdp_sample_fasta = """>X67228 Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Rhizobiaceae;Rhizobium aacgaacgctggcggcaggcttaacacatgcaagtcgaacgctccgcaaggagagtggcagacgggtgagtaacgcgtgggaatctacccaaccctgcggaatagctctgggaaactggaattaataccgcatacgccctacgggggaaagatttatcggggatggatgagcccgcgttggattagctagttggtggggtaaaggcctaccaaggcgacgatccatagctggtctgagaggatgatcagccacattgggactgagacacggcccaaa """ rdp_sample_classification = """>X67228 reverse=false Root; 1.0; Bacteria; 1.0; Proteobacteria; 1.0; Alphaproteobacteria; 1.0; Rhizobiales; 1.0; Rhizobiaceae; 1.0; Rhizobium; 0.95; """ rdp_test_fasta = """>AY800210 description field TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTAGATGAATAAGGGGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCCCGAGTGGTCGGGATTTTTATTGGGCCTAAAGCGTCCGTAGCCGGGCGTGCAAGTCATTGGTTAAATATCGGGTCTTAAGCCCGAACCTGCTAGTGATACTACACGCCTTGGGACCGGAAGAGGCAAATGGTACGTTGAGGGTAGGGGTGAAATCCTGTAATCCCCAACGGACCACCGGTGGCGAAGCTTGTTCAGTCATGAACAACTCTACACAAGGCGATTTGCTGGGACGGATCCGACGGTGAGGGACGAAACCCAGGGGAGCGAGCGGGATTAGATACCCCGGTAGTCCTGGGCGTAAACGATGCGAACTAGGTGTTGGCGGAGCCACGAGCTCTGTCGGTGCCGAAGCGAAGGCGTTAAGTTCGCCGCCAGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCAC >EU883771 
TGGCGTACGGCTCAGTAACACGTGGATAACTTACCCTTAGGACTGGGATAACTCTGGGAAACTGGGGATAATACTGGATATTAGGCTATGCCTGGAATGGTTTGCCTTTGAAATGTTTTTTTTCGCCTAAGGATAGGTCTGCGGCTGATTAGGTCGTTGGTGGGGTAATGGCCCACCAAGCCGATGATCGGTACGGGTTGTGAGAGCAAGGGCCCGGAGATGGAACCTGAGACAAGGTTCCAGACCCTACGGGGTGCAGCAGGCGCGAAACCTCCGCAATGTACGAAAGTGCGACGGGGGGATCCCAAGTGTTATGCTTTTTTGTATGACTTTTCATTAGTGTAAAAAGCTTTTAGAATAAGAGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAACACCGGCAGCTCGAGTGGTGACCACTTTTATTGGGCTTAAAGCGTTCGTAGCTTGATTTTTAAGTCTCTTGGGAAATCTCACGGCTTAACTGTGAGGCGTCTAAGAGATACTGGGAATCTAGGGACCGGGAGAGGTAAGAGGTACTTCAGGGGTAGAAGTGAAATTCTGTAATCCTTGAGGGACCACCGATGGCGAAGGCATCTTACCAGAACGGCTTCGACAGTGAGGAACGAAAGCTGGGGGAGCGAACGGGATTAGATACCCCGGTAGTCCCAGCCGTAAACTATGCGCGTTAGGTGTGCCTGTAACTACGAGTTACCGGGGTGCCGAAGTGAAAACGTGAAACGTGCCGCCTGGGAAGTACGGTCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAACGGGTGGAGCCTGCGGTTTAATTGGACTCAACGCCGGGCAGCTCACCGGATAGGACAGCGGAATGATAGCCGGGCTGAAGACCTTGCTTGACCAGCTGAGA >EF503699 AAGAATGGGGATAGCATGCGAGTCACGCCGCAATGTGTGGCATACGGCTCAGTAACACGTAGTCAACATGCCCAGAGGACGTGGACACCTCGGGAAACTGAGGATAAACCGCGATAGGCCACTACTTCTGGAATGAGCCATGACCCAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGGAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCACGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCCGCTTAACGGATGGGCTGCGGAGGATACTGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCTTTGATCTACTGAAGACCACCAGTGGTGAAGGCGGTTCGCCAGAACGCGCTCGAACGGTGAGGATGAAAGCTGGGGGAGCAAACCGGAATAGATACCCGAGTAATCCCAACTGTAAACGATGGCAACTCGGGGATGGGTTGGCCTCCAACCAACCCCATGGCCGCAGGGAAGCCGTTTAGCTCTCCCGCCTGGGGAATACGGTCCGCAGAATTGAACCTTAAAGGAATTTGGCGGGGAACCCCCACAAGGGGGAAAACCGTGCGGTTCAATTGGAATCCACCCCCCGGAAACTTTACCCGGGCGCG >random_seq 
AAGCTCCGTCGCGTGAGCTAAAAACCATGCTGACTTATGAGACCTAAAAGCGATGCGCCGACCTGACGATGCTCTGTTCAGTTTCATCACGATCACCGGTAGTCAGGGTACCCTCCAGACCGCGCATAGTGACTATGTTCCCGCACCTGTATATGTAATTCCCATTATACGTCTACGTTATGTAGTAAAGTTGCTCACGCCAGGCACAGTTTGTCTTGATACATAGGGTAGCTTAAGTCCCGTCCATTTCACCGCGATTGTAATAGACGAATCAGCAGTGGTGCAATCAAGTCCCAACAGTTATATTTCAAAAATCTTCCGATAGTCGTGGGCGAAGTTGTCAACCTACCTACCATGGCTATAAGGCCCAGTTTACTTCAGTTGAACGTGACGGTAACCCTACTGAGTGCACGATACCTGCTCAACAACGGCCCAAAACCCGTGCGACACATTGGGCACTACAATAATCTTAGAGGACCATGGATCTGGTGGGTGGACTGAAGCATATCCCAAAAGTGTCGTGAGTCCGTTATGCAATTGACTGAAACAGCCGTACCAGAGTTCGGATGACCTCTGGGTTGCTGCGGTACACACCCGGGTGCGGCTTCTGAAATAGAAAAGACTAAGCATCGGCCGCCTCACACGCCAC >DQ260310 GATACCCCCGGAAACTGGGGATTATACCGGATATGTGGGGCTGCCTGGAATGGTACCTCATTGAAATGCTCCCGCGCCTAAAGATGGATCTGCCGCAGAATAAGTAGTTTGCGGGGTAAATGGCCACCCAGCCAGTAATCCGTACCGGTTGTGAAAACCAGAACCCCGAGATGGAAACTGAAACAAAGGTTCAAGGCCTACCGGGCACAACAAGCGCCAAAACTCCGCCATGCGAGCCATCGCGACGGGGGAAAACCAAGTACCACTCCTAACGGGGTGGTTTTTCCGAAGTGGAAAAAGCCTCCAGGAATAAGAACCTGGGCCAGAACCGTGGCCAGCCGCCGCCGTTACACCCGCCAGCTCGAGTTGTTGGCCGGTTTTATTGGGGCCTAAAGCCGGTCCGTAGCCCGTTTTGATAAGGTCTCTCTGGTGAAATTCTACAGCTTAACCTGTGGGAATTGCTGGAGGATACTATTCAAGCTTGAAGCCGGGAGAAGCCTGGAAGTACTCCCGGGGGTAAGGGGTGAAATTCTATTATCCCCGGAAGACCAACTGGTGCCGAAGCGGTCCAGCCTGGAACCGAACTTGACCGTGAGTTACGAAAAGCCAAGGGGCGCGGACCGGAATAAAATAACCAGGGTAGTCCTGGCCGTAAACGATGTGAACTTGGTGGTGGGAATGGCTTCGAACTGCCCAATTGCCGAAAGGAAGCTGTAAATTCACCCGCCTTGGAAGTACGGTCGCAAGACTGGAACCTAAAAGGAATTGGCGGGGGGACACCACAACGCGTGGAGCCTGGCGGTTTTATTGGGATTCCACGCAGACATCTCACTCAGGGGCGACAGCAGAAATGATGGGCAGGTTGATGACCTTGCTTGACAAGCTGAAAAGGAGGTGCAT >EF503697 
TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC
>short_seq
TAAAATGACTAGCCTGCGAGTCAC
"""

rdp_expected_out = {
    'AY800210 description field': 'Archaea;Euryarchaeota',
    'EU883771': 'Archaea;Euryarchaeota;Methanomicrobia;Methanomicrobiales;Methanomicrobiaceae;Methanomicrobium',
    'EF503699': 'Archaea;Crenarchaeota;Thermoprotei',
    'random_seq': 'Bacteria',
    'DQ260310': 'Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera',
    'EF503697': 'Archaea;Crenarchaeota;Thermoprotei',
    'short_seq': 'Unassignable',
    }

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_app/test_rdp_classifier20.py

#!/usr/bin/env python
"""Tests for the rdp_classifier_2.0.1 application controller"""

from cStringIO import StringIO
from os import getcwd, environ, remove, listdir
from shutil import rmtree
from tempfile import mkdtemp
from cogent.app.util import (
    ApplicationNotFoundError,
    ApplicationError,
    get_tmp_filename,
    )
from cogent.app.rdp_classifier20 import (
    RdpClassifier20,
    RdpTrainer20,
    assign_taxonomy,
    train_rdp_classifier,
    train_rdp_classifier_and_assign_taxonomy,
    )
from cogent.util.unit_test import TestCase, main

__author__ = "Kyle Bittinger"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Kyle Bittinger"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kyle Bittinger"
__email__ = "kylebittinger@gmail.com"
__status__ = "Prototype"


class RdpClassifier20Tests(TestCase):
    def setUp(self):
        # fetch user's RDP_JAR_PATH
        if 'RDP_JAR_PATH' in environ:
            self.user_rdp_jar_path = environ['RDP_JAR_PATH']
        else:
            self.user_rdp_jar_path = 'rdp_classifier-2.0.jar'

    def test_default_java_vm_parameters(self):
        """RdpClassifier should store default arguments to Java VM."""
        a = RdpClassifier20()
        self.assertContains(a.Parameters, '-Xmx')
        self.assertEqual(a.Parameters['-Xmx'].Value, '1000m')

    def test_parameters_list(self):
        a = RdpClassifier20()
        parameters = a.Parameters.keys()
        parameters.sort()
        self.assertEqual(parameters, ['-Xmx', '-training-data'])

    def test_jvm_parameters_list(self):
        a = RdpClassifier20()
        parameters = a.JvmParameters.keys()
        parameters.sort()
        self.assertEqual(parameters, ['-Xmx'])

    def test_positional_parameters_list(self):
        a = RdpClassifier20()
        parameters = a.PositionalParameters.keys()
        parameters.sort()
        self.assertEqual(parameters, ['-training-data'])

    def test_default_positional_parameters(self):
        """RdpClassifier should store default positional arguments."""
        a = RdpClassifier20()
        self.assertContains(a.PositionalParameters, '-training-data')
        self.assertEqual(a.PositionalParameters['-training-data'].Value, '')

    def test_assign_jvm_parameters(self):
        """RdpClassifier should pass alternate parameters to Java VM."""
        app = RdpClassifier20()
        app.Parameters['-Xmx'].on('75M')
        exp = ''.join(['cd "', getcwd(), '/"; java -Xmx75M -jar "',
                       self.user_rdp_jar_path, '"'])
        self.assertEqual(app.BaseCommand, exp)

    def test_basecommand_property(self):
        """RdpClassifier BaseCommand property should use overridden
        _get_base_command method.
        """
        app = RdpClassifier20()
        self.assertEqual(app.BaseCommand, app._get_base_command())

    def test_base_command(self):
        """RdpClassifier should return expected shell command."""
        app = RdpClassifier20()
        exp = ''.join(['cd "', getcwd(), '/"; java -Xmx1000m -jar "',
                       self.user_rdp_jar_path, '"'])
        self.assertEqual(app.BaseCommand, exp)

    def test_change_working_dir(self):
        """RdpClassifier should run program in expected working directory."""
        test_dir = '/tmp/RdpTest'
        app = RdpClassifier20(WorkingDir=test_dir)
        exp = ''.join(['cd "', test_dir, '/"; java -Xmx1000m -jar "',
                       self.user_rdp_jar_path, '"'])
        self.assertEqual(app.BaseCommand, exp)
        rmtree(test_dir)

    def test_sample_fasta(self):
        """RdpClassifier should classify its own sample data correctly"""
        test_dir = '/tmp/RdpTest'
        app = RdpClassifier20(WorkingDir=test_dir)
        results = app(rdp_sample_fasta)
        results_file = results['Assignments']
        id_line = results_file.readline()
        self.failUnless(id_line.startswith('>X67228'))
        classification_line = results_file.readline().strip()
        obs = parse_rdp(classification_line)
        exp = ['Root', 'Bacteria', 'Proteobacteria', 'Alphaproteobacteria',
               'Rhizobiales', 'Rhizobiaceae', 'Rhizobium']
        self.assertEqual(obs, exp)
        rmtree(test_dir)


class RdpTrainer20Tests(TestCase):
    """Tests of the trainer for the RdpClassifier app
    """

    def setUp(self):
        self.reference_file = StringIO(rdp_training_sequences)
        self.reference_file.seek(0)
        self.taxonomy_file = StringIO(rdp_training_taxonomy)
        self.taxonomy_file.seek(0)
        self.training_dir = mkdtemp(prefix='RdpTrainer_output_')

    def tearDown(self):
        rmtree(self.training_dir)

    def test_train_with_rdp_files(self):
        app = RdpTrainer20()
        results = app._train_with_rdp_files(
            self.reference_file, self.taxonomy_file, self.training_dir)
        exp_file_list = [
            'bergeyTrainingTree.xml',
            'genus_wordConditionalProbList.txt',
            'logWordPrior.txt',
            'RdpClassifier.properties',
            'wordConditionalProbIndexArr.txt',
            ]
        obs_file_list = listdir(self.training_dir)
        exp_file_list.sort()
        obs_file_list.sort()
        self.assertEqual(obs_file_list,
                         exp_file_list)

        autogenerated_headers = {
            'bergeyTree': 'bergeyTrainingTree',
            'probabilityList': 'genus_wordConditionalProbList',
            'wordPrior': 'logWordPrior',
            'probabilityIndex': 'wordConditionalProbIndexArr',
            }
        for id, basename in autogenerated_headers.iteritems():
            obs_header = results[id].readline()
            exp_header = exp_training_header_template % basename
            self.assertEqual(exp_header, obs_header)


class RdpWrapperTests(TestCase):
    """ Tests of RDP classifier wrapper functions
    """

    def setUp(self):
        self.test_input1 = rdp_test_fasta.split('\n')
        self.expected_assignments1 = rdp_expected_out

        # Files for training
        self.reference_file = StringIO(rdp_training_sequences)
        self.reference_file.seek(0)
        self.taxonomy_file = StringIO(rdp_training_taxonomy)
        self.taxonomy_file.seek(0)
        self.training_dir = mkdtemp(prefix='RdpTrainer_output_')

        # Sequences for trained classifier
        self.test_trained_input = rdp_trained_fasta.split("\n")

    def tearDown(self):
        rmtree(self.training_dir)

    def test_assign_taxonomy_short_sequence(self):
        """assign_taxonomy should return Unassignable if sequence is too short
        """
        assignments = assign_taxonomy([
            '>MySeq 1',
            'TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGA',
            ])
        self.assertEqual(assignments, {'MySeq 1': ('Unassignable', 1.0)})

    def test_assign_taxonomy(self):
        """assign_taxonomy wrapper functions as expected

        This test may fail periodically, but failure should be rare.
""" # convert the expected dict to a list, so it's easier to # handle the order expected_assignments = \ [(k,v[0],v[1]) for k,v in self.expected_assignments1.items()] expected_assignments.sort() # Because there is some variation in the taxon assignments, # I run the test several times (which can be quite slow) and # each sequence was classified the same as expected at least once taxon_assignment_results = [False] * len(expected_assignments) all_assigned_correctly = False for i in range(10): actual_assignments = assign_taxonomy(self.test_input1) # covert actual_assignments to a list so it's easier to handle # the order actual_assignments = \ [(k,v[0],v[1]) for k,v in actual_assignments.items()] actual_assignments.sort() for j in range(len(expected_assignments)): a = actual_assignments[j] e = expected_assignments[j] # same description fields self.assertEqual(a[0],e[0]) # same taxonomic assignment r = a[1] == e[1] if r and not taxon_assignment_results[j]: taxon_assignment_results[j] = True # confidence >= 0.80 self.assertTrue(a[2]>=0.80) if False not in taxon_assignment_results: # all sequences have been correctly assigned at # least once -- bail out all_assigned_correctly = True break # make sure all taxonomic results were correct at least once self.assertTrue(all_assigned_correctly) def test_assign_taxonomy_alt_confidence(self): """assign_taxonomy wrapper functions as expected with alt confidence """ actual_assignments = \ assign_taxonomy(self.test_input1,min_confidence=0.95) # covert actual_assignments to a list so it's easier to handle # the order actual_assignments = \ [(k,v[0],v[1]) for k,v in actual_assignments.items()] actual_assignments.sort() # convert the expected dict to a list, so it's easier to # handle the order expected_assignments = \ [(k,v[0],v[1]) for k,v in self.expected_assignments1.items()] expected_assignments.sort() for a,e in zip(actual_assignments,expected_assignments): # same description fields self.assertEqual(a[0],e[0]) # confidence >= 
            # 0.95
            self.assertTrue(a[2] >= 0.95)

    def test_assign_taxonomy_file_output(self):
        """ assign_taxonomy wrapper writes correct file output when requested

        This function tests for successful completion of assign_taxonomy
        when writing to file, that the lines in the file roughly look
        correct by verifying how many are written (by zipping with
        expected), and that each line starts with the correct seq id.
        Actual testing of taxonomy data is performed elsewhere.
        """
        output_fp = get_tmp_filename(
            prefix='RDPAssignTaxonomyTests', suffix='.txt')
        # convert the expected dict to a list of lines to match
        # file output
        expected_file_headers = self.expected_assignments1.keys()
        expected_file_headers.sort()
        actual_return_value = assign_taxonomy(
            self.test_input1, min_confidence=0.95, output_fp=output_fp)
        actual_file_output = list(open(output_fp))
        actual_file_output.sort()
        # remove the output_fp before running the tests, so if they
        # fail the output file is still cleaned-up
        remove(output_fp)
        # None return value on write to file
        self.assertEqual(actual_return_value, None)
        # check that each line starts with the correct seq_id -- not
        # checking the taxonomies or confidences here as these are variable
        # and tested elsewhere
        for a, e in zip(actual_file_output, expected_file_headers):
            self.assertTrue(a.startswith(e))

    def test_train_rdp_classifier(self):
        results = train_rdp_classifier(
            self.reference_file, self.taxonomy_file, self.training_dir)
        exp_file_list = [
            'bergeyTrainingTree.xml',
            'genus_wordConditionalProbList.txt',
            'logWordPrior.txt',
            'RdpClassifier.properties',
            'wordConditionalProbIndexArr.txt',
            ]
        obs_file_list = listdir(self.training_dir)
        exp_file_list.sort()
        obs_file_list.sort()
        self.assertEqual(obs_file_list, exp_file_list)

        autogenerated_headers = {
            'bergeyTree': 'bergeyTrainingTree',
            'probabilityList': 'genus_wordConditionalProbList',
            'wordPrior': 'logWordPrior',
            'probabilityIndex': 'wordConditionalProbIndexArr',
            }
        for id, basename in autogenerated_headers.iteritems():
            obs_header = \
                results[id].readline()
            exp_header = exp_training_header_template % basename
            self.assertEqual(exp_header, obs_header)

    def test_train_rdp_classifier_and_assign_taxonomy(self):
        obs = train_rdp_classifier_and_assign_taxonomy(
            self.reference_file, self.taxonomy_file, self.test_trained_input,
            min_confidence=0.80, model_output_dir=self.training_dir)
        exp = {'X67228': ('Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Rhizobiaceae;Rhizobium', 1.0)}
        self.assertEqual(obs, exp)


def parse_rdp(line):
    """Returns a list of assigned taxa from an RDP classification line
    """
    tokens = line.split('; ')
    # Keep even-numbered tokens
    return [t for (pos, t) in enumerate(tokens) if (pos % 2 == 0)]


# Sample data copied from rdp_classifier-2.0, which is licensed under
# the GPL 2.0 and Copyright 2008 Michigan State University Board of
# Trustees

rdp_training_sequences = """>X67228 Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Rhizobiaceae;Rhizobium
aacgaacgctggcggcaggcttaacacatgcaagtcgaacgctccgcaaggagagtggcagacgggtgagtaacgcgtgggaatctacccaaccctgcggaatagctctgggaaactggaattaataccgcatacgccctacgggggaaagatttatcggggatggatgagcccgcgttggattagctagttggtggggtaaaggcctaccaaggcgacgatccatagctggtctgagaggatgatcagccacattgggactgagacacggcccaaa
>X73443 Bacteria;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium
nnnnnnngagatttgatcctggctcaggatgaacgctggccggccgtgcttacacatgcagtcgaacgaagcgcttaaactggatttcttcggattgaagtttttgctgactgagtggcggacgggtgagtaacgcgtgggtaacctgcctcatacagggggataacagttagaaatgactgctaataccnnataagcgcacagtgctgcatggcacagtgtaaaaactccggtggtatgagatggacccgcgtctgattagctagttggtggggt
>AB004750 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Enterobacter
acgctggcggcaggcctaacacatgcaagtcgaacggtagcagaaagaagcttgcttctttgctgacgagtggcggacgggtgagtaatgtctgggaaactgcccgatggagggggataactactggaaacggtagctaataccgcataacgtcttcggaccaaagagggggaccttcgggcctcttgccatcggatgtgcccagatgggattagctagtaggtggggtaacggctcacctaggcgacgatccctagctggtctgagaggatgaccagccacactggaactgagacacggtccagactcctacgggaggcagcagtggggaatattgca >xxxxxx Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas ttgaacgctggcggcaggcctaacacatgcaagtcgagcggcagcannnncttcgggaggctggcgagcggcggacgggtgagtaacgcatgggaacttacccagtagtgggggatagcccggggaaacccggattaataccgcatacgccctgagggggaaagcgggctccggtcgcgctattggatgggcccatgtcggattagttagttggtggggtaatggcctaccaaggcgacgatccgtagctggtctgagaggatgatcagccacaccgggactgagacacggcccggactcctacgggaggcagcagtggggaatattggacaatgggggcaaccctgatccagccatgccg >AB004748 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Enterobacter acgctggcggcaggcctaacacatgcaagtcgaacggtagcagaaagaagcttgcttctttgctgacgagtggcggacgggtgagtaatgtctgggaaactgcccgatggagggggataactactggaaacggtagctaataccgcataacgtcttcggaccaaagagggggaccttcgggcctcttgccatcggatgtgcccagatgggattagctagtaggtggggtaacggctcacctaggcgacgatccctagctggtctgagaggatgaccagccacactggaactgagacacggtccagactcctacgggaggcagcagtggggaatattgcacaatgggcgcaagcctgatgcagccatgccgcgtgtatgaagaaggccttcgggttg >AB000278 Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Photobacterium caggcctaacacatgcaagtcgaacggtaanagattgatagcttgctatcaatgctgacgancggcggacgggtgagtaatgcctgggaatataccctgatgtgggggataactattggaaacgatagctaataccgcataatctcttcggagcaaagagggggaccttcgggcctctcgcgtcaggattagcccaggtgggattagctagttggtggggtaatggctcaccaaggcgacgatccctagctggtctgagaggatgatcagccacactggaactgagacacggtccagactcctacgggaggcagcagtggggaatattgcacaatgggggaaaccctgatgcagccatgccgcgtgta >AB000390 Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Vibrio 
tggctcagattgaacgctggcggcaggcctaacacatgcaagtcgagcggaaacgantnntntgaaccttcggggnacgatnacggcgtcgagcggcggacgggtgagtaatgcctgggaaattgccctgatgtgggggataactattggaaacgatagctaataccgcataatgtctacggaccaaagagggggaccttcgggcctctcgcttcaggatatgcccaggtgggattagctagttggtgaggtaatggctcaccaaggcgacgatccctagctggtctgagaggatgatcagccacactggaactgag
"""

rdp_training_taxonomy = """1*Bacteria*0*0*domain
765*Firmicutes*1*1*phylum
766*Clostridia*765*2*class
767*Clostridiales*766*3*order
768*Clostridiaceae*767*4*family
769*Clostridium*768*5*genus
160*Proteobacteria*1*1*phylum
433*Gammaproteobacteria*160*2*class
586*Vibrionales*433*3*order
587*Vibrionaceae*586*4*family
588*Vibrio*587*5*genus
592*Photobacterium*587*5*genus
552*Pseudomonadales*433*3*order
553*Pseudomonadaceae*552*4*family
554*Pseudomonas*553*5*genus
604*Enterobacteriales*433*3*order
605*Enterobacteriaceae*604*4*family
617*Enterobacter*605*5*genus
161*Alphaproteobacteria*160*2*class
260*Rhizobiales*161*3*order
261*Rhizobiaceae*260*4*family
262*Rhizobium*261*5*genus
"""

exp_training_header_template = "1version1cogent%s\n"

rdp_trained_fasta = """>X67228
aacgaacgctggcggcaggcttaacacatgcaagtcgaacgctccgcaaggagagtggcagacgggtgagtaacgcgtgggaatctacccaaccctgcggaatagctctgggaaactggaattaataccgcatacgccctacgggggaaagatttatcggggatggatgagcccgcgttggattagctagttggtggggtaaaggcctaccaaggcgacgatccatagctggtctgagaggatgatcagccacattgggactgagacacggcccaaa
"""

rdp_sample_fasta = """>X67228 Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Rhizobiaceae;Rhizobium
aacgaacgctggcggcaggcttaacacatgcaagtcgaacgctccgcaaggagagtggcagacgggtgagtaacgcgtgggaatctacccaaccctgcggaatagctctgggaaactggaattaataccgcatacgccctacgggggaaagatttatcggggatggatgagcccgcgttggattagctagttggtggggtaaaggcctaccaaggcgacgatccatagctggtctgagaggatgatcagccacattgggactgagacacggcccaaa
"""

rdp_sample_classification = """>X67228 reverse=false
Root; 1.0; Bacteria; 1.0; Proteobacteria; 1.0; Alphaproteobacteria; 1.0; Rhizobiales; 1.0; Rhizobiaceae; 1.0; Rhizobium; 0.95;
"""

rdp_test_fasta = """>AY800210 description field
TTCCGGTTGATCCTGCCGGACCCGACTGCTATCCGGATGCGACTAAGCCATGCTAGTCTAACGGATCTTCGGATCCGTGGCATACCGCTCTGTAACACGTAGATAACCTACCCTGAGGTCGGGGAAACTCCCGGGAAACTGGGCCTAATCCCCGATAGATAATTTGTACTGGAATGTCTTTTTATTGAAACCTCCGAGGCCTCAGGATGGGTCTGCGCCAGATTATGGTCGTAGGTGGGGTAACGGCCCACCTAGCCTTTGATCTGTACCGGACATGAGAGTGTGTGCCGGGAGATGGCCACTGAGACAAGGGGCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTCACAATGCCCGCAAGGGTGATGAGGGTATCCGAGTGCTACCTTAGCCGGTAGCTTTTATTCAGTGTAAATAGCTAGATGAATAAGGGGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCCCGAGTGGTCGGGATTTTTATTGGGCCTAAAGCGTCCGTAGCCGGGCGTGCAAGTCATTGGTTAAATATCGGGTCTTAAGCCCGAACCTGCTAGTGATACTACACGCCTTGGGACCGGAAGAGGCAAATGGTACGTTGAGGGTAGGGGTGAAATCCTGTAATCCCCAACGGACCACCGGTGGCGAAGCTTGTTCAGTCATGAACAACTCTACACAAGGCGATTTGCTGGGACGGATCCGACGGTGAGGGACGAAACCCAGGGGAGCGAGCGGGATTAGATACCCCGGTAGTCCTGGGCGTAAACGATGCGAACTAGGTGTTGGCGGAGCCACGAGCTCTGTCGGTGCCGAAGCGAAGGCGTTAAGTTCGCCGCCAGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCAC >EU883771 TGGCGTACGGCTCAGTAACACGTGGATAACTTACCCTTAGGACTGGGATAACTCTGGGAAACTGGGGATAATACTGGATATTAGGCTATGCCTGGAATGGTTTGCCTTTGAAATGTTTTTTTTCGCCTAAGGATAGGTCTGCGGCTGATTAGGTCGTTGGTGGGGTAATGGCCCACCAAGCCGATGATCGGTACGGGTTGTGAGAGCAAGGGCCCGGAGATGGAACCTGAGACAAGGTTCCAGACCCTACGGGGTGCAGCAGGCGCGAAACCTCCGCAATGTACGAAAGTGCGACGGGGGGATCCCAAGTGTTATGCTTTTTTGTATGACTTTTCATTAGTGTAAAAAGCTTTTAGAATAAGAGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAACACCGGCAGCTCGAGTGGTGACCACTTTTATTGGGCTTAAAGCGTTCGTAGCTTGATTTTTAAGTCTCTTGGGAAATCTCACGGCTTAACTGTGAGGCGTCTAAGAGATACTGGGAATCTAGGGACCGGGAGAGGTAAGAGGTACTTCAGGGGTAGAAGTGAAATTCTGTAATCCTTGAGGGACCACCGATGGCGAAGGCATCTTACCAGAACGGCTTCGACAGTGAGGAACGAAAGCTGGGGGAGCGAACGGGATTAGATACCCCGGTAGTCCCAGCCGTAAACTATGCGCGTTAGGTGTGCCTGTAACTACGAGTTACCGGGGTGCCGAAGTGAAAACGTGAAACGTGCCGCCTGGGAAGTACGGTCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAACGGGTGGAGCCTGCGGTTTAATTGGACTCAACGCCGGGCAGCTCACCGGATAGGACAGCGGAATGATAGCCGGGCTGAAGACCTTGCTTGACCAGCTGAGA >EF503699 
AAGAATGGGGATAGCATGCGAGTCACGCCGCAATGTGTGGCATACGGCTCAGTAACACGTAGTCAACATGCCCAGAGGACGTGGACACCTCGGGAAACTGAGGATAAACCGCGATAGGCCACTACTTCTGGAATGAGCCATGACCCAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGGAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCACGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCCGCTTAACGGATGGGCTGCGGAGGATACTGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCTTTGATCTACTGAAGACCACCAGTGGTGAAGGCGGTTCGCCAGAACGCGCTCGAACGGTGAGGATGAAAGCTGGGGGAGCAAACCGGAATAGATACCCGAGTAATCCCAACTGTAAACGATGGCAACTCGGGGATGGGTTGGCCTCCAACCAACCCCATGGCCGCAGGGAAGCCGTTTAGCTCTCCCGCCTGGGGAATACGGTCCGCAGAATTGAACCTTAAAGGAATTTGGCGGGGAACCCCCACAAGGGGGAAAACCGTGCGGTTCAATTGGAATCCACCCCCCGGAAACTTTACCCGGGCGCG >random_seq AAGCTCCGTCGCGTGAGCTAAAAACCATGCTGACTTATGAGACCTAAAAGCGATGCGCCGACCTGACGATGCTCTGTTCAGTTTCATCACGATCACCGGTAGTCAGGGTACCCTCCAGACCGCGCATAGTGACTATGTTCCCGCACCTGTATATGTAATTCCCATTATACGTCTACGTTATGTAGTAAAGTTGCTCACGCCAGGCACAGTTTGTCTTGATACATAGGGTAGCTTAAGTCCCGTCCATTTCACCGCGATTGTAATAGACGAATCAGCAGTGGTGCAATCAAGTCCCAACAGTTATATTTCAAAAATCTTCCGATAGTCGTGGGCGAAGTTGTCAACCTACCTACCATGGCTATAAGGCCCAGTTTACTTCAGTTGAACGTGACGGTAACCCTACTGAGTGCACGATACCTGCTCAACAACGGCCCAAAACCCGTGCGACACATTGGGCACTACAATAATCTTAGAGGACCATGGATCTGGTGGGTGGACTGAAGCATATCCCAAAAGTGTCGTGAGTCCGTTATGCAATTGACTGAAACAGCCGTACCAGAGTTCGGATGACCTCTGGGTTGCTGCGGTACACACCCGGGTGCGGCTTCTGAAATAGAAAAGACTAAGCATCGGCCGCCTCACACTTCAAGGGCCCTATGCCTAACAGTCTAGCAAATGCTTGAACCTTGTACCAAAGTTCAGACTTACCTTTACTTGGTTATCGCCCTTGAACCTGTAACCGTCACGTGGTCTACAATTCGTGGATTCCTGCATGAGGATGAACGGGTCCCTTCTCGTGCTACGTAGGCAGTATGTTCAACAACAAGAGGGTAATGCAATGGGGCTTAGAATCCCTTGACCGCAGAACACGTGGGGACTGCATTTCCACGTCGGTACTAATCCTCTGATTCTGTTCGTTATGTCCAAGACCATATCATCTTAATACCTATCAGACCTACGCTCTTCTGTCTGAACCCTAAAACGTTCGGAATAGCACGATCGTCCGGTTTGAACTGACGTGTAGGACTCTCGTTGCTCCCTGGTAATCATTCAGGCCCTGGCGTAATGCCCTTACTGACTACCACTG
GCCTGGTTAGCAGCATCATTCTTAGGTGGAAGTGTTGGTCGCTGTTCTCCCATTGTGTCGAGTTTCCTCCGCGATCTCGTCACCGGGCGAGTTTTTAACTGCGACATCAACGTGTGGGTTTGACTTACCGCGCAGCATCGGGCCAACAGCCTTCGACAGCTAGAAGCTGGAAAATTAGTAAATAGTCCAGCACCGTTATATACATCTACTTGCTCGGAGGGGCACAGACGGCAGCGGACTCACGCCCGTTTTCAACCCTGGTAAAGAGGACGCGCGTTCATGAGGCGTAGGGATGCATTGCTCCTGATCCATGTTCTCAATTACGCTTCAATTCTGAGTATTAGACCTCACGCACGCAGTAAGCTTACTGGTCTCACTGCCATATTACTAGCTTAAAATAGCGGTCCAAAGACGCGGCATGATTAACGACAGCCCTTCACTTCACGGACCTCACGCCCGATATAGCGTACGTTCAGGGCTTGGAAAGGCGGACAATAATAAGCATGCAATAGCTAAGATTGTAGGAACTTCCATCCGCCGAACACAGCCCCTCGCCACAACGGTTGGAACCCGCGCTATGCCGTAAAGCGGCCAAAACGTCGCGCGCCACCACAAGATAGTGCTCAAAGCCCGCAGGGAGAGTCGGTGCTTGGTGCCTTCGCTACGGGGCCCAATAGCTTGCTTTTTCTTGCGCCGATTGACTCTAGGTAAACTCACCGTGACATACCGCATAAATCTGCAAGGGTGGTCCTGACTAAAGAGCTCTATAGCGATGATGGTGGCCTTAGACAGCACAAGCTGAACTTATTAATTCTTACCAGGTCCCGTGGGGTTCCGGAACATGAGTATGCTCTTTGGAACGGGCTTTGTCCACGTGTAAGGATGGTTACGTCCCGGGCGTATTGCACTACGTTCAAGTGGTGTAAGACAGAGTGACTAGAGACAGTGCCGTTATCACTATTGTGGGGCCCATCCTAAGGCTGAGGACACGGAACATGTCTCTTTATCATCGCACGAGCTGTATGCCACTGTATTCCTCTACCTTAGCCATCGCTCATTTAACGCCACGTGTAGCGGTGCAGGCTACGAGGCTTAAGTCTGCTCGGCTTGCTGAGCATATCCCTATAAATGAATGAGGAATAAAGGACATGACGCATTCCCGGCACCTGACAACAGGACGCAATTACTAAGTAGGCTTATGTAGTCTCGTGTAATGCAGACCGCTCTTAAGAGTCGGATCATAATTTGAGCAGAACAAAATTACTTATGCCCTTACAAGACGTGTGCAAACCTAAGTGTGAAGATTTAGGAGGCACCGCGTTTTATGGCTTCTGCGAATATATTGTGATTTCCTGAATAGTGGGGTGGGATGTAATGGACTGAAAAAGGTGAACATCTTAAGCCTACCAGTCATATTCCCGCCGGAACTTACTTAAATCAATGGACACTCAAAGAGACTTTGAAGCTCTTTATACCGATGTGCGCGCAAACCCCTCGAGTGCTGTCCTGCAACACCCTAAGTTGCAGTATGTTGGTTACAGCGATCATTTATAGGTTAAAATGGCTAAATGAGCTAGCGCCGCGGCCCGGAAGAACAGATTTCGTGAGGTGACCTGGCGAATGTGAATCCTGAAAATTTTCACACGACCACAGCAAAGCCGTTGGGCAAGTTCCGGCAACTAAGCCGAAGGTGACCCTGTACTGGCAGGGGTTTACCATGATGGAATGACCGGAATCAAGGCAAGGAAACACCGAGTACAGTTACGAGCAAACACGCGTAAATTAAATTACTGCAGTATAACTCGTTCACCAATTCGGGTCGGCCGACGTGCACGTCAACCAGGCATACGACACGTGAAATTCACTGCCGACACCACTGTTTCGATTACCATGTGTCTGGTCTTTCAAAGCACAGAGAGAGGCCCTCGCCGGTAAATTACGGACTCGTATGTGTTAACCGGGAATAGGTGGGACGGATCAACACTTAAGTTGGACAGACC
AAAAACTAGCCGAAAACCTTCACAAAAAGAGTCATGAATGATCCGTCAAGGACAGCGCTCTCCCACCGCTGGATGGCAATCGCAAGGTATCAATCAATAGTGATGTCTACGAGTCTTACACAGGTGTCCTGGATGTAACTACTGTTGCCACGAGGAACGTATACACCCCAGCCTGCGTAATGTATGATCTTTTCGCGTTCTGAATCCAGAATATATTAGTGAAGGTCCCAAAGACACCTATTAATCGGTCCCAGTCGTTTGTCCTACATTTCTGTGTAGCCAGGGGTCCCCTATTACTCCTAATGAGGGATTGGCGCCGGGTAGAGTCTACGCCAAAGCGGCAAAGAGATTGATCACCGGCCGGGTACAATGCAGACTTTACATTCAGAGCGTGTTTGCCGCGTAATGCTTAATTCACTGCTGGCGCACCTCGCAGAATTACCTATATTTCCTCTTCCTCACTAAACTGGTGTTCAGAGATGGTCGATTTTCCGGTGTGGTTTATAGCAGGTCCCGCCACATGCAACACATACCGAACATCGCTATCAGTGTTGTTCTCGTCGCCGCGACTCCTGACTCGCGATTAGTTGGCTAGCTCCCAGCGCTAGCTCCGCCTCTGTCTGTATATCGCCAGTAATCAGTTTTCAATGACGTTGACTTATTTATAAACGAGCTGAAGCATCTCTCGCGCCCTAGCGTTACTACTATCAGGAACCGCCGTGTGGGAACTCCTCTACCTCACCACCCTGCCAGCTCCTATGACAACGTTTAGTCCGCGTACTGACAGGGAGAGAGAAGCGTTACGGGACCGTCTGAACAGTATGTTGGCAGAGGAAGGCCAGGGCTCCCTATGGTTTTAGGTT >DQ260310 GATACCCCCGGAAACTGGGGATTATACCGGATATGTGGGGCTGCCTGGAATGGTACCTCATTGAAATGCTCCCGCGCCTAAAGATGGATCTGCCGCAGAATAAGTAGTTTGCGGGGTAAATGGCCACCCAGCCAGTAATCCGTACCGGTTGTGAAAACCAGAACCCCGAGATGGAAACTGAAACAAAGGTTCAAGGCCTACCGGGCACAACAAGCGCCAAAACTCCGCCATGCGAGCCATCGCGACGGGGGAAAACCAAGTACCACTCCTAACGGGGTGGTTTTTCCGAAGTGGAAAAAGCCTCCAGGAATAAGAACCTGGGCCAGAACCGTGGCCAGCCGCCGCCGTTACACCCGCCAGCTCGAGTTGTTGGCCGGTTTTATTGGGGCCTAAAGCCGGTCCGTAGCCCGTTTTGATAAGGTCTCTCTGGTGAAATTCTACAGCTTAACCTGTGGGAATTGCTGGAGGATACTATTCAAGCTTGAAGCCGGGAGAAGCCTGGAAGTACTCCCGGGGGTAAGGGGTGAAATTCTATTATCCCCGGAAGACCAACTGGTGCCGAAGCGGTCCAGCCTGGAACCGAACTTGACCGTGAGTTACGAAAAGCCAAGGGGCGCGGACCGGAATAAAATAACCAGGGTAGTCCTGGCCGTAAACGATGTGAACTTGGTGGTGGGAATGGCTTCGAACTGCCCAATTGCCGAAAGGAAGCTGTAAATTCACCCGCCTTGGAAGTACGGTCGCAAGACTGGAACCTAAAAGGAATTGGCGGGGGGACACCACAACGCGTGGAGCCTGGCGGTTTTATTGGGATTCCACGCAGACATCTCACTCAGGGGCGACAGCAGAAATGATGGGCAGGTTGATGACCTTGCTTGACAAGCTGAAAAGGAGGTGCAT >EF503697 
TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGCATACAGGCTCAGTAACACGTAGTCAACATGCCCAAAGGACGTGGATAACCTCGGGAAACTGAGGATAAACCGCGATAGGCCAAGGTTTCTGGAATGAGCTATGGCCGAAATCTATATGGCCTTTGGATTGGACTGCGGCCGATCAGGCTGTTGGTGAGGTAATGGCCCACCAAACCTGTAACCGGTACGGGCTTTGAGAGAAGTAGCCCGGAGATGGGCACTGAGACAAGGGCCCAGGCCCTATGGGGCGCAGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCCTGACAGGGTTACTCTGAGTGATGCCCGCTAAGGGTATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTTGTCGGGACGATTATTGGGCCTAAAGCATCCGTAGCCTGTTCTGCAAGTCCTCCGTTAAATCCACCTGCTCAACGGATGGGCTGCGGAGGATACCGCAGAGCTAGGAGGCGGGAGAGGCAAACGGTACTCAGTGGGTAGGGGTAAAATCCATTGATCTACTGAAGACCACCAGTGGCGAAGGCGGTTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGGTAGTCCCAGCTGTAAACGGATGCAGACTCGGGTGATGGGGTTGGCTTCCGGCCCAACCCCAATTGCCCCCAGGCGAAGCCCGTTAAGATCTTGCCGCCCTGTCAGATGTCAGGGCCGCCAATACTCGAAACCTTAAAAGGAAATTGGGCGCGGGAAAAGTCACCAAAAGGGGGTTGAAACCCTGCGGGTTATATATTGTAAACC
>short_seq
TAAAATGACTAGCCTGCGAGTCACGCCGTAAGGCGTGGC
"""

rdp_expected_out = {
    'AY800210 description field': ('Root;Archaea;Euryarchaeota', 0.9),
    'EU883771': ('Root;Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera', 0.92),
    'EF503699': ('Root;Archaea;Crenarchaeota;Thermoprotei', 0.82),
    'random_seq': ('Root', 1.0),
    'DQ260310': ('Root;Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae', 0.93),
    'EF503697': ('Root;Archaea;Crenarchaeota;Thermoprotei', 0.88),
    'short_seq': ('Unassignable', 1.0),
    }

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_app/test_rnaalifold.py

#!/usr/bin/env python

from os import remove

from cogent.util.unit_test import TestCase, main
from cogent.app.rnaalifold import RNAalifold, rnaalifold_from_alignment
from cogent.core.alignment import DataError

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman", "Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"


class RnaalifoldTest(TestCase):
    """Tests for Rnaalifold application controller"""

    def setUp(self):
        self.input = RNAALIFOLD_INPUT
        self.unaligned = \
            {'seq_0': 'GGUAGGUCGCUGGACUUGUCUCCUUGACUGUCCGGAAGGAGCGGU',
             'seq_1': 'GGUAGGUCGCUGGAUUGAUAUGAGUAUUGUCCGGAAGGAGCGGA',
             'seq_2': 'GGUAGGACGCGGGACUUCUGUUCAGGACUGUCCCGAAGGUGCGGU',
             'seq_3': 'GGUAGGUCGCCGCACGUCGCUUCAGGACUGUGCGGAAGGAGCGGU',
             'seq_4': 'GGUAGGUCGCUGUACUUCUAUCAGGACUGUACGGAAGGAGCGGU',
             }
        self.alignment = \
            {'seq_0': 'GGUAGGUCGCUGGAC-UUGUCUCCUUGACU-GUCCGGAAGGAGCGGU',
             'seq_1': 'GGUAGGUCGCUGGAU-UGAUAUGAGUAUU--GUCCGGAAGGAGCGGA',
             'seq_2': 'GGUAGGACGCGGGAC-UUCUGUUCAGGACU-GUCCCGAAGGUGCGGU',
             'seq_3': 'GGUAGGUCGCCGCACGUCGCUUCAGGAC--UGUGCGGAAGGAGCGGU',
             'seq_4': 'GGUAGGUCGCUGUAC-UUCUAUCAGGACU--GUACGGAAGGAGCGGU',
             }
        self.old_struct = '....(.((.((((((..(((....)))....))))))...)).)...'  # from version 1.4
        self.new_struct = '....(.((.((((((................))))))...)).)...'
        # from version 1.8 - no easy way to check version so putting both in

    def test_input_as_lines(self):
        """Test rnaalifold stdout input as lines"""
        r = RNAalifold(InputHandler='_input_as_lines')
        res = r(self.input)
        self.assertEqual(res['ExitStatus'], 0)
        assert res['StdOut'] is not None
        res.cleanUp()

    def test_input_as_string(self):
        """Test rnaalifold stdout input as string"""
        r = RNAalifold()
        f = open('/tmp/clustal', 'w')
        f.write('\n'.join(self.input))
        f.close()
        res = r('/tmp/clustal')
        self.assertEqual(res['ExitStatus'], 0)
        assert res['StdOut'] is not None
        res.cleanUp()
        remove('/tmp/clustal')

    def test_get_result_path(self):
        """Tests rnaalifold result path"""
        r = RNAalifold(InputHandler='_input_as_lines')
        res = r(self.input)
        self.assertEqualItems(res.keys(), ['StdOut', 'StdErr', 'ExitStatus', 'SS'])
        self.assertEqual(res['ExitStatus'], 0)
        assert res['StdOut'] is not None
        res.cleanUp()

    def test_rnaalifold_from_alignment_unaligned(self):
        """rnaalifold_from_alignment should handle unaligned seqs."""
        self.assertRaises(DataError, rnaalifold_from_alignment, self.unaligned)

    def test_rnaalifold_from_alignment(self):
        """rnaalifold_from_alignment should give correct result."""
        [[seq, struct, energy]] = rnaalifold_from_alignment(aln=self.alignment)
        try:
            self.assertEqual(struct, self.old_struct)
        except AssertionError:
            self.assertEqual(struct, self.new_struct)


RNAALIFOLD_INPUT = ['CLUSTAL\n',
                    '\n',
                    'seq1 GGCTAGATAGCTCAGATGGT-AGAGCAGAGGATTGAAGATCCTTGTGTCGTCGGTTCGATCCCGGCTCTGGCC----\n']

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_app/test_rnaforester.py000644 000765 000024 00000015257 12024702176 023177 0ustar00jrideoutstaff000000 000000

#!/usr/bin/env python
from os import remove

from cogent.util.unit_test import TestCase, main
from cogent.app.rnaforester import RNAforester

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"


class RnaforesterTest(TestCase):
    """Tests for Rnaforester application controller"""

    def setUp(self):
        self.input = rnaforester_input

    def test_stdout_input_as_lines(self):
        """Test rnaforester stdout input as lines"""
        r = RNAforester(InputHandler='_input_as_lines')
        exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in rnaforester_stdout])
        res = r(self.input)
        obs = res['StdOut'].read()
        self.assertEqual(obs, exp)
        res.cleanUp()

    def test_stdout_input_as_string(self):
        """Test rnaforester stdout input as string"""
        r = RNAforester()
        exp = '%s\n' % '\n'.join([str(i).strip('\n') for i in rnaforester_stdout])
        f = open('/tmp/input.fasta', 'w')
        txt = '\n'.join([str(i).strip('\n') for i in self.input])
        f.write(txt)
        f.close()
        res = r('/tmp/input.fasta')
        obs = res['StdOut'].read()
        self.assertEqual(obs, exp)
        res.cleanUp()
        remove('/tmp/input.fasta')

    def test_get_result_path(self):
        """Tests rnaforester result path"""
        r = RNAforester(InputHandler='_input_as_lines')
        res = r(self.input)
        self.assertEqualItems(res.keys(), ['StdOut', 'StdErr', 'ExitStatus'])
        self.assertEqual(res['ExitStatus'], 0)
        assert res['StdOut'] is not None
res.cleanUp() rnaforester_input = ['>seq1\n', 'GGCCACGTAGCTCAGTCGGTAGAGCAAAGGACTGAAAATCCTTGTGTCGTTGGTTCAATTCCAACCGTGGCCACCA\n', '>seq2\n', 'GCCAGATAGCTCAGTCGGTAGAGCGTTCGCCTGAAAAGTGAAAGGTCGCCGGTTCGATCCCGGCTCTGGCCACCA\n'] rnaforester_stdout = ['*** Scoring parameters ***\n', '\n', 'Scoring type: similarity\n', 'Scoring parameters:\n', 'pm: 10\n', 'pd: -5\n', 'bm: 1\n', 'br: 0\n', 'bd: -10\n', '\n', '\n', 'Input string (upper or lower case); & to end for multiple alignments, @ to quit\n', '....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8\n', '\n', '*** Calculation ***\n', '\n', 'clustering threshold is: 0.7\n', 'join clusters cutoff is: 0\n', '\n', 'Computing all pairwise similarities\n', '2,1: 0.74606\n', '\n', 'joining alignments:\n', '1,2: 0.74606 -> 1\n', 'Calculate similarities to other clusters\n', '\n', '\n', '*** Results ***\n', '\n', 'Minimum basepair probability for consensus structure (-cmin): 0.5\n', '\n', 'RNA Structure Cluster Nr: 1\n', 'Score: 264.25\n', 'Members: 2\n', '\n', 'seq1 ggccacguagcucagucgguagagcaaaggacugaaaauccuugugucguugguu\n', 'seq2 -gccagauagcucagucgguagagcguucgccugaaaagugaaaggucgccgguu\n', ' **** ****************** * ******* **** ****\n', '\n', 'seq1 caauuccaaccguggccacca\n', 'seq2 cgaucccggcucuggccacca\n', ' * ** ** * *********\n', '\n', 'seq1 (((((((..((((........)))).(((((.......))))).....(((((..\n', 'seq2 -((((((..((((........)))).((((.((....)))))).....(((((..\n', ' ***************************** **** *****************\n', '\n', 'seq1 .....))))))))))))....\n', 'seq2 .....))))))))))).....\n', ' **************** ****\n', '\n', '\n', 'Consensus sequence/structure:\n', ' 100% **** ****************** * ******* **** ****\n', ' 90% **** ****************** * ******* **** ****\n', ' 80% **** ****************** * ******* **** ****\n', ' 70% **** ****************** * ******* **** ****\n', ' 60% **** ****************** * ******* **** ****\n', ' 50% *******************************************************\n', ' 40% 
*******************************************************\n', ' 30% *******************************************************\n', ' 20% *******************************************************\n', ' 10% *******************************************************\n', ' ggccacauagcucagucgguagagcaaacgacugaaaagccaaaggucgccgguu\n', ' (((((((..((((........)))).((((.((....)))))).....(((((..\n', ' 10% *******************************************************\n', ' 20% *******************************************************\n', ' 30% *******************************************************\n', ' 40% *******************************************************\n', ' 50% *******************************************************\n', ' 60% *******************************************************\n', ' 70% ****************************** **** ****************\n', ' 80% ****************************** **** ****************\n', ' 90% ****************************** **** ****************\n', ' 100% ****************************** **** ****************\n', '\n', ' 100% * ** ** * *********\n', ' 90% * ** ** * *********\n', ' 80% * ** ** * *********\n', ' 70% * ** ** * *********\n', ' 60% * ** ** * *********\n', ' 50% *********************\n', ' 40% *********************\n', ' 30% *********************\n', ' 20% *********************\n', ' 10% *********************\n', ' caaucccaacccuggccacca\n', ' .....))))))))))))....\n', ' 10% *********************\n', ' 20% *********************\n', ' 30% *********************\n', ' 40% *********************\n', ' 50% *********************\n', ' 60% *********************\n', ' 70% **************** ****\n', ' 80% **************** ****\n', ' 90% **************** ****\n', ' 100% **************** ****\n', '\n', '\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_rnaview.py000644 000765 000024 00001056324 12024702176 022321 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.app.rnaview 
import RnaView
from tempfile import mktemp, tempdir
from os import remove, system, getcwd

__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso", "Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "Greg Caporaso"
__status__ = "Production"


class test_rnaview(TestCase):
    """ Tests of the rnaview application controller """

    def setUp(self):
        """ SetUp some objects for use by tests """
        # Define the directory where mktemp creates files, this simplifies
        # things b/c if ~/tmp exists files are created there, otherwise they
        # are created in /tmp. For the case of tests, it's easier just to
        # set it to a constant
        self.fake_pdb_file = fake_pdb_file
        self.fake_nmr_file = fake_nmr_file
        self.r1 = RnaView(WorkingDir='/tmp')
        self.r_input_as_lines = RnaView(WorkingDir='/tmp',\
            InputHandler='_input_as_lines')
        self.r2 = RnaView(params={'-v': None}, WorkingDir='/tmp')
        self.r3 = RnaView(params={'-c': 'A'}, WorkingDir='/tmp')
        self.r4 = RnaView(params={'-x': None, '-p': None}, WorkingDir='/tmp')
        self.r5 = RnaView()

    def test_base_command(self):
        """RnaView: BaseCommand is built correctly """
        self.assertEqual(self.r1.BaseCommand, 'cd "/tmp/"; rnaview')
        self.assertEqual(self.r2.BaseCommand, 'cd "/tmp/"; rnaview -v')
        self.assertEqual(self.r3.BaseCommand, 'cd "/tmp/"; rnaview -c A')
        assert (self.r4.BaseCommand == 'cd "/tmp/"; rnaview -x -p') or \
               (self.r4.BaseCommand == 'cd "/tmp/"; rnaview -p -x')
        self.assertEqual(self.r5.BaseCommand,\
            'cd "' + getcwd() + '/' + '"; rnaview')

    def test_file_pointers_no_extras_input_as_file(self):
        """RnaView: pointers created only for minimal files w/ _input_as_lines """
        written_files = {}.fromkeys(['bp_stats', 'base_pairs',\
            'StdOut', 'StdErr', 'ExitStatus'])
        res = self.r_input_as_lines(data=self.fake_pdb_file.split('\n'))
        for f in res:
            if f in written_files:
                assert res[f] is not None
            else:
                assert res[f] is None
        res.cleanUp()

    def test_file_pointers_no_extras(self):
        """RnaView: pointers created only for minimal files """
        filename = mktemp() + '.pdb'
        f = open(filename, "w")
        f.writelines(self.fake_pdb_file)
        f.close()
        written_files = {}.fromkeys(['bp_stats', 'base_pairs',\
            'StdOut', 'StdErr', 'ExitStatus'])
        res = self.r1(data=filename)
        for f in res:
            if f in written_files:
                assert res[f] is not None
            else:
                assert res[f] is None
        res.cleanUp()
        remove(filename)

    def test_file_pointers_w_vrml(self):
        """RnaView: pointers created for minimal files and files turned on """
        # need to make a fake pdb file with base pairs so that wrl file will be
        # created
        filename = mktemp() + '.pdb'
        f = open(filename, "w")
        f.writelines(self.fake_pdb_file)
        f.close()
        written_files = {}.fromkeys(['bp_stats', 'base_pairs',\
            'StdOut', 'StdErr', 'ExitStatus', 'vrml'])
        res = self.r2(data=filename)
        for f in res:
            if f in written_files:
                assert res[f] is not None
            else:
                assert res[f] is None
        res.cleanUp()
        remove(filename)

    def test_base_pairs_out(self):
        """RnaView: output sanity check """
        filename = mktemp()
        f = open(filename, "w")
        f.writelines(self.fake_pdb_file)
        f.close()
        res = self.r2(data=filename)
        bp_file = list(res['base_pairs'])
        self.assertEqual(bp_file[0], 'PDB data file name: ' + filename + '\n')
        res.cleanUp()
        remove(filename)

    def test_get_pdb_filename(self):
        """RnaView: _get_pdb_filename functions as expected """
        # Test X-RAY CRYSTALLOGRAPHY file
        f = open('/tmp/1EHZ.pdb', "w")
        f.writelines(self.fake_pdb_file)
        f.close()
        self.assertEqual(self.r1._get_pdb_filename('/tmp/1EHZ.pdb'), '1EHZ.pdb')
        remove('/tmp/1EHZ.pdb')
        # Test NMR file
        f = open('/tmp/pdb17.ent', "w")
        f.writelines(self.fake_nmr_file)
        f.close()
        self.assertEqual(self.r1._get_pdb_filename('/tmp/pdb17.ent'),\
            'pdb17.ent_nmr.pdb')
        remove('/tmp/pdb17.ent')
        # OLD TESTS BEFORE THE ACTUAL FILE HAD TO BE THERE
        #self.assertEqual(self.r1._get_pdb_filename('/tmp/1EHZ.pdb'),'1EHZ.pdb')
        #self.assertEqual(self.r1._get_pdb_filename('/tmp/duh/1EHZ.pdb'),\
        #    '1EHZ.pdb')
#self.assertEqual(self.r1._get_pdb_filename('1EHZ1.pdb'),'1EHZ1.pdb') #self.assertEqual(self.r1._get_pdb_filename('/1'),'1') #self.assertEqual(self.r1._get_pdb_filename('1EHZZZ.pdb'),'1EHZZZ.pdb') #self.assertEqual(self.r1._get_pdb_filename('/tmp/tmpW3urtc'),\ # 'tmpW3urtc') def test_get_out_path(self): """RnaView: _get_out_path functions as expected """ self.assertEqual(self.r1._get_out_path('1EHZ.pdb'),'') self.assertEqual(self.r1._get_out_path('/tmp/1EHZ.pdb'),'/tmp/') self.assertEqual(self.r1._get_out_path('/tmp/duh/1EHZ.pdb'),'/tmp/duh/') self.assertEqual(self.r1._get_out_path('/1'),'/') def test_accept_exit_status(self): """RnaView: _accept_exit_status functions as expected """ self.assertEqual(self.r1._accept_exit_status(0),True) self.assertEqual(self.r1._accept_exit_status(1),False) self.assertEqual(self.r1._accept_exit_status('0'),False) self.assertEqual(self.r1._accept_exit_status(None),False) self.assertEqual(self.r1._accept_exit_status(''),False) fake_pdb_file = """ATOM 1 O3P G A 1 50.193 51.190 50.534 1.00 99.85 O ATOM 2 P G A 1 50.626 49.730 50.573 1.00100.19 P ATOM 3 O1P G A 1 49.854 48.893 49.562 1.00100.19 O ATOM 4 O2P G A 1 52.137 49.542 50.511 1.00 99.21 O ATOM 5 O5* G A 1 50.161 49.136 52.023 1.00 99.82 O ATOM 6 C5* G A 1 50.216 49.948 53.210 1.00 98.63 C ATOM 7 C4* G A 1 50.968 49.231 54.309 1.00 97.84 C ATOM 8 O4* G A 1 50.450 47.888 54.472 1.00 97.10 O ATOM 9 C3* G A 1 52.454 49.030 54.074 1.00 98.07 C ATOM 10 O3* G A 1 53.203 50.177 54.425 1.00 99.39 O ATOM 11 C2* G A 1 52.781 47.831 54.957 1.00 96.96 C ATOM 12 O2* G A 1 53.018 48.156 56.313 1.00 96.77 O ATOM 13 C1* G A 1 51.502 47.007 54.836 1.00 95.70 C ATOM 14 N9 G A 1 51.628 45.992 53.798 1.00 93.67 N ATOM 15 C8 G A 1 51.064 46.007 52.547 1.00 92.60 C ATOM 16 N7 G A 1 51.379 44.966 51.831 1.00 91.19 N ATOM 17 C5 G A 1 52.197 44.218 52.658 1.00 91.47 C ATOM 18 C6 G A 1 52.848 42.992 52.425 1.00 90.68 C ATOM 19 O6 G A 1 52.826 42.291 51.404 1.00 90.38 O ATOM 20 N1 G A 1 53.588 42.588 
53.534 1.00 90.71 N ATOM 21 C2 G A 1 53.685 43.282 54.716 1.00 91.21 C ATOM 22 N2 G A 1 54.452 42.733 55.671 1.00 91.23 N ATOM 23 N3 G A 1 53.077 44.429 54.946 1.00 91.92 N ATOM 24 C4 G A 1 52.356 44.836 53.879 1.00 92.62 C ATOM 25 P C A 2 54.635 50.420 53.741 1.00100.19 P ATOM 26 O1P C A 2 55.145 51.726 54.238 1.00100.19 O ATOM 27 O2P C A 2 54.465 50.204 52.269 1.00100.19 O ATOM 28 O5* C A 2 55.563 49.261 54.342 1.00 98.27 O ATOM 29 C5* C A 2 55.925 49.246 55.742 1.00 95.40 C ATOM 30 C4* C A 2 56.836 48.075 56.049 1.00 93.33 C ATOM 31 O4* C A 2 56.122 46.828 55.830 1.00 92.18 O ATOM 32 C3* C A 2 58.090 47.947 55.197 1.00 92.75 C ATOM 33 O3* C A 2 59.174 48.753 55.651 1.00 92.89 O ATOM 34 C2* C A 2 58.416 46.463 55.298 1.00 91.81 C ATOM 35 O2* C A 2 59.140 46.136 56.466 1.00 91.36 O ATOM 36 C1* C A 2 57.022 45.836 55.356 1.00 90.59 C ATOM 37 N1 C A 2 56.570 45.364 54.029 1.00 88.84 N ATOM 38 C2 C A 2 57.094 44.157 53.520 1.00 88.64 C ATOM 39 O2 C A 2 57.921 43.516 54.198 1.00 88.97 O ATOM 40 N3 C A 2 56.686 43.721 52.301 1.00 87.36 N ATOM 41 C4 C A 2 55.802 44.437 51.597 1.00 87.11 C ATOM 42 N4 C A 2 55.430 43.972 50.397 1.00 86.30 N ATOM 43 C5 C A 2 55.259 45.660 52.089 1.00 86.87 C ATOM 44 C6 C A 2 55.663 46.080 53.296 1.00 88.01 C ATOM 45 P G A 3 60.184 49.419 54.574 1.00 92.31 P ATOM 46 O1P G A 3 61.015 50.422 55.295 1.00 92.97 O ATOM 47 O2P G A 3 59.371 49.857 53.404 1.00 91.56 O ATOM 48 O5* G A 3 61.137 48.219 54.105 1.00 88.57 O ATOM 49 C5* G A 3 62.175 47.724 54.969 1.00 83.44 C ATOM 50 C4* G A 3 62.769 46.443 54.422 1.00 79.87 C ATOM 51 O4* G A 3 61.734 45.427 54.299 1.00 78.36 O ATOM 52 C3* G A 3 63.405 46.499 53.040 1.00 78.97 C ATOM 53 O3* G A 3 64.741 47.029 53.060 1.00 79.76 O ATOM 54 C2* G A 3 63.359 45.032 52.608 1.00 77.19 C ATOM 55 O2* G A 3 64.411 44.256 53.155 1.00 77.80 O ATOM 56 C1* G A 3 62.018 44.572 53.194 1.00 73.98 C ATOM 57 N9 G A 3 60.934 44.675 52.202 1.00 68.20 N ATOM 58 C8 G A 3 60.024 45.702 52.050 1.00 65.03 C ATOM 59 N7 G A 3 
59.252 45.556 51.003 1.00 62.99 N ATOM 60 C5 G A 3 59.655 44.348 50.447 1.00 59.95 C ATOM 61 C6 G A 3 59.189 43.675 49.292 1.00 55.65 C ATOM 62 O6 G A 3 58.287 44.013 48.522 1.00 53.32 O ATOM 63 N1 G A 3 59.893 42.491 49.072 1.00 54.00 N ATOM 64 C2 G A 3 60.906 42.006 49.876 1.00 55.46 C ATOM 65 N2 G A 3 61.512 40.873 49.479 1.00 48.16 N ATOM 66 N3 G A 3 61.312 42.605 50.983 1.00 56.69 N ATOM 67 C4 G A 3 60.666 43.774 51.193 1.00 61.76 C ATOM 68 P G A 4 65.295 47.868 51.793 1.00 79.34 P ATOM 69 O1P G A 4 66.538 48.562 52.246 1.00 80.87 O ATOM 70 O2P G A 4 64.193 48.679 51.209 1.00 79.00 O ATOM 71 O5* G A 4 65.720 46.752 50.724 1.00 75.17 O ATOM 72 C5* G A 4 66.789 45.843 51.019 1.00 68.95 C ATOM 73 C4* G A 4 66.749 44.634 50.114 1.00 65.13 C ATOM 74 O4* G A 4 65.484 43.939 50.258 1.00 61.83 O ATOM 75 C3* G A 4 66.881 44.840 48.611 1.00 62.79 C ATOM 76 O3* G A 4 68.230 44.977 48.176 1.00 61.75 O ATOM 77 C2* G A 4 66.318 43.538 48.064 1.00 60.58 C ATOM 78 O2* G A 4 67.283 42.514 48.122 1.00 59.59 O ATOM 79 C1* G A 4 65.192 43.241 49.051 1.00 58.29 C ATOM 80 N9 G A 4 63.923 43.716 48.500 1.00 53.36 N ATOM 81 C8 G A 4 63.204 44.843 48.842 1.00 49.19 C ATOM 82 N7 G A 4 62.140 45.013 48.107 1.00 46.88 N ATOM 83 C5 G A 4 62.144 43.926 47.243 1.00 45.95 C ATOM 84 C6 G A 4 61.246 43.573 46.206 1.00 43.95 C ATOM 85 O6 G A 4 60.182 44.136 45.874 1.00 42.46 O ATOM 86 N1 G A 4 61.672 42.428 45.540 1.00 40.34 N ATOM 87 C2 G A 4 62.788 41.686 45.867 1.00 42.60 C ATOM 88 N2 G A 4 63.034 40.588 45.135 1.00 38.91 N ATOM 89 N3 G A 4 63.612 41.994 46.850 1.00 44.40 N ATOM 90 C4 G A 4 63.239 43.117 47.480 1.00 48.65 C ATOM 91 P A A 5 68.530 45.722 46.789 1.00 59.03 P ATOM 92 O1P A A 5 69.991 45.842 46.548 1.00 60.84 O ATOM 93 O2P A A 5 67.685 46.959 46.834 1.00 60.64 O ATOM 94 O5* A A 5 67.957 44.735 45.675 1.00 57.14 O ATOM 95 C5* A A 5 68.648 43.529 45.323 1.00 53.41 C ATOM 96 C4* A A 5 67.927 42.844 44.191 1.00 50.63 C ATOM 97 O4* A A 5 66.589 42.480 44.646 1.00 48.70 O ATOM 98 C3* 
A A 5 67.665 43.715 42.964 1.00 50.77 C ATOM 99 O3* A A 5 68.747 43.769 42.051 1.00 52.86 O ATOM 100 C2* A A 5 66.455 43.024 42.355 1.00 48.94 C ATOM 101 O2* A A 5 66.864 41.798 41.731 1.00 48.54 O ATOM 102 C1* A A 5 65.646 42.719 43.615 1.00 44.50 C ATOM 103 N9 A A 5 64.779 43.843 44.021 1.00 42.01 N ATOM 104 C8 A A 5 64.938 44.803 45.016 1.00 39.75 C ATOM 105 N7 A A 5 63.925 45.649 45.113 1.00 41.58 N ATOM 106 C5 A A 5 63.049 45.220 44.115 1.00 38.26 C ATOM 107 C6 A A 5 61.796 45.688 43.683 1.00 35.83 C ATOM 108 N6 A A 5 61.110 46.688 44.232 1.00 32.66 N ATOM 109 N1 A A 5 61.233 45.057 42.644 1.00 35.14 N ATOM 110 C2 A A 5 61.870 44.017 42.074 1.00 38.97 C ATOM 111 N3 A A 5 63.024 43.467 42.399 1.00 36.02 N ATOM 112 C4 A A 5 63.571 44.119 43.437 1.00 39.04 C ATOM 113 P U A 6 69.150 45.179 41.392 1.00 55.09 P ATOM 114 O1P U A 6 70.511 44.926 40.836 1.00 56.37 O ATOM 115 O2P U A 6 68.953 46.283 42.381 1.00 51.00 O ATOM 116 O5* U A 6 68.119 45.358 40.184 1.00 50.38 O ATOM 117 C5* U A 6 67.912 44.271 39.258 1.00 48.10 C ATOM 118 C4* U A 6 66.579 44.400 38.565 1.00 47.17 C ATOM 119 O4* U A 6 65.486 44.324 39.513 1.00 46.61 O ATOM 120 C3* U A 6 66.344 45.708 37.850 1.00 45.27 C ATOM 121 O3* U A 6 66.964 45.696 36.590 1.00 45.77 O ATOM 122 C2* U A 6 64.833 45.733 37.727 1.00 45.88 C ATOM 123 O2* U A 6 64.431 44.864 36.684 1.00 44.33 O ATOM 124 C1* U A 6 64.413 45.113 39.057 1.00 41.32 C ATOM 125 N1 U A 6 64.065 46.111 40.079 1.00 39.88 N ATOM 126 C2 U A 6 62.798 46.658 39.977 1.00 36.06 C ATOM 127 O2 U A 6 62.021 46.333 39.099 1.00 38.25 O ATOM 128 N3 U A 6 62.487 47.582 40.924 1.00 34.15 N ATOM 129 C4 U A 6 63.272 48.002 41.975 1.00 37.36 C ATOM 130 O4 U A 6 62.822 48.829 42.752 1.00 39.30 O ATOM 131 C5 U A 6 64.583 47.395 42.032 1.00 39.23 C ATOM 132 C6 U A 6 64.926 46.497 41.084 1.00 35.72 C ATOM 133 P U A 7 67.463 47.074 35.969 1.00 44.37 P ATOM 134 O1P U A 7 68.318 46.756 34.822 1.00 48.09 O ATOM 135 O2P U A 7 67.945 47.948 37.077 1.00 45.68 O ATOM 136 O5* U A 7 
66.104 47.724 35.455 1.00 40.88 O ATOM 137 C5* U A 7 65.285 47.024 34.459 1.00 37.89 C ATOM 138 C4* U A 7 64.055 47.852 34.101 1.00 35.74 C ATOM 139 O4* U A 7 63.297 48.107 35.326 1.00 38.13 O ATOM 140 C3* U A 7 64.317 49.197 33.459 1.00 36.87 C ATOM 141 O3* U A 7 63.402 49.394 32.378 1.00 37.45 O ATOM 142 C2* U A 7 64.097 50.171 34.624 1.00 36.55 C ATOM 143 O2* U A 7 63.595 51.417 34.246 1.00 35.54 O ATOM 144 C1* U A 7 63.015 49.475 35.442 1.00 37.23 C ATOM 145 N1 U A 7 63.056 49.858 36.864 1.00 36.91 N ATOM 146 C2 U A 7 62.011 50.628 37.343 1.00 34.52 C ATOM 147 O2 U A 7 61.087 50.966 36.653 1.00 34.65 O ATOM 148 N3 U A 7 62.112 50.993 38.659 1.00 37.03 N ATOM 149 C4 U A 7 63.131 50.684 39.541 1.00 40.15 C ATOM 150 O4 U A 7 63.105 51.143 40.699 1.00 36.62 O ATOM 151 C5 U A 7 64.179 49.865 38.971 1.00 36.52 C ATOM 152 C6 U A 7 64.106 49.490 37.691 1.00 36.25 C ATOM 153 P U A 8 63.884 49.282 30.858 1.00 36.77 P ATOM 154 O1P U A 8 62.852 49.899 29.952 1.00 38.95 O ATOM 155 O2P U A 8 64.442 47.955 30.547 1.00 38.70 O ATOM 156 O5* U A 8 65.171 50.254 30.733 1.00 35.95 O ATOM 157 C5* U A 8 64.994 51.676 30.500 1.00 33.53 C ATOM 158 C4* U A 8 66.105 52.236 29.628 1.00 34.33 C ATOM 159 O4* U A 8 67.428 52.119 30.261 1.00 31.81 O ATOM 160 C3* U A 8 66.269 51.519 28.297 1.00 30.21 C ATOM 161 O3* U A 8 65.321 51.887 27.314 1.00 32.41 O ATOM 162 C2* U A 8 67.685 51.906 27.900 1.00 31.37 C ATOM 163 O2* U A 8 67.743 53.224 27.433 1.00 27.02 O ATOM 164 C1* U A 8 68.407 51.830 29.255 1.00 30.28 C ATOM 165 N1 U A 8 68.914 50.469 29.501 1.00 28.11 N ATOM 166 C2 U A 8 70.125 50.078 28.931 1.00 29.25 C ATOM 167 O2 U A 8 70.835 50.819 28.278 1.00 27.81 O ATOM 168 N3 U A 8 70.481 48.778 29.170 1.00 25.94 N ATOM 169 C4 U A 8 69.808 47.856 29.922 1.00 27.37 C ATOM 170 O4 U A 8 70.215 46.704 29.963 1.00 32.58 O ATOM 171 C5 U A 8 68.612 48.328 30.490 1.00 29.58 C ATOM 172 C6 U A 8 68.214 49.592 30.265 1.00 30.40 C ATOM 173 P A A 9 64.755 50.731 26.311 1.00 32.05 P ATOM 174 O1P A A 9 
63.287 50.736 26.312 1.00 36.52 O ATOM 175 O2P A A 9 65.447 49.415 26.519 1.00 31.33 O ATOM 176 O5* A A 9 65.221 51.270 24.948 1.00 28.22 O ATOM 177 C5* A A 9 64.646 52.433 24.369 1.00 32.91 C ATOM 178 C4* A A 9 64.531 52.215 22.904 1.00 32.49 C ATOM 179 O4* A A 9 65.887 52.090 22.406 1.00 35.09 O ATOM 180 C3* A A 9 63.820 50.923 22.466 1.00 34.41 C ATOM 181 O3* A A 9 63.140 51.180 21.236 1.00 36.11 O ATOM 182 C2* A A 9 64.979 49.997 22.155 1.00 32.37 C ATOM 183 O2* A A 9 64.686 49.016 21.194 1.00 35.87 O ATOM 184 C1* A A 9 65.985 50.969 21.571 1.00 28.79 C ATOM 185 N9 A A 9 67.376 50.497 21.585 1.00 23.84 N ATOM 186 C8 A A 9 67.851 49.356 22.159 1.00 25.84 C ATOM 187 N7 A A 9 69.149 49.195 22.010 1.00 26.83 N ATOM 188 C5 A A 9 69.527 50.298 21.288 1.00 23.87 C ATOM 189 C6 A A 9 70.730 50.663 20.793 1.00 30.26 C ATOM 190 N6 A A 9 71.797 49.922 20.994 1.00 30.95 N ATOM 191 N1 A A 9 70.817 51.794 20.072 1.00 28.29 N ATOM 192 C2 A A 9 69.701 52.547 19.932 1.00 32.68 C ATOM 193 N3 A A 9 68.469 52.287 20.369 1.00 25.13 N ATOM 194 C4 A A 9 68.446 51.117 21.026 1.00 26.68 C HETATM 195 P 2MG A 10 61.504 51.328 21.232 1.00 44.21 P HETATM 196 O1P 2MG A 10 61.165 52.038 22.473 1.00 41.39 O HETATM 197 O2P 2MG A 10 61.216 51.946 19.892 1.00 41.97 O HETATM 198 O5* 2MG A 10 60.903 49.858 21.330 1.00 38.75 O HETATM 199 C5* 2MG A 10 59.437 49.660 21.397 1.00 42.74 C HETATM 200 C4* 2MG A 10 59.058 48.375 20.709 1.00 42.88 C HETATM 201 O4* 2MG A 10 59.575 48.416 19.351 1.00 44.02 O HETATM 202 C3* 2MG A 10 59.701 47.161 21.326 1.00 43.31 C HETATM 203 O3* 2MG A 10 58.874 46.647 22.357 1.00 45.12 O HETATM 204 C2* 2MG A 10 59.822 46.215 20.154 1.00 46.04 C HETATM 205 O2* 2MG A 10 58.533 45.637 19.943 1.00 47.96 O HETATM 206 C1* 2MG A 10 60.152 47.173 19.012 1.00 44.62 C HETATM 207 N9 2MG A 10 61.581 47.402 18.752 1.00 42.14 N HETATM 208 C8 2MG A 10 62.199 48.621 18.635 1.00 40.38 C HETATM 209 N7 2MG A 10 63.494 48.534 18.422 1.00 40.70 N HETATM 210 C5 2MG A 10 63.745 47.167 18.395 1.00 
43.82 C HETATM 211 C6 2MG A 10 64.965 46.449 18.205 1.00 43.45 C HETATM 212 O6 2MG A 10 66.097 46.891 17.963 1.00 44.87 O HETATM 213 N1 2MG A 10 64.767 45.086 18.293 1.00 44.71 N HETATM 214 C2 2MG A 10 63.541 44.482 18.486 1.00 47.21 C HETATM 215 N2 2MG A 10 63.532 43.164 18.551 1.00 49.27 N HETATM 216 CM2 2MG A 10 62.220 42.454 18.591 1.00 52.10 C HETATM 217 N3 2MG A 10 62.411 45.125 18.614 1.00 45.85 N HETATM 218 C4 2MG A 10 62.574 46.451 18.582 1.00 43.27 C ATOM 219 P C A 11 59.474 46.418 23.818 1.00 50.75 P ATOM 220 O1P C A 11 58.367 46.417 24.802 1.00 49.46 O ATOM 221 O2P C A 11 60.585 47.425 23.967 1.00 44.94 O ATOM 222 O5* C A 11 60.064 44.937 23.797 1.00 49.65 O ATOM 223 C5* C A 11 59.234 43.814 23.447 1.00 49.66 C ATOM 224 C4* C A 11 60.091 42.608 23.221 1.00 50.13 C ATOM 225 O4* C A 11 60.886 42.801 22.028 1.00 47.40 O ATOM 226 C3* C A 11 61.091 42.406 24.335 1.00 52.64 C ATOM 227 O3* C A 11 60.498 41.644 25.372 1.00 54.31 O ATOM 228 C2* C A 11 62.252 41.701 23.640 1.00 51.51 C ATOM 229 O2* C A 11 62.072 40.314 23.587 1.00 53.58 O ATOM 230 C1* C A 11 62.189 42.294 22.230 1.00 48.88 C ATOM 231 N1 C A 11 63.145 43.397 21.999 1.00 46.21 N ATOM 232 C2 C A 11 64.484 43.091 21.738 1.00 45.74 C ATOM 233 O2 C A 11 64.833 41.895 21.708 1.00 47.49 O ATOM 234 N3 C A 11 65.365 44.106 21.527 1.00 42.07 N ATOM 235 C4 C A 11 64.941 45.376 21.555 1.00 42.69 C ATOM 236 N4 C A 11 65.829 46.353 21.301 1.00 38.00 N ATOM 237 C5 C A 11 63.586 45.709 21.822 1.00 43.75 C ATOM 238 C6 C A 11 62.732 44.698 22.035 1.00 45.15 C ATOM 239 P U A 12 60.976 41.853 26.874 1.00 57.65 P ATOM 240 O1P U A 12 60.123 41.039 27.783 1.00 59.26 O ATOM 241 O2P U A 12 61.080 43.316 27.117 1.00 59.70 O ATOM 242 O5* U A 12 62.441 41.244 26.886 1.00 55.93 O ATOM 243 C5* U A 12 62.652 39.837 26.718 1.00 52.43 C ATOM 244 C4* U A 12 64.121 39.544 26.594 1.00 48.54 C ATOM 245 O4* U A 12 64.635 40.154 25.385 1.00 43.89 O ATOM 246 C3* U A 12 65.015 40.119 27.684 1.00 50.30 C ATOM 247 O3* U A 12 65.044 39.362 
28.898 1.00 48.54 O ATOM 248 C2* U A 12 66.384 40.159 27.007 1.00 48.05 C ATOM 249 O2* U A 12 67.072 38.922 27.081 1.00 51.04 O ATOM 250 C1* U A 12 66.013 40.478 25.565 1.00 47.40 C ATOM 251 N1 U A 12 66.260 41.889 25.203 1.00 42.96 N ATOM 252 C2 U A 12 67.565 42.250 24.872 1.00 43.65 C ATOM 253 O2 U A 12 68.512 41.462 24.914 1.00 41.62 O ATOM 254 N3 U A 12 67.736 43.577 24.520 1.00 45.49 N ATOM 255 C4 U A 12 66.755 44.567 24.497 1.00 41.58 C ATOM 256 O4 U A 12 67.069 45.722 24.203 1.00 41.28 O ATOM 257 C5 U A 12 65.436 44.114 24.864 1.00 40.07 C ATOM 258 C6 U A 12 65.241 42.831 25.193 1.00 44.32 C ATOM 259 P C A 13 65.498 40.095 30.251 1.00 51.82 P ATOM 260 O1P C A 13 65.477 39.159 31.399 1.00 52.57 O ATOM 261 O2P C A 13 64.709 41.323 30.314 1.00 51.61 O ATOM 262 O5* C A 13 67.010 40.399 29.846 1.00 48.38 O ATOM 263 C5* C A 13 67.843 41.283 30.558 1.00 46.61 C ATOM 264 C4* C A 13 69.102 41.516 29.762 1.00 43.48 C ATOM 265 O4* C A 13 68.818 42.103 28.459 1.00 41.66 O ATOM 266 C3* C A 13 70.084 42.477 30.389 1.00 46.01 C ATOM 267 O3* C A 13 70.867 41.784 31.351 1.00 49.09 O ATOM 268 C2* C A 13 70.925 42.922 29.204 1.00 42.77 C ATOM 269 O2* C A 13 71.950 41.978 29.004 1.00 45.11 O ATOM 270 C1* C A 13 69.909 42.919 28.054 1.00 39.03 C ATOM 271 N1 C A 13 69.390 44.256 27.668 1.00 36.14 N ATOM 272 C2 C A 13 70.236 45.162 26.999 1.00 31.39 C ATOM 273 O2 C A 13 71.429 44.834 26.781 1.00 33.97 O ATOM 274 N3 C A 13 69.736 46.376 26.604 1.00 30.71 N ATOM 275 C4 C A 13 68.478 46.702 26.907 1.00 27.53 C ATOM 276 N4 C A 13 68.050 47.913 26.576 1.00 29.33 N ATOM 277 C5 C A 13 67.596 45.795 27.581 1.00 30.21 C ATOM 278 C6 C A 13 68.085 44.602 27.937 1.00 31.74 C ATOM 279 P A A 14 71.499 42.582 32.585 1.00 52.69 P ATOM 280 O1P A A 14 71.592 41.647 33.723 1.00 56.81 O ATOM 281 O2P A A 14 70.795 43.877 32.732 1.00 54.35 O ATOM 282 O5* A A 14 72.996 42.894 32.143 1.00 52.44 O ATOM 283 C5* A A 14 73.291 44.004 31.337 1.00 44.62 C ATOM 284 C4* A A 14 74.612 43.815 30.626 1.00 38.21 C 
ATOM 285 O4* A A 14 74.372 44.178 29.229 1.00 37.15 O ATOM 286 C3* A A 14 75.617 44.841 31.120 1.00 39.31 C ATOM 287 O3* A A 14 76.409 44.373 32.214 1.00 35.14 O ATOM 288 C2* A A 14 76.410 45.187 29.878 1.00 35.42 C ATOM 289 O2* A A 14 77.406 44.222 29.562 1.00 37.00 O ATOM 290 C1* A A 14 75.325 45.147 28.805 1.00 32.12 C ATOM 291 N9 A A 14 74.639 46.437 28.568 1.00 30.32 N ATOM 292 C8 A A 14 73.332 46.800 28.850 1.00 28.83 C ATOM 293 N7 A A 14 73.030 48.029 28.495 1.00 27.89 N ATOM 294 C5 A A 14 74.205 48.496 27.963 1.00 28.40 C ATOM 295 C6 A A 14 74.551 49.722 27.451 1.00 28.49 C ATOM 296 N6 A A 14 73.715 50.778 27.422 1.00 27.62 N ATOM 297 N1 A A 14 75.820 49.867 26.972 1.00 28.90 N ATOM 298 C2 A A 14 76.658 48.824 27.058 1.00 26.09 C ATOM 299 N3 A A 14 76.449 47.633 27.546 1.00 34.21 N ATOM 300 C4 A A 14 75.194 47.523 27.993 1.00 28.42 C ATOM 301 P G A 15 76.463 45.227 33.560 1.00 38.35 P ATOM 302 O1P G A 15 77.577 44.561 34.373 1.00 36.34 O ATOM 303 O2P G A 15 75.020 45.308 34.127 1.00 36.07 O ATOM 304 O5* G A 15 76.977 46.682 33.179 1.00 34.17 O ATOM 305 C5* G A 15 78.216 46.873 32.475 1.00 37.95 C ATOM 306 C4* G A 15 78.274 48.248 31.867 1.00 34.05 C ATOM 307 O4* G A 15 77.353 48.424 30.762 1.00 36.42 O ATOM 308 C3* G A 15 77.992 49.400 32.793 1.00 41.25 C ATOM 309 O3* G A 15 79.176 49.641 33.526 1.00 55.04 O ATOM 310 C2* G A 15 77.696 50.528 31.803 1.00 37.22 C ATOM 311 O2* G A 15 78.880 51.102 31.276 1.00 33.49 O ATOM 312 C1* G A 15 76.941 49.779 30.686 1.00 33.07 C ATOM 313 N9 G A 15 75.505 49.813 30.961 1.00 30.40 N ATOM 314 C8 G A 15 74.775 48.836 31.612 1.00 30.39 C ATOM 315 N7 G A 15 73.537 49.203 31.861 1.00 28.29 N ATOM 316 C5 G A 15 73.439 50.464 31.298 1.00 27.38 C ATOM 317 C6 G A 15 72.351 51.383 31.271 1.00 24.88 C ATOM 318 O6 G A 15 71.261 51.260 31.756 1.00 26.54 O ATOM 319 N1 G A 15 72.683 52.569 30.607 1.00 23.73 N ATOM 320 C2 G A 15 73.896 52.858 30.094 1.00 29.75 C ATOM 321 N2 G A 15 74.047 54.083 29.581 1.00 27.02 N ATOM 322 N3 G A 15 
74.925 52.008 30.089 1.00 28.46 N ATOM 323 C4 G A 15 74.632 50.842 30.714 1.00 29.28 C HETATM 324 P H2U A 16 79.106 49.914 35.099 1.00 64.01 P HETATM 325 O1P H2U A 16 77.803 50.520 35.400 1.00 58.28 O HETATM 326 O2P H2U A 16 79.533 48.676 35.816 1.00 67.91 O HETATM 327 O5* H2U A 16 80.270 50.994 35.265 1.00 70.49 O HETATM 328 C5* H2U A 16 81.110 51.317 34.115 1.00 77.82 C HETATM 329 C4* H2U A 16 80.514 52.486 33.353 1.00 82.34 C HETATM 330 O4* H2U A 16 79.081 52.313 33.356 1.00 85.70 O HETATM 331 C3* H2U A 16 80.758 53.821 34.030 1.00 84.30 C HETATM 332 O3* H2U A 16 81.907 54.422 33.414 1.00 84.12 O HETATM 333 C1* H2U A 16 78.428 53.548 33.551 1.00 88.13 C HETATM 334 C2* H2U A 16 79.505 54.639 33.690 1.00 86.71 C HETATM 335 O2* H2U A 16 79.637 55.391 32.493 1.00 88.25 O HETATM 336 N1 H2U A 16 77.347 53.323 34.582 1.00 91.19 N HETATM 337 C2 H2U A 16 76.119 52.865 34.160 1.00 92.39 C HETATM 338 O2 H2U A 16 75.885 52.463 33.033 1.00 92.20 O HETATM 339 N3 H2U A 16 75.123 52.894 35.107 1.00 93.28 N HETATM 340 C4 H2U A 16 75.289 52.711 36.458 1.00 93.34 C HETATM 341 O4 H2U A 16 74.309 52.695 37.208 1.00 92.66 O HETATM 342 C5 H2U A 16 76.696 52.479 36.909 1.00 93.77 C HETATM 343 C6 H2U A 16 77.717 53.238 36.039 1.00 93.22 C HETATM 344 P H2U A 17 83.371 53.708 33.472 1.00 82.84 P HETATM 345 O1P H2U A 17 83.746 53.377 32.068 1.00 83.70 O HETATM 346 O2P H2U A 17 83.498 52.655 34.529 1.00 83.82 O HETATM 347 O5* H2U A 17 84.277 54.923 33.943 1.00 81.72 O HETATM 348 C5* H2U A 17 83.692 55.978 34.736 1.00 76.14 C HETATM 349 C4* H2U A 17 84.176 55.886 36.150 1.00 71.56 C HETATM 350 O4* H2U A 17 85.622 55.872 36.137 1.00 71.61 O HETATM 351 C3* H2U A 17 83.738 57.031 37.055 1.00 67.99 C HETATM 352 O3* H2U A 17 82.553 56.582 37.718 1.00 60.02 O HETATM 353 C1* H2U A 17 86.102 56.903 36.958 1.00 71.64 C HETATM 354 C2* H2U A 17 84.964 57.213 37.948 1.00 71.27 C HETATM 355 O2* H2U A 17 85.004 56.273 39.021 1.00 73.23 O HETATM 356 N1 H2U A 17 86.579 57.954 36.004 1.00 72.27 N HETATM 357 
C2 H2U A 17 87.702 58.662 36.301 1.00 71.21 C HETATM 358 O2 H2U A 17 87.834 59.359 37.287 1.00 72.68 O HETATM 359 N3 H2U A 17 88.693 58.585 35.358 1.00 69.04 N HETATM 360 C4 H2U A 17 88.711 57.779 34.244 1.00 68.89 C HETATM 361 O4 H2U A 17 89.766 57.616 33.620 1.00 64.81 O HETATM 362 C5 H2U A 17 87.401 57.154 33.864 1.00 69.36 C HETATM 363 C6 H2U A 17 86.257 57.828 34.577 1.00 71.72 C ATOM 364 P G A 18 81.804 57.491 38.803 1.00 53.35 P ATOM 365 O1P G A 18 82.773 58.100 39.715 1.00 56.70 O ATOM 366 O2P G A 18 80.724 56.638 39.368 1.00 56.95 O ATOM 367 O5* G A 18 81.038 58.580 37.950 1.00 45.07 O ATOM 368 C5* G A 18 80.288 58.201 36.778 1.00 37.17 C ATOM 369 C4* G A 18 80.100 59.412 35.902 1.00 33.24 C ATOM 370 O4* G A 18 79.417 60.430 36.705 1.00 29.49 O ATOM 371 C3* G A 18 81.426 60.038 35.456 1.00 29.19 C ATOM 372 O3* G A 18 81.313 60.691 34.173 1.00 27.98 O ATOM 373 C2* G A 18 81.638 61.165 36.437 1.00 26.61 C ATOM 374 O2* G A 18 82.417 62.205 35.773 1.00 31.88 O ATOM 375 C1* G A 18 80.191 61.615 36.658 1.00 31.24 C ATOM 376 N9 G A 18 79.893 62.457 37.818 1.00 25.73 N ATOM 377 C8 G A 18 80.399 62.361 39.094 1.00 29.39 C ATOM 378 N7 G A 18 79.992 63.332 39.883 1.00 29.66 N ATOM 379 C5 G A 18 79.165 64.095 39.074 1.00 26.03 C ATOM 380 C6 G A 18 78.469 65.259 39.359 1.00 29.73 C ATOM 381 O6 G A 18 78.491 65.889 40.411 1.00 31.10 O ATOM 382 N1 G A 18 77.698 65.711 38.257 1.00 25.44 N ATOM 383 C2 G A 18 77.634 65.077 37.076 1.00 25.15 C ATOM 384 N2 G A 18 76.850 65.605 36.127 1.00 28.06 N ATOM 385 N3 G A 18 78.312 63.960 36.797 1.00 31.81 N ATOM 386 C4 G A 18 79.053 63.539 37.817 1.00 27.46 C ATOM 387 P G A 19 81.705 59.935 32.855 1.00 34.61 P ATOM 388 O1P G A 19 80.751 58.780 32.632 1.00 31.49 O ATOM 389 O2P G A 19 83.185 59.683 32.792 1.00 35.14 O ATOM 390 O5* G A 19 81.429 61.065 31.780 1.00 30.92 O ATOM 391 C5* G A 19 80.053 61.459 31.456 1.00 35.49 C ATOM 392 C4* G A 19 80.105 62.508 30.407 1.00 32.28 C ATOM 393 O4* G A 19 80.779 63.631 30.991 1.00 33.92 O ATOM 
394 C3* G A 19 80.907 62.116 29.171 1.00 34.56 C ATOM 395 O3* G A 19 80.389 62.868 28.083 1.00 33.76 O ATOM 396 C2* G A 19 82.305 62.679 29.462 1.00 35.83 C ATOM 397 O2* G A 19 82.892 63.160 28.284 1.00 36.38 O ATOM 398 C1* G A 19 81.965 63.912 30.283 1.00 31.26 C ATOM 399 N9 G A 19 82.922 64.301 31.303 1.00 34.09 N ATOM 400 C8 G A 19 83.808 63.510 32.002 1.00 34.35 C ATOM 401 N7 G A 19 84.330 64.135 33.026 1.00 29.97 N ATOM 402 C5 G A 19 83.803 65.412 32.954 1.00 30.15 C ATOM 403 C6 G A 19 83.998 66.553 33.810 1.00 30.28 C ATOM 404 O6 G A 19 84.715 66.645 34.861 1.00 30.49 O ATOM 405 N1 G A 19 83.246 67.654 33.362 1.00 30.06 N ATOM 406 C2 G A 19 82.438 67.667 32.262 1.00 28.68 C ATOM 407 N2 G A 19 81.753 68.812 32.044 1.00 31.69 N ATOM 408 N3 G A 19 82.279 66.621 31.443 1.00 33.35 N ATOM 409 C4 G A 19 82.976 65.537 31.862 1.00 29.77 C ATOM 410 P G A 20 79.212 62.270 27.169 1.00 37.57 P ATOM 411 O1P G A 20 78.632 63.435 26.460 1.00 37.48 O ATOM 412 O2P G A 20 78.418 61.535 28.129 1.00 34.93 O ATOM 413 O5* G A 20 79.952 61.358 26.110 1.00 35.27 O ATOM 414 C5* G A 20 80.561 61.904 24.945 1.00 34.78 C ATOM 415 C4* G A 20 80.529 60.879 23.849 1.00 33.56 C ATOM 416 O4* G A 20 81.420 59.808 24.203 1.00 34.69 O ATOM 417 C3* G A 20 79.164 60.229 23.679 1.00 34.55 C ATOM 418 O3* G A 20 78.373 60.964 22.748 1.00 32.07 O ATOM 419 C2* G A 20 79.540 58.879 23.115 1.00 31.94 C ATOM 420 O2* G A 20 79.935 59.005 21.776 1.00 29.14 O ATOM 421 C1* G A 20 80.836 58.580 23.876 1.00 33.29 C ATOM 422 N9 G A 20 80.557 57.879 25.128 1.00 28.16 N ATOM 423 C8 G A 20 80.808 58.309 26.413 1.00 28.17 C ATOM 424 N7 G A 20 80.362 57.471 27.316 1.00 27.62 N ATOM 425 C5 G A 20 79.819 56.431 26.576 1.00 26.85 C ATOM 426 C6 G A 20 79.214 55.251 26.985 1.00 29.07 C ATOM 427 O6 G A 20 79.010 54.845 28.141 1.00 30.12 O ATOM 428 N1 G A 20 78.811 54.474 25.894 1.00 26.18 N ATOM 429 C2 G A 20 78.993 54.795 24.576 1.00 28.16 C ATOM 430 N2 G A 20 78.546 53.879 23.648 1.00 24.71 N ATOM 431 N3 G A 20 79.567 
55.902 24.177 1.00 25.02 N ATOM 432 C4 G A 20 79.951 56.671 25.225 1.00 27.86 C ATOM 433 P A A 21 76.960 61.541 23.213 1.00 35.97 P ATOM 434 O1P A A 21 76.324 62.054 21.989 1.00 35.07 O ATOM 435 O2P A A 21 77.193 62.456 24.350 1.00 35.45 O ATOM 436 O5* A A 21 76.166 60.289 23.790 1.00 34.57 O ATOM 437 C5* A A 21 75.604 59.329 22.914 1.00 33.17 C ATOM 438 C4* A A 21 75.622 58.001 23.580 1.00 34.52 C ATOM 439 O4* A A 21 74.864 57.958 24.808 1.00 29.25 O ATOM 440 C3* A A 21 75.125 56.852 22.735 1.00 33.11 C ATOM 441 O3* A A 21 76.250 56.581 21.883 1.00 36.69 O ATOM 442 C2* A A 21 74.815 55.806 23.788 1.00 33.65 C ATOM 443 O2* A A 21 76.034 55.158 24.220 1.00 30.39 O ATOM 444 C1* A A 21 74.304 56.666 24.955 1.00 30.28 C ATOM 445 N9 A A 21 72.843 56.834 25.065 1.00 28.13 N ATOM 446 C8 A A 21 72.122 57.968 24.720 1.00 24.90 C ATOM 447 N7 A A 21 70.828 57.864 24.968 1.00 27.77 N ATOM 448 C5 A A 21 70.695 56.581 25.516 1.00 25.63 C ATOM 449 C6 A A 21 69.596 55.872 26.031 1.00 28.10 C ATOM 450 N6 A A 21 68.327 56.341 26.073 1.00 24.98 N ATOM 451 N1 A A 21 69.817 54.668 26.536 1.00 27.40 N ATOM 452 C2 A A 21 71.064 54.186 26.531 1.00 26.83 C ATOM 453 N3 A A 21 72.169 54.734 26.097 1.00 26.24 N ATOM 454 C4 A A 21 71.927 55.948 25.580 1.00 28.94 C ATOM 455 P G A 22 76.122 55.607 20.624 1.00 36.77 P ATOM 456 O1P G A 22 77.347 55.811 19.853 1.00 34.15 O ATOM 457 O2P G A 22 74.796 55.966 20.020 1.00 38.21 O ATOM 458 O5* G A 22 76.107 54.255 21.420 1.00 39.67 O ATOM 459 C5* G A 22 75.588 53.058 20.896 1.00 33.77 C ATOM 460 C4* G A 22 76.292 51.916 21.581 1.00 33.57 C ATOM 461 O4* G A 22 76.032 51.934 23.018 1.00 28.98 O ATOM 462 C3* G A 22 75.794 50.585 21.102 1.00 31.27 C ATOM 463 O3* G A 22 76.427 50.216 19.874 1.00 36.07 O ATOM 464 C2* G A 22 75.986 49.707 22.315 1.00 30.09 C ATOM 465 O2* G A 22 77.321 49.252 22.478 1.00 29.86 O ATOM 466 C1* G A 22 75.605 50.671 23.444 1.00 25.75 C ATOM 467 N9 G A 22 74.157 50.722 23.757 1.00 27.25 N ATOM 468 C8 G A 22 73.306 51.785 23.618 1.00 
27.24 C ATOM 469 N7 G A 22 72.095 51.529 24.052 1.00 28.94 N ATOM 470 C5 G A 22 72.147 50.220 24.476 1.00 28.23 C ATOM 471 C6 G A 22 71.138 49.385 25.011 1.00 28.84 C ATOM 472 O6 G A 22 69.936 49.648 25.252 1.00 29.10 O ATOM 473 N1 G A 22 71.624 48.128 25.270 1.00 27.83 N ATOM 474 C2 G A 22 72.899 47.703 25.050 1.00 30.21 C ATOM 475 N2 G A 22 73.158 46.409 25.364 1.00 27.00 N ATOM 476 N3 G A 22 73.837 48.454 24.566 1.00 29.25 N ATOM 477 C4 G A 22 73.400 49.695 24.288 1.00 25.81 C ATOM 478 P A A 23 75.571 49.404 18.784 1.00 38.64 P ATOM 479 O1P A A 23 76.230 49.559 17.497 1.00 35.37 O ATOM 480 O2P A A 23 74.080 49.759 18.881 1.00 35.35 O ATOM 481 O5* A A 23 75.636 47.938 19.332 1.00 37.83 O ATOM 482 C5* A A 23 76.912 47.318 19.633 1.00 37.02 C ATOM 483 C4* A A 23 76.705 45.975 20.257 1.00 39.13 C ATOM 484 O4* A A 23 76.089 46.103 21.547 1.00 38.15 O ATOM 485 C3* A A 23 75.794 45.028 19.486 1.00 40.14 C ATOM 486 O3* A A 23 76.500 44.368 18.446 1.00 42.27 O ATOM 487 C2* A A 23 75.356 44.060 20.563 1.00 39.45 C ATOM 488 O2* A A 23 76.423 43.125 20.754 1.00 42.00 O ATOM 489 C1* A A 23 75.210 45.010 21.771 1.00 37.98 C ATOM 490 N9 A A 23 73.858 45.578 21.885 1.00 35.38 N ATOM 491 C8 A A 23 73.461 46.822 21.439 1.00 35.25 C ATOM 492 N7 A A 23 72.234 47.140 21.772 1.00 33.59 N ATOM 493 C5 A A 23 71.772 46.017 22.438 1.00 34.26 C ATOM 494 C6 A A 23 70.529 45.710 23.019 1.00 32.33 C ATOM 495 N6 A A 23 69.521 46.594 23.138 1.00 34.94 N ATOM 496 N1 A A 23 70.368 44.470 23.517 1.00 35.35 N ATOM 497 C2 A A 23 71.405 43.615 23.470 1.00 31.66 C ATOM 498 N3 A A 23 72.643 43.811 23.008 1.00 32.48 N ATOM 499 C4 A A 23 72.758 45.035 22.487 1.00 34.85 C ATOM 500 P G A 24 75.691 43.814 17.170 1.00 44.02 P ATOM 501 O1P G A 24 76.744 43.269 16.286 1.00 46.42 O ATOM 502 O2P G A 24 74.752 44.889 16.691 1.00 43.05 O ATOM 503 O5* G A 24 74.795 42.670 17.756 1.00 40.73 O ATOM 504 C5* G A 24 75.378 41.492 18.218 1.00 46.30 C ATOM 505 C4* G A 24 74.313 40.622 18.747 1.00 47.22 C ATOM 506 O4* G A 
24 73.799 41.198 19.975 1.00 46.23 O ATOM 507 C3* G A 24 73.094 40.484 17.855 1.00 47.23 C ATOM 508 O3* G A 24 73.287 39.486 16.850 1.00 51.92 O ATOM 509 C2* G A 24 72.056 40.037 18.867 1.00 47.49 C ATOM 510 O2* G A 24 72.324 38.676 19.169 1.00 46.29 O ATOM 511 C1* G A 24 72.412 40.925 20.073 1.00 45.11 C ATOM 512 N9 G A 24 71.687 42.195 20.013 1.00 42.04 N ATOM 513 C8 G A 24 72.126 43.377 19.471 1.00 42.70 C ATOM 514 N7 G A 24 71.218 44.315 19.472 1.00 40.83 N ATOM 515 C5 G A 24 70.114 43.715 20.070 1.00 42.41 C ATOM 516 C6 G A 24 68.831 44.242 20.336 1.00 44.36 C ATOM 517 O6 G A 24 68.390 45.396 20.057 1.00 44.29 O ATOM 518 N1 G A 24 68.011 43.299 20.965 1.00 42.61 N ATOM 519 C2 G A 24 68.390 42.013 21.284 1.00 44.44 C ATOM 520 N2 G A 24 67.446 41.234 21.843 1.00 42.33 N ATOM 521 N3 G A 24 69.599 41.521 21.050 1.00 41.16 N ATOM 522 C4 G A 24 70.396 42.419 20.435 1.00 42.35 C ATOM 523 P C A 25 72.660 39.699 15.392 1.00 49.58 P ATOM 524 O1P C A 25 73.082 38.530 14.539 1.00 54.65 O ATOM 525 O2P C A 25 72.897 41.067 14.886 1.00 47.78 O ATOM 526 O5* C A 25 71.110 39.469 15.645 1.00 48.69 O ATOM 527 C5* C A 25 70.619 38.180 16.052 1.00 50.80 C ATOM 528 C4* C A 25 69.170 38.275 16.491 1.00 50.78 C ATOM 529 O4* C A 25 69.098 39.152 17.653 1.00 55.06 O ATOM 530 C3* C A 25 68.156 38.899 15.531 1.00 53.87 C ATOM 531 O3* C A 25 67.629 38.012 14.533 1.00 52.62 O ATOM 532 C2* C A 25 67.045 39.336 16.481 1.00 54.24 C ATOM 533 O2* C A 25 66.276 38.238 16.922 1.00 56.92 O ATOM 534 C1* C A 25 67.844 39.818 17.685 1.00 51.93 C ATOM 535 N1 C A 25 68.040 41.263 17.589 1.00 51.02 N ATOM 536 C2 C A 25 67.005 42.092 18.016 1.00 51.02 C ATOM 537 O2 C A 25 65.989 41.579 18.475 1.00 49.40 O ATOM 538 N3 C A 25 67.136 43.428 17.897 1.00 50.42 N ATOM 539 C4 C A 25 68.240 43.949 17.341 1.00 50.86 C ATOM 540 N4 C A 25 68.316 45.287 17.212 1.00 46.56 N ATOM 541 C5 C A 25 69.316 43.129 16.899 1.00 48.98 C ATOM 542 C6 C A 25 69.185 41.802 17.062 1.00 50.79 C HETATM 543 P M2G A 26 67.297 38.593 
13.077 1.00 53.21 P HETATM 544 O1P M2G A 26 68.469 39.430 12.691 1.00 53.23 O HETATM 545 O2P M2G A 26 66.835 37.487 12.148 1.00 56.89 O HETATM 546 O5* M2G A 26 66.058 39.572 13.258 1.00 49.44 O HETATM 547 C5* M2G A 26 64.865 39.112 13.893 1.00 49.64 C HETATM 548 C4* M2G A 26 63.938 40.267 14.143 1.00 49.36 C HETATM 549 O4* M2G A 26 64.443 41.119 15.209 1.00 48.17 O HETATM 550 C3* M2G A 26 63.719 41.196 12.968 1.00 48.54 C HETATM 551 O3* M2G A 26 62.681 40.699 12.152 1.00 52.14 O HETATM 552 C2* M2G A 26 63.273 42.477 13.644 1.00 47.40 C HETATM 553 O2* M2G A 26 61.905 42.387 14.025 1.00 47.93 O HETATM 554 C1* M2G A 26 64.147 42.470 14.901 1.00 46.92 C HETATM 555 N9 M2G A 26 65.403 43.195 14.677 1.00 41.61 N HETATM 556 C8 M2G A 26 66.657 42.687 14.400 1.00 45.40 C HETATM 557 N7 M2G A 26 67.543 43.618 14.188 1.00 44.60 N HETATM 558 C5 M2G A 26 66.836 44.807 14.371 1.00 43.23 C HETATM 559 C6 M2G A 26 67.253 46.174 14.285 1.00 41.77 C HETATM 560 O6 M2G A 26 68.372 46.637 14.008 1.00 45.23 O HETATM 561 N1 M2G A 26 66.209 47.048 14.544 1.00 42.85 N HETATM 562 C2 M2G A 26 64.926 46.678 14.840 1.00 42.16 C HETATM 563 N2 M2G A 26 64.015 47.663 15.061 1.00 41.43 N HETATM 564 N3 M2G A 26 64.531 45.410 14.927 1.00 42.18 N HETATM 565 C4 M2G A 26 65.524 44.546 14.680 1.00 43.64 C HETATM 566 CM1 M2G A 26 64.404 49.075 15.158 1.00 44.51 C HETATM 567 CM2 M2G A 26 62.594 47.288 15.283 1.00 41.28 C ATOM 568 P C A 27 62.860 40.709 10.569 1.00 53.96 P ATOM 569 O1P C A 27 61.690 39.909 10.056 1.00 58.78 O ATOM 570 O2P C A 27 64.244 40.343 10.165 1.00 54.14 O ATOM 571 O5* C A 27 62.614 42.218 10.155 1.00 53.83 O ATOM 572 C5* C A 27 61.437 42.899 10.561 1.00 51.71 C ATOM 573 C4* C A 27 61.594 44.378 10.321 1.00 53.47 C ATOM 574 O4* C A 27 62.476 44.955 11.330 1.00 51.31 O ATOM 575 C3* C A 27 62.210 44.814 9.004 1.00 54.38 C ATOM 576 O3* C A 27 61.256 44.865 7.943 1.00 60.47 O ATOM 577 C2* C A 27 62.693 46.223 9.339 1.00 51.78 C ATOM 578 O2* C A 27 61.640 47.167 9.301 1.00 51.18 O ATOM 579 
C1* C A 27 63.159 46.064 10.787 1.00 48.68 C ATOM 580 N1 C A 27 64.612 45.818 10.877 1.00 42.59 N ATOM 581 C2 C A 27 65.472 46.868 10.634 1.00 44.48 C ATOM 582 O2 C A 27 64.981 47.978 10.348 1.00 42.73 O ATOM 583 N3 C A 27 66.821 46.659 10.722 1.00 42.28 N ATOM 584 C4 C A 27 67.275 45.452 11.056 1.00 43.75 C ATOM 585 N4 C A 27 68.586 45.272 11.180 1.00 44.57 N ATOM 586 C5 C A 27 66.402 44.364 11.291 1.00 44.20 C ATOM 587 C6 C A 27 65.095 44.589 11.192 1.00 44.33 C ATOM 588 P C A 28 61.715 44.551 6.429 1.00 61.60 P ATOM 589 O1P C A 28 60.464 44.441 5.640 1.00 62.15 O ATOM 590 O2P C A 28 62.648 43.401 6.479 1.00 61.61 O ATOM 591 O5* C A 28 62.553 45.816 5.941 1.00 56.70 O ATOM 592 C5* C A 28 61.910 47.047 5.647 1.00 57.75 C ATOM 593 C4* C A 28 62.918 48.099 5.246 1.00 58.29 C ATOM 594 O4* C A 28 63.719 48.476 6.402 1.00 56.80 O ATOM 595 C3* C A 28 63.942 47.721 4.187 1.00 60.32 C ATOM 596 O3* C A 28 63.466 47.779 2.838 1.00 64.19 O ATOM 597 C2* C A 28 65.056 48.727 4.461 1.00 57.91 C ATOM 598 O2* C A 28 64.774 50.025 3.952 1.00 56.60 O ATOM 599 C1* C A 28 65.046 48.767 5.987 1.00 56.20 C ATOM 600 N1 C A 28 65.958 47.725 6.532 1.00 51.41 N ATOM 601 C2 C A 28 67.338 47.999 6.616 1.00 49.26 C ATOM 602 O2 C A 28 67.761 49.119 6.240 1.00 45.60 O ATOM 603 N3 C A 28 68.175 47.034 7.101 1.00 50.25 N ATOM 604 C4 C A 28 67.676 45.854 7.493 1.00 50.21 C ATOM 605 N4 C A 28 68.515 44.934 7.978 1.00 50.49 N ATOM 606 C5 C A 28 66.284 45.560 7.412 1.00 50.83 C ATOM 607 C6 C A 28 65.473 46.516 6.936 1.00 50.67 C ATOM 608 P A A 29 64.134 46.810 1.715 1.00 65.77 P ATOM 609 O1P A A 29 63.389 47.109 0.457 1.00 67.04 O ATOM 610 O2P A A 29 64.179 45.395 2.219 1.00 63.95 O ATOM 611 O5* A A 29 65.641 47.328 1.540 1.00 62.83 O ATOM 612 C5* A A 29 65.903 48.647 0.972 1.00 63.87 C ATOM 613 C4* A A 29 67.384 49.009 1.027 1.00 63.27 C ATOM 614 O4* A A 29 67.815 49.121 2.412 1.00 62.56 O ATOM 615 C3* A A 29 68.423 48.084 0.399 1.00 63.00 C ATOM 616 O3* A A 29 68.576 48.226 -1.003 1.00 65.01 O ATOM 
617 C2* A A 29 69.702 48.526 1.092 1.00 61.38 C ATOM 618 O2* A A 29 70.234 49.721 0.530 1.00 61.02 O ATOM 619 C1* A A 29 69.196 48.789 2.509 1.00 59.63 C ATOM 620 N9 A A 29 69.335 47.576 3.315 1.00 54.19 N ATOM 621 C8 A A 29 68.378 46.647 3.653 1.00 54.48 C ATOM 622 N7 A A 29 68.847 45.630 4.321 1.00 53.76 N ATOM 623 C5 A A 29 70.198 45.923 4.448 1.00 52.14 C ATOM 624 C6 A A 29 71.245 45.231 5.038 1.00 50.79 C ATOM 625 N6 A A 29 71.082 44.066 5.687 1.00 50.54 N ATOM 626 N1 A A 29 72.481 45.776 4.952 1.00 48.83 N ATOM 627 C2 A A 29 72.620 46.942 4.329 1.00 48.49 C ATOM 628 N3 A A 29 71.699 47.698 3.751 1.00 51.37 N ATOM 629 C4 A A 29 70.502 47.119 3.845 1.00 50.75 C ATOM 630 P G A 30 69.148 46.986 -1.875 1.00 63.15 P ATOM 631 O1P G A 30 68.830 47.365 -3.277 1.00 64.99 O ATOM 632 O2P G A 30 68.659 45.688 -1.323 1.00 61.44 O ATOM 633 O5* G A 30 70.738 46.979 -1.648 1.00 64.00 O ATOM 634 C5* G A 30 71.552 48.159 -1.889 1.00 63.25 C ATOM 635 C4* G A 30 72.933 47.996 -1.281 1.00 63.56 C ATOM 636 O4* G A 30 72.830 47.805 0.157 1.00 62.05 O ATOM 637 C3* G A 30 73.716 46.785 -1.754 1.00 65.21 C ATOM 638 O3* G A 30 74.380 47.055 -2.978 1.00 68.93 O ATOM 639 C2* G A 30 74.690 46.532 -0.603 1.00 63.90 C ATOM 640 O2* G A 30 75.815 47.379 -0.636 1.00 66.48 O ATOM 641 C1* G A 30 73.839 46.903 0.605 1.00 60.48 C ATOM 642 N9 G A 30 73.204 45.719 1.184 1.00 56.62 N ATOM 643 C8 G A 30 71.872 45.376 1.155 1.00 55.59 C ATOM 644 N7 G A 30 71.623 44.235 1.750 1.00 54.28 N ATOM 645 C5 G A 30 72.868 43.798 2.203 1.00 52.26 C ATOM 646 C6 G A 30 73.238 42.611 2.909 1.00 51.92 C ATOM 647 O6 G A 30 72.512 41.677 3.315 1.00 48.72 O ATOM 648 N1 G A 30 74.614 42.562 3.133 1.00 53.22 N ATOM 649 C2 G A 30 75.513 43.524 2.738 1.00 52.55 C ATOM 650 N2 G A 30 76.797 43.282 3.015 1.00 52.61 N ATOM 651 N3 G A 30 75.180 44.635 2.106 1.00 52.73 N ATOM 652 C4 G A 30 73.853 44.703 1.866 1.00 53.73 C ATOM 653 P A A 31 74.530 45.886 -4.075 1.00 69.42 P ATOM 654 O1P A A 31 75.319 46.471 -5.195 1.00 71.00 O 
ATOM 655 O2P A A 31 73.193 45.303 -4.337 1.00 68.84 O ATOM 656 O5* A A 31 75.405 44.785 -3.333 1.00 67.81 O ATOM 657 C5* A A 31 76.782 45.018 -3.047 1.00 66.34 C ATOM 658 C4* A A 31 77.346 43.848 -2.300 1.00 65.86 C ATOM 659 O4* A A 31 76.688 43.760 -1.020 1.00 64.48 O ATOM 660 C3* A A 31 77.123 42.485 -2.938 1.00 67.80 C ATOM 661 O3* A A 31 78.102 42.185 -3.933 1.00 69.43 O ATOM 662 C2* A A 31 77.166 41.543 -1.735 1.00 66.49 C ATOM 663 O2* A A 31 78.476 41.202 -1.324 1.00 70.09 O ATOM 664 C1* A A 31 76.518 42.403 -0.649 1.00 62.06 C ATOM 665 N9 A A 31 75.083 42.164 -0.541 1.00 56.88 N ATOM 666 C8 A A 31 74.083 42.881 -1.152 1.00 56.22 C ATOM 667 N7 A A 31 72.877 42.454 -0.869 1.00 55.69 N ATOM 668 C5 A A 31 73.093 41.379 -0.019 1.00 52.78 C ATOM 669 C6 A A 31 72.210 40.498 0.642 1.00 53.56 C ATOM 670 N6 A A 31 70.867 40.568 0.536 1.00 51.52 N ATOM 671 N1 A A 31 72.757 39.533 1.428 1.00 53.95 N ATOM 672 C2 A A 31 74.103 39.473 1.530 1.00 54.22 C ATOM 673 N3 A A 31 75.027 40.244 0.958 1.00 53.01 N ATOM 674 C4 A A 31 74.453 41.186 0.189 1.00 54.24 C HETATM 675 N1 OMC A 32 74.486 38.301 -2.478 1.00 64.48 N HETATM 676 C2 OMC A 32 73.347 37.833 -1.784 1.00 63.86 C HETATM 677 N3 OMC A 32 72.174 38.492 -1.928 1.00 62.03 N HETATM 678 C4 OMC A 32 72.103 39.570 -2.726 1.00 62.91 C HETATM 679 C5 OMC A 32 73.242 40.067 -3.425 1.00 63.75 C HETATM 680 C6 OMC A 32 74.401 39.413 -3.269 1.00 64.43 C HETATM 681 O2 OMC A 32 73.450 36.817 -1.056 1.00 63.74 O HETATM 682 N4 OMC A 32 70.914 40.189 -2.874 1.00 60.56 N HETATM 683 C1* OMC A 32 75.756 37.560 -2.358 1.00 68.30 C HETATM 684 C2* OMC A 32 75.912 36.485 -3.446 1.00 68.41 C HETATM 685 O2* OMC A 32 76.521 35.341 -2.874 1.00 68.27 O HETATM 686 CM2 OMC A 32 75.491 34.476 -2.439 1.00 68.09 C HETATM 687 C3* OMC A 32 76.801 37.196 -4.464 1.00 69.68 C HETATM 688 C4* OMC A 32 77.712 38.038 -3.579 1.00 69.87 C HETATM 689 O4* OMC A 32 76.843 38.468 -2.493 1.00 69.46 O HETATM 690 O3* OMC A 32 77.529 36.305 -5.304 1.00 71.71 O HETATM 691 C5* 
OMC A 32 78.349 39.239 -4.237 1.00 68.92 C HETATM 692 O5* OMC A 32 77.359 40.004 -4.930 1.00 69.40 O HETATM 693 P OMC A 32 77.644 41.515 -5.330 1.00 70.88 P HETATM 694 O1P OMC A 32 76.360 42.128 -5.750 1.00 69.87 O HETATM 695 O2P OMC A 32 78.813 41.581 -6.239 1.00 71.33 O ATOM 696 P U A 33 76.971 35.962 -6.783 1.00 72.91 P ATOM 697 O1P U A 33 78.044 35.151 -7.420 1.00 73.35 O ATOM 698 O2P U A 33 76.437 37.170 -7.490 1.00 71.97 O ATOM 699 O5* U A 33 75.699 35.053 -6.505 1.00 71.09 O ATOM 700 C5* U A 33 75.822 33.741 -5.951 1.00 71.16 C ATOM 701 C4* U A 33 74.457 33.095 -5.870 1.00 72.48 C ATOM 702 O4* U A 33 73.655 33.729 -4.828 1.00 73.47 O ATOM 703 C3* U A 33 73.601 33.208 -7.129 1.00 72.70 C ATOM 704 O3* U A 33 73.935 32.212 -8.110 1.00 70.79 O ATOM 705 C2* U A 33 72.190 33.041 -6.575 1.00 72.85 C ATOM 706 O2* U A 33 71.917 31.668 -6.340 1.00 73.80 O ATOM 707 C1* U A 33 72.289 33.769 -5.226 1.00 72.04 C ATOM 708 N1 U A 33 71.847 35.177 -5.291 1.00 69.22 N ATOM 709 C2 U A 33 70.504 35.476 -4.990 1.00 67.66 C ATOM 710 O2 U A 33 69.696 34.637 -4.613 1.00 66.73 O ATOM 711 N3 U A 33 70.155 36.799 -5.139 1.00 64.32 N ATOM 712 C4 U A 33 70.975 37.841 -5.535 1.00 64.94 C ATOM 713 O4 U A 33 70.494 38.966 -5.720 1.00 60.96 O ATOM 714 C5 U A 33 72.339 37.462 -5.787 1.00 65.88 C ATOM 715 C6 U A 33 72.718 36.181 -5.660 1.00 68.66 C HETATM 716 P OMG A 34 73.785 32.556 -9.678 1.00 71.12 P HETATM 717 O1P OMG A 34 74.725 33.685 -10.014 1.00 67.66 O HETATM 718 O2P OMG A 34 73.921 31.249 -10.387 1.00 68.06 O HETATM 719 O5* OMG A 34 72.274 33.065 -9.810 1.00 65.26 O HETATM 720 C5* OMG A 34 71.764 33.656 -11.016 1.00 64.26 C HETATM 721 C4* OMG A 34 70.295 33.326 -11.163 1.00 64.62 C HETATM 722 O4* OMG A 34 70.126 31.890 -11.328 1.00 63.80 O HETATM 723 C3* OMG A 34 69.463 33.672 -9.939 1.00 65.81 C HETATM 724 O3* OMG A 34 69.094 35.051 -9.965 1.00 67.70 O HETATM 725 C2* OMG A 34 68.311 32.644 -9.969 1.00 64.60 C HETATM 726 O2* OMG A 34 67.083 32.909 -10.666 1.00 63.66 O HETATM 727 CM2 
OMG A 34 67.294 33.412 -11.976 1.00 64.30 C HETATM 728 C1* OMG A 34 68.999 31.433 -10.593 1.00 61.80 C HETATM 729 N9 OMG A 34 69.429 30.396 -9.649 1.00 59.16 N HETATM 730 C8 OMG A 34 70.718 29.977 -9.388 1.00 57.76 C HETATM 731 N7 OMG A 34 70.774 28.948 -8.579 1.00 56.91 N HETATM 732 C5 OMG A 34 69.441 28.691 -8.266 1.00 58.76 C HETATM 733 C6 OMG A 34 68.866 27.676 -7.438 1.00 60.12 C HETATM 734 O6 OMG A 34 69.451 26.785 -6.793 1.00 60.12 O HETATM 735 N1 OMG A 34 67.469 27.777 -7.395 1.00 59.19 N HETATM 736 C2 OMG A 34 66.720 28.751 -8.040 1.00 59.35 C HETATM 737 N2 OMG A 34 65.376 28.719 -7.843 1.00 56.71 N HETATM 738 N3 OMG A 34 67.250 29.698 -8.816 1.00 58.06 N HETATM 739 C4 OMG A 34 68.602 29.600 -8.889 1.00 58.63 C ATOM 740 P A A 35 69.287 35.959 -8.637 1.00 69.94 P ATOM 741 O1P A A 35 68.963 37.372 -8.988 1.00 66.80 O ATOM 742 O2P A A 35 70.613 35.629 -8.027 1.00 67.49 O ATOM 743 O5* A A 35 68.138 35.409 -7.674 1.00 69.12 O ATOM 744 C5* A A 35 66.791 35.233 -8.158 1.00 71.95 C ATOM 745 C4* A A 35 65.999 34.348 -7.216 1.00 73.29 C ATOM 746 O4* A A 35 66.330 32.957 -7.426 1.00 71.52 O ATOM 747 C3* A A 35 66.228 34.575 -5.733 1.00 75.03 C ATOM 748 O3* A A 35 65.477 35.689 -5.246 1.00 79.27 O ATOM 749 C2* A A 35 65.806 33.241 -5.122 1.00 73.35 C ATOM 750 O2* A A 35 64.415 33.130 -4.902 1.00 73.82 O ATOM 751 C1* A A 35 66.202 32.251 -6.217 1.00 69.82 C ATOM 752 N9 A A 35 67.458 31.568 -5.953 1.00 66.95 N ATOM 753 C8 A A 35 68.714 31.908 -6.382 1.00 65.90 C ATOM 754 N7 A A 35 69.650 31.095 -5.969 1.00 63.58 N ATOM 755 C5 A A 35 68.964 30.158 -5.217 1.00 64.51 C ATOM 756 C6 A A 35 69.394 29.030 -4.505 1.00 64.64 C ATOM 757 N6 A A 35 70.676 28.646 -4.442 1.00 64.41 N ATOM 758 N1 A A 35 68.454 28.302 -3.853 1.00 64.67 N ATOM 759 C2 A A 35 67.173 28.694 -3.936 1.00 64.93 C ATOM 760 N3 A A 35 66.650 29.737 -4.582 1.00 64.37 N ATOM 761 C4 A A 35 67.612 30.435 -5.203 1.00 64.99 C ATOM 762 P A A 36 66.113 36.647 -4.115 1.00 81.67 P ATOM 763 O1P A A 36 65.045 37.623 -3.738 
1.00 81.59 O ATOM 764 O2P A A 36 67.432 37.156 -4.600 1.00 82.66 O ATOM 765 O5* A A 36 66.374 35.639 -2.911 1.00 83.98 O ATOM 766 C5* A A 36 65.302 34.813 -2.451 1.00 87.71 C ATOM 767 C4* A A 36 65.801 33.698 -1.565 1.00 90.67 C ATOM 768 O4* A A 36 66.568 32.696 -2.288 1.00 89.82 O ATOM 769 C3* A A 36 66.712 33.999 -0.388 1.00 92.78 C ATOM 770 O3* A A 36 66.031 34.644 0.701 1.00 97.07 O ATOM 771 C2* A A 36 67.162 32.587 0.010 1.00 91.11 C ATOM 772 O2* A A 36 66.278 31.932 0.896 1.00 92.17 O ATOM 773 C1* A A 36 67.143 31.833 -1.328 1.00 88.90 C ATOM 774 N9 A A 36 68.488 31.414 -1.719 1.00 85.94 N ATOM 775 C8 A A 36 69.437 32.000 -2.530 1.00 84.74 C ATOM 776 N7 A A 36 70.588 31.361 -2.537 1.00 83.82 N ATOM 777 C5 A A 36 70.369 30.267 -1.700 1.00 84.29 C ATOM 778 C6 A A 36 71.195 29.192 -1.278 1.00 84.05 C ATOM 779 N6 A A 36 72.462 29.017 -1.673 1.00 85.59 N ATOM 780 N1 A A 36 70.658 28.283 -0.430 1.00 83.44 N ATOM 781 C2 A A 36 69.385 28.429 -0.052 1.00 82.14 C ATOM 782 N3 A A 36 68.512 29.376 -0.383 1.00 82.89 N ATOM 783 C4 A A 36 69.074 30.277 -1.213 1.00 84.44 C HETATM 784 N1 YG A 37 73.405 29.912 1.754 1.00 91.04 N HETATM 785 N2 YG A 37 73.444 28.355 3.405 1.00 90.89 N HETATM 786 C2 YG A 37 72.723 29.290 2.751 1.00 90.80 C HETATM 787 N3 YG A 37 71.471 29.563 3.082 1.00 90.20 N HETATM 788 C3 YG A 37 70.758 28.811 4.134 1.00 89.88 C HETATM 789 C4 YG A 37 70.959 30.583 2.344 1.00 90.19 C HETATM 790 C5 YG A 37 71.580 31.301 1.340 1.00 89.89 C HETATM 791 C6 YG A 37 72.913 30.962 0.974 1.00 90.32 C HETATM 792 O6 YG A 37 73.618 31.473 0.097 1.00 89.77 O HETATM 793 N7 YG A 37 70.739 32.287 0.844 1.00 89.86 N HETATM 794 C8 YG A 37 69.640 32.153 1.532 1.00 89.85 C HETATM 795 N9 YG A 37 69.698 31.144 2.464 1.00 90.90 N HETATM 796 C10 YG A 37 75.897 27.876 3.472 1.00 91.27 C HETATM 797 C11 YG A 37 74.717 28.526 2.928 1.00 92.18 C HETATM 798 C12 YG A 37 74.703 29.466 1.794 1.00 92.82 C HETATM 799 C13 YG A 37 75.894 29.679 0.821 1.00 95.17 C HETATM 800 C14 YG A 37 75.821 
28.497 -0.184 1.00 98.02 C HETATM 801 C15 YG A 37 76.173 28.762 -1.668 1.00 99.84 C HETATM 802 C16 YG A 37 75.991 27.380 -2.269 1.00 99.97 C HETATM 803 O17 YG A 37 74.838 26.920 -2.376 1.00100.19 O HETATM 804 O18 YG A 37 76.976 26.693 -2.657 1.00100.19 O HETATM 805 C19 YG A 37 77.769 27.170 -3.764 1.00 99.79 C HETATM 806 N20 YG A 37 75.234 29.792 -2.267 1.00100.19 N HETATM 807 C21 YG A 37 75.173 30.151 -3.610 1.00100.19 C HETATM 808 O22 YG A 37 74.112 30.084 -4.264 1.00100.19 O HETATM 809 O23 YG A 37 76.221 30.547 -4.170 1.00100.19 O HETATM 810 C24 YG A 37 76.863 31.711 -3.637 1.00 99.34 C HETATM 811 C1* YG A 37 68.645 30.761 3.410 1.00 93.08 C HETATM 812 C2* YG A 37 68.983 31.220 4.830 1.00 94.21 C HETATM 813 O2* YG A 37 68.545 30.264 5.774 1.00 94.06 O HETATM 814 C3* YG A 37 68.238 32.545 4.908 1.00 94.74 C HETATM 815 O3* YG A 37 68.016 32.978 6.242 1.00 95.58 O HETATM 816 C4* YG A 37 66.976 32.279 4.107 1.00 94.30 C HETATM 817 O4* YG A 37 67.424 31.399 3.039 1.00 94.11 O HETATM 818 C5* YG A 37 66.323 33.509 3.519 1.00 95.27 C HETATM 819 O5* YG A 37 67.311 34.330 2.856 1.00 96.96 O HETATM 820 P YG A 37 66.875 35.460 1.815 1.00 99.01 P HETATM 821 O1P YG A 37 68.123 35.981 1.180 1.00 99.22 O HETATM 822 O2P YG A 37 65.932 36.421 2.473 1.00 99.27 O ATOM 823 P A A 38 69.075 33.984 6.915 1.00 95.95 P ATOM 824 O1P A A 38 68.607 34.325 8.279 1.00 96.86 O ATOM 825 O2P A A 38 69.344 35.071 5.940 1.00 96.06 O ATOM 826 O5* A A 38 70.393 33.112 7.051 1.00 93.01 O ATOM 827 C5* A A 38 70.417 31.929 7.859 1.00 91.33 C ATOM 828 C4* A A 38 71.783 31.310 7.804 1.00 90.73 C ATOM 829 O4* A A 38 72.019 30.786 6.472 1.00 89.61 O ATOM 830 C3* A A 38 72.876 32.339 8.008 1.00 90.95 C ATOM 831 O3* A A 38 73.107 32.623 9.369 1.00 92.09 O ATOM 832 C2* A A 38 74.046 31.768 7.218 1.00 90.07 C ATOM 833 O2* A A 38 74.791 30.775 7.903 1.00 89.63 O ATOM 834 C1* A A 38 73.308 31.178 6.015 1.00 88.20 C ATOM 835 N9 A A 38 73.101 32.188 4.971 1.00 84.93 N ATOM 836 C8 A A 38 71.911 32.814 4.660 1.00 
83.84 C ATOM 837 N7 A A 38 72.004 33.675 3.679 1.00 82.59 N ATOM 838 C5 A A 38 73.344 33.617 3.312 1.00 82.67 C ATOM 839 C6 A A 38 74.081 34.292 2.315 1.00 81.67 C ATOM 840 N6 A A 38 73.539 35.181 1.471 1.00 80.44 N ATOM 841 N1 A A 38 75.404 34.015 2.214 1.00 80.84 N ATOM 842 C2 A A 38 75.938 33.108 3.055 1.00 81.31 C ATOM 843 N3 A A 38 75.346 32.402 4.026 1.00 82.09 N ATOM 844 C4 A A 38 74.035 32.708 4.104 1.00 83.08 C HETATM 845 N1 PSU A 39 74.080 36.066 5.459 1.00 75.82 N HETATM 846 C2 PSU A 39 74.415 36.835 4.354 1.00 75.59 C HETATM 847 N3 PSU A 39 75.735 36.769 3.984 1.00 76.29 N HETATM 848 C4 PSU A 39 76.728 36.038 4.591 1.00 77.28 C HETATM 849 C5 PSU A 39 76.307 35.280 5.732 1.00 77.93 C HETATM 850 C6 PSU A 39 75.025 35.316 6.112 1.00 76.07 C HETATM 851 O2 PSU A 39 73.605 37.525 3.749 1.00 75.80 O HETATM 852 O4 PSU A 39 77.875 36.079 4.134 1.00 77.81 O HETATM 853 C1* PSU A 39 77.325 34.455 6.488 1.00 79.85 C HETATM 854 C2* PSU A 39 78.240 35.315 7.366 1.00 80.79 C HETATM 855 O2* PSU A 39 79.550 34.775 7.399 1.00 79.82 O HETATM 856 C3* PSU A 39 77.509 35.235 8.700 1.00 81.32 C HETATM 857 C4* PSU A 39 77.034 33.800 8.726 1.00 83.30 C HETATM 858 O3* PSU A 39 78.312 35.525 9.823 1.00 80.38 O HETATM 859 O4* PSU A 39 76.648 33.545 7.349 1.00 81.10 O HETATM 860 C5* PSU A 39 75.867 33.557 9.646 1.00 86.38 C HETATM 861 O5* PSU A 39 74.796 34.470 9.341 1.00 90.31 O HETATM 862 P PSU A 39 73.308 34.151 9.814 1.00 92.76 P HETATM 863 O1P PSU A 39 73.270 34.200 11.303 1.00 93.12 O HETATM 864 O2P PSU A 39 72.370 34.998 9.024 1.00 92.11 O HETATM 865 P 5MC A 40 78.152 36.942 10.553 1.00 79.24 P HETATM 866 O1P 5MC A 40 76.685 37.167 10.693 1.00 80.02 O HETATM 867 O2P 5MC A 40 79.021 36.942 11.759 1.00 78.31 O HETATM 868 O5* 5MC A 40 78.720 37.994 9.494 1.00 76.52 O HETATM 869 C5* 5MC A 40 80.116 38.022 9.141 1.00 72.60 C HETATM 870 C4* 5MC A 40 80.351 38.970 7.985 1.00 69.92 C HETATM 871 O4* 5MC A 40 79.612 38.525 6.814 1.00 68.54 O HETATM 872 C3* 5MC A 40 79.877 40.397 8.200 
1.00 68.47 C HETATM 873 O3* 5MC A 40 80.825 41.181 8.913 1.00 66.73 O HETATM 874 C2* 5MC A 40 79.663 40.906 6.778 1.00 69.16 C HETATM 875 O2* 5MC A 40 80.841 41.385 6.165 1.00 68.77 O HETATM 876 C1* 5MC A 40 79.168 39.648 6.065 1.00 67.58 C HETATM 877 N1 5MC A 40 77.695 39.634 6.019 1.00 65.59 N HETATM 878 C2 5MC A 40 77.041 40.408 5.050 1.00 63.72 C HETATM 879 O2 5MC A 40 77.718 41.074 4.247 1.00 61.25 O HETATM 880 N3 5MC A 40 75.695 40.421 5.017 1.00 63.28 N HETATM 881 C4 5MC A 40 74.995 39.705 5.898 1.00 62.84 C HETATM 882 N4 5MC A 40 73.662 39.740 5.803 1.00 62.95 N HETATM 883 C5 5MC A 40 75.630 38.920 6.903 1.00 63.44 C HETATM 884 C6 5MC A 40 76.968 38.900 6.919 1.00 64.81 C HETATM 885 CM5 5MC A 40 74.837 38.153 7.923 1.00 63.46 C ATOM 886 P U A 41 80.313 42.447 9.767 1.00 63.29 P ATOM 887 O1P U A 41 81.427 42.965 10.609 1.00 65.66 O ATOM 888 O2P U A 41 79.049 42.016 10.412 1.00 63.65 O ATOM 889 O5* U A 41 79.984 43.552 8.669 1.00 62.24 O ATOM 890 C5* U A 41 81.036 44.143 7.918 1.00 56.79 C ATOM 891 C4* U A 41 80.498 45.168 6.953 1.00 56.45 C ATOM 892 O4* U A 41 79.680 44.526 5.942 1.00 53.37 O ATOM 893 C3* U A 41 79.592 46.251 7.508 1.00 54.69 C ATOM 894 O3* U A 41 80.283 47.292 8.160 1.00 55.80 O ATOM 895 C2* U A 41 78.887 46.737 6.251 1.00 52.98 C ATOM 896 O2* U A 41 79.686 47.539 5.407 1.00 54.28 O ATOM 897 C1* U A 41 78.651 45.424 5.527 1.00 49.85 C ATOM 898 N1 U A 41 77.358 44.851 5.928 1.00 45.19 N ATOM 899 C2 U A 41 76.197 45.430 5.430 1.00 41.52 C ATOM 900 O2 U A 41 76.190 46.418 4.723 1.00 42.41 O ATOM 901 N3 U A 41 75.046 44.808 5.803 1.00 41.65 N ATOM 902 C4 U A 41 74.928 43.710 6.596 1.00 40.75 C ATOM 903 O4 U A 41 73.826 43.203 6.735 1.00 49.24 O ATOM 904 C5 U A 41 76.154 43.186 7.099 1.00 43.67 C ATOM 905 C6 U A 41 77.302 43.767 6.753 1.00 41.90 C ATOM 906 P G A 42 79.705 47.861 9.545 1.00 53.54 P ATOM 907 O1P G A 42 80.846 48.577 10.148 1.00 56.88 O ATOM 908 O2P G A 42 79.060 46.769 10.292 1.00 47.65 O ATOM 909 O5* G A 42 78.638 48.930 9.059 
1.00 51.64 O ATOM 910 C5* G A 42 79.001 49.877 8.054 1.00 51.41 C ATOM 911 C4* G A 42 77.769 50.433 7.372 1.00 54.00 C ATOM 912 O4* G A 42 77.032 49.365 6.704 1.00 51.14 O ATOM 913 C3* G A 42 76.726 51.067 8.272 1.00 53.82 C ATOM 914 O3* G A 42 77.060 52.389 8.660 1.00 57.90 O ATOM 915 C2* G A 42 75.480 51.029 7.397 1.00 53.12 C ATOM 916 O2* G A 42 75.444 52.122 6.489 1.00 55.26 O ATOM 917 C1* G A 42 75.657 49.688 6.671 1.00 49.13 C ATOM 918 N9 G A 42 74.906 48.601 7.302 1.00 44.46 N ATOM 919 C8 G A 42 75.381 47.535 8.030 1.00 41.11 C ATOM 920 N7 G A 42 74.431 46.730 8.433 1.00 41.58 N ATOM 921 C5 G A 42 73.265 47.311 7.951 1.00 42.30 C ATOM 922 C6 G A 42 71.901 46.887 8.049 1.00 39.56 C ATOM 923 O6 G A 42 71.457 45.892 8.572 1.00 41.46 O ATOM 924 N1 G A 42 71.044 47.776 7.421 1.00 40.73 N ATOM 925 C2 G A 42 71.439 48.906 6.760 1.00 39.14 C ATOM 926 N2 G A 42 70.481 49.648 6.204 1.00 43.75 N ATOM 927 N3 G A 42 72.696 49.297 6.643 1.00 42.69 N ATOM 928 C4 G A 42 73.544 48.463 7.263 1.00 41.15 C ATOM 929 P G A 43 76.348 53.024 9.953 1.00 59.25 P ATOM 930 O1P G A 43 76.997 54.350 10.246 1.00 56.83 O ATOM 931 O2P G A 43 76.330 51.946 10.985 1.00 53.33 O ATOM 932 O5* G A 43 74.844 53.245 9.472 1.00 55.31 O ATOM 933 C5* G A 43 74.543 54.211 8.454 1.00 54.14 C ATOM 934 C4* G A 43 73.059 54.264 8.204 1.00 51.95 C ATOM 935 O4* G A 43 72.595 52.976 7.736 1.00 49.20 O ATOM 936 C3* G A 43 72.190 54.541 9.414 1.00 53.09 C ATOM 937 O3* G A 43 72.143 55.927 9.696 1.00 58.30 O ATOM 938 C2* G A 43 70.852 53.982 8.963 1.00 50.69 C ATOM 939 O2* G A 43 70.277 54.842 8.008 1.00 55.19 O ATOM 940 C1* G A 43 71.295 52.730 8.219 1.00 46.94 C ATOM 941 N9 G A 43 71.350 51.546 9.071 1.00 44.87 N ATOM 942 C8 G A 43 72.451 50.993 9.658 1.00 41.57 C ATOM 943 N7 G A 43 72.182 49.898 10.321 1.00 41.08 N ATOM 944 C5 G A 43 70.826 49.729 10.161 1.00 40.57 C ATOM 945 C6 G A 43 69.965 48.713 10.615 1.00 40.78 C ATOM 946 O6 G A 43 70.230 47.688 11.221 1.00 38.22 O ATOM 947 N1 G A 43 68.652 48.964 
10.259 1.00 41.30 N ATOM 948 C2 G A 43 68.217 50.037 9.555 1.00 42.51 C ATOM 949 N2 G A 43 66.869 50.119 9.354 1.00 42.42 N ATOM 950 N3 G A 43 69.008 50.969 9.090 1.00 44.65 N ATOM 951 C4 G A 43 70.291 50.756 9.425 1.00 44.20 C ATOM 952 P A A 44 71.830 56.434 11.191 1.00 59.72 P ATOM 953 O1P A A 44 72.143 57.887 11.179 1.00 60.18 O ATOM 954 O2P A A 44 72.477 55.542 12.203 1.00 59.03 O ATOM 955 O5* A A 44 70.251 56.220 11.341 1.00 61.19 O ATOM 956 C5* A A 44 69.317 56.879 10.465 1.00 59.15 C ATOM 957 C4* A A 44 67.971 56.191 10.523 1.00 58.59 C ATOM 958 O4* A A 44 68.135 54.794 10.173 1.00 58.70 O ATOM 959 C3* A A 44 67.277 56.140 11.871 1.00 58.67 C ATOM 960 O3* A A 44 66.523 57.316 12.123 1.00 60.86 O ATOM 961 C2* A A 44 66.365 54.931 11.744 1.00 56.86 C ATOM 962 O2* A A 44 65.162 55.206 11.061 1.00 56.94 O ATOM 963 C1* A A 44 67.208 53.996 10.887 1.00 54.14 C ATOM 964 N9 A A 44 67.970 53.042 11.685 1.00 47.88 N ATOM 965 C8 A A 44 69.290 53.122 12.063 1.00 44.83 C ATOM 966 N7 A A 44 69.697 52.091 12.767 1.00 43.37 N ATOM 967 C5 A A 44 68.570 51.276 12.856 1.00 41.45 C ATOM 968 C6 A A 44 68.345 50.029 13.433 1.00 42.09 C ATOM 969 N6 A A 44 69.278 49.331 14.092 1.00 45.08 N ATOM 970 N1 A A 44 67.101 49.482 13.313 1.00 42.78 N ATOM 971 C2 A A 44 66.176 50.170 12.653 1.00 41.91 C ATOM 972 N3 A A 44 66.268 51.351 12.075 1.00 40.34 N ATOM 973 C4 A A 44 67.500 51.858 12.203 1.00 45.75 C ATOM 974 P G A 45 66.470 57.914 13.620 1.00 58.58 P ATOM 975 O1P G A 45 65.444 58.996 13.659 1.00 60.46 O ATOM 976 O2P G A 45 67.862 58.228 14.011 1.00 57.19 O ATOM 977 O5* G A 45 65.905 56.683 14.444 1.00 57.67 O ATOM 978 C5* G A 45 64.533 56.287 14.311 1.00 52.30 C ATOM 979 C4* G A 45 64.255 55.158 15.248 1.00 50.49 C ATOM 980 O4* G A 45 65.017 54.002 14.829 1.00 50.44 O ATOM 981 C3* G A 45 64.685 55.408 16.681 1.00 49.36 C ATOM 982 O3* G A 45 63.616 56.059 17.359 1.00 51.42 O ATOM 983 C2* G A 45 64.894 53.991 17.190 1.00 48.83 C ATOM 984 O2* G A 45 63.645 53.382 17.453 1.00 45.60 O 
ATOM 985 C1* G A 45 65.456 53.277 15.954 1.00 45.24 C ATOM 986 N9 G A 45 66.920 53.202 15.909 1.00 41.15 N ATOM 987 C8 G A 45 67.772 54.149 15.386 1.00 40.45 C ATOM 988 N7 G A 45 69.031 53.833 15.507 1.00 39.19 N ATOM 989 C5 G A 45 69.014 52.601 16.134 1.00 35.16 C ATOM 990 C6 G A 45 70.070 51.816 16.551 1.00 38.09 C ATOM 991 O6 G A 45 71.292 52.028 16.410 1.00 39.62 O ATOM 992 N1 G A 45 69.636 50.685 17.196 1.00 37.05 N ATOM 993 C2 G A 45 68.338 50.332 17.390 1.00 38.47 C ATOM 994 N2 G A 45 68.164 49.150 17.996 1.00 39.48 N ATOM 995 N3 G A 45 67.301 51.072 17.004 1.00 37.37 N ATOM 996 C4 G A 45 67.719 52.193 16.390 1.00 36.46 C HETATM 997 P 7MG A 46 63.905 57.326 18.310 1.00 53.01 P HETATM 998 O1P 7MG A 46 64.951 58.232 17.766 1.00 52.60 O HETATM 999 O2P 7MG A 46 62.558 57.883 18.619 1.00 54.84 O HETATM 1000 O5* 7MG A 46 64.457 56.673 19.662 1.00 50.92 O HETATM 1001 C5* 7MG A 46 63.673 55.697 20.380 1.00 47.09 C HETATM 1002 C4* 7MG A 46 63.911 55.833 21.859 1.00 45.87 C HETATM 1003 O4* 7MG A 46 65.276 55.388 22.160 1.00 44.37 O HETATM 1004 C3* 7MG A 46 63.810 57.269 22.378 1.00 44.64 C HETATM 1005 O3* 7MG A 46 63.177 57.259 23.652 1.00 47.38 O HETATM 1006 C2* 7MG A 46 65.280 57.700 22.484 1.00 44.78 C HETATM 1007 O2* 7MG A 46 65.577 58.744 23.387 1.00 43.73 O HETATM 1008 C1* 7MG A 46 65.949 56.384 22.900 1.00 39.69 C HETATM 1009 N9 7MG A 46 67.386 56.266 22.628 1.00 35.94 N HETATM 1010 C8 7MG A 46 68.201 57.146 21.945 1.00 35.57 C HETATM 1011 N7 7MG A 46 69.429 56.680 21.823 1.00 35.71 N HETATM 1012 C5 7MG A 46 69.423 55.456 22.475 1.00 30.85 C HETATM 1013 C6 7MG A 46 70.472 54.529 22.704 1.00 32.44 C HETATM 1014 O6 7MG A 46 71.653 54.665 22.404 1.00 34.18 O HETATM 1015 N1 7MG A 46 70.035 53.405 23.371 1.00 27.06 N HETATM 1016 C2 7MG A 46 68.726 53.216 23.821 1.00 28.72 C HETATM 1017 N2 7MG A 46 68.413 52.076 24.386 1.00 30.51 N HETATM 1018 N3 7MG A 46 67.782 54.113 23.692 1.00 29.24 N HETATM 1019 C4 7MG A 46 68.183 55.188 22.989 1.00 30.11 C HETATM 1020 CM7 7MG A 
46 70.529 57.362 21.130 1.00 35.54 C ATOM 1021 P U A 47 61.607 57.698 23.777 1.00 47.19 P ATOM 1022 O1P U A 47 60.804 56.464 23.970 1.00 48.54 O ATOM 1023 O2P U A 47 61.264 58.639 22.661 1.00 50.14 O ATOM 1024 O5* U A 47 61.587 58.598 25.089 1.00 45.96 O ATOM 1025 C5* U A 47 62.341 59.800 25.124 1.00 48.85 C ATOM 1026 C4* U A 47 62.482 60.255 26.545 1.00 50.43 C ATOM 1027 O4* U A 47 61.241 60.814 27.040 1.00 50.06 O ATOM 1028 C3* U A 47 62.833 59.116 27.493 1.00 49.46 C ATOM 1029 O3* U A 47 63.658 59.705 28.465 1.00 50.38 O ATOM 1030 C2* U A 47 61.488 58.710 28.088 1.00 48.84 C ATOM 1031 O2* U A 47 61.620 58.065 29.342 1.00 41.07 O ATOM 1032 C1* U A 47 60.798 60.075 28.167 1.00 50.55 C ATOM 1033 N1 U A 47 59.332 60.087 28.149 1.00 54.98 N ATOM 1034 C2 U A 47 58.686 60.797 29.157 1.00 56.81 C ATOM 1035 O2 U A 47 59.281 61.400 30.035 1.00 56.24 O ATOM 1036 N3 U A 47 57.319 60.784 29.079 1.00 58.14 N ATOM 1037 C4 U A 47 56.552 60.165 28.123 1.00 58.07 C ATOM 1038 O4 U A 47 55.332 60.325 28.140 1.00 59.44 O ATOM 1039 C5 U A 47 57.289 59.452 27.121 1.00 57.49 C ATOM 1040 C6 U A 47 58.613 59.436 27.164 1.00 55.97 C ATOM 1041 P C A 48 65.235 59.648 28.260 1.00 51.91 P ATOM 1042 O1P C A 48 65.823 60.627 29.208 1.00 48.27 O ATOM 1043 O2P C A 48 65.455 59.769 26.780 1.00 53.23 O ATOM 1044 O5* C A 48 65.585 58.155 28.696 1.00 48.16 O ATOM 1045 C5* C A 48 65.553 57.821 30.068 1.00 37.25 C ATOM 1046 C4* C A 48 66.046 56.424 30.251 1.00 33.96 C ATOM 1047 O4* C A 48 67.242 56.203 29.430 1.00 33.29 O ATOM 1048 C3* C A 48 66.457 56.168 31.682 1.00 32.27 C ATOM 1049 O3* C A 48 66.149 54.821 32.011 1.00 29.89 O ATOM 1050 C2* C A 48 67.972 56.366 31.634 1.00 27.98 C ATOM 1051 O2* C A 48 68.646 55.696 32.665 1.00 27.27 O ATOM 1052 C1* C A 48 68.291 55.799 30.245 1.00 27.98 C ATOM 1053 N1 C A 48 69.554 56.279 29.660 1.00 23.61 N ATOM 1054 C2 C A 48 70.690 55.421 29.693 1.00 24.90 C ATOM 1055 O2 C A 48 70.567 54.326 30.181 1.00 24.38 O ATOM 1056 N3 C A 48 71.884 55.875 29.215 1.00 23.05 
N ATOM 1057 C4 C A 48 71.964 57.127 28.722 1.00 24.65 C ATOM 1058 N4 C A 48 73.163 57.592 28.288 1.00 21.79 N ATOM 1059 C5 C A 48 70.843 57.978 28.653 1.00 27.10 C ATOM 1060 C6 C A 48 69.665 57.518 29.131 1.00 23.05 C HETATM 1061 P 5MC A 49 65.638 54.464 33.461 1.00 31.61 P HETATM 1062 O1P 5MC A 49 65.643 52.989 33.540 1.00 33.43 O HETATM 1063 O2P 5MC A 49 66.252 55.251 34.624 1.00 26.54 O HETATM 1064 O5* 5MC A 49 64.126 54.972 33.441 1.00 31.88 O HETATM 1065 C5* 5MC A 49 63.204 54.605 32.392 1.00 32.68 C HETATM 1066 C4* 5MC A 49 61.796 55.006 32.810 1.00 30.80 C HETATM 1067 O4* 5MC A 49 61.292 54.110 33.848 1.00 32.40 O HETATM 1068 C3* 5MC A 49 61.745 56.381 33.468 1.00 31.44 C HETATM 1069 O3* 5MC A 49 61.666 57.398 32.483 1.00 31.25 O HETATM 1070 C2* 5MC A 49 60.485 56.287 34.312 1.00 34.62 C HETATM 1071 O2* 5MC A 49 59.414 56.469 33.417 1.00 35.41 O HETATM 1072 C1* 5MC A 49 60.568 54.849 34.825 1.00 33.10 C HETATM 1073 N1 5MC A 49 61.331 54.787 36.102 1.00 32.34 N HETATM 1074 C2 5MC A 49 60.737 55.340 37.271 1.00 32.42 C HETATM 1075 O2 5MC A 49 59.568 55.832 37.197 1.00 28.24 O HETATM 1076 N3 5MC A 49 61.420 55.348 38.428 1.00 29.50 N HETATM 1077 C4 5MC A 49 62.626 54.785 38.503 1.00 32.97 C HETATM 1078 N4 5MC A 49 63.188 54.744 39.702 1.00 28.75 N HETATM 1079 C5 5MC A 49 63.270 54.215 37.358 1.00 34.81 C HETATM 1080 C6 5MC A 49 62.583 54.243 36.161 1.00 33.69 C HETATM 1081 CM5 5MC A 49 64.655 53.581 37.465 1.00 34.52 C ATOM 1082 P U A 50 62.421 58.828 32.744 1.00 33.43 P ATOM 1083 O1P U A 50 62.480 59.502 31.468 1.00 33.78 O ATOM 1084 O2P U A 50 63.651 58.683 33.637 1.00 30.15 O ATOM 1085 O5* U A 50 61.374 59.605 33.659 1.00 33.34 O ATOM 1086 C5* U A 50 60.011 59.728 33.268 1.00 34.47 C ATOM 1087 C4* U A 50 59.218 60.444 34.331 1.00 31.72 C ATOM 1088 O4* U A 50 58.928 59.552 35.438 1.00 29.81 O ATOM 1089 C3* U A 50 59.871 61.655 34.983 1.00 32.89 C ATOM 1090 O3* U A 50 59.731 62.817 34.131 1.00 33.31 O ATOM 1091 C2* U A 50 59.066 61.725 36.279 1.00 33.15 C ATOM 
1092 O2* U A 50 57.727 62.115 35.997 1.00 35.03 O ATOM 1093 C1* U A 50 58.966 60.259 36.646 1.00 31.45 C ATOM 1094 N1 U A 50 60.113 59.812 37.453 1.00 31.52 N ATOM 1095 C2 U A 50 60.106 60.183 38.793 1.00 30.82 C ATOM 1096 O2 U A 50 59.256 60.941 39.253 1.00 30.94 O ATOM 1097 N3 U A 50 61.128 59.648 39.560 1.00 28.21 N ATOM 1098 C4 U A 50 62.143 58.831 39.099 1.00 27.57 C ATOM 1099 O4 U A 50 62.938 58.308 39.924 1.00 33.33 O ATOM 1100 C5 U A 50 62.120 58.581 37.684 1.00 29.31 C ATOM 1101 C6 U A 50 61.140 59.068 36.927 1.00 31.19 C ATOM 1102 P G A 51 60.854 63.992 34.167 1.00 39.01 P ATOM 1103 O1P G A 51 60.476 64.987 33.121 1.00 35.87 O ATOM 1104 O2P G A 51 62.217 63.402 34.134 1.00 37.91 O ATOM 1105 O5* G A 51 60.648 64.595 35.600 1.00 36.58 O ATOM 1106 C5* G A 51 59.391 65.231 35.906 1.00 40.92 C ATOM 1107 C4* G A 51 59.365 65.668 37.334 1.00 38.83 C ATOM 1108 O4* G A 51 59.409 64.501 38.203 1.00 38.03 O ATOM 1109 C3* G A 51 60.529 66.527 37.797 1.00 40.34 C ATOM 1110 O3* G A 51 60.383 67.887 37.400 1.00 41.89 O ATOM 1111 C2* G A 51 60.488 66.292 39.298 1.00 38.39 C ATOM 1112 O2* G A 51 59.369 66.938 39.873 1.00 35.78 O ATOM 1113 C1* G A 51 60.143 64.799 39.378 1.00 37.87 C ATOM 1114 N9 G A 51 61.303 63.902 39.460 1.00 32.83 N ATOM 1115 C8 G A 51 61.915 63.238 38.424 1.00 33.29 C ATOM 1116 N7 G A 51 62.917 62.483 38.830 1.00 36.76 N ATOM 1117 C5 G A 51 62.986 62.690 40.208 1.00 35.57 C ATOM 1118 C6 G A 51 63.913 62.188 41.197 1.00 34.80 C ATOM 1119 O6 G A 51 64.846 61.404 41.049 1.00 35.08 O ATOM 1120 N1 G A 51 63.653 62.706 42.466 1.00 36.79 N ATOM 1121 C2 G A 51 62.620 63.573 42.759 1.00 37.65 C ATOM 1122 N2 G A 51 62.530 63.985 44.032 1.00 35.39 N ATOM 1123 N3 G A 51 61.746 64.007 41.868 1.00 36.74 N ATOM 1124 C4 G A 51 61.996 63.548 40.619 1.00 35.55 C ATOM 1125 P U A 52 61.706 68.803 37.121 1.00 42.84 P ATOM 1126 O1P U A 52 61.321 70.142 36.563 1.00 41.91 O ATOM 1127 O2P U A 52 62.726 68.025 36.369 1.00 41.76 O ATOM 1128 O5* U A 52 62.224 68.978 38.598 1.00 
36.07 O ATOM 1129 C5* U A 52 61.477 69.672 39.558 1.00 34.35 C ATOM 1130 C4* U A 52 62.118 69.559 40.905 1.00 33.37 C ATOM 1131 O4* U A 52 62.012 68.190 41.401 1.00 32.65 O ATOM 1132 C3* U A 52 63.616 69.845 41.000 1.00 33.39 C ATOM 1133 O3* U A 52 63.932 71.237 41.043 1.00 39.74 O ATOM 1134 C2* U A 52 63.939 69.176 42.335 1.00 30.75 C ATOM 1135 O2* U A 52 63.396 69.967 43.391 1.00 32.96 O ATOM 1136 C1* U A 52 63.119 67.888 42.237 1.00 31.31 C ATOM 1137 N1 U A 52 63.946 66.845 41.579 1.00 30.44 N ATOM 1138 C2 U A 52 64.863 66.182 42.369 1.00 30.06 C ATOM 1139 O2 U A 52 65.009 66.454 43.550 1.00 29.26 O ATOM 1140 N3 U A 52 65.623 65.231 41.729 1.00 29.70 N ATOM 1141 C4 U A 52 65.587 64.922 40.408 1.00 34.08 C ATOM 1142 O4 U A 52 66.347 64.036 39.967 1.00 34.26 O ATOM 1143 C5 U A 52 64.622 65.680 39.641 1.00 33.07 C ATOM 1144 C6 U A 52 63.831 66.587 40.266 1.00 26.53 C ATOM 1145 P G A 53 65.414 71.746 40.572 1.00 37.07 P ATOM 1146 O1P G A 53 65.308 73.211 40.502 1.00 37.71 O ATOM 1147 O2P G A 53 65.813 70.998 39.374 1.00 30.94 O ATOM 1148 O5* G A 53 66.393 71.211 41.694 1.00 32.28 O ATOM 1149 C5* G A 53 66.229 71.538 43.081 1.00 34.99 C ATOM 1150 C4* G A 53 67.174 70.724 43.912 1.00 34.42 C ATOM 1151 O4* G A 53 66.864 69.323 43.765 1.00 35.23 O ATOM 1152 C3* G A 53 68.665 70.782 43.620 1.00 34.67 C ATOM 1153 O3* G A 53 69.247 71.895 44.262 1.00 38.95 O ATOM 1154 C2* G A 53 69.164 69.500 44.283 1.00 34.61 C ATOM 1155 O2* G A 53 69.182 69.612 45.688 1.00 34.19 O ATOM 1156 C1* G A 53 68.019 68.537 43.991 1.00 32.96 C ATOM 1157 N9 G A 53 68.320 67.761 42.786 1.00 28.24 N ATOM 1158 C8 G A 53 67.793 67.893 41.556 1.00 31.68 C ATOM 1159 N7 G A 53 68.284 67.042 40.701 1.00 26.28 N ATOM 1160 C5 G A 53 69.196 66.319 41.426 1.00 28.92 C ATOM 1161 C6 G A 53 70.091 65.297 41.013 1.00 25.46 C ATOM 1162 O6 G A 53 70.220 64.815 39.895 1.00 28.07 O ATOM 1163 N1 G A 53 70.897 64.869 42.059 1.00 29.47 N ATOM 1164 C2 G A 53 70.866 65.396 43.328 1.00 26.11 C ATOM 1165 N2 G A 53 71.750 
64.920 44.203 1.00 30.39 N ATOM 1166 N3 G A 53 70.033 66.332 43.705 1.00 30.28 N ATOM 1167 C4 G A 53 69.234 66.743 42.708 1.00 26.13 C HETATM 1168 N1 5MU A 54 73.251 67.803 42.694 1.00 35.32 N HETATM 1169 C2 5MU A 54 73.839 66.792 41.983 1.00 34.56 C HETATM 1170 N3 5MU A 54 73.442 66.712 40.672 1.00 35.04 N HETATM 1171 C4 5MU A 54 72.524 67.496 40.026 1.00 33.29 C HETATM 1172 C5 5MU A 54 71.922 68.529 40.835 1.00 31.37 C HETATM 1173 C5M 5MU A 54 70.918 69.466 40.203 1.00 28.02 C HETATM 1174 C6 5MU A 54 72.301 68.618 42.113 1.00 33.12 C HETATM 1175 O2 5MU A 54 74.649 66.016 42.473 1.00 42.24 O HETATM 1176 O4 5MU A 54 72.257 67.283 38.828 1.00 36.52 O HETATM 1177 C1* 5MU A 54 73.711 68.026 44.082 1.00 37.46 C HETATM 1178 C2* 5MU A 54 75.024 68.864 44.109 1.00 40.51 C HETATM 1179 O2* 5MU A 54 75.836 68.430 45.180 1.00 39.54 O HETATM 1180 C3* 5MU A 54 74.485 70.281 44.321 1.00 42.37 C HETATM 1181 C4* 5MU A 54 73.295 70.030 45.246 1.00 39.47 C HETATM 1182 O3* 5MU A 54 75.448 71.209 44.879 1.00 44.31 O HETATM 1183 O4* 5MU A 54 72.728 68.779 44.751 1.00 36.92 O HETATM 1184 C5* 5MU A 54 72.225 71.084 45.312 1.00 38.24 C HETATM 1185 O5* 5MU A 54 71.693 71.314 44.026 1.00 36.39 O HETATM 1186 P 5MU A 54 70.668 72.485 43.743 1.00 37.08 P HETATM 1187 O1P 5MU A 54 70.657 72.754 42.270 1.00 33.03 O HETATM 1188 O2P 5MU A 54 70.866 73.636 44.695 1.00 39.97 O HETATM 1189 N1 PSU A 55 74.158 70.927 39.519 1.00 35.82 N HETATM 1190 C2 PSU A 55 73.717 70.455 38.323 1.00 38.30 C HETATM 1191 N3 PSU A 55 74.479 69.441 37.783 1.00 35.34 N HETATM 1192 C4 PSU A 55 75.687 68.934 38.291 1.00 36.07 C HETATM 1193 C5 PSU A 55 76.107 69.537 39.499 1.00 33.56 C HETATM 1194 C6 PSU A 55 75.337 70.458 40.076 1.00 35.52 C HETATM 1195 O2 PSU A 55 72.728 70.924 37.738 1.00 37.13 O HETATM 1196 O4 PSU A 55 76.304 68.043 37.675 1.00 32.17 O HETATM 1197 C1* PSU A 55 77.461 69.118 40.100 1.00 34.24 C HETATM 1198 C2* PSU A 55 78.634 70.014 39.665 1.00 37.02 C HETATM 1199 O2* PSU A 55 79.793 69.181 39.668 1.00 
37.99 O HETATM 1200 C3* PSU A 55 78.650 71.033 40.796 1.00 38.47 C HETATM 1201 C4* PSU A 55 78.398 70.137 41.999 1.00 35.54 C HETATM 1202 O3* PSU A 55 79.864 71.807 40.930 1.00 37.81 O HETATM 1203 O4* PSU A 55 77.424 69.161 41.505 1.00 34.94 O HETATM 1204 C5* PSU A 55 77.870 70.849 43.223 1.00 36.16 C HETATM 1205 O5* PSU A 55 76.820 71.796 42.859 1.00 39.91 O HETATM 1206 P PSU A 55 75.925 72.471 43.991 1.00 43.83 P HETATM 1207 O1P PSU A 55 74.766 73.172 43.306 1.00 46.32 O HETATM 1208 O2P PSU A 55 76.844 73.293 44.834 1.00 45.38 O ATOM 1209 P C A 56 79.893 73.323 40.427 1.00 37.81 P ATOM 1210 O1P C A 56 81.372 73.714 40.748 1.00 37.74 O ATOM 1211 O2P C A 56 78.768 74.121 41.003 1.00 35.29 O ATOM 1212 O5* C A 56 79.825 73.304 38.871 1.00 33.59 O ATOM 1213 C5* C A 56 79.545 74.494 38.135 1.00 33.82 C ATOM 1214 C4* C A 56 79.735 74.239 36.678 1.00 32.83 C ATOM 1215 O4* C A 56 81.120 73.946 36.360 1.00 33.52 O ATOM 1216 C3* C A 56 78.954 73.036 36.142 1.00 30.99 C ATOM 1217 O3* C A 56 77.596 73.405 35.829 1.00 31.77 O ATOM 1218 C2* C A 56 79.739 72.659 34.901 1.00 31.94 C ATOM 1219 O2* C A 56 79.480 73.400 33.725 1.00 29.83 O ATOM 1220 C1* C A 56 81.174 72.955 35.333 1.00 33.49 C ATOM 1221 N1 C A 56 81.800 71.739 35.857 1.00 29.51 N ATOM 1222 C2 C A 56 82.298 70.818 34.956 1.00 28.07 C ATOM 1223 O2 C A 56 82.160 71.013 33.755 1.00 29.82 O ATOM 1224 N3 C A 56 82.907 69.716 35.412 1.00 27.72 N ATOM 1225 C4 C A 56 83.035 69.503 36.720 1.00 32.68 C ATOM 1226 N4 C A 56 83.675 68.399 37.102 1.00 29.93 N ATOM 1227 C5 C A 56 82.515 70.397 37.674 1.00 30.58 C ATOM 1228 C6 C A 56 81.894 71.503 37.204 1.00 29.09 C ATOM 1229 P G A 57 76.379 72.458 36.349 1.00 32.12 P ATOM 1230 O1P G A 57 75.189 73.231 35.952 1.00 34.27 O ATOM 1231 O2P G A 57 76.558 72.064 37.745 1.00 22.22 O ATOM 1232 O5* G A 57 76.529 71.173 35.451 1.00 28.84 O ATOM 1233 C5* G A 57 76.321 71.233 34.052 1.00 30.38 C ATOM 1234 C4* G A 57 76.829 69.964 33.385 1.00 32.75 C ATOM 1235 O4* G A 57 78.284 69.786 33.566 
1.00 34.57 O ATOM 1236 C3* G A 57 76.221 68.672 33.883 1.00 29.11 C ATOM 1237 O3* G A 57 74.937 68.457 33.239 1.00 27.73 O ATOM 1238 C2* G A 57 77.251 67.650 33.423 1.00 28.35 C ATOM 1239 O2* G A 57 77.136 67.492 32.013 1.00 34.71 O ATOM 1240 C1* G A 57 78.571 68.404 33.655 1.00 32.37 C ATOM 1241 N9 G A 57 79.153 68.106 34.968 1.00 26.80 N ATOM 1242 C8 G A 57 79.113 68.840 36.137 1.00 26.67 C ATOM 1243 N7 G A 57 79.792 68.266 37.110 1.00 27.09 N ATOM 1244 C5 G A 57 80.280 67.087 36.529 1.00 28.66 C ATOM 1245 C6 G A 57 81.103 66.037 37.080 1.00 28.50 C ATOM 1246 O6 G A 57 81.588 65.974 38.203 1.00 28.04 O ATOM 1247 N1 G A 57 81.358 65.033 36.153 1.00 27.23 N ATOM 1248 C2 G A 57 80.910 65.036 34.858 1.00 27.75 C ATOM 1249 N2 G A 57 81.215 63.982 34.112 1.00 25.22 N ATOM 1250 N3 G A 57 80.185 66.017 34.328 1.00 27.62 N ATOM 1251 C4 G A 57 79.901 66.989 35.227 1.00 25.05 C HETATM 1252 P 1MA A 58 73.770 67.765 34.057 1.00 30.65 P HETATM 1253 O1P 1MA A 58 73.621 68.229 35.450 1.00 29.49 O HETATM 1254 O2P 1MA A 58 72.638 67.886 33.105 1.00 32.84 O HETATM 1255 O5* 1MA A 58 74.315 66.273 34.254 1.00 28.81 O HETATM 1256 C5* 1MA A 58 74.592 65.439 33.080 1.00 29.42 C HETATM 1257 C4* 1MA A 58 74.279 63.972 33.383 1.00 33.42 C HETATM 1258 O4* 1MA A 58 74.880 63.685 34.667 1.00 32.36 O HETATM 1259 C3* 1MA A 58 72.789 63.573 33.509 1.00 35.13 C HETATM 1260 O3* 1MA A 58 72.625 62.168 33.250 1.00 36.80 O HETATM 1261 C2* 1MA A 58 72.560 63.667 35.012 1.00 34.80 C HETATM 1262 O2* 1MA A 58 71.525 62.828 35.506 1.00 36.27 O HETATM 1263 C1* 1MA A 58 73.908 63.150 35.551 1.00 33.62 C HETATM 1264 N9 1MA A 58 74.284 63.494 36.930 1.00 30.36 N HETATM 1265 C8 1MA A 58 73.887 64.574 37.688 1.00 34.55 C HETATM 1266 N7 1MA A 58 74.415 64.610 38.899 1.00 33.32 N HETATM 1267 C5 1MA A 58 75.204 63.469 38.953 1.00 33.37 C HETATM 1268 C6 1MA A 58 76.031 62.941 39.948 1.00 33.58 C HETATM 1269 N6 1MA A 58 76.184 63.488 41.134 1.00 41.19 N HETATM 1270 N1 1MA A 58 76.708 61.803 39.669 1.00 34.48 N 
HETATM 1271 CM1 1MA A 58 77.649 61.222 40.626 1.00 31.43 C HETATM 1272 C2 1MA A 58 76.527 61.216 38.479 1.00 28.43 C HETATM 1273 N3 1MA A 58 75.793 61.624 37.453 1.00 31.67 N HETATM 1274 C4 1MA A 58 75.142 62.771 37.747 1.00 33.02 C ATOM 1275 P U A 59 72.617 61.530 31.733 1.00 41.00 P ATOM 1276 O1P U A 59 73.971 61.410 31.109 1.00 33.02 O ATOM 1277 O2P U A 59 71.557 62.222 31.005 1.00 41.76 O ATOM 1278 O5* U A 59 72.130 60.048 31.994 1.00 39.14 O ATOM 1279 C5* U A 59 70.719 59.794 32.346 1.00 35.32 C ATOM 1280 C4* U A 59 70.618 58.472 33.026 1.00 29.87 C ATOM 1281 O4* U A 59 71.242 57.502 32.161 1.00 30.33 O ATOM 1282 C3* U A 59 71.352 58.357 34.366 1.00 30.46 C ATOM 1283 O3* U A 59 70.546 58.855 35.497 1.00 31.17 O ATOM 1284 C2* U A 59 71.629 56.872 34.435 1.00 34.51 C ATOM 1285 O2* U A 59 70.529 56.200 35.034 1.00 33.82 O ATOM 1286 C1* U A 59 71.808 56.483 32.934 1.00 34.05 C ATOM 1287 N1 U A 59 73.191 56.244 32.453 1.00 30.28 N ATOM 1288 C2 U A 59 73.828 55.055 32.883 1.00 33.87 C ATOM 1289 O2 U A 59 73.232 54.201 33.510 1.00 40.51 O ATOM 1290 N3 U A 59 75.160 54.920 32.534 1.00 31.37 N ATOM 1291 C4 U A 59 75.886 55.811 31.767 1.00 33.85 C ATOM 1292 O4 U A 59 76.986 55.468 31.336 1.00 34.46 O ATOM 1293 C5 U A 59 75.128 57.004 31.315 1.00 30.62 C ATOM 1294 C6 U A 59 73.842 57.148 31.666 1.00 24.94 C ATOM 1295 P C A 60 71.267 59.539 36.803 1.00 33.56 P ATOM 1296 O1P C A 60 70.297 59.837 37.820 1.00 32.77 O ATOM 1297 O2P C A 60 72.180 60.638 36.294 1.00 34.80 O ATOM 1298 O5* C A 60 72.351 58.459 37.303 1.00 29.47 O ATOM 1299 C5* C A 60 71.983 57.223 37.962 1.00 31.76 C ATOM 1300 C4* C A 60 73.252 56.436 38.285 1.00 32.79 C ATOM 1301 O4* C A 60 74.024 56.360 37.069 1.00 34.84 O ATOM 1302 C3* C A 60 74.192 57.080 39.322 1.00 35.07 C ATOM 1303 O3* C A 60 74.832 56.043 40.028 1.00 38.94 O ATOM 1304 C2* C A 60 75.220 57.789 38.455 1.00 34.97 C ATOM 1305 O2* C A 60 76.489 58.039 39.019 1.00 36.28 O ATOM 1306 C1* C A 60 75.336 56.786 37.306 1.00 32.64 C ATOM 1307 N1 C A 
60 75.849 57.370 36.090 1.00 32.74 N ATOM 1308 C2 C A 60 76.958 56.775 35.475 1.00 33.30 C ATOM 1309 O2 C A 60 77.484 55.783 36.004 1.00 37.02 O ATOM 1310 N3 C A 60 77.445 57.308 34.368 1.00 27.75 N ATOM 1311 C4 C A 60 76.919 58.410 33.866 1.00 31.44 C ATOM 1312 N4 C A 60 77.491 58.943 32.854 1.00 31.16 N ATOM 1313 C5 C A 60 75.767 59.040 34.441 1.00 31.34 C ATOM 1314 C6 C A 60 75.274 58.489 35.549 1.00 34.59 C ATOM 1315 P C A 61 74.243 55.584 41.447 1.00 42.61 P ATOM 1316 O1P C A 61 75.151 54.490 41.830 1.00 44.00 O ATOM 1317 O2P C A 61 72.781 55.398 41.440 1.00 37.71 O ATOM 1318 O5* C A 61 74.477 56.853 42.390 1.00 39.76 O ATOM 1319 C5* C A 61 75.717 57.059 43.037 1.00 43.07 C ATOM 1320 C4* C A 61 75.643 58.316 43.840 1.00 39.59 C ATOM 1321 O4* C A 61 75.807 59.456 42.959 1.00 37.14 O ATOM 1322 C3* C A 61 74.297 58.531 44.524 1.00 38.59 C ATOM 1323 O3* C A 61 74.209 57.799 45.784 1.00 39.28 O ATOM 1324 C2* C A 61 74.292 60.040 44.702 1.00 37.97 C ATOM 1325 O2* C A 61 75.174 60.397 45.747 1.00 38.59 O ATOM 1326 C1* C A 61 74.974 60.515 43.421 1.00 36.12 C ATOM 1327 N1 C A 61 74.069 60.960 42.325 1.00 33.98 N ATOM 1328 C2 C A 61 73.361 62.180 42.471 1.00 33.04 C ATOM 1329 O2 C A 61 73.490 62.839 43.524 1.00 35.58 O ATOM 1330 N3 C A 61 72.551 62.600 41.455 1.00 35.45 N ATOM 1331 C4 C A 61 72.421 61.838 40.349 1.00 35.34 C ATOM 1332 N4 C A 61 71.550 62.237 39.373 1.00 32.83 N ATOM 1333 C5 C A 61 73.147 60.623 40.187 1.00 35.42 C ATOM 1334 C6 C A 61 73.938 60.221 41.188 1.00 31.03 C ATOM 1335 P A A 62 72.777 57.319 46.334 1.00 42.33 P ATOM 1336 O1P A A 62 73.023 56.390 47.455 1.00 41.97 O ATOM 1337 O2P A A 62 71.960 56.820 45.161 1.00 42.66 O ATOM 1338 O5* A A 62 72.104 58.670 46.874 1.00 43.04 O ATOM 1339 C5* A A 62 72.719 59.460 47.936 1.00 39.06 C ATOM 1340 C4* A A 62 71.882 60.684 48.244 1.00 39.69 C ATOM 1341 O4* A A 62 71.968 61.660 47.165 1.00 39.39 O ATOM 1342 C3* A A 62 70.391 60.442 48.405 1.00 40.80 C ATOM 1343 O3* A A 62 70.103 59.945 49.700 1.00 43.31 O 
ATOM 1344 C2* A A 62 69.808 61.827 48.120 1.00 41.10 C ATOM 1345 O2* A A 62 70.016 62.793 49.130 1.00 43.43 O ATOM 1346 C1* A A 62 70.711 62.312 46.995 1.00 38.85 C ATOM 1347 N9 A A 62 70.182 62.021 45.654 1.00 32.58 N ATOM 1348 C8 A A 62 70.560 61.004 44.827 1.00 33.66 C ATOM 1349 N7 A A 62 70.002 61.038 43.650 1.00 32.82 N ATOM 1350 C5 A A 62 69.158 62.140 43.707 1.00 31.78 C ATOM 1351 C6 A A 62 68.304 62.715 42.759 1.00 29.57 C ATOM 1352 N6 A A 62 68.170 62.255 41.478 1.00 29.13 N ATOM 1353 N1 A A 62 67.590 63.788 43.135 1.00 31.96 N ATOM 1354 C2 A A 62 67.754 64.268 44.373 1.00 33.29 C ATOM 1355 N3 A A 62 68.548 63.835 45.349 1.00 32.89 N ATOM 1356 C4 A A 62 69.239 62.747 44.945 1.00 35.51 C ATOM 1357 P C A 63 68.913 58.881 49.918 1.00 43.35 P ATOM 1358 O1P C A 63 69.009 58.532 51.351 1.00 49.74 O ATOM 1359 O2P C A 63 68.970 57.790 48.905 1.00 46.77 O ATOM 1360 O5* C A 63 67.599 59.762 49.769 1.00 45.50 O ATOM 1361 C5* C A 63 67.371 60.836 50.703 1.00 44.28 C ATOM 1362 C4* C A 63 66.263 61.750 50.244 1.00 44.77 C ATOM 1363 O4* C A 63 66.669 62.512 49.079 1.00 41.48 O ATOM 1364 C3* C A 63 64.958 61.089 49.850 1.00 46.16 C ATOM 1365 O3* C A 63 64.160 60.771 51.005 1.00 51.01 O ATOM 1366 C2* C A 63 64.325 62.154 48.972 1.00 43.55 C ATOM 1367 O2* C A 63 63.764 63.209 49.763 1.00 48.54 O ATOM 1368 C1* C A 63 65.541 62.713 48.233 1.00 39.95 C ATOM 1369 N1 C A 63 65.788 62.041 46.918 1.00 35.72 N ATOM 1370 C2 C A 63 65.155 62.553 45.793 1.00 32.68 C ATOM 1371 O2 C A 63 64.440 63.541 45.950 1.00 36.07 O ATOM 1372 N3 C A 63 65.328 61.946 44.571 1.00 34.10 N ATOM 1373 C4 C A 63 66.113 60.839 44.485 1.00 31.53 C ATOM 1374 N4 C A 63 66.253 60.229 43.282 1.00 34.07 N ATOM 1375 C5 C A 63 66.783 60.308 45.617 1.00 35.81 C ATOM 1376 C6 C A 63 66.603 60.941 46.809 1.00 36.37 C ATOM 1377 P A A 64 63.129 59.542 50.943 1.00 57.75 P ATOM 1378 O1P A A 64 62.528 59.402 52.300 1.00 58.22 O ATOM 1379 O2P A A 64 63.807 58.359 50.338 1.00 54.03 O ATOM 1380 O5* A A 64 62.008 60.100 49.971 
1.00 54.48 O ATOM 1381 C5* A A 64 61.194 61.198 50.398 1.00 56.10 C ATOM 1382 C4* A A 64 60.204 61.569 49.339 1.00 54.58 C ATOM 1383 O4* A A 64 60.917 62.255 48.276 1.00 53.92 O ATOM 1384 C3* A A 64 59.459 60.433 48.632 1.00 56.48 C ATOM 1385 O3* A A 64 58.374 59.780 49.384 1.00 56.71 O ATOM 1386 C2* A A 64 59.036 61.152 47.349 1.00 53.00 C ATOM 1387 O2* A A 64 57.994 62.076 47.565 1.00 56.42 O ATOM 1388 C1* A A 64 60.283 61.983 47.026 1.00 51.87 C ATOM 1389 N9 A A 64 61.176 61.174 46.178 1.00 45.48 N ATOM 1390 C8 A A 64 62.220 60.374 46.567 1.00 42.54 C ATOM 1391 N7 A A 64 62.766 59.702 45.585 1.00 42.30 N ATOM 1392 C5 A A 64 62.051 60.122 44.466 1.00 40.22 C ATOM 1393 C6 A A 64 62.152 59.783 43.119 1.00 36.50 C ATOM 1394 N6 A A 64 63.082 58.963 42.657 1.00 34.41 N ATOM 1395 N1 A A 64 61.261 60.326 42.262 1.00 37.69 N ATOM 1396 C2 A A 64 60.346 61.178 42.745 1.00 36.62 C ATOM 1397 N3 A A 64 60.169 61.600 43.991 1.00 36.71 N ATOM 1398 C4 A A 64 61.070 61.018 44.815 1.00 39.74 C ATOM 1399 P G A 65 58.068 58.186 49.152 1.00 58.45 P ATOM 1400 O1P G A 65 57.104 57.682 50.155 1.00 62.97 O ATOM 1401 O2P G A 65 59.328 57.423 48.941 1.00 61.74 O ATOM 1402 O5* G A 65 57.302 58.169 47.766 1.00 57.06 O ATOM 1403 C5* G A 65 56.297 59.121 47.518 1.00 49.45 C ATOM 1404 C4* G A 65 55.988 59.170 46.060 1.00 45.91 C ATOM 1405 O4* G A 65 57.113 59.757 45.358 1.00 43.81 O ATOM 1406 C3* G A 65 55.711 57.868 45.316 1.00 45.00 C ATOM 1407 O3* G A 65 54.366 57.370 45.509 1.00 47.48 O ATOM 1408 C2* G A 65 55.920 58.322 43.886 1.00 41.79 C ATOM 1409 O2* G A 65 54.776 59.084 43.491 1.00 39.93 O ATOM 1410 C1* G A 65 57.142 59.262 44.032 1.00 42.23 C ATOM 1411 N9 G A 65 58.373 58.489 43.875 1.00 39.23 N ATOM 1412 C8 G A 65 59.245 58.086 44.861 1.00 37.65 C ATOM 1413 N7 G A 65 60.189 57.305 44.420 1.00 35.71 N ATOM 1414 C5 G A 65 59.942 57.213 43.050 1.00 38.74 C ATOM 1415 C6 G A 65 60.647 56.509 42.009 1.00 36.19 C ATOM 1416 O6 G A 65 61.690 55.776 42.101 1.00 38.10 O ATOM 1417 N1 G A 65 
60.040 56.710 40.766 1.00 37.09 N ATOM 1418 C2 G A 65 58.920 57.482 40.550 1.00 37.20 C ATOM 1419 N2 G A 65 58.460 57.534 39.290 1.00 36.90 N ATOM 1420 N3 G A 65 58.288 58.152 41.498 1.00 33.44 N ATOM 1421 C4 G A 65 58.838 57.965 42.703 1.00 36.71 C ATOM 1422 P A A 66 54.074 55.782 45.410 1.00 48.51 P ATOM 1423 O1P A A 66 52.701 55.474 45.862 1.00 50.89 O ATOM 1424 O2P A A 66 55.214 55.033 46.008 1.00 48.73 O ATOM 1425 O5* A A 66 54.173 55.469 43.863 1.00 45.90 O ATOM 1426 C5* A A 66 53.339 56.143 42.940 1.00 46.19 C ATOM 1427 C4* A A 66 53.671 55.677 41.542 1.00 46.27 C ATOM 1428 O4* A A 66 55.037 56.058 41.206 1.00 44.63 O ATOM 1429 C3* A A 66 53.666 54.171 41.354 1.00 44.76 C ATOM 1430 O3* A A 66 52.326 53.680 41.155 1.00 46.57 O ATOM 1431 C2* A A 66 54.535 54.017 40.112 1.00 42.31 C ATOM 1432 O2* A A 66 53.773 54.358 38.980 1.00 44.56 O ATOM 1433 C1* A A 66 55.595 55.117 40.311 1.00 40.50 C ATOM 1434 N9 A A 66 56.838 54.618 40.902 1.00 38.52 N ATOM 1435 C8 A A 66 57.216 54.722 42.216 1.00 37.83 C ATOM 1436 N7 A A 66 58.365 54.141 42.488 1.00 38.64 N ATOM 1437 C5 A A 66 58.784 53.639 41.260 1.00 37.61 C ATOM 1438 C6 A A 66 59.943 52.898 40.877 1.00 36.94 C ATOM 1439 N6 A A 66 60.924 52.566 41.728 1.00 34.85 N ATOM 1440 N1 A A 66 60.042 52.511 39.596 1.00 36.33 N ATOM 1441 C2 A A 66 59.056 52.850 38.752 1.00 37.97 C ATOM 1442 N3 A A 66 57.922 53.552 38.986 1.00 38.42 N ATOM 1443 C4 A A 66 57.850 53.916 40.276 1.00 37.28 C ATOM 1444 P A A 67 51.950 52.180 41.620 1.00 46.84 P ATOM 1445 O1P A A 67 50.472 52.017 41.466 1.00 50.38 O ATOM 1446 O2P A A 67 52.569 51.904 42.938 1.00 43.27 O ATOM 1447 O5* A A 67 52.644 51.252 40.553 1.00 44.33 O ATOM 1448 C5* A A 67 52.276 51.321 39.167 1.00 45.20 C ATOM 1449 C4* A A 67 53.319 50.645 38.320 1.00 43.61 C ATOM 1450 O4* A A 67 54.601 51.353 38.446 1.00 42.36 O ATOM 1451 C3* A A 67 53.636 49.208 38.725 1.00 41.30 C ATOM 1452 O3* A A 67 52.690 48.291 38.163 1.00 46.52 O ATOM 1453 C2* A A 67 55.053 49.024 38.157 1.00 42.21 C 
ATOM 1454 O2* A A 67 55.039 48.819 36.759 1.00 41.99 O ATOM 1455 C1* A A 67 55.669 50.417 38.391 1.00 41.64 C ATOM 1456 N9 A A 67 56.441 50.432 39.643 1.00 39.97 N ATOM 1457 C8 A A 67 56.129 50.866 40.893 1.00 40.04 C ATOM 1458 N7 A A 67 57.062 50.618 41.788 1.00 37.46 N ATOM 1459 C5 A A 67 58.077 50.015 41.060 1.00 37.07 C ATOM 1460 C6 A A 67 59.348 49.481 41.435 1.00 37.10 C ATOM 1461 N6 A A 67 59.864 49.515 42.677 1.00 37.48 N ATOM 1462 N1 A A 67 60.076 48.895 40.483 1.00 37.12 N ATOM 1463 C2 A A 67 59.577 48.842 39.230 1.00 42.46 C ATOM 1464 N3 A A 67 58.415 49.313 38.758 1.00 39.78 N ATOM 1465 C4 A A 67 57.712 49.894 39.743 1.00 37.61 C ATOM 1466 P U A 68 52.371 46.894 38.917 1.00 42.45 P ATOM 1467 O1P U A 68 51.176 46.256 38.300 1.00 49.18 O ATOM 1468 O2P U A 68 52.399 47.127 40.381 1.00 41.13 O ATOM 1469 O5* U A 68 53.625 45.986 38.529 1.00 44.81 O ATOM 1470 C5* U A 68 53.889 45.709 37.150 1.00 43.19 C ATOM 1471 C4* U A 68 55.168 44.940 37.012 1.00 43.76 C ATOM 1472 O4* U A 68 56.296 45.792 37.351 1.00 42.13 O ATOM 1473 C3* U A 68 55.315 43.758 37.955 1.00 41.59 C ATOM 1474 O3* U A 68 54.653 42.598 37.490 1.00 42.60 O ATOM 1475 C2* U A 68 56.825 43.573 37.996 1.00 38.55 C ATOM 1476 O2* U A 68 57.255 43.012 36.776 1.00 36.20 O ATOM 1477 C1* U A 68 57.286 45.028 38.018 1.00 40.75 C ATOM 1478 N1 U A 68 57.451 45.543 39.385 1.00 40.94 N ATOM 1479 C2 U A 68 58.636 45.212 40.015 1.00 38.74 C ATOM 1480 O2 U A 68 59.470 44.485 39.490 1.00 38.70 O ATOM 1481 N3 U A 68 58.804 45.747 41.261 1.00 37.87 N ATOM 1482 C4 U A 68 57.924 46.564 41.939 1.00 41.13 C ATOM 1483 O4 U A 68 58.175 46.857 43.097 1.00 38.53 O ATOM 1484 C5 U A 68 56.693 46.846 41.241 1.00 42.38 C ATOM 1485 C6 U A 68 56.502 46.326 40.013 1.00 41.40 C ATOM 1486 P U A 69 54.151 41.542 38.559 1.00 39.53 P ATOM 1487 O1P U A 69 53.485 40.463 37.781 1.00 45.74 O ATOM 1488 O2P U A 69 53.441 42.272 39.591 1.00 36.81 O ATOM 1489 O5* U A 69 55.409 40.901 39.291 1.00 37.79 O ATOM 1490 C5* U A 69 56.364 40.117 38.558 
1.00 37.34 C ATOM 1491 C4* U A 69 57.582 39.880 39.400 1.00 37.79 C ATOM 1492 O4* U A 69 58.241 41.150 39.731 1.00 36.27 O ATOM 1493 C3* U A 69 57.300 39.247 40.746 1.00 37.91 C ATOM 1494 O3* U A 69 57.092 37.832 40.625 1.00 37.28 O ATOM 1495 C2* U A 69 58.553 39.630 41.533 1.00 35.61 C ATOM 1496 O2* U A 69 59.645 38.794 41.156 1.00 34.07 O ATOM 1497 C1* U A 69 58.796 41.050 41.019 1.00 34.46 C ATOM 1498 N1 U A 69 58.186 42.082 41.876 1.00 33.94 N ATOM 1499 C2 U A 69 58.871 42.410 43.052 1.00 34.89 C ATOM 1500 O2 U A 69 59.947 41.899 43.373 1.00 36.74 O ATOM 1501 N3 U A 69 58.261 43.334 43.836 1.00 35.12 N ATOM 1502 C4 U A 69 57.071 43.985 43.603 1.00 37.89 C ATOM 1503 O4 U A 69 56.665 44.808 44.432 1.00 37.56 O ATOM 1504 C5 U A 69 56.419 43.627 42.355 1.00 38.56 C ATOM 1505 C6 U A 69 56.991 42.702 41.549 1.00 35.52 C ATOM 1506 P C A 70 56.232 37.065 41.726 1.00 41.13 P ATOM 1507 O1P C A 70 55.792 35.757 41.056 1.00 42.55 O ATOM 1508 O2P C A 70 55.233 37.976 42.338 1.00 39.43 O ATOM 1509 O5* C A 70 57.233 36.693 42.902 1.00 36.56 O ATOM 1510 C5* C A 70 58.371 35.817 42.663 1.00 39.97 C ATOM 1511 C4* C A 70 59.348 35.925 43.805 1.00 42.01 C ATOM 1512 O4* C A 70 59.923 37.260 43.909 1.00 41.84 O ATOM 1513 C3* C A 70 58.743 35.702 45.172 1.00 41.90 C ATOM 1514 O3* C A 70 58.575 34.329 45.408 1.00 45.51 O ATOM 1515 C2* C A 70 59.765 36.355 46.087 1.00 41.68 C ATOM 1516 O2* C A 70 60.879 35.486 46.166 1.00 42.55 O ATOM 1517 C1* C A 70 60.163 37.591 45.267 1.00 36.50 C ATOM 1518 N1 C A 70 59.360 38.780 45.597 1.00 39.49 N ATOM 1519 C2 C A 70 59.805 39.676 46.617 1.00 39.47 C ATOM 1520 O2 C A 70 60.879 39.451 47.194 1.00 41.25 O ATOM 1521 N3 C A 70 59.047 40.767 46.935 1.00 37.96 N ATOM 1522 C4 C A 70 57.901 40.982 46.285 1.00 40.08 C ATOM 1523 N4 C A 70 57.169 42.038 46.653 1.00 40.99 N ATOM 1524 C5 C A 70 57.445 40.109 45.229 1.00 39.06 C ATOM 1525 C6 C A 70 58.187 39.026 44.934 1.00 35.59 C ATOM 1526 P G A 71 57.514 33.856 46.509 1.00 51.79 P ATOM 1527 O1P G A 71 
57.674 32.367 46.524 1.00 50.10 O ATOM 1528 O2P G A 71 56.203 34.485 46.254 1.00 46.01 O ATOM 1529 O5* G A 71 58.085 34.416 47.884 1.00 49.09 O ATOM 1530 C5* G A 71 59.164 33.703 48.505 1.00 56.68 C ATOM 1531 C4* G A 71 59.688 34.434 49.701 1.00 58.95 C ATOM 1532 O4* G A 71 60.125 35.757 49.305 1.00 57.94 O ATOM 1533 C3* G A 71 58.716 34.693 50.839 1.00 63.78 C ATOM 1534 O3* G A 71 58.514 33.580 51.699 1.00 70.59 O ATOM 1535 C2* G A 71 59.392 35.845 51.559 1.00 63.20 C ATOM 1536 O2* G A 71 60.455 35.370 52.375 1.00 64.19 O ATOM 1537 C1* G A 71 59.915 36.661 50.374 1.00 59.19 C ATOM 1538 N9 G A 71 58.897 37.620 49.946 1.00 58.54 N ATOM 1539 C8 G A 71 58.004 37.488 48.897 1.00 56.09 C ATOM 1540 N7 G A 71 57.207 38.519 48.769 1.00 54.09 N ATOM 1541 C5 G A 71 57.597 39.383 49.788 1.00 53.61 C ATOM 1542 C6 G A 71 57.114 40.666 50.148 1.00 52.69 C ATOM 1543 O6 G A 71 56.184 41.315 49.653 1.00 52.69 O ATOM 1544 N1 G A 71 57.827 41.196 51.223 1.00 52.34 N ATOM 1545 C2 G A 71 58.844 40.566 51.884 1.00 53.05 C ATOM 1546 N2 G A 71 59.399 41.251 52.885 1.00 55.23 N ATOM 1547 N3 G A 71 59.291 39.360 51.585 1.00 54.72 N ATOM 1548 C4 G A 71 58.636 38.835 50.526 1.00 55.38 C ATOM 1549 P C A 72 57.094 33.402 52.426 1.00 73.67 P ATOM 1550 O1P C A 72 57.177 32.132 53.187 1.00 76.79 O ATOM 1551 O2P C A 72 56.011 33.588 51.427 1.00 75.72 O ATOM 1552 O5* C A 72 57.017 34.599 53.469 1.00 74.00 O ATOM 1553 C5* C A 72 57.817 34.563 54.660 1.00 75.03 C ATOM 1554 C4* C A 72 57.653 35.834 55.450 1.00 74.38 C ATOM 1555 O4* C A 72 57.964 36.953 54.582 1.00 74.64 O ATOM 1556 C3* C A 72 56.261 36.155 55.979 1.00 76.09 C ATOM 1557 O3* C A 72 55.932 35.502 57.209 1.00 78.85 O ATOM 1558 C2* C A 72 56.302 37.671 56.139 1.00 74.55 C ATOM 1559 O2* C A 72 56.872 38.146 57.338 1.00 75.06 O ATOM 1560 C1* C A 72 57.184 38.082 54.963 1.00 72.55 C ATOM 1561 N1 C A 72 56.343 38.502 53.829 1.00 68.80 N ATOM 1562 C2 C A 72 55.796 39.801 53.838 1.00 66.61 C ATOM 1563 O2 C A 72 56.063 40.568 54.791 1.00 63.14 O 
ATOM 1564 N3 C A 72 54.993 40.180 52.815 1.00 64.67 N ATOM 1565 C4 C A 72 54.734 39.331 51.818 1.00 64.57 C ATOM 1566 N4 C A 72 53.935 39.749 50.832 1.00 63.54 N ATOM 1567 C5 C A 72 55.283 38.015 51.780 1.00 64.77 C ATOM 1568 C6 C A 72 56.077 37.647 52.792 1.00 67.37 C ATOM 1569 P A A 73 54.383 35.178 57.550 1.00 79.81 P ATOM 1570 O1P A A 73 54.333 34.483 58.872 1.00 81.24 O ATOM 1571 O2P A A 73 53.785 34.517 56.355 1.00 79.38 O ATOM 1572 O5* A A 73 53.739 36.617 57.766 1.00 77.14 O ATOM 1573 C5* A A 73 54.204 37.443 58.849 1.00 75.83 C ATOM 1574 C4* A A 73 53.508 38.772 58.837 1.00 74.32 C ATOM 1575 O4* A A 73 53.912 39.514 57.654 1.00 72.81 O ATOM 1576 C3* A A 73 51.987 38.727 58.750 1.00 73.77 C ATOM 1577 O3* A A 73 51.332 38.492 59.999 1.00 76.93 O ATOM 1578 C2* A A 73 51.667 40.101 58.178 1.00 71.67 C ATOM 1579 O2* A A 73 51.704 41.127 59.141 1.00 70.42 O ATOM 1580 C1* A A 73 52.823 40.304 57.196 1.00 69.17 C ATOM 1581 N9 A A 73 52.434 39.853 55.856 1.00 64.04 N ATOM 1582 C8 A A 73 52.792 38.695 55.206 1.00 60.25 C ATOM 1583 N7 A A 73 52.256 38.571 54.016 1.00 58.41 N ATOM 1584 C5 A A 73 51.493 39.729 53.869 1.00 57.85 C ATOM 1585 C6 A A 73 50.683 40.203 52.812 1.00 56.02 C ATOM 1586 N6 A A 73 50.516 39.541 51.654 1.00 55.49 N ATOM 1587 N1 A A 73 50.052 41.388 52.979 1.00 53.67 N ATOM 1588 C2 A A 73 50.233 42.052 54.136 1.00 55.48 C ATOM 1589 N3 A A 73 50.975 41.713 55.198 1.00 59.62 N ATOM 1590 C4 A A 73 51.589 40.525 54.993 1.00 59.80 C ATOM 1591 P C A 74 49.934 37.680 60.030 1.00 78.48 P ATOM 1592 O1P C A 74 49.485 37.580 61.446 1.00 80.11 O ATOM 1593 O2P C A 74 50.090 36.441 59.222 1.00 77.06 O ATOM 1594 O5* C A 74 48.914 38.636 59.275 1.00 77.54 O ATOM 1595 C5* C A 74 48.529 39.898 59.843 1.00 79.25 C ATOM 1596 C4* C A 74 47.463 40.541 58.985 1.00 80.49 C ATOM 1597 O4* C A 74 48.030 40.879 57.690 1.00 79.75 O ATOM 1598 C3* C A 74 46.291 39.627 58.660 1.00 81.62 C ATOM 1599 O3* C A 74 45.292 39.640 59.669 1.00 82.15 O ATOM 1600 C2* C A 74 45.778 40.187 
57.340 1.00 80.42 C ATOM 1601 O2* C A 74 44.932 41.302 57.547 1.00 82.05 O ATOM 1602 C1* C A 74 47.080 40.621 56.662 1.00 79.08 C ATOM 1603 N1 C A 74 47.631 39.588 55.748 1.00 76.36 N ATOM 1604 C2 C A 74 47.204 39.557 54.400 1.00 74.77 C ATOM 1605 O2 C A 74 46.399 40.404 54.002 1.00 75.12 O ATOM 1606 N3 C A 74 47.694 38.606 53.571 1.00 73.87 N ATOM 1607 C4 C A 74 48.588 37.723 54.024 1.00 73.03 C ATOM 1608 N4 C A 74 49.055 36.813 53.167 1.00 72.80 N ATOM 1609 C5 C A 74 49.046 37.734 55.376 1.00 73.50 C ATOM 1610 C6 C A 74 48.543 38.671 56.196 1.00 74.97 C ATOM 1611 P C A 75 44.492 38.295 59.994 1.00 83.00 P ATOM 1612 O1P C A 75 43.435 38.637 60.974 1.00 84.71 O ATOM 1613 O2P C A 75 45.478 37.229 60.303 1.00 81.99 O ATOM 1614 O5* C A 75 43.802 37.890 58.618 1.00 84.55 O ATOM 1615 C5* C A 75 42.993 38.825 57.874 1.00 85.33 C ATOM 1616 C4* C A 75 42.564 38.194 56.568 1.00 86.51 C ATOM 1617 O4* C A 75 43.667 38.149 55.630 1.00 84.68 O ATOM 1618 C3* C A 75 42.136 36.747 56.733 1.00 88.27 C ATOM 1619 O3* C A 75 40.778 36.695 57.104 1.00 95.04 O ATOM 1620 C2* C A 75 42.402 36.136 55.366 1.00 85.58 C ATOM 1621 O2* C A 75 41.337 36.317 54.459 1.00 84.94 O ATOM 1622 C1* C A 75 43.631 36.926 54.915 1.00 82.57 C ATOM 1623 N1 C A 75 44.883 36.201 55.170 1.00 78.44 N ATOM 1624 C2 C A 75 45.365 35.327 54.193 1.00 75.98 C ATOM 1625 O2 C A 75 44.719 35.169 53.149 1.00 74.80 O ATOM 1626 N3 C A 75 46.512 34.673 54.406 1.00 74.76 N ATOM 1627 C4 C A 75 47.176 34.846 55.543 1.00 75.41 C ATOM 1628 N4 C A 75 48.311 34.161 55.705 1.00 75.90 N ATOM 1629 C5 C A 75 46.709 35.719 56.563 1.00 75.64 C ATOM 1630 C6 C A 75 45.570 36.373 56.337 1.00 77.27 C ATOM 1631 P A A 76 40.334 35.770 58.330 1.00 99.54 P ATOM 1632 O1P A A 76 41.267 36.020 59.481 1.00 99.74 O ATOM 1633 O2P A A 76 40.226 34.405 57.758 1.00 99.88 O ATOM 1634 O5* A A 76 38.872 36.304 58.697 1.00100.19 O ATOM 1635 C5* A A 76 38.666 37.596 59.323 1.00100.18 C ATOM 1636 C4* A A 76 37.607 38.379 58.569 1.00100.19 C ATOM 1637 O4* A A 
76 36.479 37.500 58.278 1.00100.19 O ATOM 1638 C3* A A 76 37.025 39.585 59.305 1.00100.19 C ATOM 1639 O3* A A 76 36.428 40.400 58.274 1.00100.19 O ATOM 1640 C2* A A 76 35.785 38.991 59.981 1.00100.19 C ATOM 1641 O2* A A 76 34.780 39.945 60.283 1.00100.19 O ATOM 1642 C1* A A 76 35.314 37.959 58.950 1.00100.19 C ATOM 1643 N9 A A 76 34.598 36.785 59.488 1.00100.19 N ATOM 1644 C8 A A 76 34.399 35.586 58.829 1.00100.19 C ATOM 1645 N7 A A 76 33.715 34.701 59.522 1.00100.19 N ATOM 1646 C5 A A 76 33.440 35.352 60.719 1.00100.19 C ATOM 1647 C6 A A 76 32.739 34.948 61.881 1.00100.19 C ATOM 1648 N6 A A 76 32.161 33.747 62.024 1.00100.19 N ATOM 1649 N1 A A 76 32.652 35.835 62.902 1.00100.19 N ATOM 1650 C2 A A 76 33.230 37.042 62.758 1.00100.19 C ATOM 1651 N3 A A 76 33.911 37.538 61.720 1.00100.19 N ATOM 1652 C4 A A 76 33.982 36.637 60.719 1.00100.19 C TER 1653 A A 76 HETATM 1654 MG MG 590 70.566 35.530 1.665 1.00 65.44 MG HETATM 1655 MN O4M 530 80.714 57.068 31.271 1.00 34.23 MN HETATM 1656 O4 O4M 530 80.684 55.382 30.181 1.00 37.94 O HETATM 1657 O3 O4M 530 81.503 58.060 29.728 1.00 34.54 O HETATM 1658 O2 O4M 530 82.554 56.601 31.916 1.00 40.13 O HETATM 1659 O1 O4M 530 78.882 57.524 30.607 1.00 37.65 O HETATM 1660 MG MO5 510 57.346 47.575 47.279 1.00 62.77 MG HETATM 1661 O1 MO5 510 59.121 46.716 46.939 1.00 61.12 O HETATM 1662 O2 MO5 510 55.569 48.426 47.617 1.00 62.44 O HETATM 1663 O3 MO5 510 57.139 47.928 45.318 1.00 62.18 O HETATM 1664 O4 MO5 510 57.549 47.218 49.240 1.00 63.71 O HETATM 1665 O5 MO5 510 56.451 45.796 47.050 1.00 64.69 O HETATM 1666 MN MN5 520 49.923 44.427 50.131 1.00 89.20 MN HETATM 1667 O1 MN5 520 50.566 42.548 49.894 1.00 90.06 O HETATM 1668 O2 MN5 520 49.279 46.306 50.367 1.00 88.19 O HETATM 1669 O3 MN5 520 48.828 43.854 51.702 1.00 89.34 O HETATM 1670 O4 MN5 520 51.020 45.000 48.559 1.00 89.35 O HETATM 1671 O5 MN5 520 48.379 44.049 48.917 1.00 88.93 O HETATM 1672 MG MO3 540 77.110 64.307 25.357 1.00 40.08 MG HETATM 1673 O1 MO3 540 75.884 64.949 23.911 
1.00 42.31 O HETATM 1674 O2 MO3 540 75.591 63.324 26.209 1.00 42.77 O HETATM 1675 O3 MO3 540 78.628 65.295 24.504 1.00 43.67 O HETATM 1676 MG MO6 560 62.649 46.629 27.595 1.00 47.02 MG HETATM 1677 O1 MO6 560 63.354 44.755 27.511 1.00 43.55 O HETATM 1678 O2 MO6 560 61.942 48.502 27.674 1.00 41.87 O HETATM 1679 O3 MO6 560 62.352 46.432 29.566 1.00 44.71 O HETATM 1680 O4 MO6 560 62.949 46.821 25.628 1.00 43.68 O HETATM 1681 O5 MO6 560 64.498 47.320 27.946 1.00 41.71 O HETATM 1682 O6 MO6 560 60.803 45.945 27.247 1.00 46.26 O HETATM 1683 MG MO6 570 73.331 43.321 11.207 1.00 50.39 MG HETATM 1684 O1 MO6 570 72.795 42.398 9.514 1.00 50.20 O HETATM 1685 O2 MO6 570 73.865 44.246 12.908 1.00 49.37 O HETATM 1686 O3 MO6 570 74.746 41.940 11.519 1.00 51.45 O HETATM 1687 O4 MO6 570 71.918 44.704 10.896 1.00 48.14 O HETATM 1688 O5 MO6 570 72.020 42.211 12.224 1.00 47.02 O HETATM 1689 O6 MO6 570 74.644 44.433 10.185 1.00 47.69 O HETATM 1690 MG MO1 580 69.222 44.815 33.339 1.00 61.74 MG HETATM 1691 O1 MO1 580 68.372 45.072 31.544 1.00 46.52 O HETATM 1692 MN MN5 550 72.301 48.513 33.894 1.00 56.51 MN HETATM 1693 O1 MN5 550 70.576 49.247 33.191 1.00 58.54 O HETATM 1694 O2 MN5 550 74.024 47.784 34.605 1.00 58.71 O HETATM 1695 O3 MN5 550 71.834 46.713 33.168 1.00 61.51 O HETATM 1696 O4 MN5 550 72.774 50.321 34.622 1.00 60.62 O HETATM 1697 O5 MN5 550 71.401 48.057 35.619 1.00 60.95 O HETATM 1698 O HOH 101 65.235 47.736 24.306 1.00 29.97 O HETATM 1699 O HOH 102 74.678 53.324 26.387 1.00 29.43 O HETATM 1700 O HOH 103 79.647 66.543 30.502 1.00 36.21 O HETATM 1701 O HOH 104 69.474 53.115 32.762 1.00 35.08 O HETATM 1702 O HOH 105 77.803 59.348 29.070 1.00 34.91 O HETATM 1703 O HOH 106 86.312 62.508 34.397 1.00 36.87 O HETATM 1704 O HOH 107 69.798 47.380 19.420 1.00 35.43 O HETATM 1705 O HOH 108 77.715 51.787 26.254 1.00 25.88 O HETATM 1706 O HOH 109 66.697 54.043 19.991 1.00 38.52 O HETATM 1707 O HOH 110 73.012 72.799 41.306 1.00 35.46 O HETATM 1708 O HOH 111 84.966 51.999 36.727 1.00 48.17 O 
HETATM 1709 O HOH 112 75.699 47.326 15.656 1.00 43.55 O HETATM 1710 O HOH 113 61.911 39.313 42.834 1.00 38.72 O HETATM 1711 O HOH 114 72.538 65.567 46.811 1.00 39.71 O HETATM 1712 O HOH 115 64.957 57.362 35.588 1.00 36.28 O HETATM 1713 O HOH 116 88.913 61.884 36.666 1.00 33.38 O HETATM 1714 O HOH 117 77.430 47.049 24.346 1.00 38.25 O HETATM 1715 O HOH 118 85.080 55.471 30.753 1.00 42.06 O HETATM 1716 O HOH 119 73.126 42.774 26.088 1.00 39.89 O HETATM 1717 O HOH 120 79.541 54.130 20.639 1.00 41.26 O HETATM 1718 O HOH 121 75.971 56.472 28.176 1.00 40.47 O HETATM 1719 O HOH 123 78.750 74.856 43.549 1.00 41.10 O HETATM 1720 O HOH 124 59.778 66.139 42.613 1.00 45.92 O HETATM 1721 O HOH 125 72.198 49.756 14.319 1.00 52.62 O HETATM 1722 O HOH 126 68.821 65.008 47.959 1.00 45.12 O HETATM 1723 O HOH 127 67.849 57.944 43.472 1.00 49.89 O HETATM 1724 O HOH 128 67.631 51.090 33.685 1.00 39.79 O HETATM 1725 O HOH 129 72.804 72.294 34.649 1.00 52.66 O HETATM 1726 O HOH 130 71.861 49.100 17.322 1.00 38.23 O HETATM 1727 O HOH 131 65.908 53.779 40.692 1.00 47.68 O HETATM 1728 O HOH 132 54.150 42.092 45.268 1.00 43.73 O HETATM 1729 O HOH 133 89.825 58.582 38.826 1.00 50.13 O HETATM 1730 O HOH 134 84.722 60.865 36.177 1.00 45.21 O HETATM 1731 O HOH 135 76.181 73.948 39.594 1.00 36.60 O HETATM 1732 O HOH 136 77.944 64.645 31.617 1.00 50.38 O HETATM 1733 O HOH 137 63.795 30.172 -8.994 1.00 51.48 O HETATM 1734 O HOH 138 79.187 54.860 37.993 1.00 53.51 O HETATM 1735 O HOH 139 65.438 43.393 28.966 1.00 49.20 O HETATM 1736 O HOH 140 76.458 61.362 31.941 1.00 41.25 O HETATM 1737 O HOH 141 65.955 45.704 31.155 1.00 37.31 O HETATM 1738 O HOH 142 76.497 48.574 12.986 1.00 52.62 O HETATM 1739 O HOH 143 77.696 63.153 34.024 1.00 41.50 O HETATM 1740 O HOH 144 83.868 72.752 40.256 1.00 49.29 O HETATM 1741 O HOH 145 83.766 59.152 28.946 1.00 39.50 O HETATM 1742 O HOH 146 80.216 39.612 12.437 1.00 58.96 O HETATM 1743 O HOH 147 74.386 47.570 12.387 1.00 53.73 O HETATM 1744 O HOH 148 76.514 45.163 
13.466 1.00 52.76 O HETATM 1745 O HOH 149 63.032 41.158 41.088 1.00 48.34 O HETATM 1746 O HOH 150 71.118 68.217 36.794 1.00 51.89 O HETATM 1747 O HOH 152 81.091 65.399 26.812 1.00 44.48 O HETATM 1748 O HOH 153 77.519 52.631 29.605 1.00 48.97 O HETATM 1749 O HOH 155 59.968 42.360 37.108 1.00 49.01 O HETATM 1750 O HOH 156 65.613 68.667 37.929 1.00 45.40 O HETATM 1751 O HOH 157 56.060 63.049 38.179 1.00 49.08 O HETATM 1752 O HOH 158 65.177 33.333 7.613 1.00 55.31 O HETATM 1753 O HOH 159 75.172 67.087 29.927 1.00 54.05 O HETATM 1754 O HOH 160 84.839 64.460 36.543 1.00 47.37 O HETATM 1755 O HOH 161 84.226 60.935 26.655 1.00 51.90 O HETATM 1756 O HOH 162 65.528 66.128 32.712 1.00 58.08 O HETATM 1757 O HOH 163 54.801 39.353 47.303 1.00 50.47 O HETATM 1758 O HOH 164 82.124 69.432 40.965 1.00 44.14 O HETATM 1759 O HOH 166 76.820 63.025 29.658 1.00 45.12 O HETATM 1760 O HOH 167 66.938 41.242 10.226 1.00 51.21 O HETATM 1761 O HOH 168 85.023 60.379 30.965 1.00 38.68 O HETATM 1762 O HOH 170 81.568 67.018 41.474 1.00 54.90 O HETATM 1763 O HOH 171 83.672 64.839 39.206 1.00 48.85 O HETATM 1764 O HOH 172 69.358 59.813 40.447 1.00 56.32 O HETATM 1765 O HOH 173 65.402 39.517 46.250 1.00 42.43 O HETATM 1766 O HOH 174 78.025 56.354 40.203 1.00 44.75 O HETATM 1767 O HOH 175 72.640 54.075 19.103 1.00 56.38 O HETATM 1768 O HOH 178 62.561 49.577 9.243 1.00 59.11 O HETATM 1769 O HOH 181 67.851 57.255 35.234 1.00 39.17 O HETATM 1770 O HOH 183 65.609 57.383 39.390 1.00 56.16 O HETATM 1771 O HOH 184 77.652 63.370 43.463 1.00 46.31 O HETATM 1772 O HOH 185 56.156 59.761 40.881 1.00 54.01 O HETATM 1773 O HOH 186 68.030 57.904 37.829 1.00 54.73 O HETATM 1774 O HOH 189 64.948 50.726 18.484 1.00 45.40 O HETATM 1775 O HOH 191 58.581 70.881 37.367 1.00 43.35 O HETATM 1776 O HOH 195 69.329 73.815 40.546 1.00 45.81 O HETATM 1777 O HOH 196 71.092 44.843 13.706 1.00 46.18 O HETATM 1778 O HOH 197 63.773 67.421 46.112 1.00 52.13 O HETATM 1779 O HOH 200 79.526 44.399 2.237 1.00 56.28 O HETATM 1780 O HOH 204 
61.159 66.141 44.876 1.00 49.57 O HETATM 1781 O HOH 205 55.921 58.490 38.100 1.00 52.97 O HETATM 1782 O HOH 206 61.370 44.287 30.748 1.00 53.06 O HETATM 1783 O HOH 208 72.463 66.452 30.831 1.00 58.61 O HETATM 1784 O HOH 210 60.953 51.071 33.259 1.00 44.43 O HETATM 1785 O HOH 214 55.561 30.912 50.683 1.00 57.65 O HETATM 1786 O HOH 219 72.422 43.667 15.579 1.00 52.11 O HETATM 1787 O HOH 222 65.477 55.377 26.488 1.00 40.16 O HETATM 1788 O HOH 223 62.090 56.194 45.841 1.00 48.80 O HETATM 1789 O HOH 226 60.948 49.649 25.176 1.00 34.03 O HETATM 1790 O HOH 228 69.381 66.644 35.818 1.00 46.90 O HETATM 1791 O HOH 230 66.314 66.119 35.359 1.00 45.34 O HETATM 1792 O HOH 231 69.248 64.196 37.425 1.00 39.82 O HETATM 1793 O HOH 233 67.490 67.214 38.107 1.00 42.96 O HETATM 1794 O HOH 591 70.315 63.256 32.948 1.00 48.78 O HETATM 1795 O HOH 592 70.278 67.253 46.788 1.00 49.58 O HETATM 1796 O HOH 593 59.221 51.476 25.337 1.00 53.13 O HETATM 1797 O HOH 596 36.759 35.239 56.874 1.00 54.15 O HETATM 1798 O HOH 598 72.226 76.169 43.785 1.00 56.77 O HETATM 1799 O HOH 602 79.271 66.367 28.116 1.00 57.39 O HETATM 1800 O HOH 603 68.045 43.077 4.648 1.00 58.94 O HETATM 1801 O HOH 608 52.188 37.918 39.395 1.00 56.79 O HETATM 1802 O HOH 610 53.895 62.874 28.494 1.00 55.33 O HETATM 1803 O HOH 611 70.166 44.500 35.929 1.00 59.11 O HETATM 1804 O HOH 612 65.815 56.418 41.681 1.00 63.02 O HETATM 1805 O HOH 613 84.445 61.099 24.272 1.00 53.13 O HETATM 1806 O HOH 616 62.869 55.003 26.332 1.00 48.50 O HETATM 1807 O HOH 618 50.840 53.124 44.965 1.00 59.63 O HETATM 1808 O HOH 626 59.036 51.116 45.301 1.00 57.70 O HETATM 1809 O HOH 627 51.263 39.619 37.323 1.00 55.93 O HETATM 1810 O HOH 633 59.499 57.746 20.958 1.00 59.69 O HETATM 1811 O HOH 635 77.328 58.675 26.658 1.00 51.93 O HETATM 1812 O HOH 644 72.884 27.349 -8.052 1.00 57.57 O HETATM 1813 O HOH 648 63.777 65.450 36.414 1.00 42.27 O HETATM 1814 O HOH 657 72.947 30.361 -12.545 1.00 50.16 O HETATM 1815 O HOH 662 57.486 68.555 38.998 1.00 41.15 O 
HETATM 1816 O HOH 670 72.917 73.923 36.977 1.00 53.90 O HETATM 1817 O HOH 671 82.577 50.441 36.557 1.00 57.25 O HETATM 1818 O HOH 675 68.361 55.414 38.613 1.00 58.70 O HETATM 1819 O HOH 688 64.284 30.562 -4.413 1.00 56.07 O HETATM 1820 O HOH 690 69.590 42.478 12.375 1.00 58.33 O HETATM 1821 O HOH 693 83.889 61.714 39.991 1.00 53.01 O HETATM 1822 O HOH 707 61.422 49.192 45.932 1.00 56.92 O CONECT 181 195 CONECT 195 181 196 197 198 CONECT 196 195 CONECT 197 195 CONECT 198 195 199 CONECT 199 198 200 CONECT 200 199 201 202 CONECT 201 200 206 CONECT 202 200 203 204 CONECT 203 202 219 CONECT 204 202 205 206 CONECT 205 204 CONECT 206 201 204 207 CONECT 207 206 208 218 CONECT 208 207 209 CONECT 209 208 210 CONECT 210 209 211 218 CONECT 211 210 212 213 CONECT 212 211 CONECT 213 211 214 CONECT 214 213 215 217 CONECT 215 214 216 CONECT 216 215 CONECT 217 214 218 CONECT 218 207 210 217 CONECT 219 203 CONECT 309 324 CONECT 324 309 325 326 327 CONECT 325 324 CONECT 326 324 CONECT 327 324 328 CONECT 328 327 329 CONECT 329 328 330 331 CONECT 330 329 333 CONECT 331 329 332 334 CONECT 332 331 344 CONECT 333 330 334 336 CONECT 334 331 333 335 CONECT 335 334 CONECT 336 333 337 343 CONECT 337 336 338 339 CONECT 338 337 CONECT 339 337 340 CONECT 340 339 341 342 CONECT 341 340 CONECT 342 340 343 CONECT 343 336 342 CONECT 344 332 345 346 347 CONECT 345 344 CONECT 346 344 CONECT 347 344 348 CONECT 348 347 349 CONECT 349 348 350 351 CONECT 350 349 353 CONECT 351 349 352 354 CONECT 352 351 364 CONECT 353 350 354 356 CONECT 354 351 353 355 CONECT 355 354 CONECT 356 353 357 363 CONECT 357 356 358 359 CONECT 358 357 CONECT 359 357 360 CONECT 360 359 361 362 CONECT 361 360 CONECT 362 360 363 CONECT 363 356 362 CONECT 364 352 CONECT 531 543 CONECT 543 531 544 545 546 CONECT 544 543 CONECT 545 543 CONECT 546 543 547 CONECT 547 546 548 CONECT 548 547 549 550 CONECT 549 548 554 CONECT 550 548 551 552 CONECT 551 550 568 CONECT 552 550 553 554 CONECT 553 552 CONECT 554 549 552 555 CONECT 555 554 556 
565 CONECT 556 555 557 CONECT 557 556 558 CONECT 558 557 559 565 CONECT 559 558 560 561 CONECT 560 559 CONECT 561 559 562 CONECT 562 561 563 564 CONECT 563 562 566 567 CONECT 564 562 565 CONECT 565 555 558 564 CONECT 566 563 CONECT 567 563 CONECT 568 551 CONECT 661 693 CONECT 675 676 680 683 CONECT 676 675 677 681 CONECT 677 676 678 CONECT 678 677 679 682 CONECT 679 678 680 CONECT 680 675 679 CONECT 681 676 CONECT 682 678 CONECT 683 675 684 689 CONECT 684 683 685 687 CONECT 685 684 686 CONECT 686 685 CONECT 687 684 688 690 CONECT 688 687 689 691 CONECT 689 683 688 CONECT 690 687 696 CONECT 691 688 692 CONECT 692 691 693 CONECT 693 661 692 694 695 CONECT 694 693 CONECT 695 693 CONECT 696 690 CONECT 704 716 CONECT 716 704 717 718 719 CONECT 717 716 CONECT 718 716 CONECT 719 716 720 CONECT 720 719 721 CONECT 721 720 722 723 CONECT 722 721 728 CONECT 723 721 724 725 CONECT 724 723 740 CONECT 725 723 726 728 CONECT 726 725 727 CONECT 727 726 CONECT 728 722 725 729 CONECT 729 728 730 739 CONECT 730 729 731 CONECT 731 730 732 CONECT 732 731 733 739 CONECT 733 732 734 735 CONECT 734 733 CONECT 735 733 736 CONECT 736 735 737 738 CONECT 737 736 CONECT 738 736 739 CONECT 739 729 732 738 CONECT 740 724 CONECT 770 820 CONECT 784 786 791 798 CONECT 785 786 797 CONECT 786 784 785 787 CONECT 787 786 788 789 CONECT 788 787 CONECT 789 787 790 795 CONECT 790 789 791 793 CONECT 791 784 790 792 CONECT 792 791 CONECT 793 790 794 CONECT 794 793 795 CONECT 795 789 794 811 CONECT 796 797 CONECT 797 785 796 798 CONECT 798 784 797 799 CONECT 799 798 800 CONECT 800 799 801 CONECT 801 800 802 806 CONECT 802 801 803 804 CONECT 803 802 CONECT 804 802 805 CONECT 805 804 CONECT 806 801 807 CONECT 807 806 808 809 CONECT 808 807 CONECT 809 807 810 CONECT 810 809 CONECT 811 795 812 817 CONECT 812 811 813 814 CONECT 813 812 CONECT 814 812 815 816 CONECT 815 814 823 CONECT 816 814 817 818 """ fake_nmr_file="""HEADER RIBONUCLEIC ACID 04-AUG-98 17RA TITLE BRANCHPOINT HELIX FROM YEAST AND BINDING SITE FOR 
PHAGE TITLE 2 GA/MS2 COAT PROTEINS, NMR, 12 STRUCTURES COMPND MOL_ID: 1; COMPND 2 MOLECULE: RNA; COMPND 3 CHAIN: NULL; COMPND 4 FRAGMENT: RBS AND START SITE FOR PHAGE GA REPLICASE GENE; COMPND 5 ENGINEERED: YES; COMPND 6 MUTATION: A5U, A6U SOURCE MOL_ID: 1; SOURCE 2 SYNTHETIC: YES; SOURCE 3 ORGANISM_SCIENTIFIC: SACCHAROMYCES CEREVISIAE; SOURCE 4 ORGANISM_COMMON: BAKER'S YEAST; SOURCE 5 CELLULAR_LOCATION: NUCLEUS; SOURCE 6 GENE: REPLICASE; SOURCE 7 OTHER_DETAILS: IN VITRO SYNTHESIS FROM DNA TEMPLATE USING SOURCE 8 T7 RNA POLYMERASE. HAIRPIN CORRESPONDS TO NT -16 - +5 OF SOURCE 9 PHAGE GA REPLICASE AND THE YEAST PRE-MRNA BRANCHPOINT HELIX KEYWDS BRANCHPOINT HELIX, PHAGE MS2, BULGE, BASE TRIPLE, KEYWDS 2 RIBONUCLEIC ACID EXPDTA NMR, 12 STRUCTURES AUTHOR E.P.NIKONOWICZ,J.S.SMITH REVDAT 1 20-APR-99 17RA 0 JRNL AUTH J.S.SMITH,E.P.NIKONOWICZ JRNL TITL NMR STRUCTURE AND DYNAMICS OF AN RNA MOTIF COMMON JRNL TITL 2 TO THE SPLICEOSOME BRANCH-POINT HELIX AND THE JRNL TITL 3 RNA-BINDING SITE FOR PHAGE GA COAT PROTEIN JRNL REF BIOCHEMISTRY V. 37 13486 1998 JRNL REFN ASTM BICHAW US ISSN 0006-2960 0033 REMARK 1 REMARK 2 REMARK 2 RESOLUTION. NOT APPLICABLE. REMARK 3 REMARK 3 REFINEMENT. REMARK 3 PROGRAM : X-PLOR 3.1 REMARK 3 AUTHORS : BRUNGER REMARK 3 REMARK 3 OTHER REFINEMENT REMARKS: REFINEMENT DETAILS CAN BE FOUND REMARK 3 IN THE JRNL CITATION ABOVE. REMARK 4 REMARK 4 17RA COMPLIES WITH FORMAT V. 2.3, 09-JULY-1998 REMARK 6 REMARK 6 CALCULATIONS PERFORMED WITHOUT PHOSPHATE ON 5' TERMINAL REMARK 6 NUCLEOTIDE. 
REMARK 210 REMARK 210 EXPERIMENTAL DETAILS REMARK 210 EXPERIMENT TYPE : NMR REMARK 210 TEMPERATURE (KELVIN) : 298 REMARK 210 PH : 6.8 REMARK 210 IONIC STRENGTH : 20 MM REMARK 210 PRESSURE : 1 ATM REMARK 210 SAMPLE CONTENTS : 10% H2O/90% D2O, 100% D2O REMARK 210 REMARK 210 NMR EXPERIMENTS CONDUCTED : 2D/3D NOESY, COSY, HETCOR REMARK 210 SPECTROMETER FIELD STRENGTH : 500 MHZ REMARK 210 SPECTROMETER MODEL : AMX500 REMARK 210 SPECTROMETER MANUFACTURER : BRUKER REMARK 210 REMARK 210 STRUCTURE DETERMINATION. REMARK 210 SOFTWARE USED : FELIX 950 REMARK 210 METHOD USED : SIMULATED ANNEALING REMARK 210 REMARK 210 CONFORMERS, NUMBER CALCULATED : 75 REMARK 210 CONFORMERS, NUMBER SUBMITTED : 12 REMARK 210 CONFORMERS, SELECTION CRITERIA : LEAST RESTRAINT VIOLATION REMARK 210 REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : NULL REMARK 210 REMARK 210 REMARK: FURTHER DETAILS OF DATA ACQUISITION AND STRUCTURE REMARK 210 CALCULATION CAN BE FOUND IN THE METHODS SECTION OF THE REMARK 210 MANUSCRIPT LISTED ABOVE. MODELS 1-6 ARE MINIMIZED REMARK 210 STRUCTURES CALCULATED WITHOUT BASE PAIR CONSTRAINTS FOR REMARK 210 A6, A7, AND U16. MODELS 7-12 ARE MINIMIZED STRUCTURES REMARK 210 CALCULATED USING A6-U16 BASE PAIR CONSTRAINTS AND AN A7 REMARK 210 N1H-U16 O2 HYDROGEN BOND. REMARK 215 REMARK 215 NMR STUDY REMARK 215 THE COORDINATES IN THIS ENTRY WERE GENERATED FROM SOLUTION REMARK 215 NMR DATA. PROTEIN DATA BANK CONVENTIONS REQUIRE THAT REMARK 215 CRYST1 AND SCALE RECORDS BE INCLUDED, BUT THE VALUES ON REMARK 215 THESE RECORDS ARE MEANINGLESS. REMARK 800 REMARK 800 SITE REMARK 800 SITE_IDENTIFIER: AAU REMARK 800 SITE_DESCRIPTION: (A-A)*U MOTIF. PKA OF A7~6.1 AND IS REMARK 800 PARTIALLY PROTONATED AT PH 6.8, A7 NH1 MAY BE STABILIZED REMARK 800 BY U16 O2 (MODELS 7-12). 
REMARK 999 REMARK 999 SEQUENCE REMARK 999 CORRESPONDS TO GB D10027, GROUP II RNA BASES 1733 1753 DBREF 17RA 1 21 PDB 17RA 17RA 1 21 SEQRES 1 21 G G C G U A A G G A U U A SEQRES 2 21 C C U A U G C C SITE 1 AAU 3 A 6 A 7 U 16 CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1 ORIGX1 1.000000 0.000000 0.000000 0.00000 ORIGX2 0.000000 1.000000 0.000000 0.00000 ORIGX3 0.000000 0.000000 1.000000 0.00000 SCALE1 1.000000 0.000000 0.000000 0.00000 SCALE2 0.000000 1.000000 0.000000 0.00000 SCALE3 0.000000 0.000000 1.000000 0.00000 MODEL 1 ATOM 1 O5* G 1 35.729 -4.043 -8.815 1.00 0.00 O ATOM 2 C5* G 1 36.486 -5.197 -8.419 1.00 0.00 C ATOM 3 C4* G 1 35.947 -6.491 -9.034 1.00 0.00 C ATOM 4 O4* G 1 36.433 -7.650 -8.339 1.00 0.00 O ATOM 5 C3* G 1 34.424 -6.588 -8.918 1.00 0.00 C ATOM 6 O3* G 1 33.771 -5.934 -10.022 1.00 0.00 O ATOM 7 C2* G 1 34.203 -8.095 -8.942 1.00 0.00 C ATOM 8 O2* G 1 34.122 -8.584 -10.288 1.00 0.00 O ATOM 9 C1* G 1 35.423 -8.665 -8.218 1.00 0.00 C ATOM 10 N9 G 1 35.126 -8.972 -6.792 1.00 0.00 N ATOM 11 C8 G 1 35.282 -8.197 -5.684 1.00 0.00 C ATOM 12 N7 G 1 34.953 -8.707 -4.545 1.00 0.00 N ATOM 13 C5 G 1 34.521 -9.985 -4.914 1.00 0.00 C ATOM 14 C6 G 1 34.026 -11.046 -4.105 1.00 0.00 C ATOM 15 O6 G 1 33.873 -11.066 -2.885 1.00 0.00 O ATOM 16 N1 G 1 33.701 -12.162 -4.869 1.00 0.00 N ATOM 17 C2 G 1 33.833 -12.251 -6.243 1.00 0.00 C ATOM 18 N2 G 1 33.467 -13.405 -6.801 1.00 0.00 N ATOM 19 N3 G 1 34.297 -11.256 -7.009 1.00 0.00 N ATOM 20 C4 G 1 34.621 -10.158 -6.287 1.00 0.00 C ATOM 21 1H5* G 1 36.434 -5.308 -7.345 1.00 0.00 H ATOM 22 2H5* G 1 37.533 -5.043 -8.723 1.00 0.00 H ATOM 23 H4* G 1 36.243 -6.549 -10.082 1.00 0.00 H ATOM 24 H3* G 1 34.089 -6.173 -7.960 1.00 0.00 H ATOM 25 1H2* G 1 33.294 -8.345 -8.391 1.00 0.00 H ATOM 26 2HO* G 1 33.856 -9.506 -10.242 1.00 0.00 H ATOM 27 H1* G 1 35.757 -9.572 -8.727 1.00 0.00 H ATOM 28 H8 G 1 35.670 -7.179 -5.754 1.00 0.00 H ATOM 29 H1 G 1 33.342 -12.958 -4.361 1.00 0.00 H ATOM 30 1H2 G 1 33.541 -13.530 -7.800 1.00 0.00 H 
ATOM 31 2H2 G 1 33.116 -14.156 -6.224 1.00 0.00 H ATOM 32 H5T G 1 35.846 -3.936 -9.762 1.00 0.00 H ATOM 33 P G 2 32.263 -5.363 -9.892 1.00 0.00 P ATOM 34 O1P G 2 31.976 -4.551 -11.096 1.00 0.00 O ATOM 35 O2P G 2 32.103 -4.764 -8.548 1.00 0.00 O ATOM 36 O5* G 2 31.365 -6.705 -9.956 1.00 0.00 O ATOM 37 C5* G 2 31.011 -7.307 -11.214 1.00 0.00 C ATOM 38 C4* G 2 30.343 -8.676 -11.050 1.00 0.00 C ATOM 39 O4* G 2 31.035 -9.497 -10.097 1.00 0.00 O ATOM 40 C3* G 2 28.923 -8.561 -10.491 1.00 0.00 C ATOM 41 O3* G 2 27.951 -8.310 -11.523 1.00 0.00 O ATOM 42 C2* G 2 28.731 -9.940 -9.871 1.00 0.00 C ATOM 43 O2* G 2 28.290 -10.893 -10.847 1.00 0.00 O ATOM 44 C1* G 2 30.119 -10.301 -9.334 1.00 0.00 C ATOM 45 N9 G 2 30.233 -10.024 -7.876 1.00 0.00 N ATOM 46 C8 G 2 30.688 -8.915 -7.234 1.00 0.00 C ATOM 47 N7 G 2 30.687 -8.928 -5.942 1.00 0.00 N ATOM 48 C5 G 2 30.171 -10.196 -5.665 1.00 0.00 C ATOM 49 C6 G 2 29.919 -10.822 -4.414 1.00 0.00 C ATOM 50 O6 G 2 30.109 -10.370 -3.286 1.00 0.00 O ATOM 51 N1 G 2 29.397 -12.100 -4.577 1.00 0.00 N ATOM 52 C2 G 2 29.145 -12.706 -5.793 1.00 0.00 C ATOM 53 N2 G 2 28.642 -13.941 -5.748 1.00 0.00 N ATOM 54 N3 G 2 29.379 -12.124 -6.974 1.00 0.00 N ATOM 55 C4 G 2 29.889 -10.878 -6.841 1.00 0.00 C ATOM 56 1H5* G 2 31.908 -7.466 -11.801 1.00 0.00 H ATOM 57 2H5* G 2 30.337 -6.621 -11.752 1.00 0.00 H ATOM 58 H4* G 2 30.320 -9.189 -12.015 1.00 0.00 H ATOM 59 H3* G 2 28.882 -7.791 -9.713 1.00 0.00 H ATOM 60 1H2* G 2 28.016 -9.880 -9.045 1.00 0.00 H ATOM 61 2HO* G 2 27.375 -10.684 -11.057 1.00 0.00 H ATOM 62 H1* G 2 30.322 -11.357 -9.525 1.00 0.00 H ATOM 63 H8 G 2 31.045 -8.046 -7.786 1.00 0.00 H ATOM 64 H1 G 2 29.191 -12.607 -3.725 1.00 0.00 H ATOM 65 1H2 G 2 28.437 -14.436 -6.605 1.00 0.00 H ATOM 66 2H2 G 2 28.464 -14.382 -4.857 1.00 0.00 H ATOM 67 P C 3 26.771 -7.224 -11.322 1.00 0.00 P ATOM 68 O1P C 3 26.262 -6.847 -12.661 1.00 0.00 O ATOM 69 O2P C 3 27.254 -6.170 -10.400 1.00 0.00 O ATOM 70 O5* C 3 25.628 -8.075 -10.559 1.00 0.00 O ATOM 71 C5* C 3 
24.477 -8.572 -11.257 1.00 0.00 C ATOM 72 C4* C 3 23.641 -9.523 -10.386 1.00 0.00 C ATOM 73 O4* C 3 24.471 -10.338 -9.542 1.00 0.00 O ATOM 74 C3* C 3 22.733 -8.770 -9.408 1.00 0.00 C ATOM 75 O3* C 3 21.495 -8.361 -10.017 1.00 0.00 O ATOM 76 C2* C 3 22.506 -9.840 -8.347 1.00 0.00 C ATOM 77 O2* C 3 21.435 -10.719 -8.716 1.00 0.00 O ATOM 78 C1* C 3 23.839 -10.592 -8.275 1.00 0.00 C ATOM 79 N1 C 3 24.678 -10.136 -7.128 1.00 0.00 N ATOM 80 C2 C 3 24.508 -10.789 -5.908 1.00 0.00 C ATOM 81 O2 C 3 23.685 -11.698 -5.800 1.00 0.00 O ATOM 82 N3 C 3 25.268 -10.395 -4.851 1.00 0.00 N ATOM 83 C4 C 3 26.159 -9.403 -4.971 1.00 0.00 C ATOM 84 N4 C 3 26.883 -9.048 -3.910 1.00 0.00 N ATOM 85 C5 C 3 26.343 -8.725 -6.217 1.00 0.00 C ATOM 86 C6 C 3 25.588 -9.118 -7.263 1.00 0.00 C ATOM 87 1H5* C 3 24.807 -9.110 -12.148 1.00 0.00 H ATOM 88 2H5* C 3 23.856 -7.730 -11.565 1.00 0.00 H ATOM 89 H4* C 3 23.038 -10.170 -11.027 1.00 0.00 H ATOM 90 H3* C 3 23.265 -7.915 -8.975 1.00 0.00 H ATOM 91 1H2* C 3 22.294 -9.367 -7.384 1.00 0.00 H ATOM 92 2HO* C 3 20.613 -10.251 -8.558 1.00 0.00 H ATOM 93 H1* C 3 23.644 -11.663 -8.184 1.00 0.00 H ATOM 94 1H4 C 3 27.559 -8.301 -3.985 1.00 0.00 H ATOM 95 2H4 C 3 26.757 -9.525 -3.028 1.00 0.00 H ATOM 96 H5 C 3 27.069 -7.917 -6.317 1.00 0.00 H ATOM 97 H6 C 3 25.706 -8.619 -8.224 1.00 0.00 H ATOM 98 P G 4 20.708 -7.037 -9.523 1.00 0.00 P ATOM 99 O1P G 4 19.779 -6.624 -10.600 1.00 0.00 O ATOM 100 O2P G 4 21.698 -6.069 -8.998 1.00 0.00 O ATOM 101 O5* G 4 19.832 -7.583 -8.278 1.00 0.00 O ATOM 102 C5* G 4 18.633 -8.350 -8.491 1.00 0.00 C ATOM 103 C4* G 4 18.228 -9.169 -7.262 1.00 0.00 C ATOM 104 O4* G 4 19.372 -9.689 -6.570 1.00 0.00 O ATOM 105 C3* G 4 17.500 -8.321 -6.213 1.00 0.00 C ATOM 106 O3* G 4 16.092 -8.212 -6.500 1.00 0.00 O ATOM 107 C2* G 4 17.750 -9.137 -4.951 1.00 0.00 C ATOM 108 O2* G 4 16.781 -10.184 -4.808 1.00 0.00 O ATOM 109 C1* G 4 19.155 -9.706 -5.149 1.00 0.00 C ATOM 110 N9 G 4 20.181 -8.913 -4.420 1.00 0.00 N ATOM 111 C8 G 4 21.010 -7.930 
-4.866 1.00 0.00 C ATOM 112 N7 G 4 21.826 -7.403 -4.014 1.00 0.00 N ATOM 113 C5 G 4 21.518 -8.108 -2.845 1.00 0.00 C ATOM 114 C6 G 4 22.071 -7.993 -1.538 1.00 0.00 C ATOM 115 O6 G 4 22.961 -7.238 -1.149 1.00 0.00 O ATOM 116 N1 G 4 21.477 -8.887 -0.653 1.00 0.00 N ATOM 117 C2 G 4 20.476 -9.781 -0.978 1.00 0.00 C ATOM 118 N2 G 4 20.036 -10.563 0.009 1.00 0.00 N ATOM 119 N3 G 4 19.952 -9.894 -2.202 1.00 0.00 N ATOM 120 C4 G 4 20.513 -9.033 -3.084 1.00 0.00 C ATOM 121 1H5* G 4 18.796 -9.055 -9.296 1.00 0.00 H ATOM 122 2H5* G 4 17.821 -7.658 -8.769 1.00 0.00 H ATOM 123 H4* G 4 17.590 -10.000 -7.571 1.00 0.00 H ATOM 124 H3* G 4 17.968 -7.333 -6.126 1.00 0.00 H ATOM 125 1H2* G 4 17.734 -8.483 -4.076 1.00 0.00 H ATOM 126 2HO* G 4 16.922 -10.590 -3.948 1.00 0.00 H ATOM 127 H1* G 4 19.178 -10.735 -4.796 1.00 0.00 H ATOM 128 H8 G 4 20.986 -7.595 -5.904 1.00 0.00 H ATOM 129 H1 G 4 21.814 -8.859 0.298 1.00 0.00 H ATOM 130 1H2 G 4 20.432 -10.478 0.935 1.00 0.00 H ATOM 131 2H2 G 4 19.309 -11.239 -0.169 1.00 0.00 H ATOM 132 P U 5 15.233 -6.916 -6.060 1.00 0.00 P ATOM 133 O1P U 5 13.850 -7.096 -6.556 1.00 0.00 O ATOM 134 O2P U 5 15.992 -5.701 -6.434 1.00 0.00 O ATOM 135 O5* U 5 15.217 -7.025 -4.446 1.00 0.00 O ATOM 136 C5* U 5 13.987 -7.142 -3.713 1.00 0.00 C ATOM 137 C4* U 5 14.174 -6.810 -2.224 1.00 0.00 C ATOM 138 O4* U 5 15.442 -7.271 -1.730 1.00 0.00 O ATOM 139 C3* U 5 14.196 -5.299 -1.970 1.00 0.00 C ATOM 140 O3* U 5 12.866 -4.773 -1.798 1.00 0.00 O ATOM 141 C2* U 5 15.006 -5.214 -0.682 1.00 0.00 C ATOM 142 O2* U 5 14.170 -5.389 0.469 1.00 0.00 O ATOM 143 C1* U 5 16.027 -6.347 -0.796 1.00 0.00 C ATOM 144 N1 U 5 17.357 -5.858 -1.260 1.00 0.00 N ATOM 145 C2 U 5 18.197 -5.285 -0.314 1.00 0.00 C ATOM 146 O2 U 5 17.870 -5.156 0.866 1.00 0.00 O ATOM 147 N3 U 5 19.433 -4.863 -0.773 1.00 0.00 N ATOM 148 C4 U 5 19.901 -4.963 -2.073 1.00 0.00 C ATOM 149 O4 U 5 21.023 -4.555 -2.368 1.00 0.00 O ATOM 150 C5 U 5 18.963 -5.572 -2.991 1.00 0.00 C ATOM 151 C6 U 5 17.746 -5.989 -2.569 1.00 
0.00 C ATOM 152 1H5* U 5 13.620 -8.166 -3.803 1.00 0.00 H ATOM 153 2H5* U 5 13.248 -6.464 -4.143 1.00 0.00 H ATOM 154 H4* U 5 13.372 -7.270 -1.643 1.00 0.00 H ATOM 155 H3* U 5 14.723 -4.785 -2.783 1.00 0.00 H ATOM 156 1H2* U 5 15.521 -4.251 -0.634 1.00 0.00 H ATOM 157 2HO* U 5 14.701 -5.187 1.242 1.00 0.00 H ATOM 158 H1* U 5 16.136 -6.836 0.175 1.00 0.00 H ATOM 159 H3 U 5 20.051 -4.441 -0.096 1.00 0.00 H ATOM 160 H5 U 5 19.246 -5.709 -4.036 1.00 0.00 H ATOM 161 H6 U 5 17.062 -6.432 -3.288 1.00 0.00 H ATOM 162 P A 6 12.565 -3.186 -1.896 1.00 0.00 P ATOM 163 O1P A 6 11.096 -2.998 -1.909 1.00 0.00 O ATOM 164 O2P A 6 13.385 -2.620 -2.991 1.00 0.00 O ATOM 165 O5* A 6 13.139 -2.622 -0.494 1.00 0.00 O ATOM 166 C5* A 6 12.398 -2.764 0.728 1.00 0.00 C ATOM 167 C4* A 6 13.218 -2.316 1.947 1.00 0.00 C ATOM 168 O4* A 6 14.583 -2.747 1.849 1.00 0.00 O ATOM 169 C3* A 6 13.286 -0.789 2.053 1.00 0.00 C ATOM 170 O3* A 6 12.314 -0.331 3.011 1.00 0.00 O ATOM 171 C2* A 6 14.715 -0.482 2.497 1.00 0.00 C ATOM 172 O2* A 6 14.736 0.082 3.814 1.00 0.00 O ATOM 173 C1* A 6 15.474 -1.806 2.465 1.00 0.00 C ATOM 174 N9 A 6 16.741 -1.676 1.704 1.00 0.00 N ATOM 175 C8 A 6 16.951 -1.695 0.363 1.00 0.00 C ATOM 176 N7 A 6 18.166 -1.554 -0.053 1.00 0.00 N ATOM 177 C5 A 6 18.866 -1.420 1.150 1.00 0.00 C ATOM 178 C6 A 6 20.222 -1.233 1.445 1.00 0.00 C ATOM 179 N6 A 6 21.168 -1.143 0.507 1.00 0.00 N ATOM 180 N1 A 6 20.569 -1.141 2.740 1.00 0.00 N ATOM 181 C2 A 6 19.641 -1.226 3.696 1.00 0.00 C ATOM 182 N3 A 6 18.333 -1.403 3.532 1.00 0.00 N ATOM 183 C4 A 6 18.008 -1.492 2.225 1.00 0.00 C ATOM 184 1H5* A 6 12.119 -3.812 0.856 1.00 0.00 H ATOM 185 2H5* A 6 11.489 -2.162 0.668 1.00 0.00 H ATOM 186 H4* A 6 12.776 -2.729 2.856 1.00 0.00 H ATOM 187 H3* A 6 13.099 -0.339 1.074 1.00 0.00 H ATOM 188 1H2* A 6 15.172 0.211 1.787 1.00 0.00 H ATOM 189 2HO* A 6 14.285 -0.534 4.397 1.00 0.00 H ATOM 190 H1* A 6 15.694 -2.126 3.485 1.00 0.00 H ATOM 191 H8 A 6 16.123 -1.811 -0.335 1.00 0.00 H ATOM 192 1H6 A 6 22.132 
-1.008 0.774 1.00 0.00 H ATOM 193 2H6 A 6 20.918 -1.210 -0.469 1.00 0.00 H ATOM 194 H2 A 6 19.993 -1.143 4.724 1.00 0.00 H ATOM 195 P A 7 11.860 1.219 3.079 1.00 0.00 P ATOM 196 O1P A 7 10.531 1.280 3.729 1.00 0.00 O ATOM 197 O2P A 7 12.051 1.826 1.741 1.00 0.00 O ATOM 198 O5* A 7 12.943 1.858 4.095 1.00 0.00 O ATOM 199 C5* A 7 13.656 3.061 3.772 1.00 0.00 C ATOM 200 C4* A 7 14.906 3.224 4.652 1.00 0.00 C ATOM 201 O4* A 7 15.883 2.209 4.374 1.00 0.00 O ATOM 202 C3* A 7 15.639 4.540 4.375 1.00 0.00 C ATOM 203 O3* A 7 15.112 5.625 5.160 1.00 0.00 O ATOM 204 C2* A 7 17.066 4.195 4.787 1.00 0.00 C ATOM 205 O2* A 7 17.264 4.395 6.194 1.00 0.00 O ATOM 206 C1* A 7 17.224 2.721 4.409 1.00 0.00 C ATOM 207 N9 A 7 17.905 2.560 3.096 1.00 0.00 N ATOM 208 C8 A 7 17.377 2.501 1.843 1.00 0.00 C ATOM 209 N7 A 7 18.197 2.371 0.854 1.00 0.00 N ATOM 210 C5 A 7 19.433 2.334 1.510 1.00 0.00 C ATOM 211 C6 A 7 20.748 2.208 1.046 1.00 0.00 C ATOM 212 N6 A 7 21.058 2.096 -0.246 1.00 0.00 N ATOM 213 N1 A 7 21.731 2.204 1.964 1.00 0.00 N ATOM 214 C2 A 7 21.445 2.319 3.262 1.00 0.00 C ATOM 215 N3 A 7 20.239 2.446 3.811 1.00 0.00 N ATOM 216 C4 A 7 19.268 2.446 2.873 1.00 0.00 C ATOM 217 1H5* A 7 12.996 3.918 3.924 1.00 0.00 H ATOM 218 2H5* A 7 13.959 3.027 2.725 1.00 0.00 H ATOM 219 H4* A 7 14.622 3.171 5.705 1.00 0.00 H ATOM 220 H3* A 7 15.604 4.774 3.305 1.00 0.00 H ATOM 221 1H2* A 7 17.771 4.801 4.215 1.00 0.00 H ATOM 222 2HO* A 7 16.670 3.797 6.655 1.00 0.00 H ATOM 223 H1* A 7 17.791 2.203 5.185 1.00 0.00 H ATOM 224 H8 A 7 16.301 2.551 1.679 1.00 0.00 H ATOM 225 1H6 A 7 22.024 2.006 -0.527 1.00 0.00 H ATOM 226 2H6 A 7 20.327 2.100 -0.943 1.00 0.00 H ATOM 227 H2 A 7 22.293 2.303 3.948 1.00 0.00 H ATOM 228 P G 8 15.175 7.152 4.632 1.00 0.00 P ATOM 229 O1P G 8 14.224 7.956 5.434 1.00 0.00 O ATOM 230 O2P G 8 15.064 7.145 3.156 1.00 0.00 O ATOM 231 O5* G 8 16.679 7.602 5.021 1.00 0.00 O ATOM 232 C5* G 8 16.974 8.190 6.301 1.00 0.00 C ATOM 233 C4* G 8 18.459 8.097 6.664 1.00 0.00 C ATOM 234 O4* 
G 8 19.038 6.861 6.220 1.00 0.00 O ATOM 235 C3* G 8 19.294 9.178 5.970 1.00 0.00 C ATOM 236 O3* G 8 19.301 10.432 6.677 1.00 0.00 O ATOM 237 C2* G 8 20.676 8.536 5.995 1.00 0.00 C ATOM 238 O2* G 8 21.341 8.783 7.242 1.00 0.00 O ATOM 239 C1* G 8 20.400 7.045 5.800 1.00 0.00 C ATOM 240 N9 G 8 20.603 6.633 4.384 1.00 0.00 N ATOM 241 C8 G 8 19.716 6.588 3.353 1.00 0.00 C ATOM 242 N7 G 8 20.153 6.196 2.203 1.00 0.00 N ATOM 243 C5 G 8 21.500 5.939 2.479 1.00 0.00 C ATOM 244 C6 G 8 22.532 5.475 1.617 1.00 0.00 C ATOM 245 O6 G 8 22.457 5.193 0.423 1.00 0.00 O ATOM 246 N1 G 8 23.743 5.352 2.291 1.00 0.00 N ATOM 247 C2 G 8 23.940 5.638 3.629 1.00 0.00 C ATOM 248 N2 G 8 25.175 5.455 4.101 1.00 0.00 N ATOM 249 N3 G 8 22.975 6.075 4.444 1.00 0.00 N ATOM 250 C4 G 8 21.786 6.203 3.810 1.00 0.00 C ATOM 251 1H5* G 8 16.430 7.661 7.074 1.00 0.00 H ATOM 252 2H5* G 8 16.653 9.244 6.283 1.00 0.00 H ATOM 253 H4* G 8 18.576 8.179 7.746 1.00 0.00 H ATOM 254 H3* G 8 18.960 9.308 4.933 1.00 0.00 H ATOM 255 1H2* G 8 21.274 8.913 5.164 1.00 0.00 H ATOM 256 2HO* G 8 22.250 8.489 7.145 1.00 0.00 H ATOM 257 H1* G 8 21.060 6.464 6.449 1.00 0.00 H ATOM 258 H8 G 8 18.671 6.869 3.491 1.00 0.00 H ATOM 259 H1 G 8 24.528 5.030 1.739 1.00 0.00 H ATOM 260 1H2 G 8 25.377 5.646 5.072 1.00 0.00 H ATOM 261 2H2 G 8 25.906 5.124 3.488 1.00 0.00 H ATOM 262 P G 9 19.503 11.838 5.902 1.00 0.00 P ATOM 263 O1P G 9 19.258 12.934 6.867 1.00 0.00 O ATOM 264 O2P G 9 18.733 11.791 4.637 1.00 0.00 O ATOM 265 O5* G 9 21.078 11.827 5.533 1.00 0.00 O ATOM 266 C5* G 9 22.079 12.117 6.527 1.00 0.00 C ATOM 267 C4* G 9 23.469 11.607 6.136 1.00 0.00 C ATOM 268 O4* G 9 23.399 10.353 5.441 1.00 0.00 O ATOM 269 C3* G 9 24.176 12.539 5.154 1.00 0.00 C ATOM 270 O3* G 9 24.856 13.639 5.786 1.00 0.00 O ATOM 271 C2* G 9 25.176 11.572 4.531 1.00 0.00 C ATOM 272 O2* G 9 26.330 11.407 5.367 1.00 0.00 O ATOM 273 C1* G 9 24.390 10.265 4.404 1.00 0.00 C ATOM 274 N9 G 9 23.764 10.121 3.058 1.00 0.00 N ATOM 275 C8 G 9 22.457 10.216 2.689 1.00 
0.00 C ATOM 276 N7 G 9 22.173 10.051 1.440 1.00 0.00 N ATOM 277 C5 G 9 23.433 9.814 0.884 1.00 0.00 C ATOM 278 C6 G 9 23.801 9.558 -0.468 1.00 0.00 C ATOM 279 O6 G 9 23.073 9.498 -1.460 1.00 0.00 O ATOM 280 N1 G 9 25.171 9.376 -0.598 1.00 0.00 N ATOM 281 C2 G 9 26.084 9.433 0.440 1.00 0.00 C ATOM 282 N2 G 9 27.363 9.241 0.117 1.00 0.00 N ATOM 283 N3 G 9 25.747 9.673 1.710 1.00 0.00 N ATOM 284 C4 G 9 24.415 9.854 1.864 1.00 0.00 C ATOM 285 1H5* G 9 21.816 11.626 7.453 1.00 0.00 H ATOM 286 2H5* G 9 22.106 13.208 6.687 1.00 0.00 H ATOM 287 H4* G 9 24.079 11.487 7.029 1.00 0.00 H ATOM 288 H3* G 9 23.471 12.891 4.392 1.00 0.00 H ATOM 289 1H2* G 9 25.472 11.929 3.543 1.00 0.00 H ATOM 290 2HO* G 9 26.061 10.888 6.129 1.00 0.00 H ATOM 291 H1* G 9 25.050 9.416 4.600 1.00 0.00 H ATOM 292 H8 G 9 21.675 10.422 3.420 1.00 0.00 H ATOM 293 H1 G 9 25.505 9.184 -1.532 1.00 0.00 H ATOM 294 1H2 G 9 28.075 9.277 0.832 1.00 0.00 H ATOM 295 2H2 G 9 27.620 9.060 -0.844 1.00 0.00 H ATOM 296 P A 10 24.574 15.170 5.346 1.00 0.00 P ATOM 297 O1P A 10 25.088 16.058 6.414 1.00 0.00 O ATOM 298 O2P A 10 23.160 15.284 4.920 1.00 0.00 O ATOM 299 O5* A 10 25.509 15.351 4.040 1.00 0.00 O ATOM 300 C5* A 10 26.926 15.558 4.158 1.00 0.00 C ATOM 301 C4* A 10 27.695 14.893 3.010 1.00 0.00 C ATOM 302 O4* A 10 27.088 13.644 2.639 1.00 0.00 O ATOM 303 C3* A 10 27.666 15.734 1.730 1.00 0.00 C ATOM 304 O3* A 10 28.738 16.681 1.557 1.00 0.00 O ATOM 305 C2* A 10 27.848 14.655 0.676 1.00 0.00 C ATOM 306 O2* A 10 29.235 14.386 0.433 1.00 0.00 O ATOM 307 C1* A 10 27.132 13.433 1.221 1.00 0.00 C ATOM 308 N9 A 10 25.782 13.325 0.619 1.00 0.00 N ATOM 309 C8 A 10 24.567 13.233 1.208 1.00 0.00 C ATOM 310 N7 A 10 23.531 13.159 0.438 1.00 0.00 N ATOM 311 C5 A 10 24.118 13.212 -0.830 1.00 0.00 C ATOM 312 C6 A 10 23.590 13.185 -2.128 1.00 0.00 C ATOM 313 N6 A 10 22.285 13.087 -2.383 1.00 0.00 N ATOM 314 N1 A 10 24.460 13.266 -3.151 1.00 0.00 N ATOM 315 C2 A 10 25.772 13.368 -2.917 1.00 0.00 C ATOM 316 N3 A 10 26.373 13.394 
-1.738 1.00 0.00 N ATOM 317 C4 A 10 25.485 13.314 -0.728 1.00 0.00 C ATOM 318 1H5* A 10 27.271 15.137 5.101 1.00 0.00 H ATOM 319 2H5* A 10 27.130 16.630 4.154 1.00 0.00 H ATOM 320 H4* A 10 28.728 14.715 3.315 1.00 0.00 H ATOM 321 H3* A 10 26.687 16.215 1.611 1.00 0.00 H ATOM 322 1H2* A 10 27.356 14.965 -0.242 1.00 0.00 H ATOM 323 2HO* A 10 29.611 14.051 1.250 1.00 0.00 H ATOM 324 H1* A 10 27.712 12.533 0.999 1.00 0.00 H ATOM 325 H8 A 10 24.467 13.253 2.288 1.00 0.00 H ATOM 326 1H6 A 10 21.957 13.072 -3.339 1.00 0.00 H ATOM 327 2H6 A 10 21.624 13.026 -1.622 1.00 0.00 H ATOM 328 H2 A 10 26.420 13.467 -3.792 1.00 0.00 H ATOM 329 P U 11 28.576 17.995 0.621 1.00 0.00 P ATOM 330 O1P U 11 29.925 18.403 0.166 1.00 0.00 O ATOM 331 O2P U 11 27.725 18.967 1.343 1.00 0.00 O ATOM 332 O5* U 11 27.742 17.456 -0.660 1.00 0.00 O ATOM 333 C5* U 11 28.394 16.929 -1.831 1.00 0.00 C ATOM 334 C4* U 11 27.827 17.520 -3.137 1.00 0.00 C ATOM 335 O4* U 11 26.520 17.016 -3.444 1.00 0.00 O ATOM 336 C3* U 11 27.619 19.039 -3.065 1.00 0.00 C ATOM 337 O3* U 11 27.694 19.750 -4.306 1.00 0.00 O ATOM 338 C2* U 11 26.178 19.201 -2.604 1.00 0.00 C ATOM 339 O2* U 11 25.622 20.454 -3.029 1.00 0.00 O ATOM 340 C1* U 11 25.499 18.016 -3.297 1.00 0.00 C ATOM 341 N1 U 11 24.332 17.511 -2.520 1.00 0.00 N ATOM 342 C2 U 11 23.089 17.546 -3.137 1.00 0.00 C ATOM 343 O2 U 11 22.939 17.962 -4.286 1.00 0.00 O ATOM 344 N3 U 11 22.021 17.082 -2.388 1.00 0.00 N ATOM 345 C4 U 11 22.082 16.594 -1.094 1.00 0.00 C ATOM 346 O4 U 11 21.063 16.209 -0.521 1.00 0.00 O ATOM 347 C5 U 11 23.411 16.592 -0.526 1.00 0.00 C ATOM 348 C6 U 11 24.475 17.039 -1.237 1.00 0.00 C ATOM 349 1H5* U 11 28.260 15.848 -1.848 1.00 0.00 H ATOM 350 2H5* U 11 29.459 17.145 -1.780 1.00 0.00 H ATOM 351 H4* U 11 28.499 17.279 -3.968 1.00 0.00 H ATOM 352 H3* U 11 28.293 19.482 -2.333 1.00 0.00 H ATOM 353 1H2* U 11 26.115 19.093 -1.518 1.00 0.00 H ATOM 354 2HO* U 11 25.589 20.443 -3.989 1.00 0.00 H ATOM 355 H1* U 11 25.169 18.324 -4.292 1.00 0.00 H ATOM 
356 H3 U 11 21.111 17.104 -2.827 1.00 0.00 H ATOM 357 H5 U 11 23.559 16.227 0.491 1.00 0.00 H ATOM 358 H6 U 11 25.464 17.021 -0.780 1.00 0.00 H ATOM 359 P U 12 29.104 20.030 -5.029 1.00 0.00 P ATOM 360 O1P U 12 30.182 19.371 -4.256 1.00 0.00 O ATOM 361 O2P U 12 29.190 21.477 -5.328 1.00 0.00 O ATOM 362 O5* U 12 28.904 19.242 -6.418 1.00 0.00 O ATOM 363 C5* U 12 29.559 17.999 -6.687 1.00 0.00 C ATOM 364 C4* U 12 28.656 17.042 -7.476 1.00 0.00 C ATOM 365 O4* U 12 27.374 17.616 -7.774 1.00 0.00 O ATOM 366 C3* U 12 29.275 16.676 -8.821 1.00 0.00 C ATOM 367 O3* U 12 29.008 15.408 -9.437 1.00 0.00 O ATOM 368 C2* U 12 28.653 17.722 -9.741 1.00 0.00 C ATOM 369 O2* U 12 28.547 17.237 -11.086 1.00 0.00 O ATOM 370 C1* U 12 27.274 18.004 -9.154 1.00 0.00 C ATOM 371 N1 U 12 26.895 19.438 -9.299 1.00 0.00 N ATOM 372 C2 U 12 26.259 19.820 -10.473 1.00 0.00 C ATOM 373 O2 U 12 26.014 19.022 -11.377 1.00 0.00 O ATOM 374 N3 U 12 25.916 21.158 -10.572 1.00 0.00 N ATOM 375 C4 U 12 26.148 22.135 -9.619 1.00 0.00 C ATOM 376 O4 U 12 25.799 23.298 -9.814 1.00 0.00 O ATOM 377 C5 U 12 26.815 21.652 -8.430 1.00 0.00 C ATOM 378 C6 U 12 27.161 20.347 -8.306 1.00 0.00 C ATOM 379 1H5* U 12 29.833 17.528 -5.743 1.00 0.00 H ATOM 380 2H5* U 12 30.462 18.191 -7.254 1.00 0.00 H ATOM 381 H4* U 12 28.507 16.133 -6.892 1.00 0.00 H ATOM 382 H3* U 12 30.355 16.846 -8.759 1.00 0.00 H ATOM 383 1H2* U 12 29.243 18.624 -9.718 1.00 0.00 H ATOM 384 2HO* U 12 28.308 17.983 -11.641 1.00 0.00 H ATOM 385 H1* U 12 26.532 17.378 -9.652 1.00 0.00 H ATOM 386 H3 U 12 25.452 21.448 -11.422 1.00 0.00 H ATOM 387 H5 U 12 27.041 22.348 -7.621 1.00 0.00 H ATOM 388 H6 U 12 27.668 20.016 -7.400 1.00 0.00 H ATOM 389 P A 13 29.931 14.883 -10.657 1.00 0.00 P ATOM 390 O1P A 13 30.926 15.931 -10.983 1.00 0.00 O ATOM 391 O2P A 13 29.042 14.363 -11.721 1.00 0.00 O ATOM 392 O5* A 13 30.717 13.645 -9.995 1.00 0.00 O ATOM 393 C5* A 13 30.316 12.299 -10.252 1.00 0.00 C ATOM 394 C4* A 13 29.638 11.672 -9.027 1.00 0.00 C ATOM 395 O4* A 13 
28.904 12.629 -8.255 1.00 0.00 O ATOM 396 C3* A 13 28.602 10.612 -9.409 1.00 0.00 C ATOM 397 O3* A 13 28.539 9.603 -8.396 1.00 0.00 O ATOM 398 C2* A 13 27.285 11.373 -9.478 1.00 0.00 C ATOM 399 O2* A 13 26.163 10.519 -9.217 1.00 0.00 O ATOM 400 C1* A 13 27.494 12.395 -8.358 1.00 0.00 C ATOM 401 N9 A 13 26.739 13.655 -8.572 1.00 0.00 N ATOM 402 C8 A 13 26.342 14.278 -9.713 1.00 0.00 C ATOM 403 N7 A 13 25.576 15.311 -9.592 1.00 0.00 N ATOM 404 C5 A 13 25.456 15.411 -8.203 1.00 0.00 C ATOM 405 C6 A 13 24.783 16.309 -7.365 1.00 0.00 C ATOM 406 N6 A 13 24.048 17.324 -7.820 1.00 0.00 N ATOM 407 N1 A 13 24.896 16.115 -6.041 1.00 0.00 N ATOM 408 C2 A 13 25.627 15.104 -5.571 1.00 0.00 C ATOM 409 N3 A 13 26.290 14.202 -6.269 1.00 0.00 N ATOM 410 C4 A 13 26.166 14.412 -7.584 1.00 0.00 C ATOM 411 1H5* A 13 31.198 11.713 -10.511 1.00 0.00 H ATOM 412 2H5* A 13 29.625 12.287 -11.095 1.00 0.00 H ATOM 413 H4* A 13 30.393 11.224 -8.384 1.00 0.00 H ATOM 414 H3* A 13 28.839 10.174 -10.382 1.00 0.00 H ATOM 415 1H2* A 13 27.187 11.875 -10.443 1.00 0.00 H ATOM 416 2HO* A 13 26.029 9.977 -9.998 1.00 0.00 H ATOM 417 H1* A 13 27.154 11.948 -7.421 1.00 0.00 H ATOM 418 H8 A 13 26.726 13.980 -10.681 1.00 0.00 H ATOM 419 1H6 A 13 23.582 17.942 -7.171 1.00 0.00 H ATOM 420 2H6 A 13 23.957 17.476 -8.815 1.00 0.00 H ATOM 421 H2 A 13 25.718 15.025 -4.491 1.00 0.00 H ATOM 422 P C 14 28.746 8.040 -8.738 1.00 0.00 P ATOM 423 O1P C 14 29.625 7.923 -9.924 1.00 0.00 O ATOM 424 O2P C 14 27.417 7.388 -8.737 1.00 0.00 O ATOM 425 O5* C 14 29.561 7.526 -7.444 1.00 0.00 O ATOM 426 C5* C 14 30.941 7.870 -7.256 1.00 0.00 C ATOM 427 C4* C 14 31.185 8.628 -5.942 1.00 0.00 C ATOM 428 O4* C 14 30.361 9.798 -5.843 1.00 0.00 O ATOM 429 C3* C 14 30.808 7.790 -4.720 1.00 0.00 C ATOM 430 O3* C 14 31.916 6.964 -4.319 1.00 0.00 O ATOM 431 C2* C 14 30.486 8.862 -3.686 1.00 0.00 C ATOM 432 O2* C 14 31.671 9.303 -3.009 1.00 0.00 O ATOM 433 C1* C 14 29.864 9.996 -4.509 1.00 0.00 C ATOM 434 N1 C 14 28.377 9.960 -4.487 1.00 0.00 
N ATOM 435 C2 C 14 27.730 10.276 -3.297 1.00 0.00 C ATOM 436 O2 C 14 28.380 10.549 -2.289 1.00 0.00 O ATOM 437 N3 C 14 26.370 10.282 -3.285 1.00 0.00 N ATOM 438 C4 C 14 25.667 10.000 -4.388 1.00 0.00 C ATOM 439 N4 C 14 24.334 9.999 -4.323 1.00 0.00 N ATOM 440 C5 C 14 26.326 9.681 -5.621 1.00 0.00 C ATOM 441 C6 C 14 27.674 9.666 -5.621 1.00 0.00 C ATOM 442 1H5* C 14 31.274 8.490 -8.090 1.00 0.00 H ATOM 443 2H5* C 14 31.522 6.955 -7.247 1.00 0.00 H ATOM 444 H4* C 14 32.234 8.923 -5.879 1.00 0.00 H ATOM 445 H3* C 14 29.919 7.188 -4.929 1.00 0.00 H ATOM 446 1H2* C 14 29.758 8.477 -2.969 1.00 0.00 H ATOM 447 2HO* C 14 31.394 9.875 -2.291 1.00 0.00 H ATOM 448 H1* C 14 30.211 10.959 -4.133 1.00 0.00 H ATOM 449 1H4 C 14 23.788 9.774 -5.140 1.00 0.00 H ATOM 450 2H4 C 14 23.870 10.216 -3.449 1.00 0.00 H ATOM 451 H5 C 14 25.761 9.508 -6.542 1.00 0.00 H ATOM 452 H6 C 14 28.212 9.392 -6.529 1.00 0.00 H ATOM 453 P C 15 31.719 5.402 -3.957 1.00 0.00 P ATOM 454 O1P C 15 33.013 4.715 -4.167 1.00 0.00 O ATOM 455 O2P C 15 30.505 4.908 -4.646 1.00 0.00 O ATOM 456 O5* C 15 31.415 5.441 -2.371 1.00 0.00 O ATOM 457 C5* C 15 32.446 5.757 -1.422 1.00 0.00 C ATOM 458 C4* C 15 31.871 6.019 -0.022 1.00 0.00 C ATOM 459 O4* C 15 30.766 6.936 -0.065 1.00 0.00 O ATOM 460 C3* C 15 31.284 4.752 0.604 1.00 0.00 C ATOM 461 O3* C 15 32.277 3.986 1.310 1.00 0.00 O ATOM 462 C2* C 15 30.252 5.327 1.564 1.00 0.00 C ATOM 463 O2* C 15 30.848 5.659 2.826 1.00 0.00 O ATOM 464 C1* C 15 29.722 6.578 0.856 1.00 0.00 C ATOM 465 N1 C 15 28.433 6.316 0.152 1.00 0.00 N ATOM 466 C2 C 15 27.307 6.054 0.933 1.00 0.00 C ATOM 467 O2 C 15 27.389 6.058 2.160 1.00 0.00 O ATOM 468 N3 C 15 26.125 5.807 0.305 1.00 0.00 N ATOM 469 C4 C 15 26.039 5.814 -1.032 1.00 0.00 C ATOM 470 N4 C 15 24.864 5.567 -1.612 1.00 0.00 N ATOM 471 C5 C 15 27.188 6.085 -1.843 1.00 0.00 C ATOM 472 C6 C 15 28.356 6.330 -1.216 1.00 0.00 C ATOM 473 1H5* C 15 32.980 6.647 -1.757 1.00 0.00 H ATOM 474 2H5* C 15 33.149 4.923 -1.368 1.00 0.00 H ATOM 475 H4* 
C 15 32.651 6.426 0.625 1.00 0.00 H ATOM 476 H3* C 15 30.788 4.145 -0.162 1.00 0.00 H ATOM 477 1H2* C 15 29.443 4.608 1.705 1.00 0.00 H ATOM 478 2HO* C 15 30.135 5.881 3.429 1.00 0.00 H ATOM 479 H1* C 15 29.590 7.384 1.583 1.00 0.00 H ATOM 480 1H4 C 15 24.787 5.570 -2.619 1.00 0.00 H ATOM 481 2H4 C 15 24.052 5.375 -1.044 1.00 0.00 H ATOM 482 H5 C 15 27.120 6.093 -2.932 1.00 0.00 H ATOM 483 H6 C 15 29.245 6.553 -1.807 1.00 0.00 H ATOM 484 P U 16 32.239 2.370 1.327 1.00 0.00 P ATOM 485 O1P U 16 33.340 1.894 2.196 1.00 0.00 O ATOM 486 O2P U 16 32.151 1.891 -0.072 1.00 0.00 O ATOM 487 O5* U 16 30.835 2.048 2.060 1.00 0.00 O ATOM 488 C5* U 16 30.717 2.070 3.491 1.00 0.00 C ATOM 489 C4* U 16 29.251 1.992 3.949 1.00 0.00 C ATOM 490 O4* U 16 28.384 2.751 3.092 1.00 0.00 O ATOM 491 C3* U 16 28.697 0.566 3.878 1.00 0.00 C ATOM 492 O3* U 16 29.025 -0.175 5.067 1.00 0.00 O ATOM 493 C2* U 16 27.200 0.829 3.756 1.00 0.00 C ATOM 494 O2* U 16 26.599 1.011 5.045 1.00 0.00 O ATOM 495 C1* U 16 27.110 2.108 2.919 1.00 0.00 C ATOM 496 N1 U 16 26.823 1.816 1.483 1.00 0.00 N ATOM 497 C2 U 16 25.498 1.595 1.127 1.00 0.00 C ATOM 498 O2 U 16 24.584 1.626 1.950 1.00 0.00 O ATOM 499 N3 U 16 25.262 1.334 -0.212 1.00 0.00 N ATOM 500 C4 U 16 26.213 1.273 -1.216 1.00 0.00 C ATOM 501 O4 U 16 25.887 1.032 -2.377 1.00 0.00 O ATOM 502 C5 U 16 27.564 1.513 -0.761 1.00 0.00 C ATOM 503 C6 U 16 27.827 1.773 0.545 1.00 0.00 C ATOM 504 1H5* U 16 31.156 2.996 3.868 1.00 0.00 H ATOM 505 2H5* U 16 31.268 1.225 3.908 1.00 0.00 H ATOM 506 H4* U 16 29.167 2.371 4.969 1.00 0.00 H ATOM 507 H3* U 16 29.068 0.058 2.979 1.00 0.00 H ATOM 508 1H2* U 16 26.719 0.003 3.226 1.00 0.00 H ATOM 509 2HO* U 16 25.648 1.034 4.917 1.00 0.00 H ATOM 510 H1* U 16 26.326 2.752 3.326 1.00 0.00 H ATOM 511 H3 U 16 24.300 1.173 -0.483 1.00 0.00 H ATOM 512 H5 U 16 28.385 1.485 -1.479 1.00 0.00 H ATOM 513 H6 U 16 28.857 1.950 0.851 1.00 0.00 H ATOM 514 P A 17 28.993 -1.792 5.089 1.00 0.00 P ATOM 515 O1P A 17 29.952 -2.257 6.118 1.00 0.00 O 
ATOM 516 O2P A 17 29.116 -2.284 3.697 1.00 0.00 O ATOM 517 O5* A 17 27.495 -2.113 5.604 1.00 0.00 O ATOM 518 C5* A 17 27.111 -1.892 6.970 1.00 0.00 C ATOM 519 C4* A 17 25.653 -2.303 7.229 1.00 0.00 C ATOM 520 O4* A 17 24.758 -1.701 6.281 1.00 0.00 O ATOM 521 C3* A 17 25.445 -3.806 7.077 1.00 0.00 C ATOM 522 O3* A 17 25.756 -4.478 8.311 1.00 0.00 O ATOM 523 C2* A 17 23.962 -3.866 6.726 1.00 0.00 C ATOM 524 O2* A 17 23.150 -3.848 7.907 1.00 0.00 O ATOM 525 C1* A 17 23.728 -2.616 5.875 1.00 0.00 C ATOM 526 N9 A 17 23.815 -2.914 4.421 1.00 0.00 N ATOM 527 C8 A 17 24.831 -2.680 3.545 1.00 0.00 C ATOM 528 N7 A 17 24.655 -3.039 2.318 1.00 0.00 N ATOM 529 C5 A 17 23.368 -3.581 2.358 1.00 0.00 C ATOM 530 C6 A 17 22.551 -4.157 1.377 1.00 0.00 C ATOM 531 N6 A 17 22.927 -4.293 0.105 1.00 0.00 N ATOM 532 N1 A 17 21.335 -4.587 1.757 1.00 0.00 N ATOM 533 C2 A 17 20.941 -4.463 3.027 1.00 0.00 C ATOM 534 N3 A 17 21.631 -3.936 4.035 1.00 0.00 N ATOM 535 C4 A 17 22.846 -3.510 3.631 1.00 0.00 C ATOM 536 1H5* A 17 27.226 -0.832 7.205 1.00 0.00 H ATOM 537 2H5* A 17 27.766 -2.471 7.625 1.00 0.00 H ATOM 538 H4* A 17 25.363 -2.005 8.231 1.00 0.00 H ATOM 539 H3* A 17 26.048 -4.193 6.248 1.00 0.00 H ATOM 540 1H2* A 17 23.755 -4.760 6.138 1.00 0.00 H ATOM 541 2HO* A 17 22.244 -4.013 7.635 1.00 0.00 H ATOM 542 H1* A 17 22.748 -2.193 6.109 1.00 0.00 H ATOM 543 H8 A 17 25.758 -2.205 3.865 1.00 0.00 H ATOM 544 1H6 A 17 22.300 -4.715 -0.565 1.00 0.00 H ATOM 545 2H6 A 17 23.839 -3.974 -0.191 1.00 0.00 H ATOM 546 H2 A 17 19.938 -4.827 3.262 1.00 0.00 H ATOM 547 P U 18 26.538 -5.892 8.327 1.00 0.00 P ATOM 548 O1P U 18 26.683 -6.322 9.737 1.00 0.00 O ATOM 549 O2P U 18 27.740 -5.772 7.470 1.00 0.00 O ATOM 550 O5* U 18 25.495 -6.886 7.598 1.00 0.00 O ATOM 551 C5* U 18 24.294 -7.317 8.256 1.00 0.00 C ATOM 552 C4* U 18 23.245 -7.824 7.257 1.00 0.00 C ATOM 553 O4* U 18 23.032 -6.889 6.188 1.00 0.00 O ATOM 554 C3* U 18 23.694 -9.107 6.552 1.00 0.00 C ATOM 555 O3* U 18 23.374 -10.289 7.311 1.00 0.00 O ATOM 
556 C2* U 18 22.884 -9.044 5.263 1.00 0.00 C ATOM 557 O2* U 18 21.575 -9.599 5.447 1.00 0.00 O ATOM 558 C1* U 18 22.811 -7.551 4.933 1.00 0.00 C ATOM 559 N1 U 18 23.820 -7.152 3.908 1.00 0.00 N ATOM 560 C2 U 18 23.539 -7.458 2.581 1.00 0.00 C ATOM 561 O2 U 18 22.503 -8.034 2.247 1.00 0.00 O ATOM 562 N3 U 18 24.489 -7.077 1.651 1.00 0.00 N ATOM 563 C4 U 18 25.681 -6.425 1.918 1.00 0.00 C ATOM 564 O4 U 18 26.454 -6.134 1.007 1.00 0.00 O ATOM 565 C5 U 18 25.897 -6.142 3.319 1.00 0.00 C ATOM 566 C6 U 18 24.983 -6.503 4.256 1.00 0.00 C ATOM 567 1H5* U 18 23.874 -6.478 8.815 1.00 0.00 H ATOM 568 2H5* U 18 24.540 -8.119 8.955 1.00 0.00 H ATOM 569 H4* U 18 22.300 -7.993 7.774 1.00 0.00 H ATOM 570 H3* U 18 24.765 -9.062 6.325 1.00 0.00 H ATOM 571 1H2* U 18 23.413 -9.576 4.470 1.00 0.00 H ATOM 572 2HO* U 18 21.670 -10.555 5.471 1.00 0.00 H ATOM 573 H1* U 18 21.807 -7.309 4.574 1.00 0.00 H ATOM 574 H3 U 18 24.291 -7.293 0.681 1.00 0.00 H ATOM 575 H5 U 18 26.810 -5.631 3.629 1.00 0.00 H ATOM 576 H6 U 18 25.173 -6.258 5.304 1.00 0.00 H ATOM 577 P G 19 24.346 -11.580 7.312 1.00 0.00 P ATOM 578 O1P G 19 24.010 -12.406 8.495 1.00 0.00 O ATOM 579 O2P G 19 25.739 -11.120 7.112 1.00 0.00 O ATOM 580 O5* G 19 23.883 -12.376 5.982 1.00 0.00 O ATOM 581 C5* G 19 23.221 -13.652 6.066 1.00 0.00 C ATOM 582 C4* G 19 22.809 -14.197 4.694 1.00 0.00 C ATOM 583 O4* G 19 22.387 -13.150 3.808 1.00 0.00 O ATOM 584 C3* G 19 23.980 -14.852 3.954 1.00 0.00 C ATOM 585 O3* G 19 24.213 -16.220 4.336 1.00 0.00 O ATOM 586 C2* G 19 23.494 -14.774 2.513 1.00 0.00 C ATOM 587 O2* G 19 22.622 -15.866 2.197 1.00 0.00 O ATOM 588 C1* G 19 22.756 -13.437 2.447 1.00 0.00 C ATOM 589 N9 G 19 23.620 -12.369 1.877 1.00 0.00 N ATOM 590 C8 G 19 24.415 -11.464 2.507 1.00 0.00 C ATOM 591 N7 G 19 25.082 -10.639 1.773 1.00 0.00 N ATOM 592 C5 G 19 24.701 -11.024 0.485 1.00 0.00 C ATOM 593 C6 G 19 25.095 -10.496 -0.775 1.00 0.00 C ATOM 594 O6 G 19 25.871 -9.570 -1.004 1.00 0.00 O ATOM 595 N1 G 19 24.481 -11.170 -1.824 1.00 0.00 
N ATOM 596 C2 G 19 23.594 -12.221 -1.685 1.00 0.00 C ATOM 597 N2 G 19 23.107 -12.739 -2.815 1.00 0.00 N ATOM 598 N3 G 19 23.218 -12.722 -0.504 1.00 0.00 N ATOM 599 C4 G 19 23.805 -12.081 0.536 1.00 0.00 C ATOM 600 1H5* G 19 22.311 -13.546 6.644 1.00 0.00 H ATOM 601 2H5* G 19 23.894 -14.363 6.569 1.00 0.00 H ATOM 602 H4* G 19 21.997 -14.916 4.812 1.00 0.00 H ATOM 603 H3* G 19 24.887 -14.248 4.076 1.00 0.00 H ATOM 604 1H2* G 19 24.350 -14.760 1.833 1.00 0.00 H ATOM 605 2HO* G 19 22.486 -15.860 1.246 1.00 0.00 H ATOM 606 H1* G 19 21.854 -13.544 1.842 1.00 0.00 H ATOM 607 H8 G 19 24.490 -11.435 3.595 1.00 0.00 H ATOM 608 H1 G 19 24.716 -10.852 -2.751 1.00 0.00 H ATOM 609 1H2 G 19 22.456 -13.510 -2.777 1.00 0.00 H ATOM 610 2H2 G 19 23.392 -12.361 -3.708 1.00 0.00 H ATOM 611 P C 20 25.686 -16.880 4.233 1.00 0.00 P ATOM 612 O1P C 20 25.634 -18.216 4.870 1.00 0.00 O ATOM 613 O2P C 20 26.678 -15.885 4.698 1.00 0.00 O ATOM 614 O5* C 20 25.882 -17.081 2.640 1.00 0.00 O ATOM 615 C5* C 20 25.335 -18.223 1.964 1.00 0.00 C ATOM 616 C4* C 20 25.611 -18.194 0.451 1.00 0.00 C ATOM 617 O4* C 20 25.236 -16.940 -0.144 1.00 0.00 O ATOM 618 C3* C 20 27.103 -18.328 0.127 1.00 0.00 C ATOM 619 O3* C 20 27.549 -19.698 0.164 1.00 0.00 O ATOM 620 C2* C 20 27.121 -17.766 -1.290 1.00 0.00 C ATOM 621 O2* C 20 26.757 -18.765 -2.252 1.00 0.00 O ATOM 622 C1* C 20 26.097 -16.627 -1.254 1.00 0.00 C ATOM 623 N1 C 20 26.747 -15.292 -1.105 1.00 0.00 N ATOM 624 C2 C 20 27.208 -14.670 -2.264 1.00 0.00 C ATOM 625 O2 C 20 27.079 -15.220 -3.356 1.00 0.00 O ATOM 626 N3 C 20 27.800 -13.449 -2.153 1.00 0.00 N ATOM 627 C4 C 20 27.940 -12.856 -0.961 1.00 0.00 C ATOM 628 N4 C 20 28.526 -11.659 -0.895 1.00 0.00 N ATOM 629 C5 C 20 27.471 -13.484 0.235 1.00 0.00 C ATOM 630 C6 C 20 26.885 -14.693 0.120 1.00 0.00 C ATOM 631 1H5* C 20 24.256 -18.244 2.124 1.00 0.00 H ATOM 632 2H5* C 20 25.771 -19.130 2.387 1.00 0.00 H ATOM 633 H4* C 20 25.056 -18.999 -0.034 1.00 0.00 H ATOM 634 H3* C 20 27.698 -17.695 0.796 1.00 0.00 H 
ATOM 635 1H2* C 20 28.114 -17.367 -1.514 1.00 0.00 H ATOM 636 2HO* C 20 25.840 -18.999 -2.091 1.00 0.00 H ATOM 637 H1* C 20 25.508 -16.645 -2.175 1.00 0.00 H ATOM 638 1H4 C 20 28.640 -11.200 -0.004 1.00 0.00 H ATOM 639 2H4 C 20 28.858 -11.212 -1.738 1.00 0.00 H ATOM 640 H5 C 20 27.584 -13.002 1.207 1.00 0.00 H ATOM 641 H6 C 20 26.515 -15.194 1.013 1.00 0.00 H ATOM 642 P C 21 29.106 -20.075 0.387 1.00 0.00 P ATOM 643 O1P C 21 29.192 -21.528 0.664 1.00 0.00 O ATOM 644 O2P C 21 29.688 -19.111 1.347 1.00 0.00 O ATOM 645 O5* C 21 29.757 -19.794 -1.065 1.00 0.00 O ATOM 646 C5* C 21 29.562 -20.704 -2.159 1.00 0.00 C ATOM 647 C4* C 21 29.991 -20.090 -3.501 1.00 0.00 C ATOM 648 O4* C 21 29.684 -18.689 -3.574 1.00 0.00 O ATOM 649 C3* C 21 31.505 -20.153 -3.707 1.00 0.00 C ATOM 650 O3* C 21 31.915 -21.419 -4.241 1.00 0.00 O ATOM 651 C2* C 21 31.720 -19.028 -4.710 1.00 0.00 C ATOM 652 O2* C 21 31.499 -19.481 -6.052 1.00 0.00 O ATOM 653 C1* C 21 30.691 -17.968 -4.305 1.00 0.00 C ATOM 654 N1 C 21 31.301 -16.871 -3.494 1.00 0.00 N ATOM 655 C2 C 21 31.856 -15.796 -4.186 1.00 0.00 C ATOM 656 O2 C 21 31.842 -15.776 -5.415 1.00 0.00 O ATOM 657 N3 C 21 32.408 -14.779 -3.469 1.00 0.00 N ATOM 658 C4 C 21 32.423 -14.807 -2.131 1.00 0.00 C ATOM 659 N4 C 21 32.976 -13.791 -1.465 1.00 0.00 N ATOM 660 C5 C 21 31.858 -15.906 -1.409 1.00 0.00 C ATOM 661 C6 C 21 31.311 -16.910 -2.123 1.00 0.00 C ATOM 662 1H5* C 21 28.504 -20.970 -2.214 1.00 0.00 H ATOM 663 2H5* C 21 30.143 -21.610 -1.978 1.00 0.00 H ATOM 664 H4* C 21 29.485 -20.610 -4.318 1.00 0.00 H ATOM 665 H3* C 21 32.032 -19.932 -2.770 1.00 0.00 H ATOM 666 1H2* C 21 32.733 -18.626 -4.605 1.00 0.00 H ATOM 667 2HO* C 21 31.753 -18.767 -6.642 1.00 0.00 H ATOM 668 H1* C 21 30.237 -17.545 -5.205 1.00 0.00 H ATOM 669 1H4 C 21 32.994 -13.798 -0.455 1.00 0.00 H ATOM 670 2H4 C 21 33.376 -13.013 -1.971 1.00 0.00 H ATOM 671 H5 C 21 31.869 -15.929 -0.318 1.00 0.00 H ATOM 672 H6 C 21 30.873 -17.760 -1.601 1.00 0.00 H ATOM 673 H3T C 21 32.875 -21.422 
-4.267 1.00 0.00 H TER 674 C 21 ENDMDL MODEL 2 ATOM 675 O5* G 1 37.008 -12.325 -11.644 1.00 0.00 O ATOM 676 C5* G 1 35.778 -11.791 -11.135 1.00 0.00 C ATOM 677 C4* G 1 34.582 -12.698 -11.437 1.00 0.00 C ATOM 678 O4* G 1 34.720 -13.979 -10.804 1.00 0.00 O ATOM 679 C3* G 1 33.275 -12.133 -10.875 1.00 0.00 C ATOM 680 O3* G 1 32.656 -11.221 -11.801 1.00 0.00 O ATOM 681 C2* G 1 32.443 -13.395 -10.683 1.00 0.00 C ATOM 682 O2* G 1 31.758 -13.753 -11.891 1.00 0.00 O ATOM 683 C1* G 1 33.467 -14.461 -10.291 1.00 0.00 C ATOM 684 N9 G 1 33.522 -14.660 -8.818 1.00 0.00 N ATOM 685 C8 G 1 34.370 -14.124 -7.898 1.00 0.00 C ATOM 686 N7 G 1 34.209 -14.467 -6.663 1.00 0.00 N ATOM 687 C5 G 1 33.121 -15.342 -6.745 1.00 0.00 C ATOM 688 C6 G 1 32.453 -16.062 -5.716 1.00 0.00 C ATOM 689 O6 G 1 32.698 -16.070 -4.511 1.00 0.00 O ATOM 690 N1 G 1 31.408 -16.828 -6.223 1.00 0.00 N ATOM 691 C2 G 1 31.047 -16.895 -7.556 1.00 0.00 C ATOM 692 N2 G 1 30.014 -17.687 -7.852 1.00 0.00 N ATOM 693 N3 G 1 31.669 -16.220 -8.527 1.00 0.00 N ATOM 694 C4 G 1 32.692 -15.467 -8.059 1.00 0.00 C ATOM 695 1H5* G 1 35.579 -10.837 -11.606 1.00 0.00 H ATOM 696 2H5* G 1 35.887 -11.650 -10.048 1.00 0.00 H ATOM 697 H4* G 1 34.492 -12.838 -12.516 1.00 0.00 H ATOM 698 H3* G 1 33.455 -11.650 -9.907 1.00 0.00 H ATOM 699 1H2* G 1 31.730 -13.248 -9.869 1.00 0.00 H ATOM 700 2HO* G 1 30.920 -14.147 -11.638 1.00 0.00 H ATOM 701 H1* G 1 33.214 -15.405 -10.780 1.00 0.00 H ATOM 702 H8 G 1 35.160 -13.430 -8.191 1.00 0.00 H ATOM 703 H1 G 1 30.887 -17.371 -5.548 1.00 0.00 H ATOM 704 1H2 G 1 29.705 -17.775 -8.809 1.00 0.00 H ATOM 705 2H2 G 1 29.543 -18.197 -7.120 1.00 0.00 H ATOM 706 H5T G 1 36.972 -12.266 -12.601 1.00 0.00 H ATOM 707 P G 2 31.597 -10.105 -11.305 1.00 0.00 P ATOM 708 O1P G 2 31.385 -9.148 -12.416 1.00 0.00 O ATOM 709 O2P G 2 32.027 -9.603 -9.980 1.00 0.00 O ATOM 710 O5* G 2 30.241 -10.961 -11.103 1.00 0.00 O ATOM 711 C5* G 2 29.403 -11.311 -12.220 1.00 0.00 C ATOM 712 C4* G 2 28.254 -12.246 -11.831 1.00 0.00 C ATOM 
713 O4* G 2 28.692 -13.298 -10.958 1.00 0.00 O ATOM 714 C3* G 2 27.167 -11.521 -11.033 1.00 0.00 C ATOM 715 O3* G 2 26.224 -10.863 -11.900 1.00 0.00 O ATOM 716 C2* G 2 26.527 -12.672 -10.268 1.00 0.00 C ATOM 717 O2* G 2 25.523 -13.325 -11.057 1.00 0.00 O ATOM 718 C1* G 2 27.697 -13.614 -9.968 1.00 0.00 C ATOM 719 N9 G 2 28.219 -13.421 -8.588 1.00 0.00 N ATOM 720 C8 G 2 29.283 -12.693 -8.154 1.00 0.00 C ATOM 721 N7 G 2 29.533 -12.693 -6.887 1.00 0.00 N ATOM 722 C5 G 2 28.519 -13.519 -6.392 1.00 0.00 C ATOM 723 C6 G 2 28.244 -13.918 -5.055 1.00 0.00 C ATOM 724 O6 G 2 28.853 -13.616 -4.030 1.00 0.00 O ATOM 725 N1 G 2 27.135 -14.753 -4.989 1.00 0.00 N ATOM 726 C2 G 2 26.378 -15.157 -6.074 1.00 0.00 C ATOM 727 N2 G 2 25.346 -15.961 -5.811 1.00 0.00 N ATOM 728 N3 G 2 26.631 -14.786 -7.334 1.00 0.00 N ATOM 729 C4 G 2 27.709 -13.971 -7.423 1.00 0.00 C ATOM 730 1H5* G 2 29.996 -11.836 -12.961 1.00 0.00 H ATOM 731 2H5* G 2 29.003 -10.384 -12.662 1.00 0.00 H ATOM 732 H4* G 2 27.820 -12.686 -12.731 1.00 0.00 H ATOM 733 H3* G 2 27.622 -10.811 -10.332 1.00 0.00 H ATOM 734 1H2* G 2 26.097 -12.302 -9.333 1.00 0.00 H ATOM 735 2HO* G 2 25.111 -13.992 -10.501 1.00 0.00 H ATOM 736 H1* G 2 27.375 -14.649 -10.099 1.00 0.00 H ATOM 737 H8 G 2 29.908 -12.132 -8.849 1.00 0.00 H ATOM 738 H1 G 2 26.879 -15.079 -4.068 1.00 0.00 H ATOM 739 1H2 G 2 24.757 -16.290 -6.562 1.00 0.00 H ATOM 740 2H2 G 2 25.155 -16.243 -4.859 1.00 0.00 H ATOM 741 P C 3 25.601 -9.419 -11.527 1.00 0.00 P ATOM 742 O1P C 3 25.067 -8.814 -12.769 1.00 0.00 O ATOM 743 O2P C 3 26.592 -8.675 -10.717 1.00 0.00 O ATOM 744 O5* C 3 24.357 -9.804 -10.570 1.00 0.00 O ATOM 745 C5* C 3 23.094 -10.207 -11.119 1.00 0.00 C ATOM 746 C4* C 3 22.246 -10.979 -10.096 1.00 0.00 C ATOM 747 O4* C 3 23.056 -11.826 -9.267 1.00 0.00 O ATOM 748 C3* C 3 21.535 -10.046 -9.110 1.00 0.00 C ATOM 749 O3* C 3 20.295 -9.540 -9.639 1.00 0.00 O ATOM 750 C2* C 3 21.302 -10.992 -7.937 1.00 0.00 C ATOM 751 O2* C 3 20.101 -11.756 -8.115 1.00 0.00 O ATOM 752 C1* C 
3 22.538 -11.897 -7.928 1.00 0.00 C ATOM 753 N1 C 3 23.542 -11.462 -6.911 1.00 0.00 N ATOM 754 C2 C 3 23.422 -11.988 -5.627 1.00 0.00 C ATOM 755 O2 C 3 22.515 -12.776 -5.364 1.00 0.00 O ATOM 756 N3 C 3 24.325 -11.608 -4.684 1.00 0.00 N ATOM 757 C4 C 3 25.310 -10.749 -4.974 1.00 0.00 C ATOM 758 N4 C 3 26.172 -10.399 -4.019 1.00 0.00 N ATOM 759 C5 C 3 25.444 -10.201 -6.290 1.00 0.00 C ATOM 760 C6 C 3 24.547 -10.580 -7.223 1.00 0.00 C ATOM 761 1H5* C 3 23.270 -10.847 -11.985 1.00 0.00 H ATOM 762 2H5* C 3 22.547 -9.319 -11.442 1.00 0.00 H ATOM 763 H4* C 3 21.509 -11.590 -10.621 1.00 0.00 H ATOM 764 H3* C 3 22.202 -9.229 -8.810 1.00 0.00 H ATOM 765 1H2* C 3 21.254 -10.421 -7.007 1.00 0.00 H ATOM 766 2HO* C 3 20.243 -12.346 -8.859 1.00 0.00 H ATOM 767 H1* C 3 22.232 -12.925 -7.721 1.00 0.00 H ATOM 768 1H4 C 3 26.918 -9.750 -4.223 1.00 0.00 H ATOM 769 2H4 C 3 26.078 -10.784 -3.088 1.00 0.00 H ATOM 770 H5 C 3 26.246 -9.501 -6.530 1.00 0.00 H ATOM 771 H6 C 3 24.625 -10.180 -8.233 1.00 0.00 H ATOM 772 P G 4 19.644 -8.163 -9.094 1.00 0.00 P ATOM 773 O1P G 4 18.604 -7.733 -10.057 1.00 0.00 O ATOM 774 O2P G 4 20.741 -7.233 -8.737 1.00 0.00 O ATOM 775 O5* G 4 18.912 -8.625 -7.728 1.00 0.00 O ATOM 776 C5* G 4 17.694 -9.392 -7.759 1.00 0.00 C ATOM 777 C4* G 4 17.404 -10.094 -6.429 1.00 0.00 C ATOM 778 O4* G 4 18.608 -10.564 -5.802 1.00 0.00 O ATOM 779 C3* G 4 16.784 -9.148 -5.396 1.00 0.00 C ATOM 780 O3* G 4 15.354 -9.046 -5.548 1.00 0.00 O ATOM 781 C2* G 4 17.148 -9.851 -4.094 1.00 0.00 C ATOM 782 O2* G 4 16.192 -10.871 -3.771 1.00 0.00 O ATOM 783 C1* G 4 18.527 -10.452 -4.372 1.00 0.00 C ATOM 784 N9 G 4 19.622 -9.612 -3.818 1.00 0.00 N ATOM 785 C8 G 4 20.416 -8.692 -4.429 1.00 0.00 C ATOM 786 N7 G 4 21.319 -8.109 -3.711 1.00 0.00 N ATOM 787 C5 G 4 21.118 -8.699 -2.458 1.00 0.00 C ATOM 788 C6 G 4 21.800 -8.482 -1.229 1.00 0.00 C ATOM 789 O6 G 4 22.733 -7.713 -0.997 1.00 0.00 O ATOM 790 N1 G 4 21.285 -9.279 -0.212 1.00 0.00 N ATOM 791 C2 G 4 20.244 -10.175 -0.356 1.00 0.00 C ATOM 
792 N2 G 4 19.892 -10.852 0.738 1.00 0.00 N ATOM 793 N3 G 4 19.600 -10.385 -1.506 1.00 0.00 N ATOM 794 C4 G 4 20.082 -9.620 -2.513 1.00 0.00 C ATOM 795 1H5* G 4 17.777 -10.167 -8.510 1.00 0.00 H ATOM 796 2H5* G 4 16.864 -8.717 -8.020 1.00 0.00 H ATOM 797 H4* G 4 16.737 -10.941 -6.601 1.00 0.00 H ATOM 798 H3* G 4 17.262 -8.163 -5.442 1.00 0.00 H ATOM 799 1H2* G 4 17.216 -9.122 -3.284 1.00 0.00 H ATOM 800 2HO* G 4 16.390 -11.175 -2.881 1.00 0.00 H ATOM 801 H1* G 4 18.578 -11.447 -3.932 1.00 0.00 H ATOM 802 H8 G 4 20.294 -8.449 -5.485 1.00 0.00 H ATOM 803 H1 G 4 21.718 -9.178 0.693 1.00 0.00 H ATOM 804 1H2 G 4 20.379 -10.694 1.610 1.00 0.00 H ATOM 805 2H2 G 4 19.138 -11.524 0.697 1.00 0.00 H ATOM 806 P U 5 14.561 -7.681 -5.200 1.00 0.00 P ATOM 807 O1P U 5 13.126 -7.896 -5.497 1.00 0.00 O ATOM 808 O2P U 5 15.277 -6.549 -5.832 1.00 0.00 O ATOM 809 O5* U 5 14.740 -7.557 -3.598 1.00 0.00 O ATOM 810 C5* U 5 13.691 -7.942 -2.694 1.00 0.00 C ATOM 811 C4* U 5 13.954 -7.447 -1.263 1.00 0.00 C ATOM 812 O4* U 5 15.270 -7.801 -0.808 1.00 0.00 O ATOM 813 C3* U 5 13.921 -5.918 -1.168 1.00 0.00 C ATOM 814 O3* U 5 12.578 -5.438 -0.963 1.00 0.00 O ATOM 815 C2* U 5 14.806 -5.663 0.046 1.00 0.00 C ATOM 816 O2* U 5 14.052 -5.744 1.264 1.00 0.00 O ATOM 817 C1* U 5 15.869 -6.760 -0.017 1.00 0.00 C ATOM 818 N1 U 5 17.145 -6.276 -0.615 1.00 0.00 N ATOM 819 C2 U 5 18.017 -5.575 0.208 1.00 0.00 C ATOM 820 O2 U 5 17.752 -5.322 1.383 1.00 0.00 O ATOM 821 N3 U 5 19.208 -5.171 -0.369 1.00 0.00 N ATOM 822 C4 U 5 19.604 -5.402 -1.677 1.00 0.00 C ATOM 823 O4 U 5 20.693 -4.997 -2.083 1.00 0.00 O ATOM 824 C5 U 5 18.638 -6.137 -2.463 1.00 0.00 C ATOM 825 C6 U 5 17.463 -6.542 -1.923 1.00 0.00 C ATOM 826 1H5* U 5 13.616 -9.031 -2.684 1.00 0.00 H ATOM 827 2H5* U 5 12.744 -7.527 -3.047 1.00 0.00 H ATOM 828 H4* U 5 13.212 -7.875 -0.585 1.00 0.00 H ATOM 829 H3* U 5 14.368 -5.472 -2.063 1.00 0.00 H ATOM 830 1H2* U 5 15.277 -4.682 -0.042 1.00 0.00 H ATOM 831 2HO* U 5 13.710 -6.639 1.331 1.00 0.00 H ATOM 832 H1* 
U 5 16.058 -7.137 0.990 1.00 0.00 H ATOM 833 H3 U 5 19.849 -4.657 0.218 1.00 0.00 H ATOM 834 H5 U 5 18.863 -6.384 -3.502 1.00 0.00 H ATOM 835 H6 U 5 16.755 -7.083 -2.546 1.00 0.00 H ATOM 836 P A 6 12.187 -3.888 -1.211 1.00 0.00 P ATOM 837 O1P A 6 10.715 -3.766 -1.112 1.00 0.00 O ATOM 838 O2P A 6 12.880 -3.421 -2.433 1.00 0.00 O ATOM 839 O5* A 6 12.848 -3.140 0.060 1.00 0.00 O ATOM 840 C5* A 6 12.211 -3.149 1.348 1.00 0.00 C ATOM 841 C4* A 6 13.092 -2.495 2.425 1.00 0.00 C ATOM 842 O4* A 6 14.452 -2.942 2.339 1.00 0.00 O ATOM 843 C3* A 6 13.164 -0.974 2.252 1.00 0.00 C ATOM 844 O3* A 6 12.183 -0.340 3.092 1.00 0.00 O ATOM 845 C2* A 6 14.587 -0.603 2.659 1.00 0.00 C ATOM 846 O2* A 6 14.605 0.044 3.937 1.00 0.00 O ATOM 847 C1* A 6 15.381 -1.908 2.702 1.00 0.00 C ATOM 848 N9 A 6 16.530 -1.858 1.766 1.00 0.00 N ATOM 849 C8 A 6 16.552 -2.015 0.421 1.00 0.00 C ATOM 850 N7 A 6 17.692 -1.914 -0.176 1.00 0.00 N ATOM 851 C5 A 6 18.551 -1.655 0.899 1.00 0.00 C ATOM 852 C6 A 6 19.931 -1.434 0.977 1.00 0.00 C ATOM 853 N6 A 6 20.734 -1.440 -0.089 1.00 0.00 N ATOM 854 N1 A 6 20.454 -1.205 2.195 1.00 0.00 N ATOM 855 C2 A 6 19.670 -1.195 3.275 1.00 0.00 C ATOM 856 N3 A 6 18.355 -1.391 3.316 1.00 0.00 N ATOM 857 C4 A 6 17.854 -1.618 2.084 1.00 0.00 C ATOM 858 1H5* A 6 12.007 -4.181 1.637 1.00 0.00 H ATOM 859 2H5* A 6 11.265 -2.607 1.283 1.00 0.00 H ATOM 860 H4* A 6 12.697 -2.734 3.414 1.00 0.00 H ATOM 861 H3* A 6 12.998 -0.710 1.203 1.00 0.00 H ATOM 862 1H2* A 6 15.020 0.055 1.901 1.00 0.00 H ATOM 863 2HO* A 6 14.170 -0.543 4.561 1.00 0.00 H ATOM 864 H1* A 6 15.744 -2.083 3.716 1.00 0.00 H ATOM 865 H8 A 6 15.636 -2.206 -0.137 1.00 0.00 H ATOM 866 1H6 A 6 21.723 -1.274 0.023 1.00 0.00 H ATOM 867 2H6 A 6 20.351 -1.610 -1.008 1.00 0.00 H ATOM 868 H2 A 6 20.162 -1.003 4.230 1.00 0.00 H ATOM 869 P A 7 11.764 1.207 2.880 1.00 0.00 P ATOM 870 O1P A 7 10.422 1.404 3.472 1.00 0.00 O ATOM 871 O2P A 7 12.003 1.572 1.465 1.00 0.00 O ATOM 872 O5* A 7 12.838 1.986 3.804 1.00 0.00 O ATOM 873 C5* A 
7 13.666 3.034 3.277 1.00 0.00 C ATOM 874 C4* A 7 14.900 3.262 4.164 1.00 0.00 C ATOM 875 O4* A 7 15.877 2.226 3.983 1.00 0.00 O ATOM 876 C3* A 7 15.645 4.552 3.803 1.00 0.00 C ATOM 877 O3* A 7 15.097 5.691 4.495 1.00 0.00 O ATOM 878 C2* A 7 17.058 4.234 4.276 1.00 0.00 C ATOM 879 O2* A 7 17.221 4.541 5.668 1.00 0.00 O ATOM 880 C1* A 7 17.219 2.735 4.019 1.00 0.00 C ATOM 881 N9 A 7 17.943 2.467 2.747 1.00 0.00 N ATOM 882 C8 A 7 17.459 2.277 1.491 1.00 0.00 C ATOM 883 N7 A 7 18.315 2.068 0.547 1.00 0.00 N ATOM 884 C5 A 7 19.527 2.120 1.243 1.00 0.00 C ATOM 885 C6 A 7 20.861 1.974 0.838 1.00 0.00 C ATOM 886 N6 A 7 21.219 1.739 -0.425 1.00 0.00 N ATOM 887 N1 A 7 21.811 2.082 1.784 1.00 0.00 N ATOM 888 C2 A 7 21.477 2.318 3.055 1.00 0.00 C ATOM 889 N3 A 7 20.250 2.473 3.547 1.00 0.00 N ATOM 890 C4 A 7 19.313 2.361 2.580 1.00 0.00 C ATOM 891 1H5* A 7 13.083 3.955 3.226 1.00 0.00 H ATOM 892 2H5* A 7 13.992 2.762 2.272 1.00 0.00 H ATOM 893 H4* A 7 14.597 3.291 5.212 1.00 0.00 H ATOM 894 H3* A 7 15.635 4.706 2.717 1.00 0.00 H ATOM 895 1H2* A 7 17.782 4.792 3.676 1.00 0.00 H ATOM 896 2HO* A 7 16.609 3.987 6.157 1.00 0.00 H ATOM 897 H1* A 7 17.756 2.278 4.851 1.00 0.00 H ATOM 898 H8 A 7 16.387 2.290 1.287 1.00 0.00 H ATOM 899 1H6 A 7 22.196 1.643 -0.663 1.00 0.00 H ATOM 900 2H6 A 7 20.513 1.657 -1.143 1.00 0.00 H ATOM 901 H2 A 7 22.299 2.386 3.768 1.00 0.00 H ATOM 902 P G 8 15.447 7.203 4.041 1.00 0.00 P ATOM 903 O1P G 8 14.403 8.100 4.587 1.00 0.00 O ATOM 904 O2P G 8 15.726 7.206 2.587 1.00 0.00 O ATOM 905 O5* G 8 16.832 7.501 4.820 1.00 0.00 O ATOM 906 C5* G 8 16.850 7.784 6.232 1.00 0.00 C ATOM 907 C4* G 8 18.238 7.596 6.853 1.00 0.00 C ATOM 908 O4* G 8 18.934 6.484 6.270 1.00 0.00 O ATOM 909 C3* G 8 19.154 8.797 6.596 1.00 0.00 C ATOM 910 O3* G 8 18.995 9.830 7.589 1.00 0.00 O ATOM 911 C2* G 8 20.534 8.156 6.693 1.00 0.00 C ATOM 912 O2* G 8 20.992 8.107 8.051 1.00 0.00 O ATOM 913 C1* G 8 20.338 6.751 6.124 1.00 0.00 C ATOM 914 N9 G 8 20.771 6.671 4.703 1.00 0.00 N ATOM 
915 C8 G 8 20.045 6.813 3.561 1.00 0.00 C ATOM 916 N7 G 8 20.673 6.707 2.436 1.00 0.00 N ATOM 917 C5 G 8 21.984 6.462 2.858 1.00 0.00 C ATOM 918 C6 G 8 23.163 6.253 2.090 1.00 0.00 C ATOM 919 O6 G 8 23.284 6.243 0.867 1.00 0.00 O ATOM 920 N1 G 8 24.271 6.042 2.904 1.00 0.00 N ATOM 921 C2 G 8 24.251 6.032 4.287 1.00 0.00 C ATOM 922 N2 G 8 25.415 5.807 4.898 1.00 0.00 N ATOM 923 N3 G 8 23.147 6.230 5.013 1.00 0.00 N ATOM 924 C4 G 8 22.054 6.438 4.243 1.00 0.00 C ATOM 925 1H5* G 8 16.184 7.101 6.745 1.00 0.00 H ATOM 926 2H5* G 8 16.503 8.818 6.385 1.00 0.00 H ATOM 927 H4* G 8 18.138 7.432 7.928 1.00 0.00 H ATOM 928 H3* G 8 18.991 9.190 5.586 1.00 0.00 H ATOM 929 1H2* G 8 21.242 8.710 6.074 1.00 0.00 H ATOM 930 2HO* G 8 20.386 7.545 8.538 1.00 0.00 H ATOM 931 H1* G 8 20.907 6.036 6.721 1.00 0.00 H ATOM 932 H8 G 8 18.973 7.005 3.590 1.00 0.00 H ATOM 933 H1 G 8 25.150 5.891 2.427 1.00 0.00 H ATOM 934 1H2 G 8 25.460 5.785 5.906 1.00 0.00 H ATOM 935 2H2 G 8 26.250 5.655 4.351 1.00 0.00 H ATOM 936 P G 9 19.371 11.371 7.273 1.00 0.00 P ATOM 937 O1P G 9 18.904 12.198 8.410 1.00 0.00 O ATOM 938 O2P G 9 18.919 11.692 5.901 1.00 0.00 O ATOM 939 O5* G 9 20.988 11.363 7.282 1.00 0.00 O ATOM 940 C5* G 9 21.729 11.337 8.516 1.00 0.00 C ATOM 941 C4* G 9 23.200 10.954 8.316 1.00 0.00 C ATOM 942 O4* G 9 23.358 9.961 7.292 1.00 0.00 O ATOM 943 C3* G 9 24.044 12.132 7.832 1.00 0.00 C ATOM 944 O3* G 9 24.504 12.988 8.895 1.00 0.00 O ATOM 945 C2* G 9 25.211 11.387 7.193 1.00 0.00 C ATOM 946 O2* G 9 26.160 10.963 8.181 1.00 0.00 O ATOM 947 C1* G 9 24.545 10.187 6.512 1.00 0.00 C ATOM 948 N9 G 9 24.217 10.466 5.085 1.00 0.00 N ATOM 949 C8 G 9 23.010 10.681 4.496 1.00 0.00 C ATOM 950 N7 G 9 22.992 10.916 3.226 1.00 0.00 N ATOM 951 C5 G 9 24.352 10.856 2.902 1.00 0.00 C ATOM 952 C6 G 9 24.996 11.033 1.646 1.00 0.00 C ATOM 953 O6 G 9 24.487 11.290 0.555 1.00 0.00 O ATOM 954 N1 G 9 26.374 10.890 1.755 1.00 0.00 N ATOM 955 C2 G 9 27.054 10.612 2.927 1.00 0.00 C ATOM 956 N2 G 9 28.381 10.509 2.832 
1.00 0.00 N ATOM 957 N3 G 9 26.456 10.446 4.111 1.00 0.00 N ATOM 958 C4 G 9 25.111 10.580 4.031 1.00 0.00 C ATOM 959 1H5* G 9 21.300 10.595 9.177 1.00 0.00 H ATOM 960 2H5* G 9 21.657 12.330 8.987 1.00 0.00 H ATOM 961 H4* G 9 23.608 10.573 9.251 1.00 0.00 H ATOM 962 H3* G 9 23.499 12.701 7.070 1.00 0.00 H ATOM 963 1H2* G 9 25.694 12.023 6.448 1.00 0.00 H ATOM 964 2HO* G 9 26.943 10.661 7.715 1.00 0.00 H ATOM 965 H1* G 9 25.200 9.313 6.581 1.00 0.00 H ATOM 966 H8 G 9 22.084 10.649 5.073 1.00 0.00 H ATOM 967 H1 G 9 26.900 11.006 0.898 1.00 0.00 H ATOM 968 1H2 G 9 28.931 10.305 3.654 1.00 0.00 H ATOM 969 2H2 G 9 28.835 10.632 1.939 1.00 0.00 H ATOM 970 P A 10 24.210 14.578 8.885 1.00 0.00 P ATOM 971 O1P A 10 24.448 15.101 10.250 1.00 0.00 O ATOM 972 O2P A 10 22.901 14.810 8.231 1.00 0.00 O ATOM 973 O5* A 10 25.366 15.154 7.911 1.00 0.00 O ATOM 974 C5* A 10 26.711 15.341 8.379 1.00 0.00 C ATOM 975 C4* A 10 27.739 15.068 7.275 1.00 0.00 C ATOM 976 O4* A 10 27.319 13.980 6.439 1.00 0.00 O ATOM 977 C3* A 10 27.893 16.256 6.319 1.00 0.00 C ATOM 978 O3* A 10 28.901 17.233 6.647 1.00 0.00 O ATOM 979 C2* A 10 28.363 15.549 5.059 1.00 0.00 C ATOM 980 O2* A 10 29.788 15.390 5.046 1.00 0.00 O ATOM 981 C1* A 10 27.655 14.208 5.062 1.00 0.00 C ATOM 982 N9 A 10 26.458 14.265 4.188 1.00 0.00 N ATOM 983 C8 A 10 25.159 14.007 4.467 1.00 0.00 C ATOM 984 N7 A 10 24.302 14.150 3.511 1.00 0.00 N ATOM 985 C5 A 10 25.123 14.559 2.455 1.00 0.00 C ATOM 986 C6 A 10 24.862 14.897 1.121 1.00 0.00 C ATOM 987 N6 A 10 23.641 14.870 0.585 1.00 0.00 N ATOM 988 N1 A 10 25.910 15.265 0.361 1.00 0.00 N ATOM 989 C2 A 10 27.141 15.300 0.879 1.00 0.00 C ATOM 990 N3 A 10 27.497 14.996 2.117 1.00 0.00 N ATOM 991 C4 A 10 26.434 14.632 2.860 1.00 0.00 C ATOM 992 1H5* A 10 26.899 14.662 9.209 1.00 0.00 H ATOM 993 2H5* A 10 26.825 16.367 8.730 1.00 0.00 H ATOM 994 H4* A 10 28.704 14.825 7.725 1.00 0.00 H ATOM 995 H3* A 10 26.920 16.733 6.143 1.00 0.00 H ATOM 996 1H2* A 10 28.038 16.114 4.190 1.00 0.00 H ATOM 997 
2HO* A 10 30.022 14.832 5.791 1.00 0.00 H ATOM 998 H1* A 10 28.334 13.427 4.714 1.00 0.00 H ATOM 999 H8 A 10 24.848 13.720 5.467 1.00 0.00 H ATOM 1000 1H6 A 10 23.509 15.125 -0.384 1.00 0.00 H ATOM 1001 2H6 A 10 22.849 14.595 1.147 1.00 0.00 H ATOM 1002 H2 A 10 27.943 15.645 0.219 1.00 0.00 H ATOM 1003 P U 11 28.812 18.764 6.118 1.00 0.00 P ATOM 1004 O1P U 11 30.180 19.331 6.126 1.00 0.00 O ATOM 1005 O2P U 11 27.736 19.447 6.871 1.00 0.00 O ATOM 1006 O5* U 11 28.328 18.615 4.578 1.00 0.00 O ATOM 1007 C5* U 11 29.263 18.459 3.494 1.00 0.00 C ATOM 1008 C4* U 11 28.940 19.378 2.299 1.00 0.00 C ATOM 1009 O4* U 11 27.773 18.954 1.580 1.00 0.00 O ATOM 1010 C3* U 11 28.599 20.813 2.722 1.00 0.00 C ATOM 1011 O3* U 11 28.865 21.847 1.767 1.00 0.00 O ATOM 1012 C2* U 11 27.084 20.814 2.869 1.00 0.00 C ATOM 1013 O2* U 11 26.535 22.124 2.678 1.00 0.00 O ATOM 1014 C1* U 11 26.667 19.857 1.751 1.00 0.00 C ATOM 1015 N1 U 11 25.405 19.136 2.076 1.00 0.00 N ATOM 1016 C2 U 11 24.329 19.311 1.216 1.00 0.00 C ATOM 1017 O2 U 11 24.398 20.021 0.213 1.00 0.00 O ATOM 1018 N3 U 11 23.165 18.639 1.550 1.00 0.00 N ATOM 1019 C4 U 11 22.980 17.820 2.651 1.00 0.00 C ATOM 1020 O4 U 11 21.896 17.271 2.850 1.00 0.00 O ATOM 1021 C5 U 11 24.148 17.691 3.492 1.00 0.00 C ATOM 1022 C6 U 11 25.302 18.336 3.190 1.00 0.00 C ATOM 1023 1H5* U 11 29.236 17.423 3.156 1.00 0.00 H ATOM 1024 2H5* U 11 30.267 18.682 3.849 1.00 0.00 H ATOM 1025 H4* U 11 29.789 19.390 1.610 1.00 0.00 H ATOM 1026 H3* U 11 29.063 21.050 3.677 1.00 0.00 H ATOM 1027 1H2* U 11 26.799 20.406 3.842 1.00 0.00 H ATOM 1028 2HO* U 11 25.592 22.066 2.848 1.00 0.00 H ATOM 1029 H1* U 11 26.534 20.425 0.826 1.00 0.00 H ATOM 1030 H3 U 11 22.374 18.759 0.931 1.00 0.00 H ATOM 1031 H5 U 11 24.099 17.067 4.386 1.00 0.00 H ATOM 1032 H6 U 11 26.165 18.216 3.844 1.00 0.00 H ATOM 1033 P U 12 30.365 22.352 1.474 1.00 0.00 P ATOM 1034 O1P U 12 31.310 21.554 2.289 1.00 0.00 O ATOM 1035 O2P U 12 30.381 23.829 1.564 1.00 0.00 O ATOM 1036 O5* U 12 30.538 21.941 
-0.073 1.00 0.00 O ATOM 1037 C5* U 12 31.344 20.830 -0.476 1.00 0.00 C ATOM 1038 C4* U 12 30.748 20.111 -1.694 1.00 0.00 C ATOM 1039 O4* U 12 29.536 20.722 -2.157 1.00 0.00 O ATOM 1040 C3* U 12 31.705 20.135 -2.878 1.00 0.00 C ATOM 1041 O3* U 12 31.712 19.032 -3.792 1.00 0.00 O ATOM 1042 C2* U 12 31.243 21.382 -3.630 1.00 0.00 C ATOM 1043 O2* U 12 31.513 21.273 -5.033 1.00 0.00 O ATOM 1044 C1* U 12 29.743 21.469 -3.368 1.00 0.00 C ATOM 1045 N1 U 12 29.291 22.881 -3.225 1.00 0.00 N ATOM 1046 C2 U 12 28.934 23.562 -4.381 1.00 0.00 C ATOM 1047 O2 U 12 28.991 23.041 -5.495 1.00 0.00 O ATOM 1048 N3 U 12 28.512 24.870 -4.213 1.00 0.00 N ATOM 1049 C4 U 12 28.414 25.550 -3.010 1.00 0.00 C ATOM 1050 O4 U 12 28.024 26.716 -2.978 1.00 0.00 O ATOM 1051 C5 U 12 28.805 24.768 -1.857 1.00 0.00 C ATOM 1052 C6 U 12 29.223 23.486 -1.993 1.00 0.00 C ATOM 1053 1H5* U 12 31.417 20.123 0.350 1.00 0.00 H ATOM 1054 2H5* U 12 32.337 21.183 -0.719 1.00 0.00 H ATOM 1055 H4* U 12 30.540 19.074 -1.425 1.00 0.00 H ATOM 1056 H3* U 12 32.714 20.307 -2.500 1.00 0.00 H ATOM 1057 1H2* U 12 31.729 22.258 -3.222 1.00 0.00 H ATOM 1058 2HO* U 12 32.366 21.685 -5.193 1.00 0.00 H ATOM 1059 H1* U 12 29.201 20.993 -4.184 1.00 0.00 H ATOM 1060 H3 U 12 28.249 25.377 -5.047 1.00 0.00 H ATOM 1061 H5 U 12 28.763 25.218 -0.864 1.00 0.00 H ATOM 1062 H6 U 12 29.517 22.924 -1.105 1.00 0.00 H ATOM 1063 P A 13 33.089 18.602 -4.523 1.00 0.00 P ATOM 1064 O1P A 13 33.745 17.561 -3.700 1.00 0.00 O ATOM 1065 O2P A 13 33.835 19.830 -4.881 1.00 0.00 O ATOM 1066 O5* A 13 32.559 17.931 -5.890 1.00 0.00 O ATOM 1067 C5* A 13 32.848 16.569 -6.220 1.00 0.00 C ATOM 1068 C4* A 13 32.210 15.588 -5.224 1.00 0.00 C ATOM 1069 O4* A 13 31.454 16.258 -4.215 1.00 0.00 O ATOM 1070 C3* A 13 31.229 14.633 -5.904 1.00 0.00 C ATOM 1071 O3* A 13 31.260 13.370 -5.229 1.00 0.00 O ATOM 1072 C2* A 13 29.888 15.340 -5.756 1.00 0.00 C ATOM 1073 O2* A 13 28.797 14.411 -5.753 1.00 0.00 O ATOM 1074 C1* A 13 30.062 16.013 -4.393 1.00 0.00 C ATOM 1075 N9 A 
13 29.280 17.261 -4.260 1.00 0.00 N ATOM 1076 C8 A 13 29.030 18.262 -5.142 1.00 0.00 C ATOM 1077 N7 A 13 28.209 19.188 -4.770 1.00 0.00 N ATOM 1078 C5 A 13 27.885 18.771 -3.476 1.00 0.00 C ATOM 1079 C6 A 13 27.062 19.304 -2.476 1.00 0.00 C ATOM 1080 N6 A 13 26.356 20.425 -2.629 1.00 0.00 N ATOM 1081 N1 A 13 26.999 18.633 -1.315 1.00 0.00 N ATOM 1082 C2 A 13 27.702 17.509 -1.151 1.00 0.00 C ATOM 1083 N3 A 13 28.490 16.925 -2.024 1.00 0.00 N ATOM 1084 C4 A 13 28.541 17.608 -3.169 1.00 0.00 C ATOM 1085 1H5* A 13 33.931 16.424 -6.218 1.00 0.00 H ATOM 1086 2H5* A 13 32.471 16.364 -7.221 1.00 0.00 H ATOM 1087 H4* A 13 32.986 15.009 -4.730 1.00 0.00 H ATOM 1088 H3* A 13 31.481 14.517 -6.962 1.00 0.00 H ATOM 1089 1H2* A 13 29.767 16.088 -6.540 1.00 0.00 H ATOM 1090 2HO* A 13 27.985 14.922 -5.797 1.00 0.00 H ATOM 1091 H1* A 13 29.736 15.316 -3.619 1.00 0.00 H ATOM 1092 H8 A 13 29.582 18.344 -6.068 1.00 0.00 H ATOM 1093 1H6 A 13 25.778 20.764 -1.872 1.00 0.00 H ATOM 1094 2H6 A 13 26.397 20.936 -3.499 1.00 0.00 H ATOM 1095 H2 A 13 27.656 17.039 -0.178 1.00 0.00 H ATOM 1096 P C 14 31.030 11.984 -6.021 1.00 0.00 P ATOM 1097 O1P C 14 31.972 11.929 -7.163 1.00 0.00 O ATOM 1098 O2P C 14 29.579 11.813 -6.253 1.00 0.00 O ATOM 1099 O5* C 14 31.496 10.901 -4.923 1.00 0.00 O ATOM 1100 C5* C 14 32.862 10.835 -4.492 1.00 0.00 C ATOM 1101 C4* C 14 33.022 11.139 -2.994 1.00 0.00 C ATOM 1102 O4* C 14 32.307 12.320 -2.612 1.00 0.00 O ATOM 1103 C3* C 14 32.430 10.031 -2.122 1.00 0.00 C ATOM 1104 O3* C 14 33.398 8.990 -1.904 1.00 0.00 O ATOM 1105 C2* C 14 32.093 10.783 -0.841 1.00 0.00 C ATOM 1106 O2* C 14 33.228 10.854 0.032 1.00 0.00 O ATOM 1107 C1* C 14 31.678 12.176 -1.328 1.00 0.00 C ATOM 1108 N1 C 14 30.202 12.319 -1.441 1.00 0.00 N ATOM 1109 C2 C 14 29.447 12.280 -0.272 1.00 0.00 C ATOM 1110 O2 C 14 29.995 12.119 0.819 1.00 0.00 O ATOM 1111 N3 C 14 28.101 12.439 -0.369 1.00 0.00 N ATOM 1112 C4 C 14 27.510 12.635 -1.555 1.00 0.00 C ATOM 1113 N4 C 14 26.183 12.760 -1.603 1.00 0.00 N ATOM 
1114 C5 C 14 28.282 12.684 -2.763 1.00 0.00 C ATOM 1115 C6 C 14 29.615 12.514 -2.658 1.00 0.00 C ATOM 1116 1H5* C 14 33.452 11.553 -5.065 1.00 0.00 H ATOM 1117 2H5* C 14 33.235 9.837 -4.691 1.00 0.00 H ATOM 1118 H4* C 14 34.081 11.268 -2.759 1.00 0.00 H ATOM 1119 H3* C 14 31.517 9.634 -2.579 1.00 0.00 H ATOM 1120 1H2* C 14 31.254 10.295 -0.339 1.00 0.00 H ATOM 1121 2HO* C 14 32.922 11.211 0.870 1.00 0.00 H ATOM 1122 H1* C 14 32.066 12.937 -0.649 1.00 0.00 H ATOM 1123 1H4 C 14 25.720 12.890 -2.489 1.00 0.00 H ATOM 1124 2H4 C 14 25.640 12.718 -0.751 1.00 0.00 H ATOM 1125 H5 C 14 27.815 12.905 -3.727 1.00 0.00 H ATOM 1126 H6 C 14 30.233 12.508 -3.558 1.00 0.00 H ATOM 1127 P C 15 32.984 7.429 -1.947 1.00 0.00 P ATOM 1128 O1P C 15 34.217 6.631 -2.132 1.00 0.00 O ATOM 1129 O2P C 15 31.858 7.267 -2.896 1.00 0.00 O ATOM 1130 O5* C 15 32.426 7.167 -0.454 1.00 0.00 O ATOM 1131 C5* C 15 33.306 7.165 0.681 1.00 0.00 C ATOM 1132 C4* C 15 32.536 7.006 2.001 1.00 0.00 C ATOM 1133 O4* C 15 31.494 7.986 2.129 1.00 0.00 O ATOM 1134 C3* C 15 31.811 5.661 2.084 1.00 0.00 C ATOM 1135 O3* C 15 32.663 4.622 2.598 1.00 0.00 O ATOM 1136 C2* C 15 30.681 5.984 3.053 1.00 0.00 C ATOM 1137 O2* C 15 31.107 5.831 4.414 1.00 0.00 O ATOM 1138 C1* C 15 30.317 7.442 2.750 1.00 0.00 C ATOM 1139 N1 C 15 29.122 7.547 1.862 1.00 0.00 N ATOM 1140 C2 C 15 27.889 7.166 2.390 1.00 0.00 C ATOM 1141 O2 C 15 27.809 6.774 3.552 1.00 0.00 O ATOM 1142 N3 C 15 26.789 7.250 1.594 1.00 0.00 N ATOM 1143 C4 C 15 26.881 7.689 0.333 1.00 0.00 C ATOM 1144 N4 C 15 25.779 7.756 -0.416 1.00 0.00 N ATOM 1145 C5 C 15 28.141 8.085 -0.219 1.00 0.00 C ATOM 1146 C6 C 15 29.230 8.000 0.573 1.00 0.00 C ATOM 1147 1H5* C 15 33.858 8.105 0.706 1.00 0.00 H ATOM 1148 2H5* C 15 34.015 6.341 0.581 1.00 0.00 H ATOM 1149 H4* C 15 33.227 7.106 2.841 1.00 0.00 H ATOM 1150 H3* C 15 31.400 5.389 1.104 1.00 0.00 H ATOM 1151 1H2* C 15 29.826 5.334 2.848 1.00 0.00 H ATOM 1152 2HO* C 15 30.330 5.925 4.969 1.00 0.00 H ATOM 1153 H1* C 15 30.127 7.972 
3.687 1.00 0.00 H ATOM 1154 1H4 C 15 25.835 8.084 -1.370 1.00 0.00 H ATOM 1155 2H4 C 15 24.888 7.476 -0.031 1.00 0.00 H ATOM 1156 H5 C 15 28.218 8.445 -1.247 1.00 0.00 H ATOM 1157 H6 C 15 30.201 8.306 0.184 1.00 0.00 H ATOM 1158 P U 16 32.481 3.079 2.148 1.00 0.00 P ATOM 1159 O1P U 16 33.543 2.281 2.801 1.00 0.00 O ATOM 1160 O2P U 16 32.335 3.037 0.675 1.00 0.00 O ATOM 1161 O5* U 16 31.061 2.672 2.807 1.00 0.00 O ATOM 1162 C5* U 16 30.943 2.388 4.209 1.00 0.00 C ATOM 1163 C4* U 16 29.476 2.232 4.643 1.00 0.00 C ATOM 1164 O4* U 16 28.611 3.142 3.944 1.00 0.00 O ATOM 1165 C3* U 16 28.916 0.848 4.304 1.00 0.00 C ATOM 1166 O3* U 16 29.263 -0.099 5.331 1.00 0.00 O ATOM 1167 C2* U 16 27.419 1.134 4.239 1.00 0.00 C ATOM 1168 O2* U 16 26.823 1.075 5.541 1.00 0.00 O ATOM 1169 C1* U 16 27.333 2.547 3.658 1.00 0.00 C ATOM 1170 N1 U 16 27.040 2.530 2.195 1.00 0.00 N ATOM 1171 C2 U 16 25.709 2.427 1.812 1.00 0.00 C ATOM 1172 O2 U 16 24.794 2.353 2.631 1.00 0.00 O ATOM 1173 N3 U 16 25.464 2.408 0.449 1.00 0.00 N ATOM 1174 C4 U 16 26.416 2.482 -0.553 1.00 0.00 C ATOM 1175 O4 U 16 26.084 2.456 -1.738 1.00 0.00 O ATOM 1176 C5 U 16 27.775 2.588 -0.069 1.00 0.00 C ATOM 1177 C6 U 16 28.043 2.610 1.260 1.00 0.00 C ATOM 1178 1H5* U 16 31.393 3.204 4.777 1.00 0.00 H ATOM 1179 2H5* U 16 31.482 1.464 4.432 1.00 0.00 H ATOM 1180 H4* U 16 29.391 2.412 5.717 1.00 0.00 H ATOM 1181 H3* U 16 29.282 0.519 3.323 1.00 0.00 H ATOM 1182 1H2* U 16 26.932 0.424 3.564 1.00 0.00 H ATOM 1183 2HO* U 16 26.287 0.279 5.574 1.00 0.00 H ATOM 1184 H1* U 16 26.554 3.105 4.182 1.00 0.00 H ATOM 1185 H3 U 16 24.499 2.333 0.158 1.00 0.00 H ATOM 1186 H5 U 16 28.596 2.645 -0.783 1.00 0.00 H ATOM 1187 H6 U 16 29.078 2.699 1.589 1.00 0.00 H ATOM 1188 P A 17 29.128 -1.691 5.090 1.00 0.00 P ATOM 1189 O1P A 17 30.089 -2.375 5.985 1.00 0.00 O ATOM 1190 O2P A 17 29.170 -1.952 3.633 1.00 0.00 O ATOM 1191 O5* A 17 27.632 -2.003 5.614 1.00 0.00 O ATOM 1192 C5* A 17 27.278 -1.850 6.998 1.00 0.00 C ATOM 1193 C4* A 17 25.805 
-2.202 7.251 1.00 0.00 C ATOM 1194 O4* A 17 24.933 -1.552 6.313 1.00 0.00 O ATOM 1195 C3* A 17 25.537 -3.691 7.076 1.00 0.00 C ATOM 1196 O3* A 17 25.848 -4.377 8.302 1.00 0.00 O ATOM 1197 C2* A 17 24.052 -3.687 6.737 1.00 0.00 C ATOM 1198 O2* A 17 23.249 -3.687 7.926 1.00 0.00 O ATOM 1199 C1* A 17 23.841 -2.405 5.932 1.00 0.00 C ATOM 1200 N9 A 17 23.844 -2.667 4.467 1.00 0.00 N ATOM 1201 C8 A 17 24.786 -2.368 3.532 1.00 0.00 C ATOM 1202 N7 A 17 24.548 -2.718 2.311 1.00 0.00 N ATOM 1203 C5 A 17 23.294 -3.330 2.427 1.00 0.00 C ATOM 1204 C6 A 17 22.444 -3.936 1.494 1.00 0.00 C ATOM 1205 N6 A 17 22.737 -4.034 0.196 1.00 0.00 N ATOM 1206 N1 A 17 21.280 -4.436 1.948 1.00 0.00 N ATOM 1207 C2 A 17 20.966 -4.348 3.242 1.00 0.00 C ATOM 1208 N3 A 17 21.695 -3.799 4.210 1.00 0.00 N ATOM 1209 C4 A 17 22.857 -3.304 3.734 1.00 0.00 C ATOM 1210 1H5* A 17 27.451 -0.815 7.297 1.00 0.00 H ATOM 1211 2H5* A 17 27.911 -2.502 7.603 1.00 0.00 H ATOM 1212 H4* A 17 25.528 -1.907 8.258 1.00 0.00 H ATOM 1213 H3* A 17 26.119 -4.088 6.237 1.00 0.00 H ATOM 1214 1H2* A 17 23.812 -4.551 6.120 1.00 0.00 H ATOM 1215 2HO* A 17 22.338 -3.821 7.656 1.00 0.00 H ATOM 1216 H1* A 17 22.896 -1.943 6.222 1.00 0.00 H ATOM 1217 H8 A 17 25.704 -1.843 3.798 1.00 0.00 H ATOM 1218 1H6 A 17 22.086 -4.481 -0.435 1.00 0.00 H ATOM 1219 2H6 A 17 23.607 -3.662 -0.156 1.00 0.00 H ATOM 1220 H2 A 17 20.003 -4.767 3.538 1.00 0.00 H ATOM 1221 P U 18 26.579 -5.817 8.301 1.00 0.00 P ATOM 1222 O1P U 18 26.793 -6.227 9.708 1.00 0.00 O ATOM 1223 O2P U 18 27.729 -5.760 7.371 1.00 0.00 O ATOM 1224 O5* U 18 25.456 -6.782 7.657 1.00 0.00 O ATOM 1225 C5* U 18 24.254 -7.102 8.372 1.00 0.00 C ATOM 1226 C4* U 18 23.146 -7.595 7.431 1.00 0.00 C ATOM 1227 O4* U 18 22.934 -6.690 6.339 1.00 0.00 O ATOM 1228 C3* U 18 23.513 -8.921 6.759 1.00 0.00 C ATOM 1229 O3* U 18 23.167 -10.054 7.578 1.00 0.00 O ATOM 1230 C2* U 18 22.670 -8.867 5.493 1.00 0.00 C ATOM 1231 O2* U 18 21.344 -9.360 5.732 1.00 0.00 O ATOM 1232 C1* U 18 22.649 -7.384 5.114 1.00 
0.00 C ATOM 1233 N1 U 18 23.642 -7.059 4.047 1.00 0.00 N ATOM 1234 C2 U 18 23.311 -7.401 2.742 1.00 0.00 C ATOM 1235 O2 U 18 22.244 -7.945 2.458 1.00 0.00 O ATOM 1236 N3 U 18 24.246 -7.089 1.771 1.00 0.00 N ATOM 1237 C4 U 18 25.468 -6.473 1.981 1.00 0.00 C ATOM 1238 O4 U 18 26.225 -6.242 1.039 1.00 0.00 O ATOM 1239 C5 U 18 25.737 -6.149 3.366 1.00 0.00 C ATOM 1240 C6 U 18 24.838 -6.444 4.339 1.00 0.00 C ATOM 1241 1H5* U 18 23.903 -6.212 8.897 1.00 0.00 H ATOM 1242 2H5* U 18 24.473 -7.881 9.105 1.00 0.00 H ATOM 1243 H4* U 18 22.215 -7.704 7.990 1.00 0.00 H ATOM 1244 H3* U 18 24.579 -8.932 6.500 1.00 0.00 H ATOM 1245 1H2* U 18 23.153 -9.447 4.704 1.00 0.00 H ATOM 1246 2HO* U 18 21.407 -10.312 5.833 1.00 0.00 H ATOM 1247 H1* U 18 21.646 -7.113 4.775 1.00 0.00 H ATOM 1248 H3 U 18 24.014 -7.334 0.818 1.00 0.00 H ATOM 1249 H5 U 18 26.676 -5.662 3.631 1.00 0.00 H ATOM 1250 H6 U 18 25.068 -6.171 5.372 1.00 0.00 H ATOM 1251 P G 19 24.070 -11.395 7.584 1.00 0.00 P ATOM 1252 O1P G 19 23.816 -12.113 8.853 1.00 0.00 O ATOM 1253 O2P G 19 25.456 -11.030 7.213 1.00 0.00 O ATOM 1254 O5* G 19 23.435 -12.255 6.371 1.00 0.00 O ATOM 1255 C5* G 19 22.522 -13.339 6.622 1.00 0.00 C ATOM 1256 C4* G 19 21.910 -13.906 5.338 1.00 0.00 C ATOM 1257 O4* G 19 21.631 -12.876 4.378 1.00 0.00 O ATOM 1258 C3* G 19 22.872 -14.848 4.607 1.00 0.00 C ATOM 1259 O3* G 19 22.849 -16.193 5.123 1.00 0.00 O ATOM 1260 C2* G 19 22.306 -14.797 3.194 1.00 0.00 C ATOM 1261 O2* G 19 21.217 -15.718 3.035 1.00 0.00 O ATOM 1262 C1* G 19 21.836 -13.351 3.036 1.00 0.00 C ATOM 1263 N9 G 19 22.839 -12.528 2.309 1.00 0.00 N ATOM 1264 C8 G 19 23.791 -11.684 2.790 1.00 0.00 C ATOM 1265 N7 G 19 24.552 -11.088 1.933 1.00 0.00 N ATOM 1266 C5 G 19 24.065 -11.583 0.719 1.00 0.00 C ATOM 1267 C6 G 19 24.486 -11.309 -0.611 1.00 0.00 C ATOM 1268 O6 G 19 25.390 -10.564 -0.986 1.00 0.00 O ATOM 1269 N1 G 19 23.731 -12.014 -1.541 1.00 0.00 N ATOM 1270 C2 G 19 22.696 -12.877 -1.236 1.00 0.00 C ATOM 1271 N2 G 19 22.087 -13.463 -2.269 1.00 
0.00 N ATOM 1272 N3 G 19 22.295 -13.140 0.012 1.00 0.00 N ATOM 1273 C4 G 19 23.016 -12.464 0.937 1.00 0.00 C ATOM 1274 1H5* G 19 21.699 -12.982 7.229 1.00 0.00 H ATOM 1275 2H5* G 19 23.061 -14.131 7.167 1.00 0.00 H ATOM 1276 H4* G 19 20.985 -14.435 5.577 1.00 0.00 H ATOM 1277 H3* G 19 23.888 -14.436 4.620 1.00 0.00 H ATOM 1278 1H2* G 19 23.095 -15.012 2.471 1.00 0.00 H ATOM 1279 2HO* G 19 20.988 -15.731 2.103 1.00 0.00 H ATOM 1280 H1* G 19 20.886 -13.332 2.498 1.00 0.00 H ATOM 1281 H8 G 19 23.908 -11.512 3.860 1.00 0.00 H ATOM 1282 H1 G 19 23.976 -11.870 -2.509 1.00 0.00 H ATOM 1283 1H2 G 19 21.324 -14.104 -2.107 1.00 0.00 H ATOM 1284 2H2 G 19 22.390 -13.265 -3.212 1.00 0.00 H ATOM 1285 P C 20 24.094 -17.201 4.902 1.00 0.00 P ATOM 1286 O1P C 20 23.887 -18.383 5.769 1.00 0.00 O ATOM 1287 O2P C 20 25.351 -16.425 5.009 1.00 0.00 O ATOM 1288 O5* C 20 23.916 -17.661 3.361 1.00 0.00 O ATOM 1289 C5* C 20 23.025 -18.728 3.003 1.00 0.00 C ATOM 1290 C4* C 20 22.893 -18.887 1.479 1.00 0.00 C ATOM 1291 O4* C 20 22.785 -17.619 0.811 1.00 0.00 O ATOM 1292 C3* C 20 24.132 -19.534 0.852 1.00 0.00 C ATOM 1293 O3* C 20 24.116 -20.969 0.981 1.00 0.00 O ATOM 1294 C2* C 20 23.970 -19.088 -0.597 1.00 0.00 C ATOM 1295 O2* C 20 23.083 -19.960 -1.314 1.00 0.00 O ATOM 1296 C1* C 20 23.395 -17.672 -0.492 1.00 0.00 C ATOM 1297 N1 C 20 24.448 -16.628 -0.664 1.00 0.00 N ATOM 1298 C2 C 20 24.788 -16.267 -1.967 1.00 0.00 C ATOM 1299 O2 C 20 24.233 -16.804 -2.926 1.00 0.00 O ATOM 1300 N3 C 20 25.743 -15.314 -2.147 1.00 0.00 N ATOM 1301 C4 C 20 26.346 -14.734 -1.103 1.00 0.00 C ATOM 1302 N4 C 20 27.278 -13.806 -1.323 1.00 0.00 N ATOM 1303 C5 C 20 26.005 -15.096 0.239 1.00 0.00 C ATOM 1304 C6 C 20 25.058 -16.040 0.414 1.00 0.00 C ATOM 1305 1H5* C 20 22.039 -18.520 3.422 1.00 0.00 H ATOM 1306 2H5* C 20 23.399 -19.662 3.428 1.00 0.00 H ATOM 1307 H4* C 20 22.009 -19.486 1.252 1.00 0.00 H ATOM 1308 H3* C 20 25.046 -19.104 1.281 1.00 0.00 H ATOM 1309 1H2* C 20 24.948 -19.058 -1.085 1.00 0.00 H ATOM 1310 
2HO* C 20 22.206 -19.849 -0.939 1.00 0.00 H ATOM 1311 H1* C 20 22.621 -17.539 -1.252 1.00 0.00 H ATOM 1312 1H4 C 20 27.743 -13.361 -0.545 1.00 0.00 H ATOM 1313 2H4 C 20 27.521 -13.549 -2.269 1.00 0.00 H ATOM 1314 H5 C 20 26.495 -14.624 1.092 1.00 0.00 H ATOM 1315 H6 C 20 24.775 -16.334 1.424 1.00 0.00 H ATOM 1316 P C 21 25.478 -21.837 0.913 1.00 0.00 P ATOM 1317 O1P C 21 25.166 -23.216 1.355 1.00 0.00 O ATOM 1318 O2P C 21 26.555 -21.078 1.590 1.00 0.00 O ATOM 1319 O5* C 21 25.806 -21.872 -0.669 1.00 0.00 O ATOM 1320 C5* C 21 25.064 -22.713 -1.566 1.00 0.00 C ATOM 1321 C4* C 21 25.331 -22.359 -3.037 1.00 0.00 C ATOM 1322 O4* C 21 25.492 -20.944 -3.228 1.00 0.00 O ATOM 1323 C3* C 21 26.646 -22.955 -3.545 1.00 0.00 C ATOM 1324 O3* C 21 26.476 -24.311 -3.980 1.00 0.00 O ATOM 1325 C2* C 21 26.966 -22.031 -4.712 1.00 0.00 C ATOM 1326 O2* C 21 26.289 -22.448 -5.905 1.00 0.00 O ATOM 1327 C1* C 21 26.470 -20.658 -4.244 1.00 0.00 C ATOM 1328 N1 C 21 27.584 -19.805 -3.730 1.00 0.00 N ATOM 1329 C2 C 21 28.285 -19.039 -4.659 1.00 0.00 C ATOM 1330 O2 C 21 27.988 -19.086 -5.851 1.00 0.00 O ATOM 1331 N3 C 21 29.300 -18.247 -4.213 1.00 0.00 N ATOM 1332 C4 C 21 29.619 -18.201 -2.914 1.00 0.00 C ATOM 1333 N4 C 21 30.618 -17.412 -2.517 1.00 0.00 N ATOM 1334 C5 C 21 28.906 -18.983 -1.951 1.00 0.00 C ATOM 1335 C6 C 21 27.903 -19.767 -2.396 1.00 0.00 C ATOM 1336 1H5* C 21 23.999 -22.597 -1.362 1.00 0.00 H ATOM 1337 2H5* C 21 25.345 -23.754 -1.392 1.00 0.00 H ATOM 1338 H4* C 21 24.502 -22.712 -3.654 1.00 0.00 H ATOM 1339 H3* C 21 27.425 -22.882 -2.777 1.00 0.00 H ATOM 1340 1H2* C 21 28.048 -22.003 -4.876 1.00 0.00 H ATOM 1341 2HO* C 21 26.713 -23.255 -6.208 1.00 0.00 H ATOM 1342 H1* C 21 25.977 -20.150 -5.077 1.00 0.00 H ATOM 1343 1H4 C 21 30.870 -17.368 -1.539 1.00 0.00 H ATOM 1344 2H4 C 21 31.124 -16.859 -3.193 1.00 0.00 H ATOM 1345 H5 C 21 29.165 -18.946 -0.892 1.00 0.00 H ATOM 1346 H6 C 21 27.342 -20.373 -1.686 1.00 0.00 H ATOM 1347 H3T C 21 27.352 -24.670 -4.145 1.00 0.00 H TER 1348 
C 21
ENDMDL
"""

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_app/test_rtax.py

#!/usr/bin/env python

__author__ = "David Soergel"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["David Soergel"]  # remember to add yourself
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "David Soergel"
__email__ = "soergel@cs.umass.edu"
__status__ = "Production"

from cogent.util.misc import remove_files
from unittest import TestCase, main
from cogent.app.rtax import Rtax, assign_taxonomy
from cogent.app.util import get_tmp_filename


class RtaxClassifierTests(TestCase):
    """Tests of the RTAX classifier module."""

    def setUp(self):
        self.maxDiff = None
        self.id_to_taxonomy_fp = get_tmp_filename(
            prefix='RtaxTaxonAssignerTests_', suffix='.txt')
        self.input_seqs_fp = get_tmp_filename(
            prefix='RtaxTaxonAssignerTests_', suffix='.fasta')
        self.reference_seqs_fp = get_tmp_filename(
            prefix='RtaxTaxonAssignerTests_', suffix='.fasta')
        self.read_1_seqs_fp = get_tmp_filename(
            prefix='RtaxTaxonAssignerTests_', suffix='.fasta')
        self.read_2_seqs_fp = get_tmp_filename(
            prefix='RtaxTaxonAssignerTests_', suffix='.fasta')

        self._paths_to_clean_up = [
            self.id_to_taxonomy_fp, self.input_seqs_fp,
            self.reference_seqs_fp, self.read_1_seqs_fp,
            self.read_2_seqs_fp]

        a = open(self.id_to_taxonomy_fp, 'w')
        a.write(rtax_reference_taxonomy)
        a.close()
        b = open(self.reference_seqs_fp, 'w')
        b.write(rtax_reference_fasta)
        b.close()
        c = open(self.input_seqs_fp, 'w')
        c.write(rtax_test_repset_fasta)
        c.close()
        d = open(self.read_1_seqs_fp, 'w')
        d.write(rtax_test_read1_fasta)
        d.close()
        e = open(self.read_2_seqs_fp, 'w')
        e.write(rtax_test_read2_fasta)
        e.close()

    def tearDown(self):
        remove_files(set(self._paths_to_clean_up), error_on_missing=False)

    def test_paired_end_classification(self):
        self._paths_to_clean_up += cleanAll(self.read_1_seqs_fp)
        self._paths_to_clean_up += cleanAll(self.read_2_seqs_fp)
        result = assign_taxonomy(
            self.input_seqs_fp, self.reference_seqs_fp,
            self.id_to_taxonomy_fp, self.read_1_seqs_fp,
            self.read_2_seqs_fp, single_ok=False,
            header_id_regex="\\S+\\s+(\\S+?)\/")
        self.assertEqual(result, rtax_expected_result_paired)

    def test_paired_end_classification_with_fallback(self):
        self._paths_to_clean_up += cleanAll(self.read_1_seqs_fp)
        self._paths_to_clean_up += cleanAll(self.read_2_seqs_fp)
        result = assign_taxonomy(
            self.input_seqs_fp, self.reference_seqs_fp,
            self.id_to_taxonomy_fp, self.read_1_seqs_fp,
            self.read_2_seqs_fp, single_ok=True,
            header_id_regex="\\S+\\s+(\\S+?)\/")
        self.assertEqual(result, rtax_expected_result_paired_with_fallback)

    def test_single_end_classification(self):
        self._paths_to_clean_up += cleanAll(self.read_1_seqs_fp)
        result = assign_taxonomy(
            self.input_seqs_fp, self.reference_seqs_fp,
            self.id_to_taxonomy_fp, self.read_1_seqs_fp, None,
            header_id_regex="\\S+\\s+(\\S+?)\/")
        self.assertEqual(result, rtax_expected_result_single)

    # I'd like to add tests here that involve the TOOMANYHITS case. However,
    # that requires either a reference database with >16,000 sequences, which
    # we don't have handy for tests, or adjusting the maxMaxAccepts parameter
    # to rtaxSearch.pl. However, the "rtax" wrapper shell script currently
    # doesn't allow setting that option, and I'd prefer to leave that as is
    # unless someone actually wants to use it. Thus the TOOMANYHITS situation
    # is not easily testable at the moment.
def cleanAll(path):
    return [path, path + ".pos.db", path + ".pos.dir", path + ".pos.pag",
            path + ".lines.db", path + ".lines.dir", path + ".lines.pag"]


# sample data copied from GreenGenes

rtax_reference_taxonomy = """508720 99.0 k__Bacteria p__Actinobacteria c__Actinobacteria o__Actinomycetales f__Propionibacteriaceae g__Propionibacterium s__Propionibacterium acnes
508050 99.0 k__Bacteria p__Proteobacteria c__Betaproteobacteria o__Burkholderiales f__Comamonadaceae g__Diaphorobacter s__
502492 99.0 k__Bacteria p__Proteobacteria c__Betaproteobacteria o__Burkholderiales f__ g__Aquabacterium s__
"""

rtax_reference_fasta = """>508720
GACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAAAGGCCCTGCTTTTGTGGGGTGCTCGAGTGGCGAACG
GGTGAGTAACACGTGAGTAACCTGCCCTTGACTTTGGGATAACTTCAGGAAACTGGGGCTAATACCGGATAGGAGCTCCT
GCTGCATGGTGGGGGTTGGAAAGTTTCGGCGGTTGGGGATGGACTCGCGGCTTATCAGCTTGTTGGTGGGGTAGTGGCTT
ACCAAGGCTTTGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGA
GGCAGCAGTGGGGAATATTGCACAATGGGCGGAAGCCTGATGCAGCAACGCCGCGTGCGGGATGACGGCCTTCGGGTTGT
AAACCGCTTTCGCCTGTGACGAAGCGTGAGTGACGGTAATGGGTAAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCG
GTGATACGTAGGGTGCGAGCGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGTGGTTGATCGCGTCGGAAGTGTAA
TCTTGGGGCTTAACCCTGAGCGTGCTTTCGATACGGGTTGACTTGAGGAAGGTAGGGGAGAATGGAATTCCTGGTGGAGC
GGTGGAATGCGCAGATATCAGGAGGAACACCAGTGGCGAAGGCGGTTCTCTGGGCCTTTCCTGACGCTGAGGAGCGAAAG
CGTGGGGAGCGAACAGGCTTAGATACCCTGGTAGTCCACGCTGTAAACGGTGGGTACTAGGTGTGGGGTCCATTCCACGG
GTTCCGTGCCGTAGCTAACGCTTTAAGTACCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGG
GCCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGTAGAACCTTACCTGGGTTTGACATGGATCGGGAG
TGCTCAGAGATGGGTGTGCCTCTTTTGGGGTCGGTTCACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTT
GGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTCACTGTTGCCAGCACGTTATGGTGGGGACTCAGTGGAGACCGCCGGG
GTCAACTCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTTCACGCATGCTACAATGGCTG
GTACAGAGAGTGGCGAGCCTGTGAGGGTGAGCGAATCTCGGAAAGCCGGTCTCAGTTCGGATTGGGGTCTGCAACTCGAC
CTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGGCT
>508050
ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGTAACAGGTCTTCGGATGCTGACGAGTGGCGAACGGGTG
AGTAATACATCGGAACGTGCCCGATCGTGGGGGATAACGAGGCGAAAGCTTTGCTAATACCGCATACGATCTACGGATGA
AAGCGGGGGATCTTCGGACCTCGCGCGGACGGAGCGGCCGATGGCAGATTAGGTAGTTGGTGGGATAAAAGCTTACCAAG
CCGACGATCTGTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGC
AGTGGGGAATTTTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGCAGGATGAAGGCCTTCGGGTTGTAAACTG
CTTTTGTACGGAACGAAAAGCCTCTTTCTAATAAAGAGGGGTCATGACGGTACCGTAAGAATAAGCACCGGCTAACTACG
TGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTTGTA
AGACAGAGGTGAAATCCCCGGGCTCAACCTGGGAACTGCCTTTGTGACTGCAAGGCTGGAGTGCGGCAGAGGGGGATGGA
ATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGGCCTGCACTGACG
CTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCAACTGGTTGTTG
GGTCTTCACTGACTCAGTAACGAAGCTAACGCGTGAAGTTGACCGCCTGGGGAGTACGGCCGCAAGGTTGAAACTCAAAG
GAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGTTTAATTCGATGCAACGCGAAAAACCTTACCCACCTTTGACA
TGGCAGGAAGTTTCCAGAGATGGATTCGTGCCCGAAAGGGAACCTGCACACAGGTGCTGCATGGCTGTCGTCAGCTCGTG
TCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGCCATTAGTTGCTACGAAAGGGCACTCTAATGGGACTG
CCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATAGGTGGGGCTACACACGTCATACAAT
GGCTGGTACAGAGGGTTGCCAACCCGCGAGGGGGAGCTAATCCCATAAAGCCAGTCGTAGTCCGGATCGCAGTCTGCAAC
TCGACTGCGTGAAGTCGGAATCGCTAGTAATCGCGGATCAGAATGTCGCGGTGAATACGTTCCCGGGTCT
>502492
ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGTAACGGGTCCTTCGGGATGCCGACGAGTGGCGAACGGG
TGAGTAATATATCGGAACGTGCCCAGTAGTGGGGGATAACTGCTCGAAAGAGCAGCTAATACCGCATACGACCTGAGGGT
GAAAGGGGGGGATCGCAAGACCTCTCGCTATTGGAGCGGCCGATATCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCA
AGGCAACGATCTGTAGTTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCA
GCAGTGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGCAGGAAGAAGGCCTTCGGGTTGTAAAC
TGCTTTTGTCAGGGAAGAAATCTTCTGGGCTAATACCCCGGGAGGATGACGGTACCTGAAGAATAAGCACCGGCTAACTA
CGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGCTTTG
CAAGACAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGCAAGGCTAGAGTACGGCAGAGGGGGATG
GAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCAATGGCGAAGGCAATCCCCTGGGCCTGTACTGA
CGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCAACTGGTTGT
TGGACGGCTTGCTGTTCAGTAACGAAGCTAACGCGTGAAGTTGACCGCCTGGGGAGTACGGCCGCAAGGTTGAAACTCAA
AGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGTTTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGA
CATGTCAAGAATTCTGCAGAGATGTGGAAGTGCTCGAAAGAGAACTTGAACACAGGTGCTGCATGGCCGTCGTCAGCTCG
TGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCATTAGTTGCTACGCAAGAGCACTCTAATGAGAC
TGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAGGTCCTCATGGCCCTTATGGGTAGGGCTACACACGTCATACA
ATGGCCGGTACAGAGGGCTGCCAACCCGCGAGGGGGAGCCAATCCCAGAAAACCGGTCGTAGTCCGGATCGTAGTCTGCA
ACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGCGGATCAGCTTGCCGCGGTGAATACGTTCCCGGGTCT
"""

rtax_test_repset_fasta = """>clusterIdA splitRead1IdA
ACCAAGGCTTTGACGGGTAGCCGGCCTGAGTGGGTGACCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGA
>clusterIdB splitRead1IdB
CCGACGATCTGTAGCTGGTCTGAGAGGATGTTCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGC
>clusterIdC splitRead1IdC
AGGCAACGATCTGTAGTTGGTCTGAGAGGAGGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCA
>clusterIdD splitRead1IdD
AGGCAACGATCTGTAGTTGGTCTGAGAGGAGGACCAGCCACACTGGGACGGGGGGGGGGCCCAGACTCCTACGGGAGGCA
"""

# these reads are the 4th and 14th lines from the reference seqs
#rtax_test_read1_fasta = """>splitRead1IdA ampliconId_34563456/1
#ACCAAGGCTTTGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGA
#>splitRead1IdB ampliconId_
#CCGACGATCTGTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGC
#>splitRead1IdC ampliconId_
#AGGCAACGATCTGTAGTTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCA
#"""
#
#rtax_test_read2_fasta = """>splitRead2IdA ampliconId_34563456/3
#GGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTCACTGTTGCCAGCACGTTATGGTGGGGACTCAGTGGAGACCGCCGGG
#>splitRead2IdB ampliconId_
#TCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGCCATTAGTTGCTACGAAAGGGCACTCTAATGGGACTG
#>splitRead2IdC ampliconId_
#TGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCATTAGTTGCTACGCAAGAGCACTCTAATGAGAC
#"""

# these reads are the 4th and 14th lines from the reference seqs, with one
# nucleotide changed each, except D and E, which are unique to one read or the
# other, and F and G, which are just decoys
rtax_test_read1_fasta = """>splitRead1IdA ampliconId_34563456/1
ACCAAGGCTTTGACGGGTAGCCGGCCTGAGTGGGTGACCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGA
>splitRead1IdB ampliconId_12341234/1
CCGACGATCTGTAGCTGGTCTGAGAGGATGTTCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGC
>splitRead1IdC ampliconId_23452345/1
AGGCAACGATCTGTAGTTGGTCTGAGAGGAGGACCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCA
>splitRead1IdD ampliconId_45674567/1
AGGCAACGATCTGTAGTTGGTCTGAGAGGAGGACCAAAAAAAAAAAGACTGAGACACGGCCCAGACTCCTACGGGAGGCA
>splitRead1IdF ampliconId_56785678/1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
"""

rtax_test_read2_fasta = """>splitRead2IdA ampliconId_34563456/3
GGGTTAAGTCCCGCAACGAGCGCAACCCTTATTCACTGTTGCCAGCACGTTATGGTGGGGACTCAGTGGAGACCGCCGGG
>splitRead2IdB ampliconId_12341234/3
TCGTGAGATGTTGGGTTAAGTCCCGCAACGTGCGCAACCCTTGCCATTAGTTGCTACGAAAGGGCACTCTAATGGGACTG
>splitRead2IdC ampliconId_23452345/3
TGTCGTGAGATGTTGGGTTAAGTCCCGCAAAGAGCGCAACCCTTGTCATTAGTTGCTACGCAAGAGCACTCTAATGAGAC
>splitRead2IdE ampliconId_67896789/3
TGTCGTGAGATGTTGGGTTAAAAAAAAAAAAAAACGCAACCCTTGTCATTAGTTGCTACGCAAGAGCACTCTAATGAGAC
>splitRead2IdG ampliconId_78907890/3
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
"""

rtax_expected_result_paired = {
    'clusterIdA splitRead1IdA': ('k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Propionibacteriaceae; g__Propionibacterium; s__Propionibacterium acnes', 1.0),
    'clusterIdB splitRead1IdB': ('k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Comamonadaceae; g__Diaphorobacter; s__',
        1.0),
    'clusterIdC splitRead1IdC': ('k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__; g__Aquabacterium; s__', 1.0),
}

rtax_expected_result_paired_with_fallback = {
    'clusterIdA splitRead1IdA': ('k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Propionibacteriaceae; g__Propionibacterium; s__Propionibacterium acnes', 1.0),
    'clusterIdB splitRead1IdB': ('k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Comamonadaceae; g__Diaphorobacter; s__', 1.0),
    'clusterIdC splitRead1IdC': ('k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__; g__Aquabacterium; s__', 1.0),
    'clusterIdD splitRead1IdD': ('k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__; g__Aquabacterium; s__', 1.0),
}

rtax_expected_result_single = {
    'clusterIdA splitRead1IdA': ('k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Propionibacteriaceae; g__Propionibacterium; s__Propionibacterium acnes', 1.0),
    'clusterIdB splitRead1IdB': ('k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Comamonadaceae; g__Diaphorobacter; s__', 1.0),
    'clusterIdC splitRead1IdC': ('k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__; g__Aquabacterium; s__', 1.0),
    'clusterIdD splitRead1IdD': ('k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__; g__Aquabacterium; s__', 1.0),
}

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_app/test_sfffile.py

#!/usr/bin/env python
# test_sfffile.py

import os
import tempfile

from cogent.util.unit_test import TestCase, main
from cogent.parse.binary_sff import parse_binary_sff
from cogent.app.util import ApplicationError
from cogent.app.sfffile import Sfffile

__author__ = "Kyle Bittinger"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Kyle Bittinger"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kyle Bittinger"
__email__ = "kylebittinger@gmail.com"
__status__ = "Prototype"


class SfffileTests(TestCase):
    """Test the Sfffile application controller."""

    def setUp(self):
        test_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
        self.sff_fp = os.path.join(test_dir, 'data', 'test.sff')

    def _check_unmodified_sff_contents(self, sff_file):
        """Extracting repeated code from sfffile tests"""
        sff_file.seek(0)
        header, reads_gen = parse_binary_sff(sff_file)
        reads = list(reads_gen)
        self.assertEqual(header["number_of_reads"], 1)
        self.assertEqual(len(reads), 1)
        self.assertEqual(reads[0]['Name'], 'FA6P1OK01CGMHQ')

    def test_exit_status(self):
        """Sfffile should raise ApplicationError if exit status is nonzero."""
        a = Sfffile()
        self.assertRaises(ApplicationError, a,
                          'an_sff_file_that_does_not_exist.sff')

    def test_call(self):
        """Simple sfffile call should produce expected output."""
        a = Sfffile()
        app_results = a(self.sff_fp)
        self._check_unmodified_sff_contents(app_results['sff'])
        app_results.cleanUp()

    def test_call_with_output_path(self):
        """Sfffile should store output to specified filepath."""
        _, output_fp = tempfile.mkstemp()
        a = Sfffile()
        a.Parameters['-o'].on(output_fp)
        app_results = a(self.sff_fp)
        self._check_unmodified_sff_contents(open(output_fp))
        self._check_unmodified_sff_contents(app_results['sff'])
        app_results.cleanUp()

    def test_call_with_included_accession_numbers(self):
        """Sfffile should include specified accession numbers in output."""
        accno_file = tempfile.NamedTemporaryFile()
        accno_file.write('FA6P1OK01CGMHQ\n')
        accno_file.seek(0)
        a = Sfffile()
        a.Parameters['-i'].on(accno_file.name)
        app_results = a(self.sff_fp)
        self._check_unmodified_sff_contents(app_results['sff'])
        app_results.cleanUp()

    def test_call_with_excluded_accession_numbers(self):
        """Sfffile should exclude specified accession numbers in output."""
        accno_file = tempfile.NamedTemporaryFile()
        accno_file.write('FA6P1OK01CGMHQ\n')
        accno_file.seek(0)
        a = Sfffile()
        a.Parameters['-e'].on(accno_file.name)
        app_results = a(self.sff_fp)
        header, reads_gen = parse_binary_sff(app_results['sff'])
        reads = list(reads_gen)
        self.assertEqual(header["number_of_reads"], 0)
        self.assertEqual(len(reads), 0)
        app_results.cleanUp()


if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_app/test_sffinfo.py

#!/usr/bin/env python
# test_sffinfo.py

import os
import shutil
import tempfile

from cogent.util.unit_test import TestCase, main
from cogent.app.util import ApplicationError
from cogent.app.sffinfo import (
    ManyValuedParameter, Sffinfo, sffinfo_from_file)

__author__ = "Kyle Bittinger"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Kyle Bittinger"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kyle Bittinger"
__email__ = "kylebittinger@gmail.com"
__status__ = "Prototype"


class ManyValuedParameterTests(TestCase):

    def test_init(self):
        """__init__() should set appropriate class variables."""
        p = ManyValuedParameter(None, None)
        self.assertEqual(p.Name, None)
        self.assertEqual(p.Value, None)
        self.assertEqual(p.Delimiter, None)
        self.assertEqual(p.Quote, None)
        p = ManyValuedParameter('-', 'a', Values=['abc'])
        self.assertEqual(p.Value, ['abc'])

    def test_append(self):
        """append() should append values to Value class attribute."""
        p = ManyValuedParameter(None, None)
        p.append('abc')
        p.append(3)
        self.assertEqual(p.Value, ['abc', 3])

    def test_on(self):
        """on() should alias append()."""
        p = ManyValuedParameter(None, None)
        p.on('abc')
        p.on(3)
        self.assertEqual(p.Value, ['abc', 3])

    def test_str(self):
        """__str__() should produce quoted, delimited string of parameter values."""
        p = ManyValuedParameter(None, None)
        p.append('abc')
        p.append(3)
        self.assertEqual(str(p), 'abc3')
        p = ManyValuedParameter(None, None, Quote='"', ValueDelimiter=',')
        p.append('abc')
        p.append(3)
self.assertEqual(str(p), '"abc","3"') class SffinfoTests(TestCase): def setUp(self): test_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) self.sff_fp = os.path.join(test_dir, 'data', 'test.sff') def test_base_command(self): """BaseCommand should include non-positional parameters.""" s = Sffinfo() expected = 'cd "%s/"; sffinfo' % os.getcwd() self.assertEqual(s.BaseCommand, expected) s.Parameters['-a'].on() expected = 'cd "%s/"; sffinfo -a' % os.getcwd() self.assertEqual(s.BaseCommand, expected) # accession number parameters should not be included in base command s.Parameters['-a'].off() s.Parameters['accno'].on('12345ABC') expected = 'cd "%s/"; sffinfo' % os.getcwd() self.assertEqual(s.BaseCommand, expected) def test_changing_working_dir(self): """WorkingDir should be created and included in BaseCommand.""" # Directory is not created, only filename is returned working_dir = tempfile.mktemp() self.assertRaises(OSError, os.rmdir, working_dir) a = Sffinfo(WorkingDir=working_dir) expected = 'cd "%s/"; sffinfo' % working_dir self.assertEqual(a.BaseCommand, expected) # Removing the directory is proof that it was created. If the # directory is not there, an OSError will be raised. 
os.rmdir(working_dir) def test_input_handler(self): """Sffinfo should decorate input handler output with accession numbers""" my_accno = '12345ABC' a = Sffinfo() a.Parameters['accno'].on(my_accno) self.assertEqual(a.InputHandler, '_input_handler_decorator') observed = getattr(a, a.InputHandler)(self.sff_fp) expected = '"%s" %s' % (self.sff_fp, my_accno) self.assertEqual(observed, expected) def test_call(self): """Simple sffinfo call should produce expected output.""" a = Sffinfo() app_results = a(self.sff_fp) observed = app_results['StdOut'].read() self.assertEqual(observed, sffinfo_output) def test_call_with_accno(self): """Sffinfo accession number parameters should filter output.""" # Valid accession number a = Sffinfo() a.Parameters['accno'].on('FA6P1OK01CGMHQ') app_results = a(self.sff_fp) observed = app_results['StdOut'].read() self.assertEqual(observed, sffinfo_output) # Invalid accession number a = Sffinfo() a.Parameters['accno'].on('AAAAAAAAAAAAAA') app_results = a(self.sff_fp) observed = app_results['StdOut'].read() self.assertEqual(observed, empty_sffinfo_output) def test_call_with_flags(self): """Sffinfo flags should alter output as expected.""" # -a flag a = Sffinfo() a.Parameters['-a'].on() app_results = a(self.sff_fp) observed = app_results['StdOut'].read() self.assertEqual(observed, 'FA6P1OK01CGMHQ\n') # -s flag a = Sffinfo() a.Parameters['-s'].on() app_results = a(self.sff_fp) observed = app_results['StdOut'].read() expected = ( '>FA6P1OK01CGMHQ length=48 xy=0892_1356 region=1 ' 'run=R_2008_05_28_17_11_38_\n' 'ATCTGAGCTGGGTCATAGCTGCCTCCGTAGGAGGTGCCTCCCTACGGC\n' ) self.assertEqual(observed, expected) # -q flag a = Sffinfo() a.Parameters['-q'].on() app_results = a(self.sff_fp) observed = app_results['StdOut'].read() expected = ( '>FA6P1OK01CGMHQ length=48 xy=0892_1356 region=1 ' 'run=R_2008_05_28_17_11_38_\n' '32 32 32 32 32 32 32 25 25 21 21 21 28 32 32 31 30 30 32 32 32 ' '33 31 25 18 18 20 18 32 30 28 23 22 22 24 28 18 19 18 16 16 16 ' '17 18 13 17 
27 21\n') self.assertEqual(observed, expected) # -f flag a = Sffinfo() a.Parameters['-f'].on() app_results = a(self.sff_fp) observed = app_results['StdOut'].read() expected = ( '>FA6P1OK01CGMHQ xy=0892_1356 region=1 run=R_2008_05_28_17_11_38_\n' 'A,1.02 C,0.00 G,0.00 T,0.99 A,0.00 C,1.00 G,0.00 T,1.00 A,0.00 C,0.00\n' 'G,1.00 T,0.00 A,1.10 C,0.00 G,1.08 T,0.00 A,0.00 C,1.46 G,0.00 T,0.88\n' 'A,0.18 C,0.00 G,2.69 T,1.01 A,0.08 C,0.96 G,0.00 T,0.02 A,0.92 C,0.08\n' 'G,0.00 T,0.98 A,0.68 C,0.00 G,0.89 T,0.00 A,0.00 C,1.15 G,0.00 T,1.13\n' 'A,0.00 C,0.02 G,1.12 T,0.05 A,0.15 C,1.84 G,0.00 T,1.10 A,0.00 C,2.47\n' 'G,0.96 T,0.86 A,1.06 C,0.00 G,1.96 T,0.12 A,0.93 C,0.13 G,1.65 T,1.06\n' 'A,0.06 C,0.00 G,0.99 T,0.00 A,0.00 C,1.87 G,0.44 T,1.08 A,0.00 C,3.25\n' 'G,0.09 T,0.97 A,0.50 C,1.00 G,1.72 T,0.07 A,0.00 C,0.92 \n') self.assertEqual(observed, expected) class SffinfoFunctionTests(TestCase): def test_sffinfo_from_file(self): """sffinfo_from_file should return file object with sffinfo output.""" test_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) sff_fp = os.path.join(test_dir, 'data', 'test.sff') observed = sffinfo_from_file(sff_fp).read() self.assertEqual(observed, sffinfo_output) sffinfo_output = '''Common Header: Magic Number: 0x2E736666 Version: 0001 Index Offset: 1504 Index Length: 706 # of Reads: 1 Header Length: 440 Key Length: 4 # of Flows: 400 Flowgram Code: 1 Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG Key Sequence: TCAG >FA6P1OK01CGMHQ Run Prefix: R_2008_05_28_17_11_38_ Region #: 1 XY Location: 0892_1356 Run Name: R_2008_05_28_17_11_38_FLX02070135_adminrig_KnightLauber Analysis 
Name: /data/2008_05_28/R_2008_05_28_17_11_38_FLX02070135_adminrig_KnightLauber/D_2008_05_28_21_13_06_FLX02070135_KnightLauber_FullAnalysisAmplicons Full Path: /data/2008_05_28/R_2008_05_28_17_11_38_FLX02070135_adminrig_KnightLauber/D_2008_05_28_21_13_06_FLX02070135_KnightLauber_FullAnalysisAmplicons/../D_2008_05_29_13_52_01_FLX02070135_Knight_Lauber_jones_SignalProcessingAmplicons Read Header Len: 32 Name Length: 14 # of Bases: 77 Clip Qual Left: 5 Clip Qual Right: 52 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.04 0.00 1.01 0.00 0.00 0.96 0.00 1.02 0.00 1.02 0.00 0.00 0.99 0.00 1.00 0.00 1.00 0.00 0.00 1.00 0.00 1.10 0.00 1.08 0.00 0.00 1.46 0.00 0.88 0.18 0.00 2.69 1.01 0.08 0.96 0.00 0.02 0.92 0.08 0.00 0.98 0.68 0.00 0.89 0.00 0.00 1.15 0.00 1.13 0.00 0.02 1.12 0.05 0.15 1.84 0.00 1.10 0.00 2.47 0.96 0.86 1.06 0.00 1.96 0.12 0.93 0.13 1.65 1.06 0.06 0.00 0.99 0.00 0.00 1.87 0.44 1.08 0.00 3.25 0.09 0.97 0.50 1.00 1.72 0.07 0.00 0.92 0.58 0.00 0.00 0.59 0.06 0.11 0.09 0.07 0.06 0.16 0.00 0.24 0.03 0.00 0.12 0.06 0.16 0.00 0.18 0.00 0.00 0.14 0.00 0.15 0.00 0.18 0.00 0.03 0.14 0.03 0.13 0.01 0.19 0.00 0.02 0.33 0.05 0.00 0.16 0.10 0.35 0.01 0.21 0.04 0.09 0.18 0.13 0.19 0.00 0.10 0.51 0.26 0.00 0.23 0.19 0.27 0.01 0.29 0.05 0.14 0.17 0.16 0.18 0.27 0.09 0.26 0.10 0.18 0.23 0.15 0.22 0.13 0.37 0.11 0.11 0.26 0.59 0.14 0.06 0.33 0.34 0.26 0.05 0.27 0.44 0.19 0.10 0.35 0.27 0.15 0.34 0.28 0.45 0.14 0.16 0.34 0.27 0.12 0.07 0.25 0.18 0.12 0.04 0.23 0.16 0.12 0.05 0.20 0.16 0.11 0.03 0.21 0.16 0.10 0.02 0.21 0.16 0.12 0.02 0.20 0.15 0.10 0.02 0.23 0.15 0.11 0.02 0.22 0.14 0.09 0.02 0.20 0.13 0.09 0.01 0.19 0.13 0.08 0.02 0.17 0.12 0.08 0.03 0.17 0.09 0.08 0.01 0.14 0.09 0.07 0.01 0.15 0.09 0.06 0.01 0.13 0.08 0.06 0.00 0.13 0.08 0.05 0.02 0.12 0.07 0.05 0.01 0.11 0.07 0.05 0.00 0.10 0.07 0.05 0.01 0.11 0.08 0.04 0.00 0.10 0.06 0.05 0.01 0.09 0.06 0.04 0.01 0.08 0.07 0.05 0.00 0.08 0.06 0.05 0.00 0.09 0.06 0.04 0.00 0.09 0.06 0.04 0.01 0.08 0.06 0.04 0.00 0.09 0.06 
0.03 0.00 0.09 0.06 0.02 0.00 0.09 0.06 0.04 0.00 0.08 0.05 0.03 0.00 0.07 0.05 0.02 0.00 0.08 0.04 0.03 0.00 0.07 0.04 0.03 0.00 0.07 0.05 0.02 0.00 0.07 0.05 0.02 0.00 0.06 0.04 0.02 0.00 0.06 0.03 0.03 0.00 0.08 0.02 0.00 0.00 0.07 0.03 0.01 0.00 0.06 0.03 0.02 0.00 0.05 0.03 0.02 0.00 0.05 0.03 0.01 0.00 0.06 0.02 0.00 0.00 0.05 0.01 0.01 0.00 0.04 0.01 0.01 0.00 0.04 0.01 0.01 0.00 0.05 0.01 0.00 0.00 0.04 0.02 0.01 0.00 0.03 0.02 0.01 0.00 0.03 0.01 0.00 0.00 0.03 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.02 0.00 Flow Indexes: 1 3 6 8 10 13 15 17 20 22 24 27 29 32 32 32 33 35 38 41 42 44 47 49 52 55 55 57 59 59 60 61 62 64 64 66 68 68 69 72 75 75 77 79 79 79 81 82 83 84 84 87 88 91 102 126 130 138 140 145 153 157 161 164 166 171 175 179 183 187 191 195 199 203 211 215 219 Bases: tcagATCTGAGCTGGGTCATAGCTGCCTCCGTAGGAGGTGCCTCCCTACGGCgcnnnannnnngnnnnnnnnnnnnn Quality Scores: 32 32 32 32 32 32 32 32 32 32 32 25 25 21 21 21 28 32 32 31 30 30 32 32 32 33 31 25 18 18 20 18 32 30 28 23 22 22 24 28 18 19 18 16 16 16 17 18 13 17 27 21 20 21 0 0 0 17 0 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 ''' empty_sffinfo_output = '''Common Header: Magic Number: 0x2E736666 Version: 0001 Index Offset: 1504 Index Length: 706 # of Reads: 1 Header Length: 440 Key Length: 4 # of Flows: 400 Flowgram Code: 1 Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG Key Sequence: TCAG ''' if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_sfold.py000644 000765 000024 00000005434 12024702176 021750 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides Tests for Sfold application controller. IMPORTANT!!! 
don't forget to set param_dir variable in sfold application controller IMPORTANT!!! """ from os import remove from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.app.sfold import Sfold __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class SfoldTest(TestCase): """Tests for Sfold application controller""" def setUp(self): self.input = sfold_input def test_stdout_input_as_lines(self): """Test Sfold stdout input as lines""" s = Sfold(InputHandler='_input_as_lines') res = s(self.input) self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() def test_stdout_input_as_string(self): """Test Sfold stdout input as string""" s = Sfold() f = open('/tmp/single.fasta','w') f.write('\n'.join(self.input)) f.close() res = s('/tmp/single.fasta') self.assertEqual(res['ExitStatus'],0) assert res['StdOut'] is not None res.cleanUp() remove('/tmp/single.fasta') def test_get_result_path(self): """Tests sfold result path""" s = Sfold(InputHandler='_input_as_lines') res = s(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus', '10structure','10structure_2','Dharmacon_thermo','bp', 'bprob','cdf','fe','loopr','oligo','oligo_f','pdf', 'sample','sample_1000','sirna','sirna_f','sirna_s', 'smfe','sstrand','stability']) self.assertEqual(res['ExitStatus'],0) assert res['10structure'] is not None assert res['10structure_2'] is not None assert res['Dharmacon_thermo'] is not None assert res['bp'] is not None assert res['cdf'] is not None assert res['fe'] is not None assert res['loopr'] is not None assert res['oligo'] is not None assert res['oligo_f'] is not None assert res['pdf'] is not None assert res['sample'] is not None assert res['sample_1000'] is not None assert res['sirna'] is not None assert 
res['sirna_f'] is not None assert res['sirna_s'] is not None assert res['smfe'] is not None assert res['sstrand'] is not None assert res['stability'] is not None res.cleanUp() sfold_input = ['>seq1\n', 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n', '\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_stride.py000644 000765 000024 00000004377 12024702176 022140 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os, tempfile try: from cogent.util.unit_test import TestCase, main from cogent.parse.pdb import PDBParser from cogent.app.stride import Stride except ImportError: from zenpdb.cogent.util.unit_test import TestCase, main from zenpdb.cogent.parse.pdb import PDBParser from zenpdb.cogent.app.stride import Stride __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" class StrideTest(TestCase): """Tests for Stride application controller""" def setUp(self): self.input_file = os.path.join('data', '2E12.pdb') self.input_structure = PDBParser(open(self.input_file)) def test_stdout_input_from_entity(self): """Test Stride when input is an entity""" s = Stride() res = s(self.input_structure) self.assertEqual(res['ExitStatus'], 0) assert res['StdOut'] is not None self.assertTrue(res['StdOut'].readline().endswith('--------- 2E12\n')) self.assertEquals(len(res['StdOut'].readlines()), 267) res.cleanUp() def test_stdout_input_from_path(self): """Test Stride when input is a path""" s = Stride(InputHandler='_input_as_path') res = s(self.input_file) self.assertEqual(res['ExitStatus'], 0) assert res['StdOut'] is not None self.assertTrue(res['StdOut'].readline().endswith('--------- 2E12\n')) self.assertEquals(len(res['StdOut'].readlines()), 267) res.cleanUp() def test_get_result_path(self): """Tests stride
result path""" s = Stride() fd, name = tempfile.mkstemp() os.close(fd) s.Parameters['-f'].on(name) res = s(self.input_structure) self.assertEqual(res['ExitStatus'], 0) self.assertEqualItems(res.keys(), ['StdOut', 'StdErr', 'ExitStatus', 'File']) self.assertTrue(res['File'].readline().endswith('--------- 2E12\n')) self.assertEquals(len(res['File'].readlines()), 267) res.cleanUp() self.assertFalse(os.path.exists(name)) if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_uclust.py000755 000765 000024 00000073140 12024702176 022162 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ : provides unit tests for the uclust.py module Modified from Daniel McDonald's test_cd_hit.py code on Feb-4-2010 """ from os import getcwd, rmdir, remove from subprocess import Popen, PIPE, STDOUT from os.path import isfile from cogent.util.misc import remove_files from cogent.core.moltype import DNA from cogent.util.unit_test import TestCase, main from cogent.app.util import ApplicationError, get_tmp_filename from cogent.app.uclust import (Uclust, uclust_fasta_sort_from_filepath, uclust_cluster_from_sorted_fasta_filepath, get_output_filepaths,clusters_from_uc_file, get_clusters_from_fasta_filepath, uclust_search_and_align_from_fasta_filepath, process_uclust_pw_alignment_results, UclustParseError) __author__ = "William Walters" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald","William Walters","Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "William Walters" __email__ = "William.A.Walters@colorado.edu" __status__ = "Production" class UclustTests(TestCase): def setUp(self): self.tmp_unsorted_fasta_filepath = \ get_tmp_filename(prefix="uclust_test", suffix=".fasta") tmp_unsorted_fasta = open(self.tmp_unsorted_fasta_filepath,"w") tmp_unsorted_fasta.write('\n'.join(raw_dna_seqs)) tmp_unsorted_fasta.close() self.tmp_sorted_fasta_filepath = \ get_tmp_filename(prefix = "uclust_test", suffix = ".fasta") 
tmp_sorted_fasta = open(self.tmp_sorted_fasta_filepath,"w") tmp_sorted_fasta.write('\n'.join(sorted_dna_seqs)) tmp_sorted_fasta.close() self.tmp_uc_filepath = \ get_tmp_filename(prefix = "uclust_test", suffix = ".uc") tmp_uc = open(self.tmp_uc_filepath,"w") tmp_uc.write('\n'.join(uc_dna_clusters)) tmp_uc.close() self.tmp_clstr_filepath = \ get_tmp_filename(prefix = "uclust_test", suffix = ".clstr") self.WorkingDir = '/tmp/uclust_test' self.tmpdir = '/tmp/' self.files_to_remove = [self.tmp_unsorted_fasta_filepath, self.tmp_sorted_fasta_filepath, self.tmp_uc_filepath, self.tmp_clstr_filepath] def tearDown(self): remove_files(self.files_to_remove,error_on_missing=False) def test_fasta_sorting(self): """ Should sort fasta seqs from largest to smallest in outfile Since a fasta file has to be passed to the app controller for uclust, a temporary fasta file is created, and the raw fasta seqs supplied in this module are written to it. This file is sent to the app controller, and the resulting sorted file is compared to the expected results to ensure proper function of uclust as called by this app controller.""" test_app = Uclust({'--tmpdir':self.tmpdir}) test_app_res = test_app(data = \ {'--mergesort':self.tmp_unsorted_fasta_filepath,\ '--output':self.tmp_sorted_fasta_filepath}) sorted_fasta_actual = [l.strip() for l in open(test_app_res['Output'].name,"U")] sorted_fasta_expected = [l.strip() for l in sorted_dna_seqs if l] self.assertEqual(sorted_fasta_actual,sorted_fasta_expected) test_app_res.cleanUp() def test_clustering_fasta_filepath(self): """ Should create clusters in uclust format from sorted fasta file Since a fasta file has to be passed to the app controller for uclust, a temporary fasta file is created, and the sorted seqs supplied in this module are written to it. 
This file is sent to the app controller, and the resulting uclust file is compared to the expected results to ensure proper function of uclust as called by this app controller.""" test_app = Uclust({'--id':0.9},HALT_EXEC=False) test_app_res = test_app(data = \ {'--input':self.tmp_sorted_fasta_filepath,\ '--uc':self.tmp_uc_filepath}) uc_file = open(test_app_res['ClusterFile'].name,"U") # compare the actual and expect uc files, ignoring comment lines uc_file_actual = [l.strip() for l in uc_file if not l.startswith('#')] uc_file_expected = [l.strip() for l in uc_dna_clusters if not l.startswith('#')] self.assertEqual(uc_file_actual, uc_file_expected) test_app_res.cleanUp() class UclustConvenienceWrappers(TestCase): """ Unit tests for uclust convenience wrappers """ def setUp(self): self.tmp_unsorted_fasta_filepath = \ get_tmp_filename(prefix = "uclust_test", suffix = ".fasta") tmp_unsorted_fasta = open(self.tmp_unsorted_fasta_filepath,"w") tmp_unsorted_fasta.write('\n'.join(raw_dna_seqs)) tmp_unsorted_fasta.close() self.tmp_raw_dna_seqs_rc_filepath = \ get_tmp_filename(prefix = "uclust_test", suffix = ".fasta") tmp_rc_fasta = open(self.tmp_raw_dna_seqs_rc_filepath,"w") tmp_rc_fasta.write('\n'.join(raw_dna_seqs_rc)) tmp_rc_fasta.close() self.tmp_sorted_fasta_filepath = \ get_tmp_filename(prefix = "uclust_test", suffix = ".fasta") tmp_sorted_fasta = open(self.tmp_sorted_fasta_filepath,"w") tmp_sorted_fasta.write('\n'.join(sorted_dna_seqs)) tmp_sorted_fasta.close() self.tmp_uc_filepath = \ get_tmp_filename(prefix = "uclust_test", suffix = ".uc") tmp_uc = open(self.tmp_uc_filepath,"w") tmp_uc.write('\n'.join(uc_dna_clusters)) tmp_uc.close() self.tmp_clstr_filepath = \ get_tmp_filename(prefix = "uclust_test", suffix = ".clstr") self.search_align_out1_expected = search_align_out1_expected self.search_align_out_fasta_pairs1 = search_align_out_fasta_pairs1 self.search_align_out_uc1 = search_align_out_uc1 self.search_align_query1_fp = \ get_tmp_filename(prefix = "uclust_test", 
suffix = ".clstr") open(self.search_align_query1_fp,'w').write(search_align_query1) self.search_align_template1_fp = \ get_tmp_filename(prefix = "uclust_test", suffix = ".clstr") open(self.search_align_template1_fp,'w').write(search_align_template1) self.search_align_out2_expected = search_align_out2_expected self.search_align_query2_fp = \ get_tmp_filename(prefix = "uclust_test", suffix = ".clstr") open(self.search_align_query2_fp,'w').write(search_align_query2) self.search_align_template2_fp = \ get_tmp_filename(prefix = "uclust_test", suffix = ".clstr") open(self.search_align_template2_fp,'w').write(search_align_template2) self.ref_dna_seqs_fp = get_tmp_filename(prefix = "uclust_test", suffix = ".fasta") open(self.ref_dna_seqs_fp,'w').write(ref_dna_seqs) self.files_to_remove = [self.tmp_unsorted_fasta_filepath, self.tmp_raw_dna_seqs_rc_filepath, self.tmp_sorted_fasta_filepath, self.tmp_uc_filepath, self.tmp_clstr_filepath, self.search_align_query1_fp, self.search_align_template1_fp, self.search_align_query2_fp, self.search_align_template2_fp, self.ref_dna_seqs_fp] self.ref_test_clusters1 = ref_test_clusters1 self.ref_test_failures1 = ref_test_failures1 self.ref_test_new_seeds1 = ref_test_new_seeds1 self.ref_test_clusters2 = ref_test_clusters2 self.ref_test_failures2 = ref_test_failures2 self.ref_test_new_seeds2 = ref_test_new_seeds2 self.uc_dna_clusters = uc_dna_clusters self.uc_lines1 = uc_lines1 self.uc_lines_overlapping_lib_input_seq_ids = \ uc_lines_overlapping_lib_input_seq_ids def tearDown(self): remove_files(self.files_to_remove,error_on_missing=False) def test_uclust_fasta_sort_from_filepath(self): """ Given an unsorted fasta filepath, will return sorted file """ app_res = \ uclust_fasta_sort_from_filepath(self.tmp_unsorted_fasta_filepath) sorted_fasta_actual = [l.strip() for l in open(app_res['Output'].name,"U")] sorted_fasta_expected = [l.strip() for l in sorted_dna_seqs if l] self.assertEqual(sorted_fasta_actual,sorted_fasta_expected) 
app_res.cleanUp() def test_clusters_from_uc_file(self): """ clusters_from_uc_file functions as expected """ expected_clusters = {'s2':['s2','s3']} expected_failures = ['s1'] expected_new_seeds = ['s2'] self.assertEqual(clusters_from_uc_file(self.uc_lines1), (expected_clusters,expected_failures,expected_new_seeds)) def test_clusters_from_uc_file_error(self): """ clusters_from_uc_file raises error when lib/input seq ids overlap""" self.assertRaises(UclustParseError, clusters_from_uc_file, self.uc_lines_overlapping_lib_input_seq_ids) def test_uclust_cluster_from_sorted_fasta_filepath(self): """ Given a sorted fasta filepath, will return uclust (.uc) file """ app_res = \ uclust_cluster_from_sorted_fasta_filepath(self.tmp_sorted_fasta_filepath, \ percent_ID = 0.90,HALT_EXEC=False) uc_file = open(app_res['ClusterFile'].name,"U") # compare the actual and expect uc files, ignoring comment lines uc_file_actual = [l.strip() for l in uc_file if not l.startswith('#')] uc_file_expected = [l.strip() for l in uc_dna_clusters if not l.startswith('#')] self.assertEqual(uc_file_actual, uc_file_expected) app_res.cleanUp() def test_get_output_filepaths(self): """ Properly generates output filepath names """ uc_res = \ get_output_filepaths("/tmp/","test_seqs.fasta") self.assertEqual(uc_res, "/tmp/test_seqs_clusters.uc") def test_get_clusters_from_fasta_filepath(self): """ Tests for return of lists of OTUs from given fasta filepath """ clusters_res = \ get_clusters_from_fasta_filepath(self.tmp_unsorted_fasta_filepath, \ original_fasta_path = None, percent_ID = 0.90, save_uc_files=False) expected_cluster_list.sort() expected_failure_list.sort() expected_new_seed_list.sort() clusters_res[0].sort() clusters_res[1].sort() clusters_res[2].sort() self.assertEqual(clusters_res,(expected_cluster_list, expected_failure_list, expected_new_seed_list)) def test_get_clusters_from_fasta_filepath_reference_db_only(self): """ Correct clusters returned when clustering against a database only """ 
clusters_res = get_clusters_from_fasta_filepath( self.tmp_unsorted_fasta_filepath, original_fasta_path = None, save_uc_files=False, max_accepts=7,max_rejects=12, percent_ID = 0.90, subject_fasta_filepath=self.ref_dna_seqs_fp, suppress_new_clusters=True, HALT_EXEC=False) self.ref_test_clusters1.sort() self.ref_test_failures1.sort() self.ref_test_new_seeds1.sort() clusters_res[0].sort() clusters_res[1].sort() clusters_res[2].sort() self.assertEqual(clusters_res,(self.ref_test_clusters1, self.ref_test_failures1, self.ref_test_new_seeds1)) def test_get_clusters_from_fasta_filepath_extending_reference_db(self): """ Correct clusters when clustering against db and adding new clusters """ clusters_res = get_clusters_from_fasta_filepath( self.tmp_unsorted_fasta_filepath, original_fasta_path = None, max_accepts=7,max_rejects=12, percent_ID = 0.90, subject_fasta_filepath=self.ref_dna_seqs_fp, suppress_new_clusters=False,enable_rev_strand_matching=True, HALT_EXEC=False, save_uc_files=False) self.ref_test_clusters2.sort() self.ref_test_failures2.sort() self.ref_test_new_seeds2.sort() clusters_res[0].sort() clusters_res[1].sort() clusters_res[2].sort() self.assertEqual(clusters_res,(self.ref_test_clusters2, self.ref_test_failures2, self.ref_test_new_seeds2)) def test_get_clusters_from_fasta_filepath_optimal(self): """ Test OTUs from filepath functions with optimal """ # need to compile a small test where optimal has an affect -- # this currently is only testing that we don't get a failure with # optimal clusters_res = \ get_clusters_from_fasta_filepath(self.tmp_unsorted_fasta_filepath, original_fasta_path = None, save_uc_files=False, percent_ID = 0.90, optimal = True) expected_cluster_list.sort() expected_failure_list.sort() expected_new_seed_list.sort() clusters_res[0].sort() clusters_res[1].sort() clusters_res[2].sort() self.assertEqual(clusters_res,(expected_cluster_list, expected_failure_list, expected_new_seed_list)) def 
test_get_clusters_from_fasta_filepath_suppress_sort(self): """ Test OTUs from filepath functions with suppress sort """ expected = [['uclust_test_seqs_0'], ['uclust_test_seqs_1'], ['uclust_test_seqs_2'], ['uclust_test_seqs_3'], ['uclust_test_seqs_4'], ['uclust_test_seqs_5'], ['uclust_test_seqs_6', 'uclust_test_seqs_8'], ['uclust_test_seqs_7'], ['uclust_test_seqs_9']] clusters_res = \ get_clusters_from_fasta_filepath(self.tmp_unsorted_fasta_filepath, original_fasta_path = None, percent_ID = 0.90, suppress_sort = True, save_uc_files=False) expected_cluster_list.sort() expected_failure_list.sort() expected_new_seed_list.sort() clusters_res[0].sort() clusters_res[1].sort() clusters_res[2].sort() self.assertEqual(clusters_res,(expected_cluster_list, expected_failure_list, expected_new_seed_list)) def test_get_clusters_from_fasta_filepath_rev_strand_match(self): """ Test OTUs from filepath functions with rev strand match """ # seq and its rc don't cluster when enable_rev_strand_matching = False expected_cluster_list = [['uclust_test_seqs_0'], ['uclust_test_seqs_0_rc']] expected_failure_list = [] expected_new_seed_list = ['uclust_test_seqs_0', 'uclust_test_seqs_0_rc'] clusters_res = \ get_clusters_from_fasta_filepath(self.tmp_raw_dna_seqs_rc_filepath, original_fasta_path = None, save_uc_files=False, percent_ID = 0.90, enable_rev_strand_matching = False) expected_cluster_list.sort() expected_failure_list.sort() expected_new_seed_list.sort() clusters_res[0].sort() clusters_res[1].sort() clusters_res[2].sort() self.assertEqual(clusters_res,(expected_cluster_list, expected_failure_list, expected_new_seed_list)) # seq and its rc cluster when enable_rev_strand_matching = True expected_cluster_list = [['uclust_test_seqs_0', 'uclust_test_seqs_0_rc']] expected_failure_list = [] expected_new_seed_list = ['uclust_test_seqs_0'] clusters_res = \ get_clusters_from_fasta_filepath(self.tmp_raw_dna_seqs_rc_filepath, original_fasta_path = None, save_uc_files=False, percent_ID = 0.90,
enable_rev_strand_matching = True) expected_cluster_list.sort() expected_failure_list.sort() expected_new_seed_list.sort() clusters_res[0].sort() clusters_res[1].sort() clusters_res[2].sort() self.assertEqual(clusters_res,(expected_cluster_list, expected_failure_list, expected_new_seed_list)) def test_process_uclust_pw_alignment_results(self): """parsing of pairwise alignment fasta pairs file functions as expected """ actual = list(process_uclust_pw_alignment_results(\ self.search_align_out_fasta_pairs1,self.search_align_out_uc1)) expected = self.search_align_out1_expected # iterate over results so error output will highlight the bad match for a,e in zip(actual,expected): self.assertEqual(a,e) # make sure the full result objects are the same self.assertEqual(actual,expected) def test_uclust_search_and_align_from_fasta_filepath(self): """ uclust_search_and_align_from_fasta_filepath functions as expected """ # rev comp matches allowed (default) actual = list(uclust_search_and_align_from_fasta_filepath( self.search_align_query1_fp,self.search_align_template1_fp)) self.assertEqual(actual,self.search_align_out1_expected) # rev comp matches not allowed actual = list(uclust_search_and_align_from_fasta_filepath( self.search_align_query1_fp,self.search_align_template1_fp, enable_rev_strand_matching=False)) self.assertEqual(actual,self.search_align_out1_expected[:2]) def test_uclust_search_and_align_from_fasta_filepath_protein(self): """ uclust_search_and_align_from_fasta_filepath functions with protein """ # rev comp matches allowed (default) actual = list(uclust_search_and_align_from_fasta_filepath( self.search_align_query2_fp,self.search_align_template2_fp)) self.assertEqual(actual,self.search_align_out2_expected) def test_uclust_supported_version(self): """uclust version is supported """ command = 'uclust --version' proc = Popen(command,shell=True,universal_newlines=True,\ stdout=PIPE,stderr=STDOUT) stdout = proc.stdout.read() version_string = 
stdout.strip().split('v')[-1].strip('q') try: version = tuple(map(int,version_string.split('.'))) acceptable_version = version >= (1,2,22) except ValueError: acceptable_version = False self.assertTrue(acceptable_version,\ "Unsupported uclust version. 1.2.22 or later "+\ "is required, but running %s." % version_string) raw_dna_seqs = """>uclust_test_seqs_0 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT >uclust_test_seqs_1 GCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >uclust_test_seqs_2 CCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT >uclust_test_seqs_3 CCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT >uclust_test_seqs_4 ACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT >uclust_test_seqs_5 CCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA >uclust_test_seqs_6 CGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT >uclust_test_seqs_7 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >uclust_test_seqs_8 CGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT >uclust_test_seqs_9 GGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA """.split('\n') ref_dna_seqs = """>ref1 25 random bases appended to uclust_test_seqs_0 and one mismatch ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATATTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCTATAGCAGCCCCAGCGTTTACTTCTA >ref2 15 random bases prepended to uclust_test_seqs_1 and one mismatch GCTGCGGCGTCCTGCGCCACGGTGGGTACAACACGTCCACTACATCTGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >ref3 5 random bases prepended and 10 random bases appended to uclust_test_seqs_2 ATAGGCCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACTGCCTGATTCA >ref4 exact match to uclust_test_seqs_3 CCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT """ ref_test_clusters1 = 
[['uclust_test_seqs_0'],['uclust_test_seqs_1'], ['uclust_test_seqs_2'],['uclust_test_seqs_3']] ref_test_failures1 = ['uclust_test_seqs_4','uclust_test_seqs_5', 'uclust_test_seqs_6','uclust_test_seqs_7', 'uclust_test_seqs_8','uclust_test_seqs_9'] ref_test_new_seeds1 = [] ref_test_clusters2 = [['uclust_test_seqs_0'],['uclust_test_seqs_1'], ['uclust_test_seqs_2'],['uclust_test_seqs_3'], ['uclust_test_seqs_4'],['uclust_test_seqs_5'], ['uclust_test_seqs_6','uclust_test_seqs_8'], ['uclust_test_seqs_7'],['uclust_test_seqs_9']] ref_test_failures2 = [] ref_test_new_seeds2 = ['uclust_test_seqs_4','uclust_test_seqs_5','uclust_test_seqs_6', 'uclust_test_seqs_7','uclust_test_seqs_9'] raw_dna_seqs_rc = """>uclust_test_seqs_0 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT >uclust_test_seqs_0_rc AGCTCTGACACAAAACTGACGTGATGTGCCTTAAGTATCCAACCCGTTGGATGGGACGTCTTGTAGCCACCGT """.split('\n') sorted_dna_seqs=""">uclust_test_seqs_7 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >uclust_test_seqs_4 ACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT >uclust_test_seqs_2 CCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT >uclust_test_seqs_3 CCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT >uclust_test_seqs_1 GCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >uclust_test_seqs_5 CCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA >uclust_test_seqs_6 CGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT >uclust_test_seqs_0 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT >uclust_test_seqs_8 CGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT >uclust_test_seqs_9 GGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA """.split('\n') # Clusters are created at a 0.90% identity uc_dna_clusters= """# uclust --input 
/tmp/uclust_testBGwZvcikrbNefYGRTk0u.fasta --id 0.9 --uc /tmp/uclust_testrbcO0CyBVpV9AwH3OIK1.uc # version=1.1.577 # Tab-separated fields: # 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel # Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster, N=NoHit # For C and D types, PctId is average id with seed. # QueryStart and SeedStart are zero-based relative to start of sequence. # If minus strand, SeedStart is relative to reverse-complemented seed. S 0 80 * * * * * uclust_test_seqs_7 * S 1 79 * * * * * uclust_test_seqs_4 * S 2 78 * * * * * uclust_test_seqs_2 * S 3 77 * * * * * uclust_test_seqs_3 * S 4 76 * * * * * uclust_test_seqs_1 * S 5 75 * * * * * uclust_test_seqs_5 * S 6 74 * * * * * uclust_test_seqs_6 * S 7 73 * * * * * uclust_test_seqs_0 * H 6 72 91.7 + 0 0 2I72M uclust_test_seqs_8 uclust_test_seqs_6 S 8 71 * * * * * uclust_test_seqs_9 * C 0 1 * * * * * uclust_test_seqs_7 * C 1 1 * * * * * uclust_test_seqs_4 * C 2 1 * * * * * uclust_test_seqs_2 * C 3 1 * * * * * uclust_test_seqs_3 * C 4 1 * * * * * uclust_test_seqs_1 * C 5 1 * * * * * uclust_test_seqs_5 * C 6 2 91.7 * * * * uclust_test_seqs_6 * C 7 1 * * * * * uclust_test_seqs_0 * C 8 1 * * * * * uclust_test_seqs_9 *""".split('\n') expected_cluster_list=[['uclust_test_seqs_7'], ['uclust_test_seqs_4'], ['uclust_test_seqs_2'], ['uclust_test_seqs_3'], ['uclust_test_seqs_1'], ['uclust_test_seqs_5'], ['uclust_test_seqs_6', 'uclust_test_seqs_8'], ['uclust_test_seqs_0'], ['uclust_test_seqs_9']] expected_failure_list = [] expected_new_seed_list = ['uclust_test_seqs_7', 'uclust_test_seqs_4', 'uclust_test_seqs_2', 'uclust_test_seqs_3', 'uclust_test_seqs_1', 'uclust_test_seqs_5', 'uclust_test_seqs_6', 'uclust_test_seqs_0', 'uclust_test_seqs_9'] search_align_query1 = """>1_like TACGGCTACCTTGTTACGACTTCATCCCAATCATTTGTTCCACCTTCGACGGCTA >2_like ATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGGATGGCAACTAAG 
>2_like_rc CTTAGTTGCCATCCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCAT >rand TTGCGACGAGCGGACGGCCGGGTGTATGTCGTCATATATATGTGTCTGCCTATCGTTACGTACACTCGTCGTCT """ search_align_template1 = """>1 AGAAAGGAGGTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATTTGTTCCACCTTCGACGGCTAGCTCCAAATGGTTACTCCACCGGCTTCGGGTGTTACAAACTC >2 AGCCCAAATCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGGATGGCAACTAAGCTTAAGGGTTGCGCT """ search_align_query2 = """>1_like PRTEINACYYPL >2_like AGGYTPPLVN >rand GGTYPARREE """ search_align_template2 = """>1 PRTELNACYYPL >2 AGGYTRPPLVN """ search_align_out2_expected = [ ('1_like','1','PRTEINACYYPL','PRTELNACYYPL',91.70000), ('2_like','2','AGGYT-PPLVN','AGGYTRPPLVN',100.0)] search_align_out_fasta_pairs1 = """>1_like -------------------------------TACGGCTACCTTGTTACGACTTCATCCCAATCATTTGTTCCACCTTCGACGGCTA------------------------------------------ >1+ AGAAAGGAGGTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATTTGTTCCACCTTCGACGGCTAGCTCCAAATGGTTACTCCACCGGCTTCGGGTGTTACAAACTC >2_like -------------------ATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGGATGGCAACTAAG--------------- >2+ AGCCCAAATCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGGATGGCAACTAAGCTTAAGGGTTGCGCT >2_like_rc ---------------CTTAGTTGCCATCCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCAT------------------- >2- AGCGCAACCCTTAAGCTTAGTTGCCATCCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCT """.split('\n') search_align_out_uc1 = """# uclust --input sm_query.fasta --lib sm_template.fasta --id 0.75 --libonly --rev --maxaccepts 8 --maxrejects 32 --fastapairs sm_pw.fasta --uc sm_result.uc # version=1.1.577 # Tab-separated fields: # 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel # Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster, N=NoHit # For C and D types, PctId is average id with seed. 
# QueryStart and SeedStart are zero-based relative to start of sequence. # If minus strand, SeedStart is relative to reverse-complemented seed. L 0 128 * * * * * 1 * H 0 55 98.2 + 0 0 31I55M42I 1_like 1 L 1 92 * * * * * 2 * H 1 58 100.0 + 0 0 19I58M15I 2_like 2 H 1 58 100.0 - 0 0 15I58M19I 2_like_rc 2 N * 74 * * * * * rand * D 0 2 * * * * 98.2 1 * D 1 3 * * * * 100.0 2 * """.split('\n') search_align_out1_expected = [ ('1_like','1','-------------------------------TACGGCTACCTTGTTACGACTTCATCCCAATCATTTGTTCCACCTTCGACGGCTA------------------------------------------','AGAAAGGAGGTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATTTGTTCCACCTTCGACGGCTAGCTCCAAATGGTTACTCCACCGGCTTCGGGTGTTACAAACTC',98.2), ('2_like','2','-------------------ATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGGATGGCAACTAAG---------------','AGCCCAAATCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGGATGGCAACTAAGCTTAAGGGTTGCGCT',100.0),\ ('2_like_rc RC','2','-------------------ATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGGATGGCAACTAAG---------------','AGCCCAAATCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGGATGGCAACTAAGCTTAAGGGTTGCGCT',100.0)] uc_lines1 = """# uclust --input q.fasta --lib r.fasta --uc results.uc --id 0.90 --libonly --rev # version=1.1.579 # Tab-separated fields: # 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel # Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster, N=NoHit # For C and D types, PctId is average id with seed. # QueryStart and SeedStart are zero-based relative to start of sequence. # If minus strand, SeedStart is relative to reverse-complemented seed. 
N * 80 * * * * * s1 some comment * S 4 80 * * * * * s2 some other comment * H 2 78 100.0 + 0 0 5I78M10I s3 yet another comment s2""".split('\n') uc_lines_overlapping_lib_input_seq_ids = """# uclust --maxrejects 32 --input /tmp/OtuPickerbb092OWRWLWqlBR2BmTZ.fasta --id 0.97 --uc /tmp/uclust_clustersLf5Oqv0SvGTZo1mVWBqK.uc --rev --usersort --maxaccepts 8 --lib r.fasta # version=1.1.16 # Tab-separated fields: # 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel # Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster, N=NoHit # For C and D types, PctId is average id with seed. # QueryStart and SeedStart are zero-based relative to start of sequence. # If minus strand, SeedStart is relative to reverse-complemented seed. S 1 24 * * * * * 3 * H 1 24 100.0 + 0 0 24M 4 3 L 0 54 * * * * * 3 * H 0 54 100.0 + 0 0 54M 2 3 D 0 2 * * * * 100.0 3 * C 1 2 100.0 * * * * 3 * """.split('\n') if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_unafold.py000644 000765 000024 00000004215 12024702176 022265 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove from cogent.util.unit_test import TestCase, main from cogent.core.info import Info from cogent.app.unafold import hybrid_ss_min __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class UnafoldTest(TestCase): """Tests for Unafold application controller""" def setUp(self): self.input = unafold_input def test_stdout_input_as_lines(self): """Test Unafold stdout input as lines""" u = hybrid_ss_min(InputHandler='_input_as_lines') exp = '\n'.join(unafold_stdout) res = u(self.input) obs = res['StdOut'].read() self.assertEqual(obs,exp) self.assertEqual(res['ExitStatus'],0) 
res.cleanUp() def test_stdout_input_as_string(self): """Test Unafold stdout input as string""" u = hybrid_ss_min() exp = '\n'.join(unafold_stdout) f = open('/tmp/single.fasta','w') f.write('\n'.join(self.input)) f.close() res = u('/tmp/single.fasta') obs = res['StdOut'].read() self.assertEqual(obs,exp) self.assertEqual(res['ExitStatus'],0) res.cleanUp() remove('/tmp/single.fasta') def test_get_result_path(self): """Tests unafold result path""" u = hybrid_ss_min(InputHandler='_input_as_lines') res = u(self.input) self.assertEqualItems(res.keys(),['StdOut','StdErr','ExitStatus',\ 'ct','dG','run','plot_37','ext_37']) self.assertEqual(res['ExitStatus'],0) assert res['ct'] is not None assert res['dG'] is not None assert res['run'] is not None assert res['plot_37'] is not None assert res['ext_37'] is not None res.cleanUp() unafold_input = ['>seq1\n', 'GGCUAGAUAGCUCAGAUGGUAGAGCAGAGGAUUGAAGAUCCUUGUGUCGUCGGUUCGAUCCCGGCUCUGGC\n', '\n'] unafold_stdout = ['Calculating for seq1, t = 37\n'] if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_usearch.py000755 000765 000024 00000211443 12024702176 022275 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ provides unit tests for the usearch.py module """ __author__ = "William Walters" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["William Walters", "Jose Carlos Clemente Litran", "Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "William Walters" __email__ = "william.a.walters@colorado.edu" __status__ = "Production" from os.path import isfile, basename, join, exists from shutil import rmtree from glob import glob from cogent.util.misc import remove_files from cogent.util.unit_test import TestCase, main from cogent.app.util import ApplicationError, get_tmp_filename from cogent.util.misc import create_dir, get_random_directory_name from cogent.app.usearch import (Usearch, clusters_from_blast_uc_file, usearch_fasta_sort_from_filepath, 
usearch_dereplicate_exact_subseqs, usearch_dereplicate_exact_seqs, usearch_sort_by_abundance, usearch_cluster_error_correction, usearch_chimera_filter_de_novo, usearch_chimera_filter_ref_based, usearch_cluster_seqs, enumerate_otus, assign_reads_to_otus, usearch_qf, concatenate_fastas, get_retained_chimeras, assign_dna_reads_to_protein_database, assign_dna_reads_to_dna_database,) class UsearchTests(TestCase): def setUp(self): # create the temporary input files self.dna_seqs_1 = dna_seqs_1 self.dna_seqs_2 = dna_seqs_usearch self.dna_seqs_3 = dna_seqs_3 self.dna_seqs_4 = dna_seqs_4 self.protein_ref_seqs1 = protein_ref_seqs1 self.ref_database = usearch_ref_seqs1 self.dna_seqs_with_abundance = dna_seqs_with_abundance self.de_novo_chimera_seqs = de_novo_chimera_seqs self.dna_seqs_with_dups = dna_seqs_with_dups self.dna_seqs_reference_otu_picking = dna_seqs_reference_otu_picking # Expected output files self.uc_lines1 = uc_lines1 self.expected_otu_assignments = expected_otu_assignments self.expected_enumerated_fasta = expected_enumerated_fasta self.expected_enumerated_fasta_added_options =\ expected_enumerated_fasta_added_options self.expected_clusters_w_abundance_default_settings =\ expected_clusters_w_abundance_default_settings self.expected_clusters_w_abundance_low_setting =\ expected_clusters_w_abundance_low_setting self.expected_reference_filtered_seqs =\ expected_reference_filtered_seqs self.expected_de_novo_chimeras_default =\ expected_de_novo_chimeras_default self.expected_de_novo_chimera_filtered_skew11 =\ expected_de_novo_chimera_filtered_skew11 self.expected_cluster_err_seqs =\ expected_cluster_err_seqs self.expected_sorted_by_abundance_no_filter =\ expected_sorted_by_abundance_no_filter 
self.expected_derep_seqs = expected_derep_seqs self.expected_abundance_sort_filtered = expected_abundance_sort_filtered self.expected_len_sorted_seqs = expected_len_sorted_seqs self.expected_combined_dna_seqs_1_seqs_usearch =\ expected_combined_dna_seqs_1_seqs_usearch self.retained_chimeras_seqs1 = retained_chimeras_seqs1 self.retained_chimeras_seqs2 = retained_chimeras_seqs2 self.expected_retained_chimeras_union =\ expected_retained_chimeras_union self.expected_retained_chimeras_intersection =\ expected_retained_chimeras_intersection self.expected_derep_seqs_full_len =\ expected_derep_seqs_full_len # Create temporary files for use with unit tests self.tmp_dir = '/tmp/' self.tmp_seq_filepath1 = get_tmp_filename(\ prefix='UsearchOtuPickerTest_',\ suffix='.fasta') seq_file = open(self.tmp_seq_filepath1,'w') seq_file.write(self.dna_seqs_1) seq_file.close() self.tmp_seq_filepath2 = get_tmp_filename(\ prefix='UsearchOtuPickerTest_',\ suffix='.fasta') seq_file = open(self.tmp_seq_filepath2,'w') seq_file.write(self.dna_seqs_2) seq_file.close() self.dna_seqs3_filepath = get_tmp_filename(\ prefix='UsearchOtuPickerTest_',\ suffix='.fasta') seq_file = open(self.dna_seqs3_filepath,'w') seq_file.write(self.dna_seqs_3) seq_file.close() self.dna_seqs4_filepath = get_tmp_filename(\ prefix='UsearchOtuPickerTest_',\ suffix='.fasta') seq_file = open(self.dna_seqs4_filepath,'w') seq_file.write(self.dna_seqs_4) seq_file.close() self.protein_ref_seqs1_filepath = get_tmp_filename(\ prefix='UsearchOtuPickerTest_',\ suffix='.fasta') seq_file = open(self.protein_ref_seqs1_filepath,'w') seq_file.write(self.protein_ref_seqs1) seq_file.close() self.tmp_ref_database = get_tmp_filename(\ prefix='UsearchRefDatabase_',\ suffix='.fasta') seq_file = open(self.tmp_ref_database, 'w') seq_file.write(self.ref_database) seq_file.close() self.tmp_seqs_w_abundance = get_tmp_filename(\ prefix='UsearchSeqsAbundance_',\ suffix='.fasta') seq_file = open(self.tmp_seqs_w_abundance, 'w') 
seq_file.write(self.dna_seqs_with_abundance) seq_file.close() self.tmp_de_novo_chimera_seqs = get_tmp_filename(\ prefix='UsearchdenovoChimera_',\ suffix='.fasta') seq_file = open(self.tmp_de_novo_chimera_seqs, 'w') seq_file.write(self.de_novo_chimera_seqs) seq_file.close() self.tmp_dna_seqs_with_dups = get_tmp_filename(\ prefix='UsearchDupDNASeqs_',\ suffix='.fasta') seq_file = open(self.tmp_dna_seqs_with_dups, 'w') seq_file.write(self.dna_seqs_with_dups) seq_file.close() self.tmp_retained_chimeras_seqs1 = get_tmp_filename(\ prefix="UsearchRetainedChimeras1_",\ suffix=".fasta") seq_file = open(self.tmp_retained_chimeras_seqs1, 'w') seq_file.write(self.retained_chimeras_seqs1) seq_file.close() self.tmp_retained_chimeras_seqs2 = get_tmp_filename(\ prefix="UsearchRetainedChimeras1_",\ suffix=".fasta") seq_file = open(self.tmp_retained_chimeras_seqs2, 'w') seq_file.write(self.retained_chimeras_seqs2) seq_file.close() self.tmp_dna_seqs_ref_otu_picking = get_tmp_filename(\ prefix="UsearchRefOtuPicking_",\ suffix=".fasta") seq_file = open(self.tmp_dna_seqs_ref_otu_picking, "w") seq_file.write(self.dna_seqs_reference_otu_picking) seq_file.close() self._files_to_remove =\ [self.tmp_seq_filepath1, self.tmp_seq_filepath2, self.tmp_ref_database, self.tmp_seqs_w_abundance, self.tmp_de_novo_chimera_seqs, self.tmp_dna_seqs_with_dups, self.tmp_retained_chimeras_seqs1, self.tmp_retained_chimeras_seqs2, self.tmp_dna_seqs_ref_otu_picking, self.dna_seqs3_filepath, self.protein_ref_seqs1_filepath, self.dna_seqs4_filepath] self._dirs_to_remove = [] def tearDown(self): remove_files(self._files_to_remove) if self._dirs_to_remove: for curr_dir in self._dirs_to_remove: rmtree(curr_dir) def test_usearch_qf(self): """ Main program loop test, with default parameters """ # cluster size filtering set to 1 instead of default 4 clusters, failures = usearch_qf(self.tmp_seq_filepath2, output_dir = self.tmp_dir, db_filepath = self.tmp_ref_database, minsize = 1, remove_usearch_logs=True, 
chimeras_retention = 'intersection') expected_clusters = {'1': ['Solemya', 'Solemya_seq2'], '0': ['usearch_ecoli_seq', 'usearch_ecoli_seq2']} expected_failures = ['chimera'] self.assertEqual(clusters, expected_clusters) self.assertEqual(failures, expected_failures) def test_usearch_qf_minlen(self): """ Main program loop test, with longer minlen """ # cluster size filtering set to 1 instead of default 4 clusters, failures = usearch_qf(self.tmp_seq_filepath2, output_dir = self.tmp_dir, db_filepath = self.tmp_ref_database, minsize = 1, remove_usearch_logs=True, chimeras_retention = 'intersection', minlen=110) expected_clusters = {'0': ['usearch_ecoli_seq', 'usearch_ecoli_seq2']} expected_failures = ['Solemya', 'Solemya_seq2', 'chimera'] self.assertEqual(clusters, expected_clusters) self.assertEqual(failures, expected_failures) def test_usearch_qf_reference_otu_picking(self): """ Main program loop test, with reference + new clusters """ # cluster size filtering set to 1 instead of default 4 clusters, failures = usearch_qf(self.tmp_dna_seqs_ref_otu_picking, output_dir = self.tmp_dir, refseqs_fp = self.tmp_ref_database, reference_chimera_detection=False, minsize = 1, remove_usearch_logs=True, suppress_new_clusters=False) # Will cluster everything including RandomCrap, as new clusters allowed. 
expected_clusters = {'1': ['Solemya', 'Solemya_seq2'], '0': ['usearch_ecoli_seq', 'usearch_ecoli_seq2'], '2': ['RandomCrap']} expected_failures = [] self.assertEqual(clusters, expected_clusters) self.assertEqual(failures, expected_failures) def test_usearch_qf_reference_otu_picking_no_new_clusters(self): """ Main program loop test, with reference and no new clusters """ # cluster size filtering set to 1 instead of default 4 clusters, failures = usearch_qf(self.tmp_dna_seqs_ref_otu_picking, output_dir = self.tmp_dir, refseqs_fp = self.tmp_ref_database, reference_chimera_detection=False, minsize = 1, remove_usearch_logs=True, suppress_new_clusters=True) # Will cluster everything but RandomCrap, as no new clusters allowed. expected_clusters = {'1': ['Solemya', 'Solemya_seq2'], '0': ['usearch_ecoli_seq', 'usearch_ecoli_seq2']} expected_failures = ['RandomCrap'] self.assertEqual(clusters, expected_clusters) self.assertEqual(failures, expected_failures) def test_usearch_qf_no_ref_database(self): """ Main program loop with no reference chimera testing """ # cluster size filtering set to 1 instead of default 4 clusters, failures = usearch_qf(self.tmp_seq_filepath2, output_dir = self.tmp_dir, reference_chimera_detection=False, minsize = 1, remove_usearch_logs=True) # Chimera sequence should not be detected without reference test. 
expected_clusters = {'1': ['Solemya', 'Solemya_seq2'], '0': ['usearch_ecoli_seq', 'usearch_ecoli_seq2'], '2': ['chimera']} expected_failures = [] self.assertEqual(clusters, expected_clusters) self.assertEqual(failures, expected_failures) def test_usearch_qf_union(self): """ Main program loop with union nonchimera retention """ # cluster size filtering set to 1 instead of default 4 clusters, failures = usearch_qf(self.tmp_seq_filepath2, output_dir = self.tmp_dir, reference_chimera_detection=False, minsize = 1, remove_usearch_logs=True, chimeras_retention = 'union') # Chimera sequence retained as passes de novo test expected_clusters = {'1': ['Solemya', 'Solemya_seq2'], '0': ['usearch_ecoli_seq', 'usearch_ecoli_seq2'], '2': ['chimera']} expected_failures = [] self.assertEqual(clusters, expected_clusters) self.assertEqual(failures, expected_failures) def test_usearch_qf_disabled_filters(self): """ Returns expected clustering with no filtering """ # cluster size filtering set to 1 instead of default 4 clusters, failures = usearch_qf(self.tmp_seq_filepath2, output_dir = self.tmp_dir, de_novo_chimera_detection=False, reference_chimera_detection=False, cluster_size_filtering=False, remove_usearch_logs=True) # Chimera sequence should not be detected without reference test. 
expected_clusters = {'1': ['Solemya', 'Solemya_seq2'], '0': ['usearch_ecoli_seq', 'usearch_ecoli_seq2'], '2': ['chimera']} expected_failures = [] self.assertEqual(clusters, expected_clusters) self.assertEqual(failures, expected_failures) def test_usearch_qf_generates_logs(self): """ Generates expected log files """ curr_output_dir = get_tmp_filename(prefix='/UsearchLogTest_',suffix='/') create_dir(curr_output_dir) self._dirs_to_remove.append(curr_output_dir) # cluster size filtering set to 1 instead of default 4 clusters, failures = usearch_qf(self.tmp_seq_filepath2, output_dir = curr_output_dir, db_filepath = self.tmp_ref_database, minsize = 1, remove_usearch_logs=False, chimeras_retention = 'intersection') expected_clusters = {'1': ['Solemya', 'Solemya_seq2'], '0': ['usearch_ecoli_seq', 'usearch_ecoli_seq2']} expected_failures = ['chimera'] self.assertEqual(clusters, expected_clusters) self.assertEqual(failures, expected_failures) # Only checking for creation of files, as file contents contain # tmp file names. 
expected_log_names = ['assign_reads_to_otus.log', 'uchime_de_novo_chimera_filtering.log', 'derep.log', 'uchime_reference_chimera_filtering.log', 'minsize_0_abundance_sort.log', 'usearch_cluster_err_corrected.log', 'minsize_1_abundance_sort.log', 'usearch_cluster_seqs.log', 'sortlen.log'] actual_logs =\ [basename(curr_file) for curr_file in glob(curr_output_dir + "*.*")] for log in expected_log_names: self.assertContains(actual_logs, log) def test_concatenate_fastas(self): """ Properly concatenates two fasta files """ out_f =\ get_tmp_filename(prefix='UsearchConcatFileTest_',suffix='.fasta') actual_concatenated_seqs = concatenate_fastas(self.tmp_seq_filepath1, self.tmp_seq_filepath2, out_f) self._files_to_remove.append(out_f) actual_lines =\ [line.strip() for line in open(actual_concatenated_seqs, "U")] self.assertEqual(actual_lines, expected_combined_dna_seqs_1_seqs_usearch) def test_assign_reads_to_otus(self): """ Properly assigns reads back to original ID """ app_result, output_filepath =\ assign_reads_to_otus(original_fasta = self.tmp_ref_database, filtered_fasta = self.tmp_seq_filepath2, remove_usearch_logs = True, working_dir=self.tmp_dir) self._files_to_remove.append(output_filepath) # Stripping off first line, which refers to the command using tmp # file names, retaining other actual results. 
actual_assignments =\ [line.strip() for line in open(output_filepath, "U")][2:] self.assertEqual(actual_assignments, self.expected_otu_assignments) def test_enumerate_otus(self): """ Enumerates OTUs properly """ output_filepath = enumerate_otus(self.tmp_seq_filepath1) self._files_to_remove.append(output_filepath) actual_fasta = [line.strip() for line in open(output_filepath, "U")] self.assertEqual(actual_fasta, self.expected_enumerated_fasta) def test_enumerate_otus_added_options(self): """ Enumerates with all options properly """ output_filepath = enumerate_otus(self.tmp_seq_filepath1, label_prefix = "Big", label_suffix = "Ern", retain_label_as_comment = True, count_start = 255) self._files_to_remove.append(output_filepath) actual_fasta = [line.strip() for line in open(output_filepath, "U")] self.assertEqual(actual_fasta, self.expected_enumerated_fasta_added_options) def test_usearch_cluster_seqs(self): """ Clusters sequences correctly """ # clusters all seqs with default 97% identity app_result, output_filepath =\ usearch_cluster_seqs(self.tmp_seqs_w_abundance, save_intermediate_files=False, remove_usearch_logs=True, percent_id = 0.97, working_dir=self.tmp_dir) self._files_to_remove.append(output_filepath) actual_clusters = [line.strip() for line in open(output_filepath, "U")] self.assertEqual(actual_clusters, self.expected_clusters_w_abundance_default_settings) def test_usearch_cluster_seqs_high_identity(self): """ Clusters sequences correctly """ # Should get two clusters with 99.9% identity app_result, output_filepath =\ usearch_cluster_seqs(self.tmp_seqs_w_abundance, save_intermediate_files=False, remove_usearch_logs=True, percent_id = 0.999, working_dir=self.tmp_dir) self._files_to_remove.append(output_filepath) actual_clusters = [line.strip() for line in open(output_filepath, "U")] self.assertEqual(actual_clusters, self.expected_clusters_w_abundance_low_setting) def test_usearch_chimera_filter_ref_based(self): """ Properly detects chimeras against reference 
database """ app_result, output_filepath =\ usearch_chimera_filter_ref_based(self.tmp_seq_filepath2, self.tmp_ref_database, remove_usearch_logs=True, working_dir=self.tmp_dir) self._files_to_remove.append(output_filepath) actual_filtered_chimeras =\ [line.strip() for line in open(output_filepath, "U")] self.assertEqual(actual_filtered_chimeras, self.expected_reference_filtered_seqs) def test_usearch_chimera_filter_de_novo(self): """ Properly detects de novo chimeras """ app_result, output_filepath =\ usearch_chimera_filter_de_novo(self.tmp_de_novo_chimera_seqs, remove_usearch_logs=True, abundance_skew = 2, working_dir=self.tmp_dir) self._files_to_remove.append(output_filepath) actual_seqs = \ [line.strip() for line in open(output_filepath, "U")] self.assertEqual(actual_seqs, self.expected_de_novo_chimeras_default) def test_usearch_chimera_filter_de_novo_abundance_skew(self): """ Properly detects de novo chimeras with skew changes """ app_result, output_filepath =\ usearch_chimera_filter_de_novo(self.tmp_de_novo_chimera_seqs, remove_usearch_logs=True, abundance_skew = 11, working_dir=self.tmp_dir) self._files_to_remove.append(output_filepath) actual_seqs = \ [line.strip() for line in open(output_filepath, "U")] self.assertEqual(actual_seqs, self.expected_de_novo_chimera_filtered_skew11) def test_usearch_cluster_error_correction(self): """ Properly clusters seqs for chimera testing/filtering """ # clusters all seqs with default 97% identity app_result, output_filepath =\ usearch_cluster_error_correction(self.tmp_seqs_w_abundance, save_intermediate_files=False, remove_usearch_logs=True, percent_id_err = 0.97, working_dir=self.tmp_dir) self._files_to_remove.append(output_filepath) actual_clusters = [line.strip() for line in open(output_filepath, "U")] self.assertEqual(actual_clusters, self.expected_cluster_err_seqs) def test_usearch_sort_by_abundance(self): """ Properly sorts fasta by abundance """ app_result, output_filepath =\ 
            usearch_sort_by_abundance(self.tmp_de_novo_chimera_seqs,
                                      remove_usearch_logs=True,
                                      working_dir=self.tmp_dir)
        self._files_to_remove.append(output_filepath)

        actual_seqs = [line.strip() for line in open(output_filepath, "U")]
        self.assertEqual(actual_seqs, self.expected_sorted_by_abundance_no_filter)

    def test_usearch_sort_by_abundance_filter(self):
        """ Properly sorts fasta by abundance, filters low count otus """
        app_result, output_filepath =\
            usearch_sort_by_abundance(self.tmp_de_novo_chimera_seqs,
                                      remove_usearch_logs=True,
                                      minsize=40,
                                      working_dir=self.tmp_dir)
        self._files_to_remove.append(output_filepath)

        actual_seqs = [line.strip() for line in open(output_filepath, "U")]
        self.assertEqual(actual_seqs, self.expected_abundance_sort_filtered)

    def test_usearch_dereplicate_exact_subseqs(self):
        """ Properly dereplicates fasta file """
        app_result, output_filepath =\
            usearch_dereplicate_exact_subseqs(self.tmp_dna_seqs_with_dups,
                                              remove_usearch_logs=True,
                                              working_dir=self.tmp_dir)
        self._files_to_remove.append(output_filepath)

        actual_seqs = [line.strip() for line in open(output_filepath, "U")]
        self.assertEqual(actual_seqs, self.expected_derep_seqs)

    def test_usearch_dereplicate_exact_seqs(self):
        """ Properly dereplicates fasta file, retaining full length seqs """
        app_result, output_filepath =\
            usearch_dereplicate_exact_seqs(self.tmp_dna_seqs_with_dups,
                                           remove_usearch_logs=True,
                                           working_dir=self.tmp_dir)
        self._files_to_remove.append(output_filepath)

        actual_seqs = [line.strip() for line in open(output_filepath, "U")]
        self.assertEqual(actual_seqs, self.expected_derep_seqs_full_len)

    def test_usearch_fasta_sort_from_filepath(self):
        """ Properly sorts fasta according to seq length """
        app_result, output_filepath =\
            usearch_fasta_sort_from_filepath(self.tmp_seq_filepath2,
                                             remove_usearch_logs=True,
                                             working_dir=self.tmp_dir)
        self._files_to_remove.append(output_filepath)

        actual_seqs = [line.strip() for line in open(output_filepath, "U")]
        self.assertEqual(actual_seqs, self.expected_len_sorted_seqs)

    def test_clusters_from_blast_uc_file(self):
        """ clusters_from_blast_uc_file functions as expected """

        expected_clusters = {'19': ['PC.634_4'],
                             '42': ['PC.test2_1', 'PC.test1_2', 'PC.634_3'],
                             '6': ['PC.269_5']}
        expected_failures = ['PC.481_6']

        self.assertEqual(clusters_from_blast_uc_file(self.uc_lines1),
                         (expected_clusters, expected_failures))

    def test_get_retained_chimeras_union(self):
        """ Properly returns union of two fastas """
        out_f =\
            get_tmp_filename(prefix='UsearchUnionTest_', suffix='.fasta')
        actual_out_fp = get_retained_chimeras(self.tmp_retained_chimeras_seqs1,
                                              self.tmp_retained_chimeras_seqs2,
                                              out_f,
                                              chimeras_retention='union')
        self._files_to_remove.append(out_f)

        actual_out_f = [line.strip() for line in open(actual_out_fp, "U")]
        self.assertEqual(actual_out_f, self.expected_retained_chimeras_union)

    def test_get_retained_chimeras_intersection(self):
        """ Properly returns intersection of two fastas """
        out_f =\
            get_tmp_filename(prefix='UsearchIntersectionTest_', suffix='.fasta')
        actual_out_fp = get_retained_chimeras(self.tmp_retained_chimeras_seqs1,
                                              self.tmp_retained_chimeras_seqs2,
                                              out_f,
                                              chimeras_retention='intersection')
        self._files_to_remove.append(out_f)

        actual_out_f = [line.strip() for line in open(actual_out_fp, "U")]
        self.assertEqual(actual_out_f,
                         self.expected_retained_chimeras_intersection)

    def test_assign_dna_reads_to_protein_database(self):
        """assign_dna_reads_to_protein_database wrapper functions as expected """
        output_dir = get_random_directory_name(output_dir=self.tmp_dir)
        self._dirs_to_remove.append(output_dir)
        output_fp = join(output_dir, 'out.uc')

        assign_dna_reads_to_protein_database(self.dna_seqs3_filepath,
                                             self.protein_ref_seqs1_filepath,
                                             output_fp,
                                             temp_dir=self.tmp_dir)
        self.assertTrue(exists(output_fp))
        self.assertTrue(exists(output_fp.replace('.uc', '.bl6')))

        # confirm that the clusters look like what we expect
        expected_clusters = [['eco:b0015'], ['eco:b0122', 'eco:b0122-like']]
        expected_clusters.sort()
        actual_clusters = clusters_from_blast_uc_file(open(output_fp))[0].values()
        actual_clusters.sort()
        self.assertEqual(actual_clusters, expected_clusters)

    def test_assign_dna_reads_to_protein_database_alt_params(self):
        """assign_dna_reads_to_protein_database wrapper functions with alt params """
        output_dir = get_random_directory_name(output_dir=self.tmp_dir)
        self._dirs_to_remove.append(output_dir)
        output_fp = join(output_dir, 'out.uc')

        assign_dna_reads_to_protein_database(self.dna_seqs3_filepath,
                                             self.protein_ref_seqs1_filepath,
                                             output_fp,
                                             temp_dir=self.tmp_dir,
                                             params={'--id': 1.0})
        self.assertTrue(exists(output_fp))
        self.assertTrue(exists(output_fp.replace('.uc', '.bl6')))

        # confirm that the clusters look like what we expect
        expected_clusters = [['eco:b0015'], ['eco:b0122']]
        expected_clusters.sort()
        actual_clusters = clusters_from_blast_uc_file(open(output_fp))[0].values()
        actual_clusters.sort()
        self.assertEqual(actual_clusters, expected_clusters)

    def test_assign_dna_reads_to_dna_database(self):
        """assign_dna_reads_to_dna_database wrapper functions as expected """
        output_dir = get_random_directory_name(output_dir=self.tmp_dir)
        self._dirs_to_remove.append(output_dir)
        output_fp = join(output_dir, 'out.uc')

        # query the DNA reference set (dna_seqs_4) rather than the protein one
        assign_dna_reads_to_dna_database(self.dna_seqs3_filepath,
                                         self.dna_seqs4_filepath,
                                         output_fp,
                                         temp_dir=self.tmp_dir)
        self.assertTrue(exists(output_fp))
        self.assertTrue(exists(output_fp.replace('.uc', '.bl6')))

        # confirm that the clusters look like what we expect
        expected_clusters = [['eco:b0015'], ['eco:b0122', 'eco:b0122-like']]
        expected_clusters.sort()
        actual_clusters = clusters_from_blast_uc_file(open(output_fp))[0].values()
        actual_clusters.sort()
        self.assertEqual(actual_clusters, expected_clusters)


# Long strings for test files, output, etc.
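# The uc_lines1 fixture below follows usearch's tab-separated .uc format
# (field 1 = record type, field 2 = cluster number, field 9 = query label,
# field 10 = target label).  A minimal sketch of how an 'H' (hit) record
# maps onto a (cluster, sequence id) pair -- parse_uc_hit is a hypothetical
# helper for illustration, not part of cogent.app.usearch:

```python
def parse_uc_hit(line):
    """Return (cluster_id, query_id) for an 'H' record, else None.

    Sketch of .uc hit parsing; assumes the query label's first
    whitespace-separated token is the sequence id.
    """
    fields = line.rstrip('\n').split('\t')
    if not fields or fields[0] != 'H':
        return None
    # the query label may carry extra metadata after the id; keep the id only
    query_id = fields[8].split()[0]
    return fields[1], query_id

# e.g. parse_uc_hit('H\t42\t217\t99.1\t+\t0\t0\t217MI\tPC.test2_1 extra\t42')
# returns ('42', 'PC.test2_1')
```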
# ************************************************* retained_chimeras_seqs1 = """>seq1 ACAGGCC >seq2 ACAGGCCCCC >seq3 TTATCCATT""" retained_chimeras_seqs2 = """>seq3 ACAGGCC >seq4 ACAGGCCCCC >seq5 TTATCCATT""" dna_seqs_1 = """>uclust_test_seqs_0 some comment0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >uclust_test_seqs_1 some comment1 ACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT >uclust_test_seqs_2 some comment2 CCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT >uclust_test_seqs_3 some comment3 CCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT >uclust_test_seqs_4 some comment4 GCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >uclust_test_seqs_5 some comment4_again CCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA >uclust_test_seqs_6 some comment6 CGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT >uclust_test_seqs_7 some comment7 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT >uclust_test_seqs_8 some comment8 CGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT >uclust_test_seqs_9 some comment9 GGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA""" dna_seqs_3 = """>eco:b0001 thrL; thr operon leader peptide; K08278 thr operon leader peptide (N) atgaaacgcattagcaccaccattaccaccaccatcaccattaccacaggtaacggtgcg ggctga >eco:b0015 dnaJ; chaperone Hsp40, co-chaperone with DnaK; K03686 molecular chaperone DnaJ (N) atggctaagcaagattattacgagattttaggcgtttccaaaacagcggaagagcgtgaa atcagaaaggcctacaaacgcctggccatgaaataccacccggaccgtaaccagggtgac aaagaggccgaggcgaaatttaaagagatcaaggaagcttatgaagttctgaccgactcg caaaaacgtgcggcatacgatcagtatggtcatgctgcgtttgagcaaggtggcatgggc ggcggcggttttggcggcggcgcagacttcagcgatatttttggtgacgttttcggcgat atttttggcggcggacgtggtcgtcaacgtgcggcgcgcggtgctgatttacgctataac 
atggagctcaccctcgaagaagctgtacgtggcgtgaccaaagagatccgcattccgact ctggaagagtgtgacgtttgccacggtagcggtgcaaaaccaggtacacagccgcagact tgtccgacctgtcatggttctggtcaggtgcagatgcgccagggattcttcgctgtacag cagacctgtccacactgtcagggccgcggtacgctgatcaaagatccgtgcaacaaatgt catggtcatggtcgtgttgagcgcagcaaaacgctgtccgttaaaatcccggcaggggtg gacactggagaccgcatccgtcttgcgggcgaaggtgaagcgggcgagcatggcgcaccg gcaggcgatctgtacgttcaggttcaggttaaacagcacccgattttcgagcgtgaaggc aacaacctgtattgcgaagtcccgatcaacttcgctatggcggcgctgggtggcgaaatc gaagtaccgacccttgatggtcgcgtcaaactgaaagtgcctggcgaaacccagaccggt aagctattccgtatgcgcggtaaaggcgtcaagtctgtccgcggtggcgcacagggtgat ttgctgtgccgcgttgtcgtcgaaacaccggtaggcctgaacgaaaggcagaaacagctg ctgcaagagctgcaagaaagcttcggtggcccaaccggcgagcacaacagcccgcgctca aagagcttctttgatggtgtgaagaagttttttgacgacctgacccgctaa >eco:b0122 yacC; conserved protein, PulS_OutS family (N) atgaagacgtttttcagaacagtgttattcggcagcctgatggccgtctgcgcaaacagt tacgcgctcagcgagtctgaagccgaagatatggccgatttaacggcagtttttgtcttt ctgaagaacgattgtggttaccagaacttacctaacgggcaaattcgtcgcgcactggtc tttttcgctcagcaaaaccagtgggacctcagtaattacgacaccttcgacatgaaagcc ctcggtgaagacagctaccgcgatctcagcggcattggcattcccgtcgctaaaaaatgc aaagccctggcccgcgattccttaagcctgcttgcctacgtcaaataa >eco:b0122-like atgaagaaaattttcagaacagtgttattcggcagcctgatggccgtctgcgcaaacagt tacgcgctcagcgagtctgaagccgaagatatggccgatttaacggcagtttttgtcttt ctgaagaacgattgtggttaccagaacttacctaacgggcaaattcgtcgcgcactggtc tttttcgctcagcaaaaccagtgggacctcagtaattacgacaccttcgacatgaaagcc ctcggtgaagacagctaccgcgatctcagcggcattggcattcccgtcgctaaaaaatgc aaagccctggcccgcgattccttaagcctgcttgcctacgtcaaatcc""" dna_seqs_4 = """>eco:b0015 dnaJ; chaperone Hsp40, co-chaperone with DnaK; K03686 molecular chaperone DnaJ (N) atggctaagcaagattattacgagattttaggcgtttccaaaacagcggaagagcgtgaa atcagaaaggcctacaaacgcctggccatgaaataccacccggaccgtaaccagggtgac aaagaggccgaggcgaaatttaaagagatcaaggaagcttatgaagttctgaccgactcg caaaaacgtgcggcatacgatcagtatggtcatgctgcgtttgagcaaggtggcatgggc ggcggcggttttggcggcggcgcagacttcagcgatatttttggtgacgttttcggcgat 
atttttggcggcggacgtggtcgtcaacgtgcggcgcgcggtgctgatttacgctataac atggagctcaccctcgaagaagctgtacgtggcgtgaccaaagagatccgcattccgact ctggaagagtgtgacgtttgccacggtagcggtgcaaaaccaggtacacagccgcagact tgtccgacctgtcatggttctggtcaggtgcagatgcgccagggattcttcgctgtacag cagacctgtccacactgtcagggccgcggtacgctgatcaaagatccgtgcaacaaatgt catggtcatggtcgtgttgagcgcagcaaaacgctgtccgttaaaatcccggcaggggtg gacactggagaccgcatccgtcttgcgggcgaaggtgaagcgggcgagcatggcgcaccg gcaggcgatctgtacgttcaggttcaggttaaacagcacccgattttcgagcgtgaaggc aacaacctgtattgcgaagtcccgatcaacttcgctatggcggcgctgggtggcgaaatc gaagtaccgacccttgatggtcgcgtcaaactgaaagtgcctggcgaaacccagaccggt aagctattccgtatgcgcggtaaaggcgtcaagtctgtccgcggtggcgcacagggtgat ttgctgtgccgcgttgtcgtcgaaacaccggtaggcctgaacgaaaggcagaaacagctg ctgcaagagctgcaagaaagcttcggtggcccaaccggcgagcacaacagcccgcgctca aagagcttctttgatggtgtgaagaagttttttgacgacctgacccgctaa >eco:b0122 yacC; conserved protein, PulS_OutS family (N) atgaagacgtttttcagaacagtgttattcggcagcctgatggccgtctgcgcaaacagt tacgcgctcagcgagtctgaagccgaagatatggccgatttaacggcagtttttgtcttt ctgaagaacgattgtggttaccagaacttacctaacgggcaaattcgtcgcgcactggtc tttttcgctcagcaaaaccagtgggacctcagtaattacgacaccttcgacatgaaagcc ctcggtgaagacagctaccgcgatctcagcggcattggcattcccgtcgctaaaaaatgc aaagccctggcccgcgattccttaagcctgcttgcctacgtcaaataa >eco:b0122-like atgaagacgtttttcagaacagtgttattcggcagcctgatggccgtctgcgcaaacagt tacgcgctcagcgagtctgaagccgaagatatggccgatttaacggcagtttttgtcttt ctgaagaacgattgtggttaccagaacttacctaacgggcaaattcgtcgcgcactggtc tttttcgctcagcaaaaccagtgggacctcagtaattacgacaccttcgacatgaaagcc ctcggtgaagacagctaccgcgatctcagcggcattggcattcccgtcgctaaaaaatgc aaagccctggcccgcgattccttaagcctgcttgcctacgtcaaatcc""" protein_ref_seqs1 = """>eco:b0001 thrL; thr operon leader peptide; K08278 thr operon leader peptide (A) MKRISTTITTTITITTGNGAG >eco:b0015 dnaJ; chaperone Hsp40, co-chaperone with DnaK; K03686 molecular chaperone DnaJ (A) MAKQDYYEILGVSKTAEEREIRKAYKRLAMKYHPDRNQGDKEAEAKFKEIKEAYEVLTDS QKRAAYDQYGHAAFEQGGMGGGGFGGGADFSDIFGDVFGDIFGGGRGRQRAARGADLRYN 
MELTLEEAVRGVTKEIRIPTLEECDVCHGSGAKPGTQPQTCPTCHGSGQVQMRQGFFAVQ QTCPHCQGRGTLIKDPCNKCHGHGRVERSKTLSVKIPAGVDTGDRIRLAGEGEAGEHGAP AGDLYVQVQVKQHPIFEREGNNLYCEVPINFAMAALGGEIEVPTLDGRVKLKVPGETQTG KLFRMRGKGVKSVRGGAQGDLLCRVVVETPVGLNERQKQLLQELQESFGGPTGEHNSPRS KSFFDGVKKFFDDLTR >eco:b0015:rep MAKQDYYEILGVSKTAEEREIRKAYKRLAMKYHPDRNQGDKEAEAKFKEIKEAYEVLTDS QKRAAYDQYGHAAFEQGGMGGGGFGGGADFSDIFGDVFGDIFGGGRGRQRAARGADLRYN MELTLEEAVRGVTKEIRIPTLEECDVCHGSGAKPGTQPQTCPTCHGSGQVQMRQGFFAVQ QTCPHCQGRGTLIKDPCNKCHGHGRVERSKTLSVKIPAGVDTGDRIRLAGEGEAGEHGAP AGDLYVQVQVKQHPIFEREGNNLYCEVPINFAMAALGGEIEVPTLDGRVKLKVPGETQTG KLFRMRGKGVKSVRGGAQGDLLCRVVVETPVGLNERQKQLLQELQESFGGPTGEHNSPRS KSFFDGVKKFFDDLTR >eco:b0122 yacC; conserved protein, PulS_OutS family (A) MKTFFRTVLFGSLMAVCANSYALSESEAEDMADLTAVFVFLKNDCGYQNLPNGQIRRALV FFAQQNQWDLSNYDTFDMKALGEDSYRDLSGIGIPVAKKCKALARDSLSLLAYVK""" usearch_ref_seqs1 = """>ref1 ecoli sequence CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA >EU199232 1 1236 Bacteria/Deltaproteobacteria/Desulfurella - Hippea/uncultured TACGCGCGGAAATCGAGCGAGATTGGGAACGCAAGTTCCTGAGTATTGCGGCGAACGGGTGAGTAAGACGTGGGTGATCTACCCCTAGGGTGGGAATAACCCGGGGAAACCCGGGCTAATACCGAATAAGACCACAGGAGGCGACTCCAGAGGGTCAAAGGGAGCCTTGGCCTCCCCC >L07864 1 1200 Bacteria/Beta Gammaproteobacteria/Solemya symbiont GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTAATGCATGGGAATCTGCCATATAGTGGGGGACAACTGGGGAAACCCAGGCTAATACCGCATAATCTCTACGGAGGAAAGGCTTC """ dna_seqs_usearch = """>usearch_ecoli_seq CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGT >Solemya seq GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTA >usearch_ecoli_seq2 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTCCAT 
>Solemya_seq2 GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTATCAAG >chimera CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACCCCTAGGGTGGGAATAACCCGGGGAAACCCGGGCTAATACCGAATAAGACCACAGGAGGCGACTCCAGAGGGTCAAAGGGAGCCTTGGCCTCCCCC """ dna_seqs_reference_otu_picking = """>usearch_ecoli_seq CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGT >Solemya seq GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTA >usearch_ecoli_seq2 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTCCAT >Solemya_seq2 GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTATCAAG >RandomCrap ACACAAACAGTATATTATATCCCCAGACAGGGACCGAGATTTACCACACCCAAAAAAAAAAAAAACACACCCCCCCCCCCCCCACACACACACTTATTTT """ dna_seqs_with_abundance = """>Cluster1;size=114 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >Cluster2;size=45 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCC >Cluster0;size=37 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGAACATCGCGTCAGGTTTGTGTCAGGCCT >Cluster7;size=33 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAAGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >Cluster6;size=32 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >Cluster5;size=25 
AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >Cluster11;size=22 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >Cluster12;size=15 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >Cluster13;size=2 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTGTGTCAGGCCT >Cluster14;size=1 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTG""" de_novo_chimera_seqs = """>Cluster1;size=52 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGGCTTGGTGGTCCGTTACAC CGCCAACTACCTAATGCGACGCATGCCCATCCGCTACCGGATCGCTCCTTTGGAATCCCGGGGATGTCCCCGGAACTCGT TATGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTAGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster0;size=50 TTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCTCTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCC CGCCTACTATCTAATGGAACGCATCCCCATCGTCTACCGGAATACCTTTAATCATGTGAACATGCGGACTCATGATGCCA TCTTGTATTAATCTTCCTTTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGG >Cluster2;size=45 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCTGTTACCC CGCCAACCAGCTAATCAGACGCGGATCCATCGTATACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTT >Cluster10;size=43 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCGCCCTCTCAGGCCGGCTATGCATCATCGTCTTGGTGGGCCTTTACCC CGCCAACCAACTAATGCACCGCAGGTCCATCCGCGCCCCATCCCCTAAAGGATGTTTCACAGAAAGAAGATGCCTCCTTC CTGTACATCGGGATTTGTTCTCCGTTTCCAGAGCGTATTCCCGGTGCGCGGGCAGGTTCCCTACGTGTTACTCACCCG >Cluster4;size=40 
TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC CGCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGTCCCATGCAGGACCGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTGCAAGGCAGGTTACCCACGCGTTACTCACCCGTCCG >Cluster6;size=40 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGGCCATCCGCAGCCGATAAATCTTTAAACATCGGGAGATGCCTCCCAACGTTGTTA CGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTGCGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCGG >Cluster3;size=30 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTCAACTCGGCTATGCATCATTGCCTTGGTAAGCCGTTACCT TACCAACTAGCTAATGCACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCTAGACATGCGTCTAGTG TTGTTATCCGGTATTAGCATCTGTTTCCAGGTGTTATCCCAGTCTCTTGGG >Cluster12;size=19 TTGGTCCGTGTCTCAGTACCAATGTGGGGGGTTAACCTCTCAGTCCCCCTATGTATCGTGGTCTTGGTGAGCCGTTACCC CACCAACTAACTAATACAACGCATGCCCATCCATTACCACCGGAGTTTTCAACCCAAGAAGATGCCTCCCTGGATGTTAT GGGGTATTAGTACCGATTTCTCAGTGTTATCCCCCTGTAATGGGTAGGTTGCATACGCGTTACGCACCCGTGCGCCGGTC GCCGACAAT >Cluster30;size=18 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTTGGTGGGCCGTTACCC CGCCAACTAGCTAATGCGCCGCATGGCCATCCGTAGCCGGTGTTACCCTTTAAACCCCAAGAGATGCCTCTCGGAGTTAT TACGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTACGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster29;size=18 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGCCCATCCGCCACCGGTAATCCCTTTGGCGGCACCGGGATGCCCCGACGCCGCGTC ACGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTGGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCGG TCGCCGG >Cluster16;size=16 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCGTTACCC CTCCAACCAGCTAATCAGACGCGGGTCCATCCTGTACCACCGGAGTTTTTCACACTGTACCATGCGGTACTGTGCGCTTA TGCGGTTTTAGCACCTATTTCTAAGTGTTATCCCCCTGTACAGGGCAGGTTACCCACGCGTTACTCACCCGTCCGCCACT >Cluster222;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTAT 
GCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTTACT >Cluster221;size=1 CTGGGCCGTATCTCAGTCCCAATGTGGCCGTTCAACCTCTCAGTCCGGCTACTGATCGTCGCCTTGGTAGGCCGTTGCCC CGCCAACTACCTAATCGGACGCGAGCCCATCTTTCAGCGGATTGCTCCTTTGATTATCTCACCATGCGGCAAAATAATGT CATGCGGTATTAGCGTTCGTTTCCAAACGTTATCCCCCTCTGAAAGGCAGGTTGCTCACGCGTT >Cluster218;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGGCCACCCTCTCAGGTCGGCTACTGATCGTCACCTTGGTAGGCCGTTACCC CACCAACTAGCTAATCAGACGCAAGCCCATCTATCAGCGGATTGCTCCTTTTCTAGCTATATCATGCGATACTACTAGCT TATGCGGTATTAGCAATGATTTCTCACTGTTATTCCCCTCTGATAGGCAGG >Cluster217;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC CGCCAACCAGCTAATCAGACGCGAGTCCATCTCAGAGCGATAAATCTTTGATATCCAGAGCCATGCGACCCAGATATATT ATGCGGTATTAGCAGCTGTTTCCAGCTGTTATTCCCCATCCAAGGCAGGTT >Cluster216;size=1 CTGGGCCGTGTCTCAGTCCCAGTGTGGCCGTCCGCCCTCTCAGGTCAGCTACTGATCGTCGCCTTGGTAGGCCATTACCC TACCAACTAGCTAATCAGACGCGAGGCCATCTCTCAGCGATAAATCTTTGATATATCTGCCATGCGACAAACATATATTA TGCGGTATTAGCAGTCGTTTCCAACTGTTGTCCCCCTCTGAAAGGCAGGTT >Cluster522;size=10 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTAT GCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTTACT""" dna_seqs_with_dups=""">seq1 GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTAA >seq2 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC >seq3 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC >seq4 GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTT""" # Expected output file data uc_lines1 = """# usearch --id 0.97 --uc usearch_picked_otus/assign_reads_to_otus.uc --query seqs.fna --global --db usearch_picked_otus/enumerated_otus.fasta # version=4.2.66 # Tab-separated fields: # 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel # 
Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster, N=NoHit # For C and D types, PctId is average id with seed. # QueryStart and SeedStart are zero-based relative to start of sequence. # If minus strand, SeedStart is relative to reverse-complemented seed. H\t42\t217\t99.1\t+\t0\t0\t217MI\tPC.test2_1 FLP3FBN02xELBSXx orig_bc=ACAGAGTCGGCG new_bc=ACAGAGTCGGCG,FLP3FBN02x bc_diffs=0\t42 H\t42\t217\t99.1\t+\t0\t0\t217MI\tPC.test1_2 FLP3FBN03ELBSXx orig_bc=ACAGAGTCGGCG new_bc=ACAGAGTCGGCG,FLP3FBN03 bc_diffs=0\t42 H\t42\t217\t99.1\t+\t0\t0\t217MI\tPC.634_3 FLP3FBN01ELBSX orig_bc=TCAGAGTCGGCT new_bc=ACAGAGTCGGCT,FLP3FBN01 bc_diffs=1\t42 H\t19\t243\t100.0\t+\t0\t0\t25MI218M\tPC.634_4 FLP3FBN01EG8AX orig_bc=ACAGAGTCGGCT new_bc=ACAGAGTCGGCT,FLP3FBN01 bc_diffs=0\t19 N\t*\t219\t*\t*\t*\t*\t*\tPC.481_6\tFLP3FBN01DEHK3 orig_bc=ACCAGCGACTAG new_bc=ACCAGCGACTAG,FLP3FBN01 bc_diffs=0\t* H\t6\t211\t99.5\t+\t0\t0\t211M\tPC.269_5 FLP3FBN01EEWKD orig_bc=AGCACGAGCCTA new_bc=AGCACGAGCCTA,FLP3FBN01 bc_diffs=0\t6 """.split('\n') expected_otu_assignments = """# Tab-separated fields: # 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel # Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster, N=NoHit # For C and D types, PctId is average id with seed. # QueryStart and SeedStart are zero-based relative to start of sequence. # If minus strand, SeedStart is relative to reverse-complemented seed. 
H\t2\t199\t97.5\t.\t0\t0\t119M80D\tref1 ecoli sequence\tusearch_ecoli_seq2 N\t*\t178\t*\t*\t*\t*\t*\tEU199232 1 1236 Bacteria/Deltaproteobacteria/Desulfurella - Hippea/uncultured\t* H\t1\t180\t100.0\t.\t0\t0\t97M83D\tL07864 1 1200 Bacteria/Beta Gammaproteobacteria/Solemya symbiont\tSolemya seq""".split('\n') expected_enumerated_fasta = """>0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >1 ACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT >2 CCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT >3 CCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT >4 GCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >5 CCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA >6 CGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT >7 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT >8 CGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT >9 GGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA""".split('\n') expected_enumerated_fasta_added_options = """>Big255Ern\tuclust_test_seqs_0 some comment0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >Big256Ern\tuclust_test_seqs_1 some comment1 ACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT >Big257Ern\tuclust_test_seqs_2 some comment2 CCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT >Big258Ern\tuclust_test_seqs_3 some comment3 CCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT >Big259Ern\tuclust_test_seqs_4 some comment4 GCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >Big260Ern\tuclust_test_seqs_5 some comment4_again CCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA >Big261Ern\tuclust_test_seqs_6 some comment6 
CGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT >Big262Ern\tuclust_test_seqs_7 some comment7 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT >Big263Ern\tuclust_test_seqs_8 some comment8 CGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT >Big264Ern\tuclust_test_seqs_9 some comment9 GGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA""".split('\n') expected_clusters_w_abundance_default_settings = """>Cluster1;size=326 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT""".split('\n') expected_clusters_w_abundance_low_setting = """>Cluster1;size=304 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >Cluster11;size=22 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCTAACCC CCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT""".split('\n') expected_reference_filtered_seqs = """>usearch_ecoli_seq CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTG ACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGT >Solemya seq GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAG TGGCGGACGGGTGAGTA >usearch_ecoli_seq2 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTG ACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTCCAT >Solemya_seq2 GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAG TGGCGGACGGGTGAGTATCAAG""".split('\n') expected_de_novo_chimeras_default = """>Cluster1;size=52 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGGCTTGGTGGTCCGTTACAC CGCCAACTACCTAATGCGACGCATGCCCATCCGCTACCGGATCGCTCCTTTGGAATCCCGGGGATGTCCCCGGAACTCGT TATGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTAGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster0;size=50 
TTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCTCTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCC CGCCTACTATCTAATGGAACGCATCCCCATCGTCTACCGGAATACCTTTAATCATGTGAACATGCGGACTCATGATGCCA TCTTGTATTAATCTTCCTTTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGG >Cluster2;size=45 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCTGTTACCC CGCCAACCAGCTAATCAGACGCGGATCCATCGTATACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTT >Cluster10;size=43 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCGCCCTCTCAGGCCGGCTATGCATCATCGTCTTGGTGGGCCTTTACCC CGCCAACCAACTAATGCACCGCAGGTCCATCCGCGCCCCATCCCCTAAAGGATGTTTCACAGAAAGAAGATGCCTCCTTC CTGTACATCGGGATTTGTTCTCCGTTTCCAGAGCGTATTCCCGGTGCGCGGGCAGGTTCCCTACGTGTTACTCACCCG >Cluster4;size=40 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC CGCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGTCCCATGCAGGACCGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTGCAAGGCAGGTTACCCACGCGTTACTCACCCGTCCG >Cluster6;size=40 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGGCCATCCGCAGCCGATAAATCTTTAAACATCGGGAGATGCCTCCCAACGTTGTTA CGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTGCGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCGG >Cluster3;size=30 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTCAACTCGGCTATGCATCATTGCCTTGGTAAGCCGTTACCT TACCAACTAGCTAATGCACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCTAGACATGCGTCTAGTG TTGTTATCCGGTATTAGCATCTGTTTCCAGGTGTTATCCCAGTCTCTTGGG >Cluster12;size=19 TTGGTCCGTGTCTCAGTACCAATGTGGGGGGTTAACCTCTCAGTCCCCCTATGTATCGTGGTCTTGGTGAGCCGTTACCC CACCAACTAACTAATACAACGCATGCCCATCCATTACCACCGGAGTTTTCAACCCAAGAAGATGCCTCCCTGGATGTTAT GGGGTATTAGTACCGATTTCTCAGTGTTATCCCCCTGTAATGGGTAGGTTGCATACGCGTTACGCACCCGTGCGCCGGTC GCCGACAAT >Cluster29;size=18 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGCCCATCCGCCACCGGTAATCCCTTTGGCGGCACCGGGATGCCCCGACGCCGCGTC 
ACGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTGGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCGG TCGCCGG >Cluster30;size=18 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTTGGTGGGCCGTTACCC CGCCAACTAGCTAATGCGCCGCATGGCCATCCGTAGCCGGTGTTACCCTTTAAACCCCAAGAGATGCCTCTCGGAGTTAT TACGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTACGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster16;size=16 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCGTTACCC CTCCAACCAGCTAATCAGACGCGGGTCCATCCTGTACCACCGGAGTTTTTCACACTGTACCATGCGGTACTGTGCGCTTA TGCGGTTTTAGCACCTATTTCTAAGTGTTATCCCCCTGTACAGGGCAGGTTACCCACGCGTTACTCACCCGTCCGCCACT >Cluster221;size=1 CTGGGCCGTATCTCAGTCCCAATGTGGCCGTTCAACCTCTCAGTCCGGCTACTGATCGTCGCCTTGGTAGGCCGTTGCCC CGCCAACTACCTAATCGGACGCGAGCCCATCTTTCAGCGGATTGCTCCTTTGATTATCTCACCATGCGGCAAAATAATGT CATGCGGTATTAGCGTTCGTTTCCAAACGTTATCCCCCTCTGAAAGGCAGGTTGCTCACGCGTT >Cluster218;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGGCCACCCTCTCAGGTCGGCTACTGATCGTCACCTTGGTAGGCCGTTACCC CACCAACTAGCTAATCAGACGCAAGCCCATCTATCAGCGGATTGCTCCTTTTCTAGCTATATCATGCGATACTACTAGCT TATGCGGTATTAGCAATGATTTCTCACTGTTATTCCCCTCTGATAGGCAGG >Cluster217;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC CGCCAACCAGCTAATCAGACGCGAGTCCATCTCAGAGCGATAAATCTTTGATATCCAGAGCCATGCGACCCAGATATATT ATGCGGTATTAGCAGCTGTTTCCAGCTGTTATTCCCCATCCAAGGCAGGTT >Cluster216;size=1 CTGGGCCGTGTCTCAGTCCCAGTGTGGCCGTCCGCCCTCTCAGGTCAGCTACTGATCGTCGCCTTGGTAGGCCATTACCC TACCAACTAGCTAATCAGACGCGAGGCCATCTCTCAGCGATAAATCTTTGATATATCTGCCATGCGACAAACATATATTA TGCGGTATTAGCAGTCGTTTCCAACTGTTGTCCCCCTCTGAAAGGCAGGTT""".split('\n') expected_de_novo_chimera_filtered_skew11 = """>Cluster1;size=52 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGGCTTGGTGGTCCGTTACAC CGCCAACTACCTAATGCGACGCATGCCCATCCGCTACCGGATCGCTCCTTTGGAATCCCGGGGATGTCCCCGGAACTCGT TATGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTAGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster0;size=50 TTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCTCTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCC 
CGCCTACTATCTAATGGAACGCATCCCCATCGTCTACCGGAATACCTTTAATCATGTGAACATGCGGACTCATGATGCCA TCTTGTATTAATCTTCCTTTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGG >Cluster2;size=45 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCTGTTACCC CGCCAACCAGCTAATCAGACGCGGATCCATCGTATACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTT >Cluster10;size=43 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCGCCCTCTCAGGCCGGCTATGCATCATCGTCTTGGTGGGCCTTTACCC CGCCAACCAACTAATGCACCGCAGGTCCATCCGCGCCCCATCCCCTAAAGGATGTTTCACAGAAAGAAGATGCCTCCTTC CTGTACATCGGGATTTGTTCTCCGTTTCCAGAGCGTATTCCCGGTGCGCGGGCAGGTTCCCTACGTGTTACTCACCCG >Cluster4;size=40 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC CGCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGTCCCATGCAGGACCGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTGCAAGGCAGGTTACCCACGCGTTACTCACCCGTCCG >Cluster6;size=40 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGGCCATCCGCAGCCGATAAATCTTTAAACATCGGGAGATGCCTCCCAACGTTGTTA CGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTGCGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCGG >Cluster3;size=30 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTCAACTCGGCTATGCATCATTGCCTTGGTAAGCCGTTACCT TACCAACTAGCTAATGCACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCTAGACATGCGTCTAGTG TTGTTATCCGGTATTAGCATCTGTTTCCAGGTGTTATCCCAGTCTCTTGGG >Cluster12;size=19 TTGGTCCGTGTCTCAGTACCAATGTGGGGGGTTAACCTCTCAGTCCCCCTATGTATCGTGGTCTTGGTGAGCCGTTACCC CACCAACTAACTAATACAACGCATGCCCATCCATTACCACCGGAGTTTTCAACCCAAGAAGATGCCTCCCTGGATGTTAT GGGGTATTAGTACCGATTTCTCAGTGTTATCCCCCTGTAATGGGTAGGTTGCATACGCGTTACGCACCCGTGCGCCGGTC GCCGACAAT >Cluster29;size=18 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGCCCATCCGCCACCGGTAATCCCTTTGGCGGCACCGGGATGCCCCGACGCCGCGTC ACGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTGGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCGG TCGCCGG >Cluster30;size=18 
CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTTGGTGGGCCGTTACCC CGCCAACTAGCTAATGCGCCGCATGGCCATCCGTAGCCGGTGTTACCCTTTAAACCCCAAGAGATGCCTCTCGGAGTTAT TACGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTACGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster16;size=16 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCGTTACCC CTCCAACCAGCTAATCAGACGCGGGTCCATCCTGTACCACCGGAGTTTTTCACACTGTACCATGCGGTACTGTGCGCTTA TGCGGTTTTAGCACCTATTTCTAAGTGTTATCCCCCTGTACAGGGCAGGTTACCCACGCGTTACTCACCCGTCCGCCACT >Cluster522;size=10 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTAT GCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTTACT >Cluster221;size=1 CTGGGCCGTATCTCAGTCCCAATGTGGCCGTTCAACCTCTCAGTCCGGCTACTGATCGTCGCCTTGGTAGGCCGTTGCCC CGCCAACTACCTAATCGGACGCGAGCCCATCTTTCAGCGGATTGCTCCTTTGATTATCTCACCATGCGGCAAAATAATGT CATGCGGTATTAGCGTTCGTTTCCAAACGTTATCCCCCTCTGAAAGGCAGGTTGCTCACGCGTT >Cluster218;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGGCCACCCTCTCAGGTCGGCTACTGATCGTCACCTTGGTAGGCCGTTACCC CACCAACTAGCTAATCAGACGCAAGCCCATCTATCAGCGGATTGCTCCTTTTCTAGCTATATCATGCGATACTACTAGCT TATGCGGTATTAGCAATGATTTCTCACTGTTATTCCCCTCTGATAGGCAGG >Cluster217;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC CGCCAACCAGCTAATCAGACGCGAGTCCATCTCAGAGCGATAAATCTTTGATATCCAGAGCCATGCGACCCAGATATATT ATGCGGTATTAGCAGCTGTTTCCAGCTGTTATTCCCCATCCAAGGCAGGTT >Cluster216;size=1 CTGGGCCGTGTCTCAGTCCCAGTGTGGCCGTCCGCCCTCTCAGGTCAGCTACTGATCGTCGCCTTGGTAGGCCATTACCC TACCAACTAGCTAATCAGACGCGAGGCCATCTCTCAGCGATAAATCTTTGATATATCTGCCATGCGACAAACATATATTA TGCGGTATTAGCAGTCGTTTCCAACTGTTGTCCCCCTCTGAAAGGCAGGTT""".split('\n') expected_cluster_err_seqs = """>Cluster0;size=2 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT""".split('\n') expected_sorted_by_abundance_no_filter = 
""">Cluster1;size=52 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGGCTTGGTGGTCCGTTACAC CGCCAACTACCTAATGCGACGCATGCCCATCCGCTACCGGATCGCTCCTTTGGAATCCCGGGGATGTCCCCGGAACTCGT TATGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTAGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster0;size=50 TTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCTCTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCC CGCCTACTATCTAATGGAACGCATCCCCATCGTCTACCGGAATACCTTTAATCATGTGAACATGCGGACTCATGATGCCA TCTTGTATTAATCTTCCTTTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGG >Cluster2;size=45 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCTGTTACCC CGCCAACCAGCTAATCAGACGCGGATCCATCGTATACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTT >Cluster10;size=43 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCGCCCTCTCAGGCCGGCTATGCATCATCGTCTTGGTGGGCCTTTACCC CGCCAACCAACTAATGCACCGCAGGTCCATCCGCGCCCCATCCCCTAAAGGATGTTTCACAGAAAGAAGATGCCTCCTTC CTGTACATCGGGATTTGTTCTCCGTTTCCAGAGCGTATTCCCGGTGCGCGGGCAGGTTCCCTACGTGTTACTCACCCG >Cluster4;size=40 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC CGCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGTCCCATGCAGGACCGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTGCAAGGCAGGTTACCCACGCGTTACTCACCCGTCCG >Cluster6;size=40 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGGCCATCCGCAGCCGATAAATCTTTAAACATCGGGAGATGCCTCCCAACGTTGTTA CGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTGCGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCGG >Cluster3;size=30 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTCAACTCGGCTATGCATCATTGCCTTGGTAAGCCGTTACCT TACCAACTAGCTAATGCACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCTAGACATGCGTCTAGTG TTGTTATCCGGTATTAGCATCTGTTTCCAGGTGTTATCCCAGTCTCTTGGG >Cluster12;size=19 TTGGTCCGTGTCTCAGTACCAATGTGGGGGGTTAACCTCTCAGTCCCCCTATGTATCGTGGTCTTGGTGAGCCGTTACCC CACCAACTAACTAATACAACGCATGCCCATCCATTACCACCGGAGTTTTCAACCCAAGAAGATGCCTCCCTGGATGTTAT 
GGGGTATTAGTACCGATTTCTCAGTGTTATCCCCCTGTAATGGGTAGGTTGCATACGCGTTACGCACCCGTGCGCCGGTC GCCGACAAT >Cluster29;size=18 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGCCCATCCGCCACCGGTAATCCCTTTGGCGGCACCGGGATGCCCCGACGCCGCGTC ACGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTGGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCGG TCGCCGG >Cluster30;size=18 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTTGGTGGGCCGTTACCC CGCCAACTAGCTAATGCGCCGCATGGCCATCCGTAGCCGGTGTTACCCTTTAAACCCCAAGAGATGCCTCTCGGAGTTAT TACGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTACGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster16;size=16 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCGTTACCC CTCCAACCAGCTAATCAGACGCGGGTCCATCCTGTACCACCGGAGTTTTTCACACTGTACCATGCGGTACTGTGCGCTTA TGCGGTTTTAGCACCTATTTCTAAGTGTTATCCCCCTGTACAGGGCAGGTTACCCACGCGTTACTCACCCGTCCGCCACT >Cluster522;size=10 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTAT GCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTTACT >Cluster222;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTAT GCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTTACT >Cluster221;size=1 CTGGGCCGTATCTCAGTCCCAATGTGGCCGTTCAACCTCTCAGTCCGGCTACTGATCGTCGCCTTGGTAGGCCGTTGCCC CGCCAACTACCTAATCGGACGCGAGCCCATCTTTCAGCGGATTGCTCCTTTGATTATCTCACCATGCGGCAAAATAATGT CATGCGGTATTAGCGTTCGTTTCCAAACGTTATCCCCCTCTGAAAGGCAGGTTGCTCACGCGTT >Cluster218;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGGCCACCCTCTCAGGTCGGCTACTGATCGTCACCTTGGTAGGCCGTTACCC CACCAACTAGCTAATCAGACGCAAGCCCATCTATCAGCGGATTGCTCCTTTTCTAGCTATATCATGCGATACTACTAGCT TATGCGGTATTAGCAATGATTTCTCACTGTTATTCCCCTCTGATAGGCAGG >Cluster217;size=1 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC 
CGCCAACCAGCTAATCAGACGCGAGTCCATCTCAGAGCGATAAATCTTTGATATCCAGAGCCATGCGACCCAGATATATT ATGCGGTATTAGCAGCTGTTTCCAGCTGTTATTCCCCATCCAAGGCAGGTT >Cluster216;size=1 CTGGGCCGTGTCTCAGTCCCAGTGTGGCCGTCCGCCCTCTCAGGTCAGCTACTGATCGTCGCCTTGGTAGGCCATTACCC TACCAACTAGCTAATCAGACGCGAGGCCATCTCTCAGCGATAAATCTTTGATATATCTGCCATGCGACAAACATATATTA TGCGGTATTAGCAGTCGTTTCCAACTGTTGTCCCCCTCTGAAAGGCAGGTT""".split('\n') expected_abundance_sort_filtered = """>Cluster1;size=52 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGGCTTGGTGGTCCGTTACAC CGCCAACTACCTAATGCGACGCATGCCCATCCGCTACCGGATCGCTCCTTTGGAATCCCGGGGATGTCCCCGGAACTCGT TATGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGTAGCGGGCAGGTTGCATACGTGTTACTCACCCGTGCGCCG GTCGCCGG >Cluster0;size=50 TTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCTCTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCC CGCCTACTATCTAATGGAACGCATCCCCATCGTCTACCGGAATACCTTTAATCATGTGAACATGCGGACTCATGATGCCA TCTTGTATTAATCTTCCTTTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGG >Cluster2;size=45 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCTGTTACCC CGCCAACCAGCTAATCAGACGCGGATCCATCGTATACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTATACGGCAGGTTCTCCACGCGTT >Cluster10;size=43 CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCGCCCTCTCAGGCCGGCTATGCATCATCGTCTTGGTGGGCCTTTACCC CGCCAACCAACTAATGCACCGCAGGTCCATCCGCGCCCCATCCCCTAAAGGATGTTTCACAGAAAGAAGATGCCTCCTTC CTGTACATCGGGATTTGTTCTCCGTTTCCAGAGCGTATTCCCGGTGCGCGGGCAGGTTCCCTACGTGTTACTCACCCG >Cluster4;size=40 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCC CGCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGTCCCATGCAGGACCGTGCGCTTA TGCGGTATTAGCACCTATTTCTAAGTGTTATCCCCCAGTGCAAGGCAGGTTACCCACGCGTTACTCACCCGTCCG >Cluster6;size=40 CTGGTCCGTGTCTCAGTACCAGTGTGGGGGACCTTCCTCTCAGAACCCCTACGCATCGTCGCCTCGGTGGGCCGTTACCC CGCCGACTAGCTAATGCGCCGCATGGCCATCCGCAGCCGATAAATCTTTAAACATCGGGAGATGCCTCCCAACGTTGTTA CGCGGTATTAGACGGAATTTCTTCCGCTTATCCCCCTGCTGCGGGCAGGTTCCATACGTGTTACTCACCCGTGCGCCGG""".split('\n') expected_derep_seqs = 
""">seq1;size=2 GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTAA >seq2;size=2 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC""".split('\n') expected_derep_seqs_full_len = """>Cluster0;size=1 GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTTAA >Cluster1;size=2 TTGGGCCGTGTCTCAGTCCCAATGTGGCCGTCACCCTCTCAGGCCGGCTACTGATCGTCGCCTTGGTGGGCCTTTACCCC >Cluster2;size=1 GCCAACCAGCTAATCAGACGCGGGTCCATCTTGCACCACCGGAGTTTTTCACACTGCTTCATGCGAAGCTGTGCGCTT""".split('\n') expected_len_sorted_seqs = """>chimera CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACCCCTAGGGTGGGAATAACCCGGGGAAACCCGGGCTAATACCGAATAAGACCACAGGAGGCGACTCCAGAGGGTCAAAGGGAGCCTTGGCCTCCCCC >usearch_ecoli_seq2 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTCCAT >usearch_ecoli_seq CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGT >Solemya_seq2 GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTATCAAG >Solemya seq GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTA""".split('\n') expected_combined_dna_seqs_1_seqs_usearch = """>uclust_test_seqs_0 some comment0 AACCCCCACGGTGGATGCCACACGCCCCATACAAAGGGTAGGATGCTTAAGACACATCGCGTCAGGTTTGTGTCAGGCCT >uclust_test_seqs_1 some comment1 ACCCACACGGTGGATGCAACAGATCCCATACACCGAGTTGGATGCTTAAGACGCATCGCGTGAGTTTTGCGTCAAGGCT >uclust_test_seqs_2 some comment2 CCCCCACGGTGGCAGCAACACGTCACATACAACGGGTTGGATTCTAAAGACAAACCGCGTCAAAGTTGTGTCAGAACT >uclust_test_seqs_3 some comment3 CCCCACGGTAGCTGCAACACGTCCCATACCACGGGTAGGATGCTAAAGACACATCGGGTCTGTTTTGTGTCAGGGCT >uclust_test_seqs_4 some comment4 GCCACGGTGGGTACAACACGTCCACTACATCGGCTTGGAAGGTAAAGACACGTCGCGTCAGTATTGCGTCAGGGCT >uclust_test_seqs_5 some comment4_again 
CCGCGGTAGGTGCAACACGTCCCATACAACGGGTTGGAAGGTTAAGACACAACGCGTTAATTTTGTGTCAGGGCA >uclust_test_seqs_6 some comment6 CGCGGTGGCTGCAAGACGTCCCATACAACGGGTTGGATGCTTAAGACACATCGCAACAGTTTTGAGTCAGGGCT >uclust_test_seqs_7 some comment7 ACGGTGGCTACAAGACGTCCCATCCAACGGGTTGGATACTTAAGGCACATCACGTCAGTTTTGTGTCAGAGCT >uclust_test_seqs_8 some comment8 CGGTGGCTGCAACACGTGGCATACAACGGGTTGGATGCTTAAGACACATCGCCTCAGTTTTGTGTCAGGGCT >uclust_test_seqs_9 some comment9 GGTGGCTGAAACACATCCCATACAACGGGTTGGATGCTTAAGACACATCGCATCAGTTTTATGTCAGGGGA >usearch_ecoli_seq CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGT >Solemya seq GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTA >usearch_ecoli_seq2 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTCCAT >Solemya_seq2 GGCTCAGATTGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGGTAACAGGCGGAGCTTGCTCTGCGCTGACGAGTGGCGGACGGGTGAGTATCAAG >chimera CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAGGGAGTAAAGTTAATACCTTTGCTCATTGACCCCTAGGGTGGGAATAACCCGGGGAAACCCGGGCTAATACCGAATAAGACCACAGGAGGCGACTCCAGAGGGTCAAAGGGAGCCTTGGCCTCCCCC""".split('\n') expected_retained_chimeras_union = """>seq1 ACAGGCC >seq2 ACAGGCCCCC >seq3 TTATCCATT >seq4 ACAGGCCCCC >seq5 TTATCCATT""".split('\n') expected_retained_chimeras_intersection = """>seq3 TTATCCATT""".split('\n') if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_app/test_util.py000644 000765 000024 00000147511 12024702176 021621 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.unit_test import TestCase, main from cogent.app.util import Application, CommandLineApplication, \ CommandLineAppResult, ResultPath, ApplicationError, ParameterIterBase,\ ParameterCombinations, cmdline_generator, ApplicationNotFoundError,\ get_tmp_filename, guess_input_handler from cogent.app.parameters import * from os import remove,system,mkdir,rmdir,removedirs,getcwd, 
walk __author__ = "Greg Caporaso and Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Sandra Smit", "Gavin Huttley", "Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" class ParameterCombinationsTests(TestCase): def setUp(self): """Setup for ParameterCombinations tests""" self.mock_app = ParameterCombinationsApp self.params = {'-flag1':True, '--value1':range(0,5), '-delim':range(0,2), '-mix1':[None] + range(0,3)} self.always_on = ['--value1'] self.param_iter = ParameterCombinations(self.mock_app, self.params, self.always_on) def test_init_generator(self): """Tests generator capabilities""" all_params = list(self.param_iter) self.assertEqual(len(all_params), 150) params = {'-flag1':True, '--value1':1, '-delim':['choice1','choice2']} always_on = ['-flag1','-delim'] param_iter = ParameterCombinations(self.mock_app, params, always_on) exp = [self.mock_app._parameters.copy(), self.mock_app._parameters.copy(), self.mock_app._parameters.copy(), self.mock_app._parameters.copy()] # default is on in all these cases exp[0]['-flag1'].on() exp[0]['--value1'].on(1) exp[0]['-delim'].on('choice1') exp[1]['-flag1'].on() exp[1]['--value1'].on(1) exp[1]['-delim'].on('choice2') exp[2]['-flag1'].on() exp[2]['--value1'].off() exp[2]['-delim'].on('choice1') exp[3]['-flag1'].on() exp[3]['--value1'].off() exp[3]['-delim'].on('choice2') obs = list(param_iter) self.assertEqual(obs,exp) def test_reset(self): """Resets the iterator""" first = list(self.param_iter) self.assertRaises(StopIteration, self.param_iter.next) self.param_iter.reset() second = list(self.param_iter) self.assertEqual(first, second) class ParameterIterBaseTests(TestCase): def setUp(self): """Setup for ParameterIterBase tests""" self.mock_app = ParameterCombinationsApp self.params = {'-flag1':True, '--value1':range(0,5), '-delim':range(0,2), 
'-mix1':[None] + range(0,3)} self.always_on = ['--value1'] self.param_base = ParameterIterBase(self.mock_app, self.params, self.always_on) def test_init(self): """Test constructor""" exp_params = {'-flag1':[True, False], '--value1':range(0,5), '-delim':range(0,2) + [False], '-mix1':[None,0,1,2] + [False]} exp_keys = exp_params.keys() exp_values = exp_params.values() self.assertEqual(sorted(self.param_base._keys), sorted(exp_keys)) self.assertEqual(sorted(self.param_base._values), sorted(exp_values)) self.params['asdasda'] = 5 self.assertRaises(ValueError, ParameterIterBase, self.mock_app, \ self.params, self.always_on) self.params.pop('asdasda') self.always_on.append('asdasd') self.assertRaises(ValueError, ParameterIterBase, self.mock_app, \ self.params, self.always_on) def test_make_app_params(self): """Returns app parameters with expected values set""" values = [0,0,True,None] exp = self.mock_app._parameters.copy() exp['-flag1'].on() exp['--value1'].on(0) exp['-delim'].on(0) exp['-mix1'].on(None) obs = self.param_base._make_app_params(values) self.assertEqual(obs, exp) state = [4,False,False,False] exp = self.mock_app._parameters.copy() exp['-flag1'].off() exp['--value1'].on(4) exp['-delim'].off() exp['-mix1'].off() obs = self.param_base._make_app_params(state) self.assertEqual(obs, exp) class CommandLineGeneratorTests(TestCase): def setUp(self): self.abs_path_to_bin = '/bin/path' self.abs_path_to_cmd = '/cmd/path' self.abs_path_to_input = '/input/path' self.abs_path_to_output = '/output/path' self.abs_path_to_stdout = '/stdout/path' self.abs_path_to_stderr = '/stderr/path' self.app = ParameterCombinationsApp params = {'-flag1':True, '-delim':['choice1','choice2']} always_on = ['-delim'] self.mock_app = ParameterCombinationsApp self.param_iter = ParameterCombinations(self.mock_app,params, always_on) def test_cmdline_generator_easy(self): """Returns parameter combinations commandlines""" cmdgen = cmdline_generator(self.param_iter, PathToBin=self.abs_path_to_bin, 
PathToCmd=self.abs_path_to_cmd, PathsToInputs=self.abs_path_to_input, PathToOutput=self.abs_path_to_output, PathToStdout=self.abs_path_to_stdout, PathToStderr=self.abs_path_to_stderr, UniqueOutputs=False, InputParam='-input', OutputParam='-output') bin = self.abs_path_to_bin cmd = self.abs_path_to_cmd inputfile = self.abs_path_to_input outputfile = self.abs_path_to_output stdout = self.abs_path_to_stdout stderr = self.abs_path_to_stderr exp = [' '.join([bin, cmd, '-default=42', '-delimaaachoice1','-flag1',\ '-input="%s"' % inputfile,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])] exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1', \ '-input="%s"' % inputfile,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42','-delimaaachoice2', \ '-flag1', '-input="%s"' % inputfile, \ '-output="%s"' % outputfile, '> "%s"' % stdout,\ '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice2', \ '-input="%s"' % inputfile,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) cmdlines = list(cmdgen) self.assertEqual(cmdlines, exp) def test_cmdline_generator_hard(self): """Returns parameter combinations commandlines. Test stdin/stdout""" cmdgen = cmdline_generator(self.param_iter, PathToBin=self.abs_path_to_bin, PathToCmd=self.abs_path_to_cmd, PathsToInputs=self.abs_path_to_input, PathToOutput=self.abs_path_to_output, PathToStdout=self.abs_path_to_stdout, PathToStderr=self.abs_path_to_stderr, UniqueOutputs=True, InputParam=None, OutputParam=None) bin = self.abs_path_to_bin cmd = self.abs_path_to_cmd inputfile = self.abs_path_to_input outputfile = self.abs_path_to_output stdout = self.abs_path_to_stdout stderr = self.abs_path_to_stderr # the extra '' is intentionally added. When stdout is used for actual # output, the stdout_ param gets set to '' which results in an extra # space being generated on the cmdline. 
this should be benign # across operating systems exp = [' '.join([bin, cmd, '-default=42', '-delimaaachoice1','-flag1',\ '< "%s"' % inputfile, '> "%s"0' % outputfile, '',\ '2> "%s"' % stderr])] exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1', \ '< "%s"' % inputfile,'> "%s"1' % outputfile, '',\ '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42','-delimaaachoice2', \ '-flag1', '< "%s"' % inputfile, \ '> "%s"2' % outputfile, '', '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice2', \ '< "%s"' % inputfile,'> "%s"3' % outputfile, '',\ '2> "%s"' % stderr])) cmdlines = list(cmdgen) self.assertEqual(cmdlines, exp) def test_cmdline_generator_stdout_stderr_off(self): """Returns cmdlines with stdout and stderr disabled""" cmdgen = cmdline_generator(self.param_iter, PathToBin=self.abs_path_to_bin, PathToCmd=self.abs_path_to_cmd, PathsToInputs=self.abs_path_to_input, PathToOutput=self.abs_path_to_output, PathToStdout=None, PathToStderr=None, UniqueOutputs=False, InputParam='-input', OutputParam='-output') bin = self.abs_path_to_bin cmd = self.abs_path_to_cmd inputfile = self.abs_path_to_input outputfile = self.abs_path_to_output stdout = self.abs_path_to_stdout stderr = self.abs_path_to_stderr exp = [' '.join([bin, cmd, '-default=42', '-delimaaachoice1','-flag1',\ '-input="%s"' % inputfile,'-output="%s"' % outputfile,\ '',''])] exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1', \ '-input="%s"' % inputfile,'-output="%s"' % outputfile,\ '',''])) exp.append(' '.join([bin, cmd, '-default=42','-delimaaachoice2', \ '-flag1', '-input="%s"' % inputfile, \ '-output="%s"' % outputfile,'',''])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice2', \ '-input="%s"' % inputfile,'-output="%s"' % outputfile,\ '',''])) cmdlines = list(cmdgen) self.assertEqual(cmdlines, exp) def test_cmdline_generator_multiple_inputs(self): """Tests the cmdline_generator for multiple input support""" paths_to_inputs = 
['/some/dir/a','/some/dir/b'] cmdgen = cmdline_generator(self.param_iter, PathToBin=self.abs_path_to_bin, PathToCmd=self.abs_path_to_cmd, PathsToInputs=paths_to_inputs, PathToOutput=self.abs_path_to_output, PathToStdout=self.abs_path_to_stdout, PathToStderr=self.abs_path_to_stderr, UniqueOutputs=False, InputParam='-input', OutputParam='-output') bin = self.abs_path_to_bin cmd = self.abs_path_to_cmd inputfile1 = paths_to_inputs[0] inputfile2 = paths_to_inputs[1] outputfile = self.abs_path_to_output stdout = self.abs_path_to_stdout stderr = self.abs_path_to_stderr exp = [' '.join([bin, cmd, '-default=42', '-delimaaachoice1','-flag1',\ '-input="%s"' % inputfile1,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])] exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1',\ '-flag1', '-input="%s"' % inputfile2,\ '-output="%s"' % outputfile, '> "%s"' % stdout, \ '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1', \ '-input="%s"' % inputfile1,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1', \ '-input="%s"' % inputfile2,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42','-delimaaachoice2', \ '-flag1', '-input="%s"' % inputfile1, \ '-output="%s"' % outputfile, '> "%s"' % stdout,\ '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42','-delimaaachoice2', \ '-flag1', '-input="%s"' % inputfile2, \ '-output="%s"' % outputfile, '> "%s"' % stdout,\ '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice2', \ '-input="%s"' % inputfile1,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice2', \ '-input="%s"' % inputfile2,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) cmdlines = list(cmdgen) self.assertEqual(cmdlines, exp) def 
test_cmdline_generator_multiple_input_stdin(self): """Tests cmdline_generator for multiple inputs over stdin""" paths_to_inputs = ['/some/dir/a','/some/dir/b'] cmdgen = cmdline_generator(self.param_iter, PathToBin=self.abs_path_to_bin, PathToCmd=self.abs_path_to_cmd, PathsToInputs=paths_to_inputs, PathToOutput=self.abs_path_to_output, PathToStdout=self.abs_path_to_stdout, PathToStderr=self.abs_path_to_stderr, UniqueOutputs=False, InputParam=None, OutputParam='-output') bin = self.abs_path_to_bin cmd = self.abs_path_to_cmd inputfile1 = paths_to_inputs[0] inputfile2 = paths_to_inputs[1] outputfile = self.abs_path_to_output stdout = self.abs_path_to_stdout stderr = self.abs_path_to_stderr exp = [' '.join([bin, cmd, '-default=42', '-delimaaachoice1','-flag1',\ '< "%s"' % inputfile1,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])] exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1',\ '-flag1', '< "%s"' % inputfile2,\ '-output="%s"' % outputfile, '> "%s"' % stdout, \ '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1', \ '< "%s"' % inputfile1,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice1', \ '< "%s"' % inputfile2,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42','-delimaaachoice2', \ '-flag1', '< "%s"' % inputfile1, \ '-output="%s"' % outputfile, '> "%s"' % stdout,\ '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42','-delimaaachoice2', \ '-flag1', '< "%s"' % inputfile2, \ '-output="%s"' % outputfile, '> "%s"' % stdout,\ '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice2', \ '< "%s"' % inputfile1,'-output="%s"' % outputfile,\ '> "%s"' % stdout, '2> "%s"' % stderr])) exp.append(' '.join([bin, cmd, '-default=42', '-delimaaachoice2', \ '< "%s"' % inputfile2,'-output="%s"' % outputfile,\ '> "%s"' % 
stdout, '2> "%s"' % stderr])) cmdlines = list(cmdgen) self.assertEqual(cmdlines, exp) class CommandLineApplicationTests(TestCase): """Tests for the CommandLineApplication class""" def setUp(self): """setUp for all CommandLineApplication tests""" f = open('/tmp/CLAppTester.py','w') f.write(script) f.close() system('chmod 777 /tmp/CLAppTester.py') # create a copy of the script with a space in the name f = open('/tmp/CLApp Tester.py','w') f.write(script) f.close() system('chmod 777 "/tmp/CLApp Tester.py"') self.app_no_params = CLAppTester() self.app_no_params_no_stderr = CLAppTester(SuppressStderr=True) self.app_params =CLAppTester({'-F':'p_file.txt'}) self.app_params_space_in_command =\ CLAppTester_space_in_command({'-F':'p_file.txt'}) self.app_params_no_stderr =CLAppTester({'-F':'p_file.txt'},\ SuppressStderr=True) self.app_params_no_stdout =CLAppTester({'-F':'p_file.txt'},\ SuppressStdout=True) self.app_params_input_as_file =CLAppTester({'-F':'p_file.txt'},\ InputHandler='_input_as_lines') self.app_params_WorkingDir =CLAppTester({'-F':'p_file.txt'},\ WorkingDir='/tmp/test') self.app_params_WorkingDir_w_space =CLAppTester({'-F':'p_file.txt'},\ WorkingDir='/tmp/test space') self.app_params_TmpDir =CLAppTester({'-F':'p_file.txt'},\ TmpDir='/tmp/tmp2') self.app_params_TmpDir_w_space =CLAppTester({'-F':'p_file.txt'},\ TmpDir='/tmp/tmp space') self.data = 42 def test_base_command(self): """CLAppTester: BaseCommand correctly composed """ # No parameters on app = CLAppTester() self.assertEqual(app.BaseCommand,'cd "/tmp/"; /tmp/CLAppTester.py') # ValuedParameter on/off app.Parameters['-F'].on('junk.txt') self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "junk.txt"') app.Parameters['-F'].off() self.assertEqual(app.BaseCommand,'cd "/tmp/"; /tmp/CLAppTester.py') # ValuedParameter accessed by synonym turned on/off app.Parameters['File'].on('junk.txt') self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "junk.txt"') 
app.Parameters['File'].off() self.assertEqual(app.BaseCommand,'cd "/tmp/"; /tmp/CLAppTester.py') # Try multiple parameters, must check for a few different options # because parameters are printed in arbitrary order app.Parameters['-F'].on('junk.txt') app.Parameters['--duh'].on() self.failUnless(app.BaseCommand ==\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "junk.txt" --duh'\ or app.BaseCommand ==\ 'cd "/tmp/"; /tmp/CLAppTester.py --duh -F "junk.txt"') # Space in _command app = CLAppTester_space_in_command() self.assertEqual(app.BaseCommand,'cd "/tmp/"; "/tmp/CLApp Tester.py"') def test_getHelp(self): """CLAppTester: getHelp() functions as expected """ app = CLAppTester() self.assertEqual(app.getHelp(),'Duh') def test_handle_app_result_build_failure(self): """_handle_app_result_build_failure called when CommandLineAppResult() fails """ app = CLAppTester_bad_fixed_file() self.assertRaises(ApplicationError,app) app = CLAppTester_bad_fixed_file_w_handler() self.assertEqual(app(),"Called self._handle_app_result_build_failure") def test_error_on_missing_executable(self): """CLAppTester: Useful error message on executable not found """ # fake command via self._command class Blah(CLAppTester): _command = 'fake_command_jasdlkfsadlkfskladfkladf' self.assertRaises(ApplicationNotFoundError,Blah) # real command but bad path via self._command class Blah(CLAppTester): _command = '/not/a/real/path/ls' self.assertRaises(ApplicationNotFoundError,Blah) # alt _error_on_missing_application function works as expected class Blah(CLAppTester): _command = 'ls' def _error_on_missing_application(self,data): raise ApplicationNotFoundError self.assertRaises(ApplicationNotFoundError,Blah) class Blah(CLAppTester): _command = 'fake_app_asfasdasdasdasdasd' def _error_on_missing_application(self,data): pass # no error raised Blah() def test_no_p_no_d(self): """CLAppTester: parameters turned off, no data""" app = self.app_no_params #test_init assert app.Parameters['-F'].isOff() 
self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr #test_command self.assertEqual(app.BaseCommand,'cd "/tmp/"; /tmp/CLAppTester.py') #test_result result = app() self.assertEqual(result['StdOut'].read(),'out\n') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') self.assertEqual(result['parameterized_file'],None) result.cleanUp() def test_no_p_data_as_str(self): """CLAppTester: parameters turned off, data as string""" app = self.app_no_params #test_init assert app.Parameters['-F'].isOff() self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr #test_command self.assertEqual(app.BaseCommand,'cd "/tmp/"; /tmp/CLAppTester.py') #test_result result = app(self.data) self.assertEqual(result['StdOut'].read(),'out 43\n') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') self.assertEqual(result['parameterized_file'],None) result.cleanUp() def test_p_data_as_str_suppress_stderr(self): """CLAppTester: parameters turned on, data as string, suppress stderr""" app = self.app_params_no_stderr #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert app.SuppressStderr #test_command self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "p_file.txt"') #test_result result = app(self.data) self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'],None) self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') 
self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') self.assertEqual(result['parameterized_file'].read(),\ 'out 43 p_file.txt') result.cleanUp() def test_p_data_as_str_suppress_stdout(self): """CLAppTester: parameters turned on, data as string, suppress stdout""" app = self.app_params_no_stdout #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert app.SuppressStdout #test_command self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "p_file.txt"') #test_result result = app(self.data) self.assertEqual(result['StdOut'],None) self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') self.assertEqual(result['parameterized_file'].read(),\ 'out 43 p_file.txt') result.cleanUp() def test_p_no_data(self): """CLAppTester: parameters turned on, no data""" app = self.app_params #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr #test_command self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "p_file.txt"') #test_result result = app() self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') self.assertEqual(result['parameterized_file'].read(),\ 'out p_file.txt') result.cleanUp() def test_p_space_in_command(self): """CLAppTester: parameters turned on, no data, space in command""" app = self.app_params_space_in_command #test_init 
assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr #test_command self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; "/tmp/CLApp Tester.py" -F "p_file.txt"') #test_result result = app() self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') self.assertEqual(result['parameterized_file'].read(),\ 'out p_file.txt') result.cleanUp() def test_p_data_as_str(self): """CLAppTester: parameters turned on, data as str""" app = self.app_params #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr #test_command self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "p_file.txt"') #test_result result = app(self.data) self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') self.assertEqual(result['parameterized_file'].read(),\ 'out 43 p_file.txt') result.cleanUp() def test_p_data_as_file(self): """CLAppTester: parameters turned on, data as file""" app = self.app_params_input_as_file #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_lines') assert not app.SuppressStderr #test_command # we don't test the command in this case, because we don't know what # the name of the input file is. 
#test_result result = app([self.data]) self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') self.assertEqual(result['parameterized_file'].read(),\ 'out 43 p_file.txt') result.cleanUp() def test_WorkingDir(self): """CLAppTester: WorkingDir functions as expected """ system('cp /tmp/CLAppTester.py /tmp/test/CLAppTester.py') app = self.app_params_WorkingDir #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr # WorkingDir is what we expect self.assertEqual(app.WorkingDir,'/tmp/test/') #test_command self.assertEqual(app.BaseCommand,\ 'cd "/tmp/test/"; /tmp/CLAppTester.py -F "p_file.txt"') #test_result result = app() self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') # Make sure that the parameterized file is in the correct place self.assertEqual(result['parameterized_file'].name,\ '/tmp/test/p_file.txt') self.assertEqual(result['parameterized_file'].read(),\ 'out p_file.txt') result.cleanUp() def test_WorkingDir_w_space(self): """CLAppTester: WorkingDir w/ space in path functions as expected """ system('cp /tmp/CLAppTester.py "/tmp/test space/CLAppTester.py"') app = self.app_params_WorkingDir_w_space #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr # WorkingDir is what we expect self.assertEqual(app.WorkingDir,'/tmp/test space/') #test_command 
self.assertEqual(app.BaseCommand,\ 'cd "/tmp/test space/"; /tmp/CLAppTester.py -F "p_file.txt"') #test_result result = app() self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') # Make sure that the parameterized file is in the correct place self.assertEqual(result['parameterized_file'].name,\ '/tmp/test space/p_file.txt') self.assertEqual(result['parameterized_file'].read(),\ 'out p_file.txt') result.cleanUp() def test_TmpDir(self): """CLAppTester: Alternative TmpDir functions as expected""" app = self.app_params_TmpDir #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr # TmpDir is what we expect self.assertEqual(app.TmpDir,'/tmp/tmp2') #test_command self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "p_file.txt"') #test_result result = app() self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') # Make sure that the parameterized file is in the correct place self.assertEqual(result['parameterized_file'].name,\ '/tmp/p_file.txt') self.assertEqual(result['parameterized_file'].read(),\ 'out p_file.txt') result.cleanUp() def test_TmpDir_w_space(self): """CLAppTester: TmpDir functions as expected w space in name""" app = self.app_params_TmpDir_w_space #test_init assert app.Parameters['-F'].isOn() self.assertEqual(app.InputHandler,'_input_as_string') assert not app.SuppressStderr # TmpDir is what we expect 
self.assertEqual(app.TmpDir,'/tmp/tmp space') #test_command self.assertEqual(app.BaseCommand,\ 'cd "/tmp/"; /tmp/CLAppTester.py -F "p_file.txt"') #test_result result = app() self.assertEqual(result['StdOut'].read(),'') self.assertEqual(result['StdErr'].read(),'I am stderr\n') self.assertEqual(result['ExitStatus'],0) self.assertEqual(result['fixed_file'].read(),'I am fixed file') self.assertEqual(result['base_dep_1'].read(),'base dependent 1') self.assertEqual(result['base_dep_2'].read(),'base dependent 2') # Make sure that the parameterized file is in the correct place self.assertEqual(result['parameterized_file'].name,\ '/tmp/p_file.txt') self.assertEqual(result['parameterized_file'].read(),\ 'out p_file.txt') result.cleanUp() def test_input_as_string(self): """CLAppTester: _input_as_string functions as expected """ self.assertEqual(self.app_no_params._input_as_string('abcd'),'abcd') self.assertEqual(self.app_no_params._input_as_string(42),'42') self.assertEqual(self.app_no_params._input_as_string(None),'None') self.assertEqual(self.app_no_params._input_as_string([1]),'[1]') self.assertEqual(self.app_no_params._input_as_string({'a':1}),\ "{'a': 1}") def test_input_as_lines_from_string(self): """CLAppTester: _input_as_lines functions as expected w/ data as str """ filename = self.app_no_params._input_as_lines('abcd') self.assertEqual(filename[0],'/') f = open(filename) self.assertEqual(f.readline(),'a\n') self.assertEqual(f.readline(),'b\n') self.assertEqual(f.readline(),'c\n') self.assertEqual(f.readline(),'d') f.close() remove(filename) def test_input_as_lines_from_list(self): """CLAppTester: _input_as_lines functions as expected w/ data as list """ filename = self.app_no_params._input_as_lines(['line 1',None,3]) self.assertEqual(filename[0],'/') f = open(filename) self.assertEqual(f.readline(),'line 1\n') self.assertEqual(f.readline(),'None\n') self.assertEqual(f.readline(),'3') f.close() remove(filename) def test_input_as_lines_from_list_w_newlines(self): 
"""CLAppTester: _input_as_lines functions w/ data as list w/ newlines """ filename = self.app_no_params._input_as_lines(['line 1\n',None,3]) self.assertEqual(filename[0],'/') f = open(filename) self.assertEqual(f.readline(),'line 1\n') self.assertEqual(f.readline(),'None\n') self.assertEqual(f.readline(),'3') f.close() remove(filename) def test_input_as_multiline_string(self): """CLAppTester: _input_as_multiline_string functions as expected """ filename = self.app_no_params._input_as_multiline_string(\ 'line 1\nNone\n3') self.assertEqual(filename[0],'/') f = open(filename) self.assertEqual(f.readline(),'line 1\n') self.assertEqual(f.readline(),'None\n') self.assertEqual(f.readline(),'3') f.close() remove(filename) def test_input_as_lines_from_list_single_entry(self): """CLAppTester: _input_as_lines functions as expected w/ 1 element list """ filename = self.app_no_params._input_as_lines(['line 1']) self.assertEqual(filename[0],'/') f = open(filename) self.assertEqual(f.readline(),'line 1') f.close() remove(filename) def test_input_as_multiline_string_single_line(self): """CLAppTester: _input_as_multiline_string functions w/ single line """ # functions as expected with single line string filename = self.app_no_params._input_as_multiline_string(\ 'line 1') self.assertEqual(filename[0],'/') f = open(filename) self.assertEqual(f.readline(),'line 1') f.close() remove(filename) def test_getTmpFilename_non_default(self): """TmpFilename handles alt tmp_dir, prefix and suffix properly""" app = CLAppTester() obs = app.getTmpFilename(include_class_id=False) self.assertTrue(obs.startswith('/tmp/tmp')) self.assertTrue(obs.endswith('.txt')) obs = app.getTmpFilename(tmp_dir="/tmp/blah",prefix="app_ctl_test",\ suffix='.test',include_class_id=False) self.assertTrue(obs.startswith('/tmp/blah/app_ctl_test')) self.assertTrue(obs.endswith('.test')) def test_getTmpFilename_defaults_to_no_class_id(self): """CLAppTester: getTmpFilename doesn't include class id by default """ # I want to 
explicitly test for this so people don't forget to # set the default to False if they change it for testing purposes app = CLAppTester() self.assertFalse(app.getTmpFilename().\ startswith('/tmp/tmpCLAppTester')) self.assertTrue(app.getTmpFilename(include_class_id=True).\ startswith('/tmp/tmpCLAppTester')) def test_input_as_path(self): """CLAppTester: _input_as_path casts data to FilePath""" actual = self.app_no_params._input_as_path('test.pdb') self.assertEqual(actual,'test.pdb') self.assertEqual(str(actual),'"test.pdb"') actual = self.app_no_params._input_as_path('te st.pdb') self.assertEqual(actual,'te st.pdb') self.assertEqual(str(actual),'"te st.pdb"') actual = self.app_no_params._input_as_path('./test.pdb') self.assertEqual(actual,'./test.pdb') self.assertEqual(str(actual),'"./test.pdb"') actual = self.app_no_params._input_as_path('/this/is/a/test.pdb') self.assertEqual(actual,'/this/is/a/test.pdb') self.assertEqual(str(actual),'"/this/is/a/test.pdb"') actual = self.app_no_params._input_as_path('/this/i s/a/test.pdb') self.assertEqual(actual,'/this/i s/a/test.pdb') self.assertEqual(str(actual),'"/this/i s/a/test.pdb"') def test_input_as_paths(self): """CLAppTester: _input_as_paths casts each input to FilePath """ input = ['test.pdb'] actual = self.app_no_params._input_as_paths(input) expected = '"test.pdb"' self.assertEqual(actual,expected) input = ['test1.pdb','test2.pdb'] actual = self.app_no_params._input_as_paths(input) expected = '"test1.pdb" "test2.pdb"' self.assertEqual(actual,expected) input = ['/path/to/test1.pdb','test2.pdb'] actual = self.app_no_params._input_as_paths(input) expected = '"/path/to/test1.pdb" "test2.pdb"' self.assertEqual(actual,expected) input = ['test1.pdb','/path/to/test2.pdb'] actual = self.app_no_params._input_as_paths(input) expected = '"test1.pdb" "/path/to/test2.pdb"' self.assertEqual(actual,expected) input = ['/path/to/test1.pdb','/path/to/test2.pdb'] actual = self.app_no_params._input_as_paths(input) expected = 
'"/path/to/test1.pdb" "/path/to/test2.pdb"' self.assertEqual(actual,expected) input = ['/pa th/to/test1.pdb','/path/to/te st2.pdb'] actual = self.app_no_params._input_as_paths(input) expected = '"/pa th/to/test1.pdb" "/path/to/te st2.pdb"' self.assertEqual(actual,expected) def test_absolute(self): """CLAppTester: _absolute converts relative paths to absolute paths """ absolute = self.app_no_params._absolute self.assertEqual(absolute('/tmp/test.pdb'),'/tmp/test.pdb') self.assertEqual(absolute('test.pdb'),'/tmp/test.pdb') def test_working_dir_setting(self): """CLAppTester: WorkingDir is set correctly """ app = CLAppTester_no_working_dir() self.assertEqual(app.WorkingDir,getcwd()+'/') def test_error_raised_on_command_None(self): """CLAppTester: An error is raises when _command == None """ app = CLAppTester() app._command = None self.assertRaises(ApplicationError, app._get_base_command) def test_rejected_exit_status(self): """CLAppTester_reject_exit_status results in useful error """ app = CLAppTester_reject_exit_status() self.assertRaises(ApplicationError,app) def test_getTmpFilename(self): """TmpFilename should return filename of correct length""" app = CLAppTester() obs = app.getTmpFilename(include_class_id=True) # leaving the strings in this statement so it's clear where the expected # length comes from self.assertEqual(len(obs), len(app.TmpDir) + len('/') + app.TmpNameLen \ + len('tmp') + len('CLAppTester') + len('.txt')) assert obs.startswith(app.TmpDir) chars = set(obs[18:]) assert len(chars) > 1 obs = app.getTmpFilename(include_class_id=False) # leaving the strings in this statement so it's clear where the expected # length comes from self.assertEqual(len(obs), len(app.TmpDir) + len('/') + app.TmpNameLen \ + len('tmp') + len('.txt')) assert obs.startswith(app.TmpDir) def test_getTmpFilename_prefix_suffix_result_constructor(self): """TmpFilename: result has correct prefix, suffix, type""" app = CLAppTester() obs = 
app.getTmpFilename(prefix='blah',include_class_id=False) self.assertTrue(obs.startswith('/tmp/blah')) obs = app.getTmpFilename(suffix='.blah',include_class_id=False) self.assertTrue(obs.endswith('.blah')) # Prefix defaults to not include the class name obs = app.getTmpFilename(include_class_id=False) self.assertFalse(obs.startswith('/tmp/tmpCLAppTester')) self.assertTrue(obs.endswith('.txt')) # including class id functions correctly obs = app.getTmpFilename(include_class_id=True) self.assertTrue(obs.startswith('/tmp/tmpCLAppTester')) self.assertTrue(obs.endswith('.txt')) # result as FilePath obs = app.getTmpFilename(result_constructor=FilePath) self.assertEqual(type(obs),FilePath) # result as str (must check that result is a str and is not a FilePath # since a FilePath is a str) obs = app.getTmpFilename(result_constructor=str) self.assertEqual(type(obs),str) self.assertNotEqual(type(obs),FilePath) class ConvenienceFunctionTests(TestCase): """ """ def setUp(self): """ """ self.tmp_dir = '/tmp' self.tmp_name_len = 20 def test_guess_input_handler(self): """guess_input_handler should correctly identify input""" gih = guess_input_handler self.assertEqual(gih('abc.txt'), '_input_as_string') self.assertEqual(gih('>ab\nTCAG'), '_input_as_multiline_string') self.assertEqual(gih(['ACC','TGA'], True), '_input_as_seqs') self.assertEqual(gih(['>a','ACC','>b','TGA']), '_input_as_lines') self.assertEqual(gih([('a','ACC'),('b','TGA')]),\ '_input_as_seq_id_seq_pairs') self.assertEqual(gih([]),'_input_as_lines') def test_get_tmp_filename(self): """get_tmp_filename should return filename of correct length Adapted from the CommandLineApplication tests of the member function """ obs = get_tmp_filename() # leaving the strings in this statement so it's clear where the expected # length comes from self.assertEqual(len(obs), len(self.tmp_dir) + len('/') + self.tmp_name_len \ + len('tmp') + len('.txt')) self.assertTrue(obs.startswith('/tmp')) # different results on different calls 
self.assertNotEqual(get_tmp_filename(),get_tmp_filename()) obs = get_tmp_filename() # leaving the strings in this statement so it's clear where the expected # length comes from self.assertEqual(len(obs), len(self.tmp_dir) + len('/') + self.tmp_name_len \ + len('tmp') + len('.txt')) assert obs.startswith(self.tmp_dir) def test_get_tmp_filename_prefix_suffix_constructor(self): """get_tmp_filename: result has correct prefix, suffix, type Adapted from the CommandLineApplication tests of the member function """ obs = get_tmp_filename(prefix='blah') self.assertTrue(obs.startswith('/tmp/blah')) obs = get_tmp_filename(suffix='.blah') self.assertTrue(obs.endswith('.blah')) # result as FilePath obs = get_tmp_filename(result_constructor=FilePath) self.assertEqual(type(obs),FilePath) # result as str (must check that result is a str and is not a FilePath # since a FilePath is a str) obs = get_tmp_filename(result_constructor=str) self.assertEqual(type(obs),str) self.assertNotEqual(type(obs),FilePath) class RemoveTests(TestCase): def test_remove(self): """This will remove the test script. 
        Not actually a test!"""
        for dir, n, fnames in walk('/tmp/test/'):
            for f in fnames:
                try:
                    remove(dir + f)
                except OSError, e:
                    pass
        remove('/tmp/CLAppTester.py')
        remove('/tmp/test space/CLAppTester.py')
        remove('/tmp/CLApp Tester.py')
        rmdir('/tmp/tmp space')
        rmdir('/tmp/test')
        rmdir('/tmp/test space')
        rmdir('/tmp/tmp2')
        rmdir('/tmp/blah')

#=====================END OF TESTS===================================

script = """#!/usr/bin/env python
#This is a test script intended to test the CommandLineApplication
#class and CommandLineAppResult class

from sys import argv, stderr, stdin
from os import isatty

out_file_name = None
input_arg = None

# parse input
try:
    if argv[1] == '-F':
        out_file_name = argv[2]
except IndexError:
    pass
try:
    if out_file_name:
        input_arg = argv[3]
    else:
        input_arg = argv[1]
except IndexError:
    pass

# Create the output string
out = 'out'

# get input
try:
    f = open(str(input_arg))
    data = int(f.readline().strip())
except IOError:
    try:
        data = int(input_arg)
    except TypeError:
        data = None

if data:
    data = str(data + 1)
    out = ' '.join([out,data])

# Write base dependent output files
base = 'BASE'
f = open('/tmp/' + base + '.1','w')
f.writelines(['base dependent 1'])
f.close()
f = open('/tmp/' + base + '.2','w')
f.writelines(['base dependent 2'])
f.close()

# If output to file, open the file and write output to it
if out_file_name:
    filename = argv[2]
    f = open(''.join([out_file_name]),'w')
    out = ' '.join([out,out_file_name])
    f.writelines(out)
    f.close()
else:
    print out

#generate some stderr
print >> stderr, 'I am stderr'

# Write the fixed file
f = open('/tmp/fixed.txt','w')
f.writelines(['I am fixed file'])
f.close()
"""

class CLAppTester(CommandLineApplication):
    _parameters = {
        '-F':ValuedParameter(Prefix='-',Name='F',Delimiter=' ',
            Value=None, Quote="\""),
        '--duh':FlagParameter(Prefix='--',Name='duh')}
    _command = '/tmp/CLAppTester.py'
    _synonyms = {'File':'-F','file':'-F'}
    _working_dir = '/tmp'

    def _get_result_paths(self,data):
        if self.Parameters['-F'].isOn():
            param_path = ''.join([self.WorkingDir,self.Parameters['-F'].Value])
        else:
            param_path = None
        result = {}
        result['fixed_file'] = ResultPath(Path='/tmp/fixed.txt')
        result['parameterized_file'] = ResultPath(Path=param_path,
            IsWritten=self.Parameters['-F'].isOn())
        result['base_dep_1'] = ResultPath(Path=self._build_name(suffix='.1'))
        result['base_dep_2'] = ResultPath(Path=self._build_name(suffix='.2'))
        return result

    def _build_name(self,suffix):
        return '/tmp/BASE' + suffix

    def getHelp(self):
        return """Duh"""

class CLAppTester_no_working_dir(CLAppTester):
    _working_dir = None

class CLAppTester_reject_exit_status(CLAppTester):
    def _accept_exit_status(self,exit_status):
        return False

class CLAppTester_bad_fixed_file(CLAppTester):
    def _get_result_paths(self,data):
        if self.Parameters['-F'].isOn():
            param_path = ''.join([self.WorkingDir,self.Parameters['-F'].Value])
        else:
            param_path = None
        result = {}
        result['fixed_file'] = ResultPath(Path='/tmp/fixed.txt')
        result['fixed_file_bad'] = ResultPath(Path='/tmp/i_dont_exist.txt')
        result['parameterized_file'] = ResultPath(Path=param_path,
            IsWritten=self.Parameters['-F'].isOn())
        result['base_dep_1'] = ResultPath(Path=self._build_name(suffix='.1'))
        result['base_dep_2'] = ResultPath(Path=self._build_name(suffix='.2'))
        return result

class CLAppTester_bad_fixed_file_w_handler(CLAppTester_bad_fixed_file):
    def _handle_app_result_build_failure(self,out,err,exit_status,result_paths):
        return "Called self._handle_app_result_build_failure"

class CLAppTester_space_in_command(CLAppTester):
    _command = '"/tmp/CLApp Tester.py"'

class ParameterCombinationsApp(CommandLineApplication):
    """ParameterCombinations mock application to wrap"""
    _command = 'testcmd'
    _parameters = {'-flag1':FlagParameter(Prefix='-',Name='flag1'),
                   '-flag2':FlagParameter(Prefix='-',Name='flag2'),
                   '--value1':ValuedParameter(Prefix='--',Name='value1'),
                   '-value2':ValuedParameter(Prefix='-',Name='value2'),
                   '-mix1':MixedParameter(Prefix='-',Name='mix1'),
                   '-mix2':MixedParameter(Prefix='-',Name='mix2'),
                   '-delim':ValuedParameter(Prefix='-',Name='delim',
                                            Delimiter='aaa'),
                   '-default':ValuedParameter(Prefix='-',Name='default',
                                              Value=42, Delimiter='='),
                   '-input':ValuedParameter(Prefix='-',Name='input',
                                            Delimiter='='),
                   '-output':ValuedParameter(Prefix='-',Name='output',
                                             Delimiter='=')}

if __name__ == '__main__':
    main()

PyCogent-1.5.3/tests/test_app/test_vienna_package.py

#!/usr/bin/env python
from os import getcwd, remove, rmdir
import tempfile, shutil
from cogent.util.unit_test import TestCase, main
from cogent.app.vienna_package import RNAfold, RNAsubopt, RNAplot,\
    plot_from_seq_and_struct, DataError, get_constrained_fold, \
    get_secondary_structure

__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight", "Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Development"

class RNAfoldTests(TestCase):
    """Tests for the RNAfold application controller"""

    def setUp(self):
        self.unnamed_seq = ['AUAGCUAGCUAUGCGCUAGC','ACGGCUAUAGCUAGCGA',
            'gcuagcuauuauauaua']
        self.named_seq = ['>namedseq1','AUAGCUAGCUAUGCGCUAGC','>namedseq2',
            'ACGGCUAUAGCUAGCGA']
        self.mixed_seq = ['>namedseq1','AUAGCUAGCUAUGCGCUAGC',
            'ACGGCUAUAGCUAGCGA','gcuagcuauuauauaua']
        self.mixed_seq2 = ['>namedseq2','AUAGCUAGCUAUGCGCUAGC',
            'ACGGCUAUAGCUAGCGA','gcuagcuauuauauaua']
        self.temp_dir = '/tmp/test_rnafold'

    def test_base_command(self):
        """RNAfold: BaseCommand should be ok for different parameter settings"""
        r = RNAfold()
        working_dir = getcwd()
        obs = r.BaseCommand.split()
        exp = ['cd','"%s/";' % getcwd(),'RNAfold','-d1','-T','37','-S','1.07']
        self.assertEqualItems(obs, exp)
        r.Parameters['-noLP'].on()
        obs = r.BaseCommand.split()
        exp = ['cd','"%s/";' % getcwd(),'RNAfold','-d1','-noLP','-T','37',
            '-S','1.07']
self.assertEqualItems(obs, exp) r.Parameters['Temp'].on(15) obs = r.BaseCommand.split() exp = ['cd','"%s/";' % getcwd(),'RNAfold','-d1','-noLP','-T','15', \ '-S','1.07'] self.assertEqualItems(obs, exp) r.Parameters['-d'].off() obs = r.BaseCommand.split() exp = ['cd','"%s/";' % getcwd(),'RNAfold','-noLP','-T','15','-S','1.07'] self.assertEqualItems(obs, exp) def test_changing_working_dir(self): """RNAfold: BaseCommand should be ok after changing the working dir""" #changing in initialization temp_dir = tempfile.mkdtemp() r = RNAfold(WorkingDir=temp_dir) self.assertEqual(r.BaseCommand,\ 'cd "%s/"; RNAfold -d1 -T 37 -S 1.07'%(temp_dir)) #changing afterwards r = RNAfold() r.WorkingDir = temp_dir self.assertEqual(r.BaseCommand,\ 'cd "%s/"; RNAfold -d1 -T 37 -S 1.07'%(temp_dir)) rmdir(temp_dir) def test_stdout(self): """RNAfold: StdOut should be as expected""" r = RNAfold() exp = '\n'.join(['>namedseq1','AUAGCUAGCUAUGCGCUAGC',\ '...((((((.....)))))) ( -8.30)','ACGGCUAUAGCUAGCGA',\ '...((((....)))).. ( -3.20)','GCUAGCUAUUAUAUAUA',\ '................. ( 0.00)'])+'\n' res = r(self.mixed_seq) obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() def test_stdout_input_as_path(self): """RNAfold: StdOut with input_as_path""" r = RNAfold(InputHandler='_input_as_path') f = open('/tmp/rnatestfile','w') f.write('\n'.join(self.mixed_seq2)) f.close() exp = '\n'.join(['>namedseq2','AUAGCUAGCUAUGCGCUAGC',\ '...((((((.....)))))) ( -8.30)','ACGGCUAUAGCUAGCGA',\ '...((((....)))).. ( -3.20)','GCUAGCUAUUAUAUAUA',\ '................. 
( 0.00)'])+'\n' res = r('/tmp/rnatestfile') obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() remove('/tmp/rnatestfile') def test_stdout_input_as_path_space(self): """RNAfold: StdOut with input_as_path and space in filename""" r = RNAfold(InputHandler='_input_as_path') f = open('/tmp/rna test file','w') f.write('\n'.join(self.mixed_seq2)) f.close() exp = '\n'.join(['>namedseq2','AUAGCUAGCUAUGCGCUAGC',\ '...((((((.....)))))) ( -8.30)','ACGGCUAUAGCUAGCGA',\ '...((((....)))).. ( -3.20)','GCUAGCUAUUAUAUAUA',\ '................. ( 0.00)'])+'\n' res = r('/tmp/rna test file') obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() remove('/tmp/rna test file') def test_get_result_paths_unnamed_seq(self): """RNAfold: _get_result_paths() should work on unnamed seq""" r = RNAfold() res = r(self.unnamed_seq) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS','DP']) self.failUnless(res['DP'] is None) self.failUnless(res['SS'] is not None) self.failUnless(res['StdOut'] is not None) self.failUnless(res['StdErr'] is None) self.assertEqual(res['ExitStatus'],0) res.cleanUp() def test_get_result_paths_named_seq(self): """RNAfold: _get_result_paths() should work on named seq""" r = RNAfold() res = r(self.named_seq) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS','DP','namedseq1_ss',\ 'namedseq2_ss','namedseq1_dp','namedseq2_dp']) res.cleanUp() def test_get_result_paths_mixed_seq(self): """RNAfold: _get_result_paths() should work on partly named seq""" r = RNAfold() res = r(self.mixed_seq) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS','DP','namedseq1_ss',\ 'namedseq1_dp']) res.cleanUp() def test_get_result_paths_parameter(self): """RNAfold: _get_result_paths() should work with diff parameters""" r = RNAfold() r.Parameters['-p'].on() res = r(self.unnamed_seq) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS','DP']) self.failUnless(res['DP'] is not None) 
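The RNAfold tests above compare against raw stdout lines such as `'...((((((.....)))))) ( -8.30)'`. As a minimal sketch, a hypothetical helper (not part of PyCogent's API) can split one such line into its dot-bracket structure and free energy, assuming the plain-text layout seen in the expected outputs:

```python
import re

def parse_rnafold_line(line):
    """Split an RNAfold structure line into (dot-bracket string, energy).

    Assumes the layout seen in the expected outputs above: the
    structure, whitespace, then the energy in parentheses.
    This helper is illustrative only, not PyCogent code.
    """
    m = re.match(r'^([.()]+)\s+\(\s*(-?\d+\.\d+)\)$', line)
    if m is None:
        raise ValueError('not an RNAfold structure line: %r' % line)
    return m.group(1), float(m.group(2))

struct, energy = parse_rnafold_line('...((((((.....)))))) ( -8.30)')
```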
self.failUnless(res['SS'] is not None) self.failUnless(res['StdOut'] is not None) self.failUnless(res['StdErr'] is None) self.assertEqual(res['ExitStatus'],0) res.cleanUp() def test_get_result_paths_working_dir(self): """RNAfold: _get_result_paths() should work with diff working dir""" r = RNAfold(WorkingDir=self.temp_dir) res = r(self.unnamed_seq) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS','DP']) self.failUnless(res['DP'] is None) self.failUnless(res['SS'] is not None) self.failUnless(isinstance(res['SS'],file)) self.failUnless(res['StdOut'] is not None) self.failUnless(res['StdErr'] is None) self.assertEqual(res['ExitStatus'],0) res.cleanUp() def test_get_constrained_fold_bad_data(self): """get_constrained_fold should handle bad data.""" test_seq = 'AAACCCGGGUUU' constraint = '(((...)))' #Test empty sequence. self.assertRaises(ValueError, get_constrained_fold,\ '', constraint) #Test empty constraint string. self.assertRaises(ValueError, get_constrained_fold,\ test_seq, '') #Test different length sequence and constraint. 
self.assertRaises(ValueError, get_constrained_fold,\ test_seq, constraint) def test_get_constrained_fold(self): """get_constrained_fold should give correct result.""" test_seq = 'AAACCCGGGUUU' constraint = '(((......)))' expected_struct = '((((....))))' obs_seq, obs_struct, obs_energy = \ get_constrained_fold(test_seq,constraint) #Test get back correct seq and struct self.assertEqual(obs_seq, test_seq) self.assertEqual(obs_struct, expected_struct) def test_get_secondary_structure(self): """get_secondary_structure should give correct result.""" test_seq = 'AAACCCGGGUUU' expected_struct = '((((....))))' expected_energy = -0.80 obs_seq, obs_struct, obs_energy = \ get_secondary_structure(test_seq) #Test get back correct seq and struct self.assertEqual(obs_seq, test_seq) self.assertEqual(obs_struct, expected_struct) self.assertEqual(obs_energy, expected_energy) def test_zzz_general_cleanup(self): """Executed last, clean up temp_dir""" shutil.rmtree(self.temp_dir) class RNAsuboptTests(TestCase): """Tests for the RNAsubopt application controller""" def test_base_command(self): """RNAsubopt: BaseCommand should be ok for different parameter settings """ r = RNAsubopt() obs = r.BaseCommand.split() exp = ['cd','"%s/";' % getcwd(),'RNAsubopt','-e','1','-d2','-T','37'] self.assertEqualItems(obs, exp) r.Parameters['-nsp'].on('GA') obs = r.BaseCommand.split() exp = ['cd','"%s/";' % getcwd(),'RNAsubopt','-e','1','-d2','-nsp','GA',\ '-T','37'] self.assertEqualItems(obs, exp) r.Parameters['Temp'].on(15) obs = r.BaseCommand.split() exp = ['cd','"%s/";' % getcwd(),'RNAsubopt','-e','1','-d2','-nsp','GA',\ '-T','15'] self.assertEqualItems(obs, exp) r.Parameters['-d'].off() obs = r.BaseCommand.split() exp = ['cd','"%s/";' % getcwd(),'RNAsubopt','-e','1','-nsp','GA','-T',\ '15'] self.assertEqualItems(obs, exp) def test_changing_working_dir(self): """RNAsubopt: BaseCommand should be ok after changing the working dir """ temp_dir = tempfile.mkdtemp() #changing in initialization r = 
RNAsubopt(WorkingDir=temp_dir) self.assertEqual(r.BaseCommand,\ 'cd "%s/"; RNAsubopt -e 1 -d2 -T 37'%(temp_dir)) #changing afterwards r = RNAsubopt() r.WorkingDir = temp_dir self.assertEqual(r.BaseCommand,\ 'cd "%s/"; RNAsubopt -e 1 -d2 -T 37'%(temp_dir)) rmdir(temp_dir) def test_stdout(self): """RNAsubopt: StdOut should be as expected""" r = RNAsubopt() seq = ['AUAGCUAGCUAUGCGCUAGCGGAUUAGCUAGCUAGCGA',\ 'ucgaucgaucagcuagcuauuauauaua'] exp = '\n'.join( ['AUAGCUAGCUAUGCGCUAGCGGAUUAGCUAGCUAGCGA -1720 100', '.(((((((((.(((....)))....))))))))).... -16.20', '.(((((((((((.((....)).)).))))))))).... -17.20', '.((((((((((..((....))...)))))))))).... -16.60', '.((((((((((.((....))....)))))))))).... -16.40', '.(((((((((((((....)))...)))))))))).... -16.90', '.(((((((((((.((....)).).)))))))))).... -17.20', 'UCGAUCGAUCAGCUAGCUAUUAUAUAUA 0 100', '......(((.((....)))))....... 0.70', '..........((....)).......... 0.60', '..((....)).................. 0.90', '............................ 0.00']) + '\n' res = r(seq) obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() def test_stdout_input_as_path_space(self): """RNAsubopt: StdOut with input_as_path and space in filename""" mixed_seq2 = ['>namedseq2','AUAGCUAGCUAUGCGCUAGC', 'ACGGCUAUAGCUAGCGA','gcuagcuauuauauaua'] r = RNAsubopt(InputHandler='_input_as_path') f = open('/tmp/rna test file','w') f.write('\n'.join(mixed_seq2)) f.close() exp = '\n'.join(['> namedseq2 [100]', 'AUAGCUAGCUAUGCGCUAGC -830 100', '...((((((.....)))))) -8.30', 'ACGGCUAUAGCUAGCGA -320 100', '...(((......))).. -2.30', '...((((....)))).. -3.20', 'GCUAGCUAUUAUAUAUA 0 100', '................. 
0.00'])+'\n' res = r('/tmp/rna test file') obs = res['StdOut'].read() self.assertEqual(obs,exp) res.cleanUp() remove('/tmp/rna test file') def test_get_result_paths(self): """RNAsubopt: _get_result_paths() should create the right dict entries """ r = RNAsubopt() seq = ['AUAGCUAGCUAUGCGCUAGCGGAUUAGCUAGCUAGCGA',\ 'ucgaucgaucagcuagcuauuauauaua'] res = r(seq) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus']) self.failUnless(res['StdOut'] is not None) self.failUnless(res['StdErr'] is None) self.assertEqual(res['ExitStatus'],0) res.cleanUp() r = RNAsubopt({'-s':None,'-lodos':None,'-d':3,'-logML':None,\ '-noLP':None,'-4':None,'-noGU':None,'-noCloseGU':None}) res = r(seq) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus']) self.failUnless(res['StdOut'] is not None) self.failUnless(res['StdErr'] is None) #self.assertEqual(res['ExitStatus'],0) #platform-dependent? res.cleanUp() class RNAplotTests(TestCase): """Tests for the RNAplot application controller""" def setUp(self): self.unnamed_seqs = ['AUAGCUAGCUAUGCGCUAGC','...((((((.....))))))',\ 'ACGGCUAUAGCUAGCGA','...((((....))))..',\ 'gcuagcuauuauauaua','.................'] self.named_seqs = \ ['>namedseq1','AUAGCUAGCUAUGCGCUAGC','...((((((.....))))))',\ '>namedseq2','ACGGCUAUAGCUAGCGA','...((((....))))..'] self.mixed_seqs = \ ['>namedseq1','AUAGCUAGCUAUGCGCUAGC','...((((((.....))))))',\ 'ACGGCUAUAGCUAGCGA','...((((....))))..',\ 'gcuagcuauuauauaua','.................'] self.named_seq = \ ['>namedseq','AUAGCUAGCUAUGCGCUAGC','...((((((.....))))))'] self.standard_name = 'namedseq' self.standard_seq = 'AUAGCUAGCUAUGCGCUAGC' self.standard_struct = '...((((((.....))))))' self.bad_pairing_struct = '...((((((.....)))...' 
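The bad-structure fixtures here feed tests that expect an IndexError for unbalanced pairing and a ValueError for illegal characters. A self-contained sketch of such a dot-bracket check (an illustrative assumption, not PyCogent's actual validation code):

```python
def check_vienna_structure(struct):
    """Validate a Vienna dot-bracket string.

    Raises ValueError for characters outside '.', '(' and ')', and
    IndexError when the parentheses do not pair up -- mirroring the
    exception types the surrounding tests expect. Illustrative only.
    """
    depth = 0
    for c in struct:
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
            if depth < 0:
                raise IndexError('unmatched ")" in %r' % struct)
        elif c != '.':
            raise ValueError('illegal character %r in structure' % c)
    if depth != 0:
        raise IndexError('%d unmatched "(" in %r' % (depth, struct))
    return True

check_vienna_structure('...((((((.....))))))')  # well-formed
```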
self.bad_chars_struct = '...((((((.....)xx)))' self.short_struct = '...((((((..))))))' self.temp_dir = '/tmp/test_rnaplot' def test_base_command(self): """RNAplot: BaseCommand should be ok for different parameter settings""" r = RNAplot() working_dir = getcwd() self.assertEqual(r.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','RNAplot'])) r.Parameters['-t'].on(0) self.assertEqual(r.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','RNAplot -t 0'])) r.Parameters['-o'].on('svg') self.assertEqual(r.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','RNAplot -t 0 -o svg'])) r.Parameters['-t'].off() self.assertEqual(r.BaseCommand,\ ''.join(['cd "',getcwd(),'/"; ','RNAplot -o svg'])) def test_changing_working_dir(self): """RNAplot: BaseCommand should be ok after changing the working dir""" #changing in initialization temp_dir = tempfile.mkdtemp() r = RNAplot(WorkingDir=temp_dir) self.assertEqual(r.BaseCommand,\ 'cd "%s/"; RNAplot'%(temp_dir)) #changing afterwards r = RNAplot() r.WorkingDir = temp_dir self.assertEqual(r.BaseCommand,\ 'cd "%s/"; RNAplot'%(temp_dir)) rmdir(temp_dir) def test_get_result_paths_unnamed_seq(self): """RNAplot: _get_result_paths() should work on unnamed seq""" r = RNAplot() res = r(self.unnamed_seqs) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS']) self.failUnless(res['SS'] is not None) self.failUnless(res['StdOut'] is not None) self.failUnless(res['StdErr'] is None) self.assertEqual(res['ExitStatus'],0) res.cleanUp() def test_get_result_paths_named_seq(self): """RNAplot: _get_result_paths() should work on named seq""" r = RNAplot() res = r(self.named_seqs) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','namedseq1_ss',\ 'namedseq2_ss']) res.cleanUp() def test_get_result_paths_mixed_seq(self): """RNAplot: _get_result_paths() should work on partly named seq""" r = RNAplot() res = r(self.mixed_seqs) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS','namedseq1_ss']) res.cleanUp() def 
test_get_result_paths_parameter(self): """RNAplot: _get_result_paths() should work with diff parameters""" r = RNAplot() r.Parameters['-t'].on(0) res = r(self.unnamed_seqs) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS']) self.failUnless(res['SS'] is not None) self.failUnless(res['StdOut'] is not None) self.failUnless(res['StdErr'] is None) self.assertEqual(res['ExitStatus'],0) res.cleanUp() def test_get_result_paths_working_dir(self): """RNAplot: _get_result_paths() should work with diff working dir""" r = RNAplot(WorkingDir=self.temp_dir) res = r(self.unnamed_seqs) self.assertEqualItems(res.keys(),\ ['StdOut','StdErr','ExitStatus','SS']) self.failUnless(res['SS'] is not None) self.failUnless(isinstance(res['SS'],file)) self.failUnless(res['StdOut'] is not None) self.failUnless(res['StdErr'] is None) self.assertEqual(res['ExitStatus'],0) res.cleanUp() def test_rnaplot_output(self): """RNAplot: calling RNAplot on data should give correct result.""" r = RNAplot(WorkingDir=self.temp_dir) res = r(self.named_seq) ps_plot = res['namedseq_ss'].read() observed_lines = ps_plot.split('\n') expected_lines = RNAPLOT_RES.split('\n') #First 8 lines depend on the runtime. Check after. self.assertEqual(observed_lines[8:],expected_lines[8:]) res.cleanUp() def test_plot_from_seq_and_struct_bad_data(self): """plot_from_seq_and_struct helper function should handle bad data. """ #Check bad Vienna Structure pairing self.assertRaises(IndexError, plot_from_seq_and_struct,\ self.standard_seq, self.bad_pairing_struct) #Check bad characters in Vienna Structure self.assertRaises(ValueError, plot_from_seq_and_struct,\ self.standard_seq, self.bad_chars_struct) #Check different lengths of seq and struct self.assertRaises(DataError, plot_from_seq_and_struct,\ self.standard_seq, self.short_struct) def test_plot_from_seq_and_struct(self): """plot_from_seq_and_struct helper function should give correct result. 
""" ps_plot = plot_from_seq_and_struct(self.standard_seq,\ self.standard_struct, seqname=self.standard_name) observed_lines = ps_plot.split('\n') expected_lines = RNAPLOT_RES.split('\n') #First 8 lines depend on the runtime. Check after. self.assertEqual(observed_lines[8:],expected_lines[8:]) def test_zzz_general_cleanup(self): """Executed last, clean up temp_dir""" shutil.rmtree(self.temp_dir) RNAPLOT_RES = """%!PS-Adobe-3.0 EPSF-3.0 %%Creator: PS_dot.c,v 1.38 2007/02/02 15:18:13 ivo Exp $, ViennaRNA-1.7.1 %%CreationDate: Tue Oct 21 15:44:50 2008 %%Title: RNA Secondary Structure Plot %%BoundingBox: 66 210 518 662 %%DocumentFonts: Helvetica %%Pages: 1 %%EndComments %Options: % to switch off outline pairs of sequence comment or % delete the appropriate line near the end of the file %%BeginProlog /RNAplot 100 dict def RNAplot begin /fsize 14 def /outlinecolor {0.2 setgray} bind def /paircolor {0.2 setgray} bind def /seqcolor {0 setgray} bind def /cshow { dup stringwidth pop -2 div fsize -3 div rmoveto show} bind def /min { 2 copy gt { exch } if pop } bind def /max { 2 copy lt { exch } if pop } bind def /drawoutline { gsave outlinecolor newpath coor 0 get aload pop 0.8 0 360 arc % draw 5' circle of 1st sequence currentdict /cutpoint known % check if cutpoint is defined {coor 0 cutpoint getinterval {aload pop lineto} forall % draw outline of 1st sequence coor cutpoint 1 add get aload pop 2 copy moveto 0.8 0 360 arc % draw 5' circle of 2nd sequence coor cutpoint 1 add coor length cutpoint 1 add sub getinterval {aload pop lineto} forall} % draw outline of 2nd sequence {coor {aload pop lineto} forall} % draw outline as a whole ifelse stroke grestore } bind def /drawpairs { paircolor 0.7 setlinewidth [9 3.01] 9 setdash newpath pairs {aload pop coor exch 1 sub get aload pop moveto coor exch 1 sub get aload pop lineto } forall stroke } bind def % draw bases /drawbases { [] 0 setdash seqcolor 0 coor { aload pop moveto dup sequence exch 1 getinterval cshow 1 add } forall pop } 
bind def /init { /Helvetica findfont fsize scalefont setfont 1 setlinejoin 1 setlinecap 0.8 setlinewidth 72 216 translate % find the coordinate range /xmax -1000 def /xmin 10000 def /ymax -1000 def /ymin 10000 def coor { aload pop dup ymin lt {dup /ymin exch def} if dup ymax gt {/ymax exch def} {pop} ifelse dup xmin lt {dup /xmin exch def} if dup xmax gt {/xmax exch def} {pop} ifelse } forall /size {xmax xmin sub ymax ymin sub max} bind def 72 6 mul size div dup scale size xmin sub xmax sub 2 div size ymin sub ymax sub 2 div translate } bind def end %%EndProlog RNAplot begin % data start here /sequence (\\ AUAGCUAGCUAUGCGCUAGC\\ ) def /coor [ [100.992 114.290] [88.210 108.134] [86.994 93.999] [98.538 85.751] [105.046 72.236] [111.554 58.722] [118.062 45.207] [124.571 31.693] [131.079 18.178] [127.159 2.622] [136.992 -10.055] [153.034 -10.127] [162.981 2.461] [159.200 18.052] [144.593 24.686] [138.085 38.201] [131.577 51.716] [125.069 65.230] [118.560 78.745] [112.052 92.259] ] def /pairs [ [4 20] [5 19] [6 18] [7 17] [8 16] [9 15] ] def init % switch off outline pairs or bases by removing these lines drawoutline drawpairs drawbases % show it showpage end %%EOF """ if __name__ == '__main__': main() PyCogent-1.5.3/tests/test_align/__init__.py000644 000765 000024 00000000554 12024702176 021651 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_algorithm', 'test_align', 'test_weights'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Peter Maxwell", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" PyCogent-1.5.3/tests/test_align/test_algorithm.py000644 000765 000024 00000030045 12024702176 023135 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division from cogent.util.unit_test import TestCase, main from cogent.align.algorithm import ScoreCell, 
MatchScorer, equality_scorer,\ default_gap, default_gap_symbol, ScoreMatrix, NeedlemanWunschMatrix, \ SmithWatermanMatrix, nw_align, sw_align from copy import copy, deepcopy __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" """Tests of the cogent.align.algorithm module. """ class ScoreCellTests(TestCase): """Tests for ScoreCell class. """ def setUp(self): """Setup for ScoreCell tests.""" self.score_cell = ScoreCell() def test_init(self): """Tests for ScoreCell __init__ function.""" #Test for empty score cell self.assertEqual(self.score_cell.Score,0) self.assertEqual(self.score_cell.Pointer,None) def test_update(self): """Tests for ScoreCell update function.""" #Test tie up wins self.score_cell.update(1,1,1) self.assertEqual(self.score_cell.Score,1) self.assertEqual(self.score_cell.Pointer,"up") #Test tie diag wins self.score_cell.update(3,1,3) self.assertEqual(self.score_cell.Score,3) self.assertEqual(self.score_cell.Pointer,"diag") #Test up self.score_cell.update(1,1,1) self.assertEqual(self.score_cell.Score,1) self.assertEqual(self.score_cell.Pointer,"up") #Test diag self.score_cell.update(3,1,2) self.assertEqual(self.score_cell.Score,3) self.assertEqual(self.score_cell.Pointer,"diag") #Test left self.score_cell.update(1,2,3) self.assertEqual(self.score_cell.Score,3) self.assertEqual(self.score_cell.Pointer,"left") class MatchScorerTests(TestCase): """Tests for MatchScorer function. 
""" def setUp(self): """Setup for MatchScorer tests.""" self.equality_scorer = MatchScorer(1,-1) self.mismatch_worse_scorer = MatchScorer(1,-2) self.match_worse_scorer = MatchScorer(-1,1) def test_scorer(self): """Tests for MatchScorer function.""" #Test equality_scorer self.assertEqual(self.equality_scorer('A','A'),1) self.assertEqual(self.equality_scorer('A','B'),-1) #Test mismatch_worse_scorer self.assertEqual(self.mismatch_worse_scorer('A','A'),1) self.assertEqual(self.mismatch_worse_scorer('A','B'),-2) #Test match_worse_scorer self.assertEqual(self.match_worse_scorer('A','A'),-1) self.assertEqual(self.match_worse_scorer('A','B'),1) class ScoreMatrixTests(TestCase): """Tests for ScoreCell class. """ def setUp(self): """Setup for ScoreMatrix tests.""" self.score_matrix = ScoreMatrix('AACGU','ACAGU') def fill(): pass def traceback(): pass self.score_matrix.fill = fill self.score_matrix.traceback = traceback self.empty_score_matrix = ScoreMatrix('','') def test_init(self): """Tests for ScoreMatrix __init__ function.""" #Test empty ScoreMatrix self.empty_score_matrix = ScoreMatrix('','') self.assertEqual(self.empty_score_matrix.First,'') self.assertEqual(self.empty_score_matrix.Second,'') self.assertEqual(self.empty_score_matrix.Cols,1) self.assertEqual(self.empty_score_matrix.Rows,1) self.assertEqual(self.empty_score_matrix.GapScore,-1) self.assertEqual(self.empty_score_matrix.GapSymbol,'-') self.assertEqual(self.empty_score_matrix.FirstAlign,[]) self.assertEqual(self.empty_score_matrix.SecondAlign,[]) def test_str(self): """Tests for ScoreMatrix __str__ function.""" #Test empty ScoreMatrix self.assertEqual(self.empty_score_matrix.__str__(),"Empty Score Matrix") #Test full ScoreMatrix self.assertEqual(self.score_matrix.__str__(),\ """\t\tA\tA\tC\tG\tU\n\t0\t0\t0\t0\t0\t0\nA\t0\t0\t0\t0\t0\t0\nC\t0\t0\t0\t0\t0\t0\nA\t0\t0\t0\t0\t0\t0\nG\t0\t0\t0\t0\t0\t0\nU\t0\t0\t0\t0\t0\t0""") def test_alignment(self): """Tests for ScoreMatrix alignment function.""" #Should not 
align since ScoreMatrix base object does not have fill() #or traceback() methods. For testing purposes fill() and traceback() #for self.score_matrix do nothing. self.assertEqual(self.score_matrix.alignment(),('','')) class NeedlemanWunschMatrixTests(TestCase): """Tests for NeedlemanWunschMatrix class. """ def setUp(self): """Setup for NeedlemanWunschMatrix tests.""" self.nw_matrix = NeedlemanWunschMatrix('ACGU','CAGU') #Since _init_first_row, _init_first_col and _init_first_cell are #automatically called when NeedlemanWunschMatrix is initialized, #need to change all elements in nw_matrix_empty to 0 to test #that each _init works. self.nw_matrix_empty = \ NeedlemanWunschMatrix('ACGU','CAGU') for i in range(len(self.nw_matrix_empty)): for j in range(len(self.nw_matrix_empty[i])): self.nw_matrix_empty[i][j]=ScoreCell(0) def test_init_first_row(self): """Tests for NeedlemanWunschMatrix _init_first_row function.""" nw_matrix_empty = copy(self.nw_matrix_empty) #Init first row nw_matrix_empty._init_first_row() #matrix after first row init matrix_scores_first_row_init = [[0,-1,-2,-3,-4], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0]] for i in range(len(nw_matrix_empty)): for j in range(len(nw_matrix_empty[i])): self.assertEqual(nw_matrix_empty[i][j].Score,\ matrix_scores_first_row_init[i][j]) def test_init_first_col(self): """Tests for NeedlemanWunschMatrix _init_first_col function.""" nw_matrix_empty = copy(self.nw_matrix_empty) #Init first col nw_matrix_empty._init_first_col() #matrix after first col init matrix_scores_first_col_init = [[0,0,0,0,0], [-1,0,0,0,0], [-2,0,0,0,0], [-3,0,0,0,0], [-4,0,0,0,0]] for i in range(len(nw_matrix_empty)): for j in range(len(nw_matrix_empty[i])): self.assertEqual(nw_matrix_empty[i][j].Score,\ matrix_scores_first_col_init[i][j]) def test_init_first_cell(self): """Tests for NeedlemanWunschMatrix _init_first_cell function.""" nw_matrix_empty = copy(self.nw_matrix_empty) #Init first cell nw_matrix_empty._init_first_cell()
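The NeedlemanWunschMatrix tests fix the expected score matrix for 'ACGU' vs 'CAGU' under match=+1, mismatch=-1, gap=-1. For reference, a minimal self-contained sketch of the underlying dynamic-programming recurrence (score only, no traceback; `nw_score` is a hypothetical helper, not part of cogent.align.algorithm):

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    m = [[0] * cols for _ in range(rows)]
    # First row/column score all-gap prefixes, as _init_first_row/_col do above.
    for i in range(1, rows):
        m[i][0] = i * gap
    for j in range(1, cols):
        m[0][j] = j * gap
    # Each cell takes the best of a diagonal (match/mismatch) move or a gap move.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = m[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            m[i][j] = max(diag, m[i - 1][j] + gap, m[i][j - 1] + gap)
    return m[-1][-1]

# Agrees with the fill() expectations above: nw_score('ACGU', 'CAGU') == 1
```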
self.assertEqual(nw_matrix_empty[0][0].Score,0) def test_initialized_matrix(self): """Tests for NeedlemanWunschMatrix after full initialization.""" matrix_scores_first_row_init = [[0,-1,-2,-3,-4], [-1,0,0,0,0], [-2,0,0,0,0], [-3,0,0,0,0], [-4,0,0,0,0]] for i in range(len(self.nw_matrix)): for j in range(len(self.nw_matrix[i])): self.assertEqual(self.nw_matrix[i][j].Score,\ matrix_scores_first_row_init[i][j]) def test_fill(self): """Tests for NeedlemanWunschMatrix fill function.""" filled_nw_matrix = copy(self.nw_matrix) filled_nw_matrix.fill() matrix_scores_filled = [[0,-1,-2,-3,-4], [-1,-1,0,-1,-2], [-2,0,-1,-1,-2], [-3,-1,-1,0,-1], [-4,-2,-2,-1,1]] matrix_pointers_filled = [[None,'left','left','left','left'], ['up','diag','diag','left','left'], ['up','diag','up','diag','diag'], ['up','up','diag','diag','left'], ['up','up','up','up','diag']] self.assertEqual(filled_nw_matrix.Filled,True) for i in range(len(filled_nw_matrix)): for j in range(len(filled_nw_matrix[i])): self.assertEqual(filled_nw_matrix[i][j].Score,\ matrix_scores_filled[i][j]) self.assertEqual(filled_nw_matrix[i][j].Pointer,\ matrix_pointers_filled[i][j]) self.assertEqual(filled_nw_matrix.MaxScore,(1,4,4)) def test_traceback(self): """Tests for NeedlemanWunschMatrix traceback function.""" self.nw_matrix.traceback() self.assertEqual(self.nw_matrix.FirstAlign,['A','C','-','G','U']) self.assertEqual(self.nw_matrix.SecondAlign,['-','C','A','G','U']) class SmithWatermanMatrixTests(TestCase): """Tests for SmithWatermanMatrix class. 
""" def setUp(self): """Setup for SmithWatermanMatrix tests.""" self.sw_matrix = SmithWatermanMatrix('ACGU','CAGU') def test_fill(self): """Tests for SmithWatermanMatrix fill function.""" filled_sw_matrix = copy(self.sw_matrix) filled_sw_matrix.fill() matrix_scores_filled = [[0,0,0,0,0], [0,0,1,0,0], [0,1,0,0,0], [0,0,0,1,0], [0,0,0,0,2]] matrix_pointers_filled = [[None,None,None,None,None], [None,None,'diag',None,None], [None,'diag',None,None,None], [None,None,None,'diag',None], [None,None,None,None,'diag']] for i in range(len(filled_sw_matrix)): for j in range(len(filled_sw_matrix[i])): self.assertEqual(filled_sw_matrix[i][j].Score,\ matrix_scores_filled[i][j]) self.assertEqual(filled_sw_matrix[i][j].Pointer,\ matrix_pointers_filled[i][j]) self.assertEqual(filled_sw_matrix.MaxScore,(2,4,4)) def test_traceback(self): """Tests for SmithWatermanMatrix traceback function.""" self.sw_matrix.fill() self.sw_matrix.traceback() self.assertEqual(self.sw_matrix.FirstAlign,['G','U']) self.assertEqual(self.sw_matrix.SecondAlign,['G','U']) class NwAlignTests(TestCase): """Tests for nw_align fuction. """ def test_nw_align_empty(self): """Tests for nw_align function.""" (first,second),score = nw_align('','',return_score=True) self.assertEqual(first,'') self.assertEqual(second,'') self.assertEqual(score,0) def test_nw_align(self): """Tests for nw_align function.""" (first,second),score = nw_align('ACGU','CAGU',return_score=True) self.assertEqual(first,'AC-GU') self.assertEqual(second,'-CAGU') self.assertEqual(score,1) class SwAlignTests(TestCase): """Tests for sw_align function. 
""" def test_sw_align_empty(self): """Tests for sw_align function.""" (first,second),score = sw_align('','',return_score=True) self.assertEqual(first,'') self.assertEqual(second,'') self.assertEqual(score,0) def test_sw_align(self): """Tests for sw_align function.""" (first,second),score = sw_align('ACGU','CAGU',return_score=True) self.assertEqual(first,'GU') self.assertEqual(second,'GU') self.assertEqual(score,2) #run if called from command-line if __name__ == "__main__": main() PyCogent-1.5.3/tests/test_align/test_align.py000644 000765 000024 00000022026 12024702176 022241 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent import DNA, LoadSeqs from cogent.align.align import classic_align_pairwise, make_dna_scoring_dict,\ local_pairwise, global_pairwise from cogent.evolve.models import HKY85 import cogent.evolve.substitution_model dna_model = cogent.evolve.substitution_model.Nucleotide( model_gaps=False, equal_motif_probs=True) import cogent.align.progressive import unittest __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def matchedColumns(align): """Count the matched columns in an alignment""" def all_same(column): consensus = None for motif in column: if consensus is None: consensus = motif elif motif != consensus: return False return True return len(align.filtered(all_same)) seq1 = DNA.makeSequence('aaaccggacattacgtgcgta', Name='FAKE01') seq2 = DNA.makeSequence( 'ccggtcaggttacgtacgtt', Name= 'FAKE02') class AlignmentTestCase(unittest.TestCase): def _aligned_both_ways(self, seq1, seq2, **kw): S = make_dna_scoring_dict(10, -1, -8) a1 = classic_align_pairwise(seq1, seq2, S, 10, 2, **kw) a2 = classic_align_pairwise(seq2, seq1, S, 10, 2, **kw) return [a1, a2] def test_local(self): for a in 
self._aligned_both_ways(seq1, seq2, local=True): self.assertEqual(matchedColumns(a), 15) self.assertEqual(len(a), 19) def test_gap_at_one_end(self): for a in self._aligned_both_ways(seq1, seq2, local=False): self.assertEqual(matchedColumns(a), 15) self.assertEqual(len(a), 23) def test_gaps_at_both_ends(self): s = 'aaaccggttt' s1 = DNA.makeSequence(s[:-2], Name="A") s2 = DNA.makeSequence(s[2:], Name="B") for a in self._aligned_both_ways(s1, s2, local=False): self.assertEqual(matchedColumns(a), 6) self.assertEqual(len(a), 10) def test_short(self): s1 = DNA.makeSequence('tacagta', Name="A") s2 = DNA.makeSequence('tacgtc', Name="B") for a in self._aligned_both_ways(s1, s2, local=False): self.assertEqual(matchedColumns(a), 5) self.assertEqual(len(a), 7) def test_pairwise_returns_score(self): """exercise pairwise local/global returns alignment score""" S = make_dna_scoring_dict(10, -1, -8) aln, score = local_pairwise(seq1, seq2, S, 10, 2, return_score=True) self.assertTrue(score > 100) aln, score = global_pairwise(seq1, seq2, S, 10, 2, return_score=True) self.assertTrue(score > 100) def test_codon(self): s1 = DNA.makeSequence('tacgccgta', Name="A") s2 = DNA.makeSequence('tacgta', Name="B") codon_model = cogent.evolve.substitution_model.Codon( model_gaps=False, equal_motif_probs=True, mprob_model='conditional') tree = cogent.LoadTree(tip_names=['A', 'B']) lf = codon_model.makeLikelihoodFunction(tree, aligned=False) lf.setSequences(dict(A=s1, B=s2)) (score, a) = lf.getLogLikelihood().edge.getViterbiScoreAndAlignment() self.assertEqual(matchedColumns(a), 6) self.assertEqual(len(a), 9) def test_local_tiebreak(self): """Should pick the first best-equal hit rather than the last one""" # so that the Pyrex and Python versions give the same result. 
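The calls to make_dna_scoring_dict(10, -1, -8) above build a pair-scoring lookup that rewards matches and penalizes transitions less heavily than transversions. A rough sketch of how such a table can be built (`dna_scoring_dict` is a hypothetical stand-in, not cogent's implementation, which also covers ambiguity codes and lowercase bases):

```python
# A and G are purines, C and T pyrimidines: a mismatch within one group
# is a transition, a mismatch across groups is a transversion.
PURINES = set('AG')

def dna_scoring_dict(match, transition, transversion):
    """Build a {(x, y): score} lookup for DNA pairwise alignment."""
    scores = {}
    for x in 'ACGT':
        for y in 'ACGT':
            if x == y:
                scores[(x, y)] = match
            elif (x in PURINES) == (y in PURINES):
                scores[(x, y)] = transition
            else:
                scores[(x, y)] = transversion
    return scores
```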
score_matrix = make_dna_scoring_dict(match=1, transition=-1, transversion=-1) pattern = DNA.makeSequence('cwc', Name='pattern') two_hit = DNA.makeSequence( 'cactc', Name= 'target') aln = local_pairwise(pattern, two_hit, score_matrix, 5, 2) hit = aln.NamedSeqs['target'] self.assertEqual(str(hit).lower(), 'cac') class UnalignedPairTestCase(unittest.TestCase): def test_forward(self): tree = cogent.LoadTree(tip_names='AB') pc = dna_model.makeLikelihoodFunction(tree, aligned=False) pc.setSequences({'A':seq1, 'B':seq2}) LnL = pc.getLogLikelihood() assert isinstance(LnL, float) class MultipleAlignmentTestCase(unittest.TestCase): def _make_aln(self, orig, model=dna_model, param_vals=None, indel_rate=0.1, indel_length=0.5, **kw): kw['indel_rate'] = indel_rate kw['indel_length'] = indel_length seqs = dict((key, DNA.makeSequence(value)) for (key, value) in orig.items()) if len(seqs) == 2: tree = cogent.LoadTree(treestring="(A:.1,B:.1)") else: tree = cogent.LoadTree(treestring="(((A:.1,B:.1):.1,C:.1):.1,D:.1)") aln, tree = cogent.align.progressive.TreeAlign(model, seqs, tree=tree, param_vals=param_vals, show_progress=False, **kw) return aln def _test_aln(self, seqs, model=dna_model, param_vals=None, **kw): orig = dict((n,s.replace('-', '')) for (n,s) in seqs.items()) aln = self._make_aln(orig, model=model, param_vals=param_vals, **kw) result = dict((n,s.lower()) for (n,s) in aln.todict().items()) # assert the alignment result is correct self.assertEqual(seqs, result) # assert the returned alignment has the correct parameter values in the # align.Info object.
if param_vals: for param, val in param_vals: self.assertEqual(aln.Info.AlignParams[param], val) def test_progressive1(self): """test progressive alignment, gaps in middle""" self._test_aln({ 'A': 'tacagta', 'B': 'tac-gtc', 'C': 'ta---ta', 'D': 'tac-gtc', }) def test_progressive_est_tree(self): """exercise progressive alignment without a guide tree""" seqs = LoadSeqs(data={'A': "TGTGGCACAAATGCTCATGCCAGCTCTTTACAGCATGAGAACA", 'B': "TGTGGCACAGATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTT", 'C': "TGTGGCACAAGTACTCATGCCAGCTCAGTACAGCATGAGAACAGCAGTTT"}, aligned=False) aln, tree = cogent.align.progressive.TreeAlign(HKY85(), seqs, show_progress=False, param_vals={'kappa': 4.0}) expect = {'A': 'TGTGGCACAAATGCTCATGCCAGCTCTTTACAGCATGAGAACA-------', 'C': 'TGTGGCACAAGTACTCATGCCAGCTCAGTACAGCATGAGAACAGCAGTTT', 'B': 'TGTGGCACAGATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTT'} self.assertEqual(aln.todict(), expect) def test_progressive_params(self): """exercise progressive alignment providing model params""" self._test_aln({ 'A': 'tacagta', 'B': 'tac-gtc', 'C': 'ta---ta', 'D': 'cac-cta', }, model=HKY85(), param_vals=[('kappa',2.0)]) def test_TreeAlign_does_pairs(self): """test TreeAlign handles pairs of sequences""" self._test_aln({ 'A': 'acttgtac', 'B': 'ac--gtac', }) def test_gap_at_start(self): """test progressive alignment, gaps at start""" self._test_aln({ 'A': '-ac', 'B': '-ac', 'C': '-ac', 'D': 'gac', }) def test_gap_at_end(self): """test progressive alignment, gaps at end""" self._test_aln({ 'A': 'gt-', 'B': 'gt-', 'C': 'gt-', 'D': 'gta', }) def test_gaps2(self): """Gaps have real costs, even end gaps""" self._test_aln({ 'A': 'g-', 'B': 'g-', 'C': 'ga', 'D': 'a-', }) self._test_aln({ 'A': '-g', 'B': '-g', 'C': 'ag', 'D': '-a', }) def test_difficult_end_gaps(self): self._test_aln({ 'A': '--cctc', 'B': '--cctc', 'C': 'gacctc', 'D': 'ga----', }) return self._test_aln({ 'A': 'gcctcgg------', 'B': 'gcctcgg------', 'C': 'gcctcggaaacgt', 'D': '-------aaacgt', }) class
HirschbergTestCase(MultipleAlignmentTestCase): # Force use of linear space algorithm def _test_aln(self, seqs, **kw): tmp = cogent.align.pairwise.HIRSCHBERG_LIMIT try: cogent.align.pairwise.HIRSCHBERG_LIMIT = 100 result = MultipleAlignmentTestCase._test_aln(self, seqs, **kw) finally: cogent.align.pairwise.HIRSCHBERG_LIMIT = tmp return result if __name__ == '__main__': unittest.main() PyCogent-1.5.3/tests/test_align/test_weights/000755 000765 000024 00000000000 12024703632 022243 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/tests/test_align/test_weights/__init__.py000644 000765 000024 00000000500 12024702176 024351 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['test_methods', 'test_util'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" PyCogent-1.5.3/tests/test_align/test_weights/test_methods.py000644 000765 000024 00000041223 12024702176 025323 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division from numpy import array, zeros, float64 as Float64 from cogent.util.unit_test import TestCase, main from cogent.parse.tree import DndParser from cogent.parse.clustal import ClustalParser as MinimalClustalParser from cogent.core.alignment import Alignment from cogent.core.profile import Profile from cogent.align.weights.util import DNA_ORDER, PROTEIN_ORDER from cogent.align.weights.methods import VA, VOR, mVOR, pos_char_weights, PB,\ SS, ACL, GSC, _clip_branch_lengths, _set_branch_sum, _set_node_weight __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" def ClustalParser(f): 
return Alignment(list(MinimalClustalParser(f))) class GeneralTests(TestCase): """General tests for all classes in this file, provides general setup""" def setUp(self): """General setUp method for all tests in this file""" #ALIGNMENTS self.aln1 = Alignment(['ABC','BCC','BAC']) #alignment from Henikoff 1994 self.aln2 = Alignment({'seq1':'GYVGS','seq2':'GFDGF','seq3':'GYDGF',\ 'seq4':'GYQGG'},Names=['seq1','seq2','seq3','seq4']) #alignment from Vingron & Sibbald 1993 self.aln3 = Alignment({'seq1':'AA', 'seq2':'AA', 'seq3':'BB'},\ Names=['seq1','seq2','seq3']) #alignment from Vingron & Sibbald 1993 self.aln4 = Alignment({'seq1':'AA', 'seq2':'AA', 'seq3':'BB',\ 'seq4':'BB','seq5':'CC'},Names=['seq1','seq2','seq3','seq4','seq5']) self.aln5 = Alignment(['ABBA','ABCA','CBCB']) #alignment 5S rRNA seqs from Hein 1990 self.aln6 = ClustalParser(FIVE_S_ALN.split('\n')) #alignment from Vingron & Sibbald 1993 self.aln7 = Alignment({'seq1':'AGCTA', 'seq2':'AGGTA', 'seq3':'ACCTG', 'seq4':'TGCAA'},Names=['seq1','seq2','seq3','seq4']) #TREES (SEE BOTTOM OF FILE FOR DESCRIPTION) self.tree1 = DndParser(TREE_1) self.tree2 = DndParser(TREE_2) self.tree3 = DndParser(TREE_3) self.tree4 = DndParser(TREE_4) self.tree5 = DndParser(TREE_5) self.tree6 = DndParser(TREE_6) self.tree7 = DndParser(TREE_7) self.tree8 = DndParser(TREE_8) self.tree9 = DndParser(TREE_9) class VoronoiTests(GeneralTests): """Tests for the voronoi sequence weighting module""" def test_VA(self): """VA: should return expected results. 
Results don't vary with runs""" err=1e-3 aln2_exp = {'seq1':.269, 'seq2':.269,'seq3':.192,'seq4':.269} aln4_exp = {'seq1':.1875, 'seq2':.1875,'seq3':.1875,'seq4':.1875,\ 'seq5':.25} aln5_exp = {'seq_0':0.33333,'seq_1':0.25,'seq_2':0.4167} aln6_exp = dict(zip(map(str,[1,2,3,4,5,6,7,8,9,10]),\ [0.0962,0.0925,0.1061,0.1007,0.0958,0.0977,0.0914,\ 0.0934,0.1106,0.1156])) self.assertFloatEqualAbs(VA(self.aln2).values(),aln2_exp.values(), eps=err) self.assertFloatEqualAbs(VA(self.aln4).values(),aln4_exp.values(), eps=err) self.assertFloatEqualAbs(VA(self.aln5).values(),aln5_exp.values(), eps=err) self.assertFloatEqualAbs(VA(self.aln6).values(),aln6_exp.values(), eps=err) results = [] for x in range(5): results.append(VA(self.aln2)) if x > 0: self.assertEqual(results[x], results[x-1]) def test_VOR_exact(self): """VOR: should give exact results when using pseudo_seqs_exact""" err=1e-3 aln2_exp = {'seq1':.259, 'seq2':.315,'seq3':.167,'seq4':.259} aln3_exp = {'seq1':.29167, 'seq2':.29167, 'seq3':.4167} aln4_exp = {'seq1':.1851, 'seq2':.1851,'seq3':.1851,'seq4':.1851,\ 'seq5':.259} self.assertFloatEqualAbs(VOR(self.aln2).values(),aln2_exp.values(), eps=err) self.assertFloatEqualAbs(VOR(self.aln3).values(),aln3_exp.values(),\ eps=err) self.assertFloatEqualAbs(VOR(self.aln4).values(),aln4_exp.values(),\ eps=err) #this is the exact method, so the answer should be exactly the same #every time (on the same alignment) results = [] for x in range(5): results.append(VOR(self.aln2)) if x > 0: self.assertEqual(results[x], results[x-1]) def test_VOR_force_mc(self): """VOR: should result in good approximation when using monte carlo""" err=9e-2 aln2_exp = {'seq1':.259, 'seq2':.315,'seq3':.167,'seq4':.259} aln3_exp = {'seq1':.29167, 'seq2':.29167, 'seq3':.4167} aln4_exp = {'seq1':.1851, 'seq2':.1851,'seq3':.1851,'seq4':.1851,\ 'seq5':.259} aln6_exp = dict(zip(map(str,[1,2,3,4,5,6,7,8,9,10]),\ [0.0840,0.0763,0.1155,0.1019,0.0932,0.0980,0.0864,\ 0.0999,0.1121,0.1328])) # the following 
assertSimilarMeans statements were added to replace # stochastic assertFloatEqualAbs calls below self.assertSimilarMeans(VOR(self.aln2,force_monte_carlo=True).values(), aln2_exp.values()) self.assertSimilarMeans(VOR(self.aln3,force_monte_carlo=True).values(), aln3_exp.values()) self.assertSimilarMeans(VOR(self.aln4,force_monte_carlo=True).values(), aln4_exp.values()) self.assertSimilarMeans(VOR(self.aln6,n=1000).values(), aln6_exp.values()) #self.assertFloatEqualAbs(VOR(self.aln2,force_monte_carlo=True)\ # .values(), aln2_exp.values(),eps=err) #self.assertFloatEqualAbs(VOR(self.aln3,force_monte_carlo=True)\ # .values(), aln3_exp.values(),eps=err) #self.assertFloatEqualAbs(VOR(self.aln4,force_monte_carlo=True)\ # .values(), aln4_exp.values(),eps=err) #self.assertFloatEqualAbs(VOR(self.aln6,n=1000)\ # .values(), aln6_exp.values(),eps=err) #make sure monte carlo is used results = [] for x in range(5): results.append(VOR(self.aln2,force_monte_carlo=True)) if x > 0: self.assertNotEqual(results[x], results[x-1]) def test_VOR_mc_threshold(self): """VOR: should apply monte carlo when # of pseudo seqs > mc_threshold """ err=9e-2 aln2_exp = {'seq1':.259, 'seq2':.315,'seq3':.167,'seq4':.259} # the following assertSimilarMeans statement was added to replace # stochastic assertFloatEqualAbs call below self.assertSimilarMeans(VOR(self.aln2, mc_threshold=15).values(), aln2_exp.values()) #self.assertFloatEqual(VOR(self.aln2,mc_threshold=15).values(),\ # aln2_exp.values(),err) #make sure monte carlo is used results = [] for x in range(5): results.append(VOR(self.aln2,mc_threshold=15)) if x > 0: self.assertNotEqual(results[x], results[x-1]) def test_mVOR(self): """mVOR: should return weights closer to the 'True' weights""" #err=5e-2 #original error value # Raised the error value to prevent occasional failure of the test. # The mVOR method takes a sample from a distribution and the outcome # will depend on this sample. 
Every now and then, one of the weights # was more than 0.05 away from the expected weight. Raised the # allowed error value to prevent that. To use the method on real # data, a larger sample should be taken (e.g. 10000?), but increasing # the sample size here would make the test too slow. err=0.075 aln3_exp = {'seq1':.25, 'seq2':.25, 'seq3':.5} aln4_exp = {'seq1':.1667, 'seq2':.1667,'seq3':.1667,'seq4':.1667,\ 'seq5':.3333} aln6_exp = dict(zip(map(str,[1,2,3,4,5,6,7,8,9,10]), [0.09021,0.08039,0.113560,0.10399,0.092370,0.097130, 0.09198,0.09538,0.10927,0.12572])) # the following assertSimilarMeans statements were added to replace # stochastic assertFloatEqualAbs calls below self.assertSimilarMeans(mVOR(self.aln3,order="ABC").values(), aln3_exp.values()) self.assertSimilarMeans(mVOR(self.aln4,order="ABC").values(), aln4_exp.values()) self.assertSimilarMeans(mVOR(self.aln6,order=DNA_ORDER,n=3000)\ .values(), aln6_exp.values()) #self.assertFloatEqualAbs(mVOR(self.aln3,order="ABC").values(),\ # aln3_exp.values(),eps=err) #self.assertFloatEqualAbs(mVOR(self.aln4,order="ABC").values(),\ # aln4_exp.values(),eps=err) #self.assertFloatEqualAbs(mVOR(self.aln6,order=DNA_ORDER,n=3000)\ # .values(), aln6_exp.values(),eps=err) #the results vary with runs, because the sample of random profiles #is different each time results = [] for x in range(5): results.append(mVOR(self.aln4,order="ABC")) if x > 0: self.assertNotEqual(results[x], results[x-1]) class PositionBasedTests(GeneralTests): """Contains tests for PB (=position-based) method""" def test_pos_char_weights(self): """pos_char_weights: should return correct contributions at each pos """ #build expected profile exp_data = zeros([len(PROTEIN_ORDER),self.aln2.SeqLen],Float64) exp = [{'G':1/4},{'Y':1/6,'F':1/2},{'V':1/3,'D':1/6,'Q':1/3}, {'G':1/4},{'G':1/3,'F':1/6,'S':1/3}] for pos, weights in enumerate(exp): for k,v in weights.items(): exp_data[PROTEIN_ORDER.index(k),pos] = v exp_aln2 = Profile(exp_data,Alphabet=PROTEIN_ORDER) 
#check observed against expected self.assertEqual(pos_char_weights(self.aln2,PROTEIN_ORDER).Data, exp_aln2.Data) def test_PB(self): """PB: should return correct weights""" err=1e-3 aln2_exp = {'seq3': 0.2, 'seq2': 0.267, 'seq1': 0.267, 'seq4': 0.267} self.assertFloatEqualAbs(PB(self.aln2,PROTEIN_ORDER)\ .values(), aln2_exp.values(),eps=err) class SsTests(GeneralTests): """Tests for SS function""" def test_SS(self): """SS: should return the correct weights""" err=1e-3 aln4_exp = {'seq1':.1910, 'seq2':.1910,'seq3':.1910,'seq4':.1910,\ 'seq5':.2361} aln6_exp = dict(zip(map(str,[1,2,3,4,5,6,7,8,9,10]), [0.0977,0.0942,0.1045,0.0997,0.0968,0.0988, 0.0929,0.0950,0.1076,0.1122])) aln7_exp = {'seq1':.1792, 'seq2':.2447,'seq3':.2880,'seq4':.2880} self.assertFloatEqualAbs(SS(self.aln4).values(),aln4_exp.values(), eps=err) self.assertFloatEqualAbs(SS(self.aln6).values(),aln6_exp.values(), eps=err) self.assertFloatEqualAbs(SS(self.aln7).values(),aln7_exp.values(), eps=err) class AclTests(GeneralTests): """Contains tests for ACL functionality""" def test_ACL(self): """ACL: should return correct weights""" err=1e-3 tree1_exp = {'WMJ2': 0.035, 'HXB': 0.017, 'WMJ1': 0.039,\ 'BH10': 0.013, 'CDC': 0.048, 'BRU': 0.014, 'SF2': 0.050,\ 'BH8': 0.006, 'RF': 0.085, 'ELI': 0.068, 'PV22': 0.013,\ 'Z6': 0.129, 'MAL': 0.115, 'WMJ3': 0.035, 'Z3': 0.333} tree2_exp = {'agcta':0.7380,'aggta':0.0,'acctg':0.0,'tgcaa':0.2620} tree3_exp = {'agcta':0.2857,'aggta':0.0,'acctg':0.2857,'tgcaa':0.4286} tree4_exp = {'10': 0.1186, '1': 0.0627, '3': 0.1307, '2': 0.0627, '5': 0.0919, '4': 0.1307, '7': 0.0958, '6': 0.0919, '9': 0.1186, '8': 0.0958} tree9_exp = {'A':.25,'B':.25,'C':.25,'D':.25} self.assertFloatEqualAbs(ACL(self.tree1), tree1_exp, eps=err) self.assertFloatEqualAbs(ACL(self.tree2), tree2_exp, eps=err) self.assertFloatEqualAbs(ACL(self.tree3), tree3_exp, eps=err) self.assertFloatEqualAbs(ACL(self.tree4), tree4_exp, eps=err) #also works when branch lengths are zero 
self.assertFloatEqualAbs(ACL(self.tree9), tree9_exp, eps=err) w_tree8 = ACL(self.tree8) self.assertFloatEqual(w_tree8['A'], w_tree8['B'],err) self.assertFloatEqual(w_tree8['A'], w_tree8['C'],err) self.assertFloatEqual(w_tree8['D'], w_tree8['E'],err) self.assertFloatEqual(w_tree8['F'], w_tree8['G'],err) self.assertGreaterThan(w_tree8['A'], w_tree8['D']) self.assertGreaterThan(w_tree8['D'], w_tree8['H']) self.assertGreaterThan(w_tree8['H'], w_tree8['F']) class GscTests(GeneralTests): """Tests for GSC functionality""" def test_gsc(self): """GSC: should return correct weights""" err = 1e-3 tree6_exp = {'A': 0.19025, 'B': 0.19025, 'C': 0.2717, 'D': 0.3478} tree7_exp = {'A':.25, 'B':.25, 'C':.25, 'D':.25} tree8_exp = dict(zip('ABCDEFGH',[.1,.1,.2,.06,.06,.16,.16,.16])) self.assertFloatEqualAbs(GSC(self.tree6).values(),\ tree6_exp.values(),eps=err) self.assertFloatEqualAbs(GSC(self.tree7).values(),\ tree7_exp.values(),eps=err) self.assertFloatEqualAbs(GSC(self.tree8).values(),\ tree8_exp.values(),eps=err) #Rooted tree relating 15 HIV-1 isolates. From Altschul (1989) Fig 2. TREE_1 = "(((((((((((BH8:0.7,PV22:0.3,BH10:0.3):0.1,BRU:0.5):0.1,HXB:0.7):2.4,SF2:3.3):0.1,CDC:3.7):0.5),((WMJ1:0.8,WMJ2:0.9,WMJ3:0.9):2.1)):0.4,RF:4.3):2.6),(((Z6:2.2,ELI:4.2):2.1,MAL:6.1):1.9)):2.7,Z3:9.3);" #Model tree from Vingron and Sibbald (1993) Fig 3, distances estimated by Li. TREE_2 = "(((((agcta:0.0,aggta:1.03):0.0),acctg:2.23):0.6),tgcaa:1.69);" #Model tree from Vingron and Sibbald (1993) Fig 3 Actual # of substitutions TREE_3 = "(((((agcta:0,aggta:1):1),acctg:1):1),tgcaa:2);" #the ultrameric tree from Vingron and Sibbald (1993) Fig 3. TREE_4 ="(((((((((2:6.0,1:6.0):12.9),((8:17.0,7:17.0):1.9)):5.3)),((6:8.5,5:8.5):15.7)):9.6),((((4:15.0,3:15.0):12.1),((9:11.0,10:11.0):16.1)):6.7)));" #the additive tree from Vingron and Sibbald (1993) Fig 3. #I don't trust the results they got for this tree (Table 3). 
TREE_5 = "(((((((((2:7,1:7):19),((8:18,7:16):3)):12)),((6:10,5:10):28)):16),((((4:14,3:18):15),((9:8,10:14):24)):11)));"
TREE_6 = "(((A:20,B:20):30,C:50):30,D:80);"
TREE_7 = "((A:10,B:10):5,(C:10,D:10):5);"
TREE_8 = "((((A:5,B:5):5,C:15):10),((D:0,E:0):10,((F:5,G:5):10,H:10):10):10);"
TREE_9 = "(((A:0,B:0):5,C:10):5,D:25);"

FIVE_S_ALN =\
"""CLUSTAL W (1.81)

1    A----TCCACGGCCATAGGACTCTGAAAGCACTGCATCCCGT-CCGATCTGCAAAGTTAA
2    A----TCCACGGCCATAGGACTGTGAAAGCACCGCATCCCGT-CTGATCTGCGCAGTTAA
3    T----CTGGTGATGATGGCGGAGGGGACACACCCGTTCCCATACCGAACACGGCCGTTAA
4    T----CTGGTGGCGATAGCGAGAAGGTCACACCCGTTCCCATACCGAACACGGAAGTTAA
5    G---TGGTGCGGTCATACCAGCGCTAATGCACCGGATCCCAT-CAGAACTCCGCAGTTAA
6    G----GGTGCGATCATACCAGCGTTAATGCACCGGATCCCAT-CAGAACTCCGCAGTTAA
7    G----CTTACGACCATATCACGTTGAATGCACGCCATCCCGT-CCGATCTGGCAAGTTAA
8    G----CCTACGGCCATCCCACCCTGGTAACGCCCGATCTCGT-CTGATCTCGGAAGCTAA
9    T--T-CTGGTGTCTCAGGCGTGGAGGAACCACACCAATCCATCCCGAACTTGGTGGTGAA
10   TATT-CTGGTGTCCCAGGCGTAGAGGAACCACACCGATCCATCTCGAACTTGGTGGTGAA
     . * .: . .: *.* : *.* **:*: * **

1    CCAGAGTACCGCCCAGT-TAGTACC-AC-GGTGGGGGACCACGCGGGAATCCTGGGTGCT
2    ACACAGTGCCGCCTAGT-TAGTACC-AT-GGTGGGGGACCACATGGGAATCCTGGGTGCT
3    GCCCTCCAGCGCC--AA-TGGTACT-TGCTC-CGCAGGGAG-CCGGGAGAGTAGGACGTC
4    GCTTCTCAGCGCC--GA-TGGTAGT-TA-GG-GGCTGTCCC-CTGTGAGAGTAGGACGCT
5    GCGCGCTTGGGCCAGAA-CAGTACT-GG-GATGGGTGACCTCCCGGGAAGTCCTGGTGCC
6    GCGCGCTTGGGTTGGAG-TAGTACT-AG-GATGGGTGACCTCCTGGGAAGTCCTAATATT
7    GCAACGTTGAGTCCAGT-TAGTACT-TG-GATCGGAGACGGCCTGGGAATCCTGGATGTT
8    GCAGGGTCGGGCCTGGT-TAGTACT-TG-GATGGGAGACCTCCTGGGAATACCGGGTGCT
9    ACTCTATTGCGGT--GA-CGATACTGTA-GG-GGAAGCCCG-ATGGAAAAATAGCTCGAC
10   ACTCTGCCGCGGT--AACCAATACT-CG-GG-GGGGGCCCT-GCGGAAAAATAGCTCGAT
     * * . ..** * * * .*. . *

1    GT-GG-T--T-
2    GT-GG-T--T-
3    GCCAG-G--C-
4    GCCAG-G--C-
5    GCACC-C--C-
6    GCACC-C-TT-
7    GTAAG-C--T-
8    GTAGG-CT-T-
9    GCCAGGA--T-
10   GCCAGGA--TA
     :
"""

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/test_align/test_weights/test_util.py

#!/usr/bin/env python

from cogent.util.unit_test import TestCase, main
from numpy import array, float64 as Float64, zeros
from math import sqrt
from random import choice
from cogent.core.alignment import Alignment
from cogent.core.sequence import DnaSequence, RnaSequence
from cogent.core.moltype import DNA, RNA
from cogent.align.weights.util import Weights, number_of_pseudo_seqs,\
    pseudo_seqs_exact, pseudo_seqs_monte_carlo, row_to_vote, distance_matrix,\
    eigenvector_for_largest_eigenvalue, DNA_ORDER, RNA_ORDER, PROTEIN_ORDER,\
    SeqToProfile, AlnToProfile, distance_to_closest

__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Development"

class WeightsTests(TestCase):

    def test_weights(self):
        """Weights: should behave like a normal dict and can be normalized
        """
        w = Weights({'seq1': 2, 'seq2': 3, 'seq3': 10})
        self.assertEqual(w['seq1'], 2)
        w.normalize()
        exp = {'seq1': 0.1333333, 'seq2': 0.2, 'seq3': 0.6666666}
        self.assertFloatEqual(w.values(), exp.values())

class UtilTests(TestCase):

    def setUp(self):
        """Set up for Voronoi tests"""
        self.aln1 = Alignment(['ABC', 'BCC', 'BAC'])
        self.aln2 = Alignment({'seq1': 'GYVGS', 'seq2': 'GFDGF',
            'seq3': 'GYDGF', 'seq4': 'GYQGG'},
            Names=['seq1', 'seq2', 'seq3', 'seq4'])
        self.aln3 = Alignment({'seq1': 'AA', 'seq2': 'AA', 'seq3': 'BB'},
            Names=['seq1', 'seq2', 'seq3'])
        self.aln4 = Alignment({'seq1': 'AA', 'seq2': 'AA', 'seq3': 'BB',
            'seq4': 'BB', 'seq5': 'CC'},
            Names=['seq1', 'seq2', 'seq3', 'seq4', 'seq5'])
        self.aln5 = Alignment(['ABBA', 'ABCA', 'CBCB'])

    def test_number_of_pseudo_seqs(self):
        """number_of_pseudo_seqs: should return # of pseudo seqs"""
        self.assertEqual(number_of_pseudo_seqs(self.aln1), 6)
        self.assertEqual(number_of_pseudo_seqs(self.aln2), 18)
        self.assertEqual(number_of_pseudo_seqs(self.aln3), 4)
        self.assertEqual(number_of_pseudo_seqs(self.aln4), 9)

    def test_pseudo_seqs_exact(self):
        """pseudo_seqs_exact: should generate expected pseudo sequences"""
        self.assertEqualItems(pseudo_seqs_exact(self.aln1),
            ['AAC', 'ABC', 'ACC', 'BAC', 'BBC', 'BCC'])
        self.assertEqualItems(pseudo_seqs_exact(self.aln3),
            ['AA', 'AB', 'BA', 'BB'])
        self.assertEqual(len(pseudo_seqs_exact(self.aln2)), 18)

    def test_pseudo_seqs_monte_carlo(self):
        """pseudo_seqs_monte_carlo: random sample from all possible pseudo seqs
        """
        self.assertEqual(len(list(pseudo_seqs_monte_carlo(self.aln1, n=100))),
            100)
        for i in pseudo_seqs_monte_carlo(self.aln3, n=100):
            self.assertContains(['AA', 'AB', 'BA', 'BB'], i)

    def test_row_to_vote(self):
        """row_to_vote: should return correct votes for int and float distances
        """
        self.assertEqual(row_to_vote(array([2, 3, 4, 5])), array([1, 0, 0, 0]))
        self.assertEqual(row_to_vote(array([2, 3, 2, 5])),
            array([.5, 0, 0.5, 0]))
        self.assertEqual(row_to_vote(array([2.3, 3.5, 2.1, 5.8])),
            array([0, 0, 1, 0]))

    def test_distance_matrix(self):
        """distance_matrix should obey Names of alignment"""
        #Names=None
        aln1_exp = array([[0, 2, 2], [2, 0, 1], [2, 1, 0]])
        self.assertEqual(distance_matrix(self.aln1), aln1_exp)
        a = Alignment(self.aln1.NamedSeqs)
        a.Names = ['seq_1', 'seq_2', 'seq_0']
        a_exp = array([[0, 1, 2], [1, 0, 2], [2, 2, 0]])
        self.assertEqual(distance_matrix(a), a_exp)

    def test_eigenvector_for_largest_eigenvalue(self):
        """eigenvector_for_largest_eigenvalue: No idea how to test this"""
        pass

    def test_distance_to_closest(self):
        """distance_to_closest: should return closest distances"""
        self.assertEqual(distance_to_closest(self.aln1), [2, 1, 1])
        self.assertEqual(distance_to_closest(self.aln2), [2, 1, 1, 2])

    def test_SeqToProfile(self):
        """SeqToProfile: should work with different parameter settings
        """
        seq = DnaSequence("ATCGRYN-")
        #Only non-degenerate bases in the char order, all other
        #characters are ignored. In a sequence this means that
        #several positions will contain only zeros in the profile.
        exp = zeros([len(seq), 4], Float64)
        for x, y in zip(range(len(seq)), [2, 0, 1, 3]):
            exp[x, y] = 1
        self.assertEqual(SeqToProfile(seq, char_order="TCAG",
            split_degenerates=False).Data.tolist(), exp.tolist())
        #Same thing should work as well when the char order is not passed in
        exp = zeros([len(seq), 4], Float64)
        for x, y in zip(range(len(seq)), [2, 0, 1, 3]):
            exp[x, y] = 1
        self.assertEqual(SeqToProfile(seq, split_degenerates=False)
            .Data.tolist(), exp.tolist())
        #All symbols in the sequence are in the char order, no row
        #should contain only zeros. Degenerate symbols are not split.
        exp = zeros([len(seq), 8], Float64)
        for x, y in zip(range(len(seq)), [2, 0, 1, 3, 4, 5, 6, 7]):
            exp[x, y] = 1
        self.assertEqual(SeqToProfile(seq, char_order="TCAGRYN-",
            split_degenerates=False).Data.tolist(), exp.tolist())
        #Splitting all degenerate symbols, having only non-degenerate symbols
        #in the character order (and -)
        exp = array([[0, 0, 1, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0],
            [0, 0, 0, 1, 0], [0, 0, .5, .5, 0], [.5, .5, 0, 0, 0],
            [.25, .25, .25, .25, 0], [0, 0, 0, 0, 1]])
        self.assertEqual(SeqToProfile(seq, char_order="TCAG-",
            split_degenerates=True).Data.tolist(), exp.tolist())
        #Splitting degenerates, but having one of the degenerate
        #symbols in the character order. In that case the degenerate symbol
        #is not split.
        exp = array([[0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0],
            [0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, .5, .5, 0, 0],
            [.5, .5, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]])
        self.assertEqual(SeqToProfile(seq, char_order="TCAGN-",
            split_degenerates=True).Data.tolist(), exp.tolist())

    def test_AlignmentToProfile_basic(self):
        """AlnToProfile: should work under basic conditions
        """
        #Sequences in the alignment are unweighted
        #Alphabet is the alphabet of the sequences (RNA)
        #CharOrder is set explicitly
        #Degenerate bases are split up
        #Gaps are ignored
        #In all of the columns at least one character is in the CharOrder
        a = Alignment({'a': RnaSequence('UCAGRYN-'),
            'b': RnaSequence('ACUGAAAA')})
        exp =\
        array([[.5, 0, .5, 0],
            [0, 1, 0, 0],
            [.5, 0, .5, 0],
            [0, 0, 0, 1],
            [0, 0, .75, .25],
            [.25, .25, .5, 0],
            [.125, .125, .625, .125],
            [0, 0, 1, 0]])
        self.assertEqual(AlnToProfile(a, alphabet=RNA,
            split_degenerates=True).Data.tolist(), exp.tolist())

    def test_AlignmentToProfile_ignore(self):
        """AlnToProfile: should raise an error if too many chars ignored
        """
        #Same conditions as the previous test, but in the last column
        #there are only gaps, so normalization will fail at that position
        a = Alignment({'a': RnaSequence('UCAGRYN-'),
            'b': RnaSequence('ACUGAAA-')})
        self.assertRaises(ValueError, AlnToProfile, a, alphabet=RNA,
            split_degenerates=True)

    def test_AlignmentToProfile_weighted(self):
        """AlnToProfile: should work when sequences are weighted
        """
        #Alignment: sequences are just strings and don't have an alphabet
        #Weights: a normal dictionary (could be a real Weights object as well)
        a = Alignment({'seq1': 'TCAG', 'seq2': 'TAR-', 'seq3': 'YAG-'},
            Names=['seq1', 'seq2', 'seq3'])
        w = {'seq1': 0.5, 'seq2': .25, 'seq3': .25}
        #Basic situation in which all letters in the sequences occur in the
        #CharOrder, none have to be ignored.
        #In that case it doesn't matter
        #whether we set split_degenerates to True or False, because if it's
        #True it's overwritten by the fact that the char is in the CharOrder.
        exp = array([[0.75, 0, 0, 0, 0, .25, 0],
            [0, 0.5, 0.5, 0, 0, 0, 0],
            [0, 0.5, 0, 0.25, 0.25, 0, 0],
            [0, 0, 0, 0.5, 0, 0, 0.5]])
        #split_degenerates = False
        self.assertEqual(AlnToProfile(a, DNA, char_order="TACGRY-",
            weights=w, split_degenerates=False).Data.tolist(), exp.tolist())
        #split_degenerates = True
        self.assertEqual(AlnToProfile(a, DNA, char_order="TACGRY-",
            weights=w, split_degenerates=True).Data.tolist(), exp.tolist())
        #Only non-degenerate symbols in the CharOrder. Degenerates are split.
        #Gaps are ignored.
        exp = array([[0.875, 0, 0.125, 0],
            [0, 0.5, 0.5, 0],
            [0, 0.625, 0, 0.375],
            [0, 0, 0, 1]])
        self.assertEqual(AlnToProfile(a, DNA, char_order="TACG",
            weights=w, split_degenerates=True).Data.tolist(), exp.tolist())
        #An error is raised if all chars in an alignment column are ignored.
        #CharOrder=AT, degenerates are not split.
        self.assertRaises(ValueError, AlnToProfile, a, DNA,
            char_order="AT", weights=w, split_degenerates=True)

if __name__ == "__main__":
    main()

PyCogent-1.5.3/tests/data/1A1X.pdb

HEADER PROTO-ONCOGENE 18-DEC-97 1A1X TITLE CRYSTAL STRUCTURE OF MTCP-1 INVOLVED IN T CELL MALIGNANCIES COMPND MOL_ID: 1; COMPND 2 MOLECULE: HMTCP-1; COMPND 3 CHAIN: A; COMPND 4 ENGINEERED: YES SOURCE MOL_ID: 1; SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS; SOURCE 3 ORGANISM_COMMON: HUMAN; SOURCE 4 ORGANISM_TAXID: 9606; SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI; SOURCE 6 EXPRESSION_SYSTEM_TAXID: 562 KEYWDS MTCP-1, CRYSTAL STRUCTURE, ONCOGENE INVOLVED IN T CELL KEYWDS 2 MALIGNANCIES, PROTO-ONCOGENE EXPDTA X-RAY DIFFRACTION AUTHOR Z.Q.FU,G.C.DUBOIS,S.P.SONG,I.KULIKOVSKAYA,L.VIRGILIO, AUTHOR 2 J.ROTHSTEIN,C.M.CROCE,I.T.WEBER,R.W.HARRISON REVDAT 2 24-FEB-09 1A1X 1 VERSN REVDAT 1 27-MAY-98 1A1X 0 JRNL AUTH Z.Q.FU,G.C.DU BOIS,S.P.SONG,I.KULIKOVSKAYA, JRNL AUTH 2
L.VIRGILIO,J.L.ROTHSTEIN,C.M.CROCE,I.T.WEBER, JRNL AUTH 3 R.W.HARRISON JRNL TITL CRYSTAL STRUCTURE OF MTCP-1: IMPLICATIONS FOR ROLE JRNL TITL 2 OF TCL-1 AND MTCP-1 IN T CELL MALIGNANCIES. JRNL REF PROC.NATL.ACAD.SCI.USA V. 95 3413 1998 JRNL REFN ISSN 0027-8424 JRNL PMID 9520380 JRNL DOI 10.1073/PNAS.95.7.3413 REMARK 1 REMARK 2 REMARK 2 RESOLUTION. 2.00 ANGSTROMS. REMARK 3 REMARK 3 REFINEMENT. REMARK 3 PROGRAM : X-PLOR 3.8 REMARK 3 AUTHORS : BRUNGER REMARK 3 REMARK 3 DATA USED IN REFINEMENT. REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 2.00 REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 6.50 REMARK 3 DATA CUTOFF (SIGMA(F)) : 2.000 REMARK 3 DATA CUTOFF HIGH (ABS(F)) : 1000000.000 REMARK 3 DATA CUTOFF LOW (ABS(F)) : 0.0010 REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 99.7 REMARK 3 NUMBER OF REFLECTIONS : 6733 REMARK 3 REMARK 3 FIT TO DATA USED IN REFINEMENT. REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM REMARK 3 R VALUE (WORKING SET) : 0.211 REMARK 3 FREE R VALUE : 0.253 REMARK 3 FREE R VALUE TEST SET SIZE (%) : 8.600 REMARK 3 FREE R VALUE TEST SET COUNT : 582 REMARK 3 ESTIMATED ERROR OF FREE R VALUE : 0.010 REMARK 3 REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN. REMARK 3 TOTAL NUMBER OF BINS USED : 8 REMARK 3 BIN RESOLUTION RANGE HIGH (A) : 2.00 REMARK 3 BIN RESOLUTION RANGE LOW (A) : 2.09 REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : 99.30 REMARK 3 REFLECTIONS IN BIN (WORKING SET) : 739 REMARK 3 BIN R VALUE (WORKING SET) : 0.2550 REMARK 3 BIN FREE R VALUE : 0.2680 REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : 8.70 REMARK 3 BIN FREE R VALUE TEST SET COUNT : 73 REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : 0.031 REMARK 3 REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT. REMARK 3 PROTEIN ATOMS : 881 REMARK 3 NUCLEIC ACID ATOMS : 0 REMARK 3 HETEROGEN ATOMS : 0 REMARK 3 SOLVENT ATOMS : 24 REMARK 3 REMARK 3 B VALUES. 
REMARK 3 FROM WILSON PLOT (A**2) : NULL REMARK 3 MEAN B VALUE (OVERALL, A**2) : 22.01 REMARK 3 OVERALL ANISOTROPIC B VALUE. REMARK 3 B11 (A**2) : NULL REMARK 3 B22 (A**2) : NULL REMARK 3 B33 (A**2) : NULL REMARK 3 B12 (A**2) : NULL REMARK 3 B13 (A**2) : NULL REMARK 3 B23 (A**2) : NULL REMARK 3 REMARK 3 ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM SIGMAA (A) : NULL REMARK 3 LOW RESOLUTION CUTOFF (A) : NULL REMARK 3 REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM C-V LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM C-V SIGMAA (A) : NULL REMARK 3 REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES. REMARK 3 BOND LENGTHS (A) : 0.007 REMARK 3 BOND ANGLES (DEGREES) : 1.40 REMARK 3 DIHEDRAL ANGLES (DEGREES) : 26.92 REMARK 3 IMPROPER ANGLES (DEGREES) : 1.19 REMARK 3 REMARK 3 ISOTROPIC THERMAL MODEL : RESTRAINED REMARK 3 REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. RMS SIGMA REMARK 3 MAIN-CHAIN BOND (A**2) : 2.157 ; 1.500 REMARK 3 MAIN-CHAIN ANGLE (A**2) : 3.319 ; 2.000 REMARK 3 SIDE-CHAIN BOND (A**2) : 4.210 ; 2.000 REMARK 3 SIDE-CHAIN ANGLE (A**2) : 6.231 ; 2.500 REMARK 3 REMARK 3 NCS MODEL : NULL REMARK 3 REMARK 3 NCS RESTRAINTS. RMS SIGMA/WEIGHT REMARK 3 GROUP 1 POSITIONAL (A) : NULL ; NULL REMARK 3 GROUP 1 B-FACTOR (A**2) : NULL ; NULL REMARK 3 REMARK 3 PARAMETER FILE 1 : PARHCSDX.PRO REMARK 3 PARAMETER FILE 2 : TIP3P.PARAMETER REMARK 3 PARAMETER FILE 3 : NULL REMARK 3 TOPOLOGY FILE 1 : TOPHCSDX.PRO REMARK 3 TOPOLOGY FILE 2 : TIP3P.TOPOLOGY REMARK 3 TOPOLOGY FILE 3 : NULL REMARK 3 REMARK 3 OTHER REFINEMENT REMARKS: NULL REMARK 4 REMARK 4 1A1X COMPLIES WITH FORMAT V. 3.15, 01-DEC-08 REMARK 100 REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY BNL. 
REMARK 200 REMARK 200 EXPERIMENTAL DETAILS REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION REMARK 200 DATE OF DATA COLLECTION : SEP-97 REMARK 200 TEMPERATURE (KELVIN) : 293 REMARK 200 PH : 7.8 REMARK 200 NUMBER OF CRYSTALS USED : 1 REMARK 200 REMARK 200 SYNCHROTRON (Y/N) : N REMARK 200 RADIATION SOURCE : ROTATING ANODE REMARK 200 BEAMLINE : NULL REMARK 200 X-RAY GENERATOR MODEL : RIGAKU RUH2R REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M REMARK 200 WAVELENGTH OR RANGE (A) : 1.5418 REMARK 200 MONOCHROMATOR : NI FILTER REMARK 200 OPTICS : NULL REMARK 200 REMARK 200 DETECTOR TYPE : IMAGE PLATE REMARK 200 DETECTOR MANUFACTURER : RIGAKU REMARK 200 INTENSITY-INTEGRATION SOFTWARE : DENZO REMARK 200 DATA SCALING SOFTWARE : SCALEPACK REMARK 200 REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 7194 REMARK 200 RESOLUTION RANGE HIGH (A) : 2.000 REMARK 200 RESOLUTION RANGE LOW (A) : 21.000 REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 0.000 REMARK 200 REMARK 200 OVERALL. REMARK 200 COMPLETENESS FOR RANGE (%) : 99.7 REMARK 200 DATA REDUNDANCY : 5.900 REMARK 200 R MERGE (I) : NULL REMARK 200 R SYM (I) : 0.04700 REMARK 200 FOR THE DATA SET : 15.1000 REMARK 200 REMARK 200 IN THE HIGHEST RESOLUTION SHELL. 
REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 2.00 REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : 2.07 REMARK 200 COMPLETENESS FOR SHELL (%) : 99.3 REMARK 200 DATA REDUNDANCY IN SHELL : 5.40 REMARK 200 R MERGE FOR SHELL (I) : NULL REMARK 200 R SYM FOR SHELL (I) : 0.14700 REMARK 200 FOR SHELL : 7.600 REMARK 200 REMARK 200 DIFFRACTION PROTOCOL: NULL REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: MAD PHASING REMARK 200 SOFTWARE USED: MADSYS, X-PLOR 3.8 REMARK 200 STARTING MODEL: NULL REMARK 200 REMARK 200 REMARK: NULL REMARK 280 REMARK 280 CRYSTAL REMARK 280 SOLVENT CONTENT, VS (%): 37.00 REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 2.00 REMARK 280 REMARK 280 CRYSTALLIZATION CONDITIONS: PROTEIN WAS CRYSTALLIZED FROM 1.5M REMARK 280 AMS WITH TRIZMA BUFFER AT PH 7.8 REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 62 2 2 REMARK 290 REMARK 290 SYMOP SYMMETRY REMARK 290 NNNMMM OPERATOR REMARK 290 1555 X,Y,Z REMARK 290 2555 -Y,X-Y,Z+2/3 REMARK 290 3555 -X+Y,-X,Z+1/3 REMARK 290 4555 -X,-Y,Z REMARK 290 5555 Y,-X+Y,Z+2/3 REMARK 290 6555 X-Y,X,Z+1/3 REMARK 290 7555 Y,X,-Z+2/3 REMARK 290 8555 X-Y,-Y,-Z REMARK 290 9555 -X,-X+Y,-Z+1/3 REMARK 290 10555 -Y,-X,-Z+2/3 REMARK 290 11555 -X+Y,Y,-Z REMARK 290 12555 X,X-Y,-Z+1/3 REMARK 290 REMARK 290 WHERE NNN -> OPERATOR NUMBER REMARK 290 MMM -> TRANSLATION VECTOR REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY REMARK 290 RELATED MOLECULES. 
REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000 REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 2 -0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 2 0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 57.30800 REMARK 290 SMTRY1 3 -0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 3 -0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 3 0.000000 0.000000 1.000000 28.65400 REMARK 290 SMTRY1 4 -1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 4 0.000000 -1.000000 0.000000 0.00000 REMARK 290 SMTRY3 4 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 5 0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 5 -0.866025 0.500000 0.000000 0.00000 REMARK 290 SMTRY3 5 0.000000 0.000000 1.000000 57.30800 REMARK 290 SMTRY1 6 0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 6 0.866025 0.500000 0.000000 0.00000 REMARK 290 SMTRY3 6 0.000000 0.000000 1.000000 28.65400 REMARK 290 SMTRY1 7 -0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 7 0.866025 0.500000 0.000000 0.00000 REMARK 290 SMTRY3 7 0.000000 0.000000 -1.000000 57.30800 REMARK 290 SMTRY1 8 1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 8 0.000000 -1.000000 0.000000 0.00000 REMARK 290 SMTRY3 8 0.000000 0.000000 -1.000000 0.00000 REMARK 290 SMTRY1 9 -0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 9 -0.866025 0.500000 0.000000 0.00000 REMARK 290 SMTRY3 9 0.000000 0.000000 -1.000000 28.65400 REMARK 290 SMTRY1 10 0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 10 -0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 10 0.000000 0.000000 -1.000000 57.30800 REMARK 290 SMTRY1 11 -1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 11 0.000000 1.000000 0.000000 0.00000 REMARK 290 SMTRY3 11 0.000000 0.000000 -1.000000 0.00000 REMARK 290 SMTRY1 12 0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 12 0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 12 0.000000 
0.000000 -1.000000 28.65400 REMARK 290 REMARK 290 REMARK: NULL REMARK 300 REMARK 300 BIOMOLECULE: 1 REMARK 300 SEE REMARK 350 FOR THE AUTHOR PROVIDED AND/OR PROGRAM REMARK 300 GENERATED ASSEMBLY INFORMATION FOR THE STRUCTURE IN REMARK 300 THIS ENTRY. THE REMARK MAY ALSO PROVIDE INFORMATION ON REMARK 300 BURIED SURFACE AREA. REMARK 350 REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN. REMARK 350 REMARK 350 BIOMOLECULE: 1 REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC REMARK 350 APPLY THE FOLLOWING TO CHAINS: A REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000 REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000 REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000 REMARK 350 BIOMT1 2 -1.000000 0.000000 0.000000 31.33250 REMARK 350 BIOMT2 2 0.000000 -1.000000 0.000000 54.26948 REMARK 350 BIOMT3 2 0.000000 0.000000 1.000000 0.00000 REMARK 465 REMARK 465 MISSING RESIDUES REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) REMARK 465 REMARK 465 M RES C SSSEQI REMARK 465 GLY A 1 REMARK 465 SER A 2 REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: TORSION ANGLES REMARK 500 REMARK 500 TORSION ANGLES OUTSIDE THE EXPECTED RAMACHANDRAN REGIONS: REMARK 500 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; REMARK 500 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT:(10X,I3,1X,A3,1X,A1,I4,A1,4X,F7.2,3X,F7.2) REMARK 500 REMARK 500 EXPECTED VALUES: GJ KLEYWEGT AND TA JONES (1996). PHI/PSI- REMARK 500 CHOLOGY: RAMACHANDRAN REVISITED. 
STRUCTURE 4, 1395 - 1400 REMARK 500 REMARK 500 M RES CSSEQI PSI PHI REMARK 500 GLU A 75 37.60 76.92 REMARK 500 REMARK 500 REMARK: NULL DBREF 1A1X A 3 108 UNP P56278 MTCPB_HUMAN 2 107 SEQRES 1 A 108 GLY SER ALA GLY GLU ASP VAL GLY ALA PRO PRO ASP HIS SEQRES 2 A 108 LEU TRP VAL HIS GLN GLU GLY ILE TYR ARG ASP GLU TYR SEQRES 3 A 108 GLN ARG THR TRP VAL ALA VAL VAL GLU GLU GLU THR SER SEQRES 4 A 108 PHE LEU ARG ALA ARG VAL GLN GLN ILE GLN VAL PRO LEU SEQRES 5 A 108 GLY ASP ALA ALA ARG PRO SER HIS LEU LEU THR SER GLN SEQRES 6 A 108 LEU PRO LEU MET TRP GLN LEU TYR PRO GLU GLU ARG TYR SEQRES 7 A 108 MET ASP ASN ASN SER ARG LEU TRP GLN ILE GLN HIS HIS SEQRES 8 A 108 LEU MET VAL ARG GLY VAL GLN GLU LEU LEU LEU LYS LEU SEQRES 9 A 108 LEU PRO ASP ASP FORMUL 2 HOH *24(H2 O) HELIX 1 1 PRO A 58 THR A 63 1 6 SHEET 1 A 9 HIS A 13 TRP A 15 0 SHEET 2 A 9 MET A 69 TYR A 73 -1 N TRP A 70 O LEU A 14 SHEET 3 A 9 ARG A 77 ASP A 80 -1 N MET A 79 O GLN A 71 SHEET 4 A 9 LEU A 85 VAL A 94 -1 N TRP A 86 O TYR A 78 SHEET 5 A 9 VAL A 97 LEU A 104 -1 N LYS A 103 O GLN A 87 SHEET 6 A 9 LEU A 41 GLN A 46 -1 N VAL A 45 O LEU A 100 SHEET 7 A 9 THR A 29 GLU A 36 -1 N GLU A 35 O ARG A 42 SHEET 8 A 9 ILE A 21 ASP A 24 -1 N TYR A 22 O TRP A 30 SHEET 9 A 9 LEU A 14 GLN A 18 -1 N GLN A 18 O ILE A 21 CRYST1 62.665 62.665 85.962 90.00 90.00 120.00 P 62 2 2 12 ORIGX1 1.000000 0.000000 0.000000 0.00000 ORIGX2 0.000000 1.000000 0.000000 0.00000 ORIGX3 0.000000 0.000000 1.000000 0.00000 SCALE1 0.015958 0.009213 0.000000 0.00000 SCALE2 0.000000 0.018427 0.000000 0.00000 SCALE3 0.000000 0.000000 0.011633 0.00000 ATOM 1 N ALA A 3 9.430 27.207 36.902 1.00 32.66 N ATOM 2 CA ALA A 3 9.560 25.910 36.169 1.00 36.84 C ATOM 3 C ALA A 3 8.771 24.851 36.930 1.00 35.55 C ATOM 4 O ALA A 3 9.268 24.262 37.891 1.00 34.17 O ATOM 5 CB ALA A 3 11.037 25.499 36.056 1.00 38.15 C ATOM 6 N GLY A 4 7.541 24.611 36.486 1.00 34.63 N ATOM 7 CA GLY A 4 6.682 23.653 37.152 1.00 32.01 C ATOM 8 C GLY A 4 5.852 24.387 38.180 1.00 33.93 C ATOM 
9 O GLY A 4 5.178 23.764 38.988 1.00 34.34 O ATOM 10 N GLU A 5 5.912 25.720 38.137 1.00 34.69 N ATOM 11 CA GLU A 5 5.178 26.594 39.057 1.00 33.67 C ATOM 12 C GLU A 5 4.090 27.381 38.346 1.00 36.27 C ATOM 13 O GLU A 5 4.075 27.480 37.119 1.00 36.32 O ATOM 14 CB GLU A 5 6.118 27.622 39.703 1.00 30.77 C ATOM 15 CG GLU A 5 7.142 27.082 40.685 1.00 26.98 C ATOM 16 CD GLU A 5 6.539 26.635 42.002 1.00 21.88 C ATOM 17 OE1 GLU A 5 5.872 27.449 42.666 1.00 17.97 O ATOM 18 OE2 GLU A 5 6.758 25.472 42.387 1.00 23.83 O ATOM 19 N ASP A 6 3.211 27.969 39.153 1.00 39.47 N ATOM 20 CA ASP A 6 2.109 28.817 38.696 1.00 43.46 C ATOM 21 C ASP A 6 2.241 30.130 39.484 1.00 43.77 C ATOM 22 O ASP A 6 1.246 30.733 39.884 1.00 47.03 O ATOM 23 CB ASP A 6 0.747 28.163 39.003 1.00 47.84 C ATOM 24 CG ASP A 6 0.395 27.031 38.041 1.00 51.43 C ATOM 25 OD1 ASP A 6 -0.030 27.329 36.900 1.00 51.13 O ATOM 26 OD2 ASP A 6 0.514 25.846 38.435 1.00 54.02 O ATOM 27 N VAL A 7 3.485 30.551 39.715 1.00 43.28 N ATOM 28 CA VAL A 7 3.794 31.769 40.475 1.00 38.90 C ATOM 29 C VAL A 7 3.660 33.042 39.640 1.00 39.82 C ATOM 30 O VAL A 7 3.479 34.133 40.182 1.00 42.59 O ATOM 31 CB VAL A 7 5.221 31.694 41.099 1.00 34.86 C ATOM 32 CG1 VAL A 7 6.279 31.737 40.015 1.00 29.95 C ATOM 33 CG2 VAL A 7 5.433 32.803 42.110 1.00 30.38 C ATOM 34 N GLY A 8 3.756 32.898 38.322 1.00 36.52 N ATOM 35 CA GLY A 8 3.627 34.047 37.445 1.00 33.02 C ATOM 36 C GLY A 8 4.917 34.810 37.234 1.00 33.14 C ATOM 37 O GLY A 8 6.006 34.305 37.530 1.00 35.75 O ATOM 38 N ALA A 9 4.783 36.045 36.753 1.00 28.39 N ATOM 39 CA ALA A 9 5.935 36.896 36.465 1.00 23.88 C ATOM 40 C ALA A 9 6.555 37.588 37.676 1.00 19.87 C ATOM 41 O ALA A 9 5.860 37.951 38.633 1.00 18.30 O ATOM 42 CB ALA A 9 5.568 37.933 35.408 1.00 23.25 C ATOM 43 N PRO A 10 7.890 37.735 37.665 1.00 18.90 N ATOM 44 CA PRO A 10 8.609 38.392 38.759 1.00 18.82 C ATOM 45 C PRO A 10 8.400 39.897 38.649 1.00 16.65 C ATOM 46 O PRO A 10 7.968 40.387 37.605 1.00 16.16 O ATOM 47 CB PRO A 10 10.066 38.001 
38.486 1.00 16.22 C ATOM 48 CG PRO A 10 10.112 37.875 37.016 1.00 17.71 C ATOM 49 CD PRO A 10 8.839 37.128 36.714 1.00 18.47 C ATOM 50 N PRO A 11 8.648 40.644 39.738 1.00 17.77 N ATOM 51 CA PRO A 11 8.460 42.095 39.656 1.00 17.80 C ATOM 52 C PRO A 11 9.399 42.773 38.655 1.00 16.54 C ATOM 53 O PRO A 11 10.559 42.395 38.513 1.00 16.67 O ATOM 54 CB PRO A 11 8.721 42.560 41.095 1.00 15.87 C ATOM 55 CG PRO A 11 9.636 41.536 41.635 1.00 13.61 C ATOM 56 CD PRO A 11 9.046 40.248 41.100 1.00 15.18 C ATOM 57 N ASP A 12 8.854 43.737 37.920 1.00 18.08 N ATOM 58 CA ASP A 12 9.603 44.511 36.940 1.00 17.00 C ATOM 59 C ASP A 12 10.733 45.280 37.635 1.00 15.91 C ATOM 60 O ASP A 12 11.806 45.497 37.066 1.00 15.96 O ATOM 61 CB ASP A 12 8.648 45.466 36.226 1.00 23.98 C ATOM 62 CG ASP A 12 9.368 46.493 35.406 1.00 34.63 C ATOM 63 OD1 ASP A 12 10.050 46.108 34.427 1.00 42.77 O ATOM 64 OD2 ASP A 12 9.262 47.688 35.758 1.00 42.27 O ATOM 65 N HIS A 13 10.470 45.715 38.860 1.00 16.48 N ATOM 66 CA HIS A 13 11.475 46.421 39.641 1.00 17.19 C ATOM 67 C HIS A 13 11.238 46.393 41.146 1.00 16.91 C ATOM 68 O HIS A 13 10.117 46.152 41.627 1.00 15.83 O ATOM 69 CB HIS A 13 11.674 47.856 39.139 1.00 25.37 C ATOM 70 CG HIS A 13 10.460 48.722 39.244 1.00 29.62 C ATOM 71 ND1 HIS A 13 9.757 49.149 38.137 1.00 33.44 N ATOM 72 CD2 HIS A 13 9.865 49.299 40.316 1.00 32.29 C ATOM 73 CE1 HIS A 13 8.785 49.958 38.523 1.00 34.91 C ATOM 74 NE2 HIS A 13 8.829 50.067 39.840 1.00 32.19 N ATOM 75 N LEU A 14 12.329 46.579 41.883 1.00 13.14 N ATOM 76 CA LEU A 14 12.301 46.599 43.336 1.00 13.46 C ATOM 77 C LEU A 14 13.125 47.791 43.804 1.00 16.02 C ATOM 78 O LEU A 14 14.309 47.905 43.477 1.00 16.65 O ATOM 79 CB LEU A 14 12.883 45.300 43.886 1.00 15.55 C ATOM 80 CG LEU A 14 12.020 44.066 43.607 1.00 18.25 C ATOM 81 CD1 LEU A 14 12.834 42.786 43.691 1.00 20.49 C ATOM 82 CD2 LEU A 14 10.870 44.050 44.581 1.00 15.43 C ATOM 83 N TRP A 15 12.472 48.712 44.506 1.00 17.72 N ATOM 84 CA TRP A 15 13.135 49.902 45.014 1.00 14.75 C 
ATOM 85 C TRP A 15 13.231 49.777 46.525 1.00 12.21 C ATOM 86 O TRP A 15 12.233 49.528 47.190 1.00 13.12 O ATOM 87 CB TRP A 15 12.339 51.172 44.677 1.00 21.09 C ATOM 88 CG TRP A 15 12.285 51.581 43.224 1.00 31.43 C ATOM 89 CD1 TRP A 15 11.213 51.476 42.387 1.00 38.00 C ATOM 90 CD2 TRP A 15 13.305 52.258 42.478 1.00 38.37 C ATOM 91 NE1 TRP A 15 11.495 52.056 41.173 1.00 40.79 N ATOM 92 CE2 TRP A 15 12.773 52.543 41.203 1.00 40.40 C ATOM 93 CE3 TRP A 15 14.618 52.658 42.767 1.00 45.21 C ATOM 94 CZ2 TRP A 15 13.507 53.213 40.214 1.00 48.64 C ATOM 95 CZ3 TRP A 15 15.353 53.329 41.778 1.00 48.47 C ATOM 96 CH2 TRP A 15 14.792 53.598 40.522 1.00 50.70 C ATOM 97 N VAL A 16 14.431 49.977 47.061 1.00 9.84 N ATOM 98 CA VAL A 16 14.671 49.900 48.495 1.00 9.61 C ATOM 99 C VAL A 16 13.919 51.007 49.246 1.00 11.80 C ATOM 100 O VAL A 16 13.683 52.103 48.708 1.00 11.41 O ATOM 101 CB VAL A 16 16.199 50.006 48.794 1.00 9.84 C ATOM 102 CG1 VAL A 16 16.706 51.412 48.526 1.00 11.99 C ATOM 103 CG2 VAL A 16 16.504 49.582 50.204 1.00 13.17 C ATOM 104 N HIS A 17 13.464 50.686 50.451 1.00 11.23 N ATOM 105 CA HIS A 17 12.779 51.667 51.289 1.00 12.50 C ATOM 106 C HIS A 17 13.635 51.843 52.530 1.00 11.56 C ATOM 107 O HIS A 17 14.045 52.952 52.892 1.00 15.66 O ATOM 108 CB HIS A 17 11.392 51.190 51.693 1.00 8.81 C ATOM 109 CG HIS A 17 10.611 52.228 52.424 1.00 12.32 C ATOM 110 ND1 HIS A 17 10.009 53.286 51.782 1.00 16.44 N ATOM 111 CD2 HIS A 17 10.406 52.422 53.745 1.00 12.87 C ATOM 112 CE1 HIS A 17 9.470 54.092 52.677 1.00 16.27 C ATOM 113 NE2 HIS A 17 9.698 53.590 53.877 1.00 18.21 N ATOM 114 N GLN A 18 13.916 50.709 53.152 1.00 14.35 N ATOM 115 CA GLN A 18 14.740 50.621 54.333 1.00 15.00 C ATOM 116 C GLN A 18 15.655 49.445 54.024 1.00 14.65 C ATOM 117 O GLN A 18 15.330 48.620 53.166 1.00 15.16 O ATOM 118 CB GLN A 18 13.859 50.309 55.531 1.00 25.19 C ATOM 119 CG GLN A 18 14.554 50.392 56.852 1.00 37.43 C ATOM 120 CD GLN A 18 13.623 50.070 57.994 1.00 42.68 C ATOM 121 OE1 GLN A 18 13.239 48.912 
58.186 1.00 45.28 O ATOM 122 NE2 GLN A 18 13.250 51.094 58.764 1.00 41.38 N ATOM 123 N GLU A 19 16.794 49.366 54.701 1.00 12.93 N ATOM 124 CA GLU A 19 17.742 48.280 54.469 1.00 16.07 C ATOM 125 C GLU A 19 17.025 46.937 54.491 1.00 11.82 C ATOM 126 O GLU A 19 16.378 46.601 55.475 1.00 13.21 O ATOM 127 CB GLU A 19 18.836 48.293 55.548 1.00 19.28 C ATOM 128 CG GLU A 19 20.275 48.282 55.018 1.00 31.93 C ATOM 129 CD GLU A 19 20.785 46.893 54.672 1.00 36.30 C ATOM 130 OE1 GLU A 19 20.530 46.406 53.549 1.00 43.02 O ATOM 131 OE2 GLU A 19 21.462 46.292 55.527 1.00 41.71 O ATOM 132 N GLY A 20 17.087 46.212 53.380 1.00 9.32 N ATOM 133 CA GLY A 20 16.446 44.910 53.306 1.00 10.78 C ATOM 134 C GLY A 20 14.967 44.868 52.953 1.00 12.37 C ATOM 135 O GLY A 20 14.415 43.779 52.759 1.00 11.40 O ATOM 136 N ILE A 21 14.310 46.028 52.900 1.00 11.21 N ATOM 137 CA ILE A 21 12.892 46.071 52.553 1.00 11.81 C ATOM 138 C ILE A 21 12.714 46.872 51.274 1.00 13.84 C ATOM 139 O ILE A 21 13.095 48.043 51.192 1.00 12.70 O ATOM 140 CB ILE A 21 12.025 46.652 53.693 1.00 11.59 C ATOM 141 CG1 ILE A 21 12.224 45.832 54.967 1.00 15.04 C ATOM 142 CG2 ILE A 21 10.528 46.603 53.308 1.00 16.33 C ATOM 143 CD1 ILE A 21 11.451 46.351 56.168 1.00 22.83 C ATOM 144 N TYR A 22 12.169 46.202 50.266 1.00 10.11 N ATOM 145 CA TYR A 22 11.950 46.777 48.958 1.00 11.86 C ATOM 146 C TYR A 22 10.471 46.880 48.634 1.00 14.44 C ATOM 147 O TYR A 22 9.631 46.292 49.307 1.00 17.52 O ATOM 148 CB TYR A 22 12.627 45.892 47.902 1.00 8.92 C ATOM 149 CG TYR A 22 14.139 45.866 48.001 1.00 8.62 C ATOM 150 CD1 TYR A 22 14.785 45.139 49.005 1.00 10.70 C ATOM 151 CD2 TYR A 22 14.922 46.599 47.115 1.00 6.29 C ATOM 152 CE1 TYR A 22 16.175 45.155 49.120 1.00 13.66 C ATOM 153 CE2 TYR A 22 16.305 46.622 47.219 1.00 6.68 C ATOM 154 CZ TYR A 22 16.925 45.902 48.221 1.00 8.01 C ATOM 155 OH TYR A 22 18.292 45.938 48.326 1.00 14.82 O ATOM 156 N ARG A 23 10.165 47.624 47.580 1.00 16.60 N ATOM 157 CA ARG A 23 8.795 47.780 47.097 1.00 18.95 C ATOM 158 C 
ARG A 23 8.796 47.633 45.575 1.00 15.47 C ATOM 159 O ARG A 23 9.690 48.150 44.905 1.00 14.40 O ATOM 160 CB ARG A 23 8.221 49.138 47.497 1.00 21.56 C ATOM 161 CG ARG A 23 7.813 49.224 48.949 1.00 33.66 C ATOM 162 CD ARG A 23 6.893 50.401 49.176 1.00 36.15 C ATOM 163 NE ARG A 23 7.556 51.660 48.866 1.00 38.83 N ATOM 164 CZ ARG A 23 7.224 52.829 49.402 1.00 45.05 C ATOM 165 NH1 ARG A 23 6.229 52.903 50.282 1.00 46.49 N ATOM 166 NH2 ARG A 23 7.899 53.923 49.069 1.00 46.39 N ATOM 167 N ASP A 24 7.824 46.904 45.033 1.00 15.56 N ATOM 168 CA ASP A 24 7.751 46.703 43.588 1.00 15.72 C ATOM 169 C ASP A 24 6.818 47.714 42.934 1.00 14.47 C ATOM 170 O ASP A 24 6.343 48.640 43.588 1.00 14.11 O ATOM 171 CB ASP A 24 7.346 45.256 43.239 1.00 14.43 C ATOM 172 CG ASP A 24 5.984 44.855 43.801 1.00 15.26 C ATOM 173 OD1 ASP A 24 5.147 45.727 44.070 1.00 11.84 O ATOM 174 OD2 ASP A 24 5.738 43.649 43.967 1.00 18.06 O ATOM 175 N GLU A 25 6.554 47.529 41.650 1.00 14.92 N ATOM 176 CA GLU A 25 5.680 48.429 40.904 1.00 17.73 C ATOM 177 C GLU A 25 4.238 48.519 41.430 1.00 20.50 C ATOM 178 O GLU A 25 3.505 49.436 41.066 1.00 22.81 O ATOM 179 CB GLU A 25 5.652 48.021 39.435 1.00 18.41 C ATOM 180 CG GLU A 25 5.109 46.612 39.187 1.00 19.84 C ATOM 181 CD GLU A 25 6.198 45.564 39.068 1.00 21.17 C ATOM 182 OE1 GLU A 25 7.249 45.697 39.736 1.00 16.47 O ATOM 183 OE2 GLU A 25 5.994 44.602 38.298 1.00 18.38 O ATOM 184 N TYR A 26 3.834 47.561 42.265 1.00 19.79 N ATOM 185 CA TYR A 26 2.488 47.528 42.843 1.00 21.31 C ATOM 186 C TYR A 26 2.441 48.038 44.276 1.00 22.08 C ATOM 187 O TYR A 26 1.401 47.937 44.933 1.00 25.79 O ATOM 188 CB TYR A 26 1.937 46.108 42.830 1.00 17.73 C ATOM 189 CG TYR A 26 1.869 45.496 41.466 1.00 23.02 C ATOM 190 CD1 TYR A 26 1.336 46.203 40.393 1.00 20.72 C ATOM 191 CD2 TYR A 26 2.335 44.205 41.242 1.00 26.28 C ATOM 192 CE1 TYR A 26 1.269 45.640 39.127 1.00 23.56 C ATOM 193 CE2 TYR A 26 2.272 43.629 39.977 1.00 28.23 C ATOM 194 CZ TYR A 26 1.740 44.353 38.927 1.00 26.06 C ATOM 195 OH 
TYR A 26 1.681 43.801 37.672 1.00 30.94 O ATOM 196 N GLN A 27 3.567 48.563 44.759 1.00 20.91 N ATOM 197 CA GLN A 27 3.694 49.077 46.126 1.00 23.01 C ATOM 198 C GLN A 27 3.722 47.972 47.180 1.00 20.14 C ATOM 199 O GLN A 27 3.596 48.237 48.373 1.00 19.46 O ATOM 200 CB GLN A 27 2.581 50.077 46.454 1.00 28.90 C ATOM 201 CG GLN A 27 2.663 51.391 45.694 1.00 36.12 C ATOM 202 CD GLN A 27 1.415 52.238 45.891 1.00 42.43 C ATOM 203 OE1 GLN A 27 0.546 52.304 45.009 1.00 43.99 O ATOM 204 NE2 GLN A 27 1.303 52.870 47.063 1.00 45.23 N ATOM 205 N ARG A 28 3.878 46.731 46.735 1.00 17.12 N ATOM 206 CA ARG A 28 3.953 45.605 47.650 1.00 15.88 C ATOM 207 C ARG A 28 5.370 45.522 48.209 1.00 14.06 C ATOM 208 O ARG A 28 6.345 45.813 47.514 1.00 11.98 O ATOM 209 CB ARG A 28 3.613 44.308 46.925 1.00 15.62 C ATOM 210 CG ARG A 28 2.188 44.240 46.440 1.00 15.16 C ATOM 211 CD ARG A 28 1.945 42.941 45.743 1.00 18.63 C ATOM 212 NE ARG A 28 2.935 42.734 44.697 1.00 21.42 N ATOM 213 CZ ARG A 28 3.028 41.633 43.963 1.00 26.57 C ATOM 214 NH1 ARG A 28 2.183 40.626 44.161 1.00 29.04 N ATOM 215 NH2 ARG A 28 3.971 41.536 43.033 1.00 28.28 N ATOM 216 N THR A 29 5.482 45.146 49.475 1.00 12.54 N ATOM 217 CA THR A 29 6.791 45.039 50.098 1.00 14.08 C ATOM 218 C THR A 29 7.409 43.646 49.966 1.00 13.70 C ATOM 219 O THR A 29 6.709 42.630 49.943 1.00 11.99 O ATOM 220 CB THR A 29 6.752 45.457 51.579 1.00 13.34 C ATOM 221 OG1 THR A 29 5.743 44.712 52.259 1.00 12.89 O ATOM 222 CG2 THR A 29 6.454 46.935 51.707 1.00 17.38 C ATOM 223 N TRP A 30 8.729 43.627 49.823 1.00 12.04 N ATOM 224 CA TRP A 30 9.507 42.404 49.701 1.00 8.73 C ATOM 225 C TRP A 30 10.625 42.509 50.717 1.00 10.06 C ATOM 226 O TRP A 30 11.212 43.577 50.872 1.00 12.46 O ATOM 227 CB TRP A 30 10.117 42.310 48.314 1.00 9.80 C ATOM 228 CG TRP A 30 9.145 42.015 47.236 1.00 11.47 C ATOM 229 CD1 TRP A 30 8.245 42.881 46.681 1.00 13.90 C ATOM 230 CD2 TRP A 30 8.989 40.772 46.543 1.00 10.96 C ATOM 231 NE1 TRP A 30 7.542 42.254 45.683 1.00 13.05 N ATOM 232 CE2 
TRP A 30 7.978 40.958 45.575 1.00 12.91 C ATOM 233 CE3 TRP A 30 9.607 39.518 46.643 1.00 8.88 C ATOM 234 CZ2 TRP A 30 7.571 39.937 44.708 1.00 12.83 C ATOM 235 CZ3 TRP A 30 9.203 38.501 45.776 1.00 8.59 C ATOM 236 CH2 TRP A 30 8.195 38.720 44.823 1.00 10.12 C ATOM 237 N VAL A 31 10.879 41.433 51.455 1.00 10.56 N ATOM 238 CA VAL A 31 11.948 41.421 52.452 1.00 9.81 C ATOM 239 C VAL A 31 13.123 40.644 51.863 1.00 13.48 C ATOM 240 O VAL A 31 12.961 39.500 51.413 1.00 14.18 O ATOM 241 CB VAL A 31 11.490 40.749 53.777 1.00 11.63 C ATOM 242 CG1 VAL A 31 12.655 40.641 54.748 1.00 13.21 C ATOM 243 CG2 VAL A 31 10.353 41.548 54.416 1.00 13.94 C ATOM 244 N ALA A 32 14.295 41.272 51.842 1.00 12.15 N ATOM 245 CA ALA A 32 15.483 40.632 51.293 1.00 11.67 C ATOM 246 C ALA A 32 16.520 40.285 52.353 1.00 13.21 C ATOM 247 O ALA A 32 16.853 41.109 53.202 1.00 12.93 O ATOM 248 CB ALA A 32 16.105 41.507 50.209 1.00 9.35 C ATOM 249 N VAL A 33 17.004 39.046 52.308 1.00 10.78 N ATOM 250 CA VAL A 33 18.021 38.593 53.236 1.00 11.43 C ATOM 251 C VAL A 33 19.234 38.106 52.451 1.00 13.65 C ATOM 252 O VAL A 33 19.106 37.604 51.335 1.00 14.89 O ATOM 253 CB VAL A 33 17.524 37.454 54.182 1.00 11.64 C ATOM 254 CG1 VAL A 33 16.376 37.938 55.045 1.00 15.29 C ATOM 255 CG2 VAL A 33 17.130 36.213 53.396 1.00 15.10 C ATOM 256 N VAL A 34 20.414 38.291 53.033 1.00 14.86 N ATOM 257 CA VAL A 34 21.646 37.861 52.398 1.00 14.07 C ATOM 258 C VAL A 34 22.019 36.455 52.849 1.00 10.75 C ATOM 259 O VAL A 34 21.917 36.128 54.024 1.00 12.88 O ATOM 260 CB VAL A 34 22.833 38.825 52.725 1.00 13.34 C ATOM 261 CG1 VAL A 34 24.138 38.293 52.108 1.00 17.43 C ATOM 262 CG2 VAL A 34 22.546 40.230 52.203 1.00 10.34 C ATOM 263 N GLU A 35 22.381 35.615 51.888 1.00 13.04 N ATOM 264 CA GLU A 35 22.835 34.252 52.161 1.00 15.26 C ATOM 265 C GLU A 35 24.138 34.116 51.384 1.00 15.10 C ATOM 266 O GLU A 35 24.164 33.746 50.214 1.00 14.57 O ATOM 267 CB GLU A 35 21.794 33.246 51.716 1.00 13.76 C ATOM 268 CG GLU A 35 20.471 33.523 52.377 
1.00 22.79 C ATOM 269 CD GLU A 35 19.466 32.421 52.186 1.00 25.07 C ATOM 270 OE1 GLU A 35 19.432 31.844 51.076 1.00 29.90 O ATOM 271 OE2 GLU A 35 18.710 32.142 53.149 1.00 26.94 O ATOM 272 N GLU A 36 25.219 34.495 52.045 1.00 18.73 N ATOM 273 CA GLU A 36 26.538 34.495 51.446 1.00 23.19 C ATOM 274 C GLU A 36 27.283 33.164 51.426 1.00 23.70 C ATOM 275 O GLU A 36 27.430 32.499 52.452 1.00 27.43 O ATOM 276 CB GLU A 36 27.376 35.547 52.148 1.00 22.75 C ATOM 277 CG GLU A 36 28.673 35.866 51.466 1.00 31.51 C ATOM 278 CD GLU A 36 29.457 36.917 52.210 1.00 34.90 C ATOM 279 OE1 GLU A 36 28.839 37.748 52.915 1.00 40.64 O ATOM 280 OE2 GLU A 36 30.698 36.904 52.094 1.00 37.73 O ATOM 281 N GLU A 37 27.721 32.768 50.234 1.00 23.31 N ATOM 282 CA GLU A 37 28.492 31.541 50.059 1.00 21.82 C ATOM 283 C GLU A 37 29.966 31.943 50.125 1.00 23.97 C ATOM 284 O GLU A 37 30.283 33.116 50.356 1.00 26.06 O ATOM 285 CB GLU A 37 28.176 30.902 48.702 1.00 24.57 C ATOM 286 CG GLU A 37 26.700 30.564 48.489 1.00 26.70 C ATOM 287 CD GLU A 37 26.165 29.566 49.496 1.00 26.82 C ATOM 288 OE1 GLU A 37 26.893 28.607 49.822 1.00 30.00 O ATOM 289 OE2 GLU A 37 25.013 29.739 49.957 1.00 34.57 O ATOM 290 N THR A 38 30.869 30.987 49.923 1.00 22.64 N ATOM 291 CA THR A 38 32.303 31.288 49.966 1.00 21.52 C ATOM 292 C THR A 38 32.760 32.121 48.760 1.00 19.63 C ATOM 293 O THR A 38 33.459 33.131 48.899 1.00 19.04 O ATOM 294 CB THR A 38 33.157 29.991 50.014 1.00 18.09 C ATOM 295 OG1 THR A 38 32.874 29.271 51.216 1.00 21.23 O ATOM 296 CG2 THR A 38 34.640 30.326 49.983 1.00 20.54 C ATOM 297 N SER A 39 32.320 31.703 47.585 1.00 17.14 N ATOM 298 CA SER A 39 32.705 32.346 46.353 1.00 20.78 C ATOM 299 C SER A 39 31.734 33.376 45.810 1.00 21.63 C ATOM 300 O SER A 39 32.076 34.109 44.886 1.00 19.37 O ATOM 301 CB SER A 39 32.952 31.274 45.308 1.00 21.56 C ATOM 302 OG SER A 39 33.744 30.251 45.882 1.00 26.38 O ATOM 303 N PHE A 40 30.524 33.436 46.358 1.00 22.41 N ATOM 304 CA PHE A 40 29.545 34.404 45.860 1.00 19.83 C ATOM 305 C 
PHE A 40 28.464 34.754 46.864 1.00 17.95 C ATOM 306 O PHE A 40 28.410 34.213 47.971 1.00 14.28 O ATOM 307 CB PHE A 40 28.905 33.921 44.546 1.00 19.37 C ATOM 308 CG PHE A 40 28.293 32.570 44.650 1.00 25.11 C ATOM 309 CD1 PHE A 40 29.058 31.434 44.424 1.00 29.88 C ATOM 310 CD2 PHE A 40 26.972 32.423 45.044 1.00 26.71 C ATOM 311 CE1 PHE A 40 28.513 30.159 44.598 1.00 32.01 C ATOM 312 CE2 PHE A 40 26.419 31.156 45.219 1.00 30.64 C ATOM 313 CZ PHE A 40 27.193 30.021 44.997 1.00 28.16 C ATOM 314 N LEU A 41 27.578 35.645 46.436 1.00 16.59 N ATOM 315 CA LEU A 41 26.498 36.110 47.272 1.00 11.19 C ATOM 316 C LEU A 41 25.134 35.890 46.639 1.00 10.44 C ATOM 317 O LEU A 41 24.955 36.052 45.431 1.00 6.65 O ATOM 318 CB LEU A 41 26.718 37.590 47.570 1.00 15.40 C ATOM 319 CG LEU A 41 25.764 38.265 48.542 1.00 17.96 C ATOM 320 CD1 LEU A 41 26.560 39.122 49.509 1.00 22.78 C ATOM 321 CD2 LEU A 41 24.742 39.076 47.781 1.00 17.63 C ATOM 322 N ARG A 42 24.189 35.475 47.472 1.00 9.43 N ATOM 323 CA ARG A 42 22.810 35.248 47.058 1.00 12.78 C ATOM 324 C ARG A 42 21.896 36.091 47.942 1.00 10.33 C ATOM 325 O ARG A 42 22.240 36.417 49.077 1.00 13.14 O ATOM 326 CB ARG A 42 22.407 33.772 47.218 1.00 10.56 C ATOM 327 CG ARG A 42 22.983 32.814 46.171 1.00 13.78 C ATOM 328 CD ARG A 42 22.105 31.557 46.035 1.00 15.67 C ATOM 329 NE ARG A 42 22.758 30.495 45.267 1.00 15.51 N ATOM 330 CZ ARG A 42 23.399 29.456 45.799 1.00 12.10 C ATOM 331 NH1 ARG A 42 23.481 29.314 47.114 1.00 11.66 N ATOM 332 NH2 ARG A 42 23.978 28.561 45.013 1.00 13.24 N ATOM 333 N ALA A 43 20.754 36.483 47.396 1.00 10.52 N ATOM 334 CA ALA A 43 19.769 37.244 48.148 1.00 9.84 C ATOM 335 C ALA A 43 18.465 36.478 47.999 1.00 8.98 C ATOM 336 O ALA A 43 18.105 36.083 46.903 1.00 12.92 O ATOM 337 CB ALA A 43 19.622 38.647 47.585 1.00 8.83 C ATOM 338 N ARG A 44 17.814 36.185 49.114 1.00 8.69 N ATOM 339 CA ARG A 44 16.542 35.481 49.087 1.00 9.60 C ATOM 340 C ARG A 44 15.524 36.573 49.385 1.00 8.62 C ATOM 341 O ARG A 44 15.603 37.242 50.409 
1.00 15.33 O ATOM 342 CB ARG A 44 16.518 34.378 50.144 1.00 8.69 C ATOM 343 CG ARG A 44 15.315 33.456 50.048 1.00 11.48 C ATOM 344 CD ARG A 44 15.657 32.081 50.596 1.00 14.85 C ATOM 345 NE ARG A 44 16.129 32.153 51.969 1.00 20.01 N ATOM 346 CZ ARG A 44 15.340 32.179 53.038 1.00 25.11 C ATOM 347 NH1 ARG A 44 14.021 32.138 52.904 1.00 28.22 N ATOM 348 NH2 ARG A 44 15.878 32.232 54.246 1.00 25.52 N ATOM 349 N VAL A 45 14.621 36.805 48.443 1.00 11.50 N ATOM 350 CA VAL A 45 13.641 37.875 48.581 1.00 12.07 C ATOM 351 C VAL A 45 12.228 37.313 48.583 1.00 12.67 C ATOM 352 O VAL A 45 11.825 36.616 47.648 1.00 13.12 O ATOM 353 CB VAL A 45 13.792 38.915 47.421 1.00 12.86 C ATOM 354 CG1 VAL A 45 13.174 40.231 47.812 1.00 12.39 C ATOM 355 CG2 VAL A 45 15.270 39.105 47.038 1.00 10.08 C ATOM 356 N GLN A 46 11.487 37.623 49.644 1.00 9.87 N ATOM 357 CA GLN A 46 10.119 37.152 49.799 1.00 12.52 C ATOM 358 C GLN A 46 9.104 38.285 49.902 1.00 12.60 C ATOM 359 O GLN A 46 9.328 39.284 50.596 1.00 10.34 O ATOM 360 CB GLN A 46 10.004 36.250 51.033 1.00 12.62 C ATOM 361 CG GLN A 46 10.774 34.950 50.921 1.00 16.77 C ATOM 362 CD GLN A 46 10.550 34.052 52.113 1.00 19.44 C ATOM 363 OE1 GLN A 46 9.951 34.464 53.097 1.00 17.87 O ATOM 364 NE2 GLN A 46 11.032 32.813 52.031 1.00 14.24 N ATOM 365 N GLN A 47 7.973 38.099 49.226 1.00 12.30 N ATOM 366 CA GLN A 47 6.898 39.081 49.222 1.00 11.62 C ATOM 367 C GLN A 47 6.063 38.904 50.483 1.00 13.03 C ATOM 368 O GLN A 47 5.170 38.058 50.551 1.00 14.19 O ATOM 369 CB GLN A 47 6.035 38.924 47.962 1.00 9.85 C ATOM 370 CG GLN A 47 5.328 40.201 47.516 1.00 12.95 C ATOM 371 CD GLN A 47 4.119 40.524 48.362 1.00 18.42 C ATOM 372 OE1 GLN A 47 3.034 39.995 48.118 1.00 19.94 O ATOM 373 NE2 GLN A 47 4.291 41.385 49.363 1.00 15.59 N ATOM 374 N ILE A 48 6.416 39.681 51.496 1.00 12.27 N ATOM 375 CA ILE A 48 5.734 39.664 52.776 1.00 11.76 C ATOM 376 C ILE A 48 5.372 41.099 53.135 1.00 15.46 C ATOM 377 O ILE A 48 6.188 42.016 52.976 1.00 15.44 O ATOM 378 CB ILE A 48 6.644 
39.064 53.879 1.00 15.37 C ATOM 379 CG1 ILE A 48 7.000 37.610 53.527 1.00 14.45 C ATOM 380 CG2 ILE A 48 5.959 39.148 55.258 1.00 16.12 C ATOM 381 CD1 ILE A 48 7.998 36.954 54.484 1.00 19.15 C ATOM 382 N GLN A 49 4.133 41.283 53.583 1.00 16.14 N ATOM 383 CA GLN A 49 3.612 42.583 53.999 1.00 18.88 C ATOM 384 C GLN A 49 4.245 42.997 55.334 1.00 16.77 C ATOM 385 O GLN A 49 4.066 42.334 56.353 1.00 16.00 O ATOM 386 CB GLN A 49 2.094 42.483 54.142 1.00 21.86 C ATOM 387 CG GLN A 49 1.383 43.776 54.494 1.00 34.69 C ATOM 388 CD GLN A 49 -0.130 43.613 54.473 1.00 42.01 C ATOM 389 OE1 GLN A 49 -0.727 43.320 53.424 1.00 43.27 O ATOM 390 NE2 GLN A 49 -0.758 43.780 55.634 1.00 41.30 N ATOM 391 N VAL A 50 5.059 44.045 55.307 1.00 15.30 N ATOM 392 CA VAL A 50 5.709 44.526 56.522 1.00 16.37 C ATOM 393 C VAL A 50 5.416 46.010 56.671 1.00 18.04 C ATOM 394 O VAL A 50 5.025 46.664 55.706 1.00 16.50 O ATOM 395 CB VAL A 50 7.245 44.294 56.494 1.00 18.68 C ATOM 396 CG1 VAL A 50 7.555 42.809 56.528 1.00 19.50 C ATOM 397 CG2 VAL A 50 7.866 44.930 55.263 1.00 16.54 C ATOM 398 N PRO A 51 5.549 46.553 57.891 1.00 18.12 N ATOM 399 CA PRO A 51 5.277 47.980 58.078 1.00 19.86 C ATOM 400 C PRO A 51 6.349 48.814 57.409 1.00 21.43 C ATOM 401 O PRO A 51 7.467 48.344 57.174 1.00 19.19 O ATOM 402 CB PRO A 51 5.325 48.138 59.598 1.00 24.01 C ATOM 403 CG PRO A 51 6.338 47.123 60.009 1.00 25.48 C ATOM 404 CD PRO A 51 5.949 45.926 59.163 1.00 18.27 C ATOM 405 N LEU A 52 5.983 50.029 57.038 1.00 19.15 N ATOM 406 CA LEU A 52 6.928 50.937 56.414 1.00 21.07 C ATOM 407 C LEU A 52 6.986 52.187 57.266 1.00 22.46 C ATOM 408 O LEU A 52 5.951 52.728 57.656 1.00 27.86 O ATOM 409 CB LEU A 52 6.492 51.308 54.996 1.00 21.76 C ATOM 410 CG LEU A 52 6.588 50.224 53.927 1.00 23.20 C ATOM 411 CD1 LEU A 52 6.029 50.752 52.617 1.00 25.70 C ATOM 412 CD2 LEU A 52 8.022 49.791 53.755 1.00 21.63 C ATOM 413 N GLY A 53 8.200 52.592 57.613 1.00 19.61 N ATOM 414 CA GLY A 53 8.381 53.797 58.394 1.00 18.06 C ATOM 415 C GLY A 53 9.017 
54.789 57.446 1.00 17.91 C ATOM 416 O GLY A 53 8.639 54.856 56.269 1.00 16.04 O ATOM 417 N ASP A 54 10.014 55.521 57.931 1.00 17.80 N ATOM 418 CA ASP A 54 10.709 56.493 57.094 1.00 19.56 C ATOM 419 C ASP A 54 11.629 55.784 56.099 1.00 18.53 C ATOM 420 O ASP A 54 12.141 54.695 56.386 1.00 16.74 O ATOM 421 CB ASP A 54 11.568 57.423 57.957 1.00 20.87 C ATOM 422 CG ASP A 54 10.766 58.160 59.012 1.00 14.70 C ATOM 423 OD1 ASP A 54 9.532 58.257 58.883 1.00 23.14 O ATOM 424 OD2 ASP A 54 11.381 58.643 59.981 1.00 20.21 O ATOM 425 N ALA A 55 11.813 56.387 54.926 1.00 15.45 N ATOM 426 CA ALA A 55 12.722 55.826 53.934 1.00 15.62 C ATOM 427 C ALA A 55 14.104 56.113 54.494 1.00 17.34 C ATOM 428 O ALA A 55 14.335 57.181 55.062 1.00 18.82 O ATOM 429 CB ALA A 55 12.547 56.516 52.607 1.00 12.76 C ATOM 430 N ALA A 56 15.016 55.155 54.386 1.00 17.62 N ATOM 431 CA ALA A 56 16.360 55.370 54.910 1.00 18.22 C ATOM 432 C ALA A 56 17.145 56.353 54.038 1.00 18.86 C ATOM 433 O ALA A 56 16.866 56.487 52.847 1.00 15.46 O ATOM 434 CB ALA A 56 17.107 54.041 55.011 1.00 17.23 C ATOM 435 N ARG A 57 18.103 57.057 54.642 1.00 21.08 N ATOM 436 CA ARG A 57 18.942 57.998 53.903 1.00 23.00 C ATOM 437 C ARG A 57 19.964 57.196 53.113 1.00 22.15 C ATOM 438 O ARG A 57 20.508 56.211 53.616 1.00 22.87 O ATOM 439 CB ARG A 57 19.699 58.929 54.848 1.00 24.04 C ATOM 440 CG ARG A 57 18.830 59.726 55.791 1.00 32.51 C ATOM 441 CD ARG A 57 17.841 60.612 55.062 1.00 34.98 C ATOM 442 NE ARG A 57 17.109 61.457 56.006 1.00 41.64 N ATOM 443 CZ ARG A 57 16.153 62.317 55.666 1.00 43.46 C ATOM 444 NH1 ARG A 57 15.797 62.454 54.392 1.00 42.40 N ATOM 445 NH2 ARG A 57 15.573 63.061 56.601 1.00 45.97 N ATOM 446 N PRO A 58 20.232 57.602 51.860 1.00 21.73 N ATOM 447 CA PRO A 58 21.203 56.906 51.007 1.00 20.52 C ATOM 448 C PRO A 58 22.583 56.732 51.678 1.00 21.37 C ATOM 449 O PRO A 58 23.298 55.760 51.413 1.00 20.72 O ATOM 450 CB PRO A 58 21.269 57.812 49.773 1.00 22.99 C ATOM 451 CG PRO A 58 19.873 58.352 49.690 1.00 20.91 C ATOM 
452 CD PRO A 58 19.573 58.698 51.122 1.00 22.61 C ATOM 453 N SER A 59 22.935 57.668 52.556 1.00 20.39 N ATOM 454 CA SER A 59 24.202 57.625 53.274 1.00 25.45 C ATOM 455 C SER A 59 24.346 56.333 54.078 1.00 29.06 C ATOM 456 O SER A 59 25.440 55.785 54.181 1.00 30.56 O ATOM 457 CB SER A 59 24.348 58.852 54.183 1.00 23.85 C ATOM 458 OG SER A 59 23.240 59.005 55.055 1.00 25.39 O ATOM 459 N HIS A 60 23.237 55.849 54.633 1.00 32.05 N ATOM 460 CA HIS A 60 23.228 54.605 55.405 1.00 34.52 C ATOM 461 C HIS A 60 23.243 53.391 54.487 1.00 34.25 C ATOM 462 O HIS A 60 23.776 52.339 54.839 1.00 37.45 O ATOM 463 CB HIS A 60 21.975 54.515 56.272 1.00 39.39 C ATOM 464 CG HIS A 60 22.088 55.229 57.581 1.00 50.07 C ATOM 465 ND1 HIS A 60 21.246 56.260 57.940 1.00 53.27 N ATOM 466 CD2 HIS A 60 22.919 55.035 58.633 1.00 53.22 C ATOM 467 CE1 HIS A 60 21.550 56.667 59.160 1.00 55.62 C ATOM 468 NE2 HIS A 60 22.561 55.941 59.602 1.00 56.55 N ATOM 469 N LEU A 61 22.656 53.545 53.307 1.00 30.20 N ATOM 470 CA LEU A 61 22.568 52.455 52.349 1.00 30.63 C ATOM 471 C LEU A 61 23.822 52.203 51.502 1.00 32.64 C ATOM 472 O LEU A 61 23.948 51.146 50.885 1.00 32.33 O ATOM 473 CB LEU A 61 21.350 52.676 51.454 1.00 26.99 C ATOM 474 CG LEU A 61 20.038 52.901 52.208 1.00 27.97 C ATOM 475 CD1 LEU A 61 18.944 53.365 51.251 1.00 27.16 C ATOM 476 CD2 LEU A 61 19.624 51.627 52.929 1.00 24.67 C ATOM 477 N LEU A 62 24.751 53.157 51.488 1.00 35.49 N ATOM 478 CA LEU A 62 25.981 53.029 50.699 1.00 37.77 C ATOM 479 C LEU A 62 26.823 51.806 51.061 1.00 36.44 C ATOM 480 O LEU A 62 27.501 51.233 50.210 1.00 36.53 O ATOM 481 CB LEU A 62 26.862 54.280 50.847 1.00 40.27 C ATOM 482 CG LEU A 62 26.436 55.666 50.344 1.00 42.29 C ATOM 483 CD1 LEU A 62 27.501 56.659 50.766 1.00 39.59 C ATOM 484 CD2 LEU A 62 26.260 55.696 48.833 1.00 40.03 C ATOM 485 N THR A 63 26.773 51.413 52.324 1.00 34.57 N ATOM 486 CA THR A 63 27.560 50.286 52.807 1.00 35.29 C ATOM 487 C THR A 63 26.906 48.908 52.703 1.00 32.03 C ATOM 488 O THR A 63 27.582 
47.888 52.865 1.00 31.17 O ATOM 489 CB THR A 63 27.969 50.516 54.267 1.00 37.75 C ATOM 490 OG1 THR A 63 26.792 50.671 55.079 1.00 39.81 O ATOM 491 CG2 THR A 63 28.842 51.767 54.375 1.00 40.91 C ATOM 492 N SER A 64 25.604 48.885 52.429 1.00 28.60 N ATOM 493 CA SER A 64 24.844 47.642 52.321 1.00 27.01 C ATOM 494 C SER A 64 25.332 46.690 51.233 1.00 25.06 C ATOM 495 O SER A 64 25.878 47.124 50.222 1.00 25.07 O ATOM 496 CB SER A 64 23.367 47.964 52.083 1.00 25.98 C ATOM 497 OG SER A 64 22.601 46.778 51.968 1.00 30.64 O ATOM 498 N GLN A 65 25.130 45.392 51.447 1.00 20.79 N ATOM 499 CA GLN A 65 25.515 44.375 50.460 1.00 20.97 C ATOM 500 C GLN A 65 24.467 44.286 49.334 1.00 17.84 C ATOM 501 O GLN A 65 24.751 43.814 48.230 1.00 18.22 O ATOM 502 CB GLN A 65 25.677 43.007 51.133 1.00 20.52 C ATOM 503 CG GLN A 65 26.826 42.940 52.130 1.00 20.61 C ATOM 504 CD GLN A 65 26.974 41.562 52.736 1.00 24.18 C ATOM 505 OE1 GLN A 65 26.248 41.201 53.656 1.00 27.16 O ATOM 506 NE2 GLN A 65 27.902 40.777 52.210 1.00 22.39 N ATOM 507 N LEU A 66 23.253 44.724 49.645 1.00 16.91 N ATOM 508 CA LEU A 66 22.130 44.737 48.707 1.00 17.29 C ATOM 509 C LEU A 66 22.064 46.105 48.015 1.00 15.83 C ATOM 510 O LEU A 66 22.441 47.130 48.600 1.00 14.50 O ATOM 511 CB LEU A 66 20.820 44.452 49.456 1.00 19.92 C ATOM 512 CG LEU A 66 20.328 43.007 49.592 1.00 21.14 C ATOM 513 CD1 LEU A 66 21.390 42.007 49.168 1.00 21.08 C ATOM 514 CD2 LEU A 66 19.859 42.758 51.017 1.00 20.36 C ATOM 515 N PRO A 67 21.540 46.143 46.781 1.00 14.35 N ATOM 516 CA PRO A 67 21.435 47.390 46.018 1.00 14.67 C ATOM 517 C PRO A 67 20.246 48.281 46.335 1.00 14.26 C ATOM 518 O PRO A 67 19.308 47.880 47.030 1.00 13.77 O ATOM 519 CB PRO A 67 21.381 46.889 44.578 1.00 12.70 C ATOM 520 CG PRO A 67 20.557 45.647 44.709 1.00 16.82 C ATOM 521 CD PRO A 67 21.094 44.991 45.974 1.00 14.43 C ATOM 522 N LEU A 68 20.311 49.507 45.820 1.00 13.52 N ATOM 523 CA LEU A 68 19.244 50.487 45.992 1.00 13.91 C ATOM 524 C LEU A 68 18.056 50.078 45.127 1.00 12.87 C 
ATOM 525 O LEU A 68 16.908 50.374 45.456 1.00 11.98 O ATOM 526 CB LEU A 68 19.706 51.881 45.554 1.00 12.66 C ATOM 527 CG LEU A 68 20.839 52.587 46.293 1.00 17.85 C ATOM 528 CD1 LEU A 68 21.091 53.928 45.628 1.00 17.81 C ATOM 529 CD2 LEU A 68 20.488 52.786 47.751 1.00 18.03 C ATOM 530 N MET A 69 18.335 49.396 44.022 1.00 11.64 N ATOM 531 CA MET A 69 17.273 48.972 43.126 1.00 16.65 C ATOM 532 C MET A 69 17.660 47.825 42.209 1.00 16.10 C ATOM 533 O MET A 69 18.827 47.655 41.858 1.00 15.01 O ATOM 534 CB MET A 69 16.830 50.141 42.254 1.00 20.53 C ATOM 535 CG MET A 69 17.926 50.645 41.336 1.00 29.19 C ATOM 536 SD MET A 69 17.354 50.897 39.654 1.00 46.93 S ATOM 537 CE MET A 69 17.759 49.335 38.933 1.00 38.88 C ATOM 538 N TRP A 70 16.645 47.067 41.805 1.00 13.15 N ATOM 539 CA TRP A 70 16.792 45.946 40.886 1.00 10.26 C ATOM 540 C TRP A 70 15.776 46.214 39.790 1.00 10.68 C ATOM 541 O TRP A 70 14.647 46.593 40.084 1.00 12.34 O ATOM 542 CB TRP A 70 16.435 44.623 41.568 1.00 12.59 C ATOM 543 CG TRP A 70 17.559 43.937 42.265 1.00 10.88 C ATOM 544 CD1 TRP A 70 18.712 43.479 41.705 1.00 8.84 C ATOM 545 CD2 TRP A 70 17.578 43.509 43.632 1.00 8.55 C ATOM 546 NE1 TRP A 70 19.441 42.777 42.636 1.00 7.96 N ATOM 547 CE2 TRP A 70 18.767 42.778 43.826 1.00 9.30 C ATOM 548 CE3 TRP A 70 16.696 43.666 44.712 1.00 6.59 C ATOM 549 CZ2 TRP A 70 19.102 42.199 45.059 1.00 9.37 C ATOM 550 CZ3 TRP A 70 17.027 43.094 45.939 1.00 10.38 C ATOM 551 CH2 TRP A 70 18.225 42.366 46.100 1.00 9.72 C ATOM 552 N GLN A 71 16.174 46.013 38.541 1.00 11.12 N ATOM 553 CA GLN A 71 15.301 46.222 37.396 1.00 10.97 C ATOM 554 C GLN A 71 15.336 44.944 36.568 1.00 12.14 C ATOM 555 O GLN A 71 16.413 44.460 36.228 1.00 13.62 O ATOM 556 CB GLN A 71 15.830 47.397 36.577 1.00 19.11 C ATOM 557 CG GLN A 71 14.997 47.784 35.375 1.00 21.02 C ATOM 558 CD GLN A 71 15.593 48.978 34.637 1.00 27.76 C ATOM 559 OE1 GLN A 71 16.383 48.820 33.703 1.00 29.53 O ATOM 560 NE2 GLN A 71 15.234 50.181 35.072 1.00 28.11 N ATOM 561 N LEU A 72 
14.166 44.383 36.263 1.00 14.29 N ATOM 562 CA LEU A 72 14.100 43.156 35.471 1.00 16.97 C ATOM 563 C LEU A 72 14.845 43.327 34.141 1.00 20.03 C ATOM 564 O LEU A 72 14.626 44.284 33.392 1.00 19.06 O ATOM 565 CB LEU A 72 12.646 42.729 35.264 1.00 18.95 C ATOM 566 CG LEU A 72 12.350 41.309 34.775 1.00 20.18 C ATOM 567 CD1 LEU A 72 13.084 40.265 35.619 1.00 20.20 C ATOM 568 CD2 LEU A 72 10.837 41.075 34.822 1.00 21.11 C ATOM 569 N TYR A 73 15.745 42.386 33.880 1.00 22.95 N ATOM 570 CA TYR A 73 16.596 42.389 32.699 1.00 24.81 C ATOM 571 C TYR A 73 16.362 41.085 31.914 1.00 27.98 C ATOM 572 O TYR A 73 15.983 40.076 32.504 1.00 27.51 O ATOM 573 CB TYR A 73 18.043 42.478 33.193 1.00 24.82 C ATOM 574 CG TYR A 73 19.042 42.916 32.167 1.00 28.69 C ATOM 575 CD1 TYR A 73 19.069 44.230 31.709 1.00 29.31 C ATOM 576 CD2 TYR A 73 19.947 42.008 31.630 1.00 29.76 C ATOM 577 CE1 TYR A 73 19.972 44.625 30.731 1.00 32.02 C ATOM 578 CE2 TYR A 73 20.850 42.389 30.658 1.00 32.56 C ATOM 579 CZ TYR A 73 20.859 43.696 30.209 1.00 31.82 C ATOM 580 OH TYR A 73 21.752 44.053 29.225 1.00 36.68 O ATOM 581 N PRO A 74 16.575 41.091 30.576 1.00 32.74 N ATOM 582 CA PRO A 74 16.377 39.890 29.742 1.00 31.54 C ATOM 583 C PRO A 74 17.170 38.682 30.227 1.00 31.66 C ATOM 584 O PRO A 74 18.256 38.835 30.790 1.00 33.95 O ATOM 585 CB PRO A 74 16.889 40.336 28.369 1.00 33.02 C ATOM 586 CG PRO A 74 16.586 41.790 28.349 1.00 36.16 C ATOM 587 CD PRO A 74 16.999 42.231 29.738 1.00 32.78 C ATOM 588 N GLU A 75 16.627 37.489 29.985 1.00 30.64 N ATOM 589 CA GLU A 75 17.267 36.230 30.374 1.00 31.65 C ATOM 590 C GLU A 75 17.123 35.937 31.869 1.00 28.27 C ATOM 591 O GLU A 75 18.040 35.413 32.506 1.00 29.39 O ATOM 592 CB GLU A 75 18.755 36.226 29.985 1.00 37.57 C ATOM 593 CG GLU A 75 19.041 36.580 28.533 1.00 43.26 C ATOM 594 CD GLU A 75 18.389 35.620 27.568 1.00 50.20 C ATOM 595 OE1 GLU A 75 18.661 34.406 27.665 1.00 53.62 O ATOM 596 OE2 GLU A 75 17.600 36.075 26.714 1.00 54.09 O ATOM 597 N GLU A 76 15.962 36.274 32.420 
1.00 24.07 N ATOM 598 CA GLU A 76 15.662 36.051 33.826 1.00 24.29 C ATOM 599 C GLU A 76 16.752 36.433 34.804 1.00 20.63 C ATOM 600 O GLU A 76 17.379 35.592 35.444 1.00 18.08 O ATOM 601 CB GLU A 76 15.188 34.617 34.064 1.00 32.67 C ATOM 602 CG GLU A 76 13.797 34.360 33.492 1.00 44.16 C ATOM 603 CD GLU A 76 12.806 35.480 33.831 1.00 51.92 C ATOM 604 OE1 GLU A 76 12.156 35.403 34.900 1.00 56.04 O ATOM 605 OE2 GLU A 76 12.683 36.442 33.032 1.00 52.76 O ATOM 606 N ARG A 77 16.972 37.729 34.901 1.00 18.61 N ATOM 607 CA ARG A 77 17.950 38.278 35.813 1.00 17.16 C ATOM 608 C ARG A 77 17.631 39.742 36.036 1.00 15.52 C ATOM 609 O ARG A 77 16.783 40.308 35.347 1.00 18.04 O ATOM 610 CB ARG A 77 19.376 38.070 35.291 1.00 17.74 C ATOM 611 CG ARG A 77 19.594 38.482 33.882 1.00 24.36 C ATOM 612 CD ARG A 77 20.725 37.675 33.271 1.00 29.77 C ATOM 613 NE ARG A 77 21.072 38.219 31.964 1.00 36.45 N ATOM 614 CZ ARG A 77 22.233 38.023 31.351 1.00 41.91 C ATOM 615 NH1 ARG A 77 23.176 37.276 31.920 1.00 41.81 N ATOM 616 NH2 ARG A 77 22.475 38.638 30.201 1.00 42.87 N ATOM 617 N TYR A 78 18.234 40.318 37.069 1.00 14.26 N ATOM 618 CA TYR A 78 18.013 41.716 37.409 1.00 12.21 C ATOM 619 C TYR A 78 19.294 42.511 37.311 1.00 12.38 C ATOM 620 O TYR A 78 20.374 42.003 37.613 1.00 13.31 O ATOM 621 CB TYR A 78 17.502 41.844 38.841 1.00 11.37 C ATOM 622 CG TYR A 78 16.055 41.478 39.041 1.00 13.61 C ATOM 623 CD1 TYR A 78 15.052 42.438 38.917 1.00 11.56 C ATOM 624 CD2 TYR A 78 15.688 40.179 39.378 1.00 10.08 C ATOM 625 CE1 TYR A 78 13.717 42.113 39.124 1.00 13.82 C ATOM 626 CE2 TYR A 78 14.362 39.845 39.587 1.00 13.45 C ATOM 627 CZ TYR A 78 13.380 40.812 39.458 1.00 16.37 C ATOM 628 OH TYR A 78 12.061 40.470 39.649 1.00 15.20 O ATOM 629 N MET A 79 19.174 43.752 36.857 1.00 12.71 N ATOM 630 CA MET A 79 20.323 44.633 36.799 1.00 13.56 C ATOM 631 C MET A 79 20.093 45.642 37.920 1.00 12.04 C ATOM 632 O MET A 79 19.008 46.210 38.020 1.00 14.41 O ATOM 633 CB MET A 79 20.427 45.361 35.456 1.00 11.12 C ATOM 634 
CG MET A 79 21.640 46.295 35.409 1.00 14.73 C ATOM 635 SD MET A 79 21.887 47.165 33.869 1.00 19.00 S ATOM 636 CE MET A 79 22.924 46.026 33.010 1.00 15.38 C ATOM 637 N ASP A 80 21.082 45.816 38.793 1.00 10.24 N ATOM 638 CA ASP A 80 20.959 46.760 39.892 1.00 10.68 C ATOM 639 C ASP A 80 21.429 48.162 39.503 1.00 13.19 C ATOM 640 O ASP A 80 21.760 48.416 38.338 1.00 16.65 O ATOM 641 CB ASP A 80 21.680 46.255 41.153 1.00 11.02 C ATOM 642 CG ASP A 80 23.199 46.088 40.978 1.00 13.93 C ATOM 643 OD1 ASP A 80 23.816 46.744 40.114 1.00 13.00 O ATOM 644 OD2 ASP A 80 23.790 45.297 41.751 1.00 15.93 O ATOM 645 N ASN A 81 21.484 49.057 40.483 1.00 16.40 N ATOM 646 CA ASN A 81 21.905 50.434 40.246 1.00 18.10 C ATOM 647 C ASN A 81 23.389 50.603 39.909 1.00 16.47 C ATOM 648 O ASN A 81 23.788 51.644 39.391 1.00 17.21 O ATOM 649 CB ASN A 81 21.526 51.305 41.435 1.00 13.98 C ATOM 650 CG ASN A 81 21.991 50.723 42.724 1.00 14.03 C ATOM 651 OD1 ASN A 81 21.436 49.738 43.206 1.00 13.53 O ATOM 652 ND2 ASN A 81 23.053 51.290 43.274 1.00 16.16 N ATOM 653 N ASN A 82 24.194 49.578 40.180 1.00 17.77 N ATOM 654 CA ASN A 82 25.628 49.607 39.870 1.00 16.10 C ATOM 655 C ASN A 82 25.939 48.889 38.549 1.00 16.16 C ATOM 656 O ASN A 82 27.103 48.668 38.224 1.00 19.80 O ATOM 657 CB ASN A 82 26.429 48.935 40.984 1.00 18.78 C ATOM 658 CG ASN A 82 26.283 49.633 42.310 1.00 22.33 C ATOM 659 OD1 ASN A 82 26.200 48.993 43.358 1.00 23.21 O ATOM 660 ND2 ASN A 82 26.255 50.954 42.277 1.00 27.16 N ATOM 661 N SER A 83 24.894 48.509 37.814 1.00 17.04 N ATOM 662 CA SER A 83 25.002 47.789 36.536 1.00 17.78 C ATOM 663 C SER A 83 25.355 46.311 36.666 1.00 17.32 C ATOM 664 O SER A 83 25.639 45.654 35.662 1.00 21.76 O ATOM 665 CB SER A 83 25.990 48.457 35.570 1.00 18.73 C ATOM 666 OG SER A 83 25.409 49.585 34.952 1.00 31.03 O ATOM 667 N ARG A 84 25.345 45.789 37.889 1.00 16.34 N ATOM 668 CA ARG A 84 25.653 44.383 38.109 1.00 15.27 C ATOM 669 C ARG A 84 24.418 43.542 37.833 1.00 14.93 C ATOM 670 O ARG A 84 23.296 44.020 
37.982 1.00 12.68 O ATOM 671 CB ARG A 84 26.122 44.164 39.535 1.00 17.72 C ATOM 672 CG ARG A 84 27.335 44.970 39.878 1.00 25.25 C ATOM 673 CD ARG A 84 27.646 44.857 41.347 1.00 34.69 C ATOM 674 NE ARG A 84 28.765 45.714 41.732 1.00 42.42 N ATOM 675 CZ ARG A 84 28.833 46.373 42.884 1.00 45.31 C ATOM 676 NH1 ARG A 84 27.842 46.273 43.764 1.00 47.22 N ATOM 677 NH2 ARG A 84 29.890 47.130 43.156 1.00 47.36 N ATOM 678 N LEU A 85 24.636 42.297 37.416 1.00 14.33 N ATOM 679 CA LEU A 85 23.558 41.366 37.100 1.00 17.03 C ATOM 680 C LEU A 85 23.364 40.325 38.182 1.00 15.42 C ATOM 681 O LEU A 85 24.326 39.823 38.760 1.00 15.71 O ATOM 682 CB LEU A 85 23.819 40.668 35.765 1.00 20.80 C ATOM 683 CG LEU A 85 23.215 41.272 34.496 1.00 24.06 C ATOM 684 CD1 LEU A 85 23.410 42.767 34.468 1.00 25.84 C ATOM 685 CD2 LEU A 85 23.847 40.627 33.280 1.00 31.93 C ATOM 686 N TRP A 86 22.101 40.032 38.465 1.00 14.08 N ATOM 687 CA TRP A 86 21.722 39.051 39.471 1.00 10.66 C ATOM 688 C TRP A 86 20.869 38.005 38.776 1.00 12.64 C ATOM 689 O TRP A 86 19.828 38.327 38.217 1.00 13.85 O ATOM 690 CB TRP A 86 20.908 39.703 40.588 1.00 7.24 C ATOM 691 CG TRP A 86 21.665 40.716 41.382 1.00 9.74 C ATOM 692 CD1 TRP A 86 22.097 41.940 40.948 1.00 10.12 C ATOM 693 CD2 TRP A 86 22.058 40.616 42.757 1.00 8.32 C ATOM 694 NE1 TRP A 86 22.731 42.604 41.966 1.00 12.02 N ATOM 695 CE2 TRP A 86 22.723 41.819 43.089 1.00 8.47 C ATOM 696 CE3 TRP A 86 21.910 39.630 43.744 1.00 9.41 C ATOM 697 CZ2 TRP A 86 23.244 42.066 44.367 1.00 11.95 C ATOM 698 CZ3 TRP A 86 22.422 39.875 45.016 1.00 8.69 C ATOM 699 CH2 TRP A 86 23.082 41.083 45.315 1.00 15.01 C ATOM 700 N GLN A 87 21.326 36.761 38.792 1.00 11.88 N ATOM 701 CA GLN A 87 20.610 35.667 38.151 1.00 11.39 C ATOM 702 C GLN A 87 19.438 35.195 39.000 1.00 12.00 C ATOM 703 O GLN A 87 19.589 35.006 40.201 1.00 11.24 O ATOM 704 CB GLN A 87 21.570 34.485 37.940 1.00 14.49 C ATOM 705 CG GLN A 87 20.954 33.287 37.231 1.00 18.40 C ATOM 706 CD GLN A 87 20.598 33.589 35.784 1.00 23.31 C 
ATOM 707 OE1 GLN A 87 21.462 33.972 34.990 1.00 27.67 O ATOM 708 NE2 GLN A 87 19.325 33.418 35.432 1.00 26.80 N ATOM 709 N ILE A 88 18.263 35.037 38.399 1.00 11.31 N ATOM 710 CA ILE A 88 17.126 34.520 39.155 1.00 12.62 C ATOM 711 C ILE A 88 17.322 33.004 39.123 1.00 11.88 C ATOM 712 O ILE A 88 17.188 32.386 38.070 1.00 14.98 O ATOM 713 CB ILE A 88 15.767 34.870 38.502 1.00 15.22 C ATOM 714 CG1 ILE A 88 15.512 36.380 38.547 1.00 17.20 C ATOM 715 CG2 ILE A 88 14.629 34.147 39.226 1.00 12.57 C ATOM 716 CD1 ILE A 88 14.292 36.818 37.728 1.00 13.83 C ATOM 717 N GLN A 89 17.731 32.421 40.247 1.00 15.00 N ATOM 718 CA GLN A 89 17.951 30.974 40.311 1.00 16.39 C ATOM 719 C GLN A 89 16.646 30.218 40.495 1.00 16.88 C ATOM 720 O GLN A 89 16.505 29.078 40.045 1.00 19.34 O ATOM 721 CB GLN A 89 18.974 30.628 41.379 1.00 12.14 C ATOM 722 CG GLN A 89 20.292 31.319 41.124 1.00 13.54 C ATOM 723 CD GLN A 89 21.419 30.665 41.868 1.00 9.67 C ATOM 724 OE1 GLN A 89 21.586 30.871 43.061 1.00 12.98 O ATOM 725 NE2 GLN A 89 22.198 29.858 41.165 1.00 13.21 N ATOM 726 N HIS A 90 15.710 30.839 41.205 1.00 15.39 N ATOM 727 CA HIS A 90 14.376 30.269 41.348 1.00 12.39 C ATOM 728 C HIS A 90 13.342 31.307 41.775 1.00 11.51 C ATOM 729 O HIS A 90 13.672 32.323 42.393 1.00 9.32 O ATOM 730 CB HIS A 90 14.344 28.958 42.166 1.00 11.57 C ATOM 731 CG HIS A 90 14.323 29.134 43.652 1.00 11.82 C ATOM 732 ND1 HIS A 90 13.547 30.079 44.288 1.00 15.98 N ATOM 733 CD2 HIS A 90 14.933 28.431 44.632 1.00 5.68 C ATOM 734 CE1 HIS A 90 13.681 29.950 45.594 1.00 5.12 C ATOM 735 NE2 HIS A 90 14.518 28.958 45.829 1.00 9.37 N ATOM 736 N HIS A 91 12.115 31.088 41.323 1.00 12.42 N ATOM 737 CA HIS A 91 10.985 31.956 41.612 1.00 13.33 C ATOM 738 C HIS A 91 9.830 31.011 41.883 1.00 14.22 C ATOM 739 O HIS A 91 9.227 30.461 40.965 1.00 14.55 O ATOM 740 CB HIS A 91 10.677 32.837 40.398 1.00 14.49 C ATOM 741 CG HIS A 91 9.508 33.757 40.588 1.00 11.65 C ATOM 742 ND1 HIS A 91 8.830 34.320 39.529 1.00 10.13 N ATOM 743 CD2 HIS A 91 8.919 
34.236 41.709 1.00 12.17 C ATOM 744 CE1 HIS A 91 7.875 35.108 39.990 1.00 11.37 C ATOM 745 NE2 HIS A 91 7.908 35.075 41.309 1.00 11.03 N ATOM 746 N LEU A 92 9.536 30.818 43.160 1.00 14.20 N ATOM 747 CA LEU A 92 8.483 29.908 43.549 1.00 16.13 C ATOM 748 C LEU A 92 7.706 30.337 44.773 1.00 15.61 C ATOM 749 O LEU A 92 7.994 31.364 45.392 1.00 13.18 O ATOM 750 CB LEU A 92 9.052 28.491 43.741 1.00 21.91 C ATOM 751 CG LEU A 92 10.522 28.319 44.137 1.00 19.48 C ATOM 752 CD1 LEU A 92 10.697 28.548 45.601 1.00 21.10 C ATOM 753 CD2 LEU A 92 11.003 26.940 43.775 1.00 24.03 C ATOM 754 N MET A 93 6.655 29.576 45.046 1.00 15.85 N ATOM 755 CA MET A 93 5.782 29.789 46.182 1.00 20.41 C ATOM 756 C MET A 93 6.280 28.825 47.251 1.00 22.02 C ATOM 757 O MET A 93 6.350 27.612 47.013 1.00 19.65 O ATOM 758 CB MET A 93 4.354 29.422 45.787 1.00 26.24 C ATOM 759 CG MET A 93 3.275 30.079 46.628 1.00 34.92 C ATOM 760 SD MET A 93 3.241 31.869 46.373 1.00 41.75 S ATOM 761 CE MET A 93 3.437 31.974 44.624 1.00 34.79 C ATOM 762 N VAL A 94 6.703 29.364 48.390 1.00 22.33 N ATOM 763 CA VAL A 94 7.194 28.536 49.487 1.00 24.39 C ATOM 764 C VAL A 94 6.368 28.825 50.730 1.00 26.64 C ATOM 765 O VAL A 94 6.375 29.938 51.244 1.00 26.50 O ATOM 766 CB VAL A 94 8.668 28.806 49.799 1.00 25.47 C ATOM 767 CG1 VAL A 94 9.120 27.943 50.956 1.00 26.45 C ATOM 768 CG2 VAL A 94 9.513 28.518 48.600 1.00 25.45 C ATOM 769 N ARG A 95 5.656 27.812 51.206 1.00 28.59 N ATOM 770 CA ARG A 95 4.801 27.950 52.377 1.00 30.86 C ATOM 771 C ARG A 95 3.795 29.092 52.197 1.00 28.49 C ATOM 772 O ARG A 95 3.442 29.785 53.150 1.00 31.14 O ATOM 773 CB ARG A 95 5.643 28.127 53.648 1.00 35.29 C ATOM 774 CG ARG A 95 6.193 26.814 54.231 1.00 43.27 C ATOM 775 CD ARG A 95 7.128 26.076 53.271 1.00 50.53 C ATOM 776 NE ARG A 95 7.570 24.779 53.793 1.00 54.79 N ATOM 777 CZ ARG A 95 8.841 24.378 53.845 1.00 56.28 C ATOM 778 NH1 ARG A 95 9.818 25.167 53.407 1.00 52.45 N ATOM 779 NH2 ARG A 95 9.140 23.184 54.347 1.00 59.62 N ATOM 780 N GLY A 96 3.352 
29.283 50.955 1.00 26.49 N ATOM 781 CA GLY A 96 2.376 30.317 50.646 1.00 24.62 C ATOM 782 C GLY A 96 2.914 31.701 50.337 1.00 24.27 C ATOM 783 O GLY A 96 2.137 32.605 50.018 1.00 26.98 O ATOM 784 N VAL A 97 4.233 31.864 50.405 1.00 20.29 N ATOM 785 CA VAL A 97 4.881 33.147 50.136 1.00 18.13 C ATOM 786 C VAL A 97 5.712 33.086 48.863 1.00 16.61 C ATOM 787 O VAL A 97 6.472 32.136 48.653 1.00 18.26 O ATOM 788 CB VAL A 97 5.802 33.548 51.310 1.00 21.27 C ATOM 789 CG1 VAL A 97 6.535 34.843 51.013 1.00 21.18 C ATOM 790 CG2 VAL A 97 4.988 33.696 52.572 1.00 24.93 C ATOM 791 N GLN A 98 5.568 34.104 48.019 1.00 11.13 N ATOM 792 CA GLN A 98 6.322 34.187 46.770 1.00 9.24 C ATOM 793 C GLN A 98 7.790 34.470 47.113 1.00 11.25 C ATOM 794 O GLN A 98 8.095 35.368 47.912 1.00 9.57 O ATOM 795 CB GLN A 98 5.759 35.290 45.876 1.00 11.50 C ATOM 796 CG GLN A 98 6.286 35.263 44.460 1.00 20.05 C ATOM 797 CD GLN A 98 5.622 36.297 43.574 1.00 21.93 C ATOM 798 OE1 GLN A 98 6.127 36.628 42.500 1.00 24.81 O ATOM 799 NE2 GLN A 98 4.479 36.814 44.020 1.00 24.35 N ATOM 800 N GLU A 99 8.692 33.711 46.498 1.00 12.43 N ATOM 801 CA GLU A 99 10.110 33.859 46.782 1.00 11.35 C ATOM 802 C GLU A 99 11.027 33.912 45.563 1.00 10.20 C ATOM 803 O GLU A 99 10.893 33.119 44.635 1.00 9.87 O ATOM 804 CB GLU A 99 10.563 32.723 47.708 1.00 9.36 C ATOM 805 CG GLU A 99 12.063 32.672 47.920 1.00 14.16 C ATOM 806 CD GLU A 99 12.491 31.512 48.781 1.00 13.26 C ATOM 807 OE1 GLU A 99 12.259 31.552 50.003 1.00 18.35 O ATOM 808 OE2 GLU A 99 13.081 30.565 48.244 1.00 13.98 O ATOM 809 N LEU A 100 11.960 34.861 45.581 1.00 10.42 N ATOM 810 CA LEU A 100 12.947 34.997 44.515 1.00 8.12 C ATOM 811 C LEU A 100 14.276 34.636 45.128 1.00 7.89 C ATOM 812 O LEU A 100 14.535 34.988 46.278 1.00 8.59 O ATOM 813 CB LEU A 100 13.038 36.438 44.014 1.00 10.38 C ATOM 814 CG LEU A 100 11.968 37.005 43.083 1.00 8.57 C ATOM 815 CD1 LEU A 100 12.180 38.503 42.937 1.00 12.70 C ATOM 816 CD2 LEU A 100 12.034 36.317 41.734 1.00 11.56 C ATOM 817 N LEU A 
101 15.069 33.849 44.405 1.00 8.76 N ATOM 818 CA LEU A 101 16.406 33.497 44.861 1.00 10.93 C ATOM 819 C LEU A 101 17.342 34.076 43.806 1.00 12.06 C ATOM 820 O LEU A 101 17.403 33.584 42.671 1.00 13.11 O ATOM 821 CB LEU A 101 16.609 31.982 44.998 1.00 9.65 C ATOM 822 CG LEU A 101 18.001 31.673 45.574 1.00 11.27 C ATOM 823 CD1 LEU A 101 18.099 32.208 46.995 1.00 11.25 C ATOM 824 CD2 LEU A 101 18.299 30.176 45.524 1.00 12.23 C ATOM 825 N LEU A 102 18.030 35.153 44.188 1.00 12.14 N ATOM 826 CA LEU A 102 18.950 35.881 43.312 1.00 12.70 C ATOM 827 C LEU A 102 20.417 35.604 43.596 1.00 11.45 C ATOM 828 O LEU A 102 20.832 35.526 44.747 1.00 14.27 O ATOM 829 CB LEU A 102 18.713 37.388 43.454 1.00 9.65 C ATOM 830 CG LEU A 102 17.294 37.898 43.228 1.00 12.23 C ATOM 831 CD1 LEU A 102 17.230 39.413 43.442 1.00 11.94 C ATOM 832 CD2 LEU A 102 16.856 37.527 41.826 1.00 11.88 C ATOM 833 N LYS A 103 21.210 35.532 42.536 1.00 10.60 N ATOM 834 CA LYS A 103 22.630 35.284 42.677 1.00 10.78 C ATOM 835 C LYS A 103 23.414 36.371 41.960 1.00 8.25 C ATOM 836 O LYS A 103 23.250 36.566 40.756 1.00 9.80 O ATOM 837 CB LYS A 103 22.990 33.913 42.093 1.00 14.08 C ATOM 838 CG LYS A 103 24.456 33.517 42.291 1.00 12.97 C ATOM 839 CD LYS A 103 24.750 32.164 41.689 1.00 18.27 C ATOM 840 CE LYS A 103 26.194 31.801 41.902 1.00 19.25 C ATOM 841 NZ LYS A 103 26.495 30.426 41.441 1.00 26.24 N ATOM 842 N LEU A 104 24.260 37.077 42.703 1.00 8.70 N ATOM 843 CA LEU A 104 25.077 38.140 42.119 1.00 10.28 C ATOM 844 C LEU A 104 26.169 37.528 41.256 1.00 9.18 C ATOM 845 O LEU A 104 26.997 36.763 41.743 1.00 11.41 O ATOM 846 CB LEU A 104 25.698 39.012 43.212 1.00 10.80 C ATOM 847 CG LEU A 104 26.633 40.102 42.701 1.00 10.58 C ATOM 848 CD1 LEU A 104 25.905 41.017 41.723 1.00 10.79 C ATOM 849 CD2 LEU A 104 27.181 40.879 43.872 1.00 11.82 C ATOM 850 N LEU A 105 26.133 37.843 39.969 1.00 6.87 N ATOM 851 CA LEU A 105 27.095 37.332 39.008 1.00 9.92 C ATOM 852 C LEU A 105 28.296 38.268 38.875 1.00 15.63 C ATOM 853 O LEU A 
105 28.194 39.464 39.163 1.00 14.69 O ATOM 854 CB LEU A 105 26.414 37.171 37.649 1.00 10.09 C ATOM 855 CG LEU A 105 25.218 36.216 37.594 1.00 8.13 C ATOM 856 CD1 LEU A 105 24.583 36.276 36.208 1.00 12.90 C ATOM 857 CD2 LEU A 105 25.660 34.808 37.910 1.00 12.11 C ATOM 858 N PRO A 106 29.457 37.731 38.457 1.00 19.64 N ATOM 859 CA PRO A 106 30.660 38.558 38.297 1.00 22.67 C ATOM 860 C PRO A 106 30.432 39.539 37.164 1.00 25.96 C ATOM 861 O PRO A 106 29.753 39.206 36.190 1.00 29.22 O ATOM 862 CB PRO A 106 31.742 37.532 37.956 1.00 20.17 C ATOM 863 CG PRO A 106 30.980 36.424 37.296 1.00 18.51 C ATOM 864 CD PRO A 106 29.735 36.320 38.137 1.00 19.87 C ATOM 865 N ASP A 107 30.943 40.758 37.296 1.00 30.85 N ATOM 866 CA ASP A 107 30.739 41.746 36.238 1.00 37.83 C ATOM 867 C ASP A 107 31.469 41.340 34.971 1.00 41.65 C ATOM 868 O ASP A 107 32.653 41.016 35.014 1.00 40.17 O ATOM 869 CB ASP A 107 31.196 43.140 36.671 1.00 43.05 C ATOM 870 CG ASP A 107 30.820 44.219 35.654 1.00 51.89 C ATOM 871 OD1 ASP A 107 31.520 44.348 34.618 1.00 53.03 O ATOM 872 OD2 ASP A 107 29.815 44.933 35.892 1.00 54.46 O ATOM 873 N ASP A 108 30.750 41.325 33.853 1.00 46.19 N ATOM 874 CA ASP A 108 31.352 40.962 32.577 1.00 50.93 C ATOM 875 C ASP A 108 31.279 42.138 31.608 1.00 51.95 C ATOM 876 O ASP A 108 31.867 43.180 31.964 1.00 53.18 O ATOM 877 CB ASP A 108 30.661 39.731 31.977 1.00 51.88 C ATOM 878 CG ASP A 108 31.404 39.166 30.761 1.00 52.53 C ATOM 879 OD1 ASP A 108 32.547 39.602 30.477 1.00 50.50 O ATOM 880 OD2 ASP A 108 30.842 38.271 30.092 1.00 55.16 O ATOM 881 OXT ASP A 108 30.861 41.990 30.465 1.00 53.01 O TER 882 ASP A 108 HETATM 883 O HOH A 201 10.790 54.362 49.155 1.00 40.18 O HETATM 884 O HOH A 202 2.542 44.383 50.633 1.00 23.32 O HETATM 885 O HOH A 203 25.202 45.116 45.802 1.00 43.48 O HETATM 886 O HOH A 204 13.390 60.000 54.870 1.00 43.75 O HETATM 887 O HOH A 205 27.374 41.457 37.460 1.00 17.49 O HETATM 888 O HOH A 206 27.812 34.370 40.842 1.00 50.30 O HETATM 889 O HOH A 207 3.611 36.208 
48.759 1.00 23.32 O HETATM 890 O HOH A 208 23.300 44.010 53.130 1.00 43.87 O HETATM 891 O HOH A 209 13.307 31.217 55.633 1.00 42.65 O HETATM 892 O HOH A 210 10.639 51.975 47.659 1.00 46.65 O HETATM 893 O HOH A 211 2.400 38.324 41.933 1.00 42.89 O HETATM 894 O HOH A 212 18.750 47.014 51.139 1.00 20.60 O HETATM 895 O HOH A 213 29.420 41.990 40.390 1.00 35.57 O HETATM 896 O HOH A 214 19.260 32.730 32.410 1.00 47.68 O HETATM 897 O HOH A 215 15.685 48.995 59.961 1.00 46.82 O HETATM 898 O HOH A 216 -1.220 40.340 55.790 1.00 58.68 O HETATM 899 O HOH A 217 7.303 41.722 35.036 1.00 44.43 O HETATM 900 O HOH A 218 28.604 49.559 48.327 1.00 56.72 O HETATM 901 O HOH A 219 23.770 46.990 57.770 1.00 39.86 O HETATM 902 O HOH A 220 2.957 37.206 52.169 1.00 51.92 O HETATM 903 O HOH A 221 30.110 49.910 43.900 1.00 57.81 O HETATM 904 O HOH A 222 29.812 49.633 38.182 1.00 45.35 O HETATM 905 O HOH A 223 28.242 50.610 57.743 1.00 50.83 O HETATM 906 O HOH A 224 7.227 31.975 36.634 1.00 29.20 O MASTER 265 0 0 1 9 0 0 6 905 1 0 9 END PyCogent-1.5.3/tests/data/1LJO.pdb000644 000765 000024 00000243331 11305747275 017530 0ustar00jrideoutstaff000000 000000 HEADER UNKNOWN FUNCTION 22-APR-02 1LJO TITLE CRYSTAL STRUCTURE OF AN SM-LIKE PROTEIN (AF-SM2) FROM TITLE 2 ARCHAEOGLOBUS FULGIDUS AT 1.95A RESOLUTION COMPND MOL_ID: 1; COMPND 2 MOLECULE: ARCHAEAL SM-LIKE PROTEIN AF-SM2; COMPND 3 CHAIN: A; COMPND 4 ENGINEERED: YES; COMPND 5 MUTATION: YES SOURCE MOL_ID: 1; SOURCE 2 ORGANISM_SCIENTIFIC: ARCHAEOGLOBUS FULGIDUS; SOURCE 3 ORGANISM_TAXID: 2234; SOURCE 4 GENE: AF0362; SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI BL21(DE3); SOURCE 6 EXPRESSION_SYSTEM_TAXID: 469008; SOURCE 7 EXPRESSION_SYSTEM_STRAIN: BL21(DE3); SOURCE 8 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID; SOURCE 9 EXPRESSION_SYSTEM_PLASMID: MODIFIED PET24D KEYWDS SNRNP, SM, CORE SNRNP DOMAIN, RNA BINDING PROTEIN, UNKNOWN KEYWDS 2 FUNCTION EXPDTA X-RAY DIFFRACTION AUTHOR I.TORO,J.BASQUIN,H.TEO-DREHER,D.SUCK REVDAT 3 24-FEB-09 1LJO 1 VERSN REVDAT 2 
12-AUG-03 1LJO 1 DBREF SEQADV REVDAT 1 03-JUL-02 1LJO 0 JRNL AUTH I.TORO,J.BASQUIN,H.TEO-DREHER,D.SUCK JRNL TITL ARCHAEAL SM PROTEINS FORM HEPTAMERIC AND HEXAMERIC JRNL TITL 2 COMPLEXES: CRYSTAL STRUCTURES OF THE SM1 AND SM2 JRNL TITL 3 PROTEINS FROM THE HYPERTHERMOPHILE ARCHAEOGLOBUS JRNL TITL 4 FULGIDUS. JRNL REF J.MOL.BIOL. V. 320 129 2002 JRNL REFN ISSN 0022-2836 JRNL PMID 12079339 JRNL DOI 10.1016/S0022-2836(02)00406-0 REMARK 1 REMARK 1 REFERENCE 1 REMARK 1 AUTH I.TORO,S.THORE,C.MAYER,J.BASQUIN,B.SRAPHIN,D.SUCK REMARK 1 TITL RNA BINDING IN AN SM CORE DOMAIN: X-RAY STRUCTURE REMARK 1 TITL 2 AND FUNCTIONAL ANALYSIS OF AN ARCHAEAL SM PROTEIN REMARK 1 TITL 3 COMPLEX. REMARK 1 REF EMBO J. V. 20 2293 2001 REMARK 1 REFN ISSN 0261-4189 REMARK 1 DOI 10.1093/EMBOJ/20.9.2293 REMARK 2 REMARK 2 RESOLUTION. 1.95 ANGSTROMS. REMARK 3 REMARK 3 REFINEMENT. REMARK 3 PROGRAM : CNS REMARK 3 AUTHORS : BRUNGER,ADAMS,CLORE,DELANO,GROS,GROSSE- REMARK 3 : KUNSTLEVE,JIANG,KUSZEWSKI,NILGES, PANNU, REMARK 3 : READ,RICE,SIMONSON,WARREN REMARK 3 REMARK 3 REFINEMENT TARGET : ENGH & HUBER REMARK 3 REMARK 3 DATA USED IN REFINEMENT. REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.95 REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 27.10 REMARK 3 DATA CUTOFF (SIGMA(F)) : 0.000 REMARK 3 DATA CUTOFF HIGH (ABS(F)) : NULL REMARK 3 DATA CUTOFF LOW (ABS(F)) : NULL REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 96.6 REMARK 3 NUMBER OF REFLECTIONS : 4521 REMARK 3 REMARK 3 FIT TO DATA USED IN REFINEMENT. REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM REMARK 3 R VALUE (WORKING SET) : 0.194 REMARK 3 FREE R VALUE : 0.211 REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.000 REMARK 3 FREE R VALUE TEST SET COUNT : 226 REMARK 3 ESTIMATED ERROR OF FREE R VALUE : 0.014 REMARK 3 REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN. 
REMARK 3 TOTAL NUMBER OF BINS USED : 6 REMARK 3 BIN RESOLUTION RANGE HIGH (A) : 1.95 REMARK 3 BIN RESOLUTION RANGE LOW (A) : 2.07 REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : 91.50 REMARK 3 REFLECTIONS IN BIN (WORKING SET) : 670 REMARK 3 BIN R VALUE (WORKING SET) : 0.1990 REMARK 3 BIN FREE R VALUE : 0.2940 REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : 5.40 REMARK 3 BIN FREE R VALUE TEST SET COUNT : 38 REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : 0.048 REMARK 3 REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT. REMARK 3 PROTEIN ATOMS : 578 REMARK 3 NUCLEIC ACID ATOMS : 0 REMARK 3 HETEROGEN ATOMS : 1 REMARK 3 SOLVENT ATOMS : 45 REMARK 3 REMARK 3 B VALUES. REMARK 3 FROM WILSON PLOT (A**2) : 18.57 REMARK 3 MEAN B VALUE (OVERALL, A**2) : 25.00 REMARK 3 OVERALL ANISOTROPIC B VALUE. REMARK 3 B11 (A**2) : -5.74000 REMARK 3 B22 (A**2) : -5.74000 REMARK 3 B33 (A**2) : 11.48000 REMARK 3 B12 (A**2) : -1.46000 REMARK 3 B13 (A**2) : 0.00000 REMARK 3 B23 (A**2) : 0.00000 REMARK 3 REMARK 3 ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM LUZZATI PLOT (A) : 0.20 REMARK 3 ESD FROM SIGMAA (A) : 0.11 REMARK 3 LOW RESOLUTION CUTOFF (A) : 5.00 REMARK 3 REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM C-V LUZZATI PLOT (A) : 0.24 REMARK 3 ESD FROM C-V SIGMAA (A) : 0.24 REMARK 3 REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES. REMARK 3 BOND LENGTHS (A) : 0.005 REMARK 3 BOND ANGLES (DEGREES) : 1.32 REMARK 3 DIHEDRAL ANGLES (DEGREES) : 25.48 REMARK 3 IMPROPER ANGLES (DEGREES) : 0.73 REMARK 3 REMARK 3 ISOTROPIC THERMAL MODEL : ISOTROPIC REMARK 3 REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. RMS SIGMA REMARK 3 MAIN-CHAIN BOND (A**2) : 2.255 ; NULL REMARK 3 MAIN-CHAIN ANGLE (A**2) : 3.363 ; NULL REMARK 3 SIDE-CHAIN BOND (A**2) : 3.077 ; NULL REMARK 3 SIDE-CHAIN ANGLE (A**2) : 4.729 ; NULL REMARK 3 REMARK 3 BULK SOLVENT MODELING. 
REMARK 3 METHOD USED : FLAT MODEL REMARK 3 KSOL : 0.38 REMARK 3 BSOL : 56.13 REMARK 3 REMARK 3 NCS MODEL : NULL REMARK 3 REMARK 3 NCS RESTRAINTS. RMS SIGMA/WEIGHT REMARK 3 GROUP 1 POSITIONAL (A) : NULL ; NULL REMARK 3 GROUP 1 B-FACTOR (A**2) : NULL ; NULL REMARK 3 REMARK 3 PARAMETER FILE 1 : PROTEIN_REP.PARAM REMARK 3 PARAMETER FILE 2 : WATER_REP.PARAM REMARK 3 PARAMETER FILE 3 : ION.PARAM REMARK 3 PARAMETER FILE 4 : ACY.PAR REMARK 3 PARAMETER FILE 5 : NULL REMARK 3 TOPOLOGY FILE 1 : PROTEIN.TOP REMARK 3 TOPOLOGY FILE 2 : WATER.TOP REMARK 3 TOPOLOGY FILE 3 : ION.TOP REMARK 3 TOPOLOGY FILE 4 : ACY.TOP REMARK 3 TOPOLOGY FILE 5 : NULL REMARK 3 REMARK 3 OTHER REFINEMENT REMARKS: MAXIMUM LIKELIHOOD TARGET REMARK 4 REMARK 4 1LJO COMPLIES WITH FORMAT V. 3.15, 01-DEC-08 REMARK 100 REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY RCSB ON 24-APR-02. REMARK 100 THE RCSB ID CODE IS RCSB016002. REMARK 200 REMARK 200 EXPERIMENTAL DETAILS REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION REMARK 200 DATE OF DATA COLLECTION : 29-APR-99; 20-APR-99 REMARK 200 TEMPERATURE (KELVIN) : 100; 100 REMARK 200 PH : 3.6 REMARK 200 NUMBER OF CRYSTALS USED : 1 REMARK 200 REMARK 200 SYNCHROTRON (Y/N) : Y; N REMARK 200 RADIATION SOURCE : ESRF; ROTATING ANODE REMARK 200 BEAMLINE : BM14; NULL REMARK 200 X-RAY GENERATOR MODEL : NULL; ELLIOTT GX-21 REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M; M REMARK 200 WAVELENGTH OR RANGE (A) : 1; 1.5418 REMARK 200 MONOCHROMATOR : SI 111; NI MIRROR + NI FILTER REMARK 200 OPTICS : NULL; NULL REMARK 200 REMARK 200 DETECTOR TYPE : CCD; IMAGE PLATE REMARK 200 DETECTOR MANUFACTURER : MARRESEARCH; MARRESEARCH REMARK 200 INTENSITY-INTEGRATION SOFTWARE : MAR REMARK 200 DATA SCALING SOFTWARE : XDS REMARK 200 REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 4522 REMARK 200 RESOLUTION RANGE HIGH (A) : 1.950 REMARK 200 RESOLUTION RANGE LOW (A) : 27.100 REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 0.000 REMARK 200 REMARK 200 OVERALL. 
REMARK 200 COMPLETENESS FOR RANGE (%) : 97.6 REMARK 200 DATA REDUNDANCY : 2.700 REMARK 200 R MERGE (I) : 0.05100 REMARK 200 R SYM (I) : 0.05100 REMARK 200 FOR THE DATA SET : 10.5000 REMARK 200 REMARK 200 IN THE HIGHEST RESOLUTION SHELL. REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 1.95 REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : 2.06 REMARK 200 COMPLETENESS FOR SHELL (%) : 90.9 REMARK 200 DATA REDUNDANCY IN SHELL : 2.00 REMARK 200 R MERGE FOR SHELL (I) : 0.17300 REMARK 200 R SYM FOR SHELL (I) : 0.17300 REMARK 200 FOR SHELL : 4.200 REMARK 200 REMARK 200 DIFFRACTION PROTOCOL: SINGLE WAVELENGTH; SINGLE WAVELENGTH REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: SIRAS REMARK 200 SOFTWARE USED: SHARP REMARK 200 STARTING MODEL: BUILT BY ARP/WARP (68 RESIDUES) REMARK 200 REMARK 200 REMARK: NULL REMARK 280 REMARK 280 CRYSTAL REMARK 280 SOLVENT CONTENT, VS (%): 32.92 REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 1.83 REMARK 280 REMARK 280 CRYSTALLIZATION CONDITIONS: AMMONIUM SULPHATE, LITHIUM REMARK 280 SULPHATE, SODIUM SULPHATE, PH 3.6, VAPOR DIFFUSION, HANGING REMARK 280 DROP, TEMPERATURE 293K REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 6 REMARK 290 REMARK 290 SYMOP SYMMETRY REMARK 290 NNNMMM OPERATOR REMARK 290 1555 X,Y,Z REMARK 290 2555 -Y,X-Y,Z REMARK 290 3555 -X+Y,-X,Z REMARK 290 4555 -X,-Y,Z REMARK 290 5555 Y,-X+Y,Z REMARK 290 6555 X-Y,X,Z REMARK 290 REMARK 290 WHERE NNN -> OPERATOR NUMBER REMARK 290 MMM -> TRANSLATION VECTOR REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY REMARK 290 RELATED MOLECULES. 
REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000 REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 2 -0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 2 0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 3 -0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 3 -0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 3 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 4 -1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 4 0.000000 -1.000000 0.000000 0.00000 REMARK 290 SMTRY3 4 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 5 0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 5 -0.866025 0.500000 0.000000 0.00000 REMARK 290 SMTRY3 5 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 6 0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 6 0.866025 0.500000 0.000000 0.00000 REMARK 290 SMTRY3 6 0.000000 0.000000 1.000000 0.00000 REMARK 290 REMARK 290 REMARK: NULL REMARK 300 REMARK 300 BIOMOLECULE: 1 REMARK 300 SEE REMARK 350 FOR THE AUTHOR PROVIDED AND/OR PROGRAM REMARK 300 GENERATED ASSEMBLY INFORMATION FOR THE STRUCTURE IN REMARK 300 THIS ENTRY. THE REMARK MAY ALSO PROVIDE INFORMATION ON REMARK 300 BURIED SURFACE AREA. REMARK 300 REMARK: THE BIOLOGICAL ASSEMBLY IS A HEXAMER GENERATED FROM THE REMARK 300 MONOMER IN THE ASYMMETRIC UNIT BY THE OPERATIONS: -Y,X-Y,Z; Y-X, REMARK 300 -X,Z; -X,-Y,Z; Y,Y-X,Z AND X-Y,X,Z REMARK 350 REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN. 
REMARK 350 REMARK 350 BIOMOLECULE: 1 REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: HEXAMERIC REMARK 350 APPLY THE FOLLOWING TO CHAINS: A REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000 REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000 REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000 REMARK 350 BIOMT1 2 -0.500000 -0.866025 0.000000 0.00000 REMARK 350 BIOMT2 2 0.866025 -0.500000 0.000000 0.00000 REMARK 350 BIOMT3 2 0.000000 0.000000 1.000000 0.00000 REMARK 350 BIOMT1 3 -0.500000 0.866025 0.000000 0.00000 REMARK 350 BIOMT2 3 -0.866025 -0.500000 0.000000 0.00000 REMARK 350 BIOMT3 3 0.000000 0.000000 1.000000 0.00000 REMARK 350 BIOMT1 4 -1.000000 0.000000 0.000000 0.00000 REMARK 350 BIOMT2 4 0.000000 -1.000000 0.000000 0.00000 REMARK 350 BIOMT3 4 0.000000 0.000000 1.000000 0.00000 REMARK 350 BIOMT1 5 0.500000 0.866025 0.000000 0.00000 REMARK 350 BIOMT2 5 -0.866025 0.500000 0.000000 0.00000 REMARK 350 BIOMT3 5 0.000000 0.000000 1.000000 0.00000 REMARK 350 BIOMT1 6 0.500000 -0.866025 0.000000 0.00000 REMARK 350 BIOMT2 6 0.866025 0.500000 0.000000 0.00000 REMARK 350 BIOMT3 6 0.000000 0.000000 1.000000 0.00000 REMARK 375 REMARK 375 SPECIAL POSITION REMARK 375 THE FOLLOWING ATOMS ARE FOUND TO BE WITHIN 0.15 ANGSTROMS REMARK 375 OF A SYMMETRY RELATED ATOM AND ARE ASSUMED TO BE ON SPECIAL REMARK 375 POSITIONS. REMARK 375 REMARK 375 ATOM RES CSSEQI REMARK 375 CD CD A 78 LIES ON A SPECIAL POSITION. REMARK 375 HOH A 307 LIES ON A SPECIAL POSITION. REMARK 375 HOH A 309 LIES ON A SPECIAL POSITION. REMARK 375 HOH A 398 LIES ON A SPECIAL POSITION. REMARK 375 HOH A 442 LIES ON A SPECIAL POSITION. REMARK 375 HOH A 444 LIES ON A SPECIAL POSITION. REMARK 465 REMARK 465 MISSING RESIDUES REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) 
REMARK 465 REMARK 465 M RES C SSSEQI REMARK 465 GLU A 76 REMARK 465 GLU A 77 REMARK 525 REMARK 525 SOLVENT REMARK 525 REMARK 525 THE SOLVENT MOLECULES HAVE CHAIN IDENTIFIERS THAT REMARK 525 INDICATE THE POLYMER CHAIN WITH WHICH THEY ARE MOST REMARK 525 CLOSELY ASSOCIATED. THE REMARK LISTS ALL THE SOLVENT REMARK 525 MOLECULES WHICH ARE MORE THAN 5A AWAY FROM THE REMARK 525 NEAREST POLYMER CHAIN (M = MODEL NUMBER; REMARK 525 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE REMARK 525 NUMBER; I=INSERTION CODE): REMARK 525 REMARK 525 M RES CSSEQI REMARK 525 HOH A 444 DISTANCE = 6.77 ANGSTROMS REMARK 600 REMARK 600 HETEROGEN REMARK 600 CD ION SITTING ON SIX-FOLD AXIS IS MODELLING A REMARK 600 SULPHATE ION REMARK 800 REMARK 800 SITE REMARK 800 SITE_IDENTIFIER: AC1 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE ACY A 201 REMARK 800 SITE_IDENTIFIER: AC2 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE ACY A 202 REMARK 800 SITE_IDENTIFIER: AC3 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE ACY A 203 REMARK 900 REMARK 900 RELATED ENTRIES REMARK 900 RELATED ID: 1D3B RELATED DB: PDB REMARK 900 CRYSTAL STRUCTURE OF THE D3B SUBCOMPLEX OF THE HUMAN CORE REMARK 900 SNRNP DOMAIN AT 2.0A RESOLUTION REMARK 900 RELATED ID: 1B34 RELATED DB: PDB REMARK 900 CRYSTAL STRUCTURE OF THE D1D2 SUB-COMPLEX FROM THE HUMAN REMARK 900 SNRNP CORE DOMAIN REMARK 900 RELATED ID: 1I81 RELATED DB: PDB REMARK 900 CRYSTAL STRUCTURE OF A HEPTAMERIC LSM PROTEIN FROM REMARK 900 METHANOBACTERIUM THERMOAUTOTROPHICUM REMARK 900 RELATED ID: 1I8F RELATED DB: PDB REMARK 900 THE CRYSTAL STRUCTURE OF A HEPTAMERIC ARCHAEAL SM PROTEIN: REMARK 900 IMPLICATIONS FOR THE EUKARYOTIC SNRNP CORE REMARK 900 RELATED ID: 1I5L RELATED DB: PDB REMARK 900 CRYSTAL STRUCTURE OF AN SM-LIKE PROTEIN (AF-SM1) FROM REMARK 900 ARCHAEOGLOBUS FULGIDUS COMPLEXED WITH SHORT POLY-U RNA REMARK 900 RELATED ID: 1I4K 
RELATED DB: PDB REMARK 900 CRYSTAL STRUCTURE OF AN SM-LIKE PROTEIN (AF-SM1) FROM REMARK 900 ARCHAEOGLOBUS FULGIDUS AT 2.5A RESOLUTION REMARK 900 RELATED ID: 1JRI RELATED DB: PDB REMARK 900 THE CRYSTAL STRUCTURE OF AN SM-LIKE ARCHAEAL PROTEIN WITH REMARK 900 TWO HEPTAMERS IN THE ASYMMETRIC UNIT. DBREF 1LJO A 1 77 UNP O29885 O29885_ARCFU 1 75 SEQADV 1LJO GLY A 1 UNP O29885 CLONING ARTIFACT SEQADV 1LJO ALA A 2 UNP O29885 CLONING ARTIFACT SEQRES 1 A 77 GLY ALA MET VAL LEU PRO ASN GLN MET VAL LYS SER MET SEQRES 2 A 77 VAL GLY LYS ILE ILE ARG VAL GLU MET LYS GLY GLU GLU SEQRES 3 A 77 ASN GLN LEU VAL GLY LYS LEU GLU GLY VAL ASP ASP TYR SEQRES 4 A 77 MET ASN LEU TYR LEU THR ASN ALA MET GLU CYS LYS GLY SEQRES 5 A 77 GLU GLU LYS VAL ARG SER LEU GLY GLU ILE VAL LEU ARG SEQRES 6 A 77 GLY ASN ASN VAL VAL LEU ILE GLN PRO GLN GLU GLU HET CD A 78 1 HET ACY A 201 4 HET ACY A 202 4 HET ACY A 203 4 HETNAM CD CADMIUM ION HETNAM ACY ACETIC ACID FORMUL 2 CD CD 2+ FORMUL 3 ACY 3(C2 H4 O2) FORMUL 6 HOH *33(H2 O) HELIX 1 1 LEU A 5 MET A 13 1 9 HELIX 2 2 ARG A 65 ASN A 67 5 3 SHEET 1 A 5 GLU A 54 LEU A 64 0 SHEET 2 A 5 LEU A 42 LYS A 51 -1 N LEU A 42 O LEU A 64 SHEET 3 A 5 GLN A 28 VAL A 36 -1 N GLN A 28 O CYS A 50 SHEET 4 A 5 ILE A 17 MET A 22 -1 N VAL A 20 O LEU A 29 SHEET 5 A 5 VAL A 69 PRO A 74 -1 O VAL A 70 N GLU A 21 SITE 1 AC1 3 GLN A 28 ASP A 38 LYS A 51 SITE 1 AC2 2 LYS A 55 SER A 58 SITE 1 AC3 5 ILE A 17 ASN A 46 GLY A 60 GLU A 61 SITE 2 AC3 5 HOH A 385 CRYST1 58.420 58.420 32.084 90.00 90.00 120.00 P 6 6 ORIGX1 1.000000 0.000000 0.000000 0.00000 ORIGX2 0.000000 1.000000 0.000000 0.00000 ORIGX3 0.000000 0.000000 1.000000 0.00000 SCALE1 0.017117 0.009883 0.000000 0.00000 SCALE2 0.000000 0.019766 0.000000 0.00000 SCALE3 0.000000 0.000000 0.031168 0.00000 ATOM 1 N GLY A 1 22.149 18.199 41.790 1.00 43.67 N ATOM 2 CA GLY A 1 21.635 19.597 41.827 1.00 42.53 C ATOM 3 C GLY A 1 20.136 19.663 42.046 1.00 41.28 C ATOM 4 O GLY A 1 19.647 19.354 43.133 1.00 41.86 O ATOM 5 N ALA A 2 19.405 
20.067 41.012 1.00 38.71 N ATOM 6 CA ALA A 2 17.953 20.175 41.087 1.00 35.37 C ATOM 7 C ALA A 2 17.313 19.817 39.753 1.00 34.90 C ATOM 8 O ALA A 2 17.981 19.797 38.715 1.00 34.91 O ATOM 9 CB ALA A 2 17.554 21.588 41.489 1.00 33.87 C ATOM 10 N MET A 3 16.017 19.528 39.783 1.00 31.49 N ATOM 11 CA MET A 3 15.297 19.186 38.567 1.00 30.05 C ATOM 12 C MET A 3 13.809 19.434 38.748 1.00 28.55 C ATOM 13 O MET A 3 13.313 19.490 39.872 1.00 28.05 O ATOM 14 CB MET A 3 15.540 17.721 38.185 1.00 30.48 C ATOM 15 CG MET A 3 14.879 16.691 39.089 1.00 29.18 C ATOM 16 SD MET A 3 15.038 15.018 38.409 1.00 29.49 S ATOM 17 CE MET A 3 15.024 14.014 39.898 1.00 31.84 C ATOM 18 N VAL A 4 13.105 19.593 37.633 1.00 25.64 N ATOM 19 CA VAL A 4 11.667 19.833 37.661 1.00 23.81 C ATOM 20 C VAL A 4 10.933 18.550 37.289 1.00 21.83 C ATOM 21 O VAL A 4 11.119 18.016 36.195 1.00 23.25 O ATOM 22 CB VAL A 4 11.273 20.948 36.667 1.00 21.95 C ATOM 23 CG1 VAL A 4 9.778 21.227 36.753 1.00 23.10 C ATOM 24 CG2 VAL A 4 12.064 22.208 36.971 1.00 22.74 C ATOM 25 N LEU A 5 10.107 18.054 38.204 1.00 18.90 N ATOM 26 CA LEU A 5 9.355 16.829 37.958 1.00 20.29 C ATOM 27 C LEU A 5 8.330 17.012 36.843 1.00 16.15 C ATOM 28 O LEU A 5 7.695 18.058 36.732 1.00 17.31 O ATOM 29 CB LEU A 5 8.647 16.373 39.234 1.00 19.85 C ATOM 30 CG LEU A 5 9.564 15.895 40.366 1.00 24.33 C ATOM 31 CD1 LEU A 5 8.751 15.672 41.634 1.00 23.59 C ATOM 32 CD2 LEU A 5 10.274 14.612 39.943 1.00 24.78 C ATOM 33 N PRO A 6 8.176 15.996 35.982 1.00 15.96 N ATOM 34 CA PRO A 6 7.209 16.085 34.889 1.00 14.51 C ATOM 35 C PRO A 6 5.810 16.470 35.384 1.00 14.71 C ATOM 36 O PRO A 6 5.177 17.359 34.824 1.00 13.33 O ATOM 37 CB PRO A 6 7.265 14.691 34.279 1.00 14.95 C ATOM 38 CG PRO A 6 8.724 14.351 34.418 1.00 12.33 C ATOM 39 CD PRO A 6 9.005 14.782 35.851 1.00 15.51 C ATOM 40 N ASN A 7 5.331 15.809 36.436 1.00 19.33 N ATOM 41 CA ASN A 7 4.010 16.127 36.980 1.00 23.08 C ATOM 42 C ASN A 7 3.946 17.585 37.430 1.00 23.25 C ATOM 43 O ASN A 7 2.900 18.225 37.318 1.00 
22.91 O ATOM 44 CB ASN A 7 3.666 15.213 38.163 1.00 23.44 C ATOM 45 CG ASN A 7 3.219 13.828 37.726 1.00 25.02 C ATOM 46 OD1 ASN A 7 3.030 12.937 38.554 1.00 28.45 O ATOM 47 ND2 ASN A 7 3.040 13.644 36.425 1.00 22.14 N ATOM 48 N GLN A 8 5.058 18.105 37.946 1.00 23.39 N ATOM 49 CA GLN A 8 5.104 19.499 38.390 1.00 23.53 C ATOM 50 C GLN A 8 4.946 20.401 37.170 1.00 21.36 C ATOM 51 O GLN A 8 4.220 21.395 37.200 1.00 19.26 O ATOM 52 CB GLN A 8 6.445 19.828 39.069 1.00 28.13 C ATOM 53 CG GLN A 8 6.714 19.118 40.390 1.00 30.41 C ATOM 54 CD GLN A 8 7.978 19.629 41.082 1.00 32.59 C ATOM 55 OE1 GLN A 8 9.078 19.573 40.527 1.00 30.42 O ATOM 56 NE2 GLN A 8 7.820 20.125 42.304 1.00 34.48 N ATOM 57 N MET A 9 5.645 20.049 36.097 1.00 17.49 N ATOM 58 CA MET A 9 5.588 20.829 34.870 1.00 15.33 C ATOM 59 C MET A 9 4.172 20.833 34.292 1.00 14.63 C ATOM 60 O MET A 9 3.637 21.889 33.961 1.00 15.62 O ATOM 61 CB MET A 9 6.581 20.268 33.842 1.00 15.69 C ATOM 62 CG MET A 9 6.667 21.070 32.549 1.00 21.37 C ATOM 63 SD MET A 9 7.046 22.822 32.836 1.00 27.63 S ATOM 64 CE MET A 9 6.588 23.523 31.263 1.00 23.80 C ATOM 65 N VAL A 10 3.559 19.658 34.189 1.00 13.76 N ATOM 66 CA VAL A 10 2.210 19.569 33.635 1.00 15.31 C ATOM 67 C VAL A 10 1.199 20.364 34.454 1.00 16.49 C ATOM 68 O VAL A 10 0.324 21.023 33.893 1.00 13.31 O ATOM 69 CB VAL A 10 1.731 18.098 33.526 1.00 14.85 C ATOM 70 CG1 VAL A 10 0.304 18.049 32.982 1.00 16.89 C ATOM 71 CG2 VAL A 10 2.665 17.310 32.615 1.00 13.51 C ATOM 72 N LYS A 11 1.318 20.312 35.778 1.00 14.56 N ATOM 73 CA LYS A 11 0.385 21.047 36.618 1.00 18.34 C ATOM 74 C LYS A 11 0.537 22.551 36.405 1.00 16.58 C ATOM 75 O LYS A 11 -0.438 23.290 36.478 1.00 18.09 O ATOM 76 CB LYS A 11 0.587 20.695 38.099 1.00 18.98 C ATOM 77 CG LYS A 11 -0.494 21.271 39.011 1.00 26.61 C ATOM 78 CD LYS A 11 -0.424 20.707 40.429 1.00 27.09 C ATOM 79 CE LYS A 11 0.851 21.122 41.149 1.00 31.24 C ATOM 80 NZ LYS A 11 0.946 22.601 41.317 1.00 34.46 N ATOM 81 N SER A 12 1.756 23.001 36.123 1.00 16.69 N 
ATOM 82 CA SER A 12 1.996 24.420 35.892 1.00 16.02 C ATOM 83 C SER A 12 1.320 24.893 34.603 1.00 17.10 C ATOM 84 O SER A 12 1.155 26.089 34.392 1.00 14.71 O ATOM 85 CB SER A 12 3.500 24.708 35.813 1.00 19.31 C ATOM 86 OG SER A 12 4.058 24.248 34.595 1.00 21.19 O ATOM 87 N MET A 13 0.929 23.949 33.750 1.00 16.32 N ATOM 88 CA MET A 13 0.266 24.281 32.487 1.00 18.39 C ATOM 89 C MET A 13 -1.244 24.463 32.645 1.00 16.18 C ATOM 90 O MET A 13 -1.930 24.841 31.693 1.00 15.91 O ATOM 91 CB MET A 13 0.524 23.188 31.445 1.00 18.87 C ATOM 92 CG MET A 13 1.942 23.124 30.905 1.00 21.81 C ATOM 93 SD MET A 13 2.139 21.753 29.723 1.00 26.67 S ATOM 94 CE MET A 13 3.236 20.689 30.628 1.00 28.82 C ATOM 95 N VAL A 14 -1.764 24.188 33.838 1.00 17.68 N ATOM 96 CA VAL A 14 -3.196 24.328 34.075 1.00 16.93 C ATOM 97 C VAL A 14 -3.634 25.772 33.864 1.00 17.33 C ATOM 98 O VAL A 14 -2.969 26.707 34.309 1.00 15.19 O ATOM 99 CB VAL A 14 -3.579 23.890 35.502 1.00 17.94 C ATOM 100 CG1 VAL A 14 -5.011 24.305 35.808 1.00 18.77 C ATOM 101 CG2 VAL A 14 -3.433 22.381 35.634 1.00 18.66 C ATOM 102 N GLY A 15 -4.758 25.950 33.185 1.00 15.79 N ATOM 103 CA GLY A 15 -5.245 27.291 32.922 1.00 16.38 C ATOM 104 C GLY A 15 -4.672 27.859 31.634 1.00 16.56 C ATOM 105 O GLY A 15 -5.079 28.922 31.182 1.00 17.25 O ATOM 106 N LYS A 16 -3.731 27.144 31.033 1.00 16.77 N ATOM 107 CA LYS A 16 -3.111 27.605 29.796 1.00 16.86 C ATOM 108 C LYS A 16 -3.387 26.661 28.634 1.00 12.79 C ATOM 109 O LYS A 16 -3.864 25.549 28.832 1.00 12.68 O ATOM 110 CB LYS A 16 -1.602 27.766 29.999 1.00 21.71 C ATOM 111 CG LYS A 16 -1.265 28.678 31.175 1.00 26.90 C ATOM 112 CD LYS A 16 0.207 29.049 31.229 1.00 32.02 C ATOM 113 CE LYS A 16 0.461 30.032 32.363 1.00 35.30 C ATOM 114 NZ LYS A 16 1.866 30.520 32.389 1.00 38.44 N ATOM 115 N ILE A 17 -3.081 27.124 27.424 1.00 12.25 N ATOM 116 CA ILE A 17 -3.285 26.353 26.201 1.00 12.39 C ATOM 117 C ILE A 17 -2.129 25.391 25.954 1.00 11.21 C ATOM 118 O ILE A 17 -0.964 25.787 25.987 1.00 11.91 O ATOM 119 CB 
ILE A 17 -3.378 27.273 24.962 1.00 14.96 C ATOM 120 CG1 ILE A 17 -4.437 28.356 25.183 1.00 17.81 C ATOM 121 CG2 ILE A 17 -3.688 26.442 23.718 1.00 14.38 C ATOM 122 CD1 ILE A 17 -5.817 27.820 25.475 1.00 19.62 C ATOM 123 N ILE A 18 -2.457 24.130 25.691 1.00 9.41 N ATOM 124 CA ILE A 18 -1.435 23.126 25.433 1.00 11.75 C ATOM 125 C ILE A 18 -1.767 22.340 24.172 1.00 13.26 C ATOM 126 O ILE A 18 -2.911 22.345 23.714 1.00 12.21 O ATOM 127 CB ILE A 18 -1.308 22.130 26.613 1.00 11.15 C ATOM 128 CG1 ILE A 18 -2.599 21.318 26.765 1.00 12.77 C ATOM 129 CG2 ILE A 18 -1.035 22.893 27.908 1.00 13.70 C ATOM 130 CD1 ILE A 18 -2.487 20.155 27.757 1.00 9.49 C ATOM 131 N ARG A 19 -0.753 21.679 23.615 1.00 10.70 N ATOM 132 CA ARG A 19 -0.910 20.856 22.425 1.00 12.91 C ATOM 133 C ARG A 19 -0.537 19.450 22.869 1.00 11.65 C ATOM 134 O ARG A 19 0.501 19.255 23.502 1.00 11.52 O ATOM 135 CB ARG A 19 0.032 21.332 21.307 1.00 14.90 C ATOM 136 CG ARG A 19 0.014 20.466 20.049 1.00 22.71 C ATOM 137 CD ARG A 19 0.933 21.036 18.965 1.00 26.27 C ATOM 138 NE ARG A 19 0.463 22.329 18.478 1.00 30.98 N ATOM 139 CZ ARG A 19 -0.461 22.488 17.533 1.00 33.47 C ATOM 140 NH1 ARG A 19 -1.017 21.431 16.957 1.00 35.05 N ATOM 141 NH2 ARG A 19 -0.838 23.707 17.172 1.00 32.94 N ATOM 142 N VAL A 20 -1.381 18.478 22.536 1.00 12.81 N ATOM 143 CA VAL A 20 -1.159 17.090 22.938 1.00 12.18 C ATOM 144 C VAL A 20 -1.174 16.088 21.795 1.00 14.25 C ATOM 145 O VAL A 20 -2.064 16.118 20.950 1.00 13.93 O ATOM 146 CB VAL A 20 -2.244 16.624 23.935 1.00 12.11 C ATOM 147 CG1 VAL A 20 -1.934 15.208 24.425 1.00 11.83 C ATOM 148 CG2 VAL A 20 -2.337 17.597 25.091 1.00 11.98 C ATOM 149 N GLU A 21 -0.188 15.197 21.779 1.00 10.98 N ATOM 150 CA GLU A 21 -0.146 14.157 20.760 1.00 18.04 C ATOM 151 C GLU A 21 -0.542 12.861 21.461 1.00 16.71 C ATOM 152 O GLU A 21 0.068 12.483 22.463 1.00 13.57 O ATOM 153 CB GLU A 21 1.256 14.026 20.164 1.00 21.82 C ATOM 154 CG GLU A 21 1.402 12.823 19.248 1.00 29.30 C ATOM 155 CD GLU A 21 2.719 12.814 18.503 
1.00 33.96 C ATOM 156 OE1 GLU A 21 3.773 12.974 19.154 1.00 36.33 O ATOM 157 OE2 GLU A 21 2.697 12.641 17.265 1.00 38.97 O ATOM 158 N MET A 22 -1.573 12.199 20.942 1.00 15.57 N ATOM 159 CA MET A 22 -2.064 10.952 21.522 1.00 16.49 C ATOM 160 C MET A 22 -1.633 9.762 20.684 1.00 18.10 C ATOM 161 O MET A 22 -1.534 9.854 19.458 1.00 19.18 O ATOM 162 CB MET A 22 -3.595 10.958 21.609 1.00 15.35 C ATOM 163 CG MET A 22 -4.195 12.065 22.452 1.00 14.61 C ATOM 164 SD MET A 22 -3.732 11.959 24.195 1.00 14.83 S ATOM 165 CE MET A 22 -4.691 10.534 24.725 1.00 12.83 C ATOM 166 N LYS A 23 -1.402 8.638 21.348 1.00 17.43 N ATOM 167 CA LYS A 23 -0.980 7.423 20.668 1.00 24.78 C ATOM 168 C LYS A 23 -2.037 6.928 19.680 1.00 30.60 C ATOM 169 O LYS A 23 -1.712 6.543 18.555 1.00 36.89 O ATOM 170 CB LYS A 23 -0.663 6.341 21.704 1.00 25.61 C ATOM 171 CG LYS A 23 -0.090 5.065 21.129 1.00 29.18 C ATOM 172 CD LYS A 23 0.576 4.231 22.215 1.00 30.69 C ATOM 173 CE LYS A 23 -0.386 3.902 23.347 1.00 31.48 C ATOM 174 NZ LYS A 23 0.292 3.202 24.472 1.00 29.40 N ATOM 175 N GLY A 24 -3.301 6.948 20.093 1.00 34.18 N ATOM 176 CA GLY A 24 -4.371 6.497 19.217 1.00 42.56 C ATOM 177 C GLY A 24 -5.054 7.622 18.456 1.00 47.45 C ATOM 178 O GLY A 24 -6.264 7.576 18.217 1.00 47.89 O ATOM 179 N GLU A 25 -4.276 8.630 18.070 1.00 48.95 N ATOM 180 CA GLU A 25 -4.796 9.781 17.332 1.00 49.42 C ATOM 181 C GLU A 25 -3.753 10.296 16.341 1.00 48.97 C ATOM 182 O GLU A 25 -2.587 10.469 16.691 1.00 49.84 O ATOM 183 CB GLU A 25 -5.181 10.893 18.311 1.00 51.57 C ATOM 184 CG GLU A 25 -6.673 11.190 18.375 1.00 54.39 C ATOM 185 CD GLU A 25 -7.124 12.157 17.296 1.00 55.59 C ATOM 186 OE1 GLU A 25 -6.700 13.331 17.340 1.00 56.74 O ATOM 187 OE2 GLU A 25 -7.898 11.747 16.404 1.00 56.01 O ATOM 188 N GLU A 26 -4.173 10.543 15.104 1.00 47.72 N ATOM 189 CA GLU A 26 -3.259 11.031 14.075 1.00 45.76 C ATOM 190 C GLU A 26 -3.114 12.555 14.071 1.00 42.11 C ATOM 191 O GLU A 26 -2.105 13.091 13.610 1.00 41.70 O ATOM 192 CB GLU A 26 -3.716 10.530 
12.700 1.00 50.25 C ATOM 193 CG GLU A 26 -5.198 10.720 12.422 1.00 54.51 C ATOM 194 CD GLU A 26 -5.687 9.868 11.263 1.00 57.55 C ATOM 195 OE1 GLU A 26 -5.561 8.627 11.343 1.00 58.81 O ATOM 196 OE2 GLU A 26 -6.200 10.434 10.273 1.00 59.03 O ATOM 197 N ASN A 27 -4.122 13.249 14.587 1.00 37.10 N ATOM 198 CA ASN A 27 -4.090 14.707 14.657 1.00 32.16 C ATOM 199 C ASN A 27 -3.868 15.104 16.111 1.00 29.56 C ATOM 200 O ASN A 27 -4.240 14.359 17.014 1.00 28.35 O ATOM 201 CB ASN A 27 -5.414 15.288 14.166 1.00 31.78 C ATOM 202 CG ASN A 27 -5.765 14.828 12.771 1.00 33.56 C ATOM 203 OD1 ASN A 27 -4.942 14.899 11.863 1.00 36.50 O ATOM 204 ND2 ASN A 27 -6.991 14.358 12.592 1.00 32.62 N ATOM 205 N GLN A 28 -3.271 16.270 16.338 1.00 25.00 N ATOM 206 CA GLN A 28 -3.024 16.724 17.701 1.00 24.08 C ATOM 207 C GLN A 28 -4.244 17.414 18.293 1.00 22.39 C ATOM 208 O GLN A 28 -5.125 17.875 17.565 1.00 18.75 O ATOM 209 CB GLN A 28 -1.844 17.701 17.750 1.00 26.69 C ATOM 210 CG GLN A 28 -0.543 17.179 17.159 1.00 29.60 C ATOM 211 CD GLN A 28 -0.368 17.579 15.703 1.00 30.51 C ATOM 212 OE1 GLN A 28 -0.385 18.767 15.369 1.00 28.00 O ATOM 213 NE2 GLN A 28 -0.194 16.590 14.831 1.00 30.38 N ATOM 214 N LEU A 29 -4.286 17.478 19.621 1.00 17.60 N ATOM 215 CA LEU A 29 -5.372 18.146 20.321 1.00 13.82 C ATOM 216 C LEU A 29 -4.816 19.420 20.934 1.00 14.73 C ATOM 217 O LEU A 29 -3.747 19.409 21.547 1.00 13.40 O ATOM 218 CB LEU A 29 -5.950 17.260 21.432 1.00 15.61 C ATOM 219 CG LEU A 29 -6.698 15.992 21.014 1.00 17.65 C ATOM 220 CD1 LEU A 29 -7.637 16.309 19.853 1.00 19.13 C ATOM 221 CD2 LEU A 29 -5.708 14.928 20.603 1.00 23.43 C ATOM 222 N VAL A 30 -5.537 20.519 20.751 1.00 11.03 N ATOM 223 CA VAL A 30 -5.125 21.802 21.293 1.00 13.27 C ATOM 224 C VAL A 30 -6.293 22.362 22.092 1.00 15.06 C ATOM 225 O VAL A 30 -7.416 22.434 21.589 1.00 13.80 O ATOM 226 CB VAL A 30 -4.774 22.797 20.169 1.00 17.35 C ATOM 227 CG1 VAL A 30 -4.356 24.128 20.765 1.00 15.22 C ATOM 228 CG2 VAL A 30 -3.675 22.222 19.288 1.00 17.67 C 
ATOM 229 N GLY A 31 -6.032 22.747 23.338 1.00 13.60 N ATOM 230 CA GLY A 31 -7.095 23.295 24.163 1.00 12.93 C ATOM 231 C GLY A 31 -6.592 23.796 25.499 1.00 12.34 C ATOM 232 O GLY A 31 -5.400 23.721 25.788 1.00 13.21 O ATOM 233 N LYS A 32 -7.501 24.316 26.314 1.00 13.56 N ATOM 234 CA LYS A 32 -7.133 24.830 27.624 1.00 11.92 C ATOM 235 C LYS A 32 -7.103 23.687 28.626 1.00 13.67 C ATOM 236 O LYS A 32 -8.075 22.929 28.757 1.00 9.82 O ATOM 237 CB LYS A 32 -8.137 25.890 28.086 1.00 13.57 C ATOM 238 CG LYS A 32 -7.804 26.518 29.444 1.00 13.10 C ATOM 239 CD LYS A 32 -8.978 27.341 29.956 1.00 16.44 C ATOM 240 CE LYS A 32 -8.706 27.900 31.342 1.00 21.19 C ATOM 241 NZ LYS A 32 -9.960 28.358 32.014 1.00 24.62 N ATOM 242 N LEU A 33 -5.983 23.561 29.332 1.00 12.31 N ATOM 243 CA LEU A 33 -5.846 22.507 30.325 1.00 12.95 C ATOM 244 C LEU A 33 -6.594 22.929 31.583 1.00 14.89 C ATOM 245 O LEU A 33 -6.157 23.824 32.303 1.00 13.36 O ATOM 246 CB LEU A 33 -4.368 22.254 30.652 1.00 11.35 C ATOM 247 CG LEU A 33 -4.109 21.145 31.683 1.00 10.95 C ATOM 248 CD1 LEU A 33 -4.696 19.834 31.196 1.00 7.68 C ATOM 249 CD2 LEU A 33 -2.610 21.009 31.924 1.00 10.06 C ATOM 250 N GLU A 34 -7.733 22.287 31.827 1.00 14.72 N ATOM 251 CA GLU A 34 -8.561 22.591 32.990 1.00 16.11 C ATOM 252 C GLU A 34 -8.275 21.677 34.168 1.00 16.23 C ATOM 253 O GLU A 34 -8.586 22.013 35.309 1.00 17.12 O ATOM 254 CB GLU A 34 -10.043 22.493 32.618 1.00 20.73 C ATOM 255 CG GLU A 34 -10.575 23.735 31.939 1.00 29.40 C ATOM 256 CD GLU A 34 -10.723 24.893 32.901 1.00 33.30 C ATOM 257 OE1 GLU A 34 -11.617 24.835 33.772 1.00 35.00 O ATOM 258 OE2 GLU A 34 -9.937 25.855 32.795 1.00 38.23 O ATOM 259 N GLY A 35 -7.688 20.518 33.893 1.00 14.92 N ATOM 260 CA GLY A 35 -7.382 19.591 34.966 1.00 17.13 C ATOM 261 C GLY A 35 -6.471 18.455 34.545 1.00 14.75 C ATOM 262 O GLY A 35 -6.456 18.053 33.382 1.00 14.72 O ATOM 263 N VAL A 36 -5.709 17.942 35.503 1.00 14.61 N ATOM 264 CA VAL A 36 -4.787 16.843 35.263 1.00 14.03 C ATOM 265 C VAL A 36 -4.567 
16.091 36.578 1.00 16.53 C ATOM 266 O VAL A 36 -4.746 16.663 37.653 1.00 17.71 O ATOM 267 CB VAL A 36 -3.422 17.370 34.733 1.00 16.01 C ATOM 268 CG1 VAL A 36 -2.842 18.390 35.704 1.00 17.38 C ATOM 269 CG2 VAL A 36 -2.449 16.216 34.535 1.00 16.33 C ATOM 270 N ASP A 37 -4.216 14.808 36.492 1.00 17.00 N ATOM 271 CA ASP A 37 -3.944 14.013 37.692 1.00 17.81 C ATOM 272 C ASP A 37 -2.570 13.358 37.589 1.00 18.30 C ATOM 273 O ASP A 37 -1.870 13.532 36.591 1.00 16.80 O ATOM 274 CB ASP A 37 -5.022 12.946 37.924 1.00 17.10 C ATOM 275 CG ASP A 37 -5.245 12.058 36.718 1.00 17.69 C ATOM 276 OD1 ASP A 37 -4.266 11.760 36.008 1.00 18.62 O ATOM 277 OD2 ASP A 37 -6.408 11.647 36.494 1.00 17.45 O ATOM 278 N ASP A 38 -2.190 12.598 38.614 1.00 21.26 N ATOM 279 CA ASP A 38 -0.882 11.951 38.631 1.00 22.33 C ATOM 280 C ASP A 38 -0.642 10.929 37.526 1.00 19.90 C ATOM 281 O ASP A 38 0.502 10.593 37.235 1.00 21.89 O ATOM 282 CB ASP A 38 -0.621 11.303 39.995 1.00 26.36 C ATOM 283 CG ASP A 38 -0.423 12.329 41.099 1.00 31.39 C ATOM 284 OD1 ASP A 38 0.134 13.414 40.815 1.00 31.56 O ATOM 285 OD2 ASP A 38 -0.809 12.047 42.252 1.00 35.31 O ATOM 286 N TYR A 39 -1.706 10.428 36.912 1.00 19.47 N ATOM 287 CA TYR A 39 -1.542 9.463 35.834 1.00 20.09 C ATOM 288 C TYR A 39 -1.522 10.208 34.503 1.00 17.99 C ATOM 289 O TYR A 39 -1.531 9.605 33.428 1.00 16.32 O ATOM 290 CB TYR A 39 -2.682 8.442 35.854 1.00 28.34 C ATOM 291 CG TYR A 39 -2.775 7.679 37.155 1.00 36.83 C ATOM 292 CD1 TYR A 39 -3.422 8.226 38.263 1.00 40.12 C ATOM 293 CD2 TYR A 39 -2.193 6.419 37.288 1.00 39.62 C ATOM 294 CE1 TYR A 39 -3.487 7.536 39.474 1.00 42.98 C ATOM 295 CE2 TYR A 39 -2.251 5.721 38.492 1.00 42.48 C ATOM 296 CZ TYR A 39 -2.899 6.284 39.579 1.00 43.68 C ATOM 297 OH TYR A 39 -2.962 5.594 40.768 1.00 46.23 O ATOM 298 N MET A 40 -1.491 11.532 34.598 1.00 16.11 N ATOM 299 CA MET A 40 -1.483 12.407 33.433 1.00 17.01 C ATOM 300 C MET A 40 -2.741 12.349 32.568 1.00 17.68 C ATOM 301 O MET A 40 -2.688 12.578 31.353 1.00 15.78 O ATOM 
302 CB MET A 40 -0.226 12.170 32.586 1.00 19.38 C ATOM 303 CG MET A 40 1.000 12.863 33.187 1.00 24.63 C ATOM 304 SD MET A 40 2.492 12.824 32.172 1.00 31.38 S ATOM 305 CE MET A 40 1.901 13.540 30.713 1.00 16.88 C ATOM 306 N ASN A 41 -3.871 12.015 33.187 1.00 14.05 N ATOM 307 CA ASN A 41 -5.131 12.050 32.457 1.00 14.62 C ATOM 308 C ASN A 41 -5.283 13.557 32.283 1.00 14.43 C ATOM 309 O ASN A 41 -4.855 14.316 33.150 1.00 12.23 O ATOM 310 CB ASN A 41 -6.285 11.512 33.302 1.00 11.91 C ATOM 311 CG ASN A 41 -6.206 10.016 33.501 1.00 15.14 C ATOM 312 OD1 ASN A 41 -5.977 9.271 32.549 1.00 11.61 O ATOM 313 ND2 ASN A 41 -6.396 9.566 34.740 1.00 11.10 N ATOM 314 N LEU A 42 -5.866 13.998 31.175 1.00 12.68 N ATOM 315 CA LEU A 42 -6.015 15.426 30.943 1.00 14.27 C ATOM 316 C LEU A 42 -7.451 15.820 30.626 1.00 14.90 C ATOM 317 O LEU A 42 -8.157 15.116 29.904 1.00 14.11 O ATOM 318 CB LEU A 42 -5.123 15.869 29.773 1.00 14.12 C ATOM 319 CG LEU A 42 -3.650 15.452 29.772 1.00 15.28 C ATOM 320 CD1 LEU A 42 -3.026 15.828 28.435 1.00 14.57 C ATOM 321 CD2 LEU A 42 -2.908 16.116 30.928 1.00 15.18 C ATOM 322 N TYR A 43 -7.873 16.954 31.172 1.00 14.49 N ATOM 323 CA TYR A 43 -9.199 17.480 30.902 1.00 11.72 C ATOM 324 C TYR A 43 -9.006 18.834 30.233 1.00 12.35 C ATOM 325 O TYR A 43 -8.531 19.786 30.853 1.00 12.43 O ATOM 326 CB TYR A 43 -10.020 17.622 32.189 1.00 8.99 C ATOM 327 CG TYR A 43 -11.322 18.386 32.012 1.00 12.05 C ATOM 328 CD1 TYR A 43 -12.067 18.279 30.837 1.00 13.25 C ATOM 329 CD2 TYR A 43 -11.810 19.213 33.022 1.00 13.38 C ATOM 330 CE1 TYR A 43 -13.260 18.977 30.670 1.00 17.56 C ATOM 331 CE2 TYR A 43 -13.002 19.915 32.866 1.00 15.44 C ATOM 332 CZ TYR A 43 -13.720 19.795 31.686 1.00 17.83 C ATOM 333 OH TYR A 43 -14.884 20.509 31.510 1.00 17.24 O ATOM 334 N LEU A 44 -9.356 18.895 28.953 1.00 11.88 N ATOM 335 CA LEU A 44 -9.228 20.117 28.171 1.00 12.68 C ATOM 336 C LEU A 44 -10.600 20.695 27.867 1.00 10.41 C ATOM 337 O LEU A 44 -11.572 19.956 27.722 1.00 14.24 O ATOM 338 CB LEU A 44 
-8.525 19.831 26.839 1.00 14.22 C ATOM 339 CG LEU A 44 -7.134 19.195 26.857 1.00 17.86 C ATOM 340 CD1 LEU A 44 -6.603 19.100 25.431 1.00 15.18 C ATOM 341 CD2 LEU A 44 -6.204 20.031 27.714 1.00 18.92 C ATOM 342 N THR A 45 -10.673 22.016 27.779 1.00 9.36 N ATOM 343 CA THR A 45 -11.917 22.690 27.442 1.00 9.49 C ATOM 344 C THR A 45 -11.607 23.551 26.227 1.00 12.47 C ATOM 345 O THR A 45 -10.441 23.869 25.979 1.00 13.56 O ATOM 346 CB THR A 45 -12.421 23.591 28.585 1.00 10.31 C ATOM 347 OG1 THR A 45 -11.404 24.534 28.936 1.00 13.74 O ATOM 348 CG2 THR A 45 -12.791 22.743 29.815 1.00 12.90 C ATOM 349 N ASN A 46 -12.640 23.905 25.467 1.00 11.09 N ATOM 350 CA ASN A 46 -12.470 24.728 24.275 1.00 11.16 C ATOM 351 C ASN A 46 -11.446 24.076 23.339 1.00 12.84 C ATOM 352 O ASN A 46 -10.719 24.758 22.618 1.00 14.48 O ATOM 353 CB ASN A 46 -11.995 26.122 24.698 1.00 12.47 C ATOM 354 CG ASN A 46 -12.845 26.703 25.819 1.00 13.81 C ATOM 355 OD1 ASN A 46 -13.993 27.073 25.607 1.00 12.92 O ATOM 356 ND2 ASN A 46 -12.286 26.760 27.026 1.00 15.15 N ATOM 357 N ALA A 47 -11.417 22.749 23.343 1.00 12.36 N ATOM 358 CA ALA A 47 -10.466 21.994 22.540 1.00 11.04 C ATOM 359 C ALA A 47 -10.793 21.868 21.060 1.00 13.51 C ATOM 360 O ALA A 47 -11.946 21.978 20.642 1.00 14.67 O ATOM 361 CB ALA A 47 -10.285 20.612 23.133 1.00 13.09 C ATOM 362 N MET A 48 -9.754 21.616 20.276 1.00 14.15 N ATOM 363 CA MET A 48 -9.897 21.439 18.840 1.00 20.38 C ATOM 364 C MET A 48 -8.875 20.424 18.353 1.00 19.08 C ATOM 365 O MET A 48 -7.837 20.216 18.979 1.00 16.15 O ATOM 366 CB MET A 48 -9.684 22.766 18.111 1.00 25.54 C ATOM 367 CG MET A 48 -8.323 23.386 18.339 1.00 32.64 C ATOM 368 SD MET A 48 -8.117 24.935 17.445 1.00 41.02 S ATOM 369 CE MET A 48 -8.810 26.109 18.624 1.00 40.25 C ATOM 370 N GLU A 49 -9.182 19.789 17.232 1.00 17.46 N ATOM 371 CA GLU A 49 -8.283 18.816 16.643 1.00 19.10 C ATOM 372 C GLU A 49 -7.502 19.618 15.608 1.00 19.95 C ATOM 373 O GLU A 49 -8.097 20.329 14.794 1.00 18.79 O ATOM 374 CB GLU A 49 -9.096 
17.710 15.975 1.00 20.68 C ATOM 375 CG GLU A 49 -8.314 16.481 15.592 1.00 28.05 C ATOM 376 CD GLU A 49 -9.216 15.390 15.050 1.00 31.22 C ATOM 377 OE1 GLU A 49 -10.199 15.036 15.734 1.00 33.97 O ATOM 378 OE2 GLU A 49 -8.944 14.888 13.943 1.00 35.11 O ATOM 379 N CYS A 50 -6.177 19.517 15.648 1.00 18.43 N ATOM 380 CA CYS A 50 -5.336 20.259 14.724 1.00 23.55 C ATOM 381 C CYS A 50 -4.248 19.405 14.085 1.00 24.85 C ATOM 382 O CYS A 50 -3.914 18.326 14.566 1.00 24.75 O ATOM 383 CB CYS A 50 -4.657 21.435 15.446 1.00 23.56 C ATOM 384 SG CYS A 50 -5.758 22.646 16.233 1.00 27.18 S ATOM 385 N LYS A 51 -3.717 19.914 12.981 1.00 27.16 N ATOM 386 CA LYS A 51 -2.622 19.288 12.251 1.00 32.30 C ATOM 387 C LYS A 51 -1.641 20.440 12.102 1.00 34.33 C ATOM 388 O LYS A 51 -1.568 21.084 11.056 1.00 36.68 O ATOM 389 CB LYS A 51 -3.077 18.808 10.872 1.00 33.18 C ATOM 390 CG LYS A 51 -3.635 17.397 10.855 1.00 35.44 C ATOM 391 CD LYS A 51 -4.003 16.983 9.438 1.00 37.00 C ATOM 392 CE LYS A 51 -3.826 15.491 9.244 1.00 37.26 C ATOM 393 NZ LYS A 51 -2.402 15.109 9.430 1.00 37.82 N ATOM 394 N GLY A 52 -0.899 20.709 13.168 1.00 36.17 N ATOM 395 CA GLY A 52 0.026 21.821 13.143 1.00 36.07 C ATOM 396 C GLY A 52 -0.820 23.043 13.431 1.00 37.69 C ATOM 397 O GLY A 52 -1.597 23.048 14.389 1.00 37.36 O ATOM 398 N GLU A 53 -0.701 24.072 12.601 1.00 36.96 N ATOM 399 CA GLU A 53 -1.482 25.283 12.804 1.00 38.00 C ATOM 400 C GLU A 53 -2.834 25.198 12.096 1.00 35.81 C ATOM 401 O GLU A 53 -3.601 26.161 12.088 1.00 36.99 O ATOM 402 CB GLU A 53 -0.702 26.498 12.300 1.00 40.98 C ATOM 403 CG GLU A 53 0.684 26.638 12.912 1.00 46.00 C ATOM 404 CD GLU A 53 0.650 26.740 14.428 1.00 49.17 C ATOM 405 OE1 GLU A 53 0.021 27.685 14.948 1.00 51.39 O ATOM 406 OE2 GLU A 53 1.254 25.876 15.099 1.00 50.07 O ATOM 407 N GLU A 54 -3.127 24.039 11.514 1.00 32.81 N ATOM 408 CA GLU A 54 -4.384 23.836 10.800 1.00 30.87 C ATOM 409 C GLU A 54 -5.459 23.148 11.636 1.00 27.61 C ATOM 410 O GLU A 54 -5.292 22.012 12.073 1.00 29.24 O ATOM 411 
CB GLU A 54 -4.142 23.017 9.529 1.00 34.29 C ATOM 412 CG GLU A 54 -3.244 23.695 8.508 1.00 37.10 C ATOM 413 CD GLU A 54 -3.822 25.003 8.002 1.00 39.05 C ATOM 414 OE1 GLU A 54 -4.943 24.988 7.450 1.00 39.81 O ATOM 415 OE2 GLU A 54 -3.154 26.046 8.156 1.00 41.42 O ATOM 416 N LYS A 55 -6.568 23.850 11.838 1.00 26.15 N ATOM 417 CA LYS A 55 -7.711 23.348 12.596 1.00 24.91 C ATOM 418 C LYS A 55 -8.569 22.444 11.706 1.00 26.12 C ATOM 419 O LYS A 55 -8.917 22.822 10.583 1.00 24.82 O ATOM 420 CB LYS A 55 -8.534 24.541 13.092 1.00 28.30 C ATOM 421 CG LYS A 55 -9.994 24.267 13.404 1.00 31.15 C ATOM 422 CD LYS A 55 -10.182 23.525 14.709 1.00 35.12 C ATOM 423 CE LYS A 55 -11.247 24.205 15.559 1.00 34.85 C ATOM 424 NZ LYS A 55 -12.498 24.462 14.793 1.00 38.47 N ATOM 425 N VAL A 56 -8.909 21.254 12.201 1.00 22.26 N ATOM 426 CA VAL A 56 -9.725 20.323 11.428 1.00 22.11 C ATOM 427 C VAL A 56 -11.068 19.976 12.069 1.00 22.94 C ATOM 428 O VAL A 56 -12.012 19.634 11.368 1.00 20.13 O ATOM 429 CB VAL A 56 -8.961 19.009 11.139 1.00 24.28 C ATOM 430 CG1 VAL A 56 -7.697 19.310 10.350 1.00 24.86 C ATOM 431 CG2 VAL A 56 -8.621 18.301 12.432 1.00 22.83 C ATOM 432 N ARG A 57 -11.157 20.058 13.397 1.00 21.31 N ATOM 433 CA ARG A 57 -12.406 19.756 14.099 1.00 21.32 C ATOM 434 C ARG A 57 -12.516 20.484 15.434 1.00 22.16 C ATOM 435 O ARG A 57 -11.537 20.594 16.164 1.00 19.00 O ATOM 436 CB ARG A 57 -12.530 18.257 14.400 1.00 26.83 C ATOM 437 CG ARG A 57 -12.853 17.346 13.238 1.00 32.40 C ATOM 438 CD ARG A 57 -13.407 16.027 13.777 1.00 34.30 C ATOM 439 NE ARG A 57 -13.641 15.028 12.739 1.00 38.31 N ATOM 440 CZ ARG A 57 -12.696 14.251 12.217 1.00 40.20 C ATOM 441 NH1 ARG A 57 -11.440 14.353 12.635 1.00 38.65 N ATOM 442 NH2 ARG A 57 -13.010 13.365 11.280 1.00 41.69 N ATOM 443 N SER A 58 -13.711 20.975 15.747 1.00 22.24 N ATOM 444 CA SER A 58 -13.959 21.637 17.027 1.00 21.16 C ATOM 445 C SER A 58 -14.431 20.522 17.951 1.00 20.11 C ATOM 446 O SER A 58 -15.267 19.706 17.557 1.00 23.47 O ATOM 447 CB SER A 
58 -15.055 22.702 16.895 1.00 22.39 C ATOM 448 OG SER A 58 -14.558 23.866 16.265 1.00 25.78 O ATOM 449 N LEU A 59 -13.906 20.479 19.170 1.00 16.60 N ATOM 450 CA LEU A 59 -14.273 19.428 20.106 1.00 18.57 C ATOM 451 C LEU A 59 -14.919 19.913 21.403 1.00 18.10 C ATOM 452 O LEU A 59 -15.794 19.246 21.941 1.00 21.22 O ATOM 453 CB LEU A 59 -13.037 18.589 20.446 1.00 21.78 C ATOM 454 CG LEU A 59 -12.288 17.936 19.281 1.00 20.89 C ATOM 455 CD1 LEU A 59 -11.063 17.204 19.813 1.00 25.63 C ATOM 456 CD2 LEU A 59 -13.200 16.972 18.550 1.00 21.80 C ATOM 457 N GLY A 60 -14.483 21.061 21.909 1.00 13.34 N ATOM 458 CA GLY A 60 -15.051 21.564 23.146 1.00 14.81 C ATOM 459 C GLY A 60 -14.425 20.861 24.338 1.00 12.11 C ATOM 460 O GLY A 60 -13.203 20.834 24.474 1.00 13.49 O ATOM 461 N GLU A 61 -15.255 20.281 25.194 1.00 11.35 N ATOM 462 CA GLU A 61 -14.761 19.579 26.375 1.00 14.34 C ATOM 463 C GLU A 61 -14.358 18.139 26.064 1.00 12.24 C ATOM 464 O GLU A 61 -15.143 17.382 25.498 1.00 10.89 O ATOM 465 CB GLU A 61 -15.832 19.547 27.468 1.00 16.30 C ATOM 466 CG GLU A 61 -16.337 20.899 27.920 1.00 17.55 C ATOM 467 CD GLU A 61 -17.277 20.783 29.106 1.00 22.60 C ATOM 468 OE1 GLU A 61 -18.254 20.011 29.021 1.00 20.88 O ATOM 469 OE2 GLU A 61 -17.042 21.465 30.123 1.00 25.49 O ATOM 470 N ILE A 62 -13.140 17.755 26.433 1.00 11.62 N ATOM 471 CA ILE A 62 -12.694 16.378 26.207 1.00 11.62 C ATOM 472 C ILE A 62 -11.768 15.915 27.320 1.00 12.85 C ATOM 473 O ILE A 62 -11.080 16.717 27.941 1.00 11.49 O ATOM 474 CB ILE A 62 -11.921 16.204 24.875 1.00 11.98 C ATOM 475 CG1 ILE A 62 -10.708 17.137 24.853 1.00 15.00 C ATOM 476 CG2 ILE A 62 -12.838 16.470 23.683 1.00 15.73 C ATOM 477 CD1 ILE A 62 -9.795 16.919 23.657 1.00 15.99 C ATOM 478 N VAL A 63 -11.756 14.611 27.559 1.00 12.59 N ATOM 479 CA VAL A 63 -10.890 14.023 28.568 1.00 12.70 C ATOM 480 C VAL A 63 -9.946 13.064 27.851 1.00 12.11 C ATOM 481 O VAL A 63 -10.392 12.168 27.128 1.00 12.86 O ATOM 482 CB VAL A 63 -11.705 13.247 29.623 1.00 13.16 C ATOM 483 
CG1 VAL A 63 -10.774 12.438 30.511 1.00 12.01 C ATOM 484 CG2 VAL A 63 -12.537 14.224 30.454 1.00 11.50 C ATOM 485 N LEU A 64 -8.645 13.262 28.040 1.00 11.28 N ATOM 486 CA LEU A 64 -7.641 12.411 27.405 1.00 10.77 C ATOM 487 C LEU A 64 -7.077 11.410 28.416 1.00 13.62 C ATOM 488 O LEU A 64 -6.724 11.779 29.537 1.00 14.37 O ATOM 489 CB LEU A 64 -6.514 13.275 26.826 1.00 13.71 C ATOM 490 CG LEU A 64 -6.967 14.372 25.845 1.00 15.56 C ATOM 491 CD1 LEU A 64 -5.759 15.158 25.332 1.00 14.13 C ATOM 492 CD2 LEU A 64 -7.717 13.741 24.688 1.00 15.63 C ATOM 493 N ARG A 65 -7.007 10.141 28.019 1.00 12.50 N ATOM 494 CA ARG A 65 -6.492 9.094 28.896 1.00 12.04 C ATOM 495 C ARG A 65 -4.972 9.202 29.001 1.00 12.71 C ATOM 496 O ARG A 65 -4.270 9.150 27.998 1.00 9.50 O ATOM 497 CB ARG A 65 -6.879 7.719 28.354 1.00 11.52 C ATOM 498 CG ARG A 65 -6.501 6.560 29.249 1.00 15.06 C ATOM 499 CD ARG A 65 -7.298 6.591 30.545 1.00 16.75 C ATOM 500 NE ARG A 65 -7.021 5.426 31.378 1.00 21.45 N ATOM 501 CZ ARG A 65 -6.017 5.340 32.243 1.00 24.46 C ATOM 502 NH1 ARG A 65 -5.181 6.360 32.397 1.00 27.58 N ATOM 503 NH2 ARG A 65 -5.853 4.232 32.956 1.00 24.35 N ATOM 504 N GLY A 66 -4.481 9.339 30.229 1.00 9.02 N ATOM 505 CA GLY A 66 -3.054 9.482 30.466 1.00 9.34 C ATOM 506 C GLY A 66 -2.092 8.480 29.847 1.00 11.68 C ATOM 507 O GLY A 66 -1.059 8.875 29.302 1.00 13.98 O ATOM 508 N ASN A 67 -2.394 7.190 29.909 1.00 12.26 N ATOM 509 CA ASN A 67 -1.444 6.242 29.343 1.00 17.65 C ATOM 510 C ASN A 67 -1.398 6.276 27.821 1.00 14.32 C ATOM 511 O ASN A 67 -0.567 5.610 27.219 1.00 14.19 O ATOM 512 CB ASN A 67 -1.708 4.816 29.832 1.00 24.07 C ATOM 513 CG ASN A 67 -3.043 4.293 29.402 1.00 26.78 C ATOM 514 OD1 ASN A 67 -4.078 4.703 29.922 1.00 32.41 O ATOM 515 ND2 ASN A 67 -3.034 3.378 28.438 1.00 30.24 N ATOM 516 N ASN A 68 -2.284 7.046 27.198 1.00 12.47 N ATOM 517 CA ASN A 68 -2.265 7.155 25.740 1.00 12.98 C ATOM 518 C ASN A 68 -1.764 8.517 25.275 1.00 14.35 C ATOM 519 O ASN A 68 -1.749 8.832 24.074 1.00 12.18 O ATOM 
520 CB ASN A 68 -3.638 6.821 25.157 1.00 12.57 C ATOM 521 CG ASN A 68 -3.929 5.332 25.237 1.00 14.23 C ATOM 522 OD1 ASN A 68 -3.059 4.516 24.941 1.00 14.44 O ATOM 523 ND2 ASN A 68 -5.137 4.973 25.642 1.00 12.25 N ATOM 524 N VAL A 69 -1.334 9.320 26.242 1.00 11.13 N ATOM 525 CA VAL A 69 -0.759 10.627 25.944 1.00 12.90 C ATOM 526 C VAL A 69 0.683 10.334 25.549 1.00 12.03 C ATOM 527 O VAL A 69 1.358 9.544 26.210 1.00 13.49 O ATOM 528 CB VAL A 69 -0.727 11.547 27.188 1.00 12.99 C ATOM 529 CG1 VAL A 69 0.153 12.762 26.911 1.00 10.85 C ATOM 530 CG2 VAL A 69 -2.139 12.001 27.542 1.00 13.91 C ATOM 531 N VAL A 70 1.151 10.943 24.467 1.00 10.26 N ATOM 532 CA VAL A 70 2.528 10.731 24.033 1.00 9.03 C ATOM 533 C VAL A 70 3.400 11.924 24.404 1.00 10.37 C ATOM 534 O VAL A 70 4.432 11.780 25.068 1.00 13.74 O ATOM 535 CB VAL A 70 2.615 10.514 22.497 1.00 12.59 C ATOM 536 CG1 VAL A 70 4.081 10.428 22.062 1.00 11.93 C ATOM 537 CG2 VAL A 70 1.873 9.233 22.111 1.00 12.95 C ATOM 538 N LEU A 71 2.961 13.107 23.998 1.00 10.76 N ATOM 539 CA LEU A 71 3.715 14.326 24.240 1.00 14.37 C ATOM 540 C LEU A 71 2.794 15.503 24.555 1.00 13.59 C ATOM 541 O LEU A 71 1.719 15.630 23.971 1.00 14.87 O ATOM 542 CB LEU A 71 4.570 14.624 22.996 1.00 15.71 C ATOM 543 CG LEU A 71 5.461 15.868 22.896 1.00 19.18 C ATOM 544 CD1 LEU A 71 6.419 15.707 21.707 1.00 18.54 C ATOM 545 CD2 LEU A 71 4.608 17.112 22.718 1.00 19.07 C ATOM 546 N ILE A 72 3.222 16.352 25.486 1.00 11.85 N ATOM 547 CA ILE A 72 2.462 17.539 25.873 1.00 12.14 C ATOM 548 C ILE A 72 3.381 18.733 25.675 1.00 13.51 C ATOM 549 O ILE A 72 4.530 18.713 26.100 1.00 13.00 O ATOM 550 CB ILE A 72 2.034 17.504 27.358 1.00 15.00 C ATOM 551 CG1 ILE A 72 1.160 16.279 27.625 1.00 13.31 C ATOM 552 CG2 ILE A 72 1.271 18.792 27.709 1.00 15.75 C ATOM 553 CD1 ILE A 72 0.729 16.143 29.089 1.00 16.07 C ATOM 554 N GLN A 73 2.873 19.771 25.025 1.00 15.97 N ATOM 555 CA GLN A 73 3.672 20.956 24.757 1.00 17.94 C ATOM 556 C GLN A 73 2.888 22.228 25.027 1.00 19.26 C ATOM 557 
O GLN A 73 1.786 22.403 24.510 1.00 19.75 O ATOM 558 CB GLN A 73 4.131 20.945 23.291 1.00 20.01 C ATOM 559 CG GLN A 73 4.765 22.245 22.814 1.00 22.80 C ATOM 560 CD GLN A 73 5.007 22.264 21.314 1.00 24.83 C ATOM 561 OE1 GLN A 73 4.101 21.995 20.524 1.00 25.58 O ATOM 562 NE2 GLN A 73 6.231 22.590 20.914 1.00 24.85 N ATOM 563 N PRO A 74 3.439 23.128 25.855 1.00 21.71 N ATOM 564 CA PRO A 74 2.730 24.378 26.139 1.00 23.39 C ATOM 565 C PRO A 74 2.545 25.060 24.787 1.00 28.88 C ATOM 566 O PRO A 74 3.427 24.976 23.930 1.00 28.01 O ATOM 567 CB PRO A 74 3.710 25.137 27.030 1.00 25.69 C ATOM 568 CG PRO A 74 4.464 24.033 27.726 1.00 25.64 C ATOM 569 CD PRO A 74 4.709 23.061 26.601 1.00 23.40 C ATOM 570 N GLN A 75 1.410 25.712 24.577 1.00 32.73 N ATOM 571 CA GLN A 75 1.186 26.375 23.301 1.00 39.17 C ATOM 572 C GLN A 75 1.552 27.849 23.402 1.00 42.29 C ATOM 573 O GLN A 75 1.661 28.347 24.543 1.00 44.20 O ATOM 574 CB GLN A 75 -0.274 26.223 22.874 1.00 41.26 C ATOM 575 CG GLN A 75 -0.555 26.679 21.451 1.00 46.85 C ATOM 576 CD GLN A 75 0.276 25.931 20.423 1.00 49.37 C ATOM 577 OE1 GLN A 75 1.504 26.035 20.403 1.00 50.93 O ATOM 578 NE2 GLN A 75 -0.392 25.169 19.565 1.00 50.14 N TER 579 GLN A 75 HETATM 580 CD CD A 78 0.000 0.000 21.937 0.17 61.72 CD HETATM 581 C ACY A 201 0.588 16.434 11.242 1.00 51.60 C HETATM 582 O ACY A 201 1.289 16.199 10.168 1.00 52.86 O HETATM 583 OXT ACY A 201 -0.447 15.802 11.650 1.00 51.61 O HETATM 584 CH3 ACY A 201 1.113 17.604 12.057 1.00 51.74 C HETATM 585 C ACY A 202 -16.124 22.064 13.139 1.00 44.74 C HETATM 586 O ACY A 202 -15.330 23.087 13.303 1.00 45.43 O HETATM 587 OXT ACY A 202 -15.878 20.837 13.406 1.00 44.17 O HETATM 588 CH3 ACY A 202 -17.478 22.439 12.565 1.00 45.13 C HETATM 589 C ACY A 203 -19.085 20.723 25.138 1.00 30.72 C HETATM 590 O ACY A 203 -17.820 21.006 25.001 1.00 27.32 O HETATM 591 OXT ACY A 203 -19.656 19.592 24.937 1.00 34.61 O HETATM 592 CH3 ACY A 203 -19.920 21.909 25.585 1.00 26.21 C HETATM 593 O HOH A 307 0.000 33.729 35.203 0.50 
44.16 O HETATM 594 O HOH A 309 -14.605 25.297 32.084 0.50 32.50 O HETATM 595 O HOH A 316 -2.773 12.841 18.475 1.00 32.51 O HETATM 596 O HOH A 317 -8.600 26.641 22.997 1.00 29.31 O HETATM 597 O HOH A 319 -1.318 26.677 36.680 1.00 37.58 O HETATM 598 O HOH A 320 3.352 22.733 39.368 1.00 37.01 O HETATM 599 O HOH A 323 -1.196 24.784 6.603 1.00 47.79 O HETATM 600 O HOH A 324 18.373 22.770 36.334 1.00 44.77 O HETATM 601 O HOH A 330 -6.553 30.933 32.628 1.00 36.16 O HETATM 602 O HOH A 377 -8.617 12.457 37.852 1.00 7.32 O HETATM 603 O HOH A 382 -16.313 23.846 30.081 1.00 19.10 O HETATM 604 O HOH A 385 -17.288 18.182 24.062 1.00 26.97 O HETATM 605 O HOH A 387 -1.722 29.817 27.238 1.00 27.14 O HETATM 606 O HOH A 398 -14.605 25.297 20.944 0.50 48.63 O HETATM 607 O HOH A 401 -5.952 12.980 9.790 1.00 29.37 O HETATM 608 O HOH A 404 -15.906 21.840 33.538 1.00 46.91 O HETATM 609 O HOH A 405 -2.397 2.618 19.050 1.00 42.44 O HETATM 610 O HOH A 409 3.734 21.614 43.710 1.00 46.20 O HETATM 611 O HOH A 411 -8.261 10.755 12.821 1.00 46.00 O HETATM 612 O HOH A 415 -1.265 16.103 6.863 1.00 36.28 O HETATM 613 O HOH A 421 0.067 16.574 4.720 1.00 35.13 O HETATM 614 O HOH A 422 17.588 19.870 35.658 1.00 39.78 O HETATM 615 O HOH A 424 -17.449 17.800 18.055 1.00 41.00 O HETATM 616 O HOH A 426 -3.858 31.887 30.436 1.00 36.03 O HETATM 617 O HOH A 428 0.993 25.088 8.070 1.00 40.80 O HETATM 618 O HOH A 430 -4.072 7.167 15.867 1.00 40.07 O HETATM 619 O HOH A 431 -1.597 30.443 23.424 1.00 37.68 O HETATM 620 O HOH A 436 -0.387 9.278 42.936 1.00 45.52 O HETATM 621 O HOH A 442 0.000 0.000 27.450 0.17 44.43 O HETATM 622 O HOH A 443 -8.569 12.038 40.572 1.00 31.79 O HETATM 623 O HOH A 444 0.000 0.000 18.270 0.17 39.19 O HETATM 624 O HOH A 445 -5.372 5.610 36.213 1.00 54.10 O HETATM 625 O HOH A 446 -3.541 12.178 8.896 1.00 33.30 O CONECT 581 582 583 584 CONECT 582 581 CONECT 583 581 CONECT 584 581 CONECT 585 586 587 588 CONECT 586 585 CONECT 587 585 CONECT 588 585 CONECT 589 590 591 592 CONECT 590 589 CONECT 
591 589 CONECT 592 589 MASTER 325 0 4 2 5 0 4 6 624 1 12 6 END
PyCogent-1.5.3/tests/data/2E12.pdb
HEADER TRANSLATION 17-OCT-06 2E12 TITLE THE CRYSTAL STRUCTURE OF XC5848 FROM XANTHOMONAS CAMPESTRIS TITLE 2 ADOPTING A NOVEL VARIANT OF SM-LIKE MOTIF COMPND MOL_ID: 1; COMPND 2 MOLECULE: HYPOTHETICAL PROTEIN XCC3642; COMPND 3 CHAIN: A, B; COMPND 4 SYNONYM: SM-LIKE MOTIF; COMPND 5 ENGINEERED: YES SOURCE MOL_ID: 1; SOURCE 2 ORGANISM_SCIENTIFIC: XANTHOMONAS CAMPESTRIS PV. CAMPESTRIS; SOURCE 3 ORGANISM_TAXID: 340; SOURCE 4 STRAIN: PV. CAMPESTRIS; SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI; SOURCE 6 EXPRESSION_SYSTEM_TAXID: 562 KEYWDS NOVEL SM-LIKE MOTIF, LSM MOTIF, XANTHOMONAS CAMPESTRIS, X- KEYWDS 2 RAY CRYSTALLOGRAPHY, TRANSLATION EXPDTA X-RAY DIFFRACTION AUTHOR K.-H.CHIN,S.-K.RUAN,A.H.-J.WANG,S.-H.CHOU REVDAT 2 24-FEB-09 2E12 1 VERSN REVDAT 1 30-OCT-07 2E12 0 JRNL AUTH K.-H.CHIN,S.-K.RUAN,A.H.-J.WANG,S.-H.CHOU JRNL TITL XC5848, AN ORFAN PROTEIN FROM XANTHOMONAS JRNL TITL 2 CAMPESTRIS, ADOPTS A NOVEL VARIANT OF SM-LIKE MOTIF JRNL REF PROTEINS V. 68 1006 2007 JRNL REFN ISSN 0887-3585 JRNL PMID 17546661 JRNL DOI 10.1002/PROT.21375 REMARK 1 REMARK 2 REMARK 2 RESOLUTION. 1.70 ANGSTROMS. REMARK 3 REMARK 3 REFINEMENT. REMARK 3 PROGRAM : CNS REMARK 3 AUTHORS : BRUNGER,ADAMS,CLORE,DELANO,GROS,GROSSE- REMARK 3 : KUNSTLEVE,JIANG,KUSZEWSKI,NILGES, PANNU, REMARK 3 : READ,RICE,SIMONSON,WARREN REMARK 3 REMARK 3 REFINEMENT TARGET : NULL REMARK 3 REMARK 3 DATA USED IN REFINEMENT. REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.70 REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 30.00 REMARK 3 DATA CUTOFF (SIGMA(F)) : 5.000 REMARK 3 DATA CUTOFF HIGH (ABS(F)) : NULL REMARK 3 DATA CUTOFF LOW (ABS(F)) : NULL REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 99.1 REMARK 3 NUMBER OF REFLECTIONS : 6937 REMARK 3 REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM REMARK 3 R VALUE (WORKING SET) : 0.220 REMARK 3 FREE R VALUE : 0.280 REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL REMARK 3 FREE R VALUE TEST SET COUNT : NULL REMARK 3 ESTIMATED ERROR OF FREE R VALUE : NULL REMARK 3 REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN. REMARK 3 TOTAL NUMBER OF BINS USED : NULL REMARK 3 BIN RESOLUTION RANGE HIGH (A) : 1.70 REMARK 3 BIN RESOLUTION RANGE LOW (A) : 1.75 REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : 97.00 REMARK 3 REFLECTIONS IN BIN (WORKING SET) : NULL REMARK 3 BIN R VALUE (WORKING SET) : 0.2400 REMARK 3 BIN FREE R VALUE : 0.2200 REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : NULL REMARK 3 BIN FREE R VALUE TEST SET COUNT : NULL REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : 0.012 REMARK 3 REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT. REMARK 3 PROTEIN ATOMS : 1512 REMARK 3 NUCLEIC ACID ATOMS : 0 REMARK 3 HETEROGEN ATOMS : 0 REMARK 3 SOLVENT ATOMS : 122 REMARK 3 REMARK 3 B VALUES. REMARK 3 FROM WILSON PLOT (A**2) : 24.00 REMARK 3 MEAN B VALUE (OVERALL, A**2) : NULL REMARK 3 OVERALL ANISOTROPIC B VALUE. REMARK 3 B11 (A**2) : NULL REMARK 3 B22 (A**2) : NULL REMARK 3 B33 (A**2) : NULL REMARK 3 B12 (A**2) : NULL REMARK 3 B13 (A**2) : NULL REMARK 3 B23 (A**2) : NULL REMARK 3 REMARK 3 ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM SIGMAA (A) : NULL REMARK 3 LOW RESOLUTION CUTOFF (A) : NULL REMARK 3 REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM C-V LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM C-V SIGMAA (A) : NULL REMARK 3 REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES. REMARK 3 BOND LENGTHS (A) : 0.007 REMARK 3 BOND ANGLES (DEGREES) : 1.32 REMARK 3 DIHEDRAL ANGLES (DEGREES) : NULL REMARK 3 IMPROPER ANGLES (DEGREES) : NULL REMARK 3 REMARK 3 ISOTROPIC THERMAL MODEL : NULL REMARK 3 REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. 
RMS SIGMA REMARK 3 MAIN-CHAIN BOND (A**2) : NULL ; NULL REMARK 3 MAIN-CHAIN ANGLE (A**2) : NULL ; NULL REMARK 3 SIDE-CHAIN BOND (A**2) : NULL ; NULL REMARK 3 SIDE-CHAIN ANGLE (A**2) : NULL ; NULL REMARK 3 REMARK 3 BULK SOLVENT MODELING. REMARK 3 METHOD USED : NULL REMARK 3 KSOL : NULL REMARK 3 BSOL : NULL REMARK 3 REMARK 3 NCS MODEL : NULL REMARK 3 REMARK 3 NCS RESTRAINTS. RMS SIGMA/WEIGHT REMARK 3 GROUP 1 POSITIONAL (A) : NULL ; NULL REMARK 3 GROUP 1 B-FACTOR (A**2) : NULL ; NULL REMARK 3 REMARK 3 PARAMETER FILE 1 : NULL REMARK 3 TOPOLOGY FILE 1 : NULL REMARK 3 REMARK 3 OTHER REFINEMENT REMARKS: NULL REMARK 4 REMARK 4 2E12 COMPLIES WITH FORMAT V. 3.15, 01-DEC-08 REMARK 100 REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY PDBJ ON 19-OCT-06. REMARK 100 THE RCSB ID CODE IS RCSB026092. REMARK 200 REMARK 200 EXPERIMENTAL DETAILS REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION REMARK 200 DATE OF DATA COLLECTION : 28-JUL-06 REMARK 200 TEMPERATURE (KELVIN) : 100 REMARK 200 PH : 8.0 REMARK 200 NUMBER OF CRYSTALS USED : 10 REMARK 200 REMARK 200 SYNCHROTRON (Y/N) : Y REMARK 200 RADIATION SOURCE : NSRRC REMARK 200 BEAMLINE : BL13B1 REMARK 200 X-RAY GENERATOR MODEL : NULL REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M REMARK 200 WAVELENGTH OR RANGE (A) : 0.96437, 0.97983 REMARK 200 MONOCHROMATOR : NULL REMARK 200 OPTICS : NULL REMARK 200 REMARK 200 DETECTOR TYPE : CCD REMARK 200 DETECTOR MANUFACTURER : ADSC QUANTUM 315 REMARK 200 INTENSITY-INTEGRATION SOFTWARE : DENZO REMARK 200 DATA SCALING SOFTWARE : HKL-2000 REMARK 200 REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 6937 REMARK 200 RESOLUTION RANGE HIGH (A) : 1.700 REMARK 200 RESOLUTION RANGE LOW (A) : 30.000 REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 2.000 REMARK 200 REMARK 200 OVERALL. REMARK 200 COMPLETENESS FOR RANGE (%) : 99.7 REMARK 200 DATA REDUNDANCY : 4.500 REMARK 200 R MERGE (I) : 0.24000 REMARK 200 R SYM (I) : 0.06000 REMARK 200 FOR THE DATA SET : 8.0000 REMARK 200 REMARK 200 IN THE HIGHEST RESOLUTION SHELL. 
REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 1.70 REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : NULL REMARK 200 COMPLETENESS FOR SHELL (%) : 97.5 REMARK 200 DATA REDUNDANCY IN SHELL : 4.50 REMARK 200 R MERGE FOR SHELL (I) : 0.06000 REMARK 200 R SYM FOR SHELL (I) : 0.24000 REMARK 200 FOR SHELL : 7.900 REMARK 200 REMARK 200 DIFFRACTION PROTOCOL: MAD REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: MAD REMARK 200 SOFTWARE USED: AMORE REMARK 200 STARTING MODEL: NULL REMARK 200 REMARK 200 REMARK: NULL REMARK 280 REMARK 280 CRYSTAL REMARK 280 SOLVENT CONTENT, VS (%): 46.26 REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 2.29 REMARK 280 REMARK 280 CRYSTALLIZATION CONDITIONS: PH 8.0, VAPOR DIFFUSION, SITTING REMARK 280 DROP, TEMPERATURE 298K REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 21 21 21 REMARK 290 REMARK 290 SYMOP SYMMETRY REMARK 290 NNNMMM OPERATOR REMARK 290 1555 X,Y,Z REMARK 290 2555 -X+1/2,-Y,Z+1/2 REMARK 290 3555 -X,Y+1/2,-Z+1/2 REMARK 290 4555 X+1/2,-Y+1/2,-Z REMARK 290 REMARK 290 WHERE NNN -> OPERATOR NUMBER REMARK 290 MMM -> TRANSLATION VECTOR REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY REMARK 290 RELATED MOLECULES. 
REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000 REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 2 -1.000000 0.000000 0.000000 24.97100 REMARK 290 SMTRY2 2 0.000000 -1.000000 0.000000 0.00000 REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 41.06000 REMARK 290 SMTRY1 3 -1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 3 0.000000 1.000000 0.000000 25.84950 REMARK 290 SMTRY3 3 0.000000 0.000000 -1.000000 41.06000 REMARK 290 SMTRY1 4 1.000000 0.000000 0.000000 24.97100 REMARK 290 SMTRY2 4 0.000000 -1.000000 0.000000 25.84950 REMARK 290 SMTRY3 4 0.000000 0.000000 -1.000000 0.00000 REMARK 290 REMARK 290 REMARK: NULL REMARK 300 REMARK 300 BIOMOLECULE: 1 REMARK 300 SEE REMARK 350 FOR THE AUTHOR PROVIDED AND/OR PROGRAM REMARK 300 GENERATED ASSEMBLY INFORMATION FOR THE STRUCTURE IN REMARK 300 THIS ENTRY. THE REMARK MAY ALSO PROVIDE INFORMATION ON REMARK 300 BURIED SURFACE AREA. REMARK 350 REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN. REMARK 350 REMARK 350 BIOMOLECULE: 1 REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000 REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000 REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000 REMARK 465 REMARK 465 MISSING RESIDUES REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) 
REMARK 465 REMARK 465 M RES C SSSEQI REMARK 465 LEU A 94 REMARK 465 GLY A 95 REMARK 465 ALA A 96 REMARK 465 PRO A 97 REMARK 465 GLN A 98 REMARK 465 VAL A 99 REMARK 465 MET A 100 REMARK 465 PRO A 101 REMARK 465 LEU B 94 REMARK 465 GLY B 95 REMARK 465 ALA B 96 REMARK 465 PRO B 97 REMARK 465 GLN B 98 REMARK 465 VAL B 99 REMARK 465 MET B 100 REMARK 465 PRO B 101 REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: CLOSE CONTACTS IN SAME ASYMMETRIC UNIT REMARK 500 REMARK 500 THE FOLLOWING ATOMS ARE IN CLOSE CONTACT. REMARK 500 REMARK 500 ATM1 RES C SSEQI ATM2 RES C SSEQI DISTANCE REMARK 500 O HOH A 127 O HOH A 149 2.05 REMARK 500 REMARK 500 REMARK: NULL REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: TORSION ANGLES REMARK 500 REMARK 500 TORSION ANGLES OUTSIDE THE EXPECTED RAMACHANDRAN REGIONS: REMARK 500 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; REMARK 500 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT:(10X,I3,1X,A3,1X,A1,I4,A1,4X,F7.2,3X,F7.2) REMARK 500 REMARK 500 EXPECTED VALUES: GJ KLEYWEGT AND TA JONES (1996). PHI/PSI- REMARK 500 CHOLOGY: RAMACHANDRAN REVISITED. STRUCTURE 4, 1395 - 1400 REMARK 500 REMARK 500 M RES CSSEQI PSI PHI REMARK 500 ASN A 64 -175.84 -178.56 REMARK 500 HIS A 71 -156.72 -164.33 REMARK 500 LEU A 72 -70.52 -135.73 REMARK 500 ALA A 74 -75.47 -29.45 REMARK 500 SER A 75 -5.54 -145.62 REMARK 500 GLN A 76 -178.36 65.32 REMARK 500 GLU A 77 115.33 61.52 REMARK 500 MET A 92 -36.89 93.21 REMARK 500 LEU B 25 37.31 -77.89 REMARK 500 GLN B 28 37.09 32.85 REMARK 500 ARG B 30 132.20 -36.84 REMARK 500 ASN B 64 -172.78 -175.93 REMARK 500 GLN B 76 67.64 34.46 REMARK 500 PRO B 91 -156.99 -48.61 REMARK 500 MET B 92 -37.52 -160.44 REMARK 500 REMARK 500 REMARK: NULL REMARK 525 REMARK 525 SOLVENT REMARK 525 REMARK 525 THE SOLVENT MOLECULES HAVE CHAIN IDENTIFIERS THAT REMARK 525 INDICATE THE POLYMER CHAIN WITH WHICH THEY ARE MOST REMARK 525 CLOSELY ASSOCIATED. 
THE REMARK LISTS ALL THE SOLVENT REMARK 525 MOLECULES WHICH ARE MORE THAN 5A AWAY FROM THE REMARK 525 NEAREST POLYMER CHAIN (M = MODEL NUMBER; REMARK 525 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE REMARK 525 NUMBER; I=INSERTION CODE): REMARK 525 REMARK 525 M RES CSSEQI REMARK 525 HOH B 115 DISTANCE = 6.82 ANGSTROMS REMARK 525 HOH A 116 DISTANCE = 6.52 ANGSTROMS REMARK 525 HOH B 119 DISTANCE = 5.12 ANGSTROMS REMARK 525 HOH B 121 DISTANCE = 5.21 ANGSTROMS REMARK 525 HOH B 123 DISTANCE = 5.18 ANGSTROMS REMARK 525 HOH A 124 DISTANCE = 6.99 ANGSTROMS REMARK 525 HOH B 124 DISTANCE = 5.13 ANGSTROMS REMARK 525 HOH B 134 DISTANCE = 7.25 ANGSTROMS REMARK 525 HOH B 140 DISTANCE = 5.54 ANGSTROMS REMARK 525 HOH B 141 DISTANCE = 5.94 ANGSTROMS REMARK 525 HOH B 142 DISTANCE = 6.60 ANGSTROMS REMARK 525 HOH B 143 DISTANCE = 7.39 ANGSTROMS REMARK 525 HOH A 145 DISTANCE = 9.25 ANGSTROMS REMARK 525 HOH A 150 DISTANCE = 6.01 ANGSTROMS REMARK 525 HOH B 152 DISTANCE = 5.46 ANGSTROMS REMARK 525 HOH B 153 DISTANCE = 9.74 ANGSTROMS REMARK 525 HOH B 154 DISTANCE = 9.32 ANGSTROMS REMARK 525 HOH B 155 DISTANCE = 5.41 ANGSTROMS REMARK 525 HOH B 163 DISTANCE = 5.16 ANGSTROMS DBREF 2E12 A 1 101 UNP Q8P4R5 Q8P4R5_XANCP 1 101 DBREF 2E12 B 1 101 UNP Q8P4R5 Q8P4R5_XANCP 1 101 SEQRES 1 A 101 MET PRO LYS TYR ALA PRO HIS VAL TYR THR GLU GLN ALA SEQRES 2 A 101 GLN ILE ALA THR LEU GLU HIS TRP VAL LYS LEU LEU ASP SEQRES 3 A 101 GLY GLN GLU ARG VAL ARG ILE GLU LEU ASP ASP GLY SER SEQRES 4 A 101 MET ILE ALA GLY THR VAL ALA VAL ARG PRO THR ILE GLN SEQRES 5 A 101 THR TYR ARG ASP GLU GLN GLU ARG GLU GLY SER ASN GLY SEQRES 6 A 101 GLN LEU ARG ILE ASP HIS LEU ASP ALA SER GLN GLU PRO SEQRES 7 A 101 GLN TRP ILE TRP MET ASP ARG ILE VAL ALA VAL HIS PRO SEQRES 8 A 101 MET PRO LEU GLY ALA PRO GLN VAL MET PRO SEQRES 1 B 101 MET PRO LYS TYR ALA PRO HIS VAL TYR THR GLU GLN ALA SEQRES 2 B 101 GLN ILE ALA THR LEU GLU HIS TRP VAL LYS LEU LEU ASP SEQRES 3 B 101 GLY GLN GLU ARG VAL ARG ILE GLU LEU ASP ASP GLY SER 
SEQRES 4 B 101 MET ILE ALA GLY THR VAL ALA VAL ARG PRO THR ILE GLN SEQRES 5 B 101 THR TYR ARG ASP GLU GLN GLU ARG GLU GLY SER ASN GLY SEQRES 6 B 101 GLN LEU ARG ILE ASP HIS LEU ASP ALA SER GLN GLU PRO SEQRES 7 B 101 GLN TRP ILE TRP MET ASP ARG ILE VAL ALA VAL HIS PRO SEQRES 8 B 101 MET PRO LEU GLY ALA PRO GLN VAL MET PRO FORMUL 3 HOH *122(H2 O) HELIX 1 1 GLU A 11 LEU A 24 1 14 HELIX 2 2 GLU B 11 LEU B 25 1 15 SHEET 1 A 3 ILE A 51 ARG A 55 0 SHEET 2 A 3 GLU A 61 ASP A 70 -1 O ASN A 64 N GLN A 52 SHEET 3 A 3 GLN A 79 TRP A 82 -1 O ILE A 81 N LEU A 67 SHEET 1 B 5 ILE A 51 ARG A 55 0 SHEET 2 B 5 GLU A 61 ASP A 70 -1 O ASN A 64 N GLN A 52 SHEET 3 B 5 MET A 40 VAL A 45 -1 N THR A 44 O ASP A 70 SHEET 4 B 5 ARG A 30 LEU A 35 -1 N ILE A 33 O ILE A 41 SHEET 5 B 5 ILE A 86 PRO A 91 -1 O VAL A 87 N GLU A 34 SHEET 1 C 5 PRO B 78 TRP B 82 0 SHEET 2 C 5 GLN B 66 ASP B 70 -1 N ILE B 69 O GLN B 79 SHEET 3 C 5 MET B 40 VAL B 47 -1 N ALA B 46 O ARG B 68 SHEET 4 C 5 VAL B 31 LEU B 35 -1 N ILE B 33 O ILE B 41 SHEET 5 C 5 ILE B 86 HIS B 90 -1 O VAL B 87 N GLU B 34 SHEET 1 D 2 GLN B 52 ARG B 55 0 SHEET 2 D 2 GLU B 61 ASN B 64 -1 O ASN B 64 N GLN B 52 CRYST1 49.942 51.699 82.120 90.00 90.00 90.00 P 21 21 21 8 ORIGX1 1.000000 0.000000 0.000000 0.00000 ORIGX2 0.000000 1.000000 0.000000 0.00000 ORIGX3 0.000000 0.000000 1.000000 0.00000 SCALE1 0.020023 0.000000 0.000000 0.00000 SCALE2 0.000000 0.019343 0.000000 0.00000 SCALE3 0.000000 0.000000 0.012177 0.00000 ATOM 1 N MET A 1 53.045 42.225 33.724 1.00 2.75 N ATOM 2 CA MET A 1 53.628 40.900 33.321 1.00 2.75 C ATOM 3 C MET A 1 52.484 39.953 32.944 1.00 2.75 C ATOM 4 O MET A 1 51.460 39.918 33.620 1.00 2.75 O ATOM 5 CB MET A 1 54.438 40.316 34.481 1.00 2.75 C ATOM 6 CG MET A 1 55.168 39.017 34.162 1.00 2.75 C ATOM 7 SD MET A 1 56.025 38.339 35.612 1.00 2.75 S ATOM 8 CE MET A 1 57.693 38.808 35.263 1.00 2.75 C ATOM 9 N PRO A 2 52.639 39.190 31.852 1.00 6.58 N ATOM 10 CA PRO A 2 51.588 38.262 31.417 1.00 6.58 C ATOM 11 C PRO A 2 51.672 36.859 
32.016 1.00 6.58 C ATOM 12 O PRO A 2 52.754 36.306 32.174 1.00 6.58 O ATOM 13 CB PRO A 2 51.762 38.212 29.890 1.00 6.58 C ATOM 14 CG PRO A 2 52.746 39.341 29.567 1.00 17.56 C ATOM 15 CD PRO A 2 53.627 39.381 30.780 1.00 28.54 C ATOM 16 N LYS A 3 50.524 36.287 32.352 1.00 6.58 N ATOM 17 CA LYS A 3 50.484 34.928 32.870 1.00 6.58 C ATOM 18 C LYS A 3 50.086 34.114 31.642 1.00 6.58 C ATOM 19 O LYS A 3 49.199 34.526 30.894 1.00 6.58 O ATOM 20 CB LYS A 3 49.430 34.799 33.972 1.00 6.58 C ATOM 21 CG LYS A 3 49.622 35.779 35.119 1.00 17.56 C ATOM 22 CD LYS A 3 48.606 35.530 36.216 1.00 28.54 C ATOM 23 CE LYS A 3 48.810 36.475 37.389 1.00 39.52 C ATOM 24 NZ LYS A 3 47.845 36.196 38.491 1.00 50.50 N ATOM 25 N TYR A 4 50.760 32.992 31.414 1.00 2.75 N ATOM 26 CA TYR A 4 50.475 32.153 30.246 1.00 2.75 C ATOM 27 C TYR A 4 49.716 30.873 30.590 1.00 2.75 C ATOM 28 O TYR A 4 49.948 30.254 31.629 1.00 6.58 O ATOM 29 CB TYR A 4 51.779 31.798 29.515 1.00 2.75 C ATOM 30 CG TYR A 4 52.453 32.994 28.860 1.00 2.75 C ATOM 31 CD1 TYR A 4 53.107 33.965 29.627 1.00 2.75 C ATOM 32 CD2 TYR A 4 52.382 33.190 27.474 1.00 2.75 C ATOM 33 CE1 TYR A 4 53.664 35.112 29.028 1.00 2.75 C ATOM 34 CE2 TYR A 4 52.943 34.330 26.868 1.00 2.75 C ATOM 35 CZ TYR A 4 53.576 35.287 27.649 1.00 2.75 C ATOM 36 OH TYR A 4 54.090 36.431 27.057 1.00 2.75 O ATOM 37 N ALA A 5 48.784 30.502 29.719 1.00 6.58 N ATOM 38 CA ALA A 5 47.984 29.298 29.911 1.00 6.58 C ATOM 39 C ALA A 5 48.010 28.497 28.621 1.00 6.58 C ATOM 40 O ALA A 5 48.174 29.058 27.541 1.00 6.58 O ATOM 41 CB ALA A 5 46.540 29.667 30.281 1.00 6.58 C ATOM 42 N PRO A 6 47.873 27.171 28.723 1.00 2.75 N ATOM 43 CA PRO A 6 47.873 26.279 27.562 1.00 2.75 C ATOM 44 C PRO A 6 46.806 26.625 26.505 1.00 2.75 C ATOM 45 O PRO A 6 47.047 26.448 25.314 1.00 6.58 O ATOM 46 CB PRO A 6 47.682 24.887 28.186 1.00 2.75 C ATOM 47 CG PRO A 6 47.204 25.169 29.610 1.00 2.75 C ATOM 48 CD PRO A 6 47.957 26.398 29.972 1.00 2.75 C ATOM 49 N HIS A 7 45.641 27.109 26.944 1.00 6.58 N ATOM 50 
CA HIS A 7 44.553 27.501 26.031 1.00 6.58 C ATOM 51 C HIS A 7 43.788 28.682 26.635 1.00 6.58 C ATOM 52 O HIS A 7 43.749 28.823 27.855 1.00 6.58 O ATOM 53 CB HIS A 7 43.549 26.351 25.817 1.00 6.58 C ATOM 54 CG HIS A 7 44.171 25.053 25.402 1.00 17.56 C ATOM 55 ND1 HIS A 7 44.791 24.206 26.296 1.00 28.54 N ATOM 56 CD2 HIS A 7 44.249 24.448 24.193 1.00 28.54 C ATOM 57 CE1 HIS A 7 45.222 23.133 25.656 1.00 39.52 C ATOM 58 NE2 HIS A 7 44.906 23.256 24.380 1.00 39.52 N ATOM 59 N VAL A 8 43.190 29.529 25.791 1.00 6.58 N ATOM 60 CA VAL A 8 42.399 30.686 26.264 1.00 6.58 C ATOM 61 C VAL A 8 41.139 30.865 25.388 1.00 6.58 C ATOM 62 O VAL A 8 41.162 30.538 24.202 1.00 6.58 O ATOM 63 CB VAL A 8 43.231 32.013 26.272 1.00 6.58 C ATOM 64 CG1 VAL A 8 44.314 31.951 27.342 1.00 17.56 C ATOM 65 CG2 VAL A 8 43.857 32.253 24.902 1.00 17.56 C ATOM 66 N TYR A 9 40.064 31.405 25.968 1.00 6.58 N ATOM 67 CA TYR A 9 38.773 31.561 25.260 1.00 6.58 C ATOM 68 C TYR A 9 38.145 32.966 25.195 1.00 6.58 C ATOM 69 O TYR A 9 38.157 33.712 26.171 1.00 6.58 O ATOM 70 CB TYR A 9 37.755 30.602 25.888 1.00 6.58 C ATOM 71 CG TYR A 9 38.240 29.170 25.920 1.00 17.56 C ATOM 72 CD1 TYR A 9 38.045 28.322 24.828 1.00 28.54 C ATOM 73 CD2 TYR A 9 38.961 28.686 27.010 1.00 28.54 C ATOM 74 CE1 TYR A 9 38.559 27.031 24.819 1.00 39.52 C ATOM 75 CE2 TYR A 9 39.481 27.400 27.009 1.00 39.52 C ATOM 76 CZ TYR A 9 39.277 26.577 25.911 1.00 50.50 C ATOM 77 OH TYR A 9 39.797 25.302 25.901 1.00 61.48 O ATOM 78 N THR A 10 37.555 33.294 24.047 1.00 6.58 N ATOM 79 CA THR A 10 36.938 34.606 23.834 1.00 6.58 C ATOM 80 C THR A 10 35.470 34.561 23.400 1.00 6.58 C ATOM 81 O THR A 10 34.818 35.600 23.302 1.00 6.58 O ATOM 82 CB THR A 10 37.710 35.414 22.763 1.00 6.58 C ATOM 83 OG1 THR A 10 37.602 34.759 21.494 1.00 17.56 O ATOM 84 CG2 THR A 10 39.181 35.523 23.135 1.00 17.56 C ATOM 85 N GLU A 11 34.952 33.368 23.138 1.00 6.58 N ATOM 86 CA GLU A 11 33.562 33.228 22.698 1.00 6.58 C ATOM 87 C GLU A 11 32.609 33.191 23.900 1.00 6.58 C ATOM 
88 O GLU A 11 32.825 32.439 24.851 1.00 6.58 O ATOM 89 CB GLU A 11 33.409 31.955 21.860 1.00 6.58 C ATOM 90 CG GLU A 11 32.145 31.906 21.007 1.00 17.56 C ATOM 91 CD GLU A 11 32.180 32.899 19.855 1.00 28.54 C ATOM 92 OE1 GLU A 11 33.222 32.977 19.167 1.00 39.52 O ATOM 93 OE2 GLU A 11 31.163 33.590 19.630 1.00 39.52 O ATOM 94 N GLN A 12 31.555 34.004 23.844 1.00 6.58 N ATOM 95 CA GLN A 12 30.586 34.098 24.931 1.00 6.58 C ATOM 96 C GLN A 12 29.975 32.785 25.415 1.00 6.58 C ATOM 97 O GLN A 12 29.537 32.694 26.564 1.00 6.58 O ATOM 98 CB GLN A 12 29.479 35.093 24.565 1.00 6.58 C ATOM 99 CG GLN A 12 30.005 36.518 24.420 1.00 17.56 C ATOM 100 CD GLN A 12 30.952 36.896 25.552 1.00 28.54 C ATOM 101 OE1 GLN A 12 30.536 37.055 26.703 1.00 39.52 O ATOM 102 NE2 GLN A 12 32.236 37.027 25.228 1.00 39.52 N ATOM 103 N ALA A 13 29.952 31.768 24.558 1.00 6.58 N ATOM 104 CA ALA A 13 29.405 30.476 24.948 1.00 6.58 C ATOM 105 C ALA A 13 30.357 29.750 25.894 1.00 6.58 C ATOM 106 O ALA A 13 29.918 29.074 26.823 1.00 6.58 O ATOM 107 CB ALA A 13 29.142 29.620 23.723 1.00 6.58 C ATOM 108 N GLN A 14 31.660 29.883 25.645 1.00 6.58 N ATOM 109 CA GLN A 14 32.674 29.247 26.475 1.00 6.58 C ATOM 110 C GLN A 14 32.864 30.020 27.770 1.00 6.58 C ATOM 111 O GLN A 14 33.198 29.439 28.797 1.00 6.58 O ATOM 112 CB GLN A 14 34.019 29.149 25.739 1.00 6.58 C ATOM 113 CG GLN A 14 34.141 28.000 24.748 1.00 17.56 C ATOM 114 CD GLN A 14 33.025 27.982 23.720 1.00 28.54 C ATOM 115 OE1 GLN A 14 31.896 27.591 24.019 1.00 39.52 O ATOM 116 NE2 GLN A 14 33.335 28.413 22.501 1.00 39.52 N ATOM 117 N ILE A 15 32.663 31.332 27.720 1.00 6.58 N ATOM 118 CA ILE A 15 32.798 32.133 28.920 1.00 6.58 C ATOM 119 C ILE A 15 31.600 31.819 29.811 1.00 6.58 C ATOM 120 O ILE A 15 31.727 31.754 31.044 1.00 6.58 O ATOM 121 CB ILE A 15 32.826 33.635 28.602 1.00 6.58 C ATOM 122 CG1 ILE A 15 34.036 33.962 27.724 1.00 17.56 C ATOM 123 CG2 ILE A 15 32.934 34.427 29.889 1.00 17.56 C ATOM 124 CD1 ILE A 15 34.139 35.417 27.365 1.00 28.54 C ATOM 
125 N ALA A 16 30.436 31.616 29.191 1.00 6.58 N ATOM 126 CA ALA A 16 29.221 31.282 29.941 1.00 6.58 C ATOM 127 C ALA A 16 29.416 29.948 30.664 1.00 6.58 C ATOM 128 O ALA A 16 29.005 29.787 31.814 1.00 6.58 O ATOM 129 CB ALA A 16 28.018 31.198 28.997 1.00 6.58 C ATOM 130 N THR A 17 30.047 28.996 29.984 1.00 6.58 N ATOM 131 CA THR A 17 30.317 27.690 30.573 1.00 6.58 C ATOM 132 C THR A 17 31.305 27.861 31.732 1.00 6.58 C ATOM 133 O THR A 17 31.047 27.402 32.844 1.00 6.58 O ATOM 134 CB THR A 17 30.914 26.708 29.530 1.00 6.58 C ATOM 135 OG1 THR A 17 32.126 27.249 28.999 1.00 17.56 O ATOM 136 CG2 THR A 17 29.944 26.499 28.383 1.00 17.56 C ATOM 137 N LEU A 18 32.419 28.544 31.472 1.00 6.58 N ATOM 138 CA LEU A 18 33.430 28.774 32.498 1.00 6.58 C ATOM 139 C LEU A 18 32.868 29.412 33.765 1.00 6.58 C ATOM 140 O LEU A 18 33.285 29.069 34.872 1.00 6.58 O ATOM 141 CB LEU A 18 34.560 29.652 31.942 1.00 6.58 C ATOM 142 CG LEU A 18 35.487 29.006 30.907 1.00 17.56 C ATOM 143 CD1 LEU A 18 36.392 30.058 30.282 1.00 28.54 C ATOM 144 CD2 LEU A 18 36.317 27.922 31.580 1.00 28.54 C ATOM 145 N GLU A 19 31.922 30.335 33.608 1.00 6.58 N ATOM 146 CA GLU A 19 31.335 31.017 34.760 1.00 6.58 C ATOM 147 C GLU A 19 30.371 30.150 35.560 1.00 6.58 C ATOM 148 O GLU A 19 30.310 30.256 36.785 1.00 6.58 O ATOM 149 CB GLU A 19 30.644 32.316 34.308 1.00 6.58 C ATOM 150 CG GLU A 19 31.636 33.396 33.881 1.00 17.56 C ATOM 151 CD GLU A 19 30.969 34.633 33.303 1.00 28.54 C ATOM 152 OE1 GLU A 19 30.217 34.502 32.315 1.00 39.52 O ATOM 153 OE2 GLU A 19 31.206 35.741 33.830 1.00 39.52 O ATOM 154 N HIS A 20 29.621 29.289 34.874 1.00 6.58 N ATOM 155 CA HIS A 20 28.678 28.406 35.560 1.00 6.58 C ATOM 156 C HIS A 20 29.426 27.414 36.463 1.00 6.58 C ATOM 157 O HIS A 20 28.944 27.068 37.531 1.00 6.58 O ATOM 158 CB HIS A 20 27.823 27.637 34.545 1.00 6.58 C ATOM 159 CG HIS A 20 26.814 26.725 35.171 1.00 17.56 C ATOM 160 ND1 HIS A 20 25.792 27.184 35.973 1.00 28.54 N ATOM 161 CD2 HIS A 20 26.676 25.379 35.118 1.00 28.54 C 
ATOM 162 CE1 HIS A 20 25.068 26.159 36.390 1.00 39.52 C ATOM 163 NE2 HIS A 20 25.584 25.053 35.885 1.00 39.52 N ATOM 164 N TRP A 21 30.608 26.975 36.028 1.00 6.58 N ATOM 165 CA TRP A 21 31.416 26.027 36.799 1.00 6.58 C ATOM 166 C TRP A 21 31.830 26.606 38.143 1.00 6.58 C ATOM 167 O TRP A 21 31.872 25.895 39.150 1.00 6.58 O ATOM 168 CB TRP A 21 32.679 25.629 36.021 1.00 6.58 C ATOM 169 CG TRP A 21 32.440 24.737 34.824 1.00 17.56 C ATOM 170 CD1 TRP A 21 33.259 24.592 33.740 1.00 28.54 C ATOM 171 CD2 TRP A 21 31.327 23.857 34.604 1.00 28.54 C ATOM 172 NE1 TRP A 21 32.728 23.683 32.861 1.00 39.52 N ATOM 173 CE2 TRP A 21 31.543 23.215 33.365 1.00 39.52 C ATOM 174 CE3 TRP A 21 30.169 23.549 35.333 1.00 39.52 C ATOM 175 CZ2 TRP A 21 30.644 22.283 32.837 1.00 50.50 C ATOM 176 CZ3 TRP A 21 29.275 22.623 34.807 1.00 50.50 C ATOM 177 CH2 TRP A 21 29.520 22.001 33.570 1.00 61.48 C ATOM 178 N VAL A 22 32.156 27.895 38.154 1.00 6.58 N ATOM 179 CA VAL A 22 32.554 28.557 39.391 1.00 6.58 C ATOM 180 C VAL A 22 31.392 28.493 40.386 1.00 6.58 C ATOM 181 O VAL A 22 31.580 28.181 41.551 1.00 6.58 O ATOM 182 CB VAL A 22 32.951 30.034 39.125 1.00 6.58 C ATOM 183 CG1 VAL A 22 33.343 30.730 40.426 1.00 17.56 C ATOM 184 CG2 VAL A 22 34.112 30.078 38.131 1.00 17.56 C ATOM 185 N LYS A 23 30.191 28.776 39.893 1.00 6.58 N ATOM 186 CA LYS A 23 28.977 28.742 40.699 1.00 6.58 C ATOM 187 C LYS A 23 28.775 27.375 41.351 1.00 6.58 C ATOM 188 O LYS A 23 28.181 27.267 42.423 1.00 6.58 O ATOM 189 CB LYS A 23 27.764 29.067 39.819 1.00 6.58 C ATOM 190 CG LYS A 23 27.639 30.528 39.416 1.00 17.56 C ATOM 191 CD LYS A 23 27.106 31.370 40.562 1.00 28.54 C ATOM 192 CE LYS A 23 25.840 30.742 41.131 1.00 39.52 C ATOM 193 NZ LYS A 23 24.916 30.311 40.043 1.00 50.50 N ATOM 194 N LEU A 24 29.264 26.331 40.692 1.00 6.58 N ATOM 195 CA LEU A 24 29.131 24.979 41.207 1.00 6.58 C ATOM 196 C LEU A 24 30.387 24.503 41.944 1.00 6.58 C ATOM 197 O LEU A 24 30.421 23.392 42.461 1.00 6.58 O ATOM 198 CB LEU A 24 28.810 24.023 
40.061 1.00 6.58 C ATOM 199 CG LEU A 24 27.567 24.381 39.240 1.00 17.56 C ATOM 200 CD1 LEU A 24 27.409 23.373 38.107 1.00 28.54 C ATOM 201 CD2 LEU A 24 26.331 24.393 40.128 1.00 28.54 C ATOM 202 N LEU A 25 31.416 25.341 41.995 1.00 6.58 N ATOM 203 CA LEU A 25 32.658 24.969 42.669 1.00 6.58 C ATOM 204 C LEU A 25 33.010 25.943 43.784 1.00 6.58 C ATOM 205 O LEU A 25 34.124 26.475 43.828 1.00 6.58 O ATOM 206 CB LEU A 25 33.818 24.922 41.665 1.00 6.58 C ATOM 207 CG LEU A 25 33.829 23.785 40.641 1.00 17.56 C ATOM 208 CD1 LEU A 25 34.858 24.061 39.555 1.00 28.54 C ATOM 209 CD2 LEU A 25 34.134 22.478 41.348 1.00 28.54 C ATOM 210 N ASP A 26 32.069 26.175 44.690 1.00 2.75 N ATOM 211 CA ASP A 26 32.325 27.100 45.777 1.00 2.75 C ATOM 212 C ASP A 26 33.445 26.600 46.704 1.00 2.75 C ATOM 213 O ASP A 26 33.659 25.394 46.864 1.00 2.75 O ATOM 214 CB ASP A 26 31.043 27.349 46.577 1.00 2.75 C ATOM 215 CG ASP A 26 31.174 28.526 47.521 1.00 2.75 C ATOM 216 OD1 ASP A 26 30.323 28.657 48.431 1.00 2.75 O ATOM 217 OD2 ASP A 26 32.126 29.333 47.354 1.00 2.75 O ATOM 218 N GLY A 27 34.162 27.543 47.309 1.00 2.75 N ATOM 219 CA GLY A 27 35.245 27.182 48.207 1.00 2.75 C ATOM 220 C GLY A 27 34.850 26.148 49.249 1.00 2.75 C ATOM 221 O GLY A 27 33.798 26.272 49.896 1.00 2.75 O ATOM 222 N GLN A 28 35.691 25.116 49.379 1.00 2.75 N ATOM 223 CA GLN A 28 35.525 24.020 50.344 1.00 2.75 C ATOM 224 C GLN A 28 34.394 23.025 50.087 1.00 2.75 C ATOM 225 O GLN A 28 34.154 22.150 50.911 1.00 2.75 O ATOM 226 CB GLN A 28 35.385 24.582 51.773 1.00 2.75 C ATOM 227 CG GLN A 28 36.645 25.245 52.319 1.00 2.75 C ATOM 228 CD GLN A 28 37.835 24.295 52.390 1.00 2.75 C ATOM 229 OE1 GLN A 28 37.859 23.373 53.197 1.00 2.75 O ATOM 230 NE2 GLN A 28 38.827 24.521 51.539 1.00 2.75 N ATOM 231 N GLU A 29 33.703 23.141 48.953 1.00 2.75 N ATOM 232 CA GLU A 29 32.606 22.224 48.651 1.00 2.75 C ATOM 233 C GLU A 29 33.156 20.908 48.112 1.00 2.75 C ATOM 234 O GLU A 29 34.122 20.893 47.346 1.00 2.75 O ATOM 235 CB GLU A 29 31.663 22.859 
47.622 1.00 2.75 C ATOM 236 CG GLU A 29 30.332 22.146 47.421 1.00 2.75 C ATOM 237 CD GLU A 29 29.298 22.461 48.496 1.00 2.75 C ATOM 238 OE1 GLU A 29 28.934 21.526 49.252 1.00 2.75 O ATOM 239 OE2 GLU A 29 28.838 23.629 48.589 1.00 2.75 O ATOM 240 N ARG A 30 32.548 19.804 48.524 1.00 2.75 N ATOM 241 CA ARG A 30 32.983 18.485 48.070 1.00 2.75 C ATOM 242 C ARG A 30 32.229 18.140 46.798 1.00 2.75 C ATOM 243 O ARG A 30 30.995 18.173 46.768 1.00 2.75 O ATOM 244 CB ARG A 30 32.708 17.446 49.148 1.00 2.75 C ATOM 245 CG ARG A 30 33.678 17.503 50.319 1.00 2.75 C ATOM 246 CD ARG A 30 33.196 16.634 51.464 1.00 2.75 C ATOM 247 NE ARG A 30 33.169 15.211 51.129 1.00 2.75 N ATOM 248 CZ ARG A 30 32.354 14.339 51.707 1.00 2.75 C ATOM 249 NH1 ARG A 30 31.509 14.756 52.639 1.00 2.75 N ATOM 250 NH2 ARG A 30 32.382 13.059 51.355 1.00 2.75 N ATOM 251 N VAL A 31 32.969 17.827 45.740 1.00 6.58 N ATOM 252 CA VAL A 31 32.339 17.513 44.470 1.00 6.58 C ATOM 253 C VAL A 31 33.095 16.443 43.700 1.00 6.58 C ATOM 254 O VAL A 31 34.211 16.063 44.063 1.00 6.58 O ATOM 255 CB VAL A 31 32.263 18.770 43.580 1.00 6.58 C ATOM 256 CG1 VAL A 31 31.510 19.879 44.295 1.00 17.56 C ATOM 257 CG2 VAL A 31 33.669 19.236 43.236 1.00 17.56 C ATOM 258 N ARG A 32 32.456 15.940 42.654 1.00 6.58 N ATOM 259 CA ARG A 32 33.070 14.951 41.777 1.00 6.58 C ATOM 260 C ARG A 32 33.007 15.667 40.441 1.00 6.58 C ATOM 261 O ARG A 32 31.963 16.217 40.080 1.00 6.58 O ATOM 262 CB ARG A 32 32.253 13.665 41.725 1.00 6.58 C ATOM 263 CG ARG A 32 33.072 12.405 41.464 1.00 17.56 C ATOM 264 CD ARG A 32 32.308 11.169 41.939 1.00 28.54 C ATOM 265 NE ARG A 32 30.906 11.193 41.524 1.00 39.52 N ATOM 266 CZ ARG A 32 30.494 11.171 40.259 1.00 50.50 C ATOM 267 NH1 ARG A 32 31.376 11.121 39.269 1.00 61.48 N ATOM 268 NH2 ARG A 32 29.199 11.210 39.983 1.00 61.48 N ATOM 269 N ILE A 33 34.120 15.665 39.719 1.00 6.58 N ATOM 270 CA ILE A 33 34.210 16.366 38.447 1.00 6.58 C ATOM 271 C ILE A 33 34.430 15.472 37.237 1.00 6.58 C ATOM 272 O ILE A 33 35.204 
14.525 37.299 1.00 6.58 O ATOM 273 CB ILE A 33 35.367 17.377 38.506 1.00 6.58 C ATOM 274 CG1 ILE A 33 35.173 18.297 39.716 1.00 17.56 C ATOM 275 CG2 ILE A 33 35.443 18.176 37.210 1.00 17.56 C ATOM 276 CD1 ILE A 33 36.352 19.210 39.990 1.00 28.54 C ATOM 277 N GLU A 34 33.740 15.785 36.142 1.00 6.58 N ATOM 278 CA GLU A 34 33.897 15.045 34.893 1.00 6.58 C ATOM 279 C GLU A 34 34.785 15.897 33.977 1.00 6.58 C ATOM 280 O GLU A 34 34.471 17.060 33.713 1.00 6.58 O ATOM 281 CB GLU A 34 32.542 14.794 34.216 1.00 6.58 C ATOM 282 CG GLU A 34 31.610 13.862 34.986 1.00 17.56 C ATOM 283 CD GLU A 34 30.434 13.382 34.143 1.00 28.54 C ATOM 284 OE1 GLU A 34 29.525 12.721 34.694 1.00 39.52 O ATOM 285 OE2 GLU A 34 30.424 13.662 32.924 1.00 39.52 O ATOM 286 N LEU A 35 35.885 15.308 33.503 1.00 6.58 N ATOM 287 CA LEU A 35 36.852 15.980 32.627 1.00 6.58 C ATOM 288 C LEU A 35 36.719 15.504 31.179 1.00 6.58 C ATOM 289 O LEU A 35 36.297 14.379 30.938 1.00 6.58 O ATOM 290 CB LEU A 35 38.273 15.705 33.123 1.00 6.58 C ATOM 291 CG LEU A 35 38.619 16.198 34.532 1.00 17.56 C ATOM 292 CD1 LEU A 35 39.872 15.507 35.024 1.00 28.54 C ATOM 293 CD2 LEU A 35 38.793 17.708 34.522 1.00 28.54 C ATOM 294 N ASP A 36 37.115 16.349 30.225 1.00 6.58 N ATOM 295 CA ASP A 36 37.004 16.013 28.797 1.00 6.58 C ATOM 296 C ASP A 36 37.945 14.938 28.245 1.00 6.58 C ATOM 297 O ASP A 36 37.946 14.673 27.045 1.00 6.58 O ATOM 298 CB ASP A 36 37.122 17.287 27.941 1.00 6.58 C ATOM 299 CG ASP A 36 38.504 17.921 27.999 1.00 17.56 C ATOM 300 OD1 ASP A 36 39.416 17.334 28.609 1.00 28.54 O ATOM 301 OD2 ASP A 36 38.678 19.014 27.420 1.00 28.54 O ATOM 302 N ASP A 37 38.739 14.321 29.118 1.00 6.58 N ATOM 303 CA ASP A 37 39.658 13.258 28.710 1.00 6.58 C ATOM 304 C ASP A 37 39.108 11.907 29.186 1.00 6.58 C ATOM 305 O ASP A 37 39.765 10.868 29.067 1.00 6.58 O ATOM 306 CB ASP A 37 41.033 13.499 29.321 1.00 6.58 C ATOM 307 CG ASP A 37 41.061 13.216 30.801 1.00 17.56 C ATOM 308 OD1 ASP A 37 40.058 13.532 31.477 1.00 28.54 O ATOM 309 OD2 ASP 
A 37 42.087 12.688 31.281 1.00 28.54 O ATOM 310 N GLY A 38 37.892 11.937 29.726 1.00 6.58 N ATOM 311 CA GLY A 38 37.248 10.734 30.211 1.00 6.58 C ATOM 312 C GLY A 38 37.481 10.496 31.693 1.00 6.58 C ATOM 313 O GLY A 38 36.758 9.731 32.334 1.00 6.58 O ATOM 314 N SER A 39 38.493 11.155 32.243 1.00 6.58 N ATOM 315 CA SER A 39 38.815 10.988 33.653 1.00 6.58 C ATOM 316 C SER A 39 37.890 11.800 34.557 1.00 6.58 C ATOM 317 O SER A 39 37.053 12.585 34.099 1.00 6.58 O ATOM 318 CB SER A 39 40.277 11.371 33.912 1.00 6.58 C ATOM 319 OG SER A 39 40.487 12.754 33.679 1.00 17.56 O ATOM 320 N MET A 40 38.050 11.597 35.856 1.00 6.58 N ATOM 321 CA MET A 40 37.233 12.285 36.832 1.00 6.58 C ATOM 322 C MET A 40 37.985 12.348 38.140 1.00 6.58 C ATOM 323 O MET A 40 38.601 11.372 38.562 1.00 6.58 O ATOM 324 CB MET A 40 35.899 11.554 37.006 1.00 6.58 C ATOM 325 CG MET A 40 36.002 10.039 37.107 1.00 17.56 C ATOM 326 SD MET A 40 36.695 9.434 38.663 1.00 28.54 S ATOM 327 CE MET A 40 35.193 9.171 39.631 1.00 39.52 C ATOM 328 N ILE A 41 37.956 13.521 38.760 1.00 6.58 N ATOM 329 CA ILE A 41 38.626 13.734 40.035 1.00 6.58 C ATOM 330 C ILE A 41 37.588 14.131 41.067 1.00 6.58 C ATOM 331 O ILE A 41 36.634 14.841 40.760 1.00 6.58 O ATOM 332 CB ILE A 41 39.698 14.847 39.930 1.00 6.58 C ATOM 333 CG1 ILE A 41 40.390 15.028 41.284 1.00 17.56 C ATOM 334 CG2 ILE A 41 39.053 16.159 39.486 1.00 17.56 C ATOM 335 CD1 ILE A 41 41.615 15.922 41.246 1.00 28.54 C ATOM 336 N ALA A 42 37.766 13.660 42.291 1.00 6.58 N ATOM 337 CA ALA A 42 36.836 13.978 43.360 1.00 6.58 C ATOM 338 C ALA A 42 37.634 14.504 44.534 1.00 6.58 C ATOM 339 O ALA A 42 38.774 14.097 44.742 1.00 6.58 O ATOM 340 CB ALA A 42 36.071 12.749 43.766 1.00 6.58 C ATOM 341 N GLY A 43 37.037 15.419 45.287 1.00 6.58 N ATOM 342 CA GLY A 43 37.728 15.971 46.428 1.00 6.58 C ATOM 343 C GLY A 43 37.130 17.262 46.912 1.00 6.58 C ATOM 344 O GLY A 43 35.950 17.552 46.662 1.00 6.58 O ATOM 345 N THR A 44 37.944 18.050 47.604 1.00 2.75 N ATOM 346 CA THR A 44 37.469 
19.325 48.114 1.00 2.75 C ATOM 347 C THR A 44 38.065 20.509 47.359 1.00 2.75 C ATOM 348 O THR A 44 39.271 20.567 47.114 1.00 2.75 O ATOM 349 CB THR A 44 37.797 19.483 49.615 1.00 2.75 C ATOM 350 OG1 THR A 44 37.466 18.279 50.314 1.00 2.75 O ATOM 351 CG2 THR A 44 36.975 20.619 50.213 1.00 2.75 C ATOM 352 N VAL A 45 37.196 21.436 46.968 1.00 2.75 N ATOM 353 CA VAL A 45 37.608 22.664 46.288 1.00 2.75 C ATOM 354 C VAL A 45 38.435 23.444 47.314 1.00 2.75 C ATOM 355 O VAL A 45 37.888 23.937 48.291 1.00 2.75 O ATOM 356 CB VAL A 45 36.388 23.534 45.910 1.00 2.75 C ATOM 357 CG1 VAL A 45 36.858 24.819 45.199 1.00 2.75 C ATOM 358 CG2 VAL A 45 35.433 22.744 45.015 1.00 2.75 C ATOM 359 N ALA A 46 39.743 23.561 47.088 1.00 2.75 N ATOM 360 CA ALA A 46 40.609 24.277 48.028 1.00 2.75 C ATOM 361 C ALA A 46 40.193 25.733 48.224 1.00 2.75 C ATOM 362 O ALA A 46 40.119 26.219 49.343 1.00 2.75 O ATOM 363 CB ALA A 46 42.066 24.210 47.557 1.00 2.75 C ATOM 364 N VAL A 47 39.921 26.428 47.126 1.00 2.75 N ATOM 365 CA VAL A 47 39.525 27.827 47.191 1.00 2.75 C ATOM 366 C VAL A 47 38.636 28.117 45.976 1.00 2.75 C ATOM 367 O VAL A 47 38.873 27.569 44.901 1.00 2.75 O ATOM 368 CB VAL A 47 40.782 28.736 47.192 1.00 2.75 C ATOM 369 CG1 VAL A 47 41.524 28.615 45.863 1.00 2.75 C ATOM 370 CG2 VAL A 47 40.401 30.177 47.472 1.00 2.75 C ATOM 371 N ARG A 48 37.605 28.948 46.141 1.00 6.58 N ATOM 372 CA ARG A 48 36.722 29.256 45.014 1.00 6.58 C ATOM 373 C ARG A 48 37.595 29.782 43.884 1.00 6.58 C ATOM 374 O ARG A 48 38.360 30.727 44.074 1.00 6.58 O ATOM 375 CB ARG A 48 35.674 30.313 45.392 1.00 6.58 C ATOM 376 CG ARG A 48 34.751 30.691 44.227 1.00 17.56 C ATOM 377 CD ARG A 48 33.809 31.848 44.571 1.00 28.54 C ATOM 378 NE ARG A 48 32.778 31.472 45.535 1.00 39.52 N ATOM 379 CZ ARG A 48 31.814 32.286 45.956 1.00 50.50 C ATOM 380 NH1 ARG A 48 31.742 33.528 45.500 1.00 61.48 N ATOM 381 NH2 ARG A 48 30.919 31.858 46.836 1.00 61.48 N ATOM 382 N PRO A 49 37.507 29.162 42.696 1.00 6.58 N ATOM 383 CA PRO A 49 38.319 
29.607 41.558 1.00 6.58 C ATOM 384 C PRO A 49 37.776 30.880 40.921 1.00 6.58 C ATOM 385 O PRO A 49 36.635 31.274 41.170 1.00 6.58 O ATOM 386 CB PRO A 49 38.263 28.414 40.614 1.00 6.58 C ATOM 387 CG PRO A 49 36.890 27.875 40.853 1.00 17.56 C ATOM 388 CD PRO A 49 36.769 27.929 42.363 1.00 28.54 C ATOM 389 N THR A 50 38.604 31.520 40.101 1.00 6.58 N ATOM 390 CA THR A 50 38.206 32.754 39.433 1.00 6.58 C ATOM 391 C THR A 50 38.573 32.681 37.959 1.00 6.58 C ATOM 392 O THR A 50 39.582 32.073 37.598 1.00 6.58 O ATOM 393 CB THR A 50 38.931 33.963 40.049 1.00 6.58 C ATOM 394 OG1 THR A 50 40.337 33.847 39.802 1.00 17.56 O ATOM 395 CG2 THR A 50 38.694 34.014 41.553 1.00 17.56 C ATOM 396 N ILE A 51 37.751 33.282 37.108 1.00 6.58 N ATOM 397 CA ILE A 51 38.036 33.292 35.676 1.00 6.58 C ATOM 398 C ILE A 51 38.876 34.539 35.412 1.00 6.58 C ATOM 399 O ILE A 51 38.528 35.631 35.875 1.00 6.58 O ATOM 400 CB ILE A 51 36.733 33.355 34.840 1.00 6.58 C ATOM 401 CG1 ILE A 51 35.838 32.158 35.182 1.00 17.56 C ATOM 402 CG2 ILE A 51 37.063 33.354 33.357 1.00 17.56 C ATOM 403 CD1 ILE A 51 36.490 30.797 34.969 1.00 28.54 C ATOM 404 N GLN A 52 39.991 34.372 34.700 1.00 6.58 N ATOM 405 CA GLN A 52 40.891 35.486 34.406 1.00 6.58 C ATOM 406 C GLN A 52 41.364 35.523 32.958 1.00 6.58 C ATOM 407 O GLN A 52 41.000 34.676 32.141 1.00 6.58 O ATOM 408 CB GLN A 52 42.131 35.411 35.296 1.00 6.58 C ATOM 409 CG GLN A 52 41.862 35.283 36.779 1.00 17.56 C ATOM 410 CD GLN A 52 43.097 34.864 37.554 1.00 28.54 C ATOM 411 OE1 GLN A 52 44.168 35.450 37.403 1.00 39.52 O ATOM 412 NE2 GLN A 52 42.951 33.848 38.393 1.00 39.52 N ATOM 413 N THR A 53 42.198 36.518 32.665 1.00 6.58 N ATOM 414 CA THR A 53 42.784 36.708 31.335 1.00 6.58 C ATOM 415 C THR A 53 44.222 36.172 31.290 1.00 6.58 C ATOM 416 O THR A 53 45.039 36.458 32.173 1.00 6.58 O ATOM 417 CB THR A 53 42.822 38.199 30.954 1.00 6.58 C ATOM 418 OG1 THR A 53 41.497 38.745 31.002 1.00 17.56 O ATOM 419 CG2 THR A 53 43.394 38.368 29.555 1.00 17.56 C ATOM 420 N TYR A 54 
44.537 35.418 30.244 1.00 6.58 N ATOM 421 CA TYR A 54 45.881 34.843 30.080 1.00 6.58 C ATOM 422 C TYR A 54 46.313 34.975 28.624 1.00 6.58 C ATOM 423 O TYR A 54 45.550 35.463 27.797 1.00 6.58 O ATOM 424 CB TYR A 54 45.871 33.352 30.440 1.00 6.58 C ATOM 425 CG TYR A 54 45.516 33.023 31.878 1.00 17.56 C ATOM 426 CD1 TYR A 54 46.472 33.103 32.889 1.00 28.54 C ATOM 427 CD2 TYR A 54 44.231 32.599 32.219 1.00 28.54 C ATOM 428 CE1 TYR A 54 46.158 32.763 34.202 1.00 39.52 C ATOM 429 CE2 TYR A 54 43.908 32.260 33.525 1.00 39.52 C ATOM 430 CZ TYR A 54 44.875 32.343 34.512 1.00 50.50 C ATOM 431 OH TYR A 54 44.559 32.000 35.806 1.00 61.48 O ATOM 432 N ARG A 55 47.541 34.545 28.335 1.00 2.75 N ATOM 433 CA ARG A 55 48.108 34.533 26.980 1.00 2.75 C ATOM 434 C ARG A 55 48.536 33.107 26.654 1.00 2.75 C ATOM 435 O ARG A 55 49.060 32.393 27.524 1.00 2.75 O ATOM 436 CB ARG A 55 49.388 35.374 26.850 1.00 2.75 C ATOM 437 CG ARG A 55 49.234 36.863 26.771 1.00 2.75 C ATOM 438 CD ARG A 55 50.560 37.443 26.279 1.00 2.75 C ATOM 439 NE ARG A 55 50.656 38.887 26.450 1.00 2.75 N ATOM 440 CZ ARG A 55 51.751 39.595 26.207 1.00 2.75 C ATOM 441 NH1 ARG A 55 52.850 38.995 25.769 1.00 2.75 N ATOM 442 NH2 ARG A 55 51.763 40.901 26.445 1.00 2.75 N ATOM 443 N ASP A 56 48.338 32.695 25.406 1.00 2.75 N ATOM 444 CA ASP A 56 48.802 31.376 24.999 1.00 2.75 C ATOM 445 C ASP A 56 50.175 31.571 24.343 1.00 2.75 C ATOM 446 O ASP A 56 50.692 32.693 24.283 1.00 2.75 O ATOM 447 CB ASP A 56 47.828 30.692 24.029 1.00 2.75 C ATOM 448 CG ASP A 56 47.463 31.558 22.828 1.00 2.75 C ATOM 449 OD1 ASP A 56 48.277 32.392 22.382 1.00 2.75 O ATOM 450 OD2 ASP A 56 46.346 31.373 22.303 1.00 2.75 O ATOM 451 N GLU A 57 50.764 30.485 23.857 1.00 2.75 N ATOM 452 CA GLU A 57 52.074 30.553 23.240 1.00 2.75 C ATOM 453 C GLU A 57 52.174 31.476 22.030 1.00 2.75 C ATOM 454 O GLU A 57 53.229 32.067 21.800 1.00 2.75 O ATOM 455 CB GLU A 57 52.540 29.159 22.842 1.00 2.75 C ATOM 456 CG GLU A 57 53.986 29.147 22.410 1.00 2.75 C ATOM 457 CD GLU A 
57 54.443 27.799 21.954 1.00 2.75 C ATOM 458 OE1 GLU A 57 54.101 26.799 22.626 1.00 2.75 O ATOM 459 OE2 GLU A 57 55.166 27.739 20.928 1.00 2.75 O ATOM 460 N GLN A 58 51.102 31.569 21.242 1.00 2.75 N ATOM 461 CA GLN A 58 51.072 32.429 20.056 1.00 2.75 C ATOM 462 C GLN A 58 50.853 33.879 20.475 1.00 2.75 C ATOM 463 O GLN A 58 50.757 34.766 19.635 1.00 2.75 O ATOM 464 CB GLN A 58 49.948 31.986 19.102 1.00 2.75 C ATOM 465 CG GLN A 58 50.322 30.826 18.180 1.00 2.75 C ATOM 466 CD GLN A 58 49.116 30.184 17.489 1.00 2.75 C ATOM 467 OE1 GLN A 58 48.243 30.876 16.942 1.00 2.75 O ATOM 468 NE2 GLN A 58 49.074 28.854 17.498 1.00 2.75 N ATOM 469 N GLU A 59 50.767 34.093 21.788 1.00 2.75 N ATOM 470 CA GLU A 59 50.568 35.405 22.400 1.00 2.75 C ATOM 471 C GLU A 59 49.160 36.017 22.289 1.00 2.75 C ATOM 472 O GLU A 59 49.020 37.232 22.351 1.00 2.75 O ATOM 473 CB GLU A 59 51.594 36.407 21.858 1.00 2.75 C ATOM 474 CG GLU A 59 53.050 35.958 21.945 1.00 2.75 C ATOM 475 CD GLU A 59 53.569 35.861 23.369 1.00 2.75 C ATOM 476 OE1 GLU A 59 53.031 36.548 24.253 1.00 2.75 O ATOM 477 OE2 GLU A 59 54.536 35.108 23.603 1.00 2.75 O ATOM 478 N ARG A 60 48.134 35.179 22.139 1.00 2.75 N ATOM 479 CA ARG A 60 46.741 35.630 22.052 1.00 2.75 C ATOM 480 C ARG A 60 46.102 35.633 23.461 1.00 2.75 C ATOM 481 O ARG A 60 46.320 34.704 24.242 1.00 2.75 O ATOM 482 CB ARG A 60 45.932 34.699 21.121 1.00 2.75 C ATOM 483 CG ARG A 60 46.589 34.384 19.760 1.00 2.75 C ATOM 484 CD ARG A 60 45.768 33.320 18.998 1.00 2.75 C ATOM 485 NE ARG A 60 46.277 32.993 17.657 1.00 2.75 N ATOM 486 CZ ARG A 60 46.263 33.827 16.623 1.00 2.75 C ATOM 487 NH1 ARG A 60 45.764 35.052 16.755 1.00 2.75 N ATOM 488 NH2 ARG A 60 46.767 33.446 15.457 1.00 2.75 N ATOM 489 N GLU A 61 45.316 36.663 23.777 1.00 2.75 N ATOM 490 CA GLU A 61 44.659 36.788 25.092 1.00 2.75 C ATOM 491 C GLU A 61 43.226 36.234 25.142 1.00 2.75 C ATOM 492 O GLU A 61 42.454 36.385 24.189 1.00 2.75 O ATOM 493 CB GLU A 61 44.632 38.253 25.526 1.00 2.75 C ATOM 494 CG GLU A 61 
45.997 38.851 25.745 1.00 2.75 C ATOM 495 CD GLU A 61 45.940 40.168 26.502 1.00 2.75 C ATOM 496 OE1 GLU A 61 46.927 40.463 27.203 1.00 2.75 O ATOM 497 OE2 GLU A 61 44.923 40.900 26.401 1.00 2.75 O ATOM 498 N GLY A 62 42.871 35.615 26.266 1.00 2.75 N ATOM 499 CA GLY A 62 41.536 35.043 26.409 1.00 2.75 C ATOM 500 C GLY A 62 41.231 34.629 27.842 1.00 2.75 C ATOM 501 O GLY A 62 42.041 34.849 28.741 1.00 2.75 O ATOM 502 N SER A 63 40.070 34.019 28.063 1.00 2.75 N ATOM 503 CA SER A 63 39.688 33.613 29.409 1.00 2.75 C ATOM 504 C SER A 63 39.919 32.142 29.725 1.00 2.75 C ATOM 505 O SER A 63 39.943 31.295 28.840 1.00 2.75 O ATOM 506 CB SER A 63 38.220 33.956 29.657 1.00 2.75 C ATOM 507 OG SER A 63 38.035 35.358 29.654 1.00 2.75 O ATOM 508 N ASN A 64 40.094 31.863 31.014 1.00 2.75 N ATOM 509 CA ASN A 64 40.303 30.513 31.527 1.00 2.75 C ATOM 510 C ASN A 64 40.424 30.614 33.048 1.00 2.75 C ATOM 511 O ASN A 64 40.265 31.694 33.611 1.00 2.75 O ATOM 512 CB ASN A 64 41.580 29.908 30.953 1.00 2.75 C ATOM 513 CG ASN A 64 41.529 28.393 30.893 1.00 2.75 C ATOM 514 OD1 ASN A 64 40.803 27.751 31.651 1.00 2.75 O ATOM 515 ND2 ASN A 64 42.319 27.816 30.008 1.00 2.75 N ATOM 516 N GLY A 65 40.705 29.492 33.705 1.00 2.75 N ATOM 517 CA GLY A 65 40.845 29.498 35.155 1.00 2.75 C ATOM 518 C GLY A 65 41.566 28.271 35.690 1.00 2.75 C ATOM 519 O GLY A 65 41.647 27.239 35.020 1.00 2.75 O ATOM 520 N GLN A 66 42.092 28.382 36.908 1.00 2.75 N ATOM 521 CA GLN A 66 42.811 27.283 37.531 1.00 2.75 C ATOM 522 C GLN A 66 42.101 26.858 38.800 1.00 2.75 C ATOM 523 O GLN A 66 41.749 27.693 39.630 1.00 6.58 O ATOM 524 CB GLN A 66 44.256 27.702 37.838 1.00 2.75 C ATOM 525 CG GLN A 66 45.061 28.035 36.572 1.00 2.75 C ATOM 526 CD GLN A 66 46.476 28.504 36.870 1.00 2.75 C ATOM 527 OE1 GLN A 66 47.325 27.715 37.271 1.00 2.75 O ATOM 528 NE2 GLN A 66 46.727 29.796 36.687 1.00 2.75 N ATOM 529 N LEU A 67 41.903 25.548 38.930 1.00 2.75 N ATOM 530 CA LEU A 67 41.223 24.933 40.070 1.00 2.75 C ATOM 531 C LEU A 67 42.171 
24.013 40.831 1.00 2.75 C ATOM 532 O LEU A 67 42.919 23.239 40.230 1.00 6.58 O ATOM 533 CB LEU A 67 40.034 24.111 39.572 1.00 2.75 C ATOM 534 CG LEU A 67 39.229 23.238 40.536 1.00 2.75 C ATOM 535 CD1 LEU A 67 38.464 24.107 41.506 1.00 2.75 C ATOM 536 CD2 LEU A 67 38.260 22.374 39.742 1.00 2.75 C ATOM 537 N ARG A 68 42.142 24.130 42.154 1.00 2.75 N ATOM 538 CA ARG A 68 42.956 23.307 43.035 1.00 2.75 C ATOM 539 C ARG A 68 42.006 22.437 43.854 1.00 2.75 C ATOM 540 O ARG A 68 41.051 22.939 44.457 1.00 6.58 O ATOM 541 CB ARG A 68 43.788 24.178 43.982 1.00 2.75 C ATOM 542 CG ARG A 68 44.669 23.384 44.943 1.00 2.75 C ATOM 543 CD ARG A 68 45.297 24.302 45.998 1.00 2.75 C ATOM 544 NE ARG A 68 45.248 23.716 47.330 1.00 2.75 N ATOM 545 CZ ARG A 68 45.555 24.357 48.452 1.00 2.75 C ATOM 546 NH1 ARG A 68 45.949 25.629 48.424 1.00 2.75 N ATOM 547 NH2 ARG A 68 45.437 23.733 49.612 1.00 2.75 N ATOM 548 N ILE A 69 42.260 21.134 43.859 1.00 2.75 N ATOM 549 CA ILE A 69 41.440 20.188 44.602 1.00 2.75 C ATOM 550 C ILE A 69 42.374 19.491 45.581 1.00 2.75 C ATOM 551 O ILE A 69 43.463 19.076 45.199 1.00 6.58 O ATOM 552 CB ILE A 69 40.838 19.103 43.667 1.00 2.75 C ATOM 553 CG1 ILE A 69 40.102 19.765 42.512 1.00 2.75 C ATOM 554 CG2 ILE A 69 39.885 18.189 44.442 1.00 2.75 C ATOM 555 CD1 ILE A 69 39.009 20.704 42.973 1.00 2.75 C ATOM 556 N ASP A 70 41.956 19.379 46.834 1.00 2.75 N ATOM 557 CA ASP A 70 42.754 18.684 47.827 1.00 2.75 C ATOM 558 C ASP A 70 41.903 17.541 48.340 1.00 2.75 C ATOM 559 O ASP A 70 40.713 17.454 48.057 1.00 6.58 O ATOM 560 CB ASP A 70 43.090 19.555 49.037 1.00 2.75 C ATOM 561 CG ASP A 70 43.571 20.936 48.675 1.00 2.75 C ATOM 562 OD1 ASP A 70 44.169 21.129 47.592 1.00 2.75 O ATOM 563 OD2 ASP A 70 43.355 21.841 49.509 1.00 2.75 O ATOM 564 N HIS A 71 42.524 16.653 49.098 1.00 2.75 N ATOM 565 CA HIS A 71 41.800 15.555 49.712 1.00 2.75 C ATOM 566 C HIS A 71 42.616 14.918 50.808 1.00 2.75 C ATOM 567 O HIS A 71 43.510 15.558 51.382 1.00 2.75 O ATOM 568 CB HIS A 71 41.337 
14.470 48.715 1.00 2.75 C ATOM 569 CG HIS A 71 42.063 14.457 47.405 1.00 2.75 C ATOM 570 ND1 HIS A 71 43.370 14.872 47.260 1.00 2.75 N ATOM 571 CD2 HIS A 71 41.666 14.036 46.179 1.00 2.75 C ATOM 572 CE1 HIS A 71 43.745 14.713 46.004 1.00 2.75 C ATOM 573 NE2 HIS A 71 42.729 14.209 45.326 1.00 2.75 N ATOM 574 N LEU A 72 42.305 13.664 51.108 1.00 6.58 N ATOM 575 CA LEU A 72 43.013 12.947 52.145 1.00 6.58 C ATOM 576 C LEU A 72 43.370 11.530 51.722 1.00 6.58 C ATOM 577 O LEU A 72 44.543 11.219 51.486 1.00 6.58 O ATOM 578 CB LEU A 72 42.159 12.903 53.413 1.00 6.58 C ATOM 579 CG LEU A 72 42.824 12.362 54.688 1.00 17.56 C ATOM 580 CD1 LEU A 72 42.887 10.839 54.669 1.00 28.54 C ATOM 581 CD2 LEU A 72 44.214 12.974 54.820 1.00 28.54 C ATOM 582 N ASP A 73 42.336 10.695 51.626 1.00 6.58 N ATOM 583 CA ASP A 73 42.438 9.274 51.268 1.00 6.58 C ATOM 584 C ASP A 73 42.532 9.014 49.757 1.00 6.58 C ATOM 585 O ASP A 73 43.367 8.217 49.317 1.00 6.58 O ATOM 586 CB ASP A 73 41.232 8.536 51.890 1.00 6.58 C ATOM 587 CG ASP A 73 41.061 7.104 51.385 1.00 17.56 C ATOM 588 OD1 ASP A 73 42.021 6.314 51.454 1.00 28.54 O ATOM 589 OD2 ASP A 73 39.944 6.763 50.933 1.00 28.54 O ATOM 590 N ALA A 74 41.696 9.704 48.984 1.00 6.58 N ATOM 591 CA ALA A 74 41.639 9.584 47.520 1.00 6.58 C ATOM 592 C ALA A 74 42.966 9.206 46.872 1.00 6.58 C ATOM 593 O ALA A 74 43.158 8.059 46.440 1.00 6.58 O ATOM 594 CB ALA A 74 41.134 10.884 46.923 1.00 6.58 C ATOM 595 N SER A 75 43.873 10.179 46.793 1.00 6.58 N ATOM 596 CA SER A 75 45.194 9.978 46.208 1.00 6.58 C ATOM 597 C SER A 75 46.268 10.802 46.906 1.00 6.58 C ATOM 598 O SER A 75 47.454 10.676 46.598 1.00 6.58 O ATOM 599 CB SER A 75 45.177 10.361 44.724 1.00 6.58 C ATOM 600 OG SER A 75 46.477 10.711 44.261 1.00 17.56 O ATOM 601 N GLN A 76 45.859 11.650 47.841 1.00 6.58 N ATOM 602 CA GLN A 76 46.815 12.516 48.516 1.00 6.58 C ATOM 603 C GLN A 76 47.342 13.452 47.426 1.00 6.58 C ATOM 604 O GLN A 76 46.906 13.372 46.274 1.00 6.58 O ATOM 605 CB GLN A 76 47.979 11.717 49.106 
1.00 6.58 C ATOM 606 CG GLN A 76 47.628 10.833 50.279 1.00 17.56 C ATOM 607 CD GLN A 76 48.865 10.302 50.963 1.00 28.54 C ATOM 608 OE1 GLN A 76 49.658 11.069 51.505 1.00 39.52 O ATOM 609 NE2 GLN A 76 49.044 8.987 50.936 1.00 39.52 N ATOM 610 N GLU A 77 48.269 14.334 47.780 1.00 2.75 N ATOM 611 CA GLU A 77 48.827 15.275 46.804 1.00 2.75 C ATOM 612 C GLU A 77 47.784 16.210 46.213 1.00 2.75 C ATOM 613 O GLU A 77 46.863 15.762 45.512 1.00 2.75 O ATOM 614 CB GLU A 77 49.487 14.540 45.645 1.00 2.75 C ATOM 615 CG GLU A 77 50.748 13.777 45.989 1.00 2.75 C ATOM 616 CD GLU A 77 51.640 13.565 44.770 1.00 2.75 C ATOM 617 OE1 GLU A 77 51.111 13.485 43.635 1.00 2.75 O ATOM 618 OE2 GLU A 77 52.875 13.465 44.948 1.00 2.75 O ATOM 619 N PRO A 78 47.910 17.522 46.480 1.00 2.75 N ATOM 620 CA PRO A 78 46.923 18.451 45.917 1.00 2.75 C ATOM 621 C PRO A 78 47.045 18.472 44.387 1.00 2.75 C ATOM 622 O PRO A 78 48.130 18.259 43.850 1.00 2.75 O ATOM 623 CB PRO A 78 47.299 19.775 46.571 1.00 2.75 C ATOM 624 CG PRO A 78 48.806 19.658 46.712 1.00 2.75 C ATOM 625 CD PRO A 78 48.950 18.245 47.236 1.00 2.75 C ATOM 626 N GLN A 79 45.945 18.732 43.685 1.00 2.75 N ATOM 627 CA GLN A 79 45.965 18.738 42.221 1.00 2.75 C ATOM 628 C GLN A 79 45.442 20.046 41.637 1.00 2.75 C ATOM 629 O GLN A 79 44.452 20.577 42.120 1.00 2.75 O ATOM 630 CB GLN A 79 45.104 17.582 41.690 1.00 2.75 C ATOM 631 CG GLN A 79 44.758 16.545 42.743 1.00 2.75 C ATOM 632 CD GLN A 79 45.138 15.144 42.314 1.00 2.75 C ATOM 633 OE1 GLN A 79 45.167 14.212 43.125 1.00 2.75 O ATOM 634 NE2 GLN A 79 45.433 14.986 41.027 1.00 2.75 N ATOM 635 N TRP A 80 46.122 20.558 40.611 1.00 6.58 N ATOM 636 CA TRP A 80 45.714 21.784 39.925 1.00 6.58 C ATOM 637 C TRP A 80 45.298 21.416 38.512 1.00 6.58 C ATOM 638 O TRP A 80 46.007 20.680 37.823 1.00 6.58 O ATOM 639 CB TRP A 80 46.863 22.798 39.850 1.00 6.58 C ATOM 640 CG TRP A 80 47.138 23.513 41.135 1.00 17.56 C ATOM 641 CD1 TRP A 80 47.869 23.052 42.193 1.00 28.54 C ATOM 642 CD2 TRP A 80 46.689 24.822 41.497 
1.00 28.54 C ATOM 643 NE1 TRP A 80 47.906 23.996 43.190 1.00 39.52 N ATOM 644 CE2 TRP A 80 47.189 25.093 42.789 1.00 39.52 C ATOM 645 CE3 TRP A 80 45.912 25.795 40.856 1.00 39.52 C ATOM 646 CZ2 TRP A 80 46.936 26.300 43.452 1.00 50.50 C ATOM 647 CZ3 TRP A 80 45.661 26.995 41.517 1.00 50.50 C ATOM 648 CH2 TRP A 80 46.173 27.235 42.800 1.00 61.48 C ATOM 649 N ILE A 81 44.149 21.932 38.082 1.00 6.58 N ATOM 650 CA ILE A 81 43.639 21.647 36.747 1.00 6.58 C ATOM 651 C ILE A 81 43.137 22.923 36.083 1.00 6.58 C ATOM 652 O ILE A 81 42.666 23.830 36.765 1.00 6.58 O ATOM 653 CB ILE A 81 42.460 20.657 36.814 1.00 6.58 C ATOM 654 CG1 ILE A 81 42.919 19.333 37.427 1.00 17.56 C ATOM 655 CG2 ILE A 81 41.877 20.441 35.426 1.00 17.56 C ATOM 656 CD1 ILE A 81 41.820 18.285 37.524 1.00 28.54 C ATOM 657 N TRP A 82 43.252 23.004 34.763 1.00 6.58 N ATOM 658 CA TRP A 82 42.733 24.167 34.059 1.00 6.58 C ATOM 659 C TRP A 82 41.257 23.846 33.845 1.00 6.58 C ATOM 660 O TRP A 82 40.907 22.739 33.429 1.00 6.58 O ATOM 661 CB TRP A 82 43.457 24.389 32.726 1.00 6.58 C ATOM 662 CG TRP A 82 44.872 24.851 32.906 1.00 17.56 C ATOM 663 CD1 TRP A 82 45.985 24.069 32.988 1.00 28.54 C ATOM 664 CD2 TRP A 82 45.315 26.199 33.103 1.00 28.54 C ATOM 665 NE1 TRP A 82 47.096 24.843 33.225 1.00 39.52 N ATOM 666 CE2 TRP A 82 46.714 26.156 33.300 1.00 39.52 C ATOM 667 CE3 TRP A 82 44.666 27.441 33.134 1.00 39.52 C ATOM 668 CZ2 TRP A 82 47.478 27.308 33.527 1.00 50.50 C ATOM 669 CZ3 TRP A 82 45.426 28.590 33.361 1.00 50.50 C ATOM 670 CH2 TRP A 82 46.819 28.511 33.554 1.00 61.48 C ATOM 671 N MET A 83 40.396 24.812 34.149 1.00 6.58 N ATOM 672 CA MET A 83 38.952 24.614 34.064 1.00 6.58 C ATOM 673 C MET A 83 38.299 24.360 32.704 1.00 6.58 C ATOM 674 O MET A 83 37.179 23.854 32.657 1.00 6.58 O ATOM 675 CB MET A 83 38.249 25.768 34.785 1.00 6.58 C ATOM 676 CG MET A 83 38.328 25.644 36.306 1.00 17.56 C ATOM 677 SD MET A 83 38.012 27.184 37.202 1.00 28.54 S ATOM 678 CE MET A 83 36.223 27.187 37.257 1.00 39.52 C ATOM 679 N 
ASP A 84 38.971 24.689 31.604 1.00 6.58 N ATOM 680 CA ASP A 84 38.383 24.432 30.286 1.00 6.58 C ATOM 681 C ASP A 84 38.278 22.929 30.000 1.00 6.58 C ATOM 682 O ASP A 84 37.691 22.512 29.001 1.00 6.58 O ATOM 683 CB ASP A 84 39.209 25.094 29.177 1.00 6.58 C ATOM 684 CG ASP A 84 40.675 24.686 29.203 1.00 17.56 C ATOM 685 OD1 ASP A 84 41.367 24.906 28.186 1.00 28.54 O ATOM 686 OD2 ASP A 84 41.137 24.157 30.233 1.00 28.54 O ATOM 687 N ARG A 85 38.846 22.119 30.883 1.00 6.58 N ATOM 688 CA ARG A 85 38.827 20.665 30.725 1.00 6.58 C ATOM 689 C ARG A 85 37.592 20.012 31.361 1.00 6.58 C ATOM 690 O ARG A 85 37.367 18.809 31.205 1.00 6.58 O ATOM 691 CB ARG A 85 40.089 20.062 31.348 1.00 6.58 C ATOM 692 CG ARG A 85 41.395 20.729 30.918 1.00 17.56 C ATOM 693 CD ARG A 85 41.800 20.392 29.478 1.00 28.54 C ATOM 694 NE ARG A 85 40.870 20.912 28.477 1.00 39.52 N ATOM 695 CZ ARG A 85 41.024 20.762 27.164 1.00 50.50 C ATOM 696 NH1 ARG A 85 42.074 20.107 26.687 1.00 61.48 N ATOM 697 NH2 ARG A 85 40.128 21.265 26.327 1.00 61.48 N ATOM 698 N ILE A 86 36.791 20.802 32.070 1.00 2.75 N ATOM 699 CA ILE A 86 35.601 20.281 32.751 1.00 2.75 C ATOM 700 C ILE A 86 34.364 20.109 31.889 1.00 2.75 C ATOM 701 O ILE A 86 34.032 20.976 31.093 1.00 6.58 O ATOM 702 CB ILE A 86 35.220 21.168 33.954 1.00 2.75 C ATOM 703 CG1 ILE A 86 36.316 21.051 35.018 1.00 2.75 C ATOM 704 CG2 ILE A 86 33.830 20.772 34.484 1.00 2.75 C ATOM 705 CD1 ILE A 86 36.167 21.980 36.205 1.00 2.75 C ATOM 706 N VAL A 87 33.673 18.990 32.100 1.00 6.58 N ATOM 707 CA VAL A 87 32.456 18.642 31.368 1.00 6.58 C ATOM 708 C VAL A 87 31.228 18.574 32.288 1.00 6.58 C ATOM 709 O VAL A 87 30.091 18.768 31.847 1.00 6.58 O ATOM 710 CB VAL A 87 32.635 17.278 30.669 1.00 6.58 C ATOM 711 CG1 VAL A 87 31.420 16.951 29.831 1.00 17.56 C ATOM 712 CG2 VAL A 87 33.881 17.305 29.814 1.00 17.56 C ATOM 713 N ALA A 88 31.454 18.301 33.568 1.00 6.58 N ATOM 714 CA ALA A 88 30.349 18.213 34.517 1.00 6.58 C ATOM 715 C ALA A 88 30.841 18.331 35.957 1.00 6.58 C ATOM 
716 O ALA A 88 31.971 17.964 36.263 1.00 6.58 O ATOM 717 CB ALA A 88 29.605 16.888 34.329 1.00 6.58 C ATOM 718 N VAL A 89 29.980 18.845 36.830 1.00 6.58 N ATOM 719 CA VAL A 89 30.305 19.005 38.242 1.00 6.58 C ATOM 720 C VAL A 89 29.085 18.623 39.064 1.00 6.58 C ATOM 721 O VAL A 89 28.040 19.262 38.954 1.00 6.58 O ATOM 722 CB VAL A 89 30.676 20.465 38.584 1.00 6.58 C ATOM 723 CG1 VAL A 89 30.915 20.598 40.079 1.00 17.56 C ATOM 724 CG2 VAL A 89 31.916 20.894 37.805 1.00 17.56 C ATOM 725 N HIS A 90 29.211 17.578 39.878 1.00 6.58 N ATOM 726 CA HIS A 90 28.100 17.137 40.714 1.00 6.58 C ATOM 727 C HIS A 90 28.487 17.151 42.194 1.00 6.58 C ATOM 728 O HIS A 90 29.606 16.795 42.558 1.00 6.58 O ATOM 729 CB HIS A 90 27.646 15.748 40.263 1.00 6.58 C ATOM 730 CG HIS A 90 27.053 15.737 38.886 1.00 17.56 C ATOM 731 ND1 HIS A 90 25.859 16.357 38.586 1.00 28.54 N ATOM 732 CD2 HIS A 90 27.515 15.231 37.719 1.00 28.54 C ATOM 733 CE1 HIS A 90 25.612 16.234 37.295 1.00 39.52 C ATOM 734 NE2 HIS A 90 26.602 15.554 36.745 1.00 39.52 N ATOM 735 N PRO A 91 27.567 17.581 43.065 1.00 6.58 N ATOM 736 CA PRO A 91 27.811 17.652 44.510 1.00 6.58 C ATOM 737 C PRO A 91 27.701 16.304 45.227 1.00 6.58 C ATOM 738 O PRO A 91 27.305 15.316 44.619 1.00 6.58 O ATOM 739 CB PRO A 91 26.738 18.617 44.971 1.00 6.58 C ATOM 740 CG PRO A 91 25.575 18.186 44.121 1.00 17.56 C ATOM 741 CD PRO A 91 26.200 18.031 42.746 1.00 28.54 C ATOM 742 N MET A 92 28.075 16.288 46.512 1.00 2.75 N ATOM 743 CA MET A 92 28.001 15.116 47.400 1.00 2.75 C ATOM 744 C MET A 92 29.240 14.214 47.533 1.00 2.75 C ATOM 745 O MET A 92 29.518 13.693 48.624 1.00 2.75 O ATOM 746 CB MET A 92 26.772 14.261 47.054 1.00 2.75 C ATOM 747 CG MET A 92 26.441 13.184 48.082 1.00 2.75 C ATOM 748 SD MET A 92 26.160 13.829 49.736 1.00 2.75 S ATOM 749 CE MET A 92 24.402 13.775 49.827 1.00 2.75 C ATOM 750 N PRO A 93 29.996 13.996 46.443 1.00 1.02 N ATOM 751 CA PRO A 93 31.176 13.137 46.583 1.00 1.02 C ATOM 752 C PRO A 93 32.396 13.910 47.080 1.00 1.02 C ATOM 
753 O PRO A 93 33.506 13.658 46.572 1.00 1.02 O ATOM 754 CB PRO A 93 31.369 12.597 45.173 1.00 1.02 C ATOM 755 CG PRO A 93 30.915 13.739 44.343 1.00 1.02 C ATOM 756 CD PRO A 93 29.657 14.172 45.020 1.00 1.02 C TER 757 PRO A 93 ATOM 758 N MET B 1 -0.631 29.877 42.866 1.00 2.75 N ATOM 759 CA MET B 1 -0.995 29.262 41.562 1.00 2.75 C ATOM 760 C MET B 1 0.286 28.899 40.811 1.00 2.75 C ATOM 761 O MET B 1 1.216 29.701 40.724 1.00 2.75 O ATOM 762 CB MET B 1 -1.844 30.260 40.745 1.00 2.75 C ATOM 763 CG MET B 1 -3.198 29.715 40.246 1.00 2.75 C ATOM 764 SD MET B 1 -4.537 30.952 40.018 1.00 2.75 S ATOM 765 CE MET B 1 -3.845 31.930 38.718 1.00 2.75 C ATOM 766 N PRO B 2 0.365 27.664 40.285 1.00 6.58 N ATOM 767 CA PRO B 2 1.563 27.256 39.548 1.00 6.58 C ATOM 768 C PRO B 2 1.684 28.116 38.298 1.00 6.58 C ATOM 769 O PRO B 2 0.673 28.422 37.654 1.00 6.58 O ATOM 770 CB PRO B 2 1.280 25.794 39.214 1.00 6.58 C ATOM 771 CG PRO B 2 0.419 25.349 40.353 1.00 17.56 C ATOM 772 CD PRO B 2 -0.533 26.513 40.491 1.00 28.54 C ATOM 773 N LYS B 3 2.911 28.510 37.962 1.00 6.58 N ATOM 774 CA LYS B 3 3.157 29.337 36.781 1.00 6.58 C ATOM 775 C LYS B 3 3.480 28.475 35.573 1.00 6.58 C ATOM 776 O LYS B 3 4.410 27.669 35.604 1.00 6.58 O ATOM 777 CB LYS B 3 4.316 30.309 37.024 1.00 6.58 C ATOM 778 CG LYS B 3 4.053 31.350 38.103 1.00 17.56 C ATOM 779 CD LYS B 3 5.227 32.302 38.249 1.00 28.54 C ATOM 780 CE LYS B 3 4.916 33.416 39.238 1.00 39.52 C ATOM 781 NZ LYS B 3 6.037 34.397 39.360 1.00 50.50 N ATOM 782 N TYR B 4 2.702 28.644 34.510 1.00 2.75 N ATOM 783 CA TYR B 4 2.905 27.887 33.276 1.00 2.75 C ATOM 784 C TYR B 4 3.900 28.588 32.348 1.00 2.75 C ATOM 785 O TYR B 4 4.089 29.798 32.423 1.00 6.58 O ATOM 786 CB TYR B 4 1.557 27.718 32.559 1.00 2.75 C ATOM 787 CG TYR B 4 1.645 27.281 31.111 1.00 2.75 C ATOM 788 CD1 TYR B 4 1.661 25.925 30.762 1.00 2.75 C ATOM 789 CD2 TYR B 4 1.721 28.226 30.084 1.00 2.75 C ATOM 790 CE1 TYR B 4 1.747 25.523 29.421 1.00 2.75 C ATOM 791 CE2 TYR B 4 1.812 27.836 28.748 1.00 2.75 C 
ATOM 792 CZ TYR B 4 1.821 26.477 28.426 1.00 2.75 C ATOM 793 OH TYR B 4 1.898 26.086 27.098 1.00 2.75 O ATOM 794 N ALA B 5 4.527 27.807 31.472 1.00 6.58 N ATOM 795 CA ALA B 5 5.478 28.321 30.504 1.00 6.58 C ATOM 796 C ALA B 5 5.495 27.369 29.324 1.00 6.58 C ATOM 797 O ALA B 5 5.460 26.157 29.498 1.00 6.58 O ATOM 798 CB ALA B 5 6.868 28.412 31.117 1.00 6.58 C ATOM 799 N PRO B 6 5.528 27.904 28.096 1.00 2.75 N ATOM 800 CA PRO B 6 5.548 27.021 26.929 1.00 2.75 C ATOM 801 C PRO B 6 6.721 26.028 26.897 1.00 2.75 C ATOM 802 O PRO B 6 6.604 24.962 26.301 1.00 6.58 O ATOM 803 CB PRO B 6 5.572 28.007 25.761 1.00 2.75 C ATOM 804 CG PRO B 6 4.764 29.155 26.286 1.00 2.75 C ATOM 805 CD PRO B 6 5.282 29.306 27.698 1.00 2.75 C ATOM 806 N HIS B 7 7.843 26.377 27.528 1.00 6.58 N ATOM 807 CA HIS B 7 9.018 25.489 27.561 1.00 6.58 C ATOM 808 C HIS B 7 9.853 25.669 28.834 1.00 6.58 C ATOM 809 O HIS B 7 9.948 26.773 29.362 1.00 6.58 O ATOM 810 CB HIS B 7 9.941 25.739 26.356 1.00 6.58 C ATOM 811 CG HIS B 7 9.275 25.574 25.025 1.00 17.56 C ATOM 812 ND1 HIS B 7 8.385 26.494 24.518 1.00 28.54 N ATOM 813 CD2 HIS B 7 9.386 24.602 24.090 1.00 28.54 C ATOM 814 CE1 HIS B 7 7.976 26.098 23.325 1.00 39.52 C ATOM 815 NE2 HIS B 7 8.570 24.953 23.043 1.00 39.52 N ATOM 816 N VAL B 8 10.467 24.582 29.307 1.00 6.58 N ATOM 817 CA VAL B 8 11.310 24.610 30.508 1.00 6.58 C ATOM 818 C VAL B 8 12.621 23.845 30.259 1.00 6.58 C ATOM 819 O VAL B 8 12.716 23.083 29.294 1.00 6.58 O ATOM 820 CB VAL B 8 10.581 23.984 31.730 1.00 6.58 C ATOM 821 CG1 VAL B 8 9.362 24.840 32.107 1.00 17.56 C ATOM 822 CG2 VAL B 8 10.150 22.560 31.418 1.00 17.56 C ATOM 823 N TYR B 9 13.618 24.037 31.127 1.00 6.58 N ATOM 824 CA TYR B 9 14.922 23.375 30.961 1.00 6.58 C ATOM 825 C TYR B 9 15.644 23.034 32.271 1.00 6.58 C ATOM 826 O TYR B 9 15.576 23.790 33.238 1.00 6.58 O ATOM 827 CB TYR B 9 15.844 24.268 30.126 1.00 6.58 C ATOM 828 CG TYR B 9 15.261 24.675 28.794 1.00 17.56 C ATOM 829 CD1 TYR B 9 15.352 23.838 27.682 1.00 28.54 C ATOM 830 
CD2 TYR B 9 14.593 25.890 28.653 1.00 28.54 C ATOM 831 CE1 TYR B 9 14.793 24.206 26.459 1.00 39.52 C ATOM 832 CE2 TYR B 9 14.030 26.263 27.442 1.00 39.52 C ATOM 833 CZ TYR B 9 14.132 25.422 26.349 1.00 50.50 C ATOM 834 OH TYR B 9 13.580 25.807 25.149 1.00 61.48 O ATOM 835 N THR B 10 16.355 21.902 32.294 1.00 6.58 N ATOM 836 CA THR B 10 17.098 21.494 33.495 1.00 6.58 C ATOM 837 C THR B 10 18.550 21.056 33.262 1.00 6.58 C ATOM 838 O THR B 10 19.232 20.699 34.209 1.00 6.58 O ATOM 839 CB THR B 10 16.390 20.338 34.252 1.00 6.58 C ATOM 840 OG1 THR B 10 16.313 19.190 33.403 1.00 17.56 O ATOM 841 CG2 THR B 10 14.983 20.751 34.673 1.00 17.56 C ATOM 842 N GLU B 11 19.039 21.089 32.026 1.00 6.58 N ATOM 843 CA GLU B 11 20.423 20.666 31.779 1.00 6.58 C ATOM 844 C GLU B 11 21.426 21.753 32.164 1.00 6.58 C ATOM 845 O GLU B 11 21.222 22.907 31.840 1.00 6.58 O ATOM 846 CB GLU B 11 20.613 20.282 30.308 1.00 6.58 C ATOM 847 CG GLU B 11 20.139 18.877 29.963 1.00 17.56 C ATOM 848 CD GLU B 11 20.561 18.450 28.572 1.00 28.54 C ATOM 849 OE1 GLU B 11 19.995 18.969 27.587 1.00 39.52 O ATOM 850 OE2 GLU B 11 21.471 17.599 28.465 1.00 39.52 O ATOM 851 N GLN B 12 22.508 21.381 32.850 1.00 6.58 N ATOM 852 CA GLN B 12 23.524 22.352 33.278 1.00 6.58 C ATOM 853 C GLN B 12 23.988 23.330 32.193 1.00 6.58 C ATOM 854 O GLN B 12 24.135 24.528 32.459 1.00 6.58 O ATOM 855 CB GLN B 12 24.746 21.622 33.853 1.00 6.58 C ATOM 856 CG GLN B 12 24.447 20.816 35.106 1.00 17.56 C ATOM 857 CD GLN B 12 23.857 21.674 36.206 1.00 28.54 C ATOM 858 OE1 GLN B 12 24.459 22.661 36.626 1.00 39.52 O ATOM 859 NE2 GLN B 12 22.673 21.303 36.678 1.00 39.52 N ATOM 860 N ALA B 13 24.247 22.827 30.986 1.00 6.58 N ATOM 861 CA ALA B 13 24.679 23.689 29.885 1.00 6.58 C ATOM 862 C ALA B 13 23.573 24.695 29.606 1.00 6.58 C ATOM 863 O ALA B 13 23.834 25.882 29.429 1.00 6.58 O ATOM 864 CB ALA B 13 24.955 22.863 28.638 1.00 6.58 C ATOM 865 N GLN B 14 22.334 24.211 29.574 1.00 6.58 N ATOM 866 CA GLN B 14 21.193 25.087 29.337 1.00 6.58 C ATOM 
867 C GLN B 14 21.134 26.162 30.411 1.00 6.58 C ATOM 868 O GLN B 14 20.944 27.333 30.108 1.00 6.58 O ATOM 869 CB GLN B 14 19.892 24.292 29.344 1.00 6.58 C ATOM 870 CG GLN B 14 19.866 23.151 28.342 1.00 17.56 C ATOM 871 CD GLN B 14 18.547 22.409 28.349 1.00 28.54 C ATOM 872 OE1 GLN B 14 18.052 22.005 29.404 1.00 39.52 O ATOM 873 NE2 GLN B 14 17.970 22.220 27.167 1.00 39.52 N ATOM 874 N ILE B 15 21.283 25.746 31.663 1.00 6.58 N ATOM 875 CA ILE B 15 21.269 26.667 32.789 1.00 6.58 C ATOM 876 C ILE B 15 22.388 27.697 32.595 1.00 6.58 C ATOM 877 O ILE B 15 22.180 28.892 32.790 1.00 6.58 O ATOM 878 CB ILE B 15 21.485 25.909 34.133 1.00 6.58 C ATOM 879 CG1 ILE B 15 20.329 24.939 34.383 1.00 17.56 C ATOM 880 CG2 ILE B 15 21.603 26.900 35.284 1.00 17.56 C ATOM 881 CD1 ILE B 15 20.456 24.137 35.678 1.00 28.54 C ATOM 882 N ALA B 16 23.566 27.215 32.198 1.00 6.58 N ATOM 883 CA ALA B 16 24.737 28.060 31.963 1.00 6.58 C ATOM 884 C ALA B 16 24.404 29.206 31.017 1.00 6.58 C ATOM 885 O ALA B 16 24.794 30.346 31.249 1.00 6.58 O ATOM 886 CB ALA B 16 25.884 27.216 31.380 1.00 6.58 C ATOM 887 N THR B 17 23.681 28.877 29.952 1.00 6.58 N ATOM 888 CA THR B 17 23.259 29.839 28.937 1.00 6.58 C ATOM 889 C THR B 17 22.311 30.896 29.510 1.00 6.58 C ATOM 890 O THR B 17 22.491 32.099 29.286 1.00 6.58 O ATOM 891 CB THR B 17 22.524 29.121 27.768 1.00 6.58 C ATOM 892 OG1 THR B 17 23.424 28.224 27.102 1.00 17.56 O ATOM 893 CG2 THR B 17 21.978 30.139 26.761 1.00 17.56 C ATOM 894 N LEU B 18 21.305 30.442 30.251 1.00 6.58 N ATOM 895 CA LEU B 18 20.319 31.349 30.835 1.00 6.58 C ATOM 896 C LEU B 18 20.939 32.313 31.834 1.00 6.58 C ATOM 897 O LEU B 18 20.583 33.486 31.885 1.00 6.58 O ATOM 898 CB LEU B 18 19.186 30.553 31.502 1.00 6.58 C ATOM 899 CG LEU B 18 18.293 29.728 30.570 1.00 17.56 C ATOM 900 CD1 LEU B 18 17.182 29.084 31.380 1.00 28.54 C ATOM 901 CD2 LEU B 18 17.703 30.610 29.485 1.00 28.54 C ATOM 902 N GLU B 19 21.877 31.812 32.628 1.00 6.58 N ATOM 903 CA GLU B 19 22.548 32.626 33.623 1.00 6.58 C 
ATOM 904 C GLU B 19 23.476 33.635 32.955 1.00 6.58 C ATOM 905 O GLU B 19 23.745 34.693 33.513 1.00 6.58 O ATOM 906 CB GLU B 19 23.368 31.741 34.566 1.00 6.58 C ATOM 907 CG GLU B 19 22.568 30.706 35.350 1.00 17.56 C ATOM 908 CD GLU B 19 23.462 29.774 36.161 1.00 28.54 C ATOM 909 OE1 GLU B 19 22.925 28.956 36.939 1.00 39.52 O ATOM 910 OE2 GLU B 19 24.701 29.853 36.016 1.00 39.52 O ATOM 911 N HIS B 20 23.990 33.302 31.774 1.00 6.58 N ATOM 912 CA HIS B 20 24.890 34.224 31.098 1.00 6.58 C ATOM 913 C HIS B 20 24.075 35.346 30.478 1.00 6.58 C ATOM 914 O HIS B 20 24.542 36.483 30.370 1.00 6.58 O ATOM 915 CB HIS B 20 25.712 33.501 30.031 1.00 6.58 C ATOM 916 CG HIS B 20 26.754 34.364 29.389 1.00 17.56 C ATOM 917 ND1 HIS B 20 27.742 34.997 30.111 1.00 28.54 N ATOM 918 CD2 HIS B 20 26.958 34.704 28.094 1.00 28.54 C ATOM 919 CE1 HIS B 20 28.509 35.691 29.289 1.00 39.52 C ATOM 920 NE2 HIS B 20 28.055 35.529 28.059 1.00 39.52 N ATOM 921 N TRP B 21 22.843 35.022 30.098 1.00 6.58 N ATOM 922 CA TRP B 21 21.941 35.996 29.503 1.00 6.58 C ATOM 923 C TRP B 21 21.543 37.024 30.552 1.00 6.58 C ATOM 924 O TRP B 21 21.513 38.216 30.275 1.00 6.58 O ATOM 925 CB TRP B 21 20.693 35.304 28.948 1.00 6.58 C ATOM 926 CG TRP B 21 20.884 34.669 27.611 1.00 17.56 C ATOM 927 CD1 TRP B 21 20.154 33.646 27.081 1.00 28.54 C ATOM 928 CD2 TRP B 21 21.840 35.042 26.610 1.00 28.54 C ATOM 929 NE1 TRP B 21 20.595 33.357 25.812 1.00 39.52 N ATOM 930 CE2 TRP B 21 21.630 34.199 25.499 1.00 39.52 C ATOM 931 CE3 TRP B 21 22.856 36.008 26.543 1.00 39.52 C ATOM 932 CZ2 TRP B 21 22.397 34.291 24.332 1.00 50.50 C ATOM 933 CZ3 TRP B 21 23.618 36.101 25.385 1.00 50.50 C ATOM 934 CH2 TRP B 21 23.384 35.246 24.295 1.00 61.48 C ATOM 935 N VAL B 22 21.251 36.548 31.757 1.00 6.58 N ATOM 936 CA VAL B 22 20.871 37.423 32.856 1.00 6.58 C ATOM 937 C VAL B 22 22.016 38.368 33.205 1.00 6.58 C ATOM 938 O VAL B 22 21.794 39.456 33.740 1.00 6.58 O ATOM 939 CB VAL B 22 20.504 36.615 34.124 1.00 6.58 C ATOM 940 CG1 VAL B 22 20.199 
37.565 35.280 1.00 17.56 C ATOM 941 CG2 VAL B 22 19.299 35.720 33.844 1.00 17.56 C ATOM 942 N LYS B 23 23.242 37.952 32.906 1.00 6.58 N ATOM 943 CA LYS B 23 24.402 38.779 33.203 1.00 6.58 C ATOM 944 C LYS B 23 24.563 39.904 32.180 1.00 6.58 C ATOM 945 O LYS B 23 24.783 41.057 32.559 1.00 6.58 O ATOM 946 CB LYS B 23 25.666 37.916 33.243 1.00 6.58 C ATOM 947 CG LYS B 23 26.913 38.650 33.735 1.00 17.56 C ATOM 948 CD LYS B 23 28.070 37.687 33.976 1.00 28.54 C ATOM 949 CE LYS B 23 27.716 36.664 35.046 1.00 39.52 C ATOM 950 NZ LYS B 23 28.860 35.776 35.385 1.00 50.50 N ATOM 951 N LEU B 24 24.459 39.581 30.891 1.00 6.58 N ATOM 952 CA LEU B 24 24.602 40.605 29.851 1.00 6.58 C ATOM 953 C LEU B 24 23.462 41.614 29.935 1.00 6.58 C ATOM 954 O LEU B 24 23.687 42.828 29.971 1.00 6.58 O ATOM 955 CB LEU B 24 24.625 39.982 28.448 1.00 6.58 C ATOM 956 CG LEU B 24 25.777 39.047 28.088 1.00 17.56 C ATOM 957 CD1 LEU B 24 25.708 38.726 26.608 1.00 28.54 C ATOM 958 CD2 LEU B 24 27.115 39.699 28.432 1.00 28.54 C ATOM 959 N LEU B 25 22.236 41.106 29.957 1.00 6.58 N ATOM 960 CA LEU B 25 21.062 41.958 30.056 1.00 6.58 C ATOM 961 C LEU B 25 20.922 42.398 31.516 1.00 6.58 C ATOM 962 O LEU B 25 19.808 42.493 32.037 1.00 6.58 O ATOM 963 CB LEU B 25 19.817 41.177 29.644 1.00 6.58 C ATOM 964 CG LEU B 25 19.863 40.486 28.281 1.00 17.56 C ATOM 965 CD1 LEU B 25 18.618 39.629 28.125 1.00 28.54 C ATOM 966 CD2 LEU B 25 19.953 41.517 27.166 1.00 28.54 C ATOM 967 N ASP B 26 22.061 42.649 32.158 1.00 2.75 N ATOM 968 CA ASP B 26 22.128 43.057 33.566 1.00 2.75 C ATOM 969 C ASP B 26 22.523 44.514 33.726 1.00 2.75 C ATOM 970 O ASP B 26 22.004 45.215 34.611 1.00 2.75 O ATOM 971 CB ASP B 26 23.130 42.171 34.326 1.00 2.75 C ATOM 972 CG ASP B 26 23.997 42.956 35.308 1.00 2.75 C ATOM 973 OD1 ASP B 26 23.455 43.524 36.279 1.00 2.75 O ATOM 974 OD2 ASP B 26 25.233 43.011 35.107 1.00 2.75 O ATOM 975 N GLY B 27 23.455 44.965 32.890 1.00 2.75 N ATOM 976 CA GLY B 27 23.875 46.348 32.949 1.00 2.75 C ATOM 977 C GLY B 27 
22.767 47.212 32.378 1.00 2.75 C ATOM 978 O GLY B 27 22.985 48.369 32.029 1.00 2.75 O ATOM 979 N GLN B 28 21.574 46.631 32.285 1.00 2.75 N ATOM 980 CA GLN B 28 20.410 47.326 31.758 1.00 2.75 C ATOM 981 C GLN B 28 20.784 48.327 30.672 1.00 2.75 C ATOM 982 O GLN B 28 20.196 49.406 30.577 1.00 2.75 O ATOM 983 CB GLN B 28 19.680 48.034 32.902 1.00 2.75 C ATOM 984 CG GLN B 28 18.260 48.531 32.581 1.00 2.75 C ATOM 985 CD GLN B 28 18.223 49.944 32.023 1.00 2.75 C ATOM 986 OE1 GLN B 28 18.831 50.868 32.583 1.00 2.75 O ATOM 987 NE2 GLN B 28 17.501 50.127 30.928 1.00 2.75 N ATOM 988 N GLU B 29 21.761 47.984 29.842 1.00 2.75 N ATOM 989 CA GLU B 29 22.166 48.909 28.788 1.00 2.75 C ATOM 990 C GLU B 29 21.594 48.521 27.414 1.00 2.75 C ATOM 991 O GLU B 29 21.555 47.337 27.064 1.00 2.75 O ATOM 992 CB GLU B 29 23.691 49.009 28.766 1.00 2.75 C ATOM 993 CG GLU B 29 24.269 49.642 27.536 1.00 2.75 C ATOM 994 CD GLU B 29 24.648 48.604 26.511 1.00 2.75 C ATOM 995 OE1 GLU B 29 23.812 47.700 26.271 1.00 2.75 O ATOM 996 OE2 GLU B 29 25.777 48.678 25.957 1.00 2.75 O ATOM 997 N ARG B 30 21.163 49.513 26.636 1.00 2.75 N ATOM 998 CA ARG B 30 20.545 49.258 25.326 1.00 2.75 C ATOM 999 C ARG B 30 21.165 48.103 24.565 1.00 2.75 C ATOM 1000 O ARG B 30 22.384 48.014 24.437 1.00 2.75 O ATOM 1001 CB ARG B 30 20.562 50.517 24.453 1.00 2.75 C ATOM 1002 CG ARG B 30 21.842 51.315 24.545 1.00 2.75 C ATOM 1003 CD ARG B 30 21.629 52.494 25.479 1.00 2.75 C ATOM 1004 NE ARG B 30 20.661 53.440 24.919 1.00 2.75 N ATOM 1005 CZ ARG B 30 20.173 54.494 25.566 1.00 2.75 C ATOM 1006 NH1 ARG B 30 20.561 54.749 26.808 1.00 2.75 N ATOM 1007 NH2 ARG B 30 19.287 55.285 24.969 1.00 2.75 N ATOM 1008 N VAL B 31 20.303 47.225 24.051 1.00 6.58 N ATOM 1009 CA VAL B 31 20.750 46.036 23.338 1.00 6.58 C ATOM 1010 C VAL B 31 19.934 45.674 22.111 1.00 6.58 C ATOM 1011 O VAL B 31 18.881 46.262 21.837 1.00 6.58 O ATOM 1012 CB VAL B 31 20.740 44.801 24.268 1.00 6.58 C ATOM 1013 CG1 VAL B 31 21.809 44.938 25.344 1.00 17.56 C ATOM 1014 CG2 
VAL B 31 19.362 44.636 24.882 1.00 17.56 C ATOM 1015 N ARG B 32 20.442 44.675 21.392 1.00 6.58 N ATOM 1016 CA ARG B 32 19.821 44.154 20.183 1.00 6.58 C ATOM 1017 C ARG B 32 19.753 42.629 20.299 1.00 6.58 C ATOM 1018 O ARG B 32 20.716 41.928 19.969 1.00 6.58 O ATOM 1019 CB ARG B 32 20.656 44.547 18.961 1.00 6.58 C ATOM 1020 CG ARG B 32 20.011 44.208 17.628 1.00 17.56 C ATOM 1021 CD ARG B 32 20.841 44.748 16.476 1.00 28.54 C ATOM 1022 NE ARG B 32 20.180 44.577 15.186 1.00 39.52 N ATOM 1023 CZ ARG B 32 20.662 45.039 14.035 1.00 50.50 C ATOM 1024 NH1 ARG B 32 21.812 45.703 14.014 1.00 61.48 N ATOM 1025 NH2 ARG B 32 19.995 44.840 12.907 1.00 61.48 N ATOM 1026 N ILE B 33 18.613 42.123 20.759 1.00 6.58 N ATOM 1027 CA ILE B 33 18.420 40.687 20.943 1.00 6.58 C ATOM 1028 C ILE B 33 17.983 39.923 19.687 1.00 6.58 C ATOM 1029 O ILE B 33 17.061 40.330 18.981 1.00 6.58 O ATOM 1030 CB ILE B 33 17.378 40.434 22.077 1.00 6.58 C ATOM 1031 CG1 ILE B 33 17.822 41.155 23.357 1.00 17.56 C ATOM 1032 CG2 ILE B 33 17.207 38.933 22.321 1.00 17.56 C ATOM 1033 CD1 ILE B 33 16.846 41.042 24.509 1.00 28.54 C ATOM 1034 N GLU B 34 18.652 38.804 19.417 1.00 6.58 N ATOM 1035 CA GLU B 34 18.310 37.968 18.271 1.00 6.58 C ATOM 1036 C GLU B 34 17.743 36.653 18.811 1.00 6.58 C ATOM 1037 O GLU B 34 18.360 36.031 19.684 1.00 6.58 O ATOM 1038 CB GLU B 34 19.553 37.678 17.423 1.00 6.58 C ATOM 1039 CG GLU B 34 19.238 36.976 16.098 1.00 17.56 C ATOM 1040 CD GLU B 34 20.474 36.485 15.355 1.00 28.54 C ATOM 1041 OE1 GLU B 34 20.320 36.008 14.211 1.00 39.52 O ATOM 1042 OE2 GLU B 34 21.592 36.567 15.909 1.00 39.52 O ATOM 1043 N LEU B 35 16.589 36.230 18.293 1.00 6.58 N ATOM 1044 CA LEU B 35 15.945 34.997 18.744 1.00 6.58 C ATOM 1045 C LEU B 35 16.135 33.829 17.783 1.00 6.58 C ATOM 1046 O LEU B 35 16.818 33.957 16.762 1.00 6.58 O ATOM 1047 CB LEU B 35 14.447 35.223 18.965 1.00 6.58 C ATOM 1048 CG LEU B 35 14.019 36.470 19.741 1.00 17.56 C ATOM 1049 CD1 LEU B 35 12.520 36.400 20.012 1.00 28.54 C ATOM 1050 CD2 LEU B 
35 14.790 36.573 21.048 1.00 28.54 C ATOM 1051 N ASP B 36 15.506 32.700 18.119 1.00 6.58 N ATOM 1052 CA ASP B 36 15.601 31.481 17.320 1.00 6.58 C ATOM 1053 C ASP B 36 15.067 31.569 15.900 1.00 6.58 C ATOM 1054 O ASP B 36 15.740 31.126 14.965 1.00 6.58 O ATOM 1055 CB ASP B 36 14.925 30.308 18.042 1.00 6.58 C ATOM 1056 CG ASP B 36 13.460 30.562 18.330 1.00 17.56 C ATOM 1057 OD1 ASP B 36 13.161 31.389 19.216 1.00 28.54 O ATOM 1058 OD2 ASP B 36 12.605 29.938 17.663 1.00 28.54 O ATOM 1059 N ASP B 37 13.871 32.130 15.722 1.00 6.58 N ATOM 1060 CA ASP B 37 13.284 32.234 14.385 1.00 6.58 C ATOM 1061 C ASP B 37 13.967 33.262 13.477 1.00 6.58 C ATOM 1062 O ASP B 37 13.640 33.360 12.294 1.00 6.58 O ATOM 1063 CB ASP B 37 11.782 32.538 14.484 1.00 6.58 C ATOM 1064 CG ASP B 37 11.483 33.725 15.380 1.00 17.56 C ATOM 1065 OD1 ASP B 37 12.436 34.426 15.778 1.00 28.54 O ATOM 1066 OD2 ASP B 37 10.291 33.957 15.679 1.00 28.54 O ATOM 1067 N GLY B 38 14.926 34.006 14.021 1.00 6.58 N ATOM 1068 CA GLY B 38 15.634 35.009 13.241 1.00 6.58 C ATOM 1069 C GLY B 38 15.147 36.411 13.556 1.00 6.58 C ATOM 1070 O GLY B 38 15.657 37.410 13.024 1.00 6.58 O ATOM 1071 N SER B 39 14.150 36.488 14.428 1.00 6.58 N ATOM 1072 CA SER B 39 13.596 37.774 14.816 1.00 6.58 C ATOM 1073 C SER B 39 14.714 38.598 15.441 1.00 6.58 C ATOM 1074 O SER B 39 15.689 38.061 15.974 1.00 6.58 O ATOM 1075 CB SER B 39 12.472 37.598 15.833 1.00 6.58 C ATOM 1076 OG SER B 39 12.970 37.747 17.150 1.00 17.56 O ATOM 1077 N MET B 40 14.573 39.911 15.345 1.00 6.58 N ATOM 1078 CA MET B 40 15.534 40.834 15.909 1.00 6.58 C ATOM 1079 C MET B 40 14.738 41.846 16.716 1.00 6.58 C ATOM 1080 O MET B 40 13.819 42.488 16.198 1.00 6.58 O ATOM 1081 CB MET B 40 16.326 41.533 14.797 1.00 6.58 C ATOM 1082 CG MET B 40 17.305 40.629 14.058 1.00 17.56 C ATOM 1083 SD MET B 40 18.636 40.080 15.132 1.00 28.54 S ATOM 1084 CE MET B 40 19.536 41.606 15.341 1.00 39.52 C ATOM 1085 N ILE B 41 15.084 41.964 17.992 1.00 6.58 N ATOM 1086 CA ILE B 41 14.417 42.882 
18.900 1.00 6.58 C ATOM 1087 C ILE B 41 15.434 43.823 19.548 1.00 6.58 C ATOM 1088 O ILE B 41 16.316 43.389 20.284 1.00 6.58 O ATOM 1089 CB ILE B 41 13.639 42.094 19.998 1.00 6.58 C ATOM 1090 CG1 ILE B 41 13.134 43.042 21.090 1.00 17.56 C ATOM 1091 CG2 ILE B 41 14.544 41.037 20.613 1.00 17.56 C ATOM 1092 CD1 ILE B 41 12.281 44.171 20.593 1.00 28.54 C ATOM 1093 N ALA B 42 15.311 45.112 19.252 1.00 6.58 N ATOM 1094 CA ALA B 42 16.207 46.120 19.807 1.00 6.58 C ATOM 1095 C ALA B 42 15.459 47.068 20.737 1.00 6.58 C ATOM 1096 O ALA B 42 14.265 47.344 20.548 1.00 6.58 O ATOM 1097 CB ALA B 42 16.865 46.914 18.685 1.00 6.58 C ATOM 1098 N GLY B 43 16.167 47.565 21.746 1.00 6.58 N ATOM 1099 CA GLY B 43 15.549 48.488 22.675 1.00 6.58 C ATOM 1100 C GLY B 43 16.218 48.543 24.033 1.00 6.58 C ATOM 1101 O GLY B 43 17.363 48.129 24.202 1.00 6.58 O ATOM 1102 N THR B 44 15.480 49.052 25.012 1.00 2.75 N ATOM 1103 CA THR B 44 15.964 49.183 26.374 1.00 2.75 C ATOM 1104 C THR B 44 15.438 48.065 27.287 1.00 2.75 C ATOM 1105 O THR B 44 14.273 47.690 27.215 1.00 2.75 O ATOM 1106 CB THR B 44 15.527 50.540 26.959 1.00 2.75 C ATOM 1107 OG1 THR B 44 15.744 51.569 25.980 1.00 2.75 O ATOM 1108 CG2 THR B 44 16.311 50.856 28.204 1.00 2.75 C ATOM 1109 N VAL B 45 16.304 47.533 28.139 1.00 2.75 N ATOM 1110 CA VAL B 45 15.899 46.486 29.078 1.00 2.75 C ATOM 1111 C VAL B 45 15.065 47.194 30.146 1.00 2.75 C ATOM 1112 O VAL B 45 15.599 47.887 31.006 1.00 2.75 O ATOM 1113 CB VAL B 45 17.121 45.814 29.737 1.00 2.75 C ATOM 1114 CG1 VAL B 45 16.660 44.796 30.779 1.00 2.75 C ATOM 1115 CG2 VAL B 45 17.980 45.121 28.662 1.00 2.75 C ATOM 1116 N ALA B 46 13.751 47.016 30.086 1.00 2.75 N ATOM 1117 CA ALA B 46 12.836 47.691 31.010 1.00 2.75 C ATOM 1118 C ALA B 46 12.469 46.896 32.258 1.00 2.75 C ATOM 1119 O ALA B 46 12.144 47.460 33.299 1.00 2.75 O ATOM 1120 CB ALA B 46 11.569 48.114 30.250 1.00 2.75 C ATOM 1121 N VAL B 47 12.488 45.581 32.134 1.00 2.75 N ATOM 1122 CA VAL B 47 12.211 44.693 33.255 1.00 2.75 C ATOM 
1123 C VAL B 47 13.268 43.629 33.047 1.00 2.75 C ATOM 1124 O VAL B 47 13.247 42.935 32.034 1.00 2.75 O ATOM 1125 CB VAL B 47 10.785 44.079 33.182 1.00 2.75 C ATOM 1126 CG1 VAL B 47 10.574 43.106 34.320 1.00 2.75 C ATOM 1127 CG2 VAL B 47 9.738 45.186 33.231 1.00 2.75 C ATOM 1128 N ARG B 48 14.206 43.531 33.986 1.00 6.58 N ATOM 1129 CA ARG B 48 15.313 42.584 33.895 1.00 6.58 C ATOM 1130 C ARG B 48 14.898 41.137 33.998 1.00 6.58 C ATOM 1131 O ARG B 48 13.934 40.802 34.690 1.00 6.58 O ATOM 1132 CB ARG B 48 16.352 42.885 34.981 1.00 6.58 C ATOM 1133 CG ARG B 48 17.321 43.997 34.611 1.00 17.56 C ATOM 1134 CD ARG B 48 18.158 44.434 35.802 1.00 28.54 C ATOM 1135 NE ARG B 48 17.319 44.918 36.894 1.00 39.52 N ATOM 1136 CZ ARG B 48 17.754 45.670 37.899 1.00 50.50 C ATOM 1137 NH1 ARG B 48 19.028 46.032 37.954 1.00 61.48 N ATOM 1138 NH2 ARG B 48 16.915 46.058 38.850 1.00 61.48 N ATOM 1139 N PRO B 49 15.629 40.246 33.310 1.00 6.58 N ATOM 1140 CA PRO B 49 15.308 38.820 33.354 1.00 6.58 C ATOM 1141 C PRO B 49 15.716 38.196 34.688 1.00 6.58 C ATOM 1142 O PRO B 49 16.727 38.579 35.291 1.00 6.58 O ATOM 1143 CB PRO B 49 16.112 38.258 32.182 1.00 6.58 C ATOM 1144 CG PRO B 49 17.333 39.095 32.209 1.00 17.56 C ATOM 1145 CD PRO B 49 16.780 40.492 32.422 1.00 28.54 C ATOM 1146 N THR B 50 14.911 37.257 35.162 1.00 6.58 N ATOM 1147 CA THR B 50 15.210 36.561 36.404 1.00 6.58 C ATOM 1148 C THR B 50 14.796 35.130 36.158 1.00 6.58 C ATOM 1149 O THR B 50 13.668 34.890 35.742 1.00 6.58 O ATOM 1150 CB THR B 50 14.381 37.078 37.593 1.00 6.58 C ATOM 1151 OG1 THR B 50 13.011 36.700 37.421 1.00 17.56 O ATOM 1152 CG2 THR B 50 14.482 38.594 37.702 1.00 17.56 C ATOM 1153 N ILE B 51 15.684 34.182 36.420 1.00 6.58 N ATOM 1154 CA ILE B 51 15.337 32.787 36.193 1.00 6.58 C ATOM 1155 C ILE B 51 14.367 32.328 37.277 1.00 6.58 C ATOM 1156 O ILE B 51 14.527 32.676 38.444 1.00 6.58 O ATOM 1157 CB ILE B 51 16.589 31.890 36.205 1.00 6.58 C ATOM 1158 CG1 ILE B 51 17.621 32.421 35.202 1.00 17.56 C ATOM 1159 CG2 ILE B 
51 16.202 30.461 35.827 1.00 17.56 C ATOM 1160 CD1 ILE B 51 18.923 31.646 35.177 1.00 28.54 C ATOM 1161 N GLN B 52 13.348 31.568 36.887 1.00 6.58 N ATOM 1162 CA GLN B 52 12.362 31.068 37.846 1.00 6.58 C ATOM 1163 C GLN B 52 11.901 29.652 37.506 1.00 6.58 C ATOM 1164 O GLN B 52 12.246 29.116 36.447 1.00 6.58 O ATOM 1165 CB GLN B 52 11.158 32.014 37.892 1.00 6.58 C ATOM 1166 CG GLN B 52 11.441 33.352 38.566 1.00 17.56 C ATOM 1167 CD GLN B 52 10.191 34.211 38.710 1.00 28.54 C ATOM 1168 OE1 GLN B 52 9.204 33.791 39.311 1.00 39.52 O ATOM 1169 NE2 GLN B 52 10.232 35.417 38.160 1.00 39.52 N ATOM 1170 N THR B 53 11.139 29.030 38.406 1.00 6.58 N ATOM 1171 CA THR B 53 10.641 27.678 38.151 1.00 6.58 C ATOM 1172 C THR B 53 9.254 27.726 37.503 1.00 6.58 C ATOM 1173 O THR B 53 8.380 28.492 37.920 1.00 6.58 O ATOM 1174 CB THR B 53 10.576 26.845 39.446 1.00 6.58 C ATOM 1175 OG1 THR B 53 9.712 27.490 40.388 1.00 17.56 O ATOM 1176 CG2 THR B 53 11.971 26.702 40.060 1.00 17.56 C ATOM 1177 N TYR B 54 9.075 26.909 36.468 1.00 6.58 N ATOM 1178 CA TYR B 54 7.825 26.837 35.710 1.00 6.58 C ATOM 1179 C TYR B 54 7.459 25.387 35.405 1.00 6.58 C ATOM 1180 O TYR B 54 8.239 24.471 35.642 1.00 6.58 O ATOM 1181 CB TYR B 54 7.959 27.579 34.381 1.00 6.58 C ATOM 1182 CG TYR B 54 8.169 29.069 34.500 1.00 17.56 C ATOM 1183 CD1 TYR B 54 7.105 29.924 34.784 1.00 28.54 C ATOM 1184 CD2 TYR B 54 9.433 29.627 34.308 1.00 28.54 C ATOM 1185 CE1 TYR B 54 7.295 31.301 34.868 1.00 39.52 C ATOM 1186 CE2 TYR B 54 9.634 30.999 34.391 1.00 39.52 C ATOM 1187 CZ TYR B 54 8.562 31.831 34.670 1.00 50.50 C ATOM 1188 OH TYR B 54 8.760 33.190 34.745 1.00 61.48 O ATOM 1189 N ARG B 55 6.262 25.203 34.866 1.00 2.75 N ATOM 1190 CA ARG B 55 5.745 23.892 34.501 1.00 2.75 C ATOM 1191 C ARG B 55 5.178 24.061 33.090 1.00 2.75 C ATOM 1192 O ARG B 55 4.663 25.129 32.768 1.00 2.75 O ATOM 1193 CB ARG B 55 4.618 23.490 35.464 1.00 2.75 C ATOM 1194 CG ARG B 55 5.036 23.429 36.939 1.00 2.75 C ATOM 1195 CD ARG B 55 3.858 23.448 37.904 1.00 
2.75 C ATOM 1196 NE ARG B 55 2.904 22.370 37.654 1.00 2.75 N ATOM 1197 CZ ARG B 55 2.071 21.887 38.570 1.00 2.75 C ATOM 1198 NH1 ARG B 55 2.066 22.374 39.810 1.00 2.75 N ATOM 1199 NH2 ARG B 55 1.238 20.909 38.247 1.00 2.75 N ATOM 1200 N ASP B 56 5.260 23.022 32.258 1.00 2.75 N ATOM 1201 CA ASP B 56 4.725 23.121 30.899 1.00 2.75 C ATOM 1202 C ASP B 56 3.407 22.354 30.799 1.00 2.75 C ATOM 1203 O ASP B 56 2.945 21.798 31.786 1.00 2.75 O ATOM 1204 CB ASP B 56 5.752 22.625 29.861 1.00 2.75 C ATOM 1205 CG ASP B 56 6.180 21.175 30.076 1.00 2.75 C ATOM 1206 OD1 ASP B 56 5.549 20.471 30.888 1.00 2.75 O ATOM 1207 OD2 ASP B 56 7.152 20.744 29.423 1.00 2.75 O ATOM 1208 N GLU B 57 2.794 22.339 29.620 1.00 2.75 N ATOM 1209 CA GLU B 57 1.517 21.655 29.439 1.00 2.75 C ATOM 1210 C GLU B 57 1.593 20.165 29.728 1.00 2.75 C ATOM 1211 O GLU B 57 0.579 19.527 30.011 1.00 2.75 O ATOM 1212 CB GLU B 57 0.991 21.846 28.015 1.00 2.75 C ATOM 1213 CG GLU B 57 -0.419 21.291 27.860 1.00 2.75 C ATOM 1214 CD GLU B 57 -0.875 21.205 26.428 1.00 2.75 C ATOM 1215 OE1 GLU B 57 -0.048 20.835 25.565 1.00 2.75 O ATOM 1216 OE2 GLU B 57 -2.068 21.484 26.176 1.00 2.75 O ATOM 1217 N GLN B 58 2.794 19.610 29.629 1.00 2.75 N ATOM 1218 CA GLN B 58 2.990 18.197 29.887 1.00 2.75 C ATOM 1219 C GLN B 58 3.354 18.035 31.366 1.00 2.75 C ATOM 1220 O GLN B 58 3.759 16.956 31.819 1.00 2.75 O ATOM 1221 CB GLN B 58 4.098 17.648 28.987 1.00 2.75 C ATOM 1222 CG GLN B 58 3.982 18.057 27.494 1.00 2.75 C ATOM 1223 CD GLN B 58 2.608 17.776 26.882 1.00 2.75 C ATOM 1224 OE1 GLN B 58 1.621 18.407 27.241 1.00 2.75 O ATOM 1225 NE2 GLN B 58 2.548 16.819 25.957 1.00 2.75 N ATOM 1226 N GLU B 59 3.208 19.124 32.115 1.00 2.75 N ATOM 1227 CA GLU B 59 3.485 19.130 33.552 1.00 2.75 C ATOM 1228 C GLU B 59 4.927 18.894 33.999 1.00 2.75 C ATOM 1229 O GLU B 59 5.165 18.511 35.153 1.00 2.75 O ATOM 1230 CB GLU B 59 2.575 18.123 34.244 1.00 2.75 C ATOM 1231 CG GLU B 59 1.112 18.451 34.092 1.00 2.75 C ATOM 1232 CD GLU B 59 0.527 19.000 35.368 1.00 
2.75 C ATOM 1233 OE1 GLU B 59 1.058 19.997 35.904 1.00 2.75 O ATOM 1234 OE2 GLU B 59 -0.468 18.425 35.839 1.00 2.75 O ATOM 1235 N ARG B 60 5.881 19.107 33.091 1.00 2.75 N ATOM 1236 CA ARG B 60 7.303 18.953 33.401 1.00 2.75 C ATOM 1237 C ARG B 60 7.743 20.231 34.113 1.00 2.75 C ATOM 1238 O ARG B 60 7.262 21.317 33.783 1.00 2.75 O ATOM 1239 CB ARG B 60 8.131 18.787 32.124 1.00 2.75 C ATOM 1240 CG ARG B 60 8.299 17.370 31.617 1.00 2.75 C ATOM 1241 CD ARG B 60 9.172 17.363 30.362 1.00 2.75 C ATOM 1242 NE ARG B 60 9.520 16.018 29.899 1.00 2.75 N ATOM 1243 CZ ARG B 60 10.584 15.327 30.305 1.00 2.75 C ATOM 1244 NH1 ARG B 60 11.434 15.845 31.186 1.00 2.75 N ATOM 1245 NH2 ARG B 60 10.785 14.105 29.843 1.00 2.75 N ATOM 1246 N GLU B 61 8.643 20.095 35.086 1.00 2.75 N ATOM 1247 CA GLU B 61 9.144 21.249 35.838 1.00 2.75 C ATOM 1248 C GLU B 61 10.583 21.614 35.437 1.00 2.75 C ATOM 1249 O GLU B 61 11.456 20.744 35.328 1.00 2.75 O ATOM 1250 CB GLU B 61 9.073 20.972 37.342 1.00 2.75 C ATOM 1251 CG GLU B 61 9.641 22.100 38.212 1.00 2.75 C ATOM 1252 CD GLU B 61 9.414 21.892 39.716 1.00 2.75 C ATOM 1253 OE1 GLU B 61 8.395 22.396 40.264 1.00 2.75 O ATOM 1254 OE2 GLU B 61 10.258 21.218 40.350 1.00 2.75 O ATOM 1255 N GLY B 62 10.820 22.903 35.205 1.00 2.75 N ATOM 1256 CA GLY B 62 12.149 23.344 34.805 1.00 2.75 C ATOM 1257 C GLY B 62 12.364 24.845 34.933 1.00 2.75 C ATOM 1258 O GLY B 62 11.504 25.558 35.442 1.00 2.75 O ATOM 1259 N SER B 63 13.521 25.319 34.473 1.00 2.75 N ATOM 1260 CA SER B 63 13.880 26.737 34.536 1.00 2.75 C ATOM 1261 C SER B 63 13.639 27.483 33.217 1.00 2.75 C ATOM 1262 O SER B 63 13.699 26.903 32.127 1.00 2.75 O ATOM 1263 CB SER B 63 15.359 26.890 34.904 1.00 2.75 C ATOM 1264 OG SER B 63 15.691 26.241 36.121 1.00 2.75 O ATOM 1265 N ASN B 64 13.398 28.786 33.340 1.00 2.75 N ATOM 1266 CA ASN B 64 13.164 29.670 32.203 1.00 2.75 C ATOM 1267 C ASN B 64 13.040 31.065 32.809 1.00 2.75 C ATOM 1268 O ASN B 64 13.274 31.258 34.007 1.00 2.75 O ATOM 1269 CB ASN B 64 11.854 29.304 
31.498 1.00 2.75 C ATOM 1270 CG ASN B 64 11.842 29.708 30.050 1.00 2.75 C ATOM 1271 OD1 ASN B 64 12.538 30.639 29.640 1.00 2.75 O ATOM 1272 ND2 ASN B 64 11.033 29.014 29.252 1.00 2.75 N ATOM 1273 N GLY B 65 12.676 32.045 31.996 1.00 2.75 N ATOM 1274 CA GLY B 65 12.514 33.388 32.516 1.00 2.75 C ATOM 1275 C GLY B 65 12.014 34.352 31.456 1.00 2.75 C ATOM 1276 O GLY B 65 12.081 34.062 30.269 1.00 2.75 O ATOM 1277 N GLN B 66 11.521 35.506 31.885 1.00 2.75 N ATOM 1278 CA GLN B 66 11.004 36.507 30.958 1.00 2.75 C ATOM 1279 C GLN B 66 11.604 37.879 31.268 1.00 2.75 C ATOM 1280 O GLN B 66 11.946 38.169 32.416 1.00 6.58 O ATOM 1281 CB GLN B 66 9.471 36.589 31.082 1.00 2.75 C ATOM 1282 CG GLN B 66 8.771 35.236 31.210 1.00 2.75 C ATOM 1283 CD GLN B 66 7.275 35.331 31.557 1.00 2.75 C ATOM 1284 OE1 GLN B 66 6.449 35.774 30.752 1.00 2.75 O ATOM 1285 NE2 GLN B 66 6.928 34.895 32.760 1.00 2.75 N ATOM 1286 N LEU B 67 11.746 38.711 30.237 1.00 2.75 N ATOM 1287 CA LEU B 67 12.246 40.081 30.403 1.00 2.75 C ATOM 1288 C LEU B 67 11.375 40.929 29.487 1.00 2.75 C ATOM 1289 O LEU B 67 10.763 40.410 28.545 1.00 6.58 O ATOM 1290 CB LEU B 67 13.723 40.226 29.990 1.00 2.75 C ATOM 1291 CG LEU B 67 14.064 40.266 28.490 1.00 2.75 C ATOM 1292 CD1 LEU B 67 15.109 41.342 28.186 1.00 2.75 C ATOM 1293 CD2 LEU B 67 14.572 38.897 28.061 1.00 2.75 C ATOM 1294 N ARG B 68 11.284 42.222 29.769 1.00 2.75 N ATOM 1295 CA ARG B 68 10.484 43.091 28.918 1.00 2.75 C ATOM 1296 C ARG B 68 11.432 44.088 28.290 1.00 2.75 C ATOM 1297 O ARG B 68 12.258 44.696 28.980 1.00 6.58 O ATOM 1298 CB ARG B 68 9.411 43.837 29.715 1.00 2.75 C ATOM 1299 CG ARG B 68 8.269 44.413 28.850 1.00 2.75 C ATOM 1300 CD ARG B 68 7.342 45.325 29.650 1.00 2.75 C ATOM 1301 NE ARG B 68 7.885 46.665 29.858 1.00 2.75 N ATOM 1302 CZ ARG B 68 7.533 47.457 30.867 1.00 2.75 C ATOM 1303 NH1 ARG B 68 6.642 47.034 31.754 1.00 2.75 N ATOM 1304 NH2 ARG B 68 8.071 48.668 30.992 1.00 2.75 N ATOM 1305 N ILE B 69 11.341 44.215 26.972 1.00 2.75 N ATOM 1306 CA ILE 
B 69 12.158 45.162 26.242 1.00 2.75 C ATOM 1307 C ILE B 69 11.195 46.180 25.645 1.00 2.75 C ATOM 1308 O ILE B 69 10.234 45.816 24.964 1.00 6.58 O ATOM 1309 CB ILE B 69 12.966 44.477 25.105 1.00 2.75 C ATOM 1310 CG1 ILE B 69 13.825 45.516 24.373 1.00 2.75 C ATOM 1311 CG2 ILE B 69 12.021 43.826 24.100 1.00 2.75 C ATOM 1312 CD1 ILE B 69 14.837 44.895 23.427 1.00 2.75 C ATOM 1313 N ASP B 70 11.436 47.456 25.924 1.00 2.75 N ATOM 1314 CA ASP B 70 10.595 48.519 25.392 1.00 2.75 C ATOM 1315 C ASP B 70 11.278 49.017 24.127 1.00 2.75 C ATOM 1316 O ASP B 70 12.442 49.426 24.158 1.00 6.58 O ATOM 1317 CB ASP B 70 10.449 49.657 26.414 1.00 2.75 C ATOM 1318 CG ASP B 70 9.565 49.273 27.603 1.00 2.75 C ATOM 1319 OD1 ASP B 70 9.427 50.090 28.536 1.00 2.75 O ATOM 1320 OD2 ASP B 70 8.993 48.164 27.600 1.00 2.75 O ATOM 1321 N HIS B 71 10.560 48.967 23.010 1.00 6.58 N ATOM 1322 CA HIS B 71 11.126 49.383 21.731 1.00 6.58 C ATOM 1323 C HIS B 71 11.585 50.833 21.729 1.00 6.58 C ATOM 1324 O HIS B 71 11.250 51.606 22.622 1.00 6.58 O ATOM 1325 CB HIS B 71 10.120 49.140 20.610 1.00 6.58 C ATOM 1326 CG HIS B 71 9.883 47.689 20.317 1.00 17.56 C ATOM 1327 ND1 HIS B 71 10.311 47.087 19.154 1.00 28.54 N ATOM 1328 CD2 HIS B 71 9.260 46.723 21.032 1.00 28.54 C ATOM 1329 CE1 HIS B 71 9.960 45.814 19.163 1.00 39.52 C ATOM 1330 NE2 HIS B 71 9.321 45.567 20.292 1.00 39.52 N ATOM 1331 N LEU B 72 12.348 51.195 20.703 1.00 6.58 N ATOM 1332 CA LEU B 72 12.900 52.536 20.592 1.00 6.58 C ATOM 1333 C LEU B 72 12.025 53.563 19.881 1.00 6.58 C ATOM 1334 O LEU B 72 12.510 54.613 19.444 1.00 6.58 O ATOM 1335 CB LEU B 72 14.280 52.453 19.924 1.00 6.58 C ATOM 1336 CG LEU B 72 15.251 51.508 20.649 1.00 17.56 C ATOM 1337 CD1 LEU B 72 16.484 51.256 19.801 1.00 28.54 C ATOM 1338 CD2 LEU B 72 15.634 52.104 21.993 1.00 28.54 C ATOM 1339 N ASP B 73 10.732 53.277 19.777 1.00 6.58 N ATOM 1340 CA ASP B 73 9.816 54.209 19.133 1.00 6.58 C ATOM 1341 C ASP B 73 9.310 55.196 20.179 1.00 6.58 C ATOM 1342 O ASP B 73 9.482 54.985 
21.384 1.00 6.58 O ATOM 1343 CB ASP B 73 8.635 53.477 18.463 1.00 6.58 C ATOM 1344 CG ASP B 73 7.749 52.743 19.454 1.00 17.56 C ATOM 1345 OD1 ASP B 73 8.180 51.704 19.994 1.00 28.54 O ATOM 1346 OD2 ASP B 73 6.618 53.213 19.692 1.00 28.54 O ATOM 1347 N ALA B 74 8.697 56.277 19.712 1.00 6.58 N ATOM 1348 CA ALA B 74 8.186 57.329 20.597 1.00 6.58 C ATOM 1349 C ALA B 74 7.301 56.841 21.743 1.00 6.58 C ATOM 1350 O ALA B 74 7.404 57.345 22.868 1.00 6.58 O ATOM 1351 CB ALA B 74 7.437 58.377 19.765 1.00 6.58 C ATOM 1352 N SER B 75 6.430 55.873 21.472 1.00 6.58 N ATOM 1353 CA SER B 75 5.536 55.372 22.516 1.00 6.58 C ATOM 1354 C SER B 75 6.066 54.104 23.193 1.00 6.58 C ATOM 1355 O SER B 75 5.320 53.397 23.868 1.00 6.58 O ATOM 1356 CB SER B 75 4.130 55.132 21.947 1.00 6.58 C ATOM 1357 OG SER B 75 3.590 56.335 21.420 1.00 17.56 O ATOM 1358 N GLN B 76 7.361 53.848 23.013 1.00 6.58 N ATOM 1359 CA GLN B 76 8.062 52.698 23.606 1.00 6.58 C ATOM 1360 C GLN B 76 7.242 51.418 23.739 1.00 6.58 C ATOM 1361 O GLN B 76 6.942 50.972 24.844 1.00 6.58 O ATOM 1362 CB GLN B 76 8.628 53.088 24.979 1.00 6.58 C ATOM 1363 CG GLN B 76 9.744 54.135 24.912 1.00 17.56 C ATOM 1364 CD GLN B 76 10.260 54.538 26.278 1.00 28.54 C ATOM 1365 OE1 GLN B 76 9.565 55.200 27.046 1.00 39.52 O ATOM 1366 NE2 GLN B 76 11.485 54.136 26.588 1.00 39.52 N ATOM 1367 N GLU B 77 6.901 50.822 22.601 1.00 6.58 N ATOM 1368 CA GLU B 77 6.105 49.604 22.581 1.00 6.58 C ATOM 1369 C GLU B 77 6.715 48.527 23.460 1.00 6.58 C ATOM 1370 O GLU B 77 7.874 48.165 23.282 1.00 6.58 O ATOM 1371 CB GLU B 77 5.978 49.089 21.148 1.00 6.58 C ATOM 1372 CG GLU B 77 5.091 47.872 21.014 1.00 17.56 C ATOM 1373 CD GLU B 77 4.703 47.590 19.577 1.00 28.54 C ATOM 1374 OE1 GLU B 77 3.995 48.428 18.981 1.00 39.52 O ATOM 1375 OE2 GLU B 77 5.105 46.533 19.048 1.00 39.52 O ATOM 1376 N PRO B 78 5.936 48.002 24.423 1.00 6.58 N ATOM 1377 CA PRO B 78 6.391 46.957 25.343 1.00 6.58 C ATOM 1378 C PRO B 78 6.335 45.567 24.725 1.00 6.58 C ATOM 1379 O PRO B 78 5.356 
45.213 24.061 1.00 6.58 O ATOM 1380 CB PRO B 78 5.427 47.094 26.513 1.00 6.58 C ATOM 1381 CG PRO B 78 4.153 47.425 25.814 1.00 17.56 C ATOM 1382 CD PRO B 78 4.586 48.464 24.795 1.00 28.54 C ATOM 1383 N GLN B 79 7.392 44.787 24.938 1.00 2.75 N ATOM 1384 CA GLN B 79 7.446 43.426 24.414 1.00 2.75 C ATOM 1385 C GLN B 79 8.108 42.471 25.386 1.00 2.75 C ATOM 1386 O GLN B 79 9.292 42.610 25.705 1.00 2.75 O ATOM 1387 CB GLN B 79 8.191 43.376 23.067 1.00 2.75 C ATOM 1388 CG GLN B 79 8.383 41.954 22.507 1.00 2.75 C ATOM 1389 CD GLN B 79 8.772 41.943 21.037 1.00 2.75 C ATOM 1390 OE1 GLN B 79 9.524 42.804 20.573 1.00 2.75 O ATOM 1391 NE2 GLN B 79 8.270 40.951 20.295 1.00 2.75 N ATOM 1392 N TRP B 80 7.327 41.502 25.855 1.00 6.58 N ATOM 1393 CA TRP B 80 7.833 40.493 26.769 1.00 6.58 C ATOM 1394 C TRP B 80 8.361 39.348 25.920 1.00 6.58 C ATOM 1395 O TRP B 80 7.761 38.994 24.899 1.00 6.58 O ATOM 1396 CB TRP B 80 6.720 39.992 27.696 1.00 6.58 C ATOM 1397 CG TRP B 80 6.186 41.056 28.617 1.00 17.56 C ATOM 1398 CD1 TRP B 80 5.272 42.023 28.311 1.00 28.54 C ATOM 1399 CD2 TRP B 80 6.565 41.282 29.982 1.00 28.54 C ATOM 1400 NE1 TRP B 80 5.058 42.838 29.398 1.00 39.52 N ATOM 1401 CE2 TRP B 80 5.839 42.406 30.437 1.00 39.52 C ATOM 1402 CE3 TRP B 80 7.449 40.646 30.865 1.00 39.52 C ATOM 1403 CZ2 TRP B 80 5.970 42.909 31.737 1.00 50.50 C ATOM 1404 CZ3 TRP B 80 7.579 41.148 32.159 1.00 50.50 C ATOM 1405 CH2 TRP B 80 6.841 42.269 32.580 1.00 61.48 C ATOM 1406 N ILE B 81 9.495 38.790 26.334 1.00 6.58 N ATOM 1407 CA ILE B 81 10.117 37.686 25.618 1.00 6.58 C ATOM 1408 C ILE B 81 10.552 36.585 26.602 1.00 6.58 C ATOM 1409 O ILE B 81 10.823 36.859 27.768 1.00 6.58 O ATOM 1410 CB ILE B 81 11.363 38.180 24.849 1.00 6.58 C ATOM 1411 CG1 ILE B 81 10.980 39.340 23.931 1.00 17.56 C ATOM 1412 CG2 ILE B 81 11.963 37.041 24.029 1.00 17.56 C ATOM 1413 CD1 ILE B 81 12.138 39.905 23.133 1.00 28.54 C ATOM 1414 N TRP B 82 10.607 35.347 26.122 1.00 6.58 N ATOM 1415 CA TRP B 82 11.044 34.219 26.928 1.00 6.58 C 
ATOM 1416 C TRP B 82 12.536 34.024 26.653 1.00 6.58 C ATOM 1417 O TRP B 82 12.957 34.029 25.497 1.00 6.58 O ATOM 1418 CB TRP B 82 10.279 32.943 26.545 1.00 6.58 C ATOM 1419 CG TRP B 82 8.860 32.907 27.039 1.00 17.56 C ATOM 1420 CD1 TRP B 82 7.741 33.270 26.351 1.00 28.54 C ATOM 1421 CD2 TRP B 82 8.418 32.487 28.334 1.00 28.54 C ATOM 1422 NE1 TRP B 82 6.625 33.098 27.135 1.00 39.52 N ATOM 1423 CE2 TRP B 82 7.011 32.619 28.358 1.00 39.52 C ATOM 1424 CE3 TRP B 82 9.074 32.010 29.478 1.00 39.52 C ATOM 1425 CZ2 TRP B 82 6.245 32.292 29.484 1.00 50.50 C ATOM 1426 CZ3 TRP B 82 8.312 31.683 30.599 1.00 50.50 C ATOM 1427 CH2 TRP B 82 6.911 31.827 30.591 1.00 61.48 C ATOM 1428 N MET B 83 13.323 33.828 27.711 1.00 6.58 N ATOM 1429 CA MET B 83 14.769 33.660 27.582 1.00 6.58 C ATOM 1430 C MET B 83 15.256 32.493 26.734 1.00 6.58 C ATOM 1431 O MET B 83 16.308 32.584 26.103 1.00 6.58 O ATOM 1432 CB MET B 83 15.424 33.558 28.961 1.00 6.58 C ATOM 1433 CG MET B 83 15.555 34.879 29.710 1.00 17.56 C ATOM 1434 SD MET B 83 16.809 34.800 31.027 1.00 28.54 S ATOM 1435 CE MET B 83 17.992 35.915 30.400 1.00 39.52 C ATOM 1436 N ASP B 84 14.506 31.398 26.711 1.00 6.58 N ATOM 1437 CA ASP B 84 14.933 30.244 25.933 1.00 6.58 C ATOM 1438 C ASP B 84 14.997 30.546 24.438 1.00 6.58 C ATOM 1439 O ASP B 84 15.714 29.890 23.690 1.00 6.58 O ATOM 1440 CB ASP B 84 14.007 29.056 26.199 1.00 6.58 C ATOM 1441 CG ASP B 84 12.574 29.325 25.798 1.00 17.56 C ATOM 1442 OD1 ASP B 84 11.982 30.293 26.311 1.00 28.54 O ATOM 1443 OD2 ASP B 84 12.040 28.554 24.974 1.00 28.54 O ATOM 1444 N ARG B 85 14.272 31.572 24.015 1.00 6.58 N ATOM 1445 CA ARG B 85 14.233 31.963 22.614 1.00 6.58 C ATOM 1446 C ARG B 85 15.462 32.732 22.128 1.00 6.58 C ATOM 1447 O ARG B 85 15.637 32.911 20.926 1.00 6.58 O ATOM 1448 CB ARG B 85 12.984 32.817 22.368 1.00 6.58 C ATOM 1449 CG ARG B 85 11.689 32.126 22.760 1.00 17.56 C ATOM 1450 CD ARG B 85 11.570 30.776 22.073 1.00 28.54 C ATOM 1451 NE ARG B 85 10.306 30.109 22.368 1.00 39.52 N ATOM 1452 
CZ ARG B 85 9.114 30.556 21.982 1.00 50.50 C ATOM 1453 NH1 ARG B 85 9.016 31.677 21.278 1.00 61.48 N ATOM 1454 NH2 ARG B 85 8.018 29.881 22.300 1.00 61.48 N ATOM 1455 N ILE B 86 16.315 33.163 23.053 1.00 2.75 N ATOM 1456 CA ILE B 86 17.482 33.971 22.701 1.00 2.75 C ATOM 1457 C ILE B 86 18.731 33.228 22.253 1.00 2.75 C ATOM 1458 O ILE B 86 19.178 32.298 22.914 1.00 6.58 O ATOM 1459 CB ILE B 86 17.873 34.872 23.878 1.00 2.75 C ATOM 1460 CG1 ILE B 86 16.657 35.683 24.331 1.00 2.75 C ATOM 1461 CG2 ILE B 86 19.037 35.785 23.487 1.00 2.75 C ATOM 1462 CD1 ILE B 86 16.854 36.372 25.652 1.00 2.75 C ATOM 1463 N VAL B 87 19.305 33.655 21.134 1.00 6.58 N ATOM 1464 CA VAL B 87 20.518 33.019 20.637 1.00 6.58 C ATOM 1465 C VAL B 87 21.725 33.962 20.657 1.00 6.58 C ATOM 1466 O VAL B 87 22.863 33.526 20.461 1.00 6.58 O ATOM 1467 CB VAL B 87 20.318 32.464 19.205 1.00 6.58 C ATOM 1468 CG1 VAL B 87 19.211 31.429 19.213 1.00 17.56 C ATOM 1469 CG2 VAL B 87 19.973 33.596 18.240 1.00 17.56 C ATOM 1470 N ALA B 88 21.483 35.251 20.896 1.00 6.58 N ATOM 1471 CA ALA B 88 22.564 36.227 20.954 1.00 6.58 C ATOM 1472 C ALA B 88 22.097 37.581 21.494 1.00 6.58 C ATOM 1473 O ALA B 88 20.906 37.916 21.451 1.00 6.58 O ATOM 1474 CB ALA B 88 23.188 36.409 19.574 1.00 6.58 C ATOM 1475 N VAL B 89 23.054 38.359 21.991 1.00 6.58 N ATOM 1476 CA VAL B 89 22.781 39.685 22.542 1.00 6.58 C ATOM 1477 C VAL B 89 23.994 40.593 22.303 1.00 6.58 C ATOM 1478 O VAL B 89 25.045 40.405 22.904 1.00 6.58 O ATOM 1479 CB VAL B 89 22.504 39.618 24.069 1.00 6.58 C ATOM 1480 CG1 VAL B 89 22.213 41.014 24.608 1.00 17.56 C ATOM 1481 CG2 VAL B 89 21.335 38.684 24.349 1.00 17.56 C ATOM 1482 N HIS B 90 23.849 41.563 21.409 1.00 6.58 N ATOM 1483 CA HIS B 90 24.940 42.493 21.114 1.00 6.58 C ATOM 1484 C HIS B 90 24.516 43.895 21.577 1.00 6.58 C ATOM 1485 O HIS B 90 23.328 44.191 21.645 1.00 6.58 O ATOM 1486 CB HIS B 90 25.234 42.495 19.610 1.00 6.58 C ATOM 1487 CG HIS B 90 24.976 41.180 18.935 1.00 17.56 C ATOM 1488 ND1 HIS B 90 23.706 
40.685 18.729 1.00 28.54 N ATOM 1489 CD2 HIS B 90 25.825 40.263 18.412 1.00 28.54 C ATOM 1490 CE1 HIS B 90 23.784 39.521 18.109 1.00 39.52 C ATOM 1491 NE2 HIS B 90 25.059 39.241 17.905 1.00 39.52 N ATOM 1492 N PRO B 91 25.479 44.772 21.905 1.00 6.58 N ATOM 1493 CA PRO B 91 25.155 46.131 22.360 1.00 6.58 C ATOM 1494 C PRO B 91 24.146 46.915 21.507 1.00 6.58 C ATOM 1495 O PRO B 91 23.319 46.349 20.784 1.00 6.58 O ATOM 1496 CB PRO B 91 26.516 46.815 22.397 1.00 6.58 C ATOM 1497 CG PRO B 91 27.430 45.689 22.786 1.00 17.56 C ATOM 1498 CD PRO B 91 26.935 44.550 21.909 1.00 28.54 C ATOM 1499 N MET B 92 24.226 48.232 21.630 1.00 6.58 N ATOM 1500 CA MET B 92 23.373 49.183 20.917 1.00 6.58 C ATOM 1501 C MET B 92 24.021 50.563 20.904 1.00 6.58 C ATOM 1502 O MET B 92 23.922 51.283 19.904 1.00 6.58 O ATOM 1503 CB MET B 92 22.005 49.304 21.584 1.00 6.58 C ATOM 1504 CG MET B 92 20.961 48.349 21.051 1.00 17.56 C ATOM 1505 SD MET B 92 20.639 48.596 19.289 1.00 28.54 S ATOM 1506 CE MET B 92 19.668 50.108 19.308 1.00 39.52 C ATOM 1507 N PRO B 93 24.688 50.950 22.019 1.00 1.02 N ATOM 1508 CA PRO B 93 25.360 52.250 22.166 1.00 1.02 C ATOM 1509 C PRO B 93 25.878 52.907 20.883 1.00 1.02 C ATOM 1510 O PRO B 93 25.289 53.937 20.472 1.00 1.02 O ATOM 1511 CB PRO B 93 26.484 51.935 23.138 1.00 1.02 C ATOM 1512 CG PRO B 93 25.829 50.975 24.071 1.00 1.02 C ATOM 1513 CD PRO B 93 25.080 50.051 23.125 1.00 1.02 C TER 1514 PRO B 93 HETATM 1515 O HOH A 102 55.426 32.989 21.912 1.00 1.02 O HETATM 1516 O HOH A 103 40.901 26.749 43.055 1.00 2.13 O HETATM 1517 O HOH A 104 35.349 33.788 41.580 1.00 2.13 O HETATM 1518 O HOH A 105 38.978 10.519 41.005 1.00 2.13 O HETATM 1519 O HOH A 106 25.923 8.913 38.903 1.00 2.13 O HETATM 1520 O HOH A 107 40.775 10.844 37.633 1.00 2.13 O HETATM 1521 O HOH A 108 46.016 24.167 52.453 1.00 2.13 O HETATM 1522 O HOH A 109 34.530 20.517 53.061 1.00 2.13 O HETATM 1523 O HOH A 110 39.762 16.152 30.467 1.00 2.13 O HETATM 1524 O HOH A 111 30.722 35.109 21.328 1.00 2.13 O HETATM 
1525 O HOH A 112 48.099 39.721 22.618 1.00 2.13 O HETATM 1526 O HOH A 113 29.251 24.535 24.931 1.00 2.13 O HETATM 1527 O HOH A 114 33.272 36.664 19.585 1.00 2.13 O HETATM 1528 O HOH A 115 36.183 37.056 26.418 1.00 2.13 O HETATM 1529 O HOH A 116 49.838 45.352 38.462 1.00 2.13 O HETATM 1530 O HOH A 117 46.800 25.050 36.459 1.00 2.13 O HETATM 1531 O HOH A 118 40.302 38.406 23.407 1.00 2.13 O HETATM 1532 O HOH A 119 37.692 32.576 19.258 1.00 2.13 O HETATM 1533 O HOH A 120 33.794 37.488 17.152 1.00 2.13 O HETATM 1534 O HOH A 121 43.449 39.198 39.150 1.00 2.13 O HETATM 1535 O HOH A 122 49.640 41.488 20.852 1.00 2.13 O HETATM 1536 O HOH A 123 32.173 35.770 17.262 1.00 2.13 O HETATM 1537 O HOH A 124 22.494 21.386 46.541 1.00 2.13 O HETATM 1538 O HOH A 125 28.032 25.047 29.712 1.00 2.13 O HETATM 1539 O HOH A 126 28.233 21.667 43.224 1.00 2.13 O HETATM 1540 O HOH A 127 39.084 12.357 48.850 1.00 2.13 O HETATM 1541 O HOH A 128 34.475 39.410 27.398 1.00 2.13 O HETATM 1542 O HOH A 129 47.708 22.629 31.427 1.00 2.13 O HETATM 1543 O HOH A 130 46.418 42.566 23.669 1.00 2.13 O HETATM 1544 O HOH A 131 30.155 30.338 44.630 1.00 2.13 O HETATM 1545 O HOH A 132 28.721 25.660 44.704 1.00 2.13 O HETATM 1546 O HOH A 133 28.302 16.476 50.387 1.00 2.13 O HETATM 1547 O HOH A 134 41.695 30.391 40.467 1.00 2.13 O HETATM 1548 O HOH A 135 56.669 36.377 24.355 1.00 2.13 O HETATM 1549 O HOH A 136 41.693 16.469 44.730 1.00 2.13 O HETATM 1550 O HOH A 137 42.409 31.118 37.804 1.00 1.02 O HETATM 1551 O HOH A 138 49.137 31.195 36.402 1.00 1.02 O HETATM 1552 O HOH A 139 47.098 27.230 22.246 1.00 1.02 O HETATM 1553 O HOH A 140 44.417 29.341 23.097 1.00 1.02 O HETATM 1554 O HOH A 141 44.503 21.646 30.042 1.00 1.02 O HETATM 1555 O HOH A 142 50.120 24.010 34.435 1.00 1.02 O HETATM 1556 O HOH A 143 42.119 23.323 26.144 1.00 1.02 O HETATM 1557 O HOH A 144 37.316 21.543 24.530 1.00 1.02 O HETATM 1558 O HOH A 145 48.258 50.129 33.383 1.00 1.02 O HETATM 1559 O HOH A 146 44.850 39.402 22.317 1.00 1.02 O HETATM 1560 
O HOH A 147 29.120 25.401 32.390 1.00 1.02 O HETATM 1561 O HOH A 148 32.874 12.977 56.281 1.00 1.02 O HETATM 1562 O HOH A 149 38.535 14.185 48.108 1.00 1.02 O HETATM 1563 O HOH A 150 53.534 41.286 39.637 1.00 1.02 O HETATM 1564 O HOH A 151 36.940 23.510 22.201 1.00 1.02 O HETATM 1565 O HOH A 152 34.988 24.620 28.470 1.00 1.02 O HETATM 1566 O HOH A 153 40.101 21.420 51.543 1.00 1.02 O HETATM 1567 O HOH A 154 28.368 9.624 37.268 1.00 1.02 O HETATM 1568 O HOH A 155 35.088 36.660 31.676 1.00 1.02 O HETATM 1569 O HOH A 156 47.916 37.455 31.986 1.00 1.02 O HETATM 1570 O HOH A 157 54.542 41.201 24.879 1.00 1.02 O HETATM 1571 O HOH B 102 7.289 36.553 28.720 1.00 1.02 O HETATM 1572 O HOH B 103 16.717 54.179 24.047 1.00 1.02 O HETATM 1573 O HOH B 104 9.248 42.123 17.733 1.00 1.02 O HETATM 1574 O HOH B 105 14.412 28.147 38.194 1.00 1.02 O HETATM 1575 O HOH B 106 12.750 20.065 38.258 1.00 1.02 O HETATM 1576 O HOH B 107 2.862 19.364 41.232 1.00 1.02 O HETATM 1577 O HOH B 108 4.268 47.290 30.459 1.00 2.13 O HETATM 1578 O HOH B 109 -2.769 31.767 43.156 1.00 2.13 O HETATM 1579 O HOH B 110 12.902 52.548 24.732 1.00 2.13 O HETATM 1580 O HOH B 111 26.359 26.707 27.712 1.00 2.13 O HETATM 1581 O HOH B 112 22.639 23.783 37.663 1.00 2.13 O HETATM 1582 O HOH B 113 5.670 27.728 39.182 1.00 2.13 O HETATM 1583 O HOH B 114 18.823 27.871 20.886 1.00 2.13 O HETATM 1584 O HOH B 115 6.968 22.085 46.925 1.00 2.13 O HETATM 1585 O HOH B 116 20.608 59.410 25.117 1.00 2.13 O HETATM 1586 O HOH B 117 9.544 22.077 28.354 1.00 2.13 O HETATM 1587 O HOH B 118 9.058 35.190 23.766 1.00 2.13 O HETATM 1588 O HOH B 119 20.838 33.579 37.615 1.00 2.13 O HETATM 1589 O HOH B 120 29.520 41.743 35.507 1.00 2.13 O HETATM 1590 O HOH B 121 30.129 41.166 22.082 1.00 2.13 O HETATM 1591 O HOH B 122 27.747 54.704 18.906 1.00 2.13 O HETATM 1592 O HOH B 123 28.438 46.176 32.160 1.00 2.13 O HETATM 1593 O HOH B 124 1.534 49.415 26.654 1.00 2.13 O HETATM 1594 O HOH B 125 27.355 55.392 22.150 1.00 2.13 O HETATM 1595 O HOH B 126 
4.705 14.287 32.912 1.00 2.13 O HETATM 1596 O HOH B 127 12.883 20.283 43.923 1.00 2.13 O HETATM 1597 O HOH B 128 2.632 45.033 28.330 1.00 2.13 O HETATM 1598 O HOH B 129 1.635 21.917 23.452 1.00 2.13 O HETATM 1599 O HOH B 130 4.297 24.108 27.700 1.00 2.13 O HETATM 1600 O HOH B 131 17.004 27.437 25.729 1.00 2.13 O HETATM 1601 O HOH B 132 2.460 23.310 21.287 1.00 2.13 O HETATM 1602 O HOH B 133 11.278 39.168 35.156 1.00 2.13 O HETATM 1603 O HOH B 134 27.898 50.589 36.883 1.00 2.13 O HETATM 1604 O HOH B 135 -1.156 19.299 39.714 1.00 2.13 O HETATM 1605 O HOH B 136 8.539 41.213 15.785 1.00 2.13 O HETATM 1606 O HOH B 137 26.268 43.473 15.966 1.00 2.13 O HETATM 1607 O HOH B 138 5.065 41.947 19.954 1.00 2.13 O HETATM 1608 O HOH B 139 7.697 38.140 21.957 1.00 2.13 O HETATM 1609 O HOH B 140 14.774 22.019 21.284 1.00 2.13 O HETATM 1610 O HOH B 141 15.624 24.668 41.850 1.00 2.13 O HETATM 1611 O HOH B 142 21.241 28.333 12.634 1.00 2.13 O HETATM 1612 O HOH B 143 3.423 26.956 48.307 1.00 2.13 O HETATM 1613 O HOH B 144 27.900 42.522 31.320 1.00 2.13 O HETATM 1614 O HOH B 145 9.389 30.775 39.658 1.00 2.13 O HETATM 1615 O HOH B 146 26.148 37.301 22.575 1.00 2.13 O HETATM 1616 O HOH B 147 9.155 17.660 36.448 1.00 2.13 O HETATM 1617 O HOH B 148 18.137 33.019 13.730 1.00 2.13 O HETATM 1618 O HOH B 149 11.313 27.060 15.206 1.00 2.13 O HETATM 1619 O HOH B 150 4.056 17.989 38.825 1.00 2.13 O HETATM 1620 O HOH B 151 7.416 35.656 43.886 1.00 2.13 O HETATM 1621 O HOH B 152 12.541 27.907 11.923 1.00 2.13 O HETATM 1622 O HOH B 153 6.329 26.333 11.082 1.00 2.13 O HETATM 1623 O HOH B 154 26.283 56.895 33.844 1.00 2.13 O HETATM 1624 O HOH B 155 5.366 21.950 19.886 1.00 2.13 O HETATM 1625 O HOH B 156 13.512 42.231 37.121 1.00 2.13 O HETATM 1626 O HOH B 157 -0.309 28.369 45.779 1.00 1.02 O HETATM 1627 O HOH B 158 25.446 48.828 33.125 1.00 1.02 O HETATM 1628 O HOH B 159 18.755 48.751 27.967 1.00 1.02 O HETATM 1629 O HOH B 160 17.792 30.870 15.719 1.00 1.02 O HETATM 1630 O HOH B 161 14.024 55.209 
23.929 1.00 1.02 O HETATM 1631 O HOH B 162 5.358 54.257 26.690 1.00 1.02 O HETATM 1632 O HOH B 163 20.975 21.968 22.981 1.00 1.02 O HETATM 1633 O HOH B 164 17.979 35.529 38.171 1.00 1.02 O HETATM 1634 O HOH B 165 24.624 33.158 26.880 1.00 1.02 O HETATM 1635 O HOH B 166 1.944 14.011 24.536 1.00 1.02 O HETATM 1636 O HOH B 167 24.800 20.135 30.685 1.00 1.02 O MASTER 305 0 0 2 15 0 0 6 1634 2 0 16 END PyCogent-1.5.3/tests/data/bowtie_output.map000644 000765 000024 00000001526 11533073176 021735 0ustar00jrideoutstaff000000 000000 GAPC_0015:6:1:1283:11957#0/1 - Mus 66047927 TGTATATATAAACATATATGGAAACTGAATATATATACATTATGTATGTATATATGTATATGTTATATATACATA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0 55:A>G,64:C>A GAPC_0015:6:1:1394:18813#0/1 + Mus 77785518 ATGAAATTCCTAGCCAAATGGATGGACCTGGAGGGCATCATC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 447 GAPC_0015:6:1:1560:18056#0/1 + Mus 178806665 TAGATAAAGGCTCTGTTTTTCATCATTGAGAAATTGTTATTTTTCTGATGTTATA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0 9:T>G GAPC_0015:6:1:1565:19849#0/1 + Mus 116516430 ACCATTTGCTTGGAAAATTGTTTTCCAGCCTTTCACTCTGAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 141 GAPC_0015:6:1:1591:17397#0/1 - Mus 120440696 TCTAAATCTGTTCATTAATTAAGCCTGTTTCCATGTCCTTGGTCTTAAGACCAATCTGTTATGCGGGTGTGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0 70:A>C,71:G>TPyCogent-1.5.3/tests/data/brca1.fasta000644 000765 000024 00000510250 10665667404 020344 0ustar00jrideoutstaff000000 000000 >FlyingFox TGTGGCACAAATGCTCATGCCAGCTCTTTACAGCATGAGAAC---AGTTTATTATACACTAAAGACAGAATGAATGTAGA AAAGACTGACTTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACAGAACAGATGGGTTGAAACTAAGGAAA CATGTAACGAT---ATGCAGACTTCCAGCACAGAGAAAAAGGTAGTTCTGAATGCTGATCCCCTGAATGGGAGAATAGAA CTGAATAAGCAGAAACCTCCATGCTCTGACAGTCCTAGAGAT---TCTCAAGAT---ATTTCTTGGATAACACGGAATAG TAGCATACAGAAAGTTAATGAGTGGTTTTCCAGACGTGATGAAATATTAACTTCTGATGTCTCACCTGATGGGAGGTCTG 
AATCAAATGTG---------------GTAGAAGTTCCAAAT------GAAGTAGATGGATACTCTGGTGCTTCAGAGAAA ATAGCTTTAAAGGCCAATGATCCTCATGGTGCTTTAATGTGC------GAAAGAGTTCACTCCAAACTGGTAGAAAGTAA T---ATTGAAGATAAAATATTTGGGAAAACATATCGGAGGAAAGCAAGCCTCCCTAACTTGAGCCACATAACTGAAAATC TAATTACAGGAGCATCTGCTATAGAACCTCAGATAACACAA--------------------------------------- ---------------------GAGTATCCCCTCACAAATAAACTAAAGCGTAAAAGGAGAACTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAATAGATTTGGCAGTTGTTCAGAAAACTCCGGAAAACATAATTGAGGAAACTGACCAAATAG AGCAGAAT---------GGTCATGTGATGAATAGTACTAATAATGGTCATGAGAATGAAACAAAAGGTGATTAT---GTT CAGAAGAAGAAAAATACAAACCCAACAGAA------TCATTGGAAAAAGAATCTACTTTCAAAACTAAAGCTGAACCTAT ATGCAGCAGCATAAGCAATATGGAACTAGAATTAAATATCCACAGTTCAAAAGCAGTTAAGAAGAATAGGCTGAGGAGGA AGTCCTCTACCAGGCATATTCATGCACTTGAACTAGTAGTCAATAGAAATCCAAGCCCACCTAATCATACTGAACTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---CTGAAGGAAAAA---AATTCTGACCAAATGCCAGTCAGACACAGCAAAAA ACTTCAATTCATAGAAGATAAAGAATCTTCAACTGGAGCCAAGAAGAATAACAAGCCAAATGAGACAATCAATAAAAGAC TTGCCAGTGATGCTTTTCCAGAATTAAATATAACAAACATACCTGGTTTTTTTACTAATGGTTCAAGTTCTAATAAACTT CAAGAGTTTGTCAATCCTAGCCTTCAAAGAGAAGAAATAAAAGAGAAC---CTAGGAACAATTCAAGTGTCTAATAGTAC CAAGGACCCCAAAATTTTGATCTTCGGTGAAGGAAGA---GGTTCACAA---ACTGATCGATCTACAGAGAGTACCAGTA TTTTATTGGTGCCTGAAACGGATTATGGCACTCAAGATAGTATCTCATTACTGGAACCTGACATCCCAGAG---AGGGTA AAG---ACAGCACCAAACCATCAT------------GCAGCAATTAAAAACCCCAGAGAACTTATTCATGGTTGT---TC TGAAGATACTAGAAATGATGCAGAGGGCTTTAAAGATCCATTGAGACGTGAAGTTAAC---TACANNNNNNNNNNN---- -----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNTTCAAACCCAGGAAATCTAGAAAAGGAATGTGCAACAGGCTATGCCCACTCCAAGTCCTTGAGGAAACAAAGTCCAAA AGTCACTCTTGAATGTGACCGAAAAGAA---AATCAGGGAAAGAAAGAGTCTAACGTCAAGCATGTGCAGGCAGTTTATA CAACTGTAGGCTTTCCTGTGGTTTGTGAGAAAGAAAAAAAGCCAGGAGATTATGCTAAATATGGCATAAAAGAAGTCTCT AGGCTTTGTCAGTCATTTCAGTTCAGA---GAAAATGAAACTGAACTCACTATTGCAAATAAACTTGGAATTTCACAAAA CCCATATCATATGCCATCCATTCCTCCCATCAAGTCATCTGTTAAAACTACATGTAAGAAAAAT---CTGTCAGAGGAAA 
AGTTTGAAGAACATTCAATATCCCCTGAAAGAACAATAGGAAATGAGACCATCATTCAAAGTACAGTGGGCACAATTAGC CAAAATAACATTAGAGAAAGCACTTTTAAAGAAGGCAGCTCAAGCAGTATTTATGAAGCAGGTTCCAGTACTAACGAACT AGGCTCTAGTGTCAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAGAACTAAGTA GAAACAGAGGACCACAATTAAATGCTGTGCTTCAATTGGGTCTCATGCAGCCTGAAGTCTATAAGCAAAGCCTT---CCT CTAAGTAATTGTAAACATCCTGAAATAAAAAGGCAAGGAGAAAATGAAGGAGTAGTTCAGGCTGTTAATGCAGATGTCTC TCTACGTCAGATTTCAGATAACTTAGAGCAA---CCTATGGGAAACAGTAATGCTTCTCAGGTTTGTTCTGAGACACCGG ATGACCTGTTAAATGATGACAAAATAAAAGAGAATATCGGCTTTGATGAAAGTGGCATTAAGGAAAGATCTGCTGTTTTT AGCAAAAGTGTCCAGAAAGGAGAATTCAAAAGGAGCCCTAGTCCCTTAGCCCAT---ACAAGTTTGTCTCAAGGTCGCCG AAGAGGGGCTAGGAAATTAGAGTCCTCAGAAGAGGA------------- >DogFaced TGTGGCACAAATACTCATGCCAACTCATTACAGCATGAGAACAGCAGTTTATTATACACTAAAGACAGAATGAATGTAGA AAAGACTGACTTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAGCAGAACAGATGGGTTGAAACTAAGGAAA CATGTAATGAT---AGGCAGACTTCCAGCANAGAGAAAAAGGTAGTTCTGAATGCTGATCCCCTGAATGGAAGAATAAAA CTGAATAAGCAGAAACCTCCATGCTCTGACAGTCCTAGAGAT---TCCAAAGAT---ATTCCTTGGATAACACGGAATAG TAGCATACAGAAAGTTAATGAGTGGTTTTCCAGACGTGATGAAACATTAACTTCTGATGTCTTACTTGATGAGAGGTCTG AATCAAATGTG---------------GTAGAAGTTCCAAAT------GAAGTAGATGGATACTCTGGTGCTTCAGAGGAA ATAGCCTTAAAGGCCAGTGATCCTCATGGTGCTTTAATATGT------GAAAGAGTTCACTCCAAATTGATAGAAAGTAA T---ATTGAAGATAAAATATTTGGGAAAACATATCGGAGGAAAGCAAGCCTCCCTAACTTAAGCCACATAACTGAAAATC TAATTACAAGAGCATCTGCTACAGAACCTCAGATAACACAA--------------------------------------- ---------------------GAGTGCCCCCTCACAAATAAACTAAAACGTAAAAGAAGAACTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAATAGATTTGACAACTGTTCAAAAAACTTCTGAAAATATAATTGAGGGAACTGACCAAATAG AGCAGAAT---------GGTCATGTGATGAATAGTTCTAATGATGGTCATGAGAATGAAACAAAAGGTGATTAT---GTT CAGAAGAAGAAAAATACAAACCCAACAGAA------TCATTGGAAAAAGAATCTGCTTTCAGAACTAAAGTTGAGTCTGT ACCCAACAACATAAGCAATGTGGAACTAGAATTAAATATTCACGGTTCAAAAGCACTCAAGAAGAATAGNCTGAGGAGGA AGTCCTNTACCAGGCATATTCATGCACTTGAACTAGTAGTCAATAGAAATTCAAGCCCACCTAATCATACTGAACTACAA 
ATTGATAGTTGTTCCAGCAGTGAAGAA---CTGAAGGAAAAA---AATTCTGACCGAATGCCAGACAGACACAGCAAAAA ACTTCAGTTCGTAGAAGATAAAGAATCTGCAACTGGAGCCAAGAAGAATAACATGCCAAATGAGGCAATAAATAAAAGAC TTTCCAGTGAAGCTTTTCCCGAATTAAATATAACAAACGTACCTGGTTTTTTTACTAATGGTTCAAGTTCTAATAAACGT CAAGAGTTTGTCAATCCTAGCCTTCAAGGAGAAGAAATAAAAGAGAAT---CTACGAACAATTCAAGTGTCTAATAGCAC CAAAGACCCCAAAATTCTAATCTTTGGTGAAGGAAGA---GGTTCACAA---ACTGATCGATCTACAGAGAGTACCAGTA TTTTATTGGGACCTGAAACGGATTATGGCACTCAAGATAGTATCTCATTACTGGAATCTGACATCCCAGGG---AGGGCA AAG---ACAGCACCAAACCAACATGCAGATCTGTGTGCAGCAATTGAAAACCCCAGAGAACTTATTCATGATTGT---TT TAAAGAAACTAGAAATGACACAGAGAGCTTTAAAGATCCATTGAGACATGAAGTTAAC---TCCACNNNNNNNNNN---- -----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNTTCAGACCCAGGAAATCTAGAAAAGGAATGTGCAACAGGCTATGCCCACTCCAGGTCCTTGATAAAACAAAGTCCAAA AGTCACTCTTGAATGTGACCGAAAAGGA---AATCAGGGAAAGAAAGAGTCTAACATNGAGCATGTGCAGGCAGTTTATA CAACTATAGGCTTTCCTGGGGTTTCTGAGAAAGACAAAAAGCCAGGAGATTATGCCAGATATGGCATAAAAGAAGTCTCT AGGCTTTGTCAGTCATTTCAGTCTAGA---AGAAATGAAACTGAGCTCACTATTGCAAATAAACTTGGACTTTCACAAAA CCCATATCATATGCCATCCATTTCTCCCATCAAGTCATCTGTTAAAACTATATGTAAGAAAAAT---CTGTCAGAGGAAA AGTTTGAAGAACATTCAATATTCCCTGAAAGAGCAATAGGAAATGAGACCATCATTCAAAGTACAGTGGGCACAATTAGC CAAAATAACATTAGAGAAAGCACTTTTAAAGAAGGCAGCTCAAGCGGTATTTATGAAGCAGGTTCCAGTACCAATGAACT AGGCTCTAGTGTCAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAGAACTAAGTA GAAACAGAGGACCAAAATTAAATGCTGTGCTTCAGTTGGGTCTCATGCAGCCTGAAGTCTATGAGCAAAGCCTT---CCT CTAAGTAATTGTAAACATTCTGAAATAAAAAGGCAAGGAGAAAATGAAGGAGTGGTTCAGGCTGTTAATGCAGATGTCTC TCCANGTCAAATTTCAGATAACTTAGAGCAA---CCTATGGGAAACAGTAATATTTCTCAGGTTTGTTCTGAGACACCGG ATGACCTGTTAAATGATGACAAAATAAAGGACAATATCAGCTTTGATGAAAGTGGCATTCAGGAAAGATCTGCTGTTTTT AGCAAAAATGTCCAGAAAGGAGAATTCAGAAGGAGCCCTAGTCCCTTAGCCCAT---GCAAGTTTGTCTCAAGGTCGCCC AAGAAGGGC---------------------------------------- >FreeTaile TGTGGCACAGATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTACTACTCACTAAAGACAGAATGAATGTAGA 
AAAGGCTGAATTCTGTAATAAAAGCAAGCAGCCTGGCTTAGCAAAGAGCCAGCAGAGCAGATGGGCTGAAAGTAAAGAAA CATGTAATGAT---AGGCAGACTCTCAGCACAGAGAAAAGGGTAGTTCTGAATGCTGATCCTCTGAATAGG--------- ------AGAAAAGAACCTCCAGGCTCTAACTATCCTAGAGAT---TCCCAAGAT---GTTCCTTGGATAACACGGAGTAG TAGCATACAGAAAGTTAATGAGTGGTTCTCCAGACGTGATGAAATACTAACTACTGGTGGCTCACATAACGGTAGATTTG AATCAAATGTTGAAGTAGCTGGTGCAGTAGAAGTTCCAAAT------GAAGTAGATGGATATTCTGGTTCTCCAGAGAAA ATAGCCTTAATGGCCTGTGATCCTCCTGATGCTTTAATATGT------GAAAGAGTCTCCTCTAAACCACTAGAAAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGAAAGGCAAGCTTCCCTAACTTGAGCCACATAAGTGAAAATC TAATTATAGGAGCATCTGCTATAGAACCCCAGGTAACAAAA--------------------------------------- ---------------------GAATGTCCCCTCACAAATAAACTAAAGCGTAAAAGAAGA---ACATCAAGCCTTCATCC TGAGGATTTTATCAAGAAAGTAGATTTGGCAGTTGTTCAAAAGACTCTTGAAAAGGTAATTGAGGGAACTGACCAAATAG AACAGAAT---------GGTCATGTGATGAGTATTACTAGTAATTGTCATGAGAATAAAACAAAAGGTGATTAT---GTT CAGAAAGAGAAAAATTCTAACCCAACAGAA------TCATTGGAAAAAGAATCTGCTTTCACAACTAAAGCTGAACCTAT AAGCAGCAGCATAAGCAATATGGAACTAGAGTTAAATATCCACAGTTCAAAAGCACCTAAGAAGAATAGGCTGAAGAGGA AGTCCTCTACCAGGCACATTTATGCACTTGAACTAGTAGTCAATAGAAATCCAAGCCCACCTAATCACACTGAACTACAA ATTGATAGCTGTTCTAGCAGTGAAGAG---GTGAAGGAAAAA---AATTCTGACCAAATACCAGTCAGACACAGCAAAAA GCTTCAACTCATGGAAGGTAAAGAACCTGCAACTGGAGCCAAGAAGAGTAACAAATCAAATGAACAAATAAATAAAAAAC TTGCCAGTGATGTTTTTCCAGAACTAAACTTAACAAACATACCTGGTTTTTTAAGTAATGATTCAAGTTCTAATAAACTT AAAGAGTTTGTCAATCCTAACCTTCAAAGAGAAGAAATAACAGAGAAC---CTAGGAACAGTTCACATGTCTAATAGTAC CAAAGACCTCAAAGATCTGATATTAAGTGGAGGAAGA---AGTTTGCAA---ACTGATAGATCTATGGAGAGTACCAATA TTTTATTGGTACCTGAAACTGATTATGGCACGCAGGATAGTATCTCATTACTGGAACCTGACACCCCAGGG---AAGGCA AAA---AAAGCTCCAAATCAATATGCGGGTCTGTGTGCAGAAATTAAAAACCNCAAGGAACTTATCCATGGTTGT---TC TAACGATAATAGAAATGACAGAGAGGACATTAAGGATCTATTGAGACCTGAAGTTAAC---CACANNNNNNNNNNNNNNN NNNNNNNGGAAGAGAGTGAACTTGATACACAGTATTTACAGAATACATTCAAGGTTTCAAAACGTCAGTCGTTTGCTCTG TTTTCAAATCCA---------GAAAAGGAATATGCAACAGTCTATGGCCACTCCAGGTCCTTAAGGAAACAAAGTCCAAA 
AGTCACTCTTGAATGTGGACAAAAAGAAGAAAATCAGGGAGAGAAAGAATCTGAAATCAAGTATGTACGGGGAGTTCACA CAACTGCAGGCTTTCCTGTGGTTTGTGAGAAAGACGAAAAGCCAGAAGAATATGCCAAATGTAGCATAAAAGGAACCTCT AGCCTTTGTCAGCCACCTCAGTTCAGA---GGCAACGAAACTGAACTCACTATTGCAAATAAACCCGGAATTTCACGAAA CCCATATCATATACCATGCATTTCTCCCATCAGGTCCTTTGTTAAAACTATAAATAAGAAAAAC---CTGTCAGAGGAAA AGTTTGAGGAACATTCAGTGTCACCTGAAAGAGCAATGAGAAATGAGAAT---ATTCTAAGTACAGTGAGCCCAATTAGC CTAAATAAC---AGAGAAAGCACTTTTAAAGAAGGCAGCTCAAGC---------------------AGTACTAATGAAGT AGGCTCTAGTACCAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAGAACTAGGTA GAAACAGAGGATCAAAAATAAATGCTATGCTCAGATCAGGTCTCATGCAACCTGAAGTCTATAAGCCAAGCCTT---TCT GTAAGTAATTGTGAACATCCTGAAATAAAAAGGCAAGGAGAAAATGAAGGAGTAGTTCAGGCTGTTAATGCAGATTTCTC TCCATGTCAAATTTCAGATAACTTAGAACAA---TCTATGGGAAGTACTCCTGCTTCTCAGGTTTGTTCTGAGACACCAG ATGACCTGTTAAATGATGACAAAATAAAGGAGAATAGCAGCTTTGCTGGAAGTGGCATTAAGGAAAGATCTGCTATTTTT AGCAAAAGTGTCAAGAAAGAAAAATTCAGAAGGAGCCCTAGCCCCTTTGCCCAT---ACACATTTGACTCATACTCGCCA AAGAGGGGCCAGGAAATTAGAGTCCTCAG-------------------- >LittleBro TGTGGCACAGATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTACTACTCACCAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAGCAGCCTGGCTTAGCAAGGAGCCAGCAGAGCAGATGGGCTGAAAGTAAAGAAA CATGTAATGAT---AGGCAGACTCCCAGCACAGAGAAAAGGGTAATTCTGGATGCTGATCCTCTGAATGGG--------- ------GAAAAAGAACTTCCACGCTCTGACCATCCCAGAGAT---TCCCAAGAT---GTGCCTTGGATAACACGGAGCAG TAGCATACAGAAAGTTAATGAGTGGTTTTCCAGGCGTGATGAAATACTAACTTCTGATGGCTCACATAATGGCAAGTCTG AGTCAAATGCTGAAGTAGCTGGTGCAGTGGAAGTTGCAAAT------GAAGTAGATGGGTATTCTGGTTCTCCAGGGAAA ATAACCTTAATGGCCCATGATCCTCATGGTGCTTTAACCTGTGAAAGTGAAAGAGTTCACTCCAAACCAGTAGAAAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCAAACTTGAGCCACATAACTAAAAATC TAATTGTAGGAGCATCTGCTATAGAACCTCAGATAGCACAA--------------------------------------- ---------------------GAGTGTCCCCTCACAAATAAACTAAAGCGTAAAAGGAGAAGTACATCNGGCCTTCATCC TGAGGATTTTATTAAGAAAGTAGATTTGGCAGTTGTTCAAAAGACTCCTGAAGAGATAATTAAGGGAACTGACCGAATAG 
AACAGAAT---------GGTCATGAGATGAATATTACTAATAATGATCATGAGAATGAAACAAAAGGTGATTGT---GTT CAGAAAGAGAAAAATGCTAACCTAACAGAA------TCACTGGAAAAAGAATCTGCGTTCACAAGTAAAGCTGAACCTAT AAGCAGCAGCATAAGCAATATGGAACTAGAATTAAATGTCCACAGTTCAAAAGCACCTAAGAAGAATAGGCTGAAGAGGA AGTCCTCTACCAGGCCTATTCATGCACTTGAACTAGTAGTCAATAGAAATCCGAGCCCATCTAACCATACTGAACTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---GTGAAGGAAAAA---AATTCTGACCAAATACCAGTCAGACACAGCAAGAA GCTTCAACTCATGGAAGGTAAAGAACCTGCAGCTGGAGCCAAGAAGAGTAATAAGTCAGATGAACAAATAAATAAAAAAC TTGCCAGTGATGCTTTTCTAGAACAAAACTTAACAAACATGCCTGGTGTTTTTACTAATGGTTTAAGCTCTAATAAGCTT AACGAGTTTGTCGATCCTAACCTACAAAGAGAAGAAACAGAAGAGAAC---CTAGGAGCAGTTCAAATGTCTAATAGTAC CAAAGACCTCGAAGATCTGACATTAAGTGGAGGAAGA---AGTGTGCAA---ATTGATAGATCTAAAGAGAGTACCAATA TTGTATTGGTACCTGAAACTGATTATGGCACACAGGATAGTGTCTCATTACTGGAACCTGACATCCCAGGG---AAGGCA AAA---ACAGCTCCAAATCAATGTGGGGATCTGTGTGCAGCAGTTAAAAATCCTAAAGAACTTATTCGTGGTTGT---TC TAAAGATATTAGAAATGACAGAGAGGGCTTTAAGGATCTATTGAGATGTGAAGTTAAC---CACACGCAGGAGACAAGCA TAGAAGTGGAAGAGAGTGAACTTGATACACAGGAATTACAGAATACATTCAAGGTGTCAAAGCGCCAGTCATTTGCTCTG TTTTCAAATCCA---------GAAAAGGAATGTGCAACAGCCTATGCTCACTCCCAGTCTTTAAGGAAACAAAGTCCAAA AGTCACTCTTGAATGTGGACAAAAAGAAGAAAATCAGGGAAAGAAAGAATCTAAAATCAAGCATGTACAGGCAGTTCACA CAGCTGTAGGCTTTCCTGTGGTTTGTGAGAAAGACAGAAAGCCAGGAGAGTATGCCAAATACAGCATAAAAGGAACCTCT ATGCATTGCCNGTCCTCTCAGTTCAGA---GGCAACAAAACTGAACTCACTATTACAGATAAATATGGACTTTCCCCAAA CCCATATCATATACCATCCATTTCTCCCATCAAGTCATTTGTTAAAACTGTAAGTAAGAAAAAC---CTGTCAAAGGAAA AGTTTGAGGAACATTTAGTGTCACCTGAAAGAGCAATGGGAAATGAGAAC---ATTCAAAGTACAGTGAGCCCAATTAGC CTAAGTAACATTAGAGAAAGCGCTTTTAAAGAAAGCAGCTCAAGC---------------------AGTACTAATGAAGG GGGCTCTAGTATCAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCATGCAGAACTAGATA TAAACAAAGGATCAAAATTAACTGCTATGCTCAGATTAGGTCTCATGCAACCCGAAGTCTATAAGCCAAGCCTT---CCT GTAAGTAATTGTAAACATCCTGAAGTAAAAAGGCAAGGAGACAACGAAGGACTAGTTCAGGCTGTTAATGCAGACTTCCC TCCATGTCAAATTTCAGATAACCTAGAACAA---CCTATGGGAAGTAGTCCTGTTTCTCAGGTTTGTTCTGCGACACCGG 
ATGACTTGTTAACTGATGATGAAATAAAGGAGAATAGCGGCTTTGATGAAAGTGGCATTAANGAAAGATCTGCTGTTTTT AGCAAAGATGTTCAGAAAGAAGAATTCAGAGGGAGCCCTAGCCCCTTAGCCCAT---ACATATTTGACTCGGAGTTGCCA AAGAAGGGCCAGGAAATTAGAGTCCTCAGAAGAGGA------------- >TombBat TGTGGCACAAGTACTCATGCCAGCTCAGTACAGCATGAGAACAGCAGTTTACTACTCACTAAAGACAGAATGAACGTAGA AAAGCTTGACTTCTGTAATAAAAGCAAGCAGCCTGGCTTAGCAAGGAGCCAGCAGAGCAGATGGGCTGAAAGTAAAGAAA CATGTAATGAT---AGGCAGACTCCCAGCACAGAGAAAAGGGTAGTTGTGAATGCTGATACCCTGGATGGG--------- ------AGAAAAGAACCTCCATACTCTGACTGTCCTAATGAT---TCCCAAGAT---GTCCCTTGGATAACAGGGAATAG TAGCATACAGAAAGTTAGTGAGTGGTTTTCCAGGCGTGATGAAATATTAACTTCTGATGGATCACATGATGGGAGATCTG AATCAAATATGGAAGTAGCTGGTGCAGTAGAAGTTCCATAT------GAAGTAGATGGATATTCTGATTCTCCAGAGAAA ATAGGCTTAATGGCCAGTGATCCTCTTGGTGCTTTACTATGTGAAAGTGAAAGAATCCACTCCAAACCAGTAGAAAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGAGCCATATAACTGAAAATC TAATTATAGGAGCACCTACTATACAATCTCAGATAACACAA--------------------------------------- ---------------------AATTGTCCCCTCACAGATAAACTAAAGCATAAAAGAAGAACTACATCAGGCCTTCGTCC TGAGGATTTTATCAAGAAGGTAGATTTGGCAGCTGTTCAGAAGACTCCTGAAAAGATAATTGAGGGAACTGATCAAACAG AACAGAAT---------GGTTCTGTGATGAATATTACTGATAATGGTCATGAGGATGAAACAAAATGTGATTAT---GGT CAGAAAGAGAAAAATGCTAACCCAGCAGAA------TCATTGGAAAAAGAATCTGCTTTCAGAACTAAAGCTGAACCTAT AAGCAGCAGCATAAGCAACATGGAACTAGAGTTAAATATCAACAGTTCAAAAGCACCTAAGAAGAATAGACTGAGGAGGA AGTCCTCTACCAGGCATATTTATGCACTTGAACTGGTAGTCAATAGAAATCCAAGCCCACCTAATCATATTGAACTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---GTGAAGAAAAGA---AATTCTGACCAAATACCAGTCAGGCACAGCAAAGA GCCTCAACTCGTGGAAGGTGAAGAACCTACGACTAGAGCCAAGAAG---AATAAGTCAAATGAACAAATAAATAAAAGAC TTGCCAGTGATACTTTTCCAGAACTAAATTTAAAAAACATACCTGGTTTTTTTACTAATGGTTCAAGTTCTAATAAACTT CAAGAGTTTGTTGATCCTAACCTTCGAAGAGAAGAAGTGGAAGAGAAC---CTAGGAACAATTCAAGTGTCTGATAGTAC CAAAGACCTCAAAGATCTGATATTAAGTGGAGGAAGA---AGTTTGCAA---ACTGATAGATCTATGGAGAGTACCAATA TTTTATTGGTACCTGAAACTGATTATGACACTCAGTATAGTATCTCATTACTGGAACCTGACACCCCAGGG---AAGGCA 
AAA---ACAGCACCAAGTCAACATGCGAGTCTGTGTGCAGCAATTGAAAACCCCAAGGAGTTTAACCATGGTTGT---TC TAAAGATACTAGAAGTGACACAGAGGGTGTTAAGGATCTACTGAGATGTGAAATTAAC---CNCACNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGAATACATTCNNGNTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCAAATTCAGAAAATCCAGAAAAGGAATGTGCAACAGTCTATGCCCACTCCAAGTCCTTAAGGAAACACAGTCCAAA AGTCACTCTTGGATGTGGTCAAAAAGAAGAAAATCAGGAAGAGAAAGAATCTAAAATCAAGCATGTACAGGCTGTTCACA CAGCTGCAGGCCTTCCTGCCATTTGTGAGAAAGACAAGAAGCCAGGAGAATATGACAGATACAATATAAAAGGAATCTCT AGGCTTTGTCAGTTATCTCAGTTCAGA---GGCAATGAAACTGAACTCACTATTGAAAATAAACACAGAATTTTACAAAA CCCATATCATATATCACCCATCTCTCCCATCAGGTCATCTGTTAAAACGATAAGTAAGAAAAAC---CTGTCAGAGGAAA AGTTTGAGAAACAGTCAGTGTCACCTGAAAAAGCAATGGGAAATGAGAACATCATTCAAAGTACAGTGAGCACAATTAGC CAAAATAATGTTAGAGAAAGAGCTGTTAAAGAAGGCAGCTCAAGC---------------------AGTACTAACGAAGT AGGCTCTAGTATCGATGAAGCAGGTTCCAGT---------------------GGTAAAAACATTGGAGCAGAACTAGATA GAAACAGAGGATCAAAATTAAGTGCTGTTCTCAGATTAGGTCTCATGCAACCCGAAGTCTATAAGCCAAGCCTT---CCT ATAAGTAATTGTAAACACTCTGAAATAGAAAGGCAAGGAGAAAATGAAGTAGTAGTTCAGGCTGTTAATGCA-------- ----TGTCAAATTTCAGATAACTTAGAACAG---CCTATGGGAAGTAGTCCTGTTTCTCAGGCTTGTTCTGAGACACCAG ATGACCTATTAGATGATGACAAAATAAAGGAGAATAGCAGCTTTGCTGAAAGTGGCATTAAGGAAAGATCTGCTATTTTT AGCAAAAGTGCCCAGGAA---GAACTCAGCAGGAGCCCTAGCCCCTTAACCCAT---ACACATTTGGCTCAGGGTCAGCA GAGAAGGGCCGGGAAATTAGAGCC------------------------- >RoundEare ---------------------NGCTCATTANAGCNTGAGAACAGCAGTTTACTGCTCACTGAGGACCAGATGAGTGTGGG AAAGGCTGAATTCCGTCATGAAAGCAAGCAGCCCGGCTTAGCGAGGAGCCAGCAGAGCAGATGGGCTGAAAGTAAAGAAA CATGTGACGAT---AGGCAGGCTCCCAGCGCAGAGGAAAGGGCAGTTCTGAATGCTGATCCCCAGAATGGG--------- ------AGGGAAGAATCTCCATCCTCTGACCACCCTAGAGAT---TCCCAAGAT---GTTCCTTGGATAACACGGAATAG CAGCATACAGAAAGTTAATGAGTGGTTTTCCAGACGTGATGAAACACGGACTTCCAACGGCTCCCACGGTGGGAGGCCTG AGTCAGACACGGAAGGAGCCGACACGGTAGAAGCTCCGGAC------GAAGTGCGTGGATGCCCTGGCTCTCCAGAGAAC ACAGCCTCGCCGGCCGGCGAGCCTCATGGCGCTTTAATGTGCGGAAGTGAAAGCGTCCACTCCAAACCAGTGGAGAGTAA 
T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGAGCCACGTAACTGACAGGC TAACTACGGGACTGTCCGCTCCAGACCCTCAGATAACACGA--------------------------------------- ---------------------GAGCGTCCCTTCACCAACAAACTAAAGCGTAAAAGGAGAACTACACCGGGCCTTCACCC AGAGGATTTTATCAAGACAGTGGATTTGACCGTTGTCCAGAAGACTCCTGAGAAGACCATTGAGGGAACTGACCAAACAG AACAGAAC---------GGTCGTGTGATGGATATTGCTAACAGTGGTCACGGGAATGAAGCAAAAGGTGATTAT---GTT CAGAATGAGAAGAGTTCTGACCCAACAGAA------TCACTGGGAGAAGAACCCGCTTTCAGAACTAAAGCTGGACCTAT AAGCAGCAGCATAAGCACCGTGGGACTAGAATTGAATGTCCACGGTTCAAAAGCGCCCAGGAAGACTAAGCGGAGGGAGA AGACTGCTGCCGAGCATACTTATGCACCTGGACTCGGGGTCAGCAAGAGCCCGAGCCCCCCTGCTCACGCCGGACTGCGG ACGGACGGTTGTTCTGGCGGCGAGGAG---GCAAAGATCGGG---AATTCTGGGCAGAGGCCAGCCAGGCGGAGCAGCAA GCTTCCGCTCGAGGAGGGTCAGGAGCCTGCAGCTGGGGCCAGCAAGGGTGACCGGTCAGATGCACCGATGAATAAGAGAC TTGCCAATGATGCTTTTCCGGAACTAAATTTAACAAGCGTATCTGCCGTTTTTACTAATGGTTCAGGTTCTACTAAACTT AAAGAGTGTGTCGATTCTAACCCTCAAGGAGAAGACACAGAAGAGAAC---CGAGGAACAGTTCAAGTGTCTAGTAGCAC CAAAGACCTCAAAGATCTGATATTCAGTGGGGGAAGA---AGTTTGCAA---ACTGACAGATCTGTGGAGAGTCCCAATA TTGNATTGGTACCTGAAACTGACTGTGACACTCAGGATAGCGTCTCCCTGCTGGCACCTGACACCCCAGGG---AAGGCA GAA---ACAGCACCAGGCCAACGTGTGGGTGGGTGCGCAGCTGCTGGAAGCCCGAAGGAACTTATCCGTGATTGT---TC GAAG------------GACACAGAGGGCGTTAAGGATCTCCCGAGATGTGAAGTTCAG---GAGACNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGCTCTG TGTTCAAATCCAGGNAATCCAGGAAAGGAATGTGCAACCATGTATTCCCACTCCAGGTCCACAAGGAAACGAAGTCCAAA AGTTACTCTTGAATGTGGACGAAAAGAA---CATCAGGGAGAGAAGGAATTTAATATGGAGCGTGCGCAGCCACCTTACA CAACAGCAGGCTTTCCTAAGGGTTGTGAGAAAGACAAAACGCCAGGAGAGTGTGCCACATATACTATGAAAGGAATCTCT AAGCCTTGTCAGTCATCTTCGTTCAGA---GGCAATGAAACTAAACTCACTATTGAAAATAAATATGGGATTTCACAAAA CCCCTATCACATACCACCCATTTCTCCCGTCAGGTCATCTGTTAAAACTACAAGTAAGAAGAAC---CTGTCAGAGGAGA AGTGTGAGGAACATTCAGTGTCACCTGCCAGAGCCGTGGGACGTGAGAACATCATTCAGAGTACAGCGGGCACTCCTAGC CAGAACAAAACCAGGGAAAGTGCCGCCAGAGAAGGCAGCTCGAGC---------------------GGCACTAACGAAGT 
AGGCTCCAGT------------------------------------------GGTGAAAACGGTCAAGCAGAGCCAGGCA CAAACAGAGCATCAAAATTAAGCGCTCTTCTCAGATCAGGGCTCATGCAACCTGAAGTCTGTAAGCCGAGTCTT---CCT CTGAGTAATTGTGAAGATCCTGAAATAAAAAGGCAA---GAAGATGGGGGAGTAGTGCAGGCTGTTAATGCAGATTTCTC TCCGTGTCAAATTTTAGATAACCTAGAACAA---CCTCTGGGAAGCAGTCCCGCTTCTCGGGTTTGTTCCGAGACCCCAG ACGACCTGTTAAATGATGACAAGGTAAAGGAGGATAACAGCTTTGCTGAAGGCGGCATGAAGGACAGATCTGCTGTTTTT AGCAAAAGCGTCCTGAAA---GAATTCAGAAGGAGCCCCAGTCCCTTAGCCCAC---ACANTCTTGGCTCGGGGTCACCC CAGAAGGGCCAGGAAACTCGAGTCCTCAGAAGAGGA------------- >FalseVamp TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAATTTATTACTGACTGAAGACATAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGTGTTAGCAAGGAGCCAGCAGAGCAGATGGGCTAACAGTAAAGAAA TATGTAATGAT---AGGCAGACTCCCAACACAGAGAAAAGGGTAGTTCTGAATGCTGATCCCTTGAATGAGAGAAAAGAA CTGAATAATCAGAAACCTCTATGCTCTGACAGTCCTAGAGAT---TCCCAAGAC---GTTCCTTGGATAACACGTAATAG TAGCATACAGAAAGTTAATGAGTGGTTTTCCAGACGTGATGAAATACTAACTTCTCATGGTTCACATGATGGGACAGGTG GATCAAATACAGAAGAAGCTGGTGCAGCAGAAATTCTAAAT------GAAGTAGATGGATATTCTGGTTCTTCAGAGAAA ATAGCTTTAATGGCCAGTGATCCTCCTGGTGCTTCAATTTGTGAAAGTGAAAGAGTCTACTCCAAACCAGTAGAAAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTACAGGAGGAAGGCAAGCCTCCCTAACTTGGGCCACAGAGCTGAAAATC TAATTATAGGAGCATCTGCTGTAGAACCTCAGATAATACAA--------------------------------------- ---------------------GAGTGTCTCCTCATAAATAAACTAAAGCGTAAAAGGAGAACTACATCAGCCCTCCATCC TGAGGATTTTATCAAGAAAGTAGATGTGGCAGTTGTTGAAAAGATTCCTGGAGATAGAATCAAGGGAACTGACCAAATAG AGCAGCAT---------GGTCATGTGATGAATATTACTAATTATGGTCATGAGAATGAAACA------------------ ------------AATGCTCACTCAACAGAA------TCACTGGACAAAGAATCTGCTTTTAGAACTAAAGCTGAACCTAT AAGCAGCAGTATAAGTAATATGGAACTAGAATTAAATATCCACAGTTTAAAAGCACCTAAGAAGAATAGGCTGAGGAGGA AGTCCTCTACAAGGCATATTCATGCACTTGAACTA---GTCAGTAGAAATCCAAGCCCACCTAATCATACTGAACTACAG ATTGATAGTTGTTCTAGCAATGAAGAG---GTGGAGAAGAAA---AACTCTGACCAAATGCCAGCCAGACACAACAAAAA TCTTCAACTTATAGAAGATAAAGAACCTGCAACTAGAGCTAAGAAGAGTAACAAGCCAGATGAACAAATAAATAAGAGAC 
TTACCAGTGATGCTTTTTCAGAACTAAATTTAACAAACACACCTGGTTTTGTTACCAACAGTTCAAATTCTGATAAACTT AAAGAGTTTGTCAATCCTAGCCTTCAAAGAGAAGAAATAGAAGAGCAT---CTGGGAACGATTAAAGTGTCTAATAGTAC CAAAGACCCCAAAGATCTGATACTAAGTGGAGGAAGA---GGTTTGCAA---ACCGATCGTTCTATGGAGAGTACCAGTA TTTCATTGGTACCTGATACTGATTATGGCACTCAGGCTAGTATGTCATTACTGGAACCTGACACCCCAGGG---AAGGTA AAA---ACAGCACCAAATCGACGTGCAGGTGTGTGTGCTGCAATTGAAAACCCCAAGGAAATTATCCATGGTTGT---TC TAAAGATACTAGAAATGACACAGGGGACTTTAAGGATCCACTAAGAGGTGAAGTTAAC---CACACNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NTTTCAAATCCAGGAAATCCAGAAAAGGAATGTGCAACAGTCTGTGCCCACTCCAGTTCCTTAAGGAACGAAAGTCCAAA GGTCACTCTTGAATGTGGACAAAAAGAAGAAAATCACGGAAAGAAAGAGTCCAAAATCAAGCATGTGCAGGCAGTTCCAA CAACTGCAGGCTTTCCTGTGGTTTGTGAGAAAGAAAAAAATCCAGGAGATTATGCCAAATATACCAAAAAAGGAGTCTCT AGGCTTCGTCAGTCATCTCAGATCAGA---GGCAACAAAACCGAACTCACTGTTACAAATAAACATGGAATTTCTCAAAA CCCATATCATATACCACCCATTTCTTCCATCAGGTCATCTGTTAAAACTATATGTAAGGAAAAC---CTGTCAGAGGAAA ACCTTGAGGCATATTTGGTGTCACCTGAAAGAGCAATGGGAAATGAGAGCATTGTTCAAAGTACAGTGAGCACAGTTAGC CAAAGTAACATTAGAGAAAGCACTTTTAAAGAGGGCAGCTCCAGCAATATTTATGAAGCAGATTCCAGTGCTAATGAAGT AGGCTCTAGTATCAGTGAGGTAAGTTCCAGT---------------------GGTGAGAACATTCAAACAGAACTGGGTA GAAACCAAGGACCAAAATTAAATGCTGTGCTCAGATTAGGTCTCATGCAACCTCAAGTCTATGTGCAAAGCCTT---CCT GTAAGCAATTGTGAACATTCTGAAATAAAAAGGCAAGGAGAAAATGAAGGAGTAGTTGAGGCTGTTAATGCAGAATTCTC TCCATGTCAAACTTCAGATAACCTGGAACAA---CCTATGGGAAGTAGTCATGCTTCTCAAATTTGTTCTGAGACACCGG ATGACCTGTTAAACGATTATGAAATAAAGGAAAATGTTAGCTTT---------------AAGGAAAGATCTGCTGTTTTT AGCAAAAGTGTCCAAAAAGAAGAATTGAGTAGAAGCCCTAGCCCCTTAATCCAT---ACATGTTTAGCTCAGGGTCACCG AAGA--------------------------------------------- >LeafNose TGTGGCACAAATACTCATGCCAGCTCTTTACATTATGAGCACAGCAGTTTATTACTCACTGAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAGCAGCCTGGCTTAGCAAGGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCCAGCACAGAGGAAAATGTAGTTCTGAATACTGATCCCCTGAATGGGAGAAAAGAA 
CTGAATAAGCAGAAACCTCCATGCTCTGACAGTCCTAGGGAT---TCCCAAGTT---GTTGCATGGATAACACAGAATAG TAGCATACAGAAAGTTAATGAGTGGTTTTCCAGACGTGATGAAATATTAACTTCTCATAGCTCATGTTATGGGAGAGCTG AATCAAATACAGAAGTATCTGGTGCAGTAGAAGTTCCACTT------GAAGTAGATGGATTTTCTGGCTCTACAGAGAAA ATAACCTTAATGACCAGTGATCCTCATGATGCTGTAATATGTGAAAGTGGAAGAGTCCACTCCAAACCATTGGAAAGTAC T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTCAGCCACACAACTGAAAACA TAATTATAGGAGCATCTGCTGTAGAACCTCAGATAACACAA--------------------------------------- ---------------------GAGTGTCCCCTCACAAACAAACTAAGGCGTAAGAGGCGAACTACGTCAGGCCTTCATCC TGAGGATTTCATCAAGAAAGTAGATTTGACAGTTGTTCAAAAGACTCCTGAAAAGATAATTGAGAGAACTCACCAAACAG AACAGAAT---------GGTCATGTGATGAACATTACTGATAATGGTCATGGGAATGAAACAAAAGGTGATTAT---GTT CAGAAAGAGGATAATGCTAACCCAACAGAA------TCATTGGAAAAAGAATCTGCTTTCAGAACTACAGCTGAACCTAT AAGCAGCAGTATAAGCCATATGGAACTAGAATTAAATATCCATAGTTCAAAAGCACCTAAGAAAAATAGGCTGAGGAGGA AGTCCTCTACCAGGCCTATTCATGCACTTGAACTAGTAGTCAGTGGAAATCCACGCCCACCTAGTCAGACTGAGCTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---GTGAAGAAAAAA---ATTTCTGACCAAGTGCCAGTCACACACAGCAAAAC GCTTGAACTCATAGAAGATCAAGAACCTGCAAATGGAGTCAAGAAGAGGAACAAGCCAAATGAACAAATAAATAAGAGAC TTACCAGTAATGCTTTTCCAGAACTAAATTTAACAAACATACCTGGTATATTG---AACCGTTCAAGTTCAAATAAACTT CAAGAGTTTGTCAATCCTAGCCTTCAAAGAGAAGAAATAGAAGAGAGC---CTAGGAACAATTCAAGTGTCTAATAGTAC CAAAGGCCTCAAAGATTTGATATCAAATGGGGGAAGA---GGCTTGCAA---ACTGGTCAATCTATGGAAAGTACCAGTA TTTTATTGGTACCTGATACTGATTATGGCAGTCAGGATAGTATGTCATTACTCGAACCTGACACCCCAGGG---AAGGCA AAG---ACTGCACCAAATCAACATGTGGGTGTATGTACAGCAGTTGAAAACCCCGAGGAACTTATCCATGGTGGT---TC TAATGATACTAGAAATGACACAGAGAGCTTTAAGGATTCATTGAGACATGAAGTTAAC---CACGGTCAGGAGACAAGCA TAGAAATGGAAGAGAGTGAACTTGATACACAGTATTTACAGGAAACATTCAAGGTTTCAAAGCGTCAATCATTTGCTCTG TTTTCAAATCCAAGAAATCAAAGAAAGGAATGTGCAACAATCCAGTCCAGGTCC------TTAAGGAAACAAAGTGCAAA AGTCACTCTTGAATGTGGACAAAAAGAAGAAAATCAGGGAAAGAAAGATTCTAAAATCAAGCTTGTACAGGCAGTTCATA CAACTGCAGGCTATCCTGTGGTTTGTGAGAAAGATGAAAATCCAGGAGATTATGCCAAATACAGCACAAAAGGAGTCTCT 
AGGCTTTGTCAATCATCTCTGTTCAGA---AGCAACGAAACTGAACTCACTATTGCAAATAAACATGGAATTTCTCAAAA CCCACATAATATACCACATATTTCTCCCATCAGGTCATCTGTTAAAACTGTATGTAAGAAAAAC---CTGTCAGAGGAAA ACTTGGAGGAATATCCAGTGTCACCTGAAAGAGCAATGGGAAATGAGAGCATCATTCAAAGTACAGTGAGCACAATTAGC CAAAATAACATTAGAGAAAGCACTTTTAAAGAAAGCAGCTCAAGCAATNTTTATGAAGCAGATTCCAGTACTAATGAAGT AGGCTCTAGTATCAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAGAACTCGGCA GAAACAGAGGACCGAAATTAAATGCTGTGCTCGGATTAGGTCTCGTGCAACCTGAAGTCTATAGGCAAAGCCTT---CCT GTAAGTAATTGTCAACATCCAGAAATAAAAAGGCAGGGAGAAAATGAAGGAATAGTTCAGGCTGTTAGTGCAGACTTCTC TCCATGTCAAATTTCAGATAACCTAGAACAA---CCTACGGGAAGTAGTCATGCTTCTCAGGTTTGTCCTGAGACACCGG ATGACCTGTTAAATGATAACGAAATAAAGGAAAATGACAGCTTTGCTGAAAGTGACATTAAGGAAAGATCTGCTGTTTTT AGCAAAAGTGTCCAGAAAGGAGAATTCAGAAGGAGCCCTAGCCCCGTAGCTCAC---ACACGTTTGGCTCAGGGTCACCA AAGACGGGCCAGGAAATTAGAGTCCTCAGAAGAGGA------------- >Horse TGTGGCACAAATACTCATGCCAGCTCATTGCAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAAGAGCCAACAGAGCAGACGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGGCCCCCAACTCAGAGGAAAAGCTAGTTCTGAATGCTGATCCGCTGTATGGGAGAGAAGAA CTGAATAAGCAGAAACCTCCACGCTCTGACAGTCCTAGAGAC---TCCCAAGAT---GTTCCTTGGATAACACTGAATAG TAGCATACAGAAAGTTAATGAGTGGTTTTCCAGAAGTGAGGAAATGTTAACTTCTGATGACTCATGTGACGGAGGGCCTG AATCAAATACAGAAGTAGCTGGTGCAGTAGAAGTTCCAAAT------GAAGTACGTGGATATTCTGGTTCTTCAGAGAAA ATAGACTTAATGGCCGGTGATCCTTCTAGTGCTTTAATATGTGAAAGTGAAAGAGTCCGCTCCAAACCAGTAGAGAATAA T---ATTGAAGATAGAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCTTCCCTAACTTGAGCCACATAACTGAAGATC TAATTATAGGGGCATCTGCTATAGAACCTCAGATTACACAA--------------------------------------- ---------------------GAGCGTCCACTCACAAATAGAGTGAAGCATAAAAGGAGAACGTCATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGTAGATTTGGCAGTTGTCCAAAAGACTCCTGAAAAGATAATTGAGGGAACTGACCAAATAG AGCAGAAC---------AGTCATGTGATGAATATTACTCCTAATGGTCATGGGAATGAAACAAAAGGCGATTAT---GTT CAGAATGAGAAAAACGCTTACCTAACAGAA------TCATTGGAGACAGAATCTGCTTTCAGAACTAAAGCTGAACCTAT 
AAGCAGCAGCATAGGCAATCTGGAACTGGAATTAAATATCCACAGTTCAAAAGCACCTAAGAAGAATAGGCTGAGGAGGA AGTCTTGTACCAAGCAGATTCATGCACTTGAACTAGTAGTCAGTAAAAATCCAAGCCCACCTAATCATACTGAACTACAA ATCGATAGTTGTTCTAGTAGTGAAGAG---ATGAAGAAAAAA---AATTCTGACCAAATGCCGGTCAGACACAGCAAAAA GCTTCAACTCATGGAAGATAAAGAACCTGCCACTGGAGCCAAAAAGAGTAACAAGCCAAATGAACAAATAAATAAAAGAC TTGCCAGTGATGCTTTTCCAGAGCTAAAATTAACAAACATACCTGGTTTTTTTACTAACTGTTCAAGTTCTAATAAACTT CATGAGTTTGTCAATCCTAGCCTTCAAAGAGAAGAAATAGAACAGAAC---CTAGGAGCAAATCGACTGTCTAATAGTGC CAAAGACCCCAAAGATCTGATATTAAGTGGAGGAAAA---TGTTTGCAA---GCTGAAAGATCTGTAGAGAGTTCCGGTA TTTCATTGGTACCTGATACTGATTATGGGACTCAGGATAGTATCTCACTGCTGGAAGCTGACACCCTAGGG---AAGGCA AAA---ACAGCACCAAATCAATGTGCAAATCTATGCGCAGCAATTGAAAACCCCAAGGAACTTACCCATGATTGT---TC TAAGGATACTAGAAATGATACACAGGGCGTTAAGGATCCATTGAGACGTGAAGTTAAC---CACACTCAGGAGACAAGCA TAGAAATGGAAGAGAGTGAATTTGATACGCAGTATCTACAGAATATGTTCAAGGTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCAAATCCAGGAAGTCCAGAAAAGAAGTGTGCAACAGTCAGTGCCCACTCCAGGTCCTTAAGGAAACAAAGTCCAAA AGTCACTCTTAAATGTGGACAAAAAGAAGGAAAGGAGGGAAAGAAAGAGTCTAAAATCAAGAATGTGCAGTCAGTTCACA CAACTGTGGGCTTTCCTGTGATTTGTCAGAAAGATAAGAAGCCAGGTGACTATGTCAAATGTAGCACAAAAGAAGCCTCT AGGCTTTGTCAGTCATCTCAGTTCAGA---GGCAACGAAACNGAACTTATTACTGCAAATAAACATGGAATTTCACAAAA CCCATATTATATACCATCACTTTCTCCCATCAGGTCATCTGTTAAAACTGTATGTCAGAAAAAC---CTGCCAGAGGGAA AGCTTGAGGAACAGTCACTGTCACCTGAAAGAGCAATGGGAAATGAGAGCATTGTTCAAAGTACAGTGAGCACAATTAGC CAAAATAACATTAGAGAAAGCACATTGAAAGAAGTCAGCTCAAGC---------------------AGTATTAATGAAGT AGGCTCTAGTATTAATGAAGTAGGTTCCAGT---------------------GGTGAACACATTCAAGCAGAACTAGGCA GAAACAGAGGACCTAAATTAAATGCTATTCTCAGATTAGGTCTTATGCAACCTGAAGTCTATAAGCAAAGTCTT---CCT ATAAGTAATTGTAAACATCTGGAAATAAAAAGGCAAGGAGAAAAGGAA---GTAGTTCAGGCTGTTAACGCAGACTTTTC TCCGTGTCTAATTTCAGATAACCTAGAACAA---CCTATGGGAAGTAGTTGTGCTTCTCAGGTTTGTTCTGAGACACCTG ATGACCTGTTAAATGATGACGAAATAAAGGAAAATATCAGCTTTGCTGAAAGTGGCGTTAAGGAAAGATCTGCCGTTTTT AGCAAAAGCGTCCAGAAAGGAAAGTTCAGAAGGAGTCCTAGCCCTATAGGCCGT---ACGTGTTTGGCTCAGGGTCACCA AAGACGGGCCAGGAAATTAGAGTCCTCAGAAGAGAACACGTCTAGTGAG 
>Rhino TGTGGCACGAATACTCATGCCAGCTCATTGCAGCATGAGAACAGCAGTGTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGTGAACAGCCTGGTTTAGCCAAGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCCAACTCAGAGAAAAAGCTAGTTCTGAACGCTGATCCYCTGTATGGGAGAAAAGAA CTGAATAAGCAGAAACCTCCATGTTCTGACAGTCCTAGAGAT---TCCCAAGAT---ATTCCTTGGATAACACGGAATAG TAGCATACAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAAATATTAACTTCTGATGACTCACATGATGGGGGGCCTG AATCAAATACTGAAGTAGCTGGTGCAGTAGAAGTTCAAAAT------GAAGTAGATGGATATTCTGGTTCTTCAGAGAAA ATAGGCTTAATGGCCAGTGATCCTCCTGGTGCTTTAATATGTGAAAGTGAAAGAGTCCACTCCAAACCAGTAGAGAATAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCTTAACTTGAGCCACATAACTGAAGATC TAATTATAGGAGCATCTGCTATAGAATCTCAGATTACACAG--------------------------------------- ---------------------GAGCGTCCCCTCACAAATAAACTGAAGCATAAAAGGAGAACTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGTAGATTTGGCAGTTGTCCAAAAGACTCCTGAAAAGATAATTGAGGGAACTGACCAAATAG AGCAGAAC---------GGTCGTGTGATGAGTATTGCTAATAATGGTCATGAGAATGAAACAAAAGGTGATTAT---GTT CAGAAAGAGAAAAATGCTAACCCAACAGAA------TCATTGGAGAAAGAATCTGCTTTCAGAACGAAAGCTGAACCTAT AAGCAGCAGCATAAGCAATCTGGAACTGGAATTAAATATCCACAGTTCAAAAGCACCTAAGAAGAATAGGCTGAGGAGGA AATCCTCTACCAGGCATATTCATGCACTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAATCATACTGAACTACAG GTTGATAGTTGTTCTAGCAGTGAGGAG---ATGAAGAAGAAA---AGTCCCAGCCAGGTGCCAGTCAGACATAGCAGAAA GCTTCAACTCACMGAAGATAAAGAACCCGCAGCTGGAGCCAAGAAAAGTAACAAGCCAAATGAACAGATAAATAAAAGAC TCGCCAGTGATGCTTTTCCAGAACTAAAATTAACAAACGTACCTGGTTTTTTTGCTAACTGTTCAAGTTCTAATAAACTT CAAGAGTTTGTCAATCCTAGCCTTCAAAGAGAAGACATAGAACGGAAC---CTAGGAGCAATTCAAGTGTCTAATAGTAC CAAAGACCCCGAAGATCTGATATTAAGTGGAGGAAGA---GGTTTGCAR---GCTGAAAGATCTGTAGAGAGCACCAGTA TTTCATTGGTACCTGATACTGATTATGGCACTCAGGGTAGTATCTCATTACTGGAAGCTGACACCCTAGGG---AAGGCA AAA---ACAGCACCAGATCAACGTGCAAGTCTATGTGCAGCAATTGAAAACCCCAAGGAACTTATCCATGATTGT---TC TAAAGATACTAGAAATGACACAGAGGGCCTTAAGGATCCATTGAGATGTGAAGTTAAC---CACACTCAGGAGACAAGCA TAGAAATGGAAGAGAGTGAACTTGATACACAGTATCTACAGAATACGTTCAAGGTTTCAAAGCGTCAGTCATTTGCTCTG 
TTTTCAAATTCAGGAAATCCAGAAAAGGATTGTGCAGCAGTCTCTGCCCACACCAGGTCTTTAAGGAAACCAAGTCCGAA AGTCACTCTTGAATGTGGACAAAAAGAAGAAAATCAGGGAAAGAAAGAGTCTAAAGTCAAGCATGTGCAGTCAGTTCATA CAACTGTGGACTTTCCTGTGGTTTGTCAGAAAGATAAGAAACCAGGTGATCATGTCAAATATAGCATAAAAGAACTCTCT AGGCTTTGTCAGTCATCTCAGTTCAGA---GGCAATGAAACTGAACTCATTACTGCAAATAAACGTGAACTTTCACAAAA CCTGTGTTATATACCATCACTTTCTCCCATCAGGTCATCTGTTAAAACTATATGTAAGAAAAAT---GTGTCAGAGGAAA AGCTTGAGGAACATTCAGTGTCCCCTGAAAGAGCACTGGGAAACAAGAGCGTCATTCAAAGTACAGTGAGCACAATTAGC CAAAATAACATTAGAGAAAGCACTTTTAAAGAAGTCGGCTCAAGCAGTATTAATGAAGTAGATTCCAGTACTAATGAAGT AGGCTCTAGTATTAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTGAAGCAGAGCTAGGCA GAAACAGAGGACCTAAATTAAATGCTATTCTCAGATTAGGTCTTATGCAACCTGAAGTCTATAAGCAAAGTCTT---CCA ATAAGTAATTGTAAACATCCGGAAATAAAAAGGCGAGGAGAAAATGAAGGAGTAGTTCAGCCTGTTAATGCAGATTTCTC TCCGTGTCCAATTTCAGATAACCTAGAACAA---CCTGTGGGAATTAGTTGTGCTTCTCAGGTTTGTTCTGAGACACCTG ATGACCTATTAAATGACAACGAAATAAAGGAAAATATCAGCTTTACTGAAAGTGGCATTAAGGAAAGATCTGCTATTTTT AGCAAAAGCGTCCAGAAAGGAGAATTCAGAAGGAGCCCTAGCCCTTTAGCCCGT---ACATGTTTGGCT---------CA AAGAGGGGCCAGGAAATTAGAGTCCTCAGAAGAGAACATGTCTAGTGAG >Pangolin TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATAAATGTAGA AAAGACTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCTGCAAAGCAGATGGGCTGAATGTAAAGAAA CATGTAACGAT---AGGCAGGCTCTCAGCACAGGGAAAAAGGTAGTTCTGAATGCTGATCCCCTGTGTGGGAGAAAAGAA CTGAATAAGCAGAAACCTTCATGCTCTGACAGTCCTAGAGTT---TCTCAAGAC---GTTCCTTGGATAACACTAAATAG TAGCATACRGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAAATGTTAACTTCTGATGATCCATGTGATGGGAGGTCTG AATCAAATACTGAAGTAGCTGGTGCAGTAGAAGTTCCAAAT------GAAGTAGATGAAAATTGTGGTTCTTCAGAGAAA AAAGACTTAATGGCCAGTGATCCTCATGATGCTTTAATACGTGAAAGTAAAAGAGGCCACTCCAAACCAGTAGAGAGTAA T---ACTGAAGATAAAATATTTGGGAAAACCTATCGGAAGAAGGCAAGTCTCTCTCACTTGAGCCACGTAACTGAAAATC TAATTATTGGAGCATTTGCAGTAGAACCTCAGATAACACAA--------------------------------------- ---------------------GAGCGTCCCCTCAGAAATAGAGTAAAGCGTAAAAGGAGAACTACATCAGGCCTTCATCC 
TGAGGATTTTATCAAGAAAGTAGATTTGGAAGTTGTTCAAAAGACTCCTGAAAAGATAATTGAGGGAACCGACCAAATAG AGCAGAAT---------GCTCTTCTGATGAATAGAACAAATAACGGTCATAAGAATGAAACAAAAGGTGATTAT---GTT CAGAAAGAGAAAAATGCTAACCCAACAGAA------TCATTGAAAAAAGAACCTGCTTTCAGAACTAGAGCTGAACCTAT AACCAGCAATATAAGCAATATGGAGCTAGAATTAAATATCCATAGTTCAAAAGTACTTAAGAAGAATAGGCTGAGGAGGA AGTCTTCTACCAGGCACATTCATGCACTTGAACTAGTAGTCAATAGAAATCCAAGCCCCCCTAAACATACTGAACTACAA ATTGATAGTTGTTCTAGCAGTGAAGAGTTGATGAAA---------------AACCAAATACCAGTCAAACATNCCAAAAA GCTTCAACTCAAGGAAGATAGAGAACCTGCATCTAGAGCCAAGAAGAGTAACAAGCCAAATGAACAAATAAATAAAAGAT TTACCAGTGACACTTTTCCAGAACTAAATTTAACAAACATACCTGGTTTTTTTAGTAAATGTTCAAGTTCTAATAAACTT CAAGAGTTTGTTGACCCTAGCCTTCAAAGAGAAGAAATAGAAAATAAT---CTAGAAACAATTCTA------------AC CAAAGACCCCAAAGATGTGATATTACATAGAGGA------GGTTTGCAA---ACTGAAAGATGTGTAGAGAGTAACAGTA TTTCATTGGTACCTGATACTGATTACGGCACTCAGGATAGTATCTCATTACTGGAAGCTGACACACTAGGG---AAGACA AAA---ACAGCACCAAATCAATGTGCAAGTCTGTGTGCAGCAATTGAAAACCCCAAGGAACTTACCCATTGTTGT---GC TAAAGATATTAGAAATGACACAGAGAGCTTTAAGGATCTATTGAGACATGAAGATAAC---CACACTCAGGAGACAAGCA TAGGAACGGAAGAGAATAAACTTGATAAGCAGTACTTACAGAATACTTTCAGGGTTTCAAAGCATCAGTCATTTGCTCAG TTTTCAAATATGGGAAATCCAGAAAAAGAATGTGCAGCAGTCAGTGCCCACTCTGGGTCCTTAAGGAAACAAAGTCCAAA AGTCACTCTTGCATATGGACAAAAAGAAGAAAATGAGGGAAAGAAAGAGTCTGAAATCAAGCATGTTCAAACAATTCATA CAACTGCAGGCCCTCTTGTGGTTAGTCAGAAAGATGAGAAGCCAGGTGATTATGTAAAATTTGGCATAAAAGGAGTCTCT ACGCATTGTCAGTCATCTCAGTTCAGA---GGCAATAAAACTGAACTTATTTTTGCAAATAAACCTGGAATTTCACAATA CCCATATCATATATCATCAATTTCTCCCATCAGGTCATCTGTTAAAATAATATGTAAGGAAAAC---CTGTCAGAGGAAA AGTCTGTGGAACATTCGATGTCATTGGAAAAAGCAGTGGGAAACAAGAGCATCATTCAAAGTACTGTGAGCACAATTAGC CAAAATAACATTAAAGGAAACAATTTTAAAGATGGCAGCTCCAGCAGCATTAATGAAATAGGTTCCAGTACTAATGAAGT AGGCTCTAGTATTAATGAAGTAGGTTCCAGT---------------------GGTGAAAAAATTCAAGCAGAACTAGGTA GAAACAGAGGACCTAAATTAAGTGCTGTGCTCAGATTAGGTCTTATGCAACCTGAGGCCTATAAGCAAAATCTT---CTT ATAGGTAATCGTAAACACCCTGAAATTAAAAGACAAGGAGAAAAAGAAGGAGTAGTTCAGGTTGTTAATGCTGATTTCTC 
TCCATGTCTAATTTCAGATAACCTAGAACAA---CCTATGGGAAGTAGTCATGCTTCTCAGGTTTGTTCTGAGACACCTG AGGACATATTAAATGGTGATGAAATAAAGGAAAATATCGGCTTTGCTGAAAGTGGTATTAAGGAAAGATCTGCTGTTTCT AGCAAAAGTGTCCAGAAAGGAGAATTCAGAAAGAGCCCAAGCCCTGTAGTCCAT---ATGAGTTTGTCTCAGAGTCACCA AAGAAGAGCTGGGATAATAGAGTCTTCAGAAGAGAACATGTTTAGTGAG >Cat TGTGCCACAAATACTCGTGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGGCAGAATGAATGTAGA AAAGGCTGAATTCTGTAATGAAAGCAAACAGCCTGGCGTAGCAAGGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCCAGCACAGAGAAAAAGGTAGTTCTGAATGCTGATCCCCTGTGTAGGAGAAAA--- CTGAGTAAGCAGAAATCTCCATGCTCTGACAGTCCTAGAGAT---TCCCAAGAT---GTACCTTGGATAACACTGAATAG TAGCATACAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAAATGTTAACTTCTGATGACTCACATGATGAGGGATCTG AATCGAATCCTGGAGTAGCTGGTGCA---GAAGTTCCAAAT------GTAGTGGATGGATATTCTGGTTCTTCTGAGAAA ATAGACTTAATGGCCAGTGATCCTCATGATGCTTTGATATGTGAAAGTGAAAGAGTTCACACCAAACCAGTAGAGAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTTCCTAACTTGAGCCACACAAGTGAAGATC TCATTATAGGAGCATGTGCTATAGAACCTCAGATAACGCAA--------------------------------------- ---------------------GCCTATCCCCTCACAAATAAAACAAAGCGTAAAAGGAGATCTACCGCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGTAGATTTGGCCATTGTTCAAAAGACTCCTGAAAAGCTAATTGAGGGAACCGATCGAATAG GGCAAAAT---------GGCCATGTGATGAATAGAACAAATAATGGTCCTGAGAATGGGACAAAAGGTGATTAT---GTT CAGAAAGAGAAAAATGCTAAGCCAACAGAA------TCATTGGAAAAAGAATCTGCTTTTAGAACCAAAGCTGAACCAAT AAGCAGCAATATAAGTAATATGGAATCAGAATTAAATAGCCACAGTTCAAGGGCACCTAAGGAGAACAGGCTGAGGAGGA AGTCCTCTACCAGGCACATGCGTGCGCTGGAATTAGTAGTCAATAGAAATCCAAGCCCACCTGATCATACCGAACTACAG ATTGATAGTTGTTCTAGCAGTGAAGAGATGGTGAAAAGAAAA---AGTTCGGAACAAATGCTAGTCAGACACAGCAAAAC ACTTCAACTCGTGGAAAATAAAGAACCTGCACCTGGAGCCAAGAAGCGTAACAAGCCAAGTAAACAAATAAATAAAAGAC TTGCCAGTAACACTTTTCCAGAGCTAAATTTAACAAACATACCTGGGGTTTTTACTAACTGTTCAAGTTCTAATACACTT CAAGAGTTTGTCAACCCTGGCATTCAAAGAGAAGAACTAGAAGAGAGC---CGAGGAACGATTCACGTGTCTGATAGGAC CAGAGATCCCAAAGCGCTGGTATCGAGTGGAGGAAGA---AGTTTGCAA---ACTGAAAGATCTGTAGAGAGTACCAGTA TTTCATTGGTACCTGATGCCGATTATGGCACTCAGGATAGTATCTCATTACTGGAAGCTGACACCCTAGGG---AAGGCA 
AAA---ACAGCACCAAATCAACGTGTGAGTCTGTGTGCAGCAATTGAAAACCCCAAGGAAGCTATCCGTGGTTGT---TC CAAAGATACTAGAAATGGCACAGAGAGTTTTACAGATCCTCTGAGACGTGAAGATANC---CATACTCAGGAGACAAGTA TAGAAATGGAAGAGAATGAACTTGATACGCAGTGGTTATACAATACGTTCAAGGGTTCAAAACGTCAGTCATTTGCTCTG TTTTCAAATCCAGGAAACCCAGAAAAGGAATGTGCTACAGCCTGGGCCCGTTCCACGTCCTTAAGGAAACAAAGTCCAAA AGTCGCTCTTGAATATGAACAAAAAGAAGAAAATCAGGGAAAGAATCAGTCTGAAATCAAGCATGTGCAGGCAGTCCGTG CAACTGCAGGCTTTTCTGCAGTCAGTCAGAAAGTGGAGAAGCCAGGTGATTATGCCAAATGTAGCATAAAAGGAGTCCCT GGGCTCTGTCAGTCATCTCAGTTCAGA---GGCAATGAAACTGAACTCTTTATTGCAAATAACCATGACATTTCAAAAAA CCCTTATCATATACCACCACTTTCTCCCATCAGATCATCTGTTAATGCTGTATGTAAGAAAAAC---CTGTCAGAGGAAA AGTTTGAGCAGCGTTCAATGTCACCTGAAAGAGCAGTGGGAAATGAGAGCGTCATTCAAAGTACAGTGAACACAATTAGC CAAAATAACATTAGAGAAAACACTTTTAAAGAAGTTAGCTCAAGCAGTGTTAATGAAGTAGGTTCCAGTGCTAATGAAGT AGGCTCTAGCATTAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAACAGAACTGGGTA GAAACCGAGGACCTAAGTTAAGTGCTATGCTCAGGTTAGGTCTTATGCAACCTGAAGTCTATGAGCAAAGTCTT---CCT ATAAGTAATTGTAAATGTCCAGAAATGAAAAGGCAAGGAGAAAATGAAGGAGTAGTTCAGTCTGTTAATGCAGATTTTTC TCCATGTCTAATTTCCGATAATGTAGAACAA---CCTATGGAAAGTAGCCGTGCTTCTCAGGTTTGTTCTGAGACACCCA ATGACCTATTAAATGGTGATGAAATAAAGGGAAAAATCAGCTTTGCTGAAAGT---GCTAAGGAAAGATCTGCTGTTTTT GGCAAGAGTGTCCAGAAAGGAGAATTCAGAAGGAGCCCTAGCCCTTTAGACCAT---ACACATCTGGCTCAGGGTCACCA AACAGAGACCAGGAAGTTAGAGTCCTCAGAAGAGAACGTGTCTAGTGAG >Dog TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAACACAGAATGAATGTAGA AAAGGCTGAAATCTGTAATAACAGCAAACAGCCTGGCTTAGCAAGGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGATTCCCAGCACAGAGAAAAAGGTAGTTGTGAATGCTGATCTCCTGTGTGGGAGAAAAGAA CTGAATAAACAGAAACCTCCACACTCTGATAGTCCTAGAGAT---TCCCAAGAT---GTTCCTTGGATAACACTGAATAG TAGCATACGGAAAGTTAATGAGTGGTTTTCCAGAAGTGACGAAATATTAACTTCTGATGATTCACATGACAGAGGATCTG AATTGAATACTGAAGTAGGTGGTGCAGTAGAAGTTCCAAAT------GAAGTGGGTGAATATTCTGGTTCTTCTGAGAAA ATAGACTTAATGGCCAGTGATCCTCAGGATGCTTTCATATGTGAAAGTGAAAGAGTCCACACCAAGCCAGTAGGAGGTAA T---ATCGAAGATAAAATATTTGGAAAAACCTATCGGAGGAAGGCAAGCCTCCCTAAGGTGAGCCACACAACTGAAGTTC 
TAACTATAGGAGCGTGTGCTATAGAACCTCAGACAATGCAA--------------------------------------- ---------------------ACCCATCCCTTCATGAATAAAGCAGAGCATAAAAGGAGAACTACATCTAGCCTTCATCC TGAGGATTTTATCAAGAAAGTAGAGTTAGGCATTGTTCCAAAGACTCCTGAAAAGCTAATTGAGGGAATCAACCAAATCA AGCGAGAT---------GGTCATGTGATAAATATTACAAATAATGGTCCTGAGAATGAAACAGAAGGTGATTAT---GTT CAGAAAGAGAAAAATGCTAACCCAACAGAA------TCATTGGAAAAAGAATCTGCTTTTAGAACCAAAACTGAACCAAT GAGCAGCAGGATAAGCAATATGGAACTGGAATTAAATAGCTCCAGTTCAAAAGCACCTAAGAAGAACAGGCTGAGGAGGA AGTCCTCTGCCAGGCACACTTGTGCCCTTGAATTCGTAGTCAATAGAAATCTAAACCCACCTGATCATAGTGAACTACAG ATTGAAAGTTGTTCTAGCAGTGAAGAG---ATGAAGAAACAG---CATCTGGACCAAGTACCAGTCAGACACAACAAAAC ACTTCAGCTCATGCAAGATAAAGAACCTGCAGGTAGAGCTAAGAAAAGTAGTAAGCCAGGAGAACAAATAAATAAGAGAC TCGCCAGCCATGCTTTTCCAGAGCTAACTTTAACAAATGTATCTGGTTTTTTTGCTAACTATTCAAGTTCTAGTAAGCCT CAAGAGTGCATCAACCCTGGCCTTCGAAGAGAAGAAATAGAAGAGAGC---CGAAGAATGACTCAAGTGTCTGATAGTAC CAGAGATCCCAAAGAGCTGGTATTGAGTGGAGGAAGA---GGTTTGCAA---ACTGAGAGATCTGTAGAGAGTACCAGTA TTTCATTGGTACTTGATACTGATTATGGTACACAGGACAGTATCTCATTACTGGAAGCTGACACCCTGAGG---AAGGCA AAA---ACAGTATCAAATCAACAGGCGAATCTGTGTGCAACAATTGAGAACCCCAAGGAACCTATCCATGGTTGT---TC TAAGGACACTAGAAATGACACAGAGGGTTTTGTAGTTCCATTGACGTGCAAAGATAAC---CACACTCAAGAGACAAGCA TAGAAATGGAAGAGAGTGAACTTGACACGCAGTGCTTACGCAATATGTTCAAGGTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCATATCCAAGAGATCCGGAAGAGGACTGTGTAACAGTCTGTCCCCGCTCTGGGGCCTTTGGAAAACAAGGTCCAAA AGTCACTCTAGAATGTGGACAGAAAGAAGAAAGTCAGGGAAAGAAAGAGTCTGAAATCAGACATGTGCAGGCAGTTCATA CAAATGCAGGCTTTTCTGCAGTTAGTCAGAAAGCTAAGAAGCCAGGCGATTTTGCCAAATGTAGCATAAAGGGAGTCTCT CGGCTTTGTCTGTCATCTCAGTTCAAA---GGCAAGGAAACTGAACTCCTTATTGCAAATTACCATGGAATTTCCCAAAA CCCTTATCATATACCACCACTTTCTCCCATCAGATCATGTGTTAAAACTCTATGTCAGGAAAAC---CTGTCAGAGGAAA AGTTTGAGCAACATTCAATGTCACCCGAAAGAGCAGTGGGAAATGAGAGAGTCATTCAAAGTACAGTGAGCACAATTAGC CAAAATAACATTAGAGAATGTGCTTCTAAAGAAGTCGGCTCAAGCAGTGTTAATGAAGTAGTTTCCAGTACTAATGAAGT AGGCTCTAGTGTTAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAGAACTAGGTA 
GAAACCGAGGACCTAAATTAAATGCTATGCTCAGATTAGGTCTTATGCAACCTGAAGTCTGTAAGCAAAGCCTT---TCT TTAAGTAATTGTAAACATCCAGAAATGAAATGGCAAGGACAAAGTGAAGGAGCAGTTCTGTCTGTTAGTGCAGATTTCTC TCCATGTCTGATTTCAGATAACCCAGAACAA---CCTATGGGAAGTAGTCGGTCTTCTCAGGTTTGTTCTGAGACACCTG ATGACCTATTAAATGGTGACAAAATAAAGGGAAAAGTCAGCTTTGCTGAAAGTGACATTAAGGAAAAATCTGCTGTGTTT AGCAAAAGTGTCCAGAGTGGAGAGTTCAGCAGAAGCCCTAGCCCTTCAGACCAT---ACACGTTTGGCCCAGGGTTACCA GAGAGGGACCAAGAAATTAGAGTCCTCAGAAGAGAACATGTCTAGTGAG >Llama TGTGGCACAGATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAGGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGTCTTAGCAAGGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGACA CATGTAATGAT---AGGCAGACTCCCAGCACAGAGAAAAAGGTGGCTCTGAATGCCGATCGCTTATATGGGAGCGAAGAA CTGAATAAGCAGAAACCTGCATGCTCTGACAGTCCTAGAGAT---TCCCAAGAT---GTTCCTTGGATAACACTGAATAG TAGCATACAGAAAGTGAATGAGTGGTTTTCCAGAAGCGACGAAATGTTACCTTCTGATGACTCACATGAAGTGGGGCCTG AATCAAATACTGAAGTAGCTGGTGCAGTGGAAGTTCCAAAT------GAAGTAGATGGCTATTCAGGCTCTTCAGAGAAA ATAGACTTAATGGCCAGTGATCCTCATGGTGCTTTAAAATGTGAAAATGAAAAAGTCCACGCCAAACCAGTAGGGAGTAA C---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGACCCACAGAGCTGAAGATC TAATTCTAGGAGCATCTGTTCTAGAGCCTCAGATAACACAA--------------------------------------- ---------------------GAGCGCCCCTTCACAAATAAACTAAAGCGTAAAAGGAGAACTCTACCAGGTCTTCATCC TGAGGATTTTATCAAGAAAGTCGATTTGGCCGTTGTTCAAAAGTCTCCTGAAAAGATAATTGAGCGAACAGACCAAACAG AGCAGAAT---------GGTCATGGGATGAATATTACTAGTAATGGTCATGAGAATGAAACAAACGACGATTAT---GTT CAGAAAGAGAAAAATGCTAACCCAACAGAA------TCATTGGAAAAAGAATCTGCTTTCAAAACTAAGGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTGAATACCCATAGTTCAAAAGCACCTAAG---AATAGGCCGAGGAGGA AGTCCTCTACCAGGCAGATTCATGCACTTGAACTAATAGTCAGTAGAAACCCAAGCCCACCTAATCACACTGAACTACAG ATTGAGAGTTGTTCCAGCAGTGAAGAG---ATGAAGGAAAAA---CATTCTGACCAAATGCCAGTCAGGCACCGCAAAAA GCTTCAGCTCACGGGAGATAAAGAACCTACGACTGGAACCAAGAAGAGTAACAAGCCACATCAACAAATAAATAAAAGAC TTGTCGGTAACACTTTTCCAGAACTAAATTTAACAAACACACCTGGTTTTTTTACTAAGTGTTCAAGTTCTAATAAACTT 
CAAGAGTTTGCCAATCCTAGCCTTCAAAGAGAA---------GAGAAC---CCAGGAGCAATTCAAGTATCGAACAGTAC CAAAGACCCCAAAGTTCTGATATTAAGTGGAGGGAGA---GGTTTACAA---ATTGAAAGATCTGTAGAGAATACCAGTA TTTCCTTGGTACCTGAAACTGATTATGACACTCAGGACAGTGTCTCATTACTGGAAGCTGACACCCTAGGG---AAGGCA AAA---ACAGCACCAAATCAATGTGTGAGTCTCTGTGCAGCAACTGAAAACCCCAAGGAACTTATCCATAGTTGT---TC TAAAGATACTAGAAATGGCACAGCAGGCTTTAAGGATCCATTGGGATGTGATGTTAAT---CACACTCAGGAGGCAAGCA TGGAAGTGGAAGAGAGTGAACTTGATACTCAGTATTTGCAGAATACGTTCAAGGTTTCAAAGCGTCAGACATTTGCTCTG TTTTCAAATCCAGGAGATTCAGAAAAGGAATGTGCAACGATCTATGCCCACTCTGGGCCCTTAAGGGAACAAAGTCCAAA AATCACTCTTGAATGTGGACAAAAAGAAGAAAATCAGAGAAAGAGCAAGTCTGAAATCAAGCATATGCAGGCAGTTCATA CAACTTTGAACTTTCCTGTGGTTGGTCAAAAAGATAAG---CCGAGTGATTATGCCAAATATAGCCCAAAAGGAATATCT AGGCTTTGTTGGCCATCACAGCTCAGA---GGCAATGAATCTGAGTTCATTATTGCAAATAAACATGGGATTTTACAAAA CCCATATCTTATAACATCACTTTCTCCTAACAGGTCATCTGTTAAAACTATGTGTAAGAAAAAC---CCGTCCGAGGAAA AGCTTGAGAAATGTGTCATATCACCTAAAAGAGTGATGGGAAACGAGAGCACCATTCAAAGTATAGTGAGTGCAATGAGC CAAAATAACATTAGAGAAAGCACTTTTAAAGAAGTCAGCTCAAGCAGTGTCAATGAAGTAGGTTCCAGTACTAATGAAGT AGGTTCTAGTATTAATGAAGTAGGTTCCAGT---------------------GGCGAAAACATTCAAGCAGAACTAGGTA GAAACATGGGACCTAAATTAAATGCTATGCTCAGATTAGGTCTTTTGCAACCTGAAGTCTGTAAGCAAAGTCTT---CCT GTAAGTAATTGTAAACAACCTGAAATAAAAAGGCAAGGAGAAAATGAAGGCATCTTTCAGGCTGTTAATACCGACTTCTC TCCATGTCTAATTTCAGATAACCTAGAACAA---CCTATGGGAAGCAGTCATGCTTCTCAGGTTTGTTCTGAGACACCTG ATGACCTGTTATATGATGACGAAATAAAGGAAAATACCAGCTTTGCTGAAAGTGACATTAAGGAAAGATCTGCTGTTTTT AGCAAAAGTGTCCAAAAA---GAATTCAGAAGGAGCCCTAGCCCTTTAGTGCAT---ACGCATTTGGCTCAGGGTCACCA AAGAGGGGCTAGGAAATTAGAGTCCTCAGAAGAGAATGTGTCTAGTGAG >Pig TGTGGCACAGATACTCATGCCAGCTCGTTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTTTGTAATAAAAGCAAGCAGCCTGTCTTAGCAAAGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGGCA CATGTAATGAT---AGGCAGACTCCTAACACAGAGAAAAAGGTAGTTCTGAATACTGATCTCCTGTATGGGAGAAACGAA CTGAATAAGCAGAAACCTGCGTGCTCTGACAGTCCTAGAGAT---TCCCAAGAT---GTTCCTTGGATAACATTGAATAG TAGCATACAGAAAGTTAATGAGTGGTTTTCTAGAAGCGATGAAATGTTAACTTCTGACGACTCACAGGACAGGAGGTCTG 
AATCAAATACTGGGGTAGCTGGTGCAGCAGAGGTTCCAAAT------GAAGCAGATGGACATTTGGGTTCTTCAGAGAAA ATAGACTTAATGGCCAGTGACCCTCATGGTGCTTTAATACGTGAACGTGAAAGAGGGCACTCCAAACCAGCAGAGAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGAGCCACGTAATTGAAGATC TAATTTTAGGAGCATCTGCTGTAGAGCCTCAAATAACACAA--------------------------------------- ---------------------GAGCGCCCCCTCACAAATAAACTAAAGCGGAAAAGGAGAGGTACATCAGGCCTTGATCC CGAGGATTTTATCAAGAGAGCCGATTCGGCAGTTGTTCCAAAAACTCCTGAAAAGACAATTGAGGGAACTGATGAGACAG AGCAGAAT---------GGTCAGGGGATGAATATTACCAGTAAAGGTCATGAGAATGAAACAAAAAGTGGTAAT---GTT CAGAAAGAGAAAAATGTTAACCCAACTGCA------TCGTTGGGAAAAGAACCTGCTTTCAGAGCTAGGGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTGAATATCCATGGTTCGAAAGCACCAAAG---AACAGGCTGAGGAGGA AGTCCTCTACCAGGCAGATCCATGCACTTGAACTAGTAGTCAATAGAAACCCAAGCCCACCTAGTCATACTGAGTTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATGAAGAAGAGA---AATTCCGACCAAGTGCCAGTCAGGCACAGCAAAAA GCTTCACCTCAGGGGAGATAAAGAACCTACAACTGGAGCCAAGAAGAATAATAGGCCACATGAACAAGTAAATAAAAGAC CTATCAGTGATGCTTTTCCAGAACTAAATTTAACAAATGTACCTGTTTCTATTACTAACTGTTCAAATTCTAATAAGCTT CAAGAATCTGTCAGTCCTAACCTTCAAAGAGAA---------GAGAAC---CTAGGGACAATTCAAGTGTCGAATAGTAA CAAAGACCCCAAAGATGTGATGTTAAGTGGAGGAAAA---GGTTTTCAA---ATTGAAAGATCTGTAGAGAATACCAGTA TTTCCCTGGTACCTGATACTGATTATGGCACTCAGGACAGTATCTCATTACTGGAAACTGACACCCTAGGA---AAGGCA AAA---ACAACACCCAATCAACATGTGAGGCTGTGTGCAGCAACTGAAAACCCCAAAGAACTTAGCCTTGGTTGT---TC TAAAGGTGTTAGAAACGACACAGGGGACTTTAAGGATCCCCTGGCTCATGATGTTAAC---CACACTCAGGAAGCAAGCA TAGAAGTGGAAGAGAATGAACTTGATACACAGTATTTGCAGAGTATGTTCAAGGTTTCAAAGCGTCAGACATTTGCTCTG TTTTCAAATCCAGCAAATCCAGAAAAGGAATGTACAACCGTCCATGCCCACTCCAAGTCCTTAAGAGAACAAAGTCCAAA AGTCACTCATGAAGGTGGACAAAAAGATGAAAATCAGGGAAAGAGTGAGTCTAAAGTCAAGCATGGGCAGTCAGTTCATA CAACTGTGGACTTTCTAGTGGTTGGTCAAAAGGATAAGAAGCCGAGTGATTTTGCCAAATGTGGTGCAAAAGGAGTAACT GGGCTTTATCAGACATCACAGTTCAGA---GGCCACAAAACTGAGTTCATTAATGCAAATAAACCTGGGATTTCACAAAA CCCATATGTCATACCATCCCTTTCTCCCATCAGGTCGTCTGTTAAAACTATATGTAAGAAAAAC---CTGTCAGAGGAAA 
AGTTTGAGGAACCTAAAATGTCACCCGAAAGAACAATGGGAAACGAGAGCATCATTCCAAATACAGAGAGCACAGTTAGC CAAAATAACATTCAAGAGAGAACTTTTAAAGAAGGCAGCTCAGGCAGTCCTAATGAAGTCGGGTCCAGTACCAACGAAGT AGGCTCTAGTATTAATGAAGTAGGTTCCAGC---------------------GGTGAAAACGTTCGAGCAGAACCAGGTA GAAACAGAGGACCTAAATTAAGTGCAATGCTCAGATTAGGTCTCATGCAACCCGAAGTTTATAAGCAAAGTCTT---CCT GTAAGTAATTGTAACCACACAGAAATAAAAAGGCAAGGAGAAAATGAAGGCATATTTCAGGCTGTTAATGCAGATTTCTC CCCATATCTAATTTCAGATAACCCCGAACAA---CCTATGGGAAGTAGTCATGCTTCTCAGATTTGTTCTGAGACACCTG ATGACCTGTTAAATGATGACAAAATAAAGGAAAATCTCAACTTTGCTGAAAGTGACGTTAAGGAAAGATCTGCTGTTTTT AGCAAAAGTGCCCAAGAAGGGGAATTTAAAAGGAGCCCTAGCCCTTTAGCCCAC---AGACGTTTGGCTCAGGGTCACCA AAGATGGGCTAGGAAATTAGAGTCTTCAGAAGAGAGTGGGTCTAGTGAG >Cow TGTGGCACAGATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTGCTCACTGAAAACAGACTGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGTCTTAGTAAAGAGCCAGCAGAGCAGATGGGCTGAAAGTAAGGGCA CATGTAAGGAT---AGGCAGATTCCCAGCACTGAGAAAAAGATAGTTCTGAATACTGATCCCCTGTACAGAAGAAAAGAA CTGCGTAAGCAGAAACCTGCATGCCCTGACAGTCCTGGAGAT---TCCCAAGAT---GTTCCTTGGGTAACCCTGAATAA TAGCATACAGAAAGTTAATGACTGGTTTTCCAGAAGTGATGAAATATTAACTTCTGATGACTCGTGCGATGGGGGGTYTG AATCAAATAATGAAGTAGCTGGTGCAGTGGAAATTCCAAAT------AAAGTAGATGGATATTCAGGTTCTTCAGAGAAA ATCAACTTAATGGCCAGTGATCCTCATGGTACTTTAATACAC------GAAAGAGTCCACTCCAAACCCGTAGAGAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGTCAAGTCTCCCTAACTTCAGCCACATAGCTGAAGATC TAATTCTAGGCGCATTTACTGTAGAACCTCAGATAACACAA--------------------------------------- ---------------------GAGCAGCCCCTCACAAATAAACTAAAATGTAAAAGGAGAGGTACATCAGGCCTTCAGCC TGAGGATTTTATCAAGAAAGTCGATTTGACAATTGTTCCAAAGACTCCTGAAAAGATGACGGAGGGAACTGACCAAACAG AGCAGAAA---------TGTCATGGGATGAATATTACTAGTGATGGTCATGAGAATAAAACAAAACGTGATTAT---GTT CAGAAAGAGCAAAACGCTAACCCAGCAGAA------TCATTGGAAAAAGAATCTGTTTTCAGAACTGAGGCTGAACCTAT AAGCATCAGTATAAGCAATATGGAACTAGAATTGAATATCCACCGTTCAAAAGCACCTAAG---AATAGGCTGAAGAGAA AGTCCTCTACCAGGAAAATTCCTGAACTTGAACTAGTAGTCAGTAGAAACCCAAGTCTACCTAATCATACTGAGCTACCA ATTGATAGCAGTTCTAGCAATGAAGAG---ATGAAGAAAAAA---CATTCTAGCCAAATGCCAGTCAGGCAGAGCCAAAA 
GCTTCAACTCATTGGAGATAAAGAACTTACTGCTGGAGCC---AAGAATAACAAAACATATGAACAAATAAATAAAAGAC TTGCTAGTGATGCTTTTCCAGAACTAAAGTTAACAAACACACCTGGTTATTTTACTAACTGTTCTAGT------AAACCT GAAGAGTTTGTTCATCCTAGCCTTCAAAGAGAG---------GAGAAC---CTAGGAACAATTCAAGTGTCGAATAGTAC CAAAGACCCCAAAGATCTGATATTAAGAGAAGGAAAA---GCTTTGCAA---ATTGAAAGATCTGTAGAGAGTACCAATA TTTCCTTGGTTCCTGATACTGATTATAGCACTCAGGATAGTATCTCATTACTAGAAGCTAAAACCCCAGAA---AAGGCA AAG---ACTGCACCAAATCCATGTGTGAGTCTGTGTACAGCAACCAAAAACCTCAAGGAACTTATCCATAGGGAT---TT TAAAGATACCAAAAACAACACAGAGGGCTTTCAGGATCTACTGGGACATGACATTAACTACGTCATTCAGGAGACAAGCA GAGAAATGGAAGACAGTGAACTTGATACACAGTATTTGCAGAATACATTCAAGGCTTCAAAGCGTCAGACATTTGCTCTG TTTTCCAATCCAGGAAATCCACAAAAGGAATGTGCCACAGTCTTTGCCCACTCGGGGTCCTTAAGGGATCAAAGTCCAAG AGACCCCCTCAAATGCAGACAAAAAGAAGACAGTCAGGGAAAGAGTGAGTCTAAAAGCCAGCACGTGCAGGCCATTTGTA CAACAGTGCACTTTCCTGTGGCTGATCAGCAAGATAGGACGCCAGGTGACGATGCCAAATGTAGCGCAAAAGAAGTAACT AGGGTTTGTCAGTCATCACAGTTGAGA---GGCCACAAAACTGAACTTGTTTTTGCAAATAAACAAGGGGTTTCAGAAAA ACCAAATCTTATACCATCACTTTCTCCCATCAAGTCATCTGTTAAAACCATATGTAAGAAAAGC---CCATCAGAG---A AGTTTGAGGAACCTGTAACGTCACCTGAAAAAACATTGGGGAGTGAGAGCATCATTCAAAGTGCAGTGAGCACAATCAGC CAAAATAACATTCAAGAAAGCACTTTTAAAGAAGTCAGCTCAAACAGTGTAAATGAAGTAGGTTCCAGTACTAATGAAGT AGGCTCTAGTGTTAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAGAACCAGGTA GAAACAGAGAACCTAAATTAAGAGCTTTACTCGGATTAGGTCTTACGCAACCTGAAGTCTATAAGCAAAGTCTT---CCT GTAAGTAACTGTCACCATCCTGAAATAAAAAGGCAAGGAGAAAATGAGGACATGCCTCAGGCTGTTAAGGCAGATTTCTC CCCATGTCTAATTTCAGATAACCTCGAACAA---CCTACGGGAAGCCGTCATGCTTCTCAGGTTTGTTCTGAGACACCTG ACAACTTGTTAAATGATGATGAAATAAAAGAAAATAGCCACTTTGCTGAAAGTGACATTAAGGAAAGATCTGCTGTTTTT AGTGAAAGTGTCCAAAAAGGAGAATTCAGAGGGAGCCCTGGCCCTTTCACCCAT---ACACATTTGGCTCAGGGTCACCA AAGAGGGGCTGGCAAACTAGAG---TCAGAAGAGACTGTGTCTAGTGAG >Hippo TGTGGCACAGATACTCGTGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGCAGA AAAGGCTGAATTCTGTAATAAAAGCNAACAGCCTCTCTTAGCAAAGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGACA 
CATGTAATGAT---AGACAGACTCCCAGCACAGAGAAAAAGGTAGTTCTGAACGCTGATCCCCTATACGGGAGAAAAGAA CTGAATAAGCAGAAACGTGCATGCTCTGACGGCCCTAGCGAT---TCCCAAGAT---GTTCCTTGGATAACACTGAATAG TAGCATACAGAAAGTTAATGAATGGTTTTCCAGGAGTGGCGAGATGTTAACTTCTGACGACTTATGTGTTAAGGGGTGTG AATCAAATACTGAAGTAGCTGGTGCAGCGGAAGTTCCAAAT------GAAATTGATGGGTGTTTGGGTTCTTCAGAGAAA ATAGATTTAATGGCCCGTGACCCTCGTGGTGCCTTAATACGTGAAAGTGAAAGAGTCCACGCCAAACCAGTAGAGAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGAGCCACATAGCTGAAGATC TAATTAAAGGAGCATCTGCTGTAGGACCTCAGATAACACAA--------------------------------------- ---------------------GAGCGCCCCCTCACAAATAAACTAAAGCGTAAAAGGAGGCGGACATCGGGCCTTCATCC TGAGGATTTTATCAAGAAAGTCGATGCGGCAGTTGTTCCAAAGACTCCTGAAAAGATAATTGAGGGAACTGATCAAACAG AGCAGAAT------------------------------------GGTCATGAGAATGAAATGAAAGGTGATTAT---GTT CAGAAAGAAAAAAATGCTAACCCAACAGAA------TCTTTGGAAAAAGAATCTGCTTTCAGAACTAAGGCTGAACCTAT AAGCATCAGTATAAGCAATATGGAACTAGAATTGAATATCCACAGTTCAAAAGCACCTAAG---AGTACGCTGAGAAGGA AGTCCTCTACCAGGCAGATTCATGCACTTGAACTAGTAGTCAGTAGAAACCCGAGCCCACCTAATCGTACTGAACTACAA ATCGATAGTTGTTCTAGCAGTGAAGAG---ATGAAGAAAAAA---CATTCCTGCCAAATGCCAGTCAGGCACAGCAAAAA GCTTCAATTCATGGGATATAAAGAACCCGCAACTGGAGTCAGGAAGAGTAATAAACCACACGAACAAATAAATAAAACAC TTGCCAGTGGTGCTTTTCCAGAACTAAATGTAACAAACATACCTGGTTTTTTTACTAACTGTTCTAGTTCTAGTAAACTT CAAGAGGTTGTTAATCCTGGCCTTCCAAGAGAG---------GAGAAC---CTAGGAACAATTCAAGTGTCAAATAGTAC CAAAGACCCCAAAAATCTGATATTAAGTAGAGGAAAA---GGTTTGCAA---ATTGAAAGATCTATAGAGAGCACCAGTA TTTCCTTGGTACCTGATACTGATTATGGCACTCAGGACAGTATCTCATTACTGGAAGCTGACACCCTAGGG---AAGGCA AAG---ACAGCAACAAATCAACGTGTGGGTCTGTGTGCAGCAACTGAAAACCCCAAGGAACTTATCCATGGTTAT---TC CAAAGATACTAGAAACGACATGGACGGCGTCCAGCATCCATTGGGACAGGATGTTAAC---CACACTCAGGATGCAAGCA TAGAAGTGGAAGACAGTGAACTTGATACACAGTATTTGCAGAATACATTCAGGGTTTCAAAGCGTCAGACATTTGCCCTG TTTTCAAATCCAGGAAATCCAGGAAAGGAACGTGCAACAGTCTGTGCTCATTCCGGGTCCTTAAGGGAACAAAGTCCAAG AGTCCCTCTTGAATGCGGACAAAAAGAAGAAAATCAGGGCAAGAGTGAATCTAAA---------ATGCAGGCAATTTATA 
CAACTGTGGACTTTGCTGTGGCTGGTCAAAATGATAGGAAGCCGAGTGATTACACCAAATGTAGCACTAAAGGAGTAACT AGGCTTTGTCCCTCATCACAGTTTGGA---AGCAACAAAACTGAGCACATTATTGCAAATAAATATGGAATTTCACAAAA CCCATATGTTATACCATCACTTTCTCCCATCAGGTCATCTGTTAAAACTATACGGAAGAAAAAC---CTGTCAGAGGAAA AGTTTGAGGAACCTGTAGTGTCAGCTGAAAGAGCAATGGCAAATGAGAGCATCCTTCAAAGTACAGTGAACACAATTAGC CAAAATAACATTCGAGAAAACACTTTTAAAGAAGTCAGTTCAAGCAGTATTAATGAAGTAGTTTCCAGTACTAATGAAGT AGGCTCTAGTATCAGTGAAGTAGGTTCTAGT---------------------GGTGGAAACATTCAAGCAGAACTAGACA GAAAGAGAGGACCTAAACTAAGTGCTTTGCTTAGATTAGGTCTTATGCAACCTGAAGTATATAAGCAAAGTCTT---CCT GTAAGTAATTGTCAACATCCTGAAATAAAAAGGCAAGGAGAAAATGAAGGCATACTTCAGGCTGTTAATGCAGATTTCTC CCCGTGTCTAATTTCAGATAACCTAGAACAA---CCTATGGGAAGCAGTCATGCTTCTCAGGTTTGTTCTGAAACACCTG ATGATTTGTTAAATGATGACGGAATAAAGGAAAATAGCAACTTTGCTGAAAGTGACATTAAGGAAAGATCTGCTGTTTTT AGCAAAAATGTCCAAAAAGGAGAATTCAGAAGGAGCCCTGGCCCTTTAGCCCAT---ACACGTTTGGCTCAGGGTCACCA AAGAAGGGCTGGGAAATTGGAGTCCTCAGAAGA---------------- >SpermWhale TGTGGCACAGATACTCATGCCAGCTCATTACAGCATGAAAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGTCTTAGCAAAGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGACA CATGTAATGAT---AGGCAGACTCCCAGTACAGAGAAAAAGGTAGTTCTGAATGCTGATCCCCTATATGGGAGAAAAGAA CTGAATAAGCAGAAATCTGCATGCTCTGACAGTCCTAGAGAT---TCCCAAGAG---TTTCCTTGGATAGCAGTGAATAG TAGCATACAGAAAGTTAATGAATGGTTTTCCAGAAGTGATGAAATGTTAACTTCTAACGACTTACGTGATGGGGGATTTG AATCAAACCCTGAAGTAGCTCGTGCAGTGGAAGTTCCACAG------GAAGTTGATGGATATTTGGGTTCTTCAGAGAAA ATAGACTTAATGGCCAGTGATCCTCATGGTGCTTTAATACGTGAAAGTGAAAGAGTCCACTCCAAACCAGTAGAGAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGAGCCACGTAGCTGAAAATC TAATTATAGGAGCATCTACTGTAGGACCTCAGATTACACAA--------------------------------------- ---------------------GAGCGCCCCCTCACAAATAAACTAAAGCGTAAAAGGAGAAGTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGTCGATTTGGCAGTTATTCCAAAGACTCCTGAAAAAATAATTGAGGGAACTGACCAAACAG AGCAGAAT---------GGTCATGGGGTGAATATTACTAGTAATGGTCATGAGACTGAAATGAAAGGTGATTGT---GTT 
CAGAAAGAGAAAAATGCTAACCTAACAGAA------TCATTGGAAAAAGAATCTGCTTTCAGAACTAAGGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTGAATATCCATAGTTCAAAAGCACCTAAA---AATAGGCTGGGGNGGA TGTCCTCTTCCAGGAAGATTCATGCACTTGAACTAGTAGTCAGTAGAAACCCAAGCCCACCTAATCATACTGAACTACAA ATTGATAGTTGTTCTAGCACTGAAGAG---ATGAAGAAAAAA---CATTCCAGCCAAATGCCAGTCAGGTGCGGCAAAAA GCTTCAATTCATTGGAGATAAAGAACCTACAACTGGAGCCAAGAAGAGTAACAAGCCACATGAACAAATAAATAAAAGAC TTGACAGTGACACTTATCCAGAACTAAATTTAACAAACATACCTGTTTTTTTTACTCACTGTTCTAGTTCTAATAAACTT CAAGAGTTTGTTAATCCTAGCCTTCAAAGAGAG---------GAGAAC---CTAGGAACAATTCAAGTGTCGAATAGTAC TGAAGACCCCAAAGATCTGACATTAAGTGGAGGAAAA---GGTTTGCAA---ATTGAAAGATCTGTAGAGAGTTCCAGTA TTTCCTTGGTACCTGATACTGATTATGGCACTCAGGATAGTATCTCATTACTGGAAGCTGACACCCTAGGG---AAGGCA AAG---ACAGCACCAAATCAACATGTGAGTCTGTGTGCAGCAATTGAAAGCCCCAAGGAACTTATCCACGGTTGT---TC TAAAGATATTAGAAACGACACAGAGGACTTTAAGGATCCACTGGGACATCACGTTAAC---CACATTCAGGAGGCGAGCA CAGNNNNNNNNNNNNNNNNACTTGATACTCAGTTTTTGCAGAATATGTTCAAGGTTTCAAAGCGTCAGACGTTTGCTCTG TTTTCAAATCCAGAAAATCCAGAAAAGGAATGTGCAACAGTCTGTGCCCACTCTGGGTCCTTAAGAGAACAAAGTCCAAG AGTCCCTCTTGAATGCAGACAAAAAGAAGAAAATCAGGGAAAGAGTGAGTCTAAAATCAAGCATGTGTGGGCAATTAATA CAACTGTGGACTTCCCTGTTGCTGGTCAAAAAGATAAG---CCGAGCGATCATGCCAAACGTAGCCCCAAAAGAGTAACT AGGCTTTGTCAGTCATCACAGTTCAGA---AGCAACAAAACTGAGCTCATTATTGCAAATAAACATGGGATTTCACAAAA CCCATATCTTATACCATCACTTTCTCCCATCAGGTCATCTGTTAAAACTATATGTAAGAAAAAC---CTGTCAGAGGAAA AGTTTGCGGAACCTGTAATGTCACCTGAAAGAGCAATGGAAAACGAGAGCATCATTCAAAGTACAGTGAGCACAATCAGC CAAAATAACATTCGAGAAAGCACTTTTAAAGAAGTCAGCTCAAGCAGTACTAATGAAGTAGGTTCCAGTACCAATGAAGT AGGCTCTAGTATTAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAAAACTAGATA GAAACAGAGGACCTAAATTAAGTGCTTTGCTCAGATTAGGTCTCATGCAACCCGAAGTCTATAAGCAAAGTCTT---CCT GTAAGTAACTGTCAACTTCCTGAAATAAAAAGGCAAGGAGAAAATGAAGGCACACTTCAGGCTGTTAATGCAGATTTCTC CCCATGTCTAGTTTCAGATAACCTAGAACAA---CCTATGGGAAGAAGCCATGCTTCTCAGGTTTGCTCTGAGACATCTG ATGAGTTGTTAAATGATGACAAAATAAAGGAAAATAGCAACTTTGCTGAAAGTGACATTAAGGAAAGATCT---GTTTTT 
AGCAAAAGTGTCCAAAAAGGAGAATTCAGAAAGAGCACTGGCCCTTTAGCCCATCATACATGTTTGGCTCAGGGTCACGA AAGAGGGGCT---------GAGTCCTCAGAAGAGAAAGTGTCTAGTGAG >HumpbackW TGTGGCACAGATACTCATGCCAGCTCATTACAACATGAAAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGTCTCAGCAAAGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGACA CATGTAATGAT---AGGCAGACTCCCAGTACAGAGAAAAAGGTAGTTCTGAATGCTGATCCCCTGTATGGGAGAAAAGAA CTGAATAAGCAGAAACCTGCATGCTCTGACAGTCCTAGAGAT---TCCCAAGAG---TTTCCTTGGATAACAGTGAATAG TCGCATACAGAAAGTTAATGAATGGTTTTCCAGAAGTGATGAAATGTTAACTTCTAACGACTCACGTGATGGGGGATTTG AATCAAACACTGAAGTAGCTTGTGCAGTGGAAGTTCCAAAG------GAAGTTGATGGATATTTGGGTTCTTCAGAGAAA ATAGACTTAATGGCCAGTGATCCTCATGGTGCTTTAATACGTGAAAGTGAAAGAGTCCACTCCAAACCAGTAGAGAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGAGCCACATAGCTGAACATC TAATTATAGGAGCATCTACTGTAGAACCTCAGATAACACAA--------------------------------------- ---------------------GAGCGCCCCCTCACAAATAAACTAAAGCGTAAAAGGAGAAGTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGTCGATTTGGCAGTTATTGCAAAGACTCCTGAAAAAATAACTGAGGGAACTGACCAAACAG AGCAGAAT---------GGTCATGGGATGCATGTTACTAGTAATGGTCCTGAGACTGAAATGAAAGATGATTAT---GTT CAGAAAGAGAAAAATGCTAACCTAACAGAG------TCATTGGAAAAACAATCTGCTTTCAGAACTAAGCCTGAACCTAT AAGCAGCAGTATAGGCAATATGGAACTAGAATTGAATATCCATAGTTCAAAAGCACCTAAA---AATAGGCTGAGGAGGA AGTCCTCTACCAGGAAGATTCATGCACTTGAACTAGTAGTCAGTAGAAACCCAAGCCCACCTAATCATACTGAACTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATGAAGAAAAAA---CATTCCAGCCAAATGCCAGTCAGGCACGGCAAAAA GCTTCAATTCATGGGAGATAAAGCACCTACAACTGGAGCCAAGAAGAGTAACAAGCCACACGAACAAATAAATAAAAGAC TTACCAGTGACGCTTATCCAGAACTAAATTTAACAAACATACCTGGTTTTTTTACTCACTGTTCTAGTTCTAATAAACTT CAAGAGTTTGTTAATCCTAGCCTTCAAAGAGAG---------GAGAAC---CTAGGAAAAATTCAAGTGTCGAATAGTAC CAAAGACCCCAAAGATCTGACATTAAGTGGAGGAAAA---GGTTTGCAA---ATTGAAAGATCTGTAGAGAGTACCGGTA TTTCCTTGGTACCTGATACTGATTATGGCACTCAGGATAGTATCTCATTACTGGAAGCTGACACCCTAGGG---AAGGCA AAG---ACAGCACCAAATCAACATGTGAGTCTGTGTGCAGCAATTGAAAGCCCCAAGGAACTTATCCATGGTTGT---TC 
TAAAGATATTAGAAACGACACAGAGGACTTTCAGGATCCACTGGGACATCACGTTAAC---CACATTCAGGAGGCGAGCG CAGAAATGGAAGAGAATGAACTTGATACACAGTATTTGCAGAATATGTTCAGGGTTTCAAAGCGTCAGACGTTTGTTCTG TTTTCAAATCCA---------GAAAAGGAATGTGCAACAGTCTGTGCCCGCTCTGGGTCCTTAAGAGAACAAAGTCCAAG AGTCCCTCTTGAATGCAGACAAAAAGAAGAAAATCAGGGAAAGAGTGAGTCTAAAATCAAGCATGTGCGGGCAATTAATA CAACTGTGGACTTCCCTGTTGCTGGTCAAAAAGATAAGAAGCCGAGCGATCATGCCAAACGTAGCCCAAAAAGAGTAACT AGGCTTTGTCAGTCATCACAGTTCAGA---AGCAACAAAACTGAGCTCATTATTGCAAATAAACATGGGATTTCACAAAA CCCATATCTTATACCATCACTTTCTCCCATCAGGTCATCTGTTAAAACTATATGTAAGAAAAAC---CTGTCAGAGGAAA AGTTTGAGGAACCTGTAAGGTCACCTGAAAGAGCAATGGAAAACGAGAGCATCATTCAAAGTACAGTGAGCACAATTAGC CAAAATAACATTCGAGAAAGCACTTTTAAAGAAGTCAGCTCAAGCAGTATTAATGAAGTAGGTTCCAGTACTAATGAAGT AGGCTCTAGTATTAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAAAACTAGRTA GAAACAGAGGACCTAAATTAAGTGCTTTGCTCAGATTAGGTCTTATGCAACCCGAAGTCTATAAGCAAAGTCTT---CCT ATAAGTAACTGTCAACTTCCTGAAATAAAAAGGCAAGGAGAAAGTGAAGGCACACTTCAGGCTGTTAATGCAGATTTCTC CCCAAGTCTAATTTCAGATAACCTAGAACAA---CCTATGGGAAGAAGCCATGCTTCTCAGGTTTGTTCTGAGACATCTG ACGAGTTTTTAAATGATGACAAAATAAGGGAAAATAGCAACTTTGCTGAAAGTGACATTAAGGAAAGATCT---GTTTTT AGCAAAAGTGTCCAAAAAGGAGAATTCAGAAGGAGCCCTGGCCCTTTAGCCCATCATACATGTTTGGCTCAGGGTCACGA AAGAGGGGCT---------GAGTCCTCAGAAGAGAATGTGTCTAGTGAG >Mole TGTGGCATAAATACTCATGCCAGCTTATTACAGCATGAAAACAGCAGTTTATTACTCACTGAAAACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGACTTAGCAAAGAGCCAGCAGAACAGATGGGCTGAAAGTAAAGAAA CATGTAATGAT---AGGCAGACTTCCAGCCCAGAGAAAAGGGTAGACCCGAATGCTGATCCCATGTATGGGAGAAAAGAA CTGAATAAGCAGAAACCTCCATGCTCTGACAGCCCCAGAAAT---TCCCAAGGT---GTTGCCTGGATAACACTGAACAG TAGCATTCGAAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAAATATTAACTTCTGATGAATCTCGTGATGGGGAGTCCC CATCAAATATTGAAATGACTGGTGAAGTAGAAGTTCCAAAC------ACAGTAGATGGATTTTCTGGTTCTTCAGAGAAA ATAGACTTAATGGCCAGCGATCCTCCC---GCTTTAATGTGTAAAAGTGAAAGAGTCCGCACCAAACCAGTAGAAAGTAA T---ATTGAAGATAAGATATTTGGGAAAACCTATCGGAGGAAGACAAGCCTCCCAAACTTGAGCCACGTGGCTGAATATT TAATTACAGGGACATCTGTCACAGAACCTCAGATAGTTCAA--------------------------------------- 
---------------------GAGCGTCCCTTCACAAATAGATTAAAACGTAAAAGGAGAACTCTACCAGGCCTTTGTCC TGAGGATTTTATCAAGAAAGTAGATTTGGAAGTTGTTCAGAACACCCCTGAAAAGATAAGTGCGGGAACTGATCAAATGG ATCAGAGT---------GGTCAAGTGATGGATGTTGCTTATAATGGTCATGGGAATGAAACAAAACGTGATTAT---GTT CAGAAAGAGAAAAGTGCTAACCCAGCCGAA------TCTTTAGAAAAAGAATCTACCTTCAGAATTAAAGCTAAGCCCAT AAGTAGCAGTATAAGCAATATGGAACTAGAATTAAATATCCACTCTTCAAAAGCACCTAAGAAGAACAGGCTGAGGAGGA AGTCTTCTACCAAGCATATTCATGCACTTGAACTGGTGGTCAAAAGAAATCCAAGCCCACCTAATCATACAGAACTACAA ATTGATAGCTGTTCTAGCAGTGAAGAG---ATGAAGGAAAGA---AATTCCAACCAAATGCCAGTAAGACACAGCAAAAG GCTTCAACTCATGGAAGATAAAGACCCCGCAACTGAAGCCGTCAAGAGTAACAAGCCAAAAGAACAAATT---AAAAGAT TTGCCAGTGATACTTTGCCAGAACTAAATTCAACACATGTACCTGGCTTTCTTAGCAACTATTCAAGTTCTAATAACCTT GAAGAGTTTTCCAATCCTAGCCTTCAAAGAGAAGAACTAGAAGAGAAC---CTAGGAACAAATCAAGTGTCAAATAATAC CAAAGACCCCAAAGATCCGATACTTAGTGGACAAAGA---GATTTGCAA---GCTGAAAGATCTGGCGAGAGTACCAATA TTTCATTGGTACCTGAGACTGATTTTGGCAGTCAGGATAGTGTCTCATTACTGGAAATTGACATCCTAGGC---AAGGCA AAA---AAAGTGCCGAATCAGTGTGCAAGTCTGTGTACAAAAATTGAAAACTCTAAGGAACTTATTCATAGTTGT---TC TAAAGATACTAGAAATGACACACAGGGCTGTAAGGATCCACTGAAATACGAAGTTAAC---CACACTCAGGAGATAAGGA TAGAGATGGAAGAGAATGAACTTGATACACAGTATTTACAGTCTACGTTCAAAGCTTCAAAGCGTCAATCATTTGCTCTG TCTTCACATCCGGGAAATTCAGAAAAGAAATATCCCCCAGTCTCTGCCCCTTCCAGGTCCTTGAAGAGACAAACTCCAAA AATCACTCTGGAATGTGAACTGAAAGAAGAAAATCAAGGGAAGAAAGAGTCTAAAACCGAGCATGTACAGGCAGTACATA GAATTGCAGACTTAACTCTGGCTTGTCAGAACGATAAG---CCACATGATTCTGCCAAATGTAGCATAAAAGGAGTCTCT AGGCTTTGTCAGTCATCTCAGTTCAGA---GGCAACGAAACAGAAGTCATTGTTGCAAATAAACGTTTAATCTTACAAAA CCCATATCTTATTCCACCACTTTCTTCAATTCAGTCATCTGTTAAAAGTGCTTGTAAGAAAAAC---CTGGCGGAGGAAA AGCTTGAGGAACACTCATGGTCACCCGAAAGAGAAACAGGAAACGGGAGCATCATTGAAAGTACAGTGAGCAGGGTTAGC CAAAATAACAATAGAGAAAATGCTTTTAAGGAAGTCAGCTCAAGTAGTATTAACGAAGTAGGTTCCAGTACTAACGAAGC TGACTCTAGTATTAATGAAGTAGGTTCCAGT---------------------GGTGAAAATATTCAAGCTGAACTAGATG GAAGCAGAGGACCTAAATTAAATGCTATGCTCAGATTGGGTCTTATGCAACCTGAAGTCTATAAGCAAAGTCTT---CCT 
AAAAGTACTTGTAAACATCCTGAAGTTAAAAGGCAAGGAGAAAATGAAGGCATAGTTCAGGCTGTTCATACAGATTTCTT TCCATGTCTGATTTCAGATAACCAAGAACAA---CCTATGGGAAGTAGTCATGCTTCTCAGGTTTGTTCTGAGACGCCTG AGGACCTGTTAAATGACGAAATAAAGGATAATAACATCAGCTTTGCTGACAGTGGCATTAAGGAAAGATCTGCTGTTTTT AGCAAAAGTTTCCAGAAAGGAGAATTC---AGGAGCCCTAGCCTTTTAGACCAT---ACATGTTTGGCTCAGGACCACGA AAGTGGG------AAATTGGAGTCCCCAGAAGAGACTATGTCTAGTGAG >Hedgehog ---------------------------------CGTGAGAACAGCAGTTTATTACTCACTAAAGGCAAAATGAATGTAGA AAAGGCTGAATTCTGTAGTAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACAGAGCAGATGGGCTGAAAATATGGAAA CATGTAATGAT---AGGCAGACTCCTAGCCCAGACAAAGAAGTAGATCTGAATGCTGATTCCTTATATGAGAGAAAAGAA CTAAACGAGCAGATCTCTTCATACTCCAGCAGTCCTAAAGAT---TCCAAAGAC---ATTTCTTGGGTAGCACTGAAT-- -AGCATACAGAAAGTGAATGAATGGCTTTCCAGAAGTGATGAACTGTTAACTTCTGATGACTCATATGATAAGGGATCTA AATCAAAAACTGAAGTAACTGTAACAACAGAAGTTCCAAAT------GCAATAGATAGRTTTTTTGGTTCTTCAGAGAAA ATAAACTTAACAGCCAGTGATCCTCATGTTGCTTTAATACGTGAAGGTGAAGGAGTCCACTTCAAACCAGTAAAGAATAA T---ATTGAAGATAAAATATTTGGGAAAACCTATGGGAGGAAGGCAAGCCTTCTTAATTTGAGCCACGTAACTGAAGATG TAATTATAAGG------------GAACCTCAGGTAGCCAAA--------------------------------------- ---------------------GAGCCTCTCCTTGCAACTAAATTAAAACGTAAAAGGAGAACTGATGTAGGTCTTTGTCC TGAGGATTTTATCAAGAAAGTAGATGTGGCAATTGTTCAGAAGACTCCTGAAAAGATAATCAAGAGACCTGGCCAACTGG ATCAAAGT---------GACCAAGTAATGAATATTGCTACTAATGGTCATGAAACTGAAACAAAGAGTGATTAT---GTT CAGAAGGAGAAAAATGCTAACCCAGCAGAA------TCACCAGACAAAGAATCTGCTTTTAGAAATAAACCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATGTCCACAGTTCAAAAGCACCTAAGAAGAATAAACTCAGGAGAA AGTCATCTGCCAGGCATATTCATGCACTTGAAGTAGTAGTCAATAAAAACTCAAGCCCACCTAGCCATACAGAACTACAA ATTGATAGCTGTTCTAGCAATGAAGAG---ATGAGGGGGGTG---AAAGCTGATCAAATGCCAGTCAGGCACAGCAAACA ACTTCCACTCATTGAAGACAAAGAACCTGCAACTAGAGCCTTTAGTAGTAGCAAGTCAAATGAACAAATAAGTAAACAGT TTGTTGGTGAGACTTTTTCAGACCTAAATTTAACAAACATCCCTGGTTTTCTTACCAGCTGTTCAAGTTCCAATAAACAT CAGGAATGTATTAATCCTAACCTACAAAGAGAGGTATTAGGAGAAAGC---CGTGGGACTATTCAAGTGTGTAATAATAC 
CAAAAACCCTAAAGATCTGATAGTAAGTGGAAAGAGA---GGTTTACAA---ATTGAAAGATCTGTAGAGAGTCCT---- --------GTAGAGAACACTGACTATGGCAGTCAGGATAGTATCTCATTACTGGAAACTGATACTCTAGGG---AAGGCA AAA---AAGACACCAAATCAATGTGTAAGTCTGTGTGTAGCAACTGAAACCCCCAAAGAACTTAGCCATAGTTGT---TC TAAAGATACTAGAAATGACACTGAAGGCTTTAGGGATTCACTCAGATGTGAAGTTGAT---CACACGCAGGAGACAAGCA CAGAACTGGAGGAGAGTGAACTTGATACACAGTATTTACAGAATACCTTCAAGGTTTCAAAACGTCGGTCATTTGCTCTG TTTTCAAGTCCAGAGAATTCAGAAAAGGCATGTACAAGAGGCTCTGTCCATTCTAAGTCTTTAAGGAAACAAAGTCCAAA AGTTATTCTGGACTGTGAACAAAAAGAAGAAAATCAAGGAGAGAAAGCATCTGAAATCAAGTATTTGCCATCAGAACATA CAACTACAGGCTTTCCTGTGGTGTGTTATAAAGATACA---TCAGGTGATTATGCCAAATGTAGCGTAAAAGGAGTCTCC AGGCATTGTCAGCCATCTCGGTTCAGA---GGCAGTGAACCTGAACTCATTGTTGCAAATAAAAATTTAATTTTACAAAA CCTATATCATATACCAACACTTCCTACCATCATGTCATCTACTAAGAGTATATGCAAGAAAAAC---CTGTCTGAGGAAA AT------------TCACTGTCGCCTGAAAGAGCAGTGACAAACAAAAGCATCATCCAAAGTACAGTGAACACCATTAGC CAGACTAATGCCAGTGAAAATGCTCTTAAAGATGTCAGTTCAAGCAGTGTTAATGATATGGGTTCCAGTACTAATGAAGT AGGTTCCAGT------------------------------------------AGTGAAAACATTCAAGCTAAACTATGTA GAAACAGAGGACCCAGATTAAATGCTACTCTTAGAGCAAGTCTTATGCAACCTGAAGTCTTTGAGCAGTGTCTT---CTG ATGAGTAACTGTAAACATTCTGAAATGAAAAGACAAGGAGAAAATGAAGGTGTTATTCAGACTGTTAATATAGAATCCGA TTCATGCCCAAGACCAGATAACTTAGAACAA---CCGGTGGAAACTAGTCAT---------GTTTGTTCTGAAACACCTG ACGATCTGTTAAATGACGATGAAATGAAGGAAAATACAAGCTTTACTGAAAGTGGCATTAAGGAACAATCTGCTGTTTTT GGCAAAAGTACCCAGAAAGGT---TTCAGAAGGAGCCCTAGCCCTTTAGGCCAC---ACATGTTTGAGACAGGATCAGCA AAGAGGGGCACAAAAATCAGAGCCCTCTGAAGATTTCATGTCTAGTGAG >TreeShrew TGTGGCATAAATACTTATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAGGACAGAATGAATGTAGA AAAGGCTGAATTGTGTAATAAAAGCCAACAACCTGACTTAGCAAGGAGCCAGCAGAGCAGATGGACTGAAAATAAGGAAA CATGTAATGAT---AGGCAGATTCCCAGCACAGAAAAAAAGGTAGATCTAAATGCTGATCCCCTGTGTGGGAAGAAAAAA CAAGCTAAGCAGAAACYGCTATGTTCTAACAGTCCTAGAGAT---GACCAAGAT---TCTCCTTGGATAACTCTAAATAG TAGCATTCAGAAAGTTAATGAATGGTTTTCCAGAAGTGATGAAATGTTAACTTCTAACGACTCACATGATGGTGAGTCTG 
AA------------ATAGCTGGTGCATTYGAAGYTCCAAAT------AAAGTAGATGAATATTCTGGTTCTTCAGAAAAA ATAGACTTAATGGCCAACAATCTTCATGATGCTTTAATAAGTAAAAGTGAAGGAATCTACTCCAAACCAGTAGAGGGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAAGCAAGTCTTCCTAACTTGAGCCGTGTAACTGATGATC TAATTAGAGGGGCATTTGTTACAGAGCCTGAGATAACTCGA--------------------------------------- ---------------------GAGCGTCCCTTCACAAATAAATTAAAGCGGAAAAGGAGAACTATATCAGGCCTTCATCC TGAAGATTTTATCAAGAAAACAGATTTGGCAGTTGTTCAAAAGACTCCTGAAAAGATAAATCAGAGAACTGACCAAATAG AGCATAAT---------GGTCAGGTGATGAGTATTGCTAATAGTGGTCATGAGAATGAAACAAAAGGTGATTAT---ATT TCGAAAGAGAAAAATGCTAACCCAATGGAA------TCATTAGAAAAAGAATCTGCTCTCAGAACTAAAGCTGAGCCCAT AAGCAGCAGTGTAAGCAATATGGAACTAGAAATAAATAACCACAGTTCAGAAGCACCTAAGAAGAATAGGCTGAGGAGAA AGTTTTCTGCTAGGCATATTCGCACACTTGAACTAGTAGACAATAAAAGTCCAAGCCCACCTAATCGTACTGAACTACAA ATTGACAGTTATTCTAGCGGTGAAGAG---AGAAAGAAAAAG---GGT---GAGCAAATGCCAGTTGGACACAGCAGAAA GTTTCAACTTGAGGAAGAGAAAGAACCTACAACTGGAGCCAAGAAAAATAACCAGCCAAATACAGAAATAAGTGAAAGAC ATGCCAGTGGTGTTATCCCAGATCTGAAGTTAACAAACATACCTGGTTTTTTCACAAACTCTTCGAGTTCTAATAAACTT CCAGAATTTGTCCATCGTAGCCTTCAAAGAGAAAAA---GAAGAGAAC---CGAGAAACAATTCAAATATCCAGTAGTAC C---------AAAGATCTGGTATTAAGGGGAGAAAGG---GGTTTGCAA---GATGTAAGGTCTGCAGAGAGTACCAGTA TTTCTTTGGTACCTGATACTGATGATAACACCCAGGATAGCATCTCATTACTAGATGCTAACCCCCTAGCTAGGAAGGCA AAA---ACAGCACCAAATCAATGTGTAAATCAGAGTGCAACAACTGAAAACCCCAAGGAACTTATACACAGTTGT---TC TAAAACTACTAGGAAT------GAAGGCTTCAAGGATCCATTGAAAAGTGAAGTTAAT---CATATTCAGGAGATGAGTG TAGAAATGGAGGAGAGTGAACTTGATACTCAGTATTTACAGAATACATTCAGGAGTTCAAAGCGTCAGTCATTTGCTCTG TCTTCAAATCCAGGAAATCCAGAAAAGGAACATGTCTGTGTT-------------------------------------- -------------------------------AAAGAAAGTCTGAAAGAGTCTAACATCCAACATATACAGGCAGTTAGTA CC---------------ATGGTTTTTCAGAAAGATAAG---CTAGGTGATTTTGCTACATCTGGCATTAAAGAAGTCCCT AGACTTTGTCCATCATCTCAGTTCAGA---GGCAATGAAACTGATCTCATTACTGCAAATAAACCTGAAGTTTCACAAAA CCCGTATCATATGCCATTACTTTATCCTGTCAAGTCACCTATTATAACTAAAAGTAAGAAAAGC---CTGTCAGAGGAAG 
GGTTTGAGGAACAGGCAATGTCACTTGAAAGAGCAATGGAAAATGAGAACATCATTCAAAGTACAGTGAGCACAATTAGC CAAGATAACATTAGAGAAGGTGCTTTTAAAGAAGCCAGCTCAAGCAGTATTAATGAAATAGGTCCTAGTACTAATGAAGG AAGCTCTAGTATTAATGAGGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAGAACTAGGTA AAAAGAGAGGATCCAAATTAAATGCTGTGCTTAGATTAGGTCTTATGCAACCCGAAGTCTATAAGCAAAGTCTT---CCT TTAAGTAATCATAATGATCCTGAAATGAAAAGACAAGAAAAAAATGAAGGAGGAGTTCAGGCTATTAAA---GATTTACC TCCATGTCTAATTTCAGATAATCAAGAGCAT------ATGGGAAGTAGCCATGCTTCTCAGATTTGTTCTGAGACACCTG ATGATCTGTTAGATGATGATGAAGGAAAAGAAAAT---AGCTTTGCTGAGGTTGATGTTAAGGAAAGATCTGCTGTTTTT GGCAAAACTGTCCAGAGAAGAGAGTTAAGAAGGAGCTCTAGCCCTTTAACTCGT---GCATGTTTGACTGAGGGTCAGCA AACAGGAGCCCAGAAATTAGATTCATCAGAAGAGAACCTATCTAGTGAG >FlyingLem TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACGCACTAAAGACAGAATAAATGTTGA AAAGACTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAGGAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACGCCCAGCACAGAGAAAAAGATAGATCTAAATGCTGATTCCCAGCATGGGAGAAAAGAA CGGAATATGCAGAAACCTCCATACCCTGAGAGTCCTAGAGAT---ACCCAAGAT---GTTCCTTGGATAACACTAAACAG CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAAATTTTAGCTTCTGATGACTCACGTGACAGGGTGTCTG AATCAAATGCCAAAGTAGCTGGTGCATTAGAAGTTCCAAAT------GATGTAGATGGATATTCTGATTCTTCAGAGAAA GTTGATTTAATGGCCAGTGATCCTCATGATGCTTTAATATGTAAAAGTGAAAGAATCCACTCCAGACCAGTAGAGAGTAA T---ATCAAAGATAAAATATTTGGGAAAACCTATCAGAGGAAGACAAGCCTCCCTAACTTGAGCCACGTAAATGAAGATC TAATTATAGGAGCATTTGTTACAGAACCACAGATAACACAA--------------------------------------- ---------------------GAGCGTCCCCTCACAAATAAGGTAAAGCCTAAAAGGAGAACTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGCAGACTTGGCAGTTGTTCAAAAAACTCCTGAAAAGATAAATCAGGGAATTGACCAAATGG AGCAGAAT---------GATCGAGTGATGAATATTATTAATAGTGGTCATGAGAATGAAACAAAGGATGATTAT---GTT CAGAAAGAGAAAAATGCTAACCCAACAGAA------TCATTGGAAAAAGAATCTGCTTTCAGAACTAAAGCAGAACCTAT AAGCAGCAGTATAAGCAATATGGAAATAGAATTAAATATCCACAATTCAAAACCATCTAAGAAGAATAGGCTGAGGAAGA TGTCCTCTACTAGGCATATTCATGCACTTGAACTAGTAGTCAATAGAAATCCAAGCCCACCTAATTATACTGAACTACAA 
ATTGATAGTTGTTCTAGCAGTGAAGAA---ATAGAGAAAAAA---AATTCCAGCCAAATGCCAGTCAGGCACAGCAGAAA GCTTCAACTCATGGAAAATAAAGAACCTGCAACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAAATAAGTAGAAGAC ATTCCAGTAATGCTTTCCCAGAACTGCGGTTAACAAATGTACCTGTTTTTTTTGCTAACTGTTCAAGTTCTAATAAACTT CAAGAATTTATCGATCCTAGCCTTCAAAGAGAAGAAATAGAAGAGAAC---CTAGAAACAATTCATGTGTCTAATAGTGC CAAAGACCCCAAAGATTTGGTGTTAAGTGGGGAGAAG---GGTTTGCAA---ACTGAAAGATCTGTAGAGAGTACCAGTA TTTCATTAGTACCTGATACTGATTATGGCACTCAAGACAGTATCTCAATATTAGAAGCTAACATCCTAGGG---AAGGCA AAA---ACAGCACCAAGTCAACATGCAAATCAGTGTGCAGCAATTGAAAACCCCAAAGAACTTATCCATGGTTGT---CC TAAAGGTACTAGAAATGACACAGAGGATTTTAAGGATCCATTGAGATGTGGAGTTGAC---CACATTCAGAAGACAAGCA TAGAAATGCAAGAGAGTGAACTTGATACTCAGTATTTACAAAATATATTCAAGGTTTCAAAACGTCAGTCATTTGCTCTC TTTTCAAATCCAGGAAATCCAGAAAAGGAGTGTGCAACAGTCTATGCCCACTCCAGGTTGTTAAGGAAACAAAGTCCAAA AGTCACTCCTGAATGTGAACAAAAAGAAGAAAATGAGGGAAATAAAGAGTCTAAAATCAAGCACATACAGGCAGTTAATA CCACTGTGGGCTTTTCTGTCCTTTGTCAGAATGTTAAGAAGCCAGGTGATTATGCCAAATTTAGCATTAAAGGAGTCTCT AGGCATTGTTCATCATCTCAGTTCAGA---GGCAATGAAACTGAACTCATTACTGCAAATAAACATGGAATTTTACAAAA CTCATGTCATATGTCATCACTTTCCCCCATCAGGTCATCTGTTAAAATTAAATGTAAGAAGAAC---CTGTCAGAGGAAA GGTTTGAGGAACATTCAGTGTCACCTGAAAGAGCAATGGCAAACAAGAGAATCATTCAAAGTACAGTGAACACAATTAGC CAAAATAACATTAGAGACAGTGCTTCTAAAGAAGCCAGCTCAAGCAGTATTAATGAAGTAGGTTCCAGTACTAATGAAGT AGGCTCCAGTATTAATGAAGTAGGTCCCAGT---------------------GGTGAAAACATTCAAGCAGAACTAGGTA GAAACAGAGGACCTAAATTAAGTGCTATGCTTAGATTAGGCCTCATGCAACCTGAAGTTTACAAGCAAAATCTT---CCT TTAGGTAATTGTAAACATCCTGAAATA---AGGCAAGAAGAAAATGAAGGAATAGTTCAGGCTGTTAATACAAATCTGTC TCTGTGCCTAATTTCACATAACCTCGAACAA---CCTATGGAAAGTAGTCATGCTTCCCAGGTTTGTTCTGAGACACCTG ATGACCTGTTAGATGGTGATGAGATAAAGGAAAACACCAGCTTTGCTGAAAGTGACAGTAAGGAAAGATCTGCTGTTTTT AGCAAAAGTGTCCAGAGAGGAGAGTTAAGCAGGAGCCCTAGCCCTTTTGCCCAA---ACATGTTTGGCTCAGGGTCACCA AAGAGGAGCCAGGAAATTAGAGTCTTCTGAAGAGAACGTATCTAGTGAG >Galago TGTGGCAAAAATACTCATGCCAGCTCATTACAGCATGAGAGCAGCAGTTTATTACTCACTAAAGACAAAATGAATGTAGA 
AAAGGCTGAATTTTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACAGAGCAGATCGGCTCAAAGTAAGGAAA CATGCAATGAT---AGGCACACTTGCAGCCCTGAGCAAAAGGTAGATCTGAATACTGCTCCCCCATATGGGAGAAAAGAA CAGAATAAGGAGAAACTTCTATGCTCCAAGAATCCTAGAGAT---AGCCAAGAT---GTTCCTTGGATAACACTAAATAG CAGCATTCAGAAAGTTAATGAATGGTTTTCTAGAAGTGATGAAATGTTAACTTCTGATGACTCACATGATGAGGGTTCTG AATCACATGCTGAAGTAGCTGGAGCCTTAGAAGTTCCAAGT------GAAGTAGATGGATATTCCAGTTCCTCAGAGAAA ATAGACTTACTGGCCAGTGATCCTCATTATCCTATAATATGTAAAAGTGAAAGAGTTCACTCCAAACCAATAAAGAGTAA A---GTTGAAGATAAAATATTTGGGAAAACTTATCGGAGGAAGGCAAGCCTCCCTAACTTAAGCCATGTAACTGAAAATC TAATTATAAGAGCAGCTGCTACTGAGCCACAGATAACACAA--------------------------------------- ---------------------GAGTGTTCCCTCACAAATAAATTAAAACGTAAAAGGAGAACTACATCAGGTCTTTGTCC TGAGGATTTTATCAAGAAGGCAGATTTGGCAGTTGTTCAAAAGACACCTGAAAAGAGAATTCAGGGAACTAACCAAGTGG ATCAGAAT---------AGTCACGTGGTAAATATTACTAATAGTGGTTATGAGAATGAAACAAAAGGTGATTAT---GTT CAGAATGAAAAAAATGCTAACTCAACAGAA------TCATTGGAAAAAGAATCTTCTCTCGGAACTAAAGCTGAACCTAT AAGCAGCAGTATAAGTAATATGAAATTAGAATTAAATATTCACAATTCAAAAGCAAGTAAAAAGAAAAGGCTGAGGAAGA AGTCTTCTAGCAGGCATATTCGTGCACTTGAACTAGTAGTCAATAAAAATCCAAGCCCTCCTAATCATACCAACCTACAA ATTGACAGTTGTTCTAGCAGTGAAGAA---ATAAAGGATAAA---AGTTCTGACCAAATACCAGTCAGGCATAGCAGAAA GCCTGGACTCATGGAAGATAGAGAACCTGCAACTGGAGCCAAGAAAAGTAACAAGCCAAATGAGCAAATAAGTAAAAGAC ATGTCAGTGATACTTTCCCAGAAGTGGCATTAACAAATATATCTAGTTTTTTTACTAACTGTTCAGGTTCTAATAGACTT AAAGAATTTGTCAATCCTAGCCTTCAAAGAAAAAAAACAGAAGAGAACTTAGAAGAAACAATTCAAGTGTCTAATAGTAC CAAAGGTCCGGTGTTAAGTGGAGAAAGGGTTTTGCAA---ATTGAAAGT---GAAGAAAGATCTATAAAAAGCACCAGTA TTTCATTGGTACCTGATACTGATTATGGTACTCAGGACAGTAACTCGTTACTGAAAGTTAAAGTCTTACGG---AAGGTG AAA---ACAGCACCAAATAAACATGCAAGTCAGGGTACAGCCACTGAAAACCCCAAGGAACTAATCCATGGTTGC---TC TAAAGATACTGGAAATGACACAGAGGGCTATAAGGATCCATTGAGACATGAAATTAAC---CACATTCAGAAGATAAGCA TGGAAATGGAAGACAGTGAACTTGATACTCAGTATTTACAGAATACATTCAAGTTTTCAAAGCGTCAGTCGTTTGCTCTG TTTTCAAACCTAGGA---------AAGGAATGTGCAACAGTCTGTGCCCAGTCTCTCTCTGCGTCCTTAAGAAAAGGTTC 
AAAAGTCATTCTTGAATGTGAACAAATAGAAAATCCAGGAATGAAAGAGCCTAAAATCAAGCATATACAGGGAAATAATA TCAATACAGGCTTCTCTGTAGTTTGTCAGAAAGATAAGAAAACAGATGATTATGCCAAATACAGCATCAAAGAAGCATCT AGGTTTTGTTTGTCAAATCAGTTTCGA---GACAATGAAACTGAATCCATTACTGTAAATAAACTTGGAATTTTACAAAA CCTCTATCATATACCACCACTTTCTCCTATCAGGCTATTTGATAAAACTAAATGTAATACAAAC---CTGTTAGAGGAAA GGTTTGAAGAACATTCAGTGTTACCTGAAAAAGCAGTAGGAAACGAGAACACCGTTCCAAGTACAATGAATACAATTAAC CAAAATAAC---AGAGAAAGTGCTTATAAAGAAGCCAGTTCAAGCAGTATCAATGAAGTAAGCTCGAGTACTAATGAAGT GGGCTCCAGTGTTAACGAAGTAGGCCCCAGT---------------------AGTGAAAACATTCAAGCAGAACTAGATA AAAACAGAGGACCTAAGTTGAATGCTGTGCTTAGATTAGGTCTTATGCAACCTGAAGTCTATAAACAAAATCTT---CCT ATAAGTAATTGTGAACATCCTAAAATAAAAGGGCAAGAAGAAAATGGA---GTAGTTCAACCTGTTAATCCAGATTTTTC TTCATGTCTAATTTCAGATAACCTAGAACAA---CCTACGAGAAGTAGTCATGCTTCTCAGCTTTGTTCTGAGACACCTG ATGACTTATTAGTTGATGATGAACTAAAGGAAAATACCAGTTTTGCTGAAAATAACATTAAGGAAAGATCTGCTGTTTTT AGCAAAAATGTCATGAGAAGAGAGATTAGCAGGAGCCCTAGCCCTTTAGCCCAT---ATACATTTGACTCAGGCTCACCA AAGAGAGGTTAGGAAATTAGAGTCCTCAGAAGAGAACATGTCTAGTGAA >HowlerMon TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTGTTACTCACTAAAGACACACTGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGAAAGTGAGGAAA CATGTAATGAT---AGGCAGACTCCCAGCACAGAGAAAAAGGTAGATGTGGATGCTGATCCCCTGCATGGGAGAAAAGAA TGGAATAAGCAGAAACCTCCGTGCTCTGAGAATCCTAGAGATGATACTGAAGAT---GTTGCTTGGATAATGCTAAATAG CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAACTTCTGATGACTCACATGATGGGGGGTCTG AATCAAATGCCAAAGTAGCTGAAGCATTGGAAGTTCTAAAT------GAGGTAGATGGATATTCTAGTTCTTCAGAGAAA ATAGACTTACTGGCCAGTGATCCTCATGATCATTTGATATGTAAAAGTGAAAGAGTTCACTGCAAATCAGTAGAGAGTAG T---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGAGCCACGTAACTGAAAATC TAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA--------------------------------------- ---------------------GAGCATCCTCTCACAAATAAATTAAAGCGTAAAAGGAGAGTTACATCAGGACTTCATCC TGAGGATTTTATCAAGAAAGCAGATTTGGCAGTT---CAAAAGACTCCTGAAAAGATAAATCAGGGAACTAACCAAACAG 
AGCGGAAT---------GATCAAGTGATGAATATTACTAACAGTGGTCATGAGAATAAAACAAAAGGTGATTCT---ATT CAGAATGAGAACAATCCTAACCCAGTAGAA------TCACTGGAAAAAGAATCA---TTCAAAAGTAAAGCTGAACCTAT AAGCAGTAGTATAAGCAATATGGAATTAGAATTGAATGTCCACAATTCCAAAGCATCTAAAAAGAATAGGCTGAGAAGGA AGTCTTCTACCAGGCATATTCATGAGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAATTATACTGAAGTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATAAAGAAAAAA---AATTACAACCAAATGCCAGTCAGGCACAGCAGAAA GCTACAACTCATGGAAGATAAAGAACGTGCAGCTAGAGCCAAAAAGAGTAGCAAGCCAAATGAACAAACAAGTAAAAGAC ATGCCAGTGATACTTTCCCAGAACTGAGGTTAACAAACATACCTGGTTCTTTTACTAACTGTTCAAATACTAATGAATTT AAAGAATTTGTCAATCCTAGCCTTCCAAGAGAACAAACAGAAGAGAAA---CTAGAAACAGTTAAACTGTCTAATAATGC CAAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGT---GTTTTGCAA---ATTGAAAGATCTGTAGAGAGTAGCAGTA TTTTGTTGATACCTGGTACTGATTATGGCACTCAGGAAAGTATCTCATTACTGGAAGTTAGCACTCTGGGG---AAGGCA AAA---ACAGAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGAACTAATTCATGGTTGT---TC TAAAGATACTAGAAATGGCACAGAAGGCTTGAAGTATCCATTGGGACCTGAAGTTAAC---TACAGTCAGGAAACAAGCA TAGATATGAGAGAAAGTGAACTTGATACTCAATATTTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCTG TTTTCAAATCCAGGAAATCCAGAAAAGGAATGTGCAACATTCTCTGCCTGCTCTAGGTCCTTAAAGAAACAAAGTCCAAA GGTCACTCCTGAATGTGAACAAAAGGAAGAAAATCAAGGAGAGAAAGAGTCTAATATCGAGCTTGTAGAGACAGTTAATA CCACTGCAGGCTTTCCTATGGTTTGTCAGAAAGATAAG---CCAGTTGATTATGCCAGATGT---ATCGAAGGAGGCTCT AGGCTTTGTCTATCATCTCAGTTCAGA---GGCAACGAAACTGGACTCATTATTCCAAATAAACATGGACTTTTACAGAA CCCATATCATATGTCACCGCTTATTCCCACCAGGTCATTTGTTAAAACTAAATGTAAGAAAAAC---CTGCTAGAAGAAA ACTCTGAGGAACATTCAATGTCACCTGAAAGAGCAATGGGAAACAAGAACATCATTCCAAGTACAGTGAGCACAATTAGC CATAATAAC---AGAGAAAATGCTTTTAAAGAAACCAGCTCAAGCAGTATTTATGAAGTAGGTTCCAGTACTAATGAAGC AGGTTCTAGTACTAATGAAGTAGGCTCCAGTATTAATGAAGTAGGTTCCAGTGATGAAAACATTCAAGCAGAGCTAGGTA GAAACAGAAGGCCAAAATTGAATGCTATGCTTAGATTAGGGCTTCTGCAACCTGAGATTTGTAAGCAAAGTCTT---CCT ATAAGTGATTGTAAACATCCTGAAATTAAAAAGCAAGAACATGAAGAA---GTAGTTCAGACTGTTAATACAGACGTCTC TCTATGTCTGATTTCATATAACCTAGAACAG---CATATGGGAAGCAGTCATACATCTCAGGTTTGTTCTGAGACACCTG 
ACAACCTGTTAGATGATGGTGAAATAAAGGAAGATACTAGTTTTGCTGAATATGGCATTAAGGAGACTTCTACTGTTTTT AGCAAAAGTGTCCAGAGAGGAGAGCTCAGCAGGAGCCCTAGCCCTTTCACCCAT---ACACATTTGGCTCAGGTTTACCA AAGAGGGGCCAAGAAATTAGAGTCCTCGGAAGAGAATTTATCTAGTGAG >Rhesus TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAAC---AGTTTGTTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTGGCAAGGAGCCAACATAACAGATGGACTGGAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCCAGCACAGAGAAAAAGGTAGATCTGAATGCTAATGCCCTGTATGAGAGAAAAGAA TGGAATAAGCAAAAACTGCCATGCTCTGAGAATCCTAGAGAC---ACTGAAGAT---GTTCCTTGGATAACACTAAATAG CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAAGTTCTGATGACTCACATGATGGGGGGTCTG AATCAAATGCCAAAGTAGCTGATGTATTGGACGTTCTAAAT------GAGGTAGATGAATATTCTGGTTCTTCAGAGAAA ATAGACTTACTGGCCAGTGATCCTCATGAGCCTTTAATATGTAAAAGTGAAAGAGTTCACTCCAGTTCAGTAGAGAGTAA T---ATTAAAGACAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAACCTTCCCAATTTAAGCCATGTAACTGAAAATC TAATTATAGGAGCACTTGTTACTGAGTCACAGATAATGCAA--------------------------------------- ---------------------GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGAACTACATCAGGTCTTCATCC TGAGGATTTTATAAAGAAAGCAGATTTGGCAGTT---CAAAAGACTCCTGAAATAATAAATCAGGGAACTAACCAAATGG AGCAGAAT---------GGTCAAGTGATGAATATTACTAATAGTGCTCATGAGAATAAAACAAAAGGTGATTCT---ATT CAGAATGAGAAAAATCCTAACCCAATAGAA------TCACTGGAAGAAGAATCTGCTTTCAAAACTAAAGCTGAACCTAT AAGCAGCAGTATAAACAATATGGAACTAGAATTAAATATCCACAATTCAAAAGCACCTAAAAAAAATAGGCTGAGGAGGA AGTCTTCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAACTGTACTGAACTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATAAAGAAAAAA---AATTACAACCAAATGCCAGTCAGGCACAGCAGAAA CCTACAACTCATGGAAGATAAAGAATCTGCAACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGAC ATGCCAGTGATACTTTCCCAGAACTGAAGTTAACAAAGGTACCTGGTTCTTTTACTAACTGTTCAAATACTAGTGAA--- AAAGAATTTGTCAATCCTAGCCTTTCAAGAGAAGAAAAAGAAGAGAAA---CTAGAAACAGTTAAAGTGTCTAATAATGC CAAAGACCCCAAAGATCTCATCTTAAGTGGAGAAAGG---GTTTTACAA---ACTGAAAGATCTGTAGAGAGTAGCAGTA TTTCATTGGTACCTGGTACCGATTATGGCACTCAGGAAAGTATCTCATTACTGGAAGTTAGCACTCTAGGG---AAGGCA 
AAA---ACAGAACGAAATAAATGTATGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGAACTAATTCATGGTTGT---TC TGAAGATACTAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGAAGTGAAGTTAAC---CACAGTCAGGAAACAAGCA TAGAAATAGAAGAAAGTGAACTTGATACTCAGTATTTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCCTTTGCTCTG TTTTCAAATCCAGGAAATCCAGAAGAGGAATGTGCAACATTCTCTGCCCACTCTAGGTCCTTAAAGAAACAAAGTCCAAA AGTTACTTCTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAAACAGTCTAATATCAAGCCTGTACAGACAGTTAATA TCACTGCAGGCTTTTCTGTGGTTTGTCAGAAAGATAAG---CCAGTTGATAATGCCAAATGTAGTATCAAAGGAGGCTCT AGGTTTTGTCTATCATCTCAGTTCAGA---GGCAACGAAACTGGACTCATTACTCCAAATAAACATGGACTGTTACAAAA CCCATACCATATACCACCACTTTTTCCTGTCAAGTCATTTGTTAAAACTAAATGTAACAAAAAC---CTGCTAGAGGAAA ACTCTGAGGAACATTCAGTGTCACCTGAAAGAGCAGTGGGAAACAAGAACATCATTCCAAGTACAGTGAGCACAATTAGC CATAATAACATTAGAGAAAATGCTTTTAAAGAAGCCAGCTCGAGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGT GGGCTCCAGTATTAATGAAGTAGGTTCCAGT---------------------GATGAAAACATTCAAGCAGAACTAGGTA GAAACAGAGGGCCAAAATTGAATGCTGTGCTTAGATTAGGGCTTTTGCAACCTGAGGTCTGTAAACAAAGTCTT---CCT ATAAGTAATTGTAAGCATCCTGAAATAAAAAAGCAAGAACATGAAGAA---TTAGTTCAGACTGTTAATACAGACTTCTC TCCATGTCTGATTTCAGATAACCTAGAACAG---CCTATGGGAAGTAGTCATGCGTCTGAGGTTTGTTCTGAGACTCCTG ATGATCTGTTAGATGATGGTGAAATAAAGGAAGATACTAGTTTTGCTGAAAATGACATTAAGGAGAGTTCTGCTGTTTTT AGCAAAAGCATCCAGAGAGGAGAGCTCAGCAGGAGCCCTAGCCCTTTCACCCAT---ACACATTTAGCTCAGGGTTACCG AAAAGAGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG >Orangutan TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCCAGCACAGAAAAAAAGGTAGACCTGAATGCTGATCCCCTGTGTGAGAGAAAAGAA TGGAATAAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGAT---ACTGAAGAT---GTTCCTTGGATAACACTAAATAG CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGACGAACTGTTAGGTTCTGATGACTCACATGATGGGAGGTCTG AATCAAATGCCAAAGTAGCGGATGTATTGGACGTTCTAAAT------GAGGTAGATGAATATTCTGGTTCTTCAGAGAAA ATAGACTTACTGGCCAGTGATCCTCATGAGGCTTTAATTTGTAAAAGTGAAAGAGTTCACTCCAAATCAGTAGAGAGTAA 
T---ATTGAAGACAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCCAACTTAAGCCATGTAACTGAAAATC TAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA--------------------------------------- ---------------------GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGAGCTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGCAGATTTGGCAGTT---CAAAAGACTCCTGAAATGATAAATCAGGGAACTAACCAAATGG AGCAGAAT---------GGTCAAGTGATGAATATTACTAATAGTGGTCATGAGAATAAAACAAAAGGTGATTCT---ATT CAGAATGAGAAAAATCCTAACCCAATAGAA------TCACTCGAAAAAGAATCTGCTTTCAAAACAAAAGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTCGAATTAAATATCCATAATTCAAAAGCACCTAAAAAGAATAGGCTGAGGAGGA AGTCTTCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAATTGTACTGAATTGCAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATAAAGAAAAAA---AAATACAACCAAATGCCAGTCAGGCACAGCAGAAA CCTACAACTCATGGAAGATAAAGAACCTGCAACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGAC ATGACAGCGATACTTTCCCAGAGCTGAAGTTAACAAATGCACCTGGTTCTTTTACTAACTGTTCAAATACCAGTGAGCTT AAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAAAAGAAGAGAAA---CTAGGAACAGTTAAAGTGTCTAATAATGC CAAAGACCCCAAAGATCTCATGTTAAGTGGAGGAAGG---GTTTTGCAA---ACTGAAAGATCTGTAGAGAGTAGCAGTA TTTCATTGGTACCTGGTACTGATTATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGG---AAGGCA AAA---ACAGAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGAACTAATTCATGGTTGT---TT CAAAGATACTAGAAATGACACAGAAGGGTTTAAGTATCCATTGGGACATGAAGTTAAC---CACAGTCAGGAAACAAGCA TAGAAATGGAAGAAAGTGAACTTGATACTCAGTATTTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCTG TTTTCAAATCCAGGAAATCCAGAAGAGGAATGTGCAACATTCTCTGCCCACTCTAGGTCCTTAAAGAAACAAAGTCCAAA AGTCACTTTTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAATGAGTCTAATATCAAGCCTGTACAGACAGCTAATA TCACTGCAGGCTTTCCTGTGGTTTGTCAGAAAGATAAG---CCAGTTGATTATGCCAAATGTAGTATCAAAGGAGGCTCT AGGTTTTGTCTATCATCTCAGTTCAGA---GGCAACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTCACAAAA CCCATATCATATACCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAAC---CTGCTAGAGGAAA ACTCTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAACGAGAAC---ATTCCAAGTACAGTGAGCATAATTAGC CGTAATAACATTAGAGAAAATGTTTTTAAAGAAGCCAGCTCAAGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGT 
GGGCTCCAGTATTAATGAAGTAGGTTCCAGT---------------------GATGAAAACATTCAAGCAGAACTAGGTA GAAGCAGAGGGCCAAAATTGAATGCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTTTT---CCT GGAAGTAATGGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAA---GTACTTCAGACTGTTAATACAGACTTCTC TCCATGTCTGATTTCAGATAACCTAGAACAG---CCTATGAGAAGTAGTCATGCATCTCAGGTTTGTTCTGAGACACCTA ATGACCTGTTAGATGATGGTGAAATAAAGGAAGATACTAGTTTTGCTGAAAATGACATTAAGGAAAGTTCTGCTGTTTTT AGCAAAAGCGTCCAGAGAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCAT---ACACATTTGGCTCAGGGTTACCG AAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG >Gorilla TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAACAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAA CATGTAATGAT---AGGCGGACTCCCAGCACAGAAAAAAAGGTAGATCTGAATGCTGATCCCCTGTGTGAGAGAAACGAA TGGAATAAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGAT---ACTGAAGAT---GTTCCTTGGATAACACTAAATAG CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGTTCTGATGACTCACATGATGGGGGGTCTG AATCAAATGCCAAAGTAGCTGATGTATTGGACGTTCTAAAT------GAGGTAGATGAATATTCTGGTTCTTCAGAGAAA ATAGACTTACTGGCCAGTGATCCTCATGAGGCTTTAATATGTAAAAGTGAAAGAGTTCACTCCAAATCAGTAGAGAGTAA T---ATTGAAGACAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCCAGCTTAAGCCATGTAACTGAAAATC TAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA--------------------------------------- ---------------------GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGAGCTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGCAGATTTGGCAGTT---CAAAAGACTCCTGAAATGATAAATCAGGGAACTAACCAAATGG AGCAGAAT---------GGTCAAGTGATGAATATTACTAATAGTGGTCATGAGAATAAAACAAAAGGTGATTCT---ATT CAGAATGAGAAAAATCCTAACCCAATAGAA------TCACTAGAAAAAGAATCTGCTTTCAAAACGAAAGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTCGAATTAAATATCCACAATTCAAAAGCGCCTAAAAAGAATAGGCTGAGGAGGA AGTCTTCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAATTGTACTGAATTGCAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATAAAGAAAAAA---AAGTACAACCAAATGCCAGTCAGGCACAGCAGAAA CCTACAGCTCATGGAAGATAAAGAACCTGCAACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGAC 
ATGACAGCGATACTTTCCCAGAGCTGAAGTTAACAAATGCACCTGGTTCTTTTACTAACTGTTCAAATACCAGTGAACTT AAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAAAAGAAGAGAAA---CTAGAAACAGTTAAAGTGTCTAATAATGC CGAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGG---GTTTTGCAA---ACTGAAAGATCTGTAGAGAGTAGCAGTA TTTCATTGGTACCTGGTACTGATTATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGG---AAGGCA AAA---ACAGAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGGACTAATTCATGGTTGT---TC CAAAGATACTAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGACATGAAGTTAAC---CACAGTCGGGAAACAAGCA TAGAAATGGAAGAAAGTGAACTTGATGCTCAGTATTTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCTG TTTTCAAATCCAGGAAATCCAGAAGAGGAATGTGCAACATTCTCTGCCCACTCTAGGTCCTTAAAGAAACAAAGTCCAAA AGTCACTTTTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAATGAGTCTAATATCAAGCCTGTACAGACAGTTAATA TCACTGCAGGCTTTCCTGTGGTTTGTCAGAAAGATAAG---CCAGTTGATTATGCCAAATGTAGTATCAAAGGAGGCTCT AGGTTTTGTCTATCATCTCAGTTCAGA---GGCAACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTTACAAAA CCCATATCATATACCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAAC---CTGCTAGAGGAAA ACTTTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAATGAGAAC---ATTCCAAGTACAGTGAGCACAATTAGC CGTAATAACATTAGAGAAAATGTTTTTAAAGAAGCCAGCTCAAGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGT GGGCTCCAGTATTAATGAAGTAGGTTCCAGT---------------------GATGAAAACATTCAAGCAGAACTAGGTA GAAACAGAGGGCCAAAATTGAATGCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTCTT---CCT GGAAGTAATTGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAA---GTAGTTCAGACTGTTAATACAGATTTCTC TCCATGTCTGATTTCAGATAACTTAGAACAG---CCTATGGGAAGTAGTCATGCATCTCAGGTTTGTTCTGAGACACCTG ATGACCTGTTAGATGATGGTGAAATAAAGGAAGATACTAGTTTTGCTAAAAATGACATTAAGGAAAGTTCTGCTGTTTTT AGCAAAAGCGTCCAGAGAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCAT---ACACATTTGGCTCAGGGTTACCG AAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG >Human TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAA CATGTAATGAT---AGGCGGACTCCCAGCACAGAAAAAAAGGTAGATCTGAATGCTGATCCCCTGTGTGAGAGAAAAGAA 
TGGAATAAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGAT---ACTGAAGAT---GTTCCTTGGATAACACTAAATAG CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGTTCTGATGACTCACATGATGGGGAGTCTG AATCAAATGCCAAAGTAGCTGATGTATTGGACGTTCTAAAT------GAGGTAGATGAATATTCTGGTTCTTCAGAGAAA ATAGACTTACTGGCCAGTGATCCTCATGAGGCTTTAATATGTAAAAGTGAAAGAGTTCACTCCAAATCAGTAGAGAGTAA T---ATTGAAGACAAAATATTTGGGAAAACCTATCGGAAGAAGGCAAGCCTCCCCAACTTAAGCCATGTAACTGAAAATC TAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA--------------------------------------- ---------------------GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGACCTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGCAGATTTGGCAGTT---CAAAAGACTCCTGAAATGATAAATCAGGGAACTAACCAAACGG AGCAGAAT---------GGTCAAGTGATGAATATTACTAATAGTGGTCATGAGAATAAAACAAAAGGTGATTCT---ATT CAGAATGAGAAAAATCCTAACCCAATAGAA------TCACTCGAAAAAGAATCTGCTTTCAAAACGAAAGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTCGAATTAAATATCCACAATTCAAAAGCACCTAAAAAGAATAGGCTGAGGAGGA AGTCTTCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAATTGTACTGAATTGCAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATAAAGAAAAAA---AAGTACAACCAAATGCCAGTCAGGCACAGCAGAAA CCTACAACTCATGGAAGGTAAAGAACCTGCAACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGAC ATGACAGCGATACTTTCCCAGAGCTGAAGTTAACAAATGCACCTGGTTCTTTTACTAAGTGTTCAAATACCAGTGAACTT AAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAAAAGAAGAGAAA---CTAGAAACAGTTAAAGTGTCTAATAATGC TGAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGG---GTTTTGCAA---ACTGAAAGATCTGTAGAGAGTAGCAGTA TTTCATTGGTACCTGGTACTGATTATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGG---AAGGCA AAA---ACAGAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGGACTAATTCATGGTTGT---TC CAAAGATAATAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGACATGAAGTTAAC---CACAGTCGGGAAACAAGCA TAGAAATGGAAGAAAGTGAACTTGATGCTCAGTATTTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCCG TTTTCAAATCCAGGAAATGCAGAAGAGGAATGTGCAACATTCTCTGCCCACTCTGGGTCCTTAAAGAAACAAAGTCCAAA AGTCACTTTTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAATGAGTCTAATATCAAGCCTGTACAGACAGTTAATA TCACTGCAGGCTTTCCTGTGGTTGGTCAGAAAGATAAG---CCAGTTGATAATGCCAAATGTAGTATCAAAGGAGGCTCT 
AGGTTTTGTCTATCATCTCAGTTCAGA---GGCAACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTTACAAAA CCCATATCGTATACCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAAT---CTGCTAGAGGAAA ACTTTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAATGAGAAC---ATTCCAAGTACAGTGAGCACAATTAGC CGTAATAACATTAGAGAAAATGTTTTTAAAGAAGCCAGCTCAAGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGT GGGCTCCAGTATTAATGAAATAGGTTCCAGT---------------------GATGAAAACATTCAAGCAGAACTAGGTA GAAACAGAGGGCCAAAATTGAATGCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTCTT---CCT GGAAGTAATTGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAA---GTAGTTCAGACTGTTAATACAGATTTCTC TCCATATCTGATTTCAGATAACTTAGAACAG---CCTATGGGAAGTAGTCATGCATCTCAGGTTTGTTCTGAGACACCTG ATGACCTGTTAGATGATGGTGAAATAAAGGAAGATACTAGTTTTGCTGAAAATGACATTAAGGAAAGTTCTGCTGTTTTT AGCAAAAGCGTCCAGAAAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCAT---ACACATTTGGCTCAGGGTTACCG AAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG >Chimpanzee TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAA CATGTAATGAT---AGGCGGACTCCCAGCACAGAAAAAAAGGTAGATCTGAATGCTGATCCCCTGTGTGAGAGAAAAGAA TGGAATAAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGAT---ACTGAAGAT---GTTCCTTGGATAACACTAAATAG CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGTTCTGATGACTCACATGATGGGGGGTCTG AATCAAATGCCAAAGTAGCTGATGTATTGGACGTTCTAAAT------GAGGTAGATGAATATTCTGGTTCTTCAGAGAAA ATAGACTTACTGGCCAGCGATCCTCATGAGGCTTTAATATGTAAAAGTGAAAGAGTTCACTCCAAATCAGTAGAGAGTAA T---ACTGAAGACAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCCAACTTAAGCCATGTAACTGAAAATC TAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA--------------------------------------- ---------------------GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGAGCTACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGCAGATTTGGCAGTT---CAAAAGACTCCTGAAATGATAAATCAGGGAACTAACCAAATGG AGCAGAAT---------GGTCAAGTGATGAATATTACTAATAGTGGTCATGAGAATAAAACAAAAGGTGATTCT---ATT CAGAATGAGAAAAATCCTAACCCAATAGAA------TCACTCGAAAAAGAATCTGCTTTCAAAACGAAAGCTGAACCTAT 
AAGCAGCAGTATAAGCAATATGGAACTCGAATTAAATATCCACAATTCAAAAGCACCTAAAAAGAATAGGCTGAGGAGGA AGTCTTCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAATTGTACTGAATTGCAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATAAAGAAAAAA---AAGTACAACCAAATGCCAGTCAGGCACAGCAGAAA CCTACAACTCATGGAAGATAAAGAACCTGCAACTGGAGTCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGAC ATGACAGCGATACTTTCCCAGAGCTGAAGTTAACAAATGCACCTGGTTCTTTTACTAACTGTTCAAATACCAGTGAACTT AAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAGAAGAAGAGAAA---CTAGAAACAGTTAAAGTGTCTAATAATGC CGAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGG---GTTTTGCAA---ACTGAAAGATCTGTAGAGAGTAGCAGTA TTTCATTGGTACCTGGTACTGATTATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGG---AAGGCA AAA---ACAGAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGGACTAATTCATGGTTGT---TC CAAAGATACTAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGACATGAAGTTAAC---CACAGTCGGGAAACAAGCA TAGAAATGGAAGAAAGTGAACTTGATGCTCAGTATTTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCTG TTTTCAAATCCAGGAAATCCAGAAGAGGAATGTGCAACATTCTCTGCCCACTGTAGGTCCTTAAAGAAACAAAGTCCAAA AGTCACTTTTGAACGTGAACAAAAGGAACAAAATCAAGGAAAGAATGAGTCTAATATCAAGCCTGTACAGACAGTTAATA TCACTGCAGGCTTTCCTGTGGTTTGTCAGAAAGATAAG---CCAGTTGATTATGCCAAATGTAGTATCAAAGGAGGCTCT AGGTTTTGTCTATCATCTCAGTTCAGA---GGCAACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTTACAAAA CCCATATCATATACCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAAC---CTGCTAGAGGAAA ACTTTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAATGAGAAC---ATTCCAAGTACAGTGAGCACAATTAGC CGTAATAACATTAGAGAAAATGTTTTTAAAGAAGCCAGCTCAAGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGT GGGCTCCAGTATTAATGAAGTAGGTTCCAGT---------------------GATGAAAACATTCAAGCAGAACTAGGTA GAAACAGAGGGCCAAAATTGAATGCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTCTT---CCT GAAAGTAATTGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAA---GTAGTTCAGACTGTTAATACAGATTTCTC TCCATGTCTGATTTCAGATAACTTAGAACAG---CCTATGGGAAGTAGTCATGCATCTCAGGTTTGTTCTGAGACACCTG ATGACCTGTTAGATGATGGTGAAATAAAGGAAGATACTAGTTTTGCTGAAAATGACATTAAGGAAAGTTCTGCTGTTTTT AGCAAAAGCGTCCAGAGAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCAT---ACACATTTGGCTCAGGGTTACCG AAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG 
>Jackrabbit -------------------------------------------------------------------------------- -AAGGCTGAATTCTGTAATAAGAGCAAACAGCCTGGCTTAGCAAGAAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCCAGCACAGAGAAAAAGGTAGTTCTGAATGTTGACTGCCTGTATGGGAGAAAACAA CAGGATAAGCAGAAACCTCCATGCCCTGAGACCTCTGGAGAT---AACCAAGAT---GTTTCTTGGATAACAGTAAATAG CAGCATTCGGAAAGTTAACGAGTGGTTCTCCAGAAGTAATGAAATGTTAACTCCTGATGACTCACTTGACCGGCGGTCTG AATCAAATGCCAAAGTGGCTGGTGCATTAGAAGTCCCAAAG------GAGGTAGATGGATATTCTGGTTCTACAGAGAAA ATAGACTTACTGGCCAGTGATTCCCATAATGCTTTAATATGTGAAAGCAAAAGAGTCCATTCCAAACCAGTAGAGAATAA T---ATCAAAGATAAAATATTTGGGAAAACCTACCACAGGAAGACAAGCCTCCCTAACTTGAGCCACATAACTGAAGATC TAACTATAGGAGCATTTGCTGCGGAACCACTGGTA--------------------------------------------- ---------------------CCATGTCCCCCCGCAAATAAATTAAAGCGTAAAAGAAGAACTTCTTCAGGCCTTCAACC TGAAGATTTTATCAAGAAGGTAGATTTGGCAGTTGTTCCAAAAACCACTGCACAGATAAATCAGGGAACTGATCAAACGG TGGACAGT---------GATCAGGTGATGAATATTACTAATTGTGGTAATGAGAATGAAACAGAAGGTGACTAT---ATT CAGAAAGAGACAAATGCTAACCCAACAGAA------TCCCTAGAAAAAGACTCTTCCTTCAGAACTAAAGTTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATGTCCATAATTCAAAAAAACCCAAGAAGAATAGGCTGAGGAGGA AGTCCTCTACCAGGCGTGTTCATGCACTTGAACTAGTAGTCAATAAGAAACCGAGCCCACCTAATCATGCTGAACTACAA ATTGACAGTTGTACTAGCAGTGAAGAA---ATG------------AATTTTGACCAAATACCAGTCAGTCACAGCAGACA GGCTCAAGTCATAGAAGATACAGAACCTCCAACTGGAGCTAGGAAG---AACAAGCCAAGTGAACAAATAAGTAAAAGAC ATGCCAGTGATGTTTTCCCAGAAGTCAAATTAACAAACATACCTGGTGTTTTTACAAACTGTTCAAGTTCTAATAAACTT CAAGAATCTGTCGATCCTAGCCTTCAAAGAAGGGAAATAGAAGAGAAC---CTAGACACAGTTCAAGTGCCTAACAGTGC CATAGACCCCAAAGATCTCCTGTTATGTGGAGAAAGG---GGCTTGCAA---ACTGAAAGGTCTGCGGAGAGTACCAGTA TTTCACTGGTACCTGATACTGAATATGGTACCCAGGACAGCATCTCATTACTGGGCGCTAACACCCTTGGG---AAGGCA AAA---ACAGCAGCAAATGGACATGTGAGTGAGAGCACAACAATTGGAAATCCCAAGGAACTTAGCCATGATTGT---TT GAAAGATACTGGAAATGACCCAGACAACTGTAAGGATCCACTGAGAAGTGAAGTTGAC---------CAAGAGACAAGCG TAGAAATGGAAGAGAGTGAGTTTGATACTCAGTATTTACAGAATACATTCATGGGTTCAAAGCGTCGTTCATTTGCTCTG 
TGTTCAAAACCAGGAGATCCAGAAAAGGAATGTGCAGCAGTCTGTACCCGCTCCAACTCCTCAAGGAAACAAAGTCCAGA AGTCACTCTTGAACGTGAACAAAAAGAA---AGTCAGAGAAAGGAAGAGTGGAAAATCAGTCATGTCCAGGCAGCTGATA GCACTGTGGGCTTTCCTGTGGTGTGTCAGAAAGAAAAG---GCAGGTGATTGTGCCAAATGGAGCACTAAAGAGATCTCT AGGCTTTGTCTGTCATCTCAGTCCAAA---GGCAGTGAAACTGAGCTCATTGCTGTAAGTAAACATGGGATGTCACAAAA CCCATATCATATACCACCAATTTCTCCCATCAAGACATCTGTTAAAGCCACACGCCAGGTACAC---CTGTCAGGGGAAA GGTCTGAGGAGCATTCCGTGTCATCTGAAAGAGCAGTGGGAAGCGAGAGCATCATTCAAAGCACAGTGAGCACAATTAGC CAAGAGTACATTAGAGAAAGTGCTTTGAAAGGATTCAGCTCAAGCAGTATTAATGAAGGGGGCTCTAGTGCTAACGAAGT ATGCTCCAGTGTGAATGAAGTAGGATCCAGT---------------------GGTGAAAACATGCAAGCACAACCAGGCA GAAGCAGAGCACCTGAGTTAAATGCTGTGCTGAGAATAGGTCTTCTGCAGCCTGAAGTCTCTGAGCAAAGCCTT---CCT ATAAGTAATTCTGAACTTCCCAAACTACAAAGGCAAGGAGAAAACGAAGGAGTAGTTCAGGCTGTGAATAGAGATTTCTC TTCGTATCTGGTTCCTGATAGCCAAGAGCAG---TCTATGGGAGGAAGGCATGCTTCTCAGATTTGTTCTGAGACACCTG ATGACCTGTTAGATATTTATGAAATAAAGGAAAATACCAGCTTTGCTGAGAGCGGCATTAAGGAAAGATCTGCTGTGTTT AGTAAAAGTGTTCAGAGGGGAGAGTGCAGTAGAACCCCTAGCCCTTCAGGCCAT---GCATGTTTGGCTCAGAGTCA--- ------------------------------------------------- >FlyingSqu TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAGCAGAGCAGATGGGCTAAAAGTAAGGAAA CCTGTAATGAT---AGGCAAATTCCCAGCTCAGAGAAAAAGGTAGATTTGAATGCTGATCCCCAATATGAGAAAAAAGAA CCAAGTAAGCAGAAACATCCATGCTCTGAGAATTCCAGAGAT---ACCCAAGAT---GTTCCTTGGATAACACTAAATAG CAGCATTCGGAAAGTTAACGAGTGGTTTTCCAGAAGTGACGAAATGTTAACTTCTGATGACTCAGATGATGGGGGTTCTG AATCAAATGCTGAAATAGCTGGTATATTAGAAATTCCAAAT------GAAGTAGATGGATTTTCTGGTTCTTCGGAGAAA ATAGACTTGTTGGCCACTGATCCGCATAATGCTTTAATTTCTAAATGTAAAAGAGTCTGCTCCAAAGCAGTCAAGAGTAA T---ATTGAAGATAAAATATTTGGGAAAACCTATCAGAGGAAGGCAAGCCTCCCTAACTTGAGCCATATAACTGAAAATC TAATTATAGGAGCATTTGCCAGAGAACCACAAATAACACAAGAGCTTGCCAGAGAACCACAAATAACACAACAGCTTGCC AGAGAACCACAAATAACACAAGAGCGTCCCCTCACAAATAAATTAAGACGTAAAAGGAGAACTACATCATGCCTTCATCC 
TGAGGACTTTATCAAGAAAACAGATTTGGCAATTGTTCAAAAGACTCCTGAAAAGATAAATCAGGGAACTGACCAAATGG AACACAAT---------GATCAAGTAATGAATATTACTAATAGTGGTCAAGAGAATGAAACAAAAGTTGATTAT---GTT CAGAAAGAGAAAAATGCTAACCCAGTTGAA------TCATTGGAAAAAGAGTCTGCTTTCAGAACTAAAGCTGAACCTAT AAGCAGTAGTATAAGCAACATGGAACTAGAATTAAATATCCACAATTCAAAAGCACCTAAGAAAAATAGGCTGAGGAGGA AGTCTTCTACTAGGTACATTCATGTGCTTGAACCAGTAGTTAATAGAAATACAAGTCCACCTCATCACACTGAATTGCAA ATTGATAGTTGTACTAGTAGTGAAGAA---ATAAAGACAAGA---AATTCCAACCAAATGTCAGTCAGGCATGGCAAAAA GCTTCAGTTCATGGAAGATGCAGAACCTGCAACTGATGTCAGAAAAAGTAACAAGCCAAATGAACAAGTAAATAAGAGAC ATACCAATGATGCTTTCCCAGGACTGAAGTTAACAGGCATATCTGGTATTTTTACTAACTGCTCAAGTTCTAGTAAAGTT GAAGAATTTATCAATCCTAACCTTCAGAAAGAAGGAACAGAAGAGAAC---ATAGAAATAATTCAAGTGTCTGATAATAC CCAAGACCCCAAAGATACGGTGTTAAGTGGAGAAAGG---GTTTTGCAA---ACTGAGAGATCTGTAGAGAGTACCAGTA TTTCATTGGTACCTGATACTGATTATGGCACTCAGGACAGTATCTCATTACTGGAAGCTAACACCTTTGGA---AAAGCA AGA---ACAGCATCAAATCAACATGTTACTCAGTATGTGGCAATTGAAAATCCCAAAGAATTTGTCCATGGTCAT---TC TAAAGATACTAAAAATGACCCAGAGGGTTTCAAGGATTCATTGAGATGTGAAGTTAAC---CACATTCAAGAGACAAATG TAGAAATGGAAGAAAGTGAACTTGATACTCAGTCTTTAGAGAATACATTCCAAGTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCAAATCCAGGAAATCTAGAAAAGGAATGTGCAGCCAACTATACCTCTTCCAAGTCCTTACGGAAACAAAGCCCAGA CATGACTCTTGAATGTGAGCAAGAAGAAGAAAATTGGGGAAAGAAAGAGTCTAAAATTAGGCATGTACAGGCAATTAATG CTACT------------ATGGTTTGTCAGAAAAGTAAG---CCAGATGATGATGCCCAATGTAGTGTTACAGAAGTCTCT AGAATTTTTCCATCATTTCAGTTCAGA---GACAATCTAACTAAACTCATAACTGCAGATAAACATGGAATTTCACAAAA CCCATATCATATGCCATCCATTTCTCCCAGCAGGTCATCTGTTAACACTAAATGTAGGAGAAAC---CTATCAGAAGAAA AGTTTGAGAAAGATTCAAGGTCACCTAAAGAAGCGGTGGGATATAAGAGAATCATTCAAAGTACAGTGAGCACAATTAGC CAAAATAACATTAGACAAAGTGCCTTTAAAGAAGCGAACTCAGGCAGTATTAATGAAGTAGGCTCTAGTACTAATGAAGT AGGCTCCAGTATTAATGAAGTAGGTTCCAGT---------------------GGTGAAAACATTCAAGCAGAACTAGGTA GAAACAGAGGACCCAAATTAAATGCTGTGCTTAGATTAGGTCTTATGCAACCTGAAGCCTGTAAACAAAATCTT------ CTAAGTAATTGTAAATACCCTGAAATAAAAAGACAAGGAGAAGATGAA---GTAGTTCAAGTTGTTAATGCAGATTTCTC 
TCCATGTCTAATTTCAGATGACTTAGAACAA---CCTATGGGAAATAGTCATGTCACTCAGGTTTGTTCTGAGACTCCTG ATGACCTGTTAGATGATGATGAAATACAGGAAAATACCAGCTTTGCTGAAGGTGGTATTAAGGAAAGATCTGCTGTTTTT AGTAAGAGTGTCCAAAGAGGAGAGTTTAGCAGGAGCCCGAGTCCTTTATCCCAT---ACGTCTTTGGCTCGGAGTCATCA AAAAGGGGCCAGGAAATTAGAGTCCTCAGAAGAGAGCATCTGTAGTGAG >OldWorld -------------------------------------------------------------------------------- ----------------------------------------------AGCCAACAGAGCAGGTGGGCTGAGAGCAAGGAGA GGTGCCATGAC---AGGCAGGCTCCTGGCACAGAGCAGAAGGTAGAGCTGACTGCTGAGCCCCTCCACGAGAGAAAAAGA CGGAAAAAGCAGAACCCTCCGAGCTCCGAGGCTCATGGAGGG---ACCCAGGAT---GTTCCTTGGATCACACTAAATAG CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGACAAAGCAGTGACTTCTGACAGCACCTGTGACAGGGAGTCCA AGTCAGATGCTGAAGAAGCTGGTGTGGCAGAAGATCCAAAA------GGCCTAGAT---------GGCTCTTCAGAGAAA GTAGGCTTGCTGACCAGCAGTCCTCAAAAAGCTATAATCTGTGCCAGGGAGAGGGTCTGCTCCAAAGCACTGAAGAGTGA C---ATGGAAGATAAAGTATTTGGGAAAACCTATCAGAGGAGGGTGAGCCTCCGCAGCTCAAGCCACGGAGCTGAAAGTC TGACCGTGGGAGCGTTTGTTAGAGAGCCGCAGCTGACACTG--------------------------------------- ---------------------GAGCGCTCCTTCACAAATAAAATAAAGCGCAAGAGGAGAACCACATCGTGCCTTCATCC TGAGGACTTTGTCAAGAAAGCAGATTTGACAGTGGCTCAAAAGACTCCTGAAAAGGTAAATCAGGGAACGAAGCAAATGG AGGAGAGT---------AGTCAAGTGACAAATACTACTAATAACAGTCATGAGAATGAGACAAAAATGGGCAAT---GTT CAG------AGAAACCCTAACCCAGTACTA------TCAGTGGAAAAAGAGTCTGCTTCTGGTACTAACGCAGAGCCCTC GAGCAGCAGCATAAACAACAGGGAACTAGAATTAAACACCCCTCTTGCAAAAGAACCTAAGGAAGACAGGCTGAGGGGGA CGTCCTCAAGCAGACACAGACCT------------------------AATCCACACCTGCTGGATCACACAGAGCTGCAA AGTGGCAATTCTATCAGCAGTGAAGAA---ATAAAGGAAAAA---AGCTCCCCCCAAATGCCCATCAGGCACAGCAGGAC GCTGCACCTCACAGGGGCTGTGGAAGCTTCTATTGGAGCCAGGGAGAGTAAGAAGCCAGCTGGACAAGTAAGGGAGAGAC ATGCCGGGGACTCTTTCCTAGAACCAAGATTAGCAAGACTACCTGCT---TTTACTAACAGTTCAAGCCCTGATAACCTT AAAGAATTTGTCAACCTTAGTGCACAGACAGAAGAGATGGAAGAGAAC---CCAGAAACAGTACAAGTGTCCGAAAGTAC CAGAGACTCCAAAGGTCCTGTGTTAAGTGGGGAAAAG---GGAGTGCAA---ACCGAGAGGTCTATGGAGAGCACTAGCA 
TTTCACTAGTCCCTGACACTGACTGTGGCACTCAGGACAGTGTCTCCTTACTGGAAGCCGACAGCCTCAGG---AAGGCA CGG---AGAGCATCGCATCAGTGTATGGCTCAATATGTGGCGGTTGCGAAGCCCAAGGAACTTCTGCCTGCTTGT---TC TGAAGACACTGGAAACGGCACAGACAGCTTAAATGATCCATTGAGATGTGGAGGGAGC---CACATCCAGGAGGCAAATA TAGAAATGGAAGATAGTGAACTCGACACTCAGTATTTACAGAATACATTCCAGGCTTCAAAGCGTCAGTCATTTGCTCTC TTTTCAAATCCAGGAAACTCAAAAAAGGAATCAACAACAGTCTGTGCCCACTCTGAGTCCTTTAAGAAACAAAATCCAGA AGTCATTCCTGAATGTGAACAAACAGAAGAAAATTGGGGCAAGAAAGTGCCTAAAATTAGTTGTGTGCAAGAGAGCGCC- --------------CCTCTGGTTTCTCAGAGGGATCAG---CCAGGCACCAGCATCATATGTAGCGGCACAGGAGTCTCG AGGCTCTGTCTCTCGTCTTGGTTCACA---GGCAGCAAAACTGAACTCGTCACAGCTGACAAACATGGAATTTCACAAAA TCCATATCACATGCCATCAATTTCTCCCATCAGGCCATTTGTTAAAACTCCATGTAAGAAAACC---------------- -----------CGTTCCTCATCACCTGGAGAAGCCACAGGTAACCAGATCATCCTTCAGAGCACC--------------- ------------AGCCATCGCGCTTGCAGAGAAGCCAGCTTGGGCAGTGGGAACGAAGGGGGCTCCAGT----------- ----------------------------------------------------GGGGAGCACATTCAAGCAGAACCCAGTA GACACCAAGAGCCTGAACTA------------AGATTAGGTCTGACGCAGCCCGAAGTCTACCAGCAAAGTCTT---CCT GTAGGTGACTGTAGACATCCCGAAATACAAACACGAGGAGAAAATGGAGTGGTAGCTCAGGCTGTCCATGCAGATTTCTC TCCGTGTCTAATTTTAGATAACGTGGAACAG---CCTATGGGAAATAATCCTGCTTCTCAGATCTGTTCTGAGACGCCCG ATGACCTGTTAGATGATGAGAACAAAAAGGAAGATGCCAGCTTTGCCGAAGGTGGCATTAAGGAAACTTCTGCCATTTTT AGCAAGAGTGTCCAGACAAGACGATTCAGCAGGAGCCCCAGCCCTGTAACCAAT---ACCACTTTGGCTCAGGGTCACCG AAGAAGGGCGAGAAAACTCGAGTCTTCTGAGGAGAGCATGTCAAGTGAG >Mouse TGTGGCACAGATGCTCATGCCAGCTCATTACAGCCTGAGACCAGCAGTTTATTGCTCATTGAAGACAGAATGAATGCAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCATAGCAGTGAGCCAGCAGAGCAGATGGGCTGCAAGTAAAGGAA CATGTAACGAC---AGGCAGGTTCCCAGCACTGGGGAAAAGGTAGGTCCAAACGCTGACTCCCTTAGTGATAGAGAGAAG TGGACTCACCCGCAAAGTCTGTGCCCTGAGAATTCTGGAGCT---ACCACCGAT---GTTCCTTGGATAACACTAAATAG CAGCGTTCAGAAAGTTAATGAGTGGTTTTCCAGAACTGGTGAAATGTTAACTTCTGACAGCGCATCTGCCAGGAGGCACG AGTCAAATGCTGAAGCAGCTGTTGTGTTGGAAGTTTCAAAC------GAAGTGGATGGGGGTTTTAGTTCTTCAAGGAAA 
ACAGACTTAGTAACCCCCGACCCCCATCATACTTTAATGTGTAAAAGTGGAAGAGACTTCTCCAAACCAGTAGAGGATAA T---ATCAGTGATAAAATATTTGGGAAATCCTATCAGAGAAAGGGAAGCCGCCCTCACCTGAACCATGTGACTGAA---- --ATTATAGGCACATTTATTACAGAACCACAGATAACACAA--------------------------------------- ---------------------GAGCAGCCCTTCACAAATAAATTAAAACGTAAGAGA------AGTACATCCCTTCAACC TGAGGACTTCATCAAGAAAGCAGATTCAGCAGGTGTTCAAAGGACTCCTGACAACATAAATCAGGGAACTGACCTAATGG AGCCAAAT---------GAGCAAGCAGTGAGTACTACCAGTAACTGTCAGGAGAACAAAATAGCAGGTAGTAAT---CTC CAGAAAGAGAAAAGCGCTCATCCAACTGAA------TCATTGAGAAAGGAACCTGCTTCCACAGCAGGAGCCAAATCTAT AAGCAACAGTGTAAGTGATTTGGAGGTAGAATTAAACGTCCACAGTTCAAAAGCACCTAAGAAAAATAGGCTGAGGAGGA AGTCTTCTATCAGGTGTGCTCTTCCACTTGAACCA---ATCAGTAGAAATCCAAGCCCACCTACTTGTGCTGAGCTTCAA ATCGATAGTTGTGGTAGCAGTGAAGAA---ACAAAGAAAAAC---CATTCCAACCAACAGCCAGCCGGGCACCTTAGAGA GCCTCAACTCATCGAAGACACTGAACCTGCAGCGGATGCCAAGAAG---AACGAGCCAAATGAACACATAAGGAAGAGAC GTGCCAGCGATGCTTTCCCAGAAGAGAAATTAATGAACAAAGCTGGTTTATTAACTAGCTGTTCAAGTCCTAGAAAATCT CAAGGGCCTGTCAATCCCAGCCCTCAGAGAACAGGAACA---GAGCAA---CTTGAAACACGCCAAATGTCTGACAGTGC CAAAGAACTCGGGGATCGGGTCCTAGGAGGAGAGCCC---AGTGGCAAAACCACTGACCGATCTGAGGAGAGCACCAGCG TATCCTTGGTACCTGACACTGACTACGACACTCAGAACAGTGTCTCAGTCCTGGACGCTCACACTGTCAGA---TATGCA AGA---ACAGGATCCGCTCAGTGTATGACTCAGTTTGTAGCAAGCGAAAACCCCAAGGAACTCGTCCATGGC------TC TAACAATGCTGGGAGTGGCACAGAGGGTCTCAAGCCCCCCTTGAGACACGCGCTTAAC---CTCAGTCAGGAGAAA---G TAGAAATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAATACATTTCAAGTTTCAAAGCGTCAGTCATTTGCTTTA TTTTCAAAACCTAGAAGTCCCCAAAAGGACTGTGCT------------CACTCTGTGCCCTCAAAGGAACTGAGTCCAAA GGTGACAGCTAAAGGTAAACAAAAAGAA---CGTCAGGGACAGGAAGAATTTGAAATCAGTCACGTACAAGCAGTTGCGG CCACAGTGGGCTTACCTGTGCCCTGTCAAGAAGGTAAG---CTAGCTGCTGATACAATGTGT------GATAGAGGTTGT AGGCTTTGTCCATCATCTCATTACAGA---AGCGGGGAGAATGGACTCAGCGCCACAGGTAAATCAGGAATTTCACAAAA CTCACATTTTAAACAATCAGTTTCTCCCATCAGGTCATCTATAAAAACTGACAATAGGAAACCT---CTGACAGAGGGAC GATTTGAGAGACATACATCATCAACTGAGATGGCGGTGGGAAATGAGAACATTCTTCAGAGTACAGTGCACACAGTTAGC 
CTGAATAAC---AGAGGAAATGCTTGTCAAGAAGCCGGCTCGGGCAGTATTCATGAAGTATGTTCCACT----------- ----------------------------------------------------GGTGACTCCTTCCCAGGACAACTAGGTA GAAACAGAGGGCCTAAGGTGAACACTGTGCCTCCATTAGATAGTATGCAGCCTGGTGTCTGTCAGCAAAGTGTT---CCT GTAAGTGAT---AAGTATCTTGAAATAAAAAAGCAGGAG---------------GGTGAGGCTGTCTGTGCAGACTTCTC TCCATGTCTATTCTCAGACCATCTTGAGCAA---TCTATG---AGTGGTAAGGTTTTTCAGGTTTGCTCTGAGACACCTG ATGACCTGCTGGATGATGTTGAAATACAGGGACATACTAGCTTTGGTGAAGGTGACATAATGGAGAGATCTGCTGTCTTT AACGGAAGCATCCTGAGAAGGGAGTCCAGTAGGAGCCCTAGTCCTGTAACCCAT---GCATCGAAGTCTCAGAGTCTCCA CAGAGCGTCTAGGAAATTAGAATCGTCAGAAGAGAGCGACTCCACTGAG >Rat TGTGGCACAGATGCTCGTGCCAGCTCATTACAGCGTGGGACCCGCAGTTTATTGTTCACTGAGGACAGACTGGATGCAGA AAAGGCTGAATTCTGTGATAGAAGCAAACAGTCTGGCGCAGCAGTGAGCCAGCAGAGCAGATGGGCTGACAGTAAAGAAA CATGTAATGGC---AGGCCGGTTCCCCGCACTGAGGGAAAGGCAGATCCAAATGTGGATTCCCTCTGTGGTAGAAAGCAG TGGAATCATCCGAAAAGCCTGTGCCCTGAGAATTCTGGAGCT---ACCACTGAC---GTTCCTTGGATAACACTGAATAG CAGCATTCAGAAAGTGAATGAGTGGTTTTCCAGAACTGGTGAAATGTTAACTTCTGACAATGCATCTGACAGGAGGCCTG CGTCAAATGCAGAAGCTGCTGTTGTGTTAGAAGTTTCAAAT------GAAGTGGATGGATGTTTCAGTTCTTCAAAGAAA ATAGACTTAGTTGCCCCTGATCCCGATAATGCTGTAATGTGTACAAGTGGAAGAGACTTCTCCAAGCCAGTAGAGAATAT T---ATCAACGATAAAATATTTGGGAAAACCTATCAGAGAAAGGGAAGCCGCCCTCACTTGAACCATGTGACTGAA---- --ATTATAGGCACATTTACTACAGAACCACAGATTATACAA--------------------------------------- ---------------------GAGCAGCCCTTCACAAATAAATTAAAACGCAAAAGA------AGTACATGCCTTCATCC TGAGGACTTTATCAAGAAAGCAGATTTAACAGTTGTTCAAAGGATTTCTGAAAATTTAAATCAGGGAACTGACCAAATGG AGCCAAAT---------GACCAAGCAATGAGTATTACCAGTAACGGTCAGGAGAACAGAGCAACAGGTAATGAT---CTT CAGAGAGGGAGAAATGCTCATCCAATAGAA------TCATTGAGAAAGGAACCTGCTTTCACAGCTAAAGCCAAATCTAT AAGCAACAGTATAAGTGATTTGGAGGTAGAATTAAATGTTCACAGTTCAAAAGCACCTAAGAAAAATAGGCTGAGGAGGA AGTCT---ACCAGGTGTGTTCTTCCACTCGAACCA---ATCAGTAGAAATCCGAGCCCACCTACTTGTGCTGAACTTCAG ATCGAGAGTTGTGGTAGCAGTGAAGAA---ACAAAGAAAAAC---AATTCCAACCAAACCCCAGCCGGGCACATTAGAGA GCCTCAACTCATCGAAGACACAGAACCCGCAGCTGATGCCAAGAAG---AACGAGCCAAATGAACACATAAGGAAGAGAA 
GTGCCAGTGATGCGTTCCCAGAAGAGAAATTAATGAACAAAGCTGGTTTATTAACTAGCTGTTCAAGTCCTAGAAAGCCT CAAGGACCTGTCAATCCTAGCCCTGAGAGAAAAGGAATA---GAGCAA---CTTGAAATGTGCCAGATGCCTGATAATAA CAAAGAACTCGGGGATTTGGTCCTGGGAGGAGAGCCC---AGTGGGAAACCTACTGAACCATCTGAGGAGAGCACCAGTG TGTCCTTGGTACCCGACACAGACTACGACACCCAGAACAGTGTCTCAATACTGGAAGCGAACACTGTCAGA---TATGCA AGA---ACAGGATCAGTTCAGTGTATGACTCAGTTTGTCGCAAGTGAAAACCCCAAGGAACTTGTCCATGGT------TC TAACAATGCTGGAAGTGGCTCGGAGTGCTTCAAGCACCCATTGAGACATGAACTTAAC---CACAATCAAGAGACA---A TAGAAATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAATACATTTCAAGTTTCAAAGCGTCAGTCATTTGCTTTA TTTTCAAAACTTAGAAGTCCCCAAAAGGACTGTACTCTGGTAGGTGCCCGCTCTGTGCCCTCAAGGGAACCAAGTCCAAA GGTGACTTCTAGAGGTGAACAAAAAGAA---CGTCAGGGACAAGAAGAGTCTGAAATCAGTCATGTACAGGCAGTCACAG TCACAGTAGGCTTACCTGTGCCCTGTCAGGAAGGTAAG---CCAGGTGCTGTTACAATGTGT------GCTGATGTTTCT AGGCTTTGTCCGTCATCTCATTATAGA---AGCTGTGAGAATGGACTCAACACCACAGATAAATCTGGAATTTCACAAAA CTCACATTTTAGACAATCAGTTTCTCCCCTCAGGTCATCTATAAAAACTGACAATAGAAAAACT---CTGACAGAGGGNC GATTTGAGAAACAT---------ACTGAAAGGGGRATGGGAAATGAGACTGCTGTTCAAAGTACAATACACACAATTAGT CTAAATAAC---AGAGGAGATGCTTGTCTAGAAGCCAGCTCAGGCAGTGTTATTGAAGTACATTCCACT----------- ----------------------------------------------------GGTGAAAACGTCCAGGGGCAACTAGATA GAAACAGAGGGCCTAAGGTAAACACCGTGTCTCTATTAGATAGTACACAGCCTGGTGTCTCTAAGCAGAGTGCT---CCT GTAAGTGAT---AAGTATCTTGAAATA---AAGCAGGAG---------------AGTAAGGCTGTCAGTGCAGACTTCTC TCCATGTCTGTTCTCAGATCATCTTGAAAAA---CCTATGAGAAGTGATAAGACTTTTCAGGTTTGCTCTGAGACACCTG ATGACCTGTTGGATGATGTTGAAATACAGGAAAATGCTAGCTTCGGTGAAGGTGGCATAACGGAAAAGTCTGCTATTTTT AATGGAAGTGTCCTGAGAAGAGAGTCCAGTAGGAGCCCTAGCCCTGTAACCCAT---GCATCGAAGTCGCGGAGTCTCCA CAGAGGGTCTAGGAAATTAGAATTCTCAGAAGAGAGCGACTCCACTGAG >NineBande TGTGGCACAAATACTCATGCCAACTTATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGCGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCCAGCATAGAGAAAAAGGTAGATGTGGATGCTGATCCCCTGTATGGGCGAAAAGAA 
CTGAATAAGCAGAAACCTCCATGCTCTGAGAGTCATAGAGAT---ACCCAAGAT---ATTCCTTGGATAATGCTGAATAG TAGCATTCAGAAAGTTAACGAGTGGTTTTCCAGAGGTGATGACATATTAACTTCTGATGACTCACACGATAGGGGGTCTG AATTAAATGCAGAAGTAGCTGGTGCATTGAAAGTTTCAAAA------GAAGTAGATGAATATTCTAGTTTTTCAGAGAAG ATAGACTTAATGGCCATTAATCCTCATGATACTTTACAATTTGCAAGTGAAAGAGTCCAATTGAAACCAGCAGAGAGTAA C---ATCAAAGATAAAATATTTGGGAAAACCTATCATAGGAAGGCAAGCCTCCCTAACTTGAGCCACATAACCCGAAACC TTTTTATAGGAGCTATTGCTGCAGAGCCCAAGATAACACAA--------------------------------------- ---------------------GAGCATTCCCTCCAAAATAAAATAAAGCGTAAAAGGAGAACTGCATCAGGCCTTCGTCC TGAGGATTTATCCAAGAAAGTAGATTTGACAGTTGTTCAAAAAACCCCTGAAAAGATAAATCAGGGAACTGACCAAATGG AGCAGAAT---------GATCCAGTGATGAATATTGCTAATAGTGGTCATGAGAATGAAACAAAAGGTGATTGT---GTT CAGAAAGAGAAAAATGCTAATCCGACAGAA------TCATTGGGAAAAGAATCTGCTTTCAGAACTAAAGGCGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATATTTTAAATTCAAAAGCATCTAAGAAGAATAGGCCGAAGAGGA TGTCCTCTACCAGGCATATTCATGCACTTGAACTAGTCGGCAGTAGAAATCCAAGCCCACCTAATCATACTGAACTACAA ATTGATAGTTGTTCTAGCATTGAAGAG---ATAGAGAAAATA---AATTCTAACCAAAAGCCAATCAGACACAACAGAAT GCTTCAACTCACGAAAGAAAAAGAAACCACAACTGGAGCCAAAAAGAACAATAAGCCAAATGAACAAATAAGTGAAAGAC ATGCCAGTGATGCTTTCCTAGAACTTAAA------AATGTAACTGATTTTCTTCCTAAATGTTCAAGTTCTGATAAACTT CAAAAATTT---AATTCTAGCCTGCAAGGAGAAGTAGCA---GAGAAC---CTAGAAACAATTCAAGTGTCTGATAGTAC CAGGGACCCTGAAGATCTGGTGGTAAGTGGAGAAAAG---TGTTTGCAA---ACTGAAAGATCTGCAGAGAGTACCGGTA TTTCAGTGGTACCTGATACTGATTATGGCACTCAAGACAGTATCTCATTACTGGAAGCTGACACCCTGGGG---AAGGCA AAA---ACAGCACTAAATCAACATGTGAGTCAGTATGTAGCAATTAGAAATGCCACTGAACTTTCCCATGGTTGT---TC TAAAGACACTAGAAATGACACTGAAGATTTTAAGGATTCATTGAGACATGAAGTTAAC---CACACTCAGGGGACAAATG TTGAAATAGAAGAGAGTGAACTTGATACTCAGTATTTGCAGAATACATTCAAGATTTCAAAGCGCCAGTCATTTGCTCTG TTTTCGAATCCA---------GAAAATGAATGTGCAACAGTCTGTGCCCACTCCAGGTTCTTAGGGAAACAAAGTCCAAA AGTCACCTTTGAATGTAGACATAAAGAAGAAAATCAGGGGAAGAAAGAGTCTAAAATCAAACATGTGCAGGTAATTCACA CAACTGCAGGCTTTCCTATAGTTTGTCAGAAAGATAAG---CCAGGTGATTATGCCAAAGGTAGCATTCAAGGAGTCTCT 
AGGCTTTGTCAGTCCTCTCAGGCCAGA---GGCAATGAATCTGAACTCATTAATTCAAATGAACATGAAATTTCACAAAA CCCAGATCAAATGCCATCACTTTCTCACATGAAGTCATCTGTTAAAACTAAATGTAAGGAAAAC---CTGTCAGAGGAAA AGTTTGAGGAACTTACAGTGTCACTTGAAAGAACAATGGTAAATGAGAACATCATTCAAAGTACAGTAAGCACAATTAGC CACAGTAACATTAGAGAAAACACTTTTAAAGAAGCCAGCTCAAGCAGTATTAATGAAGTAGGGTCCAGT----------- ----------------------------------------------------GATGAGAACATTCAAGCAGAAGTAGGTA GAAACAGAGCACCTAAATTAAATGCTATGCTCAGATTAGGTCTTATGCAACCTGAAGTCTATAAGCAAAGTCTT---CCT ATAACCAATTGTAAATATCCTGAAATAAAAAGTCAAGGAGAAAATGAAGAAGCAATTCGGGCTGTTGATATAGACTTCTC TCCATGTCTAATTTCAGATAACCTACAACTA---CCTATGGGAAATAGTTGTGCTTCCCAGATTTGTTCTGAGACACCTG ATGACTTGTTAGATGATGATGAAATAAAGGAAAATAACTGCTTTGCTGAAAGTGACATTAAGGAAAGATCTGCTATTTTT AGCAAAACTGTCCAGAAAAGAGAGTTCAGAAGGAGCCCTAGCCCTTTAGTCCAT---ACAAGTTTTGCTCAGGGTCACCA AAGAAAGCCCAGGAAATTAGACTCCTCAGAAGAGGACGTATCTAGTGAG >HairyArma TGTGGCACAAATACTCATGCCAACTTATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAGTGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGCGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCCAACACAGAGAAAAAGGTAGATGTGGCTGCTGATTCCCTGTATGGGCGAAAAGAA CTGAATAAGCAGAAACTTCCATGCTCTGAGAGTCCTAGAGAT---ACCCAAGAT---ATTCCTTGGATAACGCTGAATAG TAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGACCTATTAACTTCTGATGACTCACACGATGGGGGGTCTG AATCAAAAGCAGAAGTAGCTGGTGCATTAAAAGTTCCAAAT------GAAGTAAATGGATATTCTAGTTCTTCAGAGAAG ATAGACTTATTGGCCAGTGATCCTCATAATGCTTTAATATTTGCAAGTGAAAGAGTCCAATCCAAACCAGCAGAGAGTAA C---ATCAAAGATAAAATATTTGGGAAAACCTATCACAGGAAGGCAAGCTTTCCTAACTTGAGCCACATAACTGAGGATC TTTTTATAGGAGCTATTGCTACAGAACCCAAGATAATACAA--------------------------------------- ---------------------GAGCATTCCCTCACAAATAAAATAAAGCGTAAAAGGAGAACTACGTCATGCCTTCATCC TGAGGATTTTATCAAGAAAGTAGATTTGACAGTTGTTCAAAAGACGCCTGAAAAGATAAATCAGGGAACTGACCAAATGG AGCAGAAT---------GATCAAGTGATGAATAGTGCTAATAGTGGTCATGAAAATGAAACAAAAGGTGATTAT---GTT CAGAAAGAGGAAAATGCTAACCCAATAGAA------TCATTGGAAAAAGAATCTGCTTTCAGAACTAAAGGTGAACCTAT 
AAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATATCTATAATTCAAAAGCATCTAAGAAGAATAGGCTGAGGAGGA TGTCCTCTACCAGGCATATTCATGCACTTGAACTAGTAGGCAATAGAAATCCAAGCCCACCTAAACATACTGAACTACAA ATTGATAGTTGTTCTAGCACTGAAGAG---ATAGAGAAAATA---AATTCTAACCAAAAACCAATCAGACACAACAGAAT GCTTCAACTCATGAAAGAAAAAGAAAACACAACTGGAGCTAAAAAGAATAACAAGCCAAATGAACAAATAAGTGAAAGAC ATGCCAGTGATGTTTTCCCAGAACTAAAATTAACAAATGTAACCGATTTTCTTCCTAAATGTTCAAATCCTGATAAACTT CAAGAATTTGTTAATTCTAGCCTGCAAGGGGAAGTAGCA---GAGAAC---CTAGAAACAATTCAAGTGTCTGATACTAC CAGGAATCCTGAAGATCTGGTGTTAAGTGAAGGAAAG---AGTTTGCAA---ACTCAAAGGTCTGCAGAGAGTACCAGTA TTTCAGTGGTACCTGATACTGATTATGGCACTCAAGACAGTGTCTCATTACTGGAAGCTGACACCCTGGGG---AAGGCA AAA---ACAGCACTAAATCAACCTATGAGTCAGTATGCAGCAATTAAAAATGCCACTGAACTCTCCCATGGTTGT---GC TAAAGACACTAGAAATGACACTGAGGATTTTAAGGATCCGTTGAGACATGAAGTTACC---CACACTCAGGAGACTAGTG TAGAAATGGAAGAGAGTGAACTTGATACTCAGTATTTACAGAATACATTCAAGATTTCAAAGCGTCAGTCATTTGCTCTG TTTTCGAATCCA---------GAAAATGAATGCGCAACAGTGTGTGCCCACTCCAGGTTCTTAGGCAAACAAAGTCCAAA AGTCATTTTTGAATGTAGGCAAAAAGAAGAAAATCAGGGGAAGAAAGAGTCTAAAATCAAACATGTGCAGGCAGTTCATA CAACTGCAGGCTTTCCTGTAGTTTGTCAGAAAGATAAG---CCAGGTGATTATGCCAAATGTAGCATTCAAGAAGTCTCT AGGCTTTGTCAGTCCTCTCAGTTCAGA---GGCAATGAATCTGAACTCATTACTGCAAATGAACATGAAATTTCACAAAA CCCAGATCAAATGCCATCACTTTCTCACATCAGGTCATCTGTTAAAACTAAATGTAAGGAAAAC---CTGTCAGAGGAAA AGTTTGAGGAACTTACAATATCACTTGAAAGAACAGTGGGAAATGAGAACATCGTTCAAAGTACAGTAAGCACAATTAGC CACAATAACATTAGAGAAAACGCTTTTAAAGAAGCCAGCTCAAGCAGTATTAATGAAGTAGGTTCTAGT----------- ----------------------------------------------------GGTGAAAACATTCAAGCAGAACTAGGTA GAAACAGAGCACCTAAATTAAATGCTATGCTCAGATTAGGTCTTATGCAACCTGAAGTCTATAAGCAAAGTCTT---CCT ATAACTAGTTGTAAACATCCTGAAATAAAAAGGCAAGGAGAAAATGAAGAAGCAATTCAGGCTGTTGATACTGATTTCTC TCCACATCTAATTTCAGATAACCTAGAACTA---CCTATGGGAAATAGTCATGTTTCTCAGATTTGTTCTGAGACGCCTG ATGATTTGTTAGATGATGATGAAATAAAGGAAAATAACAGCTTTGCTGAARATGGCATTAAGGAAAGATCTGCTGTTTTT AGCAAAAGTGTCCAGAAAAGAGAGTTCAGAAGGAGTCCTAGCCCTTTAGGCCAT---ACAAGTTTGGCTCAGGGTCACCA AAGAAGGGCCAGGAAATTAGACTCCTCAGAAGAGGACGTATCTAGTGAG 
>Anteater TGTGGCACAAATATTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAAAGAATGTAGA AAAGGCTGAATTCTGTGATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAGCAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCACACTACCAGCACAGAGAAAAAAGTAGATGTGGATGCTGATCCCCTGCATCGGAGAAAAGAA CTGAAGAAGTGGAAATCTCCATACTCTGAGAATCCTAGAGGT---ACCCAAGAT---ATTCCTTGGATAACACTGAATAG TAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGACATATTGACTTCTAATGACTCATGCAATGAGGGGTCTG AATTAAATGCAGAAGTAGCTGATGCATCAAAAGTTCCAAAT------GACGTAGATAGATATTCTGGTTCTTCAGAGAAA ATAGACTTAATAGCCAGTGACCCTCATAATGCTTTAATATGTGCAAGTGAAAAAGTCCAGTCCAAACCAACAGAGAGTAA T---ATCAAAGATAAAATATTTGGGAAAACCTATCACAGGAAGGCAAGCCTCCCTAACTTGAGCCGTATAGCTAAAGATC TTTTTATAGGAGCTGTTGCTGCAGAACCTAAGATAACACAA--------------------------------------- ---------------------GAGCTCCCCCTGACAAATGAAATGAAGCTTAAAAGGAGGACTACATCAGGACTTCATCC TGAGGATTTTATCAAGAAAGTAGATTTGACAGTTGTTCAAAAGAAGCCTAAAATGATAAATCAGGGAACTAACCAAATAG AGCAGAAT---------TGTCAACTGATGAATATTGCCAATAGTGGTAATGAAAATGAAACAAAAGGTGATTTT---GTT CAGAAGGAGAAAAGTGCTAACCCAACAGAA------TCATCAGAAAAAGAATCTGCTTTCAGAACTAAAGGTGAACCTAT AAGTAGTAGTATAAGCAATATGAAACTAGAATTAAATACCTGCAATTCAAAAGCATCTAAGAAGAATAGGCTGAGGAGAA TGTCCTCTACCAGGCATATGCATGCACTTGAACTAGTAGCCAATAGAAAGCCAAGCCAGCCTAATCACACTGAACTACAA ATTGATAGTTGTTCTAGCAGTGAAGAA---ATAAAGAAAAAA---AAATCTGACCAAAAGCCAATAAGACACAGCAGAAC AGTTCAATTCATGAAAGATAAAGAAACTGCAATTGGAGCCAAGAAGAGTAACAAGCGAAATGAACAAATAAATAAAAGAC ATGCCAGTGATGCTTTCCCAGAACTAAATTTAACAAACGTAACTGGTTTTCCTACTAAATGTTCAAATTCTGATAAACTT CAAGAATTTGTCAATTCTAGCCTGCAAGGAGAAGCAGCA---GAGAAC---CTAGAAACAATACAAGTGTCTGATACTAC CATGGACCCTGAAGGTCTGGTATTAAGTGAAGGAAAG---AATTTGCAA---ACTGAAAGATCTGTAGAGAGCACCAGTA TTTCATTGGTGCCTGACACTGATTATGGCACTCAAGATAGTATCTCATTACTAGAAGCTGCTACCCTAGGG---AAAGCA AAA---GCAGCACCAAATCAACATGTGAGTCTGTGTGCAGCAGTTGGAAATGCCACTGAACTTGTCCATGGTTGT---TC TAAAGATACTAGAAATGACACTGAAGATTTTAAGGATTCATTGAGACATGAAGTTAAC---CACACACAAGGGACAGTCA TAGAAAAGGAAGAGAGTGAACTTGATACTCAGTATTTNNNNNNNNNNTACAAGATTTCAAAGCGTCAGTCATTTGCTCTG 
TATTCAAATTCT---------GAAAAGGAATGTGTAACAATCTGTGCCCACTCCAGGTCCGTACGGAAACAAAGTCCAAA AGTAACTTTTGACTATAGACAAAAAGAAGAAAATCAAGGAAAGAAAGAGTCTAAGATCAAACATGTGCAGGCAGTTCATA CAACCGCAGGCTTCTCTGTAGTTTGTCAGAAAGATAAGAAGCCCCATGATTATGCCAAGTGTAGCATTCAGGGAGTCTCT AAGCTTTGTGAGTCATCTCAGTTCAGA---GGCAATGAATCTGAACTCATTACTGCAAACGAACATGGAATTTCCCCAAA TCCAGATCAAATGCCATCACTTTCTCCCAACAGGTCATCTGTTAAAACTAAATATAAGAAAAAC---TTGTCAGAAGAAA GGTTTGAGGAACATACAGTGTCACTTGATAGAGCAGTGGGAAATGAGAGCATCATTCAAAGTACAGTAAGCACAATTAGC CAAAATAACATTAGAGAAAGCACTTTTAAAGAAGCCAGCTCAAGCAGTATTAATGAAGTAGGTTCCAGTATTAATGAAGT GGGTTCCAGTATTAATGAAGTGGGTTCCAGT---------------------GGTGAGAACGTTCAAGCAGAGCTAGGTA GAAACAGA---CCTAAGCTAAATGCTATGCTCAGATTAGGTCTTATGCAACCTGAAGTCTATGAGCAAAATCTT---CCT ATAACTAATTTTAAACTTTCTGAAATTAAAAAACAAGGAGAAAATGAAGAAGTAGTTCAGGCTGTTAATACAGATTTCTC CCCATGTCTAATTTCAGATAACCTAGAACTG---CCTATGGGAAGTAGTCGTGTTTCTCAGATTTGTTCTGAGACACCTG ATGACCTGTTAGATGATGATGAAATAAAGGAAAATAACAGCTTTGCTGAAAGTGGCGTTAAGGAAAGATCTGCTGTTTTT AGCAAAAGTGTCCAGAGAAGAGAATTCAGAAGGAGCCCTAGCCCTTTAGCCCAA---ACAAGTGTGGCTCAGGGTCACCA AAGAGGGGCCAGGAAATTAGCCTCCTCAGAAGNGGACNAGTCTAGNGAG >Sloth TGTGGCACATATACTCATGCCAGCTCATTACAGCGTGAGAACAGCAGTTTATTACTCACTAAAAACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAGCAGCCTGGCTTAGCAAGGAGCCAACAGAACAGATGGGCTGAAAGTAAGGAAA CACGTAATGAT---AGGCAGACTCCCAGCACAGAGAAAAAGGTAGATGTGGATGCTGATCCCCTGTATGGGCGAAAAGAA CTGAATAAGCAGAAACCTCCATGCTCTGAGAGTCCTCAAAAT---ACCCAAGAT---ATTCCTTGGATAACACTGAATAG TAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGACATACTAACTTCTGATGACTCACACAATGGGGGGTCTG AATCAAATGCAGAAGTAGTTGGTGCATTGAAAGTTCCAAAT------GAAGTAGATGGATATTCTGGTTCTTCAGAGAAG ATAGACTTAATAGCCAGTGATCCTCACAATGCTTTAATATTTGCAGGTGAAAGAGTCCAGTCCAAACCAACAGAGACTAA C---ATTGAAGATAAAATATTTGGGAAAACCTATCACAGGAAGGCAAGCCTCCCTAACTTGAGCCACATAGCTGAAAATC TTTTTATAGGAGCCATTGCTACAAAACCTAAGATAACACAA--------------------------------------- ---------------------GAGCACCCCCTGACAAAGAAAATAAAGCATAAAAGTAGGACTACATCAGGCCTTCATCC 
TGAGGATTTTATCAAGAAAGTCGATTTGACAGTTGTTCAAAAGATGCCTGAAAAGATAAATCAGGGAACTGACCAAATGG AGCAGAAG------AATAGTCAAGTGATAAATATTGCTAATAGTGGTCATGAGAATGAAACAAAAGATGATTAT---GTT CAGAAAGAGAAAAATGCTAACCCAACAGAA------TCATTGGAAAAAGAATCTGCTTTCAGAACTGAAGGTGAACCCAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATATCTTCAATTCAAAAGTATCTAAGAAGAATAGGCTGAAGAGGA TGTCCTCTACCAGGCATATTCATGCACTTGAACTAGTAGCCAATAGAAATCCAAGCCAACCTAATCATAATGAACTACAA ATTGATAGTTGTTCTAGCAGTGAAGAG---ATAAAGAAAGAA---AATTCTGTCCAAAAGCCAATAAGGCACAGCAGAAT GCTTCAACTCCTGAAAGGTAAAGAAACTCCAACCGTAACCAAGAAGAGTAACAAGCGAAATGAACAAATAAGTAAAAGGC ATTCCAGTGATGCTTTCCCAGAACTAAATTTAACAAATGTAACTGGTTTTCTTACTAAATGTTCAAGTTCTGATAAACTT CAAGAATTTGTCAATTCTAGCCTGCAAGGAGAAGTAGCA---GAGAAC---CTAGAAACAATTCAAGTGTCTGATAGTAC CAGAGACCCCGGAGCTCTGGTGTTAAGTGGAGGAAAG---GGTTTGCAA---ACTGAAAGATCTCTAGAGAGTACCAGTA TTACAATGATACCTGAAACTGATTATGACACTCAAGACAGTATCTCCTTACTGGAAGCTGACACCCTAGGG---AAAGCA AAA---GCAGCACCAAATCAACATGTGAGTCAGTATGCAGCAATTGGAAATGCCACTAAACTTTTCCATGGTTGT---TC TAAAGATACTAGAAGTGACACTGAGAATTTTAAGGATCCATTGAGACATGAAGTTAAC---CACACACAGGAGACATTTG TAGAAATGGAAGAGAGTGAACTTGATACTCAGTATTTACAGAATACATTCAAGATTTCAAAGCGTCAATCATTTGCTCTG TTTTCAAATCCA---------GAAAAGGAATGCGCAACAGTCTCTGCCCACTCCAGGCCCTTAGGAAAACAAAGTCCAAA AGTCACTTTTGACTGTAGACAAAAAGAA---GATCAGGAGAAGAAGGAGTCTAAAATCAAACACGTGCAGGCAGTTCATA CAACTGCAGACTTTCCTGTAGTTTGTCAGAAAGATAAG---CCAGGTGATTATGCTAAATGTAGCATTCAAGGAGTCTCT AAGCTTTGTCAGTTATTTCAGTTCAGA---GGCAATGAATCTGAACCCATTACTGCAGATGAACATGAAATTTCACAAAA TCCAGATCAGATGCCATCACTTTCTCCCATGAAGTCATCTGTTAAAAGTAAATTTAAGGAAAAC---CTGTCAGAGGAAA GATTTGAGGAACATACAGTATCACTTGAAAGAGCAGTGGGAAAGGAGCACATCATTCAAAGGACAGTGAGTCCAATTAGC CAAAATAACATTAGAGAAAGCGCTTTTAAAGAAGCCAGCTCAAGCAGTATCAATGAAGTAGGTTCCAGTGTTAATGAAGT AGGTTCCAGTGTTAATGAAGTAGGTTCCAGT---------------------GGTGAGAACACTCAAGCAGAGCTAGGTA GAAACAGAGGATCTAAATTAAGTGCTATGCTCAGATTAAGTCTTATGCAACCTGAAGTCTATAAGCAAAGTCTT---CCT ATAACTAATTGTAAGCATCCTGAAATTAAAAAGCAAGGAGAAAATGAAGAAGTAGTTCAGGCTGTTAAAACA-------- 
----TGTCTAATTTCAGATAACCTAGAACTA---CCTATGGGAAGTAGTCATGCTTCTCAGATTTGTTCTGAGACACCTG ATGATCTGTTAGATGATGGTGAAATAAAGGAAAATAACAGCTTTGCTGAAAGTGGCATTAAGGAAAGATCTGCTGTTTTT ACCAAAAGTGTCCAGAAAAGAGAGTTCAGAAGGAGCCCGAGCCCGTTAGCCCAA---ACAAGGGTCACC---------AA AAGACGGGCCAGGAAATTAGACTCCTCAGAAGAGGATGTGTCTAGTGAG >Dugong TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAATAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTCATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAGCAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCTAGCACAGAGAAAAAGGTAGATATGAATGCTAATCCATTGTATGAGAGAAAAGAA GTGAATAAGCAGAAACCTCCATGCTCCGAGAGTGTTAGAGAT---ACACAGGAT---ATTCCTTGGCTAACACTGAATAG TAGCATTCAGAAAGTTAATGAGTGGTTTTTCAGAAGTGATGGCCTG---------GATGACTTGCATGATAAGGGGTCTG AGTCAAATGCAGAAGTAGCTGGTGCTTTAGAAGTTCCAGAA------GAAGTACATGGATATTCTAGTTCTTCAGAGAAA ATAGACTTAATGGCTAGTGATCCTCATAGTGCTTTAATATGTGAAAGTGAAAGAGTCCTCTTCAAACCAGCAGAAAGTAA C---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAATTCTCCCTCATTTGAGCCATGTAACTGAAGATC TAATTATAGGAGCTGTTGCTACAGAACCTCAGATAGCACAA--------------------------------------- ---------------------GAACGTCCTCTTACAAATAAATTAAAGCGTAAAAGGAGA---ACATCAGGCCTTCATCC TGAGGATTTCATCAAGAAAGTAGATTTGGCAGTTGTTCAAAAGACTCCTGAAAAGATAAATCAAGAAACTGACCAAGTGG AGCAGAAT---------GGTCAAGTGATGAATATTGCTAATGGTGGTCATGGAAATGAAACAAAAGATGATTAT---GTT CAGAAAGAGAAGAATGCTAACCCAACAGAA------TCACTGACAAAGGAATCTGCTTTCAGAACTAAAGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATATGCATAATTCAAAAGCACCGAAGAAGAACAGGCTGAGGAGGA AGTCCTCTACCAGGCATATTCATGCACTTGAACTAGTAGTCAATAGAAATCCAAGTCCACCTACTCATACTGAACTACAA ATTGATAGTTGGTCTAGCAGTGAAGAG---ATAAAGAAA------AGTTCTGAGCAAAAGCCAGTCAGACACAACAGAAA CCTTCAACTCATGAAAAACCAAGAAACCACAACTGGAGCCAAGAAAAGTAACAAGCCAAAGGAACAAATAAGTAAAAGAC ATGCCAGTGACGCTTACCCAGAACTAAATTTAACAAGCACAACTGGCTTAATTACTAACTGTTCAAGTTCTCATAATTAT CAAGAATTT---AATCCTAGCCTTCAGGGAGAAGAAATAGAAGAAAAT---CTGGGAACAATTCAAGTGTCTAATAGAAC CAGAGACCCCGAGGATCTAGTGTTAAATGGAGGAAGA---GGTTTGCAA---ACTGAAATATCTGTTGAGAGTACCAGTA 
TCTCAGTGATACCTGATACTGACTATGGCAGTCAGAACAGCATCTCATTACTGGAAGCTGACACCCTCAGG---AAGGCA AAA---ACAGCACCAAATCAATGTGCAAGTCAGTGTGCAGCAATTGAAAACCCCAATGAACTTATCCATGGTTGT---CC TAAAGATACTAGAAATGACACAGAGGATTTTAAGGATCTGTTGAGATGTGAAGTTAAC---CACATTCAGGAGACGTGCG TAGAAATGGAAGACAGNGAACTTGATACTCAGTATTTGCAGAGTACATTCAAGGTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCAAATCCA---------GAAAAGGAATGTGCAACAATTTGTGCCCACTCCAAGTCCTTAAGGAAACAAAGTCCAAA AGTCACTCCTGAGTATGGAGAAGAAGAAGAAAATCAAGGGAACAAAGAGTCTAAAATCAAGCATGAGCAGGCAGTTCATA CGACTGCGGGCTATCCTGAGGATTGTCAGAAAGAGAAGAAGCCAAGTGATTATACCAAAAGTAGCATCAAAGGAGTCTCT AGGCTTTGTCAGTCATCTCAGTTCAGA---GGCAGTGAATCCCAACACATTACTGCAGGTGAACATGGAATTTCACAAAA TCCAGATCAAATGCCATTGCTTTCTCCCATCAGGGCATCTGTTAAAAGT------AAGAAAAAC---TTGTCAGAAGAAA GGTTTGAGGAACATACAATATCACTTGAAAGAGCAGTAGGAAATGAGAGCATCGTTCAAAGTACAGTGAGCACAGTTAGC CAAAATGACATTAAGGAAAGTGCTTCTAAAGAAGCCAGCTCAAGCAGTATTAATGAAGTAGGTTCTAGT----------- ----------------------------------------------------GGCGAAAACATTCGAGCAGAGCTAGGTA GGAACAGAGGACCTAAATTAAATGCTGTGCTCAGATTAGGTCTTATGCAACCTGAAGTCTATAAACAAAGTCTT---CCT GTAAGTAACTGTAAACGTCCTGAAATAAAAAGGCAAGGAGAAAATGAAGGAGTAGTTCAGGATGTTAATATGGATTTCTC TCCATGTCTAATTTCAGATAACCTAGAACAA---CCTATGGGAAGTAGTCGTGCTTCTCAGATTTGTTCTGAGACTCCTG ATGACCTGTTAGATGATGATGAAATAAAGGAAAATATCAGCTTTGCTGAAAGTGGCATTAAGGAAAGATCTGCTGTTTTT AGTAAA---GACCAGAGAAGAGAGTTCAGAAGGAACCCAAGCCCTTTATCCCAT---TCAGGTTTGGCTCAGGGTCACCT AAGAGGGGCCAGGGAATTAGAGTCCTCAGACGAGAACATATCTAGTGAG >Manatee TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAATAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTCATAAAAGCAAACAGCCTGGCTTAACAAGGAGCCAGCAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCTAGCACAGAGAAAAAGGTAGATATGAATGCTAATCCATTGTATGAGAGAAAAGAA GTGAATAAGCAGAAACCTCCATGCTCCGAGAGTGTTAGAGAT---ACACAAGAT---ATTCCTTGGCTGACACTGAATAG TAGCATTCAGAAAGTTAATGAGTGGTTTTTCAGAAGTGATGGCCTG---------GATGACTTGCATGATAAGGGGTCTG AATCAAATGCAGAAGTAGCTGGTGCATTAGAAGTTCCAGAA------GAAGTACATGGATATTCTAGTTCTTCAGAGAAA 
ATAGACTTAATGGCCAGTGATCCTCATAGTGCTTTAATATGTGAAAGTGAAAGAGTCCTCTCCAAACCAGCAGAAAGTAA C---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAATTCTCCCTCATTTGAGCCATGTAACTGAAGATC TAATTATAGGAGCTGTTGCTACAGAACCTCAGATAGCACAA--------------------------------------- ---------------------GAACGTCCCCTTACAAATAAATTAAAGCGTAAAAGGAGA---ACATCAGGCCTTCATCC TGAGGATTTCATCAAGAAAGTAGATTTGGCAGTTGTTCAAAAGACTCCTGAAAAGATAAATCAGGAAACTGACCAAGTGG AGCAGAAT---------GGTCAAGTGATGAATATTGCTAATGGTGGTCATGAAAATGAAACAAAAGATGATTAT---GTT CAGAAGGAGAAGAATGCTAACCCAACAGAA------TCACTGACAAAGGAATCTGCTTTCAGAACTAAAGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATATGCATAATTCAAAAGCACTGAAGAAGAATCGGCTGAGGAGGA AGTCCTCTACCAGGCATATTCATGCACTTGAACTAGTAGTCAATAGAAATCCAAGTCCACCTACTCATACTGAACTACAA ATTGATAGTTGGTCTAGCAGTGACGAG---ATAAAGAAA------AGTTCTGAGCAAAAGCCAGTCAGACACAACAGAAA CCTTCAACTCATGAAAAACCAAGAAACCACAACTGGAGCCAAGAAAAGTAACAAGCCAAAGGAACAAATAAGTAAAAGAC ATGCCAGTGACACTTACCCAGAACTAAATTTAACAAGCACAACTGGCTTAATTACTAACTGTTCAAGTTCTCATAATTAT CAAAAATTTGTTAATCCTAGCCTTCAGGGAGAAGAAATAGAAGAAAAT---CTGGGAGCAACTCAAGTGTCTAATAGAAC CAGAGACCCCGAGGATCTAGTGTTAAATGGAGGAAGA---GGTTTGCAA---ACTGAAATATCTGTTGAGAGTACCAGTA TCTCAGTGATACCTGATACTGATTATGGCAGTCAGAACAGCATCTCATTACTGGAAGCTGACACCCTCAGG---AAGGCA AAA---ACAGCACCAAATCACTGTGCAAGTCAGTGTGCAGCAATTGAAAACCCCAATGAACTTATCCATGGTTGT---CC TAAAGATACTAGAAATGACACAGAGGATTTTAAGGATCTGTTGAGATGTGAAGTTAAC---CACGTTCAGGAGACATGCA TAGAAATGGAAGACAGTGAACTTGATACTCAGTATTTGCAGAGTACATTCAAGGTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCAAATCCA---------GAAAAGGAATGTGCAACAATTTGTGCCCACTCCAAGTCCTTAAGGAAACAAAGTCCAAA AGTCACTCCTGAGTATGGAGAAAAAGAAGAAAATCAAGGGAACAAAGAGTCTAAAATCAAGCATGAGCAGGCAGTTCATA CAACTGCTGGCTATCCTGAGGATTGTCAGAAAGAGAAGAAGCCAAGTGATTATACCAAATGTAGCACTAAAGGAGTCTCT AGGCTTTGTCAGTCATCTCAGTTCAGA---GGCAGTGAATCCGAACACATT---ACAGGTGAACATGGAATTTCACAAAA TCCAGATCAAATGCCATTGCTTTCTCCCATCAGGGCATCTGTTAAAAGT------AAGAAAAAC---TTGTCAGAAGAAA GGTTTGAGGAACATACAATATCACTTGAAAGAGCAGTAGGAAATGAGAGCATCGTTCAAAGTACAGTGAGCACAGTTAGC 
CAAAATAACATTAAGGAAAGTGCTTCTAAAGAAGCCAGCTCAAGCAGTATTAATGAAGTAGGTTCCAGTGTTAATGAAGT AGGTTCTAGT------------------------------------------GGCGAAAACATTGAAGCAGAGCTAGGTA GGAACAGAGGACCTAAATTAAATGCTGTGCTCAGATTAGGTCTTATGCAACCTGAAGTGTATAAACAAAGTCTT---CCT GTAAGTAACTGTAAACATCCTGAAATAAAAAGGCAAGGAGAAAATGAAGGAGTCGTTGAGGATGTTAATATGGATTTCTC TCCATGTCTAATTTCAGATAACCTAGAACAA---CCTATGGGAAGTAGTCGTGCTTCTCAGATTTGTTCTGAGACTCCTG ATGACCTGTTAGATGATGATGAAATAAAGGAAAATATCAGCTTTGCTGAAAGTGGCATTAAGGAAAGATCTGCTGTTTTT AGTAAA---GACCAGAGAAGAGAGTTCAGAAGGAACCCAAGCCCTTTATCCCGT---TCAGGTTTGGCTCAGGGTCACCT AAGAGGGGCCAGGGAATTAGAGTCCTCAGAAGAGAACATATCTAGTGAG >AfricanEl TGTGGCACAAATACTCATGCCAGCTCATTACAGCATAAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAACAAAAGCAAACAGCCTGGCTTAGCAAGAAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCTAGCACAGAGAAAAAGGTAGATGTGAATGCTGATCCCTTGTATGAGAGAAAAGAA GTGAATAAGCAGAAACCTCCACGCTCTGAGAATCTTAGAGAC---ACCCAAGAT---ATTCCTTGGATAACACTGAATAG TAGCATTCAGAAAGTTAATGAGTGGTTTTTCAGAAGTGACGGCCTG---------GATGTCTTAAATGATGAGGGGCCTG AATCCAGTGCAGAAGTAGCTGGTGCATTAGAAGTTCCAAAT------GAAGTACAT------TCTAATTCTTCAGAGAAA ATAGACCTAATGGCCAGTGATCTGCATGGTGCTTTAATATGTGAAAGTGAAAGAGTCCCCTCCAAACCAGCAGAAAGTAA C---ATCGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAGGTATGTCTCATTTGAGCCACATAACTGAAGATC TGATTATGGGAGCTATTGCTTCAGAACCTCAGATAGCACGA--------------------------------------- ---------------------GAACATCCTTTTACAAATAAATTAAAGCGTAAAAGGAGA---ACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGTAGATTTGGCAGTTGTTCAAAAGATCCCTGAAAAGATAAATCAGGAAACTGACCATGTGG AGCAGAAC---------GGTCAAGTGATGAATATTGCTAATGGTGGTCGTGAGAATGAAACAAAAGGTGATTAT---GTT CAGAAAGAGAAGAATGCTATCCCAACAGAA------TCATTGGCAAAAGAATCTGCTTTCAGAACTAAAGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATATGCATAATTCAAAAGCACCAAAGAAGAATAGGCTGAGGAGGA AGTCGTCAACCAGACATATTCATGCACTTGAACTAGTAGTCAATAGAAATCCAAGTCCACCTACTCATACTGAACTACAA ATTGACAGTTGGTCTAGCAGTGAAGAG---ACAAAGAAAAAA---AGTTCTGAGCAAAAGCCAATCAGACACAACAGAAA 
CCTTCAACTCATGAAAAATCAAGAAACCGCAACTGGAGCCAAGAAGAGTAACAAGCCAAAGGAACAAATAAGTAAAAGAC ATGGCGCTGACTCTTACCCAGAACTACATTTAACAACCACAGCTGGCTTTATTACTAAGTGTTCAAGTTCTGATAATCTT CAAGAATTTGTCAATCCTAGCCTTCAAGGAGAGAAAACAGAAGAAAAC---CTGGAAACAATTCAAGTGTCTAATATTAC CAAAGAGCCCAAGGATCTAGTGTTAAATGGAGGAAGA---GATTTGCAA---ACCAAAAAATCTATTGAGAGTACCAATA TCTCAGTGATACCTGATACTGTTTATGGCACTCAGGACAGCGTCTCATTGCTGGGAGCTGACACCCCAGGG---AAGGCA AAA---ACAGCACCAAATCGATGTGCAAGTCAGTGTACAGCAATTGAAAACCCAAGTGAACTTACCAACAGTTGT---CC TAAAGATACTAGAAATGACACAGAGGGTTTTAAGGATCTATTGAGATGTGAAGCTAGC---CACATTCAGGAGACATGCA TAGAAATAGAAGAGAGTGAACTTGATACTCAGTATTTGCAGAGTACATTCAAGGTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCAAATCCA---------GAAAAGAAATGTGCAACAATTTGTGCCCACTCCAAGTCCTTAAGGAAGCAAAGCCCAAA AGTCACTCTTGTGTGTGGAGAAAAAGAAGAAAATCAAGGGAACAGAGAGCCTAAAATCAAGCATGAGCAGGCAGTTCATA TGCCTACAGGCTATCCTGAGGCTTGTCAGAAAGAGAAG---CCAAGTGACTATACCAAATATAGCATTAAAGGAGTCTCT GGGCTTTGTCAGTCATCTCAGTTCAGA---GGCAGTGAATCTGAACTCATTACTGCAGATGGACATGGAATCTCACAAAA CCCAGATCAAATACCATCACTTTCTCCCACCAGGTCATCTGTTAAAACTAAATGTAAGACAAAC---CTGTTGGAAGAAA GGTTTGAGGAACATACAATATCACTTGAAAGAGCAATGAGAAATGAGAACGTCATTCAAAGTACAGTGAGCACAGTTAGC CAAAATAACATTAGGGAAAGTGCTTCTAAAGAAGCCAGCTCAAGCAGTATTAATGAAGTATGTTCCAGTATTAATGAAGT AGGTTCTAGT------------------------------------------GGTGAAAACATTCAAGCAGAAATAAGTA GGAAGAGAGGACCTAAATTAAATGCTGTGCTCAGATTAGGTCTTATGCAACCTGAAGTTTATAAACAAAGTCTT---CCT ATAAGTGACTGTAAACATCCTGGAATAAAAACGCAAGGAGAAAATGAAGGAGTAGTTCAGGCTGTTAATACAGATTTCTC CCCATGTCTAATTTCAGATAACCTAGAACAA---CCTGTGGGAACTAGTCGTGCTTCTCAGGTTTGTTCTGAGACTCCCG ACGACCTGTTAGATGATGATGAAATAAAGGAAAATATCAGCTTTGCTGAAAACGGCATTAAGGAAAGATCT---GTTTTT ATTAAAGATGACCAGAGAAGAGAGTTCAGAAGGAACCCAAGCCCTTTATCCCAT---TCAGGTTTGGCTCAGGGTTGCCT AAGAGGGGCCAGGGAATTGGAGCCCTCACAAGAGAACATATCTAG---- >AsianElep TGTGGCACAAATACTCATGCCAGCTCATTACAGCATAAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAACAAAAGCAAACAGCCTGGCTTAGCAAGAAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA 
CATGTAATGAT---AGGCAGACTCCTAGCACAGAGAAAAAGGTAGATGTGAATGCTAATCCCTTGTATGAGAGAAAAGAA GTGAATAAGCAGAAACCTCCACGCTCTGAGAATCTTAGAGAC---ACCCAAGAT---ATTCCTTGGATAACACTGAATAG TAGCATTCAGAAAGTTAATGAGTGGTTTTTCAGAAGTGACGGCCTG---------GATGTCTTAAATGATGAGGGGCCTG AATCCAGTGCAGAAGTAGCTGGTGCATTAGAAGTTCCAAAT------GAAGTACAT------TCTAATTCTTCAGAGAAA ATAGACCTAATGGCCAGTGATCTGCGTGGTGCTTTAATATGTGAAAGTGAAAGAGTCCCCTCCAAACCAGCAGAAAGTAA C---ATCGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAGGTATCTCTCATTTGAGCCACATAACTGAAGATC TGATTATGGGAGCTATTGCTTCAGAACCTCAGATAGCACGA--------------------------------------- ---------------------GAACATCCTTTTACAAATAAATTAAAGCGTAAAAGGAGA---ACATCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGTAGATTTGGCAGTTGTTCAAAAGATCCCTGAAAAGATAAATCAGGAAACTGACCATGTGG AGCAGAAC---------GGTCAAGTGATGAATATTGCTAATGGTGGTCGTGAGAATGAAACAAAAGGTGATTAT---GTT CAGAAAGAGAAGAATGCTATCCCAACAGAA------TCATTGGCAAAAGAATCTGCTTTCAGAACTAAAGCTGAACCTAT AAGCAGCAGTATAAGCAATATGGAGGTAGAATTAAATATGCATAATTCAAAAGCCCCAAAGAAGAATAGGCTGAGGAGGA AGTCGTCAACCAGGCATATTCATGCACTTGAACTAGTAGTCAATAGAAATCTAAGTCCACCTACTCATACTGAACTACAA ATTGACAGTTGGTCTAGCAGTGAAGAG---ACAAAGAAAAAA---AGTTCTGAGCAAAAGCCAATCAGACACAACAGAAA CCTTCAACTCATGAAAAATCAAGAAACCGCAACTGGAGCCAAGAAGAGTAACAAGCCAAAGGAACAAATAAGTAAAAGAC ATGGCGCTGACTCTTACCCAGAACTACATTTAACAACCACAGCTGGCTTTATTACTAAGTGTTCAAGTTCTGATAATCTT CAAGAATTTGTCAATCCTAGCCTTCAAGGAGAGAAAACAGAAGAAAAC---CTGGAAACAATTCAAGTGTCTAATATTAC CAAAGAGCCCAAGGATCTAGTGTTAAATGGAGGAAGA---GATTTGCAA---ACCAAAAAATCTATTGAGAGTACCAATA TCTCAGTGATACCTGATACTGTTTATGGCACTCAGGACAGCGTCTCATTGCTGGGAGCTGACACCCCAGGG---AAGGCA AAA---ACAGCACCAAATCAATGTGCAAGTCAGTGTACAGCAATTGAAAACCCAAGTGAACTTACCAACAGTTGT---CC TAAAGATACTAGAAATGACACAGAGGGTTTTAAGGATCTATTGAGATGTGAAGCTAGC---CACATTCAGGAGACATGCA TAGAAATAGAAGAGAGTGAACTTGATACTCAGTATTTACAGAGTACATTCAAGGTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCAAATCCA---------GAAAAGAAATGTGCAACAATTTGTGCCCACTCCAAGTCCTTAAGGAAGCAAAGCCCAAA AGTCACTCTTGTGTGTGGAGAAAAAGAAGAAAATCAAGGGAACAGAGAGCCTAAAATCAAGCATGAGCAGGCAGTTCATA 
TGCCTGCAGGCTATCCTGAGGCTTGTCAGAAAGAGAAG---CCAAGTGACTATACCAAATATAGCATTAAAGGAGTCTCT GGGCTTTGTCAGTCATCTCAGTTCAGA---GGCAGTGAATCTGAACTCATTACTGCAGATGGACATGGAATCTCACAAAA CCCAGATCAAATACCATCACTTTCTCCCACCAGGTCATCTGTTAAAACTAAATGTAAGACAAAC---CTGTTGGAAGAAA GGTTTGAGGAACATACAATATCACTTGAAAGAGCAATGAGAAATGAGAACGTCATTCAAAGTACAGTGAGCACAGTTAGC CAAAATAACATTAGGGAAAGTGCTTCTAAAGAAGCCAGCTCAAGCAGTATTAATGAAGTATGTTCCAGTATTAATGAAGT AGGTTCTAGT------------------------------------------GGTGAAAACATTCAAGCAGAAATAAGTA GGAAGAGAGGACCTAAATTAAATGCTGTGCTCAGATTAGGTCTTATGCAACCTGAAGTTTATAAACAAAGTCTT---CCT ATAAGTGACTGTAAACATCCTGGAATAAAAACGCAAGGAGAAAATGAAGGAGTAGTTCAGGCTGTTAATACAGATTTCTC CCCATGTCTAATTTCAGATAACCTAGAACAA---CCTGTGGGAACTAGTCGTGCTTCTCAGGTTTGTTCTGAGACTCCCG ACGACCTGTTAGATGATGATGAAATAAAGGAAAATATCAGCTTTGCTGAAAACAGCATTAAGGAAAGATCT---GTTTTT ATTAAAGATGACCAGAGAAGAGAGTTCAGAAGGAACCCAAGCCCTTTATCCCAT---TCAGGTTTGGCTCAGGGTTGCCT AAGAGGGGCCAGGGAATTGGAGCCCTCACAAGAGAACA----------- >RockHyrax TGTGGCACAGATACTTGTGCCAGCTCGTTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGGATGTAGA AAAGGCTGAATTCTGTAGTAAGAGCAAACAGCCTGGCTTAGCAAGGAGCCAACAGAGCAGATGGGCTGAGAGTACGGAAA CATGTAATGGT---AGGCAGATTCTTAGAACAGAGAAAAAGGTAGAAACGAATGCTGATCCTTTGTATGGGAAAAAAGAA GGGAATAAGCAGAAACCTCCATGCTCTGAGAGTCGCACAGAT---ACGCAAGAT---ATTCCTTGGGTAACACTGAATAA CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGACAACCTA---------AGTGATTCACCTAGTGAGGGGTCTG AATTAAATGGAAAAGTGGCTGGTCCAGTAAAACTTCCAGGT------GAAGTACATAGATATTCTAGTTTTCCAGAGAAC ATAGATTTAATGGCCAGTGGTCCTCCTGGT------------------GAAAGAATCCCCTGCAAACCACCAAAAAGTAA C---ATCAAAGATAAAATATTTGGGAAAACCTATCAGAGGAAGGGAAGTCTCCCTCACCTGAGCCACATAACCGAAGATC TAGTTAAAGGCGCCATTGCTACACAACCACAGATAGCACGA--------------------------------------- ---------------------GAACATCCTCTTACAAATAAGTTACAGCATAAAAGGAGA---ACATCAGGCCTTCATCC TGAGGATTTTATCAAAAAAGCAGATTTGACAGTTGTTCAGAAGGCATCTGAGAAGATAAATCAGGAAACTGACAAAGTGG AGCAGAGT---------GGTCAAGTGATAAATATTGGTAATGGTGGTCATGAGAATGAAACAAAAGATGATTAT---GTT 
CTGAAAGAGAAGAATGGTAACCCAGCAGAA------TCACTGAAAAAAGAATCTGTGTTCAAAACTAAAGCTGAACCTAT AAGCAACAGTATAAGCAATATGGAACTAGAATTAAATACACATAATTCAAAATCACTGAAGAAGAACAGGCTGAGGAAGA AGTCCTCCTCCAGACATGTTCATGCCCTTGAACTGGTAGTCAATAGAAATCCAAGTTCACCACCTCATACTGGGCTACAA ATTGATAGCTGGTCTGGCAGTGAAGAA---ATGAAGAAAACA---AGTTCTGAGCAAAAGCCAGTCAGACACAACATAAA CCTCCAACTCATGAAAAACCAAGAAACCACAACTGGAGCCAAGAAGAGTAACAAGCCAAAGGAACAAATAAATAAAAGAC ACACTAGTAATCCTTACCCAGAACTAAATTTAACAAGCACAGCTGGCTTTATTACTGTATGTTCAAGTTCTGATAATCTT CAAGAACCTGTCAACCCCAGCCTTCAAGGAGAGGAAATAGAAGAAAAC---TTGGCAACAGTCCAAATGTCTAATACTGC CAAAGAACCTGAGGATCTAGAGTTAAATGGAGGGAGA---GTTTTGCAA---ACCAAAAGATCTGTTGAAAGTACTAGTA CCTCAGTGATACCTGATGCTGACTGTGGTGCTCAGGACAGCATCTCGTTACTGGAAGCTGACACTCTAGGG---AAGGCA AAA---ACAGCACCAAATCAAGGGGCAGGTCAATGTTCGGCAATCGAAAACCCCAACGAACTTATTCATGGTAGT---CC TAAAGACACTAGAAATGATATAGAGGGTTTTAAGGATCCACTGAGATGCAAAGTTAAC---CCTATTCAGGAGATATGTG TAGAAATGGATGAAAATGAACTTGATACTCAGTATATACAGAGTACATTCAAGGTTTCAAAGCGTCATTCTTTTGCTCTG TTTTCAAATCCG---------GAAAAGGAGTGTGCGACAAGTTATACCCACTCCAAGTCGTTAAGGAAAGAAAATCCCAA AGTCACTCTTCAGCGTGGAGAAGAAGAAGAAAATCAAGGGAACAAAGAATCTAAAATCAAGCAT---------GTTCATA CAACTGCAAGCTGTCCTGAGGTTTGTCAGGAAGACACAAAGCCTAGTGATCGTACTAACTGTAGTGTTAAAGGACTCCCT AGGCTTTGTCACTCATCTCAATTCAAA---GGTAGTGAGTCTGAACTCATTACTGAAGGTGAACATGGAATTGCACAAAA CCCGGATCAGATGCCATCATCTTCTCCCATCAGATCATCTGTTAACTCTAAGTGTAACAAAAAC---CTGTCAGAAGAGC GATTTAAGGAACATAAAATATTACTTGAAAGAACAACAGGAAATGAAACCATTGTCCAAAGTACAGTGAGCACAGGTAGC CAAGATAACATTAGGGGAAGTGCTTCGAAAGAAACCAGCTCAAGCAGTATTAATGAAGTAGGTTCTAGT----------- ----------------------------------------------------AGCGAAAACATTCAAGCAGAAATAAGTA GGAACAGAGAACCTAGATTAAATGCTGTGCTCAGGTTAGGTGTTATGCAACCTGAAGTGTATAAACAAAGTCTT---TCT ATAAGTAACTGTAAACAGCCAGAAATAAAAAAGCGAGGAGAAAATGAAGGAGTAGTTCAGGCTGTTAGTACAGATTTCTC TGCATGTCCCATTTCAGAAAACCTAGAACAA---CCTGTGAGAAGTAGTCACACTTCTCAGGTTTGTTCTCAGACTCCTG ATAACCTGTTAGATGATGATGAAATAAAGGGAAAGACTGACTTTGCTGGAAGTAGCATTAAGGACAGACCT---GTCTTT 
AGTAAAGATGACCAGGGAAGAGAGTTCAGAAGGAACCCAAGCCCTTTATCCCAT---TCAGGTTTGGCTCAGGGCCACCT GATAGGGGCCAGGGAATTAGAGGCCTCACAAGAGAACACATCTAGCGG- >TreeHyrax ------NCAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGGATGTAGA AAAGGCTGAATTCTGTAGTAAGAGCAAACAGCCTGGCTTAGCAAGGAGCCAACAGAGCAGATGGGCTGAGAGTACGGAAA CATGTAATGAT---AGGCAGATTCTTAGAACAGAGAAAAAGGTAGATACAAATGCTGATCCTTTGTATGGTAAAAAAGAA GGGAATAAGCAGAAACCTCCATGCTCTGAGAGTCGCACAGAT---ACGCAAGAT---ATTCCTTGGGTAACACTGAATAA CAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGACAACCTA---------AGTGATTCACCTAGTGAGGGGTCTG AATTAAATGGAAAAGTGGCTGGTCCAGTAAAGCTTCCAGGT------GAAGTACATAGATATTCTAGTTTTCCAGAGAAC ATAGATTTAATGGCCAGTGGTCCTCCTGGTGATTTAATATGTGAAAGTGAAAGAATCCCCTGCAAACCACCAAAAAGTAA C---ATCAAAGATAAAATATTTGGGAAAACCTATCAGAGGAAGGGAAGTCTCCCTCACCTGAGCCACATAACTGAAGATC TAGTTAAAGGCGCCATTACTACACAACCACAGATAGCACGA--------------------------------------- ---------------------GAACATCCTCTTACAAATAAGTTACAGCATAAAAGGAGA---ACATCAGGCCTTCATCC TGAGGATTTTATCAAAAAAGCAGATTTGACAGTTGTTCAGAAGGCATCTGAGAAGATAAATCAGGAAACTGACAAAGTGG AGCAGAGT---------GGTCAAGTGATAAATATTGCTAATGGTGGTCATGAGAAGGAAACAAAAGATGATTAT---GTT CTGAAAGAGAAGAATGGTAACCCAGCAGAA------TCACTGAAAAAAGAATCTGTGTTCAAAACTAAAGCTGAACCTAT AAGCAACAGTATAAGCAATATGGAACTAGAATTAAATACACATAATTCAAAATCACTGAAGAAGAACAGACTGAGGAAGA AGTCCTCCTCCAGACATGTTCATGCCCTTGAACTGGTAGTCAATAGAAATCCAAGTTCACCACCTCATACTGGGCTACAA ATTGATAGTTGGTCTGGCAGTGAAGAA---ATGAAGAAAACA---AGTTCTGAGCAAAAGCCAGTCAGACACAACATAAA CCTCCAACTCATGAAAAACCAAGAAACCACAACTGGAGCCAAGAAGAGTAACAAGCCAAAGGAACAAATAAATAAAAGAC ACACTAGTAATCCTTACCCAGAACTAAATTTAACAAGCACAGCTGGCTTTATTACTGCGTGTTCAAGTTCTGATAATCTT CAAGAACCTGTCAACCCCAGCCTTCAAGGAGAGGAAATAGAAAAAAAC---TTGGCAACAGTCCAAATGTCTAATACTGC CAAAGAACCTGAGGATCTAGAGTTAAATGGAGGGGGA---GTTTTGCAA---ACCAAAAGATCTGTTGAAAGTACCAGTA CCTCAGTGATACCTGATGCTGACTGTGGTGCTCAGGACAGTATCTCGTTACTGGAAGCTGACACTCTAGGG---AAGGCA AAA---ACAGCACCAAATCAAGGGGCAGGTCAATGTTCAGCAATCGAAAACCCCAACGAACTTATTCATGGTAGT---CC 
TAAAGACACTAGAAATGATATAGAGGGTTTTAAGGATCCACTGAGATGCGAAGTTAAC---CCTATTCAGGAGATATGTG TAGAAATGGATGAAAATGAACTTGATACTCAGTATATACAGAGTACATTCAAGGTTTCAAAGCGTCATTCTTTTGCTCTG TTTTCAAATCCA---------GAAAAGGAGTGTGTGAAAAGTTATACCCACTCCAAGTCGTTAAGGAAAGAAAATCCCAA AGTCACTCTTCAGCGTGGAGAAGAAGAAGAAAATCAAGGGAACAAAGAATCTAAAATCAAGCAT---------GTTCATA CAACTGCAAGCTGTCCTAAGGTTTGTCAGGAAGACGCAAAGCCTAGTGATCGTACCAACTGTAGTGTTAAAGGACTCCCT AGGCTTTGTCACTCATCTCAATTCAAA---GGTAGTGAGTCTGAACTCATTACTGAAGGTGAACATGGAATTGCACAAAA CCCAGATCAGATGCCATCATCTTCTCCCATCAGATCATCTGTTAACTCTAAGTGTAACAAAAAC---CTGTCAGAAGAGC GATTTAAGGAACATAAAATATTATTTGAAAGAACAACAGGAAATGAAACCATTGTCCAAAGTACAGTGAGCACAGGTAGC CAAGATAACATTAGGGGAAGTGCTTCGAAAGAAACCAGCTCAAGCAGTATTAATGAAGTAGGTTCTAGT----------- ----------------------------------------------------AGCGAAAACATTCAAGCAGAAATAAGTA GGAACAGAGAACCTAGATTAAATGCTGTGCTCAGGTTAGGTGTTATGCAACCTGAAGTGTATAAGCAAAGTCTT---TCT ATAAGTAACTGTAAACAGCCAGAAATAAAAAAGCAAGGAGAAAATGAAGGAGTAGTTCAGGCTGTTAGTACGGATTTCTC TGCATGTCCCATTTCAGAAAACCTAGAACAA---CCTGTGAGAAGTAGTCACACTTCTCAGGTTTGTTCTCAGACTCCTG ATAACCTGTTAGATGATGATGAAATAAAGGGAAAGACTGACTTTGCTGGAAGTAGCATTAAGGACAGACCT---GTCTTT AGTAAAGATGACCAGGGAAGAGAGTTCAGAAGGAACCCAAGCCCTTTGTCCCAT---TCAGGTTTGGCTCAGGGCCACCT GATAGGGGCCAGGGAATTAGAGNCCTCACAAGAGAACACATCTAGCAA- >Aardvark TGTGGCACAAATACTCATGCCAGCTCGTTACAGCATGAGAACAGCAGTTTATCACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGTGAGGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCTTAGCACAGGGGAAAAGGTAGATATGAAGGCTGATCCCTTGTATGGGAGAAAAGAA GTGAATAAGCAGAAACCTCCATGCTCTGAGAATCCTAGAGAT---ACTGAAGAT---ATTCCTTGGATAACACTGAATAG TAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGGCCTG---------GATGGCTCACATGATGAAGGGTCTG AATCAAATGCAGAAATAGGTGGTGCATTAGAAGTTTCAAAT------GAAGTACATAGTTACTCTGGTTCTTCAGAGAAA ATAGACTTAATGGCCAGTGAACCTCATGTTGCTTTAATATGTGAAAGTGAAAGAGTCCTCTCCAAACCAGCAGAAAGTAA C---ATCGAAGATAAGATATTTGGGAAAACCTATCGGAGGAAGTCGAGTCTCCCTAACATAAGCCATATAACTGAAGATC 
TAATTTTAGGAGCTATTGCTACAGAACCTCAGGTTGCACGA--------------------------------------- ---------------------GAATGTCCTCTTGGAAATAAATTAAAGCGTAAAAGGAGA---ACATCAGGCCTTCATCC TGAGGATTTTATTAAGAAAGTAGATTTGGCAGTT---CAAAAGACTCCTGAAAAGATAAATCAGGAAAATGACCAAATGG AACAAAAT---------GGTCAGGAGGAGAATATTGCTGATGGCTGTCATGAGAATGCAACAAAAGGTGAATAC---ATG CAGAAAAAGAAGAGTGCAAATCCAACAGAA------TCATTGGCAACAGAATCTGTTTTCAGAACTAAAGCTGAACCTAT AAGCAGCAGTATAAGCAATTTGGAACTAGAATTAAATACACACAATTCAAAGGCATCCAAGAAGAATAAGCTGAGGAGGA AATCCTCTACCAGGCATATTCATGCACTTGAACTAGTAGTCAATAGGAATCCAAGTCCCCCTAGTCATACTGAGTTACAG ATTGATAGCTGGCCTAGCAGTGAAGAG---TTAAAGAAAAAA---AGTTCTGAGCACAAGCCAATCAGACAGAATACAAA CCTGCAACTCATGAAAGATCAAGAGGCCACAACTGGGGCCAAGAAGAGTAACAAGCCAAATGAACAAATAAGTACAGGAC ACGCCTCTGACATTTTCCCAAAATTGAATTTAACAAACATAACTGGTTTTATTACTAATTGTTCAAGTTCTGATAATCTT CAAGAATTTGTCAATCCTAGCCTTCAAGAAGAGGAAATAAAAGAGAAC---CTGGGAACAATTCAAGTGTCTGATAGTAC CAGAGATCCTACGGATGAGGTGTTAAAC---AGAAGA---GGTTTGCAA---ACTGAAAGATCTGTGGAGAATACCAGTA TTTCAGTGAAACCTGATACTGATTATAGCACTCAGGACAGCATCTTATTACTGAAAGCTAACTCCCTAAGG---AAGGCA AAA---ACAGCACCA------------AGTCAGTGTGCAGCAATTGAAAATCCTAACAAACTTAGCCATGGTTTT---CC TAAAGATACCAGAAATGACATAGAGGGTTTTAAGGATCTATTTAGAGGTGAAGATAAC---CACGTTCAGGAGACATACA TAGAAATGGAAGAGAGTGAACTTGATACTCAGTATTTACAGAATACATTCAAGGTTTCAAAGCGTCAGTCATTTGCTCTG TTTTCAAATCCA---------GAAAAGGAATTTGCAACAGTCCATGCCCACTCCAGGTCCTTGAGGAAACAAAGTCCAAA CATCACTCTTGAGTGTGGAGAAAAAGAAGAAAATCAGGGGAACGAGGAATCTAAAATCAAGTGCGTACAGCTAGTTCTTT CAACTACAGGCTATGCTGGAGCTTGTCAGAAAGAGAAG---CCAAGTGATTATGCCAAATGTAGCATTAAAGGAGTCTCT AGACTTTGTCAGTCATCTCAATTCAGA---GGCAATGAATCTGAAATCATTACTGCAAATGAACATGGAGTCTCACAAAA CCTGGATCAGACACCATCACTTTCTCCCACTAGGTCATCTGTTAAAGCTAAATGTAAGACAAAT---CTGTCCAAAGAAA GATTTGAGCAACAGAAAATATCACATGAAAGAGTAATGGGAAATGAGAGCACCATTCAGAGTACAGTGGGCACAGTTAGC CAAAGTAACATTAGGGAAAGTGCTTTTAAAGAAGCTAGCTCAAGCAGTATTAATGAGGTAGGTTCCAGTGTTAATGAAAT AGGTTCTAGT------------------------------------------GGTGAAAACATTCAAGCAGAACTAGGTA 
AGAACAGAGGACCTAAATTGAATGCTGTGCTCAAATTATGTCTTATGCAACCTGAAGAGTATAAACAAAGTCCT---CCT ATAAGTAATTGTAAACATCCTTCAATAAAAACCCAAGGACAAAATGAAGGAGTAGTTCAGGCTGTTAATACAGGTTTTTC TTCATGTCTGATTTCAGATAACCTAGGACAA---CCTATGGGAAGCAGTCATGCTTCTCAGATTTGTTCTGAGACACCTG ATGACCTGTTAGATGATGACAAAATAAAGGAAAATACCAGCTTTGCTGTAAGTGGCATTAAGGAAAGATCTGCTGTTTTT AGTAAAGATGACCAGGAAAGAGAGTTCAGAAGGAGCCTGAGCCGTTTCTCCCAT---TCAAGTTTGGCTCAGGTTCACGT AAGAGGTGCCAGGGAATTAGAGTCCTCAGAAGAGAACATATCTAGTGAG >GoldenMol TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTGTAATAAAAACAAACAGTCTGGCTTAGCGAGGAGCCAGCAGAGCAGATGGGCTGGAAGTAAGGCAG CGTGCAATGAC---AAGCAGACTCCTAGCACACAGACAGAGCTATATAGGAGTGCTGGTCCCATGCACAGGAGAAAAGAA GTAAATAAGCTGAAATCTCCATGGTCTGAGAGTCCTGGAGCT---ACCCAAGAG---ATTCCTTGGATAACACTGAATAG TAGCATTCGGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGGCCTG---------GATGAGTCACATGATGAGGGGTCTG AATCAAATGCAGAGGTTGCTTGTAAATTCGAATTTCCCAAG------GAAGTACAAGGATATTCTAGTTCTTCAGAGAAA ATGGACTTAATGGCCAATGATCCTCATGATGCTTTAATATGTGAAAATGAGAGAATCCTCTCCAAACCAGCTGTAAGTAA C---ATTGAAGATAAAATTTTTGGGAAAACCTATCGGAGGAAGACAAGTCTCCCTAACTTGAGCCATGTAACTAAAGACC TAATGATAGGAGCTGTTGCTACAGAAGCTCAGATGGCACAA--------------------------------------- ---------------------GAACGTCCTCTTACAAATAAATTAAAGCGCAAAAGGAGA---ACCTCAGGCCTTCATCC TGAGGATTTTATCAAGAAAGCAGGCATGAAAGTTGTTCAAAAGACTCCTGAAAAGATAAATCAAGAAACTAACCAAATGG AGCAGAAT---------GATCAAGTGAGGAATATTGTTGATGGTGGTCATGAAAATGATACAAAAAGTGATTAT---GTT CAGAAAGAGAAGAGTGCTAACCCAGCCAAA------TCATTGGCAAAAGAGTCCGCTTTCACAACTAAAGCGGAACCTAT AAGCAGTAGTATAAGCAATATGGAACTAGAATTAAATATGTACAATTCTAAAGCACTGAGGAAGAATAGGTTGAGGAGGA AGTCCTCTTCCAGGCATATTCACACACTTGACTTGGTAGTCAATAGAAATCCAAGTCCACCTCCTTACACTGAACTACAG ATTGATAGTTGGCCTAGCAGTGAAGAA---ATAAACGAAACA---AGTTCTGAACAAAAGCCAACCAGACACAGCAGAAA CCTTCACCTCATGAAAGAACAGGAAACTGCAACTGGAACCAAGAACAGTAACAAGCCAAATGAACAAATAAGTAAAAGAC ATGCCACTTACACTTTTTCAGAACTAAACATAACAAATAGAACTGACTTTATTACTAACTGCCCACGTTCTGATAATCTT 
CAAGAACTTGTCAATCCTAGCCTTCAAGGAGAGGAAAGGGAAGAGAAA---TCGGAAACAATGCAAGTATATGATAACAC CAAAGAACCTGAGGATCAGGTGTTAAGTGGAAGAAGG---GATTTGCAA---ATGGAAAGATCTGTTGAGAGTACCAGTG TTTCAGTGATACCCGATACTGATCATGGCACTCAGAACCACACCTCATTACCGGAAGCTGGCACCCTCGGG---AAGGCA GAA---ACAGCACCAAATCAATGTACAAGTCAGTGTAAAGCAATTGAAAATCCCAACCAACTTATCCATGGTTGT---CC T---------AGAAATGACACAGAGGGCTTTAAGGATCTATTGAAACATGAAGTTAAA---CACAATCAGGAGACATGCA TAGAAATGGAAGAGGGTGAGCTTGATATTCAGTATTTACAGAATACATTCAAGGTTTCAAAGCGTCGGTCATTTGCTCTG TTTTCAAATCCA---------GAAAAGGAATGTGCAACAGTCAGTGCCCACTCTAGGTCCTTCAGGAAACAAAGTCCAAA AGCCACTCTTGAATGTGGCGAAAAAGAAGAAAATCAGGGGCACAAAGAGTCTAAAGTCAAGCATGTACAGGCAGTTCATA CAAGTGTGGGCTATCCTGGACTCTGTCAGAAAGAGAAG---CCAAGTGATTATACCAAAGGTAGCATTCAGGGGGCCTCT AGGCTTCATCAGTCATCTCAGTTCAGT---GGCAATGAATCTGAACAAATTACTGCAAATGAAAATGGAATTTCACAAAG CCCAGATCAAACAGCATTGCTTTCT------------------------AAATGTAAGAAAAAC---TTGTCTGAAGAAA GATTTGAAGAACGTGCAGTATCACTCGAAAAAGCAGTGGGAAATGAGAGCATCATTCAAAGTACAGTGAGCACAGTTAGC CACAATAACATTAGGGAAAGGGCTTTTAAGGAAACCAGCTCAAGTAGTACTAATGAAGTAGGTTCCAGTATTAATGAAGT AGGTTCTAGT------------------------------------------AACGAAAACATCCAAGCAGAGGTAGGTA GGAACAGAGGACCTAAGTTAAATGCTATGCTCACATTAGGTTTTATGCAACCTGAAGTCTATAAACAAAATCTT---ACT CTAAGTAATTGTAAACATCCTGAAATAACAAAGCAGGGAGACAATGAAGAAAGAGTTCAAGCTGCTGACCCAGGTTTCTC TCCGTGTCTAATTTCAGATAACCTAGAACCA---CCTATGGGAAGTAATCATGCTTCTCAGATTTGTTCTGAGACACCTG ATGACCTGTTAGATGATGATGAAATCAAAGAAAATATCAGCTTTGCTGAAAGTGGCATTAACGAAAGATCTGCAGTTTTT AGTAAAGATGACCATAGAAGACAATTCAGAAGGAACCTAAGCCCTTTATCCCAT---TTAGGTTTGACTCAGGGTCACTT AAGAGGTACCAGAGAATTAGAGTCTTCAGAAGAGAACCTGTCTAGTGAG >Madagascar TGTGGAACAAATACGCTTGCCAACTCATTACAGCGTGAGAACTACAGTTTATTACTCACTAAAGACAGACTGAATGTAGA AAAGGCTGGATTCTGTAATGAAAGCAAACAGCCCGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGAAAGTAAGGAGA CATGTAATGAC---AAGCCGACTCCTAGCACAGAGAAAAAGGTAGATAAGAATGCTGACCCCGTGCATGGGAGAAAAGGA GTGCCTAAGCAGAAACCTCCGTGCCATGGAAGTCCTAGAAAG---AGCCACGAT---GTGCCTTGGAAAACACGGAAGAG 
TAGCATTTGGAAAGTCAATGAGTGGTTTTCCAAAAGCGATGGCCTG---------GGTGACTCGCATGATGAGCGGCCTG AATCAGATGCAGACGTCGCTGGGGCCTTCGAAGTTCCAGAT------GAAGCACGCGAATCTTCTAGTTCACCAGAGAAA ACAGACTTGATGGTCAGTGATCCTCATGTTCCTTTACTGCCTAAAAGTGAAATAGTCCTTTCCAAACCAGTAGAAAGTAA C---ATCGAAGACAAAATATTTGGGAAAACGTATCGGAGGAAGTCAAGTCTCCTTAACTCGAGCCATGTAACCGAAGATC TAATGGTAGGAGCTGCTGCTGCTGTAGCACCTCCGATAGCG--------------------------------------- ------------------CAAGAGCACCTTCTTACCAGTAAATTAAAACGC---AGAAGA---TCATCCGTCCTCCACCC TGAGGACTTTATCAAGAAAGCAGATTTGGCAGTTGTTGAAAAG------------ATCAACCAGGGAACTGACCAAACAC GGCCGAAG---------GGTCAAGTGGAGCATACTCCCGATGGTGGTCACGGGCATGGAACAAAAGGCGGGAAT---TGT GTTCAGAAGGAGAGTGCTGACCTGGCAGAA------TCATCGGCCAAAGAATCTGCCTTTCGAACTGTAGCTGAGCCTTT GAGCAGTAGGAAAAGCGGTATGGAACTGGAACCGACTGTGTTCAATGCAAAAGCACAN---AAGAACAGGCTGAGGAGGA AGCCCTCTGCCAGGCACATCCACCCCCTGGAACTA---GTCAACAGACAGCCAAGCCCGACGAGGCACACAGAACTGCAA ATCAGTAGCTGGCCTAGCAGTGAAGAG---CTAAAGAAAGAA---GAATCTGAGCCAAAGCCAGTCAGACGGAGCCGGAA CTTACAGCTCACGAAAGACCAAGAAATGGCCATCGGAGCCAAGAAGAATGACAAACCAAATGAAGAAAGGGGCAAAAAAC ACGCCCCTGACACTTTCCCAGAACGAAATTTAACAGACACAACTGACCTAATTACTACCTGTTCGTTTTCTGGTAATCTT CAAGCATCTGTCTGTCCTAGCCTTCCAGGGGCAAACCTGAAAGAGAAA---CTGGGAACAGTGCAGGTGTCTAATGGTAC CAGAGACCCCAAAGGTGAGGTGTTGGGTAGAGCAAGG---GGCCTGCAA---ACCGCAAGATCTGTTGAGAGTACCAGTG TTTCACTGATACCTGATGCTGATGATGGCCCTCAGGGTACCATCTCAATCATGGAAGCAGGCAACCGAGGG---GAGGCA CAA---ACAGCACCAGGTGAATGTGCAAGGCAGTGTGCAGCAACTGAAAACCCCAACAGGAGTGTCCAGGGTCTT---TC CAAAGACACTAGAAATAACAGGGAGGGCTCTGAGGATCTCTTGACACACGACGTTAAC---CACCTTCAAGAGACATGCA CAGAAATGGAAGAGAGTGAACTTGATATTCAATCCTTACAGAATACATTCAAAGTCTCAAAGCGTCAGTCATTTGCTCTG TCTTCAAATCCA---------GAAAAGGAGTGTGCCTCGCTCGGTGCCCGCCCT---------------------CCAAA CATTGCTCTTGAGTGTGGGGAA---GAAGGGACTGAGGAGAACAAAGAGTCTGCAGTTCGACCTGTGCAGGCAGTTCCTG CAACTGTGGGCGGTGCTGGAGGTAGTCCAACAGAGCAACAGCCACGTGATTATACCAAAAGTCACCTTCAAGGAGTGTCC AGGCTTGGTCAGGCAGCTCCATTCAGA---GGCAGTGAATCTGACCCCAGTACCTCAGTTGAACATGGGATTGTGCAGAC 
CTCGGAGCNTGCCCCACCACTCGCTCCCATCAGGTCATCTGTTAACAGTGAACGTAAG------------------GAAA GACTGGAGGATCATGCAGAATCACTTGAAAGAGCCTCAGAAAACGAGAGTATCATTCCAAGTACAGTGAGCACAATTAGC CAAAATACCCCGAGAGAAAATGCTTTCAAAGGAACCAGCTCAAGCAGTCTTAATGAAGTAGGTTCTAGC----------- ----------------------------------------------------AGCGAAAACATTCAAGCAGAGCTAGCTA GGAACAGAGAACCTAAATTGGATGCTGTGCTCAGGTTGGGTCTTGTGCAACCTGAAGTCTGTAAGGAAAGTCTT---CCT ATGAGGAAATGTAAGCATTCTGAAGTAAAAAGGCAAAAAGGCAATGGAGGACTAGTTCCAGCTGTTCATCCAGATTTCTC TCGCTATCTAATGTCACATAACCCAGAGCAG---CCTATGGAGGACAAT---GATCCTCAGATTTGTTCTGAGACACCTG AAGACCTGTTAGATGACAGTGAAATAAATAAAAACAGCCACTTGGTTCAAAGTGACATTAGGGAAAGATCTGCTGTTTTT AGCAAAGATAACCAGAGAAGAGATTTCAGAAGGAGCCCTGGGCCTACATCCCAT---TTAGGTTTGTCTTGGGGTCACCC NAGAGGTGCTGAAGAGTTAGAGTCCTTAG-------------------- >Tenrec TGTGGCACACGTACGCTTGCCAGCTCGGCACAGCGCGAGGACTGCAGCTTATTACTCACCGAAGACAGACTGGATGGAGA AAAGGCTGGATTCTGCAAGGAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAGCATAGCAGATGGGCTGAAAGTAGGGGAA CGGGTAATGAC---AAGCCGACTCGTAACACAGAGGAAGAGGTAGTTGTGAGCGCTGACTCCGCGCATGGGAGGAAAGGA GTGTCTAAGCAGAAACCTCCGCGCCAGAAAAGGCCCAGAAAG---AGCCATGAT---GTGCCTTGGAAAACACGGAAGAG CAGCATTTCGAAGGTTAACGAGTGGTTTTCCAAAAGCCACGGCCTG---------GGTGACTCTCGCGATGGGCGGCCTG AGTCAGGCGCAGACGTAGCTGTAGCCTTCGAAGTTCCAGAC------GAAGCATGTGAATCTTATAGTTCTCCAGAGAAA ACAGACCTGATGGCCAGGGACCCTCCTATTCCCTTACTGCATAAAAGCGCAAGAGTCTTTTCCAAACCAGTAGAAAGTAA C---ATCGAAGACAAAATATTTGGGAAAACGTATCGGAGGAAGTCAAGCCTCCTTAACTCGAGCCATGTAACTGAAGATC TGGTGCTAGGAGCTGCGGCTGCCGTAGCGCCTCCGGTAGCA--------------------------------------- ------------------CAAGAGCAGCTTCTTACCAGTAAATTAAAGCGCAAGAGGAGA---TCCTTCGTCCTGCACCC TGAGGACTTCATCAAGAAAGCAGATTTGGCTGTCGTTCAAAAGACTCCTGAAACGATCCATCAGGGACCTGACCCAATGC AGCCGAAG---------GGGCCAGTGCAGAAGATTCCCGATGGTGGTCCCGGGCGAGGGACAAAAGGCGGGGATCGTGTT CCGAAGGAGAAGAGTGCTAACCTGGCAGAA------ACATCGGCCAAAGAATCTGCCTTTCGAACTGTAGCTGAACCTTT GAGCAGCCGGAAAAGCAGTATGGAACTGGAGTTGCCTGTGTTCAGTTCAAGAGCGCAG---AAGAACAGGCTGAGGAGGA 
AGCCCTCTGCCAGGCACATCCATCCGCTGGAACTA---GTCAACAGACAACCAAGCCCGACTACGCACACGGAGCTGCAA ATCAGCAGTTGGCCTAGCAGTGAGGAG---CTAAAGAAAGAA---GAATCCGAGCCAAAGCCAATCAGACGGAGCCGGCA CTTAAAGCGGGAAAGA---------------------GCCAAGAAGAATGACAAGCCAAACGGAGAAAGGGGCAAAAAAC ACGCCCCTGACACATTCCCAGACCGAAAGTCAACAAGCACAACGGACTTAATTGCTAACTGTTCCGTTTCTGGCAGTCTT CATGGAGCTGTCTGTCCTGGCCTTCCAGGGGCAAAGCAAACAGAGAAA---CGGGGAGCAGTGCAGGTGTCTAATAGCGC CAGAGACCCCAAAGATGAGGTGTTGGGTAGAGCAAGG---GGCCTGCAA---ACTGCAAGATCTGTTGAGAGCACCAGTT TTTCACTGATACCTGCTGCTGCTGACGGCACCCCGGGTGGCGTCTCAATAATGGAAGCAGGCAGCCGAGGA---CAGGCA CAA---ACAGCACCAGATCCATGTGCAAGGCAGTGTACAGCAACGGACAACCCCAGCGAAAGTGCCCACGGTCTT---TC CAGAGATACTAGAAACAGCACGGAGGGCTTTGCGGATCTATTGACACATGGCGTTCCC---CACATCCAAGAGACATGCA CAGAAATGGAAGAGAGTGAACTTGATATTCAGTCTCTACAGAATATGTTCAAAGTCTCCAAGCGTCAGTCATTTGCTCTC TCTTCAAATCCA---------GAAAAGGAGTGTGCCACACTCTGTGCCCATCCTAGCGCCTTCAGGAAACAGAATCCAAA CGATGCTCTTGAGGGTGGAGAA---GAAGGGACTCGGGAGAACAAAGAGTCTACAATTAAGCCTGTGCGGGCAGTTCATA CCACCGTGGGC------------------ACGGAGAAGCAGCCAAGTGATTACACCAACAGTAGCTTTCAAGGAGTTTCT AGGCGTGGTCAGGCATCTCAATTCAGA---GCCAGTGAATCTGACCCCGGTACCTCAGTCGAACACGGAAGTTTGCAAAC CACAGAGCATACCCCACCACTCTCTCCCATCAGATCCTCCCTTAAAAGTAAATGTAAGGAAAAG---CTGTCAGAAGAAA GATTGGAGGATCAGGCAGAATCCCTTGAAAGAGCCTCGGGGAATGAGAGCATCATTCAAAGTACAGTGAGCACAATTAGC CAAAATACGCTTAGAGAAAATGCTTTTAAAGGAACCAGCTCAAGCAGTCTTAATGAAGTAGGTTCTAGC----------- ----------------------------------------------------AGCGAAAACATTCAGGCAGAACTAGCTA GGAACAGAGAACCTAAATTGGATGCTGTGCTCAGACTAGGACTTGTGCAGCCTGAAGGCTGTAAGGAGGAAAGTCTTCCC CTAAGGAAATGCAAGCATCCTGAAGTAAGAAGGCAAAAAGGCAATGGAGGACTAGTTCCAGCTGTCAATCCGGATCTCTC CCGCTATCTAATGTCACGTAACCCGGAGCAA---CCGATGGAGAGCAAC---GGTTCTCAGATTTGCTCGGAGACACCTG AAGACCTGTTAGATGACAGTGAAATAAAAAATAACAGCTACTTTGTTCAAAGTGACGTTAAGGAAAGGTCTGCAGTTTTT GGCAAAGATAACCAGAGAAGAGATTTCAGAAGGAGCCCTGGGCCTACATCCCAT---TTAGGTTTGACTTGGGGTCACCC AAGGGGTGCGGAAGAATTAGAGTCCTTAGAAGAAAGCGAAGCCAGTGAG >LesserEle 
TGTGGCACAGATCCTCATGCCAGCTCATTACAGCATGAGAGCAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGA AAAGGCTGAATTCTATAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAA CATGTAATGAT---AGGCAGACTCCTAGCACAGAAAAAAAGGTAGCTGTGAGTGCTGATCTTTATTATGGGATAAAAGAA GTGAATAAGCAGAAATCTCCGCACTCTGAGAGTCCAAACAAT---AACCAGAAT---ACTCCTTGGATGACATTGAACAG TAGCATTCGGAAAGTTAATGACTGGTTTTCCAGAAGCGGTGGCCTG---------GATGGCTGCCATGAT---AGATCTG AATCGAATGTAGTAATAGCTGGTGAAGTGGAAGTTCCAAAT------GAAGTACACGGATATTCTATTTATTCAGAGAAA TTAGACTTAATGCCCAACAATCCTCATGATACTTCACTGTATGAAAGTGAAAGAACCCTCTCCAAACCAGCTGAAAACAA C---ATTGAAGATAAAATATTTGGGAAAACCTATCGGAAGAAGGCAAGTCTCTCTAACTTGAGCCATGTGACTCAAGATC TCATTACAGGAGCTGTTGCTACAGAATGTCAAACAGCACAA--------------------------------------- ---------------------GAACGCCCTCTTACAAATAAATTAAAGCGTAAAAGAAGA---ACATCAGGCCTTCAGCC TGAGGATTTTATCAAGAAGGTAGATTTGACAGTTGTTCAAAAGATTCCTGAAAAGACAAATCAGGAAACTGAACAAATGG TGCAGAAT------GGTCAAGTGAAGAATAAGATTGCTAGTGGTGGTCATGAGAGTGATACAAAAGGTGATTAT---GTT CAGGAAGAGAAGAATACTAACCCAACAGAG------TCACTGGTAAGAGAATCTGATTTCAGAAATAAAGCTGAACCTAT AAGCAGCAGTATAAGCAAAATGGAGCTAGATTTAAATACACACAGTTCGGTAGCACCAAAGGAGAACAGGCTGAGGAGGA TGTCCTCAACCGGGCATGTTCACGCACTTGAACTAGTAGTCAATATAAGTCCAAGGTCACCTATCCATACTGAAGTACAG ATTGATGGTTGGTCTAGCAGTGAAGAG---AAAACGAAAAAA---CATTCCGAGCATAAGCCAGTCAGACCTAGCAGAAA CGTTTCAGTTACAACAAATCAAGAAACTGAAACTGGGTCCAAGAAGAATAACAAGCCAAATGAACAAATGAGTAAAAGAC ATGCCACTGACACTTCCCCAGGGCAC---------AACATAACTGGCTTCATTACTGACTGTTCAAATACCAGCAGTCTT CAAGAATGTTTTAATCCTAGACTTCAGGGAGAGGCAGTAGAAGACAAC---TTTGGACCAGTTCAAGTGTCTAATAGTAT CAAAGATCCCAAGGATCAAGTATTAAATAGAGGAAGG---AGTTTGCAGCACACTGGAAAATGTGTTGAGAGTACCAGTA TTTCAGTGTTATCTGATACTGATTATGGCACTCAGAATAGCATCTCATTACTGGAGGCTGGCACCCTAGGG---AAGGCA GAA---ACAGCACCAGATCAATGTGCAAGTCACTGTGCAGCAATGGAAAACTCCAAGAAACATGTTCATGACTTT---CC TAAAGATACTGGAAATGATACAAAGGGTTTAAAGGACTCATTGAAATGTGATGCTAGC---CACAGTAAGGATACGTACA TGGAAATGGAAGATAGTGAACTTGATACTCAGTATTTACAGCATACATTCAAGGTTTCAAAGCGTCAGTCATTTTCTCTG 
TTTCCAAATCCA---------GAAAGGGACTATACA-------------------------------------------- ----------------------------------------------------------AAGCTAGTGCAGCTAGTTCATA CCACTGCGAGCTATCCTGGAGCCTGTGAGGAAAAGAAA---CCAAATGATTTTACCAAATGTAGCATTAAAGGAGTCACT AGACTCTCTCATTCATCTCCATTGAGA---GGCAATGAATCTGAACTCATTACCGCAAATGAACATGAAATGTCACATAG CCCAGATCAGACACCGTCACTTTCTCCCACCAAGGTTCCTGTTAAAATGACATGGAAGAAAAACCACCTGTCAGAAGAAA ACTTTGAGGAACATACAATA------------ACAATGGGAAATGAGAGCATCGTTTATAGTCCAGTGAGCACAGTTAGC CAAAGTAATGTTAAGGGA------TATGCTCAGGCCAGTTCAAGCAGTATTAACGAAGTAGGTTCCAGTACTAATGAAGG AGGTTCCAGTATTAATGAAGTAGGTTCTAGT---------------------GGTGAAAACATCCAAGCAGAGCTCGATA GGAACAGAGGACCTGCATTAAATGCTGTACTCAGATTAGGTCTTGTGCAACCTGAAGTCTATGAACAAAGTCTT---CCT GCAGGAAATTGTAAACATCCTGAAATGACAAGGCCAGGAGACAGTGAAGGAGAAAAAAAGACTGTTAATCCAGATTTCTC TCCATGTCTACTCTCAGATAATTTAGAACAA---CCTAGGGGAAGTTGTCATGCTTCTCAGATTTGTTCTGAGACACCAG ATGACCTGTTTGGTGATGATGAAATAAAGGAAAATAGCAGCTTTGGTGAAATCAGCTTTAAAGAGAGATCTGCTGTTTTT AGTAAAGATGAGCAGGCCAGAGAGTTCAATAGGAACTCAAGCCGTTCATCTCGT---TCAGATTTGGCTCCCAGTCATCT AAGACGCGATAGGGAATTAGAGTCCTCAGAAGAGAATGTATGTA----- >GiantElep TGTGCCACAAGTATTCATGCCAGCTCATTACTGCATGAGAACAGCAGGTTATTGCTCAGTAAAGACAGAATGGATGTAGA AAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACAGAGCAGATGGACTGAAAATAAGGAAA CATGTAATCAT---AGGCAGACTCCTAGTATAGAGAAAAAATTAGATATGAATGCTGATCCCTCTTATGGGCGAAAAGAA GTGAATAAGCAGAAATATGAATGCTCTGAAAGCCCTCAAGAT---ACCCAAGAT---ATTCCTTGGATGACACTGAATAG TAGCGTTCAGAAAGTTAATGCATGGTTTTCCAGAAGTGATGGCCTC---------GATGATGCCCCTGATGTAGGGTCTG AATCAAATACAGCAATAACTGGTGCACTGGAAGTTTCAAAG------GAAGTACATGGATATTCTAGTTCTTCAGAGAAA GTACACACAATGACTGGTGATCCTCAGGGTGCTTCAGTATGTGAAAGTGAAAGAGTCTTCTCTAAACCAGCAGAAAATAA C---GTTGAAGATAAAATATTTGGGAAAACCTATCGGAGAAAGTCAAGTCTCCCTAACTTGAACCATATAACGCAAGATC TCATTGTAGGAGCTGTTGCTACAGGACCTCAGACAGCACAA--------------------------------------- ---------------------GAACACCCTCTTATAAATAAACTAAAGCGTAAAAGGAGA---ATATCTGGCCTTCGTCC 
TGAAGATTTTATCAAGAAGGTAGATTTGACT---AGTCAGAAGACTCCTGAAAAGAGAAATCAGACCATTTACCAAGTGG AC------------------------AGTAGTATTGCTGGCGGTGGTCATGAGCATGAAACGAAAGGTGACTGT---GTT CAGAAAGAGAAGAATGTTAGCCCAGCAGAA------TCATTGACAAAAGCTTCTGCTTTCAGAACTAAGGATGAACCTAT AAGCAGCAGTGTGAGCAGTATGGAACTAGAATTAAATATGTACAGTTCGATAGCACCAAAGAAGAACAGGCTGAGAAGGA AGTCCTCTACTAAGCGTATTTGTGCACTTGAACTAGTAGTCGATAGAAATCCAACTTCACCTACCCATGCTGAGCTAGAG ATTGATAGTTGGCCTAGCAGTGAAGAG---AGAAGGGAAAAA---TGTTCAGAGTTTAACTCCGTCAGACACAGCAGAAC CTTTCAGCTTATGAAAGGTCAAGAAACTGAAATTGGAGACCAAAAGAATAATGACAAGCAAAATGAACAAATAAGTAAAA AACATGCCACTGACACTTCTCCAAAACGA------AACATAACTGGCTTTATCACAAACTGTTCAAATACTGATAATCTT CAAGAATTTGTTAATCCTAGATTTGAAGGAGAGGAAATAGCC---------GTGGGACCAATTCAAGTGTTGAATAGTAC CAAGGACCCTGGAACTCAAGTGTTAAGTAAGGAAAGG---GGTTTGCAC---ACTGGGAATTATGTTGAGAGTACCAGTA TTTCAGTAATACCTGATACTGATGATGGTATTCAAAATAGCACCTCATTACTGGAAGCTGGCACATTAGAA---AAGGCA GAAGAAACAGCATTGACTCAATGTGCAAGTCACTGTGCAACAATTTTAAACCCCAACACACATGTTCATGGCTTT---CC TAAAGATCCTGCAAATGACACAGAGGGTGTTAAGGATCTGTTGAAATGTAATCCTAAC---CACCTTCAGGATACGTGCA TAGAAATGGAAGATAGTGAACTAGATACTCAGTATTTACAGAATACATTCAAAGTTTCAAAACGTCAGTCATTTACCCTG TTTCCAAATCCA---------GAAAAAGAATGTACAACAGTCTGTGCCCACTCCAGGCTCTCAAAGAAACAAAATCCAAG TGTCACTCTTGAGTGTAGAGAAAAAAAAGAAAATCAGGGGAATGAAGAGTCTAAAATGAAACATGTGCAGGTCGTTCATA CATCTGTGCACTATCCGGGAGTTTGTGAAGAAGAGAAGATACCAGATGATCACACCAAAGTTAGCATTAAAGGGATCTCT AGGCTTTGTCAGTCATCTCCTTTCAGG---AGCAATGAATCTGAACGCATTACTGAAAATGAATGTAAAATGTCACATAG CCCAAACCAGACACCATCATTTTCTCCCACCAAGCCATCAGTTAGAACCAAATATAAGCAATTCTACTTGTCAGAAGAAG AGTTGGAGGAACATGCACTATCACCTGAAAGAGCAATGGAAAATGAGAGTTTCATTCACAGTACAGTGAACGCAGTTAGC CAAAATAACCTTAGGGGAAGTGTCTTTAAAGAAGGCAGCTCAAGCAGTATTAATGAAGCAGGATCCAGTACTAATGAAGT AAGTTCCAGTATTAATGAAGTATGTTCTAGT---------------------GATGAAAACATCCAAGCCGAACTAGATA GGAACAGAAGACCTGAATTAAATGCTGTACTCAGGTTAGGTCTTATGCAACCTGAAGTGTATCAACCAAGTCTT---CCA ACAAATAACTGTGAATATCTTGAAATTAGTGGGCCAGGAAAAAATGAAGGCATAATTCAGGCTGTTAATCCAGATTTTAC 
TCCGTGTCTGATTTCAGATAACTTAGAACAACAGCCTATGGGAAGTAATCATGCTTCTCAGATTTGTTCTGAAACACCTG ATGACCTGTTAAATGATGAGAAAAGAAATGAGAATATCAGCTTTGCTCAAATCAGCATTAAGGAACGATCCACTGTTTTT AGTAAA---GACCAGGCAAGAGAATTCAGAGGAAACCCAAACCCTTTATCCCAT---TCAGATTTGACTCAGAGTCATCT AAGAAGTGGTAGGGAATTAGAGTCATCAGAAGANAA------------- >Caenolest ---------------------NGCTCATTACTGCCTGAGATCACCAGTTTATTGCCCAACACAGACAGAATGAATGTAGA AAAGGCTGAACTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGCAACCAGCAGAGCAATCTGGATGAAAGTAAGGAAA TATGTAGTGCTAGCAAGACCCCAGGTGCAGAGGAA---CAGCCTGAGCTGAATGCCAATCATCTGTGTGAGAGGAAAGAA CTAGAGGAG---AAGCTGCAGTGCCCTGAGAGCCCCAGAGGTGATTCTGGGGACTGCCTGTCTGGAACCAAAGTGAAAAA CAGTATTCAGAAAGTTAATGAGTGGTTATTCAGAAGTAATGACGTTTTAGCCCCAGATTACTCAAGTGTTAGGAGCCATG AACAGAATGCAGAGGCAACCAATGCTTTAGAATATGGGCATGTAGAG---ACAGATGGAAATTCTAGCATTTCTGAAAAG ACTGATATGGTGGCTGACAGTACTGATGGTTCCTGGCTACATGCACCTGAAAAAAACTGCCCCAGACCGGCAGAGAGCAA CAATATTGAAGATAAAATATTTGGAAAAACCTATCGAAGAAAGTCGGTTCACCCTGTTTTGAATTACGTAACTGAAAACT TGATTGTTGAAACCATTGCTCCTGATTCTGTGATCCCTCCA--------------------------------------- ---------------------GAGCCTCTCAAAAAAACCAAATTAAAGCGTAAAAGAAAGACCATATGTRACCTGCAGCC TGAGGATTTCATCAAAAAGACAAACATTCCAGTTATTCCCAAAACCCCTGAAAAGAAAAATCACCCTGTTGACCAGATCC TTGAAAAAGAACAAAATGGCCAGGTGATGAGCACAGCTGATGGTCAGTCCAAGCAGAAAGCACCCGGTTGTCAT---GTG GGGGAACTGAAAGAGGCTCAGGCATCAGAACCATCCTCTGTAGAGAAAGGATCCACTTTCAAAACTGGATCCGAGCCTGC AGCTGGTAGCACAAACCAGAGGGAACTTGAATTAAATGGTAGAGATGCAAAAATGACAAAAAAAGACAGACCAAGGAAGA GGCCTTCAGTTAGGACTGTCTGTGCTCTGGAGCTCGTGACTGATAGACACCCAGGCTCTTCTAATGAGACAGAACTACAG ATCGATAGCTATCCCAGCAGTGAAGAG---ATAAGGAAAGGAAATAATTCTGAGCAAAAGCAAGTCAGACGCAGCAGAAG ACTTCAGATGCTGTCAGAA---GAAATTGCAGCAGGAGCCAAGAAGGCCCATAAGCCAGATGAAGAAGTGGAAGAGAATT GTGTCAATGAAGTTTCCCCAGAACTAAAAATAGGAAAAGTATCTGCCTGTTCTACTGACAGTCTAGCTACTGATAGTGAT CTGGTATTAGCTAGCTGCAGTTTCACAGAA---GGAGATGAAAAGAGC---CTGGAAGCCTTCCAG---------CCCAG CAGAGACCAAGAC---AGCCTGGCAGTAAGCAGAGGAGAGAAGTTGCAA---GGGAAAAGAACCAAAGGAAACATGGAGG 
TTTTGGAGGTTGCCAGTACTGATTGGGACACTCAGGACAGCACCTCATTGTTTCCAACCGATGTTCCCCCA---AATCCA ACA---GCAGACTCTGGCCCACACAGAAATCAACATGAGGTCATAGAAACCCCCAAGGAACTCTTTGATGATTGCTCATC CAAAAACACCGGAAGTGGTAAAGAAGATTTG---------ACAAGACAAGAAGTCAAA---AATACCTCAGAGACAACCA CAGAGATAGAGGACAGTGAACTTGATACCCAATATTTACAGAATACCTTCAAACGTTCAAAACGCCAATCATTTGCTCTG TGTCCCAGTCCAAGG------CAGACAAGTATGAAACTCCGGTCCATCCCTAGAGCTCTAAGTCATCAGAGCCCAGATAA AGCCACGAATCATGGGGGGCAGGAAAAAAAAAAGCAGGGAAGCAGAGAATCAGACAAGGCTGTGCAGCCAGAACTTGCAG TCATGAGTTCAGCTGCGGTTTGTCAGACAAAGGAGAAAAAGCCAGGTGATTATGCCAAATGTAGCACA------GTCTCT AGGCTTTGTCACATAGCTCCATTACATGATGACAATGACTGTGACCACAGTGCTGAAAACAACAAGGGAATTTCACAAGT TCCTGATCAAAAGCAATCTGTCTCTCCAGCAAAGTCATCAGCTGGTAGATCTATATACACAAAAAGCCTCCTGGAGGAAA GACTTGATGAACAGACCACATGTCCCGAAACAGTGATGGGAAATGAAAGCTTAGTCCACAGTAGCTTAAGCCTGGTTAGC CAAAGTAACAGCGGAGAATATATTTCTAAAGCAACTGACTCAAACAGATTTGTTGGTGTAGACTCTAGC----------- ----------------------------------------------------GGAGAAGGTAGTCAGGAAGAAAAAGGTG AAAACAAATTAAATCAGTCCCAAGTCTGTCAACAAAGCTT---------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ------------------------------------------------- >Phascogale ---------------------NGCTCATTACTGCCTGAAACCTCCAGTTTATTGCCCAACACAGACAGAATGTCT---GA AAAAGCTGAGCTCAGTAATAAAAGCAAACAGTCTGGCTTAGCAAGGAACCAACAGAGCAGACTGGATGAAAGTAAGGGAA TATGTAGTACTGGGAAGACCCCCAGTGTAGGAGAG---CAGCCTGAGTTGAATGCCCATCATCTGTGTGAGAAGCAAGAA CTAAAGGAG---AAGCCACAAGGTCCTGAAGGCCCCAGAGGAAATCCTCAGAATTGCCTATCTGGAACCAAACTGAAAAG CAGTATTCAGAAAGTTAATGAGTGGTTATCCAGGAGTAATGACATTTTAGCCTCCGATTGCTCCCTTGACAAGAATCATA AGCAGAATGCAGAGATAGCTAGTGCCTTAGAAGATGGGCATCCAGATAACTCTGATGGAAATTCTAGCATTTCTGAGAAG 
ACTGACGTTGTGGCTGACACTGCCGATGGAGCCTGGCTACATGTGCCTAAAAGAACTTACACCAGGCCAGCAGAAAACAA CAATATTAAAGATAAAATATTTGGAAAAACCTATCGGAGAAAGTCGGGTCACCCTAATTTGAATCACATAACTGAAAATT TGATTGTTGAGACAGTTGTTCCTGATTCTTTGGTTCCTCCA--------------------------------------- ---------------------GAACCTCTGAAAAATACCAAGTTAAAGCGTAAAAGAAAGACTATATGTGACTTGCAGCC TGAGGATTTCATTAAGAAGACG---GTTCCAGTTACTCACAAAACCCCTGAAAAGAAACATCACGCTGCTGACCAAACCC TTGAAAGAGAACAAAATGACCAAGTGATGAACATGGATAATGGTCATCCTGAACAGAAAGCACTAAATGGTCAT---GTA GGGGAAATTAAAGAAGCTCAGACATCGGAACTGTTCTCTGCAGAGAAAGAATCCACTTTCAGAACTGGAACAGAGCCTGC AGCTAGC------------------------TTAAATGGTGAAGAAACCAACATAACAAAAAAAGACAAGCTGAGGAAGC AGCCTTCATTCAAGATTGTCTGTGCTCTTGAGCTTGGAGTTGATGGAAGCCCAAGCTCTTCTAATGAGACTGAACTACAG ATCGATAGCTATCCCAGCAGTGAAGAG---ATAAGGAGAAAAAATAATTCTGAACAGAAGCAACTTAGACGCAGCAGAAG ACTTCAGCAGGTGTCAGAA---GAGATTGCAATTGGAGCCAAGAAAGCCCATAAGCCAGATGATCAAGCAGAAGAAAGTT GTATCACTGAAGTTGTCTCAGAACTAAAATTAGGAAATCTACCTGCCTGTGCTCCTGACAATGTCAGTACTGATAAGGAT CAAGTATTAGCTAGTTGCAGTTTCACA------GGAGATGAAAGGAGT---CTGGAAGTAATCCCT---------AGCAG CCAAGACCAAGAT------TTGGCATTGAGTGGAAAGGAAGGGTTGCAA---GGTGAAATATCTCAAGGAAGCCTGGGGA CTGTGGAGGTTCCTGATACTGCTTGGGACACTCAGGACAGTACCTCGTTGTTTCCAGCTGATACTCCTCAA---AATTCA AAA---CCAGGTCCCAGTCCTCACAGAAATCACTGTGAAATAATGGAAACCCCCAAGGAACTCTTAGAAAGTAGTTCATC CAAAAACACTGAAAGTAGTGTAGAAGATTTAAGGAGCCTGATGAAACAAGAAGTTAAA---AATGCCTCCAAGACAATCA CAGAAATGGAGGATAGTGAACTTGACACCCAGTATTTACAGAATACTTTCAAGCGTTCAAAGCGCCAGTCATTTGCACTG TGTTCTAGCCCAAGG------CAAGAATATGTGAAAACCTGTGCTGTCCCTGGGGCTATAAGTCAGCAAAGT-------- ----ACAGTTCGTAGGTACCAGGAAAAAGAAAAGCAAGAAAACAGAGAATCAAACAAGCCTGTGCAGCCAAAGCCTGCTG TTGTG---------------------ATAGAGAAGGGGAAGCCAGGTGATCATGCCAAATGGAGCACCAGAGAAGTCTCC AGGCTTTGTCACATAGCTTCATTAAATAGTGGCAATGACTGTGAACCCATTGCTGAAATCAACCAGGGAATTTCACAAGT TCCTGATCAAAACCAATCAGTTTCTCCAGCAAGGTCATCTGATAGTAAAACCATATATGCAAAAAACTTCCTGGAGGAAA GGCTTGATGAGCAAACCACGTGCCCTGAAACAACTATGGGAAATGAAAGCTTAGCCCAAAGCAGCATTAGTTTAGTTAGT 
CAAAGTAATAGCAGAGAATATGTTTCTAACCCAATTGACTTAATT------ATCAGCGTAGGCTCCAGT----------- ----------------------------------------------------GGAGAAGGCACTCAGGCAGAAAAAAGTG AAAACAAAGAATCTGAATTAAATACACCACCCAAATTAAAGCTTATGCAACCCCAAGTCTCTCAACAGACCTTT---CCT CAGAATAATTGCAAA----------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ------------------------------------------------- >Wombat ---------------NNTGCCAGTTCATTACTTCCTAAAACCAGCAGTTTATTGCCCAACACAGACAGAATGAATGTAGA AAAGGCTGAACTCTGCCATAAAAACAAACAGCCTGCCTTAGGAAGGAACCAAGAGAGCAGTCTCGATGAAAGTAAGGAAA TATGTAGCCCTGGGAAGACCCCAGGTGCCTGTGAG---CGGCATGAGCTGAATGACCATCATCTGTGTGAGAGGCAAGGC CTAGAGGAG---AAGCCGCAGCACCCTGAGAGCCCTAGAGGTAATCCTCAGAACTGCCTGTCTGGAACCAAACTGAAAAG CAGCATTCAGAAAGTTAATGAGTGGTTATCCAGAAGTAGTGACATTTTAGCCTCTGATAACTCCAACGGTAGGAGCCATG AGCAGAGCGCAGAGGTGCCTAGTGCCTTAGAAGATGGGCATCCAGATACCGCAGAGGGAAATTCTAGCGTTTCTGAGAAG ACTGACCCAGTGGCTGACAGCACTGATGATGCCTGGCTACATGTGCCTAAAAGAAGCTGTCTCAGGCCTGCAGAAAAC-- -AATATTGAAGATAAAATATTTGGAAAAACCTATCGGAGAAAGTCAGGTCACCCTAATTTGAATTATGTAACTGAAAACT TGTTTGCTGAAGCTGTGGCTCCTGATTCCTTGATCCCTCTA--------------------------------------- ---------------------GAGGCTCCCAAAAACACCAGGTTAAAGCGTAAAGGAAGGAGCATAGCTGACCTGCAGCC TGAGGATTTCATCAGGAAGACGGACGTTCCAGTTATTCACAAAACCCCTGAAAAGAAAAATCACTCTGTTGACCAAATTC TCAAAAGAGAACAAAGTGACCAAGTGATGAACACGGCTAACAGTCTTCCTGAGCAGAAAGCCCTAGGTGGTCAT---GTA GGGGAAGTGAAAGATGTTCAGGTATTAGAGCTGTTCTCTGCGGAGAAAGAATCCACTTTCAGAACTGGAACAGAGCCTGT AGCTGGCAGCACAAACCATGGGGAGCTTGAATTAAATAGTAGAAATGCCAAAATGACAAGAAAAGACAGGCTGAGGAAGA AGCCTTCAGCCAGGATCGTCCGTGCTCTTGAGCTCGTAGTTGATAGAAACCCAAGCTCTTCTAATGAGAGTGAACTGCAG ATCGATAGCTATCCCAGCAGTGAAGAG---ATAAGGAAAGGAACTAATTCTGAACAGAAGCAAATCAGACGCAGCAGAAG 
GCTTCGGCTGCTATCAGAA---GAAATTGTGGTTGGAGCCAAGAAGGCCCATAAGCCAGATGACCAAGCAGAAGAAAGTT GTATCAGTGAAGTTTTCCCAGAACTAAAAATAGGAAACGTGCCTGCCTGTGCTACTGACAGTCTAACTACTGATAGGGAT CAAGTGTTAGCTAGCTGCAGTTTCACAGAAGAAAGAGATGAAAGGAGC---CTGGAAGCAATCCCA---------AGCAG CAAAGACCAAGAT------CTGCCCTTGAATGGAGGGGAGGGGTTGCAA---GGTGAAAGAGCCCCAGGAAGCCTGGAGG CTTTGGAGGTTCCTGATACTGATTGGGACACTCAGGACAGTACCTCATTGTTTCCAGCTGATACTCCCCAA---AATTCA AAA---GCAGGACCCAGTCCTCACAGAAGCCACAGTGAAATAATGGAAACCCCCAAGGAACTCTTAGATGGTTGTTCATC CAAAAACACTGAAAGTGACGAAGAAGATTTGAGGAGCCTGATGAGACAGGAAGTTAAA---AATGCCTCCAAGACAACCA CAGAAATGGAGGATAGTGAACTCGACACCCAGTATTTACAAAATACCTTCAAACGTTCAAAGCGCCAGTCATTTGCTCTG CGTTCTAGCCCAAGG------CAGGAATGTAGGAAACCCTCTGCTGTCCCTGGGACTGTAAATCAGCAGAGTCCAGATAA CACCACAGATTGTGGGGGCCAGGAAAAAGAAAAGCAGGGAAACAGAGAATCAAACAAGCCTGTGTGGCCAAAGTCTGCAG TCATGAGCTTAGCTGCGGCTTGTCAGACAGAGGAGAGGAAGCCAGGTGTTTATGCCAAATGTAGCACCACAGAAGTGTCC AGGCTTTGTCACATAGCTCCATTACATGGTGTCATTGACTGTGAACACATTGCTGGAAACAACCAGGGAATTTCGCAAGT TCCTGATCAAAAACCATCAGTTTCTCCAGCAAGGTCATCTGCTAGTAAAACTATAAATACAAAAAACCTCCTGGAGGAAA GGCTTGATGAACAGACCACATGTCCTGAAACAGCTATGGGAAATGAAAGCTTAGCCCAAAGCAGCTTAAGTCCAGTAAGC CAAAGTAATAGCAGAGAATATATTTCTAAAGCAACTGACTTAAATAGATTTATCAGCATAGGCTCTAGT----------- ----------------------------------------------------GGAGAAGGCAGTCAGGCACAAAAAGGTG AAAACAAAGAATCTGAATTAAATACACAACCCAAATTAAAGCTTGTGCAACCCCAAATCTGTCAACAAAGCTTT---CCT CAGGATAATTGCAAAGAGTCTAAAAGAAAAGGGAAGGGAGGAAATGGAAAATTAGCTCAGGCCATCAGTACAGATTCATC TCCATGTTTAGAACAA------------------ACTAAAGAGAGTACACATTCTTCTCAGGTTTGTTCTGAGACACCTG ATGACCT------------------------------------------------------------------------- -------------------------------------------------------------------------------- ------------------------------------------------- >Bandicoot ---------------------NACTCATTAATGCTTGAAACCAGCAGTTTATTGTCCAACATAGACAGAATGACTACAAA AAAGGCTGAACTCTGTAATAAAAGTAAAGATCCTGGCTTAGCAAGGAACCAACAGGACTGTTTGGGTGAAAGTAAAGAAA 
TAAGTAGCACTGTGAAGACCCCAGCTGCAGGGGAG---CAGCATAAGCTGAATGCCCACCATCTGTGTGAGAAGCAAGAA CTAGAGGTA---AAGTCACAGCACCCTGAGAGCCCCAAAGGTAGTCCTCGGAGGTGCCGGTCTGGAACCAAACTGAAAAG CAGTATTCAGAAAGTCAGTGAGTGGTTGTCTAGAAGTAATGACATTTTAAACTCTGATTATTCCCATGAGAGAAGCCAGG AGCAGAATGCAGAGATTGCTATTGCCTTAGAAGATGGGCATCTGAATACTGCTGATGGAAATTCTAGCATTTCTGAGAAG ACTGNCCTGGTGGCTGNCACCACTGAT---------------GTGCTGGAAACTAGCTTNTCCAGGCCAGCTAAAAGC-- -AATATTGAAGATAAAATATTTGGGAAGACCTATCGGAGAAAGTCANGTCACTCTAATTTGAATTATCTAACTGAAAACC TCCTTGTTGAAACTATTGCTCCTGATTCTTTGATCCCTCCA--------------------------------------- ---------------------GAATCTCTCAAGAACATCAAGTTAAAGCGAAAAAGAAAAACTATATCTGACCTGCAGCC TGAAGATTTCATCAGGAAGACAGATGTTCCAGTTAGTCACAAAACCCCTGAAAAGAAAAATCCTGCTGTTGACCAGATTC TTGAAAGAGAACAAAATGACCAAGTGATGAACACAGCTAATGGTCATCTTAAACAGAAAGCACTGGGTGATCAT---GTC AAGGAAGTGAAAGATGCTCAGGCAGCAGAACTGTTCTCTACAGAGAAAGAATCAACATTCAGGACTAGAACAGAGCCTAA AGCTGGCAGCATAGTCCACGGGGAGCTCGAATTAAATGGGAGAGGTGCCAAAATGACAAAAAAGGACAGGTTGAGGAAAA AGCCTTCAGCCAGGATTGTCCGTGCTCTTGAGCTTGTGGCTGATAAAAACTCAGGCTCTTCTAATGAGGCTGAATTACAG ATCGATAGTTATCCCAGCAGTGATGAG---ATAAGGAAAGGAAATAATTCTGAACAGAAGCAAATCAGACGCAGCAGAAG ACTTCAGCTACTGTCAGAA---GAAATTGCAGTTGGAACCAAGAAGGCCCATGAGCCATATGACCAAGCAGAAGAAAGAT GTGTCAAGAAGGTTTTCCCAGAATTGGAAATGGGAACTGTGTCTGCTGGTGCTACTGACAGCCTATCTACTGATAGGGAT GAAGTGTTAGCTAGTTGCAGTTTCACAGAT---GGAGAGGAAAGGAAC---CTGGAAGTAATCCCA---------AGCAG CAAAGACCAAGACCAAGATCTGGCATTAAGTGAAAGGGAAAGGTTGCAA---GAAAAAAGAACCCAAGGAAACCTGGAGC TTCTGGAGGTTCCAGATACTTATTGGGAAACTCAAGACAGTACTTCACTGTTTCCAGCTGAAACTCCCCAG---AATTCA AAA---GCAGGACCCAGTCCTCACAGAAGTAACTGTGAAATAATGGAATCCTCCAAAGAACTCTTCGATGCTTATTCATC CAAAAACACTGACAGTGGCACTGAAGGTTTG---------ATAAGACAGGAAATTAAA---TATGGCTCTGAGACATCTT CAGTAATGGAGGATAGTGAACTTGACACCCAGTATTTGCAGAATACCTTCAAACGTTCTAAGCGCCAAACATTTGCTCTG TGTTCCAACCCAAAG------CAGGAACAGATAAAACCCTGCTCTGTTCCTAGGGCTGTCAGCCACCAGAGTTCAGATAA TGCCTCAGACTGTGGGGGCCAAGAAAAAGAAAAGCAAGGAAACAGAGAATCAAATAAACCTGGCCAACTAGCATCTGCAG 
TCAGGAGCTCAGCTGCCACTTGTCAGACA---GAGAGGAAGCCAGGTGATCCTGACAAATGTAGCGCCACAGGAGTTTCC AGACTTTGTCACATAGCTCCATTACAAGAAGGCAATGACTGTGAATACATTGCTGGAAAAAAACAGGGAATTTCACAAGT TCCTAATCAAAAACAATCAGTTTCTCCGACAAGGTCATCAGTTAGTAAA---ATATATACAAAAAACCTCTTGAAAGATA GACTT---GAACAGACCACATGCCCTGAAACAGTTATGGGAAATGAAAGCTTAGGCCAAAGCAGCTTAAGTCCAGTTAGC CAAAATAACAGCAGAGAATATATTTCTAAAGCAACTGACTTAAATAGATTTATTAGCAGGGACTCTAGT----------- ----------------------------------------------------GGAGAAGACAGTCGGGCAGAAAAAGGTA AAATCAAAGAATCTGAATTAAATACACCAACCAAATTCAAACTTGTGCAACCACAAGTATGTCAACTAAGCTTT---TCT CAGGATAATTGCAAAGAGCCAAAAAGAAAAGGGAAAGGAGGAAATGGAATATTAGCTCTGGCCACCAGTACAGATTCATC TCCATGTTTAAAAGAA------------------ACTAAAGAGAGTACACATTCTTCTCAGG------------------ -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- ------------------------------------------------- PyCogent-1.5.3/tests/data/brca1_5.250.paml000644 000765 000024 00000002453 10665667404 020731 0ustar00jrideoutstaff000000 000000 5 250 NineBande tgtggcacaaatactcatgccaacttattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggcgccaacagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact cccagcatagagaaaaaggtagatgtggatgctgatcccctgtatgggcgaaaagaactg aataagcaga Mouse tgtggcacagatgctcatgccagctcattacagcctgagaccagcagtttattgctcatt gaagacagaatgaatgcagaaaaggctgaattctgtaataaaagcaaacagcctggcata gcagtgagccagcagagcagatgggctgcaagtaaaggaacatgtaacgacaggcaggtt cccagcactggggaaaaggtaggtccaaacgctgactcccttagtgatagagagaagtgg actcacccgc Human tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact cccagcacagaaaaaaaggtagatctgaatgctgatcccctgtgtgagagaaaagaatgg aataagcaga HowlerMon tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttgttactcact aaagacacactgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta 
gcaaggagccaacataacagatgggctgaaagtgaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagatgtggatgctgatcccctgcatgggagaaaagaatgg aataagcaga DogFaced tgtggcacaaatactcatgccaactcattacagcatgagaacagcagtttattatacact aaagacagaatgaatgtagaaaagactgacttctgtaataaaagcaaacagcctggctta gcaaggagccagcagaacagatgggttgaaactaaggaaacatgtaatgataggcagact tccagca?agagaaaaaggtagttctgaatgctgatcccctgaatggaagaataaaactg aataagcaga PyCogent-1.5.3/tests/data/brca1_5.paml000644 000765 000024 00000000540 10665667404 020417 0ustar00jrideoutstaff000000 000000 5 60 NineBande gcaaggcgccaacagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact Mouse gcagtgagccagcagagcagatgggctgcaagtaaaggaacatgtaacgacaggcaggtt Human gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact HowlerMon gcaaggagccaacataacagatgggctgaaagtgaggaaacatgtaatgataggcagact DogFaced gcaaggagccagcagaacagatgggttgaaactaaggaaacatgtaatgataggcagact PyCogent-1.5.3/tests/data/brca1_5.tree000644 000765 000024 00000000057 10665667404 020430 0ustar00jrideoutstaff000000 000000 (((Human,HowlerMon),Mouse),NineBande,DogFaced);PyCogent-1.5.3/tests/data/F6AVWTA01.sff000644 000765 000024 00000103100 11446435255 020236 0ustar00jrideoutstaff000000 000000 .sff‚¸„¸TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTCAG GA202I001ER3QLebh_anffnceÃf`_eb ¾ ÉÂekh Æ hi'Å eÃbe dfde` f fË Ä Î g gfß fÊ Æà fddgdfÈ„ add Å gh eh É ccj gfÊl %ËgÊkxg ¼ceh\e \ e`Ê ] ÊÃefejk” h bind_ifÐÉ ÃcVÊ -ÎUe g `fo]Âo _ hbg cÀnb[ \ fif df Ëk kg Êlgc ÉÏ^_ à Á 2 d K [ ÍË ko ijÐ ÊlTgl\]_^Ï ghÊÙ ÅZlb}hXc»jm d Q \0pk 
[binary data omitted: 454/SFF flowgram records for reads GA202I001DBRNC, GA202I001DJLC5, GA202I001D4EKM, GA202I001D7W79, GA202I001ECKP7, GA202I001DUJYF, GA202I001EEVFW, GA202I001EZXR9, GA202I001DA5ZC, GA202I001B9RKV, GA202I001E1SL6, GA202I001EGORF, GA202I001DRQUB, GA202I001COCJJ, GA202I001EK5WE, GA202I001C3FPU, GA202I001B35KA, GA202I001C31UY and GA202I001CXAFL — basecalls with encoded flowgram/quality values, not representable as text]
[binary data omitted: one further 454/SFF flowgram record, the SFF manifest (.mft 1.00, 454, GA202I001; run R_2010_01_22_13_28_56_FLX02080328_adminrig_Bushman_CaFE; analysis D_2010_01_22_13_30_28_FLX02080328_fullProcessingAmplicons; path /data/R_2010_01_22_13_28_56_FLX02080328_adminrig_Bushman_CaFE/D_2010_01_22_13_30_28_FLX02080328_fullProcessingAmplicons/; software version 1.1.03) and the read index covering GA202I001B35KA through GA202I001DA5ZC]
>Rhesus with extra words tgtggcacaaatactcatgccagctcattacagcatgagaac---agtttgttactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggcttg gcaaggagccaacataacagatggactggaagtaaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagatctgaatgctaatgccctgtatgagagaaaagaatgg aataagcaaaaactgccatgctctgagaatcctagagacactgaagatgttccttgg >Manatee tgtggcacaaatactcatgccagctcattacagcatgagaatagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtcataaaagcaaacagcctggctta acaaggagccagcagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact cctagcacagagaaaaaggtagatatgaatgctaatccattgtatgagagaaaagaagtg aataagcagaaacctccatgctccgagagtgttagagatacacaagatattccttgg >Pig tgtggcacagatactcatgccagctcgttacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattttgtaataaaagcaagcagcctgtctta gcaaagagccaacagagcagatgggctgaaagtaagggcacatgtaatgataggcagact cctaacacagagaaaaaggtagttctgaatactgatctcctgtatgggagaaacgaactg aataagcagaaacctgcgtgctctgacagtcctagagattcccaagatgttccttgg >GoldenMol tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact
aaagacagaatgaatgtagaaaaggctgaattctgtaataaaaacaaacagtctggctta gcgaggagccagcagagcagatgggctggaagtaaggcagcgtgcaatgacaagcagact cctagcacacagacagagctatataggagtgctggtcccatgcacaggagaaaagaagta aataagctgaaatctccatggtctgagagtcctggagctacccaagagattccttgg >Rat tgtggcacagatgctcgtgccagctcattacagcgtgggacccgcagtttattgttcact gaggacagactggatgcagaaaaggctgaattctgtgatagaagcaaacagtctggcgca gcagtgagccagcagagcagatgggctgacagtaaagaaacatgtaatggcaggccggtt ccccgcactgagggaaaggcagatccaaatgtggattccctctgtggtagaaagcagtgg aatcatccgaaaagcctgtgccctgagaattctggagctaccactgacgttccttgg >Human tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact cccagcacagaaaaaaaggtagatctgaatgctgatcccctgtgtgagagaaaagaatgg aataagcagaaactgccatgctcagagaatcctagagatactgaagatgttccttgg >Jackrabbit ------------------------------------------------------------ ---------------------aaggctgaattctgtaataagagcaaacagcctggctta gcaagaagccaacagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagttctgaatgttgactgcctgtatgggagaaaacaacag gataagcagaaacctccatgccctgagacctctggagataaccaagatgtttcttgg >FlyingSqu tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccagcagagcagatgggctaaaagtaaggaaacctgtaatgataggcaaatt cccagctcagagaaaaaggtagatttgaatgctgatccccaatatgagaaaaaagaacca agtaagcagaaacatccatgctctgagaattccagagatacccaagatgttccttgg >FreeTaile tgtggcacagatactcatgccagctcattacagcatgagaacagcagtttactactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaagcagcctggctta gcaaagagccagcagagcagatgggctgaaagtaaagaaacatgtaatgataggcagact ctcagcacagagaaaagggtagttctgaatgctgatcctctgaat--------------- aggagaaaagaacctccaggctctaactatcctagagattcccaagatgttccttgg >Mole tgtggcataaatactcatgccagcttattacagcatgaaaacagcagtttattactcact gaaaacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctgactta gcaaagagccagcagaacagatgggctgaaagtaaagaaacatgtaatgataggcagact 
tccagcccagagaaaagggtagacccgaatgctgatcccatgtatgggagaaaagaactg aataagcagaaacctccatgctctgacagccccagaaattcccaaggtgttgcctgg PyCogent-1.5.3/tests/data/formattest.gde000644 000765 000024 00000006041 10665667404 021203 0ustar00jrideoutstaff000000 000000 %Rhesus tgtggcacaaatactcatgccagctcattacagcatgagaac---agtttgttactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggcttg gcaaggagccaacataacagatggactggaagtaaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagatctgaatgctaatgccctgtatgagagaaaagaatgg aataagcaaaaactgccatgctctgagaatcctagagacactgaagatgttccttgg %Manatee tgtggcacaaatactcatgccagctcattacagcatgagaatagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtcataaaagcaaacagcctggctta acaaggagccagcagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact cctagcacagagaaaaaggtagatatgaatgctaatccattgtatgagagaaaagaagtg aataagcagaaacctccatgctccgagagtgttagagatacacaagatattccttgg %Pig tgtggcacagatactcatgccagctcgttacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattttgtaataaaagcaagcagcctgtctta gcaaagagccaacagagcagatgggctgaaagtaagggcacatgtaatgataggcagact cctaacacagagaaaaaggtagttctgaatactgatctcctgtatgggagaaacgaactg aataagcagaaacctgcgtgctctgacagtcctagagattcccaagatgttccttgg %GoldenMol tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaaacaaacagtctggctta gcgaggagccagcagagcagatgggctggaagtaaggcagcgtgcaatgacaagcagact cctagcacacagacagagctatataggagtgctggtcccatgcacaggagaaaagaagta aataagctgaaatctccatggtctgagagtcctggagctacccaagagattccttgg %Rat tgtggcacagatgctcgtgccagctcattacagcgtgggacccgcagtttattgttcact gaggacagactggatgcagaaaaggctgaattctgtgatagaagcaaacagtctggcgca gcagtgagccagcagagcagatgggctgacagtaaagaaacatgtaatggcaggccggtt ccccgcactgagggaaaggcagatccaaatgtggattccctctgtggtagaaagcagtgg aatcatccgaaaagcctgtgccctgagaattctggagctaccactgacgttccttgg %Human tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact 
cccagcacagaaaaaaaggtagatctgaatgctgatcccctgtgtgagagaaaagaatgg aataagcagaaactgccatgctcagagaatcctagagatactgaagatgttccttgg %Jackrabbit ------------------------------------------------------------ ---------------------aaggctgaattctgtaataagagcaaacagcctggctta gcaagaagccaacagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagttctgaatgttgactgcctgtatgggagaaaacaacag gataagcagaaacctccatgccctgagacctctggagataaccaagatgtttcttgg %FlyingSqu tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccagcagagcagatgggctaaaagtaaggaaacctgtaatgataggcaaatt cccagctcagagaaaaaggtagatttgaatgctgatccccaatatgagaaaaaagaacca agtaagcagaaacatccatgctctgagaattccagagatacccaagatgttccttgg %FreeTaile tgtggcacagatactcatgccagctcattacagcatgagaacagcagtttactactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaagcagcctggctta gcaaagagccagcagagcagatgggctgaaagtaaagaaacatgtaatgataggcagact ctcagcacagagaaaagggtagttctgaatgctgatcctctgaat--------------- aggagaaaagaacctccaggctctaactatcctagagattcccaagatgttccttgg %Mole tgtggcataaatactcatgccagcttattacagcatgaaaacagcagtttattactcact gaaaacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctgactta gcaaagagccagcagaacagatgggctgaaagtaaagaaacatgtaatgataggcagact tccagcccagagaaaagggtagacccgaatgctgatcccatgtatgggagaaaagaactg aataagcagaaacctccatgctctgacagccccagaaattcccaaggtgttgcctgg PyCogent-1.5.3/tests/data/formattest.msf000644 000765 000024 00000006653 10665667404 021242 0ustar00jrideoutstaff000000 000000 This file is an example of the sort of broken MSF many programs produce Features that cannot be relied on are the header !!AA_MULTIPLE_ALIGNMENT 1.0 that should be on the first line, checksum, date type etc. The other main variance is position numbers on the line above the sequence which is on only the second block in this test D:\MHC.msf MSF: 184 Type: N January 01, 1776 12:00 Check: 0 .. 
Name: TrvuDRB2 Len: 184 Check: 0 Weight: 1.00 Name: TrvuDRB3 Len: 184 Check: 0 Weight: 1.00 Name: MaruDAB1 Len: 184 Check: 0 Weight: 1.00 Name: ModoDAB1 Len: 184 Check: 0 Weight: 1.00 Name: MaruDBB Len: 184 Check: 0 Weight: 1.00 Name: TvMHC2_5 Len: 184 Check: 0 Weight: 1.00 Name: HosaDOB Len: 184 Check: 0 Weight: 1.00 Name: PatrDOB Len: 184 Check: 0 Weight: 1.00 Name: OrcuDOB Len: 184 Check: 0 Weight: 1.00 // TrvuDRB2 ---MLSVLLP KGVWIEVLAV TLLVLASQVA AGRHA-PEHF TEQGKSECHF TrvuDRB3 ---MPCVLLP KGIWIEVLAV TLLVLTSQVA SGRHA-PEHF TGQFKYECHF MaruDAB1 ---------- --------AV ILLVLTSQVA AGRHA-PKDF LWQAKSECHF ModoDAB1 ---MVSAQPL GGIWMEVLVV TLLVLTAQVA ADRRP-PKHF TEYSTSECYF MaruDBB ---MVDVRIS AGCWKIGLLM TSMLLSLSAS WARDI-PEDF VYQYKAECYF TvMHC2_5 ---MVDVWIS ADYWKIGLLM TLTVLSLSAS WARDI-PEDF MRQHKAECYF HosaDOB ------MGSG WVPWVVALLV NLTQLDSSMT QGTDS-PEDF VIQAKADCYF PatrDOB ------MGSG WVPWVVALLV NLTRLDSSMT QGTDS-PEDF VIQAKADCYF OrcuDOB ------MDSK WVPWVVALLV NLIRLDYSMT Q-GDS-PEDF VIQAKADCYF 51 100 TrvuDRB2 VNGTETVRFV ERHVYNREEF MRFDSDVGEY VALTELGRRQ AEYYNS-QKD TrvuDRB3 VNGTEHVRFV ERHIYNREEF MRFDSAVGEF VALTELGRRH AELWNS-RKE MaruDAB1 ENGTQHVRFM DRYFYNRQEA LRFDSDVGEY VAVSELGRLQ AELWNK-REE ModoDAB1 INGTDQVRLI ERYIYNREEY VRFDSNVGVY EAVTELGRGI AEHLNS-QKE MaruDBB TNGSERVRFV VRDIYNGEEF ARFDSDVGVY VPVTELGRPD AEYWNS-QEE TvMHC2_5 TNGTERVRFV ERYIYNDQED VRFDSEVGEY VAVTELGRPD AEYWNS-QKE HosaDOB TNGTEKVQFV VRFIFNLEEY VRFDSDVGMF VALTKLGQPD AEQWNS-RLD PatrDOB TNGTEKVQFV VRFIFNLEEY VRFDSDVGMF VALTKLGQPD AEQWNS-RLD OrcuDOB TNGTEQVRVV VRFIFNLEEY AHFDSDVGMF VALTELGQPD AESWNN-RPD TrvuDRB2 YMESRRGQVD NYCRHNYE-V IEPFSVRRRV EPEVTVYPSK TAPLGHHNLL TrvuDRB3 ILEGRRAQVD -TCRHNYE-V IEPFSVRRRV EPEVTVYPSK TAPLGHHNLL MaruDAB1 ILEDARAAVD TLCRHNYE-I LERFLVPRRV EPEVTVYPSK LAPLGHHNLL ModoDAB1 RLDYLLAAVD TYCRHNYE-I SEPFLVRRRV EPEVLVYPSK TAPVGHHNLL MaruDBB ILEKERAYVD TLCRHNYE-A AKPFTLDRRV QPRVTISPSK --TDALQHLL TvMHC2_5 LLEEKRAEVD TLCRHNYE-A GKSFTVDRRV QPRVTISPSR --TEALQHLL HosaDOB LLERSRQAVD GVCRHNYR-L GAPFTVGRKV QPEVTVYPER 
TPLLHQHNLL PatrDOB LLERSRQAVD GVCRHNYR-L GAPFTVGRKV QPEVTVYPER TPLLHQHNLL OrcuDOB ILERSRRSVD FLCRRNYR-L GAPFTVGRKV PPEGTVIPER TPVCGQQPAA TrvuDRB2 GVGLIVHRRS QKGNRGSQPT GLLS------ ---- TrvuDRB3 GVGLVVHRRS QKGNRGSQPT GLLS------ ---- MaruDAB1 GVGLIVYKRN QKGNRGSQPT GLLS------ ---- ModoDAB1 GGGLIVHMRS QKANRGSQPA GLLS------ ---- MaruDBB SVGLIIHLKN KKGHTGPQPT GLLG------ ---- TvMHC2_5 GVGLIIHLKN QKGRTGPQPA GLLR------ ---- HosaDOB LVGIVIQLRA QKGYVRTQMS GNEVSRAVLL PQSC PatrDOB LVGIVIQLRA QKGYVRTQMS GNEVSRAVLL PQSC OrcuDOB LVGTIIHIRA WKICGDTALC CC-------- ---- PyCogent-1.5.3/tests/data/formattest.paml000644 000765 000024 00000006037 10665667404 021402 0ustar00jrideoutstaff000000 000000 10 297 Rhesus tgtggcacaaatactcatgccagctcattacagcatgagaac---agtttgttactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggcttg gcaaggagccaacataacagatggactggaagtaaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagatctgaatgctaatgccctgtatgagagaaaagaatgg aataagcaaaaactgccatgctctgagaatcctagagacactgaagatgttccttgg Manatee tgtggcacaaatactcatgccagctcattacagcatgagaatagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtcataaaagcaaacagcctggctta acaaggagccagcagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact cctagcacagagaaaaaggtagatatgaatgctaatccattgtatgagagaaaagaagtg aataagcagaaacctccatgctccgagagtgttagagatacacaagatattccttgg Pig tgtggcacagatactcatgccagctcgttacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattttgtaataaaagcaagcagcctgtctta gcaaagagccaacagagcagatgggctgaaagtaagggcacatgtaatgataggcagact cctaacacagagaaaaaggtagttctgaatactgatctcctgtatgggagaaacgaactg aataagcagaaacctgcgtgctctgacagtcctagagattcccaagatgttccttgg GoldenMol tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaaacaaacagtctggctta gcgaggagccagcagagcagatgggctggaagtaaggcagcgtgcaatgacaagcagact cctagcacacagacagagctatataggagtgctggtcccatgcacaggagaaaagaagta aataagctgaaatctccatggtctgagagtcctggagctacccaagagattccttgg Rat tgtggcacagatgctcgtgccagctcattacagcgtgggacccgcagtttattgttcact 
gaggacagactggatgcagaaaaggctgaattctgtgatagaagcaaacagtctggcgca gcagtgagccagcagagcagatgggctgacagtaaagaaacatgtaatggcaggccggtt ccccgcactgagggaaaggcagatccaaatgtggattccctctgtggtagaaagcagtgg aatcatccgaaaagcctgtgccctgagaattctggagctaccactgacgttccttgg Human tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact cccagcacagaaaaaaaggtagatctgaatgctgatcccctgtgtgagagaaaagaatgg aataagcagaaactgccatgctcagagaatcctagagatactgaagatgttccttgg Jackrabbit ------------------------------------------------------------ ---------------------aaggctgaattctgtaataagagcaaacagcctggctta gcaagaagccaacagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagttctgaatgttgactgcctgtatgggagaaaacaacag gataagcagaaacctccatgccctgagacctctggagataaccaagatgtttcttgg FlyingSqu tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccagcagagcagatgggctaaaagtaaggaaacctgtaatgataggcaaatt cccagctcagagaaaaaggtagatttgaatgctgatccccaatatgagaaaaaagaacca agtaagcagaaacatccatgctctgagaattccagagatacccaagatgttccttgg FreeTaile tgtggcacagatactcatgccagctcattacagcatgagaacagcagtttactactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaagcagcctggctta gcaaagagccagcagagcagatgggctgaaagtaaagaaacatgtaatgataggcagact ctcagcacagagaaaagggtagttctgaatgctgatcctctgaat--------------- aggagaaaagaacctccaggctctaactatcctagagattcccaagatgttccttgg Mole tgtggcataaatactcatgccagcttattacagcatgaaaacagcagtttattactcact gaaaacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctgactta gcaaagagccagcagaacagatgggctgaaagtaaagaaacatgtaatgataggcagact tccagcccagagaaaagggtagacccgaatgctgatcccatgtatgggagaaaagaactg aataagcagaaacctccatgctctgacagccccagaaattcccaaggtgttgcctgg PyCogent-1.5.3/tests/data/formattest.phylip000644 000765 000024 00000006710 10665667404 021754 0ustar00jrideoutstaff000000 000000 10 297 Rhesus tgtggcacaaatactcatgccagctcattacagcatgagaac---agtttgttactcact 
aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggcttg gcaaggagccaacataacagatggactggaagtaaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagatctgaatgctaatgccctgtatgagagaaaagaatgg aataagcaaaaactgccatgctctgagaatcctagagacactgaagatgttccttgg Manatee tgtggcacaaatactcatgccagctcattacagcatgagaatagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtcataaaagcaaacagcctggctta acaaggagccagcagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact cctagcacagagaaaaaggtagatatgaatgctaatccattgtatgagagaaaagaagtg aataagcagaaacctccatgctccgagagtgttagagatacacaagatattccttgg Pig tgtggcacagatactcatgccagctcgttacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattttgtaataaaagcaagcagcctgtctta gcaaagagccaacagagcagatgggctgaaagtaagggcacatgtaatgataggcagact cctaacacagagaaaaaggtagttctgaatactgatctcctgtatgggagaaacgaactg aataagcagaaacctgcgtgctctgacagtcctagagattcccaagatgttccttgg GoldenMo tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaaacaaacagtctggctta gcgaggagccagcagagcagatgggctggaagtaaggcagcgtgcaatgacaagcagact cctagcacacagacagagctatataggagtgctggtcccatgcacaggagaaaagaagta aataagctgaaatctccatggtctgagagtcctggagctacccaagagattccttgg Rat tgtggcacagatgctcgtgccagctcattacagcgtgggacccgcagtttattgttcact gaggacagactggatgcagaaaaggctgaattctgtgatagaagcaaacagtctggcgca gcagtgagccagcagagcagatgggctgacagtaaagaaacatgtaatggcaggccggtt ccccgcactgagggaaaggcagatccaaatgtggattccctctgtggtagaaagcagtgg aatcatccgaaaagcctgtgccctgagaattctggagctaccactgacgttccttgg Human tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact cccagcacagaaaaaaaggtagatctgaatgctgatcccctgtgtgagagaaaagaatgg aataagcagaaactgccatgctcagagaatcctagagatactgaagatgttccttgg Jackrabb ------------------------------------------------------------ ---------------------aaggctgaattctgtaataagagcaaacagcctggctta gcaagaagccaacagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact 
cccagcacagagaaaaaggtagttctgaatgttgactgcctgtatgggagaaaacaacag gataagcagaaacctccatgccctgagacctctggagataaccaagatgtttcttgg FlyingSq tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccagcagagcagatgggctaaaagtaaggaaacctgtaatgataggcaaatt cccagctcagagaaaaaggtagatttgaatgctgatccccaatatgagaaaaaagaacca agtaagcagaaacatccatgctctgagaattccagagatacccaagatgttccttgg FreeTail tgtggcacagatactcatgccagctcattacagcatgagaacagcagtttactactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaagcagcctggctta gcaaagagccagcagagcagatgggctgaaagtaaagaaacatgtaatgataggcagact ctcagcacagagaaaagggtagttctgaatgctgatcctctgaat--------------- aggagaaaagaacctccaggctctaactatcctagagattcccaagatgttccttgg Mole tgtggcataaatactcatgccagcttattacagcatgaaaacagcagtttattactcact gaaaacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctgactta gcaaagagccagcagaacagatgggctgaaagtaaagaaacatgtaatgataggcagact tccagcccagagaaaagggtagacccgaatgctgatcccatgtatgggagaaaagaactg aataagcagaaacctccatgctctgacagccccagaaattcccaaggtgttgcctgg PyCogent-1.5.3/tests/data/Homo_sapiens_codon_usage.pri000644 000765 000024 00001606061 10665667404 024051 0ustar00jrideoutstaff000000 000000 >BX649449#3\BX649449\join(44979..45265,45932..46513,48260..48821)\1431\CAH71689.1\Homo sapiens\Human DNA sequence from clone RP11-248D7 on chromosome 1 Containstwo genes for novel proteins similar to preferentially expressedantigen in melanoma (PRAME) and a preferentially expressed antigenin melanoma (PRAME) pseudogene.\n\, complete sequence./locus_tag="RP11-248D7.3-001"/standard_name="OTTHUMP00000011089"/codon_start=1/product="novel protein similar to preferentially expressed antigen in melanoma (PRAME)"/protein_id="CAH71689.1"/db_xref="GI:55666023"/db_xref="InterPro:IPR001611"/db_xref="UniProt/TrEMBL:Q5TYX0" 0 6 3 0 7 15 6 19 37 11 6 6 2 6 1 5 9 5 4 12 0 7 6 11 1 7 4 13 0 5 3 5 4 1 2 5 12 7 8 19 11 6 10 14 6 3 8 17 12 12 5 3 14 13 15 4 4 12 4 15 8 0 0 1 
>BC002704\BC002704\307..2445\2139\AAH02704.1\Homo sapiens\Homo sapiens signal transducer and activator of transcription 1,91kDa, transcript variant beta, mRNA (cDNA clone MGC:3493IMAGE:3627218), complete cds./gene="STAT1"/codon_start=1/product="signal transducer and activator of transcription 1, isoform beta"/protein_id="AAH02704.1"/db_xref="GI:12803735"/db_xref="GeneID:6772"/db_xref="MIM:600555" 2 3 6 3 11 7 3 11 29 18 7 19 4 8 2 9 12 10 7 12 6 8 8 9 5 10 4 10 4 9 6 6 7 7 3 9 19 10 30 25 13 30 9 44 9 6 35 26 19 16 9 9 5 5 17 15 6 15 15 16 15 1 0 0 >HS445C9#3\Z95115\57994..58629\636\CAB62951.1\Homo sapiens\Human DNA sequence from clone CTA-445C9 on chromosome 22q12.1,complete sequence./locus_tag="CTA-445C9.2-001"/standard_name="OTTHUMP00000028722"/codon_start=1/protein_id="CAB62951.1"/db_xref="GI:6572242"/db_xref="Genew:4994"/db_xref="GOA:Q9UGV6"/db_xref="HSSP:1NHN"/db_xref="InterPro:IPR000135"/db_xref="InterPro:IPR000910"/db_xref="InterPro:IPR009071"/db_xref="UniProt/Swiss-Prot:Q9UGV6" 1 1 1 0 2 2 0 1 5 0 0 0 5 1 1 3 2 0 1 2 0 3 2 2 1 6 7 3 5 6 6 2 2 1 0 2 0 2 16 27 2 2 1 1 1 4 19 13 2 16 2 4 2 1 4 4 0 2 2 6 2 1 0 0 >BC037529\BC037529\240..560\321\AAH37529.2\Homo sapiens\Homo sapiens methylenetetrahydrofolate dehydrogenase (NADP+dependent) 2-like, mRNA (cDNA clone IMAGE:4794959), complete cds./gene="MTHFD2L"/codon_start=1/product="MTHFD2L protein"/protein_id="AAH37529.2"/db_xref="GI:34191218"/db_xref="GeneID:441024" 1 0 0 0 4 1 2 3 0 3 3 2 2 1 0 2 2 2 1 0 0 1 4 0 0 2 1 2 0 3 3 0 1 3 2 2 3 3 2 3 2 2 2 4 1 2 6 1 2 3 0 1 0 2 0 1 5 1 3 3 1 1 0 0 >BT019976\BT019976\1..579\579\AAV38779.1\Homo sapiens\Homo sapiens phosphomevalonate kinase mRNA, complete cds./codon_start=1/product="phosphomevalonate kinase"/protein_id="AAV38779.1"/db_xref="GI:54696814" 1 5 6 0 4 2 0 4 10 2 0 3 1 3 0 2 4 1 1 3 3 0 3 1 2 0 0 5 2 5 4 7 4 1 2 4 6 2 2 6 4 0 0 13 0 2 3 14 10 2 1 2 1 2 6 5 1 6 1 3 5 0 1 0 >BC006354\BC006354\268..945\678\AAH06354.1\Homo sapiens\Homo sapiens Coenzyme A synthase, 
mRNA (cDNA clone IMAGE:4123850),complete cds./gene="COASY"/codon_start=1/product="COASY protein"/protein_id="AAH06354.1"/db_xref="GI:13623499"/db_xref="GeneID:80347" 2 3 5 1 1 3 1 8 9 3 0 4 1 0 0 1 7 3 2 2 1 4 5 2 1 3 1 11 2 7 2 8 6 2 2 4 13 0 2 10 3 1 2 12 2 5 4 11 4 5 1 3 0 2 0 3 2 6 8 4 5 0 0 1 >AL136089#5\AL136089\complement(join(4948..5708,8394..8514,10823..10933, 11451..11563,11876..11987,12333..12394,14627..14791, 24237..24393))\1602\CAI20011.1\Homo sapiens\Human DNA sequence from clone RP1-278E11 on chromosome 6 Containsthe 3' end of the novel gene KIAA0381, the MOCS1 gene formolybdenum cofactor synthesis 1, a pseudogene similar to 60Sribosomal protein L23 (RPL23) and a pseudogene similar to tubulinbeta chain (TUBB), complete sequence./gene="MOCS1"/locus_tag="RP1-278E11.2-002"/standard_name="OTTHUMP00000016354"/codon_start=1/protein_id="CAI20011.1"/db_xref="GI:56204710"/db_xref="GOA:Q5TCE2"/db_xref="InterPro:IPR002820"/db_xref="InterPro:IPR007197"/db_xref="InterPro:IPR010505"/db_xref="UniProt/TrEMBL:Q5TCE2" 3 4 15 1 5 5 7 14 29 5 2 3 5 9 1 7 12 2 7 14 0 6 7 11 5 6 6 26 1 15 8 14 9 7 3 11 27 2 5 25 13 3 2 26 7 6 6 27 14 8 2 3 7 2 12 8 2 19 5 12 5 0 1 0 >HUMXE7A#2\L03426\join(167..928,929..1077,1078..1318,1319..1324)\1158\AAA61303.1\Homo sapiens\Human XE7 mRNA, complete alternate coding regions./gene="XE7"/codon_start=1/protein_id="AAA61303.1"/db_xref="GI:340387" 2 10 6 3 4 8 0 6 30 2 0 3 1 8 2 3 7 2 0 10 3 0 0 6 4 1 2 14 5 2 0 9 8 0 0 3 11 1 4 38 8 2 3 20 7 0 7 41 15 2 6 1 4 0 15 3 0 16 2 12 3 0 0 1 >AY623111\AY623111\join(2465..2513,3272..3362,27057..27132,40243..40335, 73734..73813,89444..89510,92446..92508,104917..105004, 112167..112397,117471..117609,133807..133966, 144333..144445,145517..145564,147600..147722, 148444..148777,167766..167795)\1785\AAT38107.1\Homo sapiens\Homo sapiens CDC14 cell division cycle 14 homolog A (S. cerevisiae)(CDC14A) gene, complete cds./gene="CDC14A"/codon_start=1/product="CDC14 cell division cycle 14 homolog A (S. 
cerevisiae)"/protein_id="AAT38107.1"/db_xref="GI:47777659" 6 1 4 3 16 7 8 7 10 7 8 10 12 15 2 14 12 8 14 12 1 7 13 7 2 8 19 13 2 8 16 9 7 7 4 2 8 6 25 13 21 18 6 10 9 7 17 15 9 17 11 16 5 6 16 17 9 10 11 8 3 1 0 0 >AK027045\AK027045\98..1822\1725\BAB15636.1\Homo sapiens\Homo sapiens cDNA: FLJ23392 fis, clone HEP17418./codon_start=1/protein_id="BAB15636.1"/db_xref="GI:10440063" 4 5 11 4 4 3 4 3 25 12 0 8 12 8 4 14 10 10 9 11 3 9 33 9 0 22 17 20 0 17 12 11 14 8 2 6 20 7 8 12 3 3 3 20 8 8 13 29 14 13 11 5 6 4 12 9 0 5 4 7 6 1 0 0 >AF238378#2\AF238378\complement(join(20771..20880,21461..21635))\285\AAT68884.1\Homo sapiens\Homo sapiens chromosome 8 clone SCb-561b17 map p23.1, completesequence./gene="Em:AF200455.14"/codon_start=1/product="defensin, alpha 1, myeloid-related sequence"/protein_id="AAT68884.1"/db_xref="GI:50057848" 0 1 0 1 3 2 0 4 2 2 0 1 1 1 0 0 1 0 0 2 0 0 5 0 1 0 6 6 2 5 3 1 0 0 0 0 2 3 1 1 1 0 0 5 0 1 3 3 2 1 1 2 6 0 1 0 1 3 3 2 2 0 0 1 >BC071876\BC071876\330..1262\933\AAH71876.1\Homo sapiens\Homo sapiens chromosome 19 open reading frame 27, mRNA (cDNA cloneMGC:88549 IMAGE:6672380), complete cds./gene="C19orf27"/codon_start=1/product="C19orf27 protein"/protein_id="AAH71876.1"/db_xref="GI:47939594"/db_xref="GeneID:81926" 0 14 0 2 1 4 0 10 19 0 0 1 0 9 8 1 10 1 0 9 5 0 0 11 7 5 1 22 3 1 0 14 9 3 0 4 12 1 0 8 5 2 0 8 7 0 0 18 13 0 14 2 10 1 12 0 0 15 1 4 3 0 1 0 >HSP2X4PC\Y07684\28..1194\1167\CAA68948.1\Homo sapiens\H.sapiens mRNA for P2X4 purinoceptor./codon_start=1/product="P2X4 purinoceptor"/protein_id="CAA68948.1"/db_xref="GI:1781009"/db_xref="GOA:Q99571"/db_xref="UniProt/Swiss-Prot:Q99571" 0 8 3 1 3 4 1 11 11 5 1 2 2 7 1 4 5 2 5 8 4 8 3 7 2 2 4 12 7 8 4 16 5 1 2 8 20 3 10 13 17 3 2 9 4 0 5 12 12 9 11 7 10 4 13 6 4 19 3 9 6 0 0 1 >BX664739#1\BX664739\complement(join(5483..5615,5756..5802,5924..6132, 6381..6486,6690..6737))\543\CAI43234.1\Homo sapiens\Human DNA sequence from clone WI2-89031B12 on chromosome X Containsthe FAM3A gene for family with 
sequence similarity, the SLC10A3 3gene for solute carrier family 10 (sodium/bile acid cotransporterfamily) member 3, the UBL4 gene for ubiquitin-like 4 and three CpGislands, complete sequence./gene="UBL4"/locus_tag="XX-FW89031B12.1-002"/standard_name="OTTHUMP00000015408"/codon_start=1/product="ubiquitin-like 4"/protein_id="CAI43234.1"/db_xref="GI:57284174"/db_xref="GOA:Q5HY81"/db_xref="UniProt/TrEMBL:Q5HY81" 2 4 2 2 2 5 5 3 17 0 0 2 0 6 1 1 5 2 0 2 3 1 3 3 2 2 3 5 2 1 1 6 2 0 1 6 7 1 3 7 3 0 0 14 4 4 4 8 4 4 1 1 2 0 3 0 0 3 0 1 4 0 0 1 >AL627311#1\AL627311 AC027632\join(complement(AL672037.5:31607..31619), complement(142067..142127),complement(119882..119957), complement(81610..81738),complement(33457..33552), complement(28345..28377),complement(11792..11971), complement(11078..11192),complement(3746..3843), complement(AL358392.19:76459..76581), complement(AL358392.19:56988..57096), complement(AL358392.19:48476..49315), complement(AL358392.19:48361..48380))\-67197\CAI14552.1\Homo sapiens\Human DNA sequence from clone RP11-487E1 on chromosome 1 Containspart of the EIF4G3 gene for eukaryotic translation initiationfactor 4 gamma, 3, a ribosomal protein S15a (RPS15A) pseudogene anda pseudogene similar to part of ribosomal protein L34 (RPL34),complete sequence./gene="EIF4G3"/locus_tag="RP11-190H11.1-002"/standard_name="OTTHUMP00000002723"/codon_start=1/product="eukaryotic translation initiation factor 4 gamma, 3"/protein_id="CAI14552.1"/db_xref="GI:55962100"/db_xref="GOA:Q5SVN3"/db_xref="UniProt/TrEMBL:Q5SVN3" 2 6 3 3 14 9 7 8 15 16 10 12 11 11 2 10 11 13 9 16 3 9 14 15 6 20 13 8 2 6 8 11 9 6 6 6 8 6 15 5 12 14 10 24 11 15 5 10 6 7 10 9 7 11 9 28 10 13 16 12 7 2 8 11 >BC001734\BC001734\77..367\291\AAH01734.1\Homo sapiens\Homo sapiens Sec61 beta subunit, mRNA (cDNA clone MGC:1255IMAGE:3502133), complete cds./gene="SEC61B"/codon_start=1/product="protein translocation complex beta"/protein_id="AAH01734.1"/db_xref="GI:12804623"/db_xref="GeneID:10952" 1 2 2 1 0 2 0 1 1 1 0 2 
2 2 2 2 2 3 3 3 0 3 1 2 1 3 3 3 2 1 2 5 5 1 2 1 2 3 3 1 1 1 0 1 1 0 1 0 0 1 2 0 0 1 2 1 0 1 1 4 2 0 1 0 >AC004410\AC004410\join(278..413,523..643,1344..1460,3657..3738,4411..4485, 4808..4870,4999..5098,5699..5776,11880..12040, 13392..13655)\1197\AAC05601.1\Homo sapiens\Homo sapiens chromosome 19, fosmid 39554, complete sequence./codon_start=1/evidence=not_experimental/product="fos39554_1"/protein_id="AAC05601.1"/db_xref="GI:2959559" 0 5 6 4 0 3 1 15 32 1 0 1 3 12 3 3 9 2 2 8 9 1 8 16 9 7 4 25 5 5 2 12 7 2 3 6 27 1 3 9 4 0 0 14 3 0 3 13 12 3 10 2 12 2 17 3 0 10 1 11 7 0 1 0 >AC073283#1\AC073283\complement(join(73603..73672,73873..73937,94935..95192, 96634..96693,98848..98928))\534\AAY24084.1\Homo sapiens\Homo sapiens BAC clone RP11-761B3 from 2, complete sequence./gene="FLJ40172"/codon_start=1/product="unknown"/protein_id="AAY24084.1"/db_xref="GI:62988697" 0 1 1 0 4 3 3 0 3 5 3 1 5 4 1 2 3 1 5 4 0 5 9 3 2 6 3 3 0 4 3 1 1 1 2 2 5 3 8 4 3 4 5 4 1 0 7 0 6 3 4 3 0 1 4 5 5 1 2 3 2 0 1 0 >HUMTYROPHO\L18983\74..3013\2940\AAA90974.1\Homo sapiens\Homo sapiens tyrosine phosphatase (IA-2/PTP) mRNA, complete cds./gene="IA-2/PTP"/EC_number="3.1.3.48"/codon_start=1/transl_except=(pos:80..82,aa:Arg)/product="tyrosine phosphatase"/protein_id="AAA90974.1"/db_xref="GI:499630" 1 20 21 5 5 12 4 25 64 10 2 10 8 29 1 10 29 7 9 19 8 7 24 28 12 27 22 42 7 20 18 25 24 12 3 16 39 4 7 23 13 4 15 46 20 9 10 50 26 13 14 7 11 6 12 9 3 22 5 17 8 0 0 1 >HS934G17#9\AL021155\33197..33562\366\CAI23399.1\Homo sapiens\Human DNA sequence from clone RP5-934G17 on chromosome 1p36.21Contains the 3' end of the CLCN6 gene for chloride channel 6, twonovel genes, the NPPA gene for natriuretic peptide precursor A, theNPPB gene for natriuretic peptide precursor B, a pseudogene similarto part of SET binding factor 1 (SBF1) and the 3' end of a novelgene, complete sequence./locus_tag="RP5-934G17.3-001"/standard_name="OTTHUMP00000002487"/codon_start=1/product="novel 
protein"/protein_id="CAI23399.1"/db_xref="GI:56202429"/db_xref="UniProt/TrEMBL:Q5JZE0" 0 3 0 0 2 3 0 5 8 4 2 1 4 3 1 7 5 2 1 2 1 3 5 8 0 3 4 5 1 5 1 0 0 1 0 2 2 1 0 5 0 1 2 2 1 0 0 2 1 2 0 1 1 1 0 1 1 0 2 1 2 0 0 1 >BC007891\BC007891\17..1354\1338\AAH07891.1\Homo sapiens\Homo sapiens selenocysteine lyase, mRNA (cDNA clone MGC:14083IMAGE:4126478), complete cds./gene="SCLY"/codon_start=1/product="selenocysteine lyase"/protein_id="AAH07891.1"/db_xref="GI:14043901"/db_xref="GeneID:51540" 2 6 8 0 2 8 2 8 16 7 1 1 3 6 3 1 10 3 5 12 2 5 6 11 7 6 10 17 13 6 7 9 15 6 1 8 28 2 5 13 8 9 4 18 13 4 13 15 14 5 3 6 4 1 7 6 4 11 6 11 2 0 1 0 >AK094438\AK094438\97..2373\2277\BAC04356.1\Homo sapiens\Homo sapiens cDNA FLJ37119 fis, clone BRACE2022333./codon_start=1/protein_id="BAC04356.1"/db_xref="GI:21753501" 1 12 22 4 7 23 2 28 71 5 0 5 3 11 8 3 20 3 3 6 8 2 13 18 9 6 14 44 12 14 3 21 16 5 2 15 34 5 3 22 7 1 3 57 22 3 6 35 34 4 9 1 13 6 10 5 0 9 4 15 11 0 0 1 >AY366246\AY366246\8..727\720\AAR16204.1\Homo sapiens\Homo sapiens natural killer cell inhibitory receptor (KIR2DS4)mRNA, KIR2DS4*004 allele, complete cds./gene="KIR2DS4"/allele="KIR2DS4*004"/codon_start=1/product="natural killer cell inhibitory receptor"/protein_id="AAR16204.1"/db_xref="GI:38305400" 1 0 0 0 3 3 1 6 9 5 0 5 4 11 3 2 1 3 2 4 5 3 7 6 4 7 4 5 1 4 5 3 6 4 2 5 5 5 4 2 5 1 4 7 9 2 3 8 3 3 3 3 4 2 7 3 2 6 2 9 3 0 0 1 >AB060284\AB060284\6..1064\1059\BAB61052.1\Homo sapiens\Homo sapiens mRNA for WNT3A, complete cds./gene="WNT3A"/codon_start=1/product="WNT3A"/protein_id="BAB61052.1"/db_xref="GI:14530679" 1 18 8 0 0 2 0 6 13 0 1 0 4 4 8 1 13 0 2 7 5 0 4 8 3 1 2 20 2 4 1 20 5 4 0 7 13 1 1 14 10 0 1 10 13 0 1 21 13 1 10 1 21 4 11 3 0 12 2 6 9 0 1 0 >BC014534\BC014534\313..1002\690\AAH14534.1\Homo sapiens\Homo sapiens methyl-CpG binding domain protein 5, mRNA (cDNA cloneIMAGE:3996924), complete cds./gene="MBD5"/codon_start=1/product="MBD5 protein"/protein_id="AAH14534.1"/db_xref="GI:15778911"/db_xref="GeneID:55777" 1 0 1 0 
5 2 3 3 2 3 7 6 5 3 1 9 6 14 4 2 2 4 6 4 1 9 7 3 0 4 2 1 5 7 0 1 3 5 5 2 10 7 7 12 5 3 4 0 2 3 1 1 2 3 1 3 3 1 3 10 0 1 0 0 >HUMPRPHOS1\M63960\30..1022\993\AAA36508.1\Homo sapiens\Human protein phosphatase-1 catalytic subunit mRNA, complete cds./EC_number="3.1.3.16"/codon_start=1/product="protein phosphatase-1"/protein_id="AAA36508.1"/db_xref="GI:190516" 3 5 6 2 1 1 1 8 25 3 0 2 1 5 2 4 4 2 3 5 0 1 1 10 1 5 2 10 1 2 3 15 5 4 2 1 10 0 5 17 12 3 0 13 6 0 3 18 21 2 8 5 10 3 12 7 2 15 3 6 3 0 1 0 >AL627230#6\AL627230\complement(join(96963..97088,97647..97724))\204\CAH70358.1\Homo sapiens\Human DNA sequence from clone RP11-12A20 on chromosome 9 Containssix novel pseudogenes, the FAM27B gene for family with sequencesimilarity 27, member B, a pseudogene similar to part of ribosomalprotein L10 (RPL10) and four CpG islands, complete sequence./gene="FAM27B"/locus_tag="RP11-12A20.3-001"/standard_name="OTTHUMP00000015366"/codon_start=1/product="family with sequence similarity 27, member B"/protein_id="CAH70358.1"/db_xref="GI:55666401"/db_xref="UniProt/TrEMBL:Q9BU53" 0 0 4 0 2 2 0 2 0 1 0 1 1 0 0 1 2 0 1 0 2 2 0 4 0 1 3 3 1 1 2 0 3 0 0 0 1 1 2 1 1 0 1 2 1 0 2 4 0 4 1 0 0 2 1 1 0 2 0 1 0 0 0 1 >AY757363\AY757363\15..1406\1392\AAV52330.1\Homo sapiens\Homo sapiens putative L-2-hydroxyglutarate dehydrogenase (L2HGDH)mRNA, complete cds./gene="L2HGDH"/EC_number="1.1.99.2"/function="catalyzes the FAD dependent conversion of L-2-hydroxyglutarate to alpha-ketoglutarate"/codon_start=1/product="putative L-2-hydroxyglutarate dehydrogenase"/protein_id="AAV52330.1"/db_xref="GI:55469290" 4 3 4 2 9 5 4 4 11 13 3 6 6 5 0 8 5 9 5 2 0 6 11 2 3 10 12 14 3 8 18 11 7 13 9 5 7 12 14 10 1 10 8 12 2 5 15 9 3 19 5 14 3 10 8 11 8 6 22 8 1 1 0 0 >CR456744\CR456744\1..1299\1299\CAG33025.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834B0611D forgene TRIP13, thyroid hormone receptor interactor 13; complete cds,incl. 
stopcodon./gene="TRIP13"/codon_start=1/protein_id="CAG33025.1"/db_xref="GI:48145605" 3 3 2 0 6 5 3 12 19 7 4 9 6 3 2 7 11 5 7 10 1 6 3 6 0 5 10 12 4 5 1 10 3 2 2 8 20 5 11 16 11 9 2 16 5 6 15 15 13 12 7 3 2 4 7 9 5 9 18 5 5 1 0 0 >AL391595#2\AL391595\join(20166..20361,41019..41096,42633..42701,81150..81283, 82347..82502,83283..83306)\657\CAI15050.1\Homo sapiens\Human DNA sequence from clone RP11-51I15 on chromosome 6 Containsthe 5' part of the BCKDHB gene for branched chain keto aciddehydrogenase E1, beta polypeptide (maple syrup urine disease) andone CpG island, complete sequence./gene="BCKDHB"/locus_tag="RP1-279A18.1-002"/standard_name="OTTHUMP00000018047"/codon_start=1/product="branched chain keto acid dehydrogenase E1, beta polypeptide (maple syrup urine disease)"/protein_id="CAI15050.1"/db_xref="GI:55959445"/db_xref="GOA:Q5T2J3"/db_xref="InterPro:IPR005475"/db_xref="UniProt/TrEMBL:Q5T2J3" 1 2 5 1 2 3 1 4 1 3 1 5 1 1 0 4 2 2 1 1 0 8 3 2 1 5 5 9 10 6 9 5 6 3 3 3 1 8 5 2 2 3 2 8 2 4 5 4 1 9 1 5 2 3 3 12 2 4 6 2 3 0 1 0 >HUMIFI16A\M63838\265..2454\2190\AAA58683.1\Homo sapiens\Human interferon-gamma induced protein (IFI 16) gene, complete cds./gene="IFI 16"/codon_start=1/product="interferon-gamma induced protein"/protein_id="AAA58683.1"/db_xref="GI:184569" 5 2 1 2 13 4 8 6 11 10 7 9 16 8 0 6 14 16 21 13 3 19 20 11 4 14 9 9 0 14 17 4 9 4 14 5 19 8 48 38 15 22 6 21 4 13 38 22 12 15 2 12 3 4 19 15 12 16 13 24 0 1 0 0 >AY048673\AY048673\101..478\378\AAL07514.1\Homo sapiens\Homo sapiens SSTK-interacting protein variant mRNA, complete cds,alternatively spliced./function="interacts with protein kinase SSTK deposited with GenBank Accession Number AF329483"/codon_start=1/product="SSTK-interacting protein variant"/protein_id="AAL07514.1"/db_xref="GI:15822788" 0 0 1 0 2 1 1 5 5 2 3 2 0 2 1 4 2 3 3 1 0 2 5 5 0 4 4 4 0 4 2 0 1 0 0 0 2 3 4 3 4 5 5 6 3 2 4 2 0 1 0 2 1 2 1 1 0 1 1 3 0 0 0 1 >AF000574\AF000574\1..1797\1797\AAB88119.1\Homo sapiens\Homo sapiens clone 1 
immunoglobulin-like transcript 4 mRNA,complete cds./codon_start=1/product="immunoglobulin-like transcript 4"/protein_id="AAB88119.1"/db_xref="GI:2264427" 4 5 6 6 6 9 3 27 27 4 0 3 14 13 5 10 11 8 12 22 0 4 14 31 2 14 8 19 2 12 9 17 16 7 0 10 25 3 4 11 8 1 3 31 14 3 15 17 13 8 13 9 5 7 10 2 2 17 2 6 9 0 1 0 >AB040801\AB040801\1..1122\1122\BAA96647.1\Homo sapiens\Homo sapiens mRNA for SREB3, complete cds./gene="sreb3"/codon_start=1/product="SREB3"/protein_id="BAA96647.1"/db_xref="GI:8467970" 1 9 4 3 1 1 1 12 27 1 0 2 3 3 0 4 7 1 5 8 1 3 7 8 2 3 4 27 4 14 3 12 3 5 1 9 17 1 1 15 6 2 0 7 7 5 2 9 6 1 12 2 13 2 15 9 1 10 4 18 9 0 0 1 >BC002718\BC002718\58..447\390\AAH02718.1\Homo sapiens\Homo sapiens tumor necrosis factor receptor superfamily, member12A, mRNA (cDNA clone MGC:3386 IMAGE:3632233), complete cds./gene="TNFRSF12A"/codon_start=1/product="type I transmembrane protein Fn14"/protein_id="AAH02718.1"/db_xref="GI:12803761"/db_xref="GeneID:51330"/db_xref="MIM:605914" 2 4 4 0 2 2 0 3 13 3 0 3 0 3 1 2 4 0 0 5 0 0 2 4 1 2 2 3 6 4 1 8 4 0 0 1 4 0 0 2 0 0 1 1 1 0 0 5 4 0 0 0 8 0 4 1 1 2 0 2 4 0 0 1 >AB018001\AB018001\31..1143\1113\BAA88063.1\Homo sapiens\Homo sapiens mRNA for Death-associated protein kinase 2, completecds./gene="DAPK2"/codon_start=1/product="Death-associated protein kinase 2"/protein_id="BAA88063.1"/db_xref="GI:6521210" 0 5 12 0 1 10 1 8 20 5 2 1 2 4 1 2 12 6 4 6 6 1 6 3 3 1 4 9 1 5 6 2 4 3 0 5 16 3 4 22 9 5 2 12 11 0 8 29 14 6 4 4 2 1 11 9 2 16 9 7 3 1 0 0 >BC005024\BC005024\48..425\378\AAH05024.1\Homo sapiens\Homo sapiens mitochondria-associated protein involved ingranulocyte-macrophage colony-stimulating factor signaltransduction, mRNA (cDNA clone MGC:12559 IMAGE:3955475), completecds./gene="Magmas"/codon_start=1/product="mitochondria-associated granulocyte macrophage CSF signaling molecule"/protein_id="AAH05024.1"/db_xref="GI:13477135"/db_xref="GeneID:51025" 1 3 4 0 1 1 0 5 4 0 1 1 1 5 0 1 3 0 0 0 1 0 0 1 0 1 6 9 0 3 2 4 1 1 0 2 8 0 3 6 3 1 0 11 2 1 3 
7 1 3 2 1 0 0 1 3 0 2 2 3 0 0 0 1 >HUMDIKI\D73409\81..3590\3510\BAA11134.1\Homo sapiens\Homo sapiens mRNA for diacylglycerol kinase delta, complete cds./EC_number="2.7.1.107"/function="phosphorylaton of diacylglycerol"/codon_start=1/product="diacylglycerol kinase delta"/protein_id="BAA11134.1"/db_xref="GI:1181079" 9 14 15 4 11 13 6 28 55 13 5 11 12 29 11 15 27 16 11 33 10 4 15 18 14 8 18 40 3 18 15 26 25 6 3 31 31 7 26 56 32 11 1 47 28 3 23 61 38 25 11 9 22 16 24 13 6 26 15 29 17 0 1 0 >BC070387\BC070387\50..997\948\AAH70387.1\Homo sapiens\Homo sapiens T cell receptor beta variable 3-1, mRNA (cDNA cloneMGC:88400 IMAGE:30325415), complete cds./gene="TRBV3-1"/codon_start=1/product="TRBV3-1 protein"/protein_id="AAH70387.1"/db_xref="GI:47125017"/db_xref="GeneID:28619"/db_xref="IMGT/GENE-DB:TRBV3-1" 0 4 1 0 4 3 2 11 13 2 2 2 2 9 2 5 8 3 8 8 0 2 6 7 2 1 4 13 0 4 2 5 7 3 1 11 10 2 8 10 4 11 5 11 6 1 3 15 11 3 7 5 5 5 11 3 2 6 2 6 6 0 1 0 >BC090038\BC090038\124..1098\975\AAH90038.1\Homo sapiens\Homo sapiens nuclease sensitive element binding protein 1, mRNA(cDNA clone MGC:110976 IMAGE:6526492), complete cds./gene="NSEP1"/codon_start=1/product="nuclease sensitive element binding protein 1"/protein_id="AAH90038.1"/db_xref="GI:58477789"/db_xref="GeneID:4904"/db_xref="MIM:154030" 4 11 4 5 7 8 0 2 0 1 0 1 0 2 3 0 5 5 4 4 2 3 12 10 2 11 8 12 3 6 10 13 5 11 5 1 4 6 6 10 9 12 10 14 1 1 11 16 6 7 8 8 0 0 5 2 1 2 0 4 1 1 0 0 >AY279531\AY279531\1..852\852\AAP34399.1\Homo sapiens\Homo sapiens small CTD phosphatase 2 mRNA, complete cds./codon_start=1/product="small CTD phosphatase 2"/protein_id="AAP34399.1"/db_xref="GI:31074179" 1 4 4 3 2 4 1 11 14 3 1 1 1 9 2 3 5 1 2 8 0 4 5 2 0 8 5 8 3 6 3 5 6 0 0 4 13 2 0 13 5 2 2 11 4 2 7 13 12 5 5 4 5 5 10 5 3 9 3 3 1 0 1 0 >AL589862#3\AL589862\join(127383..127425,130868..131036,131816..132013, 160620..160705,162368..162487,AL138762.20:2451..2626, AL138762.20:5131..5257,AL138762.20:20669..20770, 
AL138762.20:23030..23116,AL138762.20:23564..23640)\-693189\CAI17856.1\Homo sapiens\Human DNA sequence from clone GS1-165F21 on chromosome 10 Containsthe 5' end of the PAX2 gene for paired box gene 2 and fifteen CpGislands, complete sequence./gene="PAX2"/locus_tag="RP11-179B2.1-001"/standard_name="OTTHUMP00000020286"/codon_start=1/product="paired box gene 2"/protein_id="CAI17856.1"/db_xref="GI:55960623"/db_xref="GOA:Q5W0L6"/db_xref="InterPro:IPR001523"/db_xref="UniProt/TrEMBL:Q5W0L6" 2 4 8 2 6 11 2 5 10 2 1 7 1 14 2 5 8 0 3 6 3 1 6 15 5 12 6 10 5 5 11 10 17 7 2 8 19 5 9 10 5 7 4 12 8 0 6 15 10 6 5 0 2 3 7 6 0 12 6 9 4 1 0 2 >AB015718\AB015718\51..2957\2907\BAA35073.1\Homo sapiens\Homo sapiens lok mRNA for protein kinase, complete cds./gene="lok"/codon_start=1/product="protein kinase"/protein_id="BAA35073.1"/db_xref="GI:4001688" 7 18 24 4 6 11 3 25 63 5 1 2 3 15 3 11 25 9 1 23 10 3 10 21 5 8 6 39 6 13 4 15 8 6 4 6 28 3 18 73 26 9 10 63 15 8 21 95 38 13 13 3 8 4 17 7 3 26 6 31 7 1 0 0 >BC030267\BC030267\178..3654\3477\AAH30267.1\Homo sapiens\Homo sapiens ATPase type 13A2, mRNA (cDNA clone MGC:40082IMAGE:5240813), complete cds./gene="ATP13A2"/codon_start=1/product="ATP13A2 protein"/protein_id="AAH30267.1"/db_xref="GI:20988435"/db_xref="GeneID:23400" 5 18 23 4 8 11 6 37 90 4 1 8 9 22 4 8 37 5 17 25 12 14 24 45 16 19 23 48 14 18 9 34 29 6 3 33 69 6 4 24 17 4 8 47 18 4 7 44 31 6 26 8 17 9 29 3 4 33 6 28 17 0 1 0 >AF241254\AF241254\104..2521\2418\AAF78220.1\Homo sapiens\Homo sapiens angiotensin converting enzyme-like protein mRNA,complete cds./codon_start=1/product="angiotensin converting enzyme-like protein"/protein_id="AAF78220.1"/db_xref="GI:8650466" 4 1 3 2 10 11 7 9 21 19 10 10 11 11 0 14 11 7 17 9 2 11 15 9 0 13 21 11 0 19 19 6 13 5 7 8 17 18 27 20 16 38 15 23 2 14 38 18 23 20 12 21 4 4 17 22 9 15 16 27 23 0 1 0 >HSGCSAB\X66533\89..1948\1860\CAA47144.1\Homo sapiens\H.sapiens soluble guanylate cyclase small subunit 
mRNA./EC_number="4.6.1.2"/codon_start=1/product="guanylate cyclase"/protein_id="CAA47144.1"/db_xref="GI:31686"/db_xref="GOA:Q02153"/db_xref="InterPro:IPR001054"/db_xref="UniProt/Swiss-Prot:Q02153" 3 3 4 5 15 3 6 15 16 13 7 12 5 6 1 10 8 5 17 6 2 13 11 2 1 9 8 7 0 9 16 10 7 7 3 13 13 13 18 15 6 17 12 14 11 8 37 16 17 23 11 11 7 7 7 21 12 18 10 15 2 0 0 1 >BC014594\BC014594\96..449\354\AAH14594.1\Homo sapiens\Homo sapiens GABA(A) receptor-associated protein-like 2, mRNA (cDNAclone MGC:26328 IMAGE:4821535), complete cds./gene="GABARAPL2"/codon_start=1/product="GABA(A) receptor-associated protein-like 2"/protein_id="AAH14594.1"/db_xref="GI:15779041"/db_xref="GeneID:11345"/db_xref="MIM:607452" 1 0 1 0 1 3 1 0 2 2 1 1 1 2 1 3 2 0 1 0 0 3 2 1 1 1 0 1 3 1 3 2 0 0 0 2 6 3 4 8 1 0 0 5 2 0 5 4 4 4 3 2 1 0 5 2 0 5 4 4 2 0 0 1 >HUMSSR2\D37991\25..576\552\BAA07206.1\Homo sapiens\Human SSR2 mRNA for beta-signal sequence receptor, complete cds./gene="SSR2"/codon_start=1/product="beta-signal sequence receptor"/protein_id="BAA07206.1"/db_xref="GI:1736880" 2 1 2 0 1 4 3 2 10 2 1 4 3 6 1 3 2 2 1 4 1 4 1 4 0 5 5 5 0 7 6 4 2 1 0 5 7 2 4 4 4 3 1 4 1 1 3 4 7 3 5 2 0 0 4 6 0 4 4 3 3 0 0 1 >AF214737\AF214737\26..3235\3210\AAF72866.1\Homo sapiens\Homo sapiens C9orf10a (C9orf10) mRNA, complete cds, alternativelyspliced./gene="C9orf10"/codon_start=1/product="C9orf10a"/protein_id="AAF72866.1"/db_xref="GI:8118021" 4 9 15 7 16 13 8 26 40 7 4 7 13 13 7 18 28 7 8 13 8 8 26 22 22 28 22 39 11 21 27 42 20 11 6 20 24 14 16 37 27 6 8 44 22 14 26 26 24 17 21 14 15 4 25 15 3 16 20 22 13 0 0 1 >BC014521\BC014521\58..726\669\AAH14521.1\Homo sapiens\Homo sapiens PDZ and LIM domain 7 (enigma), transcript variant 4,mRNA (cDNA clone IMAGE:3690154), complete cds./gene="PDLIM7"/codon_start=1/product="enigma protein, isoform 4"/protein_id="AAH14521.1"/db_xref="GI:15778884"/db_xref="GeneID:9260"/db_xref="MIM:605903" 1 5 9 1 0 2 0 10 9 1 0 0 1 6 2 1 9 0 4 3 1 1 3 6 10 3 2 12 5 2 2 7 6 2 1 3 7 1 4 7 4 3 2 13 6 0 2 9 
7 4 2 1 1 0 5 2 0 4 1 3 4 0 1 0 >BC030227\BC030227\66..1145\1080\AAH30227.1\Homo sapiens\Homo sapiens CD72 antigen, mRNA (cDNA clone MGC:34615IMAGE:5226648), complete cds./gene="CD72"/codon_start=1/product="CD72 antigen"/protein_id="AAH30227.1"/db_xref="GI:20987445"/db_xref="GeneID:971"/db_xref="MIM:107272" 1 4 2 0 5 9 4 9 20 2 3 5 13 4 2 7 10 5 6 7 4 9 6 6 1 0 7 7 4 8 5 3 8 2 2 4 8 2 6 13 6 5 8 25 1 3 7 14 4 9 6 5 8 5 5 3 2 6 2 4 8 0 1 0 >AY449689\AY449689\1..2493\2493\AAS19705.1\Homo sapiens\Homo sapiens prominin 1 isoform s3 (PROM1) mRNA, complete cds./gene="PROM1"/codon_start=1/product="prominin 1 isoform s3"/protein_id="AAS19705.1"/db_xref="GI:42556032" 6 2 4 4 12 8 11 13 36 16 8 26 11 13 2 9 19 15 12 15 2 21 5 8 3 13 19 4 4 13 16 15 10 8 10 13 17 13 25 18 20 29 14 20 6 7 26 13 16 24 16 20 11 12 14 23 14 23 24 11 8 0 0 1 >AL157392#1\AL157392\join(13445..13510,23816..23893,26602..26706,31989..32102, 36397..36543,37973..38041,40099..40239,40373..40444, 42756..42911,56618..56698)\1029\CAI13123.1\Homo sapiens\Human DNA sequence from clone RP11-295P9 on chromosome 10 Containsthe PRPF18 gene for PRP18 pre-mRNA processing factor 18 homolog(yeast), a ribosomal protein L6 (RPL6)(TAXREB107) pseudogene, twonovel genes, the 5' end of a novel gene, the 3' end of the gene fora novel protein (FLJ10210) and two CpG islands, complete sequence./gene="PRPF18"/locus_tag="RP11-295P9.7-001"/standard_name="OTTHUMP00000019146"/codon_start=1/product="PRP18 pre-mRNA processing factor 18 homolog (yeast)(PRPF18)"/protein_id="CAI13123.1"/db_xref="GI:55957682"/db_xref="InterPro:IPR003648"/db_xref="InterPro:IPR004098" 0 1 4 2 12 6 3 3 9 9 8 7 3 4 1 1 0 3 4 5 2 5 7 1 0 5 6 5 3 4 7 5 1 4 0 3 7 6 24 11 4 11 2 15 1 5 20 18 8 12 5 6 1 2 3 7 3 9 10 7 2 0 0 1 >HUMSRTR2A\M21985\127..1578\1452\AAA36650.1\Homo sapiens\Human steroid receptor TR2 mRNA, complete cds./codon_start=1/protein_id="AAA36650.1"/db_xref="GI:338486" 6 1 0 2 5 5 3 4 10 14 11 3 17 3 1 12 10 6 15 7 0 16 8 1 1 6 17 8 3 6 17 4 4 3 12 
4 6 5 16 8 5 19 16 13 8 6 18 7 6 19 5 3 5 11 7 11 4 6 17 14 3 1 0 0 >AL359706#3\AL359706\join(38993..39022,39371..39535,44269..44367,51724..51942)\513\CAI12868.1\Homo sapiens\Human DNA sequence from clone RP11-245H20 on chromosome 13 Containsthe 5' end of the NUFIP1 gene for nuclear fragile X mentalretardation protein interacting protein 1, the gene forlipopolysaccharide specific response-7 protein (LSR7) and a CpGisland, complete sequence./gene="RP11-245H20.2"/locus_tag="RP11-245H20.2-001"/standard_name="OTTHUMP00000018339"/codon_start=1/protein_id="CAI12868.1"/db_xref="GI:55959531"/db_xref="UniProt/TrEMBL:Q5T3Z3" 1 0 1 1 9 4 2 1 2 3 3 0 9 1 0 5 0 2 3 2 0 3 4 1 0 2 1 1 0 7 3 2 1 1 2 0 0 1 16 10 1 4 2 2 1 3 13 3 5 12 1 0 0 0 0 6 4 1 1 5 2 1 0 0 >AL137140#2\AL137140\join(97009..97051,97729..97846,100937..100996, 104964..105032,108813..108881,109428..109466, 110504..110584,113257..113322,130277..130357, 132528..132593,138678..138722,140462..140521, 140822..140881,143198..143257,143787..143846, 144209..144268,145489..145548,149189..149248, 150191..150250,151180..151239,154024..154074, 155026..155088,158732..158791,158936..158995, 159137..159193,169080..169142,175512..175583, 178253..178327,181809..181917,183839..183941, 185391..185407)\2007\CAH70475.1\Homo sapiens\Human DNA sequence from clone RP11-137M6 on chromosome13q21.33-22.3 Contains the SCEL gene for sciellin, a novel gene anda CpG island, complete sequence./gene="SCEL"/locus_tag="RP11-137M6.1-001"/standard_name="OTTHUMP00000018525"/codon_start=1/protein_id="CAH70475.1"/db_xref="GI:55661725"/db_xref="GOA:Q5W0S9"/db_xref="InterPro:IPR001781"/db_xref="UniProt/TrEMBL:Q5W0S9" 3 1 4 0 24 13 2 10 3 14 4 12 7 8 1 16 13 16 14 12 2 21 12 7 2 18 7 6 1 10 15 4 3 10 11 3 15 7 43 18 25 41 18 16 3 7 35 7 13 28 7 8 7 2 4 4 8 12 16 11 4 1 0 0 >AF168611\AF168611\1..1101\1101\AAD50823.1\Homo sapiens\Homo sapiens HLA-C class I antigen (HLA-C) mRNA, HLA-Cw*05DZallele, complete 
cds./gene="HLA-C"/allele="HLA-Cw*05DZ"/codon_start=1/product="HLA-C class I antigen"/protein_id="AAD50823.1"/db_xref="GI:5758662" 2 10 7 1 5 6 2 8 19 1 0 0 1 6 1 7 5 2 3 15 4 2 5 9 3 2 4 14 10 10 8 11 10 1 0 5 16 2 3 10 5 1 1 21 5 3 1 24 15 4 11 4 6 2 7 0 0 10 0 6 10 0 0 1 >BC019008\BC019008\57..1451\1395\AAH19008.1\Homo sapiens\Homo sapiens Williams-Beuren syndrome chromosome region 16,transcript variant 1, mRNA (cDNA clone MGC:20807 IMAGE:4330507),complete cds./gene="WBSCR16"/codon_start=1/product="RCC1-like G exchanging factor-like, isoform 1"/protein_id="AAH19008.1"/db_xref="GI:17512067"/db_xref="GeneID:81554" 6 12 7 0 3 6 1 3 25 4 2 4 4 9 3 7 8 4 4 5 7 4 4 11 2 6 5 13 8 9 20 14 17 10 1 14 28 5 7 7 9 4 4 14 9 2 11 12 10 9 6 7 7 6 11 7 0 8 4 7 8 1 0 0 >AK056091\AK056091\477..1367\891\BAB71092.1\Homo sapiens\Homo sapiens cDNA FLJ31529 fis, clone NT2RI2000421, moderatelysimilar to ZINC FINGER PROTEIN 75./codon_start=1/protein_id="BAB71092.1"/db_xref="GI:16551403" 0 0 2 1 10 3 2 5 2 8 5 2 6 4 0 5 5 4 9 3 0 6 4 5 1 4 5 3 0 0 5 2 4 1 3 2 5 3 26 12 4 6 10 10 13 5 11 9 3 9 1 5 1 11 5 7 4 2 3 4 6 0 0 1 >AK000805\AK000805\145..1398\1254\BAA91382.1\Homo sapiens\Homo sapiens cDNA FLJ20798 fis, clone ADSU02031./codon_start=1/protein_id="BAA91382.1"/db_xref="GI:7021111" 1 5 8 3 2 7 2 17 20 4 2 6 7 25 0 9 27 9 1 9 1 7 16 26 2 8 7 21 1 11 6 9 7 6 1 7 7 2 2 5 3 4 4 11 3 3 2 17 12 6 1 2 6 3 4 4 1 5 0 4 6 0 0 1 >BC040279\BC040279\18..1541\1524\AAH40279.1\Homo sapiens\Homo sapiens lactate dehydrogenase D, transcript variant 1, mRNA(cDNA clone MGC:34649 IMAGE:5162826), complete cds./gene="LDHD"/codon_start=1/product="D-lactate dehydrogenase, isoform 1 precursor"/protein_id="AAH40279.1"/db_xref="GI:54611260"/db_xref="GeneID:197257"/db_xref="MIM:607490" 4 9 12 1 0 8 1 15 33 2 1 0 2 9 2 4 6 2 7 11 9 2 6 10 1 7 10 27 8 8 7 30 9 4 3 12 33 1 2 11 12 4 2 19 11 6 5 29 13 7 7 1 10 5 14 2 1 10 2 9 9 0 0 1 >HUMGAB3R#1\L04311\join(96..175,1044..1135,1347..1414)\240\AAA52507.1\Homo 
sapiens\Human GABA-A receptor beta-3 subunit (GABRB3) gene, 5' end./gene="GABRB3"/codon_start=1/product="GABA-alpha receptor beta-3 subunit"/protein_id="AAA52507.1"/db_xref="GI:292040"/db_xref="GDB:G00-127-549" 1 2 0 0 1 0 1 3 5 0 0 1 0 4 0 1 1 1 0 2 1 0 0 5 1 0 0 1 0 0 0 2 5 1 0 2 4 1 1 2 4 0 0 0 0 0 1 3 5 1 1 0 2 0 1 1 0 4 1 5 2 0 0 0 >AF106685\AF106685\81..1724\1644\AAD43038.1\Homo sapiens\Homo sapiens myelin gene expression factor 2 mRNA, complete cds./codon_start=1/product="myelin gene expression factor 2"/protein_id="AAD43038.1"/db_xref="GI:5410336" 5 2 1 7 17 3 4 2 4 6 8 6 8 7 2 8 11 12 8 0 0 8 12 0 1 10 9 5 3 7 44 14 6 27 4 3 8 12 27 13 8 19 5 8 0 6 20 14 9 21 3 4 1 3 7 19 12 4 15 32 3 1 0 0 >AB065730\AB065730\join(201..344,1572..1717,3449..3582,3901..4104,4714..4885, 5312..5476,6698..6805,7669..7849,8533..8697)\1419\BAC05951.1\Homo sapiens\Homo sapiens gene for seven transmembrane helix receptor, completecds, isolate:CBRC7TM_293./codon_start=1/evidence=not_experimental/product="seven transmembrane helix receptor"/protein_id="BAC05951.1"/db_xref="GI:21928729" 1 8 8 3 3 5 0 21 29 5 0 2 1 18 3 3 18 5 9 16 6 2 10 17 3 10 5 27 4 8 4 11 11 3 2 9 13 5 0 8 6 2 3 11 9 1 0 10 6 1 13 7 11 1 18 5 1 23 4 12 12 0 0 1 >BC001861\BC001861\309..1112\804\AAH01861.2\Homo sapiens\Homo sapiens testis expressed sequence 101, mRNA (cDNA cloneMGC:4766 IMAGE:3538494), complete cds./gene="TEX101"/codon_start=1/product="testis expressed sequence 101"/protein_id="AAH01861.2"/db_xref="GI:33876506"/db_xref="GeneID:83639" 2 0 0 1 0 2 4 4 13 4 2 4 3 8 2 6 2 3 7 12 1 8 6 5 1 5 4 8 2 3 6 6 7 3 1 3 7 1 5 3 2 4 3 9 2 3 8 12 3 2 2 2 5 10 3 5 3 7 7 8 3 1 0 0 >AK058093\AK058093\170..640\471\BAB71662.1\Homo sapiens\Homo sapiens cDNA FLJ25364 fis, clone TST01761, highly similar toLEUKOCYTE ANTIGEN CD37./codon_start=1/protein_id="BAB71662.1"/db_xref="GI:16554123" 0 1 1 0 1 2 0 13 11 0 0 1 2 3 0 0 4 0 4 5 0 2 0 0 2 1 0 8 0 0 3 5 3 1 0 2 4 1 6 5 1 0 2 6 1 1 4 6 4 0 1 2 3 1 11 3 1 12 0 3 3 0 1 0 
>BC036067\BC036067\84..512\429\AAH36067.1\Homo sapiens\Homo sapiens chromosome 1 open reading frame 115, mRNA (cDNA cloneMGC:33695 IMAGE:5274863), complete cds./gene="C1orf115"/codon_start=1/product="hypothetical protein LOC79762"/protein_id="AAH36067.1"/db_xref="GI:23271283"/db_xref="GeneID:79762" 4 5 4 0 0 5 0 3 7 0 0 0 0 2 0 0 7 0 0 3 2 0 1 3 4 0 1 8 11 1 2 5 7 0 2 3 5 0 1 7 0 1 1 1 2 0 0 15 3 0 6 0 1 0 3 1 0 4 0 1 0 1 0 0 >HUMSNAP25A\L19760\89..709\621\AAC37545.1\Homo sapiens\Human nerve-terminal protein (isoform SNAP25A) mRNA, complete cds./gene="SNAP"/codon_start=1/product="nerve terminal protein"/protein_id="AAC37545.1"/db_xref="GI:307426" 2 3 1 6 1 4 1 2 6 2 2 3 1 1 1 0 3 4 3 2 0 1 0 0 0 2 2 7 0 7 3 7 2 2 2 1 4 2 8 4 7 8 5 8 1 1 11 12 6 13 1 0 1 3 2 0 1 9 1 13 1 1 0 0 >AF154001\AF154001\872..1729\858\AAD38185.1\Homo sapiens\Homo sapiens MRP3s1 protein (MRP3s1) mRNA, splice variant, completecds./gene="MRP3s1"/codon_start=1/product="MRP3s1 protein"/protein_id="AAD38185.1"/db_xref="GI:5031476" 2 10 3 1 3 3 1 7 21 3 1 2 3 4 1 5 5 1 3 8 1 4 2 5 3 1 5 7 2 8 2 12 6 3 2 4 14 1 1 6 5 3 0 8 4 2 5 15 13 5 5 1 4 0 6 3 1 15 3 7 4 1 0 0 >BT007073\BT007073\1..717\717\AAP35736.1\Homo sapiens\Homo sapiens CD63 antigen (melanoma 1 antigen) mRNA, complete cds./codon_start=1/product="CD63 antigen (melanoma 1 antigen)"/protein_id="AAP35736.1"/db_xref="GI:30582985" 1 0 1 0 2 2 0 4 10 4 0 4 1 2 2 2 0 3 1 2 1 2 1 1 1 2 7 9 2 8 5 7 6 2 1 8 18 2 4 9 8 5 0 6 1 1 1 10 2 3 4 2 9 6 5 9 1 10 7 9 2 0 1 0 >AK027627\AK027627\167..1525\1359\BAB55244.1\Homo sapiens\Homo sapiens cDNA FLJ14721 fis, clone NT2RP3001608./codon_start=1/protein_id="BAB55244.1"/db_xref="GI:14042435" 0 5 8 2 1 1 2 8 31 1 0 3 3 10 3 0 24 4 3 9 5 3 12 29 14 8 10 38 8 8 1 22 10 3 1 5 18 1 3 16 10 1 2 19 8 3 1 13 7 2 16 4 8 2 3 1 0 9 3 5 2 1 0 0 >HSM806474\BX538239\99..644\546\CAD98077.1\Homo sapiens\Homo sapiens mRNA; cDNA DKFZp686P0948 (from clone DKFZp686P0948);complete 
cds./gene="DKFZp686P0948"/codon_start=1/product="hypothetical protein"/protein_id="CAD98077.1"/db_xref="GI:31874712" 0 0 0 1 3 1 2 1 0 1 3 2 6 6 0 11 3 4 2 0 0 6 5 1 1 4 2 2 0 6 3 3 0 3 2 3 0 4 9 2 3 2 5 5 1 5 19 5 6 11 0 3 1 3 2 1 1 2 1 3 0 0 0 1 >AB090811\AB090811\336..1964\1629\BAC55936.1\Homo sapiens\Homo sapiens ChGn-2 mRNA for chondroitin beta1,4N-acetylgalactosaminyltransferase-2, complete cds./gene="ChGn-2"/codon_start=1/product="chondroitin beta1,4 N-acetylgalactosaminyltransferase-2"/protein_id="BAC55936.1"/db_xref="GI:27923015" 5 6 4 2 8 5 5 10 12 13 6 10 5 2 0 8 3 10 3 10 2 7 5 5 1 11 6 10 1 8 12 7 4 14 5 7 14 12 19 13 5 17 13 13 7 11 36 20 9 16 7 15 3 5 10 20 4 5 16 15 5 0 0 1 >AF252611\AF252611\302..1033\732\AAK37429.1\Homo sapiens\Homo sapiens WBS15 protein (WBSCR5) mRNA, complete cds./gene="WBSCR5"/codon_start=1/product="WBS15 protein"/protein_id="AAK37429.1"/db_xref="GI:13649483" 0 3 2 1 4 4 0 5 13 0 0 3 4 8 3 4 7 2 4 2 1 3 5 8 3 1 9 9 2 0 4 4 10 4 0 2 7 0 3 7 3 3 3 11 1 1 9 14 7 7 8 2 3 2 3 1 2 2 2 4 4 0 1 0 >HSU44103\U44103\1..606\606\AAC51200.1\Homo sapiens\Homo sapiens small GTP binding protein Rab9 mRNA, complete cds./codon_start=1/product="small GTP binding protein Rab9"/protein_id="AAC51200.1"/db_xref="GI:1174147" 4 0 1 0 3 3 0 2 3 6 2 2 5 0 0 4 5 4 7 3 1 2 1 1 0 4 6 3 1 2 4 1 1 5 2 2 6 4 6 6 5 4 2 6 1 3 8 5 6 10 1 5 4 1 4 10 3 0 5 3 3 0 0 1 >HS88J8#1\AL035402\complement(12652..13614)\963\CAB42853.1\Homo sapiens\Human DNA sequence from clone RP1-88J8 on chromosome 6p21.31-21.33Contains the OR2W1 gene for olfactory receptor 2W1, olfactoryreceptor 2P1 pseudogene OR2P1P and a GTP binding proteinpseudogene. 
Contains ESTs, an STS and GSS, complete sequence./locus_tag="RP1-88J8.2-001"/standard_name="OTTHUMP00000017732"/codon_start=1/product="olfactory receptor, family 2, subfamily W, member 1"/protein_id="CAB42853.1"/db_xref="GI:4826521"/db_xref="Genew:8281"/db_xref="GOA:Q9Y3N9"/db_xref="InterPro:IPR000276"/db_xref="InterPro:IPR000725"/db_xref="UniProt/Swiss-Prot:Q9Y3N9" 1 0 0 1 4 2 4 11 12 7 6 8 4 4 0 9 4 4 11 7 1 4 6 2 1 3 3 5 0 7 5 5 0 4 5 6 4 6 8 11 8 7 3 4 2 7 1 3 4 6 8 7 2 9 9 6 5 13 12 16 3 0 1 0 >BC010935\BC010935\90..638\549\AAH10935.1\Homo sapiens\Homo sapiens casein kappa, mRNA (cDNA clone MGC:13553IMAGE:4279903), complete cds./gene="CSN3"/codon_start=1/product="casein kappa"/protein_id="AAH10935.1"/db_xref="GI:15012070"/db_xref="GeneID:1448"/db_xref="MIM:601695" 0 2 1 2 2 1 1 0 4 1 1 2 2 1 0 0 3 2 5 6 3 5 18 2 0 9 6 4 0 7 1 0 0 0 3 3 5 6 5 1 3 8 5 4 1 3 4 4 1 2 2 9 1 1 1 4 2 6 5 2 0 1 0 0 >BC014017\BC014017\9..1220\1212\AAH14017.1\Homo sapiens\Homo sapiens ribosomal protein L3, mRNA (cDNA clone MGC:20304IMAGE:4128023), complete cds./gene="RPL3"/codon_start=1/product="ribosomal protein L3"/protein_id="AAH14017.1"/db_xref="GI:15559314"/db_xref="GeneID:6122"/db_xref="MIM:604163" 3 10 6 5 2 4 1 5 15 4 0 1 0 7 0 4 6 2 2 15 1 5 3 3 2 7 4 9 0 8 6 18 7 2 3 9 22 1 8 48 6 3 2 12 10 6 6 15 13 5 7 4 3 2 11 6 0 10 9 10 5 1 0 0 >BC025719\BC025719\30..1202\1173\AAH25719.1\Homo sapiens\Homo sapiens matrix metalloproteinase 23B, mRNA (cDNA cloneMGC:34412 IMAGE:5228894), complete cds./gene="MMP23B"/codon_start=1/product="matrix metalloproteinase 23B"/protein_id="AAH25719.1"/db_xref="GI:19343598"/db_xref="GeneID:8510"/db_xref="MIM:603321" 1 16 13 2 1 8 1 13 30 1 0 1 1 7 2 0 10 0 0 11 8 1 3 19 9 0 2 21 14 3 3 21 6 0 0 10 16 0 4 6 6 1 1 7 16 0 1 14 14 0 13 0 11 0 17 0 1 7 0 5 12 0 0 1 >AC011298#2\AC011298\93404..93637\234\AAX93238.1\Homo sapiens\Homo sapiens BAC clone RP11-118M12 from 2, complete 
sequence./gene="tmp_locus_27"/codon_start=1/product="unknown"/protein_id="AAX93238.1"/db_xref="GI:62702314" 0 0 0 2 0 2 1 0 7 0 1 0 1 2 0 0 3 0 1 2 1 1 1 5 2 3 1 0 1 3 2 4 2 2 0 5 1 1 0 0 0 0 0 2 3 0 0 0 3 1 1 0 2 3 2 0 0 0 0 1 2 0 1 0 >AF289203S1#3\AF289203\32875..33747\873\AAG42366.1\Homo sapiens\Homo sapiens odorant receptor HOR3'beta1, odorant receptorHOR3'beta2, and odorant receptor HOR3'beta3 genes, complete cds./codon_start=1/product="odorant receptor HOR3'beta3"/protein_id="AAG42366.1"/db_xref="GI:11991865" 1 6 2 1 0 3 2 10 15 9 3 4 3 5 1 8 5 3 8 5 1 3 3 7 2 4 3 8 0 5 5 3 2 6 2 7 10 4 3 1 4 4 1 3 8 2 3 2 4 2 6 5 4 4 9 12 5 12 12 13 2 1 0 0 >AF262240\AF262240\20..739\720\AAF87716.1\Homo sapiens\Homo sapiens Smac mRNA, complete cds; nuclear gene formitochondrial product./function="binds IAPs and neutralizes their inhibition on caspase activation and activity"/codon_start=1/product="Smac"/protein_id="AAF87716.1"/db_xref="GI:9454219" 1 1 3 2 4 3 0 3 8 4 2 6 6 2 2 5 2 4 4 8 1 9 1 0 0 3 11 4 3 7 2 2 2 1 3 0 8 5 7 6 1 2 3 12 4 1 13 15 0 4 4 5 0 4 3 3 4 1 5 6 4 0 0 1 >AF386492\AF386492\join(2804..3074,4831..5064,6291..6485,8115..8313, 9912..10012,10133..10219,11418..11501,11821..11858)\1209\AAK60338.1\Homo sapiens\Homo sapiens serine-cysteine proteinase inhibitor clade E member 1(SERPINE1) gene, complete cds./gene="SERPINE1"/codon_start=1/product="serine-cysteine proteinase inhibitor clade E member 1"/protein_id="AAK60338.1"/db_xref="GI:14326588" 0 5 5 0 4 4 2 14 20 4 0 2 7 6 1 4 7 3 9 10 4 4 3 16 0 3 1 18 4 6 6 9 6 2 0 12 21 2 6 13 11 2 4 18 10 3 6 13 16 4 6 3 1 0 20 5 1 10 4 18 4 0 0 1 >AK055353\AK055353\530..2140\1611\BAB70908.1\Homo sapiens\Homo sapiens cDNA FLJ30791 fis, clone FEBRA2000972, moderatelysimilar to ZINC FINGER PROTEIN 184./codon_start=1/protein_id="BAB70908.1"/db_xref="GI:16550064" 3 0 0 1 18 6 3 0 7 19 3 7 8 8 1 3 10 8 6 3 1 14 6 12 0 4 8 8 1 10 16 3 6 1 5 5 5 9 31 18 5 10 18 22 8 26 33 22 4 5 9 13 3 28 9 12 8 4 10 5 5 0 1 0 
>AB001895\AB001895\288..3716\3429\BAA23269.1\Homo sapiens\Homo sapiens mRNA for B120, complete cds./codon_start=1/product="B120"/protein_id="BAA23269.1"/db_xref="GI:2588991" 6 6 10 5 3 12 3 11 9 4 3 7 18 43 8 24 25 20 16 19 4 17 53 45 12 57 14 34 5 23 30 49 34 16 2 4 12 5 9 22 30 24 33 122 8 8 10 15 15 17 21 37 1 1 8 6 5 12 6 60 4 0 0 1 >S82745\S82745\1..330\330\AAD14415.1\Homo sapiens\Ig VH=IgM monoclonal antibody heavy chain variable region {VH-D-JHjunctions, clone B4E7} [human, hybrid cells, splenic lymphocytesfused to lymphoblastoid B cell line GM4672, Genomic, 330 nt]./gene="Ig VH"/codon_start=1/protein_id="AAD14415.1"/db_xref="GI:4262115" 1 1 1 2 2 0 0 1 5 0 0 0 1 5 0 1 2 3 0 4 3 0 1 0 1 0 4 2 1 3 3 2 4 1 0 4 3 0 2 3 2 2 2 1 1 0 0 2 4 2 7 5 0 2 3 1 2 1 1 3 3 0 0 0 >BC010674\BC010674\428..3208\2781\AAH10674.1\Homo sapiens\Homo sapiens protein tyrosine phosphatase, non-receptor type 4(megakaryocyte), mRNA (cDNA clone MGC:9204 IMAGE:3853914), completecds./gene="PTPN4"/codon_start=1/product="protein tyrosine phosphatase, non-receptor type 4"/protein_id="AAH10674.1"/db_xref="GI:14715027"/db_xref="GeneID:5775"/db_xref="MIM:176878" 11 0 7 5 23 10 12 6 10 12 17 18 22 8 3 20 4 14 24 9 1 17 25 8 2 25 14 9 1 16 22 9 9 7 20 12 9 21 32 18 16 35 27 28 7 22 47 10 16 34 14 21 6 13 12 28 6 10 30 23 9 1 0 0 >BC002715\BC002715\259..1344\1086\AAH02715.1\Homo sapiens\Homo sapiens peroxisome proliferative activated receptor, delta,transcript variant 2, mRNA (cDNA clone MGC:3931 IMAGE:3630487),complete cds./gene="PPARD"/codon_start=1/product="peroxisome proliferative activated receptor, delta, isoform 2"/protein_id="AAH02715.1"/db_xref="GI:12803755"/db_xref="GeneID:5467"/db_xref="MIM:600409" 0 8 5 4 0 1 1 9 19 3 0 2 3 4 2 1 14 3 3 6 2 2 5 5 1 4 9 16 2 3 3 10 8 3 1 5 11 2 5 23 11 4 1 13 7 2 6 27 10 2 9 1 10 4 15 5 0 12 4 8 2 0 0 1 >AF073312\AF073312\125..1561\1437\AAC25673.1\Homo sapiens\Homo sapiens peanut-like 2 (PNUTL2) mRNA, complete 
cds./gene="PNUTL2"/codon_start=1/product="peanut-like 2"/protein_id="AAC25673.1"/db_xref="GI:3290200" 6 4 15 3 2 8 2 9 18 5 0 2 3 5 1 6 8 2 9 8 1 7 14 12 2 6 9 7 1 5 5 9 7 5 4 7 18 2 10 22 13 4 8 14 7 5 13 28 22 17 6 7 6 2 13 8 2 15 4 10 5 1 0 0 >BT007076\BT007076\1..507\507\AAP35739.1\Homo sapiens\Homo sapiens protein phosphatase 1, regulatory (inhibitor) subunit1B (dopamine and cAMP regulated phosphoprotein, DARPP-32) mRNA,complete cds./codon_start=1/product="protein phosphatase 1, regulatory (inhibitor) subunit 1B (dopamine and cAMP regulated phosphoprotein, DARPP-32)"/protein_id="AAP35739.1"/db_xref="GI:30582991" 0 3 2 0 4 1 1 2 8 1 0 1 4 4 2 5 2 1 3 1 0 0 5 6 0 6 1 4 0 4 2 3 5 2 0 2 2 0 1 4 2 2 2 8 4 1 12 26 3 6 1 1 0 2 1 0 0 2 1 1 1 0 1 0 >AK126499\AK126499\718..1095\378\BAC86567.1\Homo sapiens\Homo sapiens cDNA FLJ44535 fis, clone UTERU3004709./codon_start=1/protein_id="BAC86567.1"/db_xref="GI:34532998" 0 1 2 2 4 4 0 0 4 1 0 4 0 2 3 2 2 1 2 1 0 0 2 1 0 5 6 10 1 5 2 8 5 4 0 1 4 1 2 0 0 0 2 3 3 0 3 7 1 1 2 0 1 1 3 0 0 0 0 3 3 0 0 1 >AL772202#4\AL772202\40993..41253\261\CAI16742.1\Homo sapiens\Human DNA sequence from clone RP11-791O21 on chromosome 9 Containsa PEST-containing nuclear protein (PEST) pseudogene, a novel gene(FLJ37523), a novel gene, the EDG3 gene for endothelialdifferentiation, sphingolipid G-protein-coupled receptor, 3 and aCpG island, complete sequence./locus_tag="RP11-791O21.3-001"/standard_name="OTTHUMP00000021613"/codon_start=1/protein_id="CAI16742.1"/db_xref="GI:55962810"/db_xref="UniProt/TrEMBL:Q5SQD5" 0 0 1 0 1 2 2 2 0 1 0 0 0 1 0 3 2 0 4 0 0 1 1 4 2 3 2 1 0 0 0 2 1 0 0 1 3 0 4 2 0 3 0 8 1 2 1 1 2 0 1 0 3 1 3 3 1 2 1 4 3 0 0 1 >HUMMTMMP\D26512\112..1860\1749\BAA05519.1\Homo sapiens\Homo sapiens MT-MMP mRNA for membrane-type matrixmetalloproteinase, comlete cds./gene="MT-MMP"/codon_start=1/product="membrane-type matrix metalloproteinase"/protein_id="BAA05519.1"/db_xref="GI:793763" 5 10 5 6 3 9 2 14 24 3 0 1 6 7 3 4 8 2 4 12 4 6 6 28 2 9 4 
24 8 7 8 25 16 4 2 7 19 3 9 24 12 9 7 12 4 8 7 30 18 13 20 5 3 3 28 9 0 16 7 15 13 0 0 1 >AL732442#6\AL732442\join(4469..4575,5043..5103,5219..5310,5434..5494, 5615..5728,5912..6013,17958..18003,18116..18184, 19757..19828,20349..20596)\972\CAI17752.1\Homo sapiens\Human DNA sequence from clone XXbac-48F10 on chromosome 6 containsthe 3' end of the MRPS18B gene for mitochondrial ribosomal proteinS18B, two novel proteins, a prothymosin, alpha (PTMA) pseudogene,the DDX16 gene for DEAD/H box polypeptide 16 , the gene forKIAA1949 protein, the NRM gene for nurim and four CpG islands,complete sequence./locus_tag="XXbac-BCX48F10.5-001"/standard_name="OTTHUMP00000029755"/codon_start=1/product="chromosome 6 open reading frame 134"/protein_id="CAI17752.1"/db_xref="GI:55962472"/db_xref="InterPro:IPR007965"/db_xref="UniProt/TrEMBL:Q9H8X5" 8 10 2 6 1 6 2 5 18 3 0 3 4 4 0 5 3 4 3 5 3 2 12 14 5 8 6 11 1 13 6 4 2 2 3 2 8 4 4 11 5 4 5 10 6 7 7 11 8 6 4 2 2 0 7 7 1 7 6 4 1 1 0 0 >BC013376\BC013376\45..1250\1206\AAH13376.1\Homo sapiens\Homo sapiens MARVEL domain containing 3, mRNA (cDNA clone MGC:16443IMAGE:3946889), complete cds./gene="MARVELD3"/codon_start=1/product="MARVEL domain containing 3"/protein_id="AAH13376.1"/db_xref="GI:15426530"/db_xref="GeneID:91862" 7 10 15 0 7 10 2 7 16 0 0 7 1 1 3 4 7 5 3 5 2 2 3 12 14 6 5 16 7 6 12 13 15 4 0 6 6 4 7 8 4 0 5 9 7 1 13 15 21 5 15 4 9 3 7 5 2 5 2 7 4 1 0 0 >AK000530\AK000530\236..607\372\BAA91234.1\Homo sapiens\Homo sapiens cDNA FLJ20523 fis, clone KAT10456./codon_start=1/protein_id="BAA91234.1"/db_xref="GI:7020688" 0 1 1 0 1 3 1 1 6 0 2 3 2 4 0 4 2 1 1 2 0 4 4 11 0 6 0 6 0 2 2 2 5 2 0 5 1 0 0 3 0 4 0 3 1 1 1 4 2 2 0 2 2 0 0 3 0 1 1 2 6 0 0 1 >BC003521\BC003521\241..711\471\AAH03521.1\Homo sapiens\Homo sapiens calpain 3, (p94), transcript variant 5, mRNA (cDNAclone IMAGE:3606623), complete cds./gene="CAPN3"/codon_start=1/product="calpain 3, isoform e"/protein_id="AAH03521.1"/db_xref="GI:13097609"/db_xref="GeneID:825"/db_xref="MIM:114240" 1 
0 1 1 1 1 0 8 6 1 0 0 0 2 0 1 2 1 5 3 0 0 0 0 0 0 5 2 1 1 3 3 1 1 0 3 1 2 4 9 10 1 0 5 7 1 0 7 9 5 2 3 3 1 7 3 0 7 4 8 3 0 0 1 >AF118108\AF118108\91..1059\969\AAD42764.1\Homo sapiens\Homo sapiens lymphatic endothelium-specific hyaluronan receptorLYVE-1 mRNA, complete cds./codon_start=1/product="lymphatic endothelium-specific hyaluronan receptor LYVE-1"/protein_id="AAD42764.1"/db_xref="GI:5359673" 2 1 1 1 2 5 2 4 8 5 0 5 3 5 2 6 9 4 9 7 2 15 8 3 0 7 6 8 1 11 6 3 4 4 2 7 10 6 12 13 8 7 4 6 0 0 17 6 3 5 3 4 5 4 7 9 2 6 8 5 4 0 1 0 >AB037972\AB037972\24..227\204\BAB40572.1\Homo sapiens\Homo sapiens mRNA for defensin like protein, complete cds./codon_start=1/product="defensin like protein"/protein_id="BAB40572.1"/db_xref="GI:13516833" 2 0 1 1 3 1 0 2 2 2 1 2 0 0 1 0 1 0 1 0 1 0 2 0 0 1 0 0 0 2 2 4 0 1 0 1 2 1 3 3 1 0 0 2 0 2 1 1 0 0 0 3 5 1 1 2 1 3 0 1 0 1 0 0 >BC096061\BC096061\40..522\483\AAH96061.1\Homo sapiens\Homo sapiens ribonuclease, RNase A family, 3 (eosinophil cationicprotein), mRNA (cDNA clone MGC:116696 IMAGE:40000110), completecds./gene="RNASE3"/codon_start=1/product="ribonuclease, RNase A family, 3 (eosinophil cationic protein)"/protein_id="AAH96061.1"/db_xref="GI:64654615"/db_xref="GeneID:6037"/db_xref="MIM:131398" 2 1 5 2 5 4 0 4 5 4 1 1 2 1 0 1 0 3 1 3 1 4 7 2 0 4 5 2 0 2 1 1 1 3 2 0 3 6 2 0 8 7 3 4 3 3 0 1 3 3 0 4 4 5 3 4 2 3 4 3 2 1 0 0 >AF527764\AF527764\160..1257\1098\AAQ09011.1\Homo sapiens\Homo sapiens tissue-type brain LIM-like protein 2A mRNA, completecds./codon_start=1/product="LIM-like protein 2A"/protein_id="AAQ09011.1"/db_xref="GI:33327362" 1 9 11 1 0 3 0 8 22 0 0 1 0 5 4 1 6 0 2 4 1 1 2 9 5 2 0 23 4 2 1 13 7 3 0 7 12 0 1 26 14 3 1 10 14 4 3 23 10 2 10 2 25 8 15 5 0 6 3 7 3 0 0 1 >BC094732\BC094732\148..393\246\AAH94732.1\Homo sapiens\Homo sapiens A kinase (PRKA) anchor protein 7, transcript variantalpha, mRNA (cDNA clone IMAGE:5275622), complete cds./gene="AKAP7"/codon_start=1/product="A-kinase anchor protein 7, isoform 
alpha"/protein_id="AAH94732.1"/db_xref="GI:63100300"/db_xref="GeneID:9465"/db_xref="MIM:604693" 0 0 0 0 1 3 1 2 2 1 0 0 1 0 0 1 1 2 1 1 0 0 0 1 1 1 1 0 1 3 2 2 3 0 1 1 3 0 4 4 5 4 0 5 0 0 5 5 2 3 0 1 2 0 1 1 0 1 0 1 0 0 0 1 >HSA269537\AJ269537\68..1126\1059\CAB87380.1\Homo sapiens\Homo sapiens mRNA for chondroitin-4-sulfotransferase (C4ST gene)./gene="C4ST"/codon_start=1/product="chondroitin-4-sulfotransferase"/protein_id="CAB87380.1"/db_xref="GI:7572958"/db_xref="GOA:Q9NPF2"/db_xref="UniProt/Swiss-Prot:Q9NPF2" 2 4 12 1 2 4 1 8 20 0 1 5 4 5 0 2 9 1 4 11 3 4 6 8 1 1 3 10 1 1 2 3 4 1 0 9 14 0 6 17 15 3 3 12 12 1 9 15 9 5 17 4 9 0 13 5 0 15 0 13 2 1 0 0 >CR533546\CR533546\1..927\927\CAG38577.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834H0819D forgene PPT2, palmitoyl-protein thioesterase 2; complete cds, incl.stopcodon./gene="PPT2"/codon_start=1/protein_id="CAG38577.1"/db_xref="GI:49065518" 2 2 7 2 2 1 1 9 20 5 0 6 0 9 2 5 8 1 4 4 1 3 2 12 2 4 3 9 4 2 2 4 13 3 1 5 13 2 0 6 5 5 3 7 9 4 5 8 5 11 7 6 6 1 11 3 1 9 3 8 10 0 0 1 >AF109355\AF109355\62..718\657\AAQ13503.1\Homo sapiens\Homo sapiens MSTP001 mRNA, complete cds./codon_start=1/product="MSTP001"/protein_id="AAQ13503.1"/db_xref="GI:33337733" 2 2 1 0 1 1 1 5 15 0 0 4 0 3 2 1 5 2 1 6 2 2 2 1 1 1 1 13 1 1 0 3 3 3 1 3 5 1 2 10 7 2 3 12 2 3 8 9 10 4 5 2 4 1 6 5 1 9 5 8 4 1 0 0 >BC017355\BC017355\25..1590\1566\AAH17355.1\Homo sapiens\Homo sapiens karyopherin alpha 3 (importin alpha 4), mRNA (cDNAclone MGC:29709 IMAGE:5020312), complete cds./gene="KPNA3"/codon_start=1/product="karyopherin alpha 3"/protein_id="AAH17355.1"/db_xref="GI:16878323"/db_xref="GeneID:3839"/db_xref="MIM:601892" 1 2 2 1 10 2 6 6 10 12 13 8 10 4 0 5 5 6 15 5 0 5 10 8 3 7 16 5 0 15 9 5 2 8 7 11 12 18 16 9 15 19 15 20 4 6 25 9 6 28 4 3 2 7 7 9 21 6 14 6 6 1 0 0 >AK097034\AK097034\293..655\363\BAC04932.1\Homo sapiens\Homo sapiens cDNA FLJ39715 fis, clone SMINT2013228./codon_start=1/protein_id="BAC04932.1"/db_xref="GI:21756673" 0 
1 0 1 1 2 2 7 4 0 4 0 1 1 1 2 1 3 8 2 0 0 3 0 2 2 3 3 0 2 0 3 3 2 1 1 2 1 0 1 2 3 0 4 8 2 0 2 1 1 3 4 4 2 0 5 1 0 1 4 3 0 0 1 >AK097954\AK097954\117..488\372\BAC05204.1\Homo sapiens\Homo sapiens cDNA FLJ40635 fis, clone THYMU2015825./codon_start=1/protein_id="BAC05204.1"/db_xref="GI:21757865" 1 0 2 1 5 2 1 1 7 4 1 5 4 4 1 1 1 1 2 2 0 2 4 2 0 0 2 1 0 1 2 4 4 6 1 1 3 1 4 0 0 2 3 1 4 2 0 4 3 2 0 0 5 1 0 2 0 2 0 4 4 0 0 1 >HSIGKLV40\X72461\1..366\366\CAA51129.1\Homo sapiens\H.sapiens mRNA for rearranged Ig kappa light chain variable region(I.27)./codon_start=1/product="Ig kappa light chain (VJC)"/protein_id="CAA51129.1"/db_xref="GI:441391" 1 0 1 0 1 3 0 7 6 0 1 1 1 1 0 4 3 3 1 7 1 3 3 1 0 2 2 3 1 2 2 2 5 2 0 4 2 1 4 2 0 2 3 6 0 2 2 1 2 2 1 3 2 1 3 1 0 4 0 1 3 0 0 0 >BC042558\BC042558\473..1270\798\AAH42558.1\Homo sapiens\Homo sapiens zinc finger, DHHC-type containing 21, mRNA (cDNA cloneMGC:42917 IMAGE:4837003), complete cds./gene="ZDHHC21"/codon_start=1/product="zinc finger, DHHC domain containing 21"/protein_id="AAH42558.1"/db_xref="GI:27503777"/db_xref="GeneID:340481" 2 1 1 3 5 6 3 6 5 3 6 5 2 3 1 2 1 0 3 1 0 7 9 4 0 3 2 5 0 0 5 6 0 4 0 3 2 10 2 7 4 6 2 4 7 8 10 2 2 5 6 4 6 9 8 12 9 4 12 9 8 1 0 0 >AY009107\AY009107\582..2021\1440\AAG49398.1\Homo sapiens\Homo sapiens PRTD-NY3 mRNA, complete cds./codon_start=1/product="PRTD-NY3"/protein_id="AAG49398.1"/db_xref="GI:12330998" 0 1 1 0 3 3 5 10 15 5 1 12 2 5 1 3 11 8 6 6 4 12 2 9 0 6 10 8 3 5 7 15 8 8 5 7 13 3 12 32 15 9 8 14 8 5 8 21 10 10 9 5 2 5 11 7 6 21 11 19 8 0 0 1 >AF203910\AF203910\125..1042\918\AAG23728.1\Homo sapiens\Homo sapiens serine/threonine kinase KRCT mRNA, complete cds./codon_start=1/product="serine/threonine kinase KRCT"/protein_id="AAG23728.1"/db_xref="GI:10834569" 3 4 6 1 3 3 4 9 18 5 2 4 1 4 1 3 3 3 0 6 1 3 5 4 1 4 3 11 2 5 3 7 6 6 0 4 10 1 2 8 4 5 7 14 7 8 5 13 11 5 4 4 5 4 9 1 2 11 3 9 5 0 0 1 >AY509035\AY509035\193..4353\4161\AAS91662.1\Homo sapiens\Homo sapiens roundabout-like protein 3 (ROBO3) 
mRNA, complete cds./gene="ROBO3"/codon_start=1/product="roundabout-like protein 3"/protein_id="AAS91662.1"/db_xref="GI:46395048" 12 21 20 11 12 22 7 29 70 13 0 10 12 34 4 21 48 28 17 21 13 13 41 66 18 38 27 48 25 22 29 51 41 10 7 22 63 2 10 21 24 9 12 57 14 6 28 61 22 21 19 8 16 4 21 3 3 20 5 23 31 0 0 1 >AK027159\AK027159\88..1692\1605\BAB15677.1\Homo sapiens\Homo sapiens cDNA: FLJ23506 fis, clone LNG03055./codon_start=1/protein_id="BAB15677.1"/db_xref="GI:10440218" 6 3 6 2 15 15 2 10 10 14 2 4 5 7 1 10 16 7 9 3 2 9 7 5 2 7 6 11 3 7 16 7 13 6 6 6 12 7 23 12 4 3 6 20 24 7 19 23 10 9 5 10 9 19 12 13 2 5 6 8 6 1 0 0 >AY007112\AY007112\429..1001\573\AAG01989.1\Homo sapiens\Homo sapiens clone TCCCTA00141 mRNA sequence./codon_start=1/protein_id="AAG01989.1"/db_xref="GI:9956007" 2 1 2 0 2 3 5 3 4 7 1 3 1 5 0 3 2 2 3 2 1 2 2 1 1 2 5 2 1 4 1 4 2 1 2 0 2 1 8 5 5 5 2 9 1 5 15 7 3 4 0 5 3 2 2 5 2 6 6 4 1 1 0 0 >BC031632\BC031632\494..1387\894\AAH31632.1\Homo sapiens\Homo sapiens KIAA1257, mRNA (cDNA clone MGC:35174 IMAGE:5170293),complete cds./gene="KIAA1257"/codon_start=1/product="KIAA1257 protein"/protein_id="AAH31632.1"/db_xref="GI:21594958"/db_xref="GeneID:57501" 1 0 1 2 9 5 1 4 6 4 8 7 6 5 2 7 4 3 6 4 2 6 6 5 4 1 5 5 0 2 5 3 1 3 2 4 6 3 16 14 5 4 6 9 6 4 17 12 9 3 2 3 1 1 4 5 3 5 7 7 6 1 0 0 >AF043045\AF043045\166..7974\7809\AAC33845.1\Homo sapiens\Homo sapiens actin-binding protein homolog ABP-278 mRNA, completecds./codon_start=1/product="actin-binding protein homolog ABP-278"/protein_id="AAC33845.1"/db_xref="GI:3282771" 12 17 20 7 11 17 6 38 68 10 5 13 15 33 14 26 63 27 44 69 23 32 41 75 24 60 37 89 12 44 53 91 79 54 12 78 131 42 68 104 59 29 15 57 45 13 71 93 84 72 55 34 29 9 54 32 9 80 45 37 16 1 0 0 >BC044658\BC044658\111..1505\1395\AAH44658.1\Homo sapiens\Homo sapiens FKSG44 gene, mRNA (cDNA clone MGC:47564IMAGE:5759383), complete cds./gene="FKSG44"/codon_start=1/product="FKSG44 protein"/protein_id="AAH44658.1"/db_xref="GI:27881616"/db_xref="GeneID:83786" 5 11 11 1 
1 5 1 14 38 2 0 3 1 12 4 1 13 8 3 6 1 1 4 16 7 5 3 21 6 8 2 22 11 2 1 9 22 3 1 17 5 0 1 24 10 4 4 32 20 6 10 3 7 2 13 3 0 10 1 3 4 0 0 1 >HS223H9#1\AL008582\complement(37838..38872)\1035\CAB62938.1\Homo sapiens\Human DNA sequence from clone CTA-223H9 on chromosome 22q12.3-13.2,complete sequence./locus_tag="CTA-223H9.1-001"/standard_name="OTTHUMP00000028652"/codon_start=1/protein_id="CAB62938.1"/db_xref="GI:6572197"/db_xref="Genew:11980"/db_xref="GOA:Q14106"/db_xref="InterPro:IPR002087"/db_xref="UniProt/Swiss-Prot:Q14106" 0 5 5 0 0 0 2 5 18 1 0 2 5 10 1 3 16 6 1 9 1 1 7 13 2 6 3 17 3 4 3 17 7 10 2 1 16 1 6 14 13 2 0 18 3 1 3 16 7 5 7 2 2 1 12 9 0 6 5 7 2 0 0 1 >HSVATPA\X69151\167..1315\1149\CAA48903.1\Homo sapiens\H.sapiens mRNA for subunit C of vacuolar proton-ATPase V1 domain./codon_start=1/product="vacuolar proton-ATPase"/protein_id="CAA48903.1"/db_xref="GI:37643"/db_xref="GOA:P21283"/db_xref="UniProt/Swiss-Prot:P21283" 2 0 2 1 4 5 5 1 12 8 7 14 4 2 0 7 3 6 4 2 1 8 5 3 0 3 14 4 1 8 6 3 1 1 9 4 7 14 21 17 9 16 8 8 3 3 17 9 12 14 7 8 1 2 9 6 3 1 13 8 6 0 0 1 >AF357881\AF357881\25..3117\3093\AAK70402.1\Homo sapiens\Homo sapiens transcriptional coactivator MMS19 (MMS19) mRNA,complete cds./gene="MMS19"/codon_start=1/product="transcriptional coactivator MMS19"/protein_id="AAK70402.1"/db_xref="GI:14586956" 6 7 15 5 11 6 14 31 74 25 5 23 9 18 0 21 26 12 16 15 2 9 14 21 4 25 22 30 6 37 7 17 10 10 11 13 50 7 11 26 12 12 10 46 18 14 24 35 27 21 6 8 18 12 28 14 2 18 7 19 8 0 0 1 >BT009921\BT009921\1..1029\1029\AAP88923.1\Homo sapiens\Homo sapiens homeo box C10 mRNA, complete cds./codon_start=1/product="homeo box C10"/protein_id="AAP88923.1"/db_xref="GI:32880185" 2 8 5 0 2 6 0 8 12 2 0 4 0 12 6 2 17 5 4 8 4 2 2 15 6 7 5 10 7 3 4 9 7 0 2 3 3 1 11 13 8 10 3 7 4 1 11 21 10 1 8 6 8 1 6 3 1 3 2 8 3 0 1 0 >AB061822\AB061822\join(1094..1096,1623..1724,2375..2469,5142..5241, 5343..5396,5674..5982)\663\BAB79460.1\Homo sapiens\Homo sapiens RPL14 gene for ribosomal protein L14, complete 
cds andsequence./gene="RPL14"/codon_start=1/product="ribosomal protein L14"/protein_id="BAB79460.1"/db_xref="GI:17932938" 2 1 1 1 3 5 0 3 1 1 0 2 0 1 0 2 0 2 4 1 0 4 5 1 1 7 10 14 3 23 3 3 0 3 1 4 4 6 17 20 2 2 4 10 2 1 3 1 1 6 0 2 2 0 4 4 1 4 3 6 3 1 0 0 >BC036027\BC036027\205..3483\3279\AAH36027.1\Homo sapiens\Homo sapiens chromosome 13 open reading frame 22, mRNA (cDNA cloneMGC:33089 IMAGE:5271146), complete cds./gene="C13orf22"/codon_start=1/product="chromosome 13 open reading frame 22"/protein_id="AAH36027.1"/db_xref="GI:23273907"/db_xref="GeneID:10208" 3 0 2 5 11 6 11 8 18 21 32 22 34 11 7 33 12 22 31 12 0 27 32 7 3 23 15 11 1 35 19 9 9 17 18 5 13 23 48 33 14 52 21 30 14 19 56 20 20 35 2 20 8 23 8 27 18 7 25 15 9 0 0 1 >HSTCRGSA\X72500\1..825\825\CAA51165.1\Homo sapiens\H.sapiens mRNA for soluble gamma TCR./codon_start=1/product="gamma-delta T-cell receptor"/protein_id="CAA51165.1"/db_xref="GI:298107" 1 1 0 0 4 1 2 3 11 7 1 2 8 6 0 4 2 1 12 3 5 5 2 4 1 6 6 3 0 4 5 4 2 4 4 5 5 3 13 12 3 6 5 6 4 2 10 9 5 10 4 4 0 6 2 6 6 5 11 4 4 1 0 0 >AF193054\AF193054\286..1224\939\AAG22482.1\Homo sapiens\Homo sapiens PP3241 mRNA, complete cds./gene="PP3241"/codon_start=1/product="unknown"/protein_id="AAG22482.1"/db_xref="GI:10732632" 2 5 10 4 2 2 2 10 16 5 0 7 2 5 1 3 2 0 3 11 3 4 2 3 1 7 5 15 2 6 2 7 5 3 0 4 15 1 4 11 5 2 2 15 9 2 4 13 9 6 1 1 4 4 7 4 2 16 4 10 0 1 0 0 >HSCALB\X65869\55..294\240\CAA46699.1\Homo sapiens\H.sapiens mRNA for calbindin D9k, a calcium binding protein./function="calcium-binding protein"/codon_start=1/product="calbindin-D9k"/protein_id="CAA46699.1"/db_xref="GI:29602"/db_xref="GOA:P29377"/db_xref="UniProt/Swiss-Prot:P29377" 0 0 0 0 0 1 2 2 3 0 2 2 1 1 0 1 0 3 0 1 0 1 2 1 0 1 1 1 0 1 2 0 0 2 2 0 0 1 5 6 1 1 2 3 0 0 9 1 2 5 0 1 0 0 2 3 1 0 2 1 0 0 0 1 >AB110940\AB110940\join(801..1145,9755..9850,10833..10970,11281..11303, 11486..11573)\690\BAD13706.1\Homo sapiens\Homo sapiens TRIM40 gene for TRIM40 protein, complete cds,cell_line: 
LKT3./gene="TRIM40"/codon_start=1/product="TRIM40 protein"/protein_id="BAD13706.1"/db_xref="GI:46091155" 2 1 3 0 4 6 2 8 12 6 3 3 1 2 0 3 6 2 4 3 1 0 2 6 0 3 3 6 1 4 2 2 2 2 1 4 7 1 3 13 5 3 0 17 4 4 10 13 5 1 1 2 10 4 3 0 0 5 4 3 1 0 0 1 >AK023667\AK023667\638..1090\453\BAB14631.1\Homo sapiens\Homo sapiens cDNA FLJ13605 fis, clone PLACE1010562./codon_start=1/protein_id="BAB14631.1"/db_xref="GI:10435659" 0 2 1 0 1 6 0 4 17 2 1 0 2 4 0 1 3 4 1 3 0 1 2 3 0 0 2 7 2 2 1 2 1 4 0 2 3 1 0 4 1 0 4 5 3 1 1 8 3 4 5 4 5 1 4 1 1 1 1 5 3 0 0 1 >BC017221\BC017221\105..1736\1632\AAH17221.1\Homo sapiens\Homo sapiens chromosome 12 open reading frame 22, mRNA (cDNA cloneMGC:13118 IMAGE:4100454), complete cds./gene="C12orf22"/codon_start=1/product="TGF-beta induced apotosis protein 12"/protein_id="AAH17221.1"/db_xref="GI:16878013"/db_xref="GeneID:81566" 4 8 13 2 0 4 3 12 28 2 1 3 11 14 3 9 18 7 9 10 5 5 12 13 1 11 8 15 1 11 4 12 7 7 3 7 22 2 7 16 8 10 3 20 6 4 23 33 11 24 5 4 8 12 10 8 0 7 8 8 1 0 0 1 >AL591668#3\AL591668\join(79309..79446,79696..79797,95192..95325,95600..95643, 115179..115348,115514..115657,116707..116913)\939\CAI41601.1\Homo sapiens\Human DNA sequence from clone RP13-546E3 on chromosome X Containsthe 5' end of the PHF6 gene for PHD finger protein 6 and a CpGisland, complete sequence./gene="PHF6"/locus_tag="AC004383.6-003"/standard_name="OTTHUMP00000024064"/codon_start=1/product="PHD finger protein 6"/protein_id="CAI41601.1"/db_xref="GI:57209487" 4 2 0 0 9 5 1 2 4 2 4 5 13 5 0 8 3 8 11 2 1 3 2 2 0 6 9 2 1 4 10 3 3 2 4 4 2 1 22 12 2 7 3 7 8 10 19 4 3 9 3 5 7 11 0 14 3 1 8 7 0 0 1 0 >AB032928\AB032928\112..1677\1566\BAA89344.1\Homo sapiens\Homo sapiens mRNA for hMYHgamma3, complete cds./gene="hMYH"/codon_start=1/product="hMYHgamma3"/protein_id="BAA89344.1"/db_xref="GI:6691537" 3 7 12 5 5 8 7 7 31 4 1 2 5 9 2 8 11 6 6 16 2 3 12 11 3 7 13 27 0 12 7 11 14 7 6 8 21 6 4 13 9 2 7 39 10 2 9 23 9 7 5 6 7 5 8 3 0 4 3 9 12 0 0 1 >BC031976\BC031976\682..804\123\AAH31976.1\Homo 
sapiens\Homo sapiens cDNA clone MGC:42850 IMAGE:4830480, complete cds./codon_start=1/product="Unknown (protein for MGC:42850)"/protein_id="AAH31976.1"/db_xref="GI:71297011" 1 0 0 0 3 0 1 2 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 1 1 2 4 1 3 0 0 0 2 0 2 3 0 0 0 0 0 0 1 2 0 2 0 1 0 0 >BC005385\BC005385\43..834\792\AAH05385.1\Homo sapiens\Homo sapiens similar to Chymotrypsinogen B precursor, mRNA (cDNAclone MGC:12508 IMAGE:3950220), complete cds./gene="LOC440387"/codon_start=1/product="similar to Chymotrypsinogen B precursor"/protein_id="AAH05385.1"/db_xref="GI:13529251"/db_xref="GeneID:440387" 0 1 0 1 0 4 0 8 16 0 0 0 0 15 0 2 6 1 4 15 0 0 0 8 0 5 1 18 1 4 1 15 7 2 0 10 14 0 1 14 6 3 1 8 3 0 1 5 14 1 2 0 9 2 8 1 1 9 2 3 10 0 0 1 >CR456776\CR456776\1..1020\1020\CAG33057.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834F024D forgene TBP, TATA box binding protein; complete cds, incl. stopcodon./gene="TBP"/codon_start=1/protein_id="CAG33057.1"/db_xref="GI:48145669" 2 0 1 1 7 4 2 4 7 5 4 5 3 5 1 4 2 7 5 8 4 8 9 8 2 9 12 7 1 7 6 5 3 5 5 1 6 5 6 9 5 5 11 49 2 0 8 5 2 2 3 6 1 2 5 8 2 7 11 10 0 1 0 0 >D86043\D86043\3..1514\1512\BAA12974.1\Homo sapiens\Homo sapiens mRNA for SHPS-1, complete cds./codon_start=1/product="SHPS-1"/protein_id="BAA12974.1"/db_xref="GI:1864011" 2 2 6 2 6 6 2 10 17 1 1 6 9 10 3 9 12 1 11 21 3 7 4 18 7 8 5 24 8 6 8 9 8 3 5 9 25 6 7 18 14 10 3 22 11 3 6 28 15 5 8 4 8 1 9 2 1 13 4 5 6 0 0 1 >AF022728\AF022728\162..2045\1884\AAC05082.1\Homo sapiens\Homo sapiens beta-dystrobrevin (BDTN) mRNA, complete cds./gene="BDTN"/codon_start=1/product="beta-dystrobrevin"/protein_id="AAC05082.1"/db_xref="GI:2935183" 8 8 6 7 8 8 3 10 30 10 5 7 7 14 3 8 10 8 7 13 5 7 8 13 2 14 17 14 1 10 6 10 6 6 3 8 10 9 11 20 15 11 9 32 12 13 20 29 16 8 5 7 9 6 6 15 8 7 7 28 4 0 1 0 >BC013996\BC013996\41..1033\993\AAH13996.1\Homo sapiens\Homo sapiens aldo-keto reductase family 7, member A2 (aflatoxinaldehyde reductase), mRNA (cDNA clone MGC:20271 
IMAGE:3625337),complete cds./gene="AKR7A2"/codon_start=1/product="AKR7A2 protein"/protein_id="AAH13996.1"/db_xref="GI:15559276"/db_xref="GeneID:8574"/db_xref="MIM:603418" 0 7 6 0 1 4 2 7 21 1 0 4 3 5 2 0 7 2 1 10 4 2 2 7 4 5 4 21 4 6 2 17 8 2 0 4 13 1 3 10 6 4 2 13 11 2 6 18 10 2 9 4 4 3 9 4 0 4 2 8 7 0 1 0 >S68252\S68252\4..627\624\AAD14009.1\Homo sapiens\TIMP-1=metalloproteinase inhibitor [human, keratoconus keratocytes,mRNA, 636 nt]./gene="TIMP-1"/codon_start=1/protein_id="AAD14009.1"/db_xref="GI:4261709" 0 2 4 2 0 3 0 5 10 1 4 4 0 6 0 2 4 3 2 9 2 3 4 8 0 1 0 10 0 5 3 6 4 0 0 5 3 1 3 5 3 1 2 10 5 1 3 7 3 2 4 2 9 3 7 4 2 6 1 4 4 0 0 1 >BC101124\BC101124\123..251\129\AAI01125.1\Homo sapiens\Homo sapiens cDNA clone MGC:119771 IMAGE:40014007, complete cds./codon_start=1/product="Unknown (protein for MGC:119771)"/protein_id="AAI01125.1"/db_xref="GI:71682794" 0 3 1 0 1 0 0 1 0 0 0 0 0 1 1 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0 1 0 0 0 1 0 1 3 3 0 1 0 0 1 1 1 1 1 1 0 3 2 2 0 0 2 0 1 2 0 0 1 >BC078662\BC078662\130..756\627\AAH78662.1\Homo sapiens\Homo sapiens RAB6B, member RAS oncogene family, mRNA (cDNA cloneMGC:88263 IMAGE:6450986), complete cds./gene="RAB6B"/codon_start=1/product="RAB6B protein"/protein_id="AAH78662.1"/db_xref="GI:50927456"/db_xref="GeneID:51560" 2 3 1 1 1 5 1 2 8 1 0 4 1 4 1 2 7 2 4 5 5 1 1 4 1 0 2 2 3 4 2 5 7 1 0 4 8 1 6 8 4 3 1 10 0 0 2 12 12 3 6 0 2 0 8 2 1 9 4 7 2 1 0 0 >AK027060\AK027060\7..636\630\BAB15643.1\Homo sapiens\Homo sapiens cDNA: FLJ23407 fis, clone HEP19601./codon_start=1/protein_id="BAB15643.1"/db_xref="GI:10440085" 3 2 4 1 8 3 0 7 1 4 0 2 3 1 0 2 8 5 2 0 0 9 3 2 0 0 3 4 0 1 4 2 8 1 2 1 1 1 11 11 0 4 2 11 9 6 10 9 1 3 1 5 5 9 6 1 1 3 2 1 0 1 0 0 >AC093674\AC093674\join(21995..22131,27668..27720,58421..58528,59567..59697, 62635..62728,71278..71633)\879\AAY24212.1\Homo sapiens\Homo sapiens BAC clone RP11-592G8 from 2, complete sequence./gene="HNMT"/codon_start=1/product="unknown"/protein_id="AAY24212.1"/db_xref="GI:62988825" 0 1 1 0 2 3 3 
8 7 8 1 4 5 4 1 5 3 6 5 4 1 3 5 1 0 2 5 3 0 6 7 3 4 4 3 0 2 7 11 14 6 7 7 6 2 4 11 11 12 8 4 6 5 1 5 10 5 3 12 10 5 1 0 0 >HSARSD\X83572\77..1858\1782\CAA58555.1\Homo sapiens\Homo sapiens ARSD gene, complete CDS./gene="ARSD"/codon_start=1/protein_id="CAA58555.1"/db_xref="GI:791002"/db_xref="GOA:P51689"/db_xref="InterPro:IPR000917"/db_xref="UniProt/Swiss-Prot:P51689" 5 5 5 0 9 8 5 15 26 8 5 6 4 11 2 6 7 5 5 14 9 5 4 19 9 5 16 19 10 5 17 19 15 13 2 5 21 4 6 9 10 9 1 21 17 16 9 18 17 9 9 5 9 6 19 10 3 9 7 12 14 0 0 1 >AL391384#1\AL391384\join(complement(1897..2124),complement(1138..1295), complement(AL138695.13:66652..66845), complement(AL138695.13:65885..65958), complement(AL138695.13:64390..64557), complement(AL138695.13:63676..63840), complement(AL138695.13:62411..62524), complement(AL138695.13:62149..62286), complement(AL138695.13:61158..61304), complement(AL138695.13:60624..60740), complement(AL138695.13:60260..60361), complement(AL138695.13:59546..59610), complement(AL138695.13:59369..59453), complement(AL138695.13:57250..57377), complement(AL138695.13:54437..54523), complement(AL138695.13:51916..52072), complement(AL138695.13:50388..50602), complement(AL138695.13:50111..50279), complement(AL138695.13:49828..49986), complement(AL138695.13:48994..49116), complement(AL138695.13:48260..48343))\228\CAI15670.1\Homo sapiens\Human DNA sequence from clone RP11-555G22 on chromosome 13 Containsthe 5' end of the gene for mitotic control protein (DIS3,KIAA1008),the 5' end of the gene for progesterone-induced blocking factor 1(PIBF1) and a CpG island, complete sequence./gene="RP11-342J4.3"/locus_tag="RP11-342J4.3-003"/standard_name="OTTHUMP00000018498"/codon_start=1/protein_id="CAI15670.1"/db_xref="GI:55959542"/db_xref="GOA:Q5W0P7"/db_xref="InterPro:IPR001900"/db_xref="InterPro:IPR006596"/db_xref="UniProt/TrEMBL:Q5W0P7" 2 1 2 1 29 14 17 13 23 8 26 10 23 10 4 16 11 15 28 10 4 14 25 10 7 12 17 8 7 8 12 11 8 9 10 9 7 11 49 27 17 31 19 21 25 17 17 13 15 15 22 15 19 15 17 24 23 
16 16 13 9 29 8 15 >AF289485\AF289485\21..1151\1131\AAG17847.1\Homo sapiens\Homo sapiens MYG1 homolog mRNA, complete cds./codon_start=1/product="MYG1 homolog"/protein_id="AAG17847.1"/db_xref="GI:10444289" 9 6 9 2 2 1 0 9 25 4 2 4 2 8 0 2 8 5 1 11 2 3 9 9 7 7 12 11 2 4 5 7 9 5 0 3 13 3 4 7 2 3 5 12 13 3 5 22 14 5 6 6 3 3 11 3 1 9 3 8 7 0 1 0 >HUMHVDC\L29384\166..6921\6756\AAA59204.1\Homo sapiens\Homo sapiens (clone pcDNA-alpha1E-1) voltage-dependent calciumchannel alpha-1E-1 subunit mRNA, complete cds./codon_start=1/product="voltage-dependent calcium channel alpha-1E-1"/protein_id="AAA59204.1"/db_xref="GI:495868" 21 33 39 13 16 36 18 54 97 20 7 30 25 62 7 27 42 29 17 61 15 20 21 42 15 30 28 81 9 34 26 52 41 12 8 48 72 12 26 75 60 34 11 75 38 15 44 106 61 33 35 27 21 14 68 53 12 84 33 77 29 0 1 0 >AF039843\AF039843\391..1338\948\AAC04258.1\Homo sapiens\Homo sapiens Sprouty 2 (SPRY2) mRNA, complete cds./gene="SPRY2"/codon_start=1/product="Sprouty 2"/protein_id="AAC04258.1"/db_xref="GI:2809400" 3 3 3 2 8 6 1 7 8 4 1 5 5 8 4 8 8 3 5 4 3 4 8 8 0 13 1 8 0 4 1 5 8 7 2 7 3 4 8 8 9 2 1 16 8 2 3 11 10 5 4 3 12 14 2 2 2 4 1 3 3 0 1 0 >AL512785#3\AL512785\complement(join(84515..84550,86158..86322,89974..90037, 99118..99251))\399\CAI15780.1\Homo sapiens\Human DNA sequence from clone RP11-565P22 on chromosome 1 Containstwo novel genes, the 3' end of the gene for C-terminal PDZ domainligand of neuronal nitric oxide synthase (CAPON), a novel gene(LOC284680), the EAT2 gene for SH2 domain-containing molecule EAT2and a CpG island, complete sequence./gene="EAT2"/locus_tag="RP11-565P22.5-001"/standard_name="OTTHUMP00000029531"/codon_start=1/product="SH2 domain-containing molecule EAT2"/protein_id="CAI15780.1"/db_xref="GI:55960297"/db_xref="InterPro:IPR000980" 1 0 0 1 5 1 1 2 5 2 3 5 0 1 2 1 5 1 2 3 0 1 5 1 0 2 1 0 0 0 3 1 3 1 0 5 4 1 6 5 3 2 1 3 2 1 5 4 2 4 5 2 1 2 1 5 3 2 1 2 1 0 0 1 >AF091871#1\AF091871\17..283\267\AAD14598.1\Homo sapiens\Homo sapiens molybdopterin-synthase small and 
large subunit (MOCS2)bicistronic mRNA, complete cds./gene="MOCS2"/function="molybdenum cofactor biosynthesis"/codon_start=1/product="molybdopterin-synthase small subunit"/protein_id="AAD14598.1"/db_xref="GI:4262372" 1 0 0 2 1 0 0 2 2 2 0 3 1 0 0 1 0 2 1 1 0 1 0 2 1 3 1 1 1 3 6 0 0 0 1 1 3 5 2 1 0 1 2 5 0 1 6 3 1 2 0 2 1 0 0 2 5 1 3 1 1 0 1 0 >BC009743\BC009743\104..1699\1596\AAH09743.1\Homo sapiens\Homo sapiens coiled-coil domain containing 9, mRNA (cDNA cloneMGC:10357 IMAGE:3832697), complete cds./gene="CCDC9"/codon_start=1/product="coiled-coil domain containing 9"/protein_id="AAH09743.1"/db_xref="GI:14602475"/db_xref="GeneID:26093" 6 17 25 3 2 9 0 4 11 4 0 3 5 11 3 5 10 7 4 5 3 7 12 23 3 18 7 21 1 12 9 22 14 5 1 4 8 1 5 21 7 4 2 19 11 2 16 67 16 11 3 2 1 1 6 2 1 8 2 8 11 0 0 1 >HS889N15#8\AL031177\join(44805..44814,79055..79165,81759..81830,84238..84336, 87035..87136,87317..87389,90069..90148,90782..90969, 91091..91169,103130..103275,104780..104836,105957..106065, 106620..106690)\1197\CAI43137.1\Homo sapiens\Human DNA sequence from clone RP5-889N15 on chromosome Xq22.1-22.3Contains the 3' end of a novel gene (MGC44287), the PSMD10 gene forproteasome (prosome, macropain) 26S subunit, non-ATPase, 10 (p28),the AUTL2 gene for AUT-like 2, cysteine endopeptidase (S.cerevisiae) (APG4A, MGC43691), the 3' end of the COL4A6 gene forcollagen, type IV, alpha 6 and a CpG island, complete sequence./gene="APG4A"/locus_tag="RP5-889N15.2-001"/standard_name="OTTHUMP00000023821"/codon_start=1/product="APG4 autophagy 4 homolog A (S. 
cerevisiae)"/protein_id="CAI43137.1"/db_xref="GI:57284015"/db_xref="GOA:Q8WYN0"/db_xref="InterPro:IPR005078"/db_xref="UniProt/Swiss-Prot:Q8WYN0" 1 3 0 2 3 4 6 3 11 6 11 6 10 3 0 7 2 4 8 4 2 7 9 6 0 7 6 4 1 8 12 3 4 5 3 3 5 10 18 8 7 7 8 13 2 5 18 12 11 15 5 6 6 6 9 12 3 8 9 11 10 0 1 0 >HUMCACNL1E\L29534\14..6448\6435\AAA51900.1\Homo sapiens\Homo sapiens cardiac L-type voltage-dependent calcium channel a1subunit (CACNL1A1) mRNA, complete cds./gene="CACNL1A1"/function="calcium channel"/codon_start=1/evidence=experimental/product="cardiac L-type voltage-dependent calcium channel a1 subunit"/protein_id="AAA51900.1"/db_xref="GI:463079"/db_xref="GDB:G00-126-094" 14 25 36 4 7 35 9 69 117 9 7 13 7 56 4 19 58 12 17 58 22 12 26 55 10 17 25 90 21 29 18 54 41 18 4 48 67 12 20 81 74 25 12 69 30 9 41 103 53 34 42 14 28 15 80 47 6 107 30 53 26 0 1 0 >BC063652\BC063652\282..1055\774\AAH63652.1\Homo sapiens\Homo sapiens potassium channel tetramerisation domain containing 1,mRNA (cDNA clone MGC:75559 IMAGE:4802987), complete cds./gene="KCTD1"/codon_start=1/product="KCTD1 protein"/protein_id="AAH63652.1"/db_xref="GI:39645903"/db_xref="GeneID:284252" 2 0 4 2 9 5 3 9 5 3 1 7 3 8 2 2 4 1 3 3 3 4 5 5 0 7 5 2 0 0 6 7 1 3 3 8 5 1 6 3 4 5 4 8 6 0 10 7 10 5 5 4 1 4 4 7 2 9 4 6 2 1 0 0 >AY358607\AY358607\68..508\441\AAQ88970.1\Homo sapiens\Homo sapiens clone DNA77303 SRSR846 (UNQ846) mRNA, complete cds./locus_tag="UNQ846"/codon_start=1/product="SRSR846"/protein_id="AAQ88970.1"/db_xref="GI:37182336" 1 1 0 0 3 2 3 5 4 3 1 5 3 2 1 2 0 2 2 4 1 2 5 1 1 6 3 0 1 5 3 3 2 0 2 2 4 5 1 0 0 3 3 2 3 0 6 0 5 2 0 5 3 4 2 2 1 4 4 3 3 0 0 1 >AL669918#7\AL669918\complement(join(43384..43472,43956..44160,44559..44688, 45097..45208,45367..45514,46275..46421))\831\CAI18138.1\Homo sapiens\Human DNA sequence from clone XXbac-246D15 on chromosome 6 containsthe HLA-DOB gene for major histocompatibility complex, class II, DObeta, the TAP2 gene for transporter 2, ATP-binding cassette,subfamily B (MDR/TAP), the PSMB8 
gene for proteasome (prosome,macropain) subunit, beta type, 8 (large multifunctional protease7), the TAP1 gene for transporter 1, ATP-binding cassette,sub-family B (MDR/TAP), the PSMB9 gene for proteasome (prosome,macropain) subunit, beta type, 9 (large multifunctional protease2), a novel protein similar to protein phosphatase 1, regulatory(inhibitor) subunit 2 and two CpG islands, complete sequence./gene="PSMB8"/locus_tag="XXbac-BPG246D15.6-002"/standard_name="OTTHUMP00000029420"/codon_start=1/product="proteasome (prosome, macropain) subunit, beta type, 8 (large multifunctional protease 7)"/protein_id="CAI18138.1"/db_xref="GI:55961477"/db_xref="GOA:Q5QNR8"/db_xref="InterPro:IPR000243"/db_xref="InterPro:IPR001353"/db_xref="UniProt/TrEMBL:Q5QNR8" 3 3 8 2 1 3 2 6 10 3 2 0 3 5 3 6 3 7 2 3 2 3 2 3 2 4 5 11 2 5 10 10 7 4 2 3 7 3 1 7 5 5 1 8 4 3 10 5 7 6 11 6 3 3 6 0 0 1 7 14 3 1 0 0 >AL513164#2\AL513164\join(97905..98060,144971..145099,AL033378.12:19328..19378, AL033378.12:19502..19551,AL033378.12:47689..47729, AL033378.12:50561..50647,AL033378.12:53254..53366, AL033378.12:66758..66859,AL033378.12:93468..93600, AL033378.12:98691..99037,AL033378.12:104435..104509, AL033378.12:106597..106740,AL033378.12:110670..110805, AL033378.12:111941..112110,AL033378.12:112915..113124, AL033378.12:113895..114045,AL033378.12:119587..119700, AL033378.12:122824..123962,AL033378.12:125159..125290, AL033378.12:127439..127702)\-600465\CAH73449.1\Homo sapiens\Human DNA sequence from clone RP11-400P17 on chromosome 6 Containsthe 5' part of a novel gene, the 5' part of the gene for KIAA0790protein (KIAA0790) and a CpG island, complete sequence./gene="RP3-323M4.1"/locus_tag="RP3-323M4.1-001"/standard_name="OTTHUMP00000017386"/codon_start=1/protein_id="CAH73449.1"/db_xref="GI:55665282"/db_xref="InterPro:IPR001452"/db_xref="InterPro:IPR001660"/db_xref="InterPro:IPR011510"/db_xref="InterPro:IPR011511"/db_xref="UniProt/TrEMBL:Q5VUX1" 3 4 5 2 17 25 9 20 35 24 22 40 29 20 0 26 20 22 26 13 2 22 20 17 
8 20 16 12 7 15 22 15 16 11 13 11 20 31 45 30 14 27 18 28 19 17 16 27 14 21 15 28 21 26 21 52 21 9 24 24 31 17 14 29 >AF411849S2\AF456925\complement(2643..3659)\1017\AAL47764.1\Homo sapiens\Homo sapiens USH3 region, partial sequence./codon_start=1/product="KIAA0001"/protein_id="AAL47764.1"/db_xref="GI:17901946"/IGNORED_CODON=2 0 1 3 0 3 3 5 9 8 8 3 5 8 5 1 6 9 6 7 3 0 7 3 3 1 6 2 3 0 6 5 1 1 2 4 6 14 5 13 11 7 7 1 11 3 2 8 1 6 2 9 6 5 4 13 14 5 19 14 7 6 0 0 1 >BC075842\BC075842\23..1432\1410\AAH75842.1\Homo sapiens\Homo sapiens immunoglobulin heavy constant gamma 1 (G1m marker),mRNA (cDNA clone MGC:88778 IMAGE:6215815), complete cds./gene="IGHG1"/codon_start=1/product="IGHG1 protein"/protein_id="AAH75842.1"/db_xref="GI:49904194"/db_xref="GeneID:3500"/db_xref="IMGT/GENE-DB:IGHG1"/db_xref="MIM:147100" 2 0 4 1 2 1 1 15 23 0 0 1 6 22 4 7 19 4 5 25 5 3 8 16 7 5 4 9 2 4 6 13 9 2 2 19 26 1 11 23 14 4 1 17 9 2 4 17 15 1 15 4 9 2 12 2 2 6 2 3 11 0 0 1 >HUMMGPHB\J03544\80..2671\2592\AAA59597.1\Homo sapiens\Human brain glycogen phosphorylase mRNA, complete cds./gene="PYGB"/codon_start=1/protein_id="AAA59597.1"/db_xref="GI:307200"/db_xref="GDB:G00-120-326" 2 17 19 1 2 12 1 10 54 8 1 10 3 14 6 2 5 3 3 20 11 4 8 21 6 4 10 32 9 10 5 27 14 6 1 16 47 2 7 47 29 11 0 29 21 5 11 54 44 10 22 12 10 4 28 11 2 38 6 22 14 0 1 0 >AF461041\AF461041\146..7096\6951\AAL67803.1\Homo sapiens\Homo sapiens AF15q14 isoform 2 mRNA, complete cds; alternativelyspliced./codon_start=1/product="AF15q14 isoform 2"/protein_id="AAL67803.1"/db_xref="GI:18308012" 6 2 3 7 44 16 22 20 36 35 38 42 45 21 1 61 26 60 56 39 5 71 36 16 4 41 28 21 1 44 37 8 14 17 37 20 20 40 117 72 44 122 64 54 22 39 131 73 46 104 13 34 10 40 23 53 66 18 64 54 13 0 1 0 >AK056900\AK056900\94..483\390\BAB71305.1\Homo sapiens\Homo sapiens cDNA FLJ32338 fis, clone PROST2005919, moderatelysimilar to Human breast cancer, estrogen regulated LIV-1 protein(LIV-1) mRNA./codon_start=1/protein_id="BAB71305.1"/db_xref="GI:16552425" 0 2 1 0 1 2 0 5 
15 0 0 2 0 4 1 0 3 0 0 3 1 1 1 0 0 0 4 13 5 4 3 4 6 1 1 2 6 1 0 1 2 0 1 1 4 0 0 7 4 0 2 0 2 0 6 0 0 2 0 3 2 0 0 1 >BC025784\BC025784\1..1404\1404\AAH25784.1\Homo sapiens\Homo sapiens pancreatic lipase-related protein 1, mRNA (cDNA cloneMGC:34434 IMAGE:5226231), complete cds./gene="PNLIPRP1"/codon_start=1/product="pancreatic lipase-related protein 1"/protein_id="AAH25784.1"/db_xref="GI:19343958"/db_xref="GeneID:5407"/db_xref="MIM:604422" 2 2 3 0 4 4 2 13 14 4 0 6 3 6 0 6 12 3 15 9 4 7 8 11 2 4 8 9 1 11 18 10 6 5 1 1 18 6 12 17 11 11 6 10 7 3 9 18 14 14 8 7 11 2 16 12 2 16 8 7 8 1 0 0 >HSU49187\U49187\465..2075\1611\AAC51134.1\Homo sapiens\Human placenta (Diff48) mRNA, complete cds./gene="Diff48"/codon_start=1/protein_id="AAC51134.1"/db_xref="GI:1293561" 3 5 4 0 6 6 6 10 26 7 5 4 7 14 3 9 11 3 11 12 4 2 9 5 1 10 13 14 1 10 9 9 5 10 5 5 7 5 22 17 10 15 6 15 8 2 25 34 15 14 4 8 4 3 9 11 3 12 6 14 3 0 1 0 >BC011786\BC011786\76..417\342\AAH11786.1\Homo sapiens\Homo sapiens putative insulin-like growth factor II associatedprotein, mRNA (cDNA clone MGC:19839 IMAGE:4100006), complete cds./gene="LOC492304"/codon_start=1/product="putative insulin-like growth factor II associated protein"/protein_id="AAH11786.1"/db_xref="GI:15080002"/db_xref="GeneID:492304" 2 0 2 2 1 2 0 3 3 1 0 0 0 1 1 1 0 2 0 4 1 2 4 7 3 4 1 6 1 1 3 5 2 1 0 4 1 2 2 1 2 3 1 4 3 2 0 2 2 0 1 0 4 3 1 1 1 0 4 1 2 0 1 0 >HUMPECAM1\M28526\142..2358\2217\AAA36429.1\Homo sapiens\Human platelet endothelial cell adhesion molecule (PECAM-1) mRNA,complete cds./codon_start=1/protein_id="AAA36429.1"/db_xref="GI:189776" 2 2 2 1 10 8 3 4 25 9 3 4 12 17 2 15 12 14 14 18 8 8 10 12 3 7 8 21 4 9 14 4 6 6 3 24 32 8 26 33 24 13 12 24 11 7 33 24 17 14 12 10 8 8 14 9 11 22 18 19 5 0 1 0 >AF151100\AF151100\320..703\384\AAD42081.1\Homo sapiens\Homo sapiens TDAG51/Ipl homologue 1 (TIH1) mRNA, complete cds./gene="TIH1"/codon_start=1/product="TDAG51/Ipl homologue 1"/protein_id="AAD42081.1"/db_xref="GI:5326878" 0 6 4 0 0 0 1 6 8 0 0 0 0 1 
0 0 4 0 1 7 4 0 0 3 0 0 0 6 3 1 0 9 6 0 0 2 7 0 0 9 2 0 0 7 1 0 3 7 1 1 1 0 3 0 5 0 0 5 0 1 2 1 0 0 >HSU38805\U38805\72..2276\2205\AAD04206.1\Homo sapiens\Homo sapiens fertilin beta mRNA, complete cds./codon_start=1/product="fertilin beta"/protein_id="AAD04206.1"/db_xref="GI:4151119" 0 2 2 1 13 5 3 6 11 9 18 7 22 6 1 10 7 19 12 3 1 13 13 3 2 14 17 7 0 11 19 7 12 12 10 3 10 22 30 14 11 25 20 14 8 12 33 13 12 25 9 20 17 26 10 28 15 3 32 16 8 0 1 0 >AY495333\AY495333\190..726\537\AAS80145.1\Homo sapiens\Homo sapiens tumor necrosis factor alpha-inducing factor deltaisoform (TAIF) mRNA, complete cds, alternatively spliced./gene="TAIF"/codon_start=1/product="tumor necrosis factor alpha-inducing factor delta isoform"/protein_id="AAS80145.1"/db_xref="GI:46095220" 2 0 3 1 4 0 1 3 10 1 1 0 3 4 1 3 2 1 2 2 0 1 2 3 1 3 4 7 1 1 4 2 3 0 0 3 6 2 5 8 1 1 2 11 3 1 5 15 4 4 2 3 4 1 6 2 1 0 0 7 5 0 0 1 >AC009957\AC009957\complement(120698..120949)\252\AAX93056.1\Homo sapiens\Homo sapiens BAC clone RP11-285H23 from 2, complete sequence./gene="PRO0159"/codon_start=1/product="unknown"/protein_id="AAX93056.1"/db_xref="GI:62702129" 0 0 0 0 1 0 1 1 3 4 0 1 1 3 0 3 0 1 1 0 0 1 0 0 0 0 2 1 1 0 1 1 0 0 1 4 1 0 2 3 0 3 1 1 1 4 1 1 0 2 0 4 0 1 2 8 1 2 9 2 2 1 0 0 >HUMHYMEGLA\L07033 S55700\15..992\978\AAA92733.1\Homo sapiens\Human hydroxymethylglutaryl-CoA lyase mRNA, complete cds./EC_number="4.1.3.4"/standard_name="3-hydroxy-3-methylglutaryl CoA; HMG CoA Lyase"/codon_start=1/evidence=experimental/product="hydroxymethylglutaryl-CoA lyase"/protein_id="AAA92733.1"/db_xref="GI:184503" 2 0 4 0 1 2 3 5 12 5 1 6 5 6 0 9 3 2 1 12 1 4 6 3 1 4 9 8 4 13 9 11 5 5 3 10 13 4 8 15 6 6 3 9 3 1 10 7 8 1 6 2 3 5 3 6 3 10 5 12 1 0 0 1 >BC010397\BC010397\7..699\693\AAH10397.1\Homo sapiens\Homo sapiens lysophospholipase I, mRNA (cDNA clone MGC:13688IMAGE:4109335), complete cds./gene="LYPLA1"/codon_start=1/product="lysophospholipase 
I"/protein_id="AAH10397.1"/db_xref="GI:14714526"/db_xref="GeneID:10434"/db_xref="MIM:605599" 0 0 2 0 3 1 2 3 4 5 3 5 5 1 1 6 0 3 3 5 1 3 5 3 2 7 6 7 2 6 6 2 5 7 0 2 7 3 8 3 3 7 3 8 3 3 7 1 1 10 0 3 4 2 2 6 1 5 12 9 3 0 0 1 >BC036370\BC036370\32..1447\1416\AAH36370.1\Homo sapiens\Homo sapiens stress 70 protein chaperone, microsome-associated,60kDa, mRNA (cDNA clone MGC:26248 IMAGE:4837136), complete cds./gene="STCH"/codon_start=1/product="stress 70 protein chaperone, microsome-associated, 60kDa, precursor"/protein_id="AAH36370.1"/db_xref="GI:22137780"/db_xref="GeneID:6782"/db_xref="MIM:601100" 2 1 2 3 7 3 6 5 8 9 12 14 5 3 1 11 3 7 9 7 3 8 7 6 0 8 13 5 0 11 17 12 6 2 11 6 15 9 18 11 8 14 12 11 5 4 20 13 11 13 2 12 0 1 6 14 7 7 14 10 1 0 0 1 >AL589765#13\AL589765\complement(join(96827..96879,97085..97181,97440..97541, 97767..97918,98195..98259,98471..98643,98814..98974, 101060..101381,101478..101582,102426..102515, 103856..103962,105274..105397))\1551\CAI17175.1\Homo sapiens\Human DNA sequence from clone RP11-98D18 on chromosome 1 Containsthe 3' end of the SNX27 gene for sorting nexin family member 27,the TNRC4 gene for trinucleotide repeat containing 4, five novelgenes, a ribosomal protein S11 (RPS11) pseudogene, the MRPL9 genefor mitochondrial ribosomal protein L9, the OAZ3 gene for ornithinedecarboxylase antizyme 3, a novel gene (FLJ36032) the TDRKH genefor tudor and KH domain containing, the RORC gene for RAR-relatedorphan receptor C and four CpG islands, complete sequence./gene="TDRKH"/locus_tag="RP11-98D18.8-004"/standard_name="OTTHUMP00000015262"/codon_start=1/product="tudor and KH domain containing"/protein_id="CAI17175.1"/db_xref="GI:55960562"/db_xref="GOA:Q5SZR5"/db_xref="InterPro:IPR002999"/db_xref="InterPro:IPR004087"/db_xref="InterPro:IPR004088"/db_xref="InterPro:IPR008191"/db_xref="UniProt/TrEMBL:Q5SZR5" 2 3 9 0 7 5 6 10 12 9 7 8 8 6 0 12 11 10 12 8 1 10 11 6 1 10 12 12 0 14 12 6 7 7 6 6 10 8 15 13 5 7 3 13 4 6 27 22 19 18 6 7 2 3 3 9 7 13 11 11 8 0 0 1 
>AK055094\AK055094\71..1444\1374\BAB70853.1\Homo sapiens\Homo sapiens cDNA FLJ30532 fis, clone BRAWH2001129, weakly similarto OCCLUDIN./codon_start=1/protein_id="BAB70853.1"/db_xref="GI:16549750" 3 3 7 2 11 6 4 8 12 4 5 8 9 7 3 4 3 9 7 12 0 2 19 10 3 5 9 6 1 11 13 9 5 5 5 6 11 9 8 10 7 9 5 4 6 3 12 13 13 14 13 13 4 3 6 15 7 5 9 14 8 0 0 1 >HS187N21\Z98036\join(complement(AL355855.23:6892..6990), complement(59373..59483),complement(13152..13196), complement(10962..11046),complement(6264..6442))\-355851\CAI19715.1\Homo sapiens\Human DNA sequence from clone RP1-187N21 on chromosome 6p21.2-21.33Contains part of a novel gene and the 3' end of the NUDT3 gene fornudix (nucleoside diphosphate linked moiety X)-type motif 3,complete sequence./gene="NUDT3"/locus_tag="RP1-187N21.3-001"/standard_name="OTTHUMP00000016227"/codon_start=1/protein_id="CAI19715.1"/db_xref="GI:56208216"/db_xref="Genew:8050"/db_xref="GOA:O95989"/db_xref="InterPro:IPR000086"/db_xref="UniProt/TrEMBL:O95989" 1 1 1 2 7 3 2 4 3 0 3 5 3 2 1 1 4 4 5 1 1 3 9 2 1 1 4 0 0 4 8 2 3 2 1 2 3 3 3 6 1 2 0 4 2 5 3 8 2 1 1 0 6 5 0 4 0 1 5 5 4 1 1 1 >AL670886#18\AL670886\join(19489..19560,20525..20627,21196..21353)\333\CAI17797.1\Homo sapiens\Human DNA sequence from clone XXbac-88C14 on chromosome 6 containsthe 5' end of the BAT3 gene for HLA-B associated transcript 3, theAPOM gene for apolipoprotein M, the gene for chromosome 6, openreading frame 47, the BAT4 gene for HLA-B associated transcript 4,the CSNK2B gene for casein kinase 2, beta polypeptide, the LY6G5Bgene for lymphocyte antigen 6 complex, locus 5B , the gene for G5cprotein, the BAT5 gene for HLA-B associated transcript 5, theLY6G6F gene for lymphocyte antigen 6 complex, locus G6f, the LY6G6Egene for lymphocyte antigen 6 complex, locus G6E, the LY6G6C genefor lymphocyte antigen 6 complex, locus G6C, the gene forchromosome 6 open reading frame 25, the DDAH gene fordimethylarginine dimethylaminohydrolase 2, the CLIC1 gene forchloride intracellular channel 1 and 
seven CpG islands, completesequence./gene="CSNK2B"/locus_tag="XXbac-BCX88C14.4-006"/standard_name="OTTHUMP00000029469"/codon_start=1/product="casein kinase 2, beta polypeptide"/protein_id="CAI17797.1"/db_xref="GI:55961590"/db_xref="GOA:Q5SRQ7"/db_xref="InterPro:IPR000704"/db_xref="UniProt/TrEMBL:Q5SRQ7" 1 1 0 2 0 1 1 3 4 3 0 2 1 3 0 1 2 1 0 1 0 1 0 1 0 5 1 3 0 1 2 2 1 0 0 1 3 0 1 0 3 3 1 6 2 0 6 7 6 2 3 2 1 2 3 1 0 5 2 4 2 1 0 0 >AY358109\AY358109\14..361\348\AAQ88476.1\Homo sapiens\Homo sapiens clone DNA129563 ACAH3104 (UNQ3104) mRNA, complete cds./locus_tag="UNQ3104"/codon_start=1/product="ACAH3104"/protein_id="AAQ88476.1"/db_xref="GI:37181324" 0 0 1 0 0 0 2 6 4 8 2 6 0 4 0 6 1 1 0 2 0 1 3 3 0 5 1 3 1 7 0 2 1 0 0 0 1 0 0 0 2 1 1 2 3 3 1 2 1 2 2 0 3 4 5 3 3 1 2 2 1 1 0 0 >BC032690\BC032690\90..1775\1686\AAH32690.1\Homo sapiens\Homo sapiens tudor and KH domain containing protein, mRNA (cDNAclone MGC:45166 IMAGE:5241008), complete cds./gene="TDRKH"/codon_start=1/product="TDRKH protein"/protein_id="AAH32690.1"/db_xref="GI:21595812"/db_xref="GeneID:11022" 2 3 9 1 10 5 6 10 13 9 7 8 9 6 0 15 11 10 14 9 1 10 12 7 1 10 15 13 0 14 11 9 8 7 6 6 13 9 16 15 5 8 3 16 4 7 27 25 19 18 6 7 3 4 3 10 8 17 12 11 8 0 0 1 >MMPLNHR\X16070\129..1247\1119\CAA34203.1\Homo sapiens\Human mRNA for pln homing receptor homologue (peripheral lymphnode)./codon_start=1/product="pln homing receptor"/protein_id="CAA34203.1"/db_xref="GI:38093"/db_xref="GOA:P14151"/db_xref="UniProt/Swiss-Prot:P14151"/IGNORED_CODON=1 1 0 1 1 5 3 2 4 6 1 5 5 8 1 0 9 7 5 4 11 1 7 9 8 0 3 10 6 1 2 13 3 7 3 1 1 6 2 12 14 16 5 4 7 3 5 12 12 7 8 7 6 9 17 12 5 6 10 9 8 15 1 0 0 >HS20I3#4\AL035423\join(36844..36909,41747..41840,43109..43256,45816..45910, 47211..47296,55205..55300,61193..61317,62106..62241, 68129..68209,69474..69515)\969\CAI42441.1\Homo sapiens\Human DNA sequence from clone RP1-20I3 on chromosome Xq25-26Contains a SNRPN upstream reading frame (SNURF) pseudogene, theSLC25A14 gene for solute carrier family 25 
(mitochondrial carrier,brain),member 14 (UCP5, BMCP1), the gene for G-protein coupledreceptor 2 (GPCR2), the gene for CGI-79 protein and a CpG island,complete sequence./gene="SLC25A14"/locus_tag="RP1-20I3.5-002"/standard_name="OTTHUMP00000024013"/codon_start=1/product="solute carrier family 25 (mitochondrial carrier, brain), member 14"/protein_id="CAI42441.1"/db_xref="GI:57471622"/db_xref="GOA:O95258"/db_xref="UniProt/Swiss-Prot:O95258" 3 3 1 3 2 5 6 3 4 7 6 6 4 3 0 3 6 2 4 5 1 9 3 4 1 2 4 6 3 8 10 8 6 7 7 1 10 8 9 8 4 3 9 6 2 5 5 7 1 9 3 7 0 3 6 13 6 10 15 11 6 1 0 0 >AK125630\AK125630\854..1285\432\BAC86224.1\Homo sapiens\Homo sapiens cDNA FLJ43642 fis, clone STOMA2004925./codon_start=1/protein_id="BAC86224.1"/db_xref="GI:34531786" 0 2 1 0 0 7 7 4 7 2 1 5 0 6 1 2 1 0 1 5 1 1 3 4 0 3 1 4 0 2 3 7 3 4 0 1 4 0 1 3 2 3 1 3 2 1 4 2 4 2 0 1 5 1 2 2 2 2 0 2 5 0 0 1 >HSA488947\AJ488947\1..2568\2568\CAD35759.1\Homo sapiens\Homo sapiens mRNA for polyserase-IB protein./codon_start=1/product="polyserase-IB protein"/protein_id="CAD35759.1"/db_xref="GI:33341912"/db_xref="GOA:Q7Z410"/db_xref="UniProt/Swiss-Prot:Q7Z410" 5 6 19 2 4 13 5 16 52 5 1 6 5 17 6 8 25 5 12 33 14 7 8 29 11 18 9 47 10 18 12 36 25 5 2 18 39 4 7 23 12 3 4 26 18 2 8 35 29 2 10 4 20 12 18 7 1 24 3 13 17 0 1 0 >BC044952\BC044952\107..1729\1623\AAH44952.1\Homo sapiens\Homo sapiens KIAA1271 protein, mRNA (cDNA clone MGC:50830IMAGE:5751684), complete cds./gene="KIAA1271"/codon_start=1/product="KIAA1271 protein"/protein_id="AAH44952.1"/db_xref="GI:27924145"/db_xref="GeneID:57506" 1 2 8 5 4 8 2 13 21 4 0 6 9 20 3 10 21 6 13 19 1 5 22 22 4 21 15 19 7 9 3 18 14 5 4 6 22 3 5 10 8 11 2 14 5 3 3 31 14 6 7 3 9 1 8 4 0 9 2 6 4 0 1 0 >AY775789\AY775789\234..1013\780\AAW66851.1\Homo sapiens\Homo sapiens thymic stromal-derived lymphopoietin receptortranscript variant 2 (TSLPR) mRNA, complete cds, alternativelyspliced./gene="TSLPR"/codon_start=1/product="thymic stromal-derived lymphopoietin receptor transcript variant 
2"/protein_id="AAW66851.1"/db_xref="GI:58201460" 0 1 2 0 3 3 0 9 8 3 3 3 1 9 1 5 4 2 5 6 3 1 6 9 2 2 4 7 1 1 2 3 6 3 1 4 13 2 7 7 3 2 3 11 5 1 3 16 9 6 6 3 2 3 5 5 2 4 4 6 8 0 0 1 >BC069117\BC069117\1..987\987\AAH69117.1\Homo sapiens\Homo sapiens G protein-coupled receptor 7, mRNA (cDNA cloneMGC:95396 IMAGE:7216971), complete cds./gene="GPR7"/codon_start=1/product="G protein-coupled receptor 7"/protein_id="AAH69117.1"/db_xref="GI:46575749"/db_xref="GeneID:2831"/db_xref="MIM:600730" 0 12 8 0 0 1 2 17 29 0 0 2 0 5 5 0 12 0 1 13 4 3 1 10 8 0 4 24 15 2 0 5 2 1 1 9 22 1 0 5 11 0 0 7 2 1 0 7 11 0 11 1 9 1 16 1 1 14 0 5 6 0 0 1 >S77906\S77906\161..1024\864\AAB34218.2\Homo sapiens\Pax (PAX8e)=paired box {alternatively spliced} [human, thyroid,kidney and Wilms' tumors, mRNA Partial, 1031 nt]./gene="PAX8"/codon_start=1/product="paired box protein 8 isoform 3"/protein_id="AAB34218.2"/db_xref="GI:7524546" 3 6 4 0 3 4 1 6 10 2 0 0 3 9 1 2 15 3 2 6 2 6 5 11 3 11 1 9 1 5 5 11 6 1 3 5 10 0 4 8 4 5 1 13 8 2 2 10 10 5 5 1 6 1 2 3 1 11 3 6 1 0 0 1 >BC051847\BC051847\162..1847\1686\AAH51847.1\Homo sapiens\Homo sapiens zinc finger protein 394, mRNA (cDNA clone MGC:60166IMAGE:6007172), complete cds./gene="ZNF394"/codon_start=1/product="zinc finger protein 99"/protein_id="AAH51847.1"/db_xref="GI:30354282"/db_xref="GeneID:84124" 5 9 4 4 15 10 6 10 18 8 1 7 5 12 2 8 10 16 5 11 2 5 4 15 7 4 4 6 10 7 9 5 17 4 0 2 16 2 20 17 8 10 13 23 20 10 27 33 17 3 3 5 6 16 11 8 1 7 6 5 7 1 0 0 >AC097173\AC097173 AC007512\complement(join(131051..131223,131807..131910, 132908..133028,136605..136751,137595..137667))\618\AAY40945.1\Homo sapiens\Homo sapiens BAC clone RP11-96A1 from 4, complete sequence./gene="tmp_locus_1"/codon_start=1/product="unknown"/protein_id="AAY40945.1"/db_xref="GI:63991993" 1 2 1 3 3 0 1 3 9 3 2 3 6 1 1 4 3 2 4 4 1 6 3 1 0 2 2 3 1 2 3 2 1 1 4 3 7 5 10 3 1 6 1 10 1 0 11 7 4 7 4 3 0 3 3 6 2 9 6 2 3 0 0 1 >HUMCMYBLA\M15024\114..2036\1923\AAA52032.1\Homo sapiens\Human c-myb mRNA, 
complete cds./gene="MYB"/codon_start=1/protein_id="AAA52032.1"/db_xref="GI:180660"/db_xref="GDB:G00-119-441" 6 0 4 4 11 6 8 8 15 4 9 8 8 12 2 14 11 8 17 13 2 12 15 9 4 19 17 11 3 9 11 3 6 6 6 6 10 12 23 21 14 18 10 26 18 10 35 16 14 18 7 7 4 5 7 9 5 8 12 12 12 0 0 1 >AF150100\AF150100\218..487\270\AAD40006.1\Homo sapiens\Homo sapiens small zinc finger-like protein (TIM9a) mRNA, completecds./gene="TIM9a"/codon_start=1/product="small zinc finger-like protein"/protein_id="AAD40006.1"/db_xref="GI:5107188" 1 0 0 0 3 0 0 1 2 2 2 1 1 1 0 1 0 0 4 4 0 0 2 0 0 1 3 2 0 1 1 1 1 0 1 0 0 1 7 1 0 2 3 6 0 2 7 2 2 1 1 2 2 2 1 4 3 0 1 3 0 0 1 0 >BC008833\BC008833\154..1431\1278\AAH08833.1\Homo sapiens\Homo sapiens hypothetical protein FLJ20291, mRNA (cDNA cloneMGC:12612 IMAGE:4304656), complete cds./gene="FLJ20291"/codon_start=1/product="hypothetical protein FLJ20291"/protein_id="AAH08833.1"/db_xref="GI:14250728"/db_xref="GeneID:54883" 5 3 12 3 14 12 1 5 14 5 4 4 5 15 2 9 10 5 3 3 0 3 7 4 1 5 7 8 1 4 4 6 9 5 2 4 6 3 21 38 9 7 5 11 7 7 11 39 9 6 5 3 1 0 2 3 0 8 2 14 4 0 0 1 >BC018538\BC018538\52..537\486\AAH18538.1\Homo sapiens\Homo sapiens arachidonate 5-lipoxygenase-activating protein, mRNA(cDNA clone MGC:17120 IMAGE:4342203), complete cds./gene="ALOX5AP"/codon_start=1/product="arachidonate 5-lipoxygenase-activating protein"/protein_id="AAH18538.1"/db_xref="GI:17391275"/db_xref="GeneID:241"/db_xref="MIM:603700" 0 1 1 0 1 4 3 7 3 3 0 2 0 3 0 1 5 1 1 6 2 3 0 2 0 3 0 4 3 4 5 3 3 1 2 5 4 3 2 2 4 3 3 5 1 1 3 3 1 2 8 1 1 1 8 8 4 5 2 3 1 1 0 0 >BC008250\BC008250\30..794\765\AAH08250.1\Homo sapiens\Homo sapiens ethylmalonic encephalopathy 1, mRNA (cDNA cloneMGC:9282 IMAGE:3872059), complete cds./gene="ETHE1"/codon_start=1/product="ETHE1 protein"/protein_id="AAH08250.1"/db_xref="GI:14198377"/db_xref="GeneID:23474"/db_xref="MIM:608451" 0 5 8 2 1 3 0 7 17 2 1 4 0 5 2 2 5 1 6 8 1 3 4 2 0 8 0 11 4 5 4 7 9 1 1 7 5 2 2 3 4 2 1 10 8 2 2 12 9 4 4 1 3 6 9 2 1 9 2 5 0 0 0 1 
>HS337O184\AL591562\69..1190\1122\CAC39140.1\Homo sapiens\Novel human gene mapping to chomosome 20./codon_start=1/product="hypothetical protein"/protein_id="CAC39140.1"/db_xref="GI:14149068"/db_xref="GOA:Q969T3"/db_xref="UniProt/Swiss-Prot:Q969T3" 2 6 18 6 0 0 0 16 32 2 0 5 3 7 1 2 9 3 1 10 2 2 8 12 5 5 9 22 3 6 3 13 7 2 0 2 7 1 2 8 3 2 4 20 7 2 5 27 22 4 3 1 3 2 7 6 0 4 1 3 5 1 0 0 >HUMDV\L23768\1..1128\1128\AAA35778.1\Homo sapiens\Human Gal beta1,3(4)GlcNAc alpha2,3-sialyltransferase mRNA,complete cds./codon_start=1/product="Gal beta 1,3 (4)GlcNAc alpha 2, 3-sialyltranferase"/protein_id="AAA35778.1"/db_xref="GI:388015" 4 10 6 0 3 0 5 11 17 3 2 7 7 6 1 5 4 3 3 8 3 2 3 8 1 7 10 13 1 6 6 17 3 1 2 5 15 3 11 12 6 8 2 6 4 0 1 19 13 5 7 8 4 1 10 12 0 16 4 8 7 0 0 1 >AL731797#1\AL731797\join(5853..6928,18235..18358,20120..20239,21032..21118, 22096..22280,22593..22755,23535..23760,27786..27907, 28856..28990,31599..31741,33235..33417,33911..34117, 35425..35666,37293..37423,38493..38981)\3633\CAI22793.1\Homo sapiens\Human DNA sequence from clone RP4-786G8 on chromosome 1 Containsthe 3' end of the HIPK1 gene for homeodomain interacting proteinkinase 1 and the gene for HNOEL-iso protein (HNOEL-iso), completesequence./gene="HIPK1"/locus_tag="RP4-786G8.2-001"/standard_name="OTTHUMP00000013763"/codon_start=1/product="homeodomain interacting protein kinase 1"/protein_id="CAI22793.1"/db_xref="GI:56206712"/db_xref="InterPro:IPR000719"/db_xref="InterPro:IPR002290"/db_xref="InterPro:IPR008271"/db_xref="InterPro:IPR011009" 7 6 4 3 10 10 15 21 32 17 6 20 26 23 7 28 37 29 32 29 2 34 34 17 7 33 20 30 7 46 21 15 17 10 10 18 28 25 17 31 20 25 26 66 20 18 21 19 15 25 21 22 12 7 12 19 5 21 24 20 8 0 1 0 >HSTRAN\X72215\1..876\876\CAA51017.1\Homo sapiens\H.sapiens mRNA for variant of transcription factor./gene="Pit-1/GHF-1 variant"/codon_start=1/product="transcription factor"/protein_id="CAA51017.1"/db_xref="GI:311926"/db_xref="GOA:P28069"/db_xref="UniProt/Swiss-Prot:P28069" 4 0 3 0 8 4 1 2 12 
5 4 3 1 3 1 10 2 8 8 6 0 3 4 1 0 9 8 5 0 12 8 1 1 1 3 1 7 3 17 3 3 11 3 8 4 7 17 10 4 3 2 5 5 3 2 11 6 3 3 7 2 1 0 0 >AK057899\AK057899\406..906\501\BAB71611.1\Homo sapiens\Homo sapiens cDNA FLJ25170 fis, clone CBR08752./codon_start=1/protein_id="BAB71611.1"/db_xref="GI:16553878" 0 1 3 0 2 2 0 7 8 1 1 3 2 5 1 2 6 0 3 3 3 2 6 7 1 7 5 8 2 5 1 3 4 1 0 2 9 3 1 2 0 4 1 8 5 0 0 4 2 1 0 0 2 2 4 5 1 1 0 2 2 1 0 0 >BC026154\BC026154\280..1755\1476\AAH26154.1\Homo sapiens\Homo sapiens chromosome 9 open reading frame 12, mRNA (cDNA cloneMGC:26027 IMAGE:4839268), complete cds./gene="C9orf12"/codon_start=1/product="chromosome 9 open reading frame 12"/protein_id="AAH26154.1"/db_xref="GI:20072840"/db_xref="GeneID:64768" 3 6 7 1 4 7 2 9 24 9 6 5 3 5 6 5 9 7 4 7 2 6 4 7 5 11 7 12 2 3 3 10 13 2 3 11 16 4 11 28 10 6 6 17 12 4 10 23 15 10 14 6 9 8 7 14 5 6 6 11 3 1 0 0 >AL590423#3\AL590423\complement(join(3991..4221,5371..5422,6162..6271))\393\CAI41542.1\Homo sapiens\Human DNA sequence from clone RP13-364K23 on chromosome Xq22.1-23Contains the DSIPI gene for delta sleep inducing peptide,immunoreactor (DIP, GILZ, TSC-22R), the 5' end of a novel gene andtwo CpG islands, complete sequence./gene="DSIPI"/locus_tag="RP13-364K23.1-001"/standard_name="OTTHUMP00000023810"/codon_start=1/product="delta sleep inducing peptide, immunoreactor"/protein_id="CAI41542.1"/db_xref="GI:57209361"/db_xref="GOA:Q5JRJ1"/db_xref="UniProt/TrEMBL:Q5JRJ1" 1 0 0 1 1 0 1 0 11 1 0 1 0 7 0 3 3 1 0 3 0 0 3 3 0 2 1 5 2 2 2 0 1 2 0 1 10 2 0 7 4 2 1 7 1 1 3 13 2 2 1 2 0 1 4 0 1 4 0 4 0 1 0 0 >BC101154\BC101154\1..900\900\AAI01155.1\Homo sapiens\Homo sapiens cDNA clone MGC:119923 IMAGE:40015678, complete cds./codon_start=1/product="Unknown (protein for MGC:119923)"/protein_id="AAI01155.1"/db_xref="GI:71681910" 0 1 0 1 4 3 4 6 10 5 8 11 5 6 0 5 4 4 2 5 1 11 1 3 1 3 3 3 2 8 5 3 1 1 10 6 8 7 6 13 7 9 6 2 3 4 4 1 1 2 3 5 1 6 8 14 8 6 10 9 10 0 0 1 >BC013943\BC013943\73..1011\939\AAH13943.1\Homo sapiens\Homo sapiens 
hypothetical protein MGC24381, mRNA (cDNA cloneMGC:24381 IMAGE:4064325), complete cds./gene="MGC24381"/codon_start=1/product="hypothetical protein MGC24381"/protein_id="AAH13943.1"/db_xref="GI:15530296"/db_xref="GeneID:115939" 1 12 7 0 5 5 0 2 20 2 0 6 1 5 1 2 8 1 1 2 4 1 3 11 7 3 2 18 12 6 5 19 10 1 1 4 11 1 3 8 5 1 0 11 4 0 3 22 12 5 4 1 9 1 6 5 0 4 1 3 4 0 0 1 >BC001917\BC001917\29..1045\1017\AAH01917.1\Homo sapiens\Homo sapiens malate dehydrogenase 2, NAD (mitochondrial), mRNA(cDNA clone MGC:3559 IMAGE:2823443), complete cds./gene="MDH2"/codon_start=1/product="mitochondrial malate dehydrogenase, precursor"/protein_id="AAH01917.1"/db_xref="GI:12804929"/db_xref="GeneID:4191"/db_xref="MIM:154100" 1 4 3 0 2 0 1 8 16 4 0 2 2 8 2 4 6 0 4 15 4 1 3 9 4 4 6 20 2 10 7 11 6 4 2 10 13 6 8 18 8 5 0 9 3 2 8 8 6 7 3 2 4 4 8 4 0 17 4 6 0 0 0 1 >HSM805997\BX537890\105..914\810\CAD97885.1\Homo sapiens\Homo sapiens mRNA; cDNA DKFZp313F0536 (from clone DKFZp313F0536);complete cds./gene="DKFZp313F0536"/codon_start=1/product="hypothetical protein"/protein_id="CAD97885.1"/db_xref="GI:31873903"/db_xref="GOA:Q7Z3H8"/db_xref="InterPro:IPR007087"/db_xref="UniProt/TrEMBL:Q7Z3H8" 1 4 3 2 1 7 4 11 4 2 3 5 4 5 2 6 10 2 6 4 6 2 1 7 2 3 8 10 2 6 4 1 5 3 1 3 8 3 6 13 5 0 2 7 9 3 6 11 10 2 3 2 7 6 3 4 0 4 1 2 2 1 0 0 >AL590708#20\AL590708\complement(join(158372..158586,158674..158830, 158906..159012,159095..159197,159285..159503, 159760..159875,160312..160387,160472..160696, 161381..161494,161770..162107,162477..162564, 162759..162866,162973..163062,163868..164009, 164339..164397,166890..166996,167093..167186, 167275..167325,167592..167630,168499..168579, 168763..168822,169239..169302,169422..169465, 169725..169829,175155..175277,177198..177245))\2973\CAI13841.1\Homo sapiens\Human DNA sequence from clone RP11-395P17 on chromosome 9 Containsthe PCSCL gene for likely ortholog of rat peroxisomal Ca-dependentsolute carrier-like protein (KIAA1896), the PTGES2 gene forprostaglandin E synthase 2 (GBF1, 
GBF-1, C9orf15, FLJ14038), theLCN2 gene for lipocalin 2 (oncogene 24p3) (NGAL), the C9orf16 genefor chromosome 9 open reading frame 16 (MGC4639, EST00098,FLJ12823), the CIZ1 gene for Cip1-interacting zinc finger protein(LSFR1), the DNM1 gene for dynamin 1 (DNM), the GOLGA2 gene forgolgi autoantigen, golgin subfamily a, 2 (GM130, MGC20672), thegene for a novel protein and seven CpG islands, complete sequence./gene="GOLGA2"/locus_tag="RP11-395P17.5-004"/standard_name="OTTHUMP00000022234"/codon_start=1/product="golgi autoantigen, golgin subfamily a, 2"/protein_id="CAI13841.1"/db_xref="GI:55961118"/db_xref="InterPro:IPR003345" 3 13 23 4 6 16 9 17 60 10 8 14 9 7 5 11 25 7 8 18 6 11 8 12 2 16 20 36 12 25 14 12 13 10 9 8 21 4 16 41 20 16 24 87 13 5 30 111 19 15 8 6 6 5 3 6 3 11 5 25 3 1 0 0 >BC012354\BC012354\7..888\882\AAH12354.1\Homo sapiens\Homo sapiens ribosomal protein S2, mRNA (cDNA clone MGC:20390IMAGE:4564801), complete cds./gene="RPS2"/codon_start=1/product="ribosomal protein S2"/protein_id="AAH12354.1"/db_xref="GI:15214457"/db_xref="GeneID:6187"/db_xref="MIM:603624" 1 10 6 3 2 2 0 7 9 0 0 3 2 6 0 4 1 1 3 12 0 6 2 8 0 6 4 13 2 6 5 22 8 8 2 8 8 3 0 24 3 1 0 6 4 0 1 9 6 6 5 2 5 0 9 2 0 12 5 7 3 0 1 0 >BC002854\BC002854\177..1994\1818\AAH02854.1\Homo sapiens\Homo sapiens ubiquitin specific protease 2, transcript variant 1,mRNA (cDNA clone MGC:4176 IMAGE:3635143), complete cds./gene="USP2"/codon_start=1/product="ubiquitin specific protease 2, isoform a"/protein_id="AAH02854.1"/db_xref="GI:12804005"/db_xref="GeneID:9099"/db_xref="MIM:604725" 9 13 13 2 9 6 5 19 28 7 4 2 8 22 3 4 23 5 12 20 6 6 6 18 4 10 5 18 0 6 7 15 11 12 0 5 14 3 6 20 17 7 2 21 7 4 12 16 19 14 18 13 9 6 16 6 3 9 4 11 5 0 1 0 >AY189940\AY189940\21..563\543\AAO39000.1\Homo sapiens\Homo sapiens HOMER1F (HOMER1) mRNA, complete cds, alternativelyspliced./gene="HOMER1"/codon_start=1/product="HOMER1F"/protein_id="AAO39000.1"/db_xref="GI:28396185" 1 1 2 2 1 2 5 3 12 2 5 2 1 1 0 2 3 2 5 1 0 5 0 0 0 0 5 6 0 4 
3 0 0 1 2 0 2 1 10 6 3 8 10 6 1 3 22 8 2 4 0 1 1 1 0 2 3 1 3 1 2 1 0 0 >HS111C20#4\AL035530\join(73239..73354,78863..78950,86368..86509,88982..89127, 90101..90304,91853..92015,94443..94529,94941..95251)\1257\CAI19235.1\Homo sapiens\Human DNA sequence from clone RP1-111C20 on chromosome 6q25.3-27Contains the gene for radial spoke protein 3 (RSP3), the TAGP genefor T-cell activation GTPase activating protein, a novel gene and aCpG island, complete sequence./gene="RSHL2"/locus_tag="RP1-111C20.2-002"/standard_name="OTTHUMP00000017517"/codon_start=1/product="radial spokehead-like 2"/protein_id="CAI19235.1"/db_xref="GI:56202997"/db_xref="InterPro:IPR009290"/db_xref="UniProt/TrEMBL:Q9BX75" 6 3 9 6 11 9 5 9 12 9 5 4 6 2 0 6 7 3 14 6 1 3 9 4 2 7 7 10 1 6 7 5 5 3 2 1 8 7 8 8 4 2 12 14 3 4 40 21 8 15 7 8 2 1 1 9 6 2 6 15 2 0 1 0 >AB042029\AB042029\407..1267\861\BAB70508.1\Homo sapiens\Homo sapiens DEPC-1 mRNA for prostate cancer antigen-1, completecds./gene="DEPC-1"/codon_start=1/product="prostate cancer antigen-1"/protein_id="BAB70508.1"/db_xref="GI:16326129" 5 4 2 1 11 3 3 3 3 4 2 4 4 1 0 3 4 2 8 6 0 5 8 7 1 9 2 5 1 8 4 4 3 3 3 2 10 4 7 8 5 4 3 11 6 5 14 10 9 5 3 8 2 2 2 5 3 4 5 4 9 0 0 1 >AB055225\AB055225\80..622\543\BAB97211.1\Homo sapiens\Homo sapiens IMAA mRNA for hLAT1-3TM, complete cds./gene="IMAA"/codon_start=1/product="hLAT1-3TM"/protein_id="BAB97211.1"/db_xref="GI:21320896" 0 2 3 0 0 0 2 9 9 0 0 0 1 3 5 0 0 0 0 5 3 0 0 4 6 1 3 13 12 0 0 16 2 1 1 5 12 0 1 9 1 0 0 2 1 1 1 11 3 0 6 0 4 0 5 0 1 10 0 4 2 0 0 1 >BC067286\BC067286\150..1373\1224\AAH67286.1\Homo sapiens\Homo sapiens melanoma antigen family B, 6, mRNA (cDNA cloneMGC:75411 IMAGE:30386334), complete cds./gene="MAGEB6"/codon_start=1/product="melanoma antigen, family B, 6 protein"/protein_id="AAH67286.1"/db_xref="GI:45501360"/db_xref="GeneID:158809"/db_xref="MIM:300467" 1 4 3 4 6 1 1 6 14 1 2 9 12 13 3 10 7 9 0 8 2 7 6 7 1 9 6 12 3 11 4 10 4 13 3 5 10 10 10 18 7 2 6 14 2 4 10 21 3 22 6 8 3 5 5 3 2 6 5 6 2 1 0 
0 >HUMEPI\D14582\96..995\900\BAA03436.1\Homo sapiens\Homo sapiens mRNA for epimorphin, complete cds./codon_start=1/product="epimorphin"/protein_id="BAA03436.1"/db_xref="GI:303605" 2 3 7 2 6 2 2 1 10 3 1 3 5 2 2 3 5 4 4 4 1 8 4 0 1 0 4 4 3 7 3 2 4 1 3 1 8 8 14 11 7 7 4 8 3 4 16 17 9 11 1 5 0 1 5 5 7 9 9 11 2 0 0 1 >HUMIGHWD\M36530\join(87..132,236..546)\357\AAA36064.1\Homo sapiens\Human Ig germline heavy-chain gene V region./codon_start=1/protein_id="AAA36064.1"/db_xref="GI:185628" 0 1 0 0 5 0 1 1 6 2 2 4 3 0 0 4 4 2 1 3 2 0 0 0 0 1 0 2 1 6 2 2 5 3 1 3 3 2 5 1 3 0 2 4 1 0 1 6 2 1 3 3 0 3 3 1 1 1 1 3 2 0 0 0 >BC038971\BC038971\205..1176\972\AAH38971.1\Homo sapiens\Homo sapiens family with sequence similarity 49, member A, mRNA(cDNA clone MGC:47569 IMAGE:5200807), complete cds./gene="FAM49A"/codon_start=1/product="family with sequence similarity 49, member A"/protein_id="AAH38971.1"/db_xref="GI:24658571"/db_xref="GeneID:81553" 4 2 0 0 6 7 4 5 12 9 2 3 1 4 0 3 5 6 9 6 2 5 7 2 2 3 6 8 1 6 4 3 1 0 0 8 6 2 10 7 10 11 3 11 6 2 17 13 7 6 6 3 5 2 7 8 2 9 8 14 2 0 1 0 >BC098164\BC098164\70..513\444\AAH98164.1\Homo sapiens\Homo sapiens family with sequence similarity 12, member A, mRNA(cDNA clone MGC:119848 IMAGE:40014621), complete cds./gene="FAM12A"/codon_start=1/product="human epididymis-specific 3 alpha, precursor"/protein_id="AAH98164.1"/db_xref="GI:68532413"/db_xref="GeneID:10876" 2 0 0 1 4 3 1 4 4 3 2 1 0 1 0 1 7 3 2 0 0 0 2 0 0 1 2 3 0 1 1 3 1 1 4 1 0 1 7 4 5 3 0 1 1 3 6 6 2 3 7 4 3 4 6 1 4 4 5 3 5 0 1 0 >BC010858\BC010858\123..2378\2256\AAH10858.1\Homo sapiens\Homo sapiens enhancer of zeste homolog 2 (Drosophila), transcriptvariant 1, mRNA (cDNA clone MGC:9169 IMAGE:3901250), complete cds./gene="EZH2"/codon_start=1/product="enhancer of zeste 2, isoform a"/protein_id="AAH10858.1"/db_xref="GI:14790029"/db_xref="GeneID:2146"/db_xref="MIM:601573" 6 6 9 5 17 10 6 6 11 5 6 8 8 7 4 13 9 10 8 10 3 13 18 9 2 16 15 8 0 18 12 10 10 6 5 5 17 8 32 29 18 22 10 23 7 10 43 19 15 38 9 
14 13 22 7 24 11 9 17 13 7 0 0 1 >AK001783\AK001783\111..842\732\BAA91908.1\Homo sapiens\Homo sapiens cDNA FLJ10921 fis, clone OVARC1000411./codon_start=1/protein_id="BAA91908.1"/db_xref="GI:7023271" 2 0 0 0 4 3 0 3 5 2 5 5 3 1 0 8 1 4 7 1 0 6 7 0 1 4 7 1 0 3 4 0 1 2 6 4 6 7 13 6 1 6 4 5 6 2 24 14 2 8 3 3 1 2 5 3 5 1 8 6 2 1 0 0 >HSU78576\U78576\422..2110\1689\AAC50911.1\Homo sapiens\Human 68 kDa type I phosphatidylinositol-4-phosphate 5-kinase alphamRNA, clone PIP5KIa2, complete cds./EC_number="2.7.1.68"/codon_start=1/product="68 kDa type I phosphatidylinositol-4-phosphate 5-kinase alpha"/protein_id="AAC50911.1"/db_xref="GI:1743873" 5 3 11 4 5 2 6 7 12 8 5 14 13 15 6 13 7 12 12 11 0 9 7 10 3 17 10 9 3 9 8 13 5 10 3 7 11 9 15 20 9 4 7 14 5 10 13 20 13 15 15 9 3 5 16 11 3 13 12 15 1 1 0 0 >AF075171\AF075171\928..1233\306\AAC69326.1\Homo sapiens\Homo sapiens nonsyndromic hearing impairment protein (DFNA5) mRNA,short form, complete cds./gene="DFNA5"/codon_start=1/product="nonsyndromic hearing impairment protein"/protein_id="AAC69326.1"/db_xref="GI:3777547" 0 1 0 1 2 1 0 2 10 4 1 3 3 0 0 4 1 1 2 2 0 2 3 2 0 0 2 1 0 5 2 2 1 0 1 2 2 0 4 2 0 1 0 2 1 1 3 1 3 5 0 0 3 2 1 2 1 1 4 1 0 0 0 1 >AL158151#2\AL158151\complement(join(2878..3093,3479..3616,4731..4793, 5494..5629,5730..5852,6012..6131,7347..7447,7992..8170, 9364..9538,9881..10046,10494..10547,11669..11787, 15295..15558,17964..17990))\1881\CAI12869.1\Homo sapiens\Human DNA sequence from clone RP11-247A12 on chromosome 9 Containsthe CRAT gene for carnitine acetyltransferase (CAT1), the PPP2R4gene for protein phosphatase 2A, regulatory subunit B' (PR 53)(PTPA), the IER5L gene for immediate early response 5-like, a novelgene and three CpG islands, complete sequence./gene="CRAT"/locus_tag="RP11-247A12.5-001"/standard_name="OTTHUMP00000022327"/codon_start=1/product="carnitine acetyltransferase"/protein_id="CAI12869.1"/db_xref="GI:55958023"/db_xref="GOA:Q5T952"/db_xref="InterPro:IPR000542"/db_xref="UniProt/TrEMBL:Q5T952" 4 
12 8 2 0 5 4 14 40 3 1 4 4 17 4 2 16 3 3 22 5 1 4 23 1 7 8 28 6 6 3 13 10 4 2 12 30 0 4 37 19 0 0 32 17 4 3 28 23 12 20 4 5 3 19 9 0 26 3 20 7 0 0 1 >AK027309\AK027309\13..984\972\BAB55032.1\Homo sapiens\Homo sapiens cDNA FLJ14403 fis, clone HEMBA1003805, highly similarto Mus musculus KH domain RNA binding protein QKI-5A mRNA./codon_start=1/protein_id="BAB55032.1"/db_xref="GI:14041905" 3 0 2 2 8 5 2 3 8 8 6 5 3 0 0 1 4 3 7 6 1 6 10 5 1 14 11 8 3 11 6 7 4 3 5 3 6 6 13 8 7 7 5 6 3 3 15 9 6 7 6 4 1 1 2 3 3 9 7 11 1 1 0 0 >AL590808#2\AL590808\join(67396..68038,77966..78108,80405..80597,87940..88048, 88247..88338,88445..88567,91858..91962,93075..93239, 94842..95002,95867..96036,AL606833.8:3444..3517, AL606833.8:17146..17436,AL606833.8:24294..24461, AL606833.8:33259..33389)\-240792\CAI41594.1\Homo sapiens\Human DNA sequence from clone RP13-483F6 on chromosome X Contains aserine or cysteine proteinase inhibitor clade A (alpha-1antiproteinase, antitrypsin) member 7 (SERPINA7) pseudogene and the5' end of gene FLJ10178, complete sequence./gene="RP11-647M7.1"/locus_tag="RP11-647M7.1-002"/standard_name="OTTHUMP00000023784"/codon_start=1/product="novel protein"/protein_id="CAI41594.1"/db_xref="GI:57209435"/db_xref="GOA:Q5JQS1"/db_xref="UniProt/TrEMBL:Q5JQS1" 4 4 5 3 22 15 5 4 18 17 15 16 22 9 6 18 9 16 17 9 3 14 23 21 4 16 10 5 3 12 9 9 13 10 8 9 19 15 33 27 14 30 20 20 9 12 30 20 12 21 10 22 8 9 6 20 15 13 31 14 14 4 1 4 >HSA271671\AJ271671\88..1062\975\CAB82784.1\Homo sapiens\Homo sapiens mRNA for IRT1 protein./gene="Irt1"/function="unknown"/codon_start=1/product="IRT1 protein"/protein_id="CAB82784.1"/db_xref="GI:7330679"/db_xref="GOA:Q9NY26"/db_xref="UniProt/TrEMBL:Q9NY26" 1 3 4 1 0 3 4 14 43 1 0 6 6 4 1 2 5 2 4 4 1 2 7 5 2 6 8 15 4 12 5 10 12 2 4 5 16 0 1 4 1 1 3 12 5 3 3 16 3 2 2 1 3 5 9 4 1 11 1 6 3 0 1 0 >AK128756\AK128756\747..1292\546\BAC87601.1\Homo sapiens\Homo sapiens cDNA FLJ44869 fis, clone BRAMY2015516./codon_start=1/protein_id="BAC87601.1"/db_xref="GI:34536289" 
0 0 0 1 0 1 4 3 18 6 0 8 1 5 0 5 1 0 0 5 0 1 4 14 1 25 2 16 1 1 0 6 0 4 0 2 0 2 1 0 0 0 0 0 2 1 0 1 1 0 2 1 9 2 1 3 1 2 0 3 14 0 0 1 >AK027269\AK027269\44..1396\1353\BAB55010.1\Homo sapiens\Homo sapiens cDNA FLJ14363 fis, clone HEMBA1000719./codon_start=1/protein_id="BAB55010.1"/db_xref="GI:14041843" 2 26 7 1 0 3 3 9 37 6 0 2 3 8 8 4 9 1 2 7 3 1 3 15 8 3 2 20 19 7 2 22 10 2 1 12 20 2 3 7 8 1 1 12 20 2 3 24 25 3 16 0 6 1 8 2 0 8 0 3 7 0 0 1 >BC026167\BC026167\43..621\579\AAH26167.1\Homo sapiens\Homo sapiens similar to CG9643-PA, mRNA (cDNA clone MGC:26857IMAGE:4823003), complete cds./gene="LOC399818"/codon_start=1/product="LOC399818 protein"/protein_id="AAH26167.1"/db_xref="GI:20071183"/db_xref="GeneID:399818" 2 1 1 0 1 3 1 2 5 4 2 2 1 2 3 8 2 3 2 1 0 4 1 1 1 2 3 1 4 4 7 6 3 6 2 3 4 2 7 6 2 7 2 3 1 2 8 5 6 6 1 3 0 1 4 7 3 1 10 4 3 1 0 0 >HSAB5104\Z15143\1..1089\1089\CAA78849.1\Homo sapiens\H.sapiens mRNA for MHC class I HLA-B*5104./codon_start=1/product="MCH class I HLA-B*5104"/protein_id="CAA78849.1"/db_xref="GI:28235"/db_xref="GOA:P30489"/db_xref="UniProt/Swiss-Prot:P30489" 2 8 6 1 6 7 1 8 17 1 0 0 1 7 0 7 6 2 6 16 3 4 4 8 4 2 5 12 12 8 5 11 10 1 1 6 16 2 2 7 7 0 1 19 6 4 1 24 15 4 12 3 4 1 7 0 1 10 2 5 11 0 0 1 >AY590469\AY590469\join(1898..3116,3346..3569,6609..6687,7047..7191, 9500..9690,11892..12031,13632..13829,14388..14656, 16654..16816,20076..20236,21052..21293,21695..21897, 23002..23102,25808..25939,26313..26426,28128..28280, 30249..30446,32391..32459,32709..32843,35555..35708, 40384..40642,40866..40963,42074..42464,43549..43806, 44340..44469,46254..46460,50292..50442,51846..52004)\5943\AAS89300.1\Homo sapiens\Homo sapiens MCM3 minichromosome maintenance deficient 3 (S.cerevisiae) associated protein (MCM3AP) gene, complete cds./gene="MCM3AP"/codon_start=1/product="MCM3 minichromosome maintenance deficient 3 (S. 
cerevisiae) associated protein"/protein_id="AAS89300.1"/db_xref="GI:46361510" 7 13 22 10 25 36 14 37 96 29 12 26 28 36 10 52 44 42 24 35 12 23 36 41 8 45 33 37 23 50 30 36 31 21 14 22 71 26 37 64 33 26 19 86 30 18 53 100 51 39 19 5 18 24 35 53 11 24 23 36 19 0 0 1 >AY598792\AY598792\1..690\690\AAT09203.1\Homo sapiens\Homo sapiens HBV X-transactivated protein 12 mRNA, complete cds./codon_start=1/product="HBV X-transactivated protein 12"/protein_id="AAT09203.1"/db_xref="GI:47054187" 2 1 2 2 4 3 3 6 6 4 2 2 2 4 0 5 3 1 2 2 0 3 5 0 0 6 2 2 0 5 1 2 1 2 4 3 4 4 11 11 5 5 1 13 3 4 9 8 6 7 1 6 2 2 5 9 2 5 5 6 3 0 1 0 >HSDJ559A3#5\AL117348\complement(join(73022..73171,74203..74211,74425..74646, 74771..74947))\558\CAI21800.1\Homo sapiens\Human DNA sequence from clone RP4-559A3 on chromosome 1q41-42.2Contains the 5' end of a novel gene (KIAA0792), the LEFTB gene forleft-right determination, factor B, the PYCR2 gene forpyrroline-5-carboxylate reductase family, member 2, the EBAF genefor endometrial bleeding associated factor (left-rightdetermination, factor A; transforming growth factor betasuperfamily), a novel gene, an NADH dehydrogenase (ubiquinone) 1alpha subcomplex, 3, 9kDa (NDUFA3) pseudogene and two CpG islands,complete sequence./gene="PYCR2"/locus_tag="RP4-559A3.4-003"/standard_name="OTTHUMP00000035615"/codon_start=1/product="pyrroline-5-carboxylate reductase family, member 2"/protein_id="CAI21800.1"/db_xref="GI:56203953"/db_xref="GOA:Q5TE93"/db_xref="InterPro:IPR000304"/db_xref="UniProt/TrEMBL:Q5TE93" 0 2 0 0 3 0 0 7 11 1 1 0 0 5 0 1 8 1 6 5 3 1 4 4 0 2 2 10 1 4 1 5 5 2 1 5 14 0 1 12 3 0 2 4 2 2 4 7 6 3 1 1 2 1 4 1 0 7 2 5 0 1 0 0 >HSU96759\U96759\319..801\483\AAC23907.1\Homo sapiens\Homo sapiens von Hippel-Lindau binding protein (VBP-1) mRNA,complete cds./gene="VBP-1"/codon_start=1/product="von Hippel-Lindau binding protein"/protein_id="AAC23907.1"/db_xref="GI:2738244" 1 0 0 0 4 2 2 1 5 5 2 5 2 2 1 1 0 0 3 5 0 4 0 0 0 4 4 2 0 4 0 0 2 1 3 1 1 2 11 10 5 6 2 7 0 0 9 4 3 
10 2 4 1 1 1 3 0 0 4 6 2 1 0 0 >HSA133005\AJ133005\5..2212\2208\CAB40813.1\Homo sapiens\Homo sapiens mRNA for fertilin beta sperm protein./gene="fertilin beta"/function="sperm surface protein"/codon_start=1/product="fertilin beta protein"/protein_id="CAB40813.1"/db_xref="GI:4585655"/db_xref="GOA:Q99965"/db_xref="InterPro:IPR001590"/db_xref="InterPro:IPR001762"/db_xref="InterPro:IPR002870"/db_xref="InterPro:IPR006025"/db_xref="InterPro:IPR006209"/db_xref="InterPro:IPR006586"/db_xref="UniProt/Swiss-Prot:Q99965" 0 3 2 1 13 5 3 6 11 9 18 7 22 6 1 10 7 19 12 3 1 13 13 3 2 14 16 7 0 11 19 7 12 12 11 3 10 22 30 14 11 25 20 14 8 12 33 13 12 25 9 20 17 26 10 28 15 3 32 16 8 0 1 0 >AY255768\AY255768\1..1113\1113\AAP13479.1\Homo sapiens\Homo sapiens NADPH oxidase regulatory protein (NOXO1) mRNA,complete cds./gene="NOXO1"/codon_start=1/product="NADPH oxidase regulatory protein"/protein_id="AAP13479.1"/db_xref="GI:30102430" 3 21 13 1 1 7 3 8 27 2 0 3 4 5 1 5 12 2 1 11 8 2 7 16 9 6 7 11 13 4 4 18 8 2 0 1 17 2 0 4 1 0 4 16 3 0 6 17 13 3 4 1 5 2 10 3 0 5 0 1 7 0 0 1 >HSA010119\AJ010119\66..2384\2319\CAA09009.1\Homo sapiens\Homo sapiens mRNA for Ribosomal protein kinase B (RSK-B)./gene="rsk-b"/function="CREB-Kinase"/codon_start=1/evidence=experimental/product="Ribosomal protein kinase B (RSK-B)"/protein_id="CAA09009.1"/db_xref="GI:3452409"/db_xref="GOA:O75676"/db_xref="InterPro:IPR000719"/db_xref="InterPro:IPR000961"/db_xref="InterPro:IPR008271"/db_xref="InterPro:IPR011009"/db_xref="UniProt/Swiss-Prot:O75676" 7 24 25 1 0 4 0 17 66 2 0 3 3 14 9 4 20 2 1 10 16 1 5 31 5 9 9 31 18 10 4 30 26 5 1 10 38 0 5 31 14 3 2 36 18 3 9 51 26 6 15 2 7 3 31 5 0 20 5 13 6 1 0 0 >AF248484\AF248484\complement(124276..127752)\3477\AAF62185.1\Homo sapiens\Homo sapiens chromosome 21 PAC 30P13 map 21q11.2, completesequence, containing gene for nuclear factor RIP140./gene="nuclear factor RIP140"/codon_start=1/product="nuclear factor RIP140"/protein_id="AAF62185.1"/db_xref="GI:7407669" 5 1 2 1 22 12 12 11 
24 26 25 22 30 12 1 40 33 46 22 8 5 27 25 10 3 31 30 15 1 26 21 11 11 15 8 8 19 20 61 31 28 54 27 32 20 14 54 23 15 37 7 15 5 9 7 17 10 8 11 26 6 1 0 0 >BC042047\BC042047\60..3128\3069\AAH42047.2\Homo sapiens\Homo sapiens hect domain and RLD 6, transcript variant 1, mRNA(cDNA clone MGC:50367 IMAGE:5531519), complete cds./gene="HERC6"/codon_start=1/product="hect domain and RLD 6, isoform a"/protein_id="AAH42047.2"/db_xref="GI:66840150"/db_xref="GeneID:55008"/db_xref="MIM:609249" 3 8 3 4 22 9 10 19 40 13 15 10 14 16 5 19 15 19 14 12 3 20 20 7 3 20 19 12 5 25 28 8 18 9 6 10 21 17 36 34 21 17 21 28 18 15 46 20 16 31 7 16 8 18 24 31 23 9 22 28 12 1 0 0 >BC051856\BC051856\311..1246\936\AAH51856.1\Homo sapiens\Homo sapiens hypothetical protein FLJ12505, mRNA (cDNA cloneMGC:60211 IMAGE:6067012), complete cds./gene="FLJ12505"/codon_start=1/product="hypothetical protein FLJ12505"/protein_id="AAH51856.1"/db_xref="GI:30354003"/db_xref="GeneID:79805" 2 4 7 0 4 9 1 5 15 0 1 3 5 5 2 2 12 2 1 8 1 4 3 10 2 7 4 9 3 3 3 12 6 0 0 7 9 1 8 17 5 4 2 7 12 2 3 12 11 1 9 3 2 0 6 5 2 8 5 11 4 0 1 0 >AK056449\AK056449\58..480\423\BAB71185.1\Homo sapiens\Homo sapiens cDNA FLJ31887 fis, clone NT2RP7003050./codon_start=1/protein_id="BAB71185.1"/db_xref="GI:16551854" 3 2 4 2 0 5 0 2 4 2 0 0 1 4 1 1 6 2 2 2 1 3 5 7 2 4 4 4 8 2 6 3 8 4 0 1 4 1 0 2 0 0 0 1 4 0 1 4 1 0 0 0 3 2 1 0 0 0 0 2 9 0 1 0 >AY358204\AY358204\111..590\480\AAQ88571.1\Homo sapiens\Homo sapiens clone DNA131647 SQAW5801 (UNQ5801) mRNA, complete cds./locus_tag="UNQ5801"/codon_start=1/product="SQAW5801"/protein_id="AAQ88571.1"/db_xref="GI:37181520" 0 1 1 0 2 4 0 3 10 2 0 1 0 4 0 1 5 1 1 2 0 1 6 15 2 3 1 3 2 3 1 6 2 2 1 5 3 0 3 3 3 0 2 2 0 0 3 7 3 2 2 1 8 4 8 1 0 8 3 1 1 0 0 1 >HSU71092\U71092\382..1590\1209\AAC14587.1\Homo sapiens\Homo sapiens somatostatin receptor-like protein (GPR24) gene,complete cds./gene="GPR24"/codon_start=1/product="somatostatin receptor-like protein"/protein_id="AAC14587.1"/db_xref="GI:1737179" 0 11 3 0 3 7 1 
16 19 2 1 5 5 10 4 3 12 2 5 14 7 3 3 10 1 7 5 19 2 4 4 9 10 4 2 9 16 1 3 10 12 4 1 12 6 2 4 7 6 3 10 3 7 3 13 8 1 27 3 12 6 0 0 1 >AF431970\AF431970\43..1308\1266\AAM44214.1\Homo sapiens\Homo sapiens putative nucleoporin protein SEH1B (SEH1) mRNA,complete cds./gene="SEH1"/codon_start=1/product="putative nucleoporin protein SEH1B"/protein_id="AAM44214.1"/db_xref="GI:21239233" 4 2 4 2 7 4 4 5 5 5 4 5 7 8 1 16 11 10 9 6 1 9 6 4 0 16 9 5 3 14 10 1 7 7 7 5 9 7 9 10 4 16 0 13 7 9 11 4 6 19 0 8 2 8 6 8 5 7 7 9 14 1 0 0 >AB009589\AB009589\join(8540..9479,10624..10949)\1266\BAA23982.1\Homo sapiens\Homo sapiens gene for Osteomodulin, complete cds./codon_start=1/product="Osteomodulin"/protein_id="BAA23982.1"/db_xref="GI:2696502" 1 1 0 2 5 1 13 8 4 15 6 4 8 1 0 8 4 5 6 1 0 9 15 2 1 9 4 1 0 4 5 1 2 6 4 2 4 4 18 6 9 24 17 6 9 12 26 4 10 16 10 15 4 5 12 13 6 6 14 12 1 0 1 0 >AK131376\AK131376\220..585\366\BAD18527.1\Homo sapiens\Homo sapiens cDNA FLJ16434 fis, clone BRACE3015829./codon_start=1/protein_id="BAD18527.1"/db_xref="GI:47077213" 0 0 0 1 1 1 0 2 2 4 3 2 1 6 0 4 3 3 0 1 1 2 5 1 0 5 3 3 1 0 2 3 1 1 3 2 3 0 10 4 0 1 2 5 1 2 2 3 0 3 0 2 1 2 1 1 3 0 0 5 3 0 0 1 >BC002618\BC002618\127..1044\918\AAH02618.1\Homo sapiens\Homo sapiens serine/threonine kinase 16, transcript variant 2, mRNA(cDNA clone MGC:1953 IMAGE:3143105), complete cds./gene="STK16"/codon_start=1/product="serine/threonine kinase 16"/protein_id="AAH02618.1"/db_xref="GI:12803571"/db_xref="GeneID:8576"/db_xref="MIM:604719" 3 4 5 1 3 3 4 9 18 5 2 4 1 4 1 3 3 3 0 6 1 3 5 4 1 4 3 11 2 5 3 7 6 6 0 4 10 1 2 8 4 5 7 14 7 8 5 13 11 5 4 4 5 4 9 1 2 11 3 9 6 0 0 1 >BC020750\BC020750\122..778\657\AAH20750.1\Homo sapiens\Homo sapiens serine dehydratase, mRNA (cDNA clone MGC:22597IMAGE:4662951), complete cds./gene="SDS"/codon_start=1/product="SDS protein"/protein_id="AAH20750.1"/db_xref="GI:18088452"/db_xref="GeneID:10993"/db_xref="MIM:182128" 0 1 1 1 0 3 1 5 10 1 1 1 1 6 1 1 4 1 3 8 0 1 0 9 2 2 4 14 5 4 2 13 5 4 0 8 10 0 3 12 
3 1 2 3 5 1 5 7 4 2 2 1 2 2 4 3 0 8 3 7 5 0 0 1 >AK075142\AK075142\300..2600\2301\BAC11431.1\Homo sapiens\Homo sapiens cDNA FLJ90661 fis, clone PLACE1005003, weakly similarto PROSTASIN PRECURSOR (EC 3.4.21.-)./codon_start=1/protein_id="BAC11431.1"/db_xref="GI:22761041" 4 19 11 2 3 6 5 20 60 2 1 6 6 12 7 3 19 3 6 13 4 10 15 40 12 15 12 38 13 14 13 34 21 6 3 13 31 2 1 4 10 3 4 30 15 6 11 39 19 4 9 4 19 13 8 7 2 15 3 7 29 0 0 1 >HSLTKM\X52213\259..1653\1395\CAA36460.1\Homo sapiens\H.sapiens ltk mRNA./gene="ltk"/codon_start=1/protein_id="CAA36460.1"/db_xref="GI:34422"/db_xref="GOA:P29376"/db_xref="UniProt/Swiss-Prot:P29376" 3 7 7 3 3 9 2 17 40 7 0 4 5 9 1 6 10 5 5 8 1 4 14 13 3 15 3 17 0 2 6 15 16 4 2 5 13 5 2 14 5 5 2 19 7 2 5 21 10 7 6 6 7 3 7 5 1 10 6 16 9 0 0 1 >AF159174\AF159174\37..831\795\AAL25998.1\Homo sapiens\Homo sapiens small intestine aquaporin mRNA, complete cds./codon_start=1/product="small intestine aquaporin"/protein_id="AAL25998.1"/db_xref="GI:16564853" 0 2 4 1 1 1 1 14 19 1 1 3 2 6 1 5 1 1 5 8 1 2 3 4 1 8 2 17 1 9 5 10 8 8 1 7 10 4 1 4 8 2 1 6 3 1 3 4 3 2 4 4 4 1 8 4 2 6 4 8 3 1 0 0 >BC056402\BC056402\237..1382\1146\AAH56402.1\Homo sapiens\Homo sapiens hypothetical protein BC007540, mRNA (cDNA cloneMGC:50858 IMAGE:5764461), complete cds./gene="LOC144097"/codon_start=1/product="hypothetical protein LOC144097"/protein_id="AAH56402.1"/db_xref="GI:33880045"/db_xref="GeneID:144097" 1 4 5 2 8 6 2 12 9 3 0 4 4 5 5 4 12 2 4 8 2 3 18 18 8 9 9 15 2 6 6 13 8 2 1 6 14 3 6 11 4 3 3 12 7 0 9 29 20 4 0 0 3 2 7 0 0 5 1 7 5 1 0 0 >AK026909\AK026909\149..1588\1440\BAB15588.1\Homo sapiens\Homo sapiens cDNA: FLJ23256 fis, clone COL05534./codon_start=1/protein_id="BAB15588.1"/db_xref="GI:10439879" 5 3 6 5 13 7 4 3 11 7 4 13 5 8 2 6 5 6 4 6 2 3 9 6 0 3 12 11 2 14 10 5 6 6 4 5 11 10 23 27 3 10 11 11 1 4 23 38 11 27 8 7 1 1 1 5 3 5 10 3 4 0 0 1 >BC009393\BC009393\136..603\468\AAH09393.1\Homo sapiens\Homo sapiens HIV-1 Rev binding protein-like, mRNA (cDNA 
cloneIMAGE:4135946), complete cds./gene="HRBL"/codon_start=1/product="HRBL protein"/protein_id="AAH09393.1"/db_xref="GI:14424764"/db_xref="GeneID:3268"/db_xref="MIM:604019" 0 2 4 3 2 3 0 2 5 2 1 1 1 4 1 1 3 0 2 5 0 2 3 1 2 3 0 3 5 1 1 8 5 2 3 3 7 1 2 8 2 1 2 4 1 1 2 9 1 4 1 1 8 0 5 2 0 2 1 3 3 0 0 1 >HSU34625\U34625\121..1908\1788\AAA86419.1\Homo sapiens\Human T cell surface glycoprotein CD-6 mRNA, complete cds./codon_start=1/product="T cell surface glycoprotein CD6"/protein_id="AAA86419.1"/db_xref="GI:1015968" 2 8 10 2 1 8 1 20 23 3 0 9 9 20 5 7 18 8 7 10 3 7 15 17 9 11 11 27 9 5 7 20 21 2 2 9 19 3 4 4 17 5 2 22 13 4 6 41 18 6 9 1 23 4 11 3 3 8 3 5 15 0 1 0 >HUMFERL\M11147\152..679\528\AAA52439.1\Homo sapiens\Human ferritin L chain mRNA, complete cds./gene="FTL"/codon_start=1/protein_id="AAA52439.1"/db_xref="GI:182514"/db_xref="GDB:G00-119-234" 0 5 0 3 0 2 1 8 13 3 0 2 0 3 0 2 3 0 0 4 1 2 2 1 1 0 1 9 0 5 0 7 1 4 0 2 4 0 3 9 4 2 1 6 5 2 5 11 8 5 4 3 0 1 8 0 0 2 1 5 1 1 0 0 >AF527578\AF527578\1..1413\1413\AAN37810.1\Homo sapiens\Homo sapiens mutant desmin mRNA, complete cds./codon_start=1/product="mutant desmin"/protein_id="AAN37810.1"/db_xref="GI:23506465" 5 20 13 1 1 3 1 16 26 2 0 4 2 11 7 6 7 3 2 13 6 4 1 3 5 1 4 24 8 7 1 12 6 2 0 6 21 0 1 21 17 3 1 31 2 4 12 45 18 5 13 1 1 0 10 2 0 11 7 10 1 1 0 0 >CR456941\CR456941\1..996\996\CAG33222.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834A075D forgene RABGGTB, Rab geranylgeranyltransferase, beta subunit; completecds, incl. 
stopcodon./gene="RABGGTB"/codon_start=1/protein_id="CAG33222.1"/db_xref="GI:48145999" 1 1 0 1 7 2 3 2 6 10 8 9 5 4 0 6 0 7 4 1 2 4 4 1 2 6 8 0 1 9 14 5 4 7 4 3 7 7 7 8 1 9 5 8 0 9 16 6 6 16 3 8 5 7 2 11 2 6 15 9 7 1 0 0 >HSJ244F24#1\AL096865\join(complement(121800..121937), complement(121476..121599),complement(27303..27667), complement(18241..18397),complement(12209..12313), AL358135.19:34471..34644,AL161907.17:5353..5514, AL161907.17:38324..38389,AL161907.17:39934..40412)\138\CAI19932.1\Homo sapiens\Human DNA sequence from clone RP1-244F24 on chromosome 6p12.3-21.2Contains the 5' end of the RUNX2 gene for runt-relatedtranscription factor 2, the 3' end of the SUPT3H gene forsuppressor of Ty 3 homolog (S. cerevisiae) and three CpG islands,complete sequence./gene="RUNX2"/locus_tag="RP1-166H4.1-002"/standard_name="OTTHUMP00000016533"/codon_start=1/protein_id="CAI19932.1"/db_xref="GI:56203978"/db_xref="GOA:Q5T802"/db_xref="InterPro:IPR000040"/db_xref="UniProt/TrEMBL:Q5T802" 3 3 16 2 12 11 8 6 21 12 6 2 10 12 5 8 30 8 15 7 6 7 12 5 7 6 12 7 1 4 13 8 4 8 8 5 8 8 16 17 15 7 13 13 9 7 11 13 5 7 4 6 13 8 6 13 5 7 11 16 20 6 5 11 >BX248519#1\BX248519\join(5137..5235,5322..5427,5675..6087)\618\CAI41939.1\Homo sapiens\Human DNA sequence from clone DASS-280D8 on chromosome 6 Containsthe LTA gene for lymphotoxin alpha (TNF superfamily, member 1), theTNF gene for tumor necrosis factor (TNF superfamily, member 2), theLTB gene for lymphotoxin beta (TNF superfamily, member 3), the LST1gene for leukocyte specific transcript 1, the NCR3 gene for naturalcytotoxicity triggering receptor 3, and 1 CpG island, completesequence./locus_tag="DASS-280D8.1-001"/standard_name="OTTHUMP00000037612"/codon_start=1/product="lymphotoxin alpha (TNF superfamily, member 1)"/protein_id="CAI41939.1"/db_xref="GI:57209822"/db_xref="InterPro:IPR003636"/db_xref="InterPro:IPR006052"/db_xref="InterPro:IPR006053"/db_xref="InterPro:IPR008983" 0 0 0 3 1 1 3 17 12 2 0 1 2 8 1 3 5 2 3 7 1 2 4 7 0 7 1 9 1 5 3 4 6 2 0 
6 4 2 2 4 3 1 0 13 8 3 2 1 3 2 6 1 0 1 10 1 0 2 1 4 2 0 1 0 >HSGLTSY\X76013\6..2333\2328\CAA53600.1\Homo sapiens\H.sapiens QRSHs mRNA for glutaminyl-tRNA synthetase./gene="QRSHs"/codon_start=1/product="glutaminyl-tRNA synthetase"/protein_id="CAA53600.1"/db_xref="GI:558586"/db_xref="GOA:P47897"/db_xref="InterPro:IPR000924"/db_xref="InterPro:IPR001412"/db_xref="InterPro:IPR004514"/db_xref="InterPro:IPR007638"/db_xref="InterPro:IPR007639"/db_xref="InterPro:IPR011035"/db_xref="UniProt/Swiss-Prot:P47897" 8 9 17 7 2 7 12 18 35 9 3 5 7 8 2 5 6 5 12 16 4 14 15 14 1 13 14 24 4 23 11 21 5 8 6 13 34 4 9 37 15 9 1 26 17 9 11 48 28 13 9 16 8 8 22 14 1 17 10 16 10 0 0 1 >AC018463#1\AC018463\join(4628..4770,9547..9629,9706..9775,13285..13342, 21006..21126,24049..24090,26818..26909)\609\AAX93048.1\Homo sapiens\Homo sapiens BAC clone RP11-295J19 from 2, complete sequence./gene="MGC33602"/codon_start=1/product="unknown"/protein_id="AAX93048.1"/db_xref="GI:62702121" 0 2 3 1 2 0 3 6 18 5 3 1 1 2 0 3 5 3 5 6 2 4 0 1 4 1 3 5 9 5 2 2 3 0 3 1 8 0 0 7 3 3 1 6 0 1 1 3 2 2 6 3 1 6 3 6 4 11 1 4 6 0 0 1 >S61511\S61511\1..225\225\AAB26807.1\Homo sapiens\PAX3=HuP2 {exon 2} [human, Genomic Mutant, 229 nt]./gene="PAX3"/codon_start=1/product="HuP2"/protein_id="AAB26807.1"/db_xref="GI:300503" 0 4 3 0 0 2 0 2 3 0 0 0 0 3 1 0 0 0 0 0 1 0 0 2 1 0 1 3 0 0 0 7 0 1 0 3 2 1 0 2 4 0 2 4 5 2 0 1 1 0 0 0 4 0 0 1 0 6 0 1 1 0 0 1 >AF060500\AF060500\1..2076\2076\AAD38323.1\Homo sapiens\Homo sapiens liver specific transporter mRNA, complete cds./codon_start=1/product="liver specific transporter"/protein_id="AAD38323.1"/db_xref="GI:5051630" 2 0 1 2 8 5 9 4 11 12 16 21 27 10 1 21 6 8 13 10 2 14 11 4 0 9 16 4 0 14 29 5 6 15 6 9 12 12 31 10 10 33 14 4 1 11 19 7 3 15 13 18 6 14 17 26 20 10 29 19 6 1 0 0 >BC065533\BC065533\179..1459\1281\AAH65533.1\Homo sapiens\Homo sapiens inositol hexaphosphate kinase 2, transcript variant 1,mRNA (cDNA clone MGC:75016 IMAGE:6162048), complete 
cds./gene="IHPK2"/codon_start=1/product="inositol hexaphosphate kinase 2, isoform a"/protein_id="AAH65533.1"/db_xref="GI:41350931"/db_xref="GeneID:51447"/db_xref="MIM:606992" 3 8 4 4 1 5 4 9 14 4 4 5 7 4 1 5 7 5 7 6 0 4 4 6 0 5 8 7 0 6 2 11 7 4 4 11 17 0 15 20 7 5 3 19 7 9 15 28 11 11 12 6 3 6 11 5 4 5 5 11 5 0 0 1 >HSTRHREC\X72089\20..1216\1197\CAA50979.1\Homo sapiens\H.sapiens mRNA for TRH receptor./codon_start=1/product="TRH receptor"/protein_id="CAA50979.1"/db_xref="GI:440156"/db_xref="GOA:P34981"/db_xref="UniProt/Swiss-Prot:P34981" 1 0 0 1 6 5 3 14 11 6 4 5 7 8 0 8 6 6 13 10 0 5 3 5 1 5 7 10 0 7 3 6 0 2 8 13 12 5 8 10 13 12 3 9 2 2 5 6 5 8 16 6 9 6 10 12 5 13 15 11 6 0 0 1 >AL162590#5\AL162590\join(26407..26538,26735..26912,29807..29911,30362..30589, 34138..34252,36496..36611,36937..37037,38607..38825)\1194\CAI15553.1\Homo sapiens\Human DNA sequence from clone RP11-54K16 on chromosome 9p13.3-21.1Contains a cancer/testis antigen 1 (CTAG1) (LAGE2B) pseudogene, theDNAJA1 gene for DnaJ (Hsp40) homolog, subfamily A, member 1, thegene for homolog of C. 
elegans smu-1 (SMU-1) (FLJ10805), the 5' endof the APTX gene for aprataxin and 3 CpG islands, completesequence./gene="DNAJA1"/locus_tag="RP11-54K16.1-001"/standard_name="OTTHUMP00000021193"/codon_start=1/product="DnaJ (Hsp40) homolog, subfamily A, member 1"/protein_id="CAI15553.1"/db_xref="GI:55958015"/db_xref="InterPro:IPR001305"/db_xref="InterPro:IPR001623"/db_xref="InterPro:IPR002939"/db_xref="InterPro:IPR003095"/db_xref="InterPro:IPR008971" 5 2 2 1 9 6 4 5 8 2 4 4 1 1 0 8 1 1 3 4 0 6 8 5 0 5 7 0 0 6 19 9 2 13 6 3 6 9 19 17 4 10 6 17 1 11 25 12 12 14 5 6 5 8 3 10 6 8 11 12 0 1 0 0 >HSJ000882\AJ000882\202..4401\4200\CAA04372.1\Homo sapiens\Homo sapiens mRNA for steroid receptor coactivator 1e./codon_start=1/product="steroid receptor coactivator 1e"/protein_id="CAA04372.1"/db_xref="GI:2924311"/db_xref="GOA:Q15788"/db_xref="UniProt/Swiss-Prot:Q15788" 9 4 7 5 23 16 19 12 25 23 24 18 41 24 4 38 34 30 34 17 7 27 45 24 3 46 24 22 2 29 38 19 14 16 19 10 21 13 36 19 45 45 45 77 9 12 40 29 29 38 9 12 10 12 17 18 16 17 24 50 4 0 1 0 >AB020982\AB020982\12..1898\1887\BAA35127.1\Homo sapiens\Homo sapiens ASH2L mRNA, complete cds, similar to Drosophila ash2sequence./gene="ASH2L"/codon_start=1/protein_id="BAA35127.1"/db_xref="GI:4009336" 4 4 7 0 5 2 5 3 9 7 4 15 6 8 0 9 8 14 17 10 2 7 13 9 6 17 13 14 6 17 23 15 16 10 6 6 12 6 23 23 11 10 5 16 8 7 20 23 14 28 12 15 8 3 9 14 5 4 12 13 10 0 0 1 >AF202635\AF202635\707..1039\333\AAG22489.1\Homo sapiens\Homo sapiens PP1200 mRNA, complete cds./codon_start=1/product="PP1200"/protein_id="AAG22489.1"/db_xref="GI:10732646" 0 1 0 0 1 1 3 2 2 3 1 4 1 4 0 3 1 3 2 3 1 2 1 2 1 2 2 1 2 3 1 1 1 3 1 0 2 1 4 2 2 1 1 2 2 0 3 2 1 0 1 2 2 1 1 10 1 3 3 2 2 0 1 0 >BC013579\BC013579\219..2222\2004\AAH13579.1\Homo sapiens\Homo sapiens calpastatin, mRNA (cDNA clone MGC:9402 IMAGE:3878564),complete cds./gene="CAST"/codon_start=1/product="CAST protein"/protein_id="AAH13579.1"/db_xref="GI:15488898"/db_xref="GeneID:831"/db_xref="MIM:114090" 0 1 4 2 9 2 5 9 7 
7 6 9 23 8 6 16 4 10 20 12 3 10 33 10 3 19 14 8 1 37 16 4 6 8 3 5 8 6 49 42 3 6 9 11 1 2 55 24 22 45 3 3 1 6 3 2 6 5 6 9 0 1 0 0 >AL353135#3\AL353135\12579..12785\207\CAI16245.1\Homo sapiens\Human DNA sequence from clone RP11-63L7 on chromosome 6 Containsthe PROL2 gene for proline rich 2 (proline-rich protein withnuclear targeting signal), the gene for NB4apoptosis/differentiation related protein (PNAS-145), the gene forserine-arginine repressor protein (35kDa) (SRrp35), the gene for anovel protein similar to putative amidohydrolase, the 3' end of theGABRR1 gene for gamma-aminobutyric acid (GABA) receptor, rho 1 and3 CpG islands, complete sequence./gene="RP11-63L7.5"/locus_tag="RP11-63L7.5-001"/standard_name="OTTHUMP00000016848"/codon_start=1/protein_id="CAI16245.1"/db_xref="GI:55958302"/db_xref="UniProt/TrEMBL:Q9BXV2" 0 0 0 0 2 0 0 1 0 0 4 6 1 1 0 0 1 0 2 0 0 1 0 0 0 1 1 1 0 0 1 0 0 1 6 0 0 0 7 0 1 2 1 1 0 5 0 1 0 1 2 4 0 3 0 4 1 1 2 1 1 0 1 0 >BC096078\BC096078\74..1489\1416\AAH96078.1\Homo sapiens\Homo sapiens 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 1,mRNA (cDNA clone MGC:116716 IMAGE:40000711), complete cds./gene="PFKFB1"/codon_start=1/product="6-phosphofructo-2-kinase/fructose-2, 6-biphosphatase 1"/protein_id="AAH96078.1"/db_xref="GI:64654628"/db_xref="GeneID:5207"/db_xref="MIM:311790" 9 7 6 1 4 6 2 11 22 4 2 2 4 10 0 2 10 3 10 10 0 4 8 5 0 8 5 13 2 4 4 13 0 6 1 7 18 6 7 17 11 8 6 16 9 7 19 24 10 11 12 16 7 3 5 8 5 19 7 11 4 0 0 1 >BC055419\BC055419\453..1040\588\AAH55419.1\Homo sapiens\Homo sapiens integrin, alpha 4 (antigen CD49D, alpha 4 subunit ofVLA-4 receptor), mRNA (cDNA clone IMAGE:4805867), complete cds./gene="ITGA4"/codon_start=1/product="ITGA4 protein"/protein_id="AAH55419.1"/db_xref="GI:33392763"/db_xref="GeneID:3676"/db_xref="MIM:192975" 3 2 1 0 6 4 1 4 8 2 1 3 1 2 1 0 5 1 3 1 3 5 1 9 2 3 0 4 5 4 6 5 5 5 0 4 7 0 2 5 5 8 1 6 3 1 8 3 2 1 6 2 4 4 1 1 3 4 1 2 5 1 0 0 >BC009300\BC009300\116..616\501\AAH09300.1\Homo sapiens\Homo sapiens 
mitochondrial protein 18 kDa, transcript variant 1,mRNA (cDNA clone MGC:15180 IMAGE:4126237), complete cds./gene="MTP18"/codon_start=1/product="mitochondrial protein 18 kDa, isoform a"/protein_id="AAH09300.1"/db_xref="GI:31455234"/db_xref="GeneID:51537" 1 7 2 0 0 2 1 4 8 2 0 1 1 4 1 3 6 0 1 8 1 1 2 5 3 1 2 8 4 6 1 8 2 0 1 2 16 0 1 5 1 1 0 2 1 0 1 5 5 3 4 3 0 1 3 1 0 4 4 1 5 0 0 1 >AL353602#5\AL353602\complement(join(28535..28711,28924..29088,29554..29725, 30658..30716,30854..30989,31431..31650,31809..31983, 32284..32481,32677..32785,33147..33331,33843..33869, 35898..36083))\1809\CAC36269.1\Homo sapiens\Human DNA sequence from clone RP11-22I24 on chromosome 6 Containsthe 3' part of the POLH gene for DNA directed polymerase eta, theGTPBP2 gene for GTP binding protein 2 and one CpG island, completesequence./gene="GTPBP2"/locus_tag="RP11-22I24.2-011"/standard_name="OTTHUMP00000016486"/codon_start=1/product="GTP binding protein 2"/protein_id="CAC36269.1"/db_xref="GI:13561007"/db_xref="Genew:4670"/db_xref="GOA:Q9BX10"/db_xref="InterPro:IPR000795"/db_xref="InterPro:IPR004160"/db_xref="InterPro:IPR004161"/db_xref="UniProt/TrEMBL:Q9BX10" 8 7 12 7 2 4 4 13 37 6 1 3 4 5 4 5 14 4 13 18 3 5 5 11 4 3 9 19 1 11 14 23 18 3 10 15 28 4 8 25 13 8 3 21 8 6 10 33 17 9 7 3 10 3 17 7 2 15 7 12 1 0 0 1 >AY551299\AY551299\1..1545\1545\AAS58460.1\Homo sapiens\Homo sapiens protein kinase Chk2 transcript variant del9 (CHK2)mRNA, complete cds./gene="CHK2"/codon_start=1/product="protein kinase Chk2 transcript variant del9"/protein_id="AAS58460.1"/db_xref="GI:45356842" 3 3 3 4 7 5 4 5 11 10 6 10 8 13 1 13 11 7 10 11 2 7 6 5 1 13 8 10 1 14 10 5 10 4 6 4 11 11 19 18 7 11 5 16 5 4 27 18 9 16 9 8 5 7 4 17 4 7 12 7 6 0 0 1 >HSU10690\U10690\3075..3449\375\AAA68874.1\Homo sapiens\Human MAGE-5b antigen (MAGE5b) gene, complete cds./gene="MAGE5b"/codon_start=1/product="MAGE-5b antigen"/protein_id="AAA68874.1"/db_xref="GI:533521" 1 0 0 0 0 1 1 4 5 2 0 1 1 7 0 3 2 3 0 3 0 4 4 1 0 6 2 5 0 5 1 4 3 2 0 1 6 0 0 
7 1 0 3 5 1 1 3 10 3 1 0 1 1 0 2 1 0 2 2 1 1 1 0 0 >HSU28164\U28164\232..1794\1563\AAD28324.1\Homo sapiens\Homo sapiens spermatogenesis associated PD1 mRNA, complete cds./codon_start=1/product="spermatogenesis associated PD1"/protein_id="AAD28324.1"/db_xref="GI:4730927" 3 11 15 1 4 3 1 20 28 3 2 2 8 21 8 4 25 4 3 16 5 6 6 16 5 7 7 22 1 4 0 15 4 1 0 4 21 1 7 30 8 0 2 14 11 4 4 23 16 16 18 6 10 7 11 5 0 8 1 9 3 0 1 0 >AL139287#21\AL139287\complement(join(73754..73855,73953..74398,74658..74709))\600\CAI23191.1\Homo sapiens\Human DNA sequence from clone RP5-890O3 on chromosome 1 Containsthe AKIP gene for aurora-A kinase interacting protein, a novel gene(FLJ90811) a NADH dehydrogenase (ubiquinone) 1 beta subcomplex 415kDa (NDUFB4) pseudogene, a novel gene (MGC3047), the DVL1 genefor dishevelled dsh homolog 1 (Drosophila), the TAS1R3 gene fortaste receptor type 1 member 3, a novel gene (MGC10334), a novelgene (FLJ20542), a novel gene and five CpG islands, completesequence./gene="AKIP"/locus_tag="RP5-890O3.1-002"/standard_name="OTTHUMP00000003014"/codon_start=1/product="aurora-A kinase interacting protein"/protein_id="CAI23191.1"/db_xref="GI:56204969" 0 8 7 0 2 5 1 3 17 1 0 1 0 3 1 2 5 0 1 3 2 2 2 8 4 4 2 7 4 1 2 7 9 1 0 6 4 1 3 14 2 0 2 9 2 0 4 7 1 3 4 0 3 1 3 0 1 4 1 4 5 0 0 1 >S78433\S78433\1..18\18\AAB21197.1\Homo sapiens\isovaleryl-CoA dehydrogenase presursor {exon 2} [human, mRNAPartial Mutant, 18 nt]./gene="isovaleryl-CoA dehydrogenase presursor"/codon_start=1/product="isovaleryl-CoA dehydrogenase presursor"/protein_id="AAB21197.1"/db_xref="GI:243926" 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 >BC046199\BC046199\423..752\330\AAH46199.1\Homo sapiens\Homo sapiens family with sequence similarity 72, member A, mRNA(cDNA clone MGC:57827 IMAGE:6064384), complete cds./gene="FAM72A"/codon_start=1/product="FAM72A protein"/protein_id="AAH46199.1"/db_xref="GI:28278563"/db_xref="GeneID:389835" 
0 0 0 0 3 0 2 0 1 2 2 1 1 3 0 1 2 3 2 2 0 1 2 0 0 1 4 0 0 0 2 1 1 2 2 0 3 2 4 2 7 1 0 1 2 1 2 4 4 3 0 3 3 7 4 1 1 4 5 2 2 1 0 0 >BT007312\BT007312\1..1416\1416\AAP35976.1\Homo sapiens\Homo sapiens G protein-coupled receptor kinase-interactor 2 mRNA,complete cds./codon_start=1/product="G protein-coupled receptor kinase-interactor 2"/protein_id="AAP35976.1"/db_xref="GI:30583463" 2 3 6 1 8 6 7 8 10 8 8 11 7 3 3 9 12 10 8 6 6 4 6 5 3 6 12 12 3 10 9 4 6 2 5 9 9 5 19 10 12 13 8 19 6 12 12 16 16 19 2 7 5 3 5 7 6 3 5 10 4 0 1 0 >BC013937\BC013937\94..660\567\AAH13937.1\Homo sapiens\Homo sapiens chromosome 7 open reading frame 24, mRNA (cDNA cloneMGC:24311 IMAGE:3997629), complete cds./gene="C7orf24"/codon_start=1/product="chromosome 7 open reading frame 24"/protein_id="AAH13937.1"/db_xref="GI:15530290"/db_xref="GeneID:79017" 2 1 0 0 0 1 0 1 7 2 2 1 1 2 2 1 2 6 5 2 1 3 4 2 1 1 3 4 2 1 5 5 3 3 3 2 2 4 10 6 4 6 6 3 1 1 12 6 4 4 3 5 2 2 2 5 4 3 4 5 3 0 1 0 >BC052563\BC052563\434..2734\2301\AAH52563.1\Homo sapiens\Homo sapiens zyg-11 homolog B (C. elegans)-like, mRNA (cDNA cloneMGC:59807 IMAGE:6303965), complete cds./gene="ZYG11BL"/codon_start=1/product="zyg-11 homolog B (C. 
elegans)-like"/protein_id="AAH52563.1"/db_xref="GI:30851198"/db_xref="GeneID:10444" 2 16 17 3 2 7 6 30 58 5 0 8 3 18 3 7 25 4 6 13 9 6 5 10 4 8 3 25 3 5 3 15 8 1 2 15 26 3 6 30 31 10 2 31 15 2 14 50 33 7 16 4 17 7 26 8 3 30 10 21 9 0 1 0 >AY619692\AY619692\84..1025\942\AAU88048.1\Homo sapiens\Homo sapiens intelectin 1 (ITLN1) mRNA, complete cds./gene="ITLN1"/codon_start=1/product="intelectin 1"/protein_id="AAU88048.1"/db_xref="GI:52843235" 0 3 1 2 3 4 1 4 10 1 0 1 1 3 1 7 9 5 4 8 3 7 4 5 1 4 6 8 3 3 14 13 3 5 1 3 8 3 5 7 10 6 1 12 5 2 5 10 10 7 12 8 3 7 7 11 2 6 1 5 9 0 0 1 >BC034046\BC034046\179..3787\3609\AAH34046.1\Homo sapiens\Homo sapiens protein tyrosine phosphatase, receptor type, fpolypeptide (PTPRF), interacting protein (liprin), alpha 1, mRNA(cDNA clone MGC:26800 IMAGE:4794300), complete cds./gene="PPFIA1"/codon_start=1/product="PPFIA1 protein"/protein_id="AAH34046.1"/db_xref="GI:21707845"/db_xref="GeneID:8500" 14 10 11 9 30 28 10 16 43 36 20 21 13 29 5 19 24 21 22 13 8 14 14 13 7 12 34 25 5 24 14 24 13 12 8 12 22 11 42 30 24 20 21 47 21 14 82 44 34 36 6 4 7 3 9 10 6 16 16 34 10 1 0 0 >AL627107#1\AL627107\join(36278..36304,37516..37860)\372\CAI23611.1\Homo sapiens\Human DNA sequence from clone XXyac-R12DG2 on chromosome 13Contains 2 novel genes, a calumenin (CALU) pseudogene, a novel gene(possible ortholog of RIKEN cDNA 4933433D23 gene) and a CpG island,complete sequence./locus_tag="XXyac-R12DG2.2-001"/standard_name="OTTHUMP00000018349"/codon_start=1/protein_id="CAI23611.1"/db_xref="GI:56207198"/db_xref="UniProt/TrEMBL:Q8WYW1" 1 8 2 0 3 4 0 3 0 1 0 7 5 7 8 1 9 0 4 4 1 0 4 4 3 1 0 5 0 1 1 4 4 0 1 1 3 2 0 4 1 0 0 0 0 0 0 3 0 0 2 1 0 0 3 0 0 2 1 1 3 0 0 1 >AY956764\AY956764\1..1794\1794\AAX38251.1\Homo sapiens\Homo sapiens heat shock protein 90Bc (HSP90Bc) mRNA, complete cds./gene="HSP90Bc"/codon_start=1/product="heat shock protein 90Bc"/protein_id="AAX38251.1"/db_xref="GI:61104913" 2 3 5 3 2 3 5 10 15 9 2 11 4 9 1 13 9 5 7 14 1 6 1 13 0 7 10 10 1 11 3 8 1 
14 4 5 20 7 22 35 12 8 2 19 10 6 30 53 18 27 5 13 4 2 9 11 4 23 10 17 3 0 1 0 >HSTIF\X79538\74..1309\1236\CAA56074.1\Homo sapiens\H.sapiens nuk_34 mRNA for translation initiation factor./codon_start=1/product="translation initiation factor"/protein_id="CAA56074.1"/db_xref="GI:496902"/db_xref="GOA:P38919"/db_xref="UniProt/Swiss-Prot:P38919" 3 3 6 5 8 6 1 8 13 3 0 11 5 5 3 2 3 2 7 11 5 8 4 3 1 3 4 9 5 8 4 10 5 5 3 7 12 7 13 10 7 5 4 17 2 3 11 18 13 15 11 2 3 2 9 5 1 23 10 16 3 0 0 1 >HSNUCPP\Z34289\55..2154\2100\CAA84063.1\Homo sapiens\H.sapiens mRNA for nucleolar phosphoprotein p130./function="nucleologenesis"/codon_start=1/product="nucleolar phosphoprotein p130"/protein_id="CAA84063.1"/db_xref="GI:663008"/db_xref="GOA:Q14978"/db_xref="UniProt/Swiss-Prot:Q14978" 7 3 5 0 3 5 1 6 7 2 3 1 16 15 0 28 35 21 7 15 1 8 23 14 1 24 29 33 5 22 11 10 2 9 5 8 14 9 31 80 6 12 8 20 0 1 17 50 20 19 3 2 0 0 5 4 3 2 3 3 2 0 0 1 >HSRPS3AGE\X87373\join(387..448,1212..1315,1702..1889,3598..3806,4911..5020, 5209..5330)\795\CAA60827.1\Homo sapiens\Homo sapiens RPS3a gene for ribosomal protein S3a./gene="RPS3a"/codon_start=1/product="ribosomal protein S3a"/protein_id="CAA60827.1"/db_xref="GI:854179"/db_xref="GOA:P49241"/db_xref="UniProt/Swiss-Prot:P49241" 2 3 2 3 3 1 0 4 4 5 0 5 0 1 0 6 1 3 3 7 2 5 4 1 0 2 3 2 1 7 7 5 1 7 1 7 6 9 20 18 4 6 5 7 2 3 12 3 5 12 1 4 2 2 5 5 3 2 7 11 2 1 0 0 >AF167473\AF167473\1..570\570\AAF89618.1\Homo sapiens\Homo sapiens heme-binding protein mRNA, complete cds./codon_start=1/product="heme-binding protein"/protein_id="AAF89618.1"/db_xref="GI:9622095" 0 1 5 2 0 1 2 0 5 0 1 2 0 2 1 1 4 1 5 3 2 1 3 5 0 4 4 7 1 4 3 6 6 3 3 5 3 2 5 11 2 3 3 3 0 0 11 5 7 3 4 5 1 1 4 4 0 4 4 7 4 0 0 1 >BA000025#2\BA000025 AP000502-AP000521\42521..43900\1380\BAB63293.1\Homo sapiens\Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region./gene="NG35"/function="Zn finger"/codon_start=1/protein_id="BAB63293.1"/db_xref="GI:15277208" 1 7 13 1 0 3 2 14 25 4 1 5 3 7 6 2 16 3 2 7 3 1 
9 14 3 8 7 27 5 3 7 19 11 5 1 9 23 2 5 19 13 1 1 12 17 1 8 25 10 6 5 1 14 1 15 1 1 10 6 8 0 0 1 0 >AY337001\AY337001\1..1035\1035\AAQ76789.1\Homo sapiens\Homo sapiens chemokine receptor-like 2 mRNA, complete cds./codon_start=1/product="chemokine receptor-like 2"/protein_id="AAQ76789.1"/db_xref="GI:34596212" 0 2 0 1 2 9 4 11 20 10 1 5 3 3 2 3 6 5 8 5 1 6 5 7 0 3 8 8 2 4 2 5 3 2 3 5 12 6 12 6 6 6 5 6 4 4 10 7 7 6 9 8 6 6 14 13 2 7 7 7 4 1 0 0 >AK022909\AK022909\1..2223\2223\BAB14304.1\Homo sapiens\Homo sapiens cDNA FLJ12847 fis, clone NT2RP2003347./codon_start=1/protein_id="BAB14304.1"/db_xref="GI:10434574" 1 3 2 5 15 12 7 6 15 13 12 14 24 11 3 26 15 15 14 13 3 9 14 9 4 19 12 9 3 8 16 13 6 8 5 13 9 8 37 26 9 22 8 20 15 9 36 22 11 23 6 7 10 15 4 18 6 7 16 14 5 0 0 1 >BC007226\BC007226\42..2006\1965\AAH07226.1\Homo sapiens\Homo sapiens N-glycanase 1, mRNA (cDNA clone MGC:15182IMAGE:2963554), complete cds./gene="NGLY1"/codon_start=1/product="N-glycanase 1"/protein_id="AAH07226.1"/db_xref="GI:13938211"/db_xref="GeneID:55768" 8 1 3 3 22 6 4 8 14 16 7 15 17 9 2 16 8 9 14 8 1 8 5 4 3 10 8 12 5 16 12 9 7 7 8 9 10 13 26 15 12 18 11 19 9 6 37 19 9 23 3 13 6 14 4 20 10 4 18 5 16 0 0 1 >HSA550516\AJ550516\1..297\297\CAD79455.1\Homo sapiens\Homo sapiens mRNA for HDM2-HD1 protein (HDM2 gene)./gene="HDM2"/function="oncogene"/codon_start=1/product="HDM2-HD1 protein"/protein_id="CAD79455.1"/db_xref="GI:29125741" 1 0 0 0 1 1 2 0 0 2 0 1 1 0 1 2 0 2 2 4 0 2 3 3 0 3 1 2 0 2 1 1 0 3 3 1 3 0 4 4 1 4 5 1 0 2 4 2 1 1 0 1 4 4 1 1 0 0 6 4 0 0 1 0 >BC063017\BC063017\11..691\681\AAH63017.1\Homo sapiens\Homo sapiens fumarylacetoacetate hydrolase domain containing 1,mRNA (cDNA clone MGC:74876 IMAGE:5226963), complete cds./gene="FAHD1"/codon_start=1/product="FAHD1 protein"/protein_id="AAH63017.1"/db_xref="GI:38648928"/db_xref="GeneID:81889" 0 5 1 0 2 3 1 4 11 1 0 4 1 8 1 1 5 0 2 2 3 3 2 9 3 3 4 4 8 2 5 7 4 1 0 6 9 3 2 14 5 0 0 3 5 0 5 13 5 3 5 2 5 0 4 1 2 10 2 8 4 0 0 1 
>HSY17999#1\Y17999\281..2170\1890\CAA76991.1\Homo sapiens\Homo sapiens mRNA for protein kinase Dyrk1B./gene="Dyrk1B"/codon_start=1/product="Dyrk1B protein kinase"/protein_id="CAA76991.1"/db_xref="GI:4379097"/db_xref="GOA:Q9Y463"/db_xref="InterPro:IPR000719"/db_xref="InterPro:IPR008271"/db_xref="InterPro:IPR011009"/db_xref="UniProt/Swiss-Prot:Q9Y463" 3 16 18 4 0 4 2 16 43 4 1 2 6 14 6 5 15 6 5 14 8 3 12 30 9 16 5 28 7 5 3 28 13 7 5 5 17 0 4 23 17 2 1 30 14 6 4 23 21 9 16 7 8 2 13 4 1 16 7 12 4 0 0 1 >BC013748\BC013748\153..812\660\AAH13748.1\Homo sapiens\Homo sapiens px19-like protein, mRNA (cDNA clone MGC:9879IMAGE:3867564), complete cds./gene="PX19"/codon_start=1/product="px19-like protein"/protein_id="AAH13748.1"/db_xref="GI:15489294"/db_xref="GeneID:27166"/db_xref="MIM:605733" 3 2 7 0 1 1 1 3 8 2 1 2 0 5 1 3 5 2 3 8 2 3 2 1 1 3 3 13 1 4 0 3 0 3 1 5 11 3 4 14 6 3 3 11 3 1 7 9 7 0 3 3 0 2 5 5 1 3 1 5 6 0 1 0 >AL353671#8\AL353671\105019..105606\588\CAH71256.1\Homo sapiens\Human DNA sequence from clone RP11-205M20 on chromosome 9 Containsthe 5' end of the gene for DEAD/H (Asp-Glu-Ala-Asp/His) boxpolypeptide (RIG-I), the gene for tumor protein p53-binding protein(TP53BPL), a novel gene, a DNA fragmentation factor, 40kDa, betapolypeptide (caspase-activated DNase) (DFFB) (CAD, CPAN, DFF40)pseudogene, a novel gene (FLJ25547), the NDUFB6 gene for NADHdehydrogenase (ubiquinone) 1 beta subcomplex, 6, 17kDa (B17), and aCpG island, complete sequence./locus_tag="RP11-205M20.5-001"/standard_name="OTTHUMP00000021184"/codon_start=1/protein_id="CAH71256.1"/db_xref="GI:55662288"/db_xref="UniProt/TrEMBL:Q8N7I0" 0 2 1 1 2 5 1 8 4 3 2 4 9 3 0 5 9 3 3 2 0 4 5 7 3 3 2 5 0 6 1 5 4 3 2 2 3 1 1 1 4 3 3 6 5 3 3 1 1 4 1 2 6 2 6 5 0 1 3 4 7 1 0 0 >AL590385#2\AL590385 AC013307\join(62127..62211,62646..62666,62996..63250,66479..66733, 67493..67615,70554..70591,74634..74807)\951\CAI16119.1\Homo sapiens\Human DNA sequence from clone RP11-5K23 on chromosome 1 Containsthe FCGR2A gene for Fc 
fragment of IgG low affinity IIa receptorfor (CD32), the HSPA6 gene for heat shock 70kDa protein 6(HSP70B'), a ribosomal protein S23 (RPS23) pseudogene, the FCGR3Agene for Fc fragment of IgG low affinity IIIa receptor for (CD16),a novel gene and three CpG islands, complete sequence./gene="FCGR2A"/locus_tag="RP11-5K23.6-002"/standard_name="OTTHUMP00000032377"/codon_start=1/product="Fc fragment of IgG, low affinity IIa, receptor for (CD32)"/protein_id="CAI16119.1"/db_xref="GI:55960610" 0 1 1 1 3 4 0 4 15 5 0 4 3 7 0 4 9 4 5 8 3 10 5 9 1 9 4 6 1 11 4 5 4 1 3 5 12 2 5 9 12 8 8 11 7 3 5 8 10 5 7 1 5 1 8 1 1 8 7 8 5 1 0 0 >BC067422\BC067422\29..1156\1128\AAH67422.1\Homo sapiens\Homo sapiens alcohol dehydrogenase 1C (class I), gamma polypeptide,mRNA (cDNA clone MGC:79242 IMAGE:7002194), complete cds./gene="ADH1C"/codon_start=1/product="class I alcohol dehydrogenase, gamma subunit"/protein_id="AAH67422.1"/db_xref="GI:45768580"/db_xref="GeneID:126"/db_xref="MIM:103730" 0 3 2 2 2 2 5 3 10 4 5 2 3 4 3 5 4 5 4 9 3 6 3 6 1 10 13 6 0 11 13 12 7 5 4 12 15 7 20 13 7 5 1 5 2 4 11 10 6 10 3 1 6 9 3 14 3 10 11 8 2 0 0 1 >HSU18803\U18803\1..642\642\AAC50879.1\Homo sapiens\Human myelin/oligodendrocyte glycoprotein MOG beta 2 isoform (MOG)mRNA, complete cds./gene="MOG"/codon_start=1/product="myelin/oligodendrocyte glycoprotein beta 2 isoform"/protein_id="AAC50879.1"/db_xref="GI:1688063" 2 2 5 0 6 3 1 18 10 2 1 2 1 2 1 6 4 0 2 1 0 3 2 5 0 5 6 1 1 3 6 5 5 2 2 2 10 3 3 3 1 3 4 4 3 2 8 8 2 6 5 2 4 1 11 1 3 3 1 3 2 1 0 0 >AK074396\AK074396\255..917\663\BAB85069.1\Homo sapiens\Homo sapiens cDNA FLJ23816 fis, clone HSI02685./codon_start=1/protein_id="BAB85069.1"/db_xref="GI:18676987" 0 11 5 5 0 2 0 7 12 2 0 1 0 2 1 1 9 1 2 7 3 1 0 12 3 5 1 11 5 1 2 14 2 1 0 6 7 0 0 1 7 0 1 9 4 0 0 10 7 0 9 0 12 4 6 0 0 4 0 3 1 1 0 0 >AK098557\AK098557\281..1384\1104\BAC05332.1\Homo sapiens\Homo sapiens cDNA FLJ25691 fis, clone TST04456./codon_start=1/protein_id="BAC05332.1"/db_xref="GI:21758596" 1 1 2 0 2 9 5 
11 5 5 6 1 7 3 0 7 10 7 11 1 1 7 9 6 2 11 7 5 0 7 7 4 5 5 3 1 5 8 10 7 9 14 7 8 5 9 13 9 7 3 6 12 4 3 5 13 7 9 6 10 4 1 0 0 >AF040639\AF040639\85..1080\996\AAD02195.1\Homo sapiens\Homo sapiens aflatoxin B1-aldehyde reductase mRNA, complete cds./codon_start=1/product="aflatoxin B1-aldehyde reductase"/protein_id="AAD02195.1"/db_xref="GI:4104867" 1 7 8 1 1 3 0 8 25 2 0 2 2 5 2 0 8 1 1 11 5 1 3 7 2 3 5 21 4 3 1 17 8 1 0 3 11 1 2 10 6 4 1 14 11 2 5 19 10 2 9 4 4 3 10 5 1 5 3 11 6 0 1 0 >BC069360\BC069360\1..345\345\AAH69360.1\Homo sapiens\Homo sapiens chemokine (C motif) ligand 2, mRNA (cDNA cloneMGC:97475 IMAGE:7262751), complete cds./gene="XCL2"/codon_start=1/product="chemokine (C motif) ligand 2"/protein_id="AAH69360.1"/db_xref="GI:47479837"/db_xref="GeneID:6846"/db_xref="MIM:604828" 1 0 0 1 5 4 1 4 3 2 0 1 1 2 1 1 3 1 2 10 2 3 3 0 0 0 2 2 0 2 1 4 1 1 2 3 5 1 3 2 2 2 2 3 0 1 3 0 2 1 2 0 1 2 0 1 0 5 3 3 1 0 1 0 >BC026319\BC026319\23..1870\1848\AAH26319.1\Homo sapiens\Homo sapiens hypothetical protein FLJ11078, mRNA (cDNA cloneMGC:26163 IMAGE:4799495), complete cds./gene="FLJ11078"/codon_start=1/product="hypothetical protein LOC55295"/protein_id="AAH26319.1"/db_xref="GI:20071160"/db_xref="GeneID:55295" 2 27 12 3 1 4 1 13 46 0 0 1 3 4 10 1 25 0 1 14 9 2 1 14 9 2 5 42 6 6 1 36 9 6 2 11 35 1 0 11 12 3 0 28 15 2 2 36 27 4 19 3 18 1 25 2 1 14 2 17 8 0 1 0 >AF155811\AF155811\1..969\969\AAG29584.1\Homo sapiens\Homo sapiens mitochondrial uncoupling protein 5 short form mRNA,complete cds; nuclear gene for mitochondrial product./function="reduces mitochondrial membrane potential in mammalian cells"/codon_start=1/product="mitochondrial uncoupling protein 5 short form"/protein_id="AAG29584.1"/db_xref="GI:11094339" 3 3 1 3 2 5 6 3 4 7 6 6 4 3 0 3 6 2 4 5 1 9 3 4 1 2 4 6 3 8 10 8 6 7 7 1 10 8 9 8 4 3 9 6 2 5 5 7 1 9 3 7 0 3 6 13 6 10 15 11 6 1 0 0 >AF520533S2\AF520534\284..1900\1617\AAQ17589.1\Homo sapiens\Homo sapiens individual 80 allele A, envelope glycoprotein gene,complete cds, 
and 3' long terminal repeat, complete sequence./codon_start=1/product="envelope glycoprotein"/protein_id="AAQ17589.1"/db_xref="GI:33411037" 5 6 3 4 8 3 16 17 5 10 8 4 9 10 2 12 8 5 6 23 0 21 4 19 0 17 11 7 0 6 18 6 5 6 11 11 2 10 12 5 14 12 18 4 2 10 16 7 5 8 9 9 8 10 9 12 10 11 7 13 9 0 1 0 >HS1120P11#2\AL109615\complement(join(138779..139145,141477..141547))\438\CAI19245.1\Homo sapiens\Human DNA sequence from clone RP5-1120P11 on chromosome6p12.3-21.2, complete sequence./locus_tag="RP5-1120P11.5-001"/standard_name="OTTHUMP00000016500"/codon_start=1/protein_id="CAI19245.1"/db_xref="GI:56204136"/db_xref="GOA:Q6P1L8"/db_xref="InterPro:IPR000218"/db_xref="UniProt/TrEMBL:Q6P1L8" 3 2 3 0 2 0 1 3 6 0 0 0 0 2 0 0 5 3 2 4 1 2 1 5 0 3 0 3 2 3 2 5 6 0 2 2 10 0 1 10 7 1 0 4 2 3 1 2 4 0 1 2 2 2 4 2 1 3 5 4 1 0 0 1 >BC008999\BC008999\320..1255\936\AAH08999.1\Homo sapiens\Homo sapiens KIAA0157, mRNA (cDNA clone MGC:17346 IMAGE:2900632),complete cds./gene="KIAA0157"/codon_start=1/product="KIAA0157 protein"/protein_id="AAH08999.1"/db_xref="GI:14290468"/db_xref="GeneID:23172" 3 2 1 0 10 6 1 6 1 5 3 4 5 7 1 10 5 11 2 3 0 8 4 7 2 11 9 4 2 7 6 3 2 2 0 2 10 6 7 6 4 12 6 15 4 3 14 8 12 8 4 8 0 2 4 6 1 4 5 7 0 1 0 0 >HSAJ3317\AJ223317\133..1560\1428\CAA11257.1\Homo sapiens\Homo sapiens mRNA for sarcosine dehydrogenase, complete cds./codon_start=1/product="sarcosine dehydrogenase"/protein_id="CAA11257.1"/db_xref="GI:4454259"/db_xref="GOA:O96023"/db_xref="UniProt/TrEMBL:O96023"/IGNORED_CODON=1 6 7 13 5 0 4 3 9 28 2 0 2 2 8 3 3 10 4 3 16 8 3 5 13 5 4 8 19 4 7 8 22 12 8 1 12 27 0 0 12 8 4 3 13 12 8 4 27 13 5 6 6 5 4 8 4 0 11 5 13 9 0 0 1 >BC007315\BC007315\262..1002\741\AAH07315.1\Homo sapiens\Homo sapiens CCR4-NOT transcription complex, subunit 7, mRNA (cDNAclone MGC:1092 IMAGE:2989585), complete cds./gene="CNOT7"/codon_start=1/product="CNOT7 protein"/protein_id="AAH07315.1"/db_xref="GI:13938366"/db_xref="GeneID:29883"/db_xref="MIM:604913" 2 0 2 1 2 1 5 4 2 4 4 8 4 0 0 5 4 0 4 3 1 3 4 1 0 
3 6 2 1 3 9 1 2 4 2 3 4 4 9 5 4 8 6 9 0 4 15 11 7 4 7 5 2 3 2 12 3 5 7 7 3 0 0 1 >AC016776#2\AC016776\join(82753..82870,97608..97637,104389..104454, 126211..126278,134986..135056,140200..140257, 141866..141898,145522..145584,150932..150982)\558\AAY24220.1\Homo sapiens\Homo sapiens BAC clone RP11-574K22 from 2, complete sequence./gene="NCE2"/codon_start=1/product="unknown"/protein_id="AAY24220.1"/db_xref="GI:62988833" 0 0 4 2 4 1 3 1 4 4 4 5 1 2 1 2 0 2 7 2 2 4 3 4 0 3 4 2 2 3 2 2 2 3 1 1 5 5 10 5 4 4 0 3 1 5 8 4 6 10 4 1 1 4 1 6 1 3 2 2 3 0 0 1 >AK097248\AK097248\283..2055\1773\BAC04985.1\Homo sapiens\Homo sapiens cDNA FLJ39929 fis, clone SPLEN2021295./codon_start=1/protein_id="BAC04985.1"/db_xref="GI:21756940" 4 3 0 6 12 8 4 6 9 11 8 10 11 4 2 10 4 6 11 13 5 11 20 16 4 21 4 6 0 7 9 6 3 10 5 5 4 8 25 17 11 16 12 14 11 10 29 17 14 16 10 21 3 9 11 12 10 4 14 10 8 0 0 1 >AK126055\AK126055\1285..4461\3177\BAC86417.1\Homo sapiens\Homo sapiens cDNA FLJ44067 fis, clone TESTI4037066./codon_start=1/protein_id="BAC86417.1"/db_xref="GI:34532404" 3 1 1 4 18 10 21 10 16 25 29 26 37 10 3 32 9 20 24 11 1 23 18 4 4 17 24 6 1 12 16 5 4 14 15 10 13 10 56 20 20 33 31 26 8 18 66 16 22 33 10 21 8 19 12 33 22 8 41 15 13 0 0 1 >D86062\D86062\19..732\714\BAA12985.1\Homo sapiens\Homo sapiens mRNA for KNP-Ib, complete cds./gene="KNP-I"/codon_start=1/product="KNP-Ib"/protein_id="BAA12985.1"/db_xref="GI:1545815" 0 2 1 2 1 6 0 8 9 1 0 2 0 5 2 3 2 1 1 5 4 2 2 2 2 5 7 16 8 7 7 7 5 3 0 8 15 1 3 9 4 1 0 6 9 2 6 9 4 3 2 1 5 0 2 3 0 9 2 4 1 0 0 1 >AL591845#3\AL591845\complement(join(58531..58961,59207..59273))\498\CAH71861.1\Homo sapiens\Human DNA sequence from clone RP11-268J15 on chromosome 1 Containsthe 3' end of a novel gene for thyroid hormone receptor-associatedprotein, 150 kDa subunit (TRAP150) (FLJ22082), a novel transcript(FLJ22938), a gene for a novel protein (FLJ10647), the 3' end of agene for Ser/Thr-like kinase (MGC4796) and two CpG islands,complete 
sequence./gene="RP11-268J15.2"/locus_tag="RP11-268J15.2-001"/standard_name="OTTHUMP00000009598"/codon_start=1/product="novel protein"/protein_id="CAH71861.1"/db_xref="GI:55665952"/db_xref="UniProt/TrEMBL:Q9NVM1" 1 7 8 0 0 1 0 6 16 0 0 1 0 1 2 0 7 0 1 4 6 1 0 10 6 0 1 4 4 2 0 10 3 0 0 4 2 0 0 0 4 0 0 3 2 0 1 15 11 1 3 1 2 0 4 0 0 5 0 3 2 0 0 1 >HSRBPRL7A\X57959\11..757\747\CAA41027.1\Homo sapiens\H.sapiens mRNA for ribosomal protein L7./gene="humL7-14"/codon_start=1/product="ribosomal protein L7"/protein_id="CAA41027.1"/db_xref="GI:35903"/db_xref="GOA:P18124"/db_xref="UniProt/Swiss-Prot:P18124" 7 3 0 1 7 8 1 1 5 9 0 6 1 0 1 3 1 0 1 4 0 2 4 3 0 2 7 1 2 7 5 4 1 6 5 1 4 3 10 24 8 7 2 3 2 2 8 9 1 3 4 7 1 0 6 4 0 12 8 9 2 1 0 0 >CR456738\CR456738\1..627\627\CAG33019.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834F046D forgene FADD, Fas (TNFRSF6)-associated via death domain; complete cds,incl. stopcodon./gene="FADD"/codon_start=1/protein_id="CAG33019.1"/db_xref="GI:48145593" 0 7 3 3 3 3 2 8 21 0 0 0 4 6 3 2 5 1 3 4 0 0 0 2 2 1 4 6 5 3 0 2 8 0 1 3 9 1 2 6 8 1 1 7 4 0 3 14 13 2 1 0 2 2 4 1 1 3 0 5 3 1 0 0 >S81868\S81868\1327..1350\24\AAD14378.1\Homo sapiens\adenosine A3 receptor {5' region} [human, HepG2 cells, Genomic,1350 nt]./gene="adenosine A3 receptor"/codon_start=1/protein_id="AAD14378.1"/db_xref="GI:4262078" 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 >BC014661\BC014661\3..794\792\AAH14661.1\Homo sapiens\Homo sapiens hypothetical protein FLJ12448, mRNA (cDNA cloneMGC:22955 IMAGE:4860511), complete cds./gene="FLJ12448"/codon_start=1/product="hypothetical protein FLJ12448"/protein_id="AAH14661.1"/db_xref="GI:15779204"/db_xref="GeneID:64897" 3 4 4 1 1 2 1 2 4 3 0 4 4 4 4 3 11 8 3 9 0 2 5 5 3 4 11 10 5 9 3 4 3 3 2 5 7 0 7 23 4 2 4 8 3 2 5 20 6 5 0 0 2 0 6 0 1 2 2 3 2 0 0 1 >BC010505\BC010505\104..1069\966\AAH10505.1\Homo sapiens\Homo sapiens secretory carrier 
membrane protein 3, transcriptvariant 2, mRNA (cDNA clone MGC:17478 IMAGE:3452347), complete cds./gene="SCAMP3"/codon_start=1/product="secretory carrier membrane protein 3, isoform 2"/protein_id="AAH10505.1"/db_xref="GI:14714725"/db_xref="GeneID:10067"/db_xref="MIM:606913" 4 3 4 0 2 1 2 11 12 4 1 3 4 7 1 3 7 3 5 3 1 5 8 11 3 7 7 17 1 16 3 8 3 4 2 7 7 2 2 8 9 5 2 14 1 1 5 11 5 1 5 3 5 2 18 9 0 6 4 7 6 0 0 1 >AK074662\AK074662\212..580\369\BAC11120.1\Homo sapiens\Homo sapiens cDNA FLJ90181 fis, clone MAMMA1000706./codon_start=1/protein_id="BAC11120.1"/db_xref="GI:22760248" 0 5 1 3 0 4 1 5 8 1 0 1 1 2 0 1 3 0 9 1 0 4 4 9 0 4 5 4 0 0 1 3 2 1 1 1 1 1 0 0 1 0 0 3 9 2 0 1 1 0 0 1 5 3 1 0 0 2 0 1 5 0 0 1 >BC003674\BC003674\48..347\300\AAH03674.1\Homo sapiens\Homo sapiens NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 2,8kDa, mRNA (cDNA clone MGC:12315 IMAGE:4044385), complete cds./gene="NDUFA2"/codon_start=1/product="NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 2, 8kDa"/protein_id="AAH03674.1"/db_xref="GI:13277540"/db_xref="GeneID:4695"/db_xref="MIM:602137" 1 5 0 1 1 1 3 1 4 0 1 1 0 1 1 0 1 3 0 1 1 0 0 4 0 1 4 4 3 1 1 4 1 1 1 3 2 1 2 4 3 2 1 4 1 0 1 5 2 2 2 0 1 1 2 1 0 3 2 1 1 0 0 1 >HSU14971\U14971\36..620\585\AAA85659.1\Homo sapiens\Human ribosomal protein S9 mRNA, complete cds./codon_start=1/product="ribosomal protein S9"/protein_id="AAA85659.1"/db_xref="GI:550023" 1 11 7 3 2 3 0 2 18 1 1 2 0 3 0 3 1 0 0 3 2 1 2 1 3 1 0 7 0 3 1 8 5 1 0 6 7 2 4 15 3 1 1 6 3 1 2 11 5 6 1 2 0 1 6 1 1 7 2 2 2 1 0 0 >BC000881\BC000881\130..474\345\AAH00881.1\Homo sapiens\Homo sapiens centromere protein A, 17kDa, mRNA (cDNA clone MGC:5165IMAGE:3461992), complete cds./gene="CENPA"/codon_start=1/product="CENPA protein"/protein_id="AAH00881.1"/db_xref="GI:12654133"/db_xref="GeneID:1058"/db_xref="MIM:117139" 3 4 7 0 1 4 2 6 3 2 2 1 0 4 0 0 5 1 1 3 0 1 1 5 5 0 4 3 0 1 1 7 0 1 0 0 1 2 0 6 0 0 3 1 2 3 1 5 1 1 0 1 0 0 2 2 1 2 0 1 1 0 0 1 >HSRH4\X63098\20..1084\1065\CAA44812.1\Homo 
sapiens\H.sapiens mRNA for rhesus polypeptide (Rh4)./codon_start=1/product="Rhesus polypeptide Rh4"/protein_id="CAA44812.1"/db_xref="GI:36020"/db_xref="GOA:P18577"/db_xref="UniProt/Swiss-Prot:P18577" 0 1 4 1 3 5 2 8 22 6 2 8 5 4 2 8 7 9 5 8 2 3 4 4 3 8 7 12 5 11 4 8 7 6 1 9 21 0 1 9 6 5 6 6 8 2 2 3 4 5 4 6 5 2 14 4 1 11 2 14 9 1 0 0 >HSU81031#1\U81031 AC000028\join(5284..5501,5659..5794,5911..6078,6236..6345, 7501..7551,7688..7823,8388..8480)\912\AAC39521.2\Homo sapiens\Homo sapiens CDK4, SAS, and KIAA0167 genes, complete cds; and OS9gene, partial cds; from cosmid 6E5, complete sequence./codon_start=1/product="CDK4"/protein_id="AAC39521.2"/db_xref="GI:17986211" 9 2 2 4 4 2 4 3 19 4 1 2 0 3 2 4 2 4 10 2 1 2 8 11 1 5 5 10 0 7 9 7 3 5 4 5 11 7 1 10 3 4 0 8 5 4 6 13 10 7 4 5 1 3 2 11 0 8 3 8 3 0 0 1 >BC094744\BC094744\75..2924\2850\AAH94744.1\Homo sapiens\Homo sapiens multimerin 2, mRNA (cDNA clone MGC:104596IMAGE:30343159), complete cds./gene="MMRN2"/codon_start=1/product="multimerin 2"/protein_id="AAH94744.1"/db_xref="GI:63101264"/db_xref="GeneID:79812"/db_xref="MIM:608925" 4 11 19 4 5 13 3 23 71 7 3 9 4 12 7 5 31 4 9 21 11 7 10 10 7 16 16 55 18 14 9 24 25 9 1 16 40 3 11 28 19 6 7 61 28 5 12 70 35 11 12 2 10 2 18 12 3 14 2 15 10 0 0 1 >AK127042\AK127042\1326..4424\3099\BAC86800.1\Homo sapiens\Homo sapiens cDNA FLJ45099 fis, clone BRAWH3031710, highly similarto Homo sapiens serologically defined colon cancer antigen 33(SDCCAG33)./codon_start=1/protein_id="BAC86800.1"/db_xref="GI:34533774" 1 7 7 2 2 7 3 18 49 4 3 13 6 43 12 11 50 9 11 38 19 9 13 33 20 12 12 33 15 12 5 30 14 3 0 16 31 4 26 69 28 10 2 46 24 5 10 63 36 12 23 7 11 6 15 7 2 23 4 19 7 0 1 0 >AK074906\AK074906\92..2332\2241\BAC11282.1\Homo sapiens\Homo sapiens cDNA FLJ90425 fis, clone NT2RP3000444./codon_start=1/protein_id="BAC11282.1"/db_xref="GI:22760656" 3 11 30 6 8 10 6 21 63 4 0 6 6 21 5 9 22 6 5 8 2 5 8 27 8 14 9 31 8 9 9 22 20 7 1 5 31 4 2 25 6 6 4 39 8 5 15 53 27 4 8 2 16 4 8 1 0 17 3 8 15 0 0 1 
>BC101263\BC101263\101..1048\948\AAI01264.1\Homo sapiens\Homo sapiens hypothetical protein DKFZp547E052, mRNA (cDNA cloneMGC:120278 IMAGE:40023930), complete cds./gene="DKFZp547E052"/codon_start=1/product="hypothetical protein LOC84236"/protein_id="AAI01264.1"/db_xref="GI:71682034"/db_xref="GeneID:84236" 1 0 3 1 10 4 4 8 7 9 3 7 5 5 0 4 6 5 1 3 1 5 8 3 2 6 8 4 0 8 11 5 7 3 4 3 3 7 5 3 8 7 7 7 3 8 12 2 4 6 8 9 2 4 9 11 2 5 5 7 7 0 0 1 >HSA418064\AJ418064\join(195..214,1642..1849,10482..10648,12181..12314, 13207..13304,14872..14946,15086..15277,18194..18289, 21663..21738,23005..23098,25671..25750,25842..25886, 26722..26811,31129..31226,36154..36297,37528..37666, 38366..38501,40270..40449,44239..44363,44861..45039, 46373..46498,46836..46948,47300..47480,51531..51611, 54157..54309)\3030\CAD10805.1\Homo sapiens\Homo sapiens SMARCA3 gene for SWI/SNF related, matrix associated,actin dependent regulator of chromatin, subfamily a, member 3,exons 1-26./gene="SMARCA3"/function="transcription factor"/codon_start=1/product="SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 3"/protein_id="CAD10805.1"/db_xref="GI:16943790"/db_xref="GOA:Q14527"/db_xref="UniProt/Swiss-Prot:Q14527" 5 4 2 7 30 6 8 1 16 25 30 22 17 9 0 23 9 14 27 4 3 23 26 5 3 17 27 10 3 18 33 8 3 15 12 7 13 26 55 32 12 41 15 24 3 22 58 13 16 38 5 16 7 11 8 28 15 11 30 24 14 1 0 0 >BC029125\BC029125\15..1070\1056\AAH29125.1\Homo sapiens\Homo sapiens formyl peptide receptor-like 1, mRNA (cDNA cloneMGC:34599 IMAGE:5180065), complete cds./gene="FPRL1"/codon_start=1/product="formyl peptide receptor-like 1"/protein_id="AAH29125.1"/db_xref="GI:20809658"/db_xref="GeneID:2358"/db_xref="MIM:136538" 1 3 4 1 2 2 1 9 20 6 4 6 2 8 0 5 4 3 6 9 3 9 7 4 1 6 5 14 0 8 4 9 6 2 1 14 13 5 5 6 8 5 2 2 4 1 5 9 8 0 5 4 3 5 15 11 0 13 14 11 8 0 0 1 >HS885L7#2\AL035669\join(45701..45848,46287..46408,47764..47845,48152..48226, 48633..48820)\615\CAC12748.1\Homo sapiens\Human DNA sequence from clone 
RP5-885L7 on chromosome 20q13.2-13.33Contains the 3' end of the NTSR1 gene for high affinity neurotensinreceptor 1, the C20orf20 gene, the OGFR gene for opioid growthfactor receptor, the COL9A3 gene for collagen IX alpha 3, theC20orf143 gene, the TCFL5 gene for basic helix-loop-helixtranscription factor-like 5, the ARF4P2 gene for ADP-ribosylationfactor 4 pseudogene 2, the 3' end of the DATF1 gene for deathassociated transcription factor 1, two novel genes and seven CpGislands, complete sequence./gene="C20orf20"/locus_tag="RP5-885L7.13-001"/standard_name="OTTHUMP00000031512"/codon_start=1/protein_id="CAC12748.1"/db_xref="GI:12313998"/db_xref="Genew:15866"/db_xref="GOA:Q9NV56"/db_xref="UniProt/Swiss-Prot:Q9NV56" 2 3 5 0 0 1 0 1 4 2 0 2 4 4 0 2 7 3 1 4 1 0 4 4 3 1 4 5 3 2 2 7 7 1 0 7 8 1 8 9 6 2 0 4 5 2 6 17 11 1 1 0 2 1 5 1 1 3 4 8 2 0 1 0 >D13641\D13641\102..539\438\BAA02804.1\Homo sapiens\Human mRNA for KIAA0016 gene, complete cds./gene="KIAA0016"/codon_start=1/product="mitochondrial outer membrane protein 19"/protein_id="BAA02804.1"/db_xref="GI:285987" 3 1 1 0 4 1 1 1 3 9 4 1 0 1 0 0 2 3 2 0 0 2 5 1 0 1 1 5 0 7 1 1 3 4 3 1 4 1 3 8 3 1 2 11 0 1 8 3 4 3 2 1 2 1 6 0 1 2 4 2 0 0 0 1 >AK127827\AK127827\728..1117\390\BAC87150.1\Homo sapiens\Homo sapiens cDNA FLJ45930 fis, clone PLACE7000707./codon_start=1/protein_id="BAC87150.1"/db_xref="GI:34534909" 2 1 2 1 0 4 0 4 5 1 0 2 3 1 1 3 5 1 1 1 0 1 4 1 1 6 1 1 2 2 6 1 5 1 0 1 5 3 0 1 0 1 1 3 2 0 1 4 2 5 1 2 3 1 3 8 2 0 2 2 6 0 1 0 >AF213668\AF213668\54..950\897\AAG40147.1\Homo sapiens\Homo sapiens bHLHZip transcription factor BIGMAX gamma mRNA,complete cds./codon_start=1/product="bHLHZip transcription factor BIGMAX gamma"/protein_id="AAG40147.1"/db_xref="GI:11761696" 1 3 6 2 4 4 2 3 6 3 1 2 2 13 0 5 9 3 1 6 3 2 1 9 1 2 4 12 4 4 1 8 6 2 2 8 10 1 5 17 6 2 4 17 8 1 3 18 13 6 6 3 2 3 6 3 0 8 6 3 2 0 0 1 >BC069110\BC069110\31..774\744\AAH69110.1\Homo sapiens\Homo sapiens chymase 1, mast cell, mRNA (cDNA clone 
MGC:95388IMAGE:7216927), complete cds./gene="CMA1"/codon_start=1/product="chymase 1, mast cell, preproprotein"/protein_id="AAH69110.1"/db_xref="GI:46575745"/db_xref="GeneID:1215"/db_xref="MIM:118938" 1 1 4 1 6 2 1 5 12 8 1 3 3 5 1 4 2 0 7 3 1 4 4 8 1 4 4 5 0 8 5 6 6 4 2 3 7 1 4 10 6 3 3 6 4 5 5 5 5 3 3 2 4 4 6 6 4 7 1 5 3 1 0 0 >AK091742\AK091742\132..2195\2064\BAC03738.1\Homo sapiens\Homo sapiens cDNA FLJ34423 fis, clone HHDPC2008123, highly similarto NUCLEOLIN./codon_start=1/protein_id="BAC03738.1"/db_xref="GI:21750187" 7 0 3 1 10 5 2 5 6 3 5 7 7 8 0 6 7 8 11 5 2 16 10 4 1 13 20 17 5 23 26 20 10 22 5 9 8 13 47 37 7 14 6 8 1 0 58 44 26 47 3 5 0 1 5 21 3 6 8 8 2 0 1 0 >BC035311\BC035311\178..1608\1431\AAH35311.1\Homo sapiens\Homo sapiens dermokine, mRNA (cDNA clone MGC:21664 IMAGE:4752921),complete cds./gene="ZD52F10"/codon_start=1/product="dermokine"/protein_id="AAH35311.1"/db_xref="GI:54035054"/db_xref="GeneID:93099" 3 2 1 1 5 2 0 6 13 3 0 1 8 11 2 9 25 12 1 5 2 9 6 9 2 9 11 15 3 9 25 46 24 15 0 9 6 2 7 10 20 13 4 19 5 5 10 10 5 6 5 3 3 1 6 8 1 2 7 3 11 0 1 0 >BC027583\BC027583\153..209\57\AAH27583.1\Homo sapiens\Homo sapiens microtubule-associated protein 2, mRNA (cDNA cloneIMAGE:4837202), complete cds./gene="MAP2"/codon_start=1/product="MAP2 protein"/protein_id="AAH27583.1"/db_xref="GI:71052145"/db_xref="GeneID:4133"/db_xref="MIM:157130" 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 4 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 2 0 0 2 0 0 0 0 0 0 0 0 0 1 1 1 0 0 >BT007282\BT007282\1..1938\1938\AAP35946.1\Homo sapiens\Homo sapiens eukaryotic translation initiation factor 4 gamma, 1mRNA, complete cds./codon_start=1/product="eukaryotic translation initiation factor 4 gamma, 1"/protein_id="AAP35946.1"/db_xref="GI:30583403" 13 9 13 8 3 4 5 17 34 7 3 8 7 8 1 7 18 10 7 7 6 4 4 18 1 11 11 16 7 19 8 15 14 9 5 5 21 3 13 29 10 4 3 30 4 6 24 42 22 11 6 6 3 2 11 6 2 13 11 12 9 0 1 0 >AF311312\AF311312 S58544\199..2979\2781\AAG23967.1\Homo sapiens\Homo sapiens 
infertility-related sperm protein mRNA, complete cds./codon_start=1/product="infertility-related sperm protein"/protein_id="AAG23967.1"/db_xref="GI:10863768" 3 3 8 4 13 13 14 12 18 22 11 14 15 3 1 10 14 13 19 8 3 13 16 4 9 14 25 27 17 19 18 24 9 6 8 3 10 8 60 38 21 21 10 26 1 15 62 33 12 40 8 21 7 13 9 15 10 8 24 14 7 1 0 0 >AF151808\AF151808\70..1386\1317\AAD34045.1\Homo sapiens\Homo sapiens CGI-50 protein mRNA, complete cds./codon_start=1/product="CGI-50 protein"/protein_id="AAD34045.1"/db_xref="GI:4929569" 1 3 5 0 6 8 3 9 22 4 2 6 2 10 3 2 13 3 5 12 8 6 6 11 4 4 8 17 4 8 6 11 5 3 5 3 19 2 10 11 13 2 2 9 9 4 4 21 15 4 12 0 13 4 11 8 1 12 10 12 2 0 0 1 >BC067765\BC067765\343..963\621\AAH67765.1\Homo sapiens\Homo sapiens SPFH domain family, member 2, mRNA (cDNA cloneMGC:87072 IMAGE:5296776), complete cds./gene="SPFH2"/codon_start=1/product="SPFH2 protein"/protein_id="AAH67765.1"/db_xref="GI:45709604"/db_xref="GeneID:11160" 0 0 0 0 3 2 1 7 7 3 1 1 4 2 1 4 2 4 3 2 1 4 0 1 1 6 3 3 1 6 3 3 2 7 2 4 14 1 1 10 6 2 4 7 5 3 4 5 3 3 3 6 3 2 7 3 3 4 3 7 3 0 1 0 >HSU19D8\Z70689\complement(join(4824..5056,6232..6395,7157..7338, 7437..7657,7904..8040,8807..8997,9761..9856))\1224\CAI41980.1\Homo sapiens\Human DNA sequence from clone LL0XNC01-19D8 on chromosome XContains the 5' end of the NXF2 gene for nuclear RNA export factor2, complete sequence./gene="NXF2"/locus_tag="GHc-618H1.1-001"/standard_name="OTTHUMP00000023709"/codon_start=1/product="nuclear RNA export factor 2"/protein_id="CAI41980.1"/db_xref="GI:57210044"/db_xref="UniProt/TrEMBL:Q5H9J9" 2 1 1 3 6 2 14 17 18 9 3 6 5 4 0 6 7 7 7 7 0 9 6 4 2 9 8 6 2 6 1 10 2 2 2 4 8 5 12 7 11 6 6 19 3 9 23 12 7 11 6 4 4 5 6 9 3 11 5 12 5 0 0 1 >AF222043\AF222043\173..1681\1509\AAF37827.2\Homo sapiens\Homo sapiens ubiquitin-associated protein (NAG20) mRNA, completecds./gene="NAG20"/codon_start=1/product="ubiquitin-associated protein"/protein_id="AAF37827.2"/db_xref="GI:7948993" 0 1 5 0 3 1 4 11 12 9 7 14 7 13 1 11 15 8 10 6 4 8 10 11 0 14 
8 10 3 10 7 11 4 3 3 7 9 5 18 20 8 16 2 18 6 5 21 24 14 11 3 3 3 8 12 9 3 4 11 17 1 0 0 1 >HUMLFERR\M83202\8..2143\2136\AAA59511.1\Homo sapiens\Human lactoferrin (HLF1) mRNA, complete cds./gene="HLF1"/codon_start=1/product="lactoferrin"/protein_id="AAA59511.1"/db_xref="GI:187122" 3 5 7 4 13 13 0 13 32 10 2 9 5 13 2 10 12 8 8 9 2 12 6 10 6 13 12 27 8 18 12 23 12 8 2 11 32 5 23 23 17 15 8 21 4 5 19 23 22 16 12 9 17 16 21 11 4 7 5 6 10 1 0 0 >BC005279\BC005279\23..1282\1260\AAH05279.1\Homo sapiens\Homo sapiens carboxypeptidase A1 (pancreatic), mRNA (cDNA cloneMGC:12328 IMAGE:3949850), complete cds./gene="CPA1"/codon_start=1/product="pancreatic carboxypeptidase A1, precursor"/protein_id="AAH05279.1"/db_xref="GI:13528975"/db_xref="GeneID:1357"/db_xref="MIM:114850" 2 5 7 2 0 1 0 7 22 3 0 5 0 15 2 2 10 5 4 14 5 6 3 9 1 4 4 17 4 5 2 16 11 0 3 10 12 0 1 19 9 3 2 16 10 3 3 22 20 5 11 8 1 1 14 9 0 23 7 6 8 0 0 1 >HUMPTS4\U63383\join(U63380.1:17..99,U63381.1:11..90,U63382.1:11..33, U63382.1:429..485,11..81,280..403)\-25149\AAC16970.1\Homo sapiens\Homo sapiens 6-pyruvoyl-tetrahydropterin synthase gene, exon 5,exon 6, and complete cds./codon_start=1/product="6-pyruvoyl-tetrahydropterin synthase"/protein_id="AAC16970.1"/db_xref="GI:3142325" 2 0 0 0 1 2 0 1 5 4 1 5 0 0 0 0 2 1 0 0 1 3 2 5 0 2 4 1 1 1 2 1 1 1 4 1 8 4 5 3 1 5 0 4 0 3 4 2 2 9 3 6 2 1 1 3 2 4 4 6 6 1 1 2 >AK093122\AK093122\203..1336\1134\BAC04063.1\Homo sapiens\Homo sapiens cDNA FLJ35803 fis, clone TESTI2005979./codon_start=1/protein_id="BAC04063.1"/db_xref="GI:21751892" 4 10 13 3 2 10 1 12 24 3 1 3 5 8 6 3 7 3 4 10 2 1 7 18 10 6 2 15 10 6 5 12 13 3 1 6 7 0 2 14 5 0 4 15 7 1 3 17 13 2 3 0 5 4 11 1 0 1 3 2 8 0 1 0 >AF168954\AF168954\133..1071\939\AAM88911.1\Homo sapiens\Homo sapiens LFIRE1 (LFIRE1) mRNA, complete cds./gene="LFIRE1"/codon_start=1/product="LFIRE1"/protein_id="AAM88911.1"/db_xref="GI:22023090" 1 1 2 1 5 4 0 4 5 8 1 3 1 2 1 7 5 4 2 5 2 4 3 1 0 2 7 2 2 4 11 6 6 2 3 6 4 5 14 7 6 16 5 13 3 4 13 8 8 13 7 10 1 
4 8 9 1 4 8 6 12 1 0 0 >AF022813\AF022813\105..821\717\AAC51864.1\Homo sapiens\Homo sapiens tetraspan (NAG-2) mRNA, complete cds./gene="NAG-2"/codon_start=1/product="tetraspan"/protein_id="AAC51864.1"/db_xref="GI:2586350" 0 2 1 0 0 1 0 8 25 1 0 2 0 5 1 1 3 1 1 8 6 2 0 1 2 1 1 18 4 2 1 14 3 2 1 5 10 0 1 8 8 0 2 7 2 0 0 7 7 0 8 1 10 3 15 2 0 11 1 5 7 0 1 0 >BC045759\BC045759\752..952\201\AAH45759.1\Homo sapiens\Homo sapiens microtubule-associated protein 1 light chain 3 beta,mRNA (cDNA clone MGC:48651 IMAGE:4828857), complete cds./gene="MAP1LC3B"/codon_start=1/product="MAP1LC3B protein"/protein_id="AAH45759.1"/db_xref="GI:71297332"/db_xref="GeneID:81631" 0 1 0 0 1 1 0 2 2 0 1 2 2 2 0 0 2 2 1 0 1 0 1 0 0 0 0 2 0 1 2 0 1 0 0 3 3 0 2 1 1 2 0 3 1 0 1 5 0 2 1 2 0 0 4 0 1 2 1 4 0 1 0 0 >AF151980\AF151980\1..1149\1149\AAD37802.2\Homo sapiens\Homo sapiens connexin 43 (CX43) gene, complete cds./gene="CX43"/codon_start=1/product="connexin 43"/protein_id="AAD37802.2"/db_xref="GI:6563408" 6 3 2 2 4 0 2 7 14 2 2 7 5 4 1 12 9 6 2 5 1 5 4 5 0 10 3 11 4 8 6 4 7 6 2 3 13 8 10 18 7 9 7 11 1 7 8 10 10 8 10 6 5 4 17 4 2 15 5 7 6 0 1 0 >AK023746\AK023746\162..947\786\BAB14665.1\Homo sapiens\Homo sapiens cDNA FLJ13684 fis, clone PLACE2000021, moderatelysimilar to Homo sapiens TRF1-interacting ankyrin-related ADP-ribosepolymerase mRNA./codon_start=1/protein_id="BAB14665.1"/db_xref="GI:10435772" 2 0 2 0 8 4 4 2 4 3 3 3 1 2 0 9 1 4 2 2 0 5 3 1 1 4 5 1 1 5 13 3 3 9 4 2 2 6 9 6 7 6 4 5 9 7 11 10 1 7 5 8 2 3 3 8 3 5 10 7 1 1 0 0 >AF303084\AF303084\43..534\492\AAQ14494.1\Homo sapiens\Homo sapiens epididymal-specific lipocalin LCN6 mRNA, complete cds./codon_start=1/product="epididymal-specific lipocalin LCN6"/protein_id="AAQ14494.1"/db_xref="GI:33340037" 1 0 2 0 2 2 0 3 16 3 0 2 2 3 1 1 3 2 1 4 2 2 1 4 0 1 0 8 1 3 3 5 6 0 0 3 12 0 0 5 6 1 0 7 1 0 2 9 6 0 2 1 0 1 6 3 2 2 0 5 5 0 1 0 >AL158839#4\AL158839\join(28627..28883,40119..40206)\345\CAH70972.1\Homo sapiens\Human DNA sequence from clone 
RP11-180O5 on chromosome 1 Containsthe 5' end of gene DKFZP566D1346, the HHLA3 gene for HERV-HLTR-associating 3 and a novel gene, complete sequence./gene="HHLA3"/locus_tag="RP11-180O5.1-002"/standard_name="OTTHUMP00000010943"/codon_start=1/product="HERV-H LTR-associating 3"/protein_id="CAH70972.1"/db_xref="GI:55661699"/db_xref="UniProt/TrEMBL:Q5VZP2" 0 0 2 1 4 3 2 1 0 2 1 0 0 1 0 4 1 2 1 3 2 2 5 2 0 2 4 2 3 0 3 0 2 1 1 2 0 0 6 3 0 2 4 2 2 2 2 8 0 2 1 1 3 5 1 2 1 2 0 5 1 0 1 0 >AL135927#24\AL135927\complement(join(131817..132091,132449..132599, 132830..132926,133223..133349,133470..133675, 133811..133917))\963\CAI15543.1\Homo sapiens\Human DNA sequence from clone RP11-54H19 on chromosome 1 Containsthe 3' end of the LMNA gene for lamin A/C, the gene for a novelprotein similar to semaphorins (FLJ12287), a novel gene (KIAA0446),the PMF1 gene for polyamine-modulated factor 1, the BGLAP gene forbone gamma-carboxyglutamate (gla) protein (osteocalcin), the genefor progestin and adipoQ receptor family member VI (PAQR6) and the3' end of the gene for Est1p-like protein B (EST1B), completesequence./gene="PAQR6"/locus_tag="RP11-54H19.6-002"/standard_name="OTTHUMP00000018882"/codon_start=1/product="progestin and adipoQ receptor family member VI"/protein_id="CAI15543.1"/db_xref="GI:55957519"/db_xref="GOA:Q5TCK8"/db_xref="UniProt/TrEMBL:Q5TCK8" 0 7 3 2 1 2 1 17 31 1 1 2 1 8 3 1 8 3 6 9 1 2 4 8 3 5 6 18 7 6 3 18 5 2 0 5 4 0 1 1 5 0 1 9 13 2 3 6 3 2 11 3 10 3 21 3 0 4 2 6 7 0 0 1 >HS1039K5#4\AL031587\join(43475..43590,83147..83425,83552..83588)\432\CAI18967.1\Homo sapiens\Human DNA sequence from clone RP5-1039K5 on chromosome22q12.3-13.2, complete sequence./locus_tag="RP5-1039K5.7-007"/standard_name="OTTHUMP00000028518"/codon_start=1/protein_id="CAI18967.1"/db_xref="GI:56202770"/db_xref="UniProt/TrEMBL:Q5JYU7" 0 1 1 0 4 1 1 5 4 3 0 4 3 5 1 8 1 2 0 2 2 1 2 7 3 5 1 5 1 4 2 4 4 1 1 2 4 1 0 3 2 4 1 1 1 2 3 4 8 1 0 0 6 2 1 0 0 2 0 3 3 0 0 1 >CR456751\CR456751\1..261\261\CAG33032.1\Homo 
sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834C1216D forgene SNRPF, small nuclear ribonucleoprotein polypeptide F; completecds, incl. stopcodon./gene="SNRPF"/codon_start=1/protein_id="CAG33032.1"/db_xref="GI:48145619" 0 0 0 0 2 1 1 2 2 3 2 1 0 0 0 2 0 1 2 0 0 0 1 2 0 1 1 0 0 1 5 2 1 2 2 1 3 1 2 3 1 5 0 1 0 1 8 2 0 4 3 2 0 1 1 0 2 1 0 6 1 1 0 0 >BC003557\BC003557\46..999\954\AAH03557.1\Homo sapiens\Homo sapiens apolipoprotein E, mRNA (cDNA clone MGC:1571IMAGE:3355712), complete cds./gene="APOE"/codon_start=1/product="apolipoprotein E precursor"/protein_id="AAH03557.1"/db_xref="GI:13097699"/db_xref="GeneID:348"/db_xref="MIM:107741" 0 22 10 1 0 1 1 7 31 0 0 2 0 4 1 1 8 0 3 7 1 1 0 4 3 1 4 24 9 2 1 13 3 1 0 2 21 1 1 12 0 1 2 30 2 0 6 34 8 3 4 0 2 0 3 1 1 1 0 8 8 0 0 1 >BC074940\BC074940\41..796\756\AAH74940.1\Homo sapiens\Homo sapiens tumor necrosis factor (ligand) superfamily, member 15,mRNA (cDNA clone MGC:104048 IMAGE:30915494), complete cds./gene="TNFSF15"/codon_start=1/product="tumor necrosis factor (ligand) superfamily, member 15"/protein_id="AAH74940.1"/db_xref="GI:49902479"/db_xref="GeneID:9966"/db_xref="MIM:604052" 2 1 1 1 3 3 4 7 10 3 1 3 1 3 1 3 8 3 7 9 0 2 7 4 0 3 7 8 0 3 8 3 4 1 3 4 7 2 5 7 6 1 4 11 4 2 8 7 6 4 6 2 5 1 11 4 0 6 1 7 3 0 1 0 >AK074216\AK074216\79..708\630\BAB85019.1\Homo sapiens\Homo sapiens cDNA FLJ23636 fis, clone CAS07176./codon_start=1/protein_id="BAB85019.1"/db_xref="GI:18676757" 0 1 0 0 6 3 8 2 5 4 1 2 2 2 1 5 3 1 2 0 1 2 4 5 1 5 7 2 0 3 6 1 2 1 5 2 5 8 11 12 4 3 4 1 3 0 10 7 3 3 3 3 3 1 5 11 3 4 3 1 3 0 0 1 >HSM800550\AL050393\1729..2511\783\CAB43687.2\Homo sapiens\Homo sapiens mRNA; cDNA DKFZp586F0420 (from clone DKFZp586F0420);partial cds./gene="DKFZp586F0420"/codon_start=1/product="hypothetical protein"/protein_id="CAB43687.2"/db_xref="GI:6562171"/db_xref="UniProt/TrEMBL:Q9UG54" 2 2 0 3 5 3 4 0 3 3 5 6 6 6 0 7 5 10 8 6 1 8 7 2 1 4 7 3 0 4 6 3 1 1 2 2 3 3 11 7 5 6 11 14 1 3 15 4 6 8 2 2 2 1 0 1 1 4 4 9 1 0 0 1 
>BC039174\BC039174\82..3282\3201\AAH39174.1\Homo sapiens\Homo sapiens vinculin, mRNA (cDNA clone MGC:21734 IMAGE:4520338),complete cds./gene="VCL"/codon_start=1/product="VCL protein"/protein_id="AAH39174.1"/db_xref="GI:24657579"/db_xref="GeneID:7414"/db_xref="MIM:193065" 11 7 16 10 15 7 7 20 33 15 8 12 16 12 2 12 10 6 17 20 6 18 17 7 7 23 26 48 6 53 19 10 15 11 7 9 40 14 37 39 23 15 12 50 6 5 39 50 24 40 3 5 5 5 7 8 7 20 29 36 9 0 1 0 >BC010464\BC010464\96..368\273\AAH10464.1\Homo sapiens\Homo sapiens mitogen-activated protein kinase kinase kinase 3, mRNA(cDNA clone IMAGE:4179950), complete cds./gene="MAP3K3"/codon_start=1/product="MAP3K3 protein"/protein_id="AAH10464.1"/db_xref="GI:14714649"/db_xref="GeneID:4215" 1 0 0 0 0 1 1 2 6 1 1 2 1 1 0 2 1 2 0 2 0 4 2 0 1 5 2 3 0 3 3 5 2 0 0 5 2 4 1 2 0 3 1 0 0 0 2 4 0 0 1 1 1 2 0 1 0 0 1 4 1 0 0 1 >AY258289\AY258289\join(5573..5728,6880..7017,8678..8734,11369..11530, 13337..13438,13967..13996)\645\AAP14648.1\Homo sapiens\Homo sapiens membrane-spanning 4-domains subfamily A member 3(MS4A3) gene, complete cds./gene="MS4A3"/codon_start=1/product="membrane-spanning 4-domains subfamily A member 3"/protein_id="AAP14648.1"/db_xref="GI:30144639" 0 0 0 0 2 1 4 2 8 1 3 4 11 4 0 7 0 5 4 5 0 3 4 3 2 1 8 6 1 4 4 4 4 4 2 2 4 4 3 1 3 10 3 5 4 1 4 5 1 3 6 1 4 3 6 5 6 4 6 6 3 1 0 0 >AF126021\AF126021\186..1085\900\AAF17231.1\Homo sapiens\Homo sapiens B-cell receptor-associated protein BAP37 mRNA,complete cds./codon_start=1/evidence=not_experimental/product="B-cell receptor-associated protein BAP37"/protein_id="AAF17231.1"/db_xref="GI:6563274" 6 4 8 1 2 3 3 4 14 5 0 7 2 7 0 2 6 3 6 3 2 1 0 4 1 5 3 23 1 5 4 7 4 4 3 1 15 0 7 12 5 6 2 20 2 0 8 12 8 3 6 2 0 0 9 1 0 16 5 5 1 0 0 1 >AK126160\AK126160\2311..2922\612\BAC86466.1\Homo sapiens\Homo sapiens cDNA FLJ44172 fis, clone THYMU2036085./codon_start=1/protein_id="BAC86466.1"/db_xref="GI:34532558" 0 0 0 1 1 2 4 18 10 6 1 2 3 7 0 21 1 2 1 0 0 4 0 3 0 6 0 3 0 7 0 1 0 3 2 10 3 4 2 0 0 4 3 2 3 1 2 1 
0 1 1 6 3 9 10 14 0 2 8 2 3 0 0 1 >BC036724\BC036724\572..1588\1017\AAH36724.1\Homo sapiens\Homo sapiens FBJ murine osteosarcoma viral oncogene homolog B, mRNA(cDNA clone MGC:39968 IMAGE:5212854), complete cds./gene="FOSB"/codon_start=1/product="FBJ murine osteosarcoma viral oncogene homolog B"/protein_id="AAH36724.1"/db_xref="GI:54673701"/db_xref="GeneID:2354"/db_xref="MIM:164772" 5 3 5 1 2 4 1 11 13 1 0 2 2 14 5 6 8 8 4 12 3 2 7 20 11 6 5 13 6 5 4 13 7 3 0 6 7 3 4 5 4 1 6 12 2 0 9 19 10 4 6 1 5 0 7 4 0 4 0 5 2 0 0 1 >AY242134\AY242134\1..1161\1161\AAO92301.1\Homo sapiens\Homo sapiens prostaglandin I2 (prostacyclin) receptor (PTGIR) mRNA,complete cds./gene="PTGIR"/codon_start=1/product="prostaglandin I2 (prostacyclin) receptor"/protein_id="AAO92301.1"/db_xref="GI:29825395" 3 14 3 0 0 4 0 17 34 3 0 2 1 7 8 2 14 1 3 10 2 0 3 12 5 6 2 35 10 5 5 18 8 2 0 8 19 0 1 4 5 0 1 14 5 0 1 7 9 2 8 1 18 0 20 1 0 8 0 10 5 0 0 1 >AB049824\AB049824\203..934\732\BAC06820.1\Homo sapiens\Homo sapiens mRNA for JC7, complete cds./gene="JC7"/codon_start=1/product="JC7"/protein_id="BAC06820.1"/db_xref="GI:22090467" 3 2 2 0 5 2 2 1 0 0 0 0 7 2 3 11 3 6 4 1 1 2 5 8 2 5 6 1 1 4 2 5 4 1 0 1 6 3 21 17 1 4 3 6 1 1 25 11 4 25 1 2 0 0 2 1 0 0 1 6 0 1 0 0 >AB102799\AB102799\1..168\168\BAD89390.1\Homo sapiens\Homo sapiens IL6ST mRNA for IL6ST nirs variant 1, complete cds./gene="IL6ST"/codon_start=1/product="IL6ST nirs variant 1"/protein_id="BAD89390.1"/db_xref="GI:58736963" 0 0 0 0 0 0 2 1 3 0 0 7 1 0 0 2 0 1 4 1 1 2 1 1 0 1 2 1 0 0 0 0 2 0 0 0 2 0 0 2 1 0 3 1 1 1 1 0 0 0 0 0 0 1 1 1 0 0 5 1 1 0 1 0 >HSDJ261K5#3\AL050350\join(27067..27146,29147..29322,34171..34279,37663..37839, 49271..49838)\1110\CAI42510.1\Homo sapiens\Human DNA sequence from clone RP1-261K5 on chromosome 6q21-22.1Contains the 3' end of the SLC22A16 gene for solute carrier family22 (organic cation transporter) member 16, the DDO gene forD-aspartate oxidase, the 5' part of a novel gene and two CpGislands, complete 
sequence./gene="DDO"/locus_tag="RP1-261K5.2-001"/standard_name="OTTHUMP00000017001"/codon_start=1/protein_id="CAI42510.1"/db_xref="GI:57208416"/db_xref="GOA:Q5JXM5"/db_xref="InterPro:IPR006076"/db_xref="InterPro:IPR006181"/db_xref="UniProt/TrEMBL:Q5JXM5" 7 0 2 0 6 12 1 9 13 6 0 3 6 8 0 1 4 4 8 7 2 8 5 8 4 5 8 9 1 9 10 10 6 8 4 5 14 5 5 8 3 4 4 10 8 6 9 10 9 8 2 5 6 4 6 12 3 6 9 4 10 0 1 0 >BT006799\BT006799\1..273\273\AAP35445.1\Homo sapiens\Homo sapiens ALL1-fused gene from chromosome 1q mRNA, complete cds./codon_start=1/product="ALL1-fused gene from chromosome 1q"/protein_id="AAP35445.1"/db_xref="GI:30582437" 0 0 0 0 1 2 0 2 6 2 0 1 1 2 1 0 5 2 1 2 0 1 1 2 0 2 3 2 0 1 0 3 1 2 0 1 1 1 3 1 2 0 1 2 1 0 4 3 4 3 3 0 0 0 4 1 0 3 1 3 2 0 1 0 >HSUBA80R\X63237\26..496\471\CAA44911.1\Homo sapiens\H.sapiens Uba80 mRNA for ubiquitin./gene="UbA80"/codon_start=1/product="ubiquitin"/protein_id="CAA44911.1"/db_xref="GI:37571" 1 1 0 3 4 1 0 1 5 5 0 2 0 0 1 4 0 2 0 4 2 4 1 2 0 3 1 1 0 4 2 3 2 3 1 1 4 2 6 19 1 4 1 5 2 2 6 4 3 6 3 3 1 5 2 3 1 3 4 2 0 1 0 0 >AY064485\AY064485\1..1536\1536\AAL57720.1\Homo sapiens\Homo sapiens cytochrome P450 (CYP4B1) mRNA, CYP4B1*3 allele,complete cds./gene="CYP4B1"/allele="CYP4B1*3"/codon_start=1/product="cytochrome P450"/protein_id="AAL57720.1"/db_xref="GI:18086502" 0 11 9 3 3 6 2 18 32 5 1 6 2 10 0 8 10 3 6 9 2 4 5 12 1 11 5 15 1 8 5 13 11 4 1 8 17 1 7 20 5 3 0 21 11 11 4 18 23 11 9 8 7 1 21 15 0 16 4 16 12 0 1 0 >AF099935\AF099935\85..681\597\AAC72975.1\Homo sapiens\Homo sapiens MDC-3.13 isoform 2 mRNA, complete cds./codon_start=1/product="MDC-3.13 isoform 2"/protein_id="AAC72975.1"/db_xref="GI:3860093" 0 1 2 0 2 3 2 3 5 2 4 5 2 6 0 0 0 3 3 6 0 1 0 1 0 1 4 6 0 2 1 0 1 2 0 3 8 3 8 12 5 10 5 5 4 4 7 9 3 8 2 3 1 2 1 9 2 8 2 6 0 0 0 1 >BC013724\BC013724\189..740\552\AAH13724.1\Homo sapiens\Homo sapiens ferritin, heavy polypeptide 1, mRNA (cDNA cloneMGC:17255 IMAGE:3857790), complete cds./gene="FTH1"/codon_start=1/product="ferritin, heavy polypeptide 
1"/protein_id="AAH13724.1"/db_xref="GI:15489239"/db_xref="GeneID:2495"/db_xref="MIM:134770" 2 4 0 0 0 1 1 2 10 2 1 6 2 3 1 3 2 1 1 4 1 1 1 2 0 0 2 6 3 2 2 2 1 2 0 0 5 1 8 5 6 6 2 8 5 5 7 9 10 5 8 1 0 3 2 4 0 5 1 5 1 1 0 0 >HUAC002550#3\AC002550\complement(join(43656..43754,44484..44630,52661..52779, 61742..61907,74833..74922))\621\AAC05806.1\Homo sapiens\Human Chromosome 16 BAC clone CIT987SK-A-101F10, complete sequence./gene="101F10.2"/codon_start=1/product="Unknown gene product"/protein_id="AAC05806.1"/db_xref="GI:2911265" 1 3 1 0 1 3 3 9 13 5 2 2 0 5 3 1 6 0 1 6 4 2 2 0 1 2 3 5 0 1 1 2 4 0 0 3 4 2 5 8 10 2 3 6 3 6 5 8 10 3 6 0 2 2 5 3 2 6 2 6 2 0 0 1 >BC017441\BC017441\266..622\357\AAH17441.1\Homo sapiens\Homo sapiens solute carrier family 30 (zinc transporter), member 5,mRNA (cDNA clone IMAGE:3630637), complete cds./gene="SLC30A5"/codon_start=1/product="zinc transporter ZTL1"/protein_id="AAH17441.1"/db_xref="GI:16924303"/db_xref="GeneID:64924"/db_xref="MIM:607819" 1 0 0 0 1 1 4 1 1 3 4 2 2 0 0 2 1 0 1 1 0 3 1 2 1 2 1 1 0 3 3 7 4 1 1 0 5 3 12 3 1 1 1 2 2 0 2 2 3 2 1 2 0 1 3 7 2 0 6 2 0 0 0 1 >S72008\S72008\49..1305\1257\AAB31337.1\Homo sapiens\hCDC10=CDC10 homolog [human, fetal lung, mRNA, 2314 nt]./gene="hCDC10"/codon_start=1/protein_id="AAB31337.1"/db_xref="GI:560623" 1 1 1 6 11 4 2 5 4 7 6 9 4 1 2 5 2 3 8 2 1 5 8 0 0 7 7 2 0 8 9 2 2 8 6 3 11 10 28 19 10 16 11 16 3 7 30 15 6 13 6 6 2 3 4 10 5 9 9 14 3 1 0 0 >AK126695\AK126695\64..1260\1197\BAC86646.1\Homo sapiens\Homo sapiens cDNA FLJ44741 fis, clone BRACE3027931./codon_start=1/protein_id="BAC86646.1"/db_xref="GI:34533273" 0 12 0 9 2 1 3 27 19 5 0 3 14 6 11 7 11 0 4 1 0 32 3 5 0 3 23 6 13 11 0 3 1 0 14 2 1 0 1 1 19 2 20 9 10 4 2 2 2 1 1 15 32 4 4 1 2 0 1 13 0 0 0 1 >AY358759\AY358759\286..1902\1617\AAQ89119.1\Homo sapiens\Homo sapiens clone DNA97003 TREC2422 (UNQ2422) mRNA, complete cds./locus_tag="UNQ2422"/codon_start=1/product="TREC2422"/protein_id="AAQ89119.1"/db_xref="GI:37182637" 5 8 6 1 1 2 7 21 40 8 1 
11 4 10 3 7 10 6 6 13 0 11 7 9 4 6 10 23 4 6 9 9 9 11 10 13 28 2 1 11 6 3 1 14 8 0 5 10 11 2 12 8 7 2 21 11 3 27 8 16 10 0 1 0 >BC014946\BC014946\187..1113\927\AAH14946.1\Homo sapiens\Homo sapiens translocase of outer mitochondrial membrane 40homolog-like (yeast), mRNA (cDNA clone MGC:22951 IMAGE:4872309),complete cds./gene="TOMM40L"/codon_start=1/product="TOMM40L protein"/protein_id="AAH14946.1"/db_xref="GI:15928956"/db_xref="GeneID:84134" 2 3 6 1 2 2 5 8 17 1 0 8 1 1 2 0 8 5 8 2 1 8 3 8 1 4 7 10 2 7 9 9 9 2 6 4 11 6 1 7 8 4 1 12 10 5 3 12 4 7 5 4 1 2 11 4 1 3 1 8 5 0 0 1 >AF015185\AF015185\354..1547\1194\AAB92496.1\Homo sapiens\Homo sapiens clone ET10 ET putative translation product (ET) mRNA,alternatively spliced complete cds./gene="ET"/codon_start=1/product="ET putative translation product"/protein_id="AAB92496.1"/db_xref="GI:2708503" 2 1 1 1 2 2 4 8 10 10 6 4 2 3 0 15 12 8 10 4 1 7 3 0 4 4 5 16 1 12 16 8 2 6 4 5 11 9 8 7 6 9 4 5 3 0 11 5 7 6 7 6 3 3 17 24 3 10 21 10 3 0 0 1 >BC017034\BC017034\87..782\696\AAH17034.1\Homo sapiens\Homo sapiens lysophospholipase II, mRNA (cDNA clone MGC:9106IMAGE:3845764), complete cds./gene="LYPLA2"/codon_start=1/product="lysophospholipase II"/protein_id="AAH17034.1"/db_xref="GI:16877568"/db_xref="GeneID:11313" 1 0 5 0 0 2 0 8 13 2 1 2 1 4 0 3 3 2 3 6 3 0 2 7 1 10 4 13 3 8 4 5 6 2 1 6 7 2 0 11 4 2 0 5 5 4 3 9 6 2 3 0 3 3 2 5 1 8 2 10 3 1 0 0 >HSA245821\AJ245821\85..1827\1743\CAB57951.1\Homo sapiens\Homo sapiens mRNA for putative secreted molecule (psk-2 gene)./gene="psk-2"/function="putative role in seizures"/codon_start=1/product="putative secreted molecule"/protein_id="CAB57951.1"/db_xref="GI:6018462"/db_xref="UniProt/TrEMBL:Q9UJ46" 3 10 10 1 0 7 2 14 45 7 1 5 4 9 5 4 14 3 5 17 10 6 17 28 9 13 6 21 0 4 12 21 21 7 2 12 16 3 2 2 9 9 3 18 11 4 13 34 17 7 7 6 9 7 10 5 2 14 3 8 6 0 1 0 >HSMSHR\X65634\462..1415\954\CAA46588.1\Homo sapiens\H.sapiens MSH-R gene for melanocyte stimulating hormone 
receptor./gene="MSH-R"/codon_start=1/evidence=experimental/product="melanocyte stimulating hormone receptor"/protein_id="CAA46588.1"/db_xref="GI:34791"/db_xref="GOA:Q01726"/db_xref="UniProt/Swiss-Prot:Q01726" 1 4 6 0 2 2 0 20 31 2 0 2 1 8 1 1 8 1 4 6 5 0 0 7 3 0 2 24 4 5 2 9 4 1 0 10 19 1 1 4 8 2 0 11 9 1 0 7 7 0 8 0 14 0 15 2 0 21 2 6 3 0 0 1 >AK000922\AK000922\273..1619\1347\BAA91427.1\Homo sapiens\Homo sapiens cDNA FLJ10060 fis, clone HEMBA1001407./codon_start=1/protein_id="BAA91427.1"/db_xref="GI:7021891" 0 2 4 2 1 2 4 9 52 8 1 8 6 6 1 8 9 4 2 12 1 5 11 15 3 12 15 28 2 7 3 19 10 13 5 4 37 2 1 4 3 3 4 9 7 3 4 12 5 0 4 2 4 8 16 5 0 6 0 8 7 0 0 1 >BC047021\BC047021\130..1293\1164\AAH47021.1\Homo sapiens\Homo sapiens mRNA similar to immunoglobulin superfamily, member 4(cDNA clone MGC:51880 IMAGE:5248100), complete cds./codon_start=1/product="Similar to immunoglobulin superfamily, member 4"/protein_id="AAH47021.1"/db_xref="GI:28422552" 1 2 4 1 4 6 6 6 18 2 2 5 4 4 3 6 4 6 8 9 2 7 5 7 1 7 4 8 14 3 4 5 9 5 6 9 20 1 9 8 9 7 4 15 5 2 13 10 9 11 5 8 5 3 6 6 2 13 6 10 3 0 0 1 >AF165519\AF165519\25..579\555\AAF86649.1\Homo sapiens\Homo sapiens mitogen-activated protein kinase phosphatase x (MKPX)mRNA, complete cds./gene="MKPX"/codon_start=1/product="mitogen-activated protein kinase phosphatase x"/protein_id="AAF86649.1"/db_xref="GI:9294745" 0 1 2 1 6 2 0 3 12 1 0 3 1 2 0 2 4 2 3 2 0 1 3 2 0 2 3 8 2 2 3 4 4 1 1 4 5 1 5 6 7 1 2 6 4 4 6 8 2 5 3 2 4 1 5 3 0 5 5 4 3 1 0 0 >BC021708\BC021708\88..1104\1017\AAH21708.1\Homo sapiens\Homo sapiens HLA-G histocompatibility antigen, class I, G, mRNA(cDNA clone MGC:26146 IMAGE:4822521), complete cds./gene="HLA-G"/codon_start=1/product="HLA-G protein"/protein_id="AAH21708.1"/db_xref="GI:18203847"/db_xref="GeneID:3135"/db_xref="MIM:142871" 1 8 6 0 5 5 1 9 19 1 0 0 1 7 2 3 5 1 2 15 3 3 1 8 4 4 6 12 9 8 4 7 9 2 1 5 15 2 0 11 5 1 1 15 6 3 3 24 13 5 9 5 4 2 7 1 1 6 1 10 11 0 0 1 >BC093736\BC093736\36..1139\1104\AAH93736.1\Homo sapiens\Homo 
sapiens heparan sulfate (glucosamine) 3-O-sulfotransferase 2,mRNA (cDNA clone MGC:120771 IMAGE:7939581), complete cds./gene="HS3ST2"/codon_start=1/product="heparan sulfate D-glucosaminyl 3-O-sulfotransferase 2"/protein_id="AAH93736.1"/db_xref="GI:62739479"/db_xref="GeneID:9956"/db_xref="MIM:604056" 7 12 6 1 3 7 1 15 18 1 0 4 3 9 2 1 11 1 2 9 7 4 1 14 7 8 0 12 5 3 1 17 7 3 2 4 12 2 7 12 7 1 2 11 6 1 6 9 12 4 8 5 7 2 14 6 1 11 4 5 4 1 0 0 >AY055383\AY055383\join(242..267,363..565,5232..5365,7066..7133,7691..7875, 8189..8230,13999..14203,17362..17498,18942..19054, 19377..19486,28501..28646,29073..29171,30136..30249, 32749..32865,33129..33297,34437..34708,36819..36955, 37543..37701)\2436\AAL16413.1\Homo sapiens\Homo sapiens type II transmembrane serine protease 6 (TMPRSS6)gene, complete cds./gene="TMPRSS6"/codon_start=1/product="type II transmembrane serine protease 6"/protein_id="AAL16413.1"/db_xref="GI:23428409" 2 22 11 1 3 10 3 29 49 1 1 8 3 20 9 3 23 5 6 16 9 1 4 28 8 6 5 32 8 8 7 33 19 10 1 16 49 4 7 21 14 3 3 38 17 1 9 35 28 12 25 7 20 18 23 3 2 19 3 11 19 0 0 1 >AK094247\AK094247\20..1024\1005\BAC04317.1\Homo sapiens\Homo sapiens cDNA FLJ36928 fis, clone BRACE2005216, weakly similarto Xenopus laevis bicaudal-C (Bic-C) mRNA./codon_start=1/protein_id="BAC04317.1"/db_xref="GI:21753271" 1 2 5 1 4 6 1 5 9 5 3 4 7 16 3 10 16 6 5 8 2 7 7 14 3 8 8 8 4 3 7 14 9 5 0 3 5 2 10 10 9 2 3 9 7 3 5 13 7 4 0 1 0 1 3 5 2 4 6 3 1 0 0 1 >AL136128#4\AL136128\join(127425..127443,AL121938.10:41776..41891, AL121938.10:50816..50964,AL121938.10:69980..70081, AL121938.10:75415..75453,AL121938.10:84532..84541)\-609666\CAI23002.1\Homo sapiens\Human DNA sequence from clone RP1-84N20 on chromosome 6 Contains anovel gene, the 5' end of the TPD52L1 gene encoding two variants oftumor protein D52-like 1 protein and two CpG islands, 
completesequence./gene="TPD52L1"/locus_tag="RP1-167O5.1-003"/standard_name="OTTHUMP00000017135"/codon_start=1/protein_id="CAI23002.1"/db_xref="GI:56204340"/db_xref="InterPro:IPR007327"/db_xref="UniProt/TrEMBL:Q9BUQ6" 0 0 0 0 2 2 1 5 5 5 7 0 3 1 0 3 1 3 3 2 0 1 4 4 0 1 2 1 1 2 3 1 0 0 3 2 2 2 2 3 1 6 3 4 3 4 3 2 0 3 2 2 1 0 4 11 3 0 3 5 0 5 1 2 >AF390031\AF390031\join(2492..2558,3337..3505,4348..4540)\429\AAM73649.1\Homo sapiens\Homo sapiens trans-activated by hepatitis C virus core protein 2(TAHCCP2) gene, complete cds./gene="TAHCCP2"/codon_start=1/product="trans-activated by hepatitis C virus core protein 2"/protein_id="AAM73649.1"/db_xref="GI:21666302" 0 1 6 1 2 4 1 5 4 0 0 1 0 6 0 2 3 1 1 3 2 0 1 3 2 3 1 6 4 7 2 5 4 1 1 1 5 1 1 9 5 1 1 5 1 1 4 5 2 1 1 1 0 1 3 0 0 3 2 4 1 0 0 1 >BC016395\BC016395\126..806\681\AAH16395.1\Homo sapiens\Homo sapiens hepatocellularcarcinoma-associated antigen HCA557a,transcript variant 1, mRNA (cDNA clone MGC:16818 IMAGE:3882414),complete cds./gene="DKFZP586D0919"/codon_start=1/product="hepatocellularcarcinoma-associated antigen HCA557a, isoform a"/protein_id="AAH16395.1"/db_xref="GI:16741082"/db_xref="GeneID:25895" 1 2 2 1 2 4 1 4 18 0 0 2 0 3 3 2 3 1 2 5 1 1 3 6 1 3 3 7 6 4 1 6 12 1 0 4 11 1 0 5 4 4 3 11 3 6 7 8 6 6 2 4 2 2 9 3 0 9 1 2 2 0 0 1 >BC046367\BC046367\238..1713\1476\AAH46367.1\Homo sapiens\Homo sapiens synaptotagmin IX, mRNA (cDNA clone MGC:50869IMAGE:5770087), complete cds./gene="SYT9"/codon_start=1/product="synaptotagmin IX"/protein_id="AAH46367.1"/db_xref="GI:28204903"/db_xref="GeneID:143425" 5 3 7 4 7 9 2 10 18 8 6 9 7 8 3 6 5 3 5 11 3 4 4 13 3 9 6 10 2 6 3 9 6 4 3 10 15 4 14 14 15 10 6 15 13 4 7 18 26 14 5 9 5 8 9 8 5 12 10 8 6 0 0 1 >AY358239\AY358239\37..345\309\AAQ88606.1\Homo sapiens\Homo sapiens clone DNA147246 KLIA6249 (UNQ6249) mRNA, complete cds./locus_tag="UNQ6249"/codon_start=1/product="KLIA6249"/protein_id="AAQ88606.1"/db_xref="GI:37181594" 0 0 0 0 0 0 0 4 5 0 1 0 0 1 1 1 1 1 2 10 1 3 2 0 2 8 3 10 5 4 0 2 
2 0 1 1 0 1 3 2 0 1 1 4 0 0 3 7 1 1 0 0 0 1 2 1 0 2 0 1 0 0 0 1 >HSJ469A13#4\AL132768\complement(join(51898..52007,53873..53960,54792..54939, 57197..57271,57668..57727,61571..61658,65879..65951, 67079..67152,70535..70568))\750\CAI21468.1\Homo sapiens\Human DNA sequence from clone RP3-469A13 on chromosome 20 Containsthe 3' end of a novel gene, the C20orf172 gene for chromosome 20open reading frame 172, the 5' end of the NDRG3 gene for NDRGfamily member 3 (N-myc downstream-regulated gene 3, FLJ13556) andfour CpG islands, complete sequence./gene="C20orf172"/locus_tag="RP3-469A13.1-009"/standard_name="OTTHUMP00000030880"/codon_start=1/protein_id="CAI21468.1"/db_xref="GI:56204753"/db_xref="UniProt/TrEMBL:Q5JW55" 1 0 2 1 4 1 3 4 5 9 4 6 8 2 0 13 4 5 3 1 0 8 2 1 0 4 5 4 0 3 9 2 1 0 1 3 6 1 14 6 2 3 4 18 1 2 14 8 4 9 3 1 1 4 5 7 5 2 2 7 1 0 0 1 >S82917\S82917\51..335\285\AAB46802.1\Homo sapiens\dopamine D4 receptor {exon 1} [human, leukocytes, Genomic, 350 nt]./gene="dopamine D4 receptor"/codon_start=1/product="dopamine D4 receptor"/protein_id="AAB46802.1"/db_xref="GI:1835971" 0 3 1 0 0 0 0 8 9 0 0 0 0 2 1 2 3 0 0 3 1 0 0 1 2 0 2 5 11 3 0 3 9 0 0 1 8 0 0 0 3 0 0 2 0 0 0 2 3 0 1 0 1 0 2 0 0 2 0 1 0 0 0 0 >AK057511\AK057511\153..2429\2277\BAB71515.1\Homo sapiens\Homo sapiens cDNA FLJ32949 fis, clone TESTI2008020, weakly similarto DPY-19 PROTEIN./codon_start=1/protein_id="BAB71515.1"/db_xref="GI:16553246" 3 6 8 7 12 10 8 13 16 22 22 18 12 12 2 14 9 11 21 11 1 12 6 7 4 12 14 13 3 11 17 10 5 6 11 5 14 17 18 16 7 19 10 15 5 9 28 15 6 11 9 21 6 9 16 33 15 13 26 30 16 0 0 1 >AK024983\AK024983\366..833\468\BAB15049.1\Homo sapiens\Homo sapiens cDNA: FLJ21330 fis, clone COL02466./codon_start=1/protein_id="BAB15049.1"/db_xref="GI:10437414" 0 0 1 2 2 5 2 2 13 0 0 1 0 4 0 2 2 0 1 5 0 2 2 5 0 3 1 7 0 5 4 3 5 0 0 4 4 1 0 3 3 2 2 8 4 5 5 4 3 4 1 3 1 1 4 1 0 6 1 4 2 0 0 1 >CR457379\CR457379\1..1842\1842\CAG33660.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834B0621D 
forgene PDCD8, programmed cell death 8 (apoptosis-inducing factor);complete cds, incl. stopcodon./gene="PDCD8"/codon_start=1/protein_id="CAG33660.1"/db_xref="GI:48146875" 8 1 8 0 13 9 5 8 15 8 5 11 13 3 0 10 8 9 12 4 2 9 8 10 4 10 14 12 4 22 17 17 9 16 9 11 20 11 18 23 9 12 6 16 3 7 22 20 13 16 8 5 3 2 14 6 6 9 20 12 8 1 0 0 >HUMTFPC\M27436\100..987\888\AAA36734.1\Homo sapiens\Human tissue factor gene, complete cds, with a Alu repetitivesequence in the 3' untranslated region./codon_start=1/protein_id="AAA36734.1"/db_xref="GI:339508" 1 1 6 0 3 1 2 5 5 2 3 3 6 4 0 3 6 4 14 5 2 9 3 5 3 3 6 5 0 5 7 5 4 1 3 9 13 5 10 10 8 8 3 7 1 0 7 14 5 6 7 5 1 4 8 5 3 5 7 2 7 1 0 0 >AY527412\AY527412\join(18626..18709,20398..20499,21575..21668,21833..21900, 24457..24575,26195..26270,34900..35081,35237..35376, 37188..37289,37599..37826,38268..38329)\1257\AAS00097.1\Homo sapiens\Homo sapiens RAD52 homolog (S. cerevisiae) (RAD52) gene, completecds./gene="RAD52"/codon_start=1/product="RAD52 homolog (S. cerevisiae)"/protein_id="AAS00097.1"/db_xref="GI:41324138" 4 3 5 1 5 8 1 6 12 7 5 3 6 7 1 10 12 4 5 3 4 9 8 3 5 7 12 10 4 8 10 9 4 6 2 5 16 5 7 20 9 7 5 27 6 5 11 16 14 8 8 4 6 2 3 5 2 3 4 7 4 1 0 0 >AF035119\AF035119\325..3600\3276\AAB87700.1\Homo sapiens\Homo sapiens deleted in liver cancer-1 (DLC-1) mRNA, complete cds./gene="DLC-1"/codon_start=1/product="deleted in liver cancer-1"/protein_id="AAB87700.1"/db_xref="GI:2654198" 8 12 14 3 10 20 14 24 44 8 8 13 11 33 8 18 50 13 8 19 10 13 13 29 7 17 9 21 7 14 9 21 15 5 6 19 28 13 30 43 35 16 13 37 17 10 32 36 40 29 18 10 11 10 22 13 3 25 10 25 12 0 0 1 >AY226382\AY226382\1..3102\3102\AAO59428.1\Homo sapiens\Homo sapiens NOD protein family member 17 (NOD17) mRNA, completecds./gene="NOD17"/codon_start=1/product="NOD protein family member 17"/protein_id="AAO59428.1"/db_xref="GI:29372853" 4 6 4 8 19 13 13 22 47 31 12 32 13 6 7 23 26 14 16 11 8 19 12 14 4 4 15 17 3 18 13 9 9 11 8 12 17 13 29 27 22 32 14 24 10 14 29 30 25 21 15 16 19 21 19 33 18 
21 18 32 11 0 0 1 >BC011934\BC011934\3..686\684\AAH11934.1\Homo sapiens\Homo sapiens sperm associated antigen 7, mRNA (cDNA clone MGC:20134IMAGE:4330461), complete cds./gene="SPAG7"/codon_start=1/product="SPAG7 protein"/protein_id="AAH11934.1"/db_xref="GI:15080361"/db_xref="GeneID:9552" 1 4 3 4 1 4 5 2 7 0 0 1 2 6 0 1 6 2 2 2 0 1 3 4 1 2 4 11 1 3 2 4 4 1 0 1 10 0 7 19 1 3 5 11 2 1 7 20 10 5 4 1 0 2 3 4 1 6 2 7 1 0 1 0 >HSP0071\X81889\142..3777\3636\CAA57478.1\Homo sapiens\H.sapiens mRNA for p0071 protein./codon_start=1/product="p0071 protein"/protein_id="CAA57478.1"/db_xref="GI:1702924"/db_xref="GOA:Q99569"/db_xref="UniProt/Swiss-Prot:Q99569" 14 6 15 4 27 23 10 14 38 14 9 21 36 31 7 34 21 17 29 31 6 16 33 18 5 24 28 26 4 18 23 21 20 11 10 8 31 24 27 23 37 22 27 48 10 16 25 35 29 29 20 30 8 6 7 15 10 13 16 20 11 0 0 1 >HUMA2CIIA\D13538\3..1379\1377\BAA02737.1\Homo sapiens\Human alpha2CII-adrenergic receptor gene, complete cds./codon_start=1/product="alpha2CII-adrenergic receptor"/protein_id="BAA02737.1"/db_xref="GI:219406" 4 23 12 0 1 6 0 13 25 0 0 2 0 11 15 0 13 0 1 8 5 1 2 11 12 0 5 25 31 4 0 28 13 4 0 11 29 1 0 8 10 2 0 11 2 1 1 14 8 2 15 0 11 1 24 3 0 16 0 5 8 0 0 1 >AK126497\AK126497\1109..3427\2319\BAC86566.1\Homo sapiens\Homo sapiens cDNA FLJ44533 fis, clone UTERU3004523./codon_start=1/protein_id="BAC86566.1"/db_xref="GI:34532995" 3 3 0 4 18 23 6 13 17 14 4 13 25 26 1 30 16 20 13 13 1 18 24 19 2 28 10 13 0 14 10 13 17 14 4 7 11 9 5 19 11 9 22 26 17 19 16 31 10 26 7 5 6 9 11 8 3 5 4 12 15 1 0 0 >HSU09412\U09412\522..1568\1047\AAC50253.1\Homo sapiens\Human zinc finger protein ZNF134 mRNA, complete cds./codon_start=1/product="zinc finger protein ZNF134"/protein_id="AAC50253.1"/db_xref="GI:488553" 4 2 1 0 8 9 1 2 1 7 2 1 3 4 0 2 9 14 6 3 1 10 1 2 2 7 4 4 0 3 7 4 9 3 1 3 2 8 19 14 4 5 5 15 20 10 14 13 4 5 5 8 13 10 6 8 1 4 9 4 2 0 1 0 >AL136231#1\AL136231\join(AL162587.20:72938..73028,AL162587.20:126825..126965, 
379..471,3274..3388,4977..5019,6599..6697,11134..11318, 12837..12944,14931..15053,15499..15693,21968..22102, 24242..24488)\-323829\CAI41264.1\Homo sapiens\Human DNA sequence from clone RP11-6J24 on chromosome 9p24.1-24.3Contains the 3' end of the SLC1A1 gene for solute carrier family 1(neuronal/epithelial high affinity glutamate transporter, systemXag), the 5' end of a novel gene (FLJ10058), the gene forHsp90-associating relative of Cdc37 (HARC) (FLJ20639), the 3' endof the AK3L1 gene for adenylate kinase 3 like 1 (AK3), a ribosomalprotein S6 (RPS6) pseudogene and 3 CpG islands, complete sequence./gene="SLC1A1"/locus_tag="RP11-6J24.1-001"/standard_name="OTTHUMP00000021006"/codon_start=1/product="solute carrier family 1 (neuronal\/epithelial high affinity glutamate transporter, system Xag), member 1"/protein_id="CAI41264.1"/db_xref="GI:57208640"/db_xref="InterPro:IPR001991" 3 3 1 2 3 5 5 9 20 9 6 8 6 8 0 6 5 6 11 13 3 6 4 5 1 7 10 10 3 14 8 11 6 7 4 18 20 11 14 18 13 7 1 14 3 1 17 7 12 11 5 13 1 6 16 13 6 18 25 21 3 1 2 0 >BC009915\BC009915\68..1240\1173\AAH09915.1\Homo sapiens\Homo sapiens brain protein 16, mRNA (cDNA clone MGC:2733IMAGE:2822563), complete cds./gene="LOC51236"/codon_start=1/product="brain protein 16"/protein_id="AAH09915.1"/db_xref="GI:14602823"/db_xref="GeneID:51236" 3 10 12 1 0 4 3 8 37 5 0 9 0 3 2 2 3 1 4 2 3 1 6 14 11 7 6 22 30 4 4 11 9 1 1 2 17 1 0 4 5 1 1 19 6 1 6 31 16 4 5 0 6 2 4 1 0 4 2 8 5 0 0 1 >BC050272\BC050272\696..1799\1104\AAH50272.1\Homo sapiens\Homo sapiens zinc finger, DHHC-type containing 2, mRNA (cDNA cloneMGC:21181 IMAGE:4398300), complete cds./gene="ZDHHC2"/codon_start=1/product="rec"/protein_id="AAH50272.1"/db_xref="GI:30044999"/db_xref="GeneID:51201" 4 1 4 0 5 3 5 6 7 9 5 8 4 6 1 10 9 3 6 4 1 7 8 3 2 6 10 8 2 5 6 7 1 4 1 4 6 5 12 8 6 10 5 7 2 10 8 8 1 12 8 7 10 6 9 17 3 7 4 12 9 1 0 0 >AF450008\AF450008\243..1607\1365\AAL47160.1\Homo sapiens\Homo sapiens CFTR-associated ligand (CAL) mRNA, complete 
cds./gene="CAL"/codon_start=1/product="CFTR-associated ligand"/protein_id="AAL47160.1"/db_xref="GI:17865154" 4 0 1 5 8 2 5 5 15 11 5 11 2 5 1 9 3 5 5 3 0 8 9 0 2 5 13 7 3 15 13 8 11 11 8 2 10 9 24 13 7 4 13 12 6 14 26 16 10 21 3 6 7 0 2 4 3 8 8 6 2 1 0 0 >AY032594\AY032594\1..570\570\AAK51136.1\Homo sapiens\Homo sapiens hepatitis C virus core-binding protein 6 (HCBP6) mRNA,complete cds./gene="HCBP6"/codon_start=1/product="hepatitis C virus core-binding protein 6"/protein_id="AAK51136.1"/db_xref="GI:20384691" 2 2 0 4 0 1 2 2 5 4 0 1 3 4 1 2 4 1 4 2 0 5 2 0 0 3 6 5 5 4 11 2 3 3 1 2 9 3 4 13 2 3 3 7 1 1 5 7 4 1 2 1 1 0 5 8 2 2 1 4 4 1 0 0 >HUMKAP1A\L27711\52..690\639\AAA66496.1\Homo sapiens\Human protein phosphatase (KAP1) mRNA, complete cds./gene="KAP1"/function="interaction with cyclin dependent kinases"/standard_name="dual specificity protein phosphatase"/codon_start=1/product="protein phosphatase"/protein_id="AAA66496.1"/db_xref="GI:443669" 3 0 1 0 9 0 6 3 4 7 3 1 12 1 0 5 3 2 4 5 0 2 5 1 1 2 3 2 0 4 5 0 4 3 2 2 1 2 7 3 1 4 7 4 1 6 8 5 7 7 4 3 4 8 1 5 10 3 3 2 1 1 0 0 >BT006684\BT006684\1..1476\1476\AAP35330.1\Homo sapiens\Homo sapiens ferredoxin reductase mRNA, complete cds./codon_start=1/product="ferredoxin reductase"/protein_id="AAP35330.1"/db_xref="GI:30582207" 4 11 14 3 3 5 4 9 30 4 1 3 2 4 3 3 13 2 7 9 9 2 9 16 5 13 8 24 3 6 4 20 13 8 1 10 32 1 1 17 5 2 2 18 8 3 4 26 15 8 7 1 4 4 10 5 1 8 6 7 11 0 1 0 >BC022003\BC022003\42..1691\1650\AAH22003.1\Homo sapiens\Homo sapiens myotubularin related protein 9, mRNA (cDNA cloneMGC:26187 IMAGE:4825583), complete cds./gene="MTMR9"/codon_start=1/product="myotubularin-related protein 9"/protein_id="AAH22003.1"/db_xref="GI:18314567"/db_xref="GeneID:66036"/db_xref="MIM:606260" 8 2 4 5 5 9 4 12 20 9 3 12 7 12 0 7 4 9 9 10 3 7 7 6 3 8 6 10 2 15 9 6 4 7 5 6 9 3 12 14 10 15 5 22 8 7 25 24 12 7 7 9 6 7 9 18 5 18 17 11 14 0 0 1 >HUMDEF5A\M97925\join(1399..1570,2550..2662)\285\AAA35754.1\Homo sapiens\H.sapiens defensin 5 gene, 
complete cds./codon_start=1/product="defensin 5"/protein_id="AAA35754.1"/db_xref="GI:181533" 1 2 0 2 4 1 0 6 2 3 0 0 2 3 0 3 0 1 1 6 0 0 0 0 0 0 2 5 0 7 2 2 2 1 0 0 2 0 0 1 1 1 0 7 0 0 3 3 2 1 1 1 3 3 0 1 0 4 1 1 0 0 0 1 >BC068976\BC068976\79..3303\3225\AAH68976.1\Homo sapiens\Homo sapiens phospholipase D1, phophatidylcholine-specific, mRNA(cDNA clone MGC:70810 IMAGE:6068382), complete cds./gene="PLD1"/codon_start=1/product="phospholipase D1, phophatidylcholine-specific"/protein_id="AAH68976.1"/db_xref="GI:46362479"/db_xref="GeneID:5337"/db_xref="MIM:602382" 7 6 10 8 26 15 7 15 23 20 13 13 13 13 1 19 13 15 19 10 5 10 15 17 3 11 14 17 1 29 29 11 11 10 10 13 27 14 43 31 20 26 20 19 23 21 37 37 28 29 16 25 8 4 28 25 25 26 28 23 19 1 0 0 >AF022152\AF022152\32..3280\3249\AAB71894.1\Homo sapiens\Homo sapiens AP-3 complex beta3B subunit mRNA, complete cds./codon_start=1/product="AP-3 complex beta3B subunit"/protein_id="AAB71894.1"/db_xref="GI:2460298" 5 16 14 7 4 7 7 18 69 7 2 4 13 28 7 15 27 20 13 31 4 8 12 29 3 17 15 45 15 17 9 28 12 6 7 20 44 6 22 50 23 9 5 41 14 5 21 71 39 20 20 7 4 7 21 7 4 30 19 27 5 0 0 1 >AK022907\AK022907\163..2031\1869\BAB14302.1\Homo sapiens\Homo sapiens cDNA FLJ12845 fis, clone NT2RP2003307, moderatelysimilar to KINESIN LIGHT CHAIN./codon_start=1/protein_id="BAB14302.1"/db_xref="GI:10434570" 3 17 17 6 0 5 3 16 46 1 0 9 3 10 2 5 22 5 9 11 1 2 6 11 3 10 9 27 5 14 6 22 17 5 5 7 17 2 7 36 17 5 1 31 10 3 11 48 19 15 12 7 5 2 3 3 0 13 1 10 4 1 0 0 >AK172809\AK172809\215..1033\819\BAD18779.1\Homo sapiens\Homo sapiens cDNA FLJ23970 fis, clone HEP17024./codon_start=1/protein_id="BAD18779.1"/db_xref="GI:47077816" 0 1 1 1 9 5 0 7 7 3 16 0 3 0 0 0 4 0 12 1 2 2 1 1 0 13 4 1 0 3 6 1 1 1 12 8 19 1 5 6 14 16 0 3 1 3 9 3 4 1 1 4 0 0 17 3 11 16 2 7 0 0 1 0 >BC027937\BC027937\418..2010\1593\AAH27937.1\Homo sapiens\Homo sapiens retinoic acid induced 2, mRNA (cDNA clone MGC:34677IMAGE:5201821), complete cds./gene="RAI2"/codon_start=1/product="RAI2 
protein"/protein_id="AAH27937.1"/db_xref="GI:20379790"/db_xref="GeneID:10742"/db_xref="MIM:300217" 0 1 3 0 3 2 1 16 22 3 1 7 5 24 0 5 13 10 2 14 2 5 17 30 5 19 5 19 3 9 4 14 4 3 2 12 24 0 12 16 15 8 3 28 11 1 13 23 13 6 2 4 2 2 9 9 3 14 7 18 2 1 0 0 >AB062430\AB062430\225..2243\2019\BAB93493.1\Homo sapiens\Homo sapiens OK/SW-cl.100 mRNA, complete cds./gene="OK/SW-cl.100"/codon_start=1/protein_id="BAB93493.1"/db_xref="GI:21104446" 5 19 21 3 3 2 2 8 43 1 0 5 10 19 10 6 27 8 6 13 12 5 17 48 17 10 15 45 9 7 4 23 15 4 0 9 35 0 7 22 4 3 1 15 12 3 6 40 25 4 4 1 8 2 8 3 0 6 0 9 3 0 1 0 >HUMTRHYAL\L09190\join(1507..1644,2512..8070)\5697\AAA65582.1\Homo sapiens\Human trichohyalin (TRHY) gene, complete cds./gene="TRHY"/codon_start=1/product="trichohyalin"/protein_id="AAA65582.1"/db_xref="GI:292836" 15 151 59 31 57 105 14 36 120 9 5 6 0 5 1 4 12 8 4 0 7 4 9 4 11 4 7 14 3 7 10 9 8 2 1 5 9 2 48 57 4 4 75 262 16 7 151 361 41 18 11 8 2 5 33 6 1 7 3 2 18 1 0 0 >BT019657\BT019657\1..1011\1011\AAV38463.1\Homo sapiens\Homo sapiens TATA box binding protein mRNA, complete cds./codon_start=1/product="TATA box binding protein"/protein_id="AAV38463.1"/db_xref="GI:54696182" 2 0 1 1 7 4 2 4 7 4 4 5 3 5 1 4 2 7 5 8 5 7 9 8 2 10 12 7 1 7 6 5 3 5 5 1 6 5 6 9 6 4 11 46 2 0 8 5 2 2 3 6 1 2 5 8 2 7 11 10 0 0 1 0 >AF026005\AF026005\1..2577\2577\AAB88808.1\Homo sapiens\Homo sapiens delayed rectifier potassium channel Kv2.1 (DRK1) mRNA,complete cds./gene="DRK1"/codon_start=1/product="delayed rectifier potassium channel Kv2.1"/protein_id="AAB88808.1"/db_xref="GI:2570866" 6 15 11 3 5 8 4 30 27 10 2 12 4 26 5 11 31 14 14 19 7 10 8 24 4 15 12 26 7 12 11 16 14 7 4 14 14 5 21 37 23 10 5 22 15 3 15 48 27 11 16 3 12 3 31 7 2 27 17 26 10 0 0 1 >AF153978\AF153978\246..1097\852\AAF75588.1\Homo sapiens\Homo sapiens CD40-like protein precusor mRNA, complete cds./codon_start=1/product="CD40-like protein precusor"/protein_id="AAF75588.1"/db_xref="GI:8489097" 0 2 2 1 4 5 2 4 11 0 0 1 2 8 0 1 9 2 5 12 4 0 8 8 4 6 1 
13 2 3 4 9 8 4 2 9 11 2 3 9 4 2 1 11 6 0 3 14 8 1 4 2 16 7 4 1 2 4 3 3 6 0 0 1 >AK001753\AK001753\128..1525\1398\BAA91884.1\Homo sapiens\Homo sapiens cDNA FLJ10891 fis, clone NT2RP4002078, weakly similarto ZINC FINGER PROTEIN 91./codon_start=1/protein_id="BAA91884.1"/db_xref="GI:7023216" 4 1 1 6 9 7 2 4 8 10 4 2 14 4 1 7 2 6 11 2 0 12 2 1 0 9 8 4 0 5 12 9 2 4 4 3 5 10 29 21 6 22 12 9 13 26 17 17 7 8 9 3 5 18 10 10 4 3 11 8 2 0 1 0 >AL450310\AL450310\complement(join(33273..33364,34830..34891,38748..38872, 53976..54116,55402..55608,60637..60719,62653..62812, 66293..66370,70972..71063,73447..73612,84113..84282, 88802..88865,93154..93389,96576..96756))\1857\CAH72298.1\Homo sapiens\Human DNA sequence from clone RP11-307O14 on chromosome 6 Containsthe 5' end of a novel gene and the gene for MacGAP protein,complete sequence./gene="ARHGAP18"/locus_tag="RP11-307O14.1-001"/standard_name="OTTHUMP00000017190"/codon_start=1/product="Rho GTPase activating protein 18"/protein_id="CAH72298.1"/db_xref="GI:55665388"/db_xref="GOA:Q96S64"/db_xref="InterPro:IPR000198"/db_xref="UniProt/TrEMBL:Q96S64" 7 1 3 0 9 6 9 13 11 15 8 17 6 6 2 7 7 8 14 7 5 6 11 5 1 10 13 16 1 12 11 3 4 8 9 8 7 12 36 26 12 15 18 25 1 5 38 22 12 22 4 8 4 1 3 15 6 10 16 16 5 0 1 0 >AF395536\AF395536\1..1887\1887\AAK82415.1\Homo sapiens\Homo sapiens sorting nexin 18 (SNX18) mRNA, complete cds./gene="SNX18"/codon_start=1/product="sorting nexin 18"/protein_id="AAK82415.1"/db_xref="GI:15042691" 1 17 2 0 2 5 3 10 33 3 3 4 4 13 11 4 20 1 4 12 7 1 4 23 19 5 4 28 19 4 4 38 11 3 3 8 24 3 5 28 10 3 2 34 11 2 1 34 35 5 13 7 8 1 32 3 2 11 1 8 12 0 1 0 >HS1000E10#4\AL096773\join(65219..65417,67037..67146,67546..67638,68254..68351, 70586..70667,70992..71120,71252..71377,72293..72505, 74461..74601,74687..74851,78019..78126,78723..78898, 79776..79888,81107..81226,84392..84570,85367..85530, 86364..86496,86893..86940)\2397\CAI18825.1\Homo sapiens\Human DNA sequence from clone RP5-1000E10 on chromosome 1p12-13.3Contains the 5' end of 
the gene for a novel protein (FLJ37099), theAMPD1 gene for adenosine monophosphate deaminase 1 (isoform M), theNRAS gene for neuroblastoma RAS viral (v-ras) oncogene homolog, thegene for NRAS-related gene (D1S155E) and the gene for a novelprotein (FLJ44339), complete sequence./gene="RP5-1000E10.3"/locus_tag="RP5-1000E10.3-003"/standard_name="OTTHUMP00000013881"/codon_start=1/product="NRAS-related gene (D1S155E)"/protein_id="CAI18825.1"/db_xref="GI:56203365"/db_xref="InterPro:IPR002059"/db_xref="InterPro:IPR008994" 8 8 2 10 7 7 3 6 21 7 7 8 11 5 1 8 8 13 11 8 3 20 11 8 1 14 13 12 0 18 18 18 13 16 13 11 21 25 39 21 17 25 14 19 7 11 41 21 14 39 8 8 4 11 15 29 5 12 28 15 1 1 0 0 >BC022993\BC022993\127..837\711\AAH22993.1\Homo sapiens\Homo sapiens reticulon 3, transcript variant 1, mRNA (cDNA cloneIMAGE:4940867), complete cds./gene="RTN3"/codon_start=1/product="reticulon 3, isoform a"/protein_id="AAH22993.1"/db_xref="GI:18606175"/db_xref="GeneID:10313"/db_xref="MIM:604249" 1 0 0 1 1 2 1 6 14 4 0 1 3 11 3 2 4 3 0 6 2 3 2 2 3 1 3 9 4 8 4 6 3 1 3 7 4 6 5 11 2 2 2 4 3 3 7 3 3 5 5 3 2 1 8 4 0 12 9 6 2 1 0 0 >AF093415\AF093415\30..770\741\AAF22488.1\Homo sapiens\Homo sapiens cell division protein FtsJ (FJH1) mRNA, complete cds./gene="FJH1"/codon_start=1/product="cell division protein FtsJ"/protein_id="AAF22488.1"/db_xref="GI:6652820" 2 1 6 2 4 8 0 6 12 7 2 2 3 1 0 1 7 4 6 5 0 4 2 4 1 6 4 6 4 5 3 6 9 0 2 1 14 3 3 8 2 3 3 9 5 2 4 7 8 4 4 0 3 4 8 3 1 5 2 2 3 0 0 1 >AL359764#4\AL359764\join(complement(6483..6560),AL512307.12:153863..153959, complement(AL590682.9:65938..65988), complement(AL590682.9:19459..19565), complement(AL590682.9:13576..13627), complement(AL365184.12:142467..142531), complement(AL365184.12:141188..141264), complement(AL365184.12:140999..141080), complement(AL365184.12:99510..99584), complement(AL365184.12:88729..88827), complement(AL365184.12:87128..87189), complement(AL365184.12:86030..86140), complement(AL365184.12:84330..84455), 
complement(AL365184.12:78552..78738), complement(AL365184.12:75316..75405), complement(AL365184.12:73059..73133))\78\CAI15142.1\Homo sapiens\Human DNA sequence from clone RP11-527D7 on chromosome 1 Containsthe 5' end of the RGS7 gene for regulator of G-protein signalling7, a novel gene, the 5' end of the FH gene for fumarate hydrataseand two CpG islands, complete sequence./gene="RGS7"/locus_tag="RP11-80B9.3-004"/standard_name="OTTHUMP00000037996"/codon_start=1/product="regulator of G-protein signalling 7"/protein_id="CAI15142.1"/db_xref="GI:55959534"/db_xref="GOA:P49802"/db_xref="UniProt/Swiss-Prot:P49802" 2 1 1 2 23 13 3 5 4 5 7 7 3 5 1 10 7 9 4 3 1 8 10 5 2 5 4 8 0 10 11 5 8 3 8 5 11 9 26 16 7 11 8 11 6 6 16 5 2 13 6 10 3 7 7 17 13 8 11 8 7 6 7 13 >AF155095\AF155095\151..1863\1713\AAD42861.1\Homo sapiens\Homo sapiens NY-REN-2 antigen mRNA, complete cds./codon_start=1/product="NY-REN-2 antigen"/protein_id="AAD42861.1"/db_xref="GI:5360085" 1 3 3 4 5 3 3 2 7 5 5 9 9 11 1 16 16 12 5 7 1 10 20 13 2 15 17 10 0 19 17 8 11 19 7 3 11 9 15 18 18 21 10 29 7 6 7 9 10 16 9 10 1 3 9 10 4 2 17 10 10 1 0 0 >AF077961\AF077961\1..1317\1317\AAD51483.1\Homo sapiens\Homo sapiens clone 327 CLN3 protein (CLN3) mRNA, complete cds./gene="CLN3"/codon_start=1/product="CLN3 protein"/protein_id="AAD51483.1"/db_xref="GI:5801841" 1 5 7 1 1 3 0 21 38 7 0 7 5 14 5 9 12 4 5 10 2 3 4 8 3 7 6 19 4 10 7 14 12 3 1 10 12 4 1 4 11 0 1 13 7 5 5 14 9 2 10 4 6 4 16 9 1 13 4 5 10 0 0 1 >D87953\D87953\123..1307\1185\BAA13505.1\Homo sapiens\Human mRNA for RTP, complete cds./gene="GC4"/codon_start=1/product="RTP"/protein_id="BAA13505.1"/db_xref="GI:1596167" 3 9 4 0 0 0 2 8 13 5 1 3 2 11 4 4 12 1 4 18 1 5 2 8 4 8 4 13 2 9 7 16 9 2 1 10 15 2 4 9 15 2 3 15 13 2 5 19 14 5 10 0 5 3 5 5 0 11 4 20 3 0 1 0 >AY372211\AY372211\1..879\879\AAQ75759.1\Homo sapiens\Homo sapiens unknown mRNA./codon_start=1/product="unknown"/protein_id="AAQ75759.1"/db_xref="GI:34577163" 3 1 3 3 2 1 0 4 12 6 1 1 5 1 1 4 1 1 7 5 1 8 12 4 3 17 
11 9 1 12 3 3 3 6 1 2 3 5 6 8 7 10 6 12 6 5 4 6 0 5 4 1 7 2 5 5 2 3 6 16 0 0 0 1 >BC093690\BC093690\50..1489\1440\AAH93690.1\Homo sapiens\Homo sapiens carbohydrate (chondroitin 6) sulfotransferase 3, mRNA(cDNA clone MGC:120725 IMAGE:7939535), complete cds./gene="CHST3"/codon_start=1/product="carbohydrate (chondroitin 6) sulfotransferase 3"/protein_id="AAH93690.1"/db_xref="GI:62739448"/db_xref="GeneID:9469"/db_xref="MIM:603799" 0 25 9 1 3 3 1 13 36 2 1 4 4 9 4 1 11 0 1 7 7 2 2 13 8 1 5 30 6 2 1 19 5 0 0 8 22 2 4 21 12 1 2 24 8 0 6 34 18 2 10 1 8 1 23 3 2 12 1 11 7 0 1 0 >HSU17838\U17838\856..6015\5160\AAC50820.2\Homo sapiens\Homo sapiens zinc finger protein RIZ mRNA, complete cds./function="zinc finger protein"/codon_start=1/product="zinc finger protein RIZ"/protein_id="AAC50820.2"/db_xref="GI:9955379" 13 6 18 4 15 15 6 17 26 30 24 22 32 38 7 63 37 30 33 26 5 33 54 40 15 53 31 41 10 35 16 15 24 18 16 15 32 28 100 51 30 46 17 54 17 27 76 74 25 50 15 11 16 25 21 27 17 14 24 27 12 0 1 0 >CR457417\CR457417\1..930\930\CAG33698.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834A1014D forgene PPP2CA, protein phosphatase 2 (formerly 2A), catalyticsubunit, alpha isoform; complete cds, incl. 
stopcodon./gene="PPP2CA"/codon_start=1/protein_id="CAG33698.1"/db_xref="GI:48146951"/db_xref="GOA:Q6I9T8"/db_xref="InterPro:IPR004843"/db_xref="InterPro:IPR006186"/db_xref="UniProt/TrEMBL:Q6I9T8" 3 2 1 7 6 0 3 5 8 8 1 5 3 3 1 5 2 1 8 4 2 4 10 1 0 3 4 2 0 6 6 3 4 10 3 2 6 10 7 6 5 7 6 8 1 9 8 12 10 14 5 11 3 7 6 8 2 7 4 6 5 1 0 0 >BC038838\BC038838\232..936\705\AAH38838.1\Homo sapiens\Homo sapiens mesenchymal stem cell protein DSC54, mRNA (cDNA cloneMGC:47873 IMAGE:5170587), complete cds./gene="LOC51334"/codon_start=1/product="LOC51334 protein"/protein_id="AAH38838.1"/db_xref="GI:24416469"/db_xref="GeneID:51334" 3 0 2 0 4 4 3 3 3 3 2 3 2 2 0 1 5 5 6 5 4 5 13 8 2 20 5 2 0 3 6 5 2 1 2 4 5 2 10 7 7 8 1 5 6 3 6 2 6 2 1 2 2 4 2 1 3 5 3 3 0 0 0 1 >BC030830\BC030830\133..750\618\AAH30830.1\Homo sapiens\Homo sapiens CD83 antigen (activated B lymphocytes, immunoglobulinsuperfamily), mRNA (cDNA clone MGC:26322 IMAGE:4818856), completecds./gene="CD83"/codon_start=1/product="CD83 antigen (activated B lymphocytes, immunoglobulin superfamily)"/protein_id="AAH30830.1"/db_xref="GI:21410827"/db_xref="GeneID:9308"/db_xref="MIM:604534" 2 1 1 1 3 4 3 5 10 1 3 3 0 4 2 2 3 2 5 3 2 5 3 7 3 2 2 3 2 5 2 3 4 3 1 3 6 3 3 9 3 3 1 9 3 2 7 6 4 4 5 2 6 1 4 4 0 4 3 3 2 0 0 1 >HSM806071\BX537946\80..2989\2910\CAD97914.1\Homo sapiens\Homo sapiens mRNA; cDNA DKFZp686C0686 (from clone DKFZp686C0686);complete cds./gene="DKFZp686C0686"/codon_start=1/product="hypothetical protein"/protein_id="CAD97914.1"/db_xref="GI:31873988"/db_xref="GOA:Q7Z3F2"/db_xref="InterPro:IPR000719"/db_xref="InterPro:IPR001090"/db_xref="InterPro:IPR001245"/db_xref="InterPro:IPR001426"/db_xref="InterPro:IPR001660"/db_xref="InterPro:IPR003961"/db_xref="InterPro:IPR006209"/db_xref="InterPro:IPR008266"/db_xref="UniProt/TrEMBL:Q7Z3F2" 5 1 8 7 19 6 8 6 18 15 6 15 11 10 0 23 17 15 14 24 2 11 22 12 1 14 24 12 1 25 27 16 12 14 15 16 36 14 36 22 18 25 16 14 9 12 43 30 12 32 13 24 18 14 11 20 7 27 23 27 14 1 0 0 
>AB055421\AB055421\7..960\954\BAB21756.1\Homo sapiens\Homo sapiens mRNA for ChM1L, complete cds./gene="ChM1L"/codon_start=1/product="ChM1L"/protein_id="BAB21756.1"/db_xref="GI:12698293" 3 4 0 3 4 1 5 0 6 5 2 1 3 2 0 2 3 1 2 4 0 9 6 3 1 8 4 5 0 1 8 4 3 3 3 7 9 1 14 9 6 11 6 3 6 0 24 14 5 8 9 6 3 11 5 11 6 6 14 7 7 1 0 0 >CR456454\CR456454\61..840\780\CAG30340.1\Homo sapiens\Homo sapiens dJ186O1.2 full length open reading frame (ORF) cDNAclone (cDNA clone C22ORF:pGEM.dJ186O1.2.V2)./gene="dJ186O1.2"/codon_start=1/protein_id="CAG30340.1"/db_xref="GI:47678439"/db_xref="InterPro:IPR001849"/db_xref="UniProt/TrEMBL:Q6ICB4" 2 9 3 0 1 4 2 5 16 0 0 5 2 7 0 3 12 3 1 5 0 1 5 5 2 6 5 11 4 6 1 15 7 0 2 1 11 2 1 8 3 1 1 12 6 1 6 16 5 2 1 2 5 3 3 6 0 2 0 3 8 1 0 0 >BC020818\BC020818\108..1334\1227\AAH20818.1\Homo sapiens\Homo sapiens Sjogren syndrome antigen B (autoantigen La), mRNA(cDNA clone MGC:23872 IMAGE:4338551), complete cds./gene="SSB"/codon_start=1/product="autoantigen La"/protein_id="AAH20818.1"/db_xref="GI:18089160"/db_xref="GeneID:6741"/db_xref="MIM:109090" 0 0 1 2 11 3 10 1 5 3 6 7 3 3 1 5 3 2 6 2 1 6 3 1 0 6 12 6 0 9 9 5 2 11 7 1 7 2 46 15 6 14 9 10 1 5 41 8 10 20 2 4 1 2 6 13 11 3 8 6 5 0 1 0 >AK092263\AK092263\306..674\369\BAC03841.1\Homo sapiens\Homo sapiens cDNA FLJ34944 fis, clone NT2RP7008015./codon_start=1/protein_id="BAC03841.1"/db_xref="GI:21750811" 1 0 6 1 2 3 1 7 1 1 0 1 0 2 3 2 1 3 0 5 1 1 1 5 2 3 1 3 4 1 4 7 3 2 1 2 0 0 2 3 0 0 2 2 1 1 4 1 5 1 1 1 4 2 1 0 0 1 0 6 3 0 1 0 >AY038182\AY038182\69..308\240\AAK72469.1\Homo sapiens\Homo sapiens probable protease inhibitor WAP10 precursor, mRNA,complete cds./codon_start=1/product="probable protease inhibitor WAP10 precursor"/protein_id="AAK72469.1"/db_xref="GI:21654721" 1 0 0 1 0 1 3 1 7 0 1 0 0 1 0 2 1 0 1 1 0 1 1 1 0 2 2 1 0 0 2 0 1 0 0 2 1 2 4 2 1 2 1 8 2 0 2 0 1 1 2 1 4 5 0 0 1 2 0 3 0 0 0 1 >HUMMLCAB\M20642\44..628\585\AAA59854.1\Homo sapiens\Human alkali myosin light chain 1 mRNA, complete 
cds./gene="MYL1"/codon_start=1/product="myosin light chain"/protein_id="AAA59854.1"/db_xref="GI:188592"/db_xref="GDB:G00-120-217" 1 1 0 1 1 1 1 4 7 1 1 0 0 3 0 3 2 0 3 4 0 0 3 3 2 5 7 12 2 5 1 4 0 6 0 6 3 3 8 11 4 6 3 4 1 1 15 7 7 3 1 1 1 0 2 7 0 5 4 7 0 0 0 1 >AY177201\AY177201\1..705\705\AAO27703.1\Homo sapiens\Homo sapiens ubiquitin-specific protease otubain 2 mRNA, completecds./codon_start=1/product="ubiquitin-specific protease otubain 2"/protein_id="AAO27703.1"/db_xref="GI:28628209" 1 3 3 0 1 6 2 3 14 3 0 2 2 6 3 1 4 3 2 5 3 1 1 1 0 3 2 11 1 3 0 3 3 0 3 0 8 1 8 6 8 2 1 5 8 2 7 13 11 4 7 3 2 2 16 2 1 9 5 4 0 0 0 1 >HS86C11#3\AL021807\56377..56769\393\CAA16948.1\Homo sapiens\Human DNA sequence from clone RP1-86C11 on chromosome 6p21.31-22.1Contains a vomeronasal olfactory 1 receptor chromosome 6-2(VN1R6-2P) pseudogene, histone genes H2BFN and H2AFA, the gene fora novel protein identical to H2B histone family member A (H2BFA),the gene for a novel protein identical to H2A histone family memberI (H2AFI), a novel histone gene similar to H4F2 and three CpGislands, complete sequence./gene="RP1-86C11.5"/locus_tag="RP1-86C11.5-001"/standard_name="OTTHUMP00000016174"/codon_start=1/protein_id="CAA16948.1"/db_xref="GI:3928696" 1 7 1 2 0 1 1 4 11 0 0 0 0 0 0 3 1 0 0 3 0 2 0 3 2 0 2 9 5 1 2 9 1 2 0 3 5 0 4 10 6 0 1 4 3 1 1 6 2 0 1 2 0 0 1 0 0 5 1 1 0 1 0 0 >BC073784\BC073784\25..732\708\AAH73784.1\Homo sapiens\Homo sapiens cDNA clone MGC:88801 IMAGE:4297904, complete cds./codon_start=1/product="Unknown (protein for MGC:88801)"/protein_id="AAH73784.1"/db_xref="GI:49257468" 1 0 2 0 2 1 2 9 7 0 0 0 4 7 0 9 12 3 4 8 4 7 2 12 3 1 2 11 3 2 5 4 8 3 0 8 8 0 2 9 6 2 1 10 2 2 2 7 3 4 6 2 1 5 6 0 0 5 0 1 5 0 1 0 >AL157935#12\AL157935\complement(join(109055..109113,109886..109988, 110767..110856,111151..111153))\255\CAI12615.1\Homo sapiens\Human DNA sequence from clone RP11-203J24 on chromosome 9 Containsthe 5' end of the ENG gene for endoglin (Osler-Rendu-Weber syndrome1) (END, ORW, 
HHT1, ORW1, CD105), the AK1 gene for adenylate kinase1, the ST6GALNAC6 gene for CMP-NeuAC:(beta)-N-acetylgalactosaminide(alpha)2,6-sialyltransferase member VI (ST6GALNACVI), the SIAT7Dgene for sialyltransferase 7D((alpha-N-acetylneuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase) (SIAT3C, ST6GALNAC4,ST6GALNACIV), the DPM2 gene for dolichyl-phosphatemannosyltransferase polypeptide 2, regulatory subunit (MGC21559),the gene for a novel protien containing FLJ00179, a novel gene andten CpG islands, complete sequence./gene="DPM2"/locus_tag="RP11-203J24.2-001"/standard_name="OTTHUMP00000022224"/codon_start=1/product="dolichyl-phosphate mannosyltransferase polypeptide 2, regulatory subunit"/protein_id="CAI12615.1"/db_xref="GI:55958264" 1 0 0 0 1 0 0 5 8 0 0 1 0 1 0 0 1 1 1 4 1 0 2 1 0 0 1 5 0 3 2 2 1 0 0 3 6 1 0 5 0 0 0 3 1 1 0 0 2 0 2 3 0 0 4 1 0 6 1 2 1 0 0 1 >AF327066\AF327066\1..1431\1431\AAK11227.1\Homo sapiens\Homo sapiens Ewings sarcoma EWS-Fli1 (type 1) oncogene mRNA,complete cds./codon_start=1/product="Ewings sarcoma EWS-Fli1 (type 1) oncogene"/protein_id="AAK11227.1"/db_xref="GI:12963355" 0 3 3 1 2 1 2 3 6 1 1 0 2 14 1 6 27 13 5 23 4 19 10 19 6 18 13 14 2 10 13 7 15 4 1 7 4 0 6 6 13 2 17 43 5 4 0 7 5 8 23 29 0 1 4 3 0 6 2 8 4 0 1 0 >AK023335\AK023335\252..1637\1386\BAB14532.1\Homo sapiens\Homo sapiens cDNA FLJ13273 fis, clone OVARC1001010./codon_start=1/protein_id="BAB14532.1"/db_xref="GI:10435228" 2 1 1 0 10 6 6 10 10 11 8 13 6 5 0 12 5 7 9 6 0 9 13 5 0 14 11 6 0 7 7 2 5 4 12 6 9 7 17 21 4 6 6 16 5 8 27 9 8 12 5 3 6 9 4 16 7 9 9 5 4 0 1 0 >AY740522\AY740522\1..177\177\AAU89079.1\Homo sapiens\Homo sapiens HCV F-transactivated protein 2 mRNA, complete cds./codon_start=1/product="HCV F-transactivated protein 2"/protein_id="AAU89079.1"/db_xref="GI:52857652" 0 0 0 0 1 1 0 1 1 0 2 1 0 1 0 1 1 1 3 0 0 3 2 0 0 0 3 1 1 1 1 0 0 3 1 1 1 0 5 1 2 0 1 1 0 0 1 0 0 1 1 4 0 1 3 0 0 0 3 2 0 1 0 0 >BC012858\BC012858\71..613\543\AAH12858.1\Homo 
sapiens\Homo sapiens pituitary tumor-transforming 1 interacting protein,mRNA (cDNA clone MGC:9271 IMAGE:3857505), complete cds./gene="PTTG1IP"/codon_start=1/product="pituitary tumor-transforming gene 1 protein-interacting protein, precursor"/protein_id="AAH12858.1"/db_xref="GI:15277511"/db_xref="GeneID:754"/db_xref="MIM:603784" 0 3 5 1 4 6 0 5 8 2 0 3 0 2 1 2 3 1 3 3 1 1 2 2 6 1 2 6 4 5 4 3 2 1 1 3 3 2 5 6 9 0 0 3 0 1 6 10 2 1 2 2 8 6 0 3 1 5 1 4 4 1 0 0 >HUMPLCA\M34667\77..3949\3873\AAA36452.1\Homo sapiens\Human phospholipase C-gamma mRNA, complete cds./codon_start=1/protein_id="AAA36452.1"/db_xref="GI:190038" 15 23 25 11 4 10 5 33 54 13 2 12 14 28 6 9 26 13 13 24 10 8 14 25 9 21 10 44 5 19 9 32 28 2 4 21 36 6 10 62 38 12 6 53 22 9 20 93 54 20 39 12 15 8 44 22 2 37 18 32 19 0 1 0 >AY305302\AY305302\15..1766\1752\AAP72167.1\Homo sapiens\Homo sapiens kin of irregular chiasm 2 splice variant B (KIRREL2)mRNA, complete cds; alternatively spliced./gene="KIRREL2"/codon_start=1/product="kin of irregular chiasm 2 splice variant B"/protein_id="AAP72167.1"/db_xref="GI:32186868" 4 11 13 4 6 5 3 11 37 3 2 4 6 8 6 13 16 6 12 15 0 10 13 15 6 13 8 32 6 11 10 19 21 8 2 14 30 4 1 8 8 3 4 15 8 3 9 26 16 10 5 2 9 3 15 4 2 5 5 4 11 1 0 0 >BC030532\BC030532\118..2685\2568\AAH30532.1\Homo sapiens\Homo sapiens suppression of tumorigenicity 14 (colon carcinoma,matriptase, epithin), mRNA (cDNA clone MGC:40392 IMAGE:5213189),complete cds./gene="ST14"/codon_start=1/product="matriptase"/protein_id="AAH30532.1"/db_xref="GI:20988875"/db_xref="GeneID:6768"/db_xref="MIM:606797" 0 21 13 3 2 9 0 16 41 1 2 7 6 24 2 3 31 4 7 21 13 5 6 23 10 8 3 26 9 8 11 40 21 5 5 20 35 3 9 29 28 9 1 33 16 7 3 41 42 11 26 2 29 11 36 7 1 23 2 10 15 0 1 0 >BC034410\BC034410\154..1977\1824\AAH34410.1\Homo sapiens\Homo sapiens engulfment and cell motility 3 (ced-12 homolog, C.elegans), mRNA (cDNA clone MGC:34167 IMAGE:5171412), complete cds./gene="ELMO3"/codon_start=1/product="engulfment and cell motility 
3"/protein_id="AAH34410.1"/db_xref="GI:21706711"/db_xref="GeneID:79767"/db_xref="MIM:606422" 2 14 16 6 1 6 5 19 61 4 0 8 3 9 1 2 21 6 7 9 3 5 6 24 2 7 4 23 4 7 2 15 13 2 1 5 25 1 1 22 16 3 3 31 9 2 5 48 20 5 8 6 8 4 19 5 2 14 2 20 5 0 0 1 >AF312923\AF312923\18..296\279\AAK28486.1\Homo sapiens\Homo sapiens prostin 1 short isoform mRNA, complete cds./codon_start=1/product="prostin 1 short isoform"/protein_id="AAK28486.1"/db_xref="GI:13507169" 1 3 0 1 0 1 1 0 10 1 2 3 0 3 0 0 1 0 0 2 2 2 1 2 0 0 1 10 2 0 2 5 1 0 0 1 4 0 1 0 0 0 0 5 0 1 1 4 3 0 1 2 3 0 1 1 1 1 0 1 4 0 0 1 >AY358306\AY358306\214..2064\1851\AAQ88673.1\Homo sapiens\Homo sapiens clone DNA57254 GPR125 (UNQ556) mRNA, complete cds./locus_tag="UNQ556"/codon_start=1/product="GPR125"/protein_id="AAQ88673.1"/db_xref="GI:37181732" 5 8 9 3 10 7 6 7 29 11 6 12 11 2 5 14 4 11 8 12 5 11 8 9 5 4 13 12 10 12 14 18 9 6 5 7 15 10 10 15 16 18 6 21 3 4 15 12 10 22 6 14 7 12 7 15 8 3 17 13 9 1 0 0 >BC042755\BC042755\2463..2558\96\AAH42755.1\Homo sapiens\Homo sapiens regulator of G-protein signalling 2, 24kDa, mRNA (cDNAclone IMAGE:4830785), complete cds./gene="RGS2"/codon_start=1/product="RGS2 protein"/protein_id="AAH42755.1"/db_xref="GI:71297280"/db_xref="GeneID:5997"/db_xref="MIM:600861" 0 0 0 1 0 0 0 0 0 0 0 2 1 0 0 1 0 0 2 1 0 0 1 0 0 2 0 0 0 1 0 0 0 0 0 0 0 0 1 1 2 0 1 1 0 1 1 3 1 0 1 1 0 1 2 0 0 1 0 1 0 0 0 1 >AF069734\AF069734\1..954\954\AAC39904.1\Homo sapiens\Homo sapiens SPT3-like protein mRNA, complete cds./codon_start=1/product="SPT3-like protein"/protein_id="AAC39904.1"/db_xref="GI:3335557" 5 3 1 0 8 6 2 2 5 8 9 4 3 3 0 7 7 9 5 2 1 7 3 3 0 2 14 6 2 11 3 3 3 2 4 2 4 5 13 5 3 8 7 9 5 2 16 6 9 12 4 5 3 2 6 6 2 6 10 13 1 0 0 1 >S66427\S66427\115..3888\3774\AAB28543.1\Homo sapiens\RBP1=retinoblastoma binding protein 1 [human, Nalm-6 pre-B cellleukemia, mRNA, 4834 nt]./gene="RBP1"/codon_start=1/product="retinoblastoma binding protein 1"/protein_id="AAB28543.1"/db_xref="GI:435776" 11 1 2 4 27 19 13 15 9 19 24 16 30 6 3 
39 14 35 24 14 4 19 23 4 5 24 22 10 2 22 34 5 5 11 26 4 14 20 83 55 13 46 23 21 4 14 118 51 28 67 6 19 5 12 6 18 17 12 32 20 8 0 0 1 >HSA413230\AJ413230\209..2200\1992\CAC88372.1\Homo sapiens\Homo sapiens mRNA for anion transporter (SUT2/SLC26A7 gene),isoform 2./gene="SUT2/SLC26A7"/function="putative sulfate transporter"/codon_start=1/product="anion transporter"/protein_id="CAC88372.1"/db_xref="GI:18643952"/db_xref="GOA:Q8TE53"/db_xref="UniProt/TrEMBL:Q8TE53" 3 1 1 0 3 6 4 6 19 17 10 17 12 10 1 10 9 10 17 9 2 6 12 5 3 7 16 18 2 18 23 8 3 4 7 9 26 19 25 11 15 16 9 15 2 10 20 5 8 13 4 20 7 11 10 25 21 12 18 26 7 0 0 1 >HUMGLIA1F\M64979\1..1128\1128\AAA58611.1\Homo sapiens\Human glial factor-1 (which interacts with a polyomavirus (JC)B-domain) mRNA, partial cds./gene="glial factor-1"/codon_start=1/product="glial factor-1"/protein_id="AAA58611.1"/db_xref="GI:183250" 4 3 5 4 5 13 0 6 14 5 0 3 1 10 2 4 10 2 2 8 3 2 10 7 2 11 5 15 3 10 7 10 11 5 1 11 14 2 6 12 9 1 3 25 11 3 8 23 11 4 1 3 1 1 6 5 3 5 2 3 0 0 0 0 >BC070085\BC070085\218..2887\2670\AAH70085.1\Homo sapiens\Homo sapiens colony stimulating factor 2 receptor, beta,low-affinity (granulocyte-macrophage), mRNA (cDNA clone MGC:87425IMAGE:30344148), complete cds./gene="CSF2RB"/codon_start=1/product="CSF2RB protein"/protein_id="AAH70085.1"/db_xref="GI:47123336"/db_xref="GeneID:1439"/db_xref="MIM:138981" 2 13 8 1 3 19 6 27 42 5 1 1 11 26 5 11 34 9 5 28 3 5 31 42 9 32 10 35 3 9 13 20 33 4 3 21 36 5 7 18 14 3 6 40 16 3 7 44 34 9 16 3 12 6 16 8 4 15 5 13 19 0 0 1 >DQ023504\DQ023504\960..3143\2184\AAY89635.1\Homo sapiens\Homo sapiens prolyl endopeptidase-like variant C2 (PREPL) mRNA,complete cds, alternatively spliced./gene="PREPL"/codon_start=1/product="prolyl endopeptidase-like variant C2"/protein_id="AAY89635.1"/db_xref="GI:68303897" 3 4 4 6 9 3 8 11 11 19 15 13 5 2 0 12 7 9 16 10 2 14 14 7 1 17 11 6 3 14 12 9 4 9 10 2 12 18 36 27 11 24 8 17 11 12 42 15 16 33 14 19 6 11 17 16 13 7 17 15 8 0 0 1 
>HUMDCKATPB\M60527\160..942\783\AAA35752.1\Homo sapiens\Human deoxycytidine kinase mRNA, complete cds./EC_number="2.7.1.74"/codon_start=1/product="deoxycytidine kinase"/protein_id="AAA35752.1"/db_xref="GI:181510" 3 1 1 0 5 2 0 3 5 9 5 5 2 1 0 8 4 5 8 4 0 3 1 1 2 6 3 6 0 2 2 3 4 1 1 1 4 6 11 6 3 10 10 3 0 5 17 11 5 6 0 12 4 2 3 9 2 7 4 6 7 0 0 1 >BC020228\BC020228\7..864\858\AAH20228.1\Homo sapiens\Homo sapiens dodecenoyl-Coenzyme A delta isomerase (3,2trans-enoyl-Coenzyme A isomerase), mRNA (cDNA clone MGC:31929IMAGE:4541779), complete cds./gene="DCI"/codon_start=1/product="DCI protein"/protein_id="AAH20228.1"/db_xref="GI:18044954"/db_xref="GeneID:1632"/db_xref="MIM:600305" 3 8 7 1 0 3 0 5 28 0 1 2 1 5 2 1 9 0 0 5 5 1 2 4 7 0 1 16 14 5 3 9 8 2 0 9 13 2 5 7 7 1 0 14 2 1 2 14 12 1 5 0 2 2 7 1 2 6 2 7 3 1 0 0 >BC033212\BC033212\31..2487\2457\AAH33212.1\Homo sapiens\Homo sapiens KIAA1914, transcript variant 1, mRNA (cDNA cloneMGC:45872 IMAGE:4909140), complete cds./gene="KIAA1914"/codon_start=1/product="KIAA1914 protein, isoform 1"/protein_id="AAH33212.1"/db_xref="GI:21620069"/db_xref="GeneID:84632" 0 9 13 1 4 10 5 23 46 8 2 4 6 21 5 13 30 6 14 20 4 3 20 13 8 17 11 25 5 11 6 24 4 3 1 15 31 1 20 52 17 10 4 34 12 1 22 66 34 14 17 6 11 5 7 2 2 15 7 9 9 0 1 0 >HUMMOXYII\M83772\137..1738\1602\AAA86284.1\Homo sapiens\Human flavin-containing monooxygenase form II (FMO2) mRNA, completecds./gene="FMO2"/codon_start=1/product="flavoprotein"/protein_id="AAA86284.1"/db_xref="GI:188631"/db_xref="GDB:G00-127-981" 2 2 3 1 5 6 3 9 14 5 5 9 3 12 4 3 11 6 11 11 0 6 9 10 1 10 13 11 0 6 9 16 12 5 10 9 11 6 20 19 12 12 2 11 3 6 11 18 11 12 6 8 3 8 15 26 6 12 17 16 10 1 0 0 >BC002973\BC002973\71..931\861\AAH02973.1\Homo sapiens\Homo sapiens ribonuclease H1, mRNA (cDNA clone MGC:2019IMAGE:3537074), complete cds./gene="RNASEH1"/codon_start=1/product="ribonuclease H1"/protein_id="AAH02973.1"/db_xref="GI:12804229"/db_xref="GeneID:246243"/db_xref="MIM:604123" 2 4 2 2 9 5 0 1 7 3 2 2 1 2 3 2 6 3 4 
3 2 3 3 1 5 5 9 13 3 3 7 7 10 3 1 6 5 6 8 9 4 8 7 4 2 7 11 10 8 4 3 3 5 0 4 9 3 3 4 7 8 0 0 1 >AF304464\AF304464\350..2539\2190\AAL04015.1\Homo sapiens\Homo sapiens calcium transport protein CaT2 mRNA, complete cds./codon_start=1/product="calcium transport protein CaT2"/protein_id="AAL04015.1"/db_xref="GI:15625297" 7 11 7 6 4 10 8 18 51 15 1 9 1 17 0 9 7 7 8 21 4 8 8 15 1 6 7 28 4 12 13 6 20 7 1 11 29 7 5 15 19 7 8 26 13 7 7 37 24 8 11 10 10 4 24 16 1 29 11 22 11 0 0 1 >AK024329\AK024329\70..1860\1791\BAB14888.1\Homo sapiens\Homo sapiens cDNA FLJ14267 fis, clone PLACE1002665, highly similarto Mus musculus enhancer of polycomb (Epc1) mRNA./codon_start=1/protein_id="BAB14888.1"/db_xref="GI:10436688" 10 2 4 1 9 8 5 6 12 7 12 7 19 9 3 17 3 22 15 6 4 17 14 5 1 14 21 13 1 16 9 6 5 5 8 7 8 11 22 13 10 21 18 23 7 10 19 7 13 16 6 8 2 5 5 11 6 7 9 13 3 0 1 0 >BC009679\BC009679\18..995\978\AAH09679.1\Homo sapiens\Homo sapiens DKFZP566O084 protein, mRNA (cDNA clone MGC:8916IMAGE:3878039), complete cds./gene="DKFZp566O084"/codon_start=1/product="DKFZP566O084 protein"/protein_id="AAH09679.1"/db_xref="GI:16307180"/db_xref="GeneID:25979" 3 1 4 2 2 6 2 9 16 3 1 2 2 7 0 5 7 1 7 14 1 1 1 3 2 5 8 15 2 14 2 12 7 3 1 8 15 5 7 14 4 4 1 8 4 1 5 7 9 4 5 6 2 3 9 4 2 14 3 9 1 0 1 0 >AY223901\AY223901\272..652\381\AAO73604.1\Homo sapiens\Homo sapiens putative schizophrenia protein variant 4 (G72) mRNA,complete cds, alternatively spliced./gene="G72"/codon_start=1/product="putative schizophrenia protein variant 4"/protein_id="AAO73604.1"/db_xref="GI:29423472" 0 0 0 0 6 4 2 1 3 3 2 3 1 2 0 6 2 0 3 0 1 1 0 1 0 2 3 0 0 3 3 2 1 3 1 1 1 0 4 4 2 2 1 5 2 4 8 4 2 1 2 3 0 2 2 4 2 1 4 4 2 0 0 1 >HSMH3C2R\X04481 K01236\37..2295\2259\CAA28169.1\Homo sapiens\Human mRNA for complement component C2./codon_start=1/protein_id="CAA28169.1"/db_xref="GI:34628"/db_xref="GOA:P06681"/db_xref="UniProt/Swiss-Prot:P06681" 3 10 10 3 4 11 2 15 40 11 2 4 4 20 7 10 17 6 12 16 4 5 10 20 2 10 6 23 3 11 11 23 18 10 1 18 23 6 
13 23 24 19 7 23 6 13 13 26 23 15 6 10 17 7 17 14 2 23 7 20 13 0 1 0 >BC004492\BC004492\6..740\735\AAH04492.1\Homo sapiens\Homo sapiens DKFZP586A0522 protein, mRNA (cDNA clone MGC:11081IMAGE:3689692), complete cds./gene="DKFZP586A0522"/codon_start=1/product="DKFZP586A0522 protein"/protein_id="AAH04492.1"/db_xref="GI:13325370"/db_xref="GeneID:25840" 1 3 3 0 4 2 0 3 20 2 0 5 0 3 1 3 5 1 1 4 1 2 2 5 1 3 3 7 1 5 2 5 6 0 1 2 16 0 4 9 10 1 2 7 4 2 2 14 1 4 6 3 6 3 10 9 2 6 4 4 8 0 1 0 >HSU94586\U94586\91..336\246\AAB52726.1\Homo sapiens\Homo sapiens NADH:ubiquinone oxidoreductase MLRQ subunit mRNA,complete cds./codon_start=1/product="NADH:ubiquinone oxidoreductase MLRQ subunit"/protein_id="AAB52726.1"/db_xref="GI:1946692" 0 1 0 2 1 0 0 3 4 0 0 3 1 0 0 0 2 0 1 0 0 2 4 3 0 0 2 1 0 1 3 0 0 2 1 0 2 1 1 6 2 4 1 2 0 1 1 1 1 4 3 1 0 1 3 2 0 3 1 1 2 1 0 0 >BC093838\BC093838\64..627\564\AAH93838.1\Homo sapiens\Homo sapiens hypothetical protein BC011880, mRNA (cDNA cloneMGC:120873 IMAGE:7939683), complete cds./gene="LOC113444"/codon_start=1/product="hypothetical protein LOC113444"/protein_id="AAH93838.1"/db_xref="GI:62740001"/db_xref="GeneID:113444" 0 0 0 1 0 5 3 3 7 5 1 2 2 4 0 6 2 5 2 3 0 2 6 2 0 4 7 5 0 3 4 9 2 1 0 2 8 4 3 3 2 1 2 4 6 3 5 5 3 3 1 4 2 4 2 6 0 1 3 5 9 0 0 1 >AF221069\AF221069\10..795\786\AAF26448.1\Homo sapiens\Homo sapiens Claudin-18 mRNA, complete cds./codon_start=1/product="Claudin-18"/protein_id="AAF26448.1"/db_xref="GI:6715518" 1 2 1 0 0 4 1 4 14 2 0 0 3 8 0 4 4 2 6 12 0 3 2 2 0 2 3 15 3 4 5 12 7 6 3 4 12 3 4 5 8 0 2 6 2 1 4 3 6 1 7 4 7 1 8 4 1 9 5 15 4 1 0 0 >AF012629\AF012629\1..780\780\AAB67110.1\Homo sapiens\Homo sapiens antagonist decoy receptor for TRAIL/Apo-2L (TRID)mRNA, complete cds./gene="TRID"/codon_start=1/product="antagonist decoy receptor for TRAIL/Apo-2L"/protein_id="AAB67110.1"/db_xref="GI:2338431" 0 0 4 0 2 2 3 1 3 1 0 0 3 4 0 5 7 3 9 16 1 8 11 2 5 7 1 12 1 12 2 1 8 2 1 8 6 5 4 4 6 4 3 7 1 4 14 10 1 4 3 0 6 9 4 2 1 5 2 8 1 0 0 1 
>BT007015\BT007015\1..1251\1251\AAP35661.1\Homo sapiens\Homo sapiens cyclin G associated kinase mRNA, complete cds./codon_start=1/product="cyclin G associated kinase"/protein_id="AAP35661.1"/db_xref="GI:30582869" 0 5 7 0 1 3 0 10 25 2 0 2 3 14 7 9 13 3 5 6 7 1 15 17 11 9 5 21 6 10 6 19 9 3 0 3 12 0 6 14 9 4 4 16 6 0 4 15 20 3 1 2 3 1 12 6 0 3 3 6 9 0 1 0 >BC002631\BC002631\248..1429\1182\AAH02631.1\Homo sapiens\Homo sapiens matrix metalloproteinase 28, transcript variant 2,mRNA (cDNA clone MGC:4164 IMAGE:3610296), complete cds./gene="MMP28"/codon_start=1/product="matrix metalloproteinase 28, preproprotein isoform 2"/protein_id="AAH02631.1"/db_xref="GI:12803593"/db_xref="GeneID:79148"/db_xref="MIM:608417" 1 20 3 3 3 5 4 8 25 1 0 4 4 7 1 1 8 2 2 7 2 4 4 13 3 4 5 16 14 6 5 15 9 2 1 8 12 3 7 8 8 3 7 14 10 2 4 16 12 9 10 2 2 0 15 6 0 5 2 4 12 0 0 1 >AL354750#4\AL354750\join(127998..128040,AL356053.14:42950..43060, AL356053.14:46033..46143,AL356053.14:51859..51972, AL356053.14:68600..68710,AL356053.14:83714..83812, AL356053.14:85867..85977,AL356053.14:86312..86437, AL356053.14:161995..162078,AL356053.14:171350..171433, AL591850.4:7143..7229,AL591850.4:17440..17523, AL591850.4:28168..28251,AL591850.4:28501..28584, AL591850.4:29001..29084,AL365203.19:2001..2030, AL365203.19:2710..2832,AL365203.19:5921..6046, AL365203.19:8514..8603,AL365203.19:30461..30507)\-5326254\CAI39705.1\Homo sapiens\Human DNA sequence from clone RP11-195O1 on chromosome 10 Containsthe 3' end of a novel gene novel protein (FLJ32762) (FLJ25419), anuclear DNA-binding protein (C1D) pseudogene, the 5' end of a novelgene (FLJ13031) and two CpG islands, complete sequence./gene="RP11-479G22.1"/locus_tag="RP11-479G22.1-001"/standard_name="OTTHUMP00000019424"/codon_start=1/product="novel protein"/protein_id="CAI39705.1"/db_xref="GI:57162332"/db_xref="UniProt/TrEMBL:Q5VTK2" 1 0 0 2 10 4 7 13 11 20 21 21 6 8 1 16 6 6 6 9 0 18 10 12 0 10 6 7 1 4 11 3 5 4 5 2 10 10 20 7 3 15 12 15 6 12 7 4 3 11 2 25 10 18 15 39 11 
8 24 14 13 12 7 12 >HS108K11#3\Z85986\join(83137..83267,97697..97748,98876..98998, 101522..101605,105499..105622,107847..108001, 115737..115839,116695..116756,123349..123466, 124853..124976,125464..125559,126534..126628, 127425..127555)\1398\CAB39180.1\Homo sapiens\Human DNA sequence from clone RP1-108K11 on chromosome 6p21Contains the SFRS3 for splicing factor arginine/serine-rich(SRP20), the STK38 gene for serine/threonine protein kinase (NDR),the 3' end of a gene for a novel protein (2410004NRIK) and a CpGisland, complete sequence./gene="STK38"/locus_tag="RP1-108K11.2-001"/standard_name="OTTHUMP00000016293"/codon_start=1/product="serine\/threonine kinase 38"/protein_id="CAB39180.1"/db_xref="GI:4495062"/db_xref="GOA:Q15208"/db_xref="InterPro:IPR000719"/db_xref="InterPro:IPR000961"/db_xref="InterPro:IPR001245"/db_xref="InterPro:IPR002290"/db_xref="InterPro:IPR008271"/db_xref="InterPro:IPR011009"/db_xref="UniProt/Swiss-Prot:Q15208" 2 1 3 5 11 6 6 6 6 12 3 8 4 5 1 7 4 6 15 6 2 8 5 2 0 11 14 4 1 5 8 9 4 2 4 1 11 6 25 16 11 6 4 7 5 8 24 22 13 12 8 7 3 3 11 13 7 10 9 19 8 0 1 0 >CR457017\CR457017\1..414\414\CAG33298.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834H055D forgene CRABP1, cellular retinoic acid binding protein 1; completecds, incl. 
stopcodon./gene="CRABP1"/codon_start=1/protein_id="CAG33298.1"/db_xref="GI:48146151"/db_xref="GOA:P29762"/db_xref="UniProt/Swiss-Prot:P29762" 0 4 1 1 1 2 0 0 6 3 1 0 0 2 0 0 2 1 1 7 3 3 0 2 1 0 1 6 2 1 2 4 2 1 1 3 6 0 2 7 5 2 1 2 2 0 3 10 5 4 2 1 3 0 4 2 0 5 1 3 3 1 0 0 >BC036298\BC036298\79..1335\1257\AAH36298.2\Homo sapiens\Homo sapiens serine (or cysteine) proteinase inhibitor, clade H(heat shock protein 47), member 1, (collagen binding protein 1),mRNA (cDNA clone MGC:25195 IMAGE:4748644), complete cds./gene="SERPINH1"/codon_start=1/product="serine (or cysteine) proteinase inhibitor, clade H, member 1, precursor"/protein_id="AAH36298.2"/db_xref="GI:37589973"/db_xref="GeneID:871"/db_xref="MIM:600943" 2 14 4 1 0 0 3 10 31 3 2 5 5 8 4 0 16 0 2 10 4 3 1 10 0 4 5 26 8 4 1 15 5 3 0 4 23 1 4 30 11 0 2 11 12 2 2 25 21 3 9 1 2 0 14 2 0 10 2 14 4 0 1 0 >BC063793\BC063793\10..936\927\AAH63793.1\Homo sapiens\Homo sapiens transcriptional regulator protein, mRNA (cDNA cloneMGC:70553 IMAGE:6502502), complete cds./gene="HCNGP"/codon_start=1/product="transcriptional regulator protein"/protein_id="AAH63793.1"/db_xref="GI:39795356"/db_xref="GeneID:29115" 3 0 2 1 4 1 1 3 3 2 1 3 3 4 4 7 6 1 5 16 5 3 5 5 1 4 4 12 4 8 3 10 3 2 1 3 6 5 9 22 6 3 2 8 1 1 17 18 8 17 5 4 0 2 1 5 2 10 5 6 2 0 0 1 >AF152514\AF152514\1..2454\2454\AAD43774.1\Homo sapiens\Homo sapiens protocadherin gamma A7 short form protein(PCDH-gamma-A7) variable region sequence, complete cds./gene="PCDH-gamma-A7"/function="cell-cell adhesion"/codon_start=1/product="protocadherin gamma A7 short form protein"/protein_id="AAD43774.1"/db_xref="GI:5457076" 1 8 12 1 10 5 6 20 44 9 8 12 11 21 3 10 17 8 18 14 10 14 12 18 7 15 7 21 11 16 12 13 13 10 9 18 37 6 13 10 21 15 6 26 12 4 27 27 35 22 16 12 3 3 16 11 11 18 6 12 4 0 0 1 >HSM805396\AL834156\join(890..2041,2041..2295)\1407\CAD38862.1\Homo sapiens\Homo sapiens mRNA; cDNA DKFZp761P2012 (from clone DKFZp761P2012)./gene="DKFZp761P2012"/codon_start=1/product="hypothetical 
protein"/protein_id="CAD38862.1"/db_xref="GI:21739633"/db_xref="UniProt/TrEMBL:Q8N3H9" 3 17 17 5 0 2 1 4 41 0 0 6 4 7 2 3 8 4 1 17 2 3 9 16 6 6 8 12 8 8 6 16 9 2 1 1 19 3 1 12 9 3 1 30 11 7 2 24 13 8 16 3 5 0 10 3 2 7 5 13 6 0 0 1 >AF121855\AF121855\181..1395\1215\AAD27828.1\Homo sapiens\Homo sapiens sorting nexin 5 (SNX5) mRNA, complete cds./gene="SNX5"/codon_start=1/product="sorting nexin 5"/protein_id="AAD27828.1"/db_xref="GI:4689250" 3 3 4 0 6 5 7 8 11 9 4 6 3 4 1 10 5 5 8 5 2 4 2 5 0 5 5 7 2 12 2 1 2 4 3 6 9 10 24 19 8 8 4 12 4 7 22 18 14 14 5 6 3 2 7 15 3 3 9 8 1 0 0 1 >BC029796\BC029796\322..681\360\AAH29796.1\Homo sapiens\Homo sapiens hypothetical protein BC014011, mRNA (cDNA cloneMGC:35369 IMAGE:5183143), complete cds./gene="LOC116349"/codon_start=1/product="LOC116349 protein"/protein_id="AAH29796.1"/db_xref="GI:20987520"/db_xref="GeneID:116349" 1 2 3 2 3 1 0 2 2 0 0 3 2 3 2 1 1 2 0 4 1 2 1 2 2 4 3 4 3 3 0 7 6 2 1 4 1 1 1 1 2 3 3 4 2 2 3 5 0 0 0 0 0 2 1 2 1 0 0 4 2 0 0 1 >BC053592\BC053592\488..799\312\AAH53592.1\Homo sapiens\Homo sapiens uridine phosphorylase 1, mRNA (cDNA cloneIMAGE:5724026), complete cds./gene="UPP1"/codon_start=1/product="UPP1 protein"/protein_id="AAH53592.1"/db_xref="GI:31566145"/db_xref="GeneID:7378"/db_xref="MIM:191730" 0 3 1 1 0 1 0 4 6 0 0 1 0 2 1 0 6 0 0 2 1 0 0 0 1 1 2 7 2 1 0 3 3 0 0 2 5 0 1 5 1 2 2 5 0 0 2 5 3 1 3 3 4 1 2 1 0 3 0 3 0 0 0 1 >AY459291\AY459291\1..240\240\AAR23236.1\Homo sapiens\Homo sapiens HCV-NS5ATP5 binding protein 1 mRNA, complete cds./codon_start=1/product="HCV-NS5ATP5 binding protein 1"/protein_id="AAR23236.1"/db_xref="GI:38505469" 0 0 0 0 2 0 0 1 1 1 2 0 3 3 0 2 1 0 1 3 0 3 5 1 0 5 2 2 0 0 2 0 2 2 1 2 2 0 1 0 1 2 1 2 3 2 0 0 2 0 0 1 0 0 4 4 1 0 1 3 2 0 0 1 >AY230251\AY230251\1..873\873\AAO73561.1\Homo sapiens\Homo sapiens N-acetyltransferase 2 (NAT2) gene, NAT2*12D allele,complete cds./gene="NAT2"/allele="NAT2*12D"/codon_start=1/product="N-acetyltransferase 2"/protein_id="AAO73561.1"/db_xref="GI:29423786" 
1 0 2 0 8 3 1 6 8 7 5 7 2 5 0 7 2 0 10 6 2 5 4 2 0 6 1 1 0 4 4 5 7 3 1 3 5 5 9 5 7 9 4 11 4 2 12 10 6 5 8 6 2 4 3 12 3 6 13 7 4 0 1 0 >AY663396#2\AY663396\complement(join(37566..37720,38077..38358,38772..39020, 42791..42872))\768\AAU87981.1\Homo sapiens\Homo sapiens voucher Coriell Cell Repository DNA sample NA14660 MHCclass II antigen (HLA-DQB1) gene, HLA-DQB1-DQB1*050101 allele andMHC class II antigen (HLA-DQA1) gene, complete cds; and MHC classII antigen (HLA-DRB1) gene, partial cds./gene="HLA-DQA1"/codon_start=1/product="MHC class II antigen"/protein_id="AAU87981.1"/db_xref="GI:52840181" 0 1 1 1 2 1 1 7 12 2 0 4 3 4 0 7 4 2 5 8 0 3 2 4 1 8 2 5 0 8 3 7 4 8 1 5 13 4 5 5 9 2 2 6 5 2 3 14 6 4 6 1 2 3 7 6 0 7 6 6 5 0 0 1 >BC062746\BC062746\47..538\492\AAH62746.1\Homo sapiens\Homo sapiens lipocalin 6, mRNA (cDNA clone MGC:72111IMAGE:6452008), complete cds./gene="LCN6"/codon_start=1/product="lipocalin 6"/protein_id="AAH62746.1"/db_xref="GI:38540969"/db_xref="GeneID:158062"/db_xref="MIM:609379" 1 0 2 0 2 2 0 3 16 3 0 2 2 3 1 1 3 2 1 4 1 3 1 4 0 1 0 8 1 3 3 5 6 0 0 3 12 0 0 5 6 1 0 7 1 0 2 9 6 0 2 1 0 1 6 3 2 2 0 5 5 0 1 0 >HSPMP35HM\Y12860\122..1045\924\CAA73367.1\Homo sapiens\Homo sapiens mRNA for peroxisomal integral membrane protein./codon_start=1/product="peroxisomal integral membrane protein"/protein_id="CAA73367.1"/db_xref="GI:2808531"/db_xref="GOA:O43808"/db_xref="UniProt/Swiss-Prot:O43808" 3 1 3 3 6 2 2 11 13 10 3 5 2 6 1 2 3 2 11 5 2 5 5 3 0 1 7 8 1 7 10 2 4 4 6 5 12 6 15 4 4 7 4 7 5 3 8 3 1 6 4 6 2 0 6 12 1 6 9 8 4 0 0 1 >BC012803\BC012803\84..1583\1500\AAH12803.1\Homo sapiens\Homo sapiens CDC20 cell division cycle 20 homolog (S. 
cerevisiae),mRNA (cDNA clone MGC:2777 IMAGE:2959588), complete cds./gene="CDC20"/codon_start=1/product="cell division cycle 20"/protein_id="AAH12803.1"/db_xref="GI:15215409"/db_xref="GeneID:991"/db_xref="MIM:603618" 7 8 10 3 1 2 4 7 22 5 0 6 3 13 1 7 15 17 5 10 1 4 6 9 2 13 15 25 5 7 4 16 7 9 5 4 17 4 10 13 12 9 5 22 11 9 9 14 7 12 7 5 3 4 4 3 1 14 3 7 16 0 0 1 >AL355987#23\AL355987\complement(join(140941..140981,141391..141484, 141773..141933,142305..142356,144666..144743))\426\CAI12698.1\Homo sapiens\Human DNA sequence from clone RP11-216L13 on chromosome 9 Containsthe 3' end of the gene for a novel protein similar to mousepancreatitis-induced protein 49 (PIP49), the 5' end of a novelgene (FLJ30346), five novel genes (containing LOC158062, KIAA1984,FLJ35320, FLJ36901, FLJ30985, FLJ10101, FLJ37063 FLJ13045,FLJ40767), a nucleolin (NCL) pseudogene, the gene for a novelprotein similar to mouse lipocalin 8 (EP17, Lcn5), a 13kD lysosomalH+ transporting ATPase V1 subunit G isoform 1 (ATP6V1G1)pseudogene, the gene for phosphohistidine phosphatase (PHP14), thegene for a novel protein similar to rat apical early endosomalglycoprotein (LOC252882), the EDF1 gene for endothelialdifferentiation-related factor 1 (MBF1, EDF-1), the 5' end of theTRAF2 gene for TNF receptor-associated factor 2 (TRAP, TRAP3) andten CpG islands, complete sequence./gene="EDF1"/locus_tag="RP11-216L13.1-003"/standard_name="OTTHUMP00000022618"/codon_start=1/product="endothelial differentiation-related factor 1"/protein_id="CAI12698.1"/db_xref="GI:55958759"/db_xref="GOA:O60869"/db_xref="GOA:Q5T5T2"/db_xref="UniProt/Swiss-Prot:O60869" 1 1 4 1 2 2 0 0 6 2 1 0 0 2 0 1 3 1 1 3 6 1 1 1 0 1 1 9 2 4 3 4 1 1 0 0 10 0 5 10 3 2 2 10 1 2 1 9 6 2 0 1 0 0 0 0 1 5 2 1 2 0 0 1 >AP001748#1\AP001748 AL163293 BA000005\join(160243..160290,163349..163476,165911..166082, 168926..169096,172766..172865,173991..174088, 177161..177289,180729..180805,184560..184732, 185748..185959)\1308\BAA95533.1\Homo sapiens\Homo sapiens 
genomic DNA, chromosome 21q, section 92/105./gene="PKNOX1"/codon_start=1/product="homeobox-containing protein"/protein_id="BAA95533.1"/db_xref="GI:7768746" 2 1 2 0 3 4 2 5 10 5 7 9 5 6 3 9 10 8 17 7 4 5 10 6 5 9 6 10 5 7 7 6 7 3 6 10 14 4 11 9 14 4 12 30 2 5 14 14 8 15 3 3 3 4 4 6 1 10 9 12 3 0 1 0 >AF401998\AF401998\1..1149\1149\AAK94915.1\Homo sapiens\Homo sapiens muscleblind 41kD isoform (MBNL) mRNA, complete cds./gene="MBNL"/codon_start=1/product="muscleblind 41kD isoform"/protein_id="AAK94915.1"/db_xref="GI:15290633" 4 1 4 2 3 3 3 1 4 4 7 7 5 2 1 4 5 3 15 6 3 6 14 9 3 9 19 20 0 18 5 3 3 5 5 6 4 9 10 6 10 10 11 14 2 8 7 6 6 3 5 3 9 5 3 8 5 4 5 16 1 0 1 0 >AY166584\AY166584\1..2022\2022\AAO27704.1\Homo sapiens\Homo sapiens vasorin mRNA, complete cds./codon_start=1/product="vasorin"/protein_id="AAO27704.1"/db_xref="GI:37725933" 4 20 18 2 0 5 3 30 73 2 0 6 2 10 3 3 22 0 13 16 10 6 17 32 13 13 8 36 7 9 2 29 25 3 1 13 25 0 1 7 20 2 0 29 11 2 3 32 19 3 9 1 15 4 10 4 0 9 2 6 3 1 0 0 >BC021988\BC021988\95..823\729\AAH21988.1\Homo sapiens\Homo sapiens Nedd4 family interacting protein 2, mRNA (cDNA cloneMGC:23729 IMAGE:4101140), complete cds./gene="NDFIP2"/codon_start=1/product="NDFIP2 protein"/protein_id="AAH21988.1"/db_xref="GI:18314412"/db_xref="GeneID:54602" 0 1 0 0 6 3 1 2 2 11 2 2 5 3 1 6 1 4 5 3 0 6 8 2 2 5 12 0 1 12 6 3 3 2 2 2 5 3 3 1 4 5 1 8 2 1 9 6 3 8 3 9 1 3 10 9 3 4 6 6 5 0 1 0 >HUMC1A1\M20789\join(120..222,1678..1872,2011..2045,2148..2183,2274..2375, 3099..3170,3405..3449,3598..3651,3813..3866,4359..4412, 4529..4582,4856..4909,4999..5043,5160..5213,5328..5372, 5551..5604,5864..5962,6023..6067,6170..6268,6396..6449, 6666..6773,6870..6923,7048..7146,7308..7361,7450..7548)\1767\AAB59373.1\Homo sapiens\Human, alpha 1 collagen type I gene, exons 1-25, clone RMS-8./gene="COL1A1"/codon_start=1/product="alpha-1 type I collagen"/protein_id="AAB59373.1"/db_xref="GI:179594"/db_xref="GDB:G00-119-061" 9 2 2 10 2 2 0 9 8 1 1 3 4 1 0 6 7 1 1 8 1 6 8 54 0 62 3 11 1 37 43 
45 5 75 1 6 6 4 8 13 6 4 10 8 2 1 15 17 12 12 1 2 8 2 7 2 0 5 2 6 1 0 0 0 >AF490768\AF490768\89..4792\4704\AAM18046.1\Homo sapiens\Homo sapiens KIAA0184 protein mRNA, complete cds./codon_start=1/product="KIAA0184 protein"/protein_id="AAM18046.1"/db_xref="GI:20269774" 6 14 25 7 18 22 10 31 95 14 8 16 13 25 16 21 30 17 25 51 12 23 22 35 20 32 36 54 15 34 28 33 34 15 5 47 76 20 27 42 21 10 7 45 31 9 22 56 45 28 26 13 18 19 28 12 8 34 13 30 18 0 0 1 >BC007537\BC007537\342..2690\2349\AAH07537.1\Homo sapiens\Homo sapiens CSRP2 binding protein, transcript variant 1, mRNA(cDNA clone MGC:15388 IMAGE:3350378), complete cds./gene="CSRP2BP"/codon_start=1/product="CSRP2 binding protein, isoform a"/protein_id="AAH07537.1"/db_xref="GI:14043103"/db_xref="GeneID:57325" 5 5 6 4 14 19 2 8 28 11 7 15 12 11 3 20 12 16 12 12 7 11 18 12 5 18 13 15 1 15 19 10 8 7 6 11 14 11 28 23 6 6 8 23 10 6 33 27 25 27 14 14 4 9 13 18 5 13 17 18 12 0 0 1 >BC034737\BC034737\225..2123\1899\AAH34737.1\Homo sapiens\Homo sapiens ligand of numb-protein X, mRNA (cDNA clone MGC:34086IMAGE:5194423), complete cds./gene="LNX"/codon_start=1/product="multi-PDZ-domain-containing protein"/protein_id="AAH34737.1"/db_xref="GI:21961543"/db_xref="GeneID:84708" 8 4 8 5 16 9 4 10 21 8 6 11 6 12 2 6 21 12 10 6 1 9 16 12 1 12 13 15 3 10 16 15 9 10 5 14 18 9 16 9 17 15 3 13 7 10 21 16 18 16 8 8 5 6 6 7 6 16 19 8 9 0 1 0 >BC071982\BC071982\116..676\561\AAH71982.1\Homo sapiens\Homo sapiens cDNA clone MGC:88679 IMAGE:5477638, complete cds./codon_start=1/product="Unknown (protein for MGC:88679)"/protein_id="AAH71982.1"/db_xref="GI:47938380" 1 1 2 1 2 2 0 2 12 7 2 1 6 3 0 6 3 1 4 2 2 3 3 0 0 2 5 4 1 1 2 1 2 2 0 4 4 6 4 8 4 3 2 7 2 4 4 3 1 6 6 2 4 1 5 7 0 5 3 3 2 1 0 0 >CR536527\CR536527\1..783\783\CAG38764.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834F0322D forgene DCK, deoxycytidine kinase; complete cds, incl. 
stopcodon./gene="DCK"/codon_start=1/protein_id="CAG38764.1"/db_xref="GI:49168538" 3 1 1 0 5 2 0 3 5 9 5 5 2 1 0 8 4 5 8 4 0 3 1 1 2 6 3 6 0 2 2 3 4 1 1 1 4 6 11 6 3 10 10 3 0 5 17 11 5 6 0 12 4 2 3 9 2 7 4 6 7 0 0 1 >AF026445\AF026445\266..3865\3600\AAC02966.1\Homo sapiens\Homo sapiens cofactor of initiator function (CIF150) mRNA, completecds./gene="CIF150"/codon_start=1/product="cofactor of initiator function"/protein_id="AAC02966.1"/db_xref="GI:2739087" 4 2 5 3 21 17 11 9 16 30 21 30 21 20 1 24 14 26 25 8 0 26 28 14 5 18 26 14 2 29 21 9 5 10 16 12 22 28 49 37 21 38 13 29 18 34 45 28 18 37 14 26 4 22 18 41 20 11 27 39 17 0 0 1 >BC057777\BC057777\42..1121\1080\AAH57777.1\Homo sapiens\Homo sapiens guanine nucleotide binding protein (G protein), qpolypeptide, mRNA (cDNA clone MGC:71577 IMAGE:30337698), completecds./gene="GNAQ"/codon_start=1/product="guanine nucleotide binding protein (G protein), q polypeptide"/protein_id="AAH57777.1"/db_xref="GI:35505494"/db_xref="GeneID:2776"/db_xref="MIM:600998" 6 4 5 0 9 3 3 7 12 4 5 2 3 4 1 5 2 6 6 6 4 1 4 4 0 3 3 11 2 5 4 3 4 0 4 8 6 5 6 16 7 8 7 9 3 3 11 18 16 12 9 9 5 0 9 8 2 18 5 11 3 1 0 0 >AY720432\AY720432\1..3858\3858\AAU14134.1\Homo sapiens\Homo sapiens crumbs-like protein 2 precursor (CRB2) mRNA, completecds./gene="CRB2"/codon_start=1/product="crumbs-like protein 2 precursor"/protein_id="AAU14134.1"/db_xref="GI:51921989" 6 35 16 8 4 7 5 45 82 8 0 10 8 16 10 13 20 9 6 33 12 10 21 45 25 29 17 67 27 25 8 82 39 25 3 16 47 1 0 5 13 11 1 40 30 12 9 60 38 20 10 3 74 35 33 10 1 11 2 8 19 0 1 0 >CR456765\CR456765\1..930\930\CAG33046.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834D054D forgene TOMM34, translocase of outer mitochondrial membrane 34;complete cds, incl. 
stopcodon./gene="TOMM34"/codon_start=1/protein_id="CAG33046.1"/db_xref="GI:48145647" 2 4 4 0 7 2 2 9 13 2 0 7 5 5 2 5 6 4 4 2 2 2 2 5 0 7 10 12 3 9 3 5 2 2 2 1 10 5 10 20 12 5 2 9 3 2 12 13 8 6 8 5 5 2 4 1 0 4 4 5 2 1 0 0 >AK075407\AK075407\73..699\627\BAC11600.1\Homo sapiens\Homo sapiens cDNA PSEC0098 fis, clone NT2RP2002695, highly similarto Mesenchymal stem cell protein DSCD75./codon_start=1/protein_id="BAC11600.1"/db_xref="GI:22761475" 0 22 3 1 0 2 1 4 34 1 0 4 1 1 3 0 3 2 0 4 1 0 0 5 3 1 0 5 13 3 0 4 5 1 0 7 10 0 0 1 3 0 0 8 7 0 0 11 8 2 4 0 5 0 5 3 0 1 0 3 3 0 0 1 >BC006401\BC006401\145..1317\1173\AAH06401.1\Homo sapiens\Homo sapiens Fas apoptotic inhibitory molecule 3, mRNA (cDNA cloneMGC:12770 IMAGE:4308958), complete cds./gene="FAIM3"/codon_start=1/product="Fas apoptotic inhibitory molecule 3"/protein_id="AAH06401.1"/db_xref="GI:13623573"/db_xref="GeneID:9214"/db_xref="MIM:606015" 5 7 4 1 3 12 1 5 24 5 0 2 7 9 2 7 6 4 7 15 1 3 12 12 6 7 7 16 3 5 9 9 7 2 7 4 9 7 6 11 5 3 5 10 9 3 10 11 8 2 9 5 4 2 8 2 1 10 0 9 5 0 0 1 >BC002357\BC002357\211..1644\1434\AAH02357.1\Homo sapiens\Homo sapiens protoporphyrinogen oxidase, mRNA (cDNA clone MGC:8485IMAGE:2821983), complete cds./gene="PPOX"/codon_start=1/product="protoporphyrinogen oxidase"/protein_id="AAH02357.1"/db_xref="GI:12803107"/db_xref="GeneID:5498"/db_xref="MIM:600923" 1 8 8 6 2 9 10 14 29 5 4 8 8 6 0 6 11 14 4 6 0 7 7 10 2 15 8 19 2 19 18 16 10 5 2 11 17 7 3 5 4 3 7 17 9 5 6 22 8 3 3 3 5 3 5 7 2 5 6 5 7 0 0 1 >BC069499\BC069499\267..1526\1260\AAH69499.1\Homo sapiens\Homo sapiens potassium inwardly-rectifying channel, subfamily J,member 5, mRNA (cDNA clone MGC:97076 IMAGE:7262288), complete cds./gene="KCNJ5"/codon_start=1/product="potassium inwardly-rectifying channel J5"/protein_id="AAH69499.1"/db_xref="GI:46854375"/db_xref="GeneID:3762"/db_xref="MIM:600734" 2 7 8 1 1 4 1 17 18 0 0 3 0 11 1 3 6 2 8 14 2 2 5 9 0 3 4 12 1 6 1 18 9 2 0 12 13 4 2 19 13 3 2 13 5 3 11 26 15 6 7 5 5 4 18 4 1 14 10 15 8 0 0 1 
>HSGDF5\X80915\640..2145\1506\CAA56874.1\Homo sapiens\H.sapiens Gdf5 gene./gene="Gdf5"/codon_start=1/protein_id="CAA56874.1"/db_xref="GI:671525"/db_xref="GOA:P43026"/db_xref="InterPro:IPR001111"/db_xref="InterPro:IPR001839"/db_xref="InterPro:IPR002400"/db_xref="UniProt/Swiss-Prot:P43026" 5 8 14 1 4 11 0 7 33 4 0 8 0 8 4 4 10 3 6 10 4 4 12 30 3 3 7 26 1 7 10 20 11 4 0 7 15 0 8 30 10 4 2 16 6 2 4 23 19 6 6 4 8 2 14 6 0 10 4 6 7 0 1 0 >HUMFKMKA\M69148\1..432\432\AAA58478.1\Homo sapiens\Human midkine mRNA, complete cds./gene="hMK-1"/codon_start=1/product="midkine"/protein_id="AAA58478.1"/db_xref="GI:182651" 1 5 1 0 0 1 0 6 3 0 0 0 0 1 0 0 3 0 1 11 0 0 0 5 1 0 1 7 4 2 2 8 5 2 0 3 3 0 6 17 2 1 1 5 1 0 0 6 2 3 2 0 9 1 2 2 0 2 0 1 4 0 1 0 >AL606537\AL606537\join(complement(AC011700.5:23459..25185), complement(AC011700.5:16449..16554), complement(AC011700.5:10004..10200),96915..97122)\-1164\CAI15309.1\Homo sapiens\Human DNA sequence from clone RP11-53A1 on chromosome 1 Containsthe 3' end of the PROX1 gene for prospero-related homeobox protein1, complete sequence./gene="PROX1"/locus_tag="RP11-53A1.1-001"/standard_name="OTTHUMP00000035023"/codon_start=1/product="prospero-related homeobox 1"/protein_id="CAI15309.1"/db_xref="GI:55961898"/db_xref="GOA:Q5SW76"/db_xref="InterPro:IPR007738"/db_xref="UniProt/TrEMBL:Q5SW76" 1 1 1 0 22 11 13 16 9 15 14 14 18 10 1 26 6 16 11 9 0 15 13 10 1 15 11 5 0 10 9 5 8 3 9 8 8 10 37 18 10 19 10 19 5 17 28 12 6 12 4 16 9 18 17 23 17 13 21 15 9 17 9 11 >BC099842\BC099842\31..405\375\AAH99842.1\Homo sapiens\Homo sapiens TP53TG3 protein, transcript variant 2, mRNA (cDNAclone MGC:119886 IMAGE:40015192), complete cds./gene="TP53TG3"/codon_start=1/product="hypothetical protein LOC24150, isoform 2"/protein_id="AAH99842.1"/db_xref="GI:71122422"/db_xref="GeneID:24150" 3 2 2 0 3 0 2 1 2 0 0 3 5 3 1 3 3 2 3 2 1 2 3 7 0 7 4 6 1 4 3 1 3 4 0 0 0 4 0 1 0 1 1 1 0 2 2 4 3 0 0 1 4 4 1 3 1 2 0 1 2 1 0 0 
>AL357033#1\AL357033\complement(join(5687..6079,6733..6774))\435\CAI17067.1\Homo sapiens\Human DNA sequence from clone RP11-93B14 on chromosome 20 Containsthe SLC21A12 gene for solute carrier family 21 (organic aniontransporter) member 12, the 5' end of the NTSR1 gene encoding thehigh affinity neurotensin receptor 1, three novel genes, the genefor a novel protein (FLJ32154) and four CpG islands, completesequence./gene="RP11-93B14.6"/locus_tag="RP11-93B14.6-001"/standard_name="OTTHUMP00000031495"/codon_start=1/protein_id="CAI17067.1"/db_xref="GI:55959056"/db_xref="UniProt/TrEMBL:Q5JT57" 2 7 9 1 0 2 0 5 5 1 0 1 0 5 2 2 1 1 1 0 5 2 1 3 12 2 2 7 4 1 5 10 5 3 0 1 2 1 1 2 1 0 1 9 2 0 3 2 2 1 0 0 2 0 2 1 0 0 0 1 3 0 0 1 >AF053318\AF053318\245..1123\879\AAD02685.1\Homo sapiens\Homo sapiens CCR4-associated factor 1 (POP2) mRNA, complete cds./gene="POP2"/codon_start=1/product="CCR4-associated factor 1"/protein_id="AAD02685.1"/db_xref="GI:4106061" 2 0 2 2 0 4 2 3 9 12 3 5 5 5 0 3 4 5 9 0 0 1 4 1 0 2 4 5 1 3 8 6 2 3 1 1 9 5 6 9 5 9 1 17 2 4 12 12 8 11 5 7 2 3 6 10 2 7 9 11 3 0 0 1 >S74499\S74499\25..342\318\AAB33275.1\Homo sapiens\Ig VH4=Ig heavy chain variable region {clone D21} [human, systemiclupus erythematosus, mRNA Partial, 343 nt]./gene="Ig VH4"/codon_start=1/product="Ig heavy chain variable region"/protein_id="AAB33275.1"/db_xref="GI:807018" 1 0 1 1 1 0 0 4 5 0 1 0 3 5 1 1 3 4 0 5 3 1 1 1 1 1 0 4 2 0 2 2 4 2 1 2 2 0 0 4 3 0 0 4 0 0 0 6 3 0 4 2 1 1 3 0 0 2 1 3 4 0 0 0 >BC012303\BC012303\81..425\345\AAH12303.1\Homo sapiens\Homo sapiens PDZK1 interacting protein 1, mRNA (cDNA cloneMGC:21224 IMAGE:4469903), complete cds./gene="PDZK1IP1"/codon_start=1/product="PDZK1 interacting protein 1"/protein_id="AAH12303.1"/db_xref="GI:15126763"/db_xref="GeneID:10158"/db_xref="MIM:607178" 0 1 0 0 0 2 0 5 6 2 0 0 0 1 2 1 3 2 1 2 0 0 1 2 2 2 5 6 2 0 4 4 1 0 0 5 5 1 1 2 2 2 1 4 2 1 1 8 0 2 1 1 1 1 3 1 0 3 1 6 2 1 0 0 >AF121775\AF121775\219..1319\1101\AAF24125.1\Homo sapiens\Homo sapiens 
nasopharyngeal carcinoma susceptibility protein LZ16mRNA, complete cds./codon_start=1/product="nasopharyngeal carcinoma susceptibility protein LZ16"/protein_id="AAF24125.1"/db_xref="GI:6690397" 3 4 8 2 3 8 2 6 15 2 1 2 5 7 2 5 13 3 7 9 12 3 7 8 3 5 6 14 4 2 4 14 11 2 0 5 10 1 13 21 15 1 0 14 6 0 4 24 14 5 6 1 1 2 5 1 0 4 1 8 2 0 0 1 >HSU77396\U77396\234..920\687\AAB36550.1\Homo sapiens\Homo sapiens LPS-Induced TNF-Alpha Factor (LITAF) mRNA, completecds./gene="LITAF"/codon_start=1/product="LPS-Induced TNF-Alpha Factor"/protein_id="AAB36550.1"/db_xref="GI:1684872" 0 1 0 3 0 1 2 4 12 4 1 1 3 8 2 2 5 4 3 5 5 2 9 6 1 15 6 5 3 5 7 1 11 4 1 4 7 2 1 2 4 4 4 8 4 3 3 2 1 1 4 5 4 2 3 3 0 4 2 5 4 0 1 0 >AL591543#2\AL591543\join(54020..54195,54662..54953)\468\CAI17286.1\Homo sapiens\Human DNA sequence from clone RP13-198D9 on chromosome 9 Contains anovel gene (KIAA2015), a novel gene, a tyrosine3-monooxygenase/tryptophan 5-monooxygenase activation protein, betapolypeptide (YWHAB) pseudogene and two CpG islands, completesequence./locus_tag="RP13-198D9.2-003"/standard_name="OTTHUMP00000021401"/codon_start=1/protein_id="CAI17286.1"/db_xref="GI:55961130"/db_xref="UniProt/TrEMBL:Q5SY85" 0 5 7 1 2 3 2 5 11 0 0 2 1 3 0 1 3 0 3 2 0 0 1 2 6 4 6 6 5 2 5 11 5 2 0 1 2 1 0 0 0 0 1 10 6 1 5 2 5 1 0 0 6 1 0 0 0 0 1 1 5 1 0 0 >AL138690\AL138690\join(157463..157756,AL161902.12:69081..69312, AL161902.12:73095..73195,AL161902.12:73454..73549, AL161902.12:76711..76832,AL161902.12:78417..78543, AL161902.12:84158..84277,AL161902.12:86865..87011, AL161902.12:98056..98253,AL161902.12:98867..99000, AL356430.19:21907..22015,AL356430.19:32900..33052, AL356430.19:34420..34588,AL356430.19:41024..41103, AL356430.19:41838..41900,AL356430.19:41987..42084, AL356430.19:47092..47184,AL356430.19:65879..65987, AL356430.19:79384..79465,AL356430.19:79693..79816, AL356430.19:80688..80880,AL356430.19:82626..83642, AL356430.19:85360..85577,AL356430.19:87966..88119, AL356430.19:94882..95062,AL356430.19:97074..97204, 
AL356430.19:100606..100713,AL356430.19:105970..106138, AL356430.19:107586..107681,AL356430.19:119474..119912, AL356430.19:132310..132464,AL356430.19:134663..134789, AL356430.19:156102..156229,AL357083.19:45114..45175, AL357083.19:64253..64317,AL357083.19:103833..103941, AL357083.19:106882..107048,AL161718.14:52938..53062, AL390071.9:14325..14468,AL390071.9:34633..34769, AL390071.9:112710..112827,AL390071.9:113163..113265, AL390071.9:117220..117335,AL390071.9:129138..129250, AL390071.9:146131..146271,AL390071.9:155561..155680, AL390071.9:168659..168811,AL139083.15:19322..19489, AL139083.15:37500..37589,AL139083.15:40896..41088, AL139083.15:43046..43117,AL139083.15:46096..46251, AL139083.15:46820..46990,AL139083.15:56326..56427, AL139083.15:58615..58811,AL139083.15:59609..59760, AL139083.15:62142..62232)\-12986205\CAH72296.1\Homo sapiens\Human DNA sequence from clone RP11-307O13 on chromosome13q12.3-14.2 Contains the 5' end of the NBEA gene for neurobeachin(BCL8B FLJ10197 KIAA1544) and a CpG island, complete sequence./gene="NBEA"/locus_tag="RP11-270C18.1-001"/standard_name="OTTHUMP00000018239"/codon_start=1/product="neurobeachin"/protein_id="CAH72296.1"/db_xref="GI:55662138"/db_xref="InterPro:IPR000409"/db_xref="InterPro:IPR001680"/db_xref="InterPro:IPR010508"/db_xref="UniProt/TrEMBL:Q5W0E7" 7 6 9 1 72 57 32 49 53 51 72 50 59 52 13 80 37 36 74 35 7 43 52 38 12 40 35 39 6 41 47 35 35 29 33 27 46 37 123 80 53 74 52 54 46 52 63 65 34 41 24 55 46 53 59 91 57 51 68 53 47 61 37 58 >AF536216\AF536216\364..2196\1833\AAP46193.1\Homo sapiens\Homo sapiens sodium solute symporter family 5 member 8 protein(SLC5A8) mRNA, complete cds./gene="SLC5A8"/codon_start=1/product="sodium solute symporter family 5 member 8 protein"/protein_id="AAP46193.1"/db_xref="GI:31249543" 1 1 1 3 9 3 4 11 19 11 11 14 9 10 2 7 13 6 18 14 2 11 9 4 1 10 12 16 6 12 24 14 9 9 5 11 26 13 11 6 9 13 12 10 3 3 6 6 8 13 16 14 3 8 19 19 6 14 23 16 11 0 0 1 >HSU13665\U13665\106..1095\990\AAA65233.1\Homo 
sapiens\Human cathepsin O (CTSO) mRNA, complete cds./gene="CTSO"/codon_start=1/product="cathepsin O"/protein_id="AAA65233.1"/db_xref="GI:606923" 3 1 2 1 3 2 3 5 14 3 2 1 0 4 0 8 8 4 2 6 0 3 4 4 0 4 4 10 1 6 11 6 5 7 1 4 14 4 11 17 16 9 3 8 4 4 10 11 6 7 6 12 2 6 3 3 2 6 6 8 9 0 0 1 >AL157935#4\AL157935\complement(join(41341..41409,41654..41845,45156..45272, 46023..46186,46359..46394,46727..46781))\633\CAI12607.1\Homo sapiens\Human DNA sequence from clone RP11-203J24 on chromosome 9 Containsthe 5' end of the ENG gene for endoglin (Osler-Rendu-Weber syndrome1) (END, ORW, HHT1, ORW1, CD105), the AK1 gene for adenylate kinase1, the ST6GALNAC6 gene for CMP-NeuAC:(beta)-N-acetylgalactosaminide(alpha)2,6-sialyltransferase member VI (ST6GALNACVI), the SIAT7Dgene for sialyltransferase 7D((alpha-N-acetylneuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase) (SIAT3C, ST6GALNAC4,ST6GALNACIV), the DPM2 gene for dolichyl-phosphatemannosyltransferase polypeptide 2, regulatory subunit (MGC21559),the gene for a novel protien containing FLJ00179, a novel gene andten CpG islands, complete sequence./gene="AK1"/locus_tag="RP11-203J24.1-009"/standard_name="OTTHUMP00000022219"/codon_start=1/product="adenylate kinase 1"/protein_id="CAI12607.1"/db_xref="GI:55958256"/db_xref="GOA:Q5T9B7"/db_xref="InterPro:IPR000850"/db_xref="InterPro:IPR006267"/db_xref="InterPro:IPR011769"/db_xref="UniProt/TrEMBL:Q5T9B7" 1 2 6 3 3 1 1 4 12 0 0 2 2 6 3 0 2 2 3 10 0 1 1 2 1 3 1 7 0 1 3 11 5 1 0 6 10 1 6 13 1 2 1 7 2 0 4 15 9 3 3 4 3 1 3 2 0 6 3 5 0 0 1 0 >BC027463\BC027463\38..673\636\AAH27463.1\Homo sapiens\Homo sapiens ribosomal protein L13, transcript variant 1, mRNA(cDNA clone MGC:34920 IMAGE:5111302), complete cds./gene="RPL13"/codon_start=1/product="ribosomal protein L13"/protein_id="AAH27463.1"/db_xref="GI:20071962"/db_xref="GeneID:6137"/db_xref="MIM:113703" 1 9 12 3 2 4 0 5 6 0 0 1 0 3 2 2 2 2 0 3 4 1 0 10 4 0 2 16 3 4 2 6 0 1 0 5 7 2 7 18 5 2 1 6 4 0 6 7 2 2 2 1 1 0 
7 0 1 7 2 4 2 1 0 0 >HSU07151\U07151\16..564\549\AAA21654.1\Homo sapiens\Human GTP binding protein (ARL3) mRNA, complete cds./gene="ARL3"/function="GTP binding protein"/standard_name="ARF-like 3"/codon_start=1/evidence=experimental/product="ARL3"/protein_id="AAA21654.1"/db_xref="GI:460625" 1 2 0 0 4 1 2 5 6 4 1 5 3 0 0 3 1 4 4 3 1 1 3 0 0 2 6 2 1 3 3 5 0 4 3 3 3 1 8 7 3 7 2 9 1 1 9 4 6 4 1 2 2 1 1 4 2 8 4 2 4 1 0 0 >AL365181#10\AL365181\join(134628..134712,134944..135297,135488..135604, 135738..135920,136238..136521)\1023\CAI13055.1\Homo sapiens\Human DNA sequence from clone RP11-284F21 on chromosome 1 Containsthe 5' end of the MEF2D gene for MADS box transcription enhancerfactor 2 polypeptide D (myocyte enhancer factor 2D), the IQGAP3gene for IQ motif containing GTPase activating protein 3, a novelgene, the APOA1BP gene for apolipoprotein A-I binding protein, anovel gene (FLJ20249), a novel gene, the gene for brain linkprotein-1 (BRAL1), a novel gene, the 5' end of the BCAN gene forbrevican, the 3' end of a novel gene and three CpG islands,complete sequence./gene="HAPLN2"/locus_tag="RP11-284F21.6-001"/standard_name="OTTHUMP00000031987"/codon_start=1/product="hyaluronan and proteoglycan link protein 2"/protein_id="CAI13055.1"/db_xref="GI:55959666"/db_xref="InterPro:IPR000538"/db_xref="InterPro:IPR003599"/db_xref="InterPro:IPR007110" 3 16 10 1 0 4 2 12 16 2 0 2 0 5 1 2 6 2 1 11 6 0 3 14 6 3 3 17 9 2 6 18 13 2 0 3 14 1 1 5 3 2 4 7 8 2 2 15 12 1 15 2 10 1 11 2 0 9 1 3 8 0 1 0 >HUMCATS\M90696 S39127\137..1132\996\AAC37592.1\Homo sapiens\Homo sapiens cathepsin S (CTSS) mRNA, complete cds./gene="CTSS"/codon_start=1/product="cathepsin S"/protein_id="AAC37592.1"/db_xref="GI:179957" 1 0 3 3 6 0 1 5 13 5 0 4 7 4 0 8 3 5 4 3 1 4 3 3 0 5 5 5 1 8 6 9 4 8 3 2 15 5 17 10 8 11 3 7 6 6 17 2 5 10 9 11 4 7 5 3 1 5 3 11 8 0 1 0 >D63705\D63705\84..719\636\BAA23620.1\Homo sapiens\Homo sapiens mRNA for TFIID subunit p30beta, complete cds./codon_start=1/product="TFIID subunit 
p30beta"/protein_id="BAA23620.1"/db_xref="GI:2645175" 0 2 0 1 1 4 1 1 5 1 3 1 5 3 3 6 0 1 3 4 1 2 3 3 1 3 5 5 1 5 5 2 4 2 2 2 5 6 14 8 2 2 2 7 2 1 12 14 8 10 0 2 0 1 4 1 1 9 2 6 1 0 1 0 >HOSA18281\Y18281\22..579\558\CAB50728.1\Homo sapiens\Homo sapiens mRNA for mannose binding lectin-associated serineprotease-2, alternatively spliced transcript (clone phl-5)./gene="MASP-2"/codon_start=1/product="mannose binding lectin-associated serine protease-2 related protein, MAp19 (19kDa)"/protein_id="CAB50728.1"/db_xref="GI:5459315"/db_xref="GOA:O00187"/db_xref="UniProt/Swiss-Prot:O00187" 0 7 2 1 0 1 0 5 15 1 1 0 1 6 3 0 4 0 1 7 3 2 1 5 3 3 4 8 1 0 1 9 5 1 0 2 4 0 0 6 3 1 0 4 7 0 1 12 10 0 7 2 8 1 10 1 0 0 2 1 2 0 1 0 >AL136373#7\AL136373\join(complement(AL593856.8:21752..22017), complement(AL593856.8:18951..19059), complement(AL593856.8:12382..12432), complement(AL593856.8:11822..11884), complement(AL593856.8:7512..7562), complement(AL593856.8:6276..6323), complement(117968..118063),complement(116001..116108), complement(108583..108777))\-59376\CAI14761.1\Homo sapiens\Human DNA sequence from clone RP11-49P4 on chromosome 1p32.2-33Contains two novel genes (FLJ40412, FLJ32011), the MKNK1 gene forMAP kinase-interacting serine/threonine kinase 1, the gene forMOB3C protein (MOB3C) and the 3' end of the ATPAF1 gene for ATPsynthase mitochondrial F1 complex assembly factor 1, completesequence./gene="ATPAF1"/locus_tag="RP11-49P4.5-002"/standard_name="OTTHUMP00000009681"/codon_start=1/product="ATP synthase mitochondrial F1 complex assembly factor 1"/protein_id="CAI14761.1"/db_xref="GI:55957743"/db_xref="GOA:Q5SXB7"/db_xref="InterPro:IPR010591"/db_xref="UniProt/TrEMBL:Q5SXB7" 4 2 1 0 8 3 4 11 10 4 4 3 2 5 2 6 4 2 4 6 3 8 7 6 3 3 8 10 0 11 6 7 6 6 4 2 4 5 6 6 6 8 6 9 8 3 11 11 4 2 2 4 5 4 7 16 2 5 5 5 4 1 2 3 >HS330O12#4\AL031731\join(29996..30260,32427..32553,32693..32788,32896..33031, 35290..35433)\768\CAI20211.1\Homo sapiens\Human DNA sequence from clone RP3-330O12 on 
chromosome1p36.11-36.23 Contains the FBXO2 gene for F-box only protein 2, thegene for F-box protein FBX30, the FBXO6 gene for F-box only protein6, the MAD2L2 gene for MAD2 mitotic arrest deficient-like 2(yeast), the 5' end of a novel gene and three CpG islands, completesequence./gene="FBXO44"/locus_tag="RP3-330O12.5-001"/standard_name="OTTHUMP00000002063"/codon_start=1/product="F-box protein 44"/protein_id="CAI20211.1"/db_xref="GI:56203289"/db_xref="GOA:Q9H4M3"/db_xref="InterPro:IPR001810"/db_xref="InterPro:IPR007397"/db_xref="UniProt/Swiss-Prot:Q9H4M3" 2 4 3 0 0 5 0 9 17 0 1 0 1 5 1 1 6 0 2 7 1 3 2 7 6 1 1 6 3 2 1 7 6 0 0 6 9 2 2 11 8 2 0 11 6 1 3 13 12 6 8 2 7 0 11 1 0 10 0 2 12 0 0 1 >HSA306634\AJ306634\join(217..289,420..689,931..1206,1786..1826)\660\CAC36395.1\Homo sapiens\Homo sapiens HLA-A gene for MHC class I antigen, A*23N allele,exons 1-7./gene="HLA-A"/function="MHC class I antigen"/codon_start=1/product="HLA-A protein"/protein_id="CAC36395.1"/db_xref="GI:13620230"/db_xref="GOA:Q9BCN1"/db_xref="InterPro:IPR001039"/db_xref="UniProt/TrEMBL:Q9BCN1" 2 9 4 1 2 2 1 7 9 0 0 1 1 5 2 2 3 0 3 6 4 1 2 6 3 0 1 11 9 1 0 6 9 1 0 2 8 0 2 4 3 0 1 12 4 1 0 15 15 1 9 3 2 0 5 1 1 4 0 7 5 0 0 1 >HS333A15#4\AL031429\join(complement(123019..123915),complement(88643..88822), complement(30101..30127), complement(AL158087.12:96271..96346), complement(AL158087.12:89396..89424))\897\CAI20231.1\Homo sapiens\Human DNA sequence from clone RP3-333A15 on chromosome 1p31.1-31.3Contains the 5' end of the PTGER3 gene for prostaglandin E receptor3 (subtype EP3), part of a novel gene and a CpG island, completesequence./gene="PTGER3"/locus_tag="RP5-952N6.2-007"/standard_name="OTTHUMP00000011029"/codon_start=1/product="prostaglandin E receptor 3 (subtype 
EP3)"/protein_id="CAI20231.1"/db_xref="GI:56417800"/db_xref="GOA:O00325"/db_xref="InterPro:IPR000265"/db_xref="InterPro:IPR000276"/db_xref="InterPro:IPR001244"/db_xref="InterPro:IPR001481"/db_xref="InterPro:IPR008365"/db_xref="UniProt/TrEMBL:O00325" 2 10 8 3 7 1 0 15 25 5 4 4 2 9 6 5 6 1 5 14 7 3 3 3 6 3 2 21 7 3 2 12 13 1 3 9 12 5 5 12 11 1 1 12 6 4 3 12 2 3 6 1 12 0 15 5 2 12 3 15 12 1 0 0 >AB012723\AB012723\join(10753..10844,11940..12014,15049..15198,19720..19896, 21232..21383,23136..23318,23728..23882,25740..25910)\1155\BAA36418.1\Homo sapiens\Homo sapiens gene for kinesin-like protein, complete cds./codon_start=1/product="kinesin-like protein"/protein_id="BAA36418.1"/db_xref="GI:4115553" 4 3 3 3 3 13 3 13 17 5 2 7 2 5 4 9 14 2 5 8 4 3 5 6 4 5 9 16 9 7 12 9 10 5 0 7 13 10 2 11 2 4 5 16 6 4 3 15 13 7 3 3 7 1 3 2 2 3 7 5 1 0 0 1 >AF069291#1\AF069291\join(68735..68801,73157..73293,73635..73826,80160..80251, 83583..84612)\1518\AAC62231.1\Homo sapiens\Homo sapiens BAC clone 255A7 from 8q21 containing NBS1 gene,complete sequence./gene="hT41"/codon_start=1/product="hT41"/protein_id="AAC62231.1"/db_xref="GI:3687829" 6 3 0 2 10 2 3 3 10 14 14 9 14 6 2 12 6 4 8 2 1 8 10 4 1 12 8 3 2 11 14 6 8 11 13 2 9 13 17 16 5 13 5 8 3 17 19 9 8 21 7 15 2 7 4 19 13 6 9 11 5 1 0 0 >HSDJ319M7#1\AL079341\join(60503..60687,64270..64633,65757..66021,78160..78379, 80878..80997,complement(AL031905.7:20931..21040), complement(AL355345.13:18948..19137), AL033518.14:12909..13016,AL033518.14:76828..76906, AL033518.14:94243..94440)\-2028\CAI20151.1\Homo sapiens\Human DNA sequence from clone RP1-319M7 on chromosome 6p21.1-21.3Contains the 5' end of the gene for a novel protein (containsKIAA1880 and FLJ32945), a TRK-fused gene (TFG) pseudogene and a CpGisland, complete sequence./locus_tag="RP3-322I12.1-004"/standard_name="OTTHUMP00000016337"/codon_start=1/protein_id="CAI20151.1"/db_xref="GI:56203781"/db_xref="InterPro:IPR000210"/db_xref="InterPro:IPR000421"/db_xref="InterPro:IPR008979" 2 1 1 2 11 
11 3 16 14 13 6 6 4 10 6 10 7 8 7 6 7 13 10 14 4 9 9 4 1 8 5 11 10 4 6 9 7 9 16 12 9 17 8 15 13 19 8 14 10 7 6 10 12 13 21 14 14 22 15 11 9 15 7 12 >BC066294\BC066294\14..598\585\AAH66294.1\Homo sapiens\Homo sapiens lysozyme-like 2, mRNA (cDNA clone MGC:86984IMAGE:4374284), complete cds./gene="LYZL2"/codon_start=1/product="LYZL2 protein"/protein_id="AAH66294.1"/db_xref="GI:42542634"/db_xref="GeneID:119180" 0 1 1 1 2 3 0 3 10 1 0 2 4 4 1 2 6 1 4 3 1 4 0 1 1 1 3 6 5 4 3 10 1 0 0 4 0 3 8 7 5 3 2 5 3 0 0 7 9 3 4 3 7 4 5 1 1 6 4 4 7 1 0 0 >BC009073\BC009073\330..1208\879\AAH09073.1\Homo sapiens\Homo sapiens chromosome 14 open reading frame 152, mRNA (cDNA cloneMGC:9764 IMAGE:3856004), complete cds./gene="C14orf152"/codon_start=1/product="C14orf152 protein"/protein_id="AAH09073.1"/db_xref="GI:14290588"/db_xref="GeneID:90050" 0 5 3 0 2 12 0 7 21 3 0 3 0 10 0 2 5 3 0 3 0 1 7 21 3 8 4 6 2 6 4 13 9 2 0 4 13 0 4 18 3 3 0 13 2 4 4 13 5 5 6 3 7 1 5 2 0 4 0 5 3 0 1 0 >AE006463#8\AE006463 AE005175\119356..119826\471\AAK61228.1\Homo sapiens\Homo sapiens 16p13.3 sequence section 2 of 8./gene="RPL23AL"/codon_start=1/product="60S ribosomal protein L23A like"/protein_id="AAK61228.1"/db_xref="GI:14336695"/db_xref="MIM:602326"/db_xref="UniProt/Swiss-Prot:P29316" 2 2 3 0 1 3 0 2 5 2 1 3 1 0 1 1 2 0 2 6 3 1 0 2 2 6 3 8 4 5 1 1 1 2 0 3 4 3 7 18 5 0 0 3 4 0 6 3 1 6 1 3 0 1 2 1 1 4 2 1 1 1 0 0 >BC004372\BC004372\132..2231\2100\AAH04372.1\Homo sapiens\Homo sapiens CD44 antigen (homing function and Indian blood groupsystem), transcript variant 2, mRNA (cDNA clone MGC:10468IMAGE:3638681), complete cds./gene="CD44"/codon_start=1/product="CD44 antigen, isoform 2 precursor"/protein_id="AAH04372.1"/db_xref="GI:13325118"/db_xref="GeneID:960"/db_xref="MIM:107269" 2 4 3 2 12 11 2 8 8 4 2 10 19 14 2 14 19 15 34 23 7 18 20 12 2 9 15 7 1 15 22 9 10 9 3 5 15 5 6 11 20 20 6 19 12 20 33 14 25 20 7 7 6 3 9 12 4 19 10 12 12 1 0 0 >AY486331\AY486331\10..558\549\AAS57906.1\Homo sapiens\Homo sapiens 
BCL10-associated protein variant 1 (BAG) mRNA,complete cds, alternatively spliced./gene="BAG"/codon_start=1/product="BCL10-associated protein variant 1"/protein_id="AAS57906.1"/db_xref="GI:45331268" 0 3 1 0 1 4 1 3 2 2 0 1 3 1 2 2 5 1 0 3 2 3 7 0 7 8 4 1 6 3 2 3 2 1 2 2 1 0 8 14 4 4 0 2 1 1 5 20 6 7 3 3 0 1 1 3 4 1 0 2 3 0 1 0 >AK075017\AK075017\169..948\780\BAC11355.1\Homo sapiens\Homo sapiens cDNA FLJ90536 fis, clone OVARC1000105, highly similarto Ubiquitin conjugating enzyme E2./codon_start=1/protein_id="BAC11355.1"/db_xref="GI:22760840" 1 1 1 0 3 5 1 10 10 2 2 3 0 2 2 2 5 5 0 7 7 3 4 7 5 6 6 6 3 2 2 5 8 2 0 8 7 2 9 6 9 2 3 10 5 2 5 9 7 2 4 5 1 3 3 9 2 6 5 4 3 0 0 1 >BC023981\BC023981\131..1195\1065\AAH23981.1\Homo sapiens\Homo sapiens carbonic anhydrase XII, transcript variant 1, mRNA(cDNA clone MGC:22865 IMAGE:4042589), complete cds./gene="CA12"/codon_start=1/product="carbonic anhydrase XII, isoform 1 precursor"/protein_id="AAH23981.1"/db_xref="GI:18645129"/db_xref="GeneID:771"/db_xref="MIM:603263" 1 3 3 0 2 3 0 10 24 5 1 1 4 12 2 3 7 5 4 6 2 5 2 8 7 5 2 13 3 6 4 11 4 3 2 7 11 2 3 14 13 5 5 13 12 3 6 12 10 3 9 6 2 3 11 2 1 9 8 6 5 0 0 1 >BC098269\BC098269\49..1875\1827\AAH98269.1\Homo sapiens\Homo sapiens RAP1, GTP-GDP dissociation stimulator 1, mRNA (cDNAclone MGC:118860 IMAGE:40001059), complete cds./gene="RAP1GDS1"/codon_start=1/product="RAP1, GTP-GDP dissociation stimulator 1"/protein_id="AAH98269.1"/db_xref="GI:67514265"/db_xref="GeneID:5910"/db_xref="MIM:179502" 2 1 0 1 10 3 14 8 19 21 16 13 8 4 1 6 4 19 9 4 2 8 6 1 0 4 26 12 0 23 13 7 3 8 17 4 10 15 28 13 8 22 12 21 4 11 31 17 7 25 1 2 4 10 2 11 12 7 17 19 2 0 0 1 >BC010087\BC010087\14..1852\1839\AAH10087.1\Homo sapiens\Homo sapiens phosphoglucomutase 2, mRNA (cDNA clone MGC:19508IMAGE:2964390), complete cds./gene="PGM2"/codon_start=1/product="phosphoglucomutase 2"/protein_id="AAH10087.1"/db_xref="GI:14603253"/db_xref="GeneID:55276"/db_xref="MIM:172000" 6 3 2 1 6 5 7 6 10 9 8 12 3 7 0 12 11 14 10 13 2 9 
10 5 5 12 10 15 1 28 10 10 11 8 1 7 14 6 28 20 7 18 8 18 5 8 24 16 19 18 11 8 4 5 7 27 5 12 15 10 10 1 0 0 >BC000954\BC000954\112..663\552\AAH00954.1\Homo sapiens\Homo sapiens chromobox homolog 3 (HP1 gamma homolog, Drosophila),transcript variant 1, mRNA (cDNA clone MGC:4912 IMAGE:3450099),complete cds./gene="CBX3"/codon_start=1/product="chromobox homolog 3"/protein_id="AAH00954.1"/db_xref="GI:12654267"/db_xref="GeneID:11335"/db_xref="MIM:604477" 1 0 0 1 6 0 2 1 2 2 2 4 2 1 0 5 2 3 4 0 0 3 3 0 0 4 3 3 2 6 5 1 1 3 3 1 4 1 19 6 2 5 3 2 0 1 17 5 6 10 0 2 0 3 1 6 1 0 4 5 4 1 0 0 >AY651137\AY651137\534..1526\993\AAW70050.1\Homo sapiens\Homo sapiens clone KHC33 MRGX2 (MRGX2) gene, complete cds./gene="MRGX2"/codon_start=1/product="MRGX2"/protein_id="AAW70050.1"/db_xref="GI:58294240"/IGNORED_CODON=3 0 6 1 1 2 6 2 14 33 6 4 1 3 6 1 4 8 7 5 10 0 2 3 4 4 1 2 10 2 3 4 9 5 3 1 11 10 3 0 5 4 3 1 9 2 2 3 4 4 6 5 1 6 9 21 7 2 12 7 6 11 0 1 0 >AY359040\AY359040\161..457\297\AAQ89399.1\Homo sapiens\Homo sapiens clone DNA52758 SCRG1 (UNQ390) mRNA, complete cds./locus_tag="UNQ390"/codon_start=1/product="SCRG1"/protein_id="AAQ89399.1"/db_xref="GI:37183198" 0 1 0 0 1 0 3 2 3 2 0 2 0 0 0 2 1 0 1 1 0 1 2 0 1 2 1 1 0 1 4 0 2 0 2 2 1 3 3 3 4 3 2 2 2 1 2 1 2 4 2 0 5 3 5 1 1 2 3 4 1 0 0 1 >AY663423#2\AY663423\join(48096..48116,48596..49585)\1011\AAU47296.1\Homo sapiens\Homo sapiens isolate fa0141 immunoglobulin superfamily member 4B(IGSF4B) gene, complete cds; and Duffy blood group (FY) genes,complete cds, alternatively spliced./gene="FY"/codon_start=1/product="Duffy blood group"/protein_id="AAU47296.1"/db_xref="GI:52426524" 0 3 0 0 2 2 6 13 36 3 0 12 4 7 0 7 8 4 3 7 2 5 5 8 0 6 7 19 1 9 8 8 7 8 2 8 9 1 1 5 5 3 0 9 6 3 4 3 5 6 2 3 5 7 11 4 1 4 4 4 11 0 1 0 >HSU70439\U70439\211..966\756\AAB37579.1\Homo sapiens\Human silver-stainable protein SSP29 mRNA, complete cds./codon_start=1/product="silver-stainable protein SSP29"/protein_id="AAB37579.1"/db_xref="GI:1698783" 3 0 1 0 2 4 1 6 9 5 6 6 4 0 0 
0 2 5 3 4 0 0 1 2 1 3 2 1 0 3 6 3 2 5 2 3 3 3 10 7 5 8 0 2 1 1 34 22 13 25 2 1 1 2 2 5 1 4 1 3 0 1 0 0 >HSU66867\U66867\807..1283\477\AAC50716.1\Homo sapiens\Human ubiquitin conjugating enzyme 9 (hUBC9) mRNA, complete cds./gene="hUBC9"/codon_start=1/product="ubiquitin conjugating enzyme 9"/protein_id="AAC50716.1"/db_xref="GI:1561759" 1 0 1 0 2 4 3 3 1 2 3 1 1 1 3 1 1 0 3 0 2 1 11 2 2 1 3 6 1 2 3 2 3 1 0 2 4 0 10 4 3 4 4 3 2 0 4 7 4 3 4 1 3 1 3 4 1 6 2 4 4 1 0 0 >HSIL15MR1\X94222\223..630\408\CAA63913.1\Homo sapiens\H.sapiens mRNA for interleukin-15 (cell line NCIH69)./codon_start=1/product="interleukin-15 (IL-15)"/protein_id="CAA63913.1"/db_xref="GI:1495460"/db_xref="GOA:P40933"/db_xref="UniProt/Swiss-Prot:P40933" 0 0 0 0 0 0 1 1 2 3 2 6 1 1 0 5 1 7 4 1 1 2 0 1 0 1 4 1 0 1 3 0 2 0 6 1 1 2 7 1 4 5 3 1 1 3 9 4 0 7 0 1 4 2 2 3 2 3 7 4 1 0 0 1 >BC054049\BC054049\83..997\915\AAH54049.1\Homo sapiens\Homo sapiens zinc finger protein 364, mRNA (cDNA clone MGC:65162IMAGE:6157275), complete cds./gene="ZNF364"/codon_start=1/product="Rabring 7"/protein_id="AAH54049.1"/db_xref="GI:32450454"/db_xref="GeneID:27246" 2 0 6 1 8 2 5 0 4 4 5 3 1 3 2 8 9 7 11 2 1 6 8 4 2 8 5 9 3 4 6 7 4 7 5 2 3 2 2 4 3 7 6 7 10 4 13 7 11 8 2 2 4 6 5 14 3 2 5 4 6 0 0 1 >AY297044\AY297044\219..3341\3123\AAP44473.1\Homo sapiens\Homo sapiens transient receptor potential cation channel subfamilyM member 4 splice variant A (TRPM4) mRNA, complete cds;alternatively spliced./gene="TRPM4"/codon_start=1/product="transient receptor potential cation channel subfamily M member 4 splice variant A"/protein_id="AAP44473.1"/db_xref="GI:31335331" 4 32 22 8 1 14 5 49 76 9 0 14 6 17 12 7 22 7 6 15 10 6 10 22 16 13 10 49 15 15 8 36 38 8 3 19 35 6 10 22 17 4 5 35 13 5 12 45 38 12 19 6 19 4 41 10 2 22 8 24 22 0 0 1 >BC018272\BC018272\82..906\825\AAH18272.1\Homo sapiens\Homo sapiens toll interacting protein, mRNA (cDNA clone MGC:16867IMAGE:3452585), complete cds./gene="TOLLIP"/codon_start=1/product="toll interacting 
protein"/protein_id="AAH18272.1"/db_xref="GI:17390641"/db_xref="GeneID:54472"/db_xref="MIM:606277" 3 8 1 0 1 2 0 4 14 1 0 1 0 5 0 1 5 0 3 6 5 1 5 14 4 0 3 15 4 2 1 12 6 1 1 6 17 1 1 8 6 4 0 22 3 0 1 12 12 2 9 2 2 2 5 0 0 12 1 14 3 0 1 0 >AY174122\AY174122\128..313\186\AAN73231.1\Homo sapiens\Homo sapiens chemokine-like factor super family 1 isoform 5(CKLFSF1) mRNA, complete cds; alternatively spliced./gene="CKLFSF1"/codon_start=1/product="chemokine-like factor super family 1 isoform 5"/protein_id="AAN73231.1"/db_xref="GI:25361699" 0 0 1 0 1 1 0 1 1 2 1 2 3 1 0 0 0 1 1 1 0 1 0 0 1 3 2 5 0 1 0 0 3 0 1 1 1 1 3 1 2 0 2 0 1 1 2 3 0 2 0 1 0 0 1 0 0 2 1 2 0 1 0 0 >BC046149\BC046149\60..791\732\AAH46149.1\Homo sapiens\Homo sapiens nitrilase 1, mRNA (cDNA clone MGC:57670IMAGE:5764281), complete cds./gene="NIT1"/codon_start=1/product="NIT1 protein"/protein_id="AAH46149.1"/db_xref="GI:28279801"/db_xref="GeneID:4817"/db_xref="MIM:604618" 1 0 3 1 3 4 2 4 16 6 0 3 3 7 1 5 3 1 5 4 2 2 5 6 0 11 7 5 0 10 4 4 5 5 4 3 4 1 4 3 3 1 4 7 4 2 7 10 6 0 3 2 4 7 6 3 2 3 4 5 3 0 0 1 >AK125005\AK125005\76..1527\1452\BAC86022.1\Homo sapiens\Homo sapiens cDNA FLJ43015 fis, clone BRTHA2016496, moderatelysimilar to Vacuolar processing enzyme precursor (EC 3.4.22.-)./codon_start=1/protein_id="BAC86022.1"/db_xref="GI:34530959" 2 2 1 3 2 8 4 10 7 10 4 6 6 8 0 11 5 13 7 3 1 8 5 4 2 13 14 6 1 15 11 7 7 18 3 7 7 10 14 10 10 17 11 3 6 10 19 9 9 18 8 14 4 4 6 15 7 6 12 13 7 1 0 0 >AF020918\AF020918\71..739\669\AAD04711.1\Homo sapiens\Homo sapiens glutathione transferase (GSTA4) mRNA, complete cds./gene="GSTA4"/EC_number="2.5.1.18"/codon_start=1/product="glutathione transferase"/protein_id="AAD04711.1"/db_xref="GI:2738933" 1 0 1 0 6 3 2 6 10 4 5 3 0 1 0 1 4 1 4 4 0 0 2 4 0 8 5 3 0 3 3 3 2 3 1 2 8 4 4 13 5 4 5 7 5 1 12 5 3 7 6 2 0 0 4 8 2 4 10 7 1 1 0 0 >AF035360\AF035360\88..2091\2004\AAB99951.1\Homo sapiens\Homo sapiens ring finger protein (FXY) mRNA, complete cds./gene="FXY"/codon_start=1/product="ring 
finger protein"/protein_id="AAB99951.1"/db_xref="GI:2827994" 4 9 8 5 4 3 7 16 21 7 3 10 10 14 1 7 14 7 13 26 2 6 3 12 4 10 9 17 5 13 5 6 9 3 5 11 17 5 14 25 25 10 8 23 13 12 16 30 17 17 9 7 16 14 10 11 3 23 17 8 8 0 0 1 >AY038048\AY038048\180..3170\2991\AAK71497.1\Homo sapiens\Homo sapiens SEZ6 (SEZ6) mRNA, complete cds./gene="SEZ6"/codon_start=1/product="SEZ6"/protein_id="AAK71497.1"/db_xref="GI:20143985" 6 18 5 4 2 8 7 24 45 6 1 4 8 20 6 10 30 10 19 38 1 15 28 43 9 26 14 36 5 15 14 39 22 13 6 16 33 8 5 18 10 10 6 35 22 9 9 50 26 21 19 10 14 12 25 13 5 31 5 12 15 0 0 1 >AK057590\AK057590\42..410\369\BAB71530.1\Homo sapiens\Homo sapiens cDNA FLJ33028 fis, clone THYMU2000140./codon_start=1/protein_id="BAB71530.1"/db_xref="GI:16553340" 1 2 4 0 5 4 0 3 5 6 1 3 2 3 1 3 1 2 0 1 0 1 2 5 1 3 0 2 0 4 2 4 2 1 1 4 2 4 3 2 0 3 1 2 1 2 5 0 1 1 0 0 2 0 1 2 1 1 3 4 2 0 1 0 >HSMTMMPPR\X90925\113..1861\1749\CAA62432.1\Homo sapiens\H.sapiens mRNA for MT-MMP protein./codon_start=1/product="MT-MMP protein"/protein_id="CAA62432.1"/db_xref="GI:1495995"/db_xref="GOA:P50281"/db_xref="UniProt/Swiss-Prot:P50281" 5 10 5 6 3 9 2 14 25 2 0 1 5 7 3 4 8 2 4 12 4 6 7 28 2 9 4 24 8 7 8 26 15 4 2 7 19 3 9 23 12 9 7 12 4 8 7 31 18 13 20 5 3 3 28 9 0 17 6 15 13 0 0 1 >BC096703\BC096703\1..255\255\AAH96703.1\Homo sapiens\Homo sapiens CDC42 small effector 2, mRNA (cDNA clone MGC:120071IMAGE:40021506), complete cds./gene="CDC42SE2"/codon_start=1/product="CDC42 small effector 2"/protein_id="AAH96703.1"/db_xref="GI:68532498"/db_xref="GeneID:56990" 1 0 2 0 1 1 0 1 1 0 0 1 2 2 0 0 1 3 2 0 1 0 0 1 0 3 1 1 1 1 8 0 0 2 0 1 2 2 1 2 3 2 1 6 0 2 2 1 2 1 0 1 1 2 3 1 0 0 4 6 1 0 1 0 >AF121141\AF121141\357..6656\6300\AAD17298.1\Homo sapiens\Homo sapiens endocrine regulator mRNA, complete cds./codon_start=1/product="endocrine regulator"/protein_id="AAD17298.1"/db_xref="GI:4325209"/IGNORED_CODON=1 22 13 17 12 22 27 24 28 36 24 24 43 47 43 3 69 36 54 41 26 0 38 80 33 5 71 31 36 2 53 33 20 33 22 27 11 38 34 81 86 37 30 
33 48 27 24 91 111 53 58 13 23 13 15 21 32 23 23 37 30 11 1 0 0 >BC012476\BC012476\59..1465\1407\AAH12476.2\Homo sapiens\Homo sapiens hypothetical protein LOC116236, mRNA (cDNA cloneMGC:21749 IMAGE:4537124), complete cds./gene="LOC116236"/codon_start=1/product="hypothetical protein LOC116236"/protein_id="AAH12476.2"/db_xref="GI:54035278"/db_xref="GeneID:116236" 6 16 11 4 2 7 1 19 32 7 0 7 2 11 4 2 14 2 5 6 1 4 4 13 13 6 1 25 13 3 8 21 13 2 2 8 13 2 2 4 5 1 1 8 11 2 3 27 17 4 11 1 11 4 19 3 0 11 1 1 11 0 0 1 >AL670886#17\AL670886\join(19489..19560,20525..20627,21196..21311,21754..21840)\378\CAI17798.1\Homo sapiens\Human DNA sequence from clone XXbac-88C14 on chromosome 6 containsthe 5' end of the BAT3 gene for HLA-B associated transcript 3, theAPOM gene for apolipoprotein M, the gene for chromosome 6, openreading frame 47, the BAT4 gene for HLA-B associated transcript 4,the CSNK2B gene for casein kinase 2, beta polypeptide, the LY6G5Bgene for lymphocyte antigen 6 complex, locus 5B , the gene for G5cprotein, the BAT5 gene for HLA-B associated transcript 5, theLY6G6F gene for lymphocyte antigen 6 complex, locus G6f, the LY6G6Egene for lymphocyte antigen 6 complex, locus G6E, the LY6G6C genefor lymphocyte antigen 6 complex, locus G6C, the gene forchromosome 6 open reading frame 25, the DDAH gene fordimethylarginine dimethylaminohydrolase 2, the CLIC1 gene forchloride intracellular channel 1 and seven CpG islands, completesequence./gene="CSNK2B"/locus_tag="XXbac-BCX88C14.4-004"/standard_name="OTTHUMP00000029470"/codon_start=1/product="casein kinase 2, beta polypeptide"/protein_id="CAI17798.1"/db_xref="GI:55961591"/db_xref="GOA:Q5SRQ6"/db_xref="InterPro:IPR000704"/db_xref="UniProt/TrEMBL:Q5SRQ6" 1 1 0 3 0 0 1 2 3 4 0 3 1 2 0 0 2 1 0 1 0 1 1 2 0 3 1 3 0 1 3 2 1 2 0 1 3 0 1 1 4 3 2 7 2 0 7 8 7 2 5 2 0 5 3 2 0 5 3 5 2 0 0 1 >AK000710\AK000710\377..970\594\BAA91335.1\Homo sapiens\Homo sapiens cDNA FLJ20703 fis, clone 
KAIA1965./codon_start=1/protein_id="BAA91335.1"/db_xref="GI:7020969" 1 6 5 0 0 3 0 3 12 2 0 1 1 2 1 3 6 1 2 6 2 4 1 2 5 1 2 6 3 4 2 2 4 0 0 2 2 0 2 7 4 0 1 9 2 1 1 29 9 6 1 3 2 1 3 1 1 3 2 10 2 0 0 1 >AF126426\AF126426\265..1299\1035\AAF37591.1\Homo sapiens\Homo sapiens neurotrimin (HNT) mRNA, complete cds./gene="HNT"/codon_start=1/product="neurotrimin"/protein_id="AAF37591.1"/db_xref="GI:7158998" 0 3 5 0 5 5 2 9 13 4 0 2 6 3 2 6 9 2 4 11 4 4 5 9 0 5 4 10 2 2 4 7 9 4 5 13 20 2 8 9 12 6 3 8 5 1 11 7 10 5 8 4 8 2 7 3 1 8 9 3 6 0 0 1 >AY044836\AY044836\6..1583\1578\AAK98782.1\Homo sapiens\Homo sapiens vesicular inhibitory amino acid transporter mRNA,complete cds./codon_start=1/product="vesicular inhibitory amino acid transporter"/protein_id="AAK98782.1"/db_xref="GI:17975777" 3 11 2 0 0 1 2 21 35 1 0 4 0 15 5 1 11 2 0 7 12 1 2 10 5 4 2 33 13 3 4 32 6 3 1 14 28 1 2 20 13 3 1 13 10 2 5 21 16 2 13 3 15 2 26 7 2 28 3 13 10 0 1 0 >BC034952\BC034952\184..2733\2550\AAH34952.1\Homo sapiens\Homo sapiens diaphanous homolog 3 (Drosophila), mRNA (cDNA cloneMGC:26208 IMAGE:4830888), complete cds./gene="DIAPH3"/codon_start=1/product="diaphanous homolog 3"/protein_id="AAH34952.1"/db_xref="GI:54114914"/db_xref="GeneID:81624" 3 3 8 3 21 7 8 8 17 26 20 23 7 6 2 15 11 10 13 5 2 8 19 6 4 27 13 8 1 14 14 4 4 7 11 5 18 11 47 31 13 24 20 21 6 9 58 33 12 33 7 10 5 12 13 28 13 13 20 26 3 1 0 0 >BC011350\BC011350\54..1979\1926\AAH11350.1\Homo sapiens\Homo sapiens chromosome 14 open reading frame 169, mRNA (cDNA cloneMGC:16815 IMAGE:4340013), complete cds./gene="C14orf169"/codon_start=1/product="chromosome 14 open reading frame 169"/protein_id="AAH11350.1"/db_xref="GI:15030188"/db_xref="GeneID:79697" 10 19 14 3 2 10 5 11 43 5 2 15 1 13 11 6 6 5 5 12 8 4 7 21 13 10 9 29 19 14 7 8 17 8 3 8 27 6 2 8 11 3 4 28 11 5 15 28 19 15 14 5 4 2 11 9 4 4 3 12 8 0 1 0 >BC022481\BC022481\199..1347\1149\AAH22481.1\Homo sapiens\Homo sapiens neurogenic differentiation 2, mRNA (cDNA cloneMGC:26304 IMAGE:4816285), 
complete cds./gene="NEUROD2"/codon_start=1/product="neurogenic differentiation 2"/protein_id="AAH22481.1"/db_xref="GI:18490286"/db_xref="GeneID:4761"/db_xref="MIM:601725" 0 13 9 1 0 1 3 11 19 3 0 3 1 7 11 3 7 0 0 5 7 1 6 13 11 2 4 14 17 2 3 30 8 4 0 4 7 0 1 20 10 4 1 8 13 1 6 29 15 1 13 2 5 2 7 2 0 3 0 7 2 0 0 1 >AK126907\AK126907\1195..1809\615\BAC86746.1\Homo sapiens\Homo sapiens cDNA FLJ44959 fis, clone BRAWH2012054./codon_start=1/protein_id="BAC86746.1"/db_xref="GI:34533585" 0 3 2 1 2 1 3 6 8 7 2 6 5 11 2 11 2 1 0 4 0 3 4 9 0 6 2 3 0 2 1 2 0 0 1 1 5 1 2 0 5 4 4 9 5 5 2 5 2 1 0 1 1 7 11 6 1 6 3 5 2 0 0 1 >BC032549\BC032549\83..853\771\AAH32549.1\Homo sapiens\Homo sapiens POU domain, class 2, associating factor 1, mRNA (cDNAclone MGC:45211 IMAGE:5554134), complete cds./gene="POU2AF1"/codon_start=1/product="POU2AF1 protein"/protein_id="AAH32549.1"/db_xref="GI:21619723"/db_xref="GeneID:5450"/db_xref="MIM:601206" 1 0 1 1 3 2 2 6 11 4 0 3 2 6 1 5 6 1 6 11 4 1 9 18 7 7 6 15 3 4 0 5 4 2 0 5 15 0 1 4 1 1 2 12 3 2 6 10 6 1 6 5 3 1 1 2 0 4 0 4 4 0 1 0 >AF044333\AF044333\1..1545\1545\AAD09407.1\Homo sapiens\Homo sapiens pleiotropic regulator 1 (PLRG1) mRNA, complete cds./gene="PLRG1"/codon_start=1/product="pleiotropic regulator 1"/protein_id="AAD09407.1"/db_xref="GI:2832296" 6 0 3 3 11 5 1 2 4 7 10 11 8 2 1 12 4 13 19 7 2 13 13 1 2 15 14 5 2 20 15 6 5 6 9 3 18 11 23 11 3 17 6 13 7 18 16 7 9 18 9 5 1 5 4 8 5 6 12 10 12 1 0 0 >HUMTROMOD\M77016\35..1114\1080\AAA61224.1\Homo sapiens\Human tropomodulin mRNA, complete cds./gene="TMOD"/codon_start=1/product="tropomodulin"/protein_id="AAA61224.1"/db_xref="GI:339948"/db_xref="GDB:G00-127-386" 3 1 6 1 2 5 2 9 23 7 0 5 3 3 1 2 7 4 7 5 4 2 4 12 1 5 11 6 4 3 4 5 5 1 2 3 13 2 10 19 16 10 3 9 3 0 25 15 11 9 8 3 1 1 3 3 1 8 7 10 1 0 1 0 >HS1119A7#5\AL022313\complement(join(63285..63298,63801..64084,64758..64900, 68776..68905,69003..69088,69599..69729,71061..71208, 71703..71835,72874..72986,75507..75579,76193..76278, 
76894..77030,77962..78007,78297..78419))\1647\CAA18440.1\Homo sapiens\Human DNA sequence from clone RP5-1119A7 on chromosome22q12.2-12.3, complete sequence./locus_tag="RP5-1119A7.9-001"/standard_name="OTTHUMP00000028733"/codon_start=1/protein_id="CAA18440.1"/db_xref="GI:4200328"/db_xref="Genew:3278"/db_xref="GOA:O15371"/db_xref="InterPro:IPR007783"/db_xref="UniProt/Swiss-Prot:O15371" 4 12 7 6 7 6 3 8 15 3 1 6 7 8 0 8 9 6 9 8 4 6 3 10 2 6 4 17 3 4 7 7 7 5 1 11 11 8 15 24 16 12 4 28 6 1 20 27 21 20 17 4 4 7 14 11 0 16 8 14 10 1 0 0 >HSCD59R\X17198\73..459\387\CAA35059.1\Homo sapiens\Human mRNA for CD59 antigen./codon_start=1/product="CD59 antigen precursor"/protein_id="CAA35059.1"/db_xref="GI:29815"/db_xref="GOA:P13987"/db_xref="UniProt/Swiss-Prot:P13987" 0 1 0 0 0 1 1 2 9 3 2 1 3 1 0 2 2 0 4 2 1 2 2 1 0 1 2 2 1 3 2 0 4 2 0 5 2 1 3 4 6 4 2 2 0 4 3 3 3 2 3 1 6 5 3 4 0 1 1 1 2 1 0 0 >DQ054788\DQ054788\1..1902\1902\AAY88246.1\Homo sapiens\Homo sapiens interleukin-1 receptor associated-kinase 1 splicevariant c (IRAK1) mRNA, complete cds, alternatively spliced./gene="IRAK1"/codon_start=1/product="interleukin-1 receptor associated-kinase 1 splice variant c"/protein_id="AAY88246.1"/db_xref="GI:68235820" 2 8 10 4 2 11 4 14 40 4 0 6 6 21 5 6 24 6 6 12 7 4 19 24 10 12 11 38 4 12 8 23 16 4 1 4 27 1 1 14 12 0 0 33 10 4 6 32 19 3 11 2 10 3 10 10 0 14 4 6 13 0 0 1 >HSRIBIR\Y00281\138..1961\1824\CAA68392.1\Homo sapiens\Human mRNA for ribophorin I./codon_start=1/protein_id="CAA68392.1"/db_xref="GI:36053"/db_xref="GOA:P04843"/db_xref="UniProt/Swiss-Prot:P04843" 3 7 5 5 5 4 2 11 37 8 1 8 4 10 1 10 11 6 8 21 6 4 9 4 4 11 5 19 6 10 5 14 8 3 6 12 28 7 6 34 10 10 3 17 12 6 17 30 14 19 15 14 2 0 13 10 3 19 14 8 3 0 1 0 >AL160163#4\AL160163\complement(join(101513..101680,102132..102268, 102808..102967,105943..106158,106525..106572))\729\CAI23490.1\Homo sapiens\Human DNA sequence from clone RP5-973N23 on chromosome 6p12.3-21.2Contains the TRFP gene for TRF-proximal protein, the BYSL gene 
forbystin-like protein, the CCDN3 gene for cyclin D3, the 5' end ofUSP49 gene for ubiquitin specific protease 49 and 3 CpG islands,complete sequence./gene="CCND3"/locus_tag="RP5-973N23.3-002"/standard_name="OTTHUMP00000016391"/codon_start=1/product="cyclin D3"/protein_id="CAI23490.1"/db_xref="GI:56204980"/db_xref="GOA:Q5T8J1"/db_xref="InterPro:IPR004367"/db_xref="InterPro:IPR006670"/db_xref="InterPro:IPR006671"/db_xref="UniProt/TrEMBL:Q5T8J1" 2 4 5 1 0 3 1 9 19 0 0 4 0 7 0 4 10 1 4 8 2 3 2 8 1 3 5 13 3 6 1 6 5 2 1 7 4 0 4 5 1 0 2 10 3 2 5 10 5 5 4 1 5 4 3 2 1 5 3 6 2 0 1 0 >AK075391\AK075391\131..649\519\BAC11590.1\Homo sapiens\Homo sapiens cDNA PSEC0081 fis, clone NT2RP2004130./codon_start=1/protein_id="BAC11590.1"/db_xref="GI:22761449" 0 3 2 0 0 5 0 4 6 2 0 1 1 5 0 0 0 1 1 4 1 0 7 8 8 4 1 7 3 1 5 4 5 0 1 2 6 0 1 4 3 3 0 8 0 1 3 4 2 0 7 5 9 2 8 0 2 2 0 7 3 0 1 0 >AY550027\AY550027\1..378\378\AAS55642.1\Homo sapiens\Homo sapiens augmenter of liver regeneration mRNA, complete cds./codon_start=1/product="augmenter of liver regeneration"/protein_id="AAS55642.1"/db_xref="GI:45239054" 0 6 3 0 1 3 1 1 7 0 1 0 1 1 0 1 1 0 1 5 1 0 2 3 2 1 1 3 0 2 0 4 0 0 0 1 2 0 2 6 2 1 1 6 5 1 4 5 10 3 3 0 5 3 3 3 1 0 0 2 4 0 1 0 >AF451977\AF451977\99..4394\4296\AAL49758.1\Homo sapiens\Homo sapiens cask-interacting protein 1 (CASKIN1) mRNA, completecds./gene="CASKIN1"/codon_start=1/product="cask-interacting protein 1"/protein_id="AAL49758.1"/db_xref="GI:17940758" 10 24 38 0 0 14 4 22 85 4 0 6 7 24 16 14 51 5 10 30 20 10 25 88 54 32 16 95 40 17 7 72 35 16 1 21 55 1 8 71 23 8 1 59 30 6 12 74 45 13 12 7 9 1 12 4 0 35 6 19 7 0 0 1 >BC031427\BC031427\33..851\819\AAH31427.1\Homo sapiens\Homo sapiens phosphatidylinositol transfer protein, beta, mRNA(cDNA clone MGC:17010 IMAGE:4341137), complete cds./gene="PITPNB"/codon_start=1/product="PITPNB protein"/protein_id="AAH31427.1"/db_xref="GI:21594294"/db_xref="GeneID:23760"/db_xref="MIM:606876" 0 1 1 2 4 3 2 1 4 2 5 4 1 1 0 2 4 3 7 2 3 3 5 4 0 4 7 3 
1 3 8 2 1 4 3 3 7 6 12 14 5 6 6 8 3 4 18 8 7 9 4 6 0 5 8 3 2 4 10 6 8 0 0 1 >HSA544583\AJ544583\1..882\882\CAD67566.1\Homo sapiens\Homo sapiens mRNA for testis serine protease 2 precursor (TESSP2gene)./gene="TESSP2"/codon_start=1/product="testis serine protease 2 precursor"/protein_id="CAD67566.1"/db_xref="GI:32562977"/db_xref="GOA:Q7Z5A4"/db_xref="InterPro:IPR001254"/db_xref="UniProt/TrEMBL:Q7Z5A4" 3 3 2 1 2 5 0 8 8 6 0 2 7 5 0 4 4 3 5 8 3 1 5 5 2 7 2 4 7 2 7 18 5 3 2 8 12 1 5 6 4 7 5 10 3 2 10 7 5 5 2 4 5 6 3 4 4 7 4 5 10 0 0 1 >BC096088\BC096088\295..1206\912\AAH96088.1\Homo sapiens\Homo sapiens DPH1 homolog (S. cerevisiae), mRNA (cDNA cloneMGC:116728 IMAGE:40000799), complete cds./gene="DPH1"/codon_start=1/product="DPH1 protein"/protein_id="AAH96088.1"/db_xref="GI:64654638"/db_xref="GeneID:1801"/db_xref="MIM:603527" 3 7 5 4 1 3 2 6 17 6 0 4 1 10 3 4 6 2 3 4 1 4 4 14 4 4 2 17 4 8 2 11 3 0 0 6 16 1 3 7 2 1 2 12 6 1 3 13 13 2 4 6 5 1 5 4 3 4 5 4 5 0 0 1 >HSM804399\AL833088\3029..3571\543\CAD89931.1\Homo sapiens\Homo sapiens mRNA; cDNA DKFZp451M2119 (from clone DKFZp451M2119);complete cds./gene="DKFZp451M2119"/codon_start=1/product="hypothetical protein"/protein_id="CAD89931.1"/db_xref="GI:30268378" 0 0 0 0 3 2 3 5 4 8 3 5 1 7 0 9 1 2 2 2 0 2 8 2 2 6 2 4 0 1 2 3 7 5 1 2 4 2 2 5 0 1 3 3 3 4 4 4 1 2 2 2 5 5 5 5 4 0 4 4 2 0 0 1 >AK074264\AK074264\251..1537\1287\BAB85033.1\Homo sapiens\Homo sapiens cDNA FLJ23684 fis, clone HEP09821./codon_start=1/protein_id="BAB85033.1"/db_xref="GI:18676819" 2 6 3 1 4 5 3 3 14 2 4 5 5 8 1 11 4 4 5 6 8 4 3 7 9 8 12 6 4 11 10 12 11 3 7 7 18 9 11 5 8 12 11 6 6 7 16 11 9 9 2 6 8 4 6 8 4 8 13 5 8 1 0 0 >AF098534\AF098534\259..2013\1755\AAC97951.1\Homo sapiens\Homo sapiens RAD17 isoform 4 (RAD17) mRNA, complete cds./gene="RAD17"/function="cell cycle checkpoint regulation"/codon_start=1/product="RAD17 isoform 4"/protein_id="AAC97951.1"/db_xref="GI:4050044" 6 1 5 0 10 9 11 8 10 10 20 8 12 1 1 18 5 12 9 4 2 10 8 7 1 16 9 7 1 12 22 1 3 9 3 3 8 8 
27 12 10 16 16 15 4 11 32 10 12 20 3 10 4 6 9 19 16 5 17 15 5 0 1 0 >HSU79303\U79303\289..750\462\AAB50227.1\Homo sapiens\Human clone 23882 mRNA, complete cds./codon_start=1/product="unknown"/protein_id="AAB50227.1"/db_xref="GI:1710290" 0 1 9 1 1 6 1 1 9 0 0 0 1 7 1 3 5 4 1 2 0 0 1 4 1 3 3 4 2 5 1 2 4 1 0 1 0 0 1 4 3 1 0 4 3 0 3 16 9 2 1 0 2 0 5 1 0 4 1 7 1 0 1 0 >AB065474\AB065474\join(201..287,1173..1227,3684..3835,5306..5557,6747..8126, 9755..9796,10241..10300,10652..10714)\2091\BAC45257.1\Homo sapiens\Homo sapiens gene for seven transmembrane helix receptor, completecds, isolate:CBRC7TM_37./codon_start=1/evidence=not_experimental/product="seven transmembrane helix receptor"/protein_id="BAC45257.1"/db_xref="GI:27348191" 3 3 1 3 9 12 5 14 16 9 6 16 12 16 3 16 12 5 14 13 2 10 9 8 1 4 13 19 2 16 13 8 6 4 6 7 24 12 25 17 14 24 11 12 13 5 17 14 10 10 6 5 12 9 15 21 15 23 28 25 13 0 1 0 >AK096295\AK096295\124..2169\2046\BAC04752.1\Homo sapiens\Homo sapiens cDNA FLJ38976 fis, clone NT2RI2004099, weakly similarto ANKYRIN 2./codon_start=1/protein_id="BAC04752.1"/db_xref="GI:21755754" 0 23 19 3 1 11 4 12 56 3 0 2 1 8 4 1 15 3 3 16 11 1 5 19 10 1 5 44 22 12 4 32 17 6 1 11 49 1 3 14 21 0 5 31 20 6 2 37 32 3 5 0 18 4 12 2 1 17 2 5 5 0 0 1 >AK091496\AK091496\561..3035\2475\BAC03677.1\Homo sapiens\Homo sapiens cDNA FLJ34177 fis, clone FCBBF3016451, highly similarto RETINAL-CADHERIN PRECURSOR./codon_start=1/protein_id="BAC03677.1"/db_xref="GI:21749880" 4 13 12 2 3 6 0 17 34 1 0 1 7 21 2 5 17 2 12 29 13 1 10 33 15 6 11 25 9 14 11 24 18 7 1 27 36 3 7 21 41 11 3 28 14 0 7 35 54 11 22 8 4 2 15 5 1 45 5 26 7 0 0 1 >AY221751\AY221751\474..1970\1497\AAO67709.1\Homo sapiens\Homo sapiens myocardial ischemic preconditioning upregulatedprotein 2 (MIP2) mRNA, complete cds./gene="MIP2"/codon_start=1/product="myocardial ischemic preconditioning upregulated protein 2"/protein_id="AAO67709.1"/db_xref="GI:37900894" 4 3 4 6 4 6 11 2 14 14 10 7 7 3 0 10 5 11 17 3 3 7 11 1 2 5 10 2 2 8 10 10 7 3 6 
2 10 13 15 9 5 16 10 16 7 14 21 9 14 22 4 11 5 11 7 6 6 3 10 10 14 0 0 1 >HSM806319\BX537603\1119..4055\2937\CAD97793.1\Homo sapiens\Homo sapiens mRNA; cDNA DKFZp686L04115 (from clone DKFZp686L04115)./gene="DKFZp686L04115"/codon_start=1/product="hypothetical protein"/protein_id="CAD97793.1"/db_xref="GI:31874445" 6 7 7 2 10 12 9 21 33 13 8 11 11 10 1 14 29 17 20 17 5 18 12 19 3 18 15 17 7 17 17 15 11 13 9 20 32 9 20 16 36 20 14 20 30 14 31 20 34 24 21 16 14 14 15 15 11 17 23 32 6 1 0 0 >BC035657\BC035657\62..1102\1041\AAH35657.1\Homo sapiens\Homo sapiens G protein-coupled receptor 41, mRNA (cDNA cloneMGC:46158 IMAGE:5588752), complete cds./gene="GPR41"/codon_start=1/product="G protein-coupled receptor 41"/protein_id="AAH35657.1"/db_xref="GI:23272749"/db_xref="GeneID:2865"/db_xref="MIM:603821" 1 6 4 0 3 6 1 16 30 3 0 3 2 8 2 2 11 3 1 10 2 2 3 7 4 1 3 12 4 5 3 11 10 3 0 10 22 0 0 6 5 3 1 13 7 2 6 9 8 1 11 2 7 4 18 4 1 10 1 5 8 0 1 0 >BC001484\BC001484\57..1061\1005\AAH01484.1\Homo sapiens\Homo sapiens malate dehydrogenase 1, NAD (soluble), mRNA (cDNAclone MGC:1375 IMAGE:3505345), complete cds./gene="MDH1"/codon_start=1/product="cytosolic malate dehydrogenase"/protein_id="AAH01484.1"/db_xref="GI:16306622"/db_xref="GeneID:4190"/db_xref="MIM:154200" 2 0 0 3 3 2 2 4 12 8 2 4 5 9 1 5 1 4 2 4 1 7 6 3 0 4 7 10 0 13 8 4 0 11 2 10 10 10 16 15 6 8 4 5 2 2 13 5 7 16 3 4 2 2 5 7 1 11 8 9 4 0 0 1 >AF078835\AF078835\1..840\840\AAG12174.1\Homo sapiens\Homo sapiens p33ING1 (ING1) mRNA, complete cds./gene="ING1"/codon_start=1/product="p33ING1"/protein_id="AAG12174.1"/db_xref="GI:10039545" 1 8 7 1 1 3 1 3 13 0 0 2 1 8 3 0 6 2 3 2 2 0 0 9 0 3 1 6 11 3 0 10 5 0 0 2 9 0 7 20 15 2 1 13 6 1 1 29 19 0 6 2 6 3 5 0 0 9 0 6 2 0 1 0 >HSNRASR\X02751\727..1296\570\CAA26529.1\Homo sapiens\Human N-ras mRNA and flanking regions./codon_start=1/protein_id="CAA26529.1"/db_xref="GI:35103"/db_xref="GOA:P01111"/db_xref="InterPro:IPR001806"/db_xref="InterPro:IPR005225"/db_xref="UniProt/Swiss-Prot:P01111" 2 
1 0 0 4 3 2 3 5 0 0 4 2 0 1 1 3 3 7 4 0 2 3 1 0 1 2 5 1 2 4 2 3 5 5 0 7 5 7 5 4 2 4 6 2 0 8 4 3 13 8 1 0 5 2 4 4 3 4 7 0 1 0 0 >BC061912\BC061912\18..338\321\AAH61912.1\Homo sapiens\Homo sapiens PTD008 protein, mRNA (cDNA clone MGC:70730IMAGE:4444565), complete cds./gene="PTD008"/codon_start=1/product="PTD008 protein"/protein_id="AAH61912.1"/db_xref="GI:38541370"/db_xref="GeneID:51398" 0 0 2 0 0 2 0 1 6 1 0 1 0 4 2 2 5 1 0 0 3 1 2 3 6 1 0 3 0 2 0 2 0 0 0 2 3 0 1 4 5 2 1 2 0 0 1 1 5 0 3 1 2 2 3 1 0 3 0 11 3 0 0 1 >BC010418\BC010418\62..949\888\AAH10418.1\Homo sapiens\Homo sapiens ribosomal protein SA, mRNA (cDNA clone MGC:16557IMAGE:4079845), complete cds./gene="RPSA"/codon_start=1/product="ribosomal protein SA"/protein_id="AAH10418.1"/db_xref="GI:14714564"/db_xref="GeneID:3921"/db_xref="MIM:150370" 0 3 3 3 1 5 1 2 9 7 1 1 1 4 0 6 1 2 1 8 2 13 4 4 0 12 10 8 1 18 6 5 0 4 2 4 7 6 3 8 6 3 2 13 4 0 13 12 8 7 3 4 1 1 8 2 2 5 8 7 10 1 0 0 >BC094839\BC094839\63..1121\1059\AAH94839.1\Homo sapiens\Homo sapiens ADP-ribosylation-like factor 6 interacting protein 4,transcript variant 3, mRNA (cDNA clone MGC:104844 IMAGE:6339167),complete cds./gene="ARL6IP4"/codon_start=1/product="SRp25 nuclear protein, isoform 3"/protein_id="AAH94839.1"/db_xref="GI:63100339"/db_xref="GeneID:51329"/db_xref="MIM:607668" 7 11 12 0 6 12 1 1 8 7 0 2 1 20 8 8 9 3 3 6 4 0 3 8 7 3 7 18 10 6 6 18 12 4 1 4 6 0 3 32 2 0 2 9 3 0 7 19 7 5 1 0 4 0 1 0 0 5 1 5 4 0 0 1 >BT006647\BT006647\1..792\792\AAP35293.1\Homo sapiens\Homo sapiens proteasome (prosome, macropain) subunit, alpha type, 1mRNA, complete cds./codon_start=1/product="proteasome (prosome, macropain) subunit, alpha type, 1"/protein_id="AAP35293.1"/db_xref="GI:30582133" 2 0 1 5 9 2 2 2 6 6 3 5 5 4 0 5 2 0 4 2 1 5 7 1 0 6 7 4 2 10 3 3 1 8 2 1 2 11 8 4 2 6 7 8 1 8 12 6 5 12 1 7 1 4 3 6 1 2 11 8 1 0 1 0 >BC045619\BC045619\617..2110\1494\AAH45619.1\Homo sapiens\Homo sapiens protein phosphatase 2, regulatory subunit B (B56),beta isoform, mRNA (cDNA clone 
MGC:39486 IMAGE:5294756), completecds./gene="PPP2R5B"/codon_start=1/product="PPP2R5B protein"/protein_id="AAH45619.1"/db_xref="GI:28374383"/db_xref="GeneID:5526"/db_xref="MIM:601644" 0 10 14 4 2 0 5 16 37 4 1 2 2 12 3 3 11 3 4 9 2 4 5 20 3 6 4 14 1 5 1 3 11 5 2 9 22 3 4 24 7 7 5 26 11 2 5 41 9 8 8 7 3 5 18 11 0 19 2 8 5 0 1 0 >BC039146\BC039146\45..1166\1122\AAH39146.1\Homo sapiens\Homo sapiens CD34 antigen, mRNA (cDNA clone MGC:21247IMAGE:4746591), complete cds./gene="CD34"/codon_start=1/product="CD34 protein"/protein_id="AAH39146.1"/db_xref="GI:24657613"/db_xref="GeneID:947"/db_xref="MIM:142230" 2 2 1 0 4 3 3 2 18 7 1 7 11 7 1 13 12 6 16 19 2 9 3 3 1 11 5 9 2 8 10 12 7 2 3 5 9 5 7 9 11 6 8 13 4 2 10 11 6 5 2 6 3 4 5 2 1 9 2 4 2 0 0 1 >BC075054\BC075054\50..736\687\AAH75054.1\Homo sapiens\Homo sapiens ephrin-A5, mRNA (cDNA clone MGC:103964IMAGE:30915365), complete cds./gene="EFNA5"/codon_start=1/product="ephrin-A5"/protein_id="AAH75054.1"/db_xref="GI:50960213"/db_xref="GeneID:1946"/db_xref="MIM:601535" 1 4 1 1 4 4 3 5 6 2 2 4 2 6 0 4 5 1 3 1 1 4 8 3 2 2 4 4 2 1 3 4 1 2 2 6 6 4 5 5 6 5 1 3 4 3 7 6 8 8 7 3 2 5 11 5 2 4 1 6 3 0 1 0 >AF336793\AF336793\104..943\840\AAL71993.1\Homo sapiens\Homo sapiens SSC1/ELOVL1 mRNA, complete cds./codon_start=1/product="SSC1/ELOVL1"/protein_id="AAL71993.1"/db_xref="GI:18461755" 1 2 4 2 0 1 2 8 12 6 1 2 3 9 1 5 4 0 2 7 0 0 2 6 1 5 5 7 0 3 4 7 4 1 1 8 11 3 2 10 7 2 3 8 7 4 0 5 4 1 13 6 0 2 15 4 2 7 10 16 11 0 0 1 >AY360171\AY360171\26..1978\1953\AAQ98856.1\Homo sapiens\Homo sapiens transducer of regulated CREB protein 1 (TORC1) mRNA,complete cds./gene="TORC1"/codon_start=1/product="transducer of regulated CREB protein 1"/protein_id="AAQ98856.1"/db_xref="GI:37693041" 1 5 10 3 2 5 0 14 46 2 2 1 4 30 8 9 32 3 10 23 8 6 18 39 19 7 7 25 15 5 8 25 14 2 2 11 7 1 6 8 17 5 6 43 15 1 5 20 29 0 10 3 1 1 13 4 0 15 0 17 2 0 0 1 >BC019043\BC019043\46..1479\1434\AAH19043.1\Homo sapiens\Homo sapiens solute carrier family 2, (facilitated 
glucosetransporter) member 8, mRNA (cDNA clone MGC:20634 IMAGE:4641145),complete cds./gene="SLC2A8"/codon_start=1/product="solute carrier family 2, (facilitated glucose transporter) member 8"/protein_id="AAH19043.1"/db_xref="GI:17512130"/db_xref="GeneID:29988"/db_xref="MIM:605245" 1 11 4 0 1 3 2 20 36 4 0 4 2 11 4 1 13 2 3 9 2 4 3 14 6 8 5 39 10 7 3 27 12 5 0 22 22 2 1 8 3 0 1 14 3 2 6 13 8 1 6 2 7 2 25 6 0 19 0 15 13 0 0 1 >AY322547\AY322547\1..1062\1062\AAP84360.1\Homo sapiens\Homo sapiens endothelial differentiation G-protein-coupled receptor7 mRNA, complete cds./codon_start=1/product="endothelial differentiation G-protein-coupled receptor 7"/protein_id="AAP84360.1"/db_xref="GI:32482015" 0 3 3 2 1 11 1 10 18 4 2 7 2 8 0 7 9 5 10 5 2 7 2 7 2 0 2 8 4 5 2 7 5 1 2 12 14 7 7 9 13 6 1 5 3 4 0 6 8 4 10 4 9 3 10 11 1 13 7 15 7 1 0 0 >BC037881\BC037881\148..960\813\AAH37881.2\Homo sapiens\Homo sapiens monocyte to macrophage differentiation-associated 2,mRNA (cDNA clone MGC:43879 IMAGE:5274201), complete cds./gene="MMD2"/codon_start=1/product="MMD2 protein"/protein_id="AAH37881.2"/db_xref="GI:71052022"/db_xref="GeneID:221938" 2 1 3 1 0 6 1 9 12 5 0 1 0 8 1 1 5 1 2 9 2 1 0 9 0 3 3 12 3 5 1 10 4 2 2 8 8 0 2 8 5 0 0 4 8 5 2 8 4 2 10 5 3 5 17 5 1 13 1 10 11 0 0 1 >HSU59299\U59299\61..1578\1518\AAC52013.1\Homo sapiens\Homo sapiens putative monocarboxylate transporter MCT mRNA,complete cds./codon_start=1/product="MCT"/protein_id="AAC52013.1"/db_xref="GI:2463630" 1 5 7 3 1 4 1 22 40 6 1 2 3 13 1 3 19 2 4 13 4 3 9 9 3 8 9 26 6 8 5 27 8 5 1 9 21 2 2 8 9 3 5 15 7 2 3 11 7 4 10 2 11 4 31 4 2 22 0 16 13 0 0 1 >AF159456\AF159456\107..7348\7242\AAD49696.1\Homo sapiens\Homo sapiens gp-340 variant protein (DMBT1) mRNA, complete cds./gene="DMBT1"/codon_start=1/product="gp-340 variant protein"/protein_id="AAD49696.1"/db_xref="GI:5733598" 33 22 22 8 8 51 18 23 83 7 6 33 82 90 14 34 35 36 34 67 15 22 43 54 15 16 28 78 1 39 71 128 14 46 12 60 84 15 4 6 42 82 12 67 39 36 46 41 79 77 50 24 65 
66 25 26 7 37 29 17 89 0 1 0 >AK000814\AK000814\245..952\708\BAA91385.1\Homo sapiens\Homo sapiens cDNA FLJ20807 fis, clone ADSE01784./codon_start=1/protein_id="BAA91385.1"/db_xref="GI:7021123" 0 3 5 0 0 0 2 6 18 2 0 2 2 5 0 1 4 3 0 4 2 2 3 14 3 3 4 7 2 5 4 4 1 1 0 3 12 0 0 8 3 1 0 12 5 3 5 12 9 4 9 7 4 0 9 1 0 4 0 7 5 0 0 1 >AK026092\AK026092\63..1556\1494\BAB15357.1\Homo sapiens\Homo sapiens cDNA: FLJ22439 fis, clone HRC09236./codon_start=1/protein_id="BAB15357.1"/db_xref="GI:10438831" 0 10 11 1 1 0 1 14 39 1 1 3 3 16 4 4 23 2 1 18 6 4 3 14 5 5 1 20 13 4 0 11 8 6 2 3 23 0 2 29 8 1 1 28 11 2 2 33 19 2 16 3 8 1 13 3 0 13 2 18 1 0 0 1 >BC099711\BC099711\278..2668\2391\AAH99711.1\Homo sapiens\Homo sapiens solute carrier family 4 (anion exchanger), member 1,adaptor protein, mRNA (cDNA clone MGC:120646 IMAGE:40026923),complete cds./gene="SLC4A1AP"/codon_start=1/product="solute carrier family 4 (anion exchanger), member 1, adaptor protein"/protein_id="AAH99711.1"/db_xref="GI:71043487"/db_xref="GeneID:22950"/db_xref="MIM:602655" 3 6 10 4 8 14 6 10 14 14 12 17 15 7 6 15 15 9 14 14 1 17 23 11 6 20 14 11 7 12 20 17 11 12 8 3 11 7 34 35 10 6 12 22 7 7 55 42 23 28 8 8 9 4 5 15 5 5 10 15 7 0 0 1 >AK095693\AK095693\669..1133\465\BAC04609.1\Homo sapiens\Homo sapiens cDNA FLJ38374 fis, clone FEBRA2002552./codon_start=1/protein_id="BAC04609.1"/db_xref="GI:21755009" 0 0 1 0 4 1 0 1 4 3 1 2 1 3 3 3 1 4 0 0 1 4 4 3 0 4 4 0 1 3 3 1 4 4 0 2 8 2 3 6 2 2 3 5 4 5 6 6 1 4 0 0 3 2 2 2 2 3 2 5 6 0 1 0 >AF019039\AF019039\125..2815\2691\AAB83973.1\Homo sapiens\Homo sapiens transportin2 mRNA, complete cds./codon_start=1/product="transportin2"/protein_id="AAB83973.1"/db_xref="GI:2589204"/IGNORED_CODON=1 2 12 9 1 1 7 2 33 66 7 0 8 8 11 8 4 9 7 8 17 7 2 9 30 6 9 6 40 5 12 3 21 7 6 4 19 25 4 10 32 31 7 7 48 19 3 15 45 38 21 13 8 19 8 28 9 0 43 15 29 12 0 1 0 >AK126036\AK126036\3..3041\3039\BAC86406.1\Homo sapiens\Homo sapiens cDNA FLJ44048 fis, clone 
TESTI4030669./codon_start=1/protein_id="BAC86406.1"/db_xref="GI:34532374" 3 0 0 1 20 2 11 5 11 15 29 19 42 16 1 36 12 22 27 9 2 20 19 7 0 17 15 10 0 20 14 4 1 3 14 10 17 37 62 38 13 39 33 16 8 12 69 24 8 43 4 14 1 14 7 21 31 10 39 13 2 0 0 1 >HSU86529\U86529\104..754\651\AAB96392.1\Homo sapiens\Human glutathione transferase Zeta 1 (GSTZ1) mRNA, complete cds./gene="GSTZ1"/EC_number="2.5.1.18"/codon_start=1/product="glutathione transferase Zeta 1"/protein_id="AAB96392.1"/db_xref="GI:2228731" 3 0 1 2 2 4 2 4 14 1 0 3 2 3 0 4 4 0 2 6 2 3 3 7 1 4 2 8 2 4 3 5 1 1 1 2 9 1 1 11 4 3 2 15 2 0 1 9 5 5 3 3 3 2 4 2 2 11 5 5 2 0 1 0 >AF272884\AF272884\55..3129\3075\AAK77662.1\Homo sapiens\Homo sapiens gamma-tubulin complex component GCP5 (GCP5) mRNA,complete cds./gene="GCP5"/codon_start=1/product="gamma-tubulin complex component GCP5"/protein_id="AAK77662.1"/db_xref="GI:15021369" 8 2 8 4 18 10 9 19 31 19 18 21 11 12 2 20 15 24 14 13 6 17 15 8 3 12 19 13 5 16 16 9 6 7 11 16 25 19 30 30 18 19 20 35 15 22 58 24 26 36 15 18 6 9 18 26 17 12 21 28 20 1 0 0 >AK097353\AK097353\25..735\711\BAC05011.1\Homo sapiens\Homo sapiens cDNA FLJ40034 fis, clone SYNOV2000152, highly similarto Homo sapiens kappa 1 immunoglobulin light chain mRNA./codon_start=1/protein_id="BAC05011.1"/db_xref="GI:21757071" 2 1 1 0 4 3 1 7 10 2 1 1 2 3 2 8 8 6 4 8 1 4 4 2 2 4 3 6 2 2 6 4 3 3 1 8 4 2 8 5 7 2 4 12 3 1 4 6 7 5 5 3 3 3 6 5 0 5 1 3 3 0 1 0 >AK123305\AK123305\1306..2115\810\BAC85577.1\Homo sapiens\Homo sapiens cDNA FLJ41311 fis, clone BRAMY2042760./codon_start=1/protein_id="BAC85577.1"/db_xref="GI:34528814" 3 3 4 0 2 2 2 6 18 4 0 0 7 11 2 5 9 4 1 8 4 3 1 10 3 4 0 6 1 2 2 9 7 3 3 5 5 2 5 15 1 0 3 9 3 2 1 12 9 5 11 1 4 1 2 3 1 7 2 6 5 1 0 0 >AB183548\AB183548\402..4154\3753\BAD27573.1\Homo sapiens\Homo sapiens DREG mRNA for developmentally regulatedG-protein-coupled receptor beta 1, complete cds./gene="DREG"/codon_start=1/product="developmentally regulated G-protein-coupled receptor beta 
1"/protein_id="BAD27573.1"/db_xref="GI:50251168" 6 3 6 1 18 8 16 20 25 18 25 24 30 23 1 25 28 25 24 24 3 21 18 11 1 17 24 19 3 27 23 14 13 12 14 17 25 26 48 20 42 57 15 24 10 14 34 15 21 34 18 20 13 22 33 39 10 32 42 24 25 1 0 0 >AY355461\AY355461\1..1044\1044\AAQ84219.1\Homo sapiens\Homo sapiens MNK1-like kinase 1b (MNK1B) mRNA, complete cds./gene="MNK1B"/codon_start=1/product="MNK1-like kinase 1b"/protein_id="AAQ84219.1"/db_xref="GI:35187115" 4 2 5 1 1 7 1 3 13 5 1 8 0 5 0 6 7 6 3 6 2 1 8 4 0 4 5 14 0 4 6 11 4 4 2 5 13 3 11 16 5 4 6 12 7 2 14 15 16 5 6 4 3 7 6 8 3 10 3 6 4 0 0 1 >BC014256\BC014256\84..1037\954\AAH14256.1\Homo sapiens\Homo sapiens guanine nucleotide binding protein (G protein), betapolypeptide 2-like 1, mRNA (cDNA clone MGC:20686 IMAGE:4765908),complete cds./gene="GNB2L1"/codon_start=1/product="guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1"/protein_id="AAH14256.1"/db_xref="GI:15559817"/db_xref="GeneID:10399"/db_xref="MIM:176981" 4 4 2 2 0 2 2 7 15 2 2 0 3 8 1 8 7 3 4 17 5 5 3 3 2 2 1 7 0 7 6 17 1 3 3 6 10 3 4 13 13 1 1 11 7 1 4 5 5 16 4 2 4 4 4 4 0 14 5 5 13 0 1 0 >HUMSP4040A\L00974\join(520..931,1460..1564,5852..6081,7221..7396,7605..7614)\933\AAA60567.1\Homo sapiens\Homo sapiens SP40,40 gene, exons 5-9./gene="SP40,40"/codon_start=1/product="'SP40,40'"/protein_id="AAA60567.1"/db_xref="GI:338305" 1 6 12 0 0 5 1 6 15 2 0 4 0 14 2 4 6 0 1 7 5 4 1 10 3 4 1 3 3 4 0 3 0 2 1 8 8 1 4 9 12 1 1 18 13 0 6 20 17 4 8 1 3 2 17 4 3 7 0 10 3 0 0 1 >AK124881\AK124881\2679..3668\990\BAC85978.1\Homo sapiens\Homo sapiens cDNA FLJ42891 fis, clone BRHIP3008405, highly similarto Dynamin 2 (EC 3.6.1.50)./codon_start=1/protein_id="BAC85978.1"/db_xref="GI:34530791" 1 2 2 2 3 7 2 7 15 4 0 3 6 8 1 5 9 5 3 2 3 1 8 11 1 10 2 8 3 8 6 14 8 3 1 6 7 2 4 13 6 1 3 13 5 4 5 15 9 2 6 1 5 7 10 6 0 8 1 7 9 0 1 0 >AK126012\AK126012\744..2759\2016\BAC86391.1\Homo sapiens\Homo sapiens cDNA FLJ44024 fis, clone TESTI4026524, moderatelysimilar to Chromodomain 
helicase-DNA-binding protein 4./codon_start=1/protein_id="BAC86391.1"/db_xref="GI:34532335" 7 16 14 1 6 9 1 10 44 5 0 6 3 15 2 5 13 4 2 7 5 0 15 18 11 7 3 31 6 6 1 17 16 3 1 8 20 3 5 40 22 4 2 34 17 3 11 67 33 10 12 3 1 1 12 7 1 19 3 15 8 0 1 0 >BC093694\BC093694\49..1269\1221\AAH93694.1\Homo sapiens\Homo sapiens uronyl-2-sulfotransferase, mRNA (cDNA clone MGC:120729IMAGE:7939539), complete cds./gene="UST"/codon_start=1/product="uronyl-2-sulfotransferase"/protein_id="AAH93694.1"/db_xref="GI:62739710"/db_xref="GeneID:10090" 1 6 5 3 7 9 1 9 22 7 5 6 2 5 1 2 6 3 1 4 1 8 7 14 2 11 1 5 1 1 10 10 6 1 2 8 11 1 5 18 11 7 3 12 8 7 15 15 9 8 15 5 4 3 15 10 1 10 7 8 5 0 0 1 >BC030579\BC030579\108..1301\1194\AAH30579.1\Homo sapiens\Homo sapiens UDP-GlcNAc:betaGalbeta-1,3-N-acetylglucosaminyltransferase 1, mRNA (cDNA cloneMGC:26071 IMAGE:4828158), complete cds./gene="B3GNT1"/codon_start=1/product="B3GNT1 protein"/protein_id="AAH30579.1"/db_xref="GI:21040509"/db_xref="GeneID:10678"/db_xref="MIM:605581" 3 1 3 1 6 6 0 9 21 4 3 9 2 8 1 4 5 7 2 4 4 6 6 7 2 5 6 3 2 2 5 8 6 3 4 5 6 8 16 14 15 8 7 8 6 8 8 13 11 12 12 6 6 1 10 10 7 9 7 10 6 1 0 0 >AY707088\AY707088\1..1269\1269\AAU12168.1\Homo sapiens\Homo sapiens paired box gene 6 isoform a mRNA, complete cds./codon_start=1/product="paired box gene 6 isoform a"/protein_id="AAU12168.1"/db_xref="GI:51872083" 6 2 6 0 14 5 4 3 12 2 2 2 6 15 3 3 13 11 9 17 1 4 12 8 7 10 5 8 1 5 8 10 8 6 4 5 8 6 8 5 14 7 13 17 3 3 10 11 5 9 6 6 4 2 3 6 5 6 8 14 6 1 0 0 >HSHEPSH\X07732\826..2079\1254\CAA30558.1\Homo sapiens\Human hepatoma mRNA for serine protease hepsin./codon_start=1/product="hepsin"/protein_id="CAA30558.1"/db_xref="GI:32064"/db_xref="GOA:P05981"/db_xref="UniProt/Swiss-Prot:P05981" 4 4 11 2 2 6 1 13 21 3 0 2 0 11 2 3 10 4 3 7 7 4 5 13 3 1 4 22 5 8 5 24 9 6 2 11 20 1 1 9 7 3 3 17 8 0 3 14 14 6 6 4 10 9 9 4 2 10 4 5 10 0 0 1 >AK022532\AK022532\89..3016\2928\BAB14081.1\Homo sapiens\Homo sapiens cDNA FLJ12470 fis, clone NT2RM1000885, weakly 
similarto HYPOTHETICAL 97.1 KD PROTEIN R05D3.4 IN CHROMOSOME III./codon_start=1/protein_id="BAB14081.1"/db_xref="GI:10433974" 16 14 13 11 12 13 17 14 31 17 5 24 10 14 0 18 7 13 14 13 1 11 8 2 1 6 24 22 2 15 9 9 6 6 4 12 18 10 40 62 14 21 14 54 8 9 63 80 18 41 5 10 3 8 9 10 4 15 18 25 2 0 0 1 >AK027310\AK027310\101..1150\1050\BAB55033.1\Homo sapiens\Homo sapiens cDNA FLJ14404 fis, clone HEMBA1004055./codon_start=1/protein_id="BAB55033.1"/db_xref="GI:14041907" 4 1 2 0 5 3 1 9 17 4 3 6 2 5 2 2 8 1 4 6 0 6 3 4 1 10 3 8 2 6 5 4 6 3 3 2 13 5 12 10 11 8 5 6 9 6 14 6 12 9 2 10 2 2 7 7 5 11 9 8 9 0 0 1 >HSA302580\AJ302580\1..939\939\CAC20500.1\Homo sapiens\Homo sapiens 6M1-6*02 gene for olfactory receptor, cell line SA./gene="6M1-6*02"/codon_start=1/product="olfactory receptor"/protein_id="CAC20500.1"/db_xref="GI:12054385"/db_xref="GOA:O76002"/db_xref="UniProt/Swiss-Prot:O76002" 0 1 2 3 3 1 5 15 12 10 1 6 7 4 3 5 3 2 5 7 0 7 5 1 2 7 7 4 2 4 5 2 4 3 7 5 8 10 4 5 3 6 2 5 4 7 5 3 3 4 7 6 4 6 11 11 1 10 9 13 5 0 0 1 >AK074849\AK074849\125..1726\1602\BAC11243.1\Homo sapiens\Homo sapiens cDNA FLJ90368 fis, clone NT2RP2004205, highly similarto B7 homolog 3./codon_start=1/protein_id="BAC11243.1"/db_xref="GI:22760560" 0 7 10 3 2 2 1 8 43 0 0 2 0 9 4 4 26 1 8 21 7 2 4 14 4 10 12 21 5 10 5 26 7 7 0 10 41 0 6 6 14 3 1 34 7 1 4 22 16 12 8 2 11 2 18 3 0 10 1 8 8 0 0 1 >HUMAARE\D38441\20..2218\2199\BAA07476.1\Homo sapiens\Human mRNA for acylamino acid-releasing enzyme, complete cds./EC_number="3.4.19.1"/codon_start=1/product="acylamino acid-releasing enzyme"/protein_id="BAA07476.1"/db_xref="GI:556514" 1 10 13 4 3 5 6 16 36 6 2 11 11 15 5 7 22 5 7 14 3 4 12 18 1 18 9 17 4 13 8 28 15 6 2 11 49 6 7 20 10 6 7 30 11 7 9 38 27 12 11 11 14 3 13 18 1 16 6 16 16 0 0 1 >HSDJ144C9#1\AL096774\complement(join(3258..3264,3806..3950,5535..5579, 6083..6154,6255..6422,6907..6994))\525\CAI19568.1\Homo sapiens\Human DNA sequence from clone RP1-144C9 on chromosome 1p34.3-36.11Contains the gene for a novel 
protein, a ribosomal protein S24(RPS24) pseudogene, the GPR3 gene for G protein-coupled receptor 3and the 3' end of the WASF2 gene for WAS protein family, member 2,complete sequence./locus_tag="RP1-144C9.1-001"/standard_name="OTTHUMP00000003472"/codon_start=1/product="novel protein"/protein_id="CAI19568.1"/db_xref="GI:56203462"/db_xref="UniProt/TrEMBL:Q5JXD6" 1 4 2 0 1 1 3 5 6 0 0 2 1 2 0 1 6 0 3 2 1 1 4 3 3 2 1 8 4 7 7 3 4 3 0 6 8 0 2 4 1 2 2 5 7 0 2 10 3 0 2 1 7 6 2 4 0 5 0 2 2 0 0 1 >AK027665\AK027665\72..1163\1092\BAB55277.1\Homo sapiens\Homo sapiens cDNA FLJ14759 fis, clone NT2RP3003290, moderatelysimilar to Mus musculus mRNA for Ndr1 related protein Ndr3./codon_start=1/protein_id="BAB55277.1"/db_xref="GI:14042511" 2 2 2 0 7 0 4 10 13 5 4 4 4 8 2 6 7 2 8 9 0 6 6 6 0 7 5 6 1 8 10 9 5 2 2 6 8 8 8 5 9 11 6 14 6 8 11 11 11 10 6 2 3 5 5 6 6 7 8 9 2 1 0 0 >S55843\S55843\1..60\60\AAB25673.1\Homo sapiens\rhodopsin {exon 5} [human, peripheral blood, Genomic Mutant, 60nt]./gene="rhodopsin"/codon_start=1/product="rhodopsin"/protein_id="AAB25673.1"/db_xref="GI:266288" 2 0 1 0 1 2 2 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 >AK027466\AK027466\1..687\687\BAB55131.1\Homo sapiens\Homo sapiens cDNA FLJ14560 fis, clone NT2RM2002049./codon_start=1/protein_id="BAB55131.1"/db_xref="GI:14042161" 2 1 6 2 1 2 3 8 14 3 1 3 3 6 2 1 4 0 3 7 0 1 4 4 0 3 4 10 1 5 3 10 2 1 0 6 11 2 3 6 4 1 1 2 4 2 2 6 12 2 3 2 0 1 10 7 0 4 5 7 5 0 0 1 >AF150733\AF150733\94..261\168\AAF67473.1\Homo sapiens\Homo sapiens AD-014 protein mRNA, complete cds./codon_start=1/product="AD-014 protein"/protein_id="AAF67473.1"/db_xref="GI:7688665" 0 1 0 0 1 1 2 1 2 2 0 1 0 0 0 0 3 0 0 0 0 1 1 1 0 2 1 2 0 0 3 1 1 1 0 0 2 1 1 4 0 0 0 4 0 0 1 1 0 1 1 0 0 0 1 3 0 1 2 2 2 1 0 0 >AB075876\AB075876\429..1238\810\BAD38658.1\Homo sapiens\Homo sapiens mRNA for putative protein product of HMFN2567,complete cds./gene="HMFN2567"/codon_start=1/product="putative 
protein product of HMFN2567"/protein_id="BAD38658.1"/db_xref="GI:51555806" 2 3 1 1 6 4 3 2 13 9 1 11 7 3 1 6 1 2 2 8 0 5 2 5 0 3 6 4 1 6 4 4 3 4 6 5 8 4 6 2 6 6 2 7 1 3 5 6 4 2 1 2 3 4 10 9 9 7 5 5 8 0 1 0 >AL662870#1\AL662870\join(42188..42272,42366..42468,42660..42888,44431..44578, 44838..44937,45460..45646,45754..46000,46526..46621, 46730..46881,47964..48129,48862..48972,50079..50323, 50472..50599,50837..51055,51531..51765,52346..52495, 52614..52754)\2742\CAI18442.1\Homo sapiens\Human DNA sequence from clone XXbac-70P20 on chromosome 6 containsthe DDR1 gene for discoidin domain receptor family, member 1, aputative novel transcript and two CpG islands, complete sequence./gene="DDR1"/locus_tag="XXbac-BPG70P20.1-005"/standard_name="OTTHUMP00000029346"/codon_start=1/product="discoidin domain receptor family, member 1"/protein_id="CAI18442.1"/db_xref="GI:55962110"/db_xref="InterPro:IPR000421"/db_xref="InterPro:IPR000719"/db_xref="InterPro:IPR001245"/db_xref="InterPro:IPR002011"/db_xref="InterPro:IPR008266"/db_xref="InterPro:IPR008979"/db_xref="InterPro:IPR011009" 6 16 27 8 4 9 4 29 55 8 4 7 4 14 2 10 17 9 3 18 5 5 15 35 10 19 12 42 5 14 12 35 31 6 3 13 38 6 3 19 21 9 2 33 12 7 5 44 32 17 14 15 10 7 20 13 0 22 4 24 20 0 0 1 >AF144477\AF144477\281..1777\1497\AAD44754.1\Homo sapiens\Homo sapiens myotilin mRNA, complete cds./codon_start=1/product="myotilin"/protein_id="AAD44754.1"/db_xref="GI:5532485" 3 2 1 5 16 2 4 3 6 8 7 7 14 14 3 8 9 5 9 8 1 9 21 3 0 11 14 4 1 20 14 4 3 3 5 2 10 9 20 10 13 14 21 21 4 6 19 6 6 18 4 9 3 4 11 8 8 9 4 9 3 1 0 0 >AK097588\AK097588\258..1961\1704\BAC05111.1\Homo sapiens\Homo sapiens cDNA FLJ40269 fis, clone TESTI2026597./codon_start=1/protein_id="BAC05111.1"/db_xref="GI:21757406" 6 2 3 5 4 4 6 8 14 5 12 9 7 8 0 11 11 5 4 11 1 10 10 5 2 8 15 15 1 13 15 11 7 12 3 3 13 14 9 25 10 6 11 22 7 6 21 16 10 20 5 16 5 6 20 13 4 8 14 15 5 0 1 0 >AL136131#14\AL136131\join(136965..137030,140059..140110,143187..143383, 
144178..144254,144607..144636,150259..150280)\444\CAC19515.1\Homo sapiens\Human DNA sequence from clone RP1-261G23 on chromosome 6p12.2-21.1Contains the KIAA0110 gene, the FLJ30845 gene, the MRPS18A gene formitochondrial ribosomal protein S18A, a pseudogene similar toOPA-interacting protein (OIP21) a novel gene, the VEGF gene forvascular endothelial growth factor and two CpG islands, completesequence./gene="VEGF"/locus_tag="RP1-261G23.1-003"/standard_name="OTTHUMP00000016488"/codon_start=1/product="vascular endothelial growth factor"/protein_id="CAC19515.1"/db_xref="GI:11989985"/db_xref="Genew:12680"/db_xref="GOA:P15692"/db_xref="InterPro:IPR000072"/db_xref="InterPro:IPR002400"/db_xref="UniProt/Swiss-Prot:P15692" 1 1 2 0 3 1 1 2 6 1 0 1 0 3 0 1 3 0 0 2 0 1 3 3 1 2 3 2 0 1 3 3 2 0 0 1 6 0 4 5 3 2 2 7 5 4 4 9 3 3 4 1 5 4 4 1 1 6 1 7 3 0 0 1 >AF218421\AF218421\28..897\870\AAF37561.1\Homo sapiens\Homo sapiens hepatocellular carcinoma-associated antigen 59 mRNA,complete cds./codon_start=1/product="hepatocellular carcinoma-associated antigen 59"/protein_id="AAF37561.1"/db_xref="GI:7158847" 2 3 6 4 4 6 2 1 9 3 1 3 3 5 2 1 1 3 3 6 2 2 3 5 1 6 6 4 1 4 1 5 3 2 3 2 10 3 10 23 12 4 1 8 2 3 9 32 9 10 2 5 0 1 3 3 2 5 4 10 0 0 0 1 >AK128288\AK128288\1550..1981\432\BAC87369.1\Homo sapiens\Homo sapiens cDNA FLJ46426 fis, clone THYMU3013897./codon_start=1/protein_id="BAC87369.1"/db_xref="GI:34535589" 0 1 1 1 0 0 1 7 4 3 2 7 1 5 1 5 4 0 3 4 0 2 5 4 2 8 1 7 0 2 1 1 2 1 2 3 0 3 0 0 1 1 1 5 4 2 1 2 1 1 1 0 6 4 5 3 0 5 0 5 1 0 0 1 >BC025255\BC025255\9..2417\2409\AAH25255.1\Homo sapiens\Homo sapiens spermatogenesis associated 20, mRNA (cDNA cloneMGC:39131 IMAGE:5019010), complete cds./gene="SPATA20"/codon_start=1/product="sperm protein SSP411"/protein_id="AAH25255.1"/db_xref="GI:19263653"/db_xref="GeneID:64847" 7 15 19 6 1 7 5 22 48 1 1 11 12 11 4 6 18 11 5 20 5 8 4 26 7 12 6 37 5 14 7 34 13 9 1 16 35 1 7 26 12 11 3 42 15 8 10 35 30 11 17 10 10 4 21 9 1 10 4 18 18 0 0 1 
>BC017583\BC017583\193..372\180\AAH17583.1\Homo sapiens\Homo sapiens cDNA clone MGC:26468 IMAGE:4828130, complete cds./codon_start=1/product="Unknown (protein for MGC:26468)"/protein_id="AAH17583.1"/db_xref="GI:17068408" 0 1 0 0 1 0 3 1 3 2 0 1 0 0 0 1 0 0 1 1 0 1 0 0 1 1 1 1 0 1 2 0 1 0 2 2 0 2 2 1 2 2 1 2 2 1 1 0 1 3 1 0 1 1 2 0 1 0 2 3 0 1 0 0 >BC070333\BC070333\59..421\363\AAH70333.1\Homo sapiens\Homo sapiens immunoglobulin heavy variable 1-69, mRNA (cDNA cloneMGC:88338 IMAGE:6557369), complete cds./gene="IGHV1-69"/codon_start=1/product="IGHV1-69 protein"/protein_id="AAH70333.1"/db_xref="GI:47124252"/db_xref="GeneID:28461"/db_xref="IMGT/GENE-DB:IGHV1-69" 1 0 0 0 3 1 0 1 3 1 0 0 1 4 0 3 6 0 5 2 5 0 0 0 0 3 3 3 1 4 3 2 4 2 0 4 8 0 0 6 1 0 1 7 0 0 1 3 4 0 3 2 1 1 3 2 0 4 1 3 4 0 0 1 >BC026288\BC026288\167..679\513\AAH26288.1\Homo sapiens\Homo sapiens ubiquitin-conjugating enzyme E2G 1 (UBC7 homolog, C.elegans), transcript variant 1, mRNA (cDNA clone MGC:23926IMAGE:4806647), complete cds./gene="UBE2G1"/codon_start=1/product="ubiquitin-conjugating enzyme E2G 1, isoform 1"/protein_id="AAH26288.1"/db_xref="GI:20070812"/db_xref="GeneID:7326"/db_xref="MIM:601569" 3 2 0 0 4 1 1 4 5 4 1 0 1 0 1 3 1 1 2 1 1 3 5 1 0 7 4 1 1 4 2 2 1 5 1 2 3 4 9 3 1 7 1 2 2 2 12 5 3 9 1 4 1 1 2 4 1 3 7 4 4 0 0 1 >AF397395\AF397395\90..737\648\AAK97476.1\Homo sapiens\Homo sapiens NOE3-4 (NOE3) mRNA, complete cds, alternativelyspliced./gene="NOE3"/codon_start=1/product="NOE3-4"/protein_id="AAK97476.1"/db_xref="GI:15420834" 1 1 3 1 5 3 3 4 10 4 5 5 1 4 0 3 3 1 4 3 1 3 1 1 0 4 2 3 1 4 2 0 3 2 0 2 3 4 13 9 5 1 7 13 1 1 15 6 3 9 3 2 3 1 2 3 1 2 7 7 1 1 0 0 >HSU43292\U43292\308..817\510\AAB05839.1\Homo sapiens\Human MDS1B (MDS1) mRNA, complete cds./gene="MDS1"/codon_start=1/product="MDS1B"/protein_id="AAB05839.1"/db_xref="GI:1294815" 1 0 1 0 2 6 1 2 3 2 0 3 3 3 1 5 3 2 2 1 0 2 4 4 0 9 4 6 0 2 7 2 3 2 6 1 1 1 5 4 2 4 1 3 1 1 7 7 2 5 4 4 1 2 3 2 3 4 2 4 3 0 1 0 
>BC008200\BC008200\331..1170\840\AAH08200.1\Homo sapiens\Homo sapiens keratin 8, mRNA (cDNA clone MGC:5377 IMAGE:3445666),complete cds./gene="KRT8"/codon_start=1/product="KRT8 protein"/protein_id="AAH08200.1"/db_xref="GI:14198278"/db_xref="GeneID:3856"/db_xref="MIM:148060" 0 5 8 3 0 4 1 5 24 0 0 1 0 11 2 7 16 1 4 5 1 1 0 2 0 0 3 14 2 6 1 10 6 2 1 3 5 1 1 15 9 0 0 14 1 1 2 30 9 5 5 4 0 0 2 1 0 10 6 9 0 0 0 1 >HS34B20#16\AL031777\complement(76356..76766)\411\CAC03421.1\Homo sapiens\Human DNA sequence from clone RP1-34B20 on chromosome 6p21.31-22.2Contains seventeen histone (pseudo)genes and gene RPS10P1 (40SRibosomal protein S10 pseudogene 1), three CpG islands, ESTs, STSsand GSSs, complete sequence./gene="HIST1H3G"/locus_tag="RP1-34B20.1-001"/standard_name="OTTHUMP00000016152"/codon_start=1/protein_id="CAC03421.1"/db_xref="GI:10198618"/db_xref="GOA:P16106"/db_xref="GOA:P68431"/db_xref="InterPro:IPR000164"/db_xref="InterPro:IPR007124"/db_xref="InterPro:IPR007125"/db_xref="UniProt/Swiss-Prot:P68431" 2 12 1 2 1 0 0 2 8 0 0 2 0 2 1 0 1 1 1 5 0 4 0 2 2 2 1 7 5 5 0 4 2 1 0 0 6 0 3 10 1 0 1 7 0 2 1 6 2 2 2 1 1 1 2 2 0 4 3 3 0 0 1 0 >HSU11870\U11870\2057..3109\1053\AAA64378.1\Homo sapiens\Human interleukin-8 receptor type A (IL8RBA) gene, promoter andcomplete cds./gene="IL8RA"/codon_start=1/evidence=experimental/product="interleukin-8 receptor type A"/protein_id="AAA64378.1"/db_xref="GI:511805" 1 6 3 5 0 3 4 9 34 3 1 6 2 7 1 4 6 3 7 7 0 5 4 5 1 2 4 15 0 3 6 10 1 0 0 17 9 4 1 10 10 8 1 6 4 6 3 6 4 8 9 4 7 3 14 10 0 16 4 12 6 0 0 1 >AB065748\AB065748\201..1151\951\BAC05968.1\Homo sapiens\Homo sapiens gene for seven transmembrane helix receptor, completecds, isolate:CBRC7TM_311./codon_start=1/evidence=not_experimental/product="seven transmembrane helix receptor"/protein_id="BAC05968.1"/db_xref="GI:21928764" 2 1 2 0 1 3 3 9 17 3 5 8 6 3 1 17 2 2 9 8 1 6 2 4 0 5 5 4 0 8 2 2 4 3 6 7 11 7 6 5 5 10 3 3 0 4 4 3 2 6 7 10 4 7 10 12 4 5 13 14 0 1 0 0 
>AF055460\AF055460\123..1031\909\AAC27036.1\Homo sapiens\Homo sapiens stanniocalcin-2 (STC-2) mRNA, complete cds./gene="STC-2"/codon_start=1/product="stanniocalcin-2"/protein_id="AAC27036.1"/db_xref="GI:3335144" 3 2 8 0 2 6 0 4 12 1 1 8 1 6 1 3 10 1 1 9 1 2 3 8 2 2 1 14 4 6 5 10 7 4 0 2 11 1 4 10 9 1 1 17 11 4 7 17 9 3 2 1 7 8 6 4 2 8 2 5 2 0 0 1 >BT007434\BT007434\1..2253\2253\AAP36102.1\Homo sapiens\Homo sapiens methylmalonyl Coenzyme A mutase mRNA, complete cds./codon_start=1/product="methylmalonyl Coenzyme A mutase"/protein_id="AAP36102.1"/db_xref="GI:30583707" 9 3 2 8 12 8 6 7 11 23 8 9 11 7 0 14 4 5 13 9 0 16 14 5 1 16 20 10 3 35 29 7 6 16 13 2 16 17 29 20 7 18 10 24 7 8 47 8 9 31 7 12 0 8 7 21 14 10 27 24 7 0 1 0 >HUMANTN\M60618 M34541\32..1474\1443\AAA35537.1\Homo sapiens\Human nuclear autoantigen (SP-100) mRNA, complete cds./gene="Sp-100"/codon_start=1/product="nuclear autoantigen"/protein_id="AAA35537.1"/db_xref="GI:178689" 3 0 0 3 15 8 4 7 12 4 4 3 12 6 2 12 12 12 10 4 1 7 10 9 2 8 12 3 1 6 8 6 4 7 6 5 8 4 12 13 8 18 16 7 7 7 32 24 14 18 2 2 5 6 9 9 7 6 8 8 2 0 1 0 >AK058150\AK058150\119..757\639\BAB71688.1\Homo sapiens\Homo sapiens cDNA FLJ25421 fis, clone TST03713./codon_start=1/protein_id="BAB71688.1"/db_xref="GI:16554206" 0 0 1 1 1 2 1 3 7 3 5 2 7 10 0 17 5 6 4 4 0 2 4 2 0 5 2 3 1 5 3 1 2 1 3 2 2 2 5 6 2 9 4 8 3 4 6 6 2 10 0 0 0 1 3 5 2 7 4 4 2 1 0 0 >AF193339\AF193339\303..3650\3348\AAF61199.1\Homo sapiens\Homo sapiens eukaryotic translation initiation factor 2 alphakinase PEK mRNA, complete cds./codon_start=1/product="eukaryotic translation initiation factor 2 alpha kinase PEK"/protein_id="AAF61199.1"/db_xref="GI:7341091" 9 9 4 5 20 17 5 18 28 16 14 17 24 14 2 30 20 23 22 11 7 19 30 9 8 21 16 19 14 23 26 13 12 8 15 8 20 22 37 25 16 25 11 24 10 15 57 36 29 39 8 25 4 14 15 27 13 18 28 25 16 0 1 0 >AF110763\AF110763\join(22..177,604..778,1380..1549,2043..2229,3457..3611)\843\AAD21579.1\Homo sapiens\Homo sapiens skeletal muscle LIM-protein 1 (FHL1) 
gene, completecds./gene="FHL1"/codon_start=1/product="skeletal muscle LIM-protein 1"/protein_id="AAD21579.1"/db_xref="GI:4512028" 0 4 1 1 1 1 0 0 4 1 0 2 0 5 0 2 1 2 1 9 0 5 0 7 0 1 2 11 2 3 7 3 7 2 0 2 15 2 7 26 8 2 8 7 9 4 2 9 10 6 7 5 25 10 12 8 0 5 1 1 4 1 0 0 >AK128139\AK128139\162..1736\1575\BAC87293.1\Homo sapiens\Homo sapiens cDNA FLJ46260 fis, clone TESTI4024494, weakly similarto Homo sapiens zinc-binding protein Rbcc728./codon_start=1/protein_id="BAC87293.1"/db_xref="GI:34535364" 6 10 12 3 4 8 2 14 34 7 0 6 4 10 2 3 12 6 10 15 2 3 10 17 3 8 8 24 0 7 10 16 10 9 3 1 21 0 0 24 8 1 6 29 13 3 10 26 9 2 7 1 22 7 9 2 2 8 3 9 3 0 0 1 >BC030509\BC030509\291..530\240\AAH30509.1\Homo sapiens\Homo sapiens DKFZp434A0131 protein, mRNA (cDNA cloneIMAGE:4400104), complete cds./gene="DKFZP434A0131"/codon_start=1/product="DKFZP434A0131 protein"/protein_id="AAH30509.1"/db_xref="GI:20988121"/db_xref="GeneID:54441" 0 1 0 0 1 1 1 2 8 1 1 0 0 1 0 0 4 0 1 1 1 2 3 0 0 0 1 1 0 2 1 0 4 3 1 0 1 0 0 3 1 0 1 5 4 0 0 5 5 0 1 0 1 1 4 1 0 0 0 2 2 0 0 1 >D87062\D87062\1..501\501\BAB46918.1\Homo sapiens\Homo sapiens hucep-3 mRNA for cerebral protein-3, complete cds./gene="hucep-3"/codon_start=1/evidence=not_experimental/product="cerebral protein-3"/protein_id="BAB46918.1"/db_xref="GI:13874423" 2 0 3 0 4 6 2 4 10 4 0 2 1 5 1 5 8 1 3 7 0 0 3 2 0 1 1 5 0 1 5 7 10 3 0 1 5 0 3 1 0 0 0 6 3 1 2 6 1 1 2 0 5 5 3 3 0 1 1 4 6 0 0 1 >HUMNKEFA\L19184\68..667\600\AAA50464.1\Homo sapiens\Human natural killer cell enhancing factor (NKEFA) mRNA, completecds./gene="NKEFA"/codon_start=1/product="enhancer protein"/protein_id="AAA50464.1"/db_xref="GI:440306" 0 2 1 0 1 2 2 1 2 3 1 2 2 1 1 5 1 2 2 4 1 3 3 4 1 7 2 3 0 7 5 4 3 3 2 3 6 3 10 9 3 3 2 6 2 2 5 2 5 11 2 3 4 2 10 5 0 6 7 3 2 0 0 1 >BC068477\BC068477\345..1376\1032\AAH68477.1\Homo sapiens\Homo sapiens Rap guanine nucleotide exchange factor (GEF) 3, mRNA(cDNA clone IMAGE:4813268), complete cds./gene="RAPGEF3"/codon_start=1/product="RAPGEF3 
protein"/protein_id="AAH68477.1"/db_xref="GI:46250001"/db_xref="GeneID:10411"/db_xref="MIM:606057" 8 2 11 3 2 5 0 10 28 3 1 4 0 7 2 3 9 0 2 9 1 3 5 8 1 2 5 13 1 5 6 6 10 2 0 2 21 2 2 12 7 1 2 13 7 8 8 19 9 10 4 1 7 1 7 3 0 11 1 4 4 1 0 0 >HSY14768#9\Y14768\join(26891..26933,29486..29637,29713..29830,29979..30086, 30224..30300)\498\CAA75068.1\Homo sapiens\Homo sapiens DNA, cosmid clones TN62 and TN82./gene="1C7"/codon_start=1/product="1C7f"/protein_id="CAA75068.1"/db_xref="GI:3805809"/db_xref="GOA:O14931"/db_xref="UniProt/Swiss-Prot:O14931" 3 0 3 1 3 0 1 5 7 2 0 2 1 6 0 2 4 0 3 2 1 1 3 5 0 2 0 6 0 5 5 5 7 2 0 6 11 1 2 1 0 2 1 3 5 3 2 7 2 2 2 2 4 3 4 1 0 4 3 4 3 0 1 0 >HSJ1103B4#2\AL121998\complement(join(48522..48673,53613..53835,58125..58282, 59446..59571,80071..80311,82557..82757,84604..84793, 90029..90176,91145..91289,98536..98629,111260..111336))\1755\CAI19172.1\Homo sapiens\Human DNA sequence from clone RP5-1103B4 on chromosome 1 Containsthe C8A gene for complement component 8 alpha polypeptide, the C8Bgene for complement component 8 beta polypeptide and a novel gene,complete sequence./gene="C8A"/locus_tag="RP5-1103B4.1-001"/standard_name="OTTHUMP00000010016"/codon_start=1/product="complement component 8, alpha polypeptide"/protein_id="CAI19172.1"/db_xref="GI:56203387"/db_xref="GOA:P07357"/db_xref="InterPro:IPR000884"/db_xref="InterPro:IPR001862"/db_xref="InterPro:IPR002172"/db_xref="InterPro:IPR006209"/db_xref="InterPro:IPR008085"/db_xref="InterPro:IPR008957"/db_xref="UniProt/Swiss-Prot:P07357" 4 6 6 2 7 10 0 3 14 7 5 10 9 10 1 8 13 6 12 9 3 7 7 3 1 7 14 15 0 6 15 16 12 13 8 1 11 7 16 16 9 14 5 28 6 3 17 16 21 18 11 15 16 14 10 12 2 15 11 12 9 0 0 1 >AY357727\AY357727\198..2042\1845\AAR29466.1\Homo sapiens\Homo sapiens progerin mRNA, complete cds./codon_start=1/product="progerin"/protein_id="AAR29466.1"/db_xref="GI:39653934" 2 29 15 9 0 5 2 9 50 6 0 3 6 12 6 6 24 2 1 19 4 8 2 6 2 3 9 28 7 14 4 11 14 3 1 4 20 1 4 36 12 7 1 38 9 4 7 64 23 12 7 2 5 0 4 3 1 13 1 
10 4 1 0 0 >HSA249976\AJ249976\91..1800\1710\CAB65116.1\Homo sapiens\Homo sapiens mRNA for AMP-activated protein kinase gamma2 subunit(AMPK gamma2 gene)./gene="AMPK gamma2"/function="AMP-activated protein kinase regulatory subunit"/codon_start=1/evidence=experimental/product="AMP-activated protein kinase gamma2 subunit"/protein_id="CAB65116.1"/db_xref="GI:6688199"/db_xref="GOA:Q9UGJ0"/db_xref="UniProt/Swiss-Prot:Q9UGJ0" 5 6 3 2 8 7 3 6 18 8 5 5 9 27 4 9 12 7 11 12 4 2 8 19 10 10 8 13 7 9 8 10 6 6 9 3 14 10 21 20 7 10 6 12 9 5 15 18 13 12 6 10 2 1 19 10 10 13 12 13 2 0 0 1 >BC015803\BC015803\8..1057\1050\AAH15803.1\Homo sapiens\Homo sapiens interferon regulatory factor 2, mRNA (cDNA cloneMGC:9260 IMAGE:3920890), complete cds./gene="IRF2"/codon_start=1/product="interferon regulatory factor 2"/protein_id="AAH15803.1"/db_xref="GI:16041826"/db_xref="GeneID:3660"/db_xref="MIM:147576" 2 3 6 0 3 3 1 6 8 4 0 1 3 12 1 5 14 6 4 6 3 6 8 8 7 7 4 5 4 1 6 3 5 0 4 9 8 4 14 16 9 10 5 8 2 5 14 15 7 11 5 2 2 1 3 3 4 9 6 10 8 1 0 0 >HUMHAHLABW\M83193\1..1017\1017\AAA58628.1\Homo sapiens\Homo sapiens (isolate HA) HLA-Bw62 antigen (HLA-Bw62.3 IEF variant,B*1501 allele) mRNA, 3'end./gene="HLA-Bw62.3"/codon_start=1/product="HLA-Bw62 antigen"/protein_id="AAA58628.1"/db_xref="GI:183777" 1 8 5 1 6 7 1 6 14 1 0 0 1 9 0 7 7 2 6 14 2 3 4 7 4 2 4 10 11 8 5 12 9 1 1 3 15 2 1 8 5 0 1 19 4 5 1 23 16 4 13 2 4 1 6 0 1 9 1 5 10 0 0 1 >AF487652\AF487652\join(2955..3328,7168..7693)\900\AAL96666.1\Homo sapiens\Homo sapiens fibrinogen silencer binding protein (FSBP) gene,complete cds./gene="FSBP"/codon_start=1/product="fibrinogen silencer binding protein"/protein_id="AAL96666.1"/db_xref="GI:19744802" 1 3 2 0 9 6 9 3 4 4 7 8 5 5 2 7 1 6 3 5 0 2 6 1 0 8 7 1 0 3 4 1 1 2 7 0 5 7 14 9 4 8 8 12 4 7 16 15 3 12 0 7 1 1 0 7 4 2 11 8 1 1 0 0 >AF226604\AF226604\1..579\579\AAF64280.1\Homo sapiens\Homo sapiens sigma 1 receptor beta variant mRNA, complete cds,alternatively spliced./codon_start=1/product="sigma 1 
receptor beta variant"/protein_id="AAF64280.1"/db_xref="GI:7582320" 0 3 8 1 0 0 0 8 16 3 0 2 0 4 2 1 2 0 2 7 2 2 3 1 0 2 2 8 7 4 0 11 4 3 2 6 9 0 0 0 1 1 0 9 6 0 1 11 5 0 3 3 1 0 9 1 1 2 0 4 9 0 0 1 >BC065530\BC065530\225..3236\3012\AAH65530.1\Homo sapiens\Homo sapiens methyl-CpG binding domain protein 6, mRNA (cDNA cloneMGC:70424 IMAGE:6152748), complete cds./gene="MBD6"/codon_start=1/product="methyl-CpG binding domain protein 6"/protein_id="AAH65530.1"/db_xref="GI:41351092"/db_xref="GeneID:114785" 7 6 12 11 10 13 10 29 47 22 5 13 21 23 2 27 32 17 8 19 2 11 47 79 8 80 15 47 3 26 27 29 44 15 3 11 15 4 7 10 10 13 10 20 12 6 9 28 13 7 5 0 6 6 12 7 1 5 3 9 4 0 1 0 >AF033095\AF033095\63..776\714\AAB87479.1\Homo sapiens\Homo sapiens testis enhanced gene transcript protein (TEGT) mRNA,complete cds./gene="TEGT"/codon_start=1/product="testis enhanced gene transcript protein"/protein_id="AAB87479.1"/db_xref="GI:2645729" 1 1 0 1 2 1 0 6 14 9 2 6 2 3 0 3 4 2 3 3 1 5 0 2 1 2 7 8 2 5 7 6 2 1 0 7 3 2 5 8 4 2 2 5 2 6 5 2 0 8 1 5 2 3 10 11 3 6 8 12 3 0 0 1 >BC028237\BC028237\429..1295\867\AAH28237.1\Homo sapiens\Homo sapiens growth differentiation factor 10, mRNA (cDNA cloneMGC:39897 IMAGE:5245100), complete cds./gene="GDF10"/codon_start=1/product="GDF10 protein"/protein_id="AAH28237.1"/db_xref="GI:20380935"/db_xref="GeneID:2662"/db_xref="MIM:601361" 1 8 8 1 1 7 1 4 14 1 0 0 2 6 2 2 4 0 0 2 2 1 4 20 7 3 5 17 7 1 2 6 5 0 0 4 12 3 4 11 8 3 0 12 4 1 1 13 12 8 6 2 4 3 7 2 1 11 1 6 5 0 0 1 >AB044786\AB044786\211..780\570\BAD29732.1\Homo sapiens\Homo sapiens Si-1-2 mRNA for zinc finger protein, complete cds./gene="Si-1-2"/codon_start=1/product="zinc finger protein"/protein_id="BAD29732.1"/db_xref="GI:50300079" 0 6 4 1 1 3 0 1 20 0 0 1 0 6 1 0 4 3 2 3 5 1 4 12 5 5 1 9 3 1 2 3 3 2 0 4 5 2 1 7 1 0 0 8 3 0 0 10 7 3 3 1 7 2 4 0 0 2 0 4 3 1 0 0 >AL355315#4\AL355315\join(13479..13689,30632..30765,40285..40434)\495\CAI15455.1\Homo sapiens\Human DNA sequence from clone RP11-548K23 on chromosome 
10 Containsthe ANKRD2 gene for ankyrin repeat domain 2 (stretch responsivemuscle), six novel genes, the gene for phosphatidylinositol4-kinase type II (PI4KII) and four CpG islands, complete sequence./gene="C10orf65"/locus_tag="RP11-548K23.9-001"/standard_name="OTTHUMP00000020215"/codon_start=1/product="chromosome 10 open reading frame 65"/protein_id="CAI15455.1"/db_xref="GI:55958616"/db_xref="GOA:Q96EV5"/db_xref="InterPro:IPR002220"/db_xref="UniProt/TrEMBL:Q96EV5" 2 4 1 0 0 3 1 2 12 0 0 2 1 1 0 2 4 0 0 5 1 2 2 7 0 1 2 6 2 4 2 5 10 2 0 4 8 0 4 3 2 3 2 6 2 0 1 10 3 2 2 2 5 0 4 2 0 3 2 3 5 0 0 1 >AL355145#2\AL355145\join(12011..12137,14114..14179,14563..14667,15064..15274, 15671..15861,16440..16522,16904..17157,17987..18185, 18328..18449,18979..19774,20643..20657)\2169\CAI22937.1\Homo sapiens\Human DNA sequence from clone RP5-831G13 on chromosome 1 Containsthe 3' end of the gene for a novel protein (possible ortholog ofmouse mitsugumin 29), genes MGC46534 and FLJ39035, theamphoterin-induced gene (KIAA1161) (AMIGO) and the 5' end of theGPR61 gene for G protein-coupled receptor 61, complete sequence./gene="ATXN7L2"/locus_tag="RP5-831G13.2-001"/standard_name="OTTHUMP00000012702"/codon_start=1/product="ataxin 7-like 1"/protein_id="CAI22937.1"/db_xref="GI:56206276"/db_xref="UniProt/TrEMBL:Q5T6C5" 1 10 23 4 5 6 0 14 24 6 2 4 6 25 2 13 25 7 7 21 3 4 28 29 6 25 12 31 8 19 6 19 22 6 4 5 20 4 17 40 12 3 3 21 14 6 12 34 18 7 4 4 19 10 10 4 1 9 1 13 4 1 0 0 >BC040555\BC040555\725..1099\375\AAH40555.1\Homo sapiens\Homo sapiens thiamin pyrophosphokinase 1, mRNA (cDNA cloneMGC:41789 IMAGE:5269277), complete cds./gene="TPK1"/codon_start=1/product="TPK1 protein"/protein_id="AAH40555.1"/db_xref="GI:71296864"/db_xref="GeneID:27010"/db_xref="MIM:606370" 0 0 0 1 4 2 1 3 3 4 1 2 0 0 1 1 0 0 3 4 0 5 3 1 0 4 1 2 1 1 4 3 1 1 1 1 3 4 2 3 1 3 3 3 3 1 1 2 3 3 2 0 0 4 1 2 3 8 3 4 3 0 0 1 >AK056984\AK056984\96..752\657\BAB71332.1\Homo sapiens\Homo sapiens cDNA FLJ32422 fis, clone SKMUS2000933, 
weakly similarto RNA polymerase III subunit./codon_start=1/protein_id="BAB71332.1"/db_xref="GI:16552536" 2 1 8 3 1 2 5 3 5 0 0 4 5 0 0 1 1 1 3 4 0 2 4 9 1 5 1 4 0 3 2 5 4 5 2 2 4 0 4 11 2 3 1 6 0 1 21 20 7 9 3 5 0 0 4 2 2 4 4 5 2 0 0 1 >BT019526\BT019526\1..930\930\AAV38333.1\Homo sapiens\Homo sapiens protein phosphatase 2 (formerly 2A), catalyticsubunit, beta isoform mRNA, complete cds./codon_start=1/product="protein phosphatase 2 (formerly 2A), catalytic subunit, beta isoform"/protein_id="AAV38333.1"/db_xref="GI:54695922" 2 2 3 9 4 0 1 5 9 7 8 1 5 2 0 3 1 1 9 6 1 4 9 1 1 3 3 3 2 4 8 4 2 9 5 2 7 7 4 7 7 7 7 7 2 8 9 10 10 15 5 11 3 7 6 8 4 1 7 6 5 0 1 0 >BT006840\BT006840\1..1113\1113\AAP35486.1\Homo sapiens\Homo sapiens phosphoserine aminotransferase mRNA, complete cds./codon_start=1/product="phosphoserine aminotransferase"/protein_id="AAP35486.1"/db_xref="GI:30582519" 1 1 2 1 2 4 2 4 10 6 5 8 5 7 1 5 5 4 5 2 2 1 8 6 1 3 4 14 2 10 10 7 7 6 1 8 16 10 12 15 10 12 5 6 3 3 9 10 8 10 6 7 3 4 6 11 3 4 15 9 3 0 1 0 >AF498921\AF498921\1..642\642\AAM18130.1\Homo sapiens\Homo sapiens purinergic receptor P2RY11 (P2RY11) mRNA, completecds./gene="P2RY11"/codon_start=1/product="purinergic receptor P2RY11"/protein_id="AAM18130.1"/db_xref="GI:20269977" 1 3 5 2 1 3 2 8 16 0 0 2 1 3 0 0 11 1 4 1 1 2 2 9 4 2 6 20 5 1 0 10 6 1 0 1 13 0 1 4 3 1 2 5 5 0 0 6 4 1 8 2 8 2 2 1 1 2 0 6 2 0 0 1 >AK002033\AK002033\135..2447\2313\BAA92047.1\Homo sapiens\Homo sapiens cDNA FLJ11171 fis, clone PLACE1007317, weakly similarto Drosophila melanogaster Adrift mRNA./codon_start=1/protein_id="BAA92047.1"/db_xref="GI:7023671" 2 0 3 3 9 7 13 10 17 25 12 24 7 9 1 19 11 14 8 6 0 23 16 7 1 7 10 7 3 19 14 3 7 9 11 5 7 16 28 21 15 29 11 20 9 25 34 16 9 21 7 11 10 22 12 39 7 7 21 18 13 0 0 1 >BC003134\BC003134\554..1264\711\AAH03134.1\Homo sapiens\Homo sapiens achaete-scute complex-like 1 (Drosophila), mRNA (cDNAclone MGC:3624 IMAGE:2821173), complete cds./gene="ASCL1"/codon_start=1/product="achaete-scute 
complex homolog-like 1"/protein_id="AAH03134.1"/db_xref="GI:13111927"/db_xref="GeneID:429"/db_xref="MIM:100790" 1 10 2 0 1 0 0 3 10 2 0 2 3 4 7 3 7 1 1 3 1 0 0 11 8 0 6 17 13 0 0 12 1 1 0 7 3 0 1 8 10 0 1 29 2 1 2 11 7 0 5 0 1 1 6 3 0 2 0 5 1 0 0 1 >BC066907\BC066907\23..1381\1359\AAH66907.1\Homo sapiens\Homo sapiens ethanolamine kinase 1, mRNA (cDNA clone MGC:87084IMAGE:5260109), complete cds./gene="ETNK1"/codon_start=1/product="ethanolamine kinase 1"/protein_id="AAH66907.1"/db_xref="GI:45219773"/db_xref="GeneID:55500" 4 4 5 4 3 3 6 11 12 7 3 5 2 8 1 6 3 8 3 6 1 6 7 9 4 6 9 16 2 10 10 8 7 2 6 14 11 6 9 11 8 15 6 14 4 8 14 16 5 14 10 9 5 4 10 12 4 7 14 7 8 1 0 0 >HSA512835\AJ512835\108..1424\1317\CAD55563.1\Homo sapiens\Homo sapiens mRNA for putative calcium binding transporter (ORF1)./codon_start=1/product="putative calcium binding transporter"/protein_id="CAD55563.1"/db_xref="GI:28551967"/db_xref="GOA:Q86Y43"/db_xref="InterPro:IPR001993"/db_xref="InterPro:IPR002048"/db_xref="InterPro:IPR002067"/db_xref="InterPro:IPR002167"/db_xref="UniProt/TrEMBL:Q86Y43" 3 13 11 3 1 5 1 9 34 5 0 4 6 7 3 2 5 4 6 8 7 1 3 6 3 2 4 17 2 6 2 22 9 8 1 8 15 0 3 14 7 5 6 19 6 4 5 23 20 6 7 5 4 0 10 2 3 14 8 11 10 1 0 0 >BC033082\BC033082\741..1604\864\AAH33082.1\Homo sapiens\Homo sapiens KIAA1922 protein, mRNA (cDNA clone MGC:45647IMAGE:2960715), complete cds./gene="KIAA1922"/codon_start=1/product="KIAA1922 protein"/protein_id="AAH33082.1"/db_xref="GI:23138751"/db_xref="GeneID:114819" 2 12 15 1 3 17 1 3 21 2 0 0 2 1 0 1 20 3 1 4 0 0 3 3 1 5 4 16 3 4 4 9 4 2 0 2 8 1 2 12 2 0 5 21 5 1 4 27 7 1 1 2 5 1 0 0 0 1 0 4 8 0 1 0 >BC009404\BC009404\15..809\795\AAH09404.1\Homo sapiens\Homo sapiens ribosomal protein S3A, mRNA (cDNA clone MGC:15425IMAGE:4308955), complete cds./gene="RPS3A"/codon_start=1/product="ribosomal protein S3a"/protein_id="AAH09404.1"/db_xref="GI:14424796"/db_xref="GeneID:6189"/db_xref="MIM:180478" 2 3 2 3 3 1 0 4 4 5 0 5 0 1 0 6 1 3 3 7 2 5 4 1 0 2 3 2 1 7 7 5 1 7 1 7 6 9 20 18 
4 6 5 7 2 3 12 3 5 12 1 4 2 2 5 5 3 2 7 11 2 1 0 0 >AK126051\AK126051\1..2937\2937\BAC86413.1\Homo sapiens\Homo sapiens cDNA FLJ44063 fis, clone TESTI4035637./codon_start=1/protein_id="BAC86413.1"/db_xref="GI:34532396" 2 0 0 0 14 7 10 6 8 9 26 19 35 16 3 32 13 20 31 6 1 25 22 5 0 17 13 7 0 14 11 4 5 5 18 7 17 31 72 30 14 44 35 12 5 9 58 27 10 48 0 21 2 11 10 21 27 9 39 14 1 0 0 1 >HUMPRPC\K03204\38..1033\996\AAA60185.1\Homo sapiens\Human PRB1 locus salivary proline-rich protein mRNA, clone cP3,complete cds./gene="PRB1"/codon_start=1/protein_id="AAA60185.1"/db_xref="GI:190486"/db_xref="GDB:G00-119-511" 5 1 0 0 3 0 1 0 5 0 1 2 2 7 0 4 3 4 0 0 0 0 60 31 0 30 4 3 0 3 40 11 2 10 0 1 1 0 4 16 10 4 40 12 0 0 4 0 3 1 0 0 0 0 0 0 1 0 1 1 0 0 0 1 >AB020592\AB020592\166..1245\1080\BAB18742.1\Homo sapiens\Homo sapiens HspBP mRNA for heat shock protein binding protein,complete cds./gene="HspBP"/codon_start=1/product="heat shock protein binding protein"/protein_id="BAB18742.1"/db_xref="GI:11559220" 4 8 9 2 0 3 1 8 41 1 0 2 5 6 3 4 6 2 2 3 2 1 4 7 4 3 4 17 7 5 2 18 9 3 0 4 14 0 2 6 4 2 2 29 7 1 6 28 13 1 2 0 7 6 9 1 0 4 0 13 2 0 0 1 >AF259792\AF259792\193..4653\4461\AAG36928.1\Homo sapiens\Homo sapiens p250R mRNA, complete cds./codon_start=1/product="p250R"/protein_id="AAG36928.1"/db_xref="GI:11527189"/IGNORED_CODON=1 10 13 4 8 14 18 5 13 41 15 5 23 18 35 13 23 54 25 10 19 10 17 51 52 22 40 34 25 12 22 27 40 34 15 4 21 25 10 28 42 43 16 21 93 19 9 30 56 42 27 28 17 9 7 14 20 7 28 16 74 12 0 0 1 >BT020103\BT020103\1..900\900\AAV38906.1\Homo sapiens\Homo sapiens F11 receptor mRNA, complete cds./codon_start=1/product="F11 receptor"/protein_id="AAV38906.1"/db_xref="GI:54697068" 2 1 5 2 3 1 0 4 10 2 0 7 4 8 3 9 6 4 11 9 1 6 3 6 0 8 4 7 2 3 5 6 9 2 1 9 15 4 5 11 5 8 3 2 2 0 10 9 5 4 5 6 4 3 7 6 2 8 4 5 3 0 1 0 >AB062402\AB062402\189..740\552\BAB93489.1\Homo sapiens\Homo sapiens OK/SW-cl.84 mRNA for ferritin-heavy polypeptide 1,complete 
cds./gene="OK/SW-cl.84"/codon_start=1/product="ferritin-heavy polypeptide 1"/protein_id="BAB93489.1"/db_xref="GI:21104438" 2 4 0 0 0 1 1 2 10 2 1 6 2 3 1 3 2 1 1 4 1 1 1 2 0 0 2 6 3 2 2 2 1 2 0 0 5 1 8 5 6 6 2 8 5 5 7 9 10 5 8 1 0 3 1 5 0 5 1 5 1 1 0 0 >AL356356#6\AL356356\join(15278..15316,16226..16250,17936..17987,19673..19751, 24792..25009,25764..25926,26159..26296,26478..26555, 31597..31697,31894..31995,32168..32240,32853..32967, 34178..34326)\1332\CAI15490.1\Homo sapiens\Human DNA sequence from clone RP11-54A4 on chromosome 1 Containsthe 3' end of the gene for a novel protein (KIAA0460), the gene forthreonyl-tRNA synthetase (FLJ12528), the ECM1 gene forextracellular matrix protein 1, the TSRC1 gene for thrombospondinrepeat containing 1, two novel genes, the MCL1 gene for myeloidcell leukemia sequence 1 (BCL2-related), the ENSA gene forendosulfine alpha, the 3' end of the gene for GPP34-related protein(GPP34R) and two CpG islands, complete sequence./gene="TARSL1"/locus_tag="RP11-54A4.9-012"/standard_name="OTTHUMP00000014941"/codon_start=1/product="threonyl-tRNA synthetase-like 1"/protein_id="CAI15490.1"/db_xref="GI:55958998"/db_xref="GOA:Q5T5E9"/db_xref="InterPro:IPR002314"/db_xref="InterPro:IPR002320"/db_xref="InterPro:IPR004154"/db_xref="InterPro:IPR006195"/db_xref="UniProt/TrEMBL:Q5T5E9" 5 3 11 4 6 4 5 10 28 9 0 3 1 7 1 8 6 6 9 4 2 5 8 9 2 7 9 16 1 6 11 9 10 7 0 6 13 3 5 8 6 3 4 25 11 6 15 13 15 11 4 6 7 4 15 8 0 5 6 4 8 0 0 1 >HUMENOA\M14328\95..1399\1305\AAA52387.1\Homo sapiens\Human alpha enolase mRNA, complete cds./gene="ENO1"/codon_start=1/protein_id="AAA52387.1"/db_xref="GI:182114" 1 5 1 1 3 6 1 10 20 2 0 3 3 5 2 11 3 2 4 5 0 9 4 8 1 3 4 19 5 15 4 11 13 10 2 12 12 7 11 27 14 10 1 10 3 3 9 20 14 12 7 4 6 0 12 5 0 19 8 9 3 1 0 0 >AL359553#2\AL359553\join(5268..5398,7097..7248,8958..9235,10804..11013, 16292..16450,16800..16869,17967..18022)\1056\CAC19798.1\Homo sapiens\Human DNA sequence from clone RP5-871G17 on chromosome 1 Containsthe 3' end of the HAO2 
gene for hydroxyacid oxidase 2 (long chain),the HSD3B2 gene for hydroxy-delta-5-steroid dehydrogenase 3 beta-and steroid delta-isomerase 2, two glyceraldehyde-3-phosphatedehydrogenase (GAPD) pseudogenes and two hydroxy-delta-5-steroiddehydrogenase 3 beta- and steroid delta-isomerase family (HSD3B)pseudogenes, complete sequence./gene="HAO2"/locus_tag="RP5-871G17.1-002"/standard_name="OTTHUMP00000014486"/codon_start=1/product="hydroxyacid oxidase 2 (long chain)"/protein_id="CAC19798.1"/db_xref="GI:12043434"/db_xref="Genew:4810"/db_xref="GOA:Q9NYQ3"/db_xref="HSSP:1GOX"/db_xref="InterPro:IPR000262"/db_xref="InterPro:IPR003009"/db_xref="InterPro:IPR008259"/db_xref="UniProt/Swiss-Prot:Q9NYQ3" 7 1 4 1 5 6 3 5 9 9 3 11 3 6 1 3 7 1 12 3 1 7 2 3 1 6 8 7 2 12 3 6 8 7 4 7 8 4 7 8 8 6 4 11 3 5 9 9 7 13 3 2 6 4 5 7 3 13 13 4 5 1 0 0 >AF343078\AF343078\67..1803\1737\AAK38647.1\Homo sapiens\Homo sapiens TOB3 mRNA, complete cds./codon_start=1/product="TOB3"/protein_id="AAK38647.1"/db_xref="GI:13752411" 4 15 19 4 2 8 1 11 36 6 1 6 2 8 5 1 11 2 5 11 10 2 2 12 8 2 5 38 13 8 4 16 16 2 0 9 17 3 6 31 8 8 6 33 11 1 7 43 24 2 6 4 2 2 11 6 2 13 2 18 7 0 0 1 >BC011855\BC011855\2..1444\1443\AAH11855.1\Homo sapiens\Homo sapiens DnaJ (Hsp40) homolog, subfamily A, member 3, mRNA(cDNA clone MGC:20326 IMAGE:4139881), complete cds./gene="DNAJA3"/codon_start=1/product="DnaJ (Hsp40) homolog, subfamily A, member 3"/protein_id="AAH11855.1"/db_xref="GI:15080163"/db_xref="GeneID:9093"/db_xref="MIM:608382" 4 4 7 2 4 11 1 3 12 5 1 5 3 12 2 6 15 2 10 11 8 3 1 10 3 13 8 16 3 7 12 22 11 5 0 6 21 3 11 22 8 3 2 19 6 3 5 21 10 11 11 6 5 6 14 7 5 14 6 10 3 0 0 1 >BC026307\BC026307\157..2136\1980\AAH26307.1\Homo sapiens\Homo sapiens chromosome 18 open reading frame 9, mRNA (cDNA cloneMGC:26034 IMAGE:4792211), complete cds./gene="C18orf9"/codon_start=1/product="hypothetical protein LOC79959"/protein_id="AAH26307.1"/db_xref="GI:20072843"/db_xref="GeneID:79959" 12 1 3 8 13 6 1 7 16 22 15 10 13 7 4 9 3 7 18 4 3 16 16 
5 1 19 16 9 1 15 18 5 4 10 12 4 10 19 23 7 7 13 14 15 7 10 38 12 7 22 5 13 3 20 5 23 10 9 12 13 9 0 1 0 >AF149722\AF149722\199..831\633\AAF37422.1\Homo sapiens\Homo sapiens ING1 tumor suppressor, variant B (ING1) mRNA, completecds./gene="ING1"/codon_start=1/product="ING1 tumor suppressor, variant B"/protein_id="AAF37422.1"/db_xref="GI:7158367" 1 6 4 1 0 3 0 2 8 0 0 0 1 6 2 1 6 0 2 2 2 0 0 9 0 1 0 5 8 1 0 10 3 0 1 1 8 1 5 18 13 2 0 10 5 1 1 19 13 1 3 1 5 3 3 0 0 6 0 4 2 0 1 0 >HSU43784\U43784\23..1171\1149\AAC50428.1\Homo sapiens\Human mitogen activated protein kinase activated protein kinase-3mRNA, complete cds./codon_start=1/product="mitogen activated protein kinase activated protein kinase-3"/protein_id="AAC50428.1"/db_xref="GI:1256005" 2 4 7 1 2 5 1 10 16 2 0 5 3 5 1 5 4 1 5 8 2 6 5 12 3 4 6 12 0 5 2 18 4 7 3 6 11 1 4 22 12 2 2 20 7 5 7 20 15 6 7 6 5 4 7 3 1 11 5 12 5 0 1 0 >AF369951\AF369951\83..442\360\AAM21294.1\Homo sapiens\Homo sapiens lung cancer oncogene 1 mRNA, complete cds./codon_start=1/product="lung cancer oncogene 1"/protein_id="AAM21294.1"/db_xref="GI:20385454" 0 1 7 3 0 2 1 2 5 2 2 2 0 1 0 0 1 0 1 1 2 0 3 3 4 1 1 2 1 3 2 3 0 2 1 0 2 0 4 2 2 0 0 1 0 3 7 2 3 4 3 6 1 0 2 7 1 4 3 3 0 0 0 1 >AK131264\AK131264\538..2541\2004\BAD18442.1\Homo sapiens\Homo sapiens cDNA FLJ16197 fis, clone CTONG1000113, highly similarto ZINC FINGER PROTEIN 184./codon_start=1/protein_id="BAD18442.1"/db_xref="GI:47077016" 3 2 3 1 17 4 4 6 6 12 2 3 12 7 0 13 12 13 13 11 1 25 14 6 1 10 2 11 2 7 17 7 11 5 5 2 8 3 45 17 12 18 15 30 9 36 44 21 6 13 5 14 4 32 10 15 8 4 18 4 6 0 0 1 >BC060810\BC060810\674..2020\1347\AAH60810.1\Homo sapiens\Homo sapiens G protein-coupled receptor 172B, mRNA (cDNA cloneMGC:71649 IMAGE:30348722), complete cds./gene="GPR172B"/codon_start=1/product="G protein-coupled receptor 172B"/protein_id="AAH60810.1"/db_xref="GI:38511970"/db_xref="GeneID:55065"/db_xref="MIM:607883" 0 2 4 2 1 2 4 9 52 8 1 8 6 6 1 8 9 4 2 12 1 5 11 15 3 12 15 27 2 7 3 19 10 13 5 5 37 2 
1 4 3 3 4 9 7 3 4 12 5 0 4 2 4 8 16 5 0 6 0 8 7 0 0 1 >BC016914\BC016914\88..705\618\AAH16914.1\Homo sapiens\Homo sapiens zinc finger, CCHC domain containing 4, mRNA (cDNAclone MGC:21108 IMAGE:4423672), complete cds./gene="ZCCHC4"/codon_start=1/product="ZCCHC4 protein"/protein_id="AAH16914.1"/db_xref="GI:16877315"/db_xref="GeneID:29063" 3 0 3 0 8 4 0 2 8 6 4 11 3 2 0 2 2 2 4 2 1 3 3 5 1 4 1 7 0 4 4 0 2 2 3 1 5 2 3 9 3 3 4 10 1 4 7 4 4 7 1 3 1 9 2 8 0 1 4 1 2 0 1 0 >AF006740\AF006740\57..1709\1653\AAC39545.1\Homo sapiens\Homo sapiens Coch-5B2 mRNA, complete cds./gene="Coch-5B2"/codon_start=1/protein_id="AAC39545.1"/db_xref="GI:2801413" 4 4 1 0 10 3 6 4 10 4 3 5 4 9 2 13 10 6 16 4 2 10 5 7 3 10 11 14 1 20 15 15 10 11 15 11 13 11 21 14 11 13 6 11 2 7 14 12 7 21 3 9 4 7 17 18 8 14 14 10 5 1 0 0 >AF516142\AF516142\16..1446\1431\AAP47194.1\Homo sapiens\Homo sapiens proton-coupled amino acid transporter (PAT1) mRNA,complete cds./gene="PAT1"/codon_start=1/product="proton-coupled amino acid transporter"/protein_id="AAP47194.1"/db_xref="GI:31324239" 3 4 3 1 2 3 1 20 39 1 2 6 3 16 2 0 17 0 6 15 4 1 4 11 4 6 3 10 4 6 7 14 8 4 0 10 21 5 6 5 12 10 3 10 10 0 6 10 10 3 14 5 8 5 19 14 4 33 6 12 5 0 1 0 >AY823524\AY823524\12..1391\1380\AAW22617.1\Homo sapiens\Homo sapiens protection of telomeres protein 1 variant 5 mRNA,complete cds, alternatively spliced./codon_start=1/product="protection of telomeres protein 1 variant 5"/protein_id="AAW22617.1"/db_xref="GI:56693613" 2 3 2 1 6 5 10 3 9 14 10 13 14 0 0 12 7 5 13 6 1 12 11 3 2 5 10 5 0 2 11 4 0 9 8 6 6 14 20 11 4 15 14 11 3 13 11 8 8 16 4 14 3 7 2 15 8 8 9 6 5 1 0 0 >BC007775\BC007775\15..899\885\AAH07775.1\Homo sapiens\Homo sapiens zinc finger protein 346, mRNA (cDNA clone MGC:12633IMAGE:4301820), complete cds./gene="ZNF346"/codon_start=1/product="zinc finger protein 346"/protein_id="AAH07775.1"/db_xref="GI:14043595"/db_xref="GeneID:23567"/db_xref="MIM:605308" 0 5 1 0 5 2 3 3 8 1 2 4 4 7 2 1 8 0 3 7 2 3 4 4 3 4 6 9 5 4 3 6 5 2 0 4 
11 1 11 20 9 1 6 18 6 6 6 11 12 1 6 3 7 4 4 3 3 3 3 8 1 0 1 0 >AF059252\AF059252\130..1320\1191\AAC78603.1\Homo sapiens\Homo sapiens clone 1 HLA class III protein Dom3z (DOM3Z) mRNA,complete cds./gene="DOM3Z"/codon_start=1/product="HLA class III protein Dom3z"/protein_id="AAC78603.1"/db_xref="GI:3372630" 3 5 10 3 3 6 3 13 18 3 0 1 3 6 1 8 7 2 8 7 3 4 12 17 5 12 5 12 0 8 7 10 7 3 3 6 8 5 6 8 8 3 4 13 7 2 6 17 14 6 12 5 6 2 12 7 2 0 0 9 10 0 1 0 >HSA400854\AJ400854\72..3716\3645\CAC14139.1\Homo sapiens\Homo sapiens mRNA for MUC4 protein splice variant sv17 (MUC4 gene)./gene="MUC4"/codon_start=1/product="MUC4 protein splice variant sv17"/protein_id="CAC14139.1"/db_xref="GI:10944947"/db_xref="UniProt/TrEMBL:Q9H485" 3 3 8 2 9 18 8 20 17 15 3 1 69 52 4 42 29 22 108 75 11 41 40 25 9 27 25 35 7 23 31 16 21 15 10 19 15 12 9 12 15 8 21 35 20 11 22 27 24 7 4 3 3 2 23 8 13 15 9 21 12 0 1 0 >BT007295\BT007295\1..828\828\AAP35959.1\Homo sapiens\Homo sapiens zinc finger protein 339 mRNA, complete cds./codon_start=1/product="zinc finger protein 339"/protein_id="AAP35959.1"/db_xref="GI:30583429" 0 7 2 3 2 5 1 5 17 1 0 0 0 3 5 2 17 2 5 9 2 0 1 9 4 1 4 7 2 0 3 18 3 0 0 7 6 1 10 13 5 2 0 11 12 2 1 16 11 4 5 2 10 2 7 1 0 3 1 2 1 0 1 0 >AK023886\AK023886\306..2129\1824\BAB14712.1\Homo sapiens\Homo sapiens cDNA FLJ13824 fis, clone THYRO1000505./codon_start=1/protein_id="BAB14712.1"/db_xref="GI:10435959" 2 14 16 6 1 6 5 19 61 4 0 8 3 9 1 2 21 6 7 9 3 5 6 24 2 7 4 23 4 7 2 15 13 2 1 5 25 1 1 22 16 3 4 30 9 2 5 48 20 5 8 6 8 4 19 5 2 14 2 20 5 0 0 1 >BC025305\BC025305\48..1664\1617\AAH25305.1\Homo sapiens\Homo sapiens cisplatin resistance related protein CRR9p, mRNA (cDNAclone MGC:39275 IMAGE:3051368), complete cds./gene="CRR9"/codon_start=1/product="cisplatin resistance related protein CRR9p"/protein_id="AAH25305.1"/db_xref="GI:19263702"/db_xref="GeneID:81037" 1 7 9 0 3 3 0 10 41 2 2 5 5 13 0 4 11 3 2 19 9 3 3 10 5 4 4 14 8 5 2 7 11 4 2 19 28 3 10 24 15 5 0 15 9 5 6 18 16 11 23 5 5 1 21 18 
1 17 4 15 13 0 0 1 >BC040152\BC040152\208..501\294\AAH40152.1\Homo sapiens\Homo sapiens dolichyl pyrophosphate phosphatase 1, mRNA (cDNA cloneIMAGE:5752769), complete cds./gene="DOLPP1"/codon_start=1/product="DOLPP1 protein"/protein_id="AAH40152.1"/db_xref="GI:25304089"/db_xref="GeneID:57171" 1 1 1 0 2 4 2 5 6 1 1 1 0 7 0 0 3 0 2 1 2 0 0 2 0 1 1 3 0 1 1 0 1 0 2 3 2 0 1 1 4 0 2 2 2 1 1 2 2 0 2 2 0 0 8 2 0 1 1 3 3 0 0 1 >AL356280#2\AL356280\join(complement(150364..150432), complement(AL359760.10:67552..67698), complement(AL359760.10:61345..61500), complement(AL359760.10:55699..55918), complement(AL359760.10:36749..36855), complement(AL359760.10:34911..35588))\69\CAI16873.1\Homo sapiens\Human DNA sequence from clone RP11-82O2 on chromosome 1 Containsthe 5' end of the OLFM3 gene for olfactomedin 3 and a DNAJ familypseudogene, complete sequence./gene="OLFM3"/locus_tag="RP11-556K13.2-003"/standard_name="OTTHUMP00000012614"/codon_start=1/product="olfactomedin 3"/protein_id="CAI16873.1"/db_xref="GI:55959089" 1 1 0 0 14 6 14 6 13 9 16 5 10 6 1 10 2 10 7 1 1 11 10 3 0 11 2 4 1 3 6 4 2 4 2 5 7 8 29 10 6 13 15 8 3 2 7 4 3 5 8 16 4 7 7 27 11 5 8 6 9 10 7 13 >AK057341\AK057341\81..1547\1467\BAB71436.1\Homo sapiens\Homo sapiens cDNA FLJ32779 fis, clone TESTI2002090./codon_start=1/protein_id="BAB71436.1"/db_xref="GI:16552997" 1 0 0 1 11 3 7 3 5 14 17 15 14 3 0 6 6 12 14 5 2 8 15 1 3 8 7 0 0 6 6 0 1 0 7 5 10 9 23 10 8 16 14 5 3 6 33 17 5 20 7 9 2 12 9 19 9 4 16 11 5 0 0 1 >HS323M22A\AL096886\90..1361\1272\CAB51469.1\Homo sapiens\Novel human gene mapping to chomosome 22./codon_start=1/product="hypothetical protein"/protein_id="CAB51469.1"/db_xref="GI:5596697"/db_xref="GOA:O95922"/db_xref="UniProt/Swiss-Prot:O95922" 3 3 7 2 6 2 1 11 22 2 1 2 3 6 4 6 8 5 4 12 2 2 5 3 6 3 3 6 3 4 4 8 9 3 2 8 15 4 10 26 17 11 4 7 5 3 13 14 20 4 15 7 5 2 7 8 1 21 4 8 11 0 0 1 >AB065858\AB065858\201..1208\1008\BAC06076.1\Homo sapiens\Homo sapiens gene for seven transmembrane helix receptor, completecds, 
isolate:CBRC7TM_421./codon_start=1/evidence=not_experimental/product="seven transmembrane helix receptor"/protein_id="BAC06076.1"/db_xref="GI:21928981" 1 5 0 0 1 3 4 14 17 7 1 3 1 11 0 9 7 2 4 5 0 6 3 4 0 5 6 16 1 8 3 7 5 3 0 13 3 3 3 6 4 5 3 5 8 6 2 4 3 4 5 8 6 5 10 10 4 24 12 14 3 1 0 0 >BC008316\BC008316\68..1045\978\AAH08316.1\Homo sapiens\Homo sapiens LRP16 protein, mRNA (cDNA clone MGC:15615IMAGE:3342973), complete cds./gene="LRP16"/codon_start=1/product="LRP16 protein"/protein_id="AAH08316.1"/db_xref="GI:14249877"/db_xref="GeneID:28992" 2 9 11 1 0 3 1 9 24 1 0 1 0 6 2 1 12 2 3 9 3 2 0 13 4 0 1 18 16 2 3 13 8 3 0 4 16 1 4 19 2 1 0 8 5 2 1 17 14 0 7 2 7 2 6 3 0 12 1 3 5 0 0 1 >BC002683\BC002683\130..558\429\AAH02683.1\Homo sapiens\Homo sapiens neuritin 1, mRNA (cDNA clone MGC:3391 IMAGE:3605775),complete cds./gene="NRN1"/codon_start=1/product="neuritin, precursor"/protein_id="AAH02683.1"/db_xref="GI:12803695"/db_xref="GeneID:51299"/db_xref="MIM:607409" 0 0 0 0 3 0 0 7 7 3 2 2 1 3 2 1 4 0 2 2 3 0 0 0 3 0 2 3 9 1 1 8 3 0 0 2 6 0 4 6 6 0 2 3 1 0 3 1 4 5 2 2 5 1 5 1 1 3 1 3 3 0 0 1 >HUMRASAD\M31470\1..642\642\AAA36547.1\Homo sapiens\Human ras-like protein mRNA, complete cds, clone TC10./codon_start=1/protein_id="AAA36547.1"/db_xref="GI:190881" 1 1 0 1 3 1 3 5 3 2 7 1 3 0 1 1 4 0 0 4 3 5 4 5 2 2 5 6 2 3 8 5 3 0 4 5 9 1 9 7 2 4 2 5 3 0 6 6 7 5 5 4 5 5 4 3 7 1 3 6 1 0 0 1 >AF302493\AF302493\71..2092\2022\AAQ14492.1\Homo sapiens\Homo sapiens endocrine transmitter regulatory protein mRNA,complete cds./codon_start=1/product="endocrine transmitter regulatory protein"/protein_id="AAQ14492.1"/db_xref="GI:33340033" 3 8 9 4 9 14 7 10 27 7 5 9 6 13 3 10 14 19 5 10 3 15 11 4 1 8 7 15 0 8 13 12 12 13 4 6 24 6 27 36 7 12 6 24 6 11 28 28 18 18 10 6 7 4 5 12 6 14 6 17 11 0 0 1 >AL671561#7\AL671561\join(41671..41676,42468..42549,42679..42948,43175..43450, 44050..44325,44448..44564,45010..45014)\1032\CAI17562.1\Homo sapiens\Human DNA sequence from clone XXbac-209H21 on chromosome 6 
containsa HLA (MHC) class 1 pseudogene, a putative novel MHC class 1 (HLA)protein (fragment), a 60S ribosomal protein L7A (RPL7A) pseudogene,a P5-1 pseudogene, the HLA-G gene for histompatibility antigen,class I, G, a major histocompatibility complex, class I, Hpseudogene (HLA-H, HLAHP) and five CpG islands, complete sequence./gene="HLA-G"/locus_tag="XXbac-BCX209H21.15-001"/standard_name="OTTHUMP00000029630"/codon_start=1/product="HLA-G histocompatibility antigen, class I, G"/protein_id="CAI17562.1"/db_xref="GI:55961745"/db_xref="GOA:Q5RIS3"/db_xref="InterPro:IPR001039"/db_xref="InterPro:IPR003006"/db_xref="InterPro:IPR003597"/db_xref="InterPro:IPR007110"/db_xref="UniProt/TrEMBL:Q5RIS3" 1 8 5 0 5 7 1 9 19 1 0 0 1 7 2 3 4 1 2 15 4 3 2 8 4 4 6 12 9 8 4 8 9 2 1 5 15 2 0 11 5 1 1 16 6 3 3 24 13 5 9 5 4 2 7 1 1 6 1 11 11 0 0 1 >BC007515\BC007515\47..277\231\AAH07515.1\Homo sapiens\Homo sapiens heat shock factor binding protein 1, mRNA (cDNA cloneMGC:4536 IMAGE:3010737), complete cds./gene="HSBP1"/codon_start=1/product="heat shock factor binding protein 1"/protein_id="AAH07515.1"/db_xref="GI:13960151"/db_xref="GeneID:3281"/db_xref="MIM:604553" 0 1 0 0 1 0 0 3 3 0 0 0 0 0 1 1 0 4 2 3 1 1 0 1 0 1 0 2 1 1 0 0 2 0 0 0 4 0 1 4 1 1 2 7 0 0 5 1 4 5 0 0 0 0 0 1 1 2 3 5 0 0 0 1 >AF104938\AF104938\217..3588\3372\AAD54676.1\Homo sapiens\Homo sapiens lectomedin-1 beta (LEC1) mRNA, complete cds./gene="LEC1"/codon_start=1/product="lectomedin-1 beta"/protein_id="AAD54676.1"/db_xref="GI:5880492" 14 2 3 4 14 11 9 11 23 25 13 14 26 9 0 15 18 16 32 23 4 26 21 10 1 20 27 11 2 36 27 16 9 13 10 13 30 21 33 22 25 34 13 23 11 7 43 13 21 31 20 31 20 14 29 32 16 18 46 23 19 0 0 1 >AF277194\AF277194\46..495\450\AAK07549.1\Homo sapiens\Homo sapiens PNAS-136 mRNA, complete cds./codon_start=1/product="PNAS-136"/protein_id="AAK07549.1"/db_xref="GI:12751113"/IGNORED_CODON=1 3 0 0 0 4 1 2 4 8 2 0 1 2 3 0 2 1 1 4 3 1 3 0 0 0 3 3 1 1 3 3 1 0 2 0 1 4 3 7 9 2 3 3 2 0 1 5 4 4 3 3 1 1 2 4 6 4 3 3 7 1 0 0 1 
>AY804140\AY804140\39..815\777\AAW78745.1\Homo sapiens\Homo sapiens MHC class II antigen (HLA-DPB1) mRNA,HLA-DPB1-DPB1*1401 allele, complete cds./gene="HLA-DPB1"/allele="DPB1*1401"/codon_start=1/product="MHC class II antigen"/protein_id="AAW78745.1"/db_xref="GI:58532285" 3 3 7 1 2 4 0 3 17 1 2 1 0 3 0 5 5 2 6 6 3 1 2 4 1 3 3 4 4 3 6 4 7 0 1 7 17 4 1 8 6 4 3 14 7 0 3 14 9 3 8 0 4 1 9 1 0 7 1 6 4 1 0 0 >AF525416\AF525416\132..1646\1515\AAP80792.1\Homo sapiens\Homo sapiens unnamed C1q domain protein mRNA, complete cds./codon_start=1/product="unnamed C1q domain protein"/protein_id="AAP80792.1"/db_xref="GI:32401237" 4 0 2 3 9 2 2 3 11 10 5 1 13 7 1 20 11 11 8 9 3 17 19 6 3 19 20 8 0 8 16 6 3 11 7 5 9 7 6 5 8 22 20 20 3 5 12 8 8 11 4 15 3 4 6 17 5 0 10 9 4 0 0 1 >S65701\S65701\14..85\72\AAB28213.1\Homo sapiens\CRK=proto-oncogene {intron/exon junction} [human, Genomic, 85 nt]./gene="CRK"/codon_start=1/protein_id="AAB28213.1"/db_xref="GI:430793" 1 0 0 0 1 0 0 1 1 0 1 1 1 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 1 0 2 1 0 0 0 1 1 1 0 0 0 0 0 0 0 >BC093046\BC093046\81..1859\1779\AAH93046.1\Homo sapiens\Homo sapiens signal peptide peptidase-like 2B, mRNA (cDNA cloneMGC:111084 IMAGE:30395602), complete cds./gene="SPPL2B"/codon_start=1/product="signal peptide peptidase-like 2B"/protein_id="AAH93046.1"/db_xref="GI:62202483"/db_xref="GeneID:56928"/db_xref="MIM:608239" 0 8 9 6 0 9 1 23 43 3 0 2 3 15 2 3 14 2 2 10 12 1 8 21 13 7 6 35 16 7 3 23 12 2 3 11 38 1 7 14 9 1 0 21 5 1 4 17 20 3 18 4 16 3 24 4 0 18 3 16 10 0 1 0 >BT019813\BT019813\1..732\732\AAV38616.1\Homo sapiens\Homo sapiens CD48 antigen (B-cell membrane protein) mRNA, completecds./codon_start=1/product="CD48 antigen (B-cell membrane protein)"/protein_id="AAV38616.1"/db_xref="GI:54696488" 0 0 1 0 3 4 3 3 14 6 1 3 2 6 1 5 7 4 0 10 2 4 3 4 0 6 2 1 0 1 1 5 2 3 5 7 10 0 9 10 7 4 5 4 1 2 7 7 6 4 6 4 3 5 2 4 2 5 5 5 7 0 1 0 >HSU63289\U63289\138..1586\1449\AAC50895.1\Homo sapiens\Human RNA-binding 
protein CUG-BP/hNab50 (NAB50) mRNA, complete cds./gene="NAB50"/function="binds to (CUG)n triplet repeats"/codon_start=1/product="CUG-BP/hNab50"/protein_id="AAC50895.1"/db_xref="GI:1518802" 2 0 4 2 2 4 3 13 13 9 5 5 5 6 4 8 16 9 9 8 2 8 7 8 1 6 15 11 1 25 10 11 7 12 2 5 9 5 8 16 14 15 6 36 6 1 11 5 11 6 6 4 4 3 4 14 4 8 5 21 2 0 0 1 >BC042426\BC042426\431..907\477\AAH42426.1\Homo sapiens\Homo sapiens LIM domain only 2 (rhombotin-like 1), mRNA (cDNA cloneMGC:34396 IMAGE:5188482), complete cds./gene="LMO2"/codon_start=1/product="LMO2 protein"/protein_id="AAH42426.1"/db_xref="GI:27502791"/db_xref="GeneID:4005"/db_xref="MIM:180385" 0 2 6 1 2 2 0 7 9 1 0 0 1 3 1 1 2 0 2 0 0 1 2 1 0 1 1 5 0 0 0 4 4 3 1 0 6 0 4 6 2 1 1 6 2 1 4 6 12 1 6 3 9 5 3 1 2 7 1 4 2 0 1 0 >AY359018\AY359018\151..597\447\AAQ89377.1\Homo sapiens\Homo sapiens clone DNA73746 TKAL754 (UNQ754) mRNA, complete cds./locus_tag="UNQ754"/codon_start=1/product="TKAL754"/protein_id="AAQ89377.1"/db_xref="GI:37183154" 0 2 2 0 1 2 2 4 9 4 0 4 2 3 1 0 6 3 2 0 0 0 1 1 0 0 3 3 2 1 3 3 2 1 2 1 3 0 1 4 7 4 1 4 4 0 4 2 5 4 4 3 5 3 2 4 2 4 1 2 5 0 0 1 >BC014533\BC014533\7..642\636\AAH14533.3\Homo sapiens\Homo sapiens myelin expression factor 2, mRNA (cDNA cloneIMAGE:4133844), complete cds./gene="MYEF2"/codon_start=1/product="MYEF2 protein"/protein_id="AAH14533.3"/db_xref="GI:32425486"/db_xref="GeneID:50804" 4 1 1 3 9 1 2 0 4 1 1 2 2 4 1 0 6 5 1 0 0 3 4 0 0 1 3 1 1 2 24 7 2 13 1 2 1 2 8 1 2 7 0 3 0 1 6 5 3 9 1 0 1 2 1 12 9 2 3 19 1 1 0 0 >BC034296\BC034296\50..751\702\AAH34296.1\Homo sapiens\Homo sapiens hypothetical protein MGC35043, mRNA (cDNA cloneMGC:35043 IMAGE:5165558), complete cds./gene="MGC35043"/codon_start=1/product="hypothetical protein MGC35043"/protein_id="AAH34296.1"/db_xref="GI:21706729"/db_xref="GeneID:255119" 1 2 0 1 11 5 4 1 9 4 1 3 2 2 1 2 1 5 3 1 3 8 4 1 0 1 3 6 2 7 4 2 3 2 0 3 6 1 8 5 4 7 6 5 2 1 12 6 12 9 8 3 0 0 3 7 4 5 8 2 1 1 0 0 >AK128830\AK128830\1266..2633\1368\BAC87630.1\Homo sapiens\Homo 
sapiens cDNA FLJ46317 fis, clone TESTI4041832./codon_start=1/protein_id="BAC87630.1"/db_xref="GI:34536392" 3 1 2 14 28 17 1 1 20 0 0 0 0 15 0 0 1 16 3 0 0 1 3 29 0 17 2 7 16 1 24 23 34 1 0 16 30 1 0 17 0 0 1 8 0 1 15 36 2 0 0 1 0 15 1 0 0 0 0 1 30 0 1 0 >AY518541\AY518541\join(1953..2016,2351..2507,3966..4088,5361..5459, 7052..7145)\537\AAR89914.1\Homo sapiens\Homo sapiens endothelin 2 (EDN2) gene, complete cds./gene="EDN2"/codon_start=1/product="endothelin 2"/protein_id="AAR89914.1"/db_xref="GI:40786807" 2 5 5 1 2 7 1 4 6 4 0 1 1 11 0 1 3 1 5 5 0 4 4 2 1 5 2 11 2 3 2 4 3 0 0 4 4 1 0 7 2 0 3 6 2 3 3 5 5 0 2 0 7 2 3 1 0 2 1 2 5 0 1 0 >HUMCYCD3A\M92287\166..1044\879\AAA52137.1\Homo sapiens\Homo sapiens cyclin D3 (CCND3) mRNA, complete cds./gene="CCND3"/codon_start=1/product="cyclin D3"/protein_id="AAA52137.1"/db_xref="GI:181247"/db_xref="GDB:G00-128-969" 2 7 9 2 0 2 1 8 28 0 0 4 0 7 0 4 9 0 4 9 2 3 2 10 4 1 4 15 4 7 0 5 7 2 2 8 5 0 3 7 1 0 2 14 5 2 6 15 7 5 7 1 7 5 4 2 1 6 3 9 3 0 1 0 >AF242523\AF242523\6..725\720\AAF99603.1\Homo sapiens\Homo sapiens hypothetical transmembrane protein SBBI53 mRNA,complete cds./codon_start=1/product="hypothetical transmembrane protein SBBI53"/protein_id="AAF99603.1"/db_xref="GI:9802044" 3 2 4 1 2 1 3 4 5 5 6 11 1 1 0 2 4 0 5 5 0 1 7 4 1 4 2 7 1 1 7 5 2 7 2 4 5 3 3 0 6 5 2 7 2 0 6 4 2 4 6 4 1 1 12 19 4 5 6 8 4 1 0 0 >AB097031\AB097031\13..741\729\BAC77384.1\Homo sapiens\Homo sapiens mRNA for putative MAPK activating protein, completecds, clone: PM07./codon_start=1/product="putative MAPK activating protein"/protein_id="BAC77384.1"/db_xref="GI:31455517" 0 1 0 0 6 3 1 2 2 11 2 2 5 3 1 6 1 4 5 3 0 6 8 2 2 5 12 0 0 12 6 3 3 2 2 2 6 3 3 1 4 5 1 8 2 1 9 6 3 8 3 9 1 3 10 9 3 4 6 6 5 0 1 0 >HSA218C14#6\AL121894\complement(join(103487..103570,104825..104938, 107191..107433))\441\CAC05424.1\Homo sapiens\Human DNA sequence from clone RP11-218C14 on chromosome 20 Containsthe 3' end of a novel gene, two genes for novel proteins similar tocystatin 
9-like (mouse) (CST9L), the CST9L gene for cystatin 9-like(mouse), a cystatin 9-like (mouse) pseudogene, the CST3 gene forcystatin C, a novel gene and a CpG island, complete sequence./gene="CST3"/locus_tag="RP11-218C14.4-001"/standard_name="OTTHUMP00000030440"/codon_start=1/protein_id="CAC05424.1"/db_xref="GI:9944241"/db_xref="Genew:2475"/db_xref="GOA:P01034"/db_xref="InterPro:IPR000010"/db_xref="UniProt/Swiss-Prot:P01034" 1 5 1 1 0 1 1 1 10 0 0 3 0 2 1 1 4 1 1 5 1 0 1 6 3 1 2 12 2 2 1 6 2 1 1 1 10 0 4 3 5 0 0 7 1 2 0 5 7 0 4 0 2 2 4 1 0 3 0 4 1 0 1 0 >HSY14768#14\Y14768\join(73051..73132,73401..73501,74029..74202)\357\CAA75073.1\Homo sapiens\Homo sapiens DNA, cosmid clones TN62 and TN82./gene="ATP6G"/codon_start=1/product="V-ATPase G-subunit like protein"/protein_id="CAA75073.1"/db_xref="GI:3805814"/db_xref="GOA:O95670"/db_xref="InterPro:IPR005124"/db_xref="UniProt/Swiss-Prot:O95670" 3 2 3 1 3 3 0 0 4 3 0 0 0 3 0 2 2 1 1 0 0 0 0 2 0 0 4 6 1 4 0 3 1 1 0 4 4 0 0 7 3 0 3 17 2 0 1 11 1 1 2 0 1 0 1 0 0 1 1 5 0 0 1 0 >HUMHBGFBV\M63888\1..1989\1989\AAA35959.1\Homo sapiens\Human heparin-binding growth factor receptor (HBGF-R-alpha-a2)mRNA, complete cds./gene="HBGF-R"/codon_start=1/product="heparin-binding growth factor receptor"/protein_id="AAA35959.1"/db_xref="GI:183881" 2 7 10 3 7 3 4 9 33 4 2 12 6 21 5 8 14 7 13 16 3 2 9 18 8 12 10 17 2 12 7 12 16 6 3 11 33 5 13 28 18 7 3 17 11 6 14 30 25 13 12 10 13 4 8 2 1 21 3 18 13 1 0 0 >HSH12\X57129\526..1167\642\CAA40408.1\Homo sapiens\H.sapiens H1.2 gene for histone H1./gene="H1.2"/codon_start=1/product="histone H1"/protein_id="CAA40408.1"/db_xref="GI:31968"/db_xref="GOA:P16403"/db_xref="UniProt/Swiss-Prot:P16403" 0 0 0 3 0 0 0 3 4 1 0 1 1 4 0 3 6 1 1 4 2 4 2 8 4 7 3 16 12 17 2 7 3 4 2 1 7 5 16 43 3 0 1 0 0 0 1 5 0 1 0 1 0 0 0 1 0 2 0 1 0 0 1 0 >AF230929\AF230929\227..1264\1038\AAG16780.1\Homo sapiens\Homo sapiens keratinocyte annexin-like protein pemphaxin mRNA,complete cds./function="may be involved in the regulation of 
keratinocyte cell adhesion"/codon_start=1/product="keratinocyte annexin-like protein pemphaxin"/protein_id="AAG16780.1"/db_xref="GI:10436074" 5 3 3 2 3 8 3 10 22 5 0 6 3 3 1 8 7 3 5 9 0 7 2 3 2 2 9 13 2 11 3 10 6 0 0 6 14 0 4 9 6 4 7 23 3 2 8 14 12 8 6 1 3 1 7 4 0 10 7 5 2 0 0 1 >BC062353\BC062353\27..908\882\AAH62353.1\Homo sapiens\Homo sapiens chromosome 1 open reading frame 131, mRNA (cDNA cloneMGC:71140 IMAGE:4648754), complete cds./gene="C1orf131"/codon_start=1/product="hypothetical protein LOC128061"/protein_id="AAH62353.1"/db_xref="GI:38565958"/db_xref="GeneID:128061" 0 0 3 2 8 6 3 2 6 4 2 4 2 7 3 6 4 4 9 1 3 0 4 6 2 7 5 4 1 11 9 2 3 5 2 0 6 7 17 23 6 7 5 9 2 2 19 8 4 7 2 2 0 1 4 4 3 5 6 4 0 0 0 1 >AF213041\AF213041\1..639\639\AAK06373.1\Homo sapiens\Homo sapiens isolate NT1 cyclin-dependent kinase associated proteinphosphatase mRNA, complete cds./codon_start=1/product="cyclin-dependent kinase associated protein phosphatase"/protein_id="AAK06373.1"/db_xref="GI:12734660" 3 0 1 0 9 0 6 3 4 7 3 1 12 1 0 5 3 2 4 5 0 2 5 1 1 2 3 2 0 4 5 0 4 3 2 2 1 3 7 3 1 4 7 4 1 6 8 5 7 6 4 3 4 8 1 5 10 3 3 2 1 1 0 0 >BC023995\BC023995\120..1010\891\AAH23995.1\Homo sapiens\Homo sapiens chromosome 2 open reading frame 25, mRNA (cDNA cloneMGC:24534 IMAGE:4103877), complete cds./gene="C2orf25"/codon_start=1/product="chromosome 2 open reading frame 25"/protein_id="AAH23995.1"/db_xref="GI:18645170"/db_xref="GeneID:27249" 4 1 0 1 5 2 2 4 5 5 5 3 6 1 2 5 2 7 5 1 0 11 7 2 0 7 9 4 0 8 8 1 3 7 6 1 7 9 9 5 5 10 5 4 1 7 20 4 4 11 2 5 4 5 4 13 3 2 7 6 4 0 1 0 >BC013795\BC013795\393..848\456\AAH13795.1\Homo sapiens\Homo sapiens hypothetical protein FLJ23560, mRNA (cDNA cloneIMAGE:3919095), complete cds./gene="FLJ23560"/codon_start=1/product="FLJ23560 protein"/protein_id="AAH13795.1"/db_xref="GI:15489408"/db_xref="GeneID:79738" 0 0 0 0 2 0 3 1 1 5 5 5 5 2 0 2 3 3 4 3 0 3 4 2 1 1 2 1 0 3 1 1 0 6 4 2 1 5 8 3 0 6 4 5 2 4 5 1 1 2 4 4 1 3 0 2 4 2 2 7 0 1 0 0 
>AF038461\AF038461\260..2365\2106\AAC39770.1\Homo sapiens\Homo sapiens 12R-lipoxygenase mRNA, complete cds./function="converts arachidonic acid to 12R-HPETE (12R-hydroperoxyeicosatetraenoic acid)"/codon_start=1/product="12R-lipoxygenase"/protein_id="AAC39770.1"/db_xref="GI:3220166" 5 15 13 2 3 8 1 25 40 2 1 8 3 6 4 6 12 3 9 20 8 4 5 24 10 11 7 24 6 7 8 20 13 2 2 14 22 1 8 22 17 6 2 20 18 4 3 39 26 7 26 10 15 1 24 9 3 26 14 14 13 0 1 0 >AB011406\AB011406\177..1751\1575\BAA32129.1\Homo sapiens\Homo sapiens mRNA for alkalin phosphatase, complete cds./function="hydrolyze a variety of monophosphate esters with high pH optima"/standard_name="tissue non-specific alkalin phosphatase"/codon_start=1/evidence=experimental/product="alkalin phosphatase"/protein_id="BAA32129.1"/db_xref="GI:3401945" 2 4 5 2 3 5 1 11 26 4 2 4 3 11 3 3 9 2 4 19 6 6 3 15 2 6 5 29 6 8 4 22 12 7 2 11 25 1 10 18 20 7 2 12 18 5 5 24 27 2 16 5 3 3 14 2 0 13 5 15 5 0 0 1 >AF182422\AF182422\336..1019\684\AAG14958.1\Homo sapiens\Homo sapiens MDS023 (MDS023) mRNA, complete cds./gene="MDS023"/codon_start=1/product="MDS023"/protein_id="AAG14958.1"/db_xref="GI:10197644" 2 0 1 0 4 3 4 2 3 8 7 2 4 3 0 4 2 4 4 2 1 6 3 2 0 2 7 2 0 2 3 0 0 1 5 1 5 4 14 11 1 13 0 3 2 3 18 12 4 7 5 4 3 2 0 3 2 2 7 4 4 0 0 1 >HUMHBBPA6G#2\L48216\join(217..308,420..438)\111\AAA88064.1\Homo sapiens\Homo sapiens beta-globin (HBB) gene, with a to g mutation in thepoly-A signal, (J00179 bases 61971-63802)./gene="HBB"/codon_start=1/product="beta-globin"/protein_id="AAA88064.1"/db_xref="GI:1066776"/db_xref="GDB:G00-119-297" 0 0 0 0 0 0 1 0 3 0 0 0 0 1 0 1 0 1 0 0 0 2 0 0 0 2 0 3 0 0 0 2 0 2 0 0 3 2 0 2 1 0 0 0 2 0 1 3 0 1 0 0 0 0 0 1 0 0 0 1 1 0 1 0 >BC040061\BC040061\166..2994\2829\AAH40061.1\Homo sapiens\Homo sapiens protein kinase N1, transcript variant 2, mRNA (cDNAclone MGC:46204 IMAGE:5752583), complete cds./gene="PKN1"/codon_start=1/product="protein kinase C-like 1, isoform 
2"/protein_id="AAH40061.1"/db_xref="GI:25304069"/db_xref="GeneID:5585"/db_xref="MIM:601032" 7 20 32 2 4 11 2 21 83 6 0 10 3 15 10 4 33 7 9 27 9 6 11 29 9 15 9 44 15 13 6 30 24 9 4 9 35 2 6 35 16 5 3 36 15 2 16 62 39 11 9 2 9 5 31 7 2 18 6 13 9 0 1 0 >HSA276674\AJ276674\join(10303..10421,11044..11172,11329..11429,11515..11736, 11896..12073,12951..13195,13572..13679,17109..17239)\1233\CAB82769.1\Homo sapiens\Homo sapiens PNR gene for photoreceptor-specific nuclear receptor,exons 1-8./gene="PNR"/function="unknown"/codon_start=1/product="photoreceptor-specific nuclear receptor"/protein_id="CAB82769.1"/db_xref="GI:7329721"/db_xref="GOA:Q9Y5X4"/db_xref="InterPro:IPR000003"/db_xref="InterPro:IPR000536"/db_xref="InterPro:IPR001628"/db_xref="InterPro:IPR001723"/db_xref="InterPro:IPR008946"/db_xref="UniProt/Swiss-Prot:Q9Y5X4" 1 7 10 1 3 6 1 14 22 2 0 5 0 11 2 8 19 1 6 5 4 4 7 13 6 6 7 21 5 11 1 11 10 2 2 5 12 0 2 15 7 3 1 17 9 1 5 21 9 7 2 2 11 3 11 5 2 8 1 13 4 0 1 0 >AF107292\AF107292\44..4483\4440\AAF14192.1\Homo sapiens\Homo sapiens urokinase receptor-associated protein uPARAP mRNA,complete cds./codon_start=1/product="urokinase receptor-associated protein uPARAP"/protein_id="AAF14192.1"/db_xref="GI:6492130" 7 29 27 8 6 6 4 27 85 8 1 10 7 29 7 9 64 12 10 44 18 12 13 48 18 15 10 57 17 9 8 67 33 14 2 23 32 1 8 38 48 10 6 84 33 7 14 82 60 12 32 8 37 19 42 9 3 39 5 19 67 0 1 0 >AL358075#6\AL358075\join(AL603888.9:87702..87856,AL603888.9:129709..129800, AL645480.8:11485..11560,AL645480.8:16566..16644, AL645480.8:20002..20156,AL645480.8:22057..22132, AL645480.8:24491..24700,complement(126523..126624), complement(124138..124270),complement(123249..123310), complement(122902..122987),complement(122240..122448), complement(120405..120543),complement(118357..118489), complement(117287..117452),complement(116653..116754), complement(116267..116389),complement(115982..116091), complement(115461..115613),complement(110308..110325))\-443349\CAI21706.1\Homo sapiens\Human DNA sequence 
from clone RP4-533D7 on chromosome 1p34.1-35.1Contains the 3' end of the gene for microtubule associated testisspecific serine/threonine protein kinase (MAST205), the PIK3R3 genefor phosphoinositide-3-kinase regulatory subunit polypeptide 3(p55, gamma), a cytochrome c oxidase subunit VIIb (COX7B)pseudogene, two novel genes and a CpG island, complete sequence./gene="MAST2"/locus_tag="RP4-533D7.1-003"/standard_name="OTTHUMP00000009698"/codon_start=1/product="microtubule associated serine/threonine kinase 2"/protein_id="CAI21706.1"/db_xref="GI:56205429"/db_xref="GOA:Q5VT07"/db_xref="GOA:Q6P0Q8"/db_xref="UniProt/Swiss-Prot:Q6P0Q8" 3 10 9 5 13 22 10 16 24 9 10 10 11 13 1 12 14 12 15 10 2 17 15 15 4 17 12 20 3 17 22 11 13 9 6 4 23 9 22 25 7 12 16 20 14 14 8 23 17 8 10 7 10 7 10 16 8 10 12 14 17 14 6 18 >AF325830\AF325830\1..1119\1119\AAK49911.1\Homo sapiens\Homo sapiens calponin-like integrin-linked kinase binding proteinmRNA, complete cds./codon_start=1/product="calponin-like integrin-linked kinase binding protein"/protein_id="AAK49911.1"/db_xref="GI:13936722" 4 3 3 2 1 3 3 10 25 4 2 7 2 7 2 6 5 2 5 8 1 4 7 10 4 2 2 9 0 5 6 1 6 1 1 9 18 5 12 20 9 8 7 11 3 4 12 21 12 10 4 2 0 1 8 7 2 9 6 7 2 0 0 1 >BC028581\BC028581\69..2654\2586\AAH28581.1\Homo sapiens\Homo sapiens piwi-like 1 (Drosophila), mRNA (cDNA clone MGC:26748IMAGE:4827445), complete cds./gene="PIWIL1"/codon_start=1/product="piwi-like 1"/protein_id="AAH28581.1"/db_xref="GI:20306782"/db_xref="GeneID:9271"/db_xref="MIM:605571" 7 6 7 8 26 15 11 10 18 14 16 14 12 8 1 5 12 10 24 11 4 14 21 6 2 15 9 19 2 12 23 13 8 14 5 11 20 20 24 21 19 23 20 34 8 11 22 15 19 26 20 16 9 7 9 23 10 16 26 19 11 1 0 0 >AF301470\AF301470\219..422\204\AAG22030.1\Homo sapiens\Homo sapiens beta-defensin 3 mRNA, complete cds./codon_start=1/product="beta-defensin 3"/protein_id="AAG22030.1"/db_xref="GI:10717136" 2 0 1 1 3 1 0 2 2 2 1 2 0 0 1 0 1 0 1 0 1 0 2 0 0 1 0 0 0 2 2 4 0 1 0 1 2 1 3 3 1 0 0 2 0 2 1 1 0 0 0 3 5 1 1 2 1 3 0 1 0 1 0 0 
>AK057788\AK057788\197..697\501\BAB71573.1\Homo sapiens\Homo sapiens cDNA FLJ25059 fis, clone CBL04610./codon_start=1/protein_id="BAB71573.1"/db_xref="GI:16553729" 3 0 0 2 4 1 2 1 2 3 2 1 4 1 0 3 3 5 2 2 0 2 6 0 0 4 2 1 0 2 6 3 1 8 1 2 2 5 6 1 0 6 3 1 1 4 4 1 3 7 1 6 1 4 2 9 2 3 9 4 2 0 1 0 >AF270513\AF270513\160..3321\3162\AAK37963.1\Homo sapiens\Homo sapiens extracellular glycoprotein EMILIN-2 precursor, mRNA,complete cds./codon_start=1/product="extracellular glycoprotein EMILIN-2 precursor"/protein_id="AAK37963.1"/db_xref="GI:13661556" 5 8 11 1 16 20 4 24 41 13 3 11 9 11 5 15 22 14 14 20 11 14 14 35 17 15 22 23 10 17 18 27 30 12 3 20 36 14 17 40 27 17 13 51 13 10 39 38 31 24 15 9 14 8 13 13 6 14 7 18 11 1 0 0 >BC001767\BC001767\98..499\402\AAH01767.1\Homo sapiens\Homo sapiens LSM1 homolog, U6 small nuclear RNA associated (S.cerevisiae), mRNA (cDNA clone MGC:1352 IMAGE:3353833), completecds./gene="LSM1"/codon_start=1/product="Lsm1 protein"/protein_id="AAH01767.1"/db_xref="GI:12804683"/db_xref="GeneID:27257"/db_xref="MIM:607281" 4 0 0 1 2 2 4 2 3 4 2 3 0 2 0 0 2 1 2 2 0 2 0 1 0 3 3 2 0 0 3 3 1 2 1 2 7 1 4 5 2 1 3 5 1 2 9 5 5 5 2 1 0 0 0 3 2 1 8 2 0 1 0 0 >BC050677\BC050677\259..2016\1758\AAH50677.1\Homo sapiens\Homo sapiens hypothetical protein MGC3121, mRNA (cDNA cloneMGC:60225 IMAGE:6053861), complete cds./gene="MGC3121"/codon_start=1/product="hypothetical protein MGC3121"/protein_id="AAH50677.1"/db_xref="GI:29791770"/db_xref="GeneID:78994" 10 10 20 4 6 13 4 12 21 5 2 9 10 16 3 10 15 2 7 21 3 7 30 34 12 28 7 20 1 11 6 10 11 4 1 8 13 4 4 15 5 4 4 21 8 1 14 24 11 8 4 0 4 3 10 4 3 10 3 9 6 0 1 0 >HUMFKBP\M34539\79..405\327\AAA35844.1\Homo sapiens\Human FK506-binding protein (FKBP) mRNA, complete cds./codon_start=1/protein_id="AAA35844.1"/db_xref="GI:182628" 1 2 1 0 2 0 2 1 2 2 0 0 0 3 0 1 0 1 0 4 0 3 5 2 0 0 0 5 0 0 3 4 4 2 0 1 7 1 3 5 1 0 0 5 2 1 5 2 2 4 1 2 1 0 2 3 1 4 0 4 1 0 0 1 >BC093885\BC093885\74..919\846\AAH93885.1\Homo sapiens\Homo sapiens SH3 and 
multiple ankyrin repeat domains 2, transcriptvariant 2, mRNA (cDNA clone IMAGE:7939730), complete cds./gene="SHANK2"/codon_start=1/product="SH3 and multiple ankyrin repeat domains 2, isoform 2"/protein_id="AAH93885.1"/db_xref="GI:62739586"/db_xref="GeneID:22941"/db_xref="MIM:603290" 3 4 6 0 0 6 2 3 8 2 0 2 2 5 2 2 3 1 5 7 5 2 5 11 7 4 1 10 5 6 6 8 5 4 0 8 14 3 9 13 5 8 3 8 2 0 5 12 8 3 4 0 2 2 5 3 2 4 4 11 1 0 1 0 >AK129649\AK129649\214..642\429\BAC85207.1\Homo sapiens\Homo sapiens cDNA FLJ26138 fis, clone TMS04622./codon_start=1/protein_id="BAC85207.1"/db_xref="GI:34526237" 2 0 0 0 2 2 0 5 9 1 0 1 3 4 2 4 4 7 2 6 0 5 3 3 0 1 3 6 0 0 1 1 3 3 0 1 1 1 1 7 2 4 2 5 1 0 4 16 1 2 1 1 0 1 1 2 0 1 0 4 0 0 1 0 >HSU84744\U84744\553..5148\4596\AAB87737.1\Homo sapiens\Human Chediak-Higashi syndrome protein short isoform (LYST) mRNA,complete cds./gene="LYST"/codon_start=1/product="Chediak-Higashi syndrome protein short isoform"/protein_id="AAB87737.1"/db_xref="GI:2654474" 12 3 5 4 32 11 21 17 36 42 53 41 38 13 5 37 20 51 23 14 1 26 19 10 3 28 37 24 0 29 21 6 11 20 19 8 21 35 59 33 26 38 32 51 12 45 76 41 23 49 7 17 15 28 13 45 31 14 38 31 11 1 0 0 >BC014445\BC014445\162..1793\1632\AAH14445.1\Homo sapiens\Homo sapiens EH-domain containing 2, mRNA (cDNA clone MGC:22994IMAGE:4908085), complete cds./gene="EHD2"/codon_start=1/product="EH-domain containing 2"/protein_id="AAH14445.1"/db_xref="GI:15680192"/db_xref="GeneID:30846"/db_xref="MIM:605890" 2 18 8 1 1 2 0 19 44 0 0 2 2 8 6 1 11 0 0 13 6 1 1 17 7 7 4 23 9 2 2 27 10 2 0 6 30 1 4 34 11 1 0 19 13 3 3 41 30 5 11 0 2 1 17 9 0 22 1 16 7 0 0 1 >AL139281#1\AL139281\join(13755..13872,22290..22390,28388..28464,37450..37597, 38904..39008)\549\CAI12665.1\Homo sapiens\Human DNA sequence from clone RP11-215C7 on chromosome 10 Containsthe gene for pilin-like transcription factor (PILB) (CGI-131), atyrosine 3-monooxygenase/tryptophan 5-monooxygenase activationprotein zeta polypeptide (YWHAZ) pseudogene, the gene for a novelprotein similar to 
bHLH protein Ptf1-p48 (PTF1A) (possible orthologof rodent pancreas specific transcription factor 1a (Ptf1a)), anovel gene, a novel pseudogene and five CpG islands, completesequence./gene="MSRB2"/locus_tag="RP11-215C7.1-001"/standard_name="OTTHUMP00000019305"/codon_start=1/product="methionine sulfoxide reductase B2"/protein_id="CAI12665.1"/db_xref="GI:55957250" 0 1 5 1 2 2 2 5 6 2 1 2 1 1 2 5 1 5 3 4 3 2 2 2 3 6 3 1 4 4 6 8 8 3 0 2 4 1 4 7 2 2 2 3 3 3 5 7 1 3 3 1 6 2 4 3 0 3 0 2 3 0 0 1 >AB158468\AB158468\50..1411\1362\BAD07014.1\Homo sapiens\Homo sapiens gdf7 mRNA for growth differentiation factor 7,complete cds./gene="gdf7"/codon_start=1/product="growth differentiation factor 7"/protein_id="BAD07014.1"/db_xref="GI:40786371" 3 23 12 3 1 9 2 7 22 5 1 6 1 8 8 3 14 0 3 5 6 2 8 8 16 1 10 32 27 7 3 38 12 3 0 8 15 2 1 3 6 0 2 8 7 2 4 16 19 0 7 0 10 1 11 0 0 8 2 6 6 0 1 0 >BC047733\BC047733\8..1183\1176\AAH47733.1\Homo sapiens\Homo sapiens DNA (cytosine-5-)-methyltransferase 2, transcriptvariant a, mRNA (cDNA clone MGC:54074 IMAGE:6473993), complete cds./gene="DNMT2"/codon_start=1/product="DNA methyltransferase 2, isoform a"/protein_id="AAH47733.1"/db_xref="GI:29126973"/db_xref="GeneID:1787"/db_xref="MIM:602478" 3 1 2 1 5 5 7 6 7 15 13 4 5 2 0 9 7 2 9 3 1 7 11 5 0 7 6 4 1 4 6 8 2 3 7 2 11 4 16 9 4 13 10 13 4 2 20 10 5 13 5 11 2 4 6 13 9 3 16 8 0 1 0 0 >AY061755\AY061755\121..10086\9966\AAL33798.1\Homo sapiens\Homo sapiens nesprin-1 beta mRNA, complete cds./codon_start=1/product="nesprin-1 beta"/protein_id="AAL33798.1"/db_xref="GI:17861386" 22 28 31 20 32 35 39 103 155 57 39 72 39 59 12 70 63 52 44 43 16 38 30 22 6 20 57 78 8 72 41 29 24 25 17 31 57 33 114 110 47 61 97 207 40 40 171 183 88 96 30 24 17 30 28 35 34 52 69 77 52 0 0 1 >BC028040\BC028040\106..1371\1266\AAH28040.1\Homo sapiens\Homo sapiens 2',3'-cyclic nucleotide 3' phosphodiesterase, mRNA(cDNA clone MGC:40095 IMAGE:5248370), complete cds./gene="CNP"/codon_start=1/product="CNP 
protein"/protein_id="AAH28040.1"/db_xref="GI:20380838"/db_xref="GeneID:1267"/db_xref="MIM:123830" 4 7 11 1 3 2 2 10 26 4 3 5 3 7 2 3 8 1 4 11 9 2 2 10 3 4 3 16 1 8 3 17 13 1 1 5 15 0 5 34 6 1 5 15 4 1 4 26 17 8 13 1 4 3 15 5 1 9 2 7 5 0 0 1 >Z93322\Z93322\8..778\771\CAB07532.1\Homo sapiens\H.sapiens mRNA; IMAGE cDNA clone 308937./codon_start=1/product="c21ORF-HumF09G8.5"/protein_id="CAB07532.1"/db_xref="GI:2425155"/db_xref="GOA:O43822"/db_xref="UniProt/Swiss-Prot:O43822" 1 8 8 4 1 4 2 6 31 0 0 0 1 7 2 0 14 3 2 6 3 4 1 4 5 3 3 11 1 4 1 12 3 0 0 2 9 1 0 8 6 0 0 10 5 0 2 25 3 4 3 0 7 0 1 1 1 6 1 4 2 0 0 1 >HS179D312\AL049704\25..1005\981\CAB41268.1\Homo sapiens\Human gene from PAC 179D3, chromosome X, isoform of mitochondrialapoptosis inducing factor, AIF, AF100928./codon_start=1/product="hypothetical protein"/protein_id="CAB41268.1"/db_xref="GI:4678809"/db_xref="GOA:O95831"/db_xref="UniProt/Swiss-Prot:O95831" 5 1 5 0 2 7 3 6 5 2 1 8 5 2 0 2 4 6 4 3 0 6 3 10 3 1 10 5 2 11 10 10 3 8 4 6 14 8 8 12 7 6 4 7 3 4 11 13 9 7 5 1 2 1 7 3 4 7 8 7 5 0 0 1 >AF077820\AF077820\89..4936\4848\AAC72791.1\Homo sapiens\Homo sapiens LDL receptor member LR3 mRNA, complete cds./codon_start=1/product="LDL receptor member LR3"/protein_id="AAC72791.1"/db_xref="GI:3831748" 5 33 34 6 3 17 3 39 100 4 0 7 12 29 13 6 46 4 17 46 26 9 9 51 36 7 12 80 18 4 11 55 42 10 1 31 62 3 5 47 50 12 3 53 33 6 5 65 122 16 43 7 35 22 40 8 1 73 14 29 35 0 0 1 >HSJ154G14#4\AL121964\join(complement(AL121837.33:1025..1144),14033..14143, 17217..17282,24173..24218,25626..25764,29216..29340, 32254..32382,33661..33791,35291..35372,37663..37793, 38453..38582,41208..41288,49439..49503,62057..62162, 66521..66582,69159..69191)\-121833\CAI19610.1\Homo sapiens\Human DNA sequence from clone RP1-154G14 on chromosome 6q15-16.3Contains the 3' end of the MAP3K7 gene for mitogen-activatedprotein kinase kinase kinase 7 (TGF-beta activated kinase 1, TAK1),complete 
sequence./gene="MAP3K7"/locus_tag="RP1-154G14.1-003"/standard_name="OTTHUMP00000016872"/codon_start=1/protein_id="CAI19610.1"/db_xref="GI:56203690" 0 3 0 3 9 11 0 10 12 9 1 8 2 15 3 7 17 14 8 7 7 16 10 5 4 5 4 7 2 8 9 9 5 8 5 8 7 9 9 8 10 17 10 9 9 19 5 0 5 5 4 6 8 10 17 18 11 13 14 4 17 10 3 11 >HUMHLABC\L38504\1..1089\1089\AAA69724.1\Homo sapiens\Homo sapiens (clones 18.1, 18.2, 19.2) MHC class I HLA-B*2702 mRNA,complete cds./gene="HLA-B*2702"/codon_start=1/product="major histocompatibility complex"/protein_id="AAA69724.1"/db_xref="GI:896271" 2 8 6 1 7 5 1 10 17 1 0 0 1 7 0 7 6 2 5 15 4 3 2 7 7 2 5 12 11 10 5 11 10 1 1 5 18 2 1 9 4 1 1 18 7 3 1 26 17 4 11 3 6 1 6 0 1 9 1 4 11 0 0 1 >AY337264\AY337264\102..1184\1083\AAR00269.1\Homo sapiens\Homo sapiens protein phosphatase 2C epsilon (PP2CE) mRNA, completecds./gene="PP2CE"/codon_start=1/product="protein phosphatase 2C epsilon"/protein_id="AAR00269.1"/db_xref="GI:37700518" 4 5 3 0 6 2 3 7 14 6 2 10 4 7 1 5 6 3 4 5 4 4 3 1 2 5 5 8 1 5 2 5 9 5 4 4 12 3 9 16 7 6 0 12 7 2 12 14 16 12 7 3 2 2 14 4 4 14 4 10 4 0 0 1 >BC014549\BC014549\51..1097\1047\AAH14549.1\Homo sapiens\Homo sapiens capping protein (actin filament), gelsolin-like, mRNA(cDNA clone MGC:13643 IMAGE:4108934), complete cds./gene="CAPG"/codon_start=1/product="gelsolin-like capping protein"/protein_id="AAH14549.1"/db_xref="GI:15778939"/db_xref="GeneID:822"/db_xref="MIM:153615" 2 3 7 2 0 1 1 6 23 1 0 0 3 6 2 4 3 3 3 3 1 4 7 3 2 6 5 17 3 7 4 15 7 3 0 4 17 1 3 21 12 4 2 21 5 2 5 21 13 5 6 2 2 3 11 4 2 11 3 5 6 0 0 1 >BC012176\BC012176\97..1032\936\AAH12176.1\Homo sapiens\Homo sapiens naked cuticle homolog 2 (Drosophila), mRNA (cDNA cloneMGC:20432 IMAGE:4649935), complete cds./gene="NKD2"/codon_start=1/product="NKD2 protein"/protein_id="AAH12176.1"/db_xref="GI:15082530"/db_xref="GeneID:85409"/db_xref="MIM:607852" 1 8 6 2 5 11 2 9 7 1 0 1 1 6 5 2 11 2 1 5 4 1 3 12 4 6 8 7 6 6 4 10 11 3 0 8 8 0 3 11 7 1 1 12 8 0 3 28 17 3 4 2 7 1 5 1 0 1 2 4 3 0 0 1 
>AF020920\AF020920\311..1345\1035\AAG50147.1\Homo sapiens\Homo sapiens chromosome 3 beta-1,4-galactosyltransferase mRNA,complete cds./codon_start=1/evidence=experimental/product="beta-1,4-galactosyltransferase"/protein_id="AAG50147.1"/db_xref="GI:12275809" 3 2 4 1 8 5 2 10 14 3 6 7 1 4 0 5 3 4 4 2 1 8 1 6 0 6 5 6 0 3 5 8 5 5 4 4 13 5 13 12 10 10 3 8 8 5 14 11 7 6 9 7 3 3 12 7 1 5 5 6 6 0 0 1 >BC000218\BC000218\8..1864\1857\AAH00218.1\Homo sapiens\Homo sapiens delta-like 3 (Drosophila), transcript variant 1, mRNA(cDNA clone MGC:652 IMAGE:3508262), complete cds./gene="DLL3"/codon_start=1/product="delta-like 3 protein, isoform 1 precursor"/protein_id="AAH00218.1"/db_xref="GI:12652923"/db_xref="GeneID:10683"/db_xref="MIM:602768" 2 29 12 3 2 7 4 20 23 4 1 7 2 17 4 6 12 2 3 9 5 3 6 26 22 17 9 23 21 14 13 37 20 8 3 13 13 0 1 2 6 3 2 14 13 1 6 23 18 7 9 1 36 12 18 2 1 6 5 3 7 0 0 1 >BC064548\BC064548\66..332\267\AAH64548.1\Homo sapiens\Homo sapiens cytochrome c oxidase subunit VIb polypeptide 2(testis), mRNA (cDNA clone MGC:74620 IMAGE:5744705), complete cds./gene="COX6B2"/codon_start=1/product="cytochrome c oxidase subunit VIb, testis-specific isoform"/protein_id="AAH64548.1"/db_xref="GI:40353035"/db_xref="GeneID:125965" 0 5 0 1 0 1 0 1 2 0 0 1 0 0 2 0 4 0 0 2 2 0 0 6 2 0 0 2 0 0 0 1 3 0 0 0 3 0 2 4 5 0 0 6 2 0 1 4 2 1 4 1 4 0 5 0 0 4 1 1 3 0 0 1 >AL845353#28\AL845353\complement(join(127057..127089,127265..127426, 127539..127748))\405\CAI41898.1\Homo sapiens\Human DNA sequence from clone DAQB-47P19 on chromosome 6 Containsthe 3' end of the MRPS18B gene for mitochondrial ribosomal proteinS18B, the C6orf134 gene for chromosome 6 open reading frame 134,the C6orf136 gene for chromosome 6 open reading frame 136, theDHX16 gene for DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 16, theKIAA1949 gene, the NRM gene for nurim (nuclear envelope membraneprotein), the RPL7P4 pseudogene for ribosomal protein L7 pseudogene4, the MDC1 gene for mediator of DNA damage checkpoint 1, the 
TUBBgene for tubulin, beta polypeptide, the FLOT1 gene for flotillin 1,the IER3 gene for immediate early response 3 and 8 CpG islands,complete sequence./gene="IER3"/locus_tag="DAQB-47P19.3-003"/standard_name="OTTHUMP00000014832"/codon_start=1/product="immediate early response 3"/protein_id="CAI41898.1"/db_xref="GI:57209818"/db_xref="UniProt/TrEMBL:Q5JP65" 2 6 3 0 0 2 0 4 7 1 0 0 0 5 0 2 3 0 0 7 0 3 4 6 7 7 3 8 3 1 1 1 3 2 1 3 2 1 1 1 2 0 0 3 3 0 2 6 2 0 1 0 2 1 3 1 0 5 0 3 0 0 0 1 >AY335944\AY335944\join(101..349,1400..1805,1982..2775)\1449\AAQ16550.1\Homo sapiens\Homo sapiens homeodomain protein IRXB2 (IRX5) gene, complete cds./gene="IRX5"/codon_start=1/product="homeodomain protein IRXB2"/protein_id="AAQ16550.1"/db_xref="GI:33356607" 1 9 8 0 0 1 2 11 9 2 1 7 1 10 20 8 7 0 4 12 8 1 8 31 22 10 3 22 25 7 7 30 14 3 0 2 6 1 7 17 12 2 1 12 6 5 4 23 20 2 19 4 4 2 7 2 1 5 3 7 4 1 0 0 >BC009676\BC009676\10..1053\1044\AAH09676.1\Homo sapiens\Homo sapiens histone deacetylase 11, mRNA (cDNA clone MGC:8894IMAGE:3906049), complete cds./gene="HDAC11"/codon_start=1/product="histone deacetylase 11"/protein_id="AAH09676.1"/db_xref="GI:16307174"/db_xref="GeneID:79885"/db_xref="MIM:607226" 2 9 6 4 0 6 2 6 18 9 0 0 3 7 2 1 4 0 7 8 2 0 4 10 2 2 3 9 4 5 3 14 11 1 1 3 21 2 4 13 6 5 0 9 6 4 2 21 13 8 9 2 1 1 8 6 2 20 3 8 5 0 0 1 >AK074766\AK074766\171..1382\1212\BAC11192.1\Homo sapiens\Homo sapiens cDNA FLJ90285 fis, clone NT2RP1000679./codon_start=1/protein_id="BAC11192.1"/db_xref="GI:22760424" 7 0 11 1 4 5 0 5 8 6 2 5 4 10 1 8 8 9 8 7 2 5 14 22 1 19 9 9 2 10 6 5 14 4 2 4 7 2 8 24 3 0 2 7 2 2 19 21 13 13 2 2 4 4 4 6 0 5 4 8 4 0 0 1 >HSU49379\U49379\88..1791\1704\AAC50497.1\Homo sapiens\Human diacylglycerol kinase epsilon DGK mRNA, complete cds./EC_number="2.7.1.107"/codon_start=1/product="diacylglycerol kinase epsilon DGK"/protein_id="AAC50497.1"/db_xref="GI:1289445" 5 6 4 3 3 7 8 8 22 5 7 10 1 4 4 8 5 5 11 6 2 10 9 8 3 7 7 8 7 7 18 9 11 7 9 6 8 14 18 21 8 13 10 16 8 6 18 10 14 21 9 9 18 
10 11 8 5 10 13 17 12 0 1 0 >AC069281#2\AC069281\complement(join(7546..7592,8626..8784,10150..10384, 11265..11500,12852..12993,13092..13190,16503..16541))\957\AAP21862.1\Homo sapiens\Homo sapiens BAC clone RP11-44M6 from 7, complete sequence./gene="FBXO24"/codon_start=1/product="unknown"/protein_id="AAP21862.1"/db_xref="GI:30141998" 1 11 6 2 4 6 4 12 11 4 0 7 2 5 2 4 7 2 4 10 2 2 5 3 3 6 1 11 1 1 5 8 5 3 1 8 13 7 3 12 1 3 2 16 7 1 5 10 7 5 10 3 5 8 11 2 0 6 3 5 4 0 0 1 >AF050080\AF050080\1..354\354\AAC69520.1\Homo sapiens\Homo sapiens C16orf3 small protein mRNA, complete cds./codon_start=1/product="C16orf3 small protein"/protein_id="AAC69520.1"/db_xref="GI:3818471" 0 0 0 1 2 2 0 0 3 1 0 0 2 6 0 6 5 1 2 2 2 1 5 9 0 2 1 7 3 3 0 5 2 1 6 0 1 0 1 2 0 0 3 3 1 4 2 3 1 0 0 0 8 1 0 0 1 1 0 3 2 0 0 1 >HSJ836N17#2\AL049539\join(AL353092.6:21133..21252,AL353092.6:22789..22831, AL353092.6:23186..23288,AL353092.6:24093..24191, 4334..4437,8454..8603,8951..9103,11188..11367, 13128..13204,18423..18576,23564..23695,25877..26079)\-140055\CAI22966.1\Homo sapiens\Human DNA sequence from clone RP5-836N17 on chromosome20q11.1-11.21 Contains the 3' end of the HCK for hemopoietic cellkinase, the gene for novel protein (KIAA0255), the RPL24P1 gene forribosomal protein L24 pseudogene 1 and two CpG Islands, completesequence./gene="HCK"/locus_tag="RP5-836N17.3-002"/standard_name="OTTHUMP00000030578"/codon_start=1/protein_id="CAI22966.1"/db_xref="GI:56203175" 5 4 12 3 3 5 1 7 28 2 0 8 3 10 5 6 10 3 2 12 4 4 7 10 2 17 6 16 1 9 9 9 8 2 1 10 15 0 14 21 15 2 4 18 4 4 7 25 15 5 17 1 5 1 15 10 2 20 4 19 8 1 5 5 >HUMP45C17\M19489\join(1839..2135,3804..3942,4176..4405,5068..5154, 5849..6064,6297..6466,7364..7467,7986..8269)\1527\AAA36405.1\Homo sapiens\Human P450XVIIA-1 (steroid 17-alpha-hydroxylase/17,20 lyase) gene,complete cds./gene="CYP17"/codon_start=1/protein_id="AAA36405.1"/db_xref="GI:386992" 3 5 2 5 2 5 3 13 34 5 1 8 5 9 0 6 9 3 5 17 0 3 5 12 2 8 3 15 5 7 3 14 6 6 2 6 17 5 10 27 12 16 3 17 9 6 
7 19 17 11 5 4 1 3 18 6 7 23 6 12 10 1 0 0 >HUMIFNAJ1\M34913\41..610\570\AAA36039.1\Homo sapiens\Human interferon-alpha-J1 (IFN-alpha-J1) mRNA, complete cds./codon_start=1/protein_id="AAA36039.1"/db_xref="GI:184615" 0 0 1 1 6 5 2 5 10 2 2 3 2 5 0 6 5 0 3 2 0 4 0 1 1 3 3 5 0 3 2 2 1 0 1 2 4 2 6 4 1 4 4 8 2 2 7 10 4 3 4 1 2 3 9 4 2 7 0 6 2 0 0 1 >AB093575\AB093575\252..953\702\BAC76997.1\Homo sapiens\Homo sapiens ERas mRNA for small GTPase protein E-Ras, completecds./gene="ERas"/codon_start=1/product="small GTPase protein E-Ras"/protein_id="BAC76997.1"/db_xref="GI:31158278" 0 2 3 1 0 6 0 3 16 2 0 1 0 4 2 2 2 3 4 10 0 1 1 5 0 4 5 14 2 8 1 11 4 1 0 5 12 1 1 8 2 1 1 15 8 3 1 11 9 5 2 0 5 4 5 1 1 5 1 3 5 0 0 1 >BC053851\BC053851\394..4056\3663\AAH53851.1\Homo sapiens\Homo sapiens bromodomain and PHD finger containing, 1, transcriptvariant 1, mRNA (cDNA clone MGC:61492 IMAGE:6140996), complete cds./gene="BRPF1"/codon_start=1/product="bromodomain and PHD finger-containing protein 1, isoform 1"/protein_id="AAH53851.1"/db_xref="GI:31753086"/db_xref="GeneID:7862"/db_xref="MIM:602410" 9 25 35 10 2 11 7 18 47 9 1 11 24 14 3 9 40 20 17 20 7 12 27 35 8 17 17 33 4 18 6 36 10 16 7 15 35 7 22 66 30 13 10 46 29 13 18 88 43 28 25 8 22 7 20 9 0 33 7 28 13 0 0 1 >BC046366\BC046366\395..1096\702\AAH46366.1\Homo sapiens\Homo sapiens copine VIII, mRNA (cDNA clone IMAGE:5764002), completecds./gene="LOC144402"/codon_start=1/product="LOC144402 protein"/protein_id="AAH46366.1"/db_xref="GI:28204828"/db_xref="GeneID:144402" 1 0 0 0 7 3 4 1 5 2 1 3 4 4 0 5 1 3 2 2 0 2 8 3 0 8 6 4 1 9 8 2 3 4 6 4 5 5 7 3 3 6 5 5 2 2 5 5 3 13 4 9 0 1 1 8 6 3 7 9 0 0 0 1 >HSFLA1A\Y00796\89..3601\3513\CAA68747.1\Homo sapiens\Human mRNA for leukocyte-associated molecule-1 alpha subunit (LFA-1alpha subunit)./codon_start=1/protein_id="CAA68747.1"/db_xref="GI:31422"/db_xref="GOA:P20701"/db_xref="UniProt/Swiss-Prot:P20701" 3 9 12 1 14 11 2 26 73 7 1 15 9 29 9 14 30 10 19 23 2 13 15 26 8 15 14 24 7 20 22 28 42 12 7 19 42 
13 15 36 22 15 11 48 19 11 21 53 39 23 20 14 11 11 34 24 6 46 10 25 10 0 0 1 >AF151905\AF151905\128..667\540\AAD34142.1\Homo sapiens\Homo sapiens CGI-147 protein mRNA, complete cds./codon_start=1/product="CGI-147 protein"/protein_id="AAD34142.1"/db_xref="GI:4929763" 2 0 0 1 2 0 3 3 4 3 3 5 1 2 0 2 5 3 3 1 1 5 3 4 0 2 5 3 0 8 6 5 5 1 2 3 4 5 9 6 0 2 4 4 2 3 7 1 4 3 4 1 3 2 0 1 0 1 7 8 2 0 1 0 >BC001689\BC001689\72..977\906\AAH01689.1\Homo sapiens\Homo sapiens solute carrier family 25 (carnitine/acylcarnitinetranslocase), member 20, mRNA (cDNA clone MGC:1207 IMAGE:3050073),complete cds./gene="SLC25A20"/codon_start=1/product="carnitine/acylcarnitine translocase"/protein_id="AAH01689.1"/db_xref="GI:12804553"/db_xref="GeneID:788"/db_xref="MIM:212138" 5 0 5 0 2 2 2 4 9 6 3 6 1 1 0 4 3 4 4 5 2 7 6 6 3 9 6 10 1 6 8 9 12 4 1 7 10 1 7 11 3 5 2 9 2 0 6 5 4 5 5 5 3 3 13 9 0 14 2 11 3 0 0 1 >AB035863\AB035863\58..1449\1392\BAA92873.1\Homo sapiens\Homo sapiens SCS-betaA mRNA for ATP specific succinyl CoAsynthetase beta subunit precursor, complete cds./gene="SCS-betaA"/codon_start=1/product="ATP specific succinyl CoA synthetase beta subunit precursor"/protein_id="BAA92873.1"/db_xref="GI:7328935" 2 2 5 0 4 5 4 7 5 9 6 9 10 2 0 6 1 5 6 5 2 2 6 2 0 6 17 8 3 21 17 6 2 14 10 9 11 10 21 16 7 10 13 12 1 6 26 2 6 21 7 4 2 4 3 12 17 5 12 16 1 0 0 1 >AL391357#1\AL391357\complement(join(5456..5526,7021..7186))\237\CAH73469.1\Homo sapiens\Human DNA sequence from clone RP11-401M16 on chromosome 1 Containsthe gene for calcium/calmodulin-dependent protein kinase II(CaMKIINalpha), the gene for a novel protein (FLJ12875), aribosomal protein S4 X-linked (RPS4X) pseudogene, the CDA gene forcytidine deaminase, the PINK1 gene for PTEN induced putative kinase1, a novel gene (FLJ00387), the DDOST gene fordolichyl-diphosphooligosaccharide-protein glycosyltransferase andthe 3' end of the KIF17 gene for kinesin family member 17, 
completesequence./gene="RP11-401M16.1"/locus_tag="RP11-401M16.1-001"/standard_name="OTTHUMP00000002879"/codon_start=1/product="calcium/calmodulin-dependent protein kinase II (CaMKIINalpha)"/protein_id="CAH73469.1"/db_xref="GI:55664520"/db_xref="GOA:Q7Z7J9"/db_xref="UniProt/TrEMBL:Q7Z7J9" 0 1 3 0 0 1 0 0 5 0 0 0 0 1 1 0 2 0 0 2 0 0 0 3 1 2 1 1 0 0 0 8 1 1 0 1 3 2 1 5 3 1 0 4 0 0 1 2 6 3 2 0 1 0 3 0 0 2 2 2 0 1 0 0 >CR536498\CR536498\1..1221\1221\CAG38737.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834A0620D forgene REN, renin; complete cds, incl. stopcodon./gene="REN"/codon_start=1/protein_id="CAG38737.1"/db_xref="GI:49168484"/db_xref="InterPro:IPR001461"/db_xref="InterPro:IPR001969"/db_xref="InterPro:IPR009007" 3 4 2 2 3 5 1 14 15 2 1 4 4 15 3 4 8 3 8 20 3 4 2 11 2 3 2 13 1 1 11 14 7 8 2 8 12 2 5 14 8 4 4 9 5 1 8 11 12 8 11 6 3 4 13 9 1 17 6 13 7 0 0 1 >AF063601\AF063601\194..2137\1944\AAG43160.1\Homo sapiens\Homo sapiens brain my043 protein mRNA, complete cds./codon_start=1/product="brain my043 protein"/protein_id="AAG43160.1"/db_xref="GI:12002034" 0 2 1 0 3 4 7 14 11 10 11 22 29 4 0 21 8 14 13 3 2 13 17 5 2 19 15 8 0 22 7 2 2 7 7 0 14 27 32 20 4 12 8 12 2 7 47 21 11 27 4 14 3 1 9 14 16 7 16 12 2 0 0 1 >AY281131\AY281131\1..3750\3750\AAP37612.1\Homo sapiens\Homo sapiens AIDA-1b mRNA, complete cds./codon_start=1/product="AIDA-1b"/protein_id="AAP37612.1"/db_xref="GI:31746739" 8 5 4 2 24 13 11 21 28 21 14 23 24 24 5 33 17 17 38 17 2 25 24 19 4 31 28 20 2 23 35 15 19 8 12 15 21 17 48 28 26 42 18 32 26 20 67 29 24 40 14 15 6 16 5 23 10 20 38 22 11 0 0 1 >HSA406932\AJ406932\34..330\297\CAC27571.1\Homo sapiens\Homo sapiens mRNA for keratin associated protein 3.2 (KRTAP3.2gene)./gene="KRTAP3.2"/function="structural protein"/standard_name="KAP3.2"/codon_start=1/product="keratin associated protein 3.2"/protein_id="CAC27571.1"/db_xref="GI:12655438"/db_xref="UniProt/TrEMBL:Q9BYR7" 0 2 0 0 1 0 0 4 4 0 1 0 0 4 0 1 2 1 1 7 0 3 4 9 1 2 0 2 0 0 1 2 1 0 0 2 
1 1 1 0 3 0 0 3 2 0 0 3 2 1 0 0 14 5 2 0 0 2 1 1 1 0 0 1 >AY358984\AY358984\51..1697\1647\AAQ89343.1\Homo sapiens\Homo sapiens clone DNA82307 aminyltransferase (UNQ906) mRNA,complete cds./locus_tag="UNQ906"/codon_start=1/product="aminyltransferase"/protein_id="AAQ89343.1"/db_xref="GI:37183086" 3 16 11 0 2 5 2 18 45 0 0 3 6 12 9 2 11 2 3 10 5 5 3 12 6 8 7 14 8 2 5 17 4 1 1 5 26 2 10 24 12 2 1 23 14 4 2 35 29 4 16 0 4 1 32 3 0 19 5 8 9 1 0 0 >BC008988\BC008988\58..1194\1137\AAH08988.1\Homo sapiens\Homo sapiens T-cell immunoglobulin and mucin domain containing 4,mRNA (cDNA clone MGC:17333 IMAGE:4184237), complete cds./gene="TIMD4"/codon_start=1/product="T-cell immunoglobulin and mucin domain containing 4"/protein_id="AAH08988.1"/db_xref="GI:14290446"/db_xref="GeneID:91937" 1 4 1 0 7 3 4 13 9 7 1 5 10 10 1 8 5 7 26 17 4 10 6 11 2 6 8 4 2 3 8 2 4 5 1 7 12 4 11 5 7 4 2 9 5 1 11 9 7 10 5 2 6 2 3 5 3 7 4 13 9 1 0 0 >AK126120\AK126120\1301..1741\441\BAC86449.1\Homo sapiens\Homo sapiens cDNA FLJ44132 fis, clone THYMU2008282./codon_start=1/protein_id="BAC86449.1"/db_xref="GI:34532501" 0 2 2 0 2 7 3 4 5 1 4 1 3 4 1 1 3 5 2 2 0 2 4 1 0 3 1 7 0 5 2 5 4 5 4 0 3 1 1 0 2 0 3 4 4 2 2 6 2 2 0 0 4 1 0 6 1 0 4 3 0 0 0 1 >AF147709\AF147709\26..4012\3987\AAF33021.1\Homo sapiens\Homo sapiens MYB-binding protein 1A (MYBBP1A) mRNA, complete cds./gene="MYBBP1A"/codon_start=1/product="MYB-binding protein 1A"/protein_id="AAF33021.1"/db_xref="GI:6959304" 6 22 31 7 2 14 10 38 106 5 1 27 7 25 7 6 40 14 11 32 16 7 12 38 13 17 21 60 12 16 10 26 21 8 1 18 56 3 19 82 21 7 7 92 28 9 13 82 42 14 11 5 13 4 36 12 3 24 4 23 11 0 0 1 >S76984\S76984\1..39\39\AAD14243.1\Homo sapiens\HEXA {HEXA4bpDeltass mutation, exon 11} [human, Tay-Sachs diseasepatient, mRNA Partial Mutant, 84 nt]./gene="HEXA"/codon_start=1/protein_id="AAD14243.1"/db_xref="GI:4261943" 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 2 0 0 1 
>AY152724\AY152724\1..957\957\AAN73210.1\Homo sapiens\Homo sapiens X-linked ectodermal dysplasia receptor long isoform(XEDAR) mRNA, complete cds, alternatively spliced./gene="XEDAR"/codon_start=1/product="X-linked ectodermal dysplasia receptor long isoform"/protein_id="AAN73210.1"/db_xref="GI:25140979" 1 2 2 2 2 3 2 6 10 2 1 5 1 7 1 5 13 4 9 11 1 3 4 10 1 7 5 7 0 7 9 3 3 4 1 6 9 4 3 8 5 5 7 13 5 1 6 19 7 6 5 2 13 8 10 7 1 6 3 1 4 1 0 0 >AL359513#1\AL359513\complement(join(12614..13992,21231..21389,32435..33397, 37746..37803))\2559\CAH71657.1\Homo sapiens\Human DNA sequence from clone RP11-245D16 on chromosome 13 Containsthe gene for TMTSP for transmembrane molecule with thrombospondinmodule (LOC55901), the gene fpr CGI-145 protein (LOC51028), theCKAP2 gene for cytoskeleton associated protein 2 (LB1, FLJ10749)and 3 CpG islands, complete sequence./gene="THSD1"/locus_tag="RP11-245D16.1-001"/standard_name="OTTHUMP00000018448"/codon_start=1/product="thrombospondin, type I, domain 1"/protein_id="CAH71657.1"/db_xref="GI:55663986"/db_xref="Genew:17754"/db_xref="GOA:Q9NS62"/db_xref="InterPro:IPR000884"/db_xref="InterPro:IPR003067"/db_xref="UniProt/TrEMBL:Q9NS62" 5 4 8 3 14 24 6 9 31 9 4 15 7 21 3 10 27 17 19 16 5 20 21 23 8 20 13 27 3 12 14 11 18 9 7 14 25 3 15 24 17 12 6 34 11 6 21 37 20 13 10 9 10 12 24 20 5 13 9 10 9 0 1 0 >AK128007\AK128007\1041..1484\444\BAC87229.1\Homo sapiens\Homo sapiens cDNA FLJ46126 fis, clone TESTI2041362./codon_start=1/protein_id="BAC87229.1"/db_xref="GI:34535168" 0 0 0 0 5 8 1 3 1 2 1 3 1 5 0 3 1 4 5 6 0 2 5 2 0 2 5 4 0 7 1 5 9 1 1 1 4 1 2 2 1 1 2 3 2 1 4 1 2 2 2 0 3 1 4 3 1 1 2 3 5 0 0 1 >AF049259\AF049259\join(512..1006,2331..2413,2613..2769,2960..3121, 3246..3371,3464..3684,4694..4712)\1263\AAC35754.1\Homo sapiens\Homo sapiens keratin 13 gene, complete cds./codon_start=1/product="keratin 13"/protein_id="AAC35754.1"/db_xref="GI:3603253" 0 11 6 4 0 5 1 12 31 2 0 0 2 3 0 6 16 3 2 10 4 5 2 2 1 1 4 16 1 9 12 20 6 16 0 4 12 1 1 18 14 4 2 21 2 1 6 
40 14 4 8 6 3 3 3 8 0 12 6 12 2 0 1 0 >HSRNASMG\X85373\84..314\231\CAA59689.1\Homo sapiens\H.sapiens mRNA for Sm protein G./gene="pBSCF"/codon_start=1/evidence=experimental/product="Sm protein G"/protein_id="CAA59689.1"/db_xref="GI:806566"/db_xref="GOA:Q15357"/db_xref="UniProt/Swiss-Prot:Q15357" 2 0 1 0 1 0 0 0 0 1 3 4 1 0 0 0 1 2 0 0 0 1 0 2 0 1 0 1 1 1 5 1 0 1 2 1 3 0 4 2 2 3 2 1 1 1 3 2 1 2 0 0 0 1 0 3 3 2 1 6 0 1 0 0 >AY532090\AY532090\3712..3792\81\AAT09392.1\Homo sapiens\Homo sapiens isolate YCC26 truncated 5-aminoevulinate synthase 2(ALAS2) gene, exons 2 and 3 and complete cds./gene="ALAS2"/codon_start=1/product="truncated 5-aminoevulinate synthase 2"/protein_id="AAT09392.1"/db_xref="GI:47059411" 0 0 0 0 0 1 1 0 2 0 0 1 0 0 0 0 0 1 0 0 0 1 0 2 1 2 1 1 0 1 0 0 1 2 0 0 1 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 1 0 0 >BT006838\BT006838\1..1410\1410\AAP35484.1\Homo sapiens\Homo sapiens v-ets erythroblastosis virus E26 oncogene homolog 2(avian) mRNA, complete cds./codon_start=1/product="v-ets erythroblastosis virus E26 oncogene homolog 2 (avian)"/protein_id="AAP35484.1"/db_xref="GI:30582515" 0 7 5 0 1 3 2 15 20 2 4 7 4 12 2 8 15 8 9 8 4 1 6 10 3 9 2 10 2 5 7 14 5 3 3 4 12 4 8 18 14 13 10 21 4 3 13 19 20 8 9 3 7 3 15 10 1 7 5 11 11 0 1 0 >HUMTHYP\M24398\301..609\309\AAA61185.1\Homo sapiens\Human parathymosin mRNA, complete cds./gene="PTMS"/codon_start=1/protein_id="AAA61185.1"/db_xref="GI:339699"/db_xref="GDB:G00-125-555" 1 0 2 0 1 0 0 0 2 0 0 1 0 0 2 0 3 0 1 0 0 1 0 2 0 0 3 4 4 2 1 0 4 0 0 0 4 0 4 9 1 1 0 1 0 0 18 21 1 7 0 0 0 0 0 0 0 0 0 1 0 0 0 1 >AB107883\AB107883\1..1209\1209\BAC79077.1\Homo sapiens\Homo sapiens BRAL2 mRNA for brain link protein-2, complete cds./gene="BRAL2"/codon_start=1/product="brain link protein-2"/protein_id="BAC79077.1"/db_xref="GI:32480475" 4 18 8 5 0 1 2 7 24 0 0 2 1 3 2 1 3 2 3 8 2 0 1 10 7 8 9 19 21 7 4 35 15 7 4 12 16 0 0 9 8 1 2 10 9 1 5 10 15 5 11 3 10 1 12 1 0 3 0 2 13 0 1 0 >HSB29IGBV\X83539\1..378\378\CAA58522.1\Homo 
sapiens\H.sapiens B29 mRNA for Ig-beta, splice variant./gene="B29, alternative-spliced form"/codon_start=1/product="Ig-beta, variant"/protein_id="CAA58522.1"/db_xref="GI:620078"/db_xref="GOA:P40259"/db_xref="UniProt/Swiss-Prot:P40259" 0 0 3 0 1 2 0 2 13 0 0 3 1 0 1 2 3 0 2 3 3 0 3 2 0 2 2 3 2 2 1 3 1 2 2 0 5 0 1 5 1 1 0 5 3 0 2 7 6 3 2 1 0 0 3 0 1 8 1 4 2 0 0 1 >AF165281\AF165281\121..6726\6606\AAD49849.1\Homo sapiens\Homo sapiens ATP cassette binding transporter 1 (ABC1) mRNA,complete cds./gene="ABC1"/codon_start=1/product="ATP cassette binding transporter 1"/protein_id="AAD49849.1"/db_xref="GI:5734101" 12 12 22 3 26 24 14 44 109 32 14 29 25 36 3 38 46 30 40 46 10 23 26 32 8 36 27 49 5 40 40 49 42 16 11 44 86 25 47 78 57 44 19 65 18 20 54 71 60 55 40 27 19 17 60 54 13 72 39 61 37 0 0 1 >HSU03907\U03907\1..1098\1098\AAA03605.1\Homo sapiens\Human MHC Class I HLA-B-3201 mRNA, complete cds./codon_start=1/product="Class I HLA-B-3201"/protein_id="AAA03605.1"/db_xref="GI:432996" 2 9 6 1 5 7 2 10 13 1 0 3 2 5 0 9 8 2 4 14 6 2 0 8 5 3 2 15 12 10 5 9 10 2 0 3 17 1 2 8 3 1 0 24 5 4 0 23 17 5 11 3 4 1 7 2 1 9 1 10 11 0 0 1 >BC039033\BC039033\89..478\390\AAH39033.1\Homo sapiens\Homo sapiens HLA class II region expressed gene KE2, mRNA (cDNAclone MGC:47665 IMAGE:5748799), complete cds./gene="HKE2"/codon_start=1/product="HLA class II region expressed gene KE2"/protein_id="AAH39033.1"/db_xref="GI:24658611"/db_xref="GeneID:10471"/db_xref="MIM:605660" 1 0 4 0 0 3 4 0 8 5 1 0 1 3 1 0 0 1 3 1 0 0 0 0 1 1 4 4 1 4 1 1 5 1 1 2 4 0 6 7 1 2 3 14 0 0 6 9 2 2 1 2 0 0 1 1 0 3 1 2 0 0 0 1 >BC015761\BC015761\145..1902\1758\AAH15761.1\Homo sapiens\Homo sapiens lectin, galactoside-binding, soluble, 3 bindingprotein, mRNA (cDNA clone MGC:23157 IMAGE:4858122), complete cds./gene="LGALS3BP"/codon_start=1/product="galectin 3 binding protein"/protein_id="AAH15761.1"/db_xref="GI:16041761"/db_xref="GeneID:3959"/db_xref="MIM:600626" 1 5 7 1 5 10 2 18 46 2 0 4 6 16 6 5 15 6 3 23 4 6 1 20 3 4 6 35 2 5 3 
24 8 3 0 15 19 1 3 16 14 3 6 23 8 1 4 27 25 8 17 5 13 3 26 4 0 10 4 9 16 0 1 0 >BC034393\BC034393\26..562\537\AAH34393.1\Homo sapiens\Homo sapiens endothelin 2, mRNA (cDNA clone MGC:34606IMAGE:5185394), complete cds./gene="EDN2"/codon_start=1/product="endothelin 2"/protein_id="AAH34393.1"/db_xref="GI:21706719"/db_xref="GeneID:1907"/db_xref="MIM:131241" 2 5 5 1 2 7 1 4 6 4 0 1 1 11 0 1 3 1 5 5 0 4 4 2 1 5 2 11 2 3 2 4 3 0 0 4 4 1 0 7 2 0 3 6 2 3 3 5 5 0 2 0 7 2 3 1 0 2 1 2 5 0 1 0 >AK025764\AK025764\42..455\414\BAB15235.1\Homo sapiens\Homo sapiens cDNA: FLJ22111 fis, clone HEP18290./codon_start=1/protein_id="BAB15235.1"/db_xref="GI:10438381" 2 1 0 0 1 0 2 0 0 4 6 3 1 0 0 1 1 2 4 0 1 2 3 1 0 2 4 3 0 1 6 1 1 0 0 0 3 1 8 4 1 6 3 2 0 2 10 0 6 6 2 1 0 1 2 9 3 0 4 6 4 0 1 0 >AB035340#2\AB035340\3243..3461\219\BAA96378.1\Homo sapiens\Homo sapiens TCL6f1 mRNA for T-cell leukemia/lymphoma 6 ORF141,T-cell leukemia/lymphoma 6 ORF72, complete cds, clone:pDG1./gene="TCL6f1"/codon_start=1/product="T-cell leukemia/lymphoma 6 ORF72"/protein_id="BAA96378.1"/db_xref="GI:8176583" 1 0 0 0 1 2 0 0 1 1 2 1 2 1 0 3 1 4 0 1 0 1 1 0 1 2 4 3 0 4 7 1 1 1 0 0 2 0 0 0 0 1 1 2 1 0 0 2 1 2 1 0 3 1 2 1 0 1 0 2 2 0 0 1 >AK002011\AK002011\314..802\489\BAA92033.1\Homo sapiens\Homo sapiens cDNA FLJ11149 fis, clone PLACE1006731, weakly similarto RIBOFLAVIN KINASE (EC 2.7.1.26)/ FMN ADENYLYLTRANSFERASE (EC2.7.7.2)./codon_start=1/protein_id="BAA92033.1"/db_xref="GI:7023634" 2 1 2 0 1 1 1 1 3 2 2 1 2 2 0 3 2 3 2 1 1 1 4 2 0 2 1 2 1 3 3 6 1 4 1 1 6 3 6 7 2 5 3 2 2 4 7 4 3 5 4 3 2 0 6 2 3 4 6 6 2 0 0 1 >CR456983\CR456983\1..1245\1245\CAG33264.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834D025D forgene STAF65(gamma), SPTF-associated factor 65 gamma; complete cds,incl. 
stopcodon./gene="STAF65(gamma)"/codon_start=1/protein_id="CAG33264.1"/db_xref="GI:48146083" 1 4 5 4 2 3 2 11 14 7 1 3 10 7 1 5 8 14 4 4 1 7 8 7 1 20 4 8 1 11 4 5 7 4 3 6 13 2 5 13 6 9 4 21 8 6 17 21 14 8 3 5 3 4 8 6 2 7 8 10 4 1 0 0 >AK021843\AK021843\164..589\426\BAB13908.1\Homo sapiens\Homo sapiens cDNA FLJ11781 fis, clone HEMBA1005963./codon_start=1/protein_id="BAB13908.1"/db_xref="GI:10433112" 0 1 1 1 5 4 3 2 3 0 4 3 9 6 0 3 0 1 6 1 0 2 2 4 0 4 1 3 0 4 0 2 3 0 1 0 2 1 6 3 1 4 3 4 1 3 2 2 1 1 2 3 0 3 3 1 4 2 4 4 2 1 0 0 >AK022850\AK022850\10..867\858\BAB14272.1\Homo sapiens\Homo sapiens cDNA FLJ12788 fis, clone NT2RP2001946./codon_start=1/protein_id="BAB14272.1"/db_xref="GI:10434483" 3 5 6 2 1 2 3 6 15 3 0 3 4 3 0 6 10 4 1 3 0 5 7 12 1 5 2 8 1 6 6 6 7 5 0 2 8 2 5 16 2 1 5 12 3 2 9 23 5 11 4 1 5 3 3 2 1 2 1 4 2 0 0 1 >HUMGPMIVI\M60710\join(38..136,844..936,1870..1908)\231\AAA35929.1\Homo sapiens\Human glycophorin (HGpMiVI) gene, exons 2-4./gene="HGpMiVI"/codon_start=3/product="glycophorin MiVI"/protein_id="AAA35929.1"/db_xref="GI:553319" 0 0 1 0 2 2 1 2 5 3 1 2 2 1 0 1 1 0 4 1 0 0 1 0 1 1 1 0 0 0 0 0 1 0 1 0 0 2 4 2 2 0 5 6 1 3 0 1 1 0 1 1 1 0 0 3 2 2 0 2 1 1 0 1 >AY341953\AY341953\342..599\258\AAQ76866.1\Homo sapiens\Homo sapiens chromosome 20 F379 retina specific protein (F379)mRNA, complete cds./gene="F379"/codon_start=1/product="F379 retina specific protein"/protein_id="AAQ76866.1"/db_xref="GI:34596537" 0 0 0 0 2 0 0 2 3 1 3 3 2 1 1 2 2 0 3 4 1 1 1 2 1 3 0 3 0 2 0 1 2 2 0 0 0 2 3 3 1 1 1 1 2 1 2 1 1 3 0 1 1 1 3 1 2 1 2 2 1 1 0 0 >BC008753\BC008753\222..764\543\AAH08753.1\Homo sapiens\Homo sapiens ADP-ribosylation factor 4, mRNA (cDNA clone MGC:1097IMAGE:3346199), complete cds./gene="ARF4"/codon_start=1/product="ADP-ribosylation factor 4"/protein_id="AAH08753.1"/db_xref="GI:14250597"/db_xref="GeneID:378"/db_xref="MIM:601177" 1 1 0 3 4 1 3 3 7 7 1 4 2 3 0 1 1 1 6 5 0 3 1 0 0 2 4 2 0 3 3 3 2 5 5 1 3 3 5 6 4 4 3 7 0 1 7 3 1 10 1 4 0 2 3 4 1 2 8 6 4 1 0 0 
>AF117210\AF117210\13..2391\2379\AAD27814.1\Homo sapiens\Homo sapiens host cell factor 2 (HCF-2) mRNA, complete cds./gene="HCF-2"/codon_start=1/product="host cell factor 2"/protein_id="AAD27814.1"/db_xref="GI:4689221" 5 3 4 2 14 7 7 2 12 14 10 12 22 8 0 19 13 17 26 9 6 18 23 5 3 24 23 10 4 17 22 10 10 17 12 7 15 22 33 13 12 32 15 11 8 12 24 13 6 27 9 18 5 12 7 17 8 15 12 13 16 0 0 1 >AK074518\AK074518\497..1435\939\BAC11035.1\Homo sapiens\Homo sapiens cDNA FLJ90037 fis, clone HEMBA1002048, moderatelysimilar to ODD-SKIPPED PROTEIN./codon_start=1/protein_id="BAC11035.1"/db_xref="GI:22760013" 2 3 6 1 6 5 2 8 15 1 1 4 0 5 2 3 7 2 2 12 7 5 4 11 8 2 1 9 7 2 5 5 4 2 1 1 6 1 13 9 3 6 2 12 20 3 2 8 10 4 6 1 4 7 9 8 0 9 1 6 1 0 1 0 >BC008953\BC008953\55..573\519\AAH08953.1\Homo sapiens\Homo sapiens thioredoxin domain containing 12 (endoplasmicreticulum), mRNA (cDNA clone MGC:3735 IMAGE:2959502), complete cds./gene="TXNDC12"/codon_start=1/product="endoplasmic reticulum thioredoxin superfamily member, 18 kDa"/protein_id="AAH08953.1"/db_xref="GI:14286304"/db_xref="GeneID:51060" 1 0 1 1 1 2 1 4 7 4 0 2 1 2 0 3 2 4 1 1 3 0 1 5 0 3 2 4 0 4 6 2 5 3 1 2 2 3 6 7 1 5 1 2 0 7 13 5 1 9 1 3 1 2 4 5 0 4 5 4 2 1 0 0 >BC012163\BC012163\7..345\339\AAH12163.1\Homo sapiens\Homo sapiens mitochondrial ribosomal protein L53, mRNA (cDNA cloneMGC:20402 IMAGE:4622995), complete cds./gene="MRPL53"/codon_start=1/product="mitochondrial ribosomal protein L53"/protein_id="AAH12163.1"/db_xref="GI:15082500"/db_xref="GeneID:116540" 0 4 4 0 0 3 0 3 5 1 0 1 1 3 1 0 2 1 0 3 2 2 0 2 1 1 1 4 3 5 1 4 2 2 0 2 6 2 2 2 2 1 0 3 1 3 2 3 6 1 0 0 2 1 5 0 0 1 2 3 0 0 0 1 >AB041018\AB041018\418..1311\894\BAD74066.1\Homo sapiens\Homo sapiens C2orf4a mRNA for chromosome 2 open reading frame 4a,complete cds./gene="C2orf4"/codon_start=1/product="chromosome 2 open reading frame 4 long form"/protein_id="BAD74066.1"/db_xref="GI:56377669" 3 1 1 2 6 2 3 5 6 4 3 3 4 4 2 9 4 11 7 2 2 1 0 3 2 7 5 8 1 7 9 0 5 3 3 2 7 2 8 5 2 8 6 9 3 
10 13 4 4 9 7 10 2 5 5 6 3 2 13 9 5 0 0 1 >BC015887\BC015887\54..1268\1215\AAH15887.1\Homo sapiens\Homo sapiens WD repeat and SOCS box-containing 2, mRNA (cDNA cloneMGC:10210 IMAGE:3910968), complete cds./gene="WSB2"/codon_start=1/product="WD SOCS-box protein 2"/protein_id="AAH15887.1"/db_xref="GI:16198433"/db_xref="GeneID:55884" 2 3 4 0 4 7 4 8 20 8 3 5 4 11 2 8 12 6 8 5 6 4 7 12 3 4 5 11 1 6 5 9 8 3 0 14 7 4 12 10 1 5 6 9 12 2 7 10 10 11 7 0 9 4 9 8 0 9 8 7 15 1 0 0 >HS125H231\AL035291\220..3984\3765\CAA22894.1\Homo sapiens\H.sapiens gene from PACs 125H23 and 105D12./codon_start=1/product="hypothetical protein"/protein_id="CAA22894.1"/db_xref="GI:4200228"/db_xref="GOA:Q9UBS9"/db_xref="UniProt/TrEMBL:Q9UBS9" 7 4 6 3 13 6 10 5 18 29 19 23 50 10 3 50 17 26 33 10 2 31 34 11 4 22 20 18 2 22 26 7 2 16 24 9 15 30 62 32 7 54 25 32 8 14 85 40 16 51 7 28 8 14 7 24 27 12 32 25 7 1 0 0 >AF144412\AF144412\2490..2675\186\AAD38378.1\Homo sapiens\Homo sapiens lens epithelial cell protein (LEP503) gene, completecds./gene="LEP503"/codon_start=1/product="lens epithelial cell protein"/protein_id="AAD38378.1"/db_xref="GI:5051958" 1 0 3 0 0 0 2 3 4 0 0 0 0 0 0 0 0 0 2 2 0 1 0 3 0 2 0 3 0 0 1 2 3 0 0 2 1 0 0 3 0 0 1 3 0 0 2 1 1 1 1 0 2 1 3 0 0 2 1 2 2 1 0 0 >AF120274\AF120274\783..1445\663\AAD21075.1\Homo sapiens\Homo sapiens pre-pro-neublastin mRNA, complete cds./codon_start=1/product="pre-pro-neublastin"/protein_id="AAD21075.1"/db_xref="GI:4455085" 2 17 8 1 2 1 1 4 17 3 0 0 0 10 1 3 9 1 0 4 3 0 2 14 11 6 3 12 10 5 4 15 5 0 0 5 4 0 0 0 1 0 0 4 4 0 3 2 4 0 1 0 9 0 3 0 0 0 0 2 4 0 0 1 >BC047365\BC047365\523..903\381\AAH47365.1\Homo sapiens\Homo sapiens hypothetical protein LOC130355, mRNA (cDNA cloneMGC:35431 IMAGE:5189916), complete cds./gene="LOC130355"/codon_start=1/product="hypothetical protein LOC130355"/protein_id="AAH47365.1"/db_xref="GI:66840145"/db_xref="GeneID:130355" 0 2 0 1 2 2 2 2 4 1 1 3 2 2 0 0 1 2 2 2 0 3 3 1 0 3 4 1 0 3 3 0 0 0 3 1 4 1 6 4 2 5 3 0 1 3 11 0 3 4 1 4 0 1 4 
3 0 4 4 1 1 0 0 1 >BT006931\BT006931\1..999\999\AAP35577.1\Homo sapiens\Homo sapiens dystonia 1, torsion (autosomal dominant; torsin A)mRNA, complete cds./codon_start=1/product="dystonia 1, torsion (autosomal dominant; torsin A)"/protein_id="AAP35577.1"/db_xref="GI:30582701" 3 0 4 1 2 4 1 12 16 2 4 5 2 5 1 1 8 4 4 4 2 0 2 5 2 2 5 9 4 4 4 11 3 3 1 4 13 3 10 14 9 5 0 9 7 3 8 9 8 14 9 6 4 2 13 8 4 12 7 7 4 0 1 0 >AY358899\AY358899\62..1393\1332\AAQ89258.1\Homo sapiens\Homo sapiens clone DNA33460 EFEMP2 (UNQ200) mRNA, complete cds./locus_tag="UNQ200"/codon_start=1/product="EFEMP2"/protein_id="AAQ89258.1"/db_xref="GI:37182916" 1 16 6 2 0 2 5 9 16 0 0 4 2 14 2 6 10 1 2 11 2 0 8 15 4 9 0 15 3 5 3 12 10 4 2 10 13 1 0 3 18 2 4 21 9 4 3 24 17 11 19 5 28 14 9 3 1 9 4 8 2 0 0 1 >AL590559#4\AL590559\complement(join(2366..2422,4325..4498,9168..9345, 13359..13481,14047..14136,14242..14382,15294..15444, 19147..19459))\1227\CAH73323.1\Homo sapiens\Human DNA sequence from clone RP11-393N21 on chromosome 1 Containsthe 5' end of the gene for PAI-1 mRNA-binding protein (PAI-RBP1), apoly(A) binding protein, cytoplasmic 4 (PABPC4) pseudogene, a novelgene and a CpG island, complete sequence./gene="RP11-102M16.2"/locus_tag="RP11-393N21.3-006"/standard_name="OTTHUMP00000010978"/codon_start=1/product="PAI-1 mRNA-binding protein (PAI-RBP1)"/protein_id="CAH73323.1"/db_xref="GI:55665493"/db_xref="InterPro:IPR006861" 10 4 2 11 9 4 0 0 6 6 3 2 5 3 2 7 4 6 2 3 2 4 11 5 3 7 8 4 3 9 18 17 9 12 2 2 7 8 17 18 7 9 5 11 3 8 25 17 16 14 3 1 1 0 6 8 4 2 6 4 3 1 0 0 >AF365928\AF365928\148..1821\1674\AAM00356.1\Homo sapiens\Homo sapiens calcium transport protein CaT1 mRNA, complete cds,alternatively spliced./codon_start=1/product="calcium transport protein CaT1"/protein_id="AAM00356.1"/db_xref="GI:19880486" 4 9 15 1 4 6 6 18 40 6 2 3 5 8 1 2 9 3 4 21 4 5 4 15 2 5 2 18 0 6 7 11 13 5 2 6 20 1 4 15 12 3 3 22 9 3 6 21 21 6 13 8 11 1 23 7 2 31 6 25 12 0 0 1 >BC069235\BC069235\224..1957\1734\AAH69235.1\Homo 
sapiens\Homo sapiens THAP domain containing 4, mRNA (cDNA clone MGC:78456IMAGE:6645198), complete cds./gene="THAP4"/codon_start=1/product="THAP domain containing 4"/protein_id="AAH69235.1"/db_xref="GI:46623323"/db_xref="GeneID:51078" 2 9 8 1 4 13 3 3 27 4 1 1 8 22 7 7 21 8 8 15 6 5 13 20 6 5 5 28 6 7 11 17 8 3 1 9 21 2 8 33 13 1 4 27 13 5 9 28 13 9 5 2 9 3 14 4 0 13 3 11 5 1 0 0 >AY074877\AY074877\249..2990\2742\AAL79676.1\Homo sapiens\Homo sapiens pVHL-interacting deubiquitinating enzyme 2 (VDU2)mRNA, complete cds./gene="VDU2"/codon_start=1/product="pVHL-interacting deubiquitinating enzyme 2"/protein_id="AAL79676.1"/db_xref="GI:23262727" 6 12 22 3 1 7 3 16 44 6 2 8 8 23 9 9 26 6 5 20 11 2 12 24 11 8 9 38 5 12 11 26 17 6 5 21 31 3 11 37 24 4 2 42 22 6 8 62 38 12 26 4 19 13 26 6 1 25 11 13 13 0 0 1 >HSA400877#6\AJ400877\complement(join(190479..190622,192592..192672, 195028..195132,197427..197507,198287..198508))\633\CAB92290.1\Homo sapiens\Homo sapiens ASCL3 gene, CEGP1 gene, C11orf14 gene, C11orf15 gene,C11orf16 gene and C11orf17 gene./gene="C11orf17"/codon_start=1/product="C11orf17 protein"/protein_id="CAB92290.1"/db_xref="GI:8052242"/db_xref="UniProt/Swiss-Prot:Q9NQ31" 1 1 1 4 6 3 2 2 7 3 1 2 5 4 1 4 2 2 2 4 0 3 4 3 3 3 6 5 4 5 4 5 6 2 2 4 8 2 5 7 2 3 2 6 3 4 5 11 9 1 1 8 2 2 3 0 3 1 0 5 1 0 0 1 >AL451062#4\AL451062 AC026934\join(53597..53759,AL592548.10:7818..7865, AL592548.10:30608..30650,AL592548.10:53936..54028, AL357079.24:14444..14538,AL357079.24:50193..50263)\-2491626\CAH71026.1\Homo sapiens\Human DNA sequence from clone RP11-184I16 on chromosome 1 Containsthe 3' end of the JMJD2 gene for jumonji domain containing 2, twonovel genes, the 5' end of the SIAT6 gene for sialyltransferase 6(N-acetyllacosaminide alpha 2,3-sialyltransferase), a SMT3suppressor of mif two 3 (yeast) family pseudogene and a CpG island,complete sequence./gene="SIAT6"/locus_tag="RP11-7O11.1-007"/standard_name="OTTHUMP00000008817"/codon_start=1/product="sialyltransferase 6 
(N-acetyllacosaminide alpha 2,3-sialyltransferase)"/protein_id="CAH71026.1"/db_xref="GI:55665241" 2 2 0 0 1 2 6 6 4 3 1 6 3 3 1 5 3 4 3 1 0 2 2 2 0 2 1 3 1 1 5 2 3 1 2 1 3 3 4 5 3 2 2 6 4 5 2 3 1 1 1 4 2 6 3 6 5 0 4 3 5 1 2 1 >AF417920\AF417920\264..1553\1290\AAL59159.1\Homo sapiens\Homo sapiens ASB-10 mRNA, complete cds./codon_start=1/product="ASB-10"/protein_id="AAL59159.1"/db_xref="GI:18092200" 3 14 7 5 5 7 1 17 39 5 0 4 2 9 1 4 8 3 2 12 2 2 8 14 2 6 10 25 6 13 7 14 9 2 0 9 13 2 2 5 4 1 1 13 10 9 6 20 12 11 4 1 6 10 5 3 0 5 0 3 6 0 1 0 >BC017318\BC017318\252..677\426\AAH17318.1\Homo sapiens\Homo sapiens hypothetical protein MGC29643, mRNA (cDNA cloneMGC:29643 IMAGE:3641660), complete cds./gene="MGC29643"/codon_start=1/product="hypothetical protein MGC29643"/protein_id="AAH17318.1"/db_xref="GI:16878240"/db_xref="GeneID:116372" 0 2 0 0 0 3 1 5 4 2 1 2 3 5 2 2 1 2 0 3 1 1 4 2 0 1 3 7 3 0 2 2 5 0 0 1 3 2 4 2 6 1 3 5 1 0 3 2 2 0 3 0 9 5 6 2 0 6 1 4 1 0 0 1 >HSHHA2GEN\X90761\join(4581..5048,5730..5812,5977..6133,7463..7624, 7703..7828,8856..9076,11662..11791)\1347\CAA62284.1\Homo sapiens\Homo sapiens hHa2 gene./gene="hHa2"/codon_start=1/product="HHa2 hair keratin type I intermediate filament"/protein_id="CAA62284.1"/db_xref="GI:3757666"/db_xref="GOA:Q14532"/db_xref="InterPro:IPR001664"/db_xref="InterPro:IPR002957"/db_xref="UniProt/Swiss-Prot:Q14532" 1 9 11 0 1 9 1 6 35 6 0 1 2 16 2 4 11 4 2 14 6 4 4 9 1 6 5 22 1 5 0 11 2 1 0 8 15 4 3 9 15 8 3 29 2 2 5 39 15 4 10 2 19 7 6 0 0 10 4 15 2 0 0 1 >HSA539427#2\AJ539427\311..1045\735\CAD62383.1\Homo sapiens\Homo sapiens mRNA for IKIP3 (IKIP gene)./gene="IKIP"/codon_start=1/product="IKIP3"/protein_id="CAD62383.1"/db_xref="GI:41529139"/db_xref="UniProt/TrEMBL:Q6ZWH4" 3 1 1 2 5 1 10 2 3 3 8 3 5 3 0 6 2 6 12 2 4 5 2 0 0 0 5 1 0 3 0 0 1 4 8 1 3 4 15 12 3 11 8 2 1 2 14 4 7 13 0 1 0 0 2 4 10 0 12 3 1 0 1 0 >AY049737\AY049737\16..552\537\AAL12172.1\Homo sapiens\Homo sapiens nucleophosmin/nucleoplasmin 3 (NPM3) mRNA, 
completecds./gene="NPM3"/codon_start=1/product="nucleophosmin/nucleoplasmin 3"/protein_id="AAL12172.1"/db_xref="GI:16118247" 1 2 5 0 0 1 2 6 5 1 1 1 0 3 1 2 3 4 0 5 2 3 1 3 2 4 3 9 2 1 1 6 4 2 3 4 5 4 2 4 2 2 2 5 3 2 6 16 5 5 0 0 3 2 5 3 0 3 1 5 0 0 1 0 >AF428136\AF428136\1..1173\1173\AAQ04219.1\Homo sapiens\Homo sapiens cell line PE/CA-PJ15 SCCA1/SCCA2 fusion protein mRNA,complete cds./codon_start=1/product="SCCA1/SCCA2 fusion protein"/protein_id="AAQ04219.1"/db_xref="GI:33313307" 1 0 1 0 4 3 5 9 6 4 7 5 9 5 1 4 6 5 9 8 3 7 5 0 0 5 8 8 0 6 6 4 3 3 5 7 7 5 20 17 12 15 7 10 6 3 20 14 5 11 4 8 0 3 16 10 3 9 6 12 5 0 1 0 >AL161645#3\AL161645\complement(join(138931..138969,139992..140079, 140238..140348,140421..140544,140622..140688, 141400..141463,141708..141797,142505..142559, 142808..142855,143518..143592,143942..144001, 144732..144759))\849\CAH70048.1\Homo sapiens\Human DNA sequence from clone RP11-108K14 on chromosome 10 Containsthe 3' end of a novel gene (LOC92170) (FLJ90495, FLJ39553), threenovel genes, the CYP2E1 gene for family 2 subfamily E polypeptide 1cytochrome P450, a novel gene (LOC93426), the 3' end of a olfactoryreceptor protein pseudogene and eight CpG islands, completesequence./gene="RP11-108K14.6"/locus_tag="RP11-108K14.6-001"/standard_name="OTTHUMP00000020812"/codon_start=1/product="novel protein"/protein_id="CAH70048.1"/db_xref="GI:55661922"/db_xref="UniProt/TrEMBL:Q9BWU4" 1 2 5 0 4 5 3 6 23 1 0 4 0 2 1 1 6 2 4 4 0 0 4 3 0 1 6 9 1 7 6 3 1 1 1 4 6 1 8 20 3 2 8 25 7 4 14 26 6 3 0 0 4 3 2 2 0 3 5 6 3 0 0 1 >HSU79262\U79262\96..1205\1110\AAB50208.1\Homo sapiens\Human deoxyhypusine synthase mRNA, complete cds./codon_start=1/product="deoxyhypusine synthase"/protein_id="AAB50208.1"/db_xref="GI:1710220" 1 6 6 1 0 3 1 6 21 4 0 3 3 6 2 2 5 2 7 10 2 1 5 7 2 2 4 16 5 6 2 15 7 4 3 8 13 2 0 17 16 6 2 13 6 2 7 19 16 4 9 3 4 0 9 6 0 16 5 12 4 0 0 1 >AF153341\AF153341\233..1441\1209\AAF75586.1\Homo sapiens\Homo sapiens winged helix/forkhead transcription factor 
(HFH1)gene, complete cds./gene="HFH1"/codon_start=1/product="winged helix/forkhead transcription factor"/protein_id="AAF75586.1"/db_xref="GI:8489093"/IGNORED_CODON=1 1 22 6 2 1 2 1 14 17 3 1 2 1 6 8 0 10 2 0 2 8 0 4 27 20 3 6 25 39 3 2 32 10 1 0 5 7 0 0 12 7 0 0 6 5 0 1 18 14 2 9 1 6 0 10 1 0 7 0 5 4 0 0 1 >BC011775\BC011775\287..1369\1083\AAH11775.1\Homo sapiens\Homo sapiens non imprinted in Prader-Willi/Angelman syndrome 2,transcript variant 3, mRNA (cDNA clone MGC:19609 IMAGE:3640970),complete cds./gene="NIPA2"/codon_start=1/product="non imprinted in Prader-Willi/Angelman syndrome 2, isoform a"/protein_id="AAH11775.1"/db_xref="GI:15079979"/db_xref="GeneID:81614"/db_xref="MIM:608146" 3 1 1 1 3 2 8 5 13 10 4 8 5 4 0 7 7 6 8 2 0 8 5 1 0 4 7 7 3 10 11 9 5 9 3 8 15 4 8 8 3 14 3 3 2 6 9 7 4 4 3 7 1 5 9 13 5 7 18 9 5 1 0 0 >AY369207\AY369207\253..882\630\AAQ73311.1\Homo sapiens\Homo sapiens RNA-binding protein with multiple splicing 2 (RBPMS2)mRNA, complete cds./gene="RBPMS2"/codon_start=1/product="RNA-binding protein with multiple splicing 2"/protein_id="AAQ73311.1"/db_xref="GI:34485858" 0 2 3 2 2 1 3 5 9 0 0 2 0 6 0 1 6 0 2 8 0 5 5 6 2 5 7 14 4 5 3 10 3 2 0 2 4 1 2 8 4 3 1 5 4 1 4 7 6 1 6 2 0 1 5 5 0 5 2 4 3 0 1 0 >BT019487\BT019487\1..648\648\AAV38294.1\Homo sapiens\Homo sapiens sodium channel, voltage-gated, type II, beta mRNA,complete cds./codon_start=1/product="sodium channel, voltage-gated, type II, beta"/protein_id="AAV38294.1"/db_xref="GI:54695844" 0 5 3 1 3 1 1 7 10 0 0 2 1 4 1 3 4 1 3 3 3 1 2 4 2 5 0 7 0 1 1 7 4 2 1 6 13 0 3 9 12 1 1 6 3 2 2 14 6 6 5 0 5 1 8 2 0 4 3 8 2 0 1 0 >AY129018\AY129018\58..411\354\AAM98761.1\Homo sapiens\Homo sapiens clone FP18376 unknown mRNA./codon_start=1/product="unknown"/protein_id="AAM98761.1"/db_xref="GI:37547445" 1 1 1 1 2 1 0 5 3 1 3 3 2 3 0 2 6 1 1 5 0 5 3 3 1 3 2 3 0 1 2 2 1 3 2 1 2 0 3 1 1 2 2 5 2 4 0 1 0 0 1 0 2 1 3 1 0 1 2 2 7 1 0 0 >AY040554\AY040554\1..1008\1008\AAK77968.1\Homo sapiens\Homo sapiens SLAM mRNA, 
complete cds./codon_start=1/product="SLAM"/protein_id="AAK77968.1"/db_xref="GI:15072538" 0 3 2 1 3 4 2 9 14 7 2 5 3 9 0 3 12 3 12 12 2 1 11 5 1 4 4 3 1 7 5 4 8 4 1 9 10 4 12 9 11 6 1 13 3 3 8 12 5 4 6 8 7 0 3 4 6 8 4 8 4 0 0 1 >HSGALT2\Y15060\1..1269\1269\CAA75344.1\Homo sapiens\Homo sapiens mRNA for GalT2 protein./gene="galT2"/codon_start=1/product="GalT2 protein"/protein_id="CAA75344.1"/db_xref="GI:3256001"/db_xref="GOA:O43825"/db_xref="InterPro:IPR002659"/db_xref="UniProt/Swiss-Prot:O43825" 4 3 3 4 9 4 9 1 10 8 7 7 2 1 1 9 5 9 11 4 1 9 9 4 0 12 11 5 0 5 10 7 1 5 4 4 3 5 16 9 11 17 12 4 7 11 14 7 5 7 12 12 3 5 9 12 7 4 14 9 9 1 0 0 >HS5O6#1\AL020993\join(complement(24348..24539),complement(22513..22678), complement(7822..7912),complement(7247..7322), complement(3358..3449),complement(2003..2172), complement(1364..1558), complement(AL021977.10:40844..41085))\192\CAI21955.1\Homo sapiens\Human DNA sequence from clone RP1-5O6 on chromosome 22q12, completesequence./locus_tag="CTA-447C4.2-001"/standard_name="OTTHUMP00000028772"/codon_start=1/protein_id="CAI21955.1"/db_xref="GI:56202405"/db_xref="GOA:Q9Y519"/db_xref="InterPro:IPR005178"/db_xref="UniProt/Swiss-Prot:Q9Y519" 1 2 4 2 1 5 5 11 16 8 1 5 9 8 6 9 10 3 10 10 6 8 8 12 5 14 7 15 4 8 9 16 7 6 2 9 10 3 2 3 1 3 8 7 7 8 6 10 2 3 7 4 11 4 12 1 0 10 2 7 11 0 1 3 >HSA302551\AJ302551\1..936\936\CAC20476.1\Homo sapiens\Homo sapiens 6M1-3*01 gene for olfactory receptor, cell line OLGA./gene="6M1-3*01"/codon_start=1/product="olfactory receptor"/protein_id="CAC20476.1"/db_xref="GI:12054343"/db_xref="GOA:O76001"/db_xref="UniProt/Swiss-Prot:O76001" 2 1 0 2 4 1 3 14 16 7 2 4 6 5 1 8 3 2 6 10 0 4 4 0 2 8 5 3 0 8 8 2 4 3 7 8 7 9 3 3 5 6 4 4 5 6 4 3 2 6 8 6 4 6 10 11 4 11 5 11 5 0 0 1 >AK128511\AK128511\252..1412\1161\BAC87473.1\Homo sapiens\Homo sapiens cDNA FLJ46666 fis, clone TRACH3007625./codon_start=1/protein_id="BAC87473.1"/db_xref="GI:34535916" 3 2 6 1 4 2 3 10 12 4 5 7 3 4 1 10 5 4 7 3 0 1 6 2 0 12 4 11 0 7 10 10 7 6 4 7 6 
5 6 5 4 4 6 5 9 11 12 11 5 10 5 7 6 6 14 18 6 11 10 12 9 0 0 1 >AF101264\AF101264\1..1254\1254\AAD04566.1\Homo sapiens\Homo sapiens Ca2+/calmodulin-dependent kinase kinase mRNA, completecds./codon_start=1/product="Ca2+/calmodulin-dependent kinase kinase"/protein_id="AAD04566.1"/db_xref="GI:4151803" 1 5 8 5 0 4 1 9 21 0 2 6 4 10 4 5 4 3 2 12 4 1 8 16 6 5 3 8 3 3 5 10 5 4 0 12 18 1 8 21 9 7 1 14 9 2 14 20 15 6 9 3 6 3 8 6 2 19 4 11 2 0 1 0 >BC008734\BC008734\167..1264\1098\AAH08734.1\Homo sapiens\Homo sapiens Fc fragment of IgG, receptor, transporter, alpha, mRNA(cDNA clone MGC:1506 IMAGE:3163446), complete cds./gene="FCGRT"/codon_start=1/product="Fc fragment of IgG, receptor, transporter, alpha"/protein_id="AAH08734.1"/db_xref="GI:14250561"/db_xref="GeneID:2217"/db_xref="MIM:601437" 1 3 6 1 1 5 3 12 28 5 0 4 1 12 4 1 7 4 2 11 1 2 4 11 5 8 2 15 7 8 8 15 11 4 2 6 11 0 4 10 6 4 1 14 8 0 5 17 8 5 6 1 4 2 12 3 0 5 2 4 13 0 0 1 >AJ938058\AJ938058\6..1238\1233\CAI79393.1\Homo sapiens\Homo sapiens mRNA for killer cell immunoglobulin-like receptor(KIR3DL3 gene), variant 44b TM3./gene="KIR3DL3"/codon_start=1/product="killer cell immunoglobulin-like receptor"/protein_id="CAI79393.1"/db_xref="GI:62700724" 0 5 2 2 9 6 2 12 13 6 1 6 6 9 3 9 6 4 9 7 2 7 6 16 2 13 8 8 4 4 7 5 12 9 2 13 16 6 6 3 10 5 4 11 11 5 5 13 11 4 7 3 5 5 12 7 1 8 2 9 6 1 0 0 >AL954664#2\AL954664\join(18932..19085,20954..21072,21140..21246,25333..25483)\531\CAI41962.1\Homo sapiens\Human DNA sequence from clone LL0YNC03-56G10 on chromosome XContains the 3' end of the gene for protein phosphatase 2A 48 kDaregulatory subunit (PR48), a novel gene and five CpG islands,complete sequence./gene="PPP2R3B"/locus_tag="LL0YNC03-56G10.1-007"/standard_name="OTTHUMP00000022820"/codon_start=1/product="protein phosphatase 2 (formerly 2A), regulatory subunit B'', beta"/protein_id="CAI41962.1"/db_xref="GI:57209895"/db_xref="GOA:Q96H01"/db_xref="InterPro:IPR002048"/db_xref="UniProt/TrEMBL:Q96H01" 1 2 0 0 0 4 0 6 18 0 0 0 1 3 1 0 
3 1 0 1 1 2 0 5 4 1 1 8 4 1 1 4 4 1 0 2 3 0 1 7 2 0 0 8 1 0 1 23 17 0 7 0 5 0 10 0 0 5 0 4 2 0 0 1 >BC021886\BC021886\21..443\423\AAH21886.1\Homo sapiens\Homo sapiens ribosomal protein L27, mRNA (cDNA clone MGC:5383IMAGE:3445930), complete cds./gene="RPL27"/codon_start=1/product="RPL27 protein"/protein_id="AAH21886.1"/db_xref="GI:18255173"/db_xref="GeneID:6155"/db_xref="MIM:607526" 1 6 2 0 3 1 2 0 4 2 0 1 2 1 0 2 1 0 4 1 0 1 1 4 1 1 0 4 0 6 3 3 0 0 0 6 6 0 9 14 3 2 0 2 1 2 3 2 2 6 6 1 0 0 3 4 1 4 2 3 1 0 1 0 >AY422212\AY422212\105..3401\3297\AAR31184.1\Homo sapiens\Homo sapiens putative non-ribosomal peptide synthetase NRPS1098mRNA, complete cds./codon_start=1/product="putative non-ribosomal peptide synthetase NRPS1098"/protein_id="AAR31184.1"/db_xref="GI:39841348" 4 0 3 2 15 9 7 19 20 29 27 25 30 15 1 34 8 18 29 8 0 29 23 2 8 13 13 10 2 19 28 14 14 12 22 11 24 32 50 24 16 31 19 22 9 24 44 21 13 32 12 23 8 27 11 39 12 15 37 15 15 1 0 0 >BC074966\BC074966\21..590\570\AAH74966.1\Homo sapiens\Homo sapiens interferon, alpha 4, mRNA (cDNA clone MGC:103901IMAGE:30915288), complete cds./gene="IFNA4"/codon_start=1/product="interferon, alpha 4"/protein_id="AAH74966.1"/db_xref="GI:49901646"/db_xref="GeneID:3441"/db_xref="MIM:147564" 0 0 0 0 6 5 2 6 12 2 2 2 2 6 1 5 5 0 3 2 0 4 0 2 0 2 3 5 0 3 2 2 1 1 0 2 5 2 5 4 1 4 5 8 2 3 6 10 4 4 4 1 1 4 7 4 2 7 0 6 2 0 0 1 >AF054821\AF054821\32..1594\1563\AAC08589.1\Homo sapiens\Homo sapiens cytochrome P-450 mRNA, complete cds./codon_start=1/product="cytochrome P-450"/protein_id="AAC08589.1"/db_xref="GI:2997737" 0 14 8 5 2 6 0 16 43 4 0 5 4 5 1 3 10 7 3 13 3 2 7 17 6 8 6 21 3 4 2 9 9 4 2 16 9 3 7 21 8 2 5 15 10 6 6 21 21 8 5 8 10 3 19 10 2 17 10 13 13 0 0 1 >AB084279\AB084279\85..4392\4308\BAD00086.1\Homo sapiens\Homo sapiens ASXH2 mRNA for polycomb group protein, complete cds./gene="ASXH2"/codon_start=1/product="polycomb group protein"/protein_id="BAD00086.1"/db_xref="GI:37998953" 5 2 3 5 30 21 14 19 24 24 17 13 39 28 6 34 48 31 36 32 5 
33 48 35 8 37 39 25 5 33 29 28 13 27 13 24 20 24 52 50 20 28 29 67 11 15 50 50 16 29 7 7 14 9 14 14 9 21 15 25 6 1 0 0 >HSU60266\U60266\15..3050\3036\AAC34130.1\Homo sapiens\Homo sapiens lysosomal alpha-mannosidase (manB) mRNA, complete cds./gene="manB"/EC_number="3.2.1.24"/function="necessary for the catabolism of N-linked carbohydrates released during glycoprotein turnover"/codon_start=1/product="lysosomal alpha-mannosidase"/protein_id="AAC34130.1"/db_xref="GI:3522867" 5 29 18 5 5 8 4 21 67 8 1 8 10 13 9 2 21 5 18 24 8 7 6 30 18 14 11 44 18 11 14 28 17 11 8 12 46 9 7 17 31 17 7 50 23 4 7 38 33 18 25 9 9 3 32 12 3 20 5 21 27 0 1 0 >BC034761\BC034761\220..1434\1215\AAH34761.1\Homo sapiens\Homo sapiens adrenomedullin receptor, mRNA (cDNA clone MGC:34399IMAGE:5185932), complete cds./gene="ADMR"/codon_start=1/product="adrenomedullin receptor"/protein_id="AAH34761.1"/db_xref="GI:21961320"/db_xref="GeneID:11318"/db_xref="MIM:605307" 1 5 8 1 0 0 0 18 33 6 0 2 2 12 2 4 14 2 5 15 2 4 4 13 1 10 7 13 3 2 2 10 5 1 1 17 16 1 1 5 9 3 1 10 15 5 1 11 7 2 11 4 12 3 14 7 1 17 2 11 10 0 0 1 >BC095509\BC095509\431..1399\969\AAH95509.1\Homo sapiens\Homo sapiens MAS-related GPR, member X4, mRNA (cDNA cloneMGC:111722 IMAGE:30706615), complete cds./gene="MRGPRX4"/codon_start=1/product="G protein-coupled receptor MRGX4"/protein_id="AAH95509.1"/db_xref="GI:63100789"/db_xref="GeneID:117196"/db_xref="MIM:607230" 0 8 1 3 1 8 2 15 35 2 3 2 2 8 2 4 8 6 6 6 3 1 6 5 1 3 2 4 3 3 5 6 3 3 1 11 9 7 3 3 7 4 1 7 2 2 4 7 3 4 8 2 6 7 12 6 1 14 6 8 7 0 0 1 >AF282882\AF282882\1..273\273\AAM52248.1\Homo sapiens\Homo sapiens plasminogen/activator kringle mRNA, complete cds./codon_start=1/product="plasminogen/activator kringle"/protein_id="AAM52248.1"/db_xref="GI:21425749" 0 0 1 1 1 0 0 2 3 1 0 0 1 3 1 0 1 2 2 1 2 0 1 2 1 2 2 5 1 0 1 4 3 1 0 1 0 1 2 2 2 5 0 3 1 1 0 2 2 3 7 0 4 2 0 1 1 1 0 2 2 1 0 0 >HSLIPOC\X67647\44..574\531\CAA47889.1\Homo sapiens\H.sapiens mRNA for tear 
lipocalin./codon_start=1/product="lipocalin precursor"/protein_id="CAA47889.1"/db_xref="GI:313856"/db_xref="GOA:P31025"/db_xref="UniProt/Swiss-Prot:P31025" 1 1 1 0 1 3 0 7 12 1 0 1 2 0 2 1 5 1 1 4 4 2 1 4 2 1 2 10 0 2 3 6 6 0 0 4 8 0 3 9 3 1 0 4 6 0 5 12 7 2 4 1 2 1 1 2 1 5 2 5 1 0 1 0 >AF480304\AF480304\1..642\642\AAL89654.1\Homo sapiens\Homo sapiens START domain-containing 5 protein (STARD5) mRNA,complete cds./gene="STARD5"/codon_start=1/product="START domain-containing 5 protein"/protein_id="AAL89654.1"/db_xref="GI:19525702" 2 1 4 0 4 1 3 4 4 2 1 1 1 5 0 2 5 1 3 9 0 2 6 5 3 2 3 5 1 3 4 4 4 4 2 2 13 3 1 9 6 2 2 4 0 5 5 9 7 3 3 3 2 5 6 6 0 3 4 5 4 1 0 0 >BC022475\BC022475\207..884\678\AAH22475.1\Homo sapiens\Homo sapiens insulin induced gene 2, mRNA (cDNA clone MGC:26273IMAGE:4794938), complete cds./gene="INSIG2"/codon_start=1/product="insulin induced protein 2"/protein_id="AAH22475.1"/db_xref="GI:18490895"/db_xref="GeneID:51141"/db_xref="MIM:608660" 2 0 1 1 5 0 4 3 4 3 6 3 3 2 0 8 2 4 4 0 2 4 6 3 0 2 9 2 0 5 9 4 2 4 7 3 8 4 6 1 4 4 3 5 0 4 5 3 1 5 3 4 2 5 3 11 4 2 10 5 6 0 0 1 >AK122653\AK122653\108..3179\3072\BAC85501.1\Homo sapiens\Homo sapiens cDNA FLJ16086 fis, clone NT2RP7005118, moderatelysimilar to RAS GTPASE-ACTIVATING-LIKE PROTEIN IQGAP1./codon_start=1/protein_id="BAC85501.1"/db_xref="GI:34527841" 12 13 16 6 8 14 10 33 65 11 4 10 6 11 2 7 17 4 12 18 4 10 10 13 4 11 23 50 1 24 5 12 16 9 4 21 37 8 14 56 25 10 12 66 25 12 16 45 34 16 20 15 5 2 25 13 3 32 10 17 9 0 0 1 >AF328864\AF328864\31..594\564\AAK15708.1\Homo sapiens\Homo sapiens hypothetical protein SBBI8 mRNA, complete cds./codon_start=1/product="hypothetical protein SBBI8"/protein_id="AAK15708.1"/db_xref="GI:13195462" 3 3 4 0 5 3 2 1 10 3 1 3 2 5 1 4 2 2 0 3 1 1 0 1 3 5 2 4 3 7 8 5 2 2 0 3 3 4 10 6 1 2 6 5 1 1 14 5 4 1 3 2 2 0 2 1 0 2 1 4 3 0 0 1 >AF152525\AF152525\1..2616\2616\AAD43785.1\Homo sapiens\Homo sapiens protocadherin gamma C4 short form protein(PCDH-gamma-C4) variable region sequence, 
complete cds./gene="PCDH-gamma-C4"/function="cell-cell adhesion"/codon_start=1/product="protocadherin gamma C4 short form protein"/protein_id="AAD43785.1"/db_xref="GI:5457098" 11 9 10 8 8 9 10 27 31 16 3 15 17 14 2 21 20 16 8 14 2 16 27 18 6 20 18 18 2 28 12 14 19 12 13 15 38 11 1 15 15 19 9 20 9 8 13 32 27 32 11 10 6 10 14 21 1 17 9 9 5 1 0 0 >HSA010228\AJ010228\297..1163\867\CAA09043.1\Homo sapiens\Homo sapiens mRNA for RET finger protein-like 1, long variant./gene="rfpl1l"/standard_name="RET finger protein-like 1, long variant"/codon_start=1/evidence=experimental/product="RFPL1L"/protein_id="CAA09043.1"/db_xref="GI:3417312"/db_xref="GOA:O75677"/db_xref="UniProt/Swiss-Prot:O75677" 2 5 2 1 2 7 3 6 12 2 1 4 2 7 0 5 6 7 6 3 1 4 6 4 2 5 2 5 0 7 5 4 5 3 1 9 8 2 4 10 6 3 3 7 4 4 5 12 9 8 1 2 9 6 8 7 1 7 4 8 4 1 0 0 >AL160175#10\AL160175\join(114367..114418,114912..115332,116225..116523, 118263..118368,119045..119138,121391..121500, 121583..121724,125605..125675,125799..125927, 126489..126641,126790..126922,128503..128583)\1791\CAC10006.1\Homo sapiens\Human DNA sequence from clone RP11-243J16 on chromosome 20 Containsthe BCL2L1 gene for the apoptosis regulator BCL2-like 1, theC20orf1 gene encoding the targeting protein for XKLP2 (FLS353), theMYLK gene for myosin light chain kinase 2 (skeletal muscle), theFKHL18 gene for forkhead-like 18 (Drosophila)(FREAC10), an ATPsynthase H+ transporting mitochondrial F0 complex subunit f isoform2 (ATP5J2) pseudogene, the DUSP15 gene for dual specificityphosphatase-like 15, the 3' end of a novel gene, the 5' end of theC20orf125 gene and seven CpG islands, complete sequence./locus_tag="RP11-243J16.2-001"/standard_name="OTTHUMP00000030560"/codon_start=1/protein_id="CAC10006.1"/db_xref="GI:10279705"/db_xref="Genew:16243"/db_xref="GOA:Q9H1R3"/db_xref="HSSP:2BBM"/db_xref="UniProt/Swiss-Prot:Q9H1R3" 1 6 2 0 1 14 2 13 28 2 0 5 9 7 2 2 14 5 8 17 1 4 10 22 5 10 16 24 5 11 10 16 17 5 1 12 14 4 17 35 20 4 5 18 5 5 9 39 18 12 5 3 3 7 11 9 0 17 7 
18 4 0 0 1 >AF151830\AF151830\70..1401\1332\AAD34067.1\Homo sapiens\Homo sapiens CGI-72 protein mRNA, complete cds./codon_start=1/product="CGI-72 protein"/protein_id="AAD34067.1"/db_xref="GI:4929613" 2 1 1 3 7 3 2 1 5 9 8 6 15 7 3 20 8 11 10 2 1 7 17 8 1 16 15 7 1 15 9 1 6 4 8 3 5 6 28 19 8 10 5 13 1 5 16 14 6 17 2 1 1 7 3 5 8 4 4 8 4 0 0 1 >D87258\D87258\49..1491\1443\BAA13322.1\Homo sapiens\Homo sapiens mRNA for serin protease with IGF-binding motif,complete cds./codon_start=1/product="serin protease with IGF-binding motif"/protein_id="BAA13322.1"/db_xref="GI:1513059" 4 14 10 0 0 3 0 7 23 4 1 4 4 9 5 3 10 2 4 10 3 2 6 5 15 4 4 27 11 4 11 17 8 6 2 11 25 2 14 13 14 4 2 14 6 3 12 18 17 8 3 4 15 1 4 6 2 23 11 6 0 0 1 0 >HSJ976O13#4\AL117354\join(complement(AC093577.2:49381..49385), complement(AC093577.2:18484..18682), complement(AC093577.2:18286..18367), complement(AC093577.2:14129..14224), complement(AC093577.2:13824..13924), complement(AC093577.2:13294..13442), complement(AC093577.2:10472..10567), complement(AC093577.2:9511..9579), complement(AC093577.2:8240..8363), complement(AC093577.2:1613..1680), complement(37326..37496),complement(32966..33071), complement(32794..32846),complement(32579..32683), complement(29747..30104))\-9354\CAI23502.1\Homo sapiens\Human DNA sequence from clone RP5-976O13 on chromosome 1p21.2-22.2Contains the 3' end of the M96 gene for likely ortholog of mousemetal response element binding transcription factor 2 and the 3'end of the CGI-100 gene for CGI-100 protein, complete sequence./gene="RP5-976O13.1"/locus_tag="RP5-976O13.1-002"/standard_name="OTTHUMP00000011846"/codon_start=1/product="likely ortholog of mouse metal response element binding transcription factor 2 (M96)"/protein_id="CAI23502.1"/db_xref="GI:56203558"/db_xref="GOA:Q5TE74"/db_xref="InterPro:IPR001965"/db_xref="InterPro:IPR002999"/db_xref="UniProt/TrEMBL:Q5TE74" 2 0 2 2 21 16 4 6 9 6 16 7 12 7 3 10 7 5 9 10 2 10 13 2 2 7 10 8 2 2 13 5 3 6 7 4 7 6 24 28 8 16 13 11 4 13 19 6 5 
9 12 10 17 9 10 23 16 9 16 11 10 14 3 15 >AK125205\AK125205\181..588\408\BAC86085.1\Homo sapiens\Homo sapiens cDNA FLJ43215 fis, clone FEBRA2021908./codon_start=1/protein_id="BAC86085.1"/db_xref="GI:34531222" 4 3 5 1 1 3 0 2 2 1 1 0 1 3 0 1 3 0 0 6 3 0 3 8 7 4 0 5 3 2 2 5 8 0 1 1 2 1 2 2 1 1 1 3 10 1 0 5 1 0 1 0 3 1 2 3 0 0 1 4 1 0 1 0 >AK123553\AK123553\194..2326\2133\BAC85644.1\Homo sapiens\Homo sapiens cDNA FLJ41559 fis, clone CTONG1000087./codon_start=1/protein_id="BAC85644.1"/db_xref="GI:34529129" 3 3 1 1 14 5 10 6 12 23 15 19 23 5 0 14 12 16 22 4 2 18 8 1 1 10 19 8 3 8 15 9 4 7 21 6 18 18 22 8 16 26 18 18 8 13 36 10 10 17 8 10 6 16 8 19 11 11 17 10 8 0 1 0 >BC000591\BC000591\132..1814\1683\AAH00591.1\Homo sapiens\Homo sapiens apoptosis antagonizing transcription factor, mRNA(cDNA clone MGC:815 IMAGE:3346769), complete cds./gene="AATF"/codon_start=1/product="apoptosis antagonizing transcription factor"/protein_id="AAH00591.1"/db_xref="GI:12653627"/db_xref="GeneID:26574"/db_xref="MIM:608463" 6 5 2 0 8 15 4 5 19 14 2 14 4 4 1 11 11 14 7 9 1 5 9 6 4 9 11 11 6 9 10 9 10 10 5 7 7 3 24 19 6 5 6 20 8 5 32 32 25 29 4 5 1 1 8 12 3 8 5 10 5 0 0 1 >AK125596\AK125596\1077..1784\708\BAC86214.1\Homo sapiens\Homo sapiens cDNA FLJ43608 fis, clone SPLEN2012624./codon_start=1/protein_id="BAC86214.1"/db_xref="GI:34531742" 2 3 1 1 4 2 6 9 13 7 3 5 1 2 0 5 4 3 5 2 0 3 5 4 2 2 6 7 0 11 6 5 2 1 1 6 7 3 5 4 6 7 1 4 4 0 4 12 3 8 2 6 2 4 0 4 2 3 0 3 2 0 1 0 >BC060796\BC060796\146..2200\2055\AAH60796.1\Homo sapiens\Homo sapiens testis-specific protein TSP-NY, mRNA (cDNA cloneMGC:71605 IMAGE:5284906), complete cds./gene="TSP-NY"/codon_start=1/product="TSP-NY protein"/protein_id="AAH60796.1"/db_xref="GI:38173788"/db_xref="GeneID:84660" 3 4 2 4 7 5 8 12 19 14 9 11 21 12 2 15 8 12 11 6 9 9 7 6 3 4 9 11 3 9 5 7 5 3 7 6 9 6 38 34 12 21 21 28 10 8 48 27 15 22 2 4 11 10 7 7 5 11 19 17 4 0 0 1 >S77435\S77435\1..57\57\AAD14255.1\Homo sapiens\V beta 3=T-cell receptor beta 3 chain variable region 
{VDJjunction} [human, colorectal carcinoma patient, colonicintraepithelial T lymphocytes, mRNA Partial, 57 nt]./gene="V3"/codon_start=1/protein_id="AAD14255.1"/db_xref="GI:4261955" 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 1 2 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 2 0 0 1 2 0 0 1 0 1 0 0 0 0 >HUMIGHWE1\M36852\join(311..356,439..494)\102\AAA58803.1\Homo sapiens\Human Ig germline gamma-3 heavy-chain gene V region, partial cds./gene="IGHG3"/codon_start=1/map="14q32.33"/protein_id="AAA58803.1"/db_xref="GI:185632"/db_xref="GDB:G00-119-339" 0 0 0 0 1 0 0 1 5 1 0 0 0 1 1 0 0 0 0 0 0 0 2 1 0 1 1 0 0 1 1 1 1 0 0 1 2 0 1 1 0 0 0 2 1 1 0 1 0 0 0 0 0 0 2 0 0 0 0 1 2 0 0 0 >BC001370\BC001370\157..282\126\AAH01370.1\Homo sapiens\Homo sapiens LIM and senescent cell antigen-like domains 2, mRNA(cDNA clone IMAGE:3051392), complete cds./gene="LIMS2"/codon_start=1/product="LIMS2 protein"/protein_id="AAH01370.1"/db_xref="GI:12655041"/db_xref="GeneID:55679"/db_xref="MIM:607908" 0 1 1 0 0 1 0 1 5 0 0 0 0 1 1 1 0 0 1 1 0 0 0 2 1 0 0 3 0 0 0 0 0 0 0 0 1 0 0 9 1 0 0 1 0 0 0 3 1 0 1 0 1 1 1 0 0 0 0 1 0 0 0 1 >CR457390\CR457390\1..1041\1041\CAG33671.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834B0414D forgene TMEFF2, transmembrane protein with EGF-like and twofollistatin-like domains 2; complete cds, incl. 
stopcodon./gene="TMEFF2"/codon_start=1/protein_id="CAG33671.1"/db_xref="GI:48146897"/db_xref="InterPro:IPR000742"/db_xref="InterPro:IPR002350"/db_xref="InterPro:IPR006209"/db_xref="InterPro:IPR006210"/db_xref="UniProt/TrEMBL:Q9NSS5" 3 1 1 0 4 2 2 6 6 3 4 1 3 4 1 10 3 5 8 6 1 7 1 4 3 4 8 5 0 8 9 5 5 5 3 8 10 3 9 6 5 11 6 12 3 5 15 11 8 14 5 7 14 19 5 4 2 6 7 5 5 1 0 0 >CR457086\CR457086\1..2052\2052\CAG33367.1\Homo sapiens\Homo sapiens full open reading frame cDNA clone RZPDo834E0814D forgene SV2B, synaptic vesicle glycoprotein 2B; complete cds, incl.stopcodon./gene="SV2B"/codon_start=1/protein_id="CAG33367.1"/db_xref="GI:48146289" 5 4 3 0 9 3 4 13 34 4 2 6 4 11 0 12 9 9 13 15 2 3 7 9 1 2 8 25 2 11 12 25 13 7 2 16 17 6 11 20 11 12 6 16 8 8 16 26 13 21 18 12 8 7 30 22 2 26 15 33 14 1 0 0 >BC071930\BC071930\62..460\399\AAH71930.1\Homo sapiens\Homo sapiens ribosomal protein S12, mRNA (cDNA clone MGC:88615IMAGE:6295016), complete cds./gene="RPS12"/codon_start=1/product="ribosomal protein S12"/protein_id="AAH71930.1"/db_xref="GI:47940604"/db_xref="GeneID:6206"/db_xref="MIM:603660" 0 2 0 2 1 0 3 1 1 4 2 1 0 1 0 1 0 1 0 0 0 2 0 1 0 1 2 7 0 5 3 4 1 2 4 2 3 5 7 9 3 1 3 1 2 1 5 7 5 4 0 3 2 5 1 0 0 2 5 3 1 0 0 1 >HSY17392\Y17392\48..416\369\CAA76759.1\Homo sapiens\Homo sapiens mRNA for prefoldin subunit 1./gene="pfd1hu"/codon_start=1/product="prefoldin subunit 1"/protein_id="CAA76759.1"/db_xref="GI:3212110"/db_xref="GOA:O60925"/db_xref="UniProt/Swiss-Prot:O60925" 2 0 1 0 2 2 3 1 4 3 2 1 0 2 0 0 0 1 3 0 1 3 0 1 0 0 5 5 0 1 1 0 0 1 2 0 2 1 6 9 3 0 2 8 1 2 8 8 3 3 1 1 0 0 1 1 2 2 5 6 0 0 1 0 >AC092610#1\AC092610 AC024380\complement(join(54535..54640,54787..54926,55328..55467, 56810..57152,58703..58911,60723..60873,70435..70607, 71385..71622,72396..72508,77551..77711,87298..87335, 89764..89916))\1965\AAS07559.1\Homo sapiens\Homo sapiens BAC clone RP11-160E17 from 7, complete sequence./gene="FLJ10324"/codon_start=1/product="unknown"/protein_id="AAS07559.1"/db_xref="GI:41474573" 1 9 
8 5 3 10 1 28 51 7 0 5 1 18 9 3 25 1 6 7 9 4 12 31 10 9 10 34 8 8 5 25 17 4 0 9 16 2 4 11 6 1 2 34 16 2 9 35 23 2 14 1 16 1 21 2 1 16 1 15 10 0 1 0 >AF502942\AF502942\141..3248\3108\AAP30832.1\Homo sapiens\Homo sapiens ubiquitin-specific protease 31 (USP31) mRNA, completecds./gene="USP31"/codon_start=1/product="ubiquitin-specific protease 31"/protein_id="AAP30832.1"/db_xref="GI:30315217" 6 8 7 7 17 7 8 14 24 17 16 22 14 7 3 19 10 15 11 8 4 17 16 9 5 13 15 13 2 20 22 14 8 12 8 9 20 19 44 40 20 30 32 28 9 15 60 42 12 43 13 22 14 23 5 30 12 11 27 21 16 1 0 0 >AL355310#4\AL355310\join(81322..81331,86356..86486,86715..86845,87213..87281, 87372..87480,87779..87965,88228..88369,88482..88571, 88827..88956,89136..89330,89417..89548,89696..89859, 90162..90288,90380..90543,90849..90969,91286..91459, 91736..91846,91996..92205)\2397\CAI19305.1\Homo sapiens\Human DNA sequence from clone RP5-1160K1 on chromosome 1 Containsthe GPR61 gene for G protein-coupled receptor 61, the GNAI3 genefor guanine nucleotide binding protein (G protein) alpha inhibitingactivity polypeptide 3, two novel genes, the GNAT2 gene for guaninenucleotide binding protein (G protein) alpha transducing activitypolypeptide 2, the AMPD2 gene for adenosine monophosphate deaminase2 (isoform L) and the 5' end of a ribosomal protein L7 (RPL7)pseudogene, complete sequence./gene="AMPD2"/locus_tag="RP5-1160K1.5-001"/standard_name="OTTHUMP00000013369"/codon_start=1/product="adenosine monophosphate deaminase 2 (isoform L)"/protein_id="CAI19305.1"/db_xref="GI:56206059" 3 22 21 4 0 7 3 19 64 2 1 3 6 13 4 6 21 5 1 19 8 3 10 17 12 12 4 29 7 8 1 13 13 8 2 11 35 0 4 31 18 9 2 33 25 8 11 63 24 16 25 6 8 4 24 10 1 23 6 24 6 0 0 1 >AF284421\AF284421\91..2277\2187\AAK84071.1\Homo sapiens\Homo sapiens complement factor MASP-3 mRNA, complete cds./codon_start=1/product="complement factor MASP-3"/protein_id="AAK84071.1"/db_xref="GI:15088517" 4 5 5 2 4 10 3 12 28 6 2 6 7 21 5 5 14 8 13 15 4 9 14 17 2 14 4 12 1 10 9 25 13 6 3 20 35 4 12 
18 19 13 7 20 13 6 10 43 27 13 22 14 15 12 21 8 5 21 6 13 13 0 0 1 >S60073\S60073\1..81\81\AAB20039.1\Homo sapiens\T-cell antigen receptor VJ junction alpha chain Valpha8.1/Jalphaunassigned [human, cytotoxic T lymphocyte cell line B1b, mRNAPartial, 81 nt]./gene="T-cell antigen receptor VJ junction alpha chain Valpha8.1/Jalpha unassigned"/codon_start=1/product="T-cell antigen receptor VJ junction alpha chain Valpha8.1/Jalpha unassigned"/protein_id="AAB20039.1"/db_xref="GI:237179" 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 0 0 2 0 0 0 4 1 0 1 0 0 0 1 3 0 0 2 1 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 0 0 0 0 0 >BC007641\BC007641\336..563\228\AAH07641.1\Homo sapiens\Homo sapiens hypothetical protein MGC15912, mRNA (cDNA cloneMGC:15912 IMAGE:3534121), complete cds./gene="MGC15912"/codon_start=1/product="hypothetical protein MGC15912"/protein_id="AAH07641.1"/db_xref="GI:14043293"/db_xref="GeneID:84972" 1 0 0 1 0 1 2 2 1 6 2 2 0 0 1 0 1 1 1 0 0 2 0 0 0 2 0 2 0 2 3 1 2 2 1 2 4 2 1 2 1 0 3 0 1 2 2 1 1 2 0 1 1 0 2 1 0 3 0 1 3 0 0 1 >BC000259\BC000259\188..1366\1179\AAH00259.1\Homo sapiens\Homo sapiens family with sequence similarity 53, member C, mRNA(cDNA clone MGC:821 IMAGE:3357346), complete cds./gene="FAM53C"/codon_start=1/product="FAM53C protein"/protein_id="AAH00259.1"/db_xref="GI:12652997"/db_xref="GeneID:51307" 4 10 13 1 3 3 2 8 20 4 0 6 7 18 1 10 17 8 3 1 0 5 12 24 3 15 3 12 1 7 3 6 7 7 0 3 8 0 3 10 6 2 6 16 7 2 7 15 9 7 3 0 6 7 11 2 2 4 1 3 8 1 0 0 >BC019864\BC019864\185..2128\1944\AAH19864.1\Homo sapiens\Homo sapiens phosphodiesterase 4A, cAMP-specific (phosphodiesteraseE2 dunce homolog, Drosophila), mRNA (cDNA clone MGC:30168IMAGE:4942336), complete cds./gene="PDE4A"/codon_start=1/product="phosphodiesterase 4A, cAMP-specific (phosphodiesterase E2 dunce homolog, Drosophila)"/protein_id="AAH19864.1"/db_xref="GI:18043809"/db_xref="GeneID:5141"/db_xref="MIM:600126" 5 8 6 2 1 5 3 17 35 3 1 11 10 15 5 6 14 3 11 14 8 3 10 14 10 10 13 24 6 13 5 10 9 4 3 7 24 3 6 17 19 3 7 
29 18 3 14 44 25 12 12 1 8 2 16 5 4 14 5 22 10 0 0 1 >AL445931#6\AL445931 AC041022\complement(join(149770..149797,153240..153475, 154966..155157,158499..158627,161289..161660, 163580..163794,164768..164915,165512..165649, 166471..166683))\1671\CAI13727.1\Homo sapiens\Human DNA sequence from clone RP11-374P20 on chromosome 9 Containsthe 5' end of the VAV2 gene for vav 2 oncogene, a novel gene(FLJ35348), the 3' end of the BRD3 gene for bromodomain containing3, a novel gene and five CpG islands, complete sequence./gene="BRD3"/locus_tag="RP11-374P20.3-002"/standard_name="OTTHUMP00000022725"/codon_start=1/product="bromodomain containing 3"/protein_id="CAI13727.1"/db_xref="GI:55960169" 1 3 8 1 2 4 5 4 13 0 3 3 2 6 8 3 14 4 6 13 9 4 14 28 11 13 8 35 9 11 2 13 2 3 2 13 24 3 14 51 10 6 4 22 10 2 5 33 17 8 10 4 2 3 6 6 2 10 4 16 4 0 0 1 >AC004510#2\AC004510\complement(join(16764..17053,18531..18558))\318\AAC08455.1\Homo sapiens\Homo sapiens chromosome 19, cosmid R30385, complete sequence./codon_start=1/product="R30385_2"/protein_id="AAC08455.1"/db_xref="GI:2996653" 0 2 0 1 2 3 0 1 1 2 0 0 2 2 0 0 2 2 1 0 1 2 4 2 0 4 1 2 0 0 5 2 1 2 1 1 2 0 3 4 1 3 1 3 6 2 8 1 0 1 1 4 0 7 1 3 1 1 1 1 1 0 1 0 >BC044920\BC044920\1597..2445\849\AAH44920.1\Homo sapiens\Homo sapiens seven in absentia homolog 1 (Drosophila), mRNA (cDNAclone MGC:44340 IMAGE:5295925), complete cds./gene="SIAH1"/codon_start=1/product="SIAH1 protein"/protein_id="AAH44920.1"/db_xref="GI:71297497"/db_xref="GeneID:6477"/db_xref="MIM:602212" 4 3 1 1 0 3 3 2 5 8 5 4 1 8 1 4 4 2 8 4 0 8 6 3 2 7 7 2 3 9 5 7 0 5 4 3 4 3 6 4 3 9 3 11 3 7 8 7 7 3 2 3 3 15 4 9 3 2 9 9 3 0 0 1 >AF307096\AF307096\778..2214\1437\AAL29190.1\Homo sapiens\Homo sapiens ZNF317-3 protein mRNA, complete cds, alternativelyspliced./codon_start=1/product="ZNF317-3 protein"/protein_id="AAL29190.1"/db_xref="GI:16797864" 4 4 9 0 6 17 0 14 5 7 1 2 7 11 1 2 12 3 1 9 10 10 3 8 2 4 4 11 4 5 6 9 17 3 2 4 9 3 21 25 10 4 2 12 37 3 12 18 9 6 13 3 17 10 13 6 2 6 3 13 4 0 
0 1 >BC007781\BC007781\4..753\750\AAH07781.1\Homo sapiens\Homo sapiens inhibitor of growth family, member 4, transcriptvariant 1, mRNA (cDNA clone MGC:12557 IMAGE:4309278), complete cds./gene="ING4"/codon_start=1/product="inhibitor of growth family, member 4, isoform 1"/protein_id="AAH07781.1"/db_xref="GI:14043612"/db_xref="GeneID:51147"/db_xref="MIM:608524" 0 3 5 3 2 1 1 4 6 4 2 4 2 6 1 2 4 6 4 4 0 3 1 6 0 4 0 9 1 6 1 5 5 1 0 2 7 0 12 16 5 0 3 9 3 2 11 11 13 6 0 9 4 4 1 7 0 1 6 9 2 0 1 0 >AY769950\AY769950\join(3285..3427,26236..26357,28741..28863,32688..32900, 38531..38599,41033..41149,56284..56505,59149..59410, 59695..59866,64668..64761,68665..68879,71794..71944, 77929..78138,94160..97265,119263..119416,120813..121025, 121312..121540,121748..121930,123669..123785, 124394..124465,125885..125970,129624..129779, 162629..162773,163990..164138,165248..165424, 188104..188259)\7056\AAV85964.1\Homo sapiens\Homo sapiens coagulation factor VIII, procoagulant component(hemophilia A) (F8) gene, complete cds./gene="F8"/codon_start=1/product="coagulation factor VIII, procoagulant component (hemophilia A)"/protein_id="AAV85964.1"/db_xref="GI:56385012" 12 10 6 8 37 31 25 30 49 39 33 46 54 29 4 52 39 41 53 36 4 62 57 25 3 43 40 26 3 41 49 30 24 26 31 31 35 28 95 63 43 82 54 56 31 44 90 58 42 79 29 50 18 8 47 62 29 28 53 61 37 0 0 1 >BC071926\BC071926\25..417\393\AAH71926.1\Homo sapiens\Homo sapiens ribosomal protein S24, transcript variant 1, mRNA(cDNA clone MGC:88611 IMAGE:5556377), complete cds./gene="RPS24"/codon_start=1/product="ribosomal protein S24, isoform a"/protein_id="AAH71926.1"/db_xref="GI:48734994"/db_xref="GeneID:6229"/db_xref="MIM:602412" 2 2 1 0 6 2 2 0 2 3 0 0 1 1 0 0 0 0 4 4 0 5 0 2 1 1 3 2 1 1 1 5 2 3 2 4 1 1 8 14 3 2 2 1 1 3 4 1 1 4 1 3 0 0 2 4 0 2 3 6 0 0 0 1 >HSG115G20#1\AL078645\join(82987..83080,113065..113159,196216..196292, 203878..203938,AL358152.9:20859..20897)\-35478\CAI17842.1\Homo sapiens\Human DNA sequence from clone GS1-115G20 on chromosome 1 
Containspart of the C1orf21 gene for chromosome 1 open reading frame 21, anovel pseudogene and a novel transcript, complete sequence./gene="C1orf21"/locus_tag="RP4-768P8.1-001"/standard_name="OTTHUMP00000033397"/codon_start=1/product="chromosome 1 open reading frame 21"/protein_id="CAI17842.1"/db_xref="GI:55957216"/db_xref="Genew:15494"/db_xref="UniProt/TrEMBL:Q9H246" 0 0 0 0 3 2 0 0 1 1 1 1 1 1 0 1 2 1 1 0 0 2 1 1 1 1 4 6 0 0 2 2 3 0 1 1 2 5 10 3 5 4 5 4 1 2 12 5 0 3 3 1 0 1 2 2 2 2 2 4 0 1 0 0 >BC006012\BC006012\76..663\588\AAH06012.2\Homo sapiens\Homo sapiens selenoprotein T, mRNA (cDNA clone MGC:14845IMAGE:4333083), complete cds./gene="SELT"/codon_start=1/transl_except=(pos:220..222,aa:Sec)/product="selenoprotein T"/protein_id="AAH06012.2"/db_xref="GI:45766671"/db_xref="GeneID:51714"/db_xref="MIM:607912" 1 1 4 0 2 2 2 4 6 5 4 1 4 2 1 4 5 0 1 0 1 1 4 1 1 4 3 3 3 2 1 6 1 3 2 2 5 6 1 7 2 6 6 5 3 2 4 6 2 3 4 3 0 3 6 4 5 3 6 12 3 0 1 1 >AY663420#3\AY663420\48577..49593\1017\AAU47288.1\Homo sapiens\Homo sapiens isolate fa0138 immunoglobulin superfamily member 4B(IGSF4B) gene, complete cds; and Duffy blood group (FY) genes,complete cds, alternatively spliced./gene="FY"/codon_start=1/product="Duffy blood group"/protein_id="AAU47288.1"/db_xref="GI:52426513" 0 3 0 0 2 1 6 14 35 3 0 12 4 8 0 8 8 4 3 7 2 5 5 8 0 6 7 20 1 9 8 8 7 8 2 9 9 1 1 5 4 3 0 10 5 3 4 3 5 6 2 4 5 6 11 4 1 4 4 4 11 0 1 0 >AK001944\AK001944\111..1691\1581\BAA91992.1\Homo sapiens\Homo sapiens cDNA FLJ11082 fis, clone PLACE1005206./codon_start=1/protein_id="BAA91992.1"/db_xref="GI:7023523" 4 2 2 3 8 3 6 8 16 25 11 14 7 5 2 11 3 14 5 2 1 7 11 2 0 18 13 5 0 18 10 2 2 6 6 4 13 10 18 10 5 13 13 14 1 10 29 10 5 16 8 16 2 3 9 20 8 7 14 9 7 0 0 1 >BC001342\BC001342\105..1481\1377\AAH01342.2\Homo sapiens\Homo sapiens Huntingtin interacting protein E, mRNA (cDNA cloneMGC:5623 IMAGE:3462741), complete cds./gene="HYPE"/codon_start=1/product="Huntingtin interacting protein 
E"/protein_id="AAH01342.2"/db_xref="GI:45945855"/db_xref="GeneID:11153" 0 8 5 1 2 10 0 16 28 3 1 5 3 8 5 2 9 0 5 17 5 4 4 8 7 5 5 21 7 6 1 12 6 1 1 9 22 2 7 19 12 0 2 14 11 6 12 23 14 3 15 2 2 1 9 5 2 23 2 16 4 1 0 0 >BC007846\BC007846\23..1357\1335\AAH07846.1\Homo sapiens\Homo sapiens fatty acid desaturase 1, mRNA (cDNA clone MGC:14118IMAGE:4129208), complete cds./gene="FADS1"/codon_start=1/product="fatty acid desaturase 1"/protein_id="AAH07846.1"/db_xref="GI:14043780"/db_xref="GeneID:3992"/db_xref="MIM:606148" 2 5 7 1 0 1 6 12 22 6 0 7 6 5 1 4 5 3 4 9 3 1 4 13 3 3 2 24 2 3 3 6 12 1 0 7 16 3 8 17 13 4 1 19 18 12 2 15 11 6 10 6 3 2 27 10 2 10 8 12 16 1 0 0 >AF494057\AF494057\join(2220..2323,4103..4333,4745..5610,7917..7991, 8069..8203,8806..8955,9447..9550,10475..10552)\1743\AAM00008.1\Homo sapiens\Homo sapiens methyl-CpG binding domain protein 4 (MBD4) gene,complete cds./gene="MBD4"/codon_start=1/product="methyl-CpG binding domain protein 4"/protein_id="AAM00008.1"/db_xref="GI:19852107" 4 3 3 3 15 7 4 8 8 17 5 8 16 4 0 14 11 16 9 11 1 13 11 5 3 8 10 4 0 11 14 5 4 8 4 6 11 9 39 22 13 12 11 6 4 8 35 15 12 19 4 9 5 9 4 20 6 8 8 8 10 1 0 0 >AB016901\AB016901\281..454\174\BAA78636.1\Homo sapiens\Homo sapiens HGC6.2 mRNA, complete cds./gene="HGC6.2"/codon_start=1/protein_id="BAA78636.1"/db_xref="GI:5006259" 1 0 0 0 0 1 0 2 1 0 1 0 1 2 0 1 0 2 1 0 1 2 2 1 1 3 0 1 0 3 2 2 0 1 1 0 2 0 1 0 0 0 2 1 1 3 1 1 3 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 >BC015308\BC015308\31..915\885\AAH15308.1\Homo sapiens\Homo sapiens survival of motor neuron 1, telomeric, transcriptvariant d, mRNA (cDNA clone MGC:20996 IMAGE:4427000), complete cds./gene="SMN1"/codon_start=1/product="survival of motor neuron 1, telomeric, isoform d"/protein_id="AAH15308.1"/db_xref="GI:15929774"/db_xref="GeneID:6606"/db_xref="MIM:600354" 0 1 1 0 5 2 4 1 5 2 4 1 8 6 1 7 5 4 4 3 0 4 23 6 3 7 3 1 1 12 8 9 2 6 1 1 3 4 13 9 3 14 7 4 1 3 9 6 4 14 3 5 3 5 2 5 4 2 8 6 6 1 0 0 >AF116538\AF116538\383..958\576\AAD41262.2\Homo 
sapiens\Homo sapiens pituitary tumor transforming gene protein 2 (PTTG2)gene, complete cds./gene="PTTG2"/codon_start=1/product="pituitary tumor transforming gene protein 2"/protein_id="AAD41262.2"/db_xref="GI:27414489" 0 1 0 2 4 0 1 2 7 3 3 2 5 1 0 6 1 3 4 4 0 3 7 5 0 8 3 5 0 6 5 4 1 0 1 3 4 4 10 10 1 3 2 3 1 1 8 7 4 6 2 1 1 1 3 7 2 3 2 4 1 0 0 1 >BC069529\BC069529\53..607\555\AAH69529.1\Homo sapiens\Homo sapiens paired-like homeobox protein OTEX, mRNA (cDNA cloneMGC:96960 IMAGE:7262169), complete cds./gene="OTEX"/codon_start=1/product="paired-like homeobox protein OTEX"/protein_id="AAH69529.1"/db_xref="GI:46854685"/db_xref="GeneID:158800"/db_xref="MIM:300447" 3 2 3 2 3 3 1 3 4 1 2 1 1 0 1 0 2 2 2 1 2 3 3 5 5 4 3 5 2 2 2 9 1 5 2 3 7 3 3 2 7 3 2 11 3 2 7 10 6 2 4 0 1 2 3 1 1 2 0 8 1 0 1 0 >HUMORFI\L40399\451..1248\798\AAC42005.1\Homo sapiens\Homo sapiens (clone S240ii117/zap112) mRNA, complete cds./codon_start=1/protein_id="AAC42005.1"/db_xref="GI:887372"/IGNORED_CODON=5 1 2 2 3 5 2 4 5 14 5 5 4 1 4 0 6 3 4 5 3 0 7 6 2 2 4 4 6 0 4 3 5 4 2 3 2 8 2 9 5 2 4 7 16 4 2 13 11 3 6 2 0 3 5 1 6 2 3 6 5 3 0 0 1 >AY676494\AY676494\621..1301\681\AAT78423.1\Homo sapiens\Homo sapiens claudin-like protein 24 (CLP24) mRNA, complete cds./gene="CLP24"/codon_start=1/product="claudin-like protein 24"/protein_id="AAT78423.1"/db_xref="GI:50541903" 1 6 2 1 3 4 0 9 21 1 0 0 3 7 0 0 3 1 1 7 3 1 2 3 2 1 5 17 7 1 1 9 7 0 0 5 14 0 1 1 7 1 1 6 1 1 0 8 8 1 4 0 8 1 6 2 0 4 4 6 7 0 0 1 >BC003103\BC003103\178..1104\927\AAH03103.1\Homo sapiens\Homo sapiens regulating synaptic membrane exocytosis 3, mRNA (cDNAclone MGC:1884 IMAGE:3503630), complete cds./gene="RIMS3"/codon_start=1/product="RIMS3 protein"/protein_id="AAH03103.1"/db_xref="GI:13111871"/db_xref="GeneID:9783" 1 5 10 0 0 2 1 7 17 1 0 1 3 12 0 3 18 2 7 15 2 2 5 7 1 2 2 14 2 5 6 12 11 3 0 3 12 1 3 15 4 2 1 15 2 0 5 9 8 5 3 2 4 1 6 2 0 9 4 10 3 1 0 0 >BC025963\BC025963\48..1286\1239\AAH25963.1\Homo sapiens\Homo sapiens acyl-Coenzyme A 
dehydrogenase, C-2 to C-3 short chain,mRNA (cDNA clone MGC:39242 IMAGE:4842286), complete cds./gene="ACADS"/codon_start=1/product="acyl-Coenzyme A dehydrogenase, C-2 to C-3 short chain, precursor"/protein_id="AAH25963.1"/db_xref="GI:19684166"/db_xref="GeneID:35"/db_xref="MIM:606885" 0 7 9 1 2 2 0 13 21 3 1 5 2 5 3 2 9 5 3 13 5 1 4 7 1 5 5 42 6 9 3 19 10 2 0 6 8 1 3 19 7 3 2 15 4 2 5 20 11 5 9 0 3 3 7 6 1 22 3 16 6 0 0 1 >BC029378\BC029378\9..1268\1260\AAH29378.1\Homo sapiens\Homo sapiens telomeric repeat binding factor (NIMA-interacting) 1,transcript variant 2, mRNA (cDNA clone MGC:32568 IMAGE:4724068),complete cds./gene="TERF1"/codon_start=1/product="telomeric repeat binding factor 1, isoform 2"/protein_id="AAH29378.1"/db_xref="GI:20810196"/db_xref="GeneID:7013"/db_xref="MIM:600951" 3 4 2 0 15 6 5 4 5 7 3 8 7 5 0 8 8 7 10 5 1 6 1 2 3 6 9 9 5 8 6 6 1 4 4 2 7 4 25 10 7 10 6 14 3 8 27 21 11 11 2 4 5 3 8 9 9 1 10 13 6 0 0 1 >AK122906\AK122906\313..1104\792\BAC85513.1\Homo sapiens\Homo sapiens cDNA FLJ16581 fis, clone TESTI2050987, weakly similarto RET finger protein-like 1./codon_start=1/protein_id="BAC85513.1"/db_xref="GI:34528351" 2 3 2 1 4 3 3 5 11 6 2 6 1 8 0 3 7 2 4 4 0 2 3 3 1 4 5 4 1 5 4 4 2 2 1 6 9 3 2 8 3 5 4 4 6 6 7 12 4 12 1 2 7 8 11 6 1 7 3 4 4 0 0 1 >BC010003\BC010003\136..753\618\AAH10003.2\Homo sapiens\Homo sapiens chromosome 6 open reading frame 117, mRNA (cDNA cloneMGC:16762 IMAGE:4134852), complete cds./gene="C6orf117"/codon_start=1/product="chromosome 6 open reading frame 117"/protein_id="AAH10003.2"/db_xref="GI:33872596"/db_xref="GeneID:112609" 0 2 0 0 6 3 0 3 8 4 1 2 3 7 1 6 2 2 3 6 0 1 6 1 0 2 3 5 0 2 5 1 2 1 1 2 6 3 4 5 7 3 4 7 4 1 7 11 8 5 3 4 1 1 2 12 0 3 7 4 2 0 0 1 >HSA131730#2\AJ131730\229..873\645\CAB56836.1\Homo sapiens\Homo sapiens mRNA for DREAM protein./gene="dream"/function="transcriptional repressor"/codon_start=1/evidence=experimental/product="DREAM 
protein"/protein_id="CAB56836.1"/db_xref="GI:6006496"/db_xref="GOA:Q9Y2W7"/db_xref="UniProt/Swiss-Prot:Q9Y2W7" 0 2 2 0 1 2 0 6 14 0 0 0 1 5 0 1 4 1 2 8 2 0 2 2 1 1 1 8 4 0 1 7 4 0 1 4 5 1 3 10 5 3 0 12 6 0 2 17 10 6 5 2 2 2 9 6 0 8 3 8 2 0 1 0 >HSTROPONC\X54163\90..722\633\CAA38102.1\Homo sapiens\Human mRNA for cardiac troponin I./codon_start=1/protein_id="CAA38102.1"/db_xref="GI:37428"/db_xref="GOA:P19429"/db_xref="UniProt/Swiss-Prot:P19429" 3 8 7 1 4 2 0 2 16 1 0 3 0 4 1 2 4 1 0 5 2 2 2 1 2 2 6 8 5 4 3 3 4 0 0 1 5 0 5 18 4 0 1 8 3 0 3 18 7 6 2 1 2 0 1 3 1 6 2 4 1 0 0 1 >AF110798#3\AF110798\join(1163..1184,1523..1729,2373..2496,2577..2817)\594\AAD17189.1\Homo sapiens\Homo sapiens interleukin-18 binding protein precursor (IL18BP)gene, alternative splice products, complete cds./gene="IL18BP"/codon_start=1/product="interleukin-18 binding protein c precursor"/protein_id="AAD17189.1"/db_xref="GI:4324926" 1 1 1 2 4 2 0 10 13 0 1 3 1 4 1 0 12 0 5 6 2 2 8 4 0 4 2 8 0 2 3 3 3 3 0 8 9 2 0 4 3 2 0 11 5 1 5 6 3 0 1 0 5 4 6 0 0 1 1 2 7 0 0 1 >HSU00803\U00803\448..1965\1518\AAA18284.1\Homo sapiens\Human SRC-like tyrosine kinase (FRK) mRNA, complete cds./gene="FRK"/codon_start=1/product="SRC-like tyrosine kinase"/protein_id="AAA18284.1"/db_xref="GI:392888" 5 1 2 2 17 3 6 5 14 11 5 10 11 6 1 8 6 6 8 5 2 8 11 4 3 4 12 8 2 8 11 7 3 8 8 3 7 7 15 13 12 10 12 16 6 6 21 15 14 13 10 14 4 4 6 17 5 13 10 10 11 0 0 1 >BC013975\BC013975\73..2319\2247\AAH13975.1\Homo sapiens\Homo sapiens sema domain, immunoglobulin domain (Ig), short basicdomain, secreted, (semaphorin) 3B, transcript variant 2, mRNA (cDNAclone MGC:12697 IMAGE:4121913), complete cds./gene="SEMA3B"/codon_start=1/product="semaphorin 3B, isoform 2 precursor"/protein_id="AAH13975.1"/db_xref="GI:15559241"/db_xref="GeneID:7869"/db_xref="MIM:601281" 5 26 26 7 1 8 4 15 44 2 0 9 3 13 10 4 17 7 3 16 11 5 8 22 12 6 12 38 18 8 11 25 16 5 4 14 33 4 2 20 14 3 7 22 18 5 5 39 28 5 13 3 16 4 20 15 1 10 4 9 13 0 0 1 
>HSJ136O14#9\AL121963\join(106104..106447,complement(AL357141.8:70100..70221), complement(AL357141.8:34799..34962), complement(AL357141.8:33774..33942), complement(AL357141.8:22675..22833), complement(AL357141.8:10467..10648), complement(AL357141.8:9243..9408), complement(AL357141.8:8637..8848))\-35361\CAB87592.2\Homo sapiens\Human DNA sequence from clone RP1-136O14 on chromosome 6q21-22.3Contains a pseudogene similar to various (hypothetical) genes, theCOL10A1 gene for collagen, type X, alpha 1 (Schmid metaphysealchondrodysplasia) (collagen X, alpha-1 polypeptide; collagen, typeX, alpha 1 (Schmid metaphyseal chondrodysplasia)), the 5' end of anovel gene and the 5' end of the FRK gene for fyn-related kinase(GTK, RAK). Contains a CpG island, complete sequence./gene="FRK"/locus_tag="RP11-702N8.1-001"/standard_name="OTTHUMP00000017048"/codon_start=1/product="fyn-related kinase"/protein_id="CAB87592.2"/db_xref="GI:56203498"/db_xref="GOA:Q9NTR5"/db_xref="HSSP:1BU1"/db_xref="InterPro:IPR000719"/db_xref="InterPro:IPR000980"/db_xref="InterPro:IPR001245"/db_xref="InterPro:IPR001452"/db_xref="InterPro:IPR002290"/db_xref="InterPro:IPR008266"/db_xref="InterPro:IPR011009"/db_xref="InterPro:IPR011511"/db_xref="UniProt/TrEMBL:Q9NTR5" 0 1 0 1 10 4 4 12 11 8 8 15 4 4 2 6 8 9 15 2 0 8 7 7 1 9 5 7 1 4 11 4 11 6 9 10 6 8 21 11 5 12 10 11 7 9 10 8 4 4 5 14 10 14 9 31 11 5 15 11 6 8 10 7 >AF264014\AF264014\26..4387\4362\AAF91396.1\Homo sapiens\Homo sapiens scavenger receptor cysteine-rich type 1 protein M160precursor, mRNA, complete cds, alternatively spliced./codon_start=1/product="scavenger receptor cysteine-rich type 1 protein M160 precursor"/protein_id="AAF91396.1"/db_xref="GI:9652087" 8 7 4 9 30 26 11 20 38 23 13 20 24 19 7 37 21 19 28 18 3 21 10 11 3 12 24 27 2 30 72 29 35 17 13 19 50 27 26 14 26 39 17 29 24 28 38 40 40 57 4 12 46 58 16 22 9 26 21 17 57 0 0 1 >BC011686\BC011686\74..3496\3423\AAH11686.1\Homo sapiens\Homo sapiens damage-specific DNA binding protein 1, 127kDa, 
mRNA(cDNA clone MGC:19563 IMAGE:3845478), complete cds./gene="DDB1"/codon_start=1/product="damage-specific DNA binding protein 1"/protein_id="AAH11686.1"/db_xref="GI:15079750"/db_xref="GeneID:1642"/db_xref="MIM:600045" 6 13 12 6 4 7 7 28 46 17 8 19 9 16 6 15 28 11 13 29 9 19 7 19 0 15 7 29 4 19 17 33 18 14 9 26 48 8 15 39 27 27 8 39 18 10 36 55 35 20 18 17 15 7 21 25 1 40 31 27 8 0 1 0 >HSIL4SV\X81851\1..414\414\CAA57444.1\Homo sapiens\H. sapiens IL-4 gene splice variant./gene="IL-4"/codon_start=1/product="interleukin-4"/protein_id="CAA57444.1"/db_xref="GI:673419"/db_xref="GOA:P05112"/db_xref="UniProt/Swiss-Prot:P05112" 1 1 2 0 1 4 2 5 7 1 1 4 1 2 1 0 3 1 2 3 2 5 0 1 0 2 2 2 3 1 1 3 0 2 0 1 2 0 3 8 6 1 1 7 5 1 4 5 2 1 1 1 3 3 7 1 0 5 0 2 1 0 0 1 >HSPSANLZP\Z50781\136..369\234\CAA90644.1\Homo sapiens\H.sapiens mRNA for leucine zipper protein./codon_start=1/evidence=experimental/product="leucine zipper protein"/protein_id="CAA90644.1"/db_xref="GI:1834507"/db_xref="GOA:Q99576"/db_xref="UniProt/Swiss-Prot:Q99576" 1 0 0 1 1 0 1 0 8 0 0 1 0 3 0 1 2 0 0 2 0 0 3 2 0 2 1 1 1 2 0 0 0 2 0 0 6 0 0 5 2 1 1 4 0 1 2 12 0 1 0 1 0 1 1 0 0 2 0 2 0 1 0 0 >AL513327#5\AL513327\complement(join(122695..122849,126700..126976, 127901..128042,129178..129292,130105..130234, 131920..132072))\972\CAI13955.1\Homo sapiens\Human DNA sequence from clone RP11-415J8 on chromosome 1 Containsthe gene for a novel protein (FLJ35476), the gene for a novelprotein similar to ABO family protein, two novel genes, the PHC2gene for polyhomeotic-like 2 (Drosophila) and two CpG islands,complete sequence./gene="PHC2"/locus_tag="RP11-415J8.4-003"/standard_name="OTTHUMP00000004294"/codon_start=1/product="polyhomeotic-like 2 (Drosophila)"/protein_id="CAI13955.1"/db_xref="GI:55960661"/db_xref="InterPro:IPR001660"/db_xref="UniProt/TrEMBL:Q8N306" 1 6 7 4 0 1 1 7 17 4 0 2 7 8 3 3 9 2 2 9 1 5 8 10 1 2 4 13 2 5 6 6 7 2 1 1 9 2 7 18 6 4 4 13 7 2 8 16 11 3 3 4 3 4 8 2 1 10 1 9 1 0 1 0 
>BC028712\BC028712\245..1039\795\AAH28712.1\Homo sapiens\Homo sapiens ankyrin repeat domain 19, mRNA (cDNA clone MGC:27029IMAGE:4837806), complete cds./gene="ANKRD19"/codon_start=1/product="ANKRD19 protein"/protein_id="AAH28712.1"/db_xref="GI:71052207"/db_xref="GeneID:138649" 0 3 4 1 9 6 3 8 10 6 2 6 3 2 0 2 5 2 2 2 1 5 2 0 0 1 5 9 2 8 1 5 3 1 2 2 5 2 4 8 8 10 8 7 5 9 8 7 10 3 3 5 3 3 5 4 4 7 8 4 1 0 0 1 >AF067513\AF067513\620..2200\1581\AAC72078.1\Homo sapiens\Homo sapiens PITSLRE protein kinase alpha SV4 isoform (CDC2L1)mRNA, complete cds./gene="CDC2L1"/codon_start=1/product="PITSLRE protein kinase alpha SV4 isoform"/protein_id="AAC72078.1"/db_xref="GI:3850306" 2 4 5 3 2 7 1 13 34 2 1 1 9 8 4 4 18 5 2 17 7 1 4 16 5 5 5 14 2 3 4 17 11 6 1 5 17 3 5 29 14 1 0 16 8 2 27 57 21 5 11 4 3 0 18 0 0 18 2 12 5 0 0 1 >BC060782\BC060782\450..2861\2412\AAH60782.1\Homo sapiens\Homo sapiens leucine rich repeat containing 8 family, member B,mRNA (cDNA clone MGC:71558 IMAGE:5298560), complete cds./gene="LRRC8B"/codon_start=1/product="LRRC8B protein"/protein_id="AAH60782.1"/db_xref="GI:38174504"/db_xref="GeneID:23507" 2 7 2 2 3 6 15 23 42 26 13 24 14 20 2 8 16 9 10 8 3 11 14 6 2 14 8 13 1 10 5 4 5 6 4 15 20 10 33 22 18 27 2 33 10 14 24 29 19 11 12 20 9 12 10 23 8 23 17 12 12 0 0 1 >AY531265\AY531265\1..429\429\AAT00461.1\Homo sapiens\Homo sapiens biogenesis of lysosome-related organelles complex 1subunit 2 mRNA, complete cds./codon_start=1/product="biogenesis of lysosome-related organelles complex 1 subunit 2"/protein_id="AAT00461.1"/db_xref="GI:46558847" 2 0 2 0 0 1 0 3 8 2 2 2 1 1 0 0 1 3 1 3 1 3 0 1 0 2 6 6 3 7 1 1 1 0 2 1 1 0 6 9 3 3 0 5 0 0 8 9 5 5 3 5 1 0 1 0 1 2 3 5 0 0 0 1 >S66400#2\S66400\join(51..819,1114..1532)\1188\AAB27919.1\Homo sapiens\CD44=CD44SP {alternatively spliced} [human, breast carcinoma cellline MCF-7, mRNA Partial, 1586 nt]./gene="CD44"/codon_start=1/product="CD44R5"/protein_id="AAB27919.1"/db_xref="GI:435700" 2 3 2 1 4 6 1 6 6 3 0 9 7 8 1 5 11 9 16 11 4 7 
7 9 1 7 9 6 1 9 12 3 8 6 2 3 13 2 4 8 13 10 3 9 6 6 13 8 17 9 6 4 6 3 5 7 4 14 7 8 5 1 0 0 >AK126047\AK126047\1418..1927\510\BAC86411.1\Homo sapiens\Homo sapiens cDNA FLJ44059 fis, clone TESTI4035065./codon_start=1/protein_id="BAC86411.1"/db_xref="GI:34532390" 0 1 2 1 5 8 0 2 6 2 0 5 5 3 0 4 5 3 6 3 0 2 8 4 1 3 3 4 0 3 7 4 4 5 1 1 3 3 0 1 1 1 2 7 8 1 2 2 1 0 2 0 0 2 4 1 1 2 1 4 9 0 0 1 >S73837\S73837\1..66\66\AAD14137.1\Homo sapiens\TCR V delta 2 D delta J delta=T cell receptor delta chain variableregion [human, systemic lupus erythematosus patient, peripheralblood, lymphocytes, mRNA Partial, 66 nt]./gene="TCR V delta 2 D delta J delta"/codon_start=1/protein_id="AAD14137.1"/db_xref="GI:4261837" 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 3 1 0 0 0 0 0 0 2 0 0 1 0 2 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 2 1 2 0 0 2 0 0 0 0 0 1 1 0 0 0 >BC000148\BC000148\55..690\636\AAH00148.1\Homo sapiens\Homo sapiens synaptosomal-associated protein, 23kDa, transcriptvariant 1, mRNA (cDNA clone MGC:5325 IMAGE:2900640), complete cds./gene="SNAP23"/codon_start=1/product="synaptosomal-associated protein 23, isoform SNAP23A"/protein_id="AAH00148.1"/db_xref="GI:12652793"/db_xref="GeneID:8773"/db_xref="MIM:602534" 1 2 0 1 7 1 2 2 7 2 2 1 3 0 0 4 1 3 9 2 1 5 4 0 1 1 3 6 0 4 6 6 0 4 2 1 2 0 6 9 7 10 8 6 1 0 12 7 6 10 1 1 3 3 0 1 5 5 8 5 1 1 0 0 PyCogent-1.5.3/tests/data/interleaved.phylip000644 000765 000024 00000015602 10665667404 022066 0ustar00jrideoutstaff000000 000000 3 1758 I human ATGGGCCAAG GGGACGAGAG CGAGCGCATC GTGATCAACG TGGGCGGCAC chimp ?????????? ?????????? ?????????? ?????????? ?????????? mouse ATGGGCCAAG GGGACGAGAG CGAGCGCATC GTGATCAACG TGGGCGGCAC GCGCCACCAG ACGTACCGCT CGACCCTGCG CACGCTGCCC GGCACGCGGC ?????????? ?????????? ?????????? ?????????? ?????????? GCGCCACCAG ACGTACCGCT CGACGCTGCG CACGCTGCCC GGCACGCGGC TCGCCTGGCT GGCGGAGCCC GACGCCCACA GCCACTTCGA CTATGACCCG ?????????? ?????????? ?????????? ?????????? ?????????? 
TTGCCTGGCT GGCAGAGCCG GACGCCCACA GCCACTTCGA CTATGACCCG CGTGCTGACG AGTTCTTCTT CGACCGCCAC CCCGGCGTCT TCGCGCACAT ?????????? ?????????? ?????????? ?????????? ?????????? CGTGCCGACG AGTTCTTCTT CGACCGCCAC CCGGGCGTCT TCGCTCACAT CCTGAACTAC TACCGCACGG GCAAGCTGCA CTGCCCAGCC GACGTGTGCG ?????????? ??CCGCACGG GCAAGCTGCA CTGCCCAGCC GACGTGTGCG CCTGAACTAT TACCGCACCG GCAAGCTTCA CTGCCCGGCC GACGTGTGCG GGCCGCTCTA CGAGGAGGAG CTGGCCTTCT GGGGCATCGA CGAGACCGAC GGCCGCTCTA CGAGGAGGAG CTGGCCTTCT GGGGCATCGA CG???C?GAC GGCCGCTCTA CGAGGAGGAG CTGGCCTTCT GGGGCATCGA CGAGACGGAC GTGGAGCCCT GCTGCTGGAT GACGTACCGC CAGCACCGCG ACGCCGAGGA GTGGAGCCCT GCTGCTGGAT GACGTACCG? ?AGCACCGCG ACGCCGAGGA GTGGAGCCCT GCTGCTGGAT GACCTATCGC CAGCACCGAG ACGCTGAGGA GGCTCTGGAC AGCTTCGGCG GCGCTCCTCT GGACAACAGC GCCGACGACG GGCGCTGGAC AGCTTCGGCG GCGCTCCTCT GGACAACAGC GCCGACG??? GGCGCTGGAC AGCTTTGGCG GTGCGCCCTT GGACAACAGC GCCGACGACG CGGACGCCGA CGGCCCTGGC GACTCGGGCG ACGGCGAGGA CGAGCTGGAG ?????????? ?????????? ??????GGCG ACGGCGAGGA CGAGCTGGAG CGGACGCCGA CGGCCCCGGC GACTCGGGCG ACGGCGAGGA CGAGCTGGAG ATGACCAAGC GCCTGGCGCT CAGTGACTCC CCGGATGGCC GGCCTGGCGG ATGACCAAGC GC??????CT CAGTGACTCC CCGGATGGCC GGCCTGGC?? ATGACCAAGA GATTGGCACT CAGTGACTCC CCAGATGGCC GGCCTGGCGG CTTTTGGCGC CGCTGGCAGC CGCGCATCTG GGCGCTCTTC GAGGACCCGT CTTT?GGCGC C????????? ?????????? ?????????? ?????????? CTTCTGGCGC CGCTGGCAAC CGCGCATCTG GGCGCTGTTC GAGGACCCCT ACTCGTCCCG CTACGCGCGG TATGTGGCCT TCGCTTCCCT CTTCTTCATC ?????????? ?????????? 
TATGTGGCCT TCGCTTCCCT CTTCTTCATC ACTCATCCCG CTACGCGCGG TATGTGGCCT TTGCCTCCCT CTTCTTCATC CTGGTCTCCA TCACCACCTT CTGCCTGGAG ACCCACGAGC GCTTCAACCC CTGGTCTCCA TCACCACCTT CTGCCTGGAG ACCCACGAGC GCTTCAACCC CTGGTCTCCA TCACAACCTT CTGTCTGGAG ACTCACGAGC GCTTCAACCC CATCGTGAAC AAGACGGAGA TCGAGAACGT TCGCAATGGC ACGCAAGTGC CATCGTGAAC AAGACGGAGA TTGAGAACGT TCGCAATGGC ACGCAAGTGC CATCGTGAAC AAGACCGAAA TCGAGAACGT TCGAAACGGC ACCCAAGTGC GCTACTACCG GGAGGCCGAG ACGGAGGCCT TCCTTACCTA CATCGAGGGC GCTACTACCG GGAGGCCGAG ACGGAGGCCT TCCTTACCTA CATCGAGGGC GGTACTACCG GGAAGCAGAG ACGGAGGCCT TCCTCACCTA CATCGAGGGC GTCTGTGTGG TCTGGTTCAC CTTCGAGTTC CTCATGCGTG TCATCTTCTG GTCTGTGTGG TCTGGTTCAC CTT???GTTC CTCATGCGT? TCATCTTCTG GTCTGCGTGG TCTGGTTCAC CTTCGAGTTC CTCATGCGTG TCGTCTTCTG CCCCAACAAG GTAGAGTTCA TCAAGAACTC GCTCAACATC ATTGACTTTG CCCCAACAAG GTAGAGTTCA TCAAGAACTC GCTCAACATC ATTGACTTTG CCCCAACAAG GTGGAATTCA TCAAGAACTC CCTCAATATC ATTGACTTTG TGGCCATCCT GCCCTTCTAC CTGGAGGTGG GGCTGAGCGG CCTGTCCTCC TGGCCATCCT GCCCTTCTAC CTGGAGGTGG GGCTGAGCGG CCTGTCCTCC TGGCCATTCT CCCCTTCTAC CTGGAGGTGG GCCTAAGCGG CCTGTCCTCA AAGGCAGCCA AGGACGTGCT GGGCTTCCTG CGCGTCGTCC GCTTCGTGCG AAGGCAGCCA AGGACGTGCT GGGCTTCCTG CGCGTCGTCC GCTTCGTGCG AAAGCCGCCA AGGACGTTCT GGGCTTCCTG CGCGTCGTCC GCTTCGTGCG CATCTTGCGC ATCTTTAAGC TGACCCGCCA CTTTGTGGGC CTGCGGGTCC CATCTTGCGC ATCTTTAAGC TGACCCGCCA CTTTGTGGGC CTGCGGGTCC CATCCTGCGC ATCTTCAAGC TGACCCGCCA CTTCGTGGGC CTGAGGGTCC TGGGCCACAC GCTCCGAGCC AGCACCAACG AGTTCCTGCT GCTCATCATC TGGGCCACAC GCTCCGAGCC AGCACCAACG AGTTCCTGCT GCTCATCATC TGGGCCACAC GCTCCGTGCC AGCACCAACG AGTTCCTGCT GCTTATCATC TTCCTGGCCT TGGGCGTGCT GATCTTCGCC ACCATGATCT ACTACGCCGA TTCCTGGCCT TGGGCGTGCT GATCTTCGCC ACCATGATCT ACTATGCCGA TTCCTGGCCC TGGGAGTGCT CATCTTTGCC ACCATGATCT ACTACGCCGA GAGGATAGGG GCACAGCCCA ATGACCCCAG CGCCAGTGAG CACACGCACT GAGGATAGGG GCACAGCCCA ATGACCCCAG CGCCAGTGAG CACACGCACT GAGGATAGGG GCACAGCCCA ATGACCCCAG CGCCAGCGAA CACACACACT TTAAGAACAT CCCCATCGGC TTCTGGTGGG CCGTGGTCAC CATGACGACC TTAAGAACAT CCCCATCGGC TTCTG?TGGG 
CCGTGGTCAC CATGACGACC TTAAAAACAT CCCCATCGGC TTCTGGTGGG CTGTGGTCAC CATGACGACA CTGGGCTATG GAGACATGTA CCCGCAGACG TGGTCCGGCA TGCTGGTGGG CTGGGCTATG GAGACATGTA CC???AGACG TGGTCCGGCA TGCTGGTGGG CTGGGCTATG GAGACATGTA TCCCCAGACG TGGTCTGGAA TGCTGGTGGG GGCTCTGTGT GCGCTGGCGG GCGTGCTCAC CATCGCCATG CCCGTGCCCG GGCTCTGTGT GCGCTGGCGG GCGTGCTCAC CATCGCCATG CCCGTGCCCG AGCCTTGTGT GCTCTGGCTG GTGTGCTGAC CATTGCCATG CCGGTGCCTG TCATCGTGAA CAATTTCGGG ATGTATTACT CCTTAGCCAT GGCTAAGCAG TCATCGTGAA CAATTTCGGG ATGTATTACT CCTTAGCCAT GGCTAAGCAG TCATCGTGAA CAATTTTGGG ATGTACTACT CTTTAGCCAT GGCTAAGCAG AAACTACCAA AGAAAAAAAA GAAGCATATT CCGCGGCCAC CGCAGCTGGG AAACTACCAA AGAAAAAAAA GAAGCATATT CCGCGGCCAC CGCAGCTGGG AAACTACCAA AGAAAAAAAA GAAGCATATT CCGCGGCCAC CACAGCTGGG ATCTCCCAAT TATTGTAAAT CTGTCGTAAA CTCTCCACAC CACAGTACTC ATCTCCCAAT TATTGTAAAT CTGTCGTAAA CTCTCCACAC CACAGTACTC ATCTCCCAAT TATTGTAAAT CTGTCGTAAA CTCTCCACAC CACAGTACTC AGAGTGACAC ATGTCCGCTG GCCCAGGAAG AAATTTTAGA AATTAACAGA AGAGTGACAC ATGTCCGCTG GCCCAGGAAG AAATTTTAGA AATTAACAGA AGAGTGACAC ATGCCCGCTG GCCCAGGAAG AAATTTTAGA AATTAACAGA GCAGATTCCA AACTGAATGG GGAGGTGGCG AAGGCCGCGC TGGCGAACGA GCAGATTCCA AACTGAATGG GGAGGTGGCG AAGGCCG??? TGGCGAAC?? GCAGATTCCA AACTGAATGG GGAGGTGGCG AAGGCCGCGC TGGCGAACGA AGACTGCCCC CACATAGACC AGGCCCTCAC TCCCGATGAG GGCCTGCCCT AGA??GCCCC CACATAGACC AGGCC???AC TCCCGATGAG GGCCTGCCCT AGACTGCCCC CACATAGACC AGGCCCTCAC TCCCGATGAG GGCCTGCCCT TTACGCGCTC GGGCACCCGC GAGAGATACG GACCCTGCTT CCTCTTATCA TTACGCG??C GGGCAC??GC GAGAGATACG GACCCTGCTT CCTCTTATCA TTACCCGCTC GGGCACCCGC GAGAGATACG GACCCTGCTT CCTCTTATCA ACCGGGGAGT ACGCGTGCCC ACCTGGTGGA GGAATGAGAA AGGATCTTTG ACCGGGGAGT ACGCG????? ?????????? ?????????? 
???ATCTTTG ACCGGGGAGT ACGCGTGCCC ACCTGGTGGA GGAATGAGAA AGGATCTTTG CAAAGAAAGC CCTGTCATTG CTAAGTATAT GCCGACAGAG GCTGTGAGAG CAAAGAAAGC CCTGTCATTG CTAAGTATAT GCCGACAGAG GCTGTGAGAG CAAAGAAAGC CCTGTCATTG CTAAGTATAT GCCGACAGAG GCTGTGAGAG TGACTTGA TGACTTGA TGACTTGA PyCogent-1.5.3/tests/data/murphy.tree000644 000765 000024 00000001103 10665667404 020531 0ustar00jrideoutstaff000000 000000 (Chook,((((((((FlyingFox,DogFaced),((FreeTaile,LittleBro),(TombBat,RoundEare))),(FalseVamp,LeafNose)),(((Horse,Rhino),(Pangolin,(Cat,Dog))),(Llama,(Pig,(Cow,(Hippo,(SpermWhale,HumpbackW))))))),(Mole,Hedgehog)),(((TreeShrew,FlyingLem),(Galago,(HowlerMon,(Rhesus,(Orangutan,(Gorilla,(Human,Chimpanzee))))))),(Jackrabbit,(FlyingSqu,(OldWorld,(Mouse,Rat)))))),((NineBande,HairyArma),(Anteater,Sloth))),(((Dugong,Manatee),((AfricanEl,AsianElep),(RockHyrax,TreeHyrax))),(Aardvark,((GoldenMol,(Madagascar,Tenrec)),(LesserEle,GiantElep))))),(Caenolest,(Phascogale,(Wombat,Bandicoot)))); PyCogent-1.5.3/tests/data/primates_brca1.fasta000644 000765 000024 00000071347 10665667404 022261 0ustar00jrideoutstaff000000 000000 >TreeShrew tgtggcataaatacttatgccagctcattacagcatgagaacagcagtttattactcact aaggacagaatgaatgtagaaaaggctgaattgtgtaataaaagccaacaacctgactta gcaaggagccagcagagcagatggactgaaaataaggaaacatgtaatgataggcagatt cccagcacagaaaaaaaggtagatctaaatgctgatcccctgtgtgggaagaaaaaacaa gctaagcagaaacygctatgttctaacagtcctagagatnnngaccaagattctccttgg ataactctaaatagtagcattcagaaagttaatgaatggttttccagaagtgatgaaatg ttaacttctaacgactcacatgatggtgagtctgaannnnnnnnnnnnatagctggtgca ttygaagytccaaataaagtagatgaatattctggttcttcagaaaaaatagacttaatg gccaacaatcttcatgatgctttaataagtaaaagtgaaggaatctactccaaaccagta gagggtaatattgaagataaaatatttgggaaaacctatcggaggaaagcaagtcttcct aacttgagccgtgtaactgatgatctaattagaggggcatttgttacagagcctgagata actcgagagcgtcccttcacaaataaattaaagcggaaaaggagaactatatcaggcctt catcctgaagattttatcaagaaaacagatttggcagttgttcaaaagactcctgaaaag ataaatcagagaactgaccaaatagagcataatggtcaggtgatgagtattgctaatagt 
ggtcatgagaatgaaacaaaaggtgattatatttcgaaagagaaaaatgctaacccaatg gaatcattagaaaaagaatctgctctcagaactaaagctgagcccataagcagcagtgta agcaatatggaactagaaataaataaccacagttcagaagcacctaagaagaataggctg aggagaaagttttctgctaggcatattcgcacacttgaactagtagacaataaaagtcca agcccacctaatcgtactgaactacaaattgacagttattctagcggtgaagagagaaag aaaaagnnnggtgagcaaatgccagttggacacagcagaaagtttcaacttgaggaagag aaagaacctacaactggagccaagaaaaataaccagccaaatacagaaataagtgaaaga catgccagtggtgttatcccagatctgaagttaacaaacatacctggttttttcacaaac tcttcgagttctaataaacttccagaatttgtccatcgtagccttcaaagagaannnaaa gaagagaacnnncgagaaacaattcaaatatccagtagtaccaaannnnnnnnngatctg gtattaaggggagaannnaggggtttgcaagatgtaaggtctgcagagagtaccagtatt tctttggtacctgatactgatgataacacccaggatagcatctcattactagatgctaac cccctagctaggaaggcaaaaacagcaccaaatcaatgtgtaaatcagagtgcaacaact gaaaaccccaaggaacttatacacagttgttctaaaactactaggaatnnnnnngaaggc ttcaaggatccattgaaaagtgaagttaatcatattcaggagatgagtgtagaaatggag gagagtgaacttgatactcagtatttacagaatacattcaggagttcaaagcgtcagtca tttgctctgtcttcaaatccaggaaatccagaaaaggaacatgtctgtgttnnnnnnnnn nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn aaagaaagtctgaaagagtctaacatccaacatatacaggcagttagtaccatggttnnn nnnnnnnnnnnntttcagaaagataagnnnctaggtgattttgctacatctggcattaaa gaagtccctagactttgtccatcatctcagttcagaggcaatgaaactgatctcattact gcaaataaacctgaagtttcacaaaacccgtatcatatgccattactttatcctgtcaag tcacctattataactaaaagtaagaaaagcctgtcagaggaagggtttgaggaacaggca atgtcacttgaaagagcaatggaaaatgagaacatcattcaaagtacagtgagcacaatt agccaagataacattagagaaggtgcttttaaagaagccagctcaagcagtattaatgaa ataggtcctagtactaatgaaggaagctctagtattaatgaggtaggttccagtnnnnnn nnnnnnnnnnnnnnnggtgaaaacattcaagcagaactaggtaaaaagagaggatccaaa ttaaatgctgtgcttagattaggtcttatgcaacccgaagtctataagcaaagtcttcct ttaagtaatcataatgatcctgaaatgaaaagacaagaaaaaaatgaaggaggagttcag gctattaaannngatttacctccatgtctaatttcagataatcaagagcatnnnatggga agtagccatgcttctcagatttgttctgagacacctgatgatctgttagatgatgatgaa ggaaaagaaaatnnnagctttgctgaggttgatgttaaggaaagatctgctgtttttggc 
aaaactgtccagagaagagagttaagaaggagctctagccctttaactcgtgcatgtttg actgagggtcagcaaacaggagcccagaaattagattcatcagaagagaacctatctagt gag >Orangutan tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcagact cccagcacagaaaaaaaggtagacctgaatgctgatcccctgtgtgagagaaaagaatgg aataagcagaaactgccatgctcagagaatcctagagatnnnactgaagatgttccttgg ataacactaaatagcagcattcagaaagttaatgagtggttttccagaagtgacgaactg ttaggttctgatgactcacatgatgggaggtctgaatcaaatgccaaagtagcggatgta ttggacgttctaaatgaggtagatgaatattctggttcttcagagaaaatagacttactg gccagtgatcctcatgaggctttaatttgtaaaagtgaaagagttcactccaaatcagta gagagtaatattgaagacaaaatatttgggaaaacctatcggaggaaggcaagcctcccc aacttaagccatgtaactgaaaatctaattataggagcatttgttactgagccacagata atacaagagcgtcccctcacaaataaattaaagcgtaaaaggagagctacatcaggcctt catcctgaggattttatcaagaaagcagatttggcagttnnncaaaagactcctgaaatg ataaatcagggaactaaccaaatggagcagaatggtcaagtgatgaatattactaatagt ggtcatgagaataaaacaaaaggtgattctattcagaatgagaaaaatcctaacccaata gaatcactcgaaaaagaatctgctttcaaaacaaaagctgaacctataagcagcagtata agcaatatggaactcgaattaaatatccataattcaaaagcacctaaaaagaataggctg aggaggaagtcttctaccaggcatattcatgcgcttgaactagtagtcagtagaaatcta agcccacctaattgtactgaattgcaaattgatagttgttctagcagtgaagagataaag aaaaaaaaatacaaccaaatgccagtcaggcacagcagaaacctacaactcatggaagat aaagaacctgcaactggagccaagaagagtaacaagccaaatgaacagacaagtaaaaga catgacagcgatactttcccagagctgaagttaacaaatgcacctggttcttttactaac tgttcaaataccagtgagcttaaagaatttgtcaatcctagccttccaagagaagaaaaa gaagagaaannnctaggaacagttaaagtgtctaataatgccaaagaccccaaagatctc atgttaagtggaggannnagggttttgcaaactgaaagatctgtagagagtagcagtatt tcattggtacctggtactgattatggcactcaggaaagtatctcgttactggaagttagc actctagggnnnaaggcaaaaacagaaccaaataaatgtgtgagtcagtgtgcagcattt gaaaaccccaaggaactaattcatggttgtttcaaagatactagaaatgacacagaaggg tttaagtatccattgggacatgaagttaaccacagtcaggaaacaagcatagaaatggaa gaaagtgaacttgatactcagtatttgcagaatacattcaaggtttcaaagcgccagtca 
tttgctctgttttcaaatccaggaaatccagaagaggaatgtgcaacattctctgcccac tctaggtccttaaagaaacaaagtccaaaagtcacttttgaatgtgaacaaaaggaagaa aatcaaggaaagaatgagtctaatatcaagcctgtacagacagctaatatcactgcaggc tttcctgtggtttgtcagaaagataagnnnccagttgattatgccaaatgtagtatcaaa ggaggctctaggttttgtctatcatctcagttcagaggcaacgaaactggactcattact ccaaataaacatggactttcacaaaacccatatcatataccaccactttttcccatcaag tcatttgttaaaactaaatgtaagaaaaacctgctagaggaaaactctgaggaacattca atgtcacctgaaagagaaatgggaaacgagaacnnnattccaagtacagtgagcataatt agccgtaataacattagagaaaatgtttttaaagaagccagctcaagcaatattaatgaa gtaggttccagtactaatgaagtgggctccagtattaatgaagtaggttccagtnnnnnn nnnnnnnnnnnnnnngatgaaaacattcaagcagaactaggtagaagcagagggccaaaa ttgaatgctatgcttagattaggggttttgcaacctgaggtctataaacaaagttttcct ggaagtaatggtaagcatcctgaaataaaaaagcaagaatatgaagaannngtacttcag actgttaatacagacttctctccatgtctgatttcagataacctagaacagcctatgaga agtagtcatgcatctcaggtttgttctgagacacctaatgacctgttagatgatggtgaa ataaaggaagatactagttttgctgaaaatgacattaaggaaagttctgctgtttttagc aaaagcgtccagagaggagagcttagcaggagtcctagccctttcacccatacacatttg gctcagggttaccgaagaggggccaagaaattagagtcctcagaagagaacttatctagt gag >Rhesus tgtggcacaaatactcatgccagctcattacagcatgagaacnnnagtttgttactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggcttg gcaaggagccaacataacagatggactggaagtaaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagatctgaatgctaatgccctgtatgagagaaaagaatgg aataagcaaaaactgccatgctctgagaatcctagagacnnnactgaagatgttccttgg ataacactaaatagcagcattcagaaagttaatgagtggttttccagaagtgatgaactg ttaagttctgatgactcacatgatggggggtctgaatcaaatgccaaagtagctgatgta ttggacgttctaaatgaggtagatgaatattctggttcttcagagaaaatagacttactg gccagtgatcctcatgagcctttaatatgtaaaagtgaaagagttcactccagttcagta gagagtaatattaaagacaaaatatttgggaaaacctatcggaggaaggcaaaccttccc aatttaagccatgtaactgaaaatctaattataggagcacttgttactgagtcacagata atgcaagagcgtcccctcacaaataaattaaagcgtaaaaggagaactacatcaggtctt catcctgaggattttataaagaaagcagatttggcagttnnncaaaagactcctgaaata ataaatcagggaactaaccaaatggagcagaatggtcaagtgatgaatattactaatagt 
gctcatgagaataaaacaaaaggtgattctattcagaatgagaaaaatcctaacccaata gaatcactggaagaagaatctgctttcaaaactaaagctgaacctataagcagcagtata aacaatatggaactagaattaaatatccacaattcaaaagcacctaaaaaaaataggctg aggaggaagtcttctaccaggcatattcatgcgcttgaactagtagtcagtagaaatcta agcccacctaactgtactgaactacaaattgatagttgttctagcagtgaagagataaag aaaaaaaattacaaccaaatgccagtcaggcacagcagaaacctacaactcatggaagat aaagaatctgcaactggagccaagaagagtaacaagccaaatgaacagacaagtaaaaga catgccagtgatactttcccagaactgaagttaacaaaggtacctggttcttttactaac tgttcannnaatactagtgaaaaagaatttgtcaatcctagcctttcaagagaagaaaaa gaagagaaannnctagaaacagttaaagtgtctaataatgccaaagaccccaaagatctc atcttaagtggagaannnagggttttacaaactgaaagatctgtagagagtagcagtatt tcattggtacctggtaccgattatggcactcaggaaagtatctcattactggaagttagc actctagggnnnaaggcaaaaacagaacgaaataaatgtatgagtcagtgtgcagcattt gaaaaccccaaggaactaattcatggttgttctgaagatactagaaatgacacagaaggc tttaagtatccattgggaagtgaagttaaccacagtcaggaaacaagcatagaaatagaa gaaagtgaacttgatactcagtatttgcagaatacattcaaggtttcaaagcgccagtcc tttgctctgttttcaaatccaggaaatccagaagaggaatgtgcaacattctctgcccac tctaggtccttaaagaaacaaagtccaaaagttacttctgaatgtgaacaaaaggaagaa aatcaaggaaagaaacagtctaatatcaagcctgtacagacagttaatatcactgcaggc ttttctgtggtttgtcagaaagataagnnnccagttgataatgccaaatgtagtatcaaa ggaggctctaggttttgtctatcatctcagttcagaggcaacgaaactggactcattact ccaaataaacatggactgttacaaaacccataccatataccaccactttttcctgtcaag tcatttgttaaaactaaatgtaacaaaaacctgctagaggaaaactctgaggaacattca gtgtcacctgaaagagcagtgggaaacaagaacatcattccaagtacagtgagcacaatt agccataataacattagagaaaatgcttttaaagaagccagctcgagcaatattaatgaa gtaggttccagtactaatgaagtgggctccagtattaatgaagtaggttccagtnnnnnn nnnnnnnnnnnnnnngatgaaaacattcaagcagaactaggtagaaacagagggccaaaa ttgaatgctgtgcttagattagggcttttgcaacctgaggtctgtaaacaaagtcttcct ataagtaattgtaagcatcctgaaataaaaaagcaagaacatgaagaannnttagttcag actgttaatacagacttctctccatgtctgatttcagataacctagaacagcctatggga agtagtcatgcgtctgaggtttgttctgagactcctgatgatctgttagatgatggtgaa ataaaggaagatactagttttgctgaaaatgacattaaggagagttctgctgtttttagc 
aaaagcatccagagaggagagctcagcaggagccctagccctttcacccatacacattta gctcagggttaccgaaaagaggccaagaaattagagtcctcagaagagaacttatctagt gag >Chimpanzee tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact cccagcacagaaaaaaaggtagatctgaatgctgatcccctgtgtgagagaaaagaatgg aataagcagaaactgccatgctcagagaatcctagagatnnnactgaagatgttccttgg ataacactaaatagcagcattcagaaagttaatgagtggttttccagaagtgatgaactg ttaggttctgatgactcacatgatggggggtctgaatcaaatgccaaagtagctgatgta ttggacgttctaaatgaggtagatgaatattctggttcttcagagaaaatagacttactg gccagcgatcctcatgaggctttaatatgtaaaagtgaaagagttcactccaaatcagta gagagtaatactgaagacaaaatatttgggaaaacctatcggaggaaggcaagcctcccc aacttaagccatgtaactgaaaatctaattataggagcatttgttactgagccacagata atacaagagcgtcccctcacaaataaattaaagcgtaaaaggagagctacatcaggcctt catcctgaggattttatcaagaaagcagatttggcagttnnncaaaagactcctgaaatg ataaatcagggaactaaccaaatggagcagaatggtcaagtgatgaatattactaatagt ggtcatgagaataaaacaaaaggtgattctattcagaatgagaaaaatcctaacccaata gaatcactcgaaaaagaatctgctttcaaaacgaaagctgaacctataagcagcagtata agcaatatggaactcgaattaaatatccacaattcaaaagcacctaaaaagaataggctg aggaggaagtcttctaccaggcatattcatgcgcttgaactagtagtcagtagaaatcta agcccacctaattgtactgaattgcaaattgatagttgttctagcagtgaagagataaag aaaaaaaagtacaaccaaatgccagtcaggcacagcagaaacctacaactcatggaagat aaagaacctgcaactggagtcaagaagagtaacaagccaaatgaacagacaagtaaaaga catgacagcgatactttcccagagctgaagttaacaaatgcacctggttcttttactaac tgttcaaataccagtgaacttaaagaatttgtcaatcctagccttccaagagaagaagaa gaagagaaannnctagaaacagttaaagtgtctaataatgccgaagaccccaaagatctc atgttaagtggagaannnagggttttgcaaactgaaagatctgtagagagtagcagtatt tcattggtacctggtactgattatggcactcaggaaagtatctcgttactggaagttagc actctagggnnnaaggcaaaaacagaaccaaataaatgtgtgagtcagtgtgcagcattt gaaaaccccaagggactaattcatggttgttccaaagatactagaaatgacacagaaggc tttaagtatccattgggacatgaagttaaccacagtcgggaaacaagcatagaaatggaa gaaagtgaacttgatgctcagtatttgcagaatacattcaaggtttcaaagcgccagtca 
tttgctctgttttcaaatccaggaaatccagaagaggaatgtgcaacattctctgcccac tgtaggtccttaaagaaacaaagtccaaaagtcacttttgaacgtgaacaaaaggaacaa aatcaaggaaagaatgagtctaatatcaagcctgtacagacagttaatatcactgcaggc tttcctgtggtttgtcagaaagataagnnnccagttgattatgccaaatgtagtatcaaa ggaggctctaggttttgtctatcatctcagttcagaggcaacgaaactggactcattact ccaaataaacatggacttttacaaaacccatatcatataccaccactttttcccatcaag tcatttgttaaaactaaatgtaagaaaaacctgctagaggaaaactttgaggaacattca atgtcacctgaaagagaaatgggaaatgagaacnnnattccaagtacagtgagcacaatt agccgtaataacattagagaaaatgtttttaaagaagccagctcaagcaatattaatgaa gtaggttccagtactaatgaagtgggctccagtattaatgaagtaggttccagtnnnnnn nnnnnnnnnnnnnnngatgaaaacattcaagcagaactaggtagaaacagagggccaaaa ttgaatgctatgcttagattaggggttttgcaacctgaggtctataaacaaagtcttcct gaaagtaattgtaagcatcctgaaataaaaaagcaagaatatgaagaannngtagttcag actgttaatacagatttctctccatgtctgatttcagataacttagaacagcctatggga agtagtcatgcatctcaggtttgttctgagacacctgatgacctgttagatgatggtgaa ataaaggaagatactagttttgctgaaaatgacattaaggaaagttctgctgtttttagc aaaagcgtccagagaggagagcttagcaggagtcctagccctttcacccatacacatttg gctcagggttaccgaagaggggccaagaaattagagtcctcagaagagaacttatctagt gag >Gorilla tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaaacaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact cccagcacagaaaaaaaggtagatctgaatgctgatcccctgtgtgagagaaacgaatgg aataagcagaaactgccatgctcagagaatcctagagatnnnactgaagatgttccttgg ataacactaaatagcagcattcagaaagttaatgagtggttttccagaagtgatgaactg ttaggttctgatgactcacatgatggggggtctgaatcaaatgccaaagtagctgatgta ttggacgttctaaatgaggtagatgaatattctggttcttcagagaaaatagacttactg gccagtgatcctcatgaggctttaatatgtaaaagtgaaagagttcactccaaatcagta gagagtaatattgaagacaaaatatttgggaaaacctatcggaggaaggcaagcctcccc agcttaagccatgtaactgaaaatctaattataggagcatttgttactgagccacagata atacaagagcgtcccctcacaaataaattaaagcgtaaaaggagagctacatcaggcctt catcctgaggattttatcaagaaagcagatttggcagttnnncaaaagactcctgaaatg ataaatcagggaactaaccaaatggagcagaatggtcaagtgatgaatattactaatagt 
ggtcatgagaataaaacaaaaggtgattctattcagaatgagaaaaatcctaacccaata gaatcactagaaaaagaatctgctttcaaaacgaaagctgaacctataagcagcagtata agcaatatggaactcgaattaaatatccacaattcaaaagcgcctaaaaagaataggctg aggaggaagtcttctaccaggcatattcatgcgcttgaactagtagtcagtagaaatcta agcccacctaattgtactgaattgcaaattgatagttgttctagcagtgaagagataaag aaaaaaaagtacaaccaaatgccagtcaggcacagcagaaacctacagctcatggaagat aaagaacctgcaactggagccaagaagagtaacaagccaaatgaacagacaagtaaaaga catgacagcgatactttcccagagctgaagttaacaaatgcacctggttcttttactaac tgttcaaataccagtgaacttaaagaatttgtcaatcctagccttccaagagaagaaaaa gaagagaaannnctagaaacagttaaagtgtctaataatgccgaagaccccaaagatctc atgttaagtggagaannnagggttttgcaaactgaaagatctgtagagagtagcagtatt tcattggtacctggtactgattatggcactcaggaaagtatctcgttactggaagttagc actctagggnnnaaggcaaaaacagaaccaaataaatgtgtgagtcagtgtgcagcattt gaaaaccccaagggactaattcatggttgttccaaagatactagaaatgacacagaaggc tttaagtatccattgggacatgaagttaaccacagtcgggaaacaagcatagaaatggaa gaaagtgaacttgatgctcagtatttgcagaatacattcaaggtttcaaagcgccagtca tttgctctgttttcaaatccaggaaatccagaagaggaatgtgcaacattctctgcccac tctaggtccttaaagaaacaaagtccaaaagtcacttttgaatgtgaacaaaaggaagaa aatcaaggaaagaatgagtctaatatcaagcctgtacagacagttaatatcactgcaggc tttcctgtggtttgtcagaaagataagnnnccagttgattatgccaaatgtagtatcaaa ggaggctctaggttttgtctatcatctcagttcagaggcaacgaaactggactcattact ccaaataaacatggacttttacaaaacccatatcatataccaccactttttcccatcaag tcatttgttaaaactaaatgtaagaaaaacctgctagaggaaaactttgaggaacattca atgtcacctgaaagagaaatgggaaatgagaacnnnattccaagtacagtgagcacaatt agccgtaataacattagagaaaatgtttttaaagaagccagctcaagcaatattaatgaa gtaggttccagtactaatgaagtgggctccagtattaatgaagtaggttccagtnnnnnn nnnnnnnnnnnnnnngatgaaaacattcaagcagaactaggtagaaacagagggccaaaa ttgaatgctatgcttagattaggggttttgcaacctgaggtctataaacaaagtcttcct ggaagtaattgtaagcatcctgaaataaaaaagcaagaatatgaagaannngtagttcag actgttaatacagatttctctccatgtctgatttcagataacttagaacagcctatggga agtagtcatgcatctcaggtttgttctgagacacctgatgacctgttagatgatggtgaa ataaaggaagatactagttttgctaaaaatgacattaaggaaagttctgctgtttttagc 
aaaagcgtccagagaggagagcttagcaggagtcctagccctttcacccatacacatttg gctcagggttaccgaagaggggccaagaaattagagtcctcagaagagaacttatctagt gag >FlyingLem tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattacgcact aaagacagaataaatgttgaaaagactgaattctgtaataaaagcaaacagcctggctta gcaaggagccaggagagcagatgggctgaaagtaaggaaacatgtaatgataggcagacg cccagcacagagaaaaagatagatctaaatgctgattcccagcatgggagaaaagaacgg aatatgcagaaacctccataccctgagagtcctagagatnnnacccaagatgttccttgg ataacactaaacagcagcattcagaaagttaatgagtggttttccagaagtgatgaaatt ttagcttctgatgactcacgtgacagggtgtctgaatcaaatgccaaagtagctggtgca ttagaagttccaaatgatgtagatggatattctgattcttcagagaaagttgatttaatg gccagtgatcctcatgatgctttaatatgtaaaagtgaaagaatccactccagaccagta gagagtaatatcaaagataaaatatttgggaaaacctatcagaggaagacaagcctccct aacttgagccacgtaaatgaagatctaattataggagcatttgttacagaaccacagata acacaagagcgtcccctcacaaataaggtaaagcctaaaaggagaactacatcaggcctt catcctgaggattttatcaagaaagcagacttggcagttgttcaaaaaactcctgaaaag ataaatcagggaattgaccaaatggagcagaatgatcgagtgatgaatattattaatagt ggtcatgagaatgaaacaaaggatgattatgttcagaaagagaaaaatgctaacccaaca gaatcattggaaaaagaatctgctttcagaactaaagcagaacctataagcagcagtata agcaatatggaaatagaattaaatatccacaattcaaaaccatctaagaagaataggctg aggaagatgtcctctactaggcatattcatgcacttgaactagtagtcaatagaaatcca agcccacctaattatactgaactacaaattgatagttgttctagcagtgaagaaatagag aaaaaaaattccagccaaatgccagtcaggcacagcagaaagcttcaactcatggaaaat aaagaacctgcaactggagccaagaagagtaacaagccaaatgaacaaataagtagaaga cattccagtaatgctttcccagaactgcggttaacaaatgtacctgttttttttgctaac tgttcaagttctaataaacttcaagaatttatcgatcctagccttcaaagagaagaaata gaagagaacnnnctagaaacaattcatgtgtctaatagtgccaaagaccccaaagatttg gtgttaagtggggagnnnaagggtttgcaaactgaaagatctgtagagagtaccagtatt tcattagtacctgatactgattatggcactcaagacagtatctcaatattagaagctaac atcctagggnnnaaggcaaaaacagcaccaagtcaacatgcaaatcagtgtgcagcaatt gaaaaccccaaagaacttatccatggttgtcctaaaggtactagaaatgacacagaggat tttaaggatccattgagatgtggagttgaccacattcagaagacaagcatagaaatgcaa gagagtgaacttgatactcagtatttacaaaatatattcaaggtttcaaaacgtcagtca 
tttgctctcttttcaaatccaggaaatccagaaaaggagtgtgcaacagtctatgcccac tccaggttgttaaggaaacaaagtccaaaagtcactcctgaatgtgaacaaaaagaagaa aatgagggaaataaagagtctaaaatcaagcacatacaggcagttaataccactgtgggc ttttctgtcctttgtcagaatgttaagaagccaggtgattatgccaaatttagcattaaa ggagtctctaggcattgttcatcatctcagttcagaggcaatgaaactgaactcattact gcaaataaacatggaattttacaaaactcatgtcatatgtcatcactttcccccatcagg tcatctgttaaaattaaatgtaagaagaacctgtcagaggaaaggtttgaggaacattca gtgtcacctgaaagagcaatggcaaacaagagaatcattcaaagtacagtgaacacaatt agccaaaataacattagagacagtgcttctaaagaagccagctcaagcagtattaatgaa gtaggttccagtactaatgaagtaggctccagtattaatgaagtaggtcccagtnnnnnn nnnnnnnnnnnnnnnggtgaaaacattcaagcagaactaggtagaaacagaggacctaaa ttaagtgctatgcttagattaggcctcatgcaacctgaagtttacaagcaaaatcttcct ttaggtaattgtaaacatcctgaaataaggnnncaagaagaaaatgaaggaatagttcag gctgttaatacaaatctgtctctgtgcctaatttcacataacctcgaacaacctatggaa agtagtcatgcttcccaggtttgttctgagacacctgatgacctgttagatggtgatgag ataaaggaaaacaccagctttgctgaaagtgacagtaaggaaagatctgctgtttttagc aaaagtgtccagagaggagagttaagcaggagccctagcccttttgcccaaacatgtttg gctcagggtcaccaaagaggagccaggaaattagagtcttctgaagagaacgtatctagt gag >Galago tgtggcaaaaatactcatgccagctcattacagcatgagagcagcagtttattactcact aaagacaaaatgaatgtagaaaaggctgaattttgtaataaaagcaaacagcctggctta gcaaggagccaacagagcagatcggctcaaagtaaggaaacatgcaatgataggcacact tgcagccctgagcaaaaggtagatctgaatactgctcccccatatgggagaaaagaacag aataaggagaaacttctatgctccaagaatcctagagatnnnagccaagatgttccttgg ataacactaaatagcagcattcagaaagttaatgaatggttttctagaagtgatgaaatg ttaacttctgatgactcacatgatgagggttctgaatcacatgctgaagtagctggagcc ttagaagttccaagtgaagtagatggatattccagttcctcagagaaaatagacttactg gccagtgatcctcattatcctataatatgtaaaagtgaaagagttcactccaaaccaata aagagtaaagttgaagataaaatatttgggaaaacttatcggaggaaggcaagcctccct aacttaagccatgtaactgaaaatctaattataagagcagctgctactgagccacagata acacaagagtgttccctcacaaataaattaaaacgtaaaaggagaactacatcaggtctt tgtcctgaggattttatcaagaaggcagatttggcagttgttcaaaagacacctgaaaag agaattcagggaactaaccaagtggatcagaatagtcacgtggtaaatattactaatagt 
ggttatgagaatgaaacaaaaggtgattatgttcagaatgaaaaaaatgctaactcaaca gaatcattggaaaaagaatcttctctcggaactaaagctgaacctataagcagcagtata agtaatatgaaattagaattaaatattcacaattcaaaagcaagtaaaaagaaaaggctg aggaagaagtcttctagcaggcatattcgtgcacttgaactagtagtcaataaaaatcca agccctcctaatcataccaacctacaaattgacagttgttctagcagtgaagaaataaag gataaaagttctgaccaaataccagtcaggcatagcagaaagcctggactcatggaagat agagaacctgcaactggagccaagaaaagtaacaagccaaatgagcaaataagtaaaaga catgtcagtgatactttcccagaagtggcattaacaaatatatctagtttttttactaac tgttcaggttctaatagacttaaagaatttgtcaatcctagccttcaaagaaaaaaaaca gaagagaacttagaagaaacaattcaagtgtctaatagtaccaaaggtccggtgttaagt ggagaaagggttttgnnncaaattgaaagtgaagaaagatctataaaaagcaccagtatt tcattggtacctgatactgattatggtactcaggacagtaactcgttactgaaagttaaa gtcttacggnnnaaggtgaaaacagcaccaaataaacatgcaagtcagggtacagccact gaaaaccccaaggaactaatccatggttgctctaaagatactggaaatgacacagagggc tataaggatccattgagacatgaaattaaccacattcagaagataagcatggaaatggaa gacagtgaacttgatactcagtatttacagaatacattcaagttttcaaagcgtcagtcg tttgctctgttttcaaacctannnnnnnnnggaaaggaatgtgcaacagtctgtgcccag tctctctctgcgtccttaagaaaaggttcaaaagtcattcttgaatgtgaacaaatagaa aatccaggaatgaaagagcctaaaatcaagcatatacagggaaataatatcaatacaggc ttctctgtagtttgtcagaaagataagaaaacagatgattatgccaaatacagcatcaaa gaagcatctaggttttgtttgtcaaatcagtttcgagacaatgaaactgaatccattact gtaaataaacttggaattttacaaaacctctatcatataccaccactttctcctatcagg ctatttgataaaactaaatgtaatacaaacctgttagaggaaaggtttgaagaacattca gtgttacctgaaaaagcagtaggaaacgagaacaccgttccaagtacaatgaatacaatt aaccaaaataacnnnagagaaagtgcttataaagaagccagttcaagcagtatcaatgaa gtaagctcgagtactaatgaagtgggctccagtgttaacgaagtaggccccagtnnnnnn nnnnnnnnnnnnnnnagtgaaaacattcaagcagaactagataaaaacagaggacctaag ttgaatgctgtgcttagattaggtcttatgcaacctgaagtctataaacaaaatcttcct ataagtaattgtgaacatcctaaaataaaagggcaagaagaaaatggannngtagttcaa cctgttaatccagatttttcttcatgtctaatttcagataacctagaacaacctacgaga agtagtcatgcttctcagctttgttctgagacacctgatgacttattagttgatgatgaa ctaaaggaaaataccagttttgctgaaaataacattaaggaaagatctgctgtttttagc 
aaaaatgtcatgagaagagagattagcaggagccctagccctttagcccatatacatttg actcaggctcaccaaagagaggttaggaaattagagtcctcagaagagaacatgtctagt gaa >Human tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttattactcact aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact cccagcacagaaaaaaaggtagatctgaatgctgatcccctgtgtgagagaaaagaatgg aataagcagaaactgccatgctcagagaatcctagagatnnnactgaagatgttccttgg ataacactaaatagcagcattcagaaagttaatgagtggttttccagaagtgatgaactg ttaggttctgatgactcacatgatggggagtctgaatcaaatgccaaagtagctgatgta ttggacgttctaaatgaggtagatgaatattctggttcttcagagaaaatagacttactg gccagtgatcctcatgaggctttaatatgtaaaagtgaaagagttcactccaaatcagta gagagtaatattgaagacaaaatatttgggaaaacctatcggaagaaggcaagcctcccc aacttaagccatgtaactgaaaatctaattataggagcatttgttactgagccacagata atacaagagcgtcccctcacaaataaattaaagcgtaaaaggagacctacatcaggcctt catcctgaggattttatcaagaaagcagatttggcagttnnncaaaagactcctgaaatg ataaatcagggaactaaccaaacggagcagaatggtcaagtgatgaatattactaatagt ggtcatgagaataaaacaaaaggtgattctattcagaatgagaaaaatcctaacccaata gaatcactcgaaaaagaatctgctttcaaaacgaaagctgaacctataagcagcagtata agcaatatggaactcgaattaaatatccacaattcaaaagcacctaaaaagaataggctg aggaggaagtcttctaccaggcatattcatgcgcttgaactagtagtcagtagaaatcta agcccacctaattgtactgaattgcaaattgatagttgttctagcagtgaagagataaag aaaaaaaagtacaaccaaatgccagtcaggcacagcagaaacctacaactcatggaaggt aaagaacctgcaactggagccaagaagagtaacaagccaaatgaacagacaagtaaaaga catgacagcgatactttcccagagctgaagttaacaaatgcacctggttcttttactaag tgttcaaataccagtgaacttaaagaatttgtcaatcctagccttccaagagaagaaaaa gaagagaaannnctagaaacagttaaagtgtctaataatgctgaagaccccaaagatctc atgttaagtggagaannnagggttttgcaaactgaaagatctgtagagagtagcagtatt tcattggtacctggtactgattatggcactcaggaaagtatctcgttactggaagttagc actctagggnnnaaggcaaaaacagaaccaaataaatgtgtgagtcagtgtgcagcattt gaaaaccccaagggactaattcatggttgttccaaagataatagaaatgacacagaaggc tttaagtatccattgggacatgaagttaaccacagtcgggaaacaagcatagaaatggaa gaaagtgaacttgatgctcagtatttgcagaatacattcaaggtttcaaagcgccagtca 
tttgctccgttttcaaatccaggaaatgcagaagaggaatgtgcaacattctctgcccac tctgggtccttaaagaaacaaagtccaaaagtcacttttgaatgtgaacaaaaggaagaa aatcaaggaaagaatgagtctaatatcaagcctgtacagacagttaatatcactgcaggc tttcctgtggttggtcagaaagataagnnnccagttgataatgccaaatgtagtatcaaa ggaggctctaggttttgtctatcatctcagttcagaggcaacgaaactggactcattact ccaaataaacatggacttttacaaaacccatatcgtataccaccactttttcccatcaag tcatttgttaaaactaaatgtaagaaaaatctgctagaggaaaactttgaggaacattca atgtcacctgaaagagaaatgggaaatgagaacnnnattccaagtacagtgagcacaatt agccgtaataacattagagaaaatgtttttaaagaagccagctcaagcaatattaatgaa gtaggttccagtactaatgaagtgggctccagtattaatgaaataggttccagtnnnnnn nnnnnnnnnnnnnnngatgaaaacattcaagcagaactaggtagaaacagagggccaaaa ttgaatgctatgcttagattaggggttttgcaacctgaggtctataaacaaagtcttcct ggaagtaattgtaagcatcctgaaataaaaaagcaagaatatgaagaannngtagttcag actgttaatacagatttctctccatatctgatttcagataacttagaacagcctatggga agtagtcatgcatctcaggtttgttctgagacacctgatgacctgttagatgatggtgaa ataaaggaagatactagttttgctgaaaatgacattaaggaaagttctgctgtttttagc aaaagcgtccagaaaggagagcttagcaggagtcctagccctttcacccatacacatttg gctcagggttaccgaagaggggccaagaaattagagtcctcagaagagaacttatctagt gag >Mouse tgtggcacagatgctcatgccagctcattacagcctgagaccagcagtttattgctcatt gaagacagaatgaatgcagaaaaggctgaattctgtaataaaagcaaacagcctggcata gcagtgagccagcagagcagatgggctgcaagtaaaggaacatgtaacgacaggcaggtt cccagcactggggaaaaggtaggtccaaacgctgactcccttagtgatagagagaagtgg actcacccgcaaagtctgtgccctgagaattctggagctnnnaccaccgatgttccttgg ataacactaaatagcagcgttcagaaagttaatgagtggttttccagaactggtgaaatg ttaacttctgacagcgcatctgccaggaggcacgagtcaaatgctgaagcagctgttgtg ttggaagtttcaaacgaagtggatgggggttttagttcttcaaggaaaacagacttagta acccccgacccccatcatactttaatgtgtaaaagtggaagagacttctccaaaccagta gaggataatatcagtgataaaatatttgggaaatcctatcagagaaagggaagccgccct cacctgaaccatgtgactgaaattataggcacannnnnntttattacagaaccacagata acacaagagcagcccttcacaaataaattaaaacgtaagagannnnnnagtacatccctt caacctgaggacttcatcaagaaagcagattcagcaggtgttcaaaggactcctgacaac ataaatcagggaactgacctaatggagccaaatgagcaagcagtgagtactaccagtaac 
tgtcaggagaacaaaatagcaggtagtaatctccagaaagagaaaagcgctcatccaact gaatcattgagaaaggaacctgcttccacagcaggagccaaatctataagcaacagtgta agtgatttggaggtagaattaaacgtccacagttcaaaagcacctaagaaaaataggctg aggaggaagtcttctatcaggtgtgctcttccacttgaaccannnatcagtagaaatcca agcccacctacttgtgctgagcttcaaatcgatagttgtggtagcagtgaagaaacaaag aaaaaccattccaaccaacagccagccgggcaccttagagagcctcaactcatcgaagac actgaacctgcagcggatgccaagaagaacgagnnnccaaatgaacacataaggaagaga cgtgccagcgatgctttcccagaagagaaattaatgaacaaagctggtttattaactagc tgttcaagtcctagaaaatctcaagggcctgtcaatcccagccctcagagaacaggaaca gagcaannnnnncttgaaacacgccaaatgtctgacagtgccaaagaactcggggatcgg gtcctaggaggagagcccagtggcaaaaccactgaccgatctgaggagagcaccagcgta tccttggtacctgacactgactacgacactcagaacagtgtctcagtcctggacgctcac actgtcagannntatgcaagaacaggatccgctcagtgtatgactcagtttgtagcaagc gaaaaccccaaggaactcgtccatggctctaacaatgctgggnnnagtggcacagagggt ctcaagccccccttgagacacgcgcttaacctcagtcaggagaaannngtagaaatggaa gacagtgaacttgatactcagtatttgcagaatacatttcaagtttcaaagcgtcagtca tttgctttattttcaaaacctagaagtccccaaaaggactgtnnnnnnnnnnnngctcac tctgtgccctcaaaggaactgagtccaaaggtgacagctaaaggtaaacaaaaagaacgt cagggacaggaagaatttgaaatcagtcacgtacaagcagttgcggccacagtgggcnnn ttacctgtgccctgtcaagaaggtaagnnnctagctgctgatacaatgtgtnnnnnngat agaggttgtaggctttgtccatcatctcattacagaagcggggagaatggactcagcgcc acaggtaaatcaggaatttcacaaaactcacattttaaacaatcagtttctcccatcagg tcatctataaaaactgacaataggaaacctctgacagagggacgatttgagagacataca tcatcaactgagatggcggtgggaaatgagaacattcttcagagtacagtgcacacagtt agcctgaataacnnnagaggaaatgctnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn nnnnnnnnntgtcaagaagccggctcgggcagtattcatgaagtatgttccactnnnnnn nnnnnnnnnnnnnnnggtgactccttcccaggacaactaggtagaaacagagggcctaag gtgaacactgtgcctccattagatagtatgcagcctggtgtctgtcagcaaagtgttcct gtaagtgatnnnaagtatcttgaaataaaaaagnnnnnnnnnnnnnnncaggagggtgag gctgtctgtgcagacttctctccatgtctattctcagaccatcttgagcaatctatgagt ggtnnnaaggtttttcaggtttgctctgagacacctgatgacctgctggatgatgttgaa atacagggacatactagctttggtgaaggtgacataatggagagatctgctgtctttaac 
ggaagcatcctgagaagggagtccagtaggagccctagtcctgtaacccatgcatcgaag tctcagagtctccacagagcgtctaggaaattagaatcgtcagaagagagcgactccact gag >HowlerMon tgtggcacaaatactcatgccagctcattacagcatgagaacagcagtttgttactcact aaagacacactgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggctta gcaaggagccaacataacagatgggctgaaagtgaggaaacatgtaatgataggcagact cccagcacagagaaaaaggtagatgtggatgctgatcccctgcatgggagaaaagaatgg aataagcagaaacctccgtgctctgagaatcctagagatgatactgaagatgttgcttgg ataatgctaaatagcagcattcagaaagttaatgagtggttttccagaagtgatgaactg ttaacttctgatgactcacatgatggggggtctgaatcaaatgccaaagtagctgaagca ttggaagttctaaatgaggtagatggatattctagttcttcagagaaaatagacttactg gccagtgatcctcatgatcatttgatatgtaaaagtgaaagagttcactgcaaatcagta gagagtagtattgaagataaaatatttgggaaaacctatcggaggaaggcaagcctccct aacttgagccacgtaactgaaaatctaattataggagcatttgttactgagccacagata atacaagagcatcctctcacaaataaattaaagcgtaaaaggagagttacatcaggactt catcctgaggattttatcaagaaagcagatttggcagttnnncaaaagactcctgaaaag ataaatcagggaactaaccaaacagagcggaatgatcaagtgatgaatattactaacagt ggtcatgagaataaaacaaaaggtgattctattcagaatgagaacaatcctaacccagta gaatcactggaaaaagaannntcattcaaaagtaaagctgaacctataagcagtagtata agcaatatggaattagaattgaatgtccacaattccaaagcatctaaaaagaataggctg agaaggaagtcttctaccaggcatattcatgagcttgaactagtagtcagtagaaatcta agcccacctaattatactgaagtacaaattgatagttgttctagcagtgaagagataaag aaaaaaaattacaaccaaatgccagtcaggcacagcagaaagctacaactcatggaagat aaagaacgtgcagctagagccaaaaagagtagcaagccaaatgaacaaacaagtaaaaga catgccagtgatactttcccagaactgaggttaacaaacatacctggttcttttactaac tgttcaaatactaatgaatttaaagaatttgtcaatcctagccttccaagagaacaaaca gaagagaaannnctagaaacagttaaactgtctaataatgccaaagaccccaaagatctc atgttaagtggagaannnagtgttttgcaaattgaaagatctgtagagagtagcagtatt ttgttgatacctggtactgattatggcactcaggaaagtatctcattactggaagttagc actctggggnnnaaggcaaaaacagaaccaaataaatgtgtgagtcagtgtgcagcattt gaaaaccccaaggaactaattcatggttgttctaaagatactagaaatggcacagaaggc ttgaagtatccattgggacctgaagttaactacagtcaggaaacaagcatagatatgaga gaaagtgaacttgatactcaatatttgcagaatacattcaaggtttcaaagcgccagtca 
tttgctctgttttcaaatccaggaaatccagaaaaggaatgtgcaacattctctgcctgc tctaggtccttaaagaaacaaagtccaaaggtcactcctgaatgtgaacaaaaggaagaa aatcaaggagagaaagagtctaatatcgagcttgtagagacagttaataccactgcaggc tttcctatggtttgtcagaaagataagnnnccagttgattatgccagatgtatcgaannn ggaggctctaggctttgtctatcatctcagttcagaggcaacgaaactggactcattatt ccaaataaacatggacttttacagaacccatatcatatgtcaccgcttattcccaccagg tcatttgttaaaactaaatgtaagaaaaacctgctagaagaaaactctgaggaacattca atgtcacctgaaagagcaatgggaaacaagaacatcattccaagtacagtgagcacaatt agccataataacnnnagagaaaatgcttttaaagaaaccagctcaagcagtatttatgaa gtaggttccagtactaatgaagcaggttctagtactaatgaagtaggctccagtattaat gaagtaggttccagtgatgaaaacattcaagcagagctaggtagaaacagaaggccaaaa ttgaatgctatgcttagattagggcttctgcaacctgagatttgtaagcaaagtcttcct ataagtgattgtaaacatcctgaaattaaaaagcaagaacatgaagaannngtagttcag actgttaatacagacgtctctctatgtctgatttcatataacctagaacagcatatggga agcagtcatacatctcaggtttgttctgagacacctgacaacctgttagatgatggtgaa ataaaggaagatactagttttgctgaatatggcattaaggagacttctactgtttttagc aaaagtgtccagagaggagagctcagcaggagccctagccctttcacccatacacatttg gctcaggtttaccaaagaggggccaagaaattagagtcctcggaagagaatttatctagt gag PyCogent-1.5.3/tests/data/primates_brca1.tree000644 000765 000024 00000000145 10665667404 022106 0ustar00jrideoutstaff000000 000000 ((TreeShrew,FlyingLem),Mouse,(Galago,(HowlerMon,(Rhesus,(Orangutan,(Gorilla,(Human,Chimpanzee)))))));PyCogent-1.5.3/tests/data/test.psl000644 000765 000024 00000001377 11533073176 020030 0ustar00jrideoutstaff000000 000000 psLayout version 3 match mis- rep. 
N's Q gap Q gap T gap T gap strand Q Q Q Q T T T T block blockSizes qStarts tStarts match match count bases count bases name size start end name size start end count --------------------------------------------------------------------------------------------------------------------------------------------------------------- 33 0 0 0 0 0 0 0 + q1 33 0 33 83 35 0 33 1 33, 0, 0, 32 0 0 0 0 0 0 0 + q1 33 0 32 314 35 0 32 1 32, 0, 0, 33 0 0 0 0 0 0 0 + q1 33 0 33 279 35 0 33 1 33, 0, 0, 33 0 0 0 0 0 0 0 + q1 33 0 33 260 35 0 33 1 33, 0, 0, 33 0 0 0 0 0 0 0 + q1 33 0 33 257 35 0 33 1 33, 0, 0, 33 0 0 0 0 0 0 0 - q2 33 0 33 501-Rev-Primer-Match 33 0 33 1 33, 0, 0, PyCogent-1.5.3/tests/data/test.sff000644 000765 000024 00000004250 11347461377 020010 0ustar00jrideoutstaff000000 000000 .sffà¸TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTCAG M4FA6P1OK01CGMHQhe`ffcdddnl’X e`\bDYsqp¸n÷`VjÄ ] ¥jc»,lE a2d¬\:;   ! #   3   % ;!", #"-"                             TCAGATCTGAGCTGGGTCATAGCTGCCTCCGTAGGAGGTGCCTCCCTACGGCGCNNNANNNNNGNNNNNNNNNNNNN   !  
.mft1.00ž 454 FA6P1OK R_2008_05_28_17_11_38_FLX02070135_adminrig_KnightLauber /data/2008_05_28/R_2008_05_28_17_11_38_FLX02070135_adminrig_KnightLauber/D_2008_05_28_21_13_06_FLX02070135_KnightLauber_FullAnalysisAmplicons /data/2008_05_28/R_2008_05_28_17_11_38_FLX02070135_adminrig_KnightLauber/D_2008_05_28_21_13_06_FLX02070135_KnightLauber_FullAnalysisAmplicons/../D_2008_05_29_13_52_01_FLX02070135_Knight_Lauber_jones_SignalProcessingAmplicons 1.1.03 FA6P1OK01CGMHQ¹ÿPyCogent-1.5.3/misc/boring000644 000765 000024 00000001167 10665667404 016421 0ustar00jrideoutstaff000000 000000 # Boring file regexps for darcs: \.hi$ \.o$ \.o\.cmd$ # *.ko files aren't boring by default because they might # be Korean translations rather than kernel modules. # \.ko$ \.ko\.cmd$ \.mod\.c$ (^|/)\.tmp_versions($|/) (^|/)CVS($|/) (^|/)RCS($|/) ~$ #(^|/)\.[^/] (^|/)_darcs($|/) \.bak$ \.BAK$ \.orig$ (^|/)vssver\.scc$ \.swp$ (^|/)MT($|/) (^|/)\{arch\}($|/) (^|/).arch-ids($|/) (^|/), \.class$ \.prof$ (^|/)\.DS_Store$ (^|/)BitKeeper($|/) (^|/)ChangeSet($|/) (^|/)\.svn($|/) \.py[co]$ \# \.cvsignore$ (^|/)Thumbs\.db$ # # Cogent specific entries start here # ^build($|/) \.pdf$ \.so$ _.*.c$ ^doc/examples/.*\.py$ doctest_.*\.py$ PyCogent-1.5.3/include/array_interface.h000644 000765 000024 00000000632 12024702176 021174 0ustar00jrideoutstaff000000 000000 #include "Python.h" #define PYCOGENT_VERSION "1.5.3" /* Array Interface flags */ #define CONTIGUOUS 0x001 #define FORTRAN 0x002 #define ALIGNED 0x100 #define NOTSWAPPED 0x200 #define WRITEABLE 0x400 typedef struct PyArrayInterface { int version; int nd; char typekind; int itemsize; int flags; Py_intptr_t *shape; Py_intptr_t *strides; void *data; } PyArrayInterface; PyCogent-1.5.3/include/numerical_pyrex.pyx000644 000765 000024 00000011047 12024702176 021637 0ustar00jrideoutstaff000000 000000 # Array checking functions for Pyrex code that takes NumPy arrays # # The idea here is that many array functions: # - Need to know the dimensions of their array 
inputs. # - Require some dimensions of different arrays to match. # - Don't ever need to consider a dimension of length 0. # # eg: x = y = z = 0 # dataA = checkArrayDouble2D(A, &x, &y) # dataB = checkArrayDouble2D(B, &z, &x) # x must match # __version__ = "('1', '5', '3')" cdef extern from "Python.h": void *PyCObject_AsVoidPtr(object) cdef extern from "array_interface.h": struct PyArrayInterface: int version int nd char typekind int itemsize int flags int *shape, *strides void *data int CONTIGUOUS ctypedef object ArrayType cdef double *uncheckedArrayDouble(ArrayType A): cdef PyArrayInterface *a cobj = A.__array_struct__ a = PyCObject_AsVoidPtr(cobj) return a.data cdef void *checkArray(ArrayType A, char typecode, int itemsize, int nd, int **dims) except NULL: cdef PyArrayInterface *a cdef int length, size cdef char kind if A is None: raise TypeError("Array required, got None") cobj = A.__array_struct__ a = PyCObject_AsVoidPtr(cobj) if a.version != 2: raise ValueError( "Unexpected array interface version %s" % str(a.version)) cdef char typecode2 typecode2 = a.typekind if typecode2 != typecode: raise TypeError("'%s' type array required, got '%s'" % (chr(typecode), chr(typecode2))) if a.itemsize != itemsize: raise TypeError("'%s%s' type array required, got '%s%s'" % (chr(typecode), itemsize, chr(typecode2), a.itemsize)) if a.nd != nd: raise ValueError("%s dimensional array required, got %s" % (nd, a.nd)) if not a.flags & CONTIGUOUS: raise ValueError ('Noncontiguous array') cdef int dimension, val cdef int *var for dimension from 0 <= dimension < nd: val = a.shape[dimension] var = dims[dimension] if var[0] == 0: # Length unspecified, take it from the provided array var[0] = val elif var[0] != val: # Length already specified, but not the same raise ValueError("Dimension %s is %s, expected %s" % (dimension, val, var[0])) else: # Length matches what was expected pass return a.data cdef void *checkArray1D(ArrayType a, char typecode, int size, int *x) except NULL: cdef int 
*dims[1] dims[0] = x return checkArray(a, typecode, size, 1, dims) cdef void *checkArray2D(ArrayType a, char typecode, int size, int *x, int *y) except NULL: cdef int *dims[2] dims[0] = x dims[1] = y return checkArray(a, typecode, size, 2, dims) cdef void *checkArray3D(ArrayType a, char typecode, int size, int *x, int *y, int *z) except NULL: cdef int *dims[3] dims[0] = x dims[1] = y dims[2] = z return checkArray(a, typecode, size, 3, dims) cdef void *checkArray4D(ArrayType a, char typecode, int size, int *w, int *x, int *y, int *z) except NULL: cdef int *dims[4] dims[0] = w dims[1] = x dims[2] = y dims[3] = z return checkArray(a, typecode, size, 4, dims) cdef double * checkArrayDouble1D(ArrayType a, int *x) except NULL: return checkArray1D(a, c'f', sizeof(double), x) cdef double * checkArrayDouble2D(ArrayType a, int *x, int *y) except NULL: return checkArray2D(a, c'f', sizeof(double), x, y) cdef double * checkArrayDouble3D(ArrayType a, int *x, int *y, int *z) except NULL: return checkArray3D(a, c'f', sizeof(double), x, y, z) cdef double * checkArrayDouble4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: return checkArray4D(a, c'f', sizeof(double), w, x, y, z) cdef long * checkArrayLong1D(ArrayType a, int *x) except NULL: return checkArray1D(a, c'i', sizeof(long), x) cdef long * checkArrayLong2D(ArrayType a, int *x, int *y) except NULL: return checkArray2D(a, c'i', sizeof(long), x, y) cdef long * checkArrayLong3D(ArrayType a, int *x, int *y, int *z) except NULL: return checkArray3D(a, c'i', sizeof(long), x, y, z) cdef long * checkArrayLong4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: return checkArray4D(a, c'i', sizeof(long), w, x, y, z) PyCogent-1.5.3/doc/_static/000755 000765 000024 00000000000 12024703636 016436 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/doc/ChangeLog.rst000755 000765 000024 00000000000 12024703636 021347 2../ChangeLogustar00jrideoutstaff000000 000000 PyCogent-1.5.3/doc/coding_guidelines.rst000644 000765 000024 
Coding guidelines
=================

As project size increases, consistency increases in importance. Unit testing
and a consistent style are critical to having trusted code to integrate.
Also, guesses about names and interfaces will be correct more often.

What should I call my variables?
--------------------------------

- *Choose the name that people will most likely guess.* Make it descriptive,
  but not too long: ``curr_record`` is better than ``c``, or ``curr``, or
  ``current_genbank_record_from_database``.

- *Good names are hard to find.* Don't be afraid to change names except when
  they are part of interfaces that other people are also using. It may take
  some time working with the code to come up with reasonable names for
  everything: if you have unit tests, it's easy to change them, especially
  with global search and replace.

- *Use singular names for individual things, plural names for collections.*
  For example, you'd expect ``self.Name`` to hold something like a single
  string, but ``self.Names`` to hold something that you could loop through
  like a list or dict. Sometimes the decision can be tricky: is
  ``self.Index`` an int holding a position, or a dict holding records keyed
  by name for easy lookup? If you find yourself wondering these things, the
  name should probably be changed to avoid the problem: try
  ``self.Position`` or ``self.LookUp``.

- *Don't make the type part of the name.* You might want to change the
  implementation later. Use ``Records`` rather than ``RecordDict`` or
  ``RecordList``, etc. Don't use Hungarian Notation either (i.e. where you
  prefix the name with the type).

- *Make the name as precise as possible.* If the variable is the name of the
  input file, call it ``infile_name``, not ``input`` or ``file`` (which you
  shouldn't use anyway, since they shadow built-in names), and not
  ``infile`` (because that looks like it should be a file object, not just
  its name).
- *Use* ``result`` *to store the value that will be returned from a method
  or function.* Use ``data`` for input in cases where the function or method
  acts on arbitrary data (e.g. sequence data, or a list of numbers, etc.)
  unless a more descriptive name is appropriate.

- *One-letter variable names should only occur in math functions or as loop
  iterators with limited scope.* Limited scope covers things like
  ``for k in keys: print k``, where ``k`` survives only a line or two. Loop
  iterators should refer to the variable that they're looping through:
  ``for k in keys, i in items``, or ``for key in keys, item in items``. If
  the loop is long or there are several 1-letter variables active in the
  same scope, rename them.

- *Limit your use of abbreviations.* A few well-known abbreviations are OK,
  but you don't want to come back to your code in 6 months and have to
  figure out what ``sptxck2`` is. It's worth it to spend the extra time
  typing ``species_taxon_check_2``, but that's still a horrible name: what's
  check number 1? Far better to go with something like
  ``taxon_is_species_rank`` that needs no explanation, especially if the
  variable is only used once or twice.

Acceptable abbreviations
^^^^^^^^^^^^^^^^^^^^^^^^

The following list of abbreviations can be considered well-known and used
with impunity within mixed-name variables, but some should not be used by
themselves, as they would conflict with common functions, Python built-ins,
or raise an exception. Do not use the following by themselves as variable
names: ``dir``, ``exp`` (a common ``math`` module function), ``in``,
``max``, and ``min``. They can, however, be used as part of a name, e.g.
``matrix_exp``.
+--------------------+--------------+
| Full               | Abbreviated  |
+====================+==============+
| alignment          | aln          |
+--------------------+--------------+
| archaeal           | arch         |
+--------------------+--------------+
| auxiliary          | aux          |
+--------------------+--------------+
| bacterial          | bact         |
+--------------------+--------------+
| citation           | cite         |
+--------------------+--------------+
| current            | curr         |
+--------------------+--------------+
| database           | db           |
+--------------------+--------------+
| dictionary         | dict         |
+--------------------+--------------+
| directory          | dir          |
+--------------------+--------------+
| end of file        | eof          |
+--------------------+--------------+
| eukaryotic         | euk          |
+--------------------+--------------+
| frequency          | freq         |
+--------------------+--------------+
| expected           | exp          |
+--------------------+--------------+
| index              | idx          |
+--------------------+--------------+
| input              | in           |
+--------------------+--------------+
| maximum            | max          |
+--------------------+--------------+
| minimum            | min          |
+--------------------+--------------+
| mitochondrial      | mt           |
+--------------------+--------------+
| number             | num          |
+--------------------+--------------+
| observed           | obs          |
+--------------------+--------------+
| original           | orig         |
+--------------------+--------------+
| output             | out          |
+--------------------+--------------+
| parameter          | param        |
+--------------------+--------------+
| phylogeny          | phylo        |
+--------------------+--------------+
| previous           | prev         |
+--------------------+--------------+
| probability        | prob         |
+--------------------+--------------+
| protein            | prot         |
+--------------------+--------------+
| record             | rec          |
+--------------------+--------------+
| reference          | ref          |
+--------------------+--------------+
| sequence           | seq          |
+--------------------+--------------+
| standard deviation | stdev        |
+--------------------+--------------+
| statistics         | stats        |
+--------------------+--------------+
| string             | str          |
+--------------------+--------------+
| structure          | struct       |
+--------------------+--------------+
| temporary          | temp         |
+--------------------+--------------+
| taxonomic          | tax          |
+--------------------+--------------+
| variance           | var          |
+--------------------+--------------+

What are the naming conventions?
--------------------------------

.. tabularcolumns:: |p{3.4cm}|p{6cm}|p{5cm}|

.. csv-table::
    :header: Type , Convention , Example

    function , ``verb_with_underscores`` , ``find_all``
    variable , ``noun_with_underscores`` , ``curr_index``
    constant , ``NOUN_ALL_CAPS`` , ``ALLOWED_RNA_PAIRS``
    class , ``MixedCaseNoun`` , ``RnaSequence``
    public property , ``MixedCaseNoun`` , ``IsPaired``
    private property , ``_noun_with_leading_underscore`` , ``_is_updated``
    public method , ``mixedCaseExceptFirstWordVerb`` , ``stripDegenerate``
    private method , ``_verb_with_leading_underscore`` , ``_check_if_paired``
    really private data , ``__two_leading_underscores`` , ``__delegator_object_ref``
    parameters that match properties , ``SameAsProperty`` , "``def __init__(data, Alphabet=None)``"
    factory function , ``MixedCase`` , ``InverseDict``
    module , ``lowercase_with_underscores`` , ``unit_test``
    global variables , ``gMixedCaseWithLeadingG`` , no examples - should be rare!

- *It is important to follow the naming conventions because they make it
  much easier to guess what a name refers to*. In particular, it should be
  easy to guess what scope a name is defined in, what it refers to, whether
  it's OK to change its value, and whether its referent is callable. The
  following rules provide these distinctions.

- ``lowercase_with_underscores`` *for modules and internal variables
  (including function/method parameters).*

- ``MixedCase`` for *classes* and *public properties*, and for *factory
  functions* that act like additional constructors for a class.

- ``mixedCaseExceptFirstWord`` for *public methods and functions*.
- ``_lowercase_with_leading_underscore`` for *private* functions, methods,
  and properties.

- ``__lowercase_with_two_leading_underscores`` for *private* properties and
  functions that *must not be overwritten* by a subclass.

- ``CAPS_WITH_UNDERSCORES`` for named *constants*.

- ``gMixedCase`` (i.e. mixed case prefixed with 'g') for *globals*. Globals
  should be used extremely rarely and with caution, even if you sneak them
  in using the Singleton pattern or some similar system.

- *Underscores can be left out if the words read OK run together.*
  ``infile`` and ``outfile`` rather than ``in_file`` and ``out_file``;
  ``infile_name`` and ``outfile_name`` rather than ``in_file_name`` and
  ``out_file_name`` or ``infilename`` and ``outfilename`` (getting too long
  to read effortlessly).

How do I organize my modules (source files)?
--------------------------------------------

- *Have a docstring with a description of the module's functions*. If the
  description is long, the first line should be a short summary that makes
  sense on its own, separated from the rest by a newline.

- *All code, including import statements, should follow the docstring.*
  Otherwise, the docstring will not be recognized by the interpreter, and
  you will not have access to it in interactive sessions (i.e. through
  ``obj.__doc__``) or when generating documentation with automated tools.

- *Import built-in modules first, followed by third-party modules, followed
  by any changes to the path and your own modules.* Especially, additions to
  the path and names of your modules are likely to change rapidly: keeping
  them in one place makes them easier to find.

- *Don't use* ``from module import *``, *instead use* ``from module import
  Name, Name2, Name3...`` *or possibly* ``import module``. This makes it
  *much* easier to see name collisions and to replace implementations.

Example of module structure
^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    #!/usr/bin/env python
    """Provides NumberList and FrequencyDistribution, classes for statistics.
    NumberList holds a sequence of numbers, and defines several statistical
    operations (mean, stdev, etc.) FrequencyDistribution holds a mapping
    from items (not necessarily numbers) to counts, and defines operations
    such as Shannon entropy and frequency normalization.
    """

    from math import sqrt, log, e
    from random import choice, random
    from Utils import indices

    class NumberList(list):
        pass    # much code deleted

    class FrequencyDistribution(dict):
        pass    # much code deleted

    # use the following when the module can meaningfully be called as a script.
    if __name__ == '__main__':
        # code to execute if called from command-line
        pass    # do nothing - code deleted

How should I write comments?
----------------------------

- *Always update the comments when the code changes.* Incorrect comments are
  far worse than no comments, since they are actively misleading.

- *Comments should say more than the code itself.* Examine your comments
  carefully: they may indicate that you'd be better off rewriting your code
  (especially, *renaming your variables* and getting rid of the comment). In
  particular, don't scatter magic numbers and other constants that have to
  be explained through your code. It's far better to use variables whose
  names are self-documenting, especially if you use the same constant more
  than once. Also, think about making constants into class or instance data,
  since it's all too common for 'constants' to need to change or to be
  needed in several methods.
  +-------+------------------------------------------------------------+
  | Wrong | ``win_size -= 20 # decrement win_size by 20``              |
  +-------+------------------------------------------------------------+
  | OK    | ``win_size -= 20 # leave space for the scroll bar``        |
  +-------+------------------------------------------------------------+
  | Right | ``self._scroll_bar_size = 20``                             |
  +-------+------------------------------------------------------------+
  |       | ``win_size -= self._scroll_bar_size``                      |
  +-------+------------------------------------------------------------+

- *Use comments starting with #, not strings, inside blocks of code.* Python
  ignores real comments, but must allocate storage for strings (which can be
  a performance disaster inside an inner loop).

- *Start each method, class and function with a docstring using triple
  double quotes (""").* The docstring should start with a 1-line description
  that makes sense by itself (many automated formatting tools, and the IDE,
  use this). This should be followed by a blank line, followed by
  descriptions of the parameters (if any). Finally, add any more detailed
  information, such as a longer description, notes about the algorithm,
  detailed notes about the parameters, etc. If there is a usage example, it
  should appear at the end. Make sure any descriptions of parameters have
  the correct spelling, case, etc. For example:

  ::

    def __init__(self, data, name='', alphabet=None):
        """Returns new Sequence object with specified data, name, alphabet.

        Arguments:
            - data: The sequence data. Should be a sequence of characters.
            - name: Arbitrary label for the sequence. Should be string-like.
            - alphabet: Set of allowed characters. Should support
              'for x in y' syntax. None by default.

        Note: if alphabet is None, performs no validation.
        """

- *Always update the docstring when the code changes.* Like outdated
  comments, outdated docstrings can waste a lot of time. "Correct examples
  are priceless, but incorrect examples are worse than worthless."
  `Jim Fulton`_.

How should I format my code?
----------------------------

- *Use 4 spaces for indentation.* Do not use tabs (set your editor to
  convert tabs to spaces). The behaviour of tabs is not predictable across
  platforms, and mixing tabs and spaces will cause syntax errors. If we all
  use the same indentation, collaboration is much easier.

- *Lines should not be longer than 79 characters.* Long lines are
  inconvenient in some editors. Use \\ for line continuation. Note that
  there cannot be whitespace after the \\.

- *Blank lines should be used to highlight class and method definitions.*
  Separate class definitions by two blank lines. Separate methods by one
  blank line.

- *Be consistent with the use of whitespace around operators.* Inconsistent
  whitespace makes it harder to see at a glance what is grouped together.

  +------+--------------------------+
  | Good | ``((a+b)*(c+d))``        |
  +------+--------------------------+
  | OK   | ``((a + b) * (c + d))``  |
  +------+--------------------------+
  | Bad  | ``( (a+ b) *(c +d ))``   |
  +------+--------------------------+

- *Don't put whitespace after delimiters or inside slicing delimiters.*
  Whitespace here makes it harder to see what's associated.

  +------+-------------+------------------+
  | Good | ``(a+b)``   | ``d[k]``         |
  +------+-------------+------------------+
  | Bad  | ``( a+b )`` | ``d [k], d[ k]`` |
  +------+-------------+------------------+

How should I test my code?
--------------------------

There are two basic approaches for testing code in Python: unit testing and
doc testing. Their purpose is the same: to check that execution of code
given some input produces a specified output. The cases to which the two
approaches lend themselves are different. An excellent discourse on testing
code and the pros and cons of these alternatives is provided in a
presentation by `Jim Fulton`_, which is recommended reading. A significant
change since that presentation is that ``doctest`` can now read content that
is not contained within docstrings.
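The contrast between the two styles can be sketched in one file. The ``mean`` function and its module layout below are invented for illustration, not part of PyCogent; in practice the doctest lives in the production module and the ``TestCase`` in a separate ``test_*.py`` file:

```python
def mean(nums):
    """Return the arithmetic mean of a non-empty list of numbers.

    Doctest style -- the example below is executable documentation:

    >>> mean([1, 2, 3])
    2.0
    """
    if not nums:
        raise ValueError("mean() of empty sequence")
    return sum(nums) / float(len(nums))

# unittest style -- the same behaviour checked by a TestCase:
import unittest

class MeanTests(unittest.TestCase):
    def test_mean_simple(self):
        """mean should average a short list correctly"""
        self.assertEqual(mean([1, 2, 3]), 2.0)

    def test_mean_empty(self):
        """mean should raise ValueError on an empty list"""
        self.assertRaises(ValueError, mean, [])

if __name__ == '__main__':
    import doctest
    doctest.testmod()   # runs the docstring example
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTests)
    unittest.TextTestRunner().run(suite)   # runs the TestCase methods
```

Note that both styles check the same contract; the doctest doubles as documentation, while the ``TestCase`` scales better to many cases and shared fixtures.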
Another comparison of these two approaches, along with a third
(``py.test``), is also available_. To see examples of both styles of testing
look in ``PyCogent/tests``: files ending in .rst are using ``doctest``,
those ending in .py are using ``unittest``.

.. _`Jim Fulton`: http://www.python.org/pycon/dc2004/papers/4/PyCon2004DocTestUnit.pdf
.. _available: http://agiletesting.blogspot.com/2005/11/articles-and-tutorials-page-updated.html

In general, it's easier to start writing ``doctest``'s, as you don't need to
learn the ``unittest`` API, but the latter gives much greater control.
Whatever approach is employed, the general principle is that every line of
code should be tested. It is critical that your code be fully tested before
you draw conclusions from results it produces. For scientific work, bugs
don't just mean unhappy users who you'll never actually meet: they may mean
retracted publications.

Tests are an opportunity to invent the interface(s) you want. Write the test
for a method before you write the method: often, this helps you figure out
what you would want to call it and what parameters it should take. It's OK
to write the tests a few methods at a time, and to change them as your ideas
about the interface change. However, you shouldn't change them once you've
told other people what the interface is.

Never treat prototypes as production code. It's fine to write prototype code
without tests to try things out, but when you've figured out the algorithm
and interfaces you must rewrite it *with tests* to consider it finished.
Often, this helps you decide what interfaces and functionality you actually
need and what you can get rid of.

"Code a little, test a little". For production code, write a couple of
tests, then a couple of methods, then a couple more tests, then a couple
more methods, then maybe change some of the names or generalize some of the
functionality.
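The test-first workflow above can be sketched concretely. The ``count_matches`` function and its test names are hypothetical, chosen only to show the order of work: the test pins down the interface before any implementation exists:

```python
import unittest

# Step 1: write the test first; it forces you to decide the name,
# the parameters, and the failure behaviour before writing any code.
class CountMatchesTests(unittest.TestCase):
    def test_count_matches_basic(self):
        """count_matches should count positions where two seqs agree"""
        self.assertEqual(count_matches('ACGT', 'ACGA'), 3)

    def test_count_matches_length_mismatch(self):
        """count_matches should raise ValueError on unequal lengths"""
        self.assertRaises(ValueError, count_matches, 'ACGT', 'AC')

# Step 2: only now write the function, to satisfy the tests above.
def count_matches(seq1, seq2):
    """Returns the number of identical positions in two equal-length seqs."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences differ in length")
    return sum(1 for a, b in zip(seq1, seq2) if a == b)

if __name__ == '__main__':
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
            CountMatchesTests)
    unittest.TextTestRunner().run(suite)
```

Writing the two test methods first settles both the happy path and the error contract; the implementation then has nothing left to decide.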
If you have a huge amount of code where 'all you have to do is write the
tests', you're probably closer to 30% done than 90%. Testing vastly reduces
the time spent debugging, since whatever went wrong has to be in the code
you wrote since the last test suite run. And remember to use Python's
interactive interpreter for quick checks of syntax and ideas.

Run the test suite when you change `anything`. Even if a change seems
trivial, it will only take a couple of seconds to run the tests and then
you'll be sure. This can eliminate long and frustrating debugging sessions
where the change turned out to have been made long ago, but didn't seem
significant at the time.

Some ``unittest`` pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^

- *Use the* ``unittest`` *framework with tests in a separate file for each
  module.* Name the test file ``test_module_name.py``. Keeping the tests
  separate from the code reduces the temptation to change the tests when the
  code doesn't work, and makes it easy to verify that a completely new
  implementation presents the same interface (behaves the same) as the old.

- *Use* ``cogent.util.unit_test`` *if you are doing anything with floating
  point numbers or permutations* (use ``assertFloatEqual``). Do *not* try to
  compare floating point numbers using ``assertEqual`` if you value your
  sanity. ``assertFloatEqualAbs`` and ``assertFloatEqualRel`` can
  specifically test for absolute and relative differences if the default
  behavior is not giving you what you want. Similarly, ``assertEqualItems``,
  ``assertSameItems``, etc. can be useful when testing permutations.

- *Test the interface of each class in your code by defining at least one*
  ``TestCase`` *with the name* ``ClassNameTests``. This should contain tests
  for everything in the public interface.

- *If the class is complicated, you may want to define additional tests with
  names* ``ClassNameTests_test_type``. These might subclass
  ``ClassNameTests`` in order to share ``setUp`` methods, etc.
- *Tests of private methods should be in a separate* ``TestCase`` *called*
  ``ClassNameTests_private``. Private methods may change if you change the
  implementation. It is not required that test cases for private methods
  pass when you change things (that's why they're private, after all),
  though it is often useful to have these tests for debugging.

- *Test `all` the methods in your class.* You should assume that any method
  you haven't tested has bugs. The convention for naming tests is
  ``test_method_name``. Any leading and trailing underscores on the method
  name can be ignored for the purposes of the test; however, *all tests must
  start with the literal substring* ``test`` *for* ``unittest`` *to find
  them.* If the method is particularly complex, or has several discretely
  different cases you need to check, use ``test_method_name_suffix``, e.g.
  ``test_init_empty``, ``test_init_single``, ``test_init_wrong_type``, etc.
  for testing ``__init__``.

- *Write good docstrings for all your test methods.* When you run the test
  with the ``-v`` command-line switch for verbose output, the docstring for
  each test will be printed along with ``...OK`` or ``...FAILED`` on a
  single line. It is thus important that your docstring is short and
  descriptive, and makes sense in this context.

  **Good docstrings:**

  ::

    NumberList.var should raise ValueError on empty or 1-item list
    NumberList.var should match values from R if list has >2 items
    NumberList.__init__ should raise error on values that fail float()
    FrequencyDistribution.var should match corresponding NumberList var

  **Bad docstrings:**

  ::

    var should calculate variance            # lacks class name, not descriptive
    Check initialization of a NumberList     # doesn't say what's expected
    Tests of the NumberList initialization.  # ditto

- *Module-level functions should be tested in their own* ``TestCase``\ *,
  called* ``modulenameTests``. Even if these functions are simple, it's
  important to check that they work as advertised.
- *It is much more important to test several small cases that you can check
  by hand than a single large case that requires a calculator.* Don't trust
  spreadsheets for numerical calculations -- use R instead!

- *Make sure you test all the edge cases: what happens when the input is
  None, or '', or 0, or negative?* What happens at values that cause a
  conditional to go one way or the other? Does incorrect input raise the
  right exceptions? Can your code accept subclasses or superclasses of the
  types it expects? What happens with very large input?

- *To test permutations, check that the original and shuffled version are
  different, but that the sorted original and sorted shuffled version are
  the same.* Make sure that you get *different* permutations on repeated
  runs and when starting from different points.

- *To test random choices, figure out how many of each choice you expect in
  a large sample (say, 1000 or a million) using the binomial distribution or
  its normal approximation.* Run the test several times and check that
  you're within, say, 3 standard deviations of the mean.

Example of a ``unittest`` test module structure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    #!/usr/bin/env python
    """Tests NumberList and FrequencyDistribution, classes for statistics."""

    from cogent.util.unit_test import TestCase, main
    # for floating point tests use unittestfp
    from statistics import NumberList, FrequencyDistribution

    class NumberListTests(TestCase):    # remember to subclass TestCase
        """Tests of the NumberList class."""

        def setUp(self):
            """Define a few standard NumberLists."""
            self.Null = NumberList()            # test empty init
            self.Empty = NumberList([])         # test init with empty sequence
            self.Single = NumberList([5])       # single item
            self.Zero = NumberList([0])         # single, False item
            self.Three = NumberList([1,2,3])    # multiple items
            self.ZeroMean = NumberList([1,-1])  # items nonzero, mean zero
            self.ZeroVar = NumberList([1,1,1])  # items nonzero, mean nonzero,
                                                # variance zero
            # etc.
            # These objects are shared by all tests, and created new each
            # time a method starting with the string 'test' is called (i.e.
            # the same object does not persist between tests: rather, you
            # get separate copies).

        def test_mean_empty(self):
            """NumberList.mean() should raise ValueError on empty object"""
            for empty in (self.Null, self.Empty):
                self.assertRaises(ValueError, empty.mean)

        def test_mean_single(self):
            """NumberList.mean() should return item if only 1 item in list"""
            for single in (self.Single, self.Zero):
                self.assertEqual(single.mean(), single[0])

        # other tests of mean

        def test_var_failures(self):
            """NumberList.var() should raise ZeroDivisionError if <2 items"""
            for small in (self.Null, self.Empty, self.Single, self.Zero):
                self.assertRaises(ZeroDivisionError, small.var)

        # other tests of var
        # tests of other methods

    class FrequencyDistributionTests(TestCase):
        pass    # much code deleted

    # tests of other classes

    if __name__ == '__main__':    # run tests if called from command-line
        main()

PyCogent-1.5.3/doc/COGENT_LICENSE.rst

.. _cogent_license:

PyCogent License
================

::

    GNU GENERAL PUBLIC LICENSE
    Version 2, June 1991

    Copyright 2007-2009 (C) 1989, 1991 Free Software Foundation, Inc.
    59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

    Everyone is permitted to copy and distribute verbatim copies of this
    license document, but changing it is not allowed.

    Preamble

    The licenses for most software are designed to take away your freedom
    to share and change it. By contrast, the GNU General Public License is
    intended to guarantee your freedom to share and change free
    software--to make sure the software is free for all its users. This
    General Public License applies to most of the Free Software
    Foundation's software and to any other program whose authors commit to
    using it. (Some other Free Software Foundation software is covered by
    the GNU Library General Public License instead.)
    You can apply it to your programs, too.

    When we speak of free software, we are referring to freedom, not price.
    Our General Public Licenses are designed to make sure that you have the
    freedom to distribute copies of free software (and charge for this
    service if you wish), that you receive source code or can get it if you
    want it, that you can change the software or use pieces of it in new
    free programs; and that you know you can do these things.

    To protect your rights, we need to make restrictions that forbid anyone
    to deny you these rights or to ask you to surrender the rights. These
    restrictions translate to certain responsibilities for you if you
    distribute copies of the software, or if you modify it.

    For example, if you distribute copies of such a program, whether gratis
    or for a fee, you must give the recipients all the rights that you
    have. You must make sure that they, too, receive or can get the source
    code. And you must show them these terms so they know their rights.

    We protect your rights with two steps: (1) copyright the software, and
    (2) offer you this license which gives you legal permission to copy,
    distribute and/or modify the software.

    Also, for each author's protection and ours, we want to make certain
    that everyone understands that there is no warranty for this free
    software. If the software is modified by someone else and passed on, we
    want its recipients to know that what they have is not the original, so
    that any problems introduced by others will not reflect on the original
    authors' reputations.

    Finally, any free program is threatened constantly by software patents.
    We wish to avoid the danger that redistributors of a free program will
    individually obtain patent licenses, in effect making the program
    proprietary. To prevent this, we have made it clear that any patent
    must be licensed for everyone's free use or not licensed at all.

    The precise terms and conditions for copying, distribution and
    modification follow.

    GNU GENERAL PUBLIC LICENSE
    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

    0. This License applies to any program or other work which contains a
    notice placed by the copyright holder saying it may be distributed
    under the terms of this General Public License. The "Program", below,
    refers to any such program or work, and a "work based on the Program"
    means either the Program or any derivative work under copyright law:
    that is to say, a work containing the Program or a portion of it,
    either verbatim or with modifications and/or translated into another
    language. (Hereinafter, translation is included without limitation in
    the term "modification".) Each licensee is addressed as "you".

    Activities other than copying, distribution and modification are not
    covered by this License; they are outside its scope. The act of running
    the Program is not restricted, and the output from the Program is
    covered only if its contents constitute a work based on the Program
    (independent of having been made by running the Program). Whether that
    is true depends on what the Program does.

    1. You may copy and distribute verbatim copies of the Program's source
    code as you receive it, in any medium, provided that you conspicuously
    and appropriately publish on each copy an appropriate copyright notice
    and disclaimer of warranty; keep intact all the notices that refer to
    this License and to the absence of any warranty; and give any other
    recipients of the Program a copy of this License along with the
    Program.

    You may charge a fee for the physical act of transferring a copy, and
    you may at your option offer warranty protection in exchange for a fee.

    2. You may modify your copy or copies of the Program or any portion of
    it, thus forming a work based on the Program, and copy and distribute
    such modifications or work under the terms of Section 1 above, provided
    that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices stating
    that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any part
    thereof, to be licensed as a whole at no charge to all third parties
    under the terms of this License.

    c) If the modified program normally reads commands interactively when
    run, you must cause it, when started running for such interactive use
    in the most ordinary way, to print or display an announcement including
    an appropriate copyright notice and a notice that there is no warranty
    (or else, saying that you provide a warranty) and that users may
    redistribute the program under these conditions, and telling the user
    how to view a copy of this License. (Exception: if the Program itself
    is interactive but does not normally print such an announcement, your
    work based on the Program is not required to print an announcement.)

    These requirements apply to the modified work as a whole. If
    identifiable sections of that work are not derived from the Program,
    and can be reasonably considered independent and separate works in
    themselves, then this License, and its terms, do not apply to those
    sections when you distribute them as separate works. But when you
    distribute the same sections as part of a whole which is a work based
    on the Program, the distribution of the whole must be on the terms of
    this License, whose permissions for other licensees extend to the
    entire whole, and thus to each and every part regardless of who wrote
    it.

    Thus, it is not the intent of this section to claim rights or contest
    your rights to work written entirely by you; rather, the intent is to
    exercise the right to control the distribution of derivative or
    collective works based on the Program.

    In addition, mere aggregation of another work not based on the Program
    with the Program (or with a work based on the Program) on a volume of a
    storage or distribution medium does not bring the other work under the
    scope of this License.

    3. You may copy and distribute the Program (or a work based on it,
    under Section 2) in object code or executable form under the terms of
    Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable source
    code, which must be distributed under the terms of Sections 1 and 2
    above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three years,
    to give any third party, for a charge no more than your cost of
    physically performing source distribution, a complete machine-readable
    copy of the corresponding source code, to be distributed under the
    terms of Sections 1 and 2 above on a medium customarily used for
    software interchange; or,

    c) Accompany it with the information you received as to the offer to
    distribute corresponding source code. (This alternative is allowed only
    for noncommercial distribution and only if you received the program in
    object code or executable form with such an offer, in accord with
    Subsection b above.)

    The source code for a work means the preferred form of the work for
    making modifications to it. For an executable work, complete source
    code means all the source code for all modules it contains, plus any
    associated interface definition files, plus the scripts used to control
    compilation and installation of the executable. However, as a special
    exception, the source code distributed need not include anything that
    is normally distributed (in either source or binary form) with the
    major components (compiler, kernel, and so on) of the operating system
    on which the executable runs, unless that component itself accompanies
    the executable.

    If distribution of executable or object code is made by offering access
    to copy from a designated place, then offering equivalent access to
    copy the source code from the same place counts as distribution of the
    source code, even though third parties are not compelled to copy the
    source along with the object code.

    4. You may not copy, modify, sublicense, or distribute the Program
    except as expressly provided under this License. Any attempt otherwise
    to copy, modify, sublicense or distribute the Program is void, and will
    automatically terminate your rights under this License. However,
    parties who have received copies, or rights, from you under this
    License will not have their licenses terminated so long as such parties
    remain in full compliance.

    5. You are not required to accept this License, since you have not
    signed it. However, nothing else grants you permission to modify or
    distribute the Program or its derivative works. These actions are
    prohibited by law if you do not accept this License. Therefore, by
    modifying or distributing the Program (or any work based on the
    Program), you indicate your acceptance of this License to do so, and
    all its terms and conditions for copying, distributing or modifying the
    Program or works based on it.

    6. Each time you redistribute the Program (or any work based on the
    Program), the recipient automatically receives a license from the
    original licensor to copy, distribute or modify the Program subject to
    these terms and conditions. You may not impose any further restrictions
    on the recipients' exercise of the rights granted herein. You are not
    responsible for enforcing compliance by third parties to this License.

    7. If, as a consequence of a court judgment or allegation of patent
    infringement or for any other reason (not limited to patent issues),
    conditions are imposed on you (whether by court order, agreement or
    otherwise) that contradict the conditions of this License, they do not
    excuse you from the conditions of this License. If you cannot
    distribute so as to satisfy simultaneously your obligations under this
    License and any other pertinent obligations, then as a consequence you
    may not distribute the Program at all. For example, if a patent license
    would not permit royalty-free redistribution of the Program by all
    those who receive copies directly or indirectly through you, then the
    only way you could satisfy both it and this License would be to refrain
    entirely from distribution of the Program.

    If any portion of this section is held invalid or unenforceable under
    any particular circumstance, the balance of the section is intended to
    apply and the section as a whole is intended to apply in other
    circumstances.

    It is not the purpose of this section to induce you to infringe any
    patents or other property right claims or to contest validity of any
    such claims; this section has the sole purpose of protecting the
    integrity of the free software distribution system, which is
    implemented by public license practices. Many people have made generous
    contributions to the wide range of software distributed through that
    system in reliance on consistent application of that system; it is up
    to the author/donor to decide if he or she is willing to distribute
    software through any other system and a licensee cannot impose that
    choice.

    This section is intended to make thoroughly clear what is believed to
    be a consequence of the rest of this License.

    8.
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright 2007-2009 (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright 2007-2009 (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License. 
PyCogent-1.5.3/doc/conf.py000644 000765 000024 00000014721 12024702176 016312 0ustar00jrideoutstaff000000 000000 # -*- coding: utf-8 -*- # # PyCogent documentation build configuration file, created by # sphinx-quickstart on Wed May 13 12:28:51 2009. # # This file is execfile()d with the current directory set to its containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys, os # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. #sys.path.append(os.path.abspath('.')) # -- General configuration ----------------------------------------------------- # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.todo', 'sphinx.ext.doctest', 'sphinx.ext.pngmath'] # todo_include_todos=True # to expose the TODOs, uncomment this line # Add any paths that contain templates here, relative to this directory. templates_path = ['templates'] # The suffix of source filenames. source_suffix = '.rst' # ignore the cookbook/ensembl.rst file as it's specifically imported exclude_patterns = ['cookbook/ensembl.rst'] # The encoding of source files. #source_encoding = 'utf-8' # The master toctree document. master_doc = 'index' # General information about the project. project = u'PyCogent' copyright = u'2009, PyCogent Team' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = "1.5" # The full version, including alpha/beta/rc tags. 
release = "1.5.3" # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of documents that shouldn't be included in the build. #unused_docs = [] # List of directories, relative to source directory, that shouldn't be searched # for source files. exclude_trees = ['_build'] #exclude_trees = ['_build', 'cookbook'] # comment out after release # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. show_authors = True # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. Major themes that come with # Sphinx are currently 'default' and 'sphinxdoc'. html_theme = 'default' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. 
# The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_use_modindex = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = '' # Output file base name for HTML help builder. htmlhelp_basename = 'PyCogentdoc' # -- Options for LaTeX output -------------------------------------------------- # The paper size ('letter' or 'a4'). #latex_paper_size = 'letter' # The font size ('10pt', '11pt' or '12pt').
#latex_font_size = '10pt' # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', 'PyCogent.tex', u'PyCogent Documentation', u'PyCogent Team', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # Additional stuff for the LaTeX preamble. #latex_preamble = '' # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_use_modindex = True PyCogent-1.5.3/doc/cookbook/000755 000765 000024 00000000000 12024703640 016611 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/doc/data/000755 000765 000024 00000000000 12024703641 015715 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/doc/data_file_links.rst000644 000765 000024 00000002363 12022465506 020655 0ustar00jrideoutstaff000000 000000 The data files used in the documentation ======================================== All data files referred to in the documentation can be downloaded by clicking on the links below. .. note:: not all browsers handle the file suffixes well, so you may need to rename the downloaded files in order to use them.
:download:`4TSV.pdb ` :download:`abglobin_aa.phylip ` :download:`dists_for_phylo.pickle ` :download:`long_testseqs.fasta ` :download:`motif_example.fasta ` :download:`motif_example_meme_results.txt ` :download:`primate_brca1.fasta ` :download:`primate_brca1.tree ` :download:`primate_cdx2_promoter.fasta ` :download:`test.paml ` :download:`test.tree ` :download:`test2.fasta ` :download:`trna_profile.fasta ` :download:`refseqs.fasta ` :download:`refseqs_protein.fasta ` :download:`inseqs.fasta ` :download:`inseqs_protein.fasta ` PyCogent-1.5.3/doc/developer_notes.rst000644 000765 000024 00000013563 12014704335 020743 0ustar00jrideoutstaff000000 000000 For Developers ============== Anyone can contribute to the development of PyCogent, not just registered developers. If you figure out a solution to something using PyCogent that you'd like to share please post comments on the forums_ page at sourceforge and we'll look at including your tips! If you have ideas for improving the current documentation, then we want to hear from you so please post on the tracker_ page! .. _forums: http://sourceforge.net/forum/?group_id=186234 .. _tracker: http://sourceforge.net/tracker2/?group_id=186234 Grabbing from the subversion repository --------------------------------------- To grab PyCogent from the sourceforge subversion repository, do the following:: $ svn co https://pycogent.svn.sourceforge.net/svnroot/pycogent/trunk PyCogent Building/testing the documentation ---------------------------------- To build the documentation or ``doctest`` the contents, you'll need to install Sphinx. Assuming you have ``easy_install`` configured on your machine (and if not, download ``setuptools`` from http://pypi.python.org/pypi/setuptools and follow the instructions). Then grab Sphinx:: $ sudo easy_install -U sphinx Generating the html form of the documentation requires changing into the ``doc`` directory and executing a ``make`` command:: $ cd path/to/PyCogent/doc $ make html ... 
# bunch of output Build finished. The HTML pages are in _build/html. This prints a bunch of output to screen and creates a directory ``PyCogent/doc/_build`` and within that ``html``. The index file ``PyCogent/doc/_build/html/index.html`` is the root of the documentation. One can also generate a pdf file, using the Sphinx latex generation capacity. This is slightly more involved. (It also requires that you have an installation of TeTex_.) .. _TeTex: http://www.tug.org/texlive/ First generate the latex :: $ make latex ... # bunch of output Build finished; the LaTeX files are in _build/latex. Run `make all-pdf' or `make all-ps' in that directory to run these through (pdf)latex. then change into the ``latex`` dir and build the pdf :: $ cd _build/latex $ make all-pdf You can now open ``PyCogent.pdf``. To actually test the documentation, you need to be in the ``doc`` directory and then execute another ``make`` command:: $ cd path/to/PyCogent/doc $ make doctest The results are in ``_build/doctest/output.txt``. .. note:: The documentation does not test for presence of 3rd party dependencies (such as applications or python modules) like the PyCogent ``unittest`` test suite. If you don't have all the 3rd party applications installed you will see failures. At this point **no effort** is being expended to hide such failures. Adding to the documentation --------------------------- You can maximise the cogent user experience for yourself and others by contributing to the documentation. If you solve a problem that you think might prove useful to others then add it into the documentation. If you can think of ways to improve the existing documents let us know. For guidance on adding documentation, look at any of the existing examples. The restructured text format is pretty easy to write (for overview see the Sphinx `rest overview`_). The conventions adopted by PyCogent are to use heading levels to be consistent with the Python.org standard (taken from `Sphinx headings`_). 
They are:

- # with overline, for parts
- \* with overline, for chapters
- =, for sections
- -, for subsections
- ^, for subsubsections
- ", for paragraphs
- +, added for sub-paragraphs (non-standard)

If it's a use-case, create your file in the ``examples`` directory, giving it a ``.rst`` suffix. Link it into the documentation tree, adding a line into the ``examples/index.rst`` file. If it's something you think should be added into the cookbook, add it into the appropriate cookbook document. The new documentation checklist ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Things you should check before committing your new document:

- Add a line at the beginning with yourself as author (``.. sectionauthor:: My Name``) so people can contact you with feedback.
- Add any data files used in your documentation under ``PyCogent/doc/data/``
- Add a download link to those files to ``PyCogent/doc/data_file_links.rst`` following the style employed in that file.
- Spellcheck!!
- Check what you wrote is valid restructured text by building the documents for both html and latex. If your document isn't connected into the table of contents, Sphinx will print a warning to screen.
- Check you have correctly marked up the content and that it looks OK. Make sure that python code and shell commands are correctly highlighted and that literals are marked up as literals. In particular, check the latex build since it is common for text to span beyond the page margins. If the latter happens, revise your document!
- Check that it works (rather than testing the entire suite, you can use the convenience script within doc). For instance, the following is a single test of one file:: $ cd path/to/PyCogent/doc $ python doctest_rsts.py examples/reverse_complement.rst

Adding TODOs ^^^^^^^^^^^^ Add todo's into the rst files using the ``todo`` directive as in :: ..
todo:: some task To see the list of todo's in the project, uncomment the line that sets ``todo_include_todos=True`` in ``doc/conf.py``, then cd into the ``doc/`` and make the html docs again. The todo's are listed on the main page. .. warning:: Be sure to revert the conf.py file back to its original state so you don't accidentally commit the change as this affects everyone else's documentation too! Developing C-extensions ----------------------- Extensions for PyCogent should be written in `Cython `_. If you have any questions, contact Gavin_. .. _`rest overview`: http://sphinx.pocoo.org/rest.html .. _`Sphinx headings`: http://sphinx.pocoo.org/rest.html#sections .. _Gavin: Gavin.Huttley@anu.edu.au PyCogent-1.5.3/doc/doctest2script.py000644 000765 000024 00000001415 11502775461 020344 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
"""
This takes doctest files and turns them into standalone scripts.
"""
import doctest, sys, os

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2011, The Cogent Project"
__contributors__ = ["Gavin Huttley", "Peter Maxwell"]
__license__ = "GPL"
__version__ = "1.3.0.dev"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

for filename in sys.argv[1:]:
    print filename,
    (name, suffix) = os.path.splitext(filename)
    if suffix != '.rst':
        print 'not a .rst file'
        continue
    f = open(filename, 'r')
    s = ''.join(f.readlines())
    f.close()
    s = doctest.script_from_examples(s)
    f = open(name + '.py', 'w')
    f.write(s)
    f.close()
    print '->', name + '.py'

PyCogent-1.5.3/doc/doctest_rsts.py000644 000765 000024 00000000561 11213026275 020100 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
import doctest, os, sys

"""
This will doctest all files ending with .rst in this directory.
"""

fl = sys.argv[1:]
if not fl:
    # find all files that end with .rst
    fl = [fname for fname in os.listdir(os.getcwd()) if fname.endswith('.rst')]
for test in fl:
    doctest.testfile(test, optionflags=doctest.ELLIPSIS, verbose=True)

PyCogent-1.5.3/doc/examples/000755 000765 000024 00000000000 12024703643 016624 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/doc/images/000755 000765 000024 00000000000 12024703637 016256 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/doc/index.rst000644 000765 000024 00000005657 12022502120 016645 0ustar00jrideoutstaff000000 000000 .. _contents: #################################### Welcome to PyCogent's documentation! #################################### **Contents** .. toctree:: :maxdepth: 1 install README coding_guidelines data_file_links examples/index cookbook/index developer_notes scripting_guidelines licenses ChangeLog .. todolist:: ******** Overview ******** PyCogent is a software library for genomic biology. It is a fully integrated and thoroughly tested framework for: controlling third-party applications; devising workflows; querying databases; conducting novel probabilistic analyses of biological sequence evolution; and generating publication quality graphics. It is distinguished by many unique built-in capabilities (such as true codon alignment) and the frequent addition of entirely new methods for the analysis of genomic data. Our primary goal is to provide a collection of rigorously validated tools for the manipulation and analysis of genome biology data sets. The project is routinely employed in numerous labs across the world and has provided essential capabilities for many high profile publications, e.g. `Nature 2009 457:480-4`_, `PNAS 2008 105:17994-9`_, `Science 2008 320:1647-51`_, `Nature 2008 453: 175-83`_ and `Nat Genet 2007 39: 1261-5`_.
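The ``doctest2script.py`` helper shown above is essentially a loop around the standard library's ``doctest.script_from_examples``, which turns a mix of prose and interactive examples into a runnable script: prose becomes comments, example source stays as code, and expected output becomes ``# Expected:`` comments. A minimal sketch of that call in isolation (the sample text is invented for illustration):

```python
import doctest

# A tiny doctest-style document: one line of prose plus one example.
text = """This line is prose and becomes a comment.

>>> 1 + 1
2
"""

# Prose and expected output are commented out; the example source
# ("1 + 1") is kept as executable code.
script = doctest.script_from_examples(text)
print(script)
```

Running the resulting script with Python executes the examples directly, which is what the generated ``.py`` files in the PyCogent docs workflow are for.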
************************* Contacts and contributing ************************* If you find a bug, or have feature or documentation requests, then please post comments on the corresponding tracker_ page at sourceforge. If you have any questions please post on the appropriate project's forums_ page at sourceforge. We appreciate your input! .. _tracker: http://sourceforge.net/tracker2/?group_id=186234 .. _forums: http://sourceforge.net/projects/pycogent/forums ******** Citation ******** If you use this software for published work please cite -- `Knight et al., 2007, Genome Biol, 8, R171 `_. ****** Search ****** * :ref:`search` .. _`Nature 2009 457:480-4`: http://www.ncbi.nlm.nih.gov/pubmed/19043404?ordinalpos=6&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum .. _`PNAS 2008 105:17994-9`: http://www.ncbi.nlm.nih.gov/pubmed/19004758?ordinalpos=7&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum .. _`Science 2008 320:1647-51`: http://www.ncbi.nlm.nih.gov/pubmed/18497261?ordinalpos=12&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum .. _`Nature 2008 453: 175-83`: http://www.ncbi.nlm.nih.gov/pubmed/18464734?dopt=Citation .. _`Nat Genet 2007 39: 1261-5`: http://www.ncbi.nlm.nih.gov/pubmed/17828264?dopt=Citation ********************** News and Announcements ********************** PyCogent News and Announcements are available via the News and Announcements Blog at http://pycogent.wordpress.com. `Subscribe to the News and Announcements RSS Feed `_ PyCogent-1.5.3/doc/install.rst000644 000765 000024 00000007714 12024703343 017214 0ustar00jrideoutstaff000000 000000 .. _quick-install: Quick installation ================== PyCogent.app for OSX 10.6 ------------------------- `Download PyCogent.app `_. This native OSX app comes bundled with all required dependencies and is a download, decompress and go experience!
It also implements the novel script form system that controls command line scripts via a form based input mechanism. By virtual machine ------------------ One way to install PyCogent is to install the QIIME virtual machine using VirtualBox. The installation instructions can be found `here `_. Please note that this is the only installation method supported for Windows. Windows does not natively handle gz files properly, so to uncompress a gz file on Windows use `7-zip `_. For systems with ``easy_install`` --------------------------------- For the list of dependencies see the :ref:`required` software list. The following assumes you have ``easy_install`` on your system (this comes standard with new Macs for instance), that you have administrator privileges and that you're connected to the internet. See below if you don't have ``easy_install``. The key steps for the minimal install are: 1. Download the :download:`requirements file <../cogent-requirements.txt>`. 2. Install pip :: $ sudo easy_install -U pip 3. Use pip to download, build and install PyCogent plus the numpy dependency. :: $ DONT_USE_PYREX=1 sudo pip install -r path/to/cogent-requirements.txt .. note:: The ``DONT_USE_PYREX=1`` statement is required if you have Pyrex installed due to a conflict between setuptools and later versions of Pyrex. If you don't have Pyrex, this will still work. If the above fails to download PyCogent you can `download the tarball `_ to your hard drive and replace the first line of the :download:`requirements file <../cogent-requirements.txt>` with the full path to the tarball, e.g. ``/Users/my_user_name/Downloads/cogent-1.5.3.tgz``. Optional installs ^^^^^^^^^^^^^^^^^ To use the Ensembl querying code """""""""""""""""""""""""""""""" Add the following lines to the requirements file :: MySQL-python>=1.2.2 SQLAlchemy>=0.5 .. note:: The MySQL-python module requires that you have MySQL installed.
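Putting the pieces above together, a requirements file for a local-tarball install with the Ensembl extras enabled might look like the following sketch (the tarball path is the example used above; the exact contents of the shipped ``cogent-requirements.txt`` may differ):

```text
# cogent-requirements.txt (illustrative)
# First line: where pip should get PyCogent itself -- here a local tarball
/Users/my_user_name/Downloads/cogent-1.5.3.tgz
# Required dependency
numpy
# Optional: Ensembl querying support (requires a MySQL installation)
MySQL-python>=1.2.2
SQLAlchemy>=0.5
```

Each optional capability described below simply appends further lines to this same file before running the ``pip install -r`` command.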
To use the parallel capabilities """""""""""""""""""""""""""""""" Add the following to the requirements file :: mpi4py>=1.0 To build the documentation """""""""""""""""""""""""" Add the following to the requirements file :: Sphinx>=0.6 To use the development version of PyCogent """""""""""""""""""""""""""""""""""""""""" Just replace the first line of the requirements file with ``https://pycogent.svn.sourceforge.net/svnroot/pycogent/trunk``. To use the graphics capabilities """""""""""""""""""""""""""""""" You need to install matplotlib_ (version 1.1.0+) to use the drawing code. However, compiling matplotlib can be a challenge. We therefore suggest you obtain a prebuilt binary for your platform from the matplotlib_ project page rather than modify the requirements file. For OSX, we suggest reading the following instructions on `compiling matplotlib`_. .. _pip: http://pypi.python.org/pypi/pip .. _matplotlib: http://matplotlib.sourceforge.net/ .. _`compiling matplotlib`: http://sourceforge.net/projects/pycogent/forums/forum/651121/topic/5635916 .. todo:: **FOR RELEASE:** update the tarball name for the version. Installing ``easy_install`` --------------------------- If your system doesn't have ``easy_install``, you can execute the following:: $ sudo curl http://peak.telecommunity.com/dist/ez_setup.py | python or, if you are on a Linux system that has a package manager, you may only need to do something like:: $ sudo apt-get install python-setuptools Use the approach to getting ``easy_install`` that best suits your system, then follow the instructions above for the ``pip`` based installation. PyCogent-1.5.3/doc/licenses.rst000644 000765 000024 00000001633 11523377514 017357 0ustar00jrideoutstaff000000 000000 Licenses and disclaimer ======================= .. toctree:: :maxdepth: 1 :hidden: COGENT_LICENSE PyCogent is released under the GPL license, a copy of which is included in the distribution (see :ref:`cogent_license`).
Licenses for other code sources are left in place. This software is provided "as-is". There are no expressed or implied warranties of any kind, including, but not limited to, the warranties of merchantability and fitness for a given application. In no event shall the authors be liable for any direct, indirect, incidental, special, exemplary or consequential damages (including, but not limited to, loss of use, data or profits, or business interruption) however caused and on any theory of liability, whether in contract, strict liability or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage. PyCogent-1.5.3/doc/Makefile000644 000765 000024 00000005665 11204372323 016456 0ustar00jrideoutstaff000000 000000 # Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d _build/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . .PHONY: help clean html dirhtml pickle json htmlhelp qthelp latex changes linkcheck doctest help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " changes to make an overview of all changed/added/deprecated items" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: -rm -rf _build/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) _build/html @echo @echo "Build finished. The HTML pages are in _build/html." 
dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) _build/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in _build/dirhtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) _build/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) _build/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) _build/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in _build/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) _build/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in _build/qthelp, like this:"
	@echo "# qcollectiongenerator _build/qthelp/PyCogent.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile _build/qthelp/PyCogent.qhc"

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) _build/latex
	@echo
	@echo "Build finished; the LaTeX files are in _build/latex."
	@echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \
	      "run these through (pdf)latex."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) _build/changes
	@echo
	@echo "The overview file is in _build/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) _build/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in _build/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) _build/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in _build/doctest/output.txt."
PyCogent-1.5.3/doc/README.rst

PyCogent-1.5.3/doc/scripting_guidelines.rst

Scripting guidelines
====================

Developing command line interfaces for your scripts is a convenient way to create easily reproducible analyses with PyCogent. This document covers the support for developing standardized interfaces in PyCogent. In addition to making your code easier to distribute (or to return to months after it was originally written), several GUI generators that are currently in development make use of the PyCogent ``option_parsing`` module -- this means that defining your interfaces with PyCogent's ``option_parsing`` module will allow your code to be easily wrapped in graphical interfaces.

PyCogent command line interfaces are based on the ``optparse`` module in the Python Standard Library, using ``cogent.util.option_parsing.parse_command_line_parameters`` as a convenience wrapper for defining interfaces based on certain information provided by the script developer. A fully functional `example script <./scripting_guidelines.html#countseqs>`_ and a `script template <./scripting_guidelines.html#scripttemplate>`_ are provided below.

This page will help you develop your command line interfaces after you've familiarized yourself with ``optparse``, but it is not a replacement for a general understanding of how to interact with ``optparse``. You should therefore refer both to the `optparse documentation `_ and to the `PyCogent Scripting Guidelines <./scripting_guidelines.html>`_. As support for ``optparse`` will not continue into Python 3.0, we will be switching to ``argparse`` when we transition PyCogent to Python 3.
We'll do our best to minimize the work involved in transitioning from ``optparse`` to ``argparse`` for PyCogent scripts by changing the interface to ``cogent.util.option_parsing.parse_command_line_parameters`` as little as possible.

This document starts with a basic example of creating a script. It then covers guidelines for defining scripts, which all PyCogent developers must adhere to and which we suggest all PyCogent script developers adhere to. Finally, it covers some details of the ``script_info`` object and custom command line option types defined in PyCogent. You should also review the PyCogent `coding guidelines <./coding_guidelines.html>`_. The scripting guidelines presented here are an extension of those.

Quickstart : how to create your first PyCogent script in 5 minutes
------------------------------------------------------------------

These steps show you how to quickly create a working PyCogent script from the PyCogent script template. Some very basic familiarity with python will help here (e.g., identifying the ``main()`` function and adding to it with the correct indentation).

1. Copy and paste the `script template <./scripting_guidelines.html#scripttemplate>`_ to a new file on your system. The next steps will assume that you're saving that file as ``/home/pycogent_user/pycogent_script_template.py``.

2. Open a command terminal and enter the following command::

    chmod 755 /home/pycogent_user/pycogent_script_template.py

3. Open ``pycogent_script_template.py`` with a text editor (such as emacs, pico, or TextWrangler) and add the following to the bottom of the ``main()`` function (be sure you have proper indentation)::

    print "Hello World!"

   Next, after the line::

    script_info['version'] = __version__

   add the following line::

    script_info['help_on_no_arguments'] = False

4. Run and print help text::

    python /home/pycogent_user/pycogent_script_template.py -h

5.
Run script -- ``Hello World!`` will be printed to the terminal::

    python /home/pycogent_user/pycogent_script_template.py

You've now got a working PyCogent script. You can continue working with this script to add the functionality you're interested in. See the `optparse documentation `_ for discussion of how to create options, as well as the `example script <./scripting_guidelines.html#countseqs>`_ provided below.

General notes on designing command line interfaces
--------------------------------------------------

**Design convenient command line interfaces.** The goal of your interface is to make things easy for the user (who is often you). This section covers some guidelines for how to do that.

**Have people who are better programmers than you interact with your command line interface and give you feedback on it.** If your script is difficult to work with, or has requirements that are not intuitive for users who frequently work with command line applications, people won't use your code.

**If there are tasks that are automatable, automate them.** For example, if you can make a good guess at what an output file should be named from an input file and a parameter choice, do that and use it as the default output path (but allow the user to overwrite it with a command line option).

**Define sensible default values for your command line options.** If most of the time that a script is used it will require a parameter to be set to a certain value, make that value the default to simplify the interface.

**Have the user specify named options rather than positional arguments.** The latter are more difficult to work with as users need to remember the order in which they must be passed. PyCogent scripts do not allow positional arguments by default, but if you must use them you can override this behavior by setting ``script_info['disallow_positional_arguments'] = False``.
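As an illustration, the following minimal sketch shows how a wrapper can reject stray positional arguments the way the default setting above does. It is written against the standard library's ``optparse`` directly -- this is not PyCogent's actual implementation, and the ``parse_argv`` helper and its ``--input_fp`` option are hypothetical:

```python
from optparse import OptionParser

def parse_argv(argv, disallow_positional_arguments=True):
    """Parse argv, rejecting positional arguments when asked to.

    Hypothetical helper sketching the behavior described above;
    PyCogent's parse_command_line_parameters handles this (and much
    more) internally.
    """
    parser = OptionParser()
    parser.add_option('-i', '--input_fp', default=None,
                      help='the input filepath [default: %default]')
    opts, args = parser.parse_args(argv)
    if disallow_positional_arguments and args:
        # optparse collects anything that isn't an option into args;
        # refuse to proceed if any such arguments were passed.
        raise ValueError('positional arguments are not allowed: %r' % (args,))
    return opts, args

opts, args = parse_argv(['-i', 'seqs.fasta'])
```

With the default setting, ``parse_argv(['seqs.fasta'])`` raises an error, while the named-option form succeeds.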
Note that this contradicts what the ``optparse`` docs say - we disagree with their comment that all required options should be passed as positional arguments.

**Avoid making assumptions about how a script will be run.** Perhaps most importantly, don't assume that the script will be run from the same directory that the script lives in. Users often want to copy executables into a centralized directory on their system (e.g., ``/usr/local/bin``). Facilitate that by not requiring that the script is run from a specific location. If you rely on data files, you have other options, such as having users set an environment variable that defines where data files live on the system. Test your script from multiple locations on the file system!

Designing PyCogent command line interfaces
------------------------------------------

This section covers guidelines for how to build PyCogent command line interfaces using the ``script_info`` dictionary and the ``cogent.util.option_parsing.parse_command_line_parameters`` function. Some of this is general to ``optparse`` and some is specific to PyCogent.

Flag options
^^^^^^^^^^^^

Flags are boolean options to your script. ``optparse`` supports these directly, so you should never have to define an option that explicitly takes ``True`` or ``False`` on the command line. Flags to your script should always be either ``action='store_true'`` or ``action='store_false'``, and do not need to define a type. The names of these options should suggest whether the option enables something (e.g., ``--print_to_stdout``), which would be defined with ``action='store_true'`` (i.e., the default is False), or whether the option disables something (e.g., ``--suppress_stdout``), which would be defined with ``action='store_false'`` (i.e., the default is True). A bad name for a flag is ``--stdout``, as it's not clear what this option does. Always define ``default`` for boolean options to set the default value for your script.
If ``action='store_true'`` you should *always* pass ``default=False``. If ``action='store_false'`` you should *always* pass ``default=True``.

Choice options
^^^^^^^^^^^^^^

Use ``type='choice'`` when an option is passed as a string and can be one of several acceptable values. This saves you from having to check that the user passed an acceptable value. This is done by ``optparse``, so it saves you lines of code that you'd need to test, and standardizes how errors are handled. The acceptable choices are defined with ``choices=``. An example choice option definition is::

    alignment_method_choices = ['pynast','mafft','muscle']
    o = make_option('-m','--alignment_method',type='choice',
        help='Method for aligning sequences. Valid choices are: '+\
        ', '.join(alignment_method_choices) + ' [default: %default]',
        choices=alignment_method_choices, default='pynast')

Note that the help text here includes the list of acceptable options. This is generally a good idea as it's convenient for the user. It's not a good idea, however, if this is a big list (say, more than 5 or so options). If the user passes something invalid (such as ``raxml`` in this example) the list of acceptable options will be included in the error text.

Defining where output will be stored
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If a single file is created, allow the user to define that file name. If multiple files are created, allow the user to define a directory name and store all of the output files in that directory. Use the ``new_filepath`` and ``new_dirpath`` option types, respectively, to define these outputs. These will raise errors if the file or directory already exists, which is generally good as it avoids overwriting results that may have taken a long time to generate.

Defining options
----------------

Use ``make_option`` (`described here `_) to create options. As in that example, you'll define these in lists that get set as ``script_info['required_options']`` and ``script_info['optional_options']``.
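For instance, the flag rules above translate into option definitions like the following minimal sketch, using the standard library's ``optparse`` directly (the option names are hypothetical, and plain ``optparse.make_option`` stands in for the PyCogent import):

```python
from optparse import OptionParser, make_option

# Hypothetical flags following the guidelines above: the enabling flag
# uses action='store_true' with default=False, and the disabling flag
# uses action='store_false' with default=True.
flag_options = [
    make_option('--print_to_stdout', action='store_true', default=False,
                help='print results to stdout [default: %default]'),
    make_option('--suppress_stdout', action='store_false', default=True,
                help='suppress printing to stdout [default: %default]'),
]

parser = OptionParser(option_list=flag_options)

# No flags passed: both destinations keep their defaults.
opts, args = parser.parse_args([])

# Passing a flag flips the stored value for its destination.
opts, args = parser.parse_args(['--print_to_stdout'])
```

Note that neither option defines ``type=`` or ``dest=``; ``optparse`` derives the destination from the long-form name.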
Use the PyCogent custom option types when specifying input file or directory paths. These standardize error handling in the case of input files which don't exist or aren't readable and output files which already exist.

Don't define ``dest=``. By default this gets set to the long-form parameter option (e.g., ``dest='input_fp'`` is implied if your option is ``--input_fp``). Defining this as something else will confuse other people who may end up doing maintenance work on your scripts in the future.

Always define ``default=`` for optional options, and never define ``default=`` for required options. The default value for all options is ``None``, but it's convenient to explicitly define that for readability. Always define ``help=``, and provide useful information in this string. Include ``[default: %default]`` for optional options, but not for required options (as there can be no default for a required option, or it'd be optional). The ``%default`` gets replaced with the value provided for ``default=``. It sometimes makes sense to include additional information in the ``[default: %default]`` text if the option on its own is not informative. For example::

    make_option("--output_fp",default=None,help="output filepath [default: %default; print to stdout]")

``action='store'`` and ``type='string'`` are the defaults, and therefore do not need to be included. Leave these values out to keep your code cleaner.

If you need to pass multiple paths or strings to a single option, do this by passing a comma-separated string. The ``existing_filepaths`` option type expects strings in this format and takes care of splitting them on commas and returning a list, so if you're passing multiple input filepaths set ``type='existing_filepaths'``.

Naming options
--------------

``optparse`` allows for users to define short-form (e.g., ``-i``) and long-form (e.g., ``--input_fp``) option names.
For options that are commonly used, define both a long-form and a short-form parameter name::

    make_option('-i','--input_dir',type="existing_filepath",help='the input directory')

For options that are infrequently used define only a long-form parameter name::

    make_option('--output_file_type',help='the file type for graphical output',default='pdf')

This helps with reducing clutter and saving convenient short-form parameter names for future options that may be added.

Make paths to files end with ``_fp`` and paths to directories end with ``_dir``. This helps users understand exactly what must be passed to a script. Some standard names for common options are listed below. You should use these whenever possible.

+-----------------------------+--------------------------+
| Description                 | Option name              |
+=============================+==========================+
| path to an input file       | ``-i``, ``--input_fp``   |
+-----------------------------+--------------------------+
| path to an output file      | ``-o``, ``--output_fp``  |
+-----------------------------+--------------------------+
| path to an input directory  | ``-i``, ``--input_dir``  |
+-----------------------------+--------------------------+
| path to an output dir       | ``-o``, ``--output_dir`` |
+-----------------------------+--------------------------+
| path to a log file          | ``-l``, ``--log_fp``     |
+-----------------------------+--------------------------+

What documentation should be included in my scripts?
----------------------------------------------------

The ``script_description`` entry in ``script_info`` should describe the basic functionality of your script. This entry is typically one to several sentences. Be sure not to add line breaks yourself - ``optparse`` will take care of this for you, and the formatting will look better than if you try to do it yourself.

The ``script_usage`` entry in ``script_info`` should list one or more examples of commands that need to be run to execute your script. These should be actual calls to commands. A user should be able to copy this and paste it on the command line and have the script run (provided they put the right input files in place). See the `example script <./scripting_guidelines.html#countseqs>`_ for instances of what good usage examples look like. ``script_info['script_usage']`` must be a list of tuples with three string entries each, where the first entry is a concise title for the example, the second entry is a description of the example and why certain parameter settings are being made, and the third entry should be the exact command that needs to be run. Start these examples with ``%prog`` - this gets replaced with the name of your script and is convenient so you don't have to remember to update the usage examples if the name of your script changes.

The ``output_description`` entry in ``script_info`` should describe the output generated by the script. This entry is typically one to several sentences. Again, don't add line breaks yourself.

The script_info dictionary
--------------------------

The ``script_info`` dictionary is the central piece of information required to define a cogent script. ``script_info`` is passed to ``parse_command_line_parameters`` to define the command line interface for your script.
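As an illustration only, a hypothetical ``script_info`` combining the documentation entries described above might look like this (plain dict operations, so nothing here requires PyCogent itself):

```python
__version__ = "1.6.0dev"  # normally defined at the top of the script

script_info = {}
script_info['script_description'] = (
    "This script counts the number of sequences in one or more fasta "
    "files and prints the results to stdout.")
# Each usage example is a (title, description, command) tuple; commands
# start with %prog so renaming the script doesn't break the examples.
script_info['script_usage'] = [
    ("Count sequences in one file",
     "Count the sequences in a fasta file and write results to stdout.",
     "%prog -i in.fasta"),
]
script_info['output_description'] = "Tabular data is written to stdout."
script_info['version'] = __version__
```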
Additionally, several tools have been developed to import and use this object to define other types of interfaces (e.g., script form in the PyCogent beta GUI) or to auto-generate script documentation (e.g., for the QIIME project). This section covers the values that can be defined in your ``script_info`` dictionaries, what they do, and their default values.

Core values defined in PyCogent command line interfaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These are the core values defined in the ``script_info`` dictionary used by the PyCogent ``option_parsing`` module.

+-------------------------------+---------------------------------------------------------------------------+--------------+
| key                           | Description                                                               | Default      |
+===============================+===========================================================================+==============+
| script_description            | a paragraph description of the script's functionality                     | REQUIRED     |
+-------------------------------+---------------------------------------------------------------------------+--------------+
| script_usage                  | a list of tuples illustrating example usages of the script                | []           |
+-------------------------------+---------------------------------------------------------------------------+--------------+
| output_description            | a paragraph description of the script's output                            | ""           |
+-------------------------------+---------------------------------------------------------------------------+--------------+
| version                       | a version number for the script                                           | REQUIRED     |
+-------------------------------+---------------------------------------------------------------------------+--------------+
| required_options              | a list of optparse Option objects that are required for the script to run | []           |
+-------------------------------+---------------------------------------------------------------------------+--------------+
| optional_options              | a list of optparse Option objects that are optional for the script to run | []           |
+-------------------------------+---------------------------------------------------------------------------+--------------+
| disallow_positional_arguments | do not allow positional arguments to be passed to the script              | True         |
+-------------------------------+---------------------------------------------------------------------------+--------------+
| help_on_no_arguments          | print help text if the script is called with no options or arguments      | True         |
+-------------------------------+---------------------------------------------------------------------------+--------------+
| suppress_verbose              | do not auto-generate a verbose option for the script                      | False        |
+-------------------------------+---------------------------------------------------------------------------+--------------+

Values known to be used by tools outside of the PyCogent codebase
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These values are known to be used by tools outside of the PyCogent code base in ``script_info`` objects. It's best to not name new values with these names to avoid conflicts.
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+
| key                           | Description                                                                                             | Used by      |
+===============================+=========================================================================================================+==============+
| brief_description             | a one-sentence description of the script, used by some document generators                              | Q,T          |
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+
| script_type                   | a definition of the type of script, used by some graphical interfaces                                   | Q,PG         |
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+
| optional_options_groups       | a list grouping related options under a heading [['section heading string', section_option_list], ...]  | PG           |
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+
| authors                       | string of author names                                                                                  | PG           |
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+
| script_name                   | a brief "human readable" name for the script, used in some graphical interfaces                         | Q,PG         |
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+
| output_type                   | a list of tuples noting the type (in a controlled vocabulary) of each possible output                   | Q            |
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+
| option_label                  | a dictionary matching option names to "human readable" names, used in some graphical interfaces         | Q            |
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+
| script_usage_output_to_remove | a list of output dirs/files that must be cleaned up if running script_usage examples multiple times     | Q            |
+-------------------------------+---------------------------------------------------------------------------------------------------------+--------------+

* "Used by" key : Q: `QIIME `_; PG: PyCogent beta GUI; T: tax2tree.

Setting values in script_info
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``script_info`` object is simply a dict, so the standard method for setting and working with dict entries applies. Some examples are::

    script_info['brief_description'] = "Count sequences in one or more fasta files."
    script_info['required_options'] = [
        make_option('-i','--input_fps',
            help='the input filepaths (comma-separated)'),
    ]

Custom command line option types
--------------------------------

Several custom option types are defined in PyCogent. These are:

* ``existing_path`` : Specify a path to a directory or file. Path must exist or an error is raised.
* ``new_path`` : Specify a path to a directory or file. Path must not exist or an error is raised.
* ``existing_filepath`` : Specify a path to a file. Path must exist or an error is raised.
* ``existing_filepaths`` : Specify a comma-separated list of file paths. All paths must exist or an error is raised. These are returned as a list split on commas.
* ``new_filepath`` : Specify a path to a file. Path must not exist or an error is raised.
* ``existing_dirpath`` : Specify a path to a directory. Path must exist or an error is raised.
* ``new_dirpath`` : Specify a path to a directory. Path must not exist or an error is raised.

.. _scripttemplate:

Template for a new PyCogent script
----------------------------------

The following is a template for a PyCogent script. You can download this from :download:`here ` to form the basis of your new script.
This script is also embedded here for documentation purposes. This template forms a fully functional PyCogent script, so on copying this you should be able to run the script to confirm that it is working::

    python pycogent_script_template.py

This will print help text and exit. You can rename this script and use it to define your new PyCogent script. PyCogent script template::

    #!/usr/bin/env python
    # File created on 15 Jul 2011
    from __future__ import division

    __author__ = "AUTHOR_NAME"
    __copyright__ = "COPYRIGHT_INFORMATION"
    __credits__ = ["AUTHOR_NAME"]
    __license__ = "GPL"
    __version__ = "1.6.0dev"
    __maintainer__ = "AUTHOR_NAME"
    __email__ = "AUTHOR_EMAIL"
    __status__ = "Development"

    from cogent.util.option_parsing import parse_command_line_parameters, make_option

    script_info = {}
    script_info['brief_description'] = ""
    script_info['script_description'] = ""
    script_info['script_usage'] = [("","","")]
    script_info['output_description']= ""
    script_info['required_options'] = [\
     # Example required option
     #make_option('-i','--input_dir',type="existing_filepath",help='the input directory'),\
    ]
    script_info['optional_options'] = [\
     # Example optional option
     #make_option('-o','--output_dir',type="new_dirpath",help='the output directory [default: %default]'),\
    ]
    script_info['version'] = __version__

    def main():
        option_parser, opts, args =\
         parse_command_line_parameters(**script_info)

    if __name__ == "__main__":
        main()

.. _countseqs:

Example of a simple PyCogent script
-----------------------------------

You can download an example PyCogent script for counting the number of sequences in one or more fasta files :download:`here `. This script is also embedded here for documentation purposes.
::

    #!/usr/bin/env python
    from __future__ import division

    __author__ = "Greg Caporaso"
    __copyright__ = "Copyright 2011, The PyCogent project"
    __credits__ = ["Greg Caporaso"]
    __license__ = "GPL"
    __version__ = "1.6.0dev"
    __maintainer__ = "Greg Caporaso"
    __email__ = "gregcaporaso@gmail.com"
    __status__ = "Development"

    from glob import glob
    from cogent.util.option_parsing import (
        parse_command_line_parameters, make_option)
    from cogent.parse.fasta import MinimalFastaParser

    script_info = {}
    script_info['brief_description'] = "Count sequences in one or more fasta files."
    script_info['script_description'] = "This script counts the number of sequences in one or more fasta files and prints the results to stdout."
    script_info['script_usage'] = [\
     ("Count sequences in one file",
      "Count the sequences in a fasta file and write results to stdout.",
      "%prog -i in.fasta"),
     ("Count sequences in two files",
      "Count the sequences in two fasta files and write results to stdout.",
      "%prog -i in1.fasta,in2.fasta"),
     ("Count the sequences in many fasta files",
      "Count the sequences in all .fasta files in the current directory and write results to stdout. Note that the -i option must be quoted.",
      "%prog -i \"*.fasta\"")]
    script_info['output_description']= "Tabular data is written to stdout."
    script_info['required_options'] = [
     make_option('-i','--input_fps',
                 help='the input filepaths (comma-separated)'),
    ]
    script_info['optional_options'] = [
     make_option('--suppress_errors',action='store_true',\
        help='Suppress warnings about missing files [default: %default]',
        default=False)
    ]
    script_info['version'] = __version__

    def main():
        option_parser, opts, args =\
         parse_command_line_parameters(**script_info)
        suppress_errors = opts.suppress_errors

        input_fps = []
        for input_fp in opts.input_fps.split(','):
            input_fps.extend(glob(input_fp))

        for input_fp in input_fps:
            i = 0
            try:
                input_f = open(input_fp,'U')
            except IOError, e:
                if not suppress_errors:
                    print input_fp, e
                # skip this file: input_f was never opened
                continue
            for s in MinimalFastaParser(input_f):
                i += 1
            print input_fp, i

    if __name__ == "__main__":
        main()

PyCogent-1.5.3/doc/templates/

PyCogent-1.5.3/doc/templates/layout.html

{% extends "!layout.html" %}
{% block extrahead %} \ {% endblock %}
{% block sidebartoc %}
PyCogent News and Announcements
{{ super() }} {% endblock %}

PyCogent-1.5.3/doc/images/tol_entropy_gap_filtered.png
çópóÇUƒ3Ñ£G¸‘ÛÊ“Àp±‚”ùü¶\çíe%Ô·H3à3Üý6›(rè„bQˆy›öimœåZô¡K™QYÅÚ-;Å—ÿÛ-vîþÚ­ÈU.’æõùpa¯Ø¹s‡ÈJ›'×½lÛ7¢ŒÅ 24&ûЪw„mßñծݢØÍmi‰")}›øúP©(ýæSñ[iœþ%Jg}J¤t•çf”¬.×P&rWO¦q¢D†­D¸\vwPˆ wh$vŽà ö›Hâ‰(]³¼D$ñÙy,igÓ@Ù‰•ÏbS= ü¾ªi#F@1àFüðšÂÔ<(,6wqÿÒõÆoäz±),_®ñtBrŒ°°0·'Ÿ|²F¼t/Sl"¤¥Câ6Â:æš’?OPÀxaã>Õz™ºIÌa–^=ÏÑúпÅ_úiyÚ öü7DÿO½öÌ5żeËŒµpyBºÕ$æ9™ï‹x]L~O"Eº4* tO;å^}a‰9%‡eO\æï–žJÖ"4e'q}œHæ%iŸ%âøL7V|!¦JcéýL2›/µ¹É—lªi`§äy^xä< ú¾½Õ} " p >ÙZ×±cÇÄwÜáõCÙ½{wÁšB:Ý‚lÊ¥36ó:pà€„Ë_A—£HØlûM±­‰;qñèÙdÏ›$?ÆF‹¡cÄN È¡úÇ0ck+Þ”ü“¤M=ô]9Òy™roéÙésÅœÑTvÃb±•ÂþñÜzÐg1k1SˆÁa”Ç…zˆðð EÖ¿|íro³/ûf9Åáý;Ä$5“c ñåg/Ëò±ÌÉô¨¸(S¿×U‚|wQ¾¸çÏsÄí”×#09K Džœw®Hº‘éÌwS=3Rí…ƒBÒô4ªK°î¸Àñ&“*§ŒŒäŸ—iácÓ$¾öÜdš‹æÞRÿþJ»h·¶·^¦®€R¢_•6¿ùÍožžò-£ßðl‰™HÏYì+©)¤n¸œ”ºw'ïRìaŠ>Ý;’g'™*°q.yš¸ÚË{Õý÷ßÖ.çDLÓ&ÃÄfÍÀÁ1‚éÚ¯_3Ì^»WÖWJ/S/ @Ó8’e88TVnm¨¼¨:·¾­ÃÂÑÑ·Ü@íîA_Òn&?èBèÅH~º´¶#ƒ4;RMRN*Æ ¹O.û&ÝJ'Ÿå!è7ü:îäv2 ìH½[ˆYiÙÚ´‰@Û`}ãðFN6Ò&„ãÙ®1T¤½_€víƒõ˜p;¶>4ˆMà +ÓÈUfví+””¬7Ç!‘"peeÜoÿ•„7Éoå¯nEë¶]È[Ww¿“ˆ9›€Õ›ã?äéjÆå;d †ôüˆ&]-Š-!“ôtI*gÃ&l½ØŠ1s—“7° X>*f‘70Íu¦ÖVÿ»uÉdL¦¯iúc£5¯`Á-ô*ãê:Áa !q IDATúß*554'«MmÕj½R&Âã?.Jll,EÅ9&?·ß~;žxâ $$$̹Ñ-®VÖBvwÿvŸ8ñÚ ºÓFt§Þî‚å¥MKM°|ùrŠ ‰_ý›¶~ Ÿ‚ÜåEð‘ÿbáà±FÁJ—´—·ã“:nåÏÀm|ÏÞy±i†îP}Ù™˜³à# Õ·<ß´cÅ,Š˜47a-ýä!Zqø;òÞ¸·´Þˆa-Â/Oÿ…ºÓB ~ø62ã¡Ô2¢=ÝkÉž½Ð(Ü9a‘¿E«k`%†æ´³V´–m_Žy[ˆÉ܉V]ò)À  ù%?R¥󟞉¾(ƆUÿEïë)’á¶KðàíwÏJl]2CgeÐZrñ¿¡IxÎÆe»ï—õ<—€Hûzÿ q×Ä<µòQô‘ˆ¼¬^xvÉKx4'#£ÜªÚ-4­7Ó.Á˜în®ïææfƒ­àþX·{Šê°( 8`m`.ÌjµJ“$–´8Ñ&yäŒ;?ýôS`.Ú½ªÓ !È&\, ë©Ío£ûE!½-dÉŸ³ÞÐ}%…7ŸböÄ|?q‡ê+·;}êé¶t3âfmAÂ÷È f»^6m‚ã ²©Y™ÓÅûÂ”Ž¹C vÇnÅ¥_îÊÃÇ?¼Ÿå·J9­@VO$xwÃ=±À²Ùÿnz¡»_ÃÜQã¤QHû+©Aù¹¦5TÚ±{Ûf|¹;¹(l"¯Ë so’Ì7ÍV†‘‡—cy¹^ñÇ!èpcR)„æ ƒá†ßÝ€Á$¸“ÛMDïgˆè= Ë×oÆîÍëñä}¤FEéÎ{Ê«/í™CÚºËyÊý‰Òì(ð˜Sÿñ {AF3•i þ.»Za "ÀJX¬ŒEÿG+k±ÒV ¥º†¤(S¢}DK§?þñ‘ff¤ŸYúºyB;Ž|l= ç öT¦}ò Ñ»—DÍ EZ±q³—)÷Y)i0§ddˆ´¤8m¬Mì>óíLÏoqîW"‘® ÙÅ­<ÏóR*KÚ˜®õéþgñ5­£ 'M\Håí/ óã^–¼9•Ú´×]oQÖx‘ÇJ{dþTV&’åyp‚È-¢P…¤œÈ÷]éõ/ñŠûŒ:.-[ä™!n!ºQ¾.Š<ÖBÆÂt—˜µ)—(ÈË#-ç2‘—*µ˜—î>Cwº}ZCHÚùy„¡M~4Ú4/V£sëü2§(±¥K·˜±iùÆx*Ó4PJXMã9ä*Ù‰¤;íš~<™³Ù›/RÒp®ç×ßïò Ë?µkž¾×Àé¼óÎ99›E 1šðøp SóKŽdlnUú:+Cµ/š°+_c˜†?ggH‰×© »^Òö-#ïÌÌtIŠ%\,R˜1&¾'†¶ñŒyÕ1ìG®5æÎsiv^3ã~ßW›d ˆ¨¤cÕ2^1¥ÿgWIŽ·†tT¢xui4[ŠçØ4ˆèù~ôøÃAÊè 
ØÃ›‰‰šúÆ&¦›ìx«³‰öÄ6ve ܱ-ñ隆´yP•xøGø dì¨Cÿ!eÈË–- ˜…ëZЩٹÂfH¢9ÂV ;°pj¶ª5™±”åHf§cÔ!‚‚Ô°ºÍmt Øc†äëg™˜J¬U$emk Ùòæe¹[‚È78¿›ñ˜æár’Æ4Õç§'ˆØÄUâ©QÄtF?'ÒÜQ‡Òòµ· iB\Jš×úÎ"‹Í…ÜÚÌX’)b'M6êDý^>cWI¾ÈÍ#ÛbÒhÎLѤìÄ,w ðGRñˆ¯ö}-l™¬ÌZÐÙžï‡;˜D\ú©I .{Ù—™l†=$k—#é—°,1¶¨]GÕ*@P 8@dS_ÆöíÛÅÅ_lü8óì”)Sĉ'=463&-¯…Ðãå¹ßIBîÎz^žÑfÀ©lOKµ6ŽXt›ðÄRðeÀ' ÕG4t{eö2¥'{N¢{<ïgÃÏ'QÚòê-9ÄàHcn\Ÿ²A¯ù™:ä¶TÖÛs“Lý¢D²9B‘+_Ûv&zL“?Qñžð„L GÚú&ˆc4•Qœšñ0ô¥TI!Ðè(&_¿¬½sçNc-×]w^ýu°Ÿé¦žå,_µ=ô„‚¬Œðè;_áï7_ášcH±„Á¾º ˆðÔ“R…$më DDè&Bžê*¹ŠRìÙ½¥âçæíqyß+ÉÔ·¡©bƒQ‰Ží;áPI‰$óõ×_£[7·v1•ÌýëX´HÖ "å(Ïs¦9ÑÚxNa4§*d©+9È Aaˆó×B’UgÅ€Ï*Üj°3 ˆ‹‹GRÒ…9Ä›o¾‰êEMðZŽ…ÍZc¯œ5Ýö¾¿»õV¼Oª¦cÔ> ?l/ÓƒM ¸jÃz+aY€ŽàrGTbÛåM-šÍÎ:wîlDwZ·nÈ9K½­)ÎÊ é\ ®ÆaOÁg6Ó¼õÞxã ?ÓÁ¤ýûß§R®ŸÞ§Støðaƒù¶iÓÆ`¾LkÅŠóíÕ«ú÷ï3fàçŸÉXY%…@cEàììt«Qg÷Þ{ODDDgôTÌ;WüòË/g2 lD6Eb<øÓ©S'Á¦Jç:±™“>'6)Ó1YqÉ%—uì“¶¦å}LLŒ P¡·UW…@cB@IÀô?^¥ÀDছn’ç„W\á9㤠¸•¶]üñÇÀ\t-WÅ8tèÐA¶¦èDxôÑGkÙóÌ5ãyè‰^ ô,Ö¬Yƒï¾ûNÞ·k×+W®Ù{Ë{29#^{Œ¶*£hL(ܘž–šë)#УGé9‹™®žÞ}÷]éžqß¾}zQ“»^xá…xöÙguSüejÜŸ‹Œ?LÒ ž~úic:ì’4??_Þ7oÞ\nŸ³B–J ƈ€bÀñ©©9Ÿáááà  sæH$Ù÷ÿûþïÿþ™™™§D+ßu×]à]N´-iÓ¦IŸÑgb´Å#GŽÔHÚÞ@ byyy²9Ÿs"[o¼úê« -hy¯þ(#Š7Ƨ¦æ|ʰäôÔSOÉí–-[Êþ¼ }Ë-·àŸÿüç)Ó ”K—.•Ñ‘x=,³$\Ÿ‰¼®¾új2YŠÀ3ÏÃСC¾k×®•¦Jº²QÑ2ì˜ãÊ+9rÅð!'3gά·U³#=1æ5™ ™pÇŽÁŠb¾‰%ôÉ“'û«{…@£D@1àFùØÔ¤ëŠ@Û¶m±iÓ&L:Õ Å[°¾ï“O>1ÊšB&88)))†m0o¿ýöÛõ²tÖZîÙS ÑÇNRjRô23`~Ú¸q£×x ›mUR ŠÊ“Të8e˜ñ°ƒ ÜÎsb%Ÿ¨¨(¹ÍyÊqvTrï½÷+àÂõ_Ù,W· }œ‚ÞÿðÃr|~¾[ÌóçÏÇ<`ÌOe€€bÀðÕê„ÀôéÓ‘••–Š9ñ6)KÆñññ¨¾NC5èÎ .40(,,Äc=V/ó½öÚk :Õí.°o=…††z1ÿaÆySëíÔU!ÐØP ¸±?A5ÿzA€%@>£äóa=%''K3£GêE}mݺµ—FøâÅ‹Aäë¼æÚHÀæíg–†õÄëìlC%…@ " p >Uµ¦ÓB€ýó)kJë‰üÙÑCSñ¶4qâD¹Ïëgé¿>lƒÙ›!q¢ø·†# YàþcfÀú®Ÿþùçæf*¯(¨Ç©SWXâb‰‰‰ÒÞ”é±ÛÃßþö·Õ-¨ëˆ ¯?+dqT"N999X¾|y&ɦCæmhçÀf̃qvA©+pÕiª³B " p}0jZçöÌ޳؋'VH3fŒdÌì1GJHH0–Èùï¿ÿÞ¸?Ìɶ¡Ùæ×œ(X~÷»ß™‹T^!p¨xÀ÷HÕ‚êvƒhµZ±ŸBóé‰=0±'¦V­ZéEw=qâ("tÙãÆÃ+¯¼rÚëüè£ÀÊTœØæøË/¿”yþÃ/4¬xÅcrâ<»­Ô=–ÉBõG!€( 8ªZRý!Àç—;wî4|&3åõë×Ë-鯿þºþj`”x šÍ³ôÄÛòÞQ¿=å+ûÝÖM½ø¥¦¼¼Ü ÁøêÌ— gÏž­˜¯ŽÊ2ŠòÓUk«8rGPb³$=}ñÅR9‹Í—5EGGãž{î1–ÇN0œN§q*–jû÷ï/»°Äkv1™ššjbe­Y³f÷*£dä§«ÖVoœþù2|ßK/½„I·¬¬ #FŒ¨÷õ6éz ÄÞ§Ø<‰KüO<ñÄiSõwÌŽO^|ñEƒæ›o¾ Ö~VI!ÐP ¸)+ðö,3Þ‚‚c<lÀf<9Øo%óy.‡+äU«ó`ޝÌ..9µoß¾ÎQ–$!õG!ÐHPp#}pjÚŽ(ÄvÁz$ ž56`gß|óMãX„ŸYòV2Ç  
’µÌXÙMçÉ’9 ‡wTI!ДP ¸)?}µö³‚ÀE]„>øãVOÿýïe0Ý»”^Þ˜®ürÁÌzzðÁ¥y’~ïïjfÀ¿ÿýïý5Qe &ƒ€Ú‚n2Z-´! ðüóÏãOú~þùg9ö.õïÿS§NmÓ;å9üôÓO`Ó«ÂÂBÙ÷Þ{ïŪU«üÒamg>?f×–¼n¶%ä˜Ê~AP… J6¡² 3À”)SäY)Ÿrbsœ¸¸8p¨?Ý4çLÏ¡>é³’ÙsÏ=gdÏ`ÕIõï¿ÿ¾¡}íµ×*æk ¦2MÅ€›ê“Wë>g°òŸ 8ИòeËpã7ž–{GƒÈ9ÊÜrË-¸ãŽ;äè,ÝNŸ>ÝïË„Ú~>GH Û`P ¸Á>5±@FàÒK/•’";µÐ‡94h>ÿüs½¨Ñ\9¬ nÏËá 9l¡91cæsp=©ó_ umÊ(Ü”Ÿ¾Zû9E€];®]»V:²`­bN|–Ê2Û7¦Ô©S'iû¬Ïyþüù8pà€~+_*>,ï9Æpÿþý:•Q4UnªO^­»Á ð׿þo½õ–á]ŠCõÝyç˜7ožqfÚ`&[ÃDî¿ÿ~c[Rð¹¶žøüWO7Ýt“á%L/SW…@SD@1à¦øÔÕš7ß|3vìØŽ‹«'–"Ù‰‡ÃáЋô•¥ø+V¡™é¾òÊ+rÎæó_fÀ*)€2CRß…@B€Ôó¹pff¦1+ŽÅËÁzôèa”5äL||<øL˜›}öÙg2Œ!›^±›ÎââbYÞ× æ¦8( øl ¬ÆPÔrÀAØ©…žX©éÿþïÿ¼”˜ôº†xeÉý’K.‘Sûþû拾n÷ÌÎ;˜)«¤PŠ«oB !À[¹¬Eœ––VÔâT^^Ž˜˜,^¼¸Í¶êtXzÉ’%F…YšWÚÏ,*£P X} ñãÇKS%]šä(BìúñøNœ8ÑP§-çuûí·ãÖ[o­2GÅ€«@¢ š0ê ¸ ?|µôÆ›ïŒ=Ÿ|ò‰1aÞ’~óÍ7ѱcG£¬¡e8ØÇûÕ·Ÿùü×ét¢E‹ mªj> s‚€Ú‚>'°«AµG€ÝVr´!vc©§;wâšk®‘šÓzYC»vîÜÙËÞ—q4¥šO`" p`>WµªC yóæX¹r¥<[ÕC:t7Üp^|ñÅ»Ú~ýúsã  J%…€B@C@1`õMP4"þüç?ƒíkÛ´i#gÍgÁ“&MÂ<>#nh‰µºõ4vìX=«® …! °ú(QQQ2˜Cß¾}™/Z´#FŒÚÒFaȳèÚµ«‘W…€B@1`õP4Jºuë†íÛ·Kå,}›6m’Á¾üòK½èœ_Ù醞Ø_´J …€%{°P9…@£B€cñ®[·=ö˜á[™ üö·¿•ž³Âb̰bÀ ቨ94$”RCzj. 
ÓD`ýúõ˜8q"Ž;&)°É{¤š;w®ÁœO“tºñK—àÄn6ÃÃÃëDOuVŠÒÓTkiÒ|ñÅ2xÃ×_mà0f̬^½ÌÏvúá‡!‡åñõ—ƒ³=5žB ¡" ¶ ê“QóRœ"}úô‘ÊYÇ7zò5Ç.((0ÊÎVƼý¬{ó:[c«qÅ€ÃSRsTÔÖ­[ƒCÿ±¹’žvïÞ-•³>úè#½è¬\Í XÿžÈÕ Å€ÙSÓUœ óÏ?_:ìxá… ·GŽÁ7ÞˆeË–¬{½Õ›pCv™Yo V„§ˆ€bÀ§˜j®h,LžïÝwß-vüúë¯^mëz£p]Týåˆ#П°ZŸBÀû÷ï—N;òòòŒÚ‘#GbíÚµõâ­Š¾Z´hfêçw8j“FÑPeM%7ñ/€Z~ÓD gÏžÈÉÉÁ-·ÜbÀ’qdd$öíÛg”n†cëuûöíó=] U¿€F@1à€~¼jq ê`¿ÌHHH0íÝ»W2á7e§“1o?+àÓAPõi (Üž²Z£B x{øÉ'Ÿ”ŠX-[¶”­Ø‡3oG?óÌ3Õô:y±™+à“ã¥Z4MnšÏ]­Z!à…À¸qãÀ¦I;w–å¼}üÐCIÓ¥ŠŠ ¯¶µ¹Q6ÀµAIµiê(ÜÔ¿jý 7ýû÷—Á† b`’––¾ÿî»ïŒ²Úd”\”T›¦Ž€bÀMý Ö¯0!Ю];deeaÚ´iF©îIë“O>1ÊN–Q ød©z… °ú(^#%%K—.ç9}ÿý÷ˆŠŠÂªU«¼ÚVw£puȨr…€Å€=X¨œB@!`Bàü#6mÚ„¶mÛÊÒŸþS¦LÁÌ™3Áv¾5%Å€kBGÕ)4”#ަòM¨¬@EeBB‚ê}ÅÇŽ¡2(­Ü´+KöâÝ]?`ȈÁˆÐG£ì ßí š}ôä(-F‰Ý‰ÐðvèØ6L/Ö®”–9pœ‚ø´ïÒ!ÞµÚA93Mwåå(³GËÖÐ6Ì3¦¬¯mm„@þ[XXˆQ£FáóÏ?7–ÉÁ^{í5´iÓÆ(3gZµj…Ÿ~úI±Võ\`®Vy…€B€Pp“øbnp(BCbcqÍ’Ë©Ãq K¶ð3£kÅwïÁ‡ÿ3аgÕŸç`ú/B¹lRŽus£Þ®zôèNíÂÑoöZ”º»ï];ÍBÃÑŽ‚ tíÚ ¡ÍFaÃ^‡‡¸ÌU`ß‚º(׫¼âÀ:ôkŒðÖí¨oW´ ÆìÕ» £PÚ^ðM—.]ðñÇcìØ±Æ*7oÞ,#,íÙ³Ç(Ó3?þø£Á|Ù´I1_uUx# °7yçØµ äÊlH{ßãzðtË&):óÒú·ÂÔÂ<äMéã!Ü‚òmÐÜSœ°ñéÈ/(@~~¾ö)˜,%äò+1vÁQdæÛ!„€=?¶ÅºKcÏ.´FRzŠÊì°—ØdÍ€uÆZè,xÏê‰hÖ,Ö€õíÌRÚuü'¦!¯¨vg²Sâ°˜âäf»¹{]hëc4…+3R–xŸxâ º™\òÁƒqíµ×âÍ7ßô‚@m?{Á¡nÕ#@?x*4.‘AÌOd§Æ Qy­×)2¢bS…Ý«ÜûÆž—)bAtÜ‹">-O6ÊK±©6£ƒÃ¶ŒÚ ¹£Hä&G K²§§Fˆ¢¬jo9ezi¾+ÉS WÈk~ªUÀ’,ôæ.{‰((* uX„%)׫m•›¢ ¹†›ÿÕÖ‰v•Á³`Æ ",,Ìø.C=ö˜ Ûa¹à>øÀ¨6lX`‚ V¥¨”\ý»I`ÔTز˜Øî]¿Ç›x qÞßkv¬àľw¶6»dkZ¾cf\ƒ5q)È%éÕ–“‹ Ø}ä¸ltüÇ}Ø]B‡³5$Ú†mÖ,Lœ8ÓfÏǺ­Œñ:ö·"ܺ®ÛŒ KÃÄ#æJãÙD¹[×g–‹ê. 
k‹.» Sÿ瑦Î8ð ­V\ÛÓçŒY6ªmó8œ¿õÖ[¥i>.àD¿C Œ;î¸ÇH@IÀüôÕÚêÅ€ëÍH«øÃuÄÚâ0>’NÇ(¤D“Ó?5Í4qÛhkw[œGaÊTËYGÁ6bˆQÈMž†Ý»£oä-NtŽú´«é¶e¯h$$¢ø³ƒÐæÈ:ŒÚc–»ƒÁG\†«ˆ¢,ødìpXg­âá?¡Ë´l˜{†ÎÊ@š­ 3‡´­B‘ ¼Õ¯< ãÇO¤Iã1~ÒCHµsÖ|$mfA×Mk†àèå†V±/aÝ"iz N]ˆ ×aþÄë1΀Ã}Vw_±³GÍÆ†]PZ^Š=›—cà¬-°Lè/·½Û^BònF^ÛU,)”î}ÏoácÞ èžÌ‹f‡Ãº` âÒ¨ŒC´ IDAT²Ùò°ÜâämÎb·teE9JɆ¸ˆù%(.-E¹ûe¡pã|t>‹|2Ù{©÷Ý+;×vuË- Mo]S˜Û”“ 2{•¼­Û˜Sdd¤TÐâ5ð6ü-·ÜÒ˜—£æ®8c¨-è3m .DzÖár[µY¿fhFÌw méŽñÃ|µUóà‡øÙ‚ b¼Õ3× °éwØsö!Õ¨n‡ŠçŸ^4øÏþƒ#Fx1Y§Ó û÷ûßÿ—^z)æÌ™ƒ/¾ø¢Q­Ñ<Ùnݺ·l–¤’B@!à%ûÇE•ú P^¼ûÓyæ14¿ä*\; »2“¹Ã1|“_†v=:×ÐÆÜ¾éå>Œ5kÖHÉØf£u?iÀ€R*¾ûî»ÁaKâP†+V[2JË–-ÃôéÓËÔÕ<gÅ€Ï*Üj0…@U˜³ñÚµk¥”ìÛ‚Lš#™1;Á 3ß& êž%ø… Ê9=õÔSR¢oPT“Q4ÔtyjM ùõdSžï¾ûï¼óÆ'ƒVèˆpè?>Kå`:tå'Ÿ|¢W7¸«9øGBRI! ð€bÀþqQ¥ ³Ž+_Ý|óÍxå•W¤$¼råJòv½—I3´åË—ãºë®“‘£æÏŸŠÐÒ…^hLÇŸÒ™Q©2 &Ž€bÀMü  –ß0Ç”)S°uëVI“ôµÜ܀𗓩¸œLš°jÕ*I“¹É9É+ øœÀ®m„(ÜššrÓB€µŠÿþ÷¿ËŽÛ¶mÃÔ©Sa–2ÉvQ:Á`†Í[ÔãÇÇÆñË/¿œ ÌsSð9yjÐF‚€bÀäA©i*Þzf ãï¿ÿ¯¾ú*FŽ VÒÒ›4±©+m]rÉ%xðÁQ–µÞ§¾¯J®oD½@E@iAê“Uëj2”””H jÖ¤þüóÏý®ûꫯ–ZÔ,·oßÞo›ú*dæ¾}ûJrW]uU£¶i®/L…€?ö‡Š*S4R˜ù1#fLJª² Vôb‡ìuËjµ"$ÄO\Æ*½N­àÛo¿EçÎe'GÈÚÝ*)UP ¸*&ª¤NT¢Bº“ ¡˜Gõœ*+pŒ‚,´j¥3Jìݼ ®>7¢o;}4m|ˆÑèõTCÁ ‹(^oP8Úwiëã,¤ žPf?Ž–­;¡­—™L›B ’ËÌjÖHfC "Ä´5Ì}89(hD‰Ý‰ÐðvèØÖãÆ³’ÖFݪ¤jǨÒÒ»€Ï7mÚ$™ñúõëÁ[Ó¾‰·Šï¼óNÉŒykÛ@·í©Ü;°§V­ZïUR(ü p¶O«ñìDŽí ›–Wï uä>.iowè¤"‰Æz|»”Kí¹)~‚XD¶;ÞAAf’O}œÈvG…pæ§Ë8Ä<ýŸš+Œ˜îa‹²™F”ÈñCÁ)2⨕ e"=AÃF§m‰OZdã2‘lS¯ç«pB_ñé\9ØiHËàÂL_ÏsÐRôùùù§3„WŸ_ýUœwÞyÆ83X%…€B *¨Z¤J§‰€+OÄéŒÄ’äf.§KË)œ>œÏå(6Û~á4H:Äâ¾ùȉ‹í6fÀ‘ž›/ ˆ™0CÉÏ/Ð"+9m‚"1Џ4›FÁY ’9JSlš¤i·¥Š¸Ä4‘WT"ìÎ2‘'iei\Rg®ì¯1-«Ð¢I“¶TOä%kJ®§‚re9Ì´-"3_ãÚöü É ´¹»DI>Wº‰<÷‹BªÍ/—÷¢}*7*Q$&&Šž={Êñul¾’í± E/A6ǧBÚ«-ùü6è=zÔ«NÝ(¦}9ú/¨’B ”çnÀ Ä"+'ÃOÀfb|w}»8¶w5®Ø›c9úš¢$y YYˆåMÆôÅ[d±%Ê‚6=ç cùx÷!&Lø¯í|½=d½ºk7Ýp9ùªîâ[ã:¶’½²u­&¤ ¢FE›µÛ°¾“°\Ó’Cn L_ýß;ÍÛÅ!WbiQæÛVÁ£ÍOë©ý½âŽE(¸é ä=wæø=rþÄ¡º!¢µ¶íÖý*BŠ’‹ÿ¡m—îhËYwÚ¸x Ù)°öõlSëuækFF6oÞŒæÍ›#88X^«Ëëõ¬5hÐ ìß¿YYY²¿y›˜Møóç?ÿ7Þx£Ô¨¾÷Þ{½¼s™çà/ÏÛÛ™;iݺµ¿fªL!ФP ¸I?þú\|%¶¾@L#.ÑŽŽ¶†‘ôúnŒ8Ò3ÈqæJÛñ“§Ä'W u%ækI°tÁÈxl fí×Î+]vìÙ³?ø9+5I†–KôDÄõ¼WÝp+ÆŒŽFGfØa½0#! 
cc:á«„TÌzÃgmArΫ>çÀµŸ0“µâÚž: ‘ç[õ!(¬-ºPS{bð|lJû[‡ܺ’ÒŸEïâT¬AÒ¶äñž¦Ur~ë#.ÃU$lÓ64>;ôªB/+i¸¤Êм8ch[wDúe?ÿìô‘KÏLbIö_ÿúWâ,Ý2ÃeÉ[—¾YºÖKḛ̀UR(¼P ØuwšnÉË‹[fÝ…mÂPD¡ö8½c›‡™nfJjä+‹ýÿ XwöÐO×u‚7kŸ‚:FãÑ'£Ý&áî˜KÐnèF(­ËÖ⊠+žïÄÚ_¹hž7C;-A‰xØØ.ß³­-“aIÈÄÊI}k?x -÷,‡Y[P$ž¤—Q¸u5n:w è‹ÍÓ"g¦v!o•\?¡«—wÑ致€éKÀ¼-{ð=m·¥âÂÍKÐuø,Ä&gãå™CüŽLï Z22zAõÅ?üHs°ö¾yÝeH,æDMFÒ ³¼\Œç“lŸ‰Q]j÷_sèСàO}'f¼o¿ý6Þ|óM¼ûî»~͘j;&o“³-0{æRI! ð P»ÿåžö*§¨Š@ñÇRjKÉ{ãMÚQÃZíÆ k l‰#0 Œ”°ö¤ Ìò*r[0ÀŸ$v%¦ÄND‹Œ{¶÷L˜Câ uœ×˜5mºîY=ë›SìÜßõF°}/ž›5ú& 3íh‡–u¤|®ÚŠ¥÷ A¨óþ³„^HÙ©+3ßóÑ5f—Œ‡GvÄÞ={¤ŽTp›®èÝ‘@»¥¥-*‘ó))(Fi›0´uÛóV’}q¹Ã‰"f¶Î—–"”Î…#ˆvÛK,4t^Û5 ãtDéÞwñ<-Í2åI‹ÿnxóh‹:k®ç…Ũ< ™Ã‡ƒ•º˜é²rKÎþR—.]pË-·È¬Ø5pà@/é›û±G®GyL“Ã)Μ9o¼ñ†?rªL!ÐtPêà º"'ÍoE‘/¡²lAÇž")[³ãqØ– à6‘kØñúv {2 J‰·R;ˆ¨¸D‘`%3¡¨Á–¾z›a‡TÕ )?=^öåþòc‰™F‘ïk• rK4{'[ŠÇŒÈèOt,ºM¯=W®Ç\KŠfâDó³¥hó6×'åè6Ê%"-Á»Þš”iô®|ÏkÖÇòÍ™(¢P†bÑ¢E‚M̶»æ5pþŠ+®ÄPÅgŸ}Vëi&µ0Ûoذ¡Ö}UC…@S@@y¢_•ΕäQŠÏ†õméÊ *ƒÂÈ‹ÏɵÃ1ép¾<ÆÏ™è1¤XÂ`_]†‡õsfîF—ÃQŽŸ~úRt Ý Vy~þë¼ ÑW„¢´\7"4žgîÌÓ1{¸ªDiqì$ ¶nßEJ³Z+íoãpØá 'ÏVþu–åÝ4+©}¹ôˆ`'JË8N´ÛwéXe}åÅ8L‡·GG£½’6/îÞ‰æu {Y_~ù¥”rY"ýïÿëEÕ|sÍ5×`ôèÑòsùå—›«j‹‹Ç5æÄî)¿úê+üæ7¿©uÕP!Ð4…· µÆ†‰€“_Ð.’p“¥„˳´ç$º%X‹qÍ0I°Þ+ÑD)’V—ÜUÓgŒ{xš»p5vj^·Èmn}<X3¡Ñ³±nóF,˜1Q‰Ïb +Dû&Ò_E~=FZ:5am;¢K×nÒ¶Ù(ô›)ÇÖõ´pB©þø‘"Ê·£è©5zÑ›‚"4é‰|m)¤ë D‘öŽýåF™¿ {³zë­·H“»¯¾ú*ÆbÆþšÖ{Ù²eË@Ò¶¤»cǤ¤¤Ôûg† «n•n8CCéJZßÍšEcᆽõ8Ü1,¡ç¶ð3ƒæÁ7~‹áo~mÜ{gб0:¡‘Ka£PÏ”eÉg”êV ÌI"…HR”ô|õ†Ú•¿¯–dÏܽk}îÜú )ÕDùªq^>¤áVIÀûnÕ`Wv k m˜Z‘’6t6ˆáwNA üÚ¤bYéÉ_€z½Ž=dÔ·×,„tÀ%úޝ¥kÍÁ}.ÓJ‚"=ã~Úê¾;â”e•¥;0;´æµID¾‹"<ùÝÍ¥­=ÙÚW:u`ÃÜ›0tVÒle˜9Ä 邎$µlÙÍÝw.Ãî-¼CÝRRâ?+òŠ|EÁ6šk" FUƒÊ°ŸhóÖ3oƒoܸ±AÍÑßdZÐÖÃæÿ¬ÂÚµk±zÉ\\/½•ý ‘úó®$Ç#Ôq{iõÇå;aà„HHË–jöœ1°ÙvkÁ°¨ïñ÷awIõýy^ÖÍÆÐ9G‘U¶#{Ñ·JßíáJwWÿÇ)ÜÀ“Œ`ÉÞÁ>*‹w`ù¹ˆn=Q ¸£7/ÐÏ?¡³•-ëñà¨~´ýÞ Ñçbã^cóü ·Íš%q¦Ížu[T1íÓG¯àD¯ápwŸyyZ`.Þ"Ô|ž$FK¢[²Ô§NR±…¤ˆD·D«9ÝVƒÓÝ È*R22DZ’&Âê–k!süߨøTa£ø»%%ù"#I“FÓòXÂ-I,ÙF%‰¼2¾wŠ© ‘ÁÛFìcŽ=œ'òóò(V±MØòŠ Å–²RjÉK—ÒDbfQâVèrŠ –@¤Ô”MñŠÝ}Yk›ÅŒ¢L­.5‡âÛq3Ýq‰ó¥’–…Ì–ˆ^Y¾Hf ê“hAë07„ëþð¹>^ÿe—]&Ž?Þ¦UÍhÇAj¨[µÝÚ±á ž{r¶îz†”H‰Ê¬—äMÌý]Ëð»lDÇ#EûJ¼¾÷e¹¬A~Ì%&rUîy˜¾÷ýéOâ¾)qî—‹ˆ±cüÔ¬-íÖH.ÿTÜGíoxæMCÙ=2¹}NŒœ^"YsÙçëòœÞxÙ¢ýi6çóB/²¤÷ W|¦gg½üÕžåfŠÔUKÌÆüc…梱⡱4æùd Ãj;ñitT 
rEfªæ¥,*Ù}ŒBÒìÍ’éõ䘥ˆì]ór24feÕ¶z‹²´>qÉé"#=ÙÍ䬆V‰wÑPÑgÉiŒœ¡Ç¹Lž´OÛÝ/•úqʧ¹;ÅÇ;7‹”‡o”ë{î½Mr§†pA{_ë)®3´6~~z‚ˆML££›˜wýé×_ö]µY³a¿uÚ­ò>eã;â=-žO¼“îûˆ”U—åÃn˜!2róEIñ&c{¸^®­@ßɱ&‹×%Ìcÿ)Þyï=ñÞÇŸÓÑK¾È/(qÏË"Lú”5<ÀªR 8°žgÓZÍ)yÍ¢øÃ…b‘Ç1GÓ«á­–üB RÌ’?â,m‘Ïè3:É>úÈ‹éëR"…I±©š±ýXDBj¶Gò#LñºhQ¤,-¤F¯5Õ½sâ[Ÿ¾][kŸ "%-ÙK6vgjé\FgÀÙn\ÝqжµKçµ…{ÝÁ>>’Î6l’1j^Ý |”ÐZô£M#‘xÙ+H”HÍ¡ f·¶ô푼vÏgÐ(^§Fû¤NÎQ’jöY/V ø¬C®T(tyäãÇ;""Bn¹ëuõuÍÉÉ¿ûÝïŒqtfÁn8gÍš%ȹÊPFQp4§ÔÔTA6Ë‚XTic.°ÿ{ße•÷ÿ¥ Ñ„¯A†ä%/0¤Vjáe´Ì2ÁL»(¶š5^*„.*þÓ +£-uÜ6ÑÖ°WÇj±«Åí]ÅÈ VXÛÂ*Úà&(èŒ5èPçÿ;ç¹Ì „›ör9ÇÏðœçÜÏ÷ç÷œßUx##•žÁ,Ô'M%Xºiy¢:š»œÖÍöÒ;`±8Æ3«—\CS΋g¯=5˜õ™oafîfT37rüºi·8Eñ fH#öq­¤yOóä¬W¶t,ÛSån,"|©¦}.;‰T2GiKOK§]Ò–/"\Äý~W’§+_có­ÂäN1=4%¥²ÌÜbß2h÷ô->WÛ6‚¾›2I$ßgžyä‡FUUžxâ lذá‚LÎ#=-Y²$ëõÇ0ž9s&}ôQüŸ~ú):äõ!^}"##ѯŸ‡ó ¯Z÷Mήí8è¾Å¼K»súÀ5ß5CƒüÄÕ”a%g=œËŒü{(žÄµŠs™j›FüZ’G­@Õ<êðæ—Ƚys†yL¢d}<µ¹ò ª @›KÝÍ]ÜɆšüƒBÀC_Wý»¶å 3=ƒóœ09ò§Iµ€]õù'i×{RÈÿGMS²k&_BF ùØ2k 6E«»6â‘¶:Lä†%ß @ ÓøÆoàöÛo3RàLŸ>]؇ÿ·K ….<ûì³"ä"¡ôa¸ +'¤Üý&·Aö´IÖÕ“áúדæŒ ˆVU7¬Ú‡ ¡C°a_ŽÍšÃåDôɉUÌ`DGÔ _IÞ¬¸Íø±kÐov Õ„Âù‘›þúü-ª ðÜäµDà¬n{ejŸY—â}Í“8´á+òN}&aÃú‡¬Q‡ã·—⥱aÞ|Üõ›HíÔŠYs1¼Û ÜñÆ.T®…„Þ-³Hà–ù\å®$ÍÒDÆ<€wÞyG¬yΜ9(((Ð}G7f#?þø£puÉeœ8qÂgNŒ!`…^ IDAT©Ï¿²w—àà`p÷™Ú‡;ŒiLrq/¦Ú¯jȵD·ù?p"ˆè§z-öuñˆòñ–‘W"°s'ô¤|AF+Øÿöbòµžë¢}vP=Wò¯ž‚"À€yØgs`°êUæ§Óß©>PõfîL§Â1MNÉI*s`}H=ãîâÎù£sWòpÒÊ“öUiå0ÈíK$ÿ—¬X±ÙÙÙ8uê”`“f2žþù—DŠSعs§8érÂËûžOjß¾=zöì©YØò+Å0>Ÿ¡Ô¶'Qt íˆÅúÓé ìz+äñ™Æ^¢þLA:‚ ï!ß‘ƒ¾|œõÃÃd\>uÐ4´ÉzAEŸ`*wúO*^瓪>ÇS³³@îJ½YÜbŒj ÆáÀIòÍÌS¹­ ‚Љó–C¢0“¼áÌ[œq&ÿ1L ¬H°Ü*ƒÞ¼ÏFaþƒ#Ñ5À‰ÏÖ=)ög¹UÙ1§ë›NäÉ÷¿ã’Ç042?|ýÍË!íùV{ú ÅK¹å%f±„9¿X|¸vtQQQus¯Y[¶la>ø ãJ[Zûú®íÚµ#ŸËvÏ=÷°… ЏÇ\º¬Ì—‰:Ó5º@SÂòZ‡ÑD^Ò4oY¡ÚºšÖ×€w7šÎÉCl*ÙŒ¦TÅÞWóÙÜç2|ÁyÂ_t ³ùZ½ê7ÜkšÿtjÏÏÐ;€Ž«Û#c iK‹©›´S¹=±ÇØFòB׌üÇøBó7—ùñ™$‰Àÿ)ü§è–[nÁÞ½{Å:Fމœœœ>}Ÿ|ò >øàá;š³›}%îkšî¼óN<üðÃBi*,,Lø/öÕ¾)–ÕT;PãOŠT‚7éÀ¦iÁ˜ŠL87L‘²j¯¹ÀlÄÔ³¯Á:ÿB*2)'Yø¢SŒk¥j‡Îê´ Q×Y«A·|'çÓÒVdÀJÜÀ—EVI$¿/\ö;pà@pö2O]ºt™·PÄïQîPS3g n¥a ŠvlCA¹ïÙøZ|ÕT;ªPVV†ŠŠ*Ÿõ|ìšzú¢Ú ê[RRÏ•h çcW”• ŒÆ®“hýUe4w>Fë*¡¾UŽ:]½ h_ݽڴÖ&“D@" h šÑï³ÏOPP3™L,33“Af¿üòK£vbM¯3^BŠ…•ºÕ½QùÏ‹9ö:´æ–Fûx~o¥V _KsÓ¨­‘åÙõ"Ê8Y¦©Ö¾ I,¯¼Ö"íVf¢qÓò½Ç-´$ÕÚc<Ë*Ô&p2KB­±Ì̦]œY»/˜y»M_œm»Ùklƒ)ƒ•ëµîŒ³8“hmñéVw¡ÌéÈpk}ó’û–4† 
RïJÖ®]‹É“'‹à Ünxذah”É’!ÖÒRØlÅÈÍLÅÆeS±Ùêã$Xïìî _§×ÀÞÃj=„ëÛ»Ûµ‰/sߣz?&øù!|ø* †û¬ÏÛ¸p®C 2ó‹aw:Q^˜ëJ˜2¬ê¬Ÿà¿`ÖRI Zª]\EZfJ+í°—[‘Ÿ…ø¹› œW]¸¬w*²­6TÚ(Í·À°qæm,ÝyßTK®èë¬,Fz0oô”PmMÙ6t=Ö‘»Ë…rk&°vY³_›Z¹VíÃý='C¬ö¬Ë»NÞ $–_‰€D I#à$⣥ŒŒ Lš4I»­såy×®]ðd[×i$ ì@‡n膈ˆ6i è‰jNTcÛ¢Qð›¶^%X¾GqmÃ4" mÛ¶E]cbü¼©H4®ù~'¦N} 6_¼_m¸À~xƒ^¬Ù©TBkòJA˜òÒK˜4°‚Ñ©Ï-¸‘Ž“nJ„{Þ¤ˆÂ,½ú)7ÑScþ¤Á BP§hÜ3!8yVeaÒâÅ @„ ™4öwå?‰Î}&-Æâ)ÃDßÀ¸9Ö=ƒó‡ƒÔÆ€‡îŒ tŠž„ ³Yoì‚þúRs‹B‡Àžš4êZ{g>–Û*‹$n•]nZ"Ð|àòO- 4ÄnF ÉpçÌ™bAkUúõ’K.ATT”~ï™á²R~Z=ëºÈYŽåë7aÓúU˜5ª'"Mˆ¢6wâà'9€Õ^¿üÒ±sûÞ¦täSøDk^&¢é¸÷ué T“гÆeGAÁ^=˜‡åžËpçküÚ¹3z…wv—‰\ UBö[uê¶®Y…ä Á˜gMBÆ4N…)!Ñ=’ÎΞIé[BòÙ ]x[…Ï·dÝ숶¾äÂe»±ŽÖ>ÎÐ•ÄÆšÌ¹ßnÛ„WMƒav̹!‚¦©¶Ÿ¦¿.üPå–ì^Ñ‘¯  ü ­Š2+^»«'–Ó‘µø~ÔÞ™çJ[}^gFËŒD@" h‚ëòF2KòZá™3gØ’%K@E›¹s粜œ¯6Ú³0C‡~ø•|—>ìÖ»`ññqŸšM’W%9ívf·kwÚ(î«ÝÊå F–¯7q²Œáʸ\Îë°®¦ú‘̲”ÊžßKle´§¬Öɲ¸Œ7-Ù­éÊX\Dë,2]}´Ö>QƒÙp±æ–[Ê't÷eÎ|F‡Lf&°&sõì›”‘ÏrÌŠÌ{åòÇÅ>Ýõ$.8ÀRøØ†غÚ2çÎ}ØðØhÑ'5;Ÿe¥½úÇ-XË23RÅüˆ]Àæ”ý+ã'°l›ƒeÁŒRìþâxäà‘—Y‰€D@"Ф –²þƒOö¿õ®íèÑ£lÖ¬YìôéÓõ¶QˆœeYmlûKCzÎe39ቷ¢kËN¥¹j+BÕ;+´LeˆZÁܪOv–>lÈ‚÷X©Ãå&À¯ŒdÑiù4›[3ô}ŧç{`RªJKIgÖR"¨N;ËÏäëâ¶ŒY8ì3Ø«¯'¶[3˜)Õ KË™ÝYÉrÓ¡¶Ê Œ9·œY-©ŒäÂŒäÂŒäÂ,m<מM³Ò ‡%%…‘Ì™ú:Éœ‰~™Ö^œÅ•²Âèc$üÊYeñ–8º݃œùœòÂÀÇ÷ëCeÀxfìA×ЬŸ¸KHÍbš Xý¨¶®P&‰€D@"Ð$ؼy³×ºN:…+¯¼Ò«Œßp[áôôô:åu "Ù7ØC*KÝ ˆ» X÷Ý9Ñ,¼Wº~„•Á~¶X±fJ´`?×ý‘¬ÁþMK0hª…Ú[Ã%£±‹ä¸'€cÿó vN¿qu'×KúÞ³¶1/ ðõ8, Ž®®|Å3јÿik©iàh%.rp»öèӟκ{nÅÒ÷P¦öå½´?‚¢§cÞÕªoóh;zÿ(‡u¢übhÕ5?¡ð#*j'ò]kM&™³–‚T™³•  ìNÅ$ ˜»qÑ|#ÌÿØp¿ÎØfÄñù'œò0nœ2•Ôâçã× ç‰-¸¶?à? 
wÜxµ{ŸT/ eÀò[ 4Y<‰êÙ³gÁ=cý¶”…-ëÖão{O‘ øeįMé‹;6aRÏ©4ô ØèïÞŠ±y–F­©£„Uµo9ßeHY›Ž¡|1Wv%mçÿÅšGb1›ˆÐ™cù¨øI׿"iiÝäÔ a«`+UdÄå6nï«è)×|ãnBLèhÜ2oúÚ20h^ Ó†Á úvÀÏÔ·Ìöot§ËK©o…jï[­ÉÁˆÿ×jŽÅ=}~$ytø”ñákŠðhç!XOY 7"°¸H©/*#ÑòA$ 1`Ø.sž…W†œDJBoI”ôœx)©:\€ÃedßKÇÝ£1bÁ?GàZª®t\±S¦` }¦%?†x”¢C;Òw‹}SÆtë‰ÁäÉ‚n]¹[‰@³A€­X§N›“~ªÅµ!6ô¯mLaA“<2!é5.?>É̬•.æ$¶·Óegñ4g|z-¶©j?kÊÓUf1ÓÈÞ4V{vÿ+éŒÈë@Ÿ•$UdÀqì½Wë² µµæ§ÝÀÚ«{Óöƒ2§³àu¢ì[«3&eè¶Êùæa,œ³y½úk,t;[M²Wï:åžËŠ™=Ÿñê§¶5˜Y¥ó6-æ*ï¾ÛøöÓ¢,žØã+êî®ï5ƒäÒÇX†°+6°ÔŒL–ea)ª\}ꄹ™³àeª€$Àµ‘÷‰@“@€¼]¹ä=ˆyÂú¯Ö§):Y=ýXTæ12ÎaIÙn' NÎ1ˆÐhJEv’¥*2X"|GsÙH«'}8ÖRþJ7^í¥„ÅX¾Ù(d¬Z[_W×ѰX3ùƒ¯ê†úVZ-;CŠ[©ÌÝÙÅrU¥,.®7¹ªØºéD˜cWª/!.V¸dÌñ´æ‘ãYrÊ… k V®R–•žÂâfŒOb)&®ØF¸¸¡¨wªÖZ!YÐôŠ(“D@"дøê«¯˜˜¨/ŠbÐÒG}„÷ß_»=ï«Ëm=C‚Ëk‰éLö¯?4ÖRÕ…3ÄÂÍ)*¥^UX ¿¶ƒCÿ†\3;ïz\BZ_²§‘³h¢Ë~ëë@åþÝú‚N«ø¦œ›ÿ¸S}}Kv¬B¨a*̹8ðÒØZl_¶.ƒáó²`±Vªra÷˜^9ÿ+qýnÄë§šbù£Ï(’1oÙ9[±ü±±¢ù½·öRºù‡!nÖKزãvlYG¨Üˆn!^£Ê$öCf%¦À²eË@n%õÅð{Ï4uêTìÞ½Û³¨‘ù“(:P€¢¢ìß·«fÝŠ%Ô3Á¨òLO5âÓ÷×3^Æ$’ÑÎÊ;0kÕvt[»dDi0Ò?/†3ó~²†­/b:ù¯NŠ "ßÍ䃙ü,—ž&™ö©ròÅ\*ÕaÇáÍ‹0mé&ÿæŠ²ÃØ¼t®XãÃ*¡k¨oɶ¥ÂKâ͘?. EªÜ·H~U¹ð²˜ÈËÕàvǽåÂd×¶oÏ&Ÿ «­²^âË×Õov =Lž*d’ÔB@ž€kÒ¬oIdid*gxäÛ·``Pý;*Z? 
}g8]¼·†;°ñ‘p̰šQy QUº¨¿¯¬‘\>üðC<ôÐCà®j'Z\KŠâRŠF‘‹Š)èAïÞ½‹šè :Ú÷ɳöxò^"ÐÔ'à¦öD~Ëz mV{`>ûŸ0˜—`l’׆áþùéýe þêɹöXò^"ðß!@JT>‰ï¥—^Š®]»êƒjºgÏž3†å6ɉß}÷]½ÌHš’7·'Öàz)®'ê†6«¿SªÀî7’Õ^öÒ£zzÔ¶œ¬’NQ'öæ…O5¨>sF„¤ÓÆ®*n($µ÷¹tŽzDÔ¯¡¾Þ!é´ÕÐÕWH:¯j-$]•jêQIY™Ð>|˜Ìg~[ãX7”.\(*E<òj6qâD|úé§8vìx<à~ýÈ¥¿šxB-ýå/Á¹s†µZ…¼Jš’7ƒ‡t¾Kôå{¶ÎÕ6ìÊ&Īö¼A€Ò*¸]ËÕĪ.Z žÎ¨‹ êð•ŒRö}æ+Ò Â+_¹_d¾û`(Fx¤Îc Jˆ­ íò|ïºê"Ìò#'û¡…âQhp[LX´µ–_bß}«oFŒÚ—Ëö; yý~mJ&çÁ$ûG÷îáhë7[‹4BZ-| · óv˜dì«Ð^Tª°™Õw?†wFLò&Tx¯^Üí_?‹°¾û´¡}´áld !»ÝÛ'Ì×\s8sÙ¯gºë®»Dð¾·ääd!#ö¬—y‰@sA@àæò¤.ô:»#Ö|{Dq/†W)·ý§ =YÓÏå:K‹1€BÒ¡”” ó²`ÊY‰Ñó>PØŸïZéÊ£¯{¦öQ8dµâar¸ §6#ÉDÅí͉— Þϯ-â×ñW¨o?ZÒúí‘’ IG'@;($²–Å#c¿BÍêëúé †¤Z@!é@!é›nÂJR^ÊU©¤ ¡ t t tH‹ÏBüÜM*qwá\‡PH:êë…¤C‚u%¸–/OUûÞÄäe'IoÀ.XÀöâ,XWN¥u¹_6x;îbÐ ÚÅ驵3^­' 7ˆ¤$ÕÑ¿ZJþŸ1`À½Mí gOïܹSœÀ9¡n׎¼ýË$h†HÜ Ú¯-YÿÁÓ3õ÷X[X¦WVÛvÚˆÁ½<‡^Û’2JHº°ˆ ­‡»Å￳ºÉt2›µ© Mót‹ˆ€Òéµ-¢cbC'EA«¿Ç[S±óû†Ù¯"$]© Ù)|§è¹ç â!éf!:Œì-ƒàI§4i¨¯I·x ú„uBP`†G¬8ôƒB¼£§,ÆüIƒF¶šA¢qÏòhtò¬zB I7i`´¥^AÝázRàý岟ÅÄι²T$BB•ïHPþHà5lGÁzt¿c ÌÙ™ ÇîY%Æðü“øØcàZО©oÿ8Eø×Iìök¯½—\¢ý|5‚UOƒ ‘CA97ž¼R••;Ýû%BiªŒ]/+ßc<ŸìöÊJJPRV¡s ô.Üž˜XùeTÇmkej…øtÏ! 
›)D»¼œªÅ³ KéÞÛº:޼$0ÍIO±…{Í¡å…åÌYYÌÌ<ú †ÒLwߘekÞRÓ3˜%#ƒŸÇÜ'æÈNõ£WÖÁ¥2Oñú“bÉe6[1Ë%ÏHÂñ<FíïÛ!ÝUZpÈï*Ícéi)Œ˜̘R7 yC}µ™Š3y0õxfuj%ž×JÅ;R‚”^¯qds®óöÆÄ÷—¶é]%ø:}oÒ2·³,3ÿ%Q0wµgùvá!ʔɽ YiÝZt}d=óÑGØsü½?ÝØ^·#)µ½“eñ`ô£“>B(T×ã#ÞN¥¹üyÕ^‹ÙÈsþ3³©ÁœÅ™Š·+ú¤Œ|æËA[I~†©ÎøÎâlèAß[B†î9Jy&î5‹ïàv›6œ¼¶8I¦–‚…ã?ÔúxžWC›ñ-ZWóK7æ®å,I]}qQ1Éךө¯'N0Òrkãß?:Íêyàfý%‘/ÚšÁ¿·ÊwT[Û Í™FáñˆUψUψU/Úê/Tô,è|¯ögžïYÜ5cfj*˶ÚX¥ÝÉJó-‚àÆ«XqìL©F¬|F¬|F¬|ÇÀ¶× d/¼|ÏñËYšÊ(| ÓÅìDŒùÿÍ„ŒB±òBZg*½¼‘@¼ü¦‹P~)L’`íÁ¶Ž«$À­ã97¸K{e9”é¬ÁV-£òb„¤Èн~Ú»€XG½œût F×û^‰N°qãF<ýÈ42¹G=F‘€H&ËE£þAð®´ 5qy«§Ø´¦lÖ¼²£BG#öñ ÜÔÖ¡*Õy›æ=Ó€¼·,oÒÚF„û¡Ê‡Ù×á=Ûiö1ˆlO8òuTì@¬a*FâØëHÚý3NüG‘÷VýçŸ$u7âÆp÷ÿª€6djåõøjP´m^Y4 †Ù90ç>†uòÒJh or ål`–<++,´²ü¼íÌ,‡ƒeÚÔÓ `yD›¸ÕõË€K·§ˆÓɜɲ2Í*›SeA6òì"¶fyy)ËN¥“N/%ù}¥º„âÌ–@ìO«äø¥ÅÄ*UN“•UÜP_%ofÖâbVhµ²÷+llúYëæ×^qKØž¢Bf¥zþ)¥£™«|›y×c,kÏŒBÉ’#Z´0ÆC¼Ñ—YÔBG¿z—Q8>–`)&Ñ÷ó¢­.¿uíe‘b¾X–¯l-±°£^´vuâS±,jñû,+{/±±lÃ$÷úÄZ£ÙgeJgý´Olg›šsPœn9û˜³êÇ÷ë¦ïMÙg<ÛüM…`{Û­ªUGóL|–}Q¦j<8þîuJçýÇÌý ;bWøç•;Ÿ©ÕÿV6šÚÄ©ìvgÑjÖQìU]¿q"µîÅò'`ÃìæmÄÖvÒY8+…XùÀžÏ—*£iÿM[Íõ‚¹¢üì³Ï„i}‘d’4YÎÿ ËVì½>Òô¼±W·Å-WÙqÛ¬å^u’‹'E‹²‚5èÔ”åUOáì0p ÖðÕF¯:qs• Ö¯žÄ§3®ÃSŸÖ­6çE›'¯ÁìÞu=â^Bîû)ów`sr0&CvFôFñiÙØ0,Úž)Cáw?¡WtO…›R½ ÝÛCIe¨,^¨º7-Æ ¿^دö~ê©XüíÜükÕß±×±ÿJ¾3ÿR…Ï+v⺟âÞ¾ñØyçÛ`Ÿ<ˆý«F`ÂÚ\üç[}j=CŠWö2˯?IDATXu¹Õ‘5s6†¾»¯Æ‹»®Ã{n°™l0ãO{·¡ÍÊë`zOï <°OX—bù·ÃðÎÿÎÄ‹·MÇ?=ªç,]„Õ‹_z?‚·¼‰ä Êи\|÷ÉFì«®ïÕß:®7è9ׂ=ž‚NTòÍ«#0`q.7ýVS²moblD ”ˆJÈ}ìŽÝÇj=åµ… p¡^5ž{î¹Zoˆê› ç[¡ÌKŒäw QßÍ›7‹ÿšN»UrùüoÔ2Fc†®ÅX±8Ej1hùdŽÂÕbm3ßµ²²#_³½À&LFenÙfþŠ¡ {é{è?,š$%,.×tZXtt s‹=«Ø =øï€ç ØÎ^"WÎ×'¾,NÔD€Y¿äEÔF‘û:¬| #Ù{BaPù ¾×"°àßQ‹·1ÛW«Xú-ØvćâU)K¥:÷i4šåÐâœBùÎH'qMñê[¶¬?XŸGLJÛ>²bÂXW¼ú×ûbsÿ0X\‰ñÎRóŽ2KJ Å.f;øï]'FZ욉±kè®TÖ-ýäË_6@ô#2ÁÒ„v_9{â*ÚOÇ{Ù7tªvV²4au&4£µõåù‹ÝË?-)n!/Rr-@ò¨Âåó¿Ù9Y0³Ë×o¦õ«0kTO¬… qƒˆ´hé'ÅUpD_\Ù‘Wamø ÅÿÚmëçcPò^àÄ!Üöâ'°ïÁM?`gæßp†ZÕ¸ì((Ø‹S$oVl¤CðÌa>ø. 
†nÆÒi±XtD¡Ç/%õ¥³øÞÑ3oçõѹw’—¡ôôI\RÃÐ9×QMX§+Qúñ2¬Ø¯)ýÎ|‡ÍKçb Õ=·çk,Êû×Ày¢Äòå7(·90bvÆv8Šwè$øæZœ òˆ1Àâ"®@{æô»C7ÞÞ'ú]A}Ñ<€¨®ÑÑxÙ8W _or5rm(ŽâF:årnÇ@Ü64!WÑà¢+*ÉÆºb_Öb_€˜ 'ÞM\ˆ_î WëÛØùŤ$̦ÆCб­ÒGþmüæÿÖL#GŽ‚µ[y•´p\8u¼•&»"éz5º\Yÿ¯§ëÌ)Øi‹ÁüGúד§ïã_oÝøB»qèïZ,\v7Þ]A,JÔ±ƒsñù„k1ìÕäñ«#&PéÐ$Î ”® åµê!mÌUH£~ª›oêJЕ÷ŒÙüU¸wðh²n2;z z\°_d1´üÓÔh‰Çz¹'‰@KFÀEö´ @|(P}£¶*XÐd®ëBQ/EYËÍ^æv±éœ-jŒcÜw{`J‚®xµìØö/ÿW(›=þñw:K\Œý« u(ìãáìÅûh SitU°e}Ážþk&¦±±+sÙ Ä–½Šœ¸]¬ù‹rVñõŸ‰]ëÍ‚Îwílåöô +r¡`që%%7Þ÷¹7W² Co¬Þ¨¹ØÁªw1Ͷ9mýŸ{*<0r³TnŸKöðÛ|Ƨq`PØÀîñí¬Üú»‰×ÑGØP»JYVzŠP˜2ŽÿÞK3¢q¶½Jí8ŽåÌl¤>d#Í×Çû´ž$åºòÊ£,CØúXjF&Ëβ°”8EÁmù®2}j™i]H;àÖõ¼ån›9ÎBű†øq×~äIlûa‡N”9qö 9bÇŠ Ø›»Š-*áÐ@±³åÑDóv âñü^®!L2Ó$"$\–»èM¡ñ¬;¹ Y²’‰ˆÃî¡J#ÀiD€‡˜ÜÎ3ôµv§v|Íý®e=¬fwP~æDºO^@å^)ÚÐ Wi«^çh¨H†»{Å=^u\ÆLªi,%û Ûüd¬¨{ý³#$Ï}Yi¥xws•ç1ò:Jós¹¶{ü‡7XU\l÷ë3½êˆiÌ2u›bjF/Nú /rÿ½Îm¤‡=ÅúÓx §¾Št^i6øƒÉFZ$O"ŸÄR„ö=î–ìzNEV^|# °o\d©D I" ™QQ0 FÁ$“ÊTˆT‹pÐÉ-ÅÂ4ÇfÖtnÆDžÎ …YRÞv‹j:õ,û·ªáå°®c$~~ŒQ0 VJ§Pž„ùLXOÖ'ñuŽx–ž•Å^¾·‡>ç82ÇQð0öÌ8"JCR] 3õ{fÓ2FZÀt7›Ç ®ñãù›f'RàcùÌÿ([“6“õ¤ò?¼ ˜þ\Å ›e¬æ.5=û`k?þ{ŸÚ©üꮞu”çJO¼ýXr~áÒ\Vve$žeè~/ûóreOo­}” ŽQÛ{ÌñÀ­TFÎ3*Ë‹Y~áf™F÷'ˆ1R·—*Ø–±eüTÝ?ŒáÎõ¬Ìª¸°L°1 ˆ fßæl OUš4ɯ›\ÔEF@à‹ °^"p!P0ùuö8âfój iÝRÉ%c)³ýû6Y%"fÕ¿¦5½î‰tä·)Š·5XT¯â~JúF6»× ì…>gÙнó ²¨¥Û˜u×›â´É ÛÈÉ,¶=õ½*ŒõOÓðv‹NÀ²»»‘$”ÆíHŸ'™9åI6‘ʼ éUtO¸êvwGï:ÃÐn¬ï£ÏŠöO¿0ŸM$Íeム,q<ïO¾²…ÆuízºzŽ9’Ëôuy´¹îQ6ÊÇöª5.µ ŸÅÞ_JöÓ½°uOÆxÔG³ßý‚¼k™E™!-—¹JsÙxñ¹ÛЃ…Y‚Ím¤yr "^JÄœ¼z ·–¤YMD\¦Ö‹€$À­÷ÙË7CL?ÜéÞÁ$,¢S¢ff¤ƒ%‚àmfÄÙ×&f¥@¾ƒI gÏÜéAœA d#–°m+G²¨§^dýû%°;©›•ÃÌ\qˆX°¹D4û'óÓp$}xÅ{Ï7Ýìí£ÙådÇòÞ 6·ˆSa÷¡¢Ï£YÊIŽ‹¤ÐdfI‚u|5ë1g!Õ+X€ç´ ö8?½×L¢’­å¬îë⮓¨ÌOseîÎR¢2=>†N÷\ö­F}êCv½†WÙjb k˜Ï(Ù<+÷êã£À&Ub­Ík¸-rô {Xee%pðWíÆ/.;÷pF^ÍêÔsÅ2êG|÷ôDf[Ré~Ad’4/Œ0gl@´ú¿wê}äyrк™Qˆ/3£Yp8f"ìa¾§ Fº·]ÃúþˆwßÒ>{Ù#Œ¹:cFS,¡îfd}‹÷v¼Žg«pî² |1oN_‚Ï&θjÈ\Qòÿ{¸÷'&àÏñà!îé)ÀŽ»và€päåÂ/Ø ²æR“K˜(ýT¼Á÷™@n:ñætR³òHŸ¯šÓ'¼àò¡\ŽI=£ÛïþA ôÒ-ñ°†FП¸Ë¯2ŒºþYìý9Ûæä`ì–v¢ Ú(ž´ÛüBöÁ‡@^Êó#n;üùÚ'…·®)ècóN%¿€6 ñ°ŸVFÓÿú… S~ë‘ñGPý<Êl+A€›ôÉ$43j“ 9+9ašŒ77lÀúlÙrdf„¬%¯à€CÝœ Ú·o"#g¨,§¨Ôc×J0‰Ë©äØO?ë害ÜZ×3 çÃ0jlÆŽºWøù°å ŠÁ·ï/¢N1Xñn6¬¶Jd<+á }°xÃì`v¼|óq|·Ãß*ÁõÜPVMœ$§ñM0çâ9¥V}ZP)ÇUwaø¼­¢¥ÅZ‰Äa¤â¥¦®Ã’p7¹¥ú£ð²9Ãéý¸šµd’\$ä 
ø"+‡•\<N¢è@Ú]A>ªNW`×[Éø3Ÿìj%Fª÷#¾-Ц¾Xg ä¤ þþA“˜Œ¾³"31.¬ oMžGÑ{F‚ltyÅeÂ9Eε jªOá wš" ! ehK‘‹B]Øý?žú£ï̓ÐáǯUOO)º§§šê*Š8äÄåWÕ->¿+ÚÂIT7(mɧoAøÐ™†ùãÂPTP Nž:âàë ~QLJ&K.·;NÞ·øyžü^´ÿ +n~SQ|x±pjQTóW,;2«NÞºPŽôOÒà× ˆŠ}½Ný ÏRßÇC?Ú0.¹'âW3_Nĺ…ûÐñœÆ:tFj•3³˜Nþ‘üvZ£]nR"ÐBÐdÀô?_È:ÅÕhb¯-ºîU3£/ßcc©ž|.QYªä]ó³œOJX$©dù–T!'…!¥[Ì”ÁžçN4È©1°Ù´ÄÊñž2à‘$CVÀ´¦ß¥Ô{¬%-ïÉ?²µÓ”8Áú: &–ýí23JbÙäKù³?Þá£/ÉGIvºý9o;]mŒî±ÊžGpçsjù›E™|5²b2·âž²^"\¸–Çó/ÎV´ªµ~i¤ U,ìz£Xâhßc¯Øs¤ÞÈUCˬDà¼JXç—l,hšø23ŠO¢p„N§»'E)O%¡â¬$"jdÞTecéIJèC£)•¥pE)RÂÚ¥+aÅ1«îNËÁV’ã%Ø€Š )/kÇþ+HZÈ©ÜS”G½·±¢åYïagBiË»?o+L­h^ŸfFÅ ‡÷S³³7F€Å¾ø=ü#é…3K’[±L¼Ì¸Õ›™bGí=·§†¹÷øòN"Ð8.X8Bú “D@"Ð\pìƒ_ð¯Õš2¬øóýÝQC,j%øƒ›¦c*2±·×d %Æ0[âÙç Ò A°¯§0ˆëWJòš„njª«áp:‰CM&‚š–ÌQUêR–â0j/\ÞK.0MëÛ7'‡“HêA€¥lÖ|ŠþSŽ3Ê(2f0¢#BàØ·ÁCx®Em×,Û]¸¾m Õ ´V²“|ô”[}¹V­ï[ô†paoLA!àS¹ ®U.©ù# OÀÍÿÊH.Õ(8P„Š*nfÔ†›#¬^ZYƒòbе;:·—ïòî!È‘Z ’·–'-÷)H$M iܤ‡\ŒD@" ´$n-OZîS" Hš’7©Ç!#H$­ÿšR­—ñµkOIEND®B`‚PyCogent-1.5.3/doc/images/tol_gap_filtered.png000644 000765 000024 00000122277 11611026400 022265 0ustar00jrideoutstaff000000 000000 ‰PNG  IHDRàðØu·ºUiCCPICC ProfilexÕYgXK³îÙ¼ KZrÎ9ç 9KÎAÒ’s\ATÀ‚ ( QÀ€€ˆ  ‚HP@Å€€ ¢€(ñzÎù¾û|÷þ»n?Ïô¼[U]Ý;UÓ]UÛ922A@X8%ÚÖX×ÙÅ•; 0€à`$ûÄDêZ[›ƒÿµýÐsTjO×ÿ*ö?3è}ýb|€¬a¶·oŒOŒ@èùDFS@þ„éC)‘0F=‚1c4¼@Oíá€?xy{ÿÆhÔo{[}Ьà¨Éä舂07Î'ÖC4ÃîÉÆZ>d_Ø `ɰ°ˆ=üÆ¢Þÿ¦'àß0™ìýN29àüç¿À#በ‚b"CÉ ¿ü_va¡±ðóúÝàž:<ÔrÏ6Ìð5ïK60ƒïœðµúÛf° Äîî`Óö°d¸·¥Õ_XË?ÚÈÆðXÈ:’¢·‡ágùGR¬íÿ¢I Ô·„15LÏõ‹1ü[Ï•`òþ=›ÑÀô[ѱ¶0„ñƒ˜8;CÃ}H ´wúKfÕ×Ïà/:áddúGÁD1Ý›‹¶9H„ÙÞà¹ÊÀ „? ¢á>Hs  þꥀ? 
Ü8˜BÀG‡Á#"à10æýKNÿ?(F¿ÇÀãþ»F^àËÆþ3çŸÙxá9ÿÖ|aü7 ϱÇÛ[]ŒgPÊ¿æü[bOßïÕÈÖÊ.Èný½&”0J¥„ÒCi¢´Pj€ÅŒbR(E”*J¥Ò€yjÀ|€5ü½Æ=ýa·üã "ÔaîÞ÷þ› Kýóû?V‚—š—þ^¿xø=@?"2!:( « ¿¹~’¼¦á>Ò’¼ò²rr{ìÿ7moÏú³ØÛß{Äüì_´°TÔraŸ:ð/šÏ ÍßÀüM(vç$z}b£ãþèCíÝЀhaeƒ÷D ?gy  4€0û°.ÀöŸ@Ø£ÁA’A:Èç@¸J@¨uàhm ô‚~0^€I0fÁ"X?À&AXˆ‘ 6ˆ‚$ yHÒ‚ !sÈr¼ (Š…’ T(Ê.BW¡jè&tê„CÃÐKè-´}‡6H5‚Á…FÈ Tº3„=€ˆB$"ÒgˆRÄuD¢Ñx˜A,"ÖI…dFò!¥ªH}¤ÒéŒFAf ó‘¥Èzd+²9ŠœA.!¡0(Š%û© Ê僊BAe¡.¢ªPM¨‡¨QÔ[Ô2jMDs¢%ÐêhS´3:}ŽÎGW ï {Ð/гè †#‚QÁ˜`\0Á˜C˜,Ì%Læfó³†ÅbÙ°XM¬–Œ¥`Ó±…ØëØìvûG…ãÁÉãŒp®¸p\ .WƒkÇàæp›x:¼^o…÷Å'àÏâËñ­øgøYü&ž BÐ$Ø‚ É„B=¡‡0EX¡¢¢â§R£²¡ ¢:FU@uƒêÕ[ª_Ô ÔâÔúÔnÔ±Ôg¨+©P¿¤^!‰ÂD¢+‘B!Ü,RC«é©UkSû¥®¬NQ¿¥þUCJ#D£Fc~ŸÈ>¿}åûÞkòk’5¯jÎhñjyi]ÑšÑæÓ&k—j¿ÓÐñթЙÓÓ Ö½®ûEOV/ZïŽÞº¾ºþaýHcƒ ƒACCˆ¯øŒj–•Œ?0A›˜™d›Œ›r™ú˜V›.ïWÙxÿC3j3;³‹fïÌÅÍ£Í[-û-Î[LY Y†[6[+S«óVÓÖ"ÖQÖ÷l06Ö6E6mål“lûìHvžv5v?ìõìÏÚO:ˆ:Ä:t9Ò:º9V;®;8å8Í8Ë8vîwaw riqź:ºV¸®0`!P;0?p)H?èbз`“à’àõ«ÊÝP§Ð†0\˜WØÝp†ðð‡ÜñÑ‘é‘3QêQyQËÑfÑ1PŒ{L …bEcǾӊ+ŠûyÐñàíxúøðøñ„S s‰F‰×¡ùêJâKJNz{X÷ðÕ#Ðï#]Gަ=f|¬*™’ü4E6%'e5Õ)µ5+íXÚûãÆÇkÓiÒ£ÓÇOhœ(9‰:trð”©ÂS;¾O2e3ó3·²|²žœ–;]pz÷Œÿ™Á³Êg/ŸÃœ ?7–­]•CŸ“˜óþ¼Åù¦\ÞÜŒÜÕ<ϼÇùŠù%b/̘´ ž+ܺxñE‘^QC1gñ©âõK¾—F.ë\®/á*É,Ù¸teâªñÕ¦RáÒü2LY\ÙÇrÇò¾kª×ª+Ø+2+¶+Ã+gªl«V«TW×pÖœ­EÔÆÖ.\w»>TgP×R/Uµ¹!ó¸{ãÓM¯›c·ÌnuÝV½]ß(ÔX|‡t'£ jJhZnlžiqi¾»ÿnW«Fë{Ò÷*ÛøÚŠî3Ý?ÛNhOkßíHìX{ù`©3 ó}—g×d·s÷ó‡6{Ìzõõv÷éöu<Ò|ÔöXýñÝ'ªOšû•û›”î3á;1ÿ2ôå·Wq¯6'M¡§2¦é¦ó_s¾.}#ö¦aFyæþ[ƒ·ïìÞM¾÷y¿ø!æÃÖlÚGâÇü9ž¹êyùù¶£…¡O>Í.F.n.¥¦ÿ\üEôKãW¯ËÎ˳ߢ¿í~ÏZa[©\U\íZ³^{ý#ìÇæzÆO¶ŸU¿Tõm8mÌmÜÂnl‹m·î˜íLí†íîF’£É¿c$Ü#üýø^ ç.pî0æONñ[NW XÆ87€£€Qˆr‡ªጸ‡A^D± ŠÑ’è>L8–;ŠËÃ{¤©PT¯©¿ÑièЧ0Ü$Í1q2»°\`bâˆälç¦å àmçgˆlÚQ«%‰•’’¶”ñ——K–?®¢xX‰¢ b£*®†R{­~W#_¬¦ƒ–Š6‡BgIw\¯GÿŽA¥a±QŽq†IŠé¡ý³pó ?K_+_k_›@Ûp;Šýa‡tÇ3NœK\*]4¹µ¹wyôzö{=#zûLú¾óûâ¿H ’ 6 ñ=v=|(b5Š%Z5Æ…›Wtðz|{ÂHâBâ0÷Í£žÇR“kRFSwŽs§ËÐ?ét*,ãDfyVßé¯g¹ÎÙfgåôçÒæ9ä^˜*ä¼èZt¡xè2®DçJüÕ†ÒùrþknѕǪÎU—Ö´ÔŽ\_®'5hܺYtëY#îŽJ“c3¥åÜÝÚÖ®{/ÚfïkßèØíDv¡º1ñ=„^lïvßÒ£¡Ç•O¢ûåú粟ª<¬};¤=Œ)õ}.ýü׋ž±œqò„êKö—Û¯ÞN>œº6þÚïî çÌêÛ'ïJÞÇ}°ž•‚½ìÛÜ«ùÇ mŸo.Ýø|ûKýתåºoÝß—WU׊׹Þ߈ÙÒÚaÛÝ…í†cÅ} ´@È: #$©ˆY8¶ê‚ãþ´9zs «Œýˆ»„w#ð–¨a´D:AzU[…1©•y–•M—ý Gç<·ïU¾!þ‚ìBÂDbDO‰Š—J”I^–:/"*k+§(O’ŸS¸ {‚±2òK•RÕP5eu 
þX#gŸ›¦°æW­Ví“:zŒz_õûaoH3ò6Ö1á2Ù2ÜßjVhoáj©m%lM´^³ycûĮپÌ!Û1Ù)Ú™ìbçjp@ÉMÈÙï±í¹âµHþà=ã3í;é7é?0ø&èMðtÈdè«°Wá“ÓðN=½³BÙŠÃdˆçHàK9$¤|XûˆéQ‡c>É””ôÔ¢´[ÇûÓNÒœRÈpÉ<œUzº÷̧stÙÊ9îçÓsòÆó¿€B†‹ÂEšÅN—(—óKî_™+e*3.O‚÷¿G•sÕ˜áZÃë¾u©õå ½7noË7ÚÞ j:ÜœÝR~·©µïÞDÛüý_„œÒ] ÝBI= g©w¼¯óQíãÜ'Iý~–OUEŸñ q³°²?ç~!0&:.3¡ôRý•Τєå´ëë7©3¥°?lP›=ü±ožu!äSç’Èç«_å–ß}¿½Zù£íç—M•íÜßöGÁÙ‚,pçÁÄ9B…Є""±€´D¶¢dQõh%tƳŠÍÅiàæñ×ñT^ÔæDU!Z:"=–"!ÑLfZVa6%vCGÎ ®Pnog^3¾}ü¢´pDÕ/tE8\DUä—è±pq!ñq‰£’¼’¤ÈÒt¹Œ‰Ì’lŽœšÜ[ùL…wŠg•´”•/¨è©|V-T3T[V/Ò0ÖXÙW¢i®ùS«\ÛV{W§I7ZO^oE¿Ñ ÖPÙpݨÙ8ÁDÃdÓôþþ#f:æÀ¼Ë"ÍÒÄŠhõܺØ&ÀVÁa7 ûH¬£©—Óç—s®>°—àܦÜozœôôôR%“È_½|®ûžó‹õw Ð ä B-? ¹š–î¡)Å^‹yGyÛWv03>*Á!Qõ[”´q:Š?ÆÌž"*‘¦p\=]ç„ÑI³SÖî™ÑY'O—œ¹}¶÷ÜxölÎ×óë¹[y;ù;„BÙ‹.EiÅõ—ÆKÀ‘«¥Ñeùå-×^VìVÉUûÖ\¨¨õŠ A7.ß½mÜw'ªéZóø]|«ú½¶‹÷µ¯>àé4éŠê.xØÑó®ýHì±Õ“„þªéAögCÕÛ£¶Ï»Ç<'X_nL‰¿îx;¼¾^~Wügyƒ<ƒ‹C&ÂèÃM"ŽDÞŒzÃD1ŒMŽ{ÏžœØ–DwØÿHû1–䨔4‘ã©é3'5OÕdògŸa?[”Í“S‘+›wÿ‚yÁôňb䥂¯«jeÌå¿*fªžÖt\o¬¯¿Qs«ª±¢)«%²Õ¶M¡¡c¹s°»®çt_Äc‡~­§bχ¶FÞ›ÀD(¡’£š N#*çiJhééFèsœI|¤ïŒ}LW™²ø°îgSfæàà$qns}äæéämä«å¯(¬ªné[ß•d”“Ö–q •;._¢pOqF§"¯ê©vF½]cYS@ËI;K§K÷§¾¸‡a¾Ñ ÑÔrŽÙK Ë«z[w» ûG§ço®ÖÝy<Îz¡ÉÉÞ_|UýRý‡y‚¢‚{B9ÂbÃG"å£ò¢·(~±ÝÙãcI%;üó¨ÿ±W)ö©cÇ=ÒO=5›©—uõ tÖ÷ÜãÙóEyøüÄ _ .¾/ö¾ô¾ÄöʃRÙ²«×H'*·«)5Ÿ¯Ô½o ßx{ËûöìЦõ–ÔVú{e÷UÚuáºk{lz7U=q <íy–<¬=²õ¼y,|‚ÿå³É¸iæ×7gŒÞŽ¿÷ýðå£Ã\ùüâ'þEó¥ ÏÁ_|¿,ó,¿ûví»õ÷_+—VeW®9¬Müpý1½î¸>ðSïgó/¡_Ù¿¶77†6•6 7··¼·:·y¶lOïhìäí,ïîß-ß³Œ¿|FÀ ¢ÖƒƒÉ×»»+Â`sØÎÞÝÝ,ÝÝÝ.ƒ“ øȃÐ?ß+ö„1pͽ¸|õê§Û»ÿ{û/6'†ªÞ"„ IDATxì] \TUþÿÙj‚‰æ3 ÍGj2¦æúHM´|léXiÅÒ2t­UÜJÿÓÃŒÚ {[FöÀRì[áÖân)†¥¨a9´a‰ÕPBBΘƒ íùÿ~ç>æÎ03€€ÀÌïøÁ{î=¯ßùÞ;÷wçü-&àÄ0Œ#À0g³Îèh<#À0Œ#ÀH˜óƒÀ0Œ#À4Ì€t’`F€`ÌÏ#À0Œ#Ð0nÐyHF€`F€0?Œ#À0Œ@# À ¸@ç!F€`– # TVT@e•‰´„°°zzÄ++àD@Û¶aê(•phLJàt%Dwõ3¶©lž¥•PZlƒ“.€v‘QÐA/¬„ŠŠª3lï=… œkË0ï~“§¾Àcî¾ðñè£Â¥eI×yQÝA›¥ìµûÃ?Ô±Å?NŒ#pú°|úØqË&ƒ€^þS8„‡{ÿ ƒ½Žú!ò„u DD„CÎ ­¿ Ø:q*¼óï*ŽlÁ­Âaæ yZ<à s[A×ȞгgOèØjl+BN‰éÀÚI>è‡?­; ˵ÿŠw­Áz‚\ßÃbµbX3¡„XåZ#ǘY›p˜´VéûÐæeÐ"¼Ò‰tEBx‹°õ{€¼—gV¥­ÕÓîþµqøÈ0µB€?akWnÊħç@üÎàt¡xI `Ï%[«ÿQr­@Ð(y†õ[VëlèÛÖÝSëhÌŸí>×så{á¦>³ÀŠzRiÁü¡ +`Ñ&3dl„+#°éŽH˜:}=”\oN…ü«+¡•ì¤%„Ãw°¢Ï8°iV€áà Cž›ÕzZ¡vtÀ–…‘°" ÏÍ­Ý’·Ò)X2raáàö ³.hÙ.R6tAGHDìb'„ˆÊBxþ˜oûŽ… á;eˆO‡‚øá 6l 
´aùÈ0§…KÀ§7jjœ:†ÌnÐ0ˆêÝú÷ï¯üõŽÒ—ROÚ-Z,„<]‚õ1ƒÊ"xaÙhŒ;¼U !L½.†ÞQQ*}ØGTÙ/ÿÇ0§KÀ§·lB´î°î™•PÑû\•ªSðË© `¡eDÑS~ò8þ·~õK3.)ßÛ%™!=·†·?«†ÁÒÃÊRl¥Ëyy{௭PÏîÊaÃÌ>°:&ìÞÿÈšïYŒgH¦žÂz‡X{—È¡Fy²ž¹L‰Ù0¥»ö ƒîÝ»CÇc]õöÆÌ‘-Ë`ÜŠc°½lDeÌ€¥ïJU!|Ñâ™ðqd\=fÞx= R°¡&fËa×»(g›n@)\I­ X—.…¹û#¡Mç~pÕu³aÆØÞn Û³>c" ýºkX«1Mv(L=Ú8UOÂÏ'[++¦ø”·5-‡ãv3,!{ÌÄñ%$%$æ¤ÂL•Nšˆ’à3µ|®8+5*`Û#ãaþ· `;HK·*§n­®ÿVÂǸ4<ãɾîÕ¢vm<†G¶<«‘5çüy¬»®šSy©Çõò/@ŸYI^à„ ÈÇRŠÕî"ºÁ‚ ÌèÚZŸú2—·a+æCF¡ ¦Ë¯ww»Ö·ùÈÓ ®ÓWÚô›–D” Û}ü2Ì·Ì)¹ðî¡cZ#àù˯usnÀ4 N} ¼)–D{èﺉCmâ¶þ˜/ÕBn…<~üÕ-âºhﳦÉqV¬´bm<»l!;yráZ³L0·$Ö=cpeøÀ7%CUÉSå¦ö“†A*ò`Õ¬õÈà¬0—€j¨ªe¿ýpÌ~ð¬>Àq¯G ßÂMsKàñ„þ¸Ü=ûÁÇ´ê°0n, 7AR†¦/јh%ìZ;Æ-Í€äì˜ÙÛcï)Kà±)jó…‹aÜÃ`꺡0ò{NŒ#pš0>Mà¸YÓC B*_¹‡7…¤|ä×r™!±Û^çúoïÝŸÇyø˜–eúE\ºÎÙ„üØS‡÷Е¦Öç£&4ô–µ* ?†,”tW÷usÚ¯>ˆ{¿±`½5ZïɘѥZ=ÐíÊDHM+S¤sT ëtt=de˜`ÂÄÁò£ÂØ^æÃºAÌÖ °õ>3˜WgAšµ fGb«-¡k7ãBºÞ gF –0®%`\½i"@{Àÿµî‡¼V炦 Ðúí/5yOä¥@„éM4ßÉ‚¡¾$ሰ `ΰ¹Ð:ã6ˆ8ô>Ì!ubóþÍãÌpÒ²;L™=Ûp¡~{is3Ìž¢0Ó+åwÁÖY¤óì°þÁ¥È “¡ŸÆËwÁ=‹2 .½ª òPZê€c6” 1•Ci§肊T݇Nyš ‹e‡N¼«¿qó¦ËedGÞXü ÀòøIÐ —è÷¤Þ +±^Ú¤ø?î}/kf\~KˆmŽâ^÷xyÏhèÞ*–Ýô Ĭ¼ Fõj?}ö6 [š…ûÓ³ô+QâÿÓG€ðécÇ-›­qýxýü‰°Þƒ&ÜGµï0,åÒ"³¿³_,G祰Èl†˜¸°àðj‹QpV%ØvpN1N[±3¤Þ³×BÚþñ` )RÅA¶m±ÎÈö¾øJÄH©HÈ”âõ]‡a¹’¦ˆDæÊ^r¿Ù£¦~B+Ü$Ï·jÓ¬I³À„LVK éV˜ÝŸJP†Ë÷”ÖÏç_rn,1µŽö$0s7މO…M­º?­ôÂÿ3Œ@Mh!0Õ´2×c‚JôUÙ2BµÿuÀæ¹í`¤ƒsãL]!É=ÿbŠû†2Xî¡Áì®á/ç(/EÏR-Ñ䧃~ýµªûõ ‡œèm+¼Ž[ã e\ÂÆáÄOħCD-×tîJ‚ë—äËeÞ6—{uºè’ü¢*­ö+/ÈÏœÁ„hMªA‰É‡[DOü¹#T‰«t@±­ \­Ú@d÷.æ5íÛŽâOÇóÐÕ¡¯-M ‡ èJ0€Å€}× ßÆmì<˜íFÒ⬠ÿ¬ò˜Qx_&iÏøÅïš´ÿ¹Dtè¢8¸ð_¥v%5|îÃ"ðãB[î®Õ•¨MÞÂ#ÜŒÛ×s¯wYÍóЕ&>w¥%øìãšCÇ®]ôßYµÏ=~ øu¥‰„U8Ê¡Ìá„VhãÝ?|<_|•à(ÇrÔ†kÓ1ºð†~+9ÓÀ,Éšb&i¾Ê_BNYýLѱGöýðw¹‰8ÞÃ{ÜýÛs®\V¡!&9W­c)±Ær³È,tê홞ícSE‰VZMß®’ç5vBF¾ÖZ=:EFŽŸ˜ãuÝ}ê,HÈ‚jâêkÖ·^½yfœ%š“-¶gfˆÌí9Âæ¾->æãGG.egöR“xîõ)~¾lÙ‰øÛˆ9v½žÉMOðøÝè¿Ûjžûü´xv¸q/2òµœ"žwãïÂ/rJ”û¦=ëÆòøÔ\ÑøwU‡…3AŒÓÜ”‘E䈂ü|‘¯þÙì§ósr §Ó»C¶Z…ÍðÒÍM/¢5æŠh:ŒnÿDa!ÒA´ä‹Âå…Ÿ‹/dºvárÚD*1cS²PXz‰H4áyl*¾ü]ÂŽÌ-WDlªÊDöíifl“"l’l—ÈMñbD®ú.²Ê±•—Úqú¾õe9ú€Yÿh¨¾oßñÕ3@“xîq¢Ÿ/õÙUY&5| 3&jIË%v§(),Ä£Zð¹ÇqÓºÒ¶2»°—XE¢ú;PšÛEšÅ"Ð¹Š°;±ßüLºv¤þfíÖT—&òm%X^&²Sè7cÛõ¯^B>2õ@1àd@©þ¿^V?¾¸4k$‘q¥YÜ_Ì&“0á³dbÎ|a‰/Òòz{_ ˜˜fJ¾7ó¦&v‘ƒ?~±!¿¶¦H&)%‚’íRòLÉw‹^¹É4§l‰ _DÕõ ±éBkíʧ¾Mº´á²—ˆB[¡È´àœŒ Ô7%W° í1 ©Ý( 
[binary data omitted: embedded PNG image content from the PyCogent-1.5.3 archive, including the tar entry PyCogent-1.5.3/doc/images/tol_not_gap_filtered.png (mode 000644, owner jrideout/staff)]
:矾N/Zv|ðA ¿_*˜ñòóó¥_s´ðŽ4v4~“ËZtŸ ×[;Å[F E`Ü¢ðs籌€q&¸råJp¹(äBp '€êg um:>ùä £ˆqÕ‚ÖÀ.UÚaÔnéehÔ¨Q°gÏG2œ#•s,ð¯3Í;q ฿Å<À¦B`ôèÑ@1©>|X&…êË(€<¨WÁú~´©Ÿ)Æ3Åz¦È]T~÷»ßÁm·Ý¦ó;ô4uêT a«©œ1 ¼ýöÛ@ëÀíÚ…¸ÃažZ,€[ÁMæ!6ÆYð‹/¾¨[4{4 `Í ‰ÄǬW‹6ŒÁA€²Q¡uÓeË–é¼FãÎÖ­[aèС°jÕ*=ŒÒtÞ¸T _äF `7Yˆ]222€fYT(SRIIIÐ`B àO?ýTÆ”¦Ê}ÑêµwïÞAíZêĺuëॗ^Ò»§Ùcjjª~m;Ä+¹pa”1µ»îº ã³Î:K?Ç;Œ@´!À8ÚîóSPŠARÕjåùçŸ×võm(­êg·Û øÃô™|zz:ÐKF4R9_ýõ@ÂöèQ%W^çΡ°°h˜UÎÑxט'#,€hð>#p hÓ¦lIù~¿ýö[?*u àË.»Ì¯~K`BØ»w¯d¡k×®°dÉ’–d'lßHÅL4´B*hR9O™2E;Å[F ª`Õ·‡™‹Î>ûl¸ä’K$«ä/›——çÇ6­¡j…Œ°0G0`fyêÄO³Ù¬]nÑí¿þõ/xõÕWuhMûÔSOÕ£e‡^ FŽ©¯Q_˜¥JªœûôÁ¤õ\A€pŒÜ(f3º0Ƈ~å•Wt•(qMB–|h©U1Í’5ëb ‡¨­!Ë -ôŒÃn¿ýv½÷n¸AZë'¢`‡Ôã7Þx£¶šÊ™p}óÍ7åš5fgŠ.™F þ°®?V\“‹ÀW\¡‡5Ü·oŸŸj”ÕÐk×®ÕéD‹ú™^ (®5•=z@¨µl騡8Ô¤r~ã7ôÞ‡ "UΑâpë•y‡ˆBXGáMa–bšå’T+‘±Œ8ÚÜ(4ã믿®±.×}O9åý¸¥w^~ùe©rÞµk—Î …Ý´i“´ ×Oò#c´¡t‹1Æ3³ËD%û÷ï‡ÓO?]W?S8DšµQ¡uÞ>úHîkÿÈb—Öƒ›2žÖW¸-ÍÖ ”‘Ê­hÑMá&£¡PpJza|9 •3¥€$9F Öàp¬ßAæ?j «ar‹ÑŠqlœk×I(·¤ð%>î¼óN]øÒ˃1ô¤ÆgKl·oß.cc…ïàÁƒ^jXø¶Äá>›ÀM*Ólµ#cýýï§Ó)±%€[:ú 7 Ó¨Šv aÄLj#d`7 ƒYVVdqÎ…ˆXÇËäqDdÕ¬e5¢T~ZÇhÀ?üðüñÔ1£õë–6ûé§Ÿàæ›o–1§=ä­S§NPPP´œ˜˜¨óË;Œ@< À8î"!ª0΂)L"¹ à3Ï<³EgsdĤÍÎ) æSO=Õ¢R>ezq¡¬RZ¡D¤r¾é¦›´S¼eâ Àqu;y0Ñ€­QjVÄ”¯¸¸8H·äl“‚m/2ŠàEFW4Ól©Bý“æ`çÎ: sóæÍpÎ9çèçx‡ˆ7XÇÛåñ´8¤*1c†ÎùÔj8´“-µþûÝwßÁœ9s46`öìÙpñÅëÇ͹C*ç[n¹EÆžÖTÎ[ûµ×^“ªû:4';Ü#Ðì°nvȹÃÖ€YŸp‚òózÿý÷ýÂ&ÒøÇßì0Ç!Í,)‰2hzòÉ'›êðË/¿”³Þ¿ýíozÿäE*çhMþ 3Ê;Œ@#!À¸‘€d2Œ€^½zÁå—_.O‘à£`Z!Á¬©¨µsͱ¥õè?üPvECV¬X-1ˤ~‡.…°6nò?&•sÿþýµS¼eâÀq‹y€-…€1>4å ÖJÇŽµÝfÛ~óÍ7pÿý÷ëýÝwß}2º”~¢vŽ9Ó§O—Á>hŸ aA™Ö[—f6wÁ„E€#a……†/0LJÍ|IÍk·Ûý‘rs¦ù#+ì±cdž $šª·9“”——Ëä;vìб0`€ÌÝK[.Œ@kD€gÀ­ñ®ó˜›²0¦\Á…Ü~š³Pt+MøRä-šq6§ð%£*R9…/_mÙ²Xø6ç7ûŠ6XGÛa~â R¹ªVO;í´f#©¾|ðA½?Ú§ÄõÍQȲ™"Xýþ÷¿²x¦BXÔòå˃pižÔGm-P0•ÀOmƒˆDªŒô1Þu`µ½ Öm¯ jTã®–Ùªöí«Cu¨­ æø5Ö ×ÖØI­F¿Úm<ûµ°¯r/ìÝ»ªDµZ5È^« jGjÁ]½Oò]í®ÑZÈm}øök¯¨&ãÂ0MˆæÙ¥„'úçwÞiÂÞ|¤kkk†tÔû=ï¼óÄÏ?ÿì«Ð„{¨r¨êÖû¦ñŸ{î¹n4a¯KÚ–—îÇ¿v³ËœÓ‘{“¤ÿØ&=k~OÛd ï…ßwGò`ÊeU^¬ã¹†ï•Æms­D3R[µ o…ÈÏ4Æ™ƒTµâyƾÓEq…G»(*Jr í@˜,ù¢J»ê)–ÞÒ³Š„K^¯‹oHüoyŒßV.Œ@S"`ŒŒEýhÉ䛲O¢““#-‹i¿]»v@.?mÛ¶¥Ã&-:’TÎÝJ+b’TÎÔNÅÈ6 ¬¸†oÇ5lZǦόIÇÀ;ͦ¦Á.› nd ×~,¤%¶3Ð÷ÂϧdA¡Õ.Ô(T•C†m1XòmX' n¬@Þˆ?úT8 ¼8G¶íÜ6·‘ÚR57,ŸÒ ¦/îŒô+¾**2ªRv.Ÿ w¬L‡b» 
¼ägÁÄIK¡/×V®…^ãg#eàòz¡ÊV°t:ܾd«ÒÏõÉʛà  ¬…ÙP´ ò·Ò ».¾­âü¿cð–G&úlaîܹMÎÍf(tõ>,XÐä}¢ÊYÎöÑÍI¼òÊ+MÞwSt`ËÙ¡9_Ð\3dñØD&ÞWK-äeå¤WX ²ôû&“0á ÖJSAœ%f¥ån½½uñX‘–cÕƒw\"ׄ³ÍÜÐuŠ3ñ{fÎ3ÌbüÛºlù’¯b}Úê_7ÏìßÇ–‡õÍ¢ ywYiöke¾ ±°æ"^¦ÜÐ};Kd_ùåÊØØíGæ;°vüÓkF€hbhÝ— P¹êª«š´7/Î>hÝÕͲŸ‘#G¹5eùú믥•3 ~½ #YXXÓ96Kg€õÓá¾…U ­Ú=x†Ï¸&ôÁĵ^øv`Ÿ7ìðª7?æ-€¬‚R˜9*¾ýäu3m=ÎM±ÔMÛ?‚S„o¯®­„5ËVÁúâٰؖ Ö›Mú%m§fçë0q1@¾í&HÖNÒ6LÛŠOWËZÝ —b nS:ä<üÜ;e(h‚áÄ^ÃÁ wÀæ]Õ Ù­ÿD93Ô'uE¼ ½Þ–šÖVn†e+‹à͹ ÀœU×ö׿×tU)aùÖ*ÄñVÃ9އÈCcZ\Õpu5)ñš®ÌŸ?>ÿœDÈ@dðD7šªPÚEr­:ŒEZ™6mäååµhŒi—ãÛšá¾8)¯NGösñGJN&(v£J51\í(~~.€¥ž¼i´d£gêd0£–¥!OßZlÛ¸¶QËýð“ÕÙÉFnXq×4€Œ¸!-@È…m«Ô;ch:”–Y໗´©Ãà‡¢ Xt™ 6"›“ÿ¯ŸÂ+ýWW/:wL€¤S¡0¶Áø±s `þï s ¼8™3OôÕǽZ×w°qmЈMžŸýŒÃ”Šøö£§ñ3™ç‘0Ñ‹ $©‚ÃLjÀµØ&cC9 t5ÒûZ´hQ“õE*ç™3gê}ÑØ0¶Xºti“õÙ„1 ¦ÈÎÎßÿ}DrRmÊCS¦c-h…FLæ<ƒŠÚcfU+p,b¶XL)}ÔG]@FY¨f6*se9ò>V„U˜c.al¬b÷Š"¢Qˆcv RA[ í¾Á»ˆw2ðRzvU”‰ÜÌ aFµzFf¶btNý]U*Ûf;|ôp¯~|û5‰«ã+Tœ¾bð°–GÀ˜ŽðàÁƒMÂw‘-Z?KúcÆŒ‘Éš¢³]»vI•ó¶mÛtòt„TÎ&S°zT¯ÔB;ÄçÛo¿ «V­’†TÄFçÎëÆÇ¨ƒm0ï^8Œäõ;ØRSÃ{á@ƒé$Aÿ(ËŽf“•ðüHœigÃäž‘émIÅþ5ÓSб ÃþÒòJX}䙚Š8“5Â~ÊÌ9©ç˜µ?òj5,\÷˜'ñWk´ºõÁ–ØÝ^½Ö[}ùÖˆÄá6®^'x0Œ@”"ððÃë3ÅÇ{¬I¸ÄP“z˜^P`øÉ&éç7Þ˜ÝIï ‹âÆo˜ä¡Iú;¢ýK`liA˜ôéÓÇWâ—>£GŽHZqC²ˆ4h³Z­Ê§Ì**œê,g°“Τ—BDqG‰b€eÉ-E…¹"]ö®aÕcŒ³NKz¦(¶ÚEUU•°+®?¦œR÷Š¢LI”SÕÕÖQ,qHÏ)Uxï*Ê ¾:‰Ì¢ IÛ^!é•W Ó.rÓ7ƒ‘•ÓnvG•¨²—‰ºéÒ@‹Û ³DFv°U8D•Ã. ³—®»OŸ–o}dñ¿ñ?D!#Ðò<óÌ3òaGþÌÌÌFgè“O>˜äAï/4zàA`–'½ ©œq­·Ñû:‚¿üò‹(--•øž~úé~|jB—¶˜òPL:Uàl=b7¶<@а6ns4?`Àx}ÒKsE²‚Ζ‚ L"¯ ÷KbVÔ$À[#XA{í"UÁÆþÍ™ù¡išñz&Ò0ç”õ,êj‹-*T®ÑÏÈ-1¨Ü¢ ÓdèÛ"JõŽ…°æ®™³ðšA¸+*q.m³ 8Eâ;x$q{†pÜÞZX4!€Éôª‰•5Œ2%úõë§ÓÇ\ÃJŸˆ¡ÊYP ã•úüì³Ï½¯†D‹ožä‹A=üø3òzÒI' Z‡ÿÇ?þ!0DCº8¾º^pº4i‰3C9[M6Ÿ¬ò£o ã†äÅõv§ÓéG˯a„ƒ:Û8»6òi$çrâ,·ÊiÌêUj'y 3¬æÁ™µ“Úú 0’nõû‘ ðûË…`ãpcXAS‚ øñǪ· Š$›ã™BM>ýôÓ@qŸ Âa<ÖöC×ÎíÙ³GöAÍZ8Iêèú믇—_~PÝð4ˆ¹W•””ÈõÜ¢¢"8p ôª*¥|œ4i\{íµ€/%2Iƒ:jŒÊžmÐ¥óH?JD#MuÝñ»€ÞCÁvPÜÇŒ×!?ÇRêl›€´»…§”ÜMÎá×?µKߎê&â÷#±ù¿"~lFó àh¾;Ì[Ü p¬xß¾}@>¶ô¡¸ÎÚ>E> Q‹ÎQ¬¦*”˘ü˜ÉØË.8Í%€)®ôûï¿/ ©0”':t(äq “'O–B—2@ÑKF‹–¤ÁPa³Âî}Uè¦Ð{ðHë™–¥wì…]µ]Â^ç ñ…§#Œ¯ûÉ£‰R(@ÅàÁƒ%w”pûöí:§$\ÈU. 
IDATªØ(`µýƘ-ë5ÁN×®]å¸hlô2d`ÌçF yI~Åï½÷žºÿú׿üfàÆ¡œqÆpõÕWÔ)S`Ô¨Q@/ \X@€p,Ü%æ1æ uq¯^½ä8hÖH±‘µY-©q1¬Ác¤èZgžy&¦([¤f¥ô~ ‹\‘?¡Î×u-©×LÄ©¯‰M(k[ÖurÑ¢.¹ ÑŒ—2û„*hÙ ×\sœéþö·¿JýÈ…ˆ5XÇÀ«5¦C•Zb¨Õ(-Z-®étJTTvµU;á_[Âè #Cúõу‘Ö–|”úÌéöà ¨®+%]'¸#µu»1ð{Û΂mý ¥6CÚ®#бK*tK ìÙW›ð"ÞüJ­*10¼·mGHMéÀWýiûÑlÀ­U’j´¡…„5;…u$?[úhûÍ¥þ-++“”fíäO«}hV®GõRJJJPÆXÑ2ZÖêÕ«¥Ð¥µ]z)UhfMë¹ô¡™6F æhõfhQ€‘¿h O³È)*oDÎÝW ýÒ ¹­‹±¿4±Éç!¡÷ç(%Õ•B?[ÔgX7\Û¿´gÈKF®Ð‚úxì…Š‡ƒÌ|kÈ ùÖ|Ko{±ê©b˜áK›ÖÚúP¸³lÙ2é®ã}÷×) Vqå•WŠ{ï½W,Y²D¬_¿^8þQƒØm“W'_[ò5þç?ÿ)}ôQj`qÖYg œ¾«¾q†¤ó8{–Q«(zU´²6¦è`Æi7Zñ¸ÝÂm0'öþX.ŠŠ7…N~€½a»GÆ<˜#Ëgúþ9 ý j0>^Aí***Я8DÏÒÚÚt«D¨Ë—}†+„­¦[Ka7¤¨¿Ó(€1ûIV‘M~9+ìV5g ¬ÿ@Bý0{ËE¹Ã'mݶ—ð!:V\ñm525˜€Þ-†¹Ëʘ6M`Ú4iÓ¤¤IËìRGÛB Xl«@WpX•€éùŠß em± S9ýp=NQšGB68ð@Eq¶úà7òV%r?È@ßI|˜¸PcD‘‘¯¼ÀÔ—¶>Ììà¦@Us0êÖ­›xâ‰'Äš5k®õâs-ÄêýD[U46lÏ?ÿ¼ÌŽ„*bѱcÇ  ^ œñøã 44‹¶¡…åÇSîs3óWvÿܰT"]hŒ—d¾G 9Âw˜~ô{Ñ^v ü‡Î¬ä®ëEÖ^HÁB tq?·¤BeÈ#"½€k\Çã–pÔßU%…X®Í÷vɸ¯FìÅY˜ ,#ß/>làÐ\ªPÔ~1Pd(‚¨gŽªÀ£v!0zÒ[±M :³Í'°ýØ?õJ ñ€äEòÇœgóE^ÒR«åâK‚›ª9{Î+÷ù*Ê´i±tu¶CÐÖ¯5`‡ÏSz kÚR‚zJTß ÉØ¹s§xóÍ7żyóÄW\!(U¡ã>†‰×]wôï¥Pô2C@p];*ásÉT}&ùj·ãË,ÞgúؾïhCoš—d!lù¾à"éy¼ÔxÍ`ɶ*Îf¢ÂÀ¿Ò f ;ÎpéåH~*”ð[u½È–f‹ì‚RápºdT­<©ñÊŠva¤¬ð/à Á-Öê†_HÃ_—è@ =Æ£-þû2èº= ~Þÿ,š½ã¾–ÀÝ¿Î_¿·™lˆÀ»{+ÜuîDXiÉëý—@ÛýŸÁ‘Safw¡räÐ×°íhèµ7¥D u¼.ºûNî…I}FÙcêl«Ñªü–aV»+rzhgü¶»?¥±¦Ã…jLZØ·.2M ³&Õ†Úîï¾Àœ-fžê[nÛ¾3æ3T2ìÑ6\«ïîŠ+à®»îò3^ºõÖ[áÅ_”Ù‰êK'žê‘e2­]Ó«ù2“˪,aëÖ­€‰$ô!ÓÚò[o½%1ª—~^Û!÷"ÍWYógÖ¶¡Î×÷\8ºf³&NôÏô£ñâÛö†s†öQ#'ûÎ*{5°vÞå0ñ»ßƒëµ[CûÕbE÷εÊïTmNaµÇÍ-‡E7õ‡×-‚'àxíÖ4õjˆMâxÑá€ù¶e`šH¿ÿrîµÏ@ÅeCùó“`n€'׿ü{`½9\Kfªü%A’Áƒž¦Þç@Ÿž=ý‰âQRÚ­°ÄÀÖè«®¸c)ìúŸÆuÃÔS†‡õVIpáE¨‡Z©H‚)èË®•ä¡WÁ ÷²ª†ÿiíciË8îÊ‹¢²bHúN‘¸2>üâñðì5˜5š›'ƒe£ ¦c¾0¥Fð ÜH}% "kîLJ²¨O*ŒÇßÀ¢àªuž‰(¦Ã¦>SÈFlKUjwüԉ`3åÀ‡‚èîíË¡ïÔÅ8ÙWƒ¸·ÃÌîã¡wž–LÁ ñµŠ{Ú€)¥m{uÇ·ñ m8DÛפ^{d-|÷Ýw¥ÿÓ †=”‚—òórñG€„rVV–~’‚~¬\¹R~ÈBi oáhûˆFÞÃü¾Rål ÕHB—T§h|¹1_ ‰ÀE]äL‚ -ÃeØIšõj…4 ‹>䦥͈ëòkÖêѶ¾u©Þ…^¨uak†ù/,‚´€ï®Ö@†eÔBl›í%9DßòÊŬÒÇáÖÑÊ«ïˆáVXßaØP"ü ¸NíÅÚ¢uëãWqCÊD;uØlVQœ¯¤83çj–ª P8Ã"ÍY&­!#] Ev†’É$=W1ÄL΋ bÊU#¬ârL5†Æ²Ô•ú °Â¶õ–+ɼѲ¹ÐІ+rœ6aCúdz¦[7§ç ”cz8ôAE«è.^Ž Ät›5§È%«Nlkw"6Å¥)£@±°mme¤ú ê?cÆ ýCÖ¾”xKã"@nH—_~¹Ž³sÌ{,P}ݸ6šb„…ß;Ÿ­_H ú×2ÄU—•R i(æýFÂif¤åß!Té 
ò’êî˜íKs(ðwIéõìOþ$„4f4¤'¬(QR%âKr@ÍàC¿2õ'ì­*“YÀœ-ì‘@ &ÓgâÆ Úϯ‰Ü:ä‡G>s!ùP|夺܄+>+H—È—¹6}y€þâϾg°xVë\µX|ƒVʘƒ3åþûºÈ¸,sŠY¹yò ¯¹úP–A®•mÉ÷/X{EÕ×ïŠ ÂF>My u¥>CMî?Æ'hmñ¥uó»FÇê;\Z8S(+itw"¬ý° _¯£Ô¾ -95OÃÑ6Ü,²p%«f#ÏdõLÖÏ\š>ø@¤¥¥ùá®ÝĘ B`d°¦c eÍ :¿Ô*lZî`kæÃÕ¾iQ𒌼{ѯªÊ!гM—rÐç³ ©‹r™ÿD^©ý}«DI®’¸¤ %"¦NÌļÄE”—}}m%yòhy‰#¿Èâó)#c”Éþå%"Sº:e)¶Žð0ÇÅé8ÀêÛ›ßC¼%ƒU„ñ³óh3=Ÿ°IÏ*2Rå;u,Á*Ê ýìÒEQ¹6C $càT1 ñʵR§("7&~zúQ<Öü€BqÐøNÅ¡D<ýÀµeÑö«ÌÀ«Ð âx  Éœ²F¼Èß—\e¸4=¨>&Lðû¾ïf„Z`t-™“šž!ìAÀÚï÷õ½$ª/Ééê k®œåEÂb&í”Id†xINÓüê±}ðK21áE/‡òåõÍp¢0+Ý€«Yä—)nFuåŽü"‹n’ÙFºˆÉâ{†Ôñª¸8?ߨ/XE|  õ VÑÏÎe9YècçÀü˜—°¢o=8!ˆß§ãVaäß9…ø†‰ÆU…ýЬYWISŒ9¾Yà9êbàŸÅZ²*Ñ£ŠlUÂ¥ºPêP$¬ré;ˆB×îBëÀ·Yúåê³È¸ø%Ôc¤r¾ýöÛ (¾¬¯¼òJ=Zs•ÆFCc Œ/íw?Œ‚˜ö1%¡@—0ñé§Ÿ6v÷MBÏe|At–J­‘þ|è1¤¨s,‡^ŠH*ÿ/óâsëXóS$-å6ŽñXŒ£6q#€i/b° R¡àÒR`‹pû¼ÂZ ¬¯Ê² Õ4¦L=àD}‚Ux]Š Ã©ga[ÃÛjÈN%òá‘o˜©68ÐE(5,vfÏÇ7Î!ét”‹²ÒQ\ô†¸»†kKøfnFšYl`£G!V<ü.ª²í‚I]‡¢Jë7††q»K«Lø}0>àû÷ï/×¢ãvÐ100šáΜ9Óï¾ï‘q¿oß¾2dfôFØr*!añ·‡–ƒÊ˜Ðv^’C·ÂÅBÈp±¡êó¹èC nÜ(XÅ: Vñúë¯ÃògçÁE#çb°Š‡|Á*j½ð9~[7í 6ÀÇÓ²To~†M[Y¥À¥s§€Í¶Mwa‘Á*ªÂ·'"ÒÏ.¥'¤’Ÿ]˜R[¹–,œ㺌sV\Û_3T‚UôKèB£©ùÊ™B«¨† «‹P‹Õ4wXj–ܽ3üøåðþ«7À »2aÅÍèíŽ6ÍíRéêφ ˜D/%v= Ãg(Å8šÄ^ÃÑ£x=lÞU­^ïÍßÿþw6 ­A1ù€V¦M›[¶l\ÔNñ¶ yyy€ñ³¡]»v:”ɘƒ™. àŒY (ˆ¥U3Y©8p@oÓò;Ép»£]€J x~”XíàY= ÈÓ?Té”6öîÚ C:…ºÊçb0ó±À:ºÝèY‚ŽÂ¯(Ã"«èd‚bL ‰á¾­5Pü< mK(#UmÑk‰IP&;ć2a*¾å‹;Œ-áŒ1Âï¾Q2ˆÝ»w‹ÂBŒžž.pÖìw]»Ç'Ÿ|²\ãÿøãez [ÈÃ!0+[„¥PǤ™Œf(Î]›D‰íGÿv¡øP=/¼˜y¨^< µMq™Ù¨.o ÿ6|ÔPbv X3û/ :ŽŠb^_#HФè.8~j‘Á©K¯ÂöƵPÏ:q ïü ü´¤¾j˜Ð,¸©c»H_ ÷ØE£«$ðï£>„_+ó×/+ªu…4ÑVÞxã ¹võ{J÷æÆoŒJ^£ »hà‡’6LŸ>ÝïþõèÑC|òÉ'’½ýû÷‹^xA\pÁ~u´ß0vEÊPÊ’˜Úf—i.Glj‚{“äë±M>zÖüm«/þõÐ|Pv"ô}7)4ŽÏ[Cヷǎ@Œ àt¡$ R …®žˆ_ÒñOŠ-6_°ŠÓñK' ¯ÐÂ#°ˆI/dÁ÷V?Ã+¬Ó%å ±èµ\5õ^WñÄ!Z¥{Ð÷ÝÈÏuѧ߄õ³³f‰+®€?®? 
«ÃŽ@ÓüÜO¾R¶>­G° Ï›ÿ V?$šícÿpæ,aÅl,¨‚>2VÎX-éw‰;ÐðjÀb6#Åaà"¡<3¼ÈÿûâZlÿ§¼çå80³­PR¬=³ ~äŒsÞF VAôµvúV BÕÇ0ÄõÜÿ|)Š^|X\ÐÕ¿­ù·T·$º/­¸¢©ÓÎÈF᫦ñà CˆÒF1°Þ'XD©Ã÷p0~ éaD‚ôhÚ>š¸¾&ÓÑQ]rÛ1^Ãp‚F-²OâóÎ;Ï/ q(>ûì³á‡;mH Cù—ß7²šåL®Jä²dÅ@‘Š-}æÍùáUÎÇëqñ²ÒÆŠ‚rŸv(”& Ÿ)×ÿyŽïbu9N›SŸ·F¸^ø|CˆylÎÈh@Õћ܆´oþD×RüRA[Дӏnâ¬þ … Iɾê&ùÃû©NP~kÀ(àIxú«x´Þ\è"C¤K{üc›p’ŸA†ÉS·%-ß­1g¯F¶¥¹Êø QmìWðÇœØ(‚ýFñÞ—?`Ò{ê¸Jä’û‘ùÒå-KE%]r)ëÙšJ\¨á7`ÔœpþZŸ˜vObF³\Ìvã÷à£s4»  Z£ú]{ðÁ5ÍºÅøÍâšk®‘³uãCúú믘¯YyáΚ½{÷ 4úóû¾5ê˜m4ÕofNŽÀtŠò“…ùlµßžÛ*ÆâoJ¾ø‡’ÿ‹¿]”¢?…¡”Ë:jûž™ßsG#i“¿y³|þY2³E!F°2†¥ù2q=¡¹¬ŠºÊdÉ™‹°Xážžžáä”ÊÂ~»³H£P3î“ÊïÒK/õÃÝBüh5ÕE¬Â\³‚„,º«øñ@Q”0ooSuÍt[ ¢‚®c~÷:55U`¶ªs¤`³°dâïCýX2,¢P÷çoà‹?q ¾äJŒÏàáž;Ãv\JÊÊÉÅ%Üj±Ýógïj”À ŸmŒÖž¶‘ž;žòB‘aV"ä÷Ûóþñ!pl~$ø$–â¥)4 rãé‚þ럷wz†Í äãÝ ‡Ñ3iýNžBΤÁP±eŒþ ÜùÆ_áwç%¬spî¨t':òó¯ÚtJ»J^K†„#ÈDgÊ–IÅ)¹±$¨nIZ£ÔÞäbð_¨ü_%œQÓ’»a{<£zöhÕ¤‹¹49³Õ¸aŸÓ­º%ÁóÒ!}Áz(°9ᦴd¨A·#·ÛÞ¶!iûŠv¢ö¡ŸÑ М©zŽÎ€¹c§Ãã?| Um†Ák˜aHw7d˜Ò.bÖ%´–îß×uˆ=´JúÛÇÑ£JÊ@JbO.#h©¢Åñ:tè¼óÎ;ðöÛoÃûï¿h!’àúõësB^㓱‹@‡dža|A„¹sçZø.•H73ʪTÿâ®ë!wÑLÝOÞ¿mz;Fv"eóÐÞ¾f§ ¡¥^Ù‰6¿w¬G—ÄŠëÂð¾×ÄþSàµuøÙ·ÆuOýÎÙÂy%‡§ÃW" p|ò»åZkVÐe˜§Ü&Þœ;Êïí‡,¦Ý÷„¸ ·Á†W>¾%Jä+Kn¡(*Ô ¯Ò•jÀ›¨Ûú˜ì£Ï|í-Ó-#×<𯯅ß©ÏìâriHµ»”f‘cÄC׃‹+ºøÖE©> ºS¹×?8:†’$—¦¬çÔ¨\j ÷þJÁÑÚã±…2%Ù?½2àÚcÅŠÿ`ðtTk†W<ÝšO¡'OÅIX•8;ß¾ÿÑÕH;#•ÔjÑ2 9ý]­´Ë¡¶˜JN”––ЧŸ~Z\}õÕÒÈ%P%j,tŽ"NUVV†"Û s¤ê¦õfL¤.È'\ÆYð²eËÔWŽ=(¡C—.]ü¾è\ï(kÀy!´Qõ%¡Úløy\”¡JUAôëCc ²œ^lUŒ!¼n úb¾Vë‹Ó@\s~ô¥—Ÿ+ÅsªkÓÚ¹îç‰ßÝ!\è7XU^,®Ô·ØÎ7Õï!¡ s-ÁxÏebêiZ?þÛ¥?Çž§Ð4gæ‹ÍïÒZÕŸ žøSoCï_ã´Ë^,×}2ò˨<¥çí:/¡×¹ýQ¥#œq©xµ1Ôµ%a}Ùe—‰+VLrL<à ΢¥55æ•*äñãÇûY[öGBþ±Ç“Ù‹þú׿êc£µB.ñù³*eeeÕ{Њ Ú‚~¹6i°EF[Ö2«¨pªbhh _ yñ§VA˜ ½"d'¢6r¹Ë$Jô7i:«”HY‘È[# Bmh=]eðÖ ·D.‹@Ì à@‚-íJ` Ý73rPÆŸŒ|c2¼2¸|þâÕúØÚ& z‘Y "2fè{Ú™âì'|3`ÀZð …'ŠçŒŽî¡ƒ»d®Zcœe¦ Ó\šêt¡â=ûpQh§=þ±<]3äÊÅ|ÀéêÛ¶ ¼þåÛb Ž5¯Ü÷“oÓ† & áK㬡Ÿñ˜\|:uê$]EêšÓ,•|siæÒ½{w‘’’"Î<óL9³>çœsÄI'±/êwøðábÁ‚‚,ž…‚Ë£ŠRo’Œ—y?N [ÊšD£¿zàpýRâ‹ó–çnÔ¿+Æï³žUp( º†ÿA׋ÅËŸÑgÀÞo‹Äoñûú¤úâOítŒÏ«* 6âØ»Y<2Ö÷RL|\|ÏR±Û­àpï÷à9s@¬x¦¤çÓÆç&ááw·aº@'úþ+6 ƱiÞ8+ 1›F·'Œ_á@7EÿAÊ#%wÀëD;Tñ8)…*^×ò"†ª§çâHÓl]jò Ð-)SzÑ—Ó"ÊuÝ‹š˜=Ð5„á\ü°ôñ³•¡}‰y“ß IDATÕDŠÕ3Ò}H{@+3`¬|S´`ÆVòª×!ŠÐp"Sæù%«mI­±4&‹,€Jžà0ÆÂQ,gõYšAšŸ[‘’´^ 
¤¡'7†É·|-_¯öåÇ·n2r‹Ì—V=¿ðaEª_rç¹òÊ+Ž÷Þ+}jqU>|5…| Êézþùç=Œƒúî“Ð&k×gžyF%l¤2cÆ ½O2ÖáÒz ­p%dÄ=ü @vY¸&õ;¯¿ø+ËXpÓõøýø(ÁÜÖŸŸæ—dAº@fÜ¡Gé7@?(;Ñúý{ßWU‘öÿµ°Ä„Ó\É”DWSn,¯Ú¦¶­ä®X«e ©èZ+¸k–¦XÖ*ø¶+Ú”àÖâîOÜv¯ûG1$ÅŸ‹— RX±‚ò^ôRó>3çÏ=çÞË?£4œás9çÌ™yfÎsî=Ï™™çû|MùÚïC{nññY­8àÁÔ Âó´O\«¦GR¡Œh¦J%2Ÿ!O¹L'ÍÄ)ܾZý8Í ™ã\Μ¦Éå[U¶¦$›‘ü†Î§¨°Lí|wßv;¬@’Z«0×Ħº<‡ùÊnwG£hO¸cÏ+{}#ÿ‚g%z‘&Á¤P<–í<ªáóZ7ÀZ8I¯Ø Š#L­Þ ò}£7èÖ!MJ‘VaGü´³’¥ð/»%MYÇmVT¢Ú}o}Uܸ֓K;Ð/¥w®ÿmEr•rí}øá‡btkúÑÀØ‘Ž:³Ž|äÈýAÁ_Nžt ûçê¢Ü»Œ4 ùšä}%¸Ñ‚nTŠ7W)?;//þ‰Y6!ÐoT›xs–bÈx¼‚:z†Õш±N}anª«få<à†øÐH²²Ð4p`jÈ[$*U²ó§Þ Ù›J*Š(Ø@rkJrÅ‹{¬Úý U„ ÿ]ÆêÌheäF|T*i±©¡\¡AÕž;4ר–l§q/™Œ®“þlyJ\k+1ô¾t£nd€¹‚Ù1¢ÚjÀÖuক璓“i„¬¸ðko’ž?Œ 4Àz_ÌÐ =»MC×쨩’¥ª±¢Õ 7튴°6"æýð>nÿÅ@¿†NîÔ×׳%K–x¬Ûj‘¨xlfòdfÄ^#Œ#ŸFãÓVü¡Ãך5ƒÍ§©¹ÓÙ…$chÂçž{îBDÈ:ÝLЦ‘i«×ÕÊÒ–[y;ù{ðÙ4í{Ê—¶æ<ÿ'ñâÿÛE1lÊ Ez ÏçL5{œ¾—:‡\Á[˜fÈŠÓh0•a^zÓ[sßQ—ÅŒS×ê }TjK£™EíyH1P0†àtVrÈ$­;{m6ËÏ[¤êPHZµÝ÷ĽkÝðXGšÐ㻟Œœyt5AÆz\S‹GŽ+#hD`ۉ㜖HKÈV¦ÃŠôM q«BƒNÐ×OH“rÎ]Ë|;Z4¾Ÿ« íµÔÂb߬蛊Jçf„CO¾îD.ÁŠ–FF "2‘ã ®eÁ-Ñ“` ö‡_ðmtDô‚Õ„ÉRÓÙÓÇ7}r°”;ÔJ+¡[¢P€ r¼øâ‹¢É¢0‚ ú ÈE‚9Éßß´ Z­cðàÁ <± ™ÓêpöŠ|¤vjKABôòœâŽCUd’ ž5¼´zˆHD|V¯^†Í»«TÅ4áè_ è¹awû ôæ8‚…#c“˜ _ ZÚ‚…ØÆþë;ÑÑ“1öæ&|ø•‘8ÔP—v[jËñmóÿù¦Íb·bþêí8a|NéUNà¢bµ¤½„É9ÀÑwPc;±dÚ­T·¢ã—awE£^ôSû6¶Rñ):Íi#²§‡`mT&ò—ÏÔa‹Æ: .HŸÏÐ0bQ³áØÇ®g‰^ÖMöÙÏþiéõ1EÅ1œ¦þlö³z•î¾ã­~篔LCÁ®¨@?§§ë˰vÒlºªT w¸;æûbƱLØ÷Î7Üt×…û!žÛ$„Ï[‡üÙ7£4w5VЗÑ•zÞUÅË^3êë8US'ÎÕU×¢¾¯êýçÇoÃ=+Cø |ZB<³IôcY ±ÛX_¦SÐO¯«!Y}áG¥‘îZìØtZÙ&¾â1½O¢¬ìcÑFß!¡è[GûÅ2¥y/Î òÓ ”‘г/FŽìYa®¾¶”á÷ ˜¾ða„‰o€æX€¤åY˜’•ÿßDâRârïÔõÔz¿\b;³Gën X½âFNPU¹­ {I#Zsî+ѳ(¸‡¾ßÙrÈÁâÅ‹A#h|ôÑGصkBÕY1²|7ÔÀÉ?AïÞÊ…=û0RÃí ±ÈŽz9mí›ê¨.B=YJ2æ#¬É„Idh6tPOMŸž%cƒ#° 0‹ßdz±³1¤ä<ì;6µ[µc ÖR[Å¿¯Jw ŽÞ¥a†q7¢ëïZn³ÜUŠ·7\ Ug(Cž©!ÛäC2ðzÀa úb„?y-ø­g’s÷lÔAõ¡¥¥EÅàÐ!ÍÀyÛŽ5JÄŠîh³sçÎÕåq¯çÎ:}¹·C#_Ó:4wº‘éòÕ€f€]Ž™ÞuÑ–}Ö|,\2è…–Ö€µµÔö 0#ÿ­äVºZ± ³.ÉäH8Ã÷U}yN-tuÜY.Ð lJx2sC¥ÐœêɸZs U¢2Y"õÛ—Âr­%^!Gî뺭ʦF*óx@ Dæ n[-aðSµ¸Íª7q»˜óûò™3 Q Ö4P›2“kˆi ö2óýRŒ"áü‰¬¡©©ŽíÉà³…<@=mÈó¨{ü8¯¤ÜæÔ,‡ ‹ª1"–³®’•”×\©šY3¹A%xÔõ‰Ùžl‚Jòò‰YÅ˜ÈΊE„>0ë…ùPºwö;qÜm °­†±\Ÿ¸”\Ãt¯÷ûÓPSΊ ÷0+aR÷ôÀ›YÒj:JV ùûD3› ¦ÏH¤­•­ñ6eDÒÅ×ÐG›S1ÿa“ #2MÑùò\Åmßu±,¿\3†í`ð ¥üð\ú‰50žð7W÷ó©ù®Ñ`õSX©oI4„®4öݰ?vìX¡/H#==qþ]þ@ãèƒ 
^͆âíîÒ±éþÞ}÷ÝíÖéHyË¥S°÷Þ{¯#Õd™n¨Å»~#Ú÷¢{ ý¥-Žñ5rýºOAsµzàmi´êZ^c¬8P D®Ríõ4°¼>:Ö®#ŠeEªHYºRGÓÚÈ×׈[ÙQ,ÃêÏr/éö–Å*REë52ÖWzØ­ÿwol¡/©€«¸BSv4ˆ„Ç]'༑>Ÿç¡)m¶c#íeM˜Ö[xð å ªÆ˜Ö… ‰BàqŒ¡š’a-z¨Kóº¯{ßmB3-¯˜"È¶ÞŠÓø±þ&Þ6S¦¥.Þ¨›èí³D Ý™!€Ëd¼UYÊú4Eío§ ‹“ÓøæJ3 u6  ¯EÓ/дóÖ[o±5kÖ0Ș¸ã0£Î&ãô3×ïæÍ›;+Âky‡ÃaxX1Âk9™)5ÐQ tfiË›ÖÚá3VºD$¢ÞqTMX—l]Qþ܆ –®„Ŭó>ËÖa)ßÍ‚ÝØÓ¡‰4Û=4%Eœ)Ï¢m"³iq5¼ÝCg5ËLæo—еƒVTb.½ R¤ MšÂÊõo´L_wДÞÚyß|hJ×›»['”u¦,1‚V×âòô«³<“t`|¥Êšíë×­:T˜0Ônò»øÇ|æÔÚ}á[òZî²VFm’Í_d’¸0 tniËDsza ÊZß t#=‚M‰Ã’6`}öU¸Ÿá­×’¢“ˆòp ðC…Ïž¦ñ…©žñ€ ?Á‚ôXÐ(מFþÊp$S0n-N;Aâóc÷ý^³xJ#þs?OÇ-µØµu ¬IH·%£$ž|Õä×o Aq‚Uü­–ëmÛˆý; ¼d¹^…/m€©rKí!lÍÉÇK ¼’ŸPÀ?[•fÀ·&yOÿåÏ. =§Óêï ¾Ãa”®½žô« dŒ"ûûÇ!L¡Ö‡.uU¢àü ”º¸ÄÄD”–– úD=SîH tH˜WSŽÈªZ46žÁU7ŽÂíaCµŸŽ‡„>¡spâØ=èßÇã”ÌèfèÆX¹Se{¬(£]‡?7j[ðäÚ{ðúòÉâËßÇ2Ÿ¸sçwg+wÕñ>Ò w›Vœ…éaŠá¾kM(»ø®j¥º1[C óôý;J‹ö¢”ì'èeá‹&²èÆ[ã„]/ì}gÿÆ$Pý¼Ê{=Ø^0xš”ûG(Ú]La8Èv7×~¼iˆ(5ì3̘¤ ú’“Že¿A£ ih䉋‘»ú'¸ªá0~Çò@ÿiÍté–¦°±iÓ&™œK¸«Rtt´IÔ|"rÀÊ•+Mùò@j #8cèÓ±Ô7…´öPê˜Y껡î ˤs2Y–…xyÛ6l£ÏÎ¥ Ð”È_±¥Êðébãi}ÔFxs>®ûä ××y®=ShêDÇzÀòm;±—Ù‘›˜ƒ q¯RüšÖSKs3ÈIù´4cÿÆi":VFa¦U‡¤Zõæ*¬§H64ú·=÷Q éö^P¼XôMŸv×RÍz¬‹ /aïæm`ÎüãÙû‚—«(üN¿häW##ú3ü~ùRä—7«í¹4Å3Zàh¬ÅdF£ÃkØQ‹_‹¹ž"¬ÙшúÚ¨­oT2ÔÿÄñK# %ïšk®Á¸1"á†n0•Ó¸¾›^9Ôdè7ÅšÖrÅöüùó",&-/™òåÔ€Ô€ÔÀ…h `»ª¡)+ÊÊphïvLákzQjhJàLY&zöŒÂ‘3­¨ÎïÌ£áñØLa ·¯›Ëš¬Ul‘^©ýДõ ®LQ^„¦¤P•"ñДÓc÷‘* 9Y²Ý•Д³oÓ Y[­„¦<-BS~'ÙÈ”üÃØ6O31& X´Û‹ËhMº µ\|‹š2Ýš’ΕUÔŠ_ÕŽeˆ§x²e'jñÑñ÷ñÇÕ ±‚ªÍ½sà7ñ|¶ìh *ù豇ÿ~€¶@ˆ×Ue¨:¾1=z hÂJ:ó=ü2~ /¢$ ƒ94%äe ò²yY#m,²Ž¨×Þ|Dĵ š°”Êû»½—8qÕðTC Ÿ bh%' I9eBvÖóó;÷L>õŸ 03é (Ô­¨Ò…»W#”潤_0‡%ÕÕ~ˆåËW 99Iäoذڛ…üà$¼µ' ºu½×žÿŠœ»10À]ûT°Û-4…/|ºjåŽÔÀå© Y8–u.¦Ì ô­%Ç KÉ*488µÎÐÒ$â_»‹”ú}YØ÷¼ëØ€‡þ¤_¨×ñ­yx;êÎJŠ|ƒìU™Tw‚ÁÁI”³¨”nöŒMž2‚eyËWó´8×Uàtkæú37`¯q:·|å8‚eÿ}›N6îQ¦×€Vê)ò†ßÎŽÿ3Ýkw–¨¦J¸Â¦ëVîH H \ž^Ðó¾sÎbút(9í‚{—ðÄ­rwDŽÆošo«fGKßcåÅù í ØMdœžÛý«®,aY‚€Âå=ìÕ‹9ï²û58Ø»‡³:g·Wh;°~"»%åO¬¼Fc¾°³MœãT5ˆÓ“-ËeGa‘Šxˆ%¤æRÿþ˶>¾˜­X2êŒd(T<°µ~Ž ¤>ß½œ%ßÍû~¥Þ~~îÅ?¸z MÇ“—¯F=6Se‰"^ØXõzcU.ØŽÜ3YFj@j {j@à‹v_›X>'òn ÛjèWS¥Up C×>÷®¡ºiW1ÀÙqZ®W aÔ&¨´›ll¡0‚ýˆ pØ^TŒÍý 8Q榸 ÅÂ,Äe\øo+›­ÞW Á¡8¶˜Ãš†Ï¸—Åe¹F~Ÿì{žê‡²R9Í;íu¬º†È½SHžNËÈ¡Bf¾SQîýí¢ýAKTÜ¶Ø N˜ÂªÅGÍ•Y„óv‡léujÙ›UýP?)B/K-nPûéº~\ÁÞÈ_EÇ 
KTIõ—d¨Ûg%K¡ºQ©Y,^<4¸—éæÈ©©ËJÒ ‹¬Ä·Ê²ã)²’/bÉ]7öZ7ï#δ¹É£|G2ò±sk>6¬Á£q?âü˜x ðÓ±¾Ø·};²ÿw^åb"ç`Œº„Ù'ô>¬ü>å•Û…£Wã¡õŸ½)¹…¨®®DáÒé°Ùаæ‡1ÈMX†p*ºôÕ,Xh]¹ô3ÅSø«æ(­s9õ ¼–JQ~·|ª‡—:–Ij@jà2Ѐ_Ä›l©µÚFnjµ~+'œF¤Q@0Q¹ú“ †Òma‹8ã ®‹ nX´Dyœ¼ñÝÎb‹¯P§ Íï‚îãàà¢!“Î[LÞÒ8ÛhžFPê~Œ¨qfœ° [ü46/­ÂÃKóiv¡Ag‘òŽ-Vª®ä“ßJúôÌ•øKî³t‚ŸÜÁGÆ£ÿ€âä9‚p=”0 I”µdF†Òªðƒ«›ƒÑ„K޼céœ%.17º‘P(òå©©ËCæ§ÞåqÍß­«ì`ä¦Î]Ô)T”–¡7-Áž=]¢Wáo$`JÀ¨"¾Û³gŽ 1b6Š)¯5lñ]‹È¥hR æçaÊÀZ¼2# ïaM] |á«„àþþt 1ñÕWïéó²â8‹}ô?Z0=õ…/…ãlr|†ìWJP?¬ÿ)÷E¯óŸ`Ë=Š;Ú IDATƒ ¨Ô8¬]¿«žˆë¿üÖ-ËÀùŠnº¡q×S]?´8(Ëÿ#å£þ¥bÞ8p6,ç—Çñìm±ø#õlé÷žÅ|‚O>×}R†w=}òí͘ü ’6e šÿ’…—80ü†kpºæb c=0ô¬:Æ*Fòýû©Û0sEÙÚSƒAïbñ3ÏáÉy3Pk]†u¶ xì?ÆÒ²X¿½B_K§©œÿý»ãÌšLÖY&©©ËW—•ËÙ%v±6ò5zúzí^C¡€²d˜~Š9Q”Š™õZ©õLÅ ÚåÕKß|’•ÀtË£üaf·-æ^Щ vÖÇ2s3h?ŠýãßkŠÆFYäãéŒ"d²Xò|æ^ÐÃ~ÎBuÏfòBŽpo3Šå¤+ÌS¢_¼oÚÇ’È^}õY×±–oئí)¤zÃ9žG“ÅmÖo­^Tò–³Ò[¿"‰m©R(ÛÄ¥µ• ktª,õ¦(dÒ ºõï¨<#5p¹h@Â.Âv6µ®®†YS j“be5uu¬AN£ÒD°‰é*v”©ÌF±¬’ ÕÙ”@q¹Êƒÿ"t_i’³!Ù]øßÊüd2n±ì`­fˆ^(@&AŒìÌFØe|­c‹Õ«"Œ´½¡X— 2锆-ö€«ÕºbãlÒØž\×øÈÃëü™gV™ñØ;TC£ÎÕý2¤¤º¯ä4 {¾íTþ*9- ïX‘b´nÀchŒæ‘°ÐZˆÃA˜$ÎZ’ó°qw﹈‰H)ýÇš:˜U„Âþ4QL‰/rçåØ ä„ùáÓ7÷¡ŒÜ´ÌéîóÈÕËC“±D¹'á æçžÛµÇ‚¡ŠXªŒéŽ ðJv¶È*)9l%)Ø¡Ìåuå¾Ô€Ô€Ô€»zðw ÷Ly|)j@‰» zÐ÷ó_ø[ïr3N”½ªú:œ!Qð­c:8µ8ZU‹ÆÆ3¸êÆQ¸=l¨ðœn®û·"d õ{V6ÅMªwõ·~jð1] >\ÔáTŒœ©G’! 
K H H hXÓ„ÜJ t@÷ÝwBCC1~üxL ñWH AÔ&‹H H xÑ€4À^”"³.= p' 9åCSÓüÓµ©Íg”Ý«(Þ‰H•ÅøOÓ͈m%v•«ÉhÖz©õνTÆkÐÎÒ…P{Úµ´ÀÑØG““<ºÐ Ÿ³£±^)ØZg þäÁbe”Ýì@=yŸ%× ƒšfÚï—Þ[¹#5 5Ð…Ð~ù_[ä[o½…}ûö}m9R€Ô€»œõÿÁs¿ÕÖÌÕ³CÆâÁi“póuîˆa÷Ú£ÕèèhÏÆœ3ÿ&£7«6`ÅXeÝùøŸÇaRóA0/¸¹jÆ„Ì@p¦ ;ç+p"Ç‘­ðçp)c² Ð^JQÂ!œÖÌù²¸9Ţľa=+0ßw$(:©žbSò±m+P‡£j7ÞƒMHj1ØrŽ„ʶN7øh"ÒÐÀžÀÉí‹1rvº–IÛXä—S€“´îh§_ßðZ»¡SrWjà²Ó@—`n|W®\yÙ)P^ðEÒ@u1^K/îPã)))0`¹6ÓP< !}Fã˜Í†ÞÁ‹sõDŠ}•g‡0“Œ/·ƒÁç ¡AÄ»€y%Fa¬•±°ú ‘·`We¥kïëƒÃ/ ÁŒôÐÛ—ѨwhJ&l݇a>xÿÍt„ψEÖt;‘(êWlH ©Ì¿faq74úzÎs^er5¬ŽÂ}ü?ôÇDZ^1â&Ý¿–jlšgAìÂí W´_;ýò¼z™#5 5ÐEè2ÜEý‘b¤¾9 ´œÀæÇ° ½@´a‰² ï°¥Èß< ~Íã•Ù‹0úÁ¬FG1·îð8ÒcaOÍBZAžvOÁø>9ž vϦ€“ƒ‡ fsž¦AibÞÁ…~¡xbM¨^+lÒľo%ëPÖc(ˆÊ€}ó|5t¥yˆëÅÁ_,ÁßÇÐÁž-‡ÎZN/Z ÅϧÅbé†sêË@;ýҪɭԀÔ@—k Ë ðĉå¸ËoÈ5pî俱vÓ›ˆúi,®¤ϧ*ð¯¢091c^-”t®¶k·Áœ”…¤déÊãS<èö®Ç‡ñ¥j%PO#e8’Žñh\”ZÎâ`Ù>ÜÀI[MÈVYìÄjô'2À¦$ªæÃÄa×cTäÏ0ýÞh"Ñ0•‡¶$!ɨžn†“ ¦§œ|¼±t-¢h úç|š˜h7Þ9@ãí‚X2m+¶äStp eùÄScòÅ ÷¤0 ¶¤$ÄBïë‡ãÇ÷δñCÍP)Ñr£ÊþtŸ‰ýIëakýÒÎË­Ô€Ô@×i Ë pd$sÙu=“’¤T 8Ê6 ŒëüÀ„½ýFd€w7á¹}kÆmM+/ûM ùPõñbtè¼ãÒiÄ™Vœ…é¢p×$ ’¹Am¤Ý_B3v¯žˆÎjTÊG¡ê4¯ÕÈwÀ0¤¦¤aÐðþ°=ˆ¤Ù“(&t•œ®AOõ{‘Há-S¬9#e#Ó“¥é¼ÚŠuÇ©¶eÆ=ò4ùõyüã71ˆùWXkJ0y z5Á4õ}¢h+fLXØÌZŸÓ›å;û7&(ìO•.ö'½@ýÒËÈ©©®Ó@÷1"¯¬[h€¢aÕþ_FäÄì°£É}ª¡˜‘ùdÉÖêŽ]fS‰R~O^Þ–Å`¡žÅÊQ€M$yé% tÞÉÊ÷XÙ«#õè]Ά·Å·¦~<Àl›3çõ,.5ß²“úÛDžê S©|+ä")55Ô°êê*–ó .'¹zB'y$-ŠŽVSSGý¡ã:%ŠXе†jX*µRP«Ëft=<ÊXZIk¢¨]¤kË 8jZr²Â %œ&±?i™†­“Y“½ôËPBîJ H t­$ˆ±ëÞe¤¤o@Ü»·Ä/Iò[ˆðóEO |¿l;jýnF8åÿ”œÔ¤ùiǦ-Mó’Á×yΟ±­ƒ_¿ÛÅy¥MWÑDÁ©/Uø]`$ÞGÀ /mÅÖ­Çï^l`5¢zö„ïúQÒï:_:%ö˶/ƒo`† ЏMd–×üýÄ¢Zܱ=zú"°‚‚úÿgl,ëE$”@ñ ^ßýé`íãuÙðé͹§pîÝøúúº}Æ & ¯*oصì.L êE#û“¡Q$ïDŒ˜!˜§÷Ëx^îK H t½Úxëú&¥D©Nj`øÏˆN·;_DßC\Òlœý`þLbò¢† agÊ2ágy%Ž„yó¡"oß¹qÀìðYè™?×Vü³—"ç~ô>¶Ã·â±ˆizÇ®®Ò~x ºSÔYgøøÂYþFƤìè~b5 CYv<, 9¢þ]/œAmÕ~¼”´€ŽS0°iBf¯%'åƒ3s«&¡`ÙØýSL;ÉK99)õŽÂ“3G`ߪ!iu–RîÃHö<, ®òtëÄ“„5®Ç-ÏV›1o¡Ž)‘·ãž_=Ë}Pÿþ¿0%~,i«È š^ ˆ×9–;|åbLï“(+ûXÈê;$ýø®9¿žAo™ø…×p¨¢¸ü'5 5ÐÕèÚµ”&5еÓÄÑ”-Ÿ5|Âby%|z¶‰YSh*yʃt~*+qxoß^nedÒt}Á"Âé˜X¥jËsYèè9bJ7]p°ôP°EKîЧ Ý¥ŠéÚ°ñ i†Ó^Ç ·ÌÑ勾ƒ“µº‰•çÆQ~*ûoy–8ŸUþ#2G—U®‹­¶¦™ëÒù_åÙÄy.»º¦”=5ÎßP&ŠeÓtµ-“òîdF¹®·•œÅjÄL¸eÅšÏizÌÓí¤A½_fÒ ½srGj@jàÑ€dCúFÔ*…v•l™|ÝÒÂÒ²rYnVK†&‘•ë$E*[”i½Ó­u{‰b|礳•ÌVœ'Žoù!¦¸7ºÑ'Ö⥱K)ä£Ó{wÑ(ݤ];Å Ž NÏo@i½\¯†¹Òó• 
àb‡xÉ›lãy¾oô¼v?'¥¤.- H|iÝÙ ؆Œ×አT”•áÐÞí˜BSÀ4¥<@Œ›±c~ôŒÞL¾¼Þ“_ð*]€ðyë°k÷¬Ž¿ l0c ÷Éfã9¾b× XAžÇ{–MööÑÒ܈úúZÔœ¦qíçu¨­¯G#‘+ñ4ôö{èÿ¬Ë>±Ê^rJÆìsCë@vü4¬ÛuHÔ©­Ø‹”¸”?×óєڒ]–½ «·ïÇ j{^§ Ïë)¸IŽ~åÉÿR—².©iÙ©7 Ø2¹±Ù‹769ƒÙôª—o,Ôp«k4ýŒØ ä´áád'ǬÏÛ›“æò:‘|z Ÿ»áÕêûøÐ ÂκZûr+5 5ð­k@Ž€¿u•Ë/†k+p´ªgpÕ£p;Q¶f ¹KW]e50`ú÷‘ï¨ã~É6¥. H|9ÜeyRRRR—œ¤ô%wKd‡¤¤¤¤. H|9ÜeyRRRR—œ¤¾än‰ìÔ€Ô€Ô€ÔÀå ÿÓ€&z1ß´åIEND®B`‚PyCogent-1.5.3/doc/examples/align_codons_to_protein.rst000644 000765 000024 00000006316 11213034174 024260 0ustar00jrideoutstaff000000 000000 Map protein alignment gaps to DNA alignment gaps ================================================ .. sectionauthor:: Gavin Huttley Although PyCogent provides a means for directly aligning codon sequences, you may want to use a different approach based on the translate-align-introduce gaps into the original paradigm. After you've translated your codon sequences, and aligned the resulting amino acid sequences, you want to introduce the gaps from the aligned protein sequences back into the original codon sequences. Here's how. .. doctest:: >>> from cogent import LoadSeqs, DNA, PROTEIN First I'm going to construct an artificial example, using the seqs dict as a means to get the data into the Alignment object. The basic idea, however, is that you should already have a set of DNA sequences that are in frame (i.e. position 0 is the 1st codon position), you've translated those sequences and aligned these translated sequences. The result is an alignment of aa sequences and a set of unaligned DNA sequences from which the aa seqs were derived. If your sequences are not in frame you can adjust it by either slicing, or adding N's to the beginning of the raw string. .. doctest:: >>> seqs = { ... 'hum': 'AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT', ... 'mus': 'AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG', ... 
    ...     'rat': 'CTGAACAAGCAGCCACTTTCAAACAAGAAA'}
    >>> unaligned_DNA = LoadSeqs(data=seqs, moltype = DNA, aligned = False)
    >>> print unaligned_DNA.toFasta()
    >hum
    AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT
    >mus
    AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG
    >rat
    CTGAACAAGCAGCCACTTTCAAACAAGAAA

In order to ensure the alignment algorithm preserves the coding frame, we align the translation of the sequences. We need to translate them first, but note that because the seqs are unaligned we have to set ``aligned = False``, or we'll get an error.

.. doctest::

    >>> unaligned_aa = unaligned_DNA.getTranslation()
    >>> print unaligned_aa.toFasta()
    >hum
    KQIQESSENGSLAARQERQAQVNLT
    >mus
    KQIQESGESGSLAARQERQAQVNLT
    >rat
    LNKQPLSNKK

The translated seqs can then be written to file, using the method ``writeToFile``. That file then serves as input for an alignment program. The resulting alignment file can be read back in. (We won't write to file in this example.) For this example we will specify the aligned sequences in the dict, rather than from file.

.. doctest::

    >>> aligned_aa_seqs = {'hum': 'KQIQESSENGSLAARQERQAQVNLT',
    ...     'mus': 'KQIQESGESGSLAARQERQAQVNLT',
    ...     'rat': 'LNKQ------PLS---------NKK'}
    >>> aligned_aa = LoadSeqs(data = aligned_aa_seqs, moltype = PROTEIN)
    >>> aligned_DNA = aligned_aa.replaceSeqs(unaligned_DNA)

Just to be sure, we'll check that the DNA sequence has gaps in the right place.

.. doctest::

    >>> print aligned_DNA
    >hum
    AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT
    >rat
    CTGAACAAGCAG------------------CCACTTTCA---------------------------AACAAGAAA
    >mus
    AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG
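The projection that ``replaceSeqs`` performs can be sketched in a few lines of plain Python, which may help make the idea concrete. This is an illustrative sketch only: ``map_gaps_to_codons`` is our hypothetical helper, not part of PyCogent, and it assumes the DNA is in frame with exactly one codon per ungapped amino acid.

```python
def map_gaps_to_codons(aligned_aa, codons):
    """Project gaps from an aligned protein back onto its source codon string."""
    # split the unaligned DNA into successive codons
    codon_iter = iter([codons[i:i + 3] for i in range(0, len(codons), 3)])
    pieces = []
    for aa in aligned_aa:
        # a protein gap becomes a codon-sized gap; a residue consumes one codon
        pieces.append('---' if aa == '-' else next(codon_iter))
    return ''.join(pieces)
```

For example, ``map_gaps_to_codons('M-K', 'ATGAAA')`` would return ``'ATG---AAA'``.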
.. _alignment-controllers:

Using alignment application controllers to align unaligned sequences
====================================================================

.. sectionauthor:: Daniel McDonald

This document provides examples of how to align sequences using the alignment application controllers. Each alignment application controller module provides the support method ``align_unaligned_seqs``. This method takes as input a ``SequenceCollection`` object or a dict mapping sequence ids to sequences, the ``MolType`` of the sequences, and an option dict containing specific parameter settings. As output, the method returns an ``Alignment`` object.

First, let's import all of the ``align_unaligned_seqs`` methods:

.. doctest::

    >>> from cogent.app.clustalw import align_unaligned_seqs as clustalw_align_unaligned_seqs
    >>> from cogent.app.muscle import align_unaligned_seqs as muscle_align_unaligned_seqs
    >>> from cogent.app.mafft import align_unaligned_seqs as mafft_align_unaligned_seqs

Next, we'll load our test data. We will be using DNA sequences for this example:

.. doctest::

    >>> from cogent.core.moltype import DNA
    >>> from cogent import LoadSeqs
    >>> unaligned_seqs = LoadSeqs(filename='data/test2.fasta', aligned=False)

Let's align some sequences using default parameters!

.. note:: Output is truncated for document formatting

.. doctest::

    >>> clustalw_aln = clustalw_align_unaligned_seqs(unaligned_seqs, DNA)
    >>> muscle_aln = muscle_align_unaligned_seqs(unaligned_seqs, DNA)
    >>> mafft_aln = mafft_align_unaligned_seqs(unaligned_seqs, DNA)
    >>> clustalw_aln
    5 x 60 dna alignment: NineBande[------CGCCA...], Mouse[GCAGTGAGCCA...], ...
    >>> muscle_aln
    5 x 60 dna alignment: NineBande[------CGCCA...], Mouse[GCAGTGAGCCA...], ...
    >>> mafft_aln
    5 x 60 dna alignment: NineBande[------CGCCA...], Mouse[GCAGTGAGCCA...], ...

To change specific parameters, simply specify the parameters in a dict and pass it in:

.. note:: Output is truncated for document formatting
.. doctest::

    >>> clustalw_params = {'-gapopen':-3, '-quicktree':True}
    >>> clustalw_aln = clustalw_align_unaligned_seqs(unaligned_seqs, DNA, params=clustalw_params)
    >>> clustalw_aln
    5 x 60 dna alignment: NineBande[------CGCCA...], Mouse[GCAGTGAGCCA...], ...

Creating and manipulating alignment profiles
============================================

.. sectionauthor:: Sandra Smit

This is an example of how to create a profile from an alignment and how to do particular tricks with it. First, import the necessary stuff.

.. doctest::

    >>> from cogent.core.profile import Profile
    >>> from cogent import LoadSeqs, RNA

Then load an example alignment of 20 phe-tRNA sequences which we will use to create the profile

.. doctest::

    >>> aln = LoadSeqs("data/trna_profile.fasta", moltype=RNA)

Examine the number of sequences in the alignment and the alignment length

.. doctest::

    >>> print len(aln.Seqs)
    20
    >>> print len(aln)
    77

Create a profile containing the counts of each base at each alignment position

.. doctest::

    >>> pf = aln.getPosFreqs()
    >>> print pf.prettyPrint(include_header=True, column_limit=6, col_sep=' ')
    U C A G - B
    0 0 0 20 0 0
    0 12 0 8 0 0
    1 18 0 1 0 0
    7 9 0 4 0 0...

Normalize the positions to get the relative frequencies at each position

.. doctest::

    >>> pf.normalizePositions()
    >>> print pf.prettyPrint(include_header=True, column_limit=6, col_sep=' ')
    U C A G - B
    0.0000 0.0000 0.0000 1.0000 0.0000 0.0000
    0.0000 0.6000 0.0000 0.4000 0.0000 0.0000
    0.0500 0.9000 0.0000 0.0500 0.0000 0.0000
    0.3500 0.4500 0.0000 0.2000 0.0000 0.0000...

Make sure the data in the profile is valid. The method isValid checks whether all rows add up to one and whether the profile has a valid Alphabet and CharacterOrder.

.. doctest::

    >>> print pf.isValid()
    True

A profile can be used to calculate consensus sequences from the alignment.
To illustrate the different options for consensus calculation, let's examine the frequency data at the fifth position of the alignment (index=4)

.. doctest::

    >>> print '\n'.join(['%s: %.3f'%(c,f) for (c,f) in\
    ...     zip(pf.CharOrder, pf.dataAt(4)) if f!=0])
    U: 0.050
    C: 0.400
    A: 0.250
    G: 0.300

The easiest consensus calculation will simply take the most frequent character at each position.

.. doctest::

    >>> print pf.toConsensus(fully_degenerate=False)
    GCCCCGGUAGCUCAGU--GGUAGAGCAGGGGACUGAAAAUCCCCGUGUCGGCGGUUCGAUUCCGUCCCGGGGCACCA

You can also specify to use the degenerate character needed to cover all symbols occurring at a certain alignment position (fully_degenerate=True). At index 4 in the alignment U, C, A, and G occur, thus the fully degenerate symbol needed is 'N'. Alternatively, using the cutoff value, you can ask for the degenerate symbol needed to cover a certain frequency. At a cutoff of 0.8, we need both C, G, and A at index 4 to cover this value, which results in the degenerate character 'V'. For the lower cutoff of 0.6, C and G suffice, and thus the character in the consensus sequence is 'S'.

.. doctest::

    >>> pf.Alphabet=RNA
    >>> print pf.toConsensus(fully_degenerate=True)
    GSBBNNDUAGCUCAGH??GGKAGAGCRBNVGRYUGAARAYCBNVNKGUCVBBDGWUCRAWHCHSNBHNNNVSC?CHM
    >>> print pf.toConsensus(cutoff=0.8)
    GSCYVBRUAGCUCAGU??GGUAGAGCASVSGAYUGAAAAUCYBSRUGUCSSYGGUUCGAUUCCGBSYSBRGSCACCA
    >>> print pf.toConsensus(cutoff=0.6)
    GCCYSGRUAGCUCAGU??GGUAGAGCAGRGGACUGAAAAUCCYCGUGUCGGYGGUUCGAUUCCGYCYCKRGGCACCA

A profile could also function as the description of a certain motif. As an example, let's create a profile description for the T-pseudouridine-C-loop which starts at index 54 and ends at index 59 (based on the reference structure matching the alignment).
.. doctest::

    >>> loop_profile = Profile(pf.Data[54:60,:], Alphabet=RNA, CharOrder=pf.CharOrder)
    >>> print loop_profile.prettyPrint(include_header=True, column_limit=6, col_sep=' ')
    U C A G - B
    0.9500 0.0000 0.0500 0.0000 0.0000 0.0000
    1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
    0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
    0.0000 0.0000 0.0500 0.9500 0.0000 0.0000
    0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
    0.8500 0.0000 0.1500 0.0000 0.0000 0.0000

We can calculate how well this profile matches in a certain sequence (or profile) by using the score method. As an example we see where the loop profile best fits into the yeast phe-tRNA sequence. As expected, we find the best hit at index 54 (with a score of 5.75).

.. doctest::

    >>> yeast = RNA.Sequence(\
    ...     'GCGGAUUUAGCUCAGUU-GGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA')
    >>> scores = loop_profile.score(yeast)
    >>> print scores
    [ 2.8   0.9   0.85  0.15  2.05  2.    3.75  0.95  1.2   1.    2.9   2.75
      0.    0.05  1.    2.9   2.05  1.95  0.2   1.95  0.05  1.    0.    2.
      0.15  2.    1.2   1.95  0.9   0.05  1.15  2.15  2.05  1.15  2.8   0.1
      0.9   0.    2.05  2.05  2.95  1.    1.8   0.95  0.05  0.85  2.    2.8
      0.95  1.85  2.75  1.    0.95  1.15  5.75  1.    0.    0.15  3.05  2.15
      1.    1.2   2.15  1.9   0.95  0.    0.05  1.05  4.05  1.95  1.05  0.15]
    >>> print max(scores)
    5.75
    >>> print scores.argmax()
    54

********************************************************
Application Controller Documentation
********************************************************

.. sectionauthor:: Greg Caporaso, Sandra Smit

About this document
===================

The purpose of this document is to serve as a guideline for new users who want to implement application controllers. It should also be helpful as a reference document for more experienced coders. To meet the different needs of these people, this document is split up into multiple sections that discuss the framework in different levels of detail.
We first provide a summary, then documentation for application controller users, then documentation for application controller developers, and finally a developer's reference detailing the underlying objects. An additional example of building and applying a new application controller is provided `here <./building_and_using_an_application_controller.html>`_. Feedback from users of this document will help us to improve it, so please don't hesitate to contact us: gregcaporaso@gmail.com or sandra_smit at users.sourceforge.net.

.. % ============================================================================

The application controller framework: brief overview
====================================================

Summary
-------

An application performs a certain task. It uses parameters to give the user control over the possible settings. In general an application requires some input data and produces some output. To run RNAfold on (i.e. predict an RNA secondary structure for) the sequence in seqs.txt, using some settings (for temperature and GU pairs), and to save the output in the file result.txt, the following command would have to be executed::

    RNAfold -T 25 -noGU < seqs.txt > result.txt

Suppose you want to fold the sequences at two different temperatures (25 and 37 degrees Celsius) and investigate the difference between the structures. This would require a lot of manual work (run RNAfold twice, store all the files, use python to read in the files, parse the output, and calculate the difference). This task becomes a lot easier if we can control the application RNAfold from within the python environment. The following example folds two sequences at two different temperatures and reports the symmetric difference between the resulting structures. It works by creating a single application controller, and running it with different settings on the same input. The output is parsed and compared. The code snippet doesn't leave any files on the system.
.. % This command would run RNAfold on (i.e. predict an RNA secondary structure for) the sequences in seqs.txt. It would write the output to the file result.txt. And it would use the specified settings (for temperature and GU pairs).

::

    seqs = ['GCCCGGAUAGCUCAGUUAAAAUCCCCGUGUCGGUGGUUCAAUUCCGCCUCCGGGCCCA',\
        'CCCCUGGUGGCCAUAGCGGGGAUUAACCCGGCCUGUGCCGGCAGCGGCACGGGAGGUCGCUGCCAGGGG']
    r = RNAfold(WorkingDir="/tmp")
    r.Parameters['-T'].on(25)
    print "COMMAND AT 25C:", r.BaseCommand
    res = r(seqs)
    fold_25 = just_seqs_parser(res['StdOut'].readlines())
    res.cleanUp()
    r.Parameters['-T'].on(37)
    print "COMMAND AT 37C:", r.BaseCommand
    res = r(seqs)
    fold_37 = just_seqs_parser(res['StdOut'].readlines())
    res.cleanUp()
    for seq_25, struct_25 in fold_25:
        for seq_37, struct_37 in fold_37:
            if seq_25 == seq_37:
                print 'SEQ:', seq_25
                print '25C:', struct_25
                print '37C:', struct_37
                print 'BASE PAIR SIMILARITY: %.3f'%\
                    (compare_pairs(struct_25.toPairs(), struct_37.toPairs()))

Running the script results in the following output::

    COMMAND AT 25C: cd "/tmp/"; RNAfold -d1 -T 25 -S 1.07
    COMMAND AT 37C: cd "/tmp/"; RNAfold -d1 -T 37 -S 1.07
    SEQ: GCCCGGAUAGCUCAGUUAAAAUCCCCGUGUCGGUGGUUCAAUUCCGCCUCCGGGCCCA
    25C: (((((((..((..((((.....(((((...))).))...))))..)).)))))))... (-26.25)
    37C: ((((((.((((...)))).............(((((.......))))).))))))... (-19.6)
    BASE PAIR SIMILARITY: 0.222
    SEQ: CCCCUGGUGGCCAUAGCGGGGAUUAACCCGGCCUGUGCCGGCAGCGGCACGGGAGGUCGCUGCCAGGGG
    25C: (((((((((((.((..((((......)))).(((((((((....)))))))))..)).))))))))))) (-55.52)
    37C: (((((((..((.....((((......)))).(((((((((....))))))))).....))..))))))) (-46.3)
    BASE PAIR SIMILARITY: 0.846

It has tremendous benefits to have the capability to run an application in an integrative fashion from the Python environment, using Python objects as input, catching and directly using the output in a downstream analysis. It is timesaving and allows for the full automation of an analysis (prevents manual intervention). This is extremely convenient, for example, if you wish to perform a comparison of a large variety of parameters passed to a single application. Rather than manually running the application many times from the command line, you can script the runs and automatically process the output.

A piece of python code that handles this is called an application controller (APPC). The purpose of the application controller framework is to support various application controllers in general. The framework is generic and supports actions that all applications have in common. Each application requires a specific APPC that defines all the variables and methods for that unique application. At the moment the framework supports only one type of application: the command line application. In the (near) future we want to support web applications as well. This section addresses the different components of an application and how these components are represented in the framework. The framework is set up in two files: cogent/app/util.py and cogent/app/parameters.py

Command line applications
-------------------------

Every application requires a base command (such as RNAfold, muscle, or clustalw) and some parameters. These two properties are stored in the high level class Application. The class CommandlineApplication inherits from Application and has additional features. It deals with the working directory, input, and output. The central method is the __call__ method in which the full command-to-execute is built up and executed. The result of running the application controller is returned to the user. The util.py file contains one other class: ApplicationError. This class is used to raise exceptions for application controllers.

Parameters
----------

Most applications allow you to specify a certain set of parameters to control how the program runs.
This is extremely convenient, for example, if you wish to perform a comparison of a large variety of parameters passed to a single application. Rather than manually running the application many times from the command line, you can script the runs and automatically process the output. A piece of python code that handles this is called an application controller (APPC). The purpose of the application controller framework is to support various application controllers in general. The framework is generic and supports actions that all applications have in common. Each application requires a specific APPC that defines all the variables and methods for that unique application. At the moment the framework supports only one type of application: the command line application. In the (near) future we want to support web applications as well. This section addresses the different components of an application and how these components are represented in the framework. The framework is set up in two files: cogent/app/util.py and cogent/app/parameters.py Command line applications ------------------------- Every application requires a base command (such as RNAfold, muscle, or clustalw) and some parameters. These two properties are stored in the high level class Application. The class CommandlineApplication inherits from Application and has additional features. It deals with the working directory, input, and output. The central method is the __call__ method in which the full command-to-execute is built up and executed. The result of running the application controller is returned to the user. The util.py file contains one other class: ApplicationError. This class is used to raise exceptions for application controllers. Parameters ---------- Most applications allow you to specify a certain set of parameters to control how the program runs. 
Parameters can control many different features of an application, such as the temperature at which RNA is folded, the number of gaps allowed in an alignment, or the name of an output file. They come in many forms as well, some are simply flags, some always require a value, some can have optional values. .. % Parameter .. % -- FlagParameter .. % -- ValuedParameter .. % -- MixedParameter .. % Parameters .. % ParameterError .. % FilePath The application controller framework supports three types of parameters, which will be discussed below. Subclassing to specify new types of parameters or to make certain attributes fixed, is very easy. The abstract Parameter class defines the basic functionality of a parameter: it initializes all the Parameter attributes and it defines a Parameter ID which is a unique identifier for each parameter. In general a parameter has a prefix (usually a dash) and a name. Some parameters have values. The Parameter object is discussed in more detail in section :ref:`sec:build`. There are three subclasses from the class Parameter. FlagParameter is used for parameters that don't have values (e.g. allow GU pairs or not). ValuedParameters are used for paramaters that specify some value (e.g. the temperature or some input file). MixedParameters are parameters that might or might not have a value (e.g. the -d parameter in RNAfold). All parameters of an application are grouped in a Paramters object. The class Parameters is a special type of dictionary that allows lookups by parameter ID or synonyms. The parameters.py file contains two more classes. ParameterError is used to raise exceptions in the parameter framework. The class FilePath defines paths on a system, it can print itself in a special way and add other parts of a path. Input ----- Input can be very diverse between applications. Most often it requires a file or some data directly from the command line. Application input is handled by "input handlers". 
There are a few generic input handlers in the CommandLineApplication object. Specific APPCs can use these methods directly or overwrite them. The methods process the input data for the application. They might, for example, write a certain Python object to a temporary file and change some application parameters to use this file.

Output
------

All applications produce some form of output. It can be limited to information on "standard output" (stdout) and "standard error" (stderr). Many applications produce additional output files. Most (unfortunately not all) applications report a meaningful exit status that informs the user whether the execution of the program was successful. The class CommandLineAppResult handles all aspects of application output: stdout, stderr, exit status, and the additional output files. Access to all the available files is handled by the class ResultPath. More technical aspects of these classes are discussed in section :ref:`sec:build`.

.. % ResultPath
.. % CommandLineAppResult
.. % ============================================================================
.. % \newpage

Using an application controller
===============================

Summary
-------

#. Create an instance of some app controller
#. Turn parameters on and off
#. Optionally change the working directory
#. Optionally check the base command which is built up from the above information
#. Set the input handler
#. Possibly redirect StdOut and StdErr
#. Apply the instance to the input data, store the results
#. Use the results as you like
#. Possibly clean up files created by the program and the APPC

Creating an instance with basic settings (parameters, working directory)
------------------------------------------------------------------------

The first step toward running an application is creating an instance of the APPC. Two basic settings are the parameters and the working directory. Below are some examples of how to do this. Note that the working directory must be an absolute path.
All parameters have the methods isOn and isOff to check whether the parameter is active or not. Parameters can be turned on and off (with or without a value) with the on() and off() methods. The values can also be set during initialization of the APPC. When specifying the parameters upon initialization, the __init__ parameter params should be a dictionary of parameters that should be turned on, keyed by either the Parameter ID or a synonym. The values in params should be the values to turn the parameters on with for Valued or Mixed Parameters, or None for Flag or Mixed Parameters. It is useful to check the BaseCommand to see if all the parameters have the correct settings and if the working directory is correct. During debugging it is useful to check whether the command runs on the normal command line.

.. % \subsection{Setting/changing parameters}
.. % \subsection{Changing the working directory}

::

    Initialization without params, only defaults are on.

    >>> from cogent.app.vienna_package import RNAfold
    >>> r = RNAfold()

    Initialization with params, set new values for this instance

    >>> r = RNAfold(params={'-T':25,'-d':None,'-4':None,'-S':1.2})

    Initialization changing the working directory (must be absolute path!)

    >>> r = RNAfold(WorkingDir='/tmp')
    >>> print r.BaseCommand
    cd "/tmp/"; RNAfold -d1 -T 37 -S 1.07

    Changing the working directory after initialization (must be absolute path!)
    >>> r = RNAfold()
    >>> r.WorkingDir = '/tmp'
    >>> print r.BaseCommand
    cd "/tmp/"; RNAfold -d1 -T 37 -S 1.07

    Checking the parameters

    >>> r = RNAfold()
    >>> print r.Parameters['-P'].isOn()
    False
    >>> print r.Parameters['-P']
    >>> print r.Parameters['-T'].isOn()
    True
    >>> print r.Parameters['-T']
    -T 37

Other settings on initialization
--------------------------------

The input handler can be set on initialization (if not, the default is used):

::

    On initialization

    >>> r = RNAfold(InputHandler="_input_as_string")
    >>> print r.InputHandler
    _input_as_string

    After initialization

    >>> r = RNAfold()
    >>> print r.InputHandler # default
    _input_as_lines
    >>> r.InputHandler = "_input_as_path"
    >>> print r.InputHandler
    _input_as_path

Standard output and standard error can be suppressed. If SuppressStdout or SuppressStderr is set to True, stdout or stderr will be routed to /dev/null. The default is to store these results in a temporary file. Redirecting StdErr might be useful for programs that write a lot of useless information to this filestream.

.. % input handler

Some parameters concerning the creation of temporary files can be changed. TmpDir: default is /tmp. TmpNameLen is the length of the filenames, default is 20. HALT_EXEC is a parameter that can be set to True for debugging purposes. It stops the process right before execution of the system call and leaves all the input files (incl. temporary) in place. This allows the user to check whether the input is generated correctly. See Section :ref:`sec:haltexec` for more details.

Running the application, using the output, and cleaning up
----------------------------------------------------------

When calling the instance of the APPC on some data, the __call__ method is invoked. The call method has two optional parameters: data (the input data) and remove_tmp (if True the temporary files are removed). The call method returns a CommandLineAppResult object, containing all the application output information.
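Conceptually, the call boils down to a shell invocation whose streams and exit status are captured. The following is a simplified sketch of that flow using the standard subprocess module; the function name run_app is hypothetical, and the real __call__ additionally manages temporary files, input handlers, and ResultPath objects:

```python
import subprocess

def run_app(base_command, stdin_text=''):
    """Execute a shell command and collect exit status, stdout, and stderr,
    roughly as CommandLineApplication.__call__ does."""
    proc = subprocess.Popen(base_command, shell=True,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate(stdin_text.encode())
    return {'ExitStatus': proc.returncode,
            'StdOut': out.decode(),
            'StdErr': err.decode()}

# Pipe a sequence through a trivial stand-in "application" (tr) on its stdin.
result = run_app('tr acgu ACGU', 'gcau')
print(result['ExitStatus'], result['StdOut'])  # 0 GCAU
```

The returned dictionary mirrors the 'ExitStatus', 'StdOut', and 'StdErr' keys that a CommandLineAppResult exposes.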
The output dictionary can be used to access the resulting files. All the information can be incorporated in a downstream analysis. In the example below the aligned sequences in clustalw format are parsed and printed. Additionally, CommandLineAppResult contains one public method, cleanUp(), which takes no parameters. The method cleanUp() should be used when you want to delete the files that were created by the CommandLineApplication from disk. Note that after cleanUp() you may still appear to have access to your files, but this access is not reliable: you will only have access to what has already been loaded into memory (i.e. typically only a fraction of your file), so you should only run cleanUp() after you are done accessing your files. Also note that running cleanUp() is not required. If you want the result files to remain on disk, do not run cleanUp() and they will be left in place. This is useful when running an application for later analysis of results.

::

    >>> from cogent import PROTEIN
    >>> from cogent.app.clustalw import Clustalw
    >>> from cogent.parse.clustal import ClustalParser
    >>> s1 = PROTEIN.Sequence('MHSSIVLATVLFVAIASASKTRELCMKSL')
    >>> s2 = PROTEIN.Sequence('MALAEADDGAVVFGEEQEALVLKSWAVMKKDA')
    >>> s3 = PROTEIN.Sequence('MSTVEGREFSEDQEALVVKSWTVMKLNAGELALKF')
    >>> c = Clustalw(InputHandler="_input_as_seqs")
    >>> result = c([s1,s2,s3])
    >>> print result['ExitStatus']
    0
    >>> aln_txt = result['Align'].readlines()
    >>> for label, seq in ClustalParser(aln_txt):
    ...     print "%s: %s"%(label, seq)
    2: MALAEADDGAVVFGEEQEALVLKSWAVMKKDA-------
    3: MSTVEGRE----FSEDQEALVVKSWTVMKLNAGELALKF
    1: MHSSIVLAT-VLFVAIASASKTRELCMKSL---------
    >>> result.cleanUp()

.. % ============================================================================
.. % \newpage

.. _sec:build:

Designing and implementing a new type of application controller
===============================================================

Each specific application that you wish to control through PyCogent requires an application controller, i.e., a subclass of CommandLineApplication. Building the new application controller consists of three steps:

#. Creating the application controller class: Overwrite CommandLineApplication to define your new application controller, and define the class data. (Section :ref:`sec:step1`.)
#. Input handling: Determine whether the built-in input handlers (in CommandLineApplication) are sufficient. If not, write one or more input handling methods. (Section :ref:`sec:step2`.)
#. Output handling: Determine whether the program writes any output files to disk. If so, implement the _get_result_paths method. (Section :ref:`sec:step3`.)

.. _sec:step1:

Step 1: Creating the application controller class and defining class data
-------------------------------------------------------------------------

All of these class variables are discussed in detail in Sections :ref:`sec:application` and :ref:`sec:commandlineapplication`.

**The following class data must be overwritten:**

_command: The command used to run the application (a string).

**The following class data can be overwritten:**

_parameters: A dictionary of Parameter objects. Keys should be the identifiers of the parameters, and values should be the Parameter objects.

_command_delimiter: String that specifies the delimiter between the components of a full command, e.g. the command, parameters, and arguments.

_synonyms: A dictionary of parameter synonyms. Keys should be the alternative keys to look up a parameter, and values should be the identifiers used in the _parameters dictionary.

_input_handler: The name of the input handler method that should be used by default. The value should be a string (see CommandLineApplication.__call__ for how it's used).
_working_dir: Specifies where the command should be run (string). Default is the current working directory.

_suppress_stdout: Boolean value that specifies what happens with standard output (stdout) by default.

_suppress_stderr: Boolean value that specifies what happens with standard error (stderr) by default.

Defining parameters
^^^^^^^^^^^^^^^^^^^

All parameters should be one of the three built-in types: FlagParameter, ValuedParameter, or MixedParameter. (We don't know of any types that wouldn't fit into this framework, but if you come across any, please let us know.) Examples illustrating how to define the three different parameter types can be found in Section :ref:`sec:parameters`. The _parameters dict is a mapping of parameter identifiers (Prefix and Name joined by the empty string) to parameter objects. All parameters which can be passed to an application should be defined in the _parameters dict. Usually you can get this list by reviewing the application's documentation. See Section :ref:`sec:rnafoldexample` for an example including the definition of the _parameters dict. Note: if for a given ValuedParameter or MixedParameter the value is intended to be a path to a directory or file, ``IsPath=True`` must be passed when initializing those parameters.

**Defining a new Parameter type**

If the application you're working with uses a type of parameter that is not yet supported by the framework, you might want to write your own subclass. To subclass Parameter, the following methods will need to be implemented: __str__, isOn(), isOff(), on(), off(). These methods cover the two important characteristics of each parameter: knowing how to print itself, based on its status, and knowing how to be turned on or off. It is unlikely that you will need to subclass Parameter if working with CommandLineApplication subclasses. If you think you do, please let us know.
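To make those five required methods concrete, here is a toy, self-contained sketch of a new parameter type. ColonParameter is entirely hypothetical (it joins name and value with a colon, which no cogent parameter does) and is written without the real Parameter base class so the example runs on its own:

```python
class ColonParameter:
    """Toy parameter type printing as 'prefix+name:value', e.g. '-T:37'.
    Implements the five methods a Parameter subclass must provide."""
    def __init__(self, prefix, name, value=None):
        self.Prefix, self.Name, self.Value = prefix, name, value
    def isOn(self):
        return self.Value is not None
    def isOff(self):
        return not self.isOn()
    def on(self, value):
        # Turning a valued parameter on requires an actual value.
        if value is None:
            raise ValueError('cannot turn on with value None')
        self.Value = value
    def off(self):
        self.Value = None
    def __str__(self):
        # Printing depends on status: off parameters print as the empty string.
        if self.isOff():
            return ''
        return '%s%s:%s' % (self.Prefix, self.Name, self.Value)

p = ColonParameter('-', 'T')
p.on(37)
print(str(p))  # -T:37
```

The same division of labor (status checks, state changes, status-dependent printing) is what the built-in Flag, Valued, and Mixed parameter types implement.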
**Writing constructor functions/wrappers**

There might be several reasons, such as making some attribute of the parameter fixed, to write a wrapper around or constructor function for a parameter. For example, to fix the prefix of a FlagParameter, one might write this::

    >>> from cogent.app.parameters import FlagParameter
    >>> def DashedFlag(name):
    ...     return FlagParameter('-',name)
    ...
    >>> tree = DashedFlag('tree')
    >>> print tree
    >>> tree.on()
    >>> print tree
    -tree

.. _sec:rnafoldexample:

A complete CommandLineApplication subclass example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A subclass of CommandLineApplication might look something like this::

    class RNAfold(CommandLineApplication):
        """Application controller for RNAfold (in the Vienna RNA package)"""

        _command = 'RNAfold'
        _parameters = {
            '-p':MixedParameter(Prefix='-',Name='p',Delimiter='',Value=False),
            '-C':FlagParameter(Prefix='-',Name='C'),
            '-T':ValuedParameter(Prefix='-',Name='T',Value=37,Delimiter=' '),
            '-4':FlagParameter(Prefix='-',Name=4),
            '-d':MixedParameter(Prefix='-',Name='d',Delimiter='',Value=1),
            '-noLP':FlagParameter(Prefix='-',Name='noLP'),
            '-noGU':FlagParameter(Prefix='-',Name='noGU'),
            '-noCloseGU':FlagParameter(Prefix='-',Name='noCloseGU'),
            '-e':ValuedParameter(Prefix='-',Name='e',Delimiter=' '),
            '-P':ValuedParameter(Prefix='-',Name='P',Delimiter=' '),
            '-nsp':ValuedParameter(Prefix='-',Name='nsp',Delimiter=' '),
            '-S':ValuedParameter(Prefix='-',Name='S',Value=1.07,Delimiter=' ')}
        _synonyms = {'Temperature':'-T','Temp':'-T','Scale':'-S'}
        _input_handler = '_input_as_lines'
        _suppress_stderr = True

If the built-in input handlers are sufficient, and no output to disk is written by the program, this would complete the application controller.

.. _sec:step2:

Step 2: Input handling
----------------------

Not all applications handle their input in the same way.
The input might be specified as a filename on the command line, as a list of values on the command line, or an input file might be specified through parameters. Some input data might also require processing before it is used by the application. To give the user control over how input is handled without having to overwrite __call__(), small input handling methods can be specified in the application controller. In most cases, the CommandLineApplication input handlers can probably be used (e.g., passing data via stdin or a temp file), but for more complicated input formats, custom input handlers may need to be written for a CommandLineApplication subclass. Every input handling method should take one parameter, data, and return a string that will be appended to the command, e.g. ``/path/to/input/file.txt``, if a path is passed to the application. (In this example, you would want to use CommandLineApplication._input_as_path as the input handler.) By writing multiple input handling methods, multiple types of input can be handled by one application. The user can specify which one they want to use in a certain instance by setting the _input_handler class variable, or the InputHandler initialization variable. For example, RNAfold takes a list of sequences from stdin. In this case, none of the built-in input handlers provides this functionality, so the RNAfold controller overwrites _input_as_lines so that the sequences (data) are written to a temporary file which is then redirected to stdin.

.. _sec:step3:

Step 3: Output handling
-----------------------

If the application writes output files to disk, _get_result_paths must be implemented so that these files are accessible through the CommandLineAppResult object. RNAfold, for example, writes a secondary structure file for each sequence and, when -p is on, a dot plot file as well; its implementation constructs a ResultPath for each file that may be written::

    def _get_result_paths(self,data):
        """Specifies the paths of output files generated by RNAfold.

        The ss and dp files for named sequences will be written to
        name_ss.ps and name_dp.ps.
        """
        result = {}
        name_counter = 0
        seq_counter = 0
        if not isinstance(data,list): #means data is file
            data = open(data).readlines()
        for item in data:
            if item.startswith('>'):
                name_counter += 1
                name = item.strip('>\n')
                result[(name+'_ss')] =\
                    ResultPath(Path=(self.WorkingDir+name+'_ss.ps'))
                result[(name+'_dp')] =\
                    ResultPath(Path=(self.WorkingDir+name+'_dp.ps'),\
                    IsWritten=self.Parameters['-p'].isOn())
            else:
                seq_counter += 1
        result['SS'] = ResultPath(Path=self.WorkingDir+'rna.ps',\
            IsWritten=seq_counter - name_counter > 0) #Secondary Structure
        result['DP'] = ResultPath(Path=self.WorkingDir+'dot.ps',
            IsWritten=(self.Parameters['-p'].isOn() and\
            seq_counter - name_counter > 0)) #DotPlot
        return result

Tips and tricks for creating application controllers
----------------------------------------------------

.. _sec:haltexec:

HALT_EXEC is your friend
^^^^^^^^^^^^^^^^^^^^^^^^

The __init__ method takes a boolean parameter, HALT_EXEC, which is False by default. Setting HALT_EXEC=True will cause __call__ to exit before the system call, print out the complete command that was about to be run, and leave all temporary files in place. This is extremely useful for debugging, because it allows you to run the application directly with the input that was generated by the application controller. You can therefore run the command and look directly at stdout and stderr, debug any temporary files that were created, etc. If the application you are controlling is slow, this can also allow you to debug earlier steps without having to wait for the application to run. HALT_EXEC is your friend.

.. % ============================================================================
.. % \newpage

Application controller base classes: Developer's reference
==========================================================

Command line applications
-------------------------

.. _sec:application:

Application: cogent.app.util.py
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Application is an abstract class that contains some data used by all application handlers that could be written. Private class data of Application consists of:

_command: The command used to run the application (a string). If the command is in your path (in a directory listed in the environment variable ``$PATH``, or found by typing ``which`` followed by the command) you can provide only the command, e.g. ``RNAfold``. If the command is not in your path, you must specify the absolute path to it, e.g. ``/some/other/bin/rnaview``. The use of absolute paths here is *not* recommended, because the location of the installation might be different on every machine. Instead, consider setting your ``$PATH`` environment variable to include the directory where the application is installed. For example, if you are writing an application controller for ``ls`` where you might run::

    ls -al *.jpg

_command should be set to ``ls``.

_parameters: A dictionary of Parameter objects. Keys should be the identifiers of the parameters, and values should be the Parameter objects. This dictionary defines which parameters are available to the application. No values are specified, except for occasional default values. The default value for _parameters is the empty dictionary. If the application takes any command line parameters, this must be overwritten. This is almost always the case. See cogent.app.clustalw.Clustalw._parameters for an example of when _parameters is overwritten. ::

    _parameters = {'-T':ValuedParameter('-','T',Delimiter='=')}

_synonyms: A dictionary of parameter synonyms. Keys should be the alternative keys to look up a parameter, and values should be the identifiers used in the _parameters dictionary. It is probably a good idea to comment on the available synonyms in the docstring of the application controller, so users that haven't read the manual know what they can use to control the parameters.
The default value for _synonyms is the empty dictionary. See cogent.app.vienna_package.ViennaPackage._synonyms for an example of this being overwritten. ::

    _synonyms = {'Temperature':'-T', 'Temp':'-T'}

_command_delimiter: String that specifies the delimiter between the components of a full command, e.g. the command, parameters, and arguments. The default value is ' ' (a single space). This delimiter will work for any Unix application, so it is usually not overwritten. (We are interested in hearing about any circumstances where this might be overwritten. Please let us know if you come across any. One example might be if the command being constructed is a URL.) In the above 'ls' example, a single space (' ') separates the command components: the base command ``ls``, the parameters ``-al``, and the argument ``*.jpg``.

The only method that is defined by Application is __init__, which takes one optional argument, params. The value of params should be a dictionary of parameters that should be turned on. Keys should be either the Parameter ID or a synonym. The values in params should be the values to turn the parameters on with for Valued or Mixed Parameters, or None for Flag or Mixed Parameters. Application is never directly instantiated, but is instead inherited (either directly or indirectly) by all application controllers. It is necessary that Application.__init__() be called somewhere during the initialization of your class, but if you are inheriting from a higher level class (such as CommandLineApplication) this should already be handled.

.. _sec:commandlineapplication:

CommandLineApplication: cogent.app.util.py
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CommandLineApplication is an abstract class for command line application controllers. Several variables are class data to facilitate subclassing CommandLineApplication and to allow definition of defaults for Application Controller subclasses. This class was designed to be easily and minimally subclassed.
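Before turning to CommandLineApplication's own class data, the Application-level machinery just described, synonym lookup and joining command components with _command_delimiter, can be sketched in a few lines. The helper names resolve and build_command are hypothetical; the real logic lives in the Parameters class (cogent.app.parameters) and _get_base_command (cogent.app.util):

```python
def resolve(synonyms, parameters, key):
    """Look a parameter up by its ID or by a synonym, as Parameters does."""
    return parameters[synonyms.get(key, key)]

def build_command(command, parameters, arguments=(), delimiter=' '):
    """Join the base command, all printed (non-empty) parameters, and any
    arguments with the command delimiter, as _get_base_command does."""
    printed = [s for s in map(str, parameters.values()) if s]
    return delimiter.join([command] + printed + list(arguments))

# Parameter values are pre-printed strings here, for brevity.
parameters = {'-T': '-T=37'}
synonyms = {'Temperature': '-T'}
print(resolve(synonyms, parameters, 'Temperature'))    # -T=37
print(build_command('ls', {'-al': '-al'}, ['*.jpg']))  # ls -al *.jpg
```

Note how an off parameter, which prints as the empty string, simply drops out of the assembled command.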
CommandLineApplication inherits from Application. CommandLineApplication contains the following additional class data:

_input_handler: The name of the input handler method that should be used by default. The value should be a string (see CommandLineApplication.__call__ for how it's used). The input handling methods are private, so they should start with an underscore. The default value for _input_handler is '_input_as_string'. The input handler can be changed at the instance level via the InputHandler initialization parameter.

_working_dir: Specifies the default working directory (string). The working_dir is where many applications write out their output. Setting this value gives you control over where output is written. The value of _working_dir should be an *absolute* path. If the value of _working_dir is None (the default) the current working directory will be used. The working directory can be changed at the instance level via the WorkingDir initialization parameter.

_suppress_stdout: Boolean value that specifies what happens with standard output (stdout) by default. If the value is False (default), stdout is caught and accessible in the result object. If the value is True, stdout is routed to /dev/null and won't be accessible. Suppression of stdout can also be controlled at the instance level via the SuppressStdout initialization parameter.

_suppress_stderr: Boolean value that specifies what happens with standard error (stderr) by default. Some programs write a lot to stderr which you might want to ignore. If the value is False (default), stderr is caught and accessible in the result object. If the value is True, stderr is routed to /dev/null and won't be accessible. Suppression of stderr can also be controlled at the instance level via the SuppressStderr initialization parameter.

Class data can be overruled at the instance level by passing alternate data in as parameters to __init__(). These parameters are InputHandler, SuppressStderr, and WorkingDir.
Note that _working_dir and WorkingDir must always be an absolute path, although no explicit checking is done for this. You *will* get weird results in many cases if you use relative paths. WorkingDir, InputHandler, and SuppressStderr are all public attributes of CommandLineApplication, and can be modified at any time. You should (obviously) not modify the private versions of these attributes. Note that if WorkingDir does not exist on the system it will be created, and it will not be removed after the program runs. There is an additional private variable, _input_filename. This is set to the string containing the absolute path to an input file when the input file is a Python generated temporary file. This should not be accessed from outside of the program, but may be useful at times when subclassing.

CommandLineApplication defines several methods. These include::

    __init__(), __call__(), _input_as_string(), _input_as_multiline_string(),
    _input_as_lines(), _input_as_path(), _input_as_paths(), _absolute(),
    _get_base_command(), _get_WorkingDir(), _set_WorkingDir(),
    _accept_exit_status(), _get_result_paths(), getTmpFilename()

We will go over these in differing depths, because in most cases these are background methods that should never be called directly or overwritten.

__init__(): Initializes the object, taking as parameters params (see Application), InputHandler, WorkingDir, and SuppressStderr (discussed above). This method *must* be called by subclasses in their __init__() if they have one. For most purposes, you will never need to overwrite this method.

__call__(): This is the method that does most of the work in the CommandLineApplication. Most of a user's interaction with CommandLineApplications will be through this method, which takes data as a parameter. data is the data that should be passed as input to the application when it is called; the default is None. Note that before data is appended to the command, the InputHandler function is called on it.
If data=None, no data is passed into the function, and the input handler will not be called. You should avoid overwriting __call__() at all costs, as a lot is going on here.

_input_as_string(): The default input handler. This acts on one parameter, data, that is passed in. It type casts data to a string, and returns the string.

_input_as_lines(): An alternate input handler. In this case, data is a sequence of lines to be written to a temporary file. This allows you to interact with programs which only take files as input, when you have created a data file on the fly. The return value of this function is a string representing the absolute path to the file, which will be created within self.WorkingDir.

_input_as_multiline_string(): Input handler, similar to _input_as_lines, except data is a single string which should be written to a temporary file. The temporary file's path is then passed to the application as input.

_input_as_path(): Another alternate input handler. This is similar to _input_as_string, but casts the input to a FilePath object rather than a string. If the input is a path, this input handler should be used.

_input_as_paths(): Yet another alternate input handler. This is similar to _input_as_path, but operates on a list of paths.

_absolute(): Converts a filename to an absolute path if it is not already. The path that is prepended is self.WorkingDir. The result is a FilePath object.

_get_base_command(): Appends the necessary parameters to self._command and returns the full command as a string (without input and output).

_get_WorkingDir() and _set_WorkingDir(): accessor methods for the WorkingDir attribute.

_accept_exit_status(): This function takes a string containing the return value of the application that was run. It is meant to be overwritten when necessary. Its purpose is to analyze the exit_status of the application being run to determine if an ApplicationError should be raised.
By default, no ApplicationError is raised regardless of the exit_status. This is handy because in a subclass you can customize which exit statuses are acceptable to you and which are not, or you can leave the function undefined in your subclass and accept all exit statuses.

_get_result_paths(): This method is used to initialize the CommandLineAppResult class (see Section :ref:`sec:commandlineappresult`). This method should be overwritten if the application creates output other than stdout and stderr. A dict should be returned with ResultPath objects, keyed by the names that you'd like to use to access their data in the CommandLineAppResult object. When building the ResultPath objects, you will need to construct the names of all of the files that are being created. For this reason, in the case of dynamic filenames, you will need access to all of the data that the application has access to. In order to construct these file names you have access to the Parameters object and to data (which is passed in to the function), for the case where, for example, the output filename is specified as input to the program. The name of the input file, when generated as a temporary file, is available as self._input_filename, for cases where the output file name is based on the name of the input file. This, in addition to system calls if necessary, should provide all of the information needed to build the names and paths of output files.

getTmpFilename(): Generates a random filename using ``TmpNameLen`` random alphanumeric (upper and lowercase) characters. The result will be an absolute path (presuming that ``TmpDir`` is absolute, which it should be), and the filename will begin with ``prefix``, end with ``suffix``, and be in ``tmp_dir`` or ``TmpDir``. The ``tmp_dir`` parameter overrides the class/object-level default. Note that this function does not actually create the file, just the filename. The result is a ``FilePath`` object.
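What a getTmpFilename-style helper does can be sketched as follows. This is an approximation for illustration (the function name tmp_filename and its defaults are assumptions, and the real method returns a FilePath, not a plain string):

```python
import os
import random
import string

def tmp_filename(tmp_dir='/tmp', prefix='tmp', suffix='.txt', name_len=20):
    """Build (but do not create) a random temporary file path:
    name_len random alphanumeric characters between prefix and suffix,
    rooted in an absolute temp directory."""
    chars = string.ascii_letters + string.digits  # upper, lower, and digits
    name = ''.join(random.choice(chars) for _ in range(name_len))
    return os.path.join(tmp_dir, prefix + name + suffix)

path = tmp_filename()
print(path)  # e.g. /tmp/tmpQ3k9... with 20 random characters in the middle
```

Because only the name is generated, the caller (or the application being run) is responsible for actually creating and later removing the file.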
Two module level functions are also implemented::

    get_tmp_filename, guess_input_handler

get_tmp_filename: A module level implementation of ``CommandLineApplication.getTmpFilename()``.

guess_input_handler: This is a module-level function intended to pick the right input handler in case the input is a set of sequences. It will return one of four input handlers: _input_as_multiline_string, _input_as_path, _input_as_seqs, or _input_as_lines.

Parameters
----------

Parameter: cogent.app.parameters.py
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The class Parameter is an abstract class. Every Parameter object has six attributes: Prefix, Name, Value, Delimiter, Quote, and IsPath. All attributes may have any value, as long as it can be type cast into a string. The Prefix of a parameter specifies the character that precedes the name of the parameter. It is mandatory to specify a prefix for a parameter, although it may be the empty string. For example: '-' is the prefix in '-T=37', and '\*' is the prefix in '\*d'. Note that some characters may have to be escaped (e.g. a backslash). Name is the second mandatory attribute of Parameter. The combination of the prefix and name of a parameter should form a unique combination that identifies the parameter. This ID is a public property of Parameter and will function later on as the key in the dictionary of parameters. The attribute Value specifies the value of a parameter. Not all parameters, such as flags, require a value; therefore this field is optional in the __init__ method. For example, the value in '-T=37' is 37, and the value in '-d1' is 1. The Delimiter specifies what separates the name from the value when a parameter is printed. For example: '=' in 'T=37' or ' ' (single space) in '#r 14'. The Quote is an optional attribute that determines which characters will surround the value when the parameter is printed. Be careful to escape quotes, since most quote values have a special meaning in Python.
At the moment only symmetrical quotes are supported, such as " ' " (single quote) in " -p='a' ". Asymmetrical quotes, e.g. 'd=[4]', are not possible. *Is this something that should get supported?* IsPath should be set to True if the Value of the Parameter object is intended to be a path to a directory or file. Paths require special handling when printing, and Value is therefore cast to a cogent.app.parameters.FilePath object. IsPath is only used by ValuedParameter and MixedParameter objects, and has no effect on FlagParameters. Every type of parameter prints itself differently. A flag will only print a combination of its prefix and name; another parameter may include everything. Therefore, __str__ has to be specified in each specific subclass of Parameter. Whether a parameter is printed is determined by its value. This is also subclass specific and will be explained in the following sections.

FlagParameter: cogent.app.parameters.py
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

FlagParameter inherits from Parameter. A flag can't have a value; it is just on or off. For example: if '-tree' is set, a dendrogram is calculated; if '-tree' is not set, the tree is not calculated. Since a flag can never have a value, we can easily use the value to specify whether the flag will be printed or not. If Value=True, the parameter will print itself; if Value=False, it won't. A FlagParameter can be initialized with three things: Prefix, Name (mandatory), and Value (optional). The default for Value is False, to indicate that the parameter is off (i.e. not printed) by default. The only thing that counts for a flag is whether its value evaluates to True or to False. If a FlagParameter has to print itself, it first checks whether it is on or off (Value=True or Value=False). If it is off, it will return the empty string. If it is on, it will return the combination of its prefix and name. The methods isOn() and isOff() will return True or False depending on the Value of the FlagParameter.
These methods can be used to see whether the parameter will be printed on the command line or not. With the methods on() and off() the parameter can be turned on or off. These methods don't take a value, because a flag can't have a value. Internally, they'll set parameter.Value to True or False. .. % Example can be removed (b/c is in section 2?)? :: >>> from cogent.app.parameters import FlagParameter >>> tree = FlagParameter(Prefix='-',Name='tree') >>> tree.isOn() False >>> print tree >>> tree.on() >>> print tree -tree ValuedParameter: cogent.app.parameters.py ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ValuedParameter also inherits from Parameter. In addition to setting all attributes of the parameter during initialization, a default value is set. This is a private property of a ValuedParameter and will be set to the value with which parameter.Value is initialized. The default value is available for inspection through parameter.Default and should not be changed by the user. With the method reset() the Value of the parameter will be reset to the default value. As in FlagParameter, the value is used to control whether the parameter will print itself or not. If the Value is None, the parameter is off and __str__ will return the empty string. If the Value is anything else, the parameter will be printed in full glory: prefix, name, value and optionally delimiter and quotes. If IsPath is True, the value will be wrapped in double quotes when printed, which allows for spaces in paths. The methods isOn() and isOff() can be used to check whether the parameter will be printed or not. If parameter.Value is not None, the parameter is on and will be printed. If parameter.Value is None, the parameter is off and won't be printed. By using the method on(value) the Value of the parameter is set to the specified value. If you accidentally try to turn the parameter on with the value None, an error will be raised.
Calling off() will set the Value of the parameter to None. .. % Example can be removed (b/c is in section 2?)? :: >>> from cogent.app.parameters import ValuedParameter >>> temp = ValuedParameter(Prefix='-',Name='T',Delimiter="=") >>> temp.isOn() False >>> print temp >>> temp.on(37) >>> print temp -T=37 >>> temp_def = ValuedParameter(Prefix='-',Name='T',Value=100,Delimiter="=") >>> temp_def.Default 100 >>> print temp_def -T=100 >>> temp_def.on(15) >>> print temp_def -T=15 >>> temp_def.reset() >>> print temp_def -T=100 MixedParameter: cogent.app.parameters.py ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ MixedParameter is a subclass of ValuedParameter, because they share many features. A MixedParameter is a parameter that has an optional value; sometimes it behaves like a FlagParameter, sometimes like a ValuedParameter. An example is: '-d[0\ `\mid`\ 1\ `\mid`\ 2]'. During initialization the Default value is set as in ValuedParameter, and the method reset() is available to reset the parameter value to the default. This type of Parameter has the most complicated control over 'on' or 'off'. If the Value is False, the parameter is off. If the Value is None, the parameter is on but behaves like a flag (only prefix and name will be printed); if the Value is anything else, the parameter is on and behaves like a ValuedParameter. The methods isOn() and isOff() have the same functionality as in the other parameter types. When using on(val=None) it is optional to specify the value. If a MixedParameter is turned on without a value, it will behave like a flag. When turned on with a value, it will behave like a ValuedParameter. The method off() sets the Value to False, which indicates that the parameter should not be printed. .. % Example can be removed (b/c is in section 2?)? :: >>> from cogent.app.parameters import MixedParameter >>> d = MixedParameter(Prefix='-',Name='d',Delimiter='') >>> d.isOff() True >>> d.on() >>> print d -d >>> d.on(2) >>> print d -d2 FilePath: cogent.app.
parameters.py ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The FilePath object inherits from string, and should be used to wrap all strings that represent paths. Examples include:: my_file.txt /path/to/my/file.txt /path/to/my/dir/ Wrapping paths in a FilePath object wraps the path in quotes when it should be, for example when passed to a system call, and doesn't wrap it in quotes when it shouldn't be, for example when performing operations on strings. The following example illustrates how this fails with a simple string, but performs as it should with a FilePath. In this example, p1 and p2 are simple strings, and p3, p4, p5, and p6 are FilePath objects. Since the example path contains spaces, a system call would not generate the desired result if the path is not wrapped in quotes. The FilePath object will wrap it in quotes when it is cast to a string, but will not wrap it in quotes when performing other string operations. The string object, on the other hand, does not differentiate, and joining p1 and p2 results in quotes placed in the middle of the string. :: >>> p1 = '"/path to/"' >>> p2 = '"my_file.txt"' >>> print str(p1 + p2) "/path to/""my_file.txt" >>> from cogent.app.parameters import FilePath >>> p3 = FilePath("/path to/") >>> p4 = FilePath("my_file.txt") >>> print str(p3+p4) "/path to/my_file.txt" >>> p5 = FilePath("/path to/") >>> p6 = FilePath("my_file.txt") >>> print str(p5+p6) "/path to/my_file.txt" The FilePath object is used by MixedParameter and ValuedParameter when their IsPath attribute is set to True. This causes the Value attribute to be cast to a FilePath object, and it is wrapped in quotes when used in a system call. The _input_as_path input handler also casts the input to a FilePath object. In general, if you are working with a string in an application controller that represents a path to a file or directory, for example in a custom input handler, that string should be cast to a FilePath object.
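The behaviour described above can be approximated with a minimal str subclass. The sketch below (``MiniFilePath`` is an invented name) is not cogent's actual FilePath implementation; it only illustrates the idea that quoting happens when the path is rendered with ``str()``, while ordinary string operations work on the raw, unquoted value:

```python
# Illustrative sketch only: quoting is applied when the path is rendered
# with str(), but ordinary string operations use the raw, unquoted value.
class MiniFilePath(str):
    def __str__(self):
        # self[:] returns a plain str, avoiding recursion back into __str__
        return '"' + self[:] + '"'

    def __add__(self, other):
        # concatenation joins raw values and stays a MiniFilePath
        return MiniFilePath(self[:] + other[:])

p = MiniFilePath('/path to/') + MiniFilePath('my_file.txt')
assert p == '/path to/my_file.txt'         # raw value, no embedded quotes
assert str(p) == '"/path to/my_file.txt"'  # quoted form for a system call
```

The real FilePath class covers more operations than this sketch; in application controller code, always use cogent.app.parameters.FilePath itself.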
Failure to do this will result in errors if users pass a path that contains spaces. .. _sec:parameters: Parameters: cogent.app.parameters.py ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For most applications multiple parameters can be set. An application controller should have a set of known parameters with optional default values. All parameters of an application are organized in a dictionary where keys are parameter identifiers (the combination of prefix and name) and values are Parameter objects. Sometimes it might be hard to remember what the identifiers of the parameters in a specific application are. Let's look at an example. Suppose the user knows that the temperature can be set in three applications: in application *A* with '-T', in application *B* with '\*temp', and in application *C* with '--t'. It is very likely that he/she doesn't remember which parameter is used in which application. If every application controller has a synonyms dictionary that maps 'temp' to the identifier of the application-specific parameter, the user can always look up the temperature with parameters['temp']. To support the lookup of parameters by synonyms, the class Parameters is not a simple dictionary, but a MappedDict (in cogent.util.misc). A MappedDict is a dictionary that can apply some function, called a mask, to a lookup value before it looks it up in the dictionary. In the Parameters object the mask allows users to look up parameters in the dictionary by synonyms. The Parameters object uses the private function _find_synonym() internally to determine by what key the parameter will be looked up. If the key given by the user appears in the synonyms dictionary, the corresponding key for the parameters dictionary is looked up. Otherwise, it is assumed that the user used an existing key in the parameters dictionary.
:: >>> from cogent.app.parameters import FlagParameter >>> a = FlagParameter('-','a') >>> from cogent.app.parameters import ValuedParameter >>> b = ValuedParameter('-','T',Value=37,Delimiter='=') >>> from cogent.app.parameters import MixedParameter >>> c = MixedParameter('-','d',Value=0) >>> params = {'-a':a,'-T':b,'-d':c} >>> synonyms = {'temp':'-T','distance':'-d'} >>> from cogent.app.parameters import Parameters >>> p = Parameters(params,synonyms) >>> p['-a'].isOn() False >>> print p['temp'] -T=37 >>> p['distance'].on(2) >>> print p['-d'] -d2 Input ----- CommandLineApplication subclasses are called using the __call__ method with a single variable, data. This is the value passed to the application on the command line. The value of data will differ based on the application you are interfacing. Controlling for this without having to overwrite __call__ for every CommandLineApplication is the purpose of the _input_handlers discussed in Section :ref:`sec:commandlineapplication`. Some examples of data that might be passed to CommandLineApplications are strings, via the _input_as_string input handler, a list of lines that should be written to file and then passed to the application, via the _input_as_lines input handler, or a path to a file or directory, via the _input_as_path input handler. To define the input handler that should be used, the class data _input_handler should be set. If one of the default input handlers is not applicable for a new CommandLineApplication, you will need to write a custom input handler. See ``cogent.app.raxml.Raxml._input_as_seqs`` for an example of a custom input handler. For a discussion of the predefined input handlers, see Section :ref:`sec:commandlineapplication`. For a discussion on defining custom input handlers, see Section :ref:`sec:step2`. .. % I think we're better off just pointing to the relevant discussion, rather than including there here. .. % Is there a good example that we could put here? 
I hesitate to put a real example, b/c if we're going to .. % rewrite this as executable documentation, it will fail on any system that didn't have the application .. % we use as an example. Output ------ ResultPath: cogent.app.util.py ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ResultPath object is intended to hold the important information pertaining to output files created by an application, namely the path to where the file can be found, and whether the file was written or not. ResultPath is a very simple container class. It has no methods aside from __init__(). To initialize a ResultPath object you must specify the path to the output file by setting the Path parameter. This must be a string, and it is *highly* recommended that this be an absolute path, though relative paths will also work in many cases. You can also optionally specify a boolean value indicating whether the file has been written or not by setting the IsWritten parameter. This is True by default. :: >>> from cogent.app.util import ResultPath >>> rp = ResultPath(Path='/tmp/my_output.txt',IsWritten=True) >>> rp.Path '/tmp/my_output.txt' >>> rp.IsWritten True >>> rp (Note this is what NCBI now refers to as *legacy BLAST*, not BLAST+.) This document was developed in the process of writing the full ``formatdb`` application controller in PyCogent. You can find that file in your PyCogent directory at: ``cogent/app/formatdb.py``. After you work through this example, you should refer to that file to see what the full application controller and convenience wrappers look like. A more complete reference on `PyCogent Application Controllers can be found here <./application_controller_framework.html>`_. Building a formatdb application controller ========================================== Step 0. Decide which application you want to support, and which version. ------------------------------------------------------------------------ Decide what version of the program you want to support and install the application.
Check out the features, and decide what functionality you want to support:: formatdb --help For the sake of brevity in this example, we'll support only the basic functionality: creating nucleic acid or protein blast databases in the default format, and with default names (i.e., named based on the names of the input file). So, we'll support the ``-i``, ``-o``, and ``-p`` parameters. Step 1. Define the class and the class variables. ------------------------------------------------- First, create a new file called ``minimal_formatdb.py``. Open this in a text editor, and add the following header lines:: #!/usr/bin/python from cogent.app.util import CommandLineApplication, ResultPath from cogent.app.parameters import ValuedParameter This imports some classes that we'll be using from PyCogent. For this to function correctly, ``cogent`` must be in your ``$PYTHONPATH``. Next define your class, ``MinimalFormatDb``:: class MinimalFormatDb(CommandLineApplication): """ Example APPC for the formatdb program in the blast package """ Note that you're defining this class to inherit from ``CommandLineApplication``. This is where most of the heavy lifting is done. 
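Before filling in the class variables, it may help to see, in toy form, roughly what the base class does with them when an instance is called. The sketch below uses invented names (``ToyParameter``, ``ToyApp``) and ignores working directories, stdout/stderr redirection, and result paths; it is not PyCogent's actual ``CommandLineApplication`` logic, just the general shape of it:

```python
# Toy model of how class variables become a command string when an
# application controller instance is called. Invented names; not cogent code.
class ToyParameter(object):
    def __init__(self, prefix, name, delimiter=' ', value=None):
        self.Prefix, self.Name = prefix, name
        self.Delimiter, self.Value = delimiter, value

    def on(self, value):
        self.Value = value

    def __str__(self):
        if self.Value is None:  # off: contributes nothing to the command
            return ''
        return '%s%s%s%s' % (self.Prefix, self.Name, self.Delimiter, self.Value)

class ToyApp(object):
    _command = 'formatdb'

    def __init__(self):
        self._parameters = {'-i': ToyParameter('-', 'i'),
                            '-p': ToyParameter('-', 'p', value='F')}

    def _input_as_parameter(self, data):
        self._parameters['-i'].on(data)
        return ''  # -i is already on, so nothing extra is appended

    def __call__(self, data):
        extra = self._input_as_parameter(data)
        on_params = sorted(str(p) for p in self._parameters.values() if str(p))
        return ' '.join([self._command] + on_params + ([extra] if extra else []))

assert ToyApp()('refseqs.fasta') == 'formatdb -i refseqs.fasta -p F'
```

The real base class does much more (temp files, working directory, launching the process, collecting ResultPaths), but the flow is the same: the input handler turns ``data`` into command-line input, and every parameter that is on contributes its string form to the command.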
We'll next want to define the following class variables: * ``_command``: the command called to start the application * ``_parameters``: the parameters to be passed (we'll select a few that we're going to support) * ``_input_handler``: function describing how to convert a single parameter passed to the app controller object into input for the application * ``_suppress_stdout``: (if other than False) * ``_suppress_stderr``: (if other than False) This is done by adding the following lines:: _command = "formatdb" _parameters = {\ '-i':ValuedParameter(Prefix='-',Name='i',Delimiter=' ',IsPath=True), '-o':ValuedParameter(Prefix='-',Name='o',Delimiter=' ',Value='T'), '-p':ValuedParameter(Prefix='-',Name='p',Delimiter=' ',Value='F') } _input_handler = "_input_as_parameter" You'll see here that we're only defining the ``-i``, ``-o``, and ``-p`` parameters, hence the name of this class being ``MinimalFormatDb``. An important variable to note here is ``_input_handler``. We'll come back to that next. An additional thing to note here is that I'm setting Value for ``-p`` to ``F`` instead of the default of ``T``. This is because I usually build nucleotide databases (specified by ``-p F``), not protein databases (which is the default, and specified by ``-p T``). Step 2. Overwrite methods as necessary. --------------------------------------- We'll next create a non-default input handler. The input handler takes the input that you'll eventually provide when calling an instance of ``MinimalFormatDb``, and prepares it to be passed to the actual command line application that you're calling. ``formatdb`` requires that you pass a fasta file via the ``-i`` parameter, so we'll define a new function ``_input_as_parameter``, here:: def _input_as_parameter(self,data): """ Set -i to data """ self.Parameters['-i'].on(data) return '' Input handlers return a string that gets appended to the command, but turning parameters on also causes them to be added to the command.
For that reason, this input handler returns an empty string -- otherwise ``-i`` would end up being passed twice to ``formatdb``. Finally, we'll define the ``_get_result_paths`` method, which tells the application controller what files it should expect to be written, and under what circumstances. Our ``MinimalFormatDb`` application controller writes a log file under all circumstances, and nucleotide databases when ``-p`` is set to ``F`` or protein databases when ``-p`` is set to ``T``. We return the resulting files as a dict of ResultPath objects:: def _get_result_paths(self,data): """ """ result = {} result['log'] = ResultPath(\ Path=self.WorkingDir+'formatdb.log',IsWritten=True) if self.Parameters['-p'].Value == 'F': extensions = ['nhr','nin','nsi','nsq','nsd'] else: extensions = ['phr','pin','psi','psq','psd'] for extension in extensions: result[extension] = ResultPath(\ Path=data + '.' + extension,\ IsWritten=True) return result At this stage, you've created an application controller which supports interacting with a few features of the ``formatdb`` command line application. In the next step, we'll look at how to use your new application controller. Using the new formatdb application controller ============================================= Next we'll import the new ``minimal_formatdb`` application controller, and test it out. For the following examples, you need to access some files that are in your ``cogent/doc/data`` directory. For simplicity, we'll assume that on your system this directory is ``/home/pycogent_user/PyCogent/cogent/doc/data``. You should always replace this directory with the path as it exists on your machine.
Open a python interpreter in the directory where you created your ``minimal_formatdb.py`` and enter the following commands:: >>> import minimal_formatdb >>> fdb = minimal_formatdb.MinimalFormatDb() >>> res = fdb('/home/pycogent_user/PyCogent/doc/data/refseqs.fasta') >>> res You'll see that you've created a new nucleotide BLAST database -- you can tell because you have the nucleotide database files in the result object (i.e., they begin with ``n``). Next clean up the files that were created:: >>> res.cleanUp() Next we'll change some parameter settings, and confirm the changes:: >>> fdb = minimal_formatdb.MinimalFormatDb() >>> fdb.Parameters['-p'].on('T') >>> fdb.Parameters['-p'].isOn() True >>> fdb.Parameters['-p'].Value 'T' >>> str(fdb.Parameters['-p']) '-p T' We've just set the -p parameter to T, indicating that a protein database should be built instead of a nucleotide database. Note that the database files now begin with ``p``. Run the appc and investigate the results:: >>> res = fdb('/home/pycogent_user/PyCogent/doc/data/refseqs.fasta') >>> res Next clean up the files that were created:: >>> res.cleanUp() Tips and tricks when writing application controllers ======================================================= One of the most useful features of application controller objects when building and debugging is the HALT_EXEC parameter that can be passed to the constructor. This will cause the program to halt just before executing, and print the command that would have been run.
For example:: >>> fdb = minimal_formatdb.MinimalFormatDb(HALT_EXEC=True) >>> res = fdb('/home/pycogent_user/PyCogent/doc/data/refseqs.fasta') [Traceback omitted] Halted exec with command: cd "/home/pycogent_user/"; formatdb -o T -i "/home/pycogent_user/PyCogent/doc/data/refseqs.fasta" -p F > "/tmp/tmpBpMUXE0ksEhzIZA1SSbS.txt" 2> "/tmp/tmpSKc0PRhTl47SZfkxY0g1.txt" You can then leave the interpreter and paste this command onto the command line, and see directly what happens if this command is called. It's usually useful to remove the stdout and stderr redirects (i.e., everything after and including the first ``>``). For example:: cd "/home/pycogent_user/"; formatdb -o T -i "/home/pycogent_user/PyCogent/doc/data/refseqs.fasta" .. rubric:: Footnotes .. [#attribution] This document was modified from Greg Caporaso's PyCogent lecture. You can also grab the `full lecture materials `_. PyCogent-1.5.3/doc/examples/calculate_neigbourjoining_tree.rst000644 000765 000024 00000002470 11425201333 025576 0ustar00jrideoutstaff000000 000000 Make a neighbor joining tree ============================ .. sectionauthor:: Gavin Huttley An example of how to calculate the pairwise distances for a set of sequences. .. doctest:: >>> from cogent import LoadSeqs >>> from cogent.phylo import distance, nj Import a substitution model (or create your own) .. doctest:: >>> from cogent.evolve.models import HKY85 Load the alignment. .. doctest:: >>> al = LoadSeqs("data/long_testseqs.fasta") Create a pairwise distances object calculator for the alignment, providing a substitution model instance. .. doctest:: >>> d = distance.EstimateDistances(al, submodel= HKY85()) >>> d.run() Now use this matrix to build a neighbour joining tree. .. doctest:: >>> mytree = nj.nj(d.getPairwiseDistances()) We can visualise this tree by ``print mytree.asciiArt()``, which generates the equivalent of: ..
code-block:: python /-Human /edge.0--| | \-HowlerMon | -root----| /-NineBande |-edge.1--| | \-DogFaced | \-Mouse We can save this tree to file. .. doctest:: >>> mytree.writeToFile('test_nj.tree') .. clean up .. doctest:: :hide: >>> import os >>> os.remove('test_nj.tree') PyCogent-1.5.3/doc/examples/calculate_pairwise_distances.rst000644 000765 000024 00000005612 11425201333 025250 0ustar00jrideoutstaff000000 000000 .. _calculating-pairwise-distances: Calculate pairwise distances between sequences ============================================== .. sectionauthor:: Gavin Huttley An example of how to calculate the pairwise distances for a set of sequences. .. doctest:: >>> from cogent import LoadSeqs >>> from cogent.phylo import distance Import a substitution model (or create your own) .. doctest:: >>> from cogent.evolve.models import HKY85 Load my alignment .. doctest:: >>> al = LoadSeqs("data/long_testseqs.fasta") Create a pairwise distances object with your alignment and substitution model .. doctest:: >>> d = distance.EstimateDistances(al, submodel= HKY85()) Printing ``d`` before execution shows its status. .. doctest:: >>> print d ========================================================================= Seq1 \ Seq2 Human HowlerMon Mouse NineBande DogFaced ------------------------------------------------------------------------- Human * Not Done Not Done Not Done Not Done HowlerMon Not Done * Not Done Not Done Not Done Mouse Not Done Not Done * Not Done Not Done NineBande Not Done Not Done Not Done * Not Done DogFaced Not Done Not Done Not Done Not Done * ------------------------------------------------------------------------- Which in this case is to simply indicate nothing has been done. .. 
doctest:: >>> d.run() >>> print d ===================================================================== Seq1 \ Seq2 Human HowlerMon Mouse NineBande DogFaced --------------------------------------------------------------------- Human * 0.0730 0.3363 0.1804 0.1972 HowlerMon 0.0730 * 0.3487 0.1865 0.2078 Mouse 0.3363 0.3487 * 0.3813 0.4022 NineBande 0.1804 0.1865 0.3813 * 0.2019 DogFaced 0.1972 0.2078 0.4022 0.2019 * --------------------------------------------------------------------- Note that pairwise distances can be distributed for computation across multiple CPU's. In this case, when statistics (like distances) are requested only the master CPU returns data. We'll write a phylip formatted distance matrix. .. doctest:: >>> d.writeToFile('dists_for_phylo.phylip', format="phylip") We'll also save the distances to file in Python's pickle format. .. doctest:: >>> import cPickle >>> f = open('dists_for_phylo.pickle', "w") >>> cPickle.dump(d.getPairwiseDistances(), f) >>> f.close() .. clean up .. doctest:: :hide: >>> import os >>> for file_name in 'dists_for_phylo.phylip', 'dists_for_phylo.pickle': ... os.remove(file_name) PyCogent-1.5.3/doc/examples/calculate_UPGMA_cluster.rst000644 000765 000024 00000003006 11425201333 023775 0ustar00jrideoutstaff000000 000000 Make a UPGMA cluster ==================== .. sectionauthor:: Catherine Lozupone An example of how to calculate the pairwise distances for a set of sequences. **NOTE:** UPGMA should not be used for phylogenetic reconstruction. .. doctest:: >>> from cogent import LoadSeqs >>> from cogent.phylo import distance >>> from cogent.cluster.UPGMA import upgma Import a substitution model (or create your own) .. doctest:: >>> from cogent.evolve.models import HKY85 Load the alignment. .. doctest:: >>> al = LoadSeqs("data/test.paml") Create a pairwise distances object calculator for the alignment, providing a substitution model instance. .. 
doctest:: >>> d = distance.EstimateDistances(al, submodel= HKY85()) >>> d.run() Now use this matrix to build a UPGMA cluster. .. doctest:: >>> mycluster = upgma(d.getPairwiseDistances()) >>> print mycluster.asciiArt() /-NineBande /edge.1--| | | /-HowlerMon /edge.0--| \edge.2--| | | \-Human -root----| | | \-DogFaced | \-Mouse We demonstrate saving this UPGMA cluster to a file. .. doctest:: >>> mycluster.writeToFile('test_upgma.tree') .. We don't actually want to keep that file now, so I'm importing the ``os`` module to delete it. .. doctest:: :hide: >>> import os >>> os.remove('test_upgma.tree') PyCogent-1.5.3/doc/examples/codon_models.rst000644 000765 000024 00000043435 11473355707 022037 0ustar00jrideoutstaff000000 000000 Using codon models ================== .. sectionauthor:: Gavin Huttley The basic paradigm for evolutionary modelling is: #. construct the codon substitution model #. construct the likelihood function #. modify the likelihood function (setting rules) #. provide the alignment(s) #. optimise #. get results out .. note:: In the following, a result followed by '...' just means the output has been truncated for the sake of a succinct presentation. Constructing the codon substitution model ----------------------------------------- PyCogent implements 4 basic rate matrices, described in a recently accepted manuscript: i) NF models, these are nucleotide frequency weighted rate matrices and were initially described by Muse and Gaut (1994); ii) a variant of (i) where position specific nucleotide frequencies are used; iii) TF models, these are tuple (codon in this case) frequency weighted rate matrices and were initially described by Goldman and Yang (1994); iv) CNF, these use the conditional nucleotide frequency and were developed by Yap, Lindsay, Easteal and Huttley.
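The practical difference between the TF and NF weighting schemes can be shown with a toy calculation for a single substitution, say AAA -> AAG. The frequencies below are invented for illustration, and real rate matrices also include exchangeability, omega and normalisation terms, so this is only a conceptual sketch of the weighting step:

```python
# Toy contrast of TF (Goldman-Yang style) vs NF (Muse-Gaut style) weighting
# for single-nucleotide codon changes. All frequencies are invented.
codon_freqs = {'AAA': 0.03, 'AAG': 0.06}                   # whole-codon frequencies
nuc_freqs = {'A': 0.35, 'C': 0.15, 'G': 0.20, 'T': 0.30}  # nucleotide frequencies

def change_weight(from_codon, to_codon, scheme):
    """Frequency weight for a single-nucleotide codon change."""
    diffs = [i for i in range(3) if from_codon[i] != to_codon[i]]
    assert len(diffs) == 1, 'only single-nucleotide changes are allowed'
    if scheme == 'TF':
        # TF: weight by the frequency of the whole target codon
        return codon_freqs[to_codon]
    # NF: weight by the frequency of the target nucleotide at the
    # position that changes
    return nuc_freqs[to_codon[diffs[0]]]

assert change_weight('AAA', 'AAG', 'TF') == 0.06
assert change_weight('AAA', 'AAG', 'NF') == 0.20
```

The two schemes can therefore rank the same substitution quite differently: TF ties the rate to how common the target codon is overall, while NF only considers the nucleotide being substituted in.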
These different models can be created using the provided convenience functions, as will be the case here, or specified by directly calling the ``Codon`` substitution model class and setting the argument ``mprob_model`` equal to: - NF, ``mprob_model='monomer'`` - NF with position specific nucleotide frequencies, ``mprob_model='monomers'`` - TF, ``mprob_model=None`` - CNF, ``mprob_model='conditional'`` .. warning:: The TF form is currently the default, but this will be changed in the near future. In the following I will construct GTR variants of i and iv and an HKY variant of iii. We import these explicitly from the ``cogent.evolve.models`` module. .. doctest:: >>> from cogent.evolve.models import CNFGTR, MG94GTR, GY94 These are functions; calling them returns the indicated substitution model with the default behaviour of recoding gap characters into N's. .. doctest:: >>> tf = GY94() >>> nf = MG94GTR() >>> cnf = CNFGTR() In the following demonstration I will use only the CNF form (``cnf``). For our example we load a sample alignment and tree as per usual. To reduce the computational overhead for this example we will limit the number of sampled taxa. .. doctest:: >>> from cogent import LoadSeqs, LoadTree, DNA >>> aln = LoadSeqs('data/primate_brca1.fasta', moltype=DNA) >>> tree = LoadTree('data/primate_brca1.tree') Standard test of neutrality --------------------------- We construct a likelihood function and constrain the omega parameter (the ratio of nonsynonymous to synonymous substitutions) to equal 1. We also set some display formatting parameters. .. doctest:: >>> lf = cnf.makeLikelihoodFunction(tree, digits=2, space=3) >>> lf.setParamRule('omega', is_constant=True, value=1.0) We then provide an alignment and optimise the model. In the current case we just use the local optimiser (hiding progress to keep this document succinct). We then print the results. .. note:: I'm going to specify a set of conditions that will be used for all optimiser steps.
For those new to python, one can construct a dictionary with the following form ``{'argument_name': argument_value}``, or alternatively ``dict(argument_name=argument_value)``. I'm doing the latter. This dictionary is then passed to functions/methods by prefacing it with ``**``. .. doctest:: >>> optimiser_args = dict(local=True, max_restarts=5, tolerance=1e-8) >>> lf.setAlignment(aln) >>> lf.optimise(**optimiser_args) >>> print lf Likelihood Function Table ======================================== A/C A/G A/T C/G C/T omega ---------------------------------------- 1.10 4.07 0.84 1.95 4.58 1.00 ---------------------------------------- ============================ edge parent length ---------------------------- Galago root 0.53 HowlerMon root 0.14 Rhesus edge.3 0.07 Orangutan edge.2 0.02 Gorilla edge.1 0.01 Human edge.0 0.02 Chimpanzee edge.0 0.01 edge.0 edge.1 0.00 edge.1 edge.2 0.01 edge.2 edge.3 0.04 edge.3 root 0.02 ---------------------------- ============== motif mprobs -------------- CTT 0.01 ACC 0.00... In the above output, the first table shows the maximum likelihood estimates (MLEs) for the substitution model parameters that are 'global' in scope. For instance, the ``C/T=4.58`` MLE indicates that the relative rate of substitutions between C and T is nearly 5 times the background substitution rate. The above function has been fit using the default counting procedure for estimating the motif frequencies, i.e. codon frequencies are estimated as the average of the observed codon frequencies. If you wanted to numerically optimise the motif probabilities, then modify the likelihood function creation line to .. code-block:: python lf = cnf.makeLikelihoodFunction(tree,optimise_motif_probs=True) We can then free up the omega parameter, but before we do that we'll store the log-likelihood and number of free parameters for the current model form for reuse later. .. 
doctest:: >>> neutral_lnL = lf.getLogLikelihood() >>> neutral_nfp = lf.getNumFreeParams() >>> lf.setParamRule('omega', is_constant=False) >>> lf.optimise(**optimiser_args) >>> print lf Likelihood Function Table ======================================== A/C A/G A/T C/G C/T omega ---------------------------------------- 1.08 3.86 0.78 1.96 4.08 0.75 ---------------------------------------- ============================ edge parent length ---------------------------- Galago root 0.53 HowlerMon root 0.14... >>> non_neutral_lnL = lf.getLogLikelihood() >>> non_neutral_nfp = lf.getNumFreeParams() We then conduct a likelihood ratio test whether the MLE of omega significantly improves the fit over the constraint it equals 1. We import the convenience function from the cogent stats module. >>> from cogent.maths.stats import chisqprob >>> LR = 2*(non_neutral_lnL-neutral_lnL) >>> df = non_neutral_nfp - neutral_nfp >>> print chisqprob(LR, df) 0.0026... Not surprisingly, this is significant. We then ask whether the Human and Chimpanzee edges have a value of omega that is significantly different from the rest of the tree. .. doctest:: >>> lf.setParamRule('omega', tip_names=['Chimpanzee', 'Human'], ... outgroup_name='Galago', is_clade=True) >>> lf.optimise(**optimiser_args) >>> print lf Likelihood Function Table ================================ A/C A/G A/T C/G C/T -------------------------------- 1.08 3.86 0.78 1.96 4.07 -------------------------------- ==================================== edge parent length omega ------------------------------------ Galago root 0.53 0.73 HowlerMon root 0.14 0.73 Rhesus edge.3 0.07 0.73 Orangutan edge.2 0.02 0.73 Gorilla edge.1 0.01 0.73 Human edge.0 0.02 2.39 Chimpanzee edge.0 0.01 2.39 edge.0 edge.1 0.00 0.73... >>> chimp_human_clade_lnL = lf.getLogLikelihood() >>> chimp_human_clade_nfp = lf.getNumFreeParams() >>> LR = 2*(chimp_human_clade_lnL-non_neutral_lnL) >>> df = chimp_human_clade_nfp-non_neutral_nfp >>> print chisqprob(LR, df) 0.028... 
This is basically a replication of the original Huttley et al (2000) result for *BRCA1*. Rate-heterogeneity model variants --------------------------------- It is also possible to specify rate-heterogeneity variants of these models. In the first instance we'll create a likelihood function where these rate-classes are global across the entire tree. Because fitting these models can be time consuming I'm going to recreate the non-neutral likelihood function from above first, fit it, and then construct the rate-heterogeneity likelihood function. By doing this I can ensure that the richer model starts with parameter values that produce a log-likelihood the same as the null model, ensuring the subsequent optimisation step improves the likelihood over the null. .. doctest:: >>> lf = cnf.makeLikelihoodFunction(tree, digits=2, space=3) >>> lf.setAlignment(aln) >>> lf.optimise(**optimiser_args) >>> non_neutral_lnL = lf.getLogLikelihood() >>> non_neutral_nfp = lf.getNumFreeParams() Now, we have a null model which we know (from having fit it above) has a MLE < 1. We will construct a rate-heterogeneity model with just 2 rate-classes (neutral and adaptive) that are separated by the boundary of omega=1. These rate-classes are specified as discrete bins in PyCogent and the model configuration steps for a bin or bins are done using the ``setParamRule`` method. To ensure the alternate model starts with a likelihood at least as good as the previous we need to make the probability of the neutral site-class bin ~= 1 (these are referenced by the ``bprobs`` parameter type) and assign the null model omega MLE to this class. To get all the parameter MLEs (branch lengths, GTR terms, etc ..) into the alternate model we get an annotated tree from the null model which will have these values associated with it. .. doctest:: >>> annot_tree = lf.getAnnotatedTree() >>> omega_mle = lf.getParamValue('omega') We can then construct a new likelihood function, specifying the rate-class properties. .. 
doctest:: >>> rate_lf = cnf.makeLikelihoodFunction(annot_tree, ... bins = ['neutral', 'adaptive'], digits=2, space=3) We define a very small value (``epsilon``) that is used to specify the starting values. .. doctest:: >>> epsilon=1e-6 We now provide starting parameter values for ``omega`` for the two bins, setting the boundary .. doctest:: >>> rate_lf.setParamRule('omega', bin='neutral', upper=1, init=omega_mle) >>> rate_lf.setParamRule('omega', bin='adaptive', lower=1+epsilon, ... upper=100, init=1+2*epsilon) and provide the starting values for the bin probabilities (``bprobs``). .. doctest:: >>> rate_lf.setParamRule('bprobs', init=[1-epsilon, epsilon]) The above statement essentially assigns a probability of nearly 1 to the 'neutral' bin. We now set the alignment and fit the model. .. doctest:: >>> rate_lf.setAlignment(aln) >>> rate_lf.optimise(**optimiser_args) >>> rate_lnL = rate_lf.getLogLikelihood() >>> rate_nfp = rate_lf.getNumFreeParams() >>> LR = 2*(rate_lnL-non_neutral_lnL) >>> df = rate_nfp-non_neutral_nfp >>> print rate_lf Likelihood Function Table ============================ edge parent length ---------------------------- Galago root 0.56 HowlerMon root 0.14 Rhesus edge.3 0.07 Orangutan edge.2 0.02 Gorilla edge.1 0.01 Human edge.0 0.02 Chimpanzee edge.0 0.01 edge.0 edge.1 0.00 edge.1 edge.2 0.01 edge.2 edge.3 0.03 edge.3 root 0.02 ---------------------------- ========================= bin bprobs omega ------------------------- neutral 0.14 0.01 adaptive 0.86 1.17 ------------------------- ================================ A/C A/G A/T C/G C/T -------------------------------- 1.07 3.96 0.78 1.96 4.20 -------------------------------- ============== motif mprobs -------------- CTT 0.01... >>> print chisqprob(LR, df) 0.000... We can get the posterior probabilities of site-classifications out of this model as .. doctest:: >>> pp = rate_lf.getBinProbs() This is a ``DictArray`` class which stores the probabilities as a ``numpy.array``. 
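The posterior bin probabilities returned by ``getBinProbs`` are just Bayes' rule applied column by column: the fitted bin probability for each rate class is multiplied by that class's likelihood for the site, then normalised. A stand-alone sketch with hypothetical per-bin site likelihoods (in the real model these come from the fitted likelihood function):

```python
def site_posteriors(bprobs, site_likelihoods):
    # P(bin b | site) = p_b * L(site | b) / sum_k p_k * L(site | k)
    joint = [p * L for p, L in zip(bprobs, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Bin probabilities as fitted above; made-up likelihoods for one site
posteriors = site_posteriors([0.14, 0.86], [2e-4, 5e-5])
print([round(p, 3) for p in posteriors])  # [0.394, 0.606]
```

A site is assigned to the 'adaptive' class when its posterior for that bin dominates, even though the prior ``bprobs`` already favour it here.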
Mixing branch and site-heterogeneity
------------------------------------

The following implements a modification of the approach of Zhang, Nielsen and Yang (Mol Biol Evol, 22:2472–9, 2005). For this model class, there are groups of branches for which all positions are evolving neutrally, but some proportion of those neutrally evolving sites change to adaptively evolving on so-called foreground edges. For the current example, we'll define the Chimpanzee and Human branches as foreground and everything else as background. The following table defines the parameter scopes.

+------------+------------+-------------------+-------------------+
| Site class | Proportion | Background edges  | Foreground edges  |
+============+============+===================+===================+
| 0          | p_0        | 0 < omega_0 < 1   | 0 < omega_0 < 1   |
+------------+------------+-------------------+-------------------+
| 1          | p_1        | omega_1=1         | omega_1=1         |
+------------+------------+-------------------+-------------------+
| 2a         | p_2        | 0 < omega_0 < 1   | omega_2 > 1       |
+------------+------------+-------------------+-------------------+
| 2b         | p_3        | omega_1=1         | omega_2 > 1       |
+------------+------------+-------------------+-------------------+

.. note:: Our implementation is not as parametrically succinct as that of Zhang et al.: we have one additional bin probability.

After Zhang et al., we first define a null model that has 2 rate classes, '0' and '1'. We also get all the MLEs out using ``getStatistics``, just printing the bin parameters table in the current case.

.. doctest::

    >>> rate_lf = cnf.makeLikelihoodFunction(tree, bins = ['0', '1'],
    ...     digits=2, space=3)
    >>> rate_lf.setParamRule('omega', bin='0', upper=1.0-epsilon,
    ...
init=1-epsilon) >>> rate_lf.setParamRule('omega', bins='1', is_constant=True, value=1.0) >>> rate_lf.setAlignment(aln) >>> rate_lf.optimise(**optimiser_args) >>> tables = rate_lf.getStatistics(with_titles=True) >>> for table in tables: ... if 'bin' in table.Title: ... print table bin params ==================== bin bprobs omega -------------------- 0 0.11 0.00 1 0.89 1.00 -------------------- We're also going to use the MLEs from the ``rate_lf`` model, since that nests within the more complex branch by rate-class model. This is unfortunately quite ugly compared with just using the annotated tree approach described above. It is currently necessary, however, due to a bug in constructing annotated trees for models with binned parameters. .. doctest:: >>> globals = [t for t in tables if 'global' in t.Title][0] >>> globals = dict(zip(globals.Header, globals.getRawData()[0])) >>> bin_params = [t for t in tables if 'bin' in t.Title][0] >>> rate_class_omegas = dict(bin_params.getRawData(['bin', 'omega'])) >>> rate_class_probs = dict(bin_params.getRawData(['bin', 'bprobs'])) >>> lengths = [t for t in tables if 'edge' in t.Title][0] >>> lengths = dict(lengths.getRawData(['edge', 'length'])) We now create the more complex model, .. doctest:: >>> rate_branch_lf = cnf.makeLikelihoodFunction(tree, ... bins = ['0', '1', '2a', '2b'], digits=2, space=3) and set from the nested null model the branch lengths, .. doctest:: >>> for branch, length in lengths.items(): ... rate_branch_lf.setParamRule('length', edge=branch, init=length) GTR term MLES, .. doctest:: >>> for param, mle in globals.items(): ... rate_branch_lf.setParamRule(param, init=mle) binned parameter values, .. doctest:: >>> rate_branch_lf.setParamRule('omega', bins=['0', '2a'], upper=1.0, ... init=rate_class_omegas['0']) >>> rate_branch_lf.setParamRule('omega', bins=['1', '2b'], is_constant=True, ... value=1.0) >>> rate_branch_lf.setParamRule('omega', bins=['2a', '2b'], ... edges=['Chimpanzee', 'Human'], init=99, ... 
lower=1.0, upper=100.0, is_constant=False) and the bin probabilities. .. doctest:: >>> rate_branch_lf.setParamRule('bprobs', ... init=[rate_class_probs['0']-epsilon, ... rate_class_probs['1']-epsilon, epsilon, epsilon]) The result of these steps is to create a rate/branch model with initial parameter values that result in likelihood the same as the null. .. doctest:: >>> rate_branch_lf.setAlignment(aln) This function can then be optimised as before. The results of one such optimisation are shown below. As you can see, the ``omega`` value for the '2a' and '2b' bins is at the upper bounds, indicating the model is not maximised in this case. .. code-block:: python rate_branch_lf.optimise(**optimiser_args) print rate_branch_lf Likelihood Function Table ========================= edge bin omega ------------------------- Galago 0 0.00 Galago 1 1.00 Galago 2a 0.00 Galago 2b 1.00 HowlerMon 0 0.00 HowlerMon 1 1.00 HowlerMon 2a 0.00 HowlerMon 2b 1.00 Rhesus 0 0.00 Rhesus 1 1.00 Rhesus 2a 0.00 Rhesus 2b 1.00 Orangutan 0 0.00 Orangutan 1 1.00 Orangutan 2a 0.00 Orangutan 2b 1.00 Gorilla 0 0.00 Gorilla 1 1.00 Gorilla 2a 0.00 Gorilla 2b 1.00 Human 0 0.00 Human 1 1.00 Human 2a 100.00 Human 2b 100.00 Chimpanzee 0 0.00 Chimpanzee 1 1.00 Chimpanzee 2a 100.00 Chimpanzee 2b 100.00... PyCogent-1.5.3/doc/examples/coevolution.rst000644 000765 000024 00000011756 12014672260 021735 0ustar00jrideoutstaff000000 000000 Perform a coevolutionary analysis on biological sequence alignments =================================================================== .. sectionauthor:: Greg Caporaso This document describes how to perform a coevolutionary analysis on a ``DenseAlignment`` object. Coevolutionary analyses identify correlated substitution patterns between ``DenseAlignment`` positions (columns). Several coevolution detection methods are currently provided via the PyCogent coevolution module. ``DenseAlignment`` objects must always be used as input to these functions. 
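Mutual information, the simplest of the measures provided, can be computed by hand to make clear what is being measured: the entropy of each column minus the entropy of the paired columns. A stand-alone sketch in plain Python (the columns below are positions 1 and 2 of the toy four-sequence alignment used in the examples that follow):

```python
from collections import Counter
from math import log2

def entropy(column):
    # Shannon entropy (bits) of one alignment column
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())

def mutual_information(col1, col2):
    # MI = H(col1) + H(col2) - H(col1, col2)
    return entropy(col1) + entropy(col2) - entropy(list(zip(col1, col2)))

col1 = ['A', 'T', 'T', 'T']  # position 1 of 'AAA', 'CTA', 'CTC', '-TC'
col2 = ['A', 'A', 'C', 'C']  # position 2
print(round(mutual_information(col1, col2), 8))  # 0.31127812
```

This reproduces the ``0.31127...`` value that ``coevolve_pair`` with the ``mi`` function returns for this pair of positions.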
Before using an alignment in a coevolutionary analysis, you should be confident in the alignment. Poorly aligned sequences can yield very misleading results. There can be no ambiguous residue/base codes (e.g., B/Z/X in protein alignments) -- while some of the algorithms could tolerate them (e.g. Mutual Information), others that rely on information such as background residue frequencies (e.g. Statistical Coupling Analysis) cannot handle them. Some recoded amino acid alphabets will also not handle ambiguous residues. The best strategy is simply to exclude ambiguous codes altogether. To test for invalid characters before starting an analysis, you can do the following:

.. doctest::

    >>> from cogent import LoadSeqs, PROTEIN, DNA, RNA
    >>> from cogent.core.alignment import DenseAlignment
    >>> from cogent.evolve.coevolution import validate_alignment
    >>> aln = LoadSeqs(data={'1':'GAA','2':'CTA', '3':'CTC','4':'-TC'},moltype=PROTEIN,aligned=DenseAlignment)
    >>> validate_alignment(aln)

To run a coevolutionary analysis, first create a ``DenseAlignment``:

.. doctest::

    >>> from cogent import LoadSeqs, PROTEIN, DNA, RNA
    >>> from cogent.core.alignment import DenseAlignment
    >>> aln = LoadSeqs(data={'1':'AAA','2':'CTA', '3':'CTC','4':'-TC'},moltype=PROTEIN,aligned=DenseAlignment)

Perform a coevolutionary analysis on a pair of positions in the alignment using mutual information (``mi``):

.. doctest::

    >>> from cogent.evolve.coevolution import coevolve_pair_functions, coevolve_pair
    >>> coevolve_pair(coevolve_pair_functions['mi'],aln,pos1=1,pos2=2)
    0.31127...

Perform a coevolutionary analysis on a pair of positions in the alignment using statistical coupling analysis (``sca``):

.. doctest::

    >>> from cogent.evolve.coevolution import coevolve_pair_functions, coevolve_pair
    >>> coevolve_pair(coevolve_pair_functions['sca'],aln,pos1=1,pos2=2,cutoff=0.5)
    0.98053...

Perform a coevolutionary analysis on one position and all other positions in the alignment using mutual information (``mi``):

..
doctest:: >>> from cogent.evolve.coevolution import coevolve_position_functions, coevolve_position >>> coevolve_position(coevolve_position_functions['mi'],aln,position=1) array([ nan, 0.81127812, 0.31127812]) Perform a coevolutionary analysis on all pairs of positions in the alignment using mutual information (``mi``): .. doctest:: :options: +NORMALIZE_WHITESPACE >>> from cogent.evolve.coevolution import coevolve_alignment_functions, coevolve_alignment >>> coevolve_alignment(coevolve_alignment_functions['mi'],aln) array([[ nan, nan, nan], [ nan, 0.81127812, 0.31127812], [ nan, 0.31127812, 1. ]]) View the available algorithms for computing coevolution values: .. doctest:: >>> print coevolve_pair_functions.keys() ['mi', 'sca', 'an', 'gctmpca', 'rmi', 'nmi'] Perform an intermolecular coevolutionary analysis using mutual information (``mi``). Note that there are strict requirements on the sequence identifiers for intermolecular analyses, and some important considerations involved in preparing alignments for these analyses. See the coevolve_alignments docstring (i.e., ``help(coevolve_alignments)`` from the python interpreter) for information. Briefly, sequence identifiers are split on ``+`` symbols. The ids before the + must match perfectly between the two alignments as these are used to match the sequences between alignments. In the following example, these are common species names: human, chicken, echidna, and pig. The text after the ``+`` can be anything, and should probably be the original database identifiers of the sequences. .. doctest:: >>> from cogent.evolve.coevolution import coevolve_alignment_functions,\ ... coevolve_alignments >>> aln1 = LoadSeqs(data={'human+protein1':'AAA','pig+protein1':'CTA', ... 'chicken+protein1':'CTC','echidna+weird_db_identifier':'-TC'}, ... moltype=PROTEIN,aligned=DenseAlignment) >>> aln2 = LoadSeqs(data={'pig+protein2':'AAAY','chicken+protein2':'CTAY', ... 'echidna+protein2':'CTCF','human+protein2':'-TCF'}, ... 
moltype=PROTEIN,aligned=DenseAlignment)
    >>> coevolve_alignments(coevolve_alignment_functions['mi'],aln1,aln2)
    array([[ nan, nan, nan],
           [ nan, 0.12255625, 0.31127812],
           [ nan, 0.31127812, 0. ],
           [ nan, 0.31127812, 0. ]])

PyCogent-1.5.3/doc/examples/complete_seq_features.rst000755 000765 000024 00000000000 12024703642 033112 2../../tests/test_core/test_features.rstustar00jrideoutstaff000000 000000 PyCogent-1.5.3/doc/examples/DisplayPolicy_key.pdf000644 000765 000024 00000013361 11204372323 022754 0ustar00jrideoutstaff000000 000000

[binary PDF payload omitted: a ReportLab-generated PDF key figure]
PyCogent-1.5.3/doc/examples/draw_dendrogram.rst000644 000765 000024 00000005354 11425201333 022515 0ustar00jrideoutstaff000000 000000

.. _draw-trees:

Drawing dendrograms and saving to PDF
=====================================

.. sectionauthor:: Gavin Huttley

From cogent import all the components we need.

.. doctest::

    >>> from cogent import LoadSeqs, LoadTree
    >>> from cogent.evolve.models import MG94HKY
    >>> from cogent.draw import dendrogram

Fit a model (see the neutral test example for more details of this)

..
doctest:: :options: +NORMALIZE_WHITESPACE >>> aln = LoadSeqs("data/long_testseqs.fasta") >>> t = LoadTree("data/test.tree") >>> sm = MG94HKY() >>> nonneutral_lf = sm.makeLikelihoodFunction(t) >>> nonneutral_lf.setParamRule("omega", is_independent = True) >>> nonneutral_lf.setAlignment(aln) >>> nonneutral_lf.optimise() We will draw two different dendrograms -- one with branch lengths contemporaneous, the other where length is scaled. Specify the dimensions of the canvas in pixels .. doctest:: >>> height, width = 500, 700 Dendrogram with branch lengths not proportional ----------------------------------------------- .. doctest:: >>> np = dendrogram.ContemporaneousDendrogram(nonneutral_lf.tree) >>> np.drawToPDF('tree-unscaled.pdf' , width, height, stroke_width=2.0, ... show_params = ['r'], label_template = "%(r).2g", shade_param = 'r', ... max_value = 1.0, show_internal_labels=False, font_size = 10, ... scale_bar = None, use_lengths=False) Dendrogram with branch lengths proportional ------------------------------------------- .. doctest:: >>> p = dendrogram.SquareDendrogram(nonneutral_lf.tree) >>> p.drawToPDF('tree-scaled.pdf', width, height, stroke_width=2.0, ... shade_param = 'r', max_value = 1.0, show_internal_labels=False, ... font_size = 10) Separating the analysis and visualisation steps ----------------------------------------------- It's typically better to not have the analysis and drawing code in the same script, since drawing involves frequent iterations. This requires saving a tree for later reuse. This can be done using an annotated tree, which looks just like a tree, but has the maximum-likelihood parameter estimates attached to each tree edge. The tree must be saved in xml format to preserve the parameter estimates. The annotated tree is obtained from the likelihood function and saved to file specifying the format with the .xml suffix. This file can then be loaded using the standard ``LoadTree`` method in a separate script and used for drawing. .. 
doctest::

    >>> at = nonneutral_lf.getAnnotatedTree()
    >>> at.writeToFile('annotated_tree.xml')

.. we clean up after ourselves, deleting the file

.. doctest::
    :hide:

    >>> import os
    >>> for file_name in 'tree-scaled.pdf', 'tree-unscaled.pdf', 'annotated_tree.xml':
    ...     os.remove(file_name)

PyCogent-1.5.3/doc/examples/draw_dotplot.rst000644 000765 000024 00000002072 11444532333 022062 0ustar00jrideoutstaff000000 000000

Drawing a dotplot
=================

.. sectionauthor:: Gavin Huttley

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> from cogent.core import annotation
    >>> from cogent.draw import dotplot

Load the alignment. For illustrative purposes, I'll make one sequence a different length from the other and introduce a custom sequence annotation for a miscellaneous feature. Normally, those annotations would be on the unaligned sequences.

.. doctest::

    >>> aln = LoadSeqs("data/test.paml", moltype=DNA)
    >>> feature = aln.addAnnotation(annotation.Feature, "misc_feature",
    ...     "pprobs", [(38, 55)])
    >>> seq1 = aln.getSeq('NineBande')[10:-3]
    >>> seq2 = aln.getSeq('DogFaced')

Write out the dotplot as a PDF file in the current directory. Note that seq1 will be the x-axis, and seq2 the y-axis.

.. doctest::

    >>> dp = dotplot.Display2D(seq1,seq2)
    >>> filename = 'dotplot_example.pdf'
    >>> dp.drawToPDF(filename)

.. clean up

.. doctest::
    :hide:

    >>> import os
    >>> os.remove('dotplot_example.pdf')

PyCogent-1.5.3/doc/examples/empirical_protein_models.rst000644 000765 000024 00000005466 11425201333 024432 0ustar00jrideoutstaff000000 000000

Use an empirical protein substitution model
===========================================

.. sectionauthor:: Gavin Huttley

This file contains an example of importing an empirically determined protein substitution matrix, such as that of Dayhoff et al. (1978), and using it to create a substitution model. The globin alignment is from the PAML distribution.

..
doctest:: >>> from cogent import LoadSeqs, LoadTree, PROTEIN >>> from cogent.evolve.substitution_model import EmpiricalProteinMatrix >>> from cogent.parse.paml_matrix import PamlMatrixParser Make a tree object. In this case from a string. .. doctest:: >>> treestring="(((rabbit,rat),human),goat-cow,marsupial);" >>> t = LoadTree(treestring=treestring) Import the alignment, explicitly setting the ``moltype`` to be protein .. doctest:: >>> al = LoadSeqs('data/abglobin_aa.phylip', ... interleaved=True, ... moltype=PROTEIN, ... ) Open the file that contains the empirical matrix and parse the matrix and frequencies. .. doctest:: >>> matrix_file = open('data/dayhoff.dat') The ``PamlMatrixParser`` will import the matrix and frequency from files designed for Yang's PAML package. This format is the lower half of the matrix in three letter amino acid name order white space delineated followed by motif frequencies in the same order. .. doctest:: >>> empirical_matrix, empirical_frequencies = PamlMatrixParser(matrix_file) Create an Empirical Protein Matrix Substitution model object. This will take the unscaled empirical matrix and use it and the motif frequencies to create a scaled Q matrix. .. doctest:: >>> sm = EmpiricalProteinMatrix(empirical_matrix, empirical_frequencies) Make a parameter controller, likelihood function object and optimise. .. doctest:: >>> lf = sm.makeLikelihoodFunction(t) >>> lf.setAlignment(al) >>> lf.optimise() >>> print lf.getLogLikelihood() -1706... 
>>> print lf
    Likelihood Function Table
    =============================
         edge   parent   length
    -----------------------------
       rabbit   edge.0   0.0785
          rat   edge.0   0.1750
       edge.0   edge.1   0.0324
        human   edge.1   0.0545
       edge.1     root   0.0269
     goat-cow     root   0.0972
    marsupial     root   0.2424
    -----------------------------
    ===============
    motif   mprobs
    ---------------
        A   0.0871
        C   0.0335
        D   0.0469
        E   0.0495
        F   0.0398
        G   0.0886
        H   0.0336
        I   0.0369
        K   0.0805
        L   0.0854
        M   0.0148
        N   0.0404
        P   0.0507
        Q   0.0383
        R   0.0409
        S   0.0696
        T   0.0585
        V   0.0647
        W   0.0105
        Y   0.0299
    ---------------

PyCogent-1.5.3/doc/examples/entity_selection.rst000644 000765 000024 00000011651 11307154323 022741 0ustar00jrideoutstaff000000 000000

Selecting and grouping entities
===============================

.. sectionauthor:: Marcin Cieslik

The feature that distinguishes PyCogent's approach to the handling of macromolecular structures is the flexible and concise way of selecting, grouping and retrieving data from entities. The concepts of entity and hierarchy are similar.

Overview of methods and functions covered.
------------------------------------------

The methods covered in this section of the manual deal with selecting entities for purposes like: "select all hydrogen atoms from chain B", "mask all hetero atoms", "remove all water molecules" etc. We start with the concise, high-level functions first, then move to low-level methods for fine-grained manipulations.

Selection based on hierarchy.
-----------------------------

Let's start by accessing a PDB file and creating a structure entity. We establish a connection to the PDB file server, download a file and parse it.

.. doctest::

    >>> from cogent.parse.pdb import PDBParser
    >>> from cogent.db.pdb import Pdb
    >>> pdb = Pdb()
    >>> socket_handle = pdb['2E1F']
    >>> structure = PDBParser(socket_handle)

Let's see what we got.

.. doctest::

    >>> print structure.header['name']
    HYDROLASE
    >>> print structure.header['experiment_type']
    X-RAY DIFFRACTION

WOW, that's descriptive.
At least we know it is an X-ray structure. Now how many chains does it have?

.. doctest::

    >>> structure[(0,)].getChildren()
    []

We found the 'A' chain of the first (0-based indexing) model. We can dig deeper.

.. doctest::

    >>> structure[(0,)][('A',)].sortedkeys()[0:2]
    [(('H_HOH', 1, ' '),), (('H_HOH', 2, ' '),)]

Only waters? Probably not. You can see what is inside a chain by looking inside the dictionary to get the list of short ids and child entities:

.. doctest::

    >>> chain_A = structure[(0,)][('A',)]
    >>> # chain_A.keys() # get the short_ids
    >>> # chain_A.values() # get the children
    >>> len(chain_A)
    147

This number is too high because we counted water molecules, not only amino acids. But typing ``structure[(0,)][('A',)]`` is pretty boring, and it requires inspecting the number of models and chain ids first. The function that allows selecting entities from the hierarchy based on their identity is called ``einput``.

.. doctest::

    >>> from cogent.struct.selection import einput
    >>> all_residues = einput(structure, 'R', 'my_residues')
    >>> all_atoms = einput(structure, 'A')
    >>> len(all_residues)
    147

Waters are still included.

Selection based on properties.
------------------------------

We already have a collection of entities, ``all_residues``, which contains all residues in the structure regardless of the number of chains and models. Our task is to determine the number of non-water residues. The property that allows us to distinguish a water molecule from an amino acid is the name, which is stored as the ``name`` attribute.

.. doctest::

    >>> chain_A.name
    'A'
    >>> first_child = chain_A.sortedvalues()[0]
    >>> first_child.name
    'H_HOH'

We could write a loop to select those residues; we can loop over either ``chain_A`` or ``all_residues``, as they contain the same residues:

.. doctest::

    >>> non_water = []
    >>> for residue in chain_A:
    ...     if residue.name != 'H_HOH':
    ...         non_water.append(residue)
    ...
    >>> len(non_water)
    95

To make this more convenient, each entity (e.g. a ``Chain`` instance) has a ``selectChildren`` method to select children based on a property. The equivalent of the above expression is:

.. doctest::

    >>> non_water = chain_A.selectChildren('H_HOH', 'ne', 'name').values()

or

.. doctest::

    >>> non_water = all_residues.selectChildren('H_HOH', 'ne', 'name').values()
    >>> len(non_water)
    95

The first argument is a value, the second an operator name from the ``operator`` module; here 'ne' is for 'Not Equal'. The last argument, 'name', is resolved by the ``data_children`` method, which allows the user to retrieve data from a child entity's attributes, xtra dictionary or methods. Here we get the data from the 'name' attribute. The ``selectChildren`` method returns a dictionary, where keys are the short ids and values are the child entities. The result can be put into a new entity holder.

.. doctest::

    >>> non_water_holder = einput(non_water, 'R')

But having to first group the entities via ``einput``, then select them, only to put them into a new container seems awkward. It can be done in one step using the ``select`` function.

.. doctest::

    >>> from cogent.struct.selection import select
    >>> non_water_holder = select(structure, 'R', 'H_HOH', 'ne', 'name')
    >>> len(non_water_holder)
    95

Is there a serine in the sequence?

.. doctest::

    >>> serines = select(structure, 'R', 'SER', 'eq', 'name')
    >>> serines.sortedkeys()[0]
    ('2E1F', 0, 'A', ('SER', 1146, ' '))

The function raises a ``ValueError`` if no entities can be selected.

PyCogent-1.5.3/doc/examples/estimate_startingpoint.rst000644 000765 000024 00000004533 11425201333 024154 0ustar00jrideoutstaff000000 000000

Estimate parameter values using a sampling from a dataset
=========================================================

.. sectionauthor:: Gavin Huttley

This script uses the ``sample`` method of the alignment class to provide an estimate for a two-stage optimisation.
This allows rapid optimisation of long alignments and complex models with a good chance of arriving at the global maximum for the model and data. Local optimisation of the full alignment may end up in a local maximum, and for this reason results from this strategy may be inaccurate.

From cogent import all the components we need.

.. doctest::

    >>> from cogent import LoadSeqs, LoadTree
    >>> from cogent.evolve import substitution_model

Load your alignment. Note that if your file ends with a suffix that is the same as its format (assuming it's a supported format) then you can just give the filename. Otherwise you can specify the format using the format argument.

.. doctest::

    >>> aln = LoadSeqs(filename = "data/long_testseqs.fasta")

Get your tree

.. doctest::

    >>> t = LoadTree(filename = "data/test.tree")

Get the substitution model (defaults to Felsenstein's 1981 model)

.. doctest::

    >>> sm = substitution_model.Nucleotide()

Make a likelihood function from a sample of the alignment; the ``sample`` method selects the chosen number of bases at random.

.. doctest::

    >>> lf = sm.makeLikelihoodFunction(t)
    >>> lf.setMotifProbsFromData(aln)
    >>> lf.setAlignment(aln.sample(20))

Optimise with the local optimiser

.. doctest::

    >>> lf.optimise(local=True)

Next use the whole alignment

.. doctest::

    >>> lf.setAlignment(aln)

and the faster Powell optimiser that will only find the best result near the provided starting point

.. doctest::

    >>> lf.optimise(local=True)

..
doctest:: >>> print lf Likelihood Function Table ============================= edge parent length ----------------------------- Human edge.0 0.0309 HowlerMon edge.0 0.0412 edge.0 edge.1 0.0359 Mouse edge.1 0.2666 edge.1 root 0.0226 NineBande root 0.0895 DogFaced root 0.1095 ----------------------------- =============== motif mprobs --------------- T 0.2317 C 0.1878 A 0.3681 G 0.2125 --------------- PyCogent-1.5.3/doc/examples/generating_app_commandlines.rst000644 000765 000024 00000007727 11213303017 025075 0ustar00jrideoutstaff000000 000000 Generating application commandlines =================================== .. sectionauthor:: Daniel McDonald In this example we will generate application command lines. This tool is useful for creating jobs scripts in a cluster or supercomputer environment, as well as when varying parameters. First, we must create a ``ParameterIterBase`` subclass instance. For this example, we will use the ``ParameterCombinations`` object and will vary parameters from the ``Clustalw`` application controller. .. doctest:: >>> from cogent.app.clustalw import Clustalw >>> from cogent.app.util import ParameterCombinations Lets go ahead and vary the parameters ``-gapdist`` and ``-kimura``, specifying that we always want a value for the ``-gapdist`` parameter. ``-gapdist`` is a ``ValuedParameter``, so we must specify a list of values we wish to use. ``-kimura`` is a ``FlagParameter``, so we only need to say that we would like it on (``True``). If the parameter is not specified in ``AlwaysOn`` then the off state will be added to the range of possible values for the parameter. .. doctest:: >>> params_to_vary = {'-gapdist':[1,2,3], '-kimura':True} >>> always_on = ['-gapdist'] >>> param_comb = ParameterCombinations(Clustalw, params_to_vary, always_on) The returned instance is a generator that will yield parameter dictionaries that can be passed to the application controller. 
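Conceptually, such a parameter iterator just walks the Cartesian product of each parameter's allowed states, with an extra "off" state for any parameter not listed in ``AlwaysOn``. A stand-alone sketch of that idea (``parameter_combinations`` here is an illustrative stand-in, not the PyCogent API):

```python
from itertools import product

def parameter_combinations(params_to_vary, always_on):
    # Each parameter contributes its listed values; a flag parameter given
    # as True contributes a single "on" state. Parameters not in always_on
    # also get an "off" state, represented here as None and then dropped.
    names = sorted(params_to_vary)
    ranges = []
    for name in names:
        values = params_to_vary[name]
        values = [True] if values is True else list(values)
        if name not in always_on:
            values = values + [None]
        ranges.append(values)
    for combo in product(*ranges):
        yield {n: v for n, v in zip(names, combo) if v is not None}

combos = list(parameter_combinations({'-gapdist': [1, 2, 3], '-kimura': True},
                                     always_on=['-gapdist']))
print(len(combos))  # 3 values of -gapdist x (-kimura on or off) = 6
```

With two input files, those 6 parameter combinations yield the 12 command lines printed by the doctest below.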
For this example, we will instead use the generator to construct command line strings.

.. doctest::

    >>> from cogent.app.util import cmdline_generator

To generate command lines, you must specify a ``ParameterIterBase`` subclass instance, the full path to the application, an optional initial binary (such as Python), how to handle inputs and outputs, the actual inputs, how to handle ``stdin/stdout/stderr``, and whether you would like unique outputs to be created. This sounds like a lot, but not all applications supported by PyCogent work the same way. This generator is designed to handle every application supported by PyCogent. In this example, we are not specifying how to handle ``stderr`` and ``stdout``. They are by default thrown to ``/dev/null``.

.. note:: Output from printing ``cmd`` is truncated for document formatting

.. doctest::

    >>> path_to_bin = '' # we do not need an initial binary
    >>> path_to_cmd = '/usr/bin/clustalw'
    >>> paths_to_inputs = ['/home/user/input1','/home/user/input2']
    >>> path_to_output = '/home/user/output'
    >>> unique_outputs = True
    >>> input_param = '-infile'
    >>> output_param = '-outfile'
    >>> cmd_gen = cmdline_generator(param_comb, PathToBin=path_to_bin, \
    ...     PathToCmd=path_to_cmd, PathsToInputs=paths_to_inputs, \
    ...     PathToOutput=path_to_output, UniqueOutputs=unique_outputs,\
    ...     InputParam=input_param, OutputParam=output_param)
    >>> for cmd in cmd_gen:
    ...     print cmd
    ...
    /usr/bin/clustalw -align -gapdist=1 -kimura -infile="/home/user/input1" -outfile="/home/...
    /usr/bin/clustalw -align -gapdist=1 -kimura -infile="/home/user/input2" -outfile="/home/...
    /usr/bin/clustalw -align -gapdist=1 -infile="/home/user/input1" -outfile="/home/user/out...
    /usr/bin/clustalw -align -gapdist=1 -infile="/home/user/input2" -outfile="/home/user/out...
    /usr/bin/clustalw -align -gapdist=2 -kimura -infile="/home/user/input1" -outfile="/home/...
    /usr/bin/clustalw -align -gapdist=2 -kimura -infile="/home/user/input2" -outfile="/home/...
    /usr/bin/clustalw -align -gapdist=2 -infile="/home/user/input1" -outfile="/home/user/out...
    /usr/bin/clustalw -align -gapdist=2 -infile="/home/user/input2" -outfile="/home/user/out...
    /usr/bin/clustalw -align -gapdist=3 -kimura -infile="/home/user/input1" -outfile="/home/...
    /usr/bin/clustalw -align -gapdist=3 -kimura -infile="/home/user/input2" -outfile="/home/...
    /usr/bin/clustalw -align -gapdist=3 -infile="/home/user/input1" -outfile="/home/user/out...
    /usr/bin/clustalw -align -gapdist=3 -infile="/home/user/input2" -outfile="/home/user/out...

PyCogent-1.5.3/doc/examples/genetic_code_aa_index.rst

Compute the effect of a nucleotide substitution on residue polarity in two different genetic codes using GeneticCode and AAIndex
================================================================================================================================

.. sectionauthor:: Greg Caporaso

This document illustrates how to work with a genetic code object, and how to
compare two different genetic codes. Here we compare the change in residue
polarity, as judged by the Woese Polarity Requirement index (Woese 1973),
resulting from a nucleotide substitution if the sequence is translated with
the standard nuclear genetic code or the vertebrate mitochondrial genetic
code.

First, we load the genetic code objects and look at how they differ from one
another.

.. doctest::

    >>> from cogent.core.genetic_code import GeneticCode
    >>> code = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
    >>> standard_nuclear_genetic_code = GeneticCode(code)
    >>> code = 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG'
    >>> vertebrate_mitochondrial_genetic_code = GeneticCode(code)
    >>> standard_nuclear_genetic_code == vertebrate_mitochondrial_genetic_code
    False

We'll make some synonyms for the objects for simplicity, and then look at the
differences between the two codes:

.. doctest::

    >>> ngc = standard_nuclear_genetic_code
    >>> mgc = vertebrate_mitochondrial_genetic_code
    >>> differences = ngc.changes(mgc).items()
    >>> differences.sort()
    >>> differences
    [('AGA', 'R*'), ('AGG', 'R*'), ('ATA', 'IM'), ('TGA', '*W')]

Next, let's load the Woese Polar Requirement ``AAIndex`` data, and find the
effect of an ATA to ATG substitution with each of the two ``GeneticCode``
objects.

.. doctest::

    >>> from cogent.parse.aaindex import getWoeseDistanceMatrix
    >>> woese_distance_matrix = getWoeseDistanceMatrix()
    >>> woese_distance_matrix[ngc['ATA']][ngc['ATG']]
    0.39999999999999947
    >>> woese_distance_matrix[mgc['ATA']][mgc['ATG']]
    0.0

This illustrates that there is a difference in residue polarity associated
with the substitution only in the standard nuclear code (where ATA to ATG
translates to an isoleucine to methionine substitution). In the vertebrate
mitochondrial code, ATA to ATG is a synonymous substitution. Calculations of
this type were central to [1]_, which presents the study that these modules
were initially developed for.

``GeneticCode`` objects can also be used to translate DNA sequences (where
asterisks in the results refer to stop-translation characters):

.. doctest::

    >>> dna = "AAACGCTGTGTGTGAGATGAAAAA"
    >>> ngc.translate(dna)
    'KRCV*DEK'
    >>> mgc.translate(dna)
    'KRCVWDEK'

The standard nuclear genetic code can also be loaded as ``DEFAULT``:

.. doctest::

    >>> from cogent.core.genetic_code import DEFAULT
    >>> DEFAULT == standard_nuclear_genetic_code
    True

**Citations**

.. [1] Caporaso, Yarus, and Knight. *Error minimization and coding
   triplet/binding site associations are independent features of the
   canonical genetic code.* J Mol Evol, 61(5):597-607, 2005.

PyCogent-1.5.3/doc/examples/goodness_of_fit.rst

The ``goodness_of_fit`` module
==============================

.. sectionauthor:: Andreas Wilm

This is a short example of how to use the ``goodness_of_fit`` module.
``goodness_of_fit`` will measure the degree of correspondence (or goodness of fit) between a multidimensional scaling (i.e. a lower dimensional representation of distances between objects) and the input distance matrix. Let's setup some example data: .. doctest:: >>> import numpy >>> import cogent.cluster.goodness_of_fit as goodness_of_fit >>> distmat = numpy.array([ ... [ 0. , 0.039806, 0.056853, 0.21595 , 0.056853, 0.0138 , ... 0.203862, 0.219002, 0.056853, 0.064283], ... [ 0.039806, 0. , 0.025505, 0.203862, 0.0208 , 0.039806, ... 0.194917, 0.21291 , 0.0208 , 0.027869], ... [ 0.056853, 0.025505, 0. , 0.197887, 0.018459, 0.056853, ... 0.191958, 0.203862, 0.018459, 0.025505], ... [ 0.21595 , 0.203862, 0.197887, 0. , 0.206866, 0.206866, ... 0.07956 , 0.066935, 0.203862, 0.206866], ... [ 0.056853, 0.0208 , 0.018459, 0.206866, 0. , 0.056853, ... 0.203862, 0.21595 , 0.0138 , 0.0208 ], ... [ 0.0138 , 0.039806, 0.056853, 0.206866, 0.056853, 0. , ... 0.197887, 0.209882, 0.056853, 0.064283], ... [ 0.203862, 0.194917, 0.191958, 0.07956 , 0.203862, 0.197887, ... 0. , 0.030311, 0.200869, 0.206866], ... [ 0.219002, 0.21291 , 0.203862, 0.066935, 0.21595 , 0.209882, ... 0.030311, 0. , 0.21291 , 0.219002], ... [ 0.056853, 0.0208 , 0.018459, 0.203862, 0.0138 , 0.056853, ... 0.200869, 0.21291 , 0. , 0.011481], ... [ 0.064283, 0.027869, 0.025505, 0.206866, 0.0208 , 0.064283, ... 0.206866, 0.219002, 0.011481, 0. ]]) >>> mds_coords = numpy.array([ ... [ 0.065233, 0.035019, 0.015413], ... [ 0.059604, 0.00168 , -0.003254], ... [ 0.052371, -0.010959, -0.014047], ... [-0.13804 , -0.036031, 0.031628], ... [ 0.063703, -0.015483, -0.00751 ], ... [ 0.056803, 0.031762, 0.021767], ... [-0.135082, 0.023552, -0.021006], ... [-0.150323, 0.011935, -0.010013], ... [ 0.06072 , -0.01622 , -0.007721], ... [ 0.065009, -0.025254, -0.005257]]) You are now good to compute stress values as shown in the following. 
``Stress.calcKruskalStress()``
------------------------------

This computes Kruskal's Stress, AKA Stress Formula 1 (Kruskal, 1964). Kruskal
gives the following numbers as a guideline:

========== ===============
Stress [%] Goodness of fit
========== ===============
        20 Poor
        10 Fair
         5 Good
       2.5 Excellent
         0 Perfect
========== ===============

Our example data gives a very good stress value:

.. doctest::

    >>> stress = goodness_of_fit.Stress(distmat, mds_coords)
    >>> print stress.calcKruskalStress()
    0.02255...

``Stress.calcSstress()``
------------------------

This computes S-Stress (Takane, 1977). S-Stress emphasises larger
dissimilarities more than smaller ones. S-Stress values behave more
predictably than Stress-1 values:

- The value of S-Stress is always between 0 and 1
- Values less than 0.1 mean good representation.

Using the example data again:

.. doctest::

    >>> stress = goodness_of_fit.Stress(distmat, mds_coords)
    >>> print stress.calcSstress()
    0.00883...

PyCogent-1.5.3/doc/examples/handling_3dstructures.rst

Working with macromolecular structures
======================================

.. sectionauthor:: Marcin Cieslik

This part of the documentation presents examples of how to work with
macromolecular structures, i.e. coordinate files. This functionality
originates from ZenPDB. At the current stage of development the input and
output is limited to PDB files and some derivatives, but the "internal"
representation and construction is file-format agnostic and other parsers can
easily be added.

Hierarchy and identity
----------------------

A common way to describe macromolecular structures is the hierarchical
representation. The hierarchy is made from entities (atoms, residues, chains,
models, structures and crystals). In this hierarchical representation entities
nest within one another: models are contained within structures (e.g. several
NMR models of one protein structure) and residues are collections of atoms.
Hierarchical representations are unique, i.e. each entity has to be uniquely
identified for a given structure, which means that it has a unique identifier.
We will refer to this identifier as the *full_id*, in contrast with the
*short_id* (or just *id*), which defines an entity uniquely only within its
parent. Each entity has only a single parent, but can have multiple children,
e.g. a residue is part of only one peptide chain, but it will contain multiple
atoms. An example of a *full_id*: ::

    # for an atom
    ('4TSV', 0, 'A', ('ARG', 131, ' '), ('O', ' '))
    # for a residue
    ('4TSV', 0, 'A', ('ARG', 131, ' '))
    # for a chain
    ('4TSV', 0, 'A')

The first is the identifier for the oxygen atom from the peptide bond of the
ARG131 residue in the 'A' chain of the first model (0) in the structure
available from the PDB as '4TSV'. A short version of an *id*, which is
specific only within a parent (i.e. for an atom within a residue), looks
similar. In the example below we see the short id of the same atom. Of course
this information is not enough to pin-point an atom in the structure, but it
is enough to identify different atoms within the same residue. ::

    (('O', ' '),)

As you can see, the *full_id* is a linear tuple of short ids, which can be
either a tuple (e.g. ('O', ' ') for an oxygen atom) or a string (e.g. 'A' for
chain A). All strings within a short id have some special meaning, for example
the id of a residue has the following structure ('three letter AA name',
residue_id, 'insertion code'). It should be noted that according to the RCSB
the ``residue_id`` is the integer number which should be 1 for the first
natural residue in a protein. Residues can have negative ``residue_ids``, e.g.
residues of an N-terminal affinity tag.

What is an Entity?
------------------

``Entity`` is the most basic class to provide methods specific to
macromolecular structures.
The ``Atom``, ``Residue``, ``Chain``, ``Model`` and ``Structure`` classes all
inherit from the ``Entity`` class, yet there is some distinction between them.
Only the ``Atom`` entity cannot contain other entities, e.g. an instance of
the ``Residue`` class can (and should) contain some ``Atom`` instances. The
methods common to container entities are within the ``MultiEntity`` class,
which they all inherit from. The ``MultiEntity`` is also a subclass of the
``Entity`` class. It is important not to use the ``Entity`` and
``MultiEntity`` classes directly, as some attributes (e.g. their position
within the SMCRA hierarchy) have to be provided. In fact each entity is just a
Python dictionary with almost all dictionary methods left untouched. Parsing a
macromolecular structure, e.g. a PDB file, means to create a ``Structure``
entity, or in other words to recursively fill it with atoms, residues and
chains.

Working with entities
---------------------

Our first task will be to parse and write a structure in a PDB file into an
``Entity`` hierarchy. This is quite easy, and if you are familiar with the
internals of PyCogent I hope it will also be obvious. You can use any PDB
file; the examples use the ``4TSV.pdb`` file in the doc/data directory.

The easy, but implicit, way:

.. doctest::

    >>> import cogent
    >>> structure = cogent.LoadStructure('data/4TSV.pdb')

This code involves quite a bit of magic, so let's do everything manually. The
``cogent.LoadStructure`` method is a convenience method to get a structure
object from a file in any of the supported formats. Right now PyCogent
supports only the PDB file-format. Now let's read and write the same PDB file
by using the ``PDBParser`` and ``PDBWriter`` functions directly. The
new_structure argument can be any ``Entity`` (e.g. a ``Structure`` entity) or
a container of entities (e.g. a list of ``Atom`` and ``Residue`` entities):

.. doctest::

    >>> from cogent.parse.pdb import PDBParser
    >>> from cogent.format.pdb import PDBWriter
    >>> import tempfile, os
    >>> pdb_file = open('data/4TSV.pdb')
    >>> new_structure = PDBParser(pdb_file)
    >>> open_handle, file_name = tempfile.mkstemp()
    >>> os.close(open_handle)
    >>> new_pdb_file = open(file_name,'wb')
    >>> PDBWriter(new_pdb_file, new_structure)
    >>> new_structure

In this code-listing we first import the PDB parser and PDB writer, open a PDB
file and parse the structure. You can verify that the ``PDBParser`` does not
close the open ``pdb_file``:

.. doctest::

    >>> assert not pdb_file.closed
    >>> assert not new_pdb_file.closed

Currently the ``PDBParser`` parses quite a lot of information from the header
of the PDB file and the atomic coordinates. It omits the anisotropic
b-factors. Additional information is stored in the ``header`` attribute, which
is a dictionary.

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> structure.id # the static id tuple.
    ('4TSV',)
    >>> structure.getId() # the dynamic id tuple, use calls to get_id whenever possible.
    ('4TSV',)
    >>> structure.getFull_id() # only for the structure entity is the full_id identical to the id.
    ('4TSV',)
    >>> structure.header.keys() # the pdb header is parsed into a dictionary as the header attribute
    ['bio_cmx', 'uc_mxs',...
    >>> structure.header['id'] # this is the 4-char PDB ID parsed from the header and used to construct the structure.id
    '4TSV'
    >>> structure.header['expdta'] # if this is 'X-RAY' we probably deal with an x-ray structure and thus a lot of crystallographic data is stored in the header.
    'X-RAY'

Not all information from the PDB header is currently parsed. If you are
interested in some special data you can access the unparsed header through the
``raw_header`` attribute; the same is true for the trailer. If you manage to
extract the data from the ``raw_header`` you are ready to modify the modular
code of the ``PDBParser`` class, please submit a patch!

.. code-block:: python

    structure.raw_header
    structure.raw_trailer

The structure entity is a container for model entities; as you already know,
the structure is just a dictionary of models.

.. doctest::

    >>> structure.items()
    [((0,), )]
    >>> structure.values()
    []
    >>> structure.keys()
    [(0,)]
    >>> first_model = structure.values()[0] # we name the first (and only) model in the structure
    >>> first_model_id = first_model.getId()

But PyCogent provides more specific methods to work with entities. The one
which is useful to access the contents of an entity is ``getChildren``. The
optional argument to the ``getChildren`` method is a list of ids (e.g. to
access only a subset of children). More concise and sophisticated methods to
work with children will be introduced later.

.. doctest::

    >>> structure.getChildren() # the output should be the same as structure.values()
    []
    >>> children_list = structure.getChildren([first_model_id])

A typical way to change a property of all children in a MultiEntity would be
to write a loop. In this example we change the name of every residue to 'UNK'.

.. doctest::

    >>> some_model = structure.values()[0]
    >>> some_chain = some_model.values()[0]
    >>> for residue in some_chain.values():
    ...     residue.setName('UNK')
    ...

PyCogent allows you to make this much shorter. Whenever a structure is
created, the top-level entity (i.e. the structure) gets a pointer list to all
the entities it contains, stored as the ``table`` attribute. For example the
structure entity will have a table with a list of all models, chains, residues
and atoms that it contains. The keys of this table are *full_ids*; the values
are the actual entities. The table is divided into sections based on the
hierarchy, i.e. there is a separate dictionary for residues, atoms, chains and
models.

.. doctest::

    >>> sorted(structure.table.keys()) # all the different entity levels in the table (which is a normal dictionary)
    ['A', 'C', 'M', 'R']
    >>> structure.table['C'] # this is a full_id to entity mapping for all chains inside the structures
    {('4TSV', 0, ' '): , ('4TSV', 0, 'A'): }

The creation of such a table is quite expensive, so it is created for the
structure entity, but there is no reason why you should not create a table
for e.g. a chain if you need it.

.. doctest::

    >>> some_model = structure.values()[0]
    >>> some_chain = some_model.values()[0]
    >>> some_chain.setTable()
    >>> # some_chain.table['R'] # all the residues

There is however a catch. Tables are not dynamic, which means that they are
not magically updated whenever a child changes its id. This can easily be seen
in the following example, where a new chain is created and a residue is moved
into it. A table is created for the chain, but it does not update the key
after the child changes its name.

.. doctest::

    >>> from cogent.core.entity import Chain # the chain entity
    >>> new_chain = Chain('J') # an empty chain named 'J'
    >>> new_chain.getId()
    ('J',)
    >>> some_residue = structure.table['R'].values()[0] # a semi-random residue from structure
    >>> # a possible output:
    >>> some_residue.setName('001') # change the name to '001'
    >>> # some_residue.getId() # should return e.g. (('001', 39, ' '),)
    >>> # some_residue.getFull_id() # should return ('4TSV', 0, 'A', ('001', 39, ' '))
    >>> new_chain.addChild(some_residue) # move from chain 'A' in 4TSV into chain 'J'
    >>> # new_chain.keys() # should return: [(('001', 39, ' '),)]
    >>> new_chain.setTable()
    >>> # new_chain.table['R'].keys() # should return: [('J', ('001', 39, ' '))]
    >>> some_residue.setName('002') # change the name to '002'
    >>> # new_chain.keys() # should return: [(('002', 39, ' '),)] # updated!
    >>> # new_chain.table['R'].keys() # should return [('J', ('001', 39, ' '))] not updated
    >>> new_chain.setTable(force=True) # update table
    >>> # new_chain.table['R'].keys() # should return [('J', ('002', 39, ' '))] updated

It is important to realize that Python dictionaries are not sorted, so the
order of two equal dictionaries is not the same. Each time a child is changed
in a way that affects the parent, e.g. a part of its id changes, the parent
dictionary will be updated and so might the order. You should **never** assume
that an entity has a particular order.

.. doctest::

    >>> some_residue = some_chain.values()[0]
    >>> old_id = some_residue.getId() # e.g. (('ILE', 154, ' '),)
    >>> some_residue.setName('VAL')
    >>> new_id = some_residue.getId() # e.g. (('VAL', 154, ' '),)
    >>> some_chain.getChildren([old_id]) # nothing... not valid anymore
    []
    >>> # some_chain.getChildren([new_id]) # e.g. []

But the table of an entity is static and does not get updated.

.. doctest::

    >>> some_full_id = some_residue.getFull_id() # entities in tables are stored using their full ids!!
    >>> # some_chain.table['R'][some_full_id] # should raise a KeyError
    >>> some_chain.setTable() # we make a new table
    >>> some_chain.table['R'][some_full_id]

It is important to note that the table is a simple dictionary and the
entity-specific methods like ``getChildren`` are not available. You can figure
out whether the table is up-to-date (or at least I hope I managed to code it
right) using the ``modified`` attribute.

.. doctest::

    >>> some_chain.modified
    False

If the result were ``True`` the entity has been modified and might require use
of the ``setTable``, or in some cases ``updateIds()``, methods.

.. doctest::

    >>> some_chain.setTable()
    >>> some_chain.updateIds()

Do not run those methods if you do not need to, as they take some time. The
loop to run a child method can be implicitly omitted by using the dispatch
method. It calls the method for every child.

.. doctest::

    >>> some_model = structure.values()[0]
    >>> some_chain = some_model.values()[1]
    >>> some_chain.dispatch('setName', 'UNK')
    >>> some_chain.modified
    True

The above method has exactly the same effect as the loop. All residues within
the chain will have the name set to 'UNK'. You can verify that the ids and
dictionary keys got updated:

.. code-block:: python

    some_chain.keys()[0] # output random e.g. (('UNK', 260, ' '),)
    some_chain.values()[0] # e.g.

PyCogent-1.5.3/doc/examples/hmm_par_heterogeneity.rst

.. _rate-heterogeneity-hmm:

Evaluate process heterogeneity using a Hidden Markov Model
==========================================================

.. sectionauthor:: Gavin Huttley

The existence of rate heterogeneity in the evolution of biological sequences
is well known. Typically such an evolutionary property is evaluated using
so-called site-heterogeneity models. These models postulate the existence of
discrete classes of sites, where sites within a class evolve according to a
specific rate that is distinct from the rates of the other classes. These
models retain the assumption that alignment columns evolve independently. One
can naturally ask the question of whether rate classes occur randomly
throughout the sequence or whether they are in fact auto-correlated - meaning
sites of a class tend to cluster together. Because we do not have, *a priori*,
a basis for classifying the sites, the models are specified such that each
column can belong to any of the designated site classes and the likelihood is
computed across all possible classifications. Post numerical optimisation we
can calculate the posterior probability that a site column belongs to a
specific site class.
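The likelihood computed "across all possible classifications" can be written down compactly. The following is a sketch in generic mixture/HMM notation (the symbols are mine, not cogent's): with :math:`K` bins having probabilities :math:`\pi_k` and rates :math:`r_k`, the independent-sites model treats each column :math:`x_n` as a draw from a mixture,

```latex
L \;=\; \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \Pr(x_n \mid r_k)
```

while the auto-correlated model lets the bin labels form a Markov chain with transition matrix :math:`A`, summing over all label paths,

```latex
L \;=\; \sum_{k_1,\ldots,k_N} \pi_{k_1} \Pr(x_1 \mid r_{k_1})
        \prod_{n=2}^{N} A_{k_{n-1} k_n} \, \Pr(x_n \mid r_{k_n})
```

which is evaluated in :math:`O(NK^2)` time by the forward algorithm; the posterior bin probabilities reported later come from the corresponding forward-backward recursions.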
In ``cogent``, site classes are referred to as ``bins``, and so we refer to
bin probabilities, etc. To illustrate how to evaluate these hypotheses
formally, we specify 3 nested hypotheses: (i) Ho: no rate heterogeneity; (ii)
Ha(1): two classes of sites - fast and slow, but independent sites; (iii)
Ha(2): fast and slowly evolving sites are auto-correlated (meaning a site's
class is correlated with that of its immediate neighbours). It is also
possible to apply these models to different types of changes and we illustrate
this with a single parameterisation at the end.

First import standard components necessary for all of the following
calculations. As the likelihood ratio tests (LRT) involve nested hypotheses we
will employ the chi-square approximation for assessing statistical
significance.

.. doctest::

    >>> from cogent.evolve.substitution_model import Nucleotide, predicate
    >>> from cogent import LoadSeqs, LoadTree
    >>> from cogent.maths.stats import chisqprob

Load the alignment and tree.

.. doctest::

    >>> aln = LoadSeqs("data/long_testseqs.fasta")
    >>> tree = LoadTree("data/test.tree")

Model Ho: no rate heterogeneity
-------------------------------

We define an HKY model of nucleotide substitution, which has a transition
parameter. This is defined using the ``MotifChange`` class, by specifying a
transition as **not** a transversion (``~MotifChange('R','Y')``).

.. doctest::

    >>> MotifChange = predicate.MotifChange
    >>> treat_gap = dict(recode_gaps=True, model_gaps=False)
    >>> kappa = (~MotifChange('R', 'Y')).aliased('kappa')
    >>> model = Nucleotide(predicates=[kappa], **treat_gap)

We specify a null model with no bins, and optimise it.

..
doctest:: >>> lf_one = model.makeLikelihoodFunction(tree, digits=2, space=3) >>> lf_one.setAlignment(aln) >>> lf_one.optimise() >>> lnL_one = lf_one.getLogLikelihood() >>> df_one = lf_one.getNumFreeParams() >>> print lf_one Likelihood Function Table ===== kappa ----- 4.10 ----- =========================== edge parent length --------------------------- Human edge.0 0.03 HowlerMon edge.0 0.04 edge.0 edge.1 0.04 Mouse edge.1 0.28 edge.1 root 0.02 NineBande root 0.09 DogFaced root 0.11 --------------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- Model Ha(1): two classes of gamma distributed but independent sites ------------------------------------------------------------------- Our next hypothesis is that there are two rate classes, or bins, with rates gamma distributed. We will restrict the bin probabilities to be equal. .. doctest:: >>> bin_submod = Nucleotide(predicates=[kappa], ordered_param='rate', ... distribution='gamma', **treat_gap) >>> lf_bins = bin_submod.makeLikelihoodFunction(tree, bins=2, ... 
sites_independent=True, digits=2, space=3) >>> lf_bins.setParamRule('bprobs', is_constant=True) >>> lf_bins.setAlignment(aln) >>> lf_bins.optimise(local=True) >>> lnL_bins = lf_bins.getLogLikelihood() >>> df_bins = lf_bins.getNumFreeParams() >>> assert df_bins == 9 >>> print lf_bins Likelihood Function Table ================== kappa rate_shape ------------------ 4.38 1.26 ------------------ =========================== edge parent length --------------------------- Human edge.0 0.03 HowlerMon edge.0 0.04 edge.0 edge.1 0.04 Mouse edge.1 0.31 edge.1 root 0.02 NineBande root 0.10 DogFaced root 0.12 --------------------------- ==================== bin bprobs rate -------------------- bin0 0.50 0.41 bin1 0.50 1.59 -------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- Model Ha(2): fast and slowly evolving sites are auto-correlated --------------------------------------------------------------- We then specify a model with switches for changing between site-classes, the HMM part. The setup is almost identical to that for above with the sole difference being setting the ``sites_independent=False``. .. doctest:: >>> lf_patches = bin_submod.makeLikelihoodFunction(tree, bins=2, ... 
sites_independent=False, digits=2, space=3) >>> lf_patches.setParamRule('bprobs', is_constant=True) >>> lf_patches.setAlignment(aln) >>> lf_patches.optimise(local=True) >>> lnL_patches = lf_patches.getLogLikelihood() >>> df_patches = lf_patches.getNumFreeParams() >>> print lf_patches Likelihood Function Table =============================== bin_switch kappa rate_shape ------------------------------- 0.56 4.42 1.16 ------------------------------- =========================== edge parent length --------------------------- Human edge.0 0.03 HowlerMon edge.0 0.04 edge.0 edge.1 0.04 Mouse edge.1 0.31 edge.1 root 0.02 NineBande root 0.10 DogFaced root 0.12 --------------------------- ==================== bin bprobs rate -------------------- bin0 0.50 0.39 bin1 0.50 1.61 -------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- We use the following short function to compute the LR test statistic. .. doctest:: >>> LR = lambda alt, null: 2 * (alt - null) We conduct the test between the sequentially nested models. .. doctest:: >>> lr = LR(lnL_bins, lnL_one) >>> print lr 22... >>> print "%.4f" % chisqprob(lr, df_patches-df_bins) 0.0000 The stationary bin probabilities are labelled as ``bprobs`` and can be obtained as follows. .. doctest:: >>> bprobs = lf_patches.getParamValue('bprobs') >>> print "%.1f : %.1f" % tuple(bprobs) 0.5 : 0.5 Of greater interest here (given the model was set up so the bin probabilities were equal, i.e. ``is_constant=True``) are the posterior probabilities as those allow classification of sites. The result is a ``DictArray`` class instance, which behaves like a dictionary. .. doctest:: >>> pp = lf_patches.getBinProbs() If we want to know the posterior probability the 21st position belongs to ``bin0``, we can determine it as: .. doctest:: >>> print pp['bin0'][20] 0.8... 
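The posterior bin probabilities can be turned into a hard classification of sites by thresholding. The sketch below uses a plain dict of numpy arrays as a stand-in for the ``DictArray`` returned by ``getBinProbs`` (the values are invented for illustration, not taken from the analysis above):

```python
import numpy as np

# Hypothetical posterior probabilities for bin0 over ten alignment columns;
# in the document these come from lf_patches.getBinProbs().
pp = {'bin0': np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.4, 0.95, 0.3, 0.6, 0.55])}
pp['bin1'] = 1.0 - pp['bin0']   # two bins, so probabilities sum to 1

# Assign each column to the bin with the larger posterior probability.
labels = np.where(pp['bin0'] > 0.5, 'bin0', 'bin1')
slow_sites = np.flatnonzero(labels == 'bin0')  # column indices assigned to bin0
```

With these made-up values, columns 0, 1, 4, 6, 8 and 9 would be assigned to ``bin0``; runs of consecutive same-bin columns are exactly the "patches" the auto-correlated model is designed to detect.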
A model with patches of ``kappa``
---------------------------------

In this example we model sequence evolution where there are 2 classes of sites
distinguished by their ``kappa`` parameters. We need to know what value of
``kappa`` to use to delineate the bin boundaries. We can determine this from
the null model (``lf_one``). For this use case, we also need to use a
``numpy.array``, so we'll import that.

.. todo:: **FOR RELEASE** did we fix this silliness of requiring a numpy.array?

.. doctest::

    >>> from numpy import array
    >>> single_kappa = lf_one.getParamValue('kappa')

We then construct the substitution model in a different way to that used when
evaluating generic rate heterogeneity (above).

.. doctest::

    >>> kappa_bin_submod = Nucleotide(predicates=[kappa], **treat_gap)
    >>> lf_kappa = kappa_bin_submod.makeLikelihoodFunction(tree,
    ...     bins = ['slow', 'fast'], sites_independent=False, digits=1,
    ...     space=3)

To improve the likelihood fitting it is desirable to set starting values in
the model that result in its initial likelihood being that of the null model
(or as close as possible). To do this, we're going to define an arbitrarily
small value (``epsilon``) which we use to provide the starting value to the
two bins as slightly smaller/greater than ``single_kappa`` for the slow/fast
bins respectively. At the same time we set the upper/lower bin boundaries.

.. doctest::

    >>> epsilon = 1e-6
    >>> lf_kappa.setParamRule(kappa, init=single_kappa-epsilon,
    ...     upper=single_kappa, bin='slow')
    >>> lf_kappa.setParamRule(kappa, init=single_kappa+epsilon,
    ...     lower=single_kappa, bin='fast')

We then illustrate how to adjust the bin probabilities, here doing it so that
one of them is nearly 1, the other nearly 0. This ensures the likelihood will
be near identical to that of ``lf_one`` and as a result the optimisation step
will actually improve fit over the simpler model.

.. doctest::

    >>> lf_kappa.setParamRule('bprobs',
    ...
init=array([1.0-epsilon, 0.0+epsilon])) >>> lf_kappa.setAlignment(aln) >>> lf_kappa.optimise(local=True) >>> print lf_kappa Likelihood Function Table ========== bin_switch ---------- 0.6 ---------- ===================== bin bprobs kappa --------------------- slow 0.8 3.0 fast 0.2 23.3 --------------------- =========================== edge parent length --------------------------- Human edge.0 0.0 HowlerMon edge.0 0.0 edge.0 edge.1 0.0 Mouse edge.1 0.3 edge.1 root 0.0 NineBande root 0.1 DogFaced root 0.1 --------------------------- ============== motif mprobs -------------- T 0.2 C 0.2 A 0.4 G 0.2 -------------- >>> print lf_kappa.getLogLikelihood() -8749.3... PyCogent-1.5.3/doc/examples/index.rst000644 000765 000024 00000004075 11443272207 020474 0ustar00jrideoutstaff000000 000000 ##################### Cogent Usage Examples ##################### ************************************** A Note on the Computable Documentation ************************************** The following examples are all available as standalone text files which can be computed using the Python doctest module. ***************** Data manipulation ***************** .. toctree:: :maxdepth: 1 translate_dna seq_features complete_seq_features reverse_complement align_codons_to_protein aln_profile genetic_code_aa_index handling_tabular_data handling_3dstructures entity_selection query_ncbi query_ensembl manipulating_tree_nodes ********************************** Controlling 3rd party applications ********************************** .. toctree:: :maxdepth: 1 application_controller_framework building_and_using_an_application_controller alignment_app_controllers phylogeny_app_controllers generating_app_commandlines ********************* General data analysis ********************* .. toctree:: :maxdepth: 1 goodness_of_fit perform_PCoA_analysis perform_nmds motif_results unifrac period_estimation ****************** Data Visualisation ****************** .. 
toctree:: :maxdepth: 1 draw_dendrogram draw_dotplot ******************* Modelling Evolution ******************* .. toctree:: :maxdepth: 1 simple relative_rate neutral_test scope_model_params_on_trees codon_models empirical_protein_models testing_multi_loci reuse_results unrestricted_nucleotide simulate_alignment parametric_bootstrap estimate_startingpoint coevolution rate_heterogeneity hmm_par_heterogeneity seqsim_alignment_simulation seqsim_aln_sim_user_alphabet seqsim_tree_sim *************************** Phylogenetic Reconstruction *************************** .. toctree:: :maxdepth: 1 calculate_pairwise_distances calculate_neigbourjoining_tree calculate_UPGMA_cluster phylo_by_ls maketree_from_proteinseqs PyCogent-1.5.3/doc/examples/maketree_from_proteinseqs.rst000644 000765 000024 00000004304 11425201333 024624 0ustar00jrideoutstaff000000 000000 Making a phylogenetic tree from a protein sequence alignment ============================================================ .. sectionauthor:: Gavin Huttley In this example we pull together the distance calculation and tree building with the additional twist of using an empirical protein substitution matrix. We will therefore be computing the tree from a protein sequence alignment. We will first do the standard cogent import for ``LoadSeqs``. .. doctest:: >>> from cogent import LoadSeqs, PROTEIN We will use an empirical protein substitution matrix. .. doctest:: >>> from cogent.evolve.models import JTT92 The next components we need are for computing the matrix of pairwise sequence distances and then for estimating a neighbour joining tree from those distances. .. doctest:: >>> from cogent.phylo import nj, distance Now load our sequence alignment, explicitly setting the alphabet to be protein. .. doctest:: >>> aln = LoadSeqs('data/abglobin_aa.phylip', interleaved=True, ... moltype=PROTEIN) Create an Empirical Protein Matrix Substitution model object. 
This will take the unscaled empirical matrix and use it and the motif frequencies to create a scaled Q matrix. .. doctest:: >>> sm = JTT92() We now use this and the alignment to construct a distance calculator. .. doctest:: >>> d = distance.EstimateDistances(aln, submodel = sm) >>> d.run() The resulting distances are passed to the nj function. .. doctest:: >>> mytree = nj.nj(d.getPairwiseDistances()) The shape of the resulting tree can be readily viewed by printing ``mytree.asciiArt()``. The result will be equivalent to the following. .. code-block:: python /-human | | /-rabbit -root----|-edge.1--| | \-rat | | /-goat-cow \edge.0--| \-marsupial This tree can be saved to file; the ``with_distances`` argument specifies that branch lengths are to be included in the newick formatted output. .. doctest:: >>> mytree.writeToFile('test_nj.tree', with_distances=True) .. clean up .. doctest:: :hide: >>> import os >>> os.remove('test_nj.tree') PyCogent-1.5.3/doc/examples/manipulating_tree_nodes.rst000644 000765 000024 00000020323 11213036343 024250 0ustar00jrideoutstaff000000 000000 Manipulation of Tree Node Objects ================================= .. sectionauthor:: Tony Walters Examples of how to initialize and manipulate various tree node objects. .. doctest :: >>> from cogent.core.tree import PhyloNode >>> from cogent import LoadTree >>> from cogent.parse.tree import DndParser The general method to initialize a tree is ``LoadTree``; however, for exceptionally large trees, or if one needs to specify the node objects (``TreeNode``, ``PhyloNode``, or ``RangeNode``), ``DndParser`` should be used. ``LoadTree`` uses ``PhyloNode`` objects by default. The basic properties of the tree node objects are: * ``TreeNode`` objects are general purpose in nature, and lack phylogenetic distance values. * ``PhyloNode`` objects inherit the methods of the ``TreeNode`` class and in addition contain phylogenetic distances.
* ``RangeNode`` objects contain evolution simulation methods in addition to the standard features of a ``PhyloNode``. The following demonstrates the two methods for initializing a phylogenetic tree object. .. doctest :: >>> simple_tree_string="(B:0.2,(C:0.3,D:0.4)E:0.5)F;" >>> complex_tree_string="(((363564 AB294167.1 Alkalibacterium putridalgicola:0.0028006,55874 AB083411.1 Marinilactibacillus psychrotolerans:0.0022089):0.40998,(15050 Y10772.1 Facklamia hominis:0.32304,(132509 AY707780.1 Aerococcus viridans:0.58815,((143063 AY879307.1 Abiotrophia defectiva:0.5807,83619 AB042060.1 Bacillus schlegelii:0.23569):0.03586,169722 AB275483.1 Fibrobacter succinogenes:0.38272):0.06516):0.03492):0.14265):0.63594,(3589 M62687.1 Fibrobacter intestinalis:0.65866,314063 CP001146.1 Dictyoglomus thermophilum:0.38791):0.32147,276579 EU652053.1 Thermus scotoductus:0.57336);" >>> simple_tree=LoadTree(treestring=simple_tree_string) >>> complex_tree=DndParser(complex_tree_string, PhyloNode) Now to displaying, creating, deleting, and inserting a node in simple_tree. Note that simple_tree has three tips, one internal node 'E', and the root 'F.' For this example, we will create a node named 'A', with a distance of 0.1, delete the node 'C' through its parent, the internal node 'E', and finally we will insert 'A' where 'C' once was. Display the original tree. .. doctest :: >>> print simple_tree.asciiArt() /-B -F-------| | /-C \E-------| \-D Create a new node object. .. doctest :: >>> A_node=PhyloNode(Name='A',Length=0.1) Display the children of the root node, one of which is the parent of the tip we wish to alter. To add or remove a node, we need to use the parent of the target node, which in this case is the internal node 'E.' .. doctest :: >>> print simple_tree.Children [Tree("B;"), Tree("(C,D)E;")] Remove the 'C' tip. **Note:** ``remove()`` and ``removeNode()`` return 'True' if a node is removed, 'False' if they cannot remove a node. .. 
doctest :: >>> simple_tree.Children[1].remove('C') True Insert the new 'A' tip where 'C' was previously. .. doctest :: >>> simple_tree.Children[1].insert(0,A_node) Finally, display the modified tree. .. doctest :: >>> print simple_tree.asciiArt() /-B -F-------| | /-A \E-------| \-D When deleting tree nodes, it is often desirable to clean up any unbranched internal nodes that may have resulted from removal of tips. For example, if we wanted to delete the node 'A' that was previously added, the resulting tree would have an unbranched internal node 'E.' .. doctest :: >>> simple_tree.Children[1].remove('A') True >>> print simple_tree.asciiArt() /-B -F-------| \E------- /-D With the ``prune()`` method, internal nodes with only a single branch are removed. .. doctest :: >>> simple_tree.prune() >>> print simple_tree.asciiArt() /-B -F-------| \-D An Example of Conditional Tree Node Modifications ================================================= Now to look at the more complex and realistic tree. In complex_tree, there are no internal nodes or a defined root. In order to display this tree in a more succinct manner, we can rename these tips to only contain the genus and species names. To step through the tips only, we can use the ``iterTips()`` iterator, and rename each node. The ``asciiArt()`` function, by default, will attempt to display internal nodes; this can be suppressed by the parameter ``show_internal=False``. First, let's split the ungainly name string for each tip and only preserve the genus and species component, separated by a space. .. doctest :: >>> for n in complex_tree.iterTips(): ... n.Name=n.Name.split()[2]+" "+n.Name.split()[3] Now we display the tree with ``asciiArt()``. .. 
doctest :: >>> print complex_tree.asciiArt(show_internal=False) /-Alkalibacterium putridalgicola /--------| | \-Marinilactibacillus psychrotolerans /--------| | | /-Facklamia hominis | | | | \--------| /-Aerococcus viridans | | | | \--------| /-Abiotrophia defectiva | | /--------| ---------| \--------| \-Bacillus schlegelii | | | \-Fibrobacter succinogenes | | /-Fibrobacter intestinalis |---------| | \-Dictyoglomus thermophilum | \-Thermus scotoductus For another example of manipulating a phylogenetic tree, let us suppose that we want to remove any species in the tree that are not closely related to *Aerococcus viridans*. To do this, we will delete any nodes that have a greater phylogenetic distance than 1.8 from *Aerococcus viridans*. The best method to remove a large number of nodes from a tree is to first create a list of nodes to delete, followed by the actual removal process. It is important that the ``prune()`` function be called after deletion of each node to ensure that internal nodes whose tips are deleted are removed instead of becoming tips. Alternatively, one could test for internal nodes whose children are deleted in the procedure and flag these nodes to be deleted as well. First, generate a list of tip nodes. .. doctest :: >>> tips=complex_tree.tips() Next, iterate through this list, compare the distances to *Aerococcus*, and append to the deletion list if greater than 1.8. .. doctest :: >>> tips_to_delete=[] >>> AEROCOCCUS_INDEX=3 >>> for n in tips: ... if tips[AEROCOCCUS_INDEX].distance(n)>1.8: ... tips_to_delete.append(n) Now for the actual deletion process. We can simply use the parent of each node in the deletion list to remove itself. Pruning is necessary to prevent internal nodes from being left as tips. **Note:** ``remove()`` and ``removeNode()`` return 'True' if a node is successfully removed, 'False' otherwise. .. doctest :: >>> for n in tips_to_delete: ... n.Parent.remove(n) ... 
complex_tree.prune() True True True Finally, print the modified complex_tree. .. doctest :: >>> print complex_tree.asciiArt(show_internal=False) /-Alkalibacterium putridalgicola /--------| | \-Marinilactibacillus psychrotolerans --------- /--------| | /-Facklamia hominis | | \--------| /-Aerococcus viridans | | \--------| /-Abiotrophia defectiva | /--------| \--------| \-Bacillus schlegelii | \-Fibrobacter succinogenes PyCogent-1.5.3/doc/examples/motif_results.rst000644 000765 000024 00000013225 11357233473 022267 0ustar00jrideoutstaff000000 000000 Motif results example ===================== .. sectionauthor:: Jeremy Widmann In this example we will be parsing a motif results file and doing some basic operations highlighting the features of the various core motif handling objects. We first want to import the necessary modules. .. doctest:: >>> from cogent import LoadSeqs >>> from cogent.parse.meme import MemeParser Now we want to parse the MEME (http://meme.sdsc.edu) motif results file and the fasta file that we passed to MEME originally. This will construct a ``MotifResults`` object and a ``SequenceCollection`` object respectively, then add the ``SequenceCollection`` to the ``MotifResults``. .. doctest:: >>> results = MemeParser(open('data/motif_example_meme_results.txt','U')) >>> seqs = LoadSeqs('data/motif_example.fasta',aligned=False) >>> results.Alignment = seqs Lets quickly look at an overview of the ``MotifResults``. First, we can check the ``MolType`` of the sequences, how many sequences were searched, and how many distinct motifs were found. .. doctest:: >>> results.MolType.label 'protein' >>> len(results.Alignment.NamedSeqs) 96 >>> len(results.Motifs) 10 Here 10 unique motifs were found searching 96 protein sequences. Now lets look in more detail at the motifs that were found by MEME. The ``MotifResults`` object has a list of ``Motif`` objects. Each ``Motif`` object contains a list of ``Module`` objects that make up the motif. 
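Later in this example, per-position uncertainties are printed for a module. The quantity behind such values is the Shannon entropy of the residue frequencies at each alignment column; the following is a minimal standalone sketch of that calculation (plain Python, not PyCogent's ``uncertainties()`` implementation).

```python
from collections import Counter
from math import log2

def column_uncertainty(column):
    """Shannon entropy (bits) of the residue frequencies in one alignment column."""
    freqs = [n / len(column) for n in Counter(column).values()]
    ent = -sum(f * log2(f) for f in freqs)
    return ent if ent > 0 else 0.0  # avoid -0.0 for fully conserved columns

print(column_uncertainty("CCCCCCCC"))  # 0.0  (fully conserved)
print(column_uncertainty("ACGTACGT"))  # 2.0  (even split over four residues)
```

Smaller values mean less uncertainty, so more conserved positions.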
In this example, each ``Motif`` has only one ``Module``. Show the module ID, Evalue, and number of instances of the module in the set of sequences. .. doctest:: >>> for motif in results.Motifs: ... module = motif.Modules[0] ... print module.ID, module.Evalue, len(module.NamedSeqs) ... 1 0.0 50 2 7.3e-239 41 3 2.2e-254 45 4 7.8e-153 37 5 2.9e-120 13 6 2e-99 29 7 4.2e-54 20 8 6.1e-41 29 9 3.5e-15 9 10 5.3e-14 7 Module 1 has the smallest Evalue and the most instances, so we'll look at this one in more detail. The ``Module`` object is a subclass of the core ``Alignment`` object, so it shares much of that functionality. We can look at the consensus sequence for Module 1 calculated in different ways. Let's compare the consensus that MEME provides, the calculated majority consensus, and the calculated IUPAC consensus. .. doctest:: >>> module_1 = results.Motifs[0].Modules[0] >>> module_1.ID '1' >>> module_1.ConsensusSequence 'GKPVVVDFWATWCGPCRxEAPILEELAKE' >>> module_1.majorityConsensus(transform=str) 'GKPVVVDFWATWCGPCRAEAPILEELAKE' >>> module_1.IUPACConsensus() 'XXXXXXXXXXXXCXXCXXXXXXXXXXXXX' Here we can see that the consensus sequences provided by MEME and the calculated majority consensus are about the same. The IUPAC consensus is an easy way to see if any positions in the module are absolutely conserved. To get a better idea of the conservation of the module, we can calculate the uncertainties for every position in the module. .. doctest:: >>> iupac = module_1.IUPACConsensus() >>> majority = module_1.majorityConsensus() >>> uncertainty = module_1.uncertainties() >>> for i,m,u in zip(iupac,majority,uncertainty): ... print i,m,u ...
X G 2.69585768303 X K 2.29582593843 X P 2.96578451217 X V 1.61117952123 X V 1.91067699662 X V 2.01512726036 X D 1.57769736083 X F 0.777268500731 X W 2.0045407601 X A 0.522179190202 X T 2.70369216641 X W 0.282292189082 C C 0.0 X G 1.96072818839 X P 0.937268500731 C C 0.0 X R 2.03875770182 X A 3.68637013016 X E 2.60359082041 X A 2.9672863748 X P 0.282292189082 X I 3.49915032218 X L 2.19664948376 X E 2.71346937346 X E 2.49058231553 X L 1.94895812367 X A 2.71230564207 X K 2.85533775047 X E 2.36191706121 The first column is the IUPAC consensus symbol, the second column is the majority consensus symbol, and the third column is the uncertainty at a given position in the module. The smaller the number, the less uncertainty, and therefore the more conserved the majority residue is at that position. Now that we have examined Module 1 in some detail, lets do some more simple tasks. How many different sequences is Module 1 located in? .. doctest:: :options: +NORMALIZE_WHITESPACE >>> module_1.LocationDict {'18309723': [284], '15614085': [58], '15966937': [59], ... >>> len(module_1.LocationDict) 49 The ``LocationDict`` property of the ``Module`` object is a dictionary of sequences IDs and indices in the sequence where the module was found. Here we see that Module 1 was found in 49 different sequences, which means that it was found twice in one sequence. We can find what other modules were found to have more than one instance in a given sequence. .. doctest:: >>> for motif in results.Motifs: ... module = motif.Modules[0] ... for seq_id, indices in module.LocationDict.items(): ... if len(indices) > 1: ... print module.ID, seq_id, indices ... 1 18406743 [42, 362] 3 18406743 [104, 264, 424] We see that Module 1 and Module 3 have more than one instance in sequence 18406743. Since this sequence is the only one to contain multiple instances of the same module, lets quickly examine some statistics of the alignment. .. 
doctest:: >>> len(results.Alignment.NamedSeqs['18406743']) 578 >>> lengths = [len(seq) for seq in results.Alignment.Seqs] >>> min(lengths) 89 >>> max(lengths) 578 >>> sum(lengths)/float(len(lengths)) 169.86458333333334 This sequence is the longest of all the sequences searched and more than 3 times longer than the average sequence. PyCogent-1.5.3/doc/examples/neutral_test.rst000644 000765 000024 00000010760 11425201333 022064 0ustar00jrideoutstaff000000 000000 A test of the neutral theory ============================ .. sectionauthor:: Gavin Huttley This file contains an example for performing a likelihood ratio test of neutrality. The test compares a model where the codon model parameter omega is constrained to be the same for all edges against one where each edge has its own omega. From cogent import all the components we need. .. doctest:: >>> from cogent import LoadSeqs, LoadTree >>> from cogent.evolve.models import MG94GTR >>> from cogent.maths import stats Get your alignment and tree. .. doctest:: >>> al = LoadSeqs("data/long_testseqs.fasta") >>> t = LoadTree("data/test.tree") We use a Muse and Gaut 1994 model. .. doctest:: >>> sm = MG94GTR() Make the controller object. .. doctest:: >>> lf = sm.makeLikelihoodFunction(t, digits=2, space=2) Get the likelihood function object; this object performs the actual likelihood calculation. .. doctest:: >>> lf.setAlignment(al) By default, parameters other than branch lengths are treated as global in scope, so we don't need to do anything special here. We can influence how rigorous the optimisation will be, and switch between the global and local optimisers provided in the toolkit using arguments to the optimise method. The ``global_tolerance=1.0`` argument specifies conditions for an early break from simulated annealing, which will be automatically followed by the Powell local optimiser. .. note:: the 'results' are of course nonsense. ..
doctest:: >>> lf.optimise(global_tolerance = 1.0) View the resulting maximum-likelihood parameter values. .. doctest:: >>> print lf Likelihood Function Table =================================== A/C A/G A/T C/G C/T omega ----------------------------------- 1.02 3.36 0.73 0.95 3.71 0.90 ----------------------------------- ========================= edge parent length ------------------------- Human edge.0 0.09 HowlerMon edge.0 0.12 edge.0 edge.1 0.12 Mouse edge.1 0.84 edge.1 root 0.06 NineBande root 0.28 DogFaced root 0.34 ------------------------- ============= motif mprobs ------------- T 0.23 C 0.19 A 0.37 G 0.21 ------------- We'll get the lnL and number of free parameters for later use. .. doctest:: >>> null_lnL = lf.getLogLikelihood() >>> null_nfp = lf.getNumFreeParams() Specify that each edge has its own omega by just modifying the existing ``lf``. This means the new function will start with the above values. .. doctest:: >>> lf.setParamRule("omega", is_independent = True) Optimise the likelihood function, this time just using the local optimiser. .. doctest:: >>> lf.optimise(local = True) View the resulting maximum-likelihood parameter values. .. doctest:: >>> print lf Likelihood Function Table ============================ A/C A/G A/T C/G C/T ---------------------------- 1.03 3.38 0.73 0.95 3.72 ---------------------------- ================================ edge parent length omega -------------------------------- Human edge.0 0.09 0.59 HowlerMon edge.0 0.12 0.96 edge.0 edge.1 0.11 1.13 Mouse edge.1 0.83 0.92 edge.1 root 0.06 0.39 NineBande root 0.28 1.28 DogFaced root 0.34 0.84 -------------------------------- ============= motif mprobs ------------- T 0.23 C 0.19 A 0.37 G 0.21 ------------- Get out an annotated tree; it looks just like a tree, but has the maximum-likelihood parameter estimates attached to each tree edge. This object can be used for plotting, or to provide starting estimates to a related model. ..
doctest:: >>> at = lf.getAnnotatedTree() The lnL's from the two models are now used to calculate the likelihood ratio statistic (``LR``), its degrees-of-freedom (``df``), and the probability (``P``) of observing the LR. .. doctest:: >>> LR = 2 * (lf.getLogLikelihood() - null_lnL) >>> df = lf.getNumFreeParams() - null_nfp >>> P = stats.chisqprob(LR, df) Print this and look up a chi-sq with number of edges - 1 degrees of freedom. .. doctest:: >>> print "Likelihood ratio statistic = ", LR Likelihood ratio statistic = 8... >>> print "degrees-of-freedom = ", df degrees-of-freedom = 6 >>> print "probability = ", P probability = 0.2... PyCogent-1.5.3/doc/examples/parametric_bootstrap.rst000644 000765 000024 00000004613 11425201333 023577 0ustar00jrideoutstaff000000 000000 .. _parametric-bootstrap: Performing a parametric bootstrap ================================= .. sectionauthor:: Gavin Huttley This file contains an example for estimating the probability of a likelihood ratio statistic obtained from a relative rate test. The bootstrap classes can take advantage of parallel architectures. From cogent import all the components we need. .. doctest:: >>> from cogent import LoadSeqs, LoadTree >>> from cogent.evolve import bootstrap >>> from cogent.evolve.models import HKY85 >>> from cogent.maths import stats Define a function that takes an alignment object and returns a likelihood function properly assembled for the alternative model. We will use a HKY model. .. doctest:: >>> def create_alt_function(): ... t = LoadTree("data/test.tree") ... sm = HKY85() ... return sm.makeLikelihoodFunction(t) Now define the null model, under which the sample distribution is generated.
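The ``stats.chisqprob`` call in the likelihood ratio test above is just the chi-square survival function. For even degrees of freedom (as with ``df = 6`` there) it has a simple closed form, sketched below in plain Python; this is purely illustrative, as PyCogent's implementation handles arbitrary ``df``.

```python
from math import exp

def chisq_sf(x, df):
    """P(X >= x) for a chi-square variable; closed form valid for even df only."""
    assert df > 0 and df % 2 == 0, "this sketch only covers even df"
    half = x / 2.0
    term, total = 1.0, 1.0  # the k = 0 term of sum_{k < df/2} half**k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total

# For a likelihood ratio statistic near 8 with df=6 this gives ~0.24,
# consistent with the probability printed in the doctest above.
print(round(chisq_sf(8.0, 6), 2))  # 0.24
```

The closed form is P = exp(-x/2) * sum of (x/2)**k / k! for k from 0 to df/2 - 1.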
Since the two models are identical bar the constraint on the branch lengths, we'll use the same code to generate the basic likelihood function as for the alt model, and then apply the constraint here .. doctest:: >>> def create_null_function(): ... lf = create_alt_function() ... # set the local clock for humans & howler monkey ... lf.setLocalClock("Human", "HowlerMon") ... return lf Get our observed data alignment .. doctest:: >>> aln = LoadSeqs(filename = "data/long_testseqs.fasta") Create a ``EstimateProbability`` bootstrap instance .. doctest:: >>> estimateP = bootstrap.EstimateProbability(create_null_function(), ... create_alt_function(), ... aln) Specify how many random samples we want it to generate. Here we use a very small number of replicates only for the purpose of testing. .. doctest:: >>> estimateP.setNumReplicates(5) Run it. .. doctest:: >>> estimateP.run() Get the estimated probability. .. doctest:: >>> p = estimateP.getEstimatedProb() ``p`` is a floating point value, as you'd expect. Grab the estimated likelihoods (null and alternate) for the observed data. .. doctest:: >>> print '%.2f, %.2f' % estimateP.getObservedlnL() -8751.94, -8750.59 PyCogent-1.5.3/doc/examples/perform_nmds.rst000644 000765 000024 00000003031 11213034174 022041 0ustar00jrideoutstaff000000 000000 Perform Nonmetric Multidimensional Scaling ========================================== .. sectionauthor:: Justin Kuczynski An example of how to use nmds. .. doctest:: >>> from cogent.cluster.nmds import NMDS >>> from cogent.maths.distance_transform import dist_euclidean >>> from numpy import array We start with an abundance matrix, samples (rows) by sequences/species (cols) .. doctest:: >>> abundance = array( ... [[7,1,0,0,0,0,0,0,0], ... [4,2,0,0,0,1,0,0,0], ... [2,4,0,0,0,1,0,0,0], ... [1,7,0,0,0,0,0,0,0], ... [0,8,0,0,0,0,0,0,0], ... [0,7,1,0,0,0,0,0,0],#idx 5 ... [0,4,2,0,0,0,2,0,0], ... [0,2,4,0,0,0,1,0,0], ... [0,1,7,0,0,0,0,0,0], ... [0,0,8,0,0,0,0,0,0], ... 
[0,0,7,1,0,0,0,0,0],#idx 10 ... [0,0,4,2,0,0,0,3,0], ... [0,0,2,4,0,0,0,1,0], ... [0,0,1,7,0,0,0,0,0], ... [0,0,0,8,0,0,0,0,0], ... [0,0,0,7,1,0,0,0,0],#idx 15 ... [0,0,0,4,2,0,0,0,4], ... [0,0,0,2,4,0,0,0,1], ... [0,0,0,1,7,0,0,0,0]], 'float') Then compute a distance matrix of your choosing, and perform nmds on that matrix .. doctest:: >>> distmtx = dist_euclidean(abundance) >>> nm = NMDS(distmtx, verbosity=0) The NMDS object provides a list of points, which can be plotted if desired .. doctest:: >>> pts = nm.getPoints() >>> stress = nm.getStress() With matplotlib installed, we could then do ``plt.plot(pts[:,0], pts[:,1])`` PyCogent-1.5.3/doc/examples/perform_PCoA_analysis.rst000644 000765 000024 00000004502 11572751111 023576 0ustar00jrideoutstaff000000 000000 Perform Principal Coordinates Analysis ====================================== .. sectionauthor:: Cathy Lozupone An example of how to calculate the pairwise distances for a set of sequences. .. doctest:: >>> from cogent import LoadSeqs >>> from cogent.phylo import distance >>> from cogent.cluster.metric_scaling import PCoA Import a substitution model (or create your own) .. doctest:: >>> from cogent.evolve.models import HKY85 Load the alignment. .. doctest:: >>> al = LoadSeqs("data/test.paml") Create a pairwise distances object calculator for the alignment, providing a substitution model instance. .. doctest:: >>> d = distance.EstimateDistances(al, submodel= HKY85()) >>> d.run() Now use this matrix to perform principal coordinates analysis. .. 
doctest:: >>> PCoA_result = PCoA(d.getPairwiseDistances()) >>> print PCoA_result ====================================================================================== Type Label vec_num-0 vec_num-1 vec_num-2 vec_num-3 vec_num-4 -------------------------------------------------------------------------------------- Eigenvectors NineBande -0.02 0.01 0.04 0.01 0.00 Eigenvectors DogFaced -0.04 -0.06 -0.01 0.00 0.00 Eigenvectors HowlerMon -0.07 0.01 0.01 -0.02 0.00 Eigenvectors Mouse 0.20 0.01 -0.01 -0.00 0.00 Eigenvectors Human -0.07 0.04 -0.03 0.01 0.00 Eigenvalues eigenvalues 0.05 0.01 0.00 0.00 0.00 Eigenvalues var explained (%) 85.71 9.60 3.73 0.95 0.00 -------------------------------------------------------------------------------------- We can save these results to a file in a delimited format (we'll use tab here) that can be opened up in any data analysis program, like R or Excel. Here the principal coordinates can be plotted against each other for visualization. .. doctest:: >>> PCoA_result.writeToFile('PCoA_results.txt',sep='\t') .. We don't actually want to keep that file now, so I'm importing the ``os`` module to delete it. .. doctest:: >>> import os >>> os.remove('PCoA_results.txt') PyCogent-1.5.3/doc/examples/period_estimation.rst000644 000765 000024 00000021314 12014672411 023072 0ustar00jrideoutstaff000000 000000 *************************** Estimating periodic signals *************************** .. sectionauthor:: Gavin Huttley, Julien Epps, Hua Ying We consider two different scenarios: - estimating the periods in a signal - estimating the power for a given period - measuring statistical significance for the latter case Estimating the periods in a signal ================================== For numerical (continuous) data ------------------------------- We first make some sample data. A periodic signal and some noise. .. We set a seed for the random number generator so that we can get consistent generation of the same series. 
This makes the document robust for doctesting. .. doctest:: :hide: >>> import numpy >>> numpy.random.seed(11) .. doctest:: >>> import numpy >>> t = numpy.arange(0, 10, 0.1) >>> n = numpy.random.randn(len(t)) >>> nse = numpy.convolve(n, numpy.exp(-t/0.05))*0.1 >>> nse = nse[:len(t)] >>> sig = numpy.sin(2*numpy.pi*t) + nse Discrete Fourier transform ^^^^^^^^^^^^^^^^^^^^^^^^^^ We now use the discrete Fourier transform to estimate periodicity in this signal. Given we set the period to equal 10, we expect the maximum power for that index. .. doctest:: >>> from cogent.maths.period import dft >>> pwr, period = dft(sig) >>> print period [ 2. 2.04081633 2.08333333 2.12765957 2.17391304 2.22222222 2.27272727 2.3255814 2.38095238 2.43902439 2.5 2.56410256 2.63157895 2.7027027 2.77777778 2.85714286 2.94117647 3.03030303 3.125 3.22580645... >>> print pwr [ 1.06015801 +0.00000000e+00j 0.74686707 -1.93971914e-02j 0.36784793 -2.66370366e-02j 0.04384413 +2.86970840e-02j 1.54473269 -2.43777386e-02j 0.28522968 -2.33602932e-01j... The power (``pwr``) is returned as an array of complex numbers, so we convert into real numbers using ``abs``. We then zip the power and corresponding periods and sort to identify the period with maximum signal. >>> pwr = abs(pwr) >>> max_pwr, max_period = sorted(zip(pwr,period))[-1] >>> print max_pwr, max_period 50.7685934719 10.0 Auto-correlation ^^^^^^^^^^^^^^^^ We now use auto-correlation. .. doctest:: >>> from cogent.maths.period import auto_corr >>> pwr, period = auto_corr(sig) >>> print period [ 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24... >>> print pwr [ 1.63366075e+01 -1.47309007e+01 -3.99310414e+01 -4.94779387e+01... We then zip the power and corresponding periods and sort to identify the period with maximum signal. .. doctest:: >>> max_pwr, max_period = sorted(zip(pwr,period))[-1] >>> print max_pwr, max_period 46.7917300733 10 For symbolic data ----------------- We create a sequence as just a string .. 
doctest:: >>> s = 'ATCGTTGGGACCGGTTCAAGTTTTGGAACTCGCAAGGGGTGAATGGTCTTCGTCTAACGCTGG'\ ... 'GGAACCCTGAATCGTTGTAACGCTGGGGTCTTTAACCGTTCTAATTTAACGCTGGGGGGTTCT'\ ... 'AATTTTTAACCGCGGAATTGCGTC' We then specify the motifs whose occurrences will be converted into 1, with all other motifs converted into 0. As we might want to do this in batches for many sequences we use a factory function. .. doctest:: >>> from cogent.maths.stats.period import SeqToSymbols >>> seq_to_symbols = SeqToSymbols(['AA', 'TT', 'AT']) >>> symbols = seq_to_symbols(s) >>> len(symbols) == len(s) True >>> symbols array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0... We then estimate the integer discrete Fourier transform for the full data. To do this, we need to pass in the symbols from full conversion of the sequence. The returned values are the powers and periods. .. doctest:: >>> from cogent.maths.period import ipdft >>> powers, periods = ipdft(symbols) >>> powers #doctest: +SKIP array([ 3.22082108e-14, 4.00000000e+00, 9.48683298e+00, 6.74585634e+00, 3.46410162e+00, 3.20674669e+00,... >>> periods array([ 2, 3, 4... We can also compute the auto-correlation statistic, and the hybrid (which combines IPDFT and auto-correlation). .. doctest:: >>> from cogent.maths.period import auto_corr, hybrid >>> powers, periods = auto_corr(symbols) >>> powers array([ 11., 9., 11., 9., 6... >>> periods array([ 2, 3, 4... >>> powers, periods = hybrid(symbols) >>> powers #doctest: +SKIP array([ 3.54290319e-13, 3.60000000e+01, 1.04355163e+02, 6.07127071e+01, 2.07846097e+01, 2.88607202e+01,... >>> periods array([ 2, 3, 4... Estimating power for specified period ===================================== For numerical (continuous) data ------------------------------- We just use ``sig`` created above. The Goertzel algorithm gives the same result as the ``dft``. .. doctest:: >>> from cogent.maths.period import goertzel >>> pwr = goertzel(sig, 10) >>> print pwr 50.7685934719 For symbolic data ----------------- .. 
take example above and show how to compute it using autocorrelation We use the symbols from the above example. For the ``ipdft``, ``auto_corr`` and ``hybrid`` functions we just need to identify the array index containing the period of interest and slice the corresponding value from the returned powers. The reported periods start at ``llim``, which defaults to 2, but indexes start at 0, the index for a period-5 is simply 5-``llim``. .. doctest:: >>> powers, periods = auto_corr(symbols) >>> llim = 2 >>> period5 = 5-llim >>> periods[period5] 5 >>> powers[period5] 9.0 For Fourier techniques, we can compute the power for a specific period more efficiently using Goertzel algorithm. .. doctest:: >>> from cogent.maths.period import goertzel >>> period = 4 >>> power = goertzel(symbols, period) >>> ipdft_powers, periods = ipdft(symbols) >>> ipdft_power = abs(ipdft_powers[period-llim]) >>> round(power, 6) == round(ipdft_power, 6) True >>> power 9.4868... It's also possible to specify a period to the stand-alone functions. As per the ``goertzel`` function, just the power is returned. .. doctest:: >>> power = hybrid(symbols, period=period) >>> power 104.355... Measuring statistical significance of periodic signals ====================================================== For numerical (continuous data) ------------------------------- We use the signal provided above. Because significance testing is being done using a resampling approach, we define a calculator which precomputes some values to improve compute performance. For a continuous signal, we'll use the Goertzel algorithm. .. doctest:: >>> from cogent.maths.period import Goertzel >>> goertzel_calc = Goertzel(len(sig), period=10) Having defined this, we then just pass this calculator to the ``blockwise_bootstrap`` function. 
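For reference, the recurrence such a Goertzel calculator evaluates can be written in a few lines. This is a plain-Python sketch of the algorithm itself, not PyCogent's ``Goertzel`` class:

```python
from math import cos, pi, sin, sqrt

def goertzel_power(signal, period):
    """Magnitude of the Fourier component at the given period, via Goertzel's recurrence."""
    coeff = 2.0 * cos(2.0 * pi / period)
    s_prev = s_prev2 = 0.0
    for x in signal:
        # RHS uses the old state values before the tuple assignment updates them
        s_prev2, s_prev = s_prev, x + coeff * s_prev - s_prev2
    return sqrt(s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2)

# A pure period-10 sine of length 100 concentrates its power at period 10,
# giving a magnitude of N/2 = 50, matching the scale seen in the doctests above.
sig = [sin(2.0 * pi * i / 10.0) for i in range(100)]
print(round(goertzel_power(sig, 10), 1))  # 50.0
```

Compared with computing the full DFT, this evaluates only the single frequency of interest, which is why it suits the many repeated evaluations a permutation test requires.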
The other critical settings are the ``block_size`` which specifies the size of segments of contiguous sequence positions to use for sampling and ``num_reps`` which is the number of permuted replicate sequences to generate. .. doctest:: >>> from cogent.maths.stats.period import blockwise_bootstrap >>> obs_stat, p = blockwise_bootstrap(sig, calc=goertzel_calc, block_size=10, ... num_reps=1000) >>> print obs_stat 50.7685934719 >>> print p 0.0 For symbolic data ----------------- Permutation testing ^^^^^^^^^^^^^^^^^^^ The very notion of permutation testing for periods, applied to a genome, requires the compute performance be as quick as possible. This means providing as much information up front as possible. We have made the implementation flexible by not assuming how the user will convert sequences to symbols. It's also the case that numerous windows of exactly the same size are being assessed. Accordingly, we use a class to construct a fixed signal length evaluator. We do this for the hybrid metric first. .. doctest:: >>> from cogent.maths.period import Hybrid >>> len(s) 150 >>> hybrid_calculator = Hybrid(len(s), period = 4) .. note:: We defined the period length of interest in defining this calculator because we're interested in dinucleotide motifs. We then construct a seq-to-symbol convertor. .. doctest:: >>> from cogent.maths.stats.period import SeqToSymbols >>> seq_to_symbols = SeqToSymbols(['AA', 'TT', 'AT'], length=len(s)) The rest is as per the analysis using ``Goertzel`` above. .. doctest:: >>> from cogent.maths.stats.period import blockwise_bootstrap >>> stat, p = blockwise_bootstrap(s, calc=hybrid_calculator, ... block_size=10, num_reps=1000, seq_to_symbols=seq_to_symbols) ... >>> print stat 104.35... >>> p < 0.1 True PyCogent-1.5.3/doc/examples/phylo_by_ls.rst000644 000765 000024 00000014674 11444532333 021716 0ustar00jrideoutstaff000000 000000 Phylogenetic reconstruction by least squares ============================================ .. 
sectionauthor:: Gavin Huttley We will load some pre-computed pairwise distance data. To see how that data was computed, see the :ref:`calculating-pairwise-distances` example. That data is saved in a format called ``pickle``, which is native to Python. As per usual, we import the basic components we need. .. recompute the data matrix and then delete file at end .. doctest:: :hide: >>> from cogent import LoadSeqs >>> from cogent.phylo import distance >>> from cogent.evolve.models import HKY85 >>> al = LoadSeqs("data/long_testseqs.fasta") >>> d = distance.EstimateDistances(al, submodel= HKY85()) >>> d.run() >>> import cPickle >>> f = open('dists_for_phylo.pickle', "w") >>> cPickle.dump(d.getPairwiseDistances(), f) >>> f.close() .. doctest:: >>> import cPickle >>> from cogent.phylo import distance, least_squares Now load the distance data. .. doctest:: >>> f = file('dists_for_phylo.pickle', 'r') >>> dists = cPickle.load(f) >>> f.close() If there are extremely small distances, they can cause an error in the least squares calculation. Since such estimates are between extremely closely related sequences, we could simply drop all distances for one of the sequences. We won't do that here; we'll leave that as an exercise. We make the ls calculator. .. doctest:: >>> ls = least_squares.WLS(dists) We will search tree space for the collection of best trees using the advanced stepwise addition algorithm (hereafter *asaa*). Look for the single best tree ----------------------------- In this use case we are after just 1 tree. We specify the number of taxa (``a``) up to which all possible trees for the sample will be computed. Here we are specifying ``a=5``. This means 5 sequences will be picked randomly and all possible trees relating them will be evaluated. ``k=1`` means only the best tree will be kept at the end of each such round of evaluation. For every remaining sequence it is grafted onto every possible branch of this tree.
The best ``k`` results are then taken to the next round, when another sequence is randomly selected for addition. This proceeds until all sequences have been added. The result with the following arguments is a single wls score and a single ``Tree``, which can be saved etc.

.. doctest::

    >>> score, tree = ls.trex(a = 5, k = 1)
    >>> assert score < 1e-4

We won't display this tree, because we are doing more below.

A more rigorous tree space search
---------------------------------

We change the asaa settings, so we keep more trees and then look at the distribution of the statistics for the last collection of trees. We could also change ``a`` to be larger, but in the current case we just adjust ``k``. We also set the argument ``return_all = True``, the effect of which is to return the complete set of saved trees. These, and their support statistic, can then be inspected.

.. doctest::

    >>> trees = ls.trex(a = 5, k = 5, return_all = True)

Remember the sum-of-squares statistic will be smaller for 'good' trees. The order of the trees returned is from good to bad. The number of returned ``trees`` is the same as the number requested to be retained at each step.

.. doctest::

    >>> print len(trees)
    5

Let's inspect the resulting statistics. First, the object ``trees`` is a list of ``(wls, Tree)`` tuples. We will therefore loop over the list to generate a separate list of just the wls statistics. The following syntax is called a list comprehension - basically just a very succinct ``for`` loop.

.. doctest::

    >>> wls_stats = [tree[0] for tree in trees]

The ``wls_stats`` is a list which, if printed, looks like

.. code-block:: python

    [1.3308768548934439e-05, 0.0015588630350439783, ...

From this you'll see that the first 5 results are very similar to each other and would probably reasonably be considered equivalently supported topologies. I'll just print the first two of these trees after balancing them (in order to make their representations as equal as possible).

.. doctest::

    >>> t1 = trees[0][1].balanced()
    >>> t2 = trees[1][1].balanced()
    >>> print t1.asciiArt()
              /-Human
     /edge.0--|
    |          \-HowlerMon
    |
    -root----|--Mouse
    |
    |          /-NineBande
     \edge.1--|
               \-DogFaced
    >>> print t2.asciiArt()
     /-DogFaced
    |
    |          /-Human
    -root----|-edge.0--|
    |          \-HowlerMon
    |
    |          /-NineBande
     \edge.1--|
               \-Mouse

You can see the difference involves the placement of the Mouse and DogFaced lineages.

Assessing the fit for a pre-specified tree topology
---------------------------------------------------

In some instances we may have a tree from the literature or elsewhere whose fit to the data we seek to evaluate. In this case I'm going to load a tree as follows.

.. doctest::

    >>> from cogent import LoadTree
    >>> query_tree = LoadTree(
    ...     treestring="((Human:.2,DogFaced:.2):.3,(NineBande:.1, Mouse:.5):.2,HowlerMon:.1)")

We now just use the ``ls`` object created above. The following evaluates the query using its associated branch lengths, returning only the wls statistic.

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> ls.evaluateTree(query_tree)
    2.8...

We can also evaluate just the tree's topology, returning both the wls statistic and the tree with best-fit branch lengths.

.. doctest::

    >>> wls, t = ls.evaluateTopology(query_tree)
    >>> assert "%.4f" % wls == '0.0084'

Using maximum likelihood for measuring tree fit
-----------------------------------------------

This is a much slower algorithm and the interface largely mirrors that for the above. The difference is you import ``maximum_likelihood`` instead of ``least_squares``, and use the ``ML`` instead of ``WLS`` classes. The ``ML`` class requires a substitution model (like HKY85 for DNA or JTT92 for protein), and an alignment. It also optionally takes a distance matrix, such as that used here, computed for the same sequences. These distances are then used to obtain estimates of branch lengths by the WLS method for each evaluated tree topology, which are then used as starting values for the likelihood optimisation.

.. clean up

.. doctest::
    :hide:

    >>> import os
    >>> os.remove('dists_for_phylo.pickle')

.. _appcontroller-phylogeny:

Using phylogeny application controllers to construct phylogenetic trees from alignments
=======================================================================================

.. sectionauthor:: Daniel McDonald

This document provides a few use case examples of how to use the phylogeny application controllers available in PyCogent. Each phylogeny application controller provides the support method ``build_tree_from_alignment``. This method takes as input an ``Alignment`` object, a ``SequenceCollection`` object or a dict mapping sequence IDs to sequences. The ``MolType`` must also be specified. Optionally, you can indicate if you would like the "best_tree", as well as any additional application parameters. These methods return a ``PhyloNode`` object.

To start, let's import all of our ``build_tree_from_alignment`` methods and our ``MolType``:

.. doctest::

    >>> from cogent.core.moltype import DNA
    >>> from cogent.app.clearcut import build_tree_from_alignment as clearcut_build_tree
    >>> from cogent.app.clustalw import build_tree_from_alignment as clustalw_build_tree
    >>> from cogent.app.fasttree import build_tree_from_alignment as fasttree_build_tree
    >>> from cogent.app.muscle import build_tree_from_alignment as muscle_build_tree
    >>> from cogent.app.raxml import build_tree_from_alignment as raxml_build_tree

Next, we'll load up a test set of sequences and construct an ``Alignment``:

.. doctest::

    >>> from cogent import LoadSeqs
    >>> from cogent.app.muscle import align_unaligned_seqs
    >>> unaligned = LoadSeqs(filename='data/test2.fasta', aligned=False)
    >>> aln = align_unaligned_seqs(unaligned, DNA)

Now, let's construct some trees with default parameters!

.. note:: We are explicitly seeding Clearcut and RAxML to ensure reproducible results, and FastTree's output depends slightly on which version of FastTree is installed.

.. doctest::

    >>> clearcut_tree = clearcut_build_tree(aln, DNA, params={'-s':42})
    >>> clustalw_tree = clustalw_build_tree(aln, DNA)
    >>> fasttree_tree = fasttree_build_tree(aln, DNA)
    >>> muscle_tree = muscle_build_tree(aln, DNA)
    >>> raxml_tree = raxml_build_tree(aln, DNA, params={'-p':42})
    >>> clearcut_tree
    Tree("(Mouse,(((HowlerMon,Human),DogFaced),NineBande));")
    >>> clustalw_tree
    Tree("((DogFaced,(HowlerMon,Human)),Mouse,NineBande);")
    >>> muscle_tree
    Tree("(Mouse,(DogFaced,(Human,(HowlerMon,NineBande))));")
    >>> raxml_tree
    Tree("((HowlerMon,Human),(DogFaced,Mouse),NineBande);")

.. code-block:: python

    >>> fasttree_tree
    Tree("(Mouse,NineBande,(DogFaced,(HowlerMon,Human)0.752)0.508);")

These methods allow the programmer to specify any of the application's parameters. Let's look at an example where we tell Clearcut to use traditional neighbor-joining, shuffle the distance matrix, use Kimura distance correction and explicitly seed the random number generator:

.. doctest::

    >>> clearcut_params = {'-N':True,'-k':True,'-S':True,'-s':42}
    >>> clearcut_tree = clearcut_build_tree(aln, DNA, params=clearcut_params)
    >>> clearcut_tree
    Tree("(((HowlerMon,Human),(NineBande,Mouse)),DogFaced);")

.. _query-ensembl:

Querying Ensembl
================

.. sectionauthor:: Gavin Huttley, Hua Ying

We begin this documentation with a note on dependencies, performance and code status. Ensembl_ makes their data available via MySQL servers, so the ``cogent.db.ensembl`` module has additional dependencies of `MySQL-python`_ and SQLAlchemy_. You can use ``easy_install`` to install the latter, but the former is more involved. If you experience trouble, please post to the PyCogent help forums.
Regarding performance, significant queries to the UK servers from Australia are very slow. The examples in this documentation, for instance, take ~15 minutes to run when pointed at the UK servers. Running against a local installation, however, is ~50 times faster.

On status, the ``cogent.db.ensembl`` module should be considered beta level code. We still strongly advise users to check results for a subset of their analyses against the data from the UK Ensembl web site.

.. _`MySQL-python`: http://sourceforge.net/projects/mysql-python
.. _SQLAlchemy: http://www.sqlalchemy.org/

The major components of Ensembl_ are compara and individual genomes. In all cases extracting data requires connecting to MySQL databases on a server, and the server may be located remotely or locally. For convenience, the critical objects you'll need to query a database are provided in the top-level module import, ie immediately under ``cogent.db.ensembl``.

.. _Ensembl: http://www.ensembl.org

Specifying a Host and Account
-----------------------------

So the first step is to specify what host and account are to be used. On my lab's machines, I have set an environment variable with the username and password for our local installation of the Ensembl_ MySQL databases, e.g. ``ENSEMBL_ACCOUNT="username password"``. So I'll check for that (since the documentation runs much quicker when this is true) and if it's absent, we just set ``account=None`` and the account used defaults to the UK Ensembl_ service. I also define which release of ensembl we'll use in one place to allow easier updating of this documentation.

.. doctest::

    >>> import os
    >>> Release = 67
    >>> from cogent.db.ensembl import HostAccount
    >>> if 'ENSEMBL_ACCOUNT' in os.environ:
    ...     host, username, password = os.environ['ENSEMBL_ACCOUNT'].split()
    ...     account = HostAccount(host, username, password)
    ... else:
    ...     account = None

What Species Are Available?
---------------------------

Another key element, of course, is the species available.
Included as part of ``cogent.db.ensembl`` is the module ``species``. This module contains a class that translates between latin names, common names and ensembl database prefixes. The species listed are guaranteed to be incomplete, given Ensembl's update schedule, so it's possible to dynamically add to this listing, or even change the common name for a given latin name.

.. doctest::

    >>> from cogent.db.ensembl import Species
    >>> print Species
    ================================================================================
       Common Name          Species Name     Ensembl Db Prefix
    --------------------------------------------------------------------------------
         A.aegypti         Aedes aegypti         aedes_aegypti
        A.clavatus  Aspergillus clavatus  aspergillus_clavatus...

In Australia, the common name for *Gallus gallus* is chook, so I'll modify that.

.. doctest::

    >>> Species.amendSpecies('Gallus gallus', 'chook')
    >>> assert Species.getCommonName('Gallus gallus') == 'chook'

You can also add new species when they become available using ``Species.amendSpecies``. Species common names are used to construct attributes on PyCogent ``Compara`` instances. You can get the name that will be used via the ``getComparaName`` method. For species with a real common name

.. doctest::

    >>> Species.getComparaName('Procavia capensis')
    'RockHyrax'

or with a shortened species name

.. doctest::

    >>> Species.getComparaName('Caenorhabditis remanei')
    'Cremanei'

The ``Species`` class is basically used to translate between latin names and ensembl's database naming scheme. It also serves to allow the user to simply enter the common name for a species in order to reference its genome databases. The queries are case-insensitive.

Interrogating a Genome
----------------------

As implied above, Ensembl databases are versioned, hence you must explicitly state what release you want. Aside from that, getting an object for querying a genome is simply a matter of importing the ``HostAccount`` and ``Genome`` classes.
Here I'm going to use the ``cogent.db.ensembl`` level imports.

.. doctest::

    >>> from cogent.db.ensembl import HostAccount, Genome
    >>> human = Genome(Species='human', Release=Release, account=account)
    >>> print human
    Genome(Species='Homo sapiens'; Release='67')

Notice I used the common name rather than the full name. The ``Genome`` provides an interface to obtaining different attributes. Its primary role is to allow selection of genomic regions according to some search criteria. The type of region is presently limited to ``Gene``, ``Est``, ``CpGisland``, ``Repeat`` and ``Variation``. There's also a ``GenericRegion``. The specific types are also capable of identifying information related to themselves, as we will demonstrate below.

A Note on Coordinate Systems
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The positions employed on Ensembl's web-site, and in their MySQL database, differ from those used internally by ``cogent.db.ensembl``. In all cases where you are querying ``cogent.db.ensembl`` objects directly, inputting nucleotide positions, you can indicate you are using Ensembl coordinates by setting ``ensembl_coord=True``. If you are explicitly passing in a ``cogent.db.ensembl`` region, that argument has no effect.

Selecting Genes
^^^^^^^^^^^^^^^

The genome can be queried for genes in a number of ways. You can search for genes using the ``Genome.getGeneByStableId`` method, which requires you know the Ensembl stable id.

.. doctest::

    >>> brca1 = human.getGeneByStableId(StableId='ENSG00000012048')
    >>> print brca1.Description
    breast cancer 1, early onset...

Alternatively, you can query using the ``Genome.getGenesMatching`` method. This method allows querying for gene(s) by the following identifiers: HGNC symbol; Ensembl ``stable_id``; description; or coding type.

.. note:: When querying by description, you can specify that the exact words in the query must be present in the description by setting the argument ``like=True``. The default is ``like=False``.
In general for such queries, case shouldn't matter. For instance, find the *BRCA2* gene by its HGNC symbol.

.. doctest::

    >>> genes = human.getGenesMatching(Symbol='brca2')

Because there can be multiple hits from a ``getGenesMatching`` query, and because we wish to not spend time doing things (like talking to the database) unnecessarily, the result of the query is a python generator. This acts like a series and allows you to iterate over the database hits until you find the one you want and then terminate the record collection.

.. doctest::

    >>> for gene in genes:
    ...     if gene.Symbol.lower() == 'brca2':
    ...         break
    ...
    >>> brca2 = gene # so we keep track of this reference for later on
    >>> print brca2.Symbol
    BRCA2
    >>> print brca2.Description
    breast cancer 2...
    >>> print brca2
    Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='breast...

This code serves to illustrate a few things. First, the sorts of properties that exist on the object. These can be directly accessed as illustrated above. Secondly, that the argument names to ``getGenesMatching`` match the properties. Genes also have a location. The length of a gene is the difference between its start and end location.

.. note:: Unfortunately all gene coordinates can vary between genome builds. So start, end and length can all differ between Ensembl releases for the same gene.

.. doctest::

    >>> print brca2.Location
    Homo sapiens:chromosome:13:32889610...
    >>> print len(brca2)
    84195

Each location is directly tied to the parent genome and the coordinate above also shows the coordinates' *type* (chromosome in this case), name (13), start, end and strand. The start and end positions are python indices and will differ from the Ensembl indices in that start will be the Ensembl index - 1. This is because python counts from 0, not 1. In querying for regions using a specific set of coordinates, it is possible to put in the Ensembl coordinates (demonstrated below).
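The offset described above can be made concrete with a small standalone sketch (plain Python, not part of ``cogent.db.ensembl``; the helper names and coordinates are our own, chosen to match the *BRCA2* example):

```python
def ensembl_to_python(start, end):
    # Ensembl intervals are 1-based and inclusive; Python slice indices are
    # 0-based and half-open, so only the start needs shifting.
    return start - 1, end

def python_to_ensembl(start, end):
    # The reverse transformation.
    return start + 1, end

# The span length is identical under both conventions.
py_start, py_end = ensembl_to_python(32889611, 32973805)
assert py_end - py_start == 32973805 - 32889611 + 1   # 84195, as for BRCA2
# Round-tripping recovers the original Ensembl coordinates.
assert python_to_ensembl(py_start, py_end) == (32889611, 32973805)
```

This is why the start shown by ``brca2.Location`` is one less than the value displayed on the Ensembl web site, while lengths agree.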
``Gene`` has several useful properties, including the ability to directly get its own DNA sequence, its ``CanonicalTranscript`` and its ``Transcripts``. ``CanonicalTranscript`` is the characteristic transcript for a gene, as defined by Ensembl. ``Transcripts`` is a tuple attribute containing individual region instances of type ``Transcript``. A ``Transcript`` has ``Exons``, ``Introns``, a ``Cds`` and, if the ``BioType`` is protein coding, a protein sequence. In the following we grab the canonical transcript from ``brca2``

.. doctest::

    >>> print brca2.BioType
    protein_coding
    >>> print brca2.Seq
    GGGCTTGTGGCGC...
    >>> print brca2.CanonicalTranscript.Cds
    ATGCCTATTGGATC...
    >>> print brca2.CanonicalTranscript.ProteinSeq
    MPIGSKERPTF...

It is also possible to iterate over a transcript's exons, over their translated exons, or to obtain their coding DNA sequence. We grab the first transcript for this.

.. doctest::

    >>> transcript = brca2.Transcripts[0]
    >>> for exon in transcript.Exons:
    ...     print exon, exon.Location
    Exon(StableId=ENSE00001184784, Rank=1) Homo sapiens:chromosome:13:...
    >>> for exon in transcript.TranslatedExons:
    ...     print exon, exon.Location
    Exon(StableId=ENSE00001484009, Rank=2) Homo sapiens:chromosome:13:...
    >>> print transcript.Cds
    ATGCCTATTGGATCCAAA...

The ``Cds`` sequence includes the stop-codon, if present. The reason for this is there are many annotated transcripts in the Ensembl database the length of whose transcribed exons is not divisible by 3. Hence we leave it to the user to decide how to deal with that, but mention here that determining the number of complete codons is trivial and you can slice the ``Cds`` so that its length is divisible by 3. The ``Exons`` and ``TranslatedExons`` properties are tuples that are evaluated on demand and can be sliced. Each ``Exon/TranslatedExon`` is itself a region, with all of the properties of generic regions (like having a ``Seq`` attribute).
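The slicing mentioned above is a one-liner. A minimal standalone sketch (the helper name and the toy sequences are our own, not PyCogent API):

```python
def trim_to_complete_codons(cds):
    # Drop any trailing partial codon so the length is a multiple of 3.
    return cds[:len(cds) - len(cds) % 3]

assert trim_to_complete_codons("ATGCCTATTG") == "ATGCCTATT"  # 10 -> 9 bases
assert trim_to_complete_codons("ATGAAATAG") == "ATGAAATAG"   # already complete
assert len(trim_to_complete_codons("ATGCCTATTG")) % 3 == 0
```

The number of complete codons is then simply ``len(cds) // 3``.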
Similar descriptions apply to the ``Introns`` property and ``Intron`` class. We show this just for the canonical transcript.

.. doctest::

    >>> for intron in brca2.CanonicalTranscript.Introns:
    ...     print intron
    Intron(TranscriptId=ENST00000380152, Rank=1)
    Intron(TranscriptId=ENST00000380152, Rank=2)
    Intron(TranscriptId=ENST00000380152, Rank=3)...

The ``Gene`` region also has convenience methods for examining properties of its transcripts, in presenting the ``Cds`` lengths and getting the ``Transcript`` encoding the longest ``Cds``.

.. doctest::

    >>> print brca2.getCdsLengths()
    [10257, 1807, 10257]
    >>> longest = brca2.getLongestCdsTranscript()
    >>> print longest.Cds
    ATGCCTATTGGATCCAAA...

All Regions have a ``getFeatures`` method which differs from that on genome only in that the genomic coordinates are automatically entered for you. Regions also have the ability to return their sequence as an annotated ``cogent`` sequence. The method on ``Gene`` simply queries the parent genome using the gene's own location as the coordinate for the currently supported region types. We will query ``brca2`` asking for gene features; the end-result will be a ``cogent`` sequence that can be used to obtain the CDS, for instance, using the standard ``cogent`` annotation capabilities.

.. doctest::

    >>> annot_brca2 = brca2.getAnnotatedSeq(feature_types='gene')
    >>> cds = annot_brca2.getAnnotationsMatching('CDS')[0].getSlice()
    >>> print cds
    ATGCCTATTGGATCCAAA...

Those are the properties of a ``Gene``, at present, of direct interest to end-users. There are obviously different types of genes, and the ``Genome`` object provides an ability to establish exactly what distinct types are defined in Ensembl.

.. doctest::

    >>> print human.getDistinct('BioType')
    ['rRNA', 'lincRNA', 'IG_C_pseudogene', ...

The genome can be queried for any of these types; for instance, we'll query for ``rRNA``. We'll get the first few records and then exit.

.. doctest::

    >>> rRNA_genes = human.getGenesMatching(BioType='rRNA')
    >>> count = 0
    >>> for gene in rRNA_genes:
    ...     print gene
    ...     count += 1
    ...     if count == 1:
    ...         break
    ...
    Gene(Species='Homo sapiens'; BioType='Mt_rRNA'; ...

This has the effect of returning any gene whose ``BioType`` includes the phrase ``rRNA``. If a gene is not a protein coding gene, as in the current case, then its ``Transcripts`` will have ``ProteinSeq==None`` and ``TranslatedExons==None``, but it will have ``Exons`` and a ``Cds``.

.. doctest::

    >>> transcript = gene.Transcripts[0]
    >>> assert transcript.ProteinSeq == None
    >>> assert transcript.TranslatedExons == None
    >>> assert transcript.Cds != None

Getting ESTs
^^^^^^^^^^^^

Ensembl's ``otherfeatures`` database mirrors the structure of the ``core`` database and contains EST information. Hence, the ``Est`` region inherits directly from ``Gene`` (ie has many of the same properties). ``est`` is a supported ``feature_types`` for the ``getFeatures`` method. You can also directly query for an EST using Ensembl's ``StableID``. Here, however, we'll just query for ``Est`` that map to the ``brca2`` region.

.. doctest::

    >>> ests = human.getFeatures(feature_types='est', region=brca2)
    >>> for est in ests:
    ...     print est
    Est(Species='Homo sapiens'; BioType='protein_coding'; Description='None';...

Getting Variation
^^^^^^^^^^^^^^^^^

``Variation`` regions also have distinctive properties worthy of additional mention. As for genes, there are distinct types stored in Ensembl that may be of interest. Those types can likewise be discovered from the genome,

.. doctest::

    >>> print human.getDistinct('Effect')
    ['3_prime_UTR_variant', 'splice_acceptor_variant', 'intergenic_variant'...

and that information can be used to query the genome for all variation of that effect.

.. note:: What we term ``effect``, Ensembl terms consequence. We use ``effect`` because it's shorter. We allow the query to be an inexact match by setting ``like=True``.
Again we'll just iterate over the first few.

.. doctest::

    >>> nsyn_variants = human.getVariation(Effect='non_synonymous_codon',
    ...                                    like=True)
    ...
    >>> for nsyn_variant in nsyn_variants:
    ...     break
    ...
    >>> print nsyn_variant
    Variation(Symbol='rs180965628'; Effect='non_synonymous_codon'; Alleles='G/A')
    >>> print nsyn_variant.AlleleFreqs
    =============================
    allele    freq    sample_id
    -----------------------------
         A  0.0000       113559
         G  1.0000       113559
         A  0.0013       113560
         G  0.9987       113560
         A  0.0000       113561
         G  1.0000       113561
         A  0.0005       113562
         G  0.9995       113562
         A  0.0000       113563
         G  1.0000       113563
    -----------------------------

``Variation`` objects also have other useful properties, such as a location, the number of alleles and the allele frequencies. The length of a ``Variation`` instance is the length of its longest allele.

.. doctest::

    >>> assert len(nsyn_variant) == 1
    >>> print nsyn_variant.Location
    Homo sapiens:chromosome:17:6068-6069:1
    >>> assert nsyn_variant.NumAlleles == 2

``Variation`` objects have ``FlankingSeq`` and ``Seq`` attributes. ``Seq``, of course, in the case of a SNP is a single nucleotide long and should correspond to one of the alleles. ``FlankingSeq`` is a tuple with the 0th entry being the 300 nucleotides 5' of the variant and the 1st entry being the 3' nucleotides.

.. doctest::

    >>> print nsyn_variant.FlankingSeq[0]
    ACTAATACCTG...
    >>> print nsyn_variant.FlankingSeq[1]
    CACGATGCCTA...
    >>> assert str(nsyn_variant.Seq) in nsyn_variant.Alleles, str(nsyn_variant.Seq)

As a standard feature, ``Variation`` within a specific interval can also be obtained. Using the ``brca2`` gene region instance created above, we can find all the genetic variants using the ``Variants`` property of genome regions. We use this example to also demonstrate the ``PeptideAlleles`` and ``TranslationLocation`` attributes. ``PeptideAlleles`` is the amino-acid variation resulting from the nucleotide variation, while ``TranslationLocation`` is the position in the translated peptide of the variant.
If a variant does not affect protein coding sequence (either it's not exonic or it's a synonymous variant) then these properties have the value ``None``. We illustrate their use.

.. doctest::

    >>> for variant in brca2.Variants:
    ...     if variant.PeptideAlleles is None:
    ...         continue
    ...     print variant.PeptideAlleles, variant.TranslationLocation
    P/L 1...

.. note:: These are Python coordinates; add 1 to get the Ensembl value.

We can also use a slightly more involved query to find all variants within the gene of a specific type. (Of course, you could also simply iterate over the ``Variants`` attribute to grab these out too.)

.. doctest::

    >>> brca2_snps = human.getFeatures(feature_types='variation',
    ...                                region=brca2)
    >>> for snp in brca2_snps:
    ...     if 'non_synonymous_codon' in snp.Effect:
    ...         break
    >>> print snp
    Variation(Symbol='rs80358836'; Effect=['2KB_upstream_variant', '5KB_upstream_variant', 'non_synonymous_codon']; Alleles='C/T')
    >>> print snp.Location
    Homo sapiens:chromosome:13:32890601-32890602:1

Other Region Types
^^^^^^^^^^^^^^^^^^

These can be obtained from the genome instance using the genome's ``getFeatures`` method. At present, only repeats, CpG islands, variation, ESTs and genes can be obtained through this method. There's also ``GenericRegion``, which is precisely that. In Ensembl's databases, each type of feature may be recorded at multiple coordinate levels. Accordingly, each level is checked to obtain full information for that feature.

.. doctest::

    >>> chicken = Genome(Species='chook', Release=Release, account=account)
    >>> print chicken.FeatureCoordLevels
    Gallus gallus
    ============================================
         Type                            Levels
    --------------------------------------------
         gene                        chromosome
       repeat                            contig
          est                        chromosome
    variation                        chromosome
          cpg   chromosome, supercontig, contig
    --------------------------------------------

Comparative Analyses
--------------------

The Ensembl compara database is represented by ``cogent.db.ensembl.compara.Compara``.
This object provides a means for querying for relationships among genomes and obtaining multiple alignments. For convenience the class is made available through the top-level module for importing (i.e. ``cogent.db.ensembl.Compara``). Instantiating ``Compara`` requires, as before, the ensembl release, the series of species of interest and optionally an account (we also use our local account for speed). For the purpose of illustration we'll use the human, mouse and rat genomes.

.. note:: Any queries on this instance of compara will only return results for the indicated species. If you want to query about other species, create another instance.

.. doctest::

    >>> from cogent.db.ensembl import Compara
    >>> compara = Compara(['human', 'mouse', 'rat'], account=account,
    ...                   Release=Release)
    >>> print compara
    Compara(Species=('Homo sapiens', 'Mus musculus', 'Rattus norvegicus'); Release=67...

The ``Compara`` object loads the corresponding ``Genome``'s and attaches them to itself as named attributes (use ``Species.getComparaName`` to find out what the attribute will be). The genome instances are named according to their common name in CamelCase, or Scase. For instance, if we had created a ``Compara`` instance with the American pika species included, then that genome would be accessed as ``compara.AmericanPika``. Common names containing a '.' are treated differently. For instance, the common name for *Caenorhabditis remanei* is ``C.remanei``, which becomes ``compara.Cremanei``. We access the human genome in this ``Compara`` instance and conduct a gene search.

.. doctest::

    >>> brca2 = compara.Human.getGeneByStableId(StableId='ENSG00000139618')
    >>> print brca2
    Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='breast...

We can now use this result to search compara for related genes. We note here that, like ``Genome``, ``Compara`` has the ``getDistinct`` method to assist in identifying appropriate search criteria.
What are the distinct types of gene relationships recorded in Ensembl, for instance?

.. doctest::

    >>> relationships = compara.getDistinct('relationship')
    >>> print relationships
    [u'ortholog_one2many', u'contiguous_gene_split', u'ortholog_one2one',...

So we use the ``brca2`` instance above and search for orthologs among the human, mouse and rat genomes.

.. doctest::

    >>> orthologs = compara.getRelatedGenes(gene_region=brca2,
    ...                                     Relationship='ortholog_one2one')
    >>> print orthologs
    RelatedGenes:
     Relationships=ortholog_one2one
      Gene(Species='Rattus norvegicus'; BioType='protein_coding'; Description='Breast cancer ...

I could also have done that query using a ``StableId``, which I now do using the Ensembl mouse identifier for *Brca2*.

.. doctest::

    >>> orthologs = compara.getRelatedGenes(StableId='ENSMUSG00000041147',
    ...                                     Relationship='ortholog_one2one')
    >>> print orthologs
    RelatedGenes:
     Relationships=ortholog_one2one
      Gene(Species='Rattus norvegicus'; BioType='protein_coding'; Description='Breast cancer...

The ``RelatedGenes`` object has a number of properties allowing you to get access to data. A ``Members`` attribute holds each of the ``Gene`` instances displayed above. The length of this attribute tells you how many hits there were, while each member has all of the capabilities described for ``Gene`` above, e.g. a ``Cds`` property. There is also a ``getSeqLengths`` method which returns the vector of sequence lengths for the members. This method returns just the lengths of the individual genes.

.. doctest::

    >>> print orthologs.Members
    (Gene(Species='Rattus norvegicus'; BioType='protein_coding'; Descr...
    >>> print orthologs.getSeqLengths()
    [40742, 47117, 84195]

In addition there's a ``getMaxCdsLengths`` method for returning the lengths of the longest ``Cds`` from each member.

.. doctest::

    >>> print orthologs.getMaxCdsLengths()
    [10032, 9990, 10257]

You can also obtain the sequences as a ``cogent`` ``SequenceCollection`` (unaligned), with the ability to have those sequences annotated as described above. The sequences are named in accordance with their genomic coordinates.

.. doctest::

    >>> seqs = orthologs.getSeqCollection(feature_types='gene')
    >>> print seqs.Names
    ['Rattus norvegicus:chromosome:12:428...

We can also search for other relationship types, which we do here for a histone.

.. doctest::

    >>> paralogs = compara.getRelatedGenes(StableId='ENSG00000164032',
    ...                                    Relationship='within_species_paralog')
    >>> print paralogs
    RelatedGenes:
     Relationships=within_species_paralog
      Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='H2A...

Getting Comparative Alignments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ensembl stores multiple sequence alignments for selected species. For a given group of species, you can examine what alignments are available by printing the ``method_species_links`` attribute of ``Compara``. This will return something like

>>> print compara.method_species_links
Align Methods/Clades
=============================================================================...
method_link_species_set_id  method_link_id  species_set_id      align_method ...
-----------------------------------------------------------------------------...
                       580              10           34468             PECAN ...
                       578              13           34466               EPO ...
                       582              14           34697  EPO_LOW_COVERAGE ...
-----------------------------------------------------------------------------...

The ``align_method`` and ``align_clade`` columns can be used as arguments to ``getSyntenicRegions``. This method is responsible for returning ``SyntenicRegions`` instances for a given coordinate from a species. As it's possible that multiple records may be found from the multiple alignment for a given set of coordinates, the result of calling this method is a python generator.
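The "iterate and break" idiom used throughout these examples is the standard way to take just the first record from such a generator. A standalone sketch, with a plain generator standing in for the real query (no Ensembl connection needed; the names are invented):

```python
def fake_query():
    # Stand-in for a query generator such as getSyntenicRegions.
    for name in ("first_hit", "second_hit", "third_hit"):
        yield name

record = None
for record in fake_query():
    break  # keep only the first yielded record; no further hits are fetched

assert record == "first_hit"
```

Because the generator is lazy, breaking early means the remaining hits are never retrieved from the database.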
The returned regions have a length, defined by the full set of aligned sequences. If the ``omit_redundant`` argument is used, then positions with gaps in all sampled species will be removed from the alignment to be returned. The length of the syntenic region, however, is the length of the unfiltered alignment.

.. note:: It's important to realise that these multiple alignments are computed across whole clades, not just the species subset you selected. Hence, sequence regions that you might expect would result in a contiguous alignment in the species subset of interest may be returned as separate ``SyntenicRegions`` due to the influence on the alignment of the other species.

.. doctest::

    >>> syntenic_regions = compara.getSyntenicRegions(region=brca2,
    ...     align_method='EPO', align_clade='eutherian')
    >>> for syntenic_region in syntenic_regions:
    ...     print syntenic_region
    ...     print len(syntenic_region)
    ...     print repr(syntenic_region.getAlignment(omit_redundant=False))
    SyntenicRegions:
      Coordinate(Human,chro...,13,32889610-32907347,1)
      Coordinate(Mouse,chro...,5,151325195-151339535,-1)
      Coordinate(Rat,chro...,12,4313281-4324025,1)
    54774
    3 x 54774 dna alignment: Homo sapiens:chromosome:13:32889610-32907347...

We consider a species for which pairwise alignments are available -- the bush baby.

.. doctest::

    >>> compara_pair = Compara(['Human', 'Bushbaby'], Release=Release,
    ...                        account=account)
    >>> print compara_pair
    Compara(Species=('Homo sapiens', 'Otolemur garnettii'); Release=67; connected=True)

Printing the ``method_species_links`` table provides all the necessary information for specifying selection conditions.

>>> print compara_pair.method_species_links
Align Methods/Clades
============================================================================...
method_link_species_set_id  method_link_id  species_set_id      align_method...
----------------------------------------------------------------------------...
                       582              14           34697  EPO_LOW_COVERAGE...
                       545              16           34112         LASTZ_NET...
----------------------------------------------------------------------------... .. doctest:: >>> gene = compara_pair.Bushbaby.getGeneByStableId( ... StableId='ENSOGAG00000003166' ... ) ... >>> print gene Gene(Species='Otolemur garnettii'; BioType='protein_coding'... >>> syntenic = compara_pair.getSyntenicRegions(region=gene, ... align_method='LASTZ_NET', align_clade='H.sap-O.gar') ... >>> for region in syntenic: ... print region ... break SyntenicRegions: Coordinate(Bushbaby,scaf...,GL87...,8624867-8626121,1) Coordinate(Human,chro...,7,135410894-135412244,1) Querying NCBI for VWF ===================== .. sectionauthor:: Gavin Huttley This example is taken from the PyCogent paper (Knight et al. Genome Biol, 8(8):R171, 2007). .. note:: Due to changes by NCBI in the structure of their records, it is no longer easy (though not impossible) to use the sequences' ``Info`` attribute to restrict sequences to Swissprot entries as we had done previously. We query the NCBI protein database for von Willebrand Factor (VWF), a 2813 amino acid glycoprotein required for platelet adhesion in blood coagulation. Missense mutations in this molecule have been associated with von Willebrand disease, a heterogeneous disorder characterized by prolonged bleeding. We import ``EUtils`` for querying NCBI and search the protein database, restricting our search to mammalian sequences. .. doctest:: >>> from cogent.db.ncbi import EUtils >>> db = EUtils(db="protein", rettype="gp") >>> query = '"VWf"[gene] AND Mammalia[orgn]' >>> records = db[query].readlines() We have requested the GenBank record format. We use the ``RichGenbankParser`` to grab features from the feature annotations of this format. We illustrate grabbing additional content from the feature tables by extracting the locations of SNPs and whether those SNPs have a disease association.
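That extraction boils down to a small regular-expression exercise: find an amino-acid change of the form ``X -> Y`` in a feature note and check the note for the VWD disease symbol. A standalone sketch (the note strings here are invented for illustration; real Swissprot notes vary):

```python
import re

def parse_variant_note(note):
    """Pull an amino-acid change such as 'R -> W' from a feature note,
    and flag whether the note mentions von Willebrand disease (VWD)."""
    # One uppercase residue, optional spaces, '->', optional spaces, another residue.
    variant = " ".join(re.findall(r"[A-Z] *-> *[A-Z]", note))
    disease = "VWD" in note
    return variant, disease

print(parse_variant_note("R -> W (in VWD type 2A)."))  # ('R -> W', True)
print(parse_variant_note("N -> K."))                   # ('N -> K', False)
```

The same pattern, joined over all matches, is what the parsing loop further on applies to each ``Variant`` region feature.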
An inspection of the feature tables for human entries reveals that the Swissprot entry had the most complete and accessible content regarding the SNP data: region features with ``region_name="Variant"`` and a note field that contains the indicated amino acid difference along with an indication of whether the SNP was associated with von Willebrand disease (denoted by the symbol VWD). Finally, we seek to extract the protein domain locations for presentation purposes. We simply limit our attention to large, relatively complete sequences. From the human record we extract annotation data. We use regular expressions to assist with extracting data regarding the amino acid change and the domain names. We also name sequences based on their [G]enus [spe]cies and accession. The selected species are accumulated in a ``seqs`` dictionary, keyed by their name. The feature data are accumulated in a list. .. doctest:: >>> import re >>> from cogent.parse.genbank import RichGenbankParser >>> parser = RichGenbankParser(records) >>> seqs = {} >>> rows = [] >>> for accession, seq in parser: ... if len(seq) < 2800: ... continue ... # we extract annotation data only from the human record ... if "Homo" in seq.Info.species: ... for feature in seq.Info.features: ... if "region_name" not in feature: ... continue # ignore this one, go to the next feature ... if "Variant" in feature["region_name"]: ... note = feature["note"][0] ... variant = " ".join(re.findall("[A-Z] *-> *[A-Z]", ... note)) ... disease = "VWD" in note ... lo = feature["location"].first() - 1 ... hi = feature["location"].last() ... rows.append(["SNP", lo, hi, variant.strip(), disease]) ... else: ... region_name = feature["region_name"][0] ... if region_name == "Domain": ... note = [field.strip() \ ... for field in re.split("[.;]", feature["note"][0]) if field] ... if len(note) == 1: ... note += [""] ... lo = feature["location"].first() - 1 ... hi = feature["location"].last() ... 
rows.append(["Domain", lo, hi, ' '.join(note).strip(), None]) ... species = seq.Info.species.split() ... seq_name = "%s.%s" % (species[0][0] + species[1][:3], accession) ... seqs[seq_name] = seq We convert the sequences to a ``SequenceCollection`` using ``LoadSeqs`` and then save to a fasta formatted file. .. doctest:: >>> from cogent import LoadSeqs >>> seqs = LoadSeqs(data=seqs, aligned=False) >>> print seqs.NamedSeqs['Clup.VWF_CANFA'].toFasta() >Clup.VWF_CANFA MSPTRLVRVLLALALI... We convert the features into a PyCogent ``Table`` object, which requires we specify column headings. This can be saved to file if desired, but we don't do that here. For display purposes, we just print the first 10 records. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> from cogent import LoadTable >>> feature_table = LoadTable(header=["Type", "Start", "Stop", "Note", ... "Disease"], rows=rows) Printing ``feature_table[:10]`` should result in something like: .. code-block:: python ============================================ Type Start Stop Note Disease -------------------------------------------- Domain 33 240 VWFD 1 SNP 272 273 R -> W True Domain 294 348 TIL 1 SNP 317 318 N -> K False SNP 376 377 W -> C True Domain 386 598 VWFD 2 SNP 483 484 H -> R False SNP 527 528 N -> S True SNP 549 550 G -> R True Domain 651 707 TIL 2 -------------------------------------------- PyCogent-1.5.3/doc/examples/rate_heterogeneity.rst000644 000765 000024 00000004370 11473355707 023263 0ustar00jrideoutstaff000000 000000 .. _rate-heterogeneity: Analysis of rate heterogeneity ============================== .. sectionauthor:: Gavin Huttley A simple example for analyses involving rate heterogeneity among sites. In this case we will simulate an alignment with two rate categories and then try to recover the rates from the alignment. .. 
doctest:: >>> from cogent.evolve.substitution_model import Nucleotide >>> from cogent import LoadTree Make two equal-length alignments, one simulated with rate 0.6 and the other with rate 0.2, and then concatenate them to create a new alignment. .. doctest:: >>> model = Nucleotide(equal_motif_probs=True) >>> tree = LoadTree("data/test.tree") >>> lf = model.makeLikelihoodFunction(tree) >>> lf.setParamRule('length', value=0.6, is_constant=True) >>> aln1 = lf.simulateAlignment(sequence_length=10000) >>> lf.setParamRule('length', value=0.2, is_constant=True) >>> aln2 = lf.simulateAlignment(sequence_length=10000) >>> aln3 = aln1 + aln2 Start from scratch, optimising only rates and the rate probability ratio. .. doctest:: >>> model = Nucleotide(equal_motif_probs=True, ordered_param="rate", ... distribution="free") >>> lf = model.makeLikelihoodFunction(tree, bins=2, digits=2, space=3) >>> lf.setAlignment(aln3) >>> lf.optimise(local=True, max_restarts=2) We want to know the bin probabilities and the posterior probabilities. .. doctest:: >>> bprobs = [t for t in lf.getStatistics() if 'bin' in t.Title][0] Printing ``bprobs`` sorted by ``'rate'`` will generate a table like .. code-block:: python bin params ==================== bin bprobs rate -------------------- bin0 0.49 0.49 bin1 0.51 1.48 -------------------- We'll now use a gamma distribution on the sample alignment, specifying the number of bins as 4. We specify that the bins have equal density using the ``lf.setParamRule('bprobs', is_constant=True)`` command. .. doctest:: >>> model = Nucleotide(equal_motif_probs=True, ordered_param="rate", ... distribution="gamma") >>> lf = model.makeLikelihoodFunction(tree, bins=4) >>> lf.setParamRule('bprobs', is_constant=True) >>> lf.setAlignment(aln3) >>> lf.optimise(local=True, max_restarts=2) Performing a relative rate test =============================== ..
sectionauthor:: Gavin Huttley From ``cogent``, import all the components we need. .. doctest:: >>> from cogent import LoadSeqs, LoadTree >>> from cogent.evolve.models import HKY85 >>> from cogent.maths import stats Get your alignment and tree. .. doctest:: >>> aln = LoadSeqs(filename="data/long_testseqs.fasta") >>> t = LoadTree(filename="data/test.tree") Create an HKY85 model. .. doctest:: >>> sm = HKY85() Make the controller object and limit the display precision (to decrease the chance that small differences in estimates cause tests of the documentation to fail). .. doctest:: >>> lf = sm.makeLikelihoodFunction(t, digits=2, space=3) Set the local clock for Human and Howler Monkey. This method is just a special interface to the more general ``setParamRules`` method. .. doctest:: >>> lf.setLocalClock("Human", "HowlerMon") Provide the alignment to the likelihood function; this object performs the actual likelihood calculation. .. doctest:: >>> lf.setAlignment(aln) Optimise the function, capturing the returned optimised lnL and parameter values. .. doctest:: >>> lf.optimise() View the resulting maximum-likelihood parameter values. .. doctest:: >>> lf.setName("clock") >>> print lf clock ===== kappa ----- 4.10 ----- =========================== edge parent length --------------------------- Human edge.0 0.04 HowlerMon edge.0 0.04 edge.0 edge.1 0.04 Mouse edge.1 0.28 edge.1 root 0.02 NineBande root 0.09 DogFaced root 0.11 --------------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- We extract the log-likelihood and number of free parameters for later use. .. doctest:: >>> null_lnL = lf.getLogLikelihood() >>> null_nfp = lf.getNumFreeParams() Clear the local clock constraint, freeing up the branch lengths. .. doctest:: >>> lf.setParamRule('length', is_independent=True) Run the optimiser again, capturing the returned optimised lnL and parameter values. .. doctest:: >>> lf.optimise() View the resulting maximum-likelihood parameter values.
.. doctest:: >>> lf.setName("non clock") >>> print lf non clock ===== kappa ----- 4.10 ----- =========================== edge parent length --------------------------- Human edge.0 0.03 HowlerMon edge.0 0.04 edge.0 edge.1 0.04 Mouse edge.1 0.28 edge.1 root 0.02 NineBande root 0.09 DogFaced root 0.11 --------------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- These two lnL's are now used to calculate the likelihood ratio statistic, its degrees of freedom, and the probability of observing the LR. .. doctest:: >>> LR = 2 * (lf.getLogLikelihood() - null_lnL) >>> df = lf.getNumFreeParams() - null_nfp >>> P = stats.chisqprob(LR, df) Print this and look up a :math:`\chi^2` with number of edges - 1 degrees of freedom. .. doctest:: >>> print "Likelihood ratio statistic = ", LR Likelihood ratio statistic = 2.7... >>> print "degrees-of-freedom = ", df degrees-of-freedom = 1 >>> print "probability = ", P probability = 0.09... Reusing results to speed up optimisation ======================================== .. sectionauthor:: Gavin Huttley An example of how to use the maximum-likelihood parameter estimates from one model as starting values for another model. In this file we do something silly, by saving a result and then reloading it. This is silly because the analyses are run consecutively. A better approach when running consecutively is to simply use the annotated tree directly. .. doctest:: >>> from cogent import LoadSeqs, LoadTree >>> from cogent.evolve.models import MG94HKY We'll create a simple model, optimise it, and save it for later reuse. ..
doctest:: >>> aln = LoadSeqs("data/long_testseqs.fasta") >>> t = LoadTree("data/test.tree") >>> sm = MG94HKY() >>> lf = sm.makeLikelihoodFunction(t, digits=2, space=2) >>> lf.setAlignment(aln) >>> lf.optimise(local=True) >>> print lf Likelihood Function Table ============ kappa omega ------------ 3.85 0.90 ------------ ========================= edge parent length ------------------------- Human edge.0 0.09 HowlerMon edge.0 0.12 edge.0 edge.1 0.12 Mouse edge.1 0.84 edge.1 root 0.06 NineBande root 0.28 DogFaced root 0.34 ------------------------- ============= motif mprobs ------------- T 0.23 C 0.19 A 0.37 G 0.21 ------------- The essential object for reuse is an annotated tree; it captures the parameter estimates from the above optimisation. We can either use this directly in the same run, or save the tree to file in ``xml`` format and reload it at a later time. In this example I'll illustrate the latter scenario. .. doctest:: >>> at = lf.getAnnotatedTree() >>> at.writeToFile('tree.xml') We load the tree as per usual. .. doctest:: >>> nt = LoadTree('tree.xml') Now create a more parameter-rich model, in this case by allowing the ``Human`` edge to have a different value of ``omega``. By providing the annotated tree, the parameter estimates from the above run will be used as starting values for the new model. .. doctest:: >>> new_lf = sm.makeLikelihoodFunction(nt, digits=2, space=2) >>> new_lf.setParamRule('omega', edge='Human', ...
is_independent=True) >>> new_lf.setAlignment(aln) >>> new_lf.optimise(local=True) >>> print new_lf Likelihood Function Table ===== kappa ----- 3.85 ----- ================================ edge parent length omega -------------------------------- Human edge.0 0.09 0.59 HowlerMon edge.0 0.12 0.92 edge.0 edge.1 0.12 0.92 Mouse edge.1 0.84 0.92 edge.1 root 0.06 0.92 NineBande root 0.28 0.92 DogFaced root 0.34 0.92 -------------------------------- ============= motif mprobs ------------- T 0.23 C 0.19 A 0.37 G 0.21 ------------- .. clean up .. doctest:: :hide: >>> import os >>> os.remove('tree.xml') PyCogent-1.5.3/doc/examples/reverse_complement.rst000644 000765 000024 00000002560 12014672513 023257 0ustar00jrideoutstaff000000 000000 Getting the reverse complement ============================== .. sectionauthor:: Gavin Huttley This is a property of DNA, and hence alignments need to be created with the appropriate ``MolType``. In the following example, the alignment is truncated to just 50 bases for the sake of simplifying the presentation. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs("data/long_testseqs.fasta", moltype=DNA)[:50] The original alignment looks like this. .. doctest:: >>> print aln >Human TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTT >HowlerMon TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTT >Mouse TGTGGCACAGATGCTCATGCCAGCTCATTACAGCCTGAGACCAGCAGTTT >NineBande TGTGGCACAAATACTCATGCCAACTTATTACAGCATGAGAACAGCAGTTT >DogFaced TGTGGCACAAATACTCATGCCAACTCATTACAGCATGAGAACAGCAGTTT We do reverse complement very simply. .. doctest:: >>> naln = aln.rc() The reverse complemented alignment looks like this. .. 
doctest:: >>> print naln >Human AAACTGCTGTTCTCATGCTGTAATGAGCTGGCATGAGTATTTGTGCCACA >HowlerMon AAACTGCTGTTCTCATGCTGTAATGAGCTGGCATGAGTATTTGTGCCACA >Mouse AAACTGCTGGTCTCAGGCTGTAATGAGCTGGCATGAGCATCTGTGCCACA >NineBande AAACTGCTGTTCTCATGCTGTAATAAGTTGGCATGAGTATTTGTGCCACA >DogFaced AAACTGCTGTTCTCATGCTGTAATGAGTTGGCATGAGTATTTGTGCCACA PyCogent-1.5.3/doc/examples/scope_model_params_on_trees.rst000644 000765 000024 00000027413 11473355707 025132 0ustar00jrideoutstaff000000 000000 .. _scope-params-on-trees: Allowing substitution model parameters to differ between branches ================================================================= .. sectionauthor:: Gavin Huttley A common task concerns assessing how substitution model exchangeability parameters differ between evolutionary lineages. This is most commonly of interest for the case of testing for natural selection. Here I'll demonstrate the different ways of scoping parameters across trees for the codon model case and how these can be used for evolutionary modelling. We start with the standard imports, plus using a canned codon substitution model and then load the sample data set. .. doctest:: >>> from cogent import LoadSeqs, LoadTree >>> from cogent.evolve.models import MG94HKY >>> aln = LoadSeqs("data/long_testseqs.fasta") >>> tree = LoadTree("data/test.tree") We construct the substitution model and likelihood function and set the alignment. .. doctest:: >>> sm = MG94HKY() >>> lf = sm.makeLikelihoodFunction(tree, digits=2, space=3) >>> lf.setAlignment(aln) At this point we have a likelihood function with two exchangeability parameters from the substitution model (``kappa`` the transition/transversion ratio; ``omega`` the nonsynonymous/synonymous ratio) plus branch lengths for all tree edges. To facilitate subsequent discussion I now display the tree .. 
doctest:: >>> print tree.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-Mouse -root----| |--NineBande | \-DogFaced Scoping a parameter on a tree (meaning specifying a subset of edges for which the parameter is to be treated differently from the remainder of the tree) requires uniquely identifying the edges. We do this using the following arguments to the likelihood function ``setParamRule`` method: - ``tip_names``: the names of two tips - ``outgroup_name``: the name of a tip that is not part of the clade of interest - ``is_clade``: if ``True``, all lineages descended from the tree node identified by the ``tip_names`` and ``outgroup_name`` arguments are affected by the other arguments. If ``False``, then the ``is_stem`` argument must apply. - ``is_stem``: whether the edge leading to the node is included. The next concepts cover exactly what can be scoped and how. In the case of testing for distinctive periods of natural selection it is common to specify distinct values of ``omega`` for an edge. I'll first illustrate some possible uses for the arguments above by setting ``omega`` to be distinctive for specific edges. I will set a value for ``omega`` so that printing the likelihood function illustrates which edges have been affected, but I won't actually do any model fitting. Specifying a clade ------------------ I'm going to cause ``omega`` to attain a different value for all branches aside from the primate clade and stem (``HowlerMon``, ``Human``, ``edge.0``). .. doctest:: >>> lf.setParamRule('omega', tip_names=['DogFaced', 'Mouse'], ...
outgroup_name='Human', init=2.0, is_clade=True) >>> print lf Likelihood Function Table ===== kappa ----- 1.00 ----- =================================== edge parent length omega ----------------------------------- Human edge.0 0.03 1.00 HowlerMon edge.0 0.04 1.00 edge.0 edge.1 0.04 1.00 Mouse edge.1 0.28 2.00 edge.1 root 0.02 2.00 NineBande root 0.09 2.00 DogFaced root 0.11 2.00 ----------------------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- As you can see ``omega`` for the primate edges I listed above have the default parameter value (1.0), while the others have what I've assigned. In fact, you could omit the ``is_clade`` argument as this is the default, but I think for readability of scripts it's best to be explicit. Specifying a stem ----------------- This time I'll specify the stem leading to the primates as the edge of interest. .. note:: I need to reset the ``lf`` so all edges have the default value again. I'll show this only for this example, but rest assured I'm doing it for all others too. .. doctest:: >>> lf.setParamRule('omega', init=1.0) >>> lf.setParamRule('omega', tip_names=['Human', 'HowlerMon'], ... outgroup_name='Mouse', init=2.0, is_stem=True, is_clade=False) >>> print lf Likelihood Function Table ===== kappa ----- 1.00 ----- =================================== edge parent length omega ----------------------------------- Human edge.0 0.03 1.00 HowlerMon edge.0 0.04 1.00 edge.0 edge.1 0.04 2.00 Mouse edge.1 0.28 1.00 edge.1 root 0.02 1.00 NineBande root 0.09 1.00 DogFaced root 0.11 1.00 -----------------------------------... Specifying clade and stem ------------------------- I'll specify that both the primates and their stem are to be considered. .. doctest:: :hide: >>> lf.setParamRule('omega', init=1.0) .. doctest:: >>> lf.setParamRule('omega', tip_names=['Human', 'HowlerMon'], ... 
outgroup_name='Mouse', init=2.0, is_stem=True, is_clade=True) >>> print lf Likelihood Function Table ===== kappa ----- 1.00 ----- =================================== edge parent length omega ----------------------------------- Human edge.0 0.03 2.00 HowlerMon edge.0 0.04 2.00 edge.0 edge.1 0.04 2.00 Mouse edge.1 0.28 1.00 edge.1 root 0.02 1.00 NineBande root 0.09 1.00 DogFaced root 0.11 1.00 -----------------------------------... Alternate arguments for specifying edges ---------------------------------------- The likelihood function ``setParamRule`` method also has the arguments of ``edge`` and ``edges``. These allow specific naming of the tree edge(s) to be affected by a rule. In general, however, the ``tip_names`` + ``outgroup_name`` combo is more robust. Applications of scoped parameters --------------------------------- The general use-cases for which a tree scope can be applied are: 1. constraining all edges identified by a rule to have a specific value which is constant and not modifiable >>> lf.setParamRule('omega', tip_names=['Human', 'HowlerMon'], ... outgroup_name='Mouse', is_clade=True, is_constant=True) 2. all edges identified by a rule have the same but different value to the rest of the tree >>> lf.setParamRule('omega', tip_names=['Human', 'HowlerMon'], ... outgroup_name='Mouse', is_clade=True) 3. allowing all edges identified by a rule to have different values of the parameter with the remaining tree edges having the same value >>> lf.setParamRule('omega', tip_names=['Human', 'HowlerMon'], ... outgroup_name='Mouse', is_clade=True, is_independent=True) 4. allowing all edges to have a different value >>> lf.setParamRule('omega', is_independent=True) I'll demonstrate these cases sequentially as they involve gradually increasing the degrees of freedom in the model. First we'll constrain ``omega`` to equal 1 on the primate edges. I'll then optimise the model. .. 
note:: Here I'm specifying a constant value for the parameter, and so I **must** use the argument ``value`` to set it. This is not to be confused with the argument ``init``, which is used for providing initial (starting) values for fitting. .. doctest:: :hide: >>> lf.setParamRule('omega', init=1.0) .. doctest:: >>> lf.setParamRule('omega', tip_names=['Human', 'HowlerMon'], ... outgroup_name='Mouse', is_clade=True, value=1.0, is_constant=True) >>> lf.optimise(local=True) >>> print lf Likelihood Function Table ===== kappa ----- 3.87 ----- =================================== edge parent length omega ----------------------------------- Human edge.0 0.09 1.00 HowlerMon edge.0 0.12 1.00 edge.0 edge.1 0.12 0.92 Mouse edge.1 0.84 0.92 edge.1 root 0.06 0.92 NineBande root 0.28 0.92 DogFaced root 0.34 0.92 ----------------------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- >>> print lf.getLogLikelihood() -8640.9... >>> print lf.getNumFreeParams() 9 I'll now free up ``omega`` on the primate clade, making it a single value shared by all primate lineages. .. doctest:: >>> lf.setParamRule('omega', tip_names=['Human', 'HowlerMon'], ... outgroup_name='Mouse', is_clade=True, is_constant=False) >>> lf.optimise(local=True) >>> print lf Likelihood Function Table ===== kappa ----- 3.85 ----- =================================== edge parent length omega ----------------------------------- Human edge.0 0.09 0.77 HowlerMon edge.0 0.12 0.77 edge.0 edge.1 0.12 0.92 Mouse edge.1 0.84 0.92 edge.1 root 0.06 0.92 NineBande root 0.28 0.92 DogFaced root 0.34 0.92 ----------------------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- >>> print lf.getLogLikelihood() -8639.7... >>> print lf.getNumFreeParams() 10 Finally I'll allow all primate edges to have different values of ``omega``. .. doctest:: >>> lf.setParamRule('omega', tip_names=['Human', 'HowlerMon'], ...
outgroup_name='Mouse', is_clade=True, is_independent=True) >>> lf.optimise(local=True) >>> print lf Likelihood Function Table ===== kappa ----- 3.85 ----- =================================== edge parent length omega ----------------------------------- Human edge.0 0.09 0.59 HowlerMon edge.0 0.12 0.95 edge.0 edge.1 0.12 0.92 Mouse edge.1 0.84 0.92 edge.1 root 0.06 0.92 NineBande root 0.28 0.92 DogFaced root 0.34 0.92 ----------------------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- >>> print lf.getLogLikelihood() -8638.9... >>> print lf.getNumFreeParams() 11 We now allow ``omega`` to be different on all edges. .. doctest:: >>> lf.setParamRule('omega', is_independent=True) >>> lf.optimise(local=True) >>> print lf Likelihood Function Table ===== kappa ----- 3.85 ----- =================================== edge parent length omega ----------------------------------- Human edge.0 0.09 0.59 HowlerMon edge.0 0.12 0.95 edge.0 edge.1 0.12 1.13 Mouse edge.1 0.84 0.92 edge.1 root 0.06 0.38 NineBande root 0.28 1.27 DogFaced root 0.34 0.84 ----------------------------------- ============== motif mprobs -------------- T 0.23 C 0.19 A 0.37 G 0.21 -------------- >>> print lf.getLogLikelihood() -8636.1... >>> print lf.getNumFreeParams() 15 PyCogent-1.5.3/doc/examples/seq_features.rst000644 000765 000024 00000006254 11350301455 022047 0ustar00jrideoutstaff000000 000000 .. _seq-annotations: Advanced sequence handling ========================== .. sectionauthor:: Gavin Huttley Individual sequences and alignments can be manipulated by annotations. Most value in the genome sequences arises from sequence annotations regarding specific sequence feature types, e.g. genes with introns / exons, repeat sequences. These can be applied to an alignment either using data formats available from genome portals (e.g. GFF, or GenBank annotation formats) or by custom assignments. 
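The core idea, features as typed, named spans that can be used to slice a sequence or take its complement, can be sketched in plain Python. This toy ``Feature`` class and the helper functions are invented stand-ins for cogent's annotation objects, reusing the example sequence and exon coordinates from this section:

```python
class Feature(object):
    """Toy stand-in for a cogent sequence feature: a typed, named set of spans."""
    def __init__(self, type_, name, spans):
        self.type = type_
        self.name = name
        self.spans = spans  # list of (start, end) half-open intervals

def get_slice(seq, feature):
    # Concatenate the sequence regions the feature covers (like getSlice()).
    return "".join(seq[s:e] for s, e in feature.spans)

def get_shadow(seq, feature):
    # Everything the feature does not cover, in order (like getShadow()).
    covered = set()
    for s, e in feature.spans:
        covered.update(range(s, e))
    return "".join(c for i, c in enumerate(seq) if i not in covered)

seq = "aagaagaagacccccaaaaaaaaaattttttttttaaaaaaaaaaaaa"
exons = Feature("exon", "exons", [(10, 15), (30, 40)])

print(get_slice(seq, exons))   # ccccctttttaaaaa
print(get_shadow(seq, exons))  # aagaagaagaaaaaaaaaaatttttaaaaaaaa
```

The slice and its shadow partition the sequence, so their lengths always sum to the length of the original. cogent's real objects add the pieces this sketch omits: molecule types, feature projection across alignments, and display formatting.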
Annotations can be added in two ways: using either the ``addAnnotation`` or the ``addFeature`` method. The distinction between these two is that ``addFeature`` is more specialised. Features can be thought of as a type of annotation representing standard sequence properties, e.g. introns/exons. Annotations are the more general case, such as a computed property which has, say, a numerical value and a span. For illustrative purposes we define a sequence with 2 exons and grab the 1\ :sup:`st` \ exon: .. doctest:: >>> from cogent import DNA >>> s = DNA.makeSequence("aagaagaagacccccaaaaaaaaaattttttttttaaaaaaaaaaaaa", ... Name="Orig") >>> exon1 = s.addFeature('exon', 'exon1', [(10,15)]) >>> exon2 = s.addFeature('exon', 'exon2', [(30,40)]) Here, '``exon``' is the feature type, and '``exon#``' the feature name. The feature type is used for the display formatting, which won't be illustrated here, and also for selecting all features of the same type, shown below. We could also have created an annotation using the ``addAnnotation`` method: .. doctest:: >>> from cogent.core.annotation import Feature >>> s2 = DNA.makeSequence("aagaagaagacccccaaaaaaaaaattttttttttaaaaaaaaaaaaa", ... Name="Orig2") >>> exon3 = s2.addAnnotation(Feature, 'exon', 'exon1', [(35,40)]) We can use the features (e.g. ``exon1``) to get the corresponding sequence region. .. doctest:: >>> s[exon1] DnaSequence(CCCCC) You can query annotations by type and optionally by label, receiving a list of features: .. doctest:: >>> exons = s.getAnnotationsMatching('exon') >>> print exons [exon "exon1" at [10:15]/48, exon "exon2" at [30:40]/48] We can use this list to construct a pseudo-feature covering (or excluding) multiple features using ``getRegionCoveringAll``. For instance, getting all exons, .. doctest:: >>> print s.getRegionCoveringAll(exons) region "exon" at [10:15, 30:40]/48 >>> s.getRegionCoveringAll(exons).getSlice() DnaSequence(CCCCCTT... 15) or not exons (the exon *shadow*): ..
doctest:: >>> print s.getRegionCoveringAll(exons).getShadow().getSlice() AAGAAGAAGAAAAAAAAAAATTTTTAAAAAAAA The first of these essentially returns the CDS of the gene. Features are themselves sliceable: .. doctest:: >>> exon1[0:3].getSlice() DnaSequence(CCC) This approach to sequence / alignment handling allows the user to manipulate them according to things they know about such as genes or repeat elements. Most of this annotation data can be obtained from genome portals. The toolkit can perform standard sequence / alignment manipulations such as getting a subset of sequences or aligned columns, translating sequences, reading and writing standard formats. PyCogent-1.5.3/doc/examples/seqsim_alignment_simulation.rst000644 000765 000024 00000006736 11361433354 025177 0ustar00jrideoutstaff000000 000000 Seqsim Simple Alignment Simulation Example ========================================== .. sectionauthor:: Julia Goodrich This is a very simple example of how to use the ``seqsim`` module to simulate an alignment for a tree starting with a random sequence and substitution rate matrix (q). The rate matrix gives the rate constant of going from one character in the sequence to another character in the sequence, the Q matrix determines the rate of change of the sequence. First we will perform the necessary imports: * ``Rates`` is an object that stores the rate matrix data, it can also be used to generate a random rate matrix given an ``Alphabet``. * ``DnaUsage`` is a ``Usage`` object that stores the usage of each nucleotide. * ``DnaPairs`` is an Alphabet it stores the DNA pairs (AA,AT,AC,...), it can be passed into the ``Rates`` object, defining the rate matrix pairs for DNA. * ``DNA`` is a ``MolType`` object for DNA. * ``RangeNode`` is the main ``seqsim`` Node object, it allows for the easy evolution of sequences. * ``DndParser`` is a parser for a newick format tree. .. 
doctest:: >>> from cogent.seqsim.usage import Rates, DnaUsage >>> from cogent.core.usage import DnaPairs >>> from cogent.core.moltype import DNA >>> from cogent.core.alignment import Alignment >>> from cogent.seqsim.tree import RangeNode >>> from cogent.parse.tree import DndParser Now, let's specify a 4 taxon tree: .. doctest:: >>> t = DndParser('(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1);', ... constructor=RangeNode) To generate a random DNA sequence, we first specify nucleotide frequencies with the ``DnaUsage`` object. Then we create a random DNA sequence that is five bases long. .. doctest:: >>> u = DnaUsage({'A':0.5,'T':0.2,'C':0.15,'G':0.25}) >>> s = DNA.ModelSeq(u.randomIndices(5)) >>> q = Rates.random(DnaPairs) Set q at the base of the tree and propagate it to all nodes in the tree, .. doctest:: >>> t.Q = q >>> t.propagateAttr('Q') Set a P matrix (probability matrix) from every Q matrix on each node: P(t) = e^(Qt), .. doctest:: >>> t.assignP() Use ``evolve`` to evolve sequences for each tip. Note: we must evolve the sequence data, not the sequence object itself (for speed). .. doctest:: >>> t.evolve(s._data) Build the alignment, .. doctest:: >>> seqs = {} >>> for n in t.tips(): ... seqs[n.Name] = DNA.ModelSeq(n.Sequence) >>> aln = Alignment(seqs) The result is a Cogent ``Alignment`` object, which can be used the same way as any other alignment object. ``evolveSeqs`` can be used instead of ``evolve`` to evolve multiple sequences according to the same tree (this can model either different genes, or different rate categories within a gene that you then combine, etc.), .. doctest:: >>> from numpy import concatenate First you need to use ``assignPs`` to assign the proper P matrices given the rates: .. doctest:: >>> t.assignPs([.5, .75, 1]) There needs to be the same number of random sequences as there are rate categories, so we create a list of 3 random sequences, ..
doctest:: >>> s = [DNA.ModelSeq(u.randomIndices(5))._data for i in range(0,3)] Then use ``evolveSeqs`` to evolve a sequence for every tip with every rate. .. doctest:: >>> t.evolveSeqs(s) Now to concatenate the sequences, .. doctest:: >>> seqs = {} >>> for n in t.tips(): ... seqs[n.Name] = DNA.ModelSeq(concatenate(tuple(n.Sequences))) >>> aln = Alignment(seqs) Seqsim Alignment Simulation Example with Non-standard alphabet ============================================================== .. sectionauthor:: Julia Goodrich This is an example of how to use PyCogent's ``seqsim`` module to simulate an alignment where the alphabet is defined by the user, for a simple tree, starting with a random sequence and a random substitution rate matrix. First we will perform the necessary imports. .. doctest:: >>> from cogent.seqsim.usage import Rates >>> from cogent.core.alignment import DenseAlignment >>> from cogent.seqsim.tree import RangeNode >>> from cogent.parse.tree import DndParser >>> from cogent.core.alphabet import CharAlphabet >>> from cogent.seqsim.usage import Usage >>> from cogent.core.sequence import ModelSequence Now, let's specify a 4 taxon tree: .. doctest:: >>> t = DndParser('(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1);', ... constructor=RangeNode) Create the alphabet by passing the characters to ``CharAlphabet``, then create tuples of all the possible pairs using the ``**`` operator. .. doctest:: >>> Bases = CharAlphabet('ABCD') >>> Pairs = Bases**2 Generate a random sequence with the new alphabet and a random rate matrix; ``Usage`` is used to define character frequencies for the random sequence. Then we create a random sequence of length five. ..
doctest:: >>> u = Usage({'A':0.5,'B':0.2,'C':0.15,'D':0.25}, Alphabet = Bases) >>> s = ModelSequence(u.randomIndices(5)) >>> q = Rates.random(Pairs) Set q at the base of the tree and propagate it to all nodes in the tree, .. doctest:: >>> t.Q = q >>> t.propagateAttr('Q') Set a P matrix from every Q matrix on each node, .. doctest:: >>> t.assignP() Use ``evolve`` to evolve sequences for each tip. Note: we must evolve the sequence data, not the sequence object itself (for speed). .. doctest:: >>> t.evolve(s._data) Build the alignment, .. doctest:: >>> seqs = {} >>> for n in t.tips(): ... seqs[n.Name] = ModelSequence(n.Sequence,Bases) >>> aln = DenseAlignment(seqs,Alphabet=Bases) The result is a Cogent ``Alignment`` object, which can be used the same way as any other alignment object. PyCogent-1.5.3/doc/examples/seqsim_tree_sim.rst000644 000765 000024 00000002503 11361433354 022550 0ustar00jrideoutstaff000000 000000 Seqsim Simple Tree Simulation ============================= .. sectionauthor:: Julia Goodrich This is an example of how to use the birth-death model in cogent to simulate a tree. .. doctest:: >>> from cogent.seqsim.birth_death import BirthDeathModel, ExtinctionError,\ ... TooManyTaxaError Create a model with specific birth and death probabilities per timestep using ``BirthProb``, ``DeathProb``, and ``TimePerStep``. The desired maximum number of taxa on the tree can be set using ``MaxTaxa``. .. doctest:: >>> b = BirthDeathModel(BirthProb=0.2, DeathProb=0.1, TimePerStep=0.03, ... MaxTaxa=20) To simulate a tree with an exact number of taxa, use the following loop. The ``exact`` flag raises a ``TooManyTaxaError`` exception if the call produces the wrong number of taxa (e.g. because too many taxa died in the same timestep). An ``ExtinctionError`` is raised if the ancestral node dies off, or all the nodes die off. .. doctest:: >>> while True: ... try: ... t = b(exact=True) ... except (ExtinctionError, TooManyTaxaError): ... pass ... else: ...
break ``t`` now contains the returned ``RangeNode`` tree object, which can be used the same way as any other tree object in cogent. For instance, ``seqsim`` can be used to evolve alignments. PyCogent-1.5.3/doc/examples/simple.rst000644 000765 000024 00000002234 11425201333 020641 0ustar00jrideoutstaff000000 000000 The simplest script =================== .. sectionauthor:: Gavin Huttley This is just about the simplest possible Cogent script for evolutionary modelling. We use a canned nucleotide substitution model (the ``HKY85`` model) on just three primate species. As there is only one unrooted tree possible, the sequence names are all that's required to make the tree. .. doctest:: >>> from cogent.evolve.models import HKY85 >>> from cogent import LoadSeqs, LoadTree >>> model = HKY85() >>> aln = LoadSeqs("data/primate_cdx2_promoter.fasta") >>> tree = LoadTree(tip_names=aln.Names) >>> lf = model.makeLikelihoodFunction(tree) >>> lf.setAlignment(aln) >>> lf.optimise() >>> print lf Likelihood Function Table ====== kappa ------ 5.9589 ------ =========================== edge parent length --------------------------- human root 0.0040 macaque root 0.0384 chimp root 0.0061 --------------------------- =============== motif mprobs --------------- T 0.2552 C 0.2581 A 0.2439 G 0.2428 --------------- PyCogent-1.5.3/doc/examples/simulate_alignment.rst000644 000765 000024 00000003150 11213034174 023231 0ustar00jrideoutstaff000000 000000 Simulate an alignment ===================== .. sectionauthor:: Gavin Huttley How to simulate an alignment. For this example we just create a simple model using a four taxon tree with very different branch lengths, a Felsenstein model with very different nucleotide frequencies, and a long alignment. See the other examples for how to define other substitution models. .. doctest:: >>> import sys >>> from cogent import LoadTree >>> from cogent.evolve import substitution_model Specify the 4 taxon tree, ..
doctest:: >>> t = LoadTree(treestring='(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1);') Define our Felsenstein 1981 substitution model. .. doctest:: >>> sm = substitution_model.Nucleotide(motif_probs = {'A': 0.5, 'C': 0.2, ... 'G': 0.2, 'T': 0.1}, model_gaps=False) >>> lf = sm.makeLikelihoodFunction(t) >>> lf.setConstantLengths() >>> lf.setName('F81 model') >>> print lf F81 model ========================== edge parent length -------------------------- a root 0.4000 b root 0.3000 c edge.0 0.1500 d edge.0 0.2000 edge.0 root 0.1000 -------------------------- =============== motif mprobs --------------- T 0.1000 C 0.2000 A 0.5000 G 0.2000 --------------- We'll now create a simulated alignment of length 1000 nucleotides. .. doctest:: >>> simulated = lf.simulateAlignment(sequence_length=1000) The result is a normal ``Cogent`` alignment object, which can be used in the same way as any other alignment object. PyCogent-1.5.3/doc/examples/testing_multi_loci.rst000644 000765 000024 00000006412 11425201333 023247 0ustar00jrideoutstaff000000 000000 Likelihood analysis of multiple loci ==================================== .. sectionauthor:: Gavin Huttley We want to know whether an exchangeability parameter is different between alignments. We will specify a null model, under which each alignment gets its own motif probabilities and all alignments share branch lengths and the exchangeability parameter kappa (the transition / transversion ratio). We'll split the example alignment into two pieces. .. doctest:: >>> from cogent import LoadSeqs, LoadTree, LoadTable >>> from cogent.evolve.models import HKY85 >>> from cogent.recalculation.scope import EACH, ALL >>> from cogent.maths.stats import chisqprob >>> aln = LoadSeqs("data/long_testseqs.fasta") >>> half = len(aln)/2 >>> aln1 = aln[:half] >>> aln2 = aln[half:] We provide names for those alignments, then construct the tree and model instances. ..
doctest:: >>> loci_names = ["1st-half", "2nd-half"] >>> loci = [aln1, aln2] >>> tree = LoadTree(tip_names=aln.getSeqNames()) >>> mod = HKY85() To make a likelihood function with multiple alignments we provide the list of loci names. We can then specify a parameter (other than length) to be the same across the loci (using the imported ``ALL``) or different for each locus (using ``EACH``). We conduct an LR test as before. .. doctest:: >>> lf = mod.makeLikelihoodFunction(tree,loci=loci_names,digits=2,space=3) >>> lf.setParamRule("length", is_independent=False) >>> lf.setParamRule('kappa', loci = ALL) >>> lf.setAlignment(loci) >>> lf.optimise(local=True) >>> print lf Likelihood Function Table ========================= locus motif mprobs ------------------------- 1st-half T 0.22 1st-half C 0.18 1st-half A 0.38 1st-half G 0.21 2nd-half T 0.24 2nd-half C 0.19 2nd-half A 0.35 2nd-half G 0.22 ------------------------- ============== kappa length -------------- 3.98 0.13 -------------- >>> all_lnL = lf.getLogLikelihood() >>> all_nfp = lf.getNumFreeParams() >>> lf.setParamRule('kappa', loci = EACH) >>> lf.optimise(local=True) >>> print lf Likelihood Function Table ================ locus kappa ---------------- 1st-half 4.33 2nd-half 3.74 ---------------- ========================= locus motif mprobs ------------------------- 1st-half T 0.22 1st-half C 0.18 1st-half A 0.38 1st-half G 0.21 2nd-half T 0.24 2nd-half C 0.19 2nd-half A 0.35 2nd-half G 0.22 ------------------------- ====== length ------ 0.13 ------ >>> each_lnL = lf.getLogLikelihood() >>> each_nfp = lf.getNumFreeParams() >>> LR = 2 * (each_lnL - all_lnL) >>> df = each_nfp - all_nfp Just to pretty up the result display, I'll print a table consisting of the test statistics created on the fly. >>> print LoadTable(header=['LR', 'df', 'p'], ...
rows=[[LR, df, chisqprob(LR, df)]], digits=2, space=3) ================ LR df p ---------------- 1.59 1 0.21 ---------------- PyCogent-1.5.3/doc/examples/translate_dna.rst000644 000765 000024 00000001371 11213034174 022172 0ustar00jrideoutstaff000000 000000 Translating DNA into protein ============================ .. sectionauthor:: Gavin Huttley To translate a DNA alignment, read it in assigning the DNA alphabet. Note that setting ``aligned = False`` is critical for loading sequences of unequal length. Different genetic codes are available in ``cogent.core.genetic_code``. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> al = LoadSeqs('data/test2.fasta', moltype=DNA, aligned = False) >>> pal = al.getTranslation() >>> print pal.toFasta() >DogFaced ARSQQNRWVETKETCNDRQT >HowlerMon ARSQHNRWAESEETCNDRQT >Human ARSQHNRWAGSKETCNDRRT >Mouse AVSQQSRWAASKGTCNDRQV >NineBande RQQSRWAESKETCNDRQT To save this result to a file, use the ``writeToFile`` method. PyCogent-1.5.3/doc/examples/unifrac.rst000644 000765 000024 00000003121 11215636452 021006 0ustar00jrideoutstaff000000 000000 Run a Fast Unifrac community analysis ===================================== .. sectionauthor:: Justin Kuczynski Below is a simple example of using the fast unifrac function. First, we import some tools. .. doctest:: >>> from cogent.parse.tree import DndParser >>> from cogent.maths.unifrac.fast_unifrac import fast_unifrac >>> from cogent.maths.unifrac.fast_tree import UniFracTreeNode Then we make a small example tree with tips B, C, and D representing the relationship between species B, C, and D. .. doctest:: >>> tree_str = "(B:0.2,(C:0.3,D:0.4)E:0.6)F;" >>> tr = DndParser(tree_str, UniFracTreeNode) >>> print tr.asciiArt() # doctest: +SKIP /-B -F-------| | /-C \E-------| \-D Here's what the sample (rows) by sequence (cols) abundance matrix looks like:: ... [10,11,0] ... [2,0,9] ... [2,2,2] And here it is in dict format for unifrac. .. doctest:: >>> envs = {'B':{'sample1':10, 'sample2':2, 'sample3':2}, ...
'C':{'sample1':11,'sample2':0, 'sample3':2}, ... 'D':{'sample1':0, 'sample2':9, 'sample3':2} ... } Now we run unifrac:: >>> res = fast_unifrac(tr, envs) >>> print res['distance_matrix'] # doctest: +SKIP (array([[ 0. , 0.46666667, 0.26666667], [ 0.46666667, 0. , 0.2 ], [ 0.26666667, 0.2 , 0. ]]), ['sample1', 'sample2', 'sample3']) The PCoA results are misleading for such a small dataset, but the distance matrix is accurate. PyCogent-1.5.3/doc/examples/unrestricted_nucleotide.rst000644 000765 000024 00000007064 11425201333 024304 0ustar00jrideoutstaff000000 000000 Specifying and using an unrestricted nucleotide substitution model ================================================================== .. sectionauthor:: Gavin Huttley Do standard ``cogent`` imports. .. doctest:: >>> from cogent import LoadSeqs, LoadTree, DNA >>> from cogent.evolve.predicate import MotifChange >>> from cogent.evolve.substitution_model import Nucleotide .. don't pollute screen during execution with uninteresting warning .. doctest:: :hide: >>> import warnings >>> warnings.filterwarnings("ignore", "Model not reversible") To specify substitution models we use the ``MotifChange`` class from predicates. In the case of an unrestricted nucleotide model, we specify 11 such ``MotifChanges``, the last possible change being omitted (as a result it is constrained to equal 1, thus calibrating the matrix). Also note that this is a non-reversible model, so we can't assume the nucleotide frequencies estimated from the alignments are reasonable estimates for the root frequencies. We therefore specify that they are to be optimised using the ``optimise_motif_probs`` argument. .. doctest:: >>> ACTG = list('ACTG') >>> preds = [MotifChange(i, j, forward_only=True) for i in ACTG for j in ACTG if i != j] >>> del(preds[-1]) >>> preds [A>C, A>T, A>G, C>A, C>T, C>G, T>A, T>C, T>G, G>A, G>C] >>> sm = Nucleotide(predicates=preds, recode_gaps=True, ...
optimise_motif_probs=True) >>> print sm Nucleotide ( name = ''; type = 'None'; params = ['A>T', 'C>G', 'T>G',... We'll illustrate this with a sample alignment and tree in ``data/primate_cdx2_promoter.fasta``. .. doctest:: >>> al = LoadSeqs("data/primate_cdx2_promoter.fasta", moltype=DNA) >>> al 3 x 1525 dna alignment: human[AGCGCCCGCGG...], macaque[AGC... >>> tr = LoadTree(tip_names=al.Names) >>> print tr (human,macaque,chimp)root; We now construct the parameter controller with each predicate constant across the tree, get the likelihood function calculator and optimise the function. .. doctest:: >>> lf = sm.makeLikelihoodFunction(tr, digits=2, space=3) >>> lf.setAlignment(al) >>> lf.setName('Unrestricted model') >>> lf.optimise(local=True) We just used the Powell optimiser, as this works quite well. .. doctest:: >>> print lf Unrestricted model ========================================================================== A>C A>G A>T C>A C>G C>T G>A G>C T>A T>C T>G -------------------------------------------------------------------------- 0.49 4.88 1.04 2.04 0.99 7.89 9.00 1.55 0.48 5.53 1.57 -------------------------------------------------------------------------- ========================= edge parent length ------------------------- human root 0.00 macaque root 0.04 chimp root 0.01 ------------------------- ============== motif mprobs -------------- T 0.26 C 0.26 A 0.24 G 0.24 -------------- This data set consists of species that are relatively close for a modest length alignment. As a result, doing something like allowing the parameters to differ between edges is not particularly well supported. If you have lots of data it makes sense to allow parameters to differ between edges, which can be specified by modifying the ``lf`` as follows. .. doctest:: >>> for pred in preds: ... lf.setParamRule(pred, is_independent=True) You would then re-optimise the model as above. 
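The calibration described above (11 free rates, with the 12th constrained to equal 1) and the relation P(t) = e^(Qt) can be sketched outside of cogent with plain numpy/scipy. This is an illustrative sketch, not cogent's internal implementation: the rate values are the fitted estimates from the table above, ``scipy.linalg.expm`` stands in for cogent's matrix exponentiator, and the branch length (0.04, macaque's estimate) is reused for t.

```python
import numpy as np
from scipy.linalg import expm

bases = "ACTG"
# 11 free off-diagonal rates (fitted values from the table above);
# the 12th (G>T) is fixed at 1.0, which calibrates the matrix.
rates = {('A','C'): 0.49, ('A','G'): 4.88, ('A','T'): 1.04,
         ('C','A'): 2.04, ('C','G'): 0.99, ('C','T'): 7.89,
         ('G','A'): 9.00, ('G','C'): 1.55, ('G','T'): 1.00,
         ('T','A'): 0.48, ('T','C'): 5.53, ('T','G'): 1.57}

Q = np.zeros((4, 4))
for i, x in enumerate(bases):
    for j, y in enumerate(bases):
        if x != y:
            Q[i, j] = rates[(x, y)]
    Q[i, i] = -Q[i].sum()  # rows of a rate matrix sum to zero

P = expm(Q * 0.04)  # P(t) = e^(Qt) for branch length t = 0.04
assert np.allclose(P.sum(axis=1), 1.0)  # each row of P is a distribution
```

Because the model is non-reversible, Q is not symmetric under any rescaling by base frequencies, which is exactly why the root motif probabilities must be optimised rather than taken from the alignment.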
PyCogent-1.5.3/doc/examples/using_parallel.py000644 000765 000024 00000002036 11502775461 022207 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """An example of how to distribute jobs across multiple CPUs. Note that this example works even on a single CPU, since parallel assigns a `fake` communicator in that case. """ from cogent.util import parallel __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2011, The Cogent Project" __contributors__ = ["Gavin Huttley", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.3.0.dev" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" JOBS = range(12) # nonsense jobs, just a list of numbers to be printed. # we divide up the CPUs into (at most) 12 groups of (at least) 1 CPU. (comm, leftover) = parallel.getSplitCommunicators(len(JOBS)) # and set the CPUs available to lower levels parallel.push(leftover) try: for job in JOBS: if job % comm.size != comm.rank: continue print "My ID=%d, my message=%s" % (comm.rank, JOBS[job]) finally: # always restore the original parallel context parallel.pop(leftover) PyCogent-1.5.3/doc/data/1HQF.pdb000644 000765 000024 00002374042 11361362322 017117 0ustar00jrideoutstaff000000 000000 HEADER HYDROLASE 16-DEC-00 1HQF TITLE CRYSTAL STRUCTURE OF THE BINUCLEAR MANGANESE METALLOENZYME TITLE 2 ARGINASE COMPLEXED WITH N-HYDROXY-L-ARGININE COMPND MOL_ID: 1; COMPND 2 MOLECULE: ARGINASE 1; COMPND 3 CHAIN: A, B, C; COMPND 4 EC: 3.5.3.1; COMPND 5 ENGINEERED: YES SOURCE MOL_ID: 1; SOURCE 2 ORGANISM_SCIENTIFIC: RATTUS NORVEGICUS; SOURCE 3 ORGANISM_COMMON: NORWAY RAT; SOURCE 4 ORGANISM_TAXID: 10116; SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI; SOURCE 6 EXPRESSION_SYSTEM_TAXID: 562 KEYWDS ARGINASE, N-HYDROXY-L-ARGININE (NOHA), BINUCLEAR MANGANESE KEYWDS 2 CLUSTER, METALLOENZYME, HYDROLASE EXPDTA X-RAY DIFFRACTION AUTHOR J.D.COX,E.CAMA,D.M.COLLELUORI,D.E.ASH,D.W.CHRISTIANSON REVDAT 3 24-FEB-09 1HQF 1 VERSN REVDAT 2 01-APR-03 1HQF
1 JRNL REVDAT 1 04-APR-01 1HQF 0 JRNL AUTH J.D.COX,E.CAMA,D.M.COLLELUORI,S.PETHE,J.L.BOUCHER, JRNL AUTH 2 D.MANSUY,D.E.ASH,D.W.CHRISTIANSON JRNL TITL MECHANISTIC AND METABOLIC INFERENCES FROM THE JRNL TITL 2 BINDING OF SUBSTRATE ANALOGUES AND PRODUCTS TO JRNL TITL 3 ARGINASE. JRNL REF BIOCHEMISTRY V. 40 2689 2001 JRNL REFN ISSN 0006-2960 JRNL PMID 11258880 JRNL DOI 10.1021/BI002318+ REMARK 1 REMARK 2 REMARK 2 RESOLUTION. 2.90 ANGSTROMS. REMARK 3 REMARK 3 REFINEMENT. REMARK 3 PROGRAM : CNS REMARK 3 AUTHORS : BRUNGER,ADAMS,CLORE,DELANO,GROS,GROSSE- REMARK 3 : KUNSTLEVE,JIANG,KUSZEWSKI,NILGES, PANNU, REMARK 3 : READ,RICE,SIMONSON,WARREN REMARK 3 REMARK 3 REFINEMENT TARGET : ENGH & HUBER REMARK 3 REMARK 3 DATA USED IN REFINEMENT. REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 2.90 REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 30.00 REMARK 3 DATA CUTOFF (SIGMA(F)) : 2.000 REMARK 3 DATA CUTOFF HIGH (ABS(F)) : NULL REMARK 3 DATA CUTOFF LOW (ABS(F)) : NULL REMARK 3 COMPLETENESS (WORKING+TEST) (%) : NULL REMARK 3 NUMBER OF REFLECTIONS : 17125 REMARK 3 REMARK 3 FIT TO DATA USED IN REFINEMENT. REMARK 3 CROSS-VALIDATION METHOD : NULL REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM REMARK 3 R VALUE (WORKING SET) : 0.265 REMARK 3 FREE R VALUE : 0.289 REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL REMARK 3 FREE R VALUE TEST SET COUNT : 781 REMARK 3 ESTIMATED ERROR OF FREE R VALUE : NULL REMARK 3 REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN. REMARK 3 TOTAL NUMBER OF BINS USED : NULL REMARK 3 BIN RESOLUTION RANGE HIGH (A) : NULL REMARK 3 BIN RESOLUTION RANGE LOW (A) : NULL REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : NULL REMARK 3 REFLECTIONS IN BIN (WORKING SET) : NULL REMARK 3 BIN R VALUE (WORKING SET) : NULL REMARK 3 BIN FREE R VALUE : NULL REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : NULL REMARK 3 BIN FREE R VALUE TEST SET COUNT : NULL REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : NULL REMARK 3 REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT. 
REMARK 3 PROTEIN ATOMS : 7185 REMARK 3 NUCLEIC ACID ATOMS : 0 REMARK 3 HETEROGEN ATOMS : 45 REMARK 3 SOLVENT ATOMS : 15 REMARK 3 REMARK 3 B VALUES. REMARK 3 FROM WILSON PLOT (A**2) : NULL REMARK 3 MEAN B VALUE (OVERALL, A**2) : NULL REMARK 3 OVERALL ANISOTROPIC B VALUE. REMARK 3 B11 (A**2) : NULL REMARK 3 B22 (A**2) : NULL REMARK 3 B33 (A**2) : NULL REMARK 3 B12 (A**2) : NULL REMARK 3 B13 (A**2) : NULL REMARK 3 B23 (A**2) : NULL REMARK 3 REMARK 3 ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM SIGMAA (A) : NULL REMARK 3 LOW RESOLUTION CUTOFF (A) : NULL REMARK 3 REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM C-V LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM C-V SIGMAA (A) : NULL REMARK 3 REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES. REMARK 3 BOND LENGTHS (A) : 0.008 REMARK 3 BOND ANGLES (DEGREES) : 1.50 REMARK 3 DIHEDRAL ANGLES (DEGREES) : NULL REMARK 3 IMPROPER ANGLES (DEGREES) : NULL REMARK 3 REMARK 3 ISOTROPIC THERMAL MODEL : NULL REMARK 3 REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. RMS SIGMA REMARK 3 MAIN-CHAIN BOND (A**2) : NULL ; NULL REMARK 3 MAIN-CHAIN ANGLE (A**2) : NULL ; NULL REMARK 3 SIDE-CHAIN BOND (A**2) : NULL ; NULL REMARK 3 SIDE-CHAIN ANGLE (A**2) : NULL ; NULL REMARK 3 REMARK 3 BULK SOLVENT MODELING. REMARK 3 METHOD USED : NULL REMARK 3 KSOL : NULL REMARK 3 BSOL : NULL REMARK 3 REMARK 3 NCS MODEL : NULL REMARK 3 REMARK 3 NCS RESTRAINTS. RMS SIGMA/WEIGHT REMARK 3 GROUP 1 POSITIONAL (A) : NULL ; NULL REMARK 3 GROUP 1 B-FACTOR (A**2) : NULL ; NULL REMARK 3 REMARK 3 PARAMETER FILE 1 : NULL REMARK 3 TOPOLOGY FILE 1 : NULL REMARK 3 REMARK 3 OTHER REFINEMENT REMARKS: NULL REMARK 4 REMARK 4 1HQF COMPLIES WITH FORMAT V. 3.15, 01-DEC-08 REMARK 100 REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY RCSB ON 18-DEC-00. REMARK 100 THE RCSB ID CODE IS RCSB012517. 
REMARK 200 REMARK 200 EXPERIMENTAL DETAILS REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION REMARK 200 DATE OF DATA COLLECTION : 20-JUN-00 REMARK 200 TEMPERATURE (KELVIN) : 100 REMARK 200 PH : 8.5 REMARK 200 NUMBER OF CRYSTALS USED : 2 REMARK 200 REMARK 200 SYNCHROTRON (Y/N) : Y REMARK 200 RADIATION SOURCE : SSRL REMARK 200 BEAMLINE : BL7-1 REMARK 200 X-RAY GENERATOR MODEL : NULL REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M REMARK 200 WAVELENGTH OR RANGE (A) : 1.08 REMARK 200 MONOCHROMATOR : FILTER REMARK 200 OPTICS : NULL REMARK 200 REMARK 200 DETECTOR TYPE : IMAGE PLATE REMARK 200 DETECTOR MANUFACTURER : MAR SCANNER 345 MM PLATE REMARK 200 INTENSITY-INTEGRATION SOFTWARE : DENZO REMARK 200 DATA SCALING SOFTWARE : SCALEPACK REMARK 200 REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 17424 REMARK 200 RESOLUTION RANGE HIGH (A) : 2.900 REMARK 200 RESOLUTION RANGE LOW (A) : 30.000 REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 2.000 REMARK 200 REMARK 200 OVERALL. REMARK 200 COMPLETENESS FOR RANGE (%) : 83.7 REMARK 200 DATA REDUNDANCY : 1.800 REMARK 200 R MERGE (I) : 0.06400 REMARK 200 R SYM (I) : NULL REMARK 200 FOR THE DATA SET : 11.4000 REMARK 200 REMARK 200 IN THE HIGHEST RESOLUTION SHELL. 
REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 2.90 REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : 3.00 REMARK 200 COMPLETENESS FOR SHELL (%) : 86.7 REMARK 200 DATA REDUNDANCY IN SHELL : 1.80 REMARK 200 R MERGE FOR SHELL (I) : 0.30200 REMARK 200 R SYM FOR SHELL (I) : NULL REMARK 200 FOR SHELL : NULL REMARK 200 REMARK 200 DIFFRACTION PROTOCOL: SINGLE WAVELENGTH REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: MOLECULAR REPLACEMENT REMARK 200 SOFTWARE USED: AMORE REMARK 200 STARTING MODEL: NULL REMARK 200 REMARK 200 REMARK: NULL REMARK 280 REMARK 280 CRYSTAL REMARK 280 SOLVENT CONTENT, VS (%): 48.35 REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 2.38 REMARK 280 REMARK 280 CRYSTALLIZATION CONDITIONS: PEG 8000, BICINE, MANGANESE REMARK 280 CHLORIDE, PH 8.5, VAPOR DIFFUSION, HANGING DROP, TEMPERATURE REMARK 280 277K REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 32 REMARK 290 REMARK 290 SYMOP SYMMETRY REMARK 290 NNNMMM OPERATOR REMARK 290 1555 X,Y,Z REMARK 290 2555 -Y,X-Y,Z+2/3 REMARK 290 3555 -X+Y,-X,Z+1/3 REMARK 290 REMARK 290 WHERE NNN -> OPERATOR NUMBER REMARK 290 MMM -> TRANSLATION VECTOR REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY REMARK 290 RELATED MOLECULES. 
REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000 REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 2 -0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 2 0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 74.66667 REMARK 290 SMTRY1 3 -0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 3 -0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 3 0.000000 0.000000 1.000000 37.33333 REMARK 290 REMARK 290 REMARK: NULL REMARK 300 REMARK 300 BIOMOLECULE: 1 REMARK 300 SEE REMARK 350 FOR THE AUTHOR PROVIDED AND/OR PROGRAM REMARK 300 GENERATED ASSEMBLY INFORMATION FOR THE STRUCTURE IN REMARK 300 THIS ENTRY. THE REMARK MAY ALSO PROVIDE INFORMATION ON REMARK 300 BURIED SURFACE AREA. REMARK 300 REMARK: THE BIOLOGICAL ASSEMBLY IS A TRIMER REMARK 350 REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN. REMARK 350 REMARK 350 BIOMOLECULE: 1 REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: TRIMERIC REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: TRIMERIC REMARK 350 SOFTWARE USED: PISA REMARK 350 TOTAL BURIED SURFACE AREA: 6090 ANGSTROM**2 REMARK 350 SURFACE AREA OF THE COMPLEX: 34170 ANGSTROM**2 REMARK 350 CHANGE IN SOLVENT FREE ENERGY: -42.0 KCAL/MOL REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000 REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000 REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000 REMARK 465 REMARK 465 MISSING RESIDUES REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) 
REMARK 465 REMARK 465 M RES C SSSEQI REMARK 465 MET A 1 REMARK 465 SER A 2 REMARK 465 SER A 3 REMARK 465 LYS A 4 REMARK 465 PRO A 5 REMARK 465 LYS A 320 REMARK 465 PRO A 321 REMARK 465 PRO A 322 REMARK 465 LYS A 323 REMARK 465 MET B 1 REMARK 465 SER B 2 REMARK 465 SER B 3 REMARK 465 LYS B 4 REMARK 465 PRO B 5 REMARK 465 LYS B 320 REMARK 465 PRO B 321 REMARK 465 PRO B 322 REMARK 465 LYS B 323 REMARK 465 MET C 1 REMARK 465 SER C 2 REMARK 465 SER C 3 REMARK 465 LYS C 4 REMARK 465 PRO C 5 REMARK 465 LYS C 320 REMARK 465 PRO C 321 REMARK 465 PRO C 322 REMARK 465 LYS C 323 REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: COVALENT BOND ANGLES REMARK 500 REMARK 500 THE STEREOCHEMICAL PARAMETERS OF THE FOLLOWING RESIDUES REMARK 500 HAVE VALUES WHICH DEVIATE FROM EXPECTED VALUES BY MORE REMARK 500 THAN 6*RMSD (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 500 IDENTIFIER; SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT: (10X,I3,1X,A3,1X,A1,I4,A1,3(1X,A4,2X),12X,F5.1) REMARK 500 REMARK 500 EXPECTED VALUES PROTEIN: ENGH AND HUBER, 1999 REMARK 500 EXPECTED VALUES NUCLEIC ACID: CLOWNEY ET AL 1996 REMARK 500 REMARK 500 M RES CSSEQI ATM1 ATM2 ATM3 REMARK 500 GLY A 99 N - CA - C ANGL. DEV. = -16.9 DEGREES REMARK 500 GLY B 99 N - CA - C ANGL. DEV. = -16.9 DEGREES REMARK 500 GLY C 99 N - CA - C ANGL. DEV. = -16.9 DEGREES REMARK 500 REMARK 500 REMARK: NULL REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: TORSION ANGLES REMARK 500 REMARK 500 TORSION ANGLES OUTSIDE THE EXPECTED RAMACHANDRAN REGIONS: REMARK 500 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; REMARK 500 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT:(10X,I3,1X,A3,1X,A1,I4,A1,4X,F7.2,3X,F7.2) REMARK 500 REMARK 500 EXPECTED VALUES: GJ KLEYWEGT AND TA JONES (1996). PHI/PSI- REMARK 500 CHOLOGY: RAMACHANDRAN REVISITED. 
STRUCTURE 4, 1395 - 1400 REMARK 500 REMARK 500 M RES CSSEQI PSI PHI REMARK 500 PRO A 14 85.88 -48.72 REMARK 500 SER A 16 -5.78 -165.06 REMARK 500 LEU A 53 103.77 -51.24 REMARK 500 ASN A 60 75.10 50.01 REMARK 500 PRO A 63 81.46 -69.84 REMARK 500 PHE A 64 83.60 -41.96 REMARK 500 GLN A 65 -85.83 73.46 REMARK 500 ASN A 69 45.42 29.26 REMARK 500 GLN A 88 -35.97 -39.68 REMARK 500 HIS A 101 7.39 -69.68 REMARK 500 THR A 135 134.44 -30.16 REMARK 500 SER A 136 30.35 -99.07 REMARK 500 SER A 137 147.43 -171.80 REMARK 500 LEU A 140 -7.95 -56.98 REMARK 500 GLN A 143 -17.02 -140.95 REMARK 500 PRO A 144 -39.73 -27.56 REMARK 500 GLU A 151 0.66 -58.92 REMARK 500 ASP A 158 133.47 -39.00 REMARK 500 PRO A 160 90.93 -57.05 REMARK 500 PRO A 167 108.91 -46.32 REMARK 500 ARG A 180 -2.77 -154.05 REMARK 500 PRO A 184 -73.06 -43.78 REMARK 500 GLU A 214 -75.17 -61.77 REMARK 500 THR A 215 -37.19 -28.53 REMARK 500 ARG A 222 -84.88 -85.02 REMARK 500 VAL A 233 -25.58 -28.46 REMARK 500 LYS A 266 2.48 -56.69 REMARK 500 ASN A 279 79.63 -113.91 REMARK 500 PRO A 280 0.83 -43.08 REMARK 500 VAL A 289 -70.82 -63.14 REMARK 500 LEU A 301 -12.50 -49.38 REMARK 500 THR A 306 126.95 -33.68 REMARK 500 PRO B 14 85.85 -48.65 REMARK 500 SER B 16 -5.73 -165.10 REMARK 500 LEU B 53 103.76 -51.23 REMARK 500 ASN B 60 75.07 50.02 REMARK 500 PRO B 63 81.46 -69.86 REMARK 500 PHE B 64 83.61 -41.96 REMARK 500 GLN B 65 -85.84 73.46 REMARK 500 ASN B 69 45.41 29.29 REMARK 500 GLN B 88 -35.81 -39.74 REMARK 500 HIS B 101 7.29 -69.59 REMARK 500 THR B 135 134.44 -30.13 REMARK 500 SER B 136 30.33 -99.04 REMARK 500 SER B 137 147.42 -171.82 REMARK 500 LEU B 140 -7.94 -56.99 REMARK 500 GLN B 143 -17.05 -140.95 REMARK 500 PRO B 144 -39.71 -27.55 REMARK 500 GLU B 151 0.66 -58.93 REMARK 500 ASP B 158 133.44 -38.97 REMARK 500 PRO B 160 90.85 -56.98 REMARK 500 PRO B 167 108.92 -46.33 REMARK 500 ARG B 180 -2.74 -154.05 REMARK 500 PRO B 184 -73.14 -43.72 REMARK 500 GLU B 214 -75.16 -61.81 REMARK 500 THR B 215 -37.16 -28.54 REMARK 500 ARG B 
222 -84.88 -84.96 REMARK 500 VAL B 233 -25.55 -28.51 REMARK 500 LYS B 266 2.55 -56.75 REMARK 500 ASN B 279 79.62 -113.87 REMARK 500 PRO B 280 0.78 -43.02 REMARK 500 VAL B 289 -70.76 -63.19 REMARK 500 LEU B 301 -12.54 -49.31 REMARK 500 THR B 306 126.94 -33.71 REMARK 500 PRO C 14 85.85 -48.68 REMARK 500 SER C 16 -5.79 -165.09 REMARK 500 LEU C 53 103.78 -51.23 REMARK 500 ASN C 60 75.10 50.02 REMARK 500 PRO C 63 81.43 -69.79 REMARK 500 PHE C 64 83.59 -41.92 REMARK 500 GLN C 65 -85.86 73.49 REMARK 500 ASN C 69 45.43 29.21 REMARK 500 GLN C 88 -36.05 -39.63 REMARK 500 HIS C 101 7.37 -69.69 REMARK 500 THR C 135 134.44 -30.07 REMARK 500 SER C 136 30.32 -99.11 REMARK 500 SER C 137 147.43 -171.81 REMARK 500 LEU C 140 -7.99 -56.95 REMARK 500 GLN C 143 -17.05 -140.94 REMARK 500 PRO C 144 -39.76 -27.49 REMARK 500 GLU C 151 0.69 -58.91 REMARK 500 ASP C 158 133.48 -39.02 REMARK 500 PRO C 160 90.90 -57.03 REMARK 500 PRO C 167 108.93 -46.31 REMARK 500 ARG C 180 -2.76 -154.04 REMARK 500 PRO C 184 -73.09 -43.76 REMARK 500 GLU C 214 -75.17 -61.77 REMARK 500 THR C 215 -37.13 -28.58 REMARK 500 ARG C 222 -84.82 -85.02 REMARK 500 VAL C 233 -25.63 -28.42 REMARK 500 LYS C 266 2.48 -56.66 REMARK 500 ASN C 279 79.62 -113.88 REMARK 500 PRO C 280 0.85 -43.07 REMARK 500 VAL C 289 -70.88 -63.14 REMARK 500 LEU C 301 -12.51 -49.33 REMARK 500 THR C 306 126.95 -33.67 REMARK 500 REMARK 500 REMARK: NULL REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: PLANAR GROUPS REMARK 500 REMARK 500 PLANAR GROUPS IN THE FOLLOWING RESIDUES HAVE A TOTAL REMARK 500 RMS DISTANCE OF ALL ATOMS FROM THE BEST-FIT PLANE REMARK 500 BY MORE THAN AN EXPECTED VALUE OF 6*RMSD, WITH AN REMARK 500 RMSD 0.02 ANGSTROMS, OR AT LEAST ONE ATOM HAS REMARK 500 AN RMSD GREATER THAN THIS VALUE REMARK 500 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; REMARK 500 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). 
REMARK 500 REMARK 500 M RES CSSEQI RMS TYPE REMARK 500 TYR A 176 0.07 SIDE_CHAIN REMARK 500 TYR B 176 0.07 SIDE_CHAIN REMARK 500 TYR C 176 0.07 SIDE_CHAIN REMARK 500 REMARK 500 REMARK: NULL REMARK 620 REMARK 620 METAL COORDINATION REMARK 620 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; REMARK 620 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE): REMARK 620 REMARK 620 COORDINATION ANGLES FOR: M RES CSSEQI METAL REMARK 620 MN A 500 MN REMARK 620 N RES CSSEQI ATOM REMARK 620 1 ASP A 124 OD2 REMARK 620 2 HAR A 906 OH1 87.5 REMARK 620 3 ASP A 128 OD2 77.2 119.9 REMARK 620 4 HIS A 101 ND1 78.9 114.6 118.5 REMARK 620 N 1 2 3 REMARK 620 REMARK 620 COORDINATION ANGLES FOR: M RES CSSEQI METAL REMARK 620 MN A 501 MN REMARK 620 N RES CSSEQI ATOM REMARK 620 1 ASP A 124 OD1 REMARK 620 2 ASP A 232 OD2 74.9 REMARK 620 3 ASP A 234 OD1 91.6 75.8 REMARK 620 4 ASP A 234 OD2 130.8 61.4 57.9 REMARK 620 5 HAR A 906 OH1 106.8 58.3 122.1 69.3 REMARK 620 6 HIS A 126 ND1 87.5 157.2 119.8 140.0 115.6 REMARK 620 7 HAR A 906 NH1 108.5 90.6 152.0 94.1 34.3 81.3 REMARK 620 N 1 2 3 4 5 6 REMARK 620 REMARK 620 COORDINATION ANGLES FOR: M RES CSSEQI METAL REMARK 620 MN B 502 MN REMARK 620 N RES CSSEQI ATOM REMARK 620 1 HIS B 101 ND1 REMARK 620 2 ASP B 124 OD2 78.9 REMARK 620 3 HAR B 907 OH1 114.6 87.5 REMARK 620 4 ASP B 128 OD2 118.5 77.2 119.9 REMARK 620 N 1 2 3 REMARK 620 REMARK 620 COORDINATION ANGLES FOR: M RES CSSEQI METAL REMARK 620 MN B 503 MN REMARK 620 N RES CSSEQI ATOM REMARK 620 1 ASP B 232 OD2 REMARK 620 2 HAR B 907 OH1 58.3 REMARK 620 3 ASP B 124 OD1 74.9 106.8 REMARK 620 4 ASP B 234 OD1 75.7 123.6 89.1 REMARK 620 5 ASP B 234 OD2 58.4 71.2 126.3 56.8 REMARK 620 6 HIS B 126 ND1 157.2 115.5 87.5 119.0 143.6 REMARK 620 7 HAR B 907 NH1 90.6 34.3 108.4 154.3 97.5 81.3 REMARK 620 N 1 2 3 4 5 6 REMARK 620 REMARK 620 COORDINATION ANGLES FOR: M RES CSSEQI METAL REMARK 620 MN C 504 MN REMARK 620 N RES CSSEQI ATOM REMARK 620 1 HAR C 908 OH1 REMARK 620 2 HIS C 101 ND1 114.6 REMARK 620 3 ASP C 124 
OD2 87.5 78.9 REMARK 620 4 ASP C 128 OD2 119.9 118.5 77.2 REMARK 620 N 1 2 3 REMARK 620 REMARK 620 COORDINATION ANGLES FOR: M RES CSSEQI METAL REMARK 620 MN C 505 MN REMARK 620 N RES CSSEQI ATOM REMARK 620 1 HAR C 908 OH1 REMARK 620 2 ASP C 124 OD1 106.8 REMARK 620 3 HIS C 126 ND1 115.6 87.5 REMARK 620 4 ASP C 234 OD1 123.4 89.0 119.2 REMARK 620 5 ASP C 234 OD2 71.2 126.5 143.3 56.8 REMARK 620 6 ASP C 232 OD2 58.3 74.8 157.2 75.5 58.6 REMARK 620 7 HAR C 908 NH1 34.3 108.5 81.3 154.2 97.5 90.6 REMARK 620 N 1 2 3 4 5 6 REMARK 800 REMARK 800 SITE REMARK 800 SITE_IDENTIFIER: AC1 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MN A 500 REMARK 800 SITE_IDENTIFIER: AC2 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MN A 501 REMARK 800 SITE_IDENTIFIER: AC3 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MN B 502 REMARK 800 SITE_IDENTIFIER: AC4 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MN B 503 REMARK 800 SITE_IDENTIFIER: AC5 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MN C 504 REMARK 800 SITE_IDENTIFIER: AC6 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MN C 505 REMARK 800 SITE_IDENTIFIER: AC7 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE HAR A 906 REMARK 800 SITE_IDENTIFIER: AC8 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE HAR B 907 REMARK 800 SITE_IDENTIFIER: AC9 REMARK 800 EVIDENCE_CODE: SOFTWARE REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE HAR C 908 REMARK 900 REMARK 900 RELATED ENTRIES REMARK 900 RELATED ID: 1RLA RELATED DB: PDB REMARK 900 NATIVE ARGINASE DBREF 1HQF A 1 323 UNP P07824 ARGI1_RAT 1 323 DBREF 1HQF B 1 323 UNP P07824 ARGI1_RAT 1 323 DBREF 1HQF C 1 323 UNP P07824 ARGI1_RAT 1 323 SEQRES 1 A 323 MET SER SER 
LYS PRO LYS PRO ILE GLU ILE ILE GLY ALA SEQRES 2 A 323 PRO PHE SER LYS GLY GLN PRO ARG GLY GLY VAL GLU LYS SEQRES 3 A 323 GLY PRO ALA ALA LEU ARG LYS ALA GLY LEU VAL GLU LYS SEQRES 4 A 323 LEU LYS GLU THR GLU TYR ASN VAL ARG ASP HIS GLY ASP SEQRES 5 A 323 LEU ALA PHE VAL ASP VAL PRO ASN ASP SER PRO PHE GLN SEQRES 6 A 323 ILE VAL LYS ASN PRO ARG SER VAL GLY LYS ALA ASN GLU SEQRES 7 A 323 GLN LEU ALA ALA VAL VAL ALA GLU THR GLN LYS ASN GLY SEQRES 8 A 323 THR ILE SER VAL VAL LEU GLY GLY ASP HIS SER MET ALA SEQRES 9 A 323 ILE GLY SER ILE SER GLY HIS ALA ARG VAL HIS PRO ASP SEQRES 10 A 323 LEU CYS VAL ILE TRP VAL ASP ALA HIS THR ASP ILE ASN SEQRES 11 A 323 THR PRO LEU THR THR SER SER GLY ASN LEU HIS GLY GLN SEQRES 12 A 323 PRO VAL ALA PHE LEU LEU LYS GLU LEU LYS GLY LYS PHE SEQRES 13 A 323 PRO ASP VAL PRO GLY PHE SER TRP VAL THR PRO CYS ILE SEQRES 14 A 323 SER ALA LYS ASP ILE VAL TYR ILE GLY LEU ARG ASP VAL SEQRES 15 A 323 ASP PRO GLY GLU HIS TYR ILE ILE LYS THR LEU GLY ILE SEQRES 16 A 323 LYS TYR PHE SER MET THR GLU VAL ASP LYS LEU GLY ILE SEQRES 17 A 323 GLY LYS VAL MET GLU GLU THR PHE SER TYR LEU LEU GLY SEQRES 18 A 323 ARG LYS LYS ARG PRO ILE HIS LEU SER PHE ASP VAL ASP SEQRES 19 A 323 GLY LEU ASP PRO VAL PHE THR PRO ALA THR GLY THR PRO SEQRES 20 A 323 VAL VAL GLY GLY LEU SER TYR ARG GLU GLY LEU TYR ILE SEQRES 21 A 323 THR GLU GLU ILE TYR LYS THR GLY LEU LEU SER GLY LEU SEQRES 22 A 323 ASP ILE MET GLU VAL ASN PRO THR LEU GLY LYS THR PRO SEQRES 23 A 323 GLU GLU VAL THR ARG THR VAL ASN THR ALA VAL ALA LEU SEQRES 24 A 323 THR LEU SER CYS PHE GLY THR LYS ARG GLU GLY ASN HIS SEQRES 25 A 323 LYS PRO GLU THR ASP TYR LEU LYS PRO PRO LYS SEQRES 1 B 323 MET SER SER LYS PRO LYS PRO ILE GLU ILE ILE GLY ALA SEQRES 2 B 323 PRO PHE SER LYS GLY GLN PRO ARG GLY GLY VAL GLU LYS SEQRES 3 B 323 GLY PRO ALA ALA LEU ARG LYS ALA GLY LEU VAL GLU LYS SEQRES 4 B 323 LEU LYS GLU THR GLU TYR ASN VAL ARG ASP HIS GLY ASP SEQRES 5 B 323 LEU ALA PHE VAL ASP VAL PRO ASN ASP SER PRO PHE GLN SEQRES 6 
B 323 ILE VAL LYS ASN PRO ARG SER VAL GLY LYS ALA ASN GLU SEQRES 7 B 323 GLN LEU ALA ALA VAL VAL ALA GLU THR GLN LYS ASN GLY SEQRES 8 B 323 THR ILE SER VAL VAL LEU GLY GLY ASP HIS SER MET ALA SEQRES 9 B 323 ILE GLY SER ILE SER GLY HIS ALA ARG VAL HIS PRO ASP SEQRES 10 B 323 LEU CYS VAL ILE TRP VAL ASP ALA HIS THR ASP ILE ASN SEQRES 11 B 323 THR PRO LEU THR THR SER SER GLY ASN LEU HIS GLY GLN SEQRES 12 B 323 PRO VAL ALA PHE LEU LEU LYS GLU LEU LYS GLY LYS PHE SEQRES 13 B 323 PRO ASP VAL PRO GLY PHE SER TRP VAL THR PRO CYS ILE SEQRES 14 B 323 SER ALA LYS ASP ILE VAL TYR ILE GLY LEU ARG ASP VAL SEQRES 15 B 323 ASP PRO GLY GLU HIS TYR ILE ILE LYS THR LEU GLY ILE SEQRES 16 B 323 LYS TYR PHE SER MET THR GLU VAL ASP LYS LEU GLY ILE SEQRES 17 B 323 GLY LYS VAL MET GLU GLU THR PHE SER TYR LEU LEU GLY SEQRES 18 B 323 ARG LYS LYS ARG PRO ILE HIS LEU SER PHE ASP VAL ASP SEQRES 19 B 323 GLY LEU ASP PRO VAL PHE THR PRO ALA THR GLY THR PRO SEQRES 20 B 323 VAL VAL GLY GLY LEU SER TYR ARG GLU GLY LEU TYR ILE SEQRES 21 B 323 THR GLU GLU ILE TYR LYS THR GLY LEU LEU SER GLY LEU SEQRES 22 B 323 ASP ILE MET GLU VAL ASN PRO THR LEU GLY LYS THR PRO SEQRES 23 B 323 GLU GLU VAL THR ARG THR VAL ASN THR ALA VAL ALA LEU SEQRES 24 B 323 THR LEU SER CYS PHE GLY THR LYS ARG GLU GLY ASN HIS SEQRES 25 B 323 LYS PRO GLU THR ASP TYR LEU LYS PRO PRO LYS SEQRES 1 C 323 MET SER SER LYS PRO LYS PRO ILE GLU ILE ILE GLY ALA SEQRES 2 C 323 PRO PHE SER LYS GLY GLN PRO ARG GLY GLY VAL GLU LYS SEQRES 3 C 323 GLY PRO ALA ALA LEU ARG LYS ALA GLY LEU VAL GLU LYS SEQRES 4 C 323 LEU LYS GLU THR GLU TYR ASN VAL ARG ASP HIS GLY ASP SEQRES 5 C 323 LEU ALA PHE VAL ASP VAL PRO ASN ASP SER PRO PHE GLN SEQRES 6 C 323 ILE VAL LYS ASN PRO ARG SER VAL GLY LYS ALA ASN GLU SEQRES 7 C 323 GLN LEU ALA ALA VAL VAL ALA GLU THR GLN LYS ASN GLY SEQRES 8 C 323 THR ILE SER VAL VAL LEU GLY GLY ASP HIS SER MET ALA SEQRES 9 C 323 ILE GLY SER ILE SER GLY HIS ALA ARG VAL HIS PRO ASP SEQRES 10 C 323 LEU CYS VAL ILE TRP VAL ASP ALA HIS THR 
ASP ILE ASN SEQRES 11 C 323 THR PRO LEU THR THR SER SER GLY ASN LEU HIS GLY GLN SEQRES 12 C 323 PRO VAL ALA PHE LEU LEU LYS GLU LEU LYS GLY LYS PHE SEQRES 13 C 323 PRO ASP VAL PRO GLY PHE SER TRP VAL THR PRO CYS ILE SEQRES 14 C 323 SER ALA LYS ASP ILE VAL TYR ILE GLY LEU ARG ASP VAL SEQRES 15 C 323 ASP PRO GLY GLU HIS TYR ILE ILE LYS THR LEU GLY ILE SEQRES 16 C 323 LYS TYR PHE SER MET THR GLU VAL ASP LYS LEU GLY ILE SEQRES 17 C 323 GLY LYS VAL MET GLU GLU THR PHE SER TYR LEU LEU GLY SEQRES 18 C 323 ARG LYS LYS ARG PRO ILE HIS LEU SER PHE ASP VAL ASP SEQRES 19 C 323 GLY LEU ASP PRO VAL PHE THR PRO ALA THR GLY THR PRO SEQRES 20 C 323 VAL VAL GLY GLY LEU SER TYR ARG GLU GLY LEU TYR ILE SEQRES 21 C 323 THR GLU GLU ILE TYR LYS THR GLY LEU LEU SER GLY LEU SEQRES 22 C 323 ASP ILE MET GLU VAL ASN PRO THR LEU GLY LYS THR PRO SEQRES 23 C 323 GLU GLU VAL THR ARG THR VAL ASN THR ALA VAL ALA LEU SEQRES 24 C 323 THR LEU SER CYS PHE GLY THR LYS ARG GLU GLY ASN HIS SEQRES 25 C 323 LYS PRO GLU THR ASP TYR LEU LYS PRO PRO LYS HET MN A 500 1 HET MN A 501 1 HET MN B 502 1 HET MN B 503 1 HET MN C 504 1 HET MN C 505 1 HET HAR A 906 13 HET HAR B 907 13 HET HAR C 908 13 HETNAM MN MANGANESE (II) ION HETNAM HAR N-OMEGA-HYDROXY-L-ARGININE FORMUL 4 MN 6(MN 2+) FORMUL 10 HAR 3(C6 H14 N4 O3) FORMUL 13 HOH *15(H2 O) HELIX 1 1 GLY A 23 GLU A 25 5 3 HELIX 2 2 LYS A 26 ALA A 34 1 9 HELIX 3 3 GLY A 35 GLU A 42 1 8 HELIX 4 4 ASN A 69 GLY A 91 1 23 HELIX 5 5 SER A 102 HIS A 115 1 14 HELIX 6 6 ASN A 139 GLY A 142 5 4 HELIX 7 7 GLN A 143 LEU A 149 1 7 HELIX 8 8 ASP A 183 LEU A 193 1 11 HELIX 9 9 SER A 199 GLY A 207 1 9 HELIX 10 10 GLY A 207 GLY A 221 1 15 HELIX 11 11 SER A 253 LYS A 266 1 14 HELIX 12 12 THR A 285 PHE A 304 1 20 HELIX 13 13 GLY B 23 GLU B 25 5 3 HELIX 14 14 LYS B 26 ALA B 34 1 9 HELIX 15 15 GLY B 35 GLU B 42 1 8 HELIX 16 16 ASN B 69 GLY B 91 1 23 HELIX 17 17 SER B 102 HIS B 115 1 14 HELIX 18 18 ASN B 139 GLY B 142 5 4 HELIX 19 19 GLN B 143 LEU B 149 1 7 HELIX 20 20 ASP B 183 LEU B 193 1 
11 HELIX 21 21 SER B 199 GLY B 207 1 9 HELIX 22 22 GLY B 207 GLY B 221 1 15 HELIX 23 23 SER B 253 LYS B 266 1 14 HELIX 24 24 THR B 285 PHE B 304 1 20 HELIX 25 25 GLY C 23 GLU C 25 5 3 HELIX 26 26 LYS C 26 ALA C 34 1 9 HELIX 27 27 GLY C 35 GLU C 42 1 8 HELIX 28 28 ASN C 69 GLY C 91 1 23 HELIX 29 29 SER C 102 HIS C 115 1 14 HELIX 30 30 ASN C 139 GLY C 142 5 4 HELIX 31 31 GLN C 143 LEU C 149 1 7 HELIX 32 32 ASP C 183 LEU C 193 1 11 HELIX 33 33 SER C 199 GLY C 207 1 9 HELIX 34 34 GLY C 207 GLY C 221 1 15 HELIX 35 35 SER C 253 LYS C 266 1 14 HELIX 36 36 THR C 285 PHE C 304 1 20 SHEET 1 A 8 ASN A 46 ASP A 52 0 SHEET 2 A 8 PRO A 7 GLY A 12 1 N ILE A 8 O ASN A 46 SHEET 3 A 8 ILE A 93 LEU A 97 1 N ILE A 93 O PRO A 7 SHEET 4 A 8 LEU A 270 MET A 276 1 O SER A 271 N SER A 94 SHEET 5 A 8 ILE A 227 ASP A 232 1 O ILE A 227 N SER A 271 SHEET 6 A 8 VAL A 120 VAL A 123 1 O ILE A 121 N SER A 230 SHEET 7 A 8 ILE A 174 ILE A 177 1 O VAL A 175 N TRP A 122 SHEET 8 A 8 TYR A 197 PHE A 198 1 N PHE A 198 O TYR A 176 SHEET 1 B 8 ASN B 46 ASP B 52 0 SHEET 2 B 8 PRO B 7 GLY B 12 1 N ILE B 8 O ASN B 46 SHEET 3 B 8 ILE B 93 LEU B 97 1 N ILE B 93 O PRO B 7 SHEET 4 B 8 LEU B 270 MET B 276 1 O SER B 271 N SER B 94 SHEET 5 B 8 ILE B 227 ASP B 232 1 O ILE B 227 N SER B 271 SHEET 6 B 8 VAL B 120 VAL B 123 1 O ILE B 121 N SER B 230 SHEET 7 B 8 ILE B 174 ILE B 177 1 O VAL B 175 N TRP B 122 SHEET 8 B 8 TYR B 197 PHE B 198 1 N PHE B 198 O TYR B 176 SHEET 1 C 8 ASN C 46 ASP C 52 0 SHEET 2 C 8 PRO C 7 GLY C 12 1 N ILE C 8 O ASN C 46 SHEET 3 C 8 ILE C 93 LEU C 97 1 N ILE C 93 O PRO C 7 SHEET 4 C 8 LEU C 270 MET C 276 1 O SER C 271 N SER C 94 SHEET 5 C 8 ILE C 227 ASP C 232 1 O ILE C 227 N SER C 271 SHEET 6 C 8 VAL C 120 VAL C 123 1 O ILE C 121 N SER C 230 SHEET 7 C 8 ILE C 174 ILE C 177 1 O VAL C 175 N TRP C 122 SHEET 8 C 8 TYR C 197 PHE C 198 1 N PHE C 198 O TYR C 176 LINK MN MN A 500 OD2 ASP A 124 1555 1555 2.03 LINK MN MN A 500 OH1 HAR A 906 1555 1555 2.64 LINK MN MN A 500 OD2 ASP A 128 1555 1555 2.08 
LINK MN MN A 500 ND1 HIS A 101 1555 1555 2.35 LINK MN MN A 501 OD1 ASP A 124 1555 1555 2.19 LINK MN MN A 501 OD2 ASP A 232 1555 1555 2.49 LINK MN MN A 501 OD1 ASP A 234 1555 1555 2.48 LINK MN MN A 501 OD2 ASP A 234 1555 1555 2.01 LINK MN MN A 501 OH1 HAR A 906 1555 1555 2.21 LINK MN MN A 501 ND1 HIS A 126 1555 1555 1.99 LINK MN MN A 501 NH1 HAR A 906 1555 1555 2.34 LINK MN MN B 502 ND1 HIS B 101 1555 1555 2.35 LINK MN MN B 502 OD2 ASP B 124 1555 1555 2.03 LINK MN MN B 502 OH1 HAR B 907 1555 1555 2.64 LINK MN MN B 502 OD2 ASP B 128 1555 1555 2.08 LINK MN MN B 503 OD2 ASP B 232 1555 1555 2.49 LINK MN MN B 503 OH1 HAR B 907 1555 1555 2.21 LINK MN MN B 503 OD1 ASP B 124 1555 1555 2.19 LINK MN MN B 503 OD1 ASP B 234 1555 1555 2.58 LINK MN MN B 503 OD2 ASP B 234 1555 1555 1.89 LINK MN MN B 503 ND1 HIS B 126 1555 1555 1.99 LINK MN MN B 503 NH1 HAR B 907 1555 1555 2.34 LINK MN MN C 504 OH1 HAR C 908 1555 1555 2.64 LINK MN MN C 504 ND1 HIS C 101 1555 1555 2.35 LINK MN MN C 504 OD2 ASP C 124 1555 1555 2.03 LINK MN MN C 504 OD2 ASP C 128 1555 1555 2.08 LINK MN MN C 505 OH1 HAR C 908 1555 1555 2.21 LINK MN MN C 505 OD1 ASP C 124 1555 1555 2.19 LINK MN MN C 505 ND1 HIS C 126 1555 1555 1.99 LINK MN MN C 505 OD1 ASP C 234 1555 1555 2.59 LINK MN MN C 505 OD2 ASP C 234 1555 1555 1.88 LINK MN MN C 505 OD2 ASP C 232 1555 1555 2.49 LINK MN MN C 505 NH1 HAR C 908 1555 1555 2.34 SITE 1 AC1 5 HIS A 101 ASP A 124 ASP A 128 ASP A 232 SITE 2 AC1 5 HAR A 906 SITE 1 AC2 5 ASP A 124 HIS A 126 ASP A 232 ASP A 234 SITE 2 AC2 5 HAR A 906 SITE 1 AC3 5 HIS B 101 ASP B 124 ASP B 128 ASP B 232 SITE 2 AC3 5 HAR B 907 SITE 1 AC4 5 ASP B 124 HIS B 126 ASP B 232 ASP B 234 SITE 2 AC4 5 HAR B 907 SITE 1 AC5 5 HIS C 101 ASP C 124 ASP C 128 ASP C 232 SITE 2 AC5 5 HAR C 908 SITE 1 AC6 5 ASP C 124 HIS C 126 ASP C 232 ASP C 234 SITE 2 AC6 5 HAR C 908 SITE 1 AC7 13 ASP A 124 HIS A 126 ASP A 128 SER A 137 SITE 2 AC7 13 HIS A 141 GLY A 142 ASP A 183 GLU A 186 SITE 3 AC7 13 ASP A 232 ASP A 234 GLU A 277 MN A 500 
SITE 4 AC7 13 MN A 501 SITE 1 AC8 13 ASP B 124 HIS B 126 ASP B 128 SER B 137 SITE 2 AC8 13 HIS B 141 GLY B 142 ASP B 183 GLU B 186 SITE 3 AC8 13 ASP B 232 ASP B 234 GLU B 277 MN B 502 SITE 4 AC8 13 MN B 503 SITE 1 AC9 13 ASP C 124 HIS C 126 ASP C 128 SER C 137 SITE 2 AC9 13 HIS C 141 GLY C 142 ASP C 183 GLU C 186 SITE 3 AC9 13 ASP C 232 ASP C 234 GLU C 277 MN C 504 SITE 4 AC9 13 MN C 505 CRYST1 88.000 88.000 112.000 90.00 90.00 120.00 P 32 9 ORIGX1 1.000000 0.000000 0.000000 0.00000 ORIGX2 0.000000 1.000000 0.000000 0.00000 ORIGX3 0.000000 0.000000 1.000000 0.00000 SCALE1 0.011364 0.006561 0.000000 0.00000 SCALE2 0.000000 0.013122 0.000000 0.00000 SCALE3 0.000000 0.000000 0.008929 0.00000 ATOM 1 N LYS A 6 20.150 24.280 -3.804 1.00 99.08 N ATOM 2 CA LYS A 6 20.016 25.666 -3.251 1.00 99.00 C ATOM 3 C LYS A 6 20.110 25.692 -1.713 1.00 98.40 C ATOM 4 O LYS A 6 19.950 24.661 -1.046 1.00 97.61 O ATOM 5 CB LYS A 6 18.690 26.275 -3.726 1.00 99.66 C ATOM 6 CG LYS A 6 18.835 27.475 -4.668 1.00 99.85 C ATOM 7 CD LYS A 6 19.049 28.777 -3.884 1.00100.73 C ATOM 8 CE LYS A 6 19.103 30.010 -4.789 1.00101.23 C ATOM 9 NZ LYS A 6 19.115 31.289 -4.008 1.00100.75 N ATOM 10 N PRO A 7 20.388 26.877 -1.135 1.00 97.80 N ATOM 11 CA PRO A 7 20.507 27.023 0.316 1.00 97.15 C ATOM 12 C PRO A 7 19.380 27.825 0.940 1.00 96.32 C ATOM 13 O PRO A 7 18.846 28.746 0.325 1.00 96.22 O ATOM 14 CB PRO A 7 21.827 27.749 0.457 1.00 97.20 C ATOM 15 CG PRO A 7 21.693 28.770 -0.642 1.00 97.43 C ATOM 16 CD PRO A 7 21.026 28.026 -1.803 1.00 97.47 C ATOM 17 N ILE A 8 19.053 27.481 2.180 1.00 95.57 N ATOM 18 CA ILE A 8 17.996 28.154 2.922 1.00 94.69 C ATOM 19 C ILE A 8 18.514 28.607 4.269 1.00 94.01 C ATOM 20 O ILE A 8 19.133 27.814 4.985 1.00 93.77 O ATOM 21 CB ILE A 8 16.792 27.204 3.226 1.00 94.64 C ATOM 22 CG1 ILE A 8 16.130 26.722 1.935 1.00 93.94 C ATOM 23 CG2 ILE A 8 15.771 27.916 4.108 1.00 93.66 C ATOM 24 CD1 ILE A 8 15.647 27.841 1.065 1.00 94.03 C ATOM 25 N GLU A 9 18.293 29.874 4.608 1.00 92.63 N ATOM 
26 CA GLU A 9 18.681 30.319 5.937 1.00 91.96 C ATOM 27 C GLU A 9 17.423 30.890 6.573 1.00 91.79 C ATOM 28 O GLU A 9 16.719 31.703 5.962 1.00 91.03 O ATOM 29 CB GLU A 9 19.783 31.383 5.946 1.00 91.63 C ATOM 30 CG GLU A 9 20.487 31.410 7.322 1.00 91.26 C ATOM 31 CD GLU A 9 21.121 32.739 7.707 1.00 91.17 C ATOM 32 OE1 GLU A 9 21.711 32.810 8.808 1.00 89.98 O ATOM 33 OE2 GLU A 9 21.029 33.705 6.926 1.00 91.53 O ATOM 34 N ILE A 10 17.132 30.438 7.790 1.00 91.97 N ATOM 35 CA ILE A 10 15.947 30.883 8.514 1.00 92.34 C ATOM 36 C ILE A 10 16.288 31.950 9.541 1.00 91.65 C ATOM 37 O ILE A 10 17.227 31.793 10.319 1.00 91.67 O ATOM 38 CB ILE A 10 15.268 29.705 9.249 1.00 93.05 C ATOM 39 CG1 ILE A 10 14.834 28.654 8.224 1.00 93.78 C ATOM 40 CG2 ILE A 10 14.079 30.216 10.082 1.00 91.79 C ATOM 41 CD1 ILE A 10 14.226 27.398 8.817 1.00 92.99 C ATOM 42 N ILE A 11 15.516 33.032 9.539 1.00 90.86 N ATOM 43 CA ILE A 11 15.715 34.133 10.475 1.00 89.22 C ATOM 44 C ILE A 11 14.465 34.241 11.331 1.00 88.75 C ATOM 45 O ILE A 11 13.350 34.222 10.816 1.00 87.87 O ATOM 46 CB ILE A 11 15.906 35.491 9.751 1.00 88.97 C ATOM 47 CG1 ILE A 11 17.018 35.381 8.714 1.00 87.62 C ATOM 48 CG2 ILE A 11 16.223 36.582 10.758 1.00 88.11 C ATOM 49 CD1 ILE A 11 18.340 34.985 9.283 1.00 88.39 C ATOM 50 N GLY A 12 14.659 34.346 12.638 1.00 88.08 N ATOM 51 CA GLY A 12 13.529 34.477 13.530 1.00 86.84 C ATOM 52 C GLY A 12 13.402 35.940 13.891 1.00 85.81 C ATOM 53 O GLY A 12 14.381 36.556 14.305 1.00 86.11 O ATOM 54 N ALA A 13 12.207 36.501 13.731 1.00 84.95 N ATOM 55 CA ALA A 13 11.973 37.908 14.041 1.00 83.69 C ATOM 56 C ALA A 13 11.024 38.075 15.226 1.00 83.14 C ATOM 57 O ALA A 13 10.099 38.884 15.166 1.00 83.23 O ATOM 58 CB ALA A 13 11.399 38.627 12.801 1.00 82.18 C ATOM 59 N PRO A 14 11.251 37.329 16.323 1.00 82.39 N ATOM 60 CA PRO A 14 10.376 37.435 17.495 1.00 82.40 C ATOM 61 C PRO A 14 10.136 38.880 17.911 1.00 83.87 C ATOM 62 O PRO A 14 10.828 39.401 18.776 1.00 84.25 O ATOM 63 CB PRO A 14 11.132 36.648 18.552 
1.00 81.00 C ATOM 64 CG PRO A 14 12.562 36.848 18.154 1.00 80.99 C ATOM 65 CD PRO A 14 12.493 36.624 16.677 1.00 81.10 C ATOM 66 N PHE A 15 9.144 39.516 17.296 1.00 86.12 N ATOM 67 CA PHE A 15 8.819 40.913 17.578 1.00 87.98 C ATOM 68 C PHE A 15 7.305 41.081 17.715 1.00 88.21 C ATOM 69 O PHE A 15 6.548 40.163 17.386 1.00 88.51 O ATOM 70 CB PHE A 15 9.349 41.793 16.443 1.00 88.79 C ATOM 71 CG PHE A 15 9.326 43.256 16.746 1.00 91.15 C ATOM 72 CD1 PHE A 15 9.860 43.739 17.938 1.00 92.70 C ATOM 73 CD2 PHE A 15 8.802 44.161 15.827 1.00 91.85 C ATOM 74 CE1 PHE A 15 9.883 45.108 18.207 1.00 93.41 C ATOM 75 CE2 PHE A 15 8.820 45.531 16.083 1.00 93.06 C ATOM 76 CZ PHE A 15 9.359 46.008 17.276 1.00 93.27 C ATOM 77 N SER A 16 6.850 42.237 18.190 1.00 87.82 N ATOM 78 CA SER A 16 5.413 42.425 18.338 1.00 88.48 C ATOM 79 C SER A 16 4.935 43.859 18.544 1.00 90.22 C ATOM 80 O SER A 16 3.734 44.108 18.571 1.00 90.56 O ATOM 81 CB SER A 16 4.896 41.562 19.492 1.00 86.82 C ATOM 82 OG SER A 16 5.377 42.010 20.739 1.00 83.07 O ATOM 83 N LYS A 17 5.856 44.801 18.690 1.00 92.02 N ATOM 84 CA LYS A 17 5.466 46.191 18.913 1.00 93.42 C ATOM 85 C LYS A 17 4.714 46.797 17.725 1.00 93.24 C ATOM 86 O LYS A 17 4.114 47.874 17.829 1.00 92.98 O ATOM 87 CB LYS A 17 6.704 47.033 19.255 1.00 95.56 C ATOM 88 CG LYS A 17 6.807 47.487 20.727 1.00 96.54 C ATOM 89 CD LYS A 17 6.616 46.336 21.709 1.00 97.30 C ATOM 90 CE LYS A 17 7.113 46.705 23.100 1.00 97.15 C ATOM 91 NZ LYS A 17 8.596 46.859 23.109 1.00 96.57 N ATOM 92 N GLY A 18 4.739 46.107 16.593 1.00 92.44 N ATOM 93 CA GLY A 18 4.026 46.619 15.441 1.00 92.27 C ATOM 94 C GLY A 18 2.538 46.700 15.749 1.00 92.53 C ATOM 95 O GLY A 18 1.793 47.378 15.040 1.00 92.29 O ATOM 96 N GLN A 19 2.113 46.009 16.811 1.00 92.65 N ATOM 97 CA GLN A 19 0.713 45.969 17.238 1.00 92.73 C ATOM 98 C GLN A 19 0.574 45.619 18.727 1.00 94.03 C ATOM 99 O GLN A 19 1.526 45.154 19.355 1.00 93.82 O ATOM 100 CB GLN A 19 -0.067 44.967 16.379 1.00 92.02 C ATOM 101 CG GLN A 19 0.542 
43.575 16.280 1.00 91.29 C ATOM 102 CD GLN A 19 0.385 42.780 17.553 1.00 91.08 C ATOM 103 OE1 GLN A 19 -0.703 42.725 18.118 1.00 91.96 O ATOM 104 NE2 GLN A 19 1.464 42.152 18.009 1.00 89.85 N ATOM 105 N PRO A 20 -0.628 45.825 19.303 1.00 95.05 N ATOM 106 CA PRO A 20 -0.989 45.576 20.709 1.00 95.93 C ATOM 107 C PRO A 20 -0.967 44.147 21.273 1.00 96.49 C ATOM 108 O PRO A 20 -0.271 43.872 22.256 1.00 97.00 O ATOM 109 CB PRO A 20 -2.377 46.203 20.814 1.00 95.49 C ATOM 110 CG PRO A 20 -2.943 45.941 19.473 1.00 95.38 C ATOM 111 CD PRO A 20 -1.801 46.322 18.562 1.00 94.96 C ATOM 112 N ARG A 21 -1.748 43.255 20.670 1.00 96.79 N ATOM 113 CA ARG A 21 -1.837 41.861 21.116 1.00 96.91 C ATOM 114 C ARG A 21 -0.463 41.184 21.226 1.00 94.86 C ATOM 115 O ARG A 21 0.288 41.101 20.247 1.00 94.86 O ATOM 116 CB ARG A 21 -2.729 41.064 20.155 1.00 99.34 C ATOM 117 CG ARG A 21 -3.809 41.896 19.475 1.00102.34 C ATOM 118 CD ARG A 21 -4.595 41.057 18.483 1.00106.42 C ATOM 119 NE ARG A 21 -5.547 40.179 19.152 1.00109.60 N ATOM 120 CZ ARG A 21 -6.649 40.607 19.757 1.00111.63 C ATOM 121 NH1 ARG A 21 -6.939 41.902 19.773 1.00112.45 N ATOM 122 NH2 ARG A 21 -7.461 39.741 20.347 1.00112.44 N ATOM 123 N GLY A 22 -0.149 40.683 22.417 1.00 92.20 N ATOM 124 CA GLY A 22 1.137 40.041 22.627 1.00 89.96 C ATOM 125 C GLY A 22 1.196 38.544 22.385 1.00 88.10 C ATOM 126 O GLY A 22 0.241 37.821 22.653 1.00 87.82 O ATOM 127 N GLY A 23 2.337 38.080 21.888 1.00 86.64 N ATOM 128 CA GLY A 23 2.507 36.667 21.623 1.00 85.01 C ATOM 129 C GLY A 23 3.205 36.439 20.299 1.00 84.30 C ATOM 130 O GLY A 23 4.015 35.522 20.165 1.00 84.52 O ATOM 131 N VAL A 24 2.900 37.285 19.320 1.00 83.77 N ATOM 132 CA VAL A 24 3.489 37.174 17.986 1.00 83.44 C ATOM 133 C VAL A 24 5.004 36.944 17.935 1.00 84.55 C ATOM 134 O VAL A 24 5.517 36.393 16.954 1.00 83.80 O ATOM 135 CB VAL A 24 3.121 38.408 17.120 1.00 82.02 C ATOM 136 CG1 VAL A 24 1.875 38.107 16.303 1.00 82.66 C ATOM 137 CG2 VAL A 24 2.861 39.604 18.004 1.00 80.13 C ATOM 138 N GLU A 
25 5.722 37.363 18.977 1.00 85.75 N ATOM 139 CA GLU A 25 7.168 37.163 18.997 1.00 86.17 C ATOM 140 C GLU A 25 7.437 35.699 19.323 1.00 85.93 C ATOM 141 O GLU A 25 8.528 35.175 19.057 1.00 86.45 O ATOM 142 CB GLU A 25 7.854 38.071 20.026 1.00 86.78 C ATOM 143 CG GLU A 25 7.688 37.675 21.482 1.00 88.34 C ATOM 144 CD GLU A 25 6.301 37.957 22.014 1.00 90.53 C ATOM 145 OE1 GLU A 25 5.594 38.809 21.422 1.00 91.06 O ATOM 146 OE2 GLU A 25 5.927 37.340 23.037 1.00 91.16 O ATOM 147 N LYS A 26 6.434 35.040 19.899 1.00 84.12 N ATOM 148 CA LYS A 26 6.551 33.628 20.218 1.00 82.19 C ATOM 149 C LYS A 26 6.378 32.864 18.908 1.00 80.51 C ATOM 150 O LYS A 26 6.492 31.641 18.870 1.00 80.34 O ATOM 151 CB LYS A 26 5.472 33.211 21.212 1.00 82.12 C ATOM 152 CG LYS A 26 5.627 33.832 22.583 1.00 84.38 C ATOM 153 CD LYS A 26 4.535 33.336 23.539 1.00 86.61 C ATOM 154 CE LYS A 26 4.701 33.921 24.947 1.00 86.85 C ATOM 155 NZ LYS A 26 3.645 33.481 25.915 1.00 86.67 N ATOM 156 N GLY A 27 6.106 33.608 17.838 1.00 78.69 N ATOM 157 CA GLY A 27 5.916 33.013 16.527 1.00 76.08 C ATOM 158 C GLY A 27 6.996 32.022 16.144 1.00 74.71 C ATOM 159 O GLY A 27 6.727 30.821 16.068 1.00 74.43 O ATOM 160 N PRO A 28 8.231 32.489 15.882 1.00 74.00 N ATOM 161 CA PRO A 28 9.330 31.591 15.505 1.00 74.28 C ATOM 162 C PRO A 28 9.599 30.518 16.560 1.00 75.07 C ATOM 163 O PRO A 28 10.172 29.466 16.277 1.00 74.60 O ATOM 164 CB PRO A 28 10.503 32.550 15.323 1.00 73.18 C ATOM 165 CG PRO A 28 9.838 33.786 14.807 1.00 71.70 C ATOM 166 CD PRO A 28 8.635 33.898 15.722 1.00 72.82 C ATOM 167 N ALA A 29 9.166 30.796 17.781 1.00 77.64 N ATOM 168 CA ALA A 29 9.346 29.871 18.891 1.00 79.22 C ATOM 169 C ALA A 29 8.468 28.636 18.726 1.00 79.71 C ATOM 170 O ALA A 29 8.883 27.519 19.062 1.00 81.37 O ATOM 171 CB ALA A 29 9.025 30.574 20.214 1.00 79.86 C ATOM 172 N ALA A 30 7.259 28.839 18.210 1.00 78.80 N ATOM 173 CA ALA A 30 6.329 27.735 18.005 1.00 78.62 C ATOM 174 C ALA A 30 6.643 27.029 16.693 1.00 78.68 C ATOM 175 O ALA A 30 6.593 
25.798 16.600 1.00 78.63 O ATOM 176 CB ALA A 30 4.897 28.254 17.989 1.00 78.52 C ATOM 177 N LEU A 31 6.982 27.819 15.682 1.00 78.19 N ATOM 178 CA LEU A 31 7.291 27.274 14.369 1.00 77.40 C ATOM 179 C LEU A 31 8.457 26.302 14.400 1.00 77.67 C ATOM 180 O LEU A 31 8.397 25.251 13.774 1.00 77.84 O ATOM 181 CB LEU A 31 7.565 28.410 13.378 1.00 75.60 C ATOM 182 CG LEU A 31 6.306 29.077 12.813 1.00 73.01 C ATOM 183 CD1 LEU A 31 6.521 30.549 12.633 1.00 70.89 C ATOM 184 CD2 LEU A 31 5.949 28.417 11.504 1.00 71.89 C ATOM 185 N ARG A 32 9.522 26.639 15.116 1.00 78.94 N ATOM 186 CA ARG A 32 10.656 25.726 15.177 1.00 80.72 C ATOM 187 C ARG A 32 10.224 24.490 15.946 1.00 81.47 C ATOM 188 O ARG A 32 10.589 23.375 15.588 1.00 80.36 O ATOM 189 CB ARG A 32 11.863 26.363 15.879 1.00 82.03 C ATOM 190 CG ARG A 32 12.700 27.320 15.032 1.00 82.49 C ATOM 191 CD ARG A 32 14.092 27.523 15.652 1.00 84.29 C ATOM 192 NE ARG A 32 14.834 28.634 15.057 1.00 85.70 N ATOM 193 CZ ARG A 32 14.491 29.915 15.180 1.00 86.70 C ATOM 194 NH1 ARG A 32 13.416 30.251 15.882 1.00 87.41 N ATOM 195 NH2 ARG A 32 15.219 30.863 14.602 1.00 86.01 N ATOM 196 N LYS A 33 9.449 24.714 17.007 1.00 82.85 N ATOM 197 CA LYS A 33 8.938 23.647 17.855 1.00 84.50 C ATOM 198 C LYS A 33 8.052 22.707 17.041 1.00 85.48 C ATOM 199 O LYS A 33 7.895 21.534 17.380 1.00 85.93 O ATOM 200 CB LYS A 33 8.147 24.240 19.024 1.00 85.61 C ATOM 201 CG LYS A 33 7.295 23.228 19.779 1.00 88.84 C ATOM 202 CD LYS A 33 8.130 22.161 20.495 1.00 91.56 C ATOM 203 CE LYS A 33 8.497 22.583 21.921 1.00 93.10 C ATOM 204 NZ LYS A 33 7.288 22.874 22.767 1.00 93.72 N ATOM 205 N ALA A 34 7.461 23.221 15.969 1.00 86.44 N ATOM 206 CA ALA A 34 6.622 22.388 15.113 1.00 86.60 C ATOM 207 C ALA A 34 7.563 21.654 14.151 1.00 86.88 C ATOM 208 O ALA A 34 7.138 21.028 13.177 1.00 85.52 O ATOM 209 CB ALA A 34 5.629 23.254 14.346 1.00 87.03 C ATOM 210 N GLY A 35 8.856 21.764 14.444 1.00 88.26 N ATOM 211 CA GLY A 35 9.883 21.112 13.652 1.00 89.33 C ATOM 212 C GLY A 35 10.115 
21.685 12.275 1.00 89.25 C ATOM 213 O GLY A 35 10.597 20.985 11.385 1.00 89.47 O ATOM 214 N LEU A 36 9.793 22.956 12.090 1.00 89.38 N ATOM 215 CA LEU A 36 9.969 23.569 10.782 1.00 90.16 C ATOM 216 C LEU A 36 11.369 23.377 10.197 1.00 90.88 C ATOM 217 O LEU A 36 11.522 23.101 9.003 1.00 90.76 O ATOM 218 CB LEU A 36 9.621 25.062 10.845 1.00 89.73 C ATOM 219 CG LEU A 36 9.749 25.890 9.559 1.00 89.10 C ATOM 220 CD1 LEU A 36 9.245 25.116 8.351 1.00 89.09 C ATOM 221 CD2 LEU A 36 8.967 27.180 9.735 1.00 88.91 C ATOM 222 N VAL A 37 12.393 23.495 11.033 1.00 91.52 N ATOM 223 CA VAL A 37 13.756 23.352 10.537 1.00 92.05 C ATOM 224 C VAL A 37 14.023 21.993 9.871 1.00 93.18 C ATOM 225 O VAL A 37 14.330 21.929 8.676 1.00 93.84 O ATOM 226 CB VAL A 37 14.779 23.576 11.663 1.00 90.22 C ATOM 227 CG1 VAL A 37 16.099 24.015 11.067 1.00 88.34 C ATOM 228 CG2 VAL A 37 14.258 24.610 12.643 1.00 89.51 C ATOM 229 N GLU A 38 13.893 20.915 10.641 1.00 93.67 N ATOM 230 CA GLU A 38 14.133 19.558 10.141 1.00 93.22 C ATOM 231 C GLU A 38 13.368 19.216 8.861 1.00 91.50 C ATOM 232 O GLU A 38 13.967 18.795 7.867 1.00 89.54 O ATOM 233 CB GLU A 38 13.786 18.524 11.227 1.00 95.27 C ATOM 234 CG GLU A 38 14.485 18.751 12.567 1.00 97.90 C ATOM 235 CD GLU A 38 13.809 19.818 13.416 1.00 98.72 C ATOM 236 OE1 GLU A 38 12.801 19.504 14.088 1.00 98.61 O ATOM 237 OE2 GLU A 38 14.281 20.973 13.402 1.00100.35 O ATOM 238 N LYS A 39 12.049 19.387 8.888 1.00 89.99 N ATOM 239 CA LYS A 39 11.242 19.076 7.719 1.00 89.24 C ATOM 240 C LYS A 39 11.836 19.778 6.517 1.00 89.66 C ATOM 241 O LYS A 39 11.944 19.197 5.436 1.00 90.45 O ATOM 242 CB LYS A 39 9.803 19.529 7.917 1.00 87.39 C ATOM 243 CG LYS A 39 9.108 18.851 9.057 1.00 86.93 C ATOM 244 CD LYS A 39 7.701 19.373 9.197 1.00 87.74 C ATOM 245 CE LYS A 39 7.034 18.791 10.426 1.00 88.21 C ATOM 246 NZ LYS A 39 6.956 17.314 10.322 1.00 89.64 N ATOM 247 N LEU A 40 12.235 21.030 6.708 1.00 89.66 N ATOM 248 CA LEU A 40 12.824 21.787 5.614 1.00 89.32 C ATOM 249 C LEU A 40 14.053 
21.052 5.093 1.00 90.45 C ATOM 250 O LEU A 40 14.258 20.950 3.884 1.00 89.95 O ATOM 251 CB LEU A 40 13.184 23.209 6.072 1.00 85.82 C ATOM 252 CG LEU A 40 12.106 24.275 5.815 1.00 82.28 C ATOM 253 CD1 LEU A 40 12.438 25.534 6.573 1.00 82.14 C ATOM 254 CD2 LEU A 40 12.009 24.560 4.332 1.00 80.23 C ATOM 255 N LYS A 41 14.856 20.516 6.007 1.00 92.70 N ATOM 256 CA LYS A 41 16.063 19.793 5.617 1.00 95.26 C ATOM 257 C LYS A 41 15.711 18.612 4.719 1.00 96.02 C ATOM 258 O LYS A 41 16.482 18.238 3.833 1.00 96.02 O ATOM 259 CB LYS A 41 16.817 19.307 6.857 1.00 95.44 C ATOM 260 CG LYS A 41 17.342 20.436 7.740 1.00 97.67 C ATOM 261 CD LYS A 41 17.803 19.917 9.099 1.00 98.54 C ATOM 262 CE LYS A 41 18.078 21.056 10.072 1.00 97.92 C ATOM 263 NZ LYS A 41 18.379 20.550 11.440 1.00 96.37 N ATOM 264 N GLU A 42 14.535 18.041 4.943 1.00 97.23 N ATOM 265 CA GLU A 42 14.074 16.906 4.162 1.00 99.27 C ATOM 266 C GLU A 42 13.934 17.193 2.673 1.00100.29 C ATOM 267 O GLU A 42 13.638 16.292 1.889 1.00 99.46 O ATOM 268 CB GLU A 42 12.738 16.421 4.698 1.00100.72 C ATOM 269 CG GLU A 42 12.835 15.631 5.974 1.00104.56 C ATOM 270 CD GLU A 42 11.464 15.245 6.501 1.00107.81 C ATOM 271 OE1 GLU A 42 10.562 14.990 5.666 1.00108.24 O ATOM 272 OE2 GLU A 42 11.295 15.187 7.742 1.00110.21 O ATOM 273 N THR A 43 14.128 18.442 2.275 1.00102.51 N ATOM 274 CA THR A 43 14.010 18.786 0.863 1.00104.86 C ATOM 275 C THR A 43 15.364 18.735 0.165 1.00106.85 C ATOM 276 O THR A 43 16.342 18.194 0.701 1.00107.03 O ATOM 277 CB THR A 43 13.422 20.203 0.661 1.00104.47 C ATOM 278 OG1 THR A 43 14.244 21.165 1.336 1.00104.26 O ATOM 279 CG2 THR A 43 12.005 20.273 1.201 1.00103.94 C ATOM 280 N GLU A 44 15.399 19.298 -1.042 1.00108.54 N ATOM 281 CA GLU A 44 16.611 19.362 -1.854 1.00109.37 C ATOM 282 C GLU A 44 17.418 20.626 -1.505 1.00109.21 C ATOM 283 O GLU A 44 18.448 20.918 -2.123 1.00109.70 O ATOM 284 CB GLU A 44 16.231 19.361 -3.340 1.00109.72 C ATOM 285 CG GLU A 44 14.952 20.135 -3.644 1.00111.43 C ATOM 286 CD GLU A 44 14.707 
20.311 -5.135 1.00112.66 C ATOM 287 OE1 GLU A 44 14.820 19.312 -5.881 1.00113.58 O ATOM 288 OE2 GLU A 44 14.395 21.448 -5.561 1.00112.10 O ATOM 289 N TYR A 45 16.939 21.362 -0.502 1.00108.44 N ATOM 290 CA TYR A 45 17.582 22.592 -0.046 1.00107.04 C ATOM 291 C TYR A 45 18.379 22.374 1.243 1.00106.43 C ATOM 292 O TYR A 45 17.982 21.584 2.109 1.00106.41 O ATOM 293 CB TYR A 45 16.533 23.676 0.219 1.00106.81 C ATOM 294 CG TYR A 45 15.777 24.176 -0.992 1.00106.18 C ATOM 295 CD1 TYR A 45 14.900 23.350 -1.694 1.00106.02 C ATOM 296 CD2 TYR A 45 15.920 25.492 -1.420 1.00106.09 C ATOM 297 CE1 TYR A 45 14.182 23.833 -2.797 1.00106.12 C ATOM 298 CE2 TYR A 45 15.214 25.982 -2.515 1.00105.75 C ATOM 299 CZ TYR A 45 14.348 25.153 -3.198 1.00105.62 C ATOM 300 OH TYR A 45 13.661 25.649 -4.283 1.00104.68 O ATOM 301 N ASN A 46 19.504 23.075 1.362 1.00105.11 N ATOM 302 CA ASN A 46 20.331 22.988 2.559 1.00103.37 C ATOM 303 C ASN A 46 19.779 24.079 3.446 1.00101.06 C ATOM 304 O ASN A 46 19.540 25.194 2.984 1.00 99.33 O ATOM 305 CB ASN A 46 21.797 23.279 2.243 1.00106.07 C ATOM 306 CG ASN A 46 22.368 22.335 1.211 1.00108.03 C ATOM 307 OD1 ASN A 46 22.276 21.113 1.352 1.00109.57 O ATOM 308 ND2 ASN A 46 22.967 22.897 0.164 1.00108.96 N ATOM 309 N VAL A 47 19.586 23.775 4.720 1.00 99.24 N ATOM 310 CA VAL A 47 19.012 24.767 5.611 1.00 98.07 C ATOM 311 C VAL A 47 19.908 25.157 6.774 1.00 96.62 C ATOM 312 O VAL A 47 20.481 24.300 7.443 1.00 96.63 O ATOM 313 CB VAL A 47 17.657 24.270 6.181 1.00 98.47 C ATOM 314 CG1 VAL A 47 16.828 25.451 6.665 1.00 97.35 C ATOM 315 CG2 VAL A 47 16.901 23.481 5.123 1.00 98.39 C ATOM 316 N ARG A 48 20.019 26.461 7.004 1.00 94.83 N ATOM 317 CA ARG A 48 20.809 26.981 8.102 1.00 93.46 C ATOM 318 C ARG A 48 19.889 27.851 8.932 1.00 92.09 C ATOM 319 O ARG A 48 19.276 28.783 8.413 1.00 90.53 O ATOM 320 CB ARG A 48 21.985 27.813 7.583 1.00 95.23 C ATOM 321 CG ARG A 48 22.812 28.488 8.686 1.00 96.76 C ATOM 322 CD ARG A 48 23.895 29.396 8.107 1.00 98.94 C ATOM 323 NE ARG A 
48 23.997 30.689 8.800 1.00102.01 N ATOM 324 CZ ARG A 48 24.580 30.887 9.984 1.00102.83 C ATOM 325 NH1 ARG A 48 25.135 29.874 10.641 1.00103.67 N ATOM 326 NH2 ARG A 48 24.609 32.108 10.510 1.00103.04 N ATOM 327 N ASP A 49 19.772 27.520 10.215 1.00 91.94 N ATOM 328 CA ASP A 49 18.934 28.283 11.129 1.00 92.13 C ATOM 329 C ASP A 49 19.848 29.349 11.727 1.00 92.71 C ATOM 330 O ASP A 49 20.878 29.032 12.322 1.00 93.23 O ATOM 331 CB ASP A 49 18.379 27.373 12.229 1.00 92.41 C ATOM 332 CG ASP A 49 17.266 28.034 13.031 1.00 93.21 C ATOM 333 OD1 ASP A 49 17.523 29.074 13.676 1.00 93.93 O ATOM 334 OD2 ASP A 49 16.125 27.518 13.022 1.00 94.25 O ATOM 335 N HIS A 50 19.467 30.611 11.549 1.00 93.35 N ATOM 336 CA HIS A 50 20.234 31.766 12.027 1.00 93.56 C ATOM 337 C HIS A 50 19.871 32.135 13.481 1.00 93.08 C ATOM 338 O HIS A 50 20.577 32.907 14.135 1.00 92.89 O ATOM 339 CB HIS A 50 19.971 32.936 11.056 1.00 94.71 C ATOM 340 CG HIS A 50 20.757 34.184 11.326 1.00 95.99 C ATOM 341 ND1 HIS A 50 20.652 34.899 12.506 1.00 97.39 N ATOM 342 CD2 HIS A 50 21.584 34.905 10.531 1.00 95.59 C ATOM 343 CE1 HIS A 50 21.373 36.001 12.422 1.00 96.73 C ATOM 344 NE2 HIS A 50 21.950 36.030 11.231 1.00 96.87 N ATOM 345 N GLY A 51 18.785 31.562 13.990 1.00 92.46 N ATOM 346 CA GLY A 51 18.378 31.860 15.348 1.00 92.91 C ATOM 347 C GLY A 51 17.452 33.059 15.378 1.00 93.76 C ATOM 348 O GLY A 51 16.974 33.509 14.334 1.00 93.36 O ATOM 349 N ASP A 52 17.206 33.590 16.574 1.00 94.72 N ATOM 350 CA ASP A 52 16.319 34.741 16.732 1.00 95.39 C ATOM 351 C ASP A 52 16.987 36.067 17.009 1.00 95.22 C ATOM 352 O ASP A 52 17.501 36.296 18.097 1.00 94.37 O ATOM 353 CB ASP A 52 15.301 34.504 17.847 1.00 96.23 C ATOM 354 CG ASP A 52 14.153 33.647 17.402 1.00 96.65 C ATOM 355 OD1 ASP A 52 13.689 33.838 16.258 1.00 97.76 O ATOM 356 OD2 ASP A 52 13.712 32.792 18.197 1.00 97.38 O ATOM 357 N LEU A 53 16.947 36.951 16.025 1.00 96.11 N ATOM 358 CA LEU A 53 17.517 38.270 16.191 1.00 97.52 C ATOM 359 C LEU A 53 16.959 38.885 17.482 
1.00 99.27 C ATOM 360 O LEU A 53 15.805 39.311 17.520 1.00100.25 O ATOM 361 CB LEU A 53 17.147 39.162 14.997 1.00 95.74 C ATOM 362 CG LEU A 53 17.582 38.737 13.593 1.00 94.64 C ATOM 363 CD1 LEU A 53 17.245 39.855 12.624 1.00 93.53 C ATOM 364 CD2 LEU A 53 19.080 38.450 13.563 1.00 94.55 C ATOM 365 N ALA A 54 17.763 38.910 18.543 1.00101.13 N ATOM 366 CA ALA A 54 17.321 39.507 19.799 1.00102.37 C ATOM 367 C ALA A 54 17.209 40.996 19.509 1.00103.74 C ATOM 368 O ALA A 54 18.091 41.581 18.881 1.00102.75 O ATOM 369 CB ALA A 54 18.342 39.251 20.898 1.00101.69 C ATOM 370 N PHE A 55 16.121 41.611 19.948 1.00106.29 N ATOM 371 CA PHE A 55 15.936 43.025 19.676 1.00109.42 C ATOM 372 C PHE A 55 16.023 43.908 20.908 1.00111.23 C ATOM 373 O PHE A 55 15.192 43.821 21.815 1.00111.53 O ATOM 374 CB PHE A 55 14.599 43.256 18.962 1.00110.05 C ATOM 375 CG PHE A 55 14.493 42.564 17.628 1.00109.86 C ATOM 376 CD1 PHE A 55 15.416 42.824 16.616 1.00109.98 C ATOM 377 CD2 PHE A 55 13.481 41.643 17.389 1.00109.37 C ATOM 378 CE1 PHE A 55 15.332 42.177 15.387 1.00109.69 C ATOM 379 CE2 PHE A 55 13.388 40.992 16.166 1.00109.52 C ATOM 380 CZ PHE A 55 14.318 41.260 15.163 1.00109.89 C ATOM 381 N VAL A 56 17.048 44.756 20.917 1.00113.41 N ATOM 382 CA VAL A 56 17.305 45.703 21.998 1.00114.90 C ATOM 383 C VAL A 56 16.120 46.662 22.079 1.00115.83 C ATOM 384 O VAL A 56 15.704 47.222 21.061 1.00115.87 O ATOM 385 CB VAL A 56 18.596 46.508 21.711 1.00115.37 C ATOM 386 CG1 VAL A 56 19.799 45.566 21.643 1.00114.66 C ATOM 387 CG2 VAL A 56 18.454 47.269 20.392 1.00115.29 C ATOM 388 N ASP A 57 15.581 46.866 23.278 1.00116.87 N ATOM 389 CA ASP A 57 14.423 47.741 23.408 1.00118.41 C ATOM 390 C ASP A 57 14.740 49.223 23.575 1.00119.01 C ATOM 391 O ASP A 57 15.558 49.606 24.414 1.00118.81 O ATOM 392 CB ASP A 57 13.528 47.274 24.555 1.00118.79 C ATOM 393 CG ASP A 57 12.051 47.475 24.250 1.00119.33 C ATOM 394 OD1 ASP A 57 11.647 48.632 23.999 1.00118.72 O ATOM 395 OD2 ASP A 57 11.301 46.472 24.251 1.00119.85 O ATOM 396 
N VAL A 58 14.068 50.042 22.765 1.00119.62 N ATOM 397 CA VAL A 58 14.237 51.494 22.771 1.00119.95 C ATOM 398 C VAL A 58 13.555 52.159 23.967 1.00120.08 C ATOM 399 O VAL A 58 12.341 52.397 23.958 1.00120.20 O ATOM 400 CB VAL A 58 13.678 52.127 21.470 1.00120.15 C ATOM 401 CG1 VAL A 58 13.733 53.652 21.554 1.00119.13 C ATOM 402 CG2 VAL A 58 14.476 51.630 20.273 1.00120.11 C ATOM 403 N PRO A 59 14.340 52.481 25.010 1.00119.88 N ATOM 404 CA PRO A 59 13.798 53.124 26.210 1.00118.69 C ATOM 405 C PRO A 59 13.081 54.412 25.852 1.00117.26 C ATOM 406 O PRO A 59 13.558 55.197 25.034 1.00116.68 O ATOM 407 CB PRO A 59 15.039 53.359 27.066 1.00119.27 C ATOM 408 CG PRO A 59 16.130 53.533 26.038 1.00119.94 C ATOM 409 CD PRO A 59 15.814 52.431 25.062 1.00119.85 C ATOM 410 N ASN A 60 11.928 54.620 26.469 1.00116.37 N ATOM 411 CA ASN A 60 11.132 55.808 26.210 1.00116.10 C ATOM 412 C ASN A 60 10.935 56.022 24.707 1.00115.83 C ATOM 413 O ASN A 60 11.570 56.886 24.093 1.00115.68 O ATOM 414 CB ASN A 60 11.791 57.042 26.844 1.00115.54 C ATOM 415 CG ASN A 60 10.939 58.301 26.704 1.00115.47 C ATOM 416 OD1 ASN A 60 9.751 58.304 27.031 1.00114.68 O ATOM 417 ND2 ASN A 60 11.550 59.379 26.221 1.00115.65 N ATOM 418 N ASP A 61 10.061 55.209 24.122 1.00115.14 N ATOM 419 CA ASP A 61 9.735 55.306 22.706 1.00114.13 C ATOM 420 C ASP A 61 8.441 56.131 22.669 1.00113.89 C ATOM 421 O ASP A 61 7.418 55.724 23.229 1.00113.46 O ATOM 422 CB ASP A 61 9.504 53.904 22.129 1.00113.36 C ATOM 423 CG ASP A 61 9.587 53.874 20.621 1.00112.30 C ATOM 424 OD1 ASP A 61 9.116 54.835 19.986 1.00111.35 O ATOM 425 OD2 ASP A 61 10.113 52.885 20.071 1.00111.82 O ATOM 426 N SER A 62 8.491 57.292 22.023 1.00113.79 N ATOM 427 CA SER A 62 7.330 58.184 21.961 1.00113.65 C ATOM 428 C SER A 62 6.397 57.976 20.759 1.00112.82 C ATOM 429 O SER A 62 6.847 57.735 19.635 1.00112.49 O ATOM 430 CB SER A 62 7.805 59.644 21.995 1.00114.15 C ATOM 431 OG SER A 62 8.621 59.890 23.131 1.00113.73 O ATOM 432 N PRO A 63 5.074 58.072 20.993 1.00111.99 
N ATOM 433 CA PRO A 63 4.026 57.905 19.973 1.00111.53 C ATOM 434 C PRO A 63 3.924 59.018 18.918 1.00110.84 C ATOM 435 O PRO A 63 3.075 59.901 19.031 1.00111.39 O ATOM 436 CB PRO A 63 2.746 57.801 20.808 1.00111.29 C ATOM 437 CG PRO A 63 3.230 57.204 22.091 1.00111.54 C ATOM 438 CD PRO A 63 4.488 58.000 22.343 1.00111.51 C ATOM 439 N PHE A 64 4.779 58.956 17.898 1.00109.81 N ATOM 440 CA PHE A 64 4.801 59.925 16.794 1.00109.33 C ATOM 441 C PHE A 64 3.376 60.280 16.332 1.00108.56 C ATOM 442 O PHE A 64 2.874 59.711 15.364 1.00108.37 O ATOM 443 CB PHE A 64 5.604 59.318 15.630 1.00110.08 C ATOM 444 CG PHE A 64 5.743 60.216 14.419 1.00110.63 C ATOM 445 CD1 PHE A 64 6.603 61.309 14.433 1.00110.93 C ATOM 446 CD2 PHE A 64 5.060 59.926 13.240 1.00110.60 C ATOM 447 CE1 PHE A 64 6.786 62.091 13.289 1.00110.88 C ATOM 448 CE2 PHE A 64 5.237 60.703 12.094 1.00110.13 C ATOM 449 CZ PHE A 64 6.101 61.784 12.120 1.00110.58 C ATOM 450 N GLN A 65 2.738 61.226 17.023 1.00108.06 N ATOM 451 CA GLN A 65 1.367 61.655 16.719 1.00107.81 C ATOM 452 C GLN A 65 0.369 60.580 17.162 1.00107.14 C ATOM 453 O GLN A 65 -0.174 60.636 18.268 1.00107.01 O ATOM 454 CB GLN A 65 1.184 61.907 15.219 1.00108.70 C ATOM 455 CG GLN A 65 2.222 62.822 14.589 1.00110.56 C ATOM 456 CD GLN A 65 1.912 63.139 13.131 1.00110.91 C ATOM 457 OE1 GLN A 65 0.874 63.726 12.822 1.00110.88 O ATOM 458 NE2 GLN A 65 2.812 62.748 12.230 1.00110.95 N ATOM 459 N ILE A 66 0.137 59.608 16.277 1.00106.12 N ATOM 460 CA ILE A 66 -0.770 58.482 16.523 1.00103.69 C ATOM 461 C ILE A 66 0.058 57.190 16.548 1.00101.94 C ATOM 462 O ILE A 66 -0.175 56.307 17.382 1.00100.49 O ATOM 463 CB ILE A 66 -1.866 58.389 15.410 1.00103.32 C ATOM 464 CG1 ILE A 66 -3.243 58.685 16.002 1.00102.53 C ATOM 465 CG2 ILE A 66 -1.858 57.025 14.758 1.00103.34 C ATOM 466 CD1 ILE A 66 -3.438 60.137 16.393 1.00102.55 C ATOM 467 N VAL A 67 1.027 57.107 15.631 1.00100.15 N ATOM 468 CA VAL A 67 1.922 55.956 15.510 1.00 98.05 C ATOM 469 C VAL A 67 2.641 55.642 16.816 1.00 
97.96 C ATOM 470 O VAL A 67 3.259 56.515 17.419 1.00 98.49 O ATOM 471 CB VAL A 67 2.977 56.184 14.415 1.00 96.47 C ATOM 472 CG1 VAL A 67 3.852 54.939 14.274 1.00 95.16 C ATOM 473 CG2 VAL A 67 2.290 56.542 13.105 1.00 94.29 C ATOM 474 N LYS A 68 2.578 54.386 17.237 1.00 97.30 N ATOM 475 CA LYS A 68 3.192 53.985 18.491 1.00 97.06 C ATOM 476 C LYS A 68 4.474 53.211 18.291 1.00 97.44 C ATOM 477 O LYS A 68 4.824 52.871 17.162 1.00 97.46 O ATOM 478 CB LYS A 68 2.185 53.164 19.286 1.00 95.66 C ATOM 479 CG LYS A 68 0.822 53.788 19.193 1.00 94.66 C ATOM 480 CD LYS A 68 -0.213 53.089 20.021 1.00 94.77 C ATOM 481 CE LYS A 68 -1.570 53.725 19.751 1.00 95.37 C ATOM 482 NZ LYS A 68 -1.485 55.223 19.734 1.00 93.99 N ATOM 483 N ASN A 69 5.174 52.968 19.398 1.00 98.69 N ATOM 484 CA ASN A 69 6.444 52.231 19.423 1.00100.55 C ATOM 485 C ASN A 69 7.261 52.365 18.126 1.00101.33 C ATOM 486 O ASN A 69 7.800 51.377 17.622 1.00100.98 O ATOM 487 CB ASN A 69 6.162 50.747 19.694 1.00101.27 C ATOM 488 CG ASN A 69 5.332 50.520 20.957 1.00101.35 C ATOM 489 OD1 ASN A 69 5.877 50.330 22.049 1.00101.81 O ATOM 490 ND2 ASN A 69 4.006 50.543 20.808 1.00100.63 N ATOM 491 N PRO A 70 7.387 53.596 17.593 1.00102.26 N ATOM 492 CA PRO A 70 8.126 53.865 16.353 1.00102.62 C ATOM 493 C PRO A 70 9.597 53.482 16.341 1.00102.96 C ATOM 494 O PRO A 70 10.024 52.663 15.526 1.00103.01 O ATOM 495 CB PRO A 70 7.916 55.362 16.150 1.00103.21 C ATOM 496 CG PRO A 70 7.868 55.873 17.545 1.00103.31 C ATOM 497 CD PRO A 70 6.984 54.861 18.235 1.00102.72 C ATOM 498 N ARG A 71 10.372 54.090 17.230 1.00103.42 N ATOM 499 CA ARG A 71 11.798 53.808 17.313 1.00104.26 C ATOM 500 C ARG A 71 12.048 52.304 17.415 1.00104.02 C ATOM 501 O ARG A 71 12.987 51.771 16.809 1.00104.32 O ATOM 502 CB ARG A 71 12.389 54.536 18.518 1.00105.61 C ATOM 503 CG ARG A 71 12.195 56.037 18.437 1.00107.38 C ATOM 504 CD ARG A 71 12.640 56.742 19.702 1.00109.01 C ATOM 505 NE ARG A 71 12.465 58.188 19.588 1.00109.72 N ATOM 506 CZ ARG A 71 11.288 58.789 19.442 
1.00110.10 C ATOM 507 NH1 ARG A 71 10.173 58.068 19.395 1.00110.13 N ATOM 508 NH2 ARG A 71 11.228 60.110 19.335 1.00109.92 N ATOM 509 N SER A 72 11.200 51.624 18.180 1.00103.10 N ATOM 510 CA SER A 72 11.313 50.180 18.346 1.00101.82 C ATOM 511 C SER A 72 11.311 49.516 16.957 1.00101.36 C ATOM 512 O SER A 72 12.310 48.923 16.527 1.00100.77 O ATOM 513 CB SER A 72 10.135 49.655 19.185 1.00101.15 C ATOM 514 OG SER A 72 10.120 50.217 20.488 1.00 99.62 O ATOM 515 N VAL A 73 10.179 49.648 16.265 1.00100.71 N ATOM 516 CA VAL A 73 9.971 49.085 14.934 1.00 99.43 C ATOM 517 C VAL A 73 10.957 49.604 13.891 1.00 98.74 C ATOM 518 O VAL A 73 11.145 48.989 12.844 1.00 98.50 O ATOM 519 CB VAL A 73 8.543 49.389 14.435 1.00 99.57 C ATOM 520 CG1 VAL A 73 8.318 48.724 13.093 1.00 99.47 C ATOM 521 CG2 VAL A 73 7.511 48.919 15.462 1.00 98.64 C ATOM 522 N GLY A 74 11.576 50.742 14.170 1.00 98.05 N ATOM 523 CA GLY A 74 12.532 51.287 13.230 1.00 98.09 C ATOM 524 C GLY A 74 13.882 50.608 13.367 1.00 98.36 C ATOM 525 O GLY A 74 14.550 50.310 12.372 1.00 97.94 O ATOM 526 N LYS A 75 14.290 50.358 14.607 1.00 98.26 N ATOM 527 CA LYS A 75 15.571 49.719 14.853 1.00 98.70 C ATOM 528 C LYS A 75 15.504 48.202 14.721 1.00 98.41 C ATOM 529 O LYS A 75 16.486 47.558 14.324 1.00 98.11 O ATOM 530 CB LYS A 75 16.101 50.088 16.238 1.00 99.64 C ATOM 531 CG LYS A 75 17.410 49.371 16.591 1.00101.06 C ATOM 532 CD LYS A 75 18.475 49.577 15.507 1.00101.25 C ATOM 533 CE LYS A 75 19.682 48.675 15.721 1.00100.94 C ATOM 534 NZ LYS A 75 20.639 48.782 14.585 1.00100.45 N ATOM 535 N ALA A 76 14.350 47.632 15.064 1.00 98.25 N ATOM 536 CA ALA A 76 14.159 46.187 14.972 1.00 97.31 C ATOM 537 C ALA A 76 14.169 45.777 13.506 1.00 96.60 C ATOM 538 O ALA A 76 14.733 44.743 13.149 1.00 95.57 O ATOM 539 CB ALA A 76 12.836 45.788 15.619 1.00 96.93 C ATOM 540 N ASN A 77 13.542 46.608 12.671 1.00 97.00 N ATOM 541 CA ASN A 77 13.452 46.380 11.229 1.00 97.74 C ATOM 542 C ASN A 77 14.752 46.753 10.531 1.00 98.75 C ATOM 543 O ASN A 77 
15.021 46.296 9.420 1.00 98.91 O ATOM 544 CB ASN A 77 12.290 47.184 10.631 1.00 97.23 C ATOM 545 CG ASN A 77 10.975 46.405 10.622 1.00 97.89 C ATOM 546 OD1 ASN A 77 10.757 45.538 9.771 1.00 96.57 O ATOM 547 ND2 ASN A 77 10.097 46.708 11.580 1.00 97.76 N ATOM 548 N GLU A 78 15.550 47.597 11.181 1.00 99.93 N ATOM 549 CA GLU A 78 16.837 47.996 10.624 1.00100.34 C ATOM 550 C GLU A 78 17.804 46.831 10.805 1.00 99.33 C ATOM 551 O GLU A 78 18.568 46.486 9.902 1.00 99.02 O ATOM 552 CB GLU A 78 17.383 49.231 11.342 1.00101.86 C ATOM 553 CG GLU A 78 18.810 49.574 10.935 1.00103.76 C ATOM 554 CD GLU A 78 19.304 50.857 11.561 1.00104.81 C ATOM 555 OE1 GLU A 78 19.254 50.962 12.806 1.00105.01 O ATOM 556 OE2 GLU A 78 19.742 51.752 10.803 1.00105.37 O ATOM 557 N GLN A 79 17.761 46.226 11.985 1.00 98.33 N ATOM 558 CA GLN A 79 18.615 45.093 12.284 1.00 98.14 C ATOM 559 C GLN A 79 18.317 43.920 11.350 1.00 98.36 C ATOM 560 O GLN A 79 19.233 43.197 10.938 1.00 99.37 O ATOM 561 CB GLN A 79 18.415 44.657 13.735 1.00 98.02 C ATOM 562 CG GLN A 79 19.077 43.327 14.072 1.00 98.51 C ATOM 563 CD GLN A 79 18.936 42.961 15.536 1.00 99.74 C ATOM 564 OE1 GLN A 79 19.382 41.894 15.968 1.00100.42 O ATOM 565 NE2 GLN A 79 18.318 43.849 16.312 1.00 99.69 N ATOM 566 N LEU A 80 17.038 43.744 11.015 1.00 97.08 N ATOM 567 CA LEU A 80 16.596 42.648 10.154 1.00 96.12 C ATOM 568 C LEU A 80 17.004 42.802 8.698 1.00 95.39 C ATOM 569 O LEU A 80 17.694 41.942 8.155 1.00 94.88 O ATOM 570 CB LEU A 80 15.068 42.475 10.256 1.00 96.94 C ATOM 571 CG LEU A 80 14.318 41.481 9.348 1.00 96.90 C ATOM 572 CD1 LEU A 80 14.882 40.069 9.459 1.00 96.64 C ATOM 573 CD2 LEU A 80 12.853 41.486 9.744 1.00 97.01 C ATOM 574 N ALA A 81 16.570 43.888 8.064 1.00 95.38 N ATOM 575 CA ALA A 81 16.904 44.132 6.660 1.00 95.32 C ATOM 576 C ALA A 81 18.365 43.748 6.453 1.00 95.15 C ATOM 577 O ALA A 81 18.707 43.034 5.505 1.00 95.06 O ATOM 578 CB ALA A 81 16.680 45.610 6.301 1.00 94.39 C ATOM 579 N ALA A 82 19.210 44.217 7.371 1.00 94.53 N ATOM 
580 CA ALA A 82 20.639 43.944 7.352 1.00 92.86 C ATOM 581 C ALA A 82 20.925 42.450 7.167 1.00 92.77 C ATOM 582 O ALA A 82 21.511 42.040 6.158 1.00 94.08 O ATOM 583 CB ALA A 82 21.259 44.430 8.641 1.00 91.55 C ATOM 584 N VAL A 83 20.508 41.631 8.131 1.00 90.57 N ATOM 585 CA VAL A 83 20.750 40.196 8.036 1.00 87.86 C ATOM 586 C VAL A 83 20.245 39.634 6.717 1.00 87.81 C ATOM 587 O VAL A 83 20.903 38.810 6.101 1.00 87.85 O ATOM 588 CB VAL A 83 20.081 39.426 9.183 1.00 85.44 C ATOM 589 CG1 VAL A 83 20.562 37.977 9.171 1.00 83.17 C ATOM 590 CG2 VAL A 83 20.389 40.107 10.519 1.00 84.36 C ATOM 591 N VAL A 84 19.078 40.094 6.284 1.00 88.67 N ATOM 592 CA VAL A 84 18.480 39.633 5.030 1.00 89.97 C ATOM 593 C VAL A 84 19.382 39.901 3.816 1.00 90.78 C ATOM 594 O VAL A 84 19.658 39.005 3.014 1.00 89.94 O ATOM 595 CB VAL A 84 17.105 40.312 4.804 1.00 89.27 C ATOM 596 CG1 VAL A 84 16.498 39.842 3.488 1.00 88.30 C ATOM 597 CG2 VAL A 84 16.175 40.007 5.973 1.00 88.68 C ATOM 598 N ALA A 85 19.833 41.142 3.681 1.00 92.01 N ATOM 599 CA ALA A 85 20.704 41.498 2.574 1.00 93.16 C ATOM 600 C ALA A 85 21.971 40.673 2.738 1.00 93.76 C ATOM 601 O ALA A 85 22.607 40.268 1.754 1.00 92.85 O ATOM 602 CB ALA A 85 21.032 42.996 2.616 1.00 92.48 C ATOM 603 N GLU A 86 22.305 40.414 4.002 1.00 94.93 N ATOM 604 CA GLU A 86 23.492 39.655 4.383 1.00 96.65 C ATOM 605 C GLU A 86 23.513 38.245 3.808 1.00 97.30 C ATOM 606 O GLU A 86 24.541 37.767 3.317 1.00 96.34 O ATOM 607 CB GLU A 86 23.585 39.587 5.907 1.00 97.43 C ATOM 608 CG GLU A 86 25.000 39.631 6.397 1.00101.11 C ATOM 609 CD GLU A 86 25.795 40.709 5.681 1.00103.56 C ATOM 610 OE1 GLU A 86 25.361 41.887 5.705 1.00104.28 O ATOM 611 OE2 GLU A 86 26.847 40.374 5.090 1.00104.59 O ATOM 612 N THR A 87 22.363 37.587 3.884 1.00 98.79 N ATOM 613 CA THR A 87 22.207 36.236 3.382 1.00 99.86 C ATOM 614 C THR A 87 22.033 36.335 1.888 1.00101.87 C ATOM 615 O THR A 87 22.756 35.697 1.133 1.00102.78 O ATOM 616 CB THR A 87 20.963 35.572 3.962 1.00 99.41 C ATOM 617 OG1 THR 
A 87 20.789 35.983 5.324 1.00 99.80 O ATOM 618 CG2 THR A 87 21.117 34.075 3.930 1.00 99.73 C ATOM 619 N GLN A 88 21.065 37.147 1.471 1.00104.27 N ATOM 620 CA GLN A 88 20.771 37.362 0.053 1.00107.00 C ATOM 621 C GLN A 88 22.018 37.455 -0.834 1.00107.90 C ATOM 622 O GLN A 88 22.007 36.996 -1.983 1.00107.08 O ATOM 623 CB GLN A 88 19.939 38.638 -0.121 1.00107.91 C ATOM 624 CG GLN A 88 18.444 38.450 0.060 1.00109.47 C ATOM 625 CD GLN A 88 17.885 37.366 -0.847 1.00110.31 C ATOM 626 OE1 GLN A 88 18.415 37.111 -1.932 1.00110.06 O ATOM 627 NE2 GLN A 88 16.799 36.730 -0.410 1.00110.80 N ATOM 628 N LYS A 89 23.078 38.057 -0.293 1.00109.56 N ATOM 629 CA LYS A 89 24.342 38.227 -1.008 1.00110.90 C ATOM 630 C LYS A 89 25.078 36.891 -1.153 1.00111.00 C ATOM 631 O LYS A 89 25.699 36.618 -2.185 1.00110.95 O ATOM 632 CB LYS A 89 25.228 39.228 -0.266 1.00111.68 C ATOM 633 CG LYS A 89 26.458 39.658 -1.047 1.00113.68 C ATOM 634 CD LYS A 89 27.348 40.596 -0.230 1.00116.52 C ATOM 635 CE LYS A 89 26.614 41.873 0.203 1.00117.23 C ATOM 636 NZ LYS A 89 27.492 42.813 0.971 1.00117.27 N ATOM 637 N ASN A 90 25.005 36.068 -0.110 1.00110.80 N ATOM 638 CA ASN A 90 25.634 34.751 -0.112 1.00110.60 C ATOM 639 C ASN A 90 24.921 33.851 -1.108 1.00110.46 C ATOM 640 O ASN A 90 25.410 32.775 -1.460 1.00110.57 O ATOM 641 CB ASN A 90 25.551 34.132 1.280 1.00110.18 C ATOM 642 CG ASN A 90 26.339 34.909 2.299 1.00109.88 C ATOM 643 OD1 ASN A 90 27.566 34.892 2.280 1.00110.06 O ATOM 644 ND2 ASN A 90 25.644 35.611 3.186 1.00109.05 N ATOM 645 N GLY A 91 23.747 34.300 -1.543 1.00110.39 N ATOM 646 CA GLY A 91 22.968 33.548 -2.507 1.00109.77 C ATOM 647 C GLY A 91 21.861 32.669 -1.944 1.00109.04 C ATOM 648 O GLY A 91 21.079 32.108 -2.715 1.00109.23 O ATOM 649 N THR A 92 21.770 32.546 -0.620 1.00107.90 N ATOM 650 CA THR A 92 20.741 31.697 -0.019 1.00106.35 C ATOM 651 C THR A 92 19.375 32.383 0.037 1.00105.86 C ATOM 652 O THR A 92 19.283 33.615 -0.047 1.00105.70 O ATOM 653 CB THR A 92 21.144 31.233 1.418 1.00105.45 C 
ATOM 654 OG1 THR A 92 20.501 32.048 2.402 1.00103.68 O ATOM 655 CG2 THR A 92 22.655 31.319 1.611 1.00104.19 C ATOM 656 N ILE A 93 18.318 31.573 0.154 1.00104.70 N ATOM 657 CA ILE A 93 16.942 32.070 0.239 1.00101.94 C ATOM 658 C ILE A 93 16.600 32.413 1.684 1.00100.46 C ATOM 659 O ILE A 93 16.758 31.584 2.587 1.00100.27 O ATOM 660 CB ILE A 93 15.925 31.023 -0.247 1.00101.61 C ATOM 661 CG1 ILE A 93 16.257 30.580 -1.670 1.00101.79 C ATOM 662 CG2 ILE A 93 14.523 31.606 -0.191 1.00101.59 C ATOM 663 CD1 ILE A 93 16.287 31.716 -2.664 1.00102.89 C ATOM 664 N SER A 94 16.138 33.641 1.890 1.00 98.22 N ATOM 665 CA SER A 94 15.780 34.113 3.217 1.00 96.48 C ATOM 666 C SER A 94 14.339 33.749 3.568 1.00 95.30 C ATOM 667 O SER A 94 13.420 33.963 2.763 1.00 94.98 O ATOM 668 CB SER A 94 15.962 35.631 3.300 1.00 96.70 C ATOM 669 OG SER A 94 15.296 36.288 2.230 1.00 96.79 O ATOM 670 N VAL A 95 14.146 33.182 4.761 1.00 92.94 N ATOM 671 CA VAL A 95 12.814 32.794 5.229 1.00 89.32 C ATOM 672 C VAL A 95 12.596 33.325 6.642 1.00 87.17 C ATOM 673 O VAL A 95 13.033 32.723 7.628 1.00 86.40 O ATOM 674 CB VAL A 95 12.642 31.274 5.229 1.00 88.54 C ATOM 675 CG1 VAL A 95 11.154 30.936 5.278 1.00 88.25 C ATOM 676 CG2 VAL A 95 13.293 30.680 3.992 1.00 87.16 C ATOM 677 N VAL A 96 11.893 34.450 6.716 1.00 85.04 N ATOM 678 CA VAL A 96 11.627 35.137 7.968 1.00 83.33 C ATOM 679 C VAL A 96 10.424 34.694 8.789 1.00 81.90 C ATOM 680 O VAL A 96 9.293 35.094 8.510 1.00 81.87 O ATOM 681 CB VAL A 96 11.480 36.654 7.724 1.00 83.38 C ATOM 682 CG1 VAL A 96 11.342 37.391 9.051 1.00 83.72 C ATOM 683 CG2 VAL A 96 12.672 37.163 6.947 1.00 83.57 C ATOM 684 N LEU A 97 10.675 33.868 9.803 1.00 80.03 N ATOM 685 CA LEU A 97 9.618 33.440 10.711 1.00 76.82 C ATOM 686 C LEU A 97 9.375 34.744 11.444 1.00 75.41 C ATOM 687 O LEU A 97 10.333 35.449 11.745 1.00 74.63 O ATOM 688 CB LEU A 97 10.131 32.399 11.703 1.00 75.09 C ATOM 689 CG LEU A 97 10.749 31.133 11.112 1.00 75.32 C ATOM 690 CD1 LEU A 97 10.932 30.113 12.220 1.00 72.55 C 
ATOM 691 CD2 LEU A 97 9.851 30.584 10.014 1.00 74.86 C ATOM 692 N GLY A 98 8.127 35.087 11.736 1.00 74.85 N ATOM 693 CA GLY A 98 7.936 36.354 12.400 1.00 74.16 C ATOM 694 C GLY A 98 6.970 36.569 13.541 1.00 74.60 C ATOM 695 O GLY A 98 6.400 35.656 14.137 1.00 73.56 O ATOM 696 N GLY A 99 6.865 37.850 13.856 1.00 75.85 N ATOM 697 CA GLY A 99 6.008 38.352 14.897 1.00 78.40 C ATOM 698 C GLY A 99 5.122 39.171 13.996 1.00 80.54 C ATOM 699 O GLY A 99 4.904 38.753 12.856 1.00 80.94 O ATOM 700 N ASP A 100 4.655 40.334 14.440 1.00 82.59 N ATOM 701 CA ASP A 100 3.780 41.140 13.590 1.00 84.85 C ATOM 702 C ASP A 100 4.372 41.386 12.192 1.00 84.58 C ATOM 703 O ASP A 100 5.584 41.315 11.987 1.00 84.05 O ATOM 704 CB ASP A 100 3.422 42.461 14.284 1.00 87.56 C ATOM 705 CG ASP A 100 4.395 43.572 13.978 1.00 90.13 C ATOM 706 OD1 ASP A 100 4.435 44.009 12.808 1.00 89.63 O ATOM 707 OD2 ASP A 100 5.107 44.006 14.912 1.00 92.61 O ATOM 708 N HIS A 101 3.489 41.668 11.242 1.00 84.89 N ATOM 709 CA HIS A 101 3.840 41.873 9.843 1.00 85.65 C ATOM 710 C HIS A 101 4.618 43.144 9.530 1.00 85.99 C ATOM 711 O HIS A 101 4.839 43.459 8.360 1.00 85.93 O ATOM 712 CB HIS A 101 2.551 41.832 9.009 1.00 86.72 C ATOM 713 CG HIS A 101 2.738 41.366 7.596 1.00 86.60 C ATOM 714 ND1 HIS A 101 1.693 40.872 6.839 1.00 85.54 N ATOM 715 CD2 HIS A 101 3.825 41.355 6.788 1.00 86.26 C ATOM 716 CE1 HIS A 101 2.131 40.580 5.630 1.00 86.39 C ATOM 717 NE2 HIS A 101 3.422 40.865 5.570 1.00 86.65 N ATOM 718 N SER A 102 5.042 43.878 10.552 1.00 86.24 N ATOM 719 CA SER A 102 5.796 45.103 10.301 1.00 86.63 C ATOM 720 C SER A 102 7.276 44.785 10.119 1.00 85.62 C ATOM 721 O SER A 102 8.092 45.676 9.898 1.00 84.65 O ATOM 722 CB SER A 102 5.615 46.088 11.448 1.00 87.71 C ATOM 723 OG SER A 102 6.280 45.619 12.605 1.00 91.99 O ATOM 724 N MET A 103 7.622 43.507 10.229 1.00 85.08 N ATOM 725 CA MET A 103 9.000 43.095 10.039 1.00 84.48 C ATOM 726 C MET A 103 9.164 42.876 8.549 1.00 83.76 C ATOM 727 O MET A 103 10.190 42.378 8.086 1.00 
83.39 O ATOM 728 CB MET A 103 9.302 41.805 10.799 1.00 84.83 C ATOM 729 CG MET A 103 9.126 41.927 12.302 1.00 87.32 C ATOM 730 SD MET A 103 9.968 43.364 13.016 1.00 88.58 S ATOM 731 CE MET A 103 11.677 42.836 12.847 1.00 87.63 C ATOM 732 N ALA A 104 8.130 43.251 7.802 1.00 83.38 N ATOM 733 CA ALA A 104 8.143 43.122 6.352 1.00 83.07 C ATOM 734 C ALA A 104 9.080 44.180 5.763 1.00 82.42 C ATOM 735 O ALA A 104 9.838 43.900 4.829 1.00 82.21 O ATOM 736 CB ALA A 104 6.734 43.289 5.799 1.00 84.51 C ATOM 737 N ILE A 105 9.028 45.398 6.304 1.00 80.53 N ATOM 738 CA ILE A 105 9.922 46.451 5.829 1.00 78.98 C ATOM 739 C ILE A 105 11.346 45.892 5.938 1.00 77.86 C ATOM 740 O ILE A 105 12.063 45.791 4.940 1.00 76.47 O ATOM 741 CB ILE A 105 9.807 47.765 6.675 1.00 78.15 C ATOM 742 CG1 ILE A 105 8.926 47.559 7.904 1.00 77.60 C ATOM 743 CG2 ILE A 105 9.183 48.868 5.831 1.00 78.93 C ATOM 744 CD1 ILE A 105 8.669 48.857 8.658 1.00 74.53 C ATOM 745 N GLY A 106 11.737 45.509 7.151 1.00 76.94 N ATOM 746 CA GLY A 106 13.053 44.943 7.349 1.00 76.15 C ATOM 747 C GLY A 106 13.354 43.920 6.274 1.00 76.23 C ATOM 748 O GLY A 106 14.188 44.167 5.407 1.00 77.43 O ATOM 749 N SER A 107 12.668 42.781 6.316 1.00 75.72 N ATOM 750 CA SER A 107 12.872 41.711 5.337 1.00 75.71 C ATOM 751 C SER A 107 12.956 42.222 3.896 1.00 75.75 C ATOM 752 O SER A 107 13.975 42.051 3.218 1.00 74.82 O ATOM 753 CB SER A 107 11.735 40.681 5.444 1.00 76.72 C ATOM 754 OG SER A 107 11.782 39.698 4.408 1.00 75.74 O ATOM 755 N ILE A 108 11.879 42.853 3.438 1.00 75.95 N ATOM 756 CA ILE A 108 11.809 43.362 2.079 1.00 75.58 C ATOM 757 C ILE A 108 12.978 44.277 1.736 1.00 76.75 C ATOM 758 O ILE A 108 13.625 44.092 0.706 1.00 77.19 O ATOM 759 CB ILE A 108 10.467 44.079 1.836 1.00 74.49 C ATOM 760 CG1 ILE A 108 9.315 43.104 2.090 1.00 73.59 C ATOM 761 CG2 ILE A 108 10.397 44.577 0.409 1.00 75.30 C ATOM 762 CD1 ILE A 108 7.962 43.605 1.678 1.00 71.41 C ATOM 763 N SER A 109 13.258 45.253 2.597 1.00 77.84 N ATOM 764 CA SER A 109 14.378 
46.168 2.371 1.00 78.47 C ATOM 765 C SER A 109 15.657 45.368 2.182 1.00 78.55 C ATOM 766 O SER A 109 16.288 45.427 1.131 1.00 77.36 O ATOM 767 CB SER A 109 14.550 47.114 3.564 1.00 78.95 C ATOM 768 OG SER A 109 13.625 48.183 3.502 1.00 81.14 O ATOM 769 N GLY A 110 16.026 44.623 3.219 1.00 79.20 N ATOM 770 CA GLY A 110 17.219 43.812 3.160 1.00 81.12 C ATOM 771 C GLY A 110 17.305 43.071 1.845 1.00 83.10 C ATOM 772 O GLY A 110 18.372 42.994 1.240 1.00 83.47 O ATOM 773 N HIS A 111 16.178 42.536 1.391 1.00 85.03 N ATOM 774 CA HIS A 111 16.137 41.791 0.135 1.00 87.81 C ATOM 775 C HIS A 111 16.484 42.651 -1.098 1.00 88.96 C ATOM 776 O HIS A 111 17.319 42.261 -1.923 1.00 87.29 O ATOM 777 CB HIS A 111 14.747 41.155 -0.039 1.00 88.57 C ATOM 778 CG HIS A 111 14.669 40.158 -1.158 1.00 89.32 C ATOM 779 ND1 HIS A 111 13.495 39.528 -1.509 1.00 89.05 N ATOM 780 CD2 HIS A 111 15.616 39.692 -2.009 1.00 88.94 C ATOM 781 CE1 HIS A 111 13.721 38.718 -2.530 1.00 89.41 C ATOM 782 NE2 HIS A 111 14.999 38.799 -2.852 1.00 88.73 N ATOM 783 N ALA A 112 15.835 43.814 -1.204 1.00 90.99 N ATOM 784 CA ALA A 112 16.021 44.756 -2.318 1.00 92.37 C ATOM 785 C ALA A 112 17.465 45.227 -2.488 1.00 93.72 C ATOM 786 O ALA A 112 17.876 45.643 -3.579 1.00 93.91 O ATOM 787 CB ALA A 112 15.101 45.969 -2.131 1.00 91.27 C ATOM 788 N ARG A 113 18.226 45.175 -1.402 1.00 94.50 N ATOM 789 CA ARG A 113 19.616 45.589 -1.435 1.00 94.94 C ATOM 790 C ARG A 113 20.414 44.713 -2.395 1.00 95.62 C ATOM 791 O ARG A 113 21.213 45.215 -3.184 1.00 96.16 O ATOM 792 CB ARG A 113 20.202 45.547 -0.015 1.00 94.28 C ATOM 793 CG ARG A 113 19.745 46.736 0.825 1.00 94.48 C ATOM 794 CD ARG A 113 20.114 46.645 2.300 1.00 95.26 C ATOM 795 NE ARG A 113 19.785 47.894 2.999 1.00 96.22 N ATOM 796 CZ ARG A 113 19.727 48.043 4.326 1.00 96.89 C ATOM 797 NH1 ARG A 113 19.972 47.015 5.138 1.00 96.87 N ATOM 798 NH2 ARG A 113 19.422 49.229 4.847 1.00 96.57 N ATOM 799 N VAL A 114 20.173 43.408 -2.348 1.00 96.02 N ATOM 800 CA VAL A 114 20.883 42.473 
-3.210 1.00 96.73 C ATOM 801 C VAL A 114 20.171 42.238 -4.546 1.00 97.56 C ATOM 802 O VAL A 114 20.814 41.949 -5.557 1.00 97.44 O ATOM 803 CB VAL A 114 21.084 41.130 -2.486 1.00 97.09 C ATOM 804 CG1 VAL A 114 21.944 40.206 -3.331 1.00 97.30 C ATOM 805 CG2 VAL A 114 21.724 41.372 -1.117 1.00 96.77 C ATOM 806 N HIS A 115 18.845 42.363 -4.540 1.00 99.09 N ATOM 807 CA HIS A 115 18.026 42.179 -5.744 1.00100.13 C ATOM 808 C HIS A 115 16.963 43.274 -5.811 1.00100.23 C ATOM 809 O HIS A 115 15.808 43.052 -5.459 1.00100.81 O ATOM 810 CB HIS A 115 17.325 40.818 -5.730 1.00100.57 C ATOM 811 CG HIS A 115 18.257 39.649 -5.655 1.00101.28 C ATOM 812 ND1 HIS A 115 18.888 39.273 -4.490 1.00101.75 N ATOM 813 CD2 HIS A 115 18.635 38.752 -6.596 1.00101.64 C ATOM 814 CE1 HIS A 115 19.612 38.189 -4.714 1.00102.42 C ATOM 815 NE2 HIS A 115 19.475 37.853 -5.983 1.00102.31 N ATOM 816 N PRO A 116 17.343 44.465 -6.280 1.00100.42 N ATOM 817 CA PRO A 116 16.471 45.634 -6.410 1.00100.72 C ATOM 818 C PRO A 116 15.457 45.524 -7.536 1.00100.87 C ATOM 819 O PRO A 116 14.693 46.461 -7.787 1.00100.56 O ATOM 820 CB PRO A 116 17.460 46.752 -6.660 1.00101.70 C ATOM 821 CG PRO A 116 18.451 46.063 -7.564 1.00102.18 C ATOM 822 CD PRO A 116 18.674 44.745 -6.847 1.00101.10 C ATOM 823 N ASP A 117 15.461 44.389 -8.225 1.00101.28 N ATOM 824 CA ASP A 117 14.530 44.175 -9.332 1.00102.06 C ATOM 825 C ASP A 117 13.382 43.237 -8.950 1.00101.05 C ATOM 826 O ASP A 117 12.539 42.902 -9.786 1.00101.44 O ATOM 827 CB ASP A 117 15.281 43.618 -10.550 1.00103.66 C ATOM 828 CG ASP A 117 16.057 42.344 -10.234 1.00105.43 C ATOM 829 OD1 ASP A 117 16.826 42.338 -9.244 1.00106.09 O ATOM 830 OD2 ASP A 117 15.904 41.353 -10.985 1.00106.50 O ATOM 831 N LEU A 118 13.350 42.835 -7.679 1.00 98.82 N ATOM 832 CA LEU A 118 12.324 41.926 -7.167 1.00 95.55 C ATOM 833 C LEU A 118 10.892 42.471 -7.232 1.00 93.68 C ATOM 834 O LEU A 118 10.659 43.681 -7.348 1.00 93.41 O ATOM 835 CB LEU A 118 12.644 41.524 -5.709 1.00 93.59 C ATOM 836 CG LEU A 118 
12.657 42.592 -4.599 1.00 92.16 C ATOM 837 CD1 LEU A 118 11.272 43.181 -4.388 1.00 92.08 C ATOM 838 CD2 LEU A 118 13.134 41.963 -3.313 1.00 90.84 C ATOM 839 N CYS A 119 9.940 41.546 -7.176 1.00 90.59 N ATOM 840 CA CYS A 119 8.525 41.875 -7.169 1.00 86.30 C ATOM 841 C CYS A 119 8.109 41.414 -5.793 1.00 83.38 C ATOM 842 O CYS A 119 8.946 40.981 -5.007 1.00 83.17 O ATOM 843 CB CYS A 119 7.782 41.080 -8.234 1.00 85.90 C ATOM 844 SG CYS A 119 8.373 39.392 -8.362 1.00 85.58 S ATOM 845 N VAL A 120 6.824 41.488 -5.497 1.00 80.52 N ATOM 846 CA VAL A 120 6.353 41.078 -4.187 1.00 76.40 C ATOM 847 C VAL A 120 4.922 40.565 -4.220 1.00 73.42 C ATOM 848 O VAL A 120 4.020 41.248 -4.695 1.00 72.14 O ATOM 849 CB VAL A 120 6.446 42.262 -3.173 1.00 75.95 C ATOM 850 CG1 VAL A 120 5.674 41.930 -1.902 1.00 74.91 C ATOM 851 CG2 VAL A 120 7.915 42.554 -2.842 1.00 73.63 C ATOM 852 N ILE A 121 4.727 39.350 -3.723 1.00 71.26 N ATOM 853 CA ILE A 121 3.390 38.758 -3.640 1.00 69.79 C ATOM 854 C ILE A 121 2.987 38.893 -2.166 1.00 66.24 C ATOM 855 O ILE A 121 3.645 38.354 -1.273 1.00 65.09 O ATOM 856 CB ILE A 121 3.382 37.252 -4.078 1.00 71.19 C ATOM 857 CG1 ILE A 121 3.643 37.117 -5.584 1.00 71.26 C ATOM 858 CG2 ILE A 121 2.023 36.637 -3.802 1.00 71.09 C ATOM 859 CD1 ILE A 121 4.863 37.861 -6.072 1.00 72.36 C ATOM 860 N TRP A 122 1.916 39.631 -1.918 1.00 62.73 N ATOM 861 CA TRP A 122 1.471 39.872 -0.556 1.00 60.62 C ATOM 862 C TRP A 122 0.147 39.186 -0.203 1.00 60.19 C ATOM 863 O TRP A 122 -0.941 39.704 -0.490 1.00 59.32 O ATOM 864 CB TRP A 122 1.378 41.391 -0.344 1.00 58.94 C ATOM 865 CG TRP A 122 1.153 41.853 1.077 1.00 56.58 C ATOM 866 CD1 TRP A 122 -0.029 41.851 1.766 1.00 55.86 C ATOM 867 CD2 TRP A 122 2.126 42.438 1.956 1.00 55.45 C ATOM 868 NE1 TRP A 122 0.144 42.402 3.018 1.00 55.24 N ATOM 869 CE2 TRP A 122 1.460 42.762 3.167 1.00 55.02 C ATOM 870 CE3 TRP A 122 3.500 42.703 1.851 1.00 54.23 C ATOM 871 CZ2 TRP A 122 2.111 43.359 4.250 1.00 54.64 C ATOM 872 CZ3 TRP A 122 4.157 43.297 2.934 
1.00 54.27 C ATOM 873 CH2 TRP A 122 3.461 43.610 4.120 1.00 55.66 C ATOM 874 N VAL A 123 0.254 38.010 0.414 1.00 59.72 N ATOM 875 CA VAL A 123 -0.925 37.257 0.851 1.00 59.40 C ATOM 876 C VAL A 123 -1.212 37.675 2.307 1.00 57.38 C ATOM 877 O VAL A 123 -0.516 37.275 3.244 1.00 55.82 O ATOM 878 CB VAL A 123 -0.697 35.723 0.789 1.00 59.65 C ATOM 879 CG1 VAL A 123 -2.017 35.003 0.936 1.00 58.94 C ATOM 880 CG2 VAL A 123 -0.043 35.336 -0.519 1.00 58.72 C ATOM 881 N ASP A 124 -2.245 38.497 2.465 1.00 55.40 N ATOM 882 CA ASP A 124 -2.649 39.052 3.755 1.00 54.22 C ATOM 883 C ASP A 124 -4.128 39.331 3.553 1.00 55.31 C ATOM 884 O ASP A 124 -4.581 39.455 2.405 1.00 55.23 O ATOM 885 CB ASP A 124 -1.889 40.375 3.985 1.00 50.71 C ATOM 886 CG ASP A 124 -2.005 40.899 5.388 1.00 47.58 C ATOM 887 OD1 ASP A 124 -3.129 41.084 5.879 1.00 46.94 O ATOM 888 OD2 ASP A 124 -0.959 41.157 6.016 1.00 46.77 O ATOM 889 N ALA A 125 -4.884 39.407 4.645 1.00 56.37 N ATOM 890 CA ALA A 125 -6.307 39.725 4.538 1.00 57.62 C ATOM 891 C ALA A 125 -6.360 41.236 4.271 1.00 58.34 C ATOM 892 O ALA A 125 -7.222 41.739 3.546 1.00 56.62 O ATOM 893 CB ALA A 125 -7.033 39.393 5.843 1.00 56.07 C ATOM 894 N HIS A 126 -5.381 41.932 4.849 1.00 59.92 N ATOM 895 CA HIS A 126 -5.241 43.374 4.762 1.00 61.14 C ATOM 896 C HIS A 126 -4.154 43.830 3.807 1.00 64.86 C ATOM 897 O HIS A 126 -3.182 43.103 3.556 1.00 65.54 O ATOM 898 CB HIS A 126 -4.914 43.930 6.141 1.00 58.52 C ATOM 899 CG HIS A 126 -5.564 43.177 7.247 1.00 57.04 C ATOM 900 ND1 HIS A 126 -4.956 42.112 7.877 1.00 59.20 N ATOM 901 CD2 HIS A 126 -6.815 43.256 7.755 1.00 55.86 C ATOM 902 CE1 HIS A 126 -5.810 41.557 8.720 1.00 58.98 C ATOM 903 NE2 HIS A 126 -6.946 42.232 8.664 1.00 58.70 N ATOM 904 N THR A 127 -4.329 45.052 3.298 1.00 67.28 N ATOM 905 CA THR A 127 -3.372 45.686 2.402 1.00 69.38 C ATOM 906 C THR A 127 -2.215 46.169 3.285 1.00 70.18 C ATOM 907 O THR A 127 -1.039 46.062 2.916 1.00 68.18 O ATOM 908 CB THR A 127 -4.006 46.902 1.677 1.00 71.32 C ATOM 909 OG1 
THR A 127 -4.933 47.547 2.561 1.00 72.91 O ATOM 910 CG2 THR A 127 -4.729 46.470 0.401 1.00 70.67 C ATOM 911 N ASP A 128 -2.568 46.686 4.463 1.00 72.64 N ATOM 912 CA ASP A 128 -1.585 47.192 5.423 1.00 76.54 C ATOM 913 C ASP A 128 -0.800 48.353 4.786 1.00 78.32 C ATOM 914 O ASP A 128 0.327 48.659 5.186 1.00 77.37 O ATOM 915 CB ASP A 128 -0.598 46.070 5.845 1.00 76.13 C ATOM 916 CG ASP A 128 -1.299 44.825 6.405 1.00 74.37 C ATOM 917 OD1 ASP A 128 -2.361 44.968 7.056 1.00 74.66 O ATOM 918 OD2 ASP A 128 -0.774 43.706 6.199 1.00 71.25 O ATOM 919 N ILE A 129 -1.422 48.988 3.791 1.00 81.57 N ATOM 920 CA ILE A 129 -0.837 50.099 3.029 1.00 82.62 C ATOM 921 C ILE A 129 -1.206 51.494 3.574 1.00 83.95 C ATOM 922 O ILE A 129 -1.137 52.497 2.855 1.00 83.15 O ATOM 923 CB ILE A 129 -1.235 49.968 1.503 1.00 81.91 C ATOM 924 CG1 ILE A 129 -0.529 51.027 0.673 1.00 81.40 C ATOM 925 CG2 ILE A 129 -2.752 50.058 1.333 1.00 80.22 C ATOM 926 CD1 ILE A 129 0.960 50.999 0.830 1.00 82.11 C ATOM 927 N ASN A 130 -1.572 51.546 4.856 1.00 85.83 N ATOM 928 CA ASN A 130 -1.923 52.800 5.508 1.00 86.95 C ATOM 929 C ASN A 130 -0.667 53.553 5.927 1.00 87.55 C ATOM 930 O ASN A 130 0.391 52.965 6.148 1.00 87.00 O ATOM 931 CB ASN A 130 -2.805 52.564 6.746 1.00 88.13 C ATOM 932 CG ASN A 130 -4.294 52.509 6.412 1.00 90.24 C ATOM 933 OD1 ASN A 130 -4.780 53.246 5.550 1.00 91.56 O ATOM 934 ND2 ASN A 130 -5.027 51.645 7.110 1.00 91.70 N ATOM 935 N THR A 131 -0.802 54.867 6.029 1.00 88.52 N ATOM 936 CA THR A 131 0.291 55.733 6.415 1.00 88.78 C ATOM 937 C THR A 131 0.007 56.312 7.797 1.00 90.67 C ATOM 938 O THR A 131 -1.112 56.202 8.317 1.00 89.83 O ATOM 939 CB THR A 131 0.424 56.874 5.420 1.00 87.72 C ATOM 940 OG1 THR A 131 -0.777 57.660 5.439 1.00 86.20 O ATOM 941 CG2 THR A 131 0.637 56.319 4.025 1.00 87.64 C ATOM 942 N PRO A 132 1.025 56.933 8.414 1.00 92.13 N ATOM 943 CA PRO A 132 0.924 57.549 9.745 1.00 92.76 C ATOM 944 C PRO A 132 -0.234 58.548 9.786 1.00 92.59 C ATOM 945 O PRO A 132 -0.730 58.927 10.857 
1.00 91.94 O ATOM 946 CB PRO A 132 2.282 58.237 9.905 1.00 93.72 C ATOM 947 CG PRO A 132 3.208 57.353 9.106 1.00 92.98 C ATOM 948 CD PRO A 132 2.391 57.075 7.872 1.00 92.38 C ATOM 949 N LEU A 133 -0.639 58.957 8.586 1.00 92.49 N ATOM 950 CA LEU A 133 -1.714 59.917 8.361 1.00 92.47 C ATOM 951 C LEU A 133 -3.065 59.221 8.373 1.00 92.31 C ATOM 952 O LEU A 133 -4.014 59.671 9.023 1.00 92.59 O ATOM 953 CB LEU A 133 -1.520 60.587 6.998 1.00 92.42 C ATOM 954 CG LEU A 133 -0.109 61.073 6.657 1.00 93.22 C ATOM 955 CD1 LEU A 133 -0.103 61.677 5.255 1.00 92.74 C ATOM 956 CD2 LEU A 133 0.353 62.091 7.701 1.00 93.47 C ATOM 957 N THR A 134 -3.132 58.117 7.636 1.00 91.80 N ATOM 958 CA THR A 134 -4.345 57.319 7.500 1.00 90.29 C ATOM 959 C THR A 134 -4.710 56.484 8.733 1.00 88.61 C ATOM 960 O THR A 134 -5.751 56.700 9.348 1.00 87.36 O ATOM 961 CB THR A 134 -4.198 56.378 6.309 1.00 90.62 C ATOM 962 OG1 THR A 134 -3.074 55.523 6.533 1.00 91.28 O ATOM 963 CG2 THR A 134 -3.958 57.166 5.034 1.00 89.53 C ATOM 964 N THR A 135 -3.847 55.533 9.077 1.00 87.08 N ATOM 965 CA THR A 135 -4.072 54.647 10.216 1.00 87.00 C ATOM 966 C THR A 135 -4.872 55.242 11.360 1.00 86.71 C ATOM 967 O THR A 135 -4.628 56.368 11.774 1.00 86.97 O ATOM 968 CB THR A 135 -2.758 54.158 10.829 1.00 86.89 C ATOM 969 OG1 THR A 135 -1.822 55.239 10.867 1.00 85.36 O ATOM 970 CG2 THR A 135 -2.193 52.990 10.039 1.00 87.48 C ATOM 971 N SER A 136 -5.832 54.468 11.858 1.00 86.78 N ATOM 972 CA SER A 136 -6.647 54.865 12.995 1.00 87.04 C ATOM 973 C SER A 136 -5.993 54.146 14.152 1.00 88.13 C ATOM 974 O SER A 136 -6.646 53.752 15.117 1.00 88.17 O ATOM 975 CB SER A 136 -8.091 54.388 12.852 1.00 86.80 C ATOM 976 OG SER A 136 -8.852 55.273 12.056 1.00 87.18 O ATOM 977 N SER A 137 -4.685 53.956 14.018 1.00 89.21 N ATOM 978 CA SER A 137 -3.887 53.287 15.033 1.00 91.19 C ATOM 979 C SER A 137 -2.428 53.435 14.636 1.00 91.93 C ATOM 980 O SER A 137 -2.101 53.474 13.448 1.00 90.67 O ATOM 981 CB SER A 137 -4.264 51.807 15.110 1.00 92.06 C 
ATOM 982 OG SER A 137 -4.001 51.158 13.867 1.00 93.47 O ATOM 983 N GLY A 138 -1.558 53.524 15.636 1.00 93.57 N ATOM 984 CA GLY A 138 -0.141 53.687 15.371 1.00 95.55 C ATOM 985 C GLY A 138 0.554 52.360 15.168 1.00 96.50 C ATOM 986 O GLY A 138 1.789 52.254 15.210 1.00 97.48 O ATOM 987 N ASN A 139 -0.268 51.343 14.938 1.00 96.37 N ATOM 988 CA ASN A 139 0.183 49.976 14.723 1.00 95.32 C ATOM 989 C ASN A 139 0.837 49.866 13.345 1.00 94.83 C ATOM 990 O ASN A 139 0.184 49.586 12.334 1.00 94.93 O ATOM 991 CB ASN A 139 -1.026 49.053 14.835 1.00 95.07 C ATOM 992 CG ASN A 139 -1.940 49.439 15.992 1.00 95.16 C ATOM 993 OD1 ASN A 139 -2.119 50.625 16.297 1.00 96.11 O ATOM 994 ND2 ASN A 139 -2.534 48.443 16.632 1.00 95.53 N ATOM 995 N LEU A 140 2.139 50.098 13.318 1.00 93.89 N ATOM 996 CA LEU A 140 2.893 50.044 12.082 1.00 94.05 C ATOM 997 C LEU A 140 2.804 48.717 11.327 1.00 94.50 C ATOM 998 O LEU A 140 3.292 48.621 10.198 1.00 94.62 O ATOM 999 CB LEU A 140 4.355 50.390 12.361 1.00 93.18 C ATOM 1000 CG LEU A 140 4.604 51.860 12.677 1.00 91.51 C ATOM 1001 CD1 LEU A 140 5.785 51.977 13.604 1.00 91.57 C ATOM 1002 CD2 LEU A 140 4.818 52.636 11.386 1.00 89.96 C ATOM 1003 N HIS A 141 2.194 47.695 11.933 1.00 94.12 N ATOM 1004 CA HIS A 141 2.063 46.408 11.245 1.00 93.21 C ATOM 1005 C HIS A 141 1.018 46.568 10.147 1.00 93.53 C ATOM 1006 O HIS A 141 0.728 45.629 9.403 1.00 93.63 O ATOM 1007 CB HIS A 141 1.656 45.276 12.212 1.00 90.47 C ATOM 1008 CG HIS A 141 0.212 45.299 12.629 1.00 88.47 C ATOM 1009 ND1 HIS A 141 -0.345 44.306 13.404 1.00 86.91 N ATOM 1010 CD2 HIS A 141 -0.782 46.186 12.379 1.00 87.52 C ATOM 1011 CE1 HIS A 141 -1.623 44.579 13.614 1.00 86.12 C ATOM 1012 NE2 HIS A 141 -1.913 45.713 13.003 1.00 85.96 N ATOM 1013 N GLY A 142 0.452 47.771 10.069 1.00 93.98 N ATOM 1014 CA GLY A 142 -0.557 48.070 9.069 1.00 95.37 C ATOM 1015 C GLY A 142 -0.125 49.206 8.156 1.00 96.23 C ATOM 1016 O GLY A 142 -0.963 49.945 7.631 1.00 95.98 O ATOM 1017 N GLN A 143 1.191 49.327 7.974 1.00 96.91 N 
ATOM 1018 CA GLN A 143 1.811 50.358 7.142 1.00 97.13 C ATOM 1019 C GLN A 143 3.025 49.890 6.307 1.00 97.15 C ATOM 1020 O GLN A 143 3.407 50.548 5.343 1.00 96.46 O ATOM 1021 CB GLN A 143 2.230 51.526 8.036 1.00 97.37 C ATOM 1022 CG GLN A 143 1.063 52.120 8.811 1.00 98.88 C ATOM 1023 CD GLN A 143 1.478 53.200 9.796 1.00 99.70 C ATOM 1024 OE1 GLN A 143 2.157 54.162 9.437 1.00 98.74 O ATOM 1025 NE2 GLN A 143 1.054 53.048 11.047 1.00100.53 N ATOM 1026 N PRO A 144 3.625 48.735 6.650 1.00 97.65 N ATOM 1027 CA PRO A 144 4.790 48.193 5.943 1.00 97.32 C ATOM 1028 C PRO A 144 4.994 48.534 4.473 1.00 96.74 C ATOM 1029 O PRO A 144 6.123 48.758 4.041 1.00 96.95 O ATOM 1030 CB PRO A 144 4.650 46.692 6.165 1.00 97.93 C ATOM 1031 CG PRO A 144 4.144 46.639 7.556 1.00 98.75 C ATOM 1032 CD PRO A 144 3.063 47.702 7.544 1.00 98.52 C ATOM 1033 N VAL A 145 3.921 48.563 3.696 1.00 96.76 N ATOM 1034 CA VAL A 145 4.065 48.860 2.273 1.00 96.67 C ATOM 1035 C VAL A 145 4.311 50.351 2.045 1.00 95.18 C ATOM 1036 O VAL A 145 5.050 50.740 1.134 1.00 94.04 O ATOM 1037 CB VAL A 145 2.810 48.396 1.468 1.00 97.80 C ATOM 1038 CG1 VAL A 145 3.019 48.650 -0.017 1.00 97.24 C ATOM 1039 CG2 VAL A 145 2.546 46.908 1.707 1.00 97.02 C ATOM 1040 N ALA A 146 3.695 51.173 2.890 1.00 94.06 N ATOM 1041 CA ALA A 146 3.824 52.623 2.809 1.00 93.54 C ATOM 1042 C ALA A 146 5.271 53.092 3.027 1.00 93.29 C ATOM 1043 O ALA A 146 5.635 54.217 2.661 1.00 93.69 O ATOM 1044 CB ALA A 146 2.891 53.281 3.829 1.00 92.75 C ATOM 1045 N PHE A 147 6.090 52.232 3.624 1.00 91.80 N ATOM 1046 CA PHE A 147 7.489 52.559 3.868 1.00 89.97 C ATOM 1047 C PHE A 147 8.362 52.054 2.742 1.00 88.94 C ATOM 1048 O PHE A 147 9.413 52.620 2.452 1.00 88.67 O ATOM 1049 CB PHE A 147 7.965 51.932 5.169 1.00 90.00 C ATOM 1050 CG PHE A 147 7.492 52.648 6.376 1.00 90.95 C ATOM 1051 CD1 PHE A 147 6.164 52.572 6.766 1.00 91.17 C ATOM 1052 CD2 PHE A 147 8.367 53.446 7.104 1.00 91.91 C ATOM 1053 CE1 PHE A 147 5.706 53.284 7.872 1.00 92.42 C ATOM 1054 CE2 PHE A 147 
7.926 54.164 8.209 1.00 92.68 C ATOM 1055 CZ PHE A 147 6.590 54.085 8.596 1.00 92.65 C ATOM 1056 N LEU A 148 7.912 50.980 2.110 1.00 88.06 N ATOM 1057 CA LEU A 148 8.653 50.362 1.031 1.00 87.82 C ATOM 1058 C LEU A 148 8.317 50.979 -0.318 1.00 89.07 C ATOM 1059 O LEU A 148 9.104 50.915 -1.266 1.00 88.06 O ATOM 1060 CB LEU A 148 8.359 48.865 1.034 1.00 85.29 C ATOM 1061 CG LEU A 148 8.597 48.243 2.410 1.00 82.72 C ATOM 1062 CD1 LEU A 148 8.166 46.797 2.422 1.00 81.84 C ATOM 1063 CD2 LEU A 148 10.062 48.368 2.754 1.00 82.23 C ATOM 1064 N LEU A 149 7.142 51.587 -0.397 1.00 91.44 N ATOM 1065 CA LEU A 149 6.710 52.211 -1.635 1.00 94.48 C ATOM 1066 C LEU A 149 7.375 53.555 -1.883 1.00 96.57 C ATOM 1067 O LEU A 149 7.557 54.362 -0.962 1.00 96.50 O ATOM 1068 CB LEU A 149 5.191 52.402 -1.644 1.00 93.94 C ATOM 1069 CG LEU A 149 4.316 51.275 -2.184 1.00 93.30 C ATOM 1070 CD1 LEU A 149 2.880 51.756 -2.177 1.00 92.84 C ATOM 1071 CD2 LEU A 149 4.740 50.889 -3.606 1.00 93.57 C ATOM 1072 N LYS A 150 7.725 53.782 -3.146 1.00 98.82 N ATOM 1073 CA LYS A 150 8.348 55.028 -3.573 1.00100.82 C ATOM 1074 C LYS A 150 7.237 56.063 -3.792 1.00101.32 C ATOM 1075 O LYS A 150 7.333 57.201 -3.325 1.00101.72 O ATOM 1076 CB LYS A 150 9.128 54.806 -4.876 1.00101.79 C ATOM 1077 CG LYS A 150 10.171 53.691 -4.801 1.00102.99 C ATOM 1078 CD LYS A 150 10.869 53.472 -6.144 1.00103.99 C ATOM 1079 CE LYS A 150 11.922 52.363 -6.049 1.00104.51 C ATOM 1080 NZ LYS A 150 12.661 52.130 -7.331 1.00104.60 N ATOM 1081 N GLU A 151 6.178 55.651 -4.488 1.00101.60 N ATOM 1082 CA GLU A 151 5.044 56.523 -4.775 1.00102.29 C ATOM 1083 C GLU A 151 4.399 57.045 -3.504 1.00102.54 C ATOM 1084 O GLU A 151 3.419 57.787 -3.554 1.00102.16 O ATOM 1085 CB GLU A 151 3.987 55.780 -5.590 1.00102.95 C ATOM 1086 CG GLU A 151 4.448 55.312 -6.956 1.00104.96 C ATOM 1087 CD GLU A 151 5.045 53.920 -6.934 1.00106.23 C ATOM 1088 OE1 GLU A 151 6.035 53.701 -6.206 1.00107.34 O ATOM 1089 OE2 GLU A 151 4.521 53.043 -7.653 1.00107.33 O ATOM 1090 N LEU 
A 152 4.951 56.656 -2.365 1.00102.97 N ATOM 1091 CA LEU A 152 4.411 57.085 -1.092 1.00104.39 C ATOM 1092 C LEU A 152 5.438 57.865 -0.290 1.00105.77 C ATOM 1093 O LEU A 152 5.131 58.404 0.777 1.00105.76 O ATOM 1094 CB LEU A 152 3.943 55.867 -0.303 1.00104.29 C ATOM 1095 CG LEU A 152 2.452 55.801 0.033 1.00104.15 C ATOM 1096 CD1 LEU A 152 1.607 56.346 -1.103 1.00103.71 C ATOM 1097 CD2 LEU A 152 2.100 54.358 0.323 1.00104.71 C ATOM 1098 N LYS A 153 6.661 57.921 -0.809 1.00107.04 N ATOM 1099 CA LYS A 153 7.735 58.642 -0.139 1.00107.67 C ATOM 1100 C LYS A 153 7.369 60.121 -0.057 1.00107.92 C ATOM 1101 O LYS A 153 6.763 60.678 -0.980 1.00108.76 O ATOM 1102 CB LYS A 153 9.052 58.465 -0.902 1.00107.80 C ATOM 1103 CG LYS A 153 10.260 59.095 -0.221 1.00108.64 C ATOM 1104 CD LYS A 153 11.552 58.800 -0.980 1.00109.46 C ATOM 1105 CE LYS A 153 12.752 59.546 -0.386 1.00110.70 C ATOM 1106 NZ LYS A 153 13.062 59.158 1.028 1.00111.92 N ATOM 1107 N GLY A 154 7.727 60.747 1.059 1.00107.54 N ATOM 1108 CA GLY A 154 7.425 62.151 1.240 1.00107.33 C ATOM 1109 C GLY A 154 5.937 62.414 1.149 1.00107.30 C ATOM 1110 O GLY A 154 5.514 63.533 0.860 1.00108.46 O ATOM 1111 N LYS A 155 5.140 61.376 1.377 1.00106.64 N ATOM 1112 CA LYS A 155 3.690 61.508 1.344 1.00105.81 C ATOM 1113 C LYS A 155 3.193 61.505 2.789 1.00105.92 C ATOM 1114 O LYS A 155 1.988 61.497 3.054 1.00105.68 O ATOM 1115 CB LYS A 155 3.070 60.355 0.551 1.00104.96 C ATOM 1116 CG LYS A 155 3.441 60.354 -0.921 1.00103.47 C ATOM 1117 CD LYS A 155 2.896 61.590 -1.612 1.00104.34 C ATOM 1118 CE LYS A 155 3.337 61.663 -3.069 1.00104.97 C ATOM 1119 NZ LYS A 155 2.769 62.855 -3.775 1.00103.63 N ATOM 1120 N PHE A 156 4.152 61.515 3.713 1.00106.22 N ATOM 1121 CA PHE A 156 3.886 61.528 5.149 1.00106.40 C ATOM 1122 C PHE A 156 5.148 61.912 5.943 1.00106.38 C ATOM 1123 O PHE A 156 6.276 61.702 5.481 1.00105.46 O ATOM 1124 CB PHE A 156 3.342 60.161 5.615 1.00107.04 C ATOM 1125 CG PHE A 156 4.223 58.980 5.263 1.00107.71 C ATOM 1126 CD1 PHE A 156 
4.511 58.667 3.930 1.00107.58 C ATOM 1127 CD2 PHE A 156 4.748 58.168 6.269 1.00106.90 C ATOM 1128 CE1 PHE A 156 5.304 57.565 3.608 1.00106.82 C ATOM 1129 CE2 PHE A 156 5.537 57.067 5.960 1.00106.77 C ATOM 1130 CZ PHE A 156 5.818 56.764 4.625 1.00107.12 C ATOM 1131 N PRO A 157 4.967 62.492 7.147 1.00106.61 N ATOM 1132 CA PRO A 157 6.081 62.912 8.008 1.00106.37 C ATOM 1133 C PRO A 157 7.055 61.801 8.371 1.00106.46 C ATOM 1134 O PRO A 157 6.682 60.823 9.020 1.00106.53 O ATOM 1135 CB PRO A 157 5.373 63.485 9.237 1.00106.35 C ATOM 1136 CG PRO A 157 4.089 62.704 9.290 1.00106.43 C ATOM 1137 CD PRO A 157 3.678 62.699 7.835 1.00106.79 C ATOM 1138 N ASP A 158 8.306 61.966 7.949 1.00106.90 N ATOM 1139 CA ASP A 158 9.351 60.987 8.231 1.00106.90 C ATOM 1140 C ASP A 158 9.153 60.469 9.652 1.00107.41 C ATOM 1141 O ASP A 158 8.930 61.241 10.592 1.00107.55 O ATOM 1142 CB ASP A 158 10.728 61.632 8.063 1.00104.94 C ATOM 1143 CG ASP A 158 10.847 62.391 6.757 1.00104.15 C ATOM 1144 OD1 ASP A 158 10.567 61.794 5.692 1.00103.27 O ATOM 1145 OD2 ASP A 158 11.210 63.585 6.789 1.00103.26 O ATOM 1146 N VAL A 159 9.222 59.154 9.804 1.00107.49 N ATOM 1147 CA VAL A 159 8.997 58.547 11.102 1.00107.43 C ATOM 1148 C VAL A 159 10.257 58.215 11.896 1.00105.92 C ATOM 1149 O VAL A 159 11.246 57.713 11.349 1.00105.41 O ATOM 1150 CB VAL A 159 8.126 57.274 10.952 1.00108.72 C ATOM 1151 CG1 VAL A 159 7.778 56.703 12.326 1.00109.55 C ATOM 1152 CG2 VAL A 159 6.855 57.610 10.170 1.00109.32 C ATOM 1153 N PRO A 160 10.224 58.507 13.209 1.00104.72 N ATOM 1154 CA PRO A 160 11.282 58.294 14.198 1.00104.41 C ATOM 1155 C PRO A 160 11.765 56.853 14.321 1.00103.98 C ATOM 1156 O PRO A 160 11.224 56.076 15.111 1.00103.74 O ATOM 1157 CB PRO A 160 10.643 58.789 15.490 1.00103.93 C ATOM 1158 CG PRO A 160 9.783 59.892 15.014 1.00103.63 C ATOM 1159 CD PRO A 160 9.125 59.287 13.805 1.00104.02 C ATOM 1160 N GLY A 161 12.789 56.512 13.542 1.00103.53 N ATOM 1161 CA GLY A 161 13.352 55.177 13.594 1.00103.26 C ATOM 1162 C GLY A 
161 13.464 54.493 12.249 1.00103.11 C ATOM 1163 O GLY A 161 14.166 53.489 12.114 1.00102.78 O ATOM 1164 N PHE A 162 12.795 55.046 11.244 1.00102.76 N ATOM 1165 CA PHE A 162 12.803 54.445 9.919 1.00102.25 C ATOM 1166 C PHE A 162 13.581 55.262 8.904 1.00101.97 C ATOM 1167 O PHE A 162 13.363 55.150 7.700 1.00101.94 O ATOM 1168 CB PHE A 162 11.362 54.247 9.459 1.00101.45 C ATOM 1169 CG PHE A 162 10.484 53.644 10.514 1.00101.08 C ATOM 1170 CD1 PHE A 162 10.144 54.372 11.653 1.00100.76 C ATOM 1171 CD2 PHE A 162 10.036 52.334 10.399 1.00101.58 C ATOM 1172 CE1 PHE A 162 9.378 53.806 12.663 1.00100.12 C ATOM 1173 CE2 PHE A 162 9.267 51.756 11.408 1.00101.83 C ATOM 1174 CZ PHE A 162 8.939 52.497 12.543 1.00101.45 C ATOM 1175 N SER A 163 14.494 56.084 9.403 1.00102.29 N ATOM 1176 CA SER A 163 15.325 56.918 8.545 1.00102.63 C ATOM 1177 C SER A 163 16.013 56.038 7.495 1.00102.57 C ATOM 1178 O SER A 163 15.851 56.247 6.290 1.00102.59 O ATOM 1179 CB SER A 163 16.367 57.644 9.397 1.00102.37 C ATOM 1180 OG SER A 163 15.741 58.319 10.479 1.00101.02 O ATOM 1181 N TRP A 164 16.761 55.047 7.979 1.00102.49 N ATOM 1182 CA TRP A 164 17.503 54.075 7.160 1.00101.47 C ATOM 1183 C TRP A 164 16.656 53.402 6.074 1.00 99.77 C ATOM 1184 O TRP A 164 17.186 52.783 5.149 1.00 98.35 O ATOM 1185 CB TRP A 164 18.070 52.988 8.072 1.00103.47 C ATOM 1186 CG TRP A 164 16.996 52.353 8.921 1.00106.13 C ATOM 1187 CD1 TRP A 164 16.552 52.772 10.150 1.00106.71 C ATOM 1188 CD2 TRP A 164 16.174 51.236 8.556 1.00106.44 C ATOM 1189 NE1 TRP A 164 15.506 51.979 10.569 1.00106.48 N ATOM 1190 CE2 TRP A 164 15.261 51.033 9.622 1.00106.26 C ATOM 1191 CE3 TRP A 164 16.134 50.386 7.447 1.00106.62 C ATOM 1192 CZ2 TRP A 164 14.305 50.016 9.584 1.00106.64 C ATOM 1193 CZ3 TRP A 164 15.184 49.378 7.417 1.00106.79 C ATOM 1194 CH2 TRP A 164 14.287 49.197 8.486 1.00107.31 C ATOM 1195 N VAL A 165 15.339 53.510 6.208 1.00 98.45 N ATOM 1196 CA VAL A 165 14.422 52.917 5.251 1.00 97.02 C ATOM 1197 C VAL A 165 14.417 53.689 3.940 1.00 
96.57 C ATOM 1198 O VAL A 165 14.342 54.917 3.926 1.00 95.58 O ATOM 1199 CB VAL A 165 12.998 52.882 5.808 1.00 96.60 C ATOM 1200 CG1 VAL A 165 12.066 52.232 4.806 1.00 96.25 C ATOM 1201 CG2 VAL A 165 12.987 52.136 7.127 1.00 95.83 C ATOM 1202 N THR A 166 14.485 52.950 2.838 1.00 96.74 N ATOM 1203 CA THR A 166 14.498 53.543 1.508 1.00 96.80 C ATOM 1204 C THR A 166 13.397 52.949 0.632 1.00 96.74 C ATOM 1205 O THR A 166 13.451 51.771 0.273 1.00 96.41 O ATOM 1206 CB THR A 166 15.846 53.288 0.803 1.00 96.99 C ATOM 1207 OG1 THR A 166 16.918 53.566 1.710 1.00 97.70 O ATOM 1208 CG2 THR A 166 15.985 54.176 -0.436 1.00 96.21 C ATOM 1209 N PRO A 167 12.378 53.757 0.288 1.00 96.68 N ATOM 1210 CA PRO A 167 11.284 53.266 -0.558 1.00 97.04 C ATOM 1211 C PRO A 167 11.915 52.513 -1.724 1.00 97.98 C ATOM 1212 O PRO A 167 12.517 53.123 -2.606 1.00 98.54 O ATOM 1213 CB PRO A 167 10.596 54.552 -0.983 1.00 96.70 C ATOM 1214 CG PRO A 167 10.740 55.410 0.259 1.00 96.21 C ATOM 1215 CD PRO A 167 12.175 55.170 0.661 1.00 95.91 C ATOM 1216 N CYS A 168 11.774 51.189 -1.719 1.00 99.21 N ATOM 1217 CA CYS A 168 12.392 50.335 -2.736 1.00100.39 C ATOM 1218 C CYS A 168 11.594 49.807 -3.935 1.00100.22 C ATOM 1219 O CYS A 168 12.197 49.377 -4.916 1.00100.36 O ATOM 1220 CB CYS A 168 13.074 49.150 -2.039 1.00101.41 C ATOM 1221 SG CYS A 168 12.045 48.309 -0.814 1.00102.86 S ATOM 1222 N ILE A 169 10.266 49.802 -3.879 1.00100.14 N ATOM 1223 CA ILE A 169 9.507 49.311 -5.033 1.00 99.43 C ATOM 1224 C ILE A 169 8.256 50.124 -5.324 1.00100.06 C ATOM 1225 O ILE A 169 7.630 50.689 -4.423 1.00 99.94 O ATOM 1226 CB ILE A 169 9.094 47.819 -4.886 1.00 97.80 C ATOM 1227 CG1 ILE A 169 8.135 47.643 -3.706 1.00 97.02 C ATOM 1228 CG2 ILE A 169 10.326 46.950 -4.727 1.00 97.30 C ATOM 1229 CD1 ILE A 169 8.689 48.068 -2.363 1.00 96.20 C ATOM 1230 N SER A 170 7.911 50.194 -6.603 1.00100.49 N ATOM 1231 CA SER A 170 6.731 50.922 -7.033 1.00101.32 C ATOM 1232 C SER A 170 5.529 49.970 -7.023 1.00101.91 C ATOM 1233 O SER A 170 
5.684 48.768 -6.782 1.00102.46 O ATOM 1234 CB SER A 170 6.969 51.513 -8.428 1.00100.78 C ATOM 1235 OG SER A 170 7.596 50.585 -9.289 1.00100.26 O ATOM 1236 N ALA A 171 4.336 50.503 -7.274 1.00101.63 N ATOM 1237 CA ALA A 171 3.120 49.692 -7.275 1.00101.52 C ATOM 1238 C ALA A 171 3.060 48.733 -8.465 1.00101.47 C ATOM 1239 O ALA A 171 2.149 47.907 -8.577 1.00101.00 O ATOM 1240 CB ALA A 171 1.899 50.604 -7.273 1.00101.29 C ATOM 1241 N LYS A 172 4.045 48.841 -9.345 1.00101.77 N ATOM 1242 CA LYS A 172 4.097 47.997 -10.524 1.00102.19 C ATOM 1243 C LYS A 172 4.808 46.675 -10.265 1.00100.80 C ATOM 1244 O LYS A 172 4.860 45.807 -11.139 1.00100.93 O ATOM 1245 CB LYS A 172 4.782 48.760 -11.665 1.00104.75 C ATOM 1246 CG LYS A 172 3.806 49.323 -12.708 1.00107.35 C ATOM 1247 CD LYS A 172 2.609 50.052 -12.073 1.00107.90 C ATOM 1248 CE LYS A 172 1.503 50.295 -13.105 1.00108.26 C ATOM 1249 NZ LYS A 172 0.237 50.819 -12.513 1.00107.62 N ATOM 1250 N ASP A 173 5.342 46.518 -9.058 1.00 98.57 N ATOM 1251 CA ASP A 173 6.053 45.296 -8.709 1.00 95.87 C ATOM 1252 C ASP A 173 5.427 44.539 -7.534 1.00 92.84 C ATOM 1253 O ASP A 173 5.845 43.421 -7.219 1.00 92.74 O ATOM 1254 CB ASP A 173 7.522 45.612 -8.392 1.00 97.37 C ATOM 1255 CG ASP A 173 8.215 46.378 -9.511 1.00 97.26 C ATOM 1256 OD1 ASP A 173 8.154 45.935 -10.681 1.00 95.86 O ATOM 1257 OD2 ASP A 173 8.828 47.426 -9.208 1.00 98.11 O ATOM 1258 N ILE A 174 4.428 45.149 -6.895 1.00 89.13 N ATOM 1259 CA ILE A 174 3.728 44.540 -5.757 1.00 83.93 C ATOM 1260 C ILE A 174 2.394 43.916 -6.166 1.00 80.70 C ATOM 1261 O ILE A 174 1.697 44.451 -7.019 1.00 80.98 O ATOM 1262 CB ILE A 174 3.431 45.585 -4.651 1.00 82.90 C ATOM 1263 CG1 ILE A 174 2.612 44.946 -3.535 1.00 81.94 C ATOM 1264 CG2 ILE A 174 2.650 46.744 -5.218 1.00 82.20 C ATOM 1265 CD1 ILE A 174 3.299 43.772 -2.898 1.00 82.59 C ATOM 1266 N VAL A 175 2.049 42.779 -5.565 1.00 77.49 N ATOM 1267 CA VAL A 175 0.775 42.106 -5.848 1.00 74.17 C ATOM 1268 C VAL A 175 0.166 41.524 -4.562 1.00 71.60 C ATOM 
1269 O VAL A 175 0.789 40.719 -3.862 1.00 70.28 O ATOM 1270 CB VAL A 175 0.926 40.972 -6.916 1.00 73.83 C ATOM 1271 CG1 VAL A 175 -0.418 40.295 -7.158 1.00 72.13 C ATOM 1272 CG2 VAL A 175 1.439 41.549 -8.215 1.00 71.71 C ATOM 1273 N TYR A 176 -1.059 41.955 -4.270 1.00 68.81 N ATOM 1274 CA TYR A 176 -1.805 41.530 -3.090 1.00 65.95 C ATOM 1275 C TYR A 176 -2.750 40.394 -3.411 1.00 66.00 C ATOM 1276 O TYR A 176 -3.431 40.435 -4.435 1.00 66.47 O ATOM 1277 CB TYR A 176 -2.655 42.676 -2.583 1.00 63.07 C ATOM 1278 CG TYR A 176 -1.935 43.714 -1.775 1.00 62.07 C ATOM 1279 CD1 TYR A 176 -1.578 43.465 -0.461 1.00 61.02 C ATOM 1280 CD2 TYR A 176 -1.750 45.001 -2.273 1.00 62.56 C ATOM 1281 CE1 TYR A 176 -1.074 44.479 0.355 1.00 60.83 C ATOM 1282 CE2 TYR A 176 -1.247 46.026 -1.465 1.00 62.19 C ATOM 1283 CZ TYR A 176 -0.919 45.757 -0.145 1.00 61.80 C ATOM 1284 OH TYR A 176 -0.503 46.774 0.689 1.00 61.14 O ATOM 1285 N ILE A 177 -2.796 39.383 -2.547 1.00 66.80 N ATOM 1286 CA ILE A 177 -3.723 38.263 -2.745 1.00 67.11 C ATOM 1287 C ILE A 177 -4.374 37.942 -1.391 1.00 68.56 C ATOM 1288 O ILE A 177 -3.677 37.766 -0.378 1.00 69.41 O ATOM 1289 CB ILE A 177 -3.027 36.961 -3.285 1.00 65.54 C ATOM 1290 CG1 ILE A 177 -1.803 37.302 -4.140 1.00 65.06 C ATOM 1291 CG2 ILE A 177 -4.002 36.172 -4.153 1.00 61.44 C ATOM 1292 CD1 ILE A 177 -1.131 36.090 -4.755 1.00 62.70 C ATOM 1293 N GLY A 178 -5.709 37.906 -1.374 1.00 68.46 N ATOM 1294 CA GLY A 178 -6.437 37.587 -0.158 1.00 68.24 C ATOM 1295 C GLY A 178 -7.071 38.757 0.576 1.00 68.59 C ATOM 1296 O GLY A 178 -7.617 38.570 1.675 1.00 68.65 O ATOM 1297 N LEU A 179 -7.024 39.950 -0.019 1.00 67.99 N ATOM 1298 CA LEU A 179 -7.595 41.145 0.615 1.00 65.97 C ATOM 1299 C LEU A 179 -9.106 41.016 0.846 1.00 64.84 C ATOM 1300 O LEU A 179 -9.873 40.646 -0.043 1.00 62.84 O ATOM 1301 CB LEU A 179 -7.281 42.397 -0.222 1.00 63.88 C ATOM 1302 CG LEU A 179 -5.791 42.594 -0.553 1.00 61.65 C ATOM 1303 CD1 LEU A 179 -5.604 43.843 -1.400 1.00 60.43 C ATOM 1304 CD2 LEU A 
179 -4.984 42.673 0.731 1.00 58.38 C ATOM 1305 N ARG A 180 -9.524 41.323 2.065 1.00 64.18 N ATOM 1306 CA ARG A 180 -10.921 41.216 2.409 1.00 63.57 C ATOM 1307 C ARG A 180 -11.282 42.163 3.546 1.00 65.71 C ATOM 1308 O ARG A 180 -12.437 42.243 3.945 1.00 67.12 O ATOM 1309 CB ARG A 180 -11.242 39.767 2.782 1.00 60.48 C ATOM 1310 CG ARG A 180 -11.048 39.448 4.247 1.00 58.48 C ATOM 1311 CD ARG A 180 -11.227 37.978 4.543 1.00 55.45 C ATOM 1312 NE ARG A 180 -10.080 37.260 4.032 1.00 58.34 N ATOM 1313 CZ ARG A 180 -9.295 36.469 4.754 1.00 60.04 C ATOM 1314 NH1 ARG A 180 -9.534 36.271 6.046 1.00 58.90 N ATOM 1315 NH2 ARG A 180 -8.237 35.902 4.179 1.00 60.12 N ATOM 1316 N ASP A 181 -10.298 42.876 4.075 1.00 68.57 N ATOM 1317 CA ASP A 181 -10.562 43.844 5.141 1.00 72.38 C ATOM 1318 C ASP A 181 -9.692 45.095 4.939 1.00 73.65 C ATOM 1319 O ASP A 181 -8.701 45.331 5.650 1.00 72.86 O ATOM 1320 CB ASP A 181 -10.312 43.209 6.516 1.00 74.74 C ATOM 1321 CG ASP A 181 -10.261 44.242 7.637 1.00 76.50 C ATOM 1322 OD1 ASP A 181 -10.962 45.273 7.524 1.00 76.76 O ATOM 1323 OD2 ASP A 181 -9.526 44.012 8.628 1.00 77.76 O ATOM 1324 N VAL A 182 -10.082 45.905 3.960 1.00 75.62 N ATOM 1325 CA VAL A 182 -9.323 47.104 3.636 1.00 77.84 C ATOM 1326 C VAL A 182 -9.941 48.416 4.069 1.00 78.44 C ATOM 1327 O VAL A 182 -11.098 48.689 3.755 1.00 79.02 O ATOM 1328 CB VAL A 182 -9.038 47.173 2.125 1.00 78.31 C ATOM 1329 CG1 VAL A 182 -8.948 48.621 1.662 1.00 79.03 C ATOM 1330 CG2 VAL A 182 -7.730 46.458 1.826 1.00 79.08 C ATOM 1331 N ASP A 183 -9.138 49.229 4.763 1.00 79.14 N ATOM 1332 CA ASP A 183 -9.550 50.543 5.256 1.00 78.94 C ATOM 1333 C ASP A 183 -9.732 51.569 4.143 1.00 79.30 C ATOM 1334 O ASP A 183 -9.102 51.476 3.086 1.00 79.36 O ATOM 1335 CB ASP A 183 -8.535 51.077 6.266 1.00 78.05 C ATOM 1336 CG ASP A 183 -8.649 50.399 7.612 1.00 79.06 C ATOM 1337 OD1 ASP A 183 -9.726 50.506 8.236 1.00 78.74 O ATOM 1338 OD2 ASP A 183 -7.668 49.757 8.049 1.00 80.08 O ATOM 1339 N PRO A 184 -10.600 52.569 4.375 1.00 
79.74 N ATOM 1340 CA PRO A 184 -10.875 53.625 3.395 1.00 80.29 C ATOM 1341 C PRO A 184 -9.567 54.111 2.773 1.00 81.42 C ATOM 1342 O PRO A 184 -9.262 53.803 1.617 1.00 81.63 O ATOM 1343 CB PRO A 184 -11.543 54.708 4.237 1.00 78.84 C ATOM 1344 CG PRO A 184 -12.233 53.934 5.285 1.00 78.62 C ATOM 1345 CD PRO A 184 -11.208 52.900 5.674 1.00 78.97 C ATOM 1346 N GLY A 185 -8.800 54.859 3.566 1.00 81.81 N ATOM 1347 CA GLY A 185 -7.530 55.393 3.115 1.00 82.31 C ATOM 1348 C GLY A 185 -6.772 54.417 2.249 1.00 82.89 C ATOM 1349 O GLY A 185 -6.291 54.770 1.175 1.00 82.84 O ATOM 1350 N GLU A 186 -6.668 53.180 2.716 1.00 83.40 N ATOM 1351 CA GLU A 186 -5.970 52.157 1.963 1.00 84.01 C ATOM 1352 C GLU A 186 -6.583 51.938 0.577 1.00 85.43 C ATOM 1353 O GLU A 186 -5.851 51.826 -0.412 1.00 85.30 O ATOM 1354 CB GLU A 186 -5.946 50.852 2.760 1.00 82.52 C ATOM 1355 CG GLU A 186 -4.753 50.735 3.683 1.00 80.01 C ATOM 1356 CD GLU A 186 -4.936 49.662 4.723 1.00 80.52 C ATOM 1357 OE1 GLU A 186 -5.894 49.797 5.507 1.00 82.23 O ATOM 1358 OE2 GLU A 186 -4.137 48.697 4.765 1.00 79.04 O ATOM 1359 N HIS A 187 -7.914 51.889 0.493 1.00 87.43 N ATOM 1360 CA HIS A 187 -8.569 51.681 -0.799 1.00 89.34 C ATOM 1361 C HIS A 187 -8.171 52.807 -1.743 1.00 91.26 C ATOM 1362 O HIS A 187 -7.855 52.580 -2.916 1.00 91.25 O ATOM 1363 CB HIS A 187 -10.094 51.647 -0.658 1.00 88.09 C ATOM 1364 CG HIS A 187 -10.794 51.084 -1.862 1.00 87.99 C ATOM 1365 ND1 HIS A 187 -12.162 51.021 -1.972 1.00 88.58 N ATOM 1366 CD2 HIS A 187 -10.298 50.526 -2.993 1.00 87.65 C ATOM 1367 CE1 HIS A 187 -12.486 50.445 -3.121 1.00 87.86 C ATOM 1368 NE2 HIS A 187 -11.374 50.136 -3.756 1.00 86.86 N ATOM 1369 N TYR A 188 -8.195 54.024 -1.210 1.00 93.24 N ATOM 1370 CA TYR A 188 -7.825 55.219 -1.958 1.00 94.52 C ATOM 1371 C TYR A 188 -6.466 54.951 -2.594 1.00 93.43 C ATOM 1372 O TYR A 188 -6.332 54.876 -3.814 1.00 93.13 O ATOM 1373 CB TYR A 188 -7.722 56.402 -0.992 1.00 97.75 C ATOM 1374 CG TYR A 188 -7.504 57.742 -1.651 1.00101.26 C ATOM 1375 CD1 
TYR A 188 -8.513 58.337 -2.400 1.00102.84 C ATOM 1376 CD2 TYR A 188 -6.291 58.417 -1.518 1.00103.00 C ATOM 1377 CE1 TYR A 188 -8.326 59.570 -3.003 1.00105.63 C ATOM 1378 CE2 TYR A 188 -6.089 59.656 -2.118 1.00105.61 C ATOM 1379 CZ TYR A 188 -7.112 60.227 -2.861 1.00106.52 C ATOM 1380 OH TYR A 188 -6.934 61.449 -3.473 1.00107.84 O ATOM 1381 N ILE A 189 -5.474 54.790 -1.728 1.00 91.82 N ATOM 1382 CA ILE A 189 -4.102 54.532 -2.112 1.00 89.96 C ATOM 1383 C ILE A 189 -3.932 53.468 -3.194 1.00 89.45 C ATOM 1384 O ILE A 189 -3.338 53.744 -4.231 1.00 90.11 O ATOM 1385 CB ILE A 189 -3.284 54.129 -0.879 1.00 89.42 C ATOM 1386 CG1 ILE A 189 -3.574 55.113 0.256 1.00 89.61 C ATOM 1387 CG2 ILE A 189 -1.808 54.133 -1.210 1.00 88.40 C ATOM 1388 CD1 ILE A 189 -2.982 54.739 1.613 1.00 90.14 C ATOM 1389 N ILE A 190 -4.448 52.260 -2.976 1.00 88.36 N ATOM 1390 CA ILE A 190 -4.299 51.199 -3.978 1.00 87.98 C ATOM 1391 C ILE A 190 -4.921 51.518 -5.338 1.00 87.97 C ATOM 1392 O ILE A 190 -4.393 51.115 -6.377 1.00 87.59 O ATOM 1393 CB ILE A 190 -4.886 49.852 -3.495 1.00 87.07 C ATOM 1394 CG1 ILE A 190 -6.258 50.080 -2.869 1.00 86.15 C ATOM 1395 CG2 ILE A 190 -3.929 49.181 -2.528 1.00 86.59 C ATOM 1396 CD1 ILE A 190 -6.891 48.815 -2.363 1.00 85.40 C ATOM 1397 N LYS A 191 -6.046 52.225 -5.331 1.00 88.65 N ATOM 1398 CA LYS A 191 -6.717 52.595 -6.575 1.00 88.53 C ATOM 1399 C LYS A 191 -5.889 53.672 -7.267 1.00 88.10 C ATOM 1400 O LYS A 191 -5.494 53.532 -8.429 1.00 88.34 O ATOM 1401 CB LYS A 191 -8.110 53.149 -6.280 1.00 88.47 C ATOM 1402 CG LYS A 191 -8.988 52.221 -5.477 1.00 88.98 C ATOM 1403 CD LYS A 191 -9.185 50.891 -6.187 1.00 90.42 C ATOM 1404 CE LYS A 191 -9.988 51.044 -7.473 1.00 91.22 C ATOM 1405 NZ LYS A 191 -10.220 49.740 -8.165 1.00 91.30 N ATOM 1406 N THR A 192 -5.636 54.745 -6.519 1.00 87.39 N ATOM 1407 CA THR A 192 -4.868 55.890 -6.984 1.00 86.37 C ATOM 1408 C THR A 192 -3.532 55.479 -7.562 1.00 85.11 C ATOM 1409 O THR A 192 -3.254 55.733 -8.730 1.00 85.30 O ATOM 1410 CB THR A 
192 -4.598 56.877 -5.836 1.00 86.84 C ATOM 1411 OG1 THR A 192 -5.818 57.538 -5.471 1.00 87.24 O ATOM 1412 CG2 THR A 192 -3.562 57.910 -6.259 1.00 87.53 C ATOM 1413 N LEU A 193 -2.708 54.854 -6.731 1.00 83.59 N ATOM 1414 CA LEU A 193 -1.389 54.415 -7.147 1.00 81.81 C ATOM 1415 C LEU A 193 -1.444 53.211 -8.054 1.00 81.47 C ATOM 1416 O LEU A 193 -0.412 52.626 -8.362 1.00 82.12 O ATOM 1417 CB LEU A 193 -0.532 54.075 -5.936 1.00 81.24 C ATOM 1418 CG LEU A 193 -0.121 55.238 -5.039 1.00 80.57 C ATOM 1419 CD1 LEU A 193 -1.349 55.886 -4.409 1.00 80.60 C ATOM 1420 CD2 LEU A 193 0.820 54.710 -3.971 1.00 80.83 C ATOM 1421 N GLY A 194 -2.642 52.836 -8.480 1.00 80.95 N ATOM 1422 CA GLY A 194 -2.771 51.690 -9.360 1.00 80.89 C ATOM 1423 C GLY A 194 -1.932 50.498 -8.924 1.00 80.68 C ATOM 1424 O GLY A 194 -0.863 50.246 -9.475 1.00 79.93 O ATOM 1425 N ILE A 195 -2.427 49.767 -7.930 1.00 81.64 N ATOM 1426 CA ILE A 195 -1.751 48.590 -7.395 1.00 81.78 C ATOM 1427 C ILE A 195 -2.522 47.339 -7.791 1.00 82.29 C ATOM 1428 O ILE A 195 -3.706 47.219 -7.461 1.00 82.81 O ATOM 1429 CB ILE A 195 -1.695 48.647 -5.861 1.00 81.34 C ATOM 1430 CG1 ILE A 195 -0.708 49.722 -5.426 1.00 81.48 C ATOM 1431 CG2 ILE A 195 -1.339 47.282 -5.301 1.00 81.33 C ATOM 1432 CD1 ILE A 195 -0.732 50.003 -3.942 1.00 82.56 C ATOM 1433 N LYS A 196 -1.860 46.417 -8.490 1.00 82.10 N ATOM 1434 CA LYS A 196 -2.514 45.176 -8.909 1.00 82.86 C ATOM 1435 C LYS A 196 -2.798 44.312 -7.678 1.00 83.44 C ATOM 1436 O LYS A 196 -1.899 44.024 -6.879 1.00 82.76 O ATOM 1437 CB LYS A 196 -1.639 44.410 -9.901 1.00 82.66 C ATOM 1438 CG LYS A 196 -2.238 43.111 -10.409 1.00 82.46 C ATOM 1439 CD LYS A 196 -3.449 43.330 -11.304 1.00 83.91 C ATOM 1440 CE LYS A 196 -3.835 42.017 -12.016 1.00 86.21 C ATOM 1441 NZ LYS A 196 -5.105 42.029 -12.829 1.00 85.46 N ATOM 1442 N TYR A 197 -4.062 43.915 -7.531 1.00 83.30 N ATOM 1443 CA TYR A 197 -4.503 43.112 -6.393 1.00 82.70 C ATOM 1444 C TYR A 197 -5.560 42.092 -6.796 1.00 82.58 C ATOM 1445 O TYR A 197 
-6.019 42.067 -7.939 1.00 83.40 O ATOM 1446 CB TYR A 197 -5.081 44.023 -5.293 1.00 82.50 C ATOM 1447 CG TYR A 197 -6.277 44.862 -5.730 1.00 82.72 C ATOM 1448 CD1 TYR A 197 -7.410 44.261 -6.297 1.00 82.81 C ATOM 1449 CD2 TYR A 197 -6.275 46.257 -5.587 1.00 82.18 C ATOM 1450 CE1 TYR A 197 -8.507 45.017 -6.716 1.00 82.44 C ATOM 1451 CE2 TYR A 197 -7.376 47.031 -6.001 1.00 82.21 C ATOM 1452 CZ TYR A 197 -8.491 46.398 -6.572 1.00 82.80 C ATOM 1453 OH TYR A 197 -9.579 47.124 -7.026 1.00 80.90 O ATOM 1454 N PHE A 198 -5.954 41.259 -5.839 1.00 81.74 N ATOM 1455 CA PHE A 198 -6.971 40.240 -6.071 1.00 79.29 C ATOM 1456 C PHE A 198 -7.743 40.012 -4.790 1.00 75.97 C ATOM 1457 O PHE A 198 -7.411 39.119 -4.018 1.00 76.68 O ATOM 1458 CB PHE A 198 -6.344 38.908 -6.486 1.00 81.72 C ATOM 1459 CG PHE A 198 -5.749 38.909 -7.860 1.00 84.47 C ATOM 1460 CD1 PHE A 198 -4.495 39.466 -8.092 1.00 85.88 C ATOM 1461 CD2 PHE A 198 -6.430 38.314 -8.923 1.00 85.59 C ATOM 1462 CE1 PHE A 198 -3.924 39.425 -9.364 1.00 87.07 C ATOM 1463 CE2 PHE A 198 -5.868 38.267 -10.195 1.00 86.78 C ATOM 1464 CZ PHE A 198 -4.610 38.822 -10.418 1.00 86.99 C ATOM 1465 N SER A 199 -8.773 40.808 -4.554 1.00 71.87 N ATOM 1466 CA SER A 199 -9.547 40.623 -3.345 1.00 67.84 C ATOM 1467 C SER A 199 -10.272 39.287 -3.420 1.00 66.10 C ATOM 1468 O SER A 199 -10.403 38.698 -4.502 1.00 64.20 O ATOM 1469 CB SER A 199 -10.556 41.746 -3.185 1.00 66.46 C ATOM 1470 OG SER A 199 -11.427 41.759 -4.287 1.00 65.39 O ATOM 1471 N MET A 200 -10.728 38.817 -2.259 1.00 64.09 N ATOM 1472 CA MET A 200 -11.450 37.562 -2.156 1.00 61.01 C ATOM 1473 C MET A 200 -12.449 37.465 -3.292 1.00 62.42 C ATOM 1474 O MET A 200 -12.649 36.388 -3.860 1.00 62.06 O ATOM 1475 CB MET A 200 -12.172 37.476 -0.811 1.00 56.33 C ATOM 1476 CG MET A 200 -11.247 37.266 0.372 1.00 53.16 C ATOM 1477 SD MET A 200 -10.171 35.801 0.200 1.00 46.37 S ATOM 1478 CE MET A 200 -11.362 34.537 0.043 1.00 48.10 C ATOM 1479 N THR A 201 -13.063 38.595 -3.634 1.00 62.97 N ATOM 1480 CA THR 
A 201 -14.033 38.611 -4.719 1.00 64.97 C ATOM 1481 C THR A 201 -13.363 38.205 -6.045 1.00 66.10 C ATOM 1482 O THR A 201 -13.991 37.542 -6.896 1.00 65.58 O ATOM 1483 CB THR A 201 -14.736 40.006 -4.834 1.00 65.68 C ATOM 1484 OG1 THR A 201 -14.545 40.550 -6.148 1.00 65.66 O ATOM 1485 CG2 THR A 201 -14.196 40.972 -3.783 1.00 65.96 C ATOM 1486 N GLU A 202 -12.091 38.576 -6.214 1.00 66.73 N ATOM 1487 CA GLU A 202 -11.362 38.219 -7.431 1.00 67.80 C ATOM 1488 C GLU A 202 -11.004 36.765 -7.373 1.00 68.66 C ATOM 1489 O GLU A 202 -11.062 36.059 -8.385 1.00 70.18 O ATOM 1490 CB GLU A 202 -10.095 39.031 -7.584 1.00 67.72 C ATOM 1491 CG GLU A 202 -10.303 40.210 -8.494 1.00 72.08 C ATOM 1492 CD GLU A 202 -11.396 41.136 -7.989 1.00 74.23 C ATOM 1493 OE1 GLU A 202 -11.155 41.809 -6.957 1.00 73.29 O ATOM 1494 OE2 GLU A 202 -12.488 41.178 -8.619 1.00 75.62 O ATOM 1495 N VAL A 203 -10.619 36.324 -6.180 1.00 67.55 N ATOM 1496 CA VAL A 203 -10.287 34.929 -5.968 1.00 65.53 C ATOM 1497 C VAL A 203 -11.555 34.211 -6.431 1.00 64.91 C ATOM 1498 O VAL A 203 -11.525 33.354 -7.323 1.00 64.87 O ATOM 1499 CB VAL A 203 -10.070 34.620 -4.453 1.00 65.75 C ATOM 1500 CG1 VAL A 203 -9.491 33.222 -4.279 1.00 65.81 C ATOM 1501 CG2 VAL A 203 -9.166 35.660 -3.816 1.00 66.78 C ATOM 1502 N ASP A 204 -12.674 34.608 -5.820 1.00 62.79 N ATOM 1503 CA ASP A 204 -13.977 34.030 -6.103 1.00 61.49 C ATOM 1504 C ASP A 204 -14.282 33.960 -7.571 1.00 62.37 C ATOM 1505 O ASP A 204 -14.775 32.946 -8.056 1.00 61.65 O ATOM 1506 CB ASP A 204 -15.070 34.836 -5.411 1.00 59.70 C ATOM 1507 CG ASP A 204 -15.247 34.446 -3.965 1.00 56.79 C ATOM 1508 OD1 ASP A 204 -14.278 33.900 -3.398 1.00 52.71 O ATOM 1509 OD2 ASP A 204 -16.342 34.694 -3.405 1.00 54.02 O ATOM 1510 N LYS A 205 -13.989 35.043 -8.280 1.00 64.38 N ATOM 1511 CA LYS A 205 -14.264 35.101 -9.707 1.00 65.17 C ATOM 1512 C LYS A 205 -13.389 34.196 -10.567 1.00 64.26 C ATOM 1513 O LYS A 205 -13.889 33.307 -11.239 1.00 63.15 O ATOM 1514 CB LYS A 205 -14.143 36.536 -10.208 1.00 
66.18 C ATOM 1515 CG LYS A 205 -14.584 36.687 -11.649 1.00 69.34 C ATOM 1516 CD LYS A 205 -14.486 38.132 -12.116 1.00 73.83 C ATOM 1517 CE LYS A 205 -15.071 38.343 -13.519 1.00 76.14 C ATOM 1518 NZ LYS A 205 -16.559 38.137 -13.597 1.00 77.46 N ATOM 1519 N LEU A 206 -12.083 34.413 -10.533 1.00 64.81 N ATOM 1520 CA LEU A 206 -11.153 33.640 -11.356 1.00 67.28 C ATOM 1521 C LEU A 206 -10.807 32.234 -10.887 1.00 68.71 C ATOM 1522 O LEU A 206 -10.696 31.311 -11.698 1.00 69.72 O ATOM 1523 CB LEU A 206 -9.840 34.408 -11.503 1.00 68.10 C ATOM 1524 CG LEU A 206 -9.968 35.909 -11.740 1.00 67.44 C ATOM 1525 CD1 LEU A 206 -8.586 36.565 -11.699 1.00 63.42 C ATOM 1526 CD2 LEU A 206 -10.678 36.126 -13.072 1.00 67.18 C ATOM 1527 N GLY A 207 -10.608 32.078 -9.583 1.00 69.83 N ATOM 1528 CA GLY A 207 -10.222 30.785 -9.053 1.00 70.69 C ATOM 1529 C GLY A 207 -8.722 30.866 -8.870 1.00 70.94 C ATOM 1530 O GLY A 207 -8.001 31.200 -9.805 1.00 70.79 O ATOM 1531 N ILE A 208 -8.259 30.574 -7.660 1.00 70.50 N ATOM 1532 CA ILE A 208 -6.844 30.654 -7.332 1.00 70.18 C ATOM 1533 C ILE A 208 -5.917 30.335 -8.513 1.00 72.13 C ATOM 1534 O ILE A 208 -4.913 31.025 -8.717 1.00 71.07 O ATOM 1535 CB ILE A 208 -6.513 29.751 -6.106 1.00 68.19 C ATOM 1536 CG1 ILE A 208 -5.824 30.574 -5.018 1.00 66.86 C ATOM 1537 CG2 ILE A 208 -5.685 28.558 -6.528 1.00 65.59 C ATOM 1538 CD1 ILE A 208 -4.600 31.335 -5.457 1.00 63.60 C ATOM 1539 N GLY A 209 -6.257 29.317 -9.301 1.00 74.55 N ATOM 1540 CA GLY A 209 -5.420 28.963 -10.440 1.00 78.13 C ATOM 1541 C GLY A 209 -5.107 30.133 -11.365 1.00 79.78 C ATOM 1542 O GLY A 209 -3.941 30.474 -11.573 1.00 79.81 O ATOM 1543 N LYS A 210 -6.161 30.737 -11.916 1.00 81.12 N ATOM 1544 CA LYS A 210 -6.073 31.890 -12.815 1.00 81.72 C ATOM 1545 C LYS A 210 -5.481 33.085 -12.060 1.00 81.92 C ATOM 1546 O LYS A 210 -4.704 33.864 -12.612 1.00 82.06 O ATOM 1547 CB LYS A 210 -7.480 32.223 -13.339 1.00 83.49 C ATOM 1548 CG LYS A 210 -7.630 33.543 -14.084 1.00 86.80 C ATOM 1549 CD LYS A 210 -6.745 
33.615 -15.328 1.00 89.98 C ATOM 1550 CE LYS A 210 -6.904 34.963 -16.055 1.00 90.59 C ATOM 1551 NZ LYS A 210 -5.832 35.226 -17.079 1.00 91.17 N ATOM 1552 N VAL A 211 -5.850 33.222 -10.792 1.00 81.88 N ATOM 1553 CA VAL A 211 -5.343 34.310 -9.969 1.00 81.43 C ATOM 1554 C VAL A 211 -3.816 34.290 -9.966 1.00 82.60 C ATOM 1555 O VAL A 211 -3.186 35.338 -10.093 1.00 82.44 O ATOM 1556 CB VAL A 211 -5.848 34.193 -8.512 1.00 80.75 C ATOM 1557 CG1 VAL A 211 -5.199 35.253 -7.643 1.00 79.53 C ATOM 1558 CG2 VAL A 211 -7.359 34.332 -8.472 1.00 80.56 C ATOM 1559 N MET A 212 -3.226 33.101 -9.817 1.00 83.85 N ATOM 1560 CA MET A 212 -1.767 32.958 -9.796 1.00 85.06 C ATOM 1561 C MET A 212 -1.206 33.105 -11.202 1.00 87.35 C ATOM 1562 O MET A 212 -0.223 33.813 -11.432 1.00 87.24 O ATOM 1563 CB MET A 212 -1.358 31.595 -9.230 1.00 82.29 C ATOM 1564 CG MET A 212 -1.708 31.386 -7.757 1.00 80.30 C ATOM 1565 SD MET A 212 -0.942 32.561 -6.600 1.00 73.97 S ATOM 1566 CE MET A 212 0.638 31.866 -6.471 1.00 75.37 C ATOM 1567 N GLU A 213 -1.846 32.416 -12.137 1.00 90.13 N ATOM 1568 CA GLU A 213 -1.472 32.447 -13.541 1.00 92.16 C ATOM 1569 C GLU A 213 -1.314 33.912 -13.969 1.00 92.77 C ATOM 1570 O GLU A 213 -0.294 34.293 -14.549 1.00 94.09 O ATOM 1571 CB GLU A 213 -2.571 31.752 -14.344 1.00 94.04 C ATOM 1572 CG GLU A 213 -2.244 31.404 -15.778 1.00 99.07 C ATOM 1573 CD GLU A 213 -3.352 30.571 -16.421 1.00102.30 C ATOM 1574 OE1 GLU A 213 -3.623 29.457 -15.914 1.00103.12 O ATOM 1575 OE2 GLU A 213 -3.952 31.026 -17.425 1.00104.92 O ATOM 1576 N GLU A 214 -2.317 34.733 -13.668 1.00 92.07 N ATOM 1577 CA GLU A 214 -2.267 36.148 -14.013 1.00 91.63 C ATOM 1578 C GLU A 214 -1.102 36.818 -13.305 1.00 91.32 C ATOM 1579 O GLU A 214 -0.086 37.111 -13.925 1.00 91.52 O ATOM 1580 CB GLU A 214 -3.567 36.853 -13.615 1.00 92.05 C ATOM 1581 CG GLU A 214 -4.652 36.808 -14.668 1.00 92.73 C ATOM 1582 CD GLU A 214 -5.872 37.625 -14.279 1.00 93.79 C ATOM 1583 OE1 GLU A 214 -5.727 38.851 -14.051 1.00 93.91 O ATOM 1584 OE2 GLU 
A 214 -6.973 37.037 -14.208 1.00 93.93 O ATOM 1585 N THR A 215 -1.263 37.053 -12.004 1.00 91.31 N ATOM 1586 CA THR A 215 -0.238 37.696 -11.177 1.00 90.84 C ATOM 1587 C THR A 215 1.180 37.460 -11.668 1.00 90.63 C ATOM 1588 O THR A 215 2.030 38.349 -11.575 1.00 89.71 O ATOM 1589 CB THR A 215 -0.297 37.213 -9.714 1.00 91.00 C ATOM 1590 OG1 THR A 215 -0.442 35.788 -9.687 1.00 91.70 O ATOM 1591 CG2 THR A 215 -1.443 37.843 -8.986 1.00 90.94 C ATOM 1592 N PHE A 216 1.444 36.256 -12.173 1.00 90.43 N ATOM 1593 CA PHE A 216 2.776 35.948 -12.666 1.00 90.26 C ATOM 1594 C PHE A 216 3.088 36.749 -13.916 1.00 91.62 C ATOM 1595 O PHE A 216 3.964 37.615 -13.887 1.00 92.03 O ATOM 1596 CB PHE A 216 2.932 34.447 -12.916 1.00 86.72 C ATOM 1597 CG PHE A 216 3.158 33.653 -11.659 1.00 83.81 C ATOM 1598 CD1 PHE A 216 3.357 34.296 -10.439 1.00 82.90 C ATOM 1599 CD2 PHE A 216 3.158 32.267 -11.687 1.00 82.83 C ATOM 1600 CE1 PHE A 216 3.527 33.571 -9.263 1.00 82.23 C ATOM 1601 CE2 PHE A 216 3.328 31.531 -10.513 1.00 82.56 C ATOM 1602 CZ PHE A 216 3.520 32.184 -9.299 1.00 81.79 C ATOM 1603 N SER A 217 2.380 36.483 -15.008 1.00 92.18 N ATOM 1604 CA SER A 217 2.625 37.240 -16.227 1.00 92.47 C ATOM 1605 C SER A 217 2.780 38.726 -15.865 1.00 92.71 C ATOM 1606 O SER A 217 3.797 39.340 -16.181 1.00 93.44 O ATOM 1607 CB SER A 217 1.465 37.069 -17.205 1.00 92.57 C ATOM 1608 OG SER A 217 0.345 37.830 -16.789 1.00 92.76 O ATOM 1609 N TYR A 218 1.781 39.290 -15.189 1.00 92.44 N ATOM 1610 CA TYR A 218 1.810 40.694 -14.787 1.00 93.21 C ATOM 1611 C TYR A 218 3.135 41.144 -14.148 1.00 94.61 C ATOM 1612 O TYR A 218 3.512 42.314 -14.246 1.00 93.90 O ATOM 1613 CB TYR A 218 0.666 40.981 -13.811 1.00 92.46 C ATOM 1614 CG TYR A 218 0.769 42.342 -13.149 1.00 92.79 C ATOM 1615 CD1 TYR A 218 0.179 43.472 -13.721 1.00 93.36 C ATOM 1616 CD2 TYR A 218 1.504 42.512 -11.975 1.00 91.95 C ATOM 1617 CE1 TYR A 218 0.324 44.743 -13.139 1.00 92.39 C ATOM 1618 CE2 TYR A 218 1.658 43.770 -11.387 1.00 92.17 C ATOM 1619 CZ TYR A 218 
1.067 44.886 -11.973 1.00 92.07 C ATOM 1620 OH TYR A 218 1.240 46.137 -11.403 1.00 91.03 O ATOM 1621 N LEU A 219 3.831 40.221 -13.490 1.00 96.79 N ATOM 1622 CA LEU A 219 5.098 40.537 -12.825 1.00 99.53 C ATOM 1623 C LEU A 219 6.348 40.001 -13.535 1.00102.10 C ATOM 1624 O LEU A 219 7.370 40.687 -13.624 1.00102.34 O ATOM 1625 CB LEU A 219 5.086 39.997 -11.389 1.00 98.47 C ATOM 1626 CG LEU A 219 4.218 40.679 -10.335 1.00 97.30 C ATOM 1627 CD1 LEU A 219 4.155 39.802 -9.108 1.00 97.75 C ATOM 1628 CD2 LEU A 219 4.787 42.041 -9.981 1.00 96.22 C ATOM 1629 N LEU A 220 6.263 38.772 -14.032 1.00104.81 N ATOM 1630 CA LEU A 220 7.391 38.133 -14.705 1.00107.29 C ATOM 1631 C LEU A 220 7.161 38.026 -16.216 1.00109.31 C ATOM 1632 O LEU A 220 7.805 37.223 -16.894 1.00109.84 O ATOM 1633 CB LEU A 220 7.614 36.731 -14.110 1.00106.72 C ATOM 1634 CG LEU A 220 7.339 36.524 -12.609 1.00105.67 C ATOM 1635 CD1 LEU A 220 7.545 35.055 -12.248 1.00104.74 C ATOM 1636 CD2 LEU A 220 8.241 37.424 -11.769 1.00104.50 C ATOM 1637 N GLY A 221 6.242 38.836 -16.738 1.00111.26 N ATOM 1638 CA GLY A 221 5.947 38.801 -18.161 1.00113.38 C ATOM 1639 C GLY A 221 7.136 39.205 -19.008 1.00114.74 C ATOM 1640 O GLY A 221 7.529 38.487 -19.928 1.00114.30 O ATOM 1641 N ARG A 222 7.709 40.361 -18.692 1.00116.28 N ATOM 1642 CA ARG A 222 8.864 40.875 -19.414 1.00117.84 C ATOM 1643 C ARG A 222 10.164 40.292 -18.855 1.00117.49 C ATOM 1644 O ARG A 222 10.689 39.293 -19.358 1.00117.04 O ATOM 1645 CB ARG A 222 8.887 42.406 -19.323 1.00120.30 C ATOM 1646 CG ARG A 222 7.796 43.093 -20.148 1.00124.08 C ATOM 1647 CD ARG A 222 7.560 44.553 -19.730 1.00127.28 C ATOM 1648 NE ARG A 222 6.905 44.666 -18.423 1.00129.65 N ATOM 1649 CZ ARG A 222 6.365 45.785 -17.939 1.00130.39 C ATOM 1650 NH1 ARG A 222 6.397 46.906 -18.651 1.00131.75 N ATOM 1651 NH2 ARG A 222 5.786 45.786 -16.743 1.00130.10 N ATOM 1652 N LYS A 223 10.681 40.917 -17.805 1.00117.21 N ATOM 1653 CA LYS A 223 11.917 40.456 -17.187 1.00116.89 C ATOM 1654 C LYS A 223 11.651 
39.571 -15.969 1.00115.74 C ATOM 1655 O LYS A 223 11.094 40.024 -14.967 1.00116.07 O ATOM 1656 CB LYS A 223 12.775 41.655 -16.755 1.00118.12 C ATOM 1657 CG LYS A 223 13.327 42.530 -17.885 1.00119.48 C ATOM 1658 CD LYS A 223 14.481 41.865 -18.637 1.00120.20 C ATOM 1659 CE LYS A 223 15.289 42.887 -19.446 1.00119.96 C ATOM 1660 NZ LYS A 223 14.485 43.612 -20.473 1.00120.32 N ATOM 1661 N LYS A 224 12.047 38.307 -16.064 1.00113.56 N ATOM 1662 CA LYS A 224 11.892 37.378 -14.951 1.00110.71 C ATOM 1663 C LYS A 224 12.793 37.876 -13.819 1.00108.17 C ATOM 1664 O LYS A 224 14.006 37.988 -13.994 1.00107.92 O ATOM 1665 CB LYS A 224 12.333 35.968 -15.363 1.00111.33 C ATOM 1666 CG LYS A 224 11.350 35.221 -16.242 1.00111.67 C ATOM 1667 CD LYS A 224 10.994 36.002 -17.491 1.00113.19 C ATOM 1668 CE LYS A 224 10.010 35.226 -18.354 1.00114.15 C ATOM 1669 NZ LYS A 224 9.515 36.039 -19.504 1.00114.36 N ATOM 1670 N ARG A 225 12.208 38.191 -12.668 1.00104.97 N ATOM 1671 CA ARG A 225 12.998 38.661 -11.531 1.00102.05 C ATOM 1672 C ARG A 225 12.814 37.760 -10.306 1.00 99.65 C ATOM 1673 O ARG A 225 12.153 36.720 -10.381 1.00100.15 O ATOM 1674 CB ARG A 225 12.637 40.114 -11.169 1.00101.03 C ATOM 1675 CG ARG A 225 11.188 40.374 -10.782 1.00 99.54 C ATOM 1676 CD ARG A 225 10.295 40.355 -12.005 1.00 99.50 C ATOM 1677 NE ARG A 225 9.100 41.185 -11.851 1.00 98.90 N ATOM 1678 CZ ARG A 225 9.122 42.495 -11.614 1.00 99.29 C ATOM 1679 NH1 ARG A 225 10.278 43.136 -11.492 1.00 99.63 N ATOM 1680 NH2 ARG A 225 7.986 43.171 -11.518 1.00 99.36 N ATOM 1681 N PRO A 226 13.446 38.118 -9.177 1.00 96.61 N ATOM 1682 CA PRO A 226 13.282 37.282 -7.984 1.00 93.91 C ATOM 1683 C PRO A 226 11.969 37.635 -7.301 1.00 90.68 C ATOM 1684 O PRO A 226 11.476 38.760 -7.434 1.00 90.41 O ATOM 1685 CB PRO A 226 14.489 37.654 -7.130 1.00 94.90 C ATOM 1686 CG PRO A 226 15.508 38.067 -8.157 1.00 96.02 C ATOM 1687 CD PRO A 226 14.679 38.914 -9.076 1.00 95.93 C ATOM 1688 N ILE A 227 11.409 36.666 -6.580 1.00 86.58 N ATOM 1689 CA ILE A 
227 10.142 36.848 -5.872 1.00 81.66 C ATOM 1690 C ILE A 227 10.332 36.979 -4.367 1.00 78.57 C ATOM 1691 O ILE A 227 11.235 36.372 -3.774 1.00 78.91 O ATOM 1692 CB ILE A 227 9.152 35.647 -6.119 1.00 81.09 C ATOM 1693 CG1 ILE A 227 8.302 35.897 -7.367 1.00 79.71 C ATOM 1694 CG2 ILE A 227 8.243 35.436 -4.902 1.00 79.22 C ATOM 1695 CD1 ILE A 227 9.024 35.678 -8.673 1.00 79.96 C ATOM 1696 N HIS A 228 9.481 37.793 -3.761 1.00 73.48 N ATOM 1697 CA HIS A 228 9.500 37.948 -2.331 1.00 69.84 C ATOM 1698 C HIS A 228 8.062 37.720 -1.886 1.00 67.82 C ATOM 1699 O HIS A 228 7.209 38.610 -1.983 1.00 66.60 O ATOM 1700 CB HIS A 228 9.940 39.338 -1.915 1.00 70.56 C ATOM 1701 CG HIS A 228 10.037 39.499 -0.432 1.00 73.54 C ATOM 1702 ND1 HIS A 228 11.058 38.940 0.312 1.00 75.85 N ATOM 1703 CD2 HIS A 228 9.211 40.096 0.459 1.00 74.60 C ATOM 1704 CE1 HIS A 228 10.856 39.185 1.594 1.00 75.11 C ATOM 1705 NE2 HIS A 228 9.740 39.884 1.712 1.00 76.27 N ATOM 1706 N LEU A 229 7.784 36.509 -1.428 1.00 65.08 N ATOM 1707 CA LEU A 229 6.450 36.177 -0.963 1.00 62.32 C ATOM 1708 C LEU A 229 6.312 36.516 0.518 1.00 61.53 C ATOM 1709 O LEU A 229 6.847 35.814 1.378 1.00 62.14 O ATOM 1710 CB LEU A 229 6.175 34.691 -1.152 1.00 59.30 C ATOM 1711 CG LEU A 229 4.861 34.239 -0.512 1.00 57.40 C ATOM 1712 CD1 LEU A 229 3.687 34.835 -1.274 1.00 54.44 C ATOM 1713 CD2 LEU A 229 4.810 32.730 -0.508 1.00 56.40 C ATOM 1714 N SER A 230 5.612 37.600 0.818 1.00 59.36 N ATOM 1715 CA SER A 230 5.401 37.985 2.203 1.00 58.06 C ATOM 1716 C SER A 230 4.041 37.362 2.534 1.00 57.34 C ATOM 1717 O SER A 230 2.983 37.879 2.151 1.00 57.15 O ATOM 1718 CB SER A 230 5.363 39.518 2.334 1.00 58.58 C ATOM 1719 OG SER A 230 5.725 39.967 3.634 1.00 57.12 O ATOM 1720 N PHE A 231 4.070 36.228 3.222 1.00 55.66 N ATOM 1721 CA PHE A 231 2.831 35.547 3.565 1.00 54.53 C ATOM 1722 C PHE A 231 2.368 35.851 4.973 1.00 53.57 C ATOM 1723 O PHE A 231 3.101 35.671 5.947 1.00 52.54 O ATOM 1724 CB PHE A 231 2.981 34.034 3.403 1.00 52.60 C ATOM 1725 CG PHE A 
231 1.687 33.330 3.169 1.00 46.80 C ATOM 1726 CD1 PHE A 231 0.620 33.522 4.025 1.00 44.29 C ATOM 1727 CD2 PHE A 231 1.549 32.461 2.094 1.00 46.49 C ATOM 1728 CE1 PHE A 231 -0.555 32.862 3.824 1.00 44.81 C ATOM 1729 CE2 PHE A 231 0.377 31.787 1.876 1.00 46.31 C ATOM 1730 CZ PHE A 231 -0.687 31.987 2.745 1.00 47.96 C ATOM 1731 N ASP A 232 1.122 36.294 5.064 1.00 54.56 N ATOM 1732 CA ASP A 232 0.522 36.642 6.340 1.00 54.94 C ATOM 1733 C ASP A 232 -0.489 35.610 6.764 1.00 54.45 C ATOM 1734 O ASP A 232 -1.644 35.688 6.386 1.00 53.16 O ATOM 1735 CB ASP A 232 -0.160 37.998 6.250 1.00 53.83 C ATOM 1736 CG ASP A 232 -0.888 38.357 7.507 1.00 54.18 C ATOM 1737 OD1 ASP A 232 -0.905 37.546 8.459 1.00 51.22 O ATOM 1738 OD2 ASP A 232 -1.464 39.462 7.541 1.00 54.39 O ATOM 1739 N VAL A 233 -0.036 34.658 7.569 1.00 55.37 N ATOM 1740 CA VAL A 233 -0.863 33.588 8.093 1.00 56.07 C ATOM 1741 C VAL A 233 -2.342 33.943 8.253 1.00 56.17 C ATOM 1742 O VAL A 233 -3.197 33.057 8.238 1.00 57.17 O ATOM 1743 CB VAL A 233 -0.348 33.147 9.452 1.00 58.18 C ATOM 1744 CG1 VAL A 233 -0.651 34.238 10.495 1.00 57.95 C ATOM 1745 CG2 VAL A 233 -0.986 31.836 9.835 1.00 59.10 C ATOM 1746 N ASP A 234 -2.657 35.226 8.420 1.00 55.85 N ATOM 1747 CA ASP A 234 -4.063 35.628 8.557 1.00 55.47 C ATOM 1748 C ASP A 234 -4.843 35.763 7.236 1.00 52.10 C ATOM 1749 O ASP A 234 -6.055 35.812 7.247 1.00 52.05 O ATOM 1750 CB ASP A 234 -4.183 36.937 9.370 1.00 59.03 C ATOM 1751 CG ASP A 234 -3.841 38.191 8.563 1.00 63.96 C ATOM 1752 OD1 ASP A 234 -4.294 38.323 7.392 1.00 65.02 O ATOM 1753 OD2 ASP A 234 -3.136 39.067 9.126 1.00 66.85 O ATOM 1754 N GLY A 235 -4.148 35.832 6.109 1.00 50.06 N ATOM 1755 CA GLY A 235 -4.876 35.746 4.823 1.00 49.84 C ATOM 1756 C GLY A 235 -5.652 34.477 4.460 1.00 49.59 C ATOM 1757 O GLY A 235 -6.279 34.383 3.392 1.00 47.09 O ATOM 1758 N LEU A 236 -5.575 33.478 5.331 1.00 50.09 N ATOM 1759 CA LEU A 236 -6.306 32.249 5.122 1.00 50.76 C ATOM 1760 C LEU A 236 -7.566 32.466 5.926 1.00 51.57 C ATOM 1761 O 
LEU A 236 -7.662 33.445 6.670 1.00 52.88 O ATOM 1762 CB LEU A 236 -5.522 31.073 5.678 1.00 52.43 C ATOM 1763 CG LEU A 236 -4.915 30.163 4.613 1.00 54.89 C ATOM 1764 CD1 LEU A 236 -4.292 31.006 3.512 1.00 55.00 C ATOM 1765 CD2 LEU A 236 -3.884 29.252 5.259 1.00 55.89 C ATOM 1766 N ASP A 237 -8.548 31.589 5.794 1.00 52.93 N ATOM 1767 CA ASP A 237 -9.758 31.791 6.582 1.00 57.18 C ATOM 1768 C ASP A 237 -9.499 31.487 8.068 1.00 58.88 C ATOM 1769 O ASP A 237 -8.649 30.651 8.401 1.00 59.95 O ATOM 1770 CB ASP A 237 -10.897 30.922 6.071 1.00 58.63 C ATOM 1771 CG ASP A 237 -12.202 31.231 6.770 1.00 60.03 C ATOM 1772 OD1 ASP A 237 -12.355 30.816 7.944 1.00 60.59 O ATOM 1773 OD2 ASP A 237 -13.052 31.916 6.154 1.00 60.31 O ATOM 1774 N PRO A 238 -10.223 32.165 8.983 1.00 57.68 N ATOM 1775 CA PRO A 238 -9.992 31.897 10.402 1.00 56.57 C ATOM 1776 C PRO A 238 -10.134 30.441 10.850 1.00 56.08 C ATOM 1777 O PRO A 238 -9.540 30.041 11.858 1.00 56.48 O ATOM 1778 CB PRO A 238 -10.977 32.841 11.083 1.00 57.23 C ATOM 1779 CG PRO A 238 -10.912 34.037 10.192 1.00 57.07 C ATOM 1780 CD PRO A 238 -11.054 33.372 8.828 1.00 57.93 C ATOM 1781 N VAL A 239 -10.895 29.643 10.103 1.00 53.68 N ATOM 1782 CA VAL A 239 -11.077 28.233 10.452 1.00 52.34 C ATOM 1783 C VAL A 239 -9.754 27.459 10.296 1.00 52.10 C ATOM 1784 O VAL A 239 -9.516 26.447 10.960 1.00 50.40 O ATOM 1785 CB VAL A 239 -12.145 27.579 9.545 1.00 51.91 C ATOM 1786 CG1 VAL A 239 -11.492 26.911 8.356 1.00 51.39 C ATOM 1787 CG2 VAL A 239 -12.960 26.587 10.335 1.00 51.86 C ATOM 1788 N PHE A 240 -8.892 27.966 9.423 1.00 52.61 N ATOM 1789 CA PHE A 240 -7.615 27.331 9.133 1.00 51.13 C ATOM 1790 C PHE A 240 -6.452 27.840 9.963 1.00 50.55 C ATOM 1791 O PHE A 240 -5.648 27.050 10.443 1.00 52.43 O ATOM 1792 CB PHE A 240 -7.293 27.504 7.645 1.00 49.52 C ATOM 1793 CG PHE A 240 -8.305 26.865 6.717 1.00 46.38 C ATOM 1794 CD1 PHE A 240 -8.323 25.483 6.533 1.00 43.52 C ATOM 1795 CD2 PHE A 240 -9.216 27.647 6.012 1.00 44.12 C ATOM 1796 CE1 PHE A 240 -9.207 
24.900 5.664 1.00 40.93 C ATOM 1797 CE2 PHE A 240 -10.108 27.063 5.138 1.00 41.80 C ATOM 1798 CZ PHE A 240 -10.105 25.688 4.964 1.00 40.90 C ATOM 1799 N THR A 241 -6.361 29.150 10.134 1.00 49.79 N ATOM 1800 CA THR A 241 -5.267 29.742 10.890 1.00 50.83 C ATOM 1801 C THR A 241 -5.824 30.632 11.974 1.00 52.55 C ATOM 1802 O THR A 241 -5.576 31.840 11.988 1.00 51.99 O ATOM 1803 CB THR A 241 -4.391 30.608 9.988 1.00 51.65 C ATOM 1804 OG1 THR A 241 -5.214 31.587 9.324 1.00 53.02 O ATOM 1805 CG2 THR A 241 -3.681 29.744 8.961 1.00 50.87 C ATOM 1806 N PRO A 242 -6.594 30.052 12.899 1.00 54.97 N ATOM 1807 CA PRO A 242 -7.200 30.812 13.998 1.00 57.22 C ATOM 1808 C PRO A 242 -6.294 31.627 14.928 1.00 56.87 C ATOM 1809 O PRO A 242 -6.674 32.720 15.333 1.00 57.12 O ATOM 1810 CB PRO A 242 -8.027 29.751 14.736 1.00 57.28 C ATOM 1811 CG PRO A 242 -7.408 28.439 14.312 1.00 57.06 C ATOM 1812 CD PRO A 242 -7.114 28.676 12.868 1.00 56.17 C ATOM 1813 N ALA A 243 -5.113 31.110 15.263 1.00 58.31 N ATOM 1814 CA ALA A 243 -4.190 31.822 16.156 1.00 58.19 C ATOM 1815 C ALA A 243 -3.535 33.043 15.514 1.00 58.24 C ATOM 1816 O ALA A 243 -2.363 32.982 15.136 1.00 57.68 O ATOM 1817 CB ALA A 243 -3.103 30.864 16.666 1.00 56.37 C ATOM 1818 N THR A 244 -4.287 34.141 15.404 1.00 59.71 N ATOM 1819 CA THR A 244 -3.787 35.390 14.807 1.00 62.49 C ATOM 1820 C THR A 244 -4.247 36.642 15.561 1.00 64.13 C ATOM 1821 O THR A 244 -5.001 36.564 16.534 1.00 66.01 O ATOM 1822 CB THR A 244 -4.264 35.586 13.343 1.00 60.93 C ATOM 1823 OG1 THR A 244 -4.306 34.328 12.665 1.00 62.57 O ATOM 1824 CG2 THR A 244 -3.297 36.495 12.600 1.00 62.79 C ATOM 1825 N GLY A 245 -3.779 37.799 15.094 1.00 65.60 N ATOM 1826 CA GLY A 245 -4.167 39.064 15.688 1.00 65.27 C ATOM 1827 C GLY A 245 -5.541 39.394 15.154 1.00 65.53 C ATOM 1828 O GLY A 245 -6.518 39.319 15.883 1.00 65.18 O ATOM 1829 N THR A 246 -5.626 39.725 13.870 1.00 67.51 N ATOM 1830 CA THR A 246 -6.916 40.048 13.268 1.00 69.03 C ATOM 1831 C THR A 246 -7.500 38.944 12.405 1.00 
70.69 C ATOM 1832 O THR A 246 -6.942 38.642 11.349 1.00 72.99 O ATOM 1833 CB THR A 246 -6.865 41.219 12.287 1.00 68.81 C ATOM 1834 OG1 THR A 246 -5.709 42.029 12.523 1.00 70.16 O ATOM 1835 CG2 THR A 246 -8.156 42.024 12.391 1.00 67.59 C ATOM 1836 N PRO A 247 -8.605 38.303 12.838 1.00 69.84 N ATOM 1837 CA PRO A 247 -9.103 37.282 11.913 1.00 67.28 C ATOM 1838 C PRO A 247 -10.160 38.016 11.050 1.00 65.99 C ATOM 1839 O PRO A 247 -10.721 39.037 11.478 1.00 64.86 O ATOM 1840 CB PRO A 247 -9.698 36.226 12.847 1.00 66.52 C ATOM 1841 CG PRO A 247 -10.233 37.044 13.965 1.00 67.42 C ATOM 1842 CD PRO A 247 -9.140 38.085 14.199 1.00 69.14 C ATOM 1843 N VAL A 248 -10.396 37.535 9.831 1.00 64.07 N ATOM 1844 CA VAL A 248 -11.391 38.147 8.939 1.00 60.46 C ATOM 1845 C VAL A 248 -12.065 37.005 8.146 1.00 58.21 C ATOM 1846 O VAL A 248 -11.449 36.399 7.273 1.00 58.79 O ATOM 1847 CB VAL A 248 -10.718 39.175 7.979 1.00 59.13 C ATOM 1848 CG1 VAL A 248 -11.756 39.918 7.216 1.00 59.22 C ATOM 1849 CG2 VAL A 248 -9.914 40.178 8.764 1.00 60.23 C ATOM 1850 N VAL A 249 -13.321 36.697 8.457 1.00 54.05 N ATOM 1851 CA VAL A 249 -13.998 35.592 7.779 1.00 51.52 C ATOM 1852 C VAL A 249 -13.996 35.692 6.269 1.00 49.68 C ATOM 1853 O VAL A 249 -13.816 36.775 5.715 1.00 50.93 O ATOM 1854 CB VAL A 249 -15.445 35.489 8.209 1.00 50.57 C ATOM 1855 CG1 VAL A 249 -15.511 35.113 9.657 1.00 50.72 C ATOM 1856 CG2 VAL A 249 -16.138 36.803 7.956 1.00 48.35 C ATOM 1857 N GLY A 250 -14.209 34.562 5.604 1.00 45.28 N ATOM 1858 CA GLY A 250 -14.256 34.567 4.153 1.00 44.16 C ATOM 1859 C GLY A 250 -12.943 34.831 3.427 1.00 45.03 C ATOM 1860 O GLY A 250 -12.876 35.650 2.491 1.00 43.44 O ATOM 1861 N GLY A 251 -11.906 34.112 3.859 1.00 45.14 N ATOM 1862 CA GLY A 251 -10.591 34.236 3.268 1.00 44.15 C ATOM 1863 C GLY A 251 -10.099 32.957 2.610 1.00 44.72 C ATOM 1864 O GLY A 251 -10.820 31.956 2.525 1.00 45.93 O ATOM 1865 N LEU A 252 -8.859 33.012 2.135 1.00 43.77 N ATOM 1866 CA LEU A 252 -8.196 31.911 1.462 1.00 44.97 C ATOM 
1867 C LEU A 252 -8.312 30.572 2.190 1.00 47.83 C ATOM 1868 O LEU A 252 -8.208 30.488 3.431 1.00 51.38 O ATOM 1869 CB LEU A 252 -6.719 32.272 1.271 1.00 43.84 C ATOM 1870 CG LEU A 252 -6.527 33.479 0.355 1.00 43.55 C ATOM 1871 CD1 LEU A 252 -5.193 34.166 0.599 1.00 41.47 C ATOM 1872 CD2 LEU A 252 -6.668 33.014 -1.076 1.00 41.53 C ATOM 1873 N SER A 253 -8.512 29.509 1.426 1.00 46.62 N ATOM 1874 CA SER A 253 -8.615 28.197 2.034 1.00 46.70 C ATOM 1875 C SER A 253 -7.223 27.599 2.160 1.00 47.40 C ATOM 1876 O SER A 253 -6.295 28.043 1.482 1.00 48.44 O ATOM 1877 CB SER A 253 -9.459 27.323 1.148 1.00 45.90 C ATOM 1878 OG SER A 253 -9.056 27.527 -0.191 1.00 46.47 O ATOM 1879 N TYR A 254 -7.077 26.603 3.031 1.00 47.39 N ATOM 1880 CA TYR A 254 -5.799 25.926 3.211 1.00 47.44 C ATOM 1881 C TYR A 254 -5.393 25.585 1.790 1.00 49.11 C ATOM 1882 O TYR A 254 -4.339 26.021 1.326 1.00 49.44 O ATOM 1883 CB TYR A 254 -5.987 24.647 4.028 1.00 46.22 C ATOM 1884 CG TYR A 254 -4.726 23.843 4.337 1.00 47.21 C ATOM 1885 CD1 TYR A 254 -3.759 24.322 5.216 1.00 48.90 C ATOM 1886 CD2 TYR A 254 -4.556 22.556 3.828 1.00 47.33 C ATOM 1887 CE1 TYR A 254 -2.656 23.535 5.593 1.00 49.41 C ATOM 1888 CE2 TYR A 254 -3.467 21.758 4.190 1.00 49.47 C ATOM 1889 CZ TYR A 254 -2.519 22.250 5.077 1.00 51.72 C ATOM 1890 OH TYR A 254 -1.459 21.442 5.467 1.00 53.13 O ATOM 1891 N ARG A 255 -6.256 24.836 1.096 1.00 49.81 N ATOM 1892 CA ARG A 255 -6.033 24.439 -0.299 1.00 50.99 C ATOM 1893 C ARG A 255 -5.559 25.608 -1.174 1.00 52.53 C ATOM 1894 O ARG A 255 -4.471 25.551 -1.767 1.00 53.55 O ATOM 1895 CB ARG A 255 -7.321 23.851 -0.894 1.00 51.35 C ATOM 1896 CG ARG A 255 -7.507 22.365 -0.629 1.00 54.18 C ATOM 1897 CD ARG A 255 -8.900 21.851 -0.994 1.00 55.20 C ATOM 1898 NE ARG A 255 -9.164 21.927 -2.429 1.00 59.63 N ATOM 1899 CZ ARG A 255 -10.322 21.587 -3.002 1.00 62.22 C ATOM 1900 NH1 ARG A 255 -11.333 21.140 -2.253 1.00 63.52 N ATOM 1901 NH2 ARG A 255 -10.478 21.704 -4.322 1.00 60.66 N ATOM 1902 N GLU A 256 -6.371 
26.659 -1.262 1.00 52.25 N ATOM 1903 CA GLU A 256 -5.995 27.820 -2.056 1.00 54.48 C ATOM 1904 C GLU A 256 -4.634 28.384 -1.630 1.00 55.70 C ATOM 1905 O GLU A 256 -3.847 28.835 -2.469 1.00 56.74 O ATOM 1906 CB GLU A 256 -7.040 28.926 -1.931 1.00 55.00 C ATOM 1907 CG GLU A 256 -8.374 28.616 -2.564 1.00 55.21 C ATOM 1908 CD GLU A 256 -9.322 29.777 -2.450 1.00 56.50 C ATOM 1909 OE1 GLU A 256 -9.391 30.358 -1.341 1.00 57.48 O ATOM 1910 OE2 GLU A 256 -9.995 30.103 -3.459 1.00 57.60 O ATOM 1911 N GLY A 257 -4.366 28.365 -0.327 1.00 55.23 N ATOM 1912 CA GLY A 257 -3.106 28.885 0.176 1.00 54.94 C ATOM 1913 C GLY A 257 -1.943 27.990 -0.184 1.00 54.60 C ATOM 1914 O GLY A 257 -0.830 28.448 -0.460 1.00 52.82 O ATOM 1915 N LEU A 258 -2.204 26.693 -0.154 1.00 54.63 N ATOM 1916 CA LEU A 258 -1.184 25.749 -0.509 1.00 55.19 C ATOM 1917 C LEU A 258 -1.045 25.800 -2.014 1.00 56.88 C ATOM 1918 O LEU A 258 0.022 25.496 -2.551 1.00 58.61 O ATOM 1919 CB LEU A 258 -1.548 24.341 -0.048 1.00 53.48 C ATOM 1920 CG LEU A 258 -1.260 24.113 1.433 1.00 51.43 C ATOM 1921 CD1 LEU A 258 -1.343 22.639 1.771 1.00 50.84 C ATOM 1922 CD2 LEU A 258 0.129 24.614 1.732 1.00 52.56 C ATOM 1923 N TYR A 259 -2.098 26.201 -2.717 1.00 56.63 N ATOM 1924 CA TYR A 259 -1.946 26.265 -4.157 1.00 58.61 C ATOM 1925 C TYR A 259 -1.007 27.400 -4.533 1.00 59.71 C ATOM 1926 O TYR A 259 -0.194 27.268 -5.453 1.00 61.48 O ATOM 1927 CB TYR A 259 -3.260 26.481 -4.871 1.00 58.79 C ATOM 1928 CG TYR A 259 -3.096 26.308 -6.365 1.00 59.56 C ATOM 1929 CD1 TYR A 259 -3.250 25.047 -6.958 1.00 58.31 C ATOM 1930 CD2 TYR A 259 -2.808 27.409 -7.190 1.00 57.95 C ATOM 1931 CE1 TYR A 259 -3.135 24.888 -8.339 1.00 60.55 C ATOM 1932 CE2 TYR A 259 -2.689 27.268 -8.568 1.00 58.97 C ATOM 1933 CZ TYR A 259 -2.860 26.006 -9.140 1.00 61.35 C ATOM 1934 OH TYR A 259 -2.800 25.859 -10.506 1.00 61.49 O ATOM 1935 N ILE A 260 -1.124 28.518 -3.823 1.00 59.36 N ATOM 1936 CA ILE A 260 -0.268 29.665 -4.081 1.00 59.47 C ATOM 1937 C ILE A 260 1.206 29.284 -3.906 
1.00 59.59 C ATOM 1938 O ILE A 260 1.950 29.221 -4.884 1.00 57.54 O ATOM 1939 CB ILE A 260 -0.639 30.842 -3.149 1.00 60.43 C ATOM 1940 CG1 ILE A 260 -2.014 31.402 -3.544 1.00 60.99 C ATOM 1941 CG2 ILE A 260 0.408 31.934 -3.232 1.00 61.05 C ATOM 1942 CD1 ILE A 260 -2.496 32.527 -2.661 1.00 60.66 C ATOM 1943 N THR A 261 1.630 29.018 -2.672 1.00 61.59 N ATOM 1944 CA THR A 261 3.026 28.639 -2.421 1.00 63.51 C ATOM 1945 C THR A 261 3.491 27.573 -3.428 1.00 67.33 C ATOM 1946 O THR A 261 4.677 27.526 -3.797 1.00 68.11 O ATOM 1947 CB THR A 261 3.219 28.056 -1.011 1.00 60.27 C ATOM 1948 OG1 THR A 261 2.313 26.970 -0.842 1.00 57.46 O ATOM 1949 CG2 THR A 261 2.976 29.100 0.071 1.00 57.45 C ATOM 1950 N GLU A 262 2.559 26.717 -3.857 1.00 69.10 N ATOM 1951 CA GLU A 262 2.869 25.664 -4.825 1.00 71.64 C ATOM 1952 C GLU A 262 3.235 26.208 -6.211 1.00 74.49 C ATOM 1953 O GLU A 262 4.229 25.784 -6.805 1.00 73.82 O ATOM 1954 CB GLU A 262 1.701 24.685 -4.951 1.00 70.03 C ATOM 1955 CG GLU A 262 1.614 23.697 -3.809 1.00 66.23 C ATOM 1956 CD GLU A 262 0.732 22.508 -4.137 1.00 64.16 C ATOM 1957 OE1 GLU A 262 -0.269 22.686 -4.872 1.00 61.75 O ATOM 1958 OE2 GLU A 262 1.037 21.400 -3.645 1.00 61.96 O ATOM 1959 N GLU A 263 2.431 27.132 -6.732 1.00 77.10 N ATOM 1960 CA GLU A 263 2.726 27.732 -8.030 1.00 80.64 C ATOM 1961 C GLU A 263 3.936 28.663 -7.914 1.00 82.90 C ATOM 1962 O GLU A 263 4.634 28.920 -8.900 1.00 83.65 O ATOM 1963 CB GLU A 263 1.526 28.512 -8.546 1.00 81.10 C ATOM 1964 CG GLU A 263 0.500 27.618 -9.180 1.00 85.33 C ATOM 1965 CD GLU A 263 1.108 26.753 -10.278 1.00 87.53 C ATOM 1966 OE1 GLU A 263 1.628 27.336 -11.255 1.00 89.71 O ATOM 1967 OE2 GLU A 263 1.071 25.502 -10.164 1.00 87.81 O ATOM 1968 N ILE A 264 4.171 29.165 -6.705 1.00 83.91 N ATOM 1969 CA ILE A 264 5.305 30.034 -6.437 1.00 86.02 C ATOM 1970 C ILE A 264 6.570 29.202 -6.627 1.00 87.43 C ATOM 1971 O ILE A 264 7.491 29.568 -7.370 1.00 86.72 O ATOM 1972 CB ILE A 264 5.268 30.567 -4.974 1.00 86.67 C ATOM 1973 CG1 ILE A 264 
4.251 31.702 -4.849 1.00 87.39 C ATOM 1974 CG2 ILE A 264 6.664 31.006 -4.531 1.00 86.51 C ATOM 1975 CD1 ILE A 264 4.502 32.885 -5.768 1.00 86.98 C ATOM 1976 N TYR A 265 6.593 28.071 -5.933 1.00 88.88 N ATOM 1977 CA TYR A 265 7.713 27.153 -5.989 1.00 89.33 C ATOM 1978 C TYR A 265 7.995 26.706 -7.423 1.00 89.27 C ATOM 1979 O TYR A 265 9.127 26.772 -7.890 1.00 89.51 O ATOM 1980 CB TYR A 265 7.423 25.932 -5.117 1.00 89.38 C ATOM 1981 CG TYR A 265 8.366 24.809 -5.387 1.00 89.34 C ATOM 1982 CD1 TYR A 265 9.695 24.896 -5.002 1.00 90.14 C ATOM 1983 CD2 TYR A 265 7.956 23.698 -6.121 1.00 90.59 C ATOM 1984 CE1 TYR A 265 10.611 23.906 -5.349 1.00 92.33 C ATOM 1985 CE2 TYR A 265 8.859 22.699 -6.475 1.00 92.63 C ATOM 1986 CZ TYR A 265 10.190 22.810 -6.088 1.00 92.95 C ATOM 1987 OH TYR A 265 11.096 21.835 -6.454 1.00 93.11 O ATOM 1988 N LYS A 266 6.956 26.254 -8.116 1.00 89.51 N ATOM 1989 CA LYS A 266 7.077 25.781 -9.491 1.00 90.13 C ATOM 1990 C LYS A 266 7.669 26.789 -10.482 1.00 89.83 C ATOM 1991 O LYS A 266 7.777 26.502 -11.676 1.00 89.92 O ATOM 1992 CB LYS A 266 5.706 25.305 -9.989 1.00 91.98 C ATOM 1993 CG LYS A 266 5.231 24.003 -9.345 1.00 93.52 C ATOM 1994 CD LYS A 266 3.758 23.699 -9.637 1.00 94.83 C ATOM 1995 CE LYS A 266 3.479 23.466 -11.124 1.00 95.81 C ATOM 1996 NZ LYS A 266 3.436 24.720 -11.944 1.00 96.25 N ATOM 1997 N THR A 267 8.059 27.963 -9.998 1.00 89.07 N ATOM 1998 CA THR A 267 8.639 28.978 -10.874 1.00 87.97 C ATOM 1999 C THR A 267 10.108 29.224 -10.527 1.00 88.38 C ATOM 2000 O THR A 267 10.786 30.045 -11.151 1.00 88.04 O ATOM 2001 CB THR A 267 7.875 30.291 -10.758 1.00 86.90 C ATOM 2002 OG1 THR A 267 8.122 30.872 -9.470 1.00 86.88 O ATOM 2003 CG2 THR A 267 6.390 30.034 -10.937 1.00 84.42 C ATOM 2004 N GLY A 268 10.581 28.501 -9.519 1.00 88.84 N ATOM 2005 CA GLY A 268 11.958 28.617 -9.082 1.00 88.87 C ATOM 2006 C GLY A 268 12.450 30.028 -8.819 1.00 89.07 C ATOM 2007 O GLY A 268 13.644 30.237 -8.603 1.00 90.27 O ATOM 2008 N LEU A 269 11.550 31.001 -8.810 1.00 
87.70 N ATOM 2009 CA LEU A 269 11.970 32.374 -8.586 1.00 87.22 C ATOM 2010 C LEU A 269 11.823 32.887 -7.162 1.00 86.80 C ATOM 2011 O LEU A 269 11.817 34.102 -6.941 1.00 84.64 O ATOM 2012 CB LEU A 269 11.209 33.302 -9.524 1.00 88.82 C ATOM 2013 CG LEU A 269 11.463 33.107 -11.014 1.00 89.57 C ATOM 2014 CD1 LEU A 269 10.770 34.232 -11.768 1.00 89.71 C ATOM 2015 CD2 LEU A 269 12.965 33.121 -11.302 1.00 89.67 C ATOM 2016 N LEU A 270 11.712 31.977 -6.195 1.00 87.46 N ATOM 2017 CA LEU A 270 11.554 32.392 -4.801 1.00 88.00 C ATOM 2018 C LEU A 270 12.877 32.861 -4.230 1.00 87.09 C ATOM 2019 O LEU A 270 13.886 32.161 -4.316 1.00 87.38 O ATOM 2020 CB LEU A 270 11.010 31.256 -3.923 1.00 88.45 C ATOM 2021 CG LEU A 270 10.779 31.677 -2.457 1.00 88.79 C ATOM 2022 CD1 LEU A 270 9.553 32.581 -2.368 1.00 86.57 C ATOM 2023 CD2 LEU A 270 10.605 30.447 -1.576 1.00 88.64 C ATOM 2024 N SER A 271 12.869 34.046 -3.635 1.00 85.87 N ATOM 2025 CA SER A 271 14.089 34.581 -3.071 1.00 85.51 C ATOM 2026 C SER A 271 13.921 34.895 -1.581 1.00 83.98 C ATOM 2027 O SER A 271 14.785 34.556 -0.763 1.00 83.61 O ATOM 2028 CB SER A 271 14.494 35.831 -3.857 1.00 86.30 C ATOM 2029 OG SER A 271 15.902 35.972 -3.874 1.00 87.42 O ATOM 2030 N GLY A 272 12.802 35.529 -1.238 1.00 81.96 N ATOM 2031 CA GLY A 272 12.534 35.881 0.145 1.00 78.89 C ATOM 2032 C GLY A 272 11.132 35.468 0.550 1.00 76.86 C ATOM 2033 O GLY A 272 10.173 35.697 -0.191 1.00 75.55 O ATOM 2034 N LEU A 273 11.012 34.872 1.733 1.00 74.99 N ATOM 2035 CA LEU A 273 9.722 34.399 2.228 1.00 72.30 C ATOM 2036 C LEU A 273 9.376 34.947 3.614 1.00 70.78 C ATOM 2037 O LEU A 273 10.217 34.986 4.511 1.00 69.89 O ATOM 2038 CB LEU A 273 9.731 32.866 2.286 1.00 71.11 C ATOM 2039 CG LEU A 273 8.464 32.038 2.033 1.00 69.91 C ATOM 2040 CD1 LEU A 273 8.551 30.790 2.895 1.00 68.42 C ATOM 2041 CD2 LEU A 273 7.194 32.820 2.360 1.00 68.44 C ATOM 2042 N ASP A 274 8.129 35.365 3.784 1.00 70.14 N ATOM 2043 CA ASP A 274 7.682 35.879 5.072 1.00 69.92 C ATOM 2044 C ASP A 274 
6.468 35.115 5.619 1.00 68.55 C ATOM 2045 O ASP A 274 5.438 34.987 4.960 1.00 68.14 O ATOM 2046 CB ASP A 274 7.371 37.384 4.978 1.00 69.92 C ATOM 2047 CG ASP A 274 8.631 38.241 4.843 1.00 70.83 C ATOM 2048 OD1 ASP A 274 9.631 37.966 5.546 1.00 69.82 O ATOM 2049 OD2 ASP A 274 8.617 39.199 4.044 1.00 71.83 O ATOM 2050 N ILE A 275 6.624 34.578 6.823 1.00 67.15 N ATOM 2051 CA ILE A 275 5.559 33.848 7.496 1.00 65.60 C ATOM 2052 C ILE A 275 5.306 34.638 8.774 1.00 65.13 C ATOM 2053 O ILE A 275 5.889 34.352 9.834 1.00 63.76 O ATOM 2054 CB ILE A 275 5.988 32.398 7.824 1.00 65.55 C ATOM 2055 CG1 ILE A 275 5.988 31.558 6.547 1.00 65.04 C ATOM 2056 CG2 ILE A 275 5.059 31.781 8.847 1.00 65.82 C ATOM 2057 CD1 ILE A 275 7.140 31.856 5.621 1.00 67.21 C ATOM 2058 N MET A 276 4.428 35.637 8.637 1.00 63.90 N ATOM 2059 CA MET A 276 4.074 36.565 9.705 1.00 62.24 C ATOM 2060 C MET A 276 2.698 36.347 10.298 1.00 61.84 C ATOM 2061 O MET A 276 1.913 35.538 9.799 1.00 60.99 O ATOM 2062 CB MET A 276 4.106 37.998 9.176 1.00 62.85 C ATOM 2063 CG MET A 276 5.187 38.271 8.162 1.00 66.01 C ATOM 2064 SD MET A 276 6.803 37.805 8.780 1.00 70.21 S ATOM 2065 CE MET A 276 6.810 38.752 10.336 1.00 69.59 C ATOM 2066 N GLU A 277 2.439 37.114 11.366 1.00 61.47 N ATOM 2067 CA GLU A 277 1.174 37.177 12.130 1.00 61.13 C ATOM 2068 C GLU A 277 0.691 35.965 12.914 1.00 60.56 C ATOM 2069 O GLU A 277 -0.500 35.861 13.211 1.00 58.86 O ATOM 2070 CB GLU A 277 0.011 37.647 11.230 1.00 60.57 C ATOM 2071 CG GLU A 277 0.103 39.091 10.703 1.00 59.95 C ATOM 2072 CD GLU A 277 0.285 40.120 11.796 1.00 60.38 C ATOM 2073 OE1 GLU A 277 -0.072 39.859 12.963 1.00 60.82 O ATOM 2074 OE2 GLU A 277 0.809 41.207 11.509 1.00 63.05 O ATOM 2075 N VAL A 278 1.590 35.052 13.255 1.00 60.40 N ATOM 2076 CA VAL A 278 1.173 33.886 14.012 1.00 61.42 C ATOM 2077 C VAL A 278 1.278 34.192 15.485 1.00 61.69 C ATOM 2078 O VAL A 278 2.370 34.465 15.957 1.00 62.84 O ATOM 2079 CB VAL A 278 2.059 32.687 13.749 1.00 61.62 C ATOM 2080 CG1 VAL A 278 1.773 
31.632 14.795 1.00 60.45 C ATOM 2081 CG2 VAL A 278 1.792 32.131 12.358 1.00 63.02 C ATOM 2082 N ASN A 279 0.161 34.120 16.211 1.00 62.09 N ATOM 2083 CA ASN A 279 0.160 34.415 17.646 1.00 62.05 C ATOM 2084 C ASN A 279 -0.152 33.207 18.511 1.00 62.56 C ATOM 2085 O ASN A 279 -1.275 33.074 18.980 1.00 62.08 O ATOM 2086 CB ASN A 279 -0.866 35.500 17.964 1.00 61.37 C ATOM 2087 CG ASN A 279 -0.414 36.409 19.091 1.00 60.61 C ATOM 2088 OD1 ASN A 279 0.356 35.997 19.964 1.00 58.30 O ATOM 2089 ND2 ASN A 279 -0.893 37.654 19.080 1.00 57.50 N ATOM 2090 N PRO A 280 0.841 32.323 18.751 1.00 64.39 N ATOM 2091 CA PRO A 280 0.758 31.089 19.553 1.00 67.08 C ATOM 2092 C PRO A 280 0.000 31.187 20.874 1.00 69.58 C ATOM 2093 O PRO A 280 -0.114 30.209 21.614 1.00 69.61 O ATOM 2094 CB PRO A 280 2.225 30.718 19.757 1.00 65.11 C ATOM 2095 CG PRO A 280 2.812 31.096 18.477 1.00 64.36 C ATOM 2096 CD PRO A 280 2.204 32.473 18.217 1.00 64.19 C ATOM 2097 N THR A 281 -0.520 32.374 21.152 1.00 72.57 N ATOM 2098 CA THR A 281 -1.262 32.626 22.371 1.00 75.80 C ATOM 2099 C THR A 281 -2.721 32.934 22.061 1.00 76.44 C ATOM 2100 O THR A 281 -3.609 32.663 22.867 1.00 76.53 O ATOM 2101 CB THR A 281 -0.631 33.801 23.132 1.00 76.94 C ATOM 2102 OG1 THR A 281 -0.277 34.831 22.201 1.00 79.55 O ATOM 2103 CG2 THR A 281 0.622 33.350 23.862 1.00 77.78 C ATOM 2104 N LEU A 282 -2.967 33.500 20.888 1.00 78.33 N ATOM 2105 CA LEU A 282 -4.326 33.830 20.498 1.00 80.29 C ATOM 2106 C LEU A 282 -5.054 32.603 19.979 1.00 83.19 C ATOM 2107 O LEU A 282 -5.489 32.566 18.831 1.00 85.02 O ATOM 2108 CB LEU A 282 -4.333 34.936 19.439 1.00 77.40 C ATOM 2109 CG LEU A 282 -3.728 36.239 19.952 1.00 75.93 C ATOM 2110 CD1 LEU A 282 -3.933 37.352 18.951 1.00 75.40 C ATOM 2111 CD2 LEU A 282 -4.375 36.586 21.269 1.00 75.90 C ATOM 2112 N GLY A 283 -5.172 31.592 20.833 1.00 85.07 N ATOM 2113 CA GLY A 283 -5.877 30.382 20.451 1.00 86.60 C ATOM 2114 C GLY A 283 -7.068 30.132 21.368 1.00 87.56 C ATOM 2115 O GLY A 283 -6.903 30.043 22.593 1.00 
88.62 O ATOM 2116 N LYS A 284 -8.264 30.039 20.780 1.00 86.77 N ATOM 2117 CA LYS A 284 -9.495 29.789 21.537 1.00 85.88 C ATOM 2118 C LYS A 284 -9.302 28.474 22.283 1.00 85.02 C ATOM 2119 O LYS A 284 -9.770 28.309 23.414 1.00 84.12 O ATOM 2120 CB LYS A 284 -10.695 29.664 20.588 1.00 87.29 C ATOM 2121 CG LYS A 284 -12.009 30.266 21.094 1.00 87.71 C ATOM 2122 CD LYS A 284 -12.186 31.714 20.625 1.00 87.88 C ATOM 2123 CE LYS A 284 -11.075 32.619 21.155 1.00 88.18 C ATOM 2124 NZ LYS A 284 -11.107 33.997 20.599 1.00 87.29 N ATOM 2125 N THR A 285 -8.608 27.547 21.620 1.00 84.15 N ATOM 2126 CA THR A 285 -8.289 26.234 22.164 1.00 83.26 C ATOM 2127 C THR A 285 -6.926 25.820 21.623 1.00 82.99 C ATOM 2128 O THR A 285 -6.533 26.209 20.521 1.00 81.31 O ATOM 2129 CB THR A 285 -9.291 25.159 21.725 1.00 83.35 C ATOM 2130 OG1 THR A 285 -9.041 24.798 20.359 1.00 84.05 O ATOM 2131 CG2 THR A 285 -10.711 25.672 21.857 1.00 84.61 C ATOM 2132 N PRO A 286 -6.188 25.019 22.399 1.00 83.41 N ATOM 2133 CA PRO A 286 -4.861 24.527 22.027 1.00 82.95 C ATOM 2134 C PRO A 286 -4.768 24.008 20.593 1.00 82.76 C ATOM 2135 O PRO A 286 -3.768 24.247 19.910 1.00 83.94 O ATOM 2136 CB PRO A 286 -4.607 23.447 23.067 1.00 83.43 C ATOM 2137 CG PRO A 286 -5.219 24.065 24.297 1.00 83.83 C ATOM 2138 CD PRO A 286 -6.545 24.579 23.762 1.00 84.08 C ATOM 2139 N GLU A 287 -5.797 23.293 20.145 1.00 81.54 N ATOM 2140 CA GLU A 287 -5.823 22.768 18.781 1.00 80.48 C ATOM 2141 C GLU A 287 -5.679 23.922 17.777 1.00 79.49 C ATOM 2142 O GLU A 287 -4.760 23.928 16.949 1.00 78.67 O ATOM 2143 CB GLU A 287 -7.139 22.008 18.546 1.00 82.25 C ATOM 2144 CG GLU A 287 -7.899 22.331 17.237 1.00 83.83 C ATOM 2145 CD GLU A 287 -7.189 21.856 15.960 1.00 84.57 C ATOM 2146 OE1 GLU A 287 -6.168 22.465 15.578 1.00 85.56 O ATOM 2147 OE2 GLU A 287 -7.652 20.873 15.332 1.00 84.28 O ATOM 2148 N GLU A 288 -6.590 24.893 17.864 1.00 77.25 N ATOM 2149 CA GLU A 288 -6.578 26.060 16.985 1.00 73.95 C ATOM 2150 C GLU A 288 -5.168 26.565 16.751 1.00 72.69 C 
ATOM 2151 O GLU A 288 -4.801 26.892 15.627 1.00 72.64 O ATOM 2152 CB GLU A 288 -7.408 27.186 17.583 1.00 72.46 C ATOM 2153 CG GLU A 288 -8.867 26.839 17.761 1.00 70.50 C ATOM 2154 CD GLU A 288 -9.740 28.075 17.833 1.00 69.40 C ATOM 2155 OE1 GLU A 288 -9.176 29.195 17.863 1.00 68.54 O ATOM 2156 OE2 GLU A 288 -10.986 27.930 17.862 1.00 68.62 O ATOM 2157 N VAL A 289 -4.385 26.652 17.823 1.00 71.45 N ATOM 2158 CA VAL A 289 -2.999 27.094 17.707 1.00 70.18 C ATOM 2159 C VAL A 289 -2.322 26.039 16.841 1.00 69.20 C ATOM 2160 O VAL A 289 -1.995 26.299 15.684 1.00 68.99 O ATOM 2161 CB VAL A 289 -2.278 27.180 19.102 1.00 69.79 C ATOM 2162 CG1 VAL A 289 -0.841 27.657 18.928 1.00 67.62 C ATOM 2163 CG2 VAL A 289 -3.021 28.149 20.022 1.00 69.71 C ATOM 2164 N THR A 290 -2.133 24.846 17.401 1.00 68.00 N ATOM 2165 CA THR A 290 -1.509 23.739 16.676 1.00 66.88 C ATOM 2166 C THR A 290 -1.893 23.766 15.194 1.00 66.56 C ATOM 2167 O THR A 290 -1.036 23.653 14.309 1.00 66.19 O ATOM 2168 CB THR A 290 -1.936 22.390 17.275 1.00 65.35 C ATOM 2169 OG1 THR A 290 -1.423 22.294 18.604 1.00 63.64 O ATOM 2170 CG2 THR A 290 -1.408 21.238 16.442 1.00 64.85 C ATOM 2171 N ARG A 291 -3.191 23.909 14.942 1.00 65.26 N ATOM 2172 CA ARG A 291 -3.718 23.973 13.580 1.00 64.06 C ATOM 2173 C ARG A 291 -2.921 25.057 12.869 1.00 63.34 C ATOM 2174 O ARG A 291 -2.143 24.797 11.946 1.00 64.70 O ATOM 2175 CB ARG A 291 -5.196 24.367 13.622 1.00 61.65 C ATOM 2176 CG ARG A 291 -5.919 24.347 12.298 1.00 58.64 C ATOM 2177 CD ARG A 291 -7.379 24.710 12.527 1.00 56.90 C ATOM 2178 NE ARG A 291 -7.893 24.016 13.704 1.00 57.01 N ATOM 2179 CZ ARG A 291 -9.040 24.296 14.320 1.00 57.06 C ATOM 2180 NH1 ARG A 291 -9.827 25.272 13.868 1.00 57.97 N ATOM 2181 NH2 ARG A 291 -9.387 23.621 15.413 1.00 55.48 N ATOM 2182 N THR A 292 -3.121 26.277 13.346 1.00 61.05 N ATOM 2183 CA THR A 292 -2.471 27.461 12.829 1.00 58.67 C ATOM 2184 C THR A 292 -0.985 27.235 12.589 1.00 58.47 C ATOM 2185 O THR A 292 -0.430 27.647 11.576 1.00 57.49 O ATOM 
2186 CB THR A 292 -2.650 28.609 13.828 1.00 58.81 C ATOM 2187 OG1 THR A 292 -4.053 28.792 14.112 1.00 56.48 O ATOM 2188 CG2 THR A 292 -2.042 29.895 13.267 1.00 59.71 C ATOM 2189 N VAL A 293 -0.349 26.560 13.533 1.00 59.78 N ATOM 2190 CA VAL A 293 1.075 26.284 13.456 1.00 58.82 C ATOM 2191 C VAL A 293 1.489 25.365 12.320 1.00 58.73 C ATOM 2192 O VAL A 293 2.366 25.709 11.539 1.00 60.24 O ATOM 2193 CB VAL A 293 1.566 25.700 14.777 1.00 57.64 C ATOM 2194 CG1 VAL A 293 2.892 25.021 14.587 1.00 59.61 C ATOM 2195 CG2 VAL A 293 1.690 26.812 15.794 1.00 56.78 C ATOM 2196 N ASN A 294 0.873 24.199 12.216 1.00 59.01 N ATOM 2197 CA ASN A 294 1.245 23.271 11.151 1.00 59.52 C ATOM 2198 C ASN A 294 0.933 23.817 9.764 1.00 57.04 C ATOM 2199 O ASN A 294 1.787 23.810 8.886 1.00 56.04 O ATOM 2200 CB ASN A 294 0.550 21.929 11.369 1.00 62.65 C ATOM 2201 CG ASN A 294 0.814 21.364 12.757 1.00 65.74 C ATOM 2202 OD1 ASN A 294 1.973 21.114 13.136 1.00 65.73 O ATOM 2203 ND2 ASN A 294 -0.259 21.172 13.531 1.00 65.75 N ATOM 2204 N THR A 295 -0.290 24.296 9.572 1.00 55.78 N ATOM 2205 CA THR A 295 -0.697 24.869 8.289 1.00 53.42 C ATOM 2206 C THR A 295 0.335 25.924 7.851 1.00 51.63 C ATOM 2207 O THR A 295 0.690 26.035 6.669 1.00 51.32 O ATOM 2208 CB THR A 295 -2.105 25.511 8.402 1.00 51.74 C ATOM 2209 OG1 THR A 295 -2.468 26.108 7.156 1.00 51.03 O ATOM 2210 CG2 THR A 295 -2.114 26.581 9.471 1.00 52.14 C ATOM 2211 N ALA A 296 0.814 26.693 8.817 1.00 48.87 N ATOM 2212 CA ALA A 296 1.816 27.692 8.537 1.00 48.83 C ATOM 2213 C ALA A 296 3.061 26.934 8.066 1.00 49.86 C ATOM 2214 O ALA A 296 3.551 27.129 6.951 1.00 48.11 O ATOM 2215 CB ALA A 296 2.108 28.464 9.788 1.00 48.98 C ATOM 2216 N VAL A 297 3.542 26.049 8.934 1.00 51.36 N ATOM 2217 CA VAL A 297 4.700 25.199 8.665 1.00 53.60 C ATOM 2218 C VAL A 297 4.558 24.572 7.285 1.00 55.78 C ATOM 2219 O VAL A 297 5.479 24.606 6.459 1.00 58.28 O ATOM 2220 CB VAL A 297 4.783 24.044 9.699 1.00 53.14 C ATOM 2221 CG1 VAL A 297 5.801 23.003 9.267 1.00 50.59 C ATOM 
2222 CG2 VAL A 297 5.142 24.600 11.063 1.00 54.44 C ATOM 2223 N ALA A 298 3.393 23.989 7.051 1.00 56.73 N ATOM 2224 CA ALA A 298 3.102 23.335 5.798 1.00 58.92 C ATOM 2225 C ALA A 298 3.282 24.318 4.654 1.00 61.14 C ATOM 2226 O ALA A 298 3.875 23.975 3.628 1.00 62.85 O ATOM 2227 CB ALA A 298 1.671 22.798 5.820 1.00 58.61 C ATOM 2228 N LEU A 299 2.770 25.537 4.819 1.00 60.82 N ATOM 2229 CA LEU A 299 2.892 26.527 3.759 1.00 61.09 C ATOM 2230 C LEU A 299 4.352 26.834 3.418 1.00 62.91 C ATOM 2231 O LEU A 299 4.652 27.247 2.300 1.00 65.11 O ATOM 2232 CB LEU A 299 2.187 27.830 4.139 1.00 57.81 C ATOM 2233 CG LEU A 299 0.665 27.992 4.080 1.00 55.38 C ATOM 2234 CD1 LEU A 299 0.377 29.446 4.314 1.00 53.96 C ATOM 2235 CD2 LEU A 299 0.075 27.601 2.741 1.00 51.97 C ATOM 2236 N THR A 300 5.260 26.643 4.369 1.00 62.76 N ATOM 2237 CA THR A 300 6.666 26.925 4.104 1.00 62.65 C ATOM 2238 C THR A 300 7.231 25.849 3.206 1.00 61.32 C ATOM 2239 O THR A 300 7.636 26.123 2.081 1.00 59.77 O ATOM 2240 CB THR A 300 7.494 26.957 5.399 1.00 63.73 C ATOM 2241 OG1 THR A 300 6.782 27.679 6.410 1.00 65.35 O ATOM 2242 CG2 THR A 300 8.831 27.658 5.157 1.00 63.25 C ATOM 2243 N LEU A 301 7.252 24.628 3.727 1.00 60.60 N ATOM 2244 CA LEU A 301 7.730 23.460 2.996 1.00 62.25 C ATOM 2245 C LEU A 301 7.095 23.374 1.615 1.00 62.75 C ATOM 2246 O LEU A 301 7.509 22.579 0.779 1.00 61.14 O ATOM 2247 CB LEU A 301 7.377 22.186 3.763 1.00 62.64 C ATOM 2248 CG LEU A 301 7.940 22.064 5.174 1.00 63.60 C ATOM 2249 CD1 LEU A 301 7.288 20.909 5.926 1.00 64.16 C ATOM 2250 CD2 LEU A 301 9.434 21.856 5.060 1.00 64.47 C ATOM 2251 N SER A 302 6.060 24.173 1.393 1.00 65.66 N ATOM 2252 CA SER A 302 5.378 24.177 0.108 1.00 67.63 C ATOM 2253 C SER A 302 6.240 24.982 -0.849 1.00 69.13 C ATOM 2254 O SER A 302 6.601 24.513 -1.931 1.00 70.32 O ATOM 2255 CB SER A 302 3.994 24.810 0.247 1.00 67.19 C ATOM 2256 OG SER A 302 3.224 24.576 -0.915 1.00 69.11 O ATOM 2257 N CYS A 303 6.587 26.189 -0.414 1.00 70.31 N ATOM 2258 CA CYS A 303 7.433 
27.101 -1.173 1.00 71.32 C ATOM 2259 C CYS A 303 8.792 26.478 -1.479 1.00 72.31 C ATOM 2260 O CYS A 303 9.553 26.990 -2.303 1.00 72.83 O ATOM 2261 CB CYS A 303 7.672 28.363 -0.358 1.00 71.03 C ATOM 2262 SG CYS A 303 6.178 29.094 0.236 1.00 72.69 S ATOM 2263 N PHE A 304 9.096 25.372 -0.811 1.00 73.09 N ATOM 2264 CA PHE A 304 10.379 24.725 -0.991 1.00 72.61 C ATOM 2265 C PHE A 304 10.356 23.314 -1.551 1.00 73.36 C ATOM 2266 O PHE A 304 11.192 22.488 -1.180 1.00 74.39 O ATOM 2267 CB PHE A 304 11.119 24.742 0.336 1.00 71.80 C ATOM 2268 CG PHE A 304 11.378 26.121 0.851 1.00 71.95 C ATOM 2269 CD1 PHE A 304 12.400 26.896 0.314 1.00 72.36 C ATOM 2270 CD2 PHE A 304 10.584 26.662 1.850 1.00 72.11 C ATOM 2271 CE1 PHE A 304 12.627 28.193 0.766 1.00 71.54 C ATOM 2272 CE2 PHE A 304 10.804 27.957 2.308 1.00 72.05 C ATOM 2273 CZ PHE A 304 11.829 28.723 1.763 1.00 71.30 C ATOM 2274 N GLY A 305 9.409 23.026 -2.438 1.00 72.86 N ATOM 2275 CA GLY A 305 9.389 21.703 -3.027 1.00 72.03 C ATOM 2276 C GLY A 305 8.143 20.866 -2.864 1.00 71.73 C ATOM 2277 O GLY A 305 7.430 20.623 -3.845 1.00 72.28 O ATOM 2278 N THR A 306 7.891 20.412 -1.636 1.00 70.88 N ATOM 2279 CA THR A 306 6.731 19.577 -1.347 1.00 69.05 C ATOM 2280 C THR A 306 5.539 19.961 -2.227 1.00 69.72 C ATOM 2281 O THR A 306 5.135 21.137 -2.311 1.00 68.72 O ATOM 2282 CB THR A 306 6.312 19.672 0.117 1.00 66.56 C ATOM 2283 OG1 THR A 306 7.471 19.823 0.936 1.00 65.69 O ATOM 2284 CG2 THR A 306 5.598 18.407 0.527 1.00 65.22 C ATOM 2285 N LYS A 307 5.001 18.946 -2.895 1.00 69.29 N ATOM 2286 CA LYS A 307 3.871 19.089 -3.799 1.00 69.35 C ATOM 2287 C LYS A 307 2.765 18.223 -3.215 1.00 68.64 C ATOM 2288 O LYS A 307 3.046 17.325 -2.434 1.00 68.97 O ATOM 2289 CB LYS A 307 4.298 18.609 -5.190 1.00 69.94 C ATOM 2290 CG LYS A 307 5.658 19.203 -5.601 1.00 71.55 C ATOM 2291 CD LYS A 307 6.367 18.403 -6.698 1.00 75.08 C ATOM 2292 CE LYS A 307 5.767 18.656 -8.069 1.00 76.43 C ATOM 2293 NZ LYS A 307 5.923 20.091 -8.461 1.00 77.73 N ATOM 2294 N ARG A 308 
1.512 18.493 -3.552 1.00 68.82 N ATOM 2295 CA ARG A 308 0.440 17.684 -2.995 1.00 69.63 C ATOM 2296 C ARG A 308 0.226 16.403 -3.799 1.00 73.65 C ATOM 2297 O ARG A 308 -0.750 15.659 -3.593 1.00 74.33 O ATOM 2298 CB ARG A 308 -0.853 18.501 -2.898 1.00 65.67 C ATOM 2299 CG ARG A 308 -1.047 19.134 -1.519 1.00 61.84 C ATOM 2300 CD ARG A 308 -2.286 19.993 -1.437 1.00 56.86 C ATOM 2301 NE ARG A 308 -2.127 21.204 -2.225 1.00 51.40 N ATOM 2302 CZ ARG A 308 -3.122 21.812 -2.855 1.00 51.56 C ATOM 2303 NH1 ARG A 308 -4.353 21.317 -2.790 1.00 50.69 N ATOM 2304 NH2 ARG A 308 -2.887 22.911 -3.554 1.00 53.11 N ATOM 2305 N GLU A 309 1.156 16.140 -4.711 1.00 76.34 N ATOM 2306 CA GLU A 309 1.076 14.947 -5.524 1.00 78.81 C ATOM 2307 C GLU A 309 2.193 13.998 -5.129 1.00 81.14 C ATOM 2308 O GLU A 309 2.209 12.835 -5.537 1.00 80.97 O ATOM 2309 CB GLU A 309 1.179 15.304 -7.003 1.00 78.42 C ATOM 2310 CG GLU A 309 2.482 15.938 -7.395 1.00 79.83 C ATOM 2311 CD GLU A 309 2.793 15.708 -8.856 1.00 80.70 C ATOM 2312 OE1 GLU A 309 2.001 16.166 -9.715 1.00 79.92 O ATOM 2313 OE2 GLU A 309 3.827 15.059 -9.143 1.00 80.64 O ATOM 2314 N GLY A 310 3.125 14.500 -4.327 1.00 84.23 N ATOM 2315 CA GLY A 310 4.237 13.681 -3.880 1.00 88.85 C ATOM 2316 C GLY A 310 5.613 14.225 -4.234 1.00 92.10 C ATOM 2317 O GLY A 310 5.745 15.212 -4.951 1.00 91.07 O ATOM 2318 N ASN A 311 6.641 13.555 -3.726 1.00 97.12 N ATOM 2319 CA ASN A 311 8.038 13.912 -3.960 1.00102.29 C ATOM 2320 C ASN A 311 8.864 12.631 -3.775 1.00106.18 C ATOM 2321 O ASN A 311 8.584 11.834 -2.871 1.00107.49 O ATOM 2322 CB ASN A 311 8.512 14.969 -2.946 1.00102.14 C ATOM 2323 CG ASN A 311 7.845 16.330 -3.141 1.00102.71 C ATOM 2324 OD1 ASN A 311 8.009 16.972 -4.178 1.00103.04 O ATOM 2325 ND2 ASN A 311 7.093 16.775 -2.135 1.00102.31 N ATOM 2326 N HIS A 312 9.863 12.426 -4.634 1.00109.68 N ATOM 2327 CA HIS A 312 10.730 11.247 -4.548 1.00112.52 C ATOM 2328 C HIS A 312 12.149 11.606 -5.000 1.00113.52 C ATOM 2329 O HIS A 312 12.341 12.169 -6.084 1.00113.20 O 
ATOM 2330 CB HIS A 312 10.179 10.102 -5.414 1.00114.32 C ATOM 2331 CG HIS A 312 10.305 10.339 -6.891 1.00117.06 C ATOM 2332 ND1 HIS A 312 9.493 11.212 -7.578 1.00118.23 N ATOM 2333 CD2 HIS A 312 11.180 9.839 -7.798 1.00118.21 C ATOM 2334 CE1 HIS A 312 9.862 11.245 -8.850 1.00118.99 C ATOM 2335 NE2 HIS A 312 10.882 10.422 -9.008 1.00119.29 N ATOM 2336 N LYS A 313 13.139 11.291 -4.168 1.00114.77 N ATOM 2337 CA LYS A 313 14.533 11.590 -4.494 1.00116.23 C ATOM 2338 C LYS A 313 14.857 11.036 -5.879 1.00117.88 C ATOM 2339 O LYS A 313 14.705 9.838 -6.127 1.00118.20 O ATOM 2340 CB LYS A 313 15.458 10.969 -3.447 1.00115.55 C ATOM 2341 CG LYS A 313 15.169 11.415 -2.019 1.00114.52 C ATOM 2342 CD LYS A 313 16.016 10.625 -1.029 1.00113.62 C ATOM 2343 CE LYS A 313 15.671 10.946 0.417 1.00111.93 C ATOM 2344 NZ LYS A 313 16.459 10.099 1.360 1.00109.72 N ATOM 2345 N PRO A 314 15.298 11.907 -6.806 1.00119.02 N ATOM 2346 CA PRO A 314 15.635 11.490 -8.170 1.00119.57 C ATOM 2347 C PRO A 314 16.822 10.538 -8.246 1.00120.29 C ATOM 2348 O PRO A 314 17.581 10.378 -7.281 1.00119.94 O ATOM 2349 CB PRO A 314 15.931 12.812 -8.877 1.00119.10 C ATOM 2350 CG PRO A 314 15.115 13.800 -8.115 1.00119.32 C ATOM 2351 CD PRO A 314 15.368 13.373 -6.690 1.00119.29 C ATOM 2352 N GLU A 315 16.966 9.913 -9.411 1.00120.71 N ATOM 2353 CA GLU A 315 18.051 8.979 -9.682 1.00120.73 C ATOM 2354 C GLU A 315 18.044 7.734 -8.807 1.00119.42 C ATOM 2355 O GLU A 315 18.984 6.937 -8.841 1.00119.59 O ATOM 2356 CB GLU A 315 19.399 9.698 -9.559 1.00122.78 C ATOM 2357 CG GLU A 315 19.665 10.686 -10.692 1.00124.50 C ATOM 2358 CD GLU A 315 20.924 11.503 -10.482 1.00125.58 C ATOM 2359 OE1 GLU A 315 21.290 12.260 -11.404 1.00125.94 O ATOM 2360 OE2 GLU A 315 21.541 11.395 -9.399 1.00126.86 O ATOM 2361 N THR A 316 16.990 7.569 -8.017 1.00117.54 N ATOM 2362 CA THR A 316 16.881 6.392 -7.168 1.00115.78 C ATOM 2363 C THR A 316 15.745 5.555 -7.728 1.00113.97 C ATOM 2364 O THR A 316 14.675 6.080 -8.034 1.00113.84 O ATOM 2365 CB THR A 316 
16.575 6.758 -5.686 1.00116.09 C ATOM 2366 OG1 THR A 316 17.638 7.566 -5.163 1.00116.18 O ATOM 2367 CG2 THR A 316 16.447 5.489 -4.831 1.00115.05 C ATOM 2368 N ASP A 317 16.003 4.260 -7.894 1.00111.82 N ATOM 2369 CA ASP A 317 15.007 3.326 -8.405 1.00109.43 C ATOM 2370 C ASP A 317 14.309 2.721 -7.186 1.00109.27 C ATOM 2371 O ASP A 317 14.869 1.865 -6.488 1.00109.58 O ATOM 2372 CB ASP A 317 15.689 2.242 -9.262 1.00106.72 C ATOM 2373 CG ASP A 317 14.713 1.202 -9.810 1.00104.53 C ATOM 2374 OD1 ASP A 317 13.709 1.574 -10.452 1.00103.22 O ATOM 2375 OD2 ASP A 317 14.961 -0.004 -9.607 1.00102.42 O ATOM 2376 N TYR A 318 13.093 3.207 -6.922 1.00108.87 N ATOM 2377 CA TYR A 318 12.279 2.750 -5.798 1.00108.47 C ATOM 2378 C TYR A 318 11.649 1.403 -6.125 1.00109.21 C ATOM 2379 O TYR A 318 10.792 0.918 -5.399 1.00108.22 O ATOM 2380 CB TYR A 318 11.181 3.775 -5.486 1.00107.18 C ATOM 2381 CG TYR A 318 11.632 5.027 -4.738 1.00105.92 C ATOM 2382 CD1 TYR A 318 11.945 4.983 -3.370 1.00104.92 C ATOM 2383 CD2 TYR A 318 11.713 6.264 -5.387 1.00104.89 C ATOM 2384 CE1 TYR A 318 12.323 6.145 -2.669 1.00103.58 C ATOM 2385 CE2 TYR A 318 12.091 7.430 -4.696 1.00104.05 C ATOM 2386 CZ TYR A 318 12.392 7.362 -3.341 1.00103.64 C ATOM 2387 OH TYR A 318 12.761 8.507 -2.670 1.00102.85 O ATOM 2388 N LEU A 319 12.089 0.814 -7.232 1.00111.27 N ATOM 2389 CA LEU A 319 11.612 -0.493 -7.690 1.00112.79 C ATOM 2390 C LEU A 319 12.650 -1.591 -7.385 1.00113.76 C ATOM 2391 O LEU A 319 12.326 -2.521 -6.602 1.00114.70 O ATOM 2392 CB LEU A 319 11.341 -0.461 -9.203 1.00111.81 C ATOM 2393 CG LEU A 319 10.062 0.223 -9.681 1.00110.60 C ATOM 2394 CD1 LEU A 319 10.039 0.280 -11.202 1.00109.61 C ATOM 2395 CD2 LEU A 319 8.861 -0.550 -9.149 1.00109.90 C TER 2396 LEU A 319 ATOM 2397 N LYS B 6 -32.819 -1.014 -13.478 1.00 99.08 N ATOM 2398 CA LYS B 6 -31.536 -1.722 -13.166 1.00 99.00 C ATOM 2399 C LYS B 6 -31.542 -2.347 -11.758 1.00 98.40 C ATOM 2400 O LYS B 6 -32.358 -1.982 -10.901 1.00 97.61 O ATOM 2401 CB LYS B 6 -30.365 
-0.743 -13.323 1.00 99.66 C ATOM 2402 CG LYS B 6 -29.397 -1.076 -14.463 1.00 99.85 C ATOM 2403 CD LYS B 6 -28.354 -2.113 -14.025 1.00100.73 C ATOM 2404 CE LYS B 6 -27.313 -2.399 -15.109 1.00101.23 C ATOM 2405 NZ LYS B 6 -26.192 -3.258 -14.609 1.00100.75 N ATOM 2406 N PRO B 7 -30.634 -3.312 -11.510 1.00 97.80 N ATOM 2407 CA PRO B 7 -30.548 -3.981 -10.212 1.00 97.15 C ATOM 2408 C PRO B 7 -29.293 -3.635 -9.433 1.00 96.32 C ATOM 2409 O PRO B 7 -28.236 -3.398 -10.013 1.00 96.22 O ATOM 2410 CB PRO B 7 -30.553 -5.442 -10.605 1.00 97.20 C ATOM 2411 CG PRO B 7 -29.608 -5.411 -11.777 1.00 97.43 C ATOM 2412 CD PRO B 7 -29.947 -4.122 -12.532 1.00 97.47 C ATOM 2413 N ILE B 8 -29.421 -3.643 -8.111 1.00 95.57 N ATOM 2414 CA ILE B 8 -28.311 -3.337 -7.219 1.00 94.69 C ATOM 2415 C ILE B 8 -28.152 -4.437 -6.193 1.00 94.01 C ATOM 2416 O ILE B 8 -29.137 -4.836 -5.565 1.00 93.77 O ATOM 2417 CB ILE B 8 -28.554 -2.025 -6.404 1.00 94.64 C ATOM 2418 CG1 ILE B 8 -28.668 -0.814 -7.330 1.00 93.94 C ATOM 2419 CG2 ILE B 8 -27.426 -1.816 -5.399 1.00 93.66 C ATOM 2420 CD1 ILE B 8 -27.465 -0.620 -8.200 1.00 94.03 C ATOM 2421 N GLU B 9 -26.933 -4.946 -6.035 1.00 92.63 N ATOM 2422 CA GLU B 9 -26.718 -5.928 -4.984 1.00 91.96 C ATOM 2423 C GLU B 9 -25.601 -5.375 -4.113 1.00 91.79 C ATOM 2424 O GLU B 9 -24.555 -4.951 -4.620 1.00 91.03 O ATOM 2425 CB GLU B 9 -26.322 -7.317 -5.496 1.00 91.63 C ATOM 2426 CG GLU B 9 -26.625 -8.387 -4.422 1.00 91.26 C ATOM 2427 CD GLU B 9 -25.767 -9.642 -4.488 1.00 91.17 C ATOM 2428 OE1 GLU B 9 -25.979 -10.542 -3.645 1.00 89.98 O ATOM 2429 OE2 GLU B 9 -24.886 -9.731 -5.364 1.00 91.53 O ATOM 2430 N ILE B 10 -25.841 -5.355 -2.805 1.00 91.97 N ATOM 2431 CA ILE B 10 -24.868 -4.835 -1.851 1.00 92.34 C ATOM 2432 C ILE B 10 -24.090 -5.955 -1.181 1.00 91.65 C ATOM 2433 O ILE B 10 -24.674 -6.926 -0.705 1.00 91.67 O ATOM 2434 CB ILE B 10 -25.560 -4.002 -0.749 1.00 93.05 C ATOM 2435 CG1 ILE B 10 -26.280 -2.813 -1.392 1.00 93.78 C ATOM 2436 CG2 ILE B 10 -24.527 -3.546 0.297 1.00 91.79 C 
ATOM 2437 CD1 ILE B 10 -27.076 -1.953 -0.430 1.00 92.99 C ATOM 2438 N ILE B 11 -22.769 -5.809 -1.144 1.00 90.86 N ATOM 2439 CA ILE B 11 -21.893 -6.796 -0.523 1.00 89.22 C ATOM 2440 C ILE B 11 -21.182 -6.116 0.634 1.00 88.75 C ATOM 2441 O ILE B 11 -20.663 -5.012 0.489 1.00 87.87 O ATOM 2442 CB ILE B 11 -20.806 -7.314 -1.501 1.00 88.97 C ATOM 2443 CG1 ILE B 11 -21.454 -7.816 -2.786 1.00 87.62 C ATOM 2444 CG2 ILE B 11 -19.995 -8.418 -0.847 1.00 88.11 C ATOM 2445 CD1 ILE B 11 -22.436 -8.922 -2.578 1.00 88.39 C ATOM 2446 N GLY B 12 -21.170 -6.778 1.783 1.00 88.08 N ATOM 2447 CA GLY B 12 -20.496 -6.220 2.934 1.00 86.84 C ATOM 2448 C GLY B 12 -19.152 -6.902 3.049 1.00 85.81 C ATOM 2449 O GLY B 12 -19.084 -8.129 3.034 1.00 86.11 O ATOM 2450 N ALA B 13 -18.083 -6.118 3.157 1.00 84.95 N ATOM 2451 CA ALA B 13 -16.736 -6.670 3.267 1.00 83.69 C ATOM 2452 C ALA B 13 -16.116 -6.379 4.632 1.00 83.14 C ATOM 2453 O ALA B 13 -14.960 -5.963 4.708 1.00 83.23 O ATOM 2454 CB ALA B 13 -15.843 -6.091 2.148 1.00 82.18 C ATOM 2455 N PRO B 14 -16.866 -6.610 5.725 1.00 82.39 N ATOM 2456 CA PRO B 14 -16.335 -6.348 7.067 1.00 82.40 C ATOM 2457 C PRO B 14 -14.951 -6.948 7.271 1.00 83.87 C ATOM 2458 O PRO B 14 -14.822 -8.053 7.782 1.00 84.25 O ATOM 2459 CB PRO B 14 -17.379 -6.981 7.972 1.00 81.00 C ATOM 2460 CG PRO B 14 -17.903 -8.108 7.135 1.00 80.99 C ATOM 2461 CD PRO B 14 -18.081 -7.435 5.811 1.00 81.10 C ATOM 2462 N PHE B 15 -13.920 -6.206 6.878 1.00 86.12 N ATOM 2463 CA PHE B 15 -12.538 -6.669 6.991 1.00 87.98 C ATOM 2464 C PHE B 15 -11.655 -5.549 7.543 1.00 88.21 C ATOM 2465 O PHE B 15 -12.094 -4.398 7.625 1.00 88.51 O ATOM 2466 CB PHE B 15 -12.039 -7.106 5.611 1.00 88.79 C ATOM 2467 CG PHE B 15 -10.746 -7.854 5.641 1.00 91.15 C ATOM 2468 CD1 PHE B 15 -10.569 -8.926 6.512 1.00 92.70 C ATOM 2469 CD2 PHE B 15 -9.710 -7.512 4.775 1.00 91.85 C ATOM 2470 CE1 PHE B 15 -9.381 -9.657 6.514 1.00 93.41 C ATOM 2471 CE2 PHE B 15 -8.519 -8.235 4.766 1.00 93.06 C ATOM 2472 CZ PHE B 15 -8.350 -9.308 5.637 
1.00 93.27 C ATOM 2473 N SER B 16 -10.418 -5.863 7.918 1.00 87.82 N ATOM 2474 CA SER B 16 -9.554 -4.819 8.453 1.00 88.48 C ATOM 2475 C SER B 16 -8.067 -5.146 8.534 1.00 90.22 C ATOM 2476 O SER B 16 -7.265 -4.281 8.874 1.00 90.56 O ATOM 2477 CB SER B 16 -10.045 -4.407 9.844 1.00 86.82 C ATOM 2478 OG SER B 16 -9.873 -5.438 10.789 1.00 83.07 O ATOM 2479 N LYS B 17 -7.689 -6.379 8.227 1.00 92.02 N ATOM 2480 CA LYS B 17 -6.282 -6.764 8.305 1.00 93.42 C ATOM 2481 C LYS B 17 -5.400 -6.006 7.309 1.00 93.24 C ATOM 2482 O LYS B 17 -4.166 -6.036 7.394 1.00 92.98 O ATOM 2483 CB LYS B 17 -6.143 -8.280 8.105 1.00 95.56 C ATOM 2484 CG LYS B 17 -5.779 -9.083 9.372 1.00 96.54 C ATOM 2485 CD LYS B 17 -6.682 -8.748 10.555 1.00 97.30 C ATOM 2486 CE LYS B 17 -6.585 -9.807 11.644 1.00 97.15 C ATOM 2487 NZ LYS B 17 -7.170 -11.098 11.181 1.00 96.57 N ATOM 2488 N GLY B 18 -6.028 -5.321 6.363 1.00 92.44 N ATOM 2489 CA GLY B 18 -5.247 -4.565 5.406 1.00 92.27 C ATOM 2490 C GLY B 18 -4.451 -3.487 6.125 1.00 92.53 C ATOM 2491 O GLY B 18 -3.504 -2.934 5.563 1.00 92.29 O ATOM 2492 N GLN B 19 -4.836 -3.195 7.371 1.00 92.65 N ATOM 2493 CA GLN B 19 -4.187 -2.176 8.198 1.00 92.73 C ATOM 2494 C GLN B 19 -4.409 -2.423 9.698 1.00 94.03 C ATOM 2495 O GLN B 19 -5.270 -3.213 10.084 1.00 93.82 O ATOM 2496 CB GLN B 19 -4.694 -0.784 7.805 1.00 92.02 C ATOM 2497 CG GLN B 19 -6.207 -0.621 7.778 1.00 91.29 C ATOM 2498 CD GLN B 19 -6.812 -0.576 9.160 1.00 91.08 C ATOM 2499 OE1 GLN B 19 -6.325 0.146 10.025 1.00 91.96 O ATOM 2500 NE2 GLN B 19 -7.880 -1.338 9.376 1.00 89.85 N ATOM 2501 N PRO B 20 -3.639 -1.729 10.561 1.00 95.05 N ATOM 2502 CA PRO B 20 -3.665 -1.810 12.031 1.00 95.93 C ATOM 2503 C PRO B 20 -4.919 -1.379 12.808 1.00 96.49 C ATOM 2504 O PRO B 20 -5.486 -2.169 13.570 1.00 97.00 O ATOM 2505 CB PRO B 20 -2.442 -0.990 12.433 1.00 95.49 C ATOM 2506 CG PRO B 20 -2.411 0.061 11.393 1.00 95.38 C ATOM 2507 CD PRO B 20 -2.643 -0.732 10.129 1.00 94.96 C ATOM 2508 N ARG B 21 -5.326 -0.125 12.636 1.00 96.79 N ATOM 
2509 CA ARG B 21 -6.497 0.423 13.329 1.00 96.91 C ATOM 2510 C ARG B 21 -7.754 -0.438 13.141 1.00 94.86 C ATOM 2511 O ARG B 21 -8.202 -0.677 12.014 1.00 94.86 O ATOM 2512 CB ARG B 21 -6.771 1.850 12.837 1.00 99.34 C ATOM 2513 CG ARG B 21 -5.527 2.598 12.375 1.00102.34 C ATOM 2514 CD ARG B 21 -5.890 3.968 11.830 1.00106.42 C ATOM 2515 NE ARG B 21 -6.188 4.914 12.899 1.00109.60 N ATOM 2516 CZ ARG B 21 -5.272 5.415 13.720 1.00111.63 C ATOM 2517 NH1 ARG B 21 -3.999 5.063 13.591 1.00112.45 N ATOM 2518 NH2 ARG B 21 -5.628 6.268 14.670 1.00112.44 N ATOM 2519 N GLY B 22 -8.331 -0.885 14.253 1.00 92.20 N ATOM 2520 CA GLY B 22 -9.514 -1.724 14.179 1.00 89.96 C ATOM 2521 C GLY B 22 -10.854 -1.012 14.202 1.00 88.10 C ATOM 2522 O GLY B 22 -11.020 0.005 14.868 1.00 87.82 O ATOM 2523 N GLY B 23 -11.820 -1.567 13.477 1.00 86.64 N ATOM 2524 CA GLY B 23 -13.140 -0.976 13.429 1.00 85.01 C ATOM 2525 C GLY B 23 -13.693 -0.985 12.020 1.00 84.30 C ATOM 2526 O GLY B 23 -14.890 -1.189 11.816 1.00 84.52 O ATOM 2527 N VAL B 24 -12.817 -0.775 11.043 1.00 83.77 N ATOM 2528 CA VAL B 24 -13.215 -0.744 9.636 1.00 83.44 C ATOM 2529 C VAL B 24 -14.152 -1.867 9.176 1.00 84.55 C ATOM 2530 O VAL B 24 -14.893 -1.698 8.201 1.00 83.80 O ATOM 2531 CB VAL B 24 -11.967 -0.698 8.715 1.00 82.02 C ATOM 2532 CG1 VAL B 24 -11.634 0.746 8.376 1.00 82.66 C ATOM 2533 CG2 VAL B 24 -10.785 -1.332 9.408 1.00 80.13 C ATOM 2534 N GLU B 25 -14.122 -3.009 9.863 1.00 85.75 N ATOM 2535 CA GLU B 25 -14.999 -4.113 9.484 1.00 86.17 C ATOM 2536 C GLU B 25 -16.406 -3.786 9.970 1.00 85.93 C ATOM 2537 O GLU B 25 -17.396 -4.353 9.487 1.00 86.45 O ATOM 2538 CB GLU B 25 -14.527 -5.445 10.081 1.00 86.78 C ATOM 2539 CG GLU B 25 -14.776 -5.637 11.566 1.00 88.34 C ATOM 2540 CD GLU B 25 -13.850 -4.811 12.430 1.00 90.53 C ATOM 2541 OE1 GLU B 25 -12.769 -4.409 11.935 1.00 91.06 O ATOM 2542 OE2 GLU B 25 -14.197 -4.581 13.611 1.00 91.16 O ATOM 2543 N LYS B 26 -16.488 -2.866 10.928 1.00 84.12 N ATOM 2544 CA LYS B 26 -17.776 -2.435 11.443 1.00 
82.19 C ATOM 2545 C LYS B 26 -18.375 -1.492 10.404 1.00 80.51 C ATOM 2546 O LYS B 26 -19.499 -1.020 10.552 1.00 80.34 O ATOM 2547 CB LYS B 26 -17.606 -1.707 12.773 1.00 82.12 C ATOM 2548 CG LYS B 26 -17.123 -2.592 13.901 1.00 84.38 C ATOM 2549 CD LYS B 26 -17.015 -1.804 15.212 1.00 86.61 C ATOM 2550 CE LYS B 26 -16.569 -2.695 16.378 1.00 86.85 C ATOM 2551 NZ LYS B 26 -16.430 -1.967 17.680 1.00 86.67 N ATOM 2552 N GLY B 27 -17.604 -1.231 9.350 1.00 78.69 N ATOM 2553 CA GLY B 27 -18.047 -0.350 8.284 1.00 76.08 C ATOM 2554 C GLY B 27 -19.442 -0.656 7.778 1.00 74.71 C ATOM 2555 O GLY B 27 -20.362 0.134 8.002 1.00 74.43 O ATOM 2556 N PRO B 28 -19.636 -1.790 7.079 1.00 74.00 N ATOM 2557 CA PRO B 28 -20.959 -2.156 6.557 1.00 74.28 C ATOM 2558 C PRO B 28 -22.015 -2.259 7.657 1.00 75.07 C ATOM 2559 O PRO B 28 -23.216 -2.156 7.407 1.00 74.60 O ATOM 2560 CB PRO B 28 -20.692 -3.489 5.864 1.00 73.18 C ATOM 2561 CG PRO B 28 -19.294 -3.321 5.359 1.00 71.70 C ATOM 2562 CD PRO B 28 -18.602 -2.702 6.557 1.00 72.82 C ATOM 2563 N ALA B 29 -21.548 -2.455 8.881 1.00 77.64 N ATOM 2564 CA ALA B 29 -22.432 -2.571 10.033 1.00 79.22 C ATOM 2565 C ALA B 29 -23.087 -1.235 10.361 1.00 79.71 C ATOM 2566 O ALA B 29 -24.261 -1.188 10.751 1.00 81.37 O ATOM 2567 CB ALA B 29 -21.647 -3.086 11.243 1.00 79.86 C ATOM 2568 N ALA B 30 -22.329 -0.154 10.203 1.00 78.80 N ATOM 2569 CA ALA B 30 -22.844 1.180 10.487 1.00 78.62 C ATOM 2570 C ALA B 30 -23.629 1.698 9.289 1.00 78.68 C ATOM 2571 O ALA B 30 -24.682 2.328 9.436 1.00 78.63 O ATOM 2572 CB ALA B 30 -21.696 2.127 10.808 1.00 78.52 C ATOM 2573 N LEU B 31 -23.114 1.415 8.099 1.00 78.19 N ATOM 2574 CA LEU B 31 -23.755 1.865 6.873 1.00 77.40 C ATOM 2575 C LEU B 31 -25.171 1.335 6.726 1.00 77.67 C ATOM 2576 O LEU B 31 -26.067 2.077 6.345 1.00 77.84 O ATOM 2577 CB LEU B 31 -22.906 1.472 5.659 1.00 75.60 C ATOM 2578 CG LEU B 31 -21.718 2.401 5.388 1.00 73.01 C ATOM 2579 CD1 LEU B 31 -20.538 1.623 4.892 1.00 70.89 C ATOM 2580 CD2 LEU B 31 -22.137 3.449 4.386 
1.00 71.89 C ATOM 2581 N ARG B 32 -25.385 0.059 7.018 1.00 78.94 N ATOM 2582 CA ARG B 32 -26.733 -0.481 6.898 1.00 80.72 C ATOM 2583 C ARG B 32 -27.595 0.164 7.969 1.00 81.47 C ATOM 2584 O ARG B 32 -28.751 0.492 7.723 1.00 80.36 O ATOM 2585 CB ARG B 32 -26.754 -2.005 7.081 1.00 82.03 C ATOM 2586 CG ARG B 32 -26.334 -2.830 5.866 1.00 82.49 C ATOM 2587 CD ARG B 32 -26.825 -4.281 5.994 1.00 84.29 C ATOM 2588 NE ARG B 32 -26.221 -5.185 5.016 1.00 85.70 N ATOM 2589 CZ ARG B 32 -24.933 -5.524 5.006 1.00 86.70 C ATOM 2590 NH1 ARG B 32 -24.109 -5.038 5.926 1.00 87.41 N ATOM 2591 NH2 ARG B 32 -24.464 -6.349 4.077 1.00 86.01 N ATOM 2592 N LYS B 33 -27.011 0.330 9.156 1.00 82.85 N ATOM 2593 CA LYS B 33 -27.686 0.936 10.294 1.00 84.50 C ATOM 2594 C LYS B 33 -28.087 2.372 9.964 1.00 85.48 C ATOM 2595 O LYS B 33 -29.032 2.912 10.538 1.00 85.93 O ATOM 2596 CB LYS B 33 -26.771 0.910 11.521 1.00 85.61 C ATOM 2597 CG LYS B 33 -27.233 1.804 12.664 1.00 88.84 C ATOM 2598 CD LYS B 33 -28.564 1.351 13.275 1.00 91.56 C ATOM 2599 CE LYS B 33 -28.357 0.362 14.427 1.00 93.10 C ATOM 2600 NZ LYS B 33 -27.506 0.929 15.530 1.00 93.72 N ATOM 2601 N ALA B 34 -27.363 2.998 9.044 1.00 86.44 N ATOM 2602 CA ALA B 34 -27.693 4.362 8.642 1.00 86.60 C ATOM 2603 C ALA B 34 -28.803 4.256 7.589 1.00 86.88 C ATOM 2604 O ALA B 34 -29.154 5.227 6.915 1.00 85.52 O ATOM 2605 CB ALA B 34 -26.463 5.054 8.066 1.00 87.03 C ATOM 2606 N GLY B 35 -29.331 3.042 7.457 1.00 88.26 N ATOM 2607 CA GLY B 35 -30.408 2.769 6.523 1.00 89.33 C ATOM 2608 C GLY B 35 -30.035 2.799 5.061 1.00 89.25 C ATOM 2609 O GLY B 35 -30.891 3.030 4.207 1.00 89.47 O ATOM 2610 N LEU B 36 -28.770 2.554 4.757 1.00 89.38 N ATOM 2611 CA LEU B 36 -28.334 2.588 3.369 1.00 90.16 C ATOM 2612 C LEU B 36 -29.188 1.729 2.436 1.00 90.88 C ATOM 2613 O LEU B 36 -29.517 2.144 1.320 1.00 90.76 O ATOM 2614 CB LEU B 36 -26.860 2.178 3.266 1.00 89.73 C ATOM 2615 CG LEU B 36 -26.212 2.147 1.875 1.00 89.10 C ATOM 2616 CD1 LEU B 36 -26.658 3.331 1.032 1.00 89.09 C 
ATOM 2617 CD2 LEU B 36 -24.703 2.145 2.044 1.00 88.91 C ATOM 2618 N VAL B 37 -29.573 0.543 2.892 1.00 91.52 N ATOM 2619 CA VAL B 37 -30.365 -0.339 2.044 1.00 92.05 C ATOM 2620 C VAL B 37 -31.690 0.288 1.583 1.00 93.18 C ATOM 2621 O VAL B 37 -31.909 0.481 0.382 1.00 93.84 O ATOM 2622 CB VAL B 37 -30.654 -1.672 2.754 1.00 90.22 C ATOM 2623 CG1 VAL B 37 -30.917 -2.747 1.721 1.00 88.34 C ATOM 2624 CG2 VAL B 37 -29.486 -2.053 3.644 1.00 89.51 C ATOM 2625 N GLU B 38 -32.561 0.613 2.535 1.00 93.67 N ATOM 2626 CA GLU B 38 -33.869 1.203 2.237 1.00 93.22 C ATOM 2627 C GLU B 38 -33.811 2.432 1.327 1.00 91.50 C ATOM 2628 O GLU B 38 -34.481 2.476 0.291 1.00 89.54 O ATOM 2629 CB GLU B 38 -34.593 1.577 3.543 1.00 95.27 C ATOM 2630 CG GLU B 38 -34.719 0.433 4.549 1.00 97.90 C ATOM 2631 CD GLU B 38 -33.448 0.211 5.357 1.00 98.72 C ATOM 2632 OE1 GLU B 38 -33.226 0.947 6.345 1.00 98.61 O ATOM 2633 OE2 GLU B 38 -32.668 -0.694 4.997 1.00100.35 O ATOM 2634 N LYS B 39 -33.021 3.429 1.717 1.00 89.99 N ATOM 2635 CA LYS B 39 -32.914 4.639 0.919 1.00 89.24 C ATOM 2636 C LYS B 39 -32.602 4.253 -0.511 1.00 89.66 C ATOM 2637 O LYS B 39 -33.175 4.803 -1.453 1.00 90.45 O ATOM 2638 CB LYS B 39 -31.817 5.548 1.454 1.00 87.39 C ATOM 2639 CG LYS B 39 -32.060 6.027 2.852 1.00 86.93 C ATOM 2640 CD LYS B 39 -30.919 6.899 3.311 1.00 87.74 C ATOM 2641 CE LYS B 39 -31.090 7.281 4.767 1.00 88.21 C ATOM 2642 NZ LYS B 39 -32.345 8.049 4.956 1.00 89.64 N ATOM 2643 N LEU B 40 -31.700 3.293 -0.675 1.00 89.66 N ATOM 2644 CA LEU B 40 -31.336 2.849 -2.012 1.00 89.32 C ATOM 2645 C LEU B 40 -32.581 2.353 -2.737 1.00 90.45 C ATOM 2646 O LEU B 40 -32.783 2.652 -3.913 1.00 89.95 O ATOM 2647 CB LEU B 40 -30.263 1.751 -1.944 1.00 85.82 C ATOM 2648 CG LEU B 40 -28.810 2.245 -2.052 1.00 82.28 C ATOM 2649 CD1 LEU B 40 -27.862 1.139 -1.665 1.00 82.14 C ATOM 2650 CD2 LEU B 40 -28.531 2.712 -3.464 1.00 80.23 C ATOM 2651 N LYS B 41 -33.429 1.617 -2.025 1.00 92.70 N ATOM 2652 CA LYS B 41 -34.652 1.088 -2.623 1.00 95.26 C ATOM 
2653 C LYS B 41 -35.523 2.223 -3.149 1.00 96.02 C ATOM 2654 O LYS B 41 -36.234 2.068 -4.144 1.00 96.02 O ATOM 2655 CB LYS B 41 -35.429 0.256 -1.600 1.00 95.44 C ATOM 2656 CG LYS B 41 -34.687 -0.993 -1.131 1.00 97.67 C ATOM 2657 CD LYS B 41 -35.349 -1.610 0.097 1.00 98.54 C ATOM 2658 CE LYS B 41 -34.476 -2.690 0.724 1.00 97.92 C ATOM 2659 NZ LYS B 41 -35.049 -3.185 2.006 1.00 96.37 N ATOM 2660 N GLU B 42 -35.449 3.369 -2.485 1.00 97.23 N ATOM 2661 CA GLU B 42 -36.226 4.532 -2.877 1.00 99.27 C ATOM 2662 C GLU B 42 -35.924 5.037 -4.281 1.00100.29 C ATOM 2663 O GLU B 42 -36.577 5.960 -4.767 1.00 99.46 O ATOM 2664 CB GLU B 42 -35.995 5.662 -1.888 1.00100.72 C ATOM 2665 CG GLU B 42 -36.719 5.495 -0.580 1.00104.56 C ATOM 2666 CD GLU B 42 -36.385 6.612 0.394 1.00107.81 C ATOM 2667 OE1 GLU B 42 -36.179 7.759 -0.073 1.00108.24 O ATOM 2668 OE2 GLU B 42 -36.339 6.344 1.618 1.00110.21 O ATOM 2669 N THR B 43 -34.931 4.452 -4.935 1.00102.51 N ATOM 2670 CA THR B 43 -34.588 4.885 -6.285 1.00104.86 C ATOM 2671 C THR B 43 -35.298 4.039 -7.335 1.00106.85 C ATOM 2672 O THR B 43 -36.240 3.294 -7.029 1.00107.03 O ATOM 2673 CB THR B 43 -33.067 4.798 -6.551 1.00104.47 C ATOM 2674 OG1 THR B 43 -32.617 3.454 -6.336 1.00104.26 O ATOM 2675 CG2 THR B 43 -32.311 5.743 -5.633 1.00103.94 C ATOM 2676 N GLU B 44 -34.837 4.177 -8.578 1.00108.54 N ATOM 2677 CA GLU B 44 -35.378 3.435 -9.713 1.00109.37 C ATOM 2678 C GLU B 44 -34.661 2.080 -9.853 1.00109.21 C ATOM 2679 O GLU B 44 -34.913 1.317 -10.793 1.00109.70 O ATOM 2680 CB GLU B 44 -35.211 4.265 -10.992 1.00109.72 C ATOM 2681 CG GLU B 44 -33.917 5.072 -11.032 1.00111.43 C ATOM 2682 CD GLU B 44 -33.661 5.713 -12.387 1.00112.66 C ATOM 2683 OE1 GLU B 44 -34.597 6.332 -12.942 1.00113.58 O ATOM 2684 OE2 GLU B 44 -32.520 5.604 -12.896 1.00112.10 O ATOM 2685 N TYR B 45 -33.774 1.791 -8.901 1.00108.44 N ATOM 2686 CA TYR B 45 -33.006 0.549 -8.885 1.00107.04 C ATOM 2687 C TYR B 45 -33.569 -0.456 -7.877 1.00106.43 C ATOM 2688 O TYR B 45 -34.057 -0.075 -6.805 
1.00106.41 O ATOM 2689 CB TYR B 45 -31.546 0.828 -8.516 1.00106.81 C ATOM 2690 CG TYR B 45 -30.755 1.645 -9.514 1.00106.18 C ATOM 2691 CD1 TYR B 45 -31.059 2.983 -9.762 1.00106.02 C ATOM 2692 CD2 TYR B 45 -29.679 1.083 -10.192 1.00106.09 C ATOM 2693 CE1 TYR B 45 -30.301 3.739 -10.667 1.00106.12 C ATOM 2694 CE2 TYR B 45 -28.920 1.823 -11.095 1.00105.75 C ATOM 2695 CZ TYR B 45 -29.232 3.146 -11.328 1.00105.62 C ATOM 2696 OH TYR B 45 -28.477 3.865 -12.227 1.00104.68 O ATOM 2697 N ASN B 46 -33.501 -1.738 -8.227 1.00105.11 N ATOM 2698 CA ASN B 46 -33.965 -2.795 -7.337 1.00103.37 C ATOM 2699 C ASN B 46 -32.733 -3.144 -6.535 1.00101.06 C ATOM 2700 O ASN B 46 -31.648 -3.290 -7.096 1.00 99.33 O ATOM 2701 CB ASN B 46 -34.426 -4.021 -8.124 1.00106.07 C ATOM 2702 CG ASN B 46 -35.540 -3.704 -9.094 1.00108.03 C ATOM 2703 OD1 ASN B 46 -36.562 -3.126 -8.717 1.00109.57 O ATOM 2704 ND2 ASN B 46 -35.351 -4.085 -10.355 1.00108.96 N ATOM 2705 N VAL B 47 -32.891 -3.292 -5.229 1.00 99.24 N ATOM 2706 CA VAL B 47 -31.735 -3.579 -4.399 1.00 98.07 C ATOM 2707 C VAL B 47 -31.816 -4.896 -3.647 1.00 96.62 C ATOM 2708 O VAL B 47 -32.836 -5.213 -3.039 1.00 96.63 O ATOM 2709 CB VAL B 47 -31.506 -2.440 -3.370 1.00 98.47 C ATOM 2710 CG1 VAL B 47 -30.065 -2.461 -2.879 1.00 97.35 C ATOM 2711 CG2 VAL B 47 -31.840 -1.094 -3.995 1.00 98.39 C ATOM 2712 N ARG B 48 -30.728 -5.656 -3.697 1.00 94.83 N ATOM 2713 CA ARG B 48 -30.644 -6.923 -2.997 1.00 93.46 C ATOM 2714 C ARG B 48 -29.428 -6.849 -2.099 1.00 92.09 C ATOM 2715 O ARG B 48 -28.321 -6.586 -2.568 1.00 90.53 O ATOM 2716 CB ARG B 48 -30.494 -8.084 -3.984 1.00 95.23 C ATOM 2717 CG ARG B 48 -30.293 -9.452 -3.318 1.00 96.76 C ATOM 2718 CD ARG B 48 -30.031 -10.550 -4.347 1.00 98.94 C ATOM 2719 NE ARG B 48 -28.943 -11.459 -3.959 1.00102.01 N ATOM 2720 CZ ARG B 48 -29.040 -12.440 -3.059 1.00102.83 C ATOM 2721 NH1 ARG B 48 -30.187 -12.667 -2.429 1.00103.67 N ATOM 2722 NH2 ARG B 48 -27.980 -13.198 -2.792 1.00103.04 N ATOM 2723 N ASP B 49 -29.646 -7.051 -0.802 
1.00 91.94 N ATOM 2724 CA ASP B 49 -28.562 -7.025 0.169 1.00 92.13 C ATOM 2725 C ASP B 49 -28.067 -8.466 0.266 1.00 92.71 C ATOM 2726 O ASP B 49 -28.838 -9.376 0.572 1.00 93.23 O ATOM 2727 CB ASP B 49 -29.076 -6.542 1.529 1.00 92.41 C ATOM 2728 CG ASP B 49 -27.949 -6.205 2.496 1.00 93.21 C ATOM 2729 OD1 ASP B 49 -27.157 -7.111 2.838 1.00 93.93 O ATOM 2730 OD2 ASP B 49 -27.846 -5.032 2.921 1.00 94.25 O ATOM 2731 N HIS B 50 -26.781 -8.661 -0.012 1.00 93.35 N ATOM 2732 CA HIS B 50 -26.138 -9.979 0.001 1.00 93.56 C ATOM 2733 C HIS B 50 -25.623 -10.354 1.406 1.00 93.08 C ATOM 2734 O HIS B 50 -25.284 -11.511 1.670 1.00 92.89 O ATOM 2735 CB HIS B 50 -24.999 -9.953 -1.039 1.00 94.71 C ATOM 2736 CG HIS B 50 -24.286 -11.257 -1.244 1.00 95.99 C ATOM 2737 ND1 HIS B 50 -23.597 -11.904 -0.234 1.00 97.39 N ATOM 2738 CD2 HIS B 50 -24.067 -11.985 -2.366 1.00 95.59 C ATOM 2739 CE1 HIS B 50 -22.985 -12.965 -0.726 1.00 96.73 C ATOM 2740 NE2 HIS B 50 -23.253 -13.037 -2.020 1.00 96.87 N ATOM 2741 N GLY B 51 -25.591 -9.381 2.311 1.00 92.46 N ATOM 2742 CA GLY B 51 -25.118 -9.654 3.652 1.00 92.91 C ATOM 2743 C GLY B 51 -23.620 -9.446 3.744 1.00 93.76 C ATOM 2744 O GLY B 51 -23.006 -8.892 2.828 1.00 93.36 O ATOM 2745 N ASP B 52 -23.023 -9.900 4.844 1.00 94.72 N ATOM 2746 CA ASP B 52 -21.584 -9.746 5.052 1.00 95.39 C ATOM 2747 C ASP B 52 -20.746 -10.990 4.876 1.00 95.22 C ATOM 2748 O ASP B 52 -20.784 -11.895 5.701 1.00 94.37 O ATOM 2749 CB ASP B 52 -21.285 -9.192 6.445 1.00 96.23 C ATOM 2750 CG ASP B 52 -21.482 -7.707 6.524 1.00 96.65 C ATOM 2751 OD1 ASP B 52 -21.102 -7.013 5.557 1.00 97.76 O ATOM 2752 OD2 ASP B 52 -22.006 -7.235 7.554 1.00 97.38 O ATOM 2753 N LEU B 53 -19.965 -11.014 3.808 1.00 96.11 N ATOM 2754 CA LEU B 53 -19.087 -12.136 3.557 1.00 97.52 C ATOM 2755 C LEU B 53 -18.264 -12.405 4.825 1.00 99.27 C ATOM 2756 O LEU B 53 -17.331 -11.663 5.130 1.00100.25 O ATOM 2757 CB LEU B 53 -18.141 -11.819 2.390 1.00 95.74 C ATOM 2758 CG LEU B 53 -18.739 -11.495 1.019 1.00 94.64 C ATOM 2759 
CD1 LEU B 53 -17.609 -11.386 0.013 1.00 93.53 C ATOM 2760 CD2 LEU B 53 -19.718 -12.585 0.594 1.00 94.55 C ATOM 2761 N ALA B 54 -18.621 -13.446 5.574 1.00101.13 N ATOM 2762 CA ALA B 54 -17.870 -13.791 6.777 1.00102.37 C ATOM 2763 C ALA B 54 -16.517 -14.270 6.273 1.00103.74 C ATOM 2764 O ALA B 54 -16.441 -15.040 5.316 1.00102.75 O ATOM 2765 CB ALA B 54 -18.577 -14.896 7.547 1.00101.69 C ATOM 2766 N PHE B 55 -15.447 -13.807 6.901 1.00106.29 N ATOM 2767 CA PHE B 55 -14.124 -14.199 6.449 1.00109.42 C ATOM 2768 C PHE B 55 -13.380 -15.099 7.420 1.00111.23 C ATOM 2769 O PHE B 55 -13.043 -14.693 8.535 1.00111.53 O ATOM 2770 CB PHE B 55 -13.281 -12.957 6.139 1.00110.05 C ATOM 2771 CG PHE B 55 -13.849 -12.092 5.044 1.00109.86 C ATOM 2772 CD1 PHE B 55 -14.081 -12.615 3.773 1.00109.98 C ATOM 2773 CD2 PHE B 55 -14.166 -10.762 5.287 1.00109.37 C ATOM 2774 CE1 PHE B 55 -14.620 -11.826 2.761 1.00109.69 C ATOM 2775 CE2 PHE B 55 -14.703 -9.965 4.285 1.00109.52 C ATOM 2776 CZ PHE B 55 -14.932 -10.501 3.018 1.00109.89 C ATOM 2777 N VAL B 56 -13.137 -16.327 6.971 1.00113.41 N ATOM 2778 CA VAL B 56 -12.421 -17.342 7.738 1.00114.90 C ATOM 2779 C VAL B 56 -11.007 -16.830 7.998 1.00115.83 C ATOM 2780 O VAL B 56 -10.327 -16.387 7.068 1.00115.87 O ATOM 2781 CB VAL B 56 -12.348 -18.666 6.939 1.00115.37 C ATOM 2782 CG1 VAL B 56 -13.756 -19.205 6.683 1.00114.66 C ATOM 2783 CG2 VAL B 56 -11.628 -18.434 5.609 1.00115.29 C ATOM 2784 N ASP B 57 -10.553 -16.897 9.247 1.00116.87 N ATOM 2785 CA ASP B 57 -9.225 -16.387 9.559 1.00118.41 C ATOM 2786 C ASP B 57 -8.081 -17.375 9.357 1.00119.01 C ATOM 2787 O ASP B 57 -8.134 -18.512 9.830 1.00118.81 O ATOM 2788 CB ASP B 57 -9.186 -15.840 10.986 1.00118.79 C ATOM 2789 CG ASP B 57 -8.296 -14.612 11.106 1.00119.33 C ATOM 2790 OD1 ASP B 57 -7.092 -14.715 10.786 1.00118.72 O ATOM 2791 OD2 ASP B 57 -8.809 -13.543 11.511 1.00119.85 O ATOM 2792 N VAL B 58 -7.048 -16.911 8.653 1.00119.62 N ATOM 2793 CA VAL B 58 -5.861 -17.707 8.349 1.00119.95 C ATOM 2794 C VAL B 58 
-4.935 -17.864 9.555 1.00120.08 C ATOM 2795 O VAL B 58 -4.138 -16.971 9.868 1.00120.20 O ATOM 2796 CB VAL B 58 -5.051 -17.081 7.184 1.00120.15 C ATOM 2797 CG1 VAL B 58 -3.743 -17.844 6.974 1.00119.13 C ATOM 2798 CG2 VAL B 58 -5.886 -17.095 5.911 1.00120.11 C ATOM 2799 N PRO B 59 -5.023 -19.018 10.240 1.00119.88 N ATOM 2800 CA PRO B 59 -4.184 -19.281 11.413 1.00118.69 C ATOM 2801 C PRO B 59 -2.714 -19.149 11.062 1.00117.26 C ATOM 2802 O PRO B 59 -2.269 -19.611 10.013 1.00116.68 O ATOM 2803 CB PRO B 59 -4.572 -20.705 11.802 1.00119.27 C ATOM 2804 CG PRO B 59 -4.961 -21.321 10.480 1.00119.94 C ATOM 2805 CD PRO B 59 -5.782 -20.224 9.857 1.00119.85 C ATOM 2806 N ASN B 60 -1.966 -18.511 11.949 1.00116.37 N ATOM 2807 CA ASN B 60 -0.544 -18.304 11.733 1.00116.10 C ATOM 2808 C ASN B 60 -0.278 -17.715 10.345 1.00115.83 C ATOM 2809 O ASN B 60 0.162 -18.413 9.425 1.00115.68 O ATOM 2810 CB ASN B 60 0.222 -19.623 11.909 1.00115.54 C ATOM 2811 CG ASN B 60 1.735 -19.443 11.809 1.00115.47 C ATOM 2812 OD1 ASN B 60 2.318 -18.583 12.470 1.00114.68 O ATOM 2813 ND2 ASN B 60 2.375 -20.264 10.981 1.00115.65 N ATOM 2814 N ASP B 61 -0.571 -16.426 10.203 1.00115.14 N ATOM 2815 CA ASP B 61 -0.344 -15.709 8.956 1.00114.13 C ATOM 2816 C ASP B 61 1.005 -15.006 9.162 1.00113.89 C ATOM 2817 O ASP B 61 1.153 -14.177 10.066 1.00113.46 O ATOM 2818 CB ASP B 61 -1.464 -14.685 8.734 1.00113.36 C ATOM 2819 CG ASP B 61 -1.547 -14.214 7.302 1.00112.30 C ATOM 2820 OD1 ASP B 61 -0.485 -14.040 6.676 1.00111.35 O ATOM 2821 OD2 ASP B 61 -2.673 -14.008 6.805 1.00111.82 O ATOM 2822 N SER B 62 1.989 -15.346 8.334 1.00113.79 N ATOM 2823 CA SER B 62 3.331 -14.774 8.465 1.00113.65 C ATOM 2824 C SER B 62 3.589 -13.495 7.655 1.00112.82 C ATOM 2825 O SER B 62 3.147 -13.364 6.509 1.00112.49 O ATOM 2826 CB SER B 62 4.378 -15.835 8.094 1.00114.15 C ATOM 2827 OG SER B 62 4.209 -17.012 8.871 1.00113.73 O ATOM 2828 N PRO B 63 4.318 -12.533 8.253 1.00111.99 N ATOM 2829 CA PRO B 63 4.669 -11.242 7.641 1.00111.53 C ATOM 2830 C 
PRO B 63 5.680 -11.293 6.484 1.00110.84 C ATOM 2831 O PRO B 63 6.865 -11.034 6.687 1.00111.39 O ATOM 2832 CB PRO B 63 5.209 -10.434 8.825 1.00111.29 C ATOM 2833 CG PRO B 63 4.467 -11.009 9.989 1.00111.54 C ATOM 2834 CD PRO B 63 4.555 -12.490 9.707 1.00111.51 C ATOM 2835 N PHE B 64 5.199 -11.612 5.283 1.00109.81 N ATOM 2836 CA PHE B 64 6.023 -11.683 4.069 1.00109.33 C ATOM 2837 C PHE B 64 7.020 -10.512 3.999 1.00108.56 C ATOM 2838 O PHE B 64 6.756 -9.505 3.344 1.00108.37 O ATOM 2839 CB PHE B 64 5.090 -11.663 2.846 1.00110.08 C ATOM 2840 CG PHE B 64 5.794 -11.761 1.509 1.00110.63 C ATOM 2841 CD1 PHE B 64 6.332 -12.966 1.069 1.00110.93 C ATOM 2842 CD2 PHE B 64 5.859 -10.658 0.660 1.00110.60 C ATOM 2843 CE1 PHE B 64 6.914 -13.071 -0.197 1.00110.88 C ATOM 2844 CE2 PHE B 64 6.439 -10.755 -0.606 1.00110.13 C ATOM 2845 CZ PHE B 64 6.965 -11.962 -1.033 1.00110.58 C ATOM 2846 N GLN B 65 8.165 -10.656 4.669 1.00108.06 N ATOM 2847 CA GLN B 65 9.202 -9.618 4.718 1.00107.81 C ATOM 2848 C GLN B 65 8.752 -8.467 5.624 1.00107.14 C ATOM 2849 O GLN B 65 9.077 -8.431 6.813 1.00107.01 O ATOM 2850 CB GLN B 65 9.494 -9.059 3.322 1.00108.70 C ATOM 2851 CG GLN B 65 9.783 -10.105 2.258 1.00110.56 C ATOM 2852 CD GLN B 65 10.194 -9.486 0.927 1.00110.91 C ATOM 2853 OE1 GLN B 65 11.208 -8.791 0.844 1.00110.88 O ATOM 2854 NE2 GLN B 65 9.406 -9.735 -0.117 1.00110.95 N ATOM 2855 N ILE B 66 8.005 -7.529 5.037 1.00106.12 N ATOM 2856 CA ILE B 66 7.464 -6.361 5.740 1.00103.69 C ATOM 2857 C ILE B 66 5.932 -6.466 5.746 1.00101.94 C ATOM 2858 O ILE B 66 5.283 -6.167 6.755 1.00100.49 O ATOM 2859 CB ILE B 66 7.902 -5.031 5.042 1.00103.32 C ATOM 2860 CG1 ILE B 66 8.836 -4.240 5.956 1.00102.53 C ATOM 2861 CG2 ILE B 66 6.698 -4.194 4.672 1.00103.34 C ATOM 2862 CD1 ILE B 66 10.204 -4.872 6.122 1.00102.55 C ATOM 2863 N VAL B 67 5.379 -6.905 4.611 1.00100.15 N ATOM 2864 CA VAL B 67 3.937 -7.078 4.435 1.00 98.05 C ATOM 2865 C VAL B 67 3.328 -7.982 5.500 1.00 97.96 C ATOM 2866 O VAL B 67 3.798 -9.094 5.724 1.00 
98.49 O ATOM 2867 CB VAL B 67 3.612 -7.666 3.052 1.00 96.47 C ATOM 2868 CG1 VAL B 67 2.097 -7.774 2.880 1.00 95.16 C ATOM 2869 CG2 VAL B 67 4.244 -6.807 1.966 1.00 94.29 C ATOM 2870 N LYS B 68 2.266 -7.510 6.137 1.00 97.30 N ATOM 2871 CA LYS B 68 1.631 -8.270 7.200 1.00 97.06 C ATOM 2872 C LYS B 68 0.330 -8.903 6.767 1.00 97.44 C ATOM 2873 O LYS B 68 -0.150 -8.644 5.665 1.00 97.46 O ATOM 2874 CB LYS B 68 1.411 -7.349 8.394 1.00 95.66 C ATOM 2875 CG LYS B 68 2.617 -6.479 8.603 1.00 94.66 C ATOM 2876 CD LYS B 68 2.518 -5.601 9.814 1.00 94.77 C ATOM 2877 CE LYS B 68 3.730 -4.680 9.854 1.00 95.37 C ATOM 2878 NZ LYS B 68 4.998 -5.420 9.545 1.00 93.99 N ATOM 2879 N ASN B 69 -0.210 -9.754 7.638 1.00 98.69 N ATOM 2880 CA ASN B 69 -1.471 -10.473 7.413 1.00100.55 C ATOM 2881 C ASN B 69 -1.765 -10.753 5.929 1.00101.33 C ATOM 2882 O ASN B 69 -2.896 -10.574 5.471 1.00100.98 O ATOM 2883 CB ASN B 69 -2.628 -9.666 8.016 1.00101.27 C ATOM 2884 CG ASN B 69 -2.409 -9.321 9.489 1.00101.35 C ATOM 2885 OD1 ASN B 69 -2.828 -10.064 10.383 1.00101.81 O ATOM 2886 ND2 ASN B 69 -1.747 -8.191 9.742 1.00100.63 N ATOM 2887 N PRO B 70 -0.756 -11.227 5.172 1.00102.26 N ATOM 2888 CA PRO B 70 -0.893 -11.523 3.740 1.00102.62 C ATOM 2889 C PRO B 70 -1.942 -12.554 3.357 1.00102.96 C ATOM 2890 O PRO B 70 -2.875 -12.251 2.611 1.00103.01 O ATOM 2891 CB PRO B 70 0.515 -11.956 3.346 1.00103.21 C ATOM 2892 CG PRO B 70 1.001 -12.633 4.577 1.00103.31 C ATOM 2893 CD PRO B 70 0.553 -11.691 5.669 1.00102.72 C ATOM 2894 N ARG B 71 -1.777 -13.775 3.850 1.00103.42 N ATOM 2895 CA ARG B 71 -2.715 -14.848 3.551 1.00104.26 C ATOM 2896 C ARG B 71 -4.150 -14.409 3.840 1.00104.02 C ATOM 2897 O ARG B 71 -5.079 -14.729 3.085 1.00104.32 O ATOM 2898 CB ARG B 71 -2.352 -16.082 4.373 1.00105.61 C ATOM 2899 CG ARG B 71 -0.947 -16.572 4.088 1.00107.38 C ATOM 2900 CD ARG B 71 -0.532 -17.697 5.014 1.00109.01 C ATOM 2901 NE ARG B 71 0.815 -18.167 4.702 1.00109.72 N ATOM 2902 CZ ARG B 71 1.911 -17.420 4.811 1.00110.10 C ATOM 2903 NH1 
ARG B 71 1.821 -16.163 5.229 1.00110.13 N ATOM 2904 NH2 ARG B 71 3.093 -17.931 4.493 1.00109.92 N ATOM 2905 N SER B 72 -4.324 -13.672 4.932 1.00103.10 N ATOM 2906 CA SER B 72 -5.640 -13.171 5.311 1.00101.82 C ATOM 2907 C SER B 72 -6.235 -12.385 4.128 1.00101.36 C ATOM 2908 O SER B 72 -7.243 -12.788 3.532 1.00100.77 O ATOM 2909 CB SER B 72 -5.517 -12.258 6.544 1.00101.15 C ATOM 2910 OG SER B 72 -5.004 -12.954 7.669 1.00 99.62 O ATOM 2911 N VAL B 73 -5.577 -11.274 3.795 1.00100.71 N ATOM 2912 CA VAL B 73 -5.983 -10.385 2.710 1.00 99.43 C ATOM 2913 C VAL B 73 -6.020 -11.066 1.345 1.00 98.74 C ATOM 2914 O VAL B 73 -6.661 -10.578 0.417 1.00 98.50 O ATOM 2915 CB VAL B 73 -5.030 -9.176 2.616 1.00 99.57 C ATOM 2916 CG1 VAL B 73 -5.517 -8.223 1.544 1.00 99.47 C ATOM 2917 CG2 VAL B 73 -4.928 -8.474 3.971 1.00 98.64 C ATOM 2918 N GLY B 74 -5.323 -12.186 1.218 1.00 98.05 N ATOM 2919 CA GLY B 74 -5.321 -12.890 -0.047 1.00 98.09 C ATOM 2920 C GLY B 74 -6.568 -13.739 -0.201 1.00 98.36 C ATOM 2921 O GLY B 74 -7.164 -13.807 -1.281 1.00 97.94 O ATOM 2922 N LYS B 75 -6.971 -14.394 0.883 1.00 98.26 N ATOM 2923 CA LYS B 75 -8.149 -15.243 0.844 1.00 98.70 C ATOM 2924 C LYS B 75 -9.444 -14.457 1.011 1.00 98.41 C ATOM 2925 O LYS B 75 -10.488 -14.834 0.460 1.00 98.11 O ATOM 2926 CB LYS B 75 -8.068 -16.327 1.918 1.00 99.64 C ATOM 2927 CG LYS B 75 -9.326 -17.202 1.985 1.00101.06 C ATOM 2928 CD LYS B 75 -9.676 -17.792 0.613 1.00101.25 C ATOM 2929 CE LYS B 75 -11.048 -18.450 0.613 1.00100.94 C ATOM 2930 NZ LYS B 75 -11.431 -18.889 -0.758 1.00100.45 N ATOM 2931 N ALA B 76 -9.378 -13.371 1.780 1.00 98.25 N ATOM 2932 CA ALA B 76 -10.550 -12.529 2.008 1.00 97.31 C ATOM 2933 C ALA B 76 -10.929 -11.841 0.705 1.00 96.60 C ATOM 2934 O ALA B 76 -12.111 -11.713 0.385 1.00 95.57 O ATOM 2935 CB ALA B 76 -10.249 -11.488 3.082 1.00 96.93 C ATOM 2936 N ASN B 77 -9.908 -11.411 -0.039 1.00 97.00 N ATOM 2937 CA ASN B 77 -10.080 -10.732 -1.323 1.00 97.74 C ATOM 2938 C ASN B 77 -10.392 -11.725 -2.433 1.00 98.75 C 
ATOM 2939 O ASN B 77 -10.935 -11.352 -3.473 1.00 98.91 O ATOM 2940 CB ASN B 77 -8.819 -9.932 -1.678 1.00 97.23 C ATOM 2941 CG ASN B 77 -8.862 -8.497 -1.154 1.00 97.89 C ATOM 2942 OD1 ASN B 77 -9.524 -7.630 -1.732 1.00 96.57 O ATOM 2943 ND2 ASN B 77 -8.160 -8.246 -0.047 1.00 97.76 N ATOM 2944 N GLU B 78 -10.035 -12.988 -2.213 1.00 99.93 N ATOM 2945 CA GLU B 78 -10.317 -14.031 -3.191 1.00100.34 C ATOM 2946 C GLU B 78 -11.803 -14.362 -3.104 1.00 99.33 C ATOM 2947 O GLU B 78 -12.486 -14.519 -4.117 1.00 99.02 O ATOM 2948 CB GLU B 78 -9.495 -15.288 -2.902 1.00101.86 C ATOM 2949 CG GLU B 78 -9.892 -16.473 -3.772 1.00103.76 C ATOM 2950 CD GLU B 78 -9.003 -17.676 -3.562 1.00104.81 C ATOM 2951 OE1 GLU B 78 -8.873 -18.116 -2.399 1.00105.01 O ATOM 2952 OE2 GLU B 78 -8.442 -18.176 -4.563 1.00105.37 O ATOM 2953 N GLN B 79 -12.298 -14.465 -1.877 1.00 98.33 N ATOM 2954 CA GLN B 79 -13.699 -14.759 -1.650 1.00 98.14 C ATOM 2955 C GLN B 79 -14.591 -13.659 -2.227 1.00 98.36 C ATOM 2956 O GLN B 79 -15.672 -13.941 -2.759 1.00 99.37 O ATOM 2957 CB GLN B 79 -13.967 -14.903 -0.153 1.00 98.02 C ATOM 2958 CG GLN B 79 -15.447 -14.963 0.202 1.00 98.51 C ATOM 2959 CD GLN B 79 -15.683 -15.192 1.682 1.00 99.74 C ATOM 2960 OE1 GLN B 79 -16.827 -15.227 2.143 1.00100.42 O ATOM 2961 NE2 GLN B 79 -14.598 -15.356 2.436 1.00 99.69 N ATOM 2962 N LEU B 80 -14.127 -12.412 -2.127 1.00 97.08 N ATOM 2963 CA LEU B 80 -14.880 -11.254 -2.606 1.00 96.12 C ATOM 2964 C LEU B 80 -14.960 -11.151 -4.120 1.00 95.39 C ATOM 2965 O LEU B 80 -16.053 -11.140 -4.683 1.00 94.88 O ATOM 2966 CB LEU B 80 -14.289 -9.956 -2.022 1.00 96.94 C ATOM 2967 CG LEU B 80 -14.804 -8.575 -2.471 1.00 96.90 C ATOM 2968 CD1 LEU B 80 -16.311 -8.439 -2.284 1.00 96.64 C ATOM 2969 CD2 LEU B 80 -14.084 -7.512 -1.662 1.00 97.01 C ATOM 2970 N ALA B 81 -13.807 -11.064 -4.778 1.00 95.38 N ATOM 2971 CA ALA B 81 -13.772 -10.960 -6.238 1.00 95.32 C ATOM 2972 C ALA B 81 -14.819 -11.914 -6.801 1.00 95.15 C ATOM 2973 O ALA B 81 -15.620 -11.542 -7.665 1.00 
95.06 O ATOM 2974 CB ALA B 81 -12.375 -11.318 -6.771 1.00 94.39 C ATOM 2975 N ALA B 82 -14.809 -13.140 -6.277 1.00 94.53 N ATOM 2976 CA ALA B 82 -15.742 -14.184 -6.674 1.00 92.86 C ATOM 2977 C ALA B 82 -17.189 -13.679 -6.667 1.00 92.77 C ATOM 2978 O ALA B 82 -17.843 -13.624 -7.715 1.00 94.08 O ATOM 2979 CB ALA B 82 -15.603 -15.362 -5.738 1.00 91.55 C ATOM 2980 N VAL B 83 -17.691 -13.303 -5.492 1.00 90.57 N ATOM 2981 CA VAL B 83 -19.064 -12.820 -5.398 1.00 87.86 C ATOM 2982 C VAL B 83 -19.325 -11.692 -6.383 1.00 87.81 C ATOM 2983 O VAL B 83 -20.372 -11.646 -7.010 1.00 87.85 O ATOM 2984 CB VAL B 83 -19.400 -12.323 -3.985 1.00 85.44 C ATOM 2985 CG1 VAL B 83 -20.900 -12.059 -3.882 1.00 83.17 C ATOM 2986 CG2 VAL B 83 -18.939 -13.349 -2.947 1.00 84.36 C ATOM 2987 N VAL B 84 -18.361 -10.791 -6.521 1.00 88.67 N ATOM 2988 CA VAL B 84 -18.488 -9.654 -7.435 1.00 89.97 C ATOM 2989 C VAL B 84 -18.705 -10.093 -8.891 1.00 90.78 C ATOM 2990 O VAL B 84 -19.632 -9.636 -9.566 1.00 89.94 O ATOM 2991 CB VAL B 84 -17.230 -8.753 -7.356 1.00 89.27 C ATOM 2992 CG1 VAL B 84 -17.360 -7.584 -8.324 1.00 88.30 C ATOM 2993 CG2 VAL B 84 -17.032 -8.259 -5.928 1.00 88.68 C ATOM 2994 N ALA B 85 -17.841 -10.977 -9.374 1.00 92.01 N ATOM 2995 CA ALA B 85 -17.965 -11.468 -10.736 1.00 93.16 C ATOM 2996 C ALA B 85 -19.299 -12.193 -10.815 1.00 93.76 C ATOM 2997 O ALA B 85 -19.973 -12.190 -11.855 1.00 92.85 O ATOM 2998 CB ALA B 85 -16.814 -12.428 -11.062 1.00 92.48 C ATOM 2999 N GLU B 86 -19.690 -12.783 -9.687 1.00 94.93 N ATOM 3000 CA GLU B 86 -20.908 -13.570 -9.543 1.00 96.65 C ATOM 3001 C GLU B 86 -22.144 -12.729 -9.841 1.00 97.30 C ATOM 3002 O GLU B 86 -23.081 -13.193 -10.523 1.00 96.34 O ATOM 3003 CB GLU B 86 -20.996 -14.169 -8.138 1.00 97.43 C ATOM 3004 CG GLU B 86 -19.848 -15.102 -7.788 1.00101.11 C ATOM 3005 CD GLU B 86 -19.766 -16.298 -8.716 1.00103.56 C ATOM 3006 OE1 GLU B 86 -20.617 -16.405 -9.625 1.00104.28 O ATOM 3007 OE2 GLU B 86 -18.851 -17.129 -8.535 1.00104.59 O ATOM 3008 N THR B 87 -22.175 
-11.520 -9.307 1.00 98.79 N ATOM 3009 CA THR B 87 -23.286 -10.607 -9.490 1.00 99.86 C ATOM 3010 C THR B 87 -23.131 -9.988 -10.856 1.00101.87 C ATOM 3011 O THR B 87 -24.048 -10.031 -11.667 1.00102.78 O ATOM 3012 CB THR B 87 -23.256 -9.487 -8.456 1.00 99.41 C ATOM 3013 OG1 THR B 87 -22.797 -10.005 -7.200 1.00 99.80 O ATOM 3014 CG2 THR B 87 -24.639 -8.927 -8.265 1.00 99.73 C ATOM 3015 N GLN B 88 -21.956 -9.415 -11.102 1.00104.27 N ATOM 3016 CA GLN B 88 -21.641 -8.776 -12.381 1.00107.00 C ATOM 3017 C GLN B 88 -22.175 -9.534 -13.603 1.00107.90 C ATOM 3018 O GLN B 88 -22.584 -8.917 -14.595 1.00107.08 O ATOM 3019 CB GLN B 88 -20.124 -8.609 -12.523 1.00107.91 C ATOM 3020 CG GLN B 88 -19.560 -7.359 -11.872 1.00109.47 C ATOM 3021 CD GLN B 88 -20.247 -6.094 -12.361 1.00110.31 C ATOM 3022 OE1 GLN B 88 -20.739 -6.036 -13.492 1.00110.06 O ATOM 3023 NE2 GLN B 88 -20.271 -5.067 -11.513 1.00110.80 N ATOM 3024 N LYS B 89 -22.162 -10.868 -13.524 1.00109.56 N ATOM 3025 CA LYS B 89 -22.634 -11.730 -14.598 1.00110.90 C ATOM 3026 C LYS B 89 -24.151 -11.680 -14.716 1.00111.00 C ATOM 3027 O LYS B 89 -24.709 -11.708 -15.822 1.00110.95 O ATOM 3028 CB LYS B 89 -22.165 -13.168 -14.376 1.00111.68 C ATOM 3029 CG LYS B 89 -22.853 -13.872 -13.216 1.00113.68 C ATOM 3030 CD LYS B 89 -22.332 -15.289 -13.043 1.00116.52 C ATOM 3031 CE LYS B 89 -23.013 -15.990 -11.878 1.00117.23 C ATOM 3032 NZ LYS B 89 -22.513 -17.381 -11.697 1.00117.27 N ATOM 3033 N ASN B 90 -24.830 -11.613 -13.571 1.00110.80 N ATOM 3034 CA ASN B 90 -26.287 -11.534 -13.527 1.00110.60 C ATOM 3035 C ASN B 90 -26.738 -10.195 -14.086 1.00110.46 C ATOM 3036 O ASN B 90 -27.920 -9.988 -14.371 1.00110.57 O ATOM 3037 CB ASN B 90 -26.772 -11.671 -12.087 1.00110.18 C ATOM 3038 CG ASN B 90 -26.464 -13.024 -11.506 1.00109.88 C ATOM 3039 OD1 ASN B 90 -27.074 -14.018 -11.888 1.00110.06 O ATOM 3040 ND2 ASN B 90 -25.503 -13.079 -10.592 1.00109.05 N ATOM 3041 N GLY B 91 -25.781 -9.282 -14.222 1.00110.39 N ATOM 3042 CA GLY B 91 -26.071 -7.966 -14.759 
1.00109.77 C ATOM 3043 C GLY B 91 -26.295 -6.856 -13.743 1.00109.04 C ATOM 3044 O GLY B 91 -26.415 -5.692 -14.131 1.00109.23 O ATOM 3045 N THR B 92 -26.344 -7.187 -12.452 1.00107.90 N ATOM 3046 CA THR B 92 -26.580 -6.168 -11.430 1.00106.35 C ATOM 3047 C THR B 92 -25.316 -5.375 -11.090 1.00105.86 C ATOM 3048 O THR B 92 -24.196 -5.827 -11.361 1.00105.70 O ATOM 3049 CB THR B 92 -27.165 -6.790 -10.121 1.00105.45 C ATOM 3050 OG1 THR B 92 -26.129 -6.973 -9.151 1.00103.68 O ATOM 3051 CG2 THR B 92 -27.821 -8.137 -10.408 1.00104.19 C ATOM 3052 N ILE B 93 -25.510 -4.182 -10.520 1.00104.70 N ATOM 3053 CA ILE B 93 -24.406 -3.306 -10.117 1.00101.94 C ATOM 3054 C ILE B 93 -23.924 -3.683 -8.721 1.00100.46 C ATOM 3055 O ILE B 93 -24.716 -3.753 -7.774 1.00100.27 O ATOM 3056 CB ILE B 93 -24.833 -1.828 -10.081 1.00101.61 C ATOM 3057 CG1 ILE B 93 -25.398 -1.405 -11.435 1.00101.79 C ATOM 3058 CG2 ILE B 93 -23.642 -0.960 -9.712 1.00101.59 C ATOM 3059 CD1 ILE B 93 -24.430 -1.597 -12.578 1.00102.89 C ATOM 3060 N SER B 94 -22.624 -3.930 -8.608 1.00 98.22 N ATOM 3061 CA SER B 94 -22.023 -4.312 -7.342 1.00 96.48 C ATOM 3062 C SER B 94 -21.638 -3.086 -6.516 1.00 95.30 C ATOM 3063 O SER B 94 -21.013 -2.147 -7.034 1.00 94.98 O ATOM 3064 CB SER B 94 -20.783 -5.176 -7.589 1.00 96.70 C ATOM 3065 OG SER B 94 -19.898 -4.553 -8.510 1.00 96.79 O ATOM 3066 N VAL B 95 -22.026 -3.087 -5.239 1.00 92.94 N ATOM 3067 CA VAL B 95 -21.714 -1.981 -4.332 1.00 89.32 C ATOM 3068 C VAL B 95 -21.128 -2.534 -3.037 1.00 87.17 C ATOM 3069 O VAL B 95 -21.855 -2.964 -2.137 1.00 86.40 O ATOM 3070 CB VAL B 95 -22.959 -1.153 -4.010 1.00 88.54 C ATOM 3071 CG1 VAL B 95 -22.531 0.205 -3.458 1.00 88.25 C ATOM 3072 CG2 VAL B 95 -23.808 -0.988 -5.258 1.00 87.16 C ATOM 3073 N VAL B 96 -19.802 -2.490 -2.958 1.00 85.04 N ATOM 3074 CA VAL B 96 -19.058 -3.018 -1.828 1.00 83.33 C ATOM 3075 C VAL B 96 -18.852 -2.116 -0.619 1.00 81.90 C ATOM 3076 O VAL B 96 -17.957 -1.271 -0.613 1.00 81.87 O ATOM 3077 CB VAL B 96 -17.664 -3.498 -2.283 
1.00 83.38 C ATOM 3078 CG1 VAL B 96 -16.938 -4.180 -1.129 1.00 83.72 C ATOM 3079 CG2 VAL B 96 -17.806 -4.436 -3.459 1.00 83.57 C ATOM 3080 N LEU B 97 -19.685 -2.302 0.404 1.00 80.03 N ATOM 3081 CA LEU B 97 -19.536 -1.557 1.648 1.00 76.82 C ATOM 3082 C LEU B 97 -18.270 -2.202 2.175 1.00 75.41 C ATOM 3083 O LEU B 97 -18.115 -3.412 2.044 1.00 74.63 O ATOM 3084 CB LEU B 97 -20.684 -1.854 2.610 1.00 75.09 C ATOM 3085 CG LEU B 97 -22.097 -1.584 2.096 1.00 75.32 C ATOM 3086 CD1 LEU B 97 -23.065 -1.659 3.262 1.00 72.55 C ATOM 3087 CD2 LEU B 97 -22.153 -0.216 1.434 1.00 74.86 C ATOM 3088 N GLY B 98 -17.361 -1.434 2.761 1.00 74.85 N ATOM 3089 CA GLY B 98 -16.153 -2.080 3.215 1.00 74.16 C ATOM 3090 C GLY B 98 -15.483 -1.782 4.535 1.00 74.60 C ATOM 3091 O GLY B 98 -15.998 -1.109 5.427 1.00 73.56 O ATOM 3092 N GLY B 99 -14.309 -2.384 4.634 1.00 75.85 N ATOM 3093 CA GLY B 99 -13.442 -2.269 5.776 1.00 78.40 C ATOM 3094 C GLY B 99 -12.306 -1.598 5.051 1.00 80.54 C ATOM 3095 O GLY B 99 -12.579 -0.833 4.122 1.00 80.94 O ATOM 3096 N ASP B 100 -11.058 -1.894 5.400 1.00 82.59 N ATOM 3097 CA ASP B 100 -9.938 -1.243 4.721 1.00 84.85 C ATOM 3098 C ASP B 100 -10.026 -1.354 3.190 1.00 84.58 C ATOM 3099 O ASP B 100 -10.679 -2.246 2.647 1.00 84.05 O ATOM 3100 CB ASP B 100 -8.602 -1.788 5.243 1.00 87.56 C ATOM 3101 CG ASP B 100 -8.106 -2.982 4.467 1.00 90.13 C ATOM 3102 OD1 ASP B 100 -7.757 -2.805 3.281 1.00 89.63 O ATOM 3103 OD2 ASP B 100 -8.062 -4.088 5.052 1.00 92.61 O ATOM 3104 N HIS B 101 -9.362 -0.425 2.514 1.00 84.89 N ATOM 3105 CA HIS B 101 -9.368 -0.318 1.061 1.00 85.65 C ATOM 3106 C HIS B 101 -8.639 -1.422 0.308 1.00 85.99 C ATOM 3107 O HIS B 101 -8.484 -1.339 -0.911 1.00 85.93 O ATOM 3108 CB HIS B 101 -8.788 1.050 0.672 1.00 86.72 C ATOM 3109 CG HIS B 101 -9.302 1.599 -0.625 1.00 86.60 C ATOM 3110 ND1 HIS B 101 -9.235 2.944 -0.934 1.00 85.54 N ATOM 3111 CD2 HIS B 101 -9.848 0.992 -1.706 1.00 86.26 C ATOM 3112 CE1 HIS B 101 -9.716 3.137 -2.146 1.00 86.39 C ATOM 3113 NE2 HIS B 101 
-10.095 1.969 -2.639 1.00 86.65 N ATOM 3114 N SER B 102 -8.191 -2.458 1.008 1.00 86.24 N ATOM 3115 CA SER B 102 -7.489 -3.543 0.328 1.00 86.63 C ATOM 3116 C SER B 102 -8.488 -4.552 -0.229 1.00 85.62 C ATOM 3117 O SER B 102 -8.108 -5.548 -0.839 1.00 84.65 O ATOM 3118 CB SER B 102 -6.528 -4.239 1.282 1.00 87.71 C ATOM 3119 OG SER B 102 -7.247 -4.977 2.251 1.00 91.99 O ATOM 3120 N MET B 103 -9.772 -4.297 -0.002 1.00 85.08 N ATOM 3121 CA MET B 103 -10.803 -5.177 -0.519 1.00 84.48 C ATOM 3122 C MET B 103 -11.091 -4.693 -1.925 1.00 83.76 C ATOM 3123 O MET B 103 -12.030 -5.150 -2.578 1.00 83.39 O ATOM 3124 CB MET B 103 -12.069 -5.107 0.333 1.00 84.83 C ATOM 3125 CG MET B 103 -11.860 -5.541 1.773 1.00 87.32 C ATOM 3126 SD MET B 103 -11.005 -7.130 1.934 1.00 88.58 S ATOM 3127 CE MET B 103 -12.298 -8.237 1.358 1.00 87.63 C ATOM 3128 N ALA B 104 -10.269 -3.753 -2.383 1.00 83.38 N ATOM 3129 CA ALA B 104 -10.405 -3.201 -3.723 1.00 83.07 C ATOM 3130 C ALA B 104 -9.941 -4.243 -4.744 1.00 82.42 C ATOM 3131 O ALA B 104 -10.565 -4.414 -5.797 1.00 82.21 O ATOM 3132 CB ALA B 104 -9.581 -1.926 -3.849 1.00 84.51 C ATOM 3133 N ILE B 105 -8.845 -4.939 -4.439 1.00 80.53 N ATOM 3134 CA ILE B 105 -8.364 -5.983 -5.339 1.00 78.98 C ATOM 3135 C ILE B 105 -9.543 -6.938 -5.564 1.00 77.86 C ATOM 3136 O ILE B 105 -9.991 -7.134 -6.696 1.00 76.47 O ATOM 3137 CB ILE B 105 -7.150 -6.777 -4.746 1.00 78.15 C ATOM 3138 CG1 ILE B 105 -6.889 -6.388 -3.294 1.00 77.60 C ATOM 3139 CG2 ILE B 105 -5.893 -6.469 -5.547 1.00 78.93 C ATOM 3140 CD1 ILE B 105 -5.621 -7.026 -2.742 1.00 74.53 C ATOM 3141 N GLY B 106 -10.054 -7.509 -4.476 1.00 76.94 N ATOM 3142 CA GLY B 106 -11.186 -8.403 -4.583 1.00 76.15 C ATOM 3143 C GLY B 106 -12.238 -7.814 -5.499 1.00 76.23 C ATOM 3144 O GLY B 106 -12.437 -8.308 -6.605 1.00 77.43 O ATOM 3145 N SER B 107 -12.900 -6.751 -5.051 1.00 75.72 N ATOM 3146 CA SER B 107 -13.945 -6.094 -5.839 1.00 75.71 C ATOM 3147 C SER B 107 -13.556 -5.892 -7.306 1.00 75.75 C ATOM 3148 O SER B 107 -14.208 
-6.415 -8.216 1.00 74.82 O ATOM 3149 CB SER B 107 -14.293 -4.732 -5.214 1.00 76.72 C ATOM 3150 OG SER B 107 -15.186 -3.966 -6.024 1.00 75.74 O ATOM 3151 N ILE B 108 -12.486 -5.132 -7.525 1.00 75.95 N ATOM 3152 CA ILE B 108 -12.023 -4.831 -8.868 1.00 75.58 C ATOM 3153 C ILE B 108 -11.794 -6.085 -9.703 1.00 76.75 C ATOM 3154 O ILE B 108 -12.282 -6.174 -10.829 1.00 77.19 O ATOM 3155 CB ILE B 108 -10.747 -3.968 -8.822 1.00 74.49 C ATOM 3156 CG1 ILE B 108 -11.037 -2.670 -8.065 1.00 73.59 C ATOM 3157 CG2 ILE B 108 -10.294 -3.639 -10.228 1.00 75.30 C ATOM 3158 CD1 ILE B 108 -9.947 -1.642 -8.136 1.00 71.41 C ATOM 3159 N SER B 109 -11.067 -7.056 -9.154 1.00 77.84 N ATOM 3160 CA SER B 109 -10.814 -8.311 -9.865 1.00 78.47 C ATOM 3161 C SER B 109 -12.136 -8.934 -10.282 1.00 78.55 C ATOM 3162 O SER B 109 -12.403 -9.113 -11.467 1.00 77.36 O ATOM 3163 CB SER B 109 -10.057 -9.295 -8.967 1.00 78.95 C ATOM 3164 OG SER B 109 -8.674 -8.996 -8.938 1.00 81.14 O ATOM 3165 N GLY B 110 -12.955 -9.262 -9.288 1.00 79.20 N ATOM 3166 CA GLY B 110 -14.244 -9.855 -9.556 1.00 81.12 C ATOM 3167 C GLY B 110 -14.948 -9.133 -10.682 1.00 83.10 C ATOM 3168 O GLY B 110 -15.540 -9.763 -11.555 1.00 83.47 O ATOM 3169 N HIS B 111 -14.874 -7.807 -10.674 1.00 85.03 N ATOM 3170 CA HIS B 111 -15.519 -6.999 -11.707 1.00 87.81 C ATOM 3171 C HIS B 111 -14.949 -7.243 -13.119 1.00 88.96 C ATOM 3172 O HIS B 111 -15.705 -7.466 -14.073 1.00 87.29 O ATOM 3173 CB HIS B 111 -15.402 -5.510 -11.340 1.00 88.57 C ATOM 3174 CG HIS B 111 -16.248 -4.606 -12.188 1.00 89.32 C ATOM 3175 ND1 HIS B 111 -16.233 -3.234 -12.053 1.00 89.05 N ATOM 3176 CD2 HIS B 111 -17.125 -4.876 -13.186 1.00 88.94 C ATOM 3177 CE1 HIS B 111 -17.062 -2.699 -12.933 1.00 89.41 C ATOM 3178 NE2 HIS B 111 -17.615 -3.673 -13.632 1.00 88.73 N ATOM 3179 N ALA B 112 -13.619 -7.199 -13.232 1.00 90.99 N ATOM 3180 CA ALA B 112 -12.898 -7.389 -14.500 1.00 92.37 C ATOM 3181 C ALA B 112 -13.189 -8.728 -15.175 1.00 93.72 C ATOM 3182 O ALA B 112 -13.038 -8.874 -16.395 
1.00 93.91 O ATOM 3183 CB ALA B 112 -11.389 -7.246 -14.265 1.00 91.27 C ATOM 3184 N ARG B 113 -13.592 -9.708 -14.376 1.00 94.50 N ATOM 3185 CA ARG B 113 -13.905 -11.025 -14.897 1.00 94.94 C ATOM 3186 C ARG B 113 -15.069 -10.951 -15.880 1.00 95.62 C ATOM 3187 O ARG B 113 -15.027 -11.559 -16.948 1.00 96.16 O ATOM 3188 CB ARG B 113 -14.211 -11.982 -13.734 1.00 94.28 C ATOM 3189 CG ARG B 113 -12.940 -12.437 -13.022 1.00 94.48 C ATOM 3190 CD ARG B 113 -13.182 -13.213 -11.734 1.00 95.26 C ATOM 3191 NE ARG B 113 -11.923 -13.750 -11.203 1.00 96.22 N ATOM 3192 CZ ARG B 113 -11.750 -14.232 -9.968 1.00 96.89 C ATOM 3193 NH1 ARG B 113 -12.758 -14.252 -9.097 1.00 96.87 N ATOM 3194 NH2 ARG B 113 -10.559 -14.699 -9.600 1.00 96.57 N ATOM 3195 N VAL B 114 -16.093 -10.180 -15.531 1.00 96.02 N ATOM 3196 CA VAL B 114 -17.264 -10.040 -16.385 1.00 96.73 C ATOM 3197 C VAL B 114 -17.139 -8.884 -17.382 1.00 97.56 C ATOM 3198 O VAL B 114 -17.715 -8.929 -18.470 1.00 97.44 O ATOM 3199 CB VAL B 114 -18.528 -9.851 -15.527 1.00 97.09 C ATOM 3200 CG1 VAL B 114 -19.763 -9.846 -16.412 1.00 97.30 C ATOM 3201 CG2 VAL B 114 -18.612 -10.963 -14.478 1.00 96.77 C ATOM 3202 N HIS B 115 -16.386 -7.853 -17.001 1.00 99.09 N ATOM 3203 CA HIS B 115 -16.163 -6.678 -17.852 1.00100.13 C ATOM 3204 C HIS B 115 -14.690 -6.276 -17.791 1.00100.23 C ATOM 3205 O HIS B 115 -14.320 -5.350 -17.076 1.00100.81 O ATOM 3206 CB HIS B 115 -17.012 -5.492 -17.386 1.00100.57 C ATOM 3207 CG HIS B 115 -18.486 -5.756 -17.386 1.00101.28 C ATOM 3208 ND1 HIS B 115 -19.107 -6.510 -16.416 1.00101.75 N ATOM 3209 CD2 HIS B 115 -19.464 -5.334 -18.222 1.00101.64 C ATOM 3210 CE1 HIS B 115 -20.409 -6.537 -16.650 1.00102.42 C ATOM 3211 NE2 HIS B 115 -20.651 -5.831 -17.738 1.00102.31 N ATOM 3212 N PRO B 116 -13.839 -6.963 -18.557 1.00100.42 N ATOM 3213 CA PRO B 116 -12.395 -6.729 -18.626 1.00100.72 C ATOM 3214 C PRO B 116 -12.011 -5.455 -19.359 1.00100.87 C ATOM 3215 O PRO B 116 -10.824 -5.163 -19.532 1.00100.56 O ATOM 3216 CB PRO B 116 -11.901 
-7.960 -19.356 1.00101.70 C ATOM 3217 CG PRO B 116 -12.994 -8.148 -20.378 1.00102.18 C ATOM 3218 CD PRO B 116 -14.247 -7.985 -19.537 1.00101.10 C ATOM 3219 N ASP B 117 -13.013 -4.705 -19.803 1.00101.28 N ATOM 3220 CA ASP B 117 -12.761 -3.458 -20.524 1.00102.06 C ATOM 3221 C ASP B 117 -13.019 -2.225 -19.655 1.00101.05 C ATOM 3222 O ASP B 117 -12.912 -1.090 -20.126 1.00101.44 O ATOM 3223 CB ASP B 117 -13.626 -3.399 -21.791 1.00103.66 C ATOM 3224 CG ASP B 117 -15.113 -3.571 -21.500 1.00105.43 C ATOM 3225 OD1 ASP B 117 -15.480 -4.545 -20.801 1.00106.09 O ATOM 3226 OD2 ASP B 117 -15.913 -2.736 -21.981 1.00106.50 O ATOM 3227 N LEU B 118 -13.341 -2.460 -18.382 1.00 98.82 N ATOM 3228 CA LEU B 118 -13.632 -1.385 -17.433 1.00 95.55 C ATOM 3229 C LEU B 118 -12.461 -0.432 -17.162 1.00 93.68 C ATOM 3230 O LEU B 118 -11.291 -0.747 -17.417 1.00 93.41 O ATOM 3231 CB LEU B 118 -14.122 -1.974 -16.090 1.00 93.59 C ATOM 3232 CG LEU B 118 -13.182 -2.854 -15.244 1.00 92.16 C ATOM 3233 CD1 LEU B 118 -11.993 -2.056 -14.737 1.00 92.08 C ATOM 3234 CD2 LEU B 118 -13.949 -3.409 -14.070 1.00 90.84 C ATOM 3235 N CYS B 119 -12.806 0.748 -16.660 1.00 90.59 N ATOM 3236 CA CYS B 119 -11.832 1.759 -16.288 1.00 86.30 C ATOM 3237 C CYS B 119 -12.017 1.830 -14.791 1.00 83.38 C ATOM 3238 O CYS B 119 -12.793 1.064 -14.228 1.00 83.17 O ATOM 3239 CB CYS B 119 -12.178 3.100 -16.922 1.00 85.90 C ATOM 3240 SG CYS B 119 -13.942 3.421 -16.918 1.00 85.58 S ATOM 3241 N VAL B 120 -11.326 2.749 -14.142 1.00 80.52 N ATOM 3242 CA VAL B 120 -11.441 2.864 -12.700 1.00 76.40 C ATOM 3243 C VAL B 120 -11.195 4.283 -12.211 1.00 73.42 C ATOM 3244 O VAL B 120 -10.165 4.881 -12.508 1.00 72.14 O ATOM 3245 CB VAL B 120 -10.439 1.900 -11.989 1.00 75.95 C ATOM 3246 CG1 VAL B 120 -10.341 2.241 -10.507 1.00 74.91 C ATOM 3247 CG2 VAL B 120 -10.894 0.446 -12.171 1.00 73.63 C ATOM 3248 N ILE B 121 -12.157 4.819 -11.470 1.00 71.26 N ATOM 3249 CA ILE B 121 -12.024 6.155 -10.887 1.00 69.79 C ATOM 3250 C ILE B 121 -11.694 5.912 -9.408 
1.00 66.24 C ATOM 3251 O ILE B 121 -12.475 5.304 -8.672 1.00 65.09 O ATOM 3252 CB ILE B 121 -13.342 6.995 -11.026 1.00 71.19 C ATOM 3253 CG1 ILE B 121 -13.603 7.366 -12.492 1.00 71.26 C ATOM 3254 CG2 ILE B 121 -13.216 8.293 -10.251 1.00 71.09 C ATOM 3255 CD1 ILE B 121 -13.551 6.198 -13.447 1.00 72.36 C ATOM 3256 N TRP B 122 -10.526 6.372 -8.987 1.00 62.73 N ATOM 3257 CA TRP B 122 -10.084 6.154 -7.620 1.00 60.62 C ATOM 3258 C TRP B 122 -10.037 7.428 -6.770 1.00 60.19 C ATOM 3259 O TRP B 122 -9.059 8.188 -6.806 1.00 59.32 O ATOM 3260 CB TRP B 122 -8.709 5.470 -7.664 1.00 58.94 C ATOM 3261 CG TRP B 122 -8.180 4.952 -6.347 1.00 56.58 C ATOM 3262 CD1 TRP B 122 -7.600 5.683 -5.347 1.00 55.86 C ATOM 3263 CD2 TRP B 122 -8.131 3.582 -5.919 1.00 55.45 C ATOM 3264 NE1 TRP B 122 -7.188 4.856 -4.324 1.00 55.24 N ATOM 3265 CE2 TRP B 122 -7.511 3.561 -4.643 1.00 55.02 C ATOM 3266 CE3 TRP B 122 -8.567 2.370 -6.476 1.00 54.23 C ATOM 3267 CZ2 TRP B 122 -7.293 2.379 -3.929 1.00 54.64 C ATOM 3268 CZ3 TRP B 122 -8.355 1.185 -5.764 1.00 54.27 C ATOM 3269 CH2 TRP B 122 -7.730 1.202 -4.500 1.00 55.66 C ATOM 3270 N VAL B 123 -11.110 7.657 -6.014 1.00 59.72 N ATOM 3271 CA VAL B 123 -11.191 8.813 -5.117 1.00 59.40 C ATOM 3272 C VAL B 123 -10.670 8.353 -3.741 1.00 57.38 C ATOM 3273 O VAL B 123 -11.347 7.635 -3.000 1.00 55.82 O ATOM 3274 CB VAL B 123 -12.643 9.340 -4.970 1.00 59.65 C ATOM 3275 CG1 VAL B 123 -12.630 10.698 -4.309 1.00 58.94 C ATOM 3276 CG2 VAL B 123 -13.314 9.434 -6.323 1.00 58.72 C ATOM 3277 N ASP B 124 -9.448 8.775 -3.430 1.00 55.40 N ATOM 3278 CA ASP B 124 -8.752 8.407 -2.199 1.00 54.22 C ATOM 3279 C ASP B 124 -7.793 9.566 -1.995 1.00 55.31 C ATOM 3280 O ASP B 124 -7.477 10.282 -2.958 1.00 55.23 O ATOM 3281 CB ASP B 124 -7.962 7.105 -2.447 1.00 50.71 C ATOM 3282 CG ASP B 124 -7.432 6.474 -1.191 1.00 47.58 C ATOM 3283 OD1 ASP B 124 -6.719 7.143 -0.427 1.00 46.94 O ATOM 3284 OD2 ASP B 124 -7.707 5.280 -0.962 1.00 46.77 O ATOM 3285 N ALA B 125 -7.347 9.772 -0.759 1.00 56.37 N 
ATOM 3286 CA ALA B 125 -6.379 10.834 -0.490 1.00 57.62 C ATOM 3287 C ALA B 125 -5.036 10.288 -0.994 1.00 58.34 C ATOM 3288 O ALA B 125 -4.186 11.022 -1.505 1.00 56.62 O ATOM 3289 CB ALA B 125 -6.302 11.126 1.010 1.00 56.07 C ATOM 3290 N HIS B 126 -4.896 8.969 -0.869 1.00 59.92 N ATOM 3291 CA HIS B 126 -3.705 8.232 -1.250 1.00 61.14 C ATOM 3292 C HIS B 126 -3.844 7.466 -2.552 1.00 64.86 C ATOM 3293 O HIS B 126 -4.955 7.084 -2.949 1.00 65.54 O ATOM 3294 CB HIS B 126 -3.362 7.232 -0.154 1.00 58.52 C ATOM 3295 CG HIS B 126 -3.692 7.721 1.211 1.00 57.04 C ATOM 3296 ND1 HIS B 126 -4.911 7.483 1.809 1.00 59.20 N ATOM 3297 CD2 HIS B 126 -3.010 8.536 2.048 1.00 55.86 C ATOM 3298 CE1 HIS B 126 -4.972 8.142 2.954 1.00 58.98 C ATOM 3299 NE2 HIS B 126 -3.831 8.789 3.122 1.00 58.70 N ATOM 3300 N THR B 127 -2.697 7.236 -3.194 1.00 67.28 N ATOM 3301 CA THR B 127 -2.617 6.475 -4.434 1.00 69.38 C ATOM 3302 C THR B 127 -2.747 4.999 -4.039 1.00 70.18 C ATOM 3303 O THR B 127 -3.415 4.210 -4.718 1.00 68.18 O ATOM 3304 CB THR B 127 -1.255 6.699 -5.140 1.00 71.32 C ATOM 3305 OG1 THR B 127 -0.231 6.861 -4.149 1.00 72.91 O ATOM 3306 CG2 THR B 127 -1.295 7.933 -6.043 1.00 70.67 C ATOM 3307 N ASP B 128 -2.110 4.645 -2.921 1.00 72.64 N ATOM 3308 CA ASP B 128 -2.134 3.275 -2.406 1.00 76.54 C ATOM 3309 C ASP B 128 -1.508 2.328 -3.445 1.00 78.32 C ATOM 3310 O ASP B 128 -1.783 1.124 -3.462 1.00 77.37 O ATOM 3311 CB ASP B 128 -3.590 2.824 -2.106 1.00 76.13 C ATOM 3312 CG ASP B 128 -4.331 3.767 -1.149 1.00 74.37 C ATOM 3313 OD1 ASP B 128 -3.684 4.348 -0.247 1.00 74.66 O ATOM 3314 OD2 ASP B 128 -5.567 3.913 -1.300 1.00 71.25 O ATOM 3315 N ILE B 129 -0.662 2.898 -4.305 1.00 81.57 N ATOM 3316 CA ILE B 129 0.017 2.181 -5.392 1.00 82.62 C ATOM 3317 C ILE B 129 1.421 1.664 -5.020 1.00 83.95 C ATOM 3318 O ILE B 129 2.257 1.405 -5.893 1.00 83.15 O ATOM 3319 CB ILE B 129 0.078 3.099 -6.680 1.00 81.91 C ATOM 3320 CG1 ILE B 129 0.652 2.330 -7.858 1.00 81.40 C ATOM 3321 CG2 ILE B 129 0.892 4.364 -6.401 1.00 
80.22 C ATOM 3322 CD1 ILE B 129 -0.094 1.065 -8.152 1.00 82.11 C ATOM 3323 N ASN B 130 1.659 1.495 -3.718 1.00 85.83 N ATOM 3324 CA ASN B 130 2.933 0.990 -3.225 1.00 86.95 C ATOM 3325 C ASN B 130 2.986 -0.528 -3.343 1.00 87.55 C ATOM 3326 O ASN B 130 1.961 -1.208 -3.348 1.00 87.00 O ATOM 3327 CB ASN B 130 3.169 1.389 -1.759 1.00 88.13 C ATOM 3328 CG ASN B 130 3.840 2.754 -1.616 1.00 90.24 C ATOM 3329 OD1 ASN B 130 4.710 3.120 -2.410 1.00 91.56 O ATOM 3330 ND2 ASN B 130 3.448 3.503 -0.588 1.00 91.70 N ATOM 3331 N THR B 131 4.201 -1.046 -3.441 1.00 88.52 N ATOM 3332 CA THR B 131 4.432 -2.470 -3.561 1.00 88.78 C ATOM 3333 C THR B 131 5.092 -2.979 -2.284 1.00 90.67 C ATOM 3334 O THR B 131 5.545 -2.192 -1.442 1.00 89.83 O ATOM 3335 CB THR B 131 5.354 -2.747 -4.737 1.00 87.72 C ATOM 3336 OG1 THR B 131 6.624 -2.123 -4.500 1.00 86.20 O ATOM 3337 CG2 THR B 131 4.750 -2.186 -6.009 1.00 87.64 C ATOM 3338 N PRO B 132 5.147 -4.311 -2.121 1.00 92.13 N ATOM 3339 CA PRO B 132 5.750 -4.970 -0.953 1.00 92.76 C ATOM 3340 C PRO B 132 7.186 -4.484 -0.746 1.00 92.59 C ATOM 3341 O PRO B 132 7.770 -4.621 0.338 1.00 91.94 O ATOM 3342 CB PRO B 132 5.694 -6.452 -1.333 1.00 93.72 C ATOM 3343 CG PRO B 132 4.462 -6.535 -2.201 1.00 92.98 C ATOM 3344 CD PRO B 132 4.602 -5.308 -3.064 1.00 92.38 C ATOM 3345 N LEU B 133 7.726 -3.918 -1.823 1.00 92.49 N ATOM 3346 CA LEU B 133 9.085 -3.391 -1.883 1.00 92.47 C ATOM 3347 C LEU B 133 9.132 -1.971 -1.343 1.00 92.31 C ATOM 3348 O LEU B 133 9.994 -1.621 -0.530 1.00 92.59 O ATOM 3349 CB LEU B 133 9.561 -3.378 -3.338 1.00 92.42 C ATOM 3350 CG LEU B 133 9.297 -4.637 -4.167 1.00 93.22 C ATOM 3351 CD1 LEU B 133 9.806 -4.427 -5.590 1.00 92.74 C ATOM 3352 CD2 LEU B 133 9.974 -5.840 -3.508 1.00 93.47 C ATOM 3353 N THR B 134 8.192 -1.160 -1.817 1.00 91.80 N ATOM 3354 CA THR B 134 8.081 0.244 -1.439 1.00 90.29 C ATOM 3355 C THR B 134 7.543 0.491 -0.025 1.00 88.61 C ATOM 3356 O THR B 134 8.244 1.035 0.825 1.00 87.36 O ATOM 3357 CB THR B 134 7.174 0.963 -2.431 1.00 
90.62 C ATOM 3358 OG1 THR B 134 5.884 0.348 -2.405 1.00 91.28 O ATOM 3359 CG2 THR B 134 7.732 0.854 -3.839 1.00 89.53 C ATOM 3360 N THR B 135 6.296 0.092 0.209 1.00 87.08 N ATOM 3361 CA THR B 135 5.644 0.280 1.502 1.00 87.00 C ATOM 3362 C THR B 135 6.565 0.270 2.707 1.00 86.71 C ATOM 3363 O THR B 135 7.435 -0.583 2.822 1.00 86.97 O ATOM 3364 CB THR B 135 4.585 -0.792 1.770 1.00 86.89 C ATOM 3365 OG1 THR B 135 5.076 -2.062 1.332 1.00 85.36 O ATOM 3366 CG2 THR B 135 3.281 -0.453 1.069 1.00 87.48 C ATOM 3367 N SER B 136 6.360 1.235 3.600 1.00 86.78 N ATOM 3368 CA SER B 136 7.115 1.329 4.839 1.00 87.04 C ATOM 3369 C SER B 136 6.182 0.713 5.855 1.00 88.13 C ATOM 3370 O SER B 136 6.166 1.091 7.026 1.00 88.17 O ATOM 3371 CB SER B 136 7.398 2.780 5.222 1.00 86.80 C ATOM 3372 OG SER B 136 8.532 3.283 4.546 1.00 87.18 O ATOM 3373 N SER B 137 5.380 -0.228 5.372 1.00 89.21 N ATOM 3374 CA SER B 137 4.419 -0.935 6.204 1.00 91.19 C ATOM 3375 C SER B 137 3.835 -2.062 5.368 1.00 91.93 C ATOM 3376 O SER B 137 3.697 -1.935 4.150 1.00 90.67 O ATOM 3377 CB SER B 137 3.309 0.016 6.653 1.00 92.06 C ATOM 3378 OG SER B 137 2.600 0.526 5.525 1.00 93.47 O ATOM 3379 N GLY B 138 3.502 -3.165 6.029 1.00 93.57 N ATOM 3380 CA GLY B 138 2.954 -4.311 5.327 1.00 95.55 C ATOM 3381 C GLY B 138 1.454 -4.211 5.166 1.00 96.50 C ATOM 3382 O GLY B 138 0.762 -5.192 4.854 1.00 97.48 O ATOM 3383 N ASN B 139 0.961 -2.997 5.378 1.00 96.37 N ATOM 3384 CA ASN B 139 -0.455 -2.675 5.285 1.00 95.32 C ATOM 3385 C ASN B 139 -0.884 -2.682 3.817 1.00 94.83 C ATOM 3386 O ASN B 139 -0.823 -1.667 3.115 1.00 94.93 O ATOM 3387 CB ASN B 139 -0.674 -1.304 5.916 1.00 95.07 C ATOM 3388 CG ASN B 139 0.121 -1.130 7.206 1.00 95.16 C ATOM 3389 OD1 ASN B 139 1.248 -1.625 7.334 1.00 96.11 O ATOM 3390 ND2 ASN B 139 -0.455 -0.416 8.161 1.00 95.53 N ATOM 3391 N LEU B 140 -1.314 -3.847 3.360 1.00 93.89 N ATOM 3392 CA LEU B 140 -1.741 -4.011 1.986 1.00 94.05 C ATOM 3393 C LEU B 140 -2.866 -3.076 1.541 1.00 94.50 C ATOM 3394 O LEU B 140 
-3.200 -3.040 0.354 1.00 94.62 O ATOM 3395 CB LEU B 140 -2.145 -5.466 1.748 1.00 93.18 C ATOM 3396 CG LEU B 140 -0.977 -6.444 1.707 1.00 91.51 C ATOM 3397 CD1 LEU B 140 -1.438 -7.790 2.201 1.00 91.57 C ATOM 3398 CD2 LEU B 140 -0.417 -6.521 0.295 1.00 89.96 C ATOM 3399 N HIS B 141 -3.457 -2.324 2.474 1.00 94.12 N ATOM 3400 CA HIS B 141 -4.526 -1.396 2.098 1.00 93.21 C ATOM 3401 C HIS B 141 -3.891 -0.227 1.353 1.00 93.53 C ATOM 3402 O HIS B 141 -4.579 0.695 0.910 1.00 93.63 O ATOM 3403 CB HIS B 141 -5.307 -0.887 3.328 1.00 90.47 C ATOM 3404 CG HIS B 141 -4.581 0.144 4.147 1.00 88.47 C ATOM 3405 ND1 HIS B 141 -5.170 0.780 5.218 1.00 86.91 N ATOM 3406 CD2 HIS B 141 -3.326 0.646 4.053 1.00 87.52 C ATOM 3407 CE1 HIS B 141 -4.308 1.633 5.749 1.00 86.12 C ATOM 3408 NE2 HIS B 141 -3.183 1.572 5.061 1.00 85.96 N ATOM 3409 N GLY B 142 -2.565 -0.278 1.235 1.00 93.98 N ATOM 3410 CA GLY B 142 -1.825 0.764 0.547 1.00 95.37 C ATOM 3411 C GLY B 142 -1.052 0.214 -0.641 1.00 96.23 C ATOM 3412 O GLY B 142 -0.005 0.751 -1.014 1.00 95.98 O ATOM 3413 N GLN B 143 -1.587 -0.858 -1.227 1.00 96.91 N ATOM 3414 CA GLN B 143 -0.996 -1.543 -2.376 1.00 97.13 C ATOM 3415 C GLN B 143 -2.004 -2.038 -3.439 1.00 97.15 C ATOM 3416 O GLN B 143 -1.625 -2.313 -4.575 1.00 96.46 O ATOM 3417 CB GLN B 143 -0.169 -2.727 -1.872 1.00 97.37 C ATOM 3418 CG GLN B 143 0.926 -2.306 -0.902 1.00 98.88 C ATOM 3419 CD GLN B 143 1.679 -3.478 -0.296 1.00 99.70 C ATOM 3420 OE1 GLN B 143 2.187 -4.345 -1.007 1.00 98.74 O ATOM 3421 NE2 GLN B 143 1.766 -3.497 1.031 1.00100.53 N ATOM 3422 N PRO B 144 -3.301 -2.129 -3.091 1.00 97.65 N ATOM 3423 CA PRO B 144 -4.349 -2.595 -4.006 1.00 97.32 C ATOM 3424 C PRO B 144 -4.166 -2.404 -5.506 1.00 96.74 C ATOM 3425 O PRO B 144 -4.523 -3.282 -6.289 1.00 96.95 O ATOM 3426 CB PRO B 144 -5.590 -1.879 -3.489 1.00 97.93 C ATOM 3427 CG PRO B 144 -5.375 -1.924 -2.024 1.00 98.75 C ATOM 3428 CD PRO B 144 -3.921 -1.512 -1.901 1.00 98.52 C ATOM 3429 N VAL B 145 -3.628 -1.265 -5.918 1.00 96.76 N ATOM 
[PDB data file (test fixture): ATOM coordinate records 3430–4129, chain B, residues VAL 145 through ASP 232, with per-atom x/y/z coordinates, occupancies, and B-factors; fixed-width record layout lost in extraction, contents omitted here.]
1.00 54.45 C ATOM 4130 O ASP B 232 -12.152 8.184 0.566 1.00 53.16 O ATOM 4131 CB ASP B 232 -10.854 5.969 -0.418 1.00 53.83 C ATOM 4132 CG ASP B 232 -10.173 5.967 0.914 1.00 54.18 C ATOM 4133 OD1 ASP B 232 -10.863 6.016 1.956 1.00 51.22 O ATOM 4134 OD2 ASP B 232 -8.927 5.929 0.921 1.00 54.39 O ATOM 4135 N VAL B 233 -13.819 6.916 1.377 1.00 55.37 N ATOM 4136 CA VAL B 233 -14.347 7.896 2.306 1.00 56.07 C ATOM 4137 C VAL B 233 -13.317 8.895 2.836 1.00 56.17 C ATOM 4138 O VAL B 233 -13.677 10.003 3.236 1.00 57.17 O ATOM 4139 CB VAL B 233 -14.968 7.199 3.504 1.00 58.18 C ATOM 4140 CG1 VAL B 233 -13.855 6.592 4.378 1.00 57.95 C ATOM 4141 CG2 VAL B 233 -15.800 8.182 4.288 1.00 59.10 C ATOM 4142 N ASP B 234 -12.041 8.516 2.858 1.00 55.85 N ATOM 4143 CA ASP B 234 -11.005 9.442 3.336 1.00 55.47 C ATOM 4144 C ASP B 234 -10.579 10.531 2.333 1.00 52.10 C ATOM 4145 O ASP B 234 -9.952 11.497 2.714 1.00 52.05 O ATOM 4146 CB ASP B 234 -9.760 8.667 3.826 1.00 59.03 C ATOM 4147 CG ASP B 234 -8.873 8.157 2.688 1.00 63.96 C ATOM 4148 OD1 ASP B 234 -8.592 8.921 1.724 1.00 65.02 O ATOM 4149 OD2 ASP B 234 -8.426 6.986 2.785 1.00 66.85 O ATOM 4150 N GLY B 235 -10.911 10.367 1.060 1.00 50.06 N ATOM 4151 CA GLY B 235 -10.549 11.357 0.058 1.00 49.84 C ATOM 4152 C GLY B 235 -11.286 12.694 0.177 1.00 49.59 C ATOM 4153 O GLY B 235 -11.076 13.623 -0.620 1.00 47.09 O ATOM 4154 N LEU B 236 -12.187 12.779 1.148 1.00 50.09 N ATOM 4155 CA LEU B 236 -12.909 14.007 1.390 1.00 50.76 C ATOM 4156 C LEU B 236 -12.098 14.664 2.483 1.00 51.57 C ATOM 4157 O LEU B 236 -11.187 14.041 3.034 1.00 52.88 O ATOM 4158 CB LEU B 236 -14.311 13.700 1.886 1.00 52.43 C ATOM 4159 CG LEU B 236 -15.413 13.984 0.869 1.00 54.89 C ATOM 4160 CD1 LEU B 236 -14.991 13.475 -0.500 1.00 55.00 C ATOM 4161 CD2 LEU B 236 -16.703 13.324 1.328 1.00 55.89 C ATOM 4162 N ASP B 237 -12.390 15.912 2.809 1.00 52.93 N ATOM 4163 CA ASP B 237 -11.617 16.540 3.874 1.00 57.18 C ATOM 4164 C ASP B 237 -11.992 15.947 5.244 1.00 58.88 C ATOM 4165 O ASP B 
237 -13.131 15.511 5.450 1.00 59.95 O ATOM 4166 CB ASP B 237 -11.829 18.046 3.891 1.00 58.63 C ATOM 4167 CG ASP B 237 -10.918 18.735 4.882 1.00 60.03 C ATOM 4168 OD1 ASP B 237 -11.193 18.639 6.102 1.00 60.59 O ATOM 4169 OD2 ASP B 237 -9.913 19.339 4.437 1.00 60.31 O ATOM 4170 N PRO B 238 -11.037 15.917 6.197 1.00 57.68 N ATOM 4171 CA PRO B 238 -11.368 15.354 7.506 1.00 56.57 C ATOM 4172 C PRO B 238 -12.566 15.972 8.228 1.00 56.08 C ATOM 4173 O PRO B 238 -13.193 15.314 9.066 1.00 56.48 O ATOM 4174 CB PRO B 238 -10.057 15.499 8.271 1.00 57.23 C ATOM 4175 CG PRO B 238 -9.053 15.216 7.203 1.00 57.07 C ATOM 4176 CD PRO B 238 -9.580 16.108 6.086 1.00 57.93 C ATOM 4177 N VAL B 239 -12.903 17.218 7.898 1.00 53.68 N ATOM 4178 CA VAL B 239 -14.043 17.883 8.531 1.00 52.34 C ATOM 4179 C VAL B 239 -15.364 17.200 8.127 1.00 52.10 C ATOM 4180 O VAL B 239 -16.357 17.230 8.858 1.00 50.40 O ATOM 4181 CB VAL B 239 -14.107 19.371 8.117 1.00 51.91 C ATOM 4182 CG1 VAL B 239 -15.021 19.551 6.927 1.00 51.39 C ATOM 4183 CG2 VAL B 239 -14.569 20.214 9.279 1.00 51.86 C ATOM 4184 N PHE B 240 -15.349 16.567 6.960 1.00 52.61 N ATOM 4185 CA PHE B 240 -16.527 15.906 6.419 1.00 51.13 C ATOM 4186 C PHE B 240 -16.638 14.432 6.758 1.00 50.55 C ATOM 4187 O PHE B 240 -17.713 13.961 7.108 1.00 52.43 O ATOM 4188 CB PHE B 240 -16.549 16.082 4.897 1.00 49.52 C ATOM 4189 CG PHE B 240 -16.627 17.525 4.444 1.00 46.38 C ATOM 4190 CD1 PHE B 240 -17.828 18.228 4.523 1.00 43.52 C ATOM 4191 CD2 PHE B 240 -15.509 18.166 3.917 1.00 44.12 C ATOM 4192 CE1 PHE B 240 -17.918 19.520 4.078 1.00 40.93 C ATOM 4193 CE2 PHE B 240 -15.596 19.467 3.469 1.00 41.80 C ATOM 4194 CZ PHE B 240 -16.801 20.146 3.550 1.00 40.90 C ATOM 4195 N THR B 241 -15.535 13.706 6.657 1.00 49.79 N ATOM 4196 CA THR B 241 -15.540 12.276 6.933 1.00 50.83 C ATOM 4197 C THR B 241 -14.479 11.954 7.957 1.00 52.55 C ATOM 4198 O THR B 241 -13.544 11.200 7.680 1.00 51.99 O ATOM 4199 CB THR B 241 -15.218 11.480 5.670 1.00 51.65 C ATOM 4200 OG1 THR B 241 
-13.970 11.945 5.120 1.00 53.02 O ATOM 4201 CG2 THR B 241 -16.330 11.645 4.649 1.00 50.87 C ATOM 4202 N PRO B 242 -14.602 12.527 9.158 1.00 54.97 N ATOM 4203 CA PRO B 242 -13.631 12.298 10.234 1.00 57.22 C ATOM 4204 C PRO B 242 -13.348 10.862 10.689 1.00 56.87 C ATOM 4205 O PRO B 242 -12.204 10.539 10.987 1.00 57.12 O ATOM 4206 CB PRO B 242 -14.149 13.200 11.362 1.00 57.28 C ATOM 4207 CG PRO B 242 -15.601 13.431 11.014 1.00 57.06 C ATOM 4208 CD PRO B 242 -15.553 13.586 9.530 1.00 56.17 C ATOM 4209 N ALA B 243 -14.370 10.009 10.741 1.00 58.31 N ATOM 4210 CA ALA B 243 -14.186 8.618 11.175 1.00 58.19 C ATOM 4211 C ALA B 243 -13.443 7.752 10.159 1.00 58.24 C ATOM 4212 O ALA B 243 -14.070 6.949 9.465 1.00 57.68 O ATOM 4213 CB ALA B 243 -15.545 7.980 11.498 1.00 56.37 C ATOM 4214 N THR B 244 -12.120 7.912 10.086 1.00 59.71 N ATOM 4215 CA THR B 244 -11.277 7.145 9.154 1.00 62.49 C ATOM 4216 C THR B 244 -9.951 6.695 9.775 1.00 64.13 C ATOM 4217 O THR B 244 -9.642 7.010 10.926 1.00 66.01 O ATOM 4218 CB THR B 244 -10.891 7.958 7.889 1.00 60.93 C ATOM 4219 OG1 THR B 244 -11.978 8.797 7.491 1.00 62.57 O ATOM 4220 CG2 THR B 244 -10.574 7.012 6.741 1.00 62.79 C ATOM 4221 N GLY B 245 -9.172 5.950 8.991 1.00 65.60 N ATOM 4222 CA GLY B 245 -7.871 5.491 9.438 1.00 65.27 C ATOM 4223 C GLY B 245 -6.922 6.656 9.290 1.00 65.53 C ATOM 4224 O GLY B 245 -6.505 7.239 10.279 1.00 65.18 O ATOM 4225 N THR B 246 -6.605 7.024 8.053 1.00 67.51 N ATOM 4226 CA THR B 246 -5.704 8.147 7.817 1.00 69.03 C ATOM 4227 C THR B 246 -6.395 9.426 7.380 1.00 70.69 C ATOM 4228 O THR B 246 -6.941 9.472 6.277 1.00 72.99 O ATOM 4229 CB THR B 246 -4.716 7.918 6.674 1.00 68.81 C ATOM 4230 OG1 THR B 246 -4.566 6.521 6.404 1.00 70.16 O ATOM 4231 CG2 THR B 246 -3.384 8.579 7.014 1.00 67.59 C ATOM 4232 N PRO B 247 -6.414 10.473 8.231 1.00 69.84 N ATOM 4233 CA PRO B 247 -7.075 11.665 7.696 1.00 67.28 C ATOM 4234 C PRO B 247 -5.930 12.502 7.072 1.00 65.99 C ATOM 4235 O PRO B 247 -4.760 12.353 7.459 1.00 64.86 O ATOM 4236 
CB PRO B 247 -7.699 12.305 8.938 1.00 66.52 C ATOM 4237 CG PRO B 247 -6.711 11.986 10.000 1.00 67.42 C ATOM 4238 CD PRO B 247 -6.329 10.536 9.706 1.00 69.14 C ATOM 4239 N VAL B 248 -6.249 13.338 6.086 1.00 64.07 N ATOM 4240 CA VAL B 248 -5.241 14.190 5.439 1.00 60.46 C ATOM 4241 C VAL B 248 -5.921 15.535 5.101 1.00 58.21 C ATOM 4242 O VAL B 248 -6.760 15.607 4.207 1.00 58.79 O ATOM 4243 CB VAL B 248 -4.680 13.507 4.154 1.00 59.13 C ATOM 4244 CG1 VAL B 248 -3.535 14.290 3.618 1.00 59.22 C ATOM 4245 CG2 VAL B 248 -4.185 12.120 4.470 1.00 60.23 C ATOM 4246 N VAL B 249 -5.577 16.598 5.824 1.00 54.05 N ATOM 4247 CA VAL B 249 -6.222 17.889 5.588 1.00 51.52 C ATOM 4248 C VAL B 249 -6.153 18.368 4.154 1.00 49.68 C ATOM 4249 O VAL B 249 -5.299 17.924 3.388 1.00 50.93 O ATOM 4250 CB VAL B 249 -5.605 18.974 6.443 1.00 50.57 C ATOM 4251 CG1 VAL B 249 -5.885 18.694 7.887 1.00 50.72 C ATOM 4252 CG2 VAL B 249 -4.123 19.038 6.179 1.00 48.35 C ATOM 4253 N GLY B 250 -7.044 19.285 3.796 1.00 45.28 N ATOM 4254 CA GLY B 250 -7.033 19.827 2.450 1.00 44.16 C ATOM 4255 C GLY B 250 -7.448 18.882 1.329 1.00 45.03 C ATOM 4256 O GLY B 250 -6.775 18.784 0.285 1.00 43.44 O ATOM 4257 N GLY B 251 -8.575 18.205 1.551 1.00 45.14 N ATOM 4258 CA GLY B 251 -9.112 17.275 0.582 1.00 44.15 C ATOM 4259 C GLY B 251 -10.476 17.678 0.046 1.00 44.72 C ATOM 4260 O GLY B 251 -11.002 18.752 0.360 1.00 45.93 O ATOM 4261 N LEU B 252 -11.036 16.800 -0.781 1.00 43.77 N ATOM 4262 CA LEU B 252 -12.327 16.987 -1.414 1.00 44.97 C ATOM 4263 C LEU B 252 -13.434 17.433 -0.458 1.00 47.83 C ATOM 4264 O LEU B 252 -13.544 16.954 0.689 1.00 51.38 O ATOM 4265 CB LEU B 252 -12.731 15.677 -2.100 1.00 43.84 C ATOM 4266 CG LEU B 252 -11.779 15.293 -3.231 1.00 43.55 C ATOM 4267 CD1 LEU B 252 -11.824 13.802 -3.524 1.00 41.47 C ATOM 4268 CD2 LEU B 252 -12.134 16.118 -4.447 1.00 41.53 C ATOM 4269 N SER B 253 -14.274 18.344 -0.925 1.00 46.62 N ATOM 4270 CA SER B 253 -15.364 18.809 -0.090 1.00 46.70 C ATOM 4271 C SER B 253 -16.561 17.892 
-0.282 1.00 47.40 C ATOM 4272 O SER B 253 -16.631 17.166 -1.275 1.00 48.44 O ATOM 4273 CB SER B 253 -15.728 20.206 -0.512 1.00 45.90 C ATOM 4274 OG SER B 253 -15.761 20.249 -1.924 1.00 46.47 O ATOM 4275 N TYR B 254 -17.493 17.919 0.668 1.00 47.39 N ATOM 4276 CA TYR B 254 -18.703 17.112 0.575 1.00 47.44 C ATOM 4277 C TYR B 254 -19.215 17.428 -0.818 1.00 49.11 C ATOM 4278 O TYR B 254 -19.350 16.527 -1.646 1.00 49.44 O ATOM 4279 CB TYR B 254 -19.721 17.560 1.625 1.00 46.22 C ATOM 4280 CG TYR B 254 -21.033 16.780 1.680 1.00 47.21 C ATOM 4281 CD1 TYR B 254 -21.074 15.463 2.129 1.00 48.90 C ATOM 4282 CD2 TYR B 254 -22.246 17.399 1.382 1.00 47.33 C ATOM 4283 CE1 TYR B 254 -22.293 14.781 2.293 1.00 49.41 C ATOM 4284 CE2 TYR B 254 -23.468 16.738 1.537 1.00 49.47 C ATOM 4285 CZ TYR B 254 -23.488 15.429 1.997 1.00 51.72 C ATOM 4286 OH TYR B 254 -24.705 14.787 2.189 1.00 53.13 O ATOM 4287 N ARG B 255 -19.458 18.717 -1.076 1.00 49.81 N ATOM 4288 CA ARG B 255 -19.929 19.199 -2.380 1.00 50.99 C ATOM 4289 C ARG B 255 -19.147 18.586 -3.550 1.00 52.53 C ATOM 4290 O ARG B 255 -19.732 17.925 -4.422 1.00 53.55 O ATOM 4291 CB ARG B 255 -19.824 20.730 -2.447 1.00 51.35 C ATOM 4292 CG ARG B 255 -21.030 21.461 -1.878 1.00 54.18 C ATOM 4293 CD ARG B 255 -20.807 22.965 -1.711 1.00 55.20 C ATOM 4294 NE ARG B 255 -20.629 23.647 -2.991 1.00 59.63 N ATOM 4295 CZ ARG B 255 -20.370 24.951 -3.120 1.00 62.22 C ATOM 4296 NH1 ARG B 255 -20.262 25.723 -2.036 1.00 63.52 N ATOM 4297 NH2 ARG B 255 -20.207 25.487 -4.332 1.00 60.66 N ATOM 4298 N GLU B 256 -17.835 18.809 -3.577 1.00 52.25 N ATOM 4299 CA GLU B 256 -17.012 18.253 -4.641 1.00 54.48 C ATOM 4300 C GLU B 256 -17.175 16.732 -4.750 1.00 55.70 C ATOM 4301 O GLU B 256 -17.172 16.174 -5.852 1.00 56.74 O ATOM 4302 CB GLU B 256 -15.536 18.568 -4.408 1.00 55.00 C ATOM 4303 CG GLU B 256 -15.167 20.024 -4.546 1.00 55.21 C ATOM 4304 CD GLU B 256 -13.690 20.238 -4.363 1.00 56.50 C ATOM 4305 OE1 GLU B 256 -13.136 19.646 -3.406 1.00 57.48 O ATOM 4306 OE2 GLU B 
256 -13.090 20.996 -5.165 1.00 57.60 O ATOM 4307 N GLY B 257 -17.307 16.066 -3.605 1.00 55.23 N ATOM 4308 CA GLY B 257 -17.458 14.621 -3.604 1.00 54.94 C ATOM 4309 C GLY B 257 -18.809 14.196 -4.130 1.00 54.60 C ATOM 4310 O GLY B 257 -18.952 13.171 -4.804 1.00 52.82 O ATOM 4311 N LEU B 258 -19.816 14.985 -3.792 1.00 54.63 N ATOM 4312 CA LEU B 258 -21.140 14.698 -4.262 1.00 55.19 C ATOM 4313 C LEU B 258 -21.180 15.085 -5.724 1.00 56.88 C ATOM 4314 O LEU B 258 -21.970 14.533 -6.492 1.00 58.61 O ATOM 4315 CB LEU B 258 -22.189 15.472 -3.470 1.00 53.48 C ATOM 4316 CG LEU B 258 -22.512 14.822 -2.128 1.00 51.43 C ATOM 4317 CD1 LEU B 258 -23.756 15.439 -1.523 1.00 50.84 C ATOM 4318 CD2 LEU B 258 -22.745 13.351 -2.353 1.00 52.56 C ATOM 4319 N TYR B 259 -20.326 16.013 -6.139 1.00 56.63 N ATOM 4320 CA TYR B 259 -20.360 16.361 -7.545 1.00 58.61 C ATOM 4321 C TYR B 259 -19.828 15.208 -8.381 1.00 59.71 C ATOM 4322 O TYR B 259 -20.349 14.921 -9.464 1.00 61.48 O ATOM 4323 CB TYR B 259 -19.542 17.592 -7.860 1.00 58.79 C ATOM 4324 CG TYR B 259 -19.789 18.056 -9.278 1.00 59.56 C ATOM 4325 CD1 TYR B 259 -20.823 18.958 -9.563 1.00 58.31 C ATOM 4326 CD2 TYR B 259 -18.976 17.609 -10.334 1.00 57.95 C ATOM 4327 CE1 TYR B 259 -21.034 19.417 -10.863 1.00 60.55 C ATOM 4328 CE2 TYR B 259 -19.172 18.055 -11.636 1.00 58.97 C ATOM 4329 CZ TYR B 259 -20.199 18.965 -11.896 1.00 61.35 C ATOM 4330 OH TYR B 259 -20.372 19.458 -13.168 1.00 61.49 O ATOM 4331 N ILE B 260 -18.786 14.552 -7.880 1.00 59.36 N ATOM 4332 CA ILE B 260 -18.202 13.421 -8.583 1.00 59.47 C ATOM 4333 C ILE B 260 -19.249 12.321 -8.792 1.00 59.59 C ATOM 4334 O ILE B 260 -19.676 12.079 -9.921 1.00 57.54 O ATOM 4335 CB ILE B 260 -16.982 12.869 -7.808 1.00 60.43 C ATOM 4336 CG1 ILE B 260 -15.830 13.883 -7.866 1.00 60.99 C ATOM 4337 CG2 ILE B 260 -16.537 11.545 -8.394 1.00 61.05 C ATOM 4338 CD1 ILE B 260 -14.602 13.463 -7.095 1.00 60.66 C ATOM 4339 N THR B 261 -19.673 11.663 -7.715 1.00 61.59 N ATOM 4340 CA THR B 261 -20.679 10.600 -7.830 
1.00 63.51 C ATOM 4341 C THR B 261 -21.848 11.051 -8.724 1.00 67.33 C ATOM 4342 O THR B 261 -22.469 10.226 -9.416 1.00 68.11 O ATOM 4343 CB THR B 261 -21.267 10.214 -6.463 1.00 60.27 C ATOM 4344 OG1 THR B 261 -21.775 11.390 -5.839 1.00 57.46 O ATOM 4345 CG2 THR B 261 -20.224 9.565 -5.562 1.00 57.45 C ATOM 4346 N GLU B 262 -22.149 12.352 -8.694 1.00 69.10 N ATOM 4347 CA GLU B 262 -23.231 12.911 -9.506 1.00 71.64 C ATOM 4348 C GLU B 262 -22.948 12.847 -11.012 1.00 74.49 C ATOM 4349 O GLU B 262 -23.808 12.429 -11.791 1.00 73.82 O ATOM 4350 CB GLU B 262 -23.521 14.356 -9.100 1.00 70.03 C ATOM 4351 CG GLU B 262 -24.330 14.476 -7.827 1.00 66.23 C ATOM 4352 CD GLU B 262 -24.944 15.852 -7.659 1.00 64.16 C ATOM 4353 OE1 GLU B 262 -24.311 16.850 -8.079 1.00 61.75 O ATOM 4354 OE2 GLU B 262 -26.056 15.930 -7.091 1.00 61.96 O ATOM 4355 N GLU B 263 -21.756 13.272 -11.424 1.00 77.10 N ATOM 4356 CA GLU B 263 -21.389 13.210 -12.836 1.00 80.64 C ATOM 4357 C GLU B 263 -21.162 11.756 -13.256 1.00 82.90 C ATOM 4358 O GLU B 263 -21.287 11.410 -14.435 1.00 83.65 O ATOM 4359 CB GLU B 263 -20.131 14.024 -13.100 1.00 81.10 C ATOM 4360 CG GLU B 263 -20.421 15.491 -13.227 1.00 85.33 C ATOM 4361 CD GLU B 263 -21.485 15.765 -14.284 1.00 87.53 C ATOM 4362 OE1 GLU B 263 -21.239 15.415 -15.459 1.00 89.71 O ATOM 4363 OE2 GLU B 263 -22.559 16.321 -13.943 1.00 87.81 O ATOM 4364 N ILE B 264 -20.823 10.915 -12.283 1.00 83.91 N ATOM 4365 CA ILE B 264 -20.611 9.497 -12.526 1.00 86.02 C ATOM 4366 C ILE B 264 -21.955 8.900 -12.934 1.00 87.43 C ATOM 4367 O ILE B 264 -22.090 8.238 -13.972 1.00 86.72 O ATOM 4368 CB ILE B 264 -20.111 8.777 -11.239 1.00 86.67 C ATOM 4369 CG1 ILE B 264 -18.624 9.057 -11.019 1.00 87.39 C ATOM 4370 CG2 ILE B 264 -20.400 7.278 -11.320 1.00 86.51 C ATOM 4371 CD1 ILE B 264 -17.722 8.636 -12.167 1.00 86.98 C ATOM 4372 N TYR B 265 -22.947 9.151 -12.089 1.00 88.88 N ATOM 4373 CA TYR B 265 -24.294 8.665 -12.313 1.00 89.33 C ATOM 4374 C TYR B 265 -24.838 9.135 -13.662 1.00 89.27 C ATOM 
4375 O TYR B 265 -25.335 8.338 -14.451 1.00 89.51 O ATOM 4376 CB TYR B 265 -25.211 9.151 -11.191 1.00 89.38 C ATOM 4377 CG TYR B 265 -26.653 8.978 -11.527 1.00 89.34 C ATOM 4378 CD1 TYR B 265 -27.218 7.713 -11.579 1.00 90.14 C ATOM 4379 CD2 TYR B 265 -27.434 10.072 -11.894 1.00 90.59 C ATOM 4380 CE1 TYR B 265 -28.532 7.528 -12.002 1.00 92.33 C ATOM 4381 CE2 TYR B 265 -28.749 9.905 -12.318 1.00 92.63 C ATOM 4382 CZ TYR B 265 -29.294 8.627 -12.373 1.00 92.95 C ATOM 4383 OH TYR B 265 -30.590 8.450 -12.814 1.00 93.11 O ATOM 4384 N LYS B 266 -24.736 10.434 -13.920 1.00 89.51 N ATOM 4385 CA LYS B 266 -25.224 11.027 -15.161 1.00 90.13 C ATOM 4386 C LYS B 266 -24.641 10.431 -16.447 1.00 89.83 C ATOM 4387 O LYS B 266 -24.958 10.888 -17.547 1.00 89.92 O ATOM 4388 CB LYS B 266 -24.980 12.542 -15.132 1.00 91.98 C ATOM 4389 CG LYS B 266 -25.880 13.295 -14.154 1.00 93.52 C ATOM 4390 CD LYS B 266 -25.434 14.744 -13.932 1.00 94.83 C ATOM 4391 CE LYS B 266 -25.519 15.597 -15.201 1.00 95.81 C ATOM 4392 NZ LYS B 266 -24.411 15.351 -16.180 1.00 96.25 N ATOM 4393 N THR B 267 -23.799 9.411 -16.319 1.00 89.07 N ATOM 4394 CA THR B 267 -23.203 8.782 -17.495 1.00 87.97 C ATOM 4395 C THR B 267 -23.697 7.343 -17.653 1.00 88.38 C ATOM 4396 O THR B 267 -23.315 6.633 -18.588 1.00 88.04 O ATOM 4397 CB THR B 267 -21.683 8.776 -17.391 1.00 86.90 C ATOM 4398 OG1 THR B 267 -21.280 7.862 -16.362 1.00 86.88 O ATOM 4399 CG2 THR B 267 -21.189 10.174 -17.069 1.00 84.42 C ATOM 4400 N GLY B 268 -24.547 6.930 -16.721 1.00 88.84 N ATOM 4401 CA GLY B 268 -25.110 5.595 -16.745 1.00 88.87 C ATOM 4402 C GLY B 268 -24.112 4.461 -16.897 1.00 89.07 C ATOM 4403 O GLY B 268 -24.507 3.311 -17.090 1.00 90.27 O ATOM 4404 N LEU B 269 -22.824 4.758 -16.793 1.00 87.70 N ATOM 4405 CA LEU B 269 -21.826 3.715 -16.953 1.00 87.22 C ATOM 4406 C LEU B 269 -21.290 3.108 -15.666 1.00 86.80 C ATOM 4407 O LEU B 269 -20.222 2.487 -15.674 1.00 84.64 O ATOM 4408 CB LEU B 269 -20.655 4.248 -17.770 1.00 88.82 C ATOM 4409 CG LEU B 269 
-20.966 4.646 -19.208 1.00 89.57 C ATOM 4410 CD1 LEU B 269 -19.654 4.970 -19.908 1.00 89.71 C ATOM 4411 CD2 LEU B 269 -21.686 3.506 -19.930 1.00 89.67 C ATOM 4412 N LEU B 270 -22.021 3.273 -14.564 1.00 87.46 N ATOM 4413 CA LEU B 270 -21.566 2.730 -13.284 1.00 88.00 C ATOM 4414 C LEU B 270 -21.791 1.232 -13.229 1.00 87.09 C ATOM 4415 O LEU B 270 -22.894 0.750 -13.487 1.00 87.38 O ATOM 4416 CB LEU B 270 -22.285 3.384 -12.096 1.00 88.45 C ATOM 4417 CG LEU B 270 -21.788 2.873 -10.728 1.00 88.79 C ATOM 4418 CD1 LEU B 270 -20.402 3.440 -10.438 1.00 86.57 C ATOM 4419 CD2 LEU B 270 -22.769 3.264 -9.631 1.00 88.64 C ATOM 4420 N SER B 271 -20.745 0.496 -12.880 1.00 85.87 N ATOM 4421 CA SER B 271 -20.863 -0.944 -12.812 1.00 85.51 C ATOM 4422 C SER B 271 -20.491 -1.467 -11.422 1.00 83.98 C ATOM 4423 O SER B 271 -21.197 -2.308 -10.853 1.00 83.61 O ATOM 4424 CB SER B 271 -19.976 -1.567 -13.893 1.00 86.30 C ATOM 4425 OG SER B 271 -20.537 -2.781 -14.356 1.00 87.42 O ATOM 4426 N GLY B 272 -19.389 -0.953 -10.878 1.00 81.96 N ATOM 4427 CA GLY B 272 -18.936 -1.374 -9.564 1.00 78.89 C ATOM 4428 C GLY B 272 -18.612 -0.177 -8.691 1.00 76.86 C ATOM 4429 O GLY B 272 -17.954 0.765 -9.139 1.00 75.55 O ATOM 4430 N LEU B 273 -19.061 -0.221 -7.440 1.00 74.99 N ATOM 4431 CA LEU B 273 -18.843 0.879 -6.505 1.00 72.30 C ATOM 4432 C LEU B 273 -18.180 0.433 -5.200 1.00 70.78 C ATOM 4433 O LEU B 273 -18.545 -0.588 -4.619 1.00 69.89 O ATOM 4434 CB LEU B 273 -20.187 1.544 -6.180 1.00 71.11 C ATOM 4435 CG LEU B 273 -20.298 3.047 -5.890 1.00 69.91 C ATOM 4436 CD1 LEU B 273 -21.422 3.239 -4.886 1.00 68.42 C ATOM 4437 CD2 LEU B 273 -18.994 3.623 -5.343 1.00 68.44 C ATOM 4438 N ASP B 274 -17.208 1.209 -4.742 1.00 70.14 N ATOM 4439 CA ASP B 274 -16.527 0.896 -3.493 1.00 69.92 C ATOM 4440 C ASP B 274 -16.599 2.047 -2.480 1.00 68.55 C ATOM 4441 O ASP B 274 -16.219 3.181 -2.767 1.00 68.14 O ATOM 4442 CB ASP B 274 -15.061 0.504 -3.756 1.00 69.92 C ATOM 4443 CG ASP B 274 -14.925 -0.871 -4.413 1.00 70.83 C ATOM 
4444 OD1 ASP B 274 -15.643 -1.813 -4.004 1.00 69.82 O ATOM 4445 OD2 ASP B 274 -14.090 -1.014 -5.328 1.00 71.83 O ATOM 4446 N ILE B 275 -17.131 1.742 -1.303 1.00 67.15 N ATOM 4447 CA ILE B 275 -17.245 2.712 -0.223 1.00 65.60 C ATOM 4448 C ILE B 275 -16.417 2.118 0.910 1.00 65.13 C ATOM 4449 O ILE B 275 -16.938 1.399 1.780 1.00 63.76 O ATOM 4450 CB ILE B 275 -18.717 2.901 0.215 1.00 65.55 C ATOM 4451 CG1 ILE B 275 -19.465 3.725 -0.832 1.00 65.04 C ATOM 4452 CG2 ILE B 275 -18.794 3.586 1.562 1.00 65.82 C ATOM 4453 CD1 ILE B 275 -19.774 2.967 -2.099 1.00 67.21 C ATOM 4454 N MET B 276 -15.119 2.436 0.866 1.00 63.90 N ATOM 4455 CA MET B 276 -14.124 1.935 1.808 1.00 62.24 C ATOM 4456 C MET B 276 -13.640 2.957 2.815 1.00 61.84 C ATOM 4457 O MET B 276 -13.971 4.141 2.726 1.00 60.99 O ATOM 4458 CB MET B 276 -12.892 1.446 1.047 1.00 62.85 C ATOM 4459 CG MET B 276 -13.190 0.788 -0.276 1.00 66.01 C ATOM 4460 SD MET B 276 -14.375 -0.544 -0.098 1.00 70.21 S ATOM 4461 CE MET B 276 -13.533 -1.520 1.190 1.00 69.59 C ATOM 4462 N GLU B 277 -12.831 2.451 3.756 1.00 61.47 N ATOM 4463 CA GLU B 277 -12.154 3.196 4.840 1.00 61.13 C ATOM 4464 C GLU B 277 -12.970 3.867 5.936 1.00 60.56 C ATOM 4465 O GLU B 277 -12.479 4.789 6.589 1.00 58.86 O ATOM 4466 CB GLU B 277 -11.188 4.252 4.261 1.00 60.57 C ATOM 4467 CG GLU B 277 -9.977 3.708 3.482 1.00 59.95 C ATOM 4468 CD GLU B 277 -9.153 2.713 4.268 1.00 60.38 C ATOM 4469 OE1 GLU B 277 -9.195 2.718 5.516 1.00 60.82 O ATOM 4470 OE2 GLU B 277 -8.460 1.892 3.649 1.00 63.05 O ATOM 4471 N VAL B 278 -14.201 3.422 6.149 1.00 60.40 N ATOM 4472 CA VAL B 278 -15.009 4.028 7.191 1.00 61.42 C ATOM 4473 C VAL B 278 -14.776 3.290 8.486 1.00 61.69 C ATOM 4474 O VAL B 278 -15.062 2.105 8.553 1.00 62.84 O ATOM 4475 CB VAL B 278 -16.490 3.933 6.893 1.00 61.62 C ATOM 4476 CG1 VAL B 278 -17.262 4.280 8.147 1.00 60.45 C ATOM 4477 CG2 VAL B 278 -16.862 4.888 5.769 1.00 63.02 C ATOM 4478 N ASN B 279 -14.288 3.987 9.514 1.00 62.09 N ATOM 4479 CA ASN B 279 -14.014 3.355 
10.806 1.00 62.05 C ATOM 4480 C ASN B 279 -14.909 3.855 11.926 1.00 62.56 C ATOM 4481 O ASN B 279 -14.475 4.674 12.726 1.00 62.08 O ATOM 4482 CB ASN B 279 -12.564 3.597 11.218 1.00 61.37 C ATOM 4483 CG ASN B 279 -11.976 2.422 11.977 1.00 60.61 C ATOM 4484 OD1 ASN B 279 -12.700 1.672 12.638 1.00 58.30 O ATOM 4485 ND2 ASN B 279 -10.655 2.257 11.888 1.00 57.50 N ATOM 4486 N PRO B 280 -16.161 3.355 12.011 1.00 64.39 N ATOM 4487 CA PRO B 280 -17.190 3.701 13.008 1.00 67.08 C ATOM 4488 C PRO B 280 -16.722 3.820 14.455 1.00 69.58 C ATOM 4489 O PRO B 280 -17.513 4.097 15.358 1.00 69.61 O ATOM 4490 CB PRO B 280 -18.225 2.593 12.826 1.00 65.11 C ATOM 4491 CG PRO B 280 -18.194 2.386 11.383 1.00 64.36 C ATOM 4492 CD PRO B 280 -16.698 2.354 11.076 1.00 64.19 C ATOM 4493 N THR B 281 -15.429 3.614 14.660 1.00 72.57 N ATOM 4494 CA THR B 281 -14.835 3.685 15.980 1.00 75.80 C ATOM 4495 C THR B 281 -13.861 4.852 16.071 1.00 76.44 C ATOM 4496 O THR B 281 -13.657 5.423 17.141 1.00 76.53 O ATOM 4497 CB THR B 281 -14.105 2.371 16.295 1.00 76.94 C ATOM 4498 OG1 THR B 281 -13.387 1.939 15.132 1.00 79.55 O ATOM 4499 CG2 THR B 281 -15.100 1.291 16.684 1.00 77.78 C ATOM 4500 N LEU B 282 -13.259 5.207 14.945 1.00 78.33 N ATOM 4501 CA LEU B 282 -12.316 6.310 14.927 1.00 80.29 C ATOM 4502 C LEU B 282 -13.041 7.643 14.878 1.00 83.19 C ATOM 4503 O LEU B 282 -12.875 8.417 13.938 1.00 85.02 O ATOM 4504 CB LEU B 282 -11.357 6.185 13.740 1.00 77.40 C ATOM 4505 CG LEU B 282 -10.506 4.921 13.807 1.00 75.93 C ATOM 4506 CD1 LEU B 282 -9.445 4.935 12.731 1.00 75.40 C ATOM 4507 CD2 LEU B 282 -9.874 4.837 15.173 1.00 75.90 C ATOM 4508 N GLY B 283 -13.857 7.899 15.894 1.00 85.07 N ATOM 4509 CA GLY B 283 -14.577 9.158 15.963 1.00 86.60 C ATOM 4510 C GLY B 283 -14.207 9.929 17.223 1.00 87.56 C ATOM 4511 O GLY B 283 -14.351 9.407 18.338 1.00 88.62 O ATOM 4512 N LYS B 284 -13.715 11.159 17.047 1.00 86.77 N ATOM 4513 CA LYS B 284 -13.327 12.019 18.170 1.00 85.88 C ATOM 4514 C LYS B 284 -14.562 12.194 19.046 1.00 
85.02 C ATOM 4515 O LYS B 284 -14.466 12.259 20.275 1.00 84.12 O ATOM 4516 CB LYS B 284 -12.864 13.392 17.662 1.00 87.29 C ATOM 4517 CG LYS B 284 -11.694 14.023 18.422 1.00 87.71 C ATOM 4518 CD LYS B 284 -10.348 13.677 17.777 1.00 87.88 C ATOM 4519 CE LYS B 284 -10.090 12.171 17.780 1.00 88.18 C ATOM 4520 NZ LYS B 284 -8.876 11.769 17.023 1.00 87.29 N ATOM 4521 N THR B 285 -15.717 12.274 18.382 1.00 84.15 N ATOM 4522 CA THR B 285 -17.013 12.416 19.030 1.00 83.26 C ATOM 4523 C THR B 285 -18.043 11.672 18.189 1.00 82.99 C ATOM 4524 O THR B 285 -17.906 11.557 16.969 1.00 81.31 O ATOM 4525 CB THR B 285 -17.471 13.877 19.110 1.00 83.35 C ATOM 4526 OG1 THR B 285 -17.924 14.311 17.820 1.00 84.05 O ATOM 4527 CG2 THR B 285 -16.332 14.766 19.568 1.00 84.61 C ATOM 4528 N PRO B 286 -19.093 11.157 18.838 1.00 83.41 N ATOM 4529 CA PRO B 286 -20.171 10.419 18.180 1.00 82.95 C ATOM 4530 C PRO B 286 -20.686 11.076 16.900 1.00 82.76 C ATOM 4531 O PRO B 286 -20.970 10.385 15.918 1.00 83.94 O ATOM 4532 CB PRO B 286 -21.227 10.336 19.271 1.00 83.43 C ATOM 4533 CG PRO B 286 -20.376 10.131 20.497 1.00 83.83 C ATOM 4534 CD PRO B 286 -19.289 11.174 20.301 1.00 84.08 C ATOM 4535 N GLU B 287 -20.816 12.401 16.916 1.00 81.54 N ATOM 4536 CA GLU B 287 -21.278 13.134 15.739 1.00 80.48 C ATOM 4537 C GLU B 287 -20.350 12.844 14.549 1.00 79.49 C ATOM 4538 O GLU B 287 -20.801 12.375 13.497 1.00 78.67 O ATOM 4539 CB GLU B 287 -21.306 14.640 16.048 1.00 82.25 C ATOM 4540 CG GLU B 287 -20.670 15.575 14.991 1.00 83.83 C ATOM 4541 CD GLU B 287 -21.444 15.651 13.667 1.00 84.57 C ATOM 4542 OE1 GLU B 287 -21.411 14.670 12.894 1.00 85.56 O ATOM 4543 OE2 GLU B 287 -22.085 16.694 13.392 1.00 84.28 O ATOM 4544 N GLU B 288 -19.058 13.124 14.730 1.00 77.25 N ATOM 4545 CA GLU B 288 -18.054 12.893 13.695 1.00 73.95 C ATOM 4546 C GLU B 288 -18.300 11.588 12.963 1.00 72.69 C ATOM 4547 O GLU B 288 -18.205 11.531 11.741 1.00 72.64 O ATOM 4548 CB GLU B 288 -16.660 12.858 14.303 1.00 72.46 C ATOM 4549 CG GLU B 288 
-16.253 14.151 14.968 1.00 70.50 C ATOM 4550 CD GLU B 288 -14.748 14.285 15.077 1.00 69.40 C ATOM 4551 OE1 GLU B 288 -14.042 13.305 14.736 1.00 68.54 O ATOM 4552 OE2 GLU B 288 -14.269 15.363 15.503 1.00 68.62 O ATOM 4553 N VAL B 289 -18.592 10.533 13.718 1.00 71.45 N ATOM 4554 CA VAL B 289 -18.879 9.235 13.115 1.00 70.18 C ATOM 4555 C VAL B 289 -20.140 9.457 12.289 1.00 69.20 C ATOM 4556 O VAL B 289 -20.085 9.474 11.060 1.00 68.99 O ATOM 4557 CB VAL B 289 -19.139 8.118 14.191 1.00 69.79 C ATOM 4558 CG1 VAL B 289 -19.421 6.783 13.513 1.00 67.62 C ATOM 4559 CG2 VAL B 289 -17.921 7.970 15.103 1.00 69.71 C ATOM 4560 N THR B 290 -21.268 9.646 12.970 1.00 68.00 N ATOM 4561 CA THR B 290 -22.547 9.886 12.301 1.00 66.88 C ATOM 4562 C THR B 290 -22.354 10.705 11.022 1.00 66.56 C ATOM 4563 O THR B 290 -22.879 10.361 9.956 1.00 66.19 O ATOM 4564 CB THR B 290 -23.512 10.637 13.231 1.00 65.35 C ATOM 4565 OG1 THR B 290 -23.831 9.796 14.340 1.00 63.64 O ATOM 4566 CG2 THR B 290 -24.785 11.014 12.497 1.00 64.85 C ATOM 4567 N ARG B 291 -21.601 11.794 11.149 1.00 65.26 N ATOM 4568 CA ARG B 291 -21.305 12.673 10.019 1.00 64.06 C ATOM 4569 C ARG B 291 -20.752 11.776 8.920 1.00 63.34 C ATOM 4570 O ARG B 291 -21.368 11.576 7.869 1.00 64.70 O ATOM 4571 CB ARG B 291 -20.242 13.694 10.431 1.00 61.65 C ATOM 4572 CG ARG B 291 -19.924 14.758 9.410 1.00 58.64 C ATOM 4573 CD ARG B 291 -18.895 15.714 9.997 1.00 56.90 C ATOM 4574 NE ARG B 291 -19.239 16.039 11.378 1.00 57.01 N ATOM 4575 CZ ARG B 291 -18.431 16.640 12.249 1.00 57.06 C ATOM 4576 NH1 ARG B 291 -17.200 17.003 11.887 1.00 57.97 N ATOM 4577 NH2 ARG B 291 -18.840 16.850 13.497 1.00 55.48 N ATOM 4578 N THR B 292 -19.583 11.223 9.210 1.00 61.05 N ATOM 4579 CA THR B 292 -18.870 10.335 8.319 1.00 58.67 C ATOM 4580 C THR B 292 -19.791 9.300 7.690 1.00 58.47 C ATOM 4581 O THR B 292 -19.712 9.011 6.501 1.00 57.49 O ATOM 4582 CB THR B 292 -17.768 9.615 9.105 1.00 58.81 C ATOM 4583 OG1 THR B 292 -16.924 10.586 9.758 1.00 56.48 O ATOM 4584 CG2 THR 
B 292 -16.945 8.730 8.167 1.00 59.71 C ATOM 4585 N VAL B 293 -20.680 8.754 8.504 1.00 59.78 N ATOM 4586 CA VAL B 293 -21.613 7.736 8.055 1.00 58.82 C ATOM 4587 C VAL B 293 -22.630 8.206 7.030 1.00 58.73 C ATOM 4588 O VAL B 293 -22.764 7.603 5.974 1.00 60.24 O ATOM 4589 CB VAL B 293 -22.347 7.136 9.250 1.00 57.64 C ATOM 4590 CG1 VAL B 293 -23.587 6.420 8.796 1.00 59.61 C ATOM 4591 CG2 VAL B 293 -21.424 6.177 9.968 1.00 56.78 C ATOM 4592 N ASN B 294 -23.352 9.275 7.325 1.00 59.01 N ATOM 4593 CA ASN B 294 -24.355 9.760 6.381 1.00 59.52 C ATOM 4594 C ASN B 294 -23.742 10.253 5.077 1.00 57.04 C ATOM 4595 O ASN B 294 -24.173 9.860 4.000 1.00 56.04 O ATOM 4596 CB ASN B 294 -25.189 10.861 7.033 1.00 62.65 C ATOM 4597 CG ASN B 294 -25.795 10.416 8.355 1.00 65.74 C ATOM 4598 OD1 ASN B 294 -26.572 9.445 8.408 1.00 65.73 O ATOM 4599 ND2 ASN B 294 -25.433 11.114 9.437 1.00 65.75 N ATOM 4600 N THR B 295 -22.732 11.108 5.178 1.00 55.78 N ATOM 4601 CA THR B 295 -22.048 11.630 3.995 1.00 53.42 C ATOM 4602 C THR B 295 -21.631 10.459 3.087 1.00 51.63 C ATOM 4603 O THR B 295 -21.720 10.529 1.853 1.00 51.32 O ATOM 4604 CB THR B 295 -20.802 12.457 4.408 1.00 51.74 C ATOM 4605 OG1 THR B 295 -20.117 12.920 3.243 1.00 51.03 O ATOM 4606 CG2 THR B 295 -19.850 11.609 5.222 1.00 52.14 C ATOM 4607 N ALA B 296 -21.181 9.381 3.712 1.00 48.87 N ATOM 4608 CA ALA B 296 -20.797 8.205 2.972 1.00 48.83 C ATOM 4609 C ALA B 296 -22.069 7.688 2.292 1.00 49.86 C ATOM 4610 O ALA B 296 -22.149 7.586 1.066 1.00 48.11 O ATOM 4611 CB ALA B 296 -20.250 7.180 3.919 1.00 48.98 C ATOM 4612 N VAL B 297 -23.067 7.391 3.120 1.00 51.36 N ATOM 4613 CA VAL B 297 -24.375 6.917 2.672 1.00 53.60 C ATOM 4614 C VAL B 297 -24.870 7.798 1.533 1.00 55.78 C ATOM 4615 O VAL B 297 -25.296 7.314 0.477 1.00 58.28 O ATOM 4616 CB VAL B 297 -25.413 7.011 3.823 1.00 53.14 C ATOM 4617 CG1 VAL B 297 -26.823 6.795 3.299 1.00 50.59 C ATOM 4618 CG2 VAL B 297 -25.086 5.989 4.895 1.00 54.44 C ATOM 4619 N ALA B 298 -24.816 9.100 1.767 1.00 56.73 
N ATOM 4620 CA ALA B 298 -25.261 10.070 0.796 1.00 58.92 C ATOM 4621 C ALA B 298 -24.502 9.877 -0.505 1.00 61.14 C ATOM 4622 O ALA B 298 -25.101 9.902 -1.583 1.00 62.85 O ATOM 4623 CB ALA B 298 -25.035 11.481 1.341 1.00 58.61 C ATOM 4624 N LEU B 299 -23.186 9.690 -0.415 1.00 60.82 N ATOM 4625 CA LEU B 299 -22.391 9.511 -1.621 1.00 61.09 C ATOM 4626 C LEU B 299 -22.836 8.292 -2.433 1.00 62.91 C ATOM 4627 O LEU B 299 -22.633 8.249 -3.644 1.00 65.11 O ATOM 4628 CB LEU B 299 -20.906 9.370 -1.286 1.00 57.81 C ATOM 4629 CG LEU B 299 -20.026 10.567 -0.915 1.00 55.38 C ATOM 4630 CD1 LEU B 299 -18.612 10.065 -0.868 1.00 53.96 C ATOM 4631 CD2 LEU B 299 -20.096 11.695 -1.923 1.00 51.97 C ATOM 4632 N THR B 300 -23.433 7.302 -1.779 1.00 62.76 N ATOM 4633 CA THR B 300 -23.872 6.112 -2.499 1.00 62.65 C ATOM 4634 C THR B 300 -25.097 6.447 -3.318 1.00 61.32 C ATOM 4635 O THR B 300 -25.066 6.382 -4.543 1.00 59.77 O ATOM 4636 CB THR B 300 -24.231 4.966 -1.539 1.00 63.73 C ATOM 4637 OG1 THR B 300 -23.243 4.873 -0.507 1.00 65.35 O ATOM 4638 CG2 THR B 300 -24.270 3.636 -2.291 1.00 63.25 C ATOM 4639 N LEU B 301 -26.169 6.799 -2.618 1.00 60.60 N ATOM 4640 CA LEU B 301 -27.430 7.189 -3.238 1.00 62.25 C ATOM 4641 C LEU B 301 -27.212 8.230 -4.327 1.00 62.75 C ATOM 4642 O LEU B 301 -28.117 8.541 -5.093 1.00 61.14 O ATOM 4643 CB LEU B 301 -28.364 7.787 -2.187 1.00 62.64 C ATOM 4644 CG LEU B 301 -28.728 6.888 -1.011 1.00 63.60 C ATOM 4645 CD1 LEU B 301 -29.412 7.684 0.095 1.00 64.16 C ATOM 4646 CD2 LEU B 301 -29.636 5.795 -1.528 1.00 64.47 C ATOM 4647 N SER B 302 -26.014 8.797 -4.368 1.00 65.66 N ATOM 4648 CA SER B 302 -25.693 9.803 -5.369 1.00 67.63 C ATOM 4649 C SER B 302 -25.419 9.065 -6.667 1.00 69.13 C ATOM 4650 O SER B 302 -26.016 9.357 -7.706 1.00 70.32 O ATOM 4651 CB SER B 302 -24.467 10.605 -4.937 1.00 67.19 C ATOM 4652 OG SER B 302 -24.310 11.748 -5.754 1.00 69.11 O ATOM 4653 N CYS B 303 -24.527 8.083 -6.579 1.00 70.31 N ATOM 4654 CA CYS B 303 -24.149 7.241 -7.706 1.00 71.32 C ATOM 
4655 C CYS B 303 -25.357 6.512 -8.289 1.00 72.31 C ATOM 4656 O CYS B 303 -25.288 5.943 -9.380 1.00 72.83 O ATOM 4657 CB CYS B 303 -23.153 6.190 -7.239 1.00 71.03 C ATOM 4658 SG CYS B 303 -21.782 6.880 -6.365 1.00 72.69 S ATOM 4659 N PHE B 304 -26.464 6.530 -7.557 1.00 73.09 N ATOM 4660 CA PHE B 304 -27.655 5.831 -7.994 1.00 72.61 C ATOM 4661 C PHE B 304 -28.883 6.682 -8.261 1.00 73.36 C ATOM 4662 O PHE B 304 -30.007 6.239 -8.016 1.00 74.39 O ATOM 4663 CB PHE B 304 -27.984 4.753 -6.975 1.00 71.80 C ATOM 4664 CG PHE B 304 -26.899 3.738 -6.816 1.00 71.95 C ATOM 4665 CD1 PHE B 304 -26.723 2.736 -7.763 1.00 72.36 C ATOM 4666 CD2 PHE B 304 -26.029 3.798 -5.738 1.00 72.11 C ATOM 4667 CE1 PHE B 304 -25.695 1.806 -7.639 1.00 71.54 C ATOM 4668 CE2 PHE B 304 -24.999 2.873 -5.606 1.00 72.05 C ATOM 4669 CZ PHE B 304 -24.833 1.875 -6.560 1.00 71.30 C ATOM 4670 N GLY B 305 -28.685 7.899 -8.757 1.00 72.86 N ATOM 4671 CA GLY B 305 -29.839 8.718 -9.067 1.00 72.03 C ATOM 4672 C GLY B 305 -29.964 10.063 -8.392 1.00 71.73 C ATOM 4673 O GLY B 305 -29.841 11.100 -9.055 1.00 72.28 O ATOM 4674 N THR B 306 -30.224 10.047 -7.085 1.00 70.88 N ATOM 4675 CA THR B 306 -30.388 11.276 -6.317 1.00 69.05 C ATOM 4676 C THR B 306 -29.483 12.389 -6.854 1.00 69.72 C ATOM 4677 O THR B 306 -28.260 12.218 -7.021 1.00 68.72 O ATOM 4678 CB THR B 306 -30.085 11.068 -4.837 1.00 66.56 C ATOM 4679 OG1 THR B 306 -30.506 9.762 -4.443 1.00 65.69 O ATOM 4680 CG2 THR B 306 -30.840 12.083 -4.013 1.00 65.22 C ATOM 4681 N LYS B 307 -30.117 13.522 -7.138 1.00 69.29 N ATOM 4682 CA LYS B 307 -29.453 14.700 -7.672 1.00 69.35 C ATOM 4683 C LYS B 307 -29.667 15.796 -6.639 1.00 68.64 C ATOM 4684 O LYS B 307 -30.580 15.699 -5.831 1.00 68.97 O ATOM 4685 CB LYS B 307 -30.096 15.051 -9.018 1.00 69.94 C ATOM 4686 CG LYS B 307 -30.241 13.809 -9.917 1.00 71.55 C ATOM 4687 CD LYS B 307 -31.297 13.970 -11.014 1.00 75.08 C ATOM 4688 CE LYS B 307 -30.800 14.826 -12.165 1.00 76.43 C ATOM 4689 NZ LYS B 307 -29.626 14.186 -12.835 1.00 77.73 
N ATOM 4690 N ARG B 308 -28.826 16.821 -6.628 1.00 68.82 N ATOM 4691 CA ARG B 308 -29.007 17.873 -5.640 1.00 69.63 C ATOM 4692 C ARG B 308 -30.032 18.907 -6.102 1.00 73.65 C ATOM 4693 O ARG B 308 -30.206 19.973 -5.483 1.00 74.33 O ATOM 4694 CB ARG B 308 -27.664 18.532 -5.308 1.00 65.67 C ATOM 4695 CG ARG B 308 -27.001 17.925 -4.070 1.00 61.84 C ATOM 4696 CD ARG B 308 -25.648 18.526 -3.776 1.00 56.86 C ATOM 4697 NE ARG B 308 -24.675 18.123 -4.778 1.00 51.40 N ATOM 4698 CZ ARG B 308 -23.667 18.885 -5.179 1.00 51.56 C ATOM 4699 NH1 ARG B 308 -23.502 20.097 -4.661 1.00 50.69 N ATOM 4700 NH2 ARG B 308 -22.829 18.439 -6.100 1.00 53.11 N ATOM 4701 N GLU B 309 -30.723 18.580 -7.188 1.00 76.34 N ATOM 4702 CA GLU B 309 -31.737 19.468 -7.713 1.00 78.81 C ATOM 4703 C GLU B 309 -33.104 18.841 -7.508 1.00 81.14 C ATOM 4704 O GLU B 309 -34.133 19.495 -7.688 1.00 80.97 O ATOM 4705 CB GLU B 309 -31.491 19.737 -9.194 1.00 78.42 C ATOM 4706 CG GLU B 309 -31.574 18.517 -10.065 1.00 79.83 C ATOM 4707 CD GLU B 309 -31.942 18.874 -11.486 1.00 80.70 C ATOM 4708 OE1 GLU B 309 -31.167 19.617 -12.136 1.00 79.92 O ATOM 4709 OE2 GLU B 309 -33.015 18.418 -11.949 1.00 80.64 O ATOM 4710 N GLY B 310 -33.109 17.570 -7.125 1.00 84.23 N ATOM 4711 CA GLY B 310 -34.360 16.870 -6.893 1.00 88.85 C ATOM 4712 C GLY B 310 -34.556 15.618 -7.734 1.00 92.10 C ATOM 4713 O GLY B 310 -33.765 15.313 -8.622 1.00 91.07 O ATOM 4714 N ASN B 311 -35.635 14.899 -7.446 1.00 97.12 N ATOM 4715 CA ASN B 311 -36.004 13.672 -8.148 1.00102.29 C ATOM 4716 C ASN B 311 -37.523 13.508 -7.993 1.00106.18 C ATOM 4717 O ASN B 311 -38.073 13.783 -6.920 1.00107.49 O ATOM 4718 CB ASN B 311 -35.299 12.452 -7.528 1.00102.14 C ATOM 4719 CG ASN B 311 -33.787 12.453 -7.753 1.00102.71 C ATOM 4720 OD1 ASN B 311 -33.317 12.390 -8.889 1.00103.04 O ATOM 4721 ND2 ASN B 311 -33.022 12.519 -6.665 1.00102.31 N ATOM 4722 N HIS B 312 -38.196 13.079 -9.061 1.00109.68 N ATOM 4723 CA HIS B 312 -39.647 12.869 -9.030 1.00112.52 C ATOM 4724 C HIS B 312 
-40.027 11.699 -9.942 1.00113.52 C ATOM 4725 O HIS B 312 -39.640 11.665 -11.116 1.00113.20 O ATOM 4726 CB HIS B 312 -40.390 14.141 -9.472 1.00114.32 C ATOM 4727 CG HIS B 312 -40.261 14.445 -10.937 1.00117.06 C ATOM 4728 ND1 HIS B 312 -39.111 14.957 -11.493 1.00118.23 N ATOM 4729 CD2 HIS B 312 -41.133 14.268 -11.960 1.00118.21 C ATOM 4730 CE1 HIS B 312 -39.276 15.082 -12.802 1.00118.99 C ATOM 4731 NE2 HIS B 312 -40.492 14.671 -13.108 1.00119.29 N ATOM 4732 N LYS B 313 -40.774 10.739 -9.403 1.00114.77 N ATOM 4733 CA LYS B 313 -41.193 9.572 -10.179 1.00116.23 C ATOM 4734 C LYS B 313 -41.850 10.039 -11.476 1.00117.88 C ATOM 4735 O LYS B 313 -42.826 10.791 -11.449 1.00118.20 O ATOM 4736 CB LYS B 313 -42.173 8.728 -9.364 1.00115.55 C ATOM 4737 CG LYS B 313 -41.627 8.267 -8.019 1.00114.52 C ATOM 4738 CD LYS B 313 -42.718 7.583 -7.203 1.00113.62 C ATOM 4739 CE LYS B 313 -42.253 7.217 -5.802 1.00111.93 C ATOM 4740 NZ LYS B 313 -43.366 6.624 -5.003 1.00109.72 N ATOM 4741 N PRO B 314 -41.313 9.606 -12.632 1.00119.02 N ATOM 4742 CA PRO B 314 -41.856 9.993 -13.937 1.00119.57 C ATOM 4743 C PRO B 314 -43.266 9.474 -14.194 1.00120.29 C ATOM 4744 O PRO B 314 -43.763 8.587 -13.488 1.00119.94 O ATOM 4745 CB PRO B 314 -40.852 9.399 -14.924 1.00119.10 C ATOM 4746 CG PRO B 314 -39.584 9.357 -14.141 1.00119.32 C ATOM 4747 CD PRO B 314 -40.064 8.846 -12.805 1.00119.29 C ATOM 4748 N GLU B 315 -43.895 10.044 -15.217 1.00120.71 N ATOM 4749 CA GLU B 315 -45.241 9.669 -15.630 1.00120.73 C ATOM 4750 C GLU B 315 -46.317 9.933 -14.585 1.00119.42 C ATOM 4751 O GLU B 315 -47.470 9.532 -14.757 1.00119.59 O ATOM 4752 CB GLU B 315 -45.266 8.195 -16.046 1.00122.78 C ATOM 4753 CG GLU B 315 -44.544 7.925 -17.364 1.00124.50 C ATOM 4754 CD GLU B 315 -44.439 6.448 -17.690 1.00125.58 C ATOM 4755 OE1 GLU B 315 -43.965 6.127 -18.799 1.00125.94 O ATOM 4756 OE2 GLU B 315 -44.821 5.613 -16.840 1.00126.86 O ATOM 4757 N THR B 316 -45.940 10.598 -13.500 1.00117.54 N ATOM 4758 CA THR B 316 -46.907 10.923 -12.462 
1.00115.78 C ATOM 4759 C THR B 316 -47.093 12.429 -12.497 1.00113.97 C ATOM 4760 O THR B 316 -46.118 13.178 -12.557 1.00113.84 O ATOM 4761 CB THR B 316 -46.422 10.493 -11.046 1.00116.09 C ATOM 4762 OG1 THR B 316 -46.225 9.073 -11.018 1.00116.18 O ATOM 4763 CG2 THR B 316 -47.459 10.873 -9.980 1.00115.05 C ATOM 4764 N ASP B 317 -48.352 12.860 -12.499 1.00111.82 N ATOM 4765 CA ASP B 317 -48.691 14.278 -12.513 1.00109.43 C ATOM 4766 C ASP B 317 -48.867 14.700 -11.054 1.00109.27 C ATOM 4767 O ASP B 317 -49.880 14.384 -10.414 1.00109.58 O ATOM 4768 CB ASP B 317 -49.979 14.506 -13.327 1.00106.72 C ATOM 4769 CG ASP B 317 -50.421 15.968 -13.363 1.00104.53 C ATOM 4770 OD1 ASP B 317 -49.615 16.849 -13.731 1.00103.22 O ATOM 4771 OD2 ASP B 317 -51.593 16.239 -13.032 1.00102.42 O ATOM 4772 N TYR B 318 -47.849 15.387 -10.529 1.00108.87 N ATOM 4773 CA TYR B 318 -47.841 15.871 -9.150 1.00108.47 C ATOM 4774 C TYR B 318 -48.716 17.110 -9.028 1.00109.21 C ATOM 4775 O TYR B 318 -48.716 17.781 -8.004 1.00108.22 O ATOM 4776 CB TYR B 318 -46.408 16.201 -8.711 1.00107.18 C ATOM 4777 CG TYR B 318 -45.524 15.004 -8.369 1.00105.92 C ATOM 4778 CD1 TYR B 318 -45.699 14.291 -7.172 1.00104.92 C ATOM 4779 CD2 TYR B 318 -44.490 14.605 -9.222 1.00104.89 C ATOM 4780 CE1 TYR B 318 -44.859 13.211 -6.836 1.00103.58 C ATOM 4781 CE2 TYR B 318 -43.646 13.527 -8.895 1.00104.05 C ATOM 4782 CZ TYR B 318 -43.837 12.839 -7.703 1.00103.64 C ATOM 4783 OH TYR B 318 -43.007 11.785 -7.389 1.00102.85 O ATOM 4784 N LEU B 319 -49.457 17.401 -10.092 1.00111.27 N ATOM 4785 CA LEU B 319 -50.373 18.542 -10.145 1.00112.79 C ATOM 4786 C LEU B 319 -51.834 18.080 -9.974 1.00113.76 C ATOM 4787 O LEU B 319 -52.480 18.493 -8.978 1.00114.70 O ATOM 4788 CB LEU B 319 -50.231 19.278 -11.488 1.00111.81 C ATOM 4789 CG LEU B 319 -49.017 20.186 -11.675 1.00110.60 C ATOM 4790 CD1 LEU B 319 -48.973 20.709 -13.104 1.00109.61 C ATOM 4791 CD2 LEU B 319 -49.104 21.336 -10.679 1.00109.90 C TER 4792 LEU B 319 ATOM 4793 N LYS C 6 -29.204 57.250 
-3.818 1.00 99.08 N ATOM 4794 CA LYS C 6 -30.445 56.415 -3.730 1.00 99.00 C ATOM 4795 C LYS C 6 -30.956 56.282 -2.282 1.00 98.40 C ATOM 4796 O LYS C 6 -30.220 56.533 -1.319 1.00 97.61 O ATOM 4797 CB LYS C 6 -30.170 55.034 -4.340 1.00 99.66 C ATOM 4798 CG LYS C 6 -30.959 54.727 -5.617 1.00 99.85 C ATOM 4799 CD LYS C 6 -32.365 54.205 -5.292 1.00100.73 C ATOM 4800 CE LYS C 6 -33.149 53.798 -6.542 1.00101.23 C ATOM 4801 NZ LYS C 6 -34.437 53.110 -6.207 1.00100.75 N ATOM 4802 N PRO C 7 -32.237 55.897 -2.117 1.00 97.80 N ATOM 4803 CA PRO C 7 -32.835 55.741 -0.791 1.00 97.15 C ATOM 4804 C PRO C 7 -33.133 54.299 -0.425 1.00 96.32 C ATOM 4805 O PRO C 7 -33.459 53.485 -1.286 1.00 96.22 O ATOM 4806 CB PRO C 7 -34.113 56.540 -0.920 1.00 97.20 C ATOM 4807 CG PRO C 7 -34.574 56.093 -2.283 1.00 97.43 C ATOM 4808 CD PRO C 7 -33.302 56.009 -3.131 1.00 97.47 C ATOM 4809 N ILE C 8 -33.050 54.009 0.868 1.00 95.57 N ATOM 4810 CA ILE C 8 -33.310 52.673 1.385 1.00 94.69 C ATOM 4811 C ILE C 8 -34.324 52.737 2.505 1.00 94.01 C ATOM 4812 O ILE C 8 -34.175 53.553 3.420 1.00 93.77 O ATOM 4813 CB ILE C 8 -32.031 52.021 2.005 1.00 94.64 C ATOM 4814 CG1 ILE C 8 -30.940 51.836 0.950 1.00 93.94 C ATOM 4815 CG2 ILE C 8 -32.381 50.679 2.641 1.00 93.66 C ATOM 4816 CD1 ILE C 8 -31.379 51.008 -0.217 1.00 94.03 C ATOM 4817 N GLU C 9 -35.362 51.909 2.431 1.00 92.63 N ATOM 4818 CA GLU C 9 -36.302 51.866 3.540 1.00 91.96 C ATOM 4819 C GLU C 9 -36.350 50.415 3.994 1.00 91.79 C ATOM 4820 O GLU C 9 -36.505 49.501 3.174 1.00 91.03 O ATOM 4821 CB GLU C 9 -37.716 52.334 3.183 1.00 91.63 C ATOM 4822 CG GLU C 9 -38.477 52.756 4.462 1.00 91.26 C ATOM 4823 CD GLU C 9 -39.993 52.641 4.389 1.00 91.17 C ATOM 4824 OE1 GLU C 9 -40.655 52.979 5.396 1.00 89.98 O ATOM 4825 OE2 GLU C 9 -40.521 52.214 3.345 1.00 91.53 O ATOM 4826 N ILE C 10 -36.188 50.210 5.298 1.00 91.97 N ATOM 4827 CA ILE C 10 -36.192 48.870 5.873 1.00 92.34 C ATOM 4828 C ILE C 10 -37.536 48.536 6.500 1.00 91.65 C ATOM 4829 O ILE C 10 -38.085 49.328 7.264 
1.00 91.67 O ATOM 4830 CB ILE C 10 -35.103 48.726 6.961 1.00 93.05 C ATOM 4831 CG1 ILE C 10 -33.729 48.971 6.333 1.00 93.78 C ATOM 4832 CG2 ILE C 10 -35.192 47.338 7.618 1.00 91.79 C ATOM 4833 CD1 ILE C 10 -32.569 48.945 7.309 1.00 92.99 C ATOM 4834 N ILE C 11 -38.057 47.357 6.175 1.00 90.86 N ATOM 4835 CA ILE C 11 -39.333 46.895 6.708 1.00 89.22 C ATOM 4836 C ILE C 11 -39.066 45.637 7.517 1.00 88.75 C ATOM 4837 O ILE C 11 -38.363 44.739 7.063 1.00 87.87 O ATOM 4838 CB ILE C 11 -40.339 46.526 5.586 1.00 88.97 C ATOM 4839 CG1 ILE C 11 -40.485 47.688 4.610 1.00 87.62 C ATOM 4840 CG2 ILE C 11 -41.685 46.162 6.187 1.00 88.11 C ATOM 4841 CD1 ILE C 11 -40.961 48.954 5.242 1.00 88.39 C ATOM 4842 N GLY C 12 -39.625 45.585 8.718 1.00 88.08 N ATOM 4843 CA GLY C 12 -39.445 44.417 9.551 1.00 86.84 C ATOM 4844 C GLY C 12 -40.697 43.578 9.432 1.00 85.81 C ATOM 4845 O GLY C 12 -41.799 44.093 9.605 1.00 86.11 O ATOM 4846 N ALA C 13 -40.537 42.292 9.133 1.00 84.95 N ATOM 4847 CA ALA C 13 -41.677 41.392 8.987 1.00 83.69 C ATOM 4848 C ALA C 13 -41.699 40.326 10.081 1.00 83.14 C ATOM 4849 O ALA C 13 -41.903 39.148 9.791 1.00 83.23 O ATOM 4850 CB ALA C 13 -41.635 40.719 7.598 1.00 82.18 C ATOM 4851 N PRO C 14 -41.509 40.727 11.352 1.00 82.39 N ATOM 4852 CA PRO C 14 -41.513 39.757 12.452 1.00 82.40 C ATOM 4853 C PRO C 14 -42.711 38.820 12.394 1.00 83.87 C ATOM 4854 O PRO C 14 -43.726 39.068 13.032 1.00 84.25 O ATOM 4855 CB PRO C 14 -41.533 40.645 13.684 1.00 81.00 C ATOM 4856 CG PRO C 14 -42.274 41.856 13.207 1.00 80.99 C ATOM 4857 CD PRO C 14 -41.629 42.095 11.880 1.00 81.10 C ATOM 4858 N PHE C 15 -42.580 37.737 11.634 1.00 86.12 N ATOM 4859 CA PHE C 15 -43.659 36.765 11.467 1.00 87.98 C ATOM 4860 C PHE C 15 -43.106 35.344 11.582 1.00 88.21 C ATOM 4861 O PHE C 15 -41.887 35.152 11.580 1.00 88.51 O ATOM 4862 CB PHE C 15 -44.314 36.969 10.099 1.00 88.79 C ATOM 4863 CG PHE C 15 -45.600 36.228 9.923 1.00 91.15 C ATOM 4864 CD1 PHE C 15 -46.602 36.313 10.886 1.00 92.70 C ATOM 4865 CD2 PHE C 
15 -45.829 35.470 8.778 1.00 91.85 C ATOM 4866 CE1 PHE C 15 -47.822 35.661 10.707 1.00 93.41 C ATOM 4867 CE2 PHE C 15 -47.044 34.815 8.586 1.00 93.06 C ATOM 4868 CZ PHE C 15 -48.043 34.907 9.552 1.00 93.27 C ATOM 4869 N SER C 16 -43.979 34.345 11.677 1.00 87.82 N ATOM 4870 CA SER C 16 -43.483 32.980 11.794 1.00 88.48 C ATOM 4871 C SER C 16 -44.497 31.867 11.547 1.00 90.22 C ATOM 4872 O SER C 16 -44.130 30.696 11.524 1.00 90.56 O ATOM 4873 CB SER C 16 -42.854 32.776 13.175 1.00 86.82 C ATOM 4874 OG SER C 16 -43.817 32.824 14.203 1.00 83.07 O ATOM 4875 N LYS C 17 -45.762 32.215 11.364 1.00 92.02 N ATOM 4876 CA LYS C 17 -46.787 31.198 11.144 1.00 93.42 C ATOM 4877 C LYS C 17 -46.581 30.415 9.845 1.00 93.24 C ATOM 4878 O LYS C 17 -47.211 29.375 9.617 1.00 92.98 O ATOM 4879 CB LYS C 17 -48.180 31.844 11.170 1.00 95.56 C ATOM 4880 CG LYS C 17 -49.031 31.528 12.419 1.00 96.54 C ATOM 4881 CD LYS C 17 -48.271 31.767 13.720 1.00 97.30 C ATOM 4882 CE LYS C 17 -49.218 31.846 14.909 1.00 97.15 C ATOM 4883 NZ LYS C 17 -50.065 33.071 14.832 1.00 96.57 N ATOM 4884 N GLY C 18 -45.696 30.907 8.990 1.00 92.44 N ATOM 4885 CA GLY C 18 -45.441 30.198 7.753 1.00 92.27 C ATOM 4886 C GLY C 18 -44.878 28.817 8.057 1.00 92.53 C ATOM 4887 O GLY C 18 -44.874 27.943 7.189 1.00 92.29 O ATOM 4888 N GLN C 19 -44.409 28.626 9.294 1.00 92.65 N ATOM 4889 CA GLN C 19 -43.823 27.363 9.746 1.00 92.73 C ATOM 4890 C GLN C 19 -43.897 27.207 11.273 1.00 94.03 C ATOM 4891 O GLN C 19 -44.154 28.174 11.991 1.00 93.82 O ATOM 4892 CB GLN C 19 -42.370 27.260 9.269 1.00 92.02 C ATOM 4893 CG GLN C 19 -41.485 28.454 9.601 1.00 91.29 C ATOM 4894 CD GLN C 19 -41.120 28.518 11.064 1.00 91.08 C ATOM 4895 OE1 GLN C 19 -40.713 27.517 11.645 1.00 91.96 O ATOM 4896 NE2 GLN C 19 -41.254 29.694 11.669 1.00 89.85 N ATOM 4897 N PRO C 20 -43.653 25.983 11.784 1.00 95.05 N ATOM 4898 CA PRO C 20 -43.680 25.597 13.205 1.00 95.93 C ATOM 4899 C PRO C 20 -42.673 26.207 14.192 1.00 96.49 C ATOM 4900 O PRO C 20 -43.066 26.814 15.194 
1.00 97.00 O ATOM 4901 CB PRO C 20 -43.558 24.076 13.141 1.00 95.49 C ATOM 4902 CG PRO C 20 -42.680 23.880 11.967 1.00 95.38 C ATOM 4903 CD PRO C 20 -43.283 24.823 10.953 1.00 94.96 C ATOM 4904 N ARG C 21 -41.384 26.018 13.923 1.00 96.79 N ATOM 4905 CA ARG C 21 -40.318 26.530 14.790 1.00 96.91 C ATOM 4906 C ARG C 21 -40.454 28.033 15.074 1.00 94.86 C ATOM 4907 O ARG C 21 -40.465 28.858 14.153 1.00 94.86 O ATOM 4908 CB ARG C 21 -38.950 26.247 14.154 1.00 99.34 C ATOM 4909 CG ARG C 21 -38.919 25.005 13.273 1.00102.34 C ATOM 4910 CD ARG C 21 -37.559 24.838 12.618 1.00106.42 C ATOM 4911 NE ARG C 21 -36.567 24.325 13.554 1.00109.60 N ATOM 4912 CZ ARG C 21 -36.564 23.081 14.020 1.00111.63 C ATOM 4913 NH1 ARG C 21 -37.498 22.223 13.632 1.00112.45 N ATOM 4914 NH2 ARG C 21 -35.626 22.696 14.875 1.00112.44 N ATOM 4915 N GLY C 22 -40.536 28.383 16.354 1.00 92.20 N ATOM 4916 CA GLY C 22 -40.687 29.779 16.724 1.00 89.96 C ATOM 4917 C GLY C 22 -39.409 30.560 16.968 1.00 88.10 C ATOM 4918 O GLY C 22 -38.427 30.025 17.475 1.00 87.82 O ATOM 4919 N GLY C 23 -39.433 31.840 16.615 1.00 86.64 N ATOM 4920 CA GLY C 23 -38.270 32.681 16.807 1.00 85.01 C ATOM 4921 C GLY C 23 -38.036 33.573 15.607 1.00 84.30 C ATOM 4922 O GLY C 23 -37.631 34.727 15.750 1.00 84.52 O ATOM 4923 N VAL C 24 -38.305 33.042 14.418 1.00 83.77 N ATOM 4924 CA VAL C 24 -38.112 33.786 13.174 1.00 83.44 C ATOM 4925 C VAL C 24 -38.639 35.226 13.160 1.00 84.55 C ATOM 4926 O VAL C 24 -38.148 36.061 12.392 1.00 83.80 O ATOM 4927 CB VAL C 24 -38.704 33.004 11.971 1.00 82.02 C ATOM 4928 CG1 VAL C 24 -37.617 32.162 11.323 1.00 82.66 C ATOM 4929 CG2 VAL C 24 -39.822 32.103 12.436 1.00 80.13 C ATOM 4930 N GLU C 25 -39.634 35.521 13.997 1.00 85.75 N ATOM 4931 CA GLU C 25 -40.173 36.877 14.043 1.00 86.17 C ATOM 4932 C GLU C 25 -39.187 37.751 14.809 1.00 85.93 C ATOM 4933 O GLU C 25 -39.204 38.985 14.696 1.00 86.45 O ATOM 4934 CB GLU C 25 -41.552 36.919 14.713 1.00 86.78 C ATOM 4935 CG GLU C 25 -41.566 36.765 16.223 1.00 88.34 C 
ATOM 4936 CD GLU C 25 -41.282 35.350 16.673 1.00 90.53 C ATOM 4937 OE1 GLU C 25 -41.474 34.413 15.860 1.00 91.06 O ATOM 4938 OE2 GLU C 25 -40.888 35.174 17.848 1.00 91.16 O ATOM 4939 N LYS C 26 -38.324 37.103 15.590 1.00 84.12 N ATOM 4940 CA LYS C 26 -37.306 37.821 16.337 1.00 82.19 C ATOM 4941 C LYS C 26 -36.212 38.198 15.341 1.00 80.51 C ATOM 4942 O LYS C 26 -35.246 38.872 15.690 1.00 80.34 O ATOM 4943 CB LYS C 26 -36.728 36.939 17.438 1.00 82.12 C ATOM 4944 CG LYS C 26 -37.712 36.604 18.537 1.00 84.38 C ATOM 4945 CD LYS C 26 -37.051 35.753 19.629 1.00 86.61 C ATOM 4946 CE LYS C 26 -38.022 35.440 20.774 1.00 86.85 C ATOM 4947 NZ LYS C 26 -37.428 34.592 21.858 1.00 86.67 N ATOM 4948 N GLY C 27 -36.386 37.755 14.098 1.00 78.69 N ATOM 4949 CA GLY C 27 -35.424 38.039 13.048 1.00 76.08 C ATOM 4950 C GLY C 27 -35.016 39.496 12.972 1.00 74.71 C ATOM 4951 O GLY C 27 -33.871 39.830 13.287 1.00 74.43 O ATOM 4952 N PRO C 28 -35.923 40.395 12.544 1.00 74.00 N ATOM 4953 CA PRO C 28 -35.602 41.824 12.444 1.00 74.28 C ATOM 4954 C PRO C 28 -35.151 42.419 13.777 1.00 75.07 C ATOM 4955 O PRO C 28 -34.476 43.448 13.828 1.00 74.60 O ATOM 4956 CB PRO C 28 -36.909 42.429 11.938 1.00 73.18 C ATOM 4957 CG PRO C 28 -37.460 41.340 11.073 1.00 71.70 C ATOM 4958 CD PRO C 28 -37.236 40.114 11.936 1.00 72.82 C ATOM 4959 N ALA C 29 -35.524 41.750 14.858 1.00 77.64 N ATOM 4960 CA ALA C 29 -35.168 42.191 16.199 1.00 79.22 C ATOM 4961 C ALA C 29 -33.675 42.019 16.456 1.00 79.71 C ATOM 4962 O ALA C 29 -33.050 42.857 17.118 1.00 81.37 O ATOM 4963 CB ALA C 29 -35.976 41.408 17.239 1.00 79.86 C ATOM 4964 N ALA C 30 -33.110 40.934 15.933 1.00 78.80 N ATOM 4965 CA ALA C 30 -31.689 40.661 16.112 1.00 78.62 C ATOM 4966 C ALA C 30 -30.878 41.438 15.084 1.00 78.68 C ATOM 4967 O ALA C 30 -29.809 41.980 15.387 1.00 78.63 O ATOM 4968 CB ALA C 30 -31.421 39.169 15.969 1.00 78.52 C ATOM 4969 N LEU C 31 -31.402 41.501 13.867 1.00 78.19 N ATOM 4970 CA LEU C 31 -30.721 42.199 12.787 1.00 77.40 C ATOM 4971 C LEU C 
31 -30.490 43.667 13.095 1.00 77.67 C ATOM 4972 O LEU C 31 -29.412 44.187 12.836 1.00 77.84 O ATOM 4973 CB LEU C 31 -31.506 42.041 11.480 1.00 75.60 C ATOM 4974 CG LEU C 31 -31.285 40.704 10.765 1.00 73.01 C ATOM 4975 CD1 LEU C 31 -32.554 40.231 10.122 1.00 70.89 C ATOM 4976 CD2 LEU C 31 -30.189 40.871 9.742 1.00 71.89 C ATOM 4977 N ARG C 32 -31.490 44.348 13.640 1.00 78.94 N ATOM 4978 CA ARG C 32 -31.301 45.757 13.959 1.00 80.72 C ATOM 4979 C ARG C 32 -30.293 45.853 15.090 1.00 81.47 C ATOM 4980 O ARG C 32 -29.445 46.738 15.095 1.00 80.36 O ATOM 4981 CB ARG C 32 -32.614 46.424 14.392 1.00 82.03 C ATOM 4982 CG ARG C 32 -33.564 46.824 13.264 1.00 82.49 C ATOM 4983 CD ARG C 32 -34.584 47.865 13.753 1.00 84.29 C ATOM 4984 NE ARG C 32 -35.688 48.077 12.818 1.00 85.70 N ATOM 4985 CZ ARG C 32 -36.617 47.164 12.537 1.00 86.70 C ATOM 4986 NH1 ARG C 32 -36.578 45.974 13.123 1.00 87.41 N ATOM 4987 NH2 ARG C 32 -37.585 47.436 11.670 1.00 86.01 N ATOM 4988 N LYS C 33 -30.411 44.930 16.045 1.00 82.85 N ATOM 4989 CA LYS C 33 -29.528 44.867 17.200 1.00 84.50 C ATOM 4990 C LYS C 33 -28.087 44.636 16.749 1.00 85.48 C ATOM 4991 O LYS C 33 -27.140 45.000 17.446 1.00 85.93 O ATOM 4992 CB LYS C 33 -29.973 43.744 18.140 1.00 85.61 C ATOM 4993 CG LYS C 33 -28.944 43.369 19.198 1.00 88.84 C ATOM 4994 CD LYS C 33 -28.673 44.502 20.194 1.00 91.56 C ATOM 4995 CE LYS C 33 -29.611 44.438 21.403 1.00 93.10 C ATOM 4996 NZ LYS C 33 -29.512 43.133 22.144 1.00 93.72 N ATOM 4997 N ALA C 34 -27.917 44.022 15.584 1.00 86.44 N ATOM 4998 CA ALA C 34 -26.575 43.788 15.058 1.00 86.60 C ATOM 4999 C ALA C 34 -26.145 45.081 14.355 1.00 86.88 C ATOM 5000 O ALA C 34 -25.141 45.129 13.640 1.00 85.52 O ATOM 5001 CB ALA C 34 -26.589 42.618 14.081 1.00 87.03 C ATOM 5002 N GLY C 35 -26.945 46.123 14.565 1.00 88.26 N ATOM 5003 CA GLY C 35 -26.674 47.430 13.995 1.00 89.33 C ATOM 5004 C GLY C 35 -26.861 47.548 12.502 1.00 89.25 C ATOM 5005 O GLY C 35 -26.258 48.413 11.868 1.00 89.47 O ATOM 5006 N LEU C 36 -27.702 
46.699 11.932 1.00 89.38 N ATOM 5007 CA LEU C 36 -27.916 46.741 10.493 1.00 90.16 C ATOM 5008 C LEU C 36 -28.264 48.133 9.964 1.00 90.88 C ATOM 5009 O LEU C 36 -27.765 48.553 8.915 1.00 90.76 O ATOM 5010 CB LEU C 36 -29.000 45.733 10.088 1.00 89.73 C ATOM 5011 CG LEU C 36 -29.374 45.630 8.603 1.00 89.10 C ATOM 5012 CD1 LEU C 36 -28.141 45.709 7.716 1.00 89.09 C ATOM 5013 CD2 LEU C 36 -30.113 44.322 8.381 1.00 88.91 C ATOM 5014 N VAL C 37 -29.098 48.864 10.693 1.00 91.52 N ATOM 5015 CA VAL C 37 -29.495 50.189 10.234 1.00 92.05 C ATOM 5016 C VAL C 37 -28.308 51.143 10.026 1.00 93.18 C ATOM 5017 O VAL C 37 -28.058 51.600 8.906 1.00 93.84 O ATOM 5018 CB VAL C 37 -30.500 50.831 11.205 1.00 90.22 C ATOM 5019 CG1 VAL C 37 -31.328 51.861 10.467 1.00 88.34 C ATOM 5020 CG2 VAL C 37 -31.386 49.764 11.819 1.00 89.51 C ATOM 5021 N GLU C 38 -27.577 51.429 11.102 1.00 93.67 N ATOM 5022 CA GLU C 38 -26.427 52.337 11.052 1.00 93.22 C ATOM 5023 C GLU C 38 -25.405 51.996 9.966 1.00 91.50 C ATOM 5024 O GLU C 38 -25.059 52.847 9.142 1.00 89.54 O ATOM 5025 CB GLU C 38 -25.719 52.370 12.418 1.00 95.27 C ATOM 5026 CG GLU C 38 -26.632 52.699 13.599 1.00 97.90 C ATOM 5027 CD GLU C 38 -27.433 51.499 14.082 1.00 98.72 C ATOM 5028 OE1 GLU C 38 -26.881 50.674 14.844 1.00 98.61 O ATOM 5029 OE2 GLU C 38 -28.611 51.376 13.691 1.00100.35 O ATOM 5030 N LYS C 39 -24.917 50.758 9.970 1.00 89.99 N ATOM 5031 CA LYS C 39 -23.932 50.351 8.981 1.00 89.24 C ATOM 5032 C LYS C 39 -24.451 50.703 7.604 1.00 89.66 C ATOM 5033 O LYS C 39 -23.711 51.211 6.760 1.00 90.45 O ATOM 5034 CB LYS C 39 -23.668 48.855 9.061 1.00 87.39 C ATOM 5035 CG LYS C 39 -23.102 48.411 10.375 1.00 86.93 C ATOM 5036 CD LYS C 39 -22.894 46.918 10.378 1.00 87.74 C ATOM 5037 CE LYS C 39 -22.446 46.443 11.744 1.00 88.21 C ATOM 5038 NZ LYS C 39 -21.158 47.076 12.116 1.00 89.64 N ATOM 5039 N LEU C 40 -25.734 50.445 7.378 1.00 89.66 N ATOM 5040 CA LEU C 40 -26.328 50.753 6.086 1.00 89.32 C ATOM 5041 C LEU C 40 -26.164 52.239 5.794 1.00 90.45 C 
ATOM 5042 O LEU C 40 -25.829 52.626 4.675 1.00 89.95 O ATOM 5043 CB LEU C 40 -27.809 50.345 6.060 1.00 85.82 C ATOM 5044 CG LEU C 40 -28.095 48.940 5.506 1.00 82.28 C ATOM 5045 CD1 LEU C 40 -29.515 48.544 5.817 1.00 82.14 C ATOM 5046 CD2 LEU C 40 -27.855 48.919 4.012 1.00 80.23 C ATOM 5047 N LYS C 41 -26.373 53.071 6.810 1.00 92.70 N ATOM 5048 CA LYS C 41 -26.246 54.515 6.639 1.00 95.26 C ATOM 5049 C LYS C 41 -24.840 54.875 6.171 1.00 96.02 C ATOM 5050 O LYS C 41 -24.647 55.841 5.430 1.00 96.02 O ATOM 5051 CB LYS C 41 -26.567 55.237 7.950 1.00 95.44 C ATOM 5052 CG LYS C 41 -28.009 55.054 8.416 1.00 97.67 C ATOM 5053 CD LYS C 41 -28.196 55.519 9.857 1.00 98.54 C ATOM 5054 CE LYS C 41 -29.551 55.101 10.412 1.00 97.92 C ATOM 5055 NZ LYS C 41 -29.674 55.419 11.862 1.00 96.37 N ATOM 5056 N GLU C 42 -23.865 54.082 6.594 1.00 97.23 N ATOM 5057 CA GLU C 42 -22.478 54.310 6.225 1.00 99.27 C ATOM 5058 C GLU C 42 -22.217 54.251 4.726 1.00100.29 C ATOM 5059 O GLU C 42 -21.103 54.515 4.276 1.00 99.46 O ATOM 5060 CB GLU C 42 -21.587 53.296 6.921 1.00100.72 C ATOM 5061 CG GLU C 42 -21.350 53.579 8.379 1.00104.56 C ATOM 5062 CD GLU C 42 -20.521 52.489 9.035 1.00107.81 C ATOM 5063 OE1 GLU C 42 -19.633 51.929 8.347 1.00108.24 O ATOM 5064 OE2 GLU C 42 -20.750 52.204 10.234 1.00110.21 O ATOM 5065 N THR C 43 -23.228 53.892 3.948 1.00102.51 N ATOM 5066 CA THR C 43 -23.047 53.816 2.503 1.00104.86 C ATOM 5067 C THR C 43 -23.457 55.117 1.823 1.00106.85 C ATOM 5068 O THR C 43 -23.637 56.154 2.478 1.00107.03 O ATOM 5069 CB THR C 43 -23.875 52.668 1.877 1.00104.47 C ATOM 5070 OG1 THR C 43 -25.263 52.851 2.191 1.00104.26 O ATOM 5071 CG2 THR C 43 -23.405 51.324 2.403 1.00103.94 C ATOM 5072 N GLU C 44 -23.590 55.045 0.500 1.00108.54 N ATOM 5073 CA GLU C 44 -23.994 56.183 -0.321 1.00109.37 C ATOM 5074 C GLU C 44 -25.529 56.255 -0.411 1.00109.21 C ATOM 5075 O GLU C 44 -26.089 57.103 -1.116 1.00109.70 O ATOM 5076 CB GLU C 44 -23.380 56.048 -1.720 1.00109.72 C ATOM 5077 CG GLU C 44 -23.313 54.609 
-2.221 1.00111.43 C ATOM 5078 CD GLU C 44 -22.909 54.510 -3.685 1.00112.66 C ATOM 5079 OE1 GLU C 44 -21.923 55.172 -4.078 1.00113.58 O ATOM 5080 OE2 GLU C 44 -23.575 53.764 -4.441 1.00112.10 O ATOM 5081 N TYR C 45 -26.196 55.361 0.318 1.00108.44 N ATOM 5082 CA TYR C 45 -27.655 55.291 0.345 1.00107.04 C ATOM 5083 C TYR C 45 -28.233 55.920 1.616 1.00106.43 C ATOM 5084 O TYR C 45 -27.638 55.825 2.697 1.00106.41 O ATOM 5085 CB TYR C 45 -28.121 53.834 0.279 1.00106.81 C ATOM 5086 CG TYR C 45 -27.819 53.099 -1.009 1.00106.18 C ATOM 5087 CD1 TYR C 45 -26.509 52.809 -1.390 1.00106.02 C ATOM 5088 CD2 TYR C 45 -28.851 52.669 -1.835 1.00106.09 C ATOM 5089 CE1 TYR C 45 -26.243 52.102 -2.571 1.00106.12 C ATOM 5090 CE2 TYR C 45 -28.599 51.968 -3.011 1.00105.75 C ATOM 5091 CZ TYR C 45 -27.298 51.686 -3.374 1.00105.62 C ATOM 5092 OH TYR C 45 -27.062 50.997 -4.543 1.00104.68 O ATOM 5093 N ASN C 46 -29.390 56.562 1.478 1.00105.11 N ATOM 5094 CA ASN C 46 -30.064 57.168 2.619 1.00103.37 C ATOM 5095 C ASN C 46 -30.955 56.060 3.129 1.00101.06 C ATOM 5096 O ASN C 46 -31.628 55.393 2.343 1.00 99.33 O ATOM 5097 CB ASN C 46 -30.921 58.358 2.191 1.00106.07 C ATOM 5098 CG ASN C 46 -30.118 59.433 1.497 1.00108.03 C ATOM 5099 OD1 ASN C 46 -29.105 59.903 2.020 1.00109.57 O ATOM 5100 ND2 ASN C 46 -30.569 59.834 0.311 1.00108.96 N ATOM 5101 N VAL C 47 -30.979 55.864 4.438 1.00 99.24 N ATOM 5102 CA VAL C 47 -31.779 54.782 4.982 1.00 98.07 C ATOM 5103 C VAL C 47 -32.871 55.231 5.939 1.00 96.62 C ATOM 5104 O VAL C 47 -32.633 56.043 6.830 1.00 96.63 O ATOM 5105 CB VAL C 47 -30.879 53.752 5.715 1.00 98.47 C ATOM 5106 CG1 VAL C 47 -31.594 52.413 5.820 1.00 97.35 C ATOM 5107 CG2 VAL C 47 -29.555 53.598 4.981 1.00 98.39 C ATOM 5108 N ARG C 48 -34.069 54.691 5.741 1.00 94.83 N ATOM 5109 CA ARG C 48 -35.197 54.995 6.597 1.00 93.46 C ATOM 5110 C ARG C 48 -35.712 53.675 7.131 1.00 92.09 C ATOM 5111 O ARG C 48 -36.036 52.774 6.359 1.00 90.53 O ATOM 5112 CB ARG C 48 -36.303 55.706 5.811 1.00 95.23 C ATOM 5113 
CG ARG C 48 -37.579 55.970 6.622 1.00 96.76 C ATOM 5114 CD ARG C 48 -38.686 56.572 5.758 1.00 98.94 C ATOM 5115 NE ARG C 48 -40.004 55.967 6.003 1.00102.01 N ATOM 5116 CZ ARG C 48 -40.792 56.229 7.048 1.00102.83 C ATOM 5117 NH1 ARG C 48 -40.412 57.099 7.978 1.00103.67 N ATOM 5118 NH2 ARG C 48 -41.967 55.617 7.159 1.00103.04 N ATOM 5119 N ASP C 49 -35.753 53.557 8.456 1.00 91.94 N ATOM 5120 CA ASP C 49 -36.243 52.348 9.101 1.00 92.13 C ATOM 5121 C ASP C 49 -37.738 52.573 9.307 1.00 92.71 C ATOM 5122 O ASP C 49 -38.146 53.543 9.946 1.00 93.23 O ATOM 5123 CB ASP C 49 -35.541 52.140 10.447 1.00 92.41 C ATOM 5124 CG ASP C 49 -35.781 50.752 11.026 1.00 93.21 C ATOM 5125 OD1 ASP C 49 -36.952 50.408 11.301 1.00 93.93 O ATOM 5126 OD2 ASP C 49 -34.801 49.995 11.209 1.00 94.25 O ATOM 5127 N HIS C 50 -38.546 51.675 8.748 1.00 93.35 N ATOM 5128 CA HIS C 50 -40.009 51.746 8.816 1.00 93.56 C ATOM 5129 C HIS C 50 -40.559 51.064 10.087 1.00 93.08 C ATOM 5130 O HIS C 50 -41.727 51.236 10.444 1.00 92.89 O ATOM 5131 CB HIS C 50 -40.568 51.100 7.531 1.00 94.71 C ATOM 5132 CG HIS C 50 -42.057 51.171 7.372 1.00 95.99 C ATOM 5133 ND1 HIS C 50 -42.939 50.591 8.266 1.00 97.39 N ATOM 5134 CD2 HIS C 50 -42.823 51.665 6.368 1.00 95.59 C ATOM 5135 CE1 HIS C 50 -44.174 50.720 7.819 1.00 96.73 C ATOM 5136 NE2 HIS C 50 -44.132 51.369 6.666 1.00 96.87 N ATOM 5137 N GLY C 51 -39.708 50.313 10.778 1.00 92.46 N ATOM 5138 CA GLY C 51 -40.151 49.639 11.981 1.00 92.91 C ATOM 5139 C GLY C 51 -40.703 48.266 11.653 1.00 93.76 C ATOM 5140 O GLY C 51 -40.542 47.777 10.533 1.00 93.36 O ATOM 5141 N ASP C 52 -41.369 47.646 12.625 1.00 94.72 N ATOM 5142 CA ASP C 52 -41.937 46.313 12.432 1.00 95.39 C ATOM 5143 C ASP C 52 -43.436 46.244 12.258 1.00 95.22 C ATOM 5144 O ASP C 52 -44.188 46.443 13.204 1.00 94.37 O ATOM 5145 CB ASP C 52 -41.572 45.385 13.590 1.00 96.23 C ATOM 5146 CG ASP C 52 -40.180 44.838 13.469 1.00 96.65 C ATOM 5147 OD1 ASP C 52 -39.783 44.494 12.335 1.00 97.76 O ATOM 5148 OD2 ASP C 52 -39.490 
44.744 14.505 1.00 97.38 O ATOM 5149 N LEU C 53 -43.862 45.928 11.046 1.00 96.11 N ATOM 5150 CA LEU C 53 -45.276 45.791 10.770 1.00 97.52 C ATOM 5151 C LEU C 53 -45.888 44.845 11.814 1.00 99.27 C ATOM 5152 O LEU C 53 -45.693 43.632 11.744 1.00100.25 O ATOM 5153 CB LEU C 53 -45.489 45.210 9.365 1.00 95.74 C ATOM 5154 CG LEU C 53 -44.942 45.974 8.158 1.00 94.64 C ATOM 5155 CD1 LEU C 53 -45.423 45.287 6.893 1.00 93.53 C ATOM 5156 CD2 LEU C 53 -45.420 47.423 8.182 1.00 94.55 C ATOM 5157 N ALA C 54 -46.604 45.397 12.791 1.00101.13 N ATOM 5158 CA ALA C 54 -47.247 44.566 13.804 1.00102.37 C ATOM 5159 C ALA C 54 -48.339 43.814 13.060 1.00103.74 C ATOM 5160 O ALA C 54 -49.067 44.397 12.257 1.00102.75 O ATOM 5161 CB ALA C 54 -47.846 45.434 14.901 1.00101.69 C ATOM 5162 N PHE C 55 -48.449 42.517 13.309 1.00106.29 N ATOM 5163 CA PHE C 55 -49.449 41.733 12.607 1.00109.42 C ATOM 5164 C PHE C 55 -50.578 41.236 13.493 1.00111.23 C ATOM 5165 O PHE C 55 -50.366 40.429 14.402 1.00111.53 O ATOM 5166 CB PHE C 55 -48.788 40.550 11.891 1.00110.05 C ATOM 5167 CG PHE C 55 -47.779 40.956 10.848 1.00109.86 C ATOM 5168 CD1 PHE C 55 -48.147 41.777 9.783 1.00109.98 C ATOM 5169 CD2 PHE C 55 -46.460 40.530 10.939 1.00109.37 C ATOM 5170 CE1 PHE C 55 -47.217 42.167 8.825 1.00109.69 C ATOM 5171 CE2 PHE C 55 -45.523 40.914 9.988 1.00109.52 C ATOM 5172 CZ PHE C 55 -45.904 41.736 8.929 1.00109.89 C ATOM 5173 N VAL C 56 -51.776 41.737 13.207 1.00113.41 N ATOM 5174 CA VAL C 56 -52.995 41.378 13.925 1.00114.90 C ATOM 5175 C VAL C 56 -53.239 39.884 13.728 1.00115.83 C ATOM 5176 O VAL C 56 -53.206 39.394 12.596 1.00115.87 O ATOM 5177 CB VAL C 56 -54.201 42.171 13.365 1.00115.37 C ATOM 5178 CG1 VAL C 56 -53.984 43.672 13.569 1.00114.66 C ATOM 5179 CG2 VAL C 56 -54.380 41.868 11.877 1.00115.29 C ATOM 5180 N ASP C 57 -53.494 39.159 14.813 1.00116.87 N ATOM 5181 CA ASP C 57 -53.695 37.721 14.689 1.00118.41 C ATOM 5182 C ASP C 57 -55.121 37.287 14.369 1.00119.01 C ATOM 5183 O ASP C 57 -56.076 37.713 15.023 
1.00118.81 O ATOM 5184 CB ASP C 57 -53.209 37.003 15.947 1.00118.79 C ATOM 5185 CG ASP C 57 -52.573 35.657 15.631 1.00119.33 C ATOM 5186 OD1 ASP C 57 -53.261 34.799 15.036 1.00118.72 O ATOM 5187 OD2 ASP C 57 -51.382 35.468 15.969 1.00119.85 O ATOM 5188 N VAL C 58 -55.239 36.425 13.358 1.00119.62 N ATOM 5189 CA VAL C 58 -56.523 35.896 12.900 1.00119.95 C ATOM 5190 C VAL C 58 -57.089 34.832 13.840 1.00120.08 C ATOM 5191 O VAL C 58 -56.696 33.660 13.787 1.00120.20 O ATOM 5192 CB VAL C 58 -56.400 35.284 11.480 1.00120.15 C ATOM 5193 CG1 VAL C 58 -57.711 34.612 11.074 1.00119.13 C ATOM 5194 CG2 VAL C 58 -56.028 36.372 10.482 1.00120.11 C ATOM 5195 N PRO C 59 -58.036 35.231 14.707 1.00119.88 N ATOM 5196 CA PRO C 59 -58.652 34.298 15.654 1.00118.69 C ATOM 5197 C PRO C 59 -59.267 33.119 14.925 1.00117.26 C ATOM 5198 O PRO C 59 -59.909 33.279 13.888 1.00116.68 O ATOM 5199 CB PRO C 59 -59.693 35.162 16.360 1.00119.27 C ATOM 5200 CG PRO C 59 -60.067 36.172 15.303 1.00119.94 C ATOM 5201 CD PRO C 59 -58.721 36.537 14.735 1.00119.85 C ATOM 5202 N ASN C 60 -59.060 31.931 15.473 1.00116.37 N ATOM 5203 CA ASN C 60 -59.582 30.716 14.871 1.00116.10 C ATOM 5204 C ASN C 60 -59.229 30.643 13.383 1.00115.83 C ATOM 5205 O ASN C 60 -60.073 30.878 12.511 1.00115.68 O ATOM 5206 CB ASN C 60 -61.104 30.635 15.064 1.00115.54 C ATOM 5207 CG ASN C 60 -61.692 29.322 14.554 1.00115.47 C ATOM 5208 OD1 ASN C 60 -61.215 28.238 14.893 1.00114.68 O ATOM 5209 ND2 ASN C 60 -62.739 29.419 13.739 1.00115.65 N ATOM 5210 N ASP C 61 -57.966 30.334 13.108 1.00115.14 N ATOM 5211 CA ASP C 61 -57.479 30.191 11.743 1.00114.13 C ATOM 5212 C ASP C 61 -57.525 28.679 11.479 1.00113.89 C ATOM 5213 O ASP C 61 -56.857 27.899 12.165 1.00113.46 O ATOM 5214 CB ASP C 61 -56.042 30.717 11.646 1.00113.36 C ATOM 5215 CG ASP C 61 -55.621 31.003 10.224 1.00112.30 C ATOM 5216 OD1 ASP C 61 -56.004 30.228 9.329 1.00111.35 O ATOM 5217 OD2 ASP C 61 -54.899 31.997 10.003 1.00111.82 O ATOM 5218 N SER C 62 -58.322 28.268 10.497 1.00113.79 
N ATOM 5219 CA SER C 62 -58.480 26.845 10.185 1.00113.65 C ATOM 5220 C SER C 62 -57.510 26.284 9.135 1.00112.82 C ATOM 5221 O SER C 62 -57.203 26.939 8.134 1.00112.49 O ATOM 5222 CB SER C 62 -59.926 26.577 9.743 1.00114.15 C ATOM 5223 OG SER C 62 -60.852 27.026 10.721 1.00113.73 O ATOM 5224 N PRO C 63 -57.018 25.051 9.360 1.00111.99 N ATOM 5225 CA PRO C 63 -56.079 24.346 8.472 1.00111.53 C ATOM 5226 C PRO C 63 -56.644 23.879 7.122 1.00110.84 C ATOM 5227 O PRO C 63 -56.997 22.710 6.970 1.00111.39 O ATOM 5228 CB PRO C 63 -55.616 23.164 9.329 1.00111.29 C ATOM 5229 CG PRO C 63 -55.727 23.696 10.723 1.00111.54 C ATOM 5230 CD PRO C 63 -57.066 24.393 10.677 1.00111.51 C ATOM 5231 N PHE C 64 -56.711 24.791 6.153 1.00109.81 N ATOM 5232 CA PHE C 64 -57.203 24.505 4.798 1.00109.33 C ATOM 5233 C PHE C 64 -56.675 23.154 4.284 1.00108.56 C ATOM 5234 O PHE C 64 -55.682 23.107 3.559 1.00108.37 O ATOM 5235 CB PHE C 64 -56.753 25.644 3.867 1.00110.08 C ATOM 5236 CG PHE C 64 -57.212 25.508 2.431 1.00110.63 C ATOM 5237 CD1 PHE C 64 -58.535 25.751 2.076 1.00110.93 C ATOM 5238 CD2 PHE C 64 -56.301 25.202 1.422 1.00110.60 C ATOM 5239 CE1 PHE C 64 -58.939 25.698 0.739 1.00110.88 C ATOM 5240 CE2 PHE C 64 -56.698 25.147 0.085 1.00110.13 C ATOM 5241 CZ PHE C 64 -58.016 25.396 -0.255 1.00110.58 C ATOM 5242 N GLN C 65 -57.348 22.064 4.655 1.00108.06 N ATOM 5243 CA GLN C 65 -56.952 20.705 4.265 1.00107.81 C ATOM 5244 C GLN C 65 -55.710 20.273 5.051 1.00107.14 C ATOM 5245 O GLN C 65 -55.814 19.625 6.096 1.00107.01 O ATOM 5246 CB GLN C 65 -56.638 20.626 2.768 1.00108.70 C ATOM 5247 CG GLN C 65 -57.714 21.192 1.855 1.00110.56 C ATOM 5248 CD GLN C 65 -57.404 20.966 0.380 1.00110.91 C ATOM 5249 OE1 GLN C 65 -57.299 19.825 -0.073 1.00110.88 O ATOM 5250 NE2 GLN C 65 -57.256 22.055 -0.373 1.00110.95 N ATOM 5251 N ILE C 66 -54.539 20.639 4.526 1.00106.12 N ATOM 5252 CA ILE C 66 -53.241 20.337 5.139 1.00103.69 C ATOM 5253 C ILE C 66 -52.580 21.659 5.552 1.00101.94 C ATOM 5254 O ILE C 66 -51.979 21.756 
6.628 1.00100.49 O ATOM 5255 CB ILE C 66 -52.313 19.568 4.141 1.00103.32 C ATOM 5256 CG1 ILE C 66 -52.063 18.147 4.643 1.00102.53 C ATOM 5257 CG2 ILE C 66 -51.001 20.296 3.955 1.00103.34 C ATOM 5258 CD1 ILE C 66 -53.282 17.249 4.559 1.00102.55 C ATOM 5259 N VAL C 67 -52.715 22.667 4.684 1.00100.15 N ATOM 5260 CA VAL C 67 -52.161 24.002 4.912 1.00 98.05 C ATOM 5261 C VAL C 67 -52.627 24.605 6.231 1.00 97.96 C ATOM 5262 O VAL C 67 -53.821 24.660 6.511 1.00 98.49 O ATOM 5263 CB VAL C 67 -52.542 24.964 3.775 1.00 96.47 C ATOM 5264 CG1 VAL C 67 -51.896 26.328 4.014 1.00 95.16 C ATOM 5265 CG2 VAL C 67 -52.128 24.370 2.437 1.00 94.29 C ATOM 5266 N LYS C 68 -51.680 25.079 7.030 1.00 97.30 N ATOM 5267 CA LYS C 68 -52.008 25.637 8.331 1.00 97.06 C ATOM 5268 C LYS C 68 -51.930 27.145 8.354 1.00 97.44 C ATOM 5269 O LYS C 68 -51.491 27.759 7.383 1.00 97.46 O ATOM 5270 CB LYS C 68 -51.073 25.032 9.370 1.00 95.66 C ATOM 5271 CG LYS C 68 -50.903 23.561 9.119 1.00 94.66 C ATOM 5272 CD LYS C 68 -50.065 22.871 10.151 1.00 94.77 C ATOM 5273 CE LYS C 68 -49.857 21.423 9.728 1.00 95.37 C ATOM 5274 NZ LYS C 68 -51.131 20.803 9.235 1.00 93.99 N ATOM 5275 N ASN C 69 -52.387 27.724 9.463 1.00 98.69 N ATOM 5276 CA ASN C 69 -52.399 29.176 9.688 1.00100.55 C ATOM 5277 C ASN C 69 -52.529 30.000 8.396 1.00101.33 C ATOM 5278 O ASN C 69 -51.827 30.998 8.217 1.00100.98 O ATOM 5279 CB ASN C 69 -51.114 29.584 10.423 1.00101.27 C ATOM 5280 CG ASN C 69 -50.891 28.796 11.713 1.00101.35 C ATOM 5281 OD1 ASN C 69 -51.313 29.217 12.795 1.00101.81 O ATOM 5282 ND2 ASN C 69 -50.226 27.644 11.598 1.00100.63 N ATOM 5283 N PRO C 70 -53.453 29.608 7.497 1.00102.26 N ATOM 5284 CA PRO C 70 -53.674 30.294 6.217 1.00102.62 C ATOM 5285 C PRO C 70 -54.065 31.761 6.290 1.00102.96 C ATOM 5286 O PRO C 70 -53.358 32.623 5.766 1.00103.01 O ATOM 5287 CB PRO C 70 -54.751 29.440 5.556 1.00103.21 C ATOM 5288 CG PRO C 70 -55.553 28.976 6.717 1.00103.31 C ATOM 5289 CD PRO C 70 -54.490 28.582 7.715 1.00102.72 C ATOM 5290 N ARG C 71 
-55.199 32.038 6.921 1.00103.42 N ATOM 5291 CA ARG C 71 -55.678 33.407 7.053 1.00104.26 C ATOM 5292 C ARG C 71 -54.586 34.312 7.620 1.00104.02 C ATOM 5293 O ARG C 71 -54.424 35.462 7.191 1.00104.32 O ATOM 5294 CB ARG C 71 -56.914 33.426 7.950 1.00105.61 C ATOM 5295 CG ARG C 71 -58.037 32.568 7.402 1.00107.38 C ATOM 5296 CD ARG C 71 -59.201 32.462 8.367 1.00109.01 C ATOM 5297 NE ARG C 71 -60.278 31.651 7.805 1.00109.72 N ATOM 5298 CZ ARG C 71 -60.164 30.361 7.505 1.00110.10 C ATOM 5299 NH1 ARG C 71 -59.016 29.727 7.717 1.00110.13 N ATOM 5300 NH2 ARG C 71 -61.196 29.708 6.987 1.00109.92 N ATOM 5301 N SER C 72 -53.835 33.785 8.582 1.00103.10 N ATOM 5302 CA SER C 72 -52.745 34.534 9.195 1.00101.82 C ATOM 5303 C SER C 72 -51.793 35.025 8.088 1.00101.36 C ATOM 5304 O SER C 72 -51.661 36.232 7.843 1.00100.77 O ATOM 5305 CB SER C 72 -51.984 33.636 10.186 1.00101.15 C ATOM 5306 OG SER C 72 -52.818 33.189 11.244 1.00 99.62 O ATOM 5307 N VAL C 73 -51.155 34.064 7.419 1.00100.71 N ATOM 5308 CA VAL C 73 -50.205 34.320 6.341 1.00 99.43 C ATOM 5309 C VAL C 73 -50.808 35.080 5.162 1.00 98.74 C ATOM 5310 O VAL C 73 -50.088 35.669 4.360 1.00 98.50 O ATOM 5311 CB VAL C 73 -49.622 32.995 5.807 1.00 99.57 C ATOM 5312 CG1 VAL C 73 -48.575 33.285 4.751 1.00 99.47 C ATOM 5313 CG2 VAL C 73 -49.032 32.175 6.956 1.00 98.64 C ATOM 5314 N GLY C 74 -52.128 35.055 5.051 1.00 98.05 N ATOM 5315 CA GLY C 74 -52.769 35.762 3.963 1.00 98.09 C ATOM 5316 C GLY C 74 -52.900 37.242 4.274 1.00 98.36 C ATOM 5317 O GLY C 74 -52.688 38.097 3.408 1.00 97.94 O ATOM 5318 N LYS C 75 -53.249 37.551 5.518 1.00 98.26 N ATOM 5319 CA LYS C 75 -53.411 38.937 5.922 1.00 98.70 C ATOM 5320 C LYS C 75 -52.086 39.602 6.279 1.00 98.41 C ATOM 5321 O LYS C 75 -51.914 40.813 6.082 1.00 98.11 O ATOM 5322 CB LYS C 75 -54.372 39.046 7.104 1.00 99.64 C ATOM 5323 CG LYS C 75 -54.514 40.479 7.633 1.00101.06 C ATOM 5324 CD LYS C 75 -54.885 41.458 6.513 1.00101.25 C ATOM 5325 CE LYS C 75 -54.785 42.906 6.972 1.00100.94 C ATOM 5326 NZ 
LYS C 75 -55.007 43.844 5.837 1.00100.45 N ATOM 5327 N ALA C 76 -51.157 38.812 6.814 1.00 98.25 N ATOM 5328 CA ALA C 76 -49.844 39.329 7.190 1.00 97.31 C ATOM 5329 C ALA C 76 -49.086 39.723 5.930 1.00 96.60 C ATOM 5330 O ALA C 76 -48.401 40.745 5.905 1.00 95.57 O ATOM 5331 CB ALA C 76 -49.062 38.271 7.963 1.00 96.93 C ATOM 5332 N ASN C 77 -49.228 38.898 4.890 1.00 97.00 N ATOM 5333 CA ASN C 77 -48.579 39.116 3.598 1.00 97.74 C ATOM 5334 C ASN C 77 -49.314 40.172 2.785 1.00 98.75 C ATOM 5335 O ASN C 77 -48.745 40.767 1.870 1.00 98.91 O ATOM 5336 CB ASN C 77 -48.510 37.804 2.805 1.00 97.23 C ATOM 5337 CG ASN C 77 -47.228 37.016 3.077 1.00 97.89 C ATOM 5338 OD1 ASN C 77 -46.160 37.342 2.550 1.00 96.57 O ATOM 5339 ND2 ASN C 77 -47.331 35.980 3.911 1.00 97.76 N ATOM 5340 N GLU C 78 -50.585 40.392 3.113 1.00 99.93 N ATOM 5341 CA GLU C 78 -51.375 41.406 2.426 1.00100.34 C ATOM 5342 C GLU C 78 -50.932 42.771 2.943 1.00 99.33 C ATOM 5343 O GLU C 78 -50.755 43.719 2.177 1.00 99.02 O ATOM 5344 CB GLU C 78 -52.867 41.214 2.702 1.00101.86 C ATOM 5345 CG GLU C 78 -53.722 42.358 2.171 1.00103.76 C ATOM 5346 CD GLU C 78 -55.202 42.110 2.346 1.00104.81 C ATOM 5347 OE1 GLU C 78 -55.625 41.853 3.494 1.00105.01 O ATOM 5348 OE2 GLU C 78 -55.935 42.177 1.332 1.00105.37 O ATOM 5349 N GLN C 79 -50.752 42.859 4.255 1.00 98.33 N ATOM 5350 CA GLN C 79 -50.315 44.094 4.876 1.00 98.14 C ATOM 5351 C GLN C 79 -48.932 44.502 4.370 1.00 98.36 C ATOM 5352 O GLN C 79 -48.658 45.695 4.185 1.00 99.37 O ATOM 5353 CB GLN C 79 -50.278 43.930 6.395 1.00 98.02 C ATOM 5354 CG GLN C 79 -49.596 45.083 7.119 1.00 98.51 C ATOM 5355 CD GLN C 79 -49.649 44.936 8.627 1.00 99.74 C ATOM 5356 OE1 GLN C 79 -49.107 45.766 9.363 1.00100.42 O ATOM 5357 NE2 GLN C 79 -50.308 43.880 9.096 1.00 99.69 N ATOM 5358 N LEU C 80 -48.071 43.509 4.140 1.00 97.08 N ATOM 5359 CA LEU C 80 -46.703 43.746 3.682 1.00 96.12 C ATOM 5360 C LEU C 80 -46.607 44.224 2.243 1.00 95.39 C ATOM 5361 O LEU C 80 -46.072 45.300 1.983 1.00 94.88 O ATOM 
5362 CB LEU C 80 -45.852 42.476 3.872 1.00 96.94 C ATOM 5363 CG LEU C 80 -44.405 42.402 3.345 1.00 96.90 C ATOM 5364 CD1 LEU C 80 -43.543 43.538 3.884 1.00 96.64 C ATOM 5365 CD2 LEU C 80 -43.816 41.065 3.756 1.00 97.01 C ATOM 5366 N ALA C 81 -47.111 43.423 1.308 1.00 95.38 N ATOM 5367 CA ALA C 81 -47.068 43.787 -0.109 1.00 95.32 C ATOM 5368 C ALA C 81 -47.397 45.272 -0.220 1.00 95.15 C ATOM 5369 O ALA C 81 -46.698 46.029 -0.901 1.00 95.06 O ATOM 5370 CB ALA C 81 -48.077 42.951 -0.912 1.00 94.39 C ATOM 5371 N ALA C 82 -48.458 45.672 0.480 1.00 94.53 N ATOM 5372 CA ALA C 82 -48.917 47.052 0.513 1.00 92.86 C ATOM 5373 C ALA C 82 -47.767 48.022 0.804 1.00 92.77 C ATOM 5374 O ALA C 82 -47.420 48.859 -0.038 1.00 94.08 O ATOM 5375 CB ALA C 82 -49.991 47.198 1.565 1.00 91.55 C ATOM 5376 N VAL C 83 -47.168 47.911 1.988 1.00 90.57 N ATOM 5377 CA VAL C 83 -46.071 48.803 2.347 1.00 87.86 C ATOM 5378 C VAL C 83 -44.981 48.797 1.287 1.00 87.81 C ATOM 5379 O VAL C 83 -44.440 49.838 0.948 1.00 87.85 O ATOM 5380 CB VAL C 83 -45.443 48.424 3.695 1.00 85.44 C ATOM 5381 CG1 VAL C 83 -44.475 49.521 4.131 1.00 83.17 C ATOM 5382 CG2 VAL C 83 -46.541 48.200 4.738 1.00 84.36 C ATOM 5383 N VAL C 84 -44.672 47.619 0.760 1.00 88.67 N ATOM 5384 CA VAL C 84 -43.640 47.476 -0.268 1.00 89.97 C ATOM 5385 C VAL C 84 -43.946 48.301 -1.526 1.00 90.78 C ATOM 5386 O VAL C 84 -43.107 49.066 -2.010 1.00 89.94 O ATOM 5387 CB VAL C 84 -43.471 45.986 -0.662 1.00 89.27 C ATOM 5388 CG1 VAL C 84 -42.409 45.848 -1.746 1.00 88.30 C ATOM 5389 CG2 VAL C 84 -43.108 45.160 0.566 1.00 88.68 C ATOM 5390 N ALA C 85 -45.151 48.136 -2.059 1.00 92.01 N ATOM 5391 CA ALA C 85 -45.545 48.879 -3.243 1.00 93.16 C ATOM 5392 C ALA C 85 -45.523 50.350 -2.858 1.00 93.76 C ATOM 5393 O ALA C 85 -45.211 51.226 -3.679 1.00 92.85 O ATOM 5394 CB ALA C 85 -46.954 48.464 -3.687 1.00 92.48 C ATOM 5395 N GLU C 86 -45.836 50.596 -1.586 1.00 94.93 N ATOM 5396 CA GLU C 86 -45.892 51.938 -1.014 1.00 96.65 C ATOM 5397 C GLU C 86 -44.571 52.688 
-1.113 1.00 97.30 C ATOM 5398 O GLU C 86 -44.531 53.875 -1.453 1.00 96.34 O ATOM 5399 CB GLU C 86 -46.322 51.849 0.450 1.00 97.43 C ATOM 5400 CG GLU C 86 -47.183 53.002 0.865 1.00101.11 C ATOM 5401 CD GLU C 86 -48.251 53.291 -0.176 1.00103.56 C ATOM 5402 OE1 GLU C 86 -49.022 52.360 -0.516 1.00104.28 O ATOM 5403 OE2 GLU C 86 -48.312 54.446 -0.657 1.00104.59 O ATOM 5404 N THR C 87 -43.494 51.978 -0.803 1.00 98.79 N ATOM 5405 CA THR C 87 -42.157 52.536 -0.847 1.00 99.86 C ATOM 5406 C THR C 87 -41.722 52.536 -2.291 1.00101.87 C ATOM 5407 O THR C 87 -41.326 53.565 -2.822 1.00102.78 O ATOM 5408 CB THR C 87 -41.175 51.680 -0.055 1.00 99.41 C ATOM 5409 OG1 THR C 87 -41.824 51.156 1.110 1.00 99.80 O ATOM 5410 CG2 THR C 87 -40.002 52.516 0.385 1.00 99.73 C ATOM 5411 N GLN C 88 -41.805 51.366 -2.919 1.00104.27 N ATOM 5412 CA GLN C 88 -41.431 51.196 -4.324 1.00107.00 C ATOM 5413 C GLN C 88 -41.853 52.362 -5.226 1.00107.90 C ATOM 5414 O GLN C 88 -41.137 52.717 -6.170 1.00107.08 O ATOM 5415 CB GLN C 88 -42.033 49.897 -4.873 1.00107.91 C ATOM 5416 CG GLN C 88 -41.208 48.652 -4.604 1.00109.47 C ATOM 5417 CD GLN C 88 -39.779 48.787 -5.106 1.00110.31 C ATOM 5418 OE1 GLN C 88 -39.511 49.514 -6.067 1.00110.06 O ATOM 5419 NE2 GLN C 88 -38.856 48.075 -4.462 1.00110.80 N ATOM 5420 N LYS C 89 -43.020 52.938 -4.931 1.00109.56 N ATOM 5421 CA LYS C 89 -43.564 54.059 -5.694 1.00110.90 C ATOM 5422 C LYS C 89 -42.773 55.344 -5.427 1.00111.00 C ATOM 5423 O LYS C 89 -42.549 56.151 -6.334 1.00110.95 O ATOM 5424 CB LYS C 89 -45.034 54.271 -5.330 1.00111.68 C ATOM 5425 CG LYS C 89 -45.757 55.250 -6.238 1.00113.68 C ATOM 5426 CD LYS C 89 -47.199 55.485 -5.784 1.00116.52 C ATOM 5427 CE LYS C 89 -48.025 54.191 -5.759 1.00117.23 C ATOM 5428 NZ LYS C 89 -49.448 54.420 -5.351 1.00117.27 N ATOM 5429 N ASN C 90 -42.360 55.525 -4.175 1.00110.80 N ATOM 5430 CA ASN C 90 -41.574 56.689 -3.776 1.00110.60 C ATOM 5431 C ASN C 90 -40.198 56.616 -4.417 1.00110.46 C ATOM 5432 O ASN C 90 -39.443 57.591 -4.422 
1.00110.57 O ATOM 5433 CB ASN C 90 -41.425 56.720 -2.257 1.00110.18 C ATOM 5434 CG ASN C 90 -42.743 56.914 -1.558 1.00109.88 C ATOM 5435 OD1 ASN C 90 -43.316 57.998 -1.601 1.00110.06 O ATOM 5436 ND2 ASN C 90 -43.244 55.861 -0.922 1.00109.05 N ATOM 5437 N GLY C 91 -39.876 55.437 -4.942 1.00110.39 N ATOM 5438 CA GLY C 91 -38.600 55.232 -5.598 1.00109.77 C ATOM 5439 C GLY C 91 -37.501 54.598 -4.757 1.00109.04 C ATOM 5440 O GLY C 91 -36.437 54.277 -5.291 1.00109.23 O ATOM 5441 N THR C 92 -37.739 54.400 -3.461 1.00107.90 N ATOM 5442 CA THR C 92 -36.714 53.815 -2.596 1.00106.35 C ATOM 5443 C THR C 92 -36.637 52.293 -2.726 1.00105.86 C ATOM 5444 O THR C 92 -37.586 51.651 -3.194 1.00105.70 O ATOM 5445 CB THR C 92 -36.940 54.194 -1.096 1.00105.45 C ATOM 5446 OG1 THR C 92 -37.588 53.122 -0.406 1.00103.68 O ATOM 5447 CG2 THR C 92 -37.797 55.451 -0.979 1.00104.19 C ATOM 5448 N ILE C 93 -35.491 51.730 -2.331 1.00104.70 N ATOM 5449 CA ILE C 93 -35.261 50.283 -2.374 1.00101.94 C ATOM 5450 C ILE C 93 -35.797 49.633 -1.104 1.00100.46 C ATOM 5451 O ILE C 93 -35.450 50.037 0.011 1.00100.27 O ATOM 5452 CB ILE C 93 -33.764 49.944 -2.477 1.00101.61 C ATOM 5453 CG1 ILE C 93 -33.147 50.629 -3.694 1.00101.79 C ATOM 5454 CG2 ILE C 93 -33.585 48.439 -2.573 1.00101.59 C ATOM 5455 CD1 ILE C 93 -33.813 50.258 -4.997 1.00102.89 C ATOM 5456 N SER C 94 -36.648 48.630 -1.286 1.00 98.22 N ATOM 5457 CA SER C 94 -37.249 47.921 -0.168 1.00 96.48 C ATOM 5458 C SER C 94 -36.354 46.783 0.316 1.00 95.30 C ATOM 5459 O SER C 94 -35.854 45.986 -0.492 1.00 94.98 O ATOM 5460 CB SER C 94 -38.616 47.363 -0.576 1.00 96.70 C ATOM 5461 OG SER C 94 -38.528 46.616 -1.782 1.00 96.79 O ATOM 5462 N VAL C 95 -36.137 46.720 1.632 1.00 92.94 N ATOM 5463 CA VAL C 95 -35.308 45.673 2.232 1.00 89.32 C ATOM 5464 C VAL C 95 -36.050 45.048 3.409 1.00 87.17 C ATOM 5465 O VAL C 95 -36.049 45.580 4.523 1.00 86.40 O ATOM 5466 CB VAL C 95 -33.969 46.230 2.718 1.00 88.54 C ATOM 5467 CG1 VAL C 95 -32.985 45.078 2.909 1.00 88.25 C ATOM 
5468 CG2 VAL C 95 -33.435 47.240 1.717 1.00 87.16 C ATOM 5469 N VAL C 96 -36.662 43.900 3.140 1.00 85.04 N ATOM 5470 CA VAL C 96 -37.463 43.182 4.116 1.00 83.33 C ATOM 5471 C VAL C 96 -36.753 42.226 5.065 1.00 81.90 C ATOM 5472 O VAL C 96 -36.457 41.087 4.702 1.00 81.87 O ATOM 5473 CB VAL C 96 -38.575 42.379 3.408 1.00 83.38 C ATOM 5474 CG1 VAL C 96 -39.501 41.740 4.436 1.00 83.72 C ATOM 5475 CG2 VAL C 96 -39.347 43.288 2.479 1.00 83.57 C ATOM 5476 N LEU C 97 -36.485 42.696 6.282 1.00 80.03 N ATOM 5477 CA LEU C 97 -35.883 41.850 7.305 1.00 76.82 C ATOM 5478 C LEU C 97 -37.056 40.934 7.593 1.00 75.41 C ATOM 5479 O LEU C 97 -38.188 41.404 7.631 1.00 74.63 O ATOM 5480 CB LEU C 97 -35.558 42.652 8.563 1.00 75.09 C ATOM 5481 CG LEU C 97 -34.639 43.860 8.388 1.00 75.32 C ATOM 5482 CD1 LEU C 97 -34.205 44.349 9.757 1.00 72.55 C ATOM 5483 CD2 LEU C 97 -33.434 43.475 7.544 1.00 74.86 C ATOM 5484 N GLY C 98 -36.821 39.643 7.793 1.00 74.85 N ATOM 5485 CA GLY C 98 -37.967 38.799 8.025 1.00 74.16 C ATOM 5486 C GLY C 98 -38.008 37.702 9.063 1.00 74.60 C ATOM 5487 O GLY C 98 -37.151 37.549 9.932 1.00 73.56 O ATOM 5488 N GLY C 99 -39.107 36.972 8.959 1.00 75.85 N ATOM 5489 CA GLY C 99 -39.409 35.851 9.808 1.00 78.40 C ATOM 5490 C GLY C 99 -39.398 34.814 8.717 1.00 80.54 C ATOM 5491 O GLY C 99 -38.617 34.968 7.774 1.00 80.94 O ATOM 5492 N ASP C 100 -40.261 33.805 8.781 1.00 82.59 N ATOM 5493 CA ASP C 100 -40.259 32.777 7.741 1.00 84.85 C ATOM 5494 C ASP C 100 -40.344 33.365 6.323 1.00 84.58 C ATOM 5495 O ASP C 100 -40.812 34.486 6.120 1.00 84.05 O ATOM 5496 CB ASP C 100 -41.378 31.757 7.990 1.00 87.56 C ATOM 5497 CG ASP C 100 -42.678 32.132 7.323 1.00 90.13 C ATOM 5498 OD1 ASP C 100 -42.720 32.118 6.074 1.00 89.63 O ATOM 5499 OD2 ASP C 100 -43.651 32.430 8.053 1.00 92.61 O ATOM 5500 N HIS C 101 -39.876 32.587 5.355 1.00 84.89 N ATOM 5501 CA HIS C 101 -39.810 32.984 3.955 1.00 85.65 C ATOM 5502 C HIS C 101 -41.146 33.115 3.236 1.00 85.99 C ATOM 5503 O HIS C 101 -41.175 33.316 2.021 
1.00 85.93 O ATOM 5504 CB HIS C 101 -38.912 31.985 3.210 1.00 86.72 C ATOM 5505 CG HIS C 101 -38.209 32.553 2.013 1.00 86.60 C ATOM 5506 ND1 HIS C 101 -37.077 31.968 1.478 1.00 85.54 N ATOM 5507 CD2 HIS C 101 -38.491 33.616 1.223 1.00 86.26 C ATOM 5508 CE1 HIS C 101 -36.698 32.647 0.414 1.00 86.39 C ATOM 5509 NE2 HIS C 101 -37.540 33.652 0.234 1.00 86.65 N ATOM 5510 N SER C 102 -42.253 33.009 3.961 1.00 86.24 N ATOM 5511 CA SER C 102 -43.557 33.132 3.316 1.00 86.63 C ATOM 5512 C SER C 102 -43.957 34.599 3.207 1.00 85.62 C ATOM 5513 O SER C 102 -45.023 34.928 2.694 1.00 84.65 O ATOM 5514 CB SER C 102 -44.615 32.364 4.096 1.00 87.71 C ATOM 5515 OG SER C 102 -44.884 33.011 5.324 1.00 91.99 O ATOM 5516 N MET C 103 -43.099 35.482 3.707 1.00 85.08 N ATOM 5517 CA MET C 103 -43.370 36.905 3.623 1.00 84.48 C ATOM 5518 C MET C 103 -42.837 37.348 2.276 1.00 83.76 C ATOM 5519 O MET C 103 -42.787 38.539 1.969 1.00 83.39 O ATOM 5520 CB MET C 103 -42.669 37.669 4.745 1.00 84.83 C ATOM 5521 CG MET C 103 -43.120 37.260 6.136 1.00 87.32 C ATOM 5522 SD MET C 103 -44.920 37.234 6.336 1.00 88.58 S ATOM 5523 CE MET C 103 -45.261 38.997 6.300 1.00 87.63 C ATOM 5524 N ALA C 104 -42.431 36.367 1.475 1.00 83.38 N ATOM 5525 CA ALA C 104 -41.912 36.630 0.141 1.00 83.07 C ATOM 5526 C ALA C 104 -43.068 37.036 -0.777 1.00 82.42 C ATOM 5527 O ALA C 104 -42.933 37.953 -1.593 1.00 82.21 O ATOM 5528 CB ALA C 104 -41.209 35.393 -0.401 1.00 84.51 C ATOM 5529 N ILE C 105 -44.206 36.352 -0.649 1.00 80.53 N ATOM 5530 CA ILE C 105 -45.371 36.708 -1.455 1.00 78.98 C ATOM 5531 C ILE C 105 -45.629 38.200 -1.211 1.00 77.86 C ATOM 5532 O ILE C 105 -45.603 39.006 -2.143 1.00 76.47 O ATOM 5533 CB ILE C 105 -46.646 35.884 -1.067 1.00 78.15 C ATOM 5534 CG1 ILE C 105 -46.406 35.046 0.186 1.00 77.60 C ATOM 5535 CG2 ILE C 105 -47.012 34.936 -2.201 1.00 78.93 C ATOM 5536 CD1 ILE C 105 -47.572 34.117 0.496 1.00 74.53 C ATOM 5537 N GLY C 106 -45.852 38.559 0.051 1.00 76.94 N ATOM 5538 CA GLY C 106 -46.077 39.948 0.385 
1.00 76.15 C ATOM 5539 C GLY C 106 -45.066 40.830 -0.318 1.00 76.23 C ATOM 5540 O GLY C 106 -45.422 41.560 -1.239 1.00 77.43 O ATOM 5541 N SER C 107 -43.806 40.754 0.100 1.00 75.72 N ATOM 5542 CA SER C 107 -42.737 41.560 -0.494 1.00 75.71 C ATOM 5543 C SER C 107 -42.783 41.586 -2.024 1.00 75.75 C ATOM 5544 O SER C 107 -42.938 42.648 -2.638 1.00 74.82 O ATOM 5545 CB SER C 107 -41.367 41.030 -0.037 1.00 76.72 C ATOM 5546 OG SER C 107 -40.278 41.666 -0.709 1.00 75.74 O ATOM 5547 N ILE C 108 -42.652 40.411 -2.631 1.00 75.95 N ATOM 5548 CA ILE C 108 -42.645 40.293 -4.079 1.00 75.58 C ATOM 5549 C ILE C 108 -43.867 40.935 -4.724 1.00 76.75 C ATOM 5550 O ILE C 108 -43.729 41.724 -5.658 1.00 77.19 O ATOM 5551 CB ILE C 108 -42.519 38.817 -4.503 1.00 74.49 C ATOM 5552 CG1 ILE C 108 -41.231 38.228 -3.924 1.00 73.59 C ATOM 5553 CG2 ILE C 108 -42.485 38.712 -6.012 1.00 75.30 C ATOM 5554 CD1 ILE C 108 -40.872 36.866 -4.439 1.00 71.41 C ATOM 5555 N SER C 109 -45.058 40.612 -4.224 1.00 77.84 N ATOM 5556 CA SER C 109 -46.290 41.197 -4.757 1.00 78.47 C ATOM 5557 C SER C 109 -46.192 42.713 -4.714 1.00 78.55 C ATOM 5558 O SER C 109 -46.242 43.377 -5.746 1.00 77.36 O ATOM 5559 CB SER C 109 -47.500 40.749 -3.930 1.00 78.95 C ATOM 5560 OG SER C 109 -47.919 39.450 -4.304 1.00 81.14 O ATOM 5561 N GLY C 110 -46.055 43.246 -3.504 1.00 79.20 N ATOM 5562 CA GLY C 110 -45.944 44.675 -3.333 1.00 81.12 C ATOM 5563 C GLY C 110 -44.992 45.269 -4.347 1.00 83.10 C ATOM 5564 O GLY C 110 -45.269 46.319 -4.924 1.00 83.47 O ATOM 5565 N HIS C 111 -43.875 44.592 -4.580 1.00 85.03 N ATOM 5566 CA HIS C 111 -42.876 45.069 -5.533 1.00 87.81 C ATOM 5567 C HIS C 111 -43.398 45.135 -6.984 1.00 88.96 C ATOM 5568 O HIS C 111 -43.240 46.157 -7.663 1.00 87.29 O ATOM 5569 CB HIS C 111 -41.628 44.171 -5.462 1.00 88.57 C ATOM 5570 CG HIS C 111 -40.443 44.715 -6.205 1.00 89.32 C ATOM 5571 ND1 HIS C 111 -39.254 44.027 -6.308 1.00 89.05 N ATOM 5572 CD2 HIS C 111 -40.270 45.873 -6.888 1.00 88.94 C ATOM 5573 CE1 HIS C 111 
-38.398 44.737 -7.025 1.00 89.41 C ATOM 5574 NE2 HIS C 111 -38.990 45.860 -7.388 1.00 88.73 N ATOM 5575 N ALA C 112 -44.015 44.040 -7.437 1.00 90.99 N ATOM 5576 CA ALA C 112 -44.561 43.912 -8.796 1.00 92.37 C ATOM 5577 C ALA C 112 -45.599 44.979 -9.143 1.00 93.72 C ATOM 5578 O ALA C 112 -45.825 45.289 -10.320 1.00 93.91 O ATOM 5579 CB ALA C 112 -45.173 42.518 -8.980 1.00 91.27 C ATOM 5580 N ARG C 113 -46.237 45.526 -8.116 1.00 94.50 N ATOM 5581 CA ARG C 113 -47.241 46.554 -8.314 1.00 94.94 C ATOM 5582 C ARG C 113 -46.625 47.787 -8.966 1.00 95.62 C ATOM 5583 O ARG C 113 -47.198 48.357 -9.894 1.00 96.16 O ATOM 5584 CB ARG C 113 -47.900 46.898 -6.969 1.00 94.28 C ATOM 5585 CG ARG C 113 -48.905 45.834 -6.538 1.00 94.48 C ATOM 5586 CD ARG C 113 -49.435 46.004 -5.120 1.00 95.26 C ATOM 5587 NE ARG C 113 -50.510 45.043 -4.845 1.00 96.22 N ATOM 5588 CZ ARG C 113 -50.989 44.747 -3.632 1.00 96.89 C ATOM 5589 NH1 ARG C 113 -50.493 45.332 -2.543 1.00 96.87 N ATOM 5590 NH2 ARG C 113 -51.973 43.859 -3.507 1.00 96.57 N ATOM 5591 N VAL C 114 -45.444 48.177 -8.502 1.00 96.02 N ATOM 5592 CA VAL C 114 -44.765 49.347 -9.041 1.00 96.73 C ATOM 5593 C VAL C 114 -43.841 49.010 -10.215 1.00 97.56 C ATOM 5594 O VAL C 114 -43.620 49.841 -11.098 1.00 97.44 O ATOM 5595 CB VAL C 114 -43.962 50.053 -7.934 1.00 97.09 C ATOM 5596 CG1 VAL C 114 -43.369 51.347 -8.464 1.00 97.30 C ATOM 5597 CG2 VAL C 114 -44.867 50.319 -6.729 1.00 96.77 C ATOM 5598 N HIS C 115 -43.305 47.791 -10.215 1.00 99.09 N ATOM 5599 CA HIS C 115 -42.409 47.319 -11.278 1.00100.13 C ATOM 5600 C HIS C 115 -42.781 45.888 -11.662 1.00100.23 C ATOM 5601 O HIS C 115 -42.141 44.934 -11.228 1.00100.81 O ATOM 5602 CB HIS C 115 -40.950 47.337 -10.816 1.00100.57 C ATOM 5603 CG HIS C 115 -40.456 48.687 -10.397 1.00101.28 C ATOM 5604 ND1 HIS C 115 -40.787 49.260 -9.190 1.00101.75 N ATOM 5605 CD2 HIS C 115 -39.625 49.559 -11.015 1.00101.64 C ATOM 5606 CE1 HIS C 115 -40.176 50.428 -9.077 1.00102.42 C ATOM 5607 NE2 HIS C 115 -39.465 50.632 
-10.170 1.00102.31 N ATOM 5608 N PRO C 116 -43.813 45.728 -12.494 1.00100.42 N ATOM 5609 CA PRO C 116 -44.321 44.438 -12.965 1.00100.72 C ATOM 5610 C PRO C 116 -43.414 43.751 -13.972 1.00100.87 C ATOM 5611 O PRO C 116 -43.747 42.680 -14.488 1.00100.56 O ATOM 5612 CB PRO C 116 -45.650 44.817 -13.582 1.00101.70 C ATOM 5613 CG PRO C 116 -45.298 46.124 -14.245 1.00102.18 C ATOM 5614 CD PRO C 116 -44.524 46.838 -13.153 1.00101.10 C ATOM 5615 N ASP C 117 -42.279 44.374 -14.266 1.00101.28 N ATOM 5616 CA ASP C 117 -41.332 43.805 -15.224 1.00102.06 C ATOM 5617 C ASP C 117 -40.113 43.186 -14.536 1.00101.05 C ATOM 5618 O ASP C 117 -39.187 42.715 -15.201 1.00101.44 O ATOM 5619 CB ASP C 117 -40.882 44.883 -16.221 1.00103.66 C ATOM 5620 CG ASP C 117 -40.296 46.113 -15.537 1.00105.43 C ATOM 5621 OD1 ASP C 117 -40.949 46.658 -14.616 1.00106.09 O ATOM 5622 OD2 ASP C 117 -39.186 46.540 -15.931 1.00106.50 O ATOM 5623 N LEU C 118 -40.133 43.177 -13.202 1.00 98.82 N ATOM 5624 CA LEU C 118 -39.034 42.635 -12.403 1.00 95.55 C ATOM 5625 C LEU C 118 -38.774 41.136 -12.601 1.00 93.68 C ATOM 5626 O LEU C 118 -39.627 40.385 -13.089 1.00 93.41 O ATOM 5627 CB LEU C 118 -39.278 42.909 -10.901 1.00 93.59 C ATOM 5628 CG LEU C 118 -40.488 42.276 -10.188 1.00 92.16 C ATOM 5629 CD1 LEU C 118 -40.366 40.762 -10.140 1.00 92.08 C ATOM 5630 CD2 LEU C 118 -40.570 42.816 -8.781 1.00 90.84 C ATOM 5631 N CYS C 119 -37.565 40.726 -12.231 1.00 90.59 N ATOM 5632 CA CYS C 119 -37.156 39.334 -12.293 1.00 86.30 C ATOM 5633 C CYS C 119 -36.972 39.002 -10.832 1.00 83.38 C ATOM 5634 O CYS C 119 -37.245 39.832 -9.970 1.00 83.17 O ATOM 5635 CB CYS C 119 -35.832 39.194 -13.032 1.00 85.90 C ATOM 5636 SG CYS C 119 -34.685 40.514 -12.633 1.00 85.58 S ATOM 5637 N VAL C 120 -36.498 37.804 -10.542 1.00 80.52 N ATOM 5638 CA VAL C 120 -36.310 37.409 -9.158 1.00 76.40 C ATOM 5639 C VAL C 120 -35.185 36.400 -8.991 1.00 73.42 C ATOM 5640 O VAL C 120 -35.177 35.355 -9.635 1.00 72.14 O ATOM 5641 CB VAL C 120 -37.627 36.805 -8.575 
1.00 75.95 C ATOM 5642 CG1 VAL C 120 -37.347 36.116 -7.245 1.00 74.91 C ATOM 5643 CG2 VAL C 120 -38.673 37.911 -8.390 1.00 73.63 C ATOM 5644 N ILE C 121 -34.231 36.729 -8.129 1.00 71.26 N ATOM 5645 CA ILE C 121 -33.119 35.823 -7.829 1.00 69.79 C ATOM 5646 C ILE C 121 -33.463 35.213 -6.465 1.00 66.24 C ATOM 5647 O ILE C 121 -33.593 35.921 -5.464 1.00 65.09 O ATOM 5648 CB ILE C 121 -31.744 36.575 -7.767 1.00 71.19 C ATOM 5649 CG1 ILE C 121 -31.323 37.065 -9.159 1.00 71.26 C ATOM 5650 CG2 ILE C 121 -30.659 35.635 -7.276 1.00 71.09 C ATOM 5651 CD1 ILE C 121 -32.386 37.851 -9.888 1.00 72.36 C ATOM 5652 N TRP C 122 -33.627 33.899 -6.436 1.00 62.73 N ATOM 5653 CA TRP C 122 -34.005 33.217 -5.210 1.00 60.62 C ATOM 5654 C TRP C 122 -32.900 32.331 -4.625 1.00 60.19 C ATOM 5655 O TRP C 122 -32.720 31.176 -5.034 1.00 59.32 O ATOM 5656 CB TRP C 122 -35.277 32.401 -5.488 1.00 58.94 C ATOM 5657 CG TRP C 122 -35.961 31.801 -4.282 1.00 56.58 C ATOM 5658 CD1 TRP C 122 -35.587 30.676 -3.598 1.00 55.86 C ATOM 5659 CD2 TRP C 122 -37.168 32.264 -3.659 1.00 55.45 C ATOM 5660 NE1 TRP C 122 -36.488 30.406 -2.590 1.00 55.24 N ATOM 5661 CE2 TRP C 122 -37.465 31.370 -2.596 1.00 55.02 C ATOM 5662 CE3 TRP C 122 -38.021 33.357 -3.877 1.00 54.23 C ATOM 5663 CZ2 TRP C 122 -38.586 31.519 -1.775 1.00 54.64 C ATOM 5664 CZ3 TRP C 122 -39.142 33.513 -3.055 1.00 54.27 C ATOM 5665 CH2 TRP C 122 -39.408 32.601 -2.012 1.00 55.66 C ATOM 5666 N VAL C 123 -32.159 32.890 -3.670 1.00 59.72 N ATOM 5667 CA VAL C 123 -31.093 32.151 -2.987 1.00 59.40 C ATOM 5668 C VAL C 123 -31.721 31.513 -1.732 1.00 57.38 C ATOM 5669 O VAL C 123 -31.998 32.184 -0.734 1.00 55.82 O ATOM 5670 CB VAL C 123 -29.918 33.073 -2.565 1.00 59.65 C ATOM 5671 CG1 VAL C 123 -28.727 32.233 -2.164 1.00 58.94 C ATOM 5672 CG2 VAL C 123 -29.535 33.998 -3.699 1.00 58.72 C ATOM 5673 N ASP C 124 -31.947 30.205 -1.816 1.00 55.40 N ATOM 5674 CA ASP C 124 -32.583 29.423 -0.759 1.00 54.22 C ATOM 5675 C ASP C 124 -32.041 28.025 -1.002 1.00 55.31 C ATOM 5676 
O ASP C 124 -31.593 27.723 -2.118 1.00 55.23 O ATOM 5677 CB ASP C 124 -34.110 29.442 -0.980 1.00 50.71 C ATOM 5678 CG ASP C 124 -34.893 28.911 0.187 1.00 47.58 C ATOM 5679 OD1 ASP C 124 -34.645 27.776 0.623 1.00 46.94 O ATOM 5680 OD2 ASP C 124 -35.793 29.623 0.675 1.00 46.77 O ATOM 5681 N ALA C 125 -32.054 27.184 0.029 1.00 56.37 N ATOM 5682 CA ALA C 125 -31.599 25.805 -0.138 1.00 57.62 C ATOM 5683 C ALA C 125 -32.745 25.090 -0.869 1.00 58.34 C ATOM 5684 O ALA C 125 -32.534 24.198 -1.694 1.00 56.62 O ATOM 5685 CB ALA C 125 -31.351 25.151 1.223 1.00 56.07 C ATOM 5686 N HIS C 126 -33.960 25.547 -0.566 1.00 59.92 N ATOM 5687 CA HIS C 126 -35.194 25.010 -1.109 1.00 61.14 C ATOM 5688 C HIS C 126 -35.820 25.875 -2.186 1.00 64.86 C ATOM 5689 O HIS C 126 -35.616 27.098 -2.218 1.00 65.54 O ATOM 5690 CB HIS C 126 -36.210 24.855 0.014 1.00 58.52 C ATOM 5691 CG HIS C 126 -35.594 24.490 1.318 1.00 57.04 C ATOM 5692 ND1 HIS C 126 -35.190 25.434 2.237 1.00 59.20 N ATOM 5693 CD2 HIS C 126 -35.202 23.291 1.806 1.00 55.86 C ATOM 5694 CE1 HIS C 126 -34.562 24.834 3.234 1.00 58.98 C ATOM 5695 NE2 HIS C 126 -34.555 23.533 2.995 1.00 58.70 N ATOM 5696 N THR C 127 -36.598 25.221 -3.052 1.00 67.28 N ATOM 5697 CA THR C 127 -37.325 25.882 -4.127 1.00 69.38 C ATOM 5698 C THR C 127 -38.539 26.553 -3.472 1.00 70.18 C ATOM 5699 O THR C 127 -38.912 27.681 -3.817 1.00 68.18 O ATOM 5700 CB THR C 127 -37.814 24.857 -5.183 1.00 71.32 C ATOM 5701 OG1 THR C 127 -38.155 23.629 -4.526 1.00 72.91 O ATOM 5702 CG2 THR C 127 -36.739 24.594 -6.238 1.00 70.67 C ATOM 5703 N ASP C 128 -39.136 25.848 -2.510 1.00 72.64 N ATOM 5704 CA ASP C 128 -40.307 26.346 -1.785 1.00 76.54 C ATOM 5705 C ASP C 128 -41.461 26.577 -2.777 1.00 78.32 C ATOM 5706 O ASP C 128 -42.374 27.368 -2.523 1.00 77.37 O ATOM 5707 CB ASP C 128 -39.979 27.676 -1.054 1.00 76.13 C ATOM 5708 CG ASP C 128 -38.773 27.567 -0.111 1.00 74.37 C ATOM 5709 OD1 ASP C 128 -38.566 26.485 0.488 1.00 74.66 O ATOM 5710 OD2 ASP C 128 -38.042 28.575 0.035 1.00 
71.25 O ATOM 5711 N ILE C 129 -41.398 25.868 -3.906 1.00 81.57 N ATOM 5712 CA ILE C 129 -42.378 25.964 -4.995 1.00 82.62 C ATOM 5713 C ILE C 129 -43.511 24.920 -4.911 1.00 83.95 C ATOM 5714 O ILE C 129 -44.165 24.609 -5.913 1.00 83.15 O ATOM 5715 CB ILE C 129 -41.637 25.879 -6.390 1.00 81.91 C ATOM 5716 CG1 ILE C 129 -42.613 26.114 -7.531 1.00 81.40 C ATOM 5717 CG2 ILE C 129 -40.929 24.532 -6.542 1.00 80.22 C ATOM 5718 CD1 ILE C 129 -43.354 27.409 -7.411 1.00 82.11 C ATOM 5719 N ASN C 130 -43.748 24.406 -3.703 1.00 85.83 N ATOM 5720 CA ASN C 130 -44.803 23.429 -3.474 1.00 86.95 C ATOM 5721 C ASN C 130 -46.153 24.122 -3.347 1.00 87.55 C ATOM 5722 O ASN C 130 -46.242 25.293 -2.977 1.00 87.00 O ATOM 5723 CB ASN C 130 -44.540 22.603 -2.203 1.00 88.13 C ATOM 5724 CG ASN C 130 -43.678 21.370 -2.464 1.00 90.24 C ATOM 5725 OD1 ASN C 130 -43.803 20.715 -3.503 1.00 91.56 O ATOM 5726 ND2 ASN C 130 -42.812 21.038 -1.510 1.00 91.70 N ATOM 5727 N THR C 131 -47.203 23.379 -3.663 1.00 88.52 N ATOM 5728 CA THR C 131 -48.559 23.882 -3.599 1.00 88.78 C ATOM 5729 C THR C 131 -49.300 23.181 -2.465 1.00 90.67 C ATOM 5730 O THR C 131 -48.818 22.184 -1.909 1.00 89.83 O ATOM 5731 CB THR C 131 -49.278 23.599 -4.908 1.00 87.72 C ATOM 5732 OG1 THR C 131 -49.353 22.179 -5.108 1.00 86.20 O ATOM 5733 CG2 THR C 131 -48.519 24.228 -6.060 1.00 87.64 C ATOM 5734 N PRO C 132 -50.483 23.701 -2.103 1.00 92.13 N ATOM 5735 CA PRO C 132 -51.328 23.150 -1.034 1.00 92.76 C ATOM 5736 C PRO C 132 -51.606 21.666 -1.282 1.00 92.59 C ATOM 5737 O PRO C 132 -51.990 20.914 -0.375 1.00 91.94 O ATOM 5738 CB PRO C 132 -52.599 23.997 -1.134 1.00 93.72 C ATOM 5739 CG PRO C 132 -52.085 25.325 -1.635 1.00 92.98 C ATOM 5740 CD PRO C 132 -51.104 24.902 -2.696 1.00 92.38 C ATOM 5741 N LEU C 133 -51.402 21.280 -2.539 1.00 92.49 N ATOM 5742 CA LEU C 133 -51.611 19.923 -3.030 1.00 92.47 C ATOM 5743 C LEU C 133 -50.386 19.063 -2.764 1.00 92.31 C ATOM 5744 O LEU C 133 -50.488 17.937 -2.266 1.00 92.59 O ATOM 5745 CB LEU C 133 
-51.864 19.961 -4.540 1.00 92.42 C ATOM 5746 CG LEU C 133 -52.849 21.015 -5.052 1.00 93.22 C ATOM 5747 CD1 LEU C 133 -52.945 20.924 -6.573 1.00 92.74 C ATOM 5748 CD2 LEU C 133 -54.215 20.807 -4.397 1.00 93.47 C ATOM 5749 N THR C 134 -49.228 19.615 -3.111 1.00 91.80 N ATOM 5750 CA THR C 134 -47.943 18.943 -2.956 1.00 90.29 C ATOM 5751 C THR C 134 -47.434 18.849 -1.513 1.00 88.61 C ATOM 5752 O THR C 134 -47.287 17.756 -0.973 1.00 87.36 O ATOM 5753 CB THR C 134 -46.892 19.667 -3.791 1.00 90.62 C ATOM 5754 OG1 THR C 134 -46.794 21.019 -3.336 1.00 91.28 O ATOM 5755 CG2 THR C 134 -47.291 19.679 -5.256 1.00 89.53 C ATOM 5756 N THR C 135 -47.164 20.001 -0.908 1.00 87.08 N ATOM 5757 CA THR C 135 -46.653 20.066 0.459 1.00 87.00 C ATOM 5758 C THR C 135 -47.089 18.937 1.375 1.00 86.71 C ATOM 5759 O THR C 135 -48.257 18.572 1.404 1.00 86.97 O ATOM 5760 CB THR C 135 -47.060 21.362 1.162 1.00 86.89 C ATOM 5761 OG1 THR C 135 -48.417 21.673 0.832 1.00 85.36 O ATOM 5762 CG2 THR C 135 -46.140 22.505 0.769 1.00 87.48 C ATOM 5763 N SER C 136 -46.129 18.392 2.117 1.00 86.78 N ATOM 5764 CA SER C 136 -46.392 17.343 3.089 1.00 87.04 C ATOM 5765 C SER C 136 -46.449 18.097 4.398 1.00 88.13 C ATOM 5766 O SER C 136 -46.087 17.581 5.454 1.00 88.17 O ATOM 5767 CB SER C 136 -45.259 16.320 3.141 1.00 86.80 C ATOM 5768 OG SER C 136 -45.393 15.348 2.125 1.00 87.18 O ATOM 5769 N SER C 137 -46.885 19.348 4.298 1.00 89.21 N ATOM 5770 CA SER C 137 -47.011 20.223 5.452 1.00 91.19 C ATOM 5771 C SER C 137 -47.723 21.484 4.992 1.00 91.93 C ATOM 5772 O SER C 137 -47.570 21.909 3.845 1.00 90.67 O ATOM 5773 CB SER C 137 -45.628 20.572 6.004 1.00 92.06 C ATOM 5774 OG SER C 137 -44.859 21.268 5.024 1.00 93.47 O ATOM 5775 N GLY C 138 -48.506 22.072 5.890 1.00 93.57 N ATOM 5776 CA GLY C 138 -49.249 23.271 5.551 1.00 95.55 C ATOM 5777 C GLY C 138 -48.429 24.524 5.762 1.00 96.50 C ATOM 5778 O GLY C 138 -48.951 25.648 5.804 1.00 97.48 O ATOM 5779 N ASN C 139 -47.125 24.308 5.887 1.00 96.37 N ATOM 5780 CA ASN C 139 
-46.151 25.367 6.105 1.00 95.32 C ATOM 5781 C ASN C 139 -45.978 26.173 4.817 1.00 94.83 C ATOM 5782 O ASN C 139 -45.139 25.865 3.964 1.00 94.93 O ATOM 5783 CB ASN C 139 -44.837 24.724 6.534 1.00 95.07 C ATOM 5784 CG ASN C 139 -45.049 23.591 7.532 1.00 95.16 C ATOM 5785 OD1 ASN C 139 -46.031 22.842 7.450 1.00 96.11 O ATOM 5786 ND2 ASN C 139 -44.124 23.450 8.470 1.00 95.53 N ATOM 5787 N LEU C 140 -46.791 27.208 4.685 1.00 93.89 N ATOM 5788 CA LEU C 140 -46.753 28.057 3.511 1.00 94.05 C ATOM 5789 C LEU C 140 -45.395 28.696 3.218 1.00 94.50 C ATOM 5790 O LEU C 140 -45.225 29.317 2.166 1.00 94.62 O ATOM 5791 CB LEU C 140 -47.826 29.138 3.629 1.00 93.18 C ATOM 5792 CG LEU C 140 -49.252 28.630 3.457 1.00 91.51 C ATOM 5793 CD1 LEU C 140 -50.188 29.487 4.269 1.00 91.57 C ATOM 5794 CD2 LEU C 140 -49.624 28.627 1.981 1.00 89.96 C ATOM 5795 N HIS C 141 -44.431 28.558 4.132 1.00 94.12 N ATOM 5796 CA HIS C 141 -43.105 29.133 3.891 1.00 93.21 C ATOM 5797 C HIS C 141 -42.415 28.289 2.826 1.00 93.53 C ATOM 5798 O HIS C 141 -41.284 28.571 2.426 1.00 93.63 O ATOM 5799 CB HIS C 141 -42.253 29.175 5.177 1.00 90.47 C ATOM 5800 CG HIS C 141 -41.694 27.846 5.602 1.00 88.47 C ATOM 5801 ND1 HIS C 141 -40.829 27.718 6.665 1.00 86.91 N ATOM 5802 CD2 HIS C 141 -41.875 26.596 5.109 1.00 87.52 C ATOM 5803 CE1 HIS C 141 -40.498 26.445 6.810 1.00 86.12 C ATOM 5804 NE2 HIS C 141 -41.118 25.744 5.879 1.00 85.96 N ATOM 5805 N GLY C 142 -43.113 27.244 2.385 1.00 93.98 N ATOM 5806 CA GLY C 142 -42.584 26.354 1.368 1.00 95.37 C ATOM 5807 C GLY C 142 -43.467 26.325 0.131 1.00 96.23 C ATOM 5808 O GLY C 142 -43.521 25.317 -0.580 1.00 95.98 O ATOM 5809 N GLN C 143 -44.150 27.444 -0.113 1.00 96.91 N ATOM 5810 CA GLN C 143 -45.061 27.617 -1.244 1.00 97.13 C ATOM 5811 C GLN C 143 -45.019 29.008 -1.918 1.00 97.15 C ATOM 5812 O GLN C 143 -45.469 29.163 -3.051 1.00 96.46 O ATOM 5813 CB GLN C 143 -46.487 27.322 -0.778 1.00 97.37 C ATOM 5814 CG GLN C 143 -46.639 25.922 -0.202 1.00 98.88 C ATOM 5815 CD GLN C 143 
-48.016 25.652 0.379 1.00 99.70 C ATOM 5816 OE1 GLN C 143 -49.036 25.846 -0.283 1.00 98.74 O ATOM 5817 NE2 GLN C 143 -48.048 25.186 1.624 1.00100.53 N ATOM 5818 N PRO C 144 -44.454 30.025 -1.242 1.00 97.65 N ATOM 5819 CA PRO C 144 -44.364 31.390 -1.770 1.00 97.32 C ATOM 5820 C PRO C 144 -44.320 31.605 -3.277 1.00 96.74 C ATOM 5821 O PRO C 144 -44.925 32.546 -3.786 1.00 96.95 O ATOM 5822 CB PRO C 144 -43.120 31.937 -1.080 1.00 97.93 C ATOM 5823 CG PRO C 144 -43.234 31.335 0.268 1.00 98.75 C ATOM 5824 CD PRO C 144 -43.587 29.896 -0.053 1.00 98.52 C ATOM 5825 N VAL C 145 -43.601 30.755 -3.996 1.00 96.76 N ATOM 5826 CA VAL C 145 -43.504 30.931 -5.443 1.00 96.67 C ATOM 5827 C VAL C 145 -44.789 30.483 -6.138 1.00 95.18 C ATOM 5828 O VAL C 145 -45.204 31.069 -7.143 1.00 94.04 O ATOM 5829 CB VAL C 145 -42.282 30.155 -6.027 1.00 97.80 C ATOM 5830 CG1 VAL C 145 -42.163 30.416 -7.521 1.00 97.24 C ATOM 5831 CG2 VAL C 145 -40.993 30.585 -5.322 1.00 97.02 C ATOM 5832 N ALA C 146 -45.415 29.449 -5.582 1.00 94.06 N ATOM 5833 CA ALA C 146 -46.653 28.898 -6.121 1.00 93.54 C ATOM 5834 C ALA C 146 -47.802 29.917 -6.100 1.00 93.29 C ATOM 5835 O ALA C 146 -48.802 29.761 -6.813 1.00 93.69 O ATOM 5836 CB ALA C 146 -47.041 27.640 -5.339 1.00 92.75 C ATOM 5837 N PHE C 147 -47.660 30.955 -5.282 1.00 91.80 N ATOM 5838 CA PHE C 147 -48.676 31.995 -5.189 1.00 89.97 C ATOM 5839 C PHE C 147 -48.355 33.143 -6.119 1.00 88.94 C ATOM 5840 O PHE C 147 -49.247 33.838 -6.600 1.00 88.67 O ATOM 5841 CB PHE C 147 -48.764 32.532 -3.769 1.00 90.00 C ATOM 5842 CG PHE C 147 -49.477 31.625 -2.840 1.00 90.95 C ATOM 5843 CD1 PHE C 147 -48.885 30.446 -2.413 1.00 91.17 C ATOM 5844 CD2 PHE C 147 -50.769 31.923 -2.425 1.00 91.91 C ATOM 5845 CE1 PHE C 147 -49.572 29.568 -1.578 1.00 92.42 C ATOM 5846 CE2 PHE C 147 -51.469 31.057 -1.594 1.00 92.68 C ATOM 5847 CZ PHE C 147 -50.871 29.873 -1.168 1.00 92.65 C ATOM 5848 N LEU C 148 -47.068 33.333 -6.366 1.00 88.06 N ATOM 5849 CA LEU C 148 -46.603 34.412 -7.212 1.00 87.82 C 
ATOM 5850 C LEU C 148 -46.560 34.009 -8.678 1.00 89.07 C ATOM 5851 O LEU C 148 -46.613 34.853 -9.577 1.00 88.06 O ATOM 5852 CB LEU C 148 -45.224 34.851 -6.727 1.00 85.29 C ATOM 5853 CG LEU C 148 -45.223 35.166 -5.232 1.00 82.72 C ATOM 5854 CD1 LEU C 148 -43.823 35.460 -4.751 1.00 81.84 C ATOM 5855 CD2 LEU C 148 -46.134 36.345 -4.982 1.00 82.23 C ATOM 5856 N LEU C 149 -46.472 32.708 -8.916 1.00 91.44 N ATOM 5857 CA LEU C 149 -46.421 32.204 -10.277 1.00 94.48 C ATOM 5858 C LEU C 149 -47.781 32.193 -10.955 1.00 96.57 C ATOM 5859 O LEU C 149 -48.803 31.855 -10.342 1.00 96.50 O ATOM 5860 CB LEU C 149 -45.843 30.786 -10.308 1.00 93.94 C ATOM 5861 CG LEU C 149 -44.332 30.616 -10.440 1.00 93.30 C ATOM 5862 CD1 LEU C 149 -44.038 29.135 -10.550 1.00 92.84 C ATOM 5863 CD2 LEU C 149 -43.807 31.355 -11.677 1.00 93.57 C ATOM 5864 N LYS C 150 -47.773 32.561 -12.233 1.00 98.82 N ATOM 5865 CA LYS C 150 -48.980 32.583 -13.049 1.00100.82 C ATOM 5866 C LYS C 150 -49.235 31.158 -13.556 1.00101.32 C ATOM 5867 O LYS C 150 -50.357 30.650 -13.477 1.00101.72 O ATOM 5868 CB LYS C 150 -48.797 33.541 -14.233 1.00101.79 C ATOM 5869 CG LYS C 150 -48.401 34.963 -13.835 1.00102.99 C ATOM 5870 CD LYS C 150 -48.169 35.853 -15.056 1.00103.99 C ATOM 5871 CE LYS C 150 -47.788 37.278 -14.641 1.00104.51 C ATOM 5872 NZ LYS C 150 -47.582 38.203 -15.802 1.00104.60 N ATOM 5873 N GLU C 151 -48.182 30.515 -14.059 1.00101.60 N ATOM 5874 CA GLU C 151 -48.272 29.154 -14.579 1.00102.29 C ATOM 5875 C GLU C 151 -48.759 28.179 -13.523 1.00102.54 C ATOM 5876 O GLU C 151 -48.884 26.982 -13.781 1.00102.16 O ATOM 5877 CB GLU C 151 -46.911 28.683 -15.089 1.00102.95 C ATOM 5878 CG GLU C 151 -46.352 29.485 -16.248 1.00104.96 C ATOM 5879 CD GLU C 151 -45.497 30.652 -15.801 1.00106.23 C ATOM 5880 OE1 GLU C 151 -46.005 31.524 -15.066 1.00107.34 O ATOM 5881 OE2 GLU C 151 -44.311 30.697 -16.191 1.00107.33 O ATOM 5882 N LEU C 152 -49.034 28.692 -12.334 1.00102.97 N ATOM 5883 CA LEU C 152 -49.496 27.852 -11.249 1.00104.39 C ATOM 
5884 C LEU C 152 -50.868 28.281 -10.762 1.00105.77 C ATOM 5885 O LEU C 152 -51.474 27.621 -9.913 1.00105.76 O ATOM 5886 CB LEU C 152 -48.491 27.904 -10.103 1.00104.29 C ATOM 5887 CG LEU C 152 -47.813 26.586 -9.726 1.00104.15 C ATOM 5888 CD1 LEU C 152 -47.527 25.743 -10.955 1.00103.71 C ATOM 5889 CD2 LEU C 152 -46.535 26.911 -8.985 1.00104.71 C ATOM 5890 N LYS C 153 -51.355 29.394 -11.303 1.00107.04 N ATOM 5891 CA LYS C 153 -52.664 29.910 -10.923 1.00107.67 C ATOM 5892 C LYS C 153 -53.733 28.891 -11.305 1.00107.92 C ATOM 5893 O LYS C 153 -53.634 28.223 -12.341 1.00108.76 O ATOM 5894 CB LYS C 153 -52.933 31.246 -11.623 1.00107.80 C ATOM 5895 CG LYS C 153 -54.234 31.920 -11.208 1.00108.64 C ATOM 5896 CD LYS C 153 -54.395 33.289 -11.867 1.00109.46 C ATOM 5897 CE LYS C 153 -55.763 33.913 -11.570 1.00110.70 C ATOM 5898 NZ LYS C 153 -56.001 34.178 -10.115 1.00111.92 N ATOM 5899 N GLY C 154 -54.746 28.765 -10.454 1.00107.54 N ATOM 5900 CA GLY C 154 -55.813 27.824 -10.720 1.00107.33 C ATOM 5901 C GLY C 154 -55.285 26.411 -10.852 1.00107.30 C ATOM 5902 O GLY C 154 -55.922 25.558 -11.470 1.00108.46 O ATOM 5903 N LYS C 155 -54.108 26.166 -10.287 1.00106.64 N ATOM 5904 CA LYS C 155 -53.507 24.840 -10.323 1.00105.81 C ATOM 5905 C LYS C 155 -53.682 24.215 -8.940 1.00105.92 C ATOM 5906 O LYS C 155 -53.170 23.129 -8.656 1.00105.68 O ATOM 5907 CB LYS C 155 -52.025 24.939 -10.694 1.00104.96 C ATOM 5908 CG LYS C 155 -51.778 25.459 -12.099 1.00103.47 C ATOM 5909 CD LYS C 155 -52.336 24.498 -13.132 1.00104.34 C ATOM 5910 CE LYS C 155 -52.188 25.043 -14.548 1.00104.97 C ATOM 5911 NZ LYS C 155 -52.695 24.084 -15.580 1.00103.63 N ATOM 5912 N PHE C 156 -54.421 24.928 -8.091 1.00106.22 N ATOM 5913 CA PHE C 156 -54.719 24.499 -6.727 1.00106.40 C ATOM 5914 C PHE C 156 -55.875 25.320 -6.128 1.00106.38 C ATOM 5915 O PHE C 156 -56.113 26.466 -6.528 1.00105.46 O ATOM 5916 CB PHE C 156 -53.461 24.597 -5.838 1.00107.04 C ATOM 5917 CG PHE C 156 -52.809 25.965 -5.820 1.00107.71 C ATOM 5918 CD1 PHE C 
156 -52.304 26.539 -6.992 1.00107.58 C ATOM 5919 CD2 PHE C 156 -52.682 26.669 -4.622 1.00106.90 C ATOM 5920 CE1 PHE C 156 -51.683 27.788 -6.968 1.00106.82 C ATOM 5921 CE2 PHE C 156 -52.064 27.913 -4.586 1.00106.77 C ATOM 5922 CZ PHE C 156 -51.563 28.476 -5.763 1.00107.12 C ATOM 5923 N PRO C 157 -56.615 24.732 -5.166 1.00106.61 N ATOM 5924 CA PRO C 157 -57.749 25.398 -4.511 1.00106.37 C ATOM 5925 C PRO C 157 -57.407 26.719 -3.839 1.00106.46 C ATOM 5926 O PRO C 157 -56.606 26.762 -2.904 1.00106.53 O ATOM 5927 CB PRO C 157 -58.236 24.349 -3.509 1.00106.35 C ATOM 5928 CG PRO C 157 -56.986 23.582 -3.179 1.00106.43 C ATOM 5929 CD PRO C 157 -56.363 23.417 -4.547 1.00106.79 C ATOM 5930 N ASP C 158 -58.026 27.793 -4.323 1.00106.90 N ATOM 5931 CA ASP C 158 -57.803 29.126 -3.772 1.00106.90 C ATOM 5932 C ASP C 158 -57.690 29.006 -2.255 1.00107.41 C ATOM 5933 O ASP C 158 -58.492 28.327 -1.603 1.00107.55 O ATOM 5934 CB ASP C 158 -58.953 30.053 -4.170 1.00104.94 C ATOM 5935 CG ASP C 158 -59.260 29.977 -5.652 1.00104.15 C ATOM 5936 OD1 ASP C 158 -58.323 30.151 -6.465 1.00103.27 O ATOM 5937 OD2 ASP C 158 -60.431 29.735 -6.009 1.00103.26 O ATOM 5938 N VAL C 159 -56.681 29.658 -1.696 1.00107.49 N ATOM 5939 CA VAL C 159 -56.446 29.572 -0.267 1.00107.43 C ATOM 5940 C VAL C 159 -57.010 30.724 0.559 1.00105.92 C ATOM 5941 O VAL C 159 -56.914 31.896 0.175 1.00105.41 O ATOM 5942 CB VAL C 159 -54.929 29.422 0.016 1.00108.72 C ATOM 5943 CG1 VAL C 159 -54.687 29.201 1.509 1.00109.55 C ATOM 5944 CG2 VAL C 159 -54.367 28.256 -0.800 1.00109.32 C ATOM 5945 N PRO C 160 -57.615 30.386 1.712 1.00104.72 N ATOM 5946 CA PRO C 160 -58.236 31.280 2.691 1.00104.41 C ATOM 5947 C PRO C 160 -57.314 32.357 3.252 1.00103.98 C ATOM 5948 O PRO C 160 -56.639 32.140 4.261 1.00103.74 O ATOM 5949 CB PRO C 160 -58.710 30.319 3.775 1.00103.93 C ATOM 5950 CG PRO C 160 -59.068 29.116 2.996 1.00103.63 C ATOM 5951 CD PRO C 160 -57.901 28.982 2.058 1.00104.02 C ATOM 5952 N GLY C 161 -57.301 33.515 2.596 1.00103.53 N ATOM 
5953 CA GLY C 161 -56.485 34.622 3.054 1.00103.26 C ATOM 5954 C GLY C 161 -55.585 35.216 1.993 1.00103.11 C ATOM 5955 O GLY C 161 -55.055 36.315 2.165 1.00102.78 O ATOM 5956 N PHE C 162 -55.428 34.506 0.881 1.00102.76 N ATOM 5957 CA PHE C 162 -54.552 34.968 -0.185 1.00102.25 C ATOM 5958 C PHE C 162 -55.310 35.403 -1.425 1.00101.97 C ATOM 5959 O PHE C 162 -54.764 35.424 -2.526 1.00101.94 O ATOM 5960 CB PHE C 162 -53.559 33.860 -0.522 1.00101.45 C ATOM 5961 CG PHE C 162 -52.942 33.233 0.692 1.00101.08 C ATOM 5962 CD1 PHE C 162 -53.709 32.445 1.550 1.00100.76 C ATOM 5963 CD2 PHE C 162 -51.610 33.465 1.010 1.00101.58 C ATOM 5964 CE1 PHE C 162 -53.163 31.905 2.706 1.00100.12 C ATOM 5965 CE2 PHE C 162 -51.052 32.928 2.169 1.00101.83 C ATOM 5966 CZ PHE C 162 -51.834 32.146 3.018 1.00101.45 C ATOM 5967 N SER C 163 -56.574 35.754 -1.236 1.00102.29 N ATOM 5968 CA SER C 163 -57.417 36.206 -2.335 1.00102.63 C ATOM 5969 C SER C 163 -56.718 37.357 -3.069 1.00102.57 C ATOM 5970 O SER C 163 -56.464 37.277 -4.273 1.00102.59 O ATOM 5971 CB SER C 163 -58.767 36.668 -1.783 1.00102.37 C ATOM 5972 OG SER C 163 -59.335 35.663 -0.956 1.00101.02 O ATOM 5973 N TRP C 164 -56.401 38.409 -2.314 1.00102.49 N ATOM 5974 CA TRP C 164 -55.719 39.619 -2.802 1.00101.47 C ATOM 5975 C TRP C 164 -54.440 39.334 -3.596 1.00 99.77 C ATOM 5976 O TRP C 164 -53.917 40.208 -4.291 1.00 98.35 O ATOM 5977 CB TRP C 164 -55.358 40.500 -1.607 1.00103.47 C ATOM 5978 CG TRP C 164 -54.560 39.743 -0.573 1.00106.13 C ATOM 5979 CD1 TRP C 164 -55.048 38.997 0.470 1.00106.71 C ATOM 5980 CD2 TRP C 164 -53.135 39.592 -0.545 1.00106.44 C ATOM 5981 NE1 TRP C 164 -54.008 38.395 1.145 1.00106.48 N ATOM 5982 CE2 TRP C 164 -52.834 38.746 0.553 1.00106.26 C ATOM 5983 CE3 TRP C 164 -52.092 40.099 -1.326 1.00106.62 C ATOM 5984 CZ2 TRP C 164 -51.521 38.388 0.864 1.00106.64 C ATOM 5985 CZ3 TRP C 164 -50.791 39.741 -1.010 1.00106.79 C ATOM 5986 CH2 TRP C 164 -50.518 38.898 0.083 1.00107.31 C ATOM 5987 N VAL C 165 -53.931 38.114 -3.470 
1.00 98.45 N ATOM 5988 CA VAL C 165 -52.722 37.714 -4.167 1.00 97.02 C ATOM 5989 C VAL C 165 -52.978 37.523 -5.654 1.00 96.57 C ATOM 5990 O VAL C 165 -53.953 36.888 -6.055 1.00 95.58 O ATOM 5991 CB VAL C 165 -52.166 36.410 -3.592 1.00 96.60 C ATOM 5992 CG1 VAL C 165 -50.889 36.029 -4.312 1.00 96.25 C ATOM 5993 CG2 VAL C 165 -51.925 36.573 -2.104 1.00 95.83 C ATOM 5994 N THR C 166 -52.082 38.073 -6.466 1.00 96.74 N ATOM 5995 CA THR C 166 -52.193 37.984 -7.916 1.00 96.80 C ATOM 5996 C THR C 166 -50.917 37.413 -8.530 1.00 96.74 C ATOM 5997 O THR C 166 -49.866 38.056 -8.498 1.00 96.41 O ATOM 5998 CB THR C 166 -52.430 39.376 -8.537 1.00 96.99 C ATOM 5999 OG1 THR C 166 -53.440 40.065 -7.793 1.00 97.70 O ATOM 6000 CG2 THR C 166 -52.873 39.248 -9.997 1.00 96.21 C ATOM 6001 N PRO C 167 -50.993 36.190 -9.087 1.00 96.68 N ATOM 6002 CA PRO C 167 -49.814 35.573 -9.706 1.00 97.04 C ATOM 6003 C PRO C 167 -49.160 36.630 -10.588 1.00 97.98 C ATOM 6004 O PRO C 167 -49.699 36.990 -11.633 1.00 98.54 O ATOM 6005 CB PRO C 167 -50.422 34.429 -10.499 1.00 96.70 C ATOM 6006 CG PRO C 167 -51.560 33.991 -9.597 1.00 96.21 C ATOM 6007 CD PRO C 167 -52.171 35.306 -9.176 1.00 95.91 C ATOM 6008 N CYS C 168 -47.999 37.122 -10.160 1.00 99.21 N ATOM 6009 CA CYS C 168 -47.297 38.195 -10.869 1.00100.39 C ATOM 6010 C CYS C 168 -46.129 37.901 -11.819 1.00100.22 C ATOM 6011 O CYS C 168 -45.781 38.759 -12.628 1.00100.36 O ATOM 6012 CB CYS C 168 -46.849 39.251 -9.850 1.00101.41 C ATOM 6013 SG CYS C 168 -46.011 38.580 -8.396 1.00102.86 S ATOM 6014 N ILE C 169 -45.499 36.734 -11.730 1.00100.14 N ATOM 6015 CA ILE C 169 -44.393 36.450 -12.650 1.00 99.43 C ATOM 6016 C ILE C 169 -44.377 35.016 -13.152 1.00100.06 C ATOM 6017 O ILE C 169 -44.802 34.086 -12.461 1.00 99.94 O ATOM 6018 CB ILE C 169 -43.003 36.764 -12.027 1.00 97.80 C ATOM 6019 CG1 ILE C 169 -42.736 35.850 -10.828 1.00 97.02 C ATOM 6020 CG2 ILE C 169 -42.926 38.225 -11.632 1.00 97.30 C ATOM 6021 CD1 ILE C 169 -43.743 35.960 -9.704 1.00 96.20 C ATOM 
6022 N SER C 170 -43.898 34.850 -14.378 1.00100.49 N ATOM 6023 CA SER C 170 -43.805 33.536 -14.987 1.00101.32 C ATOM 6024 C SER C 170 -42.440 32.925 -14.645 1.00101.91 C ATOM 6025 O SER C 170 -41.592 33.589 -14.039 1.00102.46 O ATOM 6026 CB SER C 170 -44.005 33.654 -16.502 1.00100.78 C ATOM 6027 OG SER C 170 -43.292 34.748 -17.040 1.00100.26 O ATOM 6028 N ALA C 171 -42.232 31.667 -15.022 1.00101.63 N ATOM 6029 CA ALA C 171 -40.974 30.980 -14.735 1.00101.52 C ATOM 6030 C ALA C 171 -39.808 31.531 -15.558 1.00101.47 C ATOM 6031 O ALA C 171 -38.653 31.133 -15.379 1.00101.00 O ATOM 6032 CB ALA C 171 -41.138 29.487 -14.991 1.00101.29 C ATOM 6033 N LYS C 172 -40.119 32.459 -16.451 1.00101.77 N ATOM 6034 CA LYS C 172 -39.106 33.053 -17.303 1.00102.19 C ATOM 6035 C LYS C 172 -38.432 34.257 -16.656 1.00100.80 C ATOM 6036 O LYS C 172 -37.487 34.822 -17.211 1.00100.93 O ATOM 6037 CB LYS C 172 -39.737 33.449 -18.644 1.00104.75 C ATOM 6038 CG LYS C 172 -39.429 32.471 -19.786 1.00107.35 C ATOM 6039 CD LYS C 172 -39.637 31.000 -19.385 1.00107.90 C ATOM 6040 CE LYS C 172 -39.005 30.055 -20.413 1.00108.26 C ATOM 6041 NZ LYS C 172 -38.998 28.625 -19.986 1.00107.62 N ATOM 6042 N ASP C 173 -38.909 34.637 -15.475 1.00 98.57 N ATOM 6043 CA ASP C 173 -38.343 35.782 -14.775 1.00 95.87 C ATOM 6044 C ASP C 173 -37.755 35.431 -13.405 1.00 92.84 C ATOM 6045 O ASP C 173 -37.124 36.275 -12.763 1.00 92.74 O ATOM 6046 CB ASP C 173 -39.406 36.879 -14.612 1.00 97.37 C ATOM 6047 CG ASP C 173 -40.050 37.277 -15.933 1.00 97.26 C ATOM 6048 OD1 ASP C 173 -39.316 37.584 -16.900 1.00 95.86 O ATOM 6049 OD2 ASP C 173 -41.300 37.286 -15.993 1.00 98.11 O ATOM 6050 N ILE C 174 -37.961 34.188 -12.968 1.00 89.13 N ATOM 6051 CA ILE C 174 -37.449 33.709 -11.678 1.00 83.93 C ATOM 6052 C ILE C 174 -36.171 32.885 -11.834 1.00 80.70 C ATOM 6053 O ILE C 174 -36.029 32.139 -12.795 1.00 80.98 O ATOM 6054 CB ILE C 174 -38.489 32.816 -10.953 1.00 82.90 C ATOM 6055 CG1 ILE C 174 -37.888 32.249 -9.672 1.00 81.94 C ATOM 6056 
CG2 ILE C 174 -38.906 31.668 -11.838 1.00 82.20 C ATOM 6057 CD1 ILE C 174 -37.434 33.312 -8.713 1.00 82.59 C ATOM 6058 N VAL C 175 -35.238 33.033 -10.895 1.00 77.49 N ATOM 6059 CA VAL C 175 -33.985 32.269 -10.918 1.00 74.17 C ATOM 6060 C VAL C 175 -33.581 31.836 -9.499 1.00 71.60 C ATOM 6061 O VAL C 175 -33.419 32.664 -8.596 1.00 70.28 O ATOM 6062 CB VAL C 175 -32.811 33.070 -11.575 1.00 73.83 C ATOM 6063 CG1 VAL C 175 -31.533 32.241 -11.556 1.00 72.13 C ATOM 6064 CG2 VAL C 175 -33.160 33.422 -13.002 1.00 71.71 C ATOM 6065 N TYR C 176 -33.430 30.525 -9.328 1.00 68.81 N ATOM 6066 CA TYR C 176 -33.060 29.914 -8.056 1.00 65.95 C ATOM 6067 C TYR C 176 -31.571 29.658 -7.976 1.00 66.00 C ATOM 6068 O TYR C 176 -30.980 29.178 -8.943 1.00 66.47 O ATOM 6069 CB TYR C 176 -33.743 28.570 -7.917 1.00 63.07 C ATOM 6070 CG TYR C 176 -35.182 28.610 -7.498 1.00 62.07 C ATOM 6071 CD1 TYR C 176 -35.529 28.864 -6.182 1.00 61.02 C ATOM 6072 CD2 TYR C 176 -36.191 28.239 -8.383 1.00 62.56 C ATOM 6073 CE1 TYR C 176 -36.846 28.726 -5.743 1.00 60.83 C ATOM 6074 CE2 TYR C 176 -37.515 28.095 -7.954 1.00 62.19 C ATOM 6075 CZ TYR C 176 -37.833 28.333 -6.626 1.00 61.80 C ATOM 6076 OH TYR C 176 -39.115 28.113 -6.168 1.00 61.14 O ATOM 6077 N ILE C 177 -30.964 29.974 -6.835 1.00 66.80 N ATOM 6078 CA ILE C 177 -29.533 29.710 -6.644 1.00 67.11 C ATOM 6079 C ILE C 177 -29.345 29.110 -5.243 1.00 68.56 C ATOM 6080 O ILE C 177 -29.829 29.668 -4.244 1.00 69.41 O ATOM 6081 CB ILE C 177 -28.637 30.996 -6.761 1.00 65.54 C ATOM 6082 CG1 ILE C 177 -29.263 32.022 -7.711 1.00 65.06 C ATOM 6083 CG2 ILE C 177 -27.263 30.625 -7.310 1.00 61.44 C ATOM 6084 CD1 ILE C 177 -28.408 33.255 -7.927 1.00 62.70 C ATOM 6085 N GLY C 178 -28.675 27.956 -5.181 1.00 68.46 N ATOM 6086 CA GLY C 178 -28.411 27.307 -3.909 1.00 68.24 C ATOM 6087 C GLY C 178 -29.284 26.110 -3.568 1.00 68.59 C ATOM 6088 O GLY C 178 -29.184 25.574 -2.453 1.00 68.65 O ATOM 6089 N LEU C 179 -30.121 25.675 -4.511 1.00 67.99 N ATOM 6090 CA LEU C 179 -31.016 
24.535 -4.275 1.00 65.97 C ATOM 6091 C LEU C 179 -30.246 23.243 -3.977 1.00 64.84 C ATOM 6092 O LEU C 179 -29.312 22.861 -4.683 1.00 62.84 O ATOM 6093 CB LEU C 179 -31.960 24.338 -5.473 1.00 63.88 C ATOM 6094 CG LEU C 179 -32.747 25.595 -5.887 1.00 61.65 C ATOM 6095 CD1 LEU C 179 -33.625 25.289 -7.091 1.00 60.43 C ATOM 6096 CD2 LEU C 179 -33.573 26.094 -4.715 1.00 58.38 C ATOM 6097 N ARG C 180 -30.650 22.573 -2.908 1.00 64.18 N ATOM 6098 CA ARG C 180 -29.986 21.354 -2.513 1.00 63.57 C ATOM 6099 C ARG C 180 -30.923 20.447 -1.726 1.00 65.71 C ATOM 6100 O ARG C 180 -30.547 19.346 -1.344 1.00 67.12 O ATOM 6101 CB ARG C 180 -28.742 21.698 -1.691 1.00 60.48 C ATOM 6102 CG ARG C 180 -28.995 21.823 -0.206 1.00 58.48 C ATOM 6103 CD ARG C 180 -27.780 22.311 0.545 1.00 55.45 C ATOM 6104 NE ARG C 180 -27.593 23.716 0.259 1.00 58.34 N ATOM 6105 CZ ARG C 180 -27.527 24.676 1.175 1.00 60.04 C ATOM 6106 NH1 ARG C 180 -27.621 24.388 2.468 1.00 58.90 N ATOM 6107 NH2 ARG C 180 -27.403 25.942 0.782 1.00 60.12 N ATOM 6108 N ASP C 181 -32.141 20.907 -1.475 1.00 68.57 N ATOM 6109 CA ASP C 181 -33.122 20.084 -0.764 1.00 72.38 C ATOM 6110 C ASP C 181 -34.517 20.290 -1.374 1.00 73.65 C ATOM 6111 O ASP C 181 -35.397 20.954 -0.800 1.00 72.86 O ATOM 6112 CB ASP C 181 -33.115 20.416 0.734 1.00 74.74 C ATOM 6113 CG ASP C 181 -34.318 19.832 1.468 1.00 76.50 C ATOM 6114 OD1 ASP C 181 -34.798 18.754 1.052 1.00 76.76 O ATOM 6115 OD2 ASP C 181 -34.769 20.451 2.462 1.00 77.76 O ATOM 6116 N VAL C 182 -34.715 19.701 -2.550 1.00 75.62 N ATOM 6117 CA VAL C 182 -35.979 19.851 -3.256 1.00 77.84 C ATOM 6118 C VAL C 182 -36.890 18.642 -3.246 1.00 78.44 C ATOM 6119 O VAL C 182 -36.466 17.544 -3.601 1.00 79.02 O ATOM 6120 CB VAL C 182 -35.737 20.268 -4.718 1.00 78.31 C ATOM 6121 CG1 VAL C 182 -36.843 19.734 -5.618 1.00 79.03 C ATOM 6122 CG2 VAL C 182 -35.691 21.785 -4.807 1.00 79.08 C ATOM 6123 N ASP C 183 -38.151 18.875 -2.866 1.00 79.14 N ATOM 6124 CA ASP C 183 -39.180 17.837 -2.805 1.00 78.94 C ATOM 6125 C 
ASP C 183 -39.618 17.348 -4.180 1.00 79.30 C ATOM 6126 O ASP C 183 -39.540 18.082 -5.169 1.00 79.36 O ATOM 6127 CB ASP C 183 -40.404 18.344 -2.042 1.00 78.05 C ATOM 6128 CG ASP C 183 -40.178 18.381 -0.548 1.00 79.06 C ATOM 6129 OD1 ASP C 183 -39.926 17.306 0.037 1.00 78.74 O ATOM 6130 OD2 ASP C 183 -40.247 19.481 0.045 1.00 80.08 O ATOM 6131 N PRO C 184 -40.092 16.092 -4.256 1.00 79.74 N ATOM 6132 CA PRO C 184 -40.549 15.490 -5.513 1.00 80.29 C ATOM 6133 C PRO C 184 -41.403 16.491 -6.289 1.00 81.42 C ATOM 6134 O PRO C 184 -40.961 17.054 -7.295 1.00 81.63 O ATOM 6135 CB PRO C 184 -41.365 14.290 -5.041 1.00 78.84 C ATOM 6136 CG PRO C 184 -40.695 13.907 -3.784 1.00 78.62 C ATOM 6137 CD PRO C 184 -40.448 15.234 -3.114 1.00 78.97 C ATOM 6138 N GLY C 185 -42.621 16.709 -5.794 1.00 81.81 N ATOM 6139 CA GLY C 185 -43.545 17.632 -6.423 1.00 82.31 C ATOM 6140 C GLY C 185 -42.854 18.864 -6.954 1.00 82.89 C ATOM 6141 O GLY C 185 -43.068 19.263 -8.096 1.00 82.84 O ATOM 6142 N GLU C 186 -42.017 19.469 -6.121 1.00 83.40 N ATOM 6143 CA GLU C 186 -41.291 20.655 -6.529 1.00 84.01 C ATOM 6144 C GLU C 186 -40.413 20.404 -7.758 1.00 85.43 C ATOM 6145 O GLU C 186 -40.388 21.227 -8.678 1.00 85.30 O ATOM 6146 CB GLU C 186 -40.455 21.178 -5.360 1.00 82.52 C ATOM 6147 CG GLU C 186 -41.201 22.154 -4.478 1.00 80.01 C ATOM 6148 CD GLU C 186 -40.527 22.356 -3.147 1.00 80.52 C ATOM 6149 OE1 GLU C 186 -40.402 21.351 -2.422 1.00 82.23 O ATOM 6150 OE2 GLU C 186 -40.128 23.499 -2.821 1.00 79.04 O ATOM 6151 N HIS C 187 -39.705 19.273 -7.788 1.00 87.43 N ATOM 6152 CA HIS C 187 -38.843 18.967 -8.931 1.00 89.34 C ATOM 6153 C HIS C 187 -39.693 18.917 -10.193 1.00 91.26 C ATOM 6154 O HIS C 187 -39.319 19.454 -11.241 1.00 91.25 O ATOM 6155 CB HIS C 187 -38.119 17.630 -8.748 1.00 88.09 C ATOM 6156 CG HIS C 187 -36.968 17.438 -9.693 1.00 87.99 C ATOM 6157 ND1 HIS C 187 -36.223 16.285 -9.743 1.00 88.58 N ATOM 6158 CD2 HIS C 187 -36.419 18.282 -10.601 1.00 87.65 C ATOM 6159 CE1 HIS C 187 -35.258 16.422 -10.642 
1.00 87.86 C ATOM 6160 NE2 HIS C 187 -35.356 17.623 -11.174 1.00 86.86 N ATOM 6161 N TYR C 188 -40.842 18.259 -10.073 1.00 93.24 N ATOM 6162 CA TYR C 188 -41.792 18.125 -11.170 1.00 94.52 C ATOM 6163 C TYR C 188 -42.043 19.524 -11.722 1.00 93.43 C ATOM 6164 O TYR C 188 -41.693 19.837 -12.858 1.00 93.13 O ATOM 6165 CB TYR C 188 -43.098 17.537 -10.632 1.00 97.75 C ATOM 6166 CG TYR C 188 -44.121 17.192 -11.687 1.00101.26 C ATOM 6167 CD1 TYR C 188 -43.908 16.131 -12.560 1.00102.84 C ATOM 6168 CD2 TYR C 188 -45.303 17.922 -11.805 1.00103.00 C ATOM 6169 CE1 TYR C 188 -44.843 15.801 -13.527 1.00105.63 C ATOM 6170 CE2 TYR C 188 -46.251 17.601 -12.772 1.00105.61 C ATOM 6171 CZ TYR C 188 -46.014 16.539 -13.632 1.00106.52 C ATOM 6172 OH TYR C 188 -46.933 16.207 -14.604 1.00107.84 O ATOM 6173 N ILE C 189 -42.639 20.352 -10.875 1.00 91.82 N ATOM 6174 CA ILE C 189 -42.978 21.724 -11.192 1.00 89.96 C ATOM 6175 C ILE C 189 -41.868 22.511 -11.885 1.00 89.45 C ATOM 6176 O ILE C 189 -42.083 23.040 -12.971 1.00 90.11 O ATOM 6177 CB ILE C 189 -43.397 22.464 -9.916 1.00 89.42 C ATOM 6178 CG1 ILE C 189 -44.398 21.602 -9.144 1.00 89.61 C ATOM 6179 CG2 ILE C 189 -44.017 23.798 -10.268 1.00 88.40 C ATOM 6180 CD1 ILE C 189 -44.767 22.115 -7.754 1.00 90.14 C ATOM 6181 N ILE C 190 -40.683 22.593 -11.282 1.00 88.36 N ATOM 6182 CA ILE C 190 -39.589 23.349 -11.900 1.00 87.98 C ATOM 6183 C ILE C 190 -39.159 22.836 -13.274 1.00 87.97 C ATOM 6184 O ILE C 190 -38.780 23.623 -14.145 1.00 87.59 O ATOM 6185 CB ILE C 190 -38.332 23.398 -11.000 1.00 87.07 C ATOM 6186 CG1 ILE C 190 -38.038 22.009 -10.444 1.00 86.15 C ATOM 6187 CG2 ILE C 190 -38.519 24.420 -9.895 1.00 86.59 C ATOM 6188 CD1 ILE C 190 -36.833 21.976 -9.547 1.00 85.40 C ATOM 6189 N LYS C 191 -39.201 21.522 -13.464 1.00 88.65 N ATOM 6190 CA LYS C 191 -38.823 20.927 -14.743 1.00 88.53 C ATOM 6191 C LYS C 191 -39.913 21.242 -15.762 1.00 88.10 C ATOM 6192 O LYS C 191 -39.652 21.807 -16.829 1.00 88.34 O ATOM 6193 CB LYS C 191 -38.693 19.411 -14.604 
1.00 88.47 C ATOM 6194 CG LYS C 191 -37.735 18.968 -13.527 1.00 88.98 C ATOM 6195 CD LYS C 191 -36.335 19.509 -13.773 1.00 90.42 C ATOM 6196 CE LYS C 191 -35.702 18.905 -15.021 1.00 91.22 C ATOM 6197 NZ LYS C 191 -34.312 19.400 -15.257 1.00 91.30 N ATOM 6198 N THR C 192 -41.139 20.865 -15.399 1.00 87.39 N ATOM 6199 CA THR C 192 -42.321 21.066 -16.222 1.00 86.37 C ATOM 6200 C THR C 192 -42.460 22.503 -16.674 1.00 85.11 C ATOM 6201 O THR C 192 -42.466 22.783 -17.869 1.00 85.30 O ATOM 6202 CB THR C 192 -43.599 20.691 -15.454 1.00 86.84 C ATOM 6203 OG1 THR C 192 -43.662 19.268 -15.286 1.00 87.24 O ATOM 6204 CG2 THR C 192 -44.831 21.173 -16.208 1.00 87.53 C ATOM 6205 N LEU C 193 -42.582 23.405 -15.709 1.00 83.59 N ATOM 6206 CA LEU C 193 -42.736 24.819 -15.997 1.00 81.81 C ATOM 6207 C LEU C 193 -41.452 25.451 -16.474 1.00 81.47 C ATOM 6208 O LEU C 193 -41.378 26.667 -16.607 1.00 82.12 O ATOM 6209 CB LEU C 193 -43.219 25.567 -14.763 1.00 81.24 C ATOM 6210 CG LEU C 193 -44.638 25.267 -14.291 1.00 80.57 C ATOM 6211 CD1 LEU C 193 -44.763 23.807 -13.868 1.00 80.60 C ATOM 6212 CD2 LEU C 193 -44.965 26.195 -13.135 1.00 80.83 C ATOM 6213 N GLY C 194 -40.440 24.633 -16.729 1.00 80.95 N ATOM 6214 CA GLY C 194 -39.176 25.170 -17.196 1.00 80.89 C ATOM 6215 C GLY C 194 -38.723 26.401 -16.426 1.00 80.68 C ATOM 6216 O GLY C 194 -38.871 27.527 -16.896 1.00 79.93 O ATOM 6217 N ILE C 195 -38.167 26.177 -15.240 1.00 81.64 N ATOM 6218 CA ILE C 195 -37.675 27.245 -14.377 1.00 81.78 C ATOM 6219 C ILE C 195 -36.155 27.205 -14.336 1.00 82.29 C ATOM 6220 O ILE C 195 -35.579 26.181 -13.955 1.00 82.81 O ATOM 6221 CB ILE C 195 -38.193 27.065 -12.942 1.00 81.34 C ATOM 6222 CG1 ILE C 195 -39.684 27.371 -12.895 1.00 81.48 C ATOM 6223 CG2 ILE C 195 -37.399 27.938 -11.987 1.00 81.33 C ATOM 6224 CD1 ILE C 195 -40.334 27.023 -11.576 1.00 82.56 C ATOM 6225 N LYS C 196 -35.510 28.306 -14.723 1.00 82.10 N ATOM 6226 CA LYS C 196 -34.048 28.367 -14.710 1.00 82.86 C ATOM 6227 C LYS C 196 -33.552 28.357 -13.262 
1.00 83.44 C ATOM 6228 O LYS C 196 -33.980 29.172 -12.436 1.00 82.76 O ATOM 6229 CB LYS C 196 -33.550 29.620 -15.430 1.00 82.66 C ATOM 6230 CG LYS C 196 -32.041 29.767 -15.485 1.00 82.46 C ATOM 6231 CD LYS C 196 -31.378 28.724 -16.372 1.00 83.91 C ATOM 6232 CE LYS C 196 -29.900 29.091 -16.621 1.00 86.21 C ATOM 6233 NZ LYS C 196 -29.061 28.082 -17.364 1.00 85.46 N ATOM 6234 N TYR C 197 -32.656 27.416 -12.965 1.00 83.30 N ATOM 6235 CA TYR C 197 -32.108 27.253 -11.621 1.00 82.70 C ATOM 6236 C TYR C 197 -30.638 26.856 -11.653 1.00 82.58 C ATOM 6237 O TYR C 197 -30.065 26.617 -12.717 1.00 83.40 O ATOM 6238 CB TYR C 197 -32.900 26.178 -10.852 1.00 82.50 C ATOM 6239 CG TYR C 197 -32.889 24.799 -11.502 1.00 82.72 C ATOM 6240 CD1 TYR C 197 -31.681 24.162 -11.820 1.00 82.81 C ATOM 6241 CD2 TYR C 197 -34.085 24.132 -11.809 1.00 82.18 C ATOM 6242 CE1 TYR C 197 -31.655 22.905 -12.430 1.00 82.44 C ATOM 6243 CE2 TYR C 197 -34.073 22.863 -12.419 1.00 82.21 C ATOM 6244 CZ TYR C 197 -32.846 22.258 -12.731 1.00 82.80 C ATOM 6245 OH TYR C 197 -32.789 21.028 -13.364 1.00 80.90 O ATOM 6246 N PHE C 198 -30.036 26.772 -10.471 1.00 81.74 N ATOM 6247 CA PHE C 198 -28.636 26.387 -10.343 1.00 79.29 C ATOM 6248 C PHE C 198 -28.445 25.648 -9.036 1.00 75.97 C ATOM 6249 O PHE C 198 -28.090 26.252 -8.030 1.00 76.68 O ATOM 6250 CB PHE C 198 -27.718 27.611 -10.330 1.00 81.72 C ATOM 6251 CG PHE C 198 -27.609 28.313 -11.648 1.00 84.47 C ATOM 6252 CD1 PHE C 198 -28.608 29.182 -12.076 1.00 85.88 C ATOM 6253 CD2 PHE C 198 -26.481 28.134 -12.450 1.00 85.59 C ATOM 6254 CE1 PHE C 198 -28.482 29.869 -13.283 1.00 87.07 C ATOM 6255 CE2 PHE C 198 -26.345 28.816 -13.655 1.00 86.78 C ATOM 6256 CZ PHE C 198 -27.347 29.688 -14.074 1.00 86.99 C ATOM 6257 N SER C 199 -28.673 24.345 -9.039 1.00 71.87 N ATOM 6258 CA SER C 199 -28.496 23.594 -7.815 1.00 67.84 C ATOM 6259 C SER C 199 -27.020 23.591 -7.444 1.00 66.10 C ATOM 6260 O SER C 199 -26.157 23.893 -8.280 1.00 64.20 O ATOM 6261 CB SER C 199 -28.983 22.167 -7.993 
1.00 66.46 C ATOM 6262 OG SER C 199 -28.254 21.544 -9.020 1.00 65.39 O ATOM 6263 N MET C 200 -26.747 23.257 -6.183 1.00 64.09 N ATOM 6264 CA MET C 200 -25.391 23.195 -5.669 1.00 61.01 C ATOM 6265 C MET C 200 -24.500 22.517 -6.690 1.00 62.42 C ATOM 6266 O MET C 200 -23.349 22.918 -6.882 1.00 62.06 O ATOM 6267 CB MET C 200 -25.360 22.426 -4.348 1.00 56.33 C ATOM 6268 CG MET C 200 -25.975 23.176 -3.183 1.00 53.16 C ATOM 6269 SD MET C 200 -25.235 24.823 -2.909 1.00 46.37 S ATOM 6270 CE MET C 200 -23.569 24.389 -2.627 1.00 48.10 C ATOM 6271 N THR C 201 -25.038 21.499 -7.357 1.00 62.97 N ATOM 6272 CA THR C 201 -24.269 20.786 -8.366 1.00 64.97 C ATOM 6273 C THR C 201 -23.873 21.736 -9.512 1.00 66.10 C ATOM 6274 O THR C 201 -22.776 21.608 -10.092 1.00 65.58 O ATOM 6275 CB THR C 201 -25.049 19.537 -8.900 1.00 65.68 C ATOM 6276 OG1 THR C 201 -25.211 19.624 -10.323 1.00 65.66 O ATOM 6277 CG2 THR C 201 -26.412 19.421 -8.223 1.00 65.96 C ATOM 6278 N GLU C 202 -24.746 22.699 -9.821 1.00 66.73 N ATOM 6279 CA GLU C 202 -24.451 23.665 -10.881 1.00 67.80 C ATOM 6280 C GLU C 202 -23.439 24.647 -10.374 1.00 68.66 C ATOM 6281 O GLU C 202 -22.535 25.058 -11.108 1.00 70.18 O ATOM 6282 CB GLU C 202 -25.690 24.416 -11.315 1.00 67.72 C ATOM 6283 CG GLU C 202 -26.301 23.806 -12.546 1.00 72.08 C ATOM 6284 CD GLU C 202 -26.684 22.351 -12.333 1.00 74.23 C ATOM 6285 OE1 GLU C 202 -27.655 22.112 -11.574 1.00 73.29 O ATOM 6286 OE2 GLU C 202 -26.009 21.459 -12.916 1.00 75.62 O ATOM 6287 N VAL C 203 -23.606 25.031 -9.113 1.00 67.55 N ATOM 6288 CA VAL C 203 -22.674 25.943 -8.478 1.00 65.53 C ATOM 6289 C VAL C 203 -21.335 25.228 -8.657 1.00 64.91 C ATOM 6290 O VAL C 203 -20.383 25.771 -9.232 1.00 64.87 O ATOM 6291 CB VAL C 203 -22.962 26.076 -6.949 1.00 65.75 C ATOM 6292 CG1 VAL C 203 -22.137 27.210 -6.355 1.00 65.81 C ATOM 6293 CG2 VAL C 203 -24.442 26.299 -6.697 1.00 66.78 C ATOM 6294 N ASP C 204 -21.299 23.983 -8.175 1.00 62.79 N ATOM 6295 CA ASP C 204 -20.110 23.149 -8.228 1.00 61.49 C ATOM 6296 C 
ASP C 204 -19.480 23.109 -9.590 1.00 62.37 C ATOM 6297 O ASP C 204 -18.264 23.214 -9.715 1.00 61.65 O ATOM 6298 CB ASP C 204 -20.448 21.726 -7.799 1.00 59.70 C ATOM 6299 CG ASP C 204 -20.458 21.562 -6.300 1.00 56.79 C ATOM 6300 OD1 ASP C 204 -20.639 22.589 -5.614 1.00 52.71 O ATOM 6301 OD2 ASP C 204 -20.296 20.414 -5.820 1.00 54.02 O ATOM 6302 N LYS C 205 -20.312 22.956 -10.612 1.00 64.38 N ATOM 6303 CA LYS C 205 -19.814 22.877 -11.977 1.00 65.17 C ATOM 6304 C LYS C 205 -19.240 24.177 -12.527 1.00 64.26 C ATOM 6305 O LYS C 205 -18.069 24.242 -12.870 1.00 63.15 O ATOM 6306 CB LYS C 205 -20.913 22.381 -12.910 1.00 66.18 C ATOM 6307 CG LYS C 205 -20.408 22.116 -14.313 1.00 69.34 C ATOM 6308 CD LYS C 205 -21.515 21.591 -15.216 1.00 73.83 C ATOM 6309 CE LYS C 205 -21.001 21.166 -16.598 1.00 76.14 C ATOM 6310 NZ LYS C 205 -20.089 19.970 -16.569 1.00 77.46 N ATOM 6311 N LEU C 206 -20.060 25.215 -12.597 1.00 64.81 N ATOM 6312 CA LEU C 206 -19.633 26.497 -13.156 1.00 67.28 C ATOM 6313 C LEU C 206 -18.773 27.392 -12.274 1.00 68.71 C ATOM 6314 O LEU C 206 -17.830 28.026 -12.753 1.00 69.72 O ATOM 6315 CB LEU C 206 -20.859 27.309 -13.571 1.00 68.10 C ATOM 6316 CG LEU C 206 -21.969 26.530 -14.269 1.00 67.44 C ATOM 6317 CD1 LEU C 206 -23.191 27.428 -14.473 1.00 63.42 C ATOM 6318 CD2 LEU C 206 -21.420 25.983 -15.582 1.00 67.18 C ATOM 6319 N GLY C 207 -19.118 27.467 -10.993 1.00 69.83 N ATOM 6320 CA GLY C 207 -18.389 28.336 -10.091 1.00 70.69 C ATOM 6321 C GLY C 207 -19.233 29.587 -9.981 1.00 70.94 C ATOM 6322 O GLY C 207 -19.587 30.186 -10.991 1.00 70.79 O ATOM 6323 N ILE C 208 -19.566 29.968 -8.752 1.00 70.50 N ATOM 6324 CA ILE C 208 -20.410 31.126 -8.503 1.00 70.18 C ATOM 6325 C ILE C 208 -20.253 32.242 -9.545 1.00 72.13 C ATOM 6326 O ILE C 208 -21.249 32.826 -9.982 1.00 71.07 O ATOM 6327 CB ILE C 208 -20.178 31.674 -7.062 1.00 68.19 C ATOM 6328 CG1 ILE C 208 -21.506 31.750 -6.309 1.00 66.86 C ATOM 6329 CG2 ILE C 208 -19.471 33.009 -7.105 1.00 65.59 C ATOM 6330 CD1 ILE C 208 
-22.599 32.525 -6.997 1.00 63.60 C ATOM 6331 N GLY C 209 -19.020 32.522 -9.960 1.00 74.55 N ATOM 6332 CA GLY C 209 -18.802 33.570 -10.949 1.00 78.13 C ATOM 6333 C GLY C 209 -19.653 33.422 -12.205 1.00 79.78 C ATOM 6334 O GLY C 209 -20.438 34.311 -12.540 1.00 79.81 O ATOM 6335 N LYS C 210 -19.483 32.291 -12.892 1.00 81.12 N ATOM 6336 CA LYS C 210 -20.219 31.951 -14.112 1.00 81.72 C ATOM 6337 C LYS C 210 -21.711 31.813 -13.790 1.00 81.92 C ATOM 6338 O LYS C 210 -22.570 32.204 -14.580 1.00 82.06 O ATOM 6339 CB LYS C 210 -19.663 30.634 -14.679 1.00 83.49 C ATOM 6340 CG LYS C 210 -20.466 29.987 -15.800 1.00 86.80 C ATOM 6341 CD LYS C 210 -20.593 30.892 -17.025 1.00 89.98 C ATOM 6342 CE LYS C 210 -21.420 30.222 -18.137 1.00 90.59 C ATOM 6343 NZ LYS C 210 -21.859 31.173 -19.218 1.00 91.17 N ATOM 6344 N VAL C 211 -22.012 31.259 -12.622 1.00 81.88 N ATOM 6345 CA VAL C 211 -23.394 31.087 -12.200 1.00 81.43 C ATOM 6346 C VAL C 211 -24.116 32.432 -12.229 1.00 82.60 C ATOM 6347 O VAL C 211 -25.250 32.513 -12.697 1.00 82.44 O ATOM 6348 CB VAL C 211 -23.475 30.507 -10.768 1.00 80.75 C ATOM 6349 CG1 VAL C 211 -24.916 30.467 -10.297 1.00 79.53 C ATOM 6350 CG2 VAL C 211 -22.871 29.114 -10.736 1.00 80.56 C ATOM 6351 N MET C 212 -23.462 33.482 -11.726 1.00 83.85 N ATOM 6352 CA MET C 212 -24.055 34.823 -11.697 1.00 85.06 C ATOM 6353 C MET C 212 -24.041 35.432 -13.091 1.00 87.35 C ATOM 6354 O MET C 212 -25.034 35.993 -13.558 1.00 87.24 O ATOM 6355 CB MET C 212 -23.289 35.740 -10.739 1.00 82.29 C ATOM 6356 CG MET C 212 -23.374 35.336 -9.267 1.00 80.30 C ATOM 6357 SD MET C 212 -25.049 35.307 -8.562 1.00 73.97 S ATOM 6358 CE MET C 212 -25.276 36.996 -8.259 1.00 75.37 C ATOM 6359 N GLU C 213 -22.892 35.316 -13.743 1.00 90.13 N ATOM 6360 CA GLU C 213 -22.692 35.814 -15.093 1.00 92.16 C ATOM 6361 C GLU C 213 -23.855 35.328 -15.967 1.00 92.77 C ATOM 6362 O GLU C 213 -24.495 36.120 -16.664 1.00 94.09 O ATOM 6363 CB GLU C 213 -21.354 35.282 -15.606 1.00 94.04 C ATOM 6364 CG GLU C 213 -20.810 
35.920 -16.864 1.00 99.07 C ATOM 6365 CD GLU C 213 -19.400 35.423 -17.182 1.00102.30 C ATOM 6366 OE1 GLU C 213 -18.495 35.637 -16.341 1.00103.12 O ATOM 6367 OE2 GLU C 213 -19.196 34.819 -18.263 1.00104.92 O ATOM 6368 N GLU C 214 -24.136 34.028 -15.917 1.00 92.07 N ATOM 6369 CA GLU C 214 -25.230 33.459 -16.694 1.00 91.63 C ATOM 6370 C GLU C 214 -26.551 34.073 -16.264 1.00 91.32 C ATOM 6371 O GLU C 214 -27.105 34.908 -16.971 1.00 91.52 O ATOM 6372 CB GLU C 214 -25.300 31.940 -16.507 1.00 92.05 C ATOM 6373 CG GLU C 214 -24.434 31.151 -17.464 1.00 92.73 C ATOM 6374 CD GLU C 214 -24.632 29.652 -17.323 1.00 93.79 C ATOM 6375 OE1 GLU C 214 -25.781 29.178 -17.499 1.00 93.91 O ATOM 6376 OE2 GLU C 214 -23.635 28.952 -17.042 1.00 93.93 O ATOM 6377 N THR C 215 -27.044 33.651 -15.101 1.00 91.31 N ATOM 6378 CA THR C 215 -28.310 34.139 -14.547 1.00 90.84 C ATOM 6379 C THR C 215 -28.658 35.555 -14.973 1.00 90.63 C ATOM 6380 O THR C 215 -29.830 35.873 -15.189 1.00 89.71 O ATOM 6381 CB THR C 215 -28.305 34.119 -13.005 1.00 91.00 C ATOM 6382 OG1 THR C 215 -27.065 34.651 -12.524 1.00 91.70 O ATOM 6383 CG2 THR C 215 -28.482 32.726 -12.486 1.00 90.94 C ATOM 6384 N PHE C 216 -27.644 36.413 -15.077 1.00 90.43 N ATOM 6385 CA PHE C 216 -27.890 37.787 -15.481 1.00 90.26 C ATOM 6386 C PHE C 216 -28.342 37.853 -16.927 1.00 91.62 C ATOM 6387 O PHE C 216 -29.489 38.213 -17.197 1.00 92.03 O ATOM 6388 CB PHE C 216 -26.653 38.655 -15.246 1.00 86.72 C ATOM 6389 CG PHE C 216 -26.470 39.056 -13.808 1.00 83.81 C ATOM 6390 CD1 PHE C 216 -27.450 38.770 -12.860 1.00 82.90 C ATOM 6391 CD2 PHE C 216 -25.316 39.705 -13.395 1.00 82.83 C ATOM 6392 CE1 PHE C 216 -27.273 39.101 -11.520 1.00 82.23 C ATOM 6393 CE2 PHE C 216 -25.129 40.041 -12.053 1.00 82.56 C ATOM 6394 CZ PHE C 216 -26.113 39.745 -11.114 1.00 81.79 C ATOM 6395 N SER C 217 -27.464 37.502 -17.861 1.00 92.18 N ATOM 6396 CA SER C 217 -27.855 37.525 -19.262 1.00 92.47 C ATOM 6397 C SER C 217 -29.263 36.921 -19.394 1.00 92.71 C ATOM 6398 O SER C 217 
-30.170 37.568 -19.914 1.00 93.44 O ATOM 6399 CB SER C 217 -26.871 36.719 -20.106 1.00 92.57 C ATOM 6400 OG SER C 217 -27.079 35.329 -19.925 1.00 92.76 O ATOM 6401 N TYR C 218 -29.442 35.695 -18.907 1.00 92.44 N ATOM 6402 CA TYR C 218 -30.733 35.014 -18.972 1.00 93.21 C ATOM 6403 C TYR C 218 -31.929 35.880 -18.542 1.00 94.61 C ATOM 6404 O TYR C 218 -33.050 35.678 -19.015 1.00 93.90 O ATOM 6405 CB TYR C 218 -30.699 33.750 -18.108 1.00 92.46 C ATOM 6406 CG TYR C 218 -32.065 33.119 -17.915 1.00 92.79 C ATOM 6407 CD1 TYR C 218 -32.549 32.153 -18.800 1.00 93.36 C ATOM 6408 CD2 TYR C 218 -32.900 33.528 -16.874 1.00 91.95 C ATOM 6409 CE1 TYR C 218 -33.838 31.611 -18.655 1.00 92.39 C ATOM 6410 CE2 TYR C 218 -34.185 33.000 -16.719 1.00 92.17 C ATOM 6411 CZ TYR C 218 -34.652 32.041 -17.614 1.00 92.07 C ATOM 6412 OH TYR C 218 -35.934 31.535 -17.474 1.00 91.03 O ATOM 6413 N LEU C 219 -31.693 36.831 -17.643 1.00 96.79 N ATOM 6414 CA LEU C 219 -32.759 37.705 -17.145 1.00 99.53 C ATOM 6415 C LEU C 219 -32.714 39.143 -17.679 1.00102.10 C ATOM 6416 O LEU C 219 -33.749 39.730 -18.007 1.00102.34 O ATOM 6417 CB LEU C 219 -32.722 37.756 -15.612 1.00 98.47 C ATOM 6418 CG LEU C 219 -33.171 36.540 -14.807 1.00 97.30 C ATOM 6419 CD1 LEU C 219 -32.771 36.730 -13.364 1.00 97.75 C ATOM 6420 CD2 LEU C 219 -34.674 36.357 -14.917 1.00 96.22 C ATOM 6421 N LEU C 220 -31.514 39.706 -17.759 1.00104.81 N ATOM 6422 CA LEU C 220 -31.336 41.079 -18.223 1.00107.29 C ATOM 6423 C LEU C 220 -30.700 41.128 -19.616 1.00109.31 C ATOM 6424 O LEU C 220 -30.151 42.155 -20.020 1.00109.84 O ATOM 6425 CB LEU C 220 -30.457 41.848 -17.220 1.00106.72 C ATOM 6426 CG LEU C 220 -30.587 41.505 -15.724 1.00105.67 C ATOM 6427 CD1 LEU C 220 -29.577 42.321 -14.922 1.00104.74 C ATOM 6428 CD2 LEU C 220 -32.010 41.765 -15.236 1.00104.50 C ATOM 6429 N GLY C 221 -30.774 40.016 -20.344 1.00111.26 N ATOM 6430 CA GLY C 221 -30.191 39.962 -21.675 1.00113.38 C ATOM 6431 C GLY C 221 -30.854 40.927 -22.636 1.00114.74 C ATOM 6432 O GLY C 
221 -30.185 41.726 -23.290 1.00114.30 O ATOM 6433 N ARG C 222 -32.178 40.848 -22.717 1.00116.28 N ATOM 6434 CA ARG C 222 -32.952 41.716 -23.594 1.00117.84 C ATOM 6435 C ARG C 222 -33.260 43.051 -22.912 1.00117.49 C ATOM 6436 O ARG C 222 -32.542 44.042 -23.085 1.00117.04 O ATOM 6437 CB ARG C 222 -34.255 41.011 -23.993 1.00120.30 C ATOM 6438 CG ARG C 222 -34.057 39.846 -24.966 1.00124.08 C ATOM 6439 CD ARG C 222 -35.270 38.905 -25.026 1.00127.28 C ATOM 6440 NE ARG C 222 -35.425 38.106 -23.807 1.00129.65 N ATOM 6441 CZ ARG C 222 -36.229 37.049 -23.689 1.00130.39 C ATOM 6442 NH1 ARG C 222 -36.965 36.649 -24.720 1.00131.75 N ATOM 6443 NH2 ARG C 222 -36.296 36.383 -22.541 1.00130.10 N ATOM 6444 N LYS C 223 -34.330 43.073 -22.127 1.00117.21 N ATOM 6445 CA LYS C 223 -34.725 44.288 -21.427 1.00116.89 C ATOM 6446 C LYS C 223 -34.217 44.306 -19.985 1.00115.74 C ATOM 6447 O LYS C 223 -34.612 43.475 -19.164 1.00116.07 O ATOM 6448 CB LYS C 223 -36.255 44.424 -21.419 1.00118.12 C ATOM 6449 CG LYS C 223 -36.918 44.649 -22.781 1.00119.48 C ATOM 6450 CD LYS C 223 -36.709 46.068 -23.313 1.00120.20 C ATOM 6451 CE LYS C 223 -37.710 46.407 -24.424 1.00119.96 C ATOM 6452 NZ LYS C 223 -37.623 45.502 -25.607 1.00120.32 N ATOM 6453 N LYS C 224 -33.336 45.253 -19.684 1.00113.56 N ATOM 6454 CA LYS C 224 -32.815 45.403 -18.330 1.00110.71 C ATOM 6455 C LYS C 224 -33.990 45.810 -17.438 1.00108.17 C ATOM 6456 O LYS C 224 -34.618 46.842 -17.670 1.00107.92 O ATOM 6457 CB LYS C 224 -31.744 46.499 -18.285 1.00111.33 C ATOM 6458 CG LYS C 224 -30.398 46.103 -18.857 1.00111.67 C ATOM 6459 CD LYS C 224 -30.510 45.593 -20.280 1.00113.19 C ATOM 6460 CE LYS C 224 -29.144 45.207 -20.827 1.00114.15 C ATOM 6461 NZ LYS C 224 -29.244 44.548 -22.163 1.00114.36 N ATOM 6462 N ARG C 225 -34.300 44.999 -16.431 1.00104.97 N ATOM 6463 CA ARG C 225 -35.399 45.321 -15.522 1.00102.05 C ATOM 6464 C ARG C 225 -34.920 45.417 -14.071 1.00 99.65 C ATOM 6465 O ARG C 225 -33.719 45.333 -13.796 1.00100.15 O ATOM 6466 CB ARG C 225 
-36.530 44.281 -15.631 1.00101.03 C ATOM 6467 CG ARG C 225 -36.157 42.841 -15.310 1.00 99.54 C ATOM 6468 CD ARG C 225 -35.356 42.230 -16.441 1.00 99.50 C ATOM 6469 NE ARG C 225 -35.509 40.777 -16.528 1.00 98.90 N ATOM 6470 CZ ARG C 225 -36.671 40.156 -16.719 1.00 99.29 C ATOM 6471 NH1 ARG C 225 -37.795 40.853 -16.835 1.00 99.63 N ATOM 6472 NH2 ARG C 225 -36.709 38.834 -16.813 1.00 99.36 N ATOM 6473 N PRO C 226 -35.848 45.654 -13.130 1.00 96.61 N ATOM 6474 CA PRO C 226 -35.422 45.742 -11.730 1.00 93.91 C ATOM 6475 C PRO C 226 -35.277 44.338 -11.161 1.00 90.68 C ATOM 6476 O PRO C 226 -35.930 43.401 -11.631 1.00 90.41 O ATOM 6477 CB PRO C 226 -36.560 46.512 -11.068 1.00 94.90 C ATOM 6478 CG PRO C 226 -37.097 47.348 -12.199 1.00 96.02 C ATOM 6479 CD PRO C 226 -37.130 46.350 -13.317 1.00 95.93 C ATOM 6480 N ILE C 227 -34.414 44.204 -10.156 1.00 86.58 N ATOM 6481 CA ILE C 227 -34.157 42.917 -9.510 1.00 81.66 C ATOM 6482 C ILE C 227 -34.793 42.823 -8.130 1.00 78.57 C ATOM 6483 O ILE C 227 -34.899 43.817 -7.398 1.00 78.91 O ATOM 6484 CB ILE C 227 -32.615 42.642 -9.339 1.00 81.09 C ATOM 6485 CG1 ILE C 227 -32.050 41.947 -10.580 1.00 79.71 C ATOM 6486 CG2 ILE C 227 -32.354 41.783 -8.095 1.00 79.22 C ATOM 6487 CD1 ILE C 227 -31.840 42.853 -11.767 1.00 79.96 C ATOM 6488 N HIS C 228 -35.230 41.619 -7.792 1.00 73.48 N ATOM 6489 CA HIS C 228 -35.780 41.374 -6.486 1.00 69.84 C ATOM 6490 C HIS C 228 -35.026 40.163 -5.955 1.00 67.82 C ATOM 6491 O HIS C 228 -35.321 39.015 -6.308 1.00 66.60 O ATOM 6492 CB HIS C 228 -37.262 41.058 -6.543 1.00 70.56 C ATOM 6493 CG HIS C 228 -37.871 40.871 -5.191 1.00 73.54 C ATOM 6494 ND1 HIS C 228 -38.117 41.927 -4.334 1.00 75.85 N ATOM 6495 CD2 HIS C 228 -38.223 39.753 -4.514 1.00 74.60 C ATOM 6496 CE1 HIS C 228 -38.593 41.466 -3.191 1.00 75.11 C ATOM 6497 NE2 HIS C 228 -38.665 40.149 -3.272 1.00 76.27 N ATOM 6498 N LEU C 229 -34.023 40.423 -5.130 1.00 65.08 N ATOM 6499 CA LEU C 229 -33.239 39.348 -4.550 1.00 62.32 C ATOM 6500 C LEU C 229 -33.881 
38.874 -3.250 1.00 61.53 C ATOM 6501 O LEU C 229 -33.808 39.555 -2.226 1.00 62.14 O ATOM 6502 CB LEU C 229 -31.823 39.824 -4.252 1.00 59.30 C ATOM 6503 CG LEU C 229 -31.000 38.800 -3.468 1.00 57.40 C ATOM 6504 CD1 LEU C 229 -30.705 37.596 -4.350 1.00 54.44 C ATOM 6505 CD2 LEU C 229 -29.729 39.457 -2.985 1.00 56.40 C ATOM 6506 N SER C 230 -34.525 37.717 -3.292 1.00 59.36 N ATOM 6507 CA SER C 230 -35.142 37.170 -2.095 1.00 58.06 C ATOM 6508 C SER C 230 -34.066 36.226 -1.550 1.00 57.34 C ATOM 6509 O SER C 230 -33.871 35.110 -2.050 1.00 57.15 O ATOM 6510 CB SER C 230 -36.428 36.406 -2.456 1.00 58.58 C ATOM 6511 OG SER C 230 -37.350 36.342 -1.375 1.00 57.12 O ATOM 6512 N PHE C 231 -33.341 36.688 -0.539 1.00 55.66 N ATOM 6513 CA PHE C 231 -32.279 35.875 0.034 1.00 54.53 C ATOM 6514 C PHE C 231 -32.714 35.142 1.284 1.00 53.57 C ATOM 6515 O PHE C 231 -33.201 35.738 2.246 1.00 52.54 O ATOM 6516 CB PHE C 231 -31.055 36.732 0.356 1.00 52.60 C ATOM 6517 CG PHE C 231 -29.780 35.958 0.390 1.00 46.80 C ATOM 6518 CD1 PHE C 231 -29.671 34.821 1.167 1.00 44.29 C ATOM 6519 CD2 PHE C 231 -28.684 36.384 -0.350 1.00 46.49 C ATOM 6520 CE1 PHE C 231 -28.499 34.127 1.216 1.00 44.81 C ATOM 6521 CE2 PHE C 231 -27.498 35.700 -0.314 1.00 46.31 C ATOM 6522 CZ PHE C 231 -27.400 34.561 0.473 1.00 47.96 C ATOM 6523 N ASP C 232 -32.504 33.833 1.261 1.00 54.56 N ATOM 6524 CA ASP C 232 -32.871 32.977 2.376 1.00 54.94 C ATOM 6525 C ASP C 232 -31.653 32.517 3.130 1.00 54.45 C ATOM 6526 O ASP C 232 -31.050 31.520 2.776 1.00 53.16 O ATOM 6527 CB ASP C 232 -33.636 31.762 1.878 1.00 53.83 C ATOM 6528 CG ASP C 232 -33.945 30.791 2.974 1.00 54.18 C ATOM 6529 OD1 ASP C 232 -33.542 31.028 4.134 1.00 51.22 O ATOM 6530 OD2 ASP C 232 -34.590 29.768 2.670 1.00 54.39 O ATOM 6531 N VAL C 233 -31.317 33.250 4.184 1.00 55.37 N ATOM 6532 CA VAL C 233 -30.185 32.955 5.041 1.00 56.07 C ATOM 6533 C VAL C 233 -29.810 31.474 5.117 1.00 56.17 C ATOM 6534 O VAL C 233 -28.661 31.140 5.405 1.00 57.17 O ATOM 6535 CB VAL C 233 
-30.462 33.431 6.456 1.00 58.18 C ATOM 6536 CG1 VAL C 233 -31.519 32.520 7.107 1.00 57.95 C ATOM 6537 CG2 VAL C 233 -29.182 33.432 7.251 1.00 59.10 C ATOM 6538 N ASP C 234 -30.767 30.579 4.877 1.00 55.85 N ATOM 6539 CA ASP C 234 -30.459 29.143 4.915 1.00 55.47 C ATOM 6540 C ASP C 234 -29.741 28.589 3.670 1.00 52.10 C ATOM 6541 O ASP C 234 -29.200 27.504 3.713 1.00 52.05 O ATOM 6542 CB ASP C 234 -31.735 28.316 5.194 1.00 59.03 C ATOM 6543 CG ASP C 234 -32.639 28.157 3.969 1.00 63.96 C ATOM 6544 OD1 ASP C 234 -32.131 27.871 2.850 1.00 65.02 O ATOM 6545 OD2 ASP C 234 -33.877 28.288 4.144 1.00 66.85 O ATOM 6546 N GLY C 235 -29.748 29.327 2.569 1.00 50.06 N ATOM 6547 CA GLY C 235 -29.084 28.871 1.358 1.00 49.84 C ATOM 6548 C GLY C 235 -27.556 28.830 1.435 1.00 49.59 C ATOM 6549 O GLY C 235 -26.866 28.466 0.468 1.00 47.09 O ATOM 6550 N LEU C 236 -27.019 29.247 2.576 1.00 50.09 N ATOM 6551 CA LEU C 236 -25.590 29.207 2.786 1.00 50.76 C ATOM 6552 C LEU C 236 -25.393 27.897 3.511 1.00 51.57 C ATOM 6553 O LEU C 236 -26.371 27.259 3.908 1.00 52.88 O ATOM 6554 CB LEU C 236 -25.157 30.367 3.666 1.00 52.43 C ATOM 6555 CG LEU C 236 -24.391 31.462 2.929 1.00 54.89 C ATOM 6556 CD1 LEU C 236 -25.070 31.761 1.603 1.00 55.00 C ATOM 6557 CD2 LEU C 236 -24.323 32.703 3.805 1.00 55.89 C ATOM 6558 N ASP C 237 -24.156 27.463 3.689 1.00 52.93 N ATOM 6559 CA ASP C 237 -23.966 26.206 4.402 1.00 57.18 C ATOM 6560 C ASP C 237 -24.269 26.377 5.901 1.00 58.88 C ATOM 6561 O ASP C 237 -24.085 27.466 6.460 1.00 59.95 O ATOM 6562 CB ASP C 237 -22.550 25.681 4.222 1.00 58.63 C ATOM 6563 CG ASP C 237 -22.377 24.302 4.820 1.00 60.03 C ATOM 6564 OD1 ASP C 237 -22.299 24.206 6.068 1.00 60.59 O ATOM 6565 OD2 ASP C 237 -22.354 23.321 4.040 1.00 60.31 O ATOM 6566 N PRO C 238 -24.745 25.307 6.572 1.00 57.68 N ATOM 6567 CA PRO C 238 -25.045 25.446 7.996 1.00 56.57 C ATOM 6568 C PRO C 238 -23.903 25.940 8.886 1.00 56.08 C ATOM 6569 O PRO C 238 -24.151 26.513 9.954 1.00 56.48 O ATOM 6570 CB PRO C 238 -25.546 
24.055 8.368 1.00 57.23 C ATOM 6571 CG PRO C 238 -26.308 23.673 7.142 1.00 57.07 C ATOM 6572 CD PRO C 238 -25.296 24.039 6.063 1.00 57.93 C ATOM 6573 N VAL C 239 -22.660 25.744 8.450 1.00 53.68 N ATOM 6574 CA VAL C 239 -21.507 26.195 9.233 1.00 52.34 C ATOM 6575 C VAL C 239 -21.462 27.734 9.297 1.00 52.10 C ATOM 6576 O VAL C 239 -20.932 28.325 10.241 1.00 50.40 O ATOM 6577 CB VAL C 239 -20.189 25.685 8.607 1.00 51.91 C ATOM 6578 CG1 VAL C 239 -19.608 26.724 7.675 1.00 51.39 C ATOM 6579 CG2 VAL C 239 -19.203 25.328 9.691 1.00 51.86 C ATOM 6580 N PHE C 240 -22.045 28.368 8.286 1.00 52.61 N ATOM 6581 CA PHE C 240 -22.053 29.819 8.181 1.00 51.13 C ATOM 6582 C PHE C 240 -23.275 30.490 8.777 1.00 50.55 C ATOM 6583 O PHE C 240 -23.150 31.498 9.462 1.00 52.43 O ATOM 6584 CB PHE C 240 -21.921 30.217 6.707 1.00 49.52 C ATOM 6585 CG PHE C 240 -20.636 29.752 6.055 1.00 46.38 C ATOM 6586 CD1 PHE C 240 -19.432 30.403 6.319 1.00 43.52 C ATOM 6587 CD2 PHE C 240 -20.638 28.684 5.162 1.00 44.12 C ATOM 6588 CE1 PHE C 240 -18.272 30.016 5.702 1.00 40.93 C ATOM 6589 CE2 PHE C 240 -19.472 28.291 4.541 1.00 41.80 C ATOM 6590 CZ PHE C 240 -18.287 28.956 4.812 1.00 40.90 C ATOM 6591 N THR C 241 -24.451 29.938 8.521 1.00 49.79 N ATOM 6592 CA THR C 241 -25.688 30.520 9.023 1.00 50.83 C ATOM 6593 C THR C 241 -26.468 29.475 9.782 1.00 52.55 C ATOM 6594 O THR C 241 -27.590 29.128 9.407 1.00 51.99 O ATOM 6595 CB THR C 241 -26.566 31.003 7.871 1.00 51.65 C ATOM 6596 OG1 THR C 241 -26.785 29.915 6.952 1.00 53.02 O ATOM 6597 CG2 THR C 241 -25.898 32.162 7.153 1.00 50.87 C ATOM 6598 N PRO C 242 -25.884 28.948 10.863 1.00 54.97 N ATOM 6599 CA PRO C 242 -26.537 27.919 11.679 1.00 57.22 C ATOM 6600 C PRO C 242 -27.917 28.210 12.280 1.00 56.87 C ATOM 6601 O PRO C 242 -28.754 27.315 12.327 1.00 57.12 O ATOM 6602 CB PRO C 242 -25.474 27.591 12.736 1.00 57.28 C ATOM 6603 CG PRO C 242 -24.566 28.800 12.734 1.00 57.06 C ATOM 6604 CD PRO C 242 -24.487 29.138 11.283 1.00 56.17 C ATOM 6605 N ALA C 243 -28.157 
29.440 12.731 1.00 58.31 N ATOM 6606 CA ALA C 243 -29.450 29.798 13.329 1.00 58.19 C ATOM 6607 C ALA C 243 -30.590 29.888 12.317 1.00 58.24 C ATOM 6608 O ALA C 243 -30.996 30.992 11.948 1.00 57.68 O ATOM 6609 CB ALA C 243 -29.331 31.128 14.089 1.00 56.37 C ATOM 6610 N THR C 244 -31.102 28.733 11.883 1.00 59.71 N ATOM 6611 CA THR C 244 -32.203 28.669 10.909 1.00 62.49 C ATOM 6612 C THR C 244 -33.234 27.584 11.238 1.00 64.13 C ATOM 6613 O THR C 244 -33.086 26.831 12.205 1.00 66.01 O ATOM 6614 CB THR C 244 -31.711 28.354 9.471 1.00 60.93 C ATOM 6615 OG1 THR C 244 -30.455 28.992 9.228 1.00 62.57 O ATOM 6616 CG2 THR C 244 -32.715 28.875 8.454 1.00 62.79 C ATOM 6617 N GLY C 245 -34.281 27.517 10.417 1.00 65.60 N ATOM 6618 CA GLY C 245 -35.311 26.510 10.589 1.00 65.27 C ATOM 6619 C GLY C 245 -34.765 25.225 10.013 1.00 65.53 C ATOM 6620 O GLY C 245 -34.442 24.308 10.752 1.00 65.18 O ATOM 6621 N THR C 246 -34.626 25.166 8.693 1.00 67.51 N ATOM 6622 CA THR C 246 -34.096 23.966 8.052 1.00 69.03 C ATOM 6623 C THR C 246 -32.652 24.083 7.599 1.00 70.69 C ATOM 6624 O THR C 246 -32.367 24.852 6.679 1.00 72.99 O ATOM 6625 CB THR C 246 -34.805 23.596 6.749 1.00 68.81 C ATOM 6626 OG1 THR C 246 -36.101 24.199 6.687 1.00 70.16 O ATOM 6627 CG2 THR C 246 -34.876 22.078 6.625 1.00 67.59 C ATOM 6628 N PRO C 247 -31.714 23.357 8.240 1.00 69.84 N ATOM 6629 CA PRO C 247 -30.362 23.519 7.699 1.00 67.28 C ATOM 6630 C PRO C 247 -30.208 22.366 6.675 1.00 65.99 C ATOM 6631 O PRO C 247 -30.905 21.343 6.771 1.00 64.86 O ATOM 6632 CB PRO C 247 -29.472 23.366 8.935 1.00 66.52 C ATOM 6633 CG PRO C 247 -30.213 22.369 9.749 1.00 67.42 C ATOM 6634 CD PRO C 247 -31.669 22.810 9.613 1.00 69.14 C ATOM 6635 N VAL C 248 -29.344 22.545 5.678 1.00 64.07 N ATOM 6636 CA VAL C 248 -29.111 21.507 4.663 1.00 60.46 C ATOM 6637 C VAL C 248 -27.613 21.554 4.290 1.00 58.21 C ATOM 6638 O VAL C 248 -27.157 22.490 3.639 1.00 58.79 O ATOM 6639 CB VAL C 248 -30.008 21.745 3.410 1.00 59.13 C ATOM 6640 CG1 VAL C 248 -29.900 
20.591 2.477 1.00 59.22 C ATOM 6641 CG2 VAL C 248 -31.453 21.878 3.816 1.00 60.23 C ATOM 6642 N VAL C 249 -26.841 20.556 4.715 1.00 54.05 N ATOM 6643 CA VAL C 249 -25.405 20.568 4.439 1.00 51.52 C ATOM 6644 C VAL C 249 -25.052 20.723 2.976 1.00 49.68 C ATOM 6645 O VAL C 249 -25.874 20.450 2.103 1.00 50.93 O ATOM 6646 CB VAL C 249 -24.745 19.292 4.916 1.00 50.57 C ATOM 6647 CG1 VAL C 249 -24.821 19.218 6.410 1.00 50.72 C ATOM 6648 CG2 VAL C 249 -25.423 18.108 4.277 1.00 48.35 C ATOM 6649 N GLY C 250 -23.823 21.150 2.709 1.00 45.28 N ATOM 6650 CA GLY C 250 -23.385 21.299 1.333 1.00 44.16 C ATOM 6651 C GLY C 250 -24.027 22.421 0.528 1.00 45.03 C ATOM 6652 O GLY C 250 -24.466 22.222 -0.621 1.00 43.44 O ATOM 6653 N GLY C 251 -24.059 23.606 1.139 1.00 45.14 N ATOM 6654 CA GLY C 251 -24.626 24.777 0.506 1.00 44.15 C ATOM 6655 C GLY C 251 -23.616 25.890 0.275 1.00 44.72 C ATOM 6656 O GLY C 251 -22.416 25.736 0.530 1.00 45.93 O ATOM 6657 N LEU C 252 -24.123 27.013 -0.224 1.00 43.77 N ATOM 6658 CA LEU C 252 -23.339 28.194 -0.529 1.00 44.97 C ATOM 6659 C LEU C 252 -22.387 28.620 0.588 1.00 47.83 C ATOM 6660 O LEU C 252 -22.727 28.585 1.789 1.00 51.38 O ATOM 6661 CB LEU C 252 -24.296 29.344 -0.862 1.00 43.84 C ATOM 6662 CG LEU C 252 -25.121 29.072 -2.118 1.00 43.55 C ATOM 6663 CD1 LEU C 252 -26.404 29.887 -2.138 1.00 41.47 C ATOM 6664 CD2 LEU C 252 -24.255 29.354 -3.324 1.00 41.53 C ATOM 6665 N SER C 253 -21.191 29.040 0.206 1.00 46.62 N ATOM 6666 CA SER C 253 -20.233 29.480 1.201 1.00 46.70 C ATOM 6667 C SER C 253 -20.447 30.960 1.475 1.00 47.40 C ATOM 6668 O SER C 253 -21.067 31.655 0.668 1.00 48.44 O ATOM 6669 CB SER C 253 -18.846 29.265 0.659 1.00 45.90 C ATOM 6670 OG SER C 253 -18.822 29.700 -0.685 1.00 46.47 O ATOM 6671 N TYR C 254 -19.947 31.436 2.613 1.00 47.39 N ATOM 6672 CA TYR C 254 -20.057 32.846 2.966 1.00 47.44 C ATOM 6673 C TYR C 254 -19.560 33.548 1.716 1.00 49.11 C ATOM 6674 O TYR C 254 -20.295 34.329 1.111 1.00 49.44 O ATOM 6675 CB TYR C 254 -19.145 33.168 
4.150 1.00 46.22 C ATOM 6676 CG TYR C 254 -19.179 34.605 4.666 1.00 47.21 C ATOM 6677 CD1 TYR C 254 -20.296 35.112 5.323 1.00 48.90 C ATOM 6678 CD2 TYR C 254 -18.050 35.420 4.587 1.00 47.33 C ATOM 6679 CE1 TYR C 254 -20.288 36.394 5.903 1.00 49.41 C ATOM 6680 CE2 TYR C 254 -18.022 36.697 5.156 1.00 49.47 C ATOM 6681 CZ TYR C 254 -19.143 37.180 5.817 1.00 51.72 C ATOM 6682 OH TYR C 254 -19.100 38.432 6.416 1.00 53.13 O ATOM 6683 N ARG C 255 -18.323 33.233 1.317 1.00 49.81 N ATOM 6684 CA ARG C 255 -17.700 33.797 0.115 1.00 50.99 C ATOM 6685 C ARG C 255 -18.642 33.784 -1.097 1.00 52.53 C ATOM 6686 O ARG C 255 -18.949 34.841 -1.669 1.00 53.55 O ATOM 6687 CB ARG C 255 -16.419 33.022 -0.230 1.00 51.35 C ATOM 6688 CG ARG C 255 -15.178 33.516 0.497 1.00 54.18 C ATOM 6689 CD ARG C 255 -13.975 32.584 0.349 1.00 55.20 C ATOM 6690 NE ARG C 255 -13.495 32.508 -1.029 1.00 59.63 N ATOM 6691 CZ ARG C 255 -12.489 31.728 -1.435 1.00 62.22 C ATOM 6692 NH1 ARG C 255 -11.848 30.952 -0.558 1.00 63.52 N ATOM 6693 NH2 ARG C 255 -12.129 31.712 -2.720 1.00 60.66 N ATOM 6694 N GLU C 256 -19.093 32.596 -1.493 1.00 52.25 N ATOM 6695 CA GLU C 256 -20.004 32.490 -2.623 1.00 54.48 C ATOM 6696 C GLU C 256 -21.251 33.362 -2.433 1.00 55.70 C ATOM 6697 O GLU C 256 -21.761 33.952 -3.391 1.00 56.74 O ATOM 6698 CB GLU C 256 -20.449 31.044 -2.829 1.00 55.00 C ATOM 6699 CG GLU C 256 -19.366 30.104 -3.297 1.00 55.21 C ATOM 6700 CD GLU C 256 -19.900 28.719 -3.533 1.00 56.50 C ATOM 6701 OE1 GLU C 256 -20.668 28.242 -2.664 1.00 57.48 O ATOM 6702 OE2 GLU C 256 -19.553 28.112 -4.576 1.00 57.60 O ATOM 6703 N GLY C 257 -21.741 33.433 -1.198 1.00 55.23 N ATOM 6704 CA GLY C 257 -22.925 34.227 -0.918 1.00 54.94 C ATOM 6705 C GLY C 257 -22.643 35.709 -1.005 1.00 54.60 C ATOM 6706 O GLY C 257 -23.480 36.507 -1.440 1.00 52.82 O ATOM 6707 N LEU C 258 -21.454 36.080 -0.559 1.00 54.63 N ATOM 6708 CA LEU C 258 -21.064 37.459 -0.622 1.00 55.19 C ATOM 6709 C LEU C 258 -20.738 37.756 -2.068 1.00 56.88 C ATOM 6710 O LEU C 258 
-20.847 38.903 -2.508 1.00 58.61 O ATOM 6711 CB LEU C 258 -19.858 37.735 0.271 1.00 53.48 C ATOM 6712 CG LEU C 258 -20.237 37.897 1.740 1.00 51.43 C ATOM 6713 CD1 LEU C 258 -19.076 38.466 2.530 1.00 50.84 C ATOM 6714 CD2 LEU C 258 -21.408 38.840 1.830 1.00 52.56 C ATOM 6715 N TYR C 259 -20.358 36.741 -2.835 1.00 56.63 N ATOM 6716 CA TYR C 259 -20.067 37.035 -4.225 1.00 58.61 C ATOM 6717 C TYR C 259 -21.350 37.378 -4.965 1.00 59.71 C ATOM 6718 O TYR C 259 -21.368 38.273 -5.816 1.00 61.48 O ATOM 6719 CB TYR C 259 -19.405 35.879 -4.937 1.00 58.79 C ATOM 6720 CG TYR C 259 -18.909 36.300 -6.303 1.00 59.56 C ATOM 6721 CD1 TYR C 259 -17.621 36.831 -6.461 1.00 58.31 C ATOM 6722 CD2 TYR C 259 -19.719 36.149 -7.441 1.00 57.95 C ATOM 6723 CE1 TYR C 259 -17.146 37.188 -7.723 1.00 60.55 C ATOM 6724 CE2 TYR C 259 -19.262 36.501 -8.706 1.00 58.97 C ATOM 6725 CZ TYR C 259 -17.971 37.015 -8.844 1.00 61.35 C ATOM 6726 OH TYR C 259 -17.483 37.316 -10.094 1.00 61.49 O ATOM 6727 N ILE C 260 -22.423 36.662 -4.643 1.00 59.36 N ATOM 6728 CA ILE C 260 -23.710 36.911 -5.273 1.00 59.47 C ATOM 6729 C ILE C 260 -24.158 38.355 -5.023 1.00 59.59 C ATOM 6730 O ILE C 260 -24.182 39.165 -5.949 1.00 57.54 O ATOM 6731 CB ILE C 260 -24.773 35.916 -4.753 1.00 60.43 C ATOM 6732 CG1 ILE C 260 -24.457 34.504 -5.270 1.00 60.99 C ATOM 6733 CG2 ILE C 260 -26.157 36.335 -5.204 1.00 61.05 C ATOM 6734 CD1 ILE C 260 -25.410 33.442 -4.777 1.00 60.66 C ATOM 6735 N THR C 261 -24.500 38.687 -3.780 1.00 61.59 N ATOM 6736 CA THR C 261 -24.933 40.052 -3.457 1.00 63.51 C ATOM 6737 C THR C 261 -23.986 41.088 -4.085 1.00 67.33 C ATOM 6738 O THR C 261 -24.413 42.197 -4.450 1.00 68.11 O ATOM 6739 CB THR C 261 -24.953 40.306 -1.940 1.00 60.27 C ATOM 6740 OG1 THR C 261 -23.666 39.996 -1.413 1.00 57.46 O ATOM 6741 CG2 THR C 261 -26.011 39.464 -1.239 1.00 57.45 C ATOM 6742 N GLU C 262 -22.704 40.728 -4.197 1.00 69.10 N ATOM 6743 CA GLU C 262 -21.704 41.617 -4.789 1.00 71.64 C ATOM 6744 C GLU C 262 -21.929 41.868 -6.285 1.00 
74.49 C ATOM 6745 O GLU C 262 -21.887 43.013 -6.739 1.00 73.82 O ATOM 6746 CB GLU C 262 -20.294 41.067 -4.569 1.00 70.03 C ATOM 6747 CG GLU C 262 -19.766 41.300 -3.170 1.00 66.23 C ATOM 6748 CD GLU C 262 -18.262 41.124 -3.082 1.00 64.16 C ATOM 6749 OE1 GLU C 262 -17.713 40.263 -3.810 1.00 61.75 O ATOM 6750 OE2 GLU C 262 -17.636 41.842 -2.272 1.00 61.96 O ATOM 6751 N GLU C 263 -22.154 40.803 -7.051 1.00 77.10 N ATOM 6752 CA GLU C 263 -22.417 40.954 -8.480 1.00 80.64 C ATOM 6753 C GLU C 263 -23.804 41.564 -8.695 1.00 82.90 C ATOM 6754 O GLU C 263 -24.069 42.186 -9.729 1.00 83.65 O ATOM 6755 CB GLU C 263 -22.333 39.609 -9.186 1.00 81.10 C ATOM 6756 CG GLU C 263 -20.915 39.211 -9.478 1.00 85.33 C ATOM 6757 CD GLU C 263 -20.176 40.290 -10.260 1.00 87.53 C ATOM 6758 OE1 GLU C 263 -20.627 40.603 -11.384 1.00 89.71 O ATOM 6759 OE2 GLU C 263 -19.157 40.825 -9.754 1.00 87.81 O ATOM 6760 N ILE C 264 -24.683 41.376 -7.714 1.00 83.91 N ATOM 6761 CA ILE C 264 -26.026 41.929 -7.764 1.00 86.02 C ATOM 6762 C ILE C 264 -25.895 43.448 -7.712 1.00 87.43 C ATOM 6763 O ILE C 264 -26.427 44.182 -8.556 1.00 86.72 O ATOM 6764 CB ILE C 264 -26.872 41.455 -6.545 1.00 86.67 C ATOM 6765 CG1 ILE C 264 -27.355 40.020 -6.761 1.00 87.39 C ATOM 6766 CG2 ILE C 264 -28.037 42.414 -6.299 1.00 86.51 C ATOM 6767 CD1 ILE C 264 -28.188 39.811 -8.013 1.00 86.98 C ATOM 6768 N TYR C 265 -25.172 43.903 -6.696 1.00 88.88 N ATOM 6769 CA TYR C 265 -24.938 45.318 -6.487 1.00 89.33 C ATOM 6770 C TYR C 265 -24.290 45.962 -7.712 1.00 89.27 C ATOM 6771 O TYR C 265 -24.756 46.984 -8.204 1.00 89.51 O ATOM 6772 CB TYR C 265 -24.041 45.517 -5.266 1.00 89.38 C ATOM 6773 CG TYR C 265 -23.490 46.900 -5.189 1.00 89.34 C ATOM 6774 CD1 TYR C 265 -24.316 47.972 -4.886 1.00 90.14 C ATOM 6775 CD2 TYR C 265 -22.162 47.155 -5.523 1.00 90.59 C ATOM 6776 CE1 TYR C 265 -23.840 49.280 -4.924 1.00 92.33 C ATOM 6777 CE2 TYR C 265 -21.670 48.457 -5.565 1.00 92.63 C ATOM 6778 CZ TYR C 265 -22.517 49.519 -5.266 1.00 92.95 C ATOM 6779 OH TYR 
C 265 -22.043 50.815 -5.327 1.00 93.11 O ATOM 6780 N LYS C 266 -23.214 45.355 -8.199 1.00 89.51 N ATOM 6781 CA LYS C 266 -22.484 45.863 -9.356 1.00 90.13 C ATOM 6782 C LYS C 266 -23.317 46.043 -10.630 1.00 89.83 C ATOM 6783 O LYS C 266 -22.786 46.429 -11.674 1.00 89.92 O ATOM 6784 CB LYS C 266 -21.284 44.950 -9.643 1.00 91.98 C ATOM 6785 CG LYS C 266 -20.165 45.055 -8.607 1.00 93.52 C ATOM 6786 CD LYS C 266 -19.118 43.946 -8.751 1.00 94.83 C ATOM 6787 CE LYS C 266 -18.360 44.007 -10.080 1.00 95.81 C ATOM 6788 NZ LYS C 266 -19.139 43.494 -11.253 1.00 96.25 N ATOM 6789 N THR C 267 -24.615 45.774 -10.553 1.00 89.07 N ATOM 6790 CA THR C 267 -25.481 45.925 -11.720 1.00 87.97 C ATOM 6791 C THR C 267 -26.494 47.050 -11.506 1.00 88.38 C ATOM 6792 O THR C 267 -27.320 47.344 -12.375 1.00 88.04 O ATOM 6793 CB THR C 267 -26.230 44.630 -12.007 1.00 86.90 C ATOM 6794 OG1 THR C 267 -27.202 44.405 -10.976 1.00 86.88 O ATOM 6795 CG2 THR C 267 -25.249 43.474 -12.058 1.00 84.42 C ATOM 6796 N GLY C 268 -26.417 47.667 -10.333 1.00 88.84 N ATOM 6797 CA GLY C 268 -27.304 48.761 -9.990 1.00 88.87 C ATOM 6798 C GLY C 268 -28.784 48.500 -10.201 1.00 89.07 C ATOM 6799 O GLY C 268 -29.596 49.419 -10.092 1.00 90.27 O ATOM 6800 N LEU C 269 -29.156 47.258 -10.478 1.00 87.70 N ATOM 6801 CA LEU C 269 -30.558 46.957 -10.711 1.00 87.22 C ATOM 6802 C LEU C 269 -31.323 46.402 -9.520 1.00 86.80 C ATOM 6803 O LEU C 269 -32.388 45.802 -9.696 1.00 84.64 O ATOM 6804 CB LEU C 269 -30.686 45.984 -11.876 1.00 88.82 C ATOM 6805 CG LEU C 269 -30.217 46.494 -13.233 1.00 89.57 C ATOM 6806 CD1 LEU C 269 -30.594 45.463 -14.287 1.00 89.71 C ATOM 6807 CD2 LEU C 269 -30.871 47.840 -13.548 1.00 89.67 C ATOM 6808 N LEU C 270 -30.797 46.600 -8.312 1.00 87.46 N ATOM 6809 CA LEU C 270 -31.467 46.084 -7.118 1.00 88.00 C ATOM 6810 C LEU C 270 -32.659 46.948 -6.759 1.00 87.09 C ATOM 6811 O LEU C 270 -32.543 48.168 -6.644 1.00 87.38 O ATOM 6812 CB LEU C 270 -30.519 46.021 -5.912 1.00 88.45 C ATOM 6813 CG LEU C 270 -31.179 
45.429 -4.650 1.00 88.79 C ATOM 6814 CD1 LEU C 270 -31.360 43.923 -4.821 1.00 86.57 C ATOM 6815 CD2 LEU C 270 -30.334 45.732 -3.420 1.00 88.64 C ATOM 6816 N SER C 271 -33.806 46.311 -6.570 1.00 85.87 N ATOM 6817 CA SER C 271 -35.001 47.055 -6.236 1.00 85.51 C ATOM 6818 C SER C 271 -35.610 46.565 -4.918 1.00 83.98 C ATOM 6819 O SER C 271 -35.984 47.371 -4.057 1.00 83.61 O ATOM 6820 CB SER C 271 -36.002 46.932 -7.387 1.00 86.30 C ATOM 6821 OG SER C 271 -36.794 48.101 -7.484 1.00 87.42 O ATOM 6822 N GLY C 272 -35.693 45.245 -4.766 1.00 81.96 N ATOM 6823 CA GLY C 272 -36.254 44.664 -3.560 1.00 78.89 C ATOM 6824 C GLY C 272 -35.352 43.576 -3.009 1.00 76.86 C ATOM 6825 O GLY C 272 -34.864 42.728 -3.760 1.00 75.55 O ATOM 6826 N LEU C 273 -35.144 43.591 -1.696 1.00 74.99 N ATOM 6827 CA LEU C 273 -34.273 42.617 -1.044 1.00 72.30 C ATOM 6828 C LEU C 273 -34.960 41.876 0.105 1.00 70.78 C ATOM 6829 O LEU C 273 -35.657 42.475 0.922 1.00 69.89 O ATOM 6830 CB LEU C 273 -33.027 43.331 -0.503 1.00 71.11 C ATOM 6831 CG LEU C 273 -31.658 42.640 -0.449 1.00 69.91 C ATOM 6832 CD1 LEU C 273 -30.918 43.183 0.762 1.00 68.42 C ATOM 6833 CD2 LEU C 273 -31.785 41.122 -0.354 1.00 68.44 C ATOM 6834 N ASP C 274 -34.752 40.567 0.165 1.00 70.14 N ATOM 6835 CA ASP C 274 -35.333 39.767 1.235 1.00 69.92 C ATOM 6836 C ASP C 274 -34.273 38.987 2.026 1.00 68.55 C ATOM 6837 O ASP C 274 -33.479 38.233 1.468 1.00 68.14 O ATOM 6838 CB ASP C 274 -36.399 38.807 0.676 1.00 69.92 C ATOM 6839 CG ASP C 274 -37.677 39.528 0.245 1.00 70.83 C ATOM 6840 OD1 ASP C 274 -38.136 40.439 0.973 1.00 69.82 O ATOM 6841 OD2 ASP C 274 -38.231 39.176 -0.816 1.00 71.83 O ATOM 6842 N ILE C 275 -34.253 39.215 3.334 1.00 67.15 N ATOM 6843 CA ILE C 275 -33.330 38.533 4.230 1.00 65.60 C ATOM 6844 C ILE C 275 -34.230 37.775 5.198 1.00 65.13 C ATOM 6845 O ILE C 275 -34.582 38.278 6.279 1.00 63.76 O ATOM 6846 CB ILE C 275 -32.434 39.540 4.990 1.00 65.55 C ATOM 6847 CG1 ILE C 275 -31.370 40.100 4.046 1.00 65.04 C ATOM 6848 CG2 ILE C 275 
-31.771 38.879 6.179 1.00 65.82 C ATOM 6849 CD1 ILE C 275 -31.905 41.092 3.044 1.00 67.21 C ATOM 6850 N MET C 276 -34.592 36.560 4.773 1.00 63.90 N ATOM 6851 CA MET C 276 -35.497 35.677 5.501 1.00 62.24 C ATOM 6852 C MET C 276 -34.823 34.496 6.167 1.00 61.84 C ATOM 6853 O MET C 276 -33.631 34.251 5.970 1.00 60.99 O ATOM 6854 CB MET C 276 -36.543 35.108 4.544 1.00 62.85 C ATOM 6855 CG MET C 276 -36.998 36.061 3.469 1.00 66.01 C ATOM 6856 SD MET C 276 -37.572 37.611 4.162 1.00 70.21 S ATOM 6857 CE MET C 276 -38.808 36.970 5.337 1.00 69.59 C ATOM 6858 N GLU C 277 -35.641 33.771 6.943 1.00 61.47 N ATOM 6859 CA GLU C 277 -35.303 32.533 7.679 1.00 61.13 C ATOM 6860 C GLU C 277 -34.294 32.571 8.819 1.00 60.56 C ATOM 6861 O GLU C 277 -33.719 31.537 9.163 1.00 58.86 O ATOM 6862 CB GLU C 277 -34.869 31.416 6.706 1.00 60.57 C ATOM 6863 CG GLU C 277 -35.954 30.894 5.746 1.00 59.95 C ATOM 6864 CD GLU C 277 -37.209 30.430 6.452 1.00 60.38 C ATOM 6865 OE1 GLU C 277 -37.158 30.084 7.650 1.00 60.82 O ATOM 6866 OE2 GLU C 277 -38.277 30.421 5.822 1.00 63.05 O ATOM 6867 N VAL C 278 -34.073 33.737 9.409 1.00 60.40 N ATOM 6868 CA VAL C 278 -33.127 33.815 10.507 1.00 61.42 C ATOM 6869 C VAL C 278 -33.856 33.569 11.804 1.00 61.69 C ATOM 6870 O VAL C 278 -34.746 34.336 12.137 1.00 62.84 O ATOM 6871 CB VAL C 278 -32.488 35.183 10.615 1.00 61.62 C ATOM 6872 CG1 VAL C 278 -31.780 35.285 11.948 1.00 60.45 C ATOM 6873 CG2 VAL C 278 -31.497 35.392 9.479 1.00 63.02 C ATOM 6874 N ASN C 279 -33.467 32.529 12.543 1.00 62.09 N ATOM 6875 CA ASN C 279 -34.125 32.201 13.810 1.00 62.05 C ATOM 6876 C ASN C 279 -33.226 32.376 15.021 1.00 62.56 C ATOM 6877 O ASN C 279 -32.710 31.393 15.536 1.00 62.08 O ATOM 6878 CB ASN C 279 -34.618 30.756 13.793 1.00 61.37 C ATOM 6879 CG ASN C 279 -35.914 30.580 14.562 1.00 60.61 C ATOM 6880 OD1 ASN C 279 -36.198 31.330 15.501 1.00 58.30 O ATOM 6881 ND2 ASN C 279 -36.708 29.583 14.169 1.00 57.50 N ATOM 6882 N PRO C 280 -33.045 33.625 15.503 1.00 64.39 N ATOM 6883 CA PRO C 
280 -32.217 34.020 16.657 1.00 67.08 C ATOM 6884 C PRO C 280 -32.314 33.136 17.898 1.00 69.58 C ATOM 6885 O PRO C 280 -31.664 33.394 18.912 1.00 69.61 O ATOM 6886 CB PRO C 280 -32.678 35.450 16.931 1.00 65.11 C ATOM 6887 CG PRO C 280 -32.904 35.957 15.583 1.00 64.36 C ATOM 6888 CD PRO C 280 -33.673 34.818 14.915 1.00 64.19 C ATOM 6889 N THR C 281 -33.124 32.092 17.798 1.00 72.57 N ATOM 6890 CA THR C 281 -33.326 31.164 18.893 1.00 75.80 C ATOM 6891 C THR C 281 -32.786 29.785 18.538 1.00 76.44 C ATOM 6892 O THR C 281 -32.366 29.027 19.410 1.00 76.53 O ATOM 6893 CB THR C 281 -34.822 31.069 19.226 1.00 76.94 C ATOM 6894 OG1 THR C 281 -35.575 31.023 18.008 1.00 79.55 O ATOM 6895 CG2 THR C 281 -35.266 32.279 20.029 1.00 77.78 C ATOM 6896 N LEU C 282 -32.796 29.461 17.253 1.00 78.33 N ATOM 6897 CA LEU C 282 -32.299 28.170 16.812 1.00 80.29 C ATOM 6898 C LEU C 282 -30.784 28.173 16.728 1.00 83.19 C ATOM 6899 O LEU C 282 -30.211 27.961 15.662 1.00 85.02 O ATOM 6900 CB LEU C 282 -32.904 27.789 15.458 1.00 77.40 C ATOM 6901 CG LEU C 282 -34.421 27.645 15.516 1.00 75.93 C ATOM 6902 CD1 LEU C 282 -34.953 27.080 14.219 1.00 75.40 C ATOM 6903 CD2 LEU C 282 -34.776 26.743 16.671 1.00 75.90 C ATOM 6904 N GLY C 283 -30.138 28.427 17.861 1.00 85.07 N ATOM 6905 CA GLY C 283 -28.687 28.423 17.900 1.00 86.60 C ATOM 6906 C GLY C 283 -28.170 27.376 18.879 1.00 87.56 C ATOM 6907 O GLY C 283 -28.531 27.400 20.064 1.00 88.62 O ATOM 6908 N LYS C 284 -27.345 26.450 18.381 1.00 86.77 N ATOM 6909 CA LYS C 284 -26.763 25.389 19.209 1.00 85.88 C ATOM 6910 C LYS C 284 -25.985 26.071 20.328 1.00 85.02 C ATOM 6911 O LYS C 284 -25.950 25.589 21.465 1.00 84.12 O ATOM 6912 CB LYS C 284 -25.806 24.523 18.379 1.00 87.29 C ATOM 6913 CG LYS C 284 -25.815 23.025 18.701 1.00 87.71 C ATOM 6914 CD LYS C 284 -26.790 22.259 17.802 1.00 87.88 C ATOM 6915 CE LYS C 284 -28.228 22.740 17.990 1.00 88.18 C ATOM 6916 NZ LYS C 284 -29.191 22.144 17.026 1.00 87.29 N ATOM 6917 N THR C 285 -25.363 27.198 19.976 1.00 84.15 N 
ATOM 6918 CA THR C 285 -24.589 28.016 20.900 1.00 83.26 C ATOM 6919 C THR C 285 -24.749 29.473 20.484 1.00 82.99 C ATOM 6920 O THR C 285 -24.942 29.782 19.306 1.00 81.31 O ATOM 6921 CB THR C 285 -23.090 27.697 20.850 1.00 83.35 C ATOM 6922 OG1 THR C 285 -22.517 28.265 19.663 1.00 84.05 O ATOM 6923 CG2 THR C 285 -22.866 26.198 20.848 1.00 84.61 C ATOM 6924 N PRO C 286 -24.668 30.389 21.455 1.00 83.41 N ATOM 6925 CA PRO C 286 -24.795 31.829 21.225 1.00 82.95 C ATOM 6926 C PRO C 286 -23.996 32.341 20.027 1.00 82.76 C ATOM 6927 O PRO C 286 -24.479 33.195 19.279 1.00 83.94 O ATOM 6928 CB PRO C 286 -24.326 32.416 22.547 1.00 83.43 C ATOM 6929 CG PRO C 286 -24.896 31.430 23.532 1.00 83.83 C ATOM 6930 CD PRO C 286 -24.526 30.101 22.896 1.00 84.08 C ATOM 6931 N GLU C 287 -22.779 31.833 19.855 1.00 81.54 N ATOM 6932 CA GLU C 287 -21.938 32.235 18.729 1.00 80.48 C ATOM 6933 C GLU C 287 -22.671 31.956 17.408 1.00 79.49 C ATOM 6934 O GLU C 287 -22.880 32.867 16.598 1.00 78.67 O ATOM 6935 CB GLU C 287 -20.606 31.467 18.780 1.00 82.25 C ATOM 6936 CG GLU C 287 -20.127 30.825 17.456 1.00 83.83 C ATOM 6937 CD GLU C 287 -19.708 31.837 16.378 1.00 84.57 C ATOM 6938 OE1 GLU C 287 -20.594 32.497 15.797 1.00 85.56 O ATOM 6939 OE2 GLU C 287 -18.491 31.972 16.106 1.00 84.28 O ATOM 6940 N GLU C 288 -23.058 30.696 17.205 1.00 77.25 N ATOM 6941 CA GLU C 288 -23.774 30.279 16.002 1.00 73.95 C ATOM 6942 C GLU C 288 -24.805 31.309 15.584 1.00 72.69 C ATOM 6943 O GLU C 288 -24.928 31.627 14.406 1.00 72.64 O ATOM 6944 CB GLU C 288 -24.477 28.950 16.233 1.00 72.46 C ATOM 6945 CG GLU C 288 -23.537 27.811 16.548 1.00 70.50 C ATOM 6946 CD GLU C 288 -24.157 26.462 16.247 1.00 69.40 C ATOM 6947 OE1 GLU C 288 -25.364 26.431 15.906 1.00 68.54 O ATOM 6948 OE2 GLU C 288 -23.444 25.435 16.352 1.00 68.62 O ATOM 6949 N VAL C 289 -25.565 31.812 16.553 1.00 71.45 N ATOM 6950 CA VAL C 289 -26.567 32.835 16.268 1.00 70.18 C ATOM 6951 C VAL C 289 -25.771 34.033 15.764 1.00 69.20 C ATOM 6952 O VAL C 289 -25.810 
34.351 14.577 1.00 68.99 O ATOM 6953 CB VAL C 289 -27.389 33.242 17.545 1.00 69.79 C ATOM 6954 CG1 VAL C 289 -28.427 34.300 17.193 1.00 67.62 C ATOM 6955 CG2 VAL C 289 -28.097 32.019 18.129 1.00 69.71 C ATOM 6956 N THR C 290 -25.039 34.680 16.669 1.00 68.00 N ATOM 6957 CA THR C 290 -24.216 35.837 16.317 1.00 66.88 C ATOM 6958 C THR C 290 -23.624 35.684 14.913 1.00 66.56 C ATOM 6959 O THR C 290 -23.688 36.604 14.088 1.00 66.19 O ATOM 6960 CB THR C 290 -23.068 36.012 17.323 1.00 65.35 C ATOM 6961 OG1 THR C 290 -23.620 36.329 18.600 1.00 63.64 O ATOM 6962 CG2 THR C 290 -22.130 37.120 16.885 1.00 64.85 C ATOM 6963 N ARG C 291 -23.042 34.515 14.661 1.00 65.26 N ATOM 6964 CA ARG C 291 -22.446 34.204 13.363 1.00 64.06 C ATOM 6965 C ARG C 291 -23.522 34.491 12.325 1.00 63.34 C ATOM 6966 O ARG C 291 -23.416 35.415 11.513 1.00 64.70 O ATOM 6967 CB ARG C 291 -22.070 32.722 13.315 1.00 61.65 C ATOM 6968 CG ARG C 291 -21.321 32.273 12.084 1.00 58.64 C ATOM 6969 CD ARG C 291 -20.982 30.796 12.223 1.00 56.90 C ATOM 6970 NE ARG C 291 -20.501 30.514 13.572 1.00 57.01 N ATOM 6971 CZ ARG C 291 -20.356 29.298 14.096 1.00 57.06 C ATOM 6972 NH1 ARG C 291 -20.652 28.215 13.378 1.00 57.97 N ATOM 6973 NH2 ARG C 291 -19.947 29.164 15.355 1.00 55.48 N ATOM 6974 N THR C 292 -24.571 33.685 12.396 1.00 61.05 N ATOM 6975 CA THR C 292 -25.714 33.772 11.514 1.00 58.67 C ATOM 6976 C THR C 292 -26.176 35.209 11.320 1.00 58.47 C ATOM 6977 O THR C 292 -26.492 35.637 10.215 1.00 57.49 O ATOM 6978 CB THR C 292 -26.865 32.949 12.101 1.00 58.81 C ATOM 6979 OG1 THR C 292 -26.421 31.598 12.348 1.00 56.48 O ATOM 6980 CG2 THR C 292 -28.060 32.957 11.147 1.00 59.71 C ATOM 6981 N VAL C 293 -26.198 35.955 12.413 1.00 59.78 N ATOM 6982 CA VAL C 293 -26.636 37.340 12.392 1.00 58.82 C ATOM 6983 C VAL C 293 -25.748 38.280 11.596 1.00 58.73 C ATOM 6984 O VAL C 293 -26.230 38.991 10.724 1.00 60.24 O ATOM 6985 CB VAL C 293 -26.773 37.867 13.817 1.00 57.64 C ATOM 6986 CG1 VAL C 293 -26.797 39.368 13.819 1.00 59.61 C 
ATOM 6987 CG2 VAL C 293 -28.046 37.323 14.426 1.00 56.78 C ATOM 6988 N ASN C 294 -24.457 38.297 11.882 1.00 59.01 N ATOM 6989 CA ASN C 294 -23.562 39.196 11.157 1.00 59.52 C ATOM 6990 C ASN C 294 -23.461 38.852 9.677 1.00 57.04 C ATOM 6991 O ASN C 294 -23.614 39.719 8.825 1.00 56.04 O ATOM 6992 CB ASN C 294 -22.180 39.183 11.807 1.00 62.65 C ATOM 6993 CG ASN C 294 -22.242 39.493 13.295 1.00 65.74 C ATOM 6994 OD1 ASN C 294 -22.705 40.574 13.705 1.00 65.73 O ATOM 6995 ND2 ASN C 294 -21.789 38.541 14.117 1.00 65.75 N ATOM 6996 N THR C 295 -23.210 37.584 9.374 1.00 55.78 N ATOM 6997 CA THR C 295 -23.116 37.131 7.986 1.00 53.42 C ATOM 6998 C THR C 295 -24.360 37.601 7.211 1.00 51.63 C ATOM 6999 O THR C 295 -24.282 38.017 6.046 1.00 51.32 O ATOM 7000 CB THR C 295 -22.999 35.585 7.925 1.00 51.74 C ATOM 7001 OG1 THR C 295 -22.957 35.154 6.564 1.00 51.03 O ATOM 7002 CG2 THR C 295 -24.188 34.938 8.600 1.00 52.14 C ATOM 7003 N ALA C 296 -25.506 37.535 7.871 1.00 48.87 N ATOM 7004 CA ALA C 296 -26.735 37.984 7.263 1.00 48.83 C ATOM 7005 C ALA C 296 -26.574 39.489 7.025 1.00 49.86 C ATOM 7006 O ALA C 296 -26.649 39.974 5.894 1.00 48.11 O ATOM 7007 CB ALA C 296 -27.876 37.715 8.197 1.00 48.98 C ATOM 7008 N VAL C 297 -26.326 40.207 8.117 1.00 51.36 N ATOM 7009 CA VAL C 297 -26.105 41.651 8.102 1.00 53.60 C ATOM 7010 C VAL C 297 -25.119 42.001 6.996 1.00 55.78 C ATOM 7011 O VAL C 297 -25.353 42.901 6.179 1.00 58.28 O ATOM 7012 CB VAL C 297 -25.490 42.125 9.446 1.00 53.14 C ATOM 7013 CG1 VAL C 297 -24.996 43.557 9.341 1.00 50.59 C ATOM 7014 CG2 VAL C 297 -26.517 42.000 10.554 1.00 54.44 C ATOM 7015 N ALA C 298 -24.007 41.284 6.988 1.00 56.73 N ATOM 7016 CA ALA C 298 -22.964 41.499 6.015 1.00 58.92 C ATOM 7017 C ALA C 298 -23.532 41.351 4.614 1.00 61.14 C ATOM 7018 O ALA C 298 -23.238 42.165 3.735 1.00 62.85 O ATOM 7019 CB ALA C 298 -21.835 40.494 6.242 1.00 58.61 C ATOM 7020 N LEU C 299 -24.340 40.314 4.397 1.00 60.82 N ATOM 7021 CA LEU C 299 -24.910 40.100 3.075 1.00 61.09 C ATOM 
7022 C LEU C 299 -25.771 41.280 2.618 1.00 62.91 C ATOM 7023 O LEU C 299 -25.934 41.498 1.419 1.00 65.11 O ATOM 7024 CB LEU C 299 -25.757 38.826 3.040 1.00 57.81 C ATOM 7025 CG LEU C 299 -25.138 37.427 2.971 1.00 55.38 C ATOM 7026 CD1 LEU C 299 -26.268 36.467 2.740 1.00 53.96 C ATOM 7027 CD2 LEU C 299 -24.143 37.269 1.841 1.00 51.97 C ATOM 7028 N THR C 300 -26.327 42.038 3.557 1.00 62.76 N ATOM 7029 CA THR C 300 -27.162 43.172 3.181 1.00 62.65 C ATOM 7030 C THR C 300 -26.286 44.286 2.656 1.00 61.32 C ATOM 7031 O THR C 300 -26.383 44.662 1.493 1.00 59.77 O ATOM 7032 CB THR C 300 -27.963 43.711 4.378 1.00 63.73 C ATOM 7033 OG1 THR C 300 -28.508 42.618 5.125 1.00 65.35 O ATOM 7034 CG2 THR C 300 -29.119 44.587 3.892 1.00 63.25 C ATOM 7035 N LEU C 301 -25.438 44.804 3.537 1.00 60.60 N ATOM 7036 CA LEU C 301 -24.492 45.862 3.202 1.00 62.25 C ATOM 7037 C LEU C 301 -23.715 45.529 1.936 1.00 62.75 C ATOM 7038 O LEU C 301 -23.016 46.372 1.385 1.00 61.14 O ATOM 7039 CB LEU C 301 -23.491 46.045 4.342 1.00 62.64 C ATOM 7040 CG LEU C 301 -24.070 46.408 5.704 1.00 63.60 C ATOM 7041 CD1 LEU C 301 -23.018 46.275 6.800 1.00 64.16 C ATOM 7042 CD2 LEU C 301 -24.587 47.827 5.624 1.00 64.47 C ATOM 7043 N SER C 302 -23.811 44.281 1.498 1.00 65.66 N ATOM 7044 CA SER C 302 -23.114 43.852 0.296 1.00 67.63 C ATOM 7045 C SER C 302 -23.919 44.359 -0.888 1.00 69.13 C ATOM 7046 O SER C 302 -23.392 45.036 -1.774 1.00 70.32 O ATOM 7047 CB SER C 302 -23.008 42.328 0.262 1.00 67.19 C ATOM 7048 OG SER C 302 -22.107 41.917 -0.746 1.00 69.11 O ATOM 7049 N CYS C 303 -25.209 44.043 -0.867 1.00 70.31 N ATOM 7050 CA CYS C 303 -26.152 44.460 -1.897 1.00 71.32 C ATOM 7051 C CYS C 303 -26.206 45.979 -2.024 1.00 72.31 C ATOM 7052 O CYS C 303 -26.758 46.516 -2.987 1.00 72.83 O ATOM 7053 CB CYS C 303 -27.546 43.974 -1.531 1.00 71.03 C ATOM 7054 SG CYS C 303 -27.601 42.247 -1.161 1.00 72.69 S ATOM 7055 N PHE C 304 -25.632 46.672 -1.048 1.00 73.09 N ATOM 7056 CA PHE C 304 -25.665 48.119 -1.046 1.00 72.61 C ATOM 
7057 C PHE C 304 -24.326 48.830 -1.129 1.00 73.36 C ATOM 7058 O PHE C 304 -24.154 49.897 -0.536 1.00 74.39 O ATOM 7059 CB PHE C 304 -26.420 48.584 0.188 1.00 71.80 C ATOM 7060 CG PHE C 304 -27.834 48.101 0.233 1.00 71.95 C ATOM 7061 CD1 PHE C 304 -28.813 48.705 -0.548 1.00 72.36 C ATOM 7062 CD2 PHE C 304 -28.186 47.022 1.029 1.00 72.11 C ATOM 7063 CE1 PHE C 304 -30.125 48.241 -0.536 1.00 71.54 C ATOM 7064 CE2 PHE C 304 -29.495 46.551 1.047 1.00 72.05 C ATOM 7065 CZ PHE C 304 -30.466 47.164 0.261 1.00 71.30 C ATOM 7066 N GLY C 305 -23.374 48.253 -1.855 1.00 72.86 N ATOM 7067 CA GLY C 305 -22.100 48.929 -1.993 1.00 72.03 C ATOM 7068 C GLY C 305 -20.854 48.206 -1.542 1.00 71.73 C ATOM 7069 O GLY C 305 -20.025 47.825 -2.377 1.00 72.28 O ATOM 7070 N THR C 306 -20.712 48.034 -0.227 1.00 70.88 N ATOM 7071 CA THR C 306 -19.545 47.369 0.341 1.00 69.05 C ATOM 7072 C THR C 306 -19.032 46.264 -0.585 1.00 69.72 C ATOM 7073 O THR C 306 -19.784 45.374 -1.028 1.00 68.72 O ATOM 7074 CB THR C 306 -19.844 46.765 1.709 1.00 66.56 C ATOM 7075 OG1 THR C 306 -20.766 47.601 2.408 1.00 65.69 O ATOM 7076 CG2 THR C 306 -18.573 46.674 2.517 1.00 65.22 C ATOM 7077 N LYS C 307 -17.740 46.354 -0.883 1.00 69.29 N ATOM 7078 CA LYS C 307 -17.051 45.418 -1.757 1.00 69.35 C ATOM 7079 C LYS C 307 -15.970 44.776 -0.901 1.00 68.64 C ATOM 7080 O LYS C 307 -15.589 45.336 0.117 1.00 68.97 O ATOM 7081 CB LYS C 307 -16.458 46.199 -2.935 1.00 69.94 C ATOM 7082 CG LYS C 307 -17.487 47.167 -3.547 1.00 71.55 C ATOM 7083 CD LYS C 307 -16.851 48.305 -4.351 1.00 75.08 C ATOM 7084 CE LYS C 307 -16.374 47.843 -5.716 1.00 76.43 C ATOM 7085 NZ LYS C 307 -17.522 47.364 -6.547 1.00 77.73 N ATOM 7086 N ARG C 308 -15.490 43.598 -1.275 1.00 68.82 N ATOM 7087 CA ARG C 308 -14.465 42.962 -0.463 1.00 69.63 C ATOM 7088 C ARG C 308 -13.070 43.478 -0.814 1.00 73.65 C ATOM 7089 O ARG C 308 -12.043 42.942 -0.358 1.00 74.33 O ATOM 7090 CB ARG C 308 -14.543 41.438 -0.597 1.00 65.67 C ATOM 7091 CG ARG C 308 -15.371 40.791 0.514 1.00 
61.84 C ATOM 7092 CD ARG C 308 -15.506 39.296 0.351 1.00 56.86 C ATOM 7093 NE ARG C 308 -16.356 38.976 -0.784 1.00 51.40 N ATOM 7094 CZ ARG C 308 -16.196 37.906 -1.549 1.00 51.56 C ATOM 7095 NH1 ARG C 308 -15.211 37.050 -1.299 1.00 50.69 N ATOM 7096 NH2 ARG C 308 -17.015 37.692 -2.566 1.00 53.11 N ATOM 7097 N GLU C 309 -13.038 44.534 -1.619 1.00 76.34 N ATOM 7098 CA GLU C 309 -11.779 45.127 -2.009 1.00 78.81 C ATOM 7099 C GLU C 309 -11.648 46.494 -1.362 1.00 81.14 C ATOM 7100 O GLU C 309 -10.577 47.103 -1.381 1.00 80.97 O ATOM 7101 CB GLU C 309 -11.696 45.247 -3.527 1.00 78.42 C ATOM 7102 CG GLU C 309 -12.736 46.144 -4.133 1.00 79.83 C ATOM 7103 CD GLU C 309 -12.273 46.717 -5.453 1.00 80.70 C ATOM 7104 OE1 GLU C 309 -12.021 45.924 -6.392 1.00 79.92 O ATOM 7105 OE2 GLU C 309 -12.154 47.961 -5.545 1.00 80.64 O ATOM 7106 N GLY C 310 -12.745 46.970 -0.784 1.00 84.23 N ATOM 7107 CA GLY C 310 -12.735 48.265 -0.129 1.00 88.85 C ATOM 7108 C GLY C 310 -13.747 49.263 -0.672 1.00 92.10 C ATOM 7109 O GLY C 310 -14.419 49.014 -1.668 1.00 91.07 O ATOM 7110 N ASN C 311 -13.837 50.408 -0.004 1.00 97.12 N ATOM 7111 CA ASN C 311 -14.739 51.495 -0.374 1.00102.29 C ATOM 7112 C ASN C 311 -14.133 52.790 0.187 1.00106.18 C ATOM 7113 O ASN C 311 -13.600 52.796 1.304 1.00107.49 O ATOM 7114 CB ASN C 311 -16.135 51.284 0.240 1.00102.14 C ATOM 7115 CG ASN C 311 -16.881 50.093 -0.359 1.00102.71 C ATOM 7116 OD1 ASN C 311 -17.191 50.075 -1.550 1.00103.04 O ATOM 7117 ND2 ASN C 311 -17.176 49.094 0.473 1.00102.31 N ATOM 7118 N HIS C 312 -14.198 53.873 -0.588 1.00109.68 N ATOM 7119 CA HIS C 312 -13.667 55.169 -0.154 1.00112.52 C ATOM 7120 C HIS C 312 -14.519 56.304 -0.733 1.00113.52 C ATOM 7121 O HIS C 312 -14.763 56.353 -1.944 1.00113.20 O ATOM 7122 CB HIS C 312 -12.204 55.334 -0.599 1.00114.32 C ATOM 7123 CG HIS C 312 -12.034 55.529 -2.077 1.00117.06 C ATOM 7124 ND1 HIS C 312 -12.164 54.503 -2.985 1.00118.23 N ATOM 7125 CD2 HIS C 312 -11.781 56.648 -2.801 1.00118.21 C ATOM 7126 CE1 HIS C 312 
-12.002 54.979 -4.211 1.00118.99 C ATOM 7127 NE2 HIS C 312 -11.769 56.276 -4.125 1.00119.29 N ATOM 7128 N LYS C 313 -14.977 57.207 0.131 1.00114.77 N ATOM 7129 CA LYS C 313 -15.803 58.331 -0.308 1.00116.23 C ATOM 7130 C LYS C 313 -15.102 59.055 -1.454 1.00117.88 C ATOM 7131 O LYS C 313 -13.966 59.512 -1.305 1.00118.20 O ATOM 7132 CB LYS C 313 -16.040 59.291 0.858 1.00115.55 C ATOM 7133 CG LYS C 313 -16.682 58.641 2.078 1.00114.52 C ATOM 7134 CD LYS C 313 -16.724 59.619 3.246 1.00113.62 C ATOM 7135 CE LYS C 313 -17.241 58.977 4.524 1.00111.93 C ATOM 7136 NZ LYS C 313 -17.194 59.936 5.666 1.00109.72 N ATOM 7137 N PRO C 314 -15.766 59.158 -2.620 1.00119.02 N ATOM 7138 CA PRO C 314 -15.190 59.828 -3.789 1.00119.57 C ATOM 7139 C PRO C 314 -14.955 61.320 -3.589 1.00120.29 C ATOM 7140 O PRO C 314 -15.468 61.931 -2.643 1.00119.94 O ATOM 7141 CB PRO C 314 -16.221 59.565 -4.886 1.00119.10 C ATOM 7142 CG PRO C 314 -16.864 58.291 -4.456 1.00119.32 C ATOM 7143 CD PRO C 314 -17.045 58.522 -2.976 1.00119.29 C ATOM 7144 N GLU C 315 -14.171 61.891 -4.499 1.00120.71 N ATOM 7145 CA GLU C 315 -13.845 63.311 -4.488 1.00120.73 C ATOM 7146 C GLU C 315 -13.066 63.768 -3.263 1.00119.42 C ATOM 7147 O GLU C 315 -12.851 64.966 -3.067 1.00119.59 O ATOM 7148 CB GLU C 315 -15.126 64.140 -4.633 1.00122.78 C ATOM 7149 CG GLU C 315 -15.743 64.063 -6.027 1.00124.50 C ATOM 7150 CD GLU C 315 -17.087 64.757 -6.119 1.00125.58 C ATOM 7151 OE1 GLU C 315 -17.623 64.847 -7.242 1.00125.94 O ATOM 7152 OE2 GLU C 315 -17.609 65.204 -5.073 1.00126.86 O ATOM 7153 N THR C 316 -12.649 62.818 -2.435 1.00117.54 N ATOM 7154 CA THR C 316 -11.869 63.158 -1.255 1.00115.78 C ATOM 7155 C THR C 316 -10.466 62.627 -1.492 1.00113.97 C ATOM 7156 O THR C 316 -10.295 61.487 -1.921 1.00113.84 O ATOM 7157 CB THR C 316 -12.452 62.524 0.042 1.00116.09 C ATOM 7158 OG1 THR C 316 -13.784 63.009 0.255 1.00116.18 O ATOM 7159 CG2 THR C 316 -11.589 62.889 1.258 1.00115.05 C ATOM 7160 N ASP C 317 -9.473 63.478 -1.245 1.00111.82 N ATOM 7161 
CA ASP C 317 -8.072 63.108 -1.408 1.00109.43 C ATOM 7162 C ASP C 317 -7.587 62.618 -0.043 1.00109.27 C ATOM 7163 O ASP C 317 -7.352 63.414 0.876 1.00109.58 O ATOM 7164 CB ASP C 317 -7.258 64.323 -1.894 1.00106.72 C ATOM 7165 CG ASP C 317 -5.768 64.025 -2.060 1.00104.53 C ATOM 7166 OD1 ASP C 317 -5.405 63.058 -2.761 1.00103.22 O ATOM 7167 OD2 ASP C 317 -4.950 64.777 -1.491 1.00102.42 O ATOM 7168 N TYR C 318 -7.478 61.292 0.084 1.00108.87 N ATOM 7169 CA TYR C 318 -7.032 60.644 1.315 1.00108.47 C ATOM 7170 C TYR C 318 -5.520 60.763 1.448 1.00109.21 C ATOM 7171 O TYR C 318 -4.915 60.142 2.311 1.00108.22 O ATOM 7172 CB TYR C 318 -7.439 59.165 1.313 1.00107.18 C ATOM 7173 CG TYR C 318 -8.907 58.878 1.614 1.00105.92 C ATOM 7174 CD1 TYR C 318 -9.418 58.991 2.917 1.00104.92 C ATOM 7175 CD2 TYR C 318 -9.781 58.459 0.605 1.00104.89 C ATOM 7176 CE1 TYR C 318 -10.764 58.689 3.204 1.00103.58 C ATOM 7177 CE2 TYR C 318 -11.127 58.156 0.881 1.00104.05 C ATOM 7178 CZ TYR C 318 -11.608 58.272 2.179 1.00103.64 C ATOM 7179 OH TYR C 318 -12.927 57.973 2.443 1.00102.85 O ATOM 7180 N LEU C 319 -4.925 61.568 0.574 1.00111.27 N ATOM 7181 CA LEU C 319 -3.482 61.820 0.566 1.00112.79 C ATOM 7182 C LEU C 319 -3.164 63.199 1.177 1.00113.76 C ATOM 7183 O LEU C 319 -2.466 63.244 2.222 1.00114.70 O ATOM 7184 CB LEU C 319 -2.940 61.768 -0.872 1.00111.81 C ATOM 7185 CG LEU C 319 -2.749 60.393 -1.510 1.00110.60 C ATOM 7186 CD1 LEU C 319 -2.346 60.548 -2.969 1.00109.61 C ATOM 7187 CD2 LEU C 319 -1.684 59.631 -0.730 1.00109.90 C TER 7188 LEU C 319 HETATM 7189 MN MN A 500 -0.105 42.231 7.506 1.00 67.49 MN HETATM 7190 MN MN A 501 -3.619 40.640 7.969 1.00 56.97 MN HETATM 7191 N HAR A 906 -4.248 48.081 7.923 1.00 64.15 N HETATM 7192 CA HAR A 906 -5.197 48.023 9.112 1.00 65.82 C HETATM 7193 C HAR A 906 -4.618 48.827 10.404 1.00 67.85 C HETATM 7194 O HAR A 906 -5.328 48.805 11.513 1.00 67.91 O HETATM 7195 CB HAR A 906 -5.473 46.547 9.503 1.00 62.42 C HETATM 7196 CG HAR A 906 -4.329 45.979 10.273 1.00 59.37 C 
HETATM 7197 CD HAR A 906 -3.927 44.648 9.756 1.00 57.53 C HETATM 7198 NE HAR A 906 -2.776 44.175 10.510 1.00 55.07 N HETATM 7199 CZ HAR A 906 -2.142 43.037 10.299 1.00 53.02 C HETATM 7200 NH1 HAR A 906 -2.527 42.193 9.331 1.00 53.12 N HETATM 7201 NH2 HAR A 906 -1.092 42.735 11.048 1.00 53.69 N HETATM 7202 OH1 HAR A 906 -1.802 41.073 9.157 1.00 52.94 O HETATM 7203 OXT HAR A 906 -3.457 49.433 10.293 1.00 70.33 O HETATM 7204 MN MN B 502 -7.166 3.574 -0.012 1.00 67.49 MN HETATM 7205 MN MN B 503 -6.846 7.018 1.758 1.00 56.97 MN HETATM 7206 N HAR B 907 -0.036 4.190 0.576 1.00 64.15 N HETATM 7207 CA HAR B 907 0.387 4.581 1.985 1.00 65.82 C HETATM 7208 C HAR B 907 0.823 3.292 2.879 1.00 67.85 C HETATM 7209 O HAR B 907 1.161 3.499 4.135 1.00 67.91 O HETATM 7210 CB HAR B 907 -0.765 5.338 2.697 1.00 62.42 C HETATM 7211 CG HAR B 907 -1.808 4.387 3.178 1.00 59.37 C HETATM 7212 CD HAR B 907 -3.173 4.838 2.810 1.00 57.53 C HETATM 7213 NE HAR B 907 -4.137 3.844 3.257 1.00 55.07 N HETATM 7214 CZ HAR B 907 -5.442 3.910 3.072 1.00 53.02 C HETATM 7215 NH1 HAR B 907 -6.003 4.945 2.430 1.00 53.12 N HETATM 7216 NH2 HAR B 907 -6.207 2.923 3.513 1.00 53.69 N HETATM 7217 OH1 HAR B 907 -7.336 4.916 2.250 1.00 52.94 O HETATM 7218 OXT HAR B 907 0.788 2.104 2.319 1.00 70.33 O HETATM 7219 MN MN C 504 -37.524 29.674 1.725 1.00 67.49 MN HETATM 7220 MN MN C 505 -34.646 27.277 2.757 1.00 56.97 MN HETATM 7221 N HAR C 908 -40.478 23.270 0.371 1.00 64.15 N HETATM 7222 CA HAR C 908 -40.316 22.310 1.540 1.00 65.82 C HETATM 7223 C HAR C 908 -41.633 22.271 2.496 1.00 67.85 C HETATM 7224 O HAR C 908 -41.593 21.513 3.572 1.00 67.91 O HETATM 7225 CB HAR C 908 -39.075 22.703 2.386 1.00 62.42 C HETATM 7226 CG HAR C 908 -39.381 23.867 3.267 1.00 59.37 C HETATM 7227 CD HAR C 908 -38.326 24.906 3.189 1.00 57.53 C HETATM 7228 NE HAR C 908 -38.709 26.034 4.024 1.00 55.07 N HETATM 7229 CZ HAR C 908 -38.014 27.146 4.169 1.00 53.02 C HETATM 7230 NH1 HAR C 908 -36.851 27.330 3.529 1.00 53.12 N HETATM 7231 NH2 HAR C 908 
-38.488 28.107 4.949 1.00 53.69 N HETATM 7232 OH1 HAR C 908 -36.225 28.509 3.700 1.00 52.94 O HETATM 7233 OXT HAR C 908 -42.663 23.020 2.169 1.00 70.33 O HETATM 7234 O HOH A 601 -4.336 39.787 11.132 1.00 30.80 O HETATM 7235 O HOH A 602 12.633 23.151 13.834 1.00 72.37 O HETATM 7236 O HOH A 603 -0.705 37.841 15.495 1.00 72.08 O HETATM 7237 O HOH A 604 -12.526 31.118 -6.884 1.00 44.81 O HETATM 7238 O HOH A 605 -15.138 31.020 -5.114 1.00 31.75 O HETATM 7239 O HOH B 606 -7.208 6.890 5.089 1.00 30.80 O HETATM 7240 O HOH B 607 -29.959 -0.474 5.507 1.00 72.37 O HETATM 7241 O HOH B 608 -10.623 3.267 8.438 1.00 72.08 O HETATM 7242 O HOH B 609 -11.012 23.809 -7.798 1.00 44.81 O HETATM 7243 O HOH B 610 -9.810 25.382 -5.340 1.00 31.75 O HETATM 7244 O HOH C 611 -34.509 26.628 6.045 1.00 30.80 O HETATM 7245 O HOH C 612 -29.740 48.863 13.452 1.00 72.37 O HETATM 7246 O HOH C 613 -35.917 30.135 10.706 1.00 72.08 O HETATM 7247 O HOH C 614 -18.178 25.878 -8.082 1.00 44.81 O HETATM 7248 O HOH C 615 -17.347 23.403 -6.306 1.00 31.75 O CONECT 714 7189 CONECT 887 7190 CONECT 888 7189 CONECT 900 7190 CONECT 918 7189 CONECT 1738 7190 CONECT 1752 7190 CONECT 1753 7190 CONECT 3110 7204 CONECT 3283 7205 CONECT 3284 7204 CONECT 3296 7205 CONECT 3314 7204 CONECT 4134 7205 CONECT 4148 7205 CONECT 4149 7205 CONECT 5506 7219 CONECT 5679 7220 CONECT 5680 7219 CONECT 5692 7220 CONECT 5710 7219 CONECT 6530 7220 CONECT 6544 7220 CONECT 6545 7220 CONECT 7189 714 888 918 7202 CONECT 7190 887 900 1738 1752 CONECT 7190 1753 7200 7202 CONECT 7191 7192 CONECT 7192 7191 7193 7195 CONECT 7193 7192 7194 7203 CONECT 7194 7193 CONECT 7195 7192 7196 CONECT 7196 7195 7197 CONECT 7197 7196 7198 CONECT 7198 7197 7199 CONECT 7199 7198 7200 7201 CONECT 7200 7190 7199 7202 CONECT 7201 7199 CONECT 7202 7189 7190 7200 CONECT 7203 7193 CONECT 7204 3110 3284 3314 7217 CONECT 7205 3283 3296 4134 4148 CONECT 7205 4149 7215 7217 CONECT 7206 7207 CONECT 7207 7206 7208 7210 CONECT 7208 7207 7209 7218 CONECT 7209 7208 CONECT 7210 
7207 7211 CONECT 7211 7210 7212 CONECT 7212 7211 7213 CONECT 7213 7212 7214 CONECT 7214 7213 7215 7216 CONECT 7215 7205 7214 7217 CONECT 7216 7214 CONECT 7217 7204 7205 7215 CONECT 7218 7208 CONECT 7219 5506 5680 5710 7232 CONECT 7220 5679 5692 6530 6544 CONECT 7220 6545 7230 7232 CONECT 7221 7222 CONECT 7222 7221 7223 7225 CONECT 7223 7222 7224 7233 CONECT 7224 7223 CONECT 7225 7222 7226 CONECT 7226 7225 7227 CONECT 7227 7226 7228 CONECT 7228 7227 7229 CONECT 7229 7228 7230 7231 CONECT 7230 7220 7229 7232 CONECT 7231 7229 CONECT 7232 7219 7220 7230 CONECT 7233 7223 MASTER 498 0 9 36 24 0 24 6 7245 3 72 75 END
PyCogent-1.5.3/doc/data/4TSV.pdb
HEADER LYMPHOKINE 29-OCT-97 4TSV TITLE HIGH RESOLUTION CRYSTAL STRUCTURE OF A HUMAN TNF-ALPHA TITLE 2 MUTANT COMPND MOL_ID: 1; COMPND 2 MOLECULE: TUMOR NECROSIS FACTOR-ALPHA; COMPND 3 CHAIN: A; COMPND 4 ENGINEERED: YES; COMPND 5 MUTATION: YES SOURCE MOL_ID: 1; SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS; SOURCE 3 ORGANISM_COMMON: HUMAN; SOURCE 4 CELL_LINE: BL21; SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI; SOURCE 6 EXPRESSION_SYSTEM_STRAIN: BL21 (DE3); SOURCE 7 EXPRESSION_SYSTEM_PLASMID: BL21 KEYWDS LYMPHOKINE, TNF EXPDTA X-RAY DIFFRACTION AUTHOR S.-S.CHA,J.-S.KIM,H.-S.CHO,B.-H.OH REVDAT 2 02-MAR-99 4TSV 1 COMPND REMARK HEADER SOURCE REVDAT 2 2 1 KEYWDS REVDAT 1 30-DEC-98 4TSV 0 JRNL AUTH S.S.CHA,J.S.KIM,H.S.CHO,N.K.SHIN,W.JEONG,H.C.SHIN, JRNL AUTH 2 Y.J.KIM,J.H.HAHN,B.H.OH JRNL TITL HIGH RESOLUTION CRYSTAL STRUCTURE OF A HUMAN TUMOR JRNL TITL 2 NECROSIS FACTOR-ALPHA MUTANT WITH LOW SYSTEMIC JRNL TITL 3 TOXICITY. JRNL REF J.BIOL.CHEM. V.
264 17595 1989 REMARK 1 REFN ASTM JBCHA3 US ISSN 0021-9258 REMARK 2 REMARK 2 RESOLUTION. 1.80 ANGSTROMS. REMARK 3 REMARK 3 REFINEMENT. REMARK 3 PROGRAM : X-PLOR 3.01 REMARK 3 AUTHORS : BRUNGER REMARK 3 REMARK 3 DATA USED IN REFINEMENT. REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.80 REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 8.00 REMARK 3 DATA CUTOFF (SIGMA(F)) : 1.000 REMARK 3 DATA CUTOFF HIGH (ABS(F)) : NULL REMARK 3 DATA CUTOFF LOW (ABS(F)) : NULL REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 99.9 REMARK 3 NUMBER OF REFLECTIONS : 13102 REMARK 3 REMARK 3 FIT TO DATA USED IN REFINEMENT. REMARK 3 CROSS-VALIDATION METHOD : NULL REMARK 3 FREE R VALUE TEST SET SELECTION : NULL REMARK 3 R VALUE (WORKING SET) : 0.200 REMARK 3 FREE R VALUE : 0.262 REMARK 3 FREE R VALUE TEST SET SIZE (%) : 10.000 REMARK 3 FREE R VALUE TEST SET COUNT : NULL REMARK 3 ESTIMATED ERROR OF FREE R VALUE : NULL REMARK 3 REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN. REMARK 3 TOTAL NUMBER OF BINS USED : NULL REMARK 3 BIN RESOLUTION RANGE HIGH (A) : NULL REMARK 3 BIN RESOLUTION RANGE LOW (A) : NULL REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : NULL REMARK 3 REFLECTIONS IN BIN (WORKING SET) : NULL REMARK 3 BIN R VALUE (WORKING SET) : NULL REMARK 3 BIN FREE R VALUE : NULL REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : NULL REMARK 3 BIN FREE R VALUE TEST SET COUNT : NULL REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : NULL REMARK 3 REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT. REMARK 3 PROTEIN ATOMS : 1118 REMARK 3 NUCLEIC ACID ATOMS : NULL REMARK 3 HETEROGEN ATOMS : NULL REMARK 3 SOLVENT ATOMS : 80 REMARK 3 REMARK 3 B VALUES. REMARK 3 FROM WILSON PLOT (A**2) : NULL REMARK 3 MEAN B VALUE (OVERALL, A**2) : NULL REMARK 3 OVERALL ANISOTROPIC B VALUE. REMARK 3 B11 (A**2) : NULL REMARK 3 B22 (A**2) : NULL REMARK 3 B33 (A**2) : NULL REMARK 3 B12 (A**2) : NULL REMARK 3 B13 (A**2) : NULL REMARK 3 B23 (A**2) : NULL REMARK 3 REMARK 3 ESTIMATED COORDINATE ERROR. 
REMARK 3 ESD FROM LUZZATI PLOT (A) : 0.30 REMARK 3 ESD FROM SIGMAA (A) : NULL REMARK 3 LOW RESOLUTION CUTOFF (A) : NULL REMARK 3 REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM C-V LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM C-V SIGMAA (A) : NULL REMARK 3 REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES. REMARK 3 BOND LENGTHS (A) : 0.015 REMARK 3 BOND ANGLES (DEGREES) : 2.18 REMARK 3 DIHEDRAL ANGLES (DEGREES) : NULL REMARK 3 IMPROPER ANGLES (DEGREES) : NULL REMARK 3 REMARK 3 ISOTROPIC THERMAL MODEL : NULL REMARK 3 REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. RMS SIGMA REMARK 3 MAIN-CHAIN BOND (A**2) : NULL ; NULL REMARK 3 MAIN-CHAIN ANGLE (A**2) : NULL ; NULL REMARK 3 SIDE-CHAIN BOND (A**2) : NULL ; NULL REMARK 3 SIDE-CHAIN ANGLE (A**2) : NULL ; NULL REMARK 3 REMARK 3 NCS MODEL : NULL REMARK 3 REMARK 3 NCS RESTRAINTS. RMS SIGMA/WEIGHT REMARK 3 GROUP 1 POSITIONAL (A) : NULL ; NULL REMARK 3 GROUP 1 B-FACTOR (A**2) : NULL ; NULL REMARK 3 REMARK 3 PARAMETER FILE 1 : NULL REMARK 3 TOPOLOGY FILE 1 : NULL REMARK 3 REMARK 3 OTHER REFINEMENT REMARKS: NULL REMARK 4 REMARK 4 4TSV COMPLIES WITH FORMAT V. 3.0, 1-DEC-2006 REMARK 4 REMARK 4 THIS IS THE REMEDIATED VERSION OF THIS PDB ENTRY. 
REMARK 4 REMEDIATED DATA FILE REVISION 3.100 (2007-03-18) REMARK 4 REMEDIATOR VALIDATED PDB VERSION 2.3 COMPLIANT REMARK 200 REMARK 200 EXPERIMENTAL DETAILS REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION REMARK 200 DATE OF DATA COLLECTION : DEC-1995 REMARK 200 TEMPERATURE (KELVIN) : 298.0 REMARK 200 PH : 6.80 REMARK 200 NUMBER OF CRYSTALS USED : 1 REMARK 200 REMARK 200 SYNCHROTRON (Y/N) : N REMARK 200 RADIATION SOURCE : NULL REMARK 200 BEAMLINE : NULL REMARK 200 X-RAY GENERATOR MODEL : MAC SCIENCE M18XHF22 REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M REMARK 200 WAVELENGTH OR RANGE (A) : 1.5418 REMARK 200 MONOCHROMATOR : GRAPHITE(002) REMARK 200 OPTICS : NULL REMARK 200 REMARK 200 DETECTOR TYPE : IMAGE PLATE AREA DETECTOR REMARK 200 DETECTOR MANUFACTURER : MACSCIENCE REMARK 200 INTENSITY-INTEGRATION SOFTWARE : DENZO REMARK 200 DATA SCALING SOFTWARE : SCALEPACK REMARK 200 REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 13102 REMARK 200 RESOLUTION RANGE HIGH (A) : 1.800 REMARK 200 RESOLUTION RANGE LOW (A) : 10.000 REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 1.000 REMARK 200 REMARK 200 OVERALL. REMARK 200 COMPLETENESS FOR RANGE (%) : 99.9 REMARK 200 DATA REDUNDANCY : NULL REMARK 200 R MERGE (I) : NULL REMARK 200 R SYM (I) : 0.04000 REMARK 200 FOR THE DATA SET : NULL REMARK 200 REMARK 200 IN THE HIGHEST RESOLUTION SHELL. 
REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : NULL REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : NULL REMARK 200 COMPLETENESS FOR SHELL (%) : NULL REMARK 200 DATA REDUNDANCY IN SHELL : NULL REMARK 200 R MERGE FOR SHELL (I) : NULL REMARK 200 R SYM FOR SHELL (I) : NULL REMARK 200 FOR SHELL : NULL REMARK 200 REMARK 200 DIFFRACTION PROTOCOL: NULL REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: NULL REMARK 200 SOFTWARE USED: X-PLOR 3.0 REMARK 200 STARTING MODEL: NULL REMARK 200 REMARK 200 REMARK: NULL REMARK 280 REMARK 280 CRYSTAL REMARK 280 SOLVENT CONTENT, VS (%): 44.02 REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 2.20 REMARK 280 REMARK 280 CRYSTALLIZATION CONDITIONS: 1.4 M SODIUM FORMATE, PH 6.8 REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: H 3 REMARK 290 REMARK 290 SYMOP SYMMETRY REMARK 290 NNNMMM OPERATOR REMARK 290 1555 X,Y,Z REMARK 290 2555 -Y,X-Y,Z REMARK 290 3555 -X+Y,-X,Z REMARK 290 4555 2/3+X,1/3+Y,1/3+Z REMARK 290 5555 2/3-Y,1/3+X-Y,1/3+Z REMARK 290 6555 2/3-X+Y,1/3-X,1/3+Z REMARK 290 7555 1/3+X,2/3+Y,2/3+Z REMARK 290 8555 1/3-Y,2/3+X-Y,2/3+Z REMARK 290 9555 1/3-X+Y,2/3-X,2/3+Z REMARK 290 REMARK 290 WHERE NNN -> OPERATOR NUMBER REMARK 290 MMM -> TRANSLATION VECTOR REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY REMARK 290 RELATED MOLECULES. 
REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000 REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 2 -0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 2 0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 3 -0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 3 -0.866025 -0.500000 0.000000 0.00000 REMARK 290 SMTRY3 3 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 4 1.000000 0.000000 0.000000 33.36000 REMARK 290 SMTRY2 4 0.000000 1.000000 0.000000 19.26040 REMARK 290 SMTRY3 4 0.000000 0.000000 1.000000 28.36333 REMARK 290 SMTRY1 5 -0.500000 -0.866025 0.000000 33.36000 REMARK 290 SMTRY2 5 0.866025 -0.500000 0.000000 19.26040 REMARK 290 SMTRY3 5 0.000000 0.000000 1.000000 28.36333 REMARK 290 SMTRY1 6 -0.500000 0.866025 0.000000 33.36000 REMARK 290 SMTRY2 6 -0.866025 -0.500000 0.000000 19.26040 REMARK 290 SMTRY3 6 0.000000 0.000000 1.000000 28.36333 REMARK 290 SMTRY1 7 1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 7 0.000000 1.000000 0.000000 38.52081 REMARK 290 SMTRY3 7 0.000000 0.000000 1.000000 56.72667 REMARK 290 SMTRY1 8 -0.500000 -0.866025 0.000000 0.00000 REMARK 290 SMTRY2 8 0.866025 -0.500000 0.000000 38.52081 REMARK 290 SMTRY3 8 0.000000 0.000000 1.000000 56.72667 REMARK 290 SMTRY1 9 -0.500000 0.866025 0.000000 0.00000 REMARK 290 SMTRY2 9 -0.866025 -0.500000 0.000000 38.52081 REMARK 290 SMTRY3 9 0.000000 0.000000 1.000000 56.72667 REMARK 290 REMARK 290 REMARK: NULL REMARK 300 REMARK 300 BIOMOLECULE: 1 REMARK 300 THIS ENTRY CONTAINS THE CRYSTALLOGRAPHIC ASYMMETRIC UNIT REMARK 300 WHICH CONSISTS OF 1 CHAIN(S). SEE REMARK 350 FOR REMARK 300 INFORMATION ON GENERATING THE BIOLOGICAL MOLECULE(S). 
REMARK 350 REMARK 350 GENERATING THE BIOMOLECULE REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN. REMARK 350 REMARK 350 BIOMOLECULE: 1 REMARK 350 APPLY THE FOLLOWING TO CHAINS: A REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000 REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000 REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000 REMARK 350 BIOMT1 2 -0.500000 -0.866025 0.000000 233.52000 REMARK 350 BIOMT2 2 0.866025 -0.500000 0.000000 -57.78121 REMARK 350 BIOMT3 2 0.000000 0.000000 1.000000 0.00000 REMARK 350 BIOMT1 3 -0.500000 0.866025 0.000000 166.80000 REMARK 350 BIOMT2 3 -0.866025 -0.500000 0.000000 173.34364 REMARK 350 BIOMT3 3 0.000000 0.000000 1.000000 0.00000 REMARK 465 REMARK 465 MISSING RESIDUES REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) 
REMARK 465 REMARK 465 M RES C SSSEQI REMARK 465 PRO A 8 REMARK 465 SER A 9 REMARK 470 REMARK 470 MISSING ATOM REMARK 470 THE FOLLOWING RESIDUES HAVE MISSING ATOMS(M=MODEL NUMBER; REMARK 470 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER; REMARK 470 I=INSERTION CODE): REMARK 470 M RES CSSEQI ATOMS REMARK 470 ASP A 10 CG OD1 OD2 REMARK 470 GLU A 23 CG CD OE1 OE2 REMARK 470 GLN A 25 CG CD OE1 NE2 REMARK 470 TYR A 87 CG CD1 CD2 CE1 CE2 CZ OH REMARK 470 GLN A 88 CG CD OE1 NE2 REMARK 470 ARG A 103 CG CD NE CZ NH1 NH2 REMARK 470 GLU A 107 CG CD OE1 OE2 REMARK 470 GLU A 110 CG CD OE1 OE2 REMARK 470 GLU A 146 CG CD OE1 OE2 REMARK 470 LEU A 157 O REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: CLOSE CONTACTS IN SAME ASYMMETRIC UNIT REMARK 500 REMARK 500 THE FOLLOWING ATOMS ARE IN CLOSE CONTACT. REMARK 500 REMARK 500 ATM1 RES C SSEQI ATM2 RES C SSEQI REMARK 500 CA THR A 105 O HOH 261 2.18 REMARK 500 NZ LYS A 98 O PRO A 117 2.19 REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: CLOSE CONTACTS REMARK 500 REMARK 500 THE FOLLOWING ATOMS THAT ARE RELATED BY CRYSTALLOGRAPHIC REMARK 500 SYMMETRY ARE IN CLOSE CONTACT. AN ATOM LOCATED WITHIN 0.15 REMARK 500 ANGSTROMS OF A SYMMETRY RELATED ATOM IS ASSUMED TO BE ON A REMARK 500 SPECIAL POSITION AND IS, THEREFORE, LISTED IN REMARK 375 REMARK 500 INSTEAD OF REMARK 500. ATOMS WITH NON-BLANK ALTERNATE REMARK 500 LOCATION INDICATORS ARE NOT INCLUDED IN THE CALCULATIONS. 
REMARK 500 REMARK 500 DISTANCE CUTOFF: REMARK 500 2.2 ANGSTROMS FOR CONTACTS NOT INVOLVING HYDROGEN ATOMS REMARK 500 1.6 ANGSTROMS FOR CONTACTS INVOLVING HYDROGEN ATOMS REMARK 500 REMARK 500 ATM1 RES C SSEQI ATM2 RES C SSEQI SSYMOP DISTANCE REMARK 500 NH2 ARG A 32 OG SER A 71 6985 1.91 REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: COVALENT BOND LENGTHS REMARK 500 REMARK 500 THE STEREOCHEMICAL PARAMETERS OF THE FOLLOWING RESIDUES REMARK 500 HAVE VALUES WHICH DEVIATE FROM EXPECTED VALUES BY MORE REMARK 500 THAN 6*RMSD (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 500 IDENTIFIER; SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT: (10X,I3,1X,2(A3,1X,A1,I4,A1,1X,A4,3X),F6.3) REMARK 500 REMARK 500 EXPECTED VALUES: ENGH AND HUBER, 1991 REMARK 500 REMARK 500 M RES CSSEQI ATM1 RES CSSEQI ATM2 DEVIATION REMARK 500 ILE A 83 C ALA A 84 N 0.165 REMARK 500 GLU A 110 C ALA A 111 N 0.409 REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: COVALENT BOND ANGLES REMARK 500 REMARK 500 THE STEREOCHEMICAL PARAMETERS OF THE FOLLOWING RESIDUES REMARK 500 HAVE VALUES WHICH DEVIATE FROM EXPECTED VALUES BY MORE REMARK 500 THAN 6*RMSD (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 500 IDENTIFIER; SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT: (10X,I3,1X,A3,1X,A1,I4,A1,3(1X,A4,2X),12X,F5.1) REMARK 500 REMARK 500 EXPECTED VALUES: ENGH AND HUBER, 1991 REMARK 500 REMARK 500 M RES CSSEQI ATM1 ATM2 ATM3 REMARK 500 PRO A 70 CA - C - N ANGL. DEV. = 31.6 DEGREES REMARK 500 PRO A 70 O - C - N ANGL. DEV. =-90.0 DEGREES REMARK 500 ILE A 83 O - C - N ANGL. DEV. =-44.1 DEGREES REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: TORSION ANGLES REMARK 500 REMARK 500 TORSION ANGLES OUTSIDE THE EXPECTED RAMACHANDRAN REGIONS: REMARK 500 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; REMARK 500 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). 
REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT:(10X,I3,1X,A3,1X,A1,I4,A1,4X,F7.2,3X,F7.2) REMARK 500 REMARK 500 M RES CSSEQI PSI PHI REMARK 500 SER A 71 -31.86 56.56 REMARK 500 VAL A 85 -95.80 78.36 REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: NON-CIS, NON-TRANS REMARK 500 REMARK 500 THE FOLLOWING PEPTIDE BONDS DEVIATE SIGNIFICANTLY FROM BOTH REMARK 500 CIS AND TRANS CONFORMATION. CIS BONDS, IF ANY, ARE LISTED REMARK 500 ON CISPEP RECORDS. TRANS IS DEFINED AS 180 +/- 30 AND REMARK 500 CIS IS DEFINED AS 0 +/- 30 DEGREES. REMARK 500 MODEL OMEGA REMARK 500 GLU A 110 ALA A 111 137.31 REMARK 525 REMARK 525 SOLVENT REMARK 525 THE FOLLOWING SOLVENT MOLECULES LIE FARTHER THAN EXPECTED REMARK 525 FROM THE PROTEIN OR NUCLEIC ACID MOLECULE AND MAY BE REMARK 525 ASSOCIATED WITH A SYMMETRY RELATED MOLECULE (M=MODEL REMARK 525 NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE REMARK 525 NUMBER; I=INSERTION CODE): REMARK 525 REMARK 525 M RES CSSEQI REMARK 525 HOH 225 DISTANCE = 5.67 ANGSTROMS REMARK 525 HOH 244 DISTANCE = 8.73 ANGSTROMS REMARK 525 HOH 248 DISTANCE = 7.30 ANGSTROMS REMARK 525 HOH 252 DISTANCE = 6.32 ANGSTROMS REMARK 525 HOH 253 DISTANCE = 5.65 ANGSTROMS REMARK 525 HOH 254 DISTANCE = 5.68 ANGSTROMS REMARK 525 HOH 267 DISTANCE = 6.29 ANGSTROMS REMARK 525 HOH 269 DISTANCE = 6.10 ANGSTROMS DBREF 4TSV A 8 157 UNP P01375 TNFA_HUMAN 84 233 SEQADV 4TSV SER A 29 UNP P01375 LEU 105 ENGINEERED SEQADV 4TSV ILE A 52 UNP P01375 SER 128 ENGINEERED SEQADV 4TSV PHE A 56 UNP P01375 TYR 132 ENGINEERED SEQRES 1 A 150 PRO SER ASP LYS PRO VAL ALA HIS VAL VAL ALA ASN PRO SEQRES 2 A 150 GLN ALA GLU GLY GLN LEU GLN TRP SER ASN ARG ARG ALA SEQRES 3 A 150 ASN ALA LEU LEU ALA ASN GLY VAL GLU LEU ARG ASP ASN SEQRES 4 A 150 GLN LEU VAL VAL PRO ILE GLU GLY LEU PHE LEU ILE TYR SEQRES 5 A 150 SER GLN VAL LEU PHE LYS GLY GLN GLY CYS PRO SER THR SEQRES 6 A 150 HIS VAL LEU LEU THR HIS THR ILE SER ARG ILE ALA VAL SEQRES 7 A 150 SER TYR GLN THR LYS VAL ASN 
LEU LEU SER ALA ILE LYS SEQRES 8 A 150 SER PRO CYS GLN ARG GLU THR PRO GLU GLY ALA GLU ALA SEQRES 9 A 150 LYS PRO TRP TYR GLU PRO ILE TYR LEU GLY GLY VAL PHE SEQRES 10 A 150 GLN LEU GLU LYS GLY ASP ARG LEU SER ALA GLU ILE ASN SEQRES 11 A 150 ARG PRO ASP TYR LEU ASP PHE ALA GLU SER GLY GLN VAL SEQRES 12 A 150 TYR PHE GLY ILE ILE ALA LEU FORMUL 2 HOH *70(H2 O) HELIX 1 1 PRO A 139 TYR A 141 5 3 SHEET 1 A 5 LEU A 36 ALA A 38 0 SHEET 2 A 5 VAL A 13 VAL A 17 -1 N HIS A 15 O LEU A 36 SHEET 3 A 5 TYR A 151 ALA A 156 -1 N ILE A 154 O ALA A 14 SHEET 4 A 5 GLY A 54 GLN A 67 -1 N GLN A 61 O TYR A 151 SHEET 5 A 5 PRO A 113 LEU A 126 -1 N LEU A 126 O GLY A 54 SHEET 1 B 2 GLU A 42 ARG A 44 0 SHEET 2 B 2 GLN A 47 VAL A 49 -1 N VAL A 49 O GLU A 42 SHEET 1 C 3 ARG A 131 ILE A 136 0 SHEET 2 C 3 LEU A 75 ILE A 83 -1 N ILE A 83 O ARG A 131 SHEET 3 C 3 LYS A 90 SER A 99 -1 N LYS A 98 O LEU A 76 SSBOND 1 CYS A 69 CYS A 101 CRYST1 66.720 66.720 85.090 90.00 90.00 120.00 H 3 9 ORIGX1 1.000000 0.000000 0.000000 0.00000 ORIGX2 0.000000 1.000000 0.000000 0.00000 ORIGX3 0.000000 0.000000 1.000000 0.00000 SCALE1 0.014988 0.008653 0.000000 0.00000 SCALE2 0.000000 0.017307 0.000000 0.00000 SCALE3 0.000000 0.000000 0.011752 0.00000 ATOM 1 N ASP A 10 143.224 41.252 20.262 1.00 33.32 N ATOM 2 CA ASP A 10 143.865 39.983 20.718 1.00 33.75 C ATOM 3 C ASP A 10 143.660 38.778 19.790 1.00 32.25 C ATOM 4 O ASP A 10 144.253 37.736 20.022 1.00 37.02 O ATOM 5 CB ASP A 10 143.405 39.643 22.148 1.00 35.94 C ATOM 6 N LYS A 11 142.780 38.905 18.793 1.00 32.19 N ATOM 7 CA LYS A 11 142.495 37.859 17.785 1.00 26.76 C ATOM 8 C LYS A 11 143.600 37.697 16.725 1.00 22.39 C ATOM 9 O LYS A 11 144.151 38.698 16.231 1.00 18.85 O ATOM 10 CB LYS A 11 141.249 38.245 16.966 1.00 28.02 C ATOM 11 CG LYS A 11 139.941 38.456 17.713 1.00 32.27 C ATOM 12 CD LYS A 11 138.765 38.699 16.745 1.00 34.99 C ATOM 13 CE LYS A 11 137.481 38.943 17.534 1.00 40.00 C ATOM 14 NZ LYS A 11 136.210 38.723 16.752 1.00 43.13 N ATOM 15 N PRO A 12 
143.913 36.445 16.338 1.00 19.62 N ATOM 16 CA PRO A 12 144.939 36.256 15.304 1.00 15.56 C ATOM 17 C PRO A 12 144.367 36.883 14.027 1.00 14.16 C ATOM 18 O PRO A 12 143.174 36.739 13.743 1.00 13.12 O ATOM 19 CB PRO A 12 144.999 34.740 15.151 1.00 16.36 C ATOM 20 CG PRO A 12 144.721 34.276 16.535 1.00 18.42 C ATOM 21 CD PRO A 12 143.561 35.158 16.956 1.00 18.73 C ATOM 22 N VAL A 13 145.192 37.598 13.283 1.00 12.41 N ATOM 23 CA VAL A 13 144.717 38.227 12.076 1.00 13.30 C ATOM 24 C VAL A 13 145.859 38.454 11.090 1.00 14.05 C ATOM 25 O VAL A 13 147.026 38.654 11.472 1.00 12.10 O ATOM 26 CB VAL A 13 144.026 39.573 12.439 1.00 17.06 C ATOM 27 CG1 VAL A 13 145.024 40.511 13.176 1.00 19.48 C ATOM 28 CG2 VAL A 13 143.470 40.256 11.238 1.00 13.49 C ATOM 29 N ALA A 14 145.510 38.434 9.800 1.00 13.51 N ATOM 30 CA ALA A 14 146.461 38.682 8.731 1.00 10.53 C ATOM 31 C ALA A 14 145.780 39.225 7.506 1.00 8.76 C ATOM 32 O ALA A 14 144.642 38.862 7.208 1.00 7.01 O ATOM 33 CB ALA A 14 147.243 37.395 8.355 1.00 12.75 C ATOM 34 N HIS A 15 146.389 40.256 6.945 1.00 11.87 N ATOM 35 CA HIS A 15 145.956 40.816 5.673 1.00 10.59 C ATOM 36 C HIS A 15 147.262 41.122 4.955 1.00 12.24 C ATOM 37 O HIS A 15 147.978 42.069 5.283 1.00 10.64 O ATOM 38 CB HIS A 15 145.049 42.045 5.772 1.00 10.41 C ATOM 39 CG HIS A 15 144.611 42.572 4.429 1.00 8.60 C ATOM 40 ND1 HIS A 15 143.476 42.127 3.787 1.00 8.51 N ATOM 41 CD2 HIS A 15 145.152 43.508 3.612 1.00 8.05 C ATOM 42 CE1 HIS A 15 143.332 42.764 2.640 1.00 9.72 C ATOM 43 NE2 HIS A 15 144.340 43.608 2.508 1.00 9.91 N ATOM 44 N VAL A 16 147.608 40.240 4.037 1.00 11.98 N ATOM 45 CA VAL A 16 148.826 40.371 3.274 1.00 12.24 C ATOM 46 C VAL A 16 148.525 40.670 1.799 1.00 12.12 C ATOM 47 O VAL A 16 147.447 40.332 1.280 1.00 7.76 O ATOM 48 CB VAL A 16 149.758 39.126 3.505 1.00 10.30 C ATOM 49 CG1 VAL A 16 150.012 38.973 5.008 1.00 12.61 C ATOM 50 CG2 VAL A 16 149.132 37.810 2.973 1.00 11.85 C ATOM 51 N VAL A 17 149.459 41.366 1.160 1.00 12.95 N ATOM 52 CA VAL A 17 
149.331 41.748 -0.237 1.00 13.09 C ATOM 53 C VAL A 17 150.446 41.190 -1.128 1.00 11.45 C ATOM 54 O VAL A 17 151.493 40.738 -0.654 1.00 13.18 O ATOM 55 CB VAL A 17 149.341 43.306 -0.382 1.00 13.98 C ATOM 56 CG1 VAL A 17 148.215 43.928 0.446 1.00 14.42 C ATOM 57 CG2 VAL A 17 150.683 43.863 0.050 1.00 15.72 C ATOM 58 N ALA A 18 150.185 41.181 -2.426 1.00 14.79 N ATOM 59 CA ALA A 18 151.175 40.742 -3.388 1.00 15.88 C ATOM 60 C ALA A 18 152.362 41.709 -3.283 1.00 15.79 C ATOM 61 O ALA A 18 152.204 42.931 -3.133 1.00 17.49 O ATOM 62 CB ALA A 18 150.574 40.779 -4.812 1.00 12.09 C ATOM 63 N ASN A 19 153.552 41.153 -3.385 1.00 17.64 N ATOM 64 CA ASN A 19 154.773 41.948 -3.338 1.00 20.04 C ATOM 65 C ASN A 19 155.024 42.328 -4.821 1.00 20.85 C ATOM 66 O ASN A 19 155.319 41.474 -5.636 1.00 21.96 O ATOM 67 CB ASN A 19 155.898 41.077 -2.738 1.00 20.94 C ATOM 68 CG ASN A 19 157.279 41.714 -2.844 1.00 25.71 C ATOM 69 OD1 ASN A 19 157.469 42.766 -3.478 1.00 23.72 O ATOM 70 ND2 ASN A 19 158.255 41.065 -2.235 1.00 26.30 N ATOM 71 N PRO A 20 154.883 43.610 -5.183 1.00 22.12 N ATOM 72 CA PRO A 20 155.100 44.040 -6.574 1.00 28.28 C ATOM 73 C PRO A 20 156.542 43.933 -7.135 1.00 31.58 C ATOM 74 O PRO A 20 156.775 44.203 -8.322 1.00 33.79 O ATOM 75 CB PRO A 20 154.555 45.474 -6.567 1.00 28.83 C ATOM 76 CG PRO A 20 154.906 45.946 -5.200 1.00 28.31 C ATOM 77 CD PRO A 20 154.548 44.759 -4.324 1.00 23.42 C ATOM 78 N GLN A 21 157.495 43.551 -6.281 1.00 33.56 N ATOM 79 CA GLN A 21 158.887 43.374 -6.686 1.00 35.60 C ATOM 80 C GLN A 21 159.118 41.910 -7.067 1.00 34.82 C ATOM 81 O GLN A 21 160.213 41.556 -7.509 1.00 37.51 O ATOM 82 CB GLN A 21 159.874 43.764 -5.564 1.00 39.05 C ATOM 83 CG GLN A 21 159.747 45.192 -5.013 1.00 44.97 C ATOM 84 CD GLN A 21 160.081 46.289 -6.035 1.00 50.39 C ATOM 85 OE1 GLN A 21 159.534 46.329 -7.148 1.00 53.45 O ATOM 86 NE2 GLN A 21 160.967 47.201 -5.645 1.00 50.90 N ATOM 87 N ALA A 22 158.136 41.050 -6.807 1.00 31.14 N ATOM 88 CA ALA A 22 158.255 39.647 -7.170 1.00 
28.99 C ATOM 89 C ALA A 22 157.710 39.539 -8.585 1.00 31.71 C ATOM 90 O ALA A 22 156.514 39.725 -8.826 1.00 34.93 O ATOM 91 CB ALA A 22 157.475 38.772 -6.233 1.00 26.11 C ATOM 92 N GLU A 23 158.618 39.349 -9.538 1.00 32.39 N ATOM 93 CA GLU A 23 158.256 39.235 -10.956 1.00 27.89 C ATOM 94 C GLU A 23 157.829 37.834 -11.272 1.00 24.65 C ATOM 95 O GLU A 23 158.553 36.914 -10.958 1.00 23.15 O ATOM 96 CB GLU A 23 159.425 39.592 -11.820 1.00 26.66 C ATOM 97 N GLY A 24 156.636 37.670 -11.841 1.00 26.46 N ATOM 98 CA GLY A 24 156.138 36.345 -12.200 1.00 25.24 C ATOM 99 C GLY A 24 155.941 35.332 -11.077 1.00 26.12 C ATOM 100 O GLY A 24 156.214 34.138 -11.235 1.00 23.73 O ATOM 101 N GLN A 25 155.499 35.812 -9.922 1.00 24.28 N ATOM 102 CA GLN A 25 155.222 34.939 -8.812 1.00 22.78 C ATOM 103 C GLN A 25 154.237 35.683 -7.928 1.00 21.14 C ATOM 104 O GLN A 25 154.180 36.915 -7.953 1.00 23.46 O ATOM 105 CB GLN A 25 156.496 34.657 -8.046 1.00 23.59 C ATOM 106 N LEU A 26 153.410 34.948 -7.195 1.00 20.41 N ATOM 107 CA LEU A 26 152.500 35.586 -6.248 1.00 17.69 C ATOM 108 C LEU A 26 153.184 35.466 -4.888 1.00 17.34 C ATOM 109 O LEU A 26 153.082 34.432 -4.220 1.00 19.82 O ATOM 110 CB LEU A 26 151.170 34.867 -6.224 1.00 16.13 C ATOM 111 CG LEU A 26 150.126 35.362 -5.230 1.00 15.88 C ATOM 112 CD1 LEU A 26 150.134 36.851 -5.096 1.00 15.60 C ATOM 113 CD2 LEU A 26 148.790 34.823 -5.621 1.00 15.18 C ATOM 114 N GLN A 27 153.940 36.490 -4.514 1.00 20.61 N ATOM 115 CA GLN A 27 154.654 36.517 -3.229 1.00 19.35 C ATOM 116 C GLN A 27 153.908 37.402 -2.240 1.00 16.18 C ATOM 117 O GLN A 27 153.790 38.615 -2.464 1.00 16.65 O ATOM 118 CB GLN A 27 156.029 37.095 -3.466 1.00 23.85 C ATOM 119 CG GLN A 27 157.104 36.564 -2.576 1.00 34.07 C ATOM 120 CD GLN A 27 158.455 37.180 -2.955 1.00 41.80 C ATOM 121 OE1 GLN A 27 158.911 38.167 -2.331 1.00 43.70 O ATOM 122 NE2 GLN A 27 159.060 36.665 -4.047 1.00 44.39 N ATOM 123 N TRP A 28 153.391 36.813 -1.169 1.00 14.32 N ATOM 124 CA TRP A 28 152.628 37.579 -0.167 1.00 
13.78 C ATOM 125 C TRP A 28 153.525 38.388 0.767 1.00 11.87 C ATOM 126 O TRP A 28 154.602 37.927 1.159 1.00 15.31 O ATOM 127 CB TRP A 28 151.710 36.649 0.656 1.00 14.44 C ATOM 128 CG TRP A 28 150.707 35.960 -0.160 1.00 14.13 C ATOM 129 CD1 TRP A 28 150.592 34.615 -0.356 1.00 12.17 C ATOM 130 CD2 TRP A 28 149.658 36.579 -0.907 1.00 14.18 C ATOM 131 NE1 TRP A 28 149.515 34.357 -1.175 1.00 14.75 N ATOM 132 CE2 TRP A 28 148.920 35.544 -1.533 1.00 11.76 C ATOM 133 CE3 TRP A 28 149.263 37.910 -1.111 1.00 14.41 C ATOM 134 CZ2 TRP A 28 147.804 35.791 -2.357 1.00 11.06 C ATOM 135 CZ3 TRP A 28 148.151 38.166 -1.929 1.00 14.92 C ATOM 136 CH2 TRP A 28 147.435 37.101 -2.542 1.00 12.82 C ATOM 137 N SER A 29 153.060 39.567 1.170 1.00 12.87 N ATOM 138 CA SER A 29 153.850 40.429 2.060 1.00 12.86 C ATOM 139 C SER A 29 153.003 41.246 2.997 1.00 14.62 C ATOM 140 O SER A 29 151.941 41.729 2.611 1.00 14.58 O ATOM 141 CB SER A 29 154.662 41.400 1.240 1.00 14.50 C ATOM 142 OG SER A 29 155.469 42.187 2.091 1.00 19.88 O ATOM 143 N ASN A 30 153.477 41.396 4.238 1.00 16.28 N ATOM 144 CA ASN A 30 152.767 42.192 5.250 1.00 18.59 C ATOM 145 C ASN A 30 153.471 43.532 5.442 1.00 19.03 C ATOM 146 O ASN A 30 153.075 44.316 6.302 1.00 19.54 O ATOM 147 CB ASN A 30 152.792 41.475 6.599 1.00 14.38 C ATOM 148 CG ASN A 30 154.177 41.267 7.079 1.00 16.24 C ATOM 149 OD1 ASN A 30 154.952 40.533 6.461 1.00 20.35 O ATOM 150 ND2 ASN A 30 154.555 41.985 8.114 1.00 18.77 N ATOM 151 N ARG A 31 154.498 43.786 4.628 1.00 21.43 N ATOM 152 CA ARG A 31 155.308 44.991 4.755 1.00 25.65 C ATOM 153 C ARG A 31 154.732 46.101 3.932 1.00 26.84 C ATOM 154 O ARG A 31 155.339 46.565 2.973 1.00 26.52 O ATOM 155 CB ARG A 31 156.752 44.731 4.340 1.00 25.24 C ATOM 156 CG ARG A 31 157.279 43.390 4.733 1.00 29.66 C ATOM 157 CD ARG A 31 158.281 43.486 5.820 1.00 34.97 C ATOM 158 NE ARG A 31 157.612 43.458 7.097 1.00 37.26 N ATOM 159 CZ ARG A 31 158.207 43.193 8.248 1.00 35.45 C ATOM 160 NH1 ARG A 31 159.501 42.932 8.318 1.00 34.87 N 
ATOM 161 NH2 ARG A 31 157.480 43.164 9.336 1.00 40.35 N ATOM 162 N ARG A 32 153.553 46.533 4.345 1.00 26.97 N ATOM 163 CA ARG A 32 152.838 47.585 3.682 1.00 24.26 C ATOM 164 C ARG A 32 151.980 48.292 4.746 1.00 23.68 C ATOM 165 O ARG A 32 151.603 47.701 5.759 1.00 19.33 O ATOM 166 CB ARG A 32 151.953 46.924 2.610 1.00 27.52 C ATOM 167 CG ARG A 32 151.174 47.874 1.742 1.00 34.83 C ATOM 168 CD ARG A 32 150.779 47.201 0.458 1.00 40.41 C ATOM 169 NE ARG A 32 151.418 47.850 -0.681 1.00 49.54 N ATOM 170 CZ ARG A 32 152.454 47.358 -1.355 1.00 55.11 C ATOM 171 NH1 ARG A 32 152.992 46.185 -1.022 1.00 58.48 N ATOM 172 NH2 ARG A 32 152.977 48.069 -2.347 1.00 57.80 N ATOM 173 N ALA A 33 151.742 49.584 4.584 1.00 20.12 N ATOM 174 CA ALA A 33 150.844 50.231 5.510 1.00 21.21 C ATOM 175 C ALA A 33 149.520 49.577 5.018 1.00 22.20 C ATOM 176 O ALA A 33 149.435 49.140 3.854 1.00 24.03 O ATOM 177 CB ALA A 33 150.853 51.755 5.284 1.00 19.68 C ATOM 178 N ASN A 34 148.521 49.485 5.894 1.00 19.97 N ATOM 179 CA ASN A 34 147.229 48.860 5.591 1.00 16.85 C ATOM 180 C ASN A 34 147.285 47.321 5.352 1.00 14.45 C ATOM 181 O ASN A 34 146.436 46.755 4.670 1.00 15.47 O ATOM 182 CB ASN A 34 146.515 49.608 4.445 1.00 19.98 C ATOM 183 CG ASN A 34 146.272 51.098 4.763 1.00 22.23 C ATOM 184 OD1 ASN A 34 145.660 51.419 5.773 1.00 24.36 O ATOM 185 ND2 ASN A 34 146.724 51.997 3.888 1.00 20.16 N ATOM 186 N ALA A 35 148.330 46.668 5.854 1.00 13.25 N ATOM 187 CA ALA A 35 148.461 45.212 5.782 1.00 11.96 C ATOM 188 C ALA A 35 148.664 44.812 7.252 1.00 12.70 C ATOM 189 O ALA A 35 149.051 45.662 8.057 1.00 13.40 O ATOM 190 CB ALA A 35 149.646 44.796 4.942 1.00 11.98 C ATOM 191 N LEU A 36 148.331 43.572 7.608 1.00 13.11 N ATOM 192 CA LEU A 36 148.432 43.072 8.990 1.00 16.33 C ATOM 193 C LEU A 36 149.007 41.657 9.089 1.00 14.41 C ATOM 194 O LEU A 36 148.816 40.842 8.203 1.00 11.82 O ATOM 195 CB LEU A 36 147.027 42.994 9.613 1.00 21.14 C ATOM 196 CG LEU A 36 146.051 44.159 9.469 1.00 22.22 C ATOM 197 CD1 LEU A 36 
144.638 43.720 9.638 1.00 23.27 C ATOM 198 CD2 LEU A 36 146.431 45.218 10.518 1.00 26.78 C ATOM 199 N LEU A 37 149.682 41.364 10.196 1.00 14.13 N ATOM 200 CA LEU A 37 150.216 40.024 10.472 1.00 15.60 C ATOM 201 C LEU A 37 150.462 40.021 11.993 1.00 18.45 C ATOM 202 O LEU A 37 151.615 40.157 12.445 1.00 19.89 O ATOM 203 CB LEU A 37 151.517 39.794 9.702 1.00 15.71 C ATOM 204 CG LEU A 37 151.954 38.317 9.650 1.00 16.03 C ATOM 205 CD1 LEU A 37 150.932 37.490 8.815 1.00 17.74 C ATOM 206 CD2 LEU A 37 153.339 38.183 9.008 1.00 18.39 C ATOM 207 N ALA A 38 149.367 39.871 12.760 1.00 20.95 N ATOM 208 CA ALA A 38 149.377 39.954 14.232 1.00 19.69 C ATOM 209 C ALA A 38 148.858 38.784 15.021 1.00 22.65 C ATOM 210 O ALA A 38 148.229 37.889 14.493 1.00 25.21 O ATOM 211 CB ALA A 38 148.591 41.206 14.669 1.00 18.54 C ATOM 212 N ASN A 39 149.155 38.802 16.314 1.00 22.40 N ATOM 213 CA ASN A 39 148.675 37.788 17.238 1.00 18.76 C ATOM 214 C ASN A 39 148.899 36.310 16.918 1.00 18.30 C ATOM 215 O ASN A 39 147.995 35.491 17.128 1.00 19.70 O ATOM 216 CB ASN A 39 147.183 38.019 17.568 1.00 17.42 C ATOM 217 CG ASN A 39 146.926 39.368 18.160 1.00 19.10 C ATOM 218 OD1 ASN A 39 147.650 39.818 19.021 1.00 21.87 O ATOM 219 ND2 ASN A 39 145.891 40.026 17.707 1.00 23.83 N ATOM 220 N GLY A 40 150.073 35.966 16.391 1.00 17.34 N ATOM 221 CA GLY A 40 150.389 34.563 16.152 1.00 17.19 C ATOM 222 C GLY A 40 150.386 34.016 14.747 1.00 18.10 C ATOM 223 O GLY A 40 150.978 32.957 14.470 1.00 16.88 O ATOM 224 N VAL A 41 149.655 34.678 13.862 1.00 18.73 N ATOM 225 CA VAL A 41 149.625 34.217 12.485 1.00 20.10 C ATOM 226 C VAL A 41 151.021 34.446 11.898 1.00 20.46 C ATOM 227 O VAL A 41 151.597 35.525 12.064 1.00 25.52 O ATOM 228 CB VAL A 41 148.589 34.965 11.667 1.00 14.08 C ATOM 229 CG1 VAL A 41 148.546 34.394 10.275 1.00 18.10 C ATOM 230 CG2 VAL A 41 147.261 34.845 12.314 1.00 11.27 C ATOM 231 N GLU A 42 151.569 33.418 11.252 1.00 21.47 N ATOM 232 CA GLU A 42 152.903 33.483 10.663 1.00 21.65 C ATOM 233 C GLU A 
42 152.840 33.360 9.172 1.00 20.06 C ATOM 234 O GLU A 42 152.000 32.644 8.659 1.00 22.92 O ATOM 235 CB GLU A 42 153.764 32.312 11.138 1.00 27.01 C ATOM 236 CG GLU A 42 154.236 32.372 12.566 1.00 36.48 C ATOM 237 CD GLU A 42 155.124 31.170 12.935 1.00 42.54 C ATOM 238 OE1 GLU A 42 155.050 30.094 12.257 1.00 42.73 O ATOM 239 OE2 GLU A 42 155.897 31.312 13.918 1.00 46.54 O ATOM 240 N LEU A 43 153.755 34.032 8.492 1.00 19.87 N ATOM 241 CA LEU A 43 153.879 33.946 7.048 1.00 19.12 C ATOM 242 C LEU A 43 155.203 33.138 6.918 1.00 20.37 C ATOM 243 O LEU A 43 156.279 33.616 7.269 1.00 22.76 O ATOM 244 CB LEU A 43 153.947 35.346 6.397 1.00 16.01 C ATOM 245 CG LEU A 43 153.795 35.429 4.876 1.00 19.75 C ATOM 246 CD1 LEU A 43 152.538 34.734 4.436 1.00 20.57 C ATOM 247 CD2 LEU A 43 153.747 36.882 4.422 1.00 20.33 C ATOM 248 N ARG A 44 155.065 31.866 6.559 1.00 21.56 N ATOM 249 CA ARG A 44 156.147 30.911 6.389 1.00 24.64 C ATOM 250 C ARG A 44 155.988 30.273 5.011 1.00 24.69 C ATOM 251 O ARG A 44 154.906 29.807 4.664 1.00 22.92 O ATOM 252 CB ARG A 44 156.003 29.820 7.449 1.00 29.62 C ATOM 253 CG ARG A 44 156.899 30.045 8.623 1.00 38.09 C ATOM 254 CD ARG A 44 156.849 28.938 9.690 1.00 43.38 C ATOM 255 NE ARG A 44 157.905 29.133 10.700 1.00 49.92 N ATOM 256 CZ ARG A 44 158.219 30.300 11.279 1.00 53.06 C ATOM 257 NH1 ARG A 44 157.565 31.422 10.965 1.00 52.63 N ATOM 258 NH2 ARG A 44 159.186 30.349 12.199 1.00 53.67 N ATOM 259 N ASP A 45 157.063 30.196 4.241 1.00 25.43 N ATOM 260 CA ASP A 45 157.005 29.598 2.889 1.00 26.80 C ATOM 261 C ASP A 45 155.834 30.148 2.060 1.00 24.86 C ATOM 262 O ASP A 45 155.170 29.376 1.370 1.00 24.60 O ATOM 263 CB ASP A 45 156.816 28.048 2.860 1.00 29.11 C ATOM 264 CG ASP A 45 157.497 27.297 3.990 1.00 37.23 C ATOM 265 OD1 ASP A 45 158.709 27.511 4.235 1.00 39.94 O ATOM 266 OD2 ASP A 45 156.799 26.430 4.585 1.00 39.20 O ATOM 267 N ASN A 46 155.548 31.438 2.144 1.00 23.46 N ATOM 268 CA ASN A 46 154.447 32.027 1.383 1.00 20.18 C ATOM 269 C ASN A 46 153.062 
31.519 1.816 1.00 19.44 C ATOM 270 O ASN A 46 152.076 31.760 1.116 1.00 18.47 O ATOM 271 CB ASN A 46 154.654 31.807 -0.139 1.00 20.87 C ATOM 272 CG ASN A 46 154.067 32.948 -0.999 1.00 18.87 C ATOM 273 OD1 ASN A 46 154.167 34.116 -0.655 1.00 17.15 O ATOM 274 ND2 ASN A 46 153.461 32.591 -2.121 1.00 17.43 N ATOM 275 N GLN A 47 152.997 30.892 2.993 1.00 17.36 N ATOM 276 CA GLN A 47 151.751 30.341 3.546 1.00 18.22 C ATOM 277 C GLN A 47 151.418 30.963 4.886 1.00 15.93 C ATOM 278 O GLN A 47 152.304 31.472 5.541 1.00 14.38 O ATOM 279 CB GLN A 47 151.865 28.838 3.739 1.00 15.71 C ATOM 280 CG GLN A 47 152.136 28.051 2.462 1.00 22.03 C ATOM 281 CD GLN A 47 152.347 26.558 2.722 1.00 25.55 C ATOM 282 OE1 GLN A 47 151.439 25.763 2.537 1.00 32.21 O ATOM 283 NE2 GLN A 47 153.537 26.184 3.181 1.00 31.59 N ATOM 284 N LEU A 48 150.132 30.986 5.256 1.00 17.41 N ATOM 285 CA LEU A 48 149.675 31.543 6.536 1.00 16.58 C ATOM 286 C LEU A 48 149.446 30.373 7.473 1.00 16.02 C ATOM 287 O LEU A 48 148.699 29.451 7.163 1.00 16.21 O ATOM 288 CB LEU A 48 148.391 32.380 6.395 1.00 12.90 C ATOM 289 CG LEU A 48 148.439 33.713 5.627 1.00 14.27 C ATOM 290 CD1 LEU A 48 147.083 34.410 5.668 1.00 15.87 C ATOM 291 CD2 LEU A 48 149.469 34.620 6.205 1.00 13.62 C ATOM 292 N VAL A 49 150.164 30.398 8.591 1.00 16.03 N ATOM 293 CA VAL A 49 150.129 29.354 9.608 1.00 15.49 C ATOM 294 C VAL A 49 149.208 29.776 10.758 1.00 15.64 C ATOM 295 O VAL A 49 149.358 30.870 11.290 1.00 15.29 O ATOM 296 CB VAL A 49 151.574 29.100 10.119 1.00 16.30 C ATOM 297 CG1 VAL A 49 151.635 27.848 10.990 1.00 16.98 C ATOM 298 CG2 VAL A 49 152.522 28.952 8.958 1.00 18.92 C ATOM 299 N VAL A 50 148.174 28.987 11.048 1.00 18.05 N ATOM 300 CA VAL A 50 147.234 29.307 12.139 1.00 18.82 C ATOM 301 C VAL A 50 147.789 28.999 13.550 1.00 16.51 C ATOM 302 O VAL A 50 148.261 27.902 13.837 1.00 15.43 O ATOM 303 CB VAL A 50 145.837 28.635 11.908 1.00 20.27 C ATOM 304 CG1 VAL A 50 145.899 27.163 12.056 1.00 19.66 C ATOM 305 CG2 VAL A 50 144.862 29.144 
12.865 1.00 24.11 C ATOM 306 N PRO A 51 147.758 29.992 14.433 1.00 18.89 N ATOM 307 CA PRO A 51 148.241 29.943 15.827 1.00 20.91 C ATOM 308 C PRO A 51 147.393 29.238 16.885 1.00 21.62 C ATOM 309 O PRO A 51 147.878 28.984 17.995 1.00 21.70 O ATOM 310 CB PRO A 51 148.402 31.425 16.179 1.00 20.57 C ATOM 311 CG PRO A 51 147.274 32.080 15.392 1.00 19.04 C ATOM 312 CD PRO A 51 147.365 31.364 14.048 1.00 21.27 C ATOM 313 N ILE A 52 146.132 28.957 16.555 1.00 21.33 N ATOM 314 CA ILE A 52 145.185 28.336 17.477 1.00 23.54 C ATOM 315 C ILE A 52 144.044 27.704 16.697 1.00 23.32 C ATOM 316 O ILE A 52 143.750 28.092 15.573 1.00 23.21 O ATOM 317 CB ILE A 52 144.527 29.393 18.416 1.00 26.69 C ATOM 318 CG1 ILE A 52 143.930 30.530 17.569 1.00 27.85 C ATOM 319 CG2 ILE A 52 145.529 29.926 19.459 1.00 25.90 C ATOM 320 CD1 ILE A 52 143.074 31.516 18.337 1.00 29.67 C ATOM 321 N GLU A 53 143.397 26.728 17.297 1.00 24.87 N ATOM 322 CA GLU A 53 142.282 26.084 16.635 1.00 27.88 C ATOM 323 C GLU A 53 141.106 27.031 16.790 1.00 26.53 C ATOM 324 O GLU A 53 141.071 27.792 17.759 1.00 29.04 O ATOM 325 CB GLU A 53 142.017 24.754 17.313 1.00 30.38 C ATOM 326 CG GLU A 53 140.839 24.010 16.796 1.00 37.75 C ATOM 327 CD GLU A 53 140.201 23.232 17.897 1.00 42.92 C ATOM 328 OE1 GLU A 53 140.744 22.148 18.226 1.00 41.02 O ATOM 329 OE2 GLU A 53 139.197 23.750 18.456 1.00 48.72 O ATOM 330 N GLY A 54 140.203 27.068 15.811 1.00 24.62 N ATOM 331 CA GLY A 54 139.053 27.964 15.898 1.00 20.71 C ATOM 332 C GLY A 54 138.336 28.176 14.575 1.00 15.54 C ATOM 333 O GLY A 54 138.655 27.517 13.610 1.00 17.51 O ATOM 334 N LEU A 55 137.346 29.052 14.544 1.00 17.03 N ATOM 335 CA LEU A 55 136.627 29.356 13.308 1.00 15.60 C ATOM 336 C LEU A 55 137.334 30.575 12.742 1.00 16.96 C ATOM 337 O LEU A 55 137.585 31.521 13.487 1.00 16.95 O ATOM 338 CB LEU A 55 135.182 29.747 13.603 1.00 17.07 C ATOM 339 CG LEU A 55 134.221 28.621 13.998 1.00 20.60 C ATOM 340 CD1 LEU A 55 132.880 29.188 14.500 1.00 23.33 C ATOM 341 CD2 LEU A 55 
134.003 27.708 12.818 1.00 19.59 C ATOM 342 N PHE A 56 137.696 30.539 11.448 1.00 16.39 N ATOM 343 CA PHE A 56 138.358 31.661 10.788 1.00 14.03 C ATOM 344 C PHE A 56 137.631 32.117 9.523 1.00 12.38 C ATOM 345 O PHE A 56 137.116 31.275 8.786 1.00 11.33 O ATOM 346 CB PHE A 56 139.772 31.282 10.335 1.00 12.51 C ATOM 347 CG PHE A 56 140.787 31.224 11.448 1.00 17.83 C ATOM 348 CD1 PHE A 56 140.890 30.089 12.262 1.00 20.70 C ATOM 349 CD2 PHE A 56 141.634 32.311 11.689 1.00 16.16 C ATOM 350 CE1 PHE A 56 141.811 30.045 13.298 1.00 16.83 C ATOM 351 CE2 PHE A 56 142.556 32.280 12.717 1.00 15.39 C ATOM 352 CZ PHE A 56 142.641 31.150 13.519 1.00 18.90 C ATOM 353 N LEU A 57 137.469 33.428 9.346 1.00 11.29 N ATOM 354 CA LEU A 57 136.955 33.938 8.076 1.00 10.66 C ATOM 355 C LEU A 57 138.249 34.015 7.214 1.00 10.32 C ATOM 356 O LEU A 57 139.232 34.611 7.642 1.00 13.78 O ATOM 357 CB LEU A 57 136.367 35.321 8.241 1.00 12.96 C ATOM 358 CG LEU A 57 136.038 35.976 6.877 1.00 18.10 C ATOM 359 CD1 LEU A 57 134.941 35.164 6.105 1.00 16.74 C ATOM 360 CD2 LEU A 57 135.552 37.418 7.076 1.00 16.36 C ATOM 361 N ILE A 58 138.251 33.422 6.024 1.00 9.30 N ATOM 362 CA ILE A 58 139.436 33.410 5.164 1.00 10.57 C ATOM 363 C ILE A 58 139.019 34.019 3.831 1.00 12.12 C ATOM 364 O ILE A 58 137.945 33.701 3.345 1.00 13.73 O ATOM 365 CB ILE A 58 139.885 31.969 4.927 1.00 14.53 C ATOM 366 CG1 ILE A 58 140.017 31.227 6.252 1.00 13.43 C ATOM 367 CG2 ILE A 58 141.213 31.968 4.213 1.00 19.55 C ATOM 368 CD1 ILE A 58 140.209 29.739 6.090 1.00 20.93 C ATOM 369 N TYR A 59 139.846 34.863 3.223 1.00 10.68 N ATOM 370 CA TYR A 59 139.450 35.494 1.959 1.00 9.28 C ATOM 371 C TYR A 59 140.647 35.894 1.058 1.00 9.65 C ATOM 372 O TYR A 59 141.806 35.932 1.491 1.00 9.24 O ATOM 373 CB TYR A 59 138.549 36.721 2.245 1.00 11.30 C ATOM 374 CG TYR A 59 139.284 37.830 2.996 1.00 6.97 C ATOM 375 CD1 TYR A 59 140.044 38.792 2.306 1.00 6.96 C ATOM 376 CD2 TYR A 59 139.300 37.868 4.400 1.00 10.69 C ATOM 377 CE1 TYR A 59 140.796 39.752 
3.009 1.00 8.24 C ATOM 378 CE2 TYR A 59 140.042 38.812 5.087 1.00 8.56 C ATOM 379 CZ TYR A 59 140.786 39.739 4.384 1.00 7.21 C ATOM 380 OH TYR A 59 141.562 40.615 5.080 1.00 11.01 O ATOM 381 N SER A 60 140.385 36.181 -0.205 1.00 9.01 N ATOM 382 CA SER A 60 141.476 36.559 -1.088 1.00 10.88 C ATOM 383 C SER A 60 140.928 37.103 -2.398 1.00 7.56 C ATOM 384 O SER A 60 139.832 36.737 -2.819 1.00 9.72 O ATOM 385 CB SER A 60 142.405 35.348 -1.368 1.00 6.90 C ATOM 386 OG SER A 60 143.481 35.664 -2.268 1.00 12.31 O ATOM 387 N GLN A 61 141.642 38.045 -2.982 1.00 8.65 N ATOM 388 CA GLN A 61 141.260 38.568 -4.277 1.00 9.33 C ATOM 389 C GLN A 61 142.531 38.566 -5.126 1.00 9.29 C ATOM 390 O GLN A 61 143.640 38.736 -4.607 1.00 7.39 O ATOM 391 CB GLN A 61 140.734 39.987 -4.159 1.00 6.88 C ATOM 392 CG GLN A 61 140.477 40.672 -5.497 1.00 12.16 C ATOM 393 CD GLN A 61 140.021 42.131 -5.377 1.00 8.16 C ATOM 394 OE1 GLN A 61 139.004 42.444 -4.757 1.00 12.48 O ATOM 395 NE2 GLN A 61 140.772 43.025 -6.013 1.00 10.82 N ATOM 396 N VAL A 62 142.358 38.265 -6.410 1.00 10.07 N ATOM 397 CA VAL A 62 143.423 38.307 -7.372 1.00 8.77 C ATOM 398 C VAL A 62 142.868 39.093 -8.531 1.00 9.55 C ATOM 399 O VAL A 62 141.665 39.068 -8.752 1.00 10.13 O ATOM 400 CB VAL A 62 143.817 36.916 -7.873 1.00 10.60 C ATOM 401 CG1 VAL A 62 144.564 36.172 -6.749 1.00 12.98 C ATOM 402 CG2 VAL A 62 142.594 36.155 -8.343 1.00 15.30 C ATOM 403 N LEU A 63 143.732 39.809 -9.240 1.00 11.16 N ATOM 404 CA LEU A 63 143.328 40.547 -10.417 1.00 12.38 C ATOM 405 C LEU A 63 144.270 40.186 -11.591 1.00 12.36 C ATOM 406 O LEU A 63 145.498 40.353 -11.512 1.00 9.63 O ATOM 407 CB LEU A 63 143.353 42.040 -10.136 1.00 12.38 C ATOM 408 CG LEU A 63 142.498 42.905 -11.052 1.00 14.68 C ATOM 409 CD1 LEU A 63 142.600 44.337 -10.522 1.00 18.87 C ATOM 410 CD2 LEU A 63 143.012 42.873 -12.413 1.00 20.69 C ATOM 411 N PHE A 64 143.690 39.708 -12.696 1.00 13.07 N ATOM 412 CA PHE A 64 144.466 39.327 -13.879 1.00 10.92 C ATOM 413 C PHE A 64 144.319 40.378 
-14.960 1.00 10.72 C ATOM 414 O PHE A 64 143.305 41.056 -15.073 1.00 10.15 O ATOM 415 CB PHE A 64 143.999 37.974 -14.424 1.00 13.06 C ATOM 416 CG PHE A 64 144.112 36.825 -13.425 1.00 15.95 C ATOM 417 CD1 PHE A 64 145.342 36.218 -13.145 1.00 14.39 C ATOM 418 CD2 PHE A 64 142.968 36.293 -12.836 1.00 11.76 C ATOM 419 CE1 PHE A 64 145.412 35.098 -12.310 1.00 14.59 C ATOM 420 CE2 PHE A 64 143.035 35.178 -12.005 1.00 12.78 C ATOM 421 CZ PHE A 64 144.255 34.583 -11.745 1.00 12.65 C ATOM 422 N LYS A 65 145.318 40.507 -15.789 1.00 12.13 N ATOM 423 CA LYS A 65 145.203 41.493 -16.823 1.00 14.96 C ATOM 424 C LYS A 65 145.825 40.938 -18.064 1.00 19.75 C ATOM 425 O LYS A 65 146.846 40.245 -17.968 1.00 17.88 O ATOM 426 CB LYS A 65 145.962 42.744 -16.428 1.00 17.82 C ATOM 427 CG LYS A 65 145.903 43.851 -17.457 1.00 21.84 C ATOM 428 CD LYS A 65 147.214 44.578 -17.555 1.00 23.46 C ATOM 429 CE LYS A 65 147.050 45.714 -18.538 1.00 27.61 C ATOM 430 NZ LYS A 65 147.858 46.916 -18.193 1.00 31.18 N ATOM 431 N GLY A 66 145.229 41.247 -19.222 1.00 20.48 N ATOM 432 CA GLY A 66 145.776 40.802 -20.501 1.00 27.18 C ATOM 433 C GLY A 66 145.701 41.841 -21.610 1.00 31.95 C ATOM 434 O GLY A 66 145.100 42.896 -21.424 1.00 30.27 O ATOM 435 N GLN A 67 146.315 41.714 -22.694 1.00 37.91 N ATOM 436 CA GLN A 67 146.253 42.712 -23.787 1.00 40.98 C ATOM 437 C GLN A 67 145.757 42.065 -25.063 1.00 40.98 C ATOM 438 O GLN A 67 146.529 41.739 -25.939 1.00 40.65 O ATOM 439 CB GLN A 67 147.586 43.485 -23.985 1.00 45.12 C ATOM 440 CG GLN A 67 148.828 42.591 -24.198 1.00 53.61 C ATOM 441 CD GLN A 67 150.169 43.256 -23.796 1.00 58.12 C ATOM 442 OE1 GLN A 67 150.979 43.691 -24.649 1.00 60.58 O ATOM 443 NE2 GLN A 67 150.417 43.309 -22.480 1.00 60.80 N ATOM 444 N GLY A 68 144.441 41.918 -25.112 1.00 41.86 N ATOM 445 CA GLY A 68 143.758 41.331 -26.218 1.00 44.13 C ATOM 446 C GLY A 68 143.674 39.853 -25.996 1.00 47.55 C ATOM 447 O GLY A 68 144.083 39.357 -24.963 1.00 44.84 O ATOM 448 N CYS A 69 143.174 39.143 -26.997 1.00 
53.62 N ATOM 449 CA CYS A 69 143.033 37.684 -26.938 1.00 62.25 C ATOM 450 C CYS A 69 143.862 36.894 -27.991 1.00 67.78 C ATOM 451 O CYS A 69 143.275 36.120 -28.764 1.00 68.75 O ATOM 452 CB CYS A 69 141.565 37.272 -27.096 1.00 60.88 C ATOM 453 SG CYS A 69 140.448 37.934 -25.905 1.00 64.44 S ATOM 454 N PRO A 70 145.224 37.026 -27.989 1.00 73.36 N ATOM 455 CA PRO A 70 146.159 36.353 -28.915 1.00 78.19 C ATOM 456 C PRO A 70 145.780 34.988 -29.567 1.00 82.11 C ATOM 457 O PRO A 70 145.954 34.782 -30.788 1.00 83.22 O ATOM 458 CB PRO A 70 147.443 36.283 -28.072 1.00 76.96 C ATOM 459 CG PRO A 70 147.461 37.643 -27.445 1.00 74.67 C ATOM 460 CD PRO A 70 145.995 37.843 -27.018 1.00 74.33 C ATOM 461 N SER A 71 145.424 34.395 -30.550 1.00 86.04 N ATOM 462 CA SER A 71 145.404 33.039 -31.113 1.00 89.11 C ATOM 463 C SER A 71 144.767 31.771 -30.458 1.00 88.40 C ATOM 464 O SER A 71 144.315 30.849 -31.179 1.00 89.46 O ATOM 465 CB SER A 71 146.854 32.696 -31.540 1.00 91.69 C ATOM 466 OG SER A 71 147.092 31.285 -31.632 1.00 96.00 O ATOM 467 N THR A 72 144.762 31.677 -29.131 1.00 86.12 N ATOM 468 CA THR A 72 144.239 30.469 -28.486 1.00 82.35 C ATOM 469 C THR A 72 143.702 30.815 -27.093 1.00 76.27 C ATOM 470 O THR A 72 143.692 32.009 -26.718 1.00 77.40 O ATOM 471 CB THR A 72 145.338 29.403 -28.396 1.00 84.78 C ATOM 472 OG1 THR A 72 144.906 28.310 -27.566 1.00 87.54 O ATOM 473 CG2 THR A 72 146.624 30.048 -27.838 1.00 86.05 C ATOM 474 N HIS A 73 142.990 28.379 -25.746 1.00 68.68 N ATOM 475 CA HIS A 73 142.679 29.508 -24.901 1.00 59.58 C ATOM 476 C HIS A 73 143.088 29.268 -23.438 1.00 53.71 C ATOM 477 O HIS A 73 143.383 28.135 -22.998 1.00 51.71 O ATOM 478 CB HIS A 73 141.202 29.919 -25.032 1.00 61.72 C ATOM 479 CG HIS A 73 140.932 30.976 -26.088 1.00 64.15 C ATOM 480 ND1 HIS A 73 141.903 31.839 -26.569 1.00 64.93 N ATOM 481 CD2 HIS A 73 139.773 31.339 -26.707 1.00 64.65 C ATOM 482 CE1 HIS A 73 141.352 32.687 -27.425 1.00 64.78 C ATOM 483 NE2 HIS A 73 140.063 32.405 -27.527 1.00 64.61 N ATOM 
484 N VAL A 74 143.056 30.287 -22.721 1.00 46.33 N ATOM 485 CA VAL A 74 143.599 30.460 -21.376 1.00 36.22 C ATOM 486 C VAL A 74 142.495 30.468 -20.322 1.00 31.87 C ATOM 487 O VAL A 74 141.454 31.107 -20.485 1.00 25.32 O ATOM 488 CB VAL A 74 144.373 31.810 -21.216 1.00 36.31 C ATOM 489 CG1 VAL A 74 145.156 31.835 -19.914 1.00 37.31 C ATOM 490 CG2 VAL A 74 145.325 32.050 -22.388 1.00 36.17 C ATOM 491 N LEU A 75 142.726 29.697 -19.266 1.00 28.74 N ATOM 492 CA LEU A 75 141.836 29.617 -18.117 1.00 27.40 C ATOM 493 C LEU A 75 142.633 30.164 -16.946 1.00 25.13 C ATOM 494 O LEU A 75 143.834 29.903 -16.804 1.00 24.29 O ATOM 495 CB LEU A 75 141.399 28.184 -17.844 1.00 30.26 C ATOM 496 CG LEU A 75 140.192 27.799 -18.683 1.00 31.53 C ATOM 497 CD1 LEU A 75 139.784 26.367 -18.386 1.00 34.94 C ATOM 498 CD2 LEU A 75 139.091 28.764 -18.387 1.00 32.19 C ATOM 499 N LEU A 76 141.973 30.933 -16.108 1.00 20.85 N ATOM 500 CA LEU A 76 142.648 31.541 -14.975 1.00 18.54 C ATOM 501 C LEU A 76 142.027 30.944 -13.724 1.00 15.05 C ATOM 502 O LEU A 76 140.807 30.899 -13.605 1.00 14.77 O ATOM 503 CB LEU A 76 142.473 33.067 -15.039 1.00 18.52 C ATOM 504 CG LEU A 76 143.265 33.949 -16.046 1.00 25.31 C ATOM 505 CD1 LEU A 76 144.103 33.185 -17.028 1.00 25.29 C ATOM 506 CD2 LEU A 76 142.343 34.937 -16.723 1.00 27.38 C ATOM 507 N THR A 77 142.846 30.415 -12.832 1.00 13.04 N ATOM 508 CA THR A 77 142.317 29.797 -11.615 1.00 14.45 C ATOM 509 C THR A 77 142.874 30.479 -10.366 1.00 13.18 C ATOM 510 O THR A 77 143.940 31.116 -10.418 1.00 11.55 O ATOM 511 CB THR A 77 142.647 28.273 -11.564 1.00 14.37 C ATOM 512 OG1 THR A 77 144.057 28.074 -11.327 1.00 11.98 O ATOM 513 CG2 THR A 77 142.205 27.596 -12.882 1.00 15.82 C ATOM 514 N HIS A 78 142.120 30.391 -9.277 1.00 10.47 N ATOM 515 CA HIS A 78 142.504 30.958 -7.977 1.00 8.86 C ATOM 516 C HIS A 78 141.938 29.994 -6.972 1.00 9.82 C ATOM 517 O HIS A 78 140.743 29.680 -7.043 1.00 10.74 O ATOM 518 CB HIS A 78 141.825 32.315 -7.792 1.00 11.94 C ATOM 519 CG HIS A 78 
142.116 32.991 -6.485 1.00 15.09 C ATOM 520 ND1 HIS A 78 141.206 33.815 -5.855 1.00 19.91 N ATOM 521 CD2 HIS A 78 143.247 33.047 -5.742 1.00 16.68 C ATOM 522 CE1 HIS A 78 141.769 34.361 -4.791 1.00 14.79 C ATOM 523 NE2 HIS A 78 143.005 33.913 -4.701 1.00 16.53 N ATOM 524 N THR A 79 142.757 29.513 -6.040 1.00 10.36 N ATOM 525 CA THR A 79 142.228 28.583 -5.050 1.00 10.26 C ATOM 526 C THR A 79 142.897 28.810 -3.719 1.00 12.48 C ATOM 527 O THR A 79 143.973 29.418 -3.655 1.00 14.17 O ATOM 528 CB THR A 79 142.249 27.049 -5.536 1.00 16.73 C ATOM 529 OG1 THR A 79 143.291 26.303 -4.931 1.00 25.39 O ATOM 530 CG2 THR A 79 142.414 26.907 -7.010 1.00 13.30 C ATOM 531 N ILE A 80 142.160 28.548 -2.655 1.00 12.25 N ATOM 532 CA ILE A 80 142.738 28.679 -1.310 1.00 12.39 C ATOM 533 C ILE A 80 142.715 27.255 -0.786 1.00 9.85 C ATOM 534 O ILE A 80 141.685 26.636 -0.857 1.00 9.71 O ATOM 535 CB ILE A 80 141.898 29.619 -0.388 1.00 8.84 C ATOM 536 CG1 ILE A 80 142.010 31.064 -0.877 1.00 12.23 C ATOM 537 CG2 ILE A 80 142.428 29.581 1.083 1.00 11.01 C ATOM 538 CD1 ILE A 80 141.009 32.035 -0.201 1.00 10.77 C ATOM 539 N SER A 81 143.879 26.673 -0.500 1.00 11.87 N ATOM 540 CA SER A 81 143.942 25.339 0.075 1.00 11.73 C ATOM 541 C SER A 81 144.341 25.356 1.547 1.00 15.56 C ATOM 542 O SER A 81 144.961 26.305 2.039 1.00 17.30 O ATOM 543 CB SER A 81 144.999 24.542 -0.617 1.00 15.75 C ATOM 544 OG SER A 81 144.969 24.870 -1.966 1.00 28.39 O ATOM 545 N ARG A 82 144.064 24.241 2.207 1.00 16.85 N ATOM 546 CA ARG A 82 144.392 24.027 3.602 1.00 17.03 C ATOM 547 C ARG A 82 145.340 22.828 3.601 1.00 20.76 C ATOM 548 O ARG A 82 145.089 21.860 2.864 1.00 17.91 O ATOM 549 CB ARG A 82 143.120 23.631 4.326 1.00 16.82 C ATOM 550 CG ARG A 82 143.307 23.378 5.803 1.00 20.13 C ATOM 551 CD ARG A 82 142.050 22.836 6.404 1.00 20.04 C ATOM 552 NE ARG A 82 142.376 22.207 7.671 1.00 27.98 N ATOM 553 CZ ARG A 82 141.563 21.419 8.361 1.00 30.32 C ATOM 554 NH1 ARG A 82 140.344 21.140 7.917 1.00 28.33 N ATOM 555 NH2 ARG A 82 
141.979 20.919 9.514 1.00 30.81 N ATOM 556 N ILE A 83 146.457 22.912 4.332 1.00 25.59 N ATOM 557 CA ILE A 83 147.383 21.766 4.421 1.00 30.86 C ATOM 558 C ILE A 83 147.474 21.225 5.848 1.00 36.26 C ATOM 559 O ILE A 83 147.381 21.984 6.814 1.00 33.38 O ATOM 560 CB ILE A 83 148.828 22.018 3.803 1.00 30.20 C ATOM 561 CG1 ILE A 83 149.908 21.913 4.884 1.00 30.27 C ATOM 562 CG2 ILE A 83 148.877 23.322 2.987 1.00 30.31 C ATOM 563 CD1 ILE A 83 151.328 22.075 4.403 1.00 34.89 C ATOM 564 N ALA A 84 146.356 20.588 6.607 1.00 44.19 N ATOM 565 CA ALA A 84 146.115 20.233 8.004 1.00 52.59 C ATOM 566 C ALA A 84 147.272 19.603 8.765 1.00 58.50 C ATOM 567 O ALA A 84 148.209 19.043 8.170 1.00 58.67 O ATOM 568 CB ALA A 84 144.902 19.348 8.111 1.00 49.84 C ATOM 569 N VAL A 85 147.233 19.792 10.088 1.00 65.35 N ATOM 570 CA VAL A 85 148.220 19.211 10.990 1.00 70.50 C ATOM 571 C VAL A 85 149.622 19.861 11.127 1.00 74.31 C ATOM 572 O VAL A 85 149.837 20.772 11.956 1.00 74.83 O ATOM 573 CB VAL A 85 148.349 17.661 10.677 1.00 69.90 C ATOM 574 CG1 VAL A 85 149.630 17.059 11.272 1.00 69.59 C ATOM 575 CG2 VAL A 85 147.086 16.900 11.170 1.00 68.52 C ATOM 576 N SER A 86 150.582 19.332 10.373 1.00 78.14 N ATOM 577 CA SER A 86 151.997 19.737 10.424 1.00 82.42 C ATOM 578 C SER A 86 152.586 18.760 9.436 1.00 83.87 C ATOM 579 O SER A 86 153.049 17.682 9.847 1.00 84.70 O ATOM 580 CB SER A 86 152.609 19.421 11.797 1.00 83.63 C ATOM 581 OG SER A 86 152.535 20.550 12.643 1.00 84.92 O ATOM 582 N TYR A 87 152.417 19.082 8.150 1.00 84.60 N ATOM 583 CA TYR A 87 152.865 18.257 7.030 1.00 83.31 C ATOM 584 C TYR A 87 151.891 17.094 6.724 1.00 82.14 C ATOM 585 O TYR A 87 152.282 15.927 6.704 1.00 82.75 O ATOM 586 CB TYR A 87 154.304 17.748 7.264 1.00 83.13 C ATOM 587 N GLN A 88 150.603 17.407 6.582 1.00 80.64 N ATOM 588 CA GLN A 88 149.613 16.386 6.220 1.00 79.15 C ATOM 589 C GLN A 88 149.231 16.706 4.773 1.00 78.25 C ATOM 590 O GLN A 88 150.010 17.371 4.075 1.00 78.91 O ATOM 591 CB GLN A 88 148.395 16.419 7.130 
1.00 77.67 C ATOM 592 N THR A 89 148.038 16.295 4.331 1.00 76.42 N ATOM 593 CA THR A 89 147.611 16.535 2.939 1.00 73.67 C ATOM 594 C THR A 89 147.156 17.962 2.651 1.00 70.63 C ATOM 595 O THR A 89 146.904 18.747 3.573 1.00 69.90 O ATOM 596 CB THR A 89 146.420 15.632 2.539 1.00 75.37 C ATOM 597 OG1 THR A 89 146.210 15.698 1.109 1.00 76.17 O ATOM 598 CG2 THR A 89 145.135 16.109 3.282 1.00 76.34 C ATOM 599 N LYS A 90 147.009 18.256 1.360 1.00 66.36 N ATOM 600 CA LYS A 90 146.530 19.545 0.894 1.00 61.48 C ATOM 601 C LYS A 90 145.094 19.310 0.380 1.00 58.73 C ATOM 602 O LYS A 90 144.824 18.253 -0.216 1.00 59.46 O ATOM 603 CB LYS A 90 147.424 20.065 -0.225 1.00 59.63 C ATOM 604 CG LYS A 90 147.179 21.505 -0.488 1.00 60.67 C ATOM 605 CD LYS A 90 148.261 22.086 -1.311 1.00 61.36 C ATOM 606 CE LYS A 90 148.201 23.594 -1.276 1.00 62.29 C ATOM 607 NZ LYS A 90 149.379 24.147 -1.982 1.00 61.77 N ATOM 608 N VAL A 91 144.170 20.233 0.666 1.00 52.42 N ATOM 609 CA VAL A 91 142.774 20.118 0.217 1.00 46.24 C ATOM 610 C VAL A 91 142.198 21.511 -0.106 1.00 40.66 C ATOM 611 O VAL A 91 142.445 22.470 0.632 1.00 39.95 O ATOM 612 CB VAL A 91 141.902 19.446 1.280 1.00 46.28 C ATOM 613 CG1 VAL A 91 141.986 20.190 2.606 1.00 46.22 C ATOM 614 CG2 VAL A 91 140.484 19.410 0.800 1.00 47.03 C ATOM 615 N ASN A 92 141.524 21.766 -1.080 1.00 32.03 N ATOM 616 CA ASN A 92 140.933 23.048 -1.501 1.00 25.84 C ATOM 617 C ASN A 92 139.702 23.402 -0.676 1.00 20.98 C ATOM 618 O ASN A 92 138.856 22.567 -0.402 1.00 21.80 O ATOM 619 CB ASN A 92 140.554 23.048 -2.981 1.00 26.50 C ATOM 620 CG ASN A 92 141.756 22.983 -3.912 1.00 29.41 C ATOM 621 OD1 ASN A 92 142.912 22.913 -3.482 1.00 31.51 O ATOM 622 ND2 ASN A 92 141.477 22.998 -5.210 1.00 30.18 N ATOM 623 N LEU A 93 139.692 24.622 -0.178 1.00 16.21 N ATOM 624 CA LEU A 93 138.612 25.103 0.623 1.00 12.23 C ATOM 625 C LEU A 93 137.719 25.967 -0.234 1.00 10.77 C ATOM 626 O LEU A 93 136.507 25.860 -0.146 1.00 11.33 O ATOM 627 CB LEU A 93 139.173 25.939 1.775 1.00 13.27 
C ATOM 628 CG LEU A 93 139.961 25.224 2.885 1.00 11.22 C ATOM 629 CD1 LEU A 93 140.563 26.206 3.818 1.00 15.16 C ATOM 630 CD2 LEU A 93 139.122 24.208 3.622 1.00 9.71 C ATOM 631 N LEU A 94 138.321 26.757 -1.122 1.00 12.44 N ATOM 632 CA LEU A 94 137.610 27.724 -1.980 1.00 10.12 C ATOM 633 C LEU A 94 138.335 27.727 -3.317 1.00 12.04 C ATOM 634 O LEU A 94 139.569 27.640 -3.353 1.00 11.28 O ATOM 635 CB LEU A 94 137.681 29.117 -1.329 1.00 11.80 C ATOM 636 CG LEU A 94 137.166 29.301 0.113 1.00 9.62 C ATOM 637 CD1 LEU A 94 137.760 30.589 0.739 1.00 13.27 C ATOM 638 CD2 LEU A 94 135.639 29.394 0.082 1.00 11.46 C ATOM 639 N SER A 95 137.586 27.871 -4.403 1.00 12.23 N ATOM 640 CA SER A 95 138.173 27.804 -5.740 1.00 13.05 C ATOM 641 C SER A 95 137.259 28.478 -6.783 1.00 11.64 C ATOM 642 O SER A 95 136.018 28.402 -6.701 1.00 11.49 O ATOM 643 CB SER A 95 138.356 26.317 -6.082 1.00 13.46 C ATOM 644 OG SER A 95 139.257 26.170 -7.153 1.00 22.53 O ATOM 645 N ALA A 96 137.876 29.134 -7.760 1.00 9.33 N ATOM 646 CA ALA A 96 137.139 29.785 -8.858 1.00 9.37 C ATOM 647 C ALA A 96 137.952 29.692 -10.136 1.00 10.16 C ATOM 648 O ALA A 96 139.165 29.533 -10.085 1.00 9.13 O ATOM 649 CB ALA A 96 136.803 31.212 -8.555 1.00 7.20 C ATOM 650 N ILE A 97 137.271 29.702 -11.275 1.00 11.82 N ATOM 651 CA ILE A 97 137.910 29.636 -12.591 1.00 14.10 C ATOM 652 C ILE A 97 137.260 30.696 -13.411 1.00 13.14 C ATOM 653 O ILE A 97 136.063 30.944 -13.264 1.00 14.36 O ATOM 654 CB ILE A 97 137.611 28.329 -13.356 1.00 15.42 C ATOM 655 CG1 ILE A 97 137.996 27.138 -12.508 1.00 16.89 C ATOM 656 CG2 ILE A 97 138.413 28.279 -14.661 1.00 15.09 C ATOM 657 CD1 ILE A 97 137.346 25.874 -12.964 1.00 23.07 C ATOM 658 N LYS A 98 138.010 31.228 -14.357 1.00 13.95 N ATOM 659 CA LYS A 98 137.496 32.261 -15.216 1.00 18.27 C ATOM 660 C LYS A 98 138.182 32.125 -16.574 1.00 18.16 C ATOM 661 O LYS A 98 139.328 31.672 -16.637 1.00 20.04 O ATOM 662 CB LYS A 98 137.865 33.575 -14.571 1.00 21.03 C ATOM 663 CG LYS A 98 137.177 34.727 
-15.120 1.00 24.61 C ATOM 664 CD LYS A 98 137.445 35.824 -14.211 1.00 28.64 C ATOM 665 CE LYS A 98 136.258 36.097 -13.386 1.00 30.54 C ATOM 666 NZ LYS A 98 136.787 37.129 -12.545 1.00 34.04 N ATOM 667 N SER A 99 137.470 32.402 -17.654 1.00 19.28 N ATOM 668 CA SER A 99 138.102 32.351 -18.974 1.00 20.54 C ATOM 669 C SER A 99 137.992 33.746 -19.551 1.00 24.24 C ATOM 670 O SER A 99 136.903 34.315 -19.638 1.00 24.84 O ATOM 671 CB SER A 99 137.508 31.274 -19.909 1.00 23.81 C ATOM 672 OG SER A 99 136.089 31.254 -19.939 1.00 29.79 O ATOM 673 N PRO A 100 139.146 34.376 -19.793 1.00 26.60 N ATOM 674 CA PRO A 100 139.153 35.724 -20.348 1.00 27.42 C ATOM 675 C PRO A 100 138.741 35.833 -21.810 1.00 30.40 C ATOM 676 O PRO A 100 137.987 36.715 -22.152 1.00 32.15 O ATOM 677 CB PRO A 100 140.589 36.178 -20.109 1.00 27.70 C ATOM 678 CG PRO A 100 141.382 34.875 -20.065 1.00 27.75 C ATOM 679 CD PRO A 100 140.484 33.969 -19.322 1.00 26.12 C ATOM 680 N CYS A 101 139.200 34.945 -22.678 1.00 34.08 N ATOM 681 CA CYS A 101 138.827 35.074 -24.082 1.00 40.79 C ATOM 682 C CYS A 101 137.980 33.968 -24.697 1.00 46.10 C ATOM 683 O CYS A 101 138.386 32.802 -24.774 1.00 48.78 O ATOM 684 CB CYS A 101 140.066 35.249 -24.930 1.00 39.94 C ATOM 685 SG CYS A 101 141.039 36.676 -24.519 1.00 44.90 S ATOM 686 N GLN A 102 136.814 34.342 -25.199 1.00 52.41 N ATOM 687 CA GLN A 102 135.984 33.343 -25.826 1.00 57.68 C ATOM 688 C GLN A 102 136.564 33.224 -27.247 1.00 59.52 C ATOM 689 O GLN A 102 136.710 32.116 -27.769 1.00 60.62 O ATOM 690 CB GLN A 102 134.503 33.763 -25.796 1.00 59.89 C ATOM 691 CG GLN A 102 133.525 32.641 -25.367 1.00 61.98 C ATOM 692 CD GLN A 102 133.596 32.274 -23.881 1.00 64.75 C ATOM 693 OE1 GLN A 102 134.541 31.621 -23.407 1.00 65.12 O ATOM 694 NE2 GLN A 102 132.565 32.664 -23.145 1.00 65.07 N ATOM 695 N ARG A 103 137.022 34.356 -27.796 1.00 60.57 N ATOM 696 CA ARG A 103 137.613 34.409 -29.140 1.00 61.59 C ATOM 697 C ARG A 103 138.560 35.609 -29.387 1.00 63.50 C ATOM 698 O ARG A 103 
138.672 36.508 -28.548 1.00 63.76 O ATOM 699 CB ARG A 103 136.504 34.388 -30.200 1.00 59.24 C ATOM 700 N GLU A 104 139.262 35.539 -30.527 1.00 66.69 N ATOM 701 CA GLU A 104 140.233 36.502 -31.106 1.00 70.55 C ATOM 702 C GLU A 104 141.344 35.674 -31.722 1.00 74.59 C ATOM 703 O GLU A 104 141.116 34.526 -32.110 1.00 74.13 O ATOM 704 CB GLU A 104 140.844 37.523 -30.129 1.00 70.37 C ATOM 705 CG GLU A 104 142.194 38.109 -30.655 1.00 70.82 C ATOM 706 CD GLU A 104 142.312 39.640 -30.624 1.00 70.34 C ATOM 707 OE1 GLU A 104 141.556 40.336 -31.332 1.00 70.07 O ATOM 708 OE2 GLU A 104 143.212 40.156 -29.930 1.00 70.09 O ATOM 709 N THR A 105 140.954 35.709 -33.677 1.00 81.65 N ATOM 710 CA THR A 105 141.464 34.824 -34.784 1.00 87.35 C ATOM 711 C THR A 105 142.067 35.652 -36.007 1.00 90.13 C ATOM 712 O THR A 105 142.365 36.897 -35.846 1.00 90.21 O ATOM 713 CB THR A 105 140.210 33.881 -35.242 1.00 88.59 C ATOM 714 OG1 THR A 105 138.945 34.518 -34.900 1.00 91.31 O ATOM 715 CG2 THR A 105 140.258 32.431 -34.596 1.00 88.63 C ATOM 716 N PRO A 106 142.138 35.039 -37.260 1.00 92.17 N ATOM 717 CA PRO A 106 142.707 35.754 -38.453 1.00 93.63 C ATOM 718 C PRO A 106 142.790 37.327 -38.651 1.00 94.86 C ATOM 719 O PRO A 106 141.880 37.952 -39.325 1.00 95.77 O ATOM 720 CB PRO A 106 141.911 35.133 -39.627 1.00 93.20 C ATOM 721 CG PRO A 106 141.846 33.701 -39.239 1.00 92.26 C ATOM 722 CD PRO A 106 141.618 33.710 -37.688 1.00 91.68 C ATOM 723 N GLU A 107 143.884 37.954 -38.169 1.00 95.05 N ATOM 724 CA GLU A 107 144.098 39.400 -38.363 1.00 95.21 C ATOM 725 C GLU A 107 142.929 40.384 -38.107 1.00 95.40 C ATOM 726 O GLU A 107 141.917 40.078 -37.440 1.00 95.78 O ATOM 727 CB GLU A 107 144.685 39.655 -39.759 1.00 94.25 C ATOM 728 N GLY A 108 143.050 41.558 -38.708 1.00 95.84 N ATOM 729 CA GLY A 108 142.069 42.606 -38.482 1.00 96.19 C ATOM 730 C GLY A 108 142.831 43.580 -37.558 1.00 96.70 C ATOM 731 O GLY A 108 144.083 43.792 -37.695 1.00 96.78 O ATOM 732 N ALA A 109 142.104 44.147 -36.585 1.00 95.66 N 
ATOM 733 CA ALA A 109 142.717 45.087 -35.625 1.00 93.55 C ATOM 734 C ALA A 109 142.830 44.271 -34.326 1.00 90.80 C ATOM 735 O ALA A 109 142.545 43.036 -34.375 1.00 91.97 O ATOM 736 CB ALA A 109 141.813 46.324 -35.454 1.00 94.49 C ATOM 737 N GLU A 110 143.211 44.872 -33.178 1.00 86.58 N ATOM 738 CA GLU A 110 143.303 43.963 -32.004 1.00 82.70 C ATOM 739 C GLU A 110 143.579 44.490 -30.581 1.00 78.42 C ATOM 740 O GLU A 110 143.296 45.670 -30.268 1.00 78.66 O ATOM 741 CB GLU A 110 144.364 42.852 -32.341 1.00 83.97 C ATOM 742 N ALA A 111 144.270 43.452 -29.371 1.00 72.29 N ATOM 743 CA ALA A 111 145.294 44.198 -28.632 1.00 66.49 C ATOM 744 C ALA A 111 144.734 45.116 -27.531 1.00 61.84 C ATOM 745 O ALA A 111 145.439 45.997 -27.011 1.00 62.47 O ATOM 746 CB ALA A 111 146.177 45.003 -29.601 1.00 66.89 C ATOM 747 N LYS A 112 143.448 44.955 -27.233 1.00 55.77 N ATOM 748 CA LYS A 112 142.796 45.729 -26.176 1.00 49.52 C ATOM 749 C LYS A 112 143.017 45.015 -24.827 1.00 44.66 C ATOM 750 O LYS A 112 142.829 43.798 -24.721 1.00 41.76 O ATOM 751 CB LYS A 112 141.287 45.926 -26.466 1.00 49.12 C ATOM 752 CG LYS A 112 140.665 44.917 -27.424 1.00 48.61 C ATOM 753 CD LYS A 112 140.797 43.504 -26.875 1.00 48.10 C ATOM 754 CE LYS A 112 140.826 42.448 -27.968 1.00 46.57 C ATOM 755 NZ LYS A 112 140.954 41.129 -27.321 1.00 45.20 N ATOM 756 N PRO A 113 143.527 45.740 -23.817 1.00 40.92 N ATOM 757 CA PRO A 113 143.751 45.103 -22.513 1.00 38.66 C ATOM 758 C PRO A 113 142.431 44.747 -21.836 1.00 34.21 C ATOM 759 O PRO A 113 141.413 45.438 -21.990 1.00 33.86 O ATOM 760 CB PRO A 113 144.523 46.171 -21.722 1.00 39.82 C ATOM 761 CG PRO A 113 144.008 47.451 -22.300 1.00 39.92 C ATOM 762 CD PRO A 113 143.990 47.137 -23.802 1.00 40.85 C ATOM 763 N TRP A 114 142.460 43.682 -21.055 1.00 27.41 N ATOM 764 CA TRP A 114 141.263 43.265 -20.375 1.00 21.71 C ATOM 765 C TRP A 114 141.659 43.034 -18.939 1.00 19.51 C ATOM 766 O TRP A 114 142.835 42.812 -18.674 1.00 17.22 O ATOM 767 CB TRP A 114 140.667 41.999 -21.030 
1.00 18.83 C ATOM 768 CG TRP A 114 141.609 40.863 -21.269 1.00 17.27 C ATOM 769 CD1 TRP A 114 142.200 40.501 -22.466 1.00 18.02 C ATOM 770 CD2 TRP A 114 142.067 39.918 -20.289 1.00 18.20 C ATOM 771 NE1 TRP A 114 142.996 39.395 -22.278 1.00 16.57 N ATOM 772 CE2 TRP A 114 142.932 39.016 -20.957 1.00 17.54 C ATOM 773 CE3 TRP A 114 141.834 39.745 -18.908 1.00 13.25 C ATOM 774 CZ2 TRP A 114 143.572 37.961 -20.285 1.00 19.73 C ATOM 775 CZ3 TRP A 114 142.470 38.697 -18.238 1.00 10.85 C ATOM 776 CH2 TRP A 114 143.330 37.821 -18.929 1.00 15.22 C ATOM 777 N TYR A 115 140.707 43.181 -18.010 1.00 17.91 N ATOM 778 CA TYR A 115 140.948 42.939 -16.596 1.00 13.28 C ATOM 779 C TYR A 115 139.891 42.001 -16.076 1.00 12.72 C ATOM 780 O TYR A 115 138.726 42.072 -16.480 1.00 14.81 O ATOM 781 CB TYR A 115 140.843 44.191 -15.811 1.00 11.84 C ATOM 782 CG TYR A 115 141.950 45.092 -16.053 1.00 14.28 C ATOM 783 CD1 TYR A 115 141.934 45.951 -17.151 1.00 16.77 C ATOM 784 CD2 TYR A 115 143.009 45.131 -15.163 1.00 13.80 C ATOM 785 CE1 TYR A 115 142.963 46.846 -17.357 1.00 21.10 C ATOM 786 CE2 TYR A 115 144.038 46.005 -15.341 1.00 18.22 C ATOM 787 CZ TYR A 115 144.023 46.875 -16.436 1.00 21.70 C ATOM 788 OH TYR A 115 145.046 47.795 -16.581 1.00 25.62 O ATOM 789 N GLU A 116 140.303 41.156 -15.132 1.00 13.08 N ATOM 790 CA GLU A 116 139.438 40.167 -14.538 1.00 12.20 C ATOM 791 C GLU A 116 139.799 39.901 -13.072 1.00 12.50 C ATOM 792 O GLU A 116 140.875 39.349 -12.773 1.00 9.75 O ATOM 793 CB GLU A 116 139.552 38.836 -15.291 1.00 18.67 C ATOM 794 CG GLU A 116 139.029 38.844 -16.696 1.00 18.84 C ATOM 795 CD GLU A 116 137.532 38.956 -16.781 1.00 22.77 C ATOM 796 OE1 GLU A 116 136.854 39.311 -15.797 1.00 19.57 O ATOM 797 OE2 GLU A 116 137.009 38.696 -17.881 1.00 28.82 O ATOM 798 N PRO A 117 138.910 40.327 -12.143 1.00 12.52 N ATOM 799 CA PRO A 117 139.120 40.117 -10.697 1.00 13.93 C ATOM 800 C PRO A 117 138.445 38.779 -10.240 1.00 13.59 C ATOM 801 O PRO A 117 137.480 38.324 -10.845 1.00 12.95 O ATOM 802 
CB PRO A 117 138.454 41.352 -10.053 1.00 12.17 C ATOM 803 CG PRO A 117 137.279 41.641 -11.010 1.00 10.30 C ATOM 804 CD PRO A 117 137.718 41.176 -12.408 1.00 10.60 C ATOM 805 N ILE A 118 139.002 38.098 -9.241 1.00 13.57 N ATOM 806 CA ILE A 118 138.329 36.902 -8.711 1.00 11.65 C ATOM 807 C ILE A 118 138.365 37.112 -7.207 1.00 11.47 C ATOM 808 O ILE A 118 139.392 37.556 -6.727 1.00 12.50 O ATOM 809 CB ILE A 118 139.034 35.558 -9.058 1.00 13.40 C ATOM 810 CG1 ILE A 118 139.049 35.329 -10.576 1.00 13.37 C ATOM 811 CG2 ILE A 118 138.240 34.397 -8.420 1.00 10.68 C ATOM 812 CD1 ILE A 118 139.757 33.996 -11.002 1.00 16.92 C ATOM 813 N TYR A 119 137.239 36.915 -6.513 1.00 11.48 N ATOM 814 CA TYR A 119 137.134 37.025 -5.049 1.00 10.08 C ATOM 815 C TYR A 119 136.712 35.635 -4.449 1.00 10.55 C ATOM 816 O TYR A 119 135.822 34.952 -4.972 1.00 12.30 O ATOM 817 CB TYR A 119 136.095 38.093 -4.645 1.00 10.67 C ATOM 818 CG TYR A 119 136.150 38.438 -3.177 1.00 10.16 C ATOM 819 CD1 TYR A 119 137.016 39.436 -2.704 1.00 10.32 C ATOM 820 CD2 TYR A 119 135.386 37.741 -2.249 1.00 10.91 C ATOM 821 CE1 TYR A 119 137.129 39.733 -1.317 1.00 13.12 C ATOM 822 CE2 TYR A 119 135.489 38.024 -0.864 1.00 11.39 C ATOM 823 CZ TYR A 119 136.362 39.017 -0.398 1.00 12.66 C ATOM 824 OH TYR A 119 136.488 39.251 0.965 1.00 9.71 O ATOM 825 N LEU A 120 137.341 35.242 -3.343 1.00 8.03 N ATOM 826 CA LEU A 120 137.031 33.992 -2.648 1.00 10.64 C ATOM 827 C LEU A 120 136.903 34.362 -1.162 1.00 10.02 C ATOM 828 O LEU A 120 137.649 35.218 -0.679 1.00 10.90 O ATOM 829 CB LEU A 120 138.198 33.041 -2.805 1.00 13.66 C ATOM 830 CG LEU A 120 138.351 31.970 -3.879 1.00 16.92 C ATOM 831 CD1 LEU A 120 137.585 32.237 -5.130 1.00 19.17 C ATOM 832 CD2 LEU A 120 139.795 31.760 -4.108 1.00 14.58 C ATOM 833 N GLY A 121 135.950 33.777 -0.440 1.00 11.07 N ATOM 834 CA GLY A 121 135.812 34.083 0.990 1.00 9.55 C ATOM 835 C GLY A 121 134.894 33.099 1.683 1.00 7.39 C ATOM 836 O GLY A 121 133.918 32.701 1.080 1.00 11.47 O ATOM 837 N GLY A 
122 135.266 32.596 2.868 1.00 8.95 N ATOM 838 CA GLY A 122 134.416 31.650 3.578 1.00 10.35 C ATOM 839 C GLY A 122 134.859 31.421 5.013 1.00 11.54 C ATOM 840 O GLY A 122 135.983 31.778 5.356 1.00 12.43 O ATOM 841 N VAL A 123 134.005 30.793 5.822 1.00 7.76 N ATOM 842 CA VAL A 123 134.295 30.527 7.248 1.00 10.38 C ATOM 843 C VAL A 123 134.631 29.069 7.456 1.00 11.59 C ATOM 844 O VAL A 123 133.854 28.202 7.051 1.00 13.11 O ATOM 845 CB VAL A 123 133.101 30.924 8.138 1.00 10.65 C ATOM 846 CG1 VAL A 123 133.312 30.485 9.603 1.00 11.84 C ATOM 847 CG2 VAL A 123 132.925 32.412 8.072 1.00 11.48 C ATOM 848 N PHE A 124 135.782 28.803 8.086 1.00 11.38 N ATOM 849 CA PHE A 124 136.265 27.432 8.285 1.00 12.17 C ATOM 850 C PHE A 124 136.835 27.118 9.658 1.00 13.82 C ATOM 851 O PHE A 124 137.458 27.970 10.287 1.00 13.27 O ATOM 852 CB PHE A 124 137.338 27.096 7.216 1.00 14.08 C ATOM 853 CG PHE A 124 136.847 27.241 5.767 1.00 11.98 C ATOM 854 CD1 PHE A 124 136.884 28.479 5.111 1.00 14.47 C ATOM 855 CD2 PHE A 124 136.356 26.133 5.065 1.00 13.95 C ATOM 856 CE1 PHE A 124 136.440 28.610 3.788 1.00 14.57 C ATOM 857 CE2 PHE A 124 135.909 26.249 3.738 1.00 12.52 C ATOM 858 CZ PHE A 124 135.950 27.486 3.105 1.00 9.95 C ATOM 859 N GLN A 125 136.625 25.876 10.100 1.00 13.56 N ATOM 860 CA GLN A 125 137.141 25.379 11.388 1.00 12.70 C ATOM 861 C GLN A 125 138.526 24.827 11.114 1.00 13.91 C ATOM 862 O GLN A 125 138.692 23.898 10.324 1.00 12.60 O ATOM 863 CB GLN A 125 136.256 24.286 11.969 1.00 15.58 C ATOM 864 CG GLN A 125 136.746 23.711 13.299 1.00 22.63 C ATOM 865 CD GLN A 125 136.333 24.529 14.512 1.00 28.47 C ATOM 866 OE1 GLN A 125 135.219 24.384 15.024 1.00 35.75 O ATOM 867 NE2 GLN A 125 137.233 25.374 14.998 1.00 32.26 N ATOM 868 N LEU A 126 139.512 25.483 11.721 1.00 15.10 N ATOM 869 CA LEU A 126 140.922 25.165 11.565 1.00 18.02 C ATOM 870 C LEU A 126 141.492 24.763 12.909 1.00 18.46 C ATOM 871 O LEU A 126 140.898 25.068 13.933 1.00 16.37 O ATOM 872 CB LEU A 126 141.686 26.392 11.015 1.00 
17.52 C ATOM 873 CG LEU A 126 141.195 26.964 9.685 1.00 16.72 C ATOM 874 CD1 LEU A 126 142.183 28.035 9.229 1.00 16.79 C ATOM 875 CD2 LEU A 126 141.123 25.853 8.667 1.00 13.83 C ATOM 876 N GLU A 127 142.628 24.070 12.871 1.00 21.14 N ATOM 877 CA GLU A 127 143.348 23.591 14.042 1.00 24.34 C ATOM 878 C GLU A 127 144.648 24.364 14.073 1.00 24.58 C ATOM 879 O GLU A 127 145.098 24.819 13.040 1.00 25.10 O ATOM 880 CB GLU A 127 143.715 22.127 13.878 1.00 26.09 C ATOM 881 CG GLU A 127 142.592 21.161 13.702 1.00 34.51 C ATOM 882 CD GLU A 127 143.093 19.770 13.194 1.00 40.22 C ATOM 883 OE1 GLU A 127 144.315 19.600 12.902 1.00 43.77 O ATOM 884 OE2 GLU A 127 142.253 18.847 13.051 1.00 40.57 O ATOM 885 N LYS A 128 145.273 24.491 15.240 1.00 24.50 N ATOM 886 CA LYS A 128 146.557 25.180 15.327 1.00 25.01 C ATOM 887 C LYS A 128 147.618 24.473 14.446 1.00 23.06 C ATOM 888 O LYS A 128 147.774 23.244 14.483 1.00 22.50 O ATOM 889 CB LYS A 128 147.077 25.189 16.778 1.00 27.04 C ATOM 890 CG LYS A 128 148.592 25.483 16.859 1.00 27.48 C ATOM 891 CD LYS A 128 149.237 24.865 18.062 1.00 34.45 C ATOM 892 CE LYS A 128 149.595 25.928 19.106 1.00 36.05 C ATOM 893 NZ LYS A 128 149.942 25.334 20.441 1.00 37.00 N ATOM 894 N GLY A 129 148.407 25.260 13.725 1.00 20.68 N ATOM 895 CA GLY A 129 149.452 24.684 12.889 1.00 22.11 C ATOM 896 C GLY A 129 149.089 24.481 11.431 1.00 21.56 C ATOM 897 O GLY A 129 149.985 24.284 10.588 1.00 19.83 O ATOM 898 N ASP A 130 147.783 24.493 11.139 1.00 21.38 N ATOM 899 CA ASP A 130 147.297 24.355 9.774 1.00 18.41 C ATOM 900 C ASP A 130 147.944 25.445 8.947 1.00 16.89 C ATOM 901 O ASP A 130 148.225 26.529 9.444 1.00 15.45 O ATOM 902 CB ASP A 130 145.774 24.512 9.707 1.00 18.00 C ATOM 903 CG ASP A 130 145.043 23.226 10.001 1.00 16.97 C ATOM 904 OD1 ASP A 130 145.660 22.179 10.288 1.00 21.79 O ATOM 905 OD2 ASP A 130 143.820 23.247 9.930 1.00 18.73 O ATOM 906 N ARG A 131 148.178 25.153 7.681 1.00 16.54 N ATOM 907 CA ARG A 131 148.779 26.110 6.774 1.00 20.89 C ATOM 908 C 
ARG A 131 147.813 26.400 5.651 1.00 19.92 C ATOM 909 O ARG A 131 147.163 25.486 5.120 1.00 20.50 O ATOM 910 CB ARG A 131 150.039 25.544 6.162 1.00 24.49 C ATOM 911 CG ARG A 131 151.200 25.583 7.080 1.00 34.08 C ATOM 912 CD ARG A 131 152.229 24.599 6.657 1.00 40.41 C ATOM 913 NE ARG A 131 153.393 24.781 7.504 1.00 50.20 N ATOM 914 CZ ARG A 131 154.364 25.653 7.255 1.00 53.51 C ATOM 915 NH1 ARG A 131 154.346 26.363 6.130 1.00 56.09 N ATOM 916 NH2 ARG A 131 155.381 25.766 8.104 1.00 55.90 N ATOM 917 N LEU A 132 147.733 27.669 5.278 1.00 18.92 N ATOM 918 CA LEU A 132 146.855 28.106 4.198 1.00 16.11 C ATOM 919 C LEU A 132 147.639 28.689 3.019 1.00 17.03 C ATOM 920 O LEU A 132 148.501 29.584 3.169 1.00 15.95 O ATOM 921 CB LEU A 132 145.843 29.143 4.714 1.00 13.19 C ATOM 922 CG LEU A 132 144.911 28.663 5.775 1.00 9.98 C ATOM 923 CD1 LEU A 132 144.331 29.836 6.403 1.00 13.44 C ATOM 924 CD2 LEU A 132 143.867 27.766 5.189 1.00 15.06 C ATOM 925 N SER A 133 147.252 28.252 1.832 1.00 15.91 N ATOM 926 CA SER A 133 147.864 28.741 0.616 1.00 18.56 C ATOM 927 C SER A 133 146.816 29.408 -0.323 1.00 14.84 C ATOM 928 O SER A 133 145.706 28.909 -0.439 1.00 16.32 O ATOM 929 CB SER A 133 148.477 27.543 -0.094 1.00 18.21 C ATOM 930 OG SER A 133 149.563 27.955 -0.882 1.00 27.00 O ATOM 931 N ALA A 134 147.125 30.572 -0.880 1.00 12.26 N ATOM 932 CA ALA A 134 146.255 31.234 -1.857 1.00 12.13 C ATOM 933 C ALA A 134 147.136 31.182 -3.098 1.00 15.17 C ATOM 934 O ALA A 134 148.187 31.864 -3.169 1.00 14.98 O ATOM 935 CB ALA A 134 146.005 32.627 -1.483 1.00 14.25 C ATOM 936 N GLU A 135 146.659 30.447 -4.101 1.00 14.49 N ATOM 937 CA GLU A 135 147.413 30.191 -5.310 1.00 14.62 C ATOM 938 C GLU A 135 146.696 30.396 -6.614 1.00 10.52 C ATOM 939 O GLU A 135 145.484 30.224 -6.687 1.00 10.35 O ATOM 940 CB GLU A 135 147.829 28.737 -5.285 1.00 16.30 C ATOM 941 CG GLU A 135 148.645 28.342 -4.086 1.00 19.37 C ATOM 942 CD GLU A 135 148.810 26.841 -3.993 1.00 22.13 C ATOM 943 OE1 GLU A 135 148.160 26.102 -4.753 
1.00 32.17 O ATOM 944 OE2 GLU A 135 149.578 26.366 -3.149 1.00 28.78 O ATOM 945 N ILE A 136 147.447 30.838 -7.622 1.00 12.41 N ATOM 946 CA ILE A 136 146.933 31.017 -8.993 1.00 11.83 C ATOM 947 C ILE A 136 147.713 30.153 -10.016 1.00 13.72 C ATOM 948 O ILE A 136 148.868 29.819 -9.793 1.00 12.90 O ATOM 949 CB ILE A 136 146.970 32.492 -9.429 1.00 14.35 C ATOM 950 CG1 ILE A 136 148.409 33.023 -9.368 1.00 15.26 C ATOM 951 CG2 ILE A 136 146.006 33.303 -8.565 1.00 12.94 C ATOM 952 CD1 ILE A 136 148.644 34.160 -10.201 1.00 19.85 C ATOM 953 N ASN A 137 147.124 29.819 -11.157 1.00 10.76 N ATOM 954 CA ASN A 137 147.871 29.031 -12.136 1.00 12.31 C ATOM 955 C ASN A 137 148.844 29.859 -12.976 1.00 10.82 C ATOM 956 O ASN A 137 149.952 29.409 -13.230 1.00 14.19 O ATOM 957 CB ASN A 137 146.943 28.202 -13.050 1.00 13.50 C ATOM 958 CG ASN A 137 145.982 29.061 -13.909 1.00 17.42 C ATOM 959 OD1 ASN A 137 145.659 30.213 -13.564 1.00 17.90 O ATOM 960 ND2 ASN A 137 145.503 28.474 -15.024 1.00 17.14 N ATOM 961 N ARG A 138 148.476 31.083 -13.345 1.00 9.75 N ATOM 962 CA ARG A 138 149.366 31.896 -14.167 1.00 11.53 C ATOM 963 C ARG A 138 149.776 33.240 -13.532 1.00 13.51 C ATOM 964 O ARG A 138 149.137 34.287 -13.757 1.00 13.12 O ATOM 965 CB ARG A 138 148.723 32.117 -15.530 1.00 11.98 C ATOM 966 CG ARG A 138 148.431 30.839 -16.363 1.00 15.10 C ATOM 967 CD ARG A 138 149.717 30.214 -16.977 1.00 16.37 C ATOM 968 NE ARG A 138 150.477 31.138 -17.856 1.00 20.70 N ATOM 969 CZ ARG A 138 150.373 31.246 -19.198 1.00 17.21 C ATOM 970 NH1 ARG A 138 149.520 30.488 -19.884 1.00 14.90 N ATOM 971 NH2 ARG A 138 151.158 32.109 -19.859 1.00 15.65 N ATOM 972 N PRO A 139 150.864 33.233 -12.726 1.00 15.85 N ATOM 973 CA PRO A 139 151.341 34.450 -12.070 1.00 16.17 C ATOM 974 C PRO A 139 151.777 35.498 -13.046 1.00 16.86 C ATOM 975 O PRO A 139 151.943 36.658 -12.673 1.00 18.12 O ATOM 976 CB PRO A 139 152.491 33.957 -11.202 1.00 14.86 C ATOM 977 CG PRO A 139 152.915 32.693 -11.886 1.00 21.02 C ATOM 978 CD PRO A 139 
151.607 32.066 -12.223 1.00 14.27 C ATOM 979 N ASP A 140 151.978 35.082 -14.293 1.00 15.58 N ATOM 980 CA ASP A 140 152.395 36.004 -15.351 1.00 16.68 C ATOM 981 C ASP A 140 151.242 36.844 -15.906 1.00 13.10 C ATOM 982 O ASP A 140 151.471 37.748 -16.692 1.00 14.78 O ATOM 983 CB ASP A 140 153.173 35.270 -16.485 1.00 16.07 C ATOM 984 CG ASP A 140 152.413 34.080 -17.080 1.00 17.87 C ATOM 985 OD1 ASP A 140 151.609 33.411 -16.391 1.00 15.77 O ATOM 986 OD2 ASP A 140 152.630 33.820 -18.277 1.00 23.51 O ATOM 987 N TYR A 141 150.021 36.571 -15.456 1.00 10.86 N ATOM 988 CA TYR A 141 148.853 37.335 -15.883 1.00 13.66 C ATOM 989 C TYR A 141 148.377 38.237 -14.732 1.00 14.58 C ATOM 990 O TYR A 141 147.396 38.951 -14.879 1.00 15.33 O ATOM 991 CB TYR A 141 147.679 36.412 -16.288 1.00 14.82 C ATOM 992 CG TYR A 141 147.694 35.953 -17.701 1.00 12.10 C ATOM 993 CD1 TYR A 141 148.440 34.863 -18.064 1.00 16.76 C ATOM 994 CD2 TYR A 141 146.970 36.611 -18.671 1.00 14.69 C ATOM 995 CE1 TYR A 141 148.473 34.413 -19.363 1.00 20.95 C ATOM 996 CE2 TYR A 141 146.984 36.177 -19.987 1.00 20.43 C ATOM 997 CZ TYR A 141 147.752 35.061 -20.327 1.00 23.10 C ATOM 998 OH TYR A 141 147.816 34.568 -21.613 1.00 22.16 O ATOM 999 N LEU A 142 149.062 38.222 -13.594 1.00 16.72 N ATOM 1000 CA LEU A 142 148.624 39.051 -12.488 1.00 19.03 C ATOM 1001 C LEU A 142 148.908 40.523 -12.686 1.00 23.10 C ATOM 1002 O LEU A 142 149.821 40.924 -13.432 1.00 22.49 O ATOM 1003 CB LEU A 142 149.267 38.614 -11.171 1.00 15.87 C ATOM 1004 CG LEU A 142 148.660 37.477 -10.355 1.00 17.46 C ATOM 1005 CD1 LEU A 142 149.556 37.281 -9.137 1.00 19.75 C ATOM 1006 CD2 LEU A 142 147.210 37.784 -9.909 1.00 16.31 C ATOM 1007 N ASP A 143 148.046 41.321 -12.060 1.00 22.61 N ATOM 1008 CA ASP A 143 148.189 42.752 -12.054 1.00 23.51 C ATOM 1009 C ASP A 143 148.118 43.270 -10.600 1.00 22.94 C ATOM 1010 O ASP A 143 147.077 43.184 -9.954 1.00 22.01 O ATOM 1011 CB ASP A 143 147.110 43.391 -12.915 1.00 23.96 C ATOM 1012 CG ASP A 143 147.323 44.879 
-13.111 1.00 27.00 C ATOM 1013 OD1 ASP A 143 148.461 45.353 -13.008 1.00 28.49 O ATOM 1014 OD2 ASP A 143 146.347 45.593 -13.381 1.00 30.38 O ATOM 1015 N PHE A 144 149.249 43.682 -10.039 1.00 25.83 N ATOM 1016 CA PHE A 144 149.265 44.264 -8.687 1.00 30.40 C ATOM 1017 C PHE A 144 149.961 45.599 -8.840 1.00 33.28 C ATOM 1018 O PHE A 144 150.579 46.118 -7.902 1.00 35.56 O ATOM 1019 CB PHE A 144 149.962 43.394 -7.612 1.00 31.48 C ATOM 1020 CG PHE A 144 151.122 42.564 -8.113 1.00 29.63 C ATOM 1021 CD1 PHE A 144 152.310 43.152 -8.522 1.00 28.05 C ATOM 1022 CD2 PHE A 144 151.010 41.180 -8.188 1.00 30.50 C ATOM 1023 CE1 PHE A 144 153.356 42.374 -8.997 1.00 32.18 C ATOM 1024 CE2 PHE A 144 152.057 40.399 -8.663 1.00 32.03 C ATOM 1025 CZ PHE A 144 153.227 40.997 -9.066 1.00 32.31 C ATOM 1026 N ALA A 145 149.826 46.134 -10.055 1.00 32.57 N ATOM 1027 CA ALA A 145 150.380 47.410 -10.467 1.00 32.42 C ATOM 1028 C ALA A 145 150.062 48.455 -9.439 1.00 35.45 C ATOM 1029 O ALA A 145 150.689 49.519 -9.394 1.00 35.79 O ATOM 1030 CB ALA A 145 149.771 47.832 -11.769 1.00 32.80 C ATOM 1031 N GLU A 146 149.046 48.180 -8.636 1.00 35.09 N ATOM 1032 CA GLU A 146 148.665 49.124 -7.613 1.00 33.95 C ATOM 1033 C GLU A 146 147.642 48.588 -6.646 1.00 30.04 C ATOM 1034 O GLU A 146 147.219 47.438 -6.734 1.00 28.78 O ATOM 1035 CB GLU A 146 148.084 50.373 -8.295 1.00 34.26 C ATOM 1036 N SER A 147 147.439 49.377 -5.606 1.00 28.85 N ATOM 1037 CA SER A 147 146.342 49.177 -4.682 1.00 29.39 C ATOM 1038 C SER A 147 146.059 47.929 -3.889 1.00 27.62 C ATOM 1039 O SER A 147 146.935 47.206 -3.432 1.00 33.54 O ATOM 1040 CB SER A 147 145.092 49.522 -5.476 1.00 29.12 C ATOM 1041 OG SER A 147 144.054 50.040 -4.675 1.00 37.21 O ATOM 1042 N GLY A 148 144.772 47.786 -3.637 1.00 26.24 N ATOM 1043 CA GLY A 148 144.258 46.676 -2.911 1.00 20.01 C ATOM 1044 C GLY A 148 143.746 45.780 -3.976 1.00 12.10 C ATOM 1045 O GLY A 148 142.638 45.320 -3.849 1.00 11.21 O ATOM 1046 N GLN A 149 144.549 45.532 -5.014 1.00 13.54 N ATOM 1047 
CA GLN A 149 144.171 44.633 -6.124 1.00 11.77 C ATOM 1048 C GLN A 149 144.359 43.109 -5.883 1.00 11.26 C ATOM 1049 O GLN A 149 143.589 42.308 -6.410 1.00 11.23 O ATOM 1050 CB GLN A 149 144.934 45.021 -7.393 1.00 13.29 C ATOM 1051 CG GLN A 149 144.566 46.376 -7.972 1.00 18.69 C ATOM 1052 CD GLN A 149 145.250 46.657 -9.316 1.00 17.78 C ATOM 1053 OE1 GLN A 149 144.912 47.622 -10.016 1.00 21.20 O ATOM 1054 NE2 GLN A 149 146.191 45.798 -9.692 1.00 19.36 N ATOM 1055 N VAL A 150 145.392 42.720 -5.121 1.00 12.06 N ATOM 1056 CA VAL A 150 145.710 41.314 -4.824 1.00 10.98 C ATOM 1057 C VAL A 150 145.957 41.267 -3.326 1.00 11.31 C ATOM 1058 O VAL A 150 146.863 41.947 -2.854 1.00 10.87 O ATOM 1059 CB VAL A 150 147.024 40.862 -5.575 1.00 8.44 C ATOM 1060 CG1 VAL A 150 147.249 39.380 -5.414 1.00 7.85 C ATOM 1061 CG2 VAL A 150 146.910 41.187 -7.092 1.00 10.42 C ATOM 1062 N TYR A 151 145.188 40.443 -2.608 1.00 10.87 N ATOM 1063 CA TYR A 151 145.286 40.296 -1.156 1.00 7.76 C ATOM 1064 C TYR A 151 144.745 38.957 -0.686 1.00 6.81 C ATOM 1065 O TYR A 151 143.959 38.301 -1.369 1.00 9.17 O ATOM 1066 CB TYR A 151 144.553 41.428 -0.417 1.00 8.04 C ATOM 1067 CG TYR A 151 143.124 41.671 -0.878 1.00 10.04 C ATOM 1068 CD1 TYR A 151 142.086 40.891 -0.405 1.00 7.86 C ATOM 1069 CD2 TYR A 151 142.808 42.701 -1.796 1.00 12.17 C ATOM 1070 CE1 TYR A 151 140.796 41.123 -0.812 1.00 9.92 C ATOM 1071 CE2 TYR A 151 141.491 42.938 -2.221 1.00 9.29 C ATOM 1072 CZ TYR A 151 140.494 42.144 -1.718 1.00 8.74 C ATOM 1073 OH TYR A 151 139.173 42.363 -2.079 1.00 10.19 O ATOM 1074 N PHE A 152 145.192 38.571 0.498 1.00 8.63 N ATOM 1075 CA PHE A 152 144.838 37.309 1.148 1.00 7.60 C ATOM 1076 C PHE A 152 144.780 37.626 2.659 1.00 7.28 C ATOM 1077 O PHE A 152 145.731 38.199 3.218 1.00 6.24 O ATOM 1078 CB PHE A 152 145.958 36.313 0.775 1.00 7.78 C ATOM 1079 CG PHE A 152 145.894 34.920 1.417 1.00 12.16 C ATOM 1080 CD1 PHE A 152 144.683 34.285 1.800 1.00 12.65 C ATOM 1081 CD2 PHE A 152 147.102 34.200 1.557 1.00 11.05 C 
ATOM 1082 CE1 PHE A 152 144.685 32.931 2.319 1.00 11.97 C ATOM 1083 CE2 PHE A 152 147.114 32.870 2.062 1.00 11.38 C ATOM 1084 CZ PHE A 152 145.905 32.232 2.444 1.00 9.60 C ATOM 1085 N GLY A 153 143.667 37.324 3.310 1.00 9.92 N ATOM 1086 CA GLY A 153 143.585 37.569 4.747 1.00 11.12 C ATOM 1087 C GLY A 153 142.841 36.501 5.519 1.00 10.83 C ATOM 1088 O GLY A 153 142.167 35.655 4.950 1.00 12.36 O ATOM 1089 N ILE A 154 142.986 36.523 6.838 1.00 13.35 N ATOM 1090 CA ILE A 154 142.265 35.600 7.725 1.00 13.98 C ATOM 1091 C ILE A 154 142.077 36.328 9.036 1.00 12.09 C ATOM 1092 O ILE A 154 142.904 37.162 9.396 1.00 14.29 O ATOM 1093 CB ILE A 154 142.987 34.219 8.090 1.00 16.58 C ATOM 1094 CG1 ILE A 154 144.222 34.423 8.963 1.00 14.91 C ATOM 1095 CG2 ILE A 154 143.340 33.376 6.845 1.00 20.27 C ATOM 1096 CD1 ILE A 154 144.680 33.112 9.614 1.00 21.92 C ATOM 1097 N ILE A 155 140.993 36.032 9.728 1.00 12.24 N ATOM 1098 CA ILE A 155 140.701 36.628 11.031 1.00 11.73 C ATOM 1099 C ILE A 155 139.934 35.578 11.866 1.00 11.73 C ATOM 1100 O ILE A 155 138.996 34.912 11.359 1.00 9.05 O ATOM 1101 CB ILE A 155 139.934 37.999 10.911 1.00 11.44 C ATOM 1102 CG1 ILE A 155 139.553 38.515 12.300 1.00 14.11 C ATOM 1103 CG2 ILE A 155 138.708 37.888 10.030 1.00 14.33 C ATOM 1104 CD1 ILE A 155 140.080 39.930 12.603 1.00 18.47 C ATOM 1105 N ALA A 156 140.400 35.341 13.094 1.00 12.07 N ATOM 1106 CA ALA A 156 139.724 34.378 13.960 1.00 14.71 C ATOM 1107 C ALA A 156 138.398 34.995 14.392 1.00 17.52 C ATOM 1108 O ALA A 156 138.340 36.180 14.727 1.00 17.55 O ATOM 1109 CB ALA A 156 140.570 34.035 15.142 1.00 13.77 C ATOM 1110 N LEU A 157 137.334 34.202 14.295 1.00 21.24 N ATOM 1111 CA LEU A 157 135.967 34.622 14.639 1.00 27.30 C ATOM 1112 C LEU A 157 135.522 34.202 16.044 1.00 29.11 C ATOM 1113 CB LEU A 157 134.955 34.050 13.633 1.00 30.48 C ATOM 1114 CG LEU A 157 135.004 34.466 12.159 1.00 32.78 C ATOM 1115 CD1 LEU A 157 133.788 33.891 11.444 1.00 31.95 C ATOM 1116 CD2 LEU A 157 134.976 35.991 12.040 1.00 
35.27 C ATOM 1117 OXT LEU A 157 134.446 34.662 16.457 1.00 31.94 O TER 1118 LEU A 157 HETATM 1119 O HOH 201 137.069 41.771 -6.850 1.00 17.83 O HETATM 1120 O HOH 202 134.516 36.472 -8.339 1.00 19.46 O HETATM 1121 O HOH 204 145.132 25.531 -11.730 1.00 32.55 O HETATM 1122 O HOH 205 144.666 28.030 -8.681 1.00 16.62 O HETATM 1123 O HOH 206 145.999 26.779 -3.125 1.00 21.29 O HETATM 1124 O HOH 207 149.919 31.401 -0.735 1.00 15.23 O HETATM 1125 O HOH 208 155.666 26.973 -0.378 1.00 46.78 O HETATM 1126 O HOH 209 151.921 28.765 -1.919 1.00 28.44 O HETATM 1127 O HOH 210 150.433 31.196 -6.749 1.00 16.68 O HETATM 1128 O HOH 211 147.899 47.948 9.515 1.00 17.19 O HETATM 1129 O HOH 212 145.211 47.227 8.546 1.00 15.96 O HETATM 1130 O HOH 213 144.706 45.193 6.669 1.00 9.40 O HETATM 1131 O HOH 214 144.507 50.081 7.882 1.00 18.16 O HETATM 1132 O HOH 215 146.202 51.682 10.249 1.00 25.94 O HETATM 1133 O HOH 216 147.659 45.005 -5.373 1.00 22.32 O HETATM 1134 O HOH 217 145.917 44.434 -2.185 1.00 27.40 O HETATM 1135 O HOH 218 144.751 45.522 0.359 1.00 14.64 O HETATM 1136 O HOH 219 146.331 47.118 1.782 1.00 28.76 O HETATM 1137 O HOH 220 154.188 31.414 -15.920 1.00 32.81 O HETATM 1138 O HOH 221 155.223 31.090 -9.958 1.00 25.08 O HETATM 1139 O HOH 222 160.131 36.058 -8.682 1.00 33.75 O HETATM 1140 O HOH 223 156.028 35.122 1.212 1.00 23.18 O HETATM 1141 O HOH 224 156.810 33.593 3.329 1.00 17.90 O HETATM 1142 O HOH 225 131.548 26.688 18.681 1.00 51.58 O HETATM 1143 O HOH 226 139.719 21.261 14.235 1.00 33.22 O HETATM 1144 O HOH 227 138.199 31.513 16.287 1.00 28.45 O HETATM 1145 O HOH 228 151.999 42.522 -19.516 1.00 39.47 O HETATM 1146 O HOH 229 155.593 35.680 9.917 1.00 21.41 O HETATM 1147 O HOH 230 154.129 39.073 12.894 1.00 29.64 O HETATM 1148 O HOH 231 154.582 36.121 12.756 1.00 32.19 O HETATM 1149 O HOH 232 154.114 38.814 -17.897 1.00 26.17 O HETATM 1150 O HOH 233 153.958 38.932 -6.122 1.00 17.76 O HETATM 1151 O HOH 234 151.457 46.828 -5.607 1.00 35.17 O HETATM 1152 O HOH 235 151.071 50.634 
9.434 1.00 30.11 O HETATM 1153 O HOH 236 152.883 23.313 10.011 1.00 34.98 O HETATM 1154 O HOH 237 139.672 31.590 -22.014 1.00 39.91 O HETATM 1155 O HOH 238 148.208 50.498 8.586 1.00 35.43 O HETATM 1156 O HOH 239 136.993 21.320 1.081 1.00 63.99 O HETATM 1157 O HOH 240 135.861 38.266 -31.914 1.00 34.44 O HETATM 1158 O HOH 241 144.105 22.014 17.148 1.00 33.48 O HETATM 1159 O HOH 242 158.385 48.495 5.435 1.00 23.41 O HETATM 1160 O HOH 243 158.241 40.897 2.047 1.00 40.82 O HETATM 1161 O HOH 244 167.036 38.524 8.384 1.00 87.86 O HETATM 1162 O HOH 245 145.907 38.427 -23.488 1.00 51.70 O HETATM 1163 O HOH 246 151.903 43.747 -12.234 1.00 27.51 O HETATM 1164 O HOH 247 152.055 53.152 -7.506 1.00 45.10 O HETATM 1165 O HOH 248 165.628 39.438 6.445 1.00 31.65 O HETATM 1166 O HOH 249 158.710 34.770 -6.387 1.00 30.31 O HETATM 1167 O HOH 250 162.568 40.773 -7.961 1.00 75.22 O HETATM 1168 O HOH 251 163.288 45.013 -7.285 1.00 48.22 O HETATM 1169 O HOH 252 153.343 30.821 -32.435 1.00 31.89 O HETATM 1170 O HOH 253 151.032 34.651 -28.310 1.00 50.45 O HETATM 1171 O HOH 254 153.664 38.706 -24.238 1.00 65.31 O HETATM 1172 O HOH 255 153.665 38.774 -13.000 1.00 35.47 O HETATM 1173 O HOH 256 159.158 44.270 -11.773 1.00 66.65 O HETATM 1174 O HOH 257 149.572 22.243 8.230 1.00 25.62 O HETATM 1175 O HOH 258 151.608 53.171 8.943 1.00 41.70 O HETATM 1176 O HOH 259 146.325 28.874 -25.979 1.00 43.62 O HETATM 1177 O HOH 260 141.897 33.449 -22.872 1.00 60.15 O HETATM 1178 O HOH 261 142.901 33.610 -33.680 1.00 32.13 O HETATM 1179 O HOH 262 142.993 26.074 -27.292 1.00 56.58 O HETATM 1180 O HOH 263 149.097 13.806 -1.263 1.00 37.20 O HETATM 1181 O HOH 264 143.952 19.577 4.867 1.00 34.68 O HETATM 1182 O HOH 265 146.180 25.248 -6.294 1.00 44.91 O HETATM 1183 O HOH 266 148.111 50.620 -3.136 1.00 55.61 O HETATM 1184 O HOH 267 144.093 13.465 10.360 1.00 64.38 O HETATM 1185 O HOH 268 141.577 14.676 13.168 1.00 54.76 O HETATM 1186 O HOH 269 137.019 17.606 19.854 1.00 33.36 O HETATM 1187 O HOH 270 149.639 55.203 
4.611 1.00 49.01 O HETATM 1188 O HOH 271 156.238 32.191 -4.204 1.00 64.53 O CONECT 453 685 CONECT 685 453 MASTER 357 0 0 1 10 0 0 6 1187 1 2 12 END
PyCogent-1.5.3/doc/data/abglobin_aa.phylip000644 000765 000024 00000003215 10665667404 021402 0ustar00jrideoutstaff000000 000000
5 285 I
human
goat-cow
rabbit
rat
marsupial
VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGK
VLSAADKSNV KAAWGKVGGN AGAYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGE
VLSPADKTNI KTAWEKIGSH GGEYGAEAVE RMFLGFPTTK TYFPHFDFTH GSEQIKAHGK
VLSADDKTNI KNCWGKIGGH GGEYGEEALQ RMFAAFPTTK TYFSHIDVSP GSAQVKAHGK
VLSDADKTHV KAIWGKVGGH AGAYAAEALA RTFLSFPTTK TYFPHFDLSP GSAQIQGHGK

KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL LSHCLLVTLA AHLPAEFTPA
KVAAALTKAV GHLDDLPGTL SDLSDLHAHK LRVDPVNFKL LSHSLLVTLA CHLPNDFTPA
KVSEALTKAV GHLDDLPGAL STLSDLHAHK LRVDPVNFKL LSHCLLVTLA NHHPSEFTPA
KVADALAKAA DHVEDLPGAL STLSDLHAHK LRVDPVNFKF LSHCLLVTLA CHHPGDFTPA
KVADALSQAV AHLDDLPGTM SKLSDLHAHK LRVDPVNFKL LSHCLIVTLA AHLSKDLTPE

VHASLDKFLA SVSTVLTSKY RLTPEEKSAV TALWGKVNVD EVGGEALGRL LVVYPWTQRF
VHASLDKFLA NVSTVLTSKY RLTAEEKAAV TAFWGKVKVD EVGGEALGRL LVVYPWTQRF
VHASLDKFLA NVSTVLTSKY RLSSEEKSAV TALWGKVNVE EVGGEALGRL LVVYPWTQRF
MHASLDKFLA SVSTVLTSKY RLTDAEKAAV NALWGKVNPD DVGGEALGRL LVVYPWTQRY
VHASMDKFFA SVATVLTSKY RLTSEEKNCI TTIWSKVQVD QTGGEALGRM LVVYPWTTRF

FESFGDLSTP DAVMGNPKVK AHGKKVLGAF SDGLAHLDNL KGTFATLSEL HCDKLHVDPE
FESFGDLSTA DAVMNNPKVK AHGKKVLDSF SNGMKHLDDL KGTFAALSEL HCDKLHVDPE
FESFGDLSSA NAVMNNPKVK AHGKKVLAAF SEGLSHLDNL KGTFAKLSEL HCDKLHVDPE
FDSFGDLSSA SAIMGNPKVK AHGKKVINAF NDGLKHLDNL KGTFAHLSEL HCDKLHVDPE
FGSFGDLSSP GAVMSNSKVQ AHGAKVLTSF GEAVKHLDNL KGTYAKLSEL HCDKLHVDPE

NFRLLGNVLV CVLAHHFGKE FTPPVQAAYQ KVVAGVANAL AHKYH
NFKLLGNVLV VVLARNFGKE FTPVLQADFQ KVVAGVANAL AHRYH
NFRLLGNVLV IVLSHHFGKE FTPQVQAAYQ KVVAGVANAL AHKYH
NFRLLGNMIV IVLGHHLGKE FTPCAQAAFQ KVVAGVASAL AHKYH
NFKMLGNIIV ICLAEHFGKD FTPECQVAWQ KLVAGVAHAL AHKYH
PyCogent-1.5.3/doc/data/count_seqs.py000644 000765 000024 00000004321 11612102756 020454
0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
from __future__ import division

__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2011, The PyCogent project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.6.0dev"
__maintainer__ = "Greg Caporaso"
__email__ = "gregcaporaso@gmail.com"
__status__ = "Development"

from glob import glob
from cogent.util.option_parsing import (
    parse_command_line_parameters, make_option)
from cogent.parse.fasta import MinimalFastaParser

script_info = {}
script_info['brief_description'] = "Count sequences in one or more fasta files."
script_info['script_description'] = "This script counts the number of sequences in one or more fasta files and prints the results to stdout."
script_info['script_usage'] = [\
 ("Count sequences in one file",
  "Count the sequences in a fasta file and write results to stdout.",
  "%prog -i in.fasta"),
 ("Count sequences in two files",
  "Count the sequences in two fasta files and write results to stdout.",
  "%prog -i in1.fasta,in2.fasta"),
 ("Count the sequences in many fasta files",
  "Count the sequences in all .fasta files in the current directory and write results to stdout. Note that the -i option must be quoted.",
  "%prog -i \"*.fasta\"")]
script_info['output_description']= "Tabular data is written to stdout."
script_info['required_options'] = [
 make_option('-i','--input_fps',
             help='the input filepaths (comma-separated)'),
]
script_info['optional_options'] = [
 make_option('--suppress_errors',action='store_true',\
             help='Suppress warnings about missing files [default: %default]',
             default=False)
]
script_info['version'] = __version__

def main():
    option_parser, opts, args =\
       parse_command_line_parameters(**script_info)
    suppress_errors = opts.suppress_errors
    input_fps = []
    for input_fp in opts.input_fps.split(','):
        input_fps.extend(glob(input_fp))
    for input_fp in input_fps:
        i = 0
        try:
            input_f = open(input_fp,'U')
        except IOError,e:
            if suppress_errors:
                continue
            else:
                print input_fp, e
                # skip files that could not be opened
                continue
        for s in MinimalFastaParser(input_f):
            i += 1
        print input_fp, i

if __name__ == "__main__":
    main()
PyCogent-1.5.3/doc/data/Crump_et_al_example_env_file.txt000644 000765 000024 00000005325 11444723227 024306 0ustar00jrideoutstaff000000 000000 AF141399 R_FL 1 AF141411 R_FL 1 AF141408 R_FL 2 AF141403 R_FL 1 AF141410 R_FL 1 AF141398 R_FL 2 AF141391 R_FL 1 AF141389 R_FL 1 AF141395 R_FL 1 AF141401 R_FL 1 AF141390 R_FL 1 AF141393 R_FL 1 AF141396 R_FL 1 AF141402 R_FL 1 AF141407 R_FL 1 AF141387 R_FL 1 AF141394 R_FL 2 AF141409 R_FL 1 AF141400 R_FL 1 AF141397 R_FL 1 AF141405 R_FL 1 AF141388 R_FL 1 AF141424 R_PA 1 AF141421 R_PA 1 AF141433 R_PA 1 AF141428 R_PA 1 AF141432 R_PA 1 AF141426 R_PA 1 AF141430 R_PA 1 AF141413 R_PA 1 AF141419 R_PA 1 AF141423 R_PA 2 AF141429 R_PA 2 AF141422 R_PA 1 AF141431 R_PA 1 AF141415 R_PA 1 AF141418 R_PA 1 AF141416 R_PA 1 AF141420 R_PA 1 AF141417 R_PA 1 AF141434 R_PA 1 AF141412 R_PA 1 AF141414 R_PA 1 AF141463 E_FL 1 AF141493 E_FL 14 AF141459 E_FL 1 AF141461 E_FL 1 AF141447 E_FL 1 AF141479 E_FL 1 AF141449 E_FL 1 AF141465 E_FL 2 AF141435 E_FL 1 AF141457 E_FL 3 AF141468 E_FL 1 AF141487 E_FL 1 AF141472 E_FL 1 AF141466 E_FL 1 AF141444 E_FL 4 AF141473 E_FL 3 AF141439 E_FL 1 AF141436 E_FL 1 AF141455 E_FL 1 AF141443 E_FL 3 AF141483 E_FL 1 AF141476 E_FL 1 AF141441 E_FL 1 AF141440 E_FL 2
AF141474 E_FL 1 AF141486 E_FL 1 AF141481 E_FL 1 AF141480 E_FL 1 AF141470 E_FL 1 AF141458 E_FL 1 AF141460 E_FL 1 AF141478 E_FL 1 AF141450 E_FL 1 AF141471 E_FL 2 AF141442 E_FL 1 AF141454 E_FL 1 AF141488 E_FL 1 AF141451 E_FL 1 AF141456 E_FL 1 AF141452 E_FL 1 AF141482 E_FL 1 AF141448 E_FL 1 AF141484 E_FL 3 AF141475 E_FL 1 AF141548 E_PA 4 AF141520 E_PA 1 AF141508 E_PA 1 AF141523 E_PA 1 AF141547 E_PA 3 AF141530 E_PA 2 AF141510 E_PA 1 AF141513 E_PA 1 AF141524 E_PA 1 AF141516 E_PA 1 AF141543 E_PA 1 AF141503 E_PA 11 AF141498 E_PA 1 AF141532 E_PA 2 AF141549 E_PA 1 AF141501 E_PA 2 AF141525 E_PA 1 AF141534 E_PA 1 AF141506 E_PA 1 AF141519 E_PA 1 AF141550 E_PA 1 AF141496 E_PA 1 AF141535 E_PA 3 AF141512 E_PA 1 AF141514 E_PA 1 AF141537 E_PA 1 AF141533 E_PA 2 AF141511 E_PA 1 AF141539 E_PA 1 AF141545 E_PA 2 AF141500 E_PA 3 AF141505 E_PA 1 AF141497 E_PA 1 AF141541 E_PA 1 AF141518 E_PA 1 AF141551 E_PA 1 AF141504 E_PA 1 AF141546 E_PA 1 AF141515 E_PA 1 AF141529 E_PA 1 AF141507 E_PA 1 AF141540 E_PA 1 AF141521 E_PA 1 AF141538 E_PA 1 AF141522 E_PA 1 AF141509 E_PA 1 AF141517 E_PA 1 AF141536 E_PA 1 AF141557 O_UN 1 AF141569 O_UN 2 AF141561 O_UN 4 AF141578 O_UN 2 AF141567 O_UN 1 AF141566 O_UN 1 AF141568 O_UN 2 AF141579 O_UN 6 AF141562 O_UN 1 AF141559 O_UN 1 AF141560 O_UN 1 AF141577 O_UN 1 AF141583 O_FL 2 AF141595 O_FL 1 AF141599 O_FL 2 AF141582 O_FL 2 AF141600 O_FL 1 AF141594 O_FL 2 AF141592 O_FL 1 AF141597 O_FL 1 AF141590 O_FL 3 AF141581 O_FL 1 AF141587 O_FL 1 AF141588 O_FL 1 AF141586 O_FL 1 AF141591 O_FL 1 AF141589 O_FL 1 AF141593 O_FL 1 PyCogent-1.5.3/doc/data/Crump_example_tree_newick.txt000644 000765 000024 00000062330 11444723227 023651 0ustar00jrideoutstaff000000 000000 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( AF141409:0.25346, ( ( ( AF141563:0.00000, AF141568:0.00000 ):0.03773, AF141410:0.06022 )Pseudomonadaceae:0.08853, ( ( AF141521:0.12012, AF141439:0.03364 ):0.01884, ( ( ( ( ( ( ( ( ( ( ( AF141494:0.00000, AF141503:0.00000 ):0.00000, AF141553:0.00000 ):0.00000, AF141555:0.00000 ):0.00000, 
AF141527:0.00000 ):0.00000, AF141477:0.00513 ):0.00000, AF141554:0.00000 ):0.00000, AF141438:0.00000 ):0.00000, AF141493:0.00000 ):0.00000, AF141489:0.00000 ):0.00000, AF141437:0.00513 ):0.00000, AF141490:0.00771 ):0.07637 ):0.00573 ):0.00896 ):0.00327, ( ( AF141546:0.03666, AF141510:0.04080 ):0.03273, AF141586:0.05358 ):0.00492 ):0.02041, ( ( AF141577:0.04055, ( ( AF141558:0.00000, AF141569:0.00771 ):0.01436, AF141430:0.01282 ):0.01144 )Legionellaceae:0.06630, ( AF141501:0.01323, AF141528:0.00000 )SUP05:0.06854 ):0.02220 ):0.00651, AF141498:0.12862 ):0.00081, AF141428:0.08582 ):0.00570, ( ( ( ( ( ( ( ( ( ( AF141464:0.00485, AF141422:0.00256 ):0.00000, AF141473:0.00770 )Alcaligenaceae:0.06178, ( ( ( ( ( ( AF141492:0.01026, AF141453:0.00513 ):0.00000, AF141443:0.01027 ):0.00323, ( AF141441:0.00162, ( AF141404:0.00000, AF141398:0.00000 ):0.00161 ):0.02267 ):0.01703, ( ( AF141465:0.00513, AF141491:0.00256 ):0.00000, AF141405:0.00256 ):0.01214 )Polynucleobacter:0.02121, AF141431:0.03263 )Ralstoniaceae:0.01786, AF141456:0.01328 ):0.01630 ):0.00325, ( ( ( ( ( ( ( ( AF141462:0.01064, AF141414:0.00513 ):0.00000, AF141457:0.00000 ):0.00161, AF141413:0.00383 ):0.00646, AF141459:0.08285 ):0.02827, ( ( ( ( ( ( AF141468:0.00256, AF141392:0.00256 ):0.00324, AF141394:0.00000 ):0.00323, AF141388:0.05424 ):0.01136, AF141581:0.00998 ):0.00323, ( AF141444:0.00000, AF141446:0.00000 ):0.01673 ):0.02266, AF141478:0.02523 ):0.00162 ):0.02266, AF141595:0.05097 ):0.02108, ( ( AF141393:0.00513, AF141458:0.00256 ):0.02621, AF141469:0.03147 ):0.02285 ):0.01541, AF141538:0.10137 )Comamonadaceae:0.06378 ):0.01134, AF141449:0.06970 )Burkholderiales:0.02283, ( AF141600:0.07983, AF141482:0.04023 ):0.00573 ):0.01304, ( ( ( AF141486:0.00000, AF141403:0.00256 ):0.00000, AF141525:0.01542 ):0.01546, AF141461:0.01645 )Methylophilales:0.06427 ):0.00326, ( AF141551:0.02426, AF141507:0.04615 )Neisseriales:0.09487 )Betaproteobacteria:0.07947, AF141517:0.09902 ):0.05527, ( ( AF141532:0.01292, 
AF141542:0.00257 ):0.00000, AF141424:0.00262 )Ellin307/WD2124:0.12728 ):0.01546 )Gamma_beta_proteobacteria:0.02854, ( ( ( ( ( ( AF141557:0.00513, AF141480:0.00256 )Roseobacter:0.13561, ( ( AF141529:0.01265, AF141434:0.03671 ):0.02966, ( ( ( AF141526:0.00256, AF141495:0.00000 ):0.00000, AF141556:0.00513 ):0.00000, AF141548:0.00514 ):0.00164 )Rhodobacter:0.04051 )Rhodobacterales:0.22265, ( AF141421:0.04514, AF141432:0.01539 )Beijerinckiaceae:0.06527 ):0.00244, ( ( ( AF141544:0.00000, AF141545:0.00000 ):0.11111, AF141450:0.02767 ):0.10513, AF141396:0.07575 ):0.00653 ):0.00487, ( ( AF141530:0.00000, AF141531:0.00000 ):0.02863, AF141448:0.04585 )Sphingomonadales:0.13392 ):0.04010, ( ( ( ( ( ( ( ( ( ( ( AF141598:0.00323, AF141479:0.00836 ):0.00646, AF141593:0.01564 ):0.01291, AF141588:0.01033 )Pelagibacter:0.00646, AF141583:0.00257 ):0.00000, AF141539:0.00000 ):0.01778, ( AF141601:0.00162, AF141580:0.00769 ):0.02595 ):0.06022, ( ( ( AF141582:0.00000, AF141585:0.00000 ):0.02425, AF141590:0.02328 ):0.02274, AF141395:0.04863 ):0.00498 )SAR11:0.04348, ( AF141447:0.05616, ( AF141594:0.00256, AF141584:0.00000 ):0.06659 ):0.02347 ):0.00663, AF141567:0.22932 ):0.01333, AF141589:0.05577 ):0.01247, AF141435:0.09271 )Consistiales:0.05054 )Alphaproteobacteria:0.08879 ):0.07221, ( ( ( ( AF141472:0.16321, ( AF141537:0.01302, AF141496:0.00838 )Desulfobulbaceae:0.12237 ):0.01639, AF141454:0.14646 ):0.01228, AF141504:0.15471 ):0.01396, AF141505:0.13406 )'Delta proteobacteria':0.00581 )Proteobacteria:0.00332, AF141560:0.18908 ):0.00658, ( ( ( ( ( ( AF141549:0.10860, ( ( ( ( ( ( ( ( AF141499:0.00258, AF141547:0.00258 ):0.00000, AF141559:0.00258 ):0.07190, AF141518:0.07560 ):0.04246, AF141474:0.07724 )Cytophaga:0.01871, ( ( ( ( AF141515:0.02581, ( AF141519:0.02952, AF141452:0.04379 ):0.01140 ):0.00484, AF141451:0.02101 ):0.01471, AF141466:0.06719 )Sporocytophaga:0.02370, AF141524:0.04740 ):0.01239 ):0.01304, ( ( AF141500:0.00256, AF141552:0.00000 ):0.00000, AF141502:0.00000 ):0.02059 
):0.04357, AF141407:0.14560 ):0.03134, AF141488:0.10821 )Flavobacteriales:0.01232 ):0.02462, AF141543:0.10581 ):0.00743, AF141397:0.07576 ):0.04542, ( ( AF141436:0.01862, AF141497:0.00325 ):0.00162, AF141460:0.00797 )Flexibacteraceae:0.13515 ):0.00494, ( AF141550:0.09051, AF141418:0.03264 )Saprospiraceae:0.16732 ):0.01398, AF141514:0.25839 )Bacteroidetes:0.13111 ):0.00907, ( ( ( ( AF141516:0.06810, AF141562:0.03493 )'"Planctomycetacia"':0.09418, ( AF141399:0.00528, AF141417:0.00262 )'"Gemmatae"':0.21158 )Planctomycetes:0.09116, ( ( AF141391:0.02575, ( AF141508:0.16401, ( ( ( AF141475:0.07537, AF141487:0.02921 ):0.02882, AF141513:0.08164 ):0.03217, AF141541:0.05472 ):0.02157 )'Verrucomicrobiae (1)':0.00906 ):0.03990, ( ( AF141455:0.00512, AF141387:0.00256 ):0.01132, ( AF141408:0.00767, AF141406:0.00257 ):0.00815 )'Opitutae (4)':0.18896 )Verrucomicrobia:0.10524 ):0.01404, AF141536:0.11511 ):0.00906 ):0.00332, ( ( ( ( AF141463:0.01550, AF141476:0.01081 )'Agrococcus et al.':0.09011, ( ( ( ( ( AF141592:0.00664, AF141426:0.01026 ):0.02268, ( AF141402:0.01525, ( AF141484:0.00418, AF141445:0.00257 ):0.00486 ):0.00811 ):0.00486, AF141587:0.00170 ):0.02354, ( AF141442:0.00674, AF141389:0.00000 ):0.01457 ):0.00489, AF141411:0.06787 )Cellulomonadaceae:0.10377 )Actinobacteridae:0.09674, ( ( ( ( AF141471:0.01091, AF141467:0.01136 ):0.01798, ( ( AF141400:0.00767, AF141481:0.00000 ):0.05116, ( AF141401:0.00512, AF141433:0.00256 ):0.03594 ):0.01066 ):0.01065, ( ( ( AF141522:0.00000, AF141423:0.01279 ):0.00000, AF141427:0.00000 ):0.00000, AF141420:0.00257 ):0.04350 ):0.09155, ( ( ( AF141440:0.00421, AF141520:0.00256 ):0.00161, AF141597:0.00256 ):0.00000, AF141485:0.00256 )'BD2-10 group':0.11726 )Acidimicrobidae:0.04260 )Actinobacteria:0.05311, ( ( ( ( ( ( ( ( AF141574:0.00000, AF141591:0.00256 ):0.00000, AF141579:0.00256 ):0.00000, AF141572:0.00769 ):0.00000, AF141565:0.00256 ):0.00000, AF141571:0.00513 ):0.00000, AF141575:0.00771 ):0.00324, ( ( AF141596:0.00000, AF141564:0.00513 
):0.00162, ( AF141599:0.00256, AF141578:0.01026 ):0.00162 ):0.00809 )Prochlorales:0.18823, ( ( ( ( ( ( AF141523:0.01285, AF141425:0.00258 ):0.00000, AF141429:0.00260 ):0.02350, AF141470:0.10407 ):0.00486, ( ( AF141534:0.00809, AF141533:0.04316 ):0.00647, ( AF141412:0.00000, AF141419:0.00257 ):0.00162 ):0.01456 ):0.04144, ( ( ( ( ( AF141561:0.00372, AF141570:0.00000 ):0.00000, AF141576:0.00000 ):0.00000, AF141506:0.00743 ):0.00000, AF141573:0.00743 ):0.05520, AF141566:0.02577 ):0.01571 )'Euglena et al. chloroplasts':0.08248, AF141512:0.08318 )Chloroplasts:0.03457 )Cyanobacteria:0.16029 ):0.00661 ):0.01395, ( AF141416:0.20329, AF141511:0.19093 )'"Anaerolines"':0.09929 ):0.00986, AF141540:0.22389 ):0.03152, ( AF141509:0.23520, ( AF141390:0.25622, AF141483:0.15819 )OP11-5:0.02369 ):0.08699 ):0.00913, ( AF141415:0.00647, AF141535:0.00260 )OP10:0.19926 )Bacteria; PyCogent-1.5.3/doc/data/dayhoff.dat000644 000765 000024 00000003177 10665667404 020056 0ustar00jrideoutstaff000000 000000 27 98 32 120 0 905 36 23 0 0 89 246 103 134 0 198 1 148 1153 0 716 240 9 139 125 11 28 81 23 240 535 86 28 606 43 10 65 64 77 24 44 18 61 0 7 41 15 34 0 0 73 11 7 44 257 26 464 318 71 0 153 83 27 26 46 18 72 90 1 0 0 114 30 17 0 336 527 243 18 14 14 0 0 0 0 15 48 196 157 0 92 250 103 42 13 19 153 51 34 94 12 32 33 17 11 409 154 495 95 161 56 79 234 35 24 17 96 62 46 245 371 26 229 66 16 53 34 30 22 192 33 136 104 13 78 550 0 201 23 0 0 0 0 0 27 0 46 0 0 76 0 75 0 24 8 95 0 96 0 22 0 127 37 28 13 0 698 0 34 42 61 208 24 15 18 49 35 37 54 44 889 175 10 258 12 48 30 157 0 28 0.087127 0.040904 0.040432 0.046872 0.033474 0.038255 0.049530 0.088612 0.033618 0.036886 0.085357 0.080482 0.014753 0.039772 0.050680 0.069577 0.058542 0.010494 0.029916 0.064718 Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val S_ij = S_ji and PI_i for the Dayhoff model, with the rate Q_ij=S_ij*PI_j The rest of the file is not used. Prepared by Z. Yang, March 1995. From the PAML distribution. 
PyCogent-1.5.3/doc/data/dists_for_phylo.pickle000644 000765 000024 00000011674 10665667404 022345 0ustar00jrideoutstaff000000 000000 (dp1 (S'SpermWhale' p2 S'Jackrabbit' p3 tp4 F0.23558133673269555 s(S'LeafNose' p5 S'TreeShrew' p6 tp7 F0.22278929978754272 s(g5 S'Hedgehog' p8 tp9 F0.25237868673223468 s(g3 S'Anteater' p10 tp11 F0.23657337673373716 s(g6 S'Gorilla' p12 tp13 F0.18676231969492157 s(S'FreeTaile' p14 S'Llama' p15 tp16 F0.16003181770424132 s(g14 S'Pangolin' p17 tp18 F0.1603579293663818 s(S'Rat' p19 S'LesserEle' p20 tp21 F0.42828304384234206 s(g15 g6 tp22 F0.20581286739688848 s(g8 S'Wombat' p23 tp24 F0.61998461379703196 s(g12 g3 tp25 F0.21764010128591116 s(g12 S'AfricanEl' p26 tp27 F0.17413706816807067 s(g26 S'GoldenMol' p28 tp29 F0.16381078019769496 s(g12 g28 tp30 F0.21665860241388582 s(g5 S'Mole' p31 tp32 F0.20376495738316491 s(g6 g20 tp33 F0.2804198266016712 s(g15 g23 tp34 F0.56706563327409432 s(g17 g10 tp35 F0.18634249480113463 s(g14 g28 tp36 F0.23896070726886304 s(g2 g26 tp37 F0.19175198045399872 s(g2 g8 tp38 F0.24161185834356419 s(g15 S'Sloth' p39 tp40 F0.16754632039974526 s(g3 g28 tp41 F0.28529535427776653 s(g17 g23 tp42 F0.55554889797566909 s(g19 g23 tp43 F0.68514773363006787 s(g14 g8 tp44 F0.24840744685320804 s(g5 g20 tp45 F0.26814374655962026 s(g2 g31 tp46 F0.20381415104564346 s(g2 g19 tp47 F0.36507283188667267 s(g19 g26 tp48 F0.36098887414715852 s(g2 g10 tp49 F0.18660004132482483 s(g5 g12 tp50 F0.17706328308590918 s(g2 g6 tp51 F0.21142599693080405 s(g6 g26 tp52 F0.21295142747772908 s(g6 g3 tp53 F0.24464518837038263 s(g15 g26 tp54 F0.17670838821812171 s(g39 g20 tp55 F0.24408791205710079 s(g14 g2 tp56 F0.15881905159696566 s(g5 g19 tp57 F0.35294087691991094 s(g12 g39 tp58 F0.1684966665016325 s(g8 g26 tp59 F0.25963862266271504 s(g15 g19 tp60 F0.35372121548659219 s(g31 g26 tp61 F0.22043254465350556 s(g3 g20 tp62 F0.31724601399917407 s(g12 g19 tp63 F0.31928646619856205 s(g14 g19 tp64 F0.35853994863389005 s(g19 g10 tp65 F0.36076410676098597 s(g5 g2 tp66 
F0.1552418100705735 s(g8 g12 tp67 F0.25351166585922669 s(g15 g2 tp68 F0.11541046335984746 s(g14 g39 tp69 F0.17429921193670331 s(g6 g10 tp70 F0.21793050355674712 s(g6 g23 tp71 F0.54524246572012069 s(g5 g15 tp72 F0.15371168872395888 s(g17 g26 tp73 F0.18307091636646436 s(g10 g23 tp74 F0.54283966576227993 s(g6 g39 tp75 F0.20347481450512622 s(g39 g28 tp76 F0.19626573589070589 s(g31 g10 tp77 F0.22934425261435534 s(g5 g39 tp78 F0.17481118696254297 s(g10 g20 tp79 F0.26273051893624683 s(g8 g39 tp80 F0.25507622514211981 s(g14 g12 tp81 F0.17645529962671383 s(g28 g23 tp82 F0.57958513433104586 s(g6 g28 tp83 F0.2523275028705062 s(g2 g28 tp84 F0.23379205703803776 s(g26 g20 tp85 F0.202434309645733 s(g14 g26 tp86 F0.19290342630687046 s(g14 g5 tp87 F0.13259776625919503 s(g39 g23 tp88 F0.52862866877459347 s(g8 g6 tp89 F0.28306453011242577 s(g31 g39 tp90 F0.21055499043334253 s(g17 g19 tp91 F0.34953396164153971 s(g5 g17 tp92 F0.15611423240637898 s(g20 g23 tp93 F0.57341059979350384 s(g15 g3 tp94 F0.23579080572698605 s(g14 g31 tp95 F0.2035375708124624 s(g10 g26 tp96 F0.17198063443413589 s(g31 g8 tp97 F0.2514186346183323 s(g2 g12 tp98 F0.17427958001912375 s(g12 g23 tp99 F0.54897931148601919 s(g31 g3 tp100 F0.26305710513226543 s(g31 g23 tp101 F0.58419001011914529 s(g17 g2 tp102 F0.15061801593919677 s(g31 g6 tp103 F0.23157223340912828 s(g31 g12 tp104 F0.20030071593079626 s(g5 g28 tp105 F0.2356211665616679 s(g31 g20 tp106 F0.30047877128702466 s(g2 g23 tp107 F0.58717050903439605 s(g14 g3 tp108 F0.23535606447601404 s(g15 g28 tp109 F0.22391018143489855 s(g3 g23 tp110 F0.60050563531851719 s(g19 g28 tp111 F0.38781922994069989 s(g17 g31 tp112 F0.19663617963790703 s(g14 g20 tp113 F0.27697713155467224 s(g17 g28 tp114 F0.22139844503006362 s(g15 g8 tp115 F0.22963441872671392 s(g2 g39 tp116 F0.17537168206526232 s(g8 g19 tp117 F0.44528498187553256 s(g39 g26 tp118 F0.15707714792985861 s(g8 g10 tp119 F0.262069531052805 s(g15 g20 tp120 F0.26186754674282603 s(g15 g31 tp121 F0.19500762515085551 s(g5 g23 
tp122 F0.56372551302180518 s(g5 g10 tp123 F0.18431861844687059 s(g31 g28 tp124 F0.26123994830524716 s(g12 g20 tp125 F0.2556467459575969 s(g15 g10 tp126 F0.18649954620741344 s(g15 g12 tp127 F0.16892629241793777 s(g10 g39 tp128 F0.10430995903020382 s(g14 g23 tp129 F0.5724514681894517 s(g12 g10 tp130 F0.17953067369611775 s(g17 g20 tp131 F0.26576072655285454 s(g17 g12 tp132 F0.17154547111760352 s(g28 g20 tp133 F0.23784467508495774 s(g17 g6 tp134 F0.20514478285357235 s(g14 g6 tp135 F0.21926904153263413 s(g17 g3 tp136 F0.22895223603506784 s(g6 g19 tp137 F0.3643788194875382 s(g8 g28 tp138 F0.30424919454836291 s(g14 g10 tp139 F0.19114094081112373 s(g17 g8 tp140 F0.23876822737703321 s(g26 g23 tp141 F0.5387491498191419 s(g5 g26 tp142 F0.18819818650279263 s(g3 g26 tp143 F0.23469538913725793 s(g3 g39 tp144 F0.22055976180843367 s(g19 g39 tp145 F0.34696523857967398 s(g31 g19 tp146 F0.39839525397146391 s(g8 g20 tp147 F0.34529762840916212 s(g17 g39 tp148 F0.17675542635071309 s(g17 g15 tp149 F0.15559649688143493 s(g2 g20 tp150 F0.26909857983936736 s(g10 g28 tp151 F0.20422976142040555 s(g8 g3 tp152 F0.30157557875560231 s(g3 g19 tp153 F0.3759948154971699 s(g5 g3 tp154 F0.22443287357052219 s.PyCogent-1.5.3/doc/data/fastq.txt000755 000765 000024 00000000000 12024703640 024144 2../../tests/data/fastq.txtustar00jrideoutstaff000000 000000 PyCogent-1.5.3/doc/data/inseqs.fasta000644 000765 000024 00000000052 11350301455 020232 0ustar00jrideoutstaff000000 000000 >s2_like_seq TGCAGCTTGAGCACAGGTTAGAGCCTTC PyCogent-1.5.3/doc/data/inseqs_protein.fasta000644 000765 000024 00000000202 11361457314 021777 0ustar00jrideoutstaff000000 000000 >1091044_fragment IPLDFDKEFRDKTVVIVAIPGAFTPT >13541053_fragment KKKNTEVISVSEDTVYVHKAWVQYD >15605725_fragment FEILAINMDPENLTGFLKNNPPyCogent-1.5.3/doc/data/long_testseqs.fasta000644 000765 000024 00000031171 11213030412 021616 0ustar00jrideoutstaff000000 000000 >Human TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACT 
AAAGACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTA GCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACT CCCAGCGAAAAAAAGGTAGATCTGAATGCTGATCCCCTGTGTGAGAGAAAAGAATGGAAT AAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGATACTGAAGATGTTCCTTGGATAACA CTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGT TCTGATGACTCACATGATGGGGAGTCTGAATCAAATGCCTTGGACGTTCTAAATGAGGTA GATGAATATTCTGGTTCTTCAGAGAAAATAGACTTACTGGCCAGTGATCCTCATGAGGCT TTAATATGTGAAAGAGTTCACTCCAAATCAGTAGAGAGTAATATTGAAGACAAAATATTT GGGAAAACCTATCGGAAGAAGGCAAGCCTCCCCAACTTAAGCCATGTAACTGAAATTATA GGAGCATTTGTTACTGAGCCACAGATAATACAAGAGCGTCCCCTCACAAATAAATTAAAG CGTAAAAGGACATCAGGCCTTCATCCTGAGGATTTTATCAAGAAAGCAGATTTGGCAGTT CAAAAGACTCCTGAAATGATAAATCAGGGAACTAACCAAACGGAGCAGAATGGTCAAGTG ATGAATATTACTAATAGTGGTCATGAGAATAAAACAAAAGGTGATTCTATTCAGAATGAG AAAAATCCTAACCCAATAGAATCACTCGAAAAAGAATCTTTCAAAACGAAAGCTGAACCT ATAAGCAGCAGTATAAGCAATATGGAACTCGAATTAAATATCCACAATTCAAAAGCACCT AAAAAGAATCTGAGGAGGAAGTCTACCAGGCATATTCATGCGCTTGAACTAGTCAGTAGA AATCTAAGCCCACCTAATTGTACTGAATTGCAAATTGATAGTTGTTCTAGCAGTGAAGAG ATAAAGAAAAAAAAGTACAACCAAATGCCAGTCAGGCACAGCAGAAACCTACAACTCATG GAAGGTAAAGAACCTGCAACTGGAGCCAAGAAGAACAAGCCAAATGAACAGACAAGTAAA AGACATGACAGCGATACTTTCCCAGAGCTGAAGAATGCACCTGGTTCTTTTACTAAGTGT TCAAATACCAGTGAACTTAAAGAATTTAATCCTAGCCTTCCAAGAGAAGAAAAAGAGAAA CTAGAAACAGTTAAAGTGTCTAATAATGCTGAAGACCCCAAAGATCTCATGTTAAGTGGA GAAAGGGTTTTGCAAACTGAAAGATCTGTAGAGAGTAGCAGTATTTCATTGGTACCTGGT ACTGATTATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGGAAGGCA AAAACAGAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGGACTA ATTCATGGTTCCAAAGATAATAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGACAT GAAGTTAACCACTCAAATCCAGAAGAGGAATGTGCACACTCTGGGTCCTTAAAGAAACAA AGTCCAAAAGTCACTTTTGAATGTGAACAAAAGGAAAATCAAGGAAAGAATGAGTCTAAT AAGCCTGTACAGACAGTTAATATCACTGCAGGCTTTCCTGTGGTTGGTCAGAAAGATAAG CCAGTTGATAATGCCAAATGTAAAGGAGGCTCTAGGTTTTGTCTATCATCTCAGTTCAGA GGCAACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTTACAAAACCCATATCGT ATACCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAATCTGCTA 
GAGGAAAACTTTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAATGAGAACATT CCAAGTACAGTGAGCACAATTAGCCGTAATAACAGAGAAAATGTTTTTAAAGAAGCCAGC TCAAGCAATATTAATGAAGTAGGTTCCAGTGATGAAAACATTCAAGCAGAACTAGGTAGA AACAGAGGGCCAAAATTGAATGCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTAT AAACAAAGTCTTCCTGGAAGTAATAAGCATCCTGAAATAAAAAAGCAAGAAGTTCAGACT GTTAATACAGATTTCTCTCCACTGATTTCAGATAACTTAGAACAGCCTATGAGTAGTCAT GCATCTCAGGTTTGTTCTGAGACACCTGATGACCTGTTAGATGATGGTGAAATAAAGGAA GATACTAGTTTTGCTGAAAATGACATTAAGGAAAGTTCTGCTGTTTTTAGCAAAAGCGTC CAGAAAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCATACACATTTGGCTCAGGGT TACCGAAGAGGG >HowlerMon TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTGTTACTCACT AAAGACACACTGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTA GCAAGGAGCCAACATAACAGATGGGCTGAAAGTGAGGAAACATGTAATGATAGGCAGACT CCCAGCGAGAAAAAGGTAGATGTGGATGCTGATCCCCTGCATGGGAGAAAAGAATGGAAT AAGCAGAAACCTCCGTGCTCTGAGAATCCTAGAGATACTGAAGATGTTGCTTGGATAATG CTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAACT TCTGATGACTCACATGATGGGGGGTCTGAATCAAATGCCTTGGAAGTTCTAAATGAGGTA GATGGATATTCTAGTTCTTCAGAGAAAATAGACTTACTGGCCAGTGATCCTCATGATCAT TTGATATGTGAAAGAGTTCACTGCAAATCAGTAGAGAGTAGTATTGAAGATAAAATATTT GGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTGAGCCACGTAACTGAAATTATA GGAGCATTTGTTACTGAGCCACAGATAATACAAGAGCATCCTCTCACAAATAAATTAAAG CGTAAAAGGACATCAGGACTTCATCCTGAGGATTTTATCAAGAAAGCAGATTTGGCAGTT CAAAAGACTCCTGAAAAGATAAATCAGGGAACTAACCAAACAGAGCGGAATGATCAAGTG ATGAATATTACTAACAGTGGTCATGAGAATAAAACAAAAGGTGATTCTATTCAGAATGAG AACAATCCTAACCCAGTAGAATCACTGGAAAAAGAATCATTCAAAAGTAAAGCTGAACCT ATAAGCAGTAGTATAAGCAATATGGAATTAGAATTGAATGTCCACAATTCCAAAGCATCT AAAAAGAATCTGAGAAGGAAGTCTACCAGGCATATTCATGAGCTTGAACTAGTCAGTAGA AATCTAAGCCCACCTAATTATACTGAAGTACAAATTGATAGTTGTTCTAGCAGTGAAGAG ATAAAGAAAAAAAATTACAACCAAATGCCAGTCAGGCACAGCAGAAAGCTACAACTCATG GAAGATAAAGAACGTGCAGCTAGAGCCAAAAAGAGCAAGCCAAATGAACAAACAAGTAAA AGACATGCCAGTGATACTTTCCCAGAACTGAGGAACATACCTGGTTCTTTTACTAACTGT TCAAATACTAATGAATTTAAAGAATTTAATCCTAGCCTTCCAAGAGAACAAACAGAGAAA CTAGAAACAGTTAAACTGTCTAATAATGCCAAAGACCCCAAAGATCTCATGTTAAGTGGA 
GAAAGTGTTTTGCAAATTGAAAGATCTGTAGAGAGTAGCAGTATTTTGTTGATACCTGGT ACTGATTATGGCACTCAGGAAAGTATCTCATTACTGGAAGTTAGCACTCTGGGGAAGGCA AAAACAGAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGAACTA ATTCATGGTTCTAAAGATACTAGAAATGGCACAGAAGGCTTGAAGTATCCATTGGGACCT GAAGTTAACTACTCAAATCCAGAAAAGGAATGTGCATGCTCTAGGTCCTTAAAGAAACAA AGTCCAAAGGTCACTCCTGAATGTGAACAAAAGGAAAATCAAGGAGAGAAAGAGTCTAAT GAGCTTGTAGAGACAGTTAATACCACTGCAGGCTTTCCTATGGTTTGTCAGAAAGATAAG CCAGTTGATTATGCCAGATGTGAAGGAGGCTCTAGGCTTTGTCTATCATCTCAGTTCAGA GGCAACGAAACTGGACTCATTATTCCAAATAAACATGGACTTTTACAGAACCCATATCAT ATGTCACCGCTTATTCCCACCAGGTCATTTGTTAAAACTAAATGTAAGAAAAACCTGCTA GAAGAAAACTCTGAGGAACATTCAATGTCACCTGAAAGAGCAATGGGAAACAAGAACATT CCAAGTACAGTGAGCACAATTAGCCATAATAACAGAGAAAATGCTTTTAAAGAAACCAGC TCAAGCAGTATTTATGAAGTAGGTTCCAGTGATGAAAACATTCAAGCAGAGCTAGGTAGA AACAGAAGGCCAAAATTGAATGCTATGCTTAGATTAGGGCTTCTGCAACCTGAGATTTGT AAGCAAAGTCTTCCTATAAGTGATAAACATCCTGAAATTAAAAAGCAAGAAGTTCAGACT GTTAATACAGACGTCTCTCTACTGATTTCATATAACCTAGAACAGCATATGAGCAGTCAT ACATCTCAGGTTTGTTCTGAGACACCTGACAACCTGTTAGATGATGGTGAAATAAAGGAA GATACTAGTTTTGCTGAATATGGCATTAAGGAGACTTCTACTGTTTTTAGCAAAAGTGTC CAGAGAGGAGAGCTCAGCAGGAGCCCTAGCCCTTTCACCCATACACATTTGGCTCAGGTT TACCAAAGAGGG >Mouse TGTGGCACAGATGCTCATGCCAGCTCATTACAGCCTGAGACCAGCAGTTTATTGCTCATT GAAGACAGAATGAATGCAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCATA GCAGTGAGCCAGCAGAGCAGATGGGCTGCAAGTAAAGGAACATGTAACGACAGGCAGGTT CCCAGCGGGGAAAAGGTAGGTCCAAACGCTGACTCCCTTAGTGATAGAGAGAAGTGGACT CACCCGCAAAGTCTGTGCCCTGAGAATTCTGGAGCTACCACCGATGTTCCTTGGATAACA CTAAATAGCAGCGTTCAGAAAGTTAATGAGTGGTTTTCCAGAACTGGTGAAATGTTAACT TCTGACAGCGCATCTGCCAGGAGGCACGAGTCAAATGCTTTGGAAGTTTCAAACGAAGTG GATGGGGGTTTTAGTTCTTCAAGGAAAACAGACTTAGTAACCCCCGACCCCCATCATACT TTAATGTGTGGAAGAGACTTCTCCAAACCAGTAGAGGATAATATCAGTGATAAAATATTT GGGAAATCCTATCAGAGAAAGGGAAGCCGCCCTCACCTGAACCATGTGACTGAAATTATA GGCACATTTATTACAGAACCACAGATAACACAAGAGCAGCCCTTCACAAATAAATTAAAA CGTAAGAGAAGTACATCCCTTCAACCTGAGGACTTCATCAAGAAAGCAGATTCAGCAGGT CAAAGGACTCCTGACAACATAAATCAGGGAACTGACCTAATGGAGCCAAATGAGCAAGCA 
GTGAGTACTACCAGTAACTGTCAGGAGAACAAAATAGCAGGTAGTAATCTCCAGAAAGAG AAAAGCGCTCATCCAACTGAATCATTGAGAAAGGAACCTTCCACAGCAGGAGCCAAATCT ATAAGCAACAGTGTAAGTGATTTGGAGGTAGAATTAAACGTCCACAGTTCAAAAGCACCT AAGAAAAATCTGAGGAGGAAGTCTATCAGGTGTGCTCTTCCACTTGAACCAATCAGTAGA AATCCAAGCCCACCTACTTGTGCTGAGCTTCAAATCGATAGTTGTGGTAGCAGTGAAGAA ACAAAGAAAAACCATTCCAACCAACAGCCAGCCGGGCACCTTAGAGAGCCTCAACTCATC GAAGACACTGAACCTGCAGCGGATGCCAAGAAGAACGAGCCAAATGAACACATAAGGAAG AGACGTGCCAGCGATGCTTTCCCAGAAGAGAAAAACAAAGCTGGTTTATTAACTAGCTGT TCAAGTCCTAGAAAATCTCAAGGGCCTAATCCCAGCCCTCAGAGAACAGGAACAGAGCAA CTTGAAACACGCCAAATGTCTGACAGTGCCAAAGAACTCGGGGATCGGGTCCTAGGAGGA GAGCCCAGTGGCAAAACTGACCGATCTGAGGAGAGCACCAGCGTATCCTTGGTACCTGAC ACTGACTACGACACTCAGAACAGTGTCTCAGTCCTGGACGCTCACACTGTCAGATATGCA AGAACAGGATCCGCTCAGTGTATGACTCAGTTTGTAGCAAGCGAAAACCCCAAGGAACTC GTCCATGGCTCTAACAATGCTGGGAGTGGCACAGAGGGTCTCAAGCCCCCCTTGAGACAC GCGCTTAACCTCTCAAAACCTCAAAAGGACTGTGCTCACTCTGTGCCCTCAAAGGAACTG AGTCCAAAGGTGACAGCTAAAGGTAAACAAAAAGAACGTCAGGGACAGGAAGAATTTGAA AGTCACGTACAAGCAGTTGCGGCCACAGTGGGCTTACCTGTGCCCTGTCAAGAAGGTAAG CTAGCTGCTGATACAATGTGTGATAGAGGTTGTAGGCTTTGTCCATCATCTCATTACAGA AGCGGGGAGAATGGACTCAGCGCCACAGGTAAATCAGGAATTTCACAAAACTCACATTTT AAACAATCAGTTTCTCCCATCAGGTCATCTATAAAAACTGACAATAGGAAACCTCTGACA GAGGGACGATTTGAGAGACATACATCATCAACTGAGATGGCGGTGGGAAATGAGAACCTT CAGAGTACAGTGCACACAGTTAGCCTGAATAACAGAGGAAATGCTTGTCAAGAAGCCGGC TCGGGCAGTATTCATGAAGTATGTTCCACTGGTGACTCCTTCCCAGGACAACTAGGTAGA AACAGAGGGCCTAAGGTGAACACTGTGCCTCCATTAGATAGTATGCAGCCTGGTGTCTGT CAGCAAAGTGTTCCTGTAAGTGATAAGTATCTTGAAATAAAAAAGCAGGAGGGTGAGGCT GTCTGTGCAGACTTCTCTCCACTATTCTCAGACCATCTTGAGCAATCTATGAGTGGTAAG GTTTTTCAGGTTTGCTCTGAGACACCTGATGACCTGCTGGATGATGTTGAAATACAGGGA CATACTAGCTTTGGTGAAGGTGACATAATGGAGAGATCTGCTGTCTTTAACGGAAGCATC CTGAGAAGGGAGTCCAGTAGGAGCCCTAGTCCTGTAACCCATGCATCGAAGTCTCAGAGT CTCCACAGAGCG >NineBande TGTGGCACAAATACTCATGCCAACTTATTACAGCATGAGAACAGCAGTTTATTACTCACT AAAGACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTA GCAAGGCGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAACATGTAATGATAGGCAGACT 
CCCAGCGAGAAAAAGGTAGATGTGGATGCTGATCCCCTGTATGGGCGAAAAGAACTGAAT AAGCAGAAACCTCCATGCTCTGAGAGTCATAGAGATACCCAAGATATTCCTTGGATAATG CTGAATAGTAGCATTCAGAAAGTTAACGAGTGGTTTTCCAGAGGTGATGACATATTAACT TCTGATGACTCACACGATAGGGGGTCTGAATTAAATGCATTGAAAGTTTCAAAAGAAGTA GATGAATATTCTAGTTTTTCAGAGAAGATAGACTTAATGGCCATTAATCCTCATGATACT TTACAATTTGAAAGAGTCCAATTGAAACCAGCAGAGAGTAACATCAAAGATAAAATATTT GGGAAAACCTATCATAGGAAGGCAAGCCTCCCTAACTTGAGCCACATAACCCGATTTATA GGAGCTATTGCTGCAGAGCCCAAGATAACACAAGAGCATTCCCTCCAAAATAAAATAAAG CGTAAAAGGGCATCAGGCCTTCGTCCTGAGGATTTATCCAAGAAAGTAGATTTGACAGTT CAAAAAACCCCTGAAAAGATAAATCAGGGAACTGACCAAATGGAGCAGAATGATCCAGTG ATGAATATTGCTAATAGTGGTCATGAGAATGAAACAAAAGGTGATTGTGTTCAGAAAGAG AAAAATGCTAATCCGACAGAATCATTGGGAAAAGAATCTTTCAGAACTAAAGGCGAACCT ATAAGCAGCAGTATAAGCAATATGGAACTAGAATTAAATATTTTAAATTCAAAAGCATCT AAGAAGAATCCGAAGAGGATGTCCACCAGGCATATTCATGCACTTGAACTAGGCAGTAGA AATCCAAGCCCACCTAATCATACTGAACTACAAATTGATAGTTGTTCTAGCATTGAAGAG ATAGAGAAAATAAATTCTAACCAAAAGCCAATCAGACACAACAGAATGCTTCAACTCACG AAAGAAAAAGAAACCACAACTGGAGCCAAAAAGAATAAGCCAAATGAACAAATAAGTGAA AGACATGCCAGTGATGCTTTCCTAGAACTTAAAAATGTAACTGATTTTCTTCCTAAATGT TCAAGTTCTGATAAACTTCAAAAATTTAATTCTAGCCTGCAAGGAGAAGTAGCAGAGAAC CTAGAAACAATTCAAGTGTCTGATAGTACCAGGGACCCTGAAGATCTGGTGGTAAGTGGA GAAAAGTGTTTGCAAACTGAAAGATCTGCAGAGAGTACCGGTATTTCAGTGGTACCTGAT ACTGATTATGGCACTCAAGACAGTATCTCATTACTGGAAGCTGACACCCTGGGGAAGGCA AAAACAGCACTAAATCAACATGTGAGTCAGTATGTAGCAATTAGAAATGCCACTGAACTT TCCCATGGTTCTAAAGACACTAGAAATGACACTGAAGATTTTAAGGATTCATTGAGACAT GAAGTTAACCACTCGAATCCAGAAAATGAATGTGCACACTCCAGGTTCTTAGGGAAACAA AGTCCAAAAGTCACCTTTGAATGTAGACATAAAGAAAATCAGGGGAAGAAAGAGTCTAAA AAACATGTGCAGGTAATTCACACAACTGCAGGCTTTCCTATAGTTTGTCAGAAAGATAAG CCAGGTGATTATGCCAAAGGTCAAGGAGTCTCTAGGCTTTGTCAGTCCTCTCAGGCCAGA GGCAATGAATCTGAACTCATTAATTCAAATGAACATGAAATTTCACAAAACCCAGATCAA ATGCCATCACTTTCTCACATGAAGTCATCTGTTAAAACTAAATGTAAGGAAAACCTGTCA GAGGAAAAGTTTGAGGAACTTACAGTGTCACTTGAAAGAACAATGGTAAATGAGAACATT CAAAGTACAGTAAGCACAATTAGCCACAGTAACAGAGAAAACACTTTTAAAGAAGCCAGC 
TCAAGCAGTATTAATGAAGTAGGGTCCAGTGATGAGAACATTCAAGCAGAAGTAGGTAGA AACAGAGCACCTAAATTAAATGCTATGCTCAGATTAGGTCTTATGCAACCTGAAGTCTAT AAGCAAAGTCTTCCTATAACCAATAAATATCCTGAAATAAAAAGTCAAGGAATTCGGGCT GTTGATATAGACTTCTCTCCACTAATTTCAGATAACCTACAACTACCTATGAATAGTTGT GCTTCCCAGATTTGTTCTGAGACACCTGATGACTTGTTAGATGATGATGAAATAAAGGAA AATAACTGCTTTGCTGAAAGTGACATTAAGGAAAGATCTGCTATTTTTAGCAAAACTGTC CAGAAAAGAGAGTTCAGAAGGAGCCCTAGCCCTTTAGTCCATACAAGTTTTGCTCAGGGT CACCAAAGAAAG >DogFaced TGTGGCACAAATACTCATGCCAACTCATTACAGCATGAGAACAGCAGTTTATTATACACT AAAGACAGAATGAATGTAGAAAAGACTGACTTCTGTAATAAAAGCAAACAGCCTGGCTTA GCAAGGAGCCAGCAGAACAGATGGGTTGAAACTAAGGAAACATGTAATGATAGGCAGACT TCCAGCGAGAAAAAGGTAGTTCTGAATGCTGATCCCCTGAATGGAAGAATAAAACTGAAT AAGCAGAAACCTCCATGCTCTGACAGTCCTAGAGATTCCAAAGATATTCCTTGGATAACA CGGAATAGTAGCATACAGAAAGTTAATGAGTGGTTTTCCAGACGTGATGAAACATTAACT TCTGATGTCTTACTTGATGAGAGGTCTGAATCAAATGTGGTAGAAGTTCCAAATGAAGTA GATGGATACTCTGGTGCTTCAGAGGAAATAGCCTTAAAGGCCAGTGATCCTCATGGTGCT TTAATATGTGAAAGAGTTCACTCCAAATTGATAGAAAGTAATATTGAAGATAAAATATTT GGGAAAACATATCGGAGGAAAGCAAGCCTCCCTAACTTAAGCCACATAACTGAAATTACA AGAGCATCTGCTACAGAACCTCAGATAACACAAGAGTGCCCCCTCACAAATAAACTAAAA CGTAAAAGAACATCAGGCCTTCATCCTGAGGATTTTATCAAGAAAATAGATTTGACAACT CAAAAAACTTCTGAAAATATAATTGAGGGAACTGACCAAATAGAGCAGAATGGTCATGTG ATGAATAGTTCTAATGATGGTCATGAGAATGAAACAAAAGGTGATTATGTTCAGAAGAAG AAAAATACAAACCCAACAGAATCATTGGAAAAAGAATCTTTCAGAACTAAAGTTGAGTCT GTACCCAACAACATAAGCAATGTGGAACTAGAATTAAATATTCACGGTTCAAAAGCACTC AAGAAGAATCTGAGGAGGAAGTCCACCAGGCATATTCATGCACTTGAACTAGTCAATAGA AATTCAAGCCCACCTAATCATACTGAACTACAAATTGATAGTTGTTCCAGCAGTGAAGAA CTGAAGGAAAAAAATTCTGACCGAATGCCAGACAGACACAGCAAAAAACTTCAGTTCGTA GAAGATAAAGAATCTGCAACTGGAGCCAAGAAGAACATGCCAAATGAGGCAATAAATAAA AGACTTTCCAGTGAAGCTTTTCCCGAATTAAATAACGTACCTGGTTTTTTTACTAATGGT TCAAGTTCTAATAAACGTCAAGAGTTTAATCCTAGCCTTCAAGGAGAAGAAATAGAGAAT CTACGAACAATTCAAGTGTCTAATAGCACCAAAGACCCCAAAATTCTAATCTTTGGTGAA GGAAGAGGTTCACAAACTGATCGATCTACAGAGAGTACCAGTATTTTATTGGGACCTGAA ACGGATTATGGCACTCAAGATAGTATCTCATTACTGGAATCTGACATCCCAGGGAGGGCA 
AAGACAGCACCAAACCAACATGCAGATCTGTGTGCAGCAATTGAAAACCCCAGAGAACTT ATTCATGATTTTAAAGAAACTAGAAATGACACAGAGAGCTTTAAAGATCCATTGAGACAT GAAGTTAACTCCTCAGACCCAGAAAAGGAATGTGCACACTCCAGGTCCTTGATAAAACAA AGTCCAAAAGTCACTCTTGAATGTGACCGAAAAGGAAATCAGGGAAAGAAAGAGTCTAAC GAGCATGTGCAGGCAGTTTATACAACTATAGGCTTTCCTGGGGTTTCTGAGAAAGACAAA CCAGGAGATTATGCCAGATATAAAGAAGTCTCTAGGCTTTGTCAGTCATTTCAGTCTAGA AGAAATGAAACTGAGCTCACTATTGCAAATAAACTTGGACTTTCACAAAACCCATATCAT ATGCCATCCATTTCTCCCATCAAGTCATCTGTTAAAACTATATGTAAGAAAAATCTGTCA GAGGAAAAGTTTGAAGAACATTCAATATTCCCTGAAAGAGCAATAGGAAATGAGACCATT CAAAGTACAGTGGGCACAATTAGCCAAAATAACAGAGAAAGCACTTTTAAAGAAGGCAGC TCAAGCGGTATTTATGAAGCAGGTTCCAGTGGTGAAAACATTCAAGCAGAACTAAGTAGA AACAGAGGACCAAAATTAAATGCTGTGCTTCAGTTGGGTCTCATGCAGCCTGAAGTCTAT GAGCAAAGCCTTCCTCTAAGTAATAAACATTCTGAAATAAAAAGGCAAGGAGTTCAGGCT GTTAATGCAGATGTCTCTCCACAAATTTCAGATAACTTAGAGCAACCTATGAACAGTAAT ATTTCTCAGGTTTGTTCTGAGACACCGGATGACCTGTTAAATGATGACAAAATAAAGGAC AATATCAGCTTTGATGAAAGTGGCATTCAGGAAAGATCTGCTGTTTTTAGCAAAAATGTC CAGAAAGGAGAATTCAGAAGGAGCCCTAGTCCCTTAGCCCATGCAAGTTTGTCTCAAGGT CGCCCAAGAAGG PyCogent-1.5.3/doc/data/motif_example.fasta000644 000765 000024 00000041664 11115453517 021605 0ustar00jrideoutstaff000000 000000 >1091044 VTDKFPEDVKFLYIAYTPAKADITACGIPIPLDFDKEFRDKTVVIVAIPGAFTPTCTANHIPPFVEKFTALKSAGVDAVIVLSANDPFVQSAFGKALGVTDEAFIFASDPGAEFSKSAGLSLDLPPAFGTRTARYAIIVSNGVVKYVEKDSEGVAGSGVDAVLAAL >11467494 MTNFPKIGKTPPNFLTIGVYKKRLGKIRLSDYRGKKYVILFFYPANFTAISPTELMLLSDRISEFRKLSTQILAISVDSPFSHLQYLLCNREEGGLEDLNYPLVSDLTQTITRDYQVLTDEGLAFPGLFIIDKEGIIQYYTVNNLLCGRNINELLRILESIQYVKENPGYACPVNWNFGDQVFYSHPLKSKIYFKDLYSPKKSS >11499727 MERLNSERFREVIQSDKLVVVDFYADWCMPCRYISPILEKLSKEYNGEVEFYKLNVDENQDVAFEYGIASIPTVLFFRNGKVVGGFIGAMPESAVRAEIEKALGA >1174686 MSDGVKHINSAQEFANLLNTTQYVVADFYADWCGPCKAIAPMYAQFAKTFSIPNFLAFAKINVDSVQQVAQHYRVSAMPTFLFFKNGKQVAVNGSVMIQGADVNSLRAAAEKMGRLAKEKAAAAGSS >12044976 MVTEIRSLKQLEEIFSAKKNVIVDFWAAWCGPCKLTSPEFQKAADEFSDAQFVKVNVDDHTDIAAAYNITSLPTIVVFENGVEKKRAIGFMPKTKIIDLFNN >13186328 
MLNTRIKPFKNISYYKKKFYEIKEIDLKSNWNVFFFYPYSYSFICPLELKNISNKIKEFKNLNTKIYAISNDSHFVQKNWIENELKFINFPFISDFNHKISNNFNILNKKDGNCLRSTIIIDKNLIIKYINIVDDSIGRSIDEILKNIKMLQFINTNENKLCPYSWNNDSKSIEIN >13358154 MLIKLESNQNLNELLKENHSKPILIDFYADWCPPCRMLIPVLDSIEKKHGDEFTIIKINVDHFPELSTQYQVKSIPALFYLKNGDIKATSLGFIDENSLVNKLRSI >13541053 MSLVNKAAPDFEANAFVNGEVKKIRLSSYRGKWVVLFFYPADFTFVCPTEVEGFAEDYEKFKKKNTEVISVSEDTVYVHKAWVQYDERVAKAKYPMVEDRKGIIARAYDVYNEETGNAQRGLFIINPDGIVKYVVITDDNVGRSTDETLRVLEALQSGGLCPVNWHEGEPTLKV >13541117 MSPKVNEKAPDFEAPDTALKMRKLSEFRGQNVVLAFFPGAFTSVCTKEMCTFRDSMANFNKFKAKVIGISVDSPFSLAEFAKKNNLTFDLLSDSNREISKKYDVLHQNFAGVPGLTASKRSVFIIDGDGIVRYAWVSDDPGKEPDYKAIQEFLSKMN >135765 AIVKATDQSFSAETSEGVVLADFWAPWCGPCKMIAPVLEELDQEMGDKLKIVKIDVDENQETAGKYGVMSIPTLLVLKDGEVVETSVGFKPKEALQELVNKHL >1388082 MAAEEGQVIGCHTNDVWTVQLDKAKESNKLIVIDFTASWCPPCRMIAPIFNDLAKKFMSSAIFFKVDVDELQSVAKEFGVEAMPTFVFIKAGEVVDKLVGANKEDLQAKI >140543 MLFYKPVMRMAVRPLKSIRFQSSYTSITKLTNLTEFRNLIKQNDKLVIDFYATWCGPCKMMQPHLTKLIQAYPDVRFVKCDVDESPDIAKECEVTAMPTFVLGKDGQLIGKIIGANPTALEKGIKDL >14286173 MPLIGDKFPEMEVQTTHGPMELPDEFEGKWFILFSHPADFTPVCTTEFVAFQEVYPELRELDCELVGLSVDQVFSHIKWIEWIAENLDTEIEFPVIADTGRVADTLGLIHPARPTNTVRAVFVVDPEGIIRAILYYPQELGRNIPEIVRMIRAFRVIDAEGVAAPANWPDNQLIGDHVIVPPASDIETARKRKDEYECYDWWLCTQSRG >14578634 MKKNFFYAGGLSLLLWGMAACSGQGKADKAAVVADSVVVKTDSVAADSTGYIVKVGESAPDFTITLTDGKQMKLSELRGKVVMLQFTASWCGVCRKEMPFIEKDIWLKHKNNPEFALIGIDRDEPLDKVIAFGKSVGVTYPLGLDPGADIFAKYALRESGITRNVLIDREGKIVKLTRLYNEEEFASLVDQIDEMLKK >14600438 MPGVGEQAPDFEGIAHTGEKIRLSDFRGRIVVLYFYPRAMTPGCTREGVRFNELLDEFEKLGAVVIGVSTDSVEKNRKFAEKHGFRFKLVSDEKGEIGMKYGVVRGEGSNLAAERVTFIIDREGNIRAILRNIRPAEKHADLALEEVKKLVLGKKAERAGEVI >15218394 MQVRAAKKQTFNSFDDLLQNSDKPVLVDFYATWCGPCQLMVPILNEVSETLKDIIAVVKIDTEKYPSLANKYQIEALPTFILFKDGKLWDRFVSFLSRMNTAYLLLASKAQL >15597673 
MLSFSLGPLVVSLQHLLLFLALGAALLGGWLAARGGGRNAEPVLFNLLLLGLLVARLAFVVRYWPQYRGDFAQMLDIRDGGFLAWPGLLAAVLGALFWAWRRPALRRSLGVGASLGLAFWLLGSLGLGIYERGTRLPELSLRNAAGESVQLADFRGRPLVINLWASWCPPCRREMPVLQQAQAENPDVVFLFANQGESAETVRHFLQGENLRLDNLLFDNGGQLGQQVGSVALPTTVFYTAEGRLLGSHLGELSRGSLARYLEAFEPAAAAPATRSSE >15599256 MSDTPYIFDVTGANFEQLVIENSFHKPVLVDFWADWCAPCKALMPLLAQIAESYQGELLLAKVNCDVEQDIVMRFGIRSLPTVVLFKDGQPVDGFAGAQPESQIRALLEPHVKAPALPDEDPLEVAQALFAEGRIGDAEATLKALLAENNENAAALILYARCLAERGELEEAQAILDAVKSDEHKQALAGARAQLTFLRQAADLPDSAELKSRLAADAGDDEAAYQLAVQQLARQQYEAALDGLLKLFLRNRGYQDDLPRKTLVQVFDLLGNDHPLVTAYRRKLYQALY >15602312 MVNSYEKPQMKTSVLLTALFKPLLLCTIVLSCIGCKEDIAVIGKQAPEIAVFDLVGTQRSLNEGKGKTILLNFWSETCGVCIAELKTFEQLLQSYPQNNLHIIAINVDGDKADTQALVKKREISLLVVKDQLKITAERYQLVGTPTSFVIDPEGKILYKFEGLIPTQDLHLFFKG >15605725 MKRLLPVFLALILFGLLVYIGLNQDKHDHYTITTQKGQKIPNVTLTTPDGKKVSIEEFKGKVLLINFWATWCPPCKEEIPMFKEIYEKYRDRGFEILAINMDPENLTGFLKNNPLPFPVFVINEKWKEPLTFQGFQRRTWLTEEEL >15605963 MARTVNLKGNPVTLVGPELKVGDRAPEAVVVTKDLQEKIVGGAKDVVQVIITVPSLDTPVCETETKKFNEIMAGMEGVDVTVVSMDLPFAQKRFCESFNIQNVTVASDFRYRDMEKYGVLIGEGALKGILARAVFIIDKEGKVAYVQLVPEITEEPNYDEVVNKVKELI >15609375 MLNVGATAPDFTLRDQNQQLVTLRGYRGAKNVLLVFFPLAFTGICQGELDQLRDHLPEFENDDSAALAISVGPPPTHKIWATQSGFTFPLLSDFWPHGAVSQAYGVFNEQAGIANRGTFVVDRSGIIRFAEMKQPGEVRDQRLWTDALAALTA >15609658 MTKTTRLTPGDKAPAFTLPDADGNNVSLADYRGRRVIVYFYPAASTPGCTKQACDFRDNLGDFTTAGLNVVGISPDKPEKLATFRDAQGLTFPLLSDPDREVLTAWGAYGEKQMYGKTVQGVIRSTFVVDEDGKIVVAQYNVKATGHVAKLRRDLSV >15613511 MLEGKQAPDFSLPASNGETVSLSDFKGKNIVLYFYPKDMTPGCTTEACDFRDRVEDFKGLNTVILGVSPDPVERHKKFIEKYSLPFLLLADEDTKVAQQYDVWKLKKNFGKEYMGIERSTFVIDKDGTVVKEWRKVRVKDHVEEALAFIKENLE >15614085 MNKKRLTTIVVLIAVVASVIIILTQNNLEVGNGKGMLAEDFTLPLYEESQSRSLSDYRGDVVILNVWASWCEPCRKEMPALMELQSDYESEDVSIVTVNMQTFERTVNDAGEFIEELGITLPVFLDEEGEFADAYQVQHLPMTYVLDREGIINEVILGEVTYEQLEQLIVPLLEKAS >15614140 MDKRKRFWMRLSILAVISVALGYTFYSNFFADRSLARAGEQAVNFVLEDLEGESIELRELEGKGVFLNFWGTYCPPCEREMPHMEKLYGEYKEQGVEIIAVNANEPELTVQRFVDRYGLSFPIVIDKGLNVIDAYGIRPLPTTILINEHGEIVKVHTGGMTEQMVEEFMELIKPEA >15615431 
MKGKQAFYRSFLILALCTVGYFYSQSEHVYRPVDKEAITTFNEGVQVGQRAVPFSLTTLEGQVVDLSSLRGQPVILHFFATWCPVCQDEMPSLVKLDKEYRQKGGQFLAINLTNQESSIKDVRAFVQHYRAEFDPLLDTDGEVMETYQVIGIPTTLILDEEGTIVKRYNGVLTEEIIDEIMDIH >15643152 MRVKHFELLTDEGKTFTHVDLYGKYTILFFFPKAGTSGCTREAVEFSRENFEKAQVVGISRDSVEALKRFKEKNDLKVTLLSDPEGILHEFFNVLENGKTVRSTFLIDRWGFVRKEWRRVKVEGHVQEVKEALDRLIEEDLSLNKHIEWRRARRALKKDRVPREELELLIKAAHLAPSCMNNQPWRFVVVDEEELLKKIHEALPGGNYWMKNAPALIAVHSKKDFDCALPDNRDYFLFDTGLAVGNLLVQATQMGLVAHPVAGYDPVKVKEILKIPEDHVLITLIAVGYLGDESELSEKHRELERSERVRKELSEIVRWNL >15672286 MPMNITSHGEKIATLNPPISGDAPDFELTDLKGNKIKLSKLEKPVLISVFPDINTRVCSLQTKHFNLEAAKHSEIDFLSISNNTADEQKNWCATEGVDMTILADDGTFGKAYGLILNGGPLEGRLARSVFVVKNGQIVYSEVLSELSDEPNYEKALAATK >15790738 MQRGWGSERLAPRLSPGRAGSVISNDTDTPSMRCEGARAPAFELPGVSDGTQTRLGLTDALADNRAVVLFFYPFDFSPVCATELCAIQNARWFDCTPGLAVWGISPDSTYAHEAFADEYALTFPLLSDHAGAIADAFGVLQASAEDHDRVPERAVFLIDADRVIRYAWASSDLSESPDLGAVKAAIDDLQPDTAGVAPRTISDAHDDGTTDG >15791337 MTVTLKDFYADWCGPCKTQDPILEELEADYDEDVSFEKIDVDEAEDVANEYQVRSIPTIVVENDDGVVERFVGVTQREQLEDAIESAGA >15801846 MSQTVHFQGNPVTVANSIPQAGSKAQTFTLVAKDLSDVTLGQFAGKRKVLNIFPSIDTGVCAASVRKFNQLATEIDNTVVLCISADLPFAQSRFCGAEGLNNVITLSTFRNAEFLQAYGVAIADGPLKGLAARAVVVIDENDNVIFSQLVDEITTEPDYEAALAVLKA >15805225 MTEPAPASPARPAWTRALPPLLAAALVGGLGWALLKPAGNAANGPLVGKPAPQFNLTGLDGQPVALADYRGRPVVLNFWASWCGPCREEAPLFAKLAAHPGAPAVLGILFNETKPQNARDFARQYGLTYPNLQDPGVATAIAYQVTGIPRTVFIDAQGVVRHIDQGGLDTARLNAGLSKIGVPGL >15805374 MTSPTPPTSPSLSPQPRRASWTRWIVPAVMVGLVGLLAYGLFTPDPEGGPALLGKPAPAFALEDLGGRTHALTAAQGKPVVINFWASWCVPCRQEAPLFSKLSQETAGKAEFFGVIYNDQPADARRFMDQYGLIYPALLDPGSRTALSYGVGKLPITFIVDGQGKVVHIKDGPIEEPELRAALKQAGL >15807234 MTLVGQPAPDFTLPASTGQDITLSSYRGQSHVVLVFYPLDFSPVCSMQLPEYSGSQDDFTEAGAVVLGINRDSVYAHRAWAAEYGIEVPLLADMQLEVARQYGVAIDERGISGRAVFVIDREGVVRYQHVEEQTGQYTVRPGAVLEQLRGL >15826629 APIKVGDAIPAVEVFEGEPGNKVNLAELFKGKKGVLFGVPGAFTPGCSKTHLPGFVEQAEALKAKGVQVVACLSVNDAFVTGEWGRAHKAEGKVRLLADPTGAFGKETDLLLDDSLVSIFGNRRLKRFSMVVQDGIVKALNVEPDGTGLTCSLAPNIISQL >15899007 
MNDELNDPELQKILSKKTTQILNNLKEKVKEPVKHLNSKNFDEFITKNKIVVVDFWAEWCAPCLILAPVIEELANDYPQVAFGKLNTEESQDIAMRYGIMSLPTIMFFKNGELVDQILGAVPREEIEVRLKSLLE >15899339 MVEIGEKAPEIELVDTDLKKVKIPSDFKGKVVVLAFYPAAFTSVCTKEMCTFRDSMAKFNEVNAVVIGISVDPPFSNKAFKEQNKINFTIVSDFNREAVKAYGVAGELPILKGYVLAKRSVFVIDKNGIVRYKWVSEDPTKEPNYDEIKDVVTKLS >15964668 MTIAVGDKLPNATFKEKTADGPVEVTTELLFKGKRVVLFAVPGAFTPTCSLNHLPGYLENRDAILARGVDDIAVVAVNDLHVMGAWATHSGGMGKIHFLSDWNAAFTKAIGMEIDLSAGTLGIRSKRYSMLVEDGVVKALNIEESPGQATASGAAAMLELL >15966937 MSGSDNPYQGSFGTQMTGSASFGGQPASAASGPNDLTPDDLIRETTTAAFTRDVLEASRQQPVLVDFWAPWCGPCKQLTPVIEKVVREAAGRVKLVKMNIDDHPSIAGQLGIQSIPAVIAFIDGRPVDGFMGAVPESQIKEFIDRIAGPAADNGKAEIESVLADAKALIDAGDAQNAAGLYGAVLQADPENAAAVAGMIECMIALGQVAEARQALSGLPEALAGEAAVAAVSKKLDQIEEARKLGDPAALERQLALDPDDHAARLKLAKLRNVEGDRTAAAEHLLTIMKRDRSFEDDGARRELLSFFEVWGPKDPATIAARRKLSSILFS >15988313 SRAPTGDPACRAAVATAQKIAPLAHGEVAALTMASAPLKLPDLAFEDADGKPKKLSDFRGKTLLVNLWATWCVPCRKEMPALDELQGKLSGPNFEVVAINIDTRDPEKPKTFLKEANLTRLGYFNDQKAKVFQDLKAIGRALGMPTSVLVDPQGCEIATIAGPAEWASEDALKLIRAATGKAAAAL >16078864 MLKKWLAGILLIMLVGYTGWNLYQTYSKKEVGIQEGQQAPDFSLKTLSGEKSSLQDAKGKKVLLNFWATWCKPCRQEMPAMEKLQKEYADKLAVVAVNFTSAEKSEKQVRAFADTYDLTFPILIDKKGINADYNVMSYPTTYILDEKGVIQDIHVGTMTKKEMEQKLDLD >16123427 MNTVCTACMATNRLPEERIDDGAKCGRCGHSLFDGEVINATAETLDKLLQDDLPMVIDFWAPWCGPCRSFAPIFAETAAERAGKVRFVKVNTEAEPALSTRFRIRSIPTIMLYRNGKMIDMLNGAVPKAPFDNWLDEQLSRDPNS >16125919 MKIVLAGVAAVALAFASPALAALKAGDKAPDFSAKGALAGKDFDFNLAKALKKGPVVLYFFPAAYTAGCTAEAREFAEATPEFEKLGATVIGMTAGNVDRLKDFSKEHCRDKFAVAAADKALIKAYDVALAVKPDWSNRTSYVIAPDGKILLSHTDGNFMGHVQQTMGAVKAYKAKK >16330420 MNLQIELYKFQQESLKRSSPERAAIFSDFIQGLSEEFRNRRLLRIGDFAPDFTLKNTKGETIILSEQLKTGPILLKFFRGYWCPYCGLELRAYQKVVNKIRALGGTILAISPQTLVASQKTIDRHDLTYDLLSDSGFQTAQDYGLVFTVPDAVKQIYLQSGCVIPEHNGTEEWLLPVPATFVIDRRGHIALAYANVDFRVRYEPEDAIAILLSLFVGN >1633495 XDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLA >16501671 
MVLVTRQAPDFTAAAVLGSGEIVDKFNFKQHTNGKTTVLFFWPMDFTFVCPSELIAFDKRYEEFQKRGVEVVGVSFDSEFVHNAWRNTPVDKGGIGPVKYAMVADVKREIQKAYGIEHPDEGVALRGSFLIDANGIVRHQVVNDLPLGRNIDEMLRMVDALQFHEEHGDVCPAQWEKGKEGMNASPDGVAKYLAENISSL >1651717 MVVAWTTRDQSARNMPMLAVNEDNFDNLVLQCPKPILVYFGAPWCGLCHFVKPLLNHLHGEWQEQLVCVEVNADVNLHLANAYRLKNLPTLILFNRGQVIQRLEDFRVREDLHRIREQIAVSLFSP >16759994 MAGKLRRWLREAAVFLALLIAIMVVMDVWRAPQAPPAFAATPLHTLTGESTTLATLSEERPVLLYFWASWCGVCRFTTPAVAHLAAEGENVMTVALRSGGDAEVARWLARKGVDFPVVNDANGALSAGWEISVTPTLVVVSQGRVVFTTSGWTSYWGMKLRLWWAKTF >16761507 MNTVCTHCQAINRIPGDRLQDAAKCGRCGHELFDGEVINATGETLDKLLKDDLPVVIDFWAPWCGPCRNFAPIFEDVAEERSGKVRFVKVNTEAERELSARFGIRSIPMIMIFKHGQVVDMLNGAVPKAPFDSWLNEAL >16803644 MAERLVGTQAPRFEMEAVMPNQTFGKVSLEKNIEDDKWTILFFYPMDFTFVCPTEIVAISARSDEFDALNARIIGASTDTIHSHLAWTNTPIKEGGIGKLNYPLAADTNHQVASDYGVLIEEEGVALRGLFIINPKGEIQYEVVHHNNIGREVDEVLRVLQALQTGGLCPINWQPGEKTIV >16804867 MAIIFAKEDDLEEIISSHPKILLNFWAEWCAPCRCFWPTLEQFAEMEEGNVQVVKINVDKQRALAQKFDVKGIPNSLVLVDGEIKGAIAGIVSCDELRSRFKSLAK >17229033 MFSNREGQRVPQVTFRTRQNNEWVNVTTDDLFAGKTVAVFSLPGAFTPTCSSTHLPGYNELAKVFKDNGVDEIVCISVNDAFVMNEWAKTQEAENITLIPDGNGEFTEGMGMLVDKTDLGFGKRSWRYSMLVKDGVIEKMFIEPDVPGDPFEVSDAQTMLKHINPQAAKPKLVSLFTKVGCPFCARAKAALQENGIEYEEITLGKGITTRSLRAVTGATTVPQVFIDGVLIGGSEALTDYLSTIKQEKVGV >17229859 MPTDTVAYVQENEFDTVLSEDKVVVVDFTATWCGPCRLVSPLMDQLADEYKGRVKVVKVDVDNNKPLFKRFGLRSIPAVLIFKDGELTEKIVGVSPYEQFSGAVEKLI >1729944 MLMLDKDTFKTEVLEGTGYVLVDYFSDGCVPCKALMPAVEELSKKYEGRVVFAKLNTTGARRLAISQKILGLPTLSLYKDGVKVDEVTKDDATIENIEAMVEEHISK >17531233 MLKRCNFKNQVKYFQSDFEQLIRQHPEKIIILDFYATWCGPCKAIAPLYKELATTHKGIIFCKVDVDEAEDLCSKYDVKMMPTFIFTKNGDAIEALEGCVEDELRQKVLEHVSAQ >17537401 MPQFLKGVTLENREGDELPAEEHLKGKIIGLYFSASWCPPCRAFTPKLKEFFEEIKKTHPEFEIIFVSRDRNSSDLVTYFKEHQGEWTYIPFGSDKIMSLMQKYEVKTIPAMRIVNDQGEVIVQDARTEIQNKGENVEGLWAEWMALIK >17547503 MSRRQTLIALILAGIVAAVLGVYAGHRATQPRPPADGAVTAFFDARLPDANGTPIDLSAFRGQPVVINFWAPWCGPCVEEMPELSALAQEQKARVKFIGIGIDSAANIQAFLGKVPVTYPIAVAGFGGTELGRQLGNQAGGLPFTVILNAQGEITFRKMGRVHADELRAALQRT >18309723 
MENINLFIVFIEGILSIFSPCILPILPIYLSMLSNSSVEDIRDSNFKSGVLIRNTLFFTLGISTTFFILGSSISALSSFFNTNKNIIMILGGVIILFMGLFYLGLINLNILNREKRLNFKYKNMSPVSAFVLGFTFSFGWTPCIGPILASVLVMASSSKNLLMSNLLILVYTIGFILPFIIASLFYGKLFKKFDGIKKHMDLIKKISGIFIIIAGLIMLVGGIRNMNNEIKINNNPQINKSESVNKDSKNESDNKKQEEENKIPPIDFTLYDQYGNKHTLSEYKGKTIFLNFWATWCPPCRGEMPYIDELYKEYNENKDDVVILGVASPNLGREGSEEHIKNFLKEENHVFPVVLDENGAMVYQYGINAFPSTFIINKEGYITKYIPGAMTKETMKTLIESER >18313548 MVLKVGDKAPDFELLNEELKPVRLSEVLKRGRPVVLLFFPGAFTSVCTKELCTFRDKMALLNKANAEVLAISVDSPFALKAFKDANRLNFPLLSDYNRIVIGMYDVVQANLLGLPLYHLAKRAVYIIDPTGTIRYVWYSDDPRDEPPYDEVIKEAEKIGAQK >18406743 MAETSKQVNGDDAQDLHSLLSSPARDFLVRNDGEQVKVDSLLGKKIGLYFSAAWCGPCQRFTPQLVEVYNELSSKVGFEIVFVSGDEDEESFGDYFRKMPWLAVPFTDSETRDRLDELFKVRGIPNLVMVDDHGKLVNENGVGVIRSYGADAYPFTPEKMKEIKEDEDRARRGQTLRSVLVTPSRDFVISPDGNKVPVSELEGKTIGLLFSVASYRKCTELTPKLVEFYTKLKENKEDFEIVLISLEDDEESFNQDFKTKPWLALPFNDKSGSKLARHFMLSTLPTLVILGPDGKTRHSNVAEAIDDYGVLAYPFTPEKFQELKELEKAKVEAQTLESLLVSGDLNYVLGKDGAKVLVSDLVGKTILMYFSAHWCPPCRAFTPKLVEVYKQIKERNEAFELIFISSDRDQESFDEYYSQMPWLALPFGDPRKASLAKTFKVGGIPMLAALGPTGQTVTKEARDLVVAHGADAYPFTEERLKEIEAKYDEIAKDWPKKVKHVLHEEHELELTRVQVYTCDKCEEEGTIWSYHCDECDFDLHAKCALNEDTKENGDEAVKVGGDESKDGWVCEGNVCTKA >19173077 MFPKTLTDSKYKAFVDGEIKEISLQDYIGKYVVLAFYPLDFTFVCPTEINRFSDLKGAFLRRNAVVLLISCDSVYTHKAWASIPREQNGVLGTAWPMVWDAKRELCNQFGLYDEENGHPMRSTVILAKDLSVRHISSNYHAIGRSVDEIIRLIDAITFNDENGDICPAEWRSENKDN >19554157 MEEGEEISLSDFEGEVVVLNAWGQWCAPCRAEVDDLQLVQETLDPLGGTVLGINVRDYNQTIAQDFKLDNAVTYPSIYDPPFRIAAALGGVPTSVIPTTIVLDRSHRPAAVFLREVTALSG >19705357 MSFSVFAAKSDKKDDVKLPNIVLYDQYGKKHNIEEYKGKVVVINFWATWCGYCVEEMPGFEKVYKEFGSNKKDVIILGVAGPKSKENLNNVDVEKDKIISFLKKKNITYPSLMDETGKSFDDYKVRALPMTYVINKNGYLEGFVNGAITDEQLRKAINETLKKK >19746502 MKKGLLVTTGLACLGLLTACSTQDNMAKKEITQDKMSMAAKKKDKMSTSKDKSMMADKSSDKKMTNDGPMAPDFELKGIDGKTYRLSEFKGKKVYLKFWASWCSICLSTLADTEDLAKMSDKDYVVLTVVSPGHQGEKSEADFTKWFQGTDYKDLPVLLDPDGKLLEAYGVRSYPTEVFIGSDGVLAKKHIGYAKKSDIKKALKGIH >20092028 MAGDFMKPMLLDFSATWCGPCRMQKPILEELEKKYGDKVEFKVVDVDENQELASKYGIHAVPTLIIQKDGTEVKRFMGVTQGSILAAELDKLL >20151112 
SLINTKIKPFKNQAFKNGEFIEVTEKDTEGRWSVFFFYPADFTFVCPTELGDVADHYEELQKLGVDVYSVSTDTHFTHKAWHSSSETIAKIKYAMIGDPTGALTRNFDNMREDEGLADRATFVVDPQGIIQAIEVTAEGIGRDASDLLRKIKAAQYVAAHPGEVCPAKWKEGEATLAPSLDLVGKI >21112072 MTIQVGDRIPEVVLKHLREGIEAVDTHTLFTGRKVVLFAVPGAFTPTCSAKHLPGYVEQFEAFRKRGIEVLCTAVNDPFVMQAWGRSQLVPDGLHLVPDGNAELARALGLEIDASGSGMGLRSRRYALYADDGVVKALFVEEPGEFKVSAADYVLQHLPD >21222859 MSAASRAPLRSNRTAERFTERTADARARVRSRAARAAVGAAAAALLVSACSSGGTSGGGGQTGFITGSDGIATAKKGERADAPELSGETVDGGQVDVADYKGKVVVLNVWGSWCPPCRAEAKNFEKVYQDVKDQGVQFVGINTRDTSTGPARAFEKDYGVTYPSLYDPAGRLMLRFEKGTLNPQAVPSTLIIDREGKVAARTLQALSEEKLRKMLAPYLQPEK >21223405 MLTVGDKFPEFDLTACVSLEKGKEFQQINHKTYEGQWKVVFAWPKDFTFVCPTEIAAFGKLNDEFADRDAQILGFSGDSEFVHHAWRKDHPDLTDLPFPMLADSKHELMRDLGIEGEDGFAQRAVFIVDQNNEIQFTMVTAGSVGRNPKEVLRVLDALQTDELCPCNWSKGDETLDPVALLSGE >21227878 MTDEIRVGETIQDFRLRDQKREEIHLYDLKGKKVLLSFHPLAWTQVCAQQMKSLEENYELFTELNTVPLGISVDPIPSKKAWARELGINHIKLLSDFWPHGEVARTCGIFRGKEGVSERANIIIDENRQVIYFKKYLGHELPDIKEIIEVLKNK >21283385 MTEITFKGGPIHLKGQQINEGDFAPDFTVLDNDLNQVTLADYAGKKKLISVVPSIDTGVCDQQTRKFNSDASKEEGIVLTISADLPFAQKRWCASAGLDNVITLSDHRDLSFGENYGVVMEELRLLARAVFVLDADNKVVYKEIVSEGTDFPDFDAALAAYKNI >21674812 MIEEGKIAPDFTLPDSTGKMVSLSEFKGRKVLLIFYPGDDTPVCTAQLCDYRNNVAAFTSRGITVIGISGDSPESHKQFAEKHKLPFLLLSDQERTVAKAYDALGFLGMAQRAYVLIDEQGLVLLSYSDFLPVTYQPMKDLLARIDAS >23098307 MKILKYVLIISISMFLPIMTAYAEGTEVGERAPDFELKTIDGQQLRLSDFKGERVLINFWTTWCPPCRQEMPDMQRFYQDLQPNILAVNLTDTEMNKEQVVRFSQELELTFPILLDEKGEVSKAYRISPIPTTYMIDSEGIIRHKSYGALTYEQMVAEYNKME >2649838 MVFTSKYCPYCRAFEKVVERLMGELNGTVEFEVVDVDEKRELAEKYEVLMLPTLVLADGDEVLGGFMGFADYKTAREAILEQISAFLKPDYKN >267116 MSKVIHVTSNEELDKYLQHQRVVVDFSAEWCGPCRAIAPVFDKLSNEFTTFTFVHVDIDKVNTHPIGKEIRSVPTFYFYVNGAKVSEFSGANEATLRSTLEANI >27375582 MSEQSTSANPQRRTFLMVLPLIAFIGLALLFWFRLGSGDPSRIPSALIGRPAPQTALPPLEGLQADNVQVPGLDPAAFKGKVSLVNVWASWCVPCHDEAPLLTELGKDKRFQLVGINYKDAADNARRFLGRYGNPFGRVGVDANGRASIEWGVYGVPETFVVGREGTIVYKLVGPITPDNLRSVLLPQMEKALK >2822332 
MATDTAATTGTASPDEPLYVNGQTELDDVTSDNDVVLADFYADWCGPCQMLEPVVETLAEQTDAAVAKIDVDENQALASAYGVRGVPTLVLFADGEQVEEVVGLQDEDALKDLIESYTE >30021713 MWRKLTIIVVLLCLAGYAAYEQFGNKEQAVKVKQEKSEAAMKEIIARNGIEIGKSAPDFELTKLDGTNVKLSDLKGKKVILNFWATWCGPCQQEMPDMEAFYKEHKENVEILAVNYTPSEKGGGEEKVSNFIKEKGITFPVLLDKNIDVTTAYKVITIPTSYFIDTKGVIQDKFIGPMTQKEMEKRVAKLK >3261501 MTTRDLTAAYFQQTISANSNVLVYFWAPLCAPCDLFTPTYEASSRKHFDVVHGKVNIETEKDLASIAGVKLLPTLMAFKKGKLVFKQAGIANPAIMDNLVQQLRAYTFKSPAGEGIGPGTKTSS >3318841 MPGGLLLGDVAPNFEANTTVGRIRFHDFLGDSWGILFSHPRDFTPVCTTELGRAAKLAPEFAKRNVKLIALSIDSVEDHLAWSKDINAYNSEEPTEKLPFPIIDDRNRELAILLGMLDPAEKDEKGMPVTARVVFVFGPDKKLKLSILYPATTGRNFDEILRVVISLQLTAEKRVATPVDWKDGDSVMVLPTIPEEEAKKLFPKGVFTKELPSGKKYLRYTPQP >3323237 MALLDISSGNVRKTIETNPLVIVDFWAPWCGSCKMLGPVLEEVESEVGSGVVIGKLNVDDDQDLAVEFNVASIPTLIVFKDGKEVDRSIGFVDKSKILTLIQKNA >4155972 MLEVINGKNYAEKTAHQAVVVNVGASWCPDCRKIEPIMENLAKTYKGKVEFFKVSFDESQDLKESLGIRKIPTLIFYKNAKEVGERLVEPSSQKPIEDALKALL >4200327 MAQRLLLRRFLASVISRKPSQGQWPPLTSRALQTPQCSPGGLTVTPNPARTIYTTRISLTTFNIQDGPDFQDRVVNSETPVVVDFHAQWCGPCKILGPRLEKMVAKQHGKVVMAKVDIDDHTDLAIEYEVSAVPTVLAMKNGDVVDKFVGIKDEDQLEAFLKKLIG >4433065 GSIKEIDINEYKGKYVVLLFYPLDWTFVCPTEMIGYSEVAGQLKEINCEVIGVSVDSVYCHQAWCEADKSKGGVGKLGFPLVSDIKRCISIKYGMLNVETGVSRRGYVIIDDKGKVRYIQMNDDGIGRSTEET >4704732 MAPITVGDVVPDGTISFFDENDQLQTVSVHSIAAGKKVILFGVPGAFTPTCSMSHVPGFIGKAEELKSKGIDEIICFSVNDPFVMKAWGKTYPENKHVKFVADGSGEYTHLLGLELDLKDKGSGISSGRFALLLDNLKVTVANVESGGEFTVSSAEDILKAL >4996210 MAYHLGATFPNFTATASNVDGVFDFYKYVGDNWAILFSHPHDFTPVCTTELAEFGKMHEEFLKLNCKLIGFSCNSKESHDQWIEDIKFYGNLDKWDIPMVCDESRELANQLKIMDEKEKDIKGLPLTCRCVFFISPDKKVKATVLYPATTGRNSQEILRVLKSLQLTNTHPVATPVNWKEGDKCCILPSVDNADLPKLFKNEVKKLDVPSQKAYLRFVQM >5326864 MSLKAGDSFPEGVTFSYIPWAEDASEITSCGIPINYNASKEFANKKVVLFALPGAFTPVCSANHVPEYIQKLPELRAKGVDQVAVLAYNDAYVMSAWGKANGVTGDDILFLSDPEAKFSKSIGWADEEGRTYRYVLVIDNGKIIYAAKEAAKNSLELSRADHVLKQL >6322180 
MGEALRRSTRIAISKRMLEEEESKLAPISTPEVPKKKIKTGPKHNANQAVVQEANRSSDVNELEIGDPIPDLSLLNEDNDSISLKKITENNRVVVFFVYPRASTPGCTRQACGFRDNYQELKKYAAVFGLSADSVTSQKKFQSKQNLPYHLLSDPKREFIGLLGAKKTPLSGSIRSHFIFVDGKLKFKRVKISPEVSVNDAKKEVLEVAEKFKEE >6323138 MSDLVNKKFPAGDYKFQYIAISQSDADSESCKMPQTVEWSKLISENKKVIITGAPAAFSPTCTVSHIPGYINYLDELVKEKEVDQVIVVTVDNPFANQAWAKSLGVKDTTHIKFASDPGCAFTKSIGFELAVGDGVYWSGRWAMVVENGIVTYAAKETNPGTDVTVSSVESVLAHL >6687568 MRVLATAADLEKLINENKGRLIVVDFFAQWCGPCRNIAPKVEALAKEIPEVEFAKVDVDQNEEAAAKYSVTAMPTFVFIKDGKEVDRFSGANETKLRETITRHK >6850955 MVSVGKKAPDFEMAGFYKGEFKTFRLSEYLGKWVVLCFYPGDFTFVXATEVSAVAEKYPEFQKLGVEVLSVSVDSVFVHKMWNDNELSKMVEGGIPFPMLSDGGGNVGTLYGVYDPEAGVENRGRFLIDPDGIIQGYEVLILPVGRNVSETLRQIQAFQLVRETKGAEVAPSGWKPGKKTLKPGPGLVGNVYKEWSVKEAFED >7109697 MKHITNKAELDQLLTTNKKVVVDFYANWCGPCKILGPIFEEVAQDKKDWTFVKVDVDQANEISSEYEIRSIPTIIFFQDGKMADKRIGFIPKNELKELLK >7290567 MVYPVRNKDDLDQQLILAEDKLVVIDFYADWCGPCKIIAPKLDELAQQYSDRVVVLKVNVDENEDITVEYNVNSMPTFVFIKGGNVLELFVGCNSDKLAKLMEKHARRLYR >9955016 XSGNARIGKPAPDFKATAVVDGAFKEVKLSDYKGKYVVLFFYPLDFTFVCPTEIIAFSNRAEDFRKLGCEVLGVSVDSQFTHLAWINTPRKEGGLGPLNIPLLADVTRRLSEDYGVLKTDEGIAYRGLFIIDGKGVLRQITVNDLPVGRSVDEALRLVQAFQYTDEHGEVCPAGWKPGSDTIKPNVDDSKEYFSKHN >15677788 MKKILTAAVVALIGILLAIVLIPDSKTAPAFSLPDLHGKTVSNADLQGKVTLINFWFPSCPGCVSEMPKIIKTANDYKNKNFQVLAVAQPIDPIESVRQYVKDYGLPFTVMYDADKAVGQAFGTQVYPTSVLIGKKGEILKTYVGEPDFGKLYQEIDTAWRNSDAVPyCogent-1.5.3/doc/data/motif_example_meme_results.txt000644 000765 000024 00000611657 11115453517 024117 0ustar00jrideoutstaff000000 000000 ******************************************************************************** MEME - Motif discovery tool ******************************************************************************** MEME version 3.0.14 (Release date: 2005/07/19 07:15:54) For further information on how to interpret these results or to get a copy of the MEME software please access http://meme.nbcr.net. This file may be used as input to the MAST algorithm for searching sequence databases for matches to groups of motifs. 
MAST is available for interactive use and downloading at http://meme.nbcr.net. ******************************************************************************** ******************************************************************************** REFERENCE ******************************************************************************** If you use this program in your research, please cite: Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994. ******************************************************************************** ******************************************************************************** TRAINING SET ******************************************************************************** DATAFILE= superfamily_aln_gis.fasta ALPHABET= ACDEFGHIKLMNPQRSTVWY Sequence name Weight Length Sequence name Weight Length ------------- ------ ------ ------------- ------ ------ 1091044 1.0000 166 11467494 1.0000 204 11499727 1.0000 105 1174686 1.0000 127 12044976 1.0000 102 13186328 1.0000 176 13358154 1.0000 106 13541053 1.0000 174 13541117 1.0000 157 135765 1.0000 103 1388082 1.0000 110 140543 1.0000 127 14286173 1.0000 209 14578634 1.0000 198 14600438 1.0000 163 15218394 1.0000 112 15597673 1.0000 278 15599256 1.0000 289 15602312 1.0000 175 15605725 1.0000 146 15605963 1.0000 169 15609375 1.0000 153 15609658 1.0000 157 15613511 1.0000 154 15614085 1.0000 177 15614140 1.0000 176 15615431 1.0000 184 15643152 1.0000 321 15672286 1.0000 160 15790738 1.0000 212 15791337 1.0000 89 15801846 1.0000 168 15805225 1.0000 185 15805374 1.0000 188 15807234 1.0000 151 15826629 1.0000 161 15899007 1.0000 135 15899339 1.0000 156 15964668 1.0000 161 15966937 1.0000 330 15988313 1.0000 186 16078864 1.0000 170 16123427 1.0000 145 16125919 1.0000 177 16330420 1.0000 218 1633495 
1.0000 108 16501671 1.0000 200 1651717 1.0000 126 16759994 1.0000 168 16761507 1.0000 139 16803644 1.0000 181 16804867 1.0000 106 17229033 1.0000 251 17229859 1.0000 108 1729944 1.0000 107 17531233 1.0000 115 17537401 1.0000 149 17547503 1.0000 174 18309723 1.0000 403 18313548 1.0000 162 18406743 1.0000 578 19173077 1.0000 177 19554157 1.0000 121 19705357 1.0000 164 19746502 1.0000 207 20092028 1.0000 93 20151112 1.0000 186 21112072 1.0000 160 21222859 1.0000 223 21223405 1.0000 184 21227878 1.0000 154 21283385 1.0000 164 21674812 1.0000 148 23098307 1.0000 163 2649838 1.0000 93 267116 1.0000 104 27375582 1.0000 194 2822332 1.0000 119 30021713 1.0000 191 3261501 1.0000 124 3318841 1.0000 224 3323237 1.0000 105 4155972 1.0000 104 4200327 1.0000 166 4433065 1.0000 133 4704732 1.0000 162 4996210 1.0000 220 5326864 1.0000 167 6322180 1.0000 215 6323138 1.0000 176 6687568 1.0000 104 6850955 1.0000 203 7109697 1.0000 100 7290567 1.0000 111 9955016 1.0000 197 15677788 1.0000 166 ******************************************************************************** ******************************************************************************** COMMAND LINE SUMMARY ******************************************************************************** This information can also be useful in the event you wish to report a problem with the MEME software. 
command: meme superfamily_aln_gis.fasta -protein -mod anr -evt 1e-10 -nmotifs 10 model: mod= anr nmotifs= 10 evt= 1e-10 object function= E-value of product of p-values width: minw= 8 maxw= 50 minic= 0.00 width: wg= 11 ws= 1 endgaps= yes nsites: minsites= 2 maxsites= 50 wnsites= 0.8 theta: prob= 1 spmap= pam spfuzz= 120 em: prior= megap b= 81535 maxiter= 50 distance= 1e-05 data: n= 16307 N= 96 sample: seed= 0 seqfrac= 1 Dirichlet mixture priors file: prior30.plib Letter frequencies in dataset: A 0.084 C 0.016 D 0.064 E 0.071 F 0.051 G 0.068 H 0.014 I 0.059 K 0.072 L 0.093 M 0.021 N 0.040 P 0.051 Q 0.032 R 0.041 S 0.051 T 0.051 V 0.082 W 0.014 Y 0.027 Background letter frequencies (from dataset with add-one prior applied): A 0.084 C 0.016 D 0.064 E 0.071 F 0.051 G 0.068 H 0.014 I 0.059 K 0.072 L 0.093 M 0.021 N 0.040 P 0.051 Q 0.032 R 0.041 S 0.051 T 0.051 V 0.082 W 0.014 Y 0.027 ******************************************************************************** ******************************************************************************** MOTIF 1 width = 29 sites = 50 llr = 2381 E-value = 2.4e-621 ******************************************************************************** -------------------------------------------------------------------------------- Motif 1 Description -------------------------------------------------------------------------------- Simplified A :::::1:::9:::1:::2:3:1:11:41: pos.-specific C ::::::::::::a::a::::::::::::: probability D 11::::5:::2::::::::::1:11::11 matrix E 1:::::::::1::::::13::::44:115 F :::::::91:::::::::1:::2::1::: G 41:::::::::::5:::::1:::::::1: H 1:::::::::::::::::::::::::::: I ::1212:::::::::::12::21::1::: K :51:::::::::::::31:::1:13::31 L ::1143:::::::::::12::15::6::: M :::::::::::::::::2:3::1:::::: N 1:::::3::::::::::::::::1:::1: P ::3:::::::1::29:::::a:::::::: Q :1::::::::::::::11:::::11:121 R :1::::::::::::::51::::::::::: S ::::::::1:2::::::::1::::::1:: T ::1:::::::3:::::::11::::::::1 V ::2644:1:::::1:::::::211:2::: W 
::::::::5::a::::::::::::::::: Y ::::::1:2:::::::::::::::::2:: bits 6.2 * * 5.6 ** * 5.0 ** * 4.3 ** * Information 3.7 ** ** * * content 3.1 **** ** ** * (68.7 bits) 2.5 * **** ** *** * 1.9 ** ******* ****** * * ** * 1.2 ***************** *** ******* 0.6 ***************************** 0.0 ----------------------------- Multilevel GKPVVVDFWATWCGPCRxEAPILEELAKE consensus ILLN Y S P K IM F K sequence I -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------------------------- 16078864 59 7.26e-28 GEKSSLQDAK GKKVLLNFWATWCKPCRQEMPAMEKLQKE YADKLAVVAV 30021713 76 1.70e-27 GTNVKLSDLK GKKVILNFWATWCGPCQQEMPDMEAFYKE HKENVEILAV 15805374 77 2.01e-27 GRTHALTAAQ GKPVVINFWASWCVPCRQEAPLFSKLSQE TAGKAEFFGV 15805225 71 3.96e-26 GQPVALADYR GRPVVLNFWASWCGPCREEAPLFAKLAAH PGAPAVLGIL 17547503 62 1.68e-25 GTPIDLSAFR GQPVVINFWAPWCGPCVEEMPELSALAQE QKARVKFIGI 18309723 285 4.41e-25 GNKHTLSEYK GKTIFLNFWATWCPPCRGEMPYIDELYKE YNENKDDVVI 16761507 52 4.41e-25 GETLDKLLKD DLPVVIDFWAPWCGPCRNFAPIFEDVAEE RSGKVRFVKV 135765 16 4.41e-25 TDQSFSAETS EGVVLADFWAPWCGPCKMIAPVLEELDQE MGDKLKIVKI 7290567 20 5.05e-25 DLDQQLILAE DKLVVIDFYADWCGPCKIIAPKLDELAQQ YSDRVVVLKV 19705357 38 1.66e-24 GKKHNIEEYK GKVVVINFWATWCGYCVEEMPGFEKVYKE FGSNKKDVII 20092028 6 1.89e-24 MAGDF MKPMLLDFSATWCGPCRMQKPILEELEKK YGDKVEFKVV 11499727 16 2.44e-24 SERFREVIQS DKLVVVDFYADWCMPCRYISPILEKLSKE YNGEVEFYKL 7109697 17 3.57e-24 KAELDQLLTT NKKVVVDFYANWCGPCKILGPIFEEVAQD KKDWTFVKVD 267116 19 4.58e-24 SNEELDKYLQ HQRVVVDFSAEWCGPCRAIAPVFDKLSNE FTTFTFVHVD 1388082 28 6.65e-24 TVQLDKAKES NKLIVIDFTASWCPPCRMIAPIFNDLAKK FMSSAIFFKV 15599256 25 9.60e-24 FEQLVIENSF HKPVLVDFWADWCAPCKALMPLLAQIAES YQGELLLAKV 1633495 20 1.08e-23 DSFDTDVLKA DGAILVDFWAEWCGPCKMIAPILDEIADE 
YQGKLTVAKL 15966937 60 3.99e-23 FTRDVLEASR QQPVLVDFWAPWCGPCKQLTPVIEKVVRE AAGRVKLVKM 15605725 60 3.99e-23 GKKVSIEEFK GKVLLINFWATWCPPCKEEIPMFKEIYEK YRDRGFEILA 16123427 52 5.03e-23 AETLDKLLQD DLPMVIDFWAPWCGPCRSFAPIFAETAAE RAGKVRFVKV 23098307 52 5.65e-23 GQQLRLSDFK GERVLINFWTTWCPPCRQEMPDMQRFYQD LQPNILAVNL 17229859 21 6.33e-23 ENEFDTVLSE DKVVVVDFTATWCGPCRLVSPLMDQLADE YKGRVKVVKV 6687568 19 8.90e-23 DLEKLINENK GRLIVVDFFAQWCGPCRNIAPKVEALAKE IPEVEFAKVD 15597673 156 9.96e-23 GESVQLADFR GRPLVINLWASWCPPCRREMPVLQQAQAE NPDVVFLFAN 15218394 22 9.96e-23 NSFDDLLQNS DKPVLVDFYATWCGPCQLMVPILNEVSET LKDIIAVVKI 15899007 48 3.01e-22 SKNFDEFITK NKIVVVDFWAEWCAPCLILAPVIEELAND YPQVAFGKLN 15988313 60 4.16e-22 GKPKKLSDFR GKTLLVNLWATWCVPCRKEMPALDELQGK LSGPNFEVVA 17531233 27 7.10e-22 DFEQLIRQHP EKIIILDFYATWCGPCKAIAPLYKELATT HKGIIFCKVD 13358154 20 7.89e-22 NLNELLKENH SKPILIDFYADWCPPCRMLIPVLDSIEKK HGDEFTIIKI 1174686 21 1.08e-21 AQEFANLLNT TQYVVADFYADWCGPCKAIAPMYAQFAKT FSIPNFLAFA 3323237 18 1.20e-21 SGNVRKTIET NPLVIVDFWAPWCGSCKMLGPVLEEVESE VGSGVVIGKL 18406743 363 1.48e-21 GAKVLVSDLV GKTILMYFSAHWCPPCRAFTPKLVEVYKQ IKERNEAFEL 15614085 59 1.64e-21 SQSRSLSDYR GDVVILNVWASWCEPCRKEMPALMELQSD YESEDVSIVT 16804867 18 1.82e-21 EDDLEEIISS HPKILLNFWAEWCAPCRCFWPTLEQFAEM EEGNVQVVKI 21222859 102 2.01e-21 GGQVDVADYK GKVVVLNVWGSWCPPCRAEAKNFEKVYQD VKDQGVQFVG 2822332 33 2.73e-21 QTELDDVTSD NDVVLADFYADWCGPCQMLEPVVETLAEQ TDAAVAKIDV 16759994 59 2.73e-21 ESTTLATLSE ERPVLLYFWASWCGVCRFTTPAVAHLAAE GENVMTVALR 140543 43 5.53e-21 LTEFRNLIKQ NDKLVIDFYATWCGPCKMMQPHLTKLIQA YPDVRFVKCD 15614140 62 8.21e-21 GESIELRELE GKGVFLNFWGTYCPPCEREMPHMEKLYGE YKEQGVEIIA 17537401 26 1.21e-20 DELPAEEHLK GKIIGLYFSASWCPPCRAFTPKLKEFFEE IKKTHPEFEI 18406743 43 1.34e-20 GEQVKVDSLL GKKIGLYFSAAWCGPCQRFTPQLVEVYNE LSSKVGFEIV 27375582 80 1.63e-20 VPGLDPAAFK GKVSLVNVWASWCVPCHDEAPLLTELGKD KRFQLVGINY 15615431 71 1.79e-20 GQVVDLSSLR GQPVILHFFATWCPVCQDEMPSLVKLDKE YRQKGGQFLA 12044976 18 3.84e-20 LKQLEEIFSA KKNVIVDFWAAWCGPCKLTSPEFQKAADE FSDAQFVKVN 4200327 78 6.73e-20 
PDFQDRVVNS ETPVVVDFHAQWCGPCKILGPRLEKMVAK QHGKVVMAKV 4155972 16 2.21e-19 NGKNYAEKTA HQAVVVNVGASWCPDCRKIEPIMENLAKT YKGKVEFFKV 15791337 1 2.90e-19 . MTVTLKDFYADWCGPCKTQDPILEELEAD YDEDVSFEKI 1651717 33 5.02e-18 DNFDNLVLQC PKPILVYFGAPWCGLCHFVKPLLNHLHGE WQEQLVCVEV 1729944 17 2.50e-16 DTFKTEVLEG TGYVLVDYFSDGCVPCKALMPAVEELSKK YEGRVVFAKL 19746502 91 2.34e-15 GKTYRLSEFK GKKVYLKFWASWCSICLSTLADTEDLAKM SDKDYVVLTV -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 16078864 7.3e-28 58_[1]_83 30021713 1.7e-27 75_[1]_87 15805374 2e-27 76_[1]_83 15805225 4e-26 70_[1]_86 17547503 1.7e-25 61_[1]_84 18309723 4.4e-25 284_[1]_90 16761507 4.4e-25 51_[1]_59 135765 4.4e-25 15_[1]_59 7290567 5e-25 19_[1]_63 19705357 1.7e-24 37_[1]_98 20092028 1.9e-24 5_[1]_59 11499727 2.4e-24 15_[1]_61 7109697 3.6e-24 16_[1]_55 267116 4.6e-24 18_[1]_57 1388082 6.7e-24 27_[1]_54 15599256 9.6e-24 24_[1]_236 1633495 1.1e-23 19_[1]_60 15966937 4e-23 59_[1]_242 15605725 4e-23 59_[1]_58 16123427 5e-23 51_[1]_65 23098307 5.6e-23 51_[1]_83 17229859 6.3e-23 20_[1]_59 6687568 8.9e-23 18_[1]_57 15597673 1e-22 155_[1]_94 15218394 1e-22 21_[1]_62 15899007 3e-22 47_[1]_59 15988313 4.2e-22 59_[1]_98 17531233 7.1e-22 26_[1]_60 13358154 7.9e-22 19_[1]_58 1174686 1.1e-21 20_[1]_78 3323237 1.2e-21 17_[1]_59 18406743 1.3e-20 42_[1]_291_[1]_187 15614085 1.6e-21 58_[1]_90 16804867 1.8e-21 17_[1]_60 21222859 2e-21 101_[1]_93 2822332 2.7e-21 32_[1]_58 16759994 2.7e-21 58_[1]_81 140543 5.5e-21 42_[1]_56 15614140 8.2e-21 61_[1]_86 17537401 1.2e-20 25_[1]_95 27375582 1.6e-20 79_[1]_86 15615431 1.8e-20 70_[1]_85 12044976 3.8e-20 17_[1]_56 4200327 6.7e-20 77_[1]_60 4155972 2.2e-19 15_[1]_60 15791337 2.9e-19 [1]_60 1651717 5e-18 32_[1]_65 
1729944 2.5e-16 16_[1]_62 19746502 2.3e-15 90_[1]_88 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 1 width=29 seqs=50 16078864 ( 59) GKKVLLNFWATWCKPCRQEMPAMEKLQKE 1 30021713 ( 76) GKKVILNFWATWCGPCQQEMPDMEAFYKE 1 15805374 ( 77) GKPVVINFWASWCVPCRQEAPLFSKLSQE 1 15805225 ( 71) GRPVVLNFWASWCGPCREEAPLFAKLAAH 1 17547503 ( 62) GQPVVINFWAPWCGPCVEEMPELSALAQE 1 18309723 ( 285) GKTIFLNFWATWCPPCRGEMPYIDELYKE 1 16761507 ( 52) DLPVVIDFWAPWCGPCRNFAPIFEDVAEE 1 135765 ( 16) EGVVLADFWAPWCGPCKMIAPVLEELDQE 1 7290567 ( 20) DKLVVIDFYADWCGPCKIIAPKLDELAQQ 1 19705357 ( 38) GKVVVINFWATWCGYCVEEMPGFEKVYKE 1 20092028 ( 6) MKPMLLDFSATWCGPCRMQKPILEELEKK 1 11499727 ( 16) DKLVVVDFYADWCMPCRYISPILEKLSKE 1 7109697 ( 17) NKKVVVDFYANWCGPCKILGPIFEEVAQD 1 267116 ( 19) HQRVVVDFSAEWCGPCRAIAPVFDKLSNE 1 1388082 ( 28) NKLIVIDFTASWCPPCRMIAPIFNDLAKK 1 15599256 ( 25) HKPVLVDFWADWCAPCKALMPLLAQIAES 1 1633495 ( 20) DGAILVDFWAEWCGPCKMIAPILDEIADE 1 15966937 ( 60) QQPVLVDFWAPWCGPCKQLTPVIEKVVRE 1 15605725 ( 60) GKVLLINFWATWCPPCKEEIPMFKEIYEK 1 16123427 ( 52) DLPMVIDFWAPWCGPCRSFAPIFAETAAE 1 23098307 ( 52) GERVLINFWTTWCPPCRQEMPDMQRFYQD 1 17229859 ( 21) DKVVVVDFTATWCGPCRLVSPLMDQLADE 1 6687568 ( 19) GRLIVVDFFAQWCGPCRNIAPKVEALAKE 1 15597673 ( 156) GRPLVINLWASWCPPCRREMPVLQQAQAE 1 15218394 ( 22) DKPVLVDFYATWCGPCQLMVPILNEVSET 1 15899007 ( 48) NKIVVVDFWAEWCAPCLILAPVIEELAND 1 15988313 ( 60) GKTLLVNLWATWCVPCRKEMPALDELQGK 1 17531233 ( 27) EKIIILDFYATWCGPCKAIAPLYKELATT 1 13358154 ( 20) SKPILIDFYADWCPPCRMLIPVLDSIEKK 1 1174686 ( 21) TQYVVADFYADWCGPCKAIAPMYAQFAKT 1 3323237 ( 18) NPLVIVDFWAPWCGSCKMLGPVLEEVESE 1 18406743 ( 363) GKTILMYFSAHWCPPCRAFTPKLVEVYKQ 1 15614085 ( 59) GDVVILNVWASWCEPCRKEMPALMELQSD 1 16804867 ( 18) HPKILLNFWAEWCAPCRCFWPTLEQFAEM 1 21222859 ( 102) GKVVVLNVWGSWCPPCRAEAKNFEKVYQD 1 2822332 ( 33) 
NDVVLADFYADWCGPCQMLEPVVETLAEQ 1 16759994 ( 59) ERPVLLYFWASWCGVCRFTTPAVAHLAAE 1 140543 ( 43) NDKLVIDFYATWCGPCKMMQPHLTKLIQA 1 15614140 ( 62) GKGVFLNFWGTYCPPCEREMPHMEKLYGE 1 17537401 ( 26) GKIIGLYFSASWCPPCRAFTPKLKEFFEE 1 18406743 ( 43) GKKIGLYFSAAWCGPCQRFTPQLVEVYNE 1 27375582 ( 80) GKVSLVNVWASWCVPCHDEAPLLTELGKD 1 15615431 ( 71) GQPVILHFFATWCPVCQDEMPSLVKLDKE 1 12044976 ( 18) KKNVIVDFWAAWCGPCKLTSPEFQKAADE 1 4200327 ( 78) ETPVVVDFHAQWCGPCKILGPRLEKMVAK 1 4155972 ( 16) HQAVVVNVGASWCPDCRKIEPIMENLAKT 1 15791337 ( 1) MTVTLKDFYADWCGPCKTQDPILEELEAD 1 1651717 ( 33) PKPILVYFGAPWCGLCHFVKPLLNHLHGE 1 1729944 ( 17) TGYVLVDYFSDGCVPCKALMPAVEELSKK 1 19746502 ( 91) GKKVYLKFWASWCSICLSTLADTEDLAKM 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 29 n= 13619 bayes= 8.57575 E= 2.4e-621 -307 -432 105 24 -485 245 244 -447 -123 -431 83 153 -129 -26 -235 -84 -26 -433 -461 -356 -323 -445 -10 -128 -502 -26 -191 -453 278 -121 -330 -256 -40 185 99 -248 -28 -441 -465 -370 -91 -296 -491 -418 -344 -172 -280 16 85 15 -196 -92 242 -322 -7 -308 24 110 -379 51 -235 -246 -491 -455 -397 -487 -312 153 -471 -56 48 -432 -441 -405 -376 -160 -122 300 -462 -441 -436 -381 -695 -621 -31 -83 -426 107 -601 199 -233 -539 -598 -492 -521 -455 -393 223 -460 -42 -48 -374 -695 -625 -425 -596 -444 202 -186 165 6 -544 -607 -509 -533 -461 -389 210 -494 -433 -525 -579 298 -425 -689 -405 60 -758 -172 -738 -677 308 -564 -396 -487 -315 -430 -744 -684 181 -507 -364 -614 -603 406 -604 -345 -365 -613 -118 -298 -525 -554 -523 -543 -436 -519 -41 -282 -5 -740 -586 -790 -794 38 -81 61 -626 -751 -568 -542 -597 -750 -589 -627 94 -39 -663 527 291 341 -314 -658 -630 -618 -74 -511 -570 -648 -581 -475 -534 -569 -535 -556 -95 -123 -448 -601 -600 -94 -462 126 25 -521 -388 63 -486 
-280 -470 -365 -56 134 43 -272 202 233 -471 -498 -390 -447 -335 -462 -462 -190 -303 -297 -430 -453 -288 -291 -389 -480 -349 -348 -389 -407 -416 607 -90 -484 593 -632 -597 -590 -612 -466 -511 -649 -567 -458 -553 -619 -552 -529 -501 -462 -587 -617 -586 -54 -390 -443 -184 -550 306 -352 -480 -187 -510 -26 -359 199 -389 -395 -117 -385 -26 -509 -476 -312 -425 -182 -432 -527 -457 -328 -174 -423 -227 -420 -422 405 -339 -381 -122 -354 -127 -550 -75 -484 593 -632 -597 -590 -612 -466 -511 -649 -567 -458 -553 -619 -552 -529 -501 -462 -587 -617 -586 -528 -578 -626 -171 -724 -578 152 -591 206 -123 -467 -421 -605 163 344 -453 -461 -107 -572 -521 83 18 -57 25 -48 -160 -158 30 -9 -69 274 13 -377 160 58 -12 -95 -400 -431 -45 -338 -292 -591 205 116 -483 -322 173 -493 94 99 -432 -500 25 -415 -344 24 -68 -379 -321 169 -383 -136 -55 -432 -27 -171 -58 -56 -193 344 -231 -387 -23 -219 31 89 -179 41 -327 -176 -429 -465 -438 -533 -461 -334 -485 -190 -464 -427 -427 418 -345 -387 -300 -360 -471 -556 -511 22 -402 -9 -52 -450 -161 150 159 24 45 83 -61 -379 -19 -61 -79 -95 80 -429 -45 -483 -422 -735 -657 204 -634 -459 14 -638 230 247 -583 -626 -510 -548 -503 -129 0 -471 50 22 -420 103 243 -468 -378 -169 -425 -10 -409 -8 60 -386 96 -216 -14 -25 -53 -443 -342 -41 -418 -7 236 -466 -376 150 -424 174 -408 -302 -63 -385 160 -62 -81 -98 -410 -441 -340 -114 -354 -635 -555 80 -557 -378 44 -539 261 16 -496 -534 -419 -448 -420 -131 99 -421 -379 209 -390 -59 24 -139 -161 65 -149 -222 -390 -286 -222 -381 130 -210 94 -247 -104 -427 252 45 -410 -8 94 -458 -27 -160 -418 202 -401 -295 61 -380 221 -61 -12 -96 -404 -434 -333 -170 -501 108 268 -545 -435 58 -489 91 -473 83 -290 -435 95 -286 -101 58 -469 -520 -416 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability 
matrix: alength= 20 w= 29 nsites= 50 E= 2.4e-621 0.000000 0.000000 0.140000 0.080000 0.000000 0.420000 0.080000 0.000000 0.020000 0.000000 0.040000 0.120000 0.020000 0.020000 0.000000 0.020000 0.040000 0.000000 0.000000 0.000000 0.000000 0.000000 0.060000 0.020000 0.000000 0.060000 0.000000 0.000000 0.540000 0.040000 0.000000 0.000000 0.040000 0.120000 0.080000 0.000000 0.040000 0.000000 0.000000 0.000000 0.040000 0.000000 0.000000 0.000000 0.000000 0.020000 0.000000 0.060000 0.140000 0.100000 0.000000 0.020000 0.300000 0.000000 0.040000 0.000000 0.060000 0.180000 0.000000 0.040000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.220000 0.000000 0.080000 0.040000 0.000000 0.000000 0.000000 0.000000 0.020000 0.020000 0.620000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.040000 0.040000 0.000000 0.120000 0.000000 0.380000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.400000 0.000000 0.020000 0.060000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.240000 0.020000 0.300000 0.020000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.360000 0.000000 0.000000 0.000000 0.000000 0.520000 0.000000 0.000000 0.000000 0.020000 0.000000 0.020000 0.000000 0.000000 0.340000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 0.000000 0.000000 0.000000 0.000000 0.860000 0.000000 0.000000 0.000000 0.000000 0.040000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.080000 0.000000 0.020000 0.000000 0.000000 0.000000 0.000000 0.060000 0.040000 0.020000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 0.040000 0.000000 0.540000 0.200000 0.920000 0.000000 0.000000 0.000000 0.000000 0.040000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 0.020000 0.000000 0.000000 0.000000 0.040000 0.000000 0.160000 0.080000 0.000000 0.000000 0.020000 0.000000 0.000000 0.000000 0.000000 0.020000 0.140000 0.040000 0.000000 0.220000 
0.280000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.960000 0.020000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.060000 0.000000 0.000000 0.020000 0.000000 0.540000 0.000000 0.000000 0.020000 0.000000 0.020000 0.000000 0.240000 0.000000 0.000000 0.020000 0.000000 0.080000 0.000000 0.000000 0.000000 0.000000 0.020000 0.000000 0.000000 0.000000 0.000000 0.020000 0.000000 0.020000 0.000000 0.000000 0.860000 0.000000 0.000000 0.020000 0.000000 0.040000 0.000000 0.020000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 0.000000 0.000000 0.040000 0.000000 0.300000 0.040000 0.000000 0.000000 0.000000 0.100000 0.460000 0.000000 0.000000 0.040000 0.000000 0.000000 0.160000 0.020000 0.040000 0.080000 0.040000 0.020000 0.000000 0.080000 0.060000 0.060000 0.160000 0.040000 0.000000 0.100000 0.060000 0.040000 0.020000 0.000000 0.000000 0.020000 0.000000 0.000000 0.000000 0.320000 0.120000 0.000000 0.000000 0.200000 0.000000 0.180000 0.040000 0.000000 0.000000 0.040000 0.000000 0.000000 0.060000 0.040000 0.000000 0.000000 0.300000 0.000000 0.020000 0.040000 0.000000 0.060000 0.000000 0.040000 0.040000 0.020000 0.260000 0.000000 0.000000 0.020000 0.000000 0.060000 0.100000 0.020000 0.020000 0.000000 0.020000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 0.000000 0.000000 0.000000 0.960000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 0.000000 0.060000 0.040000 0.000000 0.020000 0.040000 0.200000 0.080000 0.140000 0.040000 0.020000 0.000000 0.020000 0.020000 0.020000 0.020000 0.160000 
0.000000 0.020000 0.000000 0.000000 0.000000 0.000000 0.220000 0.000000 0.000000 0.060000 0.000000 0.460000 0.120000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 0.080000 0.000000 0.040000 0.100000 0.000000 0.140000 0.420000 0.000000 0.000000 0.000000 0.000000 0.060000 0.000000 0.020000 0.060000 0.000000 0.060000 0.000000 0.040000 0.040000 0.060000 0.000000 0.000000 0.060000 0.000000 0.060000 0.400000 0.000000 0.000000 0.040000 0.000000 0.260000 0.000000 0.000000 0.020000 0.000000 0.100000 0.020000 0.020000 0.020000 0.000000 0.000000 0.000000 0.040000 0.000000 0.000000 0.000000 0.100000 0.000000 0.000000 0.080000 0.000000 0.560000 0.020000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 0.180000 0.000000 0.000000 0.400000 0.000000 0.040000 0.080000 0.020000 0.020000 0.020000 0.020000 0.000000 0.000000 0.000000 0.000000 0.000000 0.080000 0.000000 0.100000 0.000000 0.040000 0.000000 0.180000 0.120000 0.000000 0.060000 0.140000 0.000000 0.060000 0.000000 0.000000 0.320000 0.000000 0.000000 0.060000 0.000000 0.160000 0.020000 0.040000 0.020000 0.000000 0.000000 0.000000 0.020000 0.000000 0.140000 0.480000 0.000000 0.000000 0.020000 0.000000 0.140000 0.000000 0.040000 0.000000 0.000000 0.060000 0.000000 0.020000 0.080000 0.000000 0.000000 0.000000 -------------------------------------------------------------------------------- Time 66.68 secs. 
******************************************************************************** ******************************************************************************** MOTIF 2 width = 17 sites = 41 llr = 1175 E-value = 7.3e-239 ******************************************************************************** -------------------------------------------------------------------------------- Motif 2 Description -------------------------------------------------------------------------------- Simplified A :::11::24:::::11: pos.-specific C :::::::::::::a::: probability D ::::::::41::::::: matrix E ::::::::::::::::6 F ::1562:::6:2::::: G ::::1::3:::12:::: H :::::1::::::::::: I :21:::::1:::1:::: K 1::::::1:::::::1: L :17::::1:::::::1: M ::::::::::::::::: N ::::::::::::::::1 P ::::::a::::5::2:: Q ::::::::::::::::2 R :::::::1:::::::1: S :::11::1::11::2:1 T 1:::::::::9:1:44: V 661111::::::6:::: W :::::::::1::::::: Y :::1:4::::::::::: bits 6.2 5.6 * 5.0 * 4.3 * * Information 3.7 * * * content 3.1 * * * (41.3 bits) 2.5 ** ** *** * * 1.9 ******* ******* * 1.2 ***************** 0.6 ***************** 0.0 ----------------- Multilevel VVLFFYPGAFTPVCTTE consensus I D F P sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------------- 13541053 34 1.14e-19 IRLSSYRGKW VVLFFYPADFTFVCPTE VEGFAEDYEK 9955016 37 2.52e-19 VKLSDYKGKY VVLFFYPLDFTFVCPTE IIAFSNRAED 19173077 32 1.46e-17 ISLQDYIGKY VVLAFYPLDFTFVCPTE INRFSDLKGA 16803644 39 2.64e-17 LEKNIEDDKW TILFFYPMDFTFVCPTE IVAISARSDE 15790738 67 8.20e-17 LTDALADNRA VVLFFYPFDFSPVCATE LCAIQNARWF 15899339 32 1.24e-16 KIPSDFKGKV VVLAFYPAAFTSVCTKE MCTFRDSMAK 14600438 31 1.61e-16 IRLSDFRGRI VVLYFYPRAMTPGCTRE GVRFNELLDE 4433065 16 2.10e-16 IDINEYKGKY 
VVLLFYPLDWTFVCPTE MIGYSEVAGQ
4996210 34 4.01e-16 DFYKYVGDNW AILFSHPHDFTPVCTTE LAEFGKMHEE
20151112 33 4.55e-16 VTEKDTEGRW SVFFFYPADFTFVCPTE LGDVADHYEE
13541117 32 4.55e-16 RKLSEFRGQN VVLAFFPGAFTSVCTKE MCTFRDSMAN
15807234 32 5.16e-16 TLSSYRGQSH VVLVFYPLDFSPVCSMQ LPEYSGSQDD
3318841 34 6.63e-16 RFHDFLGDSW GILFSHPRDFTPVCTTE LGRAAKLAPE
18313548 34 6.63e-16 LSEVLKRGRP VVLLFFPGAFTSVCTKE LCTFRDKMAL
15613511 30 7.50e-16 VSLSDFKGKN IVLYFYPKDMTPGCTTE ACDFRDRVED
14286173 31 7.50e-16 ELPDEFEGKW FILFSHPADFTPVCTTE FVAFQEVYPE
16501671 37 1.75e-15 NFKQHTNGKT TVLFFWPMDFTFVCPSE LIAFDKRYEE
21674812 31 1.97e-15 VSLSEFKGRK VLLIFYPGDDTPVCTAQ LCDYRNNVAA
5326864 47 5.55e-15 NASKEFANKK VVLFALPGAFTPVCSAN HVPEYIQKLP
15964668 36 1.20e-14 TTELLFKGKR VVLFAVPGAFTPTCSLN HLPGYLENRD
4704732 38 2.05e-14 SVHSIAAGKK VILFGVPGAFTPTCSMS HVPGFIGKAE
15609658 36 2.82e-14 VSLADYRGRR VIVYFYPAASTPGCTKQ ACDFRDNLGD
21112072 35 3.47e-14 DTHTLFTGRK VVLFAVPGAFTPTCSAK HLPGYVEQFE
16125919 56 3.85e-14 NLAKALKKGP VVLYFFPAAYTAGCTAE AREFAEATPE
6322180 94 4.26e-14 LKKITENNRV VVFFVYPRASTPGCTRQ ACGFRDNYQE
21223405 38 1.05e-13 INHKTYEGQW KVVFAWPKDFTFVCPTE IAAFGKLNDE
6850955 34 1.16e-13 FRLSEYLGKW VVLCFYPGDFTFVXATE VSAVAEKYPE
1091044 43 3.68e-13 DFDKEFRDKT VVIVAIPGAFTPTCTAN HIPPFVEKFT
21227878 34 5.85e-13 IHLYDLKGKK VLLSFHPLAWTQVCAQQ MKSLEENYEL
15826629 34 7.69e-13 NLAELFKGKK GVLFGVPGAFTPGCSKT HLPGFVEQAE
15643152 26 1.21e-12 FTHVDLYGKY TILFFFPKAGTSGCTRE AVEFSRENFE
11467494 38 1.72e-12 RLSDYRGKKY VILFFYPANFTAISPTE LMLLSDRISE
15609375 32 3.44e-12 TLRGYRGAKN VLLVFFPLAFTGICQGE LDQLRDHLPE
17229033 37 8.66e-12 TTDDLFAGKT VAVFSLPGAFTPTCSST HLPGYNELAK
13186328 32 3.67e-11 IKEIDLKSNW NVFFFYPYSYSFICPLE LKNISNKIKE
6323138 49 9.94e-11 WSKLISENKK VIITGAPAAFSPTCTVS HIPGYINYLD
15605963 48 5.32e-10 KIVGGAKDVV QVIITVPSLDTPVCETE TKKFNEIMAG
15672286 45 6.58e-10 KIKLSKLEKP VLISVFPDINTRVCSLQ TKHFNLEAAK
15801846 48 1.51e-09 VTLGQFAGKR KVLNIFPSIDTGVCAAS VRKFNQLATE
16330420 73 5.37e-09 ILSEQLKTGP ILLKFFRGYWCPYCGLE LRAYQKVVNK
21283385 47 6.53e-09 VTLADYAGKK KLISVVPSIDTGVCDQQ TRKFNSDASK
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
	Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
13541053 1.1e-19 33_[2]_124
9955016 2.5e-19 36_[2]_144
19173077 1.5e-17 31_[2]_129
16803644 2.6e-17 38_[2]_126
15790738 8.2e-17 66_[2]_129
15899339 1.2e-16 31_[2]_108
14600438 1.6e-16 30_[2]_116
4433065 2.1e-16 15_[2]_101
4996210 4e-16 33_[2]_170
20151112 4.6e-16 32_[2]_137
13541117 4.6e-16 31_[2]_109
15807234 5.2e-16 31_[2]_103
3318841 6.6e-16 33_[2]_174
18313548 6.6e-16 33_[2]_112
15613511 7.5e-16 29_[2]_108
14286173 7.5e-16 30_[2]_162
16501671 1.7e-15 36_[2]_147
21674812 2e-15 30_[2]_101
5326864 5.5e-15 46_[2]_104
15964668 1.2e-14 35_[2]_109
4704732 2.1e-14 37_[2]_108
15609658 2.8e-14 35_[2]_105
21112072 3.5e-14 34_[2]_109
16125919 3.8e-14 55_[2]_105
6322180 4.3e-14 93_[2]_105
21223405 1.1e-13 37_[2]_130
6850955 1.2e-13 33_[2]_153
1091044 3.7e-13 42_[2]_107
21227878 5.9e-13 33_[2]_104
15826629 7.7e-13 33_[2]_111
15643152 1.2e-12 25_[2]_279
11467494 1.7e-12 37_[2]_150
15609375 3.4e-12 31_[2]_105
17229033 8.7e-12 36_[2]_198
13186328 3.7e-11 31_[2]_128
6323138 9.9e-11 48_[2]_111
15605963 5.3e-10 47_[2]_105
15672286 6.6e-10 44_[2]_99
15801846 1.5e-09 47_[2]_104
16330420 5.4e-09 72_[2]_129
21283385 6.5e-09 46_[2]_101
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
	Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL MOTIF 2 width=17 seqs=41
13541053 ( 34) VVLFFYPADFTFVCPTE 1
9955016 ( 37) VVLFFYPLDFTFVCPTE 1
19173077 ( 32) VVLAFYPLDFTFVCPTE 1
16803644 ( 39) TILFFYPMDFTFVCPTE 1
15790738 ( 67) VVLFFYPFDFSPVCATE 1
15899339 ( 32) VVLAFYPAAFTSVCTKE 1 14600438 ( 31) VVLYFYPRAMTPGCTRE 1 4433065 ( 16) VVLLFYPLDWTFVCPTE 1 4996210 ( 34) AILFSHPHDFTPVCTTE 1 20151112 ( 33) SVFFFYPADFTFVCPTE 1 13541117 ( 32) VVLAFFPGAFTSVCTKE 1 15807234 ( 32) VVLVFYPLDFSPVCSMQ 1 3318841 ( 34) GILFSHPRDFTPVCTTE 1 18313548 ( 34) VVLLFFPGAFTSVCTKE 1 15613511 ( 30) IVLYFYPKDMTPGCTTE 1 14286173 ( 31) FILFSHPADFTPVCTTE 1 16501671 ( 37) TVLFFWPMDFTFVCPSE 1 21674812 ( 31) VLLIFYPGDDTPVCTAQ 1 5326864 ( 47) VVLFALPGAFTPVCSAN 1 15964668 ( 36) VVLFAVPGAFTPTCSLN 1 4704732 ( 38) VILFGVPGAFTPTCSMS 1 15609658 ( 36) VIVYFYPAASTPGCTKQ 1 21112072 ( 35) VVLFAVPGAFTPTCSAK 1 16125919 ( 56) VVLYFFPAAYTAGCTAE 1 6322180 ( 94) VVFFVYPRASTPGCTRQ 1 21223405 ( 38) KVVFAWPKDFTFVCPTE 1 6850955 ( 34) VVLCFYPGDFTFVXATE 1 1091044 ( 43) VVIVAIPGAFTPTCTAN 1 21227878 ( 34) VLLSFHPLAWTQVCAQQ 1 15826629 ( 34) GVLFGVPGAFTPGCSKT 1 15643152 ( 26) TILFFFPKAGTSGCTRE 1 11467494 ( 38) VILFFYPANFTAISPTE 1 15609375 ( 32) VLLVFFPLAFTGICQGE 1 17229033 ( 37) VAVFSLPGAFTPTCSST 1 13186328 ( 32) NVFFFYPYSYSFICPLE 1 6323138 ( 49) VIITGAPAAFSPTCTVS 1 15605963 ( 48) QVIITVPSLDTPVCETE 1 15672286 ( 45) VLISVFPDINTRVCSLQ 1 15801846 ( 48) KVLNIFPSIDTGVCAAS 1 16330420 ( 73) ILLKFFRGYWCPYCGLE 1 21283385 ( 47) KLISVVPSIDTGVCDQQ 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 17 n= 14771 bayes= 10.3384 E= 7.3e-239 -116 -174 -408 -375 -195 -171 -234 8 -125 -248 -201 -175 -362 -145 -296 -176 -32 311 -404 -399 -172 -419 -770 -736 -538 -738 -643 192 -746 63 -364 -680 -715 -687 -692 -630 -459 287 -710 -615 -551 -462 -759 -677 44 -711 -502 101 -667 296 -174 -643 -635 -510 -559 -591 -502 -19 -485 -472 -19 58 -550 -476 311 -448 -277 -1 -157 -62 -161 -74 -463 -358 -379 45 -80 0 -338 173 56 -311 -635 -578 
343 7 -395 -99 -566 -361 -283 -475 -515 -452 -479 93 -88 -9 -461 -408 -156 -223 -358 -326 181 -378 244 -136 -338 -109 -193 -283 -354 -271 -262 -207 -279 22 132 404 -480 -568 -613 -598 -677 -593 -479 -645 -587 -619 -587 -579 425 -503 -80 -461 -519 -632 -669 -644 107 -373 -109 -204 -114 187 90 -369 14 48 107 -198 -357 -134 82 57 -223 -361 -403 -20 227 -356 250 -380 -424 -400 -264 32 -396 -165 -285 -42 -480 -319 -372 -79 -313 -340 -451 -17 -440 -304 -15 -525 382 -213 -311 -329 -539 -257 57 -132 -491 -463 -482 -68 -454 -385 171 66 -357 39 -512 -541 -528 -495 -369 -399 -471 -508 -330 -301 -502 -375 -409 98 402 -390 -500 -499 -81 -400 -445 -420 187 -22 -316 -467 -410 -448 -407 -402 344 -56 -93 67 -337 -450 -537 -491 -412 -365 -687 -633 -458 121 -472 50 -625 -348 -302 -558 -611 -538 -553 -468 143 280 -538 -21 -400 591 -545 -508 -503 -535 -389 -425 -557 -480 -374 -469 -538 -468 -447 -330 -379 -498 -537 -503 16 -427 -108 -115 -534 -137 -223 -487 -311 -490 -379 -207 213 -21 -302 185 274 -470 -512 -411 69 -366 -286 -217 -417 -138 -149 -355 73 -5 108 -208 -366 71 81 11 264 -157 -403 -309 -414 -633 -247 306 -696 -466 -319 -603 -136 -588 -493 86 -497 234 -398 49 -11 -566 -665 -538 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 17 nsites= 41 E= 7.3e-239 0.024390 0.000000 0.000000 0.000000 0.024390 0.048780 0.000000 0.048780 0.073171 0.000000 0.000000 0.024390 0.000000 0.024390 0.000000 0.024390 0.073171 0.634146 0.000000 0.000000 0.024390 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.219512 0.000000 0.146341 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.609756 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.073171 0.000000 0.000000 0.121951 0.000000 0.731707 
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.073171 0.000000 0.000000 0.073171 0.024390 0.000000 0.000000 0.487805 0.000000 0.000000 0.048780 0.024390 0.048780 0.000000 0.024390 0.000000 0.000000 0.000000 0.073171 0.024390 0.073171 0.000000 0.097561 0.121951 0.000000 0.000000 0.000000 0.585366 0.073171 0.000000 0.024390 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.097561 0.024390 0.073171 0.000000 0.000000 0.024390 0.000000 0.000000 0.000000 0.195122 0.000000 0.097561 0.024390 0.000000 0.048780 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.146341 0.048780 0.414634 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.975610 0.000000 0.024390 0.000000 0.000000 0.000000 0.000000 0.000000 0.195122 0.000000 0.024390 0.000000 0.024390 0.292683 0.024390 0.000000 0.073171 0.146341 0.048780 0.000000 0.000000 0.000000 0.073171 0.073171 0.000000 0.000000 0.000000 0.024390 0.439024 0.000000 0.390244 0.000000 0.000000 0.000000 0.000000 0.073171 0.000000 0.024390 0.000000 0.024390 0.000000 0.000000 0.000000 0.024390 0.000000 0.000000 0.000000 0.024390 0.000000 0.000000 0.097561 0.000000 0.634146 0.024390 0.000000 0.000000 0.000000 0.000000 0.048780 0.024390 0.000000 0.000000 0.000000 0.048780 0.000000 0.000000 0.073171 0.048780 0.000000 0.024390 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.097561 0.878049 0.000000 0.000000 0.000000 0.048780 0.000000 0.000000 0.000000 0.243902 0.073171 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.487805 0.024390 0.024390 0.097561 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.170732 0.000000 0.073171 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.146341 0.585366 0.000000 0.024390 0.002057 0.951610 0.001551 0.001720 0.001239 0.001669 0.000335 0.001427 0.001751 0.002265 0.000513 0.000965 
0.001233 0.000782 0.001003 0.025633 0.001255 0.001999 0.000330 0.000665 0.097561 0.000000 0.024390 0.024390 0.000000 0.024390 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.243902 0.024390 0.000000 0.195122 0.365854 0.000000 0.000000 0.000000 0.146341 0.000000 0.000000 0.000000 0.000000 0.024390 0.000000 0.000000 0.121951 0.097561 0.048780 0.000000 0.000000 0.048780 0.073171 0.048780 0.365854 0.024390 0.000000 0.000000 0.000000 0.000000 0.000000 0.609756 0.000000 0.000000 0.000000 0.000000 0.024390 0.000000 0.000000 0.073171 0.000000 0.170732 0.000000 0.073171 0.048780 0.000000 0.000000 0.000000 -------------------------------------------------------------------------------- Time 130.52 secs. ******************************************************************************** ******************************************************************************** MOTIF 3 width = 29 sites = 45 llr = 1514 E-value = 2.2e-254 ******************************************************************************** -------------------------------------------------------------------------------- Motif 3 Description -------------------------------------------------------------------------------- Simplified A 2:::::11:1:612::::2::1::1:::: pos.-specific C ::::::::::::::::::::::::::::: probability D :::3:811:3::::::::::::::::241 matrix E :::::13112::12:2::::::::::::1 F 2:1:::::::::::2:::::::2133::: G 1::1::::2::::::4::2:::::::1:6 H :::::::1::::::::::::::::::::: I ::2:1:::::2:1:::4::4::1222::: K :6::::111:::12:1:1::::::::4:: L 1:111:::::5:::::1::2::3222::: M :::::::::::::::::::1:1:1::::: N :::4:::3::::1::1::::::::::12: P :2::::::1:::::::::::a::::::1: Q ::::::1:2:::11:1::::::::::::: R ::::::::11:::1:1:3:::::::::1: S :::::::1:1:21::::14:::1:::::: T ::::1:1:::1:1::::11::81:::::: V 3:514:::::2:::::5::1::1321::: W ::::::::::::::::::::::::::::: Y ::::::::::::::6::::::::1:1::: bits 6.2 5.6 5.0 4.3 * Information 3.7 * content 3.1 * * ** (48.5 bits) 2.5 * * * ** * 1.9 *** * ** * * *** * 1.2 
****** **** **************** 0.6 ***************************** 0.0 ----------------------------- Multilevel VKVNVDENQDLAxEYGVRSIPTLVFFKDG consensus PID E KF I GL ILI N sequence V -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------------------------- 135765 52 2.50e-23 DQEMGDKLKI VKIDVDENQETAGKYGVMSIPTLLVLKDG EVVETSVGFK 7109697 52 2.06e-22 VAQDKKDWTF VKVDVDQANEISSEYEIRSIPTIIFFQDG KMADKRIGFI 2822332 67 2.06e-22 TLAEQTDAAV AKIDVDENQALASAYGVRGVPTLVLFADG EQVEEVVGLQ 11499727 52 2.40e-21 SKEYNGEVEF YKLNVDENQDVAFEYGIASIPTVLFFRNG KVVGGFIGAM 15899007 83 2.81e-21 LANDYPQVAF GKLNTEESQDIAMRYGIMSLPTIMFFKNG ELVDQILGAV 1174686 59 4.48e-21 TFSIPNFLAF AKINVDSVQQVAQHYRVSAMPTFLFFKNG KQVAVNGSVM 6687568 54 1.30e-20 LAKEIPEVEF AKVDVDQNEEAAAKYSVTAMPTFVFIKDG KEVDRFSGAN 12044976 53 1.30e-20 AADEFSDAQF VKVNVDDHTDIAAAYNITSLPTIVVFENG VEKKRAIGFM 7290567 56 2.34e-20 AQQYSDRVVV LKVNVDENEDITVEYNVNSMPTFVFIKGG NVLELFVGCN 20092028 42 3.13e-20 EKKYGDKVEF KVVDVDENQELASKYGIHAVPTLIIQKDG TEVKRFMGVT 1633495 56 4.19e-20 ADEYQGKLTV AKLNIDQNPGTAPKYGIRGIPTLLLFKNG EVAATKVGAL 3323237 54 8.53e-20 ESEVGSGVVI GKLNVDDDQDLAVEFNVASIPTLIVFKDG KEVDRSIGFV 15599256 61 8.53e-20 AESYQGELLL AKVNCDVEQDIVMRFGIRSLPTVVLFKDG QPVDGFAGAQ 1388082 64 8.53e-20 AKKFMSSAIF FKVDVDELQSVAKEFGVEAMPTFVFIKAG EVVDKLVGAN 17531233 62 9.82e-20 LATTHKGIIF CKVDVDEAEDLCSKYDVKMMPTFIFTKNG DAIEALEGCV 16761507 88 5.79e-19 AEERSGKVRF VKVNTEAERELSARFGIRSIPMIMIFKHG QVVDMLNGAV 15791337 37 6.61e-19 EADYDEDVSF EKIDVDEAEDVANEYQVRSIPTIVVENDD GVVERFVGVT 15218394 58 9.80e-19 SETLKDIIAV VKIDTEKYPSLANKYQIEALPTFILFKDG KLWDRFVSFL 4200327 114 1.65e-18 VAKQHGKVVM AKVDIDDHTDLAIEYEVSAVPTVLAMKNG DVVDKFVGIK 16123427 88 1.65e-18 AAERAGKVRF VKVNTEAEPALSTRFRIRSIPTIMLYRNG 
KMIDMLNGAV
140543 78 5.14e-18 LIQAYPDVRF VKCDVDESPDIAKECEVTAMPTFVLGKDG QLIGKIIGAN
13358154 56 1.07e-17 EKKHGDEFTI IKINVDHFPELSTQYQVKSIPALFYLKNG DIKATSLGFI
17229859 57 3.99e-17 ADEYKGRVKV VKVDVDNNKPLFKRFGLRSIPAVLIFKDG ELTEKIVGVS
16804867 54 4.74e-16 AEMEEGNVQV VKINVDKQRALAQKFDVKGIPNSLVLVDG EIKGAIAGIV
1651717 69 1.70e-15 HGEWQEQLVC VEVNADVNLHLANAYRLKNLPTLILFNRG QVIQRLEDFR
15966937 96 2.09e-15 VREAAGRVKL VKMNIDDHPSIAGQLGIQSIPAVIAFIDG RPVDGFMGAV
23098307 111 2.85e-15 VRFSQELELT FPILLDEKGEVSKAYRISPIPTTYMIDSE GIIRHKSYGA
4155972 52 5.25e-15 AKTYKGKVEF FKVSFDESQDLKESLGIRKIPTLIFYKNA KEVGERLVEP
2649838 32 5.25e-15 MGELNGTVEF EVVDVDEKRELAEKYEVLMLPTLVLADGD EVLGGFMGFA
3261501 53 1.56e-14 SSRKHFDVVH GKVNIETEKDLASIAGVKLLPTLMAFKKG KLVFKQAGIA
15614085 121 4.93e-14 GEFIEELGIT LPVFLDEEGEFADAYQVQHLPMTYVLDRE GIINEVILGE
30021713 139 1.49e-13 SNFIKEKGIT FPVLLDKNIDVTTAYKVITIPTSYFIDTK GVIQDKFIGP
15614140 121 2.81e-13 QRFVDRYGLS FPIVIDKGLNVIDAYGIRPLPTTILINEH GEIVKVHTGG
267116 54 3.07e-13 LSNEFTTFTF VHVDIDKVNTHPIGKEIRSVPTFYFYVNG AKVSEFSGAN
19746502 155 3.66e-13 KWFQGTDYKD LPVLLDPDGKLLEAYGVRSYPTEVFIGSD GVLAKKHIGY
18309723 351 5.20e-13 KNFLKEENHV FPVVLDENGAMVYQYGINAFPSTFIINKE GYITKYIPGA
16759994 115 1.13e-12 ARWLARKGVD FPVVNDANGALSAGWEISVTPTLVVVSQG RVVFTTSGWT
1729944 53 1.23e-12 SKKYEGRVVF AKLNTTGARRLAISQKILGLPTLSLYKDG VKVDEVTKDD
15615431 133 3.09e-12 RAFVQHYRAE FDPLLDTDGEVMETYQVIGIPTTLILDEE GTIVKRYNGV
17537401 90 1.42e-11 FKEHQGEWTY IPFGSDKIMSLMQKYEVKTIPAMRIVNDQ GEVIVQDART
27375582 137 1.67e-11 RFLGRYGNPF GRVGVDANGRASIEWGVYGVPETFVVGRE GTIVYKLVGP
15602312 125 1.02e-10 QALVKKREIS LLVVKDQLKITAERYQLVGTPTSFVIDPE GKILYKFEGL
18406743 265 1.70e-10 QDFKTKPWLA LPFNDKSGSKLARHFMLSTLPTLVILGPD GKTRHSNVAE
18406743 105 7.89e-10 YFRKMPWLAV PFTDSETRDRLDELFKVRGIPNLVMVDDH GKLVNENGVG
18406743 425 9.84e-09 EYYSQMPWLA LPFGDPRKASLAKTFKVGGIPMLAALGPT GQTVTKEARD
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
	Motif 3 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
135765 2.5e-23 51_[3]_23
7109697 2.1e-22 51_[3]_20
2822332 2.1e-22 66_[3]_24
11499727 2.4e-21 51_[3]_25
15899007 2.8e-21 82_[3]_24
1174686 4.5e-21 58_[3]_40
6687568 1.3e-20 53_[3]_22
12044976 1.3e-20 52_[3]_21
7290567 2.3e-20 55_[3]_27
20092028 3.1e-20 41_[3]_23
1633495 4.2e-20 55_[3]_24
3323237 8.5e-20 53_[3]_23
15599256 8.5e-20 60_[3]_200
1388082 8.5e-20 63_[3]_18
17531233 9.8e-20 61_[3]_25
16761507 5.8e-19 87_[3]_23
15791337 6.6e-19 36_[3]_24
15218394 9.8e-19 57_[3]_26
4200327 1.6e-18 113_[3]_24
16123427 1.6e-18 87_[3]_29
140543 5.1e-18 77_[3]_21
13358154 1.1e-17 55_[3]_22
17229859 4e-17 56_[3]_23
16804867 4.7e-16 53_[3]_24
1651717 1.7e-15 68_[3]_29
15966937 2.1e-15 95_[3]_206
23098307 2.8e-15 110_[3]_24
4155972 5.2e-15 51_[3]_24
2649838 5.2e-15 31_[3]_33
3261501 1.6e-14 52_[3]_43
15614085 4.9e-14 120_[3]_28
30021713 1.5e-13 138_[3]_24
15614140 2.8e-13 120_[3]_27
267116 3.1e-13 53_[3]_22
19746502 3.7e-13 154_[3]_24
18309723 5.2e-13 350_[3]_24
16759994 1.1e-12 114_[3]_25
1729944 1.2e-12 52_[3]_26
15615431 3.1e-12 132_[3]_23
17537401 1.4e-11 89_[3]_31
27375582 1.7e-11 136_[3]_29
15602312 1e-10 124_[3]_22
18406743 9.8e-09 104_[3]_131_[3]_131_[3]_125
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
	Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL MOTIF 3 width=29 seqs=45
135765 ( 52) VKIDVDENQETAGKYGVMSIPTLLVLKDG 1
7109697 ( 52) VKVDVDQANEISSEYEIRSIPTIIFFQDG 1
2822332 ( 67) AKIDVDENQALASAYGVRGVPTLVLFADG 1
11499727 ( 52) YKLNVDENQDVAFEYGIASIPTVLFFRNG 1
15899007 ( 83) GKLNTEESQDIAMRYGIMSLPTIMFFKNG 1
1174686 ( 59) AKINVDSVQQVAQHYRVSAMPTFLFFKNG 1
6687568 ( 54) AKVDVDQNEEAAAKYSVTAMPTFVFIKDG 1
12044976 ( 53) VKVNVDDHTDIAAAYNITSLPTIVVFENG 1
7290567 ( 56) LKVNVDENEDITVEYNVNSMPTFVFIKGG 1
20092028 ( 42) KVVDVDENQELASKYGIHAVPTLIIQKDG 1
1633495 ( 56) AKLNIDQNPGTAPKYGIRGIPTLLLFKNG 1
3323237 ( 54) GKLNVDDDQDLAVEFNVASIPTLIVFKDG 1
15599256 ( 61) AKVNCDVEQDIVMRFGIRSLPTVVLFKDG 1
1388082 ( 64) FKVDVDELQSVAKEFGVEAMPTFVFIKAG 1
17531233 ( 62) CKVDVDEAEDLCSKYDVKMMPTFIFTKNG 1
16761507 ( 88) VKVNTEAERELSARFGIRSIPMIMIFKHG 1
15791337 ( 37) EKIDVDEAEDVANEYQVRSIPTIVVENDD 1
15218394 ( 58) VKIDTEKYPSLANKYQIEALPTFILFKDG 1
4200327 ( 114) AKVDIDDHTDLAIEYEVSAVPTVLAMKNG 1
16123427 ( 88) VKVNTEAEPALSTRFRIRSIPTIMLYRNG 1
140543 ( 78) VKCDVDESPDIAKECEVTAMPTFVLGKDG 1
13358154 ( 56) IKINVDHFPELSTQYQVKSIPALFYLKNG 1
17229859 ( 57) VKVDVDNNKPLFKRFGLRSIPAVLIFKDG 1
16804867 ( 54) VKINVDKQRALAQKFDVKGIPNSLVLVDG 1
1651717 ( 69) VEVNADVNLHLANAYRLKNLPTLILFNRG 1
15966937 ( 96) VKMNIDDHPSIAGQLGIQSIPAVIAFIDG 1
23098307 ( 111) FPILLDEKGEVSKAYRISPIPTTYMIDSE 1
4155972 ( 52) FKVSFDESQDLKESLGIRKIPTLIFYKNA 1
2649838 ( 32) EVVDVDEKRELAEKYEVLMLPTLVLADGD 1
3261501 ( 53) GKVNIDTEKDLASIAGVKLLPTLMAFKKG 1
15614085 ( 121) LPVFLDEEGEFADAYQVQHLPMTYVLDRE 1
30021713 ( 139) FPVLLDKNIDVTTAYKVITIPTSYFIDTK 1
15614140 ( 121) FPIVIDKGLNVIDAYGIRPLPTTILINEH 1
267116 ( 54) VHVDIDKVNTHPIGKEIRSVPTFYFYVNG 1
19746502 ( 155) LPVLLDENGAMVYQYGVRSYPTEVFIGSD 1
18309723 ( 351) FPVVLDENGAMVYQYGINAFPSTFIINKE 1
16759994 ( 115) FPVVNDANGALSAGWEISVTPTLVVVSQG 1
1729944 ( 53) AKLNTTGARRLAISQKILGLPTLSLYKDG 1
15615431 ( 133) FDPLLDTDGEVMETYQVIGIPTTLILDEE 1
17537401 ( 90) IPFGSDKIMSLMQKYEVKTIPAMRIVNDQ 1
27375582 ( 137) GRVGVDANGRASIEWGVYGVPETFVVGRE 1
15602312 ( 125) LLVVKDQLKITAERYQLVGTPTSFVIDPE 1
18406743 ( 265) LPFNDKSGSKLARHFMLSTLPTLVILGPD 1
18406743 ( 105) PFTDSETRDRLDELFKVRGIPNLVMVDDH 1
18406743 ( 425) LPFGDPRKASLAKTFKVGGIPMLAALGPT 1
//
--------------------------------------------------------------------------------
-------------------------------------------------------------------------------- Motif 3 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 29 n= 13619 bayes= 8.61696 E= 2.2e-254 81 47 -567 -75 169 26 -299 -11 -167 54 -171 -408 -123 -369 -391 -320 -272 163 -358 -23 -330 -444 -133 -122 -126 -415 76 -448 294 -193 -330 -263 198 -180 -37 -256 -293 -92 -463 -373 -202 3 -457 -416 -26 -439 -264 132 -427 -23 -23 -385 -169 -354 -333 -315 -106 288 -396 -366 -465 -495 234 -411 -122 -3 -278 -527 -451 -12 -483 321 -537 -373 -455 -90 -392 6 -602 -477 -133 5 -117 -413 -152 -434 -259 89 -211 0 -191 -134 -400 -349 -330 -69 56 270 -386 -353 -402 -451 366 57 -547 -442 -260 -498 -212 -521 -435 -161 -182 -300 -412 -343 -171 -497 -494 -435 7 -395 41 208 -442 -147 79 -402 86 -385 -279 -48 -115 145 -47 1 37 -93 -418 -317 7 -394 5 65 -126 -66 216 -138 4 -108 -279 268 -364 -5 -47 46 -232 -93 -418 -32 -139 -394 -118 38 -442 101 -145 -138 4 -108 5 26 121 264 107 -65 -10 -388 -418 -317 35 -395 200 154 -442 -147 79 -138 -38 -385 -279 -48 -115 -5 71 108 -82 -388 -418 -317 -83 -278 -573 -495 -102 -469 61 135 -475 233 31 -417 -480 -371 -394 -330 34 106 -362 -306 292 22 -210 -400 -173 -236 -307 -171 -216 -218 40 -367 -189 -347 -345 112 -62 -96 -381 -382 7 -394 -44 88 -126 -66 -145 43 64 -385 95 74 -115 110 -47 108 37 -93 -418 -32 96 -395 -278 140 -443 -66 163 -138 138 -188 -279 -204 -364 110 159 1 -10 -388 -418 -317 -164 5 -363 -332 194 -384 34 -282 -185 -117 -200 -289 -360 -84 -268 -213 -286 -302 123 435 -283 -413 -43 109 -463 227 -161 -421 37 -404 5 75 -381 197 107 -70 -250 -407 -437 -335 -549 -455 -845 -812 -560 -801 -760 276 -819 -3 -383 -740 -786 -761 -794 -699 -506 262 -761 -649 -73 -394 -278 -38 -442 -147 79 -51 86 -108 95 26 -364 63 263 108 37 -175 -418 -32 96 -373 -293 -224 -422 136 78 -360 -105 -182 97 -51 -28 -154 -205 263 38 -167 -410 -315 -391 -340 -634 -565 -105 -538 -377 280 -544 136 257 
-484 -548 -440 -467 -401 -21 49 -416 -30 -848 -763 -850 -921 -898 -765 -701 -968 -906 -898 -912 -838 430 -810 -799 -797 -833 -962 -797 -827 -19 -324 -498 -190 -518 -525 -357 -385 -456 -498 135 0 -502 -363 -397 -27 385 -381 -490 -489 -317 -271 -570 -168 169 -461 -301 119 -472 175 31 -411 -479 -372 -394 35 130 50 -359 -300 -153 -270 -569 -492 74 -460 -299 172 -471 91 202 -409 -477 -371 -87 -101 -272 174 -358 160 6 -316 -620 -544 229 -512 -350 140 -523 124 113 -462 -526 -418 -443 -374 -322 126 -397 -26 -153 -270 -569 -167 257 -160 -299 172 -470 74 32 -409 -477 -53 -392 -320 -90 24 -358 160 -139 -396 115 -100 -444 24 -146 -139 246 -387 -280 139 -365 -5 24 -65 -234 -93 -419 -318 -140 -399 229 -39 -447 -65 79 -407 -39 -391 -285 246 26 -6 71 1 -83 -393 -423 -321 -181 -492 30 77 -623 325 142 -580 -170 -591 -477 -277 -498 -58 -391 -317 -133 -554 -543 -495 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 29 nsites= 45 E= 2.2e-254 0.155556 0.022222 0.000000 0.044444 0.177778 0.088889 0.000000 0.044444 0.022222 0.133333 0.000000 0.000000 0.022222 0.000000 0.000000 0.000000 0.000000 0.266667 0.000000 0.022222 0.000000 0.000000 0.022222 0.022222 0.022222 0.000000 0.022222 0.000000 0.600000 0.022222 0.000000 0.000000 0.222222 0.000000 0.022222 0.000000 0.000000 0.044444 0.000000 0.000000 0.000000 0.022222 0.000000 0.000000 0.066667 0.000000 0.000000 0.200000 0.000000 0.111111 0.022222 0.000000 0.022222 0.000000 0.000000 0.000000 0.022222 0.533333 0.000000 0.000000 0.000000 0.000000 0.333333 0.000000 0.022222 0.066667 0.000000 0.000000 0.000000 0.088889 0.000000 0.377778 0.000000 0.000000 0.000000 0.022222 0.000000 0.088889 0.000000 0.000000 0.022222 0.022222 0.044444 0.000000 0.022222 0.000000 0.000000 
0.133333 0.022222 0.133333 0.000000 0.022222 0.000000 0.000000 0.000000 0.044444 0.111111 0.422222 0.000000 0.000000 0.000000 0.000000 0.800000 0.133333 0.000000 0.000000 0.000000 0.000000 0.022222 0.000000 0.000000 0.000000 0.022222 0.000000 0.000000 0.000000 0.022222 0.000000 0.000000 0.000000 0.088889 0.000000 0.088889 0.333333 0.000000 0.022222 0.022222 0.000000 0.133333 0.000000 0.000000 0.022222 0.022222 0.088889 0.022222 0.044444 0.066667 0.044444 0.000000 0.000000 0.088889 0.000000 0.066667 0.111111 0.022222 0.044444 0.066667 0.022222 0.066667 0.044444 0.000000 0.288889 0.000000 0.022222 0.022222 0.066667 0.000000 0.044444 0.000000 0.022222 0.022222 0.000000 0.022222 0.088889 0.000000 0.155556 0.000000 0.022222 0.066667 0.044444 0.022222 0.044444 0.133333 0.222222 0.088889 0.022222 0.044444 0.000000 0.000000 0.000000 0.111111 0.000000 0.288889 0.222222 0.000000 0.022222 0.022222 0.022222 0.044444 0.000000 0.000000 0.022222 0.022222 0.022222 0.066667 0.111111 0.022222 0.000000 0.000000 0.000000 0.044444 0.000000 0.000000 0.000000 0.022222 0.000000 0.022222 0.155556 0.000000 0.488889 0.022222 0.000000 0.000000 0.000000 0.000000 0.000000 0.066667 0.177778 0.000000 0.000000 0.555556 0.022222 0.022222 0.000000 0.022222 0.000000 0.000000 0.022222 0.022222 0.022222 0.044444 0.000000 0.022222 0.000000 0.000000 0.155556 0.044444 0.044444 0.000000 0.000000 0.088889 0.000000 0.044444 0.133333 0.022222 0.044444 0.000000 0.088889 0.111111 0.000000 0.044444 0.066667 0.022222 0.066667 0.022222 0.111111 0.066667 0.044444 0.000000 0.022222 0.177778 0.000000 0.000000 0.200000 0.000000 0.044444 0.044444 0.022222 0.200000 0.022222 0.000000 0.000000 0.000000 0.066667 0.133333 0.044444 0.044444 0.000000 0.000000 0.000000 0.022222 0.022222 0.000000 0.000000 0.222222 0.000000 0.000000 0.000000 0.022222 0.044444 0.000000 0.000000 0.000000 0.022222 0.000000 0.000000 0.000000 0.000000 0.044444 0.600000 0.000000 0.000000 0.044444 0.155556 0.000000 0.377778 0.000000 0.000000 0.088889 
0.000000 0.022222 0.066667 0.000000 0.133333 0.088889 0.022222 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.400000 0.000000 0.088889 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.511111 0.000000 0.000000 0.044444 0.000000 0.000000 0.044444 0.000000 0.022222 0.022222 0.044444 0.133333 0.044444 0.044444 0.044444 0.000000 0.044444 0.288889 0.111111 0.066667 0.022222 0.000000 0.022222 0.177778 0.000000 0.000000 0.000000 0.000000 0.200000 0.022222 0.000000 0.022222 0.022222 0.044444 0.022222 0.044444 0.000000 0.000000 0.355556 0.066667 0.022222 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.022222 0.000000 0.000000 0.422222 0.000000 0.244444 0.133333 0.000000 0.000000 0.000000 0.000000 0.000000 0.044444 0.111111 0.000000 0.022222 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.088889 0.000000 0.000000 0.022222 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.066667 0.044444 0.000000 0.000000 0.000000 0.022222 0.755556 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.022222 0.177778 0.000000 0.000000 0.133333 0.000000 0.333333 0.022222 0.000000 0.000000 0.000000 0.000000 0.066667 0.133333 0.111111 0.000000 0.000000 0.022222 0.000000 0.000000 0.000000 0.088889 0.000000 0.000000 0.200000 0.000000 0.177778 0.088889 0.000000 0.000000 0.000000 0.022222 0.022222 0.000000 0.288889 0.000000 0.088889 0.088889 0.000000 0.000000 0.000000 0.266667 0.000000 0.000000 0.155556 0.000000 0.222222 0.044444 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.200000 0.000000 0.022222 0.022222 0.000000 0.000000 0.022222 0.333333 0.022222 0.000000 0.200000 0.000000 0.155556 0.022222 0.000000 0.000000 0.022222 0.000000 0.000000 0.022222 0.088889 0.000000 0.088889 0.022222 0.000000 0.155556 0.022222 0.000000 0.088889 0.000000 0.022222 0.444444 0.000000 0.000000 
0.111111 0.000000 0.022222 0.044444 0.022222 0.000000 0.044444 0.000000 0.000000 0.022222 0.000000 0.355556 0.044444 0.000000 0.044444 0.022222 0.000000 0.044444 0.000000 0.000000 0.244444 0.066667 0.022222 0.066667 0.044444 0.022222 0.000000 0.000000 0.000000 0.022222 0.000000 0.088889 0.133333 0.000000 0.644444 0.044444 0.000000 0.022222 0.000000 0.000000 0.000000 0.000000 0.022222 0.000000 0.000000 0.022222 0.000000 0.000000 0.000000 -------------------------------------------------------------------------------- Time 188.79 secs. ******************************************************************************** ******************************************************************************** MOTIF 4 width = 21 sites = 37 llr = 1031 E-value = 7.8e-153 ******************************************************************************** -------------------------------------------------------------------------------- Motif 4 Description -------------------------------------------------------------------------------- Simplified A 11:31::1::1::1:2::5:3 pos.-specific C :::1:::1::1:::::::::1 probability D :1111::::::81:::::::: matrix E 111:3::::::::12:::1:: F :::::1::1:::::5::::4: G :13::::5::1:::::::::1 H :::::::::::::1::5:::: I :::1:24:5:1::1::::::1 K 511:1::::::::::::51:1 L :3:::14:1:::1:::11::1 M ::::::::::::::::1:::: N 1:3:::::::12::::111:: P ::::::::::1:141:::::: Q ::::1:::::::::::211:: R :2::::::::1::::::11:1 S :::::::1:9::5::2:1::: T :::2::::::1:1::1::::: V ::1226113:4::3:3::::: W :::::::::::::::::::6: Y ::::::1:::::::1:::::: bits 6.2 5.6 5.0 4.3 * Information 3.7 * * content 3.1 * * * * (40.2 bits) 2.5 * * * * * 1.9 ***** **** * ** 1.2 * ******************* 0.6 ********************* 0.0 --------------------- Multilevel KLNAEVLGISVDSPFVHKAWA consensus G V I V V S F sequence A -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 sites sorted by position 
p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- --------------------- 4433065 45 1.59e-19 GYSEVAGQLK EINCEVIGVSVDSVYCHQAWC EADKSKGGVG 9955016 66 5.72e-19 AFSNRAEDFR KLGCEVLGVSVDSQFTHLAWI NTPRKEGGLG 6850955 63 7.15e-18 AVAEKYPEFQ KLGVEVLSVSVDSVFVHKMWN DNELSKMVEG 19173077 61 6.34e-17 RFSDLKGAFL RRNAVVLLISCDSVYTHKAWA SIPREQNGVL 18313548 63 3.76e-16 TFRDKMALLN KANAEVLAISVDSPFALKAFK DANRLNFPLL 20151112 62 6.13e-16 DVADHYEELQ KLGVDVYSVSTDTHFTHKAWH SSSETIAKIK 16501671 66 9.92e-16 AFDKRYEEFQ KRGVEVVGVSFDSEFVHNAWR NTPVDKGGIG 15899339 61 1.59e-15 TFRDSMAKFN EVNAVVIGISVDPPFSNKAFK EQNKINFTIV 21674812 60 3.58e-15 DYRNNVAAFT SRGITVIGISGDSPESHKQFA EKHKLPFLLL 13541053 63 3.58e-15 GFAEDYEKFK KKNTEVISVSEDTVYVHKAWV QYDERVAKAK 15807234 61 7.89e-15 EYSGSQDDFT EAGAVVLGINRDSVYAHRAWA AEYGIEVPLL 11467494 67 1.23e-14 LLSDRISEFR KLSTQILAISVDSPFSHLQYL LCNREEGGLE 13186328 61 1.37e-14 NISNKIKEFK NLNTKIYAISNDSHFVQKNWI ENELKFINFP 4704732 69 3.60e-14 FIGKAEELKS KGIDEIICFSVNDPFVMKAWG KTYPENKHVK 13541117 61 7.49e-14 TFRDSMANFN KFKAKVIGISVDSPFSLAEFA KKNNLTFDLL 4996210 63 9.20e-14 EFGKMHEEFL KLNCKLIGFSCNSKESHDQWI EDIKFYGNLD 21112072 65 1.25e-13 GYVEQFEAFR KRGIEVLCTAVNDPFVMQAWG RSQLVPDGLH 21223405 67 1.53e-13 AFGKLNDEFA DRDAQILGFSGDSEFVHHAWR KDHPDLTDLP 6323138 81 1.69e-13 INYLDELVKE KEVDQVIVVTVDNPFANQAWA KSLGVKDTTH 15801846 75 1.69e-13 VRKFNQLATE IDNTVVLCISADLPFAQSRFC GAEGLNNVIT 14286173 60 3.08e-13 AFQEVYPELR ELDCELVGLSVDQVFSHIKWI EWIAENLDTE 3318841 63 4.55e-13 RAAKLAPEFA KRNVKLIALSIDSVEDHLAWS KDINAYNSEE 21227878 63 5.52e-13 SLEENYELFT ELNTVPLGISVDPIPSKKAWA RELGINHIKL 14600438 60 5.52e-13 RFNELLDEFE KLGAVVIGVSTDSVEKNRKFA EKHGFRFKLV 21283385 73 1.08e-12 QTRKFNSDAS KEEGIVLTISADLPFAQKRWC ASAGLDNVIT 17229033 68 1.43e-12 YNELAKVFKD NGVDEIVCISVNDAFVMNEWA KTQEAENITL 15613511 59 2.74e-12 DFRDRVEDFK GLNTVILGVSPDPVERHKKFI EKYSLPFLLL 16803644 68 6.75e-12 AISARSDEFD ALNARIIGASTDTIHSHLAWT NTPIKEGGIG 15790738 96 
6.75e-12 AIQNARWFDC TPGLAVWGISPDSTYAHEAFA DEYALTFPLL 15605963 75 8.06e-12 TKKFNEIMAG MEGVDVTVVSMDLPFAQKRFC ESFNIQNVTV 15643152 51 1.62e-11 REAVEFSREN FEKAQVVGISRDSVEALKRFK EKNDLKVTLL 6322180 122 3.49e-11 CGFRDNYQEL KKYAAVFGLSADSVTSQKKFQ SKQNLPYHLL 1091044 74 1.02e-10 FVEKFTALKS AGVDAVIVLSANDPFVQSAFG KALGVTDEAF 15609375 61 1.30e-10 QLRDHLPEFE NDDSAALAISVGPPPTHKIWA TQSGFTFPLL 15826629 65 2.10e-10 FVEQAEALKA KGVQVVACLSVNDAFVTGEWG RAHKAEGKVR 15672286 72 1.32e-09 TKHFNLEAAK HSEIDFLSISNNTADEQKNWC ATEGVDMTIL 17547503 92 2.18e-08 PELSALAQEQ KARVKFIGIGIDSAANIQAFL GKVPVTYPIA -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 4433065 1.6e-19 44_[4]_68 9955016 5.7e-19 65_[4]_111 6850955 7.1e-18 62_[4]_120 19173077 6.3e-17 60_[4]_96 18313548 3.8e-16 62_[4]_79 20151112 6.1e-16 61_[4]_104 16501671 9.9e-16 65_[4]_114 15899339 1.6e-15 60_[4]_75 21674812 3.6e-15 59_[4]_68 13541053 3.6e-15 62_[4]_91 15807234 7.9e-15 60_[4]_70 11467494 1.2e-14 66_[4]_117 13186328 1.4e-14 60_[4]_95 4704732 3.6e-14 68_[4]_73 13541117 7.5e-14 60_[4]_76 4996210 9.2e-14 62_[4]_137 21112072 1.2e-13 64_[4]_75 21223405 1.5e-13 66_[4]_97 6323138 1.7e-13 80_[4]_75 15801846 1.7e-13 74_[4]_73 14286173 3.1e-13 59_[4]_129 3318841 4.5e-13 62_[4]_141 21227878 5.5e-13 62_[4]_71 14600438 5.5e-13 59_[4]_83 21283385 1.1e-12 72_[4]_71 17229033 1.4e-12 67_[4]_163 15613511 2.7e-12 58_[4]_75 16803644 6.7e-12 67_[4]_93 15790738 6.7e-12 95_[4]_96 15605963 8.1e-12 74_[4]_74 15643152 1.6e-11 50_[4]_250 6322180 3.5e-11 121_[4]_73 1091044 1e-10 73_[4]_72 15609375 1.3e-10 60_[4]_72 15826629 2.1e-10 64_[4]_76 15672286 1.3e-09 71_[4]_68 17547503 2.2e-08 91_[4]_62 
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 4 width=21 seqs=37 4433065 ( 45) EINCEVIGVSVDSVYCHQAWC 1 9955016 ( 66) KLGCEVLGVSVDSQFTHLAWI 1 6850955 ( 63) KLGVEVLSVSVDSVFVHKMWN 1 19173077 ( 61) RRNAVVLLISCDSVYTHKAWA 1 18313548 ( 63) KANAEVLAISVDSPFALKAFK 1 20151112 ( 62) KLGVDVYSVSTDTHFTHKAWH 1 16501671 ( 66) KRGVEVVGVSFDSEFVHNAWR 1 15899339 ( 61) EVNAVVIGISVDPPFSNKAFK 1 21674812 ( 60) SRGITVIGISGDSPESHKQFA 1 13541053 ( 63) KKNTEVISVSEDTVYVHKAWV 1 15807234 ( 61) EAGAVVLGINRDSVYAHRAWA 1 11467494 ( 67) KLSTQILAISVDSPFSHLQYL 1 13186328 ( 61) NLNTKIYAISNDSHFVQKNWI 1 4704732 ( 69) KGIDEIICFSVNDPFVMKAWG 1 13541117 ( 61) KFKAKVIGISVDSPFSLAEFA 1 4996210 ( 63) KLNCKLIGFSCNSKESHDQWI 1 21112072 ( 65) KRGIEVLCTAVNDPFVMQAWG 1 21223405 ( 67) DRDAQILGFSGDSEFVHHAWR 1 6323138 ( 81) KEVDQVIVVTVDNPFANQAWA 1 15801846 ( 75) IDNTVVLCISADLPFAQSRFC 1 14286173 ( 60) ELDCELVGLSVDQVFSHIKWI 1 3318841 ( 63) KRNVKLIALSIDSVEDHLAWS 1 21227878 ( 63) ELNTVPLGISVDPIPSKKAWA 1 14600438 ( 60) KLGAVVIGVSTDSVEKNRKFA 1 21283385 ( 73) KEEGIVLTISADLPFAQKRWC 1 17229033 ( 68) NGVDEIVCISVNDAFVMNEWA 1 15613511 ( 59) GLNTVILGVSPDPVERHKKFI 1 16803644 ( 68) ALNARIIGASTDTIHSHLAWT 1 15790738 ( 96) TPGLAVWGISPDSTYAHEAFA 1 15605963 ( 75) MEGVDVTVVSMDLPFAQKRFC 1 15643152 ( 51) FEKAQVVGISRDSVEALKRFK 1 6322180 ( 122) KKYAAVFGLSADSVTSQKKFQ 1 1091044 ( 74) AGVDAVIVLSANDPFVQSAFG 1 15609375 ( 61) NDDSAALAISVGPPPTHKIWA 1 15826629 ( 65) KGVQVVACLSVNDAFVTGEWG 1 15672286 ( 72) HSEIDFLSISNNTADEQKNWC 1 17547503 ( 92) KARVKFIGIGIDSAANIQAFL 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 position-specific scoring matrix 
-------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 21 n= 14387 bayes= 8.76477 E= 7.8e-153 -49 -371 -94 88 -103 -124 103 -115 254 -362 29 97 -340 -116 -23 -41 -59 -364 -394 -293 -4 -365 -21 61 -102 48 -124 -113 -15 144 -251 -183 -91 -119 182 -42 -210 -149 -392 -293 -250 -377 29 -16 -426 175 -128 -115 -16 -370 -264 282 -348 -125 -25 -43 -216 22 -403 -8 155 259 58 -433 -298 -135 -264 57 -416 -113 -151 -366 -445 -26 -350 -74 154 97 -336 -276 30 -389 30 181 -437 -346 -138 -115 87 -377 -271 -196 -354 169 -27 -183 -63 118 -412 -311 -114 -176 -411 -377 -106 -416 -236 95 -397 -100 -199 -361 -186 -334 -298 -298 -174 309 -403 -396 -129 -249 -548 -471 -76 -440 -279 247 -450 191 -148 -389 -456 -350 -372 -300 -66 48 87 90 75 303 -698 -683 -624 278 -524 -553 -701 -176 -485 -535 -537 -544 -584 112 -79 -3 -620 -599 -179 -330 -545 -521 23 -527 -391 311 -495 37 -154 -439 -533 -429 -450 -386 -110 152 -433 -359 -150 -225 -414 -463 -444 -177 -295 -433 -392 -448 -329 -72 -390 -348 -321 398 57 -463 -427 -371 -25 72 -414 -213 -167 -141 -233 13 -393 -241 -34 -66 -97 -326 -61 -291 -10 286 -388 -370 -474 -516 357 -342 -640 -125 -256 -658 -444 -658 -589 218 -527 -363 -449 -286 -398 -652 -613 -483 -338 -346 86 -347 -516 -361 -234 -505 -354 -48 -400 -22 79 -33 -332 335 109 -521 -499 -403 25 -333 -330 -36 -388 -377 179 -20 -108 -323 -239 -255 275 -8 -236 -217 -72 161 -398 -319 -141 -363 -105 106 319 -386 105 -386 -294 -376 -296 -229 -9 -219 -275 -239 -83 -392 -300 249 119 55 -95 -77 -417 -333 -124 -373 -76 -360 -255 -24 -343 -119 -24 208 97 148 -395 -294 -381 -314 -338 -352 -322 -420 536 -159 -172 -75 127 67 -407 201 -186 -248 -119 -396 -300 -102 -115 -371 -95 -76 -419 -124 103 -115 247 5 -256 49 -341 168 47 24 -209 -364 -395 -293 241 -406 -301 25 -463 -371 -153 -115 66 -395 30 47 -382 134 137 -210 -249 -401 -426 -331 -710 -548 -757 -758 277 -718 -267 -587 -713 -527 -503 -561 -711 -549 -587 -593 -632 -625 546 60 150 282 -275 -206 -390 
48 101 101 24 -78 -232 -29 -353 12 44 -46 -60 -140 -383 -290 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 4 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 21 nsites= 37 E= 7.8e-153 0.054054 0.000000 0.027027 0.135135 0.027027 0.027027 0.027027 0.027027 0.486486 0.000000 0.027027 0.081081 0.000000 0.000000 0.027027 0.027027 0.027027 0.000000 0.000000 0.000000 0.081081 0.000000 0.054054 0.108108 0.027027 0.108108 0.000000 0.027027 0.054054 0.297297 0.000000 0.000000 0.027027 0.000000 0.162162 0.027027 0.000000 0.027027 0.000000 0.000000 0.000000 0.000000 0.081081 0.054054 0.000000 0.270270 0.000000 0.027027 0.054054 0.000000 0.000000 0.324324 0.000000 0.000000 0.027027 0.027027 0.000000 0.108108 0.000000 0.027027 0.270270 0.108108 0.108108 0.000000 0.000000 0.027027 0.000000 0.081081 0.000000 0.027027 0.000000 0.000000 0.000000 0.027027 0.000000 0.027027 0.162162 0.162162 0.000000 0.000000 0.108108 0.000000 0.081081 0.270270 0.000000 0.000000 0.000000 0.027027 0.135135 0.000000 0.000000 0.000000 0.000000 0.108108 0.027027 0.000000 0.027027 0.216216 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.054054 0.000000 0.000000 0.189189 0.000000 0.081081 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.000000 0.621622 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.027027 0.000000 0.000000 0.351351 0.000000 0.378378 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.027027 0.108108 0.027027 0.054054 0.135135 0.135135 0.000000 0.000000 0.000000 0.486486 0.000000 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.000000 0.000000 0.108108 0.027027 0.081081 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.081081 0.000000 0.000000 0.459459 0.000000 0.135135 0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.027027 0.270270 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.891892 0.027027 0.000000 0.000000 0.000000 0.108108 0.054054 0.000000 0.027027 0.027027 0.054054 0.000000 0.054054 0.000000 0.000000 0.027027 0.054054 0.054054 0.000000 0.054054 0.000000 0.081081 0.405405 0.000000 0.000000 0.000000 0.000000 0.783784 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.000000 0.000000 0.189189 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.135135 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.081081 0.000000 0.027027 0.108108 0.027027 0.000000 0.513514 0.108108 0.000000 0.000000 0.000000 0.108108 0.000000 0.000000 0.054054 0.000000 0.000000 0.054054 0.054054 0.027027 0.000000 0.000000 0.000000 0.351351 0.027027 0.000000 0.000000 0.027027 0.297297 0.000000 0.000000 0.027027 0.000000 0.027027 0.162162 0.540541 0.000000 0.027027 0.000000 0.000000 0.000000 0.000000 0.000000 0.054054 0.000000 0.000000 0.000000 0.027027 0.000000 0.000000 0.135135 0.216216 0.027027 0.027027 0.027027 0.000000 0.000000 0.000000 0.000000 0.027027 0.000000 0.000000 0.027027 0.000000 0.000000 0.027027 0.243243 0.108108 0.270270 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.486486 0.027027 0.027027 0.081081 0.081081 0.081081 0.000000 0.189189 0.000000 0.000000 0.027027 0.000000 0.000000 0.000000 0.027027 0.000000 0.027027 0.027027 0.000000 0.027027 0.027027 0.027027 0.459459 0.108108 0.000000 0.054054 0.000000 0.108108 0.054054 0.054054 0.000000 0.000000 0.000000 0.000000 0.513514 0.000000 0.000000 0.081081 0.000000 0.000000 0.000000 0.027027 0.108108 0.000000 0.027027 0.054054 0.000000 0.081081 0.108108 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.351351 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.621622 0.027027 0.270270 0.135135 0.000000 0.000000 0.000000 0.108108 0.027027 0.135135 0.081081 0.054054 0.000000 0.027027 0.000000 0.027027 0.054054 0.027027 0.027027 0.027027 0.000000 0.000000 -------------------------------------------------------------------------------- Time 241.79 secs. ******************************************************************************** ******************************************************************************** MOTIF 5 width = 50 sites = 13 llr = 1029 E-value = 2.9e-120 ******************************************************************************** -------------------------------------------------------------------------------- Motif 5 Description -------------------------------------------------------------------------------- Simplified A 1:1::51:2::::11:::::2::::221::::1::1:::::71:::22:: pos.-specific C :::::11:1:::::::::::::::::::1:::::::::::::::::::2: probability D :32:::1::::::6:4:::::::::32::::1:51:::::2::::12::: matrix E 344::1:::::::::2:2:::12:::11:::1::9:::::2:::::2511 F ::::1:1:::811:::::::1:::::::::::1:::::::::2:2::::: G 12:8::::5::::11:6:::1:::::11:a:::::::::::::::22::3 H ::::1:::::::::::::::2:::111::::::::::::::::::1::3: I ::::1:::::156::::55::412:1::3:::3::41:141:2::2:::: K 21::::::::::::2222:3:::::::::::::1:::11:2:::::1:11 L ::1:2:2::3:21:::2:2:1:121:312::::::28:24::5:2::2:: M ::1:::1:::::::::::::::1::::::::::::1::2::1:::::::: N ::1:2:1::::::2:21:::::1:3112:::5:1::::1::::::12:2: P 2:12:1:1::::::5::::::::::2:2::::11:::::::::::::::5 Q ::::::2:::::::11:::411::::1::::::1::::1:2::8:::::: R :1:::::9:1:::::::::3::::::::::a::::::9::1::1::1::1 S ::::::::21:::1:::1:::111:::2:::312:::::::2::1::::: T 2:11:2:::2:::::::::::22:2::22:::1::2:::::::122:21: V ::::3:1:13122::::13::1351:::3:::3::21:42::::12:::: W :::::::::::::::::::::::::::::::::::::::::::::::::: Y ::::::1:::::::::::::32:121::::::::::::::::::2::::: bits 6.2 5.6 5.0 4.3 * * * * Information 3.7 * * ** * * * content 3.1 * * * * *** * * * * (114.2 bits) 
2.5 * * ***** * ** * *** * ** * * * ** 1.9 * *** *************** *** **** ************ *** 1.2 ************************************************** 0.6 ************************************************** 0.0 -------------------------------------------------- Multilevel EEEGVALRGLFIIDPDGIIQYIVVNDLPIGRNIDEILRVIDALQFTDEHP consensus KD LT AV VV KN VKA TP V SV LKSF LV LCG sequence N T R Y VQ Y N -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- -------------------------------------------------- 16803644 121 5.68e-41 QVASDYGVLI EEEGVALRGLFIINPKGEIQYEVVHHNNIGREVDEVLRVLQALQTGGLCP INWQPGEKTI 16501671 119 2.15e-40 EIQKAYGIEH PDEGVALRGSFLIDANGIVRHQVVNDLPLGRNIDEMLRMVDALQFHEEHG DVCPAQWEKG 13541053 113 3.96e-39 IIARAYDVYN EETGNAQRGLFIINPDGIVKYVVITDDNVGRSTDETLRVLEALQSGGLCP VNWHEGEPTL 9955016 119 7.82e-38 RLSEDYGVLK TDEGIAYRGLFIIDGKGVLRQITVNDLPVGRSVDEALRLVQAFQYTDEHG EVCPAGWKPG 11467494 119 2.81e-37 QTITRDYQVL TDEGLAFPGLFIIDKEGIIQYYTVNNLLCGRNINELLRILESIQYVKENP GYACPVNWNF 6850955 116 5.66e-34 NVGTLYGVYD PEAGVENRGRFLIDPDGIIQGYEVLILPVGRNVSETLRQIQAFQLVRETK GAEVAPSGWK 21223405 116 9.87e-34 HELMRDLGIE GEDGFAQRAVFIVDQNNEIQFTMVTAGSVGRNPKEVLRVLDALQTDELCP CNWSKGDETL 13186328 109 1.48e-32 KISNNFNILN KKDGNCLRSTIIIDKNLIIKYINIVDDSIGRSIDEILKNIKMLQFINTNE NKLCPYSWNN 20151112 112 3.57e-32 ALTRNFDNMR EDEGLADRATFVVDPQGIIQAIEVTAEGIGRDASDLLRKIKAAQYVAAHP GEVCPAKWKE 4996210 122 2.00e-31 KIMDEKEKDI KGLPLTCRCVFFISPDKKVKATVLYPATTGRNSQEILRVLKSLQLTNTHP VATPVNWKEG 14286173 112 6.53e-29 VADTLGLIHP ARPTNTVRAVFVVDPEGIIRAILYYPQELGRNIPEIVRMIRAFRVIDAEG VAAPANWPDN 19173077 114 3.86e-28 ELCNQFGLYD EENGHPMRSTVILAKDLSVRHISSNYHAIGRSVDEIIRLIDAITFNDENG DICPAEWRSE 3318841 125 8.83e-27 GMLDPAEKDE 
KGMPVTARVVFVFGPDKKLKLSILYPATTGRNFDEILRVVISLQLTAEKR VATPVDWKDG -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 16803644 5.7e-41 120_[5]_11 16501671 2.2e-40 118_[5]_32 13541053 4e-39 112_[5]_12 9955016 7.8e-38 118_[5]_29 11467494 2.8e-37 118_[5]_36 6850955 5.7e-34 115_[5]_38 21223405 9.9e-34 115_[5]_19 13186328 1.5e-32 108_[5]_18 20151112 3.6e-32 111_[5]_25 4996210 2e-31 121_[5]_49 14286173 6.5e-29 111_[5]_48 19173077 3.9e-28 113_[5]_14 3318841 8.8e-27 124_[5]_50 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 5 width=50 seqs=13 16803644 ( 121) EEEGVALRGLFIINPKGEIQYEVVHHNNIGREVDEVLRVLQALQTGGLCP 1 16501671 ( 119) PDEGVALRGSFLIDANGIVRHQVVNDLPLGRNIDEMLRMVDALQFHEEHG 1 13541053 ( 113) EETGNAQRGLFIINPDGIVKYVVITDDNVGRSTDETLRVLEALQSGGLCP 1 9955016 ( 119) TDEGIAYRGLFIIDGKGVLRQITVNDLPVGRSVDEALRLVQAFQYTDEHG 1 11467494 ( 119) TDEGLAFPGLFIIDKEGIIQYYTVNNLLCGRNINELLRILESIQYVKENP 1 6850955 ( 116) PEAGVENRGRFLIDPDGIIQGYEVLILPVGRNVSETLRQIQAFQLVRETK 1 21223405 ( 116) GEDGFAQRAVFIVDQNNEIQFTMVTAGSVGRNPKEVLRVLDALQTDELCP 1 13186328 ( 109) KKDGNCLRSTIIIDKNLIIKYINIVDDSIGRSIDEILKNIKMLQFINTNE 1 20151112 ( 112) EDEGLADRATFVVDPQGIIQAIEVTAEGIGRDASDLLRKIKAAQYVAAHP 1 4996210 ( 122) KGLPLTCRCVFFISPDKKVKATVLYPATTGRNSQEILRVLKSLQLTNTHP 1 14286173 ( 112) ARPTNTVRAVFVVDPEGIIRAILYYPQELGRNIPEIVRMIRAFRVIDAEG 1 19173077 ( 114) EENGHPMRSTVILAKDLSVRHISSNYHAIGRSVDEIIRLIDAITFNDENG 1 3318841 ( 125) KGMPVTARVVFVFGPDKKLKLSILYPATTGRNFDEILRVVISLQLTAEKR 1 // 
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 50 n= 11603 bayes= 10.3312 E= 2.9e-120 -9 -287 -140 182 -338 -12 -37 -293 139 -277 -172 -90 112 -24 -81 -81 124 -277 -313 -211 -256 -472 206 241 -524 86 -168 -434 9 -419 -320 -173 -343 -88 71 -205 -255 -400 -492 -376 -8 -271 92 203 -319 -232 -24 -273 -81 -55 140 84 20 -14 -71 -68 49 -260 -296 -195 -218 -300 -318 -381 -497 357 -265 -471 -378 -507 -375 -233 42 -347 -308 -200 -45 -433 -398 -398 -177 -133 -426 -349 45 -322 210 68 -329 117 -31 209 -339 -230 -252 -183 -135 169 -218 -157 276 174 -389 -40 -384 -214 -253 -322 -353 -336 -236 -294 0 -273 -299 -77 159 -227 -374 -353 -10 185 -12 -251 41 -295 -106 -77 -239 109 168 63 -309 183 -193 -147 -124 18 -217 133 -399 -295 -491 -486 -523 -433 -154 -445 -192 -438 -384 -333 -66 -231 445 -349 -387 -530 -333 -408 149 215 -539 -538 -523 265 -392 -479 -558 -497 -388 -389 -400 -408 -445 159 -239 -16 -501 -491 -174 -137 -399 -325 -188 -314 -153 -68 -293 147 -41 -252 -334 -211 60 46 188 166 -227 -167 -358 -218 -473 -460 405 -467 -308 -104 -478 -160 -171 -408 -412 -414 -425 -292 -381 -155 -190 -42 -325 -278 -442 -433 -85 -457 -327 342 -403 15 -51 -350 -458 -348 -370 -312 -256 104 -357 -281 -322 -277 -432 -425 -109 -450 -321 349 -394 -22 -47 -341 -452 -342 -362 -304 -250 99 -355 -276 -148 -353 358 -137 -459 -121 -159 -431 -373 -456 -373 87 -437 -276 -334 -64 -321 -436 -410 -342 -49 -305 -319 -282 -407 -60 -187 -357 82 -336 -290 -269 363 53 -219 -169 -227 -343 -420 -367 -265 -430 240 134 -503 -290 -141 -443 89 -429 -332 230 -346 120 -232 -175 -243 -415 -476 -352 -248 -332 -261 -312 -477 338 -194 -438 16 -39 -343 22 -397 -245 -203 -197 -306 -425 -386 -358 -203 -221 -256 68 -266 -315 -124 295 65 -148 -91 -194 -327 -123 -146 27 
-164 15 -291 -216 -340 -282 -480 -470 -309 -491 -370 334 -442 31 -86 -388 -490 -389 -411 -350 -275 146 -400 -319 -357 -406 -452 -294 -552 -406 -123 -419 210 -382 -295 -248 -433 335 282 -281 -289 -420 -400 -349 22 -152 -282 -248 134 -104 252 -198 -257 -88 -119 -206 -282 6 -186 -134 -201 -218 -61 397 -162 -154 -289 -8 -200 -287 -104 235 -208 -130 -51 -186 -301 100 -171 48 129 15 -231 203 -165 -136 -351 69 -186 -303 -131 61 -265 0 169 60 -317 -180 -210 43 130 170 -223 -161 -168 -146 -422 -364 -224 -356 -191 121 -357 37 -76 -303 -351 -269 -273 7 -149 267 -272 83 -177 -166 -294 -234 -162 -296 213 -107 -220 -11 -69 249 -312 -146 -182 -144 182 8 -213 272 60 -264 181 -99 -313 -230 214 0 -97 -259 -158 89 171 -33 -85 -73 -117 -258 -299 105 61 -270 92 35 -319 -12 214 -278 -80 117 -156 88 -239 127 -69 -64 -108 -264 -295 -193 -8 -262 -150 29 -313 -15 -21 -269 -81 -55 -152 159 170 -18 -69 134 123 -257 -292 -192 -267 191 -514 -455 -275 -434 -288 243 -438 63 -111 -376 -444 -350 -371 -299 126 182 -343 -276 -317 -393 -382 -446 -554 379 -324 -540 -440 -569 -446 -303 -473 -415 -368 -294 -423 -519 -446 -453 -509 -400 -575 -587 -605 -506 -267 -558 -312 -541 -497 -442 -491 -345 455 -455 -491 -631 -435 -496 -343 -387 42 4 -510 -238 -101 -515 -293 -528 -450 370 -384 -211 -304 232 -249 -526 -481 -350 -26 -136 -410 -346 16 -330 -168 215 -331 -112 -44 -275 -5 -239 -254 17 31 197 -238 -181 -266 -348 307 -145 -434 -254 -99 -411 -19 -405 -309 96 3 87 -210 115 -218 -402 -405 -297 -503 -534 -197 377 -630 -518 -368 -532 -562 -614 -522 -396 -620 -324 -510 -481 -507 -579 -566 -549 -69 -190 -435 -390 -227 -378 -227 298 -365 42 129 -307 -390 -283 -304 -237 74 90 -279 -213 -364 -284 -538 -454 -210 -506 -297 17 -445 310 -18 -427 -427 -304 -340 -371 -312 -43 -327 -303 -369 -272 -462 -426 -500 -409 -117 -410 -60 -401 -341 -292 -373 -178 442 -314 -346 -488 -306 -374 -168 -132 -389 -315 -183 -313 -147 66 -24 66 251 57 -327 89 -229 -172 -128 200 -222 -162 -382 -296 -637 -601 -323 -594 -481 271 -592 188 -148 
-526 -578 -502 -542 -477 -335 152 -471 -409 -180 -319 150 116 -372 -270 -60 1 145 -301 -197 -121 -276 251 90 -112 -152 -302 -338 -240 305 -93 -418 -377 -369 -194 -278 -311 -390 -323 98 -325 -413 -312 -317 152 -194 -196 -357 -356 -42 -299 -605 -525 184 -527 -336 125 -511 256 -43 -470 -487 -365 -411 -401 -326 -204 -330 -303 -279 -283 -379 -139 -439 -419 64 -361 -274 -278 -140 -214 -324 466 -62 -241 -131 -382 -297 -357 -181 -140 -366 -313 182 -318 -11 -95 -306 89 -58 -248 -319 -220 -230 26 106 -2 -142 320 -161 -189 4 -164 -240 73 212 110 -159 -174 -89 78 -282 -89 -136 -105 181 118 -258 -181 62 -290 149 101 -343 76 -35 -304 28 -287 -182 168 -257 -34 83 -78 -127 -289 -318 -214 70 -301 -153 255 -385 -318 -170 -278 -216 100 -217 -218 -335 -107 -223 -180 132 -277 -394 -311 -210 339 -145 20 -382 -237 412 -348 24 -333 -232 229 -297 -76 -115 -102 49 -337 -355 -242 -220 -329 -266 -15 -426 152 -139 -367 3 -345 -277 -196 327 -132 67 -172 -223 -356 -403 -332 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 5 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 50 nsites= 13 E= 2.9e-120 0.076923 0.000000 0.000000 0.307692 0.000000 0.076923 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.307692 0.384615 0.000000 0.153846 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.153846 0.384615 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.076923 0.076923 0.076923 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.769231 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.000000 
0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.076923 0.076923 0.000000 0.230769 0.000000 0.230769 0.000000 0.000000 0.000000 0.000000 0.000000 0.307692 0.000000 0.000000 0.538462 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.076923 0.076923 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.230769 0.076923 0.076923 0.000000 0.153846 0.000000 0.000000 0.000000 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.923077 0.000000 0.000000 0.000000 0.000000 0.000000 0.230769 0.076923 0.000000 0.000000 0.000000 0.461538 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.307692 0.000000 0.000000 0.000000 0.000000 0.076923 0.076923 0.230769 0.307692 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.846154 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.538462 0.000000 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.615385 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.076923 0.000000 0.615385 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.538462 0.076923 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.384615 0.153846 0.000000 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.230769 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.615385 0.000000 0.000000 0.153846 0.153846 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.000000 0.538462 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.538462 0.000000 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.307692 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.307692 0.000000 0.000000 0.000000 0.000000 0.384615 0.307692 0.000000 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.076923 0.076923 0.153846 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.307692 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.384615 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.076923 0.153846 0.076923 0.000000 0.153846 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.000000 0.076923 0.000000 0.076923 0.076923 0.076923 0.000000 0.000000 0.000000 0.076923 0.153846 0.307692 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.000000 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.538462 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.076923 0.000000 0.307692 0.000000 0.000000 0.000000 0.000000 0.230769 0.076923 0.000000 0.230769 0.153846 0.000000 0.307692 0.000000 0.000000 0.000000 0.076923 0.076923 0.000000 0.000000 0.000000 0.076923 0.230769 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 
0.153846 0.000000 0.153846 0.076923 0.000000 0.076923 0.076923 0.000000 0.000000 0.307692 0.000000 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000 0.076923 0.000000 0.153846 0.230769 0.000000 0.000000 0.153846 0.153846 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.307692 0.000000 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.307692 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.538462 0.000000 0.000000 0.000000 0.307692 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.307692 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.076923 0.076923 0.307692 0.000000 0.000000 0.000000 0.000000 0.538462 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.076923 0.076923 0.076923 0.000000 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.923077 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.384615 0.000000 0.153846 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.153846 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.846154 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.923077 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.076923 0.153846 0.153846 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000 0.384615 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.384615 0.000000 0.384615 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.000000 0.230769 0.153846 0.000000 0.000000 0.000000 0.076923 0.230769 0.000000 0.000000 0.000000 0.000000 0.230769 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.692308 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.153846 0.000000 0.538462 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.846154 0.076923 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.153846 0.076923 0.000000 0.230769 0.000000 0.000000 0.076923 0.000000 0.000000 0.153846 0.076923 0.153846 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.230769 0.230769 0.000000 0.000000 0.153846 0.000000 0.230769 0.153846 0.000000 0.153846 0.000000 0.000000 0.076923 0.000000 0.000000 0.153846 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.461538 0.000000 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.153846 0.000000 0.000000 0.000000 0.000000 0.230769 0.000000 0.076923 
0.000000 0.000000 0.307692 0.000000 0.076923 0.000000 0.000000 0.230769 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.076923 0.000000 0.307692 0.000000 0.000000 0.076923 0.000000 0.000000 0.000000 0.461538 0.000000 0.076923 0.000000 0.000000 0.000000 0.000000 0.000000 -------------------------------------------------------------------------------- Time 289.77 secs. ******************************************************************************** ******************************************************************************** MOTIF 6 width = 21 sites = 29 llr = 814 E-value = 2.0e-099 ******************************************************************************** -------------------------------------------------------------------------------- Motif 6 Description -------------------------------------------------------------------------------- Simplified A :::8:1:11:211::::1:2: pos.-specific C ::::::::::::::::::::: probability D :2:::7::::4:3::1::::4 matrix E :2:::1:2:1:11:2:::::3 F ::1:::9:::::::::::::: G 8:::::::::1::8::::::: H ::::::::::::::::1:::: I :::1:::::::1::::21::: K :22::::::1::1:22:1::1 L ::::::1:81:3:1::::9:: M :::::::::::::::1::::: N :::::1::::1:1:11::::: P ::2:8::::2:::::1::::: Q :12:1::::::1::21::::: R ::1::::::1::::::11:1: S 1:1::::2:::1::11:3:51 T ::1::::4:2111:12:2::: V ::::1:::1:::::::4:::: W ::::::::::::::::::::: Y ::::::::::::::::::::: bits 6.2 5.6 5.0 4.3 Information 3.7 * content 3.1 * * * * (40.5 bits) 2.5 * **** * * * 1.9 * ****** * * * *** 1.2 *********** ********* 0.6 ********************* 0.0 --------------------- Multilevel GKQAPDFTLPDLDGExVSLSD consensus E E T K I E sequence S -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 6 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value 
Site ------------- ----- --------- --------------------- 15613511 4 1.82e-19 MLE GKQAPDFSLPASNGETVSLSD FKGKNIVLYF 21674812 5 3.37e-19 MIEE GKIAPDFTLPDSTGKMVSLSE FKGRKVLLIF 15807234 5 3.01e-17 MTLV GQPAPDFTLPASTGQDITLSS YRGQSHVVLV 16330420 46 9.44e-17 EFRNRRLLRI GDFAPDFTLKNTKGETIILSE QLKTGPILLK 23098307 29 1.05e-16 MTAYAEGTEV GERAPDFELKTIDGQQLRLSD FKGERVLINF 30021713 53 1.18e-16 EIIARNGIEI GKSAPDFELTKLDGTNVKLSD LKGKKVILNF 15609658 10 3.44e-16 MTKTTRLTP GDKAPAFTLPDADGNNVSLAD YRGRRVIVYF 16078864 36 6.39e-16 YSKKEVGIQE GQQAPDFSLKTLSGEKSSLQD AKGKKVLLNF 14578634 56 3.40e-15 ADSTGYIVKV GESAPDFTITLTDGKQMKLSE LRGKVVMLQF 19746502 68 5.44e-15 KSSDKKMTND GPMAPDFELKGIDGKTYRLSE FKGKKVYLKF 14600438 5 5.44e-15 MPGV GEQAPDFEGIAHTGEKIRLSD FRGRIVVLYF 18313548 6 7.88e-15 MVLKV GDKAPDFELLNEELKPVRLSE VLKRGRPVVL 21283385 21 2.53e-14 IHLKGQQINE GDFAPDFTVLDNDLNQVTLAD YAGKKKLISV 15805225 48 4.65e-14 AGNAANGPLV GKPAPQFNLTGLDGQPVALAD YRGRPVVLNF 15672286 20 9.16e-14 EKIATLNPPI SGDAPDFELTDLKGNKIKLSK LEKPVLISVF 15609375 5 2.46e-13 MLNV GATAPDFTLRDQNQQLVTLRG YRGAKNVLLV 15677788 25 2.67e-13 ILLAIVLIPD SKTAPAFSLPDLHGKTVSNAD LQGKVTLINF 15615431 48 3.66e-12 ITTFNEGVQV GQRAVPFSLTTLEGQVVDLSS LRGQPVILHF 13541117 6 3.66e-12 MSPKV NEKAPDFEAPDTALKMRKLSE FRGQNVVLAF 15614140 39 4.56e-12 FFADRSLARA GEQAVNFVLEDLEGESIELRE LEGKGVFLNF 18309723 262 7.05e-12 SDNKKQEEEN KIPPIDFTLYDQYGNKHTLSE YKGKTIFLNF 21227878 8 1.34e-11 MTDEIRV GETIQDFRLRDQKREEIHLYD LKGKKVLLSF 15605725 37 1.34e-11 HDHYTITTQK GQKIPNVTLTTPDGKKVSIEE FKGKVLLINF 15805374 54 3.79e-11 PDPEGGPALL GKPAPAFALEDLGGRTHALTA AQGKPVVINF 6322180 66 1.10e-10 RSSDVNELEI GDPIPDLSLLNEDNDSISLKK ITENNRVVVF 15597673 133 1.26e-10 GSLGLGIYER GTRLPELSLRNAAGESVQLAD FRGRPLVINL 15602312 43 2.41e-10 IGCKEDIAVI GKQAPEIAVFDLVGTQRSLNE GKGKTILLNF 15801846 22 1.08e-09 VTVANSIPQA GSKAQTFTLVAKDLSDVTLGQ FAGKRKVLNI 16501671 5 1.80e-07 MVLV TRQAPDFTAAAVLGSGEIVDK FNFKQHTNGK -------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------- Motif 6 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 15613511 1.8e-19 3_[6]_130 21674812 3.4e-19 4_[6]_123 15807234 3e-17 4_[6]_126 16330420 9.4e-17 45_[6]_152 23098307 1.1e-16 28_[6]_114 30021713 1.2e-16 52_[6]_118 15609658 3.4e-16 9_[6]_127 16078864 6.4e-16 35_[6]_114 14578634 3.4e-15 55_[6]_122 19746502 5.4e-15 67_[6]_119 14600438 5.4e-15 4_[6]_138 18313548 7.9e-15 5_[6]_136 21283385 2.5e-14 20_[6]_123 15805225 4.7e-14 47_[6]_117 15672286 9.2e-14 19_[6]_120 15609375 2.5e-13 4_[6]_128 15677788 2.7e-13 24_[6]_121 15615431 3.7e-12 47_[6]_116 13541117 3.7e-12 5_[6]_131 15614140 4.6e-12 38_[6]_117 18309723 7e-12 261_[6]_121 21227878 1.3e-11 7_[6]_126 15605725 1.3e-11 36_[6]_89 15805374 3.8e-11 53_[6]_114 6322180 1.1e-10 65_[6]_129 15597673 1.3e-10 132_[6]_125 15602312 2.4e-10 42_[6]_112 15801846 1.1e-09 21_[6]_126 16501671 1.8e-07 4_[6]_175 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 6 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 6 width=21 seqs=29 15613511 ( 4) GKQAPDFSLPASNGETVSLSD 1 21674812 ( 5) GKIAPDFTLPDSTGKMVSLSE 1 15807234 ( 5) GQPAPDFTLPASTGQDITLSS 1 16330420 ( 46) GDFAPDFTLKNTKGETIILSE 1 23098307 ( 29) GERAPDFELKTIDGQQLRLSD 1 30021713 ( 53) GKSAPDFELTKLDGTNVKLSD 1 15609658 ( 10) GDKAPAFTLPDADGNNVSLAD 1 16078864 ( 36) GQQAPDFSLKTLSGEKSSLQD 1 14578634 ( 56) GESAPDFTITLTDGKQMKLSE 1 19746502 ( 68) GPMAPDFELKGIDGKTYRLSE 1 14600438 ( 5) GEQAPDFEGIAHTGEKIRLSD 1 18313548 ( 6) GDKAPDFELLNEELKPVRLSE 1 21283385 ( 21) GDFAPDFTVLDNDLNQVTLAD 1 15805225 ( 48) GKPAPQFNLTGLDGQPVALAD 1 15672286 ( 20) SGDAPDFELTDLKGNKIKLSK 1 15609375 ( 5) GATAPDFTLRDQNQQLVTLRG 1 15677788 ( 
25) SKTAPAFSLPDLHGKTVSNAD 1 15615431 ( 48) GQRAVPFSLTTLEGQVVDLSS 1 13541117 ( 6) NEKAPDFEAPDTALKMRKLSE 1 15614140 ( 39) GEQAVNFVLEDLEGESIELRE 1 18309723 ( 262) KIPPIDFTLYDQYGNKHTLSE 1 21227878 ( 8) GETIQDFRLRDQKREEIHLYD 1 15605725 ( 37) GQKIPNVTLTTPDGKKVSIEE 1 15805374 ( 54) GKPAPAFALEDLGGRTHALTA 1 6322180 ( 66) GDPIPDLSLLNEDNDSISLKK 1 15597673 ( 133) GTRLPELSLRNAAGESVQLAD 1 15602312 ( 43) GKQAPEIAVFDLVGTQRSLNE 1 15801846 ( 22) GSKAQTFTLVAKDLSDVTLGQ 1 16501671 ( 5) TRQAPDFTAAAVLGSGEIVDK 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 6 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 21 n= 14387 bayes= 9.59338 E= 2.0e-099 -278 -369 -338 -403 -529 363 -290 -503 -177 -543 -407 -77 -450 -377 -330 -29 -137 -483 -421 -423 -87 -344 122 139 -391 -96 -94 -87 157 -334 -228 -153 -63 196 5 -14 -31 -337 -367 -266 -214 -343 -67 -158 17 -304 -94 -86 115 -333 57 -153 147 248 123 52 89 -335 -366 -266 321 -180 -507 -463 -412 -289 -360 47 -474 -146 -260 -415 -97 -397 -402 -150 -279 -234 -429 -419 -257 -369 -404 -377 -471 -402 -274 -116 -368 -400 -363 -367 399 64 -327 -241 -300 -70 -495 -450 -124 -370 365 -60 -469 -389 -197 -432 -423 -461 -377 6 -214 -127 -364 -292 -193 -440 -416 -363 -392 -251 -506 -493 406 -500 -341 -139 -511 -87 -193 -441 -444 -445 -457 -325 -417 -194 -222 -74 -41 -310 -306 144 -438 -378 -184 -351 -250 -395 -259 -16 -382 -180 -28 181 296 -131 -413 -346 -99 -309 -547 -462 -246 -176 -306 -59 -452 308 -47 -435 -439 -315 -348 -375 -331 -62 -351 -324 -91 -282 -281 2 -63 -333 -133 -59 83 7 -176 -199 175 -138 120 -173 180 -96 -340 24 93 -449 258 -280 -538 0 -183 -523 -86 -148 -413 181 -430 -233 -299 -213 133 -510 -519 -391 -21 -337 -230 13 -385 -306 131 2 -48 144 -223 3 -63 161 -143 97 89 -121 -364 -264 -21 -343 215 57 -391 -96 131 -350 56 -136 -227 77 
-312 -88 -140 -13 89 -123 -366 19 -280 -370 -340 -402 -530 357 -288 -502 -384 -42 -407 -75 -451 -73 -95 -256 -389 -483 -422 -423 -228 -358 -65 161 -407 -313 -106 -366 158 -349 -243 163 -325 226 3 51 40 -351 -382 -280 -213 -343 8 -48 -390 -96 -93 -350 115 -136 147 77 24 196 -140 97 153 -123 -366 -265 -193 -188 -445 -150 -289 -402 166 159 -395 -114 42 -348 -383 -314 15 -86 -187 266 -335 -19 -21 -343 -66 -48 -391 -304 131 1 89 -334 -228 -152 -313 47 158 220 154 -337 -367 -266 -391 -315 -549 -464 -244 -522 -309 -61 -454 318 -45 -99 -440 -316 -349 -383 -338 -127 -352 -327 85 -336 -68 -51 -392 -98 -97 -352 -49 -336 -230 3 -314 44 73 298 -25 -339 -368 18 -117 -619 246 232 -654 -104 -292 -542 46 -528 -429 -309 -443 29 -361 37 -369 -502 -624 -504 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 6 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 21 nsites= 29 E= 2.0e-099 0.000000 0.000000 0.000000 0.000000 0.000000 0.827586 0.000000 0.000000 0.034483 0.000000 0.000000 0.034483 0.000000 0.000000 0.000000 0.068966 0.034483 0.000000 0.000000 0.000000 0.034483 0.000000 0.172414 0.206897 0.000000 0.034483 0.000000 0.034483 0.241379 0.000000 0.000000 0.000000 0.034483 0.137931 0.034483 0.034483 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.068966 0.000000 0.000000 0.034483 0.172414 0.000000 0.034483 0.000000 0.172414 0.206897 0.103448 0.068966 0.103448 0.000000 0.000000 0.000000 0.827586 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.103448 0.000000 0.034483 0.000000 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.827586 0.068966 0.000000 0.000000 0.000000 0.068966 0.000000 
0.000000 0.103448 0.000000 0.655172 0.068966 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.068966 0.034483 0.034483 0.000000 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.862069 0.000000 0.000000 0.034483 0.000000 0.068966 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.000000 0.068966 0.000000 0.000000 0.241379 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.000000 0.034483 0.206897 0.379310 0.034483 0.000000 0.000000 0.068966 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.034483 0.000000 0.793103 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.068966 0.000000 0.000000 0.034483 0.000000 0.000000 0.068966 0.034483 0.000000 0.000000 0.034483 0.137931 0.103448 0.000000 0.000000 0.206897 0.000000 0.103448 0.000000 0.206897 0.034483 0.000000 0.034483 0.172414 0.000000 0.413793 0.000000 0.000000 0.068966 0.000000 0.000000 0.034483 0.034483 0.000000 0.137931 0.000000 0.000000 0.000000 0.000000 0.137931 0.000000 0.000000 0.000000 0.068966 0.000000 0.000000 0.068966 0.000000 0.000000 0.034483 0.068966 0.034483 0.310345 0.000000 0.034483 0.034483 0.103448 0.000000 0.103448 0.103448 0.034483 0.000000 0.000000 0.068966 0.000000 0.344828 0.103448 0.000000 0.034483 0.034483 0.000000 0.103448 0.034483 0.000000 0.068966 0.000000 0.000000 0.000000 0.034483 0.103448 0.034483 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.758621 0.000000 0.000000 0.000000 0.137931 0.000000 0.034483 0.000000 0.034483 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.034483 0.241379 0.000000 0.000000 0.000000 0.000000 0.241379 0.000000 0.000000 0.137931 0.000000 0.172414 0.034483 0.068966 0.068966 0.000000 0.000000 0.000000 0.000000 0.000000 0.068966 0.034483 0.000000 0.034483 0.000000 0.000000 0.172414 0.034483 0.068966 0.068966 0.068966 0.137931 0.000000 0.103448 0.172414 0.034483 0.000000 0.000000 0.000000 
0.000000 0.000000 0.034483 0.000000 0.000000 0.068966 0.241379 0.000000 0.034483 0.034483 0.000000 0.000000 0.000000 0.068966 0.034483 0.000000 0.448276 0.000000 0.034483 0.068966 0.000000 0.034483 0.034483 0.000000 0.000000 0.034483 0.068966 0.137931 0.000000 0.000000 0.000000 0.000000 0.034483 0.137931 0.275862 0.172414 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.896552 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.000000 0.172414 0.000000 0.034483 0.034483 0.000000 0.034483 0.000000 0.000000 0.034483 0.000000 0.000000 0.034483 0.000000 0.034483 0.068966 0.482759 0.034483 0.000000 0.000000 0.034483 0.034483 0.000000 0.379310 0.344828 0.000000 0.034483 0.000000 0.000000 0.103448 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.068966 0.000000 0.000000 0.000000 0.000000 -------------------------------------------------------------------------------- Time 336.47 secs. ******************************************************************************** ******************************************************************************** MOTIF 7 width = 18 sites = 20 llr = 553 E-value = 4.2e-054 ******************************************************************************** -------------------------------------------------------------------------------- Motif 7 Description -------------------------------------------------------------------------------- Simplified A 42:5::::11::::213: pos.-specific C :::::::::::::::::: probability D ::::::::a131:::1:: matrix E :2:::::::32:::::11 F :::::6::::::1::11: G :1:1:::::1:8:::::: H ::::::::::::1::::2 I 11:::148::::241:2: K :2:::::::21:3::111 L 1::::121::::1:12:1 M :::::::::::::::::: N :1::1:::::211::::: P 1:2::::::1:::::::: Q :1::::::::1:1:1:12 R :19::::::311::5:1: S 3::31:::::1:::::2: T 1::14:::::1:1::::1 V 12:14251::::26211: W ::::::::::1::::::3 Y :1::12:::::::::5:1 bits 6.2 5.6 5.0 4.3 Information 3.7 * * content 3.1 * ** * * 
(39.9 bits) 2.5 * ***** * * * 1.9 ******** * *** * 1.2 * **************** 0.6 ****************** 0.0 ------------------ Multilevel AERATFVIDRDGKVRYAW consensus S SV I EE I Q sequence -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 7 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------------------ 13541117 118 1.24e-17 NFAGVPGLTA SKRSVFIIDGDGIVRYAW VSDDPGKEPD 15899339 117 1.72e-17 ELPILKGYVL AKRSVFVIDKNGIVRYKW VSEDPTKEPN 15807234 112 5.06e-17 YGVAIDERGI SGRAVFVIDREGVVRYQH VEEQTGQYTV 18313548 120 1.71e-15 NLLGLPLYHL AKRAVYIIDPTGTIRYVW YSDDPRDEPP 15605963 130 1.42e-14 LIGEGALKGI LARAVFIIDKEGKVAYVQ LVPEITEEPN 15790738 151 1.98e-14 QASAEDHDRV PERAVFLIDADRVIRYAW ASSDLSESPD 15643152 100 4.69e-14 EFFNVLENGK TVRSTFLIDRWGFVRKEW RRVKVEGHVQ 15613511 116 4.69e-14 KKNFGKEYMG IERSTFVIDKDGTVVKEW RKVRVKDHVE 4433065 103 1.19e-13 YGMLNVETGV SRRGYVIIDDKGKVRYIQ MNDDGIGRST 15609375 114 1.61e-13 YGVFNEQAGI ANRGTFVVDRSGIIRFAE MKQPGEVRDQ 14600438 113 2.40e-13 VVRGEGSNLA AERVTFIIDREGNIRAIL RNIRPAEKHA 16330420 176 1.01e-12 EHNGTEEWLL PVPATFVIDRRGHIALAY ANVDFRVRYE 15609658 122 2.98e-12 KQMYGKTVQG VIRSTFVVDEDGKIVVAQ YNVKATGHVA 15801846 131 7.71e-12 AIADGPLKGL AARAVVVIDENDNVIFSQ LVDEITTEPD 21227878 117 2.46e-11 CGIFRGKEGV SERANIIIDENRQVIYFK KYLGHELPDI 21222859 185 2.46e-11 RFEKGTLNPQ AVPSTLIIDREGKVAART LQALSEEKLR 21283385 126 3.38e-11 YGVVMEELRL LARAVFVLDADNKVVYKE IVSEGTDFPD 21674812 110 3.97e-11 AYDALGFLGM AQRAYVLIDEQGLVLLSY SDFLPVTYQP 16125919 137 6.86e-11 DVALAVKPDW SNRTSYVIAPDGKILLSH TDGNFMGHVQ 16078864 137 1.36e-10 KGINADYNVM SYPTTYILDEKGVIQDIH VGTMTKKEME -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 7 block diagrams 
-------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 13541117 1.2e-17 117_[7]_22 15899339 1.7e-17 116_[7]_22 15807234 5.1e-17 111_[7]_22 18313548 1.7e-15 119_[7]_25 15605963 1.4e-14 129_[7]_22 15790738 2e-14 150_[7]_44 15643152 4.7e-14 99_[7]_204 15613511 4.7e-14 115_[7]_21 4433065 1.2e-13 102_[7]_13 15609375 1.6e-13 113_[7]_22 14600438 2.4e-13 112_[7]_33 16330420 1e-12 175_[7]_25 15609658 3e-12 121_[7]_18 15801846 7.7e-12 130_[7]_20 21227878 2.5e-11 116_[7]_20 21222859 2.5e-11 184_[7]_21 21283385 3.4e-11 125_[7]_21 21674812 4e-11 109_[7]_21 16125919 6.9e-11 136_[7]_23 16078864 1.4e-10 136_[7]_16 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 7 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 7 width=18 seqs=20 13541117 ( 118) SKRSVFIIDGDGIVRYAW 1 15899339 ( 117) AKRSVFVIDKNGIVRYKW 1 15807234 ( 112) SGRAVFVIDREGVVRYQH 1 18313548 ( 120) AKRAVYIIDPTGTIRYVW 1 15605963 ( 130) LARAVFIIDKEGKVAYVQ 1 15790738 ( 151) PERAVFLIDADRVIRYAW 1 15643152 ( 100) TVRSTFLIDRWGFVRKEW 1 15613511 ( 116) IERSTFVIDKDGTVVKEW 1 4433065 ( 103) SRRGYVIIDDKGKVRYIQ 1 15609375 ( 114) ANRGTFVVDRSGIIRFAE 1 14600438 ( 113) AERVTFIIDREGNIRAIL 1 16330420 ( 176) PVPATFVIDRRGHIALAY 1 15609658 ( 122) VIRSTFVVDEDGKIVVAQ 1 15801846 ( 131) AARAVVVIDENDNVIFSQ 1 21227878 ( 117) SERANIIIDENRQVIYFK 1 21222859 ( 185) AVPSTLIIDREGKVAART 1 21283385 ( 126) LARAVFVLDADNKVVYKE 1 21674812 ( 110) AQRAYVLIDEQGLVLLSY 1 16125919 ( 137) SNRTSYVIAPDGKILLSH 1 16078864 ( 137) SYPTTYILDEKGVIQDIH 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 7 position-specific scoring matrix 
-------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 18 n= 14675 bayes= 9.76903 E= 4.2e-054 192 -170 -447 -376 -242 -332 -208 10 -361 16 -95 -307 65 -268 -292 224 3 -26 -278 -221 64 -302 -187 130 -350 -56 -54 -47 96 -293 -187 117 -273 86 45 -99 -141 51 -326 60 -610 -538 -675 -682 -717 -593 -420 -708 -447 -675 -635 -568 148 -480 436 -560 -600 -739 -579 -609 247 -213 -557 -532 -521 45 -409 -473 -551 -485 -379 -426 -454 -430 -456 241 84 -73 -504 -500 -266 -233 -398 -414 -345 -388 -247 -233 -374 -300 -191 14 -414 -284 -317 33 309 186 -360 144 -411 -296 -565 -540 366 -533 -293 -38 -542 -68 -177 -463 -496 -440 -467 -376 -397 40 -244 203 -436 -345 -712 -680 -427 -672 -601 275 -680 68 -249 -607 -658 -612 -646 -563 -391 241 -602 -510 -346 -298 -459 -452 -309 -478 -352 361 -422 -9 -75 -369 -479 -371 -392 -333 -275 68 -386 -305 -227 -356 379 -129 -456 -380 -187 -420 -422 -450 -366 -74 -470 -313 -357 -284 -364 -428 -402 -352 17 -344 -20 168 -393 -56 -89 -346 96 -329 -224 -143 68 -70 255 -134 -177 -330 -366 -265 -177 -307 187 130 -355 -265 -57 -314 53 -298 -192 166 -276 86 44 26 9 -300 145 -229 -257 -347 -123 -378 -507 362 -264 -481 -374 -521 -386 -50 -427 -352 11 -230 -365 -462 -399 -400 -200 -208 -302 -231 -13 -319 166 120 175 -56 -106 110 -332 62 -190 -165 84 76 -281 -210 -435 -343 -730 -698 -468 -686 -653 274 -705 -301 -288 -627 -674 -658 -684 -583 -393 282 -675 -546 69 -188 -455 -375 -236 -369 -200 86 -319 23 -88 -306 -386 49 314 -229 -186 87 -275 -215 -46 -171 -119 -275 138 -327 89 -219 -50 -16 -140 -232 -304 -219 -211 -156 -225 -119 -80 412 127 -288 -196 51 -32 -269 -61 98 51 -277 -176 -121 -279 84 43 137 -143 3 -321 -224 -187 -308 -202 52 -352 -277 310 -309 -5 -95 -194 -126 -286 238 -98 -113 7 -299 402 152 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 7 position-specific probability matrix 
-------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 18 nsites= 20 E= 4.2e-054 0.350000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.050000 0.000000 0.100000 0.000000 0.000000 0.100000 0.000000 0.000000 0.300000 0.050000 0.050000 0.000000 0.000000 0.150000 0.000000 0.000000 0.200000 0.000000 0.050000 0.000000 0.050000 0.150000 0.000000 0.000000 0.100000 0.000000 0.050000 0.050000 0.000000 0.000000 0.150000 0.000000 0.050000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.150000 0.000000 0.850000 0.000000 0.000000 0.000000 0.000000 0.000000 0.450000 0.000000 0.000000 0.000000 0.000000 0.100000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.300000 0.100000 0.050000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.050000 0.000000 0.000000 0.000000 0.050000 0.400000 0.400000 0.000000 0.100000 0.000000 0.000000 0.000000 0.000000 0.600000 0.000000 0.000000 0.050000 0.000000 0.050000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.150000 0.000000 0.150000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.400000 0.000000 0.150000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.450000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.800000 0.000000 0.100000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 0.000000 0.000000 0.050000 0.000000 0.950000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 0.000000 0.050000 0.250000 0.000000 0.050000 0.000000 0.000000 0.150000 0.000000 0.000000 0.000000 0.100000 0.000000 0.300000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.300000 0.200000 0.000000 
0.000000 0.000000 0.000000 0.100000 0.000000 0.000000 0.150000 0.000000 0.050000 0.050000 0.050000 0.050000 0.000000 0.050000 0.000000 0.000000 0.000000 0.050000 0.000000 0.000000 0.800000 0.000000 0.000000 0.000000 0.000000 0.000000 0.050000 0.000000 0.000000 0.100000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.050000 0.000000 0.050000 0.150000 0.300000 0.050000 0.000000 0.100000 0.000000 0.050000 0.000000 0.000000 0.100000 0.150000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.400000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.600000 0.000000 0.000000 0.150000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 0.000000 0.100000 0.000000 0.000000 0.000000 0.050000 0.450000 0.000000 0.000000 0.150000 0.000000 0.000000 0.100000 0.000000 0.050000 0.000000 0.100000 0.000000 0.000000 0.000000 0.100000 0.150000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.050000 0.000000 0.450000 0.250000 0.000000 0.000000 0.100000 0.050000 0.000000 0.000000 0.150000 0.100000 0.000000 0.000000 0.000000 0.000000 0.050000 0.050000 0.150000 0.000000 0.100000 0.000000 0.000000 0.000000 0.000000 0.000000 0.100000 0.000000 0.000000 0.150000 0.000000 0.050000 0.050000 0.000000 0.000000 0.000000 0.200000 0.000000 0.000000 0.050000 0.000000 0.300000 0.100000 -------------------------------------------------------------------------------- Time 380.88 secs. 
******************************************************************************** ******************************************************************************** MOTIF 8 width = 15 sites = 29 llr = 571 E-value = 6.1e-041 ******************************************************************************** -------------------------------------------------------------------------------- Motif 8 Description -------------------------------------------------------------------------------- Simplified A :1::::::2::1:11 pos.-specific C ::::::::::::::: probability D :2:::1:::a:1::: matrix E 1::1::::::11:3: F ::1:6:::::1:1:: G 13:::::::::23:: H 1:::::::::::1:: I ::2:::111::::13 K 12:::::::::2:1: L ::6::167::::::2 M ::::::1:::::::: N 21:2:::::::2::: P :::2:7::::2:::: Q 1::::::::::::1: R ::::::::::1:31: S 1:::::1:5:1:::: T 1::4:1::::11:11 V ::1:1:11::::::2 W ::::::::::::::: Y 2:::3:::1:::::: bits 6.2 5.6 5.0 4.3 Information 3.7 * content 3.1 * * (28.4 bits) 2.5 * ** * * 1.9 * ******** * 1.2 ********** **** 0.6 *************** 0.0 --------------- Multilevel YGLTFPLLSDPNREI consensus INY A G V sequence P -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- --------------- 15609658 88 1.88e-14 PEKLATFRDA QGLTFPLLSDPDREV LTAWGAYGEK 13541117 84 5.14e-13 PFSLAEFAKK NNLTFDLLSDSNREI SKKYDVLHQN 18313548 86 6.00e-13 PFALKAFKDA NRLNFPLLSDYNRIV IGMYDVVQAN 15790738 119 2.25e-12 TYAHEAFADE YALTFPLLSDHAGAI ADAFGVLQAS 6322180 145 1.28e-11 VTSQKKFQSK QNLPYHLLSDPKREF IGLLGAKKTP 21223405 94 3.84e-11 HAWRKDHPDL TDLPFPMLADSKHEL MRDLGIEGED 16330420 125 6.86e-11 LVASQKTIDR HDLTYDLLSDSGFQT AQDYGLVFTV 21674812 83 8.61e-11 PESHKQFAEK HKLPFLLLSDQERTV AKAYDALGFL 21222859 158 1.34e-10 TGPARAFEKD YGVTYPSLYDPAGRL MLRFEKGTLN 
6850955 93 1.67e-10 NDNELSKMVE GGIPFPMLSDGGGNV GTLYGVYDPE 16803644 98 1.67e-10 TNTPIKEGGI GKLNYPLAADTNHQV ASDYGVLIEE 4433065 75 3.89e-10 CEADKSKGGV GKLGFPLVSDIKRCI SIKYGMLNVE 16078864 116 5.84e-10 EKQVRAFADT YDLTFPILIDKKGIN ADYNVMSYPT 15609375 84 7.88e-10 PPTHKIWATQ SGFTFPLLSDFWPHG AVSQAYGVFN 11467494 97 7.88e-10 LLCNREEGGL EDLNYPLVSDLTQTI TRDYQVLTDE 13186328 86 8.70e-10 VQKNWIENEL KFINFPFISDFNHKI SNNFNILNKK 15805225 125 1.16e-09 PQNARDFARQ YGLTYPNLQDPGVAT AIAYQVTGIP 15899339 84 1.28e-09 PFSNKAFKEQ NKINFTIVSDFNREA VKAYGVAGEL 3318841 96 1.41e-09 INAYNSEEPT EKLPFPIIDDRNREL AILLGMLDPA 14600438 83 2.48e-09 VEKNRKFAEK HGFRFKLVSDEKGEI GMKYGVVRGE 19705357 105 2.72e-09 KDKIISFLKK KNITYPSLMDETGKS FDDYKVRALP 15807234 84 5.59e-09 VYAHRAWAAE YGIEVPLLADMQLEV ARQYGVAIDE 9955016 96 8.63e-09 INTPRKEGGL GPLNIPLLADVTRRL SEDYGVLKTD 15805374 131 8.63e-09 PADARRFMDQ YGLIYPALLDPGSRT ALSYGVGKLP 15643152 74 9.40e-09 VEALKRFKEK NDLKVTLLSDPEGIL HEFFNVLENG 19554157 70 1.02e-08 QTIAQDFKLD NAVTYPSIYDPPFRI AAALGGVPTS 15613511 82 1.11e-08 VERHKKFIEK YSLPFLLLADEDTKV AQQYDVWKLK 14578634 159 1.47e-07 DIFAKYALRE SGITRNVLIDREGKI VKLTRLYNEE 14286173 89 3.42e-07 WIEWIAENLD TEIEFPVIADTGRVA DTLGLIHPAR -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 15609658 1.9e-14 87_[8]_55 13541117 5.1e-13 83_[8]_59 18313548 6e-13 85_[8]_62 15790738 2.3e-12 118_[8]_79 6322180 1.3e-11 144_[8]_56 21223405 3.8e-11 93_[8]_76 16330420 6.9e-11 124_[8]_79 21674812 8.6e-11 82_[8]_51 21222859 1.3e-10 157_[8]_51 6850955 1.7e-10 92_[8]_96 16803644 1.7e-10 97_[8]_69 4433065 3.9e-10 74_[8]_44 16078864 5.8e-10 115_[8]_40 15609375 7.9e-10 83_[8]_55 11467494 7.9e-10 96_[8]_93 13186328 8.7e-10 85_[8]_76 15805225 1.2e-09 124_[8]_46 
15899339 1.3e-09 83_[8]_58 3318841 1.4e-09 95_[8]_114 14600438 2.5e-09 82_[8]_66 19705357 2.7e-09 104_[8]_45 15807234 5.6e-09 83_[8]_53 9955016 8.6e-09 95_[8]_87 15805374 8.6e-09 130_[8]_43 15643152 9.4e-09 73_[8]_233 19554157 1e-08 69_[8]_37 15613511 1.1e-08 81_[8]_58 14578634 1.5e-07 158_[8]_25 14286173 3.4e-07 88_[8]_106 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 8 width=15 seqs=29 15609658 ( 88) QGLTFPLLSDPDREV 1 13541117 ( 84) NNLTFDLLSDSNREI 1 18313548 ( 86) NRLNFPLLSDYNRIV 1 15790738 ( 119) YALTFPLLSDHAGAI 1 6322180 ( 145) QNLPYHLLSDPKREF 1 21223405 ( 94) TDLPFPMLADSKHEL 1 16330420 ( 125) HDLTYDLLSDSGFQT 1 21674812 ( 83) HKLPFLLLSDQERTV 1 21222859 ( 158) YGVTYPSLYDPAGRL 1 6850955 ( 93) GGIPFPMLSDGGGNV 1 16803644 ( 98) GKLNYPLAADTNHQV 1 4433065 ( 75) GKLGFPLVSDIKRCI 1 16078864 ( 116) YDLTFPILIDKKGIN 1 15609375 ( 84) SGFTFPLLSDFWPHG 1 11467494 ( 97) EDLNYPLVSDLTQTI 1 13186328 ( 86) KFINFPFISDFNHKI 1 15805225 ( 125) YGLTYPNLQDPGVAT 1 15899339 ( 84) NKINFTIVSDFNREA 1 3318841 ( 96) EKLPFPIIDDRNREL 1 14600438 ( 83) HGFRFKLVSDEKGEI 1 19705357 ( 105) KNITYPSLMDETGKS 1 15807234 ( 84) YGIEVPLLADMQLEV 1 9955016 ( 96) GPLNIPLLADVTRRL 1 15805374 ( 131) YGLIYPALLDPGSRT 1 15643152 ( 74) NDLKVTLLSDPEGIL 1 19554157 ( 70) NAVTYPSIYDPPFRI 1 15613511 ( 82) YSLPFLLLADEDTKV 1 14578634 ( 159) SGITRNVLIDREGKI 1 14286173 ( 89) TEIEFPVIADTGRVA 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 15 n= 14963 bayes= 8.89339 E= 6.1e-041 -231 -358 -226 11 -408 79 270 -370 11 -353 -248 194 -328 
114 -157 52 41 -356 -384 285 -22 -349 123 -50 -74 189 -99 -358 115 -342 -236 127 -63 -95 3 -14 -187 -344 -374 -272 -542 -448 -797 -721 41 -727 -529 198 -713 272 -164 -673 -662 -539 -603 -623 -496 -18 -475 -464 -242 -362 -238 10 -418 -96 -116 -86 -52 -361 -255 218 177 -114 2 -157 264 -362 -392 -292 -554 -453 -647 -629 337 -624 -196 -77 -608 -428 -393 -492 -613 -478 -32 -482 -520 -31 -228 353 -258 -371 -35 -372 -473 -399 86 -424 -133 -85 -366 -59 381 -282 -324 -239 2 -411 -495 -448 -150 -266 -528 -445 -75 -463 -275 53 -431 271 141 -75 -426 -307 -336 42 -278 -41 -327 -286 -137 -380 -699 -630 -371 -652 -485 122 -623 285 -188 -585 -601 -498 -534 -529 -418 75 -503 -462 112 -239 -95 -302 -290 -370 -189 32 -291 -95 77 -266 -384 14 -255 311 -190 -200 -319 116 -740 -662 396 -575 -786 -665 -553 -816 -785 -792 -763 -511 -717 -681 -691 -664 -721 -821 -694 -695 -214 -338 -229 56 73 -96 131 -85 -48 -135 58 -155 194 46 75 97 41 -121 -364 20 -21 -345 8 57 -393 106 -95 -353 115 -336 -230 215 -63 46 -142 -140 89 -338 105 -267 -222 -314 -251 -181 21 172 268 -290 -173 -126 -204 -173 -64 40 280 -19 -32 -111 -355 -263 -21 83 -227 176 -391 -304 131 55 89 -334 -227 4 -313 115 158 -139 41 -123 -366 -265 -24 -216 -512 -436 -46 -106 -245 212 -415 86 -116 -30 -422 -316 -337 -47 92 147 -303 -244 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 8 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 15 nsites= 29 E= 6.1e-041 0.000000 0.000000 0.000000 0.068966 0.000000 0.137931 0.103448 0.000000 0.068966 0.000000 0.000000 0.172414 0.000000 0.068966 0.000000 0.068966 0.068966 0.000000 0.000000 0.241379 0.068966 0.000000 0.172414 0.034483 0.034483 0.310345 0.000000 0.000000 0.172414 0.000000 0.000000 0.103448 0.034483 0.000000 0.034483 0.034483 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 0.068966 0.000000 0.000000 0.241379 0.000000 0.620690 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.068966 0.000000 0.000000 0.000000 0.000000 0.000000 0.068966 0.000000 0.034483 0.000000 0.034483 0.034483 0.000000 0.000000 0.206897 0.206897 0.000000 0.034483 0.000000 0.379310 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.551724 0.000000 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.034483 0.000000 0.000000 0.068966 0.000000 0.310345 0.000000 0.000000 0.068966 0.000000 0.000000 0.000000 0.034483 0.000000 0.034483 0.068966 0.000000 0.034483 0.689655 0.000000 0.000000 0.000000 0.068966 0.000000 0.000000 0.000000 0.034483 0.000000 0.000000 0.000000 0.034483 0.000000 0.000000 0.103448 0.000000 0.551724 0.068966 0.034483 0.000000 0.000000 0.000000 0.103448 0.000000 0.068966 0.000000 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.137931 0.000000 0.689655 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.137931 0.000000 0.000000 0.206897 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.068966 0.000000 0.034483 0.034483 0.000000 0.000000 0.034483 0.000000 0.517241 0.000000 0.000000 0.000000 0.068966 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.103448 0.103448 0.034483 0.034483 0.034483 0.034483 0.034483 0.034483 0.000000 0.241379 0.034483 0.068966 0.103448 0.068966 0.034483 0.000000 0.034483 0.068966 0.000000 0.068966 0.103448 0.000000 0.172414 0.000000 0.000000 0.172414 0.000000 0.000000 0.206897 0.034483 0.034483 0.000000 0.000000 0.103448 0.000000 0.034483 0.000000 0.000000 0.000000 0.000000 0.000000 0.068966 0.275862 0.103448 0.000000 0.000000 0.034483 0.000000 0.000000 0.034483 0.034483 0.344828 0.034483 0.034483 0.034483 0.000000 0.000000 0.068966 
0.034483 0.000000 0.275862 0.000000 0.000000 0.034483 0.103448 0.137931 0.000000 0.000000 0.034483 0.000000 0.068966 0.137931 0.000000 0.068966 0.034483 0.000000 0.000000 0.068966 0.000000 0.000000 0.000000 0.034483 0.034483 0.000000 0.275862 0.000000 0.172414 0.000000 0.034483 0.000000 0.000000 0.000000 0.034483 0.103448 0.241379 0.000000 0.000000 -------------------------------------------------------------------------------- Time 422.25 secs. ******************************************************************************** ******************************************************************************** MOTIF 9 width = 21 sites = 9 llr = 338 E-value = 3.5e-015 ******************************************************************************** -------------------------------------------------------------------------------- Motif 9 Description -------------------------------------------------------------------------------- Simplified A ::2::4::1::::::332::1 pos.-specific C ::::::::::::::::::::: probability D :::::::::34::::::::1: matrix E :::::::::2::::1::2:82 F ::::2:1::::::::::2::: G ::2::::::::9::::::::: H ::::::::::::::::::::: I ::::::111:::231:::2:: K ::2::::::2::2:41::3:: L :2::::331::1::::3::1: M ::::::4:::::::::1:::: N ::::::::::6::::::3::: P ::::::::::::::::::::2 Q :::::::::1::1:::::::: R 8:1a::::::::::::::::: S 16::13:::1::::::1:::3 T :2::::::::::::2:::::1 V :::::2:47:::47111:4:: W 1:1:1:::::::::::::::: Y ::1:6::1:::::::4::::: bits 6.2 5.6 5.0 4.3 * Information 3.7 * * content 3.1 * ** * ** * * (54.2 bits) 2.5 ** **** * ** * * *** 1.9 ********************* 1.2 ********************* 0.6 ********************* 0.0 --------------------- Multilevel RSARYAMVVDNGVVKYANVES consensus LG FSLL ED IITALAK E sequence TK V K K EI P F -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 sites sorted by position p-value 
-------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- --------------------- 15964668 124 1.20e-20 IDLSAGTLGI RSKRYSMLVEDGVVKALNIEE SPGQATASGA 17229033 124 3.84e-19 VDKTDLGFGK RSWRYSMLVKDGVIEKMFIEP DVPGDPFEVS 6323138 138 3.52e-18 FELAVGDGVY WSGRWAMVVENGIVTYAAKET NPGTDVTVSS 15826629 124 6.10e-18 DSLVSIFGNR RLKRFSMVVQDGIVKALNVEP DGTGLTCSLA 21112072 122 1.16e-17 IDASGSGMGL RSRRYALYADDGVVKALFVEE PGEFKVSAAD 1091044 131 1.31e-16 SLDLPPAFGT RTARYAIIVSNGVVKYVEKDS EGVAGSGVDA 5326864 130 2.32e-16 KSIGWADEEG RTYRYVLVIDNGKIIYAAKEA AKNSLELSRA 4704732 126 2.30e-13 LDLKDKGSGI SSGRFALLLDNLKVTVANVES GGEFTVSSAE 15672286 124 3.35e-13 LILNGGPLEG RLARSVFVVKNGQIVYSEVLS ELSDEPNYEK -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 15964668 1.2e-20 123_[9]_17 17229033 3.8e-19 123_[9]_107 6323138 3.5e-18 137_[9]_18 15826629 6.1e-18 123_[9]_17 21112072 1.2e-17 121_[9]_18 1091044 1.3e-16 130_[9]_15 5326864 2.3e-16 129_[9]_17 4704732 2.3e-13 125_[9]_16 15672286 3.4e-13 123_[9]_16 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 9 width=21 seqs=9 15964668 ( 124) RSKRYSMLVEDGVVKALNIEE 1 17229033 ( 124) RSWRYSMLVKDGVIEKMFIEP 1 6323138 ( 138) WSGRWAMVVENGIVTYAAKET 1 15826629 ( 124) RLKRFSMVVQDGIVKALNVEP 1 21112072 ( 122) RSRRYALYADDGVVKALFVEE 1 1091044 ( 131) RTARYAIIVSNGVVKYVEKDS 1 5326864 ( 130) RTYRYVLVIDNGKIIYAAKEA 1 4704732 ( 126) SSGRFALLLDNLKVTVANVES 1 15672286 ( 124) 
RLARSVFVVKNGQIVYSEVLS 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 21 n= 14387 bayes= 11.49 E= 3.5e-015 -338 -256 -434 -406 -446 -385 -106 -392 -130 -385 -324 -274 -359 -170 436 -81 -324 -461 82 -321 -168 -149 -339 -369 -346 -293 -211 -300 -307 -28 -221 -146 -315 -257 -240 361 176 -323 -340 -283 95 -237 -151 -79 -277 108 -2 -237 135 -229 -128 -69 -230 2 128 -57 -95 -230 227 144 -442 -339 -517 -515 -546 -452 -200 -490 -237 -476 -427 -374 -433 -274 452 -388 -424 -564 -373 -433 -214 -175 -314 -285 189 -331 90 -227 -293 -206 -145 -231 -309 -218 -214 29 -233 -248 204 424 251 -120 -449 -417 -402 -202 -298 -323 -432 -356 -258 -325 -364 -322 -347 236 -181 104 -394 -380 -404 -315 -677 -597 103 -577 -374 101 -583 195 415 -534 -525 -395 -468 -473 -359 -211 -310 -302 -241 -184 -499 -431 -201 -407 -247 120 -415 168 -47 -354 -410 -310 -337 -274 -200 223 -284 164 -48 -140 -384 -349 -297 -383 -208 57 -367 -88 -149 -329 -336 -302 -271 -265 -142 311 -364 -346 -181 -362 200 178 -409 -267 -84 -342 127 -326 -224 -116 -273 162 -137 87 -169 -316 -381 -273 -316 -369 262 -206 -497 -204 -90 -539 -268 -524 -460 361 -358 -190 -282 -114 -227 -526 -480 -326 -229 -316 -299 -362 -472 367 -246 -444 -355 -132 -353 -218 -402 -333 -286 -208 -340 -427 -373 -372 -166 -159 -342 -260 -239 -324 -128 157 99 -149 -83 -228 -326 114 -113 -186 -141 237 -272 -213 -268 -218 -540 -506 -346 -517 -397 234 -517 -193 -174 -460 -489 -458 -450 -405 -250 293 -495 -413 -134 -199 -172 53 -252 -233 -26 53 210 -184 -93 -93 -244 -23 -46 -72 163 14 -251 -167 132 -151 -273 -211 -14 -271 17 -134 16 -161 -85 -177 -286 -127 -129 -119 -143 9 -139 372 171 -97 -380 -308 -159 -267 -135 -41 -291 149 200 -237 -311 -197 -220 75 -109 53 -200 -144 103 -257 -111 140 
156 -208 -38 -271 -118 -273 -177 264 -260 -48 -110 -72 -127 -265 -309 -194 -271 -211 -514 -443 -305 -460 -306 193 184 -176 -140 -387 -464 -333 -303 -333 -234 236 -401 -323 -404 -537 -87 368 -618 -468 -318 -506 -406 -229 -459 -345 -504 -224 -425 -391 -423 -518 -560 -515 26 -225 -151 129 -332 -219 -50 -285 -113 -276 -174 -92 179 -42 -101 232 88 -265 -313 -219 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 9 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 21 nsites= 9 E= 3.5e-015 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.777778 0.111111 0.000000 0.000000 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.222222 0.000000 0.000000 0.000000 0.000000 0.000000 0.555556 0.222222 0.000000 0.000000 0.000000 0.222222 0.000000 0.000000 0.000000 0.000000 0.222222 0.000000 0.000000 0.222222 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.000000 0.111111 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.222222 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.111111 0.555556 0.444444 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.333333 0.000000 0.222222 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.111111 0.000000 0.333333 0.444444 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.444444 0.000000 0.111111 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.666667 0.000000 0.000000 0.000000 0.000000 0.333333 0.222222 0.000000 0.000000 0.000000 0.000000 0.222222 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.444444 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.555556 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.888889 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.222222 0.222222 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.000000 0.444444 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.666667 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.000000 0.111111 0.444444 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.222222 0.111111 0.000000 0.000000 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.444444 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.333333 0.111111 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.111111 0.000000 0.000000 0.222222 0.000000 0.000000 0.222222 0.222222 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.000000 0.000000 0.222222 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.444444 0.000000 0.000000 0.000000 0.000000 0.111111 0.777778 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.111111 0.000000 0.000000 0.222222 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.222222 0.000000 0.000000 0.333333 0.111111 0.000000 0.000000 0.000000 -------------------------------------------------------------------------------- Time 458.65 secs. ******************************************************************************** ******************************************************************************** MOTIF 10 width = 29 sites = 7 llr = 376 E-value = 5.3e-014 ******************************************************************************** -------------------------------------------------------------------------------- Motif 10 Description -------------------------------------------------------------------------------- Simplified A ::14::::316::::1::::::::::::: pos.-specific C ::::::::::::::::::::::::::::: probability D :::::6:::::::3:::::::::7::::: matrix E :::::::3:::::::6::6::3:1:1::: F ::::::9::::6::::6::1::::::::: G 4:::::::::3:::7::1::::::::9:: H ::::::::::::::::::::::::::::: I :::3:::::3::::::11:6::1::1::: K :73:3::6:1::1431:63:3:1::3:71 L :::1:::1::::::::1::::7::11::: M ::::::::1:::::::::::::::::::: N 4::::1::31:::3:::::::::::::1: P 1:1163::::::::::::::::::::::: Q :::::::::1::::::::::::1:::::: R ::1::::::::::::1::::4::::3:1: S ::::::1:::1:::::::::1:61::1:: T :33:1:::11::::::::1:1:::1:::: V :::::::::::34:::1::3::::::::: W ::::::::::::::::::::::::::::6 Y ::::::::1::14::::1::::::7:::3 bits 6.2 5.6 5.0 4.3 * Information 3.7 * * * content 3.1 ** * ** * ** *** (77.5 bits) 2.5 ** ***** *************** *** 1.9 ***************************** 1.2 ***************************** 0.6 ***************************** 0.0 
----------------------------- Multilevel GKKAPDFKAIAFVKGEFKEIRLSDYKGKW consensus NTTIKP EN GVYDK KVKE R Y sequence N -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 10 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ----------------------------- 13541053 5 1.28e-26 MSLV NKAAPDFEANAFVNGEVKKIRLSSYRGKW VVLFFYPADF 6850955 5 8.03e-26 MVSV GKKAPDFEMAGFYKGEFKTFRLSEYLGKW VVLCFYPGDF 9955016 8 5.16e-25 XSGNARI GKPAPDFKATAVVDGAFKEVKLSDYKGKY VVLFFYPLDF 19173077 3 1.71e-21 MF PKTLTDSKYKAFVDGEIKEISLQDYIGKY VVLAFYPLDF 20151112 4 4.55e-21 SLI NTKIKPFKNQAFKNGEFIEVTEKDTEGRW SVFFFYPADF 11467494 8 9.97e-21 MTNFPKI GKTPPNFLTIGVYKKRLGKIRLSDYRGKK YVILFFYPAN 13186328 3 7.45e-19 ML NTRIKPFKNISYYKKKFYEIKEIDLKSNW NVFFFYPYSY -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 10 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 13541053 1.3e-26 4_[10]_141 6850955 8e-26 4_[10]_170 9955016 5.2e-25 7_[10]_161 19173077 1.7e-21 2_[10]_146 20151112 4.5e-21 3_[10]_154 11467494 1e-20 7_[10]_168 13186328 7.4e-19 2_[10]_145 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 10 in BLOCKS format -------------------------------------------------------------------------------- BL MOTIF 10 width=29 seqs=7 13541053 ( 5) NKAAPDFEANAFVNGEVKKIRLSSYRGKW 1 6850955 ( 5) GKKAPDFEMAGFYKGEFKTFRLSEYLGKW 1 9955016 ( 8) GKPAPDFKATAVVDGAFKEVKLSDYKGKY 1 19173077 ( 3) PKTLTDSKYKAFVDGEIKEISLQDYIGKY 1 20151112 ( 4) 
NTKIKPFKNQAFKNGEFIEVTEKDTEGRW 1 11467494 ( 8) GKTPPNFLTIGVYKKRLGKIRLSDYRGKK 1 13186328 ( 3) NTRIKPFKNISYYKKKFYEIKEIDLKSNW 1 // -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 10 position-specific scoring matrix -------------------------------------------------------------------------------- log-odds matrix: alength= 20 w= 29 n= 13619 bayes= 9.90401 E= 5.3e-014 -216 -281 -99 -207 -449 255 -84 -455 -242 -453 -365 306 101 -172 -238 -88 -194 -421 -417 -302 -284 -314 -338 -261 -473 -344 -127 -359 330 -365 -272 -199 -378 -116 13 -223 145 -378 -363 -322 40 -232 -138 -64 -293 -205 11 -239 160 -225 -123 -54 86 20 151 -40 182 -229 -259 -165 191 -98 -362 -291 -158 -265 -133 189 -277 57 -9 -229 91 -187 -213 -125 -107 -43 -204 -146 -159 -268 -279 -241 -371 -290 -144 -318 80 -300 -251 -222 368 -149 -159 -134 38 -306 -379 -325 -278 -330 321 -121 -444 -214 -84 -437 -255 -441 -363 149 156 -170 -254 -119 -221 -431 -413 -299 -336 -205 -452 -443 406 -439 -274 -233 -460 -158 -169 -383 -395 -394 -404 -97 -367 -291 -170 -18 -200 -316 -151 171 -396 -281 -55 -308 263 8 -197 -132 -290 -17 8 -139 -169 -300 -330 -250 123 -134 -169 -106 -177 -206 -13 -96 -97 -127 206 210 -226 -30 -75 -46 108 -109 -202 168 42 -199 -113 -44 -249 -185 24 139 81 -188 -88 132 -195 174 -17 -20 103 -182 -234 -138 281 -87 -395 -375 -381 143 -265 -331 -393 -342 -238 -283 -348 -293 -314 123 -160 -213 -366 -360 -356 -235 -474 -456 366 -449 -116 -228 -450 -185 -175 -351 -419 -342 -365 -294 -343 77 -105 207 -204 -176 -332 -264 -51 -317 -30 -114 65 -160 -93 -210 -329 -149 -121 -174 -170 189 -129 363 -238 -319 192 -124 -426 -173 -39 -421 217 -402 -315 265 -298 -98 -145 -71 -166 -407 -400 -264 -226 -310 -275 -324 -471 353 -209 -442 56 -467 -345 -188 -388 -269 -217 -194 -313 -421 -368 -358 39 -311 -102 260 -373 -254 -42 -302 94 -283 -183 -107 -258 -1 152 -105 -142 -285 -325 -233 -305 -204 -493 -457 362 -448 
-283 63 -458 30 -66 -393 -415 -360 -388 -295 -284 13 -217 -92 -200 -238 -255 -183 -290 -10 -75 20 301 -242 -153 -148 -303 -79 -3 -145 -163 -236 -275 120 -185 -365 -71 272 -425 -273 -82 -338 159 -322 -223 -127 -276 -15 -81 -136 101 -314 -379 -281 -295 -240 -440 -425 -7 -442 -313 339 -398 -58 -46 -343 -443 -340 -362 -301 -232 126 -347 -267 -206 -291 -242 -149 -389 -275 -30 -305 189 -280 -185 -119 -295 -10 302 111 107 -300 -304 -232 -291 -271 -281 120 -207 -402 -220 -119 -300 286 -24 -305 -371 -192 -257 -278 -262 -194 -304 -258 -138 -201 -173 -106 -298 -230 -27 38 70 -237 -137 -75 -243 153 -22 306 -6 -239 -273 -188 -320 -339 369 -32 -438 -341 -157 -400 -363 -426 -342 -50 -428 -251 -320 -90 -323 -405 -386 -329 -205 -166 -320 -286 75 -323 72 -169 -285 2 -105 -227 -310 -210 -210 -162 55 -200 -60 435 -160 -253 -191 73 -322 -243 -12 67 179 12 -141 -93 -256 3 241 -89 -119 -244 -275 -193 -203 -289 -280 -343 -465 365 -229 -439 -339 -478 -343 -197 -383 -314 -270 -12 -312 -412 -360 -362 -347 -330 -444 -368 -533 -422 -209 -384 356 -429 -325 -56 -421 -221 66 -325 -318 -451 -385 -390 -415 -304 -443 -432 -87 -426 -145 -373 -50 -266 -257 -334 -450 -296 -290 -344 -367 -378 574 217 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 10 position-specific probability matrix -------------------------------------------------------------------------------- letter-probability matrix: alength= 20 w= 29 nsites= 7 E= 5.3e-014 0.000000 0.000000 0.000000 0.000000 0.000000 0.428571 0.000000 0.000000 0.000000 0.000000 0.000000 0.428571 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.714286 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 
0.000000 0.142857 0.000000 0.142857 0.000000 0.285714 0.000000 0.000000 0.000000 0.428571 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.142857 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.571429 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.571429 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.857143 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.571429 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.285714 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.142857 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.142857 0.000000 0.000000 0.142857 0.000000 0.142857 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.571429 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.571429 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.428571 0.000000 0.428571 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.428571 0.000000 0.000000 0.285714 0.000000 
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.714286 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.571429 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.571429 0.000000 0.000000 0.142857 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.142857 0.571429 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.571429 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.571429 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.428571 0.142857 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.714286 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.142857 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.571429 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.714286 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 
0.000000 0.142857 0.000000 0.000000 0.714286 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.142857 0.285714 0.142857 0.000000 0.000000 0.000000 0.000000 0.285714 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.857143 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.714286 0.000000 0.000000 0.142857 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.571429 0.285714 -------------------------------------------------------------------------------- Time 495.64 secs. ******************************************************************************** ******************************************************************************** SUMMARY OF MOTIFS ******************************************************************************** -------------------------------------------------------------------------------- Combined block diagrams: non-overlapping sites with p-value < 0.0001 -------------------------------------------------------------------------------- SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 1091044 2.42e-23 42_[2(3.68e-13)]_14_[4(1.02e-10)]_36_[9(1.31e-16)]_15 11467494 1.57e-72 7_[10(9.97e-21)]_1_[2(1.72e-12)]_12_[4(1.23e-14)]_9_[8(7.88e-10)]_7_[5(2.81e-37)]_36 11499727 6.74e-33 15_[1(2.44e-24)]_7_[3(2.40e-21)]_25 1174686 2.97e-30 20_[1(1.08e-21)]_9_[3(4.48e-21)]_40 12044976 2.53e-29 17_[1(3.84e-20)]_6_[3(1.30e-20)]_21 13186328 1.31e-65 2_[10(7.45e-19)]_[2(3.67e-11)]_12_[4(1.37e-14)]_4_[8(8.70e-10)]_8_[5(1.48e-32)]_18 13358154 6.02e-27 19_[1(7.89e-22)]_7_[3(1.07e-17)]_22 13541053 1.55e-86 
4_[10(1.28e-26)]_[2(1.14e-19)]_12_[4(3.58e-15)]_6_[8(1.69e-05)]_8_[5(3.96e-39)]_12 13541117 1.86e-50 5_[6(3.66e-12)]_5_[2(4.55e-16)]_12_[4(7.49e-14)]_2_[8(5.14e-13)]_19_[7(1.24e-17)]_22 135765 5.48e-36 15_[1(4.41e-25)]_7_[3(2.50e-23)]_23 1388082 1.52e-30 27_[1(6.65e-24)]_7_[3(8.53e-20)]_18 140543 2.16e-26 42_[1(5.53e-21)]_6_[3(5.14e-18)]_21 14286173 1.16e-52 4_[6(9.90e-05)]_5_[2(7.50e-16)]_12_[4(3.08e-13)]_8_[8(3.42e-07)]_8_[5(6.53e-29)]_48 14578634 1.79e-24 55_[6(3.40e-15)]_2_[1(9.96e-13)]_28_[8(3.64e-05)]_8_[8(1.47e-07)]_25 14600438 5.52e-50 4_[6(5.44e-15)]_5_[2(1.61e-16)]_12_[4(5.52e-13)]_2_[8(2.48e-09)]_15_[7(2.40e-13)]_33 15218394 4.73e-28 21_[1(9.96e-23)]_7_[3(9.80e-19)]_26 15597673 1.63e-22 132_[6(1.26e-10)]_2_[1(9.96e-23)]_29_[3(7.76e-06)]_36 15599256 9.34e-30 24_[1(9.60e-24)]_7_[3(8.53e-20)]_200 15602312 1.02e-24 42_[6(2.41e-10)]_2_[1(7.24e-13)]_30_[3(1.02e-10)]_22 15605725 1.09e-23 36_[6(1.34e-11)]_2_[1(3.99e-23)]_58 15605963 3.45e-22 21_[6(3.89e-05)]_5_[2(5.32e-10)]_10_[4(8.06e-12)]_34_[7(1.42e-14)]_22 15609375 1.05e-35 4_[6(2.46e-13)]_6_[2(3.44e-12)]_12_[4(1.30e-10)]_2_[8(7.88e-10)]_15_[7(1.61e-13)]_22 15609658 5.42e-42 9_[6(3.44e-16)]_5_[2(2.82e-14)]_12_[4(7.40e-07)]_2_[8(1.88e-14)]_19_[7(2.98e-12)]_18 15613511 8.97e-47 3_[6(1.82e-19)]_5_[2(7.50e-16)]_12_[4(2.74e-12)]_2_[8(1.11e-08)]_19_[7(4.69e-14)]_21 15614085 9.35e-29 34_[6(5.78e-05)]_3_[1(1.64e-21)]_33_[3(4.93e-14)]_28 15614140 3.44e-39 38_[6(4.56e-12)]_2_[1(8.21e-21)]_30_[3(2.81e-13)]_27 15615431 5.03e-37 47_[6(3.66e-12)]_2_[1(1.79e-20)]_33_[3(3.09e-12)]_23 15643152 2.42e-25 25_[2(1.21e-12)]_8_[4(1.62e-11)]_2_[8(9.40e-09)]_11_[7(4.69e-14)]_204 15672286 6.37e-29 19_[6(9.16e-14)]_4_[2(6.58e-10)]_10_[4(1.32e-09)]_31_[9(3.35e-13)]_16 15790738 2.18e-39 35_[10(1.17e-05)]_2_[2(8.20e-17)]_12_[4(6.75e-12)]_2_[8(2.25e-12)]_17_[7(1.98e-14)]_44 15791337 3.50e-28 [1(2.90e-19)]_7_[3(6.61e-19)]_24 15801846 6.95e-27 21_[6(1.08e-09)]_5_[2(1.51e-09)]_10_[4(1.69e-13)]_35_[7(7.71e-12)]_20 15805225 1.36e-39 
47_[6(4.65e-14)]_2_[1(3.96e-26)]_25_[8(1.16e-09)]_6_[8(6.83e-05)]_25 15805374 3.44e-37 53_[6(3.79e-11)]_2_[1(2.01e-27)]_25_[8(8.63e-09)]_6_[8(6.72e-06)]_22 15807234 6.99e-52 4_[6(3.01e-17)]_6_[2(5.16e-16)]_12_[4(7.89e-15)]_2_[8(5.59e-09)]_13_[7(5.06e-17)]_22 15826629 1.24e-25 33_[2(7.69e-13)]_14_[4(2.10e-10)]_38_[9(6.10e-18)]_17 15899007 5.23e-28 47_[1(3.01e-22)]_6_[3(2.81e-21)]_24 15899339 6.23e-45 4_[6(3.56e-09)]_6_[2(1.24e-16)]_12_[4(1.59e-15)]_2_[8(1.28e-09)]_18_[7(1.72e-17)]_22 15964668 3.66e-26 35_[2(1.20e-14)]_14_[4(1.58e-07)]_36_[9(1.20e-20)]_17 15966937 2.09e-23 59_[1(3.99e-23)]_7_[3(2.09e-15)]_206 15988313 1.83e-20 36_[6(1.33e-06)]_2_[1(4.16e-22)]_98 16078864 1.65e-46 35_[6(6.39e-16)]_2_[1(7.26e-28)]_28_[8(5.84e-10)]_6_[7(1.36e-10)]_16 16123427 2.80e-28 51_[1(5.03e-23)]_7_[3(1.65e-18)]_29 16125919 1.43e-20 25_[10(2.05e-05)]_1_[2(3.85e-14)]_12_[4(4.39e-07)]_31_[7(6.86e-11)]_23 16330420 7.96e-33 45_[6(9.44e-17)]_6_[2(5.37e-09)]_35_[8(6.86e-11)]_36_[7(1.01e-12)]_25 1633495 1.63e-31 19_[1(1.08e-23)]_7_[3(4.19e-20)]_24 16501671 6.10e-67 4_[6(1.80e-07)]_11_[2(1.75e-15)]_12_[4(9.92e-16)]_9_[8(2.22e-06)]_8_[5(2.15e-40)]_32 1651717 1.52e-21 32_[1(5.02e-18)]_7_[3(1.70e-15)]_29 16759994 3.23e-25 58_[1(2.73e-21)]_27_[3(1.13e-12)]_25 16761507 5.77e-31 51_[1(4.41e-25)]_7_[3(5.79e-19)]_23 16803644 4.66e-62 38_[2(2.64e-17)]_12_[4(6.75e-12)]_9_[8(1.67e-10)]_8_[5(5.68e-41)]_11 16804867 1.08e-24 17_[1(1.82e-21)]_7_[3(4.74e-16)]_24 17229033 2.65e-27 36_[2(8.66e-12)]_14_[4(1.43e-12)]_35_[9(3.84e-19)]_107 17229859 4.12e-27 20_[1(6.33e-23)]_7_[3(3.99e-17)]_23 1729944 3.67e-17 16_[1(2.50e-16)]_7_[3(1.23e-12)]_26 17531233 1.02e-26 26_[1(7.10e-22)]_6_[3(9.82e-20)]_25 17537401 7.66e-21 25_[1(1.21e-20)]_35_[3(1.42e-11)]_31 17547503 5.49e-26 38_[6(6.75e-05)]_2_[1(1.68e-25)]_1_[4(2.18e-08)]_27_[8(2.08e-05)]_20 18309723 5.43e-36 261_[6(7.05e-12)]_2_[1(4.41e-25)]_37_[3(5.20e-13)]_24 18313548 5.34e-50 5_[6(7.88e-15)]_7_[2(6.63e-16)]_12_[4(3.76e-16)]_2_[8(6.00e-13)]_19_[7(1.71e-15)]_25 
18406743 6.87e-21 42_[1(1.34e-20)]_33_[3(7.89e-10)]_69_[1(2.86e-07)]_33_[3(1.70e-10)]_69_[1(1.48e-21)]_33_[3(9.84e-09)]_125 19173077 4.24e-66 2_[10(1.71e-21)]_[2(1.46e-17)]_12_[4(6.34e-17)]_32_[5(3.86e-28)]_14 19554157 3.34e-14 13_[1(3.42e-14)]_27_[8(1.02e-08)]_37 19705357 1.78e-31 14_[6(2.53e-06)]_2_[1(1.66e-24)]_38_[8(2.72e-09)]_45 19746502 1.82e-28 67_[6(5.44e-15)]_2_[1(2.34e-15)]_35_[3(3.66e-13)]_24 20092028 2.59e-33 5_[1(1.89e-24)]_7_[3(3.13e-20)]_23 20151112 6.22e-71 3_[10(4.55e-21)]_[2(4.55e-16)]_12_[4(6.13e-16)]_6_[8(2.19e-05)]_8_[5(3.57e-32)]_25 21112072 8.72e-28 34_[2(3.47e-14)]_13_[4(1.25e-13)]_36_[9(1.16e-17)]_18 21222859 3.39e-31 78_[6(2.90e-06)]_2_[1(2.01e-21)]_27_[8(1.34e-10)]_12_[7(2.46e-11)]_21 21223405 6.27e-55 37_[2(1.05e-13)]_12_[4(1.53e-13)]_6_[8(3.84e-11)]_7_[5(9.87e-34)]_19 21227878 4.61e-31 7_[6(1.34e-11)]_5_[2(5.85e-13)]_12_[4(5.52e-13)]_33_[7(2.46e-11)]_20 21283385 3.54e-29 20_[6(2.53e-14)]_5_[2(6.53e-09)]_9_[4(1.08e-12)]_32_[7(3.38e-11)]_21 21674812 6.74e-49 4_[6(3.37e-19)]_5_[2(1.97e-15)]_12_[4(3.58e-15)]_2_[8(8.61e-11)]_12_[7(3.97e-11)]_21 23098307 3.79e-49 28_[6(1.05e-16)]_2_[1(5.65e-23)]_30_[3(2.85e-15)]_24 2649838 3.34e-07 31_[3(5.25e-15)]_33 267116 8.58e-25 18_[1(4.58e-24)]_6_[3(3.07e-13)]_22 27375582 2.01e-26 48_[6(2.50e-05)]_10_[1(1.63e-20)]_28_[3(1.67e-11)]_29 2822332 2.00e-30 32_[1(2.73e-21)]_5_[3(2.06e-22)]_24 30021713 5.30e-48 52_[6(1.18e-16)]_2_[1(1.70e-27)]_34_[3(1.49e-13)]_24 3261501 4.00e-17 17_[1(3.79e-13)]_6_[3(1.56e-14)]_43 3318841 9.83e-45 33_[2(6.63e-16)]_12_[4(4.55e-13)]_12_[8(1.41e-09)]_14_[5(8.83e-27)]_50 3323237 3.75e-28 17_[1(1.20e-21)]_7_[3(8.53e-20)]_23 4155972 5.34e-21 15_[1(2.21e-19)]_7_[3(5.25e-15)]_24 4200327 6.70e-24 77_[1(6.73e-20)]_7_[3(1.65e-18)]_24 4433065 2.70e-43 15_[2(2.10e-16)]_12_[4(1.59e-19)]_9_[8(3.89e-10)]_13_[7(1.19e-13)]_13 4704732 3.13e-25 37_[2(2.05e-14)]_14_[4(3.60e-14)]_36_[9(2.30e-13)]_16 4996210 7.00e-45 33_[2(4.01e-16)]_12_[4(9.20e-14)]_38_[5(2.00e-31)]_49 5326864 3.94e-25 
46_[2(5.55e-15)]_14_[4(2.04e-08)]_31_[9(2.32e-16)]_17 6322180 5.27e-28 65_[6(1.10e-10)]_7_[2(4.26e-14)]_11_[4(3.49e-11)]_2_[8(1.28e-11)]_56 6323138 1.01e-24 48_[2(9.94e-11)]_15_[4(1.69e-13)]_36_[9(3.52e-18)]_18 6687568 3.18e-29 18_[1(8.90e-23)]_6_[3(1.30e-20)]_22 6850955 2.59e-79 4_[10(8.03e-26)]_[2(1.16e-13)]_12_[4(7.15e-18)]_9_[8(1.67e-10)]_8_[5(5.66e-34)]_38 7109697 5.90e-32 16_[1(3.57e-24)]_6_[3(2.06e-22)]_20 7290567 5.98e-30 19_[1(5.05e-25)]_7_[3(2.34e-20)]_27 9955016 8.64e-90 7_[10(5.16e-25)]_[2(2.52e-19)]_12_[4(5.72e-19)]_9_[8(8.63e-09)]_8_[5(7.82e-38)]_29 15677788 1.21e-24 24_[6(2.67e-13)]_2_[1(5.24e-13)]_31_[3(7.37e-07)]_30 -------------------------------------------------------------------------------- ******************************************************************************** ******************************************************************************** Stopped because nmotifs = 10 reached. ******************************************************************************** CPU: bmf1 ******************************************************************************** PyCogent-1.5.3/doc/data/primate_brca1.fasta000644 000765 000024 00000047175 11261304034 021457 0ustar00jrideoutstaff000000 000000 >Galago TGTGGCAAAAATACTCATGCCAGCTCATTACAGCATGAGAGCAGTTTATTACTCACTAAA GACAAAATGAATGTAGAAAAGGCTGAATTTTGTAATAAAAGCAAACAGCCTGGCTTAGCA AGGAGCCAACAGAGCAGATCGGCTCAAAGTAAGGAAACATGCAATGATAGGCACACTTGC AGCCCTGAGCAAAAGGTAGATCTGAATACTGCTCCCCCATATGGGAGAAAAGAACAGAAT AAGGAGAAACTTCTATGCTCCAAGAATCCTAGAGATAGCCAAGATGTTCCTTGGATAACA CTAAATAGCAGCATTCAGAAAGTTAATGAATGGTTTTCTAGAAGTGATGAAATGTTAACT TCTGATGACTCACATGATGAGGGTTCTGAATCACATGCTGAAGTAGCTGGAGCCTTAGAA GTTCCAAGTGAAGTAGATGGATATTCCAGTTCCTCAGAGAAAATAGACTTACTGGCCAGT GATCCTCATTATCCTATAATATGTAAAAGTGAAAGAGTTCACTCCAAACCAATAAAGAGT AAAGTTGAAGATAAAATATTTGGGAAAACTTATCGGAGGAAGGCAAGCCTCCCTAACTTA AGCCATGTAACTGAAAATCTAATTATAAGAGCAGCTGCTACTGAGCCACAGATAACACAA GAGTGTTCCCTCACAAATAAATTAAAACGTAAAAGGAGAACTACATCAGGTCTTTGTCCT 
GAGGATTTTATCAAGAAGGCAGATTTGGCAGTTCAAAAGACACCTGAAAAGAGAATTCAG GGAACTAACCAAGTGGATCAGAATAGTCACGTGGTAAATATTACTAATAGTGGTTATGAG AATGAAACAAAAGGTGATTATGTTCAGAATGAAAAAAATGCTAACTCAACAGAATCATTG GAAAAAGAATCTCTCGGAACTAAAGCTGAACCTATAAGCAGCAGTATAAGTAATATGAAA TTAGAATTAAATATTCACAATTCAAAAGCAAGTAAAAAGAAAAGGCTGAGGAAGAAGTCT TCTAGCAGGCATATTCGTGCACTTGAACTAGTAGTCAATAAAAATCCAAGCCCTCCTAAT CATACCAACCTACAAATTGACAGTTGTTCTAGCAGTGAAGAAATAAAGGATAAAAGTTCT GACCAAATACCAGTCAGGCATAGCAGAAAGCCTGGACTCATGGAAGATAGAGAACCTGCA ACTGGAGCCAAGAAAAGTAACAAGCCAAATGAGCAAATAAGTAAAAGACATGTCAGTGAT ACTTTCCCAGAAGTGGCATTAACAAATATATCTAGTTTTTTTACTAACTGTTCAGGTTCT AATAGAAAAGAATTTGTCAATCCTAGCCTTCAAAGAAAAAAAACAGAAGAGAACGAAGAA ACAATTCAAGTGTCTAATAGTACCAAAGGTCCGGTGTTAAGTGGAGAAAGGGTTTTGCAA ATTGAAAGTGAAGAAAGATCTATAAAAAGCACCAGTATTTCATTGGTACCTGATACTGAT TATGGTACTCAGGACAGTAACTCGTTACTGAAAGTTAAAGTCTTACGGAAGGTGAAAACA GCACCAAATAAACATGCAAGTCAGGGTACAGCCACTGAAAACCCCAAGGAACTAATCCAT GGTTGCTCTAAAGATACTGGAAATGACACAGAGGGCTATAAGGATCCATTGAGACATGAA ATTAACCACATTCAGAAGATAAGCATGGAAATGGAAGACAGTGAACTTGATACTCAGTAT TTACAGAATACATTCAAGTTTTCAAAGCGTCAGTCGTTTGCTCTGTTTTCAAACCTAGGA AAGGAATGTGCAACAGTCTGTGCCCAGTCTCTCTCTGCGTCCTTAAGAAAAGGTTCAAAA GTCATTCTTGAATGTGAACAAATAGAAAATCCAGGAATGAAAGAGCCTAAAATCAAGCAT ATACAGGGAAATAATATCAATACAGGCTTCTCTGTAGTTTGTCAGAAAGATAAGACAGAT GATTATGCCAAATACATCAAAGAAGCATCTAGGTTTTGTTTGTCAAATCAGTTTCGAGAC AATGAAACTGAATCCATTACTGTAAATAAACTTGGAATTTTACAAAACCTCTATCATATA CCACCACTTTCTCCTATCAGGCTATTTGATAAAACTAAATGTAATACAAACCTGTTAGAG GAAAGGTTTGAAGAACATTCAGTGTTACCTGAAAAAGCAGTAGGAAACGAGAACGTTCCA AGTACAATGAATACAATTAACCAAAATAACAGAGAAAGTGCTTATAAAGAAGCCAGTTCA AGCAGTATCAATGAAGTAAGCTCGAGTACTAATGAAGTGGGCTCCAGTGTTAACGAAGTA GGCCCCAGTAGTGAAAACATTCAAGCAGAACTAGATAAAAACAGAGGACCTAAGTTGAAT GCTGTGCTTAGATTAGGTCTTATGCAACCTGAAGTCTATAAACAAAATCTTCCTATAAGT AATTGTGAACATCCTAAAATAAAAGGGCAAGAAGAAAATGGAGTAGTTCAACCTGTTAAT CCAGATTTTTCTTCATGTCTAATTTCAGATAACCTAGAACAACCTACGAGAAGTAGTCAT GCTTCTCAGCTTTGTTCTGAGACACCTGATGACTTATTAGTTGATGATGAACTAAAGGAA 
AATACCAGTTTTGCTGAAAATAACATTAAGGAAAGATCTGCTGTTTTTAGCAAAAATGTC ATGAGAAGAGAGATTAGCAGGAGCCCTAGCCCTTTAGCCCATATACATTTGACTCAGGCT CACCAAAGAGAGGTTAGGAAATTAGAGTCCTCAGAAGAGAACATGTCTAGTGAA >HowlerMon TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTGTTACTCACTAAA GACACACTGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCA AGGAGCCAACATAACAGATGGGCTGAAAGTGAGGAAACATGTAATGATAGGCAGACTCCC AGCACAGAGAAAAAGGTAGATGTGGATGCTGATCCCCTGCATGGGAGAAAAGAATGGAAT AAGCAGAAACCTCCGTGCTCTGAGAATCCTAGAGATACTGAAGATGTTGCTTGGATAATG CTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAACT TCTGATGACTCACATGATGGGGGGTCTGAATCAAATGCCAAAGTAGCTGAAGCATTGGAA GTTCTAAATGAGGTAGATGGATATTCTAGTTCTTCAGAGAAAATAGACTTACTGGCCAGT GATCCTCATGATCATTTGATATGTAAAAGTGAAAGAGTTCACTGCAAATCAGTAGAGAGT AGTATTGAAGATAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCTAACTTG AGCCACGTAACTGAAAATCTAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA GAGCATCCTCTCACAAATAAATTAAAGCGTAAAAGGAGAGTTACATCAGGACTTCATCCT GAGGATTTTATCAAGAAAGCAGATTTGGCAGTTCAAAAGACTCCTGAAAAGATAAATCAG GGAACTAACCAAACAGAGCGGAATGATCAAGTGATGAATATTACTAACAGTGGTCATGAG AATAAAACAAAAGGTGATTCTATTCAGAATGAGAACAATCCTAACCCAGTAGAATCACTG GAAAAAGAATCATTCAAAAGTAAAGCTGAACCTATAAGCAGTAGTATAAGCAATATGGAA TTAGAATTGAATGTCCACAATTCCAAAGCATCTAAAAAGAATAGGCTGAGAAGGAAGTCT TCTACCAGGCATATTCATGAGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAAT TATACTGAAGTACAAATTGATAGTTGTTCTAGCAGTGAAGAGATAAAGAAAAAAAATTAC AACCAAATGCCAGTCAGGCACAGCAGAAAGCTACAACTCATGGAAGATAAAGAACGTGCA GCTAGAGCCAAAAAGAGTAGCAAGCCAAATGAACAAACAAGTAAAAGACATGCCAGTGAT ACTTTCCCAGAACTGAGGTTAACAAACATACCTGGTTCTTTTACTAACTGTTCAAATACT AATGAAAAAGAATTTGTCAATCCTAGCCTTCCAAGAGAACAAACAGAAGAGAAACTAGAA ACAGTTAAACTGTCTAATAATGCCAAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGT GTTTTGCAAATTGAAAGATCTGTAGAGAGTAGCAGTATTTTGTTGATACCTGGTACTGAT TATGGCACTCAGGAAAGTATCTCATTACTGGAAGTTAGCACTCTGGGGAAGGCAAAAACA GAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGAACTAATTCAT GGTTGTTCTAAAGATACTAGAAATGGCACAGAAGGCTTGAAGTATCCATTGGGACCTGAA GTTAACTACAGTCAGGAAACAAGCATAGATATGAGAGAAAGTGAACTTGATACTCAATAT 
TTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCTGTTTTCAAATCCAGGA AAGGAATGTGCAACATTCTCTGCCTGCTCTAGGTCCTTAAAGAAACAAAGTCCAAAGGTC ACTCCTGAATGTGAACAAAAGGAAGAAAATCAAGGAGAGAAAGAGTCTAATATCGAGCTT GTAGAGACAGTTAATACCACTGCAGGCTTTCCTATGGTTTGTCAGAAAGATAAGCCAGTT GATTATGCCAGATGTATCGAAGGAGGCTCTAGGCTTTGTCTATCATCTCAGTTCAGAGGC AACGAAACTGGACTCATTATTCCAAATAAACATGGACTTTTACAGAACCCATATCATATG TCACCGCTTATTCCCACCAGGTCATTTGTTAAAACTAAATGTAAGAAAAACCTGCTAGAA GAAAACTCTGAGGAACATTCAATGTCACCTGAAAGAGCAATGGGAAACAAGAACATTCCA AGTACAGTGAGCACAATTAGCCATAATAACAGAGAAAATGCTTTTAAAGAAACCAGCTCA AGCAGTATTTATGAAGTAGGTTCCAGTACTAATGAAGCAGGTTCTAGTACTAATGAAGTA GGCTCCAGTGATGAAAACATTCAAGCAGAGCTAGGTAGAAACAGAAGGCCAAAATTGAAT GCTATGCTTAGATTAGGGCTTCTGCAACCTGAGATTTGTAAGCAAAGTCTTCCTATAAGT GATTGTAAACATCCTGAAATTAAAAAGCAAGAACATGAAGAAGTAGTTCAGACTGTTAAT ACAGACGTCTCTCTATGTCTGATTTCATATAACCTAGAACAGCATATGGGAAGCAGTCAT ACATCTCAGGTTTGTTCTGAGACACCTGACAACCTGTTAGATGATGGTGAAATAAAGGAA GATACTAGTTTTGCTGAATATGGCATTAAGGAGACTTCTACTGTTTTTAGCAAAAGTGTC CAGAGAGGAGAGCTCAGCAGGAGCCCTAGCCCTTTCACCCATACACATTTGGCTCAGGTT TACCAAAGAGGGGCCAAGAAATTAGAGTCCTCGGAAGAGAATTTATCTAGTGAG >Rhesus TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTGTTACTCACTAAA GACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTGGCA AGGAGCCAACATAACAGATGGACTGGAAGTAAGGAAACATGTAATGATAGGCAGACTCCC AGCACAGAGAAAAAGGTAGATCTGAATGCTAATGCCCTGTATGAGAGAAAAGAATGGAAT AAGCAAAAACTGCCATGCTCTGAGAATCCTAGAGACACTGAAGATGTTCCTTGGATAACA CTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAAGT TCTGATGACTCACATGATGGGGGGTCTGAATCAAATGCCAAAGTAGCTGATGTATTGGAC GTTCTAAATGAGGTAGATGAATATTCTGGTTCTTCAGAGAAAATAGACTTACTGGCCAGT GATCCTCATGAGCCTTTAATATGTAAAAGTGAAAGAGTTCACTCCAGTTCAGTAGAGAGT AATATTAAAGACAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAACCTTCCCAATTTA AGCCATGTAACTGAAAATCTAATTATAGGAGCACTTGTTACTGAGTCACAGATAATGCAA GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGAACTACATCAGGTCTTCATCCT GAGGATTTTATAAAGAAAGCAGATTTGGCAGTTCAAAAGACTCCTGAAATAATAAATCAG GGAACTAACCAAATGGAGCAGAATGGTCAAGTGATGAATATTACTAATAGTGCTCATGAG 
AATAAAACAAAAGGTGATTCTATTCAGAATGAGAAAAATCCTAACCCAATAGAATCACTG GAAGAAGAATCTTTCAAAACTAAAGCTGAACCTATAAGCAGCAGTATAAACAATATGGAA CTAGAATTAAATATCCACAATTCAAAAGCACCTAAAAAAAATAGGCTGAGGAGGAAGTCT TCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAAC TGTACTGAACTACAAATTGATAGTTGTTCTAGCAGTGAAGAGATAAAGAAAAAAAATTAC AACCAAATGCCAGTCAGGCACAGCAGAAACCTACAACTCATGGAAGATAAAGAATCTGCA ACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGACATGCCAGTGAT ACTTTCCCAGAACTGAAGTTAACAAAGGTACCTGGTTCTTTTACTAACTGTTCAAATACT AGTGAAAAAGAATTTGTCAATCCTAGCCTTTCAAGAGAAGAAAAAGAAGAGAAACTAGAA ACAGTTAAAGTGTCTAATAATGCCAAAGACCCCAAAGATCTCATCTTAAGTGGAGAAAGG GTTTTACAAACTGAAAGATCTGTAGAGAGTAGCAGTATTTCATTGGTACCTGGTACCGAT TATGGCACTCAGGAAAGTATCTCATTACTGGAAGTTAGCACTCTAGGGAAGGCAAAAACA GAACGAAATAAATGTATGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGAACTAATTCAT GGTTGTTCTGAAGATACTAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGAAGTGAA GTTAACCACAGTCAGGAAACAAGCATAGAAATAGAAGAAAGTGAACTTGATACTCAGTAT TTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCCTTTGCTCTGTTTTCAAATCCAGGA GAGGAATGTGCAACATTCTCTGCCCACTCTAGGTCCTTAAAGAAACAAAGTCCAAAAGTT ACTTCTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAAACAGTCTAATATCAAGCCT GTACAGACAGTTAATATCACTGCAGGCTTTTCTGTGGTTTGTCAGAAAGATAAGCCAGTT GATAATGCCAAATGTATCAAAGGAGGCTCTAGGTTTTGTCTATCATCTCAGTTCAGAGGC AACGAAACTGGACTCATTACTCCAAATAAACATGGACTGTTACAAAACCCATACCATATA CCACCACTTTTTCCTGTCAAGTCATTTGTTAAAACTAAATGTAACAAAAACCTGCTAGAG GAAAACTCTGAGGAACATTCAGTGTCACCTGAAAGAGCAGTGGGAAACAAGAACATTCCA AGTACAGTGAGCACAATTAGCCATAATAACAGAGAAAATGCTTTTAAAGAAGCCAGCTCG AGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGTGGGCTCCAGTATTAATGAAGTA GGTTCCAGTGATGAAAACATTCAAGCAGAACTAGGTAGAAACAGAGGGCCAAAATTGAAT GCTGTGCTTAGATTAGGGCTTTTGCAACCTGAGGTCTGTAAACAAAGTCTTCCTATAAGT AATTGTAAGCATCCTGAAATAAAAAAGCAAGAACATGAAGAATTAGTTCAGACTGTTAAT ACAGACTTCTCTCCATGTCTGATTTCAGATAACCTAGAACAGCCTATGGGAAGTAGTCAT GCGTCTGAGGTTTGTTCTGAGACTCCTGATGATCTGTTAGATGATGGTGAAATAAAGGAA GATACTAGTTTTGCTGAAAATGACATTAAGGAGAGTTCTGCTGTTTTTAGCAAAAGCATC CAGAGAGGAGAGCTCAGCAGGAGCCCTAGCCCTTTCACCCATACACATTTAGCTCAGGGT 
TACCGAAAAGAGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG >Orangutan TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA GACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCA AGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCAGACTCCC AGCACAGAAAAAAAGGTAGACCTGAATGCTGATCCCCTGTGTGAGAGAAAAGAATGGAAT AAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGATACTGAAGATGTTCCTTGGATAACA CTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGACGAACTGTTAGGT TCTGATGACTCACATGATGGGAGGTCTGAATCAAATGCCAAAGTAGCGGATGTATTGGAC GTTCTAAATGAGGTAGATGAATATTCTGGTTCTTCAGAGAAAATAGACTTACTGGCCAGT GATCCTCATGAGGCTTTAATTTGTAAAAGTGAAAGAGTTCACTCCAAATCAGTAGAGAGT AATATTGAAGACAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCCAACTTA AGCCATGTAACTGAAAATCTAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGAGCTACATCAGGCCTTCATCCT GAGGATTTTATCAAGAAAGCAGATTTGGCAGTTCAAAAGACTCCTGAAATGATAAATCAG GGAACTAACCAAATGGAGCAGAATGGTCAAGTGATGAATATTACTAATAGTGGTCATGAG AATAAAACAAAAGGTGATTCTATTCAGAATGAGAAAAATCCTAACCCAATAGAATCACTC GAAAAAGAATCTTTCAAAACAAAAGCTGAACCTATAAGCAGCAGTATAAGCAATATGGAA CTCGAATTAAATATCCATAATTCAAAAGCACCTAAAAAGAATAGGCTGAGGAGGAAGTCT TCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAAT TGTACTGAATTGCAAATTGATAGTTGTTCTAGCAGTGAAGAGATAAAGAAAAAAAAATAC AACCAAATGCCAGTCAGGCACAGCAGAAACCTACAACTCATGGAAGATAAAGAACCTGCA ACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGACATGACAGCGAT ACTTTCCCAGAGCTGAAGTTAACAAATGCACCTGGTTCTTTTACTAACTGTTCAAATACC AGTGAGAAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAAAAGAAGAGAAACTAGGA ACAGTTAAAGTGTCTAATAATGCCAAAGACCCCAAAGATCTCATGTTAAGTGGAGGAAGG GTTTTGCAAACTGAAAGATCTGTAGAGAGTAGCAGTATTTCATTGGTACCTGGTACTGAT TATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGGAAGGCAAAAACA GAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGAACTAATTCAT GGTTGTTTCAAAGATACTAGAAATGACACAGAAGGGTTTAAGTATCCATTGGGACATGAA GTTAACCACAGTCAGGAAACAAGCATAGAAATGGAAGAAAGTGAACTTGATACTCAGTAT TTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCTGTTTTCAAATCCAGGA GAGGAATGTGCAACATTCTCTGCCCACTCTAGGTCCTTAAAGAAACAAAGTCCAAAAGTC 
ACTTTTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAATGAGTCTAATATCAAGCCT GTACAGACAGCTAATATCACTGCAGGCTTTCCTGTGGTTTGTCAGAAAGATAAGCCAGTT GATTATGCCAAATGTATCAAAGGAGGCTCTAGGTTTTGTCTATCATCTCAGTTCAGAGGC AACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTCACAAAACCCATATCATATA CCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAACCTGCTAGAG GAAAACTCTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAACGAGAACATTCCA AGTACAGTGAGCATAATTAGCCGTAATAACAGAGAAAATGTTTTTAAAGAAGCCAGCTCA AGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGTGGGCTCCAGTATTAATGAAGTA GGTTCCAGTGATGAAAACATTCAAGCAGAACTAGGTAGAAGCAGAGGGCCAAAATTGAAT GCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTTTTCCTGGAAGT AATGGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAAGTACTTCAGACTGTTAAT ACAGACTTCTCTCCATGTCTGATTTCAGATAACCTAGAACAGCCTATGAGAAGTAGTCAT GCATCTCAGGTTTGTTCTGAGACACCTAATGACCTGTTAGATGATGGTGAAATAAAGGAA GATACTAGTTTTGCTGAAAATGACATTAAGGAAAGTTCTGCTGTTTTTAGCAAAAGCGTC CAGAGAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCATACACATTTGGCTCAGGGT TACCGAAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG >Gorilla TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA GACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAACAAACAGCCTGGCTTAGCA AGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCC AGCACAGAAAAAAAGGTAGATCTGAATGCTGATCCCCTGTGTGAGAGAAACGAATGGAAT AAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGATACTGAAGATGTTCCTTGGATAACA CTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGT TCTGATGACTCACATGATGGGGGGTCTGAATCAAATGCCAAAGTAGCTGATGTATTGGAC GTTCTAAATGAGGTAGATGAATATTCTGGTTCTTCAGAGAAAATAGACTTACTGGCCAGT GATCCTCATGAGGCTTTAATATGTAAAAGTGAAAGAGTTCACTCCAAATCAGTAGAGAGT AATATTGAAGACAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCCAGCTTA AGCCATGTAACTGAAAATCTAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGAGCTACATCAGGCCTTCATCCT GAGGATTTTATCAAGAAAGCAGATTTGGCAGTTCAAAAGACTCCTGAAATGATAAATCAG GGAACTAACCAAATGGAGCAGAATGGTCAAGTGATGAATATTACTAATAGTGGTCATGAG AATAAAACAAAAGGTGATTCTATTCAGAATGAGAAAAATCCTAACCCAATAGAATCACTA GAAAAAGAATCTTTCAAAACGAAAGCTGAACCTATAAGCAGCAGTATAAGCAATATGGAA 
CTCGAATTAAATATCCACAATTCAAAAGCGCCTAAAAAGAATAGGCTGAGGAGGAAGTCT TCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAAT TGTACTGAATTGCAAATTGATAGTTGTTCTAGCAGTGAAGAGATAAAGAAAAAAAAGTAC AACCAAATGCCAGTCAGGCACAGCAGAAACCTACAGCTCATGGAAGATAAAGAACCTGCA ACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGACATGACAGCGAT ACTTTCCCAGAGCTGAAGTTAACAAATGCACCTGGTTCTTTTACTAACTGTTCAAATACC AGTGAAAAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAAAAGAAGAGAAACTAGAA ACAGTTAAAGTGTCTAATAATGCCGAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGG GTTTTGCAAACTGAAAGATCTGTAGAGAGTAGCAGTATTTCATTGGTACCTGGTACTGAT TATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGGAAGGCAAAAACA GAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGGACTAATTCAT GGTTGTTCCAAAGATACTAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGACATGAA GTTAACCACAGTCGGGAAACAAGCATAGAAATGGAAGAAAGTGAACTTGATGCTCAGTAT TTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCTGTTTTCAAATCCAGGA GAGGAATGTGCAACATTCTCTGCCCACTCTAGGTCCTTAAAGAAACAAAGTCCAAAAGTC ACTTTTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAATGAGTCTAATATCAAGCCT GTACAGACAGTTAATATCACTGCAGGCTTTCCTGTGGTTTGTCAGAAAGATAAGCCAGTT GATTATGCCAAATGTATCAAAGGAGGCTCTAGGTTTTGTCTATCATCTCAGTTCAGAGGC AACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTTACAAAACCCATATCATATA CCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAACCTGCTAGAG GAAAACTTTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAATGAGAACATTCCA AGTACAGTGAGCACAATTAGCCGTAATAACAGAGAAAATGTTTTTAAAGAAGCCAGCTCA AGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGTGGGCTCCAGTATTAATGAAGTA GGTTCCAGTGATGAAAACATTCAAGCAGAACTAGGTAGAAACAGAGGGCCAAAATTGAAT GCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTCTTCCTGGAAGT AATTGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAAGTAGTTCAGACTGTTAAT ACAGATTTCTCTCCATGTCTGATTTCAGATAACTTAGAACAGCCTATGGGAAGTAGTCAT GCATCTCAGGTTTGTTCTGAGACACCTGATGACCTGTTAGATGATGGTGAAATAAAGGAA GATACTAGTTTTGCTAAAAATGACATTAAGGAAAGTTCTGCTGTTTTTAGCAAAAGCGTC CAGAGAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCATACACATTTGGCTCAGGGT TACCGAAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG >Human TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA 
GACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCA AGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCC AGCACAGAAAAAAAGGTAGATCTGAATGCTGATCCCCTGTGTGAGAGAAAAGAATGGAAT AAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGATACTGAAGATGTTCCTTGGATAACA CTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGT TCTGATGACTCACATGATGGGGAGTCTGAATCAAATGCCAAAGTAGCTGATGTATTGGAC GTTCTAAATGAGGTAGATGAATATTCTGGTTCTTCAGAGAAAATAGACTTACTGGCCAGT GATCCTCATGAGGCTTTAATATGTAAAAGTGAAAGAGTTCACTCCAAATCAGTAGAGAGT AATATTGAAGACAAAATATTTGGGAAAACCTATCGGAAGAAGGCAAGCCTCCCCAACTTA AGCCATGTAACTGAAAATCTAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGACCTACATCAGGCCTTCATCCT GAGGATTTTATCAAGAAAGCAGATTTGGCAGTTCAAAAGACTCCTGAAATGATAAATCAG GGAACTAACCAAACGGAGCAGAATGGTCAAGTGATGAATATTACTAATAGTGGTCATGAG AATAAAACAAAAGGTGATTCTATTCAGAATGAGAAAAATCCTAACCCAATAGAATCACTC GAAAAAGAATCTTTCAAAACGAAAGCTGAACCTATAAGCAGCAGTATAAGCAATATGGAA CTCGAATTAAATATCCACAATTCAAAAGCACCTAAAAAGAATAGGCTGAGGAGGAAGTCT TCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAAT TGTACTGAATTGCAAATTGATAGTTGTTCTAGCAGTGAAGAGATAAAGAAAAAAAAGTAC AACCAAATGCCAGTCAGGCACAGCAGAAACCTACAACTCATGGAAGGTAAAGAACCTGCA ACTGGAGCCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGACATGACAGCGAT ACTTTCCCAGAGCTGAAGTTAACAAATGCACCTGGTTCTTTTACTAAGTGTTCAAATACC AGTGAAAAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAAAAGAAGAGAAACTAGAA ACAGTTAAAGTGTCTAATAATGCTGAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGG GTTTTGCAAACTGAAAGATCTGTAGAGAGTAGCAGTATTTCATTGGTACCTGGTACTGAT TATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGGAAGGCAAAAACA GAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGGACTAATTCAT GGTTGTTCCAAAGATAATAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGACATGAA GTTAACCACAGTCGGGAAACAAGCATAGAAATGGAAGAAAGTGAACTTGATGCTCAGTAT TTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCCGTTTTCAAATCCAGGA GAGGAATGTGCAACATTCTCTGCCCACTCTGGGTCCTTAAAGAAACAAAGTCCAAAAGTC ACTTTTGAATGTGAACAAAAGGAAGAAAATCAAGGAAAGAATGAGTCTAATATCAAGCCT GTACAGACAGTTAATATCACTGCAGGCTTTCCTGTGGTTGGTCAGAAAGATAAGCCAGTT 
GATAATGCCAAATGTATCAAAGGAGGCTCTAGGTTTTGTCTATCATCTCAGTTCAGAGGC AACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTTACAAAACCCATATCGTATA CCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAATCTGCTAGAG GAAAACTTTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAATGAGAACATTCCA AGTACAGTGAGCACAATTAGCCGTAATAACAGAGAAAATGTTTTTAAAGAAGCCAGCTCA AGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGTGGGCTCCAGTATTAATGAAATA GGTTCCAGTGATGAAAACATTCAAGCAGAACTAGGTAGAAACAGAGGGCCAAAATTGAAT GCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTCTTCCTGGAAGT AATTGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAAGTAGTTCAGACTGTTAAT ACAGATTTCTCTCCATATCTGATTTCAGATAACTTAGAACAGCCTATGGGAAGTAGTCAT GCATCTCAGGTTTGTTCTGAGACACCTGATGACCTGTTAGATGATGGTGAAATAAAGGAA GATACTAGTTTTGCTGAAAATGACATTAAGGAAAGTTCTGCTGTTTTTAGCAAAAGCGTC CAGAAAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCATACACATTTGGCTCAGGGT TACCGAAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG >Chimpanzee TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA GACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCA AGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCC AGCACAGAAAAAAAGGTAGATCTGAATGCTGATCCCCTGTGTGAGAGAAAAGAATGGAAT AAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGATACTGAAGATGTTCCTTGGATAACA CTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGT TCTGATGACTCACATGATGGGGGGTCTGAATCAAATGCCAAAGTAGCTGATGTATTGGAC GTTCTAAATGAGGTAGATGAATATTCTGGTTCTTCAGAGAAAATAGACTTACTGGCCAGC GATCCTCATGAGGCTTTAATATGTAAAAGTGAAAGAGTTCACTCCAAATCAGTAGAGAGT AATACTGAAGACAAAATATTTGGGAAAACCTATCGGAGGAAGGCAAGCCTCCCCAACTTA AGCCATGTAACTGAAAATCTAATTATAGGAGCATTTGTTACTGAGCCACAGATAATACAA GAGCGTCCCCTCACAAATAAATTAAAGCGTAAAAGGAGAGCTACATCAGGCCTTCATCCT GAGGATTTTATCAAGAAAGCAGATTTGGCAGTTCAAAAGACTCCTGAAATGATAAATCAG GGAACTAACCAAATGGAGCAGAATGGTCAAGTGATGAATATTACTAATAGTGGTCATGAG AATAAAACAAAAGGTGATTCTATTCAGAATGAGAAAAATCCTAACCCAATAGAATCACTC GAAAAAGAATCTTTCAAAACGAAAGCTGAACCTATAAGCAGCAGTATAAGCAATATGGAA CTCGAATTAAATATCCACAATTCAAAAGCACCTAAAAAGAATAGGCTGAGGAGGAAGTCT TCTACCAGGCATATTCATGCGCTTGAACTAGTAGTCAGTAGAAATCTAAGCCCACCTAAT 
TGTACTGAATTGCAAATTGATAGTTGTTCTAGCAGTGAAGAGATAAAGAAAAAAAAGTAC AACCAAATGCCAGTCAGGCACAGCAGAAACCTACAACTCATGGAAGATAAAGAACCTGCA ACTGGAGTCAAGAAGAGTAACAAGCCAAATGAACAGACAAGTAAAAGACATGACAGCGAT ACTTTCCCAGAGCTGAAGTTAACAAATGCACCTGGTTCTTTTACTAACTGTTCAAATACC AGTGAAAAAGAATTTGTCAATCCTAGCCTTCCAAGAGAAGAAGAAGAAGAGAAACTAGAA ACAGTTAAAGTGTCTAATAATGCCGAAGACCCCAAAGATCTCATGTTAAGTGGAGAAAGG GTTTTGCAAACTGAAAGATCTGTAGAGAGTAGCAGTATTTCATTGGTACCTGGTACTGAT TATGGCACTCAGGAAAGTATCTCGTTACTGGAAGTTAGCACTCTAGGGAAGGCAAAAACA GAACCAAATAAATGTGTGAGTCAGTGTGCAGCATTTGAAAACCCCAAGGGACTAATTCAT GGTTGTTCCAAAGATACTAGAAATGACACAGAAGGCTTTAAGTATCCATTGGGACATGAA GTTAACCACAGTCGGGAAACAAGCATAGAAATGGAAGAAAGTGAACTTGATGCTCAGTAT TTGCAGAATACATTCAAGGTTTCAAAGCGCCAGTCATTTGCTCTGTTTTCAAATCCAGGA GAGGAATGTGCAACATTCTCTGCCCACTGTAGGTCCTTAAAGAAACAAAGTCCAAAAGTC ACTTTTGAACGTGAACAAAAGGAACAAAATCAAGGAAAGAATGAGTCTAATATCAAGCCT GTACAGACAGTTAATATCACTGCAGGCTTTCCTGTGGTTTGTCAGAAAGATAAGCCAGTT GATTATGCCAAATGTATCAAAGGAGGCTCTAGGTTTTGTCTATCATCTCAGTTCAGAGGC AACGAAACTGGACTCATTACTCCAAATAAACATGGACTTTTACAAAACCCATATCATATA CCACCACTTTTTCCCATCAAGTCATTTGTTAAAACTAAATGTAAGAAAAACCTGCTAGAG GAAAACTTTGAGGAACATTCAATGTCACCTGAAAGAGAAATGGGAAATGAGAACATTCCA AGTACAGTGAGCACAATTAGCCGTAATAACAGAGAAAATGTTTTTAAAGAAGCCAGCTCA AGCAATATTAATGAAGTAGGTTCCAGTACTAATGAAGTGGGCTCCAGTATTAATGAAGTA GGTTCCAGTGATGAAAACATTCAAGCAGAACTAGGTAGAAACAGAGGGCCAAAATTGAAT GCTATGCTTAGATTAGGGGTTTTGCAACCTGAGGTCTATAAACAAAGTCTTCCTGAAAGT AATTGTAAGCATCCTGAAATAAAAAAGCAAGAATATGAAGAAGTAGTTCAGACTGTTAAT ACAGATTTCTCTCCATGTCTGATTTCAGATAACTTAGAACAGCCTATGGGAAGTAGTCAT GCATCTCAGGTTTGTTCTGAGACACCTGATGACCTGTTAGATGATGGTGAAATAAAGGAA GATACTAGTTTTGCTGAAAATGACATTAAGGAAAGTTCTGCTGTTTTTAGCAAAAGCGTC CAGAGAGGAGAGCTTAGCAGGAGTCCTAGCCCTTTCACCCATACACATTTGGCTCAGGGT TACCGAAGAGGGGCCAAGAAATTAGAGTCCTCAGAAGAGAACTTATCTAGTGAG PyCogent-1.5.3/doc/data/primate_brca1.tree000644 000765 000024 00000000105 11261304034 021276 0ustar00jrideoutstaff000000 000000 
(Galago,HowlerMon,(Rhesus,(Orangutan,(Gorilla,(Human,Chimpanzee)))));PyCogent-1.5.3/doc/data/primate_cdx2_promoter.fasta000644 000765 000024 00000011104 11212010705 023230 0ustar00jrideoutstaff000000 000000 >human AGCGCCCGCGGGTTCTGAGAGCGCTCAAAGCCGCCGAGTCAGGCTGCCCAGCCCGCCGGG CCTCGCCGCAGTGATCCTCATTCCCGAATCTGGCAGCGCTGTCAAAGGCTTGTATTAGGA GGTGAACGGCGGCCGCAGGCCCACTCCACGCGGTTGCTGAAACCGAGCTGGGCGCGCGCG GGGGCCGAATCTCGCCGCCTCCGCGCTCCTGTCGGGGCAGCTCCCGATCCCGGGCTGCGC GGCTTCGGTCCCCAAGACGGCCACTTCCAGCCCTAGGCCCCTTGGCCGCAGCGCTTCCCA AACCAAGAGAGATCCTTTCTCAACTCAGAGCTTTTCATTAGCAGTCGTTAATAATGGCCC TGAGTTGCCTTATCATCTCCTGGAAATGAGAAATAAATTTCTTCGGAGAACGTTTCCCTT TGTAAAGGACAGAGAGTTTTAAAGATACAGGTATGATGTAAGACACATAAATACCTAGGT AAGCATTAGCAGAAATTCTCTTTTCCTT-------------------ATATTTAAGTATA ATAAACATACAAGTGTAGCTCAATGAATTTTCACAAACTGACATTCTGTGTAACCAGCAG CCTAAGAAACTGCTTTACCAACGATCCCCTAGCTCGCCTCCAGTTATGCACGCCAATAAC CACTAGCCTAACTTCTACCACATGCCCATTACTTCTGTAGTTTAAAACTTCTGATTCTTG AATGTAAACGTTTAACAATAAATCGCTTGAATTTAACTCAAATTTCAAATGTAAGATGAA GTCAGAGATGCAGCCTGAATCTAGGATCATAATTTGTCTTGTGCGGAGGGCGAGTAATTT CCTTGGGCAAGAAAATAACT----GGAGGTGACAGTTGTTTGGGGCTGCAGTCGTCCGGG CCAGGAGCACAGGGCGGGAAGGAATGGCCCATCTCTTAGGGCTCTCTGCTTGTCACCTAC CAGGTTGGTCAGAAACGTTCTCATCAAAGCAATGG-TTCTCTTTTCTTTTCTCTTTGGGA CAGAAGGAGTTTCTTGACCGCCCTCTTCCCTGCAAATGCATAAACAACCACTGCTCCTGT CTCCAAGCTCAGATTCCTACCAAGATAGCCTTTTCTCTTCCCCTCTCTTTTGTAAGTCTC TTGATTTCATTCTTTGAACCTGTGATTGGAGGTTAAAGTGCACCAGGTTGGAAGGAGGAA GCTCTTAACAATAAAGGTTTGAATATTTAGCTGTG-TCAGGTCGCTGCCCTCTCACGAGC CTCCCTCCCCTTTATCTTTTAAAATGCAAATTATGTTTCGAGGGGTTGTGCGTAGAGTGC GCGCTGCGCCTCGACGTCTCCAACCATTGGTGTCTGTGTCATTACTAATAGAGTCTTGTA AACACTCGTTAATCACGGAAGGCCGCCGGCCTGGGGCTCCGCACGCCAGCCTGTGGCGGG TCTTCCCCGCCTCTGCAGCCTAGTGGGAAGGAGGTGGGAGGAAAGAAGGAAGAAAGGGAG GGAGGGAGGAGGCAGGCCAGAGGGA >macaque AGCGCCCGCGGGTTCTGAGAGCGCTCAAAGCCACCGAGTCAGGCTGCCCAGCCCGCCGGG CCTCGCCGCAGTGATCCTCATTCCCGAATCTGGCAGCGCTGTCAAAGGCTGGTA-TAGGA GGTGAACGGCGGCCGCAGGCCCACTCCACGCTGTTGTTGAAACCCAGTTGCGCGCGCGCG 
AGAGCCGAATCTCGCCGCCTCCGCGCTCCTGTCGGGGTAGCTCCAGATTCCTGGCTGCGC GGCTCCGGTCCCCAAGGCGGTCGCTCCCAGCCCTAGGCCCCTTGGCCGCAGCGCTTCCCA AACCGAGAGAGATCCTTTCTCAACTCAGAGCTTTTCATTAACAATCGTTAATAACGGCCC TGAGTTGCCTTGTGATCTCCTGGAGATGAAAAATAAATTTCTTAGGAGAATGTTTCACTT TGTAAAGGACAAAGAGTTTTAAAGATATAGGTATGATGTAAGACACATAAATACCTAGGT AAGTATTAGCAGAAATTCTCTTTTCCTTCATTGTTAACCACTTATGCA------AGTATA AGAAAAATACAAGTGTAGCTCAATGAATTTTCACAAACTGACACTCCATGTAACCAGCAG CCTAAGAAACAGCGTTACCAGCGATCCCCTAGTTCGCCTCC--TTATGCACGCCAATAAC CACTAGCCTAACTTCTACCACATGCCCATTACTTTTGTAGTTTAAAACTTTTGATTCTTG AATGTAAACGTTTATCAATAAATCGC-----TTTAACTCAAATCTCAAACGTAAGATGAA GTCAGAGATGCAGCCTGAATCTCGGATCATAATTTATCTTGTACGGAGGGCGAGTAATTT CCTTGGGCAAGAAAATAACTGATTGGAGGTGACAGTTGTTTAGGGCTGCAGTCGTCCGGG CCAGGAGCACAGGGCGGGAAGGAATGGCCCATCTCTTAGGGCTCTCTGCTTGTCACCTAC CAGGTTGGTCAGAAACGTTCTCATCAAAGCAATGGTTTCTCTTTTCTTTTCTCTTTGGGA CAGAAGGAGTTCCTTGACCGCCCCCTTCCCTGCAAATGCATAAACAACCACTGCTCCTGT CTCCAAGCTCAGATTCCCACCAAGATAGCCTTTTCTCTTCCCCTGTCTTTTGTAAGTCTC TTGATTTCATTCTTTGAACCTGTGATTGGAGGTTAAAGTGCACCAGGTTGGAAGGAGGAA GCTCTTAACAATAAAGGTTTGAATATTTAGCTGTGATCAGGTCGCTGCCCTCTCACGAAC CTCCCCCCCCCTTATCTTTTAAAATGCAAATTATGTTTCAGGGGGTTGTGCGTTGAGTGC GCGCTGCGCCTCGACGTCTCCAACCATTGGTGTCTGTGTCATTACTAATAGAGTCTTGTA AACACTCGTTAATCACGGAAGGCCGCCGGCCTGGGGCTCCGCACGCCAGCCTGTGGCGGG TCTTCCCCGCCTCTGCAGCCTAGTGGGAAGGAGGTGGGAGGAAAGAAGGAAGAAAGAGAG GGAGGGAGGAGGCAGGCCAGAGGGA >chimp AGCGCCCGCGGGTTCTGAGAGCGCTCAAAGCCGCCGAGTCAGGCTGCCCAGCCCGCCGGG CCTCGCCGCAGTGATCCTCATTCCCGAATCTGGCAGCGCTGTCAAAGGCTTGTA-TAGGA GGTGAACGGCGGCCGCAGGCCCACCCCACGCGGTTGCTGAAACCGAGCTGGGCGCGCGCA GGAGCCGAATCTCGCCGCCTCTGCCCTCCTGTCGGGGCAGCTCCCGATCCCGGGCTGCGC GGCTCCGGTCCCCAAGACGGCCACTTCCAGCCCTAGGCCCCTTGGCCGCAGCGCTTCCCA AACCAAGAGAGATCCTTTCTCAACTCAGAGCTTTTCATTAGCAGTCGTTAATAATGGCCC TGAGTTGCCTTATCATCTCCTGGAAATAAGAAATAAATTTCTTCGGAGAACGTTTCCCTT TGTAAAGGACAGAGAGTTTTAAAGATATAGATATGATGTAAGACACATAAATACCTAGGT AAGCATTAGCAGAAATTCTCTTTTCCTT-------------------ATATTTAAGTATA ATAAACATACAAGTGTAGCTCAATGAATTTTCACAAACTGACATTCTATGTAACCAGCAG 
CCTAAGAAACTGCTTTACCAACGATCCCCTAGCTCGCCTCCAGTTATGCACGCCAATAAC CACTAGCCTAACTTCTACCACATGCCCATTACTTCTGTAGTTTAAAACTTCTGATTCTTG AATGTAAACGTTTAACAATAAATCGCTTGAATTTAACTCAAATTTCAAATGTAAGATGAA GTCAGAGATGCAGCCTGAATCTAGGATCATAATTTGTCTTGTGCGGAGGGCGAGTAATTT CCTTGGGCAAGAAAATAACTGATTGGAGGTGACAGTTGTTTGGGGCTGCAGTCATCTGGG CCAGGAGCACAGGGCGGGAAGGAATGGCCCATCTCTTAGGGCTCTCTGCTTGTCACCTAC CAGGTTGGTCAGAAACGTTCTCATCAAAGCAATGG-TTCTCTTTTCTTTTCTCTTTGGGA CAGAAGGAGTTTCTTGACCGCCCCCTTCCCTGCAAATGCATAAACAACCACTGTTCCTGT CTCCAAGCTCAGATTCCCACCAAGATAGCCTTTTCTCTTCCCCTCTCTTTTGTAAGTCTC TTGATTTCATTCTTTGAACCTGTGATTGGAGGTTAAAGTGCACCAGGTTGGAAGGAGGAA GCTCTTAACAATAAAGGTTTGAATATTTAGCTGTG-TCAGGTCGCTGCCCTCTCACGAGC CTCCCTCCCCTTTATCTTTTAAAATGCAAATTATGTTTCGAGGGGTTGTGCGTAGAGTGC GCGCTGCGCCTCGACGTCTCCAACCATTGGTGTCTGTGTCATTACTAATAGAGTCTTGTA AACACTCGTTAATCACGGAAGGCCGCCGGCCTGGGGCTCCGCACGCCAGCCTGTGGCGGG TCTTCCCCGCCTCTGCAGCCTAGTGGGAAGGAGGTGGGAGGAAAGAAGGAAGAAAGGGAG GGAGGGAGGAGGCAGGCCAGAGGGA PyCogent-1.5.3/doc/data/pycogent_script_template.py000644 000765 000024 00000002061 11612102756 023377 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
# File created on 15 Jul 2011
from __future__ import division

__author__ = "AUTHOR_NAME"
__copyright__ = "COPYRIGHT_INFORMATION"
__credits__ = ["AUTHOR_NAME"]
__license__ = "GPL"
__version__ = "1.6.0dev"
__maintainer__ = "AUTHOR_NAME"
__email__ = "AUTHOR_EMAIL"
__status__ = "Development"

from cogent.util.option_parsing import parse_command_line_parameters, make_option

script_info = {}
script_info['brief_description'] = ""
script_info['script_description'] = ""
script_info['script_usage'] = [("","","")]
script_info['output_description']= ""
script_info['required_options'] = [\
 # Example required option
 #make_option('-i','--input_dir',type="existing_filepath",help='the input directory'),\
]
script_info['optional_options'] = [\
 # Example optional option
 #make_option('-o','--output_dir',type="new_dirpath",help='the output directory [default: %default]'),\
]
script_info['version'] = __version__

def main():
    option_parser, opts, args =\
       parse_command_line_parameters(**script_info)

if __name__ == "__main__":
    main()
PyCogent-1.5.3/doc/data/refseqs.fasta000644 000765 000024 00000000477 11350301455 020413 0ustar00jrideoutstaff000000 000000 >s0 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC >s1 TGCAGCTTGAGCCACAGGAGAGAGAGAGCTTC >s2 TGCAGCTTGAGCCACAGGAGAGAGCCTTC >s3 TGCAGCTTGAGCCACAGGAGAGAGAGAGCTTC >s4 ACCGATGAGATATTAGCACAGGGGAATTAGAACCA >s5 TGTCGAGAGTGAGATGAGATGAGAACA >s6 ACGTATTTTAATTTGGCATGGT >s7 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT >s8 CCAGAGCGAGTGAGATAGACACCCAC PyCogent-1.5.3/doc/data/refseqs_protein.fasta000644 000765 000024 00000041664 11361457314 022156 0ustar00jrideoutstaff000000 000000 >1091044 VTDKFPEDVKFLYIAYTPAKADITACGIPIPLDFDKEFRDKTVVIVAIPGAFTPTCTANHIPPFVEKFTALKSAGVDAVIVLSANDPFVQSAFGKALGVTDEAFIFASDPGAEFSKSAGLSLDLPPAFGTRTARYAIIVSNGVVKYVEKDSEGVAGSGVDAVLAAL >11467494 MTNFPKIGKTPPNFLTIGVYKKRLGKIRLSDYRGKKYVILFFYPANFTAISPTELMLLSDRISEFRKLSTQILAISVDSPFSHLQYLLCNREEGGLEDLNYPLVSDLTQTITRDYQVLTDEGLAFPGLFIIDKEGIIQYYTVNNLLCGRNINELLRILESIQYVKENPGYACPVNWNFGDQVFYSHPLKSKIYFKDLYSPKKSS >11499727 MERLNSERFREVIQSDKLVVVDFYADWCMPCRYISPILEKLSKEYNGEVEFYKLNVDENQDVAFEYGIASIPTVLFFRNGKVVGGFIGAMPESAVRAEIEKALGA >1174686 MSDGVKHINSAQEFANLLNTTQYVVADFYADWCGPCKAIAPMYAQFAKTFSIPNFLAFAKINVDSVQQVAQHYRVSAMPTFLFFKNGKQVAVNGSVMIQGADVNSLRAAAEKMGRLAKEKAAAAGSS >12044976 MVTEIRSLKQLEEIFSAKKNVIVDFWAAWCGPCKLTSPEFQKAADEFSDAQFVKVNVDDHTDIAAAYNITSLPTIVVFENGVEKKRAIGFMPKTKIIDLFNN >13186328 MLNTRIKPFKNISYYKKKFYEIKEIDLKSNWNVFFFYPYSYSFICPLELKNISNKIKEFKNLNTKIYAISNDSHFVQKNWIENELKFINFPFISDFNHKISNNFNILNKKDGNCLRSTIIIDKNLIIKYINIVDDSIGRSIDEILKNIKMLQFINTNENKLCPYSWNNDSKSIEIN >13358154 MLIKLESNQNLNELLKENHSKPILIDFYADWCPPCRMLIPVLDSIEKKHGDEFTIIKINVDHFPELSTQYQVKSIPALFYLKNGDIKATSLGFIDENSLVNKLRSI >13541053 MSLVNKAAPDFEANAFVNGEVKKIRLSSYRGKWVVLFFYPADFTFVCPTEVEGFAEDYEKFKKKNTEVISVSEDTVYVHKAWVQYDERVAKAKYPMVEDRKGIIARAYDVYNEETGNAQRGLFIINPDGIVKYVVITDDNVGRSTDETLRVLEALQSGGLCPVNWHEGEPTLKV >13541117
MSPKVNEKAPDFEAPDTALKMRKLSEFRGQNVVLAFFPGAFTSVCTKEMCTFRDSMANFNKFKAKVIGISVDSPFSLAEFAKKNNLTFDLLSDSNREISKKYDVLHQNFAGVPGLTASKRSVFIIDGDGIVRYAWVSDDPGKEPDYKAIQEFLSKMN >135765 AIVKATDQSFSAETSEGVVLADFWAPWCGPCKMIAPVLEELDQEMGDKLKIVKIDVDENQETAGKYGVMSIPTLLVLKDGEVVETSVGFKPKEALQELVNKHL >1388082 MAAEEGQVIGCHTNDVWTVQLDKAKESNKLIVIDFTASWCPPCRMIAPIFNDLAKKFMSSAIFFKVDVDELQSVAKEFGVEAMPTFVFIKAGEVVDKLVGANKEDLQAKI >140543 MLFYKPVMRMAVRPLKSIRFQSSYTSITKLTNLTEFRNLIKQNDKLVIDFYATWCGPCKMMQPHLTKLIQAYPDVRFVKCDVDESPDIAKECEVTAMPTFVLGKDGQLIGKIIGANPTALEKGIKDL >14286173 MPLIGDKFPEMEVQTTHGPMELPDEFEGKWFILFSHPADFTPVCTTEFVAFQEVYPELRELDCELVGLSVDQVFSHIKWIEWIAENLDTEIEFPVIADTGRVADTLGLIHPARPTNTVRAVFVVDPEGIIRAILYYPQELGRNIPEIVRMIRAFRVIDAEGVAAPANWPDNQLIGDHVIVPPASDIETARKRKDEYECYDWWLCTQSRG >14578634 MKKNFFYAGGLSLLLWGMAACSGQGKADKAAVVADSVVVKTDSVAADSTGYIVKVGESAPDFTITLTDGKQMKLSELRGKVVMLQFTASWCGVCRKEMPFIEKDIWLKHKNNPEFALIGIDRDEPLDKVIAFGKSVGVTYPLGLDPGADIFAKYALRESGITRNVLIDREGKIVKLTRLYNEEEFASLVDQIDEMLKK >14600438 MPGVGEQAPDFEGIAHTGEKIRLSDFRGRIVVLYFYPRAMTPGCTREGVRFNELLDEFEKLGAVVIGVSTDSVEKNRKFAEKHGFRFKLVSDEKGEIGMKYGVVRGEGSNLAAERVTFIIDREGNIRAILRNIRPAEKHADLALEEVKKLVLGKKAERAGEVI >15218394 MQVRAAKKQTFNSFDDLLQNSDKPVLVDFYATWCGPCQLMVPILNEVSETLKDIIAVVKIDTEKYPSLANKYQIEALPTFILFKDGKLWDRFVSFLSRMNTAYLLLASKAQL >15597673 MLSFSLGPLVVSLQHLLLFLALGAALLGGWLAARGGGRNAEPVLFNLLLLGLLVARLAFVVRYWPQYRGDFAQMLDIRDGGFLAWPGLLAAVLGALFWAWRRPALRRSLGVGASLGLAFWLLGSLGLGIYERGTRLPELSLRNAAGESVQLADFRGRPLVINLWASWCPPCRREMPVLQQAQAENPDVVFLFANQGESAETVRHFLQGENLRLDNLLFDNGGQLGQQVGSVALPTTVFYTAEGRLLGSHLGELSRGSLARYLEAFEPAAAAPATRSSE >15599256 MSDTPYIFDVTGANFEQLVIENSFHKPVLVDFWADWCAPCKALMPLLAQIAESYQGELLLAKVNCDVEQDIVMRFGIRSLPTVVLFKDGQPVDGFAGAQPESQIRALLEPHVKAPALPDEDPLEVAQALFAEGRIGDAEATLKALLAENNENAAALILYARCLAERGELEEAQAILDAVKSDEHKQALAGARAQLTFLRQAADLPDSAELKSRLAADAGDDEAAYQLAVQQLARQQYEAALDGLLKLFLRNRGYQDDLPRKTLVQVFDLLGNDHPLVTAYRRKLYQALY >15602312 
MVNSYEKPQMKTSVLLTALFKPLLLCTIVLSCIGCKEDIAVIGKQAPEIAVFDLVGTQRSLNEGKGKTILLNFWSETCGVCIAELKTFEQLLQSYPQNNLHIIAINVDGDKADTQALVKKREISLLVVKDQLKITAERYQLVGTPTSFVIDPEGKILYKFEGLIPTQDLHLFFKG >15605725 MKRLLPVFLALILFGLLVYIGLNQDKHDHYTITTQKGQKIPNVTLTTPDGKKVSIEEFKGKVLLINFWATWCPPCKEEIPMFKEIYEKYRDRGFEILAINMDPENLTGFLKNNPLPFPVFVINEKWKEPLTFQGFQRRTWLTEEEL >15605963 MARTVNLKGNPVTLVGPELKVGDRAPEAVVVTKDLQEKIVGGAKDVVQVIITVPSLDTPVCETETKKFNEIMAGMEGVDVTVVSMDLPFAQKRFCESFNIQNVTVASDFRYRDMEKYGVLIGEGALKGILARAVFIIDKEGKVAYVQLVPEITEEPNYDEVVNKVKELI >15609375 MLNVGATAPDFTLRDQNQQLVTLRGYRGAKNVLLVFFPLAFTGICQGELDQLRDHLPEFENDDSAALAISVGPPPTHKIWATQSGFTFPLLSDFWPHGAVSQAYGVFNEQAGIANRGTFVVDRSGIIRFAEMKQPGEVRDQRLWTDALAALTA >15609658 MTKTTRLTPGDKAPAFTLPDADGNNVSLADYRGRRVIVYFYPAASTPGCTKQACDFRDNLGDFTTAGLNVVGISPDKPEKLATFRDAQGLTFPLLSDPDREVLTAWGAYGEKQMYGKTVQGVIRSTFVVDEDGKIVVAQYNVKATGHVAKLRRDLSV >15613511 MLEGKQAPDFSLPASNGETVSLSDFKGKNIVLYFYPKDMTPGCTTEACDFRDRVEDFKGLNTVILGVSPDPVERHKKFIEKYSLPFLLLADEDTKVAQQYDVWKLKKNFGKEYMGIERSTFVIDKDGTVVKEWRKVRVKDHVEEALAFIKENLE >15614085 MNKKRLTTIVVLIAVVASVIIILTQNNLEVGNGKGMLAEDFTLPLYEESQSRSLSDYRGDVVILNVWASWCEPCRKEMPALMELQSDYESEDVSIVTVNMQTFERTVNDAGEFIEELGITLPVFLDEEGEFADAYQVQHLPMTYVLDREGIINEVILGEVTYEQLEQLIVPLLEKAS >15614140 MDKRKRFWMRLSILAVISVALGYTFYSNFFADRSLARAGEQAVNFVLEDLEGESIELRELEGKGVFLNFWGTYCPPCEREMPHMEKLYGEYKEQGVEIIAVNANEPELTVQRFVDRYGLSFPIVIDKGLNVIDAYGIRPLPTTILINEHGEIVKVHTGGMTEQMVEEFMELIKPEA >15615431 MKGKQAFYRSFLILALCTVGYFYSQSEHVYRPVDKEAITTFNEGVQVGQRAVPFSLTTLEGQVVDLSSLRGQPVILHFFATWCPVCQDEMPSLVKLDKEYRQKGGQFLAINLTNQESSIKDVRAFVQHYRAEFDPLLDTDGEVMETYQVIGIPTTLILDEEGTIVKRYNGVLTEEIIDEIMDIH >15643152 MRVKHFELLTDEGKTFTHVDLYGKYTILFFFPKAGTSGCTREAVEFSRENFEKAQVVGISRDSVEALKRFKEKNDLKVTLLSDPEGILHEFFNVLENGKTVRSTFLIDRWGFVRKEWRRVKVEGHVQEVKEALDRLIEEDLSLNKHIEWRRARRALKKDRVPREELELLIKAAHLAPSCMNNQPWRFVVVDEEELLKKIHEALPGGNYWMKNAPALIAVHSKKDFDCALPDNRDYFLFDTGLAVGNLLVQATQMGLVAHPVAGYDPVKVKEILKIPEDHVLITLIAVGYLGDESELSEKHRELERSERVRKELSEIVRWNL >15672286 
MPMNITSHGEKIATLNPPISGDAPDFELTDLKGNKIKLSKLEKPVLISVFPDINTRVCSLQTKHFNLEAAKHSEIDFLSISNNTADEQKNWCATEGVDMTILADDGTFGKAYGLILNGGPLEGRLARSVFVVKNGQIVYSEVLSELSDEPNYEKALAATK >15790738 MQRGWGSERLAPRLSPGRAGSVISNDTDTPSMRCEGARAPAFELPGVSDGTQTRLGLTDALADNRAVVLFFYPFDFSPVCATELCAIQNARWFDCTPGLAVWGISPDSTYAHEAFADEYALTFPLLSDHAGAIADAFGVLQASAEDHDRVPERAVFLIDADRVIRYAWASSDLSESPDLGAVKAAIDDLQPDTAGVAPRTISDAHDDGTTDG >15791337 MTVTLKDFYADWCGPCKTQDPILEELEADYDEDVSFEKIDVDEAEDVANEYQVRSIPTIVVENDDGVVERFVGVTQREQLEDAIESAGA >15801846 MSQTVHFQGNPVTVANSIPQAGSKAQTFTLVAKDLSDVTLGQFAGKRKVLNIFPSIDTGVCAASVRKFNQLATEIDNTVVLCISADLPFAQSRFCGAEGLNNVITLSTFRNAEFLQAYGVAIADGPLKGLAARAVVVIDENDNVIFSQLVDEITTEPDYEAALAVLKA >15805225 MTEPAPASPARPAWTRALPPLLAAALVGGLGWALLKPAGNAANGPLVGKPAPQFNLTGLDGQPVALADYRGRPVVLNFWASWCGPCREEAPLFAKLAAHPGAPAVLGILFNETKPQNARDFARQYGLTYPNLQDPGVATAIAYQVTGIPRTVFIDAQGVVRHIDQGGLDTARLNAGLSKIGVPGL >15805374 MTSPTPPTSPSLSPQPRRASWTRWIVPAVMVGLVGLLAYGLFTPDPEGGPALLGKPAPAFALEDLGGRTHALTAAQGKPVVINFWASWCVPCRQEAPLFSKLSQETAGKAEFFGVIYNDQPADARRFMDQYGLIYPALLDPGSRTALSYGVGKLPITFIVDGQGKVVHIKDGPIEEPELRAALKQAGL >15807234 MTLVGQPAPDFTLPASTGQDITLSSYRGQSHVVLVFYPLDFSPVCSMQLPEYSGSQDDFTEAGAVVLGINRDSVYAHRAWAAEYGIEVPLLADMQLEVARQYGVAIDERGISGRAVFVIDREGVVRYQHVEEQTGQYTVRPGAVLEQLRGL >15826629 APIKVGDAIPAVEVFEGEPGNKVNLAELFKGKKGVLFGVPGAFTPGCSKTHLPGFVEQAEALKAKGVQVVACLSVNDAFVTGEWGRAHKAEGKVRLLADPTGAFGKETDLLLDDSLVSIFGNRRLKRFSMVVQDGIVKALNVEPDGTGLTCSLAPNIISQL >15899007 MNDELNDPELQKILSKKTTQILNNLKEKVKEPVKHLNSKNFDEFITKNKIVVVDFWAEWCAPCLILAPVIEELANDYPQVAFGKLNTEESQDIAMRYGIMSLPTIMFFKNGELVDQILGAVPREEIEVRLKSLLE >15899339 MVEIGEKAPEIELVDTDLKKVKIPSDFKGKVVVLAFYPAAFTSVCTKEMCTFRDSMAKFNEVNAVVIGISVDPPFSNKAFKEQNKINFTIVSDFNREAVKAYGVAGELPILKGYVLAKRSVFVIDKNGIVRYKWVSEDPTKEPNYDEIKDVVTKLS >15964668 MTIAVGDKLPNATFKEKTADGPVEVTTELLFKGKRVVLFAVPGAFTPTCSLNHLPGYLENRDAILARGVDDIAVVAVNDLHVMGAWATHSGGMGKIHFLSDWNAAFTKAIGMEIDLSAGTLGIRSKRYSMLVEDGVVKALNIEESPGQATASGAAAMLELL >15966937 
MSGSDNPYQGSFGTQMTGSASFGGQPASAASGPNDLTPDDLIRETTTAAFTRDVLEASRQQPVLVDFWAPWCGPCKQLTPVIEKVVREAAGRVKLVKMNIDDHPSIAGQLGIQSIPAVIAFIDGRPVDGFMGAVPESQIKEFIDRIAGPAADNGKAEIESVLADAKALIDAGDAQNAAGLYGAVLQADPENAAAVAGMIECMIALGQVAEARQALSGLPEALAGEAAVAAVSKKLDQIEEARKLGDPAALERQLALDPDDHAARLKLAKLRNVEGDRTAAAEHLLTIMKRDRSFEDDGARRELLSFFEVWGPKDPATIAARRKLSSILFS >15988313 SRAPTGDPACRAAVATAQKIAPLAHGEVAALTMASAPLKLPDLAFEDADGKPKKLSDFRGKTLLVNLWATWCVPCRKEMPALDELQGKLSGPNFEVVAINIDTRDPEKPKTFLKEANLTRLGYFNDQKAKVFQDLKAIGRALGMPTSVLVDPQGCEIATIAGPAEWASEDALKLIRAATGKAAAAL >16078864 MLKKWLAGILLIMLVGYTGWNLYQTYSKKEVGIQEGQQAPDFSLKTLSGEKSSLQDAKGKKVLLNFWATWCKPCRQEMPAMEKLQKEYADKLAVVAVNFTSAEKSEKQVRAFADTYDLTFPILIDKKGINADYNVMSYPTTYILDEKGVIQDIHVGTMTKKEMEQKLDLD >16123427 MNTVCTACMATNRLPEERIDDGAKCGRCGHSLFDGEVINATAETLDKLLQDDLPMVIDFWAPWCGPCRSFAPIFAETAAERAGKVRFVKVNTEAEPALSTRFRIRSIPTIMLYRNGKMIDMLNGAVPKAPFDNWLDEQLSRDPNS >16125919 MKIVLAGVAAVALAFASPALAALKAGDKAPDFSAKGALAGKDFDFNLAKALKKGPVVLYFFPAAYTAGCTAEAREFAEATPEFEKLGATVIGMTAGNVDRLKDFSKEHCRDKFAVAAADKALIKAYDVALAVKPDWSNRTSYVIAPDGKILLSHTDGNFMGHVQQTMGAVKAYKAKK >16330420 MNLQIELYKFQQESLKRSSPERAAIFSDFIQGLSEEFRNRRLLRIGDFAPDFTLKNTKGETIILSEQLKTGPILLKFFRGYWCPYCGLELRAYQKVVNKIRALGGTILAISPQTLVASQKTIDRHDLTYDLLSDSGFQTAQDYGLVFTVPDAVKQIYLQSGCVIPEHNGTEEWLLPVPATFVIDRRGHIALAYANVDFRVRYEPEDAIAILLSLFVGN >1633495 XDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLA >16501671 MVLVTRQAPDFTAAAVLGSGEIVDKFNFKQHTNGKTTVLFFWPMDFTFVCPSELIAFDKRYEEFQKRGVEVVGVSFDSEFVHNAWRNTPVDKGGIGPVKYAMVADVKREIQKAYGIEHPDEGVALRGSFLIDANGIVRHQVVNDLPLGRNIDEMLRMVDALQFHEEHGDVCPAQWEKGKEGMNASPDGVAKYLAENISSL >1651717 MVVAWTTRDQSARNMPMLAVNEDNFDNLVLQCPKPILVYFGAPWCGLCHFVKPLLNHLHGEWQEQLVCVEVNADVNLHLANAYRLKNLPTLILFNRGQVIQRLEDFRVREDLHRIREQIAVSLFSP >16759994 MAGKLRRWLREAAVFLALLIAIMVVMDVWRAPQAPPAFAATPLHTLTGESTTLATLSEERPVLLYFWASWCGVCRFTTPAVAHLAAEGENVMTVALRSGGDAEVARWLARKGVDFPVVNDANGALSAGWEISVTPTLVVVSQGRVVFTTSGWTSYWGMKLRLWWAKTF >16761507 
MNTVCTHCQAINRIPGDRLQDAAKCGRCGHELFDGEVINATGETLDKLLKDDLPVVIDFWAPWCGPCRNFAPIFEDVAEERSGKVRFVKVNTEAERELSARFGIRSIPMIMIFKHGQVVDMLNGAVPKAPFDSWLNEAL >16803644 MAERLVGTQAPRFEMEAVMPNQTFGKVSLEKNIEDDKWTILFFYPMDFTFVCPTEIVAISARSDEFDALNARIIGASTDTIHSHLAWTNTPIKEGGIGKLNYPLAADTNHQVASDYGVLIEEEGVALRGLFIINPKGEIQYEVVHHNNIGREVDEVLRVLQALQTGGLCPINWQPGEKTIV >16804867 MAIIFAKEDDLEEIISSHPKILLNFWAEWCAPCRCFWPTLEQFAEMEEGNVQVVKINVDKQRALAQKFDVKGIPNSLVLVDGEIKGAIAGIVSCDELRSRFKSLAK >17229033 MFSNREGQRVPQVTFRTRQNNEWVNVTTDDLFAGKTVAVFSLPGAFTPTCSSTHLPGYNELAKVFKDNGVDEIVCISVNDAFVMNEWAKTQEAENITLIPDGNGEFTEGMGMLVDKTDLGFGKRSWRYSMLVKDGVIEKMFIEPDVPGDPFEVSDAQTMLKHINPQAAKPKLVSLFTKVGCPFCARAKAALQENGIEYEEITLGKGITTRSLRAVTGATTVPQVFIDGVLIGGSEALTDYLSTIKQEKVGV >17229859 MPTDTVAYVQENEFDTVLSEDKVVVVDFTATWCGPCRLVSPLMDQLADEYKGRVKVVKVDVDNNKPLFKRFGLRSIPAVLIFKDGELTEKIVGVSPYEQFSGAVEKLI >1729944 MLMLDKDTFKTEVLEGTGYVLVDYFSDGCVPCKALMPAVEELSKKYEGRVVFAKLNTTGARRLAISQKILGLPTLSLYKDGVKVDEVTKDDATIENIEAMVEEHISK >17531233 MLKRCNFKNQVKYFQSDFEQLIRQHPEKIIILDFYATWCGPCKAIAPLYKELATTHKGIIFCKVDVDEAEDLCSKYDVKMMPTFIFTKNGDAIEALEGCVEDELRQKVLEHVSAQ >17537401 MPQFLKGVTLENREGDELPAEEHLKGKIIGLYFSASWCPPCRAFTPKLKEFFEEIKKTHPEFEIIFVSRDRNSSDLVTYFKEHQGEWTYIPFGSDKIMSLMQKYEVKTIPAMRIVNDQGEVIVQDARTEIQNKGENVEGLWAEWMALIK >17547503 MSRRQTLIALILAGIVAAVLGVYAGHRATQPRPPADGAVTAFFDARLPDANGTPIDLSAFRGQPVVINFWAPWCGPCVEEMPELSALAQEQKARVKFIGIGIDSAANIQAFLGKVPVTYPIAVAGFGGTELGRQLGNQAGGLPFTVILNAQGEITFRKMGRVHADELRAALQRT >18309723 MENINLFIVFIEGILSIFSPCILPILPIYLSMLSNSSVEDIRDSNFKSGVLIRNTLFFTLGISTTFFILGSSISALSSFFNTNKNIIMILGGVIILFMGLFYLGLINLNILNREKRLNFKYKNMSPVSAFVLGFTFSFGWTPCIGPILASVLVMASSSKNLLMSNLLILVYTIGFILPFIIASLFYGKLFKKFDGIKKHMDLIKKISGIFIIIAGLIMLVGGIRNMNNEIKINNNPQINKSESVNKDSKNESDNKKQEEENKIPPIDFTLYDQYGNKHTLSEYKGKTIFLNFWATWCPPCRGEMPYIDELYKEYNENKDDVVILGVASPNLGREGSEEHIKNFLKEENHVFPVVLDENGAMVYQYGINAFPSTFIINKEGYITKYIPGAMTKETMKTLIESER >18313548 
MVLKVGDKAPDFELLNEELKPVRLSEVLKRGRPVVLLFFPGAFTSVCTKELCTFRDKMALLNKANAEVLAISVDSPFALKAFKDANRLNFPLLSDYNRIVIGMYDVVQANLLGLPLYHLAKRAVYIIDPTGTIRYVWYSDDPRDEPPYDEVIKEAEKIGAQK >18406743 MAETSKQVNGDDAQDLHSLLSSPARDFLVRNDGEQVKVDSLLGKKIGLYFSAAWCGPCQRFTPQLVEVYNELSSKVGFEIVFVSGDEDEESFGDYFRKMPWLAVPFTDSETRDRLDELFKVRGIPNLVMVDDHGKLVNENGVGVIRSYGADAYPFTPEKMKEIKEDEDRARRGQTLRSVLVTPSRDFVISPDGNKVPVSELEGKTIGLLFSVASYRKCTELTPKLVEFYTKLKENKEDFEIVLISLEDDEESFNQDFKTKPWLALPFNDKSGSKLARHFMLSTLPTLVILGPDGKTRHSNVAEAIDDYGVLAYPFTPEKFQELKELEKAKVEAQTLESLLVSGDLNYVLGKDGAKVLVSDLVGKTILMYFSAHWCPPCRAFTPKLVEVYKQIKERNEAFELIFISSDRDQESFDEYYSQMPWLALPFGDPRKASLAKTFKVGGIPMLAALGPTGQTVTKEARDLVVAHGADAYPFTEERLKEIEAKYDEIAKDWPKKVKHVLHEEHELELTRVQVYTCDKCEEEGTIWSYHCDECDFDLHAKCALNEDTKENGDEAVKVGGDESKDGWVCEGNVCTKA >19173077 MFPKTLTDSKYKAFVDGEIKEISLQDYIGKYVVLAFYPLDFTFVCPTEINRFSDLKGAFLRRNAVVLLISCDSVYTHKAWASIPREQNGVLGTAWPMVWDAKRELCNQFGLYDEENGHPMRSTVILAKDLSVRHISSNYHAIGRSVDEIIRLIDAITFNDENGDICPAEWRSENKDN >19554157 MEEGEEISLSDFEGEVVVLNAWGQWCAPCRAEVDDLQLVQETLDPLGGTVLGINVRDYNQTIAQDFKLDNAVTYPSIYDPPFRIAAALGGVPTSVIPTTIVLDRSHRPAAVFLREVTALSG >19705357 MSFSVFAAKSDKKDDVKLPNIVLYDQYGKKHNIEEYKGKVVVINFWATWCGYCVEEMPGFEKVYKEFGSNKKDVIILGVAGPKSKENLNNVDVEKDKIISFLKKKNITYPSLMDETGKSFDDYKVRALPMTYVINKNGYLEGFVNGAITDEQLRKAINETLKKK >19746502 MKKGLLVTTGLACLGLLTACSTQDNMAKKEITQDKMSMAAKKKDKMSTSKDKSMMADKSSDKKMTNDGPMAPDFELKGIDGKTYRLSEFKGKKVYLKFWASWCSICLSTLADTEDLAKMSDKDYVVLTVVSPGHQGEKSEADFTKWFQGTDYKDLPVLLDPDGKLLEAYGVRSYPTEVFIGSDGVLAKKHIGYAKKSDIKKALKGIH >20092028 MAGDFMKPMLLDFSATWCGPCRMQKPILEELEKKYGDKVEFKVVDVDENQELASKYGIHAVPTLIIQKDGTEVKRFMGVTQGSILAAELDKLL >20151112 SLINTKIKPFKNQAFKNGEFIEVTEKDTEGRWSVFFFYPADFTFVCPTELGDVADHYEELQKLGVDVYSVSTDTHFTHKAWHSSSETIAKIKYAMIGDPTGALTRNFDNMREDEGLADRATFVVDPQGIIQAIEVTAEGIGRDASDLLRKIKAAQYVAAHPGEVCPAKWKEGEATLAPSLDLVGKI >21112072 MTIQVGDRIPEVVLKHLREGIEAVDTHTLFTGRKVVLFAVPGAFTPTCSAKHLPGYVEQFEAFRKRGIEVLCTAVNDPFVMQAWGRSQLVPDGLHLVPDGNAELARALGLEIDASGSGMGLRSRRYALYADDGVVKALFVEEPGEFKVSAADYVLQHLPD >21222859 
MSAASRAPLRSNRTAERFTERTADARARVRSRAARAAVGAAAAALLVSACSSGGTSGGGGQTGFITGSDGIATAKKGERADAPELSGETVDGGQVDVADYKGKVVVLNVWGSWCPPCRAEAKNFEKVYQDVKDQGVQFVGINTRDTSTGPARAFEKDYGVTYPSLYDPAGRLMLRFEKGTLNPQAVPSTLIIDREGKVAARTLQALSEEKLRKMLAPYLQPEK >21223405 MLTVGDKFPEFDLTACVSLEKGKEFQQINHKTYEGQWKVVFAWPKDFTFVCPTEIAAFGKLNDEFADRDAQILGFSGDSEFVHHAWRKDHPDLTDLPFPMLADSKHELMRDLGIEGEDGFAQRAVFIVDQNNEIQFTMVTAGSVGRNPKEVLRVLDALQTDELCPCNWSKGDETLDPVALLSGE >21227878 MTDEIRVGETIQDFRLRDQKREEIHLYDLKGKKVLLSFHPLAWTQVCAQQMKSLEENYELFTELNTVPLGISVDPIPSKKAWARELGINHIKLLSDFWPHGEVARTCGIFRGKEGVSERANIIIDENRQVIYFKKYLGHELPDIKEIIEVLKNK >21283385 MTEITFKGGPIHLKGQQINEGDFAPDFTVLDNDLNQVTLADYAGKKKLISVVPSIDTGVCDQQTRKFNSDASKEEGIVLTISADLPFAQKRWCASAGLDNVITLSDHRDLSFGENYGVVMEELRLLARAVFVLDADNKVVYKEIVSEGTDFPDFDAALAAYKNI >21674812 MIEEGKIAPDFTLPDSTGKMVSLSEFKGRKVLLIFYPGDDTPVCTAQLCDYRNNVAAFTSRGITVIGISGDSPESHKQFAEKHKLPFLLLSDQERTVAKAYDALGFLGMAQRAYVLIDEQGLVLLSYSDFLPVTYQPMKDLLARIDAS >23098307 MKILKYVLIISISMFLPIMTAYAEGTEVGERAPDFELKTIDGQQLRLSDFKGERVLINFWTTWCPPCRQEMPDMQRFYQDLQPNILAVNLTDTEMNKEQVVRFSQELELTFPILLDEKGEVSKAYRISPIPTTYMIDSEGIIRHKSYGALTYEQMVAEYNKME >2649838 MVFTSKYCPYCRAFEKVVERLMGELNGTVEFEVVDVDEKRELAEKYEVLMLPTLVLADGDEVLGGFMGFADYKTAREAILEQISAFLKPDYKN >267116 MSKVIHVTSNEELDKYLQHQRVVVDFSAEWCGPCRAIAPVFDKLSNEFTTFTFVHVDIDKVNTHPIGKEIRSVPTFYFYVNGAKVSEFSGANEATLRSTLEANI >27375582 MSEQSTSANPQRRTFLMVLPLIAFIGLALLFWFRLGSGDPSRIPSALIGRPAPQTALPPLEGLQADNVQVPGLDPAAFKGKVSLVNVWASWCVPCHDEAPLLTELGKDKRFQLVGINYKDAADNARRFLGRYGNPFGRVGVDANGRASIEWGVYGVPETFVVGREGTIVYKLVGPITPDNLRSVLLPQMEKALK >2822332 MATDTAATTGTASPDEPLYVNGQTELDDVTSDNDVVLADFYADWCGPCQMLEPVVETLAEQTDAAVAKIDVDENQALASAYGVRGVPTLVLFADGEQVEEVVGLQDEDALKDLIESYTE >30021713 MWRKLTIIVVLLCLAGYAAYEQFGNKEQAVKVKQEKSEAAMKEIIARNGIEIGKSAPDFELTKLDGTNVKLSDLKGKKVILNFWATWCGPCQQEMPDMEAFYKEHKENVEILAVNYTPSEKGGGEEKVSNFIKEKGITFPVLLDKNIDVTTAYKVITIPTSYFIDTKGVIQDKFIGPMTQKEMEKRVAKLK >3261501 MTTRDLTAAYFQQTISANSNVLVYFWAPLCAPCDLFTPTYEASSRKHFDVVHGKVNIETEKDLASIAGVKLLPTLMAFKKGKLVFKQAGIANPAIMDNLVQQLRAYTFKSPAGEGIGPGTKTSS >3318841 
MPGGLLLGDVAPNFEANTTVGRIRFHDFLGDSWGILFSHPRDFTPVCTTELGRAAKLAPEFAKRNVKLIALSIDSVEDHLAWSKDINAYNSEEPTEKLPFPIIDDRNRELAILLGMLDPAEKDEKGMPVTARVVFVFGPDKKLKLSILYPATTGRNFDEILRVVISLQLTAEKRVATPVDWKDGDSVMVLPTIPEEEAKKLFPKGVFTKELPSGKKYLRYTPQP >3323237 MALLDISSGNVRKTIETNPLVIVDFWAPWCGSCKMLGPVLEEVESEVGSGVVIGKLNVDDDQDLAVEFNVASIPTLIVFKDGKEVDRSIGFVDKSKILTLIQKNA >4155972 MLEVINGKNYAEKTAHQAVVVNVGASWCPDCRKIEPIMENLAKTYKGKVEFFKVSFDESQDLKESLGIRKIPTLIFYKNAKEVGERLVEPSSQKPIEDALKALL >4200327 MAQRLLLRRFLASVISRKPSQGQWPPLTSRALQTPQCSPGGLTVTPNPARTIYTTRISLTTFNIQDGPDFQDRVVNSETPVVVDFHAQWCGPCKILGPRLEKMVAKQHGKVVMAKVDIDDHTDLAIEYEVSAVPTVLAMKNGDVVDKFVGIKDEDQLEAFLKKLIG >4433065 GSIKEIDINEYKGKYVVLLFYPLDWTFVCPTEMIGYSEVAGQLKEINCEVIGVSVDSVYCHQAWCEADKSKGGVGKLGFPLVSDIKRCISIKYGMLNVETGVSRRGYVIIDDKGKVRYIQMNDDGIGRSTEET >4704732 MAPITVGDVVPDGTISFFDENDQLQTVSVHSIAAGKKVILFGVPGAFTPTCSMSHVPGFIGKAEELKSKGIDEIICFSVNDPFVMKAWGKTYPENKHVKFVADGSGEYTHLLGLELDLKDKGSGISSGRFALLLDNLKVTVANVESGGEFTVSSAEDILKAL >4996210 MAYHLGATFPNFTATASNVDGVFDFYKYVGDNWAILFSHPHDFTPVCTTELAEFGKMHEEFLKLNCKLIGFSCNSKESHDQWIEDIKFYGNLDKWDIPMVCDESRELANQLKIMDEKEKDIKGLPLTCRCVFFISPDKKVKATVLYPATTGRNSQEILRVLKSLQLTNTHPVATPVNWKEGDKCCILPSVDNADLPKLFKNEVKKLDVPSQKAYLRFVQM >5326864 MSLKAGDSFPEGVTFSYIPWAEDASEITSCGIPINYNASKEFANKKVVLFALPGAFTPVCSANHVPEYIQKLPELRAKGVDQVAVLAYNDAYVMSAWGKANGVTGDDILFLSDPEAKFSKSIGWADEEGRTYRYVLVIDNGKIIYAAKEAAKNSLELSRADHVLKQL >6322180 MGEALRRSTRIAISKRMLEEEESKLAPISTPEVPKKKIKTGPKHNANQAVVQEANRSSDVNELEIGDPIPDLSLLNEDNDSISLKKITENNRVVVFFVYPRASTPGCTRQACGFRDNYQELKKYAAVFGLSADSVTSQKKFQSKQNLPYHLLSDPKREFIGLLGAKKTPLSGSIRSHFIFVDGKLKFKRVKISPEVSVNDAKKEVLEVAEKFKEE >6323138 MSDLVNKKFPAGDYKFQYIAISQSDADSESCKMPQTVEWSKLISENKKVIITGAPAAFSPTCTVSHIPGYINYLDELVKEKEVDQVIVVTVDNPFANQAWAKSLGVKDTTHIKFASDPGCAFTKSIGFELAVGDGVYWSGRWAMVVENGIVTYAAKETNPGTDVTVSSVESVLAHL >6687568 MRVLATAADLEKLINENKGRLIVVDFFAQWCGPCRNIAPKVEALAKEIPEVEFAKVDVDQNEEAAAKYSVTAMPTFVFIKDGKEVDRFSGANETKLRETITRHK >6850955 
MVSVGKKAPDFEMAGFYKGEFKTFRLSEYLGKWVVLCFYPGDFTFVXATEVSAVAEKYPEFQKLGVEVLSVSVDSVFVHKMWNDNELSKMVEGGIPFPMLSDGGGNVGTLYGVYDPEAGVENRGRFLIDPDGIIQGYEVLILPVGRNVSETLRQIQAFQLVRETKGAEVAPSGWKPGKKTLKPGPGLVGNVYKEWSVKEAFED >7109697 MKHITNKAELDQLLTTNKKVVVDFYANWCGPCKILGPIFEEVAQDKKDWTFVKVDVDQANEISSEYEIRSIPTIIFFQDGKMADKRIGFIPKNELKELLK >7290567 MVYPVRNKDDLDQQLILAEDKLVVIDFYADWCGPCKIIAPKLDELAQQYSDRVVVLKVNVDENEDITVEYNVNSMPTFVFIKGGNVLELFVGCNSDKLAKLMEKHARRLYR >9955016 XSGNARIGKPAPDFKATAVVDGAFKEVKLSDYKGKYVVLFFYPLDFTFVCPTEIIAFSNRAEDFRKLGCEVLGVSVDSQFTHLAWINTPRKEGGLGPLNIPLLADVTRRLSEDYGVLKTDEGIAYRGLFIIDGKGVLRQITVNDLPVGRSVDEALRLVQAFQYTDEHGEVCPAGWKPGSDTIKPNVDDSKEYFSKHN >15677788 MKKILTAAVVALIGILLAIVLIPDSKTAPAFSLPDLHGKTVSNADLQGKVTLINFWFPSCPGCVSEMPKIIKTANDYKNKNFQVLAVAQPIDPIESVRQYVKDYGLPFTVMYDADKAVGQAFGTQVYPTSVLIGKKGEILKTYVGEPDFGKLYQEIDTAWRNSDAVPyCogent-1.5.3/doc/data/ST_genome_part.gb000644 000765 000024 00000077246 11350301455 021153 0ustar00jrideoutstaff000000 000000 LOCUS AE006468 4857432 bp DNA circular BCT 23-FEB-2009 DEFINITION Salmonella enterica subsp. enterica serovar Typhimurium str. LT2, complete genome. ACCESSION AE006468 AE008693-AE008916 VERSION AE006468.1 GI:16445344 DBLINK Project:241 KEYWORDS . SOURCE Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 ORGANISM Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Salmonella; Salmonella enterica subsp. enterica serovar Typhimurium. REFERENCE 1 (bases 1 to 4857432) AUTHORS McClelland,M., Sanderson,K.E., Spieth,J., Clifton,S.W., Latreille,P., Courtney,L., Porwollik,S., Ali,J., Dante,M., Du,F., Hou,S., Layman,D., Leonard,S., Nguyen,C., Scott,K., Holmes,A., Grewal,N., Mulvaney,E., Ryan,E., Sun,H., Florea,L., Miller,W., Stoneking,T., Nhan,M., Waterston,R. and Wilson,R.K. 
TITLE Complete genome sequence of Salmonella enterica serovar Typhimurium LT2 JOURNAL Nature 413 (6858), 852-856 (2001) PUBMED 11677609 REFERENCE 2 (bases 1 to 4857432) CONSRTM The Salmonella typhimurium Genome Sequencing Project TITLE Direct Submission JOURNAL Submitted (29-MAR-2001) Genome Sequencing Center, Department of Genetics, Washington University School of Medicine, 4444 Forest Park Boulevard, St. Louis, MO 63108, USA COMMENT On or before Feb 20, 2009 this sequence version replaced gi:16418493, gi:16418512, gi:16418529, gi:16418549, gi:16418569, gi:16418591, gi:16418608, gi:16418628, gi:16418650, gi:16418668, gi:16418689, gi:16418705, gi:16418723, gi:16418742, gi:16418760, gi:16418784, gi:16418804, gi:16418824, gi:16418844, gi:16418861, gi:16418878, gi:16418900, gi:16418921, gi:16418944, gi:16418964, gi:16418985, gi:16419003, gi:16419024, gi:16419047, gi:16419070, gi:16419090, gi:16419110, gi:16419136, gi:16419161, gi:16419181, gi:16419199, gi:16419219, gi:16419241, gi:16419261, gi:16419282, gi:16419303, gi:16419326, gi:16419345, gi:16419364, gi:22024638, gi:16419449, gi:16419469, gi:16419487, gi:22024639, gi:16419572, gi:16419592, gi:16419620, gi:16419641, gi:16419665, gi:16419687, gi:16419709, gi:16419731, gi:16419750, gi:16419776, gi:16419802, gi:16419825, gi:16419849, gi:16419870, gi:16419888, gi:16419908, gi:16419944, gi:16419969, gi:16419990, gi:16420008, gi:16420030, gi:16420061, gi:16420079, gi:16420095, gi:16420112, gi:16420132, gi:16420154, gi:16420173, gi:16420194, gi:16420213, gi:16420237, gi:16420258, gi:16420281, gi:16420298, gi:16420325, gi:16420348, gi:16420369, gi:16420391, gi:16420418, gi:16420438, gi:16420461, gi:16420488, gi:16420517, gi:16420539, gi:16420565, gi:16420590, gi:16420610, gi:16420630, gi:16420649, gi:16420666, gi:16420687, gi:16420709, gi:16420731, gi:16420749, gi:16420769, gi:16420795, gi:16420811, gi:16420832, gi:16420856, gi:16420874, gi:16420896, gi:16420916, gi:16420938, gi:16420955, gi:16420979, gi:16421005, 
gi:16421023, gi:16421046, gi:16421058, gi:16421075, gi:16421092, gi:16421111, gi:16421131, gi:16421159, gi:16421190, gi:16421206, gi:16421230, gi:16421239, gi:16421268, gi:16421292, gi:16421311, gi:16421327, gi:16421347, gi:16421370, gi:16421388, gi:16421412, gi:16421434, gi:16421460, gi:16421485, gi:16421501, gi:16421517, gi:16421536, gi:16421550, gi:16421569, gi:16421592, gi:16421612, gi:16421636, gi:16421663, gi:16421682, gi:16421705, gi:16421729, gi:16421751, gi:16421769, gi:16421791, gi:16421808, gi:16421834, gi:16421851, gi:16421879, gi:16421897, gi:16421920, gi:16421939, gi:16421954, gi:16421987, gi:16422017, gi:16422036, gi:16422058, gi:16422074, gi:16422092, gi:16422111, gi:16422137, gi:16422156, gi:16422173, gi:16422188, gi:16422205, gi:16422228, gi:16422247, gi:16422263, gi:16422283, gi:16422308, gi:16422324, gi:16422343, gi:16422367, gi:16422393, gi:16422410, gi:16422430, gi:16422449, gi:16422464, gi:16422484, gi:16422503, gi:16422527, gi:16422547, gi:16422561, gi:16422586, gi:16422609, gi:16422630, gi:16422654, gi:16422675, gi:16422691, gi:16422706, gi:16422728, gi:16422744, gi:16422768, gi:16422789, gi:16422815, gi:16422822, gi:16422843, gi:16422860, gi:16422882, gi:16422906, gi:16422923, gi:16422950, gi:16422976, gi:16422994, gi:16423017, gi:16423037, gi:16423054, gi:16423067, gi:16423088, gi:16423108, gi:16423133, gi:16423153. 
Supported by NIH grant 5U 01 AI43283 Coding sequences below are predicted from manually evaluated computer analysis, using similarity information and the programs; GLIMMER; http://www.tigr.org/softlab/glimmer/glimmer.html and GeneMark; http://opal.biology.gatech.edu/GeneMark/ EC numbers were kindly provided by Junko Yabuzaki and the Kyoto Encyclopedia of Genes and Genomes; http://www.genome.ad.jp/kegg/, and Pedro Romero and Peter Karp at EcoCyc; http://ecocyc.PangeaSystems.com/ecocyc/ The analyses of ribosome binding sites and promoter binding sites were kindly provided by Heladia Salgado, Julio Collado-Vides and RegulonDB; http://kinich.cifn.unam.mx:8850/db/regulondb_intro.frameset This sequence was finished as follows unless otherwise noted: all regions were double stranded, sequenced with alternate chemistries or covered by high quality data (i.e., phred quality >= 30); an attempt was made to resolve all sequencing problems, such as compressions and repeats; all regions were covered by sequence from more than one m13 subclone. FEATURES Location/Qualifiers source join(1..962611,1004276..1098227,1143715..2728971, 2776823..2844426,2879234..4857432) /organism="Salmonella enterica subsp. enterica serovar Typhimurium str. LT2" /mol_type="genomic DNA" /strain="LT2; SGSC 1412; ATCC 700720" /serovar="Typhimurium" /culture_collection="ATCC:700720" /db_xref="taxon:99287" /focus source 962612..1004275 /organism="Salmonella phage Fels-1" /mol_type="genomic DNA" /db_xref="taxon:128975" source 1098228..1143714 /organism="Phage Gifsy-2" /mol_type="genomic DNA" /db_xref="taxon:129862" source 2728972..2776822 /organism="Phage Gifsy-1" /mol_type="genomic DNA" /db_xref="taxon:129861" source 2844427..2879233 /organism="Enterobacteria phage Fels-2" /mol_type="genomic DNA" /db_xref="taxon:194701" gene 190..255 /gene="thrL" /locus_tag="STM0001" CDS 190..255 /gene="thrL" /locus_tag="STM0001" /note="similar to E. 
coli thr operon leader peptide (AAC73112.1); Blastp hit to AAC73112.1 (21 aa), 85% identity in aa 1 - 21" /codon_start=1 /transl_table=11 /product="thr operon leader peptide" /protein_id="AAL18965.1" /db_xref="GI:16418494" /translation="MNRISTTTITTITITTGNGAG" gene 325..2799 /gene="thrA" /locus_tag="STM0002" RBS 325..330 /gene="thrA" /locus_tag="STM0002" /note="putative RBS for thrA; RegulonDB:STMS1H000398" CDS 337..2799 /gene="thrA" /locus_tag="STM0002" /EC_number="1.1.1.3" /EC_number="2.7.2.4" /note="bifunctional; N-terminus is aspartokinase I and C terminus is homoserine dehydrogenase I; similar to E. coli aspartokinase I, homoserine dehydrogenase I (AAC73113.1); Blastp hit to AAC73113.1 (820 aa), 94% identity in aa 1 - 820" /codon_start=1 /transl_table=11 /product="aspartokinase I" /protein_id="AAL18966.1" /db_xref="GI:16418495" /translation="MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKIT NHLVAMIEKTIGGQDALPNISDAERIFSDLLAGLASAQPGFPLARLKMVVEQEFAQIK HVLHGISLLGQCPDSINAALICRGEKMSIAIMAGLLEARGHRVTVIDPVEKLLAVGHY LESTVDIAESTRRIAASQIPADHMILMAGFTAGNEKGELVVLGRNGSDYSAAVLAACL RADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQF QIPCLIKNTGNPQAPGTLIGASSDDDNLPVKGISNLNNMAMFSVSGPGMKGMIGMAAR VFAAMSRAGISVVLITQSSSEYSISFCVPQSDCARARRAMQDEFYLELKEGLLEPLAV TERLAIISVVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATT GVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQTWLKNKHIDLRVCGVANSKA LLTNVHGLNLDNWQAELAQANAPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYAD FLREGFHVVTPNKKANTSSMDYYHQLRFAAAQSRRKFLYDTNVGAGLPVIENLQNLLN AGDELQKFSGILSGSLSFIFGKLEEGMSLSQATALAREMGYTEPDPRDDLSGMDVARK LLILARETGRELELSDIVIEPVLPNEFDASGDVTAFMAHLPQLDDAFAARVAKARDEG KVLRYVGNIEEDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAG NDVTAAGVFADLLRTLSWKLGV" gene 2789..3730 /gene="thrB" /locus_tag="STM0003" RBS 2789..2794 /gene="thrB" /locus_tag="STM0003" /note="putative RBS for thrB; RegulonDB:STMS1H000399" CDS 2801..3730 /gene="thrB" /locus_tag="STM0003" /EC_number="2.7.1.39" /note="similar to E. 
coli homoserine kinase (AAC73114.1); Blastp hit to AAC73114.1 (310 aa), 93% identity in aa 1 - 308" /codon_start=1 /transl_table=11 /product="homoserine kinase" /protein_id="AAL18967.1" /db_xref="GI:16418496" /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGTLLGDVVSVEAADHF RLHNLGRFADKLPPEPRENIVYQCWERFCQALGKTIPVAMTLEKNMPIGSGLGSSACS VVAALVAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENGI ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ PQLAAALMKDVIAEPYRARLLPGFSQARQAVSEIGALASGISGSGPTLFALCDKPETA QRVADWLSKHYLQNQEGFVHICRLDTAGARVVG" gene 3722..5020 /gene="thrC" /locus_tag="STM0004" RBS 3722..3727 /gene="thrB" /locus_tag="STM0003" /note="putative RBS for thrC; RegulonDB:STMS1H000400" CDS 3734..5020 /gene="thrC" /locus_tag="STM0004" /EC_number="4.2.3.1" /note="similar to E. coli threonine synthase (AAC73115.1); Blastp hit to AAC73115.1 (428 aa), 93% identity in aa 1 - 428" /codon_start=1 /transl_table=11 /product="threonine synthase" /protein_id="AAL18968.1" /db_xref="GI:16418497" /translation="MKLYNLKDHNEQVSFAQAVTQGLGKQQGLFFPHDLPEFSLTEID EMLNQDFVSRSAKILSAFIGDEIPQQILEERVRAAFAFPAPVAQVESDVGCLELFHGP TLAFKDFGGRFMAQMLTHISGDKPVTILTATSGDTGAAVAHAFYGLENVRVVILYPRG KISPLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKTALGLNSANSINISR LLAQICYYFEAVAQLPQGARNQLVISVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVN DTVPRFLHDGKWAPKATQATLSNAMDVSQPNNWPRVEELFRRKIWRLTELGYAAVDDT TTQQTMRELKAKGYISEPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVESILGE TLALPEALAERADLPLLSHHLPADFAALRKLMMTRQ" gene complement(5114..5898) /gene="yaaA" /locus_tag="STM0005" CDS complement(5114..5887) /gene="yaaA" /locus_tag="STM0005" /note="similar to E. 
coli orf, hypothetical protein (AAC73117.1); Blastp hit to AAC73117.1 (258 aa), 86% identity in aa 1 - 257" /codon_start=1 /transl_table=11 /product="putative cytoplasmic protein" /protein_id="AAL18969.1" /db_xref="GI:16418498" /translation="MLILISPAKTLDYQSPLATTRYTQPELLDHSQQLIQQARQLSAP QISRLMGISDKLADLNATRFHDWQPHFTPDNARQAILAFKGDVYTGLQAETFNDADFD FAQQHLRMLSGLYGVLRPLDLMQPYRLEMGIRLENPRGKDLYQFWGDIITDKLNEALE AQGDRVVVNLASEEYFKSVKPKKLNAELIKPVFLDEKNGKFKVVSFYAKKARGLMSRF IIENRLTKPEQLTAFDREGYFFDEETSTQDELVFKRYEQ" RBS complement(5893..5898) /gene="yaaA" /locus_tag="STM0005" /note="putative RBS for yaaA; RegulonDB:STMS1H000401" gene complement(5966..7407) /gene="yaaJ" /locus_tag="STM0006" CDS complement(5966..7396) /gene="yaaJ" /locus_tag="STM0006" /note="AGCS family; similar to E. coli inner membrane transport protein (AAC73118.1); Blastp hit to AAC73118.1 (476 aa), 76% identity in aa 1 - 476" /codon_start=1 /transl_table=11 /product="putative alanine/glycine transport protein" /protein_id="AAL18970.1" /db_xref="GI:16418499" /translation="MPEFFSFINEILWGSVMIYLLLGAGCWFTWRTGFIQFRYIRQFS RSLKGSLSPQPGGLTSFQALCTSLAARIGSGNLAGVALAIAAGGPGAVFWMWVSAIIG MATSFAECSLAQLYKERDPTGQFRGGPAWYMARGLGMRWMGVVFALFLLVAYGLIFNS VQANAVSRALHFAFNIPPLISGIALAFCALLIIIRGIKGVARLMQWLIPLIALLWVAG SVFICLWHIEQMPGVIASIVKSAFGWQEAAAGAAGYTLTQAITSGFQRGMFSNEAGMG STPNAAAAATSYPPHPVAQGIVQMIGVFSDTIIICTASAMIILLAGNHASHSSTEGIQ LLQHAMVSLTGEWGASFVALIVILFAFSSIVANYIYAENNLFFLRLHNAKAIWLLRLA TLGMVIAGTLISFPLIWQLADMIMACMAITNLTAILLLSPVVYTLASDYLRQRKLGVR PQFDPRRFPDIEPQLAPDTWEAASRD" RBS complement(7402..7407) /gene="yaaJ" /locus_tag="STM0006" /note="putative RBS for yaaJ; RegulonDB:STMS1H000402" gene 7652..8618 /gene="talB" /locus_tag="STM0007" RBS 7652..7657 /gene="talB" /locus_tag="STM0007" /note="putative RBS for talB; RegulonDB:STMS1H000403" CDS 7665..8618 /gene="talB" /locus_tag="STM0007" /EC_number="2.2.1.2" /note="similar to E. 
coli transaldolase B (AAC73119.1); Blastp hit to AAC73119.1 (317 aa), 94% identity in aa 1 - 317" /codon_start=1 /transl_table=11 /product="transaldolase B" /protein_id="AAL18971.1" /db_xref="GI:16418500" /translation="MTDKLTSLRQFTTVVADTGDIAAMKLYQPQDATTNPSLILNAAQ IPEYRKLIDDAVAWAKQQSSDRAQQVVDATDKLAVNIGLEILKLVPGRISTEVDARLS YDTEASIAKAKRIIKLYNDAGISNDRILIKLASTWQGIRAAEQLEKEGINCNLTLLFS FAQARACAEAGVYLISPFVGRILDWYKANTDKKDYAPAEDPGVVSVTEIYEYYKQHGY ETVVMGASFRNVGEILELAGCDRLTIAPALLKELAESEGAIERKLSFSGEVKARPERI TEAEFLWQHHQDPMAVDKLADGIRKFAVDQEKLEKMIGDLL" gene 8718..9319 /gene="mog" /locus_tag="STM0008" RBS 8718..8723 /gene="mog" /locus_tag="STM0008" /note="putative RBS for mog; RegulonDB:STMS1H000404" CDS 8729..9319 /gene="mog" /locus_tag="STM0008" /note="molybdopterine biosynthesis; similar to E. coli required for the efficient incorporation of molybdate into molybdoproteins (AAC73120.1); Blastp hit to AAC73120.1 (195 aa), 91% identity in aa 1 - 192" /codon_start=1 /transl_table=11 /product="putative molybdochetalase" /protein_id="AAL18972.1" /db_xref="GI:16418501" /translation="MDTLRIGLVSISDRASSGVYQDKGIPALEEWLASALTTPFEVQR RLIPDEQEIIEQTLCELVDEMSCHLVLTTGGTGPARRDVTPDATLAIADREMPGFGEQ MRQISLRFVPTAILSRQVGVIRKQALILNLPGQPKSIKETLEGVKADDGSVSVPGIFA SVPYCIQLLDGPYVETAPEVVAAFRPKSARRENMSD" gene complement(9376..9950) /gene="yaaH" /locus_tag="STM0009" CDS complement(9376..9942) /gene="yaaH" /locus_tag="STM0009" /note="similar to E. 
coli orf, hypothetical protein (AAC73121.1); Blastp hit to AAC73121.1 (188 aa), 90% identity in aa 1 - 188" /codon_start=1 /transl_table=11 /product="putative regulatory protein" /protein_id="AAL18973.1" /db_xref="GI:16418502" /translation="MGNTKLANPAPLGLMGFGMTTILLNLHNAGFFALDGIILAMGIF YGGIAQIFAGLLEYKKGNTFGLTAFTSYGSFWLTLVAILLMPKMGLTDAPDAQLLGAY LGLWGVFTLFMFFGTLKAARALQFVFLSLTVLFALLAVGNITGNEATIHIAGWVGLVC GASAIYLAMGEVLNEQFGRTILPIGEAH" RBS complement(9945..9950) /gene="yaaH" /locus_tag="STM0009" /note="putative RBS for yaaH; RegulonDB:STMS1H000405" ORIGIN 1 agagattacg tctggttgca agagatcatg acagggggaa ttggttgaaa ataaatatat 61 cgccagcagc acatgaacaa gtttcggaat gtgatcaatt taaaaattta ttgacttagg 121 cgggcagata ctttaaccaa tataggaata caagacagac aaataaaaat gacagagtac 181 acaacatcca tgaaccgcat cagcaccacc accattacca ccatcaccat taccacaggt 241 aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg aacagtgcgg 301 gctttttttt cgaccagaga tcacgaggta acaaccatgc gagtgttgaa gttcggcggt 361 acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc 421 aggcaagggc aggtagcgac cgtactttcc gcccccgcga aaattaccaa ccatctggtg 481 gcaatgattg aaaaaactat cggcggccag gatgctttgc cgaatatcag cgatgcagaa 541 cgtatttttt ctgacctgct cgcaggactt gccagcgcgc agccgggatt cccgcttgca 601 cggttgaaaa tggttgtcga acaagaattc gctcagatca aacatgttct gcatggtatc 661 agcctgctgg gtcagtgccc ggatagcatc aacgccgcgc tgatttgccg tggcgaaaaa 721 atgtcgatcg cgattatggc gggacttctg gaggcgcgtg ggcatcgcgt cacggtgatc 781 gatccggtag aaaaattgct ggcggtgggc cattaccttg aatctaccgt tgatatcgcg 841 gaatcgactc gccgtatcgc cgccagccag atcccggccg atcacatgat cctgatggcg 901 ggctttaccg ccggtaatga aaagggtgaa ctggtggtgc tgggccgtaa tggttccgac 961 tattccgccg ccgtgctggc cgcctgttta cgcgctgact gctgtgaaat ctggactgac 1021 gtcgatggcg tgtatacctg tgacccgcgc caggtgccgg acgccaggct gctgaaatcg 1081 atgtcctacc aggaagcgat ggaactctct tacttcggcg ccaaagtcct tcaccctcgc 1141 accattacgc ccatcgccca gttccagatc ccctgtctga ttaaaaatac cggtaatccg 1201 caggcgccag gaacgctgat cggcgcgtcc 
agcgacgatg ataacctgcc ggttaaaggg 1261 atctctaacc ttaacaacat ggcgatgttt agcgtctccg gcccgggaat gaaagggatg 1321 attgggatgg cggcgcgtgt tttcgccgcc atgtctcgcg ccgggatctc ggtggtgctc 1381 attacccagt cctcctctga gtacagcatc agtttctgtg tgccgcagag tgactgcgcg 1441 cgtgcccgcc gtgcgatgca ggatgagttc tatctggagc tgaaagaggg gctgctggag 1501 ccgctggcgg ttacggagcg gttggcgatt atctctgttg tcggcgacgg tatgcgcacg 1561 ctacgcggca tttcagcgaa attcttcgcc gcgctggcgc gggctaatat caatatcgtg 1621 gcgatcgctc agggatcttc tgagcgttct atttctgtgg tggtgaataa cgacgatgcc 1681 accaccggcg tgcgggtaac gcaccagatg ctgttcaata ccgatcaggt gattgaagtg 1741 tttgtcattg gcgtcggcgg cgtcggcggc gcgctactgg aacagcttaa acgtcagcaa 1801 acctggctga agaacaagca catcgatcta cgcgtgtgcg gcgtggcgaa ctcaaaggcg 1861 ttgctaacca atgtgcatgg cctgaatctg gacaactggc aggcggaact ggcgcaagcg 1921 aacgcgccgt tcaatctggg acgcttaatt cgcctggtga aagaatatca tctactcaat 1981 ccggtgattg ttgattgtac ctccagtcag gcggtggccg accagtatgc cgacttcctg 2041 cgtgaagggt tccatgtggt gacgccaaac aagaaagcga acacctcgtc aatggactac 2101 taccatcagc tacgtttcgc cgccgcgcaa tcacggcgca aattcttgta tgacaccaac 2161 gtcggcgccg gtttgccggt aatcgaaaac ctgcaaaacc tgctgaatgc cggtgatgaa 2221 ctgcaaaaat tttccggcat tctttccggg tcgctctctt ttattttcgg taaactggaa 2281 gaggggatga gtctctcaca ggcgaccgcc ctggcgcgcg agatgggcta taccgaaccc 2341 gatccgcgcg acgatctttc cggtatggat gtggcgcgta aactgttgat cctcgcccgc 2401 gagacgggcc gcgagctgga gctttccgat attgtgattg aaccggtgtt gccgaacgag 2461 tttgacgcct ccggcgatgt gaccgccttt atggcacatc tgccgcagct tgacgacgcg 2521 tttgccgccc gtgtggcgaa agctcgtgat gaaggtaagg tattgcgcta tgtgggcaat 2581 atcgaagagg atggcgtgtg ccgcgtgaaa attgccgaag ttgatggtaa cgatccgctc 2641 ttcaaagtga aaaacggtga aaacgcgctg gcgttctaca gccactatta tcagcccttg 2701 ccgttggtgc tgcgcggcta cggcgcaggc aatgatgtga cggcggcggg cgtgtttgcc 2761 gatctgttac ggaccctctc atggaagtta ggagtttaac atggtgaaag tgtatgcccc 2821 ggcttccagc gcgaacatga gcgtcggttt cgacgtgttg ggcgcagccg tcacacccgt 2881 tgacggcacg ttgctgggcg atgtggtatc cgttgaagca 
gcggaccatt tccgtctgca 2941 taacctgggg cgatttgccg ataaactgcc gccggagccg cgtgaaaata ttgtttatca 3001 gtgctgggaa cgtttttgcc aggcattggg gaaaaccatc ccggtggcga tgacgctgga 3061 aaaaaatatg ccgattggtt ccgggttagg gtccagcgcc tgttccgtcg tcgccgcgct 3121 ggtcgcgatg aatgagcact gcggcaaacc gttaaacgac acgcgtctgt tggcgctgat 3181 gggcgagctg gaaggccgta tctccggcag catccattac gataacgtcg cgccgtgctt 3241 tcttggcggt atgcagttga tgattgaaga aaacggcatt attagtcagc aggtgccggg 3301 ctttgatgag tggctatggg tactggctta tccgggcatt aaagtttcta ccgcagaagc 3361 acgggccatt ttgcctgcgc agtatcgccg tcaggattgc attgcgcatg gacggcatct 3421 ggccggtttt attcacgcct gttactcgcg gcagccgcag cttgccgccg cgctgatgaa 3481 agatgttatt gccgaaccat accgcgcgcg tttactgccg ggctttagcc aggcgcggca 3541 ggcggtgtcg gagatcggcg cgctggcgag cgggatttcc gggtcggggc cgacgctgtt 3601 tgcgctatgc gataaaccgg agacggcgca gcgcgtcgcg gactggctga gcaaacatta 3661 tctgcaaaat caggaaggct tcgttcatat ttgccggctg gatacggcgg gcgcacgagt 3721 agtgggataa tcaatgaaac tctataatct gaaagaccat aatgagcagg tcagctttgc 3781 gcaggccgtc acgcaaggac tgggcaaaca gcagggactt ttttttccgc acgatctgcc 3841 ggagtttagc ctgacggaaa ttgatgagat gctcaaccag gactttgtca gccgtagcgc 3901 aaagatcctc tcggcattta ttggcgatga aataccgcag caaattctgg aagagcgcgt 3961 ccgcgcggcg tttgcgttcc cggcaccggt agcgcaggta gaaagcgatg tcggctgcct 4021 ggagctgttc catggtccga cgctggcctt taaagacttc ggcgggcgtt ttatggcgca 4081 aatgctgacg catatcagcg gcgacaaacc ggtgacgatt ctgactgcaa cgtcaggcga 4141 taccggcgcg gcggtggctc acgcgttcta tggcctggaa aatgtgcggg tcgtcattct 4201 ctacccgcgc ggtaagatca gtccgttgca ggaaaaactg ttctgtacgc tgggcggcaa 4261 cattgaaacc gtggcgatcg acggcgattt cgacgcgtgc caggcgctgg tgaaacaggc 4321 atttgatgac gaagaactga aaacggcgct ggggctgaat tcggctaatt cgattaatat 4381 cagccgcctg ttggcgcaaa tttgctacta ctttgaagcc gtggcgcaac tgccgcaggg 4441 ggcgcgtaac caactggtga tctccgtccc cagcggcaac tttggcgatt tgacggcagg 4501 gctgctggcg aagtcgttag gcctaccggt gaaacgtttt atcgccgcca ccaacgtcaa 4561 cgacacggtg ccgcgttttc tgcatgacgg aaagtgggcg ccgaaagcga 
cgcaggcgac 4621 cctgtcgaat gcgatggatg tcagccagcc gaataactgg ccgcgcgtgg aggagctatt 4681 ccgccgtaaa atctggcgcc tgactgagct gggctatgcg gcggtggatg atactacgac 4741 acagcagacg atgcgcgagc tgaaagcgaa aggttatatc tcggaacctc atgcggcggt 4801 agcgtatcgg gcattacgcg accagttaaa ccctggcgag tatggcttgt ttctcggaac 4861 ggcgcatccg gcgaagttta aagagagcgt ggagtccatt ctgggagaaa cgctggcctt 4921 gcctgaagcg ctcgccgaac gcgccgacct gccgctgctt tcacatcatc tgcctgcgga 4981 ttttgccgcc ctgcgtaagc tgatgatgac ccgccaataa ccattgcgcc cggtggcgct 5041 gtcgcttacc gggcctatgg ggtggtgtcg atttgtaggc cggataaggc gtaaccgcca 5101 tccggcgatg ccgttactgc tcgtagcgtt taaagaccag ctcgtcttgt gtggaggttt 5161 cttcatcaaa gaaataccct tcacggtcaa acgcggtaag ctgttccggc ttcgttaagc 5221 ggttttcaat aataaaacgg ctcatcagtc cgcgcgcttt tttggcgtag aagcttacca 5281 ccttaaactt gccgtttttc tcatcaagga acacgggctt aatcagttcg gcattcagtt 5341 tcttcggctt caccgattta aaatattcct cggaggccag attcaccacc acccgatcgc 5401 cctgcgcctc aagcgcttcg ttgagcttat cggtaatgat atcgccccag aattgataaa 5461 gatctttgcc gcgcggattc tccaggcgaa tccccatctc cagacgataa ggctgcatta 5521 aatccagcgg gcgcaatacg ccatacaagc cagagagcat acgcagatgt tgttgagcaa 5581 aatcaaaatc cgcgtcgttg aacgtttccg cctgtaggcc ggtataaaca tcgcctttga 5641 acgccagaat cgcctggcgt gcattatccg gcgtaaaatg aggctgccag tcatgaaaac 5701 gcgtggcgtt gagatccgcc agtttgtcgc taattcccat cagcctggaa atttgcggcg 5761 ccgaaagctg gcgggcctgt tgaataagct gctggctgtg atccaacagc tccggctggg 5821 tatagcgggt cgtggccagc gggctttgat aatcaagcgt ttttgcaggt gaaatcagaa 5881 tcagcatatt cagtccttgc agggaatttt ctgcgacttt agcaaaaaaa cgccgcagag 5941 ttgaccgatg gttgcgattg tcggcttaat cgcgcgatgc cgcctcccag gtatctggcg 6001 ccagttgtgg ttcgatatcc gggaagcgcc gcggatcgaa ctgcggtctt acgcccagtt 6061 tccgttgtcg cagataatcg ctggcgaggg tatacaccac cggtgagagc aacaaaatcg 6121 ccgtcagatt ggtaatggcc atacaggcca tgatcatgtc agcgagctgc catatcagcg 6181 gaaaactgat aagcgtaccg gcgataacca tgccaagcgt cgcaaggcgt aatagccaga 6241 tagcctttgc gttatgtaac cgcagaaaaa acagattgtt ttcggcgtaa atatagttgg 
6301 cgacgataga actgaacgcg aacagaatga cgataagcgc gacaaaactg gcgccccatt 6361 caccggtcaa cgaaaccatt gcatgttgga gaagctgaat gccttctgtt gacgagtggg 6421 acgcgtgatt ccccgccagc aggataatca tcgcgctggc ggtacagatg ataatggtgt 6481 cgctgaatac gccaatcatt tgcacaatcc cctgcgcgac agggtgaggg ggatacgacg 6541 ttgccgctgc ggccgcatta ggcgttgacc ccattcccgc ttcattagag aacatcccac 6601 gctgaaaacc gctggtaata gcctgggtga gcgtatatcc ggctgcgcct gccgcggctt 6661 cctgccagcc aaatgcgctt ttgactatcg aggcgataac gccaggcatt tgctcaatat 6721 gccagaggca aatgaatacg ctgccggcga cccacaataa cgcgatgagg ggaatcagcc 6781 attgcatcag acgggcgacg cctttgatgc cgcgaatgat aattaacagg gcacagaacg 6841 ccagagcaat gccggagata agcggcggaa tgttgaaggc gaaatggagc gcgcgtgaga 6901 cggcattcgc ctgcacgcta ttaaaaatca acccgtaggc gacgagcaga aagagggcga 6961 aaaccacgcc catccagcgc attcccagcc cacgcgccat ataccacgcc gggccgccgc 7021 ggaactggcc tgtcgggtca cgttctttat aaagctgggc aagcgaacac tcggcgaagg 7081 aggtcgccat gccaatgatg gccgagaccc acatccagaa taccgcgccg ggaccgcctg 7141 cggcgatagc cagcgccacg ccggccaggt taccgctgcc aatccgcgcc gcgaggctgg 7201 tacacagagc ctgaaatgac gtcaggccgc ctggctgcgg gctaaggctg cctttcagac 7261 tgcggctaaa ttggcgaata taacgaaact gaatgaatcc ggtacgccag gtaaaccaac 7321 atcctgcgcc gagcagcagg taaatcatta ccgagcccca gagtatttcg ttaataaaac 7381 tgaaaaactc aggcattaac gtccctcttg ttgatgccgg cacgctttga taatcctgta 7441 taagcgtgac ccatgatgta gatgaccttg tcagactaat attaacggca gtttaccata 7501 aatacggtgg tatcctttaa ttgcgcatca accgtcggca gatacgcaaa cagtgcacaa 7561 gggcagccag gtgcatgtag gcggttgcgc tgtgagtgcg tcgtgttatc atcagggtag 7621 accggttaca tcccctaaca agctgtttaa agagaaactc tatcatgacg gacaaattga 7681 cctcccttcg tcagttcacc accgtagtgg ctgataccgg agatatcgcg gcaatgaaac 7741 tgtatcagcc gcaggatgct acaactaacc cttctctcat tcttaacgca gcgcaaatcc 7801 cggaatatcg taagctgatt gacgatgctg tcgcctgggc gaaacagcag agcagcgacc 7861 gcgcgcagca ggttgttgac gcgaccgata agctggcggt gaatattggc ctggagatcc 7921 tgaagctggt gccggggcgt atttctaccg aagttgacgc gcgtctgtct tatgacactg 7981 
aagcgtctat cgccaaagca aaacgtatca ttaaactcta caatgatgcg ggtatcagca 8041 acgatcgtat cctgatcaag ctggcgtcca cctggcaggg cattcgtgca gccgaacagc 8101 tggaaaaaga aggcatcaac tgtaacctga cgctgctgtt ctccttcgcg caggcgcgtg 8161 cgtgcgccga agcgggcgtc tacctgatct cgccgttcgt aggtcgtatt cttgactggt 8221 ataaagccaa taccgacaag aaagactatg cgccagctga agatccgggc gtggtttccg 8281 taacggaaat ctacgagtac tacaaacagc atggttacga aaccgtcgtt atgggcgcaa 8341 gcttccgtaa cgtaggcgaa attctggagc tggcgggctg cgaccgtctg actatcgcgc 8401 cggcattgct gaaagaactg gcggaaagcg aaggggcgat tgagcgtaag ctctctttct 8461 ccggcgaagt caaagcgcgc ccggaacgca ttaccgaagc cgagttcctg tggcagcatc 8521 accaggaccc catggcggtt gacaaactgg cggatggtat ccgtaagttt gcggtagacc 8581 aggaaaaact ggaaaaaatg atcggcgatc tgctgtaatc attaacgcgt ggccctgata 8641 tgggtcacgc tacctcttct gaaacctgtc tgtccttccc ttcgcagtgt atcattctgt 8701 ttaacgagac tgtttaaacg gaataatcat ggatacctta cgtattggct tagtttctat 8761 ctccgaccgc gcttcaagcg gcgtttacca ggataaaggc attcctgcgc ttgaggagtg 8821 gctcgcttct gcgctgacca cgcctttcga ggtccaacgg cgcttaattc ctgatgaaca 8881 ggaaattatc gagcaaacgt tgtgtgaact ggtcgatgag atgagctgtc atctggtgct 8941 gaccactggc ggtaccggtc cggcgcgtcg cgacgtcacg ccggacgcga cccttgccat 9001 cgccgaccgt gaaatgccag gttttggcga gcagatgcgc cagatcagcc tgcgctttgt 9061 gccgaccgcc attctttccc gccaggtggg cgttatccgt aaacaggcgt taattcttaa 9121 tctgcctgga cagccaaaat cgatcaaaga aacgctggaa ggcgtaaaag cggacgatgg 9181 cagcgttagc gtgccgggca tttttgcgag cgtgccgtat tgcatacagc tgcttgacgg 9241 gccgtatgtc gaaaccgcgc cggaagtggt tgccgctttc cgtccaaaga gcgccagacg 9301 tgagaatatg tcggactgac cggaaaatac tgatagtagg gttattcctc ccggtgcggg 9361 aggaataaaa gagatttagt gcgcctcgcc gattggcaga atagtgcggc caaattgctc 9421 gtttagcact tcacccatcg ccagataaat agcgctggcg ccgcaaacca gaccaaccca 9481 gcctgcgata tggatagttg cttcgttacc ggtgatattg ccgaccgcca gcagagcgaa 9541 cagtacggtc aggctcagga aaacaaattg cagcgcgcgg gcggctttca gcgtaccgaa 9601 gaacataaac agcgtgaaca cgccccacag acctaaataa gcgccgagta gctgagcgtc 9661 aggcgcatcc 
gtcagaccca ttttcggcat cagcaggata gcgaccagcg tcagccagaa 9721 cgaaccgtaa gaggtaaagg ccgttaaacc aaaggtattg ccttttttgt actccagcag 9781 accggcaaaa atttgcgcga taccgccgta gaaaatcccc atcgccagaa taataccgtc 9841 cagggcgaaa aaaccggcat tgtgcaggtt aagcagaatg gtggtcatgc cgaagcccat 9901 caggcccagc ggtgccggat tagccaactt agtgttgccc ataattcctc aaaatcatca 9961 taattgaatg gtgaaatagt ttcccagaat aacgagttcc gtattcgggg cgcggcataa // PyCogent-1.5.3/doc/data/test.paml000644 000765 000024 00000000540 10665667404 017565 0ustar00jrideoutstaff000000 000000 5 60 NineBande gcaaggcgccaacagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact Mouse gcagtgagccagcagagcagatgggctgcaagtaaaggaacatgtaacgacaggcaggtt Human gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact HowlerMon gcaaggagccaacataacagatgggctgaaagtgaggaaacatgtaatgataggcagact DogFaced gcaaggagccagcagaacagatgggttgaaactaaggaaacatgtaatgataggcagact PyCogent-1.5.3/doc/data/test.tree000644 000765 000024 00000000235 11213030412 017541 0ustar00jrideoutstaff000000 000000 (((Human:0.0311054096183,HowlerMon:0.0415847131449):0.0382963424874,Mouse:0.277353608988):0.0197278502379,NineBande:0.0939768158209,DogFaced:0.113211053859);PyCogent-1.5.3/doc/data/test2.fasta000644 000765 000024 00000000531 10665667404 020014 0ustar00jrideoutstaff000000 000000 >NineBande cgccaacagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact >Mouse gcagtgagccagcagagcagatgggctgcaagtaaaggaacatgtaacgacaggcaggtt >HowlerMon gcaaggagccaacataacagatgggctgaaagtgaggaaacatgtaatgataggcagact >Human gcaaggagccaacataacagatgggctggaagtaaggaaacatgtaatgataggcggact >DogFaced gcaaggagccagcagaacagatgggttgaaactaaggaaacatgtaatgataggcagact PyCogent-1.5.3/doc/data/trna_profile.fasta000644 000765 000024 00000004253 11075742320 021427 0ustar00jrideoutstaff000000 000000 >seq_50_Halobacterium_sp. 
GCCGCCUUAGCUCAGACUGGGAGAGCACUCGACUGAAGAUCGAGCUGUCCCUGGUUCAAAUCCGGGAGGCGGC-CCA >seq_23_Haemophilus_influenzae GCCUCGAUAGCUCAGUC-GGUAGAGCAGGGGAUUGAAAAUCCCCGUGUCGGUGGUUCGAUUCCGCCUCGAGGCACCA >seq_46_Salmonella_enterica GCCCGGAUAGCUCAGUC-GGUAGAGCAGGGGAUUGAAAAUCCCCGUGUCCUUGGUUCGAUUCCGAGUCCGGGCACCA >seq_135_Yersinia_pestis_biovar_Medievalis GCCCGGAUAGCUCAGU-CGGUAGAGCAGGGGAUUGAAAAUCCCCGUGUCCUUGGUUCGAUUCCGAGUCCGGGCACCA >seq_148_Bartonella_quintana GCCCAGAUAGCUCAGUU-GGUAGAGCAGCGGACUGAAAAUCCGCGUGUCCGUGGUUCGAAUCCGCGUCUGGGCACCA >seq_117_Pseudomonas_syringae_pv._Tomato GCCCAGAUAGCUCAGU-CGGUAGAGCAGGGGAUUGAAAAUCCCCGUGUCGGCGGUUCGAUUCCGUCUCUGGGCACCA >seq_31_Pseudomonas_aeruginosa GCCCAGGUAGCUCAGUU-GGUAGAGCAGGGGAUUGAAAAUCCCCGUGUCGGCGGUUCGAUUCCGUCCCUGGGCACCA >seq_95_Shewanella_oneidensis GCCCGAAUAGCUCAGUC-GGUAGAGCAGAGGAUUGAAAAUCCUCGUGUCCCUGGUUCGAUUCCGGGUUCGGGCACCA >seq_119_Bacillus_anthracis GGCUCGGUAGCUCAGU-CGGUAGAGCAGAGGACUGAAAAUCCUCGUGUCGGCGGUUCGAUUCCGUCCCGAGCCACCA >seq_86_Xanthomonas_axonopodis_pv._citri. 
GGCCGAGUAGCUCAGUU-GGUAGAGCAGGGGAUUGAAAAUCCCCGUGUCGGCGGUUCGAUUCCGUCCUCGGCCACCA >seq_146_Bacillus_thuringiensis_serovar_konkukian GGCUCGGUAGCUCAGUC-GGUAGAGCAGAGGACUGAAAAUCCUCGUGUCGGCGGUUCGAUUCCGUCCCGAGCCACCA >seq_24_Campylobacter_jejuni GGUUGGAUAGCUCAGUC-GGUAGAGCAGCAGACUGAAAAUCUGCGUGUCGGCAGUUCGAUUCUGCCUCUAACCACCA >seq_138_Bacillus_cereus GGCUCGGUAGCUCAGU-CGGUAGAGCAGAGGACUGAAAAUCCUCGUGUCGGCGGUUCGAUUCCGUCCCGAGCCACCA >seq_104_Tropheryma_whipplei GGCUCUGUAGCUCAGU-UGGUAGAGCGUUCGACUGAAAAUCGAAAGGUCACCGGAUCGAUACCGGUCGGAGCCACAA >seq_59_Thermoplasma_acidophilum GCCGUGGUAGCUCAGUU-GGGAGAGCGCCAGACUGAAGAUCUGGAGGUCGCUGGUUCGAUCCCGGCCCACGGC-CCA >seq_129_Bacillus_anthracis GGCUCGGUAGCUCAGU-CGGUAGAGCAGAGGACUGAAAAUCCUCGUGUCGGCGGUUCGAUUCCGUCCCGAGCCACUC >seq_130_Geobacter_sulfurreducens GCCCAGAUAGCUCAGC-UGGUAGAGCAGAGGACUGAAAAUCCUCGUGUCGGCGGUUCGAUUCCGUCUCUGGGCACCA >seq_48_Aeropyrum_pernix GCCGCCGUAGCUCAGC--GGGAGAGCGCCCGGCUGAAGACCGGGUGGUCCGGGGUUCGAAUCCCCGCGGCGGCACCA >seq_105_Xylella_fastidiosa GGCCGGGUAGCUCAGU-CGGUAGAGCAGGGGACUGAAAAUCCCCGUGUCGGGGGUUCGAUUCCCUCCCCGGCCACCA >seq_70_Saccharomyces_cerevisiae_(baker's_yeast) GCGGAUUUAGCUCAGUU-GGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA PyCogent-1.5.3/doc/cookbook/accessing_databases.rst000644 000765 000024 00000042505 11751654700 023327 0ustar00jrideoutstaff000000 000000 ******************* Accessing databases ******************* .. Gavin Huttley, Kristian Rother, Patrick Yannul, Rob Knight, Yarden Katz NCBI ==== EUtils is a web service offered by the NCBI to access the sequence, literature and other databases by a special format of URLs. PyCogent offers an interface to construct the URLs and retrieve the results in text format. From Pubmed ----------- Retrieving PubMed records from NCBI by PubMed ID ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The process for getting PubMed records by PubMed ID (PMID) is very similar to that for getting sequences. Basically, you just need to pass in the unique id associated with the article. 
For example, if you want to get the reference to the original PyCogent paper to see how far we've come since then, you can do this: .. doctest:: >>> from cogent.db.ncbi import EFetch >>> ef = EFetch(id='17708774', db='pubmed', rettype='brief') >>> ef.read() '\n1: Knight R et al. PyCogent: a toolkit for makin...[PMID: 17708774] \n' If you want more information, there are other rettypes, e.g. .. doctest:: >>> ef = EFetch(id='17708774', db='pubmed', rettype='citation') >>> ef.read() "\n1: Genome Biol. 2007;8(8):R171. \n\nPyCogent: a toolkit for making sense from sequence.\n\nKnight R, Maxwell P, Birmingham A, Carnes J, Caporaso JG, Easton BC, Eaton M,\nHamady M, Lindsay H, Liu Z, Lozupone C, McDonald D, Robeson M, Sammut R, Smit S,\nWakefield MJ, Widmann J, Wikman S, Wilson S, Ying H, Huttley GA.\n\nDepartment of Chemistry and Biochemistry, University of Colorado, Boulder,\nColorado, USA. rob@spot.colorado.edu\n\nWe have implemented in Python the COmparative GENomic Toolkit, a fully\nintegrated and thoroughly tested framework for novel probabilistic analyses of\nbiological sequences, devising workflows, and generating publication quality\ngraphics. PyCogent includes connectors to remote databases, built-in generalized\nprobabilistic techniques for working with biological sequences, and controllers\nfor third-party applications. The toolkit takes advantage of parallel\narchitectures and runs on a range of hardware and operating systems, and is\navailable under the general public license from\nhttp://sourceforge.net/projects/pycogent.\n\nPublication Types:\n Evaluation Studies\n Research Support, N.I.H., Extramural\n Research Support, Non-U.S. 
Gov't\n\nMeSH Terms:\n Animals\n BRCA1 Protein/genetics\n Databases, Genetic\n Genomics/methods*\n Humans\n Phylogeny\n Protein Conformation\n Proteobacteria/classification\n Proteobacteria/genetics\n Sequence Analysis/methods*\n Software*\n von Willebrand Factor/chemistry\n von Willebrand Factor/genetics\n\nSubstances:\n BRCA1 Protein\n von Willebrand Factor\n\nPMID: 17708774 [PubMed - indexed for MEDLINE]\n" Similarly, if you want something more machine-readable (but quite a lot less human-readable), you can specify XML in the retmode: .. doctest:: >>> ef = EFetch(id='17708774', db='pubmed', rettype='citation', retmode='xml') >>> cite = ef.read() >>> for line in cite.splitlines()[:5]: ... print line ... Only a partial example is shown as there are quite a few lines. As with sequences, you can retrieve multiple accessions at once. Searching for records using EUtils ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Getting records by their primary identifiers is very nice if you actually have the primary identifiers, but what if you don't? For example, what if you want to do a search based on a keyword, or have a genbank accession number rather than a gi, or want to get a range of records? Fortunately, the more general EUtils class allows this kind of complex workflow with relatively little intervention. For example, if you want to search for articles that mention PyCogent: .. doctest:: >>> from cogent.db.ncbi import EUtils >>> eu = EUtils(db='pubmed', rettype='brief') >>> res = eu['PyCogent'] >>> print res.read() 1: Smit S et al. From knotted to nested RNA st...[PMID: 18230758] 2: Knight R et al. PyCogent: a toolkit for makin...[PMID: 17708774] ...or perhaps you want only the ones with PyCogent in the title, in which case you can use any qualifier that NCBI supports: .. doctest:: >>> res = eu['PyCogent[ti]'] >>> print res.read() 1: Knight R et al. 
PyCogent: a toolkit for makin...[PMID: 17708774] The NCBI-supported list of field qualifiers, and lots of documentation generally on how to do pubmed queries, is `here `_. One especially useful feature is the ability to get a list of primary identifiers matching a query. You do this by setting ``rettype='uilist'`` (not idlist any more, so again you may need to update old code examples). For example: .. doctest:: >>> eu = EUtils(db='pubmed', rettype='uilist') >>> res = eu['PyCogent'] >>> print res.read() 18230758 17708774 This is especially useful when you want to do a bunch of queries (whether for journal articles, as shown here, or for sequences), combine the results, then download the actual unique records only once. You could of course do this with an incredibly complex single query, but good luck debugging that query... For sequences ------------- Fetching FASTA or Genpept sequences from NCBI using EFetch with GI's ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If you already have a list of GI's (the numeric identifiers that are used by GenBank internally as identifers), your job is very easy: you just need to use ``EFetch`` to retrieve the corresponding records. In general, this works for any case where the identifiers you have are the primary keys, e.g. for PubMed you use the PubMed ID (see example below). Here is an example of getting the nucleotide record that corresponds to one particular id, in this case id # 459567 (chosen arbitrarily). The record arrives as a file-like object that can be read; in this case, we are looking at each line and printing the first 40 characters. .. doctest:: >>> from cogent.db.ncbi import EFetch >>> ef = EFetch(id='459567', rettype='fasta') >>> lines = ef.read().splitlines() >>> for line in lines: ... print line[:40] ... 
>gi|459567|dbj|D28543.1|HPCNS5PC Hepatit GAGCACGACATCTACCAATGTTGCCAACTGAACCCAGAGG GGCTTTACCTTGGTGGTCCCATGTTTAACTCGCGAGGTCA CGGGGTTCTTCCAACCAGCATGGGCAATACCCTCACATGT GCAGGCCTCACCAATTCTGACATGTTGGTTTGCGGAGATG TC Similarly, if your id refers to a protein record, you can get that by setting the ``rettype`` to 'gp'. .. doctest:: >>> genpept = EFetch(id='1234567,459567', rettype='gp').read() You'll probably notice that the lines look suspiciously like FASTA-format records. This is in fact true: the ``rettype`` parameter controls what type of record you get back. For example, if we do the same search with ``rettype='brief'``, we get: .. doctest:: >>> from cogent.db.ncbi import EFetch >>> ef = EFetch(id='459567', rettype='brief') >>> lines = ef.read().splitlines() >>> for line in lines: ... print line ... D28543 Hepatitis C virus... [gi:459567] The current ``rettypes`` (as of this writing on 4/14/2010) for the 'core' NCBI databases are native, fasta, gb, gp, gbwithparts, gbc, gpc, est, gss, seqid, acc, ft. Formerly, but not currently, 'genbank' was a synonym for 'gb' and 'genpept' was a synonym for 'gp': however, these synonyms no longer work and need to be fixed if you encounter them in old code. For more information check NCBI's `format documentation `_. Note that there are two separate concepts: rettype and retmode. rettype controls what kind of data you'll get, and retmode controls how the data will be formatted. For example: .. doctest:: >>> from cogent.db.ncbi import EFetch >>> ef = EFetch(id='459567', rettype='fasta', retmode='text') >>> lines = ef.read().splitlines() >>> for line in lines: ... print line[:40] ... >gi|459567|dbj|D28543.1|HPCNS5PC Hepatit GAGCACGACATCTACCAATGTTGCCAACTGAACCCAGAGG GGCTTTACCTTGGTGGTCCCATGTTTAACTCGCGAGGTCA CGGGGTTCTTCCAACCAGCATGGGCAATACCCTCACATGT GCAGGCCTCACCAATTCTGACATGTTGGTTTGCGGAGATG TC >>> ef = EFetch(id='459567', rettype='fasta', retmode='html') >>> lines = ef.read().splitlines() >>> for line in lines: ... print line[:40] ...
>gi|459567|dbj|D28543.1|HPCNS5PC Hepatit GAGCACGACATCTACCAATGTTGCCAACTGAACCCAGAGG GGCTTTACCTTGGTGGTCCCATGTTTAACTCGCGAGGTCA CGGGGTTCTTCCAACCAGCATGGGCAATACCCTCACATGT GCAGGCCTCACCAATTCTGACATGTTGGTTTGCGGAGATG TC >>> ef = EFetch(id='459567', rettype='fasta', retmode='xml') >>> lines = ef.read().splitlines() >>> for line in lines: ... print line[:40] ... 459567 D28543.1 11103 Hepatitis C virusHepatitis C virus gene f 282 GAGCACGACATCTACCAATGTTG You'll notice that the second case is some funny-looking html. Thanks, NCBI! This is not our fault, please don't file a bug report. To figure out whether something is actually surprising behavior at NCBI, you can always capture the command-line and run it in a web browser. You can do this by calling str() on the ``ef``, or by printing it. For example: .. doctest:: >>> print ef http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmax=100&retmod... If you paste the resulting string into your web browser and you get the same incorrect result that you get using PyCogent, you know that you should direct your support requests NCBI's way. If you want to use your own email address instead of leaving it as the default (the module developer), you can do that just by passing it in as a parameter. For example, in the unlikely event that I want NCBI to contact me instead of Mike if something goes wrong with my script, I can achieve that as follows: .. doctest:: >>> ef = EFetch(id='459567', rettype='fasta', retmode='xml', email='rob@spot.colorado.edu') >>> print ef http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmax=100&retmod... You can also select multiple ids (pass in as comma-delimited list): .. doctest:: >>> ef = EFetch(id='459567,459568', rettype='brief') >>> ef.read() 'D28543 Hepatitis C virus... [gi:459567]\nBAA05896 NS5 protein [Hepa... [gi:459568]' Retrieving GenPept files from NCBI via Eutils ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We query for just one accession to illustrate the process. 
A more general query can be executed by replacing ``'BAB52044'`` with ``'"lysyl tRNA-synthetase"[ti] AND bacteria[orgn]'`` in the snippet below. .. doctest:: >>> from cogent.db.ncbi import EUtils >>> e = EUtils(numseqs=100, db='protein', rettype='gp') >>> result = e['BAB52044'] >>> print result.read() LOCUS BAB52044 548 aa linear BCT 16-MAY-2009 DEFINITION lysyl tRNA synthetase [Mesorhizobium loti MAFF303099]. ACCESSION BAB52044 VERSION BAB52044.1 GI:14025444 DBSOURCE accession BA000012.4 KEYWORDS . SOURCE Mesorhizobium loti MAFF303099... Retrieving and parsing GenBank entries ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent.parse.genbank import RichGenbankParser >>> from cogent.db.ncbi import EUtils >>> e = EUtils(numseqs=100, db='protein', rettype='gp') >>> result = e['"lysyl tRNA-synthetase"[ti] AND bacteria[orgn]'] >>> parser = RichGenbankParser(result.readlines()) >>> gb = [(accession, seq) for accession, seq in parser] Printing the resulting list (``gb``) will generate output like: .. code-block:: python [('AAA83071', Sequence(MSEQHAQ... 505)), ('ACS40931', ... Parsing in more detail: a single GenBank entry ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. TODO you could select these from each sequence using the getFeaturesMatching .. doctest:: >>> from cogent.db.ncbi import EUtils >>> from cogent.parse.genbank import RichGenbankParser >>> e = EUtils(db="nucleotide", rettype="gb") >>> record = e['154102'].readlines() >>> parser = RichGenbankParser(record) >>> accession, seq = [record for record in parser][0] >>> accession 'STYHEMAPRF' >>> type(seq) >>> def gene_and_cds(f): ... return f['type'] == 'CDS' and 'gene' in f ... >>> cds_features = [f for f in seq.Info.features if gene_and_cds(f)] >>> for cds_feature in cds_features: ... print cds_feature['gene'], cds_feature['location'] ...
['hemA'] 732..1988 ['prfA'] 2029..3111 Retrieving a bacterial genome file ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To obtain a full bacterial genome, run the following to get the complete *Salmonella typhimurium* genome sequence (Genbank) file. (For this documentation, we include a partial file for illustration purposes.) .. code-block:: python from cogent.db.ncbi import EUtils e = EUtils(db="nucleotide", rettype="gb") outfile = open('data/ST.genome.gb','w') outfile.write(e['AE006468'].read()) outfile.close() For larger files, you might want to dump them directly into a file on your hard drive rather than reading them into memory first, e.g. .. code-block:: python e.filename='ST.genome.gb' f = e['AE006468'] dumps the result into the file directly, and returns you a handle to the open file where you can read the result, get the path, or do any of the other standard file operations. Now do the analysis: .. doctest:: >>> from cogent.parse.genbank import RichGenbankParser >>> infile = open('data/ST_genome_part.gb', 'r') >>> parser = RichGenbankParser(infile) >>> accession, seq = [record for record in parser][0] >>> gene_and_cds = lambda f: f['type'] == 'CDS' and 'gene' in f >>> gene_name = lambda f: f['gene'] >>> all_cds = [f for f in seq.Info.features if gene_and_cds(f)] >>> for cds in sorted(all_cds, key=gene_name): ... print cds['gene'][0].ljust(6), ... print cds['protein_id'], cds['location'] ... mog ['AAL18972.1'] 8729..9319 talB ['AAL18971.1'] 7665..8618 thrA ['AAL18966.1'] 337..2799 thrB ['AAL18967.1'] 2801..3730 thrC ['AAL18968.1'] 3734..5020 thrL ['AAL18965.1'] 190..255 yaaA ['AAL18969.1'] complement(5114..5887) yaaH ['AAL18973.1'] complement(9376..9942) yaaJ ['AAL18970.1'] complement(5966..7396) The EUtils modules are generic, so additional databases like OMIM can be accessed using similar mechanisms. Retrieving PubMed abstracts from NCBI via EUtils ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
doctest:: :options: +NORMALIZE_WHITESPACE >>> from cogent.db.ncbi import EUtils >>> e = EUtils(db='pubmed',rettype='brief') >>> result = e['Simon Easteal AND Von Bing Yap'].read() >>> print result 1: Yap VB et al. Estimates of the effect of na...[PMID: 19815689] 2: Schranz HW et al. Pathological rate matrices: f...[PMID: 19099591] Retrieving PubMed abstracts via PMID ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent.db.ncbi import EUtils >>> e = EUtils(db='pubmed',rettype='abstract') >>> result = e['14983078'].read() KEGG ==== Complete genomes ---------------- *To be written.* Orthologs --------- *To be written.* Functional assignments ---------------------- *To be written.* Pathway assignments ------------------- *To be written.* Ensembl ======= .. include:: ensembl.rst PDB === For structures -------------- The PDB module is very simple and basically gets a pdb coordinates file by accession number. Searches, etc. are not currently implemented because it's easier to get the pdb ids from NCBI than to scrape PDB's html results format. .. doctest:: >>> from cogent.db.pdb import Pdb >>> p = Pdb() >>> result = p['3L0U'] returns a handle to a file containing the PDB coordinates (that you can, for example, pass to the PDB parser in a fashion analogous to how you pass the GenBank record above to the RichGenbankParser). See the pdb parser documentation for more info. To send results directly to a file, you can use the retrieve() method of the Pdb object. Rfam ==== For RNA secondary structures, alignments, functions --------------------------------------------------- *To be written.* GoldenPath (not yet implemented) ================================ *To be written.* Whole-genome alignments, orthologs, annotation tracks ----------------------------------------------------- *To be written.* .. following cleans up files .. 
doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files('ST.genome.gb', error_on_missing=False) PyCogent-1.5.3/doc/cookbook/alignments.rst000644 000765 000024 00000064673 12014671600 021523 0ustar00jrideoutstaff000000 000000 Collections and Alignments -------------------------- .. authors, Gavin Huttley, Kristian Rother, Patrick Yannul, Tom Elliott, Jan Kosinski For loading collections of unaligned or aligned sequences see :ref:`load-seqs`. Basic Collection objects ^^^^^^^^^^^^^^^^^^^^^^^^ Constructing a ``SequenceCollection`` or ``Alignment`` object from strings """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadSeqs, DNA >>> dna = {'seq1': 'ATGACC', ... 'seq2': 'ATCGCC'} >>> seqs = LoadSeqs(data=dna, moltype=DNA) >>> print type(seqs) >>> seqs = LoadSeqs(data=dna, moltype=DNA, aligned=False) >>> print type(seqs) Converting a ``SequenceCollection`` to FASTA format """"""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadSeqs >>> seq = LoadSeqs('data/test.paml', aligned=False) >>> fasta_data = seq.toFasta() >>> print fasta_data >DogFaced GCAAGGAGCCAGCAGAACAGATGGGTTGAAACTAAGGAAACATGTAATGATAGGCAGACT >HowlerMon GCAAGGAGCCAACATAACAGATGGGCTGAAAGTGAGGAAACATGTAATGATAGGCAGACT >Human GCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACT >Mouse GCAGTGAGCCAGCAGAGCAGATGGGCTGCAAGTAAAGGAACATGTAACGACAGGCAGGTT >NineBande GCAAGGCGCCAACAGAGCAGATGGGCTGAAAGTAAGGAAACATGTAATGATAGGCAGACT Adding new sequences to an existing collection or alignment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ New sequences can be either appended or inserted using the ``addSeqs`` method. More than one sequence can be added at the same time. Note that ``addSeqs`` does not modify the existing collection/alignment, it creates a new one.
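Because ``addSeqs`` returns a new object rather than mutating the original, both versions can be kept around safely. The copy-then-extend pattern it relies on can be sketched in plain Python, with no PyCogent required; the ``add_seqs`` helper below is a hypothetical dict-based stand-in for a sequence collection, not the real method:

```python
# Hypothetical stand-in for the non-mutating addSeqs pattern: build and
# return a new name -> sequence mapping, leaving the original untouched.
def add_seqs(existing, new_seqs):
    combined = dict(existing)   # shallow copy of the original collection
    combined.update(new_seqs)   # extend the copy only
    return combined

aln = {'seq1': 'ATGAA------', 'seq2': 'ATG-AGTGATG'}
new_aln = add_seqs(aln, {'seq0': 'ATG-AGT-AGG'})

print(sorted(new_aln))   # ['seq0', 'seq1', 'seq2']
print(sorted(aln))       # ['seq1', 'seq2'] -- original unchanged
```

The same non-mutating convention applies to the other ``add*`` methods described below, so intermediate alignments are never silently altered.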
Appending the sequences """"""""""""""""""""""" ``addSeqs`` without additional parameters will append the sequences to the end of the collection/alignment. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs(data= [('seq1', 'ATGAA------'), ... ('seq2', 'ATG-AGTGATG'), ... ('seq3', 'AT--AG-GATG')], moltype=DNA) >>> print aln >seq1 ATGAA------ >seq2 ATG-AGTGATG >seq3 AT--AG-GATG >>> new_seqs = LoadSeqs(data= [('seq0', 'ATG-AGT-AGG'), ... ('seq4', 'ATGCC------')], moltype=DNA) >>> new_aln = aln.addSeqs(new_seqs) >>> print new_aln >seq1 ATGAA------ >seq2 ATG-AGTGATG >seq3 AT--AG-GATG >seq0 ATG-AGT-AGG >seq4 ATGCC------ .. note:: The order is not preserved if you use the ``toFasta`` method, which sorts sequences by name. Inserting the sequences """"""""""""""""""""""" Sequences can be inserted into an alignment at the specified position using either the ``before_name`` or ``after_name`` arguments. .. doctest:: >>> new_aln = aln.addSeqs(new_seqs, before_name='seq2') >>> print new_aln >seq1 ATGAA------ >seq0 ATG-AGT-AGG >seq4 ATGCC------ >seq2 ATG-AGTGATG >seq3 AT--AG-GATG >>> new_aln = aln.addSeqs(new_seqs, after_name='seq2') >>> print new_aln >seq1 ATGAA------ >seq2 ATG-AGTGATG >seq0 ATG-AGT-AGG >seq4 ATGCC------ >seq3 AT--AG-GATG Inserting sequence(s) based on their alignment to a reference sequence """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" Already aligned sequences can be added to an existing ``Alignment`` object and aligned at the same time using the ``addFromReferenceAln`` method. The alignment is performed based on their alignment to a reference sequence (which must be present in both alignments). The method assumes the first sequence, ``ref_aln.Names[0]``, is the reference. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs(data= [('seq1', 'ATGAA------'), ... ('seq2', 'ATG-AGTGATG'), ... ('seq3', 'AT--AG-GATG')], moltype=DNA) >>> ref_aln = LoadSeqs(data= [('seq3', 'ATAGGATG'), ...
('seq0', 'ATG-AGCG'), ... ('seq4', 'ATGCTGGG')], moltype=DNA) >>> new_aln = aln.addFromReferenceAln(ref_aln) >>> print new_aln >seq1 ATGAA------ >seq2 ATG-AGTGATG >seq3 AT--AG-GATG >seq0 AT--G--AGCG >seq4 AT--GC-TGGG ``addFromReferenceAln`` has the same arguments as ``addSeqs`` so ``before_name`` and ``after_name`` can be used to insert the new sequences at the desired position. .. note:: This method does not work with the ``DenseAlignment`` class. Removing all columns with gaps in a named sequence ++++++++++++++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs(data= [('seq1', 'ATGAA---TG-'), ... ('seq2', 'ATG-AGTGATG'), ... ('seq3', 'AT--AG-GATG')], moltype=DNA) >>> new_aln = aln.getDegappedRelativeTo('seq1') >>> print new_aln >seq1 ATGAATG >seq2 ATG-AAT >seq3 AT--AAT The elements of a collection or alignment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Accessing individual sequences from a collection or alignment by name """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" Using the ``getSeq`` method allows for extracting an unaligned sequence from a collection or alignment by name. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs(data= [('seq1', 'ATGAA------'), ... ('seq2', 'ATG-AGTGATG'), ... ('seq3', 'AT--AG-GATG')], moltype=DNA) >>> seq = aln.getSeq('seq1') >>> seq.Name 'seq1' >>> type(seq) >>> seq.isGapped() False Alternatively, if you want to extract the aligned (i.e., gapped) sequence from an alignment, you can use ``getGappedSeq``. .. doctest:: >>> seq = aln.getGappedSeq('seq1') >>> seq.isGapped() True >>> print seq ATGAA------ To see the names of the sequences in a sequence collection, you can use either the ``Names`` attribute or ``getSeqNames`` method. .. 
doctest:: >>> aln.Names ['seq1', 'seq2', 'seq3'] >>> aln.getSeqNames() ['seq1', 'seq2', 'seq3'] Slice the sequences from an alignment like a list """"""""""""""""""""""""""""""""""""""""""""""""" The usual approach is to access a ``SequenceCollection`` or ``Alignment`` object as a dictionary, obtaining the individual sequences using the titles as "keys" (above). However, one can also iterate through the collection like a list. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> fn = 'data/long_testseqs.fasta' >>> seqs = LoadSeqs(fn, moltype=DNA, aligned=False) >>> my_seq = seqs.Seqs[0] >>> my_seq[:24] DnaSequence(TGTGGCA... 24) >>> str(my_seq[:24]) 'TGTGGCACAAATACTCATGCCAGC' >>> type(my_seq) >>> aln = LoadSeqs(fn, moltype=DNA, aligned=True) >>> aln.Seqs[0][:24] [0:24]/2532 of DnaSequence(TGTGGCA... 2532) >>> print aln.Seqs[0][:24] TGTGGCACAAATACTCATGCCAGC Getting a subset of sequences from the alignment """""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs('data/test.paml', moltype=DNA) >>> aln.Names ['NineBande', 'Mouse', 'Human', 'HowlerMon', 'DogFaced'] >>> new = aln.takeSeqs(['Human', 'HowlerMon']) >>> new.Names ['Human', 'HowlerMon'] Note the subset contains references to the original sequences, not copies. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs('data/test.paml', moltype=DNA) >>> seq = aln.getSeq('Human') >>> new = aln.takeSeqs(['Human', 'HowlerMon']) >>> id(new.getSeq('Human')) == id(aln.getSeq('Human')) True Alignments ^^^^^^^^^^ Creating an ``Alignment`` object from a ``SequenceCollection`` """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" ..
doctest:: >>> from cogent.core.alignment import Alignment >>> seq = LoadSeqs('data/test.paml', aligned=False) >>> aln = Alignment(seq) >>> fasta_1 = seq.toFasta() >>> fasta_2 = aln.toFasta() >>> assert fasta_1 == fasta_2 Handling gaps """"""""""""" Remove all gaps from an alignment in FASTA format +++++++++++++++++++++++++++++++++++++++++++++++++ This necessarily returns a ``SequenceCollection``. .. doctest:: >>> from cogent import LoadSeqs >>> aln = LoadSeqs("data/primate_cdx2_promoter.fasta") >>> degapped = aln.degap() >>> print type(degapped) .. TODO the following should be preceded by a section describing the writeToFile method and format argument Writing sequences to file """"""""""""""""""""""""" Both collection and alignment objects have a ``writeToFile`` method. The output format is inferred from the filename suffix, .. doctest:: >>> from cogent import LoadSeqs, DNA >>> dna = {'seq1': 'ATGACC', ... 'seq2': 'ATCGCC'} >>> aln = LoadSeqs(data=dna, moltype=DNA) >>> aln.writeToFile('sample.fasta') or by the ``format`` argument. .. doctest:: >>> aln.writeToFile('sample', format='fasta') .. now clean the files up .. doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files(['sample', 'sample.fasta'], error_on_missing=False) Converting an alignment to FASTA format """"""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent.core.alignment import Alignment >>> seq = LoadSeqs('data/long_testseqs.fasta') >>> aln = Alignment(seq) >>> fasta_align = aln.toFasta() Converting an alignment into Phylip format """""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent.core.alignment import Alignment >>> seq = LoadSeqs('data/test.paml') >>> aln = Alignment(seq) >>> phylip_file, name_dictionary = aln.toPhylip() Converting an alignment to a list of strings """""""""""""""""""""""""""""""""""""""""""" .. 
doctest:: >>> from cogent.core.alignment import Alignment >>> seq = LoadSeqs('data/test.paml') >>> aln = Alignment(seq) >>> string_list = aln.todict().values() Slicing an alignment ^^^^^^^^^^^^^^^^^^^^ By rows (sequences) """"""""""""""""""" An ``Alignment`` can be sliced .. doctest:: >>> from cogent import LoadSeqs, DNA >>> fn = 'data/long_testseqs.fasta' >>> aln = LoadSeqs(fn, moltype=DNA, aligned=True) >>> print aln[:24] >Human TGTGGCACAAATACTCATGCCAGC >HowlerMon TGTGGCACAAATACTCATGCCAGC >Mouse TGTGGCACAGATGCTCATGCCAGC >NineBande TGTGGCACAAATACTCATGCCAAC >DogFaced TGTGGCACAAATACTCATGCCAAC but a ``SequenceCollection`` cannot be sliced .. doctest:: >>> from cogent import LoadSeqs, DNA >>> fn = 'data/long_testseqs.fasta' >>> seqs = LoadSeqs(fn, moltype=DNA, aligned=False) >>> print seqs[:24] Traceback (most recent call last): TypeError: 'SequenceCollection' object is not subscriptable Getting a single column from an alignment """"""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent.core.alignment import Alignment >>> seq = LoadSeqs('data/test.paml') >>> aln = Alignment(seq) >>> column_four = aln[3] Getting a region of contiguous columns """""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent.core.alignment import Alignment >>> aln = LoadSeqs('data/long_testseqs.fasta') >>> region = aln[50:70] Iterating over alignment positions """""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadSeqs >>> aln = LoadSeqs('data/primate_cdx2_promoter.fasta') >>> col = aln[113:115].iterPositions() >>> type(col) >>> list(col) [['A', 'A', 'A'], ['T', '-', '-']] Getting codon 3rd positions from an alignment """"""""""""""""""""""""""""""""""""""""""""" We'll do this by specifying the position indices of interest, creating a sequence ``Feature`` and using that to extract the positions. .. doctest:: >>> from cogent import LoadSeqs >>> aln = LoadSeqs(data={'seq1': 'ATGATGATG---', ... 
'seq2': 'ATGATGATGATG'})
    >>> range(len(aln))[2::3]
    [2, 5, 8, 11]
    >>> indices = [(i, i+1) for i in range(len(aln))[2::3]]
    >>> indices
    [(2, 3), (5, 6), (8, 9), (11, 12)]
    >>> pos3 = aln.addFeature('pos3', 'pos3', indices)
    >>> pos3 = pos3.getSlice()
    >>> print pos3
    >seq2
    GGGG
    >seq1
    GGG-

.. _filter-positions:

Filtering positions
"""""""""""""""""""

Eliminating columns with non-nucleotide characters
++++++++++++++++++++++++++++++++++++++++++++++++++

We sometimes want to eliminate ambiguous or gap data from our alignments. We show how to exclude alignment columns by the characters they contain. In the first instance we do this just for single nucleotide columns, then for trinucleotides (equivalent to handling codons).

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> aln = LoadSeqs(data= [('seq1', 'ATGAAGGTG---'),
    ...                       ('seq2', 'ATGAAGGTGATG'),
    ...                       ('seq3', 'ATGAAGGNGATG')], moltype=DNA)

We now just define a one-line function that returns ``True`` if the passed data contains only nucleotide characters, ``False`` otherwise. The function works by converting the aligned column into a ``set`` and checking it is equal to, or a subset of, all nucleotides. This function, which works for nucleotides or codons, has the effect of eliminating the (nucleotide/trinucleotide) columns with the 'N' and '-' characters.

.. doctest::

    >>> just_nucs = lambda x: set(''.join(x)) <= set('ACGT')

We apply this to nucleotides,

.. doctest::

    >>> nucs = aln.filtered(just_nucs)
    >>> print nucs
    >seq1
    ATGAAGGG
    >seq2
    ATGAAGGG
    >seq3
    ATGAAGGG

We can also do this in a more longwinded but clearer fashion with a named multi-line function:

.. doctest::

    >>> def just_nucs(x, allowed = 'ACGT'):
    ...     for char in ''.join(x): # ensure char is a str with length 1
    ...         if not char in allowed:
    ...             return False
    ...     return True
    ...
    >>> nucs = aln.filtered(just_nucs)
    >>> nucs
    3 x 8 dna alignment: seq1[ATGAAGGG], seq2[ATGAAGGG], seq3[ATGAAGGG]
    >>> print nucs
    >seq1
    ATGAAGGG
    >seq2
    ATGAAGGG
    >seq3
    ATGAAGGG

The same filter can be applied to trinucleotides by setting ``motif_length=3``.

.. doctest::

    >>> trinucs = aln.filtered(just_nucs, motif_length=3)
    >>> print trinucs
    >seq1
    ATGAAG
    >seq2
    ATGAAG
    >seq3
    ATGAAG

Getting all variable positions from an alignment
++++++++++++++++++++++++++++++++++++++++++++++++

.. doctest::

    >>> from cogent import LoadSeqs
    >>> aln = LoadSeqs('data/long_testseqs.fasta')
    >>> just_variable_aln = aln.filtered(lambda x: len(set(x)) > 1)
    >>> print just_variable_aln[:10]
    >Human
    AAGCAAAACT
    >HowlerMon
    AAGCAAGACT
    >Mouse
    GGGCCCAGCT
    >NineBande
    AAATAAAACT
    >DogFaced
    AAACAAAATA

Getting all constant positions from an alignment
++++++++++++++++++++++++++++++++++++++++++++++++

.. doctest::

    >>> from cogent import LoadSeqs
    >>> aln = LoadSeqs('data/long_testseqs.fasta')
    >>> just_constant_aln = aln.filtered(lambda x: len(set(x)) == 1)
    >>> print just_constant_aln[:10]
    >Human
    TGTGGCACAA
    >HowlerMon
    TGTGGCACAA
    >Mouse
    TGTGGCACAA
    >NineBande
    TGTGGCACAA
    >DogFaced
    TGTGGCACAA

Getting all variable codons from an alignment
+++++++++++++++++++++++++++++++++++++++++++++

This is exactly the same as before, with a new keyword argument

.. doctest::

    >>> from cogent import LoadSeqs
    >>> aln = LoadSeqs('data/long_testseqs.fasta')
    >>> variable_codons = aln.filtered(lambda x: len(set(x)) > 1,
    ...                                motif_length=3)
    >>> print just_variable_aln[:9]
    >Human
    AAGCAAAAC
    >HowlerMon
    AAGCAAGAC
    >Mouse
    GGGCCCAGC
    >NineBande
    AAATAAAAC
    >DogFaced
    AAACAAAAT

Filtering sequences
"""""""""""""""""""

Extracting sequences by sequence identifier into a new alignment object
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

You can use ``takeSeqs`` to extract some sequences by sequence identifier from an alignment to a new alignment object:

..

.. doctest::

    >>> from cogent import LoadSeqs
    >>> aln = LoadSeqs('data/long_testseqs.fasta')
    >>> aln.takeSeqs(['Human','Mouse'])
    2 x 2532 text alignment: Human[TGTGGCACAAA...], Mouse[TGTGGCACAGA...]

Alternatively, you can extract only the sequences which are not specified by passing ``negate=True``:

.. doctest::

    >>> aln.takeSeqs(['Human','Mouse'],negate=True)
    3 x 2532 text alignment: NineBande[TGTGGCACAAA...], HowlerMon[TGTGGCACAAA...], DogFaced[TGTGGCACAAA...]

Extracting sequences using an arbitrary function into a new alignment object
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

You can use ``takeSeqsIf`` to extract sequences into a new alignment object based on whether an arbitrary function applied to the sequence evaluates to True. For example, to extract sequences which don't contain any N bases you could do the following:

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> aln = LoadSeqs(data= [('seq1', 'ATGAAGGTG---'),
    ...                       ('seq2', 'ATGAAGGTGATG'),
    ...                       ('seq3', 'ATGAAGGNGATG')], moltype=DNA)
    >>> def no_N_chars(s):
    ...     return 'N' not in s
    >>> aln.takeSeqsIf(no_N_chars)
    2 x 12 dna alignment: seq1[ATGAAGGTG--...], seq2[ATGAAGGTGAT...]

You can additionally get the sequences where the provided function evaluates to False:

.. doctest::

    >>> aln.takeSeqsIf(no_N_chars,negate=True)
    1 x 12 dna alignment: seq3[ATGAAGGNGAT...]

Computing alignment statistics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Computing motif probabilities from an alignment
"""""""""""""""""""""""""""""""""""""""""""""""

The method ``getMotifProbs`` of ``Alignment`` objects returns the probabilities for all motifs of a given length. For individual nucleotides:

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> aln = LoadSeqs('data/primate_cdx2_promoter.fasta', moltype=DNA)
    >>> motif_probs = aln.getMotifProbs()
    >>> print motif_probs
    {'A': 0.24...

For dinucleotides or longer, we need to pass in an ``Alphabet`` with the appropriate word length.
Here is an example with trinucleotides:

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> trinuc_alphabet = DNA.Alphabet.getWordAlphabet(3)
    >>> aln = LoadSeqs('data/primate_cdx2_promoter.fasta', moltype=DNA)
    >>> motif_probs = aln.getMotifProbs(alphabet=trinuc_alphabet)
    >>> for m in sorted(motif_probs, key=lambda x: motif_probs[x],
    ...                 reverse=True):
    ...     print m, motif_probs[m]
    ...
    CAG 0.0374581939799
    CCT 0.0341137123746
    CGC 0.0301003344482...

The same holds for other arbitrary alphabets, as long as they match the alignment ``MolType``.

Some calculations in cogent require that all motif probabilities be non-zero, in which case we use a pseudo-count. We illustrate this with a simple example where T is missing. Without a pseudo-count the frequency of T is 0.0; with a pseudo-count of 1e-6 the frequency of T will be slightly less than 1e-6.

.. doctest::

    >>> aln = LoadSeqs(data=[('a', 'AACAAC'),('b', 'AAGAAG')], moltype=DNA)
    >>> motif_probs = aln.getMotifProbs()
    >>> assert motif_probs['T'] == 0.0
    >>> motif_probs = aln.getMotifProbs(pseudocount=1e-6)
    >>> assert 0 < motif_probs['T'] <= 1e-6

It is important to notice that motif probabilities are computed by treating sequences as non-overlapping tuples. Below is a very simple pair of identical sequences where there are clearly 2 'AA' dinucleotides per sequence but only the first one is 'in-frame' (frame width = 2). We then create a dinucleotide ``Alphabet`` object and use this to get dinucleotide probabilities. These frequencies are determined by breaking each aligned sequence up into non-overlapping dinucleotides and then doing a count. The expected value for the 'AA' dinucleotide in this case will be 2/8 = 0.25.

..
doctest::

    >>> seqs = [('a', 'AACGTAAG'), ('b', 'AACGTAAG')]
    >>> aln = LoadSeqs(data=seqs, moltype=DNA)
    >>> dinuc_alphabet = DNA.Alphabet.getWordAlphabet(2)
    >>> motif_probs = aln.getMotifProbs(alphabet=dinuc_alphabet)
    >>> assert motif_probs['AA'] == 0.25

What about counting the total incidence of dinucleotides, including those not in-frame? A naive application of the Python string object's count method will not work as desired either, because it "returns the number of non-overlapping occurrences".

.. doctest::

    >>> seqs = [('my_seq', 'AAAGTAAG')]
    >>> aln = LoadSeqs(data=seqs, moltype=DNA)
    >>> my_seq = aln.getSeq('my_seq')
    >>> my_seq.count('AA')
    2
    >>> 'AAA'.count('AA')
    1
    >>> 'AAAA'.count('AA')
    2

To count all occurrences of a given dinucleotide in a DNA sequence, one could use a standard Python approach such as a list comprehension:

.. doctest::

    >>> from cogent import Sequence, DNA
    >>> seq = Sequence(moltype=DNA, seq='AAAGTAAG')
    >>> seq
    DnaSequence(AAAGTAAG)
    >>> di_nucs = [seq[i:i+2] for i in range(len(seq)-1)]
    >>> sum([nn == 'AA' for nn in di_nucs])
    3

Working with alignment gaps
"""""""""""""""""""""""""""

Filtering extracted columns for the gap character
+++++++++++++++++++++++++++++++++++++++++++++++++

.. doctest::

    >>> from cogent import LoadSeqs
    >>> aln = LoadSeqs('data/primate_cdx2_promoter.fasta')
    >>> col = aln[113:115].iterPositions()
    >>> c1, c2 = list(col)
    >>> c1, c2
    (['A', 'A', 'A'], ['T', '-', '-'])
    >>> filter(lambda x: x == '-', c1)
    []
    >>> filter(lambda x: x == '-', c2)
    ['-', '-']

Calculating the gap fraction
++++++++++++++++++++++++++++

.. doctest::

    >>> from cogent import LoadSeqs
    >>> aln = LoadSeqs('data/primate_cdx2_promoter.fasta')
    >>> for column in aln[113:150].iterPositions():
    ...     gaps = filter(lambda x: x == '-', column)
    ...     gap_fraction = len(gaps) * 1.0 / len(column)
    ...     print gap_fraction
    0.0
    0.666666666667
    0.0
    0.0...
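The per-column gap fraction computed above does not depend on anything PyCogent-specific; the same logic can be sketched in a few lines of plain Python. The helper name ``gap_fraction`` and the toy alignment are illustrative only, not part of the cogent API.

```python
def gap_fraction(column, gap='-'):
    # fraction of entries in one alignment column that are the gap character
    return sum(1 for char in column if char == gap) / float(len(column))

# a toy 3-sequence alignment; zip(*seqs) transposes rows into columns
seqs = ['ATG-A', 'AT--A', 'ATGCA']
fractions = [gap_fraction(col) for col in zip(*seqs)]
```

For this toy alignment the third and fourth columns contain one and two gaps respectively, so ``fractions`` is ``[0.0, 0.0, 0.33..., 0.66..., 0.0]``.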
Extracting maps of aligned to unaligned positions (i.e., gap maps)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

It's often important to know how an alignment position relates to a position in one or more of the sequences in the alignment. The ``gapMaps`` method of the individual sequences is useful for this. To get a map of sequence to alignment positions for a specific sequence in your alignment, do the following:

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> aln = LoadSeqs(data= [('seq1', 'ATGAAGG-TG--'),
    ...                       ('seq2', 'ATG-AGGTGATG'),
    ...                       ('seq3', 'ATGAAG--GATG')], moltype=DNA)
    >>> seq_to_aln_map = aln.getGappedSeq('seq1').gapMaps()[0]

It's now possible to look up positions in ``seq1`` and find out what they map to in the alignment:

.. doctest::

    >>> seq_to_aln_map[3]
    3
    >>> seq_to_aln_map[8]
    9

This tells us that position 3 in ``seq1`` corresponds to position 3 in ``aln``, and that position 8 in ``seq1`` corresponds to position 9 in ``aln``.

Notice that we grabbed the first result from the call to ``gapMaps``. This is the sequence position to alignment position map. The second value returned is the alignment position to sequence position map, so if you want to find out what sequence positions the alignment positions correspond to (as opposed to what alignment positions the sequence positions correspond to) for a given sequence, you would take the following steps:

.. doctest::

    >>> aln_to_seq_map = aln.getGappedSeq('seq1').gapMaps()[1]
    >>> aln_to_seq_map[3]
    3
    >>> aln_to_seq_map[8]
    7

If an alignment position is a gap, and therefore has no corresponding sequence position, you'll get a ``KeyError``.

.. doctest::

    >>> seq_pos = aln_to_seq_map[7]
    Traceback (most recent call last):
    KeyError: 7

.. note:: The first position in alignments and sequences is always numbered position 0.

Filtering alignments based on gaps
++++++++++++++++++++++++++++++++++

..
.. note:: An alternate, computationally faster, approach to removing gaps is to use the ``filtered`` method as discussed in :ref:`filter-positions`.

The ``omitGapRuns`` method can be applied to remove long stretches of gaps in an alignment. In the following example, we remove sequences that have more than two adjacent gaps anywhere in the aligned sequence.

.. doctest::

    >>> aln = LoadSeqs(data= [('seq1', 'ATGAA---TG-'),
    ...                       ('seq2', 'ATG-AGTGATG'),
    ...                       ('seq3', 'AT--AG-GATG')], moltype=DNA)
    >>> print aln.omitGapRuns(2).toFasta()
    >seq2
    ATG-AGTGATG
    >seq3
    AT--AG-GATG

If instead, we just wanted to remove positions from the alignment which are gaps in more than a certain percentage of the sequences, we could use the ``omitGapPositions`` function. For example:

.. doctest::

    >>> aln = LoadSeqs(data= [('seq1', 'ATGAA---TG-'),
    ...                       ('seq2', 'ATG-AGTGATG'),
    ...                       ('seq3', 'AT--AG-GATG')], moltype=DNA)
    >>> print aln.omitGapPositions(0.40).toFasta()
    >seq1
    ATGA--TG-
    >seq2
    ATGAGGATG
    >seq3
    AT-AGGATG

You'll notice that the 4th and 7th columns of the alignment have been removed because they contained 66% gaps -- more than the allowed 40%.

If you wanted to remove sequences which contain more than a certain percent gap characters, you could use the ``omitGapSeqs`` method. This is commonly applied to filter partial sequences from an alignment.

.. doctest::

    >>> aln = LoadSeqs(data= [('seq1', 'ATGAA------'),
    ...                       ('seq2', 'ATG-AGTGATG'),
    ...                       ('seq3', 'AT--AG-GATG')], moltype=DNA)
    >>> filtered_aln = aln.omitGapSeqs(0.50)
    >>> print filtered_aln.toFasta()
    >seq2
    ATG-AGTGATG
    >seq3
    AT--AG-GATG

Note that following this call to ``omitGapSeqs``, the 4th column of ``filtered_aln`` is 100% gaps.
This is generally not desirable, so a call to ``omitGapSeqs`` is frequently followed with a call to ``omitGapPositions`` with no parameters -- this defaults to removing positions which are all gaps:

.. doctest::

    >>> print filtered_aln.omitGapPositions().toFasta()
    >seq2
    ATGAGTGATG
    >seq3
    AT-AG-GATG

PyCogent-1.5.3/doc/cookbook/alphabet.rst

Alphabets
---------

.. authors Gavin Huttley

``Alphabet`` and ``MolType``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``MolType`` instances have an ``Alphabet``.

.. doctest::

    >>> from cogent import DNA, PROTEIN
    >>> print DNA.Alphabet
    ('T', 'C', 'A', 'G')
    >>> print PROTEIN.Alphabet
    ('A', 'C', 'D', 'E', ...

``Alphabet`` instances have a ``MolType``.

.. doctest::

    >>> PROTEIN.Alphabet.MolType == PROTEIN
    True

Creating tuple alphabets
^^^^^^^^^^^^^^^^^^^^^^^^

You can create a tuple alphabet of, for example, dinucleotides or trinucleotides.

.. doctest::

    >>> dinuc_alphabet = DNA.Alphabet.getWordAlphabet(2)
    >>> print dinuc_alphabet
    ('TT', 'CT', 'AT', 'GT', ...
    >>> trinuc_alphabet = DNA.Alphabet.getWordAlphabet(3)
    >>> print trinuc_alphabet
    ('TTT', 'CTT', 'ATT', ...

Convert a sequence into integers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> seq = 'TAGT'
    >>> indices = DNA.Alphabet.toIndices(seq)
    >>> indices
    [0, 2, 3, 0]

Convert integers to a sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> seq = DNA.Alphabet.fromIndices([0,2,3,0])
    >>> seq
    ['T', 'A', 'G', 'T']

or

.. doctest::

    >>> seq = DNA.Alphabet.fromOrdinalsToSequence([0,2,3,0])
    >>> seq
    DnaSequence(TAGT)

PyCogent-1.5.3/doc/cookbook/analysis_of_sequence_composition.rst

********************************
Analysis of sequence composition
********************************

.. sectionauthor:: Jesse Zaneveld

PyCogent provides several tools for analyzing the composition of DNA, RNA, or protein sequences.
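The statistics computed in this section have simple definitions, and it can help to keep a library-independent cross-check in mind before working through the cogent API. The helper names below (``gc_content``, ``positional_gc``) are illustrative sketches, not part of PyCogent.

```python
def gc_content(seq):
    # fraction of bases that are G or C
    seq = seq.upper()
    return sum(seq.count(base) for base in 'GC') / float(len(seq))

def positional_gc(coding_seq):
    # GC content at codon positions 1, 2 and 3 of an in-frame coding sequence;
    # coding_seq[start::3] picks out every base at one codon position
    return [gc_content(coding_seq[start::3]) for start in range(3)]
```

For the six-base example sequence used below, ``gc_content('GCGTTT')`` is 0.5, and the GC content at each of the three codon positions is also 0.5.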
Loading your sequence ===================== Let us say that we wish to study the sequence composition of the *Y. pseudotuberculosis* PB1 DNA Polymerase III beta subunit. First we input the sequence as a string. .. doctest:: >>> y_pseudo_seq = \ ... """ atgaaatttatcattgaacgtgagcatctgctaaaaccactgcaacaggtcagtagcccg ... ctgggtggacgccctacgttgcctattttgggtaacttgttgctgcaagtcacggaaggc ... tctttgcggctgaccggtaccgacttggagatggagatggtggcttgtgttgccttgtct ... cagtcccatgagccgggtgctaccacagtacccgcacggaagttttttgatatctggcgt ... ggtttacccgaaggggcggaaattacggtagcgttggatggtgatcgcctgctagtgcgc ... tctggtcgcagccgtttctcgctgtctaccttgcctgcgattgacttccctaatctggat ... gactggcagagtgaggttgaattcactttaccgcaggctacgttaaagcgtctgattgag ... tccactcagttttcgatggcccatcaggatgtccgttattatttgaacggcatgctgttt ... gagaccgaaggcgaagagttacgtactgtggcgaccgatgggcatcgcttggctgtatgc ... tcaatgcctattggccagacgttaccctcacattcggtgatcgtgccgcgtaaaggtgtg ... atggagctggttcggttgctggatggtggtgatacccccttgcggctgcaaattggcagt ... aataatattcgtgctcatgtgggcgattttattttcacatctaagctggttgatggccgt ... ttcccggattatcgccgcgtattgccgaagaatcctgataaaatgctggaagccggttgc ... gatttactgaaacaggcattttcgcgtgcggcaattctgtcaaatgagaagttccgtggt ... gttcggctctatgtcagccacaatcaactcaaaatcactgctaataatcctgaacaggaa ... gaagcagaagagatcctcgatgttagctacgaggggacagaaatggagatcggtttcaac ... gtcagctatgtgcttgatgtgctaaatgcactgaagtgcgaagatgtgcgcctgttattg ... actgactctgtatccagtgtgcagattgaagacagcgccagccaagctgcagcctatgtc ... gtcatgccaatgcgtttgtag""" To check that our results are reasonable, we can also load a small example string. .. doctest:: >>> example_seq = "GCGTTT" In order to calculate compositional statistics, we need to import one of the ``Usage`` objects from ``cogent.core.usage``, create an object from our string, and normalize the counts contained in the string into frequencies. ``Usage`` objects include ``BaseUsage``, ``PositionalBaseUsage``, ``CodonUsage``, and ``AminoAcidUsage``. Let us start with the ``BaseUsage`` object. 
The first few steps will be the same for the other Usage objects, however (as we will see below).

GC content
==========

Total GC content
----------------

GC content is one commonly used compositional statistic. To calculate the total GC content of our gene, we will need to initiate and normalize a ``BaseUsage`` object.

.. doctest::

    >>> from cogent.core.usage import BaseUsage
    >>> example_bu = BaseUsage(example_seq)
    >>> # Print raw counts
    >>> print example_bu.content("GC")
    3.0
    >>> example_bu.normalize()
    >>> print example_bu.content("GC")
    0.5

We can now visually verify that the reported GC contents are correct, and use the same technique on our full sequence.

.. doctest::

    >>> y_pseudo_bu = BaseUsage(y_pseudo_seq)
    >>> # Print raw counts
    >>> y_pseudo_bu.content("GC")
    555.0
    >>> y_pseudo_bu.normalize()
    >>> print y_pseudo_bu.content("GC")
    0.50408719346

Positional GC content of Codons
-------------------------------

When analyzing protein coding genes, it is often useful to subdivide the GC content by codon position. The 3rd codon position is of particular interest, since changes there are mostly synonymous and so better reflect mutational rather than selective pressure. ``CodonUsage`` objects allow us to calculate the GC content at each codon position. First, let us calculate the GC content for the codons in the example sequence as follows.

.. doctest::

    >>> # Import CodonUsage object
    >>> from cogent.core.usage import CodonUsage
    >>> # Initiate & normalize CodonUsage object
    >>> example_seq_cu = CodonUsage(example_seq)
    >>> example_seq_cu.normalize()
    >>> GC,P1,P2,P3 = example_seq_cu.positionalGC()

Here, GC is the overall GC content for the sequence, while P1, P2, and P3 are the GC content at the first, second, and third codon positions, respectively. Printing the results for the example gives the following.

.. doctest::

    >>> print "GC:", GC
    GC: 0.5
    >>> print "P1:", P1
    P1: 0.5
    >>> print "P2:", P2
    P2: 0.5
    >>> print "P3:", P3
    P3: 0.5

We can then do the same for our biological sequence.

..
doctest::

    >>> y_pseudo_cu = CodonUsage(y_pseudo_seq)
    >>> y_pseudo_cu.normalize()
    >>> y_pseudo_GC = y_pseudo_cu.positionalGC()
    >>> print y_pseudo_GC
    [0.51874999999999993, 0.5843749999999999, 0.4750000000000001, 0.49687499999999996]

These results could then be fed into downstream analyses. One important note is that ``CodonUsage`` objects calculate the GC content of codons within nucleotide sequences, rather than the full GC content. Therefore, ``BaseUsage`` rather than ``CodonUsage`` objects should be used for calculating the GC content of non-coding sequences.

Total Base Usage
================

A more detailed view of composition incorporates the relative counts or frequencies of all bases. We can calculate total base usage as follows.

.. doctest::

    >>> from cogent.core.usage import BaseUsage
    >>> example_bu = BaseUsage(example_seq)
    >>> # Print raw counts
    >>> for k in example_bu.RequiredKeys:
    ...     print k, example_bu[k]
    A 0.0
    C 1.0
    U 3.0
    G 2.0
    >>> example_bu.normalize()
    >>> for k in example_bu.RequiredKeys:
    ...     print k, example_bu[k]
    A 0.0
    C 0.166666666667
    U 0.5
    G 0.333333333333

Dinucleotide Content
====================

The ``DinucUsage`` object allows us to calculate dinucleotide usage for our sequence. Dinucleotide usage can be calculated using overlapping, non-overlapping, or '3-1' dinucleotides. Given the sequence "AATTAAGCC", each method counts dinucleotide usage differently:

* overlapping usage counts "AA", "AT", "TT", "TA", "AA", "AG", "GC", "CC"
* non-overlapping usage counts "AA", "TT", "AA", "GC"
* '3-1' usage counts "TT", "AG"

Calculating the GC content at the third and first codon positions ("3-1" usage) is useful for some applications, such as gene transfer detection, because changes at these positions tend to produce the most conservative amino acid substitutions, and thus are thought to better reflect mutational (rather than selective) pressure.
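A literal plain-Python rendering of the three counting schemes may make the differences concrete. The function and mode names below are illustrative, not the ``DinucUsage`` API, and the '3-1' branch assumes an in-frame sequence of complete codons.

```python
def dinucs(seq, mode='overlapping'):
    # dinucleotides of seq under each of the three counting schemes
    if mode == 'overlapping':
        # every adjacent pair of bases
        return [seq[i:i + 2] for i in range(len(seq) - 1)]
    if mode == 'non-overlapping':
        # pairs starting at offsets 0, 2, 4, ...
        return [seq[i:i + 2] for i in range(0, len(seq) - 1, 2)]
    if mode == '3-1':
        # 3rd base of each codon joined to the 1st base of the next codon
        return [seq[i + 2] + seq[i + 3] for i in range(0, len(seq) - 3, 3)]
    raise ValueError('unknown mode: %s' % mode)
```

Applied to ``'AATTAAGCC'`` (codons AAT, TAA, GCC), the third-plus-first definition pairs the T of AAT with the T of TAA, and the A of TAA with the G of GCC, giving the dinucleotides 'TT' and 'AG'.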
Overlapping dinucleotide content -------------------------------- To calculate overlapping dinucleotide usage for our *Y. pseudotuberculosis* PB1 sequence. .. doctest:: >>> from cogent.core.usage import DinucUsage >>> du = DinucUsage(y_pseudo_seq, Overlapping=True) >>> du.normalize() We can inspect individual dinucleotide usages and confirm that the results add to 100% as follows .. doctest:: >>> total = 0.0 >>> for k in du.RequiredKeys: ... print k, du[k] ... total += du[k] UU 0.0757855822551 UC 0.0517560073937 UA 0.043438077634 UG 0.103512014787 CU 0.0619223659889 CC 0.0517560073937 CA 0.0517560073937 CG 0.0573012939002 AU 0.0674676524954 AC 0.043438077634 AA 0.0573012939002 AG 0.054528650647 GU 0.0711645101664 GC 0.0794824399261 GA 0.0674676524954 GG 0.0619223659889 >>> print "Total:",total Total: 1.0 Non-overlapping Dinucleotide Content ------------------------------------ To calculate non-overlapping dinucleotide usage we simply change the ``Overlapping`` parameter to ``False`` when initiating the ``DinucUsage`` object. .. doctest:: >>> from cogent.core.usage import DinucUsage >>> du_no = DinucUsage(y_pseudo_seq, Overlapping=False) >>> du_no.normalize() >>> total = 0 >>> for k in du_no.RequiredKeys: ... print k, du_no[k] ... total += du_no[k] UU 0.0733082706767 UC 0.0507518796992 UA 0.0375939849624 UG 0.105263157895 CU 0.0733082706767 CC 0.046992481203 CA 0.0394736842105 CG 0.0601503759398 AU 0.0751879699248 AC 0.046992481203 AA 0.062030075188 AG 0.0545112781955 GU 0.0601503759398 GC 0.0845864661654 GA 0.0676691729323 GG 0.062030075188 >>> print "Total:",total Total: 1.0 '3-1' Dinucleotide Content -------------------------- To calculate dinucleotide usage considering only adjacent first and third codon positions, we set the Overlapping parameter to '3-1' when constructing our ``DinucUsage`` object .. 
doctest::

    >>> from cogent.core.usage import DinucUsage
    >>> du_3_1 = DinucUsage(y_pseudo_seq, Overlapping='3-1')
    >>> du_3_1.normalize()
    >>> total = 0
    >>> for k in du_3_1.RequiredKeys:
    ...     print k, du_3_1[k]
    ...     total += du_3_1[k]
    UU 0.0720221606648
    UC 0.0664819944598
    UA 0.0360110803324
    UG 0.0914127423823
    CU 0.0387811634349
    CC 0.0415512465374
    CA 0.0554016620499
    CG 0.0554016620499
    AU 0.0498614958449
    AC 0.0470914127424
    AA 0.0664819944598
    AG 0.0747922437673
    GU 0.0886426592798
    GC 0.0886426592798
    GA 0.0609418282548
    GG 0.0664819944598
    >>> print "Total:",total
    Total: 1.0

Comparing dinucleotide usages
-----------------------------

Above, we noted that there are several ways to calculate dinucleotide usages on a single sequence, and that the choice of method changes the reported frequencies somewhat. How could we quantify the effect this choice makes on the result? One way is to calculate the Euclidean distance between the resulting frequencies, using the ``DinucUsage`` object's ``distance`` method.

.. doctest::

    >>> du_vs_du_3_1_dist = du.distance(du_3_1)

As required of a true distance, the results are independent of the direction of the calculation.

.. doctest::

    >>> du_3_1_vs_du_dist = du_3_1.distance(du)
    >>> print du_3_1_vs_du_dist == du_vs_du_3_1_dist
    True

Caution regarding unnormalized distances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note that in this case we have already called ``du.normalize()`` on each ``DinucUsage`` object. You MUST call ``du.normalize()`` before calculating distances. Otherwise the distance calculated will be for the dinucleotide counts, rather than frequencies. Distances of counts can be non-zero even for sequences with identical dinucleotide usage, if those sequences are of different lengths.

k-words
-------

*To be written.*

Codon usage analyses
====================

In addition to allowing a more detailed examination of GC content in coding sequences, ``CodonUsage`` objects (as the name implies) let us examine the codon usage of our sequence.
.. doctest::

    >>> from cogent.core.usage import CodonUsage
    >>> y_pseudo_cu = CodonUsage(y_pseudo_seq)
    >>> # Print raw counts
    >>> for k in y_pseudo_cu.RequiredKeys:
    ...     print k, y_pseudo_cu[k]
    UUU 8.0
    UUC 4.0
    UUA 5.0
    UUG 14.0
    UCU 4.0
    UCC 3.0
    UCA 5.0
    UCG 3.0
    UAU 8.0...

Note that before normalization the ``CodonUsage`` object holds raw counts. However, for most purposes, we will want frequencies, so we normalize the counts.

.. doctest::

    >>> y_pseudo_cu.normalize()
    >>> # Print normalized frequencies
    >>> for k in y_pseudo_cu.RequiredKeys:
    ...     print k, y_pseudo_cu[k]
    UUU 0.0225988700565
    UUC 0.0112994350282
    UUA 0.0141242937853
    UUG 0.0395480225989
    UCU 0.0112994350282
    UCC 0.00847457627119
    UCA 0.0141242937853
    UCG 0.00847457627119
    UAU 0.0225988700565...

Relative Synonymous Codon Usage
-------------------------------

The RSCU or relative synonymous codon usage metric divides the frequency of each codon by the total frequency of all codons encoding the same amino acid.

.. doctest::

    >>> y_pseudo_cu.normalize()
    >>> y_pseudo_rscu = y_pseudo_cu.rscu()
    >>> # Print rscu frequencies
    >>> for k in y_pseudo_rscu.keys():
    ...     print k, y_pseudo_rscu[k]
    ACC 0.263157894737
    GUC 0.238095238095
    ACA 0.210526315789
    ACG 0.263157894737
    AAC 0.4
    CCU 0.315789473684
    UGG 1.0
    AUC 0.266666666667
    GUA 0.190476190476...

PR2 bias
--------

*To be written.*

Fingerprint analysis
--------------------

*To be written.*

Amino Acid Usage
================

*To be written.*

Profiles
========

*To be written.*

Visualisation
=============

*To be written.*

PyCogent-1.5.3/doc/cookbook/annotations.rst

Annotations
^^^^^^^^^^^

.. Gavin Huttley, Tom Elliot

Annotations with coordinates
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more extensive documentation about annotations see :ref:`seq-annotations`.
Automated introduction from reading genbank files """"""""""""""""""""""""""""""""""""""""""""""""" We load a sample genbank file with plenty of features and grab the CDS features. .. doctest:: >>> from cogent.parse.genbank import RichGenbankParser >>> parser = RichGenbankParser(open('data/ST_genome_part.gb')) >>> for accession, seq in parser: ... print accession ... AE006468 >>> cds = seq.getAnnotationsMatching('CDS') >>> print cds [CDS "thrL" at [189:255]/10020, CDS "thrA" at ... Creating directly on a sequence """"""""""""""""""""""""""""""" .. doctest:: >>> from cogent import DNA >>> from cogent.core.annotation import Feature >>> s1 = DNA.makeSequence("AAGAAGAAGACCCCCAAAAAAAAAA"\ ... "TTTTTTTTTTAAAAAGGGAACCCT", ... Name="seq1") ... >>> print s1[10:15] # this will be exon 1 CCCCC >>> print s1[30:40] # this will be exon 2 TTTTTAAAAA >>> print s1[45:48] # this will be exon 3 CCC >>> s2 = DNA.makeSequence("CGAAACGTTT", Name="seq2") >>> s3 = DNA.makeSequence("CGAAACGTTT", Name="seq3") Via """ ``addAnnotation`` +++++++++++++++++ .. doctest:: >>> from cogent import DNA >>> from cogent.core.annotation import Feature >>> s1 = DNA.makeSequence("AAGAAGAAGACCCCCAAAAAAAAAA"\ ... "TTTTTTTTTTAAAAAGGGAACCCT", ... Name="seq1") ... >>> exon1 = s1.addAnnotation(Feature, 'exon', 'A', [(10,15)]) >>> exon2 = s1.addAnnotation(Feature, 'exon', 'B', [(30,40)]) ``addFeature`` ++++++++++++++ .. doctest:: >>> from cogent import DNA >>> s1 = DNA.makeSequence("AAGAAGAAGACCCCCAAAAAAAAAA"\ ... "TTTTTTTTTTAAAAAGGGAACCCT", ... Name="seq1") ... >>> exon3 = s1.addFeature('exon', 'C', [(45, 48)]) *There are other annotation types.* Adding as a series or item-wise """"""""""""""""""""""""""""""" .. 
doctest:: >>> from cogent import DNA >>> s2 = DNA.makeSequence("CGAAACGTTT", Name="seq2") >>> cpgs_series = s2.addFeature('cpgsite', 'cpg', [(0,2), (5,7)]) >>> s3 = DNA.makeSequence("CGAAACGTTT", Name="seq3") >>> cpg1 = s3.addFeature('cpgsite', 'cpg', [(0,2)]) >>> cpg2 = s3.addFeature('cpgsite', 'cpg', [(5,7)]) Taking the union of annotations """"""""""""""""""""""""""""""" Construct a pseudo-feature (``cds``) that's a union of other features (``exon1``, ``exon2``, ``exon3``). .. doctest:: >>> from cogent import DNA >>> s1 = DNA.makeSequence("AAGAAGAAGACCCCCAAAAAAAAAA"\ ... "TTTTTTTTTTAAAAAGGGAACCCT", ... Name="seq1") ... >>> exon1 = s1.addFeature('exon', 'A', [(10,15)]) >>> exon2 = s1.addFeature('exon', 'B', [(30,40)]) >>> exon3 = s1.addFeature('exon', 'C', [(45, 48)]) >>> cds = s1.getRegionCoveringAll([exon1, exon2, exon3]) Getting annotation coordinates """""""""""""""""""""""""""""" These are useful for doing custom things, e.g. you could construct intron features using the below. .. doctest:: >>> cds.getCoordinates() [(10, 15), (30, 40), (45, 48)] Annotations have shadows """""""""""""""""""""""" A shadow is a span representing everything but the annotation. .. doctest:: >>> not_cds = cds.getShadow() >>> not_cds region "not exon" at [0:10, 15:30, 40:45, 48:49]/49 Compare to the coordinates of the original. .. doctest:: >>> cds region "exon" at [10:15, 30:40, 45:48]/49 Adding to a sequence member of an alignment """"""""""""""""""""""""""""""""""""""""""" The following annotation is directly applied onto the sequence and so is in ungapped sequence coordinates. .. doctest:: >>> from cogent import LoadSeqs >>> aln1 = LoadSeqs(data=[['x','-AAACCCCCA'], ... ['y','TTTT--TTTT']]) >>> seq_exon = aln1.getSeq('x').addFeature('exon', 'A', [(3,8)]) Adding to an alignment """""""""""""""""""""" We add an annotation directly onto an alignment. In this example we add a ``Variable`` that can be displayed as a red line on a drawing. 
The resulting annotation (``red_data`` here) is in **alignment coordinates**! .. doctest:: >>> from cogent.core.annotation import Variable >>> red_data = aln1.addAnnotation(Variable, 'redline', 'align', ... [((0,15),1),((15,30),2),((30,45),3)]) ... Slicing sequences and alignments by annotations """"""""""""""""""""""""""""""""""""""""""""""" By a feature or coordinates returns same sequence span .. doctest:: >>> from cogent import DNA >>> s1 = DNA.makeSequence("AAGAAGAAGACCCCCAAAAAAAAAA"\ ... "TTTTTTTTTTAAAAAGGGAACCCT", ... Name="seq1") ... >>> exon1 = s1.addFeature('exon', 'A', [(10,15)]) >>> exon2 = s1.addFeature('exon', 'B', [(30,40)]) >>> s1[exon1] DnaSequence(CCCCC) >>> s1[10:15] DnaSequence(CCCCC) Using the annotation object ``getSlice`` method returns the same thing. .. doctest:: >>> s1[exon2] DnaSequence(TTTTTAAAAA) >>> exon2.getSlice() DnaSequence(TTTTTAAAAA) Slicing by pseudo-feature or feature series """"""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import DNA >>> s1 = DNA.makeSequence("AAGAAGAAGACCCCCAAAAAAAAAA"\ ... "TTTTTTTTTTAAAAAGGGAACCCT", ... Name="seq1") ... >>> exon1 = s1.addFeature('exon', 'A', [(10,15)]) >>> exon2 = s1.addFeature('exon', 'B', [(30,40)]) >>> exon3 = s1.addFeature('exon', 'C', [(45, 48)]) >>> cds = s1.getRegionCoveringAll([exon1, exon2, exon3]) >>> print s1[cds] CCCCCTTTTTAAAAACCC >>> print s1[exon1, exon2, exon3] CCCCCTTTTTAAAAACCC .. warning:: Slices are applied in order! .. doctest:: >>> print s1 AAGAAGAAGACCCCCAAAAAAAAAATTTTTTTTTTAAAAAGGGAACCCT >>> print s1[exon1, exon2, exon3] CCCCCTTTTTAAAAACCC >>> print s1[exon2] TTTTTAAAAA >>> print s1[exon3] CCC >>> print s1[exon1, exon3, exon2] CCCCCCCCTTTTTAAAAA Slice series must not be overlapping """""""""""""""""""""""""""""""""""" .. doctest:: >>> s1[1:10, 9:15] Traceback (most recent call last): ValueError: Uninvertable. Overlap: 9 < 10 >>> s1[exon1, exon1] Traceback (most recent call last): ValueError: Uninvertable. 
Overlap: 10 < 15 But ``getRegionCoveringAll`` resolves this, ensuring no overlaps. .. doctest:: >>> print s1.getRegionCoveringAll([exon3, exon3]).getSlice() CCC You can slice an annotation itself """""""""""""""""""""""""""""""""" .. doctest:: >>> print s1[exon2] TTTTTAAAAA >>> ex2_start = exon2[0:3] >>> print s1[ex2_start] TTT >>> ex2_end = exon2[-3:] >>> print s1[ex2_end] AAA Sequence vs Alignment slicing """"""""""""""""""""""""""""" You can't slice an alignment using an annotation from a sequence. .. doctest:: >>> aln1[seq_exon] Traceback (most recent call last): ValueError: Can't map exon "A" at [3:8]/9 onto 2 x 10 text alignment: x[-AAACCCCCA], y[TTTT--TTTT] via [] Copying annotations """"""""""""""""""" You can copy annotations onto sequences with the same name, even if the length differs .. doctest:: >>> aln2 = LoadSeqs(data=[['x', '-AAAAAAAAA'], ['y', 'TTTT--TTTT']]) >>> seq = DNA.makeSequence('CCCCCCCCCCCCCCCCCCCC', 'x') >>> match_exon = seq.addFeature('exon', 'A', [(3,8)]) >>> aln2.getSeq('x').copyAnnotations(seq) >>> copied = list(aln2.getAnnotationsFromSequence('x', 'exon')) >>> copied [exon "A" at [4:9]/10] but if the feature lies outside the sequence being copied to, you get a lost span .. doctest:: >>> aln2 = LoadSeqs(data=[['x', '-AAAA'], ['y', 'TTTTT']]) >>> seq = DNA.makeSequence('CCCCCCCCCCCCCCCCCCCC', 'x') >>> match_exon = seq.addFeature('exon', 'A', [(5,8)]) >>> aln2.getSeq('x').copyAnnotations(seq) >>> copied = list(aln2.getAnnotationsFromSequence('x', 'exon')) >>> copied [exon "A" at [5:5, -4-]/5] >>> copied[0].getSlice() 2 x 4 text alignment: x[----], y[----] You can copy to a sequence with a different name, in a different alignment if the feature lies within the length .. 
doctest:: >>> # new test >>> aln2 = LoadSeqs(data=[['x', '-AAAAAAAAA'], ['y', 'TTTT--TTTT']]) >>> seq = DNA.makeSequence('CCCCCCCCCCCCCCCCCCCC', 'x') >>> match_exon = seq.addFeature('exon', 'A', [(5,8)]) >>> aln2.getSeq('y').copyAnnotations(seq) >>> copied = list(aln2.getAnnotationsFromSequence('y', 'exon')) >>> copied [exon "A" at [7:10]/10] If the sequence is shorter, again you get a lost span. .. doctest:: >>> aln2 = LoadSeqs(data=[['x', '-AAAAAAAAA'], ['y', 'TTTT--TTTT']]) >>> diff_len_seq = DNA.makeSequence('CCCCCCCCCCCCCCCCCCCCCCCCCCCC', 'x') >>> nonmatch = diff_len_seq.addFeature('repeat', 'A', [(12,14)]) >>> aln2.getSeq('y').copyAnnotations(diff_len_seq) >>> copied = list(aln2.getAnnotationsFromSequence('y', 'repeat')) >>> copied [repeat "A" at [10:10, -6-]/10] Querying """""""" You need to get a corresponding annotation projected into alignment coordinates via a query. .. doctest:: >>> aln_exon = aln1.getAnnotationsFromAnySequence('exon') >>> print aln1[aln_exon] >x CCCCC >y --TTT Querying produces objects only valid for their source """"""""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> cpgsite2 = s2.getAnnotationsMatching('cpgsite') >>> print s2[cpgsite2] CGCG >>> cpgsite3 = s3.getAnnotationsMatching('cpgsite') >>> s2[cpgsite3] Traceback (most recent call last): ValueError: Can't map cpgsite "cpg" at [0:2]/10 onto DnaSequence(CGAAACGTTT) via [] Querying for absent annotation """""""""""""""""""""""""""""" You get back an empty list, and slicing with this returns an empty sequence. .. doctest:: >>> # this test is new >>> dont_exist = s2.getAnnotationsMatching('dont_exist') >>> dont_exist [] >>> s2[dont_exist] DnaSequence() Querying features that span gaps in alignments """""""""""""""""""""""""""""""""""""""""""""" If you query for a feature from a sequence, it's alignment coordinates may be discontinuous. .. 
doctest:: >>> aln3 = LoadSeqs(data=[['x', 'C-CCCAAAAA'], ['y', '-T----TTTT']]) >>> exon = aln3.getSeq('x').addFeature('exon', 'ex1', [(0,4)]) >>> print exon.getSlice() CCCC >>> aln_exons = list(aln3.getAnnotationsFromSequence('x', 'exon')) >>> print aln_exons [exon "ex1" at [0:1, 2:5]/10] >>> print aln3[aln_exons] >x CCCC >y ---- .. note:: The ``T`` opposite the gap is missing since this approach only returns positions directly corresponding to the feature. ``asOneSpan`` unifies features with discontinuous alignment coordinates """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" To get positions spanned by a feature, including gaps, use ``asOneSpan``. .. doctest:: >>> unified = aln_exons[0].asOneSpan() >>> print aln3[unified] >x C-CCC >y -T--- Behaviour of annotations on nucleic acid sequences """""""""""""""""""""""""""""""""""""""""""""""""" Reverse complementing a sequence **does not** reverse annotations, that is they retain the reference to the frame for which they were defined. .. doctest:: >>> plus = DNA.makeSequence("CCCCCAAAAAAAAAATTTTTTTTTTAAAGG") >>> plus_rpt = plus.addFeature('blah', 'a', [(5,15), (25, 28)]) >>> print plus[plus_rpt] AAAAAAAAAAAAA >>> minus = plus.rc() >>> print minus CCTTTAAAAAAAAAATTTTTTTTTTGGGGG >>> minus_rpt = minus.getAnnotationsMatching('blah') >>> print minus[minus_rpt] AAAAAAAAAAAAA Masking annotated regions """"""""""""""""""""""""" We mask the CDS regions. .. doctest:: >>> from cogent.parse.genbank import RichGenbankParser >>> parser = RichGenbankParser(open('data/ST_genome_part.gb')) >>> seq = [seq for accession, seq in parser][0] >>> no_cds = seq.withMaskedAnnotations('CDS') >>> print no_cds[150:400] CAAGACAGACAAATAAAAATGACAGAGTACACAACATCC?????????... The above sequence could then have positions filtered so no position with the ambiguous character '?' was present. Masking annotated regions on alignments """"""""""""""""""""""""""""""""""""""" We mask exon's on an alignment. .. 
doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs(data=[['x', 'C-CCCAAAAAGGGAA'], ... ['y', '-T----TTTTG-GTT']], moltype=DNA) >>> exon = aln.getSeq('x').addFeature('exon', 'norwegian', [(0,4)]) >>> print aln.withMaskedAnnotations('exon', mask_char='?') >x ?-???AAAAAGGGAA >y -T----TTTTG-GTT These also persist through reverse complement operations. .. doctest:: >>> rc = aln.rc() >>> print rc >x TTCCCTTTTTGGG-G >y AAC-CAAAA----A- >>> print rc.withMaskedAnnotations('exon', mask_char='?') >x TTCCCTTTTT???-? >y AAC-CAAAA----A- You can take mask of the shadow """"""""""""""""""""""""""""""" .. doctest:: >>> from cogent import DNA >>> s = DNA.makeSequence('CCCCAAAAAGGGAA', 'x') >>> exon = s.addFeature('exon', 'norwegian', [(0,4)]) >>> rpt = s.addFeature('repeat', 'norwegian', [(9, 12)]) >>> rc = s.rc() >>> print s.withMaskedAnnotations('exon', shadow=True) CCCC?????????? >>> print rc.withMaskedAnnotations('exon', shadow=True) ??????????GGGG >>> print s.withMaskedAnnotations(['exon', 'repeat'], shadow=True) CCCC?????GGG?? >>> print rc.withMaskedAnnotations(['exon', 'repeat'], shadow=True) ??CCC?????GGGG What features of a certain type are available? """""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import DNA >>> s = DNA.makeSequence('ATGACCCTGTAAAAAATGTGTTAACCC', ... Name='a') >>> cds1 = s.addFeature('cds','cds1', [(0,12)]) >>> cds2 = s.addFeature('cds','cds2', [(15,24)]) >>> all_cds = s.getAnnotationsMatching('cds') >>> all_cds [cds "cds1" at [0:12]/27, cds "cds2" at [15:24]/27] Getting all features of a type, or everything but that type """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" The annotation methods ``getRegionCoveringAll`` and ``getShadow`` can be used to grab all the coding sequences or non-coding sequences in a ``DnaSequence`` object. .. 
doctest:: >>> from cogent.parse.genbank import RichGenbankParser >>> parser = RichGenbankParser(open('data/ST_genome_part.gb')) >>> seq = [seq for accession, seq in parser][0] >>> all_cds = seq.getAnnotationsMatching('CDS') >>> coding_seqs = seq.getRegionCoveringAll(all_cds) >>> coding_seqs region "CDS" at [189:255, 336:2799, 2800:3730, 3733... >>> coding_seqs.getSlice() DnaSequence(ATGAACC... 9063) >>> noncoding_seqs = coding_seqs.getShadow() >>> noncoding_seqs region "not CDS" at [0:189, 255:336, 2799:2800, ... >>> noncoding_seqs.getSlice() DnaSequence(AGAGATT... 957) Getting sequence features when you have an alignment object """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" Sequence features can be accessed via a containing ``Alignment``. .. doctest:: >>> from cogent import LoadSeqs >>> aln = LoadSeqs(data=[['x','-AAAAAAAAA'], ['y','TTTT--TTTT']]) >>> print aln >x -AAAAAAAAA >y TTTT--TTTT >>> exon = aln.getSeq('x').addFeature('exon', '1', [(3,8)]) >>> aln_exons = aln.getAnnotationsFromSequence('x', 'exon') >>> aln_exons = aln.getAnnotationsFromAnySequence('exon') >>> aln_exons [exon "1" at [4:9]/10] Annotation display on sequences """"""""""""""""""""""""""""""" We can display annotations on sequences, writing to file. .. note:: This requires `matplotlib `_ be installed. We first make a sequence and add some annotations. .. doctest:: >>> from cogent import DNA >>> seq = DNA.makeSequence('aaaccggttt' * 10) >>> v = seq.addFeature('exon', 'exon', [(20,35)]) >>> v = seq.addFeature('repeat_unit', 'repeat_unit', [(39,49)]) >>> v = seq.addFeature('repeat_unit', 'rep2', [(49,60)]) We then make a ``Display`` instance and write to file. This will use standard feature policy for colouring and shape of feature types. .. doctest:: >>> from cogent.draw.linear import Display >>> seq_display = Display(seq, colour_sequences=True) >>> fig = seq_display.makeFigure() >>> fig.savefig('annotated_1.png') Annotation display on alignments """""""""""""""""""""""""""""""" .. 
doctest:: >>> from cogent import DNA, LoadSeqs >>> from cogent.core.annotation import Variable >>> from cogent.draw.linear import Display >>> aln = LoadSeqs('data/primate_cdx2_promoter.fasta', moltype=DNA)[:150] >>> annot = aln.addAnnotation(Variable, 'redline', 'align', ... [((0,15),1),((15,30),2),((30,45),3)]) >>> annot = aln.addAnnotation(Variable, 'blueline', 'align', ... [((0,15),1.5),((15,30),2.5),((30,45),3.5)]) >>> align_display = Display(aln, colour_sequences=True) >>> fig = align_display.makeFigure(width=25, left=1, right=1) >>> fig.savefig('annotated_2.png') Annotation display of a custom variable """"""""""""""""""""""""""""""""""""""" We just show a series of spans. .. doctest:: >>> from cogent import DNA >>> from cogent.draw.linear import Display >>> from cogent.core.annotation import Variable >>> seq = DNA.makeSequence('aaaccggttt' * 10) >>> annot = seq.addAnnotation(Variable, 'redline', 'align', ... [((0,15),1),((15,30),2),((30,45),3)]) ... >>> seq_display = Display(seq, colour_sequences=True) >>> fig = seq_display.makeFigure() >>> fig.savefig('annotated_3.png') Generic metadata ^^^^^^^^^^^^^^^^ *To be written.* Info object """"""""""" *To be written.* .. following cleans up files .. doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files(['annotated_%d.png' % i for i in range(1,4)], ... error_on_missing=False) PyCogent-1.5.3/doc/cookbook/blast.rst000644 000765 000024 00000035044 11572500652 020463 0ustar00jrideoutstaff000000 000000 .. _blast-usage: ***************** Controlling BLAST ***************** .. 
authors, Gavin Huttley, Tom Elliott, Jeremy Widmann Preliminaries ------------- In order to run BLAST locally (from a program running on your computer) you will need to do three things: - Download the BLAST "executables" from NCBI - Make sure these programs are available on your ``PATH`` - Construct and format a database to search against - Tested on version 2.2.13 NCBI has recently changed the BLAST programs, and as yet PyCogent does not support the new versions. The "legacy" versions are available from `here `_ (login as guest). Detailed installation instructions are beyond the scope of this example, but are available at `NCBI's website `_ . After downloading the programs and setting up your ``PATH``, you should test BLAST by doing this from the command line: :: $ blastall --help Which should give this: :: blastall 2.2.22 arguments: -p Program Name [String] -d Database [String] default = nr... The file ``refseqs.fasta`` contains some short sequences for use in the following examples. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> seqs = LoadSeqs('data/refseqs.fasta', ... moltype=DNA, aligned=False) >>> for seq in seqs.Seqs: ... print seq ... CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC TGCAGCTTGAGCCACAGGAGAGAGAGAGCTTC TGCAGCTTGAGCCACAGGAGAGAGCCTTC TGCAGCTTGAGCCACAGGAGAGAGAGAGCTTC ACCGATGAGATATTAGCACAGGGGAATTAGAACCA TGTCGAGAGTGAGATGAGATGAGAACA ACGTATTTTAATTTGGCATGGT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT CCAGAGCGAGTGAGATAGACACCCAC These sequences can be formatted as a database by running the ``formatdb`` program contained in the NCBI download (this assumes that ``formatdb`` is on your ``PATH``) .. doctest:: >>> from cogent.app.formatdb import build_blast_db_from_fasta_path >>> result = build_blast_db_from_fasta_path('data/refseqs.fasta') >>> result[0] 'data/refseqs.fasta'... The function ``build_blast_db_from_fasta_path()`` returns a tuple containing the path to the database, and a list of paths for all the new files written by ``formatdb``. 
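Before turning to PyCogent's parsers, it helps to see how simple the ``-m 9`` format is: comment lines start with ``#``, and the ``# Fields:`` comment names the tab-separated columns of the hit lines that follow. A minimal standalone sketch (not PyCogent's ``MinimalBlastParser9``; the sample output is abridged from the example later in this section):

```python
def parse_blast9(lines):
    """Minimal parser for BLAST -m 9 (tabular with comment lines) output.
    Yields (fields, hits) per query: fields taken from the '# Fields:'
    comment, hits as lists of column strings."""
    fields, hits = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            if line.startswith("# Fields:"):
                if hits:  # a new query block starts; emit the previous one
                    yield fields, hits
                    hits = []
                fields = [f.strip() for f in line[len("# Fields:"):].split(",")]
        elif line:
            hits.append(line.split("\t"))
    if hits:
        yield fields, hits

# Abridged sample of -m 9 output, as shown later in this section:
sample = """# BLASTN 2.2.22 [Sep-27-2009]
# Query: 1
# Database: data/refseqs.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
1\ts2\t89.66\t29\t2\t1\t1\t28\t1\t29\t6e-05\t26.3
""".splitlines()

for fields, hits in parse_blast9(sample):
    print(dict(zip(fields, hits[0])))
```

The real parsers below do essentially this, plus handling of multiple query blocks and metadata.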
The final requirement is to know the path to the ``data`` directory that comes with your BLAST download. This directory contains substitution matrices and other files. BLAST with text output ---------------------- In this example, we load a DNA sequence from a file in the data directory and BLAST against our formatted database. The parameters dictionary contains two flagged arguments: -p for the program to use, and -m for the type of output we want. '-m':'9' requests "tabular with comment lines." See ``blastall --help`` for more details. Also, the application controller is set up to require a path to the data directory even though we don't use a substitution matrix for DNA. Here we can just pass any string. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> from cogent.app.blast import blast_seqs, Blastall >>> seqs = LoadSeqs('data/inseqs.fasta', moltype=DNA, aligned=False) >>> seq = seqs.getSeq('s2_like_seq') >>> seq DnaSequence(TGCAGCT... 28) >>> params={'-p':'blastn','-m':'9'} >>> result = blast_seqs([seq], ... Blastall, ... blast_db = 'data/refseqs.fasta', ... blast_mat_root = 'xxx', ... params = params) >>> data = result['StdOut'].read() >>> print data.split('\n')[:1] ['# BLASTN 2.2... We save the results for further processing .. doctest:: >>> outfile = open('data/blast_test.txt','w') >>> outfile.write(data) >>> outfile.close() The simplest way to organize the results is to use a parser. The BLAST parsers operate on a file object. .. doctest:: >>> from cogent.parse.blast import MinimalBlastParser9 >>> blastfile = open('data/blast_test.txt', 'r') >>> blast_results = MinimalBlastParser9(blastfile) >>> type(blast_results) >>> for result in blast_results: ... print result ... ({'QUERY': '1', 'FIELDS': 'Query id... The results include one item for each query sequence. Each result consists of a tuple whose first item is a dictionary of metadata. The second item is a list of hits. For example, you might do this .. 
doctest:: >>> from cogent.parse.blast import MinimalBlastParser9 >>> blastfile = open('data/blast_test.txt', 'r') >>> blast_results = MinimalBlastParser9(blastfile) >>> for result in blast_results: ... meta_data, hit_list = result ... fields = meta_data['FIELDS'].split(',') ... for key, value in zip(fields, hit_list[0]): ... print key.strip().ljust(20), value Query id 1 Subject id s2 % identity 89.66 alignment length 29 mismatches 2 gap openings 1 q. start 1 q. end 28 s. start 1 s. end 29 e-value 6e-05 bit score 26.3 BLAST with XML output --------------------- In this example, we load a DNA sequence from a file in the data directory and BLAST against our formatted database as above. NCBI recommends that you use XML as the output for BLAST. (They reserve the right to change the format for other output types). XML is the output when we pass '-m':'7'. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> from cogent.app.blast import blast_seqs, Blastall >>> seqs = LoadSeqs('data/inseqs.fasta', moltype=DNA, aligned=False) >>> seq = seqs.getSeq('s2_like_seq') >>> params={'-p':'blastn','-m':'7'} >>> result = blast_seqs([seq], ... Blastall, ... blast_db = 'data/refseqs.fasta', ... blast_mat_root = 'xxx', ... params = params) >>> data = result['StdOut'].read() >>> outfile = open('data/blast_test.xml','w') >>> outfile.write(data) >>> outfile.close() One nice thing about this format is that it includes the alignment. The organization of the results from this parser is slightly different. Each result is still a tuple, but the first item of the tuple is metadata about the BLAST settings (``meta_meta_data``). The keys for the fields in the output are contained as the first element in the list that is the second item in the result tuple. .. doctest:: >>> from cogent.parse.blast_xml import MinimalBlastParser7 >>> blastfile = open('data/blast_test.xml', 'r') >>> blast_results = MinimalBlastParser7(blastfile) >>> for result in blast_results: ... 
meta_meta_data, hit_list = result ... key_list = hit_list[0] ... for key, value in zip(key_list, hit_list[1]): ... if 'ALIGN' in key: ... continue ... print key.ljust(20), value QUERY ID 1 SUBJECT_ID lcl|s2 HIT_DEF No definition line found HIT_ACCESSION s2 HIT_LENGTH 29 PERCENT_IDENTITY 26 MISMATCHES 0 GAP_OPENINGS 1 QUERY_START 1 QUERY_END 28 SUBJECT_START 1 SUBJECT_END 29 E_VALUE 6.00825e-05 BIT_SCORE 26.2635 SCORE 13 POSITIVE 26 >>> from cogent.parse.blast_xml import MinimalBlastParser7 >>> blastfile = open('data/blast_test.xml', 'r') >>> blast_results = MinimalBlastParser7(blastfile) >>> for result in blast_results: ... meta_meta_data, hit_list = result ... key_list = hit_list[0] ... for key in ('QUERY_ALIGN','MIDLINE_ALIGN','SUBJECT_ALIGN'): ... i = key_list.index(key) ... print hit_list[1][i][:40] TGCAGCTTGAG-CACAGGTTAGAGCCTTC ||||||||||| |||||| ||||||||| TGCAGCTTGAGCCACAGGAGAGAGCCTTC .. doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files(['data/blast_test.txt', 'data/blast_test.xml'], ... error_on_missing=False) BLAST with protein sequences ---------------------------- In this example, we load a protein sequence from a file in the data directory and BLAST against a new protein database we will create. Since we want to BLAST protein sequences instead of DNA, we will have to construct a new BLAST database. The file ``refseqs_protein.fasta`` contains some short sequences for use in the following examples. .. doctest:: >>> from cogent.app.formatdb import build_blast_db_from_fasta_path >>> result = build_blast_db_from_fasta_path('data/refseqs_protein.fasta', is_protein=True) >>> result[0] 'data/refseqs_protein.fasta'... Notice that we set the parameter ``is_protein`` to ``True`` since our database consists of protein sequences this time. This was not necessary in the previous example, because ``is_protein`` is set to ``False`` by default. 
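Shortly, the results will be filtered with ``BlastResult.bestHitsByQuery``; its core is just a sort of hit records on one numeric field. A standalone sketch of that idea, with hit values abridged from the output shown later in this section (field names follow the ``-m 9`` tabular headers):

```python
def best_hits(hits, field, n=3, reverse=False):
    """Return the top-n hit dicts ranked by a numeric field -- the idea
    behind BlastResult.bestHitsByQuery, sketched standalone."""
    return sorted(hits, key=lambda h: float(h[field]), reverse=reverse)[:n]

# Values abridged from the example output in this section:
hits = [
    {"SUBJECT ID": "1091044", "E-VALUE": "5e-12", "ALIGNMENT LENGTH": "26"},
    {"SUBJECT ID": "5326864", "E-VALUE": "3e-05", "ALIGNMENT LENGTH": "27"},
    {"SUBJECT ID": "14286173", "E-VALUE": "0.077", "ALIGNMENT LENGTH": "24"},
    {"SUBJECT ID": "15964668", "E-VALUE": "3e-04", "ALIGNMENT LENGTH": "18"},
]
# Smallest e-values first:
print([h["SUBJECT ID"] for h in best_hits(hits, "E-VALUE", n=3)])
# Longest alignments first:
print([h["SUBJECT ID"] for h in best_hits(hits, "ALIGNMENT LENGTH", n=3, reverse=True)])
```

Note that "best" depends on the field: for e-values smaller is better, while for alignment length or bit score larger is better, hence the ``reverse`` flag.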
Now that we have built our protein BLAST database, we can load our sequence and BLAST against this database. .. doctest:: >>> from cogent import LoadSeqs, PROTEIN >>> from cogent.app.blast import blast_seqs, Blastall >>> seqs = LoadSeqs('data/inseqs_protein.fasta', moltype=PROTEIN, aligned=False) >>> seq = seqs.getSeq('1091044_fragment') >>> seq ProteinSequence(IPLDFDK... 26) Notice we need to use '-p':'blastp' in the parameters dictionary, since ``blastp`` is used for protein. .. doctest:: >>> params={'-p':'blastp','-m':'9'} >>> result = blast_seqs([seq], ... Blastall, ... blast_db = 'data/refseqs_protein.fasta', ... params = params) >>> data = result['StdOut'].read() >>> print data.split('\n')[:1] ['# BLASTP 2.2... We save the results for further processing .. doctest:: >>> outfile = open('data/blast_protein_test.txt','w') >>> outfile.write(data) >>> outfile.close() Now we will explore some of the convenience methods of the ``BlastResult`` object. .. doctest:: >>> from cogent.parse.blast import BlastResult >>> blast_results = BlastResult(open('data/blast_protein_test.txt','r')) Suppose we want to filter our results based on various criteria. In many cases you may want to only keep the top '3' matches with the longest 'ALIGNMENT LENGTH' for the query sequence to the target. .. doctest:: >>> best_hits = dict(blast_results.bestHitsByQuery(field='ALIGNMENT LENGTH', n=3)) >>> query_1_best_hits = best_hits['1'] >>> for hit in query_1_best_hits: ... for key, value in hit.items(): ... print key.ljust(20), value ... print ... MISMATCHES 0 ALIGNMENT LENGTH 26 Q. END 26 BIT SCORE 56.2 % IDENTITY 100.00 Q. START 1 S. START 30 S. END 55 GAP OPENINGS 0 QUERY ID 1 E-VALUE 5e-12 SUBJECT ID 1091044 MISMATCHES 10 ALIGNMENT LENGTH 27 Q. END 25 BIT SCORE 33.5 % IDENTITY 55.56 Q. START 1 S. START 32 S. END 58 GAP OPENINGS 1 QUERY ID 1 E-VALUE 3e-05 SUBJECT ID 5326864 MISMATCHES 16 ALIGNMENT LENGTH 24 Q. END 25 BIT SCORE 22.3 % IDENTITY 33.33 Q. START 2 S. START 19 S. 
END 42 GAP OPENINGS 0 QUERY ID 1 E-VALUE 0.077 SUBJECT ID 14286173 The fist of the top 3 hits for alignment length has 0 MISMATCHES and a % IDENTITY of 100.00. The next 2 hits have many MISMATCHES and a much lower % IDENTITY. Lets filter the results again, but by E-VALUE this time: .. doctest:: >>> best_hits = dict(blast_results.bestHitsByQuery(field='E-VALUE', n=3)) >>> query_1_best_hits = best_hits['1'] >>> for hit in query_1_best_hits: ... for key, value in hit.items(): ... print key.ljust(20), value ... print ... MISMATCHES 0 ALIGNMENT LENGTH 26 Q. END 26 BIT SCORE 56.2 % IDENTITY 100.00 Q. START 1 S. START 30 S. END 55 GAP OPENINGS 0 QUERY ID 1 E-VALUE 5e-12 SUBJECT ID 1091044 MISMATCHES 10 ALIGNMENT LENGTH 27 Q. END 25 BIT SCORE 33.5 % IDENTITY 55.56 Q. START 1 S. START 32 S. END 58 GAP OPENINGS 1 QUERY ID 1 E-VALUE 3e-05 SUBJECT ID 5326864 MISMATCHES 6 ALIGNMENT LENGTH 18 Q. END 26 BIT SCORE 30.4 % IDENTITY 66.67 Q. START 9 S. START 31 S. END 48 GAP OPENINGS 0 QUERY ID 1 E-VALUE 3e-04 SUBJECT ID 15964668 You can filter the BLAST results by any of the fields you like. You can also use the ``BlastResult`` object to do a quick assessment of your BLAST results looking only at the fields you like: .. doctest:: >>> fields = ['SUBJECT ID', 'BIT SCORE', 'E-VALUE'] >>> for query, results in blast_results.items(): ... print ''.join([f.ljust(20) for f in fields]) ... for result in results[-1]: ... print ''.join(map(str,[result[field].ljust(20) for field in fields])) SUBJECT ID BIT SCORE E-VALUE 1091044 56.2 5e-12 5326864 33.5 3e-05 15964668 30.4 3e-04 17229033 29.6 5e-04 21112072 28.1 0.001 4704732 25.8 0.007 13541117 24.6 0.016 15826629 24.3 0.020 14286173 22.3 0.077 6323138 21.9 0.10 18313548 20.8 0.22 21674812 20.0 0.38 14600438 20.0 0.38 4996210 18.5 1.1 15605963 17.3 2.5 15615431 16.5 4.2 .. doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files(['data/blast_protein_test.txt'], ... 
error_on_missing=False)

***********************
Building a tree of life
***********************

.. authors, Greg Caporaso

Building a tree of life with PyCogent
=====================================

This cookbook example runs through how to construct a tree of life from 16S rRNA sequences to test whether the three domains of life are visible as three separate clusters in a phylogenetic tree. This example covers compiling sequences, building a multiple sequence alignment, building a phylogenetic tree from that sequence alignment, and visualizing the tree.

Step 0. Set up your python environment
--------------------------------------

For this tutorial you'll need cogent, muscle, and FastTree installed on your system. Start an interactive python session by entering the following into a command terminal::

    python

You should now see the python command prompt::

    >>>

Step 1: Download sequences from NCBI
------------------------------------

Here we'll work with archaeal, bacterial, and eukaryotic sequences obtained from NCBI using the PyCogent EUtils wrappers. Run the following commands to obtain these sequences::

    from cogent.db.ncbi import EUtils
    from cogent.parse.fasta import MinimalFastaParser
    e = EUtils()
    arc16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND archaea[orgn]']))
    bac16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND bacteria[orgn]']))
    euk16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND eukarya[orgn]']))

You can check how many sequences you obtained for each query by running::

    len(arc16s)
    len(bac16s)
    len(euk16s)

.. note:: In this example you'll notice that you have relatively few sequences for each query. You'd obtain many more if you replaced the ``rRNA`` in the query with ``ribosomal RNA``, but the runtime would also be significantly longer.
For the purposes of this tutorial we'll therefore stick with this command that returns fewer sequences.

Step 2: Load the sequences
--------------------------

We'll begin by loading the sequences that have been downloaded, applying a filter to retain only those that we consider to be of good quality. Sequences of 750 bases or fewer, or sequences containing one or more ``N`` characters, will be ignored (``N`` characters typically represent ambiguous base calls during sequencing). First, define a function to load and filter the sequences::

    from cogent.parse.fasta import MinimalFastaParser

    def load_and_filter_seqs(seqs, domain_label):
        result = []
        for seq_id, seq in seqs:
            if len(seq) > 750 and seq.count('N') < 1:
                result.append((domain_label + seq_id, seq))
        return result

Next, load and filter the three sequence sets::

    arc16s_filtered = load_and_filter_seqs(arc16s, 'A: ')
    bac16s_filtered = load_and_filter_seqs(bac16s, 'B: ')
    euk16s_filtered = load_and_filter_seqs(euk16s, 'E: ')
    len(arc16s_filtered)
    len(bac16s_filtered)
    len(euk16s_filtered)

Step 3: Select a random subset of the sequences
-----------------------------------------------

Import ``shuffle`` from the ``random`` module to extract a random collection of sequences::

    from random import shuffle

    shuffle(arc16s_filtered)
    shuffle(bac16s_filtered)
    shuffle(euk16s_filtered)

Select some random sequences from each domain. Note that only a few sequences are chosen to facilitate a quick analysis::

    combined16s = arc16s_filtered[:3] + bac16s_filtered[:10] + euk16s_filtered[:6]
    len(combined16s)

Step 4: Load the sequences into a SequenceCollection object
-----------------------------------------------------------

Use ``LoadSeqs`` to load the unaligned sequences into a ``SequenceCollection`` object. In this step we'll rename the sequences (by passing a ``label_to_name`` function) to only the accession number for the sequence. This facilitates visualization in downstream steps.
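The renaming can be previewed on its own: the ``label_to_name`` function used in this step keeps only the first two ``|``-separated fields of an NCBI-style fasta label. The label below is a hypothetical example, not one of the downloaded sequences:

```python
# The label_to_name function passed to LoadSeqs in this step:
label_to_name = lambda x: '|'.join(x.split('|')[:2])

# Hypothetical NCBI-style label; only the first two fields survive:
print(label_to_name("gi|12345|ref|NR_000001.1| Homo sapiens 18S"))  # gi|12345
```

Labels without ``|`` separators pass through unchanged, since splitting yields a single field.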
:: from cogent import LoadSeqs, DNA seqs = LoadSeqs(data=combined16s,moltype=DNA,aligned=False,label_to_name=lambda x: '|'.join(x.split('|')[:2])) You can explore some properties of this sequence collection. For example, you can count how many sequences are in the sequence collection object:: seqs.getNumSeqs() .. _step5: Step 5: Align the sequences using muscle ---------------------------------------- Load an aligner function, and align the sequences. Here we'll align with muscle via the muscle application controller. The sequences will be loaded into an ``Alignment`` object called ``aln``. :: from cogent.app.muscle import align_unaligned_seqs aln = align_unaligned_seqs(seqs,DNA) Step 6: Build a tree from the alignment using FastTree ------------------------------------------------------ Load a tree-building function, and build a tree from the alignment. Here we'll use FastTree. The tree will be stored in a ``PhyloNode`` object called ``tree``. :: from cogent.app.fasttree import build_tree_from_alignment tree = build_tree_from_alignment(aln,DNA) Step 7: Visualize the tree ------------------------------------------ Load a drawing function to generate a prettier picture of the tree:: from cogent.draw.dendrogram import UnrootedDendrogram dendrogram = UnrootedDendrogram(tree) Have a quick look at the unrooted dendrogram:: dendrogram.showFigure() You should see something like this: .. image:: ../images/tol_not_gap_filtered.png Figure 1: A tree of life build from 16S rRNA sequences. A: archaeal sequence; B: bacterial sequences; E: eukaryotic sequences. Step 8: Save the tree as a PDF ------------------------------- Finally, you can save this tree as a PDF for sharing or later viewing:: dendrogram.drawToPDF('./tol.pdf') You can also write the alignment and tree to fasta and newick files, respectively. You can then load these in tools such as `BoulderALE `_ (for alignment editing) or `TopiaryExplorer `_ or `FigTree `_ (for tree viewing, coloring, and layout manipulation). 
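For reference, the newick string written by ``tree.getNewick(with_distances=True)`` nests children in parentheses, with each node followed by ``:branch_length``. A toy serializer over a ``(name, children, length)`` tuple tree shows the shape of the format (the tuple structure is an assumption for illustration, not PyCogent's ``PhyloNode``):

```python
def to_newick(node):
    """Serialize a (name, children, length) nested-tuple tree to a newick
    string -- a toy sketch of the format getNewick(with_distances=True) emits."""
    name, children, length = node
    if children:
        inner = ",".join(to_newick(c) for c in children)
        return "(%s)%s:%g" % (inner, name, length)
    return "%s:%g" % (name, length)

tree = ("root", [("A", [], 0.1),
                 ("B", [("C", [], 0.05), ("D", [], 0.2)], 0.3)], 0.0)
print(to_newick(tree) + ";")  # (A:0.1,(C:0.05,D:0.2)B:0.3)root:0;
```

Because newick is plain text, the file written below can be opened directly in the tree viewers mentioned above.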
:: open('./tol.fasta','w').write(aln.toFasta()) open('./tol.tre','w').write(tree.getNewick(with_distances=True)) Extra credit: Alignment filtering --------------------------------- Filter highly gapped positions from the alignment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To try to improve the quality of the alignment and therefore the tree, it's often a good idea to removed positions that contain a high proportion of gap characters from the alignment. These generally represent non-homologous regions of the sequence of interest, and therefore contribute little to our understanding of the evolutionary history of the sequence. These steps may result in a clearer delineation of the three domains on your tree, but the results will in part be dependent on the randomly chosen sequences in your alignment. To remove positions that are greater than 10% gap characters from the alignment, run the following command:: gap_filtered_aln = aln.omitGapPositions(allowed_gap_frac=0.10) If you count the positions in both the full and reduced alignments you'll see that your alignment is now a lot shorter:: len(aln) len(gap_filtered_aln) Rebuild the tree and visualize the result as before:: gap_filtered_tree = build_tree_from_alignment(gap_filtered_aln,DNA) gap_filtered_dendrogram = UnrootedDendrogram(gap_filtered_tree) gap_filtered_dendrogram.showFigure() Your tree should look something like this: .. image:: ../images/tol_gap_filtered.png Figure 2: A tree of life build from 16S rRNA sequences. A: archaeal sequence; B: bacterial sequences; E: eukaryotic sequences. Filtering highly variable positions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Another issue that adds noise to alignments of distantly related sequences is highly entropic (or highly variable) positions. To filter these, we can compute the Shannon Entropy or uncertainty of each position, and then remove the most 10% entropic positions. 
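The gap filter above is easy to reproduce on plain strings. A standalone sketch of what ``omitGapPositions(allowed_gap_frac=0.10)`` does, on a toy three-row alignment (not the cookbook data):

```python
def omit_gap_positions(rows, allowed_gap_frac=0.10, gap="-"):
    """Drop alignment columns whose fraction of gap characters exceeds
    allowed_gap_frac -- a standalone sketch of Alignment.omitGapPositions."""
    n = len(rows)
    keep = [i for i in range(len(rows[0]))
            if sum(r[i] == gap for r in rows) / float(n) <= allowed_gap_frac]
    return ["".join(r[i] for i in keep) for r in rows]

aln = ["AC-GT",
       "AC-GT",
       "ACAG-"]
# Columns 2 (2/3 gaps) and 4 (1/3 gaps) exceed the 10% threshold and are dropped:
print(omit_gap_positions(aln, 0.10))  # ['ACG', 'ACG', 'ACG']
```

Raising ``allowed_gap_frac`` retains more columns; the cookbook's 10% cutoff is fairly strict.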
First we'll compile the Shannon Entropy value for each position in the alignment:: sorted_uncertainties = sorted(gap_filtered_aln.uncertainties()) Next we'll find the 90th percentile by sorting the uncertainties and finding that value that is 90% of the way through that list:: uncertain_90p = sorted_uncertainties[int(len(sorted_uncertainties)*0.9)] Next we'll identify and store the positions that have lower entropy than ``uncertain_90p``:: positions_to_keep = [] for i,u in enumerate(gap_filtered_aln.uncertainties()): if u < uncertain_90p: positions_to_keep.append(i) Then we'll filter the alignment to contain only those positions:: entropy_gap_filtered_aln = gap_filtered_aln.takePositions(positions_to_keep) We can then rebuild and visualize the tree:: entropy_gap_filtered_tree = build_tree_from_alignment(entropy_gap_filtered_aln,DNA) entropy_gap_filtered_dendrogram = UnrootedDendrogram(entropy_gap_filtered_tree) entropy_gap_filtered_dendrogram.showFigure() Your tree should look something like this: .. image:: ../images/tol_entropy_gap_filtered.png Figure 3: A tree of life build from 16S rRNA sequences. A: archaeal sequence; B: bacterial sequences; E: eukaryotic sequences. While the trees in Figures 1, 2, and 3 don't look very different, an interesting point to note is the amount of information in each:: len(aln) len(gap_filtered_aln) len(entropy_gap_filtered_aln) The entropy and gap filtered alignment (``entropy_gap_filtered_aln``) contains approximately 1/4 of the positions as the full alignment (``aln``), yet results in a nearly identical phylogenetic tree. This suggests that the filtered positions add very little phylogenetic information. In small alignments such as the example here this may not have a large affect on run time, but when building a tree from thousands or tens of thousands of sequences removing gap and high entropy positions can save significant compute time as well as frequently improving results. 
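The per-column Shannon entropy used above is ``-sum(f * log2(f))`` over the character frequencies ``f`` of a column. A standalone sketch of what ``uncertainties()`` returns, on a toy alignment:

```python
from math import log

def column_entropies(rows):
    """Shannon entropy (bits) of each alignment column -- a standalone
    sketch of what Alignment.uncertainties() computes."""
    result = []
    for i in range(len(rows[0])):
        col = [r[i] for r in rows]
        freqs = [col.count(c) / float(len(col)) for c in set(col)]
        # 'or 0.0' normalizes the -0.0 produced by fully conserved columns
        result.append(-sum(f * log(f, 2) for f in freqs) or 0.0)
    return result

aln = ["AAC",
       "AAG",
       "AAT"]
# Conserved columns score 0; the fully variable column scores log2(3):
print(["%.2f" % h for h in column_entropies(aln)])  # ['0.00', '0.00', '1.58']
```

The cookbook's filter then keeps only positions whose entropy falls below the 90th percentile of these values.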
Starting with Silva sequences (to skip steps of obtaining sequences from NCBI) ------------------------------------------------------------------------------ The following sequences are randomly chosen from the Silva database. You can use these instead of pulling random sequences from NCBI. :: fasta_str = """>AF424517 1 994 Archaea/Crenarchaeota/uncultured/uncultured CAGCAGCCGCGGTAATACCAGCCCCCCGAGTGGTGGGGATGTTTATTTGGCCTAAAACGTCCGTAGCCAGCTCGGTAAATCTCTCGTTAAATCCAGCGTCCTAAGCGTTGGGCTGCGAGGGAGACTGCCAAGCTAGAGGGTGGGAGAGGTCAGCGGTATTTCTGGGGTAGGGGCGAAATCCATTGATCCCAGGAGGACCACCAGTGGCGAAGGCTGCTGACTAGAACACGCCTGACGGTGAGGGACGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACGATGCAAACTCGGTGATGCCCTGGCTTGTGGCCAGTGCAGTGCCGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGAAGTACGTACGCAAGTATGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCAGAAATCTTACCCGAAGAGACAGCAGAATGAAGGTCAAGCTGGAGACTTTACCAGACAAGCTGAGAAGTGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGATGTCCTGTTAAGTCAGGTAACCAGCGAGATCCCTGCCTCTAGTTGCCACCATTACTCTCCGGAGTAGTGGGGCGAATTAGCGGGACCGCCGTAGTTAATACGGAGGAAGGAAGGGGCCACGGCAGGTCAGTATGCCCTGAAACTTTGGGGCCACACGCGGGCTGCAATGGTAACGACAATGGGTTCCGAAACCGAAAGGTGGAGGTAATCCTCAAACGTTACCACAGTTATGATTGAGGGCTGCAACTCGCCCTCATGAATATGGAATCCCTAGTAACTGCGTGTCATTATCGCGCGGTGAATACGTCCCTGCTCCTTGCACACACTGCCCGTCGAACCACCCGAATGAGGTTTGGGTGAGGAATGGTCGAATGTTGGCCGTTTCGAACCTGGGCTTCGTAAGGAGGGTTAAGTCGTAACAAGGTAACCGTA >AF448158 1 1828 Eukarya/Metazoa/Magelona et rel. 
TTGATCCTGCCAGTAGTCATATGCTTGACTCAAAGATTAAGCCATGCATGTGCAAGTACATGACTTTTTTACACACGGTGAGACCGCGAATGGCTCATTAGATCAGTCTTAGTTCCTTAGACGGAAAGTGCTACTTGGATAACTGTGGCAATTCTAGAGCTAATACGTGCACGCAAGCTCCGACCTACTGGGGAAGAGCGCAATTATTAGATCAAGACCAAACGAGTCGAAAGGCTCGAACGTCTGGTGACTCTGGATAACCTCGGGCTGACCGCACGGCCAAGAGCCGGCGGCGCATCTTTCAAGTGTCTGCCCTATCAACTTTCGATGGTATGCGATCTGCGTACCATGGTGCTTACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACCTCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCTGGCACAGGGAGGTAGTGACGAGCAATAGCGACTCGGGACTCTTTCGAGGCCTCGGGATCGGAATGAGTACAACGTAAACACTTTTGCAAGGAACAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGCTGTTGCAGTTAAAAAGCTCGTAGCTGAATCTCGGGTGCGGGCGGGCGGTCCGCCTTACAGCGTGCACTGCCCCGATCCTGATCCAACTGCCGGTATTATCTCGGGGTGCTCTTAGCTGAGTGTCTTGGGCTGGCCGGTGCTTTTACTTTGAAAAAATTAGAGTGCTCAAAGCAGGCTTCCACGCCTGAATACTATAGCATGGAATAATGGAATAAGACCTCGGTTCTATTCTGTTGGTCTCTGGAAACCAGAGGTAATGATTAAGAGGGACAGACGGGGGCATTCGTATTGCGGGGCGAGAGGTGAAATTCTTAGACCCTCGCAAGACGAACTACAGCGAAAGCATTTGCCAAGCATGTTTTCTTTAGTCAAGAACGAAAGTCAGAGGTTCGAAGACGATCAGATACCGTCCTAGTTCTGACCATAAACGATGCCGACTAGCGATGCGCGAGCGTTGGTATCTGACCTCGCGCGCAGCTCCCGGGAAACCAAAGTCTTTGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCCGGCCCGGACACTGCGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTAGCCTGCTAAATAGTTCGTCGACACGCGGTTGTGTCTGGCGAGGAAACTTCTTAGAGGGACAAATGGCATTTAGTCATACGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCGGGGCCGCACGCGCGCTACACTGAAGGAGACAGCGAGTGTCCTGACCTAGCCCGAAAGGGCCGGGCAATCTGCTGAACCTCTTTCGTGGTAGGGATTGGGGCTTGCAATTGTTCCCCATGAACCAGGAATTCCGAGTAAGCGCAGGTCACAAGCCTGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAGCGGTTCAGTGAGACCCTCGGACTTGCCCAGCAGGAGCCGGCGACGGCTCCGCGTGTGTGCGAGAAAGAATGTCGAACTGTATTGCTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGCTT >AJ428075 1 1749 Eukarya/Viridiplantae/Streptophyta/Klebsormidiophyceae 
TAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAATTACTCTAAATGGTAAAACTGCGAATGGCTCATTAAATCAGTTATAGTTTATTTGATGATTCCTGCTACTCGGATAACCGTAGTAATTATAGAGCTAATACGTGCGCAAACGCCCGACTTCGGAAGGGCCGTATTTATTAGATAAAAGACCAACTCGGGGTTCGCCCCGAAACTTTGGTGATTCATAATGTAATCTCGGACCGCACGGCCTCGCGCCGGCGGCAAATCAATCAAATATCTGCCCTATCAACTTTCGATGGCAGGATAGTCGCCTGCCATGGTTGTAACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGATTCAGGGAGGTAGTGACAATAAATAACAATACCGGTCTCTTATGTGACTGGTAATTGGAATGAGCGGAACATAAATACCTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCGGGACGGAGACGTCGGTCCTCCCTCGTGGTCGATACTGACTCTCTTCCTTAATTGCCTCGAGCGCCGCCTAGTCTTCATTGCCTGGGCGCGCTACGCGGCGCCGTTACCTTGAATAAATTATGGTGTTCAAAGCAGGCTTATGCTCTGAGTACATTAGCATGGAATAACGCTATAGGACTCCGGTCCTATTACGTTGGTCTTCTGACCGGAGTAATGATTAATAGGGACAGTCGGGGGCATTCGTACTTCATCGTTAGAGGTGAAATTCTTGGATCGATGAAAGACGAACTTCTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGGCGCGAAGACGATTAGATACCGTCCTAGTCCCAACCGTAAACGATGCCGACCCCGAATTGGCGCACGTATGACTTGACGTCGCCAGCGCCCGAGGAGAAATCAGAGTCTTTGGGTTCCGGGGGGAGTATGGTCGCAAGTCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGTGTGGAGCGTGCGGCTTAATTTGACTCAACGCGGGGAATCTTACCAGGTCCAGACATAGCGACGATTGACAGACTGATAGCTCTTTCTTGATCATATGGGTAGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAACGAGACCTCAGCTTGCTAACTAGTTGCGCGAAGATTTTCTTCGCGCACACTTCTTAGAAGGACTTTGAGCGTTTAGCTCATGGAGGTTTGAGGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACAATGATGCATTCAGCGAGCGGAATCCCTGATCGGAAACGGTCGGGCAATCTTTGAATCTTTATCGTGATGGGGATAGACCCTTGCAATTATTGGTCTCGAACGAGGAATACCTAGTAAGCGCTCGTCATCAGCGTGCGCTGACTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATAGAATGCTTCGGTGAAGCACTCGGATCGCGCCGCCGSCGGCGAAACCTCCGGGGACGGCATGAGAAGTTTGTTAAACCATATCGTTTAGAGGAAGGAGAAGTCGTAACAAGG >AJ850036 1 1961 Eukarya/Metazoa/Arthropoda/Polyphaga/Bagous et rel. 
TTGTCTCAAAGATTAAGCCATGCATGTCTCAGTACAAGCCATATTAAGGTGAAACCGCGAAAGGCTCATTAAATCAGTTATGGTTCCTTAGATCGTACCCAGGTTACTTGGATAACTGTGGTAATTCTAGAGCTAATACATGCAAACAGAGCTCCGACTGGAAACGGAAGGAGTGCTTTTATTAGATCAAAGCCAAACGGTAACTTAATGTTGTCGTACAATAATATTGTTGACTCTGAATAACTTTATGCTGATCGCATGGTCTTGCACCGGCGACGCATCTTTCAAATGTCTGCCTTATCAACTGTCGATGGTAGGTTCTGCGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCCGGCACGGGGAGGTAGTGACGAAAAATAACGATACGGGACTCATCCGAGGCCCCGTAATCGGAATGAGTACACTTTAAATCCTTTAACGAGGATCAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCGGTTAAAAAGCTCGTAGTCAAATTTGTGTCTCGTGCCGCTGGTTCATCGTTCGCGGTGTTAATTGGCGTGATACGAGACGTCCTGCCGGTGGGCTTTCAGATTTTTCCGTATTTCAGGACCATAACAATTGGTTTGTATCTGTGGCGTAATACTGCAGTGCAGGGCAATTGGTTAATGAACGGTTGGTTTTTGTGCTACCCAAACTTACAATCCTGTCGCGTTGCTCTTGATTGAGTGACGAGGTGGGCCGGCACGTTTACTTTGAACAAATTAGAGTGCTTAAAGCAGGCAAAATTTCGCCTGAATATTCTGTGCATGGAATAATGGAATAGGACCTCGGTTCTATTTCGTTGGTTTTCGGAACTCCGAGGTAATGATTAATAGGAACGGATGGGGGCATTCGTATTGCGACGTTAGAGGTGAAATTCTTGGATCGTCGCAAGACGAACAGAAGCGAAAGCATTTGCCAAAAACGCTTTCATTGATCAAGAACGAAAGTTAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTAACCGTAAACTATGTCATCTGACGATCCGTCGACGTTCCTTTATTGACTCGACGGGCAGTTTCCGGGAAACCAAAGATTTTGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCAGGCCCGGACACCGGAAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGCGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTAGCCTGCTAAATAGGCGACATATGACATCGCAAAGGCCAGCCGGTTTGATTTAAAGGGTGGCGAGGTGGCGTCAAGGCGTTTATCTCGTGCTCTTGTCAGATTGTGCGCGGTTTTTACTGTCGGCGTATAAATAATTCTTCTTAGAGGGACAGGCGGCTTTTAGCCGCACGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGAAGGAATCAGCGTGTCCTCCCTGGCCGAGTGGCCCGGGTAACCCGCTGAACCTCCTTCGTGCTAGGGATTGGGGCTTGCAATTGTTCCCCATGAACGAGGAATTCCCAGTAAGCGCGAGTCATAAGCTCGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAATGATTTACTGAGGTCTTCGGATCGATGCGCGATGACGTCTGACGTTGATCGATGTATCCGAGAAGATGACCAAACTTGATCATTT >AM745254 1 1365 
Archaea/Euryarchaeota/Halobacteriales/uncultured TTCCGGTTGATCCTGCCGGACCTGACTGCTATTGGAGTAGGACTAAGTCACGCTAGTCAAAGGTGTGGAATGGAACACCTGGCGCACGGCTCAGTAACACGTAGTGAACCTACCCTAAGGACGAGGACAACCACGGGAAACTGTGGCTAATCCTCGATAGGAAATTTGGCCTGGAACGGTATCTTTCCTAAAACCGGCTCGCCGTGAGACACGGGCCTTAGGATGGCGCTGCGGCCGATTATGCTAGACGGCGGTGTAAAGGACCACCGTGGCGACGATCGGTATGGGCGATGGAAGTCGGAGCCCAGAGTCGGCTACTGAGACAAGGAGCCGAGCCTTACGAGGCTTAGCGGTCGCGAAAACTCGCCAATGCACGAAAGTGTGAGTGGGCTACTCCAAGTGTCATTCTTACGGATGACTGTCGCCCAGTTTTACAAGCTGGGAAAGGAAGGAGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCTTCGAGTGGTCAGGACGAATATTGGGTCTAAAGCGTTCGTAGCGGGACAAGTAGGTTCCTGGTTAAATCCGATGTCACAAGCATCGGGCTGCTGGGAATACCGCTAGTCTTGAGAGCGGGATAGGACAGGGGTAGTCTATGGGCAGGGGTGAAATCCAGTGATCCATAGGCGACCACCGATGGCGAAGGCACCTGTCTGGAACGTATCTAACCGTGATGGACGAAAGCCAGGGGAGCGACCCGGATTAGATACCCGGTTAGTCCTGGCCGTAAACGATGCCGACTAGGTGTTGCAGCGGCCAAGAGCCACTGCAGTGCCACAGTGAAGACGTTAAGTCGGCCACCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGACGGGGGCGCACCACCAGGAGTGAAGCCTGCGGTTTAATTGGATTCAACGCCGAAAAACTCACCTAAACAGACGGCAGAATGAAGCTCAAGTTAATGACTTTAGCTAACTCGCCGAGAGGAAGTGCATGGCCGTCGACAGTTCGTGCTGTGAAGTGTCTTGTTAAGTCAAGCAACGAACGAGATCCACGTCCGCAATTGCCAGCGGGTCCCTTTGGGATGCCGGGAACCTTGCGGAGACTGCTTGGTGCTAAACCAGAGGAAGGAGTGGGCAACGGCAGGTCAGTATGCTCCGATAGTTTAGGGCTACACGCGGGCTGCAATGGTCGGTACAATGGGCCGCGACCCCGAAAGGGGAAGCCAATCCCGAAAGCCGGTCTCAGTCAGGATTGGGGTTTGCAACTCAGCCCCATGAATATGGAATTCCTAGTAAACGTGTTTCATTAAGACACGTTGAATACGTCCCCGCGCCTTGTACACACCGCCCGT >AY175392 1 1057 Archaea/Euryarchaeota/Methanomicrobiales 
CCCTTTCTGGTTGATCCTGCCAGAGGCCACTGCTATCGGGGTTCGACTAAGCCATGCGAGTCGAGAGGGGTAATGCCCTCGGCGAACGGCTCAGTAACACGTGGACAACCTACCCTCAGATCTGGGATAACTCCGGGAAACTGGAGATAATACCGGATAATCCGTGAACGCTGGAATGCCTTACGGTTCAAAGCTTTAGCGTCTGAGGATGGGTCTGCGGCCGATTAGGTAGTTGCTGGGGTAACGTCCCAACAAGCCGATAATCGGTACGGGTTGTGAGAGCAAGAGCCCGGAGATGGATTCTGAGACACGAATCCAGGTCCTACGGGGCGCAGCAGGCGCGAAAACTTTACACTGCGCGAAAGCGCGATAAGGGAACCTCGAGTGCGTGCGCAATGCGTACGCTTTTCACATGCCTAAAAAGCATGTGGAATAAGAGCCGGGCAAGACCGGTGCCAGCCGCCGCGGTAACACCGGCGGCTCAAGTGGTGGCCGCTATTATTGGGCTTAAAGGGTCCGTAGCCGGACCAGTTAGTCCCTTGGGAAATCTTACGGCTTAACCGTAAGGCTGCCAATGGATACTGCTGGCCTTGGGACCGGGAGAGGCAAGAGGTACCTCAGGGGTAGGAGTGAAATCCTGTAATCCTTGAGGGACCGCCAGTGGCGAAGGCGTCTTGCTAGAACGGGTCCGACGGTGAGGGACGAAAGCTAGGGGCACGAACCGGATTAGATACCCGGGTAGTCCTAGCCGTAAACGATGCGAGCTAGGTGTCACGTGGATTGCGAATCCATGTGGTGCCGTAGGGAAACCGTGAAGCTCGCCGCCTGGGAAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAACGGGTGGAGCCTGCGGTTTAATTGGACTCAACGCCGGAAAGCTCACCGGAGACGACAGCGGGATGAGGGCCAGGCTGATGACCTTGCTAGACTAGCTGAGAGGAGGTGCATGGCCGCCGTCAGTTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCAAAGGG >AY284588 1 1736 Eukarya/Metazoa/Nematoda/Aphelenchus et rel. 
CTCAAAGATTAAGCCATGCATGTGTAAGTATAAACGATTCAATCGTGAAACCGCGAACGGCTCATTATAACAGCTATGATCTACTTGATCTTGAGAATCCTAATTGGATAACTGTAGTAATTCTAGAGCTAATACATGCATAAGAGCTCGAACCTTGCGCAAGCGGGGGAAGAGTGCATTTATTGGAAGAAGACCAGTTGTGGCTGTAAAAAGCTGCATGTCGTTGACTCGCAATAACTAAGCTGATCGCATGGCCTTGTGCCGGCGACGAGTCTTTCGAGTATCTGCCTTATCAACTTTCGACGGTAGTGTATTTGACTACCATGGTGGTGACGGGTAACGGAGGATAAGGGTTCGACTCCGGAGAAGGGGCCTGAGAAATGGCCACTACGTCTAAGGATGGCAGCAGGCGCGCAAATTACCCACTCTCGGTACGAGGAGGTAGTGACGAAAAATAACGAAGAGGTCCCCTATGGGTCTTCTATTGGAATGGGTACAATTTAAACCCTTTAACGATTAACCAAGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCTAAATGCATAGATACATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGTGTTGGGGACTTGGTCCACTCTAACGGGTGGTACTTTGCTCCTTGACAATCAATGTTGGCTCACTTGGCGTAGTCTTCAGTGATTGCGTCATAGTTGGCTGACGAGTTTACTTTGAGCAAATCAGAGTGCTCCAAACAGGCGTTTACGCTTGAATGTTCGTGCATGGAATAATAGAAGAGGATTTCGGTTCTATTTTGTTGGTTTTGAGACCGAGATAATGGTTAACAGAGACAGACGGGGGCATTCGTACTTCTGCGTGAGAGGTGAAATTCTTGGACCGCAGAAAGACGCACCACAGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGATCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGTAAACGATGCCAACTAGCGATCTGTCGGTGGTGTGTTTTCGCCCTGATAGGGAGCTTCCCGGAAACGAAAGTCTTCGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCCGGGCCGGACACCGTAAGGATTGACAAATTGATAGCTTTTTCATGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTCGTGGAGCGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTTGGCACATTACATTGTGCGTCCTAACTTCTTAGAGGGATTTACGGCGTATAGCCGCAAGAGAATGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCGGGGCTGCACGCGCGCTACACTGGTGAAATCAACGTGTTCTCCTATGCCGAGAGGCACTTGGGTAAACCATTGAAAATTCGCCGTGATTGGGATCGGAGATTGAAATTATTTTCCGTGAACGAGGAATTCCAAGTAAGTGCGAGTCATCAACTCGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACCCGGGACTGGGTTATTTCGAGAAATTTGAGGATTGGCTAGGTGCTTGATGCCTCCGGGTGTCATCGCCTGTCGAGAATCAACTTAATCGAGATGGCCTGAACCGGGT >AY454558 1 1110 Archaea/Crenarchaeota/uncultured/uncultured 
ACTCACTAAGAGCGAATTGGGCCTTTCGTCGCATGCTAAAAGGCCGCCATGGCCGCGGGATTGGGCACGGGGGGACGGGTTGCCGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATTTCCGTTAAGGAGATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTGGTCGGGACGTTTATTGGGCCTAAAGCATCCGTAGCCGGTTCTACAAGTCTTCCGTTAAATCCACCTGCTTAACAGATGGGCTGCGGAAGATACTATAGAGCTAGGAGGCGGGAGAGGCAAGCGGTACTCGATGGGTAGGGGTAAAATCCGTTGATCCATTGAAGACCACCAGTGGCGAAGGCGGCTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACGATGCAGACTCGGTGATGAGTTGGCTTCTTGCTAACTCAGTGCCGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCGGAAATCTTACCGGGGGCGACAGCAGAGTGAAGGTCAAGCTGAAGACTTTACCAGACAAGCTGAGAGGAGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGGTGTCCTGTTAAGTCAGGTAACGAGCGAGATCCCTGCCTCTAGTTGCTACCATTATTCTCAGGAGTAGTGGAGCTAATTAGAGGGACCGCCGTCGCTGAGACGGAGGAAGGTGGGGGCTACGGCAGGTCAGTATGCCCCGAAACCCTCGGGCCACACGCGGGCTGCAATGGTAAGGACAATGAGTTTCAATTCCGAAAGGAGGAGGCAATCTCTAAACCTTACCACAGTTATGATTGAGGGCTGAAACTCGCCCTCATGAATATGGAATCCCTAGTAACCGCGTGTCACTATCGCGCGGTGAATACGTCCCTGCTCCTTGCACGAGTTAACCGAATCACTAGT >DQ421767 1 1422 Bacteria/Beta Gammaproteobacteria/Gammaproteobacteria_1/Oceanospirillales_2/Marinomonas 
AGCGGTAACAGGAATTAGCTTGCTAATTTGCTGACGAGCGGCGGACGGGTGAGTAACGCGTAGGAATCTGCCTGGTAGTGGGGGACAACATGTGGAAACGCATGCTAATACCGCATACGCCCTACGGGGGAAAGGAGGGGATCTTCGGACCTTTCGCTATCAGATGAGCCTGCGTGAGATTAGCTAGTTGGTGGGGTAAAGGCTCACCAAGGCGACGATCTCTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTCAGTTGGGAAGATGATGACGTTACCAACAGAAGAAGCACCGGCTAAATCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGTTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGACCAGAAAGTTGGGGGTGAAATCCCGGGGCTCAACCCCGGAACGGCCTCCAAAACTCCTGGTCTTGAGTACGGCAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATAGGAAGGAACATCAGTGGCGAAGGCGACACCCTGGACCGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCTACTAGCCGTTGGGGATTTTATTCTTAGTGGCGCAGCTAACGCGATAAGTAGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTACTCTTGACATCCAGAGAATTTAGCAGAGATGCTTTAGTGCCTTCGGGAACTCTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTATCCTTATTTGCCAGCACTTCGGGTGGGAACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCGTATACAGAGGGCCGCAAGACCGCGAGGTGGAGCAAATCCCAAAAAGTACGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTTGATTGCTCCAGAAGTAGCTAGCTTAACCTTCGGGATGGCGGTTACCACGGAGTGGTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCTAGG >DQ628981 1 1786 Eukarya/Rhodophyta et al./Rhodophyta/Florideophyceae/Corallinales 
CACCTGGTTGATCCTGCCAGTGGTATATGCTTGTCTCAAAGACTAAGCCATGCAAGTCTAAGTATAAGTTATTCTTACGACAAAACTGCGAATGGCTCGGTAAAACAGCAATAATTTCTTCAGTGATGATTTTACTCACGGATAACCGTAGTAATTCTAGAGCTAATACGTGCAAATTAAAGCAATGACCGCAAGGCCAGCGCTGTGCCGTTTAGATAACAACACCATCATTTGGTGATTCATAATCGTCTTTCTGATCGCTTCGTGCGACACACTGTTCAAATTTCTGACCTATCAACTTTCGATGGTAAGGTAGTGTCTTACCATGGTTATGACGGGTAACGGACCGTGGGTGCGGGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGTAAATTACCCAATCCAGACACTGGGAGGTAGTGACAAGAAATATCAATGGGGGAACTGTAAAGTTCTTCCAATTGGAATGAGATCGAGCTAAATAGCCAAATCGAGAATCCAGCAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTGTAAGCGTATACCAAAGTTGTTGCACTTAAAACGCTCGTAGTCGGACATTGGTAGTTCCGGGAGTGTGCGCGTCGTGTGCATGCTCTGCGGGACTGCCTTTCGTGGAGTTGTCGGAGGGATGAAGCATTTTAATTAATGAACGTCCACCGCGCCCACTTTTTACTGTGAGAAAATCAGAGTGCTCAAAGCAGGCAATTGCCGTGAATGTATTAGCATGGAATAATAGAATAGGACTCGTTTCTATTTTGTTGGTTTGTTGGGAATGAGTAATGATTAAGAGGGACAGTTGGGGGCATTTGTATTACGAGGCTAGAGGTGAAATTCTTAGATTCTCGTAAGACAAACTGCTGCGAAAGCGTCTGCCAAGGATGTTTTCATTGATCAAGAACGAAAGTAAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTTACTATAAACGATGAGAACTAGGGATCGGGCGAGGCATTACGATGACCCGCCCGGCACCTTCCGCGAAAGCAAAGTGTTTGCTTTCTGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCATCACCGGGTGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTTACCAGGTCAGGACATAGTGAGGATGAACAGATTGAGAGCTCTTTTTTGATTCTATGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAGCGAGACCTGGGCGTGCTAACTAGGAGAGGCTACACTCGTGGTAGTTTTCGACTTCTTAGACGGACTGGTGGCGTCTAGCCACCGGAAGCTCCAGGCAATAACAGGTCTGAGATGCCCTTAGATGTTCTGGGCCGCACGCGTGCTACACTGAGTAATTCAATGGGTAAGGGAACACGAAAGTGCGACCTAATCTTGAAATTTGCTCGTGATGGGGATCGACGGTTGCAATTTTCCGTCGTGAACGAGGAATACCTTGTAGGCGCGTGTCATCATCACGCGCCGAATACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATTGAGTGATCCGGTGAGGCTCTGGGACCTGAGCGGAAAGAGCGTTTCGCTTGTTCTGCTTGGGAAACTTGGTCGAACCTTATCATTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGCTA >EF406474 1 1502 Bacteria/Firmicutes/Clostridiales/Ruminococcus et rel./Papillibacter et rel./Oscillospira 
TAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAGCACCCTTGAAGGAGTTTTCGGACAACGGATAGGAATGCTTAGTGGCGGACTGGTGAGTAACGCGTGAGGAACCTGCCTTCCAGAGGGGGACAACAGTTGGAAACGACTGCTAATACCGCATGACGCATTGGTGTCGCATGGCACTGATGTCAAAGATTTATCGCTGGAAGATGGCCTCGCGTCTGATTAGCTAGTTGGTGAGGTAACGGCCCACCAAGGCGACGATCAGTAGCCGGACTGAGAGGTTGGCCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTAAGGGGGAAGAGCAGAAGACGGTACCCCTTGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGAGAGACAAGTCAGATGTGAAATCCACGGGCTCAACCCGTGAACTGCATTTGAAACTGTTTCCCTTGAGTGTCGGAGAGGTAATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGATTACTGGACGATAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATCGATACTAGGTGTGCGGGGACTGACCCCCTGCGTGCCGGAGTTAACACAATAAGTATCGCACCTGGGGAGTACGATCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGATTATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGGCTTGACATCCTACTAACGAAGTAGAGATACATTAGGTGCCCTTCGGGACAAGAGAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCTTCAGTAGCCAGCAGGTAAAGCCGGGCACTCTGGAGAGACTGCCGGGGATAACCCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCTACACACGTGCTACAATGGCGTAAACAGAGGGAAGCGAGCCCGCGAGGGGGAGCAAATCCCAAAAATAACGTCCCAGTTCGGATTGTAGTCTGCAACCCGACTACATGAAGCTGGAATCGCTAGTAATCGCGGATCAGAATGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGTCGGAAATGCCCGAAGTCTGTGACCCAACCGCAAGGAGGGAGCAGCCGAAGGCAGGTCGGATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAA >EF516988 1 1782 Bacteria/Firmicutes/Bacillales Mollicutes/Staphylococcaceae/Staphylococcus/Staphylococcus aureus et rel./Staphylococcus aureus et rel./Staphylococcus warneri 
GTACCGCTTTGGAGCCTCTCGAGTTTGATCCTGGCTCAGGAGGTCCTAACAAGGTAACCAGTATTGGATCCCCTAGAGTTTGATCCCGGCCCCTAAAGTTTGAACAAAGTCCAGGAAATTGGGGCCCCTACAGTTTAATCTCTTTTGCTTCATGGTAAAAAACTGAAAGACGGTTTCGGCTGTCGCTATTTGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCGACGATGCGTAGCCCACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGGCGAAAGCCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTAAGGGAAGAACAAGTACAGTAGTAACTGGCTGTACCTTGACGGTACCTTATTAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCAAGTGTAGCGGTGAAATGCGTAGAGATTTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGAGAGTACGGTCGCAGGACTGAAACTCAAAAGAATTTGACGGGGGGCTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCAAGTGTAGCGGTGAAATGCGTAGAGATTTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCCGTTGACCACTGTAGAGATATAGTTTCCCCTTCGGGGGCAACGGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGATCTTAGTTGCCATCATTTAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACGATACAAACGGTTGCCAACTCGCGAGAGGGAGGTATCCGATAAAGTCGTTCTCAGTTCGGATTGTTGGCCCCAACTCGCGTACGTGAAACCAGAATAACCAGTAATGGCTCCTCAGCATTTTGATCCGGGCTCGTTAAGTGGTAACAAGGTAACCGCTATTGGATCCTTAGAGTTTGATCCGGCTCAGGAAGTCGTAACAAGGTAACCAGTATGGTCCTCTAGAG >EF551905 1 1203 Bacteria/Beta Gammaproteobacteria/Xanthomonadales 
GATAGCGGCGCGATTCGCCCTTCCTACGGGGGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCAGATCCAGCCATGCCGCGTGGGTGAAGAAGGCCTTCGGGTTGTAAAGCCCTTTTGTTGGGAAAGAAAGACGTCCGGCTAATACCCGGATGGAATGACGGTACCCAAAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTACTCGGAATTACTGGGCGTAAAGGGTGCGTAGGTGGTTCGTTAAGTCTGATGTGAAAGCCCTGGGCTCAACCTGGGAATTGCATTGGATACTGGCGAGCTGGAGTGCGGTAGAGGGTAGTGGAATTCCCGGTGTAGCAGTGAAATGCGTAGATATCGGGAGGAACATCCGTGGCGAAGGCGACTACCTGGACCAGCACTGACACTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCGAACTGGATGTTGGGTTCAATCAGGAACTCAGTATCGAAGCTAACGCGTTAAGTTCGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGTATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGCCTTGACATGTCGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTTAGTTGCCAGCACGTAATGGTGGGAACTCTAAGGAGACCGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTACTACAATGGGAAAGGACAGAGGGCTGCGAACCCGCGAGGGCAAGCCAATCCCAGAAACCTTTCTCCCAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGCAGATCAGCATTGCTGCGGTGAATACGTTTCCGGTCTTGTACAACACCGCCCGTCACACCATGGGAGTGGGTGCCACCAGAAGTAGCTAGACTACGTTCGGGAGACCGTTACCCACGGTTGAATTCATGGACTTGGGGTGAGTCCGTAAACAGGGTTACCCCCG >EU132755 1 1345 Bacteria/Actinobacteria/CMN et rel./CMN/Pseudonocardiaceae_3/Pseudonocardia aurantiaca et rel./Pseudonocardia aurantiaca et rel. 
GAACGCTTGACGGCGTGCTTACACATGCAAGTCGAACGGGCCATTGCTCTTCGGGGTGGTGGTTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTCGGCTTCGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATATTCACATCTTGTTGCATGGTGGGGTGTGGAAAGGGTTTCTGGCTGGGGATGGGCTCGCGGCCTATCAGCTTGTTGGTGGGGTGATGGCCTACCAAGGCGGTGACGGGTAGCCGGCCTGAGAGGGCGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCGCAATGGGCGGAAGCCTGACGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCCCCGACGAAGCGAAAGTGACGGTAGGGGTAGAAGAAGCGCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTTCCGTGAAAACTGGGGGCTTAACTTCCAGCTTGCGGTGGATACGGGCTGACTGGAGTGCGGCAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGTTGGGCGCTAGGTGTGGGGGACTTTCCACGTTCTCCGTGCCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTGGCTTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTTGACATGCGCGGTAATCCTGTAGAGATACAGGGTCCTTCGGGGCCGTGTACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTCCATGTTGCCAGCACGTGATGGTGGGGACTCATGGGAGACTGCCGGGGTCAACTCGGAGGAAGGTAGGGATGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTGCACACATGCTACAATGGCTCATACAGAGGGCTGCGATGCTGTGAGGCTGAGCGAATCCCTTAAAGTGAGTCTCAGTTCGGATCGGGGTCTGCAACTCGACCCCGTGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGATACGTTCCCGGGCATTGCACTCA >EU570118 1 1433 Archaea/Euryarchaeota/Thermoplasmatales/uncultured 
CGGTTGATCCTGCCGGCGCTCACCGCTCTTGGAATCCGATTAAGCCATGTGAGTCGAGAGGGTTCGGCCCTCGGCAAACTGCTCAGTAACACGTGGATAACCTAACCTAAGGTGGGAGATAATCTCGGAAAACTGAGGCTAATATCCCATAGACCTTGATGACTGGAATGTTTTGAGGTTTAAAGTTACGACGCCTTAGGATGGGTCTGCGGCCTATCAGGTTGTAGTTAGTGTAAAGGACTAACTAGCCGACGACGGGTACGGGCCATGGGAGTGGTTGCCCGGAGATGGACTCTGAGACACGAGTCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTTGCAATGCGCGAAAGCGCGACAAGGGGATTCCAAGTGCATGCACTAAGTGTATGCTTTTCGTGAGTGTAAAAAGCTCACGGAATAAGGGCTGGGTAAGACTGGTGCCAGCCGCCGCGGTAATACCAGCGGCCCTAGTGGTGATCGTTTTTATTGGGCCTAAAGCGTCCGTAGCCGGTTCGGTAAATCTCTGGGTAAATCGTTGGGCTTAACCCAACGAATTCTGGGGAGACTGCCGAACTTGGGACCGGGAGAGGTCGGAGGTACTCCAGGGGTAGGGGTGAAATCCTGTAATCCTTGGGGGACCACCGGTGGCGAAAGCGTCCGACCAGAACGGGTCCGACGGTAAGGGACGAAGCCCTGGGTCGCGAACCGGATTAGATACCCGGGTAGTCCAGGGTGTAAACGCTGTGCGCTTGGTGTAGGGGGTCCTACGAGGGCATCCTGTGCCGGAGAGAAGTTGTTAAGCGCACCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACAGCAACGGGAGGAGCGTGCGGTTTAATTGGATTCAACGCCGGAAAACTCACCAGGGGCGACTGCCACATGAAGATCAAGCTGATGACTTTATCTGATTGGTAGAGAGGTGGTGCATGGCCGTCGTCAGTTCGTACCGTAGGGCGTTCTGTTAAGTCAGATAACGAACGAGACCCTTGCCCTTAATTGCCATGTTTCCCTCCGGGGGAACGGTACTTTAAGGGGACCGCTGGTGCAAAATCAGAGGAAGGGAAGGGCAACGGTAGGTCAGTATGCCCCGAATCCCCTGGGCAACACGCGCGCTACAAAGGCCGGGACAAAGGGTTCCGACACCGAGAGGTGAAGGTAATCCCGAAACCTGTCCGTAGTTCGGATCGAGGGCTGCAACCCGCCCTCGTGAAGCTGGATTCCGTAGTAATCGCAGATCAACATCCTGCGGTGAATATGCCCCTGCTCCTTGCACACACCGCCCGTCAAACCATCCGAGTGGAGTTTCGATGAGGGTGGGATTCTTGTCCTTCTCAAATCGCGATTTCGCAAGGAGGGTTAAGTCGTAACAAGGTAACC""" def label_to_name(x): fields = x.split() return '%s: %s' % (fields[3].split('/')[0], fields[0]) seqs = LoadSeqs(data=fasta_str.split('\n'),moltype=DNA,aligned=False,label_to_name=label_to_name) Now pick up with `Step 5 <./building_a_tree_of_life.html#step5>`_ above. PyCogent-1.5.3/doc/cookbook/building_alignments.rst000644 000765 000024 00000011762 12014677717 023406 0ustar00jrideoutstaff000000 000000 ******************* Building alignments ******************* .. 
authors, Gavin Huttley, Kristian Rother, Patrick Yannul

Using the cogent aligners
=========================

Running a pairwise Needleman-Wunsch alignment
---------------------------------------------

.. doctest::

    >>> from cogent.align.algorithm import nw_align
    >>> seq1 = 'AKSAMITNY'
    >>> seq2 = 'AKHSAMMIT'
    >>> print nw_align(seq1,seq2)
    ('AK-SAM-ITNY', 'AKHSAMMIT--')

Running a progressive aligner
-----------------------------

We import useful functions and then load the sequences to be aligned.

.. doctest::

    >>> from cogent import LoadSeqs, LoadTree, DNA
    >>> seqs = LoadSeqs('data/test2.fasta', aligned=False, moltype=DNA)

For nucleotides
^^^^^^^^^^^^^^^

We load a canned nucleotide substitution model and the progressive aligner ``TreeAlign`` function.

.. doctest::

    >>> from cogent.evolve.models import HKY85
    >>> from cogent.align.progressive import TreeAlign

We first align without providing a guide tree. The ``TreeAlign`` algorithm builds pairwise alignments and estimates the substitution model parameters and pairwise distances. The distances are used to build a neighbour joining tree, and the median values of the substitution model parameters are provided to the substitution model for the progressive alignment step.

.. doctest::

    >>> aln, tree = TreeAlign(HKY85(), seqs)
    Param Estimate Summary Stats: kappa
    ==============================
            Statistic        Value
    ------------------------------
                Count           10
                  Sum        1e+06
               Median        4.256
                 Mean        1e+05
    StandardDeviation    3.162e+05
             Variance        1e+11
    ------------------------------
    >>> aln
    5 x 60 text alignment: NineBande[-C-----GCCA...], Mouse[GCAGTGAGCCA...], DogFaced[GCAAGGAGCCA...], ...

We then align using a guide tree (pre-estimated) and specifying the ratio of transitions to transversions (kappa).

..
doctest:: >>> tree = LoadTree(treestring='(((NineBande:0.0128202449453,Mouse:0.184732725695):0.0289459522137,DogFaced:0.0456427810916):0.0271363715538,Human:0.0341320714654,HowlerMon:0.0188456837006)root;') >>> params={'kappa': 4.0} >>> aln, tree = TreeAlign(HKY85(), seqs, tree=tree, param_vals=params) >>> aln 5 x 60 text alignment: NineBande[-C-----GCCA...], Mouse[GCAGTGAGCCA...], DogFaced[GCAAGGAGCCA...], ... For codons ^^^^^^^^^^ We load a canned codon substitution model and use a pre-defined tree and parameter estimates. .. doctest:: >>> from cogent.evolve.models import MG94HKY >>> tree = LoadTree(treestring='((NineBande:0.0575781680031,Mouse:0.594704139406):0.078919659556,DogFaced:0.142151930069,(HowlerMon:0.0619991555435,Human:0.10343006422):0.0792423439112)') >>> params={'kappa': 4.0, 'omega': 1.3} >>> aln, tree = TreeAlign(MG94HKY(), seqs, tree=tree, param_vals=params) >>> aln 5 x 60 text alignment: NineBande[------CGCCA...], Mouse[GCAGTGAGCCA...], DogFaced[GCAAGGAGCCA...], ... Building alignments with 3rd-party apps such as muscle or clustalw ================================================================== See :ref:`alignment-controllers`. Converting gaps from aa-seq alignment to nuc seq alignment ========================================================== We load some unaligned DNA sequences and show their translation. .. doctest:: >>> from cogent import LoadSeqs, DNA, PROTEIN >>> seqs = [('hum', 'AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT'), ... ('mus', 'AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG'), ... 
('rat', 'CTGAACAAGCAGCCACTTTCAAACAAGAAA')] >>> unaligned_DNA = LoadSeqs(data=seqs, moltype = DNA, aligned = False) >>> print unaligned_DNA.toFasta() >hum AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT >mus AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG >rat CTGAACAAGCAGCCACTTTCAAACAAGAAA >>> print unaligned_DNA.getTranslation() >hum KQIQESSENGSLAARQERQAQVNLT >mus KQIQESGESGSLAARQERQAQVNLT >rat LNKQPLSNKK We load an alignment of these protein sequences. .. doctest:: >>> aligned_aa_seqs = [('hum', 'KQIQESSENGSLAARQERQAQVNLT'), ... ('mus', 'KQIQESGESGSLAARQERQAQVNLT'), ... ('rat', 'LNKQ------PLS---------NKK')] >>> aligned_aa = LoadSeqs(data = aligned_aa_seqs, moltype = PROTEIN) We then obtain an alignment of the DNA sequences from the alignment of their translation. .. doctest:: >>> aligned_DNA = aligned_aa.replaceSeqs(unaligned_DNA) >>> print aligned_DNA >hum AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT >mus AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG >rat CTGAACAAGCAG------------------CCACTTTCA---------------------------AACAAGAAA PyCogent-1.5.3/doc/cookbook/building_phylogenies.rst000644 000765 000024 00000016250 11523700277 023560 0ustar00jrideoutstaff000000 000000 ******************** Building phylogenies ******************** .. Anuj Pahwa, Gavin Huttley Built-in Phylogenetic reconstruction ==================================== By distance method ------------------ Given an alignment, a phylogenetic tree can be generated based on the pair-wise distance matrix computed from the alignment. Fast pairwise distance estimation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For a limited number of evolutionary models a fast implementation is available. Here we use the Tamura and Nei 1993 model. .. 
doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> from cogent.evolve.pairwise_distance import TN93Pair
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> dist_calc = TN93Pair(DNA, alignment=aln)
    >>> dist_calc.run()

We can obtain the distances as a ``dict`` for direct usage in phylogenetic reconstruction

.. doctest::

    >>> dists = dist_calc.getPairwiseDistances()

or as a table for display / saving

.. doctest::

    >>> print dist_calc.Dists[:4,:4] # truncated to fit screens
    Pairwise Distances
    ============================================
    Seq1 \ Seq2    Galago    HowlerMon    Rhesus
    --------------------------------------------
         Galago         *       0.2157    0.1962
      HowlerMon    0.2157            *    0.0736
         Rhesus    0.1962       0.0736         *
      Orangutan    0.1944       0.0719    0.0411
    --------------------------------------------

Other statistics are also available, such as the standard errors of the estimates.

.. doctest::

    >>> print dist_calc.StdErr[:4,:4] # truncated to fit screens
    Standard Error of Pairwise Distances
    ============================================
    Seq1 \ Seq2    Galago    HowlerMon    Rhesus
    --------------------------------------------
         Galago         *       0.0103    0.0096
      HowlerMon    0.0103            *    0.0054
         Rhesus    0.0096       0.0054         *
      Orangutan    0.0095       0.0053    0.0039
    --------------------------------------------

More general estimation of pairwise distances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The standard cogent likelihood function can also be used to estimate distances. Because these require numerical optimisation they can be significantly slower than the fast estimation approach above.

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> from cogent.phylo import distance
    >>> from cogent.evolve.models import F81
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> d = distance.EstimateDistances(aln, submodel=F81())
    >>> d.run()

The example above will use the F81 nucleotide substitution model and run the ``distance.EstimateDistances()`` method with the default options for the optimiser.
To configure the optimiser, a dictionary of optimisation options can be passed to the ``run`` command. The example below configures the ``Powell`` optimiser to run a maximum of 10000 evaluations, with a maximum of 5 restarts (a total of 5 x 10000 = 50000 evaluations).

.. doctest::

    >>> dist_opt_args = dict(max_restarts=5, max_evaluations=10000)
    >>> d.run(dist_opt_args=dist_opt_args)
    >>> print d
    ============================================================================================
    Seq1 \ Seq2    Galago    HowlerMon    Rhesus    Orangutan    Gorilla     Human    Chimpanzee
    --------------------------------------------------------------------------------------------
         Galago         *       0.2112    0.1930       0.1915     0.1891    0.1934        0.1892
      HowlerMon    0.2112            *    0.0729       0.0713     0.0693    0.0729        0.0697
         Rhesus    0.1930       0.0729         *       0.0410     0.0391    0.0421        0.0395
      Orangutan    0.1915       0.0713    0.0410            *     0.0136    0.0173        0.0140
        Gorilla    0.1891       0.0693    0.0391       0.0136          *    0.0086        0.0054
          Human    0.1934       0.0729    0.0421       0.0173     0.0086         *        0.0089
     Chimpanzee    0.1892       0.0697    0.0395       0.0140     0.0054    0.0089             *
    --------------------------------------------------------------------------------------------

Building A Phylogenetic Tree From Pairwise Distances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Phylogenetic trees can be built with the neighbour joining algorithm by providing a dictionary of pairwise distances. This dictionary can be obtained either from the output of ``distance.EstimateDistances()``

.. doctest::

    >>> from cogent.phylo import nj
    >>> njtree = nj.nj(d.getPairwiseDistances())
    >>> njtree = njtree.balanced()
    >>> print njtree.asciiArt()
                        /-Rhesus
              /edge.1--|
             |         |          /-HowlerMon
             |          \edge.0--|
             |                    \-Galago
    -root----|
             |--Orangutan
             |
             |          /-Human
              \edge.2--|
                       |          /-Gorilla
                        \edge.3--|
                                  \-Chimpanzee

Or created manually as shown below.

..
doctest::

    >>> dists = {('a', 'b'): 2.7, ('c', 'b'): 2.33, ('c', 'a'): 0.73}
    >>> njtree2 = nj.nj(dists)
    >>> print njtree2.asciiArt()
              /-a
             |
    -root----|--b
             |
              \-c

By least-squares
----------------

We illustrate phylogeny reconstruction by least-squares using the F81 substitution model. We use the advanced-stepwise addition algorithm to search tree space. Here ``a`` is the number of taxa for which all possible phylogenies are exhaustively evaluated. Successive taxa are then added to the top ``k`` trees (measured by the least-squares metric) and ``k`` trees are kept at each iteration.

.. doctest::

    >>> import cPickle
    >>> from cogent.phylo.least_squares import WLS
    >>> dists = cPickle.load(open('data/dists_for_phylo.pickle'))
    >>> ls = WLS(dists)
    >>> stat, tree = ls.trex(a = 5, k = 5, show_progress = False)

Other optional arguments that can be passed to the ``trex`` method are: ``return_all``, whether the ``k`` best trees at the final step are returned as a ``ScoredTreeCollection`` object; ``order``, a series of tip names whose order defines the sequence in which tips will be added during tree building (this allows the user to randomise the input order).

By ML
-----

We illustrate phylogeny reconstruction by maximum likelihood using the F81 substitution model, again using the advanced-stepwise addition algorithm to search tree space.

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> from cogent.phylo.maximum_likelihood import ML
    >>> from cogent.evolve.models import F81
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> ml = ML(F81(), aln)

The ``ML`` object also has the ``trex`` method and this can be used in the same way as above, i.e. ``ml.trex()``. We don't do that here because this is a very slow method for phylogenetic reconstruction.

Building phylogenies with 3rd-party apps such as FastTree or RAxML
==================================================================

A thorough description is provided in :ref:`appcontroller-phylogeny`.
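For exactly three taxa, as in the manual ``nj`` example above, no tree search is needed at all: the three branch lengths follow directly from the pairwise distances via the three-point formula, which is the base case neighbour joining reduces to. A quick sanity check in plain Python using those same distances:

```python
def three_taxon_branch_lengths(d_ab, d_ac, d_bc):
    """Solve for the branch lengths a, b, c joining three tips at a single
    internal node, given d_ab = a + b, d_ac = a + c, d_bc = b + c."""
    a = (d_ab + d_ac - d_bc) / 2.0
    b = (d_ab + d_bc - d_ac) / 2.0
    c = (d_ac + d_bc - d_ab) / 2.0
    return a, b, c

# distances from the manual example: (a,b)=2.7, (c,a)=0.73, (c,b)=2.33
a, b, c = three_taxon_branch_lengths(2.7, 0.73, 2.33)  # 0.55, 2.15, 0.18
```

With four or more taxa the system is over-determined, which is why least-squares and ML criteria (as in ``trex``) are needed to choose among topologies.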
PyCogent-1.5.3/doc/cookbook/checkpointing_long_running.rst000644 000765 000024 00000005127 11444532333 024760 0ustar00jrideoutstaff000000 000000
.. _checkpointing-optimisation:

Checkpointing optimisation runs
===============================

.. sectionauthor:: Gavin Huttley

A common requirement on HPC systems is ensuring that a long-running process can restart after interruption by restoring its last checkpointed state. The optimiser class code has this capability, as we illustrate here. We first construct a likelihood function object.

.. doctest::

    >>> from cogent import LoadSeqs, LoadTree
    >>> from cogent.evolve.models import F81
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sub_model = F81()
    >>> lf = sub_model.makeLikelihoodFunction(tree)
    >>> lf.setAlignment(aln)

We then start an optimisation, providing a filename for checkpointing and specifying a time interval (which we make very short here to ensure something gets written; for longer running functions the default ``interval`` setting is fine). Calling ``optimise`` then results in the notice that checkpoints are being written.

.. doctest::

    >>> checkpoint_fn = 'checkpoint_this.txt'
    >>> lf.optimise(filename=checkpoint_fn, interval=100, show_progress = False)
    CHECKPOINTING to file 'checkpoint_this.txt'...

Recovering from a real run that was interrupted generates an additional notification: ``RESUMING from file ..``. For the purpose of this snippet we just show that the checkpoint file exists.

.. doctest::

    >>> import cPickle
    >>> data = cPickle.load(open(checkpoint_fn))
    >>> print data

Checkpointing is also available for phylogenetic tree estimation. We first load some stored pairwise distances.

.. doctest::

    >>> import cPickle
    >>> dists = cPickle.load(open('data/dists_for_phylo.pickle'))

We make the weighted least-squares calculator.

.. doctest::

    >>> from cogent.phylo import distance, least_squares
    >>> ls = least_squares.WLS(dists)

We start searching for trees, providing the name of the file to checkpoint to.

..
doctest::

    >>> checkpoint_phylo_fn = 'checkpoint_phylo.txt'
    >>> score, tree = ls.trex(a = 5, k = 1, filename=checkpoint_phylo_fn, interval=100)

.. following cleans up files

.. doctest::
    :hide:

    >>> from cogent.util.misc import remove_files
    >>> remove_files([checkpoint_fn, checkpoint_phylo_fn], error_on_missing=False)

PyCogent-1.5.3/doc/cookbook/code_development.rst000644 000765 000024 00000000260 11305664061 022661 0ustar00jrideoutstaff000000 000000
****************
Code development
****************

Testing code
============

*To be written.*

Tests for stochastic functions
------------------------------

*To be written.*

PyCogent-1.5.3/doc/cookbook/community_analysis.rst000644 000765 000024 00000021045 11466065606 023310 0ustar00jrideoutstaff000000 000000
******************
Community analysis
******************

alpha diversity
===============

Phylogenetic Diversity (PD)
---------------------------

For each environment (i.e. sample), this calculates the amount of branch length in a phylogenetic tree that leads to its sequences. First we will load in a Newick formatted tree.

.. doctest::

    >>> from cogent.parse.tree import DndParser
    >>> from cogent.maths.unifrac.fast_tree import UniFracTreeNode
    >>> tree_in = open("data/Crump_example_tree_newick.txt")
    >>> tree = DndParser(tree_in, UniFracTreeNode)

Next we will load information on which sequences in the tree come from which environment.

.. doctest::

    >>> from cogent.maths.unifrac.fast_unifrac import count_envs
    >>> envs_in = open("data/Crump_et_al_example_env_file.txt")
    >>> envs = count_envs(envs_in)

Finally, we can calculate the PD values for each environment in the tree

.. doctest::

    >>> from cogent.maths.unifrac.fast_unifrac import PD_whole_tree
    >>> envs, PD_values = PD_whole_tree(tree, envs)
    >>> print envs
    ['E_FL', 'E_PA', 'O_FL', 'O_UN', 'R_FL', 'R_PA']
    >>> print PD_values#doctest: +SKIP
    [ 5.85389  7.60352  2.19215  2.81821  3.93728  3.7534 ]

``PD_values`` is a ``numpy`` ``array`` with the values representing each environment in ``envs``.
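The PD computation itself is conceptually simple: sum the lengths of every branch on a path from the sample's tips back to the root, counting shared branches only once. A toy illustration on a hand-built tree (plain Python; the node names and branch lengths are made up, and the real implementation works on parsed Newick trees):

```python
def phylogenetic_diversity(parents, lengths, tips):
    """Sum branch lengths on the union of root-ward paths from `tips`.
    `parents` maps node -> parent; `lengths` maps node -> branch length."""
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        while node in parents:  # walk toward the root
            if node in seen:
                break  # shared branches are counted once
            seen.add(node)
            total += lengths[node]
            node = parents[node]
    return total

# toy tree: root -> (i -> (t1, t2), t3)
parents = {'t1': 'i', 't2': 'i', 'i': 'root', 't3': 'root'}
lengths = {'t1': 0.1, 't2': 0.2, 'i': 0.5, 't3': 0.3}
pd = phylogenetic_diversity(parents, lengths, ['t1', 't2'])  # 0.1 + 0.2 + 0.5
```

Environments sharing deep branches thus get less total PD than the sum of their tip-to-root path lengths, which is what makes PD a useful diversity measure.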
Rarefaction ------------- *To be written.* Parametric methods ------------------ *To be written.* Nonparametric methods --------------------- *To be written.* beta diversity ============== Unifrac ------- The Fast UniFrac implementation in PyCogent is the source code for the `Fast UniFrac web tool `_ and the `QIIME pipeline `_ for Microbial community analysis. Calculate a UniFrac Distance Matrix and apply PCoA and UPGMA ------------------------------------------------------------ The UniFrac analysis is run on open tree and environment file objects. The resulting dictionary has a distance matrix of pairwise UniFrac values ('distance_matrix'), a Newick string representing the results of performing UPGMA clustering on this distance matrix ('cluster_envs') and the results of running Principal Coordinates Analysis on the distance matrix ('pcoa'). One can specify weighted UniFrac with ``weighted=True``. Here we run an unweighted analysis. .. doctest:: >>> from cogent.maths.unifrac.fast_unifrac import fast_unifrac_file >>> tree_in = open("data/Crump_example_tree_newick.txt") >>> envs_in = open("data/Crump_et_al_example_env_file.txt") >>> result = fast_unifrac_file(tree_in, envs_in, weighted=False) >>> print result['cluster_envs']#doctest: +SKIP ((((('E_FL':0.339607103063,'R_FL':0.339607103063):0.0279991540511,'R_PA':0.367606257114):0.0103026524101,'E_PA':0.377908909524):0.0223322024492,'O_UN':0.400241111973):0.00976759866402,'O_FL':0.410008710637); >>> print result['pcoa']#doctest: +SKIP ================================================================================================= Type Label vec_num-0 vec_num-1 vec_num-2 vec_num-3 vec_num-4 vec_num-5 ------------------------------------------------------------------------------------------------- Eigenvectors E_FL 0.05 0.22 -0.09 -0.26 -0.29 -0.00 Eigenvectors E_PA -0.36 0.24 0.21 -0.08 0.18 -0.00 Eigenvectors O_FL 0.32 -0.26 0.30 -0.13 0.05 -0.00 Eigenvectors O_UN -0.28 -0.40 -0.24 -0.04 0.01 -0.00 Eigenvectors R_FL 
0.29 0.18 -0.28 0.09 0.22 -0.00 Eigenvectors R_PA -0.02 0.02 0.11 0.42 -0.17 -0.00 Eigenvalues eigenvalues 0.40 0.36 0.29 0.27 0.19 -0.00 Eigenvalues var explained (%) 26.34 23.84 19.06 18.02 12.74 -0.00 ------------------------------------------------------------------------------------------------- Perform pairwise tests of whether samples are significantly different with UniFrac ---------------------------------------------------------------------------------- The analysis is run on open tree and environment file objects. In this example, we use unweighted unifrac (``weighted=False``), we permute the environment assignments on the tree 50 times (``num_iters=50``) and we perform UniFrac on all pairs of environments (``test_on="Pairwise"``). A list is returned with a tuple for each pairwise comparison with items: 0 - the first environment, 1 - the second environment, 2- the uncorrected p-value and 3 - the p-value after correcting for multiple comparisons with the Bonferroni correction. .. doctest:: >>> from cogent.maths.unifrac.fast_unifrac import fast_unifrac_permutations_file >>> tree_in = open("data/Crump_example_tree_newick.txt") >>> envs_in = open("data/Crump_et_al_example_env_file.txt") >>> result = fast_unifrac_permutations_file(tree_in, envs_in, weighted=False, num_iters=50, test_on="Pairwise") >>> print result[0]#doctest: +SKIP ('E_FL', 'E_PA', 0.17999999999999999, 1.0) Perform a single UniFrac significance test on the whole tree ------------------------------------------------------------ The analysis is run on open tree and environment file objects. In this example, we use weighted unifrac (``weighted=True``), we permute the environment assignments on the tree 50 times (``num_iters=50``) and we perform a unifrac significance test on the whole tree (``test_on="Tree"``). The resulting list has only one item since a single test was performed. It is a 3 item tuple where the second and third values are the p-value. .. 
doctest:: >>> from cogent.maths.unifrac.fast_unifrac import fast_unifrac_permutations_file >>> tree_in = open("data/Crump_example_tree_newick.txt") >>> envs_in = open("data/Crump_et_al_example_env_file.txt") >>> result = fast_unifrac_permutations_file(tree_in, envs_in, weighted=True, num_iters=50, test_on="Tree") >>> print result#doctest: +SKIP [('whole tree', 0.56000000000000005, 0.56000000000000005)] P-test ------- Perform pairwise tests of whether samples are significantly different with the P-test (Martin, 2002) ---------------------------------------------------------------------------------------------------- The analysis is run on open tree and environment file objects. In this example, we permute the environment assignments on the tree 50 times (``num_iters=50``) and perform the p test for all pairs of environments (``test_on="Pairwise"``). A list is returned with a tuple for each pairwise comparison with items: 0 - the first environment, 1 - the second environment, 2- the uncorrected p-value and 3 - the p-value after correcting for multiple comparisons with the Bonferroni correction. .. doctest:: >>> from cogent.maths.unifrac.fast_unifrac import fast_p_test_file >>> tree_in = open("data/Crump_example_tree_newick.txt") >>> envs_in = open("data/Crump_et_al_example_env_file.txt") >>> result = fast_p_test_file(tree_in, envs_in, num_iters=50, test_on="Pairwise") >>> print result[0]#doctest: +SKIP ('E_FL', 'E_PA', 0.040000000000000001, 0.59999999999999998) Taxon-based ----------- Computing a distance matrix between samples ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ PyCogent provides many different ways to compute pairwise distances between objects. ``cogent/maths/distance_transform.py`` provides a set of functions to calculate dissimilarities/distances between samples, given an abundance matrix. Here is one example: .. doctest:: >>> from cogent.maths.distance_transform import dist_euclidean >>> from numpy import array >>> abundance_data = array([[1, 3], ... 
[5, 2],
    ... [0.1, 22]],'float')

.. note:: See ``distance_transform.py`` for metrics other than euclidean.

We now have 3 samples, and the abundance of each column (e.g. species) in that sample. The first sample has 1 individual of species 1 and 3 individuals of species 2. We now compute the relatedness between these samples, using the euclidean distance between the rows:

.. doctest::

    >>> dists = dist_euclidean(abundance_data)
    >>> print str(dists.round(2)) # doctest: +SKIP
    [[  0.  ,  4.12, 19.02]
     [  4.12,  0.  , 20.59]
     [ 19.02, 20.59,  0.  ]]

This distance matrix can be visualized via multivariate reduction techniques such as :ref:`multivariate-analysis`.

Taxonomy
========

*To be written.*

.. need to decide on methods here

PyCogent-1.5.3/doc/cookbook/controlling_third_party_applications.rst000644 000765 000024 00000003154 11572746234 027074 0ustar00jrideoutstaff000000 000000
************************************
Controlling third party applications
************************************

Existing supported apps
=======================

Alignment apps
--------------

clustalw and muscle
^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> from cogent.app.clustalw import align_unaligned_seqs as clustal_aln
    >>> from cogent.app.muscle import align_unaligned_seqs as muscle_aln
    >>> seqs = LoadSeqs(filename='data/test2.fasta', aligned=False)
    >>> aln1 = clustal_aln(seqs, DNA)
    >>> aln2 = muscle_aln(seqs, DNA)
    >>> aln1 == aln2
    True
    >>> from cogent.app.fasttree import build_tree_from_alignment
    >>> tr = build_tree_from_alignment(aln1,moltype=DNA)
    >>> print tr.asciiArt()
              /-Mouse
             |
    ---------|--NineBande
             |
             |          /-DogFaced
              \0.508---|
                       |          /-HowlerMon
                        \0.752---|
                                  \-Human

And if you have matplotlib installed you can draw the tree (see :ref:`draw-trees`).

.. note:: Tree output based on v2.0.1

.. TODO add in cross-ref to drawing usage example

BLAST
-----

See :ref:`blast-usage`.
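Whether the tool is an aligner, a tree builder, or BLAST, an application controller is at heart a managed subprocess: build the command line, feed input, capture output, and check the exit status. A stripped-down sketch of that pattern in plain Python, using the Python interpreter itself as a stand-in for a real third-party tool (the ``run_app`` name is hypothetical, not PyCogent API):

```python
import subprocess
import sys

def run_app(argv, stdin_text=None):
    """Minimal application-controller pattern: run a command, capture
    stdout/stderr, and fail loudly on a non-zero exit status."""
    result = subprocess.run(
        argv, input=stdin_text, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError('%s failed: %s' % (argv[0], result.stderr))
    return result.stdout

# stand-in "application": echo stdin back upper-cased
out = run_app(
    [sys.executable, '-c',
     'import sys; sys.stdout.write(sys.stdin.read().upper())'],
    stdin_text='acgt\n')
```

PyCogent's controllers add to this skeleton the handling of parameter flags, temporary files, and parsing of each tool's output formats.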
Phylo apps ---------- *To be written.* RNA structure apps ------------------ *To be written.* Visualisation ------------- *To be written.* PyMol, MAGE ^^^^^^^^^^^ Adding new apps =============== *To be written.* Pipelining 3rd party apps ========================= *To be written.* .. integrating with cogent features .. grab seqs from genbank, align, build tree, cogent evolutionary analysis PyCogent-1.5.3/doc/cookbook/dealing_with_hts_data.rst000644 000765 000024 00000002636 11464143357 023671 0ustar00jrideoutstaff000000 000000 ********************* Dealing with HTS data ********************* FASTQ formatted files ===================== Parsing ------- FASTQ format can be exported by Illumina's pipeline software. .. doctest:: >>> from cogent.parse.fastq import MinimalFastqParser >>> for label, seq, qual in MinimalFastqParser('data/fastq.txt'): ... print label ... print seq ... print qual GAPC_0015:6:1:1259:10413#0/1 AACACCAAACTTCTCCACCACGTGAGCTACAAAAG ````Y^T]`]c^cabcacc`^Lb^ccYT\T\Y\WF GAPC_0015:6:1:1283:11957#0/1 TATGTATATATAACATATACATATATACATACATA ]KZ[PY]_[YY^```ac^\\`bT``c`\aT``bbb... Converting quality scores to numeric data ----------------------------------------- In FASTQ format, ASCII characters are used to represent base-call quality. Unfortunately, vendors differ in the range of characters used. According to their documentation, Illumina uses the character range from 64-104. We parse the sequence file and convert the characters into integers on the fly. .. doctest:: >>> from cogent.parse.fastq import MinimalFastqParser >>> for label, seq, qual in MinimalFastqParser('data/fastq.txt'): ... qual = map(lambda x: ord(x)-64, qual) ... print label ... print seq ... print qual GAPC_0015:6:1:1259:10413#0/1 AACACCAAACTTCTCCACCACGTGAGCTACAAAAG [32, 32, 32, 32, 25, 30, 20, 29, 32, 29, 35, 30, 35, 33, 34, 35, 33, ... PyCogent-1.5.3/doc/cookbook/DNA_and_RNA_sequences.rst000644 000765 000024 00000012155 11444532333 023352 0ustar00jrideoutstaff000000 000000 .. 
_dna-rna-seqs: DNA and RNA sequences --------------------- .. authors, Gavin Huttley, Kristian Rother, Patrick Yannul, Tom Elliott, Tony Walters, Meg Pirrung Creating a DNA sequence from a string ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ All sequence and alignment objects have a molecular type, or ``MolType`` which provides key properties for validating sequence characters. Here we use the ``DNA`` ``MolType`` to create a DNA sequence. .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence("AGTACACTGGT") >>> my_seq DnaSequence(AGTACAC... 11) >>> print my_seq AGTACACTGGT >>> str(my_seq) 'AGTACACTGGT' Creating a RNA sequence from a string ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import RNA >>> rnaseq = RNA.makeSequence('ACGUACGUACGUACGU') Converting to FASTA format ^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence('AGTACACTGGT') >>> print my_seq.toFasta() >0 AGTACACTGGT Convert a RNA sequence to FASTA format ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import RNA >>> rnaseq = RNA.makeSequence('ACGUACGUACGUACGU') >>> rnaseq.toFasta() '>0\nACGUACGUACGUACGU' Creating a named sequence ^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence('AGTACACTGGT','my_gene') >>> my_seq DnaSequence(AGTACAC... 11) >>> type(my_seq) Setting or changing the name of a sequence ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence('AGTACACTGGT') >>> my_seq.Name = 'my_gene' >>> print my_seq.toFasta() >my_gene AGTACACTGGT Complementing a DNA sequence ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence("AGTACACTGGT") >>> print my_seq.complement() TCATGTGACCA Reverse complementing a DNA sequence ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> print my_seq.reversecomplement() ACCAGTGTACT The ``rc`` method name is easier to type .. 
doctest:: >>> print my_seq.rc() ACCAGTGTACT .. _translation: Translate a ``DnaSequence`` to protein ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence('GCTTGGGAAAGTCAAATGGAA','protein-X') >>> pep = my_seq.getTranslation() >>> type(pep) >>> print pep.toFasta() >protein-X AWESQME Converting a DNA sequence to RNA ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence('ACGTACGTACGTACGT') >>> print my_seq.toRna() ACGUACGUACGUACGU Convert an RNA sequence to DNA ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import RNA >>> rnaseq = RNA.makeSequence('ACGUACGUACGUACGU') >>> print rnaseq.toDna() ACGTACGTACGTACGT Testing complementarity ^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import DNA >>> a = DNA.makeSequence("AGTACACTGGT") >>> a.canPair(a.complement()) False >>> a.canPair(a.reversecomplement()) True Joining two DNA sequences ^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence("AGTACACTGGT") >>> extra_seq = DNA.makeSequence("CTGAC") >>> long_seq = my_seq + extra_seq >>> long_seq DnaSequence(AGTACAC... 16) >>> str(long_seq) 'AGTACACTGGTCTGAC' Slicing DNA sequences ^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> my_seq[1:6] DnaSequence(GTACA) Getting 3rd positions from codons ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We'll do this by specifying the position indices of interest, creating a sequence ``Feature`` and using that to extract the positions. .. doctest:: >>> from cogent import DNA >>> seq = DNA.makeSequence('ATGATGATGATG') Creating the position indices, note that we start at the 2nd index (the 'first' codon's 3rd position) indicate each position as a *span* (``i -- i+1``). .. doctest:: >>> indices = [(i, i+1) for i in range(len(seq))[2::3]] Create the sequence feature and use it to slice the sequence. .. 
doctest:: >>> pos3 = seq.addFeature('pos3', 'pos3', indices) >>> pos3 = pos3.getSlice() >>> assert str(pos3) == 'GGGG' Getting 1st and 2nd positions from codons ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The only difference here to above is that our spans cover 2 positions. .. doctest:: >>> from cogent import DNA >>> seq = DNA.makeSequence('ATGATGATGATG') >>> indices = [(i, i+2) for i in range(len(seq))[::3]] >>> pos12 = seq.addFeature('pos12', 'pos12', indices) >>> pos12 = pos12.getSlice() >>> assert str(pos12) == 'ATATATAT' Return a randomized version of the sequence ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :: print rnaseq.shuffle() ACAACUGGCUCUGAUG Remove gaps from a sequence ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import RNA >>> s = RNA.makeSequence('--AUUAUGCUAU-UAu--') >>> print s.degap() AUUAUGCUAUUAU PyCogent-1.5.3/doc/cookbook/ensembl.rst000644 000765 000024 00000043471 12003174737 021007 0ustar00jrideoutstaff000000 000000 Note that much more extensive documentation is available in :ref:`query-ensembl`. Connecting ---------- .. Gavin Huttley `Ensembl `_ provides access to their MySQL databases directly or users can download and run those databases on a local machine. To use the Ensembl's UK servers for running queries, nothing special needs to be done as this is the default setting for PyCogent's ``ensembl`` module. To use a different Ensembl installation, you create an account instance: .. doctest:: >>> from cogent.db.ensembl import HostAccount >>> account = HostAccount('fastcomputer.topuni.edu', 'username', ... 'canthackthis') To specify a specific port to connect to MySQL on: .. doctest:: >>> from cogent.db.ensembl import HostAccount >>> account = HostAccount('anensembl.server.edu', 'someuser', ... 'somepass', port=3306) .. we create valid account now to work on my local machines here at ANU .. 
doctest:: :hide: >>> import os >>> hotsname, uname, passwd = os.environ['ENSEMBL_ACCOUNT'].split() >>> account = HostAccount(hotsname, uname, passwd) Species to be queried --------------------- To see what existing species are available .. doctest:: >>> from cogent.db.ensembl import Species >>> print Species ================================================================================ Common Name Species Name Ensembl Db Prefix -------------------------------------------------------------------------------- A.aegypti Aedes aegypti aedes_aegypti A.clavatus Aspergillus clavatus aspergillus_clavatus... If Ensembl has added a new species which is not yet included in ``Species``, you can add it yourself. .. doctest:: >>> Species.amendSpecies('A latinname', 'a common name') You can get the common name for a species .. doctest:: >>> Species.getCommonName('Procavia capensis') 'Rock hyrax' and the Ensembl database name prefix which will be used for all databases for this species. .. doctest:: >>> Species.getEnsemblDbPrefix('Procavia capensis') 'procavia_capensis' Species common names are used to construct attributes on PyCogent ``Compara`` instances). You can get the name that will be using the ``getComparaName`` method. For species with a real common name .. doctest:: >>> Species.getComparaName('Procavia capensis') 'RockHyrax' or with a shortened species name .. doctest:: >>> Species.getComparaName('Caenorhabditis remanei') 'Cremanei' Get genomic features -------------------- Find a gene by gene symbol ^^^^^^^^^^^^^^^^^^^^^^^^^^ We query for the *BRCA2* gene for humans. .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> print human Genome(Species='Homo sapiens'; Release='67') >>> genes = human.getGenesMatching(Symbol='BRCA2') >>> for gene in genes: ... if gene.Symbol == 'BRCA2': ... print gene ... 
break Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='breast cancer 2,...'; StableId='ENSG00000139618'; Status='KNOWN'; Symbol='BRCA2') Find a gene by Ensembl Stable ID ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We use the stable ID for *BRCA2*. .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> gene = human.getGeneByStableId(StableId='ENSG00000139618') >>> print gene Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='breast cancer 2,...'; StableId='ENSG00000139618'; Status='KNOWN'; Symbol='BRCA2') Find genes matching a description ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We look for breast cancer related genes that are estrogen induced. .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> genes = human.getGenesMatching(Description='breast cancer anti-estrogen') >>> for gene in genes: ... print gene Gene(Species='Homo sapiens'; BioType='lincRNA'; Description='breast cancer anti-estrogen...'; StableId='ENSG00000262117'; Status='NOVEL'; Symbol='BCAR4')... We can also require that an exact (case insensitive) match to the word(s) occurs within the description by setting ``like=False``. .. doctest:: >>> genes = human.getGenesMatching(Description='breast cancer anti-estrogen', ... like=False) >>> for gene in genes: ... print gene Gene(Species='Homo sapiens'; BioType='lincRNA'; Description='breast cancer anti-estrogen...'; StableId='ENSG00000262117'; Status='NOVEL'; Symbol='BCAR4')... Get canonical transcript for a gene ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We get the canonical transcripts for *BRCA2*. .. 
doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618') >>> transcript = brca2.CanonicalTranscript >>> print transcript Transcript(Species='Homo sapiens'; CoordName='13'; Start=32889610; End=32973347; length=83737; Strand='+') Get the CDS for a transcript ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618') >>> transcript = brca2.CanonicalTranscript >>> cds = transcript.Cds >>> print type(cds) >>> print cds ATGCCTATTGGATCCAAAGAGAGGCCA... Look at all transcripts for a gene ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618') >>> for transcript in brca2.Transcripts: ... print transcript Transcript(Species='Homo sapiens'; CoordName='13'; Start=32889610; End=32973347; length=83737; Strand='+') Transcript(Species='Homo sapiens'; CoordName='13'; Start=32889641; End=32907428; length=17787; Strand='+')... Get the first exon for a transcript ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We show just for the canonical transcript. .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618') >>> print brca2.CanonicalTranscript.Exons[0] Exon(StableId=ENSE00001184784, Rank=1) Get the introns for a transcript ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We show just for the canonical transcript. .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618') >>> for intron in brca2.CanonicalTranscript.Introns: ... 
print intron Intron(TranscriptId=ENST00000380152, Rank=1) Intron(TranscriptId=ENST00000380152, Rank=2) Intron(TranscriptId=ENST00000380152, Rank=3)... Inspect the genomic coordinate for a feature ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618') >>> print brca2.Location.CoordName 13 >>> print brca2.Location.Start 32889610 >>> print brca2.Location.Strand 1 Get repeat elements in a genomic interval ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We query the genome for repeats within a specific coordinate range on chromosome 13. .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> repeats = human.getFeatures(CoordName='13', Start=32879610, End=32889610, feature_types='repeat') >>> for repeat in repeats: ... print repeat.RepeatClass ... print repeat ... break SINE/Alu Repeat(CoordName='13'; Start=32879362; End=32879662; length=300; Strand='-', Score=2479.0) Get CpG island elements in a genomic interval ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We query the genome for CpG islands within a specific coordinate range on chromosome 11. .. doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> islands = human.getFeatures(CoordName='11', Start=2150341, End=2170833, feature_types='cpg') >>> for island in islands: ... print island ... break CpGisland(CoordName='11'; Start=2158951; End=2162484; length=3533; Strand='-', Score=3254.0) Get SNPs -------- For a gene ^^^^^^^^^^ We find the genetic variants for the canonical transcript of *BRCA2*. .. note:: The output is significantly truncated! .. 
doctest:: >>> from cogent.db.ensembl import Genome >>> human = Genome('human', Release=67, account=account) >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618') >>> transcript = brca2.CanonicalTranscript >>> print transcript.Variants (>> for variant in transcript.Variants: ... print variant ... break Variation(Symbol='rs55880202'; Effect=['2KB_upstream_variant', '5_prime_UTR_variant', '5KB_upstream_variant']; Alleles='C/T')... Get a single SNP ^^^^^^^^^^^^^^^^ We get a single SNP and print it's allele frequencies. .. doctest:: >>> snp = list(human.getVariation(Symbol='rs34213141'))[0] >>> print snp.AlleleFreqs ============================= allele freq sample_id ----------------------------- A 0.0303 933 G 0.9697 933 G 1.0000 11208 G 1.0000 11519 A 0.0110 113559 G 0.9889 113559... What alignment types available ------------------------------ We create a ``Compara`` instance for human, chimpanzee and macaque. .. doctest:: >>> from cogent.db.ensembl import Compara >>> compara = Compara(['human', 'chimp', 'macaque'], Release=67, ... account=account) >>> print compara.method_species_links Align Methods/Clades =================================================================================================================== method_link_species_set_id method_link_id species_set_id align_method align_clade ------------------------------------------------------------------------------------------------------------------- 580 10 34468 PECAN 19 amniota vertebrates Pecan 548 13 34115 EPO 6 primates EPO 578 13 34466 EPO 12 eutherian mammals EPO 582 14 34697 EPO_LOW_COVERAGE 35 eutherian mammals EPO_LOW_COVERAGE ------------------------------------------------------------------------------------------------------------------- Get genomic alignment for a gene region --------------------------------------- We first get the syntenic region corresponding to human gene *BRCA2*. .. 
doctest:: >>> from cogent.db.ensembl import Compara >>> compara = Compara(['human', 'chimp', 'macaque'], Release=67, ... account=account) >>> human_brca2 = compara.Human.getGeneByStableId(StableId='ENSG00000139618') >>> regions = compara.getSyntenicRegions(region=human_brca2, align_method='EPO', align_clade='primates') >>> for region in regions: ... print region SyntenicRegions: Coordinate(Human,chro...,13,32889610-32973805,1) Coordinate(Macaque,chro...,17,11686607-11778803,1) Coordinate(Chimp,chro...,13,31957346-32040817,1)... We then get a cogent ``Alignment`` object, requesting that sequences be annotated for gene spans. .. doctest:: >>> aln = region.getAlignment(feature_types='gene') >>> print repr(aln) 3 x 98805 dna alignment: Homo sapiens:chromosome:13:3288... Parsing syntenic regions ------------------------ Not all regions in a given genome have a syntenic alignment, and some have more than one alignment. To illustrate these cases, we can consider an alignment between mouse and human, using the ``PECAN`` alignment method in the vertebrates clade: .. doctest:: >>> species = ["mouse", "human"] >>> compara = Compara(species, Release=67, account=account) >>> clade = "vertebrates" >>> chrom, start, end, strand = "X", 165754928, 165755079, "-" >>> regions = compara.getSyntenicRegions(Species="mouse", CoordName=chrom, ... Start=start, End=end, align_method="PECAN", ... align_clade=clade, Strand=strand) >>> aligned_pairs = [r for r in regions] >>> alignment = aligned_pairs[0] >>> aligned_regions = [m for m in alignment.Members ... if m.Region is not None] >>> source_region, target_region = aligned_regions >>> print source_region.Location.CoordName, source_region.Location.Start, source_region.Location.End X 165754928 165755079 >>> print target_region.Location.CoordName, target_region.Location.Start, target_region.Location.End X 11132954 11133105 .. note:: We took the aligned regions from the ``regions`` generator and put them in a list for convenience. 
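Since the two regions reported above have equal length, a rough position mapping between them is simple arithmetic on offsets. The sketch below (plain Python, hypothetical ``map_position`` helper) assumes a gap-free pairing; real EPO/PECAN alignments contain indels, so PyCogent maps positions through the alignment itself:

```python
def map_position(src_start, src_end, tgt_start, tgt_end, pos, same_strand=True):
    """Map `pos` in the source region to the target region, assuming a
    gap-free, length-preserving pairing (a simplifying assumption)."""
    assert src_start <= pos < src_end, 'position outside source region'
    assert src_end - src_start == tgt_end - tgt_start, 'length mismatch'
    offset = pos - src_start
    if same_strand:
        return tgt_start + offset
    # opposite strand: offsets count back from the far end of the target
    return tgt_end - 1 - offset

# the mouse/human pairing above: X:165754928-165755079 <-> X:11132954-11133105
human_pos = map_position(165754928, 165755079, 11132954, 11133105, 165754938)
```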
If there are no regions returned (i.e. ``aligned_pairs`` is empty), then no alignment could be found. In the case of the above region, an exon in the *Hccs* gene, there is only one alignment. We then accessed the coordinates of the alignment using the ``Members`` attribute of the region. Each element of ``aligned_regions`` is a ``SyntenicRegion`` instance, whose coordinates can be pulled from the ``Location`` attribute. This example shows that mouse region ``X:165754928-165755079`` aligns only to human region ``X:11132954-11133105``. .. note:: Sometimes, the genomic coordinates given to ``getSyntenicRegions`` will contain multiple alignments between the pair of genomes, in which case two or more regions will be returned in ``aligned_pairs``. Getting related genes --------------------- What gene relationships are available ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent.db.ensembl import Compara >>> compara = Compara(['human', 'chimp', 'macaque'], Release=67, ... account=account) >>> print compara.getDistinct('relationship') [u'ortholog_one2many', u'contiguous_gene_split', u'ortholog_one2one',... Get one-to-one orthologs ^^^^^^^^^^^^^^^^^^^^^^^^ We get the one-to-one orthologs for *BRCA2*. .. doctest:: >>> from cogent.db.ensembl import Compara >>> compara = Compara(['human', 'chimp', 'macaque'], Release=67, ... account=account) >>> orthologs = compara.getRelatedGenes(StableId='ENSG00000139618', ... Relationship='ortholog_one2one') >>> print orthologs RelatedGenes: Relationships=ortholog_one2one Gene(Species='Macaca mulatta'; BioType='protein_coding'; Description=... We iterate over the related members. .. doctest:: >>> for ortholog in orthologs.Members: ... print ortholog Gene(Species='Macaca mulatta'; BioType='protein_coding'; Description=... We get statistics on the ortholog CDS lengths. .. doctest:: >>> print orthologs.getMaxCdsLengths() [10008, 10257, 10257] We get the sequences as a sequence collection, with gene annotations. ..
doctest:: >>> seqs = orthologs.getSeqCollection(feature_types='gene') Get CDS for all one-to-one orthologs ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We sample all one-to-one orthologs for a group of species, generating a FASTA formatted string that can be written to file. We check all species have an ortholog and that all are translatable. .. doctest:: >>> from cogent.core.alphabet import AlphabetError >>> common_names = ["mouse", "rat", "human", "opossum"] >>> latin_names = set([Species.getSpeciesName(n) for n in common_names]) >>> latin_to_common = dict(zip(latin_names, common_names)) >>> compara = Compara(common_names, Release=67, account=account) >>> for gene in compara.Human.getGenesMatching(BioType='protein_coding'): ... orthologs = compara.getRelatedGenes(gene, ... Relationship='ortholog_one2one') ... # make sure all species represented ... if orthologs is None or orthologs.getSpeciesSet() != latin_names: ... continue ... seqs = [] ... for m in orthologs.Members: ... try: # if sequence can't be translated, we ignore it ... # get the CDS without the ending stop ... seq = m.CanonicalTranscript.Cds.withoutTerminalStopCodon() ... # make the sequence name ... seq.Name = '%s:%s:%s' % \ ... (latin_to_common[m.genome.Species], m.StableId, m.Location) ... aa = seq.getTranslation() ... seqs += [seq] ... except (AlphabetError, AssertionError): ... seqs = [] # exclude this gene ... break ... if len(seqs) == len(common_names): ... fasta = '\n'.join(s.toFasta() for s in seqs) ... break Get within species paralogs ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> paralogs = compara.getRelatedGenes(StableId='ENSG00000164032', ... Relationship='within_species_paralog') >>> print paralogs RelatedGenes: Relationships=within_species_paralog Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='H2A histone... 
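The CDS-sampling loop above keeps a gene only when every species has a one-to-one ortholog and every CDS translates cleanly. The same filtering logic can be sketched in plain Python; the helper names and the explicit stop-codon check below are illustrative stand-ins, not part of cogent:

```python
# Stop codons of the standard nuclear genetic code
STOPS = {"TAA", "TAG", "TGA"}

def is_translatable(cds):
    """True when the CDS is a whole number of codons with no internal stop."""
    if len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # a terminal stop codon is fine; an internal one is not
    return not any(codon in STOPS for codon in codons[:-1])

def keep_gene(orthologs, required_species):
    """Keep a gene only if all species are present and every CDS translates."""
    if set(orthologs) != set(required_species):
        return False
    return all(is_translatable(cds) for cds in orthologs.values())

good = {"mouse": "ATGCACTGGTAA", "human": "ATGCATTGGTAA"}
bad = {"mouse": "ATGTAACACTAA", "human": "ATGCATTGGTAA"}  # internal stop in mouse

print(keep_gene(good, ["mouse", "human"]))          # True
print(keep_gene(bad, ["mouse", "human"]))           # False
print(keep_gene(good, ["mouse", "human", "rat"]))   # False: rat ortholog missing
```

In the real loop, cogent raises ``AlphabetError`` (or an ``AssertionError``) on an untranslatable CDS, which plays the role of the explicit check here.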
PyCogent-1.5.3/doc/cookbook/genetic_code.rst000644 000765 000024 00000010215 11541775513 021765 0ustar00jrideoutstaff000000 000000 Getting a genetic code ---------------------- The standard genetic code. .. doctest:: >>> from cogent.core.genetic_code import GeneticCodes >>> standard_code = GeneticCodes[1] The vertebrate mt genetic code. .. doctest:: >>> from cogent.core.genetic_code import GeneticCodes >>> mt_gc = GeneticCodes[2] >>> print mt_gc.Name Vertebrate Mitochondrial To see the key -> genetic code mapping, use a loop. .. doctest:: >>> for key, code in GeneticCodes.items(): ... print key, code.Name 1 Standard Nuclear 2 Vertebrate Mitochondrial 3 Yeast Mitochondrial... Translate DNA sequences ----------------------- .. doctest:: >>> from cogent.core.genetic_code import DEFAULT as standard_code >>> standard_code.translate('TTTGCAAAC') 'FAN' Conversion to a ``ProteinSequence`` from a ``DnaSequence`` is shown in :ref:`translation`. Translate all six frames ------------------------ .. doctest:: >>> from cogent import DNA >>> from cogent.core.genetic_code import DEFAULT as standard_code >>> seq = DNA.makeSequence('ATGCTAACATAAA') >>> translations = standard_code.sixframes(seq) >>> print translations ['MLT*', 'C*HK', 'ANI', 'FMLA', 'LC*H', 'YVS'] Find out how many stops in a frame ---------------------------------- .. doctest:: >>> from cogent import DNA >>> from cogent.core.genetic_code import DEFAULT as standard_code >>> seq = DNA.makeSequence('ATGCTAACATAAA') >>> stops_frame1 = standard_code.getStopIndices(seq, start=0) >>> stops_frame1 [9] >>> stop_index = stops_frame1[0] >>> seq[stop_index:stop_index+3] DnaSequence(TAA) Translate a codon ----------------- .. doctest:: >>> from cogent.core.genetic_code import DEFAULT as standard_code >>> standard_code['TTT'] 'F' or get the codons for a single amino acid .. 
doctest:: >>> standard_code['A'] ['GCT', 'GCC', 'GCA', 'GCG'] Look up the amino acid corresponding to a single codon ------------------------------------------------------ .. doctest:: >>> from cogent.core.genetic_code import DEFAULT as standard_code >>> standard_code['TTT'] 'F' Or get all the codons for one amino acid ---------------------------------------- .. doctest:: >>> standard_code['A'] ['GCT', 'GCC', 'GCA', 'GCG'] For a group of amino acids -------------------------- .. doctest:: >>> targets = ['A','C'] >>> codons = [standard_code[aa] for aa in targets] >>> codons [['GCT', 'GCC', 'GCA', 'GCG'], ['TGT', 'TGC']] >>> flat_list = sum(codons,[]) >>> flat_list ['GCT', 'GCC', 'GCA', 'GCG', 'TGT', 'TGC'] Converting the ``CodonAlphabet`` to codon series ------------------------------------------------ .. doctest:: >>> from cogent import DNA >>> my_seq = DNA.makeSequence("AGTACACTGGTT") >>> sorted(my_seq.CodonAlphabet()) ['AAA', 'AAC', 'AAG', 'AAT'... >>> len(my_seq.CodonAlphabet()) 61 Obtaining the codons from a ``DnaSequence`` object -------------------------------------------------- Use the method ``getInMotifSize`` .. doctest:: >>> from cogent import LoadSeqs,DNA >>> my_seq = DNA.makeSequence('ATGCACTGGTAA','my_gene') >>> codons = my_seq.getInMotifSize(3) >>> print codons ['ATG', 'CAC', 'TGG', 'TAA'] You can't translate a sequence that contains a stop codon. .. doctest:: >>> pep = my_seq.getTranslation() Traceback (most recent call last): AlphabetError: TAA Remove the stop codon first ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import LoadSeqs,DNA >>> my_seq = DNA.makeSequence('ATGCACTGGTAA','my_gene') >>> seq = my_seq.withoutTerminalStopCodon() >>> pep = seq.getTranslation() >>> print pep.toFasta() >my_gene MHW >>> print type(pep) Or we can just grab the correct slice from the ``DnaSequence`` object ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
doctest:: >>> from cogent import LoadSeqs,DNA >>> my_seq = DNA.makeSequence('CAAATGTATTAA','my_gene') >>> pep = my_seq[:-3].getTranslation().toFasta() >>> print pep >my_gene QMY PyCogent-1.5.3/doc/cookbook/hpc_environments.rst000644 000765 000024 00000000330 11642035237 022725 0ustar00jrideoutstaff000000 000000 **************** HPC environments **************** .. sectionauthor Gavin Huttley .. toctree:: :maxdepth: 1 parallel_tasks_with_mpi parallel_tasks_with_multiprocess checkpointing_long_running PyCogent-1.5.3/doc/cookbook/index.rst000644 000765 000024 00000001243 11610676134 020461 0ustar00jrideoutstaff000000 000000 ################# PyCogent Cookbook ################# Contents: .. toctree:: :maxdepth: 3 introduction tips_for_using_python manipulating_biological_data accessing_databases analysis_of_sequence_composition sequence_similarity_search controlling_third_party_applications blast building_alignments building_a_tree_of_life building_phylogenies using_likelihood_to_perform_evolutionary_analyses sequence_simulation standard_statistical_analyses multivariate_data_analysis community_analysis dealing_with_hts_data hpc_environments useful_utilities code_development .. managing_alphabet PyCogent-1.5.3/doc/cookbook/introduction.rst000644 000765 000024 00000001270 11444532333 022070 0ustar00jrideoutstaff000000 000000 ************ Introduction ************ The cookbook remains incomplete, with many sections not yet written. Rather than wait until it is complete, we provide it as is. What we hope is clear from the Cookbook is that PyCogent has extensive capabilities covering many areas of genomic biology including, but not limited to: comparative genomics, metagenomics and systems biology. If you have a particular need for an example falling in any of the sections, please post a documentation request on the project's forums_ page. Better yet, if you write one and attach it to such a posting we'll include it! ..
_forums: http://sourceforge.net/projects/pycogent/forums PyCogent-1.5.3/doc/cookbook/loading_sequences.rst000755 000765 000024 00000016270 11444532333 023050 0ustar00jrideoutstaff000000 000000 .. _load-seqs: Loading nucleotide, protein sequences ------------------------------------- .. author, Tony Walters, Tom Elliott, Gavin Huttley ``LoadSeqs`` from a file ^^^^^^^^^^^^^^^^^^^^^^^^ As an alignment """"""""""""""" The function ``LoadSeqs()`` creates either a sequence collection or an alignment depending on the keyword argument ``aligned`` (the default is ``True``). .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs('data/long_testseqs.fasta', moltype=DNA) >>> type(aln) This example and some of the following use the :download:`long_testseqs.fasta <../data/long_testseqs.fasta>` file. As a sequence collection (unaligned) """""""""""""""""""""""""""""""""""" Setting the ``LoadSeqs()`` function keyword argument ``aligned=False`` returns a sequence collection. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> seqs = LoadSeqs('data/long_testseqs.fasta', moltype=DNA, aligned=False) >>> print type(seqs) .. note:: An alignment can be sliced, but a ``SequenceCollection`` can not. Specifying the file format """""""""""""""""""""""""" ``LoadSeqs()`` uses the filename suffix to infer the file format. This can be overridden using the ``format`` argument. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> aln = LoadSeqs('data/long_testseqs.fasta', moltype=DNA, ... format='fasta') ... >>> aln 5 x 2532 dna alignment: Human[TGTGGCACAAA... ``LoadSeqs`` from a series of strings ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import LoadSeqs >>> seqs = ['>seq1','AATCG-A','>seq2','AATCGGA'] >>> seqs_loaded = LoadSeqs(data=seqs) >>> print seqs_loaded >seq1 AATCG-A >seq2 AATCGGA ``LoadSeqs`` from a dict of strings ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
doctest:: >>> from cogent import LoadSeqs >>> seqs = {'seq1': 'AATCG-A','seq2': 'AATCGGA'} >>> seqs_loaded = LoadSeqs(data=seqs) Specifying the sequence molecular type ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Simple case of loading a ``list`` of aligned amino acid sequences in FASTA format, with and without molecule type specification. When the ``MolType`` is not specified it defaults to ASCII. .. doctest:: >>> from cogent import LoadSeqs >>> from cogent import DNA, PROTEIN >>> protein_seqs = ['>seq1','DEKQL-RG','>seq2','DDK--SRG'] >>> proteins_loaded = LoadSeqs(data=protein_seqs) >>> proteins_loaded.MolType MolType(('a', 'b', 'c', 'd', 'e', ... >>> print proteins_loaded >seq1 DEKQL-RG >seq2 DDK--SRG >>> proteins_loaded = LoadSeqs(data=protein_seqs, moltype=PROTEIN) >>> print proteins_loaded >seq1 DEKQL-RG >seq2 DDK--SRG Stripping label characters on loading ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Load a list of aligned nucleotide sequences, while specifying the DNA molecule type and stripping the comments from the label. In this example, stripping is accomplished by passing a function that removes everything after the first whitespace to the ``label_to_name`` parameter. .. doctest:: >>> from cogent import LoadSeqs, DNA >>> DNA_seqs = ['>sample1 Mus musculus','AACCTGC--C','>sample2 Gallus gallus','AAC-TGCAAC'] >>> loaded_seqs = LoadSeqs(data=DNA_seqs, moltype=DNA, label_to_name=lambda x: x.split()[0]) >>> print loaded_seqs >sample1 AACCTGC--C >sample2 AAC-TGCAAC Using alternative constructors for the `Alignment` object ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ An example of using an alternative constructor is given below. A constructor is passed to the aligned parameter in lieu of ``True`` or ``False``. .. 
doctest:: >>> from cogent import LoadSeqs >>> from cogent.core.alignment import DenseAlignment >>> seqs = ['>seq1','AATCG-A','>seq2','AATCGGA'] >>> seqs_loaded = LoadSeqs(data=seqs,aligned=DenseAlignment) >>> print seqs_loaded >seq1 AATCG-A >seq2 AATCGGA Loading sequences using format parsers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``LoadSeqs`` is just a convenience interface to format parsers. It can sometimes be more effective to use the parsers directly, say when you don't want to load everything into memory. Loading FASTA sequences from an open file or list of lines """""""""""""""""""""""""""""""""""""""""""""""""""""""""" To load FASTA formatted sequences directly, you can use the ``MinimalFastaParser``. .. note:: This returns the sequences as strings. .. doctest:: >>> from cogent.parse.fasta import MinimalFastaParser >>> f=open('data/long_testseqs.fasta') >>> seqs = [(name, seq) for name, seq in MinimalFastaParser(f)] >>> print seqs [('Human', 'TGTGGCACAAATAC... Handling overloaded FASTA sequence labels """"""""""""""""""""""""""""""""""""""""" The FASTA label field is frequently overloaded, with different information fields present in the field and separated by some delimiter. This can be flexibly addressed using the ``LabelParser``. By creating a custom label parser, we can decide which part we use as the sequence name. We show how to convert a field into something specific. .. doctest:: >>> from cogent.parse.fasta import LabelParser >>> def latin_to_common(latin): ... return {'Homo sapiens': 'human', ... 'Pan troglodytes': 'chimp'}[latin] >>> label_parser = LabelParser("%(species)s", ... [[1, "species", latin_to_common]], split_with=':') >>> for label in ">abcd:Homo sapiens:misc", ">abcd:Pan troglodytes:misc": ... label = label_parser(label) ... print label, type(label) human chimp The ``RichLabel`` objects have an ``Info`` object as an attribute, allowing specific reference to all the specified label fields. ..
doctest:: >>> from cogent.parse.fasta import MinimalFastaParser, LabelParser >>> fasta_data = ['>gi|10047090|ref|NP_055147.1| small muscle protein, X-linked [Homo sapiens]', ... 'MNMSKQPVSNVRAIQANINIPMGAFRPGAGQPPRRKECTPEVEEGVPPTSDEEKKPIPGAKKLPGPAVNL', ... 'SEIQNIKSELKYVPKAEQ', ... '>gi|10047092|ref|NP_037391.1| neuronal protein [Homo sapiens]', ... 'MANRGPSYGLSREVQEKIEQKYDADLENKLVDWIILQCAEDIEHPPPGRAHFQKWLMDGTVLCKLINSLY', ... 'PPGQEPIPKISESKMAFKQMEQISQFLKAAETYGVRTTDIFQTVDLWEGKDMAAVQRTLMALGSVAVTKD'] ... >>> label_to_name = LabelParser("%(ref)s", ... [[1,"gi", str], ... [3, "ref", str], ... [4, "description", str]], ... split_with="|") ... >>> for name, seq in MinimalFastaParser(fasta_data, label_to_name=label_to_name): ... print name ... print name.Info.gi ... print name.Info.description NP_055147.1 10047090 small muscle protein, X-linked [Homo sapiens] NP_037391.1 10047092 neuronal protein [Homo sapiens] Loading DNA sequences from a GenBank file """"""""""""""""""""""""""""""""""""""""" .. todo:: get sample data for this *To be written.* PyCogent-1.5.3/doc/cookbook/managing_trees.rst000644 000765 000024 00000000155 11444532333 022333 0ustar00jrideoutstaff000000 000000 Managing Trees -------------- .. authors, Dan Knights PhyloNode methods ^^^^^^^^^^^^^^^^^ *To be written.*PyCogent-1.5.3/doc/cookbook/manipulating_biological_data.rst000644 000765 000024 00000001265 11444532333 025220 0ustar00jrideoutstaff000000 000000 **************************** Manipulating biological data **************************** .. authors, Gavin Huttley, Kristian Rother, Patrick Yannul Sequences ========= .. toctree:: :maxdepth: 1 loading_sequences moltypesequence alphabet DNA_and_RNA_sequences protein_sequences alignments annotations Genetic code ============ .. toctree:: :maxdepth: 1 genetic_code Trees ===== .. toctree:: :maxdepth: 1 simple_trees phylonodes managing_trees Tables ====== .. toctree:: :maxdepth: 1 tables Structures ========== .. 
toctree:: :maxdepth: 1 structural_data structural_data_2 structural_contacts PyCogent-1.5.3/doc/cookbook/moltypesequence.rst000644 000765 000024 00000010150 11444532333 022566 0ustar00jrideoutstaff000000 000000 ********************************************** Using the ``MolType`` and ``Sequence`` objects ********************************************** .. authors Meg Pirrung MolType ======= ``MolType`` provides services for resolving ambiguities, or providing the correct ambiguity for recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments. One issue with ``MolType``'s is that they need to know about ``Sequence``, ``Alphabet``, and other objects, but, at the same time, those objects need to know about the ``MolType``. It is thus essential that the connection between these other types and the ``MolType`` can be made after the objects are created. Setting up a ``MolType`` object with an RNA sequence ---------------------------------------------------- .. doctest:: >>> from cogent.core.moltype import MolType, IUPAC_RNA_chars,\ ... IUPAC_RNA_ambiguities, RnaStandardPairs, RnaMW,\ ... IUPAC_RNA_ambiguities_complements >>> from cogent.core.sequence import NucleicAcidSequence >>> testrnaseq = 'ACGUACGUACGUACGU' >>> RnaMolType = MolType( ... Sequence = NucleicAcidSequence(testrnaseq), ... motifset = IUPAC_RNA_chars, ... Ambiguities = IUPAC_RNA_ambiguities, ... label = "rna_with_lowercase", ... MWCalculator = RnaMW, ... Complements = IUPAC_RNA_ambiguities_complements, ... Pairs = RnaStandardPairs, ... add_lower=True, ... preserve_existing_moltypes=True, ... make_alphabet_group=True, ... ) Setting up a ``MolType`` object with a DNA sequence --------------------------------------------------- .. doctest:: >>> from cogent.core.moltype import MolType, IUPAC_DNA_chars,\ ... IUPAC_DNA_ambiguities, DnaMW, IUPAC_DNA_ambiguities_complements,\ ... DnaStandardPairs >>> testdnaseq = 'ACGTACGTACGUACGT' >>> DnaMolType = MolType( ... 
Sequence = NucleicAcidSequence(testdnaseq), ... motifset = IUPAC_DNA_chars, ... Ambiguities = IUPAC_DNA_ambiguities, ... label = "dna_with_lowercase", ... MWCalculator = DnaMW, ... Complements = IUPAC_DNA_ambiguities_complements, ... Pairs = DnaStandardPairs, ... add_lower=True, ... preserve_existing_moltypes=True, ... make_alphabet_group=True, ... ) Setting up a ``MolType`` object with a protein sequence ------------------------------------------------------- .. doctest:: >>> from cogent.core.moltype import MolType, IUPAC_PROTEIN_chars,\ ... IUPAC_PROTEIN_ambiguities, ProteinMW >>> from cogent.core.sequence import ProteinSequence, ModelProteinSequence >>> protstr = 'TEST' >>> ProteinMolType = MolType( ... Sequence = ProteinSequence(protstr), ... motifset = IUPAC_PROTEIN_chars, ... Ambiguities = IUPAC_PROTEIN_ambiguities, ... MWCalculator = ProteinMW, ... make_alphabet_group=True, ... ModelSeq = ModelProteinSequence, ... label = "protein") >>> protseq = ProteinMolType.Sequence Verify sequences ---------------- .. doctest:: >>> rnastr = 'ACGUACGUACGUACGU' >>> dnastr = 'ACGTACGTACGTACGT' >>> RnaMolType.isValid(rnastr) True >>> RnaMolType.isValid(dnastr) False >>> RnaMolType.isValid(NucleicAcidSequence(dnastr).toRna()) True ``Sequence`` ============ The ``Sequence`` object contains classes that represent biological sequence data. These provide generic biological sequence manipulation functions, plus functions that are critical for the ``evolve`` module calculations. .. warning:: Do not import sequence classes directly! It is expected that you will access them through ``MolType`` objects. The most common molecular types ``DNA``, ``RNA``, ``PROTEIN`` are provided as top level imports in cogent (e.g. ``cogent.DNA``). Sequence classes depend on information from the ``MolType`` that is **only** available after ``MolType`` has been imported. Sequences are intended to be immutable. 
This is not enforced by the code for performance reasons, but don't alter the ``MolType`` or the sequence data after creation. More detailed usage of sequence objects can be found in :ref:`dna-rna-seqs`.PyCogent-1.5.3/doc/cookbook/multivariate_data_analysis.rst000644 000765 000024 00000031277 11466070253 024765 0ustar00jrideoutstaff000000 000000 .. _multivariate-analysis: ************************** Multivariate data analysis ************************** .. sectionauthor Justin Kuczynski, Catherine Lozupone, Andreas Wilm Principal Coordinates Analysis ============================== Principal Coordinates Analysis works on a matrix of pairwise distances. In this example we start by calculating the pairwise distances for a set of aligned sequences, though any distance matrix can be used with PCoA, relating any objects, not only sequences. .. doctest:: >>> from cogent import LoadSeqs >>> from cogent.phylo import distance >>> from cogent.cluster.metric_scaling import PCoA Import a substitution model (or create your own). .. doctest:: >>> from cogent.evolve.models import HKY85 Load the alignment. .. doctest:: >>> al = LoadSeqs("data/test.paml") Create a pairwise distances object calculator for the alignment, providing a substitution model instance. .. doctest:: >>> d = distance.EstimateDistances(al, submodel= HKY85()) >>> d.run(show_progress=False) Now use this matrix to perform principal coordinates analysis. .. 
doctest:: >>> PCoA_result = PCoA(d.getPairwiseDistances()) >>> print PCoA_result # doctest: +SKIP ====================================================================================== Type Label vec_num-0 vec_num-1 vec_num-2 vec_num-3 vec_num-4 -------------------------------------------------------------------------------------- Eigenvectors NineBande -0.02 0.01 0.04 0.01 0.00 Eigenvectors DogFaced -0.04 -0.06 -0.01 0.00 0.00 Eigenvectors HowlerMon -0.07 0.01 0.01 -0.02 0.00 Eigenvectors Mouse 0.20 0.01 -0.01 -0.00 0.00 Eigenvectors Human -0.07 0.04 -0.03 0.01 0.00 Eigenvalues eigenvalues 0.05 0.01 0.00 0.00 -0.00 Eigenvalues var explained (%) 85.71 9.60 3.73 0.95 -0.00 -------------------------------------------------------------------------------------- We can save these results to a file in a delimited format (we'll use tab here) that can be opened up in any data analysis program, like R or Excel. Here the principal coordinates can be plotted against each other for visualization. .. doctest:: >>> PCoA_result.writeToFile('PCoA_results.txt',sep='\t') Fast-MDS ======== The eigendecomposition step in Principal Coordinates Analysis (PCoA) doesn't scale very well. And for thousands of objects the computation of all pairwise distance alone can get very slow, because it scales quadratically. For a huge number of objects this might even pose a memory problem. Fast-MDS methods approximate an MDS/PCoA solution and do not suffer from these problems. First, let's simulate a big data sample by creating 1500 objects living in 10 dimension. Then compute their pairwise distances and perform a principal coordinates analysis on it. Note that the last two steps might take already a couple of minutes. .. 
doctest:: >>> from cogent.maths.distance_transform import dist_euclidean >>> from cogent.cluster.metric_scaling import principal_coordinates_analysis >>> from numpy import random >>> objs = random.random((1500, 10)) >>> distmtx = dist_euclidean(objs) >>> full_pcoa = principal_coordinates_analysis(distmtx) PyCogent implements two fast MDS approximations called Split-and-Combine MDS (SCMDS, still in development) and Nystrom (also known as Landmark-MDS). Both can easily handle many thousands of objects. One reason is that they don't require all distances to be computed. Instead you pass down the distance function and only the required distances are calculated. Nystrom works by using a so-called seed matrix, which contains (only) k by n distances, where n is the total number of objects and k << n. .. doctest:: >>> from cogent.cluster.approximate_mds import nystrom >>> from random import sample >>> from numpy import array >>> n_seeds = 100 >>> seeds = array(sample(distmtx,n_seeds)) >>> dims = 3 >>> nystrom_3d = nystrom(seeds, dims) A good rule of thumb for picking n_seeds is log(n), log(n)**2 or sqrt(n). SCMDS works by dividing the pairwise distance matrix into chunks of a certain size and overlap. MDS is performed on each chunk individually and the resulting solutions are progressively joined. As in the case of Nystrom, not all distances will be computed, but only those of the overlapping tiles. The size and overlap of the tiles determine the quality of the approximation as well as the run-time. ..
doctest:: >>> from cogent.cluster.approximate_mds import CombineMds, cmds_tzeng >>> combine_mds = CombineMds() >>> tile_overlap = 100 >>> dims = 3 >>> tile_eigvecs, tile_eigvals = cmds_tzeng(distmtx[0:500,0:500], dims) >>> combine_mds.add(tile_eigvecs, tile_overlap) >>> tile_eigvecs, tile_eigvals = cmds_tzeng(distmtx[400:900,400:900], dims) >>> combine_mds.add(tile_eigvecs, tile_overlap) >>> tile_eigvecs, tile_eigvals = cmds_tzeng(distmtx[800:1300,800:1300], dims) >>> combine_mds.add(tile_eigvecs, tile_overlap) >>> tile_eigvecs, tile_eigvals = cmds_tzeng(distmtx[1200:1500,1200:1500], dims) >>> combine_mds.add(tile_eigvecs, tile_overlap) >>> combine_mds_3d = combine_mds.getFinalMDS() If you want to know how good the returned approximations are, you will have to perform principal_coordinates_analysis() on a smallish submatrix and perform a goodness_of_fit analysis. NMDS ==== NMDS (Non-metric MultiDimensional Scaling) works on a matrix of pairwise distances. In this example, we generate a matrix based on the euclidean distances of an abundance matrix. .. doctest:: >>> from cogent.cluster.nmds import NMDS >>> from cogent.maths.distance_transform import dist_euclidean >>> from numpy import array We start with an abundance matrix, samples (rows) by sequences/species (cols) .. doctest:: >>> abundance = array( ... [[7,1,0,0,0,0,0,0,0], ... [4,2,0,0,0,1,0,0,0], ... [2,4,0,0,0,1,0,0,0], ... [1,7,0,0,0,0,0,0,0], ... [0,8,0,0,0,0,0,0,0], ... [0,7,1,0,0,0,0,0,0],#idx 5 ... [0,4,2,0,0,0,2,0,0], ... [0,2,4,0,0,0,1,0,0], ... [0,1,7,0,0,0,0,0,0], ... [0,0,8,0,0,0,0,0,0], ... [0,0,7,1,0,0,0,0,0],#idx 10 ... [0,0,4,2,0,0,0,3,0], ... [0,0,2,4,0,0,0,1,0], ... [0,0,1,7,0,0,0,0,0], ... [0,0,0,8,0,0,0,0,0], ... [0,0,0,7,1,0,0,0,0],#idx 15 ... [0,0,0,4,2,0,0,0,4], ... [0,0,0,2,4,0,0,0,1], ... [0,0,0,1,7,0,0,0,0]], 'float') Then compute a distance matrix using euclidean distance, and perform NMDS on that matrix ..
doctest:: >>> euc_distmtx = dist_euclidean(abundance) >>> nm = NMDS(euc_distmtx, verbosity=0) The NMDS object provides a list of points, which can be plotted if desired .. doctest:: >>> pts = nm.getPoints() >>> stress = nm.getStress() With matplotlib installed, we could then do ``plt.plot(pts[:,0], pts[:,1])`` Hierarchical clustering (UPGMA, NJ) =================================== Hierarchical clustering techniques work on a matrix of pairwise distances. In this case, we use the distance matrix from the NMDS example, relating samples of species to one another using UPGMA (NJ below). .. note:: UPGMA should not be used for phylogenetic reconstruction. .. doctest:: >>> from cogent.cluster.UPGMA import upgma we start with the distance matrix and list of sample names: .. doctest:: >>> sample_names = ['sample'+str(i) for i in range(len(euc_distmtx))] make 2d dict: .. doctest:: >>> euc_distdict = {} >>> for i in range(len(sample_names)): ... for j in range(len(sample_names)): ... euc_distdict[(sample_names[i],sample_names[j])]=euc_distmtx[i,j] e.g.: ``euc_distdict[('sample6', 'sample5')] == 3.7416573867739413`` Now use this matrix to build a UPGMA cluster. .. doctest:: >>> mycluster = upgma(euc_distdict) >>> print mycluster.asciiArt() /-sample10 /edge.3--| /edge.2--| \-sample8 | | | \-sample9 /edge.1--| | | /-sample12 | | /edge.5--| | | | \-sample11 | \edge.4--| | | /-sample6 | \edge.6--| /edge.0--| \-sample7 | | | | /-sample15 | | /edge.10-| | | /edge.9--| \-sample14 | | | | | | /edge.8--| \-sample13 | | | | | \edge.7--| \-sample16 -root----| | | | /-sample17 | \edge.11-| | \-sample18 | | /-sample5 | /edge.14-| | /edge.13-| \-sample4 | | | | | \-sample3 \edge.12-| | /-sample2 | /edge.16-| \edge.15-| \-sample1 | \-sample0 We demonstrate saving this UPGMA cluster to a file. .. doctest:: >>> mycluster.writeToFile('test_upgma.tree') .. We don't actually want to keep that file now, so I'm importing the ``os`` module to delete it. .. 
doctest:: :hide: >>> import os >>> os.remove('test_upgma.tree') We can use neighbor joining (NJ) instead of UPGMA: .. doctest:: >>> from cogent.phylo.nj import nj >>> njtree = nj(euc_distdict) >>> print njtree.asciiArt() /-sample16 | | /-sample12 | /edge.2--| | | | /-sample13 | | \edge.1--| | | | /-sample14 | | \edge.0--| | | \-sample15 | | | | /-sample7 |-edge.14-| /edge.5--| | | | | /-sample8 | | | \edge.4--| | | /edge.6--| | /-sample10 | | | | \edge.3--| | | | | \-sample9 -root----| | | | | | | \-sample11 | | | | \edge.13-| /-sample6 | | | | | | /-sample4 | | /edge.10-| /edge.7--| | | | | /edge.8--| \-sample3 | | | | | | | | | \edge.9--| \-sample5 | \edge.12-| | | | \-sample2 | | | | /-sample0 | \edge.11-| | \-sample1 | | /-sample18 \edge.15-| \-sample17 PyCogent-1.5.3/doc/cookbook/parallel_tasks_with_mpi.rst000644 000765 000024 00000001647 11642035237 024261 0ustar00jrideoutstaff000000 000000 Distribution of tasks across multiple CPUs using ``mpi`` ======================================================== .. warning:: This example requires execution on 2 CPUs. It can be run using: ``$ mpirun -n 2 python path/to/parallel_tasks.rst`` All I'm doing here is illustrating the use of ``parallel.map`` with the simplest example I could come up with. I create a list which will have an integer appended to it -- hardly useful, but hopefully demonstrates how a series of data can be mapped onto a function in parallel. In this case, the data is just a list of numbers. .. doctest:: >>> from cogent.util import parallel >>> passed_indices = [] >>> series = range(20) >>> result = parallel.map(passed_indices.append, series) >>> assert passed_indices == range(0,20,2) or passed_indices == range(1,20,2) The result is either the list of even numbers up to (but not including) 20 or the list of odd numbers. 
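The interleaved assignment seen in the MPI example above (one CPU gets the even indices, the other the odd ones) is simply a strided slice of the task series. A minimal sketch of that distribution scheme in plain Python follows; the function name is our own, not part of cogent:

```python
def tasks_for_cpu(series, rank, size):
    """Work assigned to CPU `rank` out of `size` CPUs: every size-th item."""
    return series[rank::size]

series = list(range(20))
print(tasks_for_cpu(series, 0, 2))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(tasks_for_cpu(series, 1, 2))  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

This is why, with 2 CPUs, each process in the MPI example sees either exactly the even numbers or exactly the odd numbers below 20.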
PyCogent-1.5.3/doc/cookbook/parallel_tasks_with_multiprocess.rst000644 000765 000024 00000003042 11642035237 026214 0ustar00jrideoutstaff000000 000000 Distribution of tasks across multiple CPUs using ``multiprocess`` ================================================================= .. note:: Using multiprocess requires no extra commands on invoking the script! In the ``multiprocess`` case, ``foo()`` is called in a *temporary* subprocess. Communication from a subprocess back into the top process is via the value that ``foo()`` returns. I illustrate the use of ``parallel.map`` here with an example that collects both the process id and the integer. .. doctest:: >>> from cogent.util import parallel >>> import os, time >>> parallel.use_multiprocessing(2) >>> def foo(val): ... # do some real intensive work in here, which I've simulated by ... # using sleep ... time.sleep(0.01) ... return os.getpid(), val >>> data = range(20) >>> result = parallel.map(foo, data) To check that the code is executing correctly, I'll verify that two pids are returned. .. doctest:: >>> pids, nums = zip(*result) >>> len(set(pids)) == 2 True The ``result`` list is a series of tuples of process ids and integers, the latter not necessarily being in order. .. we don't test the following output since the pid will vary between runs .. doctest:: >>> print result # doctest: +SKIP [(7332, 0), (7333, 1), (7332, 2), (7333, 3), (7332, 4), (7333, 5), (7332, 6), (7333, 7), (7332, 8), (7333, 9), (7332, 10), (7333, 11), (7332, 12), (7333, 13), (7332, 14), (7333, 15), (7332, 16), (7333, 17), (7332, 18), (7333, 19)] PyCogent-1.5.3/doc/cookbook/phylonodes.rst000644 000765 000024 00000025403 12014672000 021525 0ustar00jrideoutstaff000000 000000 Phylonodes Methods ------------------ .. authors, Dan Knights Basics ^^^^^^ Loading a tree from a file and visualizing it with ``asciiArt()`` """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" ..
doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-Mouse -root----| |--NineBande | \-DogFaced Writing a tree to a file """""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> tr.writeToFile('data/temp.tree') Getting the individual nodes of a tree by name """""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> names = tr.getNodeNames() >>> names[:4] ['root', 'edge.1', 'edge.0', 'Human'] >>> names[4:] ['HowlerMon', 'Mouse', 'NineBande', 'DogFaced'] >>> names_nodes = tr.getNodesDict() >>> names_nodes['Human'] Tree("Human;") >>> tr.getNodeMatchingName('Mouse') Tree("Mouse;") Getting the name of a node (or a tree) """""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> hu = tr.getNodeMatchingName('Human') >>> tr.Name 'root' >>> hu.Name 'Human' The object type of a tree and its nodes is the same """"""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> nodes = tr.getNodesDict() >>> hu = nodes['Human'] >>> type(hu) >>> type(tr) Working with the nodes of a tree """""""""""""""""""""""""""""""" Get all the nodes, tips and edges .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> nodes = tr.getNodesDict() >>> for n in nodes.items(): ... print n ... ('NineBande', Tree("NineBande;")) ('edge.1', Tree("((Human,HowlerMon),Mouse);")) ('root', Tree("(((Human,HowlerMon),Mouse),NineBande,DogFaced);")) ('DogFaced', Tree("DogFaced;")) ('Human', Tree("Human;")) ('edge.0', Tree("(Human,HowlerMon);")) ('Mouse', Tree("Mouse;")) ('HowlerMon', Tree("HowlerMon;")) only the terminal nodes (tips) .. doctest:: >>> for n in tr.iterTips(): ... print n ... 
Human:0.0311054096183; HowlerMon:0.0415847131449; Mouse:0.277353608988; NineBande:0.0939768158209; DogFaced:0.113211053859; for internal nodes (edges) we can use Newick format to simplify the output .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> for n in tr.iterNontips(): ... print n.getNewick() ... ((Human,HowlerMon),Mouse); (Human,HowlerMon); Getting the path between two tips or edges (connecting edges) """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> edges = tr.getConnectingEdges('edge.1','Human') >>> for edge in edges: ... print edge.Name ... edge.1 edge.0 Human Getting the distance between two nodes """""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> nodes = tr.getNodesDict() >>> hu = nodes['Human'] >>> mu = nodes['Mouse'] >>> hu.distance(mu) 0.3467553... >>> hu.isTip() True Getting the last common ancestor (LCA) for two nodes """""""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> nodes = tr.getNodesDict() >>> hu = nodes['Human'] >>> mu = nodes['Mouse'] >>> lca = hu.lastCommonAncestor(mu) >>> lca Tree("((Human,HowlerMon),Mouse);") >>> type(lca) Getting all the ancestors for a node """""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> hu = tr.getNodeMatchingName('Human') >>> for a in hu.ancestors(): ... print a.Name ... edge.0 edge.1 root Getting all the children for a node """"""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> node = tr.getNodeMatchingName('edge.1') >>> children = list(node.iterTips()) + list(node.iterNontips()) >>> for child in children: ... print child.Name ... 
Human HowlerMon Mouse edge.0 Getting all the distances for a tree """""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> dists = tr.getDistances() We also show how to select a subset of distances involving just one species. .. doctest:: >>> human_dists = [names for names in dists if 'Human' in names] >>> for dist in human_dists: ... print dist, dists[dist] ... ('Human', 'NineBande') 0.183106418165 ('DogFaced', 'Human') 0.202340656203 ('NineBande', 'Human') 0.183106418165 ('Human', 'DogFaced') 0.202340656203 ('Mouse', 'Human') 0.346755361094 ('HowlerMon', 'Human') 0.0726901227632 ('Human', 'Mouse') 0.346755361094 ('Human', 'HowlerMon') 0.0726901227632 Getting the two nodes that are farthest apart """"""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> tr.maxTipTipDistance() (0.4102925130849, ('Mouse', 'DogFaced')) Get the nodes within a given distance """"""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> hu = tr.getNodeMatchingName('Human') >>> tips = hu.tipsWithinDistance(0.2) >>> for t in tips: ... print t ... HowlerMon:0.0415847131449; NineBande:0.0939768158209; Rerooting trees ^^^^^^^^^^^^^^^ At a named node """"""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.rootedAt('edge.0').asciiArt() /-Human | -root----|--HowlerMon | | /-Mouse \edge.0--| | /-NineBande \edge.1--| \-DogFaced At the midpoint """"""""""""""" .. 
doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.rootAtMidpoint().asciiArt() /-Mouse | -root----| /-Human | /edge.0--| | | \-HowlerMon \edge.0.2| | /-NineBande \edge.1--| \-DogFaced >>> print tr.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-------- /-Mouse -root----| |--NineBande | \-DogFaced Tree representations ^^^^^^^^^^^^^^^^^^^^ Newick format """"""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> tr.getNewick() '(((Human,HowlerMon),Mouse),NineBande,DogFaced);' >>> tr.getNewick(with_distances=True) '(((Human:0.0311054096183,HowlerMon:0.0415847131449)... XML format """""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> xml = tr.getXML() >>> for line in xml.splitlines(): ... print line ... length0.0197278502379 length0.0382963424874 Human... Write to PDF """""""""""" .. note:: This requires ``matplotlib``. It will bring up a ``matplotlib`` window if run from the command line. But in any case, it will write the pdf file to the data directory. .. doctest:: >>> from cogent import LoadTree >>> from cogent.draw import dendrogram >>> tr = LoadTree('data/test.tree') >>> h, w = 500, 500 >>> np = dendrogram.ContemporaneousDendrogram(tr) >>> np.drawToPDF('temp.pdf', w, h, font_size=14) .. doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files('temp.pdf', error_on_missing=False) Tree traversal ^^^^^^^^^^^^^^ Here is the example tree for reference: .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-Mouse -root----| |--NineBande | \-DogFaced Preorder """""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> for t in tr.preorder(): ... print t.getNewick() ... 
(((Human,HowlerMon),Mouse),NineBande,DogFaced); ((Human,HowlerMon),Mouse); (Human,HowlerMon); Human; HowlerMon; Mouse; NineBande; DogFaced; Postorder """"""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> for t in tr.postorder(): ... print t.getNewick() ... Human; HowlerMon; (Human,HowlerMon); Mouse; ((Human,HowlerMon),Mouse); NineBande; DogFaced; (((Human,HowlerMon),Mouse),NineBande,DogFaced); Selecting subtrees ^^^^^^^^^^^^^^^^^^ One way to do it """""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> for tip in tr.iterNontips(): ... tip_names = tip.getTipNames() ... print tip_names ... sub_tree = tr.getSubTree(tip_names) ... print sub_tree.asciiArt() ... print ... ['Human', 'HowlerMon', 'Mouse'] /-Human | -root----|--HowlerMon | \-Mouse ['Human', 'HowlerMon'] /-Human -root----| \-HowlerMon .. We do some file clean up .. doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files(['data/temp.tree', 'data/temp.pdf'], ... error_on_missing=False) PyCogent-1.5.3/doc/cookbook/protein_sequences.rst000644 000765 000024 00000002066 11350301455 023101 0ustar00jrideoutstaff000000 000000 Protein sequences ----------------- .. authors, Gavin Huttley, Kristian Rother, Patrick Yannul Creating a ProteinSequence with a name ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent import PROTEIN >>> p = PROTEIN.makeSequence('THISISAPRQTEIN','myProtein') >>> type(p) >>> str(p) 'THISISAPRQTEIN' Converting a DNA sequence string to protein sequence string ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> from cogent.core.genetic_code import DEFAULT as standard_code >>> standard_code.translate('TTTGCAAAC') 'FAN' Conversion to a ``ProteinSequence`` from a ``DnaSequence`` is shown in :ref:`translation`. Loading protein sequences from a Phylip file ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
doctest:: >>> from cogent import LoadSeqs, PROTEIN >>> seq = LoadSeqs('data/abglobin_aa.phylip', moltype=PROTEIN, ... aligned=True) Loading other formats, or collections of sequences is shown in :ref:`load-seqs`.PyCogent-1.5.3/doc/cookbook/sequence_similarity_search.rst000644 000765 000024 00000000521 11305664061 024750 0ustar00jrideoutstaff000000 000000 ************************** Sequence similarity search ************************** Exact pattern ============= *To be written.* Weight matrix ============= *To be written.* BLAST ===== *To be written.* hmmer ===== *To be written.* infernal ======== *To be written.* Orthologs vs paralogs ===================== *To be written.* PyCogent-1.5.3/doc/cookbook/sequence_simulation.rst000644 000765 000024 00000000203 11305664061 023416 0ustar00jrideoutstaff000000 000000 ******************* Sequence simulation ******************* By ML ===== *To be written.* By Seqsim ========= *To be written.* PyCogent-1.5.3/doc/cookbook/simple_trees.rst000644 000765 000024 00000041132 12014672000 022031 0ustar00jrideoutstaff000000 000000 Trees ----- .. authors, Gavin Huttley, Tom Elliott Basics ^^^^^^ Loading a tree from a file and visualizing it with ``asciiArt()`` """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-Mouse -root----| |--NineBande | \-DogFaced Writing a tree to a file """""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> tr.writeToFile('data/temp.tree') Getting the individual nodes of a tree by name """""""""""""""""""""""""""""""""""""""""""""" .. 
doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> names = tr.getNodeNames() >>> names[:4] ['root', 'edge.1', 'edge.0', 'Human'] >>> names[4:] ['HowlerMon', 'Mouse', 'NineBande', 'DogFaced'] >>> names_nodes = tr.getNodesDict() >>> names_nodes['Human'] Tree("Human;") >>> tr.getNodeMatchingName('Mouse') Tree("Mouse;") Getting the name of a node (or a tree) """""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> hu = tr.getNodeMatchingName('Human') >>> tr.Name 'root' >>> hu.Name 'Human' The object type of a tree and its nodes is the same """"""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> nodes = tr.getNodesDict() >>> hu = nodes['Human'] >>> type(hu) >>> type(tr) Working with the nodes of a tree """""""""""""""""""""""""""""""" Get all the nodes, tips and edges .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> nodes = tr.getNodesDict() >>> for n in nodes.items(): ... print n ... ('NineBande', Tree("NineBande;")) ('edge.1', Tree("((Human,HowlerMon),Mouse);")) ('root', Tree("(((Human,HowlerMon),Mouse),NineBande,DogFaced);")) ('DogFaced', Tree("DogFaced;")) ('Human', Tree("Human;")) ('edge.0', Tree("(Human,HowlerMon);")) ('Mouse', Tree("Mouse;")) ('HowlerMon', Tree("HowlerMon;")) only the terminal nodes (tips) .. doctest:: >>> for n in tr.iterTips(): ... print n ... Human:0.0311054096183; HowlerMon:0.0415847131449; Mouse:0.277353608988; NineBande:0.0939768158209; DogFaced:0.113211053859; for internal nodes (edges) we can use Newick format to simplify the output .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> for n in tr.iterNontips(): ... print n.getNewick() ... 
((Human,HowlerMon),Mouse); (Human,HowlerMon); Getting the path between two tips or edges (connecting edges) """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> edges = tr.getConnectingEdges('edge.1','Human') >>> for edge in edges: ... print edge.Name ... edge.1 edge.0 Human Getting the distance between two nodes """""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> nodes = tr.getNodesDict() >>> hu = nodes['Human'] >>> mu = nodes['Mouse'] >>> hu.distance(mu) 0.3467553... >>> hu.isTip() True Getting the last common ancestor (LCA) for two nodes """""""""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> nodes = tr.getNodesDict() >>> hu = nodes['Human'] >>> mu = nodes['Mouse'] >>> lca = hu.lastCommonAncestor(mu) >>> lca Tree("((Human,HowlerMon),Mouse);") >>> type(lca) Getting all the ancestors for a node """""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> hu = tr.getNodeMatchingName('Human') >>> for a in hu.ancestors(): ... print a.Name ... edge.0 edge.1 root Getting all the children for a node """"""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> node = tr.getNodeMatchingName('edge.1') >>> children = list(node.iterTips()) + list(node.iterNontips()) >>> for child in children: ... print child.Name ... Human HowlerMon Mouse edge.0 Getting all the distances for a tree """""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> dists = tr.getDistances() We also show how to select a subset of distances involving just one species. .. doctest:: >>> human_dists = [names for names in dists if 'Human' in names] >>> for dist in human_dists: ... 
print dist, dists[dist] ... ('Human', 'NineBande') 0.183106418165 ('DogFaced', 'Human') 0.202340656203 ('NineBande', 'Human') 0.183106418165 ('Human', 'DogFaced') 0.202340656203 ('Mouse', 'Human') 0.346755361094 ('HowlerMon', 'Human') 0.0726901227632 ('Human', 'Mouse') 0.346755361094 ('Human', 'HowlerMon') 0.0726901227632 Getting the two nodes that are farthest apart """"""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> tr.maxTipTipDistance() (0.4102925130849, ('Mouse', 'DogFaced')) Get the nodes within a given distance """"""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> hu = tr.getNodeMatchingName('Human') >>> tips = hu.tipsWithinDistance(0.2) >>> for t in tips: ... print t ... HowlerMon:0.0415847131449; NineBande:0.0939768158209; Rerooting trees ^^^^^^^^^^^^^^^ At a named node """"""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.rootedAt('edge.0').asciiArt() /-Human | -root----|--HowlerMon | | /-Mouse \edge.0--| | /-NineBande \edge.1--| \-DogFaced At the midpoint """"""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.rootAtMidpoint().asciiArt() /-Mouse | -root----| /-Human | /edge.0--| | | \-HowlerMon \edge.0.2| | /-NineBande \edge.1--| \-DogFaced >>> print tr.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-------- /-Mouse -root----| |--NineBande | \-DogFaced Near a given tip """""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-Mouse -root----| |--NineBande | \-DogFaced >>> print tr.rootedWithTip("Mouse").asciiArt() /-Human /edge.0--| | \-HowlerMon | -root----|--Mouse | | /-NineBande \edge.1--| \-DogFaced Tree representations ^^^^^^^^^^^^^^^^^^^^ Newick format """"""""""""" .. 
doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> tr.getNewick() '(((Human,HowlerMon),Mouse),NineBande,DogFaced);' >>> tr.getNewick(with_distances=True) '(((Human:0.0311054096183,HowlerMon:0.0415847131449)... XML format """""""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> xml = tr.getXML() >>> for line in xml.splitlines(): ... print line ... length0.0197278502379 length0.0382963424874 Human... Write to PDF """""""""""" .. note:: This requires ``matplotlib``. It will bring up a ``matplotlib`` window if run from the command line. But in any case, it will write the pdf file to the data directory. .. doctest:: >>> from cogent import LoadTree >>> from cogent.draw import dendrogram >>> tr = LoadTree('data/test.tree') >>> h, w = 500, 500 >>> np = dendrogram.ContemporaneousDendrogram(tr) >>> np.drawToPDF('temp.pdf', w, h, font_size=14) .. doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files('temp.pdf', error_on_missing=False) Tree traversal ^^^^^^^^^^^^^^ Here is the example tree for reference: .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-Mouse -root----| |--NineBande | \-DogFaced Preorder """""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> for t in tr.preorder(): ... print t.getNewick() ... (((Human,HowlerMon),Mouse),NineBande,DogFaced); ((Human,HowlerMon),Mouse); (Human,HowlerMon); Human; HowlerMon; Mouse; NineBande; DogFaced; Postorder """"""""" .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> for t in tr.postorder(): ... print t.getNewick() ... Human; HowlerMon; (Human,HowlerMon); Mouse; ((Human,HowlerMon),Mouse); NineBande; DogFaced; (((Human,HowlerMon),Mouse),NineBande,DogFaced); Selecting subtrees ^^^^^^^^^^^^^^^^^^ One way to do it """""""""""""""" .. 
doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> for tip in tr.iterNontips(): ... tip_names = tip.getTipNames() ... print tip_names ... sub_tree = tr.getSubTree(tip_names) ... print sub_tree.asciiArt() ... print ... ['Human', 'HowlerMon', 'Mouse'] /-Human | -root----|--HowlerMon | \-Mouse ['Human', 'HowlerMon'] /-Human -root----| \-HowlerMon .. We do some file clean up .. doctest:: :hide: >>> from cogent.util.misc import remove_files >>> remove_files(['data/temp.tree', 'data/temp.pdf'], ... error_on_missing=False) Tree manipulation methods ^^^^^^^^^^^^^^^^^^^^^^^^^ Pruning the tree """""""""""""""" Remove internal nodes with only one child. Create new connections and branch lengths (if tree is a PhyloNode) to reflect the change. .. doctest:: >>> from cogent import LoadTree >>> simple_tree_string="(B:0.2,(D:0.4)E:0.5)F;" >>> simple_tree=LoadTree(treestring=simple_tree_string) >>> print simple_tree.asciiArt() /-B -F-------| \E------- /-D >>> simple_tree.prune() >>> print simple_tree.asciiArt() /-B -F-------| \-D >>> print simple_tree (B:0.2,D:0.9)F; Create a full unrooted copy of the tree """"""""""""""""""""""""""""""""""""""" .. doctest:: >>> from cogent import LoadTree >>> tr1 = LoadTree('data/test.tree') >>> print tr1.getNewick() (((Human,HowlerMon),Mouse),NineBande,DogFaced); >>> tr2 = tr1.unrootedDeepcopy() >>> print tr2.getNewick() (((Human,HowlerMon),Mouse),NineBande,DogFaced); Transform tree into a bifurcating tree """""""""""""""""""""""""""""""""""""" Add internal nodes so that every node has 2 or fewer children. .. 
doctest:: >>> from cogent import LoadTree >>> tree_string="(B:0.2,H:0.2,(C:0.3,D:0.4,E:0.1)F:0.5)G;" >>> tr = LoadTree(treestring=tree_string) >>> print tr.asciiArt() /-B | |--H -G-------| | /-C | | \F-------|--D | \-E >>> print tr.bifurcating().asciiArt() /-B -G-------| | /-H \--------| | /-C \F-------| | /-D \--------| \-E Transform tree into a balanced tree """"""""""""""""""""""""""""""""""" Using a balanced tree can substantially improve performance of likelihood calculations. Note that the resulting tree has a different orientation with the effect that specifying clades or stems for model parameterization should be done using the "outgroup_name" argument. .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree('data/test.tree') >>> print tr.asciiArt() /-Human /edge.0--| /edge.1--| \-HowlerMon | | | \-Mouse -root----| |--NineBande | \-DogFaced >>> print tr.balanced().asciiArt() /-Human /edge.0--| | \-HowlerMon | -root----|--Mouse | | /-NineBande \edge.1--| \-DogFaced Test two trees for same topology """""""""""""""""""""""""""""""" Branch lengths don't matter. .. doctest:: >>> from cogent import LoadTree >>> tr1 = LoadTree(treestring="(B:0.2,(C:0.2,D:0.2)F:0.2)G;") >>> tr2 = LoadTree(treestring="((C:0.1,D:0.1)F:0.1,B:0.1)G;") >>> tr1.sameTopology(tr2) True Calculate each node's maximum distance to a tip """"""""""""""""""""""""""""""""""""""""""""""" Sets each node's "TipDistance" attribute to be the distance from that node to its most distant tip. .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree(treestring="(B:0.2,(C:0.3,D:0.4)F:0.5)G;") >>> print tr.asciiArt() /-B -G-------| | /-C \F-------| \-D >>> tr.setTipDistances() >>> for t in tr.preorder(): ... print t.Name, t.TipDistance ... G 0.9 B 0 F 0.4 C 0 D 0 Scale branch lengths in place to integers for ascii output """""""""""""""""""""""""""""""""""""""""""""""""""""""""" .. 
doctest:: >>> from cogent import LoadTree >>> tr = LoadTree(treestring="(B:0.2,(C:0.3,D:0.4)F:0.5)G;") >>> print tr (B:0.2,(C:0.3,D:0.4)F:0.5)G; >>> tr.scaleBranchLengths() >>> print tr (B:22,(C:33,D:44)F:56)G; Get tip-to-tip distances """""""""""""""""""""""" Get a distance matrix between all pairs of tips and a list of the tip nodes. .. doctest:: >>> from cogent import LoadTree >>> tr = LoadTree(treestring="(B:3,(C:2,D:4)F:5)G;") >>> d,tips = tr.tipToTipDistances() >>> for i,t in enumerate(tips): ... print t.Name,d[i] ... B [ 0. 10. 12.] C [ 10. 0. 6.] D [ 12. 6. 0.] Compare two trees using tip-to-tip distance matrices """""""""""""""""""""""""""""""""""""""""""""""""""" Score ranges from 0 (minimum distance) to 1 (maximum distance). The default is to use Pearson's correlation, in which case a score of 0 means that the Pearson's correlation was perfectly good (1), and a score of 1 means that the Pearson's correlation was perfectly bad (-1). Note: automatically strips out the names that don't match. .. doctest:: >>> from cogent import LoadTree >>> tr1 = LoadTree(treestring="(B:2,(C:3,D:4)F:5)G;") >>> tr2 = LoadTree(treestring="(C:2,(B:3,D:4)F:5)G;") >>> tr1.compareByTipDistances(tr2) 0.0835... PyCogent-1.5.3/doc/cookbook/standard_statistical_analyses.rst000644 000765 000024 00000050736 12014672070 025462 0ustar00jrideoutstaff000000 000000 ***************************** Standard statistical analyses ***************************** .. authors Tom Elliott, Gavin Huttley, Anuj Pahwa .. following is just a list of the filenames that need to be deleted, to be appended to after each one is called. Readers don't really need to see this housekeeping so I'm 'hiding' this code. .. doctest:: :hide: >>> filenames_to_delete = [] Random Numbers ============== Many of the code snippets in this section use random numbers. These can be obtained using functions from Python's ``random`` module, or using ``numpy.random``. 
To facilitate testing, the examples "seed" the random number generator, which ensures the same results each time the code is run. .. doctest:: >>> import random >>> random.seed(157) >>> random.choice((-1,1)) 1 >>> random.choice(range(1000)) 224 >>> random.gauss(mu=50, sigma=3) 52.7668... >>> import numpy as np >>> np.random.seed(157) >>> np.random.random_integers(-1,1,5) array([-1, 1, 1, 1, 0]) For the last example, note that the range includes 0. .. doctest:: >>> np.random.normal(loc=50,scale=3,size=2) array([ 42.8217253 , 49.90008293]) >>> np.random.randn(3) array([ 1.26613052, 1.59533412, 0.95612413]) Summary statistics ================== Population mean and median -------------------------- PyCogent's functions for statistical analysis operate on ``numpy`` arrays .. doctest:: >>> import random >>> import numpy as np >>> import cogent.maths.stats.test as stats >>> random.seed(157) >>> nums = [random.gauss(mu=50, sigma=3) for i in range(1000)] >>> arr = np.array(nums) >>> stats.mean(arr) 49.9927... but in some cases they will also accept a simple list of values .. doctest:: >>> stats.mean(range(1,8)) 4.0 >>> stats.var(range(1,8)) 4.66666... The keyword argument ``axis`` controls whether a function operates by rows (``axis=0``) or by columns (``axis=1``), or on all of the values (``axis=None``) .. doctest:: >>> import cogent.maths.stats.test as stats >>> import numpy as np >>> nums = range(1,6) + [50] + range(10,60,10) + [500] >>> arr = np.array(nums) >>> arr.shape = (2,6) >>> arr array([[ 1, 2, 3, 4, 5, 50], [ 10, 20, 30, 40, 50, 500]]) >>> stats.mean(arr, axis=0) array([ 5.5, 11. , 16.5, 22. , 27.5, 275. ]) >>> stats.mean(arr, axis=1) array([ 10.83333333, 108.33333333]) >>> stats.mean(arr) 59.58333... >>> stats.median(arr, axis=0) array([ 5.5, 11. , 16.5, 22. , 27.5, 275. ]) >>> stats.median(arr, axis=1) array([ 3.5, 35. ]) >>> stats.median(arr) 15.0 Population variance and standard deviation ------------------------------------------ .. 
doctest:: >>> print stats.var(arr, axis=0) [ 4.05000000e+01 1.62000000e+02 3.64500000e+02 6.48000000e+02 1.01250000e+03 1.01250000e+05] >>> print stats.std(arr, axis=0) [ 6.36396103 12.72792206 19.09188309 25.45584412 31.81980515 318.19805153] >>> print stats.var(arr, axis=1) [ 370.16666667 37016.66666667] >>> print stats.std(arr, axis=1) [ 19.23971587 192.39715868] >>> print stats.var(arr, axis=None) 19586.6287879 >>> print stats.std(arr, axis=None) 139.952237524 The variance (and standard deviation) are unbiased .. doctest:: >>> import numpy as np >>> import cogent.maths.stats.test as stats >>> arr = np.array([1,2,3,4,5]) >>> m = np.mean(arr) >>> stats.var(arr) 2.5 >>> 1.0 * sum([(n-m)**2 for n in arr]) / (len(arr) - 1) 2.5 Distributions ============= Binomial -------- The binomial distribution can be used for calculating the probability of specific frequencies of states occurring in discrete data. The two alternate states are typically referred to as a success or failure. This distribution is used for sign tests. .. doctest:: >>> import cogent.maths.stats.distribution as distr >>> distr.binomial_low(successes=5, trials=10, prob=0.5) 0.623... >>> distr.binomial_high(successes=5, trials=10, prob=0.5) 0.376... >>> distr.binomial_exact(successes=5, trials=10, prob=0.5) 0.246... Chi-square ---------- A convenience function for computing the probability of a chi-square statistic is provided at the ``stats`` top level. .. doctest:: >>> from cogent.maths.stats import chisqprob >>> chisqprob(3.84, 1) 0.05... which is just a reference to the ``chi_high`` function. .. doctest:: >>> from cogent.maths.stats.distribution import chi_high >>> chi_high(3.84, 1) 0.05... Getting the inverse ^^^^^^^^^^^^^^^^^^^ Given a probability we can determine the corresponding chi-square value for a given degrees-of-freedom. .. doctest:: >>> from cogent.maths.stats.distribution import chdtri >>> chdtri(1, 0.05) 3.84... >>> chdtri(2, 0.05) 5.99... 
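The pairing of 3.84 with a probability of 0.05 at one degree of freedom can be checked without PyCogent: a chi-square variate with 1 df is the square of a standard normal, so its upper tail is ``erfc(sqrt(x / 2))``. A minimal standard-library sketch under that identity (``chi1_high`` and ``chi1_inverse`` are my own illustrative names, not cogent functions):

```python
import math

def chi1_high(x):
    """Upper-tail probability of a chi-square variate with 1 df.

    If X ~ chi-square(1) then X = Z**2 for Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

def chi1_inverse(p, tol=1e-9):
    """Invert chi1_high by bisection: find x with P(X > x) == p."""
    lo, hi = 0.0, 1e6
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi1_high(mid) > p:  # tail still too heavy, x must grow
            lo = mid
        else:
            hi = mid
    return lo

print(round(chi1_high(3.84), 3))    # ~0.05, matching chi_high(3.84, 1)
print(round(chi1_inverse(0.05), 2))  # ~3.84, matching chdtri(1, 0.05)
```

The bisection works because the upper-tail probability decreases monotonically in ``x``; for 2 or more degrees of freedom the tail involves the incomplete gamma function, which is why the library functions above are the practical choice.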
Normal ------ The function ``zprob()`` takes a z-score or standard deviation and computes the fraction of the normal distribution (mean=0, std=1) which lies farther away from the mean than that value. For example, only about 4.5% of the values are more than 2 standard deviations away from the mean, so that more than 95% of the values are at least that close to the mean. .. doctest:: >>> import cogent.maths.stats.distribution as distr >>> for z in range(5): ... print '%s %.4f' % (z, distr.zprob(z)) ... 0 1.0000 1 0.3173 2 0.0455 3 0.0027 4 0.0001 Use the functions ``z_low()`` and ``z_high()`` to compute the normal distribution in a directional fashion. Here we see that a z-score of 1.65 has a value greater than 95% of the values in the distribution, and similarly a z-score of 1.96 has a value greater than 97.5% of the values in the distribution. .. doctest:: >>> z = 0 >>> while distr.z_low(z) < 0.95: ... z += 0.01 ... >>> z 1.6500... >>> z = 0 >>> while distr.z_low(z) < 0.975: ... z += 0.01 ... >>> z 1.9600... Normalizing data (as Z-scores) ============================== The function ``z_test()`` takes a sample of values as the first argument, and named arguments for the population parameters: ``popmean`` and ``popstddev`` (with default values of 0 and 1), and returns the z-score and its probability. In this example, we grab a sample from a population with ``mean=50`` and ``std=3``, and call ``z_test()`` with the population mean specified as 50 and the ``popstddev`` assuming its default value of 1: .. Uses the parametric standard deviation .. doctest:: >>> import numpy as np >>> import cogent.maths.stats.test as stats >>> np.random.seed(157) >>> arr = np.random.normal(loc=50,scale=3,size=1000) >>> round(stats.mean(arr), 1) 49.9... >>> round(stats.std(arr), 1) 3.1... >>> z, prob = stats.z_test(arr, popmean=50.0) >>> print z -3.08... .. todo TE: I think the above needs more explanation. 
What does this have to do with a Z-score, as in Z = (arr - stats.mean(arr))/stats.std(arr)?

Resampling based statistics
===========================

The Jackknife
-------------

This is a data resampling based approach to estimating the confidence in measures of location (like the mean). The method is based on omission of one member of a sample and recomputing the statistic of interest. This measures the influence of individual observations on the sample and also the confidence in the statistic.

The ``JackknifeStats`` class relies on our ability to handle a set of indices for sub-setting our data and re-computing our statistic. The client code must be able to take indices and generate a new statistic. We demonstrate by using the jackknife to estimate the mean GC% of an alignment. We first write a factory function to compute the confidence in the mean GC% for an alignment by sampling specific columns.

.. doctest::

    >>> def CalcGc(aln):
    ...     def calc_gc(indices):
    ...         new = aln.takePositions(indices)
    ...         probs = new.getMotifProbs()
    ...         gc = sum(probs[b] for b in 'CG')
    ...         total = sum(probs[b] for b in 'ACGT')
    ...         return gc / total
    ...     return calc_gc

We then create an instance of this factory function with a specific alignment.

.. doctest::

    >>> from cogent import LoadSeqs, DNA
    >>> aln = LoadSeqs('data/test.paml', moltype=DNA)
    >>> calc_gc = CalcGc(aln)

We now create a ``JackknifeStats`` instance, passing it the ``calc_gc`` instance we have just made, and obtain the sampling statistics. We specify how many elements we're interested in (in this case, the positions in the alignment).

.. doctest::

    >>> from cogent.maths.stats.jackknife import JackknifeStats
    >>> jk = JackknifeStats(len(aln), calc_gc)
    >>> print jk.SampleStat
    0.4766...
>>> print jk.SummaryStats Summary Statistics =============================================== Sample Stat Jackknife Stat Standard Error ----------------------------------------------- 0.4767 0.4767 0.0584 ----------------------------------------------- We also display the sub-sample statistics. .. doctest:: >>> print jk.SubSampleStats Subsample Stats ============ i Stat-i ------------ 0 0.4678 1 0.4678 2 0.4847 3 0.4814... .. note:: You can provide a custom index generation function that omits groups of observations, for instance. This can be assigned to the ``gen_index`` argument of the ``Jackknife`` constructor. Permutations ============ .. this is really a numpy features Random ------ .. doctest:: >>> from numpy.random import permutation as perm >>> import numpy as np >>> np.random.seed(153) >>> arr = np.array(range(5)) >>> for i in range(3): ... print perm(arr) ... [2 1 3 0 4] [0 3 2 4 1] [4 0 1 2 3] Ordered ------- *To be written.* Differences in means ==================== Consider a single sample of 50 value: .. doctest:: >>> import numpy as np >>> import cogent.maths.stats.test as stats >>> np.random.seed(1357) >>> nums1 = np.random.normal(loc=45,scale=10,size=50) Although we don't know the population values for the mean and standard deviation for this sample, we can evaluate the probability that the sample could have been drawn from some population with known values, as shown above in Normalizing data (as Z-scores). If we have a second sample, whose parent population mean and standard deviation are also unknown: .. doctest:: >>> nums2 = np.random.normal(loc=50,scale=10,size=50) Suppose we believe (before we see any data) that the mean of the first population is different than the second but we don't know in which direction the change lies, we estimate the standard deviation. We use the standard error of the mean as an estimate for how close the mean of sample 2 is to the mean of its parent population (and vice-versa). .. 
doctest:: >>> mean_nums2 = stats.mean(nums2) >>> sd_nums2 = stats.std(nums2) >>> se_nums2 = sd_nums2 / np.sqrt(len(nums2)) >>> se_nums2 1.1113... >>> mean_nums1 = stats.mean(nums1) >>> mean_nums1 46.5727... >>> mean_nums2 50.3825... >>> mean_nums1 < mean_nums2 - 1.96 * se_nums2 True t-Tests ======= Small sample sizes can be handled by the use of t-tests. The function ``t_two_sample()`` is used for two independent samples. .. doctest:: >>> subsample1 = nums1[:5] >>> [str(round(n,2)) for n in subsample1] ['49.25', '38.87', '47.06', '44.49', '43.73'] >>> stats.mean(subsample1) 44.67901... >>> subsample2 = nums2[:5] >>> [str(round(n,2)) for n in subsample2] ['51.57', '40.6', '49.62', '46.69', '59.34'] >>> stats.mean(subsample2) 49.56494... >>> t, prob = stats.t_two_sample(subsample1,subsample2) >>> t -1.3835... >>> prob 0.20388... The two sample means are not significantly different. If there is one small sample and we want to ask whether it is unlikely to have come from a population with a known mean, use the function ``t_one_sample()`` .. doctest:: >>> import cogent.maths.stats.test as stats >>> arr = [52.6, 51.3, 49.8] >>> t, prob = stats.t_one_sample(arr, popmean=48, tails='high') >>> t 3.99681... >>> prob 0.02863... For related samples (pre- and post-treatment), use the function ``t_paired()`` .. doctest:: >>> import cogent.maths.stats.test as stats >>> pre = [52.6, 51.3, 49.8] >>> post = [62.6, 75.0, 65.2] >>> t, prob = stats.t_paired(pre, post, tails='low') >>> t -4.10781... >>> prob 0.02723... Sign test ========= This is essentially just a test using the binomial distribution where the probability of success = 0.5. .. doctest:: >>> from cogent.maths.stats.test import sign_test >>> sign_test(40, 100) 0.056... Differences in proportions ========================== *To be written.* Association =========== We create some data for testing for association. .. 
doctest::

    >>> import numpy as np
    >>> import random
    >>> random.seed(13)
    >>> x_nums = range(1,11)
    >>> error = [1.5 * random.random() for i in range(len(x_nums))]
    >>> error = [e * random.choice((-1,1)) for e in error]
    >>> y_nums = [(x * 0.5) + e for x, e in zip(x_nums, error)]
    >>> x_array = np.array(x_nums)
    >>> y_array = np.array(y_nums)

We then compute Kendall's tau and its associated probability, which tests the null hypothesis that x and y are not associated.

.. doctest::

    >>> from cogent.maths.stats.test import kendall_correlation
    >>> tau, prob = kendall_correlation(x_array, y_array)
    >>> print tau
    0.688...
    >>> print prob
    0.00468...

Correlation
===========

For this example, we generate y-values as one-half the x-value plus a bit of random error.

.. doctest::

    >>> import numpy as np
    >>> np.random.seed(13)
    >>> x_array = np.arange(1,11)
    >>> error = np.random.normal(size=10)
    >>> y_array = x_array * 0.5 + error
    >>> x_array
    array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
    >>> [str(round(n,2)) for n in y_array]
    ['-0.21', '1.75', '1.46', '2.45', '3.85', '3.53', '4.85', '4.86', '5.98', '3.95']

The function ``correlation()`` returns the Pearson correlation between x and y, as well as its significance.

.. doctest::

    >>> import cogent.maths.stats.test as stats
    >>> r, prob = stats.correlation(x_array, y_array)
    >>> r
    0.8907...
    >>> prob
    0.0005...

The function ``regress()`` returns the coefficients of the regression line "y=ax+b".

.. doctest::

    >>> slope, y_intercept = stats.regress(x_array, y_array)
    >>> slope
    0.5514...
    >>> y_intercept
    0.2141...

Calculate the R^2 value for the regression of x and y.

.. doctest::

    >>> R_squared = stats.regress_R2(x_array, y_array)
    >>> R_squared
    0.7934...

And finally, the residual error for each point from the linear regression.

.. doctest::

    >>> error = stats.regress_residuals(x_array, y_array)
    >>> error = [str(round(e,2)) for e in error]
    >>> error
    ['-0.98', '0.44', '-0.41'...

Differences in variances
========================

*To be written.*

Chi-Squared test
================

..
TODO pick a biological example, perhaps sequence nucleotide composition? Codon usage for a particular amino acid?

Calculus class data (from Grinstead and Snell, Introduction to Probability). There seems to be a disparity in the number of 'A' grades awarded when broken down by student gender. As input to the function ``chi_square_from_Dict2D()`` we need a ``Dict2D`` object containing the observed counts that has been processed by ``calc_contingency_expected()`` to add the expected counts for each element of the table: ``Expected = row_total x column_total / overall_total``.

.. doctest::

    >>> from cogent.util.dict2d import Dict2D
    >>> import cogent.maths.stats.test as stats
    >>> F_grades = {'A':37,'B':63,'C':47,'F':5}
    >>> M_grades = {'A':56,'B':60,'C':43,'F':8}
    >>> grades = {'F':F_grades,'M':M_grades}
    >>> data = Dict2D(grades)
    >>> data
    {'M': {'A': 56...
    >>> OE_data = stats.calc_contingency_expected(data)
    >>> OE_data
    {'M': {'A': [56, 48.686...
    >>> test, chi_high = stats.chi_square_from_Dict2D(OE_data)
    >>> test
    4.12877...
    >>> chi_high
    0.24789...

Nearly 25% of the time we would expect a Chi-squared statistic as extreme as this one or more (with df = 3), so the result is not significant.

Goodness-of-fit calculation with the same data:

.. doctest::

    >>> g_val, prob = stats.G_fit_from_Dict2D(OE_data)
    >>> g_val
    4.1337592429166437
    >>> prob
    0.76424854978813872

Scatterplots
============

In this example, we generate the error as above, but separately from the x-value, and subsequently transform using matrix multiplication.

.. doctest::

    >>> import random
    >>> import numpy as np
    >>> import cogent.maths.stats.test as stats
    >>> random.seed(13)
    >>> x_nums = range(1,11)
    >>> error = [1.5 * random.random() for i in range(len(x_nums))]
    >>> error = [e * random.choice((-1,1)) for e in error]
    >>> arr = np.array(x_nums + error)
    >>> arr.shape = (2, len(x_nums))
    >>> arr
    array([[  1.        ,   2.        ,   3.        ,   4.        ,
              5.        ,   6.        ,   7.        ,   8.        ,
              9.        ,  10.        ],
           [  0.38851274,  -1.02788699,  -1.02612288,  -1.27400424,
              0.27858626,   0.34583791,  -0.22073988,  -0.3377444 ,
             -1.1010354 ,   0.19531953]])

We can use a transformation matrix to rotate the points.

.. doctest::

    >>> from math import sqrt
    >>> z = 1.0/sqrt(2)
    >>> t = np.array([[z,-z],[z,z]])
    >>> rotated_x, rotated_y = np.dot(t,arr)

The plotting code uses matplotlib_.

.. doctest::

    >>> import matplotlib.pyplot as plt
    >>> fig = plt.figure()
    >>> ax = fig.add_subplot(111)
    >>> ax.scatter(arr[0],arr[1],s=250,color='b',marker='s')
    <matplotlib.collections...
    >>> ax.scatter(rotated_x,rotated_y,s=250,color='r',marker='o')
    <matplotlib.collections...
    >>> plt.axis('equal')
    (0.0, 12.0, -2.0, 8.0)

Plot the least squares regression lines too.

.. doctest::

    >>> slope, y_intercept = stats.regress(rotated_x, rotated_y)
    >>> slope
    0.9547989732316251
    >>> max_x = 10
    >>> ax.plot([0, max_x],[y_intercept, max_x * slope + y_intercept],
    ...     linewidth=4.0, color='k')
    [<matplotlib.lines.Line2D object at...
    >>> slope, y_intercept = stats.regress(arr[0],arr[1])
    >>> ax.plot([0, max_x],[y_intercept, max_x * slope + y_intercept],
    ...     linewidth=4.0, color='0.6')
    [<matplotlib.lines.Line2D object at...
    >>> plt.grid(True)
    >>> plt.savefig('scatter_example.pdf')

(If you want to plot the lines under the points, specify ``zorder=n`` to the plot commands, where ``zorder`` for the lines < ``zorder`` for the points).

.. Possibly split these out into "Visualizing data"

.. doctest::
    :hide:

    >>> filenames_to_delete.append('scatter_example.pdf')

Histograms
==========

.. doctest::
    :hide:

    >>> plt.clf() # because the plot gets screwed up by operations above

.. doctest::

    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> plt.clf()
    >>> mu, sigma = 100, 15
    >>> x = mu + sigma*np.random.randn(10000)
    >>> n, bins, patches = plt.hist(x, 60, normed=1, facecolor='0.75')

Add a "best fit" line.

.. doctest::

    >>> import matplotlib.mlab as mlab
    >>> y = mlab.normpdf( bins, mu, sigma)
    >>> l = plt.plot(bins, y, 'r--', linewidth=3)
    >>> plt.grid(True)
    >>> plt.savefig('hist_example.png')

..
doctest::
    :hide:

    >>> filenames_to_delete.append('hist_example.png')

Heat Maps
=========

Representing numbers as colors is a powerful data visualization technique. This example does not actually use any functionality from PyCogent, it simply highlights a convenient matplotlib_ method for constructing a heat map.

.. doctest::

    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> data = [i * 0.01 for i in range(100)]
    >>> data = np.array(data)
    >>> data.shape = (10,10)

The plot code:

.. doctest::

    >>> fig = plt.figure()
    >>> plt.hot()
    >>> plt.pcolormesh(data)
    <matplotlib.collections...
    >>> plt.colorbar()
    <matplotlib.colorbar...
    >>> plt.savefig('heatmap_example.png')

.. doctest::
    :hide:

    >>> filenames_to_delete.append('heatmap_example.png')
    >>> from cogent.util.misc import remove_files
    >>> remove_files(filenames_to_delete, error_on_missing=False)

.. _matplotlib: http://matplotlib.sourceforge.net/

PyCogent-1.5.3/doc/cookbook/structural_contacts.rst

Intramolecular contacts
-----------------------

This section of the documentation explains how to use several PyCogent modules to find contacts between macromolecular entities, e.g. residues or chains, both within and outside the context of a crystal. It assumes that the following boilerplate code is entered into the interpreter. This includes the necessary imports and sets up a working set of ``Entity`` instances.

.. doctest::

    >>> from cogent.struct import contact
    >>> from cogent.parse.pdb import PDBParser
    >>> from cogent.struct import selection, manipulation, annotation
    >>> pdb_fh = open('data/1HQF.pdb')
    >>> pdb_structure = PDBParser(pdb_fh)
    >>> model = pdb_structure[(0,)]
    >>> chainA = model[('A',)]
    >>> chainB = model[('B',)]
    >>> chainC = model[('C',)]

Find contacts within an asymmetric unit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To find contacts and save the result in the xtra dictionary of an ``Entity`` instance we have to use the ``contact.contacts_xtra`` function.
This function is a wrapper around a low-level function that finds the contacts; it annotates the input with the result of this call and also returns a dictionary of the annotations. In this simple case we do not look for contacts in the crystal lattice, but for contacts between chains in the asymmetric unit, which corresponds to the default parameters of the ``contacts_xtra`` function. We search for all contacts between atoms in the asymmetric unit that are at most 5.0A apart and belong to different chains.

.. doctest::

    >>> conts = contact.contacts_xtra(model, xtra_key='CNT', search_limit=5.0)

Let's explore what the output is all about. The keys in the dictionary are all the atoms that are involved in some contact(s). The length of the dictionary is not the total number of contacts.

.. doctest::

    >>> atom_id = ('1HQF', 0, 'B', ('VAL', 182, ' '), ('C', ' '))
    >>> atom_id_conts = conts[atom_id]

The value for this is a dictionary, where the contacts are stored under the given ``xtra_key``.

.. doctest::

    >>> atom_id_conts
    {'CNT': {('1HQF', 0, 'A', ('GLY', 310, ' '), ('CA', ' ')): (4.5734119648245102, 0, 0)}}

The value is a dictionary of contacts, with keys being the ids of the involved atoms and values being tuples that give the distance in A, the symmetry operation id, and the unit cell id. For contacts within the asymmetric unit there is no symmetry operation and no unit cell translation, so both are ``0``.

PyCogent-1.5.3/doc/cookbook/structural_data.rst

Structural data
---------------

.. sectionauthor:: Kristian Rother, Patrick Yannul

Protein structures
^^^^^^^^^^^^^^^^^^

Reading Protein structures
""""""""""""""""""""""""""

Retrieve a structure from PDB
+++++++++++++++++++++++++++++

.. doctest::

    >>> from cogent.db.pdb import Pdb
    >>> p = Pdb()
    >>> pdb_file = p['4tsv']
    >>> pdb = pdb_file.read()
    >>> len(pdb)
    135027

This example will retrieve the structure as a PDB file string.
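Such a PDB file string can also be inspected without PyCogent: the PDB format is fixed-column, so a few string slices recover each ATOM record. The sketch below is independent of ``PDBParser``; the column positions follow the PDB format specification, and the helper name ``atom_records`` and the two hand-written example records are ours, for illustration only.

```python
def atom_records(pdb_text):
    """Extract (name, res_name, chain_id, x, y, z) from each ATOM record.

    Slice positions follow the fixed-column PDB layout: atom name in
    columns 13-16, residue name in 18-20, chain id in 22, and the x, y, z
    coordinates in columns 31-38, 39-46 and 47-54 (0-based slices below).
    """
    records = []
    for line in pdb_text.splitlines():
        if line.startswith('ATOM'):
            records.append((line[12:16].strip(), line[17:20], line[21],
                            float(line[30:38]), float(line[38:46]),
                            float(line[46:54])))
    return records

# Two hand-written ATOM records (values for illustration only).
example = (
    "ATOM      1  N   ILE A 154     142.986  36.523   6.838  1.00 13.35           N\n"
    "ATOM      2  CA  ILE A 154     143.638  37.263   7.911  1.00 12.87           C\n"
)
records = atom_records(example)
```

Slicing by column rather than splitting on whitespace is what makes this robust: in real files, adjacent fields of an ATOM record can abut with no separating space.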
Parse a PDB file ++++++++++++++++ .. doctest:: >>> from cogent.parse.pdb import PDBParser >>> struc = PDBParser(open('data/4TSV.pdb')) >>> struc Parse a PDB entry directly from the web +++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> from cogent.parse.pdb import PDBParser >>> struc = PDBParser(p['4tsv']) Accessing PDB header information ++++++++++++++++++++++++++++++++ .. doctest:: >>> struc.header['id'] '4TSV' >>> struc.header['resolution'] '1.80' >>> struc.header['r_free'] '0.262' >>> struc.header['space_group'] 'H 3' Navigating structure objects """""""""""""""""""""""""""" What does a structure object contain? +++++++++++++++++++++++++++++++++++++ A ``cogent.parse.pdb.Structure`` object as returned by ``PDBParser`` contains a tree-like hierarchy of ``Entity`` objects. They are organized such that ``Structures`` that contain ``Models`` that contain ``Chains`` that contain ``Residues`` that in turn contain ``Atoms``. You can read more about the entity model on [URL of Marcins example page]. How to access a model from a structure ++++++++++++++++++++++++++++++++++++++ To get the first model out of a structure: .. doctest:: >>> model = struc[(0,)] >>> model The key contains the model number as a tuple. How to access a chain from a model? +++++++++++++++++++++++++++++++++++ To get a particular chain: .. doctest:: >>> chain = model[('A',)] >>> chain How to access a residue from a chain? +++++++++++++++++++++++++++++++++++++ To get a particular residue: .. doctest:: >>> resi = chain[('ILE', 154, ' '),] >>> resi What properties does a residue have? ++++++++++++++++++++++++++++++++++++ .. doctest:: >>> resi.res_id 154 >>> resi.name 'ILE' >>> resi.h_flag ' ' >>> resi.seg_id ' ' Access an atom from a residue +++++++++++++++++++++++++++++ To get a particular atom: .. doctest:: >>> atom = resi[("N", ' '),] >>> atom Properties of an atom +++++++++++++++++++++ .. 
doctest:: >>> atom.name ' N ' >>> atom.element ' N' >>> atom.coords array([ 142.986, 36.523, 6.838]) >>> atom.bfactor 13.35 >>> atom.occupancy 1.0 If a model/chain/residue/atom does not exist ++++++++++++++++++++++++++++++++++++++++++++ You will get a ``KeyError``. Is there something special about heteroatoms to consider? +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Yes, they have the ``h_flag`` attribute set in residues. How are Altlocs/insertion codes represented? ++++++++++++++++++++++++++++++++++++++++++++ Both are part of the residue/atom ID. Useful methods to access Structure objects """""""""""""""""""""""""""""""""""""""""" How to access all atoms, residues etc via a dictionary ++++++++++++++++++++++++++++++++++++++++++++++++++++++ The ``table`` property of a structure returns a two-dimensional dictionary containing all atoms. The keys are 1) the entity level (any of 'A','R','C','M') and 2) the combined IDs of ``Structure``, ``Model``, ``Chain``, ``Residue``, ``Atom`` as a tuple. .. doctest:: >>> struc.table['A'][('4TSV', 0, 'A', ('HIS', 73, ' '), ('O', ' '))] Calculate the center of mass of a model or chain ++++++++++++++++++++++++++++++++++++++++++++++++ .. NEEDS TO BE CHECKED WITH MARCIN .. doctest:: >>> model.coords array([ 146.66615752, 35.08673503, -3.60735847]) >>> chain.coords array([ 146.66615752, 35.08673503, -3.60735847]) How to get a list of all residues in a chain? +++++++++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> chain.values()[0] How to get a list of all atoms in a chain? ++++++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> resi.values()[0] Constructing structures """"""""""""""""""""""" How to create a new entity? +++++++++++++++++++++++++++ ``Structure``/``Model``/``Chain``/``Residue``/``Atom`` objects can be created as follows: .. 
doctest:: >>> from cogent.core.entity import Structure,Model,Chain,Residue,Atom >>> from numpy import array >>> s = Structure('my_struc') >>> m = Model((0),) >>> c = Chain(('A'),) >>> r = Residue(('ALA', 1, ' ',),False,' ') >>> a = Atom(('C ',' ',), 'C', 1, array([0.0,0.0,0.0]), 1.0, 0.0, 'C') How to add entities to each other? ++++++++++++++++++++++++++++++++++ .. doctest:: >>> s.addChild(m) >>> m.addChild(c) >>> c.addChild(r) >>> r.addChild(a) >>> s.setTable(force=True) >>> s.table {'A': {('my_struc', 0, 'A', ('ALA', 1, ' '), ('C ', ' ')): }, 'C': {('my_struc', 0, 'A'): }, 'R': {('my_struc', 0, 'A', ('ALA', 1, ' ')): }, 'M': {('my_struc', 0): }} How to remove a residue from a chain? +++++++++++++++++++++++++++++++++++++ .. doctest:: >>> c.delChild(r.id) >>> s.table {'A': {('my_struc', 0, 'A', ('ALA', 1, ' '), ... Geometrical analyses """""""""""""""""""" Calculating euclidean distances between atoms +++++++++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> from cogent.maths.geometry import distance >>> atom1 = resi[('N', ' '),] >>> atom2 = resi[('CA', ' '),] >>> distance(atom1.coords, atom2.coords) 1.4691967192993618 Calculating euclidean distances between coordinates +++++++++++++++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> from numpy import array >>> from cogent.maths.geometry import distance >>> a1 = array([1.0, 2.0, 3.0]) >>> a2 = array([1.0, 4.0, 9.0]) >>> distance(a1,a2) 6.324... Calculating flat angles from atoms ++++++++++++++++++++++++++++++++++ .. doctest:: >>> from cogent.struct.dihedral import angle >>> atom3 = resi[('C', ' '),] >>> a12 = atom2.coords-atom1.coords >>> a23 = atom2.coords-atom3.coords >>> angle(a12,a23) 1.856818... Calculates the angle in radians. Calculating flat angles from coordinates ++++++++++++++++++++++++++++++++++++++++ .. 
doctest:: >>> from cogent.struct.dihedral import angle >>> a1 = array([0.0, 0.0, 1.0]) >>> a2 = array([0.0, 0.0, 0.0]) >>> a3 = array([0.0, 1.0, 0.0]) >>> a12 = a2-a1 >>> a23 = a2-a3 >>> angle(a12,a23) 1.5707963267948966 Calculates the angle in radians. Calculating dihedral angles from atoms ++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> from cogent.struct.dihedral import dihedral >>> atom4 = resi[('CG1', ' '),] >>> dihedral(atom1.coords,atom2.coords,atom3.coords, atom4.coords) 259.49277688244217 Calculates the torsion in degrees. Calculating dihedral angles from coordinates ++++++++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> from cogent.struct.dihedral import dihedral >>> a1 = array([0.0, 0.0, 1.0]) >>> a2 = array([0.0, 0.0, 0.0]) >>> a3 = array([0.0, 1.0, 0.0]) >>> a4 = array([1.0, 1.0, 0.0]) >>> dihedral(a1,a2,a3,a4) 90.0 Calculates the torsion in degrees. Other stuff """"""""""" How to count the atoms in a structure? ++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> len(struc.table['A'].values()) 1187 How to iterate over chains in canonical PDB order? ++++++++++++++++++++++++++++++++++++++++++++++++++ In PDB, the chain with space as ID comes last, the others in alphabetical order. .. doctest:: >>> for chain in model.sortedvalues(): ... print chain How to iterate over chains in alphabetical order? +++++++++++++++++++++++++++++++++++++++++++++++++ If you want the chains in purely alphabetical order: .. KR 2 ROB: Is this what you requested or is the above example enough? .. doctest:: >>> keys = model.keys() >>> keys.sort() >>> for chain in [model[id] for id in keys]: ... print chain How to iterate over all residues in a chain? ++++++++++++++++++++++++++++++++++++++++++++ .. doctest:: >>> residues = [resi for resi in chain.values()] >>> len(residues) 218 How to remove all water molecules from a structure ++++++++++++++++++++++++++++++++++++++++++++++++++ .. 
doctest::

    >>> water = [r for r in struc.table['R'].values() if r.name == 'H_HOH']
    >>> for resi in water:
    ...     resi.parent.delChild(resi.id)
    >>> struc.setTable(force=True)
    >>> len(struc.table['A'].values())
    1117
    >>> residues = [resi for resi in chain.values()]
    >>> len(residues)
    148

PyCogent-1.5.3/doc/cookbook/structural_data_2.rst

Structural Data Advanced
------------------------

This section covers more advanced structural entity handling tasks.

Adding an entity when it exists
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A PyCogent ``Entity`` is a subclass of a dictionary. Adding children is essentially the same as updating a dictionary, with a minimal amount of book-keeping. It is equivalent to the following:

::

    child.setParent(parent)
    child_id = child.getId()
    parent[child_id] = child
    parent.setModified(True, False)

This points the child entity to its new parent (line 1) and adds the child to the parent dictionary (line 3). The call to ``setModified`` notifies all parents of the parent of the modification. A dictionary has unique keys and so a parent has children with unique ids. If you try to add a child whose id clashes with an existing one, it will update the parent and override the previous child, just like you would update a dictionary.

Why are the short ids inside a tuple?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Short ids are parts of a long id. The long id is a tuple. Short ids can be concatenated to form a long id. This would not be possible if short ids were not within a tuple initially. For example:

.. doctest::

    >>> (0,) + ('A',) + (('GLY', 209, ' '),) + (('C', ' '),)
    (0, 'A', ('GLY', 209, ' '), ('C', ' '))

The output here is a valid long id of an atom for use in ``AtomHolder`` instances.
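The book-keeping and tuple-id concatenation described above can be sketched with a toy dict subclass. The class below is an illustration we made up, not the real ``Entity`` implementation; only the method names ``setParent`` and ``getId`` mirror the PyCogent methods quoted above, and ``getLongId`` is a hypothetical helper showing how short ids concatenate up the tree.

```python
class ToyEntity(dict):
    """Minimal stand-in for an Entity: a dict of children plus an id."""

    def __init__(self, short_id, parent=None):
        dict.__init__(self)
        self.short_id = short_id   # e.g. ('A',) for a chain
        self.parent = parent

    def setParent(self, parent):
        self.parent = parent

    def getId(self):
        return self.short_id

    def getLongId(self):
        # concatenate short id tuples up the tree, as in the example above
        if self.parent is None:
            return self.short_id
        return self.parent.getLongId() + self.short_id

model = ToyEntity((0,))
chain = ToyEntity(('A',))
residue = ToyEntity((('GLY', 209, ' '),))
atom = ToyEntity((('C', ' '),))

# addChild boils down to: point the child at its parent, then store it
# in the parent dictionary under its short id.
for parent, child in [(model, chain), (chain, residue), (residue, atom)]:
    child.setParent(parent)
    parent[child.getId()] = child

long_id = atom.getLongId()  # (0, 'A', ('GLY', 209, ' '), ('C', ' '))
```

Because the parent is just a dict, adding a second child with a clashing id silently replaces the first, exactly as described above.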
Select children of a ``MultiEntity`` instance by a feature
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Selection is a common task and ``PyCogent`` has a unified syntax for this via the ``selectChildren`` method. The idea behind it is as follows:

#. gather the "requested data" from all children.
#. compare each returned child value to the template "value" using the "operator".
#. return the children for which the comparison is ``True``.

The signature of this method is ``selectChildren("value", "operator", "requested data")``. In the first step all children return the "requested data"; the request might be an attribute, a value corresponding to a key in the ``parent.xtra`` dictionary, or any other query supported by the ``getData`` method.

.. doctest::

    >>> from cogent.parse.pdb import PDBParser
    >>> pdb_fh = open('data/1HQF.pdb')
    >>> pdb_structure = PDBParser(pdb_fh)
    >>> model = pdb_structure[(0,)]
    >>> chainA = model[('A',)]

Example 1: select all alanines from a chain.
""""""""""""""""""""""""""""""""""""""""""""

.. doctest::

    >>> alanines = chainA.selectChildren('ALA', 'eq', 'name')

This requests the "name" attribute from all children in chain A and uses the "eq" (equals) operator to compare it to "ALA". It returns a list of residues which have this name.

Example 2: select all residues which are not amino acids or nucleic acids.
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

.. doctest::

    >>> selection = chainA.selectChildren('H', 'eq', 'h_flag')

This requests the "h_flag", i.e. the hetero-atom flag, from all residues. For amino acids and nucleic acids this flag is an empty string; for all other molecular entities it is "H", so the call returns only ligands, waters etc.

Example 3: what if only some children have data to return?
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

First we pick out a residue and modify its xtra dictionary to contain some custom data. We mark ``lys39`` as a catalytic residue.

..
doctest::

    >>> lys39 = chainA[(('LYS', 39, ' '),)]
    >>> lys39.xtra['CATALYTIC'] = True

All other residues do not have a value corresponding to the "CATALYTIC" key, but we can still select all "CATALYTIC" residues in chain A.

.. doctest::

    >>> catalytic = chainA.selectChildren(True, 'eq', 'CATALYTIC', xtra=True)
    >>> catalytic
    {(('LYS', 39, ' '),): }

The difference is that we have requested a value from the "xtra" dictionary instead of a hypothetical "CATALYTIC" attribute.

Comparison "operators" supported for the ``selectChildren`` method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The "operator" can either be a string corresponding to a function from the ``operator`` module of the Python standard library (the list of currently supported operators is ``gt``, ``ge``, ``lt``, ``le``, ``eq``, ``ne``, ``or_``, ``and_``, ``contains``, ``is_``, ``is_not``), or alternatively a custom function with the signature ``operator(value, got)``, where "got" is the value returned by the child and "value" is what it is compared to.

Copying or serializing an entity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyCogent ``MultiEntity`` and ``Entity`` are Python objects and they support the copy and deepcopy protocols.

.. doctest::

    >>> import cPickle
    >>> pickledA = cPickle.dumps(chainA)
    >>> unpickledA = cPickle.loads(pickledA)
    >>> unpickledA is chainA
    False
    >>> unpickledA == chainA
    True

In the above we have pickled and unpickled a ``MultiEntity`` instance. This results in a new instance "unpickledA" which is the same as "chainA", but has a different id (different objects, so the identity test fails). If you are only interested in obtaining a copy of an ``Entity`` instance, and not in being able to share entities between Python sessions, you can use the functions from the ``copy`` module. Please note that copies and deep copies are the same.

..
doctest::

    >>> from copy import copy, deepcopy
    >>> otherA = copy(chainA)
    >>> otherA is chainA
    False
    >>> otherA == chainA
    True
    >>> cys119 = chainA[(('CYS', 119, ' '),)]
    >>> cys119_other = otherA[(('CYS', 119, ' '),)]
    >>> cys119 is cys119_other
    False
    >>> cys119 == cys119_other
    True

PyCogent-1.5.3/doc/cookbook/tables.rst

Tabular data
------------

.. authors, Gavin Huttley, Kristian Rother, Patrick Yannul

.. doctest::
    :hide:

    >>> # just saving some tabular data for subsequent data
    >>> from cogent import LoadTable
    >>> rows = (('NP_003077', 'Con', 2.5386013224378985),
    ...         ('NP_004893', 'Con', 0.12135142635634111e+06),
    ...         ('NP_005079', 'Con', 0.95165949788861326e+07),
    ...         ('NP_005500', 'NonCon', 0.73827030202664901e-07),
    ...         ('NP_055852', 'NonCon', 1.0933217708952725e+07))
    >>> table = LoadTable(header=['Locus', 'Region', 'Ratio'], rows=rows)
    >>> table.writeToFile('stats.txt', sep=',')

Loading delimited formats
^^^^^^^^^^^^^^^^^^^^^^^^^

We load a comma separated data file using the generic ``LoadTable`` function.

.. doctest::

    >>> from cogent import LoadTable
    >>> table = LoadTable('stats.txt', sep=',')
    >>> print table
    ====================================
        Locus    Region            Ratio
    ------------------------------------
    NP_003077       Con           2.5386
    NP_004893       Con      121351.4264
    NP_005079       Con     9516594.9789
    NP_005500    NonCon           0.0000
    NP_055852    NonCon    10933217.7090
    ------------------------------------

Reading large files
^^^^^^^^^^^^^^^^^^^

For really large files the automated conversion used by the standard read mechanism can be quite slow. If the data within a column is consistently of one type, set the ``LoadTable`` argument ``static_column_types=True``. This causes the ``Table`` object to create a custom reader.

..
doctest:: >>> table = LoadTable('stats.txt', static_column_types=True) >>> print table ==================================== Locus Region Ratio ------------------------------------ NP_003077 Con 2.5386 NP_004893 Con 121351.4264 NP_005079 Con 9516594.9789 NP_005500 NonCon 0.0000 NP_055852 NonCon 10933217.7090 ------------------------------------ Formatting ^^^^^^^^^^ Changing displayed numerical precision """""""""""""""""""""""""""""""""""""" We change the ``Ratio`` column to using scientific notation. .. doctest:: >>> table.setColumnFormat('Ratio', '%.1e') >>> print table ============================== Locus Region Ratio ------------------------------ NP_003077 Con 2.5e+00 NP_004893 Con 1.2e+05 NP_005079 Con 9.5e+06 NP_005500 NonCon 7.4e-08 NP_055852 NonCon 1.1e+07 ------------------------------ Change digits or column spacing """"""""""""""""""""""""""""""" This can be done on table loading, .. doctest:: >>> table = LoadTable('stats.txt', sep=',', digits=1, space=2) >>> print table ============================= Locus Region Ratio ----------------------------- NP_003077 Con 2.5 NP_004893 Con 121351.4 NP_005079 Con 9516595.0 NP_005500 NonCon 0.0 NP_055852 NonCon 10933217.7 ----------------------------- or, for spacing at least, by modifying the attributes .. doctest:: >>> table.Space = ' ' >>> print table ================================= Locus Region Ratio --------------------------------- NP_003077 Con 2.5 NP_004893 Con 121351.4 NP_005079 Con 9516595.0 NP_005500 NonCon 0.0 NP_055852 NonCon 10933217.7 --------------------------------- Changing column headings ^^^^^^^^^^^^^^^^^^^^^^^^ The table ``Header`` is immutable. Changing column headings is done as follows. .. 
doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> print table.Header ['Locus', 'Region', 'Ratio'] >>> table = table.withNewHeader('Ratio', 'Stat') >>> print table.Header ['Locus', 'Region', 'Stat'] Creating new columns from existing ones ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This can be used to take a single, or multiple columns and generate a new column of values. Here we'll take 2 columns and return True/False based on a condition. .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> table = table.withNewColumn('LargeCon', ... lambda (r,v): r == 'Con' and v>10.0, ... columns=['Region', 'Ratio']) >>> print table ================================================ Locus Region Ratio LargeCon ------------------------------------------------ NP_003077 Con 2.5386 False NP_004893 Con 121351.4264 True NP_005079 Con 9516594.9789 True NP_005500 NonCon 0.0000 False NP_055852 NonCon 10933217.7090 False ------------------------------------------------ Appending tables ^^^^^^^^^^^^^^^^ Can be done without specifying a new column. Here we simply use the same table data. .. doctest:: >>> table1 = LoadTable('stats.txt', sep=',') >>> table2 = LoadTable('stats.txt', sep=',') >>> table = table1.appended(None, table2) >>> print table ==================================== Locus Region Ratio ------------------------------------ NP_003077 Con 2.5386 NP_004893 Con 121351.4264 NP_005079 Con 9516594.9789 NP_005500 NonCon 0.0000 NP_055852 NonCon 10933217.7090 NP_003077 Con 2.5386 NP_004893 Con 121351.4264 NP_005079 Con 9516594.9789 NP_005500 NonCon 0.0000 NP_055852 NonCon 10933217.7090 ------------------------------------ or with a new column .. 
doctest:: >>> table1.Title = 'Data1' >>> table2.Title = 'Data2' >>> table = table1.appended('Data#', table2, title='') >>> print table ============================================= Data# Locus Region Ratio --------------------------------------------- Data1 NP_003077 Con 2.5386 Data1 NP_004893 Con 121351.4264 Data1 NP_005079 Con 9516594.9789 Data1 NP_005500 NonCon 0.0000 Data1 NP_055852 NonCon 10933217.7090 Data2 NP_003077 Con 2.5386 Data2 NP_004893 Con 121351.4264 Data2 NP_005079 Con 9516594.9789 Data2 NP_005500 NonCon 0.0000 Data2 NP_055852 NonCon 10933217.7090 --------------------------------------------- .. note:: We assigned an empty string to ``title``, otherwise the resulting table has the same ``Title`` attribute as that of ``table1``. Summing a single column ^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> table.summed('Ratio') 20571166.652847398 Summing multiple columns or rows - strictly numerical data ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We define a strictly numerical table, .. doctest:: >>> all_numeric = LoadTable(header=['A', 'B', 'C'], rows=[range(3), ... range(3,6), range(6,9), range(9,12)]) >>> print all_numeric ============= A B C ------------- 0 1 2 3 4 5 6 7 8 9 10 11 ------------- and sum all columns (default condition) .. doctest:: >>> all_numeric.summed() [18, 22, 26] and all rows .. doctest:: >>> all_numeric.summed(col_sum=False) [3, 12, 21, 30] Summing multiple columns or rows with mixed non-numeric/numeric data ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We define a table with mixed data, like a distance matrix. .. doctest:: >>> mixed = LoadTable(header=['A', 'B', 'C'], rows=[['*',1,2], [3,'*', 5], ... [6,7,'*']]) >>> print mixed =========== A B C ----------- * 1 2 3 * 5 6 7 * ----------- and sum all columns (default condition), ignoring non-numerical data .. doctest:: >>> mixed.summed(strict=False) [9, 8, 7] and all rows .. 
doctest:: >>> mixed.summed(col_sum=False, strict=False) [3, 8, 13] Filtering table rows ^^^^^^^^^^^^^^^^^^^^ We can do this by providing a reference to an external function .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> sub_table = table.filtered(lambda x: x < 10.0, columns='Ratio') >>> print sub_table ============================= Locus Region Ratio ----------------------------- NP_003077 Con 2.5386 NP_005500 NonCon 0.0000 ----------------------------- or using valid python syntax within a string, which is executed .. doctest:: >>> sub_table = table.filtered("Ratio < 10.0") >>> print sub_table ============================= Locus Region Ratio ----------------------------- NP_003077 Con 2.5386 NP_005500 NonCon 0.0000 ----------------------------- You can also filter for values in multiple columns .. doctest:: >>> sub_table = table.filtered("Ratio < 10.0 and Region == 'NonCon'") >>> print sub_table ============================= Locus Region Ratio ----------------------------- NP_005500 NonCon 0.0000 ----------------------------- Filtering table columns ^^^^^^^^^^^^^^^^^^^^^^^ We select only columns that have a sum > 20 from the ``all_numeric`` table constructed above. .. doctest:: >>> big_numeric = all_numeric.filteredByColumn(lambda x: sum(x)>20) >>> print big_numeric ======== B C -------- 1 2 4 5 7 8 10 11 -------- Sorting ^^^^^^^ Standard sorting """""""""""""""" .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> print table.sorted(columns='Ratio') ==================================== Locus Region Ratio ------------------------------------ NP_005500 NonCon 0.0000 NP_003077 Con 2.5386 NP_004893 Con 121351.4264 NP_005079 Con 9516594.9789 NP_055852 NonCon 10933217.7090 ------------------------------------ Reverse sorting """"""""""""""" .. 
doctest:: >>> print table.sorted(columns='Ratio', reverse='Ratio') ==================================== Locus Region Ratio ------------------------------------ NP_055852 NonCon 10933217.7090 NP_005079 Con 9516594.9789 NP_004893 Con 121351.4264 NP_003077 Con 2.5386 NP_005500 NonCon 0.0000 ------------------------------------ Sorting involving multiple columns, one reversed """""""""""""""""""""""""""""""""""""""""""""""" .. doctest:: >>> print table.sorted(columns=['Region', 'Ratio'], reverse='Ratio') ==================================== Locus Region Ratio ------------------------------------ NP_005079 Con 9516594.9789 NP_004893 Con 121351.4264 NP_003077 Con 2.5386 NP_055852 NonCon 10933217.7090 NP_005500 NonCon 0.0000 ------------------------------------ Getting raw data ^^^^^^^^^^^^^^^^ For a single column """"""""""""""""""" .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> raw = table.getRawData('Region') >>> print raw ['Con', 'Con', 'Con', 'NonCon', 'NonCon'] For multiple columns """""""""""""""""""" .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> raw = table.getRawData(['Locus', 'Region']) >>> print raw [['NP_003077', 'Con'], ['NP_004893', 'Con'], ... Iterating over table rows ^^^^^^^^^^^^^^^^^^^^^^^^^ .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> for row in table: ... print row['Locus'] ... NP_003077 NP_004893 NP_005079 NP_005500 NP_055852 Table slicing ^^^^^^^^^^^^^ Using column names """""""""""""""""" .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> print table[:2, :'Region'] ========= Locus --------- NP_003077 NP_004893 --------- Using column indices """""""""""""""""""" .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> print table[:2,: 1] ========= Locus --------- NP_003077 NP_004893 --------- SQL-like capabilities ^^^^^^^^^^^^^^^^^^^^^ Distinct values """"""""""""""" .. 
doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> assert table.getDistinctValues('Region') == set(['NonCon', 'Con']) Counting """""""" .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> assert table.count("Region == 'NonCon' and Ratio > 1") == 1 Joining tables """""""""""""" SQL-like join operations require that the tables have different ``Title`` attributes, neither of which is ``None``. We do a standard inner join here for a restricted subset. We must specify the columns that will be used for the join. Here we just use ``Locus`` but multiple columns can be used, and their names can be different between the tables. Note that the second table's title becomes a part of the column names. .. doctest:: >>> rows = [['NP_004893', True], ['NP_005079', True], ... ['NP_005500', False], ['NP_055852', False]] >>> region_type = LoadTable(header=['Locus', 'LargeCon'], rows=rows, ... title='RegionClass') >>> stats_table = LoadTable('stats.txt', sep=',', title='Stats') >>> new = stats_table.joined(region_type, columns_self='Locus') >>> print new ============================================================ Locus Region Ratio RegionClass_LargeCon ------------------------------------------------------------ NP_004893 Con 121351.4264 True NP_005079 Con 9516594.9789 True NP_005500 NonCon 0.0000 False NP_055852 NonCon 10933217.7090 False ------------------------------------------------------------ Exporting ^^^^^^^^^ Writing delimited formats """"""""""""""""""""""""" .. doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> table.writeToFile('stats_tab.txt', sep='\t') Writing latex format """""""""""""""""""" It is also possible to specify column alignment, table caption and other arguments. ..
doctest:: >>> table = LoadTable('stats.txt', sep=',') >>> print table.tostring(format='latex') \begin{longtable}[htp!]{ r r r } \hline \bf{Locus} & \bf{Region} & \bf{Ratio} \\ \hline \hline NP_003077 & Con & 2.5386 \\ NP_004893 & Con & 121351.4264 \\ NP_005079 & Con & 9516594.9789 \\ NP_005500 & NonCon & 0.0000 \\ NP_055852 & NonCon & 10933217.7090 \\ \hline \end{longtable} .. we remove the table data .. doctest:: :hide: >>> import os >>> os.remove('stats.txt') >>> os.remove('stats_tab.txt') Writing bedGraph format """"""""""""""""""""""" This format allows display of annotation tracks on genome browsers. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> rows = [['1', 100, 101, 1.123], ['1', 101, 102, 1.123], ... ['1', 102, 103, 1.123], ['1', 103, 104, 1.123], ... ['1', 104, 105, 1.123], ['1', 105, 106, 1.123], ... ['1', 106, 107, 1.123], ['1', 107, 108, 1.123], ... ['1', 108, 109, 1], ['1', 109, 110, 1], ... ['1', 110, 111, 1], ['1', 111, 112, 1], ... ['1', 112, 113, 1], ['1', 113, 114, 1], ... ['1', 114, 115, 1], ['1', 115, 116, 1], ... ['1', 116, 117, 1], ['1', 117, 118, 1], ... ['1', 118, 119, 2], ['1', 119, 120, 2], ... ['1', 120, 121, 2], ['1', 150, 151, 2], ... ['1', 151, 152, 2], ['1', 152, 153, 2], ... ['1', 153, 154, 2], ['1', 154, 155, 2], ... ['1', 155, 156, 2], ['1', 156, 157, 2], ... ['1', 157, 158, 2], ['1', 158, 159, 2], ... ['1', 159, 160, 2], ['1', 160, 161, 2]] ... >>> bgraph = LoadTable(header=['chrom', 'start', 'end', 'value'], ... rows=rows) ... >>> print bgraph.tostring(format='bedgraph', name='test track', ... description='test of bedgraph', color=(255,0,0)) # doctest: +NORMALIZE_WHITESPACE track type=bedGraph name="test track" description="test of bedgraph" color=255,0,0 1 100 108 1.12 1 108 118 1.0 1 118 161 2.0 The bedgraph formatter defaults to rounding values to 2 decimal places. You can adjust that precision using the ``digits`` argument. .. 
doctest:: :options: +NORMALIZE_WHITESPACE >>> print bgraph.tostring(format='bedgraph', name='test track', ... description='test of bedgraph', color=(255,0,0), digits=0) # doctest: +NORMALIZE_WHITESPACE track type=bedGraph name="test track" description="test of bedgraph" color=255,0,0 1 100 118 1.0 1 118 161 2.0 ********************* Tips for using python ********************* .. sectionauthor:: Doug Wendel If you are new to Python, this is the right place for you. This is not a comprehensive guide; rather, this section is intended to help you install an appropriate version of Python and get you started using the language with PyCogent. Checking your version ===================== .. note:: If you are running OSX 10.5 or later you only need to install the Apple Developer tools, which come with OSX, and you'll have a suitable version. At a minimum, you need to be using Python 2.5.x. If you have Python installed, open a terminal window and type ``python``. The first few lines of text should tell you what version you're running. For example: :: $ python Python 2.6.4 (r264:75706, Mar 11 2010, 12:48:01) [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin Type "help", "copyright", "credits" or "license" for more information. In this case, Python 2.6.4 is installed. If you don't have Python installed or your version is older than 2.5.x, download the most recent 2.x release of Python `here `_. .. warning:: **DO NOT** install Python 3.x or above. This version of the language was significantly restructured and will currently not work with PyCogent. Getting help ============ Python comes with a built-in help utility. To access it, open a terminal window and type ``python`` to enter Python's interactive mode: :: help() Welcome to Python 2.6! This is the online help utility.
If this is your first time using Python, you should definitely check out the tutorial on the Internet at http://docs.python.org/tutorial/. Enter the name of any module, keyword, or topic to get help on writing Python programs and using Python modules. To quit this help utility and return to the interpreter, just type "quit". To get a list of available modules, keywords, or topics, type "modules", "keywords", or "topics". Each module also comes with a one-line summary of what it does; to list the modules whose summaries contain a given word such as "spam", type "modules spam". help> Note that you are now in the interactive help mode as noted by the prompt ``help>``. In this mode you can type in the names of various objects and get additional information on them. For example, if I want to know something about the ``map`` function: .. doctest:: help> map Help on built-in function map in module __builtin__: map(...) map(function, sequence[, sequence, ...]) -> list Return a list of the results of applying the function to the items of the argument sequence(s). If more than one sequence is given, the function is called with an argument list consisting of the corresponding item of each sequence, substituting None for missing values when not all sequences have the same length. If the function is None, return a list of the items of the sequence (or a list of tuples if more than one sequence). (END) To exit the interactive help mode, simply enter a blank line at the ``help>`` prompt: .. help> You are now leaving help and returning to the Python interpreter. If you want to ask for help on a particular object directly from the interpreter, you can type "help(object)". Executing "help('string')" has the same effect as typing a particular string at the help> prompt. As the parting message suggests, you can also invoke help on a specific object directly: .. help(abs) Help on built-in function abs in module __builtin__: abs(...) 
abs(number) -> number Return the absolute value of the argument. (END) To quit help in this case just press ``q``. Using the dir() function ======================== Another useful built-in function is ``dir()``. As the name implies, its use is to list defined names in the current scope. To list the currently defined names: :: dir() ['__builtins__', '__doc__', '__name__', '__package__'] The list shows which names are currently defined. This list includes all imported modules and variable names. For example, if I define a new variable, it will also show up in this list: :: my_variable = 'Just testing' dir() ['__builtins__', '__doc__', '__name__', '__package__', 'my_variable'] Imported modules will also be reflected in this list: :: import os import sys dir() ['__builtins__', '__doc__', '__name__', '__package__', 'my_variable', 'os', 'sys'] ``dir()`` can also be used to list the names defined within a module: :: import sys dir(sys) ['__displayhook__', '__doc__', '__excepthook__', '__name__', '__package__', '__stderr__', '__stdin__', '__stdout__',... It also works on variable types. For example, let's see what attributes the string class has as defined: :: dir(str) ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__',... You can also use ``dir()`` on a defined variable. It will inspect the variable's type and report the attributes for that type. In this case, we defined a variable ``my_variable`` of type ``str``. Calling ``dir(my_variable)`` will produce the same result as calling ``dir(str)``: :: my_variable = 'Just testing' dir(my_variable) ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__',... Hello PyCogent! =============== Now that we've gotten our feet wet, let's write a simple function that returns a friendly message. This is a simple function which takes in one parameter, ``your_name``, and outputs the user's name prefixed with a standard message.
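Written out as a plain script, rather than at the interactive prompt, the function looks like the sketch below (using the parenthesised form of ``print``, which also works in Python 2 for a single argument):

```python
def hello_pycogent(your_name):
    # prefix the user's name with the standard greeting
    message = 'PyCogent bids you welcome ' + your_name
    print(message)
```

Calling ``hello_pycogent('John Smith')`` prints the greeting, exactly as in the interactive session that follows.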
Calling your new function is as simple as typing the name of the function and supplying the appropriate variables: .. doctest:: >>> def hello_pycogent(your_name): ... message = 'PyCogent bids you welcome ' + your_name ... print message ... >>> hello_pycogent('John Smith') PyCogent bids you welcome John Smith Enter each line as you see it and note that white space is important! There are no brackets or keywords to signal blocks of code. Instead, indentation is used to designate related lines of code. Further Python documentation ============================ Now that you've got Python up and running and know a few commands, it might be useful to `browse the official documentation `_. There is a comprehensive list of information and some excellent tutorials to work through. There are also many code examples to be found in the `Python cookbook `_. **************** Useful Utilities **************** .. authors, Daniel McDonald, Gavin Huttley, Antonio Gonzalez Pena, Rob Knight Using PyCogent's optimisers for your own functions ================================================== You have a function that you want to maximise/minimise. The parameters in your function may be bounded (must lie in a specific interval) or not. The cogent optimisers can be applied to these cases. The ``Powell`` (a local optimiser) and ``SimulatedAnnealing`` (a global optimiser) classes in particular have had their interfaces standardised for such use cases. We demonstrate for a very simple function below. We write a simple factory function that uses a provided value for omega to compute the squared deviation from an estimate, then use it to create our optimisable function. .. doctest:: >>> import numpy >>> def DiffOmega(omega): ... def omega_from_S(S): ... omega_est = S/(1-numpy.e**(-1*S)) ... return abs(omega-omega_est)**2 ...
return omega_from_S >>> omega = 0.1 >>> f = DiffOmega(omega) We then import the minimise function and use it to minimise the function, obtaining the fit statistic and the associated estimate of S. Note that we provide lower and upper bounds (which are optional) and an initial guess for our parameter of interest (``S``). .. doctest:: >>> from cogent.maths.optimisers import minimise, maximise >>> S = minimise(f, # the function ... xinit=1.0, # the initial value ... bounds=(-100, 100), # [lower,upper] bounds for the parameter ... local=True) # just local optimisation, not Simulated Annealing >>> assert 0.0 <= f(S) < 1e-6 >>> print 'S=%.4f' % S S=-3.6150 The minimise and maximise functions can also handle multidimensional optimisations, just make xinit (and the bounds) lists rather than scalar values. Fitting a function to a given set of x and y values =================================================== Given a set of values for ``x`` and ``y``, fit a function ``func`` that has ``n_params`` parameters, using simplex iterations to minimise the error between the model ``func`` and the given values. Here we fit an exponential function. .. doctest:: :hide: >>> from numpy.random import seed >>> seed(42) # so the results are not volatile .. doctest:: >>> from numpy import array, arange, exp >>> from numpy.random import rand, seed >>> from cogent.maths.fit_function import fit_function >>> # creating x values >>> x = arange(-1,1,.01) >>> >>> # defining our fitting function >>> def f(x,a): ... return exp(a[0]+x*a[1]) ...
>>> # getting our real y >>> y = f(x,a=[2,5]) >>> >>> # creating our noisy y >>> y_noise = y + rand(len(y))*5 >>> >>> # fitting our noisy data to the function using 1 iteration >>> params = fit_function(x, y_noise, f, 2, 1) >>> params array([ 2.0399908 , 4.96109191]) >>> >>> # fitting our noisy data to the function using 5 iterations >>> params = fit_function(x, y_noise, f, 2, 5) >>> params array([ 2.0399641 , 4.96112469]) Cartesian products ================== *To be written.* .. cogent.util.transform Miscellaneous functions ======================= .. index:: cogent.util.misc Identity testing ^^^^^^^^^^^^^^^^ Basic ``identity`` function to avoid having to test explicitly for None .. doctest:: >>> from cogent.util.misc import identity >>> my_var = None >>> if identity(my_var): ... print "foo" ... else: ... print "bar" ... bar One-line if/else statement ^^^^^^^^^^^^^^^^^^^^^^^^^^ Convenience function for performing one-line if/else statements. This is similar to the C-style ternary operator: .. doctest:: >>> from cogent.util.misc import if_ >>> result = if_(4 > 5, "Expression is True", "Expression is False") >>> result 'Expression is False' However, the value returned is evaluated, but not called. For instance: .. doctest:: >>> from cogent.util.misc import if_ >>> def foo(): ... print "in foo" ... >>> def bar(): ... print "in bar" ... >>> if_(4 > 5, foo, bar) # doctest: +ELLIPSIS <function bar at ...> Force an object to be iterable ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If an object does not support iteration, ``iterable`` wraps it so that it can be iterated over: .. doctest:: >>> from cogent.util.misc import iterable >>> my_var = 10 >>> for i in my_var: ... print "will not work" ... Traceback (most recent call last): TypeError: 'int' object is not iterable >>> for i in iterable(my_var): ... print i ... 10 Obtain the index of the largest item ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To determine the index of the largest item in any iterable container, use ``max_index``: .. doctest:: >>> from cogent.util.misc import max_index >>> l = [5,4,2,2,6,8,0,10,0,5] >>> max_index(l) 7 ..
note:: Will return the lowest index of duplicate max values Obtain the index of the smallest item ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To determine the index of the smallest item in any iterable container, use ``min_index``: .. doctest:: >>> from cogent.util.misc import min_index >>> l = [5,4,2,2,6,8,0,10,0,5] >>> min_index(l) 6 .. note:: Will return the lowest index of duplicate min values Remove a nesting level ^^^^^^^^^^^^^^^^^^^^^^ To flatten a 2-dimensional list, you can use ``flatten``: .. doctest:: >>> from cogent.util.misc import flatten >>> l = ['abcd','efgh','ijkl'] >>> flatten(l) ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l'] Convert a nested tuple into a list ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Conversion of a nested ``tuple`` into a ``list`` can be performed using ``deep_list``: .. doctest:: >>> from cogent.util.misc import deep_list >>> t = ((1,2),(3,4),(5,6)) >>> deep_list(t) [[1, 2], [3, 4], [5, 6]] Simply calling ``list`` will not convert the nested items: .. doctest:: >>> list(t) [(1, 2), (3, 4), (5, 6)] Convert a nested list into a tuple ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Conversion of a nested ``list`` into a ``tuple`` can be performed using ``deep_list``: .. doctest:: >>> from cogent.util.misc import deep_tuple >>> l = [[1,2],[3,4],[5,6]] >>> deep_tuple(l) ((1, 2), (3, 4), (5, 6)) Simply calling ``tuple`` will not convert the nested items: .. doctest:: >>> tuple(l) ([1, 2], [3, 4], [5, 6]) Testing if an item is between two values ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Same as: min <= number <= max, although it is quickly readable within code .. doctest:: >>> from cogent.util.misc import between >>> between((3,5),4) True >>> between((3,5),6) False Return combinations of items ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``Combinate`` returns all k-combinations of items. For instance: .. 
doctest:: >>> from cogent.util.misc import combinate >>> list(combinate([1,2,3],0)) [[]] >>> list(combinate([1,2,3],1)) [[1], [2], [3]] >>> list(combinate([1,2,3],2)) [[1, 2], [1, 3], [2, 3]] >>> list(combinate([1,2,3],3)) [[1, 2, 3]] Save and load gzip'd files ^^^^^^^^^^^^^^^^^^^^^^^^^^ These handy methods will ``cPickle`` an object and automagically gzip the file. You can also then reload the object at a later date. .. doctest:: >>> from cogent.util.misc import gzip_dump, gzip_load >>> class foo(object): ... some_var = 5 ... >>> bar = foo() >>> bar.some_var = 10 >>> # gzip_dump(bar, 'test_file') >>> # new_bar = gzip_load('test_file') >>> # isinstance(new_bar, foo) .. note:: The above code does work, but cPickle won't write out within doctest Curry a function ^^^^^^^^^^^^^^^^ curry(f,x)(y) = f(x,y) or = lambda y: f(x,y). This was modified from the Python Cookbook. Docstrings are also carried over. .. doctest:: >>> from cogent.util.misc import curry >>> def foo(x,y): ... """Some function""" ... return x + y ... >>> bar = curry(foo, 5) >>> print bar.__doc__ curry(foo,5) == curried from foo == Some function >>> bar(10) 15 Test to see if an object is iterable ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Perform a simple test to see if an object supports iteration .. doctest:: >>> from cogent.util.misc import is_iterable >>> can_iter = [1,2,3,4] >>> cannot_iter = 1.234 >>> is_iterable(can_iter) True >>> is_iterable(cannot_iter) False Test to see if an object is a single char ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Perform a simple test to see if an object is a single character .. doctest:: >>> from cogent.util.misc import is_char >>> class foo: ... pass ... >>> is_char('a') True >>> is_char('ab') False >>> is_char(foo()) False Flatten a deeply nested iterable ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To flatten a deeply nested iterable, use ``recursive_flatten``. This method supports multiple levels of nesting, and multiple iterable types .. 
doctest:: >>> from cogent.util.misc import recursive_flatten >>> l = [[[[1,2], 'abcde'], [5,6]], [7,8], [9,10]] >>> recursive_flatten(l) [1, 2, 'a', 'b', 'c', 'd', 'e', 5, 6, 7, 8, 9, 10] Test to determine if not a ``list`` or ``tuple`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Perform a simple check to see if an object is not a list or a tuple .. doctest:: >>> from cogent.util.misc import not_list_tuple >>> not_list_tuple(1) True >>> not_list_tuple([1]) False >>> not_list_tuple('ab') True Unflatten items to row-width ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Unflatten an iterable of items to a specified row-width. This does not reverse the effect of ``zip``, as the lists produced are not interleaved. .. doctest:: >>> from cogent.util.misc import unflatten >>> l = [1,2,3,4,5,6,7,8] >>> unflatten(l,1) [[1], [2], [3], [4], [5], [6], [7], [8]] >>> unflatten(l,2) [[1, 2], [3, 4], [5, 6], [7, 8]] >>> unflatten(l,3) [[1, 2, 3], [4, 5, 6]] >>> unflatten(l,4) [[1, 2, 3, 4], [5, 6, 7, 8]] Unzip items ^^^^^^^^^^^ Reverse the effects of a ``zip`` method, i.e. produces separate lists from tuples .. doctest:: >>> from cogent.util.misc import unzip >>> l = ((1,2),(3,4),(5,6)) >>> unzip(l) [[1, 3, 5], [2, 4, 6]] Select items in order ^^^^^^^^^^^^^^^^^^^^^ Select items in a specified order .. doctest:: >>> from cogent.util.misc import select >>> select('ea', {'a':1,'b':5,'c':2,'d':4,'e':6}) [6, 1] >>> select([0,4,8], 'abcdefghijklm') ['a', 'e', 'i'] Obtain the index sort order ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Obtain the indices for items in sort order. This is similar to numpy.argsort, but will work on any iterable that implements the necessary ``cmp`` methods .. doctest:: >>> from cogent.util.misc import sort_order >>> sort_order([4,2,3,5,7,8]) [1, 2, 0, 3, 4, 5] >>> sort_order('dcba') [3, 2, 1, 0] Find overlapping pattern occurrences ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Find all of the overlapping occurrences of a pattern within a text ..
doctest:: >>> from cogent.util.misc import find_all >>> text = 'aaaaaaa' >>> pattern = 'aa' >>> find_all(text, pattern) [0, 1, 2, 3, 4, 5] >>> text = 'abababab' >>> pattern = 'aba' >>> find_all(text, pattern) [0, 2, 4] Find multiple pattern occurrences ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Find all of the overlapping occurrences of multiple patterns within a text. Returned indices are sorted, each index is the start position of one of the patterns .. doctest:: >>> from cogent.util.misc import find_many >>> text = 'abababcabab' >>> patterns = ['ab','abc'] >>> find_many(text, patterns) [0, 2, 4, 4, 7, 9] Safely remove a trailing underscore ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 'Unreserve' a mutation of Python reserved words .. doctest:: >>> from cogent.util.misc import unreserve >>> unreserve('class_') 'class' >>> unreserve('class') 'class' Create a case-insensitive iterable ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Create a case-insensitive object, for instance, if you want the key 'a' and 'A' to point to the same item in a dict .. doctest:: >>> from cogent.util.misc import add_lowercase >>> d = {'A':5,'B':6,'C':7,'foo':8,42:'life'} >>> add_lowercase(d) {'A': 5, 'a': 5, 'C': 7, 'B': 6, 42: 'life', 'c': 7, 'b': 6, 'foo': 8} Extract data delimited by differing left and right delimiters ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Extract data from a line that is surrounded by different right/left delimiters .. doctest:: >>> from cogent.util.misc import extract_delimited >>> line = "abc[def]ghi" >>> extract_delimited(line,'[',']') 'def' Invert a dictionary ^^^^^^^^^^^^^^^^^^^ Get a dictionary with the values set as keys and the keys set as values .. doctest:: >>> from cogent.util.misc import InverseDict >>> d = {'some_key':1,'some_key_2':2} >>> InverseDict(d) {1: 'some_key', 2: 'some_key_2'} .. 
note:: An arbitrary key will be set if there are multiple keys with the same value Invert a dictionary with multiple keys having the same value ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Get a dictionary with the values set as keys and the keys set as values. Can handle the case where multiple keys point to the same values .. doctest:: >>> from cogent.util.misc import InverseDictMulti >>> d = {'some_key':1,'some_key_2':1} >>> InverseDictMulti(d) {1: ['some_key_2', 'some_key']} >>> Get mapping from sequence item to all positions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``DictFromPos`` returns the positions of all items seen within a sequence. This is useful for obtaining, for instance, nucleotide counts and positions .. doctest:: >>> from cogent.util.misc import DictFromPos >>> seq = 'aattggttggaaggccgccgttagacg' >>> DictFromPos(seq) {'a': [0, 1, 10, 11, 22, 24], 'c': [14, 15, 17, 18, 25], 't': [2, 3, 6, 7, 20, 21], 'g': [4, 5, 8, 9, 12, 13, 16, 19, 23, 26]} Get the first index of occurrence for each item in a sequence ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``DictFromFirst`` will return the first location of each item in a sequence .. doctest:: >>> from cogent.util.misc import DictFromFirst >>> seq = 'aattggttggaaggccgccgttagacg' >>> DictFromFirst(seq) {'a': 0, 'c': 14, 't': 2, 'g': 4} Get the last index of occurrence for each item in a sequence ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``DictFromLast`` will return the last location of each item in a sequence .. doctest:: >>> from cogent.util.misc import DictFromLast >>> seq = 'aattggttggaaggccgccgttagacg' >>> DictFromLast(seq) {'a': 24, 'c': 25, 't': 21, 'g': 26} Construct a distance matrix lookup function ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Automatically construct a distance matrix lookup function. This is useful for maintaining flexibility about whether a function is being computed or if a lookup is being used .. 
doctest:: >>> from cogent.util.misc import DistanceFromMatrix >>> from numpy import array >>> m = array([[1,2,3],[4,5,6],[7,8,9]]) >>> f = DistanceFromMatrix(m) >>> f(0,0) 1 >>> f(1,2) 6 Get all pairs from groups ^^^^^^^^^^^^^^^^^^^^^^^^^ Get all of the pairs of items present in a list of groups. A key will be created (i,j) iff i and j share a group .. doctest:: >>> from cogent.util.misc import PairsFromGroups >>> groups = ['ab','xyz'] >>> PairsFromGroups(groups) {('a', 'a'): None, ('b', 'b'): None, ('b', 'a'): None, ('x', 'y'): None, ('z', 'x'): None, ('y', 'y'): None, ('x', 'x'): None, ('y', 'x'): None, ('z', 'y'): None, ('x', 'z'): None, ('a', 'b'): None, ('y', 'z'): None, ('z', 'z'): None} Check class types ^^^^^^^^^^^^^^^^^ Check an object against base classes or derived classes to see if it is acceptable .. doctest:: >>> from cogent.util.misc import ClassChecker >>> class not_okay(object): ... pass ... >>> no = not_okay() >>> class okay(object): ... pass ... >>> o = okay() >>> class my_dict(dict): ... pass ... >>> md = my_dict() >>> cc = ClassChecker(str, okay, dict) >>> o in cc True >>> no in cc False >>> 5 in cc False >>> {'a':5} in cc True >>> 'asasas' in cc True >>> md in cc True Delegate to a separate object ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Delegate object method calls, properties and variables to the appropriate object. Useful to combine multiple objects together while assuring that the calls will go to the correct object. .. doctest:: >>> from cogent.util.misc import Delegator >>> class ListAndString(list, Delegator): ... def __init__(self, items, string): ... Delegator.__init__(self, string) ... for i in items: ... self.append(i) ... >>> ls = ListAndString([1,2,3], 'ab_cd') >>> len(ls) 3 >>> ls[0] 1 >>> ls.upper() 'AB_CD' >>> ls.split('_') ['ab', 'cd'] Wrap a function to hide from a class ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Wrap a function to hide it from a class so that it isn't a method. .. 
doctest:: >>> from cogent.util.misc import FunctionWrapper >>> f = FunctionWrapper(str) >>> f # doctest: +ELLIPSIS <cogent.util.misc.FunctionWrapper object at ...> >>> f(123) '123' Construct a constrained container ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Wrap a container with a constraint. This is useful for enforcing that the data contained is valid within a defined context. PyCogent provides a base ``ConstrainedContainer`` which can be used to construct user-defined constrained objects. PyCogent also provides ``ConstrainedString``, ``ConstrainedList``, and ``ConstrainedDict``. These provided types fully cover the builtin types while staying integrated with the ``ConstrainedContainer``. Here is a light example of the ``ConstrainedDict`` .. doctest:: >>> from cogent.util.misc import ConstrainedDict >>> d = ConstrainedDict({'a':1,'b':2,'c':3}, Constraint='abc') >>> d {'a': 1, 'c': 3, 'b': 2} >>> d['d'] = 5 Traceback (most recent call last): ConstraintError: Item 'd' not in constraint 'abc' PyCogent also provides mapped constrained containers for each of the default types provided, ``MappedString``, ``MappedList``, and ``MappedDict``. These behave the same, except that they map a mask onto ``__contains__`` and ``__getitem__`` .. doctest:: >>> def mask(x): ... return str(int(x) + 3) ... >>> from cogent.util.misc import MappedString >>> s = MappedString('12345', Constraint='45678', Mask=mask) >>> s '45678' >>> s + '123' '45678456' >>> s + '9' Traceback (most recent call last): ConstraintError: Sequence '9' doesn't meet constraint Check the location of an application ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Determine if an application is available on a system ..
doctest:: >>> from cogent.util.misc import app_path >>> app_path('ls') '/bin/ls' >>> app_path('does_not_exist') False ************************************** Evolutionary analysis using likelihood ************************************** Specifying substitution models ============================== Canned models ------------- Many standard evolutionary models come pre-defined in the ``cogent.evolve.models`` module. The available nucleotide, codon and protein models are .. doctest:: >>> from cogent.evolve import models >>> print models.nucleotide_models ['JC69', 'K80', 'F81', 'HKY85', 'TN93', 'GTR'] >>> print models.codon_models ['CNFGTR', 'CNFHKY', 'MG94HKY', 'MG94GTR', 'GY94', 'H04G', 'H04GK', 'H04GGK'] >>> print models.protein_models ['DSO78', 'AH96', 'AH96_mtmammals', 'JTT92', 'WG01'] While those values are strings, a function of the same name exists within the module, so creating a substitution model requires only calling that function. We demonstrate that for a nucleotide model here. .. doctest:: >>> from cogent.evolve.models import F81 >>> sub_mod = F81() We'll be using these for the examples below. Rate heterogeneity models ------------------------- We illustrate this for the gamma distributed case using examples of the canned models displayed above. Creating rate heterogeneity variants of the canned models can be done by using optional arguments that get passed to the substitution model class. For nucleotide ^^^^^^^^^^^^^^ We specify a general time reversible nucleotide model with gamma distributed rate heterogeneity. ..
doctest:: >>> from cogent.evolve.models import GTR >>> sub_mod = GTR(with_rate=True, distribution='gamma') >>> print sub_mod Nucleotide ( name = 'GTR'; type = 'None'; params = ['A/G', 'A/T', 'A/C', 'C/T', 'C/G']; number of motifs = 4; motifs = ['T', 'C', 'A', 'G']) For codon ^^^^^^^^^ We specify a conditional nucleotide frequency codon model with nucleotide general time reversible parameters and a parameter for the ratio of nonsynonymous to synonymous substitutions (omega) with gamma distributed rate heterogeneity. .. doctest:: >>> from cogent.evolve.models import CNFGTR >>> sub_mod = CNFGTR(with_rate=True, distribution='gamma') >>> print sub_mod Codon ( name = 'CNFGTR'; type = 'None'; params = ['A/G', 'A/C', 'C/T', 'A/T', 'C/G', 'omega']; ... For protein ^^^^^^^^^^^ We specify a Jones, Taylor and Thornton 1992 empirical protein substitution model with gamma distributed rate heterogeneity. .. doctest:: >>> from cogent.evolve.models import JTT92 >>> sub_mod = JTT92(with_rate=True, distribution='gamma') >>> print sub_mod Empirical ( name = 'JTT92'; type = 'None'; number of motifs = 20; motifs = ['A', 'C'... Specifying likelihood functions =============================== Making a likelihood function ---------------------------- You start by specifying a substitution model and use that to construct a likelihood function for a specific tree. .. doctest:: >>> from cogent import LoadTree >>> from cogent.evolve.models import F81 >>> sub_mod = F81() >>> tree = LoadTree(treestring='(a,b,(c,d))') >>> lf = sub_mod.makeLikelihoodFunction(tree) Providing an alignment to a likelihood function ----------------------------------------------- You need to load an alignment and then provide it a likelihood function. I construct very simple trees and alignments for this example. .. 
.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> sub_mod = F81()
    >>> tree = LoadTree(treestring='(a,b,(c,d))')
    >>> lf = sub_mod.makeLikelihoodFunction(tree)
    >>> aln = LoadSeqs(data=[('a', 'ACGT'), ('b', 'AC-T'), ('c', 'ACGT'),
    ...     ('d', 'AC-T')])
    ...
    >>> lf.setAlignment(aln)

Scoping parameters on trees
---------------------------

For many evolutionary analyses, it's desirable to allow different branches on a tree to have different values of a parameter. We show this for a simple codon model case here where we want the great apes (the clade that includes human and orangutan) to have a different value of the ratio of nonsynonymous to synonymous substitutions. This parameter is identified in the precanned ``CNFGTR`` model as ``omega``.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> print tree.asciiArt()
              /-Galago
             |
    -root----|--HowlerMon
             |
             |          /-Rhesus
              \edge.3--|
                       |          /-Orangutan
                        \edge.2--|
                                 |          /-Gorilla
                                  \edge.1--|
                                           |          /-Human
                                            \edge.0--|
                                                      \-Chimpanzee
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', tip_names=['Human', 'Orangutan'], outgroup_name='Galago', is_clade=True, init=0.5)

We've set an *initial* value for this clade so that the edges affected by this rule are evident below.

.. doctest::

    >>> print lf
    Likelihood Function Table
    ====================================
     A/C     A/G     A/T     C/G     C/T
    ------------------------------------
    1.00    1.00    1.00    1.00    1.00
    ------------------------------------
    =======================================
          edge    parent    length    omega
    ---------------------------------------
        Galago      root      1.00     1.00
     HowlerMon      root      1.00     1.00
        Rhesus    edge.3      1.00     1.00
     Orangutan    edge.2      1.00     0.50
       Gorilla    edge.1      1.00     0.50
         Human    edge.0      1.00     0.50
    Chimpanzee    edge.0      1.00     0.50
        edge.0    edge.1      1.00     0.50
        edge.1    edge.2      1.00     0.50
        edge.2    edge.3      1.00     1.00
        edge.3      root      1.00     1.00
    ---------------------------------------...
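The scoping rule above selects every edge inside the clade spanned by the named tips (the clade's own stem edge is excluded by default). The selection logic can be sketched in plain Python — ``tips``, ``mrca`` and ``clade_edges`` are helper names invented for this sketch, not part of PyCogent:

```python
# Illustrative sketch of is_clade edge selection. The tree is the primate
# topology from the example, encoded as (name, children) tuples.
tree = ('root', [
    ('Galago', []),
    ('HowlerMon', []),
    ('edge.3', [
        ('Rhesus', []),
        ('edge.2', [
            ('Orangutan', []),
            ('edge.1', [
                ('Gorilla', []),
                ('edge.0', [('Human', []), ('Chimpanzee', [])]),
            ]),
        ]),
    ]),
])

def tips(node):
    """Set of tip names under a node."""
    name, children = node
    if not children:
        return {name}
    return set().union(*(tips(child) for child in children))

def mrca(node, targets):
    """Smallest subtree whose tips include all target names."""
    for child in node[1]:
        if targets <= tips(child):
            return mrca(child, targets)
    return node

def clade_edges(node):
    """All edges strictly inside a subtree (the subtree's own stem excluded)."""
    edges = []
    for child in node[1]:
        edges.append(child[0])
        edges.extend(clade_edges(child))
    return edges

ancestor = mrca(tree, {'Human', 'Orangutan'})
scoped = clade_edges(ancestor)
# these six edges are exactly the ones showing omega = 0.50 in the table above
```

The six selected edges match the rows assigned the initial value 0.50, while the stem edge of the clade (``edge.2``) retains the default, as in the printed table.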
A more extensive description of capabilities is in :ref:`scope-params-on-trees`.

Specifying parameter values
---------------------------

Specifying a parameter as constant
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This means the parameter will not be modified during likelihood maximisation. We show this here by making the ``omega`` parameter constant at the value 1 -- essentially the condition of selective neutrality.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', is_constant=True)

Providing a starting value for a parameter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This can be useful to improve performance; the closer you are to the maximum likelihood estimate, the quicker the optimisation will be.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', init=0.1)

Setting bounds for optimising a function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This can be useful for stopping optimisers from getting stuck in a bad part of parameter space.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', init=0.1, lower=1e-9, upper=20.0)

Specifying rate heterogeneity functions
---------------------------------------

We extend the simple gamma distributed rate heterogeneity case for nucleotides from above to construction of the actual likelihood function. We do this for 4 bins and constrain the bin probabilities to be equal.
.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR(with_rate=True, distribution='gamma')
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree, bins=4, digits=2)
    >>> lf.setParamRule('bprobs', is_constant=True)

For more detailed discussion of defining and using these models see :ref:`rate-heterogeneity`.

Specifying Phylo-HMMs
---------------------

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR(with_rate=True, distribution='gamma')
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree, bins=4, sites_independent=False,
    ...                                digits=2)
    >>> lf.setParamRule('bprobs', is_constant=True)

For more detailed discussion of defining and using these models see :ref:`rate-heterogeneity-hmm`.

Fitting likelihood functions
============================

Choice of optimisers
--------------------

There are two types of optimiser: simulated annealing, a *global* optimiser; and Powell, a *local* optimiser. The simulated annealing method is slow compared to Powell and in general Powell is an adequate choice. I set up a simple nucleotide model to illustrate these.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)

The default is to use the simulated annealing optimiser followed by Powell.

.. doctest::

    >>> lf.optimise(show_progress=False)

We can specify just using the local optimiser. To do so, it's recommended to set the ``max_restarts`` argument, since this provides a mechanism for Powell to attempt restarting the optimisation from a slightly different spot, which can help in overcoming local maxima.
.. doctest::

    >>> lf.optimise(local=True, max_restarts=5, show_progress=False)

We might want to do crude simulated annealing followed by more rigorous Powell.

.. doctest::

    >>> lf.optimise(show_progress=False, global_tolerance=1.0, tolerance=1e-8,
    ...             max_restarts=5)

Checkpointing runs
------------------

See :ref:`checkpointing-optimisation`.

How to check your optimisation was successful
---------------------------------------------

There is no guarantee that an optimised function has achieved a global maximum. We can, however, be sure that a maximum was achieved by validating that the optimiser stopped because the specified tolerance condition was met, rather than exceeding the maximum number of evaluations. The latter number is set to ensure optimisation doesn't proceed endlessly. If the optimiser exited because this limit was exceeded you can be sure that the function **has not** been successfully optimised.

We can monitor this situation using the ``limit_action`` argument to ``optimise``. Providing the value ``raise`` causes an exception to be raised if this condition occurs, as shown below. Providing ``warn`` (default) instead will cause a warning message to be printed to screen but execution will continue. The value ``ignore`` hides any such message.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)
    >>> max_evals = 10
    >>> lf.optimise(show_progress=False, limit_action='raise',
    ...             max_evaluations=max_evals, return_calculator=True)
    Traceback (most recent call last):
    ArithmeticError: FORCED EXIT from optimiser after 10 evaluations

.. note:: We recommend using ``limit_action='raise'`` and catching ``ArithmeticError`` explicitly. You really shouldn't be using results from such an optimisation run.
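The budget-exhaustion behaviour described above can be mimicked with a toy hill climber. This is an illustrative sketch only — the function and its arguments are invented for the example and are unrelated to PyCogent's actual optimiser internals:

```python
def maximise(f, x, step=0.1, tolerance=1e-8, max_evaluations=1000,
             limit_action='raise'):
    """Toy 1-D hill climber illustrating the limit_action idea (sketch only)."""
    evals = 1
    fx = f(x)
    while step > tolerance:
        improved = False
        for cand in (x - step, x + step):
            if evals >= max_evaluations:
                # mirror limit_action='raise': refuse to hand back a result
                # from an optimisation that ran out of budget
                if limit_action == 'raise':
                    raise ArithmeticError(
                        'FORCED EXIT from optimiser after %d evaluations' % evals)
                return x
            fc = f(cand)
            evals += 1
            if fc > fx:
                x, fx, improved = cand, fc, True
        if not improved:
            step /= 2  # refine the search neighbourhood and retry
    return x
```

With enough budget this converges; with a tiny ``max_evaluations`` it raises, which is the safe behaviour to prefer over silently reporting an unconverged fit.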
Getting statistics out of likelihood functions
==============================================

Model fit statistics
--------------------

Log likelihood and number of free parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR()
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree)
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> lf.setAlignment(aln)

We get the log-likelihood and the number of free parameters.

.. doctest::

    >>> lnL = lf.getLogLikelihood()
    >>> print lnL
    -24601.9...
    >>> nfp = lf.getNumFreeParams()
    >>> print nfp
    16

.. warning:: The number of free parameters (nfp) refers only to the number of parameters that were modifiable by the optimiser. Typically, the degrees-of-freedom of a likelihood ratio test statistic is computed as the difference in nfp between models. This will not be correct for models in which boundary conditions exist (rate heterogeneity models where a parameter value boundary is set between bins).

Information theoretic measures
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Akaike Information Criterion
""""""""""""""""""""""""""""

.. note:: This measure only makes sense when the model has been optimised, a step I'm skipping here in the interests of speed.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR()
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree)
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> lf.setAlignment(aln)
    >>> AIC = lf.getAic()
    >>> AIC
    49235.869...

We can also get the second-order AIC.

.. doctest::

    >>> AICc = lf.getAic(second_order=True)
    >>> AICc
    49236.064...

Bayesian Information Criterion
""""""""""""""""""""""""""""""

.. note:: This measure only makes sense when the model has been optimised, a step I'm skipping here in the interests of speed.
.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR()
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree)
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> lf.setAlignment(aln)
    >>> BIC = lf.getBic()
    >>> BIC
    49330.9475...

Getting maximum likelihood estimates
------------------------------------

We fit the model defined in the previous section and use that in the following.

One at a time
^^^^^^^^^^^^^

We get the statistics out individually. We get the ``length`` for the Human edge and the exchangeability parameter ``A/G``.

.. doctest::

    >>> lf.optimise(local=True, show_progress=False)
    >>> a_g = lf.getParamValue('A/G')
    >>> print a_g
    5.25...
    >>> human = lf.getParamValue('length', 'Human')
    >>> print human
    0.006...

Just the motif probabilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> mprobs = lf.getMotifProbs()
    >>> print mprobs
    ====================================
         T         C         A         G
    ------------------------------------
    0.2406    0.1742    0.3757    0.2095
    ------------------------------------

On the tree object
^^^^^^^^^^^^^^^^^^

If written to file in XML format, the model parameters will be saved. This can be useful for later plotting or recreating likelihood functions.

.. doctest::

    >>> annot_tree = lf.getAnnotatedTree()
    >>> print annot_tree.getXML() #doctest: +SKIP
    Galago A/G5.25342689214 A/C1.23159157151 C/T5.97001104267 length0.173114172705...

.. warning:: This method fails for some rate-heterogeneity models.

As tables
^^^^^^^^^

.. doctest::

    >>> tables = lf.getStatistics(with_motif_probs=True, with_titles=True)
    >>> for table in tables:
    ...     if 'global' in table.Title:
    ...         print table
    global params
    ==============================================
       A/C       A/G       A/T       C/G       C/T
    ----------------------------------------------
    1.2316    5.2534    0.9585    2.3159    5.9700
    ----------------------------------------------

Testing hypotheses
==================

Using likelihood ratio tests
----------------------------

We test the molecular clock hypothesis for human and chimpanzee lineages. The null has these two branches constrained to be equal.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)
    >>> lf.setParamRule('length', tip_names=['Human', 'Chimpanzee'],
    ...     outgroup_name='Galago', is_clade=True, is_independent=False)
    ...
    >>> lf.setName('Null Hypothesis')
    >>> lf.optimise(local=True, show_progress=False)
    >>> null_lnL = lf.getLogLikelihood()
    >>> null_nfp = lf.getNumFreeParams()
    >>> print lf
    Null Hypothesis
    ==========================
          edge  parent  length
    --------------------------
        Galago    root   0.167
     HowlerMon    root   0.044
        Rhesus  edge.3   0.021
     Orangutan  edge.2   0.008
       Gorilla  edge.1   0.002
         Human  edge.0   0.004
    Chimpanzee  edge.0   0.004
        edge.0  edge.1   0.000...

The alternative allows the human and chimpanzee branches to differ by just setting all lengths to be independent.

.. doctest::

    >>> lf.setParamRule('length', is_independent=True)
    >>> lf.setName('Alt Hypothesis')
    >>> lf.optimise(local=True, show_progress=False)
    >>> alt_lnL = lf.getLogLikelihood()
    >>> alt_nfp = lf.getNumFreeParams()
    >>> print lf
    Alt Hypothesis
    ==========================
          edge  parent  length
    --------------------------
        Galago    root   0.167
     HowlerMon    root   0.044
        Rhesus  edge.3   0.021
     Orangutan  edge.2   0.008
       Gorilla  edge.1   0.002
         Human  edge.0   0.006
    Chimpanzee  edge.0   0.003
        edge.0  edge.1   0.000...
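The information criteria reported in the previous section follow the standard formulas: AIC = 2k − 2lnL, the second-order AICc = AIC + 2k(k + 1)/(n − k − 1), and BIC = k·ln(n) − 2lnL. A quick stdlib check, using approximate values from this chapter's GTR example — lnL ≈ −24601.935 is inferred from the reported AIC, and treating n = 2814 alignment columns as the sample size is an assumption on my part:

```python
import math

def aic(lnL, nfp):
    # Akaike Information Criterion
    return 2 * nfp - 2 * lnL

def aicc(lnL, nfp, n):
    # second-order AIC, corrected for sample size n
    return aic(lnL, nfp) + (2 * nfp * (nfp + 1)) / (n - nfp - 1)

def bic(lnL, nfp, n):
    # Bayesian Information Criterion
    return nfp * math.log(n) - 2 * lnL

# approximate values from the GTR example in this chapter
lnL, nfp, n = -24601.935, 16, 2814
```

These reproduce the ``getAic``, ``getAic(second_order=True)`` and ``getBic`` values shown earlier to within rounding.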
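This clock test has one degree of freedom, and for df = 1 the chi-square tail probability can be cross-checked with only the standard library, since the survival function reduces to erfc(√(LR/2)). A small sketch (not PyCogent code):

```python
import math

def lrt_pvalue_df1(lnL_null, lnL_alt):
    """P-value of a likelihood ratio test with 1 degree of freedom.

    For df=1 the chi-square survival function at LR equals
    erfc(sqrt(LR / 2)), so no stats package is required.
    """
    LR = 2 * (lnL_alt - lnL_null)
    return math.erfc(math.sqrt(LR / 2))
```

With the LR of 3.3294 computed in this section this gives p ≈ 0.068, matching the ``chisqprob`` result.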
We import the function for computing the probability of a chi-square test statistic, compute the likelihood ratio test statistic, degrees of freedom and the corresponding probability.

.. doctest::

    >>> from cogent.maths.stats import chisqprob
    >>> LR = 2 * (alt_lnL - null_lnL) # the likelihood ratio statistic
    >>> df = (alt_nfp - null_nfp) # the test degrees of freedom
    >>> p = chisqprob(LR, df)
    >>> print 'LR=%.4f ; df = %d ; p=%.4f' % (LR, df, p)
    LR=3.3294 ; df = 1 ; p=0.0681

By parametric bootstrapping
---------------------------

If we can't rely on the asymptotic behaviour of the LRT, e.g. due to small alignment length, we can use a parametric bootstrap. Convenience functions for that are described in more detail in :ref:`parametric-bootstrap`. In general, however, this capability derives from the ability of any defined ``evolve`` likelihood function to simulate an alignment. This property is provided as the ``simulateAlignment`` method on likelihood function objects.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)
    >>> lf.setParamRule('length', tip_names=['Human', 'Chimpanzee'],
    ...     outgroup_name='Galago', is_clade=True, is_independent=False)
    ...
    >>> lf.setName('Null Hypothesis')
    >>> lf.optimise(local=True, show_progress=False)
    >>> sim_aln = lf.simulateAlignment()
    >>> print repr(sim_aln)
    7 x 2814 dna alignment: Gorilla...

Determining confidence intervals on MLEs
========================================

The profile method is used to calculate a confidence interval for a named parameter. We show it here for a global substitution model exchangeability parameter (*kappa*, the ratio of transition to transversion rates) and for an edge-specific parameter (just the human branch length).
.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import HKY85
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = HKY85()
    >>> lf = sm.makeLikelihoodFunction(tree)
    >>> lf.setAlignment(aln)
    >>> lf.optimise(local=True, show_progress=False)
    >>> kappa_lo, kappa_mle, kappa_hi = lf.getParamInterval('kappa')
    >>> print "lo=%.2f ; mle=%.2f ; hi = %.2f" % (kappa_lo, kappa_mle, kappa_hi)
    lo=3.78 ; mle=4.44 ; hi = 5.22
    >>> human_lo, human_mle, human_hi = lf.getParamInterval('length', 'Human')
    >>> print "lo=%.2f ; mle=%.2f ; hi = %.2f" % (human_lo, human_mle, human_hi)
    lo=0.00 ; mle=0.01 ; hi = 0.01

Saving results
==============

Use either the annotated tree or statistics tables to obtain objects that can easily be written to file.

Visualising statistics on trees
===============================

We look at the distribution of ``omega`` from the CNF codon model family across different primate lineages. We allow each edge to have an independent value for ``omega``.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2, space=2)
    >>> lf.setParamRule('omega', is_independent=True, upper=10.0)
    >>> lf.setAlignment(aln)
    >>> lf.optimise(show_progress=False, local=True)
    >>> print lf
    Likelihood Function Table
    ============================
     A/C   A/G   A/T   C/G   C/T
    ----------------------------
    1.07  3.88  0.79  1.96  4.09
    ----------------------------
    =================================
          edge  parent  length  omega
    ---------------------------------
        Galago    root    0.53   0.85
     HowlerMon    root    0.14   0.71
        Rhesus  edge.3    0.07   0.58
     Orangutan  edge.2    0.02   0.49
       Gorilla  edge.1    0.01   0.43
         Human  edge.0    0.02   2.44
    Chimpanzee  edge.0    0.01   2.28
        edge.0  edge.1    0.00   0.01
        edge.1  edge.2    0.01   0.55
        edge.2  edge.3    0.04   0.33
        edge.3    root    0.02   1.10...
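The idea behind the profile interval used by ``getParamInterval`` earlier can be sketched with a one-parameter log-likelihood: bisect outward from the MLE to the points where the curve drops by half the chi-square critical value (1.92 for a 95% interval). This is an illustrative sketch with a made-up quadratic likelihood, not PyCogent's implementation:

```python
def profile_interval(loglike, mle, lo_bracket, hi_bracket, drop=1.92):
    """95% profile CI: points where loglike falls `drop` below its maximum."""
    target = loglike(mle) - drop
    def solve(inside, outside):
        # bisect between a point inside the interval and one outside it
        for _ in range(200):
            mid = (inside + outside) / 2
            if loglike(mid) > target:
                inside = mid
            else:
                outside = mid
        return (inside + outside) / 2
    return solve(mle, lo_bracket), solve(mle, hi_bracket)

# hypothetical quadratic log-likelihood centred on an MLE of 4.44
loglike = lambda x: -(x - 4.44) ** 2 / (2 * 0.16)
lo, hi = profile_interval(loglike, 4.44, 0.0, 20.0)
```

For a quadratic log-likelihood the interval reduces to MLE ± 1.96 standard errors, which is a handy sanity check on the bisection.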
We need an annotated tree object to do the drawing; we write this out to an XML formatted file so it can be reloaded for later reuse.

.. doctest::

    >>> annot_tree = lf.getAnnotatedTree()
    >>> annot_tree.writeToFile('result_tree.xml')

We first import a dendrogram class and then generate a heat mapped image to file, where edges are colored red by the magnitude of ``omega`` with maximal saturation when ``omega=1``.

.. doctest::

    >>> from cogent.draw.dendrogram import ContemporaneousDendrogram
    >>> dend = ContemporaneousDendrogram(annot_tree)
    >>> fig = dend.makeFigure(height=6, width=6, shade_param='omega',
    ...                       max_value=1.0, stroke_width=2)
    >>> fig.savefig('omega_heat_map.png')

Reconstructing ancestral sequences
==================================

We first fit a likelihood function.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)
    >>> lf.optimise(show_progress=False, local=True)

We then get the most likely ancestral sequences.

.. doctest::

    >>> ancestors = lf.likelyAncestralSeqs()
    >>> print ancestors
    >root
    TGTGGCACAAATACTCATGCCAGCTCATTACAGCA...

Or we can get the posterior probabilities (returned as a ``DictArray``) of sequence states at each node.

.. doctest::

    >>> ancestral_probs = lf.reconstructAncestralSeqs()
    >>> print ancestral_probs['root']
    ============================================
             T         C         A         G
    --------------------------------------------
    0    0.1816    0.0000    0.0000    0.0000
    1    0.0000    0.0000    0.0000    0.1561
    2    0.1816    0.0000    0.0000    0.0000
    3    0.0000    0.0000    0.0000    0.1561...

Tips for improved performance
=============================

Sequentially build the fitting
------------------------------

There's nothing that improves performance quite like being close to the maximum likelihood values.
So using the ``setParamRule`` method to provide good starting values can be very useful. As this can be difficult to do, one easy approach is to build simpler models that are nested within the one you're interested in. Fitting those models and then relaxing constraints until you’re at the parameterisation of interest can markedly improve optimisation speed. Being able to save results to file allows you to do this between sessions.

Sampling
--------

If you're dealing with a very large alignment, another approach is to use a subset of the alignment to fit the model, then try fitting the entire alignment. The alignment object has a method to facilitate this approach. The following samples 99 codons without replacement.

.. doctest::

    >>> from cogent import LoadSeqs
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> smpl = aln.sample(n=99, with_replacement=False, motif_length=3)
    >>> len(smpl)
    297

While this samples 99 nucleotides without replacement.

.. doctest::

    >>> smpl = aln.sample(n=99, with_replacement=False)
    >>> len(smpl)
    99

.. following cleans up files

.. doctest::
    :hide:

    >>> from cogent.util.misc import remove_files
    >>> remove_files(['result_tree.xml', 'omega_heat_map.png'],
    ...     error_on_missing=False)

PyCogent-1.5.3/doc/_static/google_feed.js

google.load("feeds", "1");

function initialize() {
  var feed = new google.feeds.Feed("http://pycogent.wordpress.com/feed/");
  feed.load(function(result) {
    if (!result.error) {
      var container = document.getElementById("feed");
      for (var i = 0; i < 3; i++) {
        var entry = result.feed.entries[i];
        var tr = document.createElement('tr');
        var td = document.createElement('td');
        var link = document.createElement('a');
        link.setAttribute('href', entry.link);
        var str = entry.publishedDate;
        var patt1 = /[0-9]{2} \w+ [0-9]{4}/i;
        var pubdate = str.match(patt1).toString();
        var splitdate = pubdate.split(" ");
        var title = document.createTextNode(entry.title);
        var subtitle = document.createElement('b');
        subtitletxt = document.createTextNode(' (' + splitdate[1] + ' ' + splitdate[0] + ', ' + splitdate[2] + ')');
        subtitle.setAttribute('style', 'color:white;font-size:9px;');
        subtitle.appendChild(subtitletxt);
        link.appendChild(title);
        link.appendChild(subtitle);
        td.appendChild(link);
        tr.appendChild(td);
        container.appendChild(tr);
      }
    }
  });
}

google.setOnLoadCallback(initialize);

PyCogent-1.5.3/cogent/__init__.py

"""The most commonly used constructors are available from this toplevel module.
The rest are in the subpackages: phylo, evolve, maths, draw, parse and format."""

import sys, os, re, cPickle
import numpy

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Rob Knight", "Peter Maxwell", "Jeremy Widmann",
               "Catherine Lozupone", "Matthew Wakefield", "Edward Lang",
               "Greg Caporaso", "Mike Robeson", "Micah Hamady", "Sandra Smit",
               "Zongzhi Liu", "Andrew Butterfield", "Amanda Birmingham",
               "Brett Easton", "Hua Ying", "Jason Carnes", "Raymond Sammut",
               "Helen Lindsay", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

#SUPPORT2425
if sys.version_info < (2, 6):
    py_version = ".".join([str(n) for n in sys.version_info])
    raise RuntimeError("Python-2.6 or greater is required, Python-%s used."
                       % py_version)

numpy_version = re.split("[^\d]", numpy.__version__)
numpy_version_info = tuple([int(i) for i in numpy_version if i.isdigit()])
if numpy_version_info < (1, 3):
    raise RuntimeError("Numpy-1.3 is required, %s found." % numpy_version)

version = __version__
version_info = tuple([int(v) for v in version.split(".") if v.isdigit()])

from cogent.util.table import Table as _Table
from cogent.parse.table import load_delimited, autogen_reader
from cogent.core.tree import TreeBuilder, TreeError
from cogent.parse.tree_xml import parse_string as tree_xml_parse_string
from cogent.parse.newick import parse_string as newick_parse_string
from cogent.core.alignment import SequenceCollection
from cogent.core.alignment import Alignment
from cogent.parse.sequence import FromFilenameParser
from cogent.parse.structure import FromFilenameStructureParser

#note that moltype has to be imported last, because it sets the moltype in
#the objects created by the other modules.
from cogent.core.moltype import ASCII, DNA, RNA, PROTEIN, STANDARD_CODON, \
    CodonAlphabet

def Sequence(moltype=None, seq=None, name=None, filename=None, format=None):
    if seq is None:
        for (a_name, a_seq) in FromFilenameParser(filename, format):
            if seq is None:
                seq = a_seq
                if name is None:
                    name = a_name
            else:
                raise ValueError("Multiple sequences in '%s'" % filename)
    if moltype is not None:
        seq = moltype.makeSequence(seq)
    elif not hasattr(seq, 'MolType'):
        seq = ASCII.makeSequence(seq)
    if name is not None:
        seq.Name = name
    return seq

def LoadSeqs(filename=None, format=None, data=None, moltype=None, name=None,
             aligned=True, label_to_name=None, parser_kw={},
             constructor_kw={}, **kw):
    """Initialize an alignment or collection of sequences.

    Arguments:
    - filename: name of the sequence file
    - format: format of the sequence file
    - data: optional explicit provision of sequences
    - moltype: the MolType, eg DNA, PROTEIN
    - aligned: set True if sequences are already aligned and have the same
      length, results in an Alignment object. If False, a SequenceCollection
      instance is returned instead. If callable, will use as a constructor
      (e.g. can pass in DenseAlignment or CodonAlignment).
    - label_to_name: function for converting original name into another name.
      Default behavior is to preserve the original FASTA label and comment.
      To remove all FASTA label comments, and pass in only the label, pass in:
          label_to_name=lambda x: x.split()[0]
      To look up names in a dict, pass in:
          label_to_name = lambda x: d.get(x, default_name)
      ...where d is a dict that's in scope, and default_name is what you want
      to assign any sequence that isn't in the dict.

    If format is None, will attempt to infer format from the filename suffix.
    If label_to_name is None, will attempt to infer correct conversion from
    the format.
""" if filename is None: assert data is not None assert format is None assert not kw, kw else: assert data is None, (filename, data) data = list(FromFilenameParser(filename, format, **parser_kw)) # the following is a temp hack until we have the load API sorted out. if aligned: #if callable, call it -- expect either f(data) or bool if hasattr(aligned, '__call__'): return aligned(data=data, MolType=moltype, Name=name, label_to_name=label_to_name, **constructor_kw) else: #was not callable, but wasn't False return Alignment(data=data, MolType=moltype, Name=name, label_to_name=label_to_name, **constructor_kw) else: #generic case: return SequenceCollection return SequenceCollection(data, MolType=moltype, Name=name, label_to_name=label_to_name, **constructor_kw) def LoadStructure(filename, format=None, parser_kw={}): """Initialize a Structure from data contained in filename. Arguments: - filename: name of the filename to create structure from. - format: the optional file format extension. - parser_kw: optional keyword arguments for the parser.""" # currently there is no support for string-input assert filename is not None, 'No filename given.' return FromFilenameStructureParser(filename, format, **parser_kw) def LoadTable(filename=None, sep=',', reader=None, header=None, rows=None, row_order=None, digits=4, space=4, title='', missing_data='', max_width = 1e100, row_ids=False, legend='', column_templates=None, dtype=None, static_column_types=False, limit=None, **kwargs): """ Arguments: - filename: path to file containing a pickled table - sep: the delimiting character between columns - reader: a parser for reading filename. This approach assumes the first row returned by the reader will be the header row. - static_column_types: if True, and reader is None, identifies columns with a numeric data type (int, float) from the first non-header row. This assumes all subsequent entries in that column are of the same type. Default is False. 
    - header: column headings
    - rows: a 2D dict, list or tuple. If a dict, it must have column headings
      as top level keys, and common row labels as keys in each column.
    - row_order: the order in which rows will be pulled from the twoDdict
    - digits: floating point resolution
    - space: number of spaces between columns or a string
    - title: as implied
    - missing_data: character assigned if a row has no entry for a column
    - max_width: maximum column width for printing
    - row_ids: if True, the 0'th column is used as row identifiers and keys
      for slicing.
    - legend: table legend
    - column_templates: dict of column headings: string format templates or a
      function that will handle the formatting.
    - dtype: optional numpy array typecode.
    - limit: exits after this many lines. Only applied for non pickled data
      file types.
    """
    #
    if filename is not None and not (reader or static_column_types):
        if filename[filename.rfind(".")+1:] == 'pickle':
            f = file(filename, 'U')
            loaded_table = cPickle.load(f)
            f.close()
            return _Table(**loaded_table)

        sep = sep or kwargs.pop('delimiter', None)
        header, rows, loaded_title, legend = load_delimited(filename,
                delimiter=sep, limit=limit, **kwargs)
        title = title or loaded_title
    elif filename and (reader or static_column_types):
        f = file(filename, "r")
        if not reader:
            reader = autogen_reader(f, sep, limit=limit,
                        with_title=kwargs.get('with_title', False))
        rows = [row for row in reader(f)]
        f.close()
        header = rows.pop(0)

    table = _Table(header=header, rows=rows, digits=digits,
                   row_order=row_order, title=title, dtype=dtype,
                   column_templates=column_templates, space=space,
                   missing_data=missing_data, max_width=max_width,
                   row_ids=row_ids, legend=legend)
    return table

def LoadTree(filename=None, treestring=None, tip_names=None, format=None, \
             underscore_unmunge=False):
    """Constructor for tree.

    Arguments, use only one of:
    - filename: a file containing a newick or xml formatted tree.
    - treestring: a newick or xml formatted tree string.
    - tip_names: a list of tip names.
    Note: underscore_unmunging is turned off by default, although it is part
    of the Newick format. Set underscore_unmunge to True to replace
    underscores with spaces in all names read.
    """
    if filename:
        assert not (treestring or tip_names)
        treestring = open(filename).read()
        if format is None and filename.endswith('.xml'):
            format = "xml"
    if treestring:
        assert not tip_names
        if format is None and treestring.startswith('<'):
            format = "xml"
        if format == "xml":
            parser = tree_xml_parse_string
        else:
            parser = newick_parse_string
        tree_builder = TreeBuilder().createEdge
        #FIXME: More general strategy for underscore_unmunge
        if parser is newick_parse_string:
            tree = parser(treestring, tree_builder, \
                          underscore_unmunge=underscore_unmunge)
        else:
            tree = parser(treestring, tree_builder)
        if not tree.NameLoaded:
            tree.Name = 'root'
    elif tip_names:
        tree_builder = TreeBuilder().createEdge
        tips = [tree_builder([], tip_name, {}) for tip_name in tip_names]
        tree = tree_builder(tips, 'root', {})
    else:
        raise TreeError, 'filename or treestring not specified'
    return tree

PyCogent-1.5.3/cogent/align/
PyCogent-1.5.3/cogent/app/
PyCogent-1.5.3/cogent/cluster/
PyCogent-1.5.3/cogent/core/
PyCogent-1.5.3/cogent/data/
PyCogent-1.5.3/cogent/db/
PyCogent-1.5.3/cogent/draw/
PyCogent-1.5.3/cogent/evolve/
PyCogent-1.5.3/cogent/format/
PyCogent-1.5.3/cogent/maths/
PyCogent-1.5.3/cogent/motif/
PyCogent-1.5.3/cogent/parse/
PyCogent-1.5.3/cogent/phylo/
PyCogent-1.5.3/cogent/recalculation/
PyCogent-1.5.3/cogent/seqsim/
PyCogent-1.5.3/cogent/struct/
PyCogent-1.5.3/cogent/util/

PyCogent-1.5.3/cogent/util/__init__.py

#!/usr/bin/env python
__all__ = ['array', 'checkpointing', 'datatypes', 'dict2d', 'misc',
           'modules', 'organizer', 'parallel', 'table', 'transform',
           'unit_test', 'warning', 'recode_alignment']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Rob Knight", "Sandra Smit", "Peter Maxwell",
               "Amanda Birmingham", "Zongzhi Liu", "Andrew Butterfield",
               "Daniel McDonald", "Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

PyCogent-1.5.3/cogent/util/array.py

#!/usr/bin/env python
"""Provides small utility functions for numpy arrays.
""" from operator import mul, __getitem__ as getitem from numpy import array, arange, logical_not, cumsum, where, compress, ravel,\ zeros, put, take, sort, searchsorted, log, nonzero, sum,\ sqrt, clip, maximum, reshape, argsort, argmin, repeat, product, identity,\ concatenate, less, trace, newaxis, min, pi from numpy.random import randint, normal import numpy from cogent.util.transform import cross_comb def cartesian_product(lists): """Returns cartesian product of lists as list of tuples. WARNING: Explicitly constructs list in memory. Should use generator version in cogent.util.transform instead for most uses. Provided for compatibility. """ return map(tuple, cross_comb(lists)) numerictypes = numpy.core.numerictypes.sctype2char Float = numerictypes(float) Int = numerictypes(int) err = numpy.seterr(divide='raise') __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" def gapped_to_ungapped(orig, gap_state, remove_mask=False): """Return array converting gapped to ungapped indices based on gap state. Will use == to test whether items equal the gapped state. Assumes character arrays. If remove_mask is True (default is False), will assign positions that are only in the gapped but not the ungapped version to -1 for easy detection. """ return masked_to_unmasked(orig == gap_state, remove_mask) def ungapped_to_gapped(orig, gap_state): """Returns array mapping indices in ungapped sequence to indices in orig. See documentation for unmasked_to_masked for more detail. """ return unmasked_to_masked(orig == gap_state) def masked_to_unmasked(mask, remove_mask=False): """Returns array mapping indices in orig to indices in ungapped. 
    Specifically, for each position in orig, returns the index of the
    position in the unmasked sequence of the last non-masked character at
    or before that index (i.e. if the index corresponds to a masked
    position, will return the index of the previous non-masked position,
    since the masked positions aren't in the unmasked sequence by
    definition).

    If remove_mask is True (the default is False), sets the masked
    positions to -1 for easy detection.
    """
    result = cumsum(logical_not(mask), axis=0) - 1
    if remove_mask:
        result = where(mask, -1, result)
    return result

def unmasked_to_masked(mask):
    """Returns array mapping indices in ungapped to indices in original.

    Any position where the mask is True will be omitted from the final
    result.
    """
    return compress(logical_not(mask), arange(len(mask)))

def pairs_to_array(pairs, num_items=None, transform=None):
    """Returns array with same data as pairs (list of tuples).

    pairs can contain (first, second, weight) or (first, second) tuples.
    If there are 2 items in the tuple, weight will be assumed to be 1.

    num_items should contain the number of items that the pairs are chosen
    from. If None, will calculate based on the largest item in the actual
    list.

    transform contains an array that maps indices in the pairs coordinates
    to other indices, i.e. transform[old_index] = new_index. It is
    anticipated that transform will be the result of calling
    ungapped_to_gapped on the original, gapped sequence before the
    sequence is passed into something that strips out the gaps (e.g. for
    motif finding or RNA folding).

    WARNING: all tuples must be the same length! (i.e. if weight is
    supplied for any, it must be supplied for all.)

    WARNING: if num_items is actually smaller than the biggest index in
    the list (+ 1, because the indices start with 0), you'll get an
    exception when trying to place the object. Don't do it.
""" #handle easy case if not pairs: return array([]) data = array(pairs) #figure out if we're mapping the indices to gapped coordinates if transform is not None: #pairs of indices idx_pairs = take(transform, data[:,0:2].astype(Int), axis=0) else: idx_pairs = data[:,0:2].astype(Int) #figure out biggest item if not supplied if num_items is None: num_items = int(max(ravel(idx_pairs))) + 1 #make result array result = zeros((num_items,num_items), Float) if len(data[0]) == 2: values = 1 else: values = data[:,2] put(ravel(result), idx_pairs[:,0]*num_items+idx_pairs[:,1], values) return result ln_2 = log(2) def log2(x): """Returns the log (base 2) of x" WARNING: log2(0) will give -inf on one platform, but it might raise an error (Overflow or ZeroDivision on another platform. So don't rely on getting -inf in your downstream code. """ return log(x)/ln_2 def safe_p_log_p(a): """Returns -(p*log2(p)) for every non-negative, nonzero p in a. a: numpy array WARNING: log2 is only defined on positive numbers, so make sure there are no negative numbers in the array. Always returns an array with floats in there to avoid unexpected results when applying it to an array with just integers. """ c = array(a.copy(),Float) flat = ravel(c) nz_i = numpy.ravel(nonzero(maximum(flat,0))) nz_e = take(flat,nz_i, axis=0) log_nz = log2(nz_e) flat *= 0 x = nz_e*-log_nz put(flat,nz_i,x) return c def safe_log(a): """Returns the log (base 2) of each nonzero item in a. a: numpy array WARNING: log2 is only defined on positive numbers, so make sure there are no negative numbers in the array. Will either return an array containing floating point exceptions or will raise an exception, depending on platform. Always returns an array with floats in there to avoid unexpected results when applying it to an array with just integers. 
""" c = array(a.copy(),Float) flat = ravel(c) nz_i = numpy.ravel(nonzero(flat)) nz_e = take(flat,nz_i, axis=0) log_nz = log2(nz_e) put(flat,nz_i,log_nz) return c def row_uncertainty(a): """Returns uncertainty (Shannon's entropy) for each row in a IN BITS a: numpy array (has to be 2-dimensional!) The uncertainty is calculated in BITS not NATS!!! Will return 0 for every empty row, but an empty array for every empty column, thanks to this sum behavior: >>> sum(array([[]]),1) array([0]) >>> sum(array([[]])) zeros((0,), 'l') """ try: return sum(safe_p_log_p(a),1) except ValueError: raise ValueError, "Array has to be two-dimensional" def column_uncertainty(a): """Returns uncertainty (Shannon's entropy) for each column in a in BITS a: numpy array (has to be 2-dimensional) The uncertainty is calculated in BITS not NATS!!! Will return 0 for every empty row, but an empty array for every empty column, thanks to this sum behavior: >>> sum(array([[]]),1) array([0]) >>> sum(array([[]])) zeros((0,), 'l') """ if len(a.shape) < 2: raise ValueError, "Array has to be two-dimensional" return sum(safe_p_log_p(a), axis=0) def row_degeneracy(a,cutoff=.5): """Returns the number of characters that's needed to cover >= cutoff a: numpy array cutoff: number that should be covered in the array Example: [ [.1 .3 .4 .2], [.5 .3 0 .2], [.8 0 .1 .1]] if cutoff = .75: row_degeneracy -> [3,2,1] if cutoff = .95: row_degeneracy -> [4,3,3] WARNING: watch out with floating point numbers. if the cutoff= 0.9 and in the array is also 0.9, it might not be found >>> searchsorted(cumsum(array([.6,.3,.1])),.9) 2 >>> searchsorted(cumsum(array([.5,.4,.1])),.9) 1 If the cutoff value is not found, the result is clipped to the number of columns in the array. 
""" if not a.any(): return [] try: b = cumsum(sort(a)[:,::-1],1) except IndexError: raise ValueError, "Array has to be two dimensional" degen = [searchsorted(aln_pos,cutoff) for aln_pos in b] #degen contains now the indices at which the cutoff was hit #to change to the number of characters, add 1 return clip(array(degen)+1,0,a.shape[1]) def column_degeneracy(a,cutoff=.5): """Returns the number of characters that's needed to cover >= cutoff a: numpy array cutoff: number that should be covered in the array Example: [ [.1 .8 .3], [.3 .2 .3], [.6 0 .4]] if cutoff = .75: column_degeneracy -> [2,1,3] if cutoff = .45: column_degeneracy -> [1,1,2] WARNING: watch out with floating point numbers. if the cutoff= 0.9 and in the array is also 0.9, it might not be found >>> searchsorted(cumsum(array([.6,.3,.1])),.9) 2 >>> searchsorted(cumsum(array([.5,.4,.1])),.9) 1 If the cutoff value is not found, the result is clipped to the number of rows in the array. """ if not a.any(): return [] b = cumsum(sort(a,0)[::-1],axis=0) try: degen = [searchsorted(b[:,idx],cutoff) for idx in range(len(b[0]))] except TypeError: raise ValueError, "Array has to be two dimensional" #degen contains now the indices at which the cutoff was hit #to change to the number of characters, add 1 return clip(array(degen)+1,0,a.shape[0]) def hamming_distance(x,y): """Returns the Hamming distance between two arrays. The Hamming distance is the number of characters which differ between two sequences (arrays). WARNING: This function truncates the longest array to the length of the shortest one. Example: ABC, ABB -> 1 ABCDEFG, ABCEFGH -> 4 """ shortest = min(map(len,[x,y])) return sum(x[:shortest] != y[:shortest], axis=0) def norm(a): """Returns the norm of a matrix or vector Calculates the Euclidean norm of a vector. Applies the Frobenius norm function to a matrix (a.k.a. 
    Euclidean matrix norm).

    a = numpy array
    """
    return sqrt(sum((a*a).flat))

def euclidean_distance(a, b):
    """Returns the Euclidean distance between two vectors/arrays.

    a, b: numpy vectors or arrays

    WARNING: this method is NOT intended for use on arrays of different
    sizes, but no check for this has been built in.
    """
    return norm(a-b)

def count_simple(a, alphabet_len):
    """Counts items in a."""
    result = zeros(alphabet_len, Int)
    for i in ravel(a):
        result[i] += 1
    return result

def count_alphabet(a, alphabet_len):
    """Counts items in a, using ==."""
    #ensure behavior is polymorphic with count_simple
    if not alphabet_len:
        raise IndexError, "alphabet_len must be > 0"
    result = zeros(alphabet_len, Int)
    a = ravel(a)
    for i in range(alphabet_len):
        result[i] = sum(a == i)
    return result

def is_complex(m):
    """Returns True if m has a complex component."""
    return m.dtype.char == 'D'

def is_significantly_complex(m, threshold=0.1):
    """Returns True if the sum of m's imaginary component exceeds threshold."""
    if is_complex(m):
        if sum(sum(abs(m.imag))) > threshold:
            return True
    return False

def has_neg_off_diags(m):
    """Returns True if m has negative off-diagonal elements."""
    return min(ravel(m * logical_not(identity(len(m))))) < 0

def has_neg_off_diags_naive(m):
    """Returns True if m has negative off-diagonal elements.

    Naive, slow implementation -- don't use. Primarily here to check
    correctness of faster implementation.
    """
    working = m.copy()
    for i in range(len(working)):
        working[i][i] = 0
    if min(ravel(working)) < 0:
        return True
    else:
        return False

def sum_neg_off_diags(m):
    """Returns sum of negative off-diags in m."""
    return sum(compress(ravel(less(m, 0)), \
        ravel(m * logical_not(identity(len(m))))))

def sum_neg_off_diags_naive(m):
    """Returns sum of negative off-diags in m.

    Naive, slow implementation -- don't use. Primarily here to check
    correctness of faster implementation.
""" sum = 0 for row_i, row in enumerate(m): for col_i, item in enumerate(row): if (row_i != col_i) and (item < 0): sum += item return sum def scale_row_sum(m, val=1): """Scales matrix in place so that each row sums to val (default: 1). WARNING: will use 'integer division', not true division, if matrix is an integer data type. """ m /= (sum(m, axis=1)/val)[:,newaxis] def scale_row_sum_naive(m, val=1): """Scales matrix in place so that each row sums to val (default:1). Naive implementation -- don't use. Primarily here to check correctness. WARNING: will use 'integer division'. """ for row in m: row_sum = sum(row) row /= (row_sum / val) def scale_trace(m, val=-1): """Scales matrix in place so that trace of result is val (default: -1). WARNING: if trace is opposite sign to val, inverts sign of all elements in the matrix. WARNING: will use 'integer division', not true division, if matrix is an integer data type. """ m *= val/trace(m) def abs_diff(first, second): """Calculates element-wise sum of abs(first - second). Return value may be real or complex. """ return sum(ravel(abs(first-second))) def sq_diff(first, second): """Calculates element-wise sum of (first - second)**2. Return value may be real or complex. """ diff = first - second return sum(ravel((diff*diff))) def norm_diff(first, second): """Returns square root of sq_diff, normalized to # elements.""" size = len(ravel(first)) return sqrt(sq_diff(first, second))/size def without_diag(a): """Returns copy of square matrix a, omitting diagonal elements.""" return array([concatenate((r[:i], r[i+1:])) for i, r in enumerate(a)]) def with_diag(a, d): """Returns copy of matrix a with d inserted as diagonal to yield square.""" rows, cols = a.shape result = zeros((rows, cols+1), a.dtype.char) for i, r in enumerate(a): result_row = result[i] result_row[:i] = r[:i] result_row[i] = d[i] result_row[i+1:] = r[i:] return result def only_nonzero(a): """Returns elements of a where the first element of a[i] is nonzero. 
    Result is a new array and does not share data with the original.

    NOTE: This is designed for arrays of rate matrices. If the first
    element of the rate matrix is zero, then the row must be all zero
    (since the row sums to zero with the first element being equal in
    magnitude but opposite in sign to the sum of the other elements). If
    the row is all zero, then the rate matrix is almost certainly invalid
    and should be excluded from further analyses.
    """
    first_element_selector = [0] * len(a.shape)
    first_element_selector[0] = slice(None, None)
    return take(a, numpy.ravel(nonzero(a[first_element_selector])), axis=0)

def combine_dimensions(m, dim):
    """Aggregates all dimensions of m between dim and the end.

    In other words, combine_dimensions(m, 3) flattens the first 3
    dimensions. Similarly, combine_dimensions(m, -2) flattens the last two
    dimensions.

    WARNING: Result shares data with m.
    """
    #if we're not combining more than one dimension, return the array unchanged
    if abs(dim) <= 1:
        return m
    #otherwise, figure out the new shape and reshape the array
    shape = m.shape
    if dim < 0:     #counting from end
        return reshape(m, shape[:dim] + (product(shape[dim:]),))
    else:           #counting from start
        return reshape(m, (product(shape[:dim]),) + shape[dim:])

def split_dimension(m, dim, shape=None):
    """Splits specified dimension of m into shape.

    WARNING: product of terms in shape must match size of dim. For
    example, if the length of dim is 12, shape could be (4,3) or (2,6)
    but not (2,3).

    Result shares data with m.
    """
    curr_dims = m.shape
    num_dims = len(curr_dims)
    #if shape not specified, assume it was square
    if shape is None:
        shape = (sqrt(curr_dims[dim]),)*2
    #raise IndexError if index out of bounds
    curr_dims[dim]
    #fix negative indices
    if dim < 0:
        dim = num_dims + dim
    #extract the relevant region and reshape it
    return reshape(m, curr_dims[:dim] + shape + curr_dims[dim+1:])

def non_diag(m):
    """From a sequence of n flattened 2D matrices, returns non-diag elements.
    For example, for an array containing 20 9-element row vectors, returns
    an array containing 20 6-element row vectors that omit elements 0, 4,
    and 8.
    """
    num_rows, num_elements = m.shape
    side_length = int(sqrt(num_elements))
    wanted = numpy.ravel(nonzero(logical_not(identity(side_length).flat)))
    all_wanted = repeat([wanted], num_rows, axis=0)
    all_wanted += (arange(num_rows) * num_elements)[:,newaxis]
    return reshape(take(ravel(m), ravel(all_wanted), axis=0), \
        (num_rows, num_elements-side_length))

def perturb_one_off_diag(m, mean=0, sd=0.01, element_to_change=None):
    """Perturbs one off-diagonal element of rate matrix m by random number.

    mean: mean of distribution to sample from. Default 0.
    sd: standard deviation of distribution to sample from. Default 0.01.

    Error model is additive.

    WARNING: may reverse sign of element in some cases!

    WARNING: if an element is specified, the coordinate is relative to the
    flat array _without_ the diagonal, _not_ relative to the original
    array! e.g. for a 4x4 array, element 8 refers to a[2][3], _not_
    a[2][0].
    """
    #get the elements that are allowed to change
    elements = without_diag(m)
    flat = ravel(elements)
    #pick an element to change if it wasn't specified
    if element_to_change is None:
        element_to_change = randint(0, len(flat))
    #change the element, pack the elements back into the array, and return
    #the result.
    flat[element_to_change] += normal(mean, sd)
    return with_diag(elements, -sum(elements, 1))

def perturb_one_off_diag_fixed(m, size):
    """Perturbs a random off-diag element of rate matrix m by factor of size."""
    elements = without_diag(m)
    flat = ravel(elements)
    element_to_change = randint(0, len(flat))
    flat[element_to_change] *= (1.0 + size)
    return with_diag(elements, -sum(elements, 1))

def perturb_off_diag(m, mean=0, sd=0.01):
    """Perturbs all off-diagonal elements of m by adding random number.

    mean: mean of distribution to sample from. Default 0.
    sd: standard deviation of distribution to sample from. Default 0.01.
    WARNING: may reverse sign of element!
    """
    elements = without_diag(m)
    random = normal(mean, sd, elements.shape)
    result = elements + random
    return with_diag(result, -sum(result, 1))

def perturb_off_diag_frac(m, size):
    """Perturbs all off-diagonal elements of m by about a specified fraction.

    size: mean size (relative to each element) of change to make.

    Will never reverse sign of an element.
    """
    elements = without_diag(m)
    random = normal(0, size_to_stdev(size), elements.shape)
    result = elements * abs(1.0+random)     #ensure always positive
    return with_diag(result, -sum(result, 1))

def size_to_stdev(size):
    """From desired mean deviation, returns sd for N(0,sd).

    Uses method of Altman 1993, as described in:
    http://www-users.york.ac.uk/~mb55/talks/halfnor.pdf
    ...where E(X) = sqrt(2*sigma/pi)
    """
    return size*size*pi/2.0

def merge_samples(*samples):
    """Merges list of samples into array of [vals, dists].

    The value of each sample corresponds to its position in the list,
    e.g. for [1,2,3] and [4,5,6], result will be:
    array([[1,2,3,4,5,6],[0,0,0,1,1,1]])
    """
    return concatenate([array((a, zeros(a.shape)+i)) \
        for i, a in enumerate(samples)], 1)

def sort_merged_samples_by_value(a):
    """Sorts result of merge_samples by value (i.e. first row)."""
    return take(a, argsort(a[0]), 1)

def classifiers(*samples):
    """Returns all 1D classifiers separating first sample from the remainder.

    Returns [(i, reversed, fp, fn, tp, tn) for each candidate cut].
    """
    if len(samples) <= 1:
        raise TypeError, "optimal_classifier needs at least 2 distributions."
    vals, labels = sort_merged_samples_by_value(merge_samples(*samples))
    n = len(vals)
    num_positives = len(samples[0])
    num_negatives = n - num_positives
    #only want to check indices where the next value is different
    to_check = numpy.ravel(nonzero(vals[:-1] - vals[1:]))
    result = []
    for index in to_check:
        i = index+1     #because it changes at the value _after_ the index
        fp = sum(labels[:i] != 0)
        fn = sum(labels[i:] == 0)
        tp = num_positives - fn
        tn = num_negatives - fp
        reversed = tp + tn < fp + fn
        if reversed:
            tp, tn, fp, fn = fp, fn, tp, tn
        result.append((i, reversed, fp, fn, tp, tn))
    return result

def minimize_error_count(classifiers):
    """Returns the classifier from a list of classifiers that minimizes errors.

    Errors are defined as #fp + #fn. If multiple classifiers have equal
    scores, returns an arbitrary one.
    """
    c = array(classifiers)
    return classifiers[argmin(sum(c[:,2:4], 1))]

def minimize_error_rate(classifiers):
    """Returns the classifier from a list of classifiers that minimizes errors.

    Errors are defined as (#fp/#p) + (#fn/#n). If multiple classifiers
    have equal scores, returns an arbitrary one.
""" c = array(classifiers) return classifiers[argmin(\ 1.0*c[:,2]/(c[:,2]+c[:,5])+1.0*c[:,3]/(c[:,3]+c[:,4]))] def mutate_array(a, sd, mean=0): """Return mutated copy of the array (or vector), adding mean +/- sd.""" return a + normal(mean, sd, a.shape) PyCogent-1.5.3/cogent/util/checkpointing.py000644 000765 000024 00000003031 12024702176 021671 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os, time, cPickle from cogent.util import parallel __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class Checkpointer(object): def __init__(self, filename, interval=None, noisy=True): if interval is None: interval = 1800 self.filename = filename self.interval = interval self.last_time = time.time() self.noisy = noisy self._redundant = parallel.getCommunicator().Get_rank() > 0 def available(self): return self.filename is not None and os.path.exists(self.filename) def load(self): assert self.filename is not None, 'check .available() first' print "RESUMING from file '%s'" % self.filename f = open(self.filename) obj = cPickle.load(f) self.last_time = time.time() return obj def record(self, obj, msg=None, always=False): if self.filename is None or self._redundant: return now = time.time() elapsed = now - self.last_time if always or elapsed > self.interval: if self.noisy: print "CHECKPOINTING to file '%s'" % self.filename if msg is not None: print msg f = open(self.filename, 'w') cPickle.dump(obj, f) self.last_time = now PyCogent-1.5.3/cogent/util/datatypes.py000644 000765 000024 00000001006 12024702176 021042 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = 
"pm67nz@gmail.com" __status__ = "Production" class ImmutableDictionary(dict): def _immutable(self, *args, **kw): raise TypeError("%ss are immutable" % type(self).__name__) __setitem__ = __delitem__ = _immutable update = clear = pop = popitem = setdefault = _immutable PyCogent-1.5.3/cogent/util/dict2d.py000644 000765 000024 00000072703 12024702176 020231 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Dict2D: holds two-dimensional dict, acting as matrix with labeled rows/cols. The Dict2D is useful for storing arbitrary, sparse data in a way that is easy to access by labeled rows and columns. It is much slower than a numpy array, so only use when the convenience outweighs the performance penalty. It is especially useful for storing distance matrices between arbitrarily labeled objects. """ __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class Dict2DError(Exception): """All Dict2D-specific errors come from here.""" pass class Dict2DInitError(ValueError, Dict2DError): """Raised if Dict2D init fails.""" pass class Dict2DSparseError(KeyError, Dict2DError): """Raised on operations that fail because the Dict2D is sparse.""" pass ## Following methods based on methods developed by Rob Knight ## These methods are intended to be used by the reflect function of ## SquareMatrix. Each method takes two values, and returns two values. The ## idea is to reflect a matrix you take the value from the upper triangle, ## and the value from the lower triangle perform the intended operation on ## them, and then return the new values for the upper triangle and the lower ## triangle. 
def average(upper, lower):
    """Returns mean of the two values."""
    try:
        val = (upper + lower)/2.0
        return val, val
    except TypeError:
        raise TypeError, "%s or %s invalid types for averaging." \
            % (str(upper), str(lower))

def largest(upper, lower):
    """Returns largest of the two values."""
    val = max(upper, lower)
    return val, val

def smallest(upper, lower):
    """Returns smallest of the two values."""
    val = min(upper, lower)
    return val, val

def swap(upper, lower):
    """Swaps the two values."""
    return lower, upper

def nonzero(upper, lower):
    """Fills both values to whichever evaluates True, or leaves in place."""
    if upper and not lower:
        return upper, upper
    elif lower and not upper:
        return lower, lower
    else:
        return upper, lower

def not_0(upper, lower):
    """Fills both values to whichever is not equal to 0, or leaves in place."""
    # Note that the values are compared to zero instead of using
    # 'if not lower'. This is because the values coming in can be anything,
    # including an empty list, which should be considered nonzero.
    if upper == 0:
        return lower, lower
    elif lower == 0:
        return upper, upper
    else:
        return upper, lower

def upper_to_lower(upper, lower):
    """Returns new symm matrix with upper tri copied to lower tri."""
    return upper, upper

def lower_to_upper(upper, lower):
    """Returns new symm matrix with lower tri copied to upper tri."""
    return lower, lower

# End methods developed by Rob Knight

class Dict2D(dict):
    """Implements dict of dicts with expanded functionality.

    This class is useful for creating and working with 2D dictionaries.
    """
    #override these in subclasses for convenient customization.
    RowOrder = None         #default RowOrder
    ColOrder = None         #default ColOrder
    Default = None          #default Default value when m[r][c] absent
    RowConstructor = dict   #default row constructor
    Pad = False             #default state for automatic col and item padding

    #list of attributes that is copied by the copy() method
    _copied_attributes = ['RowOrder', 'ColOrder', 'Default', 'Pad', \
        'RowConstructor']

    def __init__(self, data=None, RowOrder=None, ColOrder=None, Default=None,
            Pad=None, RowConstructor=None):
        """Returns new Dict2D with specified parameters.

        data: can either be a dict of dicts, or a sequence of 3-item
        sequences giving row, col, value.

        RowOrder: list of 'interesting' row keys. If passed in during
        init, all rows in RowOrder will be created. Rows not in RowOrder
        will not be printed or used in most calculations, if they exist.
        Default is None (calculates on the fly from self.keys()).

        ColOrder: list of 'interesting' column keys. If passed in during
        init, all columns in ColOrder will be created in all rows. Columns
        not in ColOrder will not be printed or used in most calculations,
        if they exist. Default is None (calculates on the fly by examining
        the keys in each row. This can be expensive!)

        Default: value returned when m[r][c] doesn't exist.

        Pad: whether or not to pad Cols and Items with the default value
        instead of raising an exception. Default False.

        RowConstructor: constructor for inner rows. Defaults to class
        value of dict. WARNING: Must be able to accept initialization with
        no parameters (i.e. if you have a class that requires parameters
        to initialize, RowConstructor should be a function that supplies
        all the appropriate defaults.)

        Note that all operations that alter the Dict2D act in place. If
        you want to operate on a different object you should call the
        Dict2D copy() to create an identical deep copy of your Dict2D and
        then work on that one, leaving the original untouched. See doc
        string for Dict2D.copy() for usage information.
        usage:
            d = {'a':{'x':1,'y':2}, 'b':{'x':0, 'z':5}}
            m = Dict2D(d)
            m = Dict2D(d, RowOrder=['a','b'])
            m = Dict2D(d, ColOrder=['x'], Default=99)
        """
        #set the following as instance data if supplied; otherwise, will
        #fall through to class data
        if RowOrder is not None:
            self.RowOrder = RowOrder
        if ColOrder is not None:
            self.ColOrder = ColOrder
        if Default is not None:
            self.Default = Default
        if Pad is not None:
            self.Pad = Pad
        if RowConstructor is not None:
            self.RowConstructor = RowConstructor
        #initialize data as an empty dict if data is None
        data = data or {}
        init_method = self._guess_input_type(data)
        if not init_method:
            raise Dict2DInitError, \
                "Dict2D init failed (data unknown type, or Row/Col order needed)."
        #if we get here, we got an init method that it's safe to call
        init_method(data)
        #fill in any missing m[r][c] from RowOrder and ColOrder if self.Pad
        if self.Pad:
            self.pad()

    def _guess_input_type(self, data):
        """Guesses the input type of data, and returns appropriate init method.

        Known init methods are fromDicts, fromIndices, and fromLists.
        Returns None if it can't figure out the data type.
        """
        if isinstance(data, dict):
            #assume dict of dicts
            return self.fromDicts
        else:
            RowOrder = self.RowOrder
            ColOrder = self.ColOrder
            try:
                #guess list of lists or seq of 3-item seqs
                if (RowOrder is not None) and \
                   (ColOrder is not None) and \
                   (len(data) == len(RowOrder)) and \
                   (len(data[0]) == len(ColOrder)):
                    #assume list of lists
                    return self.fromLists
                elif len(data[0]) == 3:
                    #assume seq of 3-item seqs
                    return self.fromIndices
            except:
                #if there's any exception, we guessed the wrong type so
                #will return None
                pass

    def fromDicts(self, data):
        """Fills self from dict of dicts."""
        constructor = self.RowConstructor
        try:
            for key, val in data.items():
                self[key] = constructor(val)
        except (TypeError, ValueError, AttributeError):
            raise Dict2DInitError, \
                "Dict2D init from dicts failed."
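_guess_input_type distinguishes two input shapes: a dict of dicts (handled by fromDicts) and a sequence of (row, col, value) triples (handled by fromIndices). The triple path can be sketched as a standalone function, under the assumption of plain dict rows (`from_indices` is illustrative only, not the Dict2D method):

```python
def from_indices(triples):
    """Build a dict of dicts from (row, col, value) triples."""
    result = {}
    for row, col, val in triples:
        result.setdefault(row, {})[col] = val
    return result
```

This produces the same dict-of-dicts shape that fromDicts accepts directly, which is why the two init paths are interchangeable for dense data.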
    def fromIndices(self, data):
        """Fills self from sequence of (row, col, value) sequences."""
        constructor = self.RowConstructor
        try:
            for row, col, val in data:
                curr_row = self.get(row, constructor())
                curr_row[col] = val
                self[row] = curr_row
        except (TypeError, ValueError, AttributeError):
            raise Dict2DInitError, \
                "Dict2D init from indices failed."

    def fromLists(self, data):
        """Fills self from list of lists.

        Note that dimensions of list of lists must match RowOrder x ColOrder.
        """
        constructor = self.RowConstructor
        if (self.RowOrder is None) or (self.ColOrder is None):
            raise Dict2DInitError, \
                "Must have RowOrder and ColOrder to init Dict2D from list of lists."
        try:
            for key, row in zip(self.RowOrder, data):
                self[key] = dict(zip(self.ColOrder, row))
        except TypeError:
            raise Dict2DInitError, \
                "Dict2D init from lists failed."

    def pad(self, default=None):
        """Ensures self[r][c] exists for r in RowOrder for c in ColOrder.

        default, if not specified, uses self.Default.
        """
        constructor = self.RowConstructor
        if default is None:
            default = self.Default
        row_order = self.RowOrder or self.rowKeys()
        col_order = self.ColOrder or self.colKeys()
        for r in row_order:
            if r not in self:
                self[r] = constructor()
            curr_row = self[r]
            for c in col_order:
                if c not in curr_row:
                    curr_row[c] = default

    def purge(self):
        """Keeps only items self[r][c] if r in RowOrder and c in ColOrder."""
        #first, purge unwanted rows
        if self.RowOrder:
            wanted_keys = dict.fromkeys(self.RowOrder)
            for key in self.keys():
                if not key in wanted_keys:
                    del self[key]
        #then, purge unwanted cols
        if self.ColOrder:
            wanted_keys = dict.fromkeys(self.ColOrder)
            for row in self.values():
                for key in row.keys():
                    if not key in wanted_keys:
                        del row[key]

    def rowKeys(self):
        """Returns list of keys corresponding to all rows.

        Same as list(self).
""" return list(self) def colKeys(self): """Returns list of keys corresponding to all cols.""" result = {} for row in self.values(): result.update(row) return list(result) def sharedColKeys(self): """Returns list of keys shared by all cols.""" rows = self.values() if not rows: return [] result = rows[0] for row in rows: for key in result.keys(): if key not in row: del result[key] return list(result) def square(self, default=None, reset_order=False): """Checks RowOrder and ColOrder share keys, and that self[r][c] exists. If reset_order is True (default is False), appends additional Cols to RowOrder and sets ColOrder to RowOrder. """ row_order = self.RowOrder or self.rowKeys() col_order = self.ColOrder or self.colKeys() rows = dict.fromkeys(row_order) cols = dict.fromkeys(col_order) if reset_order: if rows != cols: for c in cols: if c not in rows: row_order.append(c) self.RowOrder = row_order #WARNING: we set self.ColOrder to row_order as well, _not_ to #col_order, because we want the RowOrder and ColOrder to be #the same after squaring. self.ColOrder = row_order else: if rows != cols: raise Dict2DError, \ "Rows and Cols must be the same to square a Dict2D." self.pad(default) def _get_rows(self): """Iterates over the rows, using RowOrder/ColOrder. Converts the rows to lists of values, so r[i] is the same as m[r][m.ColOrder.index(c)] (assuming that the items in ColOrder are unique). zip(self.ColOrder, row) will associate the column label with each item in the row. If you actually want to get a list of the row objects, you probably want self.values() or [self[i] for i in self.RowOrder] instead of this method. If self.Pad is True, will pad rows with self.Default instead of raising an exception. 
""" row_order = self.RowOrder or self.rowKeys() if self.Pad: col_order = self.ColOrder or self.colKeys() constructor = self.RowConstructor default = self.Default for r in row_order: curr_row = self.get(r, constructor()) yield [curr_row.get(c, default) for c in col_order] else: col_order = self.ColOrder if col_order: #need to get items into the right column order try: for r in row_order: curr_row = self[r] yield [curr_row[c] for c in col_order] except KeyError: raise Dict2DSparseError, \ "Can't iterate over rows of sparse Dict2D." else: #if there's no ColOrder, just return what's there for r in row_order: curr_row = self[r] yield curr_row.values() Rows = property(_get_rows) def _get_cols(self): """Iterates over the columns, using RowOrder/ColOrder. Returns each column as a list of the values in that column, so that c[i] = m[m.RowOrder.index(r)][c] (assuming the items in RowOrder are unique). zip(self.RowOrder, col) will associate the row label with each item in the column. If you want to get the column objects as dicts that support named lookups, so that c[r] = m[r][c], your best bet is something like: cols = self.copy() cols.transpose() return cols.values() #or [cols[r] for r in cols.RowOrder] Will fail if ColOrder is specified and keys are missing. """ row_order = self.RowOrder or self.rowKeys() col_order = self.ColOrder or self.colKeys() if self.Pad: default = self.Default constructor = self.RowConstructor for c in col_order: yield [self.get(r, constructor()).get(c, default) for \ r in row_order] else: try: for c in col_order: yield [self[r][c] for r in row_order] except KeyError: raise Dict2DSparseError, \ "Can't iterate over cols of sparse Dict2D." Cols = property(_get_cols) def _get_items(self): """Iterates over the items, using RowOrder and ColOrder if present. Returns a list of the items, by rows rather than by columns. self.Pad controls whether to insert the default anywhere a value is missing, or to return only the values that exist. 
""" for row in self.Rows: for i in row: yield i Items = property(_get_items) def getRows(self, rows, negate=False): """Returns new Dict2D containing only specified rows. Note that the rows in the new Dict2D will be references to the same objects as the rows in the old Dict2D. If self.Pad is True, will create new rows rather than raising KeyError. """ result = {} if negate: #copy everything except the specified rows row_lookup = dict.fromkeys(rows) for r, row in self.items(): if r not in row_lookup: result[r] = row else: #copy only the specified rows if self.Pad: row_constructor = self.RowConstructor for r in rows: result[r] = self.get(r, row_constructor()) else: for r in rows: result[r] = self[r] return self.__class__(result) def getRowIndices(self, f, negate=False): """Returns list of keys of rows where f(row) is True. List will be in the same order as self.RowOrder, if present. Note that the function is applied to the row as given by self.Rows, not to the original dict that contains it. """ #negate function if necessary if negate: new_f = lambda x: not f(x) else: new_f = f #get all the rows where the function is True row_order = self.RowOrder or self return [key for key, row in zip(row_order,self.Rows) \ if new_f(row)] def getRowsIf(self, f, negate=False): """Returns new Dict2D containing rows where f(row) is True. Note that the rows in the new Dict2D are the same objects as the rows in the old Dict2D, not copies. """ #pass negate to get RowIndices return self.getRows(self.getRowIndices(f, negate)) def getCols(self, cols, negate=False, row_constructor=None): """Returns new Dict2D containing only specified cols. By default, the rows will be dicts, but an alternative constructor can be specified. Note that getCols should not fail on ragged columns, and will just ignore any elements that are not explicitly present in a given row whether or not self.Pad is set. 
""" if row_constructor is None: row_constructor = self.RowConstructor result = {} #if we're negating, pick out all the columns except specified indices if negate: col_lookup = dict.fromkeys(cols) for key, row in self.items(): result[key] = row_constructor([(i, row[i]) for i in row \ if (i in row) and (i not in col_lookup)]) #otherwise, just get the requested indices else: for key, row in self.items(): result[key] = row_constructor([(i, row[i]) for i in cols \ if i in row]) return self.__class__(result) def getColIndices(self, f, negate=False): """Returns list of column indices for which f(col) is True.""" #negate f if necessary if negate: new_f = lambda x: not f(x) else: new_f = f return [i for i, col in zip(self.ColOrder or self.colKeys(), self.Cols)\ if new_f(col)] def getColsIf(self, f, negate=False, row_constructor=None): """Returns new Dict2D containing cols where f(col) is True. Note that the rows in the new Dict2D are always new objects. Default constructor is list(), but an alternative can be passed in. """ if row_constructor is None: row_constructor = self.RowConstructor return self.getCols(self.getColIndices(f, negate), \ row_constructor=row_constructor) def getItems(self, items, negate=False): """Returns list containing only specified items. items should be a list of (row_key, col_key) tuples. getItems will fail with KeyError if items that don't exist are requested, unless self.Pad is True. Items will be returned in order (according to self.ColOrder and self.RowOrder) when negate is True; when negate is False, they'll be returned in the order in which they were passed in. 
""" if negate: #have to cycle through every item and check that it's not in #the list of items to return item_lookup = dict.fromkeys(map(tuple, items)) result = [] if self.Pad: default = self.Default row_constructor = self.RowConstructor for r in self.RowOrder or self: curr_row = self.get(r, row_constructor()) for c in self.ColOrder or curr_row: if (r, c) not in items: result.append(curr_row.get(c, default)) else: for r in self.RowOrder or self: curr_row = self[r] for c in self.ColOrder or curr_row: if c in curr_row and (r, c) not in items: result.append(curr_row[c]) return result #otherwise, just pick the selected items out of the list else: if self.Pad: row_constructor = self.RowConstructor default = self.Default return [self.get(row, row_constructor()).get(col, default) \ for row, col in items] else: return [self[row][col] for row, col in items] def getItemIndices(self, f, negate=False): """Returns list of (key,val) tuples where f(self[key][val]) is True.""" if negate: new_f = lambda x: not f(x) else: new_f = f result = [] for row_label in self.RowOrder or self: curr_row = self[row_label] for col_label in self.ColOrder or curr_row: if col_label in curr_row and new_f(curr_row[col_label]): result.append((row_label, col_label)) return result def getItemsIf(self, f, negate=False): """Returns list of items where f(self[row][col]) is True.""" return self.getItems(self.getItemIndices(f, negate)) def toLists(self, headers=False): """Return copy of self as list of lists, in order if specified. headers specifies whether to include the row and col headers (default: False). If the class data specifies RowOrder and ColOrder, can recapture the data in a new Dict2D object using self.fromLists() on the result of this method. Pads with self.Default if self.Pad is True; otherwise, raises exception on missing values. The result of toLists() can be passed to the array() function of numpy to generate a real array object for numerical calculations (if headers are False, the default). 
""" col_order = self.ColOrder or self.colKeys() row_order = self.RowOrder or self.rowKeys() if self.Pad: default = self.Default result = [] for r in row_order: result.append([self[r].get(c, default) for c in col_order]) else: try: result = [] for r in row_order: result.append([self[r][c] for c in col_order]) except KeyError: raise Dict2DSparseError, \ "Unpadded Dict2D can't convert to list of lists if sparse." if headers: for header, row in zip(row_order, result): row.insert(0, header) result = [['-'] + list(col_order)] + result return result def copy(self): """Returns a new deep copy of the data in self (rows are new objects). NOTE: only copies the attributes in self._copied_attributes. Remember to override _copied_attributes in subclasses that need to copy additional data. usage: d = {'a':{'a':0}} m = Dict2D(d) c = m.copy() c['a']['a'] = 1 c['a']['a'] != m['a']['a'] """ #create new copy of the data data = {} for key, row in self.items(): data[key] = row.copy() #convert the result to the same class as self result = self.__class__(data) #copy any of self's attributes that aren't the same as the class values for attr in self._copied_attributes: curr_value = getattr(self, attr) if curr_value is not getattr(self.__class__, attr): setattr(result, attr, curr_value) return result def fill(self, val, rows=None, cols=None, set_orders=False): """Fills self[r][c] with val for r in rows and c in cols. If rows or cols is not specified, fills all the appropriate values that are already present in self. val: value to fill with. Required. rows: row indices to fill. If None, all rows are affected. Every row in rows will be created if necessary. cols: col indices to fill. If None, all cols are affected. Every col in cols will be created if necessary. set_orders: if True, sets self.RowOrder to rows and self.ColOrder to cols (default False). Otherwise, RowOrder and ColOrder are unaffected. 
NOTE: RowOrder and ColOrder are _not_ used as defaults for rows and cols, in order to make it convenient to fill only the elements that actually exist. """ if set_orders: self.RowOrder = rows self.ColOrder = cols if rows is None: #affect all rows for r in self.values(): for c in (cols or r): #if not cols, affect everything in row r[c] = val else: #affect only specified rows constructor = self.RowConstructor #might need to create new rows for r in rows: #create row if needed if r not in self: self[r] = constructor() #bind current row curr_row = self[r] for c in (cols or curr_row): curr_row[c] = val def setDiag(self, val): """Set the diagonal to val (required). Note: only affects keys that are rows (i.e. does not create rows for keys that are in the cols only). Use self.square() in this situation. """ for k in self: self[k][k] = val def scale(self, f): """Applies f(x) to all elements of self.""" for r in self: curr_row = self[r] for c, val in curr_row.items(): curr_row[c] = f(val) def transpose(self): """Converts self in-place so self[r][c] -> self[c][r]. Also swaps RowOrder and ColOrder. """ t = {} for r, row in self.items(): for c, val in row.items(): if c not in t: t[c] = {} t[c][r] = val self.clear() self.fromDicts(t) self.RowOrder, self.ColOrder = self.ColOrder, self.RowOrder def reflect(self, method=average): """Reflects items across diagonal, as given by RowOrder and ColOrder. Fails if RowOrder and ColOrder aren't equal, or if they're unspecified. Works fine on triangular matrices (i.e. no need to make the matrix square first) -- however, the diagonal won't be set if it wasn't present originally. """ row_order = self.RowOrder col_order = self.ColOrder if (row_order is None) or (col_order is None): raise Dict2DError, \ "Can't reflect a Dict2D without both RowOrder and ColOrder." if row_order != col_order: raise Dict2DError, \ "Can only reflect Dict2D if RowOrder and ColOrder are the same." 
constructor = self.RowConstructor default = self.Default for row_index in range(len(row_order)): r = row_order[row_index] if r not in self: self[r] = constructor() curr_row = self[r] for col_index in range(row_index): c = row_order[col_index] if c not in self: self[c] = constructor() curr_col = self[c] # Note that rows and cols need to be transposed, the way the # functions are currently written. I think that changing the # functions would break existing code, and they make sense # as written. curr_row[c], curr_col[r] = \ method(curr_col.get(r, default), curr_row.get(c, default)) def toDelimited(self, headers=True, item_delimiter='\t', \ row_delimiter = '\n', formatter=str): """Printable string of items in self, optionally including headers. headers: whether or not to print headers (default True). delimiter: inserted between fields (default tab). formatter: applied to each element (default str). Note that the formatter is also applied to the headers, so should be a function that can handle strings as well as whatever the elements of the matrix are (i.e. if it's a number formatter, it should detect strings and return them without formatting). If RowOrder or ColOrder is present, only prints the relevant rows and cols. Always pads missing values with str(self.Default). 
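reflect() symmetrises the matrix across its diagonal by passing each mirror-image pair of cells through a combining method that returns the new pair. A simplified Python 3 sketch with an averaging combiner, assuming both cells of each pair already exist (the real method also fills missing cells via self.Default and creates absent rows):

```python
def average_pair(upper, lower):
    """Combine a mirror-image pair of cells by averaging them."""
    mean = (upper + lower) / 2
    return mean, mean

def reflect(matrix, order, method=average_pair):
    """Symmetrise matrix[r][c] / matrix[c][r] in place across the diagonal."""
    for i, r in enumerate(order):
        for c in order[:i]:  # strictly below the diagonal
            matrix[r][c], matrix[c][r] = method(matrix[c][r], matrix[r][c])
    return matrix
```

As in the original, the diagonal itself is never touched, so a triangular matrix can be reflected without squaring it first.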
""" lists = self.toLists(headers) return row_delimiter.join([item_delimiter.join(map(formatter, r)) \ for r in lists]) PyCogent-1.5.3/cogent/util/dict_array.py000644 000765 000024 00000014500 12024702176 021170 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Wrapper for numpy arrays so that they can be indexed by name >>> a = numpy.identity(3, int) >>> b = DictArrayTemplate('abc', 'ABC').wrap(a) >>> b[0] =========== A B C ----------- 1 0 0 ----------- >>> b['a'] =========== A B C ----------- 1 0 0 ----------- >>> b.keys() ['a', 'b', 'c'] >>> b['a'].keys() ['A', 'B', 'C'] """ import numpy from cogent.format import table __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class DictArrayTemplate(object): def __init__(self, *dimensions): self.names = [] self.ordinals = [] for names in dimensions: if isinstance(names, int): names = range(names) else: names = list(names)[:] self.names.append(names) self.ordinals.append(dict((c,i) for (i,c) in enumerate(names))) self._shape = tuple(len(keys) for keys in self.names) def __eq__(self, other): return self is other or ( isinstance(other, DictArrayTemplate) and self.names == other.names) def _dict2list(self, value, depth=0): # Unpack (possibly nested) dictionary into correct order of elements if depth < len(self._shape): return [self._dict2list(value[key], depth+1) for key in self.names[depth]] else: return value def unwrap(self, value): """Convert to a simple numpy array""" if isinstance(value, DictArray): if value.template == self: value = value.array else: raise ValueError # used to return None, which can't be right elif isinstance(value, dict): value = self._dict2list(value) value = numpy.asarray(value) assert value.shape == self._shape, (value.shape, self._shape) return value def wrap(self, array, 
dtype = None): # dtype is numpy array = numpy.asarray(array, dtype=dtype) for (dim, categories) in enumerate(self.names): assert len(categories) == numpy.shape(array)[dim], "cats=%s; dim=%s" % (categories, dim) return DictArray(array, self) def interpretIndex(self, names): if not isinstance(names, tuple): names = (names,) index = [] remaining = [] for (ordinals, allnames, name) in zip(self.ordinals, self.names, names): if not isinstance(name, (int, slice)): name = ordinals[name] elif isinstance(name, slice): start = name.start stop = name.stop if isinstance(name.start, str): start = allnames.index(name.start) if isinstance(name.stop, str): stop = allnames.index(name.stop) name = slice(start, stop, name.step) remaining.append(allnames.__getitem__(name)) index.append(name) remaining.extend(self.names[len(index):]) if remaining: klass = type(self)(*remaining) else: klass = None return (tuple(index), klass) def array_repr(self, a): if len(a.shape) == 1: heading = [str(n) for n in self.names[0]] a = a[numpy.newaxis, :] elif len(a.shape) == 2: heading = [''] + [str(n) for n in self.names[1]] a = [[str(name)] + list(row) for (name, row) in zip(self.names[0], a)] else: return '%s dimensional %s' % ( len(self.names), type(self).__name__) formatted = table.formattedCells(rows=a, header=heading) return str(table.simpleFormat(formatted[0], formatted[1], space=4)) class DictArray(object): """Wraps a numpy array so that it can be indexed with strings like nested dictionaries (only ordered), for things like substitution matrices and bin probabilities.""" def __init__(self, *args, **kwargs): """allow alternate ways of creating for time being""" if len(args) <= 2: self.array = args[0] self.template = args[1] else: if 'dtype' in kwargs or 'typecode' in kwargs: dtype = kwargs['dtype'] kwargs.pop('dtype', None) kwargs.pop('typecode', None) else: dtype = None create_new = DictArrayTemplate(*args[1:]).wrap(args[0], dtype=dtype) self.__dict__ = create_new.__dict__ self.Shape = 
self.array.shape

    def asarray(self):
        return self.array

    def __array__(self, dtype=None):
        array = self.array
        if dtype is not None:
            array = array.astype(dtype)
        return array

    def asdict(self):
        return dict(self.items())

    def __getitem__(self, names):
        (index, remaining) = self.template.interpretIndex(names)
        result = self.array[index]
        if remaining is not None:
            result = self.__class__(result, remaining)
        return result

    def __iter__(self):
        (index, remaining) = self.template.interpretIndex(0)
        for elt in self.array:
            if remaining is None:
                yield elt
            else:
                yield remaining.wrap(elt)

    def __len__(self):
        return len(self.template.names[0])

    def keys(self):
        return self.template.names[0][:]

    def items(self):
        return [(n,self[n]) for n in self.keys()]

    def __repr__(self):
        return self.template.array_repr(self.array)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __eq__(self, other):
        if self is other:
            return True
        elif isinstance(other, DictArray):
            return self.template == other.template and numpy.all(
                self.array == other.array)
        elif type(other) is type(self.array):
            return self.array == other
        elif isinstance(other, dict):
            return self.asdict() == other
        else:
            return False

PyCogent-1.5.3/cogent/util/misc.py

#!/usr/bin/env python
"""Generally useful utility classes and methods.
""" import types import sys from time import clock from datetime import datetime from string import maketrans, strip from random import randrange, choice, randint from sys import maxint from os import popen, remove, makedirs, getenv from os.path import join, abspath, exists, isdir from numpy import logical_not, sum from cPickle import dumps, loads from gzip import GzipFile import hashlib # import parse_command_line_parameters for backward compatibility from cogent.util.option_parsing import parse_command_line_parameters __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Peter Maxwell", "Amanda Birmingham", "Sandra Smit", "Zongzhi Liu", "Daniel McDonald", "Kyle Bittinger", "Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def safe_md5(open_file, block_size=2**20): """Computes an md5 sum without loading the file into memory This method is based on the answers given in: http://stackoverflow.com/questions/1131220/get-md5-hash-of-a-files-without-open-it-in-python """ md5 = hashlib.md5() data = True while data: data = open_file.read(block_size) if data: md5.update(data) return md5 def identity(x): """Identity function: useful for avoiding special handling for None.""" return x def if_(test, true_result, false_result): """Convenience function for one-line if/then/else with known return values. Note that both true_result and false_result are evaluated, which is not the case for the normal if/then/else, so don't use if one branch might fail. Additionally, true_result and false_result must be expressions, not statements (so print and raise will not work, for example). """ if test: return true_result else: return false_result def iterable(item): """If item is iterable, returns item. Otherwise, returns [item]. Useful for guaranteeing a result that can be iterated over. 
""" try: iter(item) return item except TypeError: return [item] def max_index(items): """Returns index of the largest item. items can be any sequence. If there is a tie, returns latest item. """ return max([(item, index) for index, item in enumerate(items)])[1] def min_index(items): """Returns index of the smallest item. items can be any sequence. If there is a tie, returns earliest item""" return min([(item, index) for index, item in enumerate(items)])[1] def flatten(items): """Removes one level of nesting from items. items can be any sequence, but flatten always returns a list. """ result = [] for i in items: try: result.extend(i) except: result.append(i) return result class DepthExceededError(Exception): pass def deep_list(x): """Convert tuple recursively to list.""" if isinstance(x, tuple): return map(deep_list, x) return x def deep_tuple(x): """Convert list recursively to tuple.""" if isinstance(x, list): return tuple(map(deep_tuple, x)) return x def between((min_, max_), number): """Same as: min_ <= number <= max_.""" return min_ <= number <= max_ def combinate(items, n): """Returns combinations of items.""" if n == 0: yield [] else: for i in xrange(len(items) - n + 1): for cc in combinate(items[i + 1:], n - 1): yield [items[i]] + cc def gzip_dump(object, filename, bin=2): """Saves a compressed object to file.""" file = GzipFile(filename, 'wb') file.write(dumps(object, bin)) try: # do not leave unlinked structures object.link() except AttributeError: pass file.close() def gzip_load(filename): """Loads a compressed object from file.""" file = GzipFile(filename, 'rb') buffer = [] while True: data = file.read() if data == "": break buffer.append(data) buffer = "".join(buffer) object = loads(buffer) file.close() return object def recursive_flatten_old(items, max_depth=None, curr_depth=0): """Removes all nesting from items, recursively. Note: Default max_depth is None, which removes all nesting (including unpacking strings). 
    Setting max_depth unpacks a maximum of max_depth levels of nesting, but
    will not raise an exception if the structure is not really that deep
    (instead, will just remove the nesting that exists). If max_depth is 0,
    will not remove any nesting (note difference from setting max_depth to
    None).
    """
    #bail out if greater than max_depth
    if max_depth is not None:
        if curr_depth > max_depth:
            raise DepthExceededError
    result = []
    for i in items:
        try:
            result.extend(recursive_flatten(i, max_depth, curr_depth + 1))
        except:
            result.append(i)
    return result

def curry(f, *a, **kw):
    """curry(f,x)(y) = f(x,y) or = lambda y: f(x,y)

    modified from python cookbook"""
    def curried(*more_a, **more_kw):
        return f(*(a + more_a), **dict(kw, **more_kw))
    ## make docstring for curried function
    curry_params = []
    if a:
        curry_params.extend([e for e in a])
    if kw:
        curry_params.extend(['%s=%s' % (k, v) for k, v in kw.items()])
    #str it to prevent error in join()
    curry_params = map(str, curry_params)
    try:
        f_name = f.func_name
    except:  #e.g. itertools.groupby lacks .func_name
        f_name = '?'
    curried.__doc__ = ' curry(%s,%s)\n'\
        '== curried from %s ==\n %s'\
        % (f_name, ', '.join(curry_params), f_name, f.__doc__)
    return curried
#end curry

def is_iterable(obj):
    """return True if obj is iterable"""
    try:
        iter(obj)
    except TypeError, e:
        return False
    else:
        return True

def is_char(obj):
    """return True if obj is a char (str with length <= 1)"""
    return isinstance(obj, basestring) and len(obj) <= 1

is_char_or_noniterable = lambda x: is_char(x) or\
    not is_iterable(x)

is_str_or_noniterable = lambda x: isinstance(x, basestring) or\
    not is_iterable(x)

def recursive_flatten(items, max_depth=None, curr_depth=1,
    is_leaf=is_char_or_noniterable):
    """Removes all nesting from items, recursively.

    Note: Default max_depth is None, which removes all nesting (including
    unpacking strings).
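The core of curry above is a closure that prepends the frozen positional arguments and merges the frozen keyword arguments; the rest of the function only builds a docstring. A Python 3 sketch of just that core (functools.partial from the standard library does the same job):

```python
def curry(f, *a, **kw):
    """curry(f, x)(y) == f(x, y); a light stand-in for functools.partial."""
    def curried(*more_a, **more_kw):
        # frozen args first, call-time args after; call-time kwargs win
        return f(*(a + more_a), **dict(kw, **more_kw))
    return curried
```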
Setting max_depth unpacks a maximum of max_depth levels of nesting, but will not raise exception if the structure is not really that deep (instead, will just remove the nesting that exists). If max_depth is 0, will not remove any nesting (note difference from setting max_depth to None). is_leaf: a predicate for 'leaf node'. The default is_char_or_noniterable removes all nesting. is_str_or_noniterable removes all nesting sequences except strings. is_leaf=not_list_tuple removes only nesting list or tuple , which is considerably faster and recommended for general use. """ result = [] for i in items: if max_depth is not None and curr_depth > max_depth \ or is_leaf(i): result.append(i) else: result.extend(recursive_flatten(i, max_depth, curr_depth + 1, is_leaf)) return result #end recursive_flatten def not_list_tuple(obj): """return False if obj is a list or a tuple""" return not isinstance(obj, (list, tuple)) list_flatten = curry(recursive_flatten, is_leaf=not_list_tuple) def unflatten(data, row_width, keep_extras=False): """Converts items in data into a list of row_width-length lists. row_width must be an integer. Will raise error if zero. data can be any sequence type, but results will always be lists at the first level (including the common case of a list containing one sequence). Any items left over after the last complete row will be discarded. This means that the number of items in data need not be divisible by row_width. This function does _not_ reverse the effect of zip, since the lists it produces are not interleaved. If the list of lists were treated as a 2-d array, its transpose would be the reverse of the effect of zip (i.e. the original lists would be columns, not rows). """ if row_width < 1: raise ValueError, "unflatten: row_width must be at least 1." 
result = [] num_items = len(data) slices = num_items / row_width for s in xrange(slices): result.append(data[s * row_width:(s + 1) * row_width]) if keep_extras: last_slice = data[slices * row_width:] if last_slice: result.append(last_slice) return result def unzip(items): """Performs the reverse of zip, i.e. produces separate lists from tuples. items should be list of k-element tuples. Will raise exception if any tuples contain more items than the first one. Conceptually equivalent to transposing the matrix of tuples. Returns list of lists in which the ith list contains the ith element of each tuple. Note: zip expects *items rather than items, such that unzip(zip(*items)) returns something that compares equal to items. Always returns lists: does not check original data type, but will accept any sequence. """ if items: return map(list, zip(*items)) else: return [] def select(order, items): """Returns the elements from items specified in order, a list of indices. Builds up a list containing the ith element in items for each item in order, which must be a list of valid keys into items. Example: vowels = select([0, 4, 8], 'abcdefghijklm') Can also be used to emulate Perl's hash slices. Example: reverse_vowel_freqs = select('ea', {'a':1,'b':5,'c':2,'d':4,'e':6}) Return type is a list of whatever type the elements in items are. """ return map(items.__getitem__, order) def sort_order(items, cmpfunc=None): """Returns an array containing the sorted order of elements in items. The returned array contains indexes. Looking up items[i] for i in indexes returns a sorted list of the items in ascending order. Useful for returning just the n best items, etc. """ indexed = [(item, index) for index, item in enumerate(items)] if cmpfunc is None: indexed.sort() else: indexed.sort(cmpfunc) return [i[1] for i in indexed] def find_all(text, pat): """Returns list of all overlapping occurrences of a pattern in a text. Each item in the (sorted) list is the index of one of the matches. 
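find_all above collects every overlapping occurrence by restarting the search one position past each hit. A Python 3 sketch using str.find (the original uses str.index and catches the ValueError that signals no further match):

```python
def find_all(text, pat):
    """Return sorted indices of all (overlapping) occurrences of pat in text."""
    result = []
    last = 0
    while True:
        curr = text.find(pat, last)
        if curr == -1:  # no more matches
            return result
        result.append(curr)
        last = curr + 1  # advance one char so overlapping hits are kept
```

Stepping by one character, rather than by len(pat), is what makes the matches overlapping.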
""" result = [] last = 0 try: while 1: curr = text.index(pat, last) result.append(curr) last = curr + 1 except ValueError: #raised when no more matches return result def find_many(text, pats): """Returns sorted list of all occurrences of all patterns in text. Matches to all patterns are sorted together. Each item in the list is the index of one of the matches. WARNING: if pat is a single string, will search for the chars in the string individually. If this is not what you want (i.e. you want to search for the whole string), you should be using find_all instead; if you want to use find_many anyway, put the string in a 1-item list. """ result = [] for pat in pats: result.extend(find_all(text, pat)) result.sort() return result def unreserve(item): """Removes trailing underscore from item if it has one. Useful for fixing mutations of Python reserved words, e.g. class. """ if hasattr(item, 'endswith') and item.endswith('_'): return item[:-1] else: return item def add_lowercase(d): """Adds lowercase version of keys in d to itself. Converts vals as well. Should work on sequences of strings as well as strings. Now also works on strings and sets. """ if hasattr(d, 'lower'): #behaves like a string return d + d.lower() elif not hasattr(d, 'items'): #not a dict items = list(d) return d.__class__(items + [i.lower() for i in items]) #otherwise, assume dict-like behavior for key, val in d.items(): try: new_key = key.lower() except: #try to make tuple out of arbitrary sequence try: new_key = [] for k in key: try: new_key.append(k.lower()) except: new_key.append(k) new_key = tuple(new_key) except: new_key = key try: new_val = val.lower() except: new_val = val #don't care if we couldn't convert it if new_key not in d: #don't overwrite existing lcase keys d[new_key] = new_val return d def extract_delimited(line, left, right, start_index=0): """Returns the part of line from first left to first right delimiter. Optionally begins searching at start_index. 
Note: finds the next complete field (i.e. if we start in an incomplete field, skip it and move to the next). """ if left == right: raise TypeError, \ "extract_delimited is for fields w/ different left and right delimiters" try: field_start = line.index(left, start_index) except ValueError: #no such field return None else: try: field_end = line.index(right, field_start) except ValueError: #left but no right delimiter: raise error raise ValueError, \ "Found '%s' but not '%s' in line %s, starting at %s." \ % (left, right, line, start_index) #if we got here, we found the start and end of the field return line[field_start + 1:field_end] def caps_from_underscores(string): """Converts words_with_underscores to CapWords.""" words = string.split('_') return ''.join([w.title() for w in words]) def InverseDict(d): """Returns inverse of d, setting keys to values and vice versa. Note: if more than one key has the same value, returns an arbitrary key for that value (overwrites with the last one encountered in the iteration). Can be invoked with anything that can be an argument for dict(), including an existing dict or a list of tuples. However, keys are always inserted in arbitrary order rather than input order. WARNING: will fail if any values are unhashable, e.g. if they are dicts or lists. """ if isinstance(d, dict): temp = d else: temp = dict(d) return dict([(val, key) for key, val in temp.iteritems()]) def InverseDictMulti(d): """Returns inverse of d, setting keys to values and values to list of keys. Note that each value will _always_ be a list, even if one item. Can be invoked with anything that can be an argument for dict(), including an existing dict or a list of tuples. However, keys are always appended in arbitrary order, not the input order. WARNING: will fail if any values are unhashable, e.g. if they are dicts or lists. 
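InverseDictMulti above inverts a mapping while keeping every key, collecting the keys for each shared value into a list. A Python 3 sketch of the same accumulation (the name `inverse_dict_multi` is illustrative):

```python
def inverse_dict_multi(d):
    """Invert d, mapping each value to the list of keys that carried it."""
    result = {}
    for key, val in d.items():
        # each value always maps to a list, even with a single key
        result.setdefault(val, []).append(key)
    return result
```

As the original warns, this fails with TypeError if any value is unhashable (e.g. a list or dict), since values become keys.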
""" if isinstance(d, dict): temp = d else: temp = dict(d) result = {} for key, val in temp.iteritems(): if val not in result: result[val] = [] result[val].append(key) return result def DictFromPos(seq): """Returns dict mapping items to list of positions of those items in seq. Always assigns values as lists, even if the item appeared only once. WARNING: will fail if items in seq are unhashable, e.g. if seq is a list of lists. """ result = {} for i, s in enumerate(seq): if s not in result: result[s] = [] result[s].append(i) return result def DictFromFirst(seq): """Returns dict mapping each item to index of its first occurrence in seq. WARNING: will fail if items in seq are unhashable, e.g. if seq is a list of lists. """ result = {} for i, s in enumerate(seq): if s not in result: result[s] = i return result def DictFromLast(seq): """Returns dict mapping each item to index of its last occurrence in seq. WARNING: will fail if items in seq are unhashable, e.g. if seq is a list of lists. """ return dict([(item, index) for index, item in enumerate(seq)]) def DistanceFromMatrix(matrix): """Returns function(i,j) that looks up matrix[i][j]. Useful for maintaining flexibility about whether a function is computed or looked up. Matrix can be a 2D dict (arbitrary keys) or list (integer keys). """ def result(i, j): return matrix[i][j] return result def PairsFromGroups(groups): """Returns dict such that d[(i,j)] exists iff i and j share a group. groups must be a sequence of sequences, e.g a list of strings. """ result = {} for group in groups: for i in group: for j in group: result[(i, j)] = None return result class ClassChecker(object): """Container for classes: 'if t in x == True' if t is the right class.""" def __init__(self, *Classes): """Returns a new ClassChecker that accepts specified classes.""" type_type = type(str) for c in Classes: if type(c) != type_type: raise TypeError, \ "ClassChecker found non-type object '%s' in parameter list." 
% c self.Classes = list(Classes) def __contains__(self, item): """Returns True if item is a subclass of one of the classes in self.""" for c in self.Classes: if isinstance(item, c): return True return False def __str__(self): """Informal string representation: returns list""" return str(self.Classes) class Delegator(object): """Mixin class that forwards unknown attributes to a specified object. Handles properties correctly (this was somewhat subtle). WARNING: If you are delegating to an object that pretends to have every attribute (e.g. a MappedRecord), you _must_ bypass normal attribute access in __init__ of your subclasses to ensure that the properties are set in the object itself, not in the object to which it delegates. Alternatively, you can initialize with None so that unhandled attributes are set in self, and then replace self._handler with your object right at the end of __init__. The first option is probably safer and more general. Warning: will not work on classes that use __slots__ instead of __dict__. """ def __init__(self, obj): """Returns a new Delegator that uses methods of obj. NOTE: It's important that this bypasses the normal attribute setting mechanism, or there's an infinite loop between __init__ and __setattr__. However, subclasses should be able to use the normal mechanism with impunity. """ self.__dict__['_handler'] = obj def __getattr__(self, attr): """Forwards unhandled attributes to self._handler. Sets _handler to None on first use if not already set. """ handler = self.__dict__.setdefault('_handler', None) return getattr(handler, attr) def __setattr__(self, attr, value): """Forwards requests to change unhandled attributes to self._handler. This logic is rather complicated because of GenericRecord objects, which masquerade as having every attribute, which can be used as handlers for Delegators, which forward all requests to their handlers. Consequently, we need to check the following: 1. Is attr in the object's __dict__? 
If so, set it in self. 2. Does the handler have attr? If so, try to set it in handler. 3. Does self lack the attr? If so, try to set it in handler. 4. Did setting attr in the handler fail? If so, set it in self. """ #if we're setting _handler, set it in dict directly (but complain if #it's self). if attr == '_handler': if value is self: raise ValueError, "Can't set object to be its own handler." self.__dict__['_handler'] = value return #check if the attribute is in this object's dict elif attr in self.__dict__: return object.__setattr__(self, attr, value) #then check if the class knows about it elif hasattr(self.__class__, attr): return object.__setattr__(self, attr, value) #then try to set it in the handler if hasattr(self._handler, attr) or not hasattr(self, attr): try: return setattr(self._handler, attr, value) except AttributeError: pass #will try to create the attribute on self return object.__setattr__(self, attr, value) class FunctionWrapper(object): """Wraps a function to hide it from a class so that it isn't a method.""" def __init__(self, Function): self.Function = Function def __call__(self, *args, **kwargs): return self.Function(*args, **kwargs) class ConstraintError(Exception): """Raised when constraint on a container is violated.""" pass class ConstrainedContainer(object): """Mixin class providing constraint checking to a container. Container should have a Constraint property that __contains__ the items that will be allowed in the container. Can also have a Mask property that contains a function that will be applied to each item (a) on checking the item for validity, and (b) on inserting the item in the container. WARNING: Because the Mask is evaluated both when the item is checked and when it is inserted, any side-effects it has are applied _twice_. This means that any Mask that mutates the object or changes global variables is unlikely to do what you want! 
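The attribute-forwarding idea behind the Delegator mixin above can be sketched in a few lines of modern Python (an editor's illustration only; the real class also forwards attribute *writes*, which is what makes the four-step logic above necessary):

```python
# Illustrative Python 3 sketch of the Delegator pattern described above:
# unknown attribute reads fall through to a wrapped handler object, and
# __init__ writes to __dict__ directly to sidestep attribute-setting hooks.

class Delegator:
    def __init__(self, obj):
        # bypass any __setattr__ logic by writing into __dict__ directly
        self.__dict__['_handler'] = obj

    def __getattr__(self, attr):
        # only invoked when normal lookup fails; forward to the handler
        return getattr(self.__dict__['_handler'], attr)

class Wrapped(Delegator):
    def shout(self):
        return self.upper()  # 'upper' is not defined here: forwarded to handler
```

For example, `Wrapped('hi').shout()` forwards `upper` to the wrapped string and returns `'HI'`.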
""" _constraint = None Mask = FunctionWrapper(identity) def _mask_for_new(self): """Returns self.Mask only if different from class data.""" if self.Mask is not self.__class__.Mask: return self.Mask else: return None def __init__(self, Constraint=None, Mask=None): """Returns new ConstrainedContainer, incorporating constraint. WARNING: Does not perform validation. It is the subclass's responsibility to perform validation during __init__ or __new__! """ if Constraint is not None: self._constraint = Constraint if Mask is not None: self.Mask = Mask def matchesConstraint(self, constraint): """Returns True if all items in self are allowed.""" #First checks if constraints are compatible. If not, or if the current #sequence has no constraint, does item by item search. #bail out if self or constraint is empty if not constraint or not self: return True #try checking constraints for compatibility if self.Constraint: try: constraint_ok = True for c in self.Constraint: if c not in constraint: constraint_ok = False break if constraint_ok: return True except TypeError: pass #e.g. tried to check wrong type item in string alphabet #get here if either self.Constraint is empty, or if we found an item #in self.Constraint that wasn't in the other constraint. In either case, #we need to check self item by item. if self: try: for i in self: if i not in constraint: return False except TypeError: #e.g. tried to check int in string alphabet return False return True def otherIsValid(self, other): """Returns True if other has only items allowed in self.Constraint.""" #First, checks other.Constrant for compatibility. #If other.Constraint is incompatible, checks items in other. 
mask = self.Mask constraint = self.Constraint if not constraint or not other: return True #bail out if empty try: #if other has a constraint, check whether it's compatible other_constraint = other.Constraint if other_constraint: for c in map(mask, other_constraint): if c not in constraint: raise ConstraintError return True except (ConstraintError, AttributeError, TypeError): pass #get here if other doesn't have a constraint or if other's constraint #isn't valid on self's constraint. try: for item in map(mask, other): if item not in constraint: return False except TypeError: return False #e.g. tried to check int in str alphabet return True def itemIsValid(self, item): """Returns True if single item is in self.Constraint.""" try: if (not self.Constraint) or self.Mask(item) in self.Constraint: return True else: return False except (TypeError, ConstraintError): #wrong type or not allowed return False def sequenceIsValid(self, sequence): """Returns True if all items in sequence are in self.Constraint.""" is_valid = self.itemIsValid for i in map(self.Mask, sequence): if not is_valid(i): return False return True def _get_constraint(self): """Accessor for constraint.""" return self._constraint def _set_constraint(self, constraint): """Mutator for constraint.""" if self.matchesConstraint(constraint): self._constraint = constraint else: raise ConstraintError, \ "Sequence '%s' incompatible with constraint '%s'" % (self, constraint) Constraint = property(_get_constraint, _set_constraint) class ConstrainedString(str, ConstrainedContainer): """String that is always valid on a specified constraint.""" def __new__(cls, data, Constraint=None, Mask=None): """Constructor class method for validated ConstrainedString.""" mask = Mask or cls.Mask if data == '': pass #map can't handle an empty sequence, sadly... 
elif isinstance(data, str): data = ''.join(map(mask, data)) else: try: data = str(map(mask, data)) except (TypeError, ValueError): data = str(mask(data)) new_string = str.__new__(cls, data) curr_constraint = Constraint or new_string.Constraint if curr_constraint and new_string: for c in new_string: try: is_valid = c in curr_constraint except TypeError: is_valid = False if not is_valid: raise ConstraintError, \ "Character '%s' not in constraint '%s'" % (c, curr_constraint) return new_string def __init__(self, data, Constraint=None, Mask=None): """Constructor for validated ConstrainedString.""" ConstrainedContainer.__init__(self, Constraint, Mask) def __add__(self, other): """Returns copy of self added to copy of other if constraint correct.""" if not self.otherIsValid(other): raise ConstraintError, \ "Sequence '%s' doesn't meet constraint" % other result = self.__class__(str(self) + ''.join(map(self.Mask, other)), \ Constraint=self.Constraint) mask = self._mask_for_new() if mask: result.Mask = mask return result def __mul__(self, multiplier): """Returns copy of self multiplied by multiplier.""" result = self.__class__(str.__mul__(self, multiplier), Constraint=self.Constraint) mask = self._mask_for_new() if mask: result.Mask = mask return result def __rmul__(self, multiplier): """Returns copy of self multiplied by multiplier.""" result = self.__class__(str.__rmul__(self, multiplier), Constraint=self.Constraint) mask = self._mask_for_new() if mask: result.Mask = mask return result def __getslice__(self, *args, **kwargs): """Make sure slice remembers the constraint.""" result = self.__class__(str.__getslice__(self, *args, **kwargs), \ Constraint=self.Constraint) mask = self._mask_for_new() if mask: result.Mask = mask return result def __getitem__(self, index): """Make sure extended slice remembers the constraint.""" items = str.__getitem__(self, index) if isinstance(index, slice): mask = self._mask_for_new() result = self.__class__(items, Constraint=self.Constraint) if 
mask: result.Mask = mask return result else: return items class MappedString(ConstrainedString): """As for ConstrainedString, but applies the Mask in __contains__.""" def __contains__(self, item): """Ensure that contains applies the mask.""" try: return super(MappedString, self).__contains__(self.Mask(item)) except (TypeError, ValueError): return False class ConstrainedList(ConstrainedContainer, list): """List that is always valid on a specified constraint.""" def __init__(self, data=None, Constraint=None, Mask=None): """Constructor for validated ConstrainedList.""" ConstrainedContainer.__init__(self, Constraint, Mask) if data: self.extend(data) def __add__(self, other): """Returns copy of self added to copy of other if constraint correct.""" result = self.__class__(list(self) + map(self.Mask, other), \ Constraint=self.Constraint) mask = self._mask_for_new() if mask: result.Mask = mask return result def __iadd__(self, other): """Adds other to self if constraint correct.""" other = map(self.Mask, other) if self.otherIsValid(other): return list.__iadd__(self, other) else: raise ConstraintError, \ "Sequence '%s' has items not in constraint '%s'" \ % (other, self.Constraint) def __mul__(self, multiplier): """Returns copy of self multiplied by multiplier.""" result = self.__class__(list(self) * multiplier, Constraint=self.Constraint) mask = self._mask_for_new() if mask: result.Mask = mask return result def __rmul__(self, multiplier): """Returns copy of self multiplied by multiplier.""" result = self.__class__(list(self) * multiplier, Constraint=self.Constraint) mask = self._mask_for_new() if mask: result.Mask = mask return result def __setitem__(self, index, item): """Sets self[index] to item if item in constraint. Handles slices.""" if isinstance(index, slice): if not self.otherIsValid(item): raise ConstraintError, \ "Sequence '%s' contains items not in constraint '%s'."
% \ (item, self.Constraint) item = map(self.Mask, item) else: if not self.itemIsValid(item): raise ConstraintError, "Item '%s' not in constraint '%s'" % \ (item, self.Constraint) item = self.Mask(item) list.__setitem__(self, index, item) def append(self, item): """Appends item to self.""" if not self.itemIsValid(item): raise ConstraintError, "Item '%s' not in constraint '%s'" % \ (item, self.Constraint) list.append(self, self.Mask(item)) def extend(self, sequence): """Appends sequence to self.""" if self.otherIsValid(sequence): list.extend(self, map(self.Mask, sequence)) else: raise ConstraintError, "Some items in '%s' not in constraint '%s'"\ % (sequence, self.Constraint) def insert(self, position, item): """Inserts item at position in self.""" if not self.itemIsValid(item): raise ConstraintError, "Item '%s' not in constraint '%s'" % \ (item, self.Constraint) list.insert(self, position, self.Mask(item)) def __getslice__(self, *args, **kwargs): """Make sure slice remembers the constraint.""" result = self.__class__(list.__getslice__(self, *args, **kwargs), \ Constraint=self.Constraint) mask = self._mask_for_new() if mask: result.Mask = mask return result def __setslice__(self, start, end, sequence): """Make sure invalid data can't get into slice.""" if self.otherIsValid(sequence): list.__setslice__(self, start, end, map(self.Mask, sequence)) else: raise ConstraintError, \ "Sequence '%s' has items not in constraint '%s'"\ % (sequence, self.Constraint) class MappedList(ConstrainedList): """As for ConstrainedList, but maps items on contains and getitem.""" def __contains__(self, item): """Ensure that contains applies the mask.""" try: return super(MappedList, self).__contains__(self.Mask(item)) except (TypeError, ValueError): return False class ConstrainedDict(ConstrainedContainer, dict): """Dict containing only keys that are valid on a specified constraint. 
Default behavior when fed a sequence is to store counts of the items in that sequence, which is not the standard dict interface (should raise a ValueError instead) but which is surprisingly useful in practice. """ ValueMask = FunctionWrapper(identity) def _get_mask_and_valmask(self): """Helper method to check whether Mask and ValueMask were set.""" if self.Mask is self.__class__.Mask: mask = None else: mask = self.Mask if self.ValueMask is self.__class__.ValueMask: valmask = None else: valmask = self.ValueMask return mask, valmask def __init__(self, data=None, Constraint=None, Mask=None, ValueMask=None): """Constructor for validated ConstrainedDict.""" ConstrainedContainer.__init__(self, Constraint, Mask) if ValueMask is not None: self.ValueMask = ValueMask if data: try: self.update(data) except (ValueError, TypeError): for d in map(self.Mask, iterable(data)): curr = self.get(d, 0) self[d] = curr + 1 def __setitem__(self, key, value): """Sets self[key] to value if value in constraint.""" if not self.itemIsValid(key): raise ConstraintError, "Item '%s' not in constraint '%s'" % \ (key, self.Constraint) key, value = self.Mask(key), self.ValueMask(value) dict.__setitem__(self, key, value) def copy(self): """Should return copy of self, including constraint.""" mask, valmask = self._get_mask_and_valmask() return self.__class__(self, Constraint=self.Constraint, Mask=mask, ValueMask=valmask) def fromkeys(self, keys, value=None): """Returns new dictionary with same constraint as self.""" mask, valmask = self._get_mask_and_valmask() return self.__class__(dict.fromkeys(keys, value), Constraint=self.Constraint, Mask=mask, ValueMask=valmask) def setdefault(self, key, default=None): """Returns self[key], setting self[key]=default if absent.""" key, default = self.Mask(key), self.ValueMask(default) if key not in self: self[key] = default return self[key] def update(self, other): """Updates self with items in other. 
Implementation note: currently uses __setitem__, so no need to apply masks in this method. """ if not hasattr(other, 'keys'): other = dict(other) for key in other: self[key] = other[key] class MappedDict(ConstrainedDict): """As for ConstrainedDict, but maps keys on contains and getitem.""" def __contains__(self, item): """Ensure that contains applies the mask.""" try: return super(MappedDict, self).__contains__(self.Mask(item)) except (TypeError, ValueError): return False def __getitem__(self, item): """Ensure that getitem applies the mask.""" return super(MappedDict, self).__getitem__(self.Mask(item)) def get(self, item, default=None): """Ensure that get applies the mask.""" return super(MappedDict, self).get(self.Mask(item), default) def has_key(self, item): """Ensure that has_key applies the mask.""" return super(MappedDict, self).has_key(self.Mask(item)) def getNewId(rand_f=randrange): """Returns a string of 12 random digits to be used as an id.""" NUM_DIGITS = 12 return ''.join(map(str, [rand_f(10) for i in range(NUM_DIGITS)])) #end function getNewId def generateCombinations(alphabet, combination_length): """Returns an array of strings: all combinations of a given length. alphabet: a sequence (string or list) type object containing the characters that can be used to make the combinations. combination_length: a long-castable value (integer) specifying the length of desired combinations. comb is used as an abbreviation of combinations throughout. """ found_combs = [] num_combs = 0 try: alphabet_len = len(alphabet) combination_length = long(combination_length) except (TypeError, ValueError): #conversion failed raise RuntimeError, "Bad parameter: alphabet must be of sequence " + \ "type and combination_length must be castable " + \ "to long."
#end parameter conversion try/catch #the number of combs is alphabet length raised to the combination length if combination_length != 0: num_combs = pow(alphabet_len, combination_length) #end if for curr_comb_num in xrange(num_combs): curr_digit = 0 curr_comb = [0] * combination_length while curr_comb_num: curr_comb[curr_digit] = curr_comb_num % alphabet_len curr_comb_num = curr_comb_num / alphabet_len curr_digit += 1 #end while #now translate the list of digits into a list of characters real_comb = [] for position in curr_comb: real_comb.append(alphabet[position]) found_combs.append("".join(real_comb)) #next combination number return found_combs #end generateCombinations def toString(obj): """Public function to write a string of object's properties & their vals. This function looks only at the local properties/methods/etc of the object it is sent, and only examines public and first-level private (starts with _ but not __) entries. It ignores anything that is a method, function, or class. Any attribute whose value looks like a printout of a memory address (starts with < and ends with >) has its value replaced with the word "object". 
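The enumeration loop in generateCombinations above treats each combination number as a base-`len(alphabet)` numeral, peeling off one digit per position. As an editor's sketch (not the module's code), `itertools.product` yields the same set of strings, though possibly in a different order than the digit-by-digit loop:

```python
from itertools import product

# Illustrative Python 3 equivalent of generateCombinations: enumerate all
# length-n strings over an alphabet.  Returns [] for length 0, matching the
# original's behavior of producing no combinations in that case.

def generate_combinations(alphabet, combination_length):
    if combination_length == 0:
        return []
    return [''.join(p) for p in product(alphabet, repeat=combination_length)]
```

For an alphabet of size k there are k**n combinations of length n, e.g. 4**3 = 64 trinucleotides over 'acgt'.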
""" ignored_types = [types.BuiltinFunctionType, types.BuiltinMethodType, \ types.ClassType, types.FunctionType, types.MethodType, \ types.UnboundMethodType] result = [] for slot in obj.__dict__: if not slot.startswith("__"): ignore_attr = False attr = getattr(obj, slot) for ignored_type in ignored_types: if isinstance(attr, ignored_type): ignore_attr = True #next ignored type if not ignore_attr: attr_value = str(attr) if attr_value.startswith("<") and attr_value.endswith(">"): attr_value = "object" #end if result.append(slot + ": " + attr_value) #end if #end if #next property return "; ".join(result) #end toString #A class for exceptions caused when param cannot be cast to nonneg int class NonnegIntError(ValueError): """for exceptions caused when param cannot be cast to nonneg int""" def __init__(self, args=None): self.args = args #end __init__ #end NonnegIntError def makeNonnegInt(n): """Public function to cast input to nonneg int and return, or raise err""" try: n = abs(int(n)) except: raise NonnegIntError, n + " must be castable to a nonnegative int" #end try/except return n #end makeNonnegInt def reverse_complement(seq, use_DNA=True): """Public function to reverse complement DNA or RNA sequence string seq: a string use_DNA: a boolean indicating (if true) that A should translate to T. If false, RNA is assumed (A translates to U). Default is True. Returns a reverse complemented string. """ bad_chars = set(seq) - set("ACGTUacgtu") if len(bad_chars) > 0: raise ValueError,\ ("Only ACGTU characters may be passed to reverse_complement. Other " "characters were identified: %s. Use cogent.DNA.rc if you need to " "reverse complement ambiguous bases." 
% ''.join(bad_chars)) #decide which translation to use for complementation if use_DNA: trans_table = maketrans("ACGTacgt", "TGCAtgca") else: trans_table = maketrans("ACGUacgu", "UGCAugca") #end if #complement the input sequence, then reverse complemented = seq.translate(trans_table) comp_list = list(complemented) comp_list.reverse() #join the reverse-complemented list and return return "".join(comp_list) # The reverse_complement function was previously called revComp, but that # naming doesn't adhere to the PyCogent coding guidelines. Renamed, but # keeping the old name around to not break existing code. revComp = reverse_complement #end revComp def timeLimitReached(start_time, time_limit): """Return True if more than time_limit has elapsed since start_time""" result = False curr_time = clock() elapsed = curr_time - start_time if elapsed > time_limit: result = True return result #end timeLimitReached def not_none(seq): """Returns True if no item in seq is None.""" for i in seq: if i is None: return False return True #end not_none def get_items_except(seq, indices, seq_constructor=None): """Returns all items in seq that are not in indices Returns the same type as passed in, except when a seq_constructor is set. """ sequence = list(seq) index_lookup = dict.fromkeys(indices) result = [sequence[i] for i in range(len(seq)) \ if i not in index_lookup] if not seq_constructor: if isinstance(seq, str): return ''.join(result) else: seq_constructor = seq.__class__ return seq_constructor(result) #end get_items_except def NestedSplitter(delimiters=[None], same_level=False, constructor=strip, filter_=False): """returns a splitter which returns a (possibly nested) list from a str, splitting on each delimiter in turn same_level -- if true, all the leaf items will be split whether or not they contain delimiters constructor: applied to modify each split field.
filter_: if not False (the default), used to filter the splits Note: the line input to parser is expected to be a str; this is not checked """ def parser(line, index=0): #split line with curr delimiter curr = delimiters[index] if isinstance(curr, (list, tuple)): try: delim, maxsplits = curr except ValueError: raise ValueError("delimiter tuple/list should be \ [delimiter_str, maxsplits]") if maxsplits < 0: result = line.rsplit(delim, -maxsplits) else: result = line.split(delim, maxsplits) else: result = line.split(curr) #modify splits if required if constructor: result = map(constructor, result) if filter_ != False: #allow filter(None,..) to rip off the empty items result = filter(filter_, result) #repeat recursively for next delimiter if index != len(delimiters) - 1: #not last delimiter result = [parser(f, index + 1) for f in result] #undo split if curr not in line and same_level==False #ignore the first delimiter if not same_level and index > 0 \ and len(result) == 1 and isinstance(result[0], basestring): result = result[0] return result #parser.__doc__ = make_innerdoc(NestedSplitter, parser, locals()) return parser #end NestedSplitter def app_path(app,env_variable='PATH'): """Returns path to an app, or False if app does not exist in env_variable This functions in the same way as `which` in that it returns the first path that contains the app. """ # strip off " characters, in case we got a FilePath object app = app.strip('"') paths = getenv(env_variable).split(':') for path in paths: p = join(path,app) if exists(p): return p return False #some error codes for creating a dir def get_create_dir_error_codes(): return {'NO_ERROR': 0, 'DIR_EXISTS': 1, 'FILE_EXISTS': 2, 'OTHER_OS_ERROR':3} def create_dir(dir_name, fail_on_exist=False, handle_errors_externally=False): """Create a dir safely and fail meaningfully. dir_name: name of directory to create fail_on_exist: if true raise an error if dir already exists handle_errors_externally: if True do not raise Errors, but return failure codes.
This allows one to handle errors locally and e.g. hint the user at a --force_overwrite option. returns values (if no Error raised): 0: dir could be safely made 1: directory already existed 2: a file with the same name exists 3: any other unspecified OSError See qiime/denoiser.py for an example of how to use this mechanism. Note: Depending on how thorough we want to be we could add tests, e.g. for testing actual write permission in an existing dir. """ error_code_lookup = get_create_dir_error_codes() #pre-instantiate function with ror = curry(handle_error_codes, dir_name, handle_errors_externally) if exists(dir_name): if isdir(dir_name): #dir is there if fail_on_exist: return ror(error_code_lookup['DIR_EXISTS']) else: return error_code_lookup['DIR_EXISTS'] else: #must be file with same name return ror(error_code_lookup['FILE_EXISTS']) else: #no dir there, try making it try: makedirs(dir_name) except OSError: return ror(error_code_lookup['OTHER_OS_ERROR']) return error_code_lookup['NO_ERROR'] def handle_error_codes(dir_name, supress_errors=False, error_code=None): """Wrapper function for error_handling. dir_name: name of directory that raised the error supress_errors: if True return error_code, otherwise raise an Error error_code: the code for the error """ error_code_lookup = get_create_dir_error_codes() if error_code == None: error_code = error_code_lookup['NO_ERROR'] error_strings = \ {error_code_lookup['DIR_EXISTS'] : "Directory already exists: %s" % dir_name, error_code_lookup['FILE_EXISTS'] : "File with same name exists: %s" % dir_name, error_code_lookup['OTHER_OS_ERROR']: "Could not create output directory: %s.
" % dir_name + "Check the permissions."} if error_code == error_code_lookup['NO_ERROR']: return error_code_lookup['NO_ERROR'] if supress_errors: return error_code else: raise OSError, error_strings[error_code] def remove_files(list_of_filepaths, error_on_missing=True): """Remove list of filepaths, optionally raising an error if any are missing """ missing = [] for fp in list_of_filepaths: try: remove(fp) except OSError: missing.append(fp) if error_on_missing and missing: raise OSError, "Some filepaths were not accessible: %s" % '\t'.join(missing) def get_random_directory_name(suppress_mkdir=False,\ timestamp_pattern='%Y%m%d%H%M%S',\ rand_length=20,\ output_dir=None,\ prefix='', suffix='', return_absolute_path=True): """Build a random directory name and create the directory suppress_mkdir: only build the directory name, don't create the directory (default: False) timestamp_pattern: string passed to strftime() to generate the timestamp (pass '' to suppress the timestamp) rand_length: length of random string of characters output_dir: the directory which should contain the random directory prefix: prefix for directory name suffix: suffix for directory name return_absolute_path: If False, returns the local (relative) path to the new directory """ output_dir = output_dir or './' # Define a set of characters to be used in the random directory name chars = "abcdefghigklmnopqrstuvwxyz" picks = chars + chars.upper() + "0123456790" # Get a time stamp timestamp = datetime.now().strftime(timestamp_pattern) # Construct the directory name dirname = '%s%s%s%s' % (prefix,timestamp,\ ''.join([choice(picks) for i in range(rand_length)]),\ suffix) dirpath = join(output_dir,dirname) abs_dirpath = abspath(dirpath) # Make the directory if not suppress_mkdir: try: makedirs(abs_dirpath) except OSError: raise OSError,\ "Cannot make directory %s. Do you have write access?" 
% dirpath # Return the path to the directory if return_absolute_path: return abs_dirpath return dirpath def get_independent_coords(spans, random_tie_breaker=False): """returns non-overlapping spans. spans must have structure [(start, end, ..), (..)]. spans can be decorated with arbitrary data after the end entry. Arguments: - random_tie_breaker: break overlaps by randomly choosing the first or second span. Defaults to the first span. """ if len(spans) <= 1: return spans last = spans[0] result = [last] for i in range(1, len(spans)): curr = spans[i] if curr[0] < last[1]: if random_tie_breaker: result[-1] = [last, curr][randint(0,1)] else: result[-1] = last continue result.append(curr) last = curr return result def get_merged_overlapping_coords(start_end): """merges overlapping spans, assumes sorted by start""" result = [start_end[0]] prev_end = result[0][-1] for i in range(1, len(start_end)): curr_start, curr_end = start_end[i] # if we're beyond previous, add and continue if curr_start > prev_end: prev_end = curr_end result.append([curr_start, curr_end]) elif curr_end > prev_end: prev_end = curr_end result[-1][-1] = prev_end else: pass # we lie completely within previous span return result def get_run_start_indices(values, digits=None, converter_func=None): """returns starting index, value for all distinct values""" assert not (digits and converter_func), \ 'Cannot set both digits and converter_func' if digits is not None: converter_func = lambda x: round(x, digits) elif converter_func is None: converter_func = lambda x: x last_val = None for index, val in enumerate(values): val = converter_func(val) if val != last_val: yield [index, val] last_val = val return def get_merged_by_value_coords(spans_value, digits=None): """returns adjacent spans merged if they have the same value. Assumes [(start, end, val), ..] structure and that spans_value is sorted in ascending order. Arguments: - digits: if None, any data can be handled and exact values are compared. 
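The overlap-merging logic in get_merged_overlapping_coords above can be sketched as a standalone function (an editor's illustration in modern Python; `merge_overlapping` is a hypothetical name):

```python
# Illustrative sketch of the span-merging logic described above: given
# [start, end] pairs sorted by start, extend the previous span while the
# next one overlaps or touches it, otherwise start a new span.

def merge_overlapping(start_end):
    result = [list(start_end[0])]
    for curr_start, curr_end in start_end[1:]:
        if curr_start > result[-1][1]:       # disjoint: begin a new span
            result.append([curr_start, curr_end])
        elif curr_end > result[-1][1]:       # overlap: extend previous span
            result[-1][1] = curr_end
        # else: lies completely within the previous span, nothing to do
    return result
```

For example, `[[1, 5], [4, 8], [10, 12]]` merges to `[[1, 8], [10, 12]]`, and a span contained in its predecessor is dropped.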
Otherwise values are rounded to that many digits. """ assert len(spans_value[0]) == 3, 'spans_value must have 3 records per row' starts, ends, vals = zip(*spans_value) indices_distinct_vals = get_run_start_indices(vals, digits=digits) data = [] i = 0 for index, val in indices_distinct_vals: start = starts[index] end = ends[index] prev_index = max(index-1, 0) try: data[-1][1] = ends[prev_index] except IndexError: pass data.append([start, end, val]) if index < len(ends): data[-1][1] = ends[-1] return data PyCogent-1.5.3/cogent/util/modules.py000644 000765 000024 00000002541 12024702176 020521 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Compiled modules may be out of date or missing""" import os, sys __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class ExpectedImportError(ImportError): pass def fail(msg): print >>sys.stderr, msg raise ExpectedImportError def importVersionedModule(name, globals, min_version, alt_desc): if os.environ.has_key('COGENT_PURE_PYTHON'): fail('Not using compiled module "%s". Will use %s.' % (name, alt_desc)) try: m = __import__(name, globals) except ImportError: fail('Compiled module "%s" not found. Will use %s.' % (name, alt_desc)) version = getattr(m, 'version_info', (0, 0)) desc = '.'.join(str(n) for n in version) min_desc = '.'.join(str(n) for n in min_version) max_desc = str(min_version[0])+'.x' if version < min_version: fail('Compiled module "%s" is too old as %s < %s. ' 'Will use %s.' % (name, desc, min_desc, alt_desc)) if version[0] > min_version[0]: fail('Compiled module "%s" is too new as %s > %s. ' 'Will use %s.' 
% (name, desc, max_desc, alt_desc)) return m PyCogent-1.5.3/cogent/util/option_parsing.py000644 000765 000024 00000024753 12024702176 022115 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Utilities for handling script options """ from copy import copy import sys from optparse import (OptionParser, OptionGroup, Option, OptionValueError) from os import popen, remove, makedirs, getenv from os.path import join, abspath, exists, isdir, isfile __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso","Daniel McDonald", "Gavin Huttley","Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Production" ## Definition of CogentOption option type, a subclass of Option that ## contains specific types for filepaths and directory paths. This ## will be particularly useful for graphical interfaces that make ## use of the script_info dictionary as they can then distinguish ## paths from ordinary strings def check_existing_filepath(option, opt, value): if not exists(value): raise OptionValueError( "option %s: file does not exist: %r" % (opt, value)) elif not isfile(value): raise OptionValueError( "option %s: not a regular file (can't be a directory!): %r" % (opt, value)) else: return value def check_existing_filepaths(option, opt, value): values = value.split(',') for v in values: check_existing_filepath(option,opt,v) return values def check_existing_dirpath(option, opt, value): if not exists(value): raise OptionValueError( "option %s: directory does not exist: %r" % (opt, value)) elif not isdir(value): raise OptionValueError( "option %s: not a directory (can't be a file!): %r" % (opt, value)) else: return value def check_new_filepath(option, opt, value): return value def check_new_dirpath(option, opt, value): return value def check_existing_path(option, opt, value): if not exists(value): raise OptionValueError( "option %s:
path does not exist: %r" % (opt, value)) return value def check_new_path(option, opt, value): return value class CogentOption(Option): TYPES = Option.TYPES + ("existing_path", "new_path", "existing_filepath", "existing_filepaths", "new_filepath", "existing_dirpath", "new_dirpath",) TYPE_CHECKER = copy(Option.TYPE_CHECKER) # for cases where the user specifies an existing file or directory # as input, but it can be either a dir or a file TYPE_CHECKER["existing_path"] = check_existing_path # for cases where the user specifies a new file or directory # as output, but it can be either a dir or a file TYPE_CHECKER["new_path"] = check_new_path # for cases where the user passes a single existing file TYPE_CHECKER["existing_filepath"] = check_existing_filepath # for cases where the user passes one or more existing files # as a comma-separated list - paths are returned as a list TYPE_CHECKER["existing_filepaths"] = check_existing_filepaths # for cases where the user is passing a new path to be # create (e.g., an output file) TYPE_CHECKER["new_filepath"] = check_new_filepath # for cases where the user is passing an existing directory # (e.g., containing a set of input files) TYPE_CHECKER["existing_dirpath"] = check_existing_dirpath # for cases where the user is passing a new directory to be # create (e.g., an output dir which will contain many result files) TYPE_CHECKER["new_dirpath"] = check_new_dirpath make_option = CogentOption ## End definition of new option type def build_usage_lines(required_options, script_description, script_usage, optional_input_line, required_input_line): """ Build the usage string from components """ line1 = 'usage: %prog [options] ' + '{%s}' %\ ' '.join(['%s %s' % (str(ro),ro.dest.upper())\ for ro in required_options]) usage_examples = [] for title, description, command in script_usage: title = title.strip(':').strip() description = description.strip(':').strip() command = command.strip() if title: usage_examples.append('%s: %s\n %s' %\ 
(title,description,command)) else: usage_examples.append('%s\n %s' % (description,command)) usage_examples = '\n\n'.join(usage_examples) lines = (line1, '', # Blank line optional_input_line, required_input_line, '', # Blank line script_description, '', # Blank line 'Example usage: ',\ 'Print help message and exit', ' %prog -h\n', usage_examples) return '\n'.join(lines) def set_parameter(key,kwargs,default=None): try: return kwargs[key] except KeyError: return default def set_required_parameter(key,kwargs): try: return kwargs[key] except KeyError: raise KeyError,\ "parse_command_line_parameters requires value for %s" % key def parse_command_line_parameters(**kwargs): """ Constructs the OptionParser object and parses command line arguments parse_command_line_parameters takes a dict of objects via kwargs which it uses to build command line interfaces according to standards developed in the Knight Lab, and enforced in QIIME. The currently supported options are listed below with their default values. If no default is provided, the option is required. script_description script_usage = [("","","")] version required_options=None optional_options=None suppress_verbose=False disallow_positional_arguments=True help_on_no_arguments=True optional_input_line = '[] indicates optional input (order unimportant)' required_input_line = '{} indicates required input (order unimportant)' These values can either be passed directly, as: parse_command_line_parameters(script_description="My script",\ script_usage=[('Print help','%prog -h','')],\ version=1.0) or they can be passed via a pre-constructed dict, as: d = {'script_description':"My script",\ 'script_usage':[('Print help','%prog -h','')],\ 'version':1.0} parse_command_line_parameters(**d) """ # Get the options, or defaults if none were provided. 
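The kwargs-with-defaults lookup that `parse_command_line_parameters` relies on can be restated in isolation (a Python 3 restatement of the two helpers above; the `opts` dict is a hypothetical example):

```python
def set_parameter(key, kwargs, default=None):
    """Return kwargs[key] if present, otherwise the supplied default."""
    try:
        return kwargs[key]
    except KeyError:
        return default

def set_required_parameter(key, kwargs):
    """Return kwargs[key]; a missing key is a caller error."""
    try:
        return kwargs[key]
    except KeyError:
        raise KeyError(
            "parse_command_line_parameters requires value for %s" % key)

opts = {'script_description': 'My script', 'version': '1.0'}
print(set_parameter('script_usage', opts, [("", "", "")]))  # falls back to default
print(set_required_parameter('version', opts))              # '1.0'
```

This is the same pattern `parse_command_line_parameters` applies to each of its supported options: required ones go through `set_required_parameter`, everything else gets a documented default.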
script_description = set_required_parameter('script_description',kwargs) version = set_required_parameter('version',kwargs) script_usage = set_parameter('script_usage',kwargs,[("","","")]) required_options = set_parameter('required_options',kwargs,[]) optional_options = set_parameter('optional_options',kwargs,[]) suppress_verbose = set_parameter('suppress_verbose',kwargs,False) disallow_positional_arguments =\ set_parameter('disallow_positional_arguments',kwargs,True) help_on_no_arguments = set_parameter('help_on_no_arguments',kwargs,True) optional_input_line = set_parameter('optional_input_line',kwargs,\ '[] indicates optional input (order unimportant)') required_input_line = set_parameter('required_input_line',kwargs,\ '{} indicates required input (order unimportant)') # command_line_text will usually be nothing, but can be passed for # testing purposes command_line_args = set_parameter('command_line_args',kwargs,None) # Build the usage and version strings usage = build_usage_lines(required_options,script_description,script_usage,\ optional_input_line,required_input_line) version = 'Version: %prog ' + version # Instantiate the command line parser object parser = OptionParser(usage=usage, version=version) parser.exit = set_parameter('exit_func',kwargs,parser.exit) # If no arguments were provided, print the help string (unless the # caller specified not to) if help_on_no_arguments and (not command_line_args) and len(sys.argv) == 1: parser.print_usage() return parser.exit(-1) # Process the required options if required_options: # Define an option group so all required options are # grouped together, and under a common header required = OptionGroup(parser, "REQUIRED options", "The following options must be provided under all circumstances.") for ro in required_options: # if the option doesn't already end with [REQUIRED], # add it. 
if not ro.help.strip().endswith('[REQUIRED]'): ro.help += ' [REQUIRED]' required.add_option(ro) parser.add_option_group(required) # Add a verbose parameter (if the caller didn't specify not to) if not suppress_verbose: parser.add_option('-v','--verbose',action='store_true',\ dest='verbose',help='Print information during execution '+\ '-- useful for debugging [default: %default]',default=False) # Add the optional options map(parser.add_option,optional_options) # Parse the command line # command_line_args will be None except in test cases, in which # case sys.argv[1:] will be parsed opts,args = parser.parse_args(command_line_args) # If positional arguments are not allowed, and any were provided, # raise an error. if disallow_positional_arguments and len(args) != 0: parser.error("Positional argument detected: %s\n" % str(args[0]) +\ " Be sure all parameters are identified by their option name.\n" +\ " (e.g.: include the '-i' in '-i INPUT_DIR')") # Test that all required options were provided. if required_options: required_option_ids = [o.dest for o in required.option_list] for required_option_id in required_option_ids: if getattr(opts,required_option_id) == None: return parser.error('Required option --%s omitted.' \ % required_option_id) # Return the parser, the options, and the arguments. The parser is returned # so users have access to any additional functionality they may want at # this stage -- most commonly, it will be used for doing custom tests of # parameter values. return parser, opts, args PyCogent-1.5.3/cogent/util/organizer.py000644 000765 000024 00000007334 12024702176 021056 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides Filter, Organizer, GroupList. Filter is a dictionary of {field_name:[filter_functions]}. Organizer is initiated with a list of Filters and it is called on a list of objects. It organizes all objects according to the given criteria (Filters). GroupList stores a group of objects and a separate hierarchy (i.e.
list of elements or functions it matched). """ __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" class Filter(dict): """Dictionary of functions, i.e. selection criteria""" def __init__(self,Name,functions_dict=None): """Returns a new Filter object with given name and functions""" if not functions_dict: functions_dict = {} super(Filter, self).__init__(functions_dict) self.Name = Name def __call__(self, item): """Returns True if the item satisfies all criteria""" for field in self: for function in self[field]: try: if field is None: if not function(item): return False else: if not function(getattr(item, field)): return False except AttributeError: return False return True class GroupList(list): """GroupList is a list of objects accompanied by a hierarchy. The data can be of any type. Groups is a list that stores the hierarchy of things the data matched. E.g. filters, classifiers.
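A minimal usage sketch of the `Filter` pattern above (field name mapped to a list of predicates, with `None` meaning "test the object itself"); the `Record` class and the length threshold are hypothetical examples, and the class body is restated in Python 3 form so it runs standalone:

```python
class Filter(dict):
    """Dictionary of {field_name: [predicate, ...]} selection criteria."""
    def __init__(self, Name, functions_dict=None):
        super(Filter, self).__init__(functions_dict or {})
        self.Name = Name

    def __call__(self, item):
        # True only if every predicate passes on every listed field;
        # a missing attribute counts as a failed criterion.
        for field in self:
            for function in self[field]:
                try:
                    target = item if field is None else getattr(item, field)
                    if not function(target):
                        return False
                except AttributeError:
                    return False
        return True

class Record(object):
    def __init__(self, length):
        self.Length = length

short = Filter('short', {'Length': [lambda x: x < 100]})
print(short(Record(50)), short(Record(200)))  # True False
```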
Example GroupList: data = [1,2,3,4], groups = ['numbers','<5'] GroupList: data = [5,6,7,8], groups = ['numbers','>=5'] """ def __init__(self, data, Groups=None): """Returns a new GroupList with data and Groups""" super(GroupList, self).__init__(data) self.Groups = Groups or [] class Organizer(list): """Organizer puts given objects into dict according to given criteria""" def _find_first_match(self, item): """Returns the name of the first criterion that applies to the item""" for criterion in self: if criterion(item): return criterion.Name def __call__(self, data, existing_results=None): """Puts the sequences into the dictionary according to the criteria""" temp_results = {None:[]} #dict to store intermediate results results = [] #the final results try: groups = data.Groups except AttributeError: groups = [] #figure out what criteria we have, and make keys for them for criterion in self: temp_results[criterion.Name] = [] #partition each datum according to which filter it matches first for item in data: temp_results[self._find_first_match(item)].append(item) return [GroupList(value, groups + [key]) \ for key, value in temp_results.items() if value] def regroup(data): """Regroups data into matching categories. data is a list of GroupLists (a list with hierarchy info associated) e.g. 
regroup([GroupList([1,2],['a']),GroupList([8,9],['b']), GroupList([3,4],['a'])]) will give: [[1,2,3,4],[8,9]] """ result = {} for group_list in data: analyses = tuple(group_list.Groups) try: result[analyses].extend(group_list) except KeyError: result[analyses] = [] result[analyses].extend(group_list) return [GroupList(result[k],list(k)) for k,v in result.items()] PyCogent-1.5.3/cogent/util/parallel.py000644 000765 000024 00000021462 12024702176 020650 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import with_statement import os, sys from contextlib import contextmanager import warnings import threading import multiprocessing import multiprocessing.pool __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Andrew Butterfield", "Peter Maxwell", "Gavin Huttley", "Matthew Wakefield", "Edward Lang"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin Huttley" __status__ = "Production" # A flag to control if excess CPUs are worth a warning.
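The `regroup` behaviour documented above can be checked with a small standalone version; this sketch uses plain `(items, groups)` tuples instead of `GroupList` and `dict.setdefault` in place of the try/except, but merges on the same key: the tuple of the hierarchy labels.

```python
def regroup(data):
    """Merge (items, groups) records whose groups hierarchies match."""
    result = {}
    for items, groups in data:
        # records sharing the same hierarchy tuple are pooled together
        result.setdefault(tuple(groups), []).extend(items)
    return [(items, list(key)) for key, items in result.items()]

merged = regroup([([1, 2], ['a']), ([8, 9], ['b']), ([3, 4], ['a'])])
print(sorted(merged))  # [([1, 2, 3, 4], ['a']), ([8, 9], ['b'])]
```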
inefficiency_forgiven = False class _FakeCommunicator(object): """Looks like a 1-cpu MPI communicator, but isn't""" def Get_rank(self): return 0 def Get_size(self): return 1 def Split(self, colour, key=0): return (self, self) def allreduce(self, value, op=None): return value def allgather(self, value): return [value] def bcast(self, obj, source): return obj def Bcast(self, array, source): pass def Barrier(self): pass FAKE_MPI_COMM = _FakeCommunicator() class _FakeMPI(object): # required MPI module constants SUM = MAX = DOUBLE = 'fake' COMM_WORLD = FAKE_MPI_COMM if os.environ.get('DONT_USE_MPI', 0): print >>sys.stderr, 'Not using MPI' MPI = None else: try: from mpi4py import MPI except ImportError: warnings.warn('Not using MPI as mpi4py not found', stacklevel=2) MPI = None else: size = MPI.COMM_WORLD.Get_size() if size == 1: MPI = None if MPI is None: USING_MPI = False MPI = _FakeMPI() else: USING_MPI = True class ParallelContext(object): """A parallel context encapsulates the number of CPUs available and the mechanism by which they communicate. All contexts offer the lowest-common- denominator of parallel mechanisms - parallel imap.""" pass class NonParallelContext(ParallelContext): """This dummy parallel context is used when there is only one CPU available """ size = 1 def getCommunicator(self): return FAKE_MPI_COMM def split(self, jobs): return (self, self) def imap(self, f, s, chunksize=None): for element in s: yield f(element) NONE = NonParallelContext() class UnFlattened(list): pass class MPIParallelContext(ParallelContext): """This parallel context divides the available CPUs into groups of equal size. Inner levels of potential parallelism can then further subdivide those groups. 
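`_FakeCommunicator` above is a null-object stand-in: code written against the MPI communicator interface keeps working on a single CPU without mpi4py installed, because every collective operation degenerates to the identity. A trimmed sketch of the idea (method subset chosen for illustration):

```python
class FakeCommunicator(object):
    """Behaves like a 1-process MPI communicator: every collective is trivial."""
    def Get_rank(self):
        return 0
    def Get_size(self):
        return 1
    def allreduce(self, value, op=None):
        return value           # reducing over one process returns the value
    def allgather(self, value):
        return [value]         # gathering from one process yields a singleton
    def bcast(self, obj, source):
        return obj             # broadcasting to yourself changes nothing

comm = FakeCommunicator()
total = comm.allreduce(42)     # MPI-style code runs unchanged
print(comm.Get_size(), total)  # 1 42
```

This is why `parallel.py` can unconditionally hand out `FAKE_MPI_COMM` from `getCommunicator()` when MPI is absent: callers never need to branch on whether real MPI is underneath.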
It helps to have a CPU count which is divisible by the task count.""" def __init__(self, comm=None): if comm is None: comm = MPI.COMM_WORLD self.comm = comm self.size = comm.Get_size() def getCommunicator(self): return self.comm def split(self, jobs): assert jobs > 0 size = self.size group_count = min(jobs, size) while size % group_count: group_count += 1 if group_count == 1: (next, sub) = (NONE, self) elif group_count == size: (next, sub) = (self, NONE) else: rank = self.comm.Get_rank() klass = type(self) next = klass(self.comm.Split(rank // group_count, rank)) sub = klass(self.comm.Split(rank % group_count, rank)) return (next, sub) def imap(self, f, s, chunksize=1): comm = self.comm (size, rank) = (comm.Get_size(), comm.Get_rank()) ordinals = range(0, len(s), size*chunksize) # ensure same number of allgather calls in every process for start in ordinals: start += rank local_results = UnFlattened([f(x) for x in s[start:start+chunksize]]) for results in comm.allgather(local_results): # mpi4py allgather has a nasty inconsistency about flattening # lists of simple values if isinstance(results, UnFlattened): for result in results: yield result else: yield results # Helping MultiprocessingParallelContext map unpicklable functions _FUNCTIONS = {} class PicklableAndCallable(object): def __init__(self, key): self.key = key self.func = None def __call__(self, *args, **kw): if self.func is None: try: self.func = _FUNCTIONS[self.key] except KeyError: raise RuntimeError return self.func(*args, **kw) class MultiprocessingParallelContext(ParallelContext): """At the outermost opportunity, this parallel context delegates all work to a multiprocessing.Pool. Subprocesses may also make pools if the outer pool is more than half idle.
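The `split()` arithmetic in `MPIParallelContext` looks for a group count that divides the CPU count evenly, bumping `group_count` upward until it does (which is why the docstring recommends a CPU count divisible by the task count). The same arithmetic in isolation, with hypothetical job/CPU counts:

```python
def split_groups(jobs, size):
    """Return (group_count, cpus_per_group) where group_count divides size."""
    assert jobs > 0
    group_count = min(jobs, size)
    while size % group_count:   # grow until it divides size evenly
        group_count += 1
    return group_count, size // group_count

print(split_groups(3, 8))   # (4, 2): 3 jobs on 8 CPUs rounds up to 4 groups
print(split_groups(2, 8))   # (2, 4): an even split, 4 CPUs per group
```

So with 3 jobs on 8 CPUs one group goes idle per round rather than leaving CPUs unevenly distributed between groups; this is the cost the `inefficiency_forgiven` flag is about.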
Ideally the pool would be reused for later tasks, but cogent code mostly uses map() with functions defined in local scopes, which are unpicklable, so that is hacked around and pools are only ever used for one map() call""" def __init__(self, size=None): if size is None: size = multiprocessing.cpu_count() self.size = size def getCommunicator(self): return FAKE_MPI_COMM def _subContext(self, size): if size == 1: return NONE elif size == self.size: return self else: return type(self)(size) def split(self, jobs): assert jobs > 0 group_count = min(self.size, jobs) remaining = self.size // group_count next = self._subContext(group_count) sub = self._subContext(remaining) return (next, sub) def _initWorkerProcess(self): from cogent.util import progress_display progress_display.CURRENT.context = progress_display.NULL_CONTEXT def imap(self, f, s, chunksize=1): key = id(f) _FUNCTIONS[key] = f f = PicklableAndCallable(id(f)) pool = multiprocessing.Pool(self.size, self._initWorkerProcess) for result in pool.imap(f, s, chunksize=chunksize): yield result del _FUNCTIONS[key] pool.close() class ContextStack(threading.local): """This singleton object holds the current and enclosing parallel contexts.""" def __init__(self): # Because this is a thread.local, any secondary threads will see this # default and so not attempt to use MPI/multiprocessing: self.stack = [] self.top = NONE def setInitial(self, context): """The real initialiser. Should be called once from the main thread.""" assert self.stack == [] and self.top is NONE self.top = context @contextmanager def pushed(self, context): """Temporarily enter a pre-existing parallel context""" self.stack.append(self.top) try: self.top = context yield finally: self.top = self.stack.pop() @contextmanager def split(self, jobs=None): """Divide the available CPUs up into groups to handle 'jobs' independent tasks. 
If jobs << CPUs so that there are multiple CPUS per job, leave that reduced number of CPUs available to any nested parallelism opportunities within each job""" if jobs is None: jobs = self.top.size (next, sub) = self.top.split(jobs) with self.pushed(sub): yield next def imap(self, f, s, chunksize=1): """Like itertools.imap(f,s) only parallel.""" chunks = (len(s)-1) // chunksize + 1 with self.split(chunks) as next: for element in next.imap(f, s, chunksize=chunksize): yield element def map(self, f, s): return list(self.imap(f, s)) def getCommunicator(self): """For code needing an MPI communicator interface. If not using MPI this will be a dummy communicator of 1 CPU.""" return self.top.getCommunicator() def getContext(self): return self.top CONTEXT = ContextStack() if os.environ.get('COGENT_CPUS', False): try: cpus = int(os.environ['COGENT_CPUS']) except ValueError: cpus = None else: assert cpus > 0 else: cpus = 1 if USING_MPI: CONTEXT.setInitial(MPIParallelContext()) elif cpus > 1: CONTEXT.setInitial(MultiprocessingParallelContext(cpus)) getContext = CONTEXT.getContext getCommunicator = CONTEXT.getCommunicator parallel_context = CONTEXT.pushed split = CONTEXT.split imap = CONTEXT.imap map = CONTEXT.map def use_multiprocessing(cpus=None): CONTEXT.setInitial(MultiprocessingParallelContext(cpus)) def sync_random(r): # Only matters with MPI comm = getCommunicator() state = comm.bcast(r.getstate(), 0) r.setstate(state) PyCogent-1.5.3/cogent/util/progress_display.py000644 000765 000024 00000030754 12024702176 022451 0ustar00jrideoutstaff000000 000000 # -*- coding: utf-8 -*- """The hook for new user interfaces to take control of progress bar and status display is pass setupRootUiContext an instance of their UI context class, with the same methods as the _Context class defined here. Long-running functions can be decorated with @display_wrap, and will then be given the extra argument 'ui'. 
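`ContextStack.pushed()` above is a small stack-based context switcher built from `contextlib.contextmanager`: the previous top is saved, the new context installed, and the old one restored even if the body raises. The same pattern standalone (string placeholders stand in for real parallel contexts):

```python
import threading
from contextlib import contextmanager

class ContextStack(threading.local):
    """Per-thread stack holding the current 'top' context."""
    def __init__(self):
        self.stack = []
        self.top = 'NONE'

    @contextmanager
    def pushed(self, context):
        self.stack.append(self.top)
        try:
            self.top = context
            yield
        finally:
            self.top = self.stack.pop()   # restored even if the body raises

ctx = ContextStack()
with ctx.pushed('inner'):
    print(ctx.top)   # inner
print(ctx.top)       # NONE
```

Subclassing `threading.local` is what makes secondary threads see the default (`'NONE'`) rather than the main thread's context, which is exactly the property the original class relies on to keep MPI/multiprocessing off worker threads.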
'ui' is a ProgressContext instance with methods .series(), .imap(), .map() and .display(), any one of which will cause a progress-bar to be displayed. @display_wrap def long_running_function(..., ui) ui.display(msg, progress) # progress is between 0.0 and 1.0 or for item in ui.map(items, function) or for item in ui.imap(items, function) or for item in ui.series(items) """ from __future__ import with_statement, division import sys, time, contextlib, functools, warnings import os, atexit import threading import itertools from cogent.util import parallel, terminal __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" try: curses_terminal = terminal.CursesOutput() except terminal.TerminalUnavailableError: curses_terminal = None else: CODES = curses_terminal.getCodes() if not (CODES['BOL'] and CODES['UP'] and CODES['CLEAR_EOL']): curses_terminal = None # Terminal too primitive else: BOL = CODES['BOL'] CLEAR = CODES['UP'] + BOL + CODES['CLEAR_EOL'] if CODES['GREEN']: bar_template = CODES['GREEN'] + '%s' + CODES['NORMAL'] + '%s' def terminal_progress_bar(dots, width): return bar_template % ('█' * dots, '█' * (width-dots)) else: def terminal_progress_bar(dots, width): return '.' * dots class TextBuffer(object): """A file-like object which accumulates written text. Specialised for output to a curses terminal in that it uses CLEAR and re-writing to extend incomplete lines instead of just outputting or buffering them. 
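The bar rendering in `terminal_progress_bar` above just scales a progress fraction to a character count; when colour codes are unavailable it degrades to plain dots. A sketch of the scaling (the `fill`/`.` characters here are illustrative, not the module's actual glyphs):

```python
def progress_bar(progress, width=20, fill='#'):
    """Render progress in [0.0, 1.0] as a fixed-width ASCII bar."""
    assert 0.0 <= progress <= 1.0
    dots = int(progress * width)          # completed portion, rounded down
    return fill * dots + '.' * (width - dots)

print(progress_bar(0.5, 10))   # '#####.....'
print(progress_bar(1.0, 10))   # '##########'
```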
That allows the output to always end at a newline, ready for a progress bar to be shown, without postponing output of any incomplete last line.""" def __init__(self): self.chunks = [] self.pending_eol = False def write(self, text): self.chunks.append(text) # multiprocessing calls these def flush(self): pass def isatty(self): return False def regurgitate(self, out): if self.chunks: text = ''.join(self.chunks) if self.pending_eol: out.write(CLEAR) #out.write(CODES['YELLOW']) out.write(text) if text.endswith('\n'): self.pending_eol = False self.chunks = [] else: self.pending_eol = True self.chunks = [text.split('\n')[-1]] out.write('\n') #out.write(CODES['NORMAL']) class ProgressContext(object): """The interface by which cogent algorithms can report progress to the user interface. Calls self.progress_bar.set(progress, message)""" def __init__(self, progress_bar=None, prefix=None, base=0.0, segment=1.0, parent=None, rate=1.0): self.progress_bar = progress_bar self.desc = '' self.base = base self.segment = segment self.progress = 0 self.current = 1 if parent is None: self.depth = 0 self.parent = None self.t_last = 0 else: assert progress_bar is parent.progress_bar self.depth = parent.depth + 1 self.parent = parent self.t_last = parent.t_last self.msg = '' self.prefix = prefix or [] self.message = self.prefix + [self.msg] self._max_text_len = 0 self.max_depth = 2 self.rate = rate def subcontext(self): """For any sub-task which may want to report its own progress, but should not get its own progress bar.""" if self.depth == self.max_depth: return NullContext() return ProgressContext( progress_bar = self.progress_bar, prefix = self.message, base = self.base+self.progress*self.segment, segment = self.current*self.segment, parent = self, rate = self.rate) def display(self, msg=None, progress=None, current=0.0): """Inform the UI that we are at 'progress' of the way through and will be doing 'msg' until we reach and report at progress+current.
""" if self.depth > 0: msg = None updated = False if progress is not None: self.progress = min(progress, 1.0) updated = True if current is not None: self.current = current updated = True if msg is not None and msg != self.msg: self.msg = self.message[-1] = msg updated = True if updated and ( (self.depth==0 and self.progress in [0.0, 1.0]) or time.time() > self.t_last + self.rate): self.render() def render(self): self.progress_bar.set(self.base+self.progress*self.segment, self.message[0]) self.t_last = time.time() def done(self): if self.depth == 0: self.progress_bar.done() # Not much point while cogent is still full of print statements, but # .info() (and maybe other logging analogues such as .warning()) would # avoid the need to capture stdout: #def info(self, text): # """Display some information which may be more than fleetingly useful, # such as a summary of intermediate statistics or a very mild warning. # A GUI should make this information retrievable but not intrusive. # For terminal UIs this is equivalent to printing""" # raise NotImplementedError def series(self, items, noun='', labels=None, start=None, end=1.0, count=None): """Wrap a looped-over list with a progress bar""" if count is None: if not hasattr(items, '__len__'): items = list(items) count = len(items) if start is None: start = 0.0 step = (end-start) / count if labels: assert len(labels) == count elif count == 1: labels = [''] else: if noun: noun += ' ' template = '%s%%%sd/%s' % (noun, len(str(count)), count) labels = [template % i for i in range(0, count)] for (i, item) in enumerate(items): self.display(msg=labels[i], progress=start+step*i, current=step) yield item self.display(progress=end, current=0) def imap(self, f, s, pure=True, **kw): """Like itertools.imap() but with a progress bar""" results = (parallel if pure else itertools).imap(f, s) for result in self.series(results, count=len(s), **kw): yield result def eager_map(self, f, s, **kw): """Like regular Python2 map() but with a progress 
bar""" return list(self.imap(f,s, **kw)) def map(self, f, s, **kw): """Synonym for eager_map, unlike in Python3""" return self.eager_map(f, s, **kw) class NullContext(ProgressContext): """A UI context which discards all output. Useful on secondary MPI cpus, and other situations where all output is suppressed""" def subcontext(self, *args, **kw): return self def display(self, *args, **kw): pass def done(self): pass class LogFileOutput(object): """A fake progress bar for when progress bars are impossible""" def __init__(self): self.t0 = time.time() self.lpad = '' self.output = sys.stdout # sys.stderr def done(self): pass def set(self, progress, message): if message: delta = '+%s' % int(time.time() - self.t0) progress = int(100*progress+0.5) print >>self.output, "%s %5s %3i%% %s" % ( self.lpad, delta, progress, str(message.encode('utf8'))) class CursesTerminalProgressBar(object): """Wraps stdout and stderr, displaying a progress bar via simple ascii/curses art and scheduling other output around its redraws.""" def __init__(self): global curses_terminal assert curses_terminal is not None self.curses_terminal = curses_terminal self.stdout = sys.stdout self.stderr = sys.stderr self.stdout_log = TextBuffer() self.stderr_log = TextBuffer() self.lines = [] self.chunks = [] self.pending_eol = False self.line_count = 0 (sys.stdout, sys.stderr, self._stdout, self._stderr) = ( self.stdout_log, self.stderr_log, sys.stdout, sys.stderr) def done(self): self.set(None, None) (sys.stdout, sys.stderr) = (self._stdout, self._stderr) def set(self, progress, message): """Clear the existing progress bar, write out any accumulated stdout and stderr, then draw the updated progress bar.""" cols = self.curses_terminal.getColumns() width = cols - 1 if progress is not None: assert 0.0 <= progress <= 1.0, progress dots = int(progress * width) bar = terminal_progress_bar(dots, width) if self.line_count: self.stderr.write(CLEAR * (self.line_count)) else: self.stderr.write(BOL) 
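`ProgressContext.series()` above builds right-aligned "noun i/count" labels using a width computed from the largest index, so the labels line up as the bar advances. The template logic in isolation (the `'seq'` noun is a hypothetical example):

```python
def series_labels(count, noun=''):
    """Build the per-item labels used when no explicit labels are given."""
    if noun:
        noun += ' '
    # pad the index to the width of the largest value, e.g. ' 7/10'
    template = '%s%%%sd/%s' % (noun, len(str(count)), count)
    return [template % i for i in range(count)]

print(series_labels(10, 'seq')[7])   # 'seq  7/10'
```

Note the indices are zero-based, matching the original `range(0, count)`; the final `display(progress=end)` call after the loop is what reports completion.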
self.stdout_log.regurgitate(self.stdout) self.stderr_log.regurgitate(self.stderr) if progress is not None: self.stderr.writelines([bar, '\n']) if message is not None: self.stderr.writelines([str(message[:width].encode('utf8')), u'\n']) self.line_count = (progress is not None) + (message is not None) NULL_CONTEXT = NullContext() CURRENT = threading.local() CURRENT.context = None class RootProgressContext(object): """The context between long running jobs, when there is no progress bar""" def __init__(self, pbar_constructor, rate): self.pbar_constructor = pbar_constructor self.rate = rate def subcontext(self): pbar = self.pbar_constructor() return ProgressContext(pbar, rate=self.rate) def setupRootUiContext(progressBarConstructor=None, rate=None): """Select a UI Context type depending on system environment""" if parallel.getCommunicator().Get_rank() != 0: klass = None elif progressBarConstructor is not None: klass = progressBarConstructor elif curses_terminal and sys.stdout.isatty(): klass = CursesTerminalProgressBar elif isinstance(sys.stdout, file): klass = LogFileOutput if rate is None: rate = 5.0 else: klass = None if klass is None: CURRENT.context = NULL_CONTEXT else: if rate is None: rate = 0.1 CURRENT.context = RootProgressContext(klass, rate) def display_wrap(slow_function): """Decorator which gives the function its own UI context. The function will receive an extra argument, 'ui', which is used to report progress etc.""" @functools.wraps(slow_function) def f(*args, **kw): if getattr(CURRENT, 'context', None) is None: setupRootUiContext() parent = CURRENT.context show_progress = kw.pop('show_progress', None) if show_progress is False: # PendingDeprecationWarning?
subcontext = NULL_CONTEXT else: subcontext = parent.subcontext() kw['ui'] = CURRENT.context = subcontext try: result = slow_function(*args, **kw) finally: CURRENT.context = parent subcontext.done() return result return f @display_wrap def subdemo(ui): for j in ui.series(range(10)): time.sleep(0.1) return @display_wrap def demo(ui): print "non-linebuffered output, tricky but look:", for i in ui.series(range(10)): time.sleep(.6) if i == 5: print '\nhalfway through, a new line: ', if i % 2: subdemo() print i, ".", print "done" if __name__ == '__main__': #setupRootUiContext(rate=0.2) demo() # This messes up interactive shells a bit: #CURRENT.start() #atexit.register(CURRENT.done) PyCogent-1.5.3/cogent/util/recode_alignment.py000755 000765 000024 00000034776 12024702176 022372 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # Author: Greg Caporaso (gregcaporaso@gmail.com) # recode_alignment.py """This file contains functions for recoding alignment objects with reduced-state alphabets, and also defines some reduced-state alphabets. Example alphabet definitions: charge_his_3: Three states for +/-/no charge, with histidine counted as a charged residue. size_2: Two states for large/small. Splits were manually derived by ordering the Mr of the residues and finding a natural split in the middle of the data set (residues with Mr <= 133 -> small; Mr >= 146 -> large). Ordered Mr: [75,89,105,115,117,119,121,131,131,132,133,146,146,147, 149,155,165,174,181,204] Differences between neighboring Mr values: [14,16,10,2,2,2,10,0,1,1,13, 0,1,2,6,10,9,7,23] The difference between 133 and 146 (or D and K/Q) seems like the most natural split. Alphabet naming convention: standard alphabets (those defined in this file) are named based on an identifier (e.g., charge_his) followed by an underscore, followed by the number of states in the reduced alphabet (e.g., 3). 
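The `size_2` split described in the docstring above can be reproduced by differencing the ordered residue masses; both lists of numbers here are the ones quoted in the docstring, so the computation just confirms the stated reasoning:

```python
mr = [75, 89, 105, 115, 117, 119, 121, 131, 131, 132, 133,
      146, 146, 147, 149, 155, 165, 174, 181, 204]

# differences between neighbouring Mr values
diffs = [b - a for a, b in zip(mr, mr[1:])]
print(diffs)
# [14, 16, 10, 2, 2, 2, 10, 0, 1, 1, 13, 0, 1, 2, 6, 10, 9, 7, 23]

# the 13-unit gap between 133 and 146 sits near the middle of the list,
# giving the small (Mr <= 133) / large (Mr >= 146) two-state split
print(mr[10], mr[11], diffs[10])   # 133 146 13
```

Larger gaps exist at the extremes (e.g. 181 to 204), but only the 133/146 gap divides the residues into two roughly equal-sized states, which is why it was chosen as the split point.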
Alphabet definition convention: When defining reduced alphabets, a reduced state should be represented by the first char listed in that state. This allows for alignments to not have to support new characters (by changing the MolType of the alignment) and it allows for easier interpretation of the alignment states. Many of the alphabets presented here are discussed in: Detecting Coevolution by Disregarding Evolution? Tree-Ignorant Metrics of Coevolution Perform As Well As Tree-Aware Metrics; J. Gregory Caporaso, Sandra Smit, Brett C. Easton, Lawrence Hunter, Gavin A. Huttley, and Rob Knight. BMC Evolutionary Biology, 2008. """ from __future__ import division from optparse import OptionParser from numpy import take, array, zeros from cogent.core.alignment import DenseAlignment, Alignment from cogent.evolve.models import DSO78_matrix, DSO78_freqs from cogent import PROTEIN __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Beta" class RecodeError(Exception): """ A generic error to be raised when errors occur in recoding """ pass alphabets = {\ # Note: 3-STATE CHARGE ALPHAS ASSIGN B/Z TO UNCHARGED -- I FIGURE THAT'S # SAFER THAN ASSIGNING THEM TO EITHER +/-, BUT MAYBE THAT'S NOT ACCURATE. # AN ALTERNATIVE WOULD BE TO ASSIGN THEM TO WHATEVER THEIR MORE COMMON # VALUE WAS IN (e.g.)
ALL PROTEINS 'charge_2': [('K','KRDEBZ'), ('A','ACFGHILMNPQSTVWYX')],\ 'charge_3': [('K','KR'), ('D','DE'), ('A','ACFGHILMNPQSTVWYBXZ')],\ 'charge_his_2': [('K','KRDEHBZ'), ('A','ACFGILMNPQSTVWYX')],\ 'charge_his_3': [('K','KRH'), ('D','DE'), ('A','ACFGILMNPQSTVWYBXZ')],\ 'size_2': [('G','GAVLISPTCNDXB'),('M','MFYWQKHREZ')],\ 'hydropathy_3':[('R','RKDENQHBZ'),('Y','YWSTG'), ('P','PAMCFLVIX')],\ # B/Z are assigned to the acidic residues, since D/E are more common -- # need to figure out if this is the best way to handle them 'polarity_his_4':[('D','DEBZ'),('R','RHK'),\ ('A','AILMFPWV'),('G','GSTCYNQ')], # This is a modified A1_4 alphabet to capture natural breaks in the metric 'a1_4m':[('C','CVILF'),('M','MWA'),('T','GSTPYH'),('N','QNDERK')], 'a1_2':[('C','CVILFMWAGS'),('T','TPYHQNDERK')], 'a1_3':[('C','CVILFMW'),('A','AGSTPY'),('H','HQNDERK')], 'a1_4':[('C','CVILF'),('M','MWAGS'),('T','TPYHQ'),('N','NDERK')], 'a1_5':[('C','CVIL'),('F','FMWA'),('G','GSTP'),('Y','YHQN'),('D','DERK')], 'a1_6':[('C','CVI'),('L','LFMW'),('A','AGS'),('T','TPY'),('H','HQND'),\ ('E','ERK')], 'a1_7':[('C','CVI'),('L','LFM'),('W','WAG'),('S','ST'),('P','PYH'),\ ('Q','QND'),('E','ERK')], 'a1_8':[('C','CVI'),('L','LF'),('M','MWA'),('G','GS'),('T','TPY'),\ ('H','HQ'),('N','NDE'),('R','RK')], 'a1_9':[('C','CV'),('I','IL'),('F','FMW'),('A','AG'),\ ('S','ST'),('P','PY'),\ ('H','HQN'),('D','DE'),('R','RK')], 'a1_10':[('C','CV'),('I','IL'),('F','FM'),\ ('W','WA'),('G','GS'),('T','TP'),\ ('Y','YH'),('Q','QN'),('D','DE'),('R','RK')], 'a2_2':[('M','MEALFKIHVQ'),('R','RWDTCNYSGP')], 'a2_3':[('M','MEALFKI'),('H','HVQRWD'),('T','TCNYSGP')], 'a2_4':[('M','MEALF'),('K','KIHVQ'),('R','RWDTC'),('N','NYSGP')], 'a2_5':[('M','MEAL'),('F','FKIH'),('V','VQRW'),('D','DTCN'),('Y','YSGP')], 'a2_6':[('M','MEA'),('L','LFKI'),('H','HVQ'),('R','RWD'),('T','TCNY'),\ ('S','SGP')], 'a2_7':[('M','MEA'),('L','LFK'),('I','IHV'),('Q','QR'),('W','WDT'),\ ('C','CNY'),('S','SGP')], 
    'a2_8':[('M','MEA'),('L','LF'),('K','KIH'),('V','VQ'),('R','RWD'),\
        ('T','TC'),('N','NYS'),('G','GP')],
    'a2_9':[('M','ME'),('A','AL'),('F','FKI'),('H','HV'),('Q','QR'),\
        ('W','WD'),('T','TCN'),('Y','YS'),('G','GP')],
    'a2_10':[('M','ME'),('A','AL'),('F','FK'),('I','IH'),('V','VQ'),\
        ('R','RW'),('D','DT'),('C','CN'),('Y','YS'),('G','GP')],
    'a3_2':[('S','SDQHPLCAVK'),('W','WNGERFITMY')],
    'a3_3':[('S','SDQHPLC'),('A','AVKWNG'),('E','ERFITMY')],
    'a3_4':[('S','SDQHP'),('L','LCAVK'),('W','WNGER'),('F','FITMY')],
    'a3_5':[('S','SDQH'),('P','PLCA'),('V','VKWN'),('G','GERF'),('I','ITMY')],
    'a3_6':[('S','SDQ'),('H','HPLC'),('A','AVK'),('W','WNG'),('E','ERFI'),\
        ('T','TMY')],
    'a3_7':[('S','SDQ'),('H','HPL'),('C','CAV'),('K','KW'),('N','NGE'),\
        ('R','RFI'),('T','TMY')],
    'a3_8':[('S','SDQ'),('H','HP'),('L','LCA'),('V','VK'),('W','WNG'),\
        ('E','ER'),('F','FIT'),('M','MY')],
    'a3_9':[('S','SD'),('Q','QH'),('P','PLC'),('A','AV'),('K','KW'),\
        ('N','NG'),('E','ERF'),('I','IT'),('M','MY')],
    'a3_10':[('S','SD'),('Q','QH'),('P','PL'),('C','CA'),('V','VK'),\
        ('W','WN'),('G','GE'),('R','RF'),('I','IT'),('M','MY')],
    'a4_2':[('W','WHCMYQFKDN'),('E','EIPRSTGVLA')],
    'a4_3':[('W','WHCMYQF'),('K','KDNEIP'),('R','RSTGVLA')],
    'a4_4':[('W','WHCMY'),('Q','QFKDN'),('E','EIPRS'),('T','TGVLA')],
    'a4_5':[('W','WHCM'),('Y','YQFK'),('D','DNEI'),('P','PRST'),('G','GVLA')],
    'a4_6':[('W','WHC'),('M','MYQF'),('K','KDN'),('E','EIP'),('R','RSTG'),\
        ('V','VLA')],
    'a4_7':[('W','WHC'),('M','MYQ'),('F','FKD'),('N','NE'),('I','IPR'),\
        ('S','STG'),('V','VLA')],
    'a4_8':[('W','WHC'),('M','MY'),('Q','QFK'),('D','DN'),('E','EIP'),\
        ('R','RS'),('T','TGV'),('L','LA')],
    'a4_9':[('W','WH'),('C','CM'),('Y','YQF'),('K','KD'),('N','NE'),\
        ('I','IP'),('R','RST'),('G','GV'),('L','LA')],
    'a4_10':[('W','WH'),('C','CM'),('Y','YQ'),('F','FK'),('D','DN'),\
        ('E','EI'),('P','PR'),('S','ST'),('G','GV'),('L','LA')],
    'a5_2':[('D','DSQPVLECWA'),('H','HFINMTYKGR')],
    'a5_3':[('D','DSQPVLE'),('C','CWAHFI'),('N','NMTYKGR')],
    'a5_4':[('D','DSQPV'),('L','LECWA'),('H','HFINM'),('T','TYKGR')],
    'a5_5':[('D','DSQP'),('V','VLEC'),('W','WAHF'),('I','INMT'),('Y','YKGR')],
    'a5_6':[('D','DSQ'),('P','PVLE'),('C','CWA'),('H','HFI'),('N','NMTY'),\
        ('K','KGR')],
    'a5_7':[('D','DSQ'),('P','PVL'),('E','ECW'),('A','AH'),('F','FIN'),\
        ('M','MTY'),('K','KGR')],
    'a5_8':[('D','DSQ'),('P','PV'),('L','LEC'),('W','WA'),('H','HFI'),\
        ('N','NM'),('T','TYK'),('G','GR')],
    'a5_9':[('D','DS'),('Q','QP'),('V','VLE'),('C','CW'),('A','AH'),\
        ('F','FI'),('N','NMT'),('Y','YK'),('G','GR')],
    'a5_10':[('D','DS'),('Q','QP'),('V','VL'),('E','EC'),('W','WA'),\
        ('H','HF'),('I','IN'),('M','MT'),('Y','YK'),('G','GR')],
    # orig does no recoding, but is provided for convenience so if you want
    # to iterate over all reduced alphabets and the full alphabet, you can
    # do that without having to specify the original alphabet differently.
    'orig':zip('ACDEFGHIKLMNPQRSTVWY','ACDEFGHIKLMNPQRSTVWY')}

def build_alphabet_map(alphabet_id=None,alphabet_def=None):
    """ return dict mapping old alphabet chars to new alphabet chars

        alphabet_id: string identifying an alphabet in
         cogent.util.recode_alignment.alphabets
         (See cogent.util.recode_alignment.alphabets.keys() for valid
         alphabet_ids.)
        alphabet_def: list of two-element tuples where the first element is
         the new alphabet character and the second element is an iterable
         object containing the old alphabet chars which should be mapped to
         the new char.
         e.g., [('A','CVILFMWAGSTPYH'),('B','QNDERKBZ')]
         (See cogent.util.recode_alignment.alphabets.values() for more
         examples.)

        NOTE: Only one of the two parameters should be provided -- you
         either provide the alphabet, or it is looked up. If you do provide
         both, the alphabet_id is ignored.

    """
    try:
        alphabet_def = alphabet_def or alphabets[alphabet_id]
    except KeyError:
        if not alphabet_id:
            raise ValueError,\
                "Must provide an alphabet_id or alphabet definition."
        raise ValueError, "Invalid alphabet id."
    result = {}
    for new, old in alphabet_def:
        for old_c in old:
            result[old_c] = new
    return result

def recode_dense_alignment(aln,alphabet_id=None,alphabet_def=None):
    """Return new DenseAlignment recoded in the provided reduced-state alphabet

        aln: the DenseAlignment object to be recoded
        alphabet_id: string identifying an alphabet in
         cogent.util.recode_alignment.alphabets
         (See cogent.util.recode_alignment.alphabets.keys() for valid
         alphabet_ids.)
        alphabet_def: list of two-element tuples where the first element is
         the new alphabet character and the second element is an iterable
         object containing the old alphabet chars which should be mapped to
         the new char.
         e.g., [('A','CVILFMWAGSTPYH'),('B','QNDERKBZ')]
         (See cogent.util.recode_alignment.alphabets.values() for more
         examples.)

        Note: either alphabet_id OR alphabet_def must be passed. Either
         provide the alphabet, or have it looked up. If both are provided,
         the alphabet_id is ignored.
    """
    # Construct a dict mapping from UInt8s in alignment to their
    # associated characters. This dict is then used for looking
    # up chars in the new and old alphabets.
    byte_map = dict(zip(aln.Alphabet,range(len(aln.Alphabet))))

    # Construct a dict mapping old characters to new characters.
    alphabet_map = build_alphabet_map(alphabet_id=alphabet_id,\
        alphabet_def=alphabet_def)

    # Create the recoded version of seqs.Alphabet
    new_indices = range(len(aln.Alphabet))
    for old, new in alphabet_map.items():
        new_indices[byte_map[old]] = byte_map[new]

    # Map the old alphabet onto the new alphabet. Note: characters that
    # are not mapped are ignored. Returns a new DenseAlignment.
    return DenseAlignment(take(new_indices,aln.ArraySeqs).transpose(),\
        aln.Names[:],MolType=aln.MolType)

def recode_alignment(aln,alphabet_id=None,alphabet_def=None):
    raise NotImplementedError

def recode_freq_vector(alphabet_def,freqs,ignores='BXZ'):
    """ recode the bg_freqs to reflect the recoding defined in alphabet_def

        alphabet_def: list of tuples where new char is first tuple element
         and sequence of old chars is second tuple element. (For examples,
         see cogent.util.recode_alignment.alphabets.values())
        freqs: dict mapping chars to their frequencies
        ignores: the degenerate characters -- we don't want to include these
         in the new freqs, b/c they'll be counted multiple times. Also, it
         isn't clear what should be done if an alphabet were to split them
         apart.

        Note: there is no error-checking here, so you need to be sure that
         the alphabet and the frequencies are compatible (i.e., freqs and
         the old characters must overlap perfectly, with the exception of
         the degenerate characters, which are ignored by default).
    """
    result = {}
    for new,olds in alphabet_def:
        for old in olds:
            if old in ignores:
                continue
            try:
                result[new] += freqs[old]
            except KeyError:
                result[new] = freqs[old]
    return result

## The following code is for recoding substitution matrices

def square_matrix_to_dict(matrix,key_order='ACDEFGHIKLMNPQRSTVWY'):
    result = {}
    for c,row in zip(key_order,matrix):
        result[c] = dict(zip(key_order,row))
    return result

def recode_count_matrix(alphabet,count_matrix,aa_order):
    """Recodes a substitution count matrix

        alphabet: the alphabet to be used for recoding the matrix (see
         cogent.util.recode_alignment.alphabets.values() for examples)
        count_matrix: matrix to be recoded
         (e.g., cogent.evolve.models.DSO78_matrix)
        aa_order: the order of the rows/cols in the matrix as a string (for
         cogent.evolve.models.DSO78_matrix this would be
         'ACDEFGHIKLMNPQRSTVWY')
    """
    m = square_matrix_to_dict(count_matrix,aa_order)
    result = zeros(len(aa_order)**2).reshape(len(aa_order),len(aa_order))
    result = square_matrix_to_dict(result,aa_order)

    for row_new,row_olds in alphabet:
        for col_new,col_olds in alphabet:
            if row_new not in col_olds:
                new_count = 0.
                for row_old in row_olds:
                    for col_old in col_olds:
                        try:
                            new_count += m[row_old][col_old]
                        except KeyError:
                            # hit a char that's not in the sub matrix --
                            # probably an ambiguous residue (i.e., B, X, or Z)
                            pass
                result[row_new][col_new] = new_count

    cm = []
    for row_c in aa_order:
        r = []
        for col_c in aa_order:
            r.append(result[row_c][col_c])
        cm.append(r)
    return array(cm)

def recode_counts_and_freqs(alphabet,count_matrix=DSO78_matrix,\
    freqs=DSO78_freqs,aa_order='ACDEFGHIKLMNPQRSTVWY'):
    """ recode a substitution count matrix and a vector of character freqs
    """
    recoded_freqs = recode_freq_vector(alphabet,freqs)
    for aa in aa_order:
        if aa not in recoded_freqs:
            recoded_freqs[aa] = 0.0
    recoded_counts = recode_count_matrix(alphabet,count_matrix,aa_order)
    return recoded_counts,recoded_freqs
PyCogent-1.5.3/cogent/util/table.py000644 000765 000024 00000110207 12024702176 020137 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""
A light-weight Table class for manipulating 2D data and representing it as
text, or writing to file for import into other packages.

Current output formats include pickle (Python's serialisation format),
restructured text (keyed by 'rest'), latex, html, delimited columns, and a
simple text format.

Table can read pickled and delimited formats.
""" from __future__ import division import cPickle, csv from gzip import GzipFile import numpy from cogent.format import table as table_format, bedgraph from cogent.util.dict_array import DictArray __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Felix Schill"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" # making reversed characters for use in reverse order sorting _all_chrs = [chr(i) for i in range(256)] _all_chrs.reverse() _reversed_chrs = ''.join(_all_chrs) def convert2DDict(twoDdict, header = None, row_order = None): """Returns a 2 dimensional list. Arguments: - twoDdict: a 2 dimensional dict with top level keys corresponding to column headings, lower level keys correspond to row headings but are not preserved. - header: series with column headings. If not provided, the sorted top level dict keys are used. - row_order: a specified order to generate the rows. 
""" if not header: header = twoDdict.keys() header.sort() if not row_order: # we assume rows consistent across dict row_order = twoDdict[header[0]].keys() row_order.sort() # make twoD list table = [] for row in row_order: string_row = [] for column in header: string_row.append(twoDdict[column][row]) table.append(string_row) return table class _Header(list): """a convenience class for storing the Header""" def __new__(cls, arg): n = list.__new__(cls, list(arg)) return n def __setslice__(self, *args): """disallowed""" raise RuntimeError("Table Header is immutable, use withNewHeader") def __setitem__(self, *args): """disallowed""" raise RuntimeError("Table Header is immutable, use withNewHeader") class Table(DictArray): def __init__(self, header = None, rows = None, row_order = None, digits = 4, space = 4, title = '', missing_data = '', max_width = 1e100, row_ids = False, legend = '', column_templates = None, dtype = None): """ Arguments: - header: column headings - rows: a 2D dict, list or tuple. If a dict, it must have column headings as top level keys, and common row labels as keys in each column. - row_order: the order in which rows will be pulled from the twoDdict - digits: floating point resolution - space: number of spaces between columns or a string - title: as implied - missing_data: character assigned if a row has no entry for a column - max_width: maximum column width for printing - row_ids: if True, the 0'th column is used as row identifiers and keys for slicing. - legend: table legend - column_templates: dict of column headings: string format templates or a function that will handle the formatting. - dtype: optional numpy array typecode. 
""" try: num_cols = len(header) assert num_cols > 0 if type(rows) == numpy.ndarray: assert num_cols == rows.shape[1] elif type(rows) == dict: assert num_cols == len(rows) else: assert num_cols == len(rows[0]) except (IndexError, TypeError, AssertionError): raise RuntimeError("header and rows must be provided to Table") header = [str(head) for head in header] if isinstance(rows, dict): rows = convert2DDict(rows, header = header, row_order = row_order) # if row_ids, we select that column as the row identifiers if row_ids: identifiers = [row[0] for row in rows] else: identifiers = len(rows) if not dtype: dtype = "O" DictArray.__init__(self, rows, identifiers, header, dtype = dtype) # forcing all column headings to be strings self._header = _Header([str(head) for head in header]) self._missing_data = missing_data self.Title = str(title) self.Legend = str(legend) try: self.Space = ' ' * space except TypeError: self.Space = space self._digits = digits self._row_ids = row_ids self._max_width = max_width # some attributes are not preserved in any file format, so always based # on args self._column_templates = column_templates or {} def __repr__(self): row = [] try: for val in self.array[0][:3]: if isinstance(val, float): row.append('%.4f' % val) elif isinstance(val, int): row.append('%d' % val) else: row.append('%r' % val) except IndexError: # an empty table pass row_trunc = ', '.join(row) header_trunc = ', '.join(map(repr, self.Header[:3])) if self.Shape[1] > 3: header_trunc = '[%s,..]' % header_trunc row_trunc = '[%s,..]' % row_trunc else: header_trunc = '[%s]' % header_trunc row_trunc = '[%s]' % row_trunc if self.Shape[0] > 1: row_trunc = '[%s,..]' % row_trunc else: row_trunc = '[[%s]]' % row_trunc if self.Shape[0] == 0: row_trunc = str([]) result = 'Table(numrows=%s, numcols=%s, header=%s, rows=%s)' % \ (self.Shape[0], self.Shape[1], header_trunc, row_trunc) return result def __str__(self): return self.tostring() def __getitem__(self, names): (index, remaining) = 
self.template.interpretIndex(names) # if we have two integers, return a single value ints = [isinstance(idx, int) for idx in index] if len(ints) == 2 and min(ints): return self.array[index] new_index = list(index) for i, idx in enumerate(new_index): if isinstance(idx, int): new_index[i] = slice(idx, idx+1, None) index = tuple(new_index) rows = self.array[index] result = None if len(index) > 1: header = numpy.asarray(self.Header, dtype="O")[index[1:]] else: header = self.Header if remaining is not None: kwargs = self._get_persistent_attrs() result = self.__class__(header, rows, **kwargs) return result def __getstate__(self): data = self._get_persistent_attrs() del(data['column_templates']) data.update(dict(header = self.Header, rows = self.getRawData())) return data def __setstate__(self, data): limit_ids = data.pop('limit_ids', None) if limit_ids is not None: data['row_ids'] = limit_ids or False new = Table(**data) self.__dict__.update(new.__dict__) return self def _get_header(self): """returns Header value""" return self._header def _set_header(self, data): """disallowed""" raise RuntimeError("not allowed to set the Header") Header = property(_get_header, _set_header) def withNewHeader(self, old, new, **kwargs): """returns a new Table with old header labels replaced by new Arguments: - old: the old column header(s). Can be a string or series of them. - new: the new column header(s). Can be a string or series of them. 
""" if type(old) == str: old = [old] new = [new] assert len(old) == len(new), 'Mismatched number of old/new labels' indices = map(self.Header.index, old) new_header = list(self.Header) for i in range(len(old)): new_header[indices[i]] = new[i] kw = self._get_persistent_attrs() kw.update(kwargs) return Table(header = new_header, rows = self.getRawData(), **kw) def _get_persistent_attrs(self): kws = dict(row_ids = self._row_ids, title = self.Title, legend = self.Legend, digits = self._digits, space = self.Space, max_width = self._max_width, missing_data = self._missing_data, column_templates = self._column_templates or None) return kws def setColumnFormat(self, column_head, format_template): """Provide a formatting template for a named column. Arguments: - column_head: the column label. - format_template: string formatting template or a function that will handle the formatting. """ assert column_head in self.Header, \ "Unknown column heading %s" % column_head self._column_templates[column_head] = format_template def tostring(self, borders=True, sep=None, format='', **kwargs): """Return the table as a formatted string. Arguments: - format: possible formats are 'rest', 'latex', 'html', 'phylip', 'bedgraph', or simple text (default). - sep: A string separator for delineating columns, e.g. ',' or '\t'. Overrides format. NOTE: If format is bedgraph, assumes that column headers are chrom, start, end, value. In that order! 
""" if format.lower() == 'phylip': missing_data = "%.4f" % 0.0 else: missing_data = self._missing_data # convert self to a 2D list formatted_table = self.array.tolist() if format != 'bedgraph': header, formatted_table = table_format.formattedCells(formatted_table, self.Header, digits = self._digits, column_templates = self._column_templates, missing_data = missing_data) args = (header, formatted_table, self.Title, self.Legend) if sep and format != 'bedgraph': return table_format.separatorFormat(*args + (sep,)) elif format == 'rest': return table_format.gridTableFormat(*args) elif format.endswith('tex'): caption = None if self.Title or self.Legend: caption = " ".join([self.Title or "", self.Legend or ""]) return table_format.latex(formatted_table, header, caption = caption, **kwargs) elif format == 'html': rest = table_format.gridTableFormat(*args) return table_format.html(rest) elif format == 'phylip': # need to eliminate row identifiers formatted_table = [row[self._row_ids:] for row in formatted_table] header = header[self._row_ids:] return table_format.phylipMatrix(formatted_table, header) elif format == 'bedgraph': assert self.Shape[1] == 4, 'bedgraph format is for 4 column tables' # assuming that header order is chrom, start, end, val formatted_table = bedgraph.bedgraph(self.sorted().array.tolist(), **kwargs) return formatted_table else: return table_format.simpleFormat(*args + (self._max_width, self._row_ids, borders, self.Space)) def toRichHtmlTable(self, row_cell_func=None, header_cell_func=None, element_formatters={}, merge_identical=True, compact=True): """returns just the table html code. Arguments: - row_cell_func: callback function that formats the row values. Must take the row value and coordinates (row index, column index). - header_cell_func: callback function that formats the column headings must take the header label value and coordinate - element_formatters: a dictionary of specific callback funcs for formatting individual html table elements. 
e.g. {'table': lambda x: ''} - merge_identical: cells within a row are merged to one span. """ formatted_table = self.array.tolist() header, formatted_table = table_format.formattedCells(formatted_table, self.Header, digits = self._digits, column_templates = self._column_templates, missing_data = self._missing_data) # but we strip the cell spacing header = [v.strip() for v in header] rows = [[c.strip() for c in r] for r in formatted_table] return table_format.rich_html(rows, row_cell_func=row_cell_func, header=header, header_cell_func=header_cell_func, element_formatters=element_formatters, compact=compact) def writeToFile(self, filename, mode = 'w', writer = None, format = None, sep = None, compress=None, **kwargs): """Write table to filename in the specified format. If a format is not specified, it attempts to use a filename suffix. Note if a sep argument is provided, unformatted values are written to file in order to preserve numerical accuracy. Arguments: - mode: file opening mode - format: Valid formats are those of the tostring method plus pickle. - writer: a function for formatting the data for output. - sep: a character delimiter for fields. - compress: if True, gzips the file and appends .gz to the filename (if not already added). 
""" compress = compress or filename.endswith('.gz') if compress: if not filename.endswith('.gz'): filename = '%s.gz' % filename mode = ['wb', mode][mode == 'w'] outfile = GzipFile(filename, mode) else: outfile = file(filename, mode) if format is None: # try guessing from filename suffix if compress: index = -2 else: index = -1 suffix = filename.split('.') if len(suffix) > 1: format = suffix[index] if writer: rows = self.getRawData() rows.insert(0, self.Header[:]) rows = writer(rows, has_header=True) outfile.writelines("\n".join(rows)) elif format == 'pickle': data = self.__getstate__() cPickle.dump(data, outfile) elif sep is not None and format != 'bedgraph': writer = csv.writer(outfile, delimiter = sep) if self.Title: writer.writerow([self.Title]) writer.writerow(self.Header) writer.writerows(self.array) if self.Legend: writer.writerow([self.Legend]) else: table = self.tostring(format = format, **kwargs) outfile.writelines(table + '\n') outfile.close() def toBedgraph(self, chrom_col, start_col, end_col): """docstring for toBedgraph""" pass def appended(self, new_column, *tables, **kwargs): """Append an arbitrary number of tables to the end of this one. Returns a new table object. Optional keyword arguments to the new tables constructor may be passed. Arguments: - new_column: provide a heading for the new column, each tables title will be placed in it. If value is false, the result is no additional column.""" # convert series of tables if isinstance(tables[0], tuple) or isinstance(tables[0], list): tables = tuple(tables[0]) # for each table, determine it's number of rows and create an equivalent # length vector of its title if new_column: header = [new_column] + self.Header else: header = self.Header big_twoD = () table_series = (self,) + tables for table in table_series: # check compatible tables assert self.Header == table.Header, \ "Inconsistent tables -- column headings are not the same." 
new_twoD = [] for row in table: if new_column: new_twoD.append([table.Title] + row.asarray().tolist()) else: new_twoD.append(row.asarray().tolist()) new_twoD = tuple(new_twoD) big_twoD += new_twoD kw = self._get_persistent_attrs() kw.update(kwargs) return Table(header, big_twoD, **kw) def getRawData(self, columns = None): """Returns raw data as a 1D or 2D list of rows from columns. If one column, its a 1D list. Arguments: - columns: if None, all data are returned""" if columns is None: return self.array.tolist() if isinstance(columns, str): # assumes all column headings are strings. columns = (columns,) column_indices = map(self.Header.index, columns) result = self.array.take(column_indices, axis=1) if len(columns) == 1: result = result.flatten() return result.tolist() def _callback(self, callback, row, columns=None, num_columns=None): if callable(callback): row_segment = row.take(columns) if num_columns == 1: row_segment = row_segment[0] return callback(row_segment) else: return eval(callback, {}, row) def filtered(self, callback, columns=None, **kwargs): """Returns a sub-table of rows for which the provided callback function returns True when passed row data from columns. Row data is a 1D list if more than one column, raw row[col] value otherwise. Arguments: - columns: the columns whose values determine whether a row is to be included. 
- callback: Can be a function, which takes the sub-row delimited by columns and returns True/False, or a string representing valid python code to be evaluated.""" if isinstance(columns, str): columns = (columns,) if columns: num_columns = len(columns) else: num_columns = None row_indexes = [] if not callable(callback): data = self cols = columns else: data = self.array cols = map(self.Header.index, columns) for rdex, row in enumerate(data): if self._callback(callback, row, cols, num_columns): row_indexes.append(rdex) sub_set = numpy.take(self, row_indexes, 0) kw = self._get_persistent_attrs() kw.update(kwargs) return Table(header = self.Header, rows = sub_set, **kw) def filteredByColumn(self, callback, **kwargs): """Returns a table with columns identified by callback Arguments: - callback: A function which takes the columns delimited by columns and returns True/False, or a string representing valid python code to be evaluated.""" data = self.array.transpose() column_indices = [] append = column_indices.append for index, row in enumerate(data): if callback(row): append(index) columns = numpy.take(self.Header, column_indices) return self.getColumns(columns, **kwargs) def count(self, callback, columns=None, **kwargs): """Returns number of rows for which the provided callback function returns True when passed row data from columns. Row data is a 1D list if more than one column, raw row[col] value otherwise. Arguments: - columns: the columns whose values determine whether a row is to be included. 
- callback: Can be a function, which takes the sub-row delimited by columns and returns True/False, or a string representing valid python code to be evaluated.""" if isinstance(columns, str): columns = (columns,) if columns: num_columns = len(columns) else: num_columns = None count = 0 if not callable(callback): data = self cols = columns else: data = self.array cols = map(self.Header.index, columns) for row in data: if self._callback(callback, row, cols, num_columns): count += 1 return count def sorted(self, columns = None, reverse = None, **kwargs): """Returns a new table sorted according to columns order. Arguments: - columns: column headings, their order determines the sort order. - reverse: column headings, these columns will be reverse sorted. Either can be provided as just a single string, or a series of strings. """ if columns is None: columns = self.Header elif isinstance(columns, str): columns = [columns] indices = [self.Header.index(col) for col in columns] if not reverse: is_reversed = [False] * len(columns) reverse_indices = [] else: if type(reverse) == str: reverse = [reverse] reverse_indices = [] for index, header_index in enumerate(indices): col = self.Header[header_index] if col in reverse: reverse_indices += [index] is_reversed = [col in reverse for col in columns] reverse_indices = numpy.array(reverse_indices) dtypes = [(self.Header[i], self.array.dtype) for i in indices] # applying the decorate-sort-undecorate approach aux_list = self.array.take(indices, axis=1) # we figure out the casting funcs for any reversed elements cast = [] for index in reverse_indices: val = aux_list[0, index] try: val = val.translate(_reversed_chrs) func = lambda x: x.translate(_reversed_chrs) except AttributeError: func = lambda x: x * -1 func = numpy.vectorize(func) aux_list[:, index] = func(aux_list[:, index]) aux_list = numpy.rec.fromarrays(aux_list.copy().T, dtype=dtypes) indices = aux_list.argsort() new_twoD = self.array.take(indices, axis=0) kw = 
self._get_persistent_attrs() kw.update(kwargs) return Table(header = self.Header, rows = new_twoD, **kw) def getColumns(self, columns, **kwargs): """Return a slice of columns""" # check whether we have integer columns if isinstance(columns, str): columns = [columns] is_int = min([isinstance(val, int) for val in columns]) indexes = [] if is_int: indexes = columns else: indexes = [self.Header.index(head) for head in columns] if self._row_ids: # we disallow reordering of identifiers, and ensure they are only # presented once for val in range(self._row_ids): try: indexes.remove(val) except ValueError: pass indexes = range(self._row_ids) + indexes columns = numpy.take(numpy.asarray(self.Header, dtype="O"), indexes) new = numpy.take(self.array, indexes, axis=1) kw = self._get_persistent_attrs() kw.update(kwargs) return Table(header = columns, rows = new, **kw) def getDisjointRows(self, rows, **kwargs): """Return the nominated disjoint rows.""" if isinstance(rows, str): rows = [rows] indexes = [] for row in rows: idx, drop = self.template.interpretIndex(row) indexes.append(idx[0]) new = self.array.take(indexes, axis=0) kw = self._get_persistent_attrs() kw.update(kwargs) return Table(header = self.Header, rows = new, **kw) def withNewColumn(self, new_column, callback, columns = None, **kwargs): """Returns a new table with an additional column, computed using callback. Arguments: - new_column: new column heading - columns: the columns whose values determine whether a row is to be included. 
- callback: Can be a function, which takes the sub-row delimited by columns and returns True/False, or a string representing valid python code to be evaluated.""" if isinstance(columns, str): columns = (columns,) if columns is not None: num_columns = len(columns) else: num_columns = None if not callable(callback): data = self cols = columns else: data = self.array cols = map(self.Header.index, columns) twoD = [list(row) + [self._callback(callback, row, cols, num_columns)] for row in data] kw = self._get_persistent_attrs() kw.update(kwargs) return Table(header = self.Header + [new_column], rows = twoD, **kw) def getDistinctValues(self, column): """returns the set of distinct values for the named column(s)""" columns = [column, [column]][type(column) == str] data = self.getRawData(column) if len(columns) > 1: data = [tuple(row) for row in data] return set(data) def joined(self, other_table, columns_self=None, columns_other=None, inner_join=True, **kwargs): """returns a new table containing the join of this table and other_table. Default behaviour is the natural inner join. Checks for equality in the specified columns (if provided) or all columns; a combined row is included in the output if all indices match exactly. A combined row contains first the row of this table, and then columns from the other_table that are not key columns (i.e. not specified in columns_other). The order (of self, then other) is preserved. The column headers of the output are made unique by replacing the headers of other_table with _. Arguments: - other_table: A table object which will be joined with this table. other_table must have a title. - columns_self, columns_other: indices of key columns that will be compared in the join operation. Can be either column index, or a string matching the column header. The order matters, and the dimensions of columns_self and columns_other have to match. 
A row will be included in the output iff self[row][columns_self[i]]==other_table[row][columns_other[i]] for all i - inner_join: if False, the outer join of the two tables is returned. """ if other_table.Title is None: raise RuntimeError, "Cannot join if a other_table.Title is None" elif self.Title == other_table.Title: raise RuntimeError, "Cannot join if a table.Title's are equal" columns_self = [columns_self,[columns_self]][type(columns_self)==str] columns_other = [columns_other, [columns_other]][type(columns_other)==str] if not inner_join: assert columns_self is None and columns_other is None, "Cannot "\ "specify column indices for an outer join" columns_self = [] columns_other = [] if columns_self is None and columns_other is None: # we do the natural inner join columns_self=[] columns_other=[] for col_head in self.Header: if col_head in other_table.Header: columns_self.append(self.Header.index(col_head)) columns_other.append(other_table.Header.index(col_head)) elif columns_self is None or columns_other is None: # the same column labels will be used for both tables columns_self = columns_self or columns_other columns_other = columns_self or columns_other elif len(columns_self)!=len(columns_other): raise RuntimeError("Error during table join: key columns have "\ "different dimensions!") # create new 2d list for the output joined_table=[] #resolve column indices from Header, if necessary columns_self_indices=[] columns_other_indices=[] for col in columns_self: if type(col)==int: columns_self_indices.append(col) else: columns_self_indices.append(self.Header.index(col)) for col in columns_other: if type(col)==int: columns_other_indices.append(col) else: columns_other_indices.append(other_table.Header.index(col)) # create a mask of which columns of the other_table will end up in the # output output_mask_other=[] for col in range(0,len(other_table.Header)): if not (col in columns_other_indices): output_mask_other.append(col) # use a dictionary for the key lookup # key 
dictionary for other_table. # key is a tuple made from specified columns; data is the row index # for lookup... key_lookup={} row_index=0 for row in other_table: #insert new entry for each row key=tuple([row[col] for col in columns_other_indices]) if key in key_lookup: key_lookup[key].append(row_index) else: key_lookup[key]=[row_index] row_index=row_index+1 for this_row in self: # assemble key for query of other_table key=tuple([this_row[col] for col in columns_self_indices]) if key in key_lookup: for output_row_index in key_lookup[key]: other_row=[other_table[output_row_index,c] \ for c in output_mask_other] joined_table.append(list(this_row) + other_row) new_header=self.Header+[other_table.Title+"_"+other_table.Header[c] \ for c in output_mask_other] if not joined_table: # YUK, this is to stop dimension check in DictArray causing # failures joined_table = numpy.empty((0,len(new_header))) return Table(header=new_header, rows=joined_table, **kwargs) def summed(self, indices=None, col_sum=True, strict=True, **kwargs): """returns the sum of numerical values for column(s)/row(s) Arguments: - indices: column name(s) or indices or row indices - col_sum: sums values in the indicated column, the default. If False, returns the row sum. 
- strict: if False, ignores cells with non-numeric data in the column/row.""" all = indices is None if type(indices) == str: assert col_sum, "Must use row integer indices" indices = self.Header.index(indices) elif type(indices) == int: # a single index indices = [indices] elif not all: raise RuntimeError("unknown indices type: %s" % indices) if not all: vals = self.array.take([indices], axis=[0,1][col_sum]).flatten() if strict: result = vals.sum() else: result = sum(v for v in vals if type(v)!=str) else: # a multi-rowed result if col_sum: vals = self.array else: vals = self.array.transpose() if strict: result = vals.sum(axis=0).tolist() else: result = [] append = result.append # we need to iterate over the elements to be summed, so we # have to transpose for row in vals.transpose(): try: num = row.sum() except TypeError: num = sum(r for r in row if type(r) != str) append(num) return result def normalized(self, by_row=True, denominator_func=None, **kwargs): """returns a table with elements expressed as a fraction according to the results from func Arguments: - by_row: normalisation done by row - denominator_func: a callback function that takes an array and returns a value to be used as the denominator. Default is sum.""" if denominator_func: data = self.array if not by_row: data = data.transpose() denominators = [denominator_func(row) for row in data] else: denominators = self.summed(col_sum=not by_row) if by_row: values = self.array else: values = self.array.transpose() rows = [values[i]/denom for i, denom in enumerate(denominators)] rows = numpy.array(rows) if not by_row: rows = rows.transpose() return Table(header=self.Header, rows=rows, **kwargs) def transposed(self, new_column_name, select_as_header=None, **kwargs): """returns the transposed table. Arguments: - new_column_name: the existing header will become a column with this name - select_as_header: current column name containing data to be used as the header. Defaults to the first column. 
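The inner join implemented by joined() earlier in this class builds a dictionary keyed on tuples of the join-column values from the other table, then probes that index once per row of this table. A minimal standalone sketch of that hash-join technique on plain lists of rows (inner_join is a hypothetical helper name, not part of this module):

```python
# Standalone sketch of the hash-join used by joined() above: index the
# right-hand rows by a tuple of key-column values, then probe the index
# once per left-hand row. Non-matching left rows are simply dropped.
def inner_join(left, right, key_cols_left, key_cols_right):
    index = {}
    for i, row in enumerate(right):
        key = tuple(row[c] for c in key_cols_right)
        index.setdefault(key, []).append(i)
    joined = []
    for row in left:
        key = tuple(row[c] for c in key_cols_left)
        for i in index.get(key, []):
            joined.append(list(row) + list(right[i]))
    return joined

left = [[1, 'a'], [2, 'b']]
right = [[1, 'x'], [1, 'y'], [3, 'z']]
print(inner_join(left, right, [0], [0]))
# [[1, 'a', 1, 'x'], [1, 'a', 1, 'y']]
```

This is O(n + m) in the two table sizes rather than the O(n * m) of a nested-loop join, which is why joined() goes to the trouble of building the key_lookup dict.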
""" select_as_header = select_as_header or self.Header[0] assert select_as_header in self.Header, \ '"%s" not in table Header' % select_as_header raw_data = self.getRawData() raw_data.insert(0, self.Header) transposed = numpy.array(raw_data, dtype='O') transposed = transposed.transpose() # indices for the header and non header rows header_index = self.Header.index(select_as_header) data_indices = range(0, header_index)+range(header_index+1, len(transposed)) header = list(numpy.take(transposed, [header_index], axis=0)[0]) header = [new_column_name]+header[1:] # [1:] slice excludes old name rows = numpy.take(transposed, data_indices, axis=0) return Table(header=header, rows=rows, **kwargs) PyCogent-1.5.3/cogent/util/terminal.py000644 000765 000024 00000004660 12024702176 020670 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Nadia Alramli" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Nadia Alramli"] __license__ = "BSD" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" # Copyright: 2008 Nadia Alramli # License: BSD import sys try: import curses except ImportError: curses = None COLORS = "BLUE GREEN CYAN RED MAGENTA YELLOW WHITE BLACK".split() # List of terminal controls, you can add more to the list. 
_CONTROLS = { 'BOL':'cr', 'UP':'cuu1', 'DOWN':'cud1', 'LEFT':'cub1', 'RIGHT':'cuf1', 'CLEAR_SCREEN':'clear', 'CLEAR_EOL':'el', 'CLEAR_BOL':'el1', 'CLEAR_EOS':'ed', 'BOLD':'bold', 'BLINK':'blink', 'DIM':'dim', 'REVERSE':'rev', 'UNDERLINE':'smul', 'NORMAL':'sgr0', 'HIDE_CURSOR':'civis', 'SHOW_CURSOR':'cnorm' } class TerminalUnavailableError(RuntimeError): pass class CursesOutput(object): def __init__(self): if curses is None: raise TerminalUnavailableError("No curses module") elif not hasattr(sys.stdout, 'fileno'): raise TerminalUnavailableError("stdout not a real file") try: curses.setupterm() except curses.error, detail: raise TerminalUnavailableError(detail) def getColumns(self): return curses.tigetnum('cols') def getLines(self): return curses.tigetnum('lines') def getCodes(self): # Get the color escape sequence template or '' if not supported # setab and setaf are for ANSI escape sequences bgColorSeq = curses.tigetstr('setab') or curses.tigetstr('setb') or '' fgColorSeq = curses.tigetstr('setaf') or curses.tigetstr('setf') or '' codes = {} for color in COLORS: # Get the color index from curses colorIndex = getattr(curses, 'COLOR_%s' % color) # Set the color escape sequence after filling the template with index for (prefix, termseq) in [('', fgColorSeq), ('BG_', bgColorSeq)]: key = prefix + color try: codes[key] = curses.tparm(termseq, colorIndex) except curses.error: codes[key] = '' for control in _CONTROLS: # Set the control escape sequence codes[control] = curses.tigetstr(_CONTROLS[control]) or '' return codes PyCogent-1.5.3/cogent/util/transform.py000644 000765 000024 00000051026 12024702176 021066 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides transformations of functions and other objects. Includes: Standard combinatorial higher-order functions adapted from David Mertz (2003), "Text Processing in Python", Chapter 1. Functions for performing complex tests on strings, e.g. includes_any or includes_all.
Functions for generating combinations, permutations, or cartesian products of lists. """ from __future__ import division from operator import add, and_, or_ from numpy import logical_and, logical_or, logical_not from string import maketrans from cogent.maths.stats.util import Freqs from cogent.util.misc import identity, select __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight","Zongzhi Liu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" #standard combinatorial HOF's from Mertz def apply_each(functions, *args, **kwargs): """Returns list containing result of applying each function to args.""" return [f(*args, **kwargs) for f in functions] def bools(items): """Returns list of booleans: reflects state of each item.""" return map(bool, items) def bool_each(functions, *args, **kwargs): """Returns list of booleans: results of applying each function to args.""" return bools(apply_each(functions,*args, **kwargs)) def conjoin(functions, *args, **kwargs): """Returns True if all functions return True when applied to args.""" for f in functions: if not f(*args, **kwargs): return False return True def all(functions): """Returns function that returns True when all components return True.""" def apply_to(*args, **kwargs): return conjoin(functions, *args, **kwargs) return apply_to def both(f,g): """Returns function that returns True when functions f and g return True.""" def apply_to(*args, **kwargs): #use operator.__and__ to make it compatible to numpy array operation #return logical_and(f(*args, **kwargs), g(*args, **kwargs)) return f(*args, **kwargs) and g(*args, **kwargs) return apply_to def disjoin(functions, *args, **kwargs): """Returns True if any of the component functions return True.""" for f in functions: if f(*args, **kwargs): return True return False def any(functions): """Returns a function that 
returns True if any component returns True.""" def apply_to(*args, **kwargs): return disjoin(functions, *args, **kwargs) return apply_to def either(f,g): """Returns a function that returns True if either f or g returns True.""" def apply_to(*args, **kwargs): return f(*args, **kwargs) or g(*args, **kwargs) return apply_to def negate(functions, *args, **kwargs): """Returns True if all functions return False.""" for f in functions: if f(*args, **kwargs): return False return True def none(functions): """Returns a function that returns True if all components return False.""" def apply_to(*args, **kwargs): return negate(functions, *args, **kwargs) return apply_to def neither(f,g): """Returns a function that returns True if neither f nor g returns True.""" def apply_to(*args, **kwargs): return not(f(*args, **kwargs)) and not(g(*args, **kwargs)) return apply_to def compose(f,g): """Returns a function that returns the result of applying f to g(x).""" def apply_to(*args, **kwargs): return f(g(*args, **kwargs)) return apply_to def compose_many(*functions): """Returns a function that composes all input functions.""" funs = list(functions) funs.reverse() def apply_to(*args, **kwargs): result = funs[0](*args, **kwargs) for f in funs[1:]: next = f(result) result = next return result return apply_to #factory for making functions that apply to sequences def per_shortest(total, x, y): """Divides total by min(len(x), len(y)). Useful for normalizing per-item results from sequences that are zipped together. Always returns 0 if one of the sequences is empty (to avoid divide by zero error). """ shortest = min(len(x), len(y)) if not shortest: return 0 else: return total/shortest def per_longest(total, x, y): """Divides total by max(len(x), len(y)). Useful for normalizing per-item results from sequences that are zipped together. Always returns 0 if one of the sequences is empty (to avoid divide by zero error).
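The higher-order predicate combinators above (both, either, neither, compose, and friends) all follow one pattern: capture the component functions in a closure and return a new apply_to function. A self-contained restatement, using both and compose as representatives:

```python
# Self-contained restatement of the closure pattern behind both()/compose()
# in this module; the definitions here mirror the source so the example
# runs on its own.
def both(f, g):
    def apply_to(*args, **kwargs):
        # True only when both component predicates accept the arguments
        return f(*args, **kwargs) and g(*args, **kwargs)
    return apply_to

def compose(f, g):
    def apply_to(*args, **kwargs):
        # apply g first, then f to g's result
        return f(g(*args, **kwargs))
    return apply_to

positive = lambda x: x > 0
even = lambda x: x % 2 == 0
positive_even = both(positive, even)
print(positive_even(4), positive_even(-2))  # True False

double_then_str = compose(str, lambda x: 2 * x)
print(double_then_str(21))  # 42
```

The same closure shape scales to the list-taking variants (all, any, none): they just loop over a sequence of component functions instead of exactly two.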
""" longest = max(len(x), len(y)) if not longest: return 0 else: return total/longest class for_seq(object): """Returns function that applies f(i,j) to i,j in zip(first, second). f: f(i,j) applying to elements of the sequence. aggregator: method to reduce the list of results to a scalar. Default: sum. normalizer: f(total, i, j) that normalizes the total as a function of i and j. Default is length_normalizer (divides by the length of the shorter of i and j). If normalizer is None, no normalization is performed. Will always truncate to length of the shorter sequence (because of the use of zip). """ def __init__(self, f, aggregator=sum, normalizer=per_shortest): self.f = f self.aggregator = aggregator self.normalizer = normalizer def __call__(self, first, second): f = self.f if self.normalizer is None: return self.aggregator([f(i,j) for i,j in zip(first, second)]) else: return self.normalizer(self.aggregator(\ [f(i,j) for i,j in zip(first,second)]), first, second) #convenience functions for modifying objects def has_field(field_name): """Returns a function that returns True if the obj has the field_name.""" def field_checker(obj): return hasattr(obj, field_name) return field_checker def extract_field(field_name, constructor=None): """Returns a function that returns the value of the specified field. If set, the constructor function will be applied to obj.field_name. Returns None if the constructor fails or the attribute doesn't exist. """ f = constructor or identity def result(x): try: return f(getattr(x, field_name)) except: return None return result def test_field(field_name, constructor=None): """Returns True if obj.field_name is True. False otherwise. If set, the constructor will be applied to the obj.field_name. If accessing the field raises an exception, returns False. 
""" extractor = extract_field(field_name, constructor) def result(x): return bool(extractor(x)) return result def index(constructor=None, overwrite=False): """Returns a function that constructs a dict mapping constructor to object. If constructor is None, tries to make the objects the keys (only works if the objects defines __hash__. If overwrite is True, returns single item for each value. Otherwise, always returns a list (even if one item) for each value. """ f = constructor or identity if overwrite: def result(items): return dict([(f(i), i) for i in items]) return result else: def result(items): index = {} for i in items: curr = f(i) if curr in index: index[curr].append(i) else: index[curr] = [i] return index return result def test_container(container): """Returns function that tests safely if item in container. Does not raise TypeError (e.g. for dict checks.) """ def result(item): try: return item in container except TypeError: return False return result allchars = maketrans('','') def trans_except(good_chars, default): """Returns translation table mapping all but the 'good chars' to default.""" all_list = list(allchars) for i, char in enumerate(all_list): if char not in good_chars: all_list[i] = default return ''.join(all_list) def trans_all(bad_chars, default): """Returns translation table mapping all the 'bad chars' to default.""" all_list = list(allchars) for i, char in enumerate(all_list): if char in bad_chars: all_list[i] = default return ''.join(all_list) def make_trans(frm='', to='', default=None): """Like built-in maketrans, but sets all unspecified chars to default.""" if default is None: all_list = list(allchars) else: if len(default) != 1: raise ValueError, 'make_trans default must be single char: got %s' \ % default all_list = [default] * 256 for orig, new in zip(frm, to): all_list[ord(orig)] = new return ''.join(all_list) def find_any(words, case_sens = False): """Tests if any of the given words occurs in the given string. 
This filter is case INsensitive by default. """ if not case_sens: used_words = [w.lower() for w in words] else: used_words = words def apply_to(s): if not case_sens: used_s = s.lower() else: used_s = s for w in used_words: if used_s.find(w) > -1: return True return False return apply_to def find_no(words, case_sens = False): """Returns True if none of the words appears in s. This filter is case INsensitive by default. """ f=find_any(words,case_sens) def apply_to(s): return not f(s) return apply_to def find_all(words, case_sens=False): """Returns True if all given words appear in s. This filter is case INsensitive by default. """ if not case_sens: used_words = [w.lower() for w in words] else: used_words = words def apply_to(s): if not case_sens: used_s = s.lower() else: used_s = s for w in used_words: # if w doesn't appear in lc_s if not used_s.find(w) > -1: return False return True return apply_to def keep_if_more(items,x,case_sens=False): """Returns True if #items in s > x. False otherwise. This filter is case INsensitive by default. """ x = int(x) if x < 0: raise IndexError, "x should be >= 0" if not case_sens: used_items = [str(item).lower() for item in items] else: used_items = items def find_more_good(s): if s and not case_sens: used_s = [str(item).lower() for item in s] else: used_s = s fd = Freqs(used_s) value_list = [fd[i] for i in fd if i in used_items] if value_list: count = reduce(add, value_list) return count > x else: return False return find_more_good def exclude_if_more(items,x,case_sens=False): """Returns True if #items in s <= x. This filter is case INsensitive by default. """ f = keep_if_more(items,x,case_sens) def apply_to(s): return not f(s) return apply_to def keep_if_more_other(items,x,case_sens=False): """Returns True if #items in s other than those in items > x. This filter is case INsensitive by default.
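The string filters above (find_any, find_no, find_all, and the keep_if_more family) share one shape: do the case-folding work once at construction time, then return a cheap predicate over strings. A standalone sketch of the find_any variant (has_stop_codon is an illustrative name, not part of this module):

```python
# Standalone sketch of the find_any-style closures above: precompute the
# case-folded word list once, return a predicate that folds each query
# string the same way.
def find_any(words, case_sens=False):
    used_words = list(words) if case_sens else [w.lower() for w in words]
    def apply_to(s):
        used_s = s if case_sens else s.lower()
        return any(w in used_s for w in used_words)
    return apply_to

has_stop_codon = find_any(["TAA", "TAG", "TGA"])
print(has_stop_codon("atgtaacca"))  # True  (case-insensitive by default)
print(has_stop_codon("atggggcca"))  # False
print(find_any(["TAA"], case_sens=True)("atgtaacca"))  # False
```

find_no and find_all are the same closure with the membership test negated or universally quantified, respectively.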
""" x = int(x) if x < 0: raise IndexError, "x should be >= 0" if not case_sens: used_items = [str(item).lower() for item in items] else: used_items = items def apply_to(s): if s and not case_sens: used_s = [str(item).lower() for item in s] else: used_s = s fd = Freqs(used_s) value_list = [fd[i] for i in fd if i not in used_items] if value_list: count = reduce(add, value_list) return count > x else: return False return apply_to def exclude_if_more_other(items,x,case_sens=False): """Returns True if #items other than in items in s < x. This filter is case INsensitive by default. """ f = keep_if_more_other(items,x,case_sens) def apply_to(s): return not f(s) return apply_to ''' def keep_chars(keep, case_sens=True): """Returns a filter function f(s) that returns a filtered string. Specifically, strips out everything in s that is not in keep. This filter is case sensitive by default. """ allchars = maketrans('', '') if not case_sens: low = keep.lower() up = keep.upper() keep = low + up delchars = ''.join([c for c in allchars if c not in keep]) #make the filter function, capturing allchars and delchars in closure def filter_function(s, a=allchars, d=delchars): return s.translate(a, d) #return the filter function return filter_function ''' class keep_chars(object): """Returns a filter object o(s): call to return a filtered string. Specifically, strips out everything in s that is not in keep. This filter is case sensitive by default. 
""" allchars = maketrans('','') def __init__(self, keep, case_sens=True): """Returns a new keep_chars object, based on string keep""" if not case_sens: low = keep.lower() up = keep.upper() keep = low + up self.delchars = ''.join([c for c in self.allchars if c not in keep]) def __call__(self, s, a=None, d=None): """f(s) -> s, translates using self.allchars and self.delchars""" if a is None: a = self.allchars if d is None: d = self.delchars return s.translate(a,d) def exclude_chars(exclude,case_sens=True): """Returns a filter function f(s) that returns a filtered string. Specifically, strips out everything is s that is in exlude. This filter is case sensitive by default. """ allchars = maketrans('','') if not case_sens: low = exclude.lower() up = exclude.upper() exclude = low + up delchars = ''.join([c for c in allchars if c in exclude]) def filter_function(s,a=allchars, d=delchars): return s.translate(a,d) return filter_function def reorder(order): """Returns a function that rearranges sequence into specified order. order should be a sequence of indices (for lists, tuples, strings, etc.) or keys (for dicts, etc.). Always returns a list. Will raise the appropriate IndexError or KeyError if items does not contain any position reqested by the order. """ def result(items): return select(order, items) return result def reorder_inplace(order, attr=None): """Returns a function that rearranges the items in attr, in place. If attr is None (the default), reorders the object itself. Uses slice assignment, so, unlike the original reorder, will only work on mutable sequences (e.g. lists, but not tuples or strings or dicts). 
""" if attr: def result(obj): curr = getattr(obj, attr) curr[:] = select(order, curr) return obj else: def result(obj): obj[:] = select(order, obj) return obj return result maybe_number = keep_chars('0123456789.+-eE') def float_from_string(data): """Extracts a floating point number from string in data, if possible.""" return float(maybe_number(data)) def first_index(seq, f): """Returns index of first item in seq where f(item) is True, or None. To invert the function, use lambda f: not f """ for i, s in enumerate(seq): if f(s): return i def last_index(seq, f): """Returns index of last item in seq where f(item) is True, or None. To invert the function, use lambda f: not f NOTE: We could do this slightly more efficiently by iterating over s in reverse order, but then it wouldn't work on generators that can't be reversed. """ found = None for i, s in enumerate(seq): if f(s): found = i return found def first_index_in_set(seq, items): """Returns index of first occurrence of any of items in seq, or None.""" for i, s in enumerate(seq): if s in items: return i def last_index_in_set(seq, items): """Returns index of last occurrence of any of items in seq, or None. NOTE: We could do this slightly more efficiently by iterating over s in reverse order, but then it wouldn't work on generators that can't be reversed. """ found = None for i, s in enumerate(seq): if s in items: found = i return found def first_index_not_in_set(seq, items): """Returns index of first occurrence of any of items in seq, or None.""" for i, s in enumerate(seq): if not s in items: return i def last_index_not_in_set(seq, items): """Returns index of last occurrence of any of items in seq, or None. NOTE: We could do this slightly more efficiently by iterating over s in reverse order, but then it wouldn't work on generators that can't be reversed. 
""" found = None for i, s in enumerate(seq): if s not in items: found = i return found def first(seq, f): """Returns first item in seq where f(item) is True, or None. To invert the function, use lambda f: not f """ for s in seq: if f(s): return s def last(seq, f): """Returns last item in seq where f(item) is True, or None. To invert the function, use lambda f: not f NOTE: We could do this slightly more efficiently by iterating over s in reverse order, but then it wouldn't work on generators that can't be reversed. """ found = None for s in seq: if f(s): found = s return found def first_in_set(seq, items): """Returns first occurrence of any of items in seq, or None.""" for s in seq: if s in items: return s def last_in_set(seq, items): """Returns index of last occurrence of any of items in seq, or None. NOTE: We could do this slightly more efficiently by iterating over s in reverse order, but then it wouldn't work on generators that can't be reversed. """ found = None for s in seq: if s in items: found = s return found def first_not_in_set(seq, items): """Returns first occurrence of any of items in seq, or None.""" for s in seq: if not s in items: return s def last_not_in_set(seq, items): """Returns last occurrence of any of items in seq, or None. NOTE: We could do this slightly more efficiently by iterating over s in reverse order, but then it wouldn't work on generators that can't be reversed. """ found = None for s in seq: if s not in items: found = s return found def perm(items, n=None): """Yields each successive permutation of items. This version from Raymond Hettinger, 2006/03/23 """ if n is None: n = len(items) for i in range(len(items)): v = items[i:i+1] if n == 1: yield v else: rest = items[:i] + items[i+1:] for p in perm(rest, n-1): yield v + p def comb(items, n=None): """Yields each successive combination of n items. items: a slicable sequence. 
n: number of items in each combination This version from Raymond Hettinger, 2006/03/23 """ if n is None: n = len(items) for i in range(len(items)): v = items[i:i+1] if n == 1: yield v else: rest = items[i+1:] for c in comb(rest, n-1): yield v + c def _increment_comb(outcomes, vector): """Yields each new outcome as an expansion of existing outcomes. """ for outcome in outcomes: for e in vector: yield outcome + [e] def cross_comb(vectors): """Yields each cross combination of a sequence of sequences (e.g. lists). i.e. returns the Cartesian product of the sequences. vectors: must be a seq of sequences. Recipe from the Python cookbook. Profiling shows that this is slightly slower than the previous implementation of cartesian_product for long lists, but faster for short lists. The speed penalty for long lists is outweighed by not having to keep the lists in memory. """ result = ([],) for vector in vectors: result = _increment_comb(result, vector) return result cartesian_product = cross_comb #standard, but obscure, name for this function PyCogent-1.5.3/cogent/util/trie.py000644 000765 000024 00000024751 12024702176 020023 0ustar00jrideoutstaff000000 000000 #! 
/usr/bin/env python """A trie and compressed trie data structure.""" __author__ = "Jens Reeder" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jens Reeder"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jens Reeder" __email__ = "jens.reeder@gmail.com" __status__ = "Prototype" ################## # A compressed Trie ################## class _Compressed_Node: """ A node in a compressed Trie.""" def __init__(self, key, labels=None): """Creates a new Node.""" self.labels = labels or [] self.key = key self.children = {} def __nonzero__(self): """Checks if Node contains any data.""" return (self.key!="" or len(self.labels)>0 or len(self.children.keys())>0) def __len__(self): """Counts the number of sequences in the Trie""" l = 0 for n in self.children.values(): l += len(n) l += len(self.labels) return l def _to_string(self, depth=0): """Debugging method to display the Node's content. depth: indentation level """ s = ["\n"+depth*'\t'+"key %s"%self.key] if (self.labels): s.append("%s" % str(self.labels)) s.append("\n") for n in self.children.values(): s.append(depth*'\t' +"{%s" %(n._to_string(depth+1))) s.append(depth*'\t'+"}\n") return "".join(s) def size(self): """Returns number of nodes.""" s = 1 for n in self.children.values(): s += n.size() return s def insert(self, key, label): """Insert a key into the Trie. 
key: The key of the entry label: The value that should be stored for this key """ node_key_len = len(self.key) length = min(node_key_len, len(key)) #follow the key into the tree index=0 while index < length: if(key[index] != self.key[index]): #split up node new_key_node = _Compressed_Node(key[index:], [label]) old_key_node = _Compressed_Node(self.key[index:], self.labels) old_key_node.children = self.children self.children = {key[index]: new_key_node, self.key[index]: old_key_node} self.key = self.key[:index] self.labels = [] return index += 1 #new key matches node key exactly if (index == len(self.key) and index == len(key)): self.labels.append(label) return #Key shorter than node key if(index < node_key_len): lower_node = _Compressed_Node(self.key[index:], self.labels) lower_node.children = self.children self.children = {self.key[index]: lower_node} self.key = key self.labels = [label] return #new key longer than current node key node = self.children.get(key[index]) if(node): # insert into next node node.insert(key[index:], label) else: # create new node new_node = _Compressed_Node(key[index:], [label]) self.children[key[index]] = new_node return def find(self, key): """Searches for key and returns values stored for the key. key: the key whose values should be returned.
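The insert/find logic above is easiest to see in the simpler per-character form (the atomic Trie defined later in this module; the compressed variant additionally merges single-child chains into one node and splits nodes on partial key matches). A simplified, self-contained sketch:

```python
# Simplified, per-character version of the trie insert/find logic in this
# module: one node per character, labels accumulating at the node where a
# key ends. The compressed variant stores multi-character keys per node.
class Node(object):
    def __init__(self):
        self.labels = []
        self.children = {}

    def insert(self, key, label):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Node())
        node.labels.append(label)

    def find(self, key):
        node = self
        for ch in key:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return node.labels

root = Node()
root.insert("ACGT", "seq1")
root.insert("ACGT", "seq2")
root.insert("ACG", "seq3")
print(root.find("ACGT"))  # ['seq1', 'seq2']
print(root.find("TTT"))   # []
```

Both variants give O(l) insert and lookup for a key of length l, which is what makes the O(nl) prefix filtering claimed later for build_prefix_map possible.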
""" #key exhausted if(len(key) == 0): return self.labels #find matching part of key and node_key min_length = min(len(key), len(self.key)) index = 0 while index < min_length: if(key[index] != self.key[index]): return [] index+=1 #key and node_key match exactly if (index == len(key)): return self.labels node = self.children.get(key[index]) if(node): # descend to next node return node.find(key[index:]) else: return [] def prefixMap(self): """Builds a prefix map from sequences stored in Trie.""" labels = [] mapping = {} # we have a leaf if (len(self.children)==0): mapping = {self.labels[0]: self.labels[1:]} # we are at an internal node else: for child in self.children.values(): mapping.update(child.prefixMap()) # get largest group n = -1 key_largest = None for (key, value) in mapping.iteritems(): if (len(value) > n): n = len(value) key_largest = key # append this node's values mapping[key_largest].extend(self.labels) return mapping class Compressed_Trie: """ A compressed Trie""" def __init__(self): """ Return an empty Trie.""" self.root = _Compressed_Node("") def insert(self, key, label): """Insert key with label in Trie.""" self.root.insert(key, label) def find(self, key): """Returns label for key in Trie.""" return self.root.find(key) def __str__(self): """Ugly Trie string representation for debugging.""" return self.root._to_string() def __nonzero__(self): """Checks wheter the trie is empty or not.""" return (self.root.__nonzero__()) def __len__(self): """Returns number of sequences in Trie.""" return len(self.root) def size(self): """Returns number of nodes in Trie""" return self.root.size() def prefixMap(self): """builds a prefix map of seqeunces stored in Trie.""" return self.root.prefixMap() ################### # An atomic Trie ################### class _Node: """ A node in a Trie.""" def __init__(self, labels=None): """Creates a new node with value label.""" self.labels = labels or [] self.children = {} def insert(self, key, label): """Insert key with value 
label into Trie. key: The key of the entry label: The value that should be stored for this key """ curr_node = self for ch in key: curr_node = curr_node.children.setdefault(ch, _Node()) curr_node.labels.append(label) def _insert_unique(self, key, value): """Insert key with value if key not already in Trie. key: The key of the entry label: The value that should be stored for this key Returns label of key if unique or label of containing key. Note: Tries built with this function will only have labels on their leaves. """ curr_node = self labels = [] #descend along key and collect all internal node labels for ch in key: if curr_node.labels != []: labels.extend(curr_node.labels) curr_node.labels = [] curr_node = curr_node.children.setdefault(ch, _Node()) if (curr_node.children=={} and curr_node.labels==[]): # we have a new prefix curr_node.labels = [value] return (curr_node.labels[0], labels) else: # we are at an internal node or a labeled leaf # descend to the "leftmost" leaf while (curr_node.children != {}): curr_node = curr_node.children.values()[0] return (curr_node.labels[0], []) def find(self, key): """Retrieves node with key in Trie.""" next_node = self for ch in key: try: next_node = next_node.children[ch] except KeyError: return None return next_node class Trie: """ A Trie data structure.""" def __init__(self): """Inits an empty Trie.""" self.root = _Node() def insert(self, key, label): """Insert key with label into Trie. key: The key of the entry label: The value that should be stored for this key """ return self.root.insert(key, label) def _insert_unique(self, key, label): """Insert unique keys into Trie, skip redundant ones. 
key: The key of the entry label: The value that should be stored for this key Returns: list of labels removed from Trie """ return self.root._insert_unique(key, label) def find(self, key): """Retrieves labels of key in Trie.""" node = self.root.find(key) if (node ==None): return [] return node.labels def _build_prefix_map(seqs): """Builds a prefix map using an atomic Trie. seqs: dict like sequence collection """ mapping = {} t = Trie() for label,seq in seqs: (label2, prefix_labels) = t._insert_unique(seq, label) if (label2==label): #seq got inserted and is new word mapping[label] = prefix_labels[:] #delete old mappings and transfer old mappings to new label for l in prefix_labels: mapping[label].extend(mapping[l]) del(mapping[l]) else: # seq is already in tree and is prefix of label2 mapping[label2].append(label) return mapping def build_trie(seqs, classname=Compressed_Trie): """ build a Trie for a list of (label,seq) pairs. seqs: list of (label,seq) pairs classname: Constructor to use for tree building Compressed_Trie (default) or Trie """ if (not(classname==Trie or classname==Compressed_Trie)): raise ValueError, "Wrong classname for build_trie." t = classname() for (label, seq) in seqs: t.insert(seq, label) return t def build_prefix_map(seqs, classname=Compressed_Trie): """Builds prefix map from seqs. seqs: list of (label,seq) pairs classname: Constructor to use for tree building Compressed_Trie (default) or Trie This method can be used to filter sequences for identity and for identical prefixes. Due to the nature of tries a list of n seqs of length l can be filtered in O(nl) time. """ if (not(classname==Trie or classname==Compressed_Trie)): raise ValueError, "Wrong classname for build_trie."
if (classname == Compressed_Trie): t = build_trie(seqs) return t.prefixMap() else: return _build_prefix_map(seqs) PyCogent-1.5.3/cogent/util/unit_test.py000644 000765 000024 00000053420 12024702176 021071 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Extension of the built-in unittest framework for floating-point comparisons. Specific Extensions: assertFloatEqual, assertFloatEqualAbs, and assertFloatEqualRel give fine-grained control over how floating point numbers (or lists thereof) are tested for equality. assertContains and assertNotContains give more helpful error messages when testing whether an observed item is present or absent in a set of possibilities. Ditto assertGreaterThan, assertLessThan, assertBetween, and assertIsProb (which is a special case of assertBetween requiring the result to be between 0 and 1). assertSameItems and assertEqualItems test the items in a list for pairwise identity and equality respectively (i.e. the observed and expected values must have the same number of each item, though the order can differ); assertNotEqualItems verifies that two lists do not contain equal sets of items. assertSimilarMeans and assertSimilarFreqs allow you to test stochastic results by setting an explicit P-value and checking that the result is not improbable given the expected P-value. Please use these instead of guessing confidence intervals! The major advantage is that you can reset the P-value globally over the whole test suite, so that rare failures don't occur every time. assertIsPermutation checks that you get a permutation of an expected result that differs from the original result, repeating the test a specified number of times before giving up and assuming that the result is always the same.
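The idea behind assertSimilarMeans can be sketched without the module's own t_two_sample machinery: compute a test statistic for the observed sample against the expected distribution, and only fail when the resulting p-value drops below the suite-wide cutoff. The sketch below uses a simple two-sided z-test (similar_means is an illustrative stand-in; the real method uses cogent's t-test and supports a suite-level _suite_pvalue override):

```python
import math

# Hedged sketch of the stochastic-assertion idea above: accept an observed
# sample unless a two-sided z-test against the expected mean/sd rejects it
# at the chosen p-value. The real assertSimilarMeans uses a t-test.
def similar_means(observed, expected_mean, expected_sd, pvalue=0.01):
    n = len(observed)
    mean = sum(observed) / n
    z = (mean - expected_mean) / (expected_sd / math.sqrt(n))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return p >= pvalue  # True -> the stochastic result passes

sample = [0.1, -0.2, 0.05, 0.0, -0.05]
print(similar_means(sample, 0.0, 1.0))  # True: mean is plausible under N(0, 1)
print(similar_means(sample, 5.0, 1.0))  # False: mean is ~11 sigma away
```

With pvalue fixed in one place, rare false failures can be tuned for the whole suite rather than per-test, which is the motivation the docstring gives for these asserts.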
""" #from contextlib import contextmanager import numpy; from numpy import testing, array, asarray, ravel, zeros, \ logical_and, logical_or, isfinite from unittest import main, TestCase as orig_TestCase, TestSuite, findTestCases from cogent.util.misc import recursive_flatten from cogent.maths.stats.test import t_two_sample, G_ind __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Peter Maxwell", "Sandra Smit", "Zongzhi Liu", "Micah Hamady", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" ## SUPPORT2425 #@contextmanager #def numpy_err(**kw): # """a numpy err context manager. # **kw: pass to numpy.seterr(all=None, divide=None, over=None, under=None, # invalid=None) # Example: # with numpy_err(divide='raise'): # self.assertRaises(FloatingPointError, log, 0) # """ # ori_err = numpy.geterr() # numpy.seterr(**kw) # try: yield None # finally: numpy.seterr(**ori_err) class FakeRandom(object): """Drop-in substitute for random.random that provides items from list.""" def __init__(self, data, circular=False): """Returns new FakeRandom object, using list of items in data. circular: if True (default is False), wraps the list around. Otherwise, raises IndexError when we run off the end of the list. WARNING: data must always be iterable, even if it's a single item. """ self._data = data self._ptr = -1 self._circular = circular def __call__(self, *args, **kwargs): """Returns next item from the list in self._data. Raises IndexError when we run out of data. """ self._ptr += 1 #wrap around if circular if self._circular: if self._ptr >= len(self._data): self._ptr = 0 return self._data[self._ptr] class TestCase(orig_TestCase): """Adds some additional utility methods to unittest.TestCase. Notably, adds facilities for dealing with floating point numbers, and some common templates for replicated tests. 
BEWARE: Do not start any method with 'test' unless you want it to actually run as a test suite in every instance! """ _suite_pvalue = None # see TestCase._set_suite_pvalue() def _get_values_from_matching_dicts(self, d1, d2): """Gets corresponding values from matching dicts""" if set(d1) != set (d2): return None return d1.values(), [d2[k] for k in d1] #might not be in same order def errorCheck(self, call, known_errors): """Applies function to (data, error) tuples, checking for error """ for (data, error) in known_errors: self.assertRaises(error, call, data) def valueCheck(self, call, known_values, arg_prefix='', eps=None): """Applies function to (data, expected) tuples, treating data as args """ for (data, expected) in known_values: observed = eval('call(' + arg_prefix + 'data)') try: allowed_diff = float(eps) except TypeError: self.assertEqual(observed, expected) else: self.assertFloatEqual(observed, expected, allowed_diff) def assertFloatEqualRel(self, obs, exp, eps=1e-6): """Tests whether two floating point numbers/arrays are approx. equal. Checks whether the distance is within epsilon relative to the value of the sum of observed and expected. Use this method when you expect the difference to be small relative to the magnitudes of the observed and expected values. Note: for arbitrary objects, need to compare the specific attribute that's numeric, not the whole object, using this method. """ #do array check first #note that we can't use array ops to combine, because we need to check #at each element whether the expected is zero to do the test to avoid #floating point error. #WARNING: numpy iterates over objects that are not regular Python #floats/ints, so need to explicitly catch scalar values and prevent #cast to array if we want the exact object to print out correctly. is_array = False if hasattr(obs, 'keys') and hasattr(exp, 'keys'): #both dicts? 
result = self._get_values_from_matching_dicts(obs, exp) if result: obs, exp = result else: try: iter(obs) iter(exp) except TypeError: obs = [obs] exp = [exp] else: try: arr_obs = array(obs) arr_exp = array(exp) arr_diff = arr_obs - arr_exp if arr_obs.shape != arr_exp.shape: self.fail("Wrong shape: Got %s, but expected %s" % \ (`obs`, `exp`)) obs = arr_obs.ravel() exp = arr_exp.ravel() is_array=True except (TypeError, ValueError): pass # shape mismatch can still get by... # explict cast is to work around bug in certain versions of numpy # installed version on osx 10.5 if asarray(obs, object).shape != asarray(exp, object).shape: self.fail("Wrong shape: Got %s, but expected %s" % (obs, exp)) for observed, expected in zip(obs, exp): #try the cheap comparison first if observed == expected: continue try: sum = float(observed + expected) diff = float(observed - expected) if (sum == 0): if is_array: self.failIf(abs(diff) > abs(eps), \ "Got %s, but expected %s (diff was %s)" % \ (`arr_obs`, `arr_exp`, `arr_diff`)) else: self.failIf(abs(diff) > abs(eps), \ "Got %s, but expected %s (diff was %s)" % \ (`observed`, `expected`, `diff`)) else: if is_array: self.failIf(abs(diff/sum) > abs(eps), \ "Got %s, but expected %s (diff was %s)" % \ (`arr_obs`, `arr_exp`, `arr_diff`)) else: self.failIf(abs(diff/sum) > abs(eps), \ "Got %s, but expected %s (diff was %s)" % \ (`observed`, `expected`, `diff`)) except (TypeError, ValueError, AttributeError, NotImplementedError): self.fail("Got %s, but expected %s" % \ (`observed`, `expected`)) def assertFloatEqualAbs(self, obs, exp, eps=1e-6): """ Tests whether two floating point numbers are approximately equal. Checks whether the absolute value of (a - b) is within epsilon. Use this method when you expect that one of the values should be very small, and the other should be zero. 
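The absolute/relative split described above (assertFloatEqualAbs near zero, assertFloatEqualRel otherwise, combined in assertFloatEqual) reduces to the following scalar sketch. This is a simplified illustration only — no array, dict, or shape handling — and the function name is hypothetical:

```python
def float_eq(obs, exp, eps=1e-6):
    """Approximate scalar equality: absolute test near zero, else relative."""
    if obs == exp:  # cheap exact comparison first
        return True
    diff = float(obs - exp)
    total = float(obs + exp)
    if obs == 0 or exp == 0 or total == 0:
        # absolute magnitude of the difference, as in assertFloatEqualAbs
        return abs(diff) <= abs(eps)
    # difference relative to obs + exp, as in assertFloatEqualRel
    return abs(diff / total) <= abs(eps)
```

The absolute branch matters because a relative test is meaningless when the expected value is zero: any nonzero observation is "infinitely" different in relative terms.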
""" #do array check first #note that we can't use array ops to combine, because we need to check #at each element whether the expected is zero to do the test to avoid #floating point error. if hasattr(obs, 'keys') and hasattr(exp, 'keys'): #both dicts? result = self._get_values_from_matching_dicts(obs, exp) if result: obs, exp = result else: try: iter(obs) iter(exp) except TypeError: obs = [obs] exp = [exp] else: try: arr_obs = array(obs) arr_exp = array(exp) if arr_obs.shape != arr_exp.shape: self.fail("Wrong shape: Got %s, but expected %s" % \ (`obs`, `exp`)) diff = arr_obs - arr_exp self.failIf(abs(diff).max() > eps, \ "Got %s, but expected %s (diff was %s)" % \ (`obs`, `exp`, `diff`)) return except (TypeError, ValueError): pass #only get here if array comparison failed for observed, expected in zip(obs, exp): #cheap comparison first if observed == expected: continue try: diff = observed - expected self.failIf(abs(diff) > abs(eps), "Got %s, but expected %s (diff was %s)" % \ (`observed`, `expected`, `diff`)) except (TypeError, ValueError, AttributeError, NotImplementedError): self.fail("Got %s, but expected %s" % \ (`observed`, `expected`)) def assertFloatEqual(self, obs, exp, eps=1e-6, rel_eps=None, \ abs_eps=None): """Tests whether two floating point numbers are approximately equal. If one of the arguments is zero, tests the absolute magnitude of the difference; otherwise, tests the relative magnitude. Use this method as a reasonable default. """ obs = numpy.asarray(obs, dtype='O') exp = numpy.asarray(exp, dtype='O') obs = numpy.ravel(obs) exp = numpy.ravel(exp) if obs.shape != exp.shape: self.fail("Shape mismatch. 
Got, %s but expected %s" % (obs, exp)) for observed, expected in zip(obs, exp): if self._is_equal(observed, expected): continue try: rel_eps = rel_eps or eps abs_eps = abs_eps or eps if (observed == 0) or (expected == 0): self.assertFloatEqualAbs(observed, expected, abs_eps) else: self.assertFloatEqualRel(observed, expected, rel_eps) except (TypeError, ValueError, AttributeError, NotImplementedError): self.fail("Got %s, but expected %s" % \ (`observed`, `expected`)) def _is_equal(self, observed, expected): """Returns True if observed and expected are equal, False otherwise.""" #errors to catch: TypeError when obs is None tolist_errors = (AttributeError, ValueError, TypeError) try: obs = observed.tolist() except tolist_errors: obs = observed try: exp = expected.tolist() except tolist_errors: exp = expected return obs == exp def failUnlessEqual(self, observed, expected, msg=None): """Fail if the two objects are unequal as determined by != Overridden to make error message enforce order of observed, expected. Use numpy.testing.assert_equal if ValueError, TypeError raised. """ try: if not self._is_equal(observed, expected): raise self.failureException, \ (msg or 'Got %s, but expected %s' % (`observed`, `expected`)) except (ValueError, TypeError), e: #The truth value of an array with more than one element is #ambiguous. 
Use a.any() or a.all() #descriptor 'tolist' of 'numpy.generic' object needs an argument testing.assert_equal(observed, expected) def failIfEqual(self, observed, expected, msg=None): """Fail if the two objects are equal as determined by ==""" try: self.failUnlessEqual(observed, expected) except self.failureException: pass else: raise self.failureException, \ (msg or 'Observed %s and expected %s: shouldn\'t test equal'\ % (`observed`, `expected`)) #following needed to get our version instead of unittest's assertEqual = assertEquals = failUnlessEqual assertNotEqual = assertNotEquals = failIfEqual def assertEqualItems(self, observed, expected, msg=None): """Fail if the two items contain unequal elements""" obs_items = list(observed) exp_items = list(expected) if len(obs_items) != len(exp_items): raise self.failureException, \ (msg or 'Observed and expected are different lengths: %s and %s' \ % (len(obs_items), len(exp_items))) obs_items.sort() exp_items.sort() for index, (obs, exp) in enumerate(zip(obs_items, exp_items)): if obs != exp: raise self.failureException, \ (msg or 'Observed %s and expected %s at sorted index %s' \ % (obs, exp, index)) def assertSameItems(self, observed, expected, msg=None): """Fail if the two items contain non-identical elements""" obs_items = list(observed) exp_items = list(expected) if len(obs_items) != len(exp_items): raise self.failureException, \ (msg or 'Observed and expected are different lengths: %s and %s' \ % (len(obs_items), len(exp_items))) obs_ids = [(id(i), i) for i in obs_items] exp_ids = [(id(i), i) for i in exp_items] obs_ids.sort() exp_ids.sort() for index, (obs, exp) in enumerate(zip(obs_ids, exp_ids)): o_id, o = obs e_id, e = exp if o_id != e_id: #i.e. 
the ids are different raise self.failureException, \ (msg or \ 'Observed %s <%s> and expected %s <%s> at sorted index %s' \ % (o, o_id, e, e_id, index)) def assertNotEqualItems(self, observed, expected, msg=None): """Fail if the two items contain only equal elements when sorted""" try: self.assertEqualItems(observed, expected, msg) except: pass else: raise self.failureException, \ (msg or 'Observed %s has same items as %s'%(`observed`, `expected`)) def assertContains(self, observed, item, msg=None): """Fail if item not in observed""" try: if item in observed: return except (TypeError, ValueError): pass raise self.failureException, \ (msg or 'Item %s not found in %s' % (`item`, `observed`)) def assertNotContains(self, observed, item, msg=None): """Fail if item in observed""" try: if item not in observed: return except (TypeError, ValueError): return raise self.failureException, \ (msg or 'Item %s should not have been in %s' % (`item`, `observed`)) def assertGreaterThan(self, observed, value, msg=None): """Fail if observed is <= value""" try: if value is None or observed is None: raise ValueError if (asarray(observed) > value).all(): return except: pass raise self.failureException, \ (msg or 'Observed %s has elements <= %s' % (`observed`, `value`)) def assertLessThan(self, observed, value, msg=None): """Fail if observed is >= value""" try: if value is None or observed is None: raise ValueError if (asarray(observed) < value).all(): return except: pass raise self.failureException, \ (msg or 'Observed %s has elements >= %s' % (`observed`, `value`)) def assertIsBetween(self, observed, min_value, max_value, msg=None): """Fail if observed is not between min_value and max_value""" try: if min_value is None or max_value is None or observed is None: raise ValueError if min_value >= max_value: raise ValueError if logical_and(asarray(observed) < max_value, asarray(observed) > min_value).all(): return except: pass raise self.failureException, \ (msg or 'Observed %s has elements 
not between %s, %s' % \
            (`observed`, `min_value`, `max_value`))

    def assertIsNotBetween(self, observed, min_value, max_value, msg=None):
        """Fail if observed is between min_value and max_value"""
        try:
            if min_value is None or max_value is None or observed is None:
                raise ValueError
            if min_value >= max_value:
                raise ValueError
            if logical_or(asarray(observed) >= max_value,
                          asarray(observed) <= min_value).all():
                return
        except:
            pass
        raise self.failureException, \
            (msg or 'Observed %s has elements between %s, %s' % \
            (`observed`, `min_value`, `max_value`))

    def assertIsProb(self, observed, msg=None):
        """Fail if observed is not between 0.0 and 1.0"""
        try:
            if observed is None:
                raise ValueError
            if (asarray(observed) >= 0.0).all() and \
               (asarray(observed) <= 1.0).all():
                return
        except:
            pass
        raise self.failureException, \
            (msg or 'Observed %s has elements that are not probs' % (`observed`))

    def _set_suite_pvalue(self, pvalue):
        """Sets the test suite pvalue to be used in similarity tests

        This value is by default None. The pvalue used in this case is
        specified in the test module itself.

The purpose of this method is to set the pvalue to be used when running a massive test suite """ self._suite_pvalue = pvalue def assertSimilarMeans(self, observed, expected, pvalue=0.01, msg=None): """Fail if observed p is lower than pvalue""" if self._suite_pvalue: pvalue = self._suite_pvalue observed, expected = asarray(observed), asarray(expected) t, p = t_two_sample(observed, expected) if p > pvalue: return elif p is None or not isfinite(p): #handle case where all elements were the same if not observed.shape: observed = observed.reshape((1,)) if not expected.shape: expected = expected.reshape((1,)) if observed[0] == expected[0]: return else: raise self.failureException, \ (msg or 'p-value %s, t-test p %s' % (`pvalue`, `p`)) def assertSimilarFreqs(self, observed, expected, pvalue=0.01, msg=None): """Fail if observed p is lower than pvalue""" if self._suite_pvalue: pvalue = self._suite_pvalue obs_ravel = ravel(asarray(observed)) exp_ravel = ravel(asarray(expected)) m = zeros((2,len(obs_ravel))) m[0,:] = obs_ravel m[1,:] = exp_ravel G, p = G_ind(m) if p > pvalue: return else: raise self.failureException, \ (msg or 'p-value %s, G-test p %s' % (`pvalue`, `p`)) def assertIsPermutation(self, observed, items, msg=None): """Fail if observed is not a permutation of items""" try: self.assertEqualItems(observed, items) self.assertNotEqual(observed, items) return except: pass raise self.failureException, \ (msg or 'Observed %s is not a different permutation of items %s' % \ (`observed`, `items`)) def assertSameObj(self, observed, expected, msg=None): """Fail if 'observed is not expected'""" try: if observed is expected: return except: pass raise self.failureException, \ (msg or 'Observed %s is not the same as expected %s' % \ (`observed`, `expected`)) def assertNotSameObj(self, observed, expected, msg=None): """Fail if 'observed is expected'""" try: if observed is not expected: return except: pass raise self.failureException, \ (msg or 'Observed %s is the same as expected 
%s' % \ (`observed`, `expected`)) PyCogent-1.5.3/cogent/util/update_version.py000644 000765 000024 00000032165 12024702176 022105 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Support for updating version strings in the PyCogent source tree All .py, .pyx, and .c files descending from cogent/ will be updated All .py files descending from tests/ will be updated All .pyx and .h files descending from include/ will be updated docs/conf.py will be updated setup.py will be updated ###cogent-requirements.txt will be updated### not being updated now """ from optparse import make_option, OptionParser from sys import argv from os import path import os __author__ = "Daniel McDonald" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "mcdonadt@colorado.edu" __status__ = "Development" options = [make_option('--pycogent_dir',dest='pycogent_dir',type='string', default=''), make_option('--new_version',dest='version',type='string', default=''), make_option('--is_release',dest='is_release',\ action='store_true', default=False), make_option('--verbose',dest='verbose',action='store_true', default=False), make_option('--mock_run',dest='mockrun',action='store_true', default=False), make_option('--new_version_short',dest='version_short',type='string', default=None)] class VersionUpdater(object): """Handles version update of files contained within the PyCogent tree""" def __init__(self, PyCogentDirectory=None, Version=None, \ IsRelease=False, Verbose=False, MockRun=False, VersionShort=None): self.PyCogentDirectory = PyCogentDirectory self.Version = Version self.VersionShort = VersionShort self.VersionTuple = tuple(self.Version.split('.')) self.IsRelease = IsRelease self.Verbose = Verbose self.MockRun = MockRun self.CodesDirectory = path.join(self.PyCogentDirectory, 'cogent') self.TestsDirectory = path.join(self.PyCogentDirectory, 'tests') 
self.DocDirectory = path.join(self.PyCogentDirectory, 'doc') self.IncludesDirectory = path.join(self.PyCogentDirectory, 'include') if not os.access(path.join(self.CodesDirectory, '__init__.py'),os.R_OK): raise IOError, "Could not locate cogent/__init__.py" if not os.access(path.join(self.TestsDirectory, '__init__.py'),os.R_OK): raise IOError, "Could not locate tests/__init__.py" if not os.access(path.join(self.DocDirectory, 'conf.py'), os.R_OK): raise IOError, "Could not locate doc/conf.py" if not os.access(path.join(self.IncludesDirectory, \ 'array_interface.h'), os.R_OK): raise IOError, "Cound not locate include/array_interface.h" def _get_base_files(self): """Support method, provides relative locations for files in base dir""" setup_file = path.join(self.PyCogentDirectory, 'setup.py') #reqs_file = path.join(self.PyCogentDirectory, 'cogent-requirements.txt') #return [(setup_file, 'Python'), (reqs_file, 'Properties')] return [(setup_file, 'Python')] def _get_test_files(self): """Support method, provides relative locations for test files""" for dirpath, dirnames, filenames in os.walk(self.TestsDirectory): for f in filenames: if f.endswith('.py'): yield (path.join(dirpath, f), 'Python') def _get_code_files(self): """Support method, provides relative locations for code files Yields file name and file type """ for dirpath, dirnames, filenames in os.walk(self.CodesDirectory): for f in filenames: rel_name = path.join(dirpath, f) if f.endswith('.py'): yield (rel_name, 'Python') elif f.endswith('.pyx'): yield (rel_name, 'PyRex') elif f.endswith('.c'): yield (rel_name, 'C') else: pass def _get_doc_files(self): """Support method, provides relative locations for test files Only yields conf.py currently """ return [(path.join(self.DocDirectory, 'conf.py'), 'Python')] def _get_include_files(self): """Support method, provides relative locations for include files Yields file name and file type """ for dirpath, dirnames, filenames in os.walk(self.IncludesDirectory): for f in 
filenames: rel_name = path.join(dirpath, f) if f.endswith('.pyx'): yield (rel_name, 'PyRex') elif f.endswith('.h'): yield (rel_name, 'Header') else: pass def _update_python_file(self, lines, filename): """Updates the __version__ string of a Python file""" found_version_line = False for lineno, line in enumerate(lines): if line.startswith('__version__'): found_version_line = True break if found_version_line: if self.Verbose: print 'Version string found on line %d' % lineno lines[lineno] = '__version__ = "%s"\n' % self.Version else: print "No version string found in %s" % filename return (lines, found_version_line) def _update_properties_file(self, lines, filename): """Updates version information in specific properties files Expects the properties file to be in "key=value" lines """ found_version_line = False if filename.endswith('cogent-requirements.txt'): for lineno, line in enumerate(lines): if 'packages/source/c/cogent' in line: found_version_line = True break if found_version_line: if self.Verbose: print 'Version string found on line %d' % lineno http_base = lines[lineno].rsplit('/',1)[0] lines[lineno] = '%s/PyCogent-%s.tgz\n' % (http_base, self.Version) else: print "No version string found in %s" % filename return (lines, found_version_line) def _update_doc_conf_file(self, lines, filename): """Updates doc/conf.py file""" versionline = None releaseline = None for lineno, line in enumerate(lines): if line.startswith('version'): versionline = lineno if line.startswith('release'): releaseline = lineno if versionline is not None and releaseline is not None: break if versionline is None: print "No version string found in doc/conf.py" else: if self.Verbose: print 'Version string found on line %d' % versionline lines[versionline] = 'version = "%s"\n' % self.VersionShort if releaseline is None: print "No release string found in doc/conf.py" else: if self.Verbose: print 'Release string found on line %d' % releaseline lines[releaseline] = 'release = "%s"\n' % self.Version 
return (lines, versionline and releaseline) def _update_pyrex_file(self, lines, filename): """Updates __version__ within a pyx file""" found_version_line = False for lineno, line in enumerate(lines): if line.startswith('__version__'): found_version_line = True break if found_version_line: if self.Verbose: print 'Version string found on line %d' % lineno lines[lineno] = '__version__ = "%s"\n' % str(self.VersionTuple) else: print "No version string found in %s" % filename return (lines, found_version_line) def _update_header_file(self, lines, filename): """Updates a C header file""" found_version_line = False for lineno, line in enumerate(lines): if line.startswith('#define PYCOGENT_VERSION'): found_version_line = True break if found_version_line: if self.Verbose: print 'Version string found on line %d' % lineno lines[lineno] = '#define PYCOGENT_VERSION "%s"\n' \ % self.Version else: print "No version string found in %s" % filename return (lines, found_version_line) def _update_c_file(self, lines, filename): """Updates a C file""" # same as C header... 
return self._update_header_file(lines, filename) def _file_writer(self, lines, filename): """Handles writing out to the file system""" if self.MockRun: return if self.Verbose: print "Writing file %s" % filename updated_file = open(filename, 'w') updated_file.write(''.join(lines)) updated_file.close() def updateBaseFiles(self): """Updates version strings in files in base PyCogent directory""" for filename, filetype in self._get_base_files(): lines = open(filename).readlines() if self.Verbose: print 'Reading %s' % filename if filetype is 'Python': lines, write_out = self._update_python_file(lines, filename) elif filetype is 'Properties': lines, write_out = self._update_properties_file(lines,filename) else: raise TypeError, "Unknown base file type %s" % filetype if write_out: self._file_writer(lines, filename) def updateDocFiles(self): """Updates version strings in documentation files So far we only update conf.py """ for filename, filetype in self._get_doc_files(): lines = open(filename).readlines() if self.Verbose: print 'Reading %s' % filename if filename.endswith('conf.py'): lines, write_out = self._update_doc_conf_file(lines, filename) else: raise TypeError, "Unknown doc file type: %s" % filetype if write_out: self._file_writer(lines, filename) def updateIncludeFiles(self): """Updates version strings in include files""" for filename, filetype in self._get_include_files(): lines = open(filename).readlines() found_version_line = False if self.Verbose: print 'Reading %s' % filename if filetype is 'PyRex': lines, write_out = self._update_pyrex_file(lines, filename) elif filetype is 'Header': lines, write_out = self._update_header_file(lines, filename) else: raise TypeError, "Unknown include file type %s" % filetype if write_out: self._file_writer(lines, filename) def updateTestFiles(self): """Updates version strings in test files""" for filename, filetype in self._get_test_files(): lines = open(filename).readlines() found_version_line = False if self.Verbose: print 
'Reading %s' % filename if filetype is 'Python': lines, write_out = self._update_python_file(lines, filename) else: raise TypeError, "Unknown test file type %s" % filetype if write_out: self._file_writer(lines, filename) def updateCodeFiles(self): """Updates version strings in code files""" # if this annoying slow, could probably drop to bash or soemthing # for a search/replace for filename, filetype in self._get_code_files(): lines = open(filename).readlines() found_version_line = False if self.Verbose: print 'Reading %s' % filename if filetype is 'Python': lines, write_out = self._update_python_file(lines, filename) elif filetype is 'PyRex': lines, write_out = self._update_pyrex_file(lines, filename) elif filetype is 'C': lines, write_out = self._update_c_file(lines, filename) else: raise TypeError, "Unknown code file type %s" % filetype if write_out: self._file_writer(lines, filename) def main(arg_list=argv): parser = OptionParser(option_list=options) opts, args = parser.parse_args(args=arg_list) updater = VersionUpdater(PyCogentDirectory=opts.pycogent_dir, Version=opts.version, VersionShort=opts.version_short, IsRelease=opts.is_release, Verbose=opts.verbose, MockRun=opts.mockrun) updater.updateCodeFiles() updater.updateTestFiles() updater.updateDocFiles() updater.updateIncludeFiles() updater.updateBaseFiles() if __name__ == '__main__': main(argv) PyCogent-1.5.3/cogent/util/warning.py000644 000765 000024 00000003247 12024702176 020522 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from warnings import catch_warnings, simplefilter, warn as _warn __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def deprecated(_type, old, new, version, stack_level=2): """a convenience function for deprecating classes, functions, arguments. 
Arguments: - _type should be one of class, method, function, argument - old, new: the old and new names - version: the version by which support for the old name will be discontinued - stack_level: as per warnings.warn""" msg = "use %s %s instead of %s, support discontinued in version %s" % \ (_type, new, old, version) # DeprecationWarnings are ignored by default in python 2.7, so temporarily # force them to be handled. with catch_warnings(): simplefilter("always") _warn(msg, DeprecationWarning, stack_level) def discontinued(_type, name, version, stack_level=2): """convenience func to warn about discontinued attributes Arguments: - _type should be one of class, method, function, argument - name: the attributes name - version: the version by which support for the old name will be discontinued - stack_level: as per warnings.warn""" msg = "%s %s is discontinued, support will be stopped in version %s" %\ (_type, name, version) with catch_warnings(): simplefilter("always") _warn(msg, DeprecationWarning, stack_level) PyCogent-1.5.3/cogent/struct/__init__.py000644 000765 000024 00000000723 12024702176 021157 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Code for dealing with molecular structures.""" __all__ = ['knots', 'pairs_util', 'rna2d', 'selection', 'annotation', 'manipulation'] __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Peter Maxwell", "Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/cogent/struct/_asa.c000644 000765 000024 00001054334 12024702176 020125 0ustar00jrideoutstaff000000 000000 /* Generated by Cython 0.16 on Fri Sep 14 12:12:06 2012 */ #define PY_SSIZE_T_CLEAN #include "Python.h" #ifndef Py_PYTHON_H #error Python headers needed to compile C extensions, please install development version of Python. 
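The catch_warnings/simplefilter pattern used by warning.py above forces a DeprecationWarning to surface even though Python 2.7+ (and Python 3) ignores that category by default, while restoring the caller's filters afterwards. A self-contained version of the same `deprecated` helper:

```python
import warnings


def deprecated(_type, old, new, version, stack_level=2):
    """Warn that `old` is deprecated in favour of `new`."""
    msg = "use %s %s instead of %s, support discontinued in version %s" % (
        _type, new, old, version)
    # DeprecationWarning is ignored by default, so temporarily force it
    # through; catch_warnings restores the previous filters on exit.
    with warnings.catch_warnings():
        warnings.simplefilter("always")
        warnings.warn(msg, DeprecationWarning, stack_level)
```

Because the filter change is scoped to the `with` block, calling this helper does not permanently alter the process-wide warning configuration.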
#elif PY_VERSION_HEX < 0x02040000 #error Cython requires Python 2.4+. #else #include /* For offsetof */ #ifndef offsetof #define offsetof(type, member) ( (size_t) & ((type*)0) -> member ) #endif #if !defined(WIN32) && !defined(MS_WINDOWS) #ifndef __stdcall #define __stdcall #endif #ifndef __cdecl #define __cdecl #endif #ifndef __fastcall #define __fastcall #endif #endif #ifndef DL_IMPORT #define DL_IMPORT(t) t #endif #ifndef DL_EXPORT #define DL_EXPORT(t) t #endif #ifndef PY_LONG_LONG #define PY_LONG_LONG LONG_LONG #endif #ifndef Py_HUGE_VAL #define Py_HUGE_VAL HUGE_VAL #endif #ifdef PYPY_VERSION #define CYTHON_COMPILING_IN_PYPY 1 #define CYTHON_COMPILING_IN_CPYTHON 0 #else #define CYTHON_COMPILING_IN_PYPY 0 #define CYTHON_COMPILING_IN_CPYTHON 1 #endif #if CYTHON_COMPILING_IN_PYPY #define __Pyx_PyCFunction_Call PyObject_Call #else #define __Pyx_PyCFunction_Call PyCFunction_Call #endif #if PY_VERSION_HEX < 0x02050000 typedef int Py_ssize_t; #define PY_SSIZE_T_MAX INT_MAX #define PY_SSIZE_T_MIN INT_MIN #define PY_FORMAT_SIZE_T "" #define PyInt_FromSsize_t(z) PyInt_FromLong(z) #define PyInt_AsSsize_t(o) __Pyx_PyInt_AsInt(o) #define PyNumber_Index(o) PyNumber_Int(o) #define PyIndex_Check(o) PyNumber_Check(o) #define PyErr_WarnEx(category, message, stacklevel) PyErr_Warn(category, message) #define __PYX_BUILD_PY_SSIZE_T "i" #else #define __PYX_BUILD_PY_SSIZE_T "n" #endif #if PY_VERSION_HEX < 0x02060000 #define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt) #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type) #define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size) #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size, #define PyType_Modified(t) typedef struct { void *buf; PyObject *obj; Py_ssize_t len; Py_ssize_t itemsize; int readonly; int ndim; char *format; Py_ssize_t *shape; Py_ssize_t *strides; Py_ssize_t *suboffsets; void *internal; } Py_buffer; #define PyBUF_SIMPLE 0 #define PyBUF_WRITABLE 0x0001 #define PyBUF_FORMAT 0x0004 #define PyBUF_ND 0x0008 
  #define PyBUF_STRIDES (0x0010 | PyBUF_ND)
  #define PyBUF_C_CONTIGUOUS (0x0020 | PyBUF_STRIDES)
  #define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES)
  #define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES)
  #define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES)
  #define PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_FORMAT | PyBUF_WRITABLE)
  #define PyBUF_FULL (PyBUF_INDIRECT | PyBUF_FORMAT | PyBUF_WRITABLE)
  typedef int (*getbufferproc)(PyObject *, Py_buffer *, int);
  typedef void (*releasebufferproc)(PyObject *, Py_buffer *);
#endif
#if PY_MAJOR_VERSION < 3
  #define __Pyx_BUILTIN_MODULE_NAME "__builtin__"
  #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \
          PyCode_New(a, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
#else
  #define __Pyx_BUILTIN_MODULE_NAME "builtins"
  #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \
          PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
#endif
#if PY_MAJOR_VERSION < 3 && PY_MINOR_VERSION < 6
  #define PyUnicode_FromString(s) PyUnicode_Decode(s, strlen(s), "UTF-8", "strict")
#endif
#if PY_MAJOR_VERSION >= 3
  #define Py_TPFLAGS_CHECKTYPES 0
  #define Py_TPFLAGS_HAVE_INDEX 0
#endif
#if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3)
  #define Py_TPFLAGS_HAVE_NEWBUFFER 0
#endif
#if PY_VERSION_HEX > 0x03030000 && defined(PyUnicode_GET_LENGTH)
  #define CYTHON_PEP393_ENABLED 1
  #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_LENGTH(u)
  #define __Pyx_PyUnicode_READ_CHAR(u, i) PyUnicode_READ_CHAR(u, i)
#else
  #define CYTHON_PEP393_ENABLED 0
  #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_SIZE(u)
  #define __Pyx_PyUnicode_READ_CHAR(u, i) ((Py_UCS4)(PyUnicode_AS_UNICODE(u)[i]))
#endif
#if PY_MAJOR_VERSION >= 3
  #define PyBaseString_Type PyUnicode_Type
  #define PyStringObject PyUnicodeObject
  #define PyString_Type PyUnicode_Type
  #define PyString_Check PyUnicode_Check
  #define PyString_CheckExact PyUnicode_CheckExact
#endif
#if PY_VERSION_HEX < 0x02060000
  #define PyBytesObject PyStringObject
  #define PyBytes_Type PyString_Type
  #define PyBytes_Check PyString_Check
  #define PyBytes_CheckExact PyString_CheckExact
  #define PyBytes_FromString PyString_FromString
  #define PyBytes_FromStringAndSize PyString_FromStringAndSize
  #define PyBytes_FromFormat PyString_FromFormat
  #define PyBytes_DecodeEscape PyString_DecodeEscape
  #define PyBytes_AsString PyString_AsString
  #define PyBytes_AsStringAndSize PyString_AsStringAndSize
  #define PyBytes_Size PyString_Size
  #define PyBytes_AS_STRING PyString_AS_STRING
  #define PyBytes_GET_SIZE PyString_GET_SIZE
  #define PyBytes_Repr PyString_Repr
  #define PyBytes_Concat PyString_Concat
  #define PyBytes_ConcatAndDel PyString_ConcatAndDel
#endif
#if PY_VERSION_HEX < 0x02060000
  #define PySet_Check(obj) PyObject_TypeCheck(obj, &PySet_Type)
  #define PyFrozenSet_Check(obj) PyObject_TypeCheck(obj, &PyFrozenSet_Type)
#endif
#ifndef PySet_CheckExact
  #define PySet_CheckExact(obj) (Py_TYPE(obj) == &PySet_Type)
#endif
#define __Pyx_TypeCheck(obj, type) PyObject_TypeCheck(obj, (PyTypeObject *)type)
#if PY_MAJOR_VERSION >= 3
  #define PyIntObject PyLongObject
  #define PyInt_Type PyLong_Type
  #define PyInt_Check(op) PyLong_Check(op)
  #define PyInt_CheckExact(op) PyLong_CheckExact(op)
  #define PyInt_FromString PyLong_FromString
  #define PyInt_FromUnicode PyLong_FromUnicode
  #define PyInt_FromLong PyLong_FromLong
  #define PyInt_FromSize_t PyLong_FromSize_t
  #define PyInt_FromSsize_t PyLong_FromSsize_t
  #define PyInt_AsLong PyLong_AsLong
  #define PyInt_AS_LONG PyLong_AS_LONG
  #define PyInt_AsSsize_t PyLong_AsSsize_t
  #define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask
  #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask
#endif
#if PY_MAJOR_VERSION >= 3
  #define PyBoolObject PyLongObject
#endif
#if PY_VERSION_HEX < 0x03020000
  typedef long Py_hash_t;
  #define __Pyx_PyInt_FromHash_t PyInt_FromLong
  #define __Pyx_PyInt_AsHash_t PyInt_AsLong
#else
  #define __Pyx_PyInt_FromHash_t PyInt_FromSsize_t
  #define __Pyx_PyInt_AsHash_t PyInt_AsSsize_t
#endif
#if (PY_MAJOR_VERSION < 3) || (PY_VERSION_HEX >= 0x03010300)
  #define __Pyx_PySequence_GetSlice(obj, a, b) PySequence_GetSlice(obj, a, b)
  #define __Pyx_PySequence_SetSlice(obj, a, b, value) PySequence_SetSlice(obj, a, b, value)
  #define __Pyx_PySequence_DelSlice(obj, a, b) PySequence_DelSlice(obj, a, b)
#else
  #define __Pyx_PySequence_GetSlice(obj, a, b) (unlikely(!(obj)) ? \
        (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), (PyObject*)0) : \
        (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_GetSlice(obj, a, b)) : \
        (PyErr_Format(PyExc_TypeError, "'%.200s' object is unsliceable", (obj)->ob_type->tp_name), (PyObject*)0)))
  #define __Pyx_PySequence_SetSlice(obj, a, b, value) (unlikely(!(obj)) ? \
        (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \
        (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_SetSlice(obj, a, b, value)) : \
        (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice assignment", (obj)->ob_type->tp_name), -1)))
  #define __Pyx_PySequence_DelSlice(obj, a, b) (unlikely(!(obj)) ? \
        (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \
        (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_DelSlice(obj, a, b)) : \
        (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice deletion", (obj)->ob_type->tp_name), -1)))
#endif
#if PY_MAJOR_VERSION >= 3
  #define PyMethod_New(func, self, klass) ((self) ? PyMethod_New(func, self) : PyInstanceMethod_New(func))
#endif
#if PY_VERSION_HEX < 0x02050000
  #define __Pyx_GetAttrString(o,n)   PyObject_GetAttrString((o),((char *)(n)))
  #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),((char *)(n)),(a))
  #define __Pyx_DelAttrString(o,n)   PyObject_DelAttrString((o),((char *)(n)))
#else
  #define __Pyx_GetAttrString(o,n)   PyObject_GetAttrString((o),(n))
  #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),(n),(a))
  #define __Pyx_DelAttrString(o,n)   PyObject_DelAttrString((o),(n))
#endif
#if PY_VERSION_HEX < 0x02050000
  #define __Pyx_NAMESTR(n) ((char *)(n))
  #define __Pyx_DOCSTR(n)  ((char *)(n))
#else
  #define __Pyx_NAMESTR(n) (n)
  #define __Pyx_DOCSTR(n)  (n)
#endif
#if PY_MAJOR_VERSION >= 3
  #define __Pyx_PyNumber_Divide(x,y)        PyNumber_TrueDivide(x,y)
  #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceTrueDivide(x,y)
#else
  #define __Pyx_PyNumber_Divide(x,y)        PyNumber_Divide(x,y)
  #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceDivide(x,y)
#endif
#ifndef __PYX_EXTERN_C
  #ifdef __cplusplus
    #define __PYX_EXTERN_C extern "C"
  #else
    #define __PYX_EXTERN_C extern
  #endif
#endif
#if defined(WIN32) || defined(MS_WINDOWS)
#define _USE_MATH_DEFINES
#endif
#include <math.h>
#define __PYX_HAVE__cogent__struct___asa
#define __PYX_HAVE_API__cogent__struct___asa
#include "stdio.h"
#include "stdlib.h"
#include "numpy/arrayobject.h"
#include "numpy/ufuncobject.h"
#include "math.h"
#ifdef _OPENMP
#include <omp.h>
#endif /* _OPENMP */
#ifdef PYREX_WITHOUT_ASSERTIONS
#define CYTHON_WITHOUT_ASSERTIONS
#endif

/* inline attribute */
#ifndef CYTHON_INLINE
  #if defined(__GNUC__)
    #define CYTHON_INLINE __inline__
  #elif defined(_MSC_VER)
    #define CYTHON_INLINE __inline
  #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #define CYTHON_INLINE inline
  #else
    #define CYTHON_INLINE
  #endif
#endif

/* unused attribute */
#ifndef CYTHON_UNUSED
# if defined(__GNUC__)
#   if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
#     define CYTHON_UNUSED __attribute__ ((__unused__))
#   else
#     define CYTHON_UNUSED
#   endif
# elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER))
#   define CYTHON_UNUSED __attribute__ ((__unused__))
# else
#   define CYTHON_UNUSED
# endif
#endif

typedef struct {PyObject **p; char *s; const long n; const char* encoding; const char is_unicode; const char is_str; const char intern; } __Pyx_StringTabEntry; /*proto*/

/* Type Conversion Predeclarations */
#define __Pyx_PyBytes_FromUString(s) PyBytes_FromString((char*)s)
#define __Pyx_PyBytes_AsUString(s)   ((unsigned char*) PyBytes_AsString(s))
#define __Pyx_Owned_Py_None(b) (Py_INCREF(Py_None), Py_None)
#define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False))
static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject*);
static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x);
static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*);
static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t);
static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject*);
#define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x))
#define __pyx_PyFloat_AsFloat(x) ((float) __pyx_PyFloat_AsDouble(x))
#ifdef __GNUC__
  /* Test for GCC > 2.95 */
  #if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95))
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)
  #else /* __GNUC__ > 2 ... */
    #define likely(x)   (x)
    #define unlikely(x) (x)
  #endif /* __GNUC__ > 2 ... */
#else /* __GNUC__ */
  #define likely(x)   (x)
  #define unlikely(x) (x)
#endif /* __GNUC__ */

static PyObject *__pyx_m;
static PyObject *__pyx_b;
static PyObject *__pyx_empty_tuple;
static PyObject *__pyx_empty_bytes;
static int __pyx_lineno;
static int __pyx_clineno = 0;
static const char * __pyx_cfilenm= __FILE__;
static const char *__pyx_filename;

#if !defined(CYTHON_CCOMPLEX)
  #if defined(__cplusplus)
    #define CYTHON_CCOMPLEX 1
  #elif defined(_Complex_I)
    #define CYTHON_CCOMPLEX 1
  #else
    #define CYTHON_CCOMPLEX 0
  #endif
#endif
#if CYTHON_CCOMPLEX
  #ifdef __cplusplus
    #include <complex>
  #else
    #include <complex.h>
  #endif
#endif
#if CYTHON_CCOMPLEX && !defined(__cplusplus) && defined(__sun__) && defined(__GNUC__)
  #undef _Complex_I
  #define _Complex_I 1.0fj
#endif

static const char *__pyx_f[] = {
  "_asa.pyx",
  "numpy.pxd",
};
#define IS_UNSIGNED(type) (((type) -1) > 0)
struct __Pyx_StructField_;
#define __PYX_BUF_FLAGS_PACKED_STRUCT (1 << 0)
typedef struct {
  const char* name; /* for error messages only */
  struct __Pyx_StructField_* fields;
  size_t size;         /* sizeof(type) */
  size_t arraysize[8]; /* length of array in each dimension */
  int ndim;
  char typegroup; /* _R_eal, _C_omplex, Signed _I_nt, _U_nsigned int, _S_truct, _P_ointer, _O_bject */
  char is_unsigned;
  int flags;
} __Pyx_TypeInfo;
typedef struct __Pyx_StructField_ {
  __Pyx_TypeInfo* type;
  const char* name;
  size_t offset;
} __Pyx_StructField;
typedef struct {
  __Pyx_StructField* field;
  size_t parent_offset;
} __Pyx_BufFmt_StackElem;
typedef struct {
  __Pyx_StructField root;
  __Pyx_BufFmt_StackElem* head;
  size_t fmt_offset;
  size_t new_count, enc_count;
  size_t struct_alignment;
  int is_complex;
  char enc_type;
  char new_packmode;
  char enc_packmode;
  char is_valid_array;
} __Pyx_BufFmt_Context;

/* "numpy.pxd":722
 * # in Cython to enable them only on the right systems.
* * ctypedef npy_int8 int8_t # <<<<<<<<<<<<<< * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t */ typedef npy_int8 __pyx_t_5numpy_int8_t; /* "numpy.pxd":723 * * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t # <<<<<<<<<<<<<< * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t */ typedef npy_int16 __pyx_t_5numpy_int16_t; /* "numpy.pxd":724 * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t # <<<<<<<<<<<<<< * ctypedef npy_int64 int64_t * #ctypedef npy_int96 int96_t */ typedef npy_int32 __pyx_t_5numpy_int32_t; /* "numpy.pxd":725 * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t # <<<<<<<<<<<<<< * #ctypedef npy_int96 int96_t * #ctypedef npy_int128 int128_t */ typedef npy_int64 __pyx_t_5numpy_int64_t; /* "numpy.pxd":729 * #ctypedef npy_int128 int128_t * * ctypedef npy_uint8 uint8_t # <<<<<<<<<<<<<< * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t */ typedef npy_uint8 __pyx_t_5numpy_uint8_t; /* "numpy.pxd":730 * * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t # <<<<<<<<<<<<<< * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t */ typedef npy_uint16 __pyx_t_5numpy_uint16_t; /* "numpy.pxd":731 * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t # <<<<<<<<<<<<<< * ctypedef npy_uint64 uint64_t * #ctypedef npy_uint96 uint96_t */ typedef npy_uint32 __pyx_t_5numpy_uint32_t; /* "numpy.pxd":732 * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t # <<<<<<<<<<<<<< * #ctypedef npy_uint96 uint96_t * #ctypedef npy_uint128 uint128_t */ typedef npy_uint64 __pyx_t_5numpy_uint64_t; /* "numpy.pxd":736 * #ctypedef npy_uint128 uint128_t * * ctypedef npy_float32 float32_t # <<<<<<<<<<<<<< * ctypedef npy_float64 float64_t * #ctypedef npy_float80 float80_t */ typedef npy_float32 __pyx_t_5numpy_float32_t; /* "numpy.pxd":737 * * ctypedef npy_float32 float32_t * ctypedef npy_float64 float64_t 
# <<<<<<<<<<<<<< * #ctypedef npy_float80 float80_t * #ctypedef npy_float128 float128_t */ typedef npy_float64 __pyx_t_5numpy_float64_t; /* "numpy.pxd":746 * # The int types are mapped a bit surprising -- * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t # <<<<<<<<<<<<<< * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t */ typedef npy_long __pyx_t_5numpy_int_t; /* "numpy.pxd":747 * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t * ctypedef npy_longlong long_t # <<<<<<<<<<<<<< * ctypedef npy_longlong longlong_t * */ typedef npy_longlong __pyx_t_5numpy_long_t; /* "numpy.pxd":748 * ctypedef npy_long int_t * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t # <<<<<<<<<<<<<< * * ctypedef npy_ulong uint_t */ typedef npy_longlong __pyx_t_5numpy_longlong_t; /* "numpy.pxd":750 * ctypedef npy_longlong longlong_t * * ctypedef npy_ulong uint_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t */ typedef npy_ulong __pyx_t_5numpy_uint_t; /* "numpy.pxd":751 * * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulonglong_t * */ typedef npy_ulonglong __pyx_t_5numpy_ulong_t; /* "numpy.pxd":752 * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t # <<<<<<<<<<<<<< * * ctypedef npy_intp intp_t */ typedef npy_ulonglong __pyx_t_5numpy_ulonglong_t; /* "numpy.pxd":754 * ctypedef npy_ulonglong ulonglong_t * * ctypedef npy_intp intp_t # <<<<<<<<<<<<<< * ctypedef npy_uintp uintp_t * */ typedef npy_intp __pyx_t_5numpy_intp_t; /* "numpy.pxd":755 * * ctypedef npy_intp intp_t * ctypedef npy_uintp uintp_t # <<<<<<<<<<<<<< * * ctypedef npy_double float_t */ typedef npy_uintp __pyx_t_5numpy_uintp_t; /* "numpy.pxd":757 * ctypedef npy_uintp uintp_t * * ctypedef npy_double float_t # <<<<<<<<<<<<<< * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t */ typedef 
npy_double __pyx_t_5numpy_float_t; /* "numpy.pxd":758 * * ctypedef npy_double float_t * ctypedef npy_double double_t # <<<<<<<<<<<<<< * ctypedef npy_longdouble longdouble_t * */ typedef npy_double __pyx_t_5numpy_double_t; /* "numpy.pxd":759 * ctypedef npy_double float_t * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cfloat cfloat_t */ typedef npy_longdouble __pyx_t_5numpy_longdouble_t; /* "cogent/maths/spatial/ckd3.pxd":2 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t # <<<<<<<<<<<<<< * ctypedef np.npy_uint64 UTYPE_t * */ typedef npy_float64 __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t; /* "cogent/maths/spatial/ckd3.pxd":3 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t * ctypedef np.npy_uint64 UTYPE_t # <<<<<<<<<<<<<< * * cdef enum constants: */ typedef npy_uint64 __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t; /* "cogent/struct/_asa.pxd":2 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t # <<<<<<<<<<<<<< * ctypedef np.npy_uint64 UTYPE_t */ typedef npy_float64 __pyx_t_6cogent_6struct_4_asa_DTYPE_t; /* "cogent/struct/_asa.pxd":3 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t * ctypedef np.npy_uint64 UTYPE_t # <<<<<<<<<<<<<< */ typedef npy_uint64 __pyx_t_6cogent_6struct_4_asa_UTYPE_t; #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< float > __pyx_t_float_complex; #else typedef float _Complex __pyx_t_float_complex; #endif #else typedef struct { float real, imag; } __pyx_t_float_complex; #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< double > __pyx_t_double_complex; #else typedef double _Complex __pyx_t_double_complex; #endif #else typedef struct { double real, imag; } __pyx_t_double_complex; #endif /*--- Type declarations ---*/ /* "numpy.pxd":761 * ctypedef npy_longdouble longdouble_t * * ctypedef npy_cfloat cfloat_t # <<<<<<<<<<<<<< * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t */ typedef npy_cfloat 
__pyx_t_5numpy_cfloat_t; /* "numpy.pxd":762 * * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t # <<<<<<<<<<<<<< * ctypedef npy_clongdouble clongdouble_t * */ typedef npy_cdouble __pyx_t_5numpy_cdouble_t; /* "numpy.pxd":763 * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cdouble complex_t */ typedef npy_clongdouble __pyx_t_5numpy_clongdouble_t; /* "numpy.pxd":765 * ctypedef npy_clongdouble clongdouble_t * * ctypedef npy_cdouble complex_t # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew1(a): */ typedef npy_cdouble __pyx_t_5numpy_complex_t; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode; /* "cogent/maths/spatial/ckd3.pxd":5 * ctypedef np.npy_uint64 UTYPE_t * * cdef enum constants: # <<<<<<<<<<<<<< * NSTACK = 100 * */ enum __pyx_t_6cogent_5maths_7spatial_4ckd3_constants { __pyx_e_6cogent_5maths_7spatial_4ckd3_NSTACK = 100 }; /* "cogent/maths/spatial/ckd3.pxd":8 * NSTACK = 100 * * cdef struct kdpoint: # <<<<<<<<<<<<<< * UTYPE_t index * DTYPE_t *coords */ struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t index; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *coords; }; /* "cogent/maths/spatial/ckd3.pxd":12 * DTYPE_t *coords * * cdef struct kdnode: # <<<<<<<<<<<<<< * UTYPE_t bucket # 1 if leaf-bucket, 0 if node * int dimension */ struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t bucket; int dimension; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t position; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t start; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t end; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *left; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *right; }; #ifndef CYTHON_REFNANNY #define CYTHON_REFNANNY 0 #endif #if CYTHON_REFNANNY typedef struct { void (*INCREF)(void*, 
PyObject*, int); void (*DECREF)(void*, PyObject*, int); void (*GOTREF)(void*, PyObject*, int); void (*GIVEREF)(void*, PyObject*, int); void* (*SetupContext)(const char*, int, const char*); void (*FinishContext)(void**); } __Pyx_RefNannyAPIStruct; static __Pyx_RefNannyAPIStruct *__Pyx_RefNanny = NULL; static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname); /*proto*/ #define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL; #ifdef WITH_THREAD #define __Pyx_RefNannySetupContext(name, acquire_gil) \ if (acquire_gil) { \ PyGILState_STATE __pyx_gilstate_save = PyGILState_Ensure(); \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ PyGILState_Release(__pyx_gilstate_save); \ } else { \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ } #else #define __Pyx_RefNannySetupContext(name, acquire_gil) \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__) #endif #define __Pyx_RefNannyFinishContext() \ __Pyx_RefNanny->FinishContext(&__pyx_refnanny) #define __Pyx_INCREF(r) __Pyx_RefNanny->INCREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_DECREF(r) __Pyx_RefNanny->DECREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GOTREF(r) __Pyx_RefNanny->GOTREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GIVEREF(r) __Pyx_RefNanny->GIVEREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_XINCREF(r) do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0) #define __Pyx_XDECREF(r) do { if((r) != NULL) {__Pyx_DECREF(r); }} while(0) #define __Pyx_XGOTREF(r) do { if((r) != NULL) {__Pyx_GOTREF(r); }} while(0) #define __Pyx_XGIVEREF(r) do { if((r) != NULL) {__Pyx_GIVEREF(r);}} while(0) #else #define __Pyx_RefNannyDeclarations #define __Pyx_RefNannySetupContext(name, acquire_gil) #define __Pyx_RefNannyFinishContext() #define __Pyx_INCREF(r) Py_INCREF(r) #define __Pyx_DECREF(r) Py_DECREF(r) #define __Pyx_GOTREF(r) #define __Pyx_GIVEREF(r) #define __Pyx_XINCREF(r) 
Py_XINCREF(r) #define __Pyx_XDECREF(r) Py_XDECREF(r) #define __Pyx_XGOTREF(r) #define __Pyx_XGIVEREF(r) #endif /* CYTHON_REFNANNY */ #define __Pyx_CLEAR(r) do { PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);} while(0) #define __Pyx_XCLEAR(r) do { if((r) != NULL) {PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);}} while(0) static void __Pyx_RaiseArgtupleInvalid(const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found); /*proto*/ static void __Pyx_RaiseDoubleKeywordsError(const char* func_name, PyObject* kw_name); /*proto*/ static int __Pyx_ParseOptionalKeywords(PyObject *kwds, PyObject **argnames[], \ PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, \ const char* function_name); /*proto*/ static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact); /*proto*/ static CYTHON_INLINE int __Pyx_GetBufferAndValidate(Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack); static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info); static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name); /*proto*/ #define __Pyx_BufPtrStrided1d(type, buf, i0, s0) (type)((char*)buf + i0 * s0) static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb); /*proto*/ static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb); /*proto*/ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause); /*proto*/ static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index); static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected); static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void); static void __Pyx_UnpackTupleError(PyObject *, Py_ssize_t index); /*proto*/ static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type); /*proto*/ typedef struct { Py_ssize_t 
shape, strides, suboffsets; } __Pyx_Buf_DimInfo; typedef struct { size_t refcount; Py_buffer pybuffer; } __Pyx_Buffer; typedef struct { __Pyx_Buffer *rcbuffer; char *data; __Pyx_Buf_DimInfo diminfo[8]; } __Pyx_LocalBuf_ND; #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags); static void __Pyx_ReleaseBuffer(Py_buffer *view); #else #define __Pyx_GetBuffer PyObject_GetBuffer #define __Pyx_ReleaseBuffer PyBuffer_Release #endif static Py_ssize_t __Pyx_zeros[] = {0, 0, 0, 0, 0, 0, 0, 0}; static Py_ssize_t __Pyx_minusones[] = {-1, -1, -1, -1, -1, -1, -1, -1}; static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, long level); /*proto*/ static CYTHON_INLINE PyObject *__Pyx_PyInt_to_py_Py_intptr_t(Py_intptr_t); #if CYTHON_CCOMPLEX #ifdef __cplusplus #define __Pyx_CREAL(z) ((z).real()) #define __Pyx_CIMAG(z) ((z).imag()) #else #define __Pyx_CREAL(z) (__real__(z)) #define __Pyx_CIMAG(z) (__imag__(z)) #endif #else #define __Pyx_CREAL(z) ((z).real) #define __Pyx_CIMAG(z) ((z).imag) #endif #if defined(_WIN32) && defined(__cplusplus) && CYTHON_CCOMPLEX #define __Pyx_SET_CREAL(z,x) ((z).real(x)) #define __Pyx_SET_CIMAG(z,y) ((z).imag(y)) #else #define __Pyx_SET_CREAL(z,x) __Pyx_CREAL(z) = (x) #define __Pyx_SET_CIMAG(z,y) __Pyx_CIMAG(z) = (y) #endif static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float, float); #if CYTHON_CCOMPLEX #define __Pyx_c_eqf(a, b) ((a)==(b)) #define __Pyx_c_sumf(a, b) ((a)+(b)) #define __Pyx_c_difff(a, b) ((a)-(b)) #define __Pyx_c_prodf(a, b) ((a)*(b)) #define __Pyx_c_quotf(a, b) ((a)/(b)) #define __Pyx_c_negf(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zerof(z) ((z)==(float)0) #define __Pyx_c_conjf(z) (::std::conj(z)) #if 1 #define __Pyx_c_absf(z) (::std::abs(z)) #define __Pyx_c_powf(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zerof(z) ((z)==0) #define __Pyx_c_conjf(z) (conjf(z)) #if 1 #define __Pyx_c_absf(z) (cabsf(z)) #define __Pyx_c_powf(a, b) (cpowf(a, 
b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex); static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex); #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex, __pyx_t_float_complex); #endif #endif static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double, double); #if CYTHON_CCOMPLEX #define __Pyx_c_eq(a, b) ((a)==(b)) #define __Pyx_c_sum(a, b) ((a)+(b)) #define __Pyx_c_diff(a, b) ((a)-(b)) #define __Pyx_c_prod(a, b) ((a)*(b)) #define __Pyx_c_quot(a, b) ((a)/(b)) #define __Pyx_c_neg(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zero(z) ((z)==(double)0) #define __Pyx_c_conj(z) (::std::conj(z)) #if 1 #define __Pyx_c_abs(z) (::std::abs(z)) #define __Pyx_c_pow(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zero(z) ((z)==0) #define __Pyx_c_conj(z) (conj(z)) #if 1 #define __Pyx_c_abs(z) (cabs(z)) #define __Pyx_c_pow(a, b) (cpow(a, b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex 
__Pyx_c_prod(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex); static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex); #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex, __pyx_t_double_complex); #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject *); static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject *); static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject *); static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject *); static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject *); static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject *); static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject *); static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject *); static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject *); static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject *); static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject *); static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject *); static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject *); static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject *); static CYTHON_INLINE npy_uint64 __Pyx_PyInt_from_py_npy_uint64(PyObject *); static int __Pyx_check_binary_version(void); #if !defined(__Pyx_PyIdentifier_FromString) #if PY_MAJOR_VERSION < 3 #define __Pyx_PyIdentifier_FromString(s) PyString_FromString(s) #else #define 
__Pyx_PyIdentifier_FromString(s) PyUnicode_FromString(s) #endif #endif static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict); /*proto*/ static PyObject *__Pyx_ImportModule(const char *name); /*proto*/ static int __Pyx_ImportFunction(PyObject *module, const char *funcname, void (**f)(void), const char *sig); /*proto*/ typedef struct { int code_line; PyCodeObject* code_object; } __Pyx_CodeObjectCacheEntry; struct __Pyx_CodeObjectCache { int count; int max_count; __Pyx_CodeObjectCacheEntry* entries; }; static struct __Pyx_CodeObjectCache __pyx_code_cache = {0,0,NULL}; static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line); static PyCodeObject *__pyx_find_code_object(int code_line); static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object); static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename); /*proto*/ static int __Pyx_InitStrings(__Pyx_StringTabEntry *t); /*proto*/ /* Module declarations from 'cpython.buffer' */ /* Module declarations from 'cpython.ref' */ /* Module declarations from 'libc.stdio' */ /* Module declarations from 'cpython.object' */ /* Module declarations from 'libc.stdlib' */ /* Module declarations from 'numpy' */ /* Module declarations from 'numpy' */ static PyTypeObject *__pyx_ptype_5numpy_dtype = 0; static PyTypeObject *__pyx_ptype_5numpy_flatiter = 0; static PyTypeObject *__pyx_ptype_5numpy_broadcast = 0; static PyTypeObject *__pyx_ptype_5numpy_ndarray = 0; static PyTypeObject *__pyx_ptype_5numpy_ufunc = 0; static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *, char *, char *, int *); /*proto*/ /* Module declarations from 'cython' */ /* Module declarations from 'cogent.maths.spatial.ckd3' */ static struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *(*__pyx_f_6cogent_5maths_7spatial_4ckd3_points)(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *, 
__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *(*__pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree)(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t (*__pyx_f_6cogent_5maths_7spatial_4ckd3_rn)(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ /* Module declarations from 'stdlib' */ /* Module declarations from 'cogent.struct._asa' */ static __Pyx_TypeInfo __Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_4_asa_DTYPE_t = { "DTYPE_t", NULL, sizeof(__pyx_t_6cogent_6struct_4_asa_DTYPE_t), { 0 }, 0, 'R', 0, 0 }; #define __Pyx_MODULE_NAME "cogent.struct._asa" int __pyx_module_is_main_cogent__struct___asa = 0; /* Implementation of 'cogent.struct._asa' */ static PyObject *__pyx_builtin_ValueError; static PyObject *__pyx_builtin_range; static PyObject *__pyx_builtin_RuntimeError; static PyObject *__pyx_pf_6cogent_6struct_4_asa_asa_loop(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_qcoords, PyArrayObject *__pyx_v_lcoords, PyArrayObject *__pyx_v_qradii, PyArrayObject *__pyx_v_lradii, PyArrayObject *__pyx_v_spoints, PyArrayObject *__pyx_v_box, __pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_v_probe, __pyx_t_6cogent_6struct_4_asa_UTYPE_t __pyx_v_bucket_size, PyObject *__pyx_v_MAXSYM); /* proto */ static 
int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /* proto */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info); /* proto */ static char __pyx_k_1[] = "ndarray is not C contiguous"; static char __pyx_k_3[] = "ndarray is not Fortran contiguous"; static char __pyx_k_5[] = "Non-native byte order not supported"; static char __pyx_k_7[] = "unknown dtype code in numpy.pxd (%d)"; static char __pyx_k_8[] = "Format string allocated too short, see comment in numpy.pxd"; static char __pyx_k_11[] = "Format string allocated too short."; static char __pyx_k_13[] = "('1', '5', '3')"; static char __pyx_k_16[] = "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/struct/_asa.pyx"; static char __pyx_k_17[] = "cogent.struct._asa"; static char __pyx_k__B[] = "B"; static char __pyx_k__H[] = "H"; static char __pyx_k__I[] = "I"; static char __pyx_k__L[] = "L"; static char __pyx_k__O[] = "O"; static char __pyx_k__Q[] = "Q"; static char __pyx_k__b[] = "b"; static char __pyx_k__d[] = "d"; static char __pyx_k__f[] = "f"; static char __pyx_k__g[] = "g"; static char __pyx_k__h[] = "h"; static char __pyx_k__i[] = "i"; static char __pyx_k__l[] = "l"; static char __pyx_k__q[] = "q"; static char __pyx_k__Zd[] = "Zd"; static char __pyx_k__Zf[] = "Zf"; static char __pyx_k__Zg[] = "Zg"; static char __pyx_k__np[] = "np"; static char __pyx_k__box[] = "box"; static char __pyx_k__idx[] = "idx"; static char __pyx_k__idxn[] = "idxn"; static char __pyx_k__idxs[] = "idxs"; static char __pyx_k__lidx[] = "lidx"; static char __pyx_k__ridx[] = "ridx"; static char __pyx_k__tree[] = "tree"; static char __pyx_k__areas[] = "areas"; static char __pyx_k__box_c[] = "box_c"; static char __pyx_k__dtype[] = "dtype"; static char __pyx_k__lsize[] = "lsize"; static char __pyx_k__numpy[] = "numpy"; static char __pyx_k__probe[] = "probe"; static char __pyx_k__qpnts[] = "qpnts"; static char 
__pyx_k__range[] = "range"; static char __pyx_k__ssize[] = "ssize"; static char __pyx_k__t_arr[] = "t_arr"; static char __pyx_k__t_lid[] = "t_lid"; static char __pyx_k__t_ptr[] = "t_ptr"; static char __pyx_k__MAXSYM[] = "MAXSYM"; static char __pyx_k__dstptr[] = "dstptr"; static char __pyx_k__idxptr[] = "idxptr"; static char __pyx_k__kdpnts[] = "kdpnts"; static char __pyx_k__lidx_c[] = "lidx_c"; static char __pyx_k__lradii[] = "lradii"; static char __pyx_k__qradii[] = "qradii"; static char __pyx_k__float64[] = "float64"; static char __pyx_k__lcoords[] = "lcoords"; static char __pyx_k__lradius[] = "lradius"; static char __pyx_k__qcoords[] = "qcoords"; static char __pyx_k__qradius[] = "qradius"; static char __pyx_k__rspoint[] = "rspoint"; static char __pyx_k__spoints[] = "spoints"; static char __pyx_k____main__[] = "__main__"; static char __pyx_k____test__[] = "__test__"; static char __pyx_k__asa_loop[] = "asa_loop"; static char __pyx_k__const_pi[] = "const_pi"; static char __pyx_k__distance[] = "distance"; static char __pyx_k__ridx_div[] = "ridx_div"; static char __pyx_k__idxn_skip[] = "idxn_skip"; static char __pyx_k__lcoords_c[] = "lcoords_c"; static char __pyx_k__qcoords_c[] = "qcoords_c"; static char __pyx_k__spoints_c[] = "spoints_c"; static char __pyx_k__ValueError[] = "ValueError"; static char __pyx_k____version__[] = "__version__"; static char __pyx_k__bucket_size[] = "bucket_size"; static char __pyx_k__distance_sq[] = "distance_sq"; static char __pyx_k__lradius_max[] = "lradius_max"; static char __pyx_k__RuntimeError[] = "RuntimeError"; static char __pyx_k__n_acc_points[] = "n_acc_points"; static char __pyx_k__search_limit[] = "search_limit"; static char __pyx_k__search_point[] = "search_point"; static char __pyx_k__is_accessible[] = "is_accessible"; static char __pyx_k__neighbor_number[] = "neighbor_number"; static PyObject *__pyx_kp_u_1; static PyObject *__pyx_kp_u_11; static PyObject *__pyx_kp_s_13; static PyObject *__pyx_kp_s_16; static PyObject 
*__pyx_n_s_17; static PyObject *__pyx_kp_u_3; static PyObject *__pyx_kp_u_5; static PyObject *__pyx_kp_u_7; static PyObject *__pyx_kp_u_8; static PyObject *__pyx_n_s__MAXSYM; static PyObject *__pyx_n_s__RuntimeError; static PyObject *__pyx_n_s__ValueError; static PyObject *__pyx_n_s____main__; static PyObject *__pyx_n_s____test__; static PyObject *__pyx_n_s____version__; static PyObject *__pyx_n_s__areas; static PyObject *__pyx_n_s__asa_loop; static PyObject *__pyx_n_s__box; static PyObject *__pyx_n_s__box_c; static PyObject *__pyx_n_s__bucket_size; static PyObject *__pyx_n_s__const_pi; static PyObject *__pyx_n_s__distance; static PyObject *__pyx_n_s__distance_sq; static PyObject *__pyx_n_s__dstptr; static PyObject *__pyx_n_s__dtype; static PyObject *__pyx_n_s__float64; static PyObject *__pyx_n_s__idx; static PyObject *__pyx_n_s__idxn; static PyObject *__pyx_n_s__idxn_skip; static PyObject *__pyx_n_s__idxptr; static PyObject *__pyx_n_s__idxs; static PyObject *__pyx_n_s__is_accessible; static PyObject *__pyx_n_s__kdpnts; static PyObject *__pyx_n_s__lcoords; static PyObject *__pyx_n_s__lcoords_c; static PyObject *__pyx_n_s__lidx; static PyObject *__pyx_n_s__lidx_c; static PyObject *__pyx_n_s__lradii; static PyObject *__pyx_n_s__lradius; static PyObject *__pyx_n_s__lradius_max; static PyObject *__pyx_n_s__lsize; static PyObject *__pyx_n_s__n_acc_points; static PyObject *__pyx_n_s__neighbor_number; static PyObject *__pyx_n_s__np; static PyObject *__pyx_n_s__numpy; static PyObject *__pyx_n_s__probe; static PyObject *__pyx_n_s__qcoords; static PyObject *__pyx_n_s__qcoords_c; static PyObject *__pyx_n_s__qpnts; static PyObject *__pyx_n_s__qradii; static PyObject *__pyx_n_s__qradius; static PyObject *__pyx_n_s__range; static PyObject *__pyx_n_s__ridx; static PyObject *__pyx_n_s__ridx_div; static PyObject *__pyx_n_s__rspoint; static PyObject *__pyx_n_s__search_limit; static PyObject *__pyx_n_s__search_point; static PyObject *__pyx_n_s__spoints; static PyObject 
*__pyx_n_s__spoints_c; static PyObject *__pyx_n_s__ssize; static PyObject *__pyx_n_s__t_arr; static PyObject *__pyx_n_s__t_lid; static PyObject *__pyx_n_s__t_ptr; static PyObject *__pyx_n_s__tree; static PyObject *__pyx_int_3; static PyObject *__pyx_int_15; static PyObject *__pyx_int_200000; static PyObject *__pyx_k_tuple_2; static PyObject *__pyx_k_tuple_4; static PyObject *__pyx_k_tuple_6; static PyObject *__pyx_k_tuple_9; static PyObject *__pyx_k_tuple_10; static PyObject *__pyx_k_tuple_12; static PyObject *__pyx_k_tuple_14; static PyObject *__pyx_k_codeobj_15; /* Python wrapper */ static PyObject *__pyx_pw_6cogent_6struct_4_asa_1asa_loop(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_6struct_4_asa_1asa_loop = {__Pyx_NAMESTR("asa_loop"), (PyCFunction)__pyx_pw_6cogent_6struct_4_asa_1asa_loop, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_6struct_4_asa_1asa_loop(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_qcoords = 0; PyArrayObject *__pyx_v_lcoords = 0; PyArrayObject *__pyx_v_qradii = 0; PyArrayObject *__pyx_v_lradii = 0; PyArrayObject *__pyx_v_spoints = 0; PyArrayObject *__pyx_v_box = 0; __pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_v_probe; __pyx_t_6cogent_6struct_4_asa_UTYPE_t __pyx_v_bucket_size; PyObject *__pyx_v_MAXSYM = 0; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__qcoords,&__pyx_n_s__lcoords,&__pyx_n_s__qradii,&__pyx_n_s__lradii,&__pyx_n_s__spoints,&__pyx_n_s__box,&__pyx_n_s__probe,&__pyx_n_s__bucket_size,&__pyx_n_s__MAXSYM,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("asa_loop (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[9] = {0,0,0,0,0,0,0,0,0}; values[8] = ((PyObject *)__pyx_int_200000); if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 9: values[8] = 
PyTuple_GET_ITEM(__pyx_args, 8); case 8: values[7] = PyTuple_GET_ITEM(__pyx_args, 7); case 7: values[6] = PyTuple_GET_ITEM(__pyx_args, 6); case 6: values[5] = PyTuple_GET_ITEM(__pyx_args, 5); case 5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4); case 4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3); case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__qcoords); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__lcoords); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("asa_loop", 0, 8, 9, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__qradii); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("asa_loop", 0, 8, 9, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 3: values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__lradii); if (likely(values[3])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("asa_loop", 0, 8, 9, 3); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 4: values[4] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__spoints); if (likely(values[4])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("asa_loop", 0, 8, 9, 4); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 5: values[5] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__box); if (likely(values[5])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("asa_loop", 0, 8, 9, 5); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } 
case 6: values[6] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__probe); if (likely(values[6])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("asa_loop", 0, 8, 9, 6); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 7: values[7] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__bucket_size); if (likely(values[7])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("asa_loop", 0, 8, 9, 7); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 8: if (kw_args > 0) { PyObject* value = PyDict_GetItem(__pyx_kwds, __pyx_n_s__MAXSYM); if (value) { values[8] = value; kw_args--; } } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "asa_loop") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else { switch (PyTuple_GET_SIZE(__pyx_args)) { case 9: values[8] = PyTuple_GET_ITEM(__pyx_args, 8); case 8: values[7] = PyTuple_GET_ITEM(__pyx_args, 7); values[6] = PyTuple_GET_ITEM(__pyx_args, 6); values[5] = PyTuple_GET_ITEM(__pyx_args, 5); values[4] = PyTuple_GET_ITEM(__pyx_args, 4); values[3] = PyTuple_GET_ITEM(__pyx_args, 3); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[0] = PyTuple_GET_ITEM(__pyx_args, 0); break; default: goto __pyx_L5_argtuple_error; } } __pyx_v_qcoords = ((PyArrayObject *)values[0]); __pyx_v_lcoords = ((PyArrayObject *)values[1]); __pyx_v_qradii = ((PyArrayObject *)values[2]); __pyx_v_lradii = ((PyArrayObject *)values[3]); __pyx_v_spoints = ((PyArrayObject *)values[4]); __pyx_v_box = ((PyArrayObject *)values[5]); __pyx_v_probe = __pyx_PyFloat_AsDouble(values[6]); if (unlikely((__pyx_v_probe == (npy_float64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 24; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_bucket_size = __Pyx_PyInt_from_py_npy_uint64(values[7]); if 
(unlikely((__pyx_v_bucket_size == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 24; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_MAXSYM = values[8]; } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("asa_loop", 0, 8, 9, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.struct._asa.asa_loop", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_qcoords), __pyx_ptype_5numpy_ndarray, 1, "qcoords", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_lcoords), __pyx_ptype_5numpy_ndarray, 1, "lcoords", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_qradii), __pyx_ptype_5numpy_ndarray, 1, "qradii", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_lradii), __pyx_ptype_5numpy_ndarray, 1, "lradii", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_spoints), __pyx_ptype_5numpy_ndarray, 1, "spoints", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 23; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_box), __pyx_ptype_5numpy_ndarray, 1, "box", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 23; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_6struct_4_asa_asa_loop(__pyx_self, __pyx_v_qcoords, __pyx_v_lcoords, __pyx_v_qradii, __pyx_v_lradii, 
__pyx_v_spoints, __pyx_v_box, __pyx_v_probe, __pyx_v_bucket_size, __pyx_v_MAXSYM); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/struct/_asa.pyx":21 * * @cython.boundscheck(False) * def asa_loop(np.ndarray[DTYPE_t, ndim =2] qcoords, np.ndarray[DTYPE_t, ndim =2] lcoords,\ # <<<<<<<<<<<<<< * np.ndarray[DTYPE_t, ndim =1] qradii, np.ndarray[DTYPE_t, ndim =1] lradii,\ * np.ndarray[DTYPE_t, ndim =2] spoints, np.ndarray[DTYPE_t, ndim =1] box,\ */ static PyObject *__pyx_pf_6cogent_6struct_4_asa_asa_loop(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_qcoords, PyArrayObject *__pyx_v_lcoords, PyArrayObject *__pyx_v_qradii, PyArrayObject *__pyx_v_lradii, PyArrayObject *__pyx_v_spoints, PyArrayObject *__pyx_v_box, __pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_v_probe, __pyx_t_6cogent_6struct_4_asa_UTYPE_t __pyx_v_bucket_size, PyObject *__pyx_v_MAXSYM) { int __pyx_v_idx; int __pyx_v_idxn; int __pyx_v_idxn_skip; int __pyx_v_idxs; int __pyx_v_lidx; int __pyx_v_lidx_c; int __pyx_v_is_accessible; __pyx_t_6cogent_6struct_4_asa_UTYPE_t __pyx_v_n_acc_points; __pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_v_qradius; __pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_v_lradius; __pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_v_search_limit; __pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_v_lradius_max; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_rspoint; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_distance; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_distance_sq; __pyx_t_6cogent_6struct_4_asa_DTYPE_t **__pyx_v_dstptr; __pyx_t_6cogent_6struct_4_asa_UTYPE_t **__pyx_v_idxptr; PyArrayObject *__pyx_v_areas = 0; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_qcoords_c; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_lcoords_c; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_spoints_c; __pyx_t_6cogent_6struct_4_asa_UTYPE_t __pyx_v_ssize; __pyx_t_6cogent_6struct_4_asa_UTYPE_t __pyx_v_lsize; npy_intp __pyx_v_qpnts; 
__pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_v_const_pi; __pyx_t_6cogent_6struct_4_asa_UTYPE_t *__pyx_v_ridx; __pyx_t_6cogent_6struct_4_asa_UTYPE_t *__pyx_v_ridx_div; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_box_c; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_t_ptr; __pyx_t_6cogent_6struct_4_asa_DTYPE_t *__pyx_v_t_arr; __pyx_t_6cogent_6struct_4_asa_UTYPE_t *__pyx_v_t_lid; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint __pyx_v_search_point; npy_intp __pyx_v_neighbor_number; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_kdpnts; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_tree; __Pyx_LocalBuf_ND __pyx_pybuffernd_areas; __Pyx_Buffer __pyx_pybuffer_areas; __Pyx_LocalBuf_ND __pyx_pybuffernd_box; __Pyx_Buffer __pyx_pybuffer_box; __Pyx_LocalBuf_ND __pyx_pybuffernd_lcoords; __Pyx_Buffer __pyx_pybuffer_lcoords; __Pyx_LocalBuf_ND __pyx_pybuffernd_lradii; __Pyx_Buffer __pyx_pybuffer_lradii; __Pyx_LocalBuf_ND __pyx_pybuffernd_qcoords; __Pyx_Buffer __pyx_pybuffer_qcoords; __Pyx_LocalBuf_ND __pyx_pybuffernd_qradii; __Pyx_Buffer __pyx_pybuffer_qradii; __Pyx_LocalBuf_ND __pyx_pybuffernd_spoints; __Pyx_Buffer __pyx_pybuffer_spoints; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; PyObject *__pyx_t_2 = NULL; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; double __pyx_t_5; size_t __pyx_t_6; npy_intp __pyx_t_7; int __pyx_t_8; __pyx_t_6cogent_6struct_4_asa_DTYPE_t __pyx_t_9; int __pyx_t_10; int __pyx_t_11; int __pyx_t_12; int __pyx_t_13; npy_intp __pyx_t_14; __pyx_t_6cogent_6struct_4_asa_UTYPE_t __pyx_t_15; int __pyx_t_16; __pyx_t_6cogent_6struct_4_asa_UTYPE_t __pyx_t_17; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("asa_loop", 0); __pyx_pybuffer_areas.pybuffer.buf = NULL; __pyx_pybuffer_areas.refcount = 0; __pyx_pybuffernd_areas.data = NULL; __pyx_pybuffernd_areas.rcbuffer = &__pyx_pybuffer_areas; __pyx_pybuffer_qcoords.pybuffer.buf = 
NULL; __pyx_pybuffer_qcoords.refcount = 0; __pyx_pybuffernd_qcoords.data = NULL; __pyx_pybuffernd_qcoords.rcbuffer = &__pyx_pybuffer_qcoords; __pyx_pybuffer_lcoords.pybuffer.buf = NULL; __pyx_pybuffer_lcoords.refcount = 0; __pyx_pybuffernd_lcoords.data = NULL; __pyx_pybuffernd_lcoords.rcbuffer = &__pyx_pybuffer_lcoords; __pyx_pybuffer_qradii.pybuffer.buf = NULL; __pyx_pybuffer_qradii.refcount = 0; __pyx_pybuffernd_qradii.data = NULL; __pyx_pybuffernd_qradii.rcbuffer = &__pyx_pybuffer_qradii; __pyx_pybuffer_lradii.pybuffer.buf = NULL; __pyx_pybuffer_lradii.refcount = 0; __pyx_pybuffernd_lradii.data = NULL; __pyx_pybuffernd_lradii.rcbuffer = &__pyx_pybuffer_lradii; __pyx_pybuffer_spoints.pybuffer.buf = NULL; __pyx_pybuffer_spoints.refcount = 0; __pyx_pybuffernd_spoints.data = NULL; __pyx_pybuffernd_spoints.rcbuffer = &__pyx_pybuffer_spoints; __pyx_pybuffer_box.pybuffer.buf = NULL; __pyx_pybuffer_box.refcount = 0; __pyx_pybuffernd_box.data = NULL; __pyx_pybuffernd_box.rcbuffer = &__pyx_pybuffer_box; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_qcoords.rcbuffer->pybuffer, (PyObject*)__pyx_v_qcoords, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_4_asa_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_qcoords.diminfo[0].strides = __pyx_pybuffernd_qcoords.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_qcoords.diminfo[0].shape = __pyx_pybuffernd_qcoords.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_qcoords.diminfo[1].strides = __pyx_pybuffernd_qcoords.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_qcoords.diminfo[1].shape = __pyx_pybuffernd_qcoords.rcbuffer->pybuffer.shape[1]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_lcoords.rcbuffer->pybuffer, (PyObject*)__pyx_v_lcoords, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_4_asa_DTYPE_t, 
PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_lcoords.diminfo[0].strides = __pyx_pybuffernd_lcoords.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_lcoords.diminfo[0].shape = __pyx_pybuffernd_lcoords.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_lcoords.diminfo[1].strides = __pyx_pybuffernd_lcoords.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_lcoords.diminfo[1].shape = __pyx_pybuffernd_lcoords.rcbuffer->pybuffer.shape[1]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_qradii.rcbuffer->pybuffer, (PyObject*)__pyx_v_qradii, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_4_asa_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_qradii.diminfo[0].strides = __pyx_pybuffernd_qradii.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_qradii.diminfo[0].shape = __pyx_pybuffernd_qradii.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_lradii.rcbuffer->pybuffer, (PyObject*)__pyx_v_lradii, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_4_asa_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_lradii.diminfo[0].strides = __pyx_pybuffernd_lradii.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_lradii.diminfo[0].shape = __pyx_pybuffernd_lradii.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_spoints.rcbuffer->pybuffer, (PyObject*)__pyx_v_spoints, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_4_asa_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_spoints.diminfo[0].strides = __pyx_pybuffernd_spoints.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_spoints.diminfo[0].shape = __pyx_pybuffernd_spoints.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_spoints.diminfo[1].strides = __pyx_pybuffernd_spoints.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_spoints.diminfo[1].shape = __pyx_pybuffernd_spoints.rcbuffer->pybuffer.shape[1]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_box.rcbuffer->pybuffer, (PyObject*)__pyx_v_box, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_4_asa_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_box.diminfo[0].strides = __pyx_pybuffernd_box.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_box.diminfo[0].shape = __pyx_pybuffernd_box.rcbuffer->pybuffer.shape[0]; /* "cogent/struct/_asa.pyx":33 * cdef UTYPE_t n_acc_points * cdef DTYPE_t qradius, lradius, search_limit * cdef DTYPE_t lradius_max = 2.0 + probe # <<<<<<<<<<<<<< * #malloc'ed * cdef DTYPE_t *rspoint = malloc(3 * sizeof(DTYPE_t)) */ __pyx_v_lradius_max = (2.0 + __pyx_v_probe); /* "cogent/struct/_asa.pyx":35 * cdef DTYPE_t lradius_max = 2.0 + probe * #malloc'ed * cdef DTYPE_t *rspoint = malloc(3 * sizeof(DTYPE_t)) # <<<<<<<<<<<<<< * cdef DTYPE_t *distance = malloc(3 * sizeof(DTYPE_t)) * cdef DTYPE_t *distance_sq = malloc(3 * sizeof(DTYPE_t)) */ __pyx_v_rspoint = ((__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)malloc((3 * (sizeof(__pyx_t_6cogent_6struct_4_asa_DTYPE_t))))); /* "cogent/struct/_asa.pyx":36 * #malloc'ed * cdef DTYPE_t *rspoint = malloc(3 * sizeof(DTYPE_t)) * cdef DTYPE_t *distance = malloc(3 * sizeof(DTYPE_t)) # <<<<<<<<<<<<<< * cdef DTYPE_t *distance_sq = malloc(3 * sizeof(DTYPE_t)) * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) */ __pyx_v_distance = 
((__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)malloc((3 * (sizeof(__pyx_t_6cogent_6struct_4_asa_DTYPE_t))))); /* "cogent/struct/_asa.pyx":37 * cdef DTYPE_t *rspoint = malloc(3 * sizeof(DTYPE_t)) * cdef DTYPE_t *distance = malloc(3 * sizeof(DTYPE_t)) * cdef DTYPE_t *distance_sq = malloc(3 * sizeof(DTYPE_t)) # <<<<<<<<<<<<<< * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) */ __pyx_v_distance_sq = ((__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)malloc((3 * (sizeof(__pyx_t_6cogent_6struct_4_asa_DTYPE_t))))); /* "cogent/struct/_asa.pyx":38 * cdef DTYPE_t *distance = malloc(3 * sizeof(DTYPE_t)) * cdef DTYPE_t *distance_sq = malloc(3 * sizeof(DTYPE_t)) * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) # <<<<<<<<<<<<<< * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) * # cdef DTYPE_t *areas = malloc(qcoords.shape[0] * sizeof(DTYPE_t)) */ __pyx_v_dstptr = ((__pyx_t_6cogent_6struct_4_asa_DTYPE_t **)malloc((sizeof(__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)))); /* "cogent/struct/_asa.pyx":39 * cdef DTYPE_t *distance_sq = malloc(3 * sizeof(DTYPE_t)) * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) # <<<<<<<<<<<<<< * # cdef DTYPE_t *areas = malloc(qcoords.shape[0] * sizeof(DTYPE_t)) * cdef np.ndarray[DTYPE_t, ndim=1] areas = np.ndarray((qcoords.shape[0],), dtype=np.float64) */ __pyx_v_idxptr = ((__pyx_t_6cogent_6struct_4_asa_UTYPE_t **)malloc((sizeof(__pyx_t_6cogent_6struct_4_asa_UTYPE_t *)))); /* "cogent/struct/_asa.pyx":41 * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) * # cdef DTYPE_t *areas = malloc(qcoords.shape[0] * sizeof(DTYPE_t)) * cdef np.ndarray[DTYPE_t, ndim=1] areas = np.ndarray((qcoords.shape[0],), dtype=np.float64) # <<<<<<<<<<<<<< * * #c arrays from numpy */ __pyx_t_1 = __Pyx_PyInt_to_py_Py_intptr_t((__pyx_v_qcoords->dimensions[0])); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1); __Pyx_GIVEREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_t_2)); __Pyx_GIVEREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); __pyx_t_3 = __Pyx_GetName(__pyx_m, __pyx_n_s__np); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = PyObject_GetAttr(__pyx_t_3, __pyx_n_s__float64); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (PyDict_SetItem(__pyx_t_2, ((PyObject *)__pyx_n_s__dtype), __pyx_t_4) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyObject_Call(((PyObject *)((PyObject*)__pyx_ptype_5numpy_ndarray)), ((PyObject *)__pyx_t_1), ((PyObject *)__pyx_t_2)); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_areas.rcbuffer->pybuffer, (PyObject*)((PyArrayObject *)__pyx_t_4), 
&__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_4_asa_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_areas = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_areas.rcbuffer->pybuffer.buf = NULL; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } else {__pyx_pybuffernd_areas.diminfo[0].strides = __pyx_pybuffernd_areas.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_areas.diminfo[0].shape = __pyx_pybuffernd_areas.rcbuffer->pybuffer.shape[0]; } } __pyx_v_areas = ((PyArrayObject *)__pyx_t_4); __pyx_t_4 = 0; /* "cogent/struct/_asa.pyx":44 * * #c arrays from numpy * cdef DTYPE_t *qcoords_c = qcoords.data # <<<<<<<<<<<<<< * cdef DTYPE_t *lcoords_c = lcoords.data * cdef DTYPE_t *spoints_c = spoints.data */ __pyx_v_qcoords_c = ((__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)__pyx_v_qcoords->data); /* "cogent/struct/_asa.pyx":45 * #c arrays from numpy * cdef DTYPE_t *qcoords_c = qcoords.data * cdef DTYPE_t *lcoords_c = lcoords.data # <<<<<<<<<<<<<< * cdef DTYPE_t *spoints_c = spoints.data * #datas */ __pyx_v_lcoords_c = ((__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)__pyx_v_lcoords->data); /* "cogent/struct/_asa.pyx":46 * cdef DTYPE_t *qcoords_c = qcoords.data * cdef DTYPE_t *lcoords_c = lcoords.data * cdef DTYPE_t *spoints_c = spoints.data # <<<<<<<<<<<<<< * #datas * cdef UTYPE_t ssize = spoints.shape[0] */ __pyx_v_spoints_c = ((__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)__pyx_v_spoints->data); /* "cogent/struct/_asa.pyx":48 * cdef DTYPE_t *spoints_c = spoints.data * #datas * cdef UTYPE_t ssize = spoints.shape[0] # <<<<<<<<<<<<<< * cdef UTYPE_t lsize = lradii.shape[0] * cdef npy_intp qpnts = qcoords.shape[0] */ __pyx_v_ssize = (__pyx_v_spoints->dimensions[0]); /* "cogent/struct/_asa.pyx":49 * #datas * cdef UTYPE_t ssize = spoints.shape[0] * cdef UTYPE_t lsize = lradii.shape[0] # <<<<<<<<<<<<<< * cdef npy_intp qpnts = qcoords.shape[0] * cdef DTYPE_t const_pi = pi * 4.0 / ssize */ 
__pyx_v_lsize = (__pyx_v_lradii->dimensions[0]); /* "cogent/struct/_asa.pyx":50 * cdef UTYPE_t ssize = spoints.shape[0] * cdef UTYPE_t lsize = lradii.shape[0] * cdef npy_intp qpnts = qcoords.shape[0] # <<<<<<<<<<<<<< * cdef DTYPE_t const_pi = pi * 4.0 / ssize * */ __pyx_v_qpnts = (__pyx_v_qcoords->dimensions[0]); /* "cogent/struct/_asa.pyx":51 * cdef UTYPE_t lsize = lradii.shape[0] * cdef npy_intp qpnts = qcoords.shape[0] * cdef DTYPE_t const_pi = pi * 4.0 / ssize # <<<<<<<<<<<<<< * * #pointers */ __pyx_t_5 = (M_PI * 4.0); if (unlikely(__pyx_v_ssize == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "float division"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 51; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_v_const_pi = (__pyx_t_5 / __pyx_v_ssize); /* "cogent/struct/_asa.pyx":59 * # create a temporary array of lattice points, which are within a box around * # the query atoms. The kd-tree will be constructed from those filterd atoms. * cdef DTYPE_t *box_c = box.data # <<<<<<<<<<<<<< * cdef DTYPE_t *t_ptr # temporary pointer * cdef DTYPE_t *t_arr = malloc(3 * MAXSYM * sizeof(DTYPE_t)) # temporary array of symmetry */ __pyx_v_box_c = ((__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)__pyx_v_box->data); /* "cogent/struct/_asa.pyx":61 * cdef DTYPE_t *box_c = box.data * cdef DTYPE_t *t_ptr # temporary pointer * cdef DTYPE_t *t_arr = malloc(3 * MAXSYM * sizeof(DTYPE_t)) # temporary array of symmetry # <<<<<<<<<<<<<< * cdef UTYPE_t *t_lid = malloc( MAXSYM * sizeof(UTYPE_t)) # maping to original indices * lidx_c = 0 */ __pyx_t_4 = PyNumber_Multiply(__pyx_int_3, __pyx_v_MAXSYM); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 61; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_2 = __Pyx_PyInt_FromSize_t((sizeof(__pyx_t_6cogent_6struct_4_asa_DTYPE_t))); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 61; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_1 = 
PyNumber_Multiply(__pyx_t_4, __pyx_t_2); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 61; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_6 = __Pyx_PyInt_AsSize_t(__pyx_t_1); if (unlikely((__pyx_t_6 == (size_t)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 61; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_v_t_arr = ((__pyx_t_6cogent_6struct_4_asa_DTYPE_t *)malloc(__pyx_t_6)); /* "cogent/struct/_asa.pyx":62 * cdef DTYPE_t *t_ptr # temporary pointer * cdef DTYPE_t *t_arr = malloc(3 * MAXSYM * sizeof(DTYPE_t)) # temporary array of symmetry * cdef UTYPE_t *t_lid = malloc( MAXSYM * sizeof(UTYPE_t)) # maping to original indices # <<<<<<<<<<<<<< * lidx_c = 0 * for 0 <= lidx < lcoords.shape[0]: */ __pyx_t_1 = __Pyx_PyInt_FromSize_t((sizeof(__pyx_t_6cogent_6struct_4_asa_UTYPE_t))); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = PyNumber_Multiply(__pyx_v_MAXSYM, __pyx_t_1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_6 = __Pyx_PyInt_AsSize_t(__pyx_t_2); if (unlikely((__pyx_t_6 == (size_t)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_v_t_lid = ((__pyx_t_6cogent_6struct_4_asa_UTYPE_t *)malloc(__pyx_t_6)); /* "cogent/struct/_asa.pyx":63 * cdef DTYPE_t *t_arr = malloc(3 * MAXSYM * sizeof(DTYPE_t)) # temporary array of symmetry * cdef UTYPE_t *t_lid = malloc( MAXSYM * sizeof(UTYPE_t)) # maping to original indices * lidx_c = 0 # <<<<<<<<<<<<<< * for 0 <= lidx < lcoords.shape[0]: * t_ptr = 
lcoords_c + lidx * 3 */ __pyx_v_lidx_c = 0; /* "cogent/struct/_asa.pyx":64 * cdef UTYPE_t *t_lid = malloc( MAXSYM * sizeof(UTYPE_t)) # maping to original indices * lidx_c = 0 * for 0 <= lidx < lcoords.shape[0]: # <<<<<<<<<<<<<< * t_ptr = lcoords_c + lidx * 3 * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ */ __pyx_t_7 = (__pyx_v_lcoords->dimensions[0]); for (__pyx_v_lidx = 0; __pyx_v_lidx < __pyx_t_7; __pyx_v_lidx++) { /* "cogent/struct/_asa.pyx":65 * lidx_c = 0 * for 0 <= lidx < lcoords.shape[0]: * t_ptr = lcoords_c + lidx * 3 # <<<<<<<<<<<<<< * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ */ __pyx_v_t_ptr = (__pyx_v_lcoords_c + (__pyx_v_lidx * 3)); /* "cogent/struct/_asa.pyx":66 * for 0 <= lidx < lcoords.shape[0]: * t_ptr = lcoords_c + lidx * 3 * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ # <<<<<<<<<<<<<< * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: */ __pyx_t_8 = ((__pyx_v_box_c[0]) <= (__pyx_v_t_ptr[0])); if (__pyx_t_8) { __pyx_t_8 = ((__pyx_v_t_ptr[0]) <= (__pyx_v_box_c[3])); } if (__pyx_t_8) { /* "cogent/struct/_asa.pyx":67 * t_ptr = lcoords_c + lidx * 3 * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ # <<<<<<<<<<<<<< * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: * t_arr[3*lidx_c ] = (t_ptr )[0] */ __pyx_t_9 = ((__pyx_v_t_ptr + 1)[0]); __pyx_t_10 = ((__pyx_v_box_c[1]) <= __pyx_t_9); if (__pyx_t_10) { __pyx_t_10 = (__pyx_t_9 <= (__pyx_v_box_c[4])); } if (__pyx_t_10) { /* "cogent/struct/_asa.pyx":68 * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: # <<<<<<<<<<<<<< * t_arr[3*lidx_c ] = (t_ptr )[0] * t_arr[3*lidx_c+1] = (t_ptr+1)[0] */ __pyx_t_9 = ((__pyx_v_t_ptr + 2)[0]); __pyx_t_11 = ((__pyx_v_box_c[2]) <= __pyx_t_9); if (__pyx_t_11) { __pyx_t_11 = (__pyx_t_9 <= (__pyx_v_box_c[5])); } __pyx_t_12 = __pyx_t_11; } else { __pyx_t_12 = __pyx_t_10; } __pyx_t_10 = 
__pyx_t_12; } else { __pyx_t_10 = __pyx_t_8; } if (__pyx_t_10) { /* "cogent/struct/_asa.pyx":69 * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: * t_arr[3*lidx_c ] = (t_ptr )[0] # <<<<<<<<<<<<<< * t_arr[3*lidx_c+1] = (t_ptr+1)[0] * t_arr[3*lidx_c+2] = (t_ptr+2)[0] */ (__pyx_v_t_arr[(3 * __pyx_v_lidx_c)]) = (__pyx_v_t_ptr[0]); /* "cogent/struct/_asa.pyx":70 * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: * t_arr[3*lidx_c ] = (t_ptr )[0] * t_arr[3*lidx_c+1] = (t_ptr+1)[0] # <<<<<<<<<<<<<< * t_arr[3*lidx_c+2] = (t_ptr+2)[0] * t_lid[lidx_c] = lidx */ (__pyx_v_t_arr[((3 * __pyx_v_lidx_c) + 1)]) = ((__pyx_v_t_ptr + 1)[0]); /* "cogent/struct/_asa.pyx":71 * t_arr[3*lidx_c ] = (t_ptr )[0] * t_arr[3*lidx_c+1] = (t_ptr+1)[0] * t_arr[3*lidx_c+2] = (t_ptr+2)[0] # <<<<<<<<<<<<<< * t_lid[lidx_c] = lidx * lidx_c += 1 */ (__pyx_v_t_arr[((3 * __pyx_v_lidx_c) + 2)]) = ((__pyx_v_t_ptr + 2)[0]); /* "cogent/struct/_asa.pyx":72 * t_arr[3*lidx_c+1] = (t_ptr+1)[0] * t_arr[3*lidx_c+2] = (t_ptr+2)[0] * t_lid[lidx_c] = lidx # <<<<<<<<<<<<<< * lidx_c += 1 * */ (__pyx_v_t_lid[__pyx_v_lidx_c]) = __pyx_v_lidx; /* "cogent/struct/_asa.pyx":73 * t_arr[3*lidx_c+2] = (t_ptr+2)[0] * t_lid[lidx_c] = lidx * lidx_c += 1 # <<<<<<<<<<<<<< * * #make kd-tree */ __pyx_v_lidx_c = (__pyx_v_lidx_c + 1); goto __pyx_L5; } __pyx_L5:; } /* "cogent/struct/_asa.pyx":78 * cdef kdpoint search_point * cdef npy_intp neighbor_number * cdef kdpoint *kdpnts = points(t_arr, lidx_c, 3) # <<<<<<<<<<<<<< * cdef kdnode *tree = build_tree(kdpnts, 0, lidx_c -1, 3, bucket_size, 0) * */ __pyx_v_kdpnts = __pyx_f_6cogent_5maths_7spatial_4ckd3_points(__pyx_v_t_arr, __pyx_v_lidx_c, 3); /* "cogent/struct/_asa.pyx":79 * cdef npy_intp neighbor_number * cdef kdpoint *kdpnts = points(t_arr, lidx_c, 3) * cdef kdnode *tree = build_tree(kdpnts, 0, lidx_c -1, 3, bucket_size, 0) # <<<<<<<<<<<<<< * * for 0 <= idx < qpnts: */ __pyx_v_tree = __pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree(__pyx_v_kdpnts, 0, (__pyx_v_lidx_c 
- 1), 3, __pyx_v_bucket_size, 0); /* "cogent/struct/_asa.pyx":81 * cdef kdnode *tree = build_tree(kdpnts, 0, lidx_c -1, 3, bucket_size, 0) * * for 0 <= idx < qpnts: # <<<<<<<<<<<<<< * qradius = qradii[idx] * #print qradius, lradius_max */ __pyx_t_7 = __pyx_v_qpnts; for (__pyx_v_idx = 0; __pyx_v_idx < __pyx_t_7; __pyx_v_idx++) { /* "cogent/struct/_asa.pyx":82 * * for 0 <= idx < qpnts: * qradius = qradii[idx] # <<<<<<<<<<<<<< * #print qradius, lradius_max * search_point.coords = qcoords_c + idx*3 */ __pyx_t_13 = __pyx_v_idx; if (__pyx_t_13 < 0) __pyx_t_13 += __pyx_pybuffernd_qradii.diminfo[0].shape; __pyx_v_qradius = (*__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_4_asa_DTYPE_t *, __pyx_pybuffernd_qradii.rcbuffer->pybuffer.buf, __pyx_t_13, __pyx_pybuffernd_qradii.diminfo[0].strides)); /* "cogent/struct/_asa.pyx":84 * qradius = qradii[idx] * #print qradius, lradius_max * search_point.coords = qcoords_c + idx*3 # <<<<<<<<<<<<<< * #print search_point.coords[0], search_point.coords[1], search_point.coords[2] * search_limit = (qradius + lradius_max) * (qradius + lradius_max) */ __pyx_v_search_point.coords = (__pyx_v_qcoords_c + (__pyx_v_idx * 3)); /* "cogent/struct/_asa.pyx":86 * search_point.coords = qcoords_c + idx*3 * #print search_point.coords[0], search_point.coords[1], search_point.coords[2] * search_limit = (qradius + lradius_max) * (qradius + lradius_max) # <<<<<<<<<<<<<< * #print search_limit * neighbor_number = rn(tree, kdpnts, search_point, dstptr, idxptr, search_limit, 3, 100) */ __pyx_v_search_limit = ((__pyx_v_qradius + __pyx_v_lradius_max) * (__pyx_v_qradius + __pyx_v_lradius_max)); /* "cogent/struct/_asa.pyx":88 * search_limit = (qradius + lradius_max) * (qradius + lradius_max) * #print search_limit * neighbor_number = rn(tree, kdpnts, search_point, dstptr, idxptr, search_limit, 3, 100) # <<<<<<<<<<<<<< * #print neighbor_number * #create array of real indices */ __pyx_v_neighbor_number = ((npy_intp)__pyx_f_6cogent_5maths_7spatial_4ckd3_rn(__pyx_v_tree, 
__pyx_v_kdpnts, __pyx_v_search_point, __pyx_v_dstptr, __pyx_v_idxptr, __pyx_v_search_limit, 3, 100)); /* "cogent/struct/_asa.pyx":91 * #print neighbor_number * #create array of real indices * ridx = malloc(neighbor_number * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * ridx_div = malloc(neighbor_number * sizeof(UTYPE_t)) * */ __pyx_v_ridx = ((__pyx_t_6cogent_6struct_4_asa_UTYPE_t *)malloc((__pyx_v_neighbor_number * (sizeof(__pyx_t_6cogent_6struct_4_asa_UTYPE_t))))); /* "cogent/struct/_asa.pyx":92 * #create array of real indices * ridx = malloc(neighbor_number * sizeof(UTYPE_t)) * ridx_div = malloc(neighbor_number * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * * idxn_skip = 0 */ __pyx_v_ridx_div = ((__pyx_t_6cogent_6struct_4_asa_UTYPE_t *)malloc((__pyx_v_neighbor_number * (sizeof(__pyx_t_6cogent_6struct_4_asa_UTYPE_t))))); /* "cogent/struct/_asa.pyx":94 * ridx_div = malloc(neighbor_number * sizeof(UTYPE_t)) * * idxn_skip = 0 # <<<<<<<<<<<<<< * for 0 <= idxn < neighbor_number: * if dstptr[0][idxn] <= 0.001: */ __pyx_v_idxn_skip = 0; /* "cogent/struct/_asa.pyx":95 * * idxn_skip = 0 * for 0 <= idxn < neighbor_number: # <<<<<<<<<<<<<< * if dstptr[0][idxn] <= 0.001: * idxn_skip = 1 */ __pyx_t_14 = __pyx_v_neighbor_number; for (__pyx_v_idxn = 0; __pyx_v_idxn < __pyx_t_14; __pyx_v_idxn++) { /* "cogent/struct/_asa.pyx":96 * idxn_skip = 0 * for 0 <= idxn < neighbor_number: * if dstptr[0][idxn] <= 0.001: # <<<<<<<<<<<<<< * idxn_skip = 1 * continue */ __pyx_t_10 = (((__pyx_v_dstptr[0])[__pyx_v_idxn]) <= 0.001); if (__pyx_t_10) { /* "cogent/struct/_asa.pyx":97 * for 0 <= idxn < neighbor_number: * if dstptr[0][idxn] <= 0.001: * idxn_skip = 1 # <<<<<<<<<<<<<< * continue * ridx[idxn - idxn_skip] = kdpnts[idxptr[0][idxn]].index * 3 */ __pyx_v_idxn_skip = 1; /* "cogent/struct/_asa.pyx":98 * if dstptr[0][idxn] <= 0.001: * idxn_skip = 1 * continue # <<<<<<<<<<<<<< * ridx[idxn - idxn_skip] = kdpnts[idxptr[0][idxn]].index * 3 * ridx_div[idxn - idxn_skip] = t_lid[kdpnts[idxptr[0][idxn]].index] % lsize */ 
goto __pyx_L8_continue; goto __pyx_L10; } __pyx_L10:; /* "cogent/struct/_asa.pyx":99 * idxn_skip = 1 * continue * ridx[idxn - idxn_skip] = kdpnts[idxptr[0][idxn]].index * 3 # <<<<<<<<<<<<<< * ridx_div[idxn - idxn_skip] = t_lid[kdpnts[idxptr[0][idxn]].index] % lsize * */ (__pyx_v_ridx[(__pyx_v_idxn - __pyx_v_idxn_skip)]) = ((__pyx_v_kdpnts[((__pyx_v_idxptr[0])[__pyx_v_idxn])]).index * 3); /* "cogent/struct/_asa.pyx":100 * continue * ridx[idxn - idxn_skip] = kdpnts[idxptr[0][idxn]].index * 3 * ridx_div[idxn - idxn_skip] = t_lid[kdpnts[idxptr[0][idxn]].index] % lsize # <<<<<<<<<<<<<< * * n_acc_points = 0 */ if (unlikely(__pyx_v_lsize == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "integer division or modulo by zero"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 100; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } (__pyx_v_ridx_div[(__pyx_v_idxn - __pyx_v_idxn_skip)]) = ((__pyx_v_t_lid[(__pyx_v_kdpnts[((__pyx_v_idxptr[0])[__pyx_v_idxn])]).index]) % __pyx_v_lsize); __pyx_L8_continue:; } /* "cogent/struct/_asa.pyx":102 * ridx_div[idxn - idxn_skip] = t_lid[kdpnts[idxptr[0][idxn]].index] % lsize * * n_acc_points = 0 # <<<<<<<<<<<<<< * for 0 <= idxs < ssize: # loop over sphere points * is_accessible = 1 */ __pyx_v_n_acc_points = 0; /* "cogent/struct/_asa.pyx":103 * * n_acc_points = 0 * for 0 <= idxs < ssize: # loop over sphere points # <<<<<<<<<<<<<< * is_accessible = 1 * #if idxs == 0: */ __pyx_t_15 = __pyx_v_ssize; for (__pyx_v_idxs = 0; __pyx_v_idxs < __pyx_t_15; __pyx_v_idxs++) { /* "cogent/struct/_asa.pyx":104 * n_acc_points = 0 * for 0 <= idxs < ssize: # loop over sphere points * is_accessible = 1 # <<<<<<<<<<<<<< * #if idxs == 0: * #print search_point.coords[0], search_point.coords[1], search_point.coords[2], '\t', */ __pyx_v_is_accessible = 1; /* "cogent/struct/_asa.pyx":111 * #if idxs == 0: * #print qradius, '\t', * rspoint[0] = search_point.coords[0] + (spoints_c + idxs*3 )[0] * qradius # <<<<<<<<<<<<<< * rspoint[1] = search_point.coords[1] + (spoints_c + 
idxs*3+1)[0] * qradius * rspoint[2] = search_point.coords[2] + (spoints_c + idxs*3+2)[0] * qradius */ (__pyx_v_rspoint[0]) = ((__pyx_v_search_point.coords[0]) + (((__pyx_v_spoints_c + (__pyx_v_idxs * 3))[0]) * __pyx_v_qradius)); /* "cogent/struct/_asa.pyx":112 * #print qradius, '\t', * rspoint[0] = search_point.coords[0] + (spoints_c + idxs*3 )[0] * qradius * rspoint[1] = search_point.coords[1] + (spoints_c + idxs*3+1)[0] * qradius # <<<<<<<<<<<<<< * rspoint[2] = search_point.coords[2] + (spoints_c + idxs*3+2)[0] * qradius * #if idxs == 0: */ (__pyx_v_rspoint[1]) = ((__pyx_v_search_point.coords[1]) + ((((__pyx_v_spoints_c + (__pyx_v_idxs * 3)) + 1)[0]) * __pyx_v_qradius)); /* "cogent/struct/_asa.pyx":113 * rspoint[0] = search_point.coords[0] + (spoints_c + idxs*3 )[0] * qradius * rspoint[1] = search_point.coords[1] + (spoints_c + idxs*3+1)[0] * qradius * rspoint[2] = search_point.coords[2] + (spoints_c + idxs*3+2)[0] * qradius # <<<<<<<<<<<<<< * #if idxs == 0: * #print rspoint[0], rspoint[1], rspoint[2] */ (__pyx_v_rspoint[2]) = ((__pyx_v_search_point.coords[2]) + ((((__pyx_v_spoints_c + (__pyx_v_idxs * 3)) + 2)[0]) * __pyx_v_qradius)); /* "cogent/struct/_asa.pyx":117 * #print rspoint[0], rspoint[1], rspoint[2] * #real_point = point * qradius + qcoord * for 0 <= idxn < neighbor_number - idxn_skip: # loop over neighbors # <<<<<<<<<<<<<< * #if dstptr[0][idxn] == 0.: * # continue */ __pyx_t_16 = (__pyx_v_neighbor_number - __pyx_v_idxn_skip); for (__pyx_v_idxn = 0; __pyx_v_idxn < __pyx_t_16; __pyx_v_idxn++) { /* "cogent/struct/_asa.pyx":121 * # continue * #print ' ', neighbor_number, idxn_skip,idxn, ridx_div[idxn] ,ridx[idxn] * lradius = lradii[ridx_div[idxn]] # <<<<<<<<<<<<<< * #print (lcoords_c + ridx[idxn]*3 )[0], (lcoords_c + ridx[idxn]*3+1)[0], (lcoords_c + ridx[idxn]*3+2)[0], * distance[0] = rspoint[0] - (t_arr + ridx[idxn] )[0] */ __pyx_t_17 = (__pyx_v_ridx_div[__pyx_v_idxn]); __pyx_v_lradius = (*__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_4_asa_DTYPE_t *, 
__pyx_pybuffernd_lradii.rcbuffer->pybuffer.buf, __pyx_t_17, __pyx_pybuffernd_lradii.diminfo[0].strides)); /* "cogent/struct/_asa.pyx":123 * lradius = lradii[ridx_div[idxn]] * #print (lcoords_c + ridx[idxn]*3 )[0], (lcoords_c + ridx[idxn]*3+1)[0], (lcoords_c + ridx[idxn]*3+2)[0], * distance[0] = rspoint[0] - (t_arr + ridx[idxn] )[0] # <<<<<<<<<<<<<< * distance[1] = rspoint[1] - (t_arr + ridx[idxn]+1)[0] * distance[2] = rspoint[2] - (t_arr + ridx[idxn]+2)[0] */ (__pyx_v_distance[0]) = ((__pyx_v_rspoint[0]) - ((__pyx_v_t_arr + (__pyx_v_ridx[__pyx_v_idxn]))[0])); /* "cogent/struct/_asa.pyx":124 * #print (lcoords_c + ridx[idxn]*3 )[0], (lcoords_c + ridx[idxn]*3+1)[0], (lcoords_c + ridx[idxn]*3+2)[0], * distance[0] = rspoint[0] - (t_arr + ridx[idxn] )[0] * distance[1] = rspoint[1] - (t_arr + ridx[idxn]+1)[0] # <<<<<<<<<<<<<< * distance[2] = rspoint[2] - (t_arr + ridx[idxn]+2)[0] * #print '\t', distance[0], distance[1], distance[2], */ (__pyx_v_distance[1]) = ((__pyx_v_rspoint[1]) - (((__pyx_v_t_arr + (__pyx_v_ridx[__pyx_v_idxn])) + 1)[0])); /* "cogent/struct/_asa.pyx":125 * distance[0] = rspoint[0] - (t_arr + ridx[idxn] )[0] * distance[1] = rspoint[1] - (t_arr + ridx[idxn]+1)[0] * distance[2] = rspoint[2] - (t_arr + ridx[idxn]+2)[0] # <<<<<<<<<<<<<< * #print '\t', distance[0], distance[1], distance[2], * distance_sq[0] = distance[0] * distance[0] */ (__pyx_v_distance[2]) = ((__pyx_v_rspoint[2]) - (((__pyx_v_t_arr + (__pyx_v_ridx[__pyx_v_idxn])) + 2)[0])); /* "cogent/struct/_asa.pyx":127 * distance[2] = rspoint[2] - (t_arr + ridx[idxn]+2)[0] * #print '\t', distance[0], distance[1], distance[2], * distance_sq[0] = distance[0] * distance[0] # <<<<<<<<<<<<<< * distance_sq[1] = distance[1] * distance[1] * distance_sq[2] = distance[2] * distance[2] */ (__pyx_v_distance_sq[0]) = ((__pyx_v_distance[0]) * (__pyx_v_distance[0])); /* "cogent/struct/_asa.pyx":128 * #print '\t', distance[0], distance[1], distance[2], * distance_sq[0] = distance[0] * distance[0] * distance_sq[1] = 
distance[1] * distance[1] # <<<<<<<<<<<<<< * distance_sq[2] = distance[2] * distance[2] * #print '\t', distance_sq[0], distance_sq[1], distance_sq[2], lradius * lradius, */ (__pyx_v_distance_sq[1]) = ((__pyx_v_distance[1]) * (__pyx_v_distance[1])); /* "cogent/struct/_asa.pyx":129 * distance_sq[0] = distance[0] * distance[0] * distance_sq[1] = distance[1] * distance[1] * distance_sq[2] = distance[2] * distance[2] # <<<<<<<<<<<<<< * #print '\t', distance_sq[0], distance_sq[1], distance_sq[2], lradius * lradius, * if distance_sq[0] + distance_sq[1] + distance_sq[2] < lradius * lradius: */ (__pyx_v_distance_sq[2]) = ((__pyx_v_distance[2]) * (__pyx_v_distance[2])); /* "cogent/struct/_asa.pyx":131 * distance_sq[2] = distance[2] * distance[2] * #print '\t', distance_sq[0], distance_sq[1], distance_sq[2], lradius * lradius, * if distance_sq[0] + distance_sq[1] + distance_sq[2] < lradius * lradius: # <<<<<<<<<<<<<< * is_accessible = 0 * #print 'NA' */ __pyx_t_10 = ((((__pyx_v_distance_sq[0]) + (__pyx_v_distance_sq[1])) + (__pyx_v_distance_sq[2])) < (__pyx_v_lradius * __pyx_v_lradius)); if (__pyx_t_10) { /* "cogent/struct/_asa.pyx":132 * #print '\t', distance_sq[0], distance_sq[1], distance_sq[2], lradius * lradius, * if distance_sq[0] + distance_sq[1] + distance_sq[2] < lradius * lradius: * is_accessible = 0 # <<<<<<<<<<<<<< * #print 'NA' * break */ __pyx_v_is_accessible = 0; /* "cogent/struct/_asa.pyx":134 * is_accessible = 0 * #print 'NA' * break # <<<<<<<<<<<<<< * #print * if is_accessible == 1: */ goto __pyx_L14_break; goto __pyx_L15; } __pyx_L15:; } __pyx_L14_break:; /* "cogent/struct/_asa.pyx":136 * break * #print * if is_accessible == 1: # <<<<<<<<<<<<<< * n_acc_points += 1 * #print n_acc_points */ __pyx_t_10 = (__pyx_v_is_accessible == 1); if (__pyx_t_10) { /* "cogent/struct/_asa.pyx":137 * #print * if is_accessible == 1: * n_acc_points += 1 # <<<<<<<<<<<<<< * #print n_acc_points * areas[idx] = const_pi * n_acc_points * qradius * qradius */ __pyx_v_n_acc_points = 
(__pyx_v_n_acc_points + 1); goto __pyx_L16; } __pyx_L16:; } /* "cogent/struct/_asa.pyx":139 * n_acc_points += 1 * #print n_acc_points * areas[idx] = const_pi * n_acc_points * qradius * qradius # <<<<<<<<<<<<<< * free(ridx) * free(ridx_div) */ __pyx_t_16 = __pyx_v_idx; if (__pyx_t_16 < 0) __pyx_t_16 += __pyx_pybuffernd_areas.diminfo[0].shape; *__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_4_asa_DTYPE_t *, __pyx_pybuffernd_areas.rcbuffer->pybuffer.buf, __pyx_t_16, __pyx_pybuffernd_areas.diminfo[0].strides) = (((__pyx_v_const_pi * __pyx_v_n_acc_points) * __pyx_v_qradius) * __pyx_v_qradius); /* "cogent/struct/_asa.pyx":140 * #print n_acc_points * areas[idx] = const_pi * n_acc_points * qradius * qradius * free(ridx) # <<<<<<<<<<<<<< * free(ridx_div) * free(t_arr) */ free(__pyx_v_ridx); /* "cogent/struct/_asa.pyx":141 * areas[idx] = const_pi * n_acc_points * qradius * qradius * free(ridx) * free(ridx_div) # <<<<<<<<<<<<<< * free(t_arr) * free(t_lid) */ free(__pyx_v_ridx_div); } /* "cogent/struct/_asa.pyx":142 * free(ridx) * free(ridx_div) * free(t_arr) # <<<<<<<<<<<<<< * free(t_lid) * free(distance) */ free(__pyx_v_t_arr); /* "cogent/struct/_asa.pyx":143 * free(ridx_div) * free(t_arr) * free(t_lid) # <<<<<<<<<<<<<< * free(distance) * free(distance_sq) */ free(__pyx_v_t_lid); /* "cogent/struct/_asa.pyx":144 * free(t_arr) * free(t_lid) * free(distance) # <<<<<<<<<<<<<< * free(distance_sq) * free(rspoint) */ free(__pyx_v_distance); /* "cogent/struct/_asa.pyx":145 * free(t_lid) * free(distance) * free(distance_sq) # <<<<<<<<<<<<<< * free(rspoint) * free(idxptr[0]) */ free(__pyx_v_distance_sq); /* "cogent/struct/_asa.pyx":146 * free(distance) * free(distance_sq) * free(rspoint) # <<<<<<<<<<<<<< * free(idxptr[0]) * free(idxptr) */ free(__pyx_v_rspoint); /* "cogent/struct/_asa.pyx":147 * free(distance_sq) * free(rspoint) * free(idxptr[0]) # <<<<<<<<<<<<<< * free(idxptr) * free(dstptr) */ free((__pyx_v_idxptr[0])); /* "cogent/struct/_asa.pyx":148 * free(rspoint) * 
free(idxptr[0]) * free(idxptr) # <<<<<<<<<<<<<< * free(dstptr) * # import_array() */ free(__pyx_v_idxptr); /* "cogent/struct/_asa.pyx":149 * free(idxptr[0]) * free(idxptr) * free(dstptr) # <<<<<<<<<<<<<< * # import_array() * # cdef np.ndarray output = PyArray_SimpleNewFromData(1, &qpnts, NPY_DOUBLE, areas) */ free(__pyx_v_dstptr); /* "cogent/struct/_asa.pyx":156 * # print PyArray_FLAGS(output) |(NPY_OWNDATA) * # output = PyArray_NewCopy(output, NPY_CORDER) * return areas # <<<<<<<<<<<<<< */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_areas)); __pyx_r = ((PyObject *)__pyx_v_areas); goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_areas.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_box.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_lcoords.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_lradii.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_qcoords.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_qradii.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_spoints.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.struct._asa.asa_loop", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_areas.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_box.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_lcoords.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_lradii.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_qcoords.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_qradii.rcbuffer->pybuffer); 
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_spoints.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_areas); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_r; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__getbuffer__ (wrapper)", 0); __pyx_r = __pyx_pf_5numpy_7ndarray___getbuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info), ((int)__pyx_v_flags)); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":193 * # experimental exception made for __getbuffer__ and __releasebuffer__ * # -- the details of this may change. * def __getbuffer__(ndarray self, Py_buffer* info, int flags): # <<<<<<<<<<<<<< * # This implementation of getbuffer is geared towards Cython * # requirements, and does not yet fullfill the PEP. 
*/ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_v_copy_shape; int __pyx_v_i; int __pyx_v_ndim; int __pyx_v_endian_detector; int __pyx_v_little_endian; int __pyx_v_t; char *__pyx_v_f; PyArray_Descr *__pyx_v_descr = 0; int __pyx_v_offset; int __pyx_v_hasfields; int __pyx_r; __Pyx_RefNannyDeclarations int __pyx_t_1; int __pyx_t_2; int __pyx_t_3; PyObject *__pyx_t_4 = NULL; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; PyObject *__pyx_t_8 = NULL; char *__pyx_t_9; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("__getbuffer__", 0); if (__pyx_v_info != NULL) { __pyx_v_info->obj = Py_None; __Pyx_INCREF(Py_None); __Pyx_GIVEREF(__pyx_v_info->obj); } /* "numpy.pxd":199 * # of flags * * if info == NULL: return # <<<<<<<<<<<<<< * * cdef int copy_shape, i, ndim */ __pyx_t_1 = (__pyx_v_info == NULL); if (__pyx_t_1) { __pyx_r = 0; goto __pyx_L0; goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":202 * * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * */ __pyx_v_endian_detector = 1; /* "numpy.pxd":203 * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * * ndim = PyArray_NDIM(self) */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":205 * cdef bint little_endian = ((&endian_detector)[0] != 0) * * ndim = PyArray_NDIM(self) # <<<<<<<<<<<<<< * * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_v_ndim = PyArray_NDIM(__pyx_v_self); /* "numpy.pxd":207 * ndim = PyArray_NDIM(self) * * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * copy_shape = 1 * else: */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":208 * * if sizeof(npy_intp) != sizeof(Py_ssize_t): * copy_shape = 1 # 
<<<<<<<<<<<<<< * else: * copy_shape = 0 */ __pyx_v_copy_shape = 1; goto __pyx_L4; } /*else*/ { /* "numpy.pxd":210 * copy_shape = 1 * else: * copy_shape = 0 # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) */ __pyx_v_copy_shape = 0; } __pyx_L4:; /* "numpy.pxd":212 * copy_shape = 0 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") */ __pyx_t_1 = ((__pyx_v_flags & PyBUF_C_CONTIGUOUS) == PyBUF_C_CONTIGUOUS); if (__pyx_t_1) { /* "numpy.pxd":213 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not C contiguous") * */ __pyx_t_2 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_C_CONTIGUOUS)); __pyx_t_3 = __pyx_t_2; } else { __pyx_t_3 = __pyx_t_1; } if (__pyx_t_3) { /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_2), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":216 * raise ValueError(u"ndarray is not C contiguous") * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") */ __pyx_t_3 = ((__pyx_v_flags & 
PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS); if (__pyx_t_3) { /* "numpy.pxd":217 * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not Fortran contiguous") * */ __pyx_t_1 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_F_CONTIGUOUS)); __pyx_t_2 = __pyx_t_1; } else { __pyx_t_2 = __pyx_t_3; } if (__pyx_t_2) { /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_4), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":220 * raise ValueError(u"ndarray is not Fortran contiguous") * * info.buf = PyArray_DATA(self) # <<<<<<<<<<<<<< * info.ndim = ndim * if copy_shape: */ __pyx_v_info->buf = PyArray_DATA(__pyx_v_self); /* "numpy.pxd":221 * * info.buf = PyArray_DATA(self) * info.ndim = ndim # <<<<<<<<<<<<<< * if copy_shape: * # Allocate new buffer for strides and shape info. */ __pyx_v_info->ndim = __pyx_v_ndim; /* "numpy.pxd":222 * info.buf = PyArray_DATA(self) * info.ndim = ndim * if copy_shape: # <<<<<<<<<<<<<< * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. */ if (__pyx_v_copy_shape) { /* "numpy.pxd":225 * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. 
* info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) # <<<<<<<<<<<<<< * info.shape = info.strides + ndim * for i in range(ndim): */ __pyx_v_info->strides = ((Py_ssize_t *)malloc((((sizeof(Py_ssize_t)) * ((size_t)__pyx_v_ndim)) * 2))); /* "numpy.pxd":226 * # This is allocated as one block, strides first. * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim # <<<<<<<<<<<<<< * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] */ __pyx_v_info->shape = (__pyx_v_info->strides + __pyx_v_ndim); /* "numpy.pxd":227 * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim * for i in range(ndim): # <<<<<<<<<<<<<< * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] */ __pyx_t_5 = __pyx_v_ndim; for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) { __pyx_v_i = __pyx_t_6; /* "numpy.pxd":228 * info.shape = info.strides + ndim * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] # <<<<<<<<<<<<<< * info.shape[i] = PyArray_DIMS(self)[i] * else: */ (__pyx_v_info->strides[__pyx_v_i]) = (PyArray_STRIDES(__pyx_v_self)[__pyx_v_i]); /* "numpy.pxd":229 * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] # <<<<<<<<<<<<<< * else: * info.strides = PyArray_STRIDES(self) */ (__pyx_v_info->shape[__pyx_v_i]) = (PyArray_DIMS(__pyx_v_self)[__pyx_v_i]); } goto __pyx_L7; } /*else*/ { /* "numpy.pxd":231 * info.shape[i] = PyArray_DIMS(self)[i] * else: * info.strides = PyArray_STRIDES(self) # <<<<<<<<<<<<<< * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL */ __pyx_v_info->strides = ((Py_ssize_t *)PyArray_STRIDES(__pyx_v_self)); /* "numpy.pxd":232 * else: * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) # <<<<<<<<<<<<<< * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) */ __pyx_v_info->shape = ((Py_ssize_t *)PyArray_DIMS(__pyx_v_self)); } 
__pyx_L7:; /* "numpy.pxd":233 * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL # <<<<<<<<<<<<<< * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) */ __pyx_v_info->suboffsets = NULL; /* "numpy.pxd":234 * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) # <<<<<<<<<<<<<< * info.readonly = not PyArray_ISWRITEABLE(self) * */ __pyx_v_info->itemsize = PyArray_ITEMSIZE(__pyx_v_self); /* "numpy.pxd":235 * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) # <<<<<<<<<<<<<< * * cdef int t */ __pyx_v_info->readonly = (!PyArray_ISWRITEABLE(__pyx_v_self)); /* "numpy.pxd":238 * * cdef int t * cdef char* f = NULL # <<<<<<<<<<<<<< * cdef dtype descr = self.descr * cdef list stack */ __pyx_v_f = NULL; /* "numpy.pxd":239 * cdef int t * cdef char* f = NULL * cdef dtype descr = self.descr # <<<<<<<<<<<<<< * cdef list stack * cdef int offset */ __Pyx_INCREF(((PyObject *)__pyx_v_self->descr)); __pyx_v_descr = __pyx_v_self->descr; /* "numpy.pxd":243 * cdef int offset * * cdef bint hasfields = PyDataType_HASFIELDS(descr) # <<<<<<<<<<<<<< * * if not hasfields and not copy_shape: */ __pyx_v_hasfields = PyDataType_HASFIELDS(__pyx_v_descr); /* "numpy.pxd":245 * cdef bint hasfields = PyDataType_HASFIELDS(descr) * * if not hasfields and not copy_shape: # <<<<<<<<<<<<<< * # do not call releasebuffer * info.obj = None */ __pyx_t_2 = (!__pyx_v_hasfields); if (__pyx_t_2) { __pyx_t_3 = (!__pyx_v_copy_shape); __pyx_t_1 = __pyx_t_3; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":247 * if not hasfields and not copy_shape: * # do not call releasebuffer * info.obj = None # <<<<<<<<<<<<<< * else: * # need to call releasebuffer */ __Pyx_INCREF(Py_None); __Pyx_GIVEREF(Py_None); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = Py_None; goto __pyx_L10; } 
/*else*/ { /* "numpy.pxd":250 * else: * # need to call releasebuffer * info.obj = self # <<<<<<<<<<<<<< * * if not hasfields: */ __Pyx_INCREF(((PyObject *)__pyx_v_self)); __Pyx_GIVEREF(((PyObject *)__pyx_v_self)); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = ((PyObject *)__pyx_v_self); } __pyx_L10:; /* "numpy.pxd":252 * info.obj = self * * if not hasfields: # <<<<<<<<<<<<<< * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or */ __pyx_t_1 = (!__pyx_v_hasfields); if (__pyx_t_1) { /* "numpy.pxd":253 * * if not hasfields: * t = descr.type_num # <<<<<<<<<<<<<< * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): */ __pyx_v_t = __pyx_v_descr->type_num; /* "numpy.pxd":254 * if not hasfields: * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_1 = (__pyx_v_descr->byteorder == '>'); if (__pyx_t_1) { __pyx_t_2 = __pyx_v_little_endian; } else { __pyx_t_2 = __pyx_t_1; } if (!__pyx_t_2) { /* "numpy.pxd":255 * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" */ __pyx_t_1 = (__pyx_v_descr->byteorder == '<'); if (__pyx_t_1) { __pyx_t_3 = (!__pyx_v_little_endian); __pyx_t_7 = __pyx_t_3; } else { __pyx_t_7 = __pyx_t_1; } __pyx_t_1 = __pyx_t_7; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_6), NULL); if 
(unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L12; } __pyx_L12:; /* "numpy.pxd":257 * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" */ __pyx_t_1 = (__pyx_v_t == NPY_BYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__b; goto __pyx_L13; } /* "numpy.pxd":258 * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" */ __pyx_t_1 = (__pyx_v_t == NPY_UBYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__B; goto __pyx_L13; } /* "numpy.pxd":259 * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" */ __pyx_t_1 = (__pyx_v_t == NPY_SHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__h; goto __pyx_L13; } /* "numpy.pxd":260 * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" */ __pyx_t_1 = (__pyx_v_t == NPY_USHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__H; goto __pyx_L13; } /* "numpy.pxd":261 * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" */ __pyx_t_1 = (__pyx_v_t == NPY_INT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__i; goto __pyx_L13; } /* "numpy.pxd":262 * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" # <<<<<<<<<<<<<< * elif t 
== NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" */ __pyx_t_1 = (__pyx_v_t == NPY_UINT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__I; goto __pyx_L13; } /* "numpy.pxd":263 * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" */ __pyx_t_1 = (__pyx_v_t == NPY_LONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__l; goto __pyx_L13; } /* "numpy.pxd":264 * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__L; goto __pyx_L13; } /* "numpy.pxd":265 * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__q; goto __pyx_L13; } /* "numpy.pxd":266 * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Q; goto __pyx_L13; } /* "numpy.pxd":267 * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" */ __pyx_t_1 = (__pyx_v_t == NPY_FLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__f; goto __pyx_L13; } /* "numpy.pxd":268 * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" */ __pyx_t_1 = (__pyx_v_t == NPY_DOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__d; goto __pyx_L13; } /* "numpy.pxd":269 * elif t == NPY_FLOAT: f = "f" * 
elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__g; goto __pyx_L13; } /* "numpy.pxd":270 * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" */ __pyx_t_1 = (__pyx_v_t == NPY_CFLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zf; goto __pyx_L13; } /* "numpy.pxd":271 * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" */ __pyx_t_1 = (__pyx_v_t == NPY_CDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zd; goto __pyx_L13; } /* "numpy.pxd":272 * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f = "O" * else: */ __pyx_t_1 = (__pyx_v_t == NPY_CLONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zg; goto __pyx_L13; } /* "numpy.pxd":273 * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_1 = (__pyx_v_t == NPY_OBJECT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__O; goto __pyx_L13; } /*else*/ { /* "numpy.pxd":275 * elif t == NPY_OBJECT: f = "O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * info.format = f * return */ __pyx_t_4 = PyInt_FromLong(__pyx_v_t); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_8 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_t_4); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_8)); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_8)); __Pyx_GIVEREF(((PyObject *)__pyx_t_8)); __pyx_t_8 = 0; __pyx_t_8 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_8); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_8, 0, 0, 0); __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L13:; /* "numpy.pxd":276 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f # <<<<<<<<<<<<<< * return * else: */ __pyx_v_info->format = __pyx_v_f; /* "numpy.pxd":277 * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f * return # <<<<<<<<<<<<<< * else: * info.format = stdlib.malloc(_buffer_format_string_len) */ __pyx_r = 0; goto __pyx_L0; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":279 * return * else: * info.format = stdlib.malloc(_buffer_format_string_len) # <<<<<<<<<<<<<< * info.format[0] = '^' # Native data types, manual alignment * offset = 0 */ __pyx_v_info->format = ((char *)malloc(255)); /* "numpy.pxd":280 * else: * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment # <<<<<<<<<<<<<< * offset = 0 * f = _util_dtypestring(descr, info.format + 1, */ (__pyx_v_info->format[0]) = '^'; /* "numpy.pxd":281 * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment * offset = 0 # <<<<<<<<<<<<<< * f = 
_util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, */ __pyx_v_offset = 0; /* "numpy.pxd":284 * f = _util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, * &offset) # <<<<<<<<<<<<<< * f[0] = 0 # Terminate format string * */ __pyx_t_9 = __pyx_f_5numpy__util_dtypestring(__pyx_v_descr, (__pyx_v_info->format + 1), (__pyx_v_info->format + 255), (&__pyx_v_offset)); if (unlikely(__pyx_t_9 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_9; /* "numpy.pxd":285 * info.format + _buffer_format_string_len, * &offset) * f[0] = 0 # Terminate format string # <<<<<<<<<<<<<< * * def __releasebuffer__(ndarray self, Py_buffer* info): */ (__pyx_v_f[0]) = 0; } __pyx_L11:; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_8); __Pyx_AddTraceback("numpy.ndarray.__getbuffer__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = -1; if (__pyx_v_info != NULL && __pyx_v_info->obj != NULL) { __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = NULL; } goto __pyx_L2; __pyx_L0:; if (__pyx_v_info != NULL && __pyx_v_info->obj == Py_None) { __Pyx_GOTREF(Py_None); __Pyx_DECREF(Py_None); __pyx_v_info->obj = NULL; } __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_descr); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info); /*proto*/ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__releasebuffer__ (wrapper)", 0); __pyx_pf_5numpy_7ndarray_2__releasebuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info)); __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":287 * f[0] = 0 # Terminate format string * * def __releasebuffer__(ndarray self, 
Py_buffer* info): # <<<<<<<<<<<<<< * if PyArray_HASFIELDS(self): * stdlib.free(info.format) */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("__releasebuffer__", 0); /* "numpy.pxd":288 * * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): # <<<<<<<<<<<<<< * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_t_1 = PyArray_HASFIELDS(__pyx_v_self); if (__pyx_t_1) { /* "numpy.pxd":289 * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): * stdlib.free(info.format) # <<<<<<<<<<<<<< * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) */ free(__pyx_v_info->format); goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":290 * if PyArray_HASFIELDS(self): * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * stdlib.free(info.strides) * # info.shape was stored after info.strides in the same block */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":291 * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) # <<<<<<<<<<<<<< * # info.shape was stored after info.strides in the same block * */ free(__pyx_v_info->strides); goto __pyx_L4; } __pyx_L4:; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":767 * ctypedef npy_cdouble complex_t * * cdef inline object PyArray_MultiIterNew1(a): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(1, a) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew1(PyObject *__pyx_v_a) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew1", 0); /* "numpy.pxd":768 * * cdef inline object PyArray_MultiIterNew1(a): * return 
PyArray_MultiIterNew(1, a) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew2(a, b): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(1, ((void *)__pyx_v_a)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 768; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew1", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":770 * return PyArray_MultiIterNew(1, a) * * cdef inline object PyArray_MultiIterNew2(a, b): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(2, a, b) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew2(PyObject *__pyx_v_a, PyObject *__pyx_v_b) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew2", 0); /* "numpy.pxd":771 * * cdef inline object PyArray_MultiIterNew2(a, b): * return PyArray_MultiIterNew(2, a, b) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew3(a, b, c): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(2, ((void *)__pyx_v_a), ((void *)__pyx_v_b)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 771; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew2", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":773 * return 
PyArray_MultiIterNew(2, a, b) * * cdef inline object PyArray_MultiIterNew3(a, b, c): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(3, a, b, c) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew3(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew3", 0); /* "numpy.pxd":774 * * cdef inline object PyArray_MultiIterNew3(a, b, c): * return PyArray_MultiIterNew(3, a, b, c) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(3, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 774; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew3", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":776 * return PyArray_MultiIterNew(3, a, b, c) * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(4, a, b, c, d) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew4(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew4", 0); /* "numpy.pxd":777 * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): * return PyArray_MultiIterNew(4, a, b, c, d) # 
<<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(4, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 777; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew4", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":779 * return PyArray_MultiIterNew(4, a, b, c, d) * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(5, a, b, c, d, e) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew5(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d, PyObject *__pyx_v_e) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew5", 0); /* "numpy.pxd":780 * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): * return PyArray_MultiIterNew(5, a, b, c, d, e) # <<<<<<<<<<<<<< * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(5, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d), ((void *)__pyx_v_e)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 780; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; 
__Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew5", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":782 * return PyArray_MultiIterNew(5, a, b, c, d, e) * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: # <<<<<<<<<<<<<< * # Recursive utility function used in __getbuffer__ to get format * # string. The new location in the format string is returned. */ static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *__pyx_v_descr, char *__pyx_v_f, char *__pyx_v_end, int *__pyx_v_offset) { PyArray_Descr *__pyx_v_child = 0; int __pyx_v_endian_detector; int __pyx_v_little_endian; PyObject *__pyx_v_fields = 0; PyObject *__pyx_v_childname = NULL; PyObject *__pyx_v_new_offset = NULL; PyObject *__pyx_v_t = NULL; char *__pyx_r; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; Py_ssize_t __pyx_t_2; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; PyObject *__pyx_t_5 = NULL; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; long __pyx_t_10; char *__pyx_t_11; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("_util_dtypestring", 0); /* "numpy.pxd":789 * cdef int delta_offset * cdef tuple i * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * cdef tuple fields */ __pyx_v_endian_detector = 1; /* "numpy.pxd":790 * cdef tuple i * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * cdef tuple fields * */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":793 * cdef tuple fields * * for childname in descr.names: # <<<<<<<<<<<<<< * fields = descr.fields[childname] * child, new_offset = fields */ if (unlikely(((PyObject *)__pyx_v_descr->names) == Py_None)) { 
PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 793; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_1 = ((PyObject *)__pyx_v_descr->names); __Pyx_INCREF(__pyx_t_1); __pyx_t_2 = 0; for (;;) { if (__pyx_t_2 >= PyTuple_GET_SIZE(__pyx_t_1)) break; __pyx_t_3 = PyTuple_GET_ITEM(__pyx_t_1, __pyx_t_2); __Pyx_INCREF(__pyx_t_3); __pyx_t_2++; __Pyx_XDECREF(__pyx_v_childname); __pyx_v_childname = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":794 * * for childname in descr.names: * fields = descr.fields[childname] # <<<<<<<<<<<<<< * child, new_offset = fields * */ __pyx_t_3 = PyObject_GetItem(__pyx_v_descr->fields, __pyx_v_childname); if (!__pyx_t_3) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); if (!(likely(PyTuple_CheckExact(__pyx_t_3))||((__pyx_t_3) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected tuple, got %.200s", Py_TYPE(__pyx_t_3)->tp_name), 0))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_fields)); __pyx_v_fields = ((PyObject*)__pyx_t_3); __pyx_t_3 = 0; /* "numpy.pxd":795 * for childname in descr.names: * fields = descr.fields[childname] * child, new_offset = fields # <<<<<<<<<<<<<< * * if (end - f) - (new_offset - offset[0]) < 15: */ if (likely(PyTuple_CheckExact(((PyObject *)__pyx_v_fields)))) { PyObject* sequence = ((PyObject *)__pyx_v_fields); if (unlikely(PyTuple_GET_SIZE(sequence) != 2)) { if (PyTuple_GET_SIZE(sequence) > 2) __Pyx_RaiseTooManyValuesError(2); else __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(sequence)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_3 = PyTuple_GET_ITEM(sequence, 0); __pyx_t_4 = PyTuple_GET_ITEM(sequence, 1); __Pyx_INCREF(__pyx_t_3); __Pyx_INCREF(__pyx_t_4); } else { __Pyx_UnpackTupleError(((PyObject *)__pyx_v_fields), 2); 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } if (!(likely(((__pyx_t_3) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_3, __pyx_ptype_5numpy_dtype))))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_child)); __pyx_v_child = ((PyArray_Descr *)__pyx_t_3); __pyx_t_3 = 0; __Pyx_XDECREF(__pyx_v_new_offset); __pyx_v_new_offset = __pyx_t_4; __pyx_t_4 = 0; /* "numpy.pxd":797 * child, new_offset = fields * * if (end - f) - (new_offset - offset[0]) < 15: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * */ __pyx_t_4 = PyInt_FromLong((__pyx_v_end - __pyx_v_f)); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyNumber_Subtract(__pyx_v_new_offset, __pyx_t_3); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyNumber_Subtract(__pyx_t_4, __pyx_t_5); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = PyObject_RichCompare(__pyx_t_3, __pyx_int_15, Py_LT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_t_5 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_9), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":800 * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * * if ((child.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_6 = (__pyx_v_child->byteorder == '>'); if (__pyx_t_6) { __pyx_t_7 = __pyx_v_little_endian; } else { __pyx_t_7 = __pyx_t_6; } if (!__pyx_t_7) { /* "numpy.pxd":801 * * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * # One could encode it in the format string and have Cython */ __pyx_t_6 = (__pyx_v_child->byteorder == '<'); if (__pyx_t_6) { __pyx_t_8 = (!__pyx_v_little_endian); __pyx_t_9 = __pyx_t_8; } else { __pyx_t_9 = __pyx_t_6; } __pyx_t_6 = __pyx_t_9; } else { __pyx_t_6 = __pyx_t_7; } if (__pyx_t_6) { /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # 
complain instead, BUT: < and > in format strings also imply */ __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_10), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":812 * * # Output padding bytes * while offset[0] < new_offset: # <<<<<<<<<<<<<< * f[0] = 120 # "x"; pad byte * f += 1 */ while (1) { __pyx_t_5 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_v_new_offset, Py_LT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!__pyx_t_6) break; /* "numpy.pxd":813 * # Output padding bytes * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte # <<<<<<<<<<<<<< * f += 1 * offset[0] += 1 */ (__pyx_v_f[0]) = 120; /* "numpy.pxd":814 * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte * f += 1 # <<<<<<<<<<<<<< * offset[0] += 1 * */ __pyx_v_f = (__pyx_v_f + 1); /* "numpy.pxd":815 * f[0] = 120 # "x"; pad byte * f += 1 * offset[0] += 1 # <<<<<<<<<<<<<< * * offset[0] += child.itemsize */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + 1); } /* "numpy.pxd":817 * offset[0] += 1 * * offset[0] += child.itemsize # <<<<<<<<<<<<<< * * if not 
PyDataType_HASFIELDS(child): */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + __pyx_v_child->elsize); /* "numpy.pxd":819 * offset[0] += child.itemsize * * if not PyDataType_HASFIELDS(child): # <<<<<<<<<<<<<< * t = child.type_num * if end - f < 5: */ __pyx_t_6 = (!PyDataType_HASFIELDS(__pyx_v_child)); if (__pyx_t_6) { /* "numpy.pxd":820 * * if not PyDataType_HASFIELDS(child): * t = child.type_num # <<<<<<<<<<<<<< * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") */ __pyx_t_3 = PyInt_FromLong(__pyx_v_child->type_num); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 820; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_XDECREF(__pyx_v_t); __pyx_v_t = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":821 * if not PyDataType_HASFIELDS(child): * t = child.type_num * if end - f < 5: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short.") * */ __pyx_t_6 = ((__pyx_v_end - __pyx_v_f) < 5); if (__pyx_t_6) { /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_t_3 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_12), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_Raise(__pyx_t_3, 0, 0, 0); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L10; } __pyx_L10:; /* "numpy.pxd":825 * * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" */ __pyx_t_3 = PyInt_FromLong(NPY_BYTE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 98; goto __pyx_L11; } /* "numpy.pxd":826 * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" */ __pyx_t_5 = PyInt_FromLong(NPY_UBYTE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 66; goto __pyx_L11; } /* "numpy.pxd":827 * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" */ __pyx_t_3 = PyInt_FromLong(NPY_SHORT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 
= PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 104; goto __pyx_L11; } /* "numpy.pxd":828 * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" */ __pyx_t_5 = PyInt_FromLong(NPY_USHORT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 72; goto __pyx_L11; } /* "numpy.pxd":829 * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" */ __pyx_t_3 = PyInt_FromLong(NPY_INT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 
829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 105; goto __pyx_L11; } /* "numpy.pxd":830 * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" # <<<<<<<<<<<<<< * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" */ __pyx_t_5 = PyInt_FromLong(NPY_UINT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 73; goto __pyx_L11; } /* "numpy.pxd":831 * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" */ __pyx_t_3 = PyInt_FromLong(NPY_LONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; 
__pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 108; goto __pyx_L11; } /* "numpy.pxd":832 * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 76; goto __pyx_L11; } /* "numpy.pxd":833 * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" */ __pyx_t_3 = PyInt_FromLong(NPY_LONGLONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = 
__pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 113; goto __pyx_L11; } /* "numpy.pxd":834 * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONGLONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 81; goto __pyx_L11; } /* "numpy.pxd":835 * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" */ __pyx_t_3 = PyInt_FromLong(NPY_FLOAT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 102; goto __pyx_L11; } /* "numpy.pxd":836 * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf */ __pyx_t_5 = PyInt_FromLong(NPY_DOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 100; goto __pyx_L11; } /* "numpy.pxd":837 * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd */ __pyx_t_3 = PyInt_FromLong(NPY_LONGDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); 
__pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 103; goto __pyx_L11; } /* "numpy.pxd":838 * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg */ __pyx_t_5 = PyInt_FromLong(NPY_CFLOAT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 102; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":839 * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" */ __pyx_t_3 = PyInt_FromLong(NPY_CDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 100; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":840 * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: */ __pyx_t_5 = PyInt_FromLong(NPY_CLONGDOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 103; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":841 * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_3 = PyInt_FromLong(NPY_OBJECT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = 
__Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 79; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":843 * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * f += 1 * else: */ __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_v_t); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L11:; /* "numpy.pxd":844 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * f += 1 # <<<<<<<<<<<<<< * else: * # Cython ignores struct boundary information ("T{...}"), */ __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L9; } /*else*/ { /* "numpy.pxd":848 * # Cython ignores struct boundary information ("T{...}"), * # so don't output it * f = _util_dtypestring(child, f, end, offset) # <<<<<<<<<<<<<< * return f * */ __pyx_t_11 = __pyx_f_5numpy__util_dtypestring(__pyx_v_child, __pyx_v_f, __pyx_v_end, __pyx_v_offset); if (unlikely(__pyx_t_11 == NULL)) 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 848; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_11; } __pyx_L9:; } __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "numpy.pxd":849 * # so don't output it * f = _util_dtypestring(child, f, end, offset) * return f # <<<<<<<<<<<<<< * * */ __pyx_r = __pyx_v_f; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_AddTraceback("numpy._util_dtypestring", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF((PyObject *)__pyx_v_child); __Pyx_XDECREF(__pyx_v_fields); __Pyx_XDECREF(__pyx_v_childname); __Pyx_XDECREF(__pyx_v_new_offset); __Pyx_XDECREF(__pyx_v_t); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":964 * * * cdef inline void set_array_base(ndarray arr, object base): # <<<<<<<<<<<<<< * cdef PyObject* baseptr * if base is None: */ static CYTHON_INLINE void __pyx_f_5numpy_set_array_base(PyArrayObject *__pyx_v_arr, PyObject *__pyx_v_base) { PyObject *__pyx_v_baseptr; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("set_array_base", 0); /* "numpy.pxd":966 * cdef inline void set_array_base(ndarray arr, object base): * cdef PyObject* baseptr * if base is None: # <<<<<<<<<<<<<< * baseptr = NULL * else: */ __pyx_t_1 = (__pyx_v_base == Py_None); if (__pyx_t_1) { /* "numpy.pxd":967 * cdef PyObject* baseptr * if base is None: * baseptr = NULL # <<<<<<<<<<<<<< * else: * Py_INCREF(base) # important to do this before decref below! */ __pyx_v_baseptr = NULL; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":969 * baseptr = NULL * else: * Py_INCREF(base) # important to do this before decref below! # <<<<<<<<<<<<<< * baseptr = base * Py_XDECREF(arr.base) */ Py_INCREF(__pyx_v_base); /* "numpy.pxd":970 * else: * Py_INCREF(base) # important to do this before decref below! 
* baseptr = base # <<<<<<<<<<<<<< * Py_XDECREF(arr.base) * arr.base = baseptr */ __pyx_v_baseptr = ((PyObject *)__pyx_v_base); } __pyx_L3:; /* "numpy.pxd":971 * Py_INCREF(base) # important to do this before decref below! * baseptr = base * Py_XDECREF(arr.base) # <<<<<<<<<<<<<< * arr.base = baseptr * */ Py_XDECREF(__pyx_v_arr->base); /* "numpy.pxd":972 * baseptr = base * Py_XDECREF(arr.base) * arr.base = baseptr # <<<<<<<<<<<<<< * * cdef inline object get_array_base(ndarray arr): */ __pyx_v_arr->base = __pyx_v_baseptr; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObject *__pyx_v_arr) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("get_array_base", 0); /* "numpy.pxd":975 * * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: # <<<<<<<<<<<<<< * return None * else: */ __pyx_t_1 = (__pyx_v_arr->base == NULL); if (__pyx_t_1) { /* "numpy.pxd":976 * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: * return None # <<<<<<<<<<<<<< * else: * return arr.base */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(Py_None); __pyx_r = Py_None; goto __pyx_L0; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":978 * return None * else: * return arr.base # <<<<<<<<<<<<<< */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_arr->base)); __pyx_r = ((PyObject *)__pyx_v_arr->base); goto __pyx_L0; } __pyx_L3:; __pyx_r = Py_None; __Pyx_INCREF(Py_None); __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyMethodDef __pyx_methods[] = { {0, 0, 0, 0} }; #if PY_MAJOR_VERSION >= 3 static struct PyModuleDef __pyx_moduledef = { PyModuleDef_HEAD_INIT, __Pyx_NAMESTR("_asa"), 0, /* m_doc */ -1, /* m_size */ __pyx_methods /* m_methods */, NULL, /* m_reload */ NULL, /* m_traverse */ 
NULL, /* m_clear */ NULL /* m_free */ }; #endif static __Pyx_StringTabEntry __pyx_string_tab[] = { {&__pyx_kp_u_1, __pyx_k_1, sizeof(__pyx_k_1), 0, 1, 0, 0}, {&__pyx_kp_u_11, __pyx_k_11, sizeof(__pyx_k_11), 0, 1, 0, 0}, {&__pyx_kp_s_13, __pyx_k_13, sizeof(__pyx_k_13), 0, 0, 1, 0}, {&__pyx_kp_s_16, __pyx_k_16, sizeof(__pyx_k_16), 0, 0, 1, 0}, {&__pyx_n_s_17, __pyx_k_17, sizeof(__pyx_k_17), 0, 0, 1, 1}, {&__pyx_kp_u_3, __pyx_k_3, sizeof(__pyx_k_3), 0, 1, 0, 0}, {&__pyx_kp_u_5, __pyx_k_5, sizeof(__pyx_k_5), 0, 1, 0, 0}, {&__pyx_kp_u_7, __pyx_k_7, sizeof(__pyx_k_7), 0, 1, 0, 0}, {&__pyx_kp_u_8, __pyx_k_8, sizeof(__pyx_k_8), 0, 1, 0, 0}, {&__pyx_n_s__MAXSYM, __pyx_k__MAXSYM, sizeof(__pyx_k__MAXSYM), 0, 0, 1, 1}, {&__pyx_n_s__RuntimeError, __pyx_k__RuntimeError, sizeof(__pyx_k__RuntimeError), 0, 0, 1, 1}, {&__pyx_n_s__ValueError, __pyx_k__ValueError, sizeof(__pyx_k__ValueError), 0, 0, 1, 1}, {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1}, {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1}, {&__pyx_n_s____version__, __pyx_k____version__, sizeof(__pyx_k____version__), 0, 0, 1, 1}, {&__pyx_n_s__areas, __pyx_k__areas, sizeof(__pyx_k__areas), 0, 0, 1, 1}, {&__pyx_n_s__asa_loop, __pyx_k__asa_loop, sizeof(__pyx_k__asa_loop), 0, 0, 1, 1}, {&__pyx_n_s__box, __pyx_k__box, sizeof(__pyx_k__box), 0, 0, 1, 1}, {&__pyx_n_s__box_c, __pyx_k__box_c, sizeof(__pyx_k__box_c), 0, 0, 1, 1}, {&__pyx_n_s__bucket_size, __pyx_k__bucket_size, sizeof(__pyx_k__bucket_size), 0, 0, 1, 1}, {&__pyx_n_s__const_pi, __pyx_k__const_pi, sizeof(__pyx_k__const_pi), 0, 0, 1, 1}, {&__pyx_n_s__distance, __pyx_k__distance, sizeof(__pyx_k__distance), 0, 0, 1, 1}, {&__pyx_n_s__distance_sq, __pyx_k__distance_sq, sizeof(__pyx_k__distance_sq), 0, 0, 1, 1}, {&__pyx_n_s__dstptr, __pyx_k__dstptr, sizeof(__pyx_k__dstptr), 0, 0, 1, 1}, {&__pyx_n_s__dtype, __pyx_k__dtype, sizeof(__pyx_k__dtype), 0, 0, 1, 1}, {&__pyx_n_s__float64, __pyx_k__float64, 
sizeof(__pyx_k__float64), 0, 0, 1, 1}, {&__pyx_n_s__idx, __pyx_k__idx, sizeof(__pyx_k__idx), 0, 0, 1, 1}, {&__pyx_n_s__idxn, __pyx_k__idxn, sizeof(__pyx_k__idxn), 0, 0, 1, 1}, {&__pyx_n_s__idxn_skip, __pyx_k__idxn_skip, sizeof(__pyx_k__idxn_skip), 0, 0, 1, 1}, {&__pyx_n_s__idxptr, __pyx_k__idxptr, sizeof(__pyx_k__idxptr), 0, 0, 1, 1}, {&__pyx_n_s__idxs, __pyx_k__idxs, sizeof(__pyx_k__idxs), 0, 0, 1, 1}, {&__pyx_n_s__is_accessible, __pyx_k__is_accessible, sizeof(__pyx_k__is_accessible), 0, 0, 1, 1}, {&__pyx_n_s__kdpnts, __pyx_k__kdpnts, sizeof(__pyx_k__kdpnts), 0, 0, 1, 1}, {&__pyx_n_s__lcoords, __pyx_k__lcoords, sizeof(__pyx_k__lcoords), 0, 0, 1, 1}, {&__pyx_n_s__lcoords_c, __pyx_k__lcoords_c, sizeof(__pyx_k__lcoords_c), 0, 0, 1, 1}, {&__pyx_n_s__lidx, __pyx_k__lidx, sizeof(__pyx_k__lidx), 0, 0, 1, 1}, {&__pyx_n_s__lidx_c, __pyx_k__lidx_c, sizeof(__pyx_k__lidx_c), 0, 0, 1, 1}, {&__pyx_n_s__lradii, __pyx_k__lradii, sizeof(__pyx_k__lradii), 0, 0, 1, 1}, {&__pyx_n_s__lradius, __pyx_k__lradius, sizeof(__pyx_k__lradius), 0, 0, 1, 1}, {&__pyx_n_s__lradius_max, __pyx_k__lradius_max, sizeof(__pyx_k__lradius_max), 0, 0, 1, 1}, {&__pyx_n_s__lsize, __pyx_k__lsize, sizeof(__pyx_k__lsize), 0, 0, 1, 1}, {&__pyx_n_s__n_acc_points, __pyx_k__n_acc_points, sizeof(__pyx_k__n_acc_points), 0, 0, 1, 1}, {&__pyx_n_s__neighbor_number, __pyx_k__neighbor_number, sizeof(__pyx_k__neighbor_number), 0, 0, 1, 1}, {&__pyx_n_s__np, __pyx_k__np, sizeof(__pyx_k__np), 0, 0, 1, 1}, {&__pyx_n_s__numpy, __pyx_k__numpy, sizeof(__pyx_k__numpy), 0, 0, 1, 1}, {&__pyx_n_s__probe, __pyx_k__probe, sizeof(__pyx_k__probe), 0, 0, 1, 1}, {&__pyx_n_s__qcoords, __pyx_k__qcoords, sizeof(__pyx_k__qcoords), 0, 0, 1, 1}, {&__pyx_n_s__qcoords_c, __pyx_k__qcoords_c, sizeof(__pyx_k__qcoords_c), 0, 0, 1, 1}, {&__pyx_n_s__qpnts, __pyx_k__qpnts, sizeof(__pyx_k__qpnts), 0, 0, 1, 1}, {&__pyx_n_s__qradii, __pyx_k__qradii, sizeof(__pyx_k__qradii), 0, 0, 1, 1}, {&__pyx_n_s__qradius, __pyx_k__qradius, sizeof(__pyx_k__qradius), 0, 
0, 1, 1}, {&__pyx_n_s__range, __pyx_k__range, sizeof(__pyx_k__range), 0, 0, 1, 1}, {&__pyx_n_s__ridx, __pyx_k__ridx, sizeof(__pyx_k__ridx), 0, 0, 1, 1}, {&__pyx_n_s__ridx_div, __pyx_k__ridx_div, sizeof(__pyx_k__ridx_div), 0, 0, 1, 1}, {&__pyx_n_s__rspoint, __pyx_k__rspoint, sizeof(__pyx_k__rspoint), 0, 0, 1, 1}, {&__pyx_n_s__search_limit, __pyx_k__search_limit, sizeof(__pyx_k__search_limit), 0, 0, 1, 1}, {&__pyx_n_s__search_point, __pyx_k__search_point, sizeof(__pyx_k__search_point), 0, 0, 1, 1}, {&__pyx_n_s__spoints, __pyx_k__spoints, sizeof(__pyx_k__spoints), 0, 0, 1, 1}, {&__pyx_n_s__spoints_c, __pyx_k__spoints_c, sizeof(__pyx_k__spoints_c), 0, 0, 1, 1}, {&__pyx_n_s__ssize, __pyx_k__ssize, sizeof(__pyx_k__ssize), 0, 0, 1, 1}, {&__pyx_n_s__t_arr, __pyx_k__t_arr, sizeof(__pyx_k__t_arr), 0, 0, 1, 1}, {&__pyx_n_s__t_lid, __pyx_k__t_lid, sizeof(__pyx_k__t_lid), 0, 0, 1, 1}, {&__pyx_n_s__t_ptr, __pyx_k__t_ptr, sizeof(__pyx_k__t_ptr), 0, 0, 1, 1}, {&__pyx_n_s__tree, __pyx_k__tree, sizeof(__pyx_k__tree), 0, 0, 1, 1}, {0, 0, 0, 0, 0, 0, 0} }; static int __Pyx_InitCachedBuiltins(void) { __pyx_builtin_ValueError = __Pyx_GetName(__pyx_b, __pyx_n_s__ValueError); if (!__pyx_builtin_ValueError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_range = __Pyx_GetName(__pyx_b, __pyx_n_s__range); if (!__pyx_builtin_range) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 227; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_RuntimeError = __Pyx_GetName(__pyx_b, __pyx_n_s__RuntimeError); if (!__pyx_builtin_RuntimeError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} return 0; __pyx_L1_error:; return -1; } static int __Pyx_InitCachedConstants(void) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__Pyx_InitCachedConstants", 0); /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, 
NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_k_tuple_2 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_2); __Pyx_INCREF(((PyObject *)__pyx_kp_u_1)); PyTuple_SET_ITEM(__pyx_k_tuple_2, 0, ((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_2)); /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_k_tuple_4 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_4); __Pyx_INCREF(((PyObject *)__pyx_kp_u_3)); PyTuple_SET_ITEM(__pyx_k_tuple_4, 0, ((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_4)); /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_k_tuple_6 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_6)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_6); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_6, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_6)); /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in 
numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_k_tuple_9 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_9)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_9); __Pyx_INCREF(((PyObject *)__pyx_kp_u_8)); PyTuple_SET_ITEM(__pyx_k_tuple_9, 0, ((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_9)); /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # complain instead, BUT: < and > in format strings also imply */ __pyx_k_tuple_10 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_10)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_10); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_10, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_10)); /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_k_tuple_12 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_12)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_12); __Pyx_INCREF(((PyObject *)__pyx_kp_u_11)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 0, ((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_12)); /* "cogent/struct/_asa.pyx":21 * * @cython.boundscheck(False) * def asa_loop(np.ndarray[DTYPE_t, ndim =2] qcoords, np.ndarray[DTYPE_t, ndim =2] lcoords,\ # <<<<<<<<<<<<<< * np.ndarray[DTYPE_t, 
ndim =1] qradii, np.ndarray[DTYPE_t, ndim =1] lradii,\ * np.ndarray[DTYPE_t, ndim =2] spoints, np.ndarray[DTYPE_t, ndim =1] box,\ */ __pyx_k_tuple_14 = PyTuple_New(44); if (unlikely(!__pyx_k_tuple_14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_14); __Pyx_INCREF(((PyObject *)__pyx_n_s__qcoords)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 0, ((PyObject *)__pyx_n_s__qcoords)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__qcoords)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lcoords)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 1, ((PyObject *)__pyx_n_s__lcoords)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lcoords)); __Pyx_INCREF(((PyObject *)__pyx_n_s__qradii)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 2, ((PyObject *)__pyx_n_s__qradii)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__qradii)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lradii)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 3, ((PyObject *)__pyx_n_s__lradii)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lradii)); __Pyx_INCREF(((PyObject *)__pyx_n_s__spoints)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 4, ((PyObject *)__pyx_n_s__spoints)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__spoints)); __Pyx_INCREF(((PyObject *)__pyx_n_s__box)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 5, ((PyObject *)__pyx_n_s__box)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__box)); __Pyx_INCREF(((PyObject *)__pyx_n_s__probe)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 6, ((PyObject *)__pyx_n_s__probe)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__probe)); __Pyx_INCREF(((PyObject *)__pyx_n_s__bucket_size)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 7, ((PyObject *)__pyx_n_s__bucket_size)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__bucket_size)); __Pyx_INCREF(((PyObject *)__pyx_n_s__MAXSYM)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 8, ((PyObject *)__pyx_n_s__MAXSYM)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__MAXSYM)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idx)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 9, ((PyObject *)__pyx_n_s__idx)); __Pyx_GIVEREF(((PyObject 
*)__pyx_n_s__idx)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idxn)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 10, ((PyObject *)__pyx_n_s__idxn)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__idxn)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idxn_skip)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 11, ((PyObject *)__pyx_n_s__idxn_skip)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__idxn_skip)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idxs)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 12, ((PyObject *)__pyx_n_s__idxs)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__idxs)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lidx)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 13, ((PyObject *)__pyx_n_s__lidx)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lidx)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lidx_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 14, ((PyObject *)__pyx_n_s__lidx_c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lidx_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__is_accessible)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 15, ((PyObject *)__pyx_n_s__is_accessible)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__is_accessible)); __Pyx_INCREF(((PyObject *)__pyx_n_s__n_acc_points)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 16, ((PyObject *)__pyx_n_s__n_acc_points)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__n_acc_points)); __Pyx_INCREF(((PyObject *)__pyx_n_s__qradius)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 17, ((PyObject *)__pyx_n_s__qradius)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__qradius)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lradius)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 18, ((PyObject *)__pyx_n_s__lradius)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lradius)); __Pyx_INCREF(((PyObject *)__pyx_n_s__search_limit)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 19, ((PyObject *)__pyx_n_s__search_limit)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__search_limit)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lradius_max)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 20, ((PyObject *)__pyx_n_s__lradius_max)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lradius_max)); __Pyx_INCREF(((PyObject *)__pyx_n_s__rspoint)); 
PyTuple_SET_ITEM(__pyx_k_tuple_14, 21, ((PyObject *)__pyx_n_s__rspoint)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__rspoint)); __Pyx_INCREF(((PyObject *)__pyx_n_s__distance)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 22, ((PyObject *)__pyx_n_s__distance)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__distance)); __Pyx_INCREF(((PyObject *)__pyx_n_s__distance_sq)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 23, ((PyObject *)__pyx_n_s__distance_sq)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__distance_sq)); __Pyx_INCREF(((PyObject *)__pyx_n_s__dstptr)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 24, ((PyObject *)__pyx_n_s__dstptr)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__dstptr)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idxptr)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 25, ((PyObject *)__pyx_n_s__idxptr)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__idxptr)); __Pyx_INCREF(((PyObject *)__pyx_n_s__areas)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 26, ((PyObject *)__pyx_n_s__areas)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__areas)); __Pyx_INCREF(((PyObject *)__pyx_n_s__qcoords_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 27, ((PyObject *)__pyx_n_s__qcoords_c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__qcoords_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lcoords_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 28, ((PyObject *)__pyx_n_s__lcoords_c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lcoords_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__spoints_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 29, ((PyObject *)__pyx_n_s__spoints_c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__spoints_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__ssize)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 30, ((PyObject *)__pyx_n_s__ssize)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__ssize)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lsize)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 31, ((PyObject *)__pyx_n_s__lsize)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lsize)); __Pyx_INCREF(((PyObject *)__pyx_n_s__qpnts)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 32, ((PyObject *)__pyx_n_s__qpnts)); __Pyx_GIVEREF(((PyObject 
*)__pyx_n_s__qpnts)); __Pyx_INCREF(((PyObject *)__pyx_n_s__const_pi)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 33, ((PyObject *)__pyx_n_s__const_pi)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__const_pi)); __Pyx_INCREF(((PyObject *)__pyx_n_s__ridx)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 34, ((PyObject *)__pyx_n_s__ridx)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__ridx)); __Pyx_INCREF(((PyObject *)__pyx_n_s__ridx_div)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 35, ((PyObject *)__pyx_n_s__ridx_div)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__ridx_div)); __Pyx_INCREF(((PyObject *)__pyx_n_s__box_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 36, ((PyObject *)__pyx_n_s__box_c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__box_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_ptr)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 37, ((PyObject *)__pyx_n_s__t_ptr)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_ptr)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_arr)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 38, ((PyObject *)__pyx_n_s__t_arr)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_arr)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_lid)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 39, ((PyObject *)__pyx_n_s__t_lid)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_lid)); __Pyx_INCREF(((PyObject *)__pyx_n_s__search_point)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 40, ((PyObject *)__pyx_n_s__search_point)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__search_point)); __Pyx_INCREF(((PyObject *)__pyx_n_s__neighbor_number)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 41, ((PyObject *)__pyx_n_s__neighbor_number)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__neighbor_number)); __Pyx_INCREF(((PyObject *)__pyx_n_s__kdpnts)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 42, ((PyObject *)__pyx_n_s__kdpnts)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__kdpnts)); __Pyx_INCREF(((PyObject *)__pyx_n_s__tree)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 43, ((PyObject *)__pyx_n_s__tree)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__tree)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_14)); __pyx_k_codeobj_15 = 
(PyObject*)__Pyx_PyCode_New(9, 0, 44, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_16, __pyx_n_s__asa_loop, 21, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_RefNannyFinishContext(); return 0; __pyx_L1_error:; __Pyx_RefNannyFinishContext(); return -1; } static int __Pyx_InitGlobals(void) { if (__Pyx_InitStrings(__pyx_string_tab) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_3 = PyInt_FromLong(3); if (unlikely(!__pyx_int_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_15 = PyInt_FromLong(15); if (unlikely(!__pyx_int_15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_200000 = PyInt_FromLong(200000); if (unlikely(!__pyx_int_200000)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; return 0; __pyx_L1_error:; return -1; } #if PY_MAJOR_VERSION < 3 PyMODINIT_FUNC init_asa(void); /*proto*/ PyMODINIT_FUNC init_asa(void) #else PyMODINIT_FUNC PyInit__asa(void); /*proto*/ PyMODINIT_FUNC PyInit__asa(void) #endif { PyObject *__pyx_t_1 = NULL; PyObject *__pyx_t_2 = NULL; __Pyx_RefNannyDeclarations #if CYTHON_REFNANNY __Pyx_RefNanny = __Pyx_RefNannyImportAPI("refnanny"); if (!__Pyx_RefNanny) { PyErr_Clear(); __Pyx_RefNanny = __Pyx_RefNannyImportAPI("Cython.Runtime.refnanny"); if (!__Pyx_RefNanny) Py_FatalError("failed to import 'refnanny' module"); } #endif __Pyx_RefNannySetupContext("PyMODINIT_FUNC PyInit__asa(void)", 0); if ( __Pyx_check_binary_version() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_tuple = PyTuple_New(0); if (unlikely(!__pyx_empty_tuple)) {__pyx_filename = __pyx_f[0]; 
__pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_bytes = PyBytes_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_bytes)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #ifdef __Pyx_CyFunction_USED if (__Pyx_CyFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_FusedFunction_USED if (__pyx_FusedFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_Generator_USED if (__pyx_Generator_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif /*--- Library function declarations ---*/ /*--- Threads initialization code ---*/ #if defined(__PYX_FORCE_INIT_THREADS) && __PYX_FORCE_INIT_THREADS #ifdef WITH_THREAD /* Python build with threading support? */ PyEval_InitThreads(); #endif #endif /*--- Module creation code ---*/ #if PY_MAJOR_VERSION < 3 __pyx_m = Py_InitModule4(__Pyx_NAMESTR("_asa"), __pyx_methods, 0, 0, PYTHON_API_VERSION); #else __pyx_m = PyModule_Create(&__pyx_moduledef); #endif if (!__pyx_m) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; #if PY_MAJOR_VERSION < 3 Py_INCREF(__pyx_m); #endif __pyx_b = PyImport_AddModule(__Pyx_NAMESTR(__Pyx_BUILTIN_MODULE_NAME)); if (!__pyx_b) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; if (__Pyx_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; /*--- Initialize various global constants etc. 
---*/ if (unlikely(__Pyx_InitGlobals() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__pyx_module_is_main_cogent__struct___asa) { if (__Pyx_SetAttrString(__pyx_m, "__name__", __pyx_n_s____main__) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; } /*--- Builtin init code ---*/ if (unlikely(__Pyx_InitCachedBuiltins() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Constants init code ---*/ if (unlikely(__Pyx_InitCachedConstants() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Global init code ---*/ /*--- Variable export code ---*/ /*--- Function export code ---*/ /*--- Type init code ---*/ /*--- Type import code ---*/ __pyx_ptype_5numpy_dtype = __Pyx_ImportType("numpy", "dtype", sizeof(PyArray_Descr), 0); if (unlikely(!__pyx_ptype_5numpy_dtype)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 154; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_flatiter = __Pyx_ImportType("numpy", "flatiter", sizeof(PyArrayIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_flatiter)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 164; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_broadcast = __Pyx_ImportType("numpy", "broadcast", sizeof(PyArrayMultiIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_broadcast)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 168; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ndarray = __Pyx_ImportType("numpy", "ndarray", sizeof(PyArrayObject), 0); if (unlikely(!__pyx_ptype_5numpy_ndarray)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 177; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ufunc = __Pyx_ImportType("numpy", "ufunc", sizeof(PyUFuncObject), 0); if (unlikely(!__pyx_ptype_5numpy_ufunc)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 860; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Variable import code ---*/ /*--- Function import code ---*/ __pyx_t_1 = __Pyx_ImportModule("cogent.maths.spatial.ckd3"); if (!__pyx_t_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ImportFunction(__pyx_t_1, "points", (void (**)(void))&__pyx_f_6cogent_5maths_7spatial_4ckd3_points, "struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ImportFunction(__pyx_t_1, "build_tree", (void (**)(void))&__pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree, "struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ImportFunction(__pyx_t_1, "rn", (void (**)(void))&__pyx_f_6cogent_5maths_7spatial_4ckd3_rn, "__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t (struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} Py_DECREF(__pyx_t_1); __pyx_t_1 = 0; /*--- Execution code ---*/ /* 
"cogent/struct/_asa.pyx":3 * cimport cython * cimport numpy as np * import numpy as np # <<<<<<<<<<<<<< * from numpy cimport npy_intp * from cogent.maths.spatial.ckd3 cimport kdpoint, points, kdnode, build_tree, rn */ __pyx_t_2 = __Pyx_Import(((PyObject *)__pyx_n_s__numpy), 0, -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__np, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/struct/_asa.pyx":8 * from stdlib cimport malloc, free * * __version__ = "('1', '5', '3')" # <<<<<<<<<<<<<< * * cdef extern from "numpy/arrayobject.h": */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s____version__, ((PyObject *)__pyx_kp_s_13)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/struct/_asa.pyx":21 * * @cython.boundscheck(False) * def asa_loop(np.ndarray[DTYPE_t, ndim =2] qcoords, np.ndarray[DTYPE_t, ndim =2] lcoords,\ # <<<<<<<<<<<<<< * np.ndarray[DTYPE_t, ndim =1] qradii, np.ndarray[DTYPE_t, ndim =1] lradii,\ * np.ndarray[DTYPE_t, ndim =2] spoints, np.ndarray[DTYPE_t, ndim =1] box,\ */ __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_6cogent_6struct_4_asa_1asa_loop, NULL, __pyx_n_s_17); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__asa_loop, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/struct/_asa.pyx":1 * cimport cython # <<<<<<<<<<<<<< * cimport numpy as np * import numpy as np */ __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); if (PyObject_SetAttr(__pyx_m, __pyx_n_s____test__, ((PyObject *)__pyx_t_2)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); if (__pyx_m) { __Pyx_AddTraceback("init cogent.struct._asa", __pyx_clineno, __pyx_lineno, __pyx_filename); Py_DECREF(__pyx_m); __pyx_m = 0; } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_ImportError, "init cogent.struct._asa"); } __pyx_L0:; __Pyx_RefNannyFinishContext(); #if PY_MAJOR_VERSION < 3 return; #else return __pyx_m; #endif } /* Runtime support code */ #if CYTHON_REFNANNY static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname) { PyObject *m = NULL, *p = NULL; void *r = NULL; m = PyImport_ImportModule((char *)modname); if (!m) goto end; p = PyObject_GetAttrString(m, (char *)"RefNannyAPI"); if (!p) goto end; r = PyLong_AsVoidPtr(p); end: Py_XDECREF(p); Py_XDECREF(m); return (__Pyx_RefNannyAPIStruct *)r; } #endif /* CYTHON_REFNANNY */ static void __Pyx_RaiseArgtupleInvalid( const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found) { Py_ssize_t num_expected; const char *more_or_less; if (num_found < num_min) { num_expected = num_min; more_or_less = "at least"; } else { num_expected = num_max; more_or_less = "at most"; } if (exact) { more_or_less = "exactly"; } PyErr_Format(PyExc_TypeError, "%s() takes %s %"PY_FORMAT_SIZE_T"d positional argument%s (%"PY_FORMAT_SIZE_T"d given)", func_name, more_or_less, num_expected, (num_expected == 1) ? 
"" : "s", num_found); } static void __Pyx_RaiseDoubleKeywordsError( const char* func_name, PyObject* kw_name) { PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION >= 3 "%s() got multiple values for keyword argument '%U'", func_name, kw_name); #else "%s() got multiple values for keyword argument '%s'", func_name, PyString_AS_STRING(kw_name)); #endif } static int __Pyx_ParseOptionalKeywords( PyObject *kwds, PyObject **argnames[], PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, const char* function_name) { PyObject *key = 0, *value = 0; Py_ssize_t pos = 0; PyObject*** name; PyObject*** first_kw_arg = argnames + num_pos_args; while (PyDict_Next(kwds, &pos, &key, &value)) { name = first_kw_arg; while (*name && (**name != key)) name++; if (*name) { values[name-argnames] = value; } else { #if PY_MAJOR_VERSION < 3 if (unlikely(!PyString_CheckExact(key)) && unlikely(!PyString_Check(key))) { #else if (unlikely(!PyUnicode_Check(key))) { #endif goto invalid_keyword_type; } else { for (name = first_kw_arg; *name; name++) { #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) break; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) break; #endif } if (*name) { values[name-argnames] = value; } else { for (name=argnames; name != first_kw_arg; name++) { if (**name == key) goto arg_passed_twice; #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) goto arg_passed_twice; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) goto arg_passed_twice; #endif } if (kwds2) { if (unlikely(PyDict_SetItem(kwds2, key, value))) goto bad; } else { goto invalid_keyword; } } } } } return 0; arg_passed_twice: __Pyx_RaiseDoubleKeywordsError(function_name, **name); goto bad; invalid_keyword_type: PyErr_Format(PyExc_TypeError, "%s() keywords must be strings", 
function_name); goto bad; invalid_keyword: PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION < 3 "%s() got an unexpected keyword argument '%s'", function_name, PyString_AsString(key)); #else "%s() got an unexpected keyword argument '%U'", function_name, key); #endif bad: return -1; } static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact) { if (!type) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (none_allowed && obj == Py_None) return 1; else if (exact) { if (Py_TYPE(obj) == type) return 1; } else { if (PyObject_TypeCheck(obj, type)) return 1; } PyErr_Format(PyExc_TypeError, "Argument '%s' has incorrect type (expected %s, got %s)", name, type->tp_name, Py_TYPE(obj)->tp_name); return 0; } static CYTHON_INLINE int __Pyx_IsLittleEndian(void) { unsigned int n = 1; return *(unsigned char*)(&n) != 0; } static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx, __Pyx_BufFmt_StackElem* stack, __Pyx_TypeInfo* type) { stack[0].field = &ctx->root; stack[0].parent_offset = 0; ctx->root.type = type; ctx->root.name = "buffer dtype"; ctx->root.offset = 0; ctx->head = stack; ctx->head->field = &ctx->root; ctx->fmt_offset = 0; ctx->head->parent_offset = 0; ctx->new_packmode = '@'; ctx->enc_packmode = '@'; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->is_complex = 0; ctx->is_valid_array = 0; ctx->struct_alignment = 0; while (type->typegroup == 'S') { ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = 0; type = type->fields->type; } } static int __Pyx_BufFmt_ParseNumber(const char** ts) { int count; const char* t = *ts; if (*t < '0' || *t > '9') { return -1; } else { count = *t++ - '0'; while (*t >= '0' && *t < '9') { count *= 10; count += *t++ - '0'; } } *ts = t; return count; } static int __Pyx_BufFmt_ExpectNumber(const char **ts) { int number = __Pyx_BufFmt_ParseNumber(ts); if (number == -1) /* First char was not a digit */ PyErr_Format(PyExc_ValueError,\ "Does 
not understand character buffer dtype format string ('%c')", **ts); return number; } static void __Pyx_BufFmt_RaiseUnexpectedChar(char ch) { PyErr_Format(PyExc_ValueError, "Unexpected format string character: '%c'", ch); } static const char* __Pyx_BufFmt_DescribeTypeChar(char ch, int is_complex) { switch (ch) { case 'b': return "'char'"; case 'B': return "'unsigned char'"; case 'h': return "'short'"; case 'H': return "'unsigned short'"; case 'i': return "'int'"; case 'I': return "'unsigned int'"; case 'l': return "'long'"; case 'L': return "'unsigned long'"; case 'q': return "'long long'"; case 'Q': return "'unsigned long long'"; case 'f': return (is_complex ? "'complex float'" : "'float'"); case 'd': return (is_complex ? "'complex double'" : "'double'"); case 'g': return (is_complex ? "'complex long double'" : "'long double'"); case 'T': return "a struct"; case 'O': return "Python object"; case 'P': return "a pointer"; case 's': case 'p': return "a string"; case 0: return "end"; default: return "unparseable format string"; } } static size_t __Pyx_BufFmt_TypeCharToStandardSize(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return 2; case 'i': case 'I': case 'l': case 'L': return 4; case 'q': case 'Q': return 8; case 'f': return (is_complex ? 8 : 4); case 'd': return (is_complex ? 
16 : 8); case 'g': { PyErr_SetString(PyExc_ValueError, "Python does not define a standard format string size for long double ('g').."); return 0; } case 'O': case 'P': return sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static size_t __Pyx_BufFmt_TypeCharToNativeSize(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(short); case 'i': case 'I': return sizeof(int); case 'l': case 'L': return sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(float) * (is_complex ? 2 : 1); case 'd': return sizeof(double) * (is_complex ? 2 : 1); case 'g': return sizeof(long double) * (is_complex ? 2 : 1); case 'O': case 'P': return sizeof(void*); default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } typedef struct { char c; short x; } __Pyx_st_short; typedef struct { char c; int x; } __Pyx_st_int; typedef struct { char c; long x; } __Pyx_st_long; typedef struct { char c; float x; } __Pyx_st_float; typedef struct { char c; double x; } __Pyx_st_double; typedef struct { char c; long double x; } __Pyx_st_longdouble; typedef struct { char c; void *x; } __Pyx_st_void_p; #ifdef HAVE_LONG_LONG typedef struct { char c; PY_LONG_LONG x; } __Pyx_st_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToAlignment(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_st_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_st_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_st_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_st_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_st_float) - sizeof(float); case 'd': return sizeof(__Pyx_st_double) - sizeof(double); case 'g': return sizeof(__Pyx_st_longdouble) - sizeof(long double); case 'P': 
case 'O': return sizeof(__Pyx_st_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } /* These are for computing the padding at the end of the struct to align on the first member of the struct. This will probably the same as above, but we don't have any guarantees. */ typedef struct { short x; char c; } __Pyx_pad_short; typedef struct { int x; char c; } __Pyx_pad_int; typedef struct { long x; char c; } __Pyx_pad_long; typedef struct { float x; char c; } __Pyx_pad_float; typedef struct { double x; char c; } __Pyx_pad_double; typedef struct { long double x; char c; } __Pyx_pad_longdouble; typedef struct { void *x; char c; } __Pyx_pad_void_p; #ifdef HAVE_LONG_LONG typedef struct { PY_LONG_LONG x; char c; } __Pyx_pad_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToPadding(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_pad_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_pad_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_pad_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_pad_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_pad_float) - sizeof(float); case 'd': return sizeof(__Pyx_pad_double) - sizeof(double); case 'g': return sizeof(__Pyx_pad_longdouble) - sizeof(long double); case 'P': case 'O': return sizeof(__Pyx_pad_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static char __Pyx_BufFmt_TypeCharToGroup(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'h': case 'i': case 'l': case 'q': case 's': case 'p': return 'I'; case 'B': case 'H': case 'I': case 'L': case 'Q': return 'U'; case 'f': case 'd': case 'g': return (is_complex ? 
'C' : 'R'); case 'O': return 'O'; case 'P': return 'P'; default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } static void __Pyx_BufFmt_RaiseExpected(__Pyx_BufFmt_Context* ctx) { if (ctx->head == NULL || ctx->head->field == &ctx->root) { const char* expected; const char* quote; if (ctx->head == NULL) { expected = "end"; quote = ""; } else { expected = ctx->head->field->type->name; quote = "'"; } PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected %s%s%s but got %s", quote, expected, quote, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex)); } else { __Pyx_StructField* field = ctx->head->field; __Pyx_StructField* parent = (ctx->head - 1)->field; PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected '%s' but got %s in '%s.%s'", field->type->name, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex), parent->type->name, field->name); } } static int __Pyx_BufFmt_ProcessTypeChunk(__Pyx_BufFmt_Context* ctx) { char group; size_t size, offset, arraysize = 1; if (ctx->enc_type == 0) return 0; if (ctx->head->field->type->arraysize[0]) { int i, ndim = 0; if (ctx->enc_type == 's' || ctx->enc_type == 'p') { ctx->is_valid_array = ctx->head->field->type->ndim == 1; ndim = 1; if (ctx->enc_count != ctx->head->field->type->arraysize[0]) { PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %zu", ctx->head->field->type->arraysize[0], ctx->enc_count); return -1; } } if (!ctx->is_valid_array) { PyErr_Format(PyExc_ValueError, "Expected %d dimensions, got %d", ctx->head->field->type->ndim, ndim); return -1; } for (i = 0; i < ctx->head->field->type->ndim; i++) { arraysize *= ctx->head->field->type->arraysize[i]; } ctx->is_valid_array = 0; ctx->enc_count = 1; } group = __Pyx_BufFmt_TypeCharToGroup(ctx->enc_type, ctx->is_complex); do { __Pyx_StructField* field = ctx->head->field; __Pyx_TypeInfo* type = field->type; if (ctx->enc_packmode == '@' || ctx->enc_packmode == '^') { size = 
__Pyx_BufFmt_TypeCharToNativeSize(ctx->enc_type, ctx->is_complex); } else { size = __Pyx_BufFmt_TypeCharToStandardSize(ctx->enc_type, ctx->is_complex); } if (ctx->enc_packmode == '@') { size_t align_at = __Pyx_BufFmt_TypeCharToAlignment(ctx->enc_type, ctx->is_complex); size_t align_mod_offset; if (align_at == 0) return -1; align_mod_offset = ctx->fmt_offset % align_at; if (align_mod_offset > 0) ctx->fmt_offset += align_at - align_mod_offset; if (ctx->struct_alignment == 0) ctx->struct_alignment = __Pyx_BufFmt_TypeCharToPadding(ctx->enc_type, ctx->is_complex); } if (type->size != size || type->typegroup != group) { if (type->typegroup == 'C' && type->fields != NULL) { size_t parent_offset = ctx->head->parent_offset + field->offset; ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = parent_offset; continue; } __Pyx_BufFmt_RaiseExpected(ctx); return -1; } offset = ctx->head->parent_offset + field->offset; if (ctx->fmt_offset != offset) { PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch; next field is at offset %"PY_FORMAT_SIZE_T"d but %"PY_FORMAT_SIZE_T"d expected", (Py_ssize_t)ctx->fmt_offset, (Py_ssize_t)offset); return -1; } ctx->fmt_offset += size; if (arraysize) ctx->fmt_offset += (arraysize - 1) * size; --ctx->enc_count; /* Consume from buffer string */ while (1) { if (field == &ctx->root) { ctx->head = NULL; if (ctx->enc_count != 0) { __Pyx_BufFmt_RaiseExpected(ctx); return -1; } break; /* breaks both loops as ctx->enc_count == 0 */ } ctx->head->field = ++field; if (field->type == NULL) { --ctx->head; field = ctx->head->field; continue; } else if (field->type->typegroup == 'S') { size_t parent_offset = ctx->head->parent_offset + field->offset; if (field->type->fields->type == NULL) continue; /* empty struct */ field = field->type->fields; ++ctx->head; ctx->head->field = field; ctx->head->parent_offset = parent_offset; break; } else { break; } } } while (ctx->enc_count); ctx->enc_type = 0; ctx->is_complex = 0; return 0; } static 
CYTHON_INLINE PyObject * __pyx_buffmt_parse_array(__Pyx_BufFmt_Context* ctx, const char** tsp) { const char *ts = *tsp; int i = 0, number; int ndim = ctx->head->field->type->ndim; ; ++ts; if (ctx->new_count != 1) { PyErr_SetString(PyExc_ValueError, "Cannot handle repeated arrays in format string"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; while (*ts && *ts != ')') { if (isspace(*ts)) continue; number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; if (i < ndim && (size_t) number != ctx->head->field->type->arraysize[i]) return PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %d", ctx->head->field->type->arraysize[i], number); if (*ts != ',' && *ts != ')') return PyErr_Format(PyExc_ValueError, "Expected a comma in format string, got '%c'", *ts); if (*ts == ',') ts++; i++; } if (i != ndim) return PyErr_Format(PyExc_ValueError, "Expected %d dimension(s), got %d", ctx->head->field->type->ndim, i); if (!*ts) { PyErr_SetString(PyExc_ValueError, "Unexpected end of format string, expected ')'"); return NULL; } ctx->is_valid_array = 1; ctx->new_count = 1; *tsp = ++ts; return Py_None; } static const char* __Pyx_BufFmt_CheckString(__Pyx_BufFmt_Context* ctx, const char* ts) { int got_Z = 0; while (1) { switch(*ts) { case 0: if (ctx->enc_type != 0 && ctx->head == NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; if (ctx->head != NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } return ts; case ' ': case 10: case 13: ++ts; break; case '<': if (!__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Little-endian buffer not supported on big-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '>': case '!': if (__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Big-endian buffer not supported on little-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '=': case '@': case '^': 
ctx->new_packmode = *ts++; break; case 'T': /* substruct */ { const char* ts_after_sub; size_t i, struct_count = ctx->new_count; size_t struct_alignment = ctx->struct_alignment; ctx->new_count = 1; ++ts; if (*ts != '{') { PyErr_SetString(PyExc_ValueError, "Buffer acquisition: Expected '{' after 'T'"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ ctx->enc_count = 0; ctx->struct_alignment = 0; ++ts; ts_after_sub = ts; for (i = 0; i != struct_count; ++i) { ts_after_sub = __Pyx_BufFmt_CheckString(ctx, ts); if (!ts_after_sub) return NULL; } ts = ts_after_sub; if (struct_alignment) ctx->struct_alignment = struct_alignment; } break; case '}': /* end of substruct; either repeat or move on */ { size_t alignment = ctx->struct_alignment; ++ts; if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ if (alignment && ctx->fmt_offset % alignment) { ctx->fmt_offset += alignment - (ctx->fmt_offset % alignment); } } return ts; case 'x': if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->fmt_offset += ctx->new_count; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->enc_packmode = ctx->new_packmode; ++ts; break; case 'Z': got_Z = 1; ++ts; if (*ts != 'f' && *ts != 'd' && *ts != 'g') { __Pyx_BufFmt_RaiseUnexpectedChar('Z'); return NULL; } /* fall through */ case 'c': case 'b': case 'B': case 'h': case 'H': case 'i': case 'I': case 'l': case 'L': case 'q': case 'Q': case 'f': case 'd': case 'g': case 'O': case 's': case 'p': if (ctx->enc_type == *ts && got_Z == ctx->is_complex && ctx->enc_packmode == ctx->new_packmode) { ctx->enc_count += ctx->new_count; } else { if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_count = ctx->new_count; ctx->enc_packmode = ctx->new_packmode; ctx->enc_type = *ts; ctx->is_complex = got_Z; } ++ts; ctx->new_count = 1; got_Z = 0; break; case ':': ++ts; while(*ts 
!= ':') ++ts; ++ts; break; case '(': if (!__pyx_buffmt_parse_array(ctx, &ts)) return NULL; break; default: { int number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; ctx->new_count = (size_t)number; } } } } static CYTHON_INLINE void __Pyx_ZeroBuffer(Py_buffer* buf) { buf->buf = NULL; buf->obj = NULL; buf->strides = __Pyx_zeros; buf->shape = __Pyx_zeros; buf->suboffsets = __Pyx_minusones; } static CYTHON_INLINE int __Pyx_GetBufferAndValidate( Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack) { if (obj == Py_None || obj == NULL) { __Pyx_ZeroBuffer(buf); return 0; } buf->buf = NULL; if (__Pyx_GetBuffer(obj, buf, flags) == -1) goto fail; if (buf->ndim != nd) { PyErr_Format(PyExc_ValueError, "Buffer has wrong number of dimensions (expected %d, got %d)", nd, buf->ndim); goto fail; } if (!cast) { __Pyx_BufFmt_Context ctx; __Pyx_BufFmt_Init(&ctx, stack, dtype); if (!__Pyx_BufFmt_CheckString(&ctx, buf->format)) goto fail; } if ((unsigned)buf->itemsize != dtype->size) { PyErr_Format(PyExc_ValueError, "Item size of buffer (%"PY_FORMAT_SIZE_T"d byte%s) does not match size of '%s' (%"PY_FORMAT_SIZE_T"d byte%s)", buf->itemsize, (buf->itemsize > 1) ? "s" : "", dtype->name, (Py_ssize_t)dtype->size, (dtype->size > 1) ? 
"s" : ""); goto fail; } if (buf->suboffsets == NULL) buf->suboffsets = __Pyx_minusones; return 0; fail:; __Pyx_ZeroBuffer(buf); return -1; } static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info) { if (info->buf == NULL) return; if (info->suboffsets == __Pyx_minusones) info->suboffsets = NULL; __Pyx_ReleaseBuffer(info); } static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name) { PyObject *result; result = PyObject_GetAttr(dict, name); if (!result) { if (dict != __pyx_b) { PyErr_Clear(); result = PyObject_GetAttr(__pyx_b, name); } if (!result) { PyErr_SetObject(PyExc_NameError, name); } } return result; } static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb) { #if CYTHON_COMPILING_IN_CPYTHON PyObject *tmp_type, *tmp_value, *tmp_tb; PyThreadState *tstate = PyThreadState_GET(); tmp_type = tstate->curexc_type; tmp_value = tstate->curexc_value; tmp_tb = tstate->curexc_traceback; tstate->curexc_type = type; tstate->curexc_value = value; tstate->curexc_traceback = tb; Py_XDECREF(tmp_type); Py_XDECREF(tmp_value); Py_XDECREF(tmp_tb); #else PyErr_Restore(type, value, tb); #endif } static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb) { #if CYTHON_COMPILING_IN_CPYTHON PyThreadState *tstate = PyThreadState_GET(); *type = tstate->curexc_type; *value = tstate->curexc_value; *tb = tstate->curexc_traceback; tstate->curexc_type = 0; tstate->curexc_value = 0; tstate->curexc_traceback = 0; #else PyErr_Fetch(type, value, tb); #endif } #if PY_MAJOR_VERSION < 3 static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, CYTHON_UNUSED PyObject *cause) { Py_XINCREF(type); Py_XINCREF(value); Py_XINCREF(tb); if (tb == Py_None) { Py_DECREF(tb); tb = 0; } else if (tb != NULL && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto raise_error; } if (value == NULL) { value = Py_None; Py_INCREF(value); } #if PY_VERSION_HEX < 0x02050000 if 
(!PyClass_Check(type)) #else if (!PyType_Check(type)) #endif { if (value != Py_None) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto raise_error; } Py_DECREF(value); value = type; #if PY_VERSION_HEX < 0x02050000 if (PyInstance_Check(type)) { type = (PyObject*) ((PyInstanceObject*)type)->in_class; Py_INCREF(type); } else { type = 0; PyErr_SetString(PyExc_TypeError, "raise: exception must be an old-style class or instance"); goto raise_error; } #else type = (PyObject*) Py_TYPE(type); Py_INCREF(type); if (!PyType_IsSubtype((PyTypeObject *)type, (PyTypeObject *)PyExc_BaseException)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto raise_error; } #endif } __Pyx_ErrRestore(type, value, tb); return; raise_error: Py_XDECREF(value); Py_XDECREF(type); Py_XDECREF(tb); return; } #else /* Python 3+ */ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause) { if (tb == Py_None) { tb = 0; } else if (tb && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto bad; } if (value == Py_None) value = 0; if (PyExceptionInstance_Check(type)) { if (value) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto bad; } value = type; type = (PyObject*) Py_TYPE(value); } else if (!PyExceptionClass_Check(type)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto bad; } if (cause) { PyObject *fixed_cause; if (PyExceptionClass_Check(cause)) { fixed_cause = PyObject_CallObject(cause, NULL); if (fixed_cause == NULL) goto bad; } else if (PyExceptionInstance_Check(cause)) { fixed_cause = cause; Py_INCREF(fixed_cause); } else { PyErr_SetString(PyExc_TypeError, "exception causes must derive from " "BaseException"); goto bad; } if (!value) { value = PyObject_CallObject(type, NULL); } PyException_SetCause(value, fixed_cause); } 
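/* Editorial note (added comment, not part of the generated source): the
 * `cause` handling just above implements PEP 3134 exception chaining for
 * Python 3 -- a class passed as the cause is instantiated, an instance is
 * used as-is, and the result is attached with PyException_SetCause() so
 * `raise X from Y` behaves as expected. The PyErr_SetObject() call that
 * follows installs the (type, value) pair as the current exception before
 * the traceback, if any, is swapped into the thread state. */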
PyErr_SetObject(type, value); if (tb) { PyThreadState *tstate = PyThreadState_GET(); PyObject* tmp_tb = tstate->curexc_traceback; if (tb != tmp_tb) { Py_INCREF(tb); tstate->curexc_traceback = tb; Py_XDECREF(tmp_tb); } } bad: return; } #endif static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index) { PyErr_Format(PyExc_ValueError, "need more than %"PY_FORMAT_SIZE_T"d value%s to unpack", index, (index == 1) ? "" : "s"); } static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected) { PyErr_Format(PyExc_ValueError, "too many values to unpack (expected %"PY_FORMAT_SIZE_T"d)", expected); } static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void) { PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); } static void __Pyx_UnpackTupleError(PyObject *t, Py_ssize_t index) { if (t == Py_None) { __Pyx_RaiseNoneNotIterableError(); } else if (PyTuple_GET_SIZE(t) < index) { __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(t)); } else { __Pyx_RaiseTooManyValuesError(index); } } static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type) { if (unlikely(!type)) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (likely(PyObject_TypeCheck(obj, type))) return 1; PyErr_Format(PyExc_TypeError, "Cannot convert %.200s to %.200s", Py_TYPE(obj)->tp_name, type->tp_name); return 0; } #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags) { PyObject *getbuffer_cobj; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) return PyObject_GetBuffer(obj, view, flags); #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) return __pyx_pw_5numpy_7ndarray_1__getbuffer__(obj, view, flags); #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (getbuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_getbuffer"))) { getbufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = 
(getbufferproc) PyCapsule_GetPointer(getbuffer_cobj, "getbuffer(obj, view, flags)"); #else func = (getbufferproc) PyCObject_AsVoidPtr(getbuffer_cobj); #endif Py_DECREF(getbuffer_cobj); if (!func) goto fail; return func(obj, view, flags); } else { PyErr_Clear(); } #endif PyErr_Format(PyExc_TypeError, "'%100s' does not have the buffer interface", Py_TYPE(obj)->tp_name); #if PY_VERSION_HEX < 0x02060000 fail: #endif return -1; } static void __Pyx_ReleaseBuffer(Py_buffer *view) { PyObject *obj = view->obj; PyObject *releasebuffer_cobj; if (!obj) return; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) { PyBuffer_Release(view); return; } #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) { __pyx_pw_5numpy_7ndarray_3__releasebuffer__(obj, view); return; } #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (releasebuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_releasebuffer"))) { releasebufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (releasebufferproc) PyCapsule_GetPointer(releasebuffer_cobj, "releasebuffer(obj, view)"); #else func = (releasebufferproc) PyCObject_AsVoidPtr(releasebuffer_cobj); #endif Py_DECREF(releasebuffer_cobj); if (!func) goto fail; func(obj, view); return; } else { PyErr_Clear(); } #endif goto nofail; #if PY_VERSION_HEX < 0x02060000 fail: #endif PyErr_WriteUnraisable(obj); nofail: Py_DECREF(obj); view->obj = NULL; } #endif /* PY_MAJOR_VERSION < 3 */ static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, long level) { PyObject *py_import = 0; PyObject *empty_list = 0; PyObject *module = 0; PyObject *global_dict = 0; PyObject *empty_dict = 0; PyObject *list; py_import = __Pyx_GetAttrString(__pyx_b, "__import__"); if (!py_import) goto bad; if (from_list) list = from_list; else { empty_list = PyList_New(0); if (!empty_list) goto bad; list = empty_list; } global_dict = PyModule_GetDict(__pyx_m); if (!global_dict) goto bad; 
empty_dict = PyDict_New(); if (!empty_dict) goto bad; #if PY_VERSION_HEX >= 0x02050000 { #if PY_MAJOR_VERSION >= 3 if (level == -1) { if (strchr(__Pyx_MODULE_NAME, '.')) { /* try package relative import first */ PyObject *py_level = PyInt_FromLong(1); if (!py_level) goto bad; module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, py_level, NULL); Py_DECREF(py_level); if (!module) { if (!PyErr_ExceptionMatches(PyExc_ImportError)) goto bad; PyErr_Clear(); } } level = 0; /* try absolute import on failure */ } #endif if (!module) { PyObject *py_level = PyInt_FromLong(level); if (!py_level) goto bad; module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, py_level, NULL); Py_DECREF(py_level); } } #else if (level>0) { PyErr_SetString(PyExc_RuntimeError, "Relative import is not supported for Python <=2.4."); goto bad; } module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, NULL); #endif bad: Py_XDECREF(empty_list); Py_XDECREF(py_import); Py_XDECREF(empty_dict); return module; } static CYTHON_INLINE PyObject *__Pyx_PyInt_to_py_Py_intptr_t(Py_intptr_t val) { const Py_intptr_t neg_one = (Py_intptr_t)-1, const_zero = (Py_intptr_t)0; const int is_unsigned = const_zero < neg_one; if ((sizeof(Py_intptr_t) == sizeof(char)) || (sizeof(Py_intptr_t) == sizeof(short))) { return PyInt_FromLong((long)val); } else if ((sizeof(Py_intptr_t) == sizeof(int)) || (sizeof(Py_intptr_t) == sizeof(long))) { if (is_unsigned) return PyLong_FromUnsignedLong((unsigned long)val); else return PyInt_FromLong((long)val); } else if (sizeof(Py_intptr_t) == sizeof(PY_LONG_LONG)) { if (is_unsigned) return PyLong_FromUnsignedLongLong((unsigned PY_LONG_LONG)val); else return PyLong_FromLongLong((PY_LONG_LONG)val); } else { int one = 1; int little = (int)*(unsigned char *)&one; unsigned char *bytes = (unsigned char *)&val; return _PyLong_FromByteArray(bytes, sizeof(Py_intptr_t), little, !is_unsigned); } } #if 
CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return ::std::complex< float >(x, y); } #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return x + y*(__pyx_t_float_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { __pyx_t_float_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex a, __pyx_t_float_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = a.real; z.imag = 
-a.imag; return z; } #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrtf(z.real*z.real + z.imag*z.imag); #else return hypotf(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { float denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 1: return a; case 2: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(a, a); case 3: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, a); case 4: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_absf(a); theta = atan2f(a.imag, a.real); } lnr = logf(r); z_r = expf(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cosf(z_theta); z.imag = z_r * sinf(z_theta); return z; } #endif #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return ::std::complex< double >(x, y); } #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return x + y*(__pyx_t_double_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { __pyx_t_double_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex a, __pyx_t_double_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; 
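/* Editorial note (added comment, not part of the generated source): when the
 * C compiler has no native complex type (CYTHON_CCOMPLEX is 0), Cython emits
 * these struct-based fallbacks (__Pyx_c_sum, __Pyx_c_diff, __Pyx_c_prod,
 * __Pyx_c_quot, ...) that carry out complex arithmetic on {real, imag}
 * pairs using the usual textbook formulas; __Pyx_c_pow below additionally
 * fast-paths small integer exponents via repeated __Pyx_c_prod() calls
 * before falling back to the polar (log/exp) form. */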
z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = a.real; z.imag = -a.imag; return z; } #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrt(z.real*z.real + z.imag*z.imag); #else return hypot(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { double denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 1: return a; case 2: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(a, a); case 3: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(z, a); 
case 4: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_abs(a); theta = atan2(a.imag, a.real); } lnr = log(r); z_r = exp(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cos(z_theta); z.imag = z_r * sin(z_theta); return z; } #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject* x) { const unsigned char neg_one = (unsigned char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned char" : "value too large to convert to unsigned char"); } return (unsigned char)-1; } return (unsigned char)val; } return (unsigned char)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject* x) { const unsigned short neg_one = (unsigned short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to unsigned short" : "value too large to convert to unsigned short"); } return (unsigned short)-1; } return (unsigned short)val; } return (unsigned short)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject* x) { const unsigned int neg_one = (unsigned int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned int" : "value too large to convert to unsigned int"); } return (unsigned int)-1; } return (unsigned int)val; } return (unsigned int)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject* x) { const char neg_one = (char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to char" : "value too large to convert to char"); } return (char)-1; } return (char)val; } return (char)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject* x) { const short neg_one = (short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to short" : "value too large to convert to short"); } return (short)-1; } return (short)val; } return (short)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject* x) { const signed char neg_one = (signed char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed char" : "value too large to convert to signed char"); } return (signed char)-1; } return (signed char)val; } return (signed char)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject* x) { const signed short neg_one = (signed short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to signed short" : "value too large to convert to signed short"); } return (signed short)-1; } return (signed short)val; } return (signed short)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject* x) { const signed int neg_one = (signed int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed int" : "value too large to convert to signed int"); } return (signed int)-1; } return (signed int)val; } return (signed int)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject* x) { const unsigned long neg_one = (unsigned long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)PyLong_AsUnsignedLong(x); } else { return (unsigned long)PyLong_AsLong(x); } } else { unsigned long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned long)-1; val = __Pyx_PyInt_AsUnsignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject* x) { const unsigned PY_LONG_LONG neg_one = (unsigned PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (unsigned PY_LONG_LONG)PyLong_AsLongLong(x); } } else { unsigned PY_LONG_LONG val; 
PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned PY_LONG_LONG)-1; val = __Pyx_PyInt_AsUnsignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject* x) { const long neg_one = (long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)PyLong_AsUnsignedLong(x); } else { return (long)PyLong_AsLong(x); } } else { long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (long)-1; val = __Pyx_PyInt_AsLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject* x) { const PY_LONG_LONG neg_one = (PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (PY_LONG_LONG)PyLong_AsLongLong(x); } } else { PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1; val = __Pyx_PyInt_AsLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject* x) { const signed 
long neg_one = (signed long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)PyLong_AsUnsignedLong(x); } else { return (signed long)PyLong_AsLong(x); } } else { signed long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed long)-1; val = __Pyx_PyInt_AsSignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject* x) { const signed PY_LONG_LONG neg_one = (signed PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (signed PY_LONG_LONG)PyLong_AsLongLong(x); } } else { signed PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed PY_LONG_LONG)-1; val = __Pyx_PyInt_AsSignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE npy_uint64 __Pyx_PyInt_from_py_npy_uint64(PyObject* x) { const npy_uint64 neg_one = (npy_uint64)-1, const_zero = 
(npy_uint64)0; const int is_unsigned = const_zero < neg_one; if (sizeof(npy_uint64) == sizeof(char)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedChar(x); else return (npy_uint64)__Pyx_PyInt_AsSignedChar(x); } else if (sizeof(npy_uint64) == sizeof(short)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedShort(x); else return (npy_uint64)__Pyx_PyInt_AsSignedShort(x); } else if (sizeof(npy_uint64) == sizeof(int)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedInt(x); else return (npy_uint64)__Pyx_PyInt_AsSignedInt(x); } else if (sizeof(npy_uint64) == sizeof(long)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedLong(x); else return (npy_uint64)__Pyx_PyInt_AsSignedLong(x); } else if (sizeof(npy_uint64) == sizeof(PY_LONG_LONG)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedLongLong(x); else return (npy_uint64)__Pyx_PyInt_AsSignedLongLong(x); } else { npy_uint64 val; PyObject *v = __Pyx_PyNumber_Int(x); #if PY_VERSION_HEX < 0x03000000 if (likely(v) && !PyLong_Check(v)) { PyObject *tmp = v; v = PyNumber_Long(tmp); Py_DECREF(tmp); } #endif if (likely(v)) { int one = 1; int is_little = (int)*(unsigned char *)&one; unsigned char *bytes = (unsigned char *)&val; int ret = _PyLong_AsByteArray((PyLongObject *)v, bytes, sizeof(val), is_little, !is_unsigned); Py_DECREF(v); if (likely(!ret)) return val; } return (npy_uint64)-1; } } static int __Pyx_check_binary_version(void) { char ctversion[4], rtversion[4]; PyOS_snprintf(ctversion, 4, "%d.%d", PY_MAJOR_VERSION, PY_MINOR_VERSION); PyOS_snprintf(rtversion, 4, "%s", Py_GetVersion()); if (ctversion[0] != rtversion[0] || ctversion[2] != rtversion[2]) { char message[200]; PyOS_snprintf(message, sizeof(message), "compiletime version %s of module '%.100s' " "does not match runtime version %s", ctversion, __Pyx_MODULE_NAME, rtversion); #if PY_VERSION_HEX < 0x02050000 return PyErr_Warn(NULL, message); #else return PyErr_WarnEx(NULL, message, 1); #endif } return 0; } #ifndef 
__PYX_HAVE_RT_ImportType #define __PYX_HAVE_RT_ImportType static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict) { PyObject *py_module = 0; PyObject *result = 0; PyObject *py_name = 0; char warning[200]; py_module = __Pyx_ImportModule(module_name); if (!py_module) goto bad; py_name = __Pyx_PyIdentifier_FromString(class_name); if (!py_name) goto bad; result = PyObject_GetAttr(py_module, py_name); Py_DECREF(py_name); py_name = 0; Py_DECREF(py_module); py_module = 0; if (!result) goto bad; if (!PyType_Check(result)) { PyErr_Format(PyExc_TypeError, "%s.%s is not a type object", module_name, class_name); goto bad; } if (!strict && (size_t)((PyTypeObject *)result)->tp_basicsize > size) { PyOS_snprintf(warning, sizeof(warning), "%s.%s size changed, may indicate binary incompatibility", module_name, class_name); #if PY_VERSION_HEX < 0x02050000 if (PyErr_Warn(NULL, warning) < 0) goto bad; #else if (PyErr_WarnEx(NULL, warning, 0) < 0) goto bad; #endif } else if ((size_t)((PyTypeObject *)result)->tp_basicsize != size) { PyErr_Format(PyExc_ValueError, "%s.%s has the wrong size, try recompiling", module_name, class_name); goto bad; } return (PyTypeObject *)result; bad: Py_XDECREF(py_module); Py_XDECREF(result); return NULL; } #endif #ifndef __PYX_HAVE_RT_ImportModule #define __PYX_HAVE_RT_ImportModule static PyObject *__Pyx_ImportModule(const char *name) { PyObject *py_name = 0; PyObject *py_module = 0; py_name = __Pyx_PyIdentifier_FromString(name); if (!py_name) goto bad; py_module = PyImport_Import(py_name); Py_DECREF(py_name); return py_module; bad: Py_XDECREF(py_name); return 0; } #endif #ifndef __PYX_HAVE_RT_ImportFunction #define __PYX_HAVE_RT_ImportFunction static int __Pyx_ImportFunction(PyObject *module, const char *funcname, void (**f)(void), const char *sig) { PyObject *d = 0; PyObject *cobj = 0; union { void (*fp)(void); void *p; } tmp; d = PyObject_GetAttrString(module, (char *)"__pyx_capi__"); if (!d) goto 
bad; cobj = PyDict_GetItemString(d, funcname); if (!cobj) { PyErr_Format(PyExc_ImportError, "%s does not export expected C function %s", PyModule_GetName(module), funcname); goto bad; } #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION==3&&PY_MINOR_VERSION==0) if (!PyCapsule_IsValid(cobj, sig)) { PyErr_Format(PyExc_TypeError, "C function %s.%s has wrong signature (expected %s, got %s)", PyModule_GetName(module), funcname, sig, PyCapsule_GetName(cobj)); goto bad; } tmp.p = PyCapsule_GetPointer(cobj, sig); #else {const char *desc, *s1, *s2; desc = (const char *)PyCObject_GetDesc(cobj); if (!desc) goto bad; s1 = desc; s2 = sig; while (*s1 != '\0' && *s1 == *s2) { s1++; s2++; } if (*s1 != *s2) { PyErr_Format(PyExc_TypeError, "C function %s.%s has wrong signature (expected %s, got %s)", PyModule_GetName(module), funcname, sig, desc); goto bad; } tmp.p = PyCObject_AsVoidPtr(cobj);} #endif *f = tmp.fp; if (!(*f)) goto bad; Py_DECREF(d); return 0; bad: Py_XDECREF(d); return -1; } #endif static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line) { int start = 0, mid = 0, end = count - 1; if (end >= 0 && code_line > entries[end].code_line) { return count; } while (start < end) { mid = (start + end) / 2; if (code_line < entries[mid].code_line) { end = mid; } else if (code_line > entries[mid].code_line) { start = mid + 1; } else { return mid; } } if (code_line <= entries[mid].code_line) { return mid; } else { return mid + 1; } } static PyCodeObject *__pyx_find_code_object(int code_line) { PyCodeObject* code_object; int pos; if (unlikely(!code_line) || unlikely(!__pyx_code_cache.entries)) { return NULL; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if (unlikely(pos >= __pyx_code_cache.count) || unlikely(__pyx_code_cache.entries[pos].code_line != code_line)) { return NULL; } code_object = __pyx_code_cache.entries[pos].code_object; Py_INCREF(code_object); return code_object; } static 
void __pyx_insert_code_object(int code_line, PyCodeObject* code_object) { int pos, i; __Pyx_CodeObjectCacheEntry* entries = __pyx_code_cache.entries; if (unlikely(!code_line)) { return; } if (unlikely(!entries)) { entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Malloc(64*sizeof(__Pyx_CodeObjectCacheEntry)); if (likely(entries)) { __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = 64; __pyx_code_cache.count = 1; entries[0].code_line = code_line; entries[0].code_object = code_object; Py_INCREF(code_object); } return; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if ((pos < __pyx_code_cache.count) && unlikely(__pyx_code_cache.entries[pos].code_line == code_line)) { PyCodeObject* tmp = entries[pos].code_object; entries[pos].code_object = code_object; Py_DECREF(tmp); return; } if (__pyx_code_cache.count == __pyx_code_cache.max_count) { int new_max = __pyx_code_cache.max_count + 64; entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Realloc( __pyx_code_cache.entries, new_max*sizeof(__Pyx_CodeObjectCacheEntry)); if (unlikely(!entries)) { return; } __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = new_max; } for (i=__pyx_code_cache.count; i>pos; i--) { entries[i] = entries[i-1]; } entries[pos].code_line = code_line; entries[pos].code_object = code_object; __pyx_code_cache.count++; Py_INCREF(code_object); } #include "compile.h" #include "frameobject.h" #include "traceback.h" static PyCodeObject* __Pyx_CreateCodeObjectForTraceback( const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_srcfile = 0; PyObject *py_funcname = 0; #if PY_MAJOR_VERSION < 3 py_srcfile = PyString_FromString(filename); #else py_srcfile = PyUnicode_FromString(filename); #endif if (!py_srcfile) goto bad; if (c_line) { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #else py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", 
funcname, __pyx_cfilenm, c_line); #endif } else { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromString(funcname); #else py_funcname = PyUnicode_FromString(funcname); #endif } if (!py_funcname) goto bad; py_code = __Pyx_PyCode_New( 0, /*int argcount,*/ 0, /*int kwonlyargcount,*/ 0, /*int nlocals,*/ 0, /*int stacksize,*/ 0, /*int flags,*/ __pyx_empty_bytes, /*PyObject *code,*/ __pyx_empty_tuple, /*PyObject *consts,*/ __pyx_empty_tuple, /*PyObject *names,*/ __pyx_empty_tuple, /*PyObject *varnames,*/ __pyx_empty_tuple, /*PyObject *freevars,*/ __pyx_empty_tuple, /*PyObject *cellvars,*/ py_srcfile, /*PyObject *filename,*/ py_funcname, /*PyObject *name,*/ py_line, /*int firstlineno,*/ __pyx_empty_bytes /*PyObject *lnotab*/ ); Py_DECREF(py_srcfile); Py_DECREF(py_funcname); return py_code; bad: Py_XDECREF(py_srcfile); Py_XDECREF(py_funcname); return NULL; } static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_globals = 0; PyFrameObject *py_frame = 0; py_code = __pyx_find_code_object(c_line ? c_line : py_line); if (!py_code) { py_code = __Pyx_CreateCodeObjectForTraceback( funcname, c_line, py_line, filename); if (!py_code) goto bad; __pyx_insert_code_object(c_line ? 
c_line : py_line, py_code); } py_globals = PyModule_GetDict(__pyx_m); if (!py_globals) goto bad; py_frame = PyFrame_New( PyThreadState_GET(), /*PyThreadState *tstate,*/ py_code, /*PyCodeObject *code,*/ py_globals, /*PyObject *globals,*/ 0 /*PyObject *locals*/ ); if (!py_frame) goto bad; py_frame->f_lineno = py_line; PyTraceBack_Here(py_frame); bad: Py_XDECREF(py_code); Py_XDECREF(py_frame); } static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) { while (t->p) { #if PY_MAJOR_VERSION < 3 if (t->is_unicode) { *t->p = PyUnicode_DecodeUTF8(t->s, t->n - 1, NULL); } else if (t->intern) { *t->p = PyString_InternFromString(t->s); } else { *t->p = PyString_FromStringAndSize(t->s, t->n - 1); } #else /* Python 3+ has unicode identifiers */ if (t->is_unicode | t->is_str) { if (t->intern) { *t->p = PyUnicode_InternFromString(t->s); } else if (t->encoding) { *t->p = PyUnicode_Decode(t->s, t->n - 1, t->encoding, NULL); } else { *t->p = PyUnicode_FromStringAndSize(t->s, t->n - 1); } } else { *t->p = PyBytes_FromStringAndSize(t->s, t->n - 1); } #endif if (!*t->p) return -1; ++t; } return 0; } /* Type Conversion Functions */ static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) { int is_true = x == Py_True; if (is_true | (x == Py_False) | (x == Py_None)) return is_true; else return PyObject_IsTrue(x); } static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x) { PyNumberMethods *m; const char *name = NULL; PyObject *res = NULL; #if PY_VERSION_HEX < 0x03000000 if (PyInt_Check(x) || PyLong_Check(x)) #else if (PyLong_Check(x)) #endif return Py_INCREF(x), x; m = Py_TYPE(x)->tp_as_number; #if PY_VERSION_HEX < 0x03000000 if (m && m->nb_int) { name = "int"; res = PyNumber_Int(x); } else if (m && m->nb_long) { name = "long"; res = PyNumber_Long(x); } #else if (m && m->nb_int) { name = "int"; res = PyNumber_Long(x); } #endif if (res) { #if PY_VERSION_HEX < 0x03000000 if (!PyInt_Check(res) && !PyLong_Check(res)) { #else if (!PyLong_Check(res)) { #endif PyErr_Format(PyExc_TypeError, 
"__%s__ returned non-%s (type %.200s)", name, name, Py_TYPE(res)->tp_name); Py_DECREF(res); return NULL; } } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_TypeError, "an integer is required"); } return res; } static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) { Py_ssize_t ival; PyObject* x = PyNumber_Index(b); if (!x) return -1; ival = PyInt_AsSsize_t(x); Py_DECREF(x); return ival; } static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) { #if PY_VERSION_HEX < 0x02050000 if (ival <= LONG_MAX) return PyInt_FromLong((long)ival); else { unsigned char *bytes = (unsigned char *) &ival; int one = 1; int little = (int)*(unsigned char*)&one; return _PyLong_FromByteArray(bytes, sizeof(size_t), little, 0); } #else return PyInt_FromSize_t(ival); #endif } static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject* x) { unsigned PY_LONG_LONG val = __Pyx_PyInt_AsUnsignedLongLong(x); if (unlikely(val == (unsigned PY_LONG_LONG)-1 && PyErr_Occurred())) { return (size_t)-1; } else if (unlikely(val != (unsigned PY_LONG_LONG)(size_t)val)) { PyErr_SetString(PyExc_OverflowError, "value too large to convert to size_t"); return (size_t)-1; } return (size_t)val; } #endif /* Py_PYTHON_H */ PyCogent-1.5.3/cogent/struct/_asa.pxd000644 000765 000024 00000000125 11307235442 020462 0ustar00jrideoutstaff000000 000000 cimport numpy as np ctypedef np.npy_float64 DTYPE_t ctypedef np.npy_uint64 UTYPE_tPyCogent-1.5.3/cogent/struct/_asa.pyx000644 000765 000024 00000015607 12024702176 020522 0ustar00jrideoutstaff000000 000000 cimport cython cimport numpy as np import numpy as np from numpy cimport npy_intp from cogent.maths.spatial.ckd3 cimport kdpoint, points, kdnode, build_tree, rn from stdlib cimport malloc, free __version__ = "('1', '5', '3')" cdef extern from "numpy/arrayobject.h": # cdef object PyArray_SimpleNewFromData(int nd, npy_intp *dims,\ # int typenum, void *data) cdef void import_array() # cdef enum requirements: # NPY_OWNDATA cdef extern from "math.h": 
float pi "M_PI" # Aliasing the C define to "pi" @cython.boundscheck(False) def asa_loop(np.ndarray[DTYPE_t, ndim =2] qcoords, np.ndarray[DTYPE_t, ndim =2] lcoords,\ np.ndarray[DTYPE_t, ndim =1] qradii, np.ndarray[DTYPE_t, ndim =1] lradii,\ np.ndarray[DTYPE_t, ndim =2] spoints, np.ndarray[DTYPE_t, ndim =1] box,\ DTYPE_t probe, UTYPE_t bucket_size, MAXSYM =200000): # looping indexes atom, neighbor, sphere cdef int idx, idxn, idxn_skip, idxs, lidx, lidx_c # flags cdef int is_accessible # initialize variables cdef UTYPE_t n_acc_points cdef DTYPE_t qradius, lradius, search_limit cdef DTYPE_t lradius_max = 2.0 + probe #malloc'ed cdef DTYPE_t *rspoint = <DTYPE_t *>malloc(3 * sizeof(DTYPE_t)) cdef DTYPE_t *distance = <DTYPE_t *>malloc(3 * sizeof(DTYPE_t)) cdef DTYPE_t *distance_sq = <DTYPE_t *>malloc(3 * sizeof(DTYPE_t)) cdef DTYPE_t **dstptr = <DTYPE_t **>malloc(sizeof(DTYPE_t*)) cdef UTYPE_t **idxptr = <UTYPE_t **>malloc(sizeof(UTYPE_t*)) # cdef DTYPE_t *areas = malloc(qcoords.shape[0] * sizeof(DTYPE_t)) cdef np.ndarray[DTYPE_t, ndim=1] areas = np.ndarray((qcoords.shape[0],), dtype=np.float64) #c arrays from numpy cdef DTYPE_t *qcoords_c = <DTYPE_t *>qcoords.data cdef DTYPE_t *lcoords_c = <DTYPE_t *>lcoords.data cdef DTYPE_t *spoints_c = <DTYPE_t *>spoints.data #datas cdef UTYPE_t ssize = spoints.shape[0] cdef UTYPE_t lsize = lradii.shape[0] cdef npy_intp qpnts = qcoords.shape[0] cdef DTYPE_t const_pi = pi * 4.0 / ssize #pointers cdef UTYPE_t *ridx cdef UTYPE_t *ridx_div # create a temporary array of lattice points, which are within a box around # the query atoms. The kd-tree will be constructed from those filtered atoms.
cdef DTYPE_t *box_c = <DTYPE_t *>box.data cdef DTYPE_t *t_ptr # temporary pointer cdef DTYPE_t *t_arr = <DTYPE_t *>malloc(3 * MAXSYM * sizeof(DTYPE_t)) # temporary array of symmetry cdef UTYPE_t *t_lid = <UTYPE_t *>malloc( MAXSYM * sizeof(UTYPE_t)) # mapping to original indices lidx_c = 0 for 0 <= lidx < lcoords.shape[0]: t_ptr = lcoords_c + lidx * 3 if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ box_c[2] <= (t_ptr+2)[0] <= box_c[5]: t_arr[3*lidx_c ] = (t_ptr )[0] t_arr[3*lidx_c+1] = (t_ptr+1)[0] t_arr[3*lidx_c+2] = (t_ptr+2)[0] t_lid[lidx_c] = lidx lidx_c += 1 #make kd-tree cdef kdpoint search_point cdef npy_intp neighbor_number cdef kdpoint *kdpnts = points(t_arr, lidx_c, 3) cdef kdnode *tree = build_tree(kdpnts, 0, lidx_c -1, 3, bucket_size, 0) for 0 <= idx < qpnts: qradius = qradii[idx] #print qradius, lradius_max search_point.coords = qcoords_c + idx*3 #print search_point.coords[0], search_point.coords[1], search_point.coords[2] search_limit = (qradius + lradius_max) * (qradius + lradius_max) #print search_limit neighbor_number = rn(tree, kdpnts, search_point, dstptr, idxptr, search_limit, 3, 100) #print neighbor_number #create array of real indices ridx = <UTYPE_t *>malloc(neighbor_number * sizeof(UTYPE_t)) ridx_div = <UTYPE_t *>malloc(neighbor_number * sizeof(UTYPE_t)) idxn_skip = 0 for 0 <= idxn < neighbor_number: if dstptr[0][idxn] <= 0.001: idxn_skip = 1 continue ridx[idxn - idxn_skip] = kdpnts[idxptr[0][idxn]].index * 3 ridx_div[idxn - idxn_skip] = t_lid[kdpnts[idxptr[0][idxn]].index] % lsize n_acc_points = 0 for 0 <= idxs < ssize: # loop over sphere points is_accessible = 1 #if idxs == 0: #print search_point.coords[0], search_point.coords[1], search_point.coords[2], '\t', #if idxs == 0: #print (spoints_c + idxs*3)[0], (spoints_c + idxs*3+1)[0], (spoints_c + idxs*3+2)[0], '\t', #if idxs == 0: #print qradius, '\t', rspoint[0] = search_point.coords[0] + (spoints_c + idxs*3 )[0] * qradius rspoint[1] = search_point.coords[1] + (spoints_c + idxs*3+1)[0] * qradius rspoint[2] =
search_point.coords[2] + (spoints_c + idxs*3+2)[0] * qradius #if idxs == 0: #print rspoint[0], rspoint[1], rspoint[2] #real_point = point * qradius + qcoord for 0 <= idxn < neighbor_number - idxn_skip: # loop over neighbors #if dstptr[0][idxn] == 0.: # continue #print ' ', neighbor_number, idxn_skip,idxn, ridx_div[idxn] ,ridx[idxn] lradius = lradii[ridx_div[idxn]] #print (lcoords_c + ridx[idxn]*3 )[0], (lcoords_c + ridx[idxn]*3+1)[0], (lcoords_c + ridx[idxn]*3+2)[0], distance[0] = rspoint[0] - (t_arr + ridx[idxn] )[0] distance[1] = rspoint[1] - (t_arr + ridx[idxn]+1)[0] distance[2] = rspoint[2] - (t_arr + ridx[idxn]+2)[0] #print '\t', distance[0], distance[1], distance[2], distance_sq[0] = distance[0] * distance[0] distance_sq[1] = distance[1] * distance[1] distance_sq[2] = distance[2] * distance[2] #print '\t', distance_sq[0], distance_sq[1], distance_sq[2], lradius * lradius, if distance_sq[0] + distance_sq[1] + distance_sq[2] < lradius * lradius: is_accessible = 0 #print 'NA' break #print if is_accessible == 1: n_acc_points += 1 #print n_acc_points areas[idx] = const_pi * n_acc_points * qradius * qradius free(ridx) free(ridx_div) free(t_arr) free(t_lid) free(distance) free(distance_sq) free(rspoint) free(idxptr[0]) free(idxptr) free(dstptr) # import_array() # cdef np.ndarray output = PyArray_SimpleNewFromData(1, &qpnts, NPY_DOUBLE, areas) # output.flags = output.flags|(NPY_OWNDATA) # this sets the ownership bit # PyArray_FLAGS(&output) # print PyArray_FLAGS(output) |(NPY_OWNDATA) # output = PyArray_NewCopy(output, NPY_CORDER) return areas PyCogent-1.5.3/cogent/struct/_contact.c000644 000765 000024 00001107106 12024702176 021010 0ustar00jrideoutstaff000000 000000 /* Generated by Cython 0.16 on Fri Sep 14 12:12:07 2012 */ #define PY_SSIZE_T_CLEAN #include "Python.h" #ifndef Py_PYTHON_H #error Python headers needed to compile C extensions, please install development version of Python. #elif PY_VERSION_HEX < 0x02040000 #error Cython requires Python 2.4+. 
#else #include <stddef.h> /* For offsetof */ #ifndef offsetof #define offsetof(type, member) ( (size_t) & ((type*)0) -> member ) #endif #if !defined(WIN32) && !defined(MS_WINDOWS) #ifndef __stdcall #define __stdcall #endif #ifndef __cdecl #define __cdecl #endif #ifndef __fastcall #define __fastcall #endif #endif #ifndef DL_IMPORT #define DL_IMPORT(t) t #endif #ifndef DL_EXPORT #define DL_EXPORT(t) t #endif #ifndef PY_LONG_LONG #define PY_LONG_LONG LONG_LONG #endif #ifndef Py_HUGE_VAL #define Py_HUGE_VAL HUGE_VAL #endif #ifdef PYPY_VERSION #define CYTHON_COMPILING_IN_PYPY 1 #define CYTHON_COMPILING_IN_CPYTHON 0 #else #define CYTHON_COMPILING_IN_PYPY 0 #define CYTHON_COMPILING_IN_CPYTHON 1 #endif #if CYTHON_COMPILING_IN_PYPY #define __Pyx_PyCFunction_Call PyObject_Call #else #define __Pyx_PyCFunction_Call PyCFunction_Call #endif #if PY_VERSION_HEX < 0x02050000 typedef int Py_ssize_t; #define PY_SSIZE_T_MAX INT_MAX #define PY_SSIZE_T_MIN INT_MIN #define PY_FORMAT_SIZE_T "" #define PyInt_FromSsize_t(z) PyInt_FromLong(z) #define PyInt_AsSsize_t(o) __Pyx_PyInt_AsInt(o) #define PyNumber_Index(o) PyNumber_Int(o) #define PyIndex_Check(o) PyNumber_Check(o) #define PyErr_WarnEx(category, message, stacklevel) PyErr_Warn(category, message) #define __PYX_BUILD_PY_SSIZE_T "i" #else #define __PYX_BUILD_PY_SSIZE_T "n" #endif #if PY_VERSION_HEX < 0x02060000 #define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt) #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type) #define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size) #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size, #define PyType_Modified(t) typedef struct { void *buf; PyObject *obj; Py_ssize_t len; Py_ssize_t itemsize; int readonly; int ndim; char *format; Py_ssize_t *shape; Py_ssize_t *strides; Py_ssize_t *suboffsets; void *internal; } Py_buffer; #define PyBUF_SIMPLE 0 #define PyBUF_WRITABLE 0x0001 #define PyBUF_FORMAT 0x0004 #define PyBUF_ND 0x0008 #define PyBUF_STRIDES (0x0010 | PyBUF_ND) #define PyBUF_C_CONTIGUOUS (0x0020
| PyBUF_STRIDES) #define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES) #define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES) #define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES) #define PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_FORMAT | PyBUF_WRITABLE) #define PyBUF_FULL (PyBUF_INDIRECT | PyBUF_FORMAT | PyBUF_WRITABLE) typedef int (*getbufferproc)(PyObject *, Py_buffer *, int); typedef void (*releasebufferproc)(PyObject *, Py_buffer *); #endif #if PY_MAJOR_VERSION < 3 #define __Pyx_BUILTIN_MODULE_NAME "__builtin__" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #else #define __Pyx_BUILTIN_MODULE_NAME "builtins" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #endif #if PY_MAJOR_VERSION < 3 && PY_MINOR_VERSION < 6 #define PyUnicode_FromString(s) PyUnicode_Decode(s, strlen(s), "UTF-8", "strict") #endif #if PY_MAJOR_VERSION >= 3 #define Py_TPFLAGS_CHECKTYPES 0 #define Py_TPFLAGS_HAVE_INDEX 0 #endif #if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3) #define Py_TPFLAGS_HAVE_NEWBUFFER 0 #endif #if PY_VERSION_HEX > 0x03030000 && defined(PyUnicode_GET_LENGTH) #define CYTHON_PEP393_ENABLED 1 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_LENGTH(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) PyUnicode_READ_CHAR(u, i) #else #define CYTHON_PEP393_ENABLED 0 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_SIZE(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) ((Py_UCS4)(PyUnicode_AS_UNICODE(u)[i])) #endif #if PY_MAJOR_VERSION >= 3 #define PyBaseString_Type PyUnicode_Type #define PyStringObject PyUnicodeObject #define PyString_Type PyUnicode_Type #define PyString_Check PyUnicode_Check #define PyString_CheckExact PyUnicode_CheckExact #endif #if PY_VERSION_HEX < 0x02060000 #define PyBytesObject PyStringObject #define PyBytes_Type PyString_Type #define 
PyBytes_Check PyString_Check #define PyBytes_CheckExact PyString_CheckExact #define PyBytes_FromString PyString_FromString #define PyBytes_FromStringAndSize PyString_FromStringAndSize #define PyBytes_FromFormat PyString_FromFormat #define PyBytes_DecodeEscape PyString_DecodeEscape #define PyBytes_AsString PyString_AsString #define PyBytes_AsStringAndSize PyString_AsStringAndSize #define PyBytes_Size PyString_Size #define PyBytes_AS_STRING PyString_AS_STRING #define PyBytes_GET_SIZE PyString_GET_SIZE #define PyBytes_Repr PyString_Repr #define PyBytes_Concat PyString_Concat #define PyBytes_ConcatAndDel PyString_ConcatAndDel #endif #if PY_VERSION_HEX < 0x02060000 #define PySet_Check(obj) PyObject_TypeCheck(obj, &PySet_Type) #define PyFrozenSet_Check(obj) PyObject_TypeCheck(obj, &PyFrozenSet_Type) #endif #ifndef PySet_CheckExact #define PySet_CheckExact(obj) (Py_TYPE(obj) == &PySet_Type) #endif #define __Pyx_TypeCheck(obj, type) PyObject_TypeCheck(obj, (PyTypeObject *)type) #if PY_MAJOR_VERSION >= 3 #define PyIntObject PyLongObject #define PyInt_Type PyLong_Type #define PyInt_Check(op) PyLong_Check(op) #define PyInt_CheckExact(op) PyLong_CheckExact(op) #define PyInt_FromString PyLong_FromString #define PyInt_FromUnicode PyLong_FromUnicode #define PyInt_FromLong PyLong_FromLong #define PyInt_FromSize_t PyLong_FromSize_t #define PyInt_FromSsize_t PyLong_FromSsize_t #define PyInt_AsLong PyLong_AsLong #define PyInt_AS_LONG PyLong_AS_LONG #define PyInt_AsSsize_t PyLong_AsSsize_t #define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask #endif #if PY_MAJOR_VERSION >= 3 #define PyBoolObject PyLongObject #endif #if PY_VERSION_HEX < 0x03020000 typedef long Py_hash_t; #define __Pyx_PyInt_FromHash_t PyInt_FromLong #define __Pyx_PyInt_AsHash_t PyInt_AsLong #else #define __Pyx_PyInt_FromHash_t PyInt_FromSsize_t #define __Pyx_PyInt_AsHash_t PyInt_AsSsize_t #endif #if (PY_MAJOR_VERSION < 3) || (PY_VERSION_HEX >= 
0x03010300) #define __Pyx_PySequence_GetSlice(obj, a, b) PySequence_GetSlice(obj, a, b) #define __Pyx_PySequence_SetSlice(obj, a, b, value) PySequence_SetSlice(obj, a, b, value) #define __Pyx_PySequence_DelSlice(obj, a, b) PySequence_DelSlice(obj, a, b) #else #define __Pyx_PySequence_GetSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), (PyObject*)0) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_GetSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object is unsliceable", (obj)->ob_type->tp_name), (PyObject*)0))) #define __Pyx_PySequence_SetSlice(obj, a, b, value) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_SetSlice(obj, a, b, value)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice assignment", (obj)->ob_type->tp_name), -1))) #define __Pyx_PySequence_DelSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_DelSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice deletion", (obj)->ob_type->tp_name), -1))) #endif #if PY_MAJOR_VERSION >= 3 #define PyMethod_New(func, self, klass) ((self) ? 
PyMethod_New(func, self) : PyInstanceMethod_New(func)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),((char *)(n))) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),((char *)(n)),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),((char *)(n))) #else #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),(n)) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),(n),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),(n)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_NAMESTR(n) ((char *)(n)) #define __Pyx_DOCSTR(n) ((char *)(n)) #else #define __Pyx_NAMESTR(n) (n) #define __Pyx_DOCSTR(n) (n) #endif #if PY_MAJOR_VERSION >= 3 #define __Pyx_PyNumber_Divide(x,y) PyNumber_TrueDivide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceTrueDivide(x,y) #else #define __Pyx_PyNumber_Divide(x,y) PyNumber_Divide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceDivide(x,y) #endif #ifndef __PYX_EXTERN_C #ifdef __cplusplus #define __PYX_EXTERN_C extern "C" #else #define __PYX_EXTERN_C extern #endif #endif #if defined(WIN32) || defined(MS_WINDOWS) #define _USE_MATH_DEFINES #endif #include <math.h> #define __PYX_HAVE__cogent__struct___contact #define __PYX_HAVE_API__cogent__struct___contact #include "stdio.h" #include "stdlib.h" #include "numpy/arrayobject.h" #include "numpy/ufuncobject.h" #ifdef _OPENMP #include <omp.h> #endif /* _OPENMP */ #ifdef PYREX_WITHOUT_ASSERTIONS #define CYTHON_WITHOUT_ASSERTIONS #endif /* inline attribute */ #ifndef CYTHON_INLINE #if defined(__GNUC__) #define CYTHON_INLINE __inline__ #elif defined(_MSC_VER) #define CYTHON_INLINE __inline #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L #define CYTHON_INLINE inline #else #define CYTHON_INLINE #endif #endif /* unused attribute */ #ifndef CYTHON_UNUSED # if defined(__GNUC__) # if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) # define
CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif # elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif #endif typedef struct {PyObject **p; char *s; const long n; const char* encoding; const char is_unicode; const char is_str; const char intern; } __Pyx_StringTabEntry; /*proto*/ /* Type Conversion Predeclarations */ #define __Pyx_PyBytes_FromUString(s) PyBytes_FromString((char*)s) #define __Pyx_PyBytes_AsUString(s) ((unsigned char*) PyBytes_AsString(s)) #define __Pyx_Owned_Py_None(b) (Py_INCREF(Py_None), Py_None) #define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False)) static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject*); static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x); static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*); static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t); static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject*); #define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x)) #define __pyx_PyFloat_AsFloat(x) ((float) __pyx_PyFloat_AsDouble(x)) #ifdef __GNUC__ /* Test for GCC > 2.95 */ #if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)) #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0) #else /* __GNUC__ > 2 ... */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ > 2 ... 
*/ #else /* __GNUC__ */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ */ static PyObject *__pyx_m; static PyObject *__pyx_b; static PyObject *__pyx_empty_tuple; static PyObject *__pyx_empty_bytes; static int __pyx_lineno; static int __pyx_clineno = 0; static const char * __pyx_cfilenm= __FILE__; static const char *__pyx_filename; #if !defined(CYTHON_CCOMPLEX) #if defined(__cplusplus) #define CYTHON_CCOMPLEX 1 #elif defined(_Complex_I) #define CYTHON_CCOMPLEX 1 #else #define CYTHON_CCOMPLEX 0 #endif #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus #include <complex> #else #include <complex.h> #endif #endif #if CYTHON_CCOMPLEX && !defined(__cplusplus) && defined(__sun__) && defined(__GNUC__) #undef _Complex_I #define _Complex_I 1.0fj #endif static const char *__pyx_f[] = { "_contact.pyx", "numpy.pxd", }; #define IS_UNSIGNED(type) (((type) -1) > 0) struct __Pyx_StructField_; #define __PYX_BUF_FLAGS_PACKED_STRUCT (1 << 0) typedef struct { const char* name; /* for error messages only */ struct __Pyx_StructField_* fields; size_t size; /* sizeof(type) */ size_t arraysize[8]; /* length of array in each dimension */ int ndim; char typegroup; /* _R_eal, _C_omplex, Signed _I_nt, _U_nsigned int, _S_truct, _P_ointer, _O_bject */ char is_unsigned; int flags; } __Pyx_TypeInfo; typedef struct __Pyx_StructField_ { __Pyx_TypeInfo* type; const char* name; size_t offset; } __Pyx_StructField; typedef struct { __Pyx_StructField* field; size_t parent_offset; } __Pyx_BufFmt_StackElem; typedef struct { __Pyx_StructField root; __Pyx_BufFmt_StackElem* head; size_t fmt_offset; size_t new_count, enc_count; size_t struct_alignment; int is_complex; char enc_type; char new_packmode; char enc_packmode; char is_valid_array; } __Pyx_BufFmt_Context; /* "numpy.pxd":722 * # in Cython to enable them only on the right systems.
* * ctypedef npy_int8 int8_t # <<<<<<<<<<<<<< * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t */ typedef npy_int8 __pyx_t_5numpy_int8_t; /* "numpy.pxd":723 * * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t # <<<<<<<<<<<<<< * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t */ typedef npy_int16 __pyx_t_5numpy_int16_t; /* "numpy.pxd":724 * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t # <<<<<<<<<<<<<< * ctypedef npy_int64 int64_t * #ctypedef npy_int96 int96_t */ typedef npy_int32 __pyx_t_5numpy_int32_t; /* "numpy.pxd":725 * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t # <<<<<<<<<<<<<< * #ctypedef npy_int96 int96_t * #ctypedef npy_int128 int128_t */ typedef npy_int64 __pyx_t_5numpy_int64_t; /* "numpy.pxd":729 * #ctypedef npy_int128 int128_t * * ctypedef npy_uint8 uint8_t # <<<<<<<<<<<<<< * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t */ typedef npy_uint8 __pyx_t_5numpy_uint8_t; /* "numpy.pxd":730 * * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t # <<<<<<<<<<<<<< * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t */ typedef npy_uint16 __pyx_t_5numpy_uint16_t; /* "numpy.pxd":731 * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t # <<<<<<<<<<<<<< * ctypedef npy_uint64 uint64_t * #ctypedef npy_uint96 uint96_t */ typedef npy_uint32 __pyx_t_5numpy_uint32_t; /* "numpy.pxd":732 * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t # <<<<<<<<<<<<<< * #ctypedef npy_uint96 uint96_t * #ctypedef npy_uint128 uint128_t */ typedef npy_uint64 __pyx_t_5numpy_uint64_t; /* "numpy.pxd":736 * #ctypedef npy_uint128 uint128_t * * ctypedef npy_float32 float32_t # <<<<<<<<<<<<<< * ctypedef npy_float64 float64_t * #ctypedef npy_float80 float80_t */ typedef npy_float32 __pyx_t_5numpy_float32_t; /* "numpy.pxd":737 * * ctypedef npy_float32 float32_t * ctypedef npy_float64 float64_t 
# <<<<<<<<<<<<<< * #ctypedef npy_float80 float80_t * #ctypedef npy_float128 float128_t */ typedef npy_float64 __pyx_t_5numpy_float64_t; /* "numpy.pxd":746 * # The int types are mapped a bit surprising -- * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t # <<<<<<<<<<<<<< * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t */ typedef npy_long __pyx_t_5numpy_int_t; /* "numpy.pxd":747 * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t * ctypedef npy_longlong long_t # <<<<<<<<<<<<<< * ctypedef npy_longlong longlong_t * */ typedef npy_longlong __pyx_t_5numpy_long_t; /* "numpy.pxd":748 * ctypedef npy_long int_t * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t # <<<<<<<<<<<<<< * * ctypedef npy_ulong uint_t */ typedef npy_longlong __pyx_t_5numpy_longlong_t; /* "numpy.pxd":750 * ctypedef npy_longlong longlong_t * * ctypedef npy_ulong uint_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t */ typedef npy_ulong __pyx_t_5numpy_uint_t; /* "numpy.pxd":751 * * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulonglong_t * */ typedef npy_ulonglong __pyx_t_5numpy_ulong_t; /* "numpy.pxd":752 * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t # <<<<<<<<<<<<<< * * ctypedef npy_intp intp_t */ typedef npy_ulonglong __pyx_t_5numpy_ulonglong_t; /* "numpy.pxd":754 * ctypedef npy_ulonglong ulonglong_t * * ctypedef npy_intp intp_t # <<<<<<<<<<<<<< * ctypedef npy_uintp uintp_t * */ typedef npy_intp __pyx_t_5numpy_intp_t; /* "numpy.pxd":755 * * ctypedef npy_intp intp_t * ctypedef npy_uintp uintp_t # <<<<<<<<<<<<<< * * ctypedef npy_double float_t */ typedef npy_uintp __pyx_t_5numpy_uintp_t; /* "numpy.pxd":757 * ctypedef npy_uintp uintp_t * * ctypedef npy_double float_t # <<<<<<<<<<<<<< * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t */ typedef 
npy_double __pyx_t_5numpy_float_t; /* "numpy.pxd":758 * * ctypedef npy_double float_t * ctypedef npy_double double_t # <<<<<<<<<<<<<< * ctypedef npy_longdouble longdouble_t * */ typedef npy_double __pyx_t_5numpy_double_t; /* "numpy.pxd":759 * ctypedef npy_double float_t * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cfloat cfloat_t */ typedef npy_longdouble __pyx_t_5numpy_longdouble_t; /* "cogent/maths/spatial/ckd3.pxd":2 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t # <<<<<<<<<<<<<< * ctypedef np.npy_uint64 UTYPE_t * */ typedef npy_float64 __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t; /* "cogent/maths/spatial/ckd3.pxd":3 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t * ctypedef np.npy_uint64 UTYPE_t # <<<<<<<<<<<<<< * * cdef enum constants: */ typedef npy_uint64 __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t; /* "cogent/struct/_contact.pxd":2 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t # <<<<<<<<<<<<<< * ctypedef np.npy_int64 LTYPE_t * ctypedef np.npy_uint64 UTYPE_t */ typedef npy_float64 __pyx_t_6cogent_6struct_8_contact_DTYPE_t; /* "cogent/struct/_contact.pxd":3 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t * ctypedef np.npy_int64 LTYPE_t # <<<<<<<<<<<<<< * ctypedef np.npy_uint64 UTYPE_t */ typedef npy_int64 __pyx_t_6cogent_6struct_8_contact_LTYPE_t; /* "cogent/struct/_contact.pxd":4 * ctypedef np.npy_float64 DTYPE_t * ctypedef np.npy_int64 LTYPE_t * ctypedef np.npy_uint64 UTYPE_t # <<<<<<<<<<<<<< */ typedef npy_uint64 __pyx_t_6cogent_6struct_8_contact_UTYPE_t; #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< float > __pyx_t_float_complex; #else typedef float _Complex __pyx_t_float_complex; #endif #else typedef struct { float real, imag; } __pyx_t_float_complex; #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< double > __pyx_t_double_complex; #else typedef double _Complex __pyx_t_double_complex; #endif #else typedef struct { double 
real, imag; } __pyx_t_double_complex; #endif /*--- Type declarations ---*/ /* "numpy.pxd":761 * ctypedef npy_longdouble longdouble_t * * ctypedef npy_cfloat cfloat_t # <<<<<<<<<<<<<< * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t */ typedef npy_cfloat __pyx_t_5numpy_cfloat_t; /* "numpy.pxd":762 * * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t # <<<<<<<<<<<<<< * ctypedef npy_clongdouble clongdouble_t * */ typedef npy_cdouble __pyx_t_5numpy_cdouble_t; /* "numpy.pxd":763 * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cdouble complex_t */ typedef npy_clongdouble __pyx_t_5numpy_clongdouble_t; /* "numpy.pxd":765 * ctypedef npy_clongdouble clongdouble_t * * ctypedef npy_cdouble complex_t # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew1(a): */ typedef npy_cdouble __pyx_t_5numpy_complex_t; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode; /* "cogent/maths/spatial/ckd3.pxd":5 * ctypedef np.npy_uint64 UTYPE_t * * cdef enum constants: # <<<<<<<<<<<<<< * NSTACK = 100 * */ enum __pyx_t_6cogent_5maths_7spatial_4ckd3_constants { __pyx_e_6cogent_5maths_7spatial_4ckd3_NSTACK = 100 }; /* "cogent/maths/spatial/ckd3.pxd":8 * NSTACK = 100 * * cdef struct kdpoint: # <<<<<<<<<<<<<< * UTYPE_t index * DTYPE_t *coords */ struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t index; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *coords; }; /* "cogent/maths/spatial/ckd3.pxd":12 * DTYPE_t *coords * * cdef struct kdnode: # <<<<<<<<<<<<<< * UTYPE_t bucket # 1 if leaf-bucket, 0 if node * int dimension */ struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t bucket; int dimension; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t position; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t start; 
__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t end; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *left; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *right; }; #ifndef CYTHON_REFNANNY #define CYTHON_REFNANNY 0 #endif #if CYTHON_REFNANNY typedef struct { void (*INCREF)(void*, PyObject*, int); void (*DECREF)(void*, PyObject*, int); void (*GOTREF)(void*, PyObject*, int); void (*GIVEREF)(void*, PyObject*, int); void* (*SetupContext)(const char*, int, const char*); void (*FinishContext)(void**); } __Pyx_RefNannyAPIStruct; static __Pyx_RefNannyAPIStruct *__Pyx_RefNanny = NULL; static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname); /*proto*/ #define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL; #ifdef WITH_THREAD #define __Pyx_RefNannySetupContext(name, acquire_gil) \ if (acquire_gil) { \ PyGILState_STATE __pyx_gilstate_save = PyGILState_Ensure(); \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ PyGILState_Release(__pyx_gilstate_save); \ } else { \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ } #else #define __Pyx_RefNannySetupContext(name, acquire_gil) \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__) #endif #define __Pyx_RefNannyFinishContext() \ __Pyx_RefNanny->FinishContext(&__pyx_refnanny) #define __Pyx_INCREF(r) __Pyx_RefNanny->INCREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_DECREF(r) __Pyx_RefNanny->DECREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GOTREF(r) __Pyx_RefNanny->GOTREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GIVEREF(r) __Pyx_RefNanny->GIVEREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_XINCREF(r) do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0) #define __Pyx_XDECREF(r) do { if((r) != NULL) {__Pyx_DECREF(r); }} while(0) #define __Pyx_XGOTREF(r) do { if((r) != NULL) {__Pyx_GOTREF(r); }} while(0) #define __Pyx_XGIVEREF(r) do { if((r) != NULL) {__Pyx_GIVEREF(r);}} 
while(0) #else #define __Pyx_RefNannyDeclarations #define __Pyx_RefNannySetupContext(name, acquire_gil) #define __Pyx_RefNannyFinishContext() #define __Pyx_INCREF(r) Py_INCREF(r) #define __Pyx_DECREF(r) Py_DECREF(r) #define __Pyx_GOTREF(r) #define __Pyx_GIVEREF(r) #define __Pyx_XINCREF(r) Py_XINCREF(r) #define __Pyx_XDECREF(r) Py_XDECREF(r) #define __Pyx_XGOTREF(r) #define __Pyx_XGIVEREF(r) #endif /* CYTHON_REFNANNY */ #define __Pyx_CLEAR(r) do { PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);} while(0) #define __Pyx_XCLEAR(r) do { if((r) != NULL) {PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);}} while(0) static void __Pyx_RaiseArgtupleInvalid(const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found); /*proto*/ static void __Pyx_RaiseDoubleKeywordsError(const char* func_name, PyObject* kw_name); /*proto*/ static int __Pyx_ParseOptionalKeywords(PyObject *kwds, PyObject **argnames[], \ PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, \ const char* function_name); /*proto*/ static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact); /*proto*/ static CYTHON_INLINE int __Pyx_GetBufferAndValidate(Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack); static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info); static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name); /*proto*/ static void __Pyx_RaiseBufferIndexError(int axis); /*proto*/ #define __Pyx_BufPtrStrided1d(type, buf, i0, s0) (type)((char*)buf + i0 * s0) static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb); /*proto*/ static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb); /*proto*/ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause); /*proto*/ static CYTHON_INLINE void 
__Pyx_RaiseNeedMoreValuesError(Py_ssize_t index); static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected); static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void); static void __Pyx_UnpackTupleError(PyObject *, Py_ssize_t index); /*proto*/ static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type); /*proto*/ typedef struct { Py_ssize_t shape, strides, suboffsets; } __Pyx_Buf_DimInfo; typedef struct { size_t refcount; Py_buffer pybuffer; } __Pyx_Buffer; typedef struct { __Pyx_Buffer *rcbuffer; char *data; __Pyx_Buf_DimInfo diminfo[8]; } __Pyx_LocalBuf_ND; #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags); static void __Pyx_ReleaseBuffer(Py_buffer *view); #else #define __Pyx_GetBuffer PyObject_GetBuffer #define __Pyx_ReleaseBuffer PyBuffer_Release #endif static Py_ssize_t __Pyx_zeros[] = {0, 0, 0, 0, 0, 0, 0, 0}; static Py_ssize_t __Pyx_minusones[] = {-1, -1, -1, -1, -1, -1, -1, -1}; static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, long level); /*proto*/ static CYTHON_INLINE PyObject *__Pyx_PyInt_to_py_Py_intptr_t(Py_intptr_t); #if CYTHON_CCOMPLEX #ifdef __cplusplus #define __Pyx_CREAL(z) ((z).real()) #define __Pyx_CIMAG(z) ((z).imag()) #else #define __Pyx_CREAL(z) (__real__(z)) #define __Pyx_CIMAG(z) (__imag__(z)) #endif #else #define __Pyx_CREAL(z) ((z).real) #define __Pyx_CIMAG(z) ((z).imag) #endif #if defined(_WIN32) && defined(__cplusplus) && CYTHON_CCOMPLEX #define __Pyx_SET_CREAL(z,x) ((z).real(x)) #define __Pyx_SET_CIMAG(z,y) ((z).imag(y)) #else #define __Pyx_SET_CREAL(z,x) __Pyx_CREAL(z) = (x) #define __Pyx_SET_CIMAG(z,y) __Pyx_CIMAG(z) = (y) #endif static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float, float); #if CYTHON_CCOMPLEX #define __Pyx_c_eqf(a, b) ((a)==(b)) #define __Pyx_c_sumf(a, b) ((a)+(b)) #define __Pyx_c_difff(a, b) ((a)-(b)) #define __Pyx_c_prodf(a, b) ((a)*(b)) #define __Pyx_c_quotf(a, b) ((a)/(b)) #define 
__Pyx_c_negf(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zerof(z) ((z)==(float)0) #define __Pyx_c_conjf(z) (::std::conj(z)) #if 1 #define __Pyx_c_absf(z) (::std::abs(z)) #define __Pyx_c_powf(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zerof(z) ((z)==0) #define __Pyx_c_conjf(z) (conjf(z)) #if 1 #define __Pyx_c_absf(z) (cabsf(z)) #define __Pyx_c_powf(a, b) (cpowf(a, b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex); static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex); #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex, __pyx_t_float_complex); #endif #endif static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double, double); #if CYTHON_CCOMPLEX #define __Pyx_c_eq(a, b) ((a)==(b)) #define __Pyx_c_sum(a, b) ((a)+(b)) #define __Pyx_c_diff(a, b) ((a)-(b)) #define __Pyx_c_prod(a, b) ((a)*(b)) #define __Pyx_c_quot(a, b) ((a)/(b)) #define __Pyx_c_neg(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zero(z) ((z)==(double)0) #define __Pyx_c_conj(z) (::std::conj(z)) #if 1 #define __Pyx_c_abs(z) (::std::abs(z)) #define __Pyx_c_pow(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zero(z) ((z)==0) #define __Pyx_c_conj(z) (conj(z)) #if 1 #define __Pyx_c_abs(z) (cabs(z)) #define __Pyx_c_pow(a, b) (cpow(a, 
b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex); static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex); #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex, __pyx_t_double_complex); #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject *); static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject *); static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject *); static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject *); static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject *); static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject *); static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject *); static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject *); static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject *); static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject *); static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject *); static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject *); static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject *); static 
CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject *); static CYTHON_INLINE npy_uint64 __Pyx_PyInt_from_py_npy_uint64(PyObject *); static CYTHON_INLINE Py_intptr_t __Pyx_PyInt_from_py_Py_intptr_t(PyObject *); static int __Pyx_check_binary_version(void); #if !defined(__Pyx_PyIdentifier_FromString) #if PY_MAJOR_VERSION < 3 #define __Pyx_PyIdentifier_FromString(s) PyString_FromString(s) #else #define __Pyx_PyIdentifier_FromString(s) PyUnicode_FromString(s) #endif #endif static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict); /*proto*/ static PyObject *__Pyx_ImportModule(const char *name); /*proto*/ static int __Pyx_ImportFunction(PyObject *module, const char *funcname, void (**f)(void), const char *sig); /*proto*/ typedef struct { int code_line; PyCodeObject* code_object; } __Pyx_CodeObjectCacheEntry; struct __Pyx_CodeObjectCache { int count; int max_count; __Pyx_CodeObjectCacheEntry* entries; }; static struct __Pyx_CodeObjectCache __pyx_code_cache = {0,0,NULL}; static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line); static PyCodeObject *__pyx_find_code_object(int code_line); static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object); static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename); /*proto*/ static int __Pyx_InitStrings(__Pyx_StringTabEntry *t); /*proto*/ /* Module declarations from 'cpython.buffer' */ /* Module declarations from 'cpython.ref' */ /* Module declarations from 'libc.stdio' */ /* Module declarations from 'cpython.object' */ /* Module declarations from 'libc.stdlib' */ /* Module declarations from 'numpy' */ /* Module declarations from 'numpy' */ static PyTypeObject *__pyx_ptype_5numpy_dtype = 0; static PyTypeObject *__pyx_ptype_5numpy_flatiter = 0; static PyTypeObject *__pyx_ptype_5numpy_broadcast = 0; static PyTypeObject *__pyx_ptype_5numpy_ndarray = 0; static 
PyTypeObject *__pyx_ptype_5numpy_ufunc = 0; static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *, char *, char *, int *); /*proto*/ /* Module declarations from 'cython' */ /* Module declarations from 'cogent.maths.spatial.ckd3' */ static struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *(*__pyx_f_6cogent_5maths_7spatial_4ckd3_points)(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *(*__pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree)(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t (*__pyx_f_6cogent_5maths_7spatial_4ckd3_rn)(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ /* Module declarations from 'stdlib' */ /* Module declarations from 'cogent.struct._contact' */ static __Pyx_TypeInfo __Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_DTYPE_t = { "DTYPE_t", NULL, sizeof(__pyx_t_6cogent_6struct_8_contact_DTYPE_t), { 0 }, 0, 'R', 0, 0 }; static __Pyx_TypeInfo __Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_LTYPE_t = { "LTYPE_t", NULL, sizeof(__pyx_t_6cogent_6struct_8_contact_LTYPE_t), { 0 }, 0, 'I', IS_UNSIGNED(__pyx_t_6cogent_6struct_8_contact_LTYPE_t), 0 }; static __Pyx_TypeInfo 
__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_UTYPE_t = { "UTYPE_t", NULL, sizeof(__pyx_t_6cogent_6struct_8_contact_UTYPE_t), { 0 }, 0, 'U', IS_UNSIGNED(__pyx_t_6cogent_6struct_8_contact_UTYPE_t), 0 }; #define __Pyx_MODULE_NAME "cogent.struct._contact" int __pyx_module_is_main_cogent__struct___contact = 0; /* Implementation of 'cogent.struct._contact' */ static PyObject *__pyx_builtin_ValueError; static PyObject *__pyx_builtin_range; static PyObject *__pyx_builtin_RuntimeError; static PyObject *__pyx_pf_6cogent_6struct_8_contact_cnt_loop(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_qcoords, PyArrayObject *__pyx_v_lcoords, PyArrayObject *__pyx_v_qc, PyArrayObject *__pyx_v_lc, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_shape1, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_shape2, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_zero_tra, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_mode, __pyx_t_6cogent_6struct_8_contact_DTYPE_t __pyx_v_search_limit, PyArrayObject *__pyx_v_box, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_bucket_size, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_MAXSYM, npy_intp __pyx_v_MAXCNT); /* proto */ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /* proto */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info); /* proto */ static char __pyx_k_1[] = "ndarray is not C contiguous"; static char __pyx_k_3[] = "ndarray is not Fortran contiguous"; static char __pyx_k_5[] = "Non-native byte order not supported"; static char __pyx_k_7[] = "unknown dtype code in numpy.pxd (%d)"; static char __pyx_k_8[] = "Format string allocated too short, see comment in numpy.pxd"; static char __pyx_k_11[] = "Format string allocated too short."; static char __pyx_k_13[] = "('1', '5', '3')"; static char __pyx_k_16[] = "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/struct/_contact.pyx"; 
static char __pyx_k_17[] = "cogent.struct._contact"; static char __pyx_k__B[] = "B"; static char __pyx_k__H[] = "H"; static char __pyx_k__I[] = "I"; static char __pyx_k__L[] = "L"; static char __pyx_k__O[] = "O"; static char __pyx_k__Q[] = "Q"; static char __pyx_k__b[] = "b"; static char __pyx_k__d[] = "d"; static char __pyx_k__f[] = "f"; static char __pyx_k__g[] = "g"; static char __pyx_k__h[] = "h"; static char __pyx_k__i[] = "i"; static char __pyx_k__l[] = "l"; static char __pyx_k__q[] = "q"; static char __pyx_k__Zd[] = "Zd"; static char __pyx_k__Zf[] = "Zf"; static char __pyx_k__Zg[] = "Zg"; static char __pyx_k__lc[] = "lc"; static char __pyx_k__np[] = "np"; static char __pyx_k__qc[] = "qc"; static char __pyx_k__box[] = "box"; static char __pyx_k__idx[] = "idx"; static char __pyx_k__idxc[] = "idxc"; static char __pyx_k__idxn[] = "idxn"; static char __pyx_k__lidx[] = "lidx"; static char __pyx_k__mode[] = "mode"; static char __pyx_k__tree[] = "tree"; static char __pyx_k__box_c[] = "box_c"; static char __pyx_k__c_asu[] = "c_asu"; static char __pyx_k__c_dst[] = "c_dst"; static char __pyx_k__c_src[] = "c_src"; static char __pyx_k__c_sym[] = "c_sym"; static char __pyx_k__c_tra[] = "c_tra"; static char __pyx_k__dtype[] = "dtype"; static char __pyx_k__numpy[] = "numpy"; static char __pyx_k__range[] = "range"; static char __pyx_k__t_arr[] = "t_arr"; static char __pyx_k__t_asu[] = "t_asu"; static char __pyx_k__t_dst[] = "t_dst"; static char __pyx_k__t_idx[] = "t_idx"; static char __pyx_k__t_lid[] = "t_lid"; static char __pyx_k__t_ptr[] = "t_ptr"; static char __pyx_k__t_sym[] = "t_sym"; static char __pyx_k__t_tra[] = "t_tra"; static char __pyx_k__MAXCNT[] = "MAXCNT"; static char __pyx_k__MAXSYM[] = "MAXSYM"; static char __pyx_k__dstptr[] = "dstptr"; static char __pyx_k__idxptr[] = "idxptr"; static char __pyx_k__kdpnts[] = "kdpnts"; static char __pyx_k__lidx_c[] = "lidx_c"; static char __pyx_k__shape1[] = "shape1"; static char __pyx_k__shape2[] = "shape2"; static char 
__pyx_k__uint64[] = "uint64"; static char __pyx_k__float64[] = "float64"; static char __pyx_k__lcoords[] = "lcoords"; static char __pyx_k__qcoords[] = "qcoords"; static char __pyx_k____main__[] = "__main__"; static char __pyx_k____test__[] = "__test__"; static char __pyx_k__cnt_loop[] = "cnt_loop"; static char __pyx_k__zero_tra[] = "zero_tra"; static char __pyx_k__asu_atoms[] = "asu_atoms"; static char __pyx_k__lcoords_c[] = "lcoords_c"; static char __pyx_k__qcoords_c[] = "qcoords_c"; static char __pyx_k__ValueError[] = "ValueError"; static char __pyx_k____version__[] = "__version__"; static char __pyx_k__bucket_size[] = "bucket_size"; static char __pyx_k__RuntimeError[] = "RuntimeError"; static char __pyx_k__search_limit[] = "search_limit"; static char __pyx_k__search_point[] = "search_point"; static char __pyx_k__neighbor_number[] = "neighbor_number"; static PyObject *__pyx_kp_u_1; static PyObject *__pyx_kp_u_11; static PyObject *__pyx_kp_s_13; static PyObject *__pyx_kp_s_16; static PyObject *__pyx_n_s_17; static PyObject *__pyx_kp_u_3; static PyObject *__pyx_kp_u_5; static PyObject *__pyx_kp_u_7; static PyObject *__pyx_kp_u_8; static PyObject *__pyx_n_s__MAXCNT; static PyObject *__pyx_n_s__MAXSYM; static PyObject *__pyx_n_s__RuntimeError; static PyObject *__pyx_n_s__ValueError; static PyObject *__pyx_n_s____main__; static PyObject *__pyx_n_s____test__; static PyObject *__pyx_n_s____version__; static PyObject *__pyx_n_s__asu_atoms; static PyObject *__pyx_n_s__box; static PyObject *__pyx_n_s__box_c; static PyObject *__pyx_n_s__bucket_size; static PyObject *__pyx_n_s__c_asu; static PyObject *__pyx_n_s__c_dst; static PyObject *__pyx_n_s__c_src; static PyObject *__pyx_n_s__c_sym; static PyObject *__pyx_n_s__c_tra; static PyObject *__pyx_n_s__cnt_loop; static PyObject *__pyx_n_s__dstptr; static PyObject *__pyx_n_s__dtype; static PyObject *__pyx_n_s__float64; static PyObject *__pyx_n_s__idx; static PyObject *__pyx_n_s__idxc; static PyObject *__pyx_n_s__idxn; static 
PyObject *__pyx_n_s__idxptr; static PyObject *__pyx_n_s__kdpnts; static PyObject *__pyx_n_s__lc; static PyObject *__pyx_n_s__lcoords; static PyObject *__pyx_n_s__lcoords_c; static PyObject *__pyx_n_s__lidx; static PyObject *__pyx_n_s__lidx_c; static PyObject *__pyx_n_s__mode; static PyObject *__pyx_n_s__neighbor_number; static PyObject *__pyx_n_s__np; static PyObject *__pyx_n_s__numpy; static PyObject *__pyx_n_s__qc; static PyObject *__pyx_n_s__qcoords; static PyObject *__pyx_n_s__qcoords_c; static PyObject *__pyx_n_s__range; static PyObject *__pyx_n_s__search_limit; static PyObject *__pyx_n_s__search_point; static PyObject *__pyx_n_s__shape1; static PyObject *__pyx_n_s__shape2; static PyObject *__pyx_n_s__t_arr; static PyObject *__pyx_n_s__t_asu; static PyObject *__pyx_n_s__t_dst; static PyObject *__pyx_n_s__t_idx; static PyObject *__pyx_n_s__t_lid; static PyObject *__pyx_n_s__t_ptr; static PyObject *__pyx_n_s__t_sym; static PyObject *__pyx_n_s__t_tra; static PyObject *__pyx_n_s__tree; static PyObject *__pyx_n_s__uint64; static PyObject *__pyx_n_s__zero_tra; static PyObject *__pyx_int_15; static PyObject *__pyx_k_tuple_2; static PyObject *__pyx_k_tuple_4; static PyObject *__pyx_k_tuple_6; static PyObject *__pyx_k_tuple_9; static PyObject *__pyx_k_tuple_10; static PyObject *__pyx_k_tuple_12; static PyObject *__pyx_k_tuple_14; static PyObject *__pyx_k_codeobj_15; /* Python wrapper */ static PyObject *__pyx_pw_6cogent_6struct_8_contact_1cnt_loop(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_6struct_8_contact_1cnt_loop = {__Pyx_NAMESTR("cnt_loop"), (PyCFunction)__pyx_pw_6cogent_6struct_8_contact_1cnt_loop, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_6struct_8_contact_1cnt_loop(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_qcoords = 0; PyArrayObject *__pyx_v_lcoords = 0; PyArrayObject *__pyx_v_qc = 0; PyArrayObject 
*__pyx_v_lc = 0; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_shape1; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_shape2; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_zero_tra; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_mode; __pyx_t_6cogent_6struct_8_contact_DTYPE_t __pyx_v_search_limit; PyArrayObject *__pyx_v_box = 0; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_bucket_size; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_MAXSYM; npy_intp __pyx_v_MAXCNT; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__qcoords,&__pyx_n_s__lcoords,&__pyx_n_s__qc,&__pyx_n_s__lc,&__pyx_n_s__shape1,&__pyx_n_s__shape2,&__pyx_n_s__zero_tra,&__pyx_n_s__mode,&__pyx_n_s__search_limit,&__pyx_n_s__box,&__pyx_n_s__bucket_size,&__pyx_n_s__MAXSYM,&__pyx_n_s__MAXCNT,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("cnt_loop (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[13] = {0,0,0,0,0,0,0,0,0,0,0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 13: values[12] = PyTuple_GET_ITEM(__pyx_args, 12); case 12: values[11] = PyTuple_GET_ITEM(__pyx_args, 11); case 11: values[10] = PyTuple_GET_ITEM(__pyx_args, 10); case 10: values[9] = PyTuple_GET_ITEM(__pyx_args, 9); case 9: values[8] = PyTuple_GET_ITEM(__pyx_args, 8); case 8: values[7] = PyTuple_GET_ITEM(__pyx_args, 7); case 7: values[6] = PyTuple_GET_ITEM(__pyx_args, 6); case 6: values[5] = PyTuple_GET_ITEM(__pyx_args, 5); case 5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4); case 4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3); case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__qcoords); if (likely(values[0])) 
kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__lcoords); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__qc); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 3: values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__lc); if (likely(values[3])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 3); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 4: values[4] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__shape1); if (likely(values[4])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 4); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 5: values[5] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__shape2); if (likely(values[5])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 5); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 6: values[6] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__zero_tra); if (likely(values[6])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 6); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 7: values[7] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__mode); if (likely(values[7])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 7); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 8: values[8] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__search_limit); if (likely(values[8])) kw_args--; else { 
__Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 8); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 9: values[9] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__box); if (likely(values[9])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, 9); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 10: if (kw_args > 0) { PyObject* value = PyDict_GetItem(__pyx_kwds, __pyx_n_s__bucket_size); if (value) { values[10] = value; kw_args--; } } case 11: if (kw_args > 0) { PyObject* value = PyDict_GetItem(__pyx_kwds, __pyx_n_s__MAXSYM); if (value) { values[11] = value; kw_args--; } } case 12: if (kw_args > 0) { PyObject* value = PyDict_GetItem(__pyx_kwds, __pyx_n_s__MAXCNT); if (value) { values[12] = value; kw_args--; } } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "cnt_loop") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } if (values[10]) { } else { __pyx_v_bucket_size = ((__pyx_t_6cogent_6struct_8_contact_UTYPE_t)10); } if (values[11]) { } else { __pyx_v_MAXSYM = ((__pyx_t_6cogent_6struct_8_contact_UTYPE_t)200000); } if (values[12]) { } else { __pyx_v_MAXCNT = ((npy_intp)100000); } } else { switch (PyTuple_GET_SIZE(__pyx_args)) { case 13: values[12] = PyTuple_GET_ITEM(__pyx_args, 12); case 12: values[11] = PyTuple_GET_ITEM(__pyx_args, 11); case 11: values[10] = PyTuple_GET_ITEM(__pyx_args, 10); case 10: values[9] = PyTuple_GET_ITEM(__pyx_args, 9); values[8] = PyTuple_GET_ITEM(__pyx_args, 8); values[7] = PyTuple_GET_ITEM(__pyx_args, 7); values[6] = PyTuple_GET_ITEM(__pyx_args, 6); values[5] = PyTuple_GET_ITEM(__pyx_args, 5); values[4] = PyTuple_GET_ITEM(__pyx_args, 4); values[3] = PyTuple_GET_ITEM(__pyx_args, 3); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[0] = 
PyTuple_GET_ITEM(__pyx_args, 0); break; default: goto __pyx_L5_argtuple_error; } } __pyx_v_qcoords = ((PyArrayObject *)values[0]); __pyx_v_lcoords = ((PyArrayObject *)values[1]); __pyx_v_qc = ((PyArrayObject *)values[2]); __pyx_v_lc = ((PyArrayObject *)values[3]); __pyx_v_shape1 = __Pyx_PyInt_from_py_npy_uint64(values[4]); if (unlikely((__pyx_v_shape1 == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 21; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_shape2 = __Pyx_PyInt_from_py_npy_uint64(values[5]); if (unlikely((__pyx_v_shape2 == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_zero_tra = __Pyx_PyInt_from_py_npy_uint64(values[6]); if (unlikely((__pyx_v_zero_tra == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 23; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_mode = __Pyx_PyInt_from_py_npy_uint64(values[7]); if (unlikely((__pyx_v_mode == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 24; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_search_limit = __pyx_PyFloat_AsDouble(values[8]); if (unlikely((__pyx_v_search_limit == (npy_float64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 25; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_box = ((PyArrayObject *)values[9]); if (values[10]) { __pyx_v_bucket_size = __Pyx_PyInt_from_py_npy_uint64(values[10]); if (unlikely((__pyx_v_bucket_size == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 27; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } else { __pyx_v_bucket_size = ((__pyx_t_6cogent_6struct_8_contact_UTYPE_t)10); } if (values[11]) { __pyx_v_MAXSYM = __Pyx_PyInt_from_py_npy_uint64(values[11]); if (unlikely((__pyx_v_MAXSYM == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 28; __pyx_clineno 
= __LINE__; goto __pyx_L3_error;} } else { __pyx_v_MAXSYM = ((__pyx_t_6cogent_6struct_8_contact_UTYPE_t)200000); } if (values[12]) { __pyx_v_MAXCNT = __Pyx_PyInt_from_py_Py_intptr_t(values[12]); if (unlikely((__pyx_v_MAXCNT == (npy_intp)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 29; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } else { __pyx_v_MAXCNT = ((npy_intp)100000); } } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("cnt_loop", 0, 10, 13, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.struct._contact.cnt_loop", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_qcoords), __pyx_ptype_5numpy_ndarray, 1, "qcoords", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_lcoords), __pyx_ptype_5numpy_ndarray, 1, "lcoords", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_qc), __pyx_ptype_5numpy_ndarray, 1, "qc", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 19; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_lc), __pyx_ptype_5numpy_ndarray, 1, "lc", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 20; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_box), __pyx_ptype_5numpy_ndarray, 1, "box", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 26; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_6struct_8_contact_cnt_loop(__pyx_self, __pyx_v_qcoords, __pyx_v_lcoords, __pyx_v_qc, __pyx_v_lc, __pyx_v_shape1, 
__pyx_v_shape2, __pyx_v_zero_tra, __pyx_v_mode, __pyx_v_search_limit, __pyx_v_box, __pyx_v_bucket_size, __pyx_v_MAXSYM, __pyx_v_MAXCNT); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/struct/_contact.pyx":17 * # NPY_OWNDATA * * def cnt_loop( np.ndarray[DTYPE_t, ndim =2] qcoords,\ # <<<<<<<<<<<<<< * np.ndarray[DTYPE_t, ndim =2] lcoords,\ * np.ndarray[LTYPE_t, ndim =1] qc,\ */ static PyObject *__pyx_pf_6cogent_6struct_8_contact_cnt_loop(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_qcoords, PyArrayObject *__pyx_v_lcoords, PyArrayObject *__pyx_v_qc, PyArrayObject *__pyx_v_lc, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_shape1, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_shape2, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_zero_tra, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_mode, __pyx_t_6cogent_6struct_8_contact_DTYPE_t __pyx_v_search_limit, PyArrayObject *__pyx_v_box, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_bucket_size, __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_MAXSYM, npy_intp __pyx_v_MAXCNT) { __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_asu_atoms; int __pyx_v_idx; int __pyx_v_lidx; int __pyx_v_lidx_c; int __pyx_v_idxn; int __pyx_v_idxc; __pyx_t_6cogent_6struct_8_contact_DTYPE_t *__pyx_v_qcoords_c; __pyx_t_6cogent_6struct_8_contact_DTYPE_t *__pyx_v_lcoords_c; __pyx_t_6cogent_6struct_8_contact_DTYPE_t *__pyx_v_box_c; __pyx_t_6cogent_6struct_8_contact_UTYPE_t **__pyx_v_idxptr; __pyx_t_6cogent_6struct_8_contact_DTYPE_t **__pyx_v_dstptr; __pyx_t_6cogent_6struct_8_contact_DTYPE_t *__pyx_v_t_ptr; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_t_idx; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_t_asu; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_t_sym; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_v_t_tra; __pyx_t_6cogent_6struct_8_contact_DTYPE_t __pyx_v_t_dst; __pyx_t_6cogent_6struct_8_contact_DTYPE_t *__pyx_v_t_arr; 
__pyx_t_6cogent_6struct_8_contact_UTYPE_t *__pyx_v_t_lid; PyArrayObject *__pyx_v_c_src = 0; PyArrayObject *__pyx_v_c_asu = 0; PyArrayObject *__pyx_v_c_sym = 0; PyArrayObject *__pyx_v_c_tra = 0; PyArrayObject *__pyx_v_c_dst = 0; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint __pyx_v_search_point; npy_intp __pyx_v_neighbor_number; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_kdpnts; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_tree; __Pyx_LocalBuf_ND __pyx_pybuffernd_box; __Pyx_Buffer __pyx_pybuffer_box; __Pyx_LocalBuf_ND __pyx_pybuffernd_c_asu; __Pyx_Buffer __pyx_pybuffer_c_asu; __Pyx_LocalBuf_ND __pyx_pybuffernd_c_dst; __Pyx_Buffer __pyx_pybuffer_c_dst; __Pyx_LocalBuf_ND __pyx_pybuffernd_c_src; __Pyx_Buffer __pyx_pybuffer_c_src; __Pyx_LocalBuf_ND __pyx_pybuffernd_c_sym; __Pyx_Buffer __pyx_pybuffer_c_sym; __Pyx_LocalBuf_ND __pyx_pybuffernd_c_tra; __Pyx_Buffer __pyx_pybuffer_c_tra; __Pyx_LocalBuf_ND __pyx_pybuffernd_lc; __Pyx_Buffer __pyx_pybuffer_lc; __Pyx_LocalBuf_ND __pyx_pybuffernd_lcoords; __Pyx_Buffer __pyx_pybuffer_lcoords; __Pyx_LocalBuf_ND __pyx_pybuffernd_qc; __Pyx_Buffer __pyx_pybuffer_qc; __Pyx_LocalBuf_ND __pyx_pybuffernd_qcoords; __Pyx_Buffer __pyx_pybuffer_qcoords; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; PyObject *__pyx_t_2 = NULL; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; npy_intp __pyx_t_5; int __pyx_t_6; __pyx_t_6cogent_6struct_8_contact_DTYPE_t __pyx_t_7; int __pyx_t_8; int __pyx_t_9; int __pyx_t_10; npy_intp __pyx_t_11; __pyx_t_6cogent_6struct_8_contact_UTYPE_t __pyx_t_12; int __pyx_t_13; int __pyx_t_14; int __pyx_t_15; int __pyx_t_16; int __pyx_t_17; int __pyx_t_18; int __pyx_t_19; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("cnt_loop", 0); __pyx_pybuffer_c_src.pybuffer.buf = NULL; __pyx_pybuffer_c_src.refcount = 0; __pyx_pybuffernd_c_src.data = NULL; __pyx_pybuffernd_c_src.rcbuffer = 
&__pyx_pybuffer_c_src; __pyx_pybuffer_c_asu.pybuffer.buf = NULL; __pyx_pybuffer_c_asu.refcount = 0; __pyx_pybuffernd_c_asu.data = NULL; __pyx_pybuffernd_c_asu.rcbuffer = &__pyx_pybuffer_c_asu; __pyx_pybuffer_c_sym.pybuffer.buf = NULL; __pyx_pybuffer_c_sym.refcount = 0; __pyx_pybuffernd_c_sym.data = NULL; __pyx_pybuffernd_c_sym.rcbuffer = &__pyx_pybuffer_c_sym; __pyx_pybuffer_c_tra.pybuffer.buf = NULL; __pyx_pybuffer_c_tra.refcount = 0; __pyx_pybuffernd_c_tra.data = NULL; __pyx_pybuffernd_c_tra.rcbuffer = &__pyx_pybuffer_c_tra; __pyx_pybuffer_c_dst.pybuffer.buf = NULL; __pyx_pybuffer_c_dst.refcount = 0; __pyx_pybuffernd_c_dst.data = NULL; __pyx_pybuffernd_c_dst.rcbuffer = &__pyx_pybuffer_c_dst; __pyx_pybuffer_qcoords.pybuffer.buf = NULL; __pyx_pybuffer_qcoords.refcount = 0; __pyx_pybuffernd_qcoords.data = NULL; __pyx_pybuffernd_qcoords.rcbuffer = &__pyx_pybuffer_qcoords; __pyx_pybuffer_lcoords.pybuffer.buf = NULL; __pyx_pybuffer_lcoords.refcount = 0; __pyx_pybuffernd_lcoords.data = NULL; __pyx_pybuffernd_lcoords.rcbuffer = &__pyx_pybuffer_lcoords; __pyx_pybuffer_qc.pybuffer.buf = NULL; __pyx_pybuffer_qc.refcount = 0; __pyx_pybuffernd_qc.data = NULL; __pyx_pybuffernd_qc.rcbuffer = &__pyx_pybuffer_qc; __pyx_pybuffer_lc.pybuffer.buf = NULL; __pyx_pybuffer_lc.refcount = 0; __pyx_pybuffernd_lc.data = NULL; __pyx_pybuffernd_lc.rcbuffer = &__pyx_pybuffer_lc; __pyx_pybuffer_box.pybuffer.buf = NULL; __pyx_pybuffer_box.refcount = 0; __pyx_pybuffernd_box.data = NULL; __pyx_pybuffernd_box.rcbuffer = &__pyx_pybuffer_box; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_qcoords.rcbuffer->pybuffer, (PyObject*)__pyx_v_qcoords, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_qcoords.diminfo[0].strides = 
__pyx_pybuffernd_qcoords.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_qcoords.diminfo[0].shape = __pyx_pybuffernd_qcoords.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_qcoords.diminfo[1].strides = __pyx_pybuffernd_qcoords.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_qcoords.diminfo[1].shape = __pyx_pybuffernd_qcoords.rcbuffer->pybuffer.shape[1]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_lcoords.rcbuffer->pybuffer, (PyObject*)__pyx_v_lcoords, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_lcoords.diminfo[0].strides = __pyx_pybuffernd_lcoords.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_lcoords.diminfo[0].shape = __pyx_pybuffernd_lcoords.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_lcoords.diminfo[1].strides = __pyx_pybuffernd_lcoords.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_lcoords.diminfo[1].shape = __pyx_pybuffernd_lcoords.rcbuffer->pybuffer.shape[1]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_qc.rcbuffer->pybuffer, (PyObject*)__pyx_v_qc, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_LTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_qc.diminfo[0].strides = __pyx_pybuffernd_qc.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_qc.diminfo[0].shape = __pyx_pybuffernd_qc.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_lc.rcbuffer->pybuffer, (PyObject*)__pyx_v_lc, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_LTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = 
__LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_lc.diminfo[0].strides = __pyx_pybuffernd_lc.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_lc.diminfo[0].shape = __pyx_pybuffernd_lc.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_box.rcbuffer->pybuffer, (PyObject*)__pyx_v_box, &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_box.diminfo[0].strides = __pyx_pybuffernd_box.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_box.diminfo[0].shape = __pyx_pybuffernd_box.rcbuffer->pybuffer.shape[0]; /* "cogent/struct/_contact.pyx":31 * npy_intp MAXCNT =100000): * #const * cdef UTYPE_t asu_atoms = shape1 * shape2 # <<<<<<<<<<<<<< * search_limit = search_limit * search_limit * */ __pyx_v_asu_atoms = (__pyx_v_shape1 * __pyx_v_shape2); /* "cogent/struct/_contact.pyx":32 * #const * cdef UTYPE_t asu_atoms = shape1 * shape2 * search_limit = search_limit * search_limit # <<<<<<<<<<<<<< * * #looping indexes query atom, lattice atom, neighbor, result */ __pyx_v_search_limit = (__pyx_v_search_limit * __pyx_v_search_limit); /* "cogent/struct/_contact.pyx":38 * * #c arrays from numpy * cdef DTYPE_t *qcoords_c = qcoords.data # <<<<<<<<<<<<<< * cdef DTYPE_t *lcoords_c = lcoords.data * cdef DTYPE_t *box_c = box.data */ __pyx_v_qcoords_c = ((__pyx_t_6cogent_6struct_8_contact_DTYPE_t *)__pyx_v_qcoords->data); /* "cogent/struct/_contact.pyx":39 * #c arrays from numpy * cdef DTYPE_t *qcoords_c = qcoords.data * cdef DTYPE_t *lcoords_c = lcoords.data # <<<<<<<<<<<<<< * cdef DTYPE_t *box_c = box.data * */ __pyx_v_lcoords_c = ((__pyx_t_6cogent_6struct_8_contact_DTYPE_t *)__pyx_v_lcoords->data); /* "cogent/struct/_contact.pyx":40 * cdef DTYPE_t *qcoords_c = qcoords.data * cdef DTYPE_t *lcoords_c = lcoords.data * cdef DTYPE_t *box_c = 
box.data # <<<<<<<<<<<<<< * * #malloc'ed pointers */ __pyx_v_box_c = ((__pyx_t_6cogent_6struct_8_contact_DTYPE_t *)__pyx_v_box->data); /* "cogent/struct/_contact.pyx":43 * * #malloc'ed pointers * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) # <<<<<<<<<<<<<< * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) * */ __pyx_v_idxptr = ((__pyx_t_6cogent_6struct_8_contact_UTYPE_t **)malloc((sizeof(__pyx_t_6cogent_6struct_8_contact_UTYPE_t *)))); /* "cogent/struct/_contact.pyx":44 * #malloc'ed pointers * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) # <<<<<<<<<<<<<< * * # temp */ __pyx_v_dstptr = ((__pyx_t_6cogent_6struct_8_contact_DTYPE_t **)malloc((sizeof(__pyx_t_6cogent_6struct_8_contact_DTYPE_t *)))); /* "cogent/struct/_contact.pyx":53 * cdef UTYPE_t t_tra # translation UTYPE_t * cdef DTYPE_t t_dst # distance * cdef DTYPE_t *t_arr = malloc(3 * MAXSYM * sizeof(DTYPE_t)) # temporary array of symmetry # <<<<<<<<<<<<<< * cdef UTYPE_t *t_lid = malloc( MAXSYM * sizeof(UTYPE_t)) # maping to original indices * */ __pyx_v_t_arr = ((__pyx_t_6cogent_6struct_8_contact_DTYPE_t *)malloc(((3 * __pyx_v_MAXSYM) * (sizeof(__pyx_t_6cogent_6struct_8_contact_DTYPE_t))))); /* "cogent/struct/_contact.pyx":54 * cdef DTYPE_t t_dst # distance * cdef DTYPE_t *t_arr = malloc(3 * MAXSYM * sizeof(DTYPE_t)) # temporary array of symmetry * cdef UTYPE_t *t_lid = malloc( MAXSYM * sizeof(UTYPE_t)) # maping to original indices # <<<<<<<<<<<<<< * * # result */ __pyx_v_t_lid = ((__pyx_t_6cogent_6struct_8_contact_UTYPE_t *)malloc((__pyx_v_MAXSYM * (sizeof(__pyx_t_6cogent_6struct_8_contact_UTYPE_t))))); /* "cogent/struct/_contact.pyx":63 * #cdef DTYPE_t *c_dst = malloc(MAXCNT * sizeof(DTYPE_t)) # distances * * cdef np.ndarray[UTYPE_t, ndim=1] c_src = np.ndarray((MAXCNT,), dtype=np.uint64) # <<<<<<<<<<<<<< * cdef np.ndarray[UTYPE_t, ndim=1] c_asu = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[UTYPE_t, ndim=1] c_sym = np.ndarray((MAXCNT,), 
dtype=np.uint64) */ __pyx_t_1 = __Pyx_PyInt_to_py_Py_intptr_t(__pyx_v_MAXCNT); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1); __Pyx_GIVEREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_t_2)); __Pyx_GIVEREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); __pyx_t_3 = __Pyx_GetName(__pyx_m, __pyx_n_s__np); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = PyObject_GetAttr(__pyx_t_3, __pyx_n_s__uint64); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (PyDict_SetItem(__pyx_t_2, ((PyObject *)__pyx_n_s__dtype), __pyx_t_4) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyObject_Call(((PyObject *)((PyObject*)__pyx_ptype_5numpy_ndarray)), ((PyObject *)__pyx_t_1), ((PyObject *)__pyx_t_2)); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; __Pyx_DECREF(((PyObject *)__pyx_t_2)); 
__pyx_t_2 = 0; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_c_src.rcbuffer->pybuffer, (PyObject*)((PyArrayObject *)__pyx_t_4), &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_UTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_c_src = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_c_src.rcbuffer->pybuffer.buf = NULL; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } else {__pyx_pybuffernd_c_src.diminfo[0].strides = __pyx_pybuffernd_c_src.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_c_src.diminfo[0].shape = __pyx_pybuffernd_c_src.rcbuffer->pybuffer.shape[0]; } } __pyx_v_c_src = ((PyArrayObject *)__pyx_t_4); __pyx_t_4 = 0; /* "cogent/struct/_contact.pyx":64 * * cdef np.ndarray[UTYPE_t, ndim=1] c_src = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[UTYPE_t, ndim=1] c_asu = np.ndarray((MAXCNT,), dtype=np.uint64) # <<<<<<<<<<<<<< * cdef np.ndarray[UTYPE_t, ndim=1] c_sym = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[UTYPE_t, ndim=1] c_tra = np.ndarray((MAXCNT,), dtype=np.uint64) */ __pyx_t_4 = __Pyx_PyInt_to_py_Py_intptr_t(__pyx_v_MAXCNT); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __Pyx_GIVEREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_2)); __Pyx_GIVEREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) 
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); __pyx_t_1 = __Pyx_GetName(__pyx_m, __pyx_n_s__np); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_3 = PyObject_GetAttr(__pyx_t_1, __pyx_n_s__uint64); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; if (PyDict_SetItem(__pyx_t_2, ((PyObject *)__pyx_n_s__dtype), __pyx_t_3) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyObject_Call(((PyObject *)((PyObject*)__pyx_ptype_5numpy_ndarray)), ((PyObject *)__pyx_t_4), ((PyObject *)__pyx_t_2)); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_c_asu.rcbuffer->pybuffer, (PyObject*)((PyArrayObject *)__pyx_t_3), &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_UTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_c_asu = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_c_asu.rcbuffer->pybuffer.buf = NULL; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } else {__pyx_pybuffernd_c_asu.diminfo[0].strides = __pyx_pybuffernd_c_asu.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_c_asu.diminfo[0].shape = __pyx_pybuffernd_c_asu.rcbuffer->pybuffer.shape[0]; } } __pyx_v_c_asu = ((PyArrayObject *)__pyx_t_3); __pyx_t_3 = 0; /* 
"cogent/struct/_contact.pyx":65 * cdef np.ndarray[UTYPE_t, ndim=1] c_src = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[UTYPE_t, ndim=1] c_asu = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[UTYPE_t, ndim=1] c_sym = np.ndarray((MAXCNT,), dtype=np.uint64) # <<<<<<<<<<<<<< * cdef np.ndarray[UTYPE_t, ndim=1] c_tra = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[DTYPE_t, ndim=1] c_dst = np.ndarray((MAXCNT,), dtype=np.float64) */ __pyx_t_3 = __Pyx_PyInt_to_py_Py_intptr_t(__pyx_v_MAXCNT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_t_2)); __Pyx_GIVEREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); __pyx_t_4 = __Pyx_GetName(__pyx_m, __pyx_n_s__np); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_1 = PyObject_GetAttr(__pyx_t_4, __pyx_n_s__uint64); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; if (PyDict_SetItem(__pyx_t_2, ((PyObject *)__pyx_n_s__dtype), __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = 
__LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyObject_Call(((PyObject *)((PyObject*)__pyx_ptype_5numpy_ndarray)), ((PyObject *)__pyx_t_3), ((PyObject *)__pyx_t_2)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_c_sym.rcbuffer->pybuffer, (PyObject*)((PyArrayObject *)__pyx_t_1), &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_UTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_c_sym = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_c_sym.rcbuffer->pybuffer.buf = NULL; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 65; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } else {__pyx_pybuffernd_c_sym.diminfo[0].strides = __pyx_pybuffernd_c_sym.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_c_sym.diminfo[0].shape = __pyx_pybuffernd_c_sym.rcbuffer->pybuffer.shape[0]; } } __pyx_v_c_sym = ((PyArrayObject *)__pyx_t_1); __pyx_t_1 = 0; /* "cogent/struct/_contact.pyx":66 * cdef np.ndarray[UTYPE_t, ndim=1] c_asu = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[UTYPE_t, ndim=1] c_sym = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[UTYPE_t, ndim=1] c_tra = np.ndarray((MAXCNT,), dtype=np.uint64) # <<<<<<<<<<<<<< * cdef np.ndarray[DTYPE_t, ndim=1] c_dst = np.ndarray((MAXCNT,), dtype=np.float64) * */ __pyx_t_1 = __Pyx_PyInt_to_py_Py_intptr_t(__pyx_v_MAXCNT); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1); __Pyx_GIVEREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_t_2)); __Pyx_GIVEREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); __pyx_t_3 = __Pyx_GetName(__pyx_m, __pyx_n_s__np); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = PyObject_GetAttr(__pyx_t_3, __pyx_n_s__uint64); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (PyDict_SetItem(__pyx_t_2, ((PyObject *)__pyx_n_s__dtype), __pyx_t_4) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyObject_Call(((PyObject *)((PyObject*)__pyx_ptype_5numpy_ndarray)), ((PyObject *)__pyx_t_1), ((PyObject *)__pyx_t_2)); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_c_tra.rcbuffer->pybuffer, (PyObject*)((PyArrayObject *)__pyx_t_4), &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_UTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_c_tra = ((PyArrayObject *)Py_None); 
__Pyx_INCREF(Py_None); __pyx_pybuffernd_c_tra.rcbuffer->pybuffer.buf = NULL; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 66; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } else {__pyx_pybuffernd_c_tra.diminfo[0].strides = __pyx_pybuffernd_c_tra.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_c_tra.diminfo[0].shape = __pyx_pybuffernd_c_tra.rcbuffer->pybuffer.shape[0]; } } __pyx_v_c_tra = ((PyArrayObject *)__pyx_t_4); __pyx_t_4 = 0; /* "cogent/struct/_contact.pyx":67 * cdef np.ndarray[UTYPE_t, ndim=1] c_sym = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[UTYPE_t, ndim=1] c_tra = np.ndarray((MAXCNT,), dtype=np.uint64) * cdef np.ndarray[DTYPE_t, ndim=1] c_dst = np.ndarray((MAXCNT,), dtype=np.float64) # <<<<<<<<<<<<<< * * # create a temporary array of lattice points, which are within a box around */ __pyx_t_4 = __Pyx_PyInt_to_py_Py_intptr_t(__pyx_v_MAXCNT); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __Pyx_GIVEREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_2)); __Pyx_GIVEREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); __pyx_t_1 = __Pyx_GetName(__pyx_m, __pyx_n_s__np); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_3 = 
PyObject_GetAttr(__pyx_t_1, __pyx_n_s__float64); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; if (PyDict_SetItem(__pyx_t_2, ((PyObject *)__pyx_n_s__dtype), __pyx_t_3) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyObject_Call(((PyObject *)((PyObject*)__pyx_ptype_5numpy_ndarray)), ((PyObject *)__pyx_t_4), ((PyObject *)__pyx_t_2)); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_c_dst.rcbuffer->pybuffer, (PyObject*)((PyArrayObject *)__pyx_t_3), &__Pyx_TypeInfo_nn___pyx_t_6cogent_6struct_8_contact_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_c_dst = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_c_dst.rcbuffer->pybuffer.buf = NULL; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 67; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } else {__pyx_pybuffernd_c_dst.diminfo[0].strides = __pyx_pybuffernd_c_dst.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_c_dst.diminfo[0].shape = __pyx_pybuffernd_c_dst.rcbuffer->pybuffer.shape[0]; } } __pyx_v_c_dst = ((PyArrayObject *)__pyx_t_3); __pyx_t_3 = 0; /* "cogent/struct/_contact.pyx":71 * # create a temporary array of lattice points, which are within a box around * # the query atoms. The kd-tree will be constructed from those filterd atoms. * lidx_c = 0 # <<<<<<<<<<<<<< * for 0 <= lidx < lcoords.shape[0]: * t_ptr = lcoords_c + lidx * 3 */ __pyx_v_lidx_c = 0; /* "cogent/struct/_contact.pyx":72 * # the query atoms. 
The kd-tree will be constructed from those filterd atoms. * lidx_c = 0 * for 0 <= lidx < lcoords.shape[0]: # <<<<<<<<<<<<<< * t_ptr = lcoords_c + lidx * 3 * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ */ __pyx_t_5 = (__pyx_v_lcoords->dimensions[0]); for (__pyx_v_lidx = 0; __pyx_v_lidx < __pyx_t_5; __pyx_v_lidx++) { /* "cogent/struct/_contact.pyx":73 * lidx_c = 0 * for 0 <= lidx < lcoords.shape[0]: * t_ptr = lcoords_c + lidx * 3 # <<<<<<<<<<<<<< * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ */ __pyx_v_t_ptr = (__pyx_v_lcoords_c + (__pyx_v_lidx * 3)); /* "cogent/struct/_contact.pyx":74 * for 0 <= lidx < lcoords.shape[0]: * t_ptr = lcoords_c + lidx * 3 * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ # <<<<<<<<<<<<<< * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: */ __pyx_t_6 = ((__pyx_v_box_c[0]) <= (__pyx_v_t_ptr[0])); if (__pyx_t_6) { __pyx_t_6 = ((__pyx_v_t_ptr[0]) <= (__pyx_v_box_c[3])); } if (__pyx_t_6) { /* "cogent/struct/_contact.pyx":75 * t_ptr = lcoords_c + lidx * 3 * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ # <<<<<<<<<<<<<< * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: * t_arr[3*lidx_c ] = (t_ptr )[0] */ __pyx_t_7 = ((__pyx_v_t_ptr + 1)[0]); __pyx_t_8 = ((__pyx_v_box_c[1]) <= __pyx_t_7); if (__pyx_t_8) { __pyx_t_8 = (__pyx_t_7 <= (__pyx_v_box_c[4])); } if (__pyx_t_8) { /* "cogent/struct/_contact.pyx":76 * if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ * box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: # <<<<<<<<<<<<<< * t_arr[3*lidx_c ] = (t_ptr )[0] * t_arr[3*lidx_c+1] = (t_ptr+1)[0] */ __pyx_t_7 = ((__pyx_v_t_ptr + 2)[0]); __pyx_t_9 = ((__pyx_v_box_c[2]) <= __pyx_t_7); if (__pyx_t_9) { __pyx_t_9 = (__pyx_t_7 <= (__pyx_v_box_c[5])); } __pyx_t_10 = __pyx_t_9; } else { __pyx_t_10 = __pyx_t_8; } __pyx_t_8 = __pyx_t_10; } else { __pyx_t_8 = __pyx_t_6; } if (__pyx_t_8) { /* "cogent/struct/_contact.pyx":77 * 
box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: * t_arr[3*lidx_c ] = (t_ptr )[0] # <<<<<<<<<<<<<< * t_arr[3*lidx_c+1] = (t_ptr+1)[0] * t_arr[3*lidx_c+2] = (t_ptr+2)[0] */ (__pyx_v_t_arr[(3 * __pyx_v_lidx_c)]) = (__pyx_v_t_ptr[0]); /* "cogent/struct/_contact.pyx":78 * box_c[2] <= (t_ptr+2)[0] <= box_c[5]: * t_arr[3*lidx_c ] = (t_ptr )[0] * t_arr[3*lidx_c+1] = (t_ptr+1)[0] # <<<<<<<<<<<<<< * t_arr[3*lidx_c+2] = (t_ptr+2)[0] * t_lid[lidx_c] = lidx */ (__pyx_v_t_arr[((3 * __pyx_v_lidx_c) + 1)]) = ((__pyx_v_t_ptr + 1)[0]); /* "cogent/struct/_contact.pyx":79 * t_arr[3*lidx_c ] = (t_ptr )[0] * t_arr[3*lidx_c+1] = (t_ptr+1)[0] * t_arr[3*lidx_c+2] = (t_ptr+2)[0] # <<<<<<<<<<<<<< * t_lid[lidx_c] = lidx * lidx_c += 1 */ (__pyx_v_t_arr[((3 * __pyx_v_lidx_c) + 2)]) = ((__pyx_v_t_ptr + 2)[0]); /* "cogent/struct/_contact.pyx":80 * t_arr[3*lidx_c+1] = (t_ptr+1)[0] * t_arr[3*lidx_c+2] = (t_ptr+2)[0] * t_lid[lidx_c] = lidx # <<<<<<<<<<<<<< * lidx_c += 1 * */ (__pyx_v_t_lid[__pyx_v_lidx_c]) = __pyx_v_lidx; /* "cogent/struct/_contact.pyx":81 * t_arr[3*lidx_c+2] = (t_ptr+2)[0] * t_lid[lidx_c] = lidx * lidx_c += 1 # <<<<<<<<<<<<<< * * #make kd-tree */ __pyx_v_lidx_c = (__pyx_v_lidx_c + 1); goto __pyx_L5; } __pyx_L5:; } /* "cogent/struct/_contact.pyx":86 * cdef kdpoint search_point * cdef npy_intp neighbor_number * cdef kdpoint *kdpnts = points(t_arr, lidx_c, 3) # <<<<<<<<<<<<<< * cdef kdnode *tree = build_tree(kdpnts, 0, lidx_c - 1, 3, bucket_size, 0) * */ __pyx_v_kdpnts = __pyx_f_6cogent_5maths_7spatial_4ckd3_points(__pyx_v_t_arr, __pyx_v_lidx_c, 3); /* "cogent/struct/_contact.pyx":87 * cdef npy_intp neighbor_number * cdef kdpoint *kdpnts = points(t_arr, lidx_c, 3) * cdef kdnode *tree = build_tree(kdpnts, 0, lidx_c - 1, 3, bucket_size, 0) # <<<<<<<<<<<<<< * * idxc = 0 */ __pyx_v_tree = __pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree(__pyx_v_kdpnts, 0, (__pyx_v_lidx_c - 1), 3, __pyx_v_bucket_size, 0); /* "cogent/struct/_contact.pyx":89 * cdef kdnode 
*tree = build_tree(kdpnts, 0, lidx_c - 1, 3, bucket_size, 0) * * idxc = 0 # <<<<<<<<<<<<<< * # loop over every query atom * for 0 <= idx < qcoords.shape[0]: */ __pyx_v_idxc = 0; /* "cogent/struct/_contact.pyx":91 * idxc = 0 * # loop over every query atom * for 0 <= idx < qcoords.shape[0]: # <<<<<<<<<<<<<< * search_point.coords = qcoords_c + idx*3 * neighbor_number = rn(tree, kdpnts, search_point, dstptr, idxptr, search_limit, 3, 100) */ __pyx_t_5 = (__pyx_v_qcoords->dimensions[0]); for (__pyx_v_idx = 0; __pyx_v_idx < __pyx_t_5; __pyx_v_idx++) { /* "cogent/struct/_contact.pyx":92 * # loop over every query atom * for 0 <= idx < qcoords.shape[0]: * search_point.coords = qcoords_c + idx*3 # <<<<<<<<<<<<<< * neighbor_number = rn(tree, kdpnts, search_point, dstptr, idxptr, search_limit, 3, 100) * # loop over all neighbors */ __pyx_v_search_point.coords = (__pyx_v_qcoords_c + (__pyx_v_idx * 3)); /* "cogent/struct/_contact.pyx":93 * for 0 <= idx < qcoords.shape[0]: * search_point.coords = qcoords_c + idx*3 * neighbor_number = rn(tree, kdpnts, search_point, dstptr, idxptr, search_limit, 3, 100) # <<<<<<<<<<<<<< * # loop over all neighbors * for 0 <= idxn < neighbor_number: */ __pyx_v_neighbor_number = ((npy_intp)__pyx_f_6cogent_5maths_7spatial_4ckd3_rn(__pyx_v_tree, __pyx_v_kdpnts, __pyx_v_search_point, __pyx_v_dstptr, __pyx_v_idxptr, __pyx_v_search_limit, 3, 100)); /* "cogent/struct/_contact.pyx":95 * neighbor_number = rn(tree, kdpnts, search_point, dstptr, idxptr, search_limit, 3, 100) * # loop over all neighbors * for 0 <= idxn < neighbor_number: # <<<<<<<<<<<<<< * t_dst = dstptr[0][idxn] # the distance of the neighbor to the query * if t_dst <= 0.001: # its the same atom, skipping. 
*/ __pyx_t_11 = __pyx_v_neighbor_number; for (__pyx_v_idxn = 0; __pyx_v_idxn < __pyx_t_11; __pyx_v_idxn++) { /* "cogent/struct/_contact.pyx":96 * # loop over all neighbors * for 0 <= idxn < neighbor_number: * t_dst = dstptr[0][idxn] # the distance of the neighbor to the query # <<<<<<<<<<<<<< * if t_dst <= 0.001: # its the same atom, skipping. * continue */ __pyx_v_t_dst = ((__pyx_v_dstptr[0])[__pyx_v_idxn]); /* "cogent/struct/_contact.pyx":97 * for 0 <= idxn < neighbor_number: * t_dst = dstptr[0][idxn] # the distance of the neighbor to the query * if t_dst <= 0.001: # its the same atom, skipping. # <<<<<<<<<<<<<< * continue * */ __pyx_t_8 = (__pyx_v_t_dst <= 0.001); if (__pyx_t_8) { /* "cogent/struct/_contact.pyx":98 * t_dst = dstptr[0][idxn] # the distance of the neighbor to the query * if t_dst <= 0.001: # its the same atom, skipping. * continue # <<<<<<<<<<<<<< * * t_idx = kdpnts[idxptr[0][idxn]].index # real index in t_arr array */ goto __pyx_L8_continue; goto __pyx_L10; } __pyx_L10:; /* "cogent/struct/_contact.pyx":100 * continue * * t_idx = kdpnts[idxptr[0][idxn]].index # real index in t_arr array # <<<<<<<<<<<<<< * t_idx = t_lid[t_idx] # real index in lcoords array * t_asu = t_idx % shape2 # 0 .. N -1, atom number */ __pyx_v_t_idx = (__pyx_v_kdpnts[((__pyx_v_idxptr[0])[__pyx_v_idxn])]).index; /* "cogent/struct/_contact.pyx":101 * * t_idx = kdpnts[idxptr[0][idxn]].index # real index in t_arr array * t_idx = t_lid[t_idx] # real index in lcoords array # <<<<<<<<<<<<<< * t_asu = t_idx % shape2 # 0 .. N -1, atom number * t_sym = (t_idx // shape2) % shape1 # 0 .. MXS -1, symmetry number */ __pyx_v_t_idx = (__pyx_v_t_lid[__pyx_v_t_idx]); /* "cogent/struct/_contact.pyx":102 * t_idx = kdpnts[idxptr[0][idxn]].index # real index in t_arr array * t_idx = t_lid[t_idx] # real index in lcoords array * t_asu = t_idx % shape2 # 0 .. N -1, atom number # <<<<<<<<<<<<<< * t_sym = (t_idx // shape2) % shape1 # 0 .. MXS -1, symmetry number * t_tra = t_idx // asu_atoms # 0 .. 
(2n + 1)^2 -1, translation number */ if (unlikely(__pyx_v_shape2 == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "integer division or modulo by zero"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 102; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_v_t_asu = (__pyx_v_t_idx % __pyx_v_shape2); /* "cogent/struct/_contact.pyx":103 * t_idx = t_lid[t_idx] # real index in lcoords array * t_asu = t_idx % shape2 # 0 .. N -1, atom number * t_sym = (t_idx // shape2) % shape1 # 0 .. MXS -1, symmetry number # <<<<<<<<<<<<<< * t_tra = t_idx // asu_atoms # 0 .. (2n + 1)^2 -1, translation number * */ if (unlikely(__pyx_v_shape2 == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "integer division or modulo by zero"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 103; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_12 = (__pyx_v_t_idx / __pyx_v_shape2); if (unlikely(__pyx_v_shape1 == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "integer division or modulo by zero"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 103; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_v_t_sym = (__pyx_t_12 % __pyx_v_shape1); /* "cogent/struct/_contact.pyx":104 * t_asu = t_idx % shape2 # 0 .. N -1, atom number * t_sym = (t_idx // shape2) % shape1 # 0 .. MXS -1, symmetry number * t_tra = t_idx // asu_atoms # 0 .. (2n + 1)^2 -1, translation number # <<<<<<<<<<<<<< * * if t_tra == zero_tra: # same unit cell */ if (unlikely(__pyx_v_asu_atoms == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "integer division or modulo by zero"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 104; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_v_t_tra = (__pyx_v_t_idx / __pyx_v_asu_atoms); /* "cogent/struct/_contact.pyx":106 * t_tra = t_idx // asu_atoms # 0 .. 
(2n + 1)^2 -1, translation number * * if t_tra == zero_tra: # same unit cell # <<<<<<<<<<<<<< * if (mode == 0): * continue */ __pyx_t_8 = (__pyx_v_t_tra == __pyx_v_zero_tra); if (__pyx_t_8) { /* "cogent/struct/_contact.pyx":107 * * if t_tra == zero_tra: # same unit cell * if (mode == 0): # <<<<<<<<<<<<<< * continue * elif (mode == 1) and (t_sym == 0): # same asymmetric unit */ __pyx_t_8 = (__pyx_v_mode == 0); if (__pyx_t_8) { /* "cogent/struct/_contact.pyx":108 * if t_tra == zero_tra: # same unit cell * if (mode == 0): * continue # <<<<<<<<<<<<<< * elif (mode == 1) and (t_sym == 0): # same asymmetric unit * continue */ goto __pyx_L8_continue; goto __pyx_L12; } /* "cogent/struct/_contact.pyx":109 * if (mode == 0): * continue * elif (mode == 1) and (t_sym == 0): # same asymmetric unit # <<<<<<<<<<<<<< * continue * elif (mode == 2) and (qc[idx] == lc[t_asu]) and (t_sym == 0): # */ __pyx_t_8 = (__pyx_v_mode == 1); if (__pyx_t_8) { __pyx_t_6 = (__pyx_v_t_sym == 0); __pyx_t_10 = __pyx_t_6; } else { __pyx_t_10 = __pyx_t_8; } if (__pyx_t_10) { /* "cogent/struct/_contact.pyx":110 * continue * elif (mode == 1) and (t_sym == 0): # same asymmetric unit * continue # <<<<<<<<<<<<<< * elif (mode == 2) and (qc[idx] == lc[t_asu]) and (t_sym == 0): # * continue */ goto __pyx_L8_continue; goto __pyx_L12; } /* "cogent/struct/_contact.pyx":111 * elif (mode == 1) and (t_sym == 0): # same asymmetric unit * continue * elif (mode == 2) and (qc[idx] == lc[t_asu]) and (t_sym == 0): # # <<<<<<<<<<<<<< * continue * */ __pyx_t_10 = (__pyx_v_mode == 2); if (__pyx_t_10) { __pyx_t_13 = __pyx_v_idx; __pyx_t_14 = -1; if (__pyx_t_13 < 0) { __pyx_t_13 += __pyx_pybuffernd_qc.diminfo[0].shape; if (unlikely(__pyx_t_13 < 0)) __pyx_t_14 = 0; } else if (unlikely(__pyx_t_13 >= __pyx_pybuffernd_qc.diminfo[0].shape)) __pyx_t_14 = 0; if (unlikely(__pyx_t_14 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_14); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 111; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } 
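The index arithmetic reproduced above (pyx lines 100-104) splits a flat index into the expanded crystal lattice back into an atom number, a symmetry-operation number, and a translation (unit-cell) number via modulo and floor division. A plain-Python sketch of the same decomposition, with hypothetical names `n_atoms`/`n_sym` standing in for `shape2`/`shape1`, and assuming `asu_atoms == n_sym * n_atoms` as the surrounding code implies:

```python
def decompose(t_idx, n_atoms, n_sym):
    """Split a flat lattice index into (atom, symmetry, translation).

    Mirrors the generated code: t_asu = t_idx % shape2,
    t_sym = (t_idx // shape2) % shape1, t_tra = t_idx // asu_atoms.
    """
    asu_atoms = n_sym * n_atoms          # atoms per unit cell (assumed)
    t_asu = t_idx % n_atoms              # 0 .. n_atoms - 1, atom number
    t_sym = (t_idx // n_atoms) % n_sym   # 0 .. n_sym - 1, symmetry number
    t_tra = t_idx // asu_atoms           # translation (cell) number
    return t_asu, t_sym, t_tra

# Round-trip check: the flat index is ((t_tra * n_sym) + t_sym) * n_atoms + t_asu.
n_atoms, n_sym = 7, 4
for t_idx in range(3 * n_sym * n_atoms):
    a, s, t = decompose(t_idx, n_atoms, n_sym)
    assert t_idx == (t * n_sym + s) * n_atoms + a
```

The zero-division guards emitted by Cython around each `%` and `//` above correspond to the cases where `shape1`, `shape2`, or `asu_atoms` would be zero.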
__pyx_t_12 = __pyx_v_t_asu; __pyx_t_14 = -1; if (unlikely(__pyx_t_12 >= (size_t)__pyx_pybuffernd_lc.diminfo[0].shape)) __pyx_t_14 = 0; if (unlikely(__pyx_t_14 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_14); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 111; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_8 = ((*__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_8_contact_LTYPE_t *, __pyx_pybuffernd_qc.rcbuffer->pybuffer.buf, __pyx_t_13, __pyx_pybuffernd_qc.diminfo[0].strides)) == (*__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_8_contact_LTYPE_t *, __pyx_pybuffernd_lc.rcbuffer->pybuffer.buf, __pyx_t_12, __pyx_pybuffernd_lc.diminfo[0].strides))); if (__pyx_t_8) { __pyx_t_6 = (__pyx_v_t_sym == 0); __pyx_t_9 = __pyx_t_6; } else { __pyx_t_9 = __pyx_t_8; } __pyx_t_8 = __pyx_t_9; } else { __pyx_t_8 = __pyx_t_10; } if (__pyx_t_8) { /* "cogent/struct/_contact.pyx":112 * continue * elif (mode == 2) and (qc[idx] == lc[t_asu]) and (t_sym == 0): # * continue # <<<<<<<<<<<<<< * * # safe valid contact */ goto __pyx_L8_continue; goto __pyx_L12; } __pyx_L12:; goto __pyx_L11; } __pyx_L11:; /* "cogent/struct/_contact.pyx":115 * * # safe valid contact * c_src[idxc] = idx # <<<<<<<<<<<<<< * c_asu[idxc] = t_asu * c_sym[idxc] = t_sym */ __pyx_t_14 = __pyx_v_idxc; __pyx_t_15 = -1; if (__pyx_t_14 < 0) { __pyx_t_14 += __pyx_pybuffernd_c_src.diminfo[0].shape; if (unlikely(__pyx_t_14 < 0)) __pyx_t_15 = 0; } else if (unlikely(__pyx_t_14 >= __pyx_pybuffernd_c_src.diminfo[0].shape)) __pyx_t_15 = 0; if (unlikely(__pyx_t_15 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_15); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 115; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } *__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_8_contact_UTYPE_t *, __pyx_pybuffernd_c_src.rcbuffer->pybuffer.buf, __pyx_t_14, __pyx_pybuffernd_c_src.diminfo[0].strides) = __pyx_v_idx; /* "cogent/struct/_contact.pyx":116 * # safe valid contact * c_src[idxc] = idx * c_asu[idxc] = t_asu # <<<<<<<<<<<<<< * c_sym[idxc] = t_sym 
* c_tra[idxc] = t_tra */ __pyx_t_15 = __pyx_v_idxc; __pyx_t_16 = -1; if (__pyx_t_15 < 0) { __pyx_t_15 += __pyx_pybuffernd_c_asu.diminfo[0].shape; if (unlikely(__pyx_t_15 < 0)) __pyx_t_16 = 0; } else if (unlikely(__pyx_t_15 >= __pyx_pybuffernd_c_asu.diminfo[0].shape)) __pyx_t_16 = 0; if (unlikely(__pyx_t_16 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_16); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 116; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } *__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_8_contact_UTYPE_t *, __pyx_pybuffernd_c_asu.rcbuffer->pybuffer.buf, __pyx_t_15, __pyx_pybuffernd_c_asu.diminfo[0].strides) = __pyx_v_t_asu; /* "cogent/struct/_contact.pyx":117 * c_src[idxc] = idx * c_asu[idxc] = t_asu * c_sym[idxc] = t_sym # <<<<<<<<<<<<<< * c_tra[idxc] = t_tra * c_dst[idxc] = t_dst */ __pyx_t_16 = __pyx_v_idxc; __pyx_t_17 = -1; if (__pyx_t_16 < 0) { __pyx_t_16 += __pyx_pybuffernd_c_sym.diminfo[0].shape; if (unlikely(__pyx_t_16 < 0)) __pyx_t_17 = 0; } else if (unlikely(__pyx_t_16 >= __pyx_pybuffernd_c_sym.diminfo[0].shape)) __pyx_t_17 = 0; if (unlikely(__pyx_t_17 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_17); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 117; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } *__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_8_contact_UTYPE_t *, __pyx_pybuffernd_c_sym.rcbuffer->pybuffer.buf, __pyx_t_16, __pyx_pybuffernd_c_sym.diminfo[0].strides) = __pyx_v_t_sym; /* "cogent/struct/_contact.pyx":118 * c_asu[idxc] = t_asu * c_sym[idxc] = t_sym * c_tra[idxc] = t_tra # <<<<<<<<<<<<<< * c_dst[idxc] = t_dst * idxc += 1 */ __pyx_t_17 = __pyx_v_idxc; __pyx_t_18 = -1; if (__pyx_t_17 < 0) { __pyx_t_17 += __pyx_pybuffernd_c_tra.diminfo[0].shape; if (unlikely(__pyx_t_17 < 0)) __pyx_t_18 = 0; } else if (unlikely(__pyx_t_17 >= __pyx_pybuffernd_c_tra.diminfo[0].shape)) __pyx_t_18 = 0; if (unlikely(__pyx_t_18 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_18); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 118; __pyx_clineno = __LINE__; 
goto __pyx_L1_error;} } *__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_8_contact_UTYPE_t *, __pyx_pybuffernd_c_tra.rcbuffer->pybuffer.buf, __pyx_t_17, __pyx_pybuffernd_c_tra.diminfo[0].strides) = __pyx_v_t_tra; /* "cogent/struct/_contact.pyx":119 * c_sym[idxc] = t_sym * c_tra[idxc] = t_tra * c_dst[idxc] = t_dst # <<<<<<<<<<<<<< * idxc += 1 * */ __pyx_t_18 = __pyx_v_idxc; __pyx_t_19 = -1; if (__pyx_t_18 < 0) { __pyx_t_18 += __pyx_pybuffernd_c_dst.diminfo[0].shape; if (unlikely(__pyx_t_18 < 0)) __pyx_t_19 = 0; } else if (unlikely(__pyx_t_18 >= __pyx_pybuffernd_c_dst.diminfo[0].shape)) __pyx_t_19 = 0; if (unlikely(__pyx_t_19 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_19); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 119; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } *__Pyx_BufPtrStrided1d(__pyx_t_6cogent_6struct_8_contact_DTYPE_t *, __pyx_pybuffernd_c_dst.rcbuffer->pybuffer.buf, __pyx_t_18, __pyx_pybuffernd_c_dst.diminfo[0].strides) = __pyx_v_t_dst; /* "cogent/struct/_contact.pyx":120 * c_tra[idxc] = t_tra * c_dst[idxc] = t_dst * idxc += 1 # <<<<<<<<<<<<<< * * free(t_arr) */ __pyx_v_idxc = (__pyx_v_idxc + 1); __pyx_L8_continue:; } } /* "cogent/struct/_contact.pyx":122 * idxc += 1 * * free(t_arr) # <<<<<<<<<<<<<< * free(t_lid) * # free KD-Tree */ free(__pyx_v_t_arr); /* "cogent/struct/_contact.pyx":123 * * free(t_arr) * free(t_lid) # <<<<<<<<<<<<<< * # free KD-Tree * free(idxptr[0]) */ free(__pyx_v_t_lid); /* "cogent/struct/_contact.pyx":125 * free(t_lid) * # free KD-Tree * free(idxptr[0]) # <<<<<<<<<<<<<< * free(idxptr) * free(dstptr) */ free((__pyx_v_idxptr[0])); /* "cogent/struct/_contact.pyx":126 * # free KD-Tree * free(idxptr[0]) * free(idxptr) # <<<<<<<<<<<<<< * free(dstptr) * */ free(__pyx_v_idxptr); /* "cogent/struct/_contact.pyx":127 * free(idxptr[0]) * free(idxptr) * free(dstptr) # <<<<<<<<<<<<<< * * # numpy output */ free(__pyx_v_dstptr); /* "cogent/struct/_contact.pyx":141 * #cdef np.ndarray n_dst = PyArray_SimpleNewFromData(1, &MAXCNT, NPY_DOUBLE, 
c_dst) * # n_dst.flags = n_dst.flags|(NPY_OWNDATA) # this sets the ownership bit * return (idxc, c_src, c_asu, c_sym, c_tra, c_dst) # <<<<<<<<<<<<<< */ __Pyx_XDECREF(__pyx_r); __pyx_t_3 = PyInt_FromLong(__pyx_v_idxc); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 141; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_2 = PyTuple_New(6); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 141; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); __Pyx_INCREF(((PyObject *)__pyx_v_c_src)); PyTuple_SET_ITEM(__pyx_t_2, 1, ((PyObject *)__pyx_v_c_src)); __Pyx_GIVEREF(((PyObject *)__pyx_v_c_src)); __Pyx_INCREF(((PyObject *)__pyx_v_c_asu)); PyTuple_SET_ITEM(__pyx_t_2, 2, ((PyObject *)__pyx_v_c_asu)); __Pyx_GIVEREF(((PyObject *)__pyx_v_c_asu)); __Pyx_INCREF(((PyObject *)__pyx_v_c_sym)); PyTuple_SET_ITEM(__pyx_t_2, 3, ((PyObject *)__pyx_v_c_sym)); __Pyx_GIVEREF(((PyObject *)__pyx_v_c_sym)); __Pyx_INCREF(((PyObject *)__pyx_v_c_tra)); PyTuple_SET_ITEM(__pyx_t_2, 4, ((PyObject *)__pyx_v_c_tra)); __Pyx_GIVEREF(((PyObject *)__pyx_v_c_tra)); __Pyx_INCREF(((PyObject *)__pyx_v_c_dst)); PyTuple_SET_ITEM(__pyx_t_2, 5, ((PyObject *)__pyx_v_c_dst)); __Pyx_GIVEREF(((PyObject *)__pyx_v_c_dst)); __pyx_t_3 = 0; __pyx_r = ((PyObject *)__pyx_t_2); __pyx_t_2 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_box.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_asu.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_dst.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_src.rcbuffer->pybuffer); 
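The `mode` branching inside the neighbor loop of `cnt_loop` (pyx lines 106-112) decides which neighbors count as contacts: contacts from a different unit cell always pass, while within the query's own cell the filter tightens from mode 0 (skip everything) to mode 1 (skip only the same asymmetric unit) to mode 2 (skip only the same chain of the same asymmetric unit). A hedged sketch of that predicate; the names `zero_tra`, `qc`, and `lc` follow the embedded `.pyx` source, but this is an illustration, not the compiled code:

```python
def skip_contact(mode, t_tra, t_sym, q_chain, l_chain, zero_tra=0):
    """Return True when a candidate neighbor should be discarded.

    mode 0: skip every atom in the query's own unit cell.
    mode 1: skip only atoms in the query's own asymmetric unit.
    mode 2: skip only same-chain atoms of the query's asymmetric unit.
    zero_tra is the translation index of the central cell (assumed 0 here).
    """
    if t_tra != zero_tra:    # different unit cell: always a valid contact
        return False
    if mode == 0:
        return True
    if mode == 1 and t_sym == 0:
        return True
    if mode == 2 and q_chain == l_chain and t_sym == 0:
        return True
    return False
```

Neighbors that survive this filter are appended to the `c_src`/`c_asu`/`c_sym`/`c_tra`/`c_dst` result arrays, and `idxc` counts how many were kept, which is why the function returns `(idxc, c_src, c_asu, c_sym, c_tra, c_dst)`.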
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_sym.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_tra.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_lc.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_lcoords.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_qc.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_qcoords.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.struct._contact.cnt_loop", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_box.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_asu.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_dst.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_src.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_sym.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_c_tra.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_lc.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_lcoords.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_qc.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_qcoords.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_c_src); __Pyx_XDECREF((PyObject *)__pyx_v_c_asu); __Pyx_XDECREF((PyObject *)__pyx_v_c_sym); __Pyx_XDECREF((PyObject *)__pyx_v_c_tra); __Pyx_XDECREF((PyObject *)__pyx_v_c_dst); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_r; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__getbuffer__ (wrapper)", 0); __pyx_r = 
__pyx_pf_5numpy_7ndarray___getbuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info), ((int)__pyx_v_flags)); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":193 * # experimental exception made for __getbuffer__ and __releasebuffer__ * # -- the details of this may change. * def __getbuffer__(ndarray self, Py_buffer* info, int flags): # <<<<<<<<<<<<<< * # This implementation of getbuffer is geared towards Cython * # requirements, and does not yet fullfill the PEP. */ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_v_copy_shape; int __pyx_v_i; int __pyx_v_ndim; int __pyx_v_endian_detector; int __pyx_v_little_endian; int __pyx_v_t; char *__pyx_v_f; PyArray_Descr *__pyx_v_descr = 0; int __pyx_v_offset; int __pyx_v_hasfields; int __pyx_r; __Pyx_RefNannyDeclarations int __pyx_t_1; int __pyx_t_2; int __pyx_t_3; PyObject *__pyx_t_4 = NULL; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; PyObject *__pyx_t_8 = NULL; char *__pyx_t_9; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("__getbuffer__", 0); if (__pyx_v_info != NULL) { __pyx_v_info->obj = Py_None; __Pyx_INCREF(Py_None); __Pyx_GIVEREF(__pyx_v_info->obj); } /* "numpy.pxd":199 * # of flags * * if info == NULL: return # <<<<<<<<<<<<<< * * cdef int copy_shape, i, ndim */ __pyx_t_1 = (__pyx_v_info == NULL); if (__pyx_t_1) { __pyx_r = 0; goto __pyx_L0; goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":202 * * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * */ __pyx_v_endian_detector = 1; /* "numpy.pxd":203 * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * * ndim = PyArray_NDIM(self) */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":205 * cdef 
bint little_endian = ((&endian_detector)[0] != 0) * * ndim = PyArray_NDIM(self) # <<<<<<<<<<<<<< * * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_v_ndim = PyArray_NDIM(__pyx_v_self); /* "numpy.pxd":207 * ndim = PyArray_NDIM(self) * * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * copy_shape = 1 * else: */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":208 * * if sizeof(npy_intp) != sizeof(Py_ssize_t): * copy_shape = 1 # <<<<<<<<<<<<<< * else: * copy_shape = 0 */ __pyx_v_copy_shape = 1; goto __pyx_L4; } /*else*/ { /* "numpy.pxd":210 * copy_shape = 1 * else: * copy_shape = 0 # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) */ __pyx_v_copy_shape = 0; } __pyx_L4:; /* "numpy.pxd":212 * copy_shape = 0 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") */ __pyx_t_1 = ((__pyx_v_flags & PyBUF_C_CONTIGUOUS) == PyBUF_C_CONTIGUOUS); if (__pyx_t_1) { /* "numpy.pxd":213 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not C contiguous") * */ __pyx_t_2 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_C_CONTIGUOUS)); __pyx_t_3 = __pyx_t_2; } else { __pyx_t_3 = __pyx_t_1; } if (__pyx_t_3) { /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_2), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 
0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":216 * raise ValueError(u"ndarray is not C contiguous") * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") */ __pyx_t_3 = ((__pyx_v_flags & PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS); if (__pyx_t_3) { /* "numpy.pxd":217 * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not Fortran contiguous") * */ __pyx_t_1 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_F_CONTIGUOUS)); __pyx_t_2 = __pyx_t_1; } else { __pyx_t_2 = __pyx_t_3; } if (__pyx_t_2) { /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_4), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":220 * raise ValueError(u"ndarray is not Fortran contiguous") * * info.buf = PyArray_DATA(self) # <<<<<<<<<<<<<< * info.ndim = ndim * if copy_shape: */ __pyx_v_info->buf = PyArray_DATA(__pyx_v_self); /* "numpy.pxd":221 * * info.buf = PyArray_DATA(self) * info.ndim = ndim # <<<<<<<<<<<<<< * if copy_shape: * # Allocate new buffer for strides and shape info. 
*/ __pyx_v_info->ndim = __pyx_v_ndim; /* "numpy.pxd":222 * info.buf = PyArray_DATA(self) * info.ndim = ndim * if copy_shape: # <<<<<<<<<<<<<< * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. */ if (__pyx_v_copy_shape) { /* "numpy.pxd":225 * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) # <<<<<<<<<<<<<< * info.shape = info.strides + ndim * for i in range(ndim): */ __pyx_v_info->strides = ((Py_ssize_t *)malloc((((sizeof(Py_ssize_t)) * ((size_t)__pyx_v_ndim)) * 2))); /* "numpy.pxd":226 * # This is allocated as one block, strides first. * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim # <<<<<<<<<<<<<< * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] */ __pyx_v_info->shape = (__pyx_v_info->strides + __pyx_v_ndim); /* "numpy.pxd":227 * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim * for i in range(ndim): # <<<<<<<<<<<<<< * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] */ __pyx_t_5 = __pyx_v_ndim; for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) { __pyx_v_i = __pyx_t_6; /* "numpy.pxd":228 * info.shape = info.strides + ndim * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] # <<<<<<<<<<<<<< * info.shape[i] = PyArray_DIMS(self)[i] * else: */ (__pyx_v_info->strides[__pyx_v_i]) = (PyArray_STRIDES(__pyx_v_self)[__pyx_v_i]); /* "numpy.pxd":229 * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] # <<<<<<<<<<<<<< * else: * info.strides = PyArray_STRIDES(self) */ (__pyx_v_info->shape[__pyx_v_i]) = (PyArray_DIMS(__pyx_v_self)[__pyx_v_i]); } goto __pyx_L7; } /*else*/ { /* "numpy.pxd":231 * info.shape[i] = PyArray_DIMS(self)[i] * else: * info.strides = PyArray_STRIDES(self) # 
<<<<<<<<<<<<<< * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL */ __pyx_v_info->strides = ((Py_ssize_t *)PyArray_STRIDES(__pyx_v_self)); /* "numpy.pxd":232 * else: * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) # <<<<<<<<<<<<<< * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) */ __pyx_v_info->shape = ((Py_ssize_t *)PyArray_DIMS(__pyx_v_self)); } __pyx_L7:; /* "numpy.pxd":233 * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL # <<<<<<<<<<<<<< * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) */ __pyx_v_info->suboffsets = NULL; /* "numpy.pxd":234 * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) # <<<<<<<<<<<<<< * info.readonly = not PyArray_ISWRITEABLE(self) * */ __pyx_v_info->itemsize = PyArray_ITEMSIZE(__pyx_v_self); /* "numpy.pxd":235 * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) # <<<<<<<<<<<<<< * * cdef int t */ __pyx_v_info->readonly = (!PyArray_ISWRITEABLE(__pyx_v_self)); /* "numpy.pxd":238 * * cdef int t * cdef char* f = NULL # <<<<<<<<<<<<<< * cdef dtype descr = self.descr * cdef list stack */ __pyx_v_f = NULL; /* "numpy.pxd":239 * cdef int t * cdef char* f = NULL * cdef dtype descr = self.descr # <<<<<<<<<<<<<< * cdef list stack * cdef int offset */ __Pyx_INCREF(((PyObject *)__pyx_v_self->descr)); __pyx_v_descr = __pyx_v_self->descr; /* "numpy.pxd":243 * cdef int offset * * cdef bint hasfields = PyDataType_HASFIELDS(descr) # <<<<<<<<<<<<<< * * if not hasfields and not copy_shape: */ __pyx_v_hasfields = PyDataType_HASFIELDS(__pyx_v_descr); /* "numpy.pxd":245 * cdef bint hasfields = PyDataType_HASFIELDS(descr) * * if not hasfields and not copy_shape: # <<<<<<<<<<<<<< * # do not call releasebuffer * info.obj = None */ __pyx_t_2 = (!__pyx_v_hasfields); if (__pyx_t_2) { __pyx_t_3 = (!__pyx_v_copy_shape); 
__pyx_t_1 = __pyx_t_3; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":247 * if not hasfields and not copy_shape: * # do not call releasebuffer * info.obj = None # <<<<<<<<<<<<<< * else: * # need to call releasebuffer */ __Pyx_INCREF(Py_None); __Pyx_GIVEREF(Py_None); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = Py_None; goto __pyx_L10; } /*else*/ { /* "numpy.pxd":250 * else: * # need to call releasebuffer * info.obj = self # <<<<<<<<<<<<<< * * if not hasfields: */ __Pyx_INCREF(((PyObject *)__pyx_v_self)); __Pyx_GIVEREF(((PyObject *)__pyx_v_self)); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = ((PyObject *)__pyx_v_self); } __pyx_L10:; /* "numpy.pxd":252 * info.obj = self * * if not hasfields: # <<<<<<<<<<<<<< * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or */ __pyx_t_1 = (!__pyx_v_hasfields); if (__pyx_t_1) { /* "numpy.pxd":253 * * if not hasfields: * t = descr.type_num # <<<<<<<<<<<<<< * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): */ __pyx_v_t = __pyx_v_descr->type_num; /* "numpy.pxd":254 * if not hasfields: * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_1 = (__pyx_v_descr->byteorder == '>'); if (__pyx_t_1) { __pyx_t_2 = __pyx_v_little_endian; } else { __pyx_t_2 = __pyx_t_1; } if (!__pyx_t_2) { /* "numpy.pxd":255 * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" */ __pyx_t_1 = (__pyx_v_descr->byteorder == '<'); if (__pyx_t_1) { __pyx_t_3 = (!__pyx_v_little_endian); __pyx_t_7 = __pyx_t_3; } else { __pyx_t_7 = __pyx_t_1; } __pyx_t_1 = __pyx_t_7; } else { 
__pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_6), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L12; } __pyx_L12:; /* "numpy.pxd":257 * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" */ __pyx_t_1 = (__pyx_v_t == NPY_BYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__b; goto __pyx_L13; } /* "numpy.pxd":258 * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" */ __pyx_t_1 = (__pyx_v_t == NPY_UBYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__B; goto __pyx_L13; } /* "numpy.pxd":259 * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" */ __pyx_t_1 = (__pyx_v_t == NPY_SHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__h; goto __pyx_L13; } /* "numpy.pxd":260 * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" */ __pyx_t_1 = (__pyx_v_t == NPY_USHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__H; goto __pyx_L13; } /* "numpy.pxd":261 * elif t 
== NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" */ __pyx_t_1 = (__pyx_v_t == NPY_INT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__i; goto __pyx_L13; } /* "numpy.pxd":262 * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" # <<<<<<<<<<<<<< * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" */ __pyx_t_1 = (__pyx_v_t == NPY_UINT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__I; goto __pyx_L13; } /* "numpy.pxd":263 * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" */ __pyx_t_1 = (__pyx_v_t == NPY_LONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__l; goto __pyx_L13; } /* "numpy.pxd":264 * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__L; goto __pyx_L13; } /* "numpy.pxd":265 * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__q; goto __pyx_L13; } /* "numpy.pxd":266 * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Q; goto __pyx_L13; } /* "numpy.pxd":267 * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" */ __pyx_t_1 = (__pyx_v_t == NPY_FLOAT); if (__pyx_t_1) { 
__pyx_v_f = __pyx_k__f; goto __pyx_L13; } /* "numpy.pxd":268 * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" */ __pyx_t_1 = (__pyx_v_t == NPY_DOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__d; goto __pyx_L13; } /* "numpy.pxd":269 * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__g; goto __pyx_L13; } /* "numpy.pxd":270 * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" */ __pyx_t_1 = (__pyx_v_t == NPY_CFLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zf; goto __pyx_L13; } /* "numpy.pxd":271 * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" */ __pyx_t_1 = (__pyx_v_t == NPY_CDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zd; goto __pyx_L13; } /* "numpy.pxd":272 * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f = "O" * else: */ __pyx_t_1 = (__pyx_v_t == NPY_CLONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zg; goto __pyx_L13; } /* "numpy.pxd":273 * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_1 = (__pyx_v_t == NPY_OBJECT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__O; goto __pyx_L13; } /*else*/ { /* "numpy.pxd":275 * elif t == NPY_OBJECT: f = "O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd 
(%d)" % t) # <<<<<<<<<<<<<< * info.format = f * return */ __pyx_t_4 = PyInt_FromLong(__pyx_v_t); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_8 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_t_4); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_8)); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_8)); __Pyx_GIVEREF(((PyObject *)__pyx_t_8)); __pyx_t_8 = 0; __pyx_t_8 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_8); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_8, 0, 0, 0); __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L13:; /* "numpy.pxd":276 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f # <<<<<<<<<<<<<< * return * else: */ __pyx_v_info->format = __pyx_v_f; /* "numpy.pxd":277 * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f * return # <<<<<<<<<<<<<< * else: * info.format = stdlib.malloc(_buffer_format_string_len) */ __pyx_r = 0; goto __pyx_L0; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":279 * return * else: * info.format = stdlib.malloc(_buffer_format_string_len) # <<<<<<<<<<<<<< * info.format[0] = '^' # Native data types, manual alignment * offset = 0 */ __pyx_v_info->format = ((char *)malloc(255)); /* "numpy.pxd":280 * else: * info.format = 
stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment # <<<<<<<<<<<<<< * offset = 0 * f = _util_dtypestring(descr, info.format + 1, */ (__pyx_v_info->format[0]) = '^'; /* "numpy.pxd":281 * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment * offset = 0 # <<<<<<<<<<<<<< * f = _util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, */ __pyx_v_offset = 0; /* "numpy.pxd":284 * f = _util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, * &offset) # <<<<<<<<<<<<<< * f[0] = 0 # Terminate format string * */ __pyx_t_9 = __pyx_f_5numpy__util_dtypestring(__pyx_v_descr, (__pyx_v_info->format + 1), (__pyx_v_info->format + 255), (&__pyx_v_offset)); if (unlikely(__pyx_t_9 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_9; /* "numpy.pxd":285 * info.format + _buffer_format_string_len, * &offset) * f[0] = 0 # Terminate format string # <<<<<<<<<<<<<< * * def __releasebuffer__(ndarray self, Py_buffer* info): */ (__pyx_v_f[0]) = 0; } __pyx_L11:; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_8); __Pyx_AddTraceback("numpy.ndarray.__getbuffer__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = -1; if (__pyx_v_info != NULL && __pyx_v_info->obj != NULL) { __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = NULL; } goto __pyx_L2; __pyx_L0:; if (__pyx_v_info != NULL && __pyx_v_info->obj == Py_None) { __Pyx_GOTREF(Py_None); __Pyx_DECREF(Py_None); __pyx_v_info->obj = NULL; } __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_descr); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info); /*proto*/ static void 
__pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__releasebuffer__ (wrapper)", 0); __pyx_pf_5numpy_7ndarray_2__releasebuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info)); __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":287 * f[0] = 0 # Terminate format string * * def __releasebuffer__(ndarray self, Py_buffer* info): # <<<<<<<<<<<<<< * if PyArray_HASFIELDS(self): * stdlib.free(info.format) */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("__releasebuffer__", 0); /* "numpy.pxd":288 * * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): # <<<<<<<<<<<<<< * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_t_1 = PyArray_HASFIELDS(__pyx_v_self); if (__pyx_t_1) { /* "numpy.pxd":289 * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): * stdlib.free(info.format) # <<<<<<<<<<<<<< * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) */ free(__pyx_v_info->format); goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":290 * if PyArray_HASFIELDS(self): * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * stdlib.free(info.strides) * # info.shape was stored after info.strides in the same block */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":291 * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) # <<<<<<<<<<<<<< * # info.shape was stored after info.strides in the same block * */ free(__pyx_v_info->strides); goto __pyx_L4; } __pyx_L4:; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":767 * ctypedef npy_cdouble complex_t * * cdef inline object PyArray_MultiIterNew1(a): # <<<<<<<<<<<<<< * return 
PyArray_MultiIterNew(1, a) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew1(PyObject *__pyx_v_a) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew1", 0); /* "numpy.pxd":768 * * cdef inline object PyArray_MultiIterNew1(a): * return PyArray_MultiIterNew(1, a) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew2(a, b): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(1, ((void *)__pyx_v_a)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 768; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew1", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":770 * return PyArray_MultiIterNew(1, a) * * cdef inline object PyArray_MultiIterNew2(a, b): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(2, a, b) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew2(PyObject *__pyx_v_a, PyObject *__pyx_v_b) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew2", 0); /* "numpy.pxd":771 * * cdef inline object PyArray_MultiIterNew2(a, b): * return PyArray_MultiIterNew(2, a, b) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew3(a, b, c): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(2, ((void *)__pyx_v_a), ((void *)__pyx_v_b)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 771; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew2", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":773 * return PyArray_MultiIterNew(2, a, b) * * cdef inline object PyArray_MultiIterNew3(a, b, c): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(3, a, b, c) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew3(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew3", 0); /* "numpy.pxd":774 * * cdef inline object PyArray_MultiIterNew3(a, b, c): * return PyArray_MultiIterNew(3, a, b, c) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(3, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 774; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew3", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":776 * return PyArray_MultiIterNew(3, a, b, c) * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(4, a, b, c, d) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew4(PyObject 
*__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew4", 0); /* "numpy.pxd":777 * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): * return PyArray_MultiIterNew(4, a, b, c, d) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(4, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 777; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew4", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":779 * return PyArray_MultiIterNew(4, a, b, c, d) * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(5, a, b, c, d, e) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew5(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d, PyObject *__pyx_v_e) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew5", 0); /* "numpy.pxd":780 * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): * return PyArray_MultiIterNew(5, a, b, c, d, e) # <<<<<<<<<<<<<< * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: */ __Pyx_XDECREF(__pyx_r); 
__pyx_t_1 = PyArray_MultiIterNew(5, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d), ((void *)__pyx_v_e)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 780; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew5", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":782 * return PyArray_MultiIterNew(5, a, b, c, d, e) * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: # <<<<<<<<<<<<<< * # Recursive utility function used in __getbuffer__ to get format * # string. The new location in the format string is returned. */ static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *__pyx_v_descr, char *__pyx_v_f, char *__pyx_v_end, int *__pyx_v_offset) { PyArray_Descr *__pyx_v_child = 0; int __pyx_v_endian_detector; int __pyx_v_little_endian; PyObject *__pyx_v_fields = 0; PyObject *__pyx_v_childname = NULL; PyObject *__pyx_v_new_offset = NULL; PyObject *__pyx_v_t = NULL; char *__pyx_r; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; Py_ssize_t __pyx_t_2; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; PyObject *__pyx_t_5 = NULL; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; long __pyx_t_10; char *__pyx_t_11; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("_util_dtypestring", 0); /* "numpy.pxd":789 * cdef int delta_offset * cdef tuple i * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * cdef tuple fields */ __pyx_v_endian_detector = 1; /* "numpy.pxd":790 * cdef tuple i * cdef int 
endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * cdef tuple fields * */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":793 * cdef tuple fields * * for childname in descr.names: # <<<<<<<<<<<<<< * fields = descr.fields[childname] * child, new_offset = fields */ if (unlikely(((PyObject *)__pyx_v_descr->names) == Py_None)) { PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 793; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_1 = ((PyObject *)__pyx_v_descr->names); __Pyx_INCREF(__pyx_t_1); __pyx_t_2 = 0; for (;;) { if (__pyx_t_2 >= PyTuple_GET_SIZE(__pyx_t_1)) break; __pyx_t_3 = PyTuple_GET_ITEM(__pyx_t_1, __pyx_t_2); __Pyx_INCREF(__pyx_t_3); __pyx_t_2++; __Pyx_XDECREF(__pyx_v_childname); __pyx_v_childname = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":794 * * for childname in descr.names: * fields = descr.fields[childname] # <<<<<<<<<<<<<< * child, new_offset = fields * */ __pyx_t_3 = PyObject_GetItem(__pyx_v_descr->fields, __pyx_v_childname); if (!__pyx_t_3) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); if (!(likely(PyTuple_CheckExact(__pyx_t_3))||((__pyx_t_3) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected tuple, got %.200s", Py_TYPE(__pyx_t_3)->tp_name), 0))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_fields)); __pyx_v_fields = ((PyObject*)__pyx_t_3); __pyx_t_3 = 0; /* "numpy.pxd":795 * for childname in descr.names: * fields = descr.fields[childname] * child, new_offset = fields # <<<<<<<<<<<<<< * * if (end - f) - (new_offset - offset[0]) < 15: */ if (likely(PyTuple_CheckExact(((PyObject *)__pyx_v_fields)))) { PyObject* sequence = ((PyObject *)__pyx_v_fields); if (unlikely(PyTuple_GET_SIZE(sequence) != 2)) { if 
(PyTuple_GET_SIZE(sequence) > 2) __Pyx_RaiseTooManyValuesError(2); else __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(sequence)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_3 = PyTuple_GET_ITEM(sequence, 0); __pyx_t_4 = PyTuple_GET_ITEM(sequence, 1); __Pyx_INCREF(__pyx_t_3); __Pyx_INCREF(__pyx_t_4); } else { __Pyx_UnpackTupleError(((PyObject *)__pyx_v_fields), 2); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } if (!(likely(((__pyx_t_3) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_3, __pyx_ptype_5numpy_dtype))))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_child)); __pyx_v_child = ((PyArray_Descr *)__pyx_t_3); __pyx_t_3 = 0; __Pyx_XDECREF(__pyx_v_new_offset); __pyx_v_new_offset = __pyx_t_4; __pyx_t_4 = 0; /* "numpy.pxd":797 * child, new_offset = fields * * if (end - f) - (new_offset - offset[0]) < 15: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * */ __pyx_t_4 = PyInt_FromLong((__pyx_v_end - __pyx_v_f)); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyNumber_Subtract(__pyx_v_new_offset, __pyx_t_3); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyNumber_Subtract(__pyx_t_4, __pyx_t_5); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); 
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = PyObject_RichCompare(__pyx_t_3, __pyx_int_15, Py_LT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_t_5 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_9), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":800 * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * * if ((child.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_6 = (__pyx_v_child->byteorder == '>'); if (__pyx_t_6) { __pyx_t_7 = __pyx_v_little_endian; } else { __pyx_t_7 = __pyx_t_6; } if (!__pyx_t_7) { /* "numpy.pxd":801 * * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * # One could encode it in the format string and have Cython */ __pyx_t_6 = (__pyx_v_child->byteorder == '<'); if (__pyx_t_6) { 
__pyx_t_8 = (!__pyx_v_little_endian); __pyx_t_9 = __pyx_t_8; } else { __pyx_t_9 = __pyx_t_6; } __pyx_t_6 = __pyx_t_9; } else { __pyx_t_6 = __pyx_t_7; } if (__pyx_t_6) { /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # complain instead, BUT: < and > in format strings also imply */ __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_10), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":812 * * # Output padding bytes * while offset[0] < new_offset: # <<<<<<<<<<<<<< * f[0] = 120 # "x"; pad byte * f += 1 */ while (1) { __pyx_t_5 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_v_new_offset, Py_LT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!__pyx_t_6) break; /* "numpy.pxd":813 * # Output padding bytes * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte # <<<<<<<<<<<<<< * f += 1 * offset[0] += 1 */ (__pyx_v_f[0]) = 120; /* "numpy.pxd":814 * while offset[0] < 
new_offset: * f[0] = 120 # "x"; pad byte * f += 1 # <<<<<<<<<<<<<< * offset[0] += 1 * */ __pyx_v_f = (__pyx_v_f + 1); /* "numpy.pxd":815 * f[0] = 120 # "x"; pad byte * f += 1 * offset[0] += 1 # <<<<<<<<<<<<<< * * offset[0] += child.itemsize */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + 1); } /* "numpy.pxd":817 * offset[0] += 1 * * offset[0] += child.itemsize # <<<<<<<<<<<<<< * * if not PyDataType_HASFIELDS(child): */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + __pyx_v_child->elsize); /* "numpy.pxd":819 * offset[0] += child.itemsize * * if not PyDataType_HASFIELDS(child): # <<<<<<<<<<<<<< * t = child.type_num * if end - f < 5: */ __pyx_t_6 = (!PyDataType_HASFIELDS(__pyx_v_child)); if (__pyx_t_6) { /* "numpy.pxd":820 * * if not PyDataType_HASFIELDS(child): * t = child.type_num # <<<<<<<<<<<<<< * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") */ __pyx_t_3 = PyInt_FromLong(__pyx_v_child->type_num); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 820; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_XDECREF(__pyx_v_t); __pyx_v_t = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":821 * if not PyDataType_HASFIELDS(child): * t = child.type_num * if end - f < 5: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short.") * */ __pyx_t_6 = ((__pyx_v_end - __pyx_v_f) < 5); if (__pyx_t_6) { /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_t_3 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_12), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_Raise(__pyx_t_3, 0, 0, 0); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; {__pyx_filename = 
__pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L10; } __pyx_L10:; /* "numpy.pxd":825 * * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" */ __pyx_t_3 = PyInt_FromLong(NPY_BYTE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 98; goto __pyx_L11; } /* "numpy.pxd":826 * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" */ __pyx_t_5 = PyInt_FromLong(NPY_UBYTE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 66; goto __pyx_L11; } /* 
"numpy.pxd":827 * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" */ __pyx_t_3 = PyInt_FromLong(NPY_SHORT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 104; goto __pyx_L11; } /* "numpy.pxd":828 * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" */ __pyx_t_5 = PyInt_FromLong(NPY_USHORT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 72; goto __pyx_L11; } /* "numpy.pxd":829 * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 
105 #"i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" */ __pyx_t_3 = PyInt_FromLong(NPY_INT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 105; goto __pyx_L11; } /* "numpy.pxd":830 * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" # <<<<<<<<<<<<<< * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" */ __pyx_t_5 = PyInt_FromLong(NPY_UINT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 73; goto __pyx_L11; } /* "numpy.pxd":831 * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" */ __pyx_t_3 = 
PyInt_FromLong(NPY_LONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 108; goto __pyx_L11; } /* "numpy.pxd":832 * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 76; goto __pyx_L11; } /* "numpy.pxd":833 * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" */ __pyx_t_3 = PyInt_FromLong(NPY_LONGLONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 113; goto __pyx_L11; } /* "numpy.pxd":834 * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONGLONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 81; goto __pyx_L11; } /* "numpy.pxd":835 * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" */ __pyx_t_3 = PyInt_FromLong(NPY_FLOAT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = 
PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 102; goto __pyx_L11; } /* "numpy.pxd":836 * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf */ __pyx_t_5 = PyInt_FromLong(NPY_DOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 100; goto __pyx_L11; } /* "numpy.pxd":837 * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd */ __pyx_t_3 = PyInt_FromLong(NPY_LONGDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = 
PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 103; goto __pyx_L11; } /* "numpy.pxd":838 * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg */ __pyx_t_5 = PyInt_FromLong(NPY_CFLOAT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 102; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":839 * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" */ __pyx_t_3 = PyInt_FromLong(NPY_CDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 100; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":840 * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: */ __pyx_t_5 = PyInt_FromLong(NPY_CLONGDOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 103; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":841 * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_3 = PyInt_FromLong(NPY_OBJECT); if 
(unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 79; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":843 * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * f += 1 * else: */ __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_v_t); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L11:; /* "numpy.pxd":844 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * f += 1 # <<<<<<<<<<<<<< * else: * # Cython ignores struct boundary 
information ("T{...}"), */ __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L9; } /*else*/ { /* "numpy.pxd":848 * # Cython ignores struct boundary information ("T{...}"), * # so don't output it * f = _util_dtypestring(child, f, end, offset) # <<<<<<<<<<<<<< * return f * */ __pyx_t_11 = __pyx_f_5numpy__util_dtypestring(__pyx_v_child, __pyx_v_f, __pyx_v_end, __pyx_v_offset); if (unlikely(__pyx_t_11 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 848; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_11; } __pyx_L9:; } __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "numpy.pxd":849 * # so don't output it * f = _util_dtypestring(child, f, end, offset) * return f # <<<<<<<<<<<<<< * * */ __pyx_r = __pyx_v_f; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_AddTraceback("numpy._util_dtypestring", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF((PyObject *)__pyx_v_child); __Pyx_XDECREF(__pyx_v_fields); __Pyx_XDECREF(__pyx_v_childname); __Pyx_XDECREF(__pyx_v_new_offset); __Pyx_XDECREF(__pyx_v_t); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":964 * * * cdef inline void set_array_base(ndarray arr, object base): # <<<<<<<<<<<<<< * cdef PyObject* baseptr * if base is None: */ static CYTHON_INLINE void __pyx_f_5numpy_set_array_base(PyArrayObject *__pyx_v_arr, PyObject *__pyx_v_base) { PyObject *__pyx_v_baseptr; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("set_array_base", 0); /* "numpy.pxd":966 * cdef inline void set_array_base(ndarray arr, object base): * cdef PyObject* baseptr * if base is None: # <<<<<<<<<<<<<< * baseptr = NULL * else: */ __pyx_t_1 = (__pyx_v_base == Py_None); if (__pyx_t_1) { /* "numpy.pxd":967 * cdef PyObject* baseptr * if base is None: * baseptr = NULL # <<<<<<<<<<<<<< * else: * Py_INCREF(base) # important to do this before decref below! 
*/ __pyx_v_baseptr = NULL; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":969 * baseptr = NULL * else: * Py_INCREF(base) # important to do this before decref below! # <<<<<<<<<<<<<< * baseptr = base * Py_XDECREF(arr.base) */ Py_INCREF(__pyx_v_base); /* "numpy.pxd":970 * else: * Py_INCREF(base) # important to do this before decref below! * baseptr = base # <<<<<<<<<<<<<< * Py_XDECREF(arr.base) * arr.base = baseptr */ __pyx_v_baseptr = ((PyObject *)__pyx_v_base); } __pyx_L3:; /* "numpy.pxd":971 * Py_INCREF(base) # important to do this before decref below! * baseptr = base * Py_XDECREF(arr.base) # <<<<<<<<<<<<<< * arr.base = baseptr * */ Py_XDECREF(__pyx_v_arr->base); /* "numpy.pxd":972 * baseptr = base * Py_XDECREF(arr.base) * arr.base = baseptr # <<<<<<<<<<<<<< * * cdef inline object get_array_base(ndarray arr): */ __pyx_v_arr->base = __pyx_v_baseptr; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObject *__pyx_v_arr) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("get_array_base", 0); /* "numpy.pxd":975 * * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: # <<<<<<<<<<<<<< * return None * else: */ __pyx_t_1 = (__pyx_v_arr->base == NULL); if (__pyx_t_1) { /* "numpy.pxd":976 * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: * return None # <<<<<<<<<<<<<< * else: * return arr.base */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(Py_None); __pyx_r = Py_None; goto __pyx_L0; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":978 * return None * else: * return arr.base # <<<<<<<<<<<<<< */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_arr->base)); __pyx_r = ((PyObject *)__pyx_v_arr->base); goto __pyx_L0; } __pyx_L3:; __pyx_r = Py_None; __Pyx_INCREF(Py_None); __pyx_L0:; 
__Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyMethodDef __pyx_methods[] = { {0, 0, 0, 0} }; #if PY_MAJOR_VERSION >= 3 static struct PyModuleDef __pyx_moduledef = { PyModuleDef_HEAD_INIT, __Pyx_NAMESTR("_contact"), 0, /* m_doc */ -1, /* m_size */ __pyx_methods /* m_methods */, NULL, /* m_reload */ NULL, /* m_traverse */ NULL, /* m_clear */ NULL /* m_free */ }; #endif static __Pyx_StringTabEntry __pyx_string_tab[] = { {&__pyx_kp_u_1, __pyx_k_1, sizeof(__pyx_k_1), 0, 1, 0, 0}, {&__pyx_kp_u_11, __pyx_k_11, sizeof(__pyx_k_11), 0, 1, 0, 0}, {&__pyx_kp_s_13, __pyx_k_13, sizeof(__pyx_k_13), 0, 0, 1, 0}, {&__pyx_kp_s_16, __pyx_k_16, sizeof(__pyx_k_16), 0, 0, 1, 0}, {&__pyx_n_s_17, __pyx_k_17, sizeof(__pyx_k_17), 0, 0, 1, 1}, {&__pyx_kp_u_3, __pyx_k_3, sizeof(__pyx_k_3), 0, 1, 0, 0}, {&__pyx_kp_u_5, __pyx_k_5, sizeof(__pyx_k_5), 0, 1, 0, 0}, {&__pyx_kp_u_7, __pyx_k_7, sizeof(__pyx_k_7), 0, 1, 0, 0}, {&__pyx_kp_u_8, __pyx_k_8, sizeof(__pyx_k_8), 0, 1, 0, 0}, {&__pyx_n_s__MAXCNT, __pyx_k__MAXCNT, sizeof(__pyx_k__MAXCNT), 0, 0, 1, 1}, {&__pyx_n_s__MAXSYM, __pyx_k__MAXSYM, sizeof(__pyx_k__MAXSYM), 0, 0, 1, 1}, {&__pyx_n_s__RuntimeError, __pyx_k__RuntimeError, sizeof(__pyx_k__RuntimeError), 0, 0, 1, 1}, {&__pyx_n_s__ValueError, __pyx_k__ValueError, sizeof(__pyx_k__ValueError), 0, 0, 1, 1}, {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1}, {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1}, {&__pyx_n_s____version__, __pyx_k____version__, sizeof(__pyx_k____version__), 0, 0, 1, 1}, {&__pyx_n_s__asu_atoms, __pyx_k__asu_atoms, sizeof(__pyx_k__asu_atoms), 0, 0, 1, 1}, {&__pyx_n_s__box, __pyx_k__box, sizeof(__pyx_k__box), 0, 0, 1, 1}, {&__pyx_n_s__box_c, __pyx_k__box_c, sizeof(__pyx_k__box_c), 0, 0, 1, 1}, {&__pyx_n_s__bucket_size, __pyx_k__bucket_size, sizeof(__pyx_k__bucket_size), 0, 0, 1, 1}, {&__pyx_n_s__c_asu, __pyx_k__c_asu, sizeof(__pyx_k__c_asu), 0, 0, 1, 1}, {&__pyx_n_s__c_dst, 
__pyx_k__c_dst, sizeof(__pyx_k__c_dst), 0, 0, 1, 1}, {&__pyx_n_s__c_src, __pyx_k__c_src, sizeof(__pyx_k__c_src), 0, 0, 1, 1}, {&__pyx_n_s__c_sym, __pyx_k__c_sym, sizeof(__pyx_k__c_sym), 0, 0, 1, 1}, {&__pyx_n_s__c_tra, __pyx_k__c_tra, sizeof(__pyx_k__c_tra), 0, 0, 1, 1}, {&__pyx_n_s__cnt_loop, __pyx_k__cnt_loop, sizeof(__pyx_k__cnt_loop), 0, 0, 1, 1}, {&__pyx_n_s__dstptr, __pyx_k__dstptr, sizeof(__pyx_k__dstptr), 0, 0, 1, 1}, {&__pyx_n_s__dtype, __pyx_k__dtype, sizeof(__pyx_k__dtype), 0, 0, 1, 1}, {&__pyx_n_s__float64, __pyx_k__float64, sizeof(__pyx_k__float64), 0, 0, 1, 1}, {&__pyx_n_s__idx, __pyx_k__idx, sizeof(__pyx_k__idx), 0, 0, 1, 1}, {&__pyx_n_s__idxc, __pyx_k__idxc, sizeof(__pyx_k__idxc), 0, 0, 1, 1}, {&__pyx_n_s__idxn, __pyx_k__idxn, sizeof(__pyx_k__idxn), 0, 0, 1, 1}, {&__pyx_n_s__idxptr, __pyx_k__idxptr, sizeof(__pyx_k__idxptr), 0, 0, 1, 1}, {&__pyx_n_s__kdpnts, __pyx_k__kdpnts, sizeof(__pyx_k__kdpnts), 0, 0, 1, 1}, {&__pyx_n_s__lc, __pyx_k__lc, sizeof(__pyx_k__lc), 0, 0, 1, 1}, {&__pyx_n_s__lcoords, __pyx_k__lcoords, sizeof(__pyx_k__lcoords), 0, 0, 1, 1}, {&__pyx_n_s__lcoords_c, __pyx_k__lcoords_c, sizeof(__pyx_k__lcoords_c), 0, 0, 1, 1}, {&__pyx_n_s__lidx, __pyx_k__lidx, sizeof(__pyx_k__lidx), 0, 0, 1, 1}, {&__pyx_n_s__lidx_c, __pyx_k__lidx_c, sizeof(__pyx_k__lidx_c), 0, 0, 1, 1}, {&__pyx_n_s__mode, __pyx_k__mode, sizeof(__pyx_k__mode), 0, 0, 1, 1}, {&__pyx_n_s__neighbor_number, __pyx_k__neighbor_number, sizeof(__pyx_k__neighbor_number), 0, 0, 1, 1}, {&__pyx_n_s__np, __pyx_k__np, sizeof(__pyx_k__np), 0, 0, 1, 1}, {&__pyx_n_s__numpy, __pyx_k__numpy, sizeof(__pyx_k__numpy), 0, 0, 1, 1}, {&__pyx_n_s__qc, __pyx_k__qc, sizeof(__pyx_k__qc), 0, 0, 1, 1}, {&__pyx_n_s__qcoords, __pyx_k__qcoords, sizeof(__pyx_k__qcoords), 0, 0, 1, 1}, {&__pyx_n_s__qcoords_c, __pyx_k__qcoords_c, sizeof(__pyx_k__qcoords_c), 0, 0, 1, 1}, {&__pyx_n_s__range, __pyx_k__range, sizeof(__pyx_k__range), 0, 0, 1, 1}, {&__pyx_n_s__search_limit, __pyx_k__search_limit, 
sizeof(__pyx_k__search_limit), 0, 0, 1, 1}, {&__pyx_n_s__search_point, __pyx_k__search_point, sizeof(__pyx_k__search_point), 0, 0, 1, 1}, {&__pyx_n_s__shape1, __pyx_k__shape1, sizeof(__pyx_k__shape1), 0, 0, 1, 1}, {&__pyx_n_s__shape2, __pyx_k__shape2, sizeof(__pyx_k__shape2), 0, 0, 1, 1}, {&__pyx_n_s__t_arr, __pyx_k__t_arr, sizeof(__pyx_k__t_arr), 0, 0, 1, 1}, {&__pyx_n_s__t_asu, __pyx_k__t_asu, sizeof(__pyx_k__t_asu), 0, 0, 1, 1}, {&__pyx_n_s__t_dst, __pyx_k__t_dst, sizeof(__pyx_k__t_dst), 0, 0, 1, 1}, {&__pyx_n_s__t_idx, __pyx_k__t_idx, sizeof(__pyx_k__t_idx), 0, 0, 1, 1}, {&__pyx_n_s__t_lid, __pyx_k__t_lid, sizeof(__pyx_k__t_lid), 0, 0, 1, 1}, {&__pyx_n_s__t_ptr, __pyx_k__t_ptr, sizeof(__pyx_k__t_ptr), 0, 0, 1, 1}, {&__pyx_n_s__t_sym, __pyx_k__t_sym, sizeof(__pyx_k__t_sym), 0, 0, 1, 1}, {&__pyx_n_s__t_tra, __pyx_k__t_tra, sizeof(__pyx_k__t_tra), 0, 0, 1, 1}, {&__pyx_n_s__tree, __pyx_k__tree, sizeof(__pyx_k__tree), 0, 0, 1, 1}, {&__pyx_n_s__uint64, __pyx_k__uint64, sizeof(__pyx_k__uint64), 0, 0, 1, 1}, {&__pyx_n_s__zero_tra, __pyx_k__zero_tra, sizeof(__pyx_k__zero_tra), 0, 0, 1, 1}, {0, 0, 0, 0, 0, 0, 0} }; static int __Pyx_InitCachedBuiltins(void) { __pyx_builtin_ValueError = __Pyx_GetName(__pyx_b, __pyx_n_s__ValueError); if (!__pyx_builtin_ValueError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_range = __Pyx_GetName(__pyx_b, __pyx_n_s__range); if (!__pyx_builtin_range) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 227; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_RuntimeError = __Pyx_GetName(__pyx_b, __pyx_n_s__RuntimeError); if (!__pyx_builtin_RuntimeError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} return 0; __pyx_L1_error:; return -1; } static int __Pyx_InitCachedConstants(void) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__Pyx_InitCachedConstants", 0); /* "numpy.pxd":214 * if ((flags & 
pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_k_tuple_2 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_2); __Pyx_INCREF(((PyObject *)__pyx_kp_u_1)); PyTuple_SET_ITEM(__pyx_k_tuple_2, 0, ((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_2)); /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_k_tuple_4 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_4); __Pyx_INCREF(((PyObject *)__pyx_kp_u_3)); PyTuple_SET_ITEM(__pyx_k_tuple_4, 0, ((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_4)); /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_k_tuple_6 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_6)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_6); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_6, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_6)); /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) 
< 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_k_tuple_9 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_9)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_9); __Pyx_INCREF(((PyObject *)__pyx_kp_u_8)); PyTuple_SET_ITEM(__pyx_k_tuple_9, 0, ((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_9)); /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # complain instead, BUT: < and > in format strings also imply */ __pyx_k_tuple_10 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_10)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_10); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_10, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_10)); /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_k_tuple_12 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_12)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_12); __Pyx_INCREF(((PyObject *)__pyx_kp_u_11)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 0, ((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_12)); /* "cogent/struct/_contact.pyx":17 * # NPY_OWNDATA * * def cnt_loop( np.ndarray[DTYPE_t, ndim =2] qcoords,\ # 
<<<<<<<<<<<<<< * np.ndarray[DTYPE_t, ndim =2] lcoords,\ * np.ndarray[LTYPE_t, ndim =1] qc,\ */ __pyx_k_tuple_14 = PyTuple_New(41); if (unlikely(!__pyx_k_tuple_14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_14); __Pyx_INCREF(((PyObject *)__pyx_n_s__qcoords)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 0, ((PyObject *)__pyx_n_s__qcoords)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__qcoords)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lcoords)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 1, ((PyObject *)__pyx_n_s__lcoords)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lcoords)); __Pyx_INCREF(((PyObject *)__pyx_n_s__qc)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 2, ((PyObject *)__pyx_n_s__qc)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__qc)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lc)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 3, ((PyObject *)__pyx_n_s__lc)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lc)); __Pyx_INCREF(((PyObject *)__pyx_n_s__shape1)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 4, ((PyObject *)__pyx_n_s__shape1)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__shape1)); __Pyx_INCREF(((PyObject *)__pyx_n_s__shape2)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 5, ((PyObject *)__pyx_n_s__shape2)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__shape2)); __Pyx_INCREF(((PyObject *)__pyx_n_s__zero_tra)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 6, ((PyObject *)__pyx_n_s__zero_tra)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__zero_tra)); __Pyx_INCREF(((PyObject *)__pyx_n_s__mode)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 7, ((PyObject *)__pyx_n_s__mode)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__mode)); __Pyx_INCREF(((PyObject *)__pyx_n_s__search_limit)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 8, ((PyObject *)__pyx_n_s__search_limit)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__search_limit)); __Pyx_INCREF(((PyObject *)__pyx_n_s__box)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 9, ((PyObject *)__pyx_n_s__box)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__box)); __Pyx_INCREF(((PyObject 
*)__pyx_n_s__bucket_size)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 10, ((PyObject *)__pyx_n_s__bucket_size)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__bucket_size)); __Pyx_INCREF(((PyObject *)__pyx_n_s__MAXSYM)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 11, ((PyObject *)__pyx_n_s__MAXSYM)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__MAXSYM)); __Pyx_INCREF(((PyObject *)__pyx_n_s__MAXCNT)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 12, ((PyObject *)__pyx_n_s__MAXCNT)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__MAXCNT)); __Pyx_INCREF(((PyObject *)__pyx_n_s__asu_atoms)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 13, ((PyObject *)__pyx_n_s__asu_atoms)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__asu_atoms)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idx)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 14, ((PyObject *)__pyx_n_s__idx)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__idx)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lidx)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 15, ((PyObject *)__pyx_n_s__lidx)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lidx)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lidx_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 16, ((PyObject *)__pyx_n_s__lidx_c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lidx_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idxn)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 17, ((PyObject *)__pyx_n_s__idxn)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__idxn)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idxc)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 18, ((PyObject *)__pyx_n_s__idxc)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__idxc)); __Pyx_INCREF(((PyObject *)__pyx_n_s__qcoords_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 19, ((PyObject *)__pyx_n_s__qcoords_c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__qcoords_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__lcoords_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 20, ((PyObject *)__pyx_n_s__lcoords_c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__lcoords_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__box_c)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 21, ((PyObject *)__pyx_n_s__box_c)); __Pyx_GIVEREF(((PyObject 
*)__pyx_n_s__box_c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__idxptr)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 22, ((PyObject *)__pyx_n_s__idxptr)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__idxptr)); __Pyx_INCREF(((PyObject *)__pyx_n_s__dstptr)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 23, ((PyObject *)__pyx_n_s__dstptr)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__dstptr)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_ptr)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 24, ((PyObject *)__pyx_n_s__t_ptr)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_ptr)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_idx)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 25, ((PyObject *)__pyx_n_s__t_idx)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_idx)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_asu)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 26, ((PyObject *)__pyx_n_s__t_asu)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_asu)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_sym)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 27, ((PyObject *)__pyx_n_s__t_sym)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_sym)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_tra)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 28, ((PyObject *)__pyx_n_s__t_tra)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_tra)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_dst)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 29, ((PyObject *)__pyx_n_s__t_dst)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_dst)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_arr)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 30, ((PyObject *)__pyx_n_s__t_arr)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_arr)); __Pyx_INCREF(((PyObject *)__pyx_n_s__t_lid)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 31, ((PyObject *)__pyx_n_s__t_lid)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__t_lid)); __Pyx_INCREF(((PyObject *)__pyx_n_s__c_src)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 32, ((PyObject *)__pyx_n_s__c_src)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__c_src)); __Pyx_INCREF(((PyObject *)__pyx_n_s__c_asu)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 33, ((PyObject *)__pyx_n_s__c_asu)); __Pyx_GIVEREF(((PyObject 
*)__pyx_n_s__c_asu)); __Pyx_INCREF(((PyObject *)__pyx_n_s__c_sym)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 34, ((PyObject *)__pyx_n_s__c_sym)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__c_sym)); __Pyx_INCREF(((PyObject *)__pyx_n_s__c_tra)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 35, ((PyObject *)__pyx_n_s__c_tra)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__c_tra)); __Pyx_INCREF(((PyObject *)__pyx_n_s__c_dst)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 36, ((PyObject *)__pyx_n_s__c_dst)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__c_dst)); __Pyx_INCREF(((PyObject *)__pyx_n_s__search_point)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 37, ((PyObject *)__pyx_n_s__search_point)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__search_point)); __Pyx_INCREF(((PyObject *)__pyx_n_s__neighbor_number)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 38, ((PyObject *)__pyx_n_s__neighbor_number)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__neighbor_number)); __Pyx_INCREF(((PyObject *)__pyx_n_s__kdpnts)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 39, ((PyObject *)__pyx_n_s__kdpnts)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__kdpnts)); __Pyx_INCREF(((PyObject *)__pyx_n_s__tree)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 40, ((PyObject *)__pyx_n_s__tree)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__tree)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_14)); __pyx_k_codeobj_15 = (PyObject*)__Pyx_PyCode_New(13, 0, 41, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_16, __pyx_n_s__cnt_loop, 17, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_RefNannyFinishContext(); return 0; __pyx_L1_error:; __Pyx_RefNannyFinishContext(); return -1; } static int __Pyx_InitGlobals(void) { if (__Pyx_InitStrings(__pyx_string_tab) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_15 = PyInt_FromLong(15); if (unlikely(!__pyx_int_15)) 
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; return 0; __pyx_L1_error:; return -1; } #if PY_MAJOR_VERSION < 3 PyMODINIT_FUNC init_contact(void); /*proto*/ PyMODINIT_FUNC init_contact(void) #else PyMODINIT_FUNC PyInit__contact(void); /*proto*/ PyMODINIT_FUNC PyInit__contact(void) #endif { PyObject *__pyx_t_1 = NULL; PyObject *__pyx_t_2 = NULL; __Pyx_RefNannyDeclarations #if CYTHON_REFNANNY __Pyx_RefNanny = __Pyx_RefNannyImportAPI("refnanny"); if (!__Pyx_RefNanny) { PyErr_Clear(); __Pyx_RefNanny = __Pyx_RefNannyImportAPI("Cython.Runtime.refnanny"); if (!__Pyx_RefNanny) Py_FatalError("failed to import 'refnanny' module"); } #endif __Pyx_RefNannySetupContext("PyMODINIT_FUNC PyInit__contact(void)", 0); if ( __Pyx_check_binary_version() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_tuple = PyTuple_New(0); if (unlikely(!__pyx_empty_tuple)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_bytes = PyBytes_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_bytes)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #ifdef __Pyx_CyFunction_USED if (__Pyx_CyFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_FusedFunction_USED if (__pyx_FusedFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_Generator_USED if (__pyx_Generator_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif /*--- Library function declarations ---*/ /*--- Threads initialization code ---*/ #if defined(__PYX_FORCE_INIT_THREADS) && __PYX_FORCE_INIT_THREADS #ifdef WITH_THREAD /* Python build with threading support? 
*/ PyEval_InitThreads(); #endif #endif /*--- Module creation code ---*/ #if PY_MAJOR_VERSION < 3 __pyx_m = Py_InitModule4(__Pyx_NAMESTR("_contact"), __pyx_methods, 0, 0, PYTHON_API_VERSION); #else __pyx_m = PyModule_Create(&__pyx_moduledef); #endif if (!__pyx_m) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; #if PY_MAJOR_VERSION < 3 Py_INCREF(__pyx_m); #endif __pyx_b = PyImport_AddModule(__Pyx_NAMESTR(__Pyx_BUILTIN_MODULE_NAME)); if (!__pyx_b) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; if (__Pyx_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; /*--- Initialize various global constants etc. ---*/ if (unlikely(__Pyx_InitGlobals() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__pyx_module_is_main_cogent__struct___contact) { if (__Pyx_SetAttrString(__pyx_m, "__name__", __pyx_n_s____main__) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; } /*--- Builtin init code ---*/ if (unlikely(__Pyx_InitCachedBuiltins() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Constants init code ---*/ if (unlikely(__Pyx_InitCachedConstants() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Global init code ---*/ /*--- Variable export code ---*/ /*--- Function export code ---*/ /*--- Type init code ---*/ /*--- Type import code ---*/ __pyx_ptype_5numpy_dtype = __Pyx_ImportType("numpy", "dtype", sizeof(PyArray_Descr), 0); if (unlikely(!__pyx_ptype_5numpy_dtype)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 154; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_flatiter = __Pyx_ImportType("numpy", "flatiter", sizeof(PyArrayIterObject), 0); if 
(unlikely(!__pyx_ptype_5numpy_flatiter)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 164; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_broadcast = __Pyx_ImportType("numpy", "broadcast", sizeof(PyArrayMultiIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_broadcast)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 168; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ndarray = __Pyx_ImportType("numpy", "ndarray", sizeof(PyArrayObject), 0); if (unlikely(!__pyx_ptype_5numpy_ndarray)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 177; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ufunc = __Pyx_ImportType("numpy", "ufunc", sizeof(PyUFuncObject), 0); if (unlikely(!__pyx_ptype_5numpy_ufunc)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 860; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Variable import code ---*/ /*--- Function import code ---*/ __pyx_t_1 = __Pyx_ImportModule("cogent.maths.spatial.ckd3"); if (!__pyx_t_1) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ImportFunction(__pyx_t_1, "points", (void (**)(void))&__pyx_f_6cogent_5maths_7spatial_4ckd3_points, "struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ImportFunction(__pyx_t_1, "build_tree", (void (**)(void))&__pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree, "struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = 
__pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ImportFunction(__pyx_t_1, "rn", (void (**)(void))&__pyx_f_6cogent_5maths_7spatial_4ckd3_rn, "__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t (struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} Py_DECREF(__pyx_t_1); __pyx_t_1 = 0; /*--- Execution code ---*/ /* "cogent/struct/_contact.pyx":2 * cimport cython * import numpy as np # <<<<<<<<<<<<<< * cimport numpy as np * from numpy cimport npy_intp */ __pyx_t_2 = __Pyx_Import(((PyObject *)__pyx_n_s__numpy), 0, -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__np, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/struct/_contact.pyx":8 * from stdlib cimport malloc, free * * __version__ = "('1', '5', '3')" # <<<<<<<<<<<<<< * * cdef extern from "numpy/arrayobject.h": */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s____version__, ((PyObject *)__pyx_kp_s_13)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/struct/_contact.pyx":17 * # NPY_OWNDATA * * def cnt_loop( np.ndarray[DTYPE_t, ndim =2] qcoords,\ # <<<<<<<<<<<<<< * np.ndarray[DTYPE_t, ndim =2] lcoords,\ * np.ndarray[LTYPE_t, ndim =1] qc,\ */ __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_6cogent_6struct_8_contact_1cnt_loop, NULL, 
__pyx_n_s_17); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__cnt_loop, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/struct/_contact.pyx":1 * cimport cython # <<<<<<<<<<<<<< * import numpy as np * cimport numpy as np */ __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); if (PyObject_SetAttr(__pyx_m, __pyx_n_s____test__, ((PyObject *)__pyx_t_2)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); if (__pyx_m) { __Pyx_AddTraceback("init cogent.struct._contact", __pyx_clineno, __pyx_lineno, __pyx_filename); Py_DECREF(__pyx_m); __pyx_m = 0; } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_ImportError, "init cogent.struct._contact"); } __pyx_L0:; __Pyx_RefNannyFinishContext(); #if PY_MAJOR_VERSION < 3 return; #else return __pyx_m; #endif } /* Runtime support code */ #if CYTHON_REFNANNY static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname) { PyObject *m = NULL, *p = NULL; void *r = NULL; m = PyImport_ImportModule((char *)modname); if (!m) goto end; p = PyObject_GetAttrString(m, (char *)"RefNannyAPI"); if (!p) goto end; r = PyLong_AsVoidPtr(p); end: Py_XDECREF(p); Py_XDECREF(m); return (__Pyx_RefNannyAPIStruct *)r; } #endif /* CYTHON_REFNANNY */ static void __Pyx_RaiseArgtupleInvalid( const char* func_name, int exact, 
Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found) { Py_ssize_t num_expected; const char *more_or_less; if (num_found < num_min) { num_expected = num_min; more_or_less = "at least"; } else { num_expected = num_max; more_or_less = "at most"; } if (exact) { more_or_less = "exactly"; } PyErr_Format(PyExc_TypeError, "%s() takes %s %"PY_FORMAT_SIZE_T"d positional argument%s (%"PY_FORMAT_SIZE_T"d given)", func_name, more_or_less, num_expected, (num_expected == 1) ? "" : "s", num_found); } static void __Pyx_RaiseDoubleKeywordsError( const char* func_name, PyObject* kw_name) { PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION >= 3 "%s() got multiple values for keyword argument '%U'", func_name, kw_name); #else "%s() got multiple values for keyword argument '%s'", func_name, PyString_AS_STRING(kw_name)); #endif } static int __Pyx_ParseOptionalKeywords( PyObject *kwds, PyObject **argnames[], PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, const char* function_name) { PyObject *key = 0, *value = 0; Py_ssize_t pos = 0; PyObject*** name; PyObject*** first_kw_arg = argnames + num_pos_args; while (PyDict_Next(kwds, &pos, &key, &value)) { name = first_kw_arg; while (*name && (**name != key)) name++; if (*name) { values[name-argnames] = value; } else { #if PY_MAJOR_VERSION < 3 if (unlikely(!PyString_CheckExact(key)) && unlikely(!PyString_Check(key))) { #else if (unlikely(!PyUnicode_Check(key))) { #endif goto invalid_keyword_type; } else { for (name = first_kw_arg; *name; name++) { #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) break; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) break; #endif } if (*name) { values[name-argnames] = value; } else { for (name=argnames; name != first_kw_arg; name++) { if (**name == key) goto arg_passed_twice; #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && 
PyUnicode_Compare(**name, key) == 0) goto arg_passed_twice; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) goto arg_passed_twice; #endif } if (kwds2) { if (unlikely(PyDict_SetItem(kwds2, key, value))) goto bad; } else { goto invalid_keyword; } } } } } return 0; arg_passed_twice: __Pyx_RaiseDoubleKeywordsError(function_name, **name); goto bad; invalid_keyword_type: PyErr_Format(PyExc_TypeError, "%s() keywords must be strings", function_name); goto bad; invalid_keyword: PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION < 3 "%s() got an unexpected keyword argument '%s'", function_name, PyString_AsString(key)); #else "%s() got an unexpected keyword argument '%U'", function_name, key); #endif bad: return -1; } static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact) { if (!type) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (none_allowed && obj == Py_None) return 1; else if (exact) { if (Py_TYPE(obj) == type) return 1; } else { if (PyObject_TypeCheck(obj, type)) return 1; } PyErr_Format(PyExc_TypeError, "Argument '%s' has incorrect type (expected %s, got %s)", name, type->tp_name, Py_TYPE(obj)->tp_name); return 0; } static CYTHON_INLINE int __Pyx_IsLittleEndian(void) { unsigned int n = 1; return *(unsigned char*)(&n) != 0; } static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx, __Pyx_BufFmt_StackElem* stack, __Pyx_TypeInfo* type) { stack[0].field = &ctx->root; stack[0].parent_offset = 0; ctx->root.type = type; ctx->root.name = "buffer dtype"; ctx->root.offset = 0; ctx->head = stack; ctx->head->field = &ctx->root; ctx->fmt_offset = 0; ctx->head->parent_offset = 0; ctx->new_packmode = '@'; ctx->enc_packmode = '@'; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->is_complex = 0; ctx->is_valid_array = 0; ctx->struct_alignment = 0; while (type->typegroup == 'S') { ++ctx->head; ctx->head->field = type->fields; 
ctx->head->parent_offset = 0; type = type->fields->type; } } static int __Pyx_BufFmt_ParseNumber(const char** ts) { int count; const char* t = *ts; if (*t < '0' || *t > '9') { return -1; } else { count = *t++ - '0'; while (*t >= '0' && *t < '9') { count *= 10; count += *t++ - '0'; } } *ts = t; return count; } static int __Pyx_BufFmt_ExpectNumber(const char **ts) { int number = __Pyx_BufFmt_ParseNumber(ts); if (number == -1) /* First char was not a digit */ PyErr_Format(PyExc_ValueError,\ "Does not understand character buffer dtype format string ('%c')", **ts); return number; } static void __Pyx_BufFmt_RaiseUnexpectedChar(char ch) { PyErr_Format(PyExc_ValueError, "Unexpected format string character: '%c'", ch); } static const char* __Pyx_BufFmt_DescribeTypeChar(char ch, int is_complex) { switch (ch) { case 'b': return "'char'"; case 'B': return "'unsigned char'"; case 'h': return "'short'"; case 'H': return "'unsigned short'"; case 'i': return "'int'"; case 'I': return "'unsigned int'"; case 'l': return "'long'"; case 'L': return "'unsigned long'"; case 'q': return "'long long'"; case 'Q': return "'unsigned long long'"; case 'f': return (is_complex ? "'complex float'" : "'float'"); case 'd': return (is_complex ? "'complex double'" : "'double'"); case 'g': return (is_complex ? "'complex long double'" : "'long double'"); case 'T': return "a struct"; case 'O': return "Python object"; case 'P': return "a pointer"; case 's': case 'p': return "a string"; case 0: return "end"; default: return "unparseable format string"; } } static size_t __Pyx_BufFmt_TypeCharToStandardSize(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return 2; case 'i': case 'I': case 'l': case 'L': return 4; case 'q': case 'Q': return 8; case 'f': return (is_complex ? 8 : 4); case 'd': return (is_complex ? 
16 : 8); case 'g': { PyErr_SetString(PyExc_ValueError, "Python does not define a standard format string size for long double ('g').."); return 0; } case 'O': case 'P': return sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static size_t __Pyx_BufFmt_TypeCharToNativeSize(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(short); case 'i': case 'I': return sizeof(int); case 'l': case 'L': return sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(float) * (is_complex ? 2 : 1); case 'd': return sizeof(double) * (is_complex ? 2 : 1); case 'g': return sizeof(long double) * (is_complex ? 2 : 1); case 'O': case 'P': return sizeof(void*); default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } typedef struct { char c; short x; } __Pyx_st_short; typedef struct { char c; int x; } __Pyx_st_int; typedef struct { char c; long x; } __Pyx_st_long; typedef struct { char c; float x; } __Pyx_st_float; typedef struct { char c; double x; } __Pyx_st_double; typedef struct { char c; long double x; } __Pyx_st_longdouble; typedef struct { char c; void *x; } __Pyx_st_void_p; #ifdef HAVE_LONG_LONG typedef struct { char c; PY_LONG_LONG x; } __Pyx_st_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToAlignment(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_st_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_st_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_st_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_st_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_st_float) - sizeof(float); case 'd': return sizeof(__Pyx_st_double) - sizeof(double); case 'g': return sizeof(__Pyx_st_longdouble) - sizeof(long double); case 'P': 
case 'O': return sizeof(__Pyx_st_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } /* These are for computing the padding at the end of the struct to align on the first member of the struct. This will probably the same as above, but we don't have any guarantees. */ typedef struct { short x; char c; } __Pyx_pad_short; typedef struct { int x; char c; } __Pyx_pad_int; typedef struct { long x; char c; } __Pyx_pad_long; typedef struct { float x; char c; } __Pyx_pad_float; typedef struct { double x; char c; } __Pyx_pad_double; typedef struct { long double x; char c; } __Pyx_pad_longdouble; typedef struct { void *x; char c; } __Pyx_pad_void_p; #ifdef HAVE_LONG_LONG typedef struct { PY_LONG_LONG x; char c; } __Pyx_pad_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToPadding(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_pad_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_pad_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_pad_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_pad_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_pad_float) - sizeof(float); case 'd': return sizeof(__Pyx_pad_double) - sizeof(double); case 'g': return sizeof(__Pyx_pad_longdouble) - sizeof(long double); case 'P': case 'O': return sizeof(__Pyx_pad_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static char __Pyx_BufFmt_TypeCharToGroup(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'h': case 'i': case 'l': case 'q': case 's': case 'p': return 'I'; case 'B': case 'H': case 'I': case 'L': case 'Q': return 'U'; case 'f': case 'd': case 'g': return (is_complex ? 
'C' : 'R'); case 'O': return 'O'; case 'P': return 'P'; default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } static void __Pyx_BufFmt_RaiseExpected(__Pyx_BufFmt_Context* ctx) { if (ctx->head == NULL || ctx->head->field == &ctx->root) { const char* expected; const char* quote; if (ctx->head == NULL) { expected = "end"; quote = ""; } else { expected = ctx->head->field->type->name; quote = "'"; } PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected %s%s%s but got %s", quote, expected, quote, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex)); } else { __Pyx_StructField* field = ctx->head->field; __Pyx_StructField* parent = (ctx->head - 1)->field; PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected '%s' but got %s in '%s.%s'", field->type->name, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex), parent->type->name, field->name); } } static int __Pyx_BufFmt_ProcessTypeChunk(__Pyx_BufFmt_Context* ctx) { char group; size_t size, offset, arraysize = 1; if (ctx->enc_type == 0) return 0; if (ctx->head->field->type->arraysize[0]) { int i, ndim = 0; if (ctx->enc_type == 's' || ctx->enc_type == 'p') { ctx->is_valid_array = ctx->head->field->type->ndim == 1; ndim = 1; if (ctx->enc_count != ctx->head->field->type->arraysize[0]) { PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %zu", ctx->head->field->type->arraysize[0], ctx->enc_count); return -1; } } if (!ctx->is_valid_array) { PyErr_Format(PyExc_ValueError, "Expected %d dimensions, got %d", ctx->head->field->type->ndim, ndim); return -1; } for (i = 0; i < ctx->head->field->type->ndim; i++) { arraysize *= ctx->head->field->type->arraysize[i]; } ctx->is_valid_array = 0; ctx->enc_count = 1; } group = __Pyx_BufFmt_TypeCharToGroup(ctx->enc_type, ctx->is_complex); do { __Pyx_StructField* field = ctx->head->field; __Pyx_TypeInfo* type = field->type; if (ctx->enc_packmode == '@' || ctx->enc_packmode == '^') { size = 
__Pyx_BufFmt_TypeCharToNativeSize(ctx->enc_type, ctx->is_complex); } else { size = __Pyx_BufFmt_TypeCharToStandardSize(ctx->enc_type, ctx->is_complex); } if (ctx->enc_packmode == '@') { size_t align_at = __Pyx_BufFmt_TypeCharToAlignment(ctx->enc_type, ctx->is_complex); size_t align_mod_offset; if (align_at == 0) return -1; align_mod_offset = ctx->fmt_offset % align_at; if (align_mod_offset > 0) ctx->fmt_offset += align_at - align_mod_offset; if (ctx->struct_alignment == 0) ctx->struct_alignment = __Pyx_BufFmt_TypeCharToPadding(ctx->enc_type, ctx->is_complex); } if (type->size != size || type->typegroup != group) { if (type->typegroup == 'C' && type->fields != NULL) { size_t parent_offset = ctx->head->parent_offset + field->offset; ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = parent_offset; continue; } __Pyx_BufFmt_RaiseExpected(ctx); return -1; } offset = ctx->head->parent_offset + field->offset; if (ctx->fmt_offset != offset) { PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch; next field is at offset %"PY_FORMAT_SIZE_T"d but %"PY_FORMAT_SIZE_T"d expected", (Py_ssize_t)ctx->fmt_offset, (Py_ssize_t)offset); return -1; } ctx->fmt_offset += size; if (arraysize) ctx->fmt_offset += (arraysize - 1) * size; --ctx->enc_count; /* Consume from buffer string */ while (1) { if (field == &ctx->root) { ctx->head = NULL; if (ctx->enc_count != 0) { __Pyx_BufFmt_RaiseExpected(ctx); return -1; } break; /* breaks both loops as ctx->enc_count == 0 */ } ctx->head->field = ++field; if (field->type == NULL) { --ctx->head; field = ctx->head->field; continue; } else if (field->type->typegroup == 'S') { size_t parent_offset = ctx->head->parent_offset + field->offset; if (field->type->fields->type == NULL) continue; /* empty struct */ field = field->type->fields; ++ctx->head; ctx->head->field = field; ctx->head->parent_offset = parent_offset; break; } else { break; } } } while (ctx->enc_count); ctx->enc_type = 0; ctx->is_complex = 0; return 0; } static 
CYTHON_INLINE PyObject * __pyx_buffmt_parse_array(__Pyx_BufFmt_Context* ctx, const char** tsp) { const char *ts = *tsp; int i = 0, number; int ndim = ctx->head->field->type->ndim; ; ++ts; if (ctx->new_count != 1) { PyErr_SetString(PyExc_ValueError, "Cannot handle repeated arrays in format string"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; while (*ts && *ts != ')') { if (isspace(*ts)) continue; number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; if (i < ndim && (size_t) number != ctx->head->field->type->arraysize[i]) return PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %d", ctx->head->field->type->arraysize[i], number); if (*ts != ',' && *ts != ')') return PyErr_Format(PyExc_ValueError, "Expected a comma in format string, got '%c'", *ts); if (*ts == ',') ts++; i++; } if (i != ndim) return PyErr_Format(PyExc_ValueError, "Expected %d dimension(s), got %d", ctx->head->field->type->ndim, i); if (!*ts) { PyErr_SetString(PyExc_ValueError, "Unexpected end of format string, expected ')'"); return NULL; } ctx->is_valid_array = 1; ctx->new_count = 1; *tsp = ++ts; return Py_None; } static const char* __Pyx_BufFmt_CheckString(__Pyx_BufFmt_Context* ctx, const char* ts) { int got_Z = 0; while (1) { switch(*ts) { case 0: if (ctx->enc_type != 0 && ctx->head == NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; if (ctx->head != NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } return ts; case ' ': case 10: case 13: ++ts; break; case '<': if (!__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Little-endian buffer not supported on big-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '>': case '!': if (__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Big-endian buffer not supported on little-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '=': case '@': case '^': 
ctx->new_packmode = *ts++; break; case 'T': /* substruct */ { const char* ts_after_sub; size_t i, struct_count = ctx->new_count; size_t struct_alignment = ctx->struct_alignment; ctx->new_count = 1; ++ts; if (*ts != '{') { PyErr_SetString(PyExc_ValueError, "Buffer acquisition: Expected '{' after 'T'"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ ctx->enc_count = 0; ctx->struct_alignment = 0; ++ts; ts_after_sub = ts; for (i = 0; i != struct_count; ++i) { ts_after_sub = __Pyx_BufFmt_CheckString(ctx, ts); if (!ts_after_sub) return NULL; } ts = ts_after_sub; if (struct_alignment) ctx->struct_alignment = struct_alignment; } break; case '}': /* end of substruct; either repeat or move on */ { size_t alignment = ctx->struct_alignment; ++ts; if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ if (alignment && ctx->fmt_offset % alignment) { ctx->fmt_offset += alignment - (ctx->fmt_offset % alignment); } } return ts; case 'x': if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->fmt_offset += ctx->new_count; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->enc_packmode = ctx->new_packmode; ++ts; break; case 'Z': got_Z = 1; ++ts; if (*ts != 'f' && *ts != 'd' && *ts != 'g') { __Pyx_BufFmt_RaiseUnexpectedChar('Z'); return NULL; } /* fall through */ case 'c': case 'b': case 'B': case 'h': case 'H': case 'i': case 'I': case 'l': case 'L': case 'q': case 'Q': case 'f': case 'd': case 'g': case 'O': case 's': case 'p': if (ctx->enc_type == *ts && got_Z == ctx->is_complex && ctx->enc_packmode == ctx->new_packmode) { ctx->enc_count += ctx->new_count; } else { if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_count = ctx->new_count; ctx->enc_packmode = ctx->new_packmode; ctx->enc_type = *ts; ctx->is_complex = got_Z; } ++ts; ctx->new_count = 1; got_Z = 0; break; case ':': ++ts; while(*ts 
!= ':') ++ts; ++ts; break; case '(': if (!__pyx_buffmt_parse_array(ctx, &ts)) return NULL; break; default: { int number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; ctx->new_count = (size_t)number; } } } } static CYTHON_INLINE void __Pyx_ZeroBuffer(Py_buffer* buf) { buf->buf = NULL; buf->obj = NULL; buf->strides = __Pyx_zeros; buf->shape = __Pyx_zeros; buf->suboffsets = __Pyx_minusones; } static CYTHON_INLINE int __Pyx_GetBufferAndValidate( Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack) { if (obj == Py_None || obj == NULL) { __Pyx_ZeroBuffer(buf); return 0; } buf->buf = NULL; if (__Pyx_GetBuffer(obj, buf, flags) == -1) goto fail; if (buf->ndim != nd) { PyErr_Format(PyExc_ValueError, "Buffer has wrong number of dimensions (expected %d, got %d)", nd, buf->ndim); goto fail; } if (!cast) { __Pyx_BufFmt_Context ctx; __Pyx_BufFmt_Init(&ctx, stack, dtype); if (!__Pyx_BufFmt_CheckString(&ctx, buf->format)) goto fail; } if ((unsigned)buf->itemsize != dtype->size) { PyErr_Format(PyExc_ValueError, "Item size of buffer (%"PY_FORMAT_SIZE_T"d byte%s) does not match size of '%s' (%"PY_FORMAT_SIZE_T"d byte%s)", buf->itemsize, (buf->itemsize > 1) ? "s" : "", dtype->name, (Py_ssize_t)dtype->size, (dtype->size > 1) ? 
"s" : ""); goto fail; } if (buf->suboffsets == NULL) buf->suboffsets = __Pyx_minusones; return 0; fail:; __Pyx_ZeroBuffer(buf); return -1; } static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info) { if (info->buf == NULL) return; if (info->suboffsets == __Pyx_minusones) info->suboffsets = NULL; __Pyx_ReleaseBuffer(info); } static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name) { PyObject *result; result = PyObject_GetAttr(dict, name); if (!result) { if (dict != __pyx_b) { PyErr_Clear(); result = PyObject_GetAttr(__pyx_b, name); } if (!result) { PyErr_SetObject(PyExc_NameError, name); } } return result; } static void __Pyx_RaiseBufferIndexError(int axis) { PyErr_Format(PyExc_IndexError, "Out of bounds on buffer access (axis %d)", axis); } static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb) { #if CYTHON_COMPILING_IN_CPYTHON PyObject *tmp_type, *tmp_value, *tmp_tb; PyThreadState *tstate = PyThreadState_GET(); tmp_type = tstate->curexc_type; tmp_value = tstate->curexc_value; tmp_tb = tstate->curexc_traceback; tstate->curexc_type = type; tstate->curexc_value = value; tstate->curexc_traceback = tb; Py_XDECREF(tmp_type); Py_XDECREF(tmp_value); Py_XDECREF(tmp_tb); #else PyErr_Restore(type, value, tb); #endif } static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb) { #if CYTHON_COMPILING_IN_CPYTHON PyThreadState *tstate = PyThreadState_GET(); *type = tstate->curexc_type; *value = tstate->curexc_value; *tb = tstate->curexc_traceback; tstate->curexc_type = 0; tstate->curexc_value = 0; tstate->curexc_traceback = 0; #else PyErr_Fetch(type, value, tb); #endif } #if PY_MAJOR_VERSION < 3 static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, CYTHON_UNUSED PyObject *cause) { Py_XINCREF(type); Py_XINCREF(value); Py_XINCREF(tb); if (tb == Py_None) { Py_DECREF(tb); tb = 0; } else if (tb != NULL && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be 
a traceback or None"); goto raise_error; } if (value == NULL) { value = Py_None; Py_INCREF(value); } #if PY_VERSION_HEX < 0x02050000 if (!PyClass_Check(type)) #else if (!PyType_Check(type)) #endif { if (value != Py_None) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto raise_error; } Py_DECREF(value); value = type; #if PY_VERSION_HEX < 0x02050000 if (PyInstance_Check(type)) { type = (PyObject*) ((PyInstanceObject*)type)->in_class; Py_INCREF(type); } else { type = 0; PyErr_SetString(PyExc_TypeError, "raise: exception must be an old-style class or instance"); goto raise_error; } #else type = (PyObject*) Py_TYPE(type); Py_INCREF(type); if (!PyType_IsSubtype((PyTypeObject *)type, (PyTypeObject *)PyExc_BaseException)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto raise_error; } #endif } __Pyx_ErrRestore(type, value, tb); return; raise_error: Py_XDECREF(value); Py_XDECREF(type); Py_XDECREF(tb); return; } #else /* Python 3+ */ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause) { if (tb == Py_None) { tb = 0; } else if (tb && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto bad; } if (value == Py_None) value = 0; if (PyExceptionInstance_Check(type)) { if (value) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto bad; } value = type; type = (PyObject*) Py_TYPE(value); } else if (!PyExceptionClass_Check(type)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto bad; } if (cause) { PyObject *fixed_cause; if (PyExceptionClass_Check(cause)) { fixed_cause = PyObject_CallObject(cause, NULL); if (fixed_cause == NULL) goto bad; } else if (PyExceptionInstance_Check(cause)) { fixed_cause = cause; Py_INCREF(fixed_cause); } else { PyErr_SetString(PyExc_TypeError, "exception causes must derive from " 
"BaseException"); goto bad; } if (!value) { value = PyObject_CallObject(type, NULL); } PyException_SetCause(value, fixed_cause); } PyErr_SetObject(type, value); if (tb) { PyThreadState *tstate = PyThreadState_GET(); PyObject* tmp_tb = tstate->curexc_traceback; if (tb != tmp_tb) { Py_INCREF(tb); tstate->curexc_traceback = tb; Py_XDECREF(tmp_tb); } } bad: return; } #endif static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index) { PyErr_Format(PyExc_ValueError, "need more than %"PY_FORMAT_SIZE_T"d value%s to unpack", index, (index == 1) ? "" : "s"); } static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected) { PyErr_Format(PyExc_ValueError, "too many values to unpack (expected %"PY_FORMAT_SIZE_T"d)", expected); } static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void) { PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); } static void __Pyx_UnpackTupleError(PyObject *t, Py_ssize_t index) { if (t == Py_None) { __Pyx_RaiseNoneNotIterableError(); } else if (PyTuple_GET_SIZE(t) < index) { __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(t)); } else { __Pyx_RaiseTooManyValuesError(index); } } static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type) { if (unlikely(!type)) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (likely(PyObject_TypeCheck(obj, type))) return 1; PyErr_Format(PyExc_TypeError, "Cannot convert %.200s to %.200s", Py_TYPE(obj)->tp_name, type->tp_name); return 0; } #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags) { PyObject *getbuffer_cobj; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) return PyObject_GetBuffer(obj, view, flags); #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) return __pyx_pw_5numpy_7ndarray_1__getbuffer__(obj, view, flags); #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (getbuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, 
"__pyx_getbuffer"))) { getbufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (getbufferproc) PyCapsule_GetPointer(getbuffer_cobj, "getbuffer(obj, view, flags)"); #else func = (getbufferproc) PyCObject_AsVoidPtr(getbuffer_cobj); #endif Py_DECREF(getbuffer_cobj); if (!func) goto fail; return func(obj, view, flags); } else { PyErr_Clear(); } #endif PyErr_Format(PyExc_TypeError, "'%100s' does not have the buffer interface", Py_TYPE(obj)->tp_name); #if PY_VERSION_HEX < 0x02060000 fail: #endif return -1; } static void __Pyx_ReleaseBuffer(Py_buffer *view) { PyObject *obj = view->obj; PyObject *releasebuffer_cobj; if (!obj) return; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) { PyBuffer_Release(view); return; } #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) { __pyx_pw_5numpy_7ndarray_3__releasebuffer__(obj, view); return; } #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (releasebuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_releasebuffer"))) { releasebufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (releasebufferproc) PyCapsule_GetPointer(releasebuffer_cobj, "releasebuffer(obj, view)"); #else func = (releasebufferproc) PyCObject_AsVoidPtr(releasebuffer_cobj); #endif Py_DECREF(releasebuffer_cobj); if (!func) goto fail; func(obj, view); return; } else { PyErr_Clear(); } #endif goto nofail; #if PY_VERSION_HEX < 0x02060000 fail: #endif PyErr_WriteUnraisable(obj); nofail: Py_DECREF(obj); view->obj = NULL; } #endif /* PY_MAJOR_VERSION < 3 */ static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, long level) { PyObject *py_import = 0; PyObject *empty_list = 0; PyObject *module = 0; PyObject *global_dict = 0; PyObject *empty_dict = 0; PyObject *list; py_import = __Pyx_GetAttrString(__pyx_b, "__import__"); if (!py_import) goto bad; if (from_list) list = from_list; else { empty_list 
= PyList_New(0); if (!empty_list) goto bad; list = empty_list; } global_dict = PyModule_GetDict(__pyx_m); if (!global_dict) goto bad; empty_dict = PyDict_New(); if (!empty_dict) goto bad; #if PY_VERSION_HEX >= 0x02050000 { #if PY_MAJOR_VERSION >= 3 if (level == -1) { if (strchr(__Pyx_MODULE_NAME, '.')) { /* try package relative import first */ PyObject *py_level = PyInt_FromLong(1); if (!py_level) goto bad; module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, py_level, NULL); Py_DECREF(py_level); if (!module) { if (!PyErr_ExceptionMatches(PyExc_ImportError)) goto bad; PyErr_Clear(); } } level = 0; /* try absolute import on failure */ } #endif if (!module) { PyObject *py_level = PyInt_FromLong(level); if (!py_level) goto bad; module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, py_level, NULL); Py_DECREF(py_level); } } #else if (level>0) { PyErr_SetString(PyExc_RuntimeError, "Relative import is not supported for Python <=2.4."); goto bad; } module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, NULL); #endif bad: Py_XDECREF(empty_list); Py_XDECREF(py_import); Py_XDECREF(empty_dict); return module; } static CYTHON_INLINE PyObject *__Pyx_PyInt_to_py_Py_intptr_t(Py_intptr_t val) { const Py_intptr_t neg_one = (Py_intptr_t)-1, const_zero = (Py_intptr_t)0; const int is_unsigned = const_zero < neg_one; if ((sizeof(Py_intptr_t) == sizeof(char)) || (sizeof(Py_intptr_t) == sizeof(short))) { return PyInt_FromLong((long)val); } else if ((sizeof(Py_intptr_t) == sizeof(int)) || (sizeof(Py_intptr_t) == sizeof(long))) { if (is_unsigned) return PyLong_FromUnsignedLong((unsigned long)val); else return PyInt_FromLong((long)val); } else if (sizeof(Py_intptr_t) == sizeof(PY_LONG_LONG)) { if (is_unsigned) return PyLong_FromUnsignedLongLong((unsigned PY_LONG_LONG)val); else return PyLong_FromLongLong((PY_LONG_LONG)val); } else { int one = 1; int little = (int)*(unsigned char *)&one; unsigned 
char *bytes = (unsigned char *)&val; return _PyLong_FromByteArray(bytes, sizeof(Py_intptr_t), little, !is_unsigned); } } #if CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return ::std::complex< float >(x, y); } #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return x + y*(__pyx_t_float_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { __pyx_t_float_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex a, __pyx_t_float_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex a) { return (a.real == 0) && (a.imag == 0); } static 
CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = a.real; z.imag = -a.imag; return z; } #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrtf(z.real*z.real + z.imag*z.imag); #else return hypotf(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { float denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 1: return a; case 2: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(a, a); case 3: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, a); case 4: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_absf(a); theta = atan2f(a.imag, a.real); } lnr = logf(r); z_r = expf(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cosf(z_theta); z.imag = z_r * sinf(z_theta); return z; } #endif #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return ::std::complex< double >(x, y); } #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return x + y*(__pyx_t_double_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { __pyx_t_double_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex a, __pyx_t_double_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static 
CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = a.real; z.imag = -a.imag; return z; } #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrt(z.real*z.real + z.imag*z.imag); #else return hypot(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { double denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 
1: return a; case 2: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(a, a); case 3: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(z, a); case 4: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_abs(a); theta = atan2(a.imag, a.real); } lnr = log(r); z_r = exp(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cos(z_theta); z.imag = z_r * sin(z_theta); return z; } #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject* x) { const unsigned char neg_one = (unsigned char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned char" : "value too large to convert to unsigned char"); } return (unsigned char)-1; } return (unsigned char)val; } return (unsigned char)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject* x) { const unsigned short neg_one = (unsigned short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to unsigned short" : "value too large to convert to unsigned short"); } return (unsigned short)-1; } return (unsigned short)val; } return (unsigned short)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject* x) { const unsigned int neg_one = (unsigned int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned int" : "value too large to convert to unsigned int"); } return (unsigned int)-1; } return (unsigned int)val; } return (unsigned int)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject* x) { const char neg_one = (char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to char" : "value too large to convert to char"); } return (char)-1; } return (char)val; } return (char)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject* x) { const short neg_one = (short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to short" : "value too large to convert to short"); } return (short)-1; } return (short)val; } return (short)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject* x) { const signed char neg_one = (signed char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed char" : "value too large to convert to signed char"); } return (signed char)-1; } return (signed char)val; } return (signed char)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject* x) { const signed short neg_one = (signed short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to signed short" : "value too large to convert to signed short"); } return (signed short)-1; } return (signed short)val; } return (signed short)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject* x) { const signed int neg_one = (signed int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed int" : "value too large to convert to signed int"); } return (signed int)-1; } return (signed int)val; } return (signed int)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject* x) { const unsigned long neg_one = (unsigned long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)PyLong_AsUnsignedLong(x); } else { return (unsigned long)PyLong_AsLong(x); } } else { unsigned long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned long)-1; val = __Pyx_PyInt_AsUnsignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject* x) { const unsigned PY_LONG_LONG neg_one = (unsigned PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (unsigned PY_LONG_LONG)PyLong_AsLongLong(x); } } else { unsigned PY_LONG_LONG val; 
PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned PY_LONG_LONG)-1; val = __Pyx_PyInt_AsUnsignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject* x) { const long neg_one = (long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)PyLong_AsUnsignedLong(x); } else { return (long)PyLong_AsLong(x); } } else { long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (long)-1; val = __Pyx_PyInt_AsLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject* x) { const PY_LONG_LONG neg_one = (PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (PY_LONG_LONG)PyLong_AsLongLong(x); } } else { PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1; val = __Pyx_PyInt_AsLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject* x) { const signed 
long neg_one = (signed long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)PyLong_AsUnsignedLong(x); } else { return (signed long)PyLong_AsLong(x); } } else { signed long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed long)-1; val = __Pyx_PyInt_AsSignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject* x) { const signed PY_LONG_LONG neg_one = (signed PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (signed PY_LONG_LONG)PyLong_AsLongLong(x); } } else { signed PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed PY_LONG_LONG)-1; val = __Pyx_PyInt_AsSignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE npy_uint64 __Pyx_PyInt_from_py_npy_uint64(PyObject* x) { const npy_uint64 neg_one = (npy_uint64)-1, const_zero = 
(npy_uint64)0; const int is_unsigned = const_zero < neg_one; if (sizeof(npy_uint64) == sizeof(char)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedChar(x); else return (npy_uint64)__Pyx_PyInt_AsSignedChar(x); } else if (sizeof(npy_uint64) == sizeof(short)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedShort(x); else return (npy_uint64)__Pyx_PyInt_AsSignedShort(x); } else if (sizeof(npy_uint64) == sizeof(int)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedInt(x); else return (npy_uint64)__Pyx_PyInt_AsSignedInt(x); } else if (sizeof(npy_uint64) == sizeof(long)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedLong(x); else return (npy_uint64)__Pyx_PyInt_AsSignedLong(x); } else if (sizeof(npy_uint64) == sizeof(PY_LONG_LONG)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedLongLong(x); else return (npy_uint64)__Pyx_PyInt_AsSignedLongLong(x); } else { npy_uint64 val; PyObject *v = __Pyx_PyNumber_Int(x); #if PY_VERSION_HEX < 0x03000000 if (likely(v) && !PyLong_Check(v)) { PyObject *tmp = v; v = PyNumber_Long(tmp); Py_DECREF(tmp); } #endif if (likely(v)) { int one = 1; int is_little = (int)*(unsigned char *)&one; unsigned char *bytes = (unsigned char *)&val; int ret = _PyLong_AsByteArray((PyLongObject *)v, bytes, sizeof(val), is_little, !is_unsigned); Py_DECREF(v); if (likely(!ret)) return val; } return (npy_uint64)-1; } } static CYTHON_INLINE Py_intptr_t __Pyx_PyInt_from_py_Py_intptr_t(PyObject* x) { const Py_intptr_t neg_one = (Py_intptr_t)-1, const_zero = (Py_intptr_t)0; const int is_unsigned = const_zero < neg_one; if (sizeof(Py_intptr_t) == sizeof(char)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedChar(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedChar(x); } else if (sizeof(Py_intptr_t) == sizeof(short)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedShort(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedShort(x); } else if (sizeof(Py_intptr_t) == sizeof(int)) { if 
(is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedInt(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedInt(x); } else if (sizeof(Py_intptr_t) == sizeof(long)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedLong(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedLong(x); } else if (sizeof(Py_intptr_t) == sizeof(PY_LONG_LONG)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedLongLong(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedLongLong(x); } else { Py_intptr_t val; PyObject *v = __Pyx_PyNumber_Int(x); #if PY_VERSION_HEX < 0x03000000 if (likely(v) && !PyLong_Check(v)) { PyObject *tmp = v; v = PyNumber_Long(tmp); Py_DECREF(tmp); } #endif if (likely(v)) { int one = 1; int is_little = (int)*(unsigned char *)&one; unsigned char *bytes = (unsigned char *)&val; int ret = _PyLong_AsByteArray((PyLongObject *)v, bytes, sizeof(val), is_little, !is_unsigned); Py_DECREF(v); if (likely(!ret)) return val; } return (Py_intptr_t)-1; } } static int __Pyx_check_binary_version(void) { char ctversion[4], rtversion[4]; PyOS_snprintf(ctversion, 4, "%d.%d", PY_MAJOR_VERSION, PY_MINOR_VERSION); PyOS_snprintf(rtversion, 4, "%s", Py_GetVersion()); if (ctversion[0] != rtversion[0] || ctversion[2] != rtversion[2]) { char message[200]; PyOS_snprintf(message, sizeof(message), "compiletime version %s of module '%.100s' " "does not match runtime version %s", ctversion, __Pyx_MODULE_NAME, rtversion); #if PY_VERSION_HEX < 0x02050000 return PyErr_Warn(NULL, message); #else return PyErr_WarnEx(NULL, message, 1); #endif } return 0; } #ifndef __PYX_HAVE_RT_ImportType #define __PYX_HAVE_RT_ImportType static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict) { PyObject *py_module = 0; PyObject *result = 0; PyObject *py_name = 0; char warning[200]; py_module = __Pyx_ImportModule(module_name); if (!py_module) goto bad; py_name = __Pyx_PyIdentifier_FromString(class_name); if (!py_name) goto bad; result = 
PyObject_GetAttr(py_module, py_name); Py_DECREF(py_name); py_name = 0; Py_DECREF(py_module); py_module = 0; if (!result) goto bad; if (!PyType_Check(result)) { PyErr_Format(PyExc_TypeError, "%s.%s is not a type object", module_name, class_name); goto bad; } if (!strict && (size_t)((PyTypeObject *)result)->tp_basicsize > size) { PyOS_snprintf(warning, sizeof(warning), "%s.%s size changed, may indicate binary incompatibility", module_name, class_name); #if PY_VERSION_HEX < 0x02050000 if (PyErr_Warn(NULL, warning) < 0) goto bad; #else if (PyErr_WarnEx(NULL, warning, 0) < 0) goto bad; #endif } else if ((size_t)((PyTypeObject *)result)->tp_basicsize != size) { PyErr_Format(PyExc_ValueError, "%s.%s has the wrong size, try recompiling", module_name, class_name); goto bad; } return (PyTypeObject *)result; bad: Py_XDECREF(py_module); Py_XDECREF(result); return NULL; } #endif #ifndef __PYX_HAVE_RT_ImportModule #define __PYX_HAVE_RT_ImportModule static PyObject *__Pyx_ImportModule(const char *name) { PyObject *py_name = 0; PyObject *py_module = 0; py_name = __Pyx_PyIdentifier_FromString(name); if (!py_name) goto bad; py_module = PyImport_Import(py_name); Py_DECREF(py_name); return py_module; bad: Py_XDECREF(py_name); return 0; } #endif #ifndef __PYX_HAVE_RT_ImportFunction #define __PYX_HAVE_RT_ImportFunction static int __Pyx_ImportFunction(PyObject *module, const char *funcname, void (**f)(void), const char *sig) { PyObject *d = 0; PyObject *cobj = 0; union { void (*fp)(void); void *p; } tmp; d = PyObject_GetAttrString(module, (char *)"__pyx_capi__"); if (!d) goto bad; cobj = PyDict_GetItemString(d, funcname); if (!cobj) { PyErr_Format(PyExc_ImportError, "%s does not export expected C function %s", PyModule_GetName(module), funcname); goto bad; } #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION==3&&PY_MINOR_VERSION==0) if (!PyCapsule_IsValid(cobj, sig)) { PyErr_Format(PyExc_TypeError, "C function %s.%s has wrong signature (expected %s, got %s)", 
PyModule_GetName(module), funcname, sig, PyCapsule_GetName(cobj)); goto bad; } tmp.p = PyCapsule_GetPointer(cobj, sig); #else {const char *desc, *s1, *s2; desc = (const char *)PyCObject_GetDesc(cobj); if (!desc) goto bad; s1 = desc; s2 = sig; while (*s1 != '\0' && *s1 == *s2) { s1++; s2++; } if (*s1 != *s2) { PyErr_Format(PyExc_TypeError, "C function %s.%s has wrong signature (expected %s, got %s)", PyModule_GetName(module), funcname, sig, desc); goto bad; } tmp.p = PyCObject_AsVoidPtr(cobj);} #endif *f = tmp.fp; if (!(*f)) goto bad; Py_DECREF(d); return 0; bad: Py_XDECREF(d); return -1; } #endif static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line) { int start = 0, mid = 0, end = count - 1; if (end >= 0 && code_line > entries[end].code_line) { return count; } while (start < end) { mid = (start + end) / 2; if (code_line < entries[mid].code_line) { end = mid; } else if (code_line > entries[mid].code_line) { start = mid + 1; } else { return mid; } } if (code_line <= entries[mid].code_line) { return mid; } else { return mid + 1; } } static PyCodeObject *__pyx_find_code_object(int code_line) { PyCodeObject* code_object; int pos; if (unlikely(!code_line) || unlikely(!__pyx_code_cache.entries)) { return NULL; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if (unlikely(pos >= __pyx_code_cache.count) || unlikely(__pyx_code_cache.entries[pos].code_line != code_line)) { return NULL; } code_object = __pyx_code_cache.entries[pos].code_object; Py_INCREF(code_object); return code_object; } static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object) { int pos, i; __Pyx_CodeObjectCacheEntry* entries = __pyx_code_cache.entries; if (unlikely(!code_line)) { return; } if (unlikely(!entries)) { entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Malloc(64*sizeof(__Pyx_CodeObjectCacheEntry)); if (likely(entries)) { __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = 
64; __pyx_code_cache.count = 1; entries[0].code_line = code_line; entries[0].code_object = code_object; Py_INCREF(code_object); } return; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if ((pos < __pyx_code_cache.count) && unlikely(__pyx_code_cache.entries[pos].code_line == code_line)) { PyCodeObject* tmp = entries[pos].code_object; entries[pos].code_object = code_object; Py_DECREF(tmp); return; } if (__pyx_code_cache.count == __pyx_code_cache.max_count) { int new_max = __pyx_code_cache.max_count + 64; entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Realloc( __pyx_code_cache.entries, new_max*sizeof(__Pyx_CodeObjectCacheEntry)); if (unlikely(!entries)) { return; } __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = new_max; } for (i=__pyx_code_cache.count; i>pos; i--) { entries[i] = entries[i-1]; } entries[pos].code_line = code_line; entries[pos].code_object = code_object; __pyx_code_cache.count++; Py_INCREF(code_object); } #include "compile.h" #include "frameobject.h" #include "traceback.h" static PyCodeObject* __Pyx_CreateCodeObjectForTraceback( const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_srcfile = 0; PyObject *py_funcname = 0; #if PY_MAJOR_VERSION < 3 py_srcfile = PyString_FromString(filename); #else py_srcfile = PyUnicode_FromString(filename); #endif if (!py_srcfile) goto bad; if (c_line) { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #else py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #endif } else { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromString(funcname); #else py_funcname = PyUnicode_FromString(funcname); #endif } if (!py_funcname) goto bad; py_code = __Pyx_PyCode_New( 0, /*int argcount,*/ 0, /*int kwonlyargcount,*/ 0, /*int nlocals,*/ 0, /*int stacksize,*/ 0, /*int flags,*/ __pyx_empty_bytes, /*PyObject *code,*/ 
__pyx_empty_tuple, /*PyObject *consts,*/ __pyx_empty_tuple, /*PyObject *names,*/ __pyx_empty_tuple, /*PyObject *varnames,*/ __pyx_empty_tuple, /*PyObject *freevars,*/ __pyx_empty_tuple, /*PyObject *cellvars,*/ py_srcfile, /*PyObject *filename,*/ py_funcname, /*PyObject *name,*/ py_line, /*int firstlineno,*/ __pyx_empty_bytes /*PyObject *lnotab*/ ); Py_DECREF(py_srcfile); Py_DECREF(py_funcname); return py_code; bad: Py_XDECREF(py_srcfile); Py_XDECREF(py_funcname); return NULL; } static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_globals = 0; PyFrameObject *py_frame = 0; py_code = __pyx_find_code_object(c_line ? c_line : py_line); if (!py_code) { py_code = __Pyx_CreateCodeObjectForTraceback( funcname, c_line, py_line, filename); if (!py_code) goto bad; __pyx_insert_code_object(c_line ? c_line : py_line, py_code); } py_globals = PyModule_GetDict(__pyx_m); if (!py_globals) goto bad; py_frame = PyFrame_New( PyThreadState_GET(), /*PyThreadState *tstate,*/ py_code, /*PyCodeObject *code,*/ py_globals, /*PyObject *globals,*/ 0 /*PyObject *locals*/ ); if (!py_frame) goto bad; py_frame->f_lineno = py_line; PyTraceBack_Here(py_frame); bad: Py_XDECREF(py_code); Py_XDECREF(py_frame); } static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) { while (t->p) { #if PY_MAJOR_VERSION < 3 if (t->is_unicode) { *t->p = PyUnicode_DecodeUTF8(t->s, t->n - 1, NULL); } else if (t->intern) { *t->p = PyString_InternFromString(t->s); } else { *t->p = PyString_FromStringAndSize(t->s, t->n - 1); } #else /* Python 3+ has unicode identifiers */ if (t->is_unicode | t->is_str) { if (t->intern) { *t->p = PyUnicode_InternFromString(t->s); } else if (t->encoding) { *t->p = PyUnicode_Decode(t->s, t->n - 1, t->encoding, NULL); } else { *t->p = PyUnicode_FromStringAndSize(t->s, t->n - 1); } } else { *t->p = PyBytes_FromStringAndSize(t->s, t->n - 1); } #endif if (!*t->p) return -1; ++t; } return 0; } /* Type Conversion 
Functions */ static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) { int is_true = x == Py_True; if (is_true | (x == Py_False) | (x == Py_None)) return is_true; else return PyObject_IsTrue(x); } static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x) { PyNumberMethods *m; const char *name = NULL; PyObject *res = NULL; #if PY_VERSION_HEX < 0x03000000 if (PyInt_Check(x) || PyLong_Check(x)) #else if (PyLong_Check(x)) #endif return Py_INCREF(x), x; m = Py_TYPE(x)->tp_as_number; #if PY_VERSION_HEX < 0x03000000 if (m && m->nb_int) { name = "int"; res = PyNumber_Int(x); } else if (m && m->nb_long) { name = "long"; res = PyNumber_Long(x); } #else if (m && m->nb_int) { name = "int"; res = PyNumber_Long(x); } #endif if (res) { #if PY_VERSION_HEX < 0x03000000 if (!PyInt_Check(res) && !PyLong_Check(res)) { #else if (!PyLong_Check(res)) { #endif PyErr_Format(PyExc_TypeError, "__%s__ returned non-%s (type %.200s)", name, name, Py_TYPE(res)->tp_name); Py_DECREF(res); return NULL; } } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_TypeError, "an integer is required"); } return res; } static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) { Py_ssize_t ival; PyObject* x = PyNumber_Index(b); if (!x) return -1; ival = PyInt_AsSsize_t(x); Py_DECREF(x); return ival; } static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) { #if PY_VERSION_HEX < 0x02050000 if (ival <= LONG_MAX) return PyInt_FromLong((long)ival); else { unsigned char *bytes = (unsigned char *) &ival; int one = 1; int little = (int)*(unsigned char*)&one; return _PyLong_FromByteArray(bytes, sizeof(size_t), little, 0); } #else return PyInt_FromSize_t(ival); #endif } static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject* x) { unsigned PY_LONG_LONG val = __Pyx_PyInt_AsUnsignedLongLong(x); if (unlikely(val == (unsigned PY_LONG_LONG)-1 && PyErr_Occurred())) { return (size_t)-1; } else if (unlikely(val != (unsigned PY_LONG_LONG)(size_t)val)) { PyErr_SetString(PyExc_OverflowError, 
"value too large to convert to size_t"); return (size_t)-1; } return (size_t)val; } #endif /* Py_PYTHON_H */ PyCogent-1.5.3/cogent/struct/_contact.pxd000644 000765 000024 00000000163 11307235442 021353 0ustar00jrideoutstaff000000 000000 cimport numpy as np ctypedef np.npy_float64 DTYPE_t ctypedef np.npy_int64 LTYPE_t ctypedef np.npy_uint64 UTYPE_tPyCogent-1.5.3/cogent/struct/_contact.pyx000644 000765 000024 00000014207 12024702176 021404 0ustar00jrideoutstaff000000 000000 cimport cython import numpy as np cimport numpy as np from numpy cimport npy_intp from cogent.maths.spatial.ckd3 cimport kdpoint, points, kdnode, build_tree, rn from stdlib cimport malloc, free __version__ = "('1', '5', '3')" cdef extern from "numpy/arrayobject.h": # cdef object PyArray_SimpleNewFromData(int nd, npy_intp *dims,\ # int typenum, void *data) cdef void import_array() # cdef enum requirements: # NPY_OWNDATA def cnt_loop( np.ndarray[DTYPE_t, ndim =2] qcoords,\ np.ndarray[DTYPE_t, ndim =2] lcoords,\ np.ndarray[LTYPE_t, ndim =1] qc,\ np.ndarray[LTYPE_t, ndim =1] lc,\ UTYPE_t shape1,\ UTYPE_t shape2,\ UTYPE_t zero_tra,\ UTYPE_t mode,\ DTYPE_t search_limit,\ np.ndarray[DTYPE_t, ndim =1] box,\ UTYPE_t bucket_size =10,\ UTYPE_t MAXSYM =200000,\ npy_intp MAXCNT =100000): #const cdef UTYPE_t asu_atoms = shape1 * shape2 search_limit = search_limit * search_limit #looping indexes query atom, lattice atom, neighbor, result cdef int idx, lidx, lidx_c, idxn, idxc #c arrays from numpy cdef DTYPE_t *qcoords_c = qcoords.data cdef DTYPE_t *lcoords_c = lcoords.data cdef DTYPE_t *box_c = box.data #malloc'ed pointers cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) # temp cdef DTYPE_t *t_ptr # temporary pointer cdef UTYPE_t t_idx # index cdef UTYPE_t t_asu # reduced index cdef UTYPE_t t_sym # symmetry cdef UTYPE_t t_tra # translation UTYPE_t cdef DTYPE_t t_dst # distance cdef DTYPE_t *t_arr = malloc(3 * MAXSYM * sizeof(DTYPE_t)) # temporary array of symmetry 
cdef UTYPE_t *t_lid = malloc( MAXSYM * sizeof(UTYPE_t)) # maping to original indices # result #cdef UTYPE_t *c_src = malloc(MAXCNT * sizeof(UTYPE_t)) # source indices #cdef UTYPE_t *c_asu = malloc(MAXCNT * sizeof(UTYPE_t)) # target indices #cdef UTYPE_t *c_sym = malloc(MAXCNT * sizeof(UTYPE_t)) # symmetries #cdef UTYPE_t *c_tra = malloc(MAXCNT * sizeof(UTYPE_t)) # translations #cdef DTYPE_t *c_dst = malloc(MAXCNT * sizeof(DTYPE_t)) # distances cdef np.ndarray[UTYPE_t, ndim=1] c_src = np.ndarray((MAXCNT,), dtype=np.uint64) cdef np.ndarray[UTYPE_t, ndim=1] c_asu = np.ndarray((MAXCNT,), dtype=np.uint64) cdef np.ndarray[UTYPE_t, ndim=1] c_sym = np.ndarray((MAXCNT,), dtype=np.uint64) cdef np.ndarray[UTYPE_t, ndim=1] c_tra = np.ndarray((MAXCNT,), dtype=np.uint64) cdef np.ndarray[DTYPE_t, ndim=1] c_dst = np.ndarray((MAXCNT,), dtype=np.float64) # create a temporary array of lattice points, which are within a box around # the query atoms. The kd-tree will be constructed from those filterd atoms. lidx_c = 0 for 0 <= lidx < lcoords.shape[0]: t_ptr = lcoords_c + lidx * 3 if box_c[0] <= (t_ptr )[0] <= box_c[3] and\ box_c[1] <= (t_ptr+1)[0] <= box_c[4] and\ box_c[2] <= (t_ptr+2)[0] <= box_c[5]: t_arr[3*lidx_c ] = (t_ptr )[0] t_arr[3*lidx_c+1] = (t_ptr+1)[0] t_arr[3*lidx_c+2] = (t_ptr+2)[0] t_lid[lidx_c] = lidx lidx_c += 1 #make kd-tree cdef kdpoint search_point cdef npy_intp neighbor_number cdef kdpoint *kdpnts = points(t_arr, lidx_c, 3) cdef kdnode *tree = build_tree(kdpnts, 0, lidx_c - 1, 3, bucket_size, 0) idxc = 0 # loop over every query atom for 0 <= idx < qcoords.shape[0]: search_point.coords = qcoords_c + idx*3 neighbor_number = rn(tree, kdpnts, search_point, dstptr, idxptr, search_limit, 3, 100) # loop over all neighbors for 0 <= idxn < neighbor_number: t_dst = dstptr[0][idxn] # the distance of the neighbor to the query if t_dst <= 0.001: # its the same atom, skipping. 
continue t_idx = kdpnts[idxptr[0][idxn]].index # real index in t_arr array t_idx = t_lid[t_idx] # real index in lcoords array t_asu = t_idx % shape2 # 0 .. N -1, atom number t_sym = (t_idx // shape2) % shape1 # 0 .. MXS -1, symmetry number t_tra = t_idx // asu_atoms # 0 .. (2n + 1)^2 -1, translation number if t_tra == zero_tra: # same unit cell if (mode == 0): continue elif (mode == 1) and (t_sym == 0): # same asymmetric unit continue elif (mode == 2) and (qc[idx] == lc[t_asu]) and (t_sym == 0): # continue # safe valid contact c_src[idxc] = idx c_asu[idxc] = t_asu c_sym[idxc] = t_sym c_tra[idxc] = t_tra c_dst[idxc] = t_dst idxc += 1 free(t_arr) free(t_lid) # free KD-Tree free(idxptr[0]) free(idxptr) free(dstptr) # numpy output #import_array() #cdef np.ndarray n_src = PyArray_SimpleNewFromData(1, &MAXCNT, NPY_UINT, c_src) # n_src.flags = n_src.flags|(NPY_OWNDATA) # this sets the ownership bit #cdef np.ndarray n_asu = PyArray_SimpleNewFromData(1, &MAXCNT, NPY_UINT, c_asu) # n_asu.flags = n_asu.flags|(NPY_OWNDATA) # this sets the ownership bit #cdef np.ndarray n_sym = PyArray_SimpleNewFromData(1, &MAXCNT, NPY_UINT, c_sym) # n_sym.flags = n_sym.flags|(NPY_OWNDATA) # this sets the ownership bit #cdef np.ndarray n_tra = PyArray_SimpleNewFromData(1, &MAXCNT, NPY_UINT, c_tra) # n_tra.flags = n_tra.flags|(NPY_OWNDATA) # this sets the ownership bit #cdef np.ndarray n_dst = PyArray_SimpleNewFromData(1, &MAXCNT, NPY_DOUBLE, c_dst) # n_dst.flags = n_dst.flags|(NPY_OWNDATA) # this sets the ownership bit return (idxc, c_src, c_asu, c_sym, c_tra, c_dst) PyCogent-1.5.3/cogent/struct/annotation.py000644 000765 000024 00000001763 12024702176 021577 0ustar00jrideoutstaff000000 000000 """Contains functions to annotate macromolecular entities.""" from cogent.core.entity import HIERARCHY __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" 
__email__ = "mpc4p@virginia.edu" __status__ = "Development" def xtradata(data, entity): """Annotates an entity with data from a ``{full_id:data}`` dictionary. The ``data`` should also be a dictionary. Arguments: - data: a dictionary, which is a mapping of full_id's (keys) and data dictionaries. - entity: top-level entity, which contains the entities which will hold the data.""" for full_id, data in data.iteritems(): sub_entity = entity strip_full_id = [i for i in full_id if i is not None] for short_id in strip_full_id: sub_entity = sub_entity[(short_id,)] sub_entity.xtra.update(data) PyCogent-1.5.3/cogent/struct/asa.py000644 000765 000024 00000022444 12024702176 020170 0ustar00jrideoutstaff000000 000000 """Classes and functions for computing and manipulating accessible surface areas (ASA).""" from cogent.app.stride import Stride from cogent.parse.stride import stride_parser from cogent.struct.selection import einput from cogent.struct.annotation import xtradata from cogent.maths.geometry import sphere_points, coords_to_symmetry, \ coords_to_crystal from _asa import asa_loop from numpy import array, r_ __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" def _run_asa(atoms, lattice_coords, spoints, probe=1.4, bucket_size=5, \ MAXSYM=200000): """Runs an ASA calculation. This function takes a selection of atoms (in the most common case all atoms in a structure) and lattice coordinates (in the most common a 3x3 box of unit-cells). This function should be considered low-level and not part of the interface. Arguments: - atoms: Holder of atom entities. - lattice_coords: Numpy array of coordinates. - spoints: an array of coordinates on the unit sphere, defines the accuracy of ASA calculation. - probe: size of the probe i.e. solvent molecule. - bucket_size: see: ``KDTree``. 
- MAXSYM (int): maximum number of symmetry generated atoms. """ # get array of radii inflated by probe size of the selection of atoms. atom_radii = array(atoms.getData('radius', forgiving=False)) + probe # get array of coordinates atom_coords = array(atoms.getData('coords', forgiving=False)) # calculate bounding box in a form of an array search_limit = 2 * (2.0 + probe) # 2.0 is maximum atom radius atom_box = r_[atom_coords.min(axis=0) - search_limit, \ atom_coords.max(axis=0) + search_limit] # the lattice coordinates are atom coordinates after transformations in a # 4D-array this array gets reshaped into an all_atoms x 3 array. shape = lattice_coords.shape lattice_coords = \ lattice_coords.reshape((shape[0] * shape[1] * shape[2], shape[3])) # this calls the cython code which loops over all query atoms, surface # points, and lattice atoms return asa_loop(atom_coords, lattice_coords, atom_radii, atom_radii, \ spoints, atom_box, probe, bucket_size, MAXSYM) def _prepare_entities(entities): """Prepares input entities for ASA calculation, which includes masking water molecules and water chains. """ # First we mask all water residues and chains with all residues masked # (water chains). lattice_residues = einput(entities, 'R') lattice_residues.maskChildren('H_HOH', 'eq', 'name') lattice_chains = einput(entities, 'C') lattice_chains.maskChildren([], 'eq', 'values', method=True) # if no residues or chains are left - no atoms to work with, # abort with warning. if not lattice_chains.values(): # the following makes sure that masking changes by the above # tests are reverted. 
        lattice_structures = einput(entities, 'S')
        lattice_structures.setUnmasked(force=True)
        raise ValueError('No unmasked atoms to build lattice.')
    # these are all atoms we can work with
    lattice_atoms = einput(entities, 'A')
    lattice_atoms.dispatch('setRadius')

def _postpare_entities(entities):
    """Restores entities after ASA calculation, which includes unmasking."""
    structures = einput(entities, 'S')
    structures.setUnmasked(force=True)

def _prepare_asa(entities, symmetry_mode=None, crystal_mode=None, points=960, \
                 **kwargs):
    """Prepares the atomic solvent-accessible surface area (ASA) calculation.

    Arguments:

        - entities: input entities for ASA calculation (most commonly a
          structure entity).
        - symmetry_mode (str): One of 'uc', 'bio' or 'table'. This defines the
          transformations applied to the coordinates of the input entities.
          'bio' and 'uc' are transformations to create the biological molecule
          or unit-cell from the PDB header; 'table' uses transformation
          matrices derived from space-group information only, using
          crystallographic tables (requires ``cctbx``).
        - crystal_mode (int): Defines the number of unit-cells to expand the
          initial unit-cell into, in each direction, i.e. 1 makes a total of
          27 unit cells: (-1, 0, 1) == 3, 3^3 == 27.
        - points: number of points on atom spheres; higher is slower but more
          accurate.

    Additional keyword arguments are passed to the ``_run_asa`` function.
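The calculation that `_run_asa` hands to the Cython `asa_loop` is a Shrake-Rupley-style estimate: scatter points on each atom's probe-inflated sphere and keep the fraction not buried inside any neighbouring inflated sphere. A minimal pure-numpy sketch of that idea (all names here are illustrative, not part of cogent; no symmetry expansion, bounding box, or kd-tree):

```python
import numpy as np

def asa_sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral sketch)."""
    k = np.arange(n) + 0.5
    phi = np.arccos(1 - 2 * k / n)           # polar angle
    theta = np.pi * (1 + 5 ** 0.5) * k       # golden-angle azimuth
    return np.c_[np.cos(theta) * np.sin(phi),
                 np.sin(theta) * np.sin(phi),
                 np.cos(phi)]

def asa_brute(coords, radii, probe=1.4, n_points=960):
    """Per-atom ASA: fraction of sphere points not buried in any neighbor."""
    spoints = asa_sphere_points(n_points)
    inflated = radii + probe                 # radii inflated by probe size
    asas = np.empty(len(coords))
    for i in range(len(coords)):
        test = coords[i] + inflated[i] * spoints   # points on inflated sphere
        free = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            if j == i:
                continue
            free &= ((test - coords[j]) ** 2).sum(axis=1) > inflated[j] ** 2
        asas[i] = 4 * np.pi * inflated[i] ** 2 * free.mean()
    return asas
```

For a lone atom no points are buried, so the result is exactly the inflated-sphere area 4*pi*(r+probe)^2; the O(N^2) neighbor loop is what the real code avoids with its bounding box and kd-tree.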
""" # generate uniform points on the unit-sphere spoints = sphere_points(points) # prepare entities for asa calculation # free-floating area mode result = {} atoms = einput(entities, 'A') if not symmetry_mode and not crystal_mode: coords = array(atoms.getData('coords', forgiving=False)) coords = array([[coords]]) # fake 3D and 4D idx_to_id = dict(enumerate(atoms.getData('getFull_id', \ forgiving=False, method=True))) asas = _run_asa(atoms, coords, spoints, **kwargs) for idx in xrange(asas.shape[0]): result[idx_to_id[idx]] = asas[idx] # crystal-contact area mode elif symmetry_mode in ('table', 'uc'): structure = einput(entities, 'S').values()[0] sh = structure.header coords = array(atoms.getData('coords', forgiving=False)) idx_to_id = dict(enumerate(atoms.getData('getFull_id', \ forgiving=False, method=True))) # expand to unit-cell, real 3D coords = coords_to_symmetry(coords, \ sh[symmetry_mode + '_fmx'], \ sh[symmetry_mode + '_omx'], \ sh[symmetry_mode + '_mxs'], \ symmetry_mode) # expand to crystal, real 4D if crystal_mode: coords = coords_to_crystal(coords, \ sh[symmetry_mode + '_fmx'], \ sh[symmetry_mode + '_omx'], \ crystal_mode) # real 4D else: coords = array([coords]) # fake 4D asas = _run_asa(atoms, coords, spoints, **kwargs) for idx in xrange(asas.shape[0]): result[idx_to_id[idx]] = asas[idx] # biological area mode elif symmetry_mode == 'bio': structure = einput(entities, 'S').values()[0] chains = einput(entities, 'C') sh = structure.header start = 0 for chain_ids, mx_num in sh['bio_cmx']: sel = chains.selectChildren(chain_ids, 'contains', 'id').values() atoms = einput(sel, 'A') coords = array(atoms.getData('coords', forgiving=False)) idx_to_id = dict(enumerate(atoms.getData('getFull_id', \ forgiving=False, method=True))) stop = start + mx_num coords = coords_to_symmetry(coords, \ sh['uc_fmx'], \ sh['uc_omx'], \ sh['bio_mxs'][start:stop], \ symmetry_mode) coords = array([coords]) start = stop asas = _run_asa(atoms, coords, spoints, **kwargs) for idx in 
xrange(asas.shape[0]): result[idx_to_id[idx]] = asas[idx] return result def asa_xtra(entities, mode='internal', xtra_key=None, **asa_kwargs): """Calculates accessible surface areas (ASA) and puts the results into the xtra dictionaries of entities. Arguments: - entities: an entity or sequence of entities - mode(str): 'internal' for calculations using the built-in cython code or 'stride' if the stride binary should be called to do the job. - xtra_key(str): Key in the xtra dictionary to hold the result for each entity Additional keyworded arguments are passed to the ``_prepare_asa`` and ``_run_asa`` functions. """ xtra_key = xtra_key or 'ASA' structures = einput(entities, 'S') if len(structures.values()) > 1: raise ValueError('Entities from multiple structures are not supported.') if mode == 'internal': _prepare_entities(entities) # mask waters result = _prepare_asa(entities, **asa_kwargs) # calculate ASA _postpare_entities(entities) # unmask waters result = dict([(id, {xtra_key:v}) for id, v in result.iteritems()]) xtradata(result, structures) elif mode == 'stride': models = einput(entities, 'M') stride_app = Stride() result = stride_app(entities)['StdOut'].readlines() result = stride_parser(result) xtradata(result, structures.values()[0][(0,)]) else: raise ValueError('Not a valid mode: "%s"' % mode) return result PyCogent-1.5.3/cogent/struct/contact.py000644 000765 000024 00000016156 12024702176 021062 0ustar00jrideoutstaff000000 000000 """Classes and functions for computing and manipulating accessible surface areas (ASA).""" from cogent.struct.selection import einput from cogent.struct.annotation import xtradata from cogent.maths.geometry import coords_to_symmetry, \ coords_to_crystal from _contact import cnt_loop from collections import defaultdict from numpy import array, r_, sqrt, int64 __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ 
= "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" def _prepare_contacts(query, model=None, level='A', search_limit=6.0, \ contact_mode='diff_chain', symmetry_mode=None, \ crystal_mode=None, **kwargs): """Prepares distance contact calculations. Arguments: - query(entitie[s]): query entitie[s] for contact calculation (most commonly a structure entity). - model(entity): a Model entity which will be transformed according to symmetry_mode and crystal_mode. (most commonly it is the same as the query) - level(str): The level in the hierarchy at which distances will be calculated (most commonly 'A' for atoms) - search_limit(float): maximum distance in Angstrom's - contact_mode(str): One of "diff_cell", "diff_sym", "diff_chain". Defines the allowed contacts i.e. requires that contacts are by entities, which have: "diff_cell" different unit cells; "diff_sym" different symmetry operators (if in the same unit cell) "diff_chain" with different chain ids (if in the same unit cell and symmetry). - symmetry_mode (str): One of 'uc', 'bio' or 'table'. This defines the transformations of applied to the coordinates of the input entities. It is one of 'bio', 'uc' or 'table'. Where 'bio' and 'uc' are transformations to create the biological molecule or unit-cell from the PDB header. The 'table' uses transformation matrices derived from space-group information only using crystallographic tables(requires ``cctbx``). - crystal_mode (int): Defines the number of unit-cells to expand the initial unit-cell into. The number of unit cells in each direction i.e. 1 is makes a total of 27 unit cells: (-1, 0, 1) == 3, 3^3 == 27 Additional arguments are passed to the ``cnt_loop`` Cython function. 
    """
    contact_mode = {'diff_asu'  :0,
                    'diff_sym'  :1,
                    'diff_chain':2
                    }[contact_mode]
    # determine unique structure
    structure = einput(query, 'S').values()[0]
    sh = structure.header
    # if not specified otherwise the lattice is the first model
    lattice = model or structure[(0,)]
    lents = einput(lattice, level)
    lents_ids = lents.getData('getFull_id', forgiving=False, method=True)
    lcoords = array(lents.getData('coords', forgiving=False))
    qents = einput(query, level)
    qents_ids = qents.getData('getFull_id', forgiving=False, method=True)
    qcoords = array(qents.getData('coords', forgiving=False))
    if symmetry_mode:
        if symmetry_mode == 'table':
            lcoords = coords_to_symmetry(lcoords, \
                                         sh['table_fmx'], \
                                         sh['table_omx'], \
                                         sh['table_mxs'], \
                                         symmetry_mode)
        elif symmetry_mode == 'uc':
            lcoords = coords_to_symmetry(lcoords, \
                                         sh['uc_fmx'], \
                                         sh['uc_omx'], \
                                         sh['uc_mxs'], \
                                         symmetry_mode)
        elif symmetry_mode == 'bio': # TODO see asa
            raise ValueError("Unsupported symmetry_mode: %s" % symmetry_mode)
        else:
            raise ValueError("Unsupported symmetry_mode: %s" % symmetry_mode)
    else:
        lcoords = array([lcoords]) # fake 3D
    if crystal_mode:
        # index of the 0,0,0 translation; related to the thickened cube
        # numbers a(n) = n*(n^2+(n-1)^2) + (n-1)*2*n*(n-1):
        # 1, 14, 63, 172, 365, 666, 1099, 1688, 2457, 3430, 4631, 6084, 7813 ...
        zero_tra = {1:13, 2:62, 3:171}[crystal_mode]
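The hard-coded table `{1: 13, 2: 62, 3: 171}` is simply the index of the central (0,0,0) cell among the (2n+1)^3 translations, assuming they are enumerated symmetrically around zero (function name below is illustrative):

```python
def zero_translation_index(n):
    """Index of the central cell among the (2n+1)**3 translations of a
    crystal expanded n unit-cells in each direction."""
    return ((2 * n + 1) ** 3 - 1) // 2

# reproduces the hard-coded table used for zero_tra
assert [zero_translation_index(n) for n in (1, 2, 3)] == [13, 62, 171]
```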
        if symmetry_mode == 'table':
            lcoords = coords_to_crystal(lcoords, \
                                        sh['table_fmx'], \
                                        sh['table_omx'], \
                                        crystal_mode)
        elif symmetry_mode == 'uc':
            lcoords = coords_to_crystal(lcoords, \
                                        sh['uc_fmx'], \
                                        sh['uc_omx'], \
                                        crystal_mode)
        else:
            raise ValueError('crystal_mode not possible for "bio" symmetry')
    else:
        zero_tra = 0
        lcoords = array([lcoords]) # fake 4D
    shape = lcoords.shape
    lcoords = lcoords.reshape((shape[0] * shape[1] * shape[2], shape[3]))
    box = r_[qcoords.min(axis=0) - search_limit, \
             qcoords.max(axis=0) + search_limit]
    lc = [] # lattice chain
    qc = [] # query chain
    lchains = [i[2] for i in lents_ids]
    qchains = [i[2] for i in qents_ids]
    allchains = set()
    allchains.update(lchains)
    allchains.update(qchains)
    chain2id = dict(zip(allchains, range(len(allchains))))
    for lent_id in lents_ids:
        lc.append(chain2id[lent_id[2]])
    for qent_id in qents_ids:
        qc.append(chain2id[qent_id[2]])
    lc = array(lc, dtype=int64)
    qc = array(qc, dtype=int64)
    # here we leave python
    (idxc, n_src, n_asu, n_sym, n_tra, n_dst) = cnt_loop(\
        qcoords, lcoords, qc, lc, shape[1], shape[2], \
        zero_tra, contact_mode, search_limit, box, \
        **kwargs)
    result = defaultdict(dict)
    for contact in xrange(idxc):
        qent_id = qents_ids[n_src[contact]]
        lent_id = lents_ids[n_asu[contact]]
        result[qent_id][lent_id] = (sqrt(n_dst[contact]), n_tra[contact],
                                    n_sym[contact])
    return result

def contacts_xtra(query, xtra_key=None, **cnt_kwargs):
    """Finds distance contacts between entities.

    This function searches for contacts for query entities (query) either
    within the asymmetric unit, biological molecule, unit-cell or crystal.

    Arguments:

        - query (entitie[s]): query entity or sequence of entities
        - xtra_key (str): name of the key

    Additional keyword arguments are passed to the ``_prepare_contacts``
    function.
""" xtra_key = xtra_key or 'CONTACTS' structures = einput(query, 'S') if len(structures.values()) > 1: raise ValueError('Entities from multiple structures are not supported.') result = _prepare_contacts(query, **cnt_kwargs) # calculate CONTACTS result = dict([(id, {xtra_key:v}) for id, v in result.iteritems()]) xtradata(result, structures) return result PyCogent-1.5.3/cogent/struct/dihedral.py000644 000765 000024 00000011156 12024702176 021176 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # # dihedral.py # # Calculates dihedral angles. # """Dihedral angle calculation module. K. Rother Two functions are provided (see documentation of each function for a more detailed description): angle (v1,v2) - returns the angle between two 2D or 3D numpy arrays (in radians) dihedral (v1,v2,v3,v4) - returns the dihedral angle between four 3D vectors (in degrees) The vectors that dihedral uses can be lists, tuples or numpy arrays. Scientific.Geometry.Vector objects behave differently on Win and Linux, and they are therefore not supported (but may work anyway). """ __author__ = "Kristian Rother" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Kristian Rother", "Sandra Smit"] __credits__ = ["Janusz Bujnicki", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kristian Rother" __email__ = "krother@rubor.de" __status__ = "Production" from numpy import array, cross, pi, cos, arccos as acos from cogent.util.array import norm from cogent.maths.stats.special import fix_rounding_error class DihedralGeometryError(Exception): pass class AngleGeometryError(Exception): pass def scalar(v1,v2): """ calculates the scalar product of two vectors v1 and v2 are numpy.array objects. returns a float for a one-dimensional array. """ return sum(v1*v2) def angle(v1,v2): """ calculates the angle between two vectors. v1 and v2 are numpy.array objects. returns a float containing the angle in radians. 
""" length_product = norm(v1)*norm(v2) if length_product == 0: raise AngleGeometryError(\ "Cannot calculate angle for vectors with length zero") cosine = scalar(v1,v2)/length_product angle = acos(fix_rounding_error(cosine)) return angle def calc_angle(vec1,vec2,vec3): """Calculates a flat angle from three coordinates.""" if len(vec1) == 3: v1, v2, v3 = map(create_vector,[vec1,vec2,vec3]) else: v1, v2, v3 = map(create_vector2d,[vec1,vec2,vec3]) v12 = v2 - v1 v23 = v2 - v3 return angle(v12, v23) def create_vector2d(vec): """Returns a vector as a numpy array.""" return array([vec[0],vec[1]]) def create_vector(vec): """Returns a vector as a numpy array.""" return array([vec[0],vec[1],vec[2]]) def create_vectors(vec1,vec2,vec3,vec4): """Returns dihedral angle, takes four Scientific.Geometry.Vector objects (dihedral does not work for them because the Win and Linux libraries are not identical. """ return map(create_vector,[vec1,vec2,vec3,vec4]) def dihedral(vec1,vec2,vec3,vec4): """ Returns a float value for the dihedral angle between the four vectors. They define the bond for which the torsion is calculated (~) as: V1 - V2 ~ V3 - V4 The vectors vec1 .. vec4 can be array objects, lists or tuples of length three containing floats. For Scientific.geometry.Vector objects the behavior is different on Windows and Linux. Therefore, the latter is not a featured input type even though it may work. If the dihedral angle cant be calculated (because vectors are collinear), the function raises a DihedralGeometryError """ # create array instances. v1,v2,v3,v4 =create_vectors(vec1,vec2,vec3,vec4) all_vecs = [v1,v2,v3,v4] # rule out that two of the atoms are identical # except the first and last, which may be. 
for i in range(len(all_vecs)-1): for j in range(i+1,len(all_vecs)): if i>0 or j<3: # exclude the (1,4) pair equals = all_vecs[i]==all_vecs[j] if equals.all(): raise DihedralGeometryError(\ "Vectors #%i and #%i may not be identical!"%(i,j)) # calculate vectors representing bonds v12 = v2-v1 v23 = v3-v2 v34 = v4-v3 # calculate vectors perpendicular to the bonds normal1 = cross(v12,v23) normal2 = cross(v23,v34) # check for linearity if norm(normal1) == 0 or norm(normal2)== 0: raise DihedralGeometryError(\ "Vectors are in one line; cannot calculate normals!") # normalize them to length 1.0 normal1 = normal1/norm(normal1) normal2 = normal2/norm(normal2) # calculate torsion and convert to degrees torsion = angle(normal1,normal2) * 180.0/pi # take into account the determinant # (the determinant is a scalar value distinguishing # between clockwise and counter-clockwise torsion. if scalar(normal1,v34) >= 0: return torsion else: torsion = 360-torsion if torsion == 360: torsion = 0.0 return torsion PyCogent-1.5.3/cogent/struct/knots.py000644 000765 000024 00000204533 12024702176 020563 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # knots.py """Contains code related to RNA (secondary) structure and pseudoknots. Specifically, this module contains several methods to remove pseudoknots from RNA structures. Pseudoknot removal is discussed in the following paper: S. Smit, K. Rother, J. Heringa, and R. Knight Manuscript in preparation. If you use this code in your work, please cite this publication (in addition to the PyCogent publication). If you need to cite the paper before submission, please contact the author of this module. Six functions are provided (see documentation of each function for a more detailed description): opt_all -- optimization approach that calculates all nested structures that optimize some value (e.g. 
keep the maximum number of base pairs)

conflict_elimination -- Removes pseudoknots from a structure by eliminating
    conflicting base pairs one by one. Two functions to determine which
    paired region should be removed next are provided: max_conflicts and
    min_gain.

inc_order -- creates a nested structure by adding non-conflicting paired
    regions one by one to the solution; paired regions are processed from
    5' to 3' start point or from 3' to 5' end point.

inc_length -- creates a nested structure by adding non-conflicting paired
    regions one at a time, starting with the longest region working towards
    the shortest region.

inc_range -- generates a nested structure by adding non-conflicting paired
    regions one at a time, starting with short-range interactions working
    towards long-range interactions.

These six functions represent the core objective of this module.

Two convenience functions supporting the opt_all function are added:
opt_single_random, and opt_single_property

There is also a modified version of the original Nussinov-Jacobson algorithm
present which is restricted to the given list of base pairs:
nussinov_restricted

In addition, the following supporting objects and functions are present:

PairedRegion -- object that represents a paired region in an RNA structure.
    A paired region is an uninterrupted stretch of base pairs with positions
    [(i,j),(i+1, j-1),(i+2,j-2), ...].

PairedRegions -- object (basically a list) that stores a collection of
    PairedRegion objects. This is an alternative way of representing an RNA
    structure, where basically stretches of base pairs are condensed into
    PairedRegion objects.

PairedRegionFromPairs -- Factory function to create a PairedRegion object
    from a Pairs object (cogent.struct.rna2d)

PairedRegionsFromPairs -- Factory function to create a PairedRegions object
    from a Pairs object (cogent.struct.rna2d)

ConflictMatrix -- object to store a matrix of conflicts between different
    paired regions.
Row and Column indices correspond to PairedRegion IDs, values in the matrix are True (if the regions conflict), and False (if the regions don't conflict). Other smaller helper functions are: contains_true, empty_matrix, pick_multi_best, dp_matrix_multi, matrix_solutions, and add_back_non_conflicting See their docstrings for detailed documentation. NOTE: None of the methods provided can handle overlapping base pairs (i.e. base A pairs with B, and A pairs with C), because in that case it is unclear whether the first or second pair should be kept (also, you could get different results based on the order of the base pairs). In general, one removes pseudoknots in order to represent the structure in dot-bracket format. If the list contains conflict, this is impossible anyway, so removing the pseudoknots would not help. It is the responsibility of the user to remove overlapping base pairs before trying to obtain a nested structure. """ from __future__ import division from random import choice from numpy import sum, average, zeros from cogent.struct.rna2d import Pairs from cogent.util.dict2d import Dict2D __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Sandra Smit, Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class PairedRegion(object): """Store an uninterrupted list of base pairs by start, end, and length A paired region (a.k.a. ladder or (helical) region) is a stretch of perfectly nested base pairs with positions: [(m,n), (m+1,n-1), (m+2,n-2),...] This object is very similar to the Stem object in cogent.struct.rna2d. In addition to the start, end, and length it stores the actual base pairs, and it stores a region ID. It has many more methods than the Stem object. This object performs no error checking. 
    You can specify an End before a Start point, or a Start and End which are
    closer to each other than 2 times the Length.
    """

    def __init__(self, Start, End, Length, Id=None):
        """Initialize a new PairedRegion object

        Start -- int, specifying the starting index of the paired region in
            the sequence. This is the 5' side of the 5' halfregion.
        End -- int, specifying the end index of the paired region in the
            sequence. This is the 3' side of the 3' halfregion.
        Length -- int, specifying the length of the paired region, i.e. the
            number of base pairs in the region.
        Id -- string or int, unique identifier for this PairedRegion.

        During initialization a Pairs object is created. The first pair is
        always (Start,End), additional pairs add up from the Start point and
        down from the End point, so (Start+1,End-1), (Start+2, End-2) etc.
        """
        self.Start = Start
        self.End = End
        if Length < 1:
            raise ValueError(\
                "PairedRegion should contain at least one base pair")
        self.Length = Length
        self.Id = Id
        self.Pairs = Pairs()
        for i in range(self.Length):
            self.Pairs.append((self.Start+i, self.End-i))

    def __str__(self):
        """Return string representation of PairedRegion, list of Pairs
        """
        return str(self.Pairs)

    def __len__(self):
        """Return Length of the PairedRegion, i.e. the number of base pairs
        """
        return self.Length

    def __eq__(self, other):
        """Compares Pairs and IDs

        other -- PairedRegion object

        If IDs are not set (both None), Pairs is the only criterion.
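The pair construction described in the ``__init__`` docstring above is a one-liner; a standalone sketch (hypothetical helper name) with the worked example Start=2, End=10, Length=3:

```python
def region_pairs(start, end, length):
    """Base pairs of a paired region: (start, end), (start+1, end-1), ...
    for ``length`` pairs, as built in PairedRegion.__init__."""
    return [(start + i, end - i) for i in range(length)]

# a region with Start=2, End=10, Length=3
assert region_pairs(2, 10, 3) == [(2, 10), (3, 9), (4, 8)]
```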
        """
        if self.Pairs == other.Pairs and self.Id == other.Id:
            return True
        return False

    def __ne__(self, other):
        """Return True if two PairedRegion objects differ

        other -- PairedRegion object
        """
        return not self == other

    def upstream(self):
        """Return list of upstream positions in self from 5' to 3'
        """
        return [i for (i,j) in self.Pairs]

    def downstream(self):
        """Return list of downstream positions in self from 5' to 3'
        """
        result = [j for (i,j) in self.Pairs]
        result.reverse()
        return result

    def paired(self):
        """Return sorted list of paired positions in this region
        """
        result = self.upstream() + self.downstream()
        result.sort()
        return result

    def range(self):
        """Return the range of this region

        The range of the region is the number of bases between the highest
        upstream and the lowest downstream position (i.e. the number of
        unpaired bases in the hairpin if this were the only paired region in
        the structure). For example: the range of a region with start=3,
        end=10, and len=2 would be 4.

        Performs no error checking. If the region overlaps with itself, a
        negative number will be returned.
        """
        return min(self.downstream()) - max(self.upstream()) - 1

    def overlapping(self, other):
        """Returns True if two regions overlap

        other -- PairedRegion object

        Two regions overlap if there is at least one base which is a member
        of both regions (definition from Studnicka 1978). Identical regions
        are overlapping.
        """
        ref_pos = dict.fromkeys(self.paired())
        for pos in other.paired():
            if pos in ref_pos:
                return True
        return False

    def conflicting(self, other):
        """Return True if the regions are conflicting, False if they are nested

        other -- PairedRegion object

        Two paired regions are conflicting if they are organized in a knotted
        fashion. They are nested if both other.Start and other.End lie between
        self.Start and self.End, or if neither of them does; otherwise they
        conflict. See for example Studnicka 1978, or any other paper with a
        general pseudoknot definition, for details.
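The nested-versus-knotted test described above reduces to interval logic: two non-overlapping regions conflict exactly when one, but not both, of the other region's endpoints falls strictly inside the first. A sketch over plain (start, end) tuples (hypothetical helper, no overlap checking as done in ``conflicting``):

```python
def knotted(region_a, region_b):
    """True if two non-overlapping (start, end) regions are pseudoknotted."""
    (s1, e1), (s2, e2) = region_a, region_b
    start_inside = s1 < s2 < e1
    end_inside = s1 < e2 < e1
    return start_inside != end_inside   # exactly one endpoint inside -> knot

assert knotted((1, 10), (5, 15))        # interleaved: pseudoknot
assert not knotted((1, 10), (3, 8))     # nested
assert not knotted((1, 10), (12, 20))   # side by side
```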
Overlapping regions cause an error, because you can't determine whether they are conflicting or not. However, two identical regions are defined as NOT conflicting, even though they are overlapping, and thus False is returned. For non-identical, but overlapping regions an error will be raised. """ if self == other: # equal blocks return False # if not equal, but overlapping, raise error if self.overlapping(other): raise ValueError("Can only handle non-overlapping regions") if (other.Start > self.End and other.End > self.End) or\ (self.Start > other.End and self.End > other.End): return False if (self.Start < other.Start < self.End and\ self.Start < other.End < self.End) or\ (other.Start < self.Start < other.End and\ other.Start < self.End < other.End): return False return True def score(self, scoring_function): """Sets self.Score to value of scoring function applies to self scoring_function -- function that can be applied to a PairedRegion object and returns a numerical value Note: this method has no return value, it sets a property of the instance. """ self.Score = scoring_function(self) def PairedRegionFromPairs(pairs, Id=None): """Return new PairedRegion object from Pairs object pairs -- Pairs object or list of tuples with up and downstream positions. Id -- string or int, unique identifier of this region. This is a factory function to create a PairedRegion object from a Pairs object. It assumes the pairs are fully nested with positions [(m,n), (m+1,n-1), (m+2,n-2),...]. The function doesn't validate the input pairs, so if the assumtion does not hold, the Pairs in the resulting PairedRegion might differ from the input pairs. It makes the pairs directed and sorts them, then it extracts the Start, End and Length and initialized a new PairedRegion with those parameters. 
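The parameter extraction described above (directed, sorted pairs give Start, End and Length) can be sketched in isolation; `region_params_from_pairs` is a hypothetical helper name for illustration:

```python
def region_params_from_pairs(pairs):
    """Recover (start, end, length) from a fully nested run of base pairs
    [(m, n), (m+1, n-1), ...], mirroring what PairedRegionFromPairs does:
    make each pair directed (upstream position first), sort, and read the
    outer pair and the pair count."""
    directed = sorted(tuple(sorted(p)) for p in pairs)
    start, end = directed[0]
    return start, end, len(directed)

# Pairs given in arbitrary order and orientation still yield (3, 10, 2):
params = region_params_from_pairs([(9, 4), (3, 10)])
```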
""" if not pairs: raise ValueError("PairedRegion should contain at least one pair") # preprocess raw_pairs = Pairs(pairs) if raw_pairs.hasConflicts(): raise ValueError("Cannot handle pairs with conflicts") p = raw_pairs.directed() p.sort() # initialize variables start = p[0][0] end = p[0][1] length = len(p) return PairedRegion(start, end, length, Id=Id) class PairedRegions(list): """Stores a list of PairedRegion objects A PairedRegions object is a condensed way of looking at an RNA structure, where continuous stretches of base pairs are collapsed into PairedRegion objects. See the documentation on PairedRegion for more details. """ def __init__(self, regions=None): """Initialize new PairedRegions object regions -- list of PairedRegion objects The object is meant to store a list of PairedRegion objects, however it is very light-weight and does not perform any validation on the input. """ if regions is None: regions = [] self[:] = list(regions) def __str__(self): """Return string representation of PairedRegions object Each PairedRegion is presented as Id:Start,End,Length; A PairedRegions object is presented as '(' + space-delimited list of PairedRegion objects + ')' For example: (A:2,10,2; B:12,20,3;) """ result = [] for i in self: result.append("%s:%s,%s,%s;"%(i.Id,i.Start,i.End,i.Length)) return '('+' '.join(result)+')' def __eq__(self, other): """Return True if two PairedRegions objects are equal other -- PairedRegions object Two regions are equal if they have the same length and contain the same PairedRegion objects. """ if len(self) != len(other): return False for i in self: if i not in other: return False return True def __ne__(self, other): """Return True if two PairedRegions objects are different other -- PairedRegions object """ return not self == other def byId(self): """Return dict of {ID: PairedRegion} This function only works if the IDs for each region in self are unique. If multiple regions with the same ID are found, an error is raised. 
""" result = {} for pr in self: # for every paired region if pr.Id in result: raise ValueError("Duplicate key found") result[pr.Id] = pr return result def numberOfRegions(self): """Return the number of PairedRegion objects in the list """ return len(self) def totalLength(self): """Return the cumulative length of all PairedRegion objects in self The totalLength is the total number of base pairs in this PairedRegions object. So, it adds the number of pairs in each PairedRegion in the list. """ if self: return sum(map(len, self)) else: return 0 def totalScore(self): """Return sum of Score values of each PairedRegion in self This method simply adds all the Score attributes (!= None) for each PairedRegion in this PairedRegions object. """ score = 0 for pr in self: try: score += pr.Score except AttributeError: raise ValueError("Score not set for %s"%(str(self))) except TypeError: raise ValueError("Score should be numerical, but is %s"\ %(pr.Score)) return score def toPairs(self): """Return Pairs object containing all the pairs in each PairedRegion This method does not validate the pairs. It simply adds all the pairs in each PairedRegion to the result. Pairs might occur twice in the result. The resulting Pairs object is sorted. """ result = Pairs() for pr in self: result.extend(pr.Pairs) result.sort() return result def byStartEnd(self): """Return dict of {(pr.Start, pr.End): pr} Keys in the dictionary are tuples of start and end positions, the values are the PairedRegion objects themselves. If a Start/End combination is already in the dictionary, an error is raised. """ result = {} for pr in self: se = (pr.Start, pr.End) if se in result: raise ValueError("Duplicate key found: %s"%(str(se))) result[se] = pr return result #return dict([((pr.Start, pr.End), pr) for pr in self]) def lowestStart(self): """Return lowest begin value of any PairedRegion in the list It lists all the Start values for all the PairedRegion objects in the list and returns the lowest value. 
If there are no regions in self, None is returned. """ start_values = [pr.Start for pr in self] if not start_values: return None else: return min(start_values) def highestEnd(self): """Return highest end value of any PairedRegion in the list It lists all the End values for all the PairedRegion objects in the list and returns the highest value. If there are no regions in self, None is returned. """ end_values = [pr.End for pr in self] if not end_values: return None else: return max(end_values) def sortedIds(self): """Return sorted list of region IDs """ all_ids = [pr.Id for pr in self] all_ids.sort() return all_ids def upstream(self): """Return sorted list of upstream positions """ result = [] for pr in self: result.extend(pr.upstream()) result.sort() return result def downstream(self): """Return sorted list of downstream positions """ result = [] for pr in self: result.extend(pr.downstream()) result.sort() return result def pairedPos(self): """Return sorted list of all paired positions """ result = self.upstream() + self.downstream() result.sort() return result def boundaries(self): """Return sorted list of all start and end points """ result = [] for pr in self: result.append(pr.Start) result.append(pr.End) result.sort() return result def enumeratedBoundaries(self): """Return dict of {boundary_index: boundary} Return value is dictionary created from tuples from the enumeration of all boundaries. """ return dict(enumerate(self.boundaries())) def invertedEnumeratedBoundaries(self): """Return dict of {boundary_value: boundary_idx} Boundary values are all the start and end points of the paired regions in self. Should be unique, otherwise an error is raised. Boundary indices are the indices assigned to each start and end point during an enumeration of the sorted list. Overall the result is the inverted dictionary of the result of the enumeratedBoundaries method. 
        """
        eb = self.enumeratedBoundaries()
        result = {}
        for boundary_idx, boundary_value in eb.items():
            if boundary_value in result:
                raise ValueError(
                    "Boundary value %s is not unique" % (boundary_value))
            result[boundary_value] = boundary_idx
        #result = dict([(v,k) for k,v in eb.items()])
        return result

    def merge(self, other):
        """Merge two PairedRegions objects together

        other -- PairedRegions object

        Duplicate PairedRegion objects are stored only once. This method
        uses the PairedRegion IDs to check for duplicates.
        """
        result = PairedRegions()
        seen = {}
        for pr in self + other:
            if pr.Id not in seen:
                result.append(pr)
                seen[pr.Id] = True
        return result

    def conflicting(self, cm=None):
        """Return PairedRegions obj containing regions involved in a conflict

        cm -- ConflictMatrix for this PairedRegions object.

        This method only works if the PairedRegion objects have unique IDs,
        because a conflict matrix is constructed. This behavior can be
        changed...

        See PairedRegion.conflicting() for a definition of conflicting
        paired regions.
        """
        if cm is None:
            cm = ConflictMatrix(self)
        id_to_pr = self.byId()
        result = PairedRegions()
        for pr_id in cm.conflicting():
            result.append(id_to_pr[pr_id])
        return result

    def nonConflicting(self, cm=None):
        """Return new PairedRegions object containing non-conflicting regions

        cm -- ConflictMatrix for this PairedRegions object.

        Two PairedRegion objects do not conflict when they are organized in
        a nested fashion.
        """
        if cm is None:
            cm = ConflictMatrix(self)
        id_to_pr = self.byId()
        result = PairedRegions()
        for pr_id in cm.nonConflicting():
            result.append(id_to_pr[pr_id])
        return result

    def conflictCliques(self, cm=None):
        """Return list of PairedRegions objects w/ mutually conflicting regions

        cm -- ConflictMatrix for this PairedRegions object.

        Mutually conflicting regions form a knot-component as defined in
        Rodland 2006. The return value is a list of PairedRegions objects;
        each PairedRegions object contains mutually conflicting regions
        (knot-components). E.g. if region A conflicts with B, B conflicts
        with A and C, and D conflicts with E, and F doesn't conflict with
        any other region, one group would be A, B and C, the other would
        be D and E. F would not be returned in any group, since it isn't
        conflicting.
        """
        if cm is None:
            cm = ConflictMatrix(self)
        id_to_pr = self.byId()
        cliques = cm.conflictCliques()
        result = []
        for cl in cliques:
            pr = PairedRegions()
            for i in cl:
                pr.append(id_to_pr[i])
            result.append(pr)
        return result


def PairedRegionsFromPairs(pairs):
    """Return PairedRegions object from Pairs

    pairs -- Pairs object, no conflicts allowed

    The result is a list of stretches of perfectly nested base pairs, i.e.
    [[(m,n), (m+1,n-1), (m+2,n-2), ...], [(i,j), (i+1,j-1), ...]].
    Base pairs are made directed and sorted before the stretches are
    picked out, so the result will be in order. IDs of the regions are set
    as indices (enumeration of all regions). The PairedRegion that starts
    closest to the 5' end will get ID 0, the next ID 1, etc.
    """
    p = Pairs(pairs)
    if not p:
        return PairedRegions()
    if p.hasConflicts():
        raise ValueError("Cannot handle base pair conflicts")
    clean_pairs = p.directed()
    clean_pairs.sort()
    regions = []
    curr_region = []
    pr_id = -1  # paired region ID
    for pair in clean_pairs:
        if not curr_region:
            curr_region.append(pair)
        else:
            x, y = curr_region[-1]
            if pair == (x+1, y-1):
                curr_region.append(pair)
            else:
                pr_id += 1
                regions.append(PairedRegionFromPairs(curr_region, Id=pr_id))
                curr_region = [pair]
    if curr_region:  # last block
        pr_id += 1
        regions.append(PairedRegionFromPairs(curr_region, Id=pr_id))
    return PairedRegions(regions)


class ConflictMatrix(object):
    """Stores conflict matrix

    A conflict matrix is a matrix that indicates which PairedRegion
    objects are conflicting. Row and column IDs correspond to region IDs.
    If two regions are conflicting True is stored, otherwise False is
    stored.
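The conflict (knotting) relation the matrix records can be sketched for two non-overlapping helices given only by their outer (start, end) pairs; `crossing` is a hypothetical name for illustration:

```python
def crossing(region1, region2):
    """Return True if two non-overlapping helices, given as (start, end)
    outer pairs, are knotted: exactly one endpoint of region2 falls
    between the endpoints of region1 (cf. Studnicka 1978)."""
    s1, e1 = region1
    s2, e2 = region2
    inside = [s1 < p < e1 for p in (s2, e2)]
    return inside[0] != inside[1]  # one endpoint in, one out ==> pseudoknot

# (1, 10) and (5, 15) interleave (conflict); (2, 9) is nested inside (1, 10)
```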
""" def __init__(self, data): """Initialize new ConflictMatrix object data -- either a PairedRegions object or a Pairs object, or anything that can be made into a Pairs object (e.g. a list of tuples) This method sets the Matrix attribute to a Dict2D object containing conflict information on the PairedRegions. The input data is either a PairedRegions object or it is made into one. A ValueError will be raised when the pairs or regions are overlapping. Input data that can't be converted to Pairs will lead to downstream errors. Pairs doesn't perform any validation. Row and column IDs are the Identifiers of the PairedRegion objects. The RowOrder and ColumnOrder of the Dict2D are the sorted region IDs. """ if isinstance(data, PairedRegions): id_to_pr = data.byId() elif isinstance(data, Pairs): id_to_pr = PairedRegionsFromPairs(data).byId() else: # try to convert to Pairs try: d = Pairs(data) id_to_pr = PairedRegionsFromPairs(d).byId() except: raise ValueError("Can't convert data to Pairs") # handle the rows and columns in order and set RowOrder and ColOrder ro = id_to_pr.keys() ro.sort() co = id_to_pr.keys() co.sort() conf = {} # dict of conflicts between blocks for id1, bl in id_to_pr.items(): for id2, bl2 in id_to_pr.items(): if id2 < id1: # minimize number of calculations continue if id1 not in conf: conf[id1] = {} if id2 not in conf: conf[id2] = {} if id1 == id2: conf[id1][id2] = False conf[id2][id1] = False continue is_conflicting = bl.conflicting(bl2) conf[id1][id2] = is_conflicting conf[id2][id1] = is_conflicting self.Matrix = Dict2D(conf, RowOrder=ro, ColOrder=co) # create Dict2D def conflictsOf(self, pr_id): """Return list of region IDs for regions that conflict with pr_id pr_id -- row ID in the matrix (ID of paired region) Input is ID of a particular region, return value are the IDs of all regions that conflict with the given region. 
""" return [k for k,v in self.Matrix[pr_id].items() if v is True] def conflicting(self): """Return list of region IDs for conflicting regions """ result = [] cm = self.Matrix for pr_id in cm.RowOrder: if contains_true(cm[pr_id].values()): result.append(pr_id) return result def nonConflicting(self): """Return list of region IDs for non-conflicting regions """ result = [] cm = self.Matrix for pr_id in cm.RowOrder: if not contains_true(cm[pr_id].values()): result.append(pr_id) return result def conflictCliques(self): """Return list of lists with IDs of mutually conflicting regions See documentation on PairedRegions.conflictCliques for more details. """ cm = self.Matrix cliques = [] seen = {} for pr_id in cm.RowOrder: if pr_id in seen: continue todo = set([pr_id]) done = set() while todo != done: collection = [] # collection of conflicts for i in todo: if i in done: # no need to do them twice continue conf = [] # conflicting regions for k,v in cm[i].items(): if v is True: conf.append(k) collection.extend(conf) # add conflict to collection done.add(i) # register that i is done todo.update(collection) # update todo if len(done) > 1: cliques.append(list(done)) for i in done: seen[i] = True return cliques # ============================================================================= # SCORING FUNCTIONS FOR DYNAMIC PROGRAMMING APPROACH # ============================================================================= def num_bps(paired_region): """Return number of base pairs (=Length) of paired_region paired_region -- PairedRegion object """ return paired_region.Length def hydrogen_bonds(seq): """Return function to score a PairedRegion by its hydrogen bonds seq -- Sequence object or string This method counts the number of hydrogen bonds in Watson-Crick and Wobble base pairs. GC pairs score 3, AU and GU pairs score 2. 
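The hydrogen-bond scoring scheme described above (GC = 3, AU and GU = 2, everything else ignored) can be sketched as a self-contained function; `hb_score` is a hypothetical name for illustration:

```python
# Hydrogen-bond scores for Watson-Crick and wobble pairs, as described above.
HB_SCORE = {('G', 'C'): 3, ('C', 'G'): 3,
            ('A', 'U'): 2, ('U', 'A'): 2,
            ('G', 'U'): 2, ('U', 'G'): 2}

def hb_score(seq, pairs):
    """Sum hydrogen-bond scores over a list of (up, down) index pairs;
    non-canonical pairs contribute nothing to the total."""
    return sum(HB_SCORE.get((seq[i], seq[j]), 0) for i, j in pairs)

# 'GCAU' paired as (0, 1) and (2, 3): one GC pair (3) + one AU pair (2) = 5
score = hb_score('GCAU', [(0, 1), (2, 3)])
```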
""" HB_SCORE = {('G','C'): 3, ('C','G'): 3,\ ('A','U'): 2, ('U','A'): 2,\ ('G','U'): 2, ('U','G'): 2} def apply_to(paired_region): """Return score of paired_region by its hydrogen bonds paired_region -- PairedRegion object Scores each base pair in the region by giving each GC base pair 3 points and each AU or GU base pair 2 points. Other base pairs are ignored and don't add anything to the overall score. """ score = 0 for up,down in paired_region.Pairs: seq_pair = (seq[up],seq[down]) try: score += HB_SCORE[seq_pair] except KeyError: continue return score return apply_to # ============================================================================= # HELPER FUNCTIONS FOR DYNAMIC PROGRAMMING APPROACH # ============================================================================= def contains_true(i): """Return True if input contains True i -- any object that implements __contains__ Returns True if True is in the input. Both True and 1 count as True. Helper function for ConflictMatrix object. """ try: if True in i: return True return False except TypeError: # when i is a string return False def empty_matrix(size): """Return square matrix as list of lists of specified size. size -- int, number of rows and columns in the matrix. This function is a helper function of opt_all. Each cell is filled with [PairedRegions()]. This is the initialization value needed for a dynamic programming matrix that keeps track of all optimal solutions. A solution is a single PairedRegions object. """ if size < 1: raise ValueError("The size of the matrix should be at least one") result = [] for i in range(size): result.append([]) for j in range(size): result[i].append([PairedRegions()]) return result def pick_multi_best(candidates, goal='max'): """Return list of unique solutions with a maximum/minimum score. candidates -- list of PairedRegions objects This function returns a list of all PairedRegions objects that have an optimal score (maximum or minimum depending on the goal). 
If the list of candidates is empty, a list containing an empty PairedRegions object is returned. This function is a helper function of dp_matrix_multi. NOTE: PairedRegion IDs must be set. They are checked to avoid including unsaturated solutions. Maybe implementation should be changed, such that (Start, End, Length) tuples are used as IDs?! """ if not candidates: return [PairedRegions()] result = [] best_score = None seen = {} # Candidates have to be processed in order of length can_len = [(c.totalLength(), c) for c in candidates] can_len.sort() can_len.reverse() for l, c in can_len: c_ids = tuple(c.sortedIds()) c_ids_set = set(c_ids) if not c or c_ids in seen: continue this_score = c.totalScore() if best_score is None: best_score = this_score result = [c] seen[c_ids] = True elif this_score == best_score: is_sub = False for seen_id in seen: if len(c_ids_set) == len(c_ids_set & set(seen_id)): is_sub = True break if is_sub: seen[c_ids] = True else: result.append(c) seen[c_ids] = True elif goal == 'max' and this_score < best_score: continue elif goal == 'min' and this_score > best_score: continue else: result = [c] seen[c_ids] = True best_score = this_score if not result: return [PairedRegions()] return result def dp_matrix_multi(paired_regions, goal='max', scoring_function=num_bps): """Return dynamic programming matrix with top-right half filled paired_regions -- PairedRegions object goal -- str, 'max' or 'min', if the goal is 'max' the routine returns the solutions maximizing the score, if the goal is 'min' the solutions with a minimum score are returned. scoring_function -- function that can be applied to a PairedRegion object and that returns a numerical score. This function fills a matrix that calculates the optimal solution for the pseudoknot-removal problem by storing optimal solutions for smaller sub-problems. 
The number of cells in each DP matrix is the number of given paired regions times two, because there is one row and column for each start and end point of each region. Only the top-right half of the matrix will be filled. A row index is referred to as i (begin_idx in code), a column index is referred to as j (end_idx in code). The matrix is initialized on the diagonal (where i==j) with a list containing an empty solution (an empty PairedRegions object). A list is used because we keep track of all possible optimal choices. For each cell (i,j) where j>i we collect all the candidate-solutions as follows. ** Add all solutions of the cell to the left, which contains the best solutions for the area from start point i to end point j-1. ** Add all solutions of the cell to the bottom, which contains the best solutions for the area from start point i+1 to end point j. ** If start point i and end point j are a start and end point of the same region, add all possible solutions from the cell to the bottom-left plus this region. The cell to the bottom-left contains the optimal solutions for the area from start point i+1 to end point j-1. ** If the lists of solutions at the cells to the left and bottom both contained anything different from the empty solution, we need to check two more things: ** For each combination of a solution in the left cell and a solution in the right cell, calculate the highest end point of the solution in the left cell and the lowest start point for the bottom cell. In a collection of paired regions, every region has an end point, and the highest end point is the largest number in the list of all end points. The lowest start value is calculated in a similar way. ** If the highest end point is lower than the lowest start point, it means both solutions are disjoint and can be added to form a better solution. Thus, merge the two solutions and add them to the list of candidate-solutions. 
    ** Otherwise, the solutions are not disjoint, but because of the
       pseudoknots, sub-solutions of the two solutions might be combined
       to form a better solution. Create a slider k (splitter in code)
       that runs from the lowest start point minus one to the highest end
       point plus one. For each pair of cells (i,k), (k+1,j) merge all
       possible solutions and add them to the list of candidate-solutions.
    ** Next, store in cell (i,j) all solutions with an optimal score.
       There might be one solution, or there might be multiple solutions.
    ** Finish the calculation when the top-right cell in the matrix is
       filled. This cell contains the optimal solutions for the given set
       of paired regions.
    """
    if goal not in ['max', 'min']:
        raise ValueError(
            "goal has to be 'min' or 'max', but is '%s'" % (goal))
    prs = paired_regions
    num_cells = len(prs)*2
    # pre-calculate scores
    for pr in prs:
        pr.score(scoring_function)
    # create and initialize matrix
    result = empty_matrix(num_cells)
    # create some lookup dictionaries
    # enumerated start and end points
    enum_boundaries = prs.enumeratedBoundaries()
    # inverted enumerated start/end points {pr.Start/End: position in list}
    inv_enum_boundaries = prs.invertedEnumeratedBoundaries()
    pos_to_pr = prs.byStartEnd()  # {(pr.Start, pr.End): PairedRegion}
    # fill the matrix
    for end_idx in range(num_cells):
        for begin_idx in range(end_idx-1, -1, -1):
            # look up sequence positions that match indices
            begin_pos = enum_boundaries[begin_idx]
            end_pos = enum_boundaries[end_idx]
            # look up solutions in left and bottom cells
            left_cell = result[begin_idx][end_idx-1]
            bottom_cell = result[begin_idx+1][end_idx]
            # collect candidates
            candidates = []
            # add solutions from the bottom cell
            for sol in bottom_cell:
                candidates.append(sol)
            # add solutions from the left cell
            for sol in left_cell:
                candidates.append(sol)
            # if begin_pos and end_pos are paired:
            # ==> add bottom-left solutions plus this region
            if (begin_pos, end_pos) in pos_to_pr:
                this_region = pos_to_pr[(begin_pos, end_pos)]
                bottom_left = result[begin_idx+1][end_idx-1]
                for sol in bottom_left:
                    candidates.append(
                        PairedRegions(sol + PairedRegions([this_region])))
            # if we have a solution in the left and in the bottom cell:
            if left_cell != [[]] and bottom_cell != [[]]:
                # check whether they can be added or iterate
                for sol1 in left_cell:
                    for sol2 in bottom_cell:
                        he_pos = sol1.highestEnd()
                        he_idx = inv_enum_boundaries[he_pos]
                        ls_pos = sol2.lowestStart()
                        ls_idx = inv_enum_boundaries[ls_pos]
                        # if both solutions are disjoint
                        if he_pos < ls_pos:
                            candidates.append(sol1.merge(sol2))
                        else:  # not disjoint ==> iterate
                            for splitter in range(ls_idx-1, he_idx+1):
                                cell_to_left = result[begin_idx][splitter]
                                cell_to_bottom = result[splitter+1][end_idx]
                                if cell_to_bottom == [[]]:
                                    break
                                for sub_sol1 in cell_to_left:
                                    for sub_sol2 in cell_to_bottom:
                                        both = sub_sol1.merge(sub_sol2)
                                        candidates.append(both)
            # select all the candidates with an optimal score
            best_candidate = pick_multi_best(candidates, goal=goal)
            result[begin_idx][end_idx] = best_candidate
    # return the whole matrix
    return result


def matrix_solutions(paired_regions, goal='max', scoring_function=num_bps):
    """Return the list of solutions in the top-right cell of the DP matrix

    paired_regions -- PairedRegions object

    This method fills a dynamic programming matrix (by calling
    dp_matrix_multi) and returns the list of solutions in the top-right
    cell.
    """
    return dp_matrix_multi(paired_regions, goal=goal,
        scoring_function=scoring_function)[0][-1]


# =============================================================================
# DYNAMIC PROGRAMMING APPROACH
# =============================================================================

# DP function used to remove pseudoknots
def opt_all(pairs, return_removed=False, goal='max',
        scoring_function=num_bps):
    """Return a list of pseudoknot-free Pairs objects

    pairs -- Pairs object or list of tuples. One base can only interact
        with one other base, otherwise an error will be raised.
    return_removed -- boolean, if True a list of tuples of (nested pairs,
        removed pairs) will be returned. Default is False --> list of
        nested pairs only is returned.
    goal -- str, 'max' or 'min'. If the goal is 'max' the routine returns
        the solutions maximizing the score; if the goal is 'min' the
        solutions with a minimum score are returned.
    scoring_function -- function that can be applied to a PairedRegion
        object and that returns a numerical score.

    OPTIMIZATION, ALL SOLUTIONS (OA) -- PSEUDOKNOT REMOVAL METHOD

    This method will find all nested structures with an optimal score.
    Since there might be multiple optimal solutions, the return value is
    always a list. If there is only one solution, the list will contain a
    single element.

    The problem is solved by dynamic programming. For each clique of
    mutually conflicting paired regions a matrix is filled out, and the
    best solutions are added to the result. Non-conflicting regions are
    part of the solution, and don't need to be processed.

    The user can specify the goal (maximize or minimize) and the scoring
    function. If one specifies for example goal='max' and
    scoring_function=num_bps, the routine finds the nested structures
    with the maximum number of base pairs.

    See the documentation of dp_matrix_multi for the recursion rules used
    in the approach.
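The optimization objective described above (an optimal total score over a crossing-free subset of paired regions) can be cross-checked on small inputs with a brute-force sketch; `best_nested_subsets` and its helper are hypothetical names, and regions are simplified to (start, end, length) triples scored by length:

```python
from itertools import combinations

def crossing(r1, r2):
    """Two (start, end, length) helices conflict if their outer pairs
    interleave in a knotted fashion."""
    (s1, e1, _), (s2, e2, _) = r1, r2
    return (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)

def best_nested_subsets(regions):
    """Return (best_score, winners): every crossing-free subset of the
    given regions that maximizes the total number of base pairs."""
    best, winners = 0, [()]
    for k in range(1, len(regions) + 1):
        for subset in combinations(regions, k):
            if any(crossing(a, b) for a, b in combinations(subset, 2)):
                continue  # knotted subset, not a valid nested structure
            score = sum(length for _, _, length in subset)
            if score > best:
                best, winners = score, [subset]
            elif score == best:
                winners.append(subset)
    return best, winners

# Two helices knotted with each other: the longer one (3 bp) wins.
best, winners = best_nested_subsets([(0, 20, 3), (10, 30, 2)])
```

A brute-force check like this is exponential in the number of regions, which is exactly why the module uses the dynamic programming matrix instead.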
""" if not pairs.hasPseudoknots(): return [pairs] prs = PairedRegionsFromPairs(pairs) id_to_bl = prs.byId() cm = ConflictMatrix(prs) nc_regions = prs.nonConflicting(cm=cm) cliques = prs.conflictCliques(cm=cm) # basis for all nested structures are the non-conflicting regions result = [PairedRegions(nc_regions)] # resolve conflicts, store survivors and removed for cl in cliques: new_result = [] best = matrix_solutions(cl, goal=goal,\ scoring_function=scoring_function) for best_sol in best: for prev_res in result: new_result.append(prev_res.merge(best_sol)) result = new_result if return_removed: # collect the removed pairs for each solution surviving_ids = [] for sol in result: surviving_ids.append(dict.fromkeys([pr.Id for pr in sol])) removed = [] for sol, surv in zip(result, surviving_ids): rem = [] for pr_id in id_to_bl: if pr_id not in surv: rem.extend(id_to_bl[pr_id].Pairs) rem.sort() removed.append(rem) nested = [prs.toPairs() for prs in result] return zip(nested, removed) nested = [prs.toPairs() for prs in result] return nested # ============================================================================= # MAJORITY OF BASE PAIRS -- CONVENIENCE FUNCTIONS # ============================================================================= def opt_single_random(pairs, return_removed=False, goal='max',\ scoring_function=num_bps): """Return single pseudoknot-free Pairs object with an optimal score pairs -- Pairs object or list of tuples. One base can only interact with one other base, otherwise an error will be raised. return_removed -- boolean, if True a tuple of (nested pairs, removed pairs) will be returned. Default is False --> only nested pairs are returned. goal -- str, 'max' or 'min', if the goal is 'max' the routine returns the solutions maximizing the score, if the goal is 'min' the solutions with a minimum score are returned. scoring_function -- function that can be applied to a PairedRegion object and that returns a numerical score. 
    There might be multiple nested structures with an optimal score. This
    method calculates all of them and returns one at random.

    The user can specify the goal (maximize or minimize) and the scoring
    function. If one specifies for example goal='max' and
    scoring_function=num_bps, the routine finds the nested structures
    with the maximum number of base pairs.
    """
    nested_structs = opt_all(pairs, return_removed, goal,
        scoring_function)
    return choice(nested_structs)


def opt_single_property(pairs, return_removed=False, goal='max',
        scoring_function=num_bps):
    """Return single pseudoknot-free Pairs object with max number of bps

    pairs -- Pairs object or list of tuples. One base can only interact
        with one other base, otherwise an error will be raised.
    return_removed -- boolean, if True a tuple of (nested pairs, removed
        pairs) will be returned. Default is False --> only nested pairs
        are returned.
    goal -- str, 'max' or 'min'. If the goal is 'max' the routine returns
        the solutions maximizing the score; if the goal is 'min' the
        solutions with a minimum score are returned.
    scoring_function -- function that can be applied to a PairedRegion
        object and that returns a numerical score.

    There might be multiple nested structures with an optimal score. This
    method calculates all of them and returns the best one by examining
    some properties. The first criterion is the number of paired regions
    in the returned structure, the second is the average range of the
    regions, the third is the average start value of the regions. It
    returns the structure with the minimum value for these three
    properties. If all properties are the same for multiple structures,
    an error is raised. I believe this can't happen, but if it does the
    behavior can be changed.

    The user can specify the goal (maximize or minimize) and the scoring
    function. If one specifies for example goal='max' and
    scoring_function=num_bps, the routine finds the nested structures
    with the maximum number of base pairs.
    """
    nested_structs = opt_all(pairs, return_removed, goal,
        scoring_function)
    lookup = {}
    if return_removed:
        for p, p_rem in nested_structs:
            prs = PairedRegionsFromPairs(p)
            num_regions = len(prs)
            avg_range = average([pr.range() for pr in prs])
            avg_start = average([pr.Start for pr in prs])
            three = (num_regions, avg_range, avg_start)
            if three not in lookup:
                lookup[three] = []
            lookup[three].append((p, p_rem))
    else:
        for p in nested_structs:
            prs = PairedRegionsFromPairs(p)
            num_regions = len(prs)
            avg_range = average([pr.range() for pr in prs])
            avg_start = average([pr.Start for pr in prs])
            three = (num_regions, avg_range, avg_start)
            if three not in lookup:
                lookup[three] = []
            lookup[three].append(p)
    min_key = min(lookup.keys())
    min_value = lookup[min_key]
    if len(min_value) == 1:
        return min_value[0]
    else:
        # believe this can never happen, but just to be sure...
        raise ValueError("Multiple solutions found with equal properties")


# =============================================================================
# CONFLICT-ELIMINATION APPROACHES
# =============================================================================

def find_max_conflicts(conflicting_ids, cm, id_to_pr):
    """Return region ID of the region involved in the most conflicts

    conflicting_ids -- list of PairedRegion IDs
    cm -- ConflictMatrix object
    id_to_pr -- dict of {region ID: PairedRegion}. Result of the
        PairedRegions.byId() method.

    This method returns the region ID (out of conflicting_ids) involved
    in the most conflicts. If there is a single region with the most
    conflicts, return it. Otherwise compare all regions with the maximum
    number of conflicts on their gain. Gain is the length of the region
    minus the cumulative length of all of its conflicting regions. Return
    the one with the minimum gain. If both properties are equal, return
    the region that starts closest to the 3' end.
    """
    number_of_conflicts = {}
    for pr_id in conflicting_ids:
        noc = len(cm.conflictsOf(pr_id))
        if noc not in number_of_conflicts:
            number_of_conflicts[noc] = []
        number_of_conflicts[noc].append(pr_id)
    max_noc = max(number_of_conflicts.keys())
    max_ids = number_of_conflicts[max_noc]
    if len(max_ids) == 1:
        return max_ids[0]
    else:
        len_diffs = {}
        for pr_id in max_ids:
            pr_len = id_to_pr[pr_id].Length
            conf_len = sum([id_to_pr[i].Length
                for i in cm.conflictsOf(pr_id)])
            diff = pr_len - conf_len
            if diff not in len_diffs:
                len_diffs[diff] = []
            len_diffs[diff].append(pr_id)
        min_ld = min(len_diffs.keys())
        min_ids = len_diffs[min_ld]
        if len(min_ids) == 1:
            return min_ids[0]
        else:
            start_vals = {}
            for pr_id in min_ids:
                start = id_to_pr[pr_id].Start
                start_vals[start] = pr_id
            max_start = max(start_vals.keys())
            return start_vals[max_start]


def find_min_gain(conflicting_ids, cm, id_to_pr):
    """Return region ID of the region with the minimum gain

    conflicting_ids -- list of PairedRegion IDs
    cm -- ConflictMatrix object
    id_to_pr -- dict of {region ID: PairedRegion}. Result of the
        PairedRegions.byId() method.

    This method returns the region ID (out of conflicting_ids) of the
    region that has the minimum gain. Gain is the length of the region
    minus the cumulative length of all of its conflicting regions. It
    expresses how many base pairs are gained if this region is kept and
    all of its conflicts have to be removed. If its gain is positive, it
    is favorable to keep this region. If its gain is negative, it is
    better to remove this region and keep its conflicts.

    If there are multiple regions with the minimal gain, the one involved
    in the most conflicts is returned. If both properties are equal, the
    method returns the region that starts closest to the 3' end.
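The gain criterion defined above can be sketched on its own; `gain` and the dict shapes are hypothetical simplifications (region lengths and conflict lists keyed by region ID) for illustration:

```python
def gain(region_id, lengths, conflicts):
    """Gain of a region = its own length minus the total length of the
    regions it conflicts with: keeping this region forces removal of all
    of its conflicts."""
    return lengths[region_id] - sum(lengths[c] for c in conflicts[region_id])

# 'A' (5 bp) conflicts with 'B' (2 bp) and 'C' (1 bp):
lengths = {'A': 5, 'B': 2, 'C': 1}
conflicts = {'A': ['B', 'C'], 'B': ['A'], 'C': ['A']}
# gain('A') = 5 - 3 = 2  ==> keeping A is favorable
# gain('B') = 2 - 5 = -3 ==> B is a min-gain candidate for removal
```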
""" len_diffs = {} for pr_id in conflicting_ids: pr_len = id_to_pr[pr_id].Length conf_len = sum([id_to_pr[i].Length for i in cm.conflictsOf(pr_id)]) diff = pr_len - conf_len if diff not in len_diffs: len_diffs[diff] = [] len_diffs[diff].append(pr_id) min_ld = min(len_diffs.keys()) min_ids = len_diffs[min_ld] if len(min_ids) == 1: return min_ids[0] else: number_of_conflicts = {} for pr_id in min_ids: noc = len(cm.conflictsOf(pr_id)) if noc not in number_of_conflicts: number_of_conflicts[noc] = [] number_of_conflicts[noc].append(pr_id) max_noc = max(number_of_conflicts.keys()) max_ids = number_of_conflicts[max_noc] if len(max_ids) == 1: return max_ids[0] else: start_vals = {} for pr_id in min_ids: start = id_to_pr[pr_id].Start start_vals[start] = pr_id max_start = max(start_vals.keys()) return start_vals[max_start] def add_back_non_conflicting(paired_regions, removed): """Return new PairedRegions object and new dict of removed regions paired_regions -- PairedRegions object removed -- dict of {region_id: PairedRegion} Helper-function for conflict_elimination. Circular removal might occur in conflict-elimination methods. It means that a particular region is removed and later in the process all of its conflicts are also removed, which result in an eliminated region that doens't conflict with any region in the solution anymore. This methods adds removed regions back into the solution if they don't conflict with any region in the solution. The order in which regions are tried to add is from 5' to 3' starting point. 
""" id_to_pr = paired_regions.byId() new_removed = removed.copy() added = True # process removed from 5' to 3' order = [(pr.Start, pr.Id) for pr in new_removed.values()] order.sort() # from low start value to high start value while added: added = False for start, region_id in order: pr1 = new_removed[region_id] is_conflicting = False for pr2 in id_to_pr.values(): if pr1.conflicting(pr2): is_conflicting = True new_removed[region_id] = pr1 break if not is_conflicting: id_to_pr[region_id] = pr1 del new_removed[region_id] order = [(pr.Start, pr.Id) for pr in new_removed.values()] order.sort() # from low start value to high start value added = True break return PairedRegions(id_to_pr.values()), new_removed # Conflict-elimination heuristic. def conflict_elimination(pairs, sel_function, add_back=True,\ return_removed=False): """Return pseudoknot-free Pairs object pairs -- Pairs object or list of tuples sel_function -- function that takes a list of IDs of conflicting regions, a conflict matrix and a dict of {region_id: PairedRegion} and returns the ID of a paired region that has to be removed. add_back -- boolean, if True regions that are removed but not conflicting at the end because of circular removal are added back into the solution. If False, regions are only removed. This choice might result in too many regions being removed. Default value is True. return_removed -- boolean, if True a tuple of (nested pairs, removed pairs) will be returned. Default is False --> only nested pairs are returned. CONFLICT ELIMINATION -- PSEUDOKNOT REMOVAL METHOD EC -- sel_function=find_max_conflicts EG -- sel_functino=find_min_gain This is the general conflict-elimination function that should be used to remove pseudoknots from a knotted RNA structure. This algorithm removes paired regions one at the time. The order is in which regions are removed is specified by the selection function. Different selection functions can be specified. 
Selection functions should take a list of conflicting IDs, a ConflictMatrix and a dict of {Region ID: PairedRegion} as input and they should return a single PairedRegion ID. Two selection functions are available: find_max_conflicts and find_min_gain. See their documentation for specifications. """ prs = PairedRegionsFromPairs(pairs) id_to_pr = prs.byId() cm = ConflictMatrix(prs) removed = {} conf = cm.conflicting() while conf: to_remove = sel_function(conf, cm, id_to_pr) removed[to_remove] = id_to_pr[to_remove] prs.remove(id_to_pr[to_remove]) id_to_pr = prs.byId() cm = ConflictMatrix(prs) conf = cm.conflicting() # potential circular removal: add regions back in if add_back: # collect IDs of non-conflicting removed regions prs, removed = add_back_non_conflicting(prs, removed) if return_removed: rem = PairedRegions(removed.values()).toPairs() return prs.toPairs(), rem return prs.toPairs() # ============================================================================= # INCREMENTAL APPROACHES # ============================================================================= # Incremental in order (IO) method def inc_order(pairs, reversed=False, return_removed=False): """Return pseudoknot-free Pairs object pairs -- Pairs object or list of tuples. One base can only interact with one other base, otherwise an error will be raised. reversed -- boolean, indicating whether the algorithm adds paired regions from 5' to 3' start values or from 3' to 5' end values. If False, order is 5' to 3', if True, order is 3' to 5'. Default is False. return_removed -- boolean, if True a tuple of (nested pairs, removed pairs) will be returned. Default is False --> only nested pairs are returned. INCREMENTAL IN ORDER (IO) -- PSEUDOKNOT REMOVAL METHOD This algorithm treats all the paired regions in order, either starting at the 5' end (reversed=F) or at the 3' end (reversed=T).
It accepts all the non-conflicting regions; If a region conflicts with an already added region, it is excluded from the solution. """ prs = PairedRegionsFromPairs(pairs) id_to_pr = prs.byId() cm = ConflictMatrix(prs) if reversed: by_pos = [(pr.End, pr) for pr in prs] by_pos.sort() by_pos.reverse() else: by_pos = [(pr.Start, pr) for pr in prs] by_pos.sort() excluded = {} result = PairedRegions() for pos, pr in by_pos: if pr.Id in excluded: continue result.append(pr) for k in cm.conflictsOf(pr.Id): excluded[k] = True if return_removed: removed = Pairs([]) for pr_id in excluded: removed.extend(id_to_pr[pr_id].Pairs) removed.sort() return result.toPairs(), removed return result.toPairs() # Incremental by length (IL) method def inc_length(pairs, reversed=False, return_removed=False): """Return pseudoknot-free Pairs object pairs -- Pairs object or list of tuples. One base can only interact with one other base, otherwise an error will be raised. reversed -- boolean. In case of equal lengths, all paired regions are processed from 5' to 3' starting position. If reversed is True, regions are processed from 3' to 5' starting position. return_removed -- boolean, if True a tuple of (nested pairs, removed pairs) will be returned. Default is False --> only nested pairs are returned. INCREMENTAL BY LENGTH (IL) -- PSEUDOKNOT REMOVAL METHOD This algorithm will process the paired regions from the longest to the shortest. In case there are multiple regions of the same length, the one on the 5' side is added first if reversed=False (3' side is preferred if reversed=True). All paired regions that are conflicting with an already-added region are excluded. 
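The longest-first greedy can be sketched with plain dicts standing in for PairedRegions and the conflict matrix. This is an illustration only (the names are hypothetical, and the 5'/3' tie-breaking on equal lengths is omitted):

```python
def inc_by_length(lengths, conflicts_of):
    """Accept regions longest-first; skip any region that conflicts
    with one already accepted."""
    accepted = []
    excluded = set()
    for rid in sorted(lengths, key=lambda r: -lengths[r]):
        if rid in excluded:
            continue
        accepted.append(rid)
        excluded.update(conflicts_of.get(rid, ()))
    return accepted

# A (length 4) conflicts with B (length 3); C (length 2) is free
lengths = {"A": 4, "B": 3, "C": 2}
conflicts = {"A": {"B"}, "B": {"A"}, "C": set()}
```

The incremental-in-order and incremental-by-range variants differ only in the sort key used in the loop above.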
""" prs = PairedRegionsFromPairs(pairs) id_to_pr = prs.byId() # create conflict matrix to lookup the conflicts cm = ConflictMatrix(prs) length_pos_data = {} # dict of {region_length: [(pr.start, pr)]} for pr in prs: if pr.Length not in length_pos_data: length_pos_data[pr.Length] = [] length_pos_data[pr.Length].append((pr.Start, pr)) for v in length_pos_data.values(): v.sort() if reversed: v.reverse() excluded = {} result = PairedRegions() lengths = length_pos_data.keys() lengths.sort() lengths.reverse() # longest regions first for pr_len in lengths: for pr_start, pr in length_pos_data[pr_len]: if pr.Id not in excluded: result.append(pr) # use the conflict matrix to determine which regions to exclude for k in cm.conflictsOf(pr.Id): excluded[k] = True if return_removed: removed = Pairs([]) for pr_id in excluded: removed.extend(id_to_pr[pr_id].Pairs) removed.sort() return result.toPairs(), removed return result.toPairs() # Incremental by range (IR) method def inc_range(pairs, reversed=False, return_removed=False): """Return pseudoknot-free Pairs object pairs -- Pairs object or list of tuples. One base can only interact with one other base, otherwise an error will be raised. reversed -- boolean. If reversed is True: in case of two regions with the same range the region that starts closest to the 5' side is added first. If reversed is False: the region that starts closest to the 3' side is added first. Default is False (5' region preferred). return_removed -- boolean, if True a tuple of (nested pairs, removed pairs) will be returned. Default is False --> only nested pairs are returned. INCREMENTAL BY RANGE (IR) -- PSEUDOKNOT REMOVAL METHOD This algorithm will process the paired regions from the one with the shortest range to the one with the longest range. 
The range of a region is defined as the distance between the highest upstream position and the lowest downstream position (-1); in other words, it is the number of unpaired bases in the hairpin if this region were the only paired region in the structure. In case there are multiple regions with the same range, the one that starts closest to the 5' side is added first if reversed=False (starting at the 3' side is preferred if reversed=True). All paired regions that are conflicting with an already-added region are excluded. """ prs = PairedRegionsFromPairs(pairs) id_to_pr = prs.byId() # create conflict matrix to lookup the conflicts cm = ConflictMatrix(prs) range_pos_data = {} # dict of {region_range: [(pr.start, pr)]} for pr in prs: rr = pr.range() # region range if rr not in range_pos_data: range_pos_data[rr] = [] range_pos_data[rr].append((pr.Start, pr)) for v in range_pos_data.values(): v.sort() if reversed: v.reverse() ranges = range_pos_data.keys() ranges.sort() result = PairedRegions() excluded = {} for rr in ranges: for pr_start, pr in range_pos_data[rr]: if pr.Id not in excluded: result.append(pr) for k in cm.conflictsOf(pr.Id): excluded[k] = True if return_removed: removed = Pairs([]) for pr_id in excluded: removed.extend(id_to_pr[pr_id].Pairs) removed.sort() return result.toPairs(), removed return result.toPairs() # ============================================================================= # NUSSINOV RESTRICTED # ============================================================================= def nussinov_fill(pairs, size): """Return filled dynamic programming search matrix with number of base pairs pairs -- Pairs object or list of tuples, should be directed (up,down) size -- int, number of rows and columns in the matrix (should be greater than the highest base-paired position) Applies Nussinov-Jacobson algorithm restricted to input list of pairs. This function records the number of base pairs in the optimal (sub)solution.
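The fill recurrence can be sketched without numpy; a plain-list version, assuming directed (up, down) pairs, looks like this (illustrative only, with explicit guards instead of numpy indexing):

```python
def fill(pairs, size):
    """m[i][j] = max number of allowed base pairs nestable within the
    subsequence i..j, restricted to the input pair set."""
    allowed = set(pairs)
    m = [[0] * size for _ in range(size)]
    for j in range(size):
        for i in range(j - 1, -1, -1):
            best = max(m[i + 1][j], m[i][j - 1])  # i or j unpaired
            if (i, j) in allowed:                 # (i, j) pair
                inner = m[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            for k in range(i + 1, j - 1):         # bifurcation
                best = max(best, m[i][k] + m[k + 1][j])
            m[i][j] = best
    return m

# pseudoknotted input: (0,5) crosses (3,8), so only one can be kept
m = fill([(0, 5), (3, 8)], 9)
```

For this crossing input the top-right cell reports a maximum of one nested pair, which is exactly the pseudoknot-removal decision the traceback then realizes.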
""" bp_dict = dict.fromkeys(pairs) m = zeros((size, size), int) for j in range(size): for i in range(j-1,-1,-1): m[i,j] = m[i+1,j] # i unpaired if m[i,j-1] > m[i,j]: # j unpaired m[i,j] = m[i,j-1] if (i,j) in bp_dict and m[i+1,j-1]+1 > m[i,j]: # (i,j) pair m[i,j] = m[i+1,j-1]+ 1 for k in range(i+1,j-1): # bifurcation if m[i,k] + m[k+1,j] > m[i,j]: m[i,j] = m[i,k]+m[k+1,j] return m def nussinov_traceback(m, i, j, pairs): """Return set of base pairs: nested structure with max number of pairs m -- filled DP search matrix i -- int, row coordinate where traceback should start, normally 0 j -- int, column coordinate where traceback should start, normally length-1 pairs -- Pairs object of list of tuples, should be directed (up, down), expect the same list as at the fill stage. Traceback procedure of the Nussinov-Jacobson algorithm that returns a single solution containing the maximum number of base pairs. """ bp_dict = dict.fromkeys(pairs) if m[i,j] == 0: #or if i>=j: return set() if (i,j) in bp_dict and m[i+1,j-1] + 1 == m[i,j]: return set([(i,j)]) | nussinov_traceback(m, i+1, j-1, pairs) for k in range(i,j): if m[i,j] == m[i,k] + m[k+1,j]: return nussinov_traceback(m,i,k,pairs) | \ nussinov_traceback(m,k+1,j,pairs) def nussinov_restricted(pairs, return_removed=False): """Return nested Pairs object containing the maximum number of pairs pairs -- Pairs object or list of tuples [(1,10),(2,9),...]. List of base pairs has to be conflict-free (one base can only pair with one other base) return_removed -- boolean, if True a list of tuples of (nested pairs, removed pairs) will be returned. Default is False --> list of nested pairs only is returned. This function is a modification of the original Nussinov-Jacobson algorithm, restricted to the given list of base pairs. It calculated a nested structure containing the maximum number of base pairs. NOTE: This function is very slow. If have have many base pairs in the list, the size of the matrix grows quickly. 
We recommend using the opt_all (OA) function for larger problems. """ # pairs have to be conflict-free, directed, and sorted p = Pairs(pairs) if p.hasConflicts(): raise ValueError("Cannot handle base pair conflicts") p = p.directed() p.sort() if not p.hasPseudoknots(): nested = p # if not Pseudoknots: return structure else: paired_positions = [x for x,y in p] + [y for x,y in p] paired_positions.sort() if max(paired_positions) > 200: # if the 'sequence' is longer than 200, map the numbers # to remove the 'unpaired' positions. Avoid waste of space mapped_back = dict([(x,y) for x,y in enumerate(paired_positions)]) mapped = dict([(y,x) for x,y in enumerate(paired_positions)]) mapped_pairs = [] for x,y in pairs: mapped_pairs.append((mapped[x],mapped[y])) max_idx = len(paired_positions)+1 m = nussinov_fill(mapped_pairs, size=max_idx) t = nussinov_traceback(m, 0, max_idx-1, mapped_pairs) nested = [] for x,y in t: nested.append((mapped_back[x], mapped_back[y])) else: # sequence short enough, fill matrix directly max_idx = max(filter(None, paired_positions))+1 m = nussinov_fill(p, size=max_idx) nested = nussinov_traceback(m, 0, max_idx-1, p) nested = Pairs(list(nested)) nested.sort() if return_removed: removed = Pairs([]) for bp in p: if bp not in nested: removed.append(bp) return nested, removed else: return nested if __name__ == "__main__": pass PyCogent-1.5.3/cogent/struct/manipulation.py000644 000765 000024 00000013275 12024702176 022126 0ustar00jrideoutstaff000000 000000 """Contains functions to manipulate, i.e.
modify macromolecular entities.""" from numpy import array from itertools import izip from cogent.core.entity import HIERARCHY, copy, StructureHolder, ModelHolder from cogent.maths.geometry import coords_to_symmetry, coords_to_crystal from cogent.struct.selection import einput __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" def clean_ical(entities, pretend=True, mask=True): """Removes or masks entities with ambiguous (i)nsertion (c)odes or (a)lternate (l)ocations. Arguments: - entities: universal input, see ``cogent.struct.selection.einput`` - pretend: If ``True`` only reports icals and does not mask or remove anything. - mask (boolean): If pretend is ``False`` masks entities instead of removing them. This function does not check for occupancy. It retains the residue that sorts first by id number, insertion code and finally name. Residues without IC come first. Atoms within a retained residue are sorted according to PDB rules and the first one is chosen. If the first entity has an IC or alt_loc different from ' ' it will be changed to ' '. """ conflicts = [] changes = [] residues = einput(entities, 'R') id_r = [[None, None, None]] for r in residues.sortedvalues(): # sort by id, ic, name id_a = [[None, None]] if r.res_id == id_r[0][1]: # on collision choose first ... conflicts.append(r.getFull_id()) if not pretend: if mask: r.setMasked(True) else: r.parent.delChild(r.id) continue # an entity could be in other holders # keep it there as-is for a in r.sortedvalues(): # sort by id, alt_loc (' ', 'A' ...)
if a.at_id == id_a[0][0]: # on collision choose first conflicts.append(a.getFull_id()) if not pretend: if mask: a.setMasked(True) else: r.delChild(a.id) else: if a.id[0][1] != ' ': changes.append((a.getFull_id(), ((a.id[0][0], ' '),))) if not pretend: a.setAlt_loc(' ') try: a.parent.updateIds() except AttributeError: pass id_a = a.id if r.id[0][2] != ' ': changes.append((r.getFull_id(), ((r.id[0][0], r.id[0][1], ' '),))) if not pretend: r.set_res_ic(' ') try: r.parent.updateIds() except AttributeError: pass id_r = r.id return (changes, conflicts) def expand_symmetry(model, mode='uc', name='UC', **kwargs): """Applies the symmetry operations defined by the header of the PDB files to the given ``Model`` entity instance. Returns a ``ModelHolder`` entity. Arguments: - model: model entity to expand - mode: 'uc', 'bio' or 'raw' - name: optional name of the ``ModelHolder`` instance. Requires a PDB file with a correct CRYST1 field and space group information. """ structure = model.getParent('S') sh = structure.header fmx = sh['uc_fmx'] omx = sh['uc_omx'] mxs = sh['uc_mxs'] # get initial coordinates atoms = einput(model, 'A') coords = array(atoms.getData('coords')) # expand the coordinates to symmetry all_coords = coords_to_symmetry(coords, fmx, omx, mxs, mode) models = ModelHolder(name) for i in xrange(0, len(mxs)): # copy model new_model = copy(model) # with additional models which new_atoms = einput(new_model, 'A') # patch with coordinates new_coords = all_coords[i] for (atom_id, new_coord) in izip(atoms.keys(), new_coords): new_atoms[atom_id[1:]].coords = new_coord # give it an id: the models are numbered by the symmetry operations with # identity being the first model new_model.setName(i) models.addChild(new_model) return models def expand_crystal(structure, n=1, name='XTAL'): """Expands the contents of a structure to a crystal of a given size. Returns a `` StructureHolder`` entity instance. Arguments: - structure: ``Structure`` entity instance. 
- n: number of unit-cell layers. - name: optional name. Requires a PDB file with correct CRYST1 field and space group information. """ sh = structure.header sn = structure.name fmx = sh['uc_fmx'] omx = sh['uc_omx'] # get initial coordinates atoms = einput(structure, 'A') coords = array([atoms.getData('coords')]) # fake 3D # expand the coordinates to crystal all_coords = coords_to_crystal(coords, fmx, omx, n) structures = StructureHolder(name) rng = range(-n, n + 1) # a range like -2, -1, 0, 1, 2 vectors = [(x, y, z) for x in rng for y in rng for z in rng] for i, (u, v, w) in enumerate(vectors): new_structure = copy(structure) new_atoms = einput(new_structure, 'A') new_coords = all_coords[i, 0] for (atom_id, new_coord) in izip(atoms.keys(), new_coords): new_atoms[atom_id].coords = new_coord new_structure.setName("%s_%s%s%s" % (sn, u, v, w)) structures.addChild(new_structure) return structures PyCogent-1.5.3/cogent/struct/pairs_util.py000644 000765 000024 00000061065 12024702176 021601 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # pairs_util.py """Provides functions related to a Pairs object Functions to adjust Pairs in several ways (e.g. from gapped to ungapped or from ungapped to gapped). Works on strings or Sequence objects, on lists of tuples or Pairs objects. The module also contains several functions for measuring the distance (or similarity) between structures. The metrics from Gardner and Giegerich 2004 are provided.
""" from __future__ import division from string import strip from numpy import array, sqrt, searchsorted, flatnonzero, take, sum from cogent.struct.rna2d import Pairs from cogent.parse.fasta import MinimalFastaParser __author__ = "Sandra Smit and Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Shandy Wikman", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class PairsAdjustmentError(Exception): pass # ================================================================== # Adjustment functions for Pairs objects # ================================================================== def adjust_base(pairs, offset): """Returns new Pairs with values shifted by offset pairs: Pairs object or list of tuples offset: integer Adjusts the base of a pairs object or a list of pairs according to the given offset. There's no validation in here! It is possible negative values are returned -> user responsibility. This method treats all pairs as equal. It'll return a pairs object of exactly the same length as the input, including pairs containing None, and duplicates. 
Example: adjust_base(Pairs([(2,8),(4,None)]), 2) --> [(4,10),(6,None)] """ if not isinstance(offset, int): raise PairsAdjustmentError("adjust_base: offset should be integer") result = Pairs() for x, y in pairs: if x is not None: new_x = x + offset else: new_x = x if y is not None: new_y = y + offset else: new_y = y result.append((new_x, new_y)) assert len(result) == len(pairs) return result def adjust_base_structures(structures, offset): """Adjusts the base of all structures by offset structures: list of Pairs objects offset: integer """ result = [] for struct in structures: result.append(adjust_base(struct, offset)) return result def adjust_pairs_from_mapping(pairs, mapping): """Returns new Pairs object with numbers adjusted according to map pairs: list of tuples or Pairs object mapping: dictionary containing mapping of positions from one state to the other (e.g. ungapped to gapped) For example: {0: 0, 1: 1, 2: 3, 3: 4, 4: 6, 5: 7, 6: 9, 7: 10, 8: 12} When the Pairs object corresponds to an ungapped sequence and you want to insert gaps, use a mapping from ungapped to gapped. When the Pairs object corresponds to a gapped sequence and you want to degap it, use a mapping from gapped to ungapped. """ result = Pairs() for x,y in pairs: if x is None: new_x = None elif x not in mapping: continue else: new_x = mapping[x] if y is None: new_y = None elif y not in mapping: continue else: new_y = mapping[y] result.append((new_x, new_y)) return result def delete_gaps_from_pairs(pairs, gap_list): """Returns Pairs object with pairs adjusted to gap_list pairs: list of tuples or Pairs object gap_list: list or array of gapped positions that should be removed from the pairs object Base pairs of which one of the partners or both of them are in the gap list are removed. If both of them are not in the gap_list, the numbering is adjusted according to the gap_list. When at least one of the two pair members is in the gap_list, the pair will be removed. 
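The renumbering step amounts to subtracting, from every retained position, the count of gap positions that precede it. A plain-Python sketch of this idea (using the stdlib bisect in place of numpy's searchsorted, and ignoring None entries for brevity; the helper name is hypothetical):

```python
from bisect import bisect_left

def delete_gaps(pairs, gap_list):
    """Drop pairs touching a gap position; shift the remaining
    positions down by the number of gaps before them."""
    gaps = sorted(gap_list)
    gapset = set(gaps)
    result = []
    for up, down in pairs:
        if up in gapset or down in gapset:
            continue  # pair touches a gap column: remove it
        result.append((up - bisect_left(gaps, up),
                       down - bisect_left(gaps, down)))
    return result

# gaps at 2 and 5: pair (3,7) becomes (2,5); pair (1,5) is dropped
```

`bisect_left(gaps, p)` is the number of gap positions strictly below p, which is exactly the offset each retained coordinate must shift by.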
The rest of the structure will be left intact. Pairs containing None, duplicates, pseudoknots, and conflicts will be maintained and adjusted according to the gap_list. """ if not gap_list: result = Pairs() result.extend(pairs) return result g = array(gap_list) result = Pairs() for up, down in pairs: if up in g or down in g: continue else: if up is not None: new_up = up - g.searchsorted(up) else: new_up = up if down is not None: new_down = down - g.searchsorted(down) else: new_down = down result.append((new_up, new_down)) return result def insert_gaps_in_pairs(pairs, gap_list): """Adjusts numbering in pairs according to the gap list. pairs: Pairs object gap_list: list of integers, gap positions in a sequence The main assumption is that all positions in pairs correspond to ungapped positions. If this is not true, the result will be meaningless. """ if not gap_list: new = Pairs() new.extend(pairs) return new ungapped = [] for idx in range(max(gap_list)+2): if idx not in gap_list: ungapped.append(idx) new = Pairs() for x,y in pairs: if x is not None: try: new_x = ungapped[x] except IndexError: new_x = ungapped[-1] + (x-len(ungapped)+1) else: new_x = x if y is not None: try: new_y = ungapped[y] except IndexError: new_y = ungapped[-1] + (y-len(ungapped)+1) else: new_y = y new.append((new_x, new_y)) return new def get_gap_symbol(seq): """Return gap symbol. seq: Sequence object or plain string. Should be able to handle cogent.core Sequence and ModelSequence object. If the input sequence doesn't have a MolType, '-' will be returned as default. """ try: gap = seq.MolType.Alphabet.Gap except AttributeError: gap = '-' return gap def get_gap_list(gapped_seq, gap_symbol=None): """Return list of gapped positions. gapped_seq: string or sequence object. Should be able to handle old_cogent.base.sequence object, cogent.core Sequence and ModelSequence object, or plain strings. gap_symbol: gap symbol. Will be used for plain strings.
""" try: gap_list = gapped_seq.gapList() #should work for RnaSequence except AttributeError: try: gap_list = flatnonzero(gapped_seq.gaps()) except AttributeError: gap_list = flatnonzero(array(gapped_seq,'c') == gap_symbol) try: # if gap_list is array, convert it to list gap_list = gap_list.tolist() except AttributeError: #already a list pass return gap_list def degap_model_seq(seq): """Returns ungapped copy of self, not changing alphabet. This function should actually be a method of ModelSequence. Right now the ungapped method is broken, so this is a temporary replacement. """ if seq.Alphabet.Gap is None: return seq.copy() d = take(seq._data, flatnonzero(seq.nongaps())) return seq.__class__(d, Alphabet=seq.Alphabet, Name=seq.Name, \ Info=seq.Info) def degap_seq(gapped_seq, gap_symbol=None): """Return ungapped copy of sequence. Should be able to handle old_cogent.base.sequence object, cogent.core Sequence and ModelSequence object, or plain strings. """ # degap the sequence try: #should work for old and new RnaSequence ungapped_seq = gapped_seq.degap() except AttributeError: try: ungapped_seq = degap_model_seq(gapped_seq) except AttributeError: ungapped_symbols = take(array(list(gapped_seq)),\ flatnonzero((array(list(gapped_seq)) != gap_symbol))) ungapped_seq = ''.join(ungapped_symbols) return ungapped_seq def gapped_to_ungapped(gapped_seq, gapped_pairs): """Returns ungapped sequence and corresponding Pairs object gapped_seq: string of characters (can handle Sequence, ModelSequence, str, or old_cogent Sequence objects). gapped_pairs: Pairs object, e.g. [(3,7),(4,6)]. The Pairs object should correspond to the gapped sequence version. The gap_symbol will be extracted from the sequence object. In case the gapped_seq is a simple str, a '-' will be used as default. 
""" gap_symbol = get_gap_symbol(gapped_seq) gap_list = get_gap_list(gapped_seq, gap_symbol) ungapped_seq = degap_seq(gapped_seq, gap_symbol) ungapped_pairs = delete_gaps_from_pairs(gapped_pairs, gap_list) return ungapped_seq, ungapped_pairs def ungapped_to_gapped(gapped_seq, ungapped_pairs): """Returns gapped sequence (same obj) and corresponding Pairs object gapped_seq: string of characters (can handle Sequence, ModelSequence, str, or old_cogent Sequence objects). ungapped_pairs: Pairs object, e.g. [(3,7),(4,6)]. The Pairs object should correspond to the ungapped sequence version. The gap_symbol will be extracted from the sequence object. In case the gapped_seq is a simple str, a '-' will be used as default. """ gap_symbol = get_gap_symbol(gapped_seq) gap_list = get_gap_list(gapped_seq, gap_symbol) gapped_pairs = insert_gaps_in_pairs(ungapped_pairs, gap_list) return gapped_seq, gapped_pairs # ================================================================== # Distance/similarity measures and logical operations # Pairs comparisons # ================================================================== def pairs_intersection(one, other): """Returns Pairs object with pairs common to one and other one: list of tuples or Pairs object other: list of tuples or Pairs object one and other should map onto a sequence of the same length. """ pairs1 = frozenset(Pairs(one).directed()) #removes duplicates pairs2 = frozenset(Pairs(other).directed()) return Pairs(pairs1&pairs2) def pairs_union(one, other): """Returns the intersection of one and other one: list of tuples or Pairs object other: list of tuples or Pairs object one and other should map onto a sequence of the same length. 
""" pairs1 = frozenset(Pairs(one).directed()) #removes duplicates pairs2 = frozenset(Pairs(other).directed()) return Pairs(pairs1 | pairs2) def compare_pairs(one, other): """Returns size of intersection divided by size of union between two Pairs Use as a similiraty measure for comparing secondary structures. Returns the number of base pairs common to both structures divided by the number of base pairs that is in one or the other structure: (A AND B)/(A OR B) (intersection/union) one: list of tuples or Pairs object other: list of tuples or Pairs object """ if one.hasConflicts() or other.hasConflicts(): raise ValueError("Can't handle conflicts in the structure""") if not one and not other: return 1.0 pairs1 = frozenset(Pairs(one).directed()) #removes duplicates pairs2 = frozenset(Pairs(other).directed()) return len(pairs1 & pairs2)/len(pairs1|pairs2) def compare_random_to_correct(one, other): """Returns fraction of bp in one that is in other (correct) one: list of tuples or Pairs object other: list of tuples or Pairs object Note: the second structure is the one compared against (the correct structure) """ if not one and not other: return 1.0 if not one or not other: return 0.0 pairs1 = frozenset(Pairs(one).directed()) #removes duplicates pairs2 = frozenset(Pairs(other).directed()) return len(pairs1 & pairs2)/len(pairs1) def compare_pairs_mapping(one, other, one_to_other): """Returns intersection/union given a mapping from the first pairs to second Use in case the numbering of the two Pairs object don't correspond. Sort of aligning two ungapped sequences and comparing their Pairs object via a mapping. one: list of tuples or Pairs object other: list of tuples or Pairs object one_to_other: mapping of positions in first pairs object to positions in second pairs object. 
For example: # pos in first seq, base, pos in second seq #1 U 0 #2 C 1 #3 G 2 #4 A 3 # A 4 #5 C 5 #6 C 6 #7 U #8 G 7 mapping = {1:0, 2:1, 3:2, 4:3, 5:5, 6:6, 7:None, 8:7} """ if not one and not other: return 1.0 just_in_first = 0 just_in_second = 0 in_both = 0 pairs1 = Pairs(one).directed() #removes duplicates pairs2 = Pairs(other).directed() for x,y in pairs1: other_match = (one_to_other[x],one_to_other[y]) if other_match in pairs2: in_both += 1 pairs2.remove(other_match) else: just_in_first += 1 just_in_second += len(pairs2) return in_both/(just_in_first + in_both + just_in_second) # =========================================================== # Gardner & Giegerich 2004 metrics # =========================================================== ACCEPTED = dict.fromkeys(map(tuple,["GC","CG","AU","UA","GU","UG"])) def check_structures(ref, predicted): """Raise ValueError if one of the two structures contains conflicts""" if ref.hasConflicts(): raise ValueError("Reference structure contains conflicts") if predicted.hasConflicts(): raise ValueError("Predicted structure contains conflicts") def get_all_pairs(sequences, min_dist=4): """Return number of possible base pairs in the sequence sequences: list of Sequence objects or strings min_dist: integer, minimum distance between two members of a base pair. Default is 4 (i.e. minimum of 3 unpaired bases in a loop) The number of pairs is defined as all possible GC, AU, and GU pairs, respecting the minimum distance between the two members of a base pair. This method returns the average number of possible base pairs over all provided sequences.
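The count itself is a double loop over position pairs at least min_dist apart, keeping those whose bases form a canonical or wobble pair. A minimal single-sequence sketch (the ACCEPTED set is restated locally so the snippet is self-contained; the function name is illustrative):

```python
ACCEPTED = {("G", "C"), ("C", "G"), ("A", "U"),
            ("U", "A"), ("G", "U"), ("U", "G")}

def possible_pairs(seq, min_dist=4):
    """Count positions (x, y) with y >= x + min_dist whose bases can
    form a GC, AU, or GU pair."""
    s = seq.upper()
    return sum(1 for x in range(len(s) - min_dist)
                 for y in range(x + min_dist, len(s))
                 if (s[x], s[y]) in ACCEPTED)

# "GGGAAACCC": each of the three G's can reach all three C's,
# but the A's have no U partner, giving 9 possible pairs
```

This quantity serves as the pool from which true negatives are estimated in the Gardner & Giegerich counts below.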
""" if min_dist < 1: raise ValueError("Minimum distance should be >= 1") if not sequences: return 0.0 tn_counts = [] for seq in sequences: seq_str = str(seq).upper() seq_count = 0 #print 'xrange', range(len(seq)-min_dist) for x in range(len(seq)-min_dist): for y in range(x+min_dist,len(seq)): if (seq_str[x],seq_str[y]) in ACCEPTED: #print x,y, seq_str[x], seq_str[y], 'Y' seq_count += 1 else: pass #print x,y, seq_str[x], seq_str[y], 'N' tn_counts.append(seq_count) return sum(tn_counts)/len(tn_counts) def get_counts(ref, predicted, split_fp=False, sequences=None, min_dist=4): """Return TP, TN, FPcont, FPconf FPcomp, FN counts""" result = dict.fromkeys(['TP','TN','FN','FP',\ 'FP_INCONS','FP_CONTRA','FP_COMP'],0) ref_set = frozenset(Pairs(ref).directed()) pred_set = frozenset(Pairs(predicted).directed()) ref_dict = dict(ref.symmetric()) pred_dict = dict(predicted.symmetric()) tp_pairs = ref_set.intersection(pred_set) fn_pairs = ref_set.difference(pred_set) fp_pairs = pred_set.difference(ref_set) result['TP'] = len(tp_pairs) result['FN'] = len(fn_pairs) result['FP'] = len(fp_pairs) if split_fp: fp_incons = [] fp_contra = [] fp_comp = [] for x,y in fp_pairs: if x in ref_dict or y in ref_dict: #print "Conflicting: %d - %d"%(x,y) fp_incons.append((x,y)) else: five_prime = x three_prime = y contr_found = False for idx in range(x,y+1): if idx in ref_dict and\ (ref_dict[idx] < five_prime or\ ref_dict[idx] > three_prime): #print "Contradicting: %d - %d"%(x,y) contr_found = True fp_contra.append((x,y)) break if not contr_found: #print "Comatible: %d - %d"%(x,y) fp_comp.append((x,y)) result['FP_INCONS'] = len(fp_incons) result['FP_CONTRA'] = len(fp_contra) result['FP_COMP'] = len(fp_comp) assert result['FP_INCONS'] + result['FP_CONTRA'] + result['FP_COMP'] ==\ result['FP'] if sequences: num_possible_pairs = get_all_pairs(sequences, min_dist) result['TN'] = num_possible_pairs - result['TP'] -\ result['FP_INCONS'] - result['FP_CONTRA'] return result def extract_seqs(seqs): 
"""Return list of sequences as strings. seqs could either be: -- a long string in fasta format: ">seq1\nACGUAGC\n>seq2\nGGUAGCG" -- a list of lines in fasta format: [">seq1","ACGUAGC",">seq2","GGUAGCG"] -- a list of sequences (strings or objects): ['ACGUAGC','GGUAGCG'] """ if isinstance(seqs, str): #assume fasta string result = [v for (l,v) in list(MinimalFastaParser(seqs.split('\n')))] elif isinstance(seqs, list): seq_strings = map(strip,map(str, seqs)) if seq_strings[0].startswith('>'): #list of fasta lines result = [v for l,v in list(MinimalFastaParser(seq_strings))] else: result = seq_strings else: raise Exception result = [s.replace('T','U') for s in result] return result def sensitivity_formula(counts): """Return sensitivity counts: dict of counts, containing at least TP and FN """ tp = counts['TP'] fn = counts['FN'] if not tp and not fn: return 0.0 sensitivity = tp/(tp + fn) return sensitivity def selectivity_formula(counts): """Return selectivity counts: dict of counts, containing at least TP, FP, and FP_COMP """ tp = counts['TP'] fp = counts['FP'] fp_comp = counts['FP_COMP'] if not tp and fp==fp_comp: return 0.0 selectivity = tp/(tp + (fp - fp_comp)) return selectivity def ac_formula(counts): """Return approximate correlation counts: dict of counts, containing at least TP, FP, and FP_COMP """ sens = sensitivity_formula(counts) sel = selectivity_formula(counts) return (sens+sel)/2 def cc_formula(counts): """Return correlation coefficient counts: dict of counts, containing at least TP, TN, FN, FP, and FP_COMP """ tp = counts['TP'] tn = counts['TN'] fp = counts['FP'] fn = counts['FN'] comp = counts['FP_COMP'] sens = sensitivity_formula(counts) sel = selectivity_formula(counts) N = tp+ (fp-comp) + fn + tn cc = 0.0 if tp >0: cc = (N*sens*sel-tp)/sqrt((N*sens-tp)*(N*sel-tp)) return cc def mcc_formula(counts): """Return correlation coefficient counts: dict of counts, containing at least TP, TN, FN, FP, and FP_COMP """ tp = counts['TP'] tn = counts['TN'] fp = 
counts['FP'] fn = counts['FN'] comp = counts['FP_COMP'] mcc_quotient = (tp+fp-comp)*(tp+fn)*(tn+fp-comp)*(tn+fn) if mcc_quotient > 0: mcc = (tp*tn-(fp-comp)*fn)/sqrt(mcc_quotient) else: raise ValueError("mcc_quotient <= 0: %.2f"%(mcc_quotient)) return mcc def sensitivity(ref, predicted): """Return sensitivity of the predicted structure ref: Pairs object -> reference structure (true structure) predicted: Pairs object -> predicted structure Formula: sensitivity = tp/(tp + fn) tp = True positives fn = False negatives """ check_structures(ref, predicted) if not ref and not predicted: return 1.0 elif not predicted: return 0.0 counts = get_counts(ref, predicted) return sensitivity_formula(counts) def selectivity(ref,predicted): """Return selectivity of the predicted structure ref: Pairs object -> reference structure (true structure) predicted: Pairs object -> predicted structure Formula: selectivity = tp/(tp+fp-fp_comp) tp = True positives fp = False positives fp_comp = compatible fp pairs """ check_structures(ref, predicted) if not ref and not predicted: return 1.0 elif not predicted: return 0.0 counts = get_counts(ref, predicted, split_fp=True) return selectivity_formula(counts) def selectivity_simple(ref, predicted): """Return selectivity without subtracting compatible false positives ref: Pairs object -> reference structure (true structure) predicted: Pairs object -> predicted structure Formula: selectivity = tp/(tp+fp) tp = True positives fp = False positives Not considering compatible false positives. 
As implemented in Dowell 2004 """ check_structures(ref, predicted) if not ref and not predicted: return 1.0 elif not predicted: return 0.0 counts = get_counts(ref, predicted) tp = counts['TP'] fp = counts['FP'] if not tp: #and fp==fp_comp: return 0.0 selectivity = tp/(tp + fp) return selectivity def approximate_correlation(ref, predicted, seqs): """Return the approximate correlation between sensitivity and selectivity ref: Pairs object -> reference structure (true structure) predicted: Pairs object -> predicted structure For the specific case of RNA structure comparisons, the Matthews correlation coefficient can be approximated by the arithmetic mean or geometric mean of sensitivity and selectivity Formula: ac = (sensitivity+selectivity)/2 """ check_structures(ref, predicted) counts = get_counts(ref, predicted, split_fp=True) return ac_formula(counts) def correlation_coefficient(ref, predicted, seqs, min_dist=4): """Return correlation coefficient to relate sensitivity and selectivity Implementation copied from compare_ct.pm Always same result as MCC? """ check_structures(ref, predicted) sequences = extract_seqs(seqs) counts = get_counts(ref, predicted, sequences=sequences, split_fp=True,\ min_dist=min_dist) return cc_formula(counts) def mcc(ref, predicted, seqs, min_dist=4): """Return the Matthews correlation coefficient ref: Pairs object -> reference structure (true structure) predicted: Pairs object -> predicted structure seqs: list of sequences, necessary to compute the number of true negatives. See documentation of extract_seqs function for accepted formats. min_dist: minimum distance required between two members of a base pair. Needed to calculate the number of true negatives. """ check_structures(ref, predicted) if not ref and not predicted: return 1.0 elif not predicted: return 0.0 elif not seqs: raise ValueError, 'No sequence provided!'
sequences = extract_seqs(seqs) counts = get_counts(ref, predicted, sequences=sequences, split_fp=True,\ min_dist=min_dist) return mcc_formula(counts) def all_metrics(ref, predicted, seqs, min_dist=4): """Return dictionary containing the values of five metrics ref: Pairs object -> reference structure (true structure) predicted: Pairs object -> predicted structure seqs: list of sequences, necessary to compute the number of true negatives. See documentation of extract_seqs function for accepted formats. min_dist: minimum distance required between two members of a base pair. Needed to calculate the number of true negatives. the metrics returned are: sensitivity, selectivity, approximate correlation, correlation coefficient, and Matthews correlation coefficient """ check_structures(ref, predicted) result = {} if not ref and not predicted: # set all to 1.0 for i in ['SENSITIVITY','SELECTIVITY','AC','CC','MCC']: result[i] = 1.0 return result elif not predicted: # set all to 0.0 for i in ['SENSITIVITY','SELECTIVITY','AC','CC','MCC']: result[i] = 0.0 return result elif not seqs: raise ValueError, 'No sequence provided!' sequences = extract_seqs(seqs) counts = get_counts(ref, predicted, sequences=sequences, split_fp=True,\ min_dist=min_dist) result['SENSITIVITY'] = sensitivity_formula(counts) result['SELECTIVITY'] = selectivity_formula(counts) result['AC'] = ac_formula(counts) result['CC'] = cc_formula(counts) result['MCC'] = mcc_formula(counts) return result if __name__ == "__main__": pass PyCogent-1.5.3/cogent/struct/rna2d.py000644 000765 000024 00000112432 12024702176 020427 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Code for handling RNA secondary structure. RNA secondary structures can be represented in many different ways. The representation makes a large difference to the efficiency of different algorithms, so several different structural representations (and the means to interconvert them) are provided. 
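As a concrete illustration of interconverting the representations this module describes, a Vienna dot-bracket string maps to a base-pair list with a simple stack. The sketch below is standalone and illustrative (the `vienna_to_pairs` name is not part of the module); it mirrors the stack algorithm that `StructureString.toPairs` uses further down.

```python
# Stack-based dot-bracket -> pair-list conversion: every '(' pushes its
# index; every ')' pairs with the most recently opened position.
def vienna_to_pairs(struct):
    stack = []
    pairs = []
    for i, symbol in enumerate(struct):
        if symbol == '(':
            stack.append(i)                 # open a pair
        elif symbol == ')':
            pairs.append((stack.pop(), i))  # close the innermost open pair
    if stack:                               # unmatched '(' left over
        raise IndexError("Too many open pairs in structure: %s" % struct)
    return sorted(pairs)
```

For example, `vienna_to_pairs('((..))')` yields `[(0, 5), (1, 4)]`; the nesting of brackets is exactly what lets the structure also be represented as a tree.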
Provides following classes: Stem: representation of a stem in a secondary structure Partners: list holding partner of each position. Pairs: list of base pairs in a structure. StructureString: string holding a secondary structure representation. ViennaStructure: representation of Vienna-format RNA structure. WussStructure: Wash U secondary structure format, handles pseudoknots. StructureNode: for tree representation of nested RNA structure. """ from numpy import zeros, put from cogent.util.transform import make_trans, float_from_string from cogent.core.tree import TreeNode, TreeError from cogent.util.misc import not_none, flatten __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class PairError(ValueError): """Base class for errors in pairing.""" pass class Stem(object): """Holds a Start, an End, and a Length. Note that the Start and the End pair with each other, thus spanning the range. Successive pairs count up from the Start and down from the End. Stem is intended to be _very_ lightweight and does no error checking. In other words, it will let you specify a Start that's after the End, a Length that can't exist because there's insufficient separation between the Start and the End, and so on. The same principle applies to __getitem__ (and hence __iter__): you can iterate through pairs in a stem that can't exist, e.g. Stem(6, 7, 10) will give you the 10 items from Stem(6,7,1) through Stem(15, -2, 1) -- often not what you want. Similarly, when you initialize a Stem it detects whether Start and End are both set, and makes a pair accordingly, but updating the Start or End will not re-do the check to see if a pair has formed or broken. 
The alternative is to make Start, End and Length into properties that reset the state of the others when updated, but this makes Stem too slow for applications such as BayesFold. """ __slots__ = ['Start', 'End', 'Length'] def __init__(self, Start=None, End=None, Length=0): """Returns a new Stem object.""" self.Start = Start self.End = End #set length if specified if Length: self.Length = Length #otherwise, set to 1 if paired and 0 if unpaired elif Start is None or End is None: self.Length = 0 else: self.Length = 1 def __len__(self): """Returns self.Length.""" return self.Length def __getitem__(self, item): """Masquerades as a list of single-base stems.""" length = self.Length #bounds check, converting negative indices if item < 0: item = length + item if (item < 0) or (item >= length): raise IndexError, "Index %s out of range." % item #return appropriate base pair return Stem(self.Start + item, self.End - item) def __cmp__(self, other): """Sorts by start, then by end, then by length (if possible).""" return cmp(self.Start, other.Start) or cmp(self.End, other.End) \ or cmp(self.Length, other.Length) def extract(self, seq): """Returns bases in pairs as list of tuples. Note: always returns list, even if only one base pair. """ if self.Length > 1: return flatten([p.extract(seq) for p in self]) else: if self.Start is not None: start = seq[self.Start] else: start = None if self.End is not None: end = seq[self.End] else: end = None return [(start, end)] def __hash__(self): """Hashes the same as a tuple of (start, end, length). WARNING: if you change any of the values once an object is in a dict, you'll get unpleasant results. Don't do it! """ return hash((self.Start, self.End, self.Length)) def __str__(self): """String representation contains Start,End,Length.""" return '(%s,%s,%s)' % (self.Start, self.End, self.Length) def __nonzero__(self): """Nonzero if length > 0.""" return self.Length > 0 class Partners(list): """Holds list p such that p[i] is the index of the partner of i, or None.
Primarily useful for testing whether a specified base is paired and, if so, extracting its partner. Each base may have precisely 0 or 1 partners. If A pairs with B, B must pair with A. All inconsistencies will be removed by setting previous partners to None. Checking for conflicts and raising errors should be done in method that constructs the Partners. If constructing by hand, should initialize with list of [None] * seq_length. Typically, Partners will be constructed by code from some other data. Use the EmptyPartners(n) factory function to get an empty Partners list of length n. """ def __setitem__(self, index, item): """Sets self[index] to item, enforcing integrity constraints.""" if index == item: raise ValueError, "Cannot set base %s to pair with itself." % item #if item already paired, make its old partner unpaired (note: the #explicit None checks matter, since base 0 is falsy but valid) if item is not None and self[item] is not None: self[self[item]] = None #if already paired, need to make partner unpaired curr_partner = self[index] if curr_partner is not None: list.__setitem__(self, curr_partner, None) #set self[index] to item list.__setitem__(self, index, item) #if item is not None, set self[item] to index if item is not None: list.__setitem__(self, item, index) def toPairs(self): """Converts the partners to sorted list of pairs.""" result = Pairs() for first, second in enumerate(self): if first < second: result.append((first, second)) return result def _not_implemented(self, *args, **kwargs): """Raises NotImplementedError for 'naughty' methods. Not allowed any methods that insert/remove items or that change the order of the items, including things like sort or reverse. """ raise NotImplementedError __delitem__ = __delslice__ = __iadd__ = __imul__ = __setslice__ = append \ = extend = insert = pop = remove = reverse = sort = _not_implemented def EmptyPartners(length): """Returns empty list of Partners with specified length.""" return Partners([None] * length) class Pairs(list): """Holds list of base pairs, each of which is a 2-element sequence.
This is a very lightweight object for storing base pairs, and does not perform any validation. Useful as an intermediate in many different calculations. """ def toPartners(self, length, offset=0, strict=True): """Returns a Partners object, if possible. length of resulting sequence must be specified. offset is optional, and is added to each index. strict specifies whether collisions cause fatal errors. If not strict, conflicts will be removed by the Partners object. """ result = EmptyPartners(length) for up, down in self: upstream = up + offset downstream = down + offset if result[upstream] is not None or result[downstream] is not None: if strict: raise ValueError, "Pairs contain conflicting partners: %s"\ % self result[upstream] = downstream return result def toVienna(self, length, offset=0, strict=True): """Returns a Vienna structure string, if possible. length of resulting sequence must be specified. Instead of parsing the sequence length, you can also throw in an object that has the required length (such as the sequence that the structure corresponds to). offset is optional, and is added to each index. strict specifies whether collisions cause fatal errors. """ if self.hasPseudoknots(): raise PairError, "Pairs contains pseudoknots %s"%(self) try: length = int(length) except ValueError: #raised when length can't be converted to int length = len(length) p = self.directed() result = ['.'] * length for up, down in p: try: upstream = up + offset downstream = down + offset except TypeError: continue if strict: if (result[upstream] != '.') or (result[downstream] != '.'): raise ValueError, "Pairs contain conflicting partners: %s"\ % self result[upstream] = '(' result[downstream] = ')' return ViennaStructure(''.join(result)) def tuples(self): """Converts all pairs in self to tuples, in place. Useful for constructing dicts and for sorting (otherwise, pairs of different types, e.g. lists and tuples, will sort according to type rather than to position).
""" self[:] = map(tuple, self) def unique(self): """Returns copy of self omitting duplicate pairs, preserving order. Keeps the first occurrence of each pair. """ seen = {} result = [] for p in map(tuple, self): if p not in seen: seen[p] = True result.append(p) return Pairs(result) def directed(self): """Returns copy of self where all pairs are (upstream, downstream). Omits any unpaired bases and any duplicates. Result is in arbitrary order. """ seen = {} for up, down in self: if (up is None) or (down is None): continue #omit unpaired bases if up > down: up, down = down, up seen[(up, down)] = True result = seen.keys() return Pairs(result) def symmetric(self): """Retruns copy of self where each up, down pair has a down, up pair. Result is in arbitrary order. Double pairs and pairs containing None are left out. """ result = self.directed() result.extend([(down, up) for up, down in result]) return Pairs(result) def paired(self): """Returns copy of self omitting items where a 'partner' is None.""" return Pairs(filter(not_none, self)) def hasPseudoknots(self): """Returns True if the pair list contains pseudoknots. (good_up,good_down) <=> (checked_up,checked_down) pseudoknot if checked_upgood_down """ pairs = self.directed() seen = [] # list of pairs against which you compare each time pairs.sort() for pair in pairs: if not seen: seen.append(pair) else: lastseen_up, lastseen_down = seen[-1] while pair[0] > lastseen_down: seen.pop() if not seen: break else: lastseen_up,lastseen_down = seen[-1] if not seen: seen.append(pair) continue if pair[1]>lastseen_down: #pseudoknot found return True else: #good pair seen.append(pair) return False def hasConflicts(self): """Returns True if the pair list contains conflicts. Conflict occurs when a base has two different partners, or is asserted to be both paired and unpaired. 
""" partners = {} for first, second in self: if first is None: if second is None: continue #no pairing info else: first, second = second, first #swap order so None is 2nd if second is None: #check first isn't paired if partners.get(first, None) is not None: return True else: partners[first] = None else: #first and second were both non-empty: check partners if first in partners: if partners[first] != second: return True if second in partners: if partners[second] != first: return True #add current pair to the list of constraints partners[first] = second partners[second] = first #can only get here if there weren't conflicts return False def mismatches(self, sequence, pairs=None): """Counts pairs that can't form in sequence. Sequence must have a Pairs property that acts like a dictionary containing a 2-element tuple for each valid pair. Can also pass in the pairs explicitly. """ mismatches = 0 if pairs is None: try: pairs = sequence.Alphabet.Pairs except AttributeError: pairs = sequence.Pairs for up, down in self.directed(): curr = (sequence[up], sequence[down]) if curr not in pairs: mismatches += 1 return mismatches wuss_to_vienna_table = make_trans('<([{>)]} ', '(((()))) ', '.') def wuss_to_vienna(data): """Converts WUSS format string to Vienna format. Any pseudoknots or unrecognized chars will convert to unpaired bases. Spaces will be preserved. """ return ViennaStructure(data.translate(wuss_to_vienna_table)) class StructureString(str): """Base class for ViennaStructure and WussStructure. Immutable. StructureString holds a structure and a energy. By default energy is set to None. If you compare two StructureStrings the structure is the only important thing, since the energy is relative to the associated sequence. 
""" Alphabet=None StartSymbols = '' #dict of symbols that start base pairs EndSymbols = '' #dict of symbols that end base pairs def __new__(cls, Structure, Energy=None): """Returns new StructureString.""" a = cls.Alphabet if a: for i in Structure: if i not in a: raise ValueError,\ "Tried to include unknown symbol '%s'" % i return str.__new__(cls,Structure) def __init__(self, Structure='', Energy=None): """Initializes StructureString with Structure and optionally Energy.""" self.Energy = Energy self.toPartners() def __str__(self): """Returns string representaion of structure and energy, if known. Energy = 0 is different from Energy = None. Latter case is not printed. """ if not self.Energy == None: return self + ' (' + str(self.Energy) + ')' else: return str.__str__(self) def toPartners(self): """Makes list containing partner of each position. Constructs a list from 0 to the number of bases, where each position contains the index of its pair (or None if it is unpaired). Note that the numbering starts at 0 for the first position. The algorithm here relies on the fact that any completely nested base-paired structure (no pseudoknots!) can be formally represented as a tree. Consequently, when you hit a closed base pair, you know that it it must pair with the last base pair you opened. """ num_bases = len(self) #number of bases result = [None] * len(self) #array of None, one for each base stack = [] start = self.StartSymbols end = self.EndSymbols for i, symbol in enumerate(self): if symbol in start: #open a pair stack.append(i) elif symbol in end: #close a pair curr = stack.pop() #return and delete last element result[i] = curr #make i pair with the last element... result[curr] = i #...and the last element pair with i #test whether there are any open pairs left unaccounted for if stack: raise IndexError, \ "Too many open pairs in structure:\n%s" % self return Partners(result) def toPairs(self): """Makes list of (upstream,downstream) partners. 
Note that the numbering starts at 0 for the first position. Key will always be smaller than value. Result is in arbitrary order. """ result = {} stack = [] start = self.StartSymbols end = self.EndSymbols for i, symbol in enumerate(self): if symbol in start: #open a pair stack.append(i) elif symbol in end: #close a pair result[stack.pop()] = i #test whether there are any open pairs left unaccounted for if stack: raise IndexError, \ "Too many open pairs in structure:\n%s" % self return Pairs([(key,result[key]) for key in result]) def toTree(self): """Returns tree version of structure. Each node in the tree corresponds to a loop. Runs of nodes with single non-leaf children correspond to stems. """ root = StructureNode() curr_node = root start = self.StartSymbols end = self.EndSymbols for index, symbol in enumerate(self): if symbol in end: curr_node.End = index curr_node.Length = 1 curr_node = curr_node.Parent else: new_node = StructureNode() new_node.Start = index curr_node.Children.append(new_node) new_node.Parent = curr_node if symbol in start: curr_node = new_node return root class ViennaStructure(StructureString): """Contains a Vienna dot-bracket structure, possibly with energy.""" Alphabet = dict.fromkeys('(.)') StartSymbols = {'(':None} #dict of symbols that start base pairs EndSymbols = {')':None} #dict of symbols that end base pairs class WussStructure(StructureString): """Contains a Wuss Structure.""" Alphabet = dict.fromkeys( '(<{[.~-,:_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)>}]') StartSymbols = dict.fromkeys('(<{[')#dict of symbols that start base pairs EndSymbols = dict.fromkeys(')>}]') #dict of symbols that end base pairs def Vienna(data,Energy=None): """Tries to extract structure and energy from string data. Returns (structure, energy) where energy might be None. structure is just anything before the first space: doesn't validate. 
""" pieces = data.strip().split(None, 1) if not pieces: return ViennaStructure('', Energy) else: if not Energy: try: energy = float_from_string(pieces[1]) except (TypeError, ValueError, IndexError): energy = Energy else: #energy given by user overrules the one in structure energy = Energy return ViennaStructure(pieces[0], energy) class StructureNode(TreeNode): """Operations on StructureNodes, trees, and subtrees.""" Types = ["Stem", "Loop", "Bulge", "Junction", "End", "Flexible"] def __init__(self, Data=None, Children=None, Parent=None): """Returns a new StructureNode object.""" #coerce data into correct type if Data is None: Data = Stem() elif not isinstance(Data, Stem): Data = Stem(Data) #initalize as TreeNode: will forward attributes to Data super(StructureNode, self).__init__(None, Children, Parent) self.Data = Data def _get_stems(self): """Returns nodes corresponding to stems leaving self.""" return [i for i in self if i.IsPaired] def _get_unpaired(self): """Returns nodes corresponding to unpaired bases leaving self.""" return [i for i in self if not i.IsPaired] Stems = property(_get_stems) Unpaired = property(_get_unpaired) def _get_end(self): """Gets End from self's data.""" return self.Data.End def _set_end(self, val): """Sets End in self's data.""" self.Data.End = val def _get_start(self): """Gets Start from self's data.""" return self.Data.Start def _set_start(self, val): """Sets Start in self's data.""" self.Data.Start = val def _get_length(self): """Gets Length from self's data.""" return self.Data.Length def _set_length(self, val): """Sets Length in self's data.""" self.Data.Length = val def _get_index(self): """Accessor for index: returns position of self in parent's list.""" if self.Parent is None: return None else: ids = map(id, self.Parent.Children) return ids.index(id(self)) def _set_index(self, index): """Mutator for index: moves self to new location in parent's list. 
NOTE: index is relative to the new list after self is removed, not to the old list before self is removed. """ if self.Parent is None: raise TreeError, "Can't set Index in node %s without parent." % self else: curr_parent = self.Parent curr_parent.removeNode(self) curr_parent.Children.insert(index, self) End = property(_get_end, _set_end) Start = property(_get_start, _set_start) Length = property(_get_length, _set_length) Index = property(_get_index, _set_index) def _is_paired(self): """Returns True if self is paired, false otherwise.""" return self.End is not None IsPaired = property(_is_paired) def _get_type(self): """Returns type of the node, depending on stems in and out.""" #easy cases first if self.Parent is None: return "Root" if self.IsPaired: return "Stem" #if grandparent is None (i.e. it is a child of the root node), either #end or 'flexible' depending on whether the current node is outside #the first and last helices or between them. if self.Parent.Parent is None: stems = self.Parent.Stems if not stems: return "End" first, last = stems[0], stems[-1] index = self.Index if index < stems[0].Index or index > stems[-1].Index: return "End" else: return "Flexible" #otherwise, depends on number of stems coming out of parent: if none, #it's a loop, if one, it's a bulge, and otherwise it's a junction. else: out_stems = self.Parent.Stems if not out_stems: return "Loop" elif len(out_stems) == 1: return "Bulge" else: return "Junction" #should never get down here, since all cases are handled above raise ValueError, '_get_type failed on node with start %s, end %s' % \ (self.Start, self.End) Type = property(_get_type) def renumber(self, start=0): """Renumbers self and all child nodes consecutively. Returns the next number to be used. 
""" if self.Parent is None: #no number for root node curr = start else: self.Start = start curr = start + max(1, self.Length) #still add 1 if it's unpaired for i in self: curr = i.renumber(curr) if self.IsPaired: curr += self.Length self.End = curr - 1 return curr def classify(self, terminate=True): """Returns string containing site classification""" #check whether we're at the original node or in an internal node if not terminate: #if it's paired, need to add 'S' for self, plus handle children if self.IsPaired: result = ['S'] for i in self: result.extend(i.classify(False)) result.append('S') return result #otherwise, it's unpaired and we need to figure out what it is else: return self.Type[0] #just want first letter else: #if it's the root, just need to handle children result = [] for i in self: result.extend(i.classify(False)) return ''.join(result) def __str__(self): """Returns string representation of tree, in Vienna format.""" if self.IsPaired: prefix = '(' * self.Length suffix = ')' * self.Length return prefix + ''.join(map(str, self)) + suffix elif self.Parent is None: #root node return ''.join(map(str, self)) else: #unpaired base return '.' def unpair(self): """Breaks the first pair represented by the current node, if any. Returns True if the node is changed, False if it wasn't. """ if self.IsPaired: curr_idx = self.Index first = StructureNode(Data=Stem(self.Start)) last = StructureNode(Data=Stem(self.End)) if self.Length > 1: #not melting the whole helix self.Start += 1 self.End -= 1 self.Length -= 1 result = [first, self, last] else: #melting the whole helix result = [first] + self.Children + [last] #replace current record in parent with the result #note use of a slice assignment instead of an index! This is to #replace with the elements, not with a list of the elements. self.Parent[curr_idx:curr_idx+1] = result return True else: return False def _pair_before_indices(self): """Detects the pair of positions before the current node. 
Returns a tuple of (upstream, downstream) if possible; None otherwise. """ curr_idx = self.Index curr_parent = self.Parent #bail out if it's the first or last base if (curr_idx == 0) or (curr_idx == len(curr_parent) - 1): return None #bail out if the base before and after self aren't unpaired before = curr_parent[curr_idx - 1] after = curr_parent[curr_idx + 1] if before.IsPaired or after.IsPaired: return None return (before.Start, after.Start) def pairBefore(self): """Forms a pair before the current node, if possible. Returns True if the pair was successfully created, False otherwise. """ #get the indices of the pair, if possible indices = self._pair_before_indices() if not indices: return False #cache the current index and parent curr_idx = self.Index curr_parent = self.Parent #create a new pair, and add self to it new_pair = StructureNode(Data=Stem(indices[0], indices[1], 1)) new_pair.Children.append(self) self.Parent = new_pair #add the new pair to parent, and delete its unpaired siblings curr_parent.Children.insert(curr_idx, new_pair) new_pair.Parent = curr_parent del curr_parent.Children[curr_idx + 1] del curr_parent.Children[curr_idx - 1] return True def _pair_after_indices(self): """Returns indices of the pair after the current node, if possible. Returns tuple containing the indices if possible, None otherwise. """ #can't work if one or fewer children if len(self) < 2: return None before = self[0] after = self[-1] if before.IsPaired or after.IsPaired: return None return (before.Start, after.Start) def pairAfter(self): """Forms a pair after the current node, if possible. Returns True if the pair was successfully created, False otherwise. Note that all children of self will become children of the newly-created pair.
""" indices = self._pair_after_indices() if not indices: return False #make the new pair and devolve children to it new_pair = StructureNode(Data=Stem(indices[0], indices[1], 1)) new_pair.Children[:] = self.Children[1:-1] #all except start and end for c in new_pair.Children: c.Parent = new_pair self.Children[:] = [new_pair] new_pair.Parent = self return True def pairChildren(self, first, second): """Forms a pair using two of the children of self. first and second can be the node objects or the indices. Returns True if the pair was created, False otherwise. """ #check that first and second are really instances of self, or #convert from indices if isinstance(first, int): if first >= 0: first_index = first else: first_index = len(self) + first first_node = self[first] else: first_index = first.Index first_node = first if isinstance(second, int): if second >= 0: second_index = second else: second_index = len(self) + second second_node = self[second] else: second_index = second.Index second_node = second #check that the nodes are really children of self, and that they're #not the same, and that they're not already paired if self is not first_node.Parent or self is not second_node.Parent: return False if first_node is second_node: return False if first_node.IsPaired or second_node.IsPaired: return False #create the new node and append the appropriate children new_node = StructureNode(Data= \ Stem(first_node.Start, second_node.Start, 1)) del self.Children[second_index] new_node.Children[:] = self.Children[first_index + 1 : second_index] for c in new_node.Children: c.Parent = new_node self.Children[first_index] = new_node new_node.Parent = self return True def expand(self): """If self is a stem, expands self into n separate pair nodes. Returns True if the node was expanded, False otherwise. 
""" if self.Length <= 1: return False children = self.Children[:] self.Children[:] = [] curr_pair = self start = self.Start end = self.End for i in range(1, self.Length): new_pair = StructureNode(Data=Stem(start+i, end-i, 1)) curr_pair.Children.append(new_pair) new_pair.Parent = curr_pair curr_pair = new_pair new_pair.Children[:] = children for c in new_pair.Children: c.Parent = new_pair self.Length = 1 return True def expandAll(self): """Expands self and all children of self.""" for i in self: i.expandAll() self.expand() def collapse(self): """If self is a stem, extends self with as many base pairs as possible. Returns True if the stem was extended, False otherwise. """ #no effect if we're at the root, or if the position isn't paired if not self.IsPaired or self.Parent is None: return False extended = False while len(self) == 1 and self[0].IsPaired: self.Children[:] = self[0].Children[:] for c in self.Children: c.Parent = self self.Length += 1 extended = True return extended def collapseAll(self): """Collapse self and all children of self.""" self.collapse() for i in self: i.collapseAll() def breakBadPairs(self, seq): """Breaks all pairs in self that can't form in seq.""" pairs = seq.Alphabet.Pairs self.expandAll() for i in self.traverse_recursive(): if i.IsPaired: if not (seq[i.Start], seq[i.End]) in pairs: i.unpair() def extendHelix(self, seq): """Extends helix represented by self as far as possible with seq. Assumes that helix has already been collapsed. 
""" pairs = seq.Alphabet.Pairs #form as many contiguous pairs before as possible curr = self while 1: indices = curr._pair_before_indices() if indices and ((seq[indices[0]], seq[indices[1]]) in pairs): curr.pairBefore() #see if we can extend the newly-created pair curr = curr.Parent else: break #form as many contiguous pairs after as possible curr = self while 1: indices = curr._pair_after_indices() if indices and ((seq[indices[0]], seq[indices[1]]) in pairs) \ and (curr.Stems or (len(curr.Unpaired) >= 5)): curr.pairAfter() #see if we can extend the newly-created pair curr = curr[0] else: break def extendHelices(self, seq): """Extends all helices in self and its children as far as possible.""" for i in self.traverse_recursive(): if i.IsPaired: i.extendHelix(seq) def fitSeq(self, seq): """Corrects the structure in self according to the specified sequence. seq must be a Sequence object with .Alphabet, etc. Note that fitSeq does not renumber the structure; it may be necessary to call renumber(), especially if the structure does not start at the beginning of the sequence. Assumes that it will be called starting with the root node; does not check backwards in the tree. """ self.breakBadPairs(seq) self.extendHelices(seq) def classify(struct, verbose=False): """Classifies a Vienna-format string into structural categories. struct may be a string or a real Vienna structure. It is tested on validity, because classifying invalid structures is very unreliable. If the number of closing brackets is larger than the number of opening brackets, an error would be raised, but if it's the other way around, weird characters could be added to the string. For instance, classifying '.((.)' would give "\x00SSLS". This happens because the ends are not handled correctly if the stack is not back to its one-level state. """ #Test whether structure is valid. Classifying invalid structures #is very unreliable. 
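#The stack discipline that classify implements can be sketched in isolation.
#The following is a simplified, self-contained sketch (not the PyCogent
#implementation): it returns a plain string of labels instead of a numpy byte
#array, collapses the END/FLEXIBLE distinction for exterior bases into a
#single 'E', and raises on unbalanced input instead of producing the garbage
#labels described in the docstring above.

```python
def classify_simple(struct):
    # Stack-based walk over a Vienna (dot-bracket) string. Each '(' opens a
    # level; each ')' closes one and labels the unpaired bases collected at
    # that level by how many helices closed inside it:
    # 0 -> hairpin loop (L), 1 -> bulge/interior (B), 2+ -> junction (J).
    result = ['?'] * len(struct)
    stack = [[[], 0]]  # per level: [unpaired indices, helices closed]
    for i, c in enumerate(struct):
        if c == '(':
            stack.append([[], 0])
            result[i] = 'S'
        elif c == '.':
            stack[-1][0].append(i)
        elif c == ')':
            result[i] = 'S'
            items, degree = stack.pop()
            label = 'L' if degree == 0 else ('B' if degree == 1 else 'J')
            for j in items:
                result[j] = label
            stack[-1][1] += 1
        else:
            raise ValueError('unexpected character: %r' % (c,))
    if len(stack) != 1:
        raise IndexError('unbalanced structure: %r' % (struct,))
    for j in stack[0][0]:  # bases left at the outermost level
        result[j] = 'E'
    return ''.join(result)
```

#For example, classify_simple('((.(...)))') labels the single unpaired base
#between the two helices as a bulge ('B') and the hairpin bases as loop ('L').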
try: Vienna(struct) except IndexError: raise IndexError, "Trying to classify an invalid Vienna structure: %s"\ %(struct) MAX_STEMS=1000 #implement stack as three-item list PARENT = 0 ITEMS = 1 DEGREE = 2 STEM, LOOP, BULGE, JUNCTION, END, FLEXIBLE = map(ord, 'SLBJEF') #WARNING: won't work if more than max_stems come off a junction LEVELS = [FLEXIBLE, LOOP, BULGE] + [JUNCTION]*MAX_STEMS length = len(struct) result = zeros((length,), 'B') stack = [None,[],0] curr_level = stack if verbose: print 'curr_level:', curr_level print 'result:', result for i, c in enumerate(struct): if verbose: print 'pos, char:',i,c #open parens add new level to stack if c == '(': curr_level = [curr_level,[],1] result[i] = STEM #unpaired base gets appended to current level elif c == '.': curr_level[ITEMS].append(i) #closed parens subtract level from stack and assign state elif c == ')': #note: will handle end separately result[i] = STEM put(result, curr_level[ITEMS], LEVELS[curr_level[DEGREE]]) curr_level = curr_level[PARENT] curr_level[DEGREE] += 1 if verbose: print 'curr_level:', curr_level print 'result', result #handle ends and flexible bases end_items = curr_level[ITEMS] if end_items: first_start = struct.find('(') if first_start == -1: first_start = length+1 last_end = struct.rfind(')') put(result, [i for i in end_items if first_start HIERARCHY.index(entity.level): # call for children all.update(get_children(entity, level)) elif index < HIERARCHY.index(entity.level): # call for parents all.update(get_parent(entity, level)) else: all.update({entity.getFull_id():entity}) # call for self higher_level = HIERARCHY[index - 1] # one up;) if all: name = name or higher_level if higher_level == 'C': holder = ResidueHolder(name, all) elif higher_level == 'R': holder = AtomHolder(name, all) elif higher_level == 'M': holder = ChainHolder(name, all) elif higher_level == 'S': holder = ModelHolder(name, all) elif higher_level == 'H': holder = StructureHolder(name, all) else: raise ValueError, "einput 
got no input entities."
    holder.setSort_tuple()
    return holder

def get_children(entity, level):
    """Returns unique entities of lower or equal level.

    Arguments:

        - entity: any ``Entity`` instance.
        - level: one of 'H', 'S', 'M', 'C', 'R', 'A'
    """
    entity.setTable()
    return entity.table[level]

def get_parent(entity, level):
    """Returns unique entities of higher level.

    Arguments:

        - entity: any ``Entity`` instance.
        - level: one of 'H', 'S', 'M', 'C', 'R', 'A'
    """
    parent = entity.getParent(level) # get the correct parent
    return {parent.getFull_id(): parent}
PyCogent-1.5.3/cogent/seqsim/__init__.py000644 000765 000024 00000001214 12024702176 021130 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Seqsim: code for simulating sequences and trees."""

__all__ = ['analysis', 'birth_death', 'markov', 'microarray',
           'microarray_normalize', 'randomization', 'searchpath',
           'sequence_generators', 'tree', 'usage']

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald", "Mike Robeson", \
               "Jesse Zaneveld"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/cogent/seqsim/analysis.py000644 000765 000024 00000050554 12024702176 021227 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
from tree import RangeNode, balanced_breakpoints
from cogent.core.usage import DnaPairs
from usage import Counts
from random import choice, random #random() is used by perturb_lengths below
from numpy.linalg import det as determinant, inv as inverse
from numpy import sqrt, newaxis as NewAxis, exp, dot, zeros, ravel, array, \
    float64, max, min, average, any, pi
from cogent.util.array import without_diag
from cogent.maths.svd import three_item_combos, two_item_combos
from cogent.maths.stats.test import std

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def tree_threeway_counts(tree, lca_depths, alphabet=DnaPairs, attr='Sequence'): """From tree and array of lca_depths, returns n*n*n array of Count objects. n is number of leaves. lca_depths: array (leaf * leaf) of depths of last common ancestor. alphabet: pair alphabet for input sequences. Returns dict containing counts for (i, j, k) and (j, i, k) where k is the outgroup of the three sequences. Will pick an arbitrary node to be the outgroup if there is a polytomy. Note: Leaves of tree must have sequences already assigned. """ outgroup_last = tree.outgroupLast leaves = list(tree.traverse()) result = {} for first, second, third in three_item_combos(leaves): new_first, new_second, new_third = outgroup_last(first, second, third) #get the sequence from each node seq_1 = getattr(new_first, attr) seq_2 = getattr(new_second, attr) seq_3 = getattr(new_third, attr) result[(new_first.Id, new_second.Id, new_third.Id)] = \ Counts.fromTriple(seq_1, seq_2, seq_3, alphabet) #don't forget to do counts from both the non-outgroups result[(new_second.Id, new_first.Id, new_third.Id)] = \ Counts.fromTriple(seq_2, seq_1, seq_3, alphabet) return result def dna_count_cleaner(counts): """Cleans DNA counts to just the 4-letter alphabet.""" return Counts(counts._data[:4,:4], DnaPairs) def tree_threeway_counts_sample(tree, lca_depths, alphabet=DnaPairs, \ attr='Sequence', n=1000, check_rates=True, clean_f=None): """Like tree_threeway_counts, but takes random sample (w/o replacement).""" leaves = list(tree.traverse()) num_leaves = len(leaves) #do normal threeway counts if number of triples < n num_triples = num_leaves * (num_leaves - 1) * (num_leaves-2) / 3 if num_triples < n: counts = tree_threeway_counts(tree, lca_depths, alphabet, attr) if clean_f: result = {} for k, v in counts.items(): result[k] = clean_f(v) return result else: return counts #if we got here, need to sample outgroup_last = 
tree.outgroupLast
    i = 0
    seen = {}
    result = {}
    while i < n and len(seen) < num_triples:
        #skip if same node picked twice, or if resampling same combo
        curr = choice(leaves), choice(leaves), choice(leaves)
        ids = tuple([c.Id for c in curr])
        if len(dict.fromkeys(ids)) < len(curr): #picked same thing twice
            continue
        if curr in seen:
            continue
        seen[curr] = True #record the combo so we don't resample it
        first, second, third = curr
        new_first, new_second, new_third = outgroup_last(first, second, third)
        seq_1 = getattr(new_first, attr)
        seq_2 = getattr(new_second, attr)
        seq_3 = getattr(new_third, attr)
        counts = Counts.fromTriple(seq_1, seq_2, seq_3, alphabet)
        if clean_f:
            counts = clean_f(counts)
        key = (new_first.Id, new_second.Id, new_third.Id)
        #check rates if we need to
        if check_rates:
            try:
                #skip probs with zero rows
                if not min(max(counts._data,1)):
                    continue
                probs = counts.toProbs()
                rates = probs.toRates()
            except (ZeroDivisionError, OverflowError, ValueError, \
                FloatingPointError):
                continue
        result[key] = counts
        i += 1
    return result

def tree_twoway_counts(tree, alphabet=DnaPairs, average=True, attr='Sequence'):
    """From tree, return dict of Count objects.

    Note: if average is True, only has counts in m[i,j] or m[j,i], not both.
""" leaves = list(tree.traverse()) result = {} if average: #return symmetric matrix for first, second in two_item_combos(leaves): seq_1 = getattr(first, attr) seq_2 = getattr(second, attr) result[(first.Id, second.Id)] = \ Counts.fromPair(seq_1, seq_2, alphabet) else: for first, second in two_item_combos(leaves): seq_1 = getattr(first, attr) seq_2 = getattr(second, attr) result[(first.Id, second.Id)] = \ Counts.fromPair(seq_1, seq_2, alphabet,False) result[(second.Id, first.Id)] = \ Counts.fromPair(seq_2, seq_1, alphabet,False) return result def counts_to_probs(count_dict): """Converts counts to probs, omitting problem cases.""" result = {} for key, val in count_dict.items(): #check for zero rows if not min(max(val._data,1)): continue try: p = val.toProbs() #the following detects nan from divide by zero for empty rows #this works because nan doesn't compare equal to itself first_col = p._data[:,0] if any(first_col != first_col): raise ZeroDivisionError #if we got here, everything was OK result[key] = val.toProbs() except (ZeroDivisionError, OverflowError,ValueError,FloatingPointError): #errors are platform-dependent and arise when a row is zero #(i.e. if a character doesn't appear in the sequence). pass return result def probs_to_rates(prob_dict, fix_f=None, normalize=False, ftol=0.01): """Converts probs to rates, omitting problem cases (but not neg off-diags). ftol is in log10 units, e.g. ftol of 1 means within one order of magnitude. Default = 0.01. NOTE: fix_f should be an unbound method of Rates, or other function that expects a rate as a single parameter. 
""" result = {} seen = {} for key, val in prob_dict.items(): try: rate = val.toRates(normalize) if fix_f: rate = fix_f(rate) rounded = tuple(map(round, rate._data.flatten()/ftol)) if rounded in seen: continue else: seen[rounded] = 1 #check for zero rows if not min(max(rate._data,1)): continue #check for zero cols if not min(max(rate._data)): continue if (not rate.isSignificantlyComplex()): # and rate.isValid(): result[key] = rate except (ZeroDivisionError, OverflowError,ValueError,FloatingPointError): #errors are platform-dependent pass return result def tree_threeway_rates(tree, lca_depths, alphabet=DnaPairs, fix_f=None, \ normalize=False, without_diag=False): """Generates matrix of all valid three-way rate matrices from a tree. Dimensions of matrix are (num_leaves, num_leaves, num_leaves, flat_rate), i.e. m[0][1][2] gives you the matrix for leaves 0,1,2. In general, expect to get matrices out of the tree by slicing rather than by indexing, since many rates will be empty due to inference problems. """ leaves = len(list(tree.traverse())) counts = tree_threeway_counts(tree, lca_depths, alphabet) rates = probs_to_rates(counts_to_probs(counts), fix_f, normalize) return threeway_rates_to_array(rates, leaves, alphabet, without_diag) def tree_twoway_rates(tree, alphabet=DnaPairs, average=False, fix_f=None, \ normalize=False, without_diag=False): """Generates all valid two-way rate matrices from a tree. Dimensions of matrix are (num_leaves, num_leaves, num_leaves, flat_rate), i.e. m[0][1][2] gives you the matrix for leaves 0,1,2. In general, expect to get matrices out of the tree by slicing rather than by indexing, since many rates will be empty due to inference problems. 
""" leaves = len(list(tree.traverse())) counts = tree_twoway_counts(tree, alphabet, average) probs = counts_to_probs(counts) r = probs_to_rates(probs, fix_f, normalize) return twoway_rates_to_array(r, leaves, alphabet, without_diag) def twoway_rates_to_array(rates, num_seqs, alphabet=DnaPairs, \ without_diag=True): """Fills an array with flat twoway_rates arrays.""" if without_diag: result = zeros((num_seqs, num_seqs, len(alphabet)- \ len(alphabet.SubEnumerations[0])), float64) else: result = zeros((num_seqs, num_seqs, len(alphabet)), float64) return rates_to_array(rates, result, without_diag) def threeway_rates_to_array(rates, num_seqs, alphabet=DnaPairs, \ without_diag=True): """Fills an array with flat threeway_rates arrays.""" if without_diag: result = zeros((num_seqs, num_seqs, num_seqs, \ len(alphabet) - len(alphabet.SubEnumerations[0])), float64) else: result = zeros((num_seqs, num_seqs, num_seqs, len(alphabet)), float64) return rates_to_array(rates, result, without_diag) def rates_to_array(rates, to_fill, without_diagonal=False): """Fills rates into a pre-existing array object. Assumes that all keys in rates are valid indices into to_fill, and that the last dimension of to_fill is the same size as the flattened values in rates. If without_diagonal is True (False by default), removes the diagonals from the data before placing in the array. Note that we can't call this without_diag or we collide with the function we want to call to strip the diagonal. WARNING: size of to_fill array must be adjusted to be the right size as the inputs, i.e. last dimension same as flat array with/without diagonals. """ if without_diagonal: for key, val in rates.items(): to_fill[key] = ravel(without_diag(val._data)) else: for key, val in rates.items(): to_fill[key] = ravel(val._data) return to_fill def multivariate_normal_prob(x, cov, mean=None): """Returns multivariate normal probability density of vector x. 
Formula: http://www.riskglossary.com/link/joint_normal_distribution.htm """ if mean is None: mean = zeros(len(x)) diff_row = x-mean diff_col = diff_row[:,NewAxis] numerator = exp(-0.5 * dot(dot(diff_row, inverse(cov)), diff_col)) denominator = sqrt((2*pi)**(len(x)) * determinant(cov)) return numerator/denominator ######### WARNING: TESTS END HERE! ########################### def classify_tree(tree, comparison_sets): """Takes a tree and a dict of label -> distributions. Returns node -> label.""" pass def balanced_two_q_tree(n, length=0.05, seq_length=100, change_depth=2,\ perturb=True, both=False): """Makes single random tree with specified nodes, branch and seq length. Two random rate matrices are assigned. """ t = BalancedTree(n) t.assignLengths(length) if perturb: t.pertubrAttr('Length', xxx) raise #NOT FINISHED q = Rates.random() scale_trace(q) t.Q = q curr_node = t for i in range(change_depth): curr_node = choice(curr_node) q2 = random_q_matrix() scale_trace(q2) curr_node.Q = q2 t.assignQ() t.assignP() t.Sequence = rand_rna(seq_length) t.assignSeqs() if both: return t, curr_node else: return t def balanced_multi_q_tree(n, length=0.05, seq_length=100,perturb=True): """Makes single random tree with specified nodes, branch and seq length. Every node gets its own random rate matrix. """ t = BalancedTree(n) t.assignLengths({},default=length) if perturb: perturb_lengths(t) for node in t.traverse(): q = random_q_matrix() scale_trace(q) node.Q = q t.assignQ() t.assignP() t.Sequence = rand_rna(seq_length) t.assignSeqs() return t def get_matrix_stats(qs, true_q=None): """Returns list of matrix stats. 
Expects flat q matrices as input."""
    #set up flat q matrices
    flat_qs = flatten_q_matrices(qs)
    for q in qs:
        scale_trace(q)
    flat_scaled_qs = flatten_q_matrices(qs)
    #get covariance and correl matrices
    covar = cov(flat_qs)
    correl = corrcoef(flat_qs)
    norm_covar = euclidean_norm(covar)
    norm_correl = euclidean_norm(correl)
    covar_e = list(eigenvalues(covar).real)
    covar_e.sort()
    covar_e.reverse()
    correl_e = list(eigenvalues(correl).real)
    correl_e.sort()
    correl_e.reverse()
    two_best_covar = ratio_two_best(covar_e)
    two_best_correl = ratio_two_best(correl_e)
    frac_best_covar = ratio_best_to_sum(covar_e)
    frac_best_correl = ratio_best_to_sum(correl_e)
    weiss_stat = weiss(covar_e)
    #get summary stats from scaled qs
    distances = dists_from_mean(flat_scaled_qs)
    mean_dist = average(distances)
    var_dist = var(distances)
    if true_q:
        mean_q = mean(flat_scaled_qs)
        q_error = euclidean_distance(mean_q, true_q)
    else:
        q_error = 0
    norm_scaled_covar = euclidean_norm(cov(flat_scaled_qs))
    result = [norm_covar, norm_correl, two_best_covar, \
        two_best_correl, frac_best_covar, frac_best_correl, weiss_stat, \
        mean_dist, var_dist, q_error, norm_scaled_covar]
    result.extend(correl_e)
    result.extend(covar_e)
    return result

def compare_distance_to_weiss(t):
    """Given a tree, calculate 2-sequence Weiss statistic and 3-seq variation.
Returns tuple containing Weiss, eigenvector ratio, and var of 3-seq dist.""" three_subs = all_p_from_tree(t) pair_subs = pair_p_from_tree(t) three_qs = get_qs(three_subs) pair_qs = get_qs(pair_subs) pairs_no_diag = map(array, map(take_off_diag, pair_qs)) three_qs_copy = [i.copy() for i in three_qs] for i in three_qs: scale_trace(i) pairs_flat = flatten_q_matrices(pairs_no_diag) three_flat = flatten_q_matrices(three_qs) three_distances = dists_from_mean(three_flat) mean_three = average(three_distances) var_three = var(three_distances) pair_distances = dists_from_mean(pairs_flat) mean_pair = average(pair_distances) var_pair = var(pair_distances) three_qs_copy_flat = flatten_q_matrices(three_qs_copy) three_cor_matrix = corrcoef(three_qs_copy_flat) three_ratio = ratio_two_best(list(eigenvalues(three_cor_matrix).real)) cov_matrix = cov(pairs_flat) covar_e = list(eigenvalues(cov_matrix).real) covar_e.sort() covar_e.reverse() two_ratio = ratio_two_best(covar_e) weiss_stat = weiss(covar_e) return [mean_pair, var_pair, mean_three, var_three, two_ratio, three_ratio, weiss_stat] def distance_to_weiss_header(): return ['mean_pair', 'var_pair', 'mean_three', 'var_three', 'two_ratio',\ 'three_ratio', 'weiss_stat'] def mean_var_eigen(t): """Returns mean and variance of branch lengths and 3-seq eigen of tree""" three_subs = all_p_from_tree(t) three_qs = get_qs(three_subs) three_qs_copy = [i.copy() for i in three_qs] for i in three_qs: scale_trace(i) three_flat = flatten_q_matrices(three_qs) three_distances = dists_from_mean(three_flat) mean_three = average(three_distances) var_three = var(three_distances) three_qs_copy_flat = flatten_q_matrices(three_qs_copy) three_cor_matrix = corrcoef(three_qs_copy_flat) three_ratio = ratio_two_best(list(eigenvalues(three_cor_matrix).real)) return [mean_three, var_three, three_ratio] def mean_var_eigen_header(): return ['mean_three', 'var_three', 'ratio_three'] def svd_tests(n): """Runs some tests of svd on samples of n random matrices""" print 
'SVD tests...'
    tests = {'random':random_mats, 'random q':random_q_mats, \
        'scaled q':scale_one_q, 'scaled many': scale_many_q}
    for name, f in tests.items():
        print name, ':\n', svd_q(f(n))

def tree_tests(n, leaves=8, slen=100, blen=0.1):
    """Runs some tests of svd on trees"""
    print "Tree tests..."
    print "single matrix"
    for i in range(n):
        t = balanced_one_q_tree(n=leaves, seq_length=slen, length=blen)
        ps = all_p_from_tree(t)
        print svd_q(get_qs(ps)).real
    print "two matrices"
    for i in range(n):
        t = balanced_two_q_tree(n=leaves, seq_length=slen, length=blen)
        ps = all_p_from_tree(t)
        print svd_q(get_qs(ps)).real
    print "many matrices"
    for i in range(n):
        t = balanced_multi_q_tree(n=leaves, seq_length=slen, length=blen)
        ps = all_p_from_tree(t)
        print svd_q(get_qs(ps)).real

def stats_from_tree(t):
    ps = all_p_from_tree(t)
    qs = get_qs(ps)
    return get_matrix_stats(qs)

def tree_stats(n, tree_f, result_f):
    for i in range(n):
        t = tree_f()
        yield result_f(t)

def tree_stats_header():
    header = ['norm_covar', 'norm_correl', 'two_best_covar', 'two_best_correl',
        'frac_best_covar', 'frac_best_correl', 'weiss_stat', 'mean_dist',
        'var_dist', 'q_error', 'norm_scaled_covar']
    for i in range(16):
        header.append('cor_'+str(i))
    for i in range(16):
        header.append('cov_'+str(i))
    return header

def tree_stats_analysis(n, header_f, result_f, leaves=16, slen=1000, blen=0.1, \
    tree_f=balanced_two_q_tree, tree_f_name='two_q'):
    make_tree = lambda: tree_f(n=leaves, seq_length=slen, length=blen)
    shared_params = [leaves, slen, blen, tree_f_name]
    shared_header = ['n', 'leaves', 'seq_len', 'branch_len', 'type']
    print '\t'.join(shared_header + header_f())
    for i, data in enumerate(tree_stats(n, make_tree, result_f)):
        result = [i] + shared_params + data
        print '\t'.join(map(str, result))

def perturb_lengths(t):
    """Perturbs the branch lengths in t by multiplying each by (0.5,1.5)."""
    for node in t.traverse():
        if node.Length:
            node.Length *= (0.5 + random())

def stats_from_tree_analysis(n):
tree_stats_analysis(n, header_f=tree_stats_header, result_f=stats_from_tree)

def q_matrix_compare_analysis(n):
    tree_stats_analysis(n, header_f=distance_to_weiss_header, \
        result_f=compare_distance_to_weiss, tree_f=balanced_one_q_tree, \
        tree_f_name='one_q')
    tree_stats_analysis(n, header_f=distance_to_weiss_header, \
        result_f=compare_distance_to_weiss, tree_f=balanced_two_q_tree, \
        tree_f_name='two_q')
    tree_stats_analysis(n, header_f=distance_to_weiss_header, \
        result_f=compare_distance_to_weiss, tree_f=balanced_multi_q_tree, \
        tree_f_name='multi_q')

def change_length_analysis(n, branch_lengths=\
    [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4], leaves=16, slen=1000):
    header = ['mean_three_m','mean_three_sd','var_three_m', 'var_three_sd', \
        'ratio_three_m', 'ratio_three_sd']
    conditions = {'one_q':balanced_one_q_tree, 'two_q':balanced_two_q_tree, \
        'multi_q':balanced_multi_q_tree}
    condition_names = ['one_q', 'two_q', 'multi_q']
    result_f = mean_var_eigen
    full_header = ['length'] + \
        [name+i for name in condition_names for i in header]
    print '\t'.join(full_header)
    for b in branch_lengths:
        result = [b]
        for c in condition_names:
            tree_f = conditions[c]
            make_tree = lambda:tree_f(n=leaves,seq_length=slen,length=b,\
                perturb=False)
            samples = list(tree_stats(n, make_tree, result_f))
            means = average(samples)
            stdevs = std(samples)
            for i in zip(means, stdevs):
                result.extend(i)
        print '\t'.join(map(str, result))

def tree_change_analysis(n):
    """Test whether we can see a change in stats for the first subtree w/ Q2"""
    header = ['mean_three','var_three', 'ratio_three']
    result_f = mean_var_eigen
    print '\t'.join(['n'] + header + header)
    for i in range(n):
        t, d = balanced_two_q_tree(n=16, seq_length=1000, length=0.1, \
            perturb=False, both=True)
        d_result = result_f(d)
        p_result = result_f(d.Parent)
        print '\t'.join(map(str, [i] + d_result + p_result))

if __name__ == '__main__':
    t = tree_controller()
PyCogent-1.5.3/cogent/seqsim/birth_death.py000644 000765 000024 00000066222 12024702176 021660
0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Code for birth-death (Yule) processes for simulating phylogenetic trees. Also contains a double birth-death model for simulating horizontal gene transfer histories (not yet tested). "Production" status only applies to the single birth-death model. """ from cogent.seqsim.tree import RangeNode from random import random __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Mike Robeson"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class ExtinctionError(Exception): pass class TooManyTaxaError(Exception): pass class BirthDeathModel(object): """Creates trees using a birth-death model. Initialize with timestep, birth prob, death prob. This class only produces the trees (using RangeNode); these trees already know how to evolve sequences, set rate matrices, etc. The trees returned will include lengths for each branch. WARNING: Sometimes, the ancestral node will die off, or that all the nodes will die off later. If that happens, an ExtinctionError will be raised. """ NodeClass = RangeNode def __init__(self, BirthProb, DeathProb, TimePerStep, ChangedBirthProb=None, \ ChangedDeathProb=None, ChangedBirthStep=None, ChangedDeathStep=None, \ MaxStep=1000, MaxTaxa=None): """Returns a new BirthDeathModel object. BirthProb: probability that a node will split in timestep. DeathProb: probability that a node will die in timestep. TimePerStep: branch length (sequence distance units) for each step. ChangedBirthProb: new BirthProb to be set at the time step specified in ChangedBirthStep. ChangedDeathProb: new DeathProb to be set at the time step specified in ChangedDeathStep. ChangedBirthStep: time step at which the ChangedBirthProb is set. ChangedDeathStep: time step at which the ChangedDeathProb is set. MaxStep: maximum time step before stopping. Default 1000. 
MaxTaxa: maximum taxa before stopping. Default None. Sets CurrStep to 0 at the beginning to measure elapsed time. Note: if both a birth and a death occur in the same timestep, they will be ignored. WARNING: If neither MaxStep nor MaxTaxa is set, the simulation will keep going until all nodes are extinct, or you run out of memory! """ self.CurrStep = 0 self.BirthProb = BirthProb self.DeathProb = DeathProb self.TimePerStep = float(TimePerStep) self.ChangedBirthProb = self.prob_check(ChangedBirthProb) self.ChangedBirthStep = self.step_check(ChangedBirthStep) self.ChangedDeathProb = self.prob_check(ChangedDeathProb) self.ChangedDeathStep = self.step_check(ChangedDeathStep) self.CurrDeathProb = DeathProb self.CurrBirthProb = BirthProb self.MaxStep = MaxStep self.MaxTaxa = MaxTaxa if TimePerStep <= 0: raise ValueError, "TimePerStep must be greater than zero" if not (0 <= BirthProb <= 1) or not (0 <= DeathProb <= 1): raise ValueError, "Birth and death probs must be between 0 and 1" #self.CurrStep = 0 self.Tree = self.NodeClass() self.CurrTaxa = [self.Tree] def prob_check(self,prob_value): """Checks if prabability value lies between 0 and 1""" if prob_value is not None: if not (0 <= prob_value <= 1): raise ValueError, "\'prob_value\' must be between 0 and 1" else: return prob_value else: return None def step_check(self,step_value): """Checks to see if value is greater than zero""" if step_value is not None: if step_value <= 0: raise ValueError, "\'stop_value\' must be greater than zero" else: return step_value else: return None def timeOk(self): """Return True only if the maximum time has not yet been reached.""" #If MaxStep is not set, never say that the maximum time was reached if self.MaxStep is None: return True else: return self.CurrStep < self.MaxStep def taxaOk(self): """Returns True if the number of taxa is > 0 and < self.MaxTaxa. Note: MaxTaxa is exclusive (i.e. 
if MaxTaxa is 32, taxaOk will return False when the number of taxa is
        exactly 32, allowing you to stop when this number is reached).
        """
        num_taxa = len(self.CurrTaxa)
        if num_taxa < 1:
            return False
        if self.MaxTaxa is not None:
            return num_taxa < self.MaxTaxa
        #otherwise, if self.MaxTaxa was not set, any number is OK since we
        #know we have at least one item in the list or we wouldn't have got
        #here.
        else:
            return True

    def B_Prob(self):
        """Checks to see if the birth probability changes during a time step.

        If the target time step is defined and has been reached, the new
        birth prob takes effect from that point onward.
        """
        if self.ChangedBirthStep is None:
            return self.BirthProb
        if self.CurrStep < self.ChangedBirthStep:
            return self.BirthProb
        if self.ChangedBirthProb is None:
            #no replacement probability was supplied, so keep the original
            return self.BirthProb
        self.CurrBirthProb = self.ChangedBirthProb
        return self.ChangedBirthProb

    def D_Prob(self):
        """Checks to see if the death probability changes during a time step.

        If the target time step is defined and has been reached, the new
        death prob takes effect from that point onward.
        """
        if self.ChangedDeathStep is None:
            return self.DeathProb
        if self.CurrStep < self.ChangedDeathStep:
            return self.DeathProb
        if self.ChangedDeathProb is None:
            #no replacement probability was supplied, so keep the original
            return self.DeathProb
        self.CurrDeathProb = self.ChangedDeathProb
        return self.ChangedDeathProb

    def step(self, random_f=random):
        """Advances the state of the object by one timestep.

        Specifically:

        For each node in the current taxa, decides whether it's going to
        produce a birth or a death.

        If a node dies, delete it from the list of current taxa.

        If a node gives birth, add two child nodes to the list of current
        taxa each with branchlength equal to the timestep, and delete the
        original node from the list of taxa.

        Otherwise, add the timestep to the node's branchlength.
""" #create list of new current nodes b = self.B_Prob() d = self.D_Prob() nc = self.NodeClass ts = self.TimePerStep new_list = [] for node in self.CurrTaxa: died = random_f() < d born = random_f() < b #need to duplicate only if it was born and one didn't die if (born and not died): first_child = nc() second_child = nc() children = [first_child, second_child] #remember, we need to take care of both parent and child #refs manually unless the tree class does it for us node.Children = children first_child.Parent = node second_child.Parent = node new_list.extend(children) elif (died and not born): #don't add the dead node to the new list continue else: #i.e. if born and died, or if nothing happened new_list.append(node) #update time steps for node in new_list: if hasattr(node, 'Length') and node.Length is not None: node.Length += ts else: node.Length = ts self.CurrStep += 1 #set the list of current nodes to the new list self.CurrTaxa = new_list def __call__(self, filter=True, exact=False, random_f=random): """Returns a new tree using params in self. If filter is True (the default), gets rid of extinct lineages. If exact is True (default is False), raises exception if we didn't get the right number of taxa WARNING: Because multiple births can happen in the same timestep, you might get more than the number of taxa you specify. Check afterwards! """ self.CurrStep = 0 self.Tree = self.NodeClass() self.CurrTaxa = [self.Tree] while 1: self.step(random_f) if not(self.timeOk() and self.taxaOk()): break if not self.CurrTaxa: raise ExtinctionError, "All taxa are extinct." if filter: self.Tree.filter(self.CurrTaxa, keep=True) if exact and self.MaxTaxa and (len(self.CurrTaxa) != self.MaxTaxa): raise TooManyTaxaError, "Got %s taxa, not %s." % \ (len(self.CurrTaxa), self.MaxTaxa) return self.Tree class GeneNode(RangeNode): """Holds a phylogenetic node that corresponds to a gene. Specificially, needs Species property holding ref to its species. 
WARNING: the current implementation does not take Species in __init__, but assumes you will create it manually after you make the node. """ pass class SpeciesNode(RangeNode): """Holds a phylogenetic node that corresponds to a species. Specifically, needs a Genes property that holds refs to its genes. WARNING: the current implementation does not take Genes in __init__, but assumes you will create it manually after you make the node. """ pass class DoubleBirthDeathModel(object): """Creates species and gene trees using a double birth-death model. Initialize with timestep, birth prob, death prob. This class only produces the trees (using RangeNode); these trees already know how to evolve sequences, set rate matrices, etc. The trees returned will include lengths for each branch. WARNING: Sometimes, the ancestral node will die off, or that all the nodes will die off later. If that happens, an ExtinctionError will be raised. """ GeneClass = GeneNode SpeciesClass = SpeciesNode def __init__(self, GeneBirth, GeneDeath, GeneTransfer, SpeciesBirth, \ SpeciesDeath, SpeciesRateChange, TimePerStep, GenesAtStart, \ MaxStep=1000, MaxGenes=None, MaxSpecies=None, MaxGenome=None, DEBUG=False): """Returns a new BirthDeathModel object. GeneBirth: f(val, gene) returning True if gene is born given seed val. GeneDeath: f(val, gene) returning True if gene dies given seed val. GeneTransfer: f(val, gene) returning species that gene transfers to (or None if no trransfer.) SpeciesBirth: f(val, species) returning True if species splits. SpeciesDeath: f(val, species) returning True if species dies. SpeciesRateChange: f(val, species) resetting species rate matrix given val. NOTE: in current implementation, Q only changes when species duplicates. TimePerStep: branch length (sequence distance units) for each step. GenesAtStart: number of genes at the beginning of the simulation. MaxStep: maximum time step before stopping. Default 1000. MaxGenes: maximum genes before stopping. Default None. 
MaxSpecies: maximum species before stopping. Default None. MaxGenome: maximum number of genes in a genome. Default None. Sets CurrStep to 0 at the beginning to measure elapsed time. Note: if both a birth and a death occur in the same timestep, they will be ignored. WARNING: If neither MaxStep nor MaxTaxa is set, the simulation will keep going until all nodes are extinct, or you run out of memory! """ self.GeneBirth = GeneBirth self.GeneDeath = GeneDeath self.GeneTransfer = GeneTransfer self.SpeciesBirth = SpeciesBirth self.SpeciesDeath = SpeciesDeath self.SpeciesRateChange = SpeciesRateChange self.TimePerStep = TimePerStep self.GenesAtStart = GenesAtStart self.MaxStep = MaxStep self.MaxGenes = MaxGenes self.MaxSpecies = MaxSpecies self.MaxGenome = MaxGenome self.DEBUG = DEBUG if TimePerStep <= 0: raise ValueError, "TimePerStep must be greater than zero" self._init_vars() def _init_vars(self): """Initialize vars before running the simulation.""" self.CurrStep = 0 self.SpeciesTree = self.SpeciesClass() self.SpeciesTree.Length = 0 self.SpeciesTree.BirthDeathModel = self self.CurrSpecies = [self.SpeciesTree] self.SpeciesTree.CurrSpecies = self.CurrSpecies #ref to same object self.GeneTrees = [self.GeneClass() for i in range(self.GenesAtStart)] for i in self.GeneTrees: i.Length = 0 i.BirthDeathModel = self self.CurrGenes = self.GeneTrees[:] #set gene/species references for i in self.CurrGenes: i.Species = self.SpeciesTree self.SpeciesTree.Genes = self.CurrGenes[:] #note: copy of CurrGenes list, not reference def timeOk(self): """Return True only if the maximum time has not yet been reached.""" #If MaxStep is not set, never say that the maximum time was reached if self.MaxStep is None: return True else: return self.CurrStep < self.MaxStep def genesOk(self): """Returns True if the number of genes is > 0 and < self.MaxGenes. Note: MaxGenes is exclusive (i.e. 
        if MaxGenes is 32, genesOk will return False when the number of genes
        is exactly 32, allowing you to stop when this number is reached).
        """
        num_taxa = len(self.CurrGenes)
        if num_taxa < 1:
            return False
        if self.MaxGenes is not None:
            return num_taxa < self.MaxGenes
        #otherwise, if self.MaxGenes was not set, any number is OK since we
        #know we have at least one item in the list or we wouldn't have got
        #here.
        else:
            return True

    def speciesOk(self):
        """Returns True if the number of species is > 0 and < self.MaxSpecies.

        Note: MaxSpecies is exclusive (i.e. if MaxSpecies is 32, speciesOk
        will return False when the number of species is exactly 32, allowing
        you to stop when this number is reached).
        """
        num_taxa = len(self.CurrSpecies)
        if num_taxa < 1:
            return False
        if self.MaxSpecies is not None:
            return num_taxa < self.MaxSpecies
        #otherwise, if self.MaxSpecies was not set, any number is OK since we
        #know we have at least one item in the list or we wouldn't have got
        #here.
        else:
            return True

    def genomeOk(self):
        """Returns True if the max genome size is > 0 and < self.MaxGenome.

        Note: MaxGenome is exclusive (i.e. if MaxGenome is 32, genomeOk will
        return False when the max genome size is exactly 32, allowing you to
        stop when this number is reached).
        """
        if not self.CurrSpecies:
            return False
        max_genome = max([len(i.Genes) for i in self.CurrSpecies])
        if max_genome < 1:
            return False
        if self.MaxGenome is not None:
            return max_genome < self.MaxGenome
        #otherwise, if self.MaxGenome was not set, any size is OK since we
        #know we have at least one gene or we wouldn't have got here.
        else:
            return True

    def geneStep(self, random_f=random):
        """Advances the state of the genes by one timestep (except speciation).

        Specifically:
        Decides whether each gene will die, duplicate, or transfer.
        If a gene dies, delete it from the list of current genes.
        If a gene gives birth, add two child nodes to the list of current
        genes each with zero branchlength (will increment later), and delete
        the original node from the list of genes.
If a gene transfers, handle like birth but also change the species. WARNING: This method does not increment the branch length or the time counter. Handle separately! """ #create list of new current nodes #Too complex to do combinations of states. Use three-pass algorithm: #1. birth #2. transfer #3. death #i.e. each copy gets a separate chance at death after it is made. #note that this differs slightly from what we do in the single #birth-death model where each original gene gets a chance at death #and a death and a duplication just cancel. Is this a problem with #the original model? self._gene_birth_step(random_f) self._gene_transfer_step(random_f) self._gene_death_step(random_f) def _duplicate_gene(self, gene, orig_species, new_species=None, \ new_species_2=None): """Duplicates a gene, optionally attaching to new species. When called with only orig_species, duplicates the gene in the same species (killing the old gene and making two copies). When called with orig_species and new_species, kills the old gene and puts one new child into each of the old and new species (i.e. for horizontal gene transfer). When called with orig_species, new_species, and new_species_2, kills the old gene and puts one new child into each of the two new species (i.e. for speciation where all genes duplicate into new species). 
WARNING: Does not update self.CurrGenes (so can use in loop, but must update self.CurrGenes manually).""" gc = self.GeneClass #make new children first_child, second_child = gc(), gc() children = [first_child, second_child] #update gene parent/child refs gene.Children = children first_child.Parent = gene second_child.Parent = gene #init branch lengths first_child.Length = 0 second_child.Length = 0 #update species refs #first, figure out which species to deal with if new_species is None: #add both to orig species first_species = orig_species second_species = orig_species elif new_species_2 is None: #add first to orig, second to new_species first_species = orig_species second_species = new_species else: #add to the two new species first_species = new_species second_species = new_species_2 #then, update the refs first_child.Species = first_species second_child.Species = second_species orig_species.Genes.remove(gene) first_species.Genes.append(first_child) second_species.Genes.append(second_child) #return the new genes for appending or whatever return first_child, second_child def _gene_birth_step(self, random_f=random): """Implements gene birth sweep.""" gb = self.GeneBirth new_genes = [] for gene in self.CurrGenes: if gb(random_f(), gene): new_genes.extend(self._duplicate_gene(gene, gene.Species)) else: new_genes.append(gene) self.CurrGenes[:] = new_genes[:] def _gene_transfer_step(self, random_f=random): """Implements gene transfer sweep.""" gt = self.GeneTransfer new_genes = [] #step 2: transfer for gene in self.CurrGenes: new_species = gt(random_f(), gene) if new_species: new_genes.extend(self._duplicate_gene(gene, gene.Species, \ new_species)) else: new_genes.append(gene) self.CurrGenes[:] = new_genes[:] def _gene_death_step(self, random_f=random): """Implements gene death sweep.""" gd = self.GeneDeath new_genes = [] for gene in self.CurrGenes: if gd(random_f(), gene): gene.Species.Genes.remove(gene) else: new_genes.append(gene) self.CurrGenes[:] = new_genes[:] 
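The birth-then-death sweep structure used by the gene step can be illustrated independently of the GeneNode/Species machinery. Below is a minimal sketch (the helper name `gene_sweep`, the string labels, and the injectable `rng` are hypothetical stand-ins; the transfer pass is omitted for brevity). Note how every copy produced by the birth pass gets its own independent chance at death, as described in the geneStep comments:

```python
import random

def gene_sweep(genes, p_birth, p_death, rng=random.random):
    """One birth-then-death sweep over a list of gene labels.

    Each gene duplicates with probability p_birth; every surviving copy
    (original or newly born) then dies independently with probability
    p_death, mirroring the multi-pass order used by geneStep."""
    #pass 1: birth -- a duplicating gene is replaced by two children
    after_birth = []
    for g in genes:
        if rng() < p_birth:
            after_birth.extend([g + '.0', g + '.1'])
        else:
            after_birth.append(g)
    #pass 2: death -- each copy gets a separate, independent chance at death
    return [g for g in after_birth if not rng() < p_death]
```

With a deterministic `rng` the behavior is easy to check: an rng that always returns 0.9 triggers neither births nor deaths, while one that always returns 0.0 duplicates every gene and then kills every copy.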
def speciesStep(self, random_f=random): """Advances the state of the species by one timestep. Specifically: For each species in the current species, decides whether it's going to produce a birth or a death. If a species dies, delete it from the list of current species. If a species gives birth, add two child nodes to the list of current species, duplicate all their genes, and delete the original node from the list of taxa. Otherwise, add the timestep to the node's branchlength. """ #make the species that are going to duplicate self._species_birth_step(random_f) #kill the species that are going to die self._species_death_step(random_f) self._kill_orphan_genes_step() def _species_death_step(self, random_f): """Kills species in self.""" sd = self.SpeciesDeath new_list = [] for s in self.CurrSpecies: if not sd(random_f(), s): new_list.append(s) self.CurrSpecies[:] = new_list[:] def _kill_orphan_genes_step(self): """Kills genes whose species has been removed.""" new_list = [] species_dict = dict.fromkeys(map(id, self.CurrSpecies)) for g in self.CurrGenes: if id(g.Species) in species_dict: new_list.append(g) self.CurrGenes[:] = new_list[:] def _species_birth_step(self, random_f): sb = self.SpeciesBirth new_list = [] for s in self.CurrSpecies: if sb(random_f(), s): new_list.extend(self._duplicate_species(s)) else: new_list.append(s) self.CurrSpecies[:] = new_list[:] def _duplicate_species(self, species): """Duplicates a species by duplicating all its genes. WARNING: Doesn't remove from self.CurrSpecies: must do outside function (so can use while iterating over self.CurrSpecies). 
""" for i in self.CurrGenes: assert i.Species in self.CurrSpecies for i in self.CurrSpecies: assert not i.Children if self.DEBUG: print '*** DUPLICATING SPECIES' print "SPECIES GENES AT START: ", len(species.Genes) sc = self.SpeciesClass #make new species first_child, second_child = sc(), sc() children = [first_child, second_child] #update species parent/child refs species.Children = children first_child.Parent = species second_child.Parent = species #update other child properties first_child.CurrSpecies = species.CurrSpecies first_child.Genes = [] first_child.Length = 0 second_child.CurrSpecies = species.CurrSpecies second_child.Genes = [] second_child.Length = 0 #update gene references curr_genes = self.CurrGenes if self.DEBUG: print "GENES BEFORE SWEEP: ", len(curr_genes) print "NUM GENES IN SPECIES: ", len(species.Genes) gene_counter = 0 for gene in species.Genes[:]: assert gene.Species is species if self.DEBUG: print "handling gene ", gene_counter gene_counter += 1 curr_genes.remove(gene) assert gene not in curr_genes for i in curr_genes: assert (i.Species in self.CurrSpecies) or \ i.Species in [first_child, second_child] curr_genes.extend(self._duplicate_gene(gene, \ gene.Species, first_child, second_child)) for i in curr_genes: assert (i.Species in self.CurrSpecies) or \ i.Species in [first_child, second_child] if self.DEBUG: print "GENES IN FIRST CHILD: ", len(first_child.Genes) print "GENES IN SECOND CHILD: ", len(second_child.Genes) print "GENES AFTER SWEEP: ", len(curr_genes) self.SpeciesTree.assignIds() if self.DEBUG: print "SPECIES TREE: ", self.SpeciesTree if self.DEBUG: print "SPECIES ASSIGNMENTS FOR EACH GENE" for i in curr_genes: if self.DEBUG: print i.Species.Id assert (i.Species in self.CurrSpecies) or i.Species in [first_child, second_child] return children def updateLengths(self): """Adds timestep to the branch lengths of surviving genes/species.""" ts = self.TimePerStep for gene in self.CurrGenes: gene.Length += ts for species in 
self.CurrSpecies:
            species.Length += ts

    def __call__(self, filter=True, exact_species=False, exact_genes=False, \
        random_f=random):
        """Returns a new tree using params in self.

        If filter is True (the default), gets rid of extinct lineages.
        If exact is True (default is False), raises an exception if we didn't
        get the right number of taxa.

        WARNING: Because multiple births can happen in the same timestep, you
        might get more than the number of taxa you specify. Check afterwards!
        """
        self._init_vars()
        done = False
        while not done:
            if self.DEBUG:
                print "CURR STEP:", self.CurrStep
            for i in self.CurrGenes:
                assert i.Species in self.CurrSpecies
            self.geneStep(random_f)
            for i in self.CurrGenes:
                assert i.Species in self.CurrSpecies
            self.speciesStep(random_f)
            for i in self.CurrGenes:
                assert i.Species in self.CurrSpecies
            self.updateLengths()
            self.CurrStep += 1
            done = not (self.timeOk() and self.genesOk() and \
                self.speciesOk() and self.genomeOk())
        #check if all the constraints were met
        if not (self.CurrSpecies or self.CurrGenes):
            raise ExtinctionError, "All taxa are extinct."
        if exact_species and self.MaxSpecies and \
            (len(self.CurrSpecies) != self.MaxSpecies):
            raise TooManyTaxaError, "Got %s species, not %s." % \
                (len(self.CurrSpecies), self.MaxSpecies)
        if exact_genes and self.MaxGenes and \
            (len(self.CurrGenes) != self.MaxGenes):
            raise TooManyTaxaError, "Got %s genes, not %s." % \
                (len(self.CurrGenes), self.MaxGenes)
        #filter if required
        if filter:
            if self.DEBUG:
                print "***FILTERING..."
            self.SpeciesTree.assignIds()
            if self.DEBUG:
                print "BEFORE PRUNE: ", self.SpeciesTree
            self.SpeciesTree.filter(self.CurrSpecies, keep=True)
            if self.DEBUG:
                print "AFTER PRUNE: ", self.SpeciesTree
            for i, t in enumerate(self.GeneTrees):
                t.filter(self.CurrGenes)
        return self.SpeciesTree, self.GeneTrees

PyCogent-1.5.3/cogent/seqsim/markov.py000644 000765 000024 00000051631 12024702176 020700 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""markov.py: various types of random and non-random generators.
Currently provides: MarkovGenerator: reads in k-word frequencies from a text, and generates random text based on those frequencies. Also calculates the average entropy of the (k+1)th symbol (for sufficiently large k, converges on the entropy of the text). NOTE: The text must be a list of strings (e.g. lines of text). If a single string is passed into the constructor it should be put into a list (i.e. ['your_string']) or it will result in errors when calculating kword frequencies. """ from __future__ import division from operator import mul from random import choice, shuffle, randrange from cogent.maths.stats.util import UnsafeFreqs as Freqs from cogent.util.array import cartesian_product from cogent.maths.stats.test import G_fit from copy import copy,deepcopy from numpy import ones, zeros, ravel, array, rank, put, argsort, searchsorted,\ take from numpy.random import random __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Jesse Zaneveld", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" class MarkovGenerator(object): """Holds k-word probabilities read from file, and can generate text.""" def __init__(self, text=None, order=1, linebreaks=False, \ calc_entropy=False, freqs=None, overlapping=True, \ array_pseudocounts=0, delete_bad_suffixes=True): """Sets text and generates k-word frequencies.""" self.Text = text self.Linebreaks = linebreaks self.Order = order self.Frequencies = freqs or {} self.RawCounts = {} self.FrequencyArray=None self.CountArray=None self.ExcludedCounts=None self.ArrayPseudocounts=array_pseudocounts self._calc_entropy = calc_entropy self.Entropy = None self.Prior = None self.Overlapping=overlapping if self.Text: self.calcFrequencies(delete_bad_suffixes) def calcFrequencies(self, delete_bad_suffixes=True): """For order k, gets the (k-1)-word frequencies plus what follows.""" 
#reset text if possible -- but it might just be a string, so don't #complain if the reset fails. overlapping=self.Overlapping try: self.Text.reset() except AttributeError: try: self.Text.seek(0) except AttributeError: pass k = self.Order if k < 1: #must be 0 or '-1': just need to count single bases self._first_order_frequency_calculation() else: #need to figure out what comes after the first k bases all_freqs = {} for line in self.Text: if not self.Linebreaks: line = line.strip() #skip the line if it's blank if (not line): continue #otherwise, make a frequency distribution of symbols end = len(line) - k if overlapping: rang=xrange(end) else: rang=xrange(0,end,(k+1)) for i in rang: word, next = line[i:i+k], line[i+k] curr = all_freqs.get(word, None) if curr is None: curr = Freqs({next:1}) all_freqs[word] = curr else: curr += next if self._calc_entropy: self.Entropy = self._entropy(all_freqs) self.Frequencies = all_freqs if delete_bad_suffixes: self.deleteBadSuffixes() self.RawCounts=deepcopy(all_freqs) #preserve non-normalized freqs for dist in self.Frequencies.values(): dist.normalize() def wordToUniqueKey\ (self,word, conversion_dict={'a':0,'c':1,'t':2,'g':3}): #since conversion_dict values are used as array indices later, #values of conversion dict should range from 0 to (n-1), #where n=number of characters in your alphabet uniqueKey=0 alpha_len = len(conversion_dict) for i in range(0,len(word)): uniqueKey += (conversion_dict[word[i]]*alpha_len**i) return uniqueKey def makeCountArray(self): """Generates a 1 column array with indices equal to the keys for each kword + character and raw counts of the occurances of that key as values. 
This allows counts for many k+1 long strings to be found simultaneously with evaluateArrayProbability (which also normalizes the raw counts to frequencies)""" #print "makeCountArray: before replaceDegen self.Rawcounts=",\ # self.RawCounts self.replaceDegenerateBases() #print "makeCountArray:self.RawCounts=",self.RawCounts #debugging counts=self.RawCounts self.CountArray=zeros((4**(self.Order+1)),'f') #Order 0 --> 4 spots ('a','c','t','g') 1 --> 16 etc #TODO: may generate problems if Order = -1 if self.Order==0: #print "makeCountArray:counts=",counts #debugging for key in counts['']: #print "attempting to put",float(counts[''][key]),"into index",\ # self.wordToUniqueKey(key),\ # "of array CountArray=",self.CountArray #debugging put(self.CountArray,self.wordToUniqueKey(key),\ float(counts[''][key])) self.CountArray[self.wordToUniqueKey(key)]=counts[''][key] #print "placement successful!" #debugging else: for kword in counts.keys(): for key in counts[kword]: index=self.wordToUniqueKey(kword+key) #debugging #print "attempting to put",counts[kword][key],"at index",\ # index,"of self.CountArray, which =",self.CountArray put(self.CountArray,index,counts[kword][key]) #print "placement sucessful!" #debugging #print "makeCountArray:raveling self" #debugging if self.ArrayPseudocounts: self.CountArray = self.CountArray + float(self.ArrayPseudocounts) # adds to each count, giving unobserved keys frequency #pseudocounts/n #n= number of observed counts (rather than 0 frequency) # When the number of pseudocounts added is one, # this is 'Laplace's rule' #(See 'Biological Sequence Analysis',Durbin et. 
al, p.115) self.CountArray=ravel(self.CountArray) #print "makeCountArray:final CountArray=",self.CountArray def updateFrequencyArray(self): """updates the frequency array by re-normalizing CountArray""" self.FrequencyArray=deepcopy(self.CountArray) #preserve raw counts total_counts=sum(self.FrequencyArray) self.FrequencyArray=self.FrequencyArray/total_counts def replaceDegenerateBases(self,normal_bases=['a','t','c','g']): """remove all characters from self.Text that aren't a,t,c or g and replace them with random characters (when degenerate characters are rare, this is useful because it avoids assigning all kwords with those characters artificially low conditional probabilities)""" def normalize_character(base,bases=normal_bases): if base not in bases: base=choice(bases) return base text=self.Text for i in range(len(text)): text[i]=\ ''.join(map(normalize_character,text[i].lower())) self.Text=text def deleteBadSuffixes(self): """Deletes all suffixes that can't lead to prefixes. For example, with word size 3, if acg is present but cg* is not present, acg is not allowed. Need to repeat until no more suffixes are deleted. """ f = self.Frequencies #loop until we make a pass where we don't delete anything deleted = True while deleted: deleted = False for k, v in f.items(): suffix = k[1:] for last_char in v.keys(): #if we can't make suffix + last_char, can't select that char if suffix + last_char not in f: del v[last_char] deleted=True if not v: #if we deleted the last item, delete prefix del f[k] deleted = True def _entropy(self, frequencies): """Calcuates average entropy of the (k+1)th character for k-words.""" sum_ = 0. sum_entropy = 0. count = 0. for i in frequencies.values(): curr_entropy = i.Uncertainty curr_sum = sum(i.values()) sum_ += curr_sum sum_entropy += curr_sum * curr_entropy count += 1 return sum_entropy/sum_ def _first_order_frequency_calculation(self): """Handles single-character calculations, which are independent. 
Specifically, don't need to take into account any other characters, and can just feed the whole thing into a single Freqs. """ freqs = Freqs('') for line in self.Text: freqs += line #get rid of line breaks if necessary if not self.Linebreaks: for badkey in ['\r', '\n']: try: del freqs[badkey] except KeyError: pass #don't care if there weren't any #if order is negative, equalize the frequencies if self.Order < 0: for key in freqs: freqs[key] = 1 self.RawCounts= {'':deepcopy(freqs)} freqs.normalize() self.Frequencies = {'':freqs} def next(self, length=1, burn=0): """Generates random text of specified length with current freqs. burn specifies the number of iterations to throw away while the chain converges. """ if self.Order < 1: return self._next_for_uncorrelated_model(length) freqs = self.Frequencies #cache reference since it's frequently used #just pick one of the items at random, since calculating the weighted #frequencies is not possible without storing lots of extra info keys = freqs.keys() curr = choice(keys) result = [] for i in range(burn +length): next = freqs[curr].choice(random()) if i >= burn: result.append(next) curr = curr[1:] + next return ''.join(result) def _next_for_uncorrelated_model(self, length): """Special case for characters that don't depend on previous text.""" return ''.join(self.Frequencies[''].randomSequence(length)) def evaluateProbability(self,seq): """Evaluates the probability of generating a user-specified sequence given the model.""" conditional_prob=1 order=self.Order for i in range(0,(len(seq)-(order)),1): k=seq[i:i+order+1] try: conditional_prob *= self.Frequencies[k[:-1]][k[-1]] except KeyError: #if key not in Frequencies 0 < Freq < 1/n #To be conservative in the exclusion of models, use 1/n if conditional_prob: conditional_prob *= 1.0/(float(len(self.Text))) else: conditional_prob = 1.0/(float(len(self.Text))) return conditional_prob def evaluateWordProbability(self,word): k=word[:self.Order+1] try: conditional_prob= 
self.Frequencies[k[:-1]][k[-1]] except KeyError: conditional_prob = 1.0/(float(len(self.Text))) return conditional_prob def evaluateArrayProbability(self,id_array): #takes an array of unique integer keys #corresponding to (k+1) long strings #[can be generated by self.wordToUniqueKey()] #Outputs probability if self.FrequencyArray is None: if self.CountArray is None: self.makeCountArray() self.updateFrequencyArray() freqs=take(self.FrequencyArray,id_array) prob=reduce(mul,freqs) return float(prob) def evaluateInitiationFrequency(self,kword,\ allowed_bases=['a','t','c','g']): # takes a unique key corresponding to a k long word # calculates the initiation frequency for that kword # which is equal to its relative frequency #TODO: add case where order is < 1 if len(kword) != (self.Order): raise KwordError #kword must be equal to markov model order if self.CountArray is None: self.makeCountArray() unique_keys=[] #add term for each possible letter for base in allowed_bases: unique_keys.append(self.wordToUniqueKey(kword+base)) id_array=ravel(array(unique_keys)) counts=take(self.CountArray,id_array) total_kword_counts=sum(counts) total_counts=sum(self.CountArray) prob=float(total_kword_counts)/float(total_counts) return prob def excludeContribution(self,excluded_texts): #"""Excludes the contribution of a set of texts #from the markov model. This can be useful, for example, #to prevent self-contribution of the data in a gene to #the model under which that gene is evaluated. # #A Markov Model is made from the strings, converted to a CountArray, #and then that Count array (stored as ExcludedCounts) is subtracted #from the current CountArray, and FrequencyArray is updated. # #The data excluded with this function can be restored with #restoreContribution # #Only one list of texts can be excluded at any time. 
If a list of
        #texts is already excluded when excludeContribution is called, that
        #data will be restored before the new data is excluded"""
        if self.CountArray is None:
            self.makeCountArray()
        if self.ExcludedCounts:
            self.restoreContribution()
        #generate mm using same parameters as current model
        exclusion_model = MarkovGenerator(excluded_texts, order=self.Order, \
            overlapping=self.Overlapping)
        exclusion_model.makeCountArray()
        self.ExcludedCounts = exclusion_model.CountArray
        self.CountArray = self.CountArray - self.ExcludedCounts
        self.updateFrequencyArray()

    def restoreContribution(self):
        """Restores data excluded using excludeContribution, and
        renormalizes FrequencyArray"""
        if self.ExcludedCounts:
            self.CountArray += self.ExcludedCounts
            self.ExcludedCounts = None
            self.updateFrequencyArray()

def count_kwords(source, k, delimiter=''):
    """Makes dict of {word:count} for specified k."""
    result = {}
    #reset to beginning if possible
    if hasattr(source, 'seek'):
        source.seek(0)
    elif hasattr(source, 'reset'):
        source.reset()
    if isinstance(source, str):
        if delimiter:
            source = source.split(delimiter)
        else:
            source = [source]
    for s in source:
        for i in range(len(s) - k + 1):
            curr = s[i:i+k]
            if curr in result:
                result[curr] += 1
            else:
                result[curr] = 1
    return result

def extract_prefix(kwords):
    """Converts dict of {w:count} to {w[:-1]:{w[-1]:count}}"""
    result = {}
    for w, count in kwords.items():
        prefix = w[:-1]
        suffix = w[-1]
        if prefix not in result:
            result[prefix] = {}
        result[prefix][suffix] = count
    return result

def _get_expected_counts(kwords, kminus1):
    """Gets expected counts from counts of 2 successive kword lengths."""
    result = []
    prefixes = extract_prefix(kminus1)
    for k in kwords:
        #E[c(w)] = c(w[:-1]) * c(w[1:]) / c(w[1:-1]), where the count of the
        #(k-2)-word w[1:-1] is recovered by summing over its suffixes
        result.append(kminus1[k[:-1]] * kminus1[k[1:]] / \
            sum(prefixes[k[1:-1]].values()))
    return result

def _pair_product(p, i, j):
    """Return product of counts of i and j from data."""
    try:
        return sum(p[i].values()) * sum(p[j].values())
    except KeyError:
        return 0

def markov_order(word_counts, k, alpha):
    """Estimates Markov order of a source, using G test for fit.

    Uses following procedure: A source depends on the previous k letters
    more than the previous (k-1) letters iff Pr(a|w_k) != Pr(a|w_{k-1}) for
    all words of length k. If we know Pr(a|w) for all symbols a and words of
    length k and k-1, we would expect count(a|w_i) to equal
    count(a|w_i[1:]) * count(w)/count(w[1:]). We can compare these expected
    frequencies to observed frequencies using the G test.

    max_length: maximum correlation length to try
    """
    if k == 0:
        #special case: test for unequal freqs
        obs = word_counts.values()
        total = sum(obs)    #will remain defined through loop
        exp = [total/len(word_counts)] * len(word_counts)
    elif k == 1:
        #special case: test for pair freqs
        prefix_counts = extract_prefix(word_counts)
        total = sum(word_counts.values())
        words = word_counts.keys()
        exp = [_pair_product(prefix_counts, w[0], w[1])/total for w in words]
        obs = word_counts.values()
    else:
        # k >= 2: need to do general Markov chain
        #expect count(a_i.w.b_i) to be Pr(b_i|w)*count(a_i.w)
        cwb = {}    #count of word.b
        cw = {}     #count of word
        caw = {}    #count of a.word
        #build up counts of prefix, word, and suffix
        for word, count in word_counts.items():
            aw, w, wb = word[:-1], word[1:-1], word[1:]
            if not wb in cwb:
                cwb[wb] = 0
            cwb[wb] += count
            if not aw in caw:
                caw[aw] = 0
            caw[aw] += count
            if not w in cw:
                cw[w] = 0
            cw[w] += count
        obs = word_counts.values()
        exp = [cwb[w[1:]]/(cw[w[1:-1]])*caw[w[:-1]] for w in \
            word_counts.keys()]
    return G_fit(obs, exp)

def random_source(a, k, random_f=random):
    """Makes a random Markov source on alphabet a with memory k.

    Specifically, for all words k, pr(i|k) = rand().
""" result = dict.fromkeys(map(''.join, cartesian_product([a]*k))) for k in result: result[k] = Freqs(dict(zip(a, random_f(len(a))))) return result def markov_order_tests(a, max_order=5, text_len=10000, verbose=False): """Tests of the Markov order inferrer using Markov chains of diff. orders. """ result = [] max_estimated_order = max_order + 2 for real_order in range(max_order): print "Actual Markov order:", real_order s = random_source(a, real_order) m = MarkovGenerator(order=real_order, freqs=s) text = m.next(text_len) for word_length in range(1, max_estimated_order+1): words = count_kwords(text, word_length) g, prob = markov_order(words, word_length-1, a) if verbose: print "Inferred order: %s G=%s P=%s" % (word_length-1, g, prob) result.append([word_length-1, g, prob]) return result if __name__ == '__main__': """Makes text of specified # chars from training file. Note: these were tested from the command line and confirmed working by RK on 8/6/07. """ from sys import argv, exit if len(argv) == 2 and argv[1] == 'm': markov_order_tests('tcag', verbose=True) elif len(argv) == 3 and argv[1] == 'm': infilename = argv[2] max_estimated_order = 12 text = open(infilename).read().split('\n') for word_length in range(1, max_estimated_order): words = count_kwords(text, word_length) g, prob = markov_order(words, word_length-1, 'ATGC') print "Inferred order: %s G=%s P=%s" % (word_length-1, g, prob) else: try: length = int(argv[1]) max_order = int(argv[2]) text = open(argv[3], 'U') except: print "Usage: python markov.py num_chars order training_file" print "...or python markov.py m training_file to check order" exit() for order in range(max_order + 1): m = MarkovGenerator(text, order, calc_entropy=True) print order,':', 'Entropy=', m.Entropy, m.next(length=length) PyCogent-1.5.3/cogent/seqsim/microarray.py000644 000765 000024 00000004276 12024702176 021554 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Methods for using trees to generate microarray data. 
""" from cogent.seqsim.tree import RangeNode from cogent.util.array import mutate_array __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" class MicroarrayNode(RangeNode): def __init__(self, Length=0, Array=None, *args, **kwargs): """Returns new MicroarrayNode object. Length: float giving the branch length (sd to add to data) Array: array of float giving the expression vector, or None Additional args for superclass: Name: usually a text label giving the name of the node LeafRange: range of leaves that the node spans Id: unique numeric identifier from the node Children: list of Node objects specifying the children Parent: Node object specifying the parent """ RangeNode.__init__(self, *args, **kwargs) self.Length = Length self.Array = Array def setExpression(self, vec): """Sets expression in self and all children. WARNING: Will overwrite existing array with new array passed in. Expects vec to be a floating-point Numeric array. """ #if it's the root or zero branch length, just set the vector if not self.Length: self.Array = vec.copy() else: self.Array = mutate_array(vec, self.Length) curr_children = self.Children curr_arrays = [self.Array] * len(curr_children) while len(curr_children): new_children = [] new_arrays = [] for c,a in zip(curr_children, curr_arrays): if not c.Length: c.Array = a.copy() else: c.Array = mutate_array(a, c.Length) new_children.extend(c.Children) new_arrays.extend([c.Array] * len(c.Children)) curr_children = new_children curr_arrays = new_arrays PyCogent-1.5.3/cogent/seqsim/microarray_normalize.py000644 000765 000024 00000015016 12024702176 023626 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """microarray_normalize.py: provides functions to normalize array data. 
Will use for testing normalization methods against simulated data.

Implementation notes:

All normalization methods should use the interface f(a) -> n, where a is an
array in which the rows are genes (or probes) and the columns are samples
(i.e. chips); n is an array of the same shape that contains the normalized
values.

All probe consolidation methods should be f(probes, groups) -> genes where
probes is the array of probes (rows = probes, cols = samples), and genes is
the array of genes.

All platform consolidation methods should be f([p_1, p_2, ...]) -> g where
the input is a list of probe arrays for each platform, and the output is a
single gene array.

All missing value imputation methods should be f(a, m) -> n where a is an
array where each row is a gene (or probe), m is a dict such that m[(x,y)] is
True if a[x,y] is missing (we are assuming that missing values are rare),
and n is a dense array containing imputed values at missing positions.

Revision History

10/28/05 Rob Knight: file created.
11/10/05 Micah Hamady: merged w/my implementations
"""
from cogent.maths.stats.distribution import ndtri
from numpy import ceil, arange, argsort, sort, array, log2, zeros, ravel, \
    transpose, take, mean, std, nonzero
from numpy.linalg import svd

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Micah Hamady", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"

class NormalizationError(ValueError):
    """Raised when normalization cannot proceed.

    NOTE: assumed definition; the original declaration falls outside this
    excerpt, but housekeeping_gene_normalize below raises it.
    """
    pass

def zscores(a):
    """Converts a to zscores in each col, i.e. subtract mean and div by stdev."""
    return (a - mean(a, axis=0)) / std(a, axis=0)

def logzscores(a):
    """Takes log (base 2) of values then computes zscores"""
    return zscores(log2(a))

def ranks(a):
    """Converts a to absolute ranks in each col, 0 = smallest value.

    Doesn't break ties: instead, assigns arbitrary ranks.
    """
    return argsort(argsort(a, 0), 0)

def quantiles(a):
    """Converts a to quantiles p(x[old] index."""
    #NOTE: truncated in the archive: the docstring above runs into the start
    #of omit_rows' docstring, and the body of quantiles is missing.

def omit_rows(a, f):
    #NOTE: the def line and the start of this docstring were lost in the
    #same truncation; the signature is reconstructed from the callers below.
    """All omit_rows functions will return both these values."""
    mask = array(map(f, a))
    coords = nonzero(mask)
    return take(a, coords), coords

def omit_rows_below_mean_threshold(a, t):
    """Returns copy of a without rows that have mean below threshold."""
    return omit_rows(a, lambda f: mean(f) >= t)

def omit_rows_below_max_threshold(a, t):
    """Returns copy of a without rows that have max val below threshold."""
    return omit_rows(a, lambda f: max(f) >= t)

def group_rows(a, groups):
    """Converts a into list of groups, specified as lists of lists of indices.

    i.e. groups should be a list of groups, where each group contains the
    indices of the rows that belong to it.
    """
    return [take(a, indices) for indices in groups]

def max_per_group(a, groups):
    """Returns array with max item for each col for each group."""
    return array(map(max, group_rows(a, groups)))

def mean_per_group(a, groups):
    """Returns array with mean for each col for each group."""
    return array(map(mean, group_rows(a, groups)))

def housekeeping_gene_normalize(a, housekeeping_gene_indexes):
    """Normalize matrix based on mean/std dev of housekeeping genes.

    Need to refactor to use more efficient array operations (in place).

    a: microarray data. expects rows to be genes, columns to be samples
    housekeeping_gene_indexes: list of indexes of genes (rows) that
        represent the housekeeping set
    """
    norm_values = ravel(take(a, housekeeping_gene_indexes, 1))
    hk_mean = mean(norm_values)
    hk_std_dev = std(norm_values)
    if not hk_std_dev:
        raise NormalizationError, "Cannot normalize, std dev is zero."
    return (a - hk_mean) / hk_std_dev

PyCogent-1.5.3/cogent/seqsim/randomization.py

#!/usr/bin/env python
"""Provides utility methods for randomization.
"""
from random import shuffle
from cogent.util.misc import find_many

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"

def shuffle_range(items, start, end):
    """Randomizes region of items between start and end indices, in-place.

    Shuffle affects items[start] but not items[end], as usual for slice
    operations. Items slices must be assignable.

    No return value, as per standard for in-place operations.
    """
    length = len(items)
    #handle negative indices
    if start < 0:
        start += length
    if end < 0:
        end += length
    #skip if 0 or 1 items
    if end - start > 1:     #shuffle has no effect on 0 or 1 item
        curr = items[start:end]
        shuffle(curr)
        items[start:end] = curr[:]

def shuffle_between(items, constructor=None):
    """Returns function that shuffles each part of seq between specified items.

    If constructor is not passed, uses ''.join() for subclasses of strings,
    and seq.__class__ for everything else.

    Note that each interval between items is shuffled independently: use
    shuffle_except to keep a particular item in place while shuffling
    everything else.

    The resulting function takes only the sequence as an argument, so can
    be passed e.g. to map().
    """
    c = constructor
    def result(seq):
        """Returns copy of seq shuffled between specified items."""
        #figure out the appropriate constructor, if not supplied
        if c is None:
            if isinstance(seq, str):
                constructor = lambda x: seq.__class__(''.join(x))
            else:
                constructor = seq.__class__
        else:
            constructor = c
        #figure out where to cut the sequence
        cut_sites = find_many(seq, items)
        #want to shuffle sequence before first and after last match as well
        if (not cut_sites) or cut_sites[0] != 0:
            cut_sites.insert(0, 0)
        seq_length = len(seq)
        if cut_sites[-1] != seq_length:
            cut_sites.append(seq_length)
        #shuffle each pair of (i, i+1) matches, excluding position of match
        curr_seq = list(seq)
        for start, end in zip(cut_sites, cut_sites[1:]):
            shuffle_range(curr_seq, start+1, end)   #remember to exclude start
        #return the shuffled copy
        return constructor(curr_seq)
    #return the resulting function
    return result

#example of shuffle_between
shuffle_peptides = shuffle_between('KR', ''.join)

def shuffle_except_indices(items, indices):
    """Shuffles all items in list in place, except at specified indices.

    items must be slice-assignable. Uses linear algorithm, so suitable for
    long lists.
    """
    length = len(items)
    if not length:
        return []
    orig = items[:]
    sorted_indices = indices[:]
    #make indices positive
    for i, s in enumerate(sorted_indices):
        if s < 0:
            sorted_indices[i] = length + s
    sorted_indices.sort()
    sorted_indices.reverse()
    last = None
    for i in sorted_indices:
        if i != last:   #skip repeated indices in list
            del items[i]
            last = i
    shuffle(items)
    sorted_indices.reverse()
    last = None
    for i in sorted_indices:
        if i != last:
            items.insert(i, orig[i])
            last = i

def shuffle_except(items, constructor=None):
    """Returns function that shuffles a sequence except specified items.

    If constructor is not passed, uses ''.join() for subclasses of strings,
    and seq.__class__ for everything else.

    Note that all intervals are shuffled together; use shuffle_between to
    shuffle the items in each interval separately.
    The resulting function takes only the sequence as an argument, so can
    be passed e.g. to map().
    """
    c = constructor
    def result(seq):
        """Returns copy of seq shuffled except at specified items."""
        #figure out the appropriate constructor, if not supplied
        if c is None:
            if isinstance(seq, str):
                constructor = lambda x: seq.__class__(''.join(x))
            else:
                constructor = seq.__class__
        else:
            constructor = c
        #figure out where to cut the sequence
        cut_sites = find_many(seq, items)
        new_seq = list(seq)
        shuffle_except_indices(new_seq, cut_sites)
        return constructor(new_seq)
    #return the resulting function
    return result

PyCogent-1.5.3/cogent/seqsim/searchpath.py

#!/usr/bin/env python
"""SearchPath and SearchNode classes for generating random w/rules strings.

SearchPath is a class that can generate strings that follow certain rules
re: what alphabet may appear at each position in the string and whether any
substrings are forbidden. If it encounters a forbidden substring, it is
able to backtrack the minimum amount necessary to get around that. The main
method is "generate", which takes in the length of the string to generate
and can be called multiple times (without clearing state) to grow the path
incrementally. Generate returns the first good string of the correct length
that it finds--or, if none is possible given the constraints, it returns
None.

Internally, SearchPath treats each position as a SearchNode, which contains
a number of randomized options and knows how to remove options that have
been determined to produce forbidden substrings.

Revision History:
09/22/03 Amanda Birmingham: created from generalized elements of what
    started as primer_builder.py
11/04/03 Amanda Birmingham: removed unnecessary increment of position
    variable in SearchPath.generate. Renamed public properties with
    leading uppercase to match standards.
12/03/03 Amanda Birmingham: renamed _remove_option (in SearchPath) to
    removeOption to indicate it is now public
12/05/03 Amanda Birmingham: made DEFAULT_KEY property of SearchPath public
12/15/03 Amanda Birmingham: renamed _find_allowed_option (in SearchPath) to
    findAllowedOption to indicate it is now public
12/16/03 Amanda Birmingham: added optional parameter to removeOption to
    allow removal of accepted options from a completed path; see comment
    in removeOption for more details. Necessary to fix bug when
    backtracking from a completed primer.
01/05/04 Amanda Birmingham: Altered init of SearchPath so that the default
    value of forbidden_seqs is defined as None, not []--Rob says that using
    a default mutable object is a big no-no.
01/20/04 Amanda Birmingham: updated to fit into cvsroot and use packages.
"""
from random import shuffle
from cogent.util.misc import toString, makeNonnegInt

__author__ = "Amanda Birmingham"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Amanda Birmingham"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Amanda Birmingham"
__email__ = "amanda.birmingham@thermofisher.com"
__status__ = "Production"

class SearchPath(object):
    """Represents one path through a random w/rules search tree."""

    DEFAULT_KEY = "default"

    def __init__(self, alphabets_by_position, forbidden_seqs=None):
        """Prepopulate with input forbidden values.

        alphabets_by_position: a dictionary-like object keyed by position
            in searchpath, specifying the alphabet that should be used at
            that position. Must include a "default" key and alphabet
        forbidden_seqs: a list containing sequences that may never occur.
            In the case of primers, this should include runs of 4 purines
            or pyrimidines, as well as any user-defined forbidden seqs.
        """
        #reminder: can't use mutable object as default!
        if forbidden_seqs is None:
            forbidden_seqs = []

        #store the alphabet dictionary and forbidden len for later use
        self._alphabets = alphabets_by_position
        try:
            if self.DEFAULT_KEY not in self._alphabets:
                raise ValueError, "alphabets_by_position param must " + \
                    "contain a " + self.DEFAULT_KEY + " key"
            #end if
        except TypeError:
            raise ValueError, "alphabets_by_position param must function" + \
                " as a dictionary"
        #end except

        #create the private dict of fixed, unacceptable values
        input_forbidden = [(i.upper(), True) for i in forbidden_seqs]
        self._fixed_forbidden = dict(input_forbidden)
        #create a dictionary holding the lengths of fixed forbidden seqs
        self._forbidden_lengths = self._get_forbidden_lengths()
        #create the variable_forbidden and path_stack properties
        self.clearNodes()
    #end __init__

    def __str__(self):
        """Create a human-readable representation of object."""
        return toString(self)
    #end __str__

    def _get_top_index(self):
        """Return the value of top index of path stack"""
        result = None
        path_length = len(self._path_stack)
        if path_length > 0:
            result = path_length - 1
        return result
    #end _get_top_index

    _top_index = property(_get_top_index)

    def _get_value(self):
        """Read the curr value of each node, concat, and return string"""
        node_vals = []
        for curr_node in self._path_stack:
            node_vals.append(curr_node.Value)
        #next node
        return "".join(node_vals)
    #end _get_value

    Value = property(_get_value)

    def clearNodes(self):
        """Clear the node stack"""
        #create a stack to hold the path (most recent selection will go
        #on top)
        self._path_stack = []
    #end clearNodes

    def _get_forbidden_lengths(self):
        """Return a dictionary of lengths of all fixed forbidden sequences"""
        lengths = {}
        #for each key in self._fixed_forbidden, say we need to check
        #nmers of that length.
        for seq in self._fixed_forbidden:
            lengths[len(seq)] = True
        return lengths
    #end _get_forbidden_lengths

    def _get_top(self):
        """Retrieve (but don't pop) the top item on the path stack."""
        result = None
        if self._top_index is not None:
            result = self._path_stack[self._top_index]
        #end if there are actually any nodes on the stack
        return result
    #end _get_top

    def _add_node(self, new_searchnode):
        """Add new searchnode to top of path stack and increase top index.

        new_searchnode: a SearchNode object to add to the path stack.

        NOTE that it is the responsibility of the user of this function to
        pass a SearchNode; no checks are made, so GIGO
        """
        self._path_stack.append(new_searchnode)
    #end _add_node

    def _get_alphabet(self, position):
        """Return the alphabet for input position. If none, return default

        position: a nonnegative integer or something castable to it.
            Positions are assumed to be ZERO based.
        """
        position = makeNonnegInt(position)
        if position in self._alphabets:
            result = self._alphabets[position]
        else:
            result = self._alphabets[self.DEFAULT_KEY]
        #end if
        return result
    #end _get_alphabet

    def generate(self, path_length):
        """Generate a valid path of required length and return its value.

        Returns None if no path is possible.

        path_length: a nonnegative integer or castable to it. Indicates
            length of desired valid path.
        """
        path_length = makeNonnegInt(path_length)
        #while length of path stack < path_length
        while len(self._path_stack) < path_length:
            #always have to get the alphabet based on the current
            #top index plus 1.
            #This is because, if we have to remove
            #a node because it contributes to a forbidden sequence,
            #we need to make sure that the position used to generate
            #the next node is the same
            position = self._top_index
            if position is None:
                position = -1
            position += 1
            #get the alphabet for the next node
            curr_alphabet = self._get_alphabet(position)
            #make new SearchNode and push it onto path
            new_node = SearchNode(curr_alphabet)
            self._add_node(new_node)
            #select the next available allowed option
            option_exists = self.findAllowedOption()
            #if no more options, no valid searchpath is possible
            #break out and return None
            if not option_exists:
                return None
        #end while
        #return path as string
        return self.Value
    #end generate

    def findAllowedOption(self):
        """Finds, sets next allowed option. Returns false if none exists."""
        #initially, assume that the current option is forbidden
        #(means we always check at least once)
        is_forbidden = True
        while is_forbidden == True:
            #check whether the current option of the top node is invalid
            is_forbidden = self._check_forbidden_seqs()
            if is_forbidden:
                options_remain = self.removeOption()
                #if the path is now empty, break out and return false
                if not options_remain:
                    return False
            #end if current option is forbidden
        #end while current option is forbidden
        #we found a good option, so do any necessary bookkeeping
        self._accept_option()
        return True
    #end findAllowedOption

    def _accept_option(self):
        """Bookkeeping to accept a good option. Default impl does nothing."""
        pass
    #end _accept_option

    def _remove_accepted_option(self):
        """Bookkeeping to remove previously accepted. Default does nothing."""
        pass
    #end _remove_accepted_option

    def removeOption(self, top_is_accepted=False):
        """Remove the current option and return false if no more exist"""
        result = True
        #if the top option is accepted, remove its nmer from
        #the accepted option list (before we remove the option)
        #This option is only used when calling removeOption
        #explicitly from another module after a path has been
        #generated.
        if top_is_accepted:
            self._remove_accepted_option()
        #tell top search node to removeOption
        curr_top = self._get_top()
        still_viable = curr_top.removeOption()
        #if that node is now out of options
        if not still_viable:
            #pop it off the path
            self._path_stack.pop()
            #remove the previously accepted option
            #(in node under the one we just popped)
            self._remove_accepted_option()
            #if stack is now empty
            if len(self._path_stack) == 0:
                result = False
            else:
                #call recursively to remove dead-end option(s)
                result = self.removeOption()
            #end if
        #end if
        return result
    #end removeOption

    def _check_forbidden_seqs(self):
        """Return t/f for whether current path has any forbidden seqs in it"""
        result = False  #assume current path has no forbidden nmers
        #for every length we need to check (bc there's a fixed forbidden
        #sequence with that length), get the current rightmost string
        #of that length from the path and see if it is in a forbidden list
        for length_to_test in self._forbidden_lengths:
            curr_nmer = self._get_nmer(length_to_test)
            #check if current nmer is forbidden in fixed or variable
            if (self._fixed_forbidden.__contains__(curr_nmer) or \
                self._in_extra_forbidden(curr_nmer)):
                result = True
                break
        #next length to check
        return result
    #end _check_forbidden_seqs

    def _in_extra_forbidden(self, nmer):
        """Return True if nmer forbidden in anything besides fixed forbid"""
        #default implementation *has* no other forbidden dictionaries, so
        #the nmer can't be *in* one
        return False
    #end _in_extra_forbidden

    def _get_nmer(self, n):
        """Get the string of the last N bases on the path stack.

        n: an integer or integer-castable value; nonnegative

        Returns None if n is greater than the number of searchnodes on the
        path stack.
        Otherwise, returns a string of the values of those nodes, in the
        order they were pushed on the stack
        """
        nmer = []
        result = None
        n = makeNonnegInt(n)
        #only try to get the nmer if there are at least n items on stack
        if n <= len(self._path_stack):
            #for n to 0 -- start with the furthest-back entry
            for temp_length in xrange(n, 0, -1):
                #Note that we add one bc temp length is base 1, while
                #top index isn't. Ex: a stack with length 5 has top
                #index 4. If you want the last 4, you want to start
                #at index 1. Thus, 4 - 4 + 1 = 1
                temp_index = self._top_index - temp_length + 1
                #get value of node at that index and add to list
                nmer.append(self._path_stack[temp_index].Value)
            #next intermediate length
            #join list to get current nmer and return
            result = "".join(nmer)
        #end if
        return result
    #end _get_nmer
#end SearchPath

class SearchNode(object):
    """Represents a single choice (base) in a search path (primer)."""

    def __init__(self, an_alphabet):
        """Create a node ready to be used."""
        self._alphabet = list(an_alphabet)
        self._options = self.Alphabet   #note: this gets a COPY
        shuffle(self._options)
        #don't need a current option index variable: current option is
        #always zero; it is the # of available options that changes
    #end __init__

    def __str__(self):
        """Create a human-readable representation of object."""
        return toString(self)
    #end __str__

    def _get_alphabet(self):
        """Get a copy of the class's list of choices"""
        return self._alphabet[:]
    #end _get_alphabet

    def _get_value(self):
        """Get the value of the current option."""
        return self._options[0]
    #end _get_value

    def _get_options(self):
        """Get a copy of the list of available options"""
        return self._options[:]
    #end _get_options

    Value = property(_get_value)
    Options = property(_get_options)
    Alphabet = property(_get_alphabet)

    def removeOption(self):
        """Remove current option and return t/f whether any more are left."""
        #assume, by default, that we won't run out of options,
        #then remove the current option from the options array
        result = True
        del self._options[0]
        #if there are no more options return false; otherwise, true
        if len(self._options) == 0:
            result = False
        return result
    #end removeOption
#end SearchNode

PyCogent-1.5.3/cogent/seqsim/sequence_generators.py

#!/usr/bin/env python
"""sequence_generators.py: various types of random and non-random generators.

Currently provides:

SequenceGenerator: fills in degenerate sequences, either by cycling through
all possibilities or by jumping to a particular sequence. Supports indexing,
iteration, and slicing.

Partition: generates all the ways of dividing n objects among b bins.
Useful for stepping through a space of compositions, or dividing a sequence.

The SequenceGenerators are fairly elaborate, and allow complex modeling of
RNA. The present implementation is based on Freqs, and is relatively slow.
An array-based implementation that uses seqsim.usage objects, cogent core
alphabets, etc. is in the works, and should have essentially the same
interface. However, this implementation is fairly well-tested and was used
to generate the data for the Knight et al. 2005 NAR paper on
hammerhead/isoleucine motif folding.
"""
from operator import mul
from types import SliceType
from sys import path
from random import choice, random, shuffle, randrange
from cogent.maths.stats.util import Freqs
from cogent.struct.rna2d import ViennaStructure
from cogent.app.vienna_package import RNAfold
from numpy import logical_and, fromstring, byte

__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"

IUPAC_DNA = {'T':'T','C':'C','A':'A','G':'G',
    'R':'AG','Y':'TC','W':'TA','S':'CG','M':'CA','K':'TG',
    'B':'TCG','D':'TAG','H':'TCA','V':'CAG','N':'TCAG'}

IUPAC_RNA = {'U':'U','C':'C','A':'A','G':'G',
    'R':'AG','Y':'UC','W':'UA','S':'CG','M':'CA','K':'UG',
    'B':'UCG','D':'UAG','H':'UCA','V':'CAG','N':'UCAG'}

def permutations(n, k):
    """Returns the number of ways of choosing k items from n, in order.

    Defined as n!/(n-k)!
    """
    #Validation: k must be between 0 and n (inclusive), and n must be >= 0.
    if k > n:
        raise IndexError, "can't choose %s items from %s" % (k, n)
    elif k < 0:
        raise IndexError, "can't choose negative number of items"
    elif n < 0:
        raise IndexError, "can't choose from negative number of items"
    product = 1
    for i in xrange(n-k+1, n+1):
        product *= i
    return product

def combinations(n, k):
    """Returns the number of ways of choosing k items from n.

    Defined as n!/(k!(n-k)!)
    """
    #Validation: k must be between 0 and n (inclusive), and n must be >= 0.
    if k > n:
        raise IndexError, "can't choose %s items from %s" % (k, n)
    elif k < 0:
        raise IndexError, "can't choose negative number of items"
    elif n < 0:
        raise IndexError, "can't choose from negative number of items"
    #combinations(n, k) = combinations(n, n-k), so reduce computation by
    #figuring out which requires calculation of fewer terms.
    if k > (n - k):
        larger = k
        smaller = n - k
    else:
        larger = n - k
        smaller = k
    product = 1
    #compute n!/larger! by multiplying terms from (larger+1) to n
    for i in xrange(larger+1, n+1):
        product *= i
    #divide by (smaller)! by dividing by terms from 2 to smaller
    for i in xrange(2, smaller+1):  #no need to divide by 1...
        product /= i    #ok to use integer division: should always be factor
    return product

def _slice_support(the_slice, length):
    """Takes a slice and the length of an object; returns normalized version.

    Specifically, corrects start and end for negative indices.
    """
    start = the_slice.start
    stop = the_slice.stop
    #ensure step is not zero or None, or we never move in the sequence!
    step = the_slice.step or 1
    #fill in missing values for start and end
    if start is None:
        start = 0
    if stop is None:
        stop = length
    #convert end-relative values to start-relative values
    if start < 0:
        start += length
    if stop < 0:
        stop += length
    return (start, stop, step)

class SequenceGenerator(object):
    """Generates all the possibilities for a degenerate template."""

    def __init__(self, template='', alphabet=None, start=None):
        """Returns a new SequenceGenerator based on template"""
        if alphabet:
            self.Alphabet = alphabet
        else:
            self.Alphabet = IUPAC_RNA
        self.Template = template
        if start:
            self.validate(start)
            self.Start = start
        else:
            self.Start = [0] * len(template)

    def validate(self, state):
        """Check that state is allowable given the template."""
        possibilities = map(len, map(self.Alphabet.__getitem__, self.Template))
        for max_allowed, curr in zip(possibilities, state):
            if curr >= max_allowed:
                raise ValueError, "Tried to set a state to too high an index."
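The counting helpers near the top of this file reduce C(n, k) to a short product-and-divide loop. As a quick standalone sanity check of that arithmetic, here is a minimal sketch: the bodies are repeated with the validation stripped (an assumption for brevity, not the module's full code) so the snippet runs on its own under Python 2 or 3.

```python
# Standalone sketch of the counting helpers above; validation omitted.

def permutations(n, k):
    """n!/(n-k)!: ordered ways of choosing k of n items."""
    product = 1
    for i in range(n - k + 1, n + 1):
        product *= i
    return product

def combinations(n, k):
    """n!/(k!(n-k)!): unordered ways of choosing k of n items."""
    smaller = min(k, n - k)     # exploit C(n,k) == C(n,n-k) to do less work
    # permutations(smaller, smaller) is just smaller!
    return permutations(n, smaller) // permutations(smaller, smaller)

print(permutations(5, 2))   # 5 * 4 = 20
print(combinations(5, 2))   # 20 / 2! = 10
```

The same larger/smaller trick appears in the full `combinations` above: dividing the shorter falling product by the smaller factorial keeps every intermediate result an integer.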
    def __str__(self):
        """Returns data about current iterator's template"""
        return "<SequenceGenerator: Template = %s, Alphabet = %s>" \
            % (self.Template, self.Alphabet)

    def __len__(self):
        """Returns the number of elements in all possible expansions."""
        return self.numPossibilities()

    def numPossibilities(self):
        """Same as __len__, except Python doesn't coerce result to an int"""
        if self.Template:
            return reduce(mul, map(len, map(self.Alphabet.__getitem__, \
                self.Template)))
        else:
            return 0

    def _index2state(self, index):
        """Takes an index and returns the corresponding state."""
        expansions = map(self.Alphabet.__getitem__, self.Template)
        num_items = len(expansions)
        lengths = map(len, expansions)
        indices = range(num_items)
        indices.reverse()   #want to traverse in reverse order
        states = [0] * num_items    #initialize with zero
        for i in indices:
            if not index:
                break   #if index is zero, so is everything to the left of i
            if lengths[i] == 1:
                continue    #skip anything that can't vary
            else:
                choices = lengths[i]
                states[i] = index % choices
                index //= choices
        return states

    def __getitem__(self, index):
        """Supports indexing. Now uses constant-time algorithm (fast)."""
        if type(index) is SliceType:
            return self._handle_slice(index)
        else:
            if index < 0:
                index = self.__len__() + index
            iterator = self.items(self._index2state(index))
            return iterator.next()

    def _handle_slice(self, index):
        """Needs separate method, since __getitem__ can't yield _and_ return.
        """
        length = self.__len__()     #might be too big to fit in an int
        start, stop, step = _slice_support(index, length)
        #quick check to see if we can't return any items
        if (stop - start < 1):
            raise StopIteration
        if step < 1:
            raise NotImplementedError, \
                "Can't support negative step in irreversible sequence."
        else:
            index = start
            iterator = self.items(self._index2state(start))
            while index < stop:
                for i in range(step - 1):
                    if i >= stop - 1:
                        raise StopIteration
                    iterator.next()
                index += step
                yield iterator.next()

    def __iter__(self):
        """Iterator interface using self.Start as the start_state."""
        return self.items(self.Start)

    def items(self, start_state=None):
        """Acts like a sequence containing all the possibilities."""
        #shortcut if the template is empty
        if not self.Template:
            return
        #figure out how many possibilities there are at each position, and
        #what the choices are
        expansions = map(self.Alphabet.__getitem__, self.Template)
        num_positions = len(expansions)
        lengths = map(len, expansions)
        #set the starting state, i.e. the array of what the current choice
        #is at each position.
        if start_state is None:
            indices = [0] * num_positions
        else:
            self.validate(start_state)
            indices = start_state[:]
        seq = [expansions[i][indices[i]] for i in range(num_positions)]
        #always return the sequence for the first possibility: there might
        #not be any more...
        yield ''.join(seq)
        while 1:
            #find rightmost element that can be incremented
            pos = num_positions - 1
            while indices[pos] == lengths[pos] - 1:
                pos -= 1
                if pos < 0:     #ran off end
                    return
            indices[pos] += 1
            seq[pos] = expansions[pos][indices[pos]]
            #reset the rest of the elements, if there are any
            pos += 1
            while pos <= num_positions - 1:
                indices[pos] = 0
                seq[pos] = expansions[pos][0]
                pos += 1
            #seq should always contain a list of the current states for
            #each pos
            yield ''.join(seq)

class Partition(object):
    """Generator behaving like a list of the partitions of a set of n items.
    Usage: p = Partition(num_items, num_pieces, min_occupancy=0)

    Requires each bin to have at least min_occupancy pieces.
    """
    def __init__(self, num_items, num_pieces, min_occupancy=0):
        """Returns new Partition object with first partitions initialized.

        Usage: p = Partition(num_items, num_pieces, min_occupancy=0)

        Default is min_occupancy items in each bin, with all the leftovers
        in the first bin.
        """
        self.NumItems = num_items
        if num_pieces:
            self.NumPieces = num_pieces
        else:
            raise ValueError, "Cannot divide items among zero bins."
        self.MinOccupancy = min_occupancy
        self._reset()

    def __str__(self):
        """Prints string representation with pieces, items, and occupancy."""
        return "Items: %s Pieces: %s Min Per Piece: %s" % \
            (self.NumItems, self.NumPieces, self.MinOccupancy)

    def _validate(self, states):
        """Verify that states has right sum and meets occupancy restrictions.

        Raises ValueError if there is any problem: does not return anything.
        """
        num_pieces = self.NumPieces
        min_occupancy = self.MinOccupancy   #cache for efficiency
        #check the number of pieces
        if len(states) != num_pieces:
            raise ValueError, "Tried to set state %s, but need %s pieces." % \
                (states, num_pieces)
        #check that no piece has too few items
        sum = 0
        for state in states:
            if state < min_occupancy:
                raise ValueError, \
                "Tried to set state %s, but need at least %s items per bin." %\
                (states, min_occupancy)
            sum += state
        #check that we have the right number of items
        if sum != self.NumItems:
            raise ValueError, \
                "Tried to set state %s, but it has %s pieces instead of %s." %\
                (states, sum, self.NumItems)

    def _reset(self, states=None):
        """Resets to a particular state given by sequence of states per bin.

        Default: go to first partition, with MinOccupancy items in each bin
        and any leftovers in the first bin.
        """
        min_occupancy = self.MinOccupancy
        num_items = self.NumItems
        num_pieces = self.NumPieces     #cache for efficiency
        if states:
            #check that we're not trying to set a bad state
            self._validate(states)
            self._bins = states
        else:
            reserved = (num_pieces - 1) * min_occupancy
            #check that we can actually divide the pieces among the bins OK
            if reserved + min_occupancy > num_items:
                raise ValueError, \
                "Can't divide %s items into %s pieces with at least %s in each."\
                % (num_items, num_pieces, min_occupancy)
            #otherwise, fill the bins
            bins = [min_occupancy] * num_pieces
            bins[0] = num_items - reserved
            self._bins = bins
            self._reserved = reserved

    def __iter__(self):
        """Defines iterator interface, starting with self._bins."""
        return self.items()

    def _transform(self, value):
        """Transformation to be applied to return values.

        Default behavior is to copy, but can be overridden in derived
        classes.
        """
        return value[:]

    def items(self, bin_states=None):
        """Defines iterator interface, supporting for i in self."""
        #always copy the array of states, since we will be mutating it
        if bin_states:
            self._validate(bin_states)
            bins = bin_states[:]
        else:
            self._reset()
            bins = self._bins[:]
        #cache local vars for efficiency
        delta = self.MinOccupancy
        num_items = self.NumItems
        num_pieces = self.NumPieces
        transform = self._transform
        end_state = num_items - (delta * (num_pieces - 1))
        #always return the first state
        yield transform(bins)
        while 1:
            #check if we're done: when the last bin has all the items
            if bins[-1] == end_state:
                return
            #need to adjust the bins to the correct state for next time
            #find rightmost non-delta except the last
            rightmost = sum = 0
            #figure out the sum of all the items to the right of the bin
            #we're going to decrement, and also which bin we're going to
            #decrement
            for i in xrange(len(bins)-2, -1, -1):
                curr = bins[i]
                if curr != delta:
                    rightmost = i
                    break
                else:
                    sum += curr
            #bins[-1] excluded from count above: also need to add 1 for
            #newly incremented item from the rightmost decrementable bin
            sum += bins[-1] + 1
            bins[rightmost] -= 1
            #leftover_bins counts the number of bins more than one to the
            #left of the rightmost
            leftover_bins = num_pieces - rightmost - 2
            if leftover_bins:
                bins[rightmost+2:] = [delta] * leftover_bins
                sum -= delta * leftover_bins
            bins[rightmost + 1] = sum
            yield transform(bins)

    def __len__(self):
        """Calculates the number of possible partitions with current state.

        Specifically, only takes into account the number of objects, the
        number of bins, and the minimum per bin: does _not_ take into
        account a particular start point.
        """
        #NOTE: I don't know why the following works, but it seems to be
        #empirically true when compared to the lengths of the resulting
        #lists.
        cuts = self.NumPieces
        items = self.NumItems - (self.MinOccupancy - 1) * cuts
        product = 1
        for i in range(items - cuts + 1, items):
            product *= i
        for i in range(2, cuts):
            product /= i
        return product

class Composition(Partition):
    """Generates evenly spaced composition intervals over an alphabet.

    Usage: c = Composition(spacing, min_occupancy=0, alphabet='ACGU')

    spacing should be a float representing the percentage of the space
    separating successive values (e.g. 5 for 5% steps). Note that the
    spacing may be approximated: check self.Spacing to see what the
    recorded value is.

    alphabet should be a list, in order, of the possible characters.

    min_occupancy should typically be 0 (can miss some symbols) or 1
    (always require at least one of each symbol).

    Always yields an un-normalized Freqs containing counts of each symbol
    at each step.

    For a given alphabet A, the possible compositions of that alphabet can
    be represented as a simplex in len(A)-1 dimensions. Composition returns
    a representation of evenly distributed compositions in that space, with
    distances along all dimensions represented by spacing (i.e. if spacing
    is 0.05, then the next point in any dimension will be 0.05 away if it
    exists.)

    Useful for generating sequences of specified composition that can then
    be randomized.
    """
    def __init__(self, spacing, min_occupancy=0, alphabet='ACGU'):
        """Initializes new generator with specified spacing, alphabet, etc.

        Usage: c = Composition(spacing, min_occupancy=0, alphabet='ACGU')

        See class documentation for details.
        """
        self.Spacing = spacing      #also sets self._num_items
        self.Alphabet = alphabet    #also sets self._num_pieces
        self.MinOccupancy = min_occupancy

    def _get_spacing(self):
        """Accessor for self.Spacing."""
        return self._spacing

    def _set_spacing(self, spacing):
        """Mutator for self.Spacing. Sets self.NumItems to correct value."""
        num_items = int(round(100.0/spacing))
        self._num_items = num_items
        self._spacing = 100.0/num_items

    Spacing = property(_get_spacing, _set_spacing, \
        doc="Set spacing and calculate string length.")

    def _get_num_items(self):
        """Accessor for self.NumItems."""
        return self._num_items

    def _set_num_items(self, num_items):
        """Mutator for self.NumItems: recalculates self.Spacing."""
        self._num_items = num_items
        self._spacing = 100.0/num_items

    NumItems = property(_get_num_items, _set_num_items, \
        doc="Set NumItems and calculate Spacing.")

    def _get_alphabet(self):
        """Accessor for self.Alphabet."""
        return self._alphabet

    def _set_alphabet(self, alphabet):
        """Mutator for self.Alphabet."""
        self._alphabet = alphabet

    Alphabet = property(_get_alphabet, _set_alphabet, \
        doc="Set alphabet and calculate number of pieces.")

    def _get_num_pieces(self):
        """Accessor for self.NumPieces"""
        return len(self.Alphabet)

    NumPieces = property(_get_num_pieces, doc="Get number of pieces.")

    def _transform(self, value):
        """Override superclass transform to yield Freqs."""
        return Freqs(dict(zip(self.Alphabet, value)))

    def __iter__(self):
        """Defines iterator interface, starting with self._bins."""
        return self.items()

class MageFrequencies(object):
    """Takes a Freqs and optionally a label. Writes out a Mage-format string.

    This presentation class is standalone to avoid cluttering Freqs.
""" def __init__(self, freqs, label=''): """Returns a new MageFrequencies object. This is basically a labeled Freqs that can write itself out as a Mage-format string. """ self.Freqs = freqs#don't mutate original self.Label = label def __str__(self): """Returns the frequency string, suitable for MAGE.""" pieces = [] freqs = self.Freqs known_bases = Freqs({ 'A':freqs.get('A',0), 'C':freqs.get('C',0), 'U':freqs.get('U',0) + freqs.get('T',0), 'G':freqs.get('G',0), }) #frequencies should sum to 1 for MAGE display. known_bases.normalize() #Only append label field if there is one. label = self.Label or '' if label: pieces.append('{%s}' % self.Label) for item in 'ACG': pieces.append(str(known_bases[item])) return ' '.join(pieces) class SequenceHandle(list): """Holds mutable sequence that can join itself together as string. Sequence cannot vary in length. """ def __init__(self, data='', alphabet=None): """Initializes new list over an alphabet. Rejects invalid entries.""" if alphabet: for d in data: if d not in alphabet: raise ValueError, "Item %s not in alphabet %s." \ % (d, alphabet) super(SequenceHandle, self).__init__(data) self.Alphabet = alphabet def __setitem__(self, index, item): """Checks that the item is in the alphabet.""" alphabet = self.Alphabet if alphabet: try: absent = item not in alphabet except TypeError: raise ValueError, "Item %s not in alphabet %s." \ % (item, alphabet) else: if absent: raise ValueError, "Item %s not in alphabet %s." \ % (item, alphabet) super(SequenceHandle, self).__setitem__(index, item) def __setslice__(self, start, stop, values): """Checks that items are in alphabet, and that slice is same length.""" orig_length = len(self) alphabet = self.Alphabet if alphabet: for v in values: try: absent = v not in alphabet except TypeError: raise ValueError, "Item %s not in alphabet %s." \ % (v, alphabet) else: if absent: raise ValueError, "Item %s not in alphabet %s." 
\ % (v, alphabet) super(SequenceHandle, self).__setslice__(start, stop, values) if len(self) != orig_length: raise ValueError, "Cannot change length of SequenceHandle." def __str__(self): """Returns self as a string, symbols joined.""" try: return ''.join(self) except: #use built-in conversion methods for lists return super(SequenceHandle, self).__str__() def _naughty_method(self, *args, **kwargs): """Prevent other methods that change the length or set items.""" raise NotImplementedError, \ "May not change length of SequenceHandle." #note how _many_ methods are naughty... __delitem__ = __delslice__ = __iadd__ = __imul__ = append \ = extend = insert = pop = remove = _naughty_method class BaseFrequency(Freqs): RNA = ['U', 'C', 'A', 'G'] DNA = ['T', 'C', 'A', 'G'] """Holds information about base frequencies.""" def __init__(self, freqs, RNA=True): """Returns new BaseFrequency object, ensuring a count for each base.""" if RNA: alphabet = self.RNA else: alphabet = self.DNA super(BaseFrequency, self).__init__(freqs, alphabet) for k in alphabet: if k not in self: self[k] = 0.0 class PairFrequency(Freqs): """Makes a frequency distribution of pairs from freqs of single items.""" def __init__(self, freqs, pairs=None): """Makes pair frequency distribution. Usage: p = PairFrequency(freqs, pairs) freqs is the single-item frequencies pairs is the list of valid pairs from which samples will be drawn. If pairs is None (the default), constructs all possible pairs. 
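`PairFrequency` above forms the joint distribution by multiplying the two single-symbol frequencies for each allowed pair and renormalizing. A standalone Python 3 sketch of the same idea, using plain dicts in place of `Freqs` (the helper name is illustrative, not a cogent function):

```python
# Allowed RNA pairs, mirroring the Wobble/WatsonCrick class attributes.
WATSON_CRICK = [('A', 'U'), ('U', 'A'), ('G', 'C'), ('C', 'G')]
WOBBLE = WATSON_CRICK + [('G', 'U'), ('U', 'G')]

def pair_freqs(base_freqs, pairs=None):
    """Joint pair distribution from independent base frequencies,
    restricted to the allowed pairs and renormalized to sum to 1."""
    if pairs is None:
        # Default: all ordered pairs of the known symbols.
        pairs = [(i, j) for i in base_freqs for j in base_freqs]
    raw = {(i, j): base_freqs[i] * base_freqs[j] for i, j in pairs}
    total = sum(raw.values())
    return {p: v / total for p, v in raw.items()}
```

For example, with base frequencies A=0.1, C=0.2, G=0.3, U=0.4 and wobble pairing, the unnormalized weights sum to 0.44, so ('A', 'U') ends up with probability 0.04/0.44.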
""" symbol_freqs = BaseFrequency(freqs) if pairs is None: symbols = symbol_freqs.keys() pairs = [(i, j) for i in symbols for j in symbols] pair_freqs = {} for i, j in pairs: try: pair_freqs[(i,j)] = symbol_freqs[i]*symbol_freqs[j] except KeyError, e: print symbol_freqs print i, j raise e super(PairFrequency, self).__init__(pair_freqs, pairs) self.normalize() class BasePairFrequency(PairFrequency): """Holds information about base pair frequencies.""" WatsonCrick = [('A','U'), ('U','A'),('G','C'),('C','G')] Wobble = WatsonCrick + [('G','U'), ('U','G')] def __init__(self, freqs, GU=True): if GU: pairs = self.Wobble else: pairs = self.WatsonCrick super(BasePairFrequency, self).__init__(freqs, pairs) class RegionModel(object): """Holds probability model for constructing random or randomized sequences. Supports the following interface: Current: Reference to current sequence, or tuple of references. Template: Degenerate sequence specifying the class of sequences to produce. Immutable. Length: Length of the current sequence. Read-only. Composition: Composition used to generate sequences (e.g. pairs). refresh(): Generate the next, random sequence. monomers(f): Update internal frequencies using symbol frequencies in f. Base class RegionModel behavior is to model a constant region. """ def __init__(self, template='', composition=None): """Return a new RegionModel object. 
See class for documentation.""" self.Composition = composition self.Template = template #will set self.Current def _get_template(self): """Accessor method for self.Template""" return self._template def _set_template(self, data): """Mutator method for self.Template""" self._template = data self.Current = SequenceHandle(data) self.refresh() Template = property(_get_template, _set_template) def _get_composition(self): """Accessor method for self.Composition""" return self._composition def _set_composition(self, composition): """Mutator method for self.Composition""" self._composition = composition self.refresh() Composition = property(_get_composition, _set_composition) def __len__(self): """Returns length of the current string.""" return len(self.Current) def refresh(self): """Replaces the current sequence with a new string fitting the model. Does nothing unless overridden in derived classes. """ pass def monomers(self, composition, **kwargs): """Replaces the current composition with new Freqs.""" self.Composition = composition #no effect unless overridden class ConstantRegion(RegionModel): """Holds a constant string: this is default behavior.""" pass class UnpairedRegion(RegionModel): """Holds an unpaired region: this gets filled in from self.Composition.""" def refresh(self): """Fills in a sequence drawn randomly from composition.""" if hasattr(self, "Current") and self.Current and self.Composition: self.Current[:] = self.Composition.randomSequence(len(self)) class ShuffledRegion(RegionModel): """Holds a non-degenerate template that is randomized by shuffling.""" def refresh(self): """Randomizes the template by permuting the elements.""" if hasattr(self, "Current") and self.Current: shuffle(self.Current) class PairedRegion(RegionModel): """Holds complementary upstream and downstream strands.""" def refresh(self): """Fills in tuple of paired sequences drawn from self.Composition.""" if hasattr(self, "Current") and self.Current and self.Composition: length = 
len(self) upstream = self.Current[0] downstream = self.Current[1] pairs = self.Composition.randomSequence(length) for i in xrange(length): upstream[i] = pairs[i][0] downstream[i] = pairs[i][1] #downstream has the complements, but need to reverse it as well downstream.reverse() def __len__(self): """Returns length of (half of) the current helix, not the tuple...""" return len(self.Current[0]) def _set_template(self, data): """Mutator method for self.Template""" data = list(data) self._template = data self.Current = (SequenceHandle(data), SequenceHandle(data)) self.refresh() #Override base class _set_template in the property Template = property(RegionModel._get_template, _set_template) def monomers(self, composition, **kwargs): """Calculates pair distribution from monomer frequencies.""" GU = kwargs.get('GU', True) self.Composition = BasePairFrequency(composition,GU) class DegenRegion(RegionModel): """Handles a string of degenerate bases. WARNING: Not tested! """ def refresh(self): """Fills in degen bases randomly according to possible symbols""" if hasattr(self, "Current") and self.Current and self.Composition: result = [] non_degen = dict.fromkeys('UCAG') freqs = self.Composition for b in self.Template: if b in non_degen: result.append(b) else: allowed_bases = IUPAC_RNA[b] composition = Freqs(dict([(i,freqs[i]) for i in allowed_bases])) composition.normalize() result.append(composition.choice(random())) self.Current[:] = ''.join(result) ###WARNING: MATCHINGREGION HAS NOT YET BEEN TESTED#### class MatchingRegion(RegionModel): """Fills in the complement to specified constant region.""" WatsonCrick = {'A':'U', 'U':'A', 'C':'G', 'G':'C'} Wobble = {'A':'U', 'U':'AG', 'C':'G', 'G':'UC'} def __init__(self): raise NotImplementedError, "NOT YET TESTED" def _init_current(self): """Initializes Current and some private variables.""" wc = self.WatsonCrick template = self.Template self.Complement = [wc[base] for base in template] self.Current = SequenceHandle(self.Complement)
self._freqs = {} def refresh(self): """Returns new sequence that could pair with template.""" if self.GU: freqs = self._freqs bases = [freqs[base].randomSequence(1)[0] for base in self.Template] self.Current[:] = bases else: self.Current[:] = self.Complement def monomers(self, freqs, **kwargs): """Calculates Freqs of possibilities for each base.""" if kwargs.get('GU', False): pairs = self.Wobble.items() self.GU = True for base, complements in pairs: freqs = self.Composition.copy() freqs.subset(complements) freqs.normalize() self._freqs[base] = freqs else: self.GU = False class SequenceModel(object): """Stores state associated with generating a randomized sequence.""" def __init__(self, order, composition=None, GU=True, \ constants=[], unpaireds=[], helices=[], matches=[], degenerates=[]): """Returns a new SequenceModel. constants, unpaireds, and helices should all be lists of RegionModels. order should be a string of the following format: [label1][index1] [label2][index2] ... where label is C for constant, U for unpaired, H for helix, or D for degenerate, and index is the index of the region within the appropriate list. Use hyphens to indicate cuts. For example: "C0 U3 H1 U5 - H2 C1 H2 H1 U0" ...means constants[0] followed by unpaireds[3], followed by the first part of helices[1], followed by unpaireds[5], followed by the first part of helices[2], followed by constants[1], followed by the second part of helices[2], followed by the second part of helices[1], followed by unpaireds[0]. There is a cut in the sequence between U5 and H2. Although this is a very general mechanism (each piece can potentially have its own composition, etc.), typically the functionality will be accessed programmatically through other classes.
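The order-string format described above can be parsed independently of the region classes. A standalone Python 3 sketch (`parse_order` is illustrative, not a cogent function); each helix token's occurrence count distinguishes its upstream strand (0) from its downstream strand (1), mirroring how `_set_order` indexes `Current`:

```python
def parse_order(order, num_helices):
    """Split an order string like 'C0 U3 H1 U5 - H2 C1 H2 H1 U0'
    into segments (cut at '-') of (label, index, strand) triples.
    strand is 0/1 for the first/second appearance of a helix, and
    None for non-helix regions."""
    segments = []
    helix_counts = [0] * num_helices
    for seg in order.split('-'):
        pieces = []
        for token in seg.split():
            label, index = token[0], int(token[1:])
            if label == 'H':
                pieces.append((label, index, helix_counts[index]))
                helix_counts[index] += 1
            elif label in 'CUD':
                pieces.append((label, index, None))
            else:
                raise ValueError('unknown label: %s' % label)
        segments.append(pieces)
    return segments
```

Parsing the docstring's example yields two segments, with H1's two strands landing in different segments, as the cut requires.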
""" self.Helices = helices self.Unpaireds = unpaireds self.Constants = constants self.Degenerates = degenerates self.Matches = matches self.GU = GU #Don't _require_ a composition to be passed in, but if it isn't passed #in, then all the pieces must be initialized with their own compositions #beforehand. self.Composition = composition self.Order = order def __len__(self): """Figures out the total length of all the components.""" length = 0 for i in self.Unpaireds + self.Constants + self.Matches + self.Degenerates: length += len(i) for h in self.Helices: length += 2 * len(h) return length def refresh(self): """Delegates each region to refresh itself.""" for i in self.Helices + self.Unpaireds + self.Matches + self.Degenerates: i.refresh() def _get_order(self): """Accessor for self.Order.""" return self._order def _set_order(self, order): """Figure out the order to put pieces in, using string format.""" result = [] segments = order.split('-') helix_counts = [0] * len(self.Helices) for s in segments: pieces = [] components = s.split() for c in components: label = c[0] index = int(c[1:]) if label == 'C': #constant pieces.append(self.Constants[index].Current) elif label == 'U': #unpaired random region pieces.append(self.Unpaireds[index].Current) elif label == 'D': #degenerate region pieces.append(self.Degenerates[index].Current) elif label == 'H': #helix pieces.append(self.Helices[index].Current[\ helix_counts[index]]) helix_counts[index] += 1 #will give IndexError if the helix is added too many times else: raise ValueError, \ "SequenceModel got unknown label: %s" % label result.append(pieces) self._order = result self.refresh() Order = property(_get_order, _set_order, \ doc="Stores order for accessing the pieces of the template.") def _get_composition(self): """Accessor for Composition.""" return self._composition def _set_composition(self, composition): """Sets the composition of each of the components to a global value.""" if composition: for i in self.Helices + 
self.Unpaireds + self.Matches + self.Degenerates: i.monomers(composition, GU=self.GU) self._composition = composition Composition = property(_get_composition, _set_composition) def _get_GU(self): """Accessor for GU.""" return self._GU def _set_GU(self, GU): """Mutator for GU. Recalculates composition.""" self._GU = GU if hasattr(self, 'Composition') and self.Composition: self.Composition = self.Composition #recalculate with GU GU = property(_get_GU,_set_GU,doc="Controls whether GU pairs are allowed.") def __getitem__(self, index): """Returns the index'th segment of the sequence in its current state.""" return ''.join([str(i) for i in self.Order[index]]) def __str__(self): return '-'.join(self) class Rule(object): """Holds information about pairing constraints on motifs.""" def __init__(self, upstream_seq, upstream_pos, downstream_seq, \ downstream_pos, length): """Initialize new Rule object.""" self.UpstreamSequence = upstream_seq self.UpstreamPosition = upstream_pos self.DownstreamSequence = downstream_seq self.DownstreamPosition = downstream_pos self.Length = length self.validate() def validate(self): """Sanity checks on rule object.""" if self.Length <= 0: raise ValueError, "Helix length must be at least 1." if self.Length > self.DownstreamPosition + 1: raise ValueError, \ "Helix length cannot be more than 1 greater than downstream start." if min(self.UpstreamSequence, self.UpstreamPosition, \ self.DownstreamSequence, self.DownstreamPosition) < 0: raise ValueError, \ "All sequences and positions must be >= 0." if self.UpstreamSequence == self.DownstreamSequence: if self.UpstreamPosition >= self.DownstreamPosition: raise ValueError, \ "Upstream position must have lower index than downstream." if self.DownstreamPosition-self.UpstreamPosition+1 < 2*self.Length: raise ValueError, "Helices can't overlap." if self.UpstreamSequence > self.DownstreamSequence: raise ValueError, "Upstream sequence must have the smaller index." 
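The constraints enforced by `Rule.validate` above can be summarized as a standalone predicate. A Python 3 sketch mirroring the same checks (`rule_is_valid` is illustrative, not part of cogent; it returns False where `validate` raises):

```python
def rule_is_valid(up_seq, up_pos, down_seq, down_pos, length):
    """Mirror the sanity checks in Rule.validate() as a predicate."""
    if length <= 0:
        return False  # helix length must be at least 1
    if length > down_pos + 1:
        return False  # helix would run off the start of the downstream side
    if min(up_seq, up_pos, down_seq, down_pos) < 0:
        return False  # all sequences and positions must be >= 0
    if up_seq == down_seq:
        if up_pos >= down_pos:
            return False  # upstream must have the lower index
        if down_pos - up_pos + 1 < 2 * length:
            return False  # the two strands of the helix would overlap
    if up_seq > down_seq:
        return False  # upstream sequence must have the smaller index
    return True
```

For instance, a length-3 helix between positions 2 and 5 of the same sequence is rejected because the strands would need 6 positions but only span 4.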
def isCompatible(self, other): """Checks that the helices in self and other don't overlap. Has to try all possible combinations of upstream and downstream sequences, since any could conflict. """ if self.UpstreamSequence == other.UpstreamSequence: diff = abs(self.UpstreamPosition - other.UpstreamPosition) if self.UpstreamPosition <= other.UpstreamPosition: if diff < self.Length: return False elif diff < other.Length: return False if self.DownstreamSequence == other.DownstreamSequence: diff = abs(self.DownstreamPosition - other.DownstreamPosition) if self.DownstreamPosition >= other.DownstreamPosition: if diff < self.Length: return False elif diff < other.Length: return False if self.UpstreamSequence == other.DownstreamSequence: diff = abs(self.UpstreamPosition - other.DownstreamPosition) #only need to check if position in self <= position in other if self.UpstreamPosition <= other.DownstreamPosition: if diff < (self.Length + other.Length - 1): return False if self.DownstreamSequence == other.UpstreamSequence: diff = abs(self.DownstreamPosition - other.UpstreamPosition) #only need to check if position in self >= position in other if self.DownstreamPosition >= other.UpstreamPosition: if diff < (self.Length + other.Length - 1): return False #if none of the checks failed, return True return True def fitsInSequence(self, upstream): """Checks whether upstream sequence is too short to hold helix. Note: downstream sequence length doesn't need to be checked because the index that can't be overlapped is always 0. 
""" if self.UpstreamPosition + self.Length > len(upstream): return False else: return True def __str__(self): """Human-readable rule string.""" return "Up Seq: %s Up Pos: %s Down Seq: %s Down Pos: %s Length: %s" % \ (self.UpstreamSequence, self.UpstreamPosition, \ self.DownstreamSequence, self.DownstreamPosition, self.Length) class Module(object): """Holds information about a module's required sequence and structure.""" def __init__(self, sequence, structure): """Returns a new Module object with specified sequence and structure.""" self.Sequence = sequence self.Structure = structure len(self) #will raise error if lengths out of sync def __len__(self): """Returns length of sequence and structure.""" seq = self.Sequence struct = self.Structure seq_length = len(seq) if seq_length != len(struct): raise ValueError, \ "Lengths of sequence '%s' and structure '%s' differ." % \ (seq, struct) else: return seq_length def __str__(self): """Returns string containing sequence and structure.""" return "Sequence: %s\nStructure: %s" % (self.Sequence, self.Structure) def matches(self, other, index=None, alphabet=IUPAC_RNA): """Tests whether sequence/structure in self match other at index. other must be an object that has Sequence and Structure properties. If index is None, will search for matches anywhere in other. ###THIS METHOD NEEDS ATTENTION: move responsibility for finding matches to the sequence objects themselves? """ length = len(self) if not length: #zero-length pattern matches everywhere by definition return True if index is not None: #index might be 0... 
seq_match = True this_seq = self.Sequence other_seq = other.Sequence for i in range(length): curr = False try: curr = curr or (other_seq[i+index] in alphabet[this_seq[i]]) except: pass if not curr: try: curr = curr or (this_seq[i] in \ alphabet[other_seq[i+index]]) except: pass if not curr: seq_match = False break struct_match = self.structureMatches(other.Structure, index) return seq_match and struct_match[0] else: other_length = len(other) seq = self.Sequence struct = self.Structure other_struct = other.Structure other_seq = other.Sequence curr = 0 #current index while curr <= other_length - length: #don't run off end try: index = other_seq.index(seq, curr) except ValueError: return False #no more matches to try if struct == other_struct[index:index+length]: return True #found struct and seq matches at same place if curr == index: curr += 1 #always make sure curr is incremented else: curr = index return False #must have been a seq match but no struct match #at last window if we got here after the loop def structureMatches(self, structure, index): """Tests whether structure in self matches other at index. structure must have PairList property, e.g. ViennaStructure. """ length = len(self) if not length: #zero-length pattern matches everywhere by definition return (True, ) else: ss = fromstring(self.Structure, byte) structure_mask = ss != ord('x') diffs = ss != fromstring(structure[index:index+length], byte) result = not logical_and(diffs, structure_mask).any() return result, ss, structure_mask, diffs class Motif(object): """Holds sequences and structures for a motif.""" def __init__(self, modules, rules): """Initializes motif with sequences, structures, and rules""" self.Modules = modules self.Rules = rules self.validate() def validate(self): """Checks that sequences and structures are equal length, and rules ok. 
Specifically, there must be the same number of sequences as structures; the length of each sequence must be the length of each structure; and the rules may not refer to any index outside the known sequences and structures. """ self._check_helix_lengths() self._check_rule_overlaps() def _check_helix_lengths(self): """Check upstream sequence of each rule to make sure the helix can fit.""" for r in self.Rules: if not r.fitsInSequence(self.Modules[r.UpstreamSequence].Sequence): raise ValueError, "Rule '%s' can't fit in sequence '%s'." \ % (r, self.Modules[r.UpstreamSequence].Sequence) def _check_rule_overlaps(self): """Check every pair of rules for overlaps in covered regions.""" rules = self.Rules #cache reference for efficiency for first in range(len(rules)): first_rule = rules[first] for second in range(first): second_rule = rules[second] if not first_rule.isCompatible(second_rule): raise ValueError, "Rules '%s' and '%s' incompatible." \ % (first, second) def _check_rule_match(self, rule, pairlist, locations): """Check whether rule matches pairlist given module locations. pairlist should be a list where, for each position in a longer sequence, parlist[i] should be the index of the partner of i, or None if i is not paired. locations should be a list of the locations of each module, in the order that the rule expects to find them. 
""" start_up = locations[rule.UpstreamSequence]+rule.UpstreamPosition start_down = locations[rule.DownstreamSequence]+rule.DownstreamPosition for i in range(rule.Length): if pairlist[start_up + i] != start_down - i: return False return True #if nothing failed, everything must be OK def _get_rule_match_pairs(self, rule, pairlist, locations): """Get the pairs that the rule will check.""" start_up = locations[rule.UpstreamSequence]+rule.UpstreamPosition start_down = locations[rule.DownstreamSequence]+rule.DownstreamPosition return [(start_up+i,start_down-i) for i in range(rule.Length)] def matches(self, sequence, structure, positions): """Checks that sequence and structure matches motifs/rules. sequence needs to support the string interface (specifically, s.index) if it is necessary to search for matches anywhere; otherwise, arbitrary sequences should work. structure must have a PairList property (like ViennaStructure), which is a list the same length as the sequence where the value of each position is the index of its partner, or None if it is unpaired. As with sequence, must support string interface to find arbitrary matches; arbitrary sequences are ok otherwise. positions must be a list the same length as self.Modules, containing the index at which each successive module should be searched for. ###TO BE IMPLEMENTED: IF POSITIONS IS NONE, SEARCH FOR THE MODULES ANYWHERE IN THE SEQUENCE.### """ full_length = Module(sequence, structure) #more convenient as object modules = self.Modules if len(positions) != len(modules): raise ValueError, "len(positions) must match number of modules." 
for position, module in zip(positions, modules): if not module.matches(full_length, position): return False #can only get here if all the modules matched: need to check rules pairlist = structure.toPartners() for rule in self.Rules: if not self._check_rule_match(rule, pairlist, positions): return False #if we got here, all the modules matched and all the rules were OK return True def structureMatches(self, structure, positions, offsets=None,debug=False): """Checks that structure only matches motifs/rules. structure must have a PairList property (like ViennaStructure), which is a list the same length as the sequence where the value of each position is the index of its partner, or None if it is unpaired. As with sequence, must support string interface to find arbitrary matches; arbitrary sequences are ok otherwise. positions must be a list the same length as self.Modules, containing the index at which each successive module should be searched for. ###TO BE IMPLEMENTED: IF POSITIONS IS NONE, SEARCH FOR THE MODULES ANYWHERE IN THE SEQUENCE.### """ modules = self.Modules if len(positions) != len(modules): raise ValueError, "len(positions) must match number of modules." 
if offsets: positions = [p+o for p, o in zip(positions, offsets)] result = True for position, module in zip(positions, modules): matched, ss, mask, diffs = \ module.structureMatches(structure, position) if debug: print 'STRUC:', structure[position:position+len(ss)] print 'SS :', ss.tostring() print 'MASK :', ''.join(map(str, map(int, mask))) print 'DIFFS:', ''.join(map(str, map(int,diffs))) print 'WHERE:' all = ['.'] * len(structure) all[position:position+len(ss)] = ['x']*len(ss) print ''.join(all) if not matched: if debug: result = False else: return False if not result: return False #can only get here if all the modules matched: need to check rules pairlist = structure.toPartners() for rule in self.Rules: if debug: pairs = self._get_rule_match_pairs(rule, pairlist, positions) for up, down in pairs: all = ['.'] * len(structure) all[up] = '(' all[down] = ')' print ''.join(all) if not pairlist[up] == down: print structure raise Exception, "Failed to find partner in pairlist" if not self._check_rule_match(rule, pairlist, positions): return False #if we got here, all the modules matched and all the rules were OK return True class SequenceEmbedder(object): """Generates and analyzes set of modules embedded inside longer sequence.""" def __init__(self, length, num_to_do, motif, model, composition, GU=True,\ with_replacement=False, positions=None, primer_5='', primer_3='', match_offsets=None, debug=False, report_seqs=False ): """Initializes with a specified length sequence model, composition. Note that sampling with replacement does NOT give all the outcomes equal frequencies, e.g. with two choices (0,1) will happen half the time because there are 2 ways to get it, but only one way to get (0,0) or (1,1). 
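The note above about sampling with replacement can be checked by exhaustive enumeration rather than simulation: count how many ordered draws collapse onto each sorted outcome. A standalone Python 3 sketch (the helper name is illustrative):

```python
from collections import Counter
from itertools import product

def sorted_draw_counts(num_positions, num_modules):
    """Count each sorted outcome of drawing module locations with
    replacement. Unordered outcomes are not equally likely: mixed
    tuples arise from more orderings than repeated ones."""
    counts = Counter()
    for draw in product(range(num_positions), repeat=num_modules):
        counts[tuple(sorted(draw))] += 1
    return counts
```

With two positions and two modules this reproduces the docstring's example: (0, 1) occurs twice as often as (0, 0) or (1, 1).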
""" self.Model = model self.Motif = motif self.NumToDo = long(num_to_do) self.Length = long(length) self.WithReplacement = with_replacement #allows adjacent modules self.GU = GU self.RandomRegion = UnpairedRegion('N'*(length - len(self.Model)), \ composition) self.Composition = composition self._fixed_positions = positions self.Positions = positions self.Primer3 = primer_3 self.Primer5 = primer_5 self.MatchOffsets = match_offsets self.Debug = debug self.ReportSeqs = report_seqs def _get_composition(self): """Accessor for self.Composition.""" return self._composition def _set_composition(self, composition): """Mutator for self.Composition.""" self._composition = composition if composition: self.Model.GU = self.GU self.Model.Composition = composition self.RandomRegion.Composition = composition Composition = property(_get_composition, _set_composition) def _choose_locations(self): """Picks out places for the modules.""" random_positions = self.Length - len(self.Model) num_modules = len(self.Motif.Modules) locations = [] with_replacement = self.WithReplacement if (not with_replacement) and (random_positions < num_modules): raise ValueError, "Not enough positions to place modules." while len(locations) < num_modules: if with_replacement: curr = randrange(random_positions + 1) locations.append(curr) else: curr = randrange(random_positions) if curr not in locations: locations.append(curr) locations.sort() return locations def __str__(self): """Makes a new sequence with inserts at correct positions. Note: no longer mutates self.Positions. 
""" pieces = [str(self.Primer5)] random = str(self.RandomRegion.Current) modules = list(self.Model) added_positions = 0 last_position = 0 positions = self.Positions[:] for i in range(len(positions)): curr_module = modules[i] curr_position = positions[i] pieces.append(random[last_position:curr_position]) pieces.append(curr_module) last_position = curr_position positions[i] += added_positions added_positions += len(curr_module) pieces.append(random[last_position:]) #add anything left over pieces.append(str(self.Primer3)) return ''.join(pieces) def refresh(self): """Generates a new version of each module, incl. the random region.""" self.RandomRegion.refresh() self.Model.GU = self.GU self.Model.refresh() def countMatches(self, verbose=False, temp=25): """Generates NumToDo sequences, folds them, and returns match count.""" positions = [] seqs = [] structs = [] orig_positions = self._fixed_positions self.Positions = orig_positions for i in xrange(self.NumToDo): self.refresh() if not orig_positions: self.Positions = self._choose_locations() curr_seq = str(self) #adjust positions to account for inserted modules curr_positions = self.Positions[:] insert_length = len(self.Primer5) module_lengths = map(len, list(self.Model)) for i in range(len(curr_positions)): curr_positions[i] += insert_length insert_length += module_lengths[i] positions.append(curr_positions) seqs.append(curr_seq) folder = RNAfold(params={'-T':temp}) struct_file = folder(seqs)['StdOut'] odd = False for line in struct_file: if odd: structs.append(ViennaStructure(line.split()[0])) odd = not odd good_count = 0 if self.Debug: print "DEBUGGING" #debug code: prints seqs, structs, matches for seq, struct, position in zip(seqs, structs, positions): matched = self.Motif.structureMatches(struct, position, \ self.MatchOffsets,debug=self.Debug) if matched: good_count += 1 if self.Debug or (matched and self.ReportSeqs): module_lengths = map(len, list(self.Model)) if self.Debug: print "Module lengths:", module_lengths 
print "Positions:", position print seq print struct temp = [' '] * len(seq) for l, p in zip(module_lengths, position): temp[p:p+l] = ['*']*l print ''.join(temp) if self.Debug: print "Offsets:", self.MatchOffsets print matched return good_count PyCogent-1.5.3/cogent/seqsim/tree.py000644 000765 000024 00000063734 12024702176 020347 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Fast tree class for sequence simulations.""" from operator import add, or_, and_ from random import choice from numpy import array, zeros, transpose, arange, concatenate, any from numpy.random import permutation from cogent.core.tree import PhyloNode __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class RangeNode(PhyloNode): """Node object that assigns ids to its leaves and can access leaf blocks. Note: some of these methods should possibly move to the base class. """ def __init__(self, *args, **kwargs): """Returns a new RangeNode object. Name: text label LeafRange: range of this node's leaves in array. last = index+1 Id: index of this node in array. Children: list of Node objects that are this node's direct children. Parent: Node object that is this node's parent. NameLoaded: From cogent.core.tree.TreeNode, undocumented. WARNING: Parent/Child relationships are _not_ checked to preserve consistency! You must specify both the parent and the children explicitly, or the connections will not be made correctly. 
""" self.LeafRange = kwargs.get('LeafRange', None) self.Id = kwargs.get('Id', None) super(RangeNode, self).__init__(*args, **kwargs) def __int__(self): """Returns index of self.""" return self.Index def _find_label(self): """Makes up a label for __str__ method.""" label = None if hasattr(self, 'Name'): label = self.Name if label is None: if hasattr(self, 'Id'): label = self.Id if label is None: label = '' return label def __str__(self): """Returns informal Newick-like representation of self.""" label = self._find_label() if self.Children: child_string = ','.join(map(str, self.Children)) else: child_string = '' if self.Parent is None: #root of tree if self.Children: return '(%s)%s' % (child_string, label) else: return '()%s' % label else: #internal node if self.Children: if hasattr(self, 'Length') and (self.Length!=None): return '(%s)%s:%s' % \ (child_string, label, self.Length) else: return '(%s)%s' % \ (child_string, label) else: if hasattr(self, 'Length') and (self.Length!=None): return '%s:%s' % (label, self.Length) else: return '%s' % (label) def traverse(self, self_before=False, self_after=False, include_self=True): """Iterates through children of self. Default behavior: leaves only. self_before: yield self before children (preorder traversal) self_after: yield self after children (postorder traversal) include_self: if False (default is True), skips self in traversal. Primarily useful for skipping the root node. If both self_before and self_after are True, the node is returned both before _and_ after all its children are handled. This can be useful for certain applications, e.g. in RNA structure. """ return super(RangeNode, self).traverse(self_before, self_after, \ include_self) def indexByAttr(self, attr, multiple=False): """Returns dict of node.attr -> node. WARNING: Assumes all nodes have unique values of attr unless multiple is set to True. 
        """
        result = {}
        if multiple:
            for n in self.traverse(self_before=True):
                curr = getattr(n, attr)
                if curr not in result:
                    result[curr] = [n]
                else:
                    result[curr].append(n)
        else:
            for n in self.traverse(self_before=True):
                result[getattr(n, attr)] = n
        return result

    def indexByFunc(self, f):
        """Returns dict of f(node) -> [matching nodes]."""
        result = {}
        for n in self.traverse(self_before=True):
            val = f(n)
            if val not in result:
                result[val] = [n]
            else:
                result[val].append(n)
        return result

    def assignIds(self, num_leaves=None):
        """Assigns each node's Id property, based on order in the tree.

        WARNING: Will store incorrect data if num_leaves is incorrect.
        """
        if num_leaves is None:
            num_leaves = len(list(self.traverse()))
        last_leaf_id = 0
        last_internal_id = num_leaves
        for node in self.traverse(self_after=True):
            c = node.Children
            if c:
                node.Id = last_internal_id
                last_internal_id += 1
                node.LeafRange = (c[0].LeafRange[0], c[-1].LeafRange[-1])
            else:
                node.Id = last_leaf_id
                node.LeafRange = (last_leaf_id, last_leaf_id + 1)
                last_leaf_id += 1

    def propagateAttr(self, attr, overwrite=False):
        """Propagates self's version of attr to all children without attr.

        overwrite: determines whether to overwrite existing attr values.
        """
        curr = getattr(self, attr)
        if overwrite:
            for node in self.traverse(self_after=True):
                setattr(node, attr, curr)
        else:
            for node in self.Children:
                if not hasattr(node, attr):
                    setattr(node, attr, curr)
                node.propagateAttr(attr)

    def delAttr(self, attr):
        """Deletes attr in self and all children."""
        for node in self.traverse(self_after=True):
            delattr(node, attr)

    def perturbAttr(self, attr, f, pass_attr=False):
        """Perturbs attr in self and all children according to f() or f(attr).

        If pass_attr is False (the default), the branch is assigned f().
        If pass_attr is True, the branch is assigned f(attr).

        Make sure that f has the correct form or you'll get an error!
        """
        for node in self.traverse(self_after=True):
            if pass_attr:
                setattr(node, attr, f(getattr(node, attr)))
            else:
                setattr(node, attr, f())

    def accumulateAttr(self, attr, towards_leaves=True, f=add):
        """Sets each node's version of attr to f(node, parent|children).

        if towards_leaves (the default), node.attr = f(node.attr, parent.attr);
        otherwise, node.attr = f(node.attr, child.attr) for each child.
        """
        if towards_leaves:
            for node in self.traverse(self_before=True):
                parent = node.Parent
                if parent is None:
                    continue
                else:
                    setattr(node, attr, \
                        f(getattr(node, attr), getattr(parent, attr)))
        else:
            for node in self.traverse(self_after=True):
                children = node.Children
                if children:
                    for c in children:
                        setattr(node, attr, \
                            f(getattr(node, attr), getattr(c, attr)))

    def accumulateChildAttr(self, attr, f=add):
        """Sets each node's attr based on states in children (only).

        Always works from leaves to root. Does not set states in leaves.
        Skips any child where attr is None.
        """
        for node in self.traverse(self_after=True):
            #only reset nodes with children
            if node.Children:
                #get attr from all children that have it
                vals = [getattr(c, attr) for c in node.Children \
                    if hasattr(c, attr)]
                #get rid of None values
                vals = filter(lambda x: x is not None, vals)
                if vals:
                    setattr(node, attr, reduce(f, vals))
                else:
                    setattr(node, attr, None)

    def assignLevelsFromRoot(self):
        """Assigns each node its level relative to self (self.Level=0)."""
        self.Level = 0
        self.propagateAttr('Level', overwrite=True)
        self.accumulateAttr('Level', towards_leaves=True, f=lambda a,b: b+1)

    def assignLevelsFromLeaves(self, use_min=False):
        """Assigns each node its distance from the leaves.
        use_min: use min distance from leaf instead of max (default:False)
        """
        self.Level = 0
        self.propagateAttr('Level', overwrite=True)
        if use_min:
            self.accumulateAttr('Level', towards_leaves=False, \
                f=lambda a,b: (a and min(a, b+1)) or b+1)
        else:
            self.accumulateAttr('Level', towards_leaves=False, \
                f=lambda a,b: max(a, b+1))

    def attrToList(self, attr, default=None, size=None, \
        leaves_only=False):
        """Copies attribute from each node of self into list.

        attr: name of attr to copy.
        size: size of list to copy into (must be >= num nodes).
        leaves_only: only look at leaves, not internal nodes

        WARNING: will fail if the Id attribute of each node has not yet
        been set.
        """
        if leaves_only:
            nodes = list(self.traverse())
        else:
            nodes = list(self.traverse(self_before=True))
        if size is None:
            size = len(nodes)
        result = [default] * size
        for node in nodes:
            result[node.Id] = getattr(node, attr)
        return result

    def attrFromList(self, attr, items, leaves_only=False):
        """Copies items in list into attr of nodes. Must have right # items."""
        for n in self.traverse(self_before = not leaves_only):
            setattr(n, attr, items[n.Id])

    def toBreakpoints(self):
        """Returns list of breakpoints that reconstructs self's topology.

        WARNING: Only works for strictly bifurcating trees.
        """
        result = []
        for node in self.traverse(self_before=True):
            if node.Children:
                result.append(node.Children[0].LeafRange[-1] - 1)
        return result

    def fromBreakpoints(cls, breakpoints):
        """Makes a new RangeNode tree from a sequence of breakpoints.

        Will have one more leaf than breakpoint. Always produces a
        bifurcating tree.

        WARNING: will return incorrect results if elements in breakpoints
        are not unique!

        To make a random tree, call fromBreakpoints(permutation(n-1))
        where n is the number of leaves desired in the tree.
""" #return single, leaf node if breakpoints is empty if not any(breakpoints): return cls(Id=0, LeafRange=(0,1)) num_leaves = len(breakpoints) + 1 curr_internal_index = num_leaves root = cls(Id=curr_internal_index, LeafRange=(0,num_leaves)) curr_internal_index += 1 #need to walk through the tree for each breakpoint, find the range #in which the breakpoint occurs, and make children containing the #start (i.e. start:breakpoint+1) and end (i.e. breakpoint+1:end) #of the range. for b in breakpoints: #start at the root curr_node = root children = curr_node.Children #walk down the tree until we find a range without children that #the breakpoint is in while children: middle = children[1].LeafRange[0] #SUPPORT2425 curr_node = children[int(middle <= b)] children = curr_node.Children #curr_node is now the range that contains the breakpoint start, end = curr_node.LeafRange #check if left and right nodes are leaves, and assign relevant ids #we frequently need the index after the breakpoint, so assign #variable after_b to avoid lots of mysterious 'b+1's in the code after_b = b+1 if after_b - start == 1: left_id = start else: left_id = curr_internal_index curr_internal_index += 1 if end - after_b == 1: right_id = after_b else: right_id = curr_internal_index curr_internal_index += 1 #add left and right nodes to current node's children left = cls(Parent=curr_node,Id=left_id, LeafRange=(start, after_b)) right = cls(Parent=curr_node,Id=right_id, LeafRange=(after_b, end)) curr_node.Children = [left, right] return root fromBreakpoints = classmethod(fromBreakpoints) def leafLcaDepths(self, assign_ids=True, assign_levels=True): """Returns num_leaves x num_leaves matrix with depth of each LCA. assign_ids and assign_levels control whether or not to assign ids and levels (default: True). size: if supplied, sizes the matrix. No longer assumes strictly bifurcating tree. 
""" if assign_ids: self.assignIds() if assign_levels: self.assignLevelsFromLeaves() nodes = list(self.traverse(self_before=True)) #second element of LeafRange should contain largest node index #incidentally, will fail if ids not assigned num_nodes = self.LeafRange[1] result = zeros((num_nodes,num_nodes)) for node in nodes: #skip any nodes that are themselves leaves children = node.Children if not children: continue #if node has only one child, can't be anyone's LCA if len(children) == 1: continue if len(children) == 2: #if node has two children, is LCA of any descendant of first #child w.r.t. any descendant of second child curr_level = node.Level left, right = children for left_index in range(*(left.LeafRange)): for right_index in range(*(right.LeafRange)): result[left_index, right_index] = curr_level #otherwise, node is LCA of each child's descendants w.r.t. the #descendants of other children else: curr_level = node.Level for first in children: for second in children[1:]: for left_index in range(*(first.LeafRange)): for right_index in range(*(second.LeafRange)): result[left_index, right_index] = curr_level result += transpose(result) return result def randomNode(self): """Returns random node from self and children.""" return choice(list(self.traverse(self_before=True))) def randomLeaf(self): """Returns random leaf descended from self.""" if self.Children: return choice(list(self.traverse())) else: return self def randomNodeWithNLeaves(self, n): """Returns random node with exactly the specified number of leaves.""" try: lookup = self.indexByFunc(lambda x: x.LeafRange[1] - x.LeafRange[0]) except TypeError: #possible that ranges weren't assigned self.assignIds() lookup = self.indexByFunc(lambda x: x.LeafRange[1] - x.LeafRange[0]) return choice(lookup[n]) def randomNodeAtLevel(self, n, from_leaves=True): """Returns random node at specified level from root or tips.""" if from_leaves: self.assignLevelsFromLeaves() else: self.assignLevelsFromRoot() lookup = 
self.indexByAttr('Level', multiple=True) return choice(lookup[n]) def outgroupLast(self, first, second, third, cache=True): """Returns tuple of nodes first, second and third, with outgroup last. first, second, and third must all be descendants of self, and ids must have already been assigned to the trees. Sets self._leaf_lca_depths if not already set if cache is True. WARNING: if first, second, third are all at the same level of an unresolved polytomy, will arbitrarily choose one of the three as an outgroup. This choice may be inconsistent between different runs of the program. """ #find the leaf lca depths if necessary if cache: if not hasattr(self, '_leaf_lca_depths'): self._leaf_lca_depths = self.leafLcaDepths() depths = self._leaf_lca_depths else: depths = self.leafLcaDepths() #get the ids of the nodes first_id, second_id, third_id = first.Id, second.Id, third.Id lca_12 = depths[first_id, second_id] lca_13 = depths[first_id, third_id] lca_23 = depths[second_id, third_id] #find the shallowest lca and return nodes in appropriate order shallowest_lca = min([lca_12, lca_13, lca_23]) if shallowest_lca == lca_12: return (first, second, third) elif shallowest_lca == lca_13: return first, third, second else: return second, third, first def filter(self, taxa, keep=True): """Prunes (inplace) all items in self not leading to a taxon in taxa. Taxa must be container of nodes in the tree. keep determines whether to keep (if True) or delete (if False) the specified taxa. Collapses nodes where appropriate (i.e. one-child nodes get deleted). Branch lengths are preserved (i.e. if a node is collapsed, its branch length is added to the node that collapses onto it). WARNING: Root of the tree is always preserved, so you might find that all the nodes are in a single-child subtree of the root if all the nodes on the other side of the root were deleted. If no nodes are kept, will return an empty root node with no children. 
""" taxon_ids = dict.fromkeys(map(id, taxa)) node_ids = self.indexByFunc(id) #select specified ids for t in taxon_ids: if t in node_ids: node_ids[t][0]._selected = True #unselect root if not specified if not id(self) in taxon_ids: self._selected = False #figure out whether each node is selected: first, accumulate towards #tips, then trace back to root self.propagateAttr('_selected') if keep: self.accumulateChildAttr('_selected', f=or_) else: self.accumulateChildAttr('_selected', f=and_) #delete and/or collapse undesired nodes for node in list(self.traverse(self_after=True)): if node.Parent is None: #back at root continue #delete if not selected if node._selected != keep: result = node.Parent.removeNode(node) node.Parent = None #replace with (already handled) child if single-item elif (len(node.Children) == 1) and (node.Parent is not None): curr_child = node.Children[0] curr_parent = node.Parent curr_parent.Children[curr_parent.Children.index(node)] = \ curr_child curr_child.Parent = curr_parent node.Parent = None #add branch lengths if present if hasattr(node, 'Length'): if hasattr(curr_child, 'Length') and \ curr_child.Length is not None: curr_child.Length += node.Length else: curr_child.Length = node.Length self.delAttr('_selected') def addChildren(self, n): """Adds n children to self.""" constructor = self.__class__ new_nodes = [constructor(Parent=self) for i in range(n)] def makeIdIndex(self): """Sets self.IdIndex as index of ids in self.""" self.assignIds() self.IdIndex = self.indexByAttr('Id') def assignQ(self, q=None, special_qs=None, overwrite=False): """Clears and assigns Q matrices. q: overall Q for tree. special_qs: dict of node id -> q for node and its subtrees. overwrite: if True (default is False), overwrites existing Qs """ if q is not None: self.Q = q if special_qs: ids = self.IdIndex for k, v in special_qs.items(): ids[int(k)].Q = v if (not hasattr(self, 'Q')) or (not self.Q): raise ValueError, "Failed to assign Q matrix to root." 
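The assignQ method above relies on propagateAttr to push the root's Q matrix down to every descendant that lacks one of its own, while nodes given a special Q keep theirs. A minimal standalone sketch of that recursion (the `Node` class and `propagate` function are hypothetical stand-ins, not the PyCogent API):

```python
class Node(object):
    """Toy stand-in for RangeNode: just Children plus arbitrary attributes."""
    def __init__(self, children=None):
        self.Children = children or []

def propagate(node, attr):
    """Copy node's attr to any child lacking it, then recurse down the tree
    (mirrors RangeNode.propagateAttr with overwrite=False)."""
    value = getattr(node, attr)
    for child in node.Children:
        if not hasattr(child, attr):
            setattr(child, attr, value)
        propagate(child, attr)

# root carries a global Q; one child is pre-assigned a special Q before
# propagation, mirroring assignQ's special_qs behavior
left, right = Node(), Node()
root = Node([left, right])
root.Q = 'global_Q'
left.Q = 'special_Q'     # pre-assigned, so propagation must not overwrite it
propagate(root, 'Q')
```

After propagation, `right` inherits the root's Q while `left` retains its override, which is exactly why assignQ installs special_qs before calling propagateAttr.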
self.propagateAttr('Q', overwrite) def assignP(self): """Assigns P-matrices based on current Q-matrices. Assumes that Length and Q are already set in all nodes. WARNING: Assumes that branch lengths represent sequence divergences and are at most 0.75 (for DNA, somewhat more for protein), i.e. do not exceed saturation. If the branch lengths exceed saturation, will probably fail unpredictably. Note that Q.toSimilarProbs generates a sequence that matches a specific similarity, not a specific divergence, so need to use (1-Length) (with the attendant dangers). WARNING: Does not set P-matrix at root (because there's no change before the root. Should it instead set the P-matrix at the root to the identity matrix of the appropriate size?) """ #don't set P if root if self.Parent is not None: self.P = self.Q.toSimilarProbs(1-self.Length) for c in self.Children: c.assignP() def assignLength(self, length): """Assigns all nodes the specified length.""" for node in self.traverse(self_before=True): node.Length = length def evolve(self, seq, field_name='Sequence'): """Evolves seq, according to P-matrix on each node. Assumes that P has been set on all nodes already. WARNING: seq is an array, not a Sequence object (at this point). """ #assign seq to root, or evolve from parent's sequence if self.Parent is None: setattr(self, field_name, seq) else: setattr(self, field_name, self.P.mutate(seq)) for c in self.Children: c.evolve(getattr(self, field_name), field_name) def assignPs(self, rates): """Sets many P-matrices from a single Q-matrix, scaled to rates. Assumes that Branchlength and Q are already set in all nodes. rates should be a list of rates at least as long as the seqs to evolve. These should all be less than 1 (i.e. the max rate is 1, and other rates decline from there). 
""" if self.Parent is not None: self.Ps = [self.Q.toSimilarProbs(1-(self.Length*r)) \ for r in rates] for c in self.Children: c.assignPs(rates) def evolveSeqs(self, seqs, field_name='Sequences'): """Evolves list of seqs according to Q-matrices and rates on each node. Assume that Ps has already been set such that P_i -> seq_i at each node, e.g. via self.assignPs. WARNING: seqs are (currently) assumed to be arrays, not Sequence objects. """ if self.Parent is None: setattr(self, field_name, seqs) else: setattr(self, field_name, [p.mutate(seq) for (p, seq) in \ zip(self.Ps, seqs)]) for c in self.Children: c.evolveSeqs(getattr(self, field_name), field_name) def balanced_breakpoints(num_leaves): """Returns breakpoints for a balanced tree with specified num_leaves. num_leaves must be at least 1 and must be a power of 2. WARNING: no validation is performed to ensure these conditions are met. This algorithm works by figuring the indices of all the nodes that are at a particular level, making an array of those indices using arange, and then concatenating the arrays in order of level. 
    """
    result = []
    curr_step = num_leaves
    curr_start = (curr_step/2) - 1
    while curr_start >= 0:
        result.append(arange(curr_start, num_leaves, curr_step))
        curr_step /= 2
        curr_start = (curr_step/2) - 1
    return concatenate(result)

def BalancedTree(num_leaves, node_class=RangeNode):
    """Returns a balanced tree of node_class (num_leaves must be power of 2)."""
    #return node_class.fromBreakpoints(balanced_breakpoints(num_leaves))
    root = node_class()
    curr_children = [root]
    while len(curr_children) < num_leaves:
        tmp = []
        for n in curr_children:
            n.Children[:] = [node_class(Parent=n), node_class(Parent=n)]
            tmp.extend(n.Children)
        curr_children = tmp
    return root

def RandomTree(num_leaves, node_class=RangeNode):
    """Returns a random node_class tree using the breakpoint model."""
    return node_class.fromBreakpoints(permutation(num_leaves-1))

def CombTree(num_leaves, deepest_first=True, node_class=RangeNode):
    """Returns a comb node_class tree."""
    if deepest_first:
        branch_child = 1
    else:
        branch_child = 0
    root = node_class()
    curr = root
    for i in range(num_leaves-1):
        curr.Children[:] = [node_class(Parent=curr), node_class(Parent=curr)]
        curr = curr.Children[branch_child]
    return root

def StarTree(num_leaves, node_class=RangeNode):
    """Returns a star phylogeny, with all leaves equally connected to root."""
    t = node_class()
    t.addChildren(num_leaves)
    return t

def LineTree(depth, node_class=RangeNode):
    """Returns a tree with all nodes arranged in a line."""
    t = node_class()
    curr = t
    for i in range(depth-1):
        new_node = node_class(Parent=curr)
        curr = new_node
    return t

PyCogent-1.5.3/cogent/seqsim/usage.py

#!/usr/bin/env python
"""usage.py: usage of symbols, including substitutions on pairwise alphabets.

Revision History

Created 10/12/04 by Rob Knight.

9/14/05 Rob Knight: Changed Usage constructor to allow Alphabet on the
instance level, and to eliminate the precalculated flag which was not used.
Added entropy method.
7/20/07 Mike Robeson: Under PairMatrix.__init__ changed 'if data:' to 'if data != None: 8/3/07 Daniel McDonald: Code now relies on numpy and cogent with the exception of the one scipy function that still needs to be removed """ from cogent.maths.scipy_optimize import fmin, brent from cogent.util.array import scale_trace, norm_diff, \ has_neg_off_diags, sum_neg_off_diags, with_diag, without_diag from cogent.core.alphabet import get_array_type from cogent.core.usage import RnaBases, DnaBases, DnaPairs, RnaPairs, Codons from cogent.core.sequence import ModelSequence, ModelDnaSequence, \ ModelRnaSequence from operator import add, sub, mul, div from cogent.maths.matrix_logarithm import logm from cogent.maths.stats.util import FreqsI from cogent.maths.matrix_exponentiation import FastExponentiator as expm from numpy import zeros, array, max, diag, log, nonzero, product, cumsum, \ searchsorted, exp, diagonal, choose, less, repeat, average,\ logical_and, logical_or, logical_not, transpose, compress,\ ravel, concatenate, equal, log, dot, identity, \ newaxis as NewAxis, sum, take, reshape, any, all, asarray from numpy.linalg import eig from numpy.linalg import inv as inverse from numpy.random import random as randarray ARRAY_TYPE = type(array([0])) __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Mike Robeson", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class Usage(FreqsI): """Stores usage on a particular alphabet. Abstract class. Note: Usage is abstract because most subclasses (e.g. CodonUsage, AminoAcidUsage) have specific methods that depend on their alphabets. Allowing generic Usage objects is disallowed to enforce use of the appropriate Usage object for specific situations. Supports most of the Cogent FreqsI interface. 
""" Alphabet = None # concrete subclasses have specific alphabets def __init__(self, data=None, Alphabet=None): """Returns a new Usage object from array of symbol freqs. Will interpret many different kinds of data, including precalculated frequencies, arrays of symbols, and cogent.core.sequence.ModelSequence objects. Warning: it guesses whether you passed in frequencies or symbols based on the length of the array, so for example Usage(DnaSequence('ATCG')) will _not_ give the result you expect. If you know the data type, use the alternative class method constructors. """ if Alphabet is not None: self.Alphabet = Alphabet if not self.Alphabet: raise TypeError, "Usage subclasses must define alphabet.""" if isinstance(data, Usage): self._data = data._data else: self._data = zeros(len(self), 'float64') if any(data): self += data def __getitem__(self, i): """Returns item based on alphabet.""" return self._data[self.Alphabet.index(i)] def __setitem__(self, key, val): """Sets item based on alphabet.""" self._data[self.Alphabet.index(key)] = val def __str__(self): """Prints as though it were a tuple of key,value pairs.""" return str(self.items()) def __repr__(self): """String representation of self.""" return ''.join([self.__class__.__name__, '(', repr(self._data), ')']) def __iter__(self): """Iterates over keys, like a dict.""" return iter(self.Alphabet) def __eq__(self, other): """Tests whether two Usage objects have the same data.""" if hasattr(other, '_data'): return all(self._data == other._data) #if we get here, didn't compare equal try: return all(self._data == self.__class__(other)._data) except: return False def __ne__(self, other): """Returns True if self and other are not equal.""" if hasattr(other, '_data'): return any(self._data != other._data) #if we get here, didn't compare equal try: return any(self._data != self.__class__(other)._data) except: return True def __iadd__(self, other): """Adds data to self in-place.""" #check if other is nonzero; skip if it 
isn't try: if not other: return self except ValueError: if not any(other): return self #first, check if it's a Usage object if isinstance(other, Usage): self._data += other._data return self #then, check if it's one of our ModelSequence objects ac = self.Alphabet.counts if isinstance(other, ModelSequence): self._data += ac(other._data) return self #if it's the same length as self, try to add it as frequencies try: if len(other) == len(self): self._data += other return self except TypeError: pass #then try to convert it using the alphabet #WARNING: this will silently ignore unknown keys! #since we know other wasn't nonzero, we won't accept #the result if we can't convert anything. try: other_freqs = ac(other) #check if we actually converted anything... if any(other_freqs): self._data += other_freqs return self except (IndexError, KeyError, TypeError): pass #then use the generic conversion function f = self._find_conversion_function(other) if f: f(other, op=add) return self else: raise TypeError, "Could not convert this to freqs: %s" % other def __isub__(self, other): """Subtracts data from self in-place.""" #check if other is nonzero; skip if it isn't try: if not other: return self except ValueError: if not any(other): return self #first, check if it's a Usage object if isinstance(other, Usage): self._data -= other._data return self #then, check if it's one of our ModelSequence objects ac = self.Alphabet.counts if isinstance(other, ModelSequence): self._data -= ac(other._data) return self #if it's the same length as self, try to add it as frequencies try: if len(other) == len(self): self._data -= other return self except TypeError: pass #then try to convert it using the alphabet #WARNING: this will silently ignore unknown keys! #since we know other wasn't nonzero, we won't accept #the result if we can't convert anything. try: other_freqs = ac(other) #check if we actually converted anything... 
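The __iadd__/__isub__ cascade above tries progressively looser interpretations of `other`: another Usage, a ModelSequence, a same-length frequency vector, and finally anything the alphabet can count. A minimal sketch of that dispatch idea for a dict-backed counter (all names here are hypothetical illustrations, not the PyCogent API):

```python
def add_counts(freqs, other, alphabet):
    """Accumulate `other` into the dict `freqs`, trying interpretations in
    order: a mapping of symbol -> count, then an iterable of symbols.
    Unknown symbols are silently skipped, like Usage's alphabet conversion."""
    if hasattr(other, 'items'):
        # mapping case: merge counts directly
        for k, v in other.items():
            if k in alphabet:
                freqs[k] = freqs.get(k, 0) + v
        return freqs
    # fallback case: treat `other` as a sequence of symbols
    for symbol in other:
        if symbol in alphabet:
            freqs[symbol] = freqs.get(symbol, 0) + 1
    return freqs

f = add_counts({}, 'UCAGGA', set('UCAG'))   # count symbols from a sequence
f = add_counts(f, {'A': 3}, set('UCAG'))    # then merge precomputed counts
```

The ordering matters: the most specific interpretation is tried first, so a mapping is never mistaken for a plain symbol sequence.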
if other_freqs.any(): self._data -= other_freqs return self except (IndexError, KeyError, TypeError): pass #then use the generic conversion function f = self._find_conversion_function(other) if f: f(other, op=sub) return self else: raise TypeError, "Could not convert this to freqs: %s" % other def __mul__(self, other): """Multiplies self by other (assumed scalar).""" return self.__class__(self._data * other) def __imul__(self, other): """Multiplies self by other in-place (assumed scalar).""" self._data *= other def __div__(self, other): """Divides self by other (assumed scalar). Always true division.""" return self.__class__(self._data / (other)) def __idiv__(self, other): """Divides self by other (assumed scalar) inplace. Maybe int division.""" self._data /= other def scale_sum(self, sum_=1.0): """Returns copy of self scaled to specified sum.""" return self.__class__(self._data * (sum_/sum(self._data))) def scale_max(self, max_=1.0): """Returns copy of self scaled to specified maximum (default 1).""" return self.__class__(self._data * (max_/max(self._data))) def probs(self): """Returns copy of self scaled so that the sum is 1.""" return self.__class__(self._data / (sum(self._data))) def randomIndices(self, length, random_vector=None): """Produces random indices according to symbol freqs.""" freqs = cumsum(self._data/sum(self._data))[:-1] if random_vector is None: random_vector=randarray(length) return searchsorted(freqs, random_vector) def fromSeqData(cls, seq, Alphabet=None): """Returns new Usage object from Sequence object.""" return cls.fromArray(seq._data, Alphabet=Alphabet) def fromArray(cls, a, Alphabet=None): """Returns new Usage object from array.""" return cls(cls.Alphabet.counts(a), Alphabet=Alphabet) fromSeqData = classmethod(fromSeqData) fromArray = classmethod(fromArray) #following code is to support FreqsI def get(self, key, default): """Returns self._data[self.Alphabet.index(key) if present, or default.""" try: return 
            self._data[self.Alphabet.index(key)]
        except (KeyError, IndexError, TypeError):
            return default

    def values(self):
        """Returns list of values in self (i.e. the data)."""
        return list(self._data)

    def keys(self):
        """Returns list of keys in self (i.e. the alphabet)."""
        return list(self.Alphabet)

    def items(self):
        """Returns list of (key, value) pairs in self."""
        return zip(self.Alphabet, self._data)

    def isValid(self):
        """Always valid (except for negative numbers), so override."""
        return min(self._data) >= 0

    def copy(self):
        """Return copy of self with same alphabet, not sharing data."""
        return self.__class__(self._data.copy())

    def __delitem__(self, key):
        """Can't really delete items, but raise error if in alphabet."""
        if key in self.Alphabet:
            raise KeyError, "May not delete required key %s" % key

    def purge(self):
        """Can't contain anything not in alphabet, so do nothing."""
        pass

    def normalize(self, total=1.0, purge=True):
        """Converts counts into probabilities, normalized to 1 in-place.

        Changes result to Float64. Purge is always treated as True.
        """
        if self._data is not None and self._data.any():
            self._data = self._data / (total * sum(self._data))

    def choice(self, prob):
        """Returns item corresponding to Pr(prob)."""
        if prob > 1:
            return self.Alphabet[-1]
        summed = cumsum(self._data/sum(self._data))
        return self.Alphabet[searchsorted(summed, prob)]

    def randomSequence(self, n):
        """Returns list of n random choices, with replacement."""
        if not self:
            raise IndexError, "All frequencies are zero."
        return list(choose(self.randomIndices(n), self.Alphabet))

    def subset(self, items, keep=True):
        """Sets all frequencies not in items to 0.

        If keep is False, sets all frequencies in items to 0.
""" if keep: for i in self.Alphabet: if i not in items: self[i] = 0 else: for i in items: try: self[i] = 0 except KeyError: pass def scale(self, factor=1, offset=0): """Linear transform of values in freqs where val= factor*val + offset.""" self._data = factor * self._data + offset def __len__(self): """Returns length of alphabet.""" return len(self.Alphabet) def setdefault(self, key, default): """Returns self[key] or sets self[key] to default.""" if self[key]: return self[key] else: self[key] = default return default def __contains__(self, key): """Returns True if key in self.""" try: return key in self.Alphabet except TypeError: return False def __nonzero__(self): """Returns True if self is nonzero.""" return bool(sum(self._data) != 0) def rekey(self, key_map, default=None, constructor=None): """Returns new Freqs with keys remapped using key_map. key_map should be a dict of {old_key:new_key}. Values are summed across all keys that map to the same new value. Keys that are not in the key_map are omitted (if default is None), or set to the default. constructor defaults to self.__class__. However, if you're doing something like mapping amino acid frequencies onto charge frequencies, you probably want to specify the constructor since the result won't be valid on the alphabet of the current class. Note that the resulting Freqs object is not required to contain values for all the possible keys. 
""" if constructor is None: constructor = self.__class__ result = constructor() for key, val in self.items(): new_key = key_map.get(key, default) curr = result.get(new_key, 0) try: result[new_key] = curr + val except KeyError: pass return result def entropy(self, base=2): """Returns Shannon entropy of usage: sum of p log p.""" ln_base = log(base) flat = ravel(self._data) total = sum(flat) if not total: return 0 flat /= total ok_indices = nonzero(flat)[0] ok_vals = take(flat, ok_indices, axis=0) return -sum(ok_vals * log(ok_vals))/ln_base class DnaUsage(Usage): """Stores usage on the DNA alphabet.""" Alphabet = DnaBases class RnaUsage(Usage): """Stores usage on the RNA alphabet.""" Alphabet = RnaBases class CodonUsage(Usage): """Stores usage on the Codon alphabet.""" Alphabet = Codons class DnaPairUsage(Usage): """Stores usage on the DnaPairs alphabet.""" Alphabet = DnaPairs class RnaPairUsage(Usage): """Stores usage on the RnaPairs alphabet.""" Alphabet = RnaPairs class PairMatrix(object): """Base class for Counts, Probs, and Rates matrices. Immutable. Holds any numeric relationship between pairs of objects on a JointAlphabet. Note that the two SubEnumerations of the JointAlphabet need not be the same, although many subclasses of PairMatrix will require that the two SubEnumerations _are_ the same because their methods assume square matrices. """ def __init__(self, data, Alphabet, Name=None): """Returns new PairMatrix object containing data. WARNING: Alphabet must be a JointAlphabet where the two SubEnumerations are the same. 
""" self.Alphabet = Alphabet if any(data): self._data = reshape(array(data, 'd'), Alphabet.Shape) else: self._data = zeros(Alphabet.Shape, 'd') self.Name = Name def toMatlab(self): """Returns Matlab-formatted string representation.""" if self.Name is None: name = 'm' else: name = str(self.Name) return ''.join([name, '=', '[', \ ';\n'.join([' '.join(map(str, r)) for r in self._data]), '];\n']) def __str__(self): """Returns string representation of array held in self.""" return str(self._data) def __repr__(self): """Returns string representation of self.""" return ''.join([self.__class__.__name__, '(', repr(self._data), \ ',', repr(self.Alphabet), ',', repr(self.Name), ')']) def __getitem__(self, args): """__getitem__ passes everything to internal array. WARNING: m[a,b] will work where a and b are symbols in the alphabet, but m[a][b] will fail. This is because m[a] produces an array object with the corresponding row, which is then passed b as an index. Because the array object doesn't have the alphabet, it can't map the index into a number. Slicing is not supported. """ # First, test whether args are in the JointAlphabet. Will always be tuple. 
if isinstance(args, tuple): try: return ravel(self._data)[self.Alphabet.index(tuple(args))] except (KeyError, TypeError): pass return self._data[self.Alphabet.SubEnumerations[0].index(args)] def __len__(self): """Returns number of rows.""" return len(self._data) def empty(cls, Alphabet): """Class method: returns empty matrix sized for alphabet.""" return cls(zeros(Alphabet.Shape), Alphabet) empty = classmethod(empty) def __eq__(self, other): """Tests whether two Usage objects have the same data.""" try: return all(self._data == other._data) #return not bool(all(self._data != other._data)) except: return False def __ne__(self, other): """Returns True if self and other are not equal.""" try: return any(self._data != other._data) #return bool(all(self._data != other._data)) except: return False def __iter__(self): """Iterates over rows in data.""" return iter(self._data) class Counts(PairMatrix): """Holds the data for a matrix of counts. Immutable. """ def toProbs(self): """Returns copy of self where rows sum to 1.""" return Probs(self._data/ (sum(self._data, 1)[:,NewAxis]), \ self.Alphabet) def fromPair(cls, first, second, Alphabet, average=True): """Class method: returns new Counts from two sequences. """ size = len(Alphabet.SubEnumerations[-1]) #if they're ModelSequence objects, use the _data attribute if hasattr(first, '_data'): first, second = first._data, second._data #figure out what size we need the result to go in: note that the #result is on a pair alphabet, so the data type of the single #alphabet (that the sequence starts off in) might not work. 
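The key trick in Counts.fromPair is `items = first * size + second`: two aligned arrays of symbol indices collapse into single indices on the pair alphabet, so the pairs can be counted in one pass and the flat counts reshaped into a square matrix. A standalone sketch with plain lists (the `pair_counts` helper is hypothetical, not the PyCogent API):

```python
def pair_counts(first, second, size):
    """Count aligned symbol pairs by flattening (i, j) -> i*size + j,
    then reshaping the flat counts into a size x size matrix."""
    flat = [0] * (size * size)
    for i, j in zip(first, second):
        flat[i * size + j] += 1
    # reshape flat counts into rows, like reshape(..., Alphabet.Shape)
    return [flat[r * size:(r + 1) * size] for r in range(size)]

# two aligned length-5 sequences over a 4-letter alphabet (indices 0..3)
counts = pair_counts([0, 1, 2, 3, 0], [0, 1, 3, 3, 1], 4)
```

Row `i`, column `j` of the result counts positions where `first` had symbol `i` aligned to symbol `j` in `second`; fromPair additionally averages the matrix with its transpose when `average=True`.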
data_type = get_array_type(product(map(len, Alphabet.SubEnumerations))) first = asarray(first, data_type) second = asarray(second, data_type) items = first * size + second counts = reshape(Alphabet.counts(items), Alphabet.Shape) if average: return cls((counts + transpose(counts))/2.0, Alphabet) else: return cls(counts, Alphabet) fromPair = classmethod(fromPair) def _from_triple_small(cls, first, second, outgroup, Alphabet): """Class method: returns new Counts for first from three sequences. Sequence order is first, second, outgroup. Use this method when the sequences are short and/or the alphabet is small: relatively memory intensive because it makes an array the size of the seq x the alphabet for each sequence. Fast on short sequences, though. NOTE: requires input to either all be ModelSequence objects, or all not be ModelSequence objects. Could change this if desirable. """ #if they've got data, assume ModelSequence objects. Otherwise, arrays. if hasattr(first, '_data'): first, second, outgroup = first._data, second._data, outgroup._data size = len(Alphabet.SubEnumerations[-1]) a_eq_b = equal(first, second) a_ne_b = logical_not(a_eq_b) a_eq_x = equal(first, outgroup) b_eq_x = equal(second, outgroup) #figure out what size we need the result to go in: note that the #result is on a pair alphabet, so the data type of the single #alphabet (that the sequence starts off in) might not work. data_type = get_array_type(product(map(len, Alphabet.SubEnumerations))) first = asarray(first, data_type) second = asarray(second, data_type) b_to_a = second*size + first a_to_a = first*size + first b_to_a_items = compress(logical_and(b_eq_x, a_ne_b), b_to_a) a_to_a_items = compress(logical_or(a_eq_b, a_eq_x), a_to_a) items = concatenate((b_to_a_items, a_to_a_items)) counts = reshape(Alphabet.counts(items), Alphabet.Shape) return cls(counts, Alphabet) def _from_triple_large(cls, first, second, outgroup, Alphabet): """Same as _from_triple except copes with very long sequences. 
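The triple-based counting logic above can be restated compactly: positions where all three sequences differ are discarded; where `first == second` but both differ from the outgroup, the change is credited as no-change for `first`; otherwise the substitution is oriented from the outgroup state toward `first`. A hedged numpy sketch of that orientation rule (names and the loop-based tally are illustrative; this mirrors `_from_triple_large`'s logic, not its exact implementation):

```python
import numpy as np

def triple_counts(first, second, outgroup, n_states):
    """Count substitutions toward `first`, using the outgroup to orient
    changes. Drops positions where all three sequences differ; where
    first == second != outgroup the change happened on the outgroup
    lineage, so `first` is counted as unchanged (diagonal)."""
    first, second, outgroup = (np.asarray(s) for s in (first, second, outgroup))
    all_diff = (first != second) & (first != outgroup) & (second != outgroup)
    keep = ~all_diff
    f, s, o = first[keep], second[keep], outgroup[keep]
    counts = np.zeros((n_states, n_states))
    out_diff = (f == s) & (f != o)
    for x, y, od in zip(o, f, out_diff):
        if od:
            counts[y, y] += 1   # change on outgroup lineage: y unchanged
        else:
            counts[x, y] += 1   # change (or no change) from outgroup x to y
    return counts
```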
Specifically, allocates an array for the frequencies of each type, walks through the triple one base at a time, and updates the appropriate cell. Faster when alphabet and/or sequences are large; also avoids memory issues because it doesn't allocate the seq x alphabet array. NOTE: requires input to either all be ModelSequence objects, or all not be ModelSequence objects. Could change this if desirable. WARNING: uses float, not int, as datatype in return value. """ #figure out if we already have the data in terms of alphabet indices. #if not, we need to convert it. if hasattr(first, '_data'): first, second, outgroup = first._data, second._data, outgroup._data else: if hasattr(Alphabet, 'toIndices'): converter = Alphabet.toIndices else: converter = Alphabet.fromSequenceToArray # convert to alphabet indices first, second, outgroup = map(asarray, map(converter, [first, second, outgroup])) # only include positions where all three not different valid_posn = logical_not(logical_and(logical_and(first != outgroup, second != outgroup), first != second)) valid_pos = [index for index, val in enumerate(valid_posn) if val] first = first.take(valid_pos) second = second.take(valid_pos) outgroup = outgroup.take(valid_pos) out_diffs = logical_and(first == second, first != outgroup) counts = zeros((len(Alphabet.SubEnumerations[0]), \ len(Alphabet.SubEnumerations[0]))) for x, y, out_diff in zip(outgroup, first, out_diffs): if out_diff: counts[y,y] += 1 else: counts[x,y] += 1 return cls(counts, Alphabet) def fromTriple(cls, first, second, outgroup, Alphabet, threshold=1e6): """Reads counts from triple of sequences, method chosen by data size.""" if len(first) * len(Alphabet) > threshold: return cls._from_triple_large(first, second, outgroup, Alphabet) else: return cls._from_triple_small(first, second, outgroup, Alphabet) fromTriple = classmethod(fromTriple) _from_triple_small = classmethod(_from_triple_small) _from_triple_large = classmethod(_from_triple_large) class Probs(PairMatrix): 
"""Holds the data for a probability matrix. Immutable.""" def isValid(self): """Returns True if all values positive and each row sums to 1.""" for row in self: if sum(row) != 1.0 or min(row) < 0.0: return False return True def makeModel(self, seq): """Returns substitution model for seq based on self's rows.""" return take(self._data, seq, axis=0) def mutate(self, seq, random_vector=None): """Returns mutated version of seq, according to self. seq should behave like a Numeric array. random_vector should be vector of 0 and 1 of same length as sequence, if supplied. Result is always an array, not coerced into seq's class. """ sums = cumsum(self._data, 1) model = take(sums, seq, axis=0) if random_vector is None: random_vector = randarray(seq.shape) return sum(transpose(model)[:-1] < random_vector, axis=0) #transpose needed to align frames def toCounts(self, num): """Returns count matrix with approximately num counts. Rounding error may prevent counts from summing exactly to num. """ num_rows = len(self) return Counts(self._data * (num/num_rows), self.Alphabet) def toRates(self, normalize=False): """Returns rate matrix. Does not normalize by default.""" return Rates(logm(self._data), self.Alphabet, self.Name, normalize) def random(cls, Alphabet, diags=None): """Makes random P-matrix with specified diag elements and size. diags can be a single float, or vector of values with same number of chars as individual alphabet (e.g. list of 4 elements will act as elements for the 4 bases). 
""" shape = Alphabet.Shape if diags is None: result = randarray(shape) return cls(result/sum(result, 1)[:,NewAxis], Alphabet) else: single_size = shape[0] diags = array(diags, 'd') #handle scalar case if not diags.shape: diags = reshape(diags, (1,)) if len(diags) == 1: diags = repeat(diags, single_size) temp = randarray((single_size, single_size-1)) temp *= ((1.0-diags)/sum(temp, 1))[:,NewAxis] result = diag(diags) for r, row in enumerate(temp): result[r][:r] = row[:r] result[r][r+1:] = row[r:] return cls(result, Alphabet) random = classmethod(random) class Rates(PairMatrix): """Holds the data for a rate matrix. Immutable.""" def __init__(self, data, Alphabet, name=None, normalize=False): """Returns new Rates matrix, normalizing trace to -1 if necessary.""" data = array(data) #check for complex input array if data.dtype == 'complex128': self.imag = data.imag data = data.real super(Rates, self).__init__(data, Alphabet) if normalize: self._normalize_inplace() def isComplex(self): """Returns True if self has a complex component.""" return hasattr(self, 'imag') def isSignificantlyComplex(self, threshold=0.1): """Returns True if complex component is above threshold.""" if hasattr(self, 'imag'): return sum(ravel(self.imag)) > threshold else: return False def isValid(self, threshold=1e-7): """Rate matrix is valid if rows sum to 0 and no negative off-diags. threshold gives maximum error allowed in row sums. """ if max(abs(sum(self._data, -1)) > threshold): return False return not has_neg_off_diags(self._data) def _normalize_inplace(self): """Normalizes trace to -1, in-place. Should only call during __init__, since it mutates the object. WARNING: Only normalizes real component. """ scale_trace(self._data) def normalize(self): """Returns normalized copy of self where trace is -1. WARNING: Only normalizes real component. 
""" return Rates(self._data, self.Alphabet, normalize=True) def _get_diagonalized(self): """Gets diagonalization of self as u, v, w; caches values.""" if not hasattr(self, '_diag_cache'): error_tolerance = 1e-4 #amount of error allowed in product eigenvalues, eigenvectors = eig(self._data) u = transpose(eigenvectors) v = eigenvalues w = inverse(u) #check that the diagonalization actually worked by multiplying #the results back together result = dot(dot(u,v),w) if abs(sum(ravel(result))) > error_tolerance: raise ValueError, "Diagonalization failed with erroneous result." self._diag_cache = u, v, w return self._diag_cache _diagonalized = property(_get_diagonalized) def toProbs(self, time=1.0): """Returns probs at exp(self*scale_factor). The way this works is by diagonalizing the rate matrix so that u is the matrix with eigenvectors as columns, v is a vector of eigenvalues, and w is the inverse of u. u * diag(v) * w reconstructs the original rate matrix. u * diag(exp(v*t)) * w exponentiates the rate matrix to time t. This is more expensive than a single exponentiation if the rate matrix is going to be sxponentiated only once, but faster if it is to be exponentiated to many different time points. Note that the diagonalization is not the same as the svd. If the diagonalization fails, we use the naive version of just multiplying the rate matrix by the time and exponentiating. """ try: u, v, w = self._diagonalized #scale v to the right time by exp(v_0*t) v = diag(exp(v * time)) return Probs(dot(dot(u,v), w), self.Alphabet) except: return Probs(expm(self._data)(time), self.Alphabet) def _timeForSimilarity_naive(self, similarity, freqs=None): """Returns time exponent so that exp(q*time) diverges to right distance. Takes symbol freqs into account if specified; otherwise assumes equal. freqs: vector of frequencies, applied to each row successively. WARNING: Factor of 5 slower than timeForSimilarity. Included for testing that results are identical. 
""" q = self._data if freqs is None: def similarity_f(t): return abs(average(diagonal(expm(q)(t)))-similarity) else: def similarity_f(t): return abs(sum(diagonal(expm(q)(t)*freqs)) - similarity) initial_guess = array([1.0]) result = fmin(similarity_f, initial_guess, disp=0) #disp=0 turns off fmin messages return result def timeForSimilarity(self, similarity, freqs=None): """Returns time exponent so that exp(q*time) diverges to right distance. Takes symbol freqs into account if specified; otherwise assumes equal. freqs: vector of frequencies, applied to each row successively. NOTE: harder to understand, but a factor of 5 faster than the naive version. The nested matrixmultiply calls have the same effect as exponentiating the matrix. """ #if there's no change, the time is 0 if similarity == 1: return 0.0 #try fast version first, but if it fails we'll use the naive version. try: u, v, w = self._diagonalized if freqs is None: def similarity_f(t): return abs(average(diagonal(dot(u, \ dot(diag(exp(v*t)), w)))) - similarity) else: def similarity_f(t): return abs(sum(diagonal(dot(u, \ dot(diag(exp(v*t)), w)))*freqs) - similarity) except (TypeError, ValueError): #get here if diagonalization fails q = self._data if freqs is None: def similarity_f(t): return abs(average(diagonal(expm(q)(t)))-similarity) else: def similarity_f(t): return abs(sum(diagonal(expm(q)(t)*freqs))-similarity) return brent(similarity_f) def toSimilarProbs(self, similarity, freqs=None): """Returns Probs at specified divergence. Convenience wrapper for toProbs and timeForSimilarity. """ return self.toProbs(self.timeForSimilarity(similarity, freqs)) def random(cls, Alphabet, diags=None): """Makes random Q-matrix with specified diag elements and size. diags can be a single float, or vector of values with same number of chars as individual alphabet (e.g. list of 4 elements will act as elements for the 4 bases). 
""" shape = Alphabet.Shape single_size = shape[0] if diags is None: diags = -randarray(single_size) else: diags = array(diags, 'd') #handle scalar case if not diags.shape: diags = reshape(diags, (1,)) if len(diags) == 1: diags = repeat(diags, single_size) temp = randarray((single_size, single_size-1)) temp *= ((-diags)/sum(temp, 1))[:,NewAxis] result = diag(diags) for r, row in enumerate(temp): result[r][:r] = row[:r] result[r][r+1:] = row[r:] return cls(result, Alphabet) random = classmethod(random) def hasNegOffDiags(self): """Returns True if any off-diagonal elements negative.""" return has_neg_off_diags(self._data) def sumNegOffDiags(self): """Returns sum of negative off-diagonal elements.""" return sum_neg_off_diags(self._data) def fixNegsDiag(self): """Returns copy of self w/o negative off-diags, using 'diag' heuristic. If a negative off-diagonal element is encountered, sets it to 0. Subtracts all the negative off-diagonals from the diagonal to preserve row sum = 0. """ m = self._data.copy() #clip to 0 m = choose(less(m, 0.), (m, 0.)) for i, row in enumerate(m): row[i] = -sum(row) return self.__class__(m, self.Alphabet) def fixNegsEven(self): """Returns copy of self w/o negative off-diags, using 'even' heuristic. If a negative off-diagonal is encountered, sets it to 0. Distributes the negative score evenly among the other elements. 
""" m = without_diag(self._data) for i, row in enumerate(m): is_neg = row < 0 if any(is_neg): num_negs = sum(is_neg) sum_negs = sum(is_neg*row) is_not_neg = logical_not(is_neg) num_not_neg = sum(is_not_neg) new_row = (row + (sum_negs/(num_not_neg+1)))*is_not_neg m[i] = new_row return self.__class__(with_diag(m, -sum(m,1)), self.Alphabet) def _make_error_f(self, to_minimize): """Make error function whose minimization estimates q = ln(p).""" p = expm(self._data)(t=1) BIG = 1e10 def result(q): new_q = reshape(q, (4,4)) neg_sum = sum_neg_off_diags(new_q) p_new = expm(new_q)(t=1) return to_minimize(ravel(p), ravel(p_new)) - (BIG * neg_sum) \ + (BIG * sum(abs(sum(new_q,1)))) return result def fixNegsFmin(self, method=fmin, to_minimize=norm_diff, debug=False): """Uses an fmin method to find a good approximate q matrix. Possible values for method: fmin: simplex method (the default) fmin_bfgs: bfgs optimizer #always produces negative elements! fmin_cg: cg optimizer #doesn't work! fmin_powell: powell method #doesn't work! """ q = self._data #bail out if q is already ok to start with if not sum_neg_off_diags(q): return self err_f = self._make_error_f(to_minimize) initial_guess = q.copy() xmin = method(err_f, initial_guess.flat, disp=0) #disp=0 turns off messages new_q = reshape(xmin, self.Alphabet.Shape)[:] if debug: if sum_neg_off_diags(new_q): raise Exception, 'Made invalid Q matrix: %s' % q return self.__class__(new_q, self.Alphabet) def fixNegsConstrainedOpt(self, to_minimize=norm_diff, badness=1e6): """Uses constrained minimization to find approx q matrix. to_minimize: metric for comparing orig result and new result. badness: scale factor for penalizing negative off-diagonal values. 
""" if not sum_neg_off_diags(self._data): return self q = ravel(without_diag(self._data)) p = expm(self._data)(t=1) def err_f(q): new_q = reshape(array(q), (4,3)) new_q = with_diag(new_q, -sum(new_q, 1)) p_new = expm(new_q)(t=1) result = to_minimize(ravel(p), ravel(p_new)) if q.min() < 0: result += -q.min() * badness return result a = array(q) xmin = fmin(func=err_f, x0=a, disp=0) r = reshape(xmin, (4,3)) new_q = with_diag(r, -sum(r, 1)) return self.__class__(new_q, self.Alphabet) def fixNegsReflect(self): """Fixes negative off-diagonals by subtracting m[i][j] from m[j][i]. Specifically, if m[i][j] is negative, subtracts this value from m[i][j] and m[i][i] to keep the row total at 0, and then subtracts it from m[j][i] and m[j][j] to convert a negative flux in the forward direction into a positive flux in the reverse direction. If both m[i][j] and m[j][i] are negative, this algorithm converts them both into positive values, effectively exchanging the magnitudes of the changes and making the signs positive. NOTE: It's important to iterate over the original and make changes to the copy to avoid incorrect results in cases where both m[i][j] and m[j][i] are negative. """ orig = self._data result = orig.copy() for i, row in enumerate(orig): for j, val in enumerate(row): #skip diagonal if i == j: continue #only make changes if element < 0 if val < 0: result[i][j] -= val result[i][i] += val result[j][i] -= val result[j][j] += val return self.__class__(result, self.Alphabet) def goldman_q_rna_triple(seq1, seq2, outgroup): """Returns the Goldman rate matrix for seq1""" if len(seq1) != len(seq2) != len(outgroup): raise ValueError, "seq1,seq2 and outgroup are not the same length!" 
    seq1 = ModelRnaSequence(seq1)
    seq2 = ModelRnaSequence(seq2)
    outgroup = ModelRnaSequence(outgroup)
    m = Counts.fromTriple(seq1, seq2, outgroup, RnaPairs)._data
    q = m / m.sum(axis=1)[:,NewAxis]
    new_diag = -(q.sum(axis=1) - diag(q))
    for i, v in enumerate(new_diag):
        q[i,i] = v
    return q

def goldman_q_dna_triple(seq1, seq2, outgroup):
    """Returns the Goldman rate matrix for seq1"""
    #note: a chained a != b != c skips the a vs c comparison, so test
    #all three lengths explicitly
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError, "seq1, seq2 and outgroup are not the same length!"
    seq1 = ModelDnaSequence(seq1)
    seq2 = ModelDnaSequence(seq2)
    outgroup = ModelDnaSequence(outgroup)
    m = Counts.fromTriple(seq1, seq2, outgroup, DnaPairs)._data
    q = m / m.sum(axis=1)[:,NewAxis]
    new_diag = -(q.sum(axis=1) - diag(q))
    for i, v in enumerate(new_diag):
        q[i,i] = v
    return q

def goldman_q_dna_pair(seq1, seq2):
    """Returns the Goldman rate matrix"""
    if len(seq1) != len(seq2):
        raise ValueError, "seq1 and seq2 are not the same length!"
    seq1, seq2 = ModelDnaSequence(seq1), ModelDnaSequence(seq2)
    m = Counts.fromPair(seq1, seq2, DnaPairs, average=True)._data
    q = m / m.sum(axis=1)[:,NewAxis]
    new_diag = -(q.sum(axis=1) - diag(q))
    for i, v in enumerate(new_diag):
        q[i,i] = v
    return q

def goldman_q_rna_pair(seq1, seq2):
    """Returns the Goldman rate matrix"""
    if len(seq1) != len(seq2):
        raise ValueError, "seq1 and seq2 are not the same length!"
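Each `goldman_q_*` helper follows the same recipe once the pair counts are in hand: row-normalize the count matrix, then overwrite the diagonal with minus the off-diagonal row sum so every row sums to zero. The shared normalization step, isolated as a hedged numpy sketch (function name illustrative):

```python
import numpy as np

def counts_to_q(m):
    """Row-normalize a pair-count matrix, then replace the diagonal
    with minus the off-diagonal row sum so each row sums to 0."""
    m = np.asarray(m, dtype=float)
    q = m / m.sum(axis=1)[:, None]
    new_diag = -(q.sum(axis=1) - np.diag(q))
    for i, v in enumerate(new_diag):
        q[i, i] = v
    return q

# Toy 3-state count matrix; rows sum to 10 for easy checking.
counts = np.array([[8, 1, 1], [1, 8, 1], [2, 2, 6]])
q = counts_to_q(counts)   # q[0] = [-0.2, 0.1, 0.1]
```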
    seq1, seq2 = ModelRnaSequence(seq1), ModelRnaSequence(seq2)
    m = Counts.fromPair(seq1, seq2, RnaPairs, average=True)._data
    q = m / m.sum(axis=1)[:,NewAxis]
    new_diag = -(q.sum(axis=1) - diag(q))
    for i, v in enumerate(new_diag):
        q[i,i] = v
    return q

def make_random_from_file(lines):
    """Simulates array random() using values from an iterator."""
    def result(shape):
        size = product(shape)
        items = map(float, [lines.next() for s in range(size)])
        a = reshape(array(items), shape)
        return a
    return result

#randarray = make_random_from_file(open('/Users/rob/random.txt'))

def test_heuristics(p_range=None, num_to_do=71, heuristics=None):
    if p_range is None:
        p_range = [0.6]
    if heuristics is None:
        heuristics = ['fixNegsDiag', 'fixNegsEven', 'fixNegsReflect',
                      'fixNegsConstrainedOpt']
    num_heuristics = len(heuristics)
    print '\t'.join(['p'] + heuristics)
    for p in p_range:
        result = zeros((num_to_do, num_heuristics), Float64)
        has_nonzero = 0
        i = 0
        while i < num_to_do:
            curr_row = result[i]
            random_p = Probs.random(DnaPairs, p)
            q = random_p.toRates()
            if not q.hasNegOffDiags():
                continue
            has_nonzero += 1
            #print "P:"
            #print random_p._data
            #print "Q:"
            #print q._data
            i += 1
            for j, h in enumerate(heuristics):
                #print "HEURISTIC: ", h
                q_corr = getattr(q, h)()
                #print "CORRECTED Q: "
                #print q_corr._data
                p_corr = expm(q_corr._data)(t=1)
                #print "CORRECTED P:"
                #print p_corr
                dist = norm_diff(p_corr, random_p._data)
                #print "DISTANCE: ", dist
                curr_row[j] = dist
        averages = average(result)
        print p, '\t', '\t'.join(map(str, averages))

if __name__ == '__main__':
    test_heuristics()

PyCogent-1.5.3/cogent/recalculation/__init__.py

#!/usr/bin/env python

__all__ = ['array', 'calculation', 'definition', 'scope', 'setting']

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ =
"pm67nz@gmail.com" __status__ = "Production" PyCogent-1.5.3/cogent/recalculation/calculation.py000644 000765 000024 00000057711 12024702176 023230 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division, with_statement import numpy Float = numpy.core.numerictypes.sctype2char(float) import time, warnings from cogent.maths.solve import find_root from cogent.util import parallel from cogent.maths.optimisers import maximise, ParameterOutOfBoundsError import os TRACE_DEFAULT = os.environ.has_key('COGENT_TRACE') TRACE_SCALE = 100000 __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" # This is the 'live' layer of the recalculation system # Cells and OptPars are held by a Calculator # For docstring see definitions.py class CalculationInterupted(Exception): pass class OptPar(object): """One parameter, as seen by the optimiser, eg: length of one edge. An OptPar reports changes to the ParameterValueSet for its parameter. """ is_constant = False recycled = False args = () # Use of __slots__ here and in Cell gives 8% speedup on small calculators. 
__slots__ = ['clients', 'client_ranks', 'name', 'lower', 'default_value', 'upper', 'scope', 'order', 'label', 'consequences', 'rank'] def __init__(self, name, scope, bounds): self.clients = [] self.client_ranks = [] self.name = name for (attr, v) in zip(['lower', 'default_value', 'upper'], bounds): setattr(self, attr, float(v)) # controls order in optimiser - group for LF self.scope = scope self.order = (len(scope), scope and min(scope), name) self.label = self.name def addClient(self, client): self.clients.append(client) def __cmp__(self, other): # optimisation is more efficient if params for one edge are neighbours return cmp(self.order, other.order) def __repr__(self): return '%s(%s)' % (self.__class__.__name__, self.label) def getOptimiserBounds(self): lower = self.transformToOptimiser(self.lower) upper = self.transformToOptimiser(self.upper) return (lower, upper) def transformFromOptimiser(self, value): return value def transformToOptimiser(self, value): return value class LogOptPar(OptPar): # For ratios, optimiser sees log(param value). Conversions to/from # optimiser representation are only done by Calculator.change(), # .getValueArray() and .getBoundsArrrays(). 
    def transformFromOptimiser(self, value):
        return numpy.exp(value)

    def transformToOptimiser(self, value):
        try:
            return numpy.log(value)
        except OverflowError:
            raise OverflowError('log(%s)' % value)

class EvaluatedCell(object):
    __slots__ = ['client_ranks', 'rank', 'calc', 'args', 'is_constant',
                 'clients', 'failure_count', 'name', 'arg_ranks',
                 'consequences', 'recycled', 'default']

    def __init__(self, name, calc, args, recycling=None, default=None):
        self.name = name
        self.rank = None
        self.calc = calc
        self.default = default
        self.args = tuple(args)
        self.recycled = recycling
        if recycling:
            self.args = (self,) + self.args
        self.is_constant = True
        for arg in args:
            arg.addClient(self)
            if not arg.is_constant:
                self.is_constant = False
        self.clients = []
        self.client_ranks = []
        self.failure_count = 0

    def addClient(self, client):
        self.clients.append(client)

    def update(self, data):
        data[self.rank] = self.calc(
            *[data[arg_rank] for arg_rank in self.arg_ranks])

    def prime(self, data_sets):
        if self.is_constant:
            # Just calc once
            self.update(data_sets[0])
            for data in data_sets[1:]:
                data[self.rank] = data_sets[0][self.rank]
        else:
            for data in data_sets:
                self.update(data)

    def reportError(self, detail, data):
        self.failure_count += 1
        if self.failure_count <= 5:
            print "%s in calculating %s:" % (
                detail.__class__.__name__, self.name)
        if self.failure_count == 5:
            print "Additional failures of this type will not be reported."
        if self.failure_count < 2:
            print '%s inputs were:' % len(self.arg_ranks)
            for (i, arg) in enumerate(self.arg_ranks):
                print '%s: ' % i + repr(data[arg])

class ConstCell(object):
    __slots__ = ['name', 'scope', 'value', 'rank', 'consequences', 'clients']

    recycled = False
    is_constant = True
    args = ()

    def __init__(self, name, value):
        self.name = name
        self.clients = []
        self.value = value

    def addClient(self, client):
        self.clients.append(client)

class Calculator(object):
    """A complete hierarchical function with N evaluation steps to call
    for each change of inputs.
Made by a ParameterController.""" def __init__(self, cells, defns, remaining_parallel_context=None, overall_parallel_context=None, trace=None, with_undo=True): if trace is None: trace = TRACE_DEFAULT self.overall_parallel_context = overall_parallel_context self.remaining_parallel_context = remaining_parallel_context self.with_undo = with_undo self.results_by_id = defns self.opt_pars = [] other_cells = [] for cell in cells: if isinstance(cell, OptPar): self.opt_pars.append(cell) else: other_cells.append(cell) self._cells = self.opt_pars + other_cells data_sets = [[0], [0,1]][self.with_undo] self.cell_values = [[None]*len(self._cells) for switch in data_sets] self.arg_ranks = [[] for cell in self._cells] for (i, cell) in enumerate(self._cells): cell.rank = i cell.consequences = {} if isinstance(cell, OptPar): for switch in data_sets: self.cell_values[switch][i] = cell.default_value elif isinstance(cell, ConstCell): for switch in data_sets: self.cell_values[switch][i] = cell.value elif isinstance(cell, EvaluatedCell): cell.arg_ranks = [] for arg in cell.args: if hasattr(arg, 'client_ranks'): arg.client_ranks.append(i) self.arg_ranks[i].append(arg.rank) cell.arg_ranks.append(arg.rank) with parallel.parallel_context(self.remaining_parallel_context): try: cell.prime(self.cell_values) except KeyboardInterrupt: raise except Exception, detail: print ("Failed initial calculation of %s" % cell.name) raise else: raise RuntimeError('Unexpected Cell type %s' % type(cell)) self._switch = 0 self.recycled_cells = [ cell.rank for cell in self._cells if cell.recycled] self.spare = [None] * len (self._cells) for cell in self._cells[::-1]: for arg in cell.args: arg.consequences[cell.rank] = True arg.consequences.update(cell.consequences) self._programs = {} # Just for timings pre-calc these for opt_par in self.opt_pars: self.cellsChangedBy([(opt_par.rank, None)]) self.last_values = self.getValueArray() self.last_undo = [] self.elapsed_time = 0.0 self.evaluations = 0 
self.setTracing(trace) self.optimised = False def _graphviz(self): """A string in the 'dot' graph description language used by the program 'Graphviz'. One box per cell, grouped by Defn.""" lines = ['digraph G {\n rankdir = LR\n ranksep = 1\n'] evs = [] for cell in self._cells: if cell.name not in evs: evs.append(cell.name) nodes = dict([(name, []) for name in evs]) edges = [] for cell in self._cells: if hasattr(cell, 'name'): nodes[cell.name].append(cell) for arg in cell.args: if arg is not cell: edges.append('"%s":%s -> "%s":%s' % (arg.name, arg.rank, cell.name, cell.rank)) for name in evs: all_const = True some_const = False enodes = [name.replace('edge', 'QQQ')] for cell in nodes[name]: value = self._getCurrentCellValue(cell) if isinstance(value, float): label = '%5.2e' % value else: label = '[]' label = '<%s> %s' % (cell.rank, label) enodes.append(label) all_const = all_const and cell.is_constant some_const = some_const or cell.is_constant enodes = '|'.join(enodes) colour = ['', ' fillcolor=gray90, style=filled,'][some_const] colour = [colour, ' fillcolor=gray, style=filled,'][all_const] lines.append('"%s" [shape = "record",%s label="%s"];' % (name, colour, enodes)) lines.extend(edges) lines.append('}') return '\n'.join(lines).replace('edge', 'egde').replace('QQQ', 'edge') def graphviz(self, keep=False): """Use Graphviz to display a graph representing the inner workings of the calculator. Leaves behind a temporary file (so that Graphviz can redraw it with different settings) unless 'keep' is False""" import tempfile, os, sys if sys.platform != 'darwin': raise NotImplementedError, "Graphviz support Mac only at present" GRAPHVIZ = '/Applications/Graphviz.app' # test that graphviz is installed if not os.path.exists(GRAPHVIZ): raise RuntimeError('%s not present' % GRAPHVIZ) text = self._graphviz() fn = tempfile.mktemp(prefix="calc_", suffix=".dot") f = open(fn, 'w') f.write(text) f.close() # Mac specific! # Specify Graphviz as ".dot" can mean other things. 
# Would be sensible to eventually use LaunchServices. os.system('open -a "%s" "%s"' % (GRAPHVIZ, fn)) if not keep: time.sleep(5) os.remove(fn) def optimise(self, **kw): x = self.getValueArray() bounds = self.getBoundsVectors() maximise(self, x, bounds, **kw) self.optimised = True def setTracing(self, trace=False): """With 'trace' true every evaluated is printed. Useful for profiling and debugging.""" self.trace = trace if trace: print n_opars = len(self.opt_pars) n_cells = len([c for c in self._cells if not c.is_constant]) print n_opars, "OptPars and", n_cells - n_opars, "derived values" print 'OptPars: ', ', '.join([par.name for par in self.opt_pars]) print "Times in 1/%sths of a second" % TRACE_SCALE groups = [] groupd = {} for cell in self._cells: if cell.is_constant or not isinstance(cell, EvaluatedCell): continue if cell.name not in groupd: group = [] groups.append((cell.name, group)) groupd[cell.name] = group groupd[cell.name].append(cell) widths = [] for (name, cells) in groups: width = 4 + len(cells) widths.append(min(15, width)) self._cellsGroupedForDisplay = zip(groups, widths) for ((name, cells), width) in self._cellsGroupedForDisplay: print name[:width].ljust(width), '|', print for width in widths: print '-' * width, '|', print def getValueArray(self): """This being a caching function, you can ask it for its current input! Handy for initialising the optimiser.""" values = [p.transformToOptimiser(self._getCurrentCellValue(p)) for p in self.opt_pars] return values # getBoundsVectors and testoptparvector make up the old LikelihoodFunction # interface expected by the optimiser. 
def getBoundsVectors(self): """2 arrays: minimums, maximums""" lower = numpy.zeros([len(self.opt_pars)], Float) upper = numpy.zeros([len(self.opt_pars)], Float) for (i, opt_par) in enumerate(self.opt_pars): (lb, ub) = opt_par.getOptimiserBounds() lower[i] = lb upper[i] = ub return (lower, upper) def fuzz(self, random_series=None, seed=None): # Slight randomisation suitable for removing right-on-the- # ridge starting points before local optimisation. if random_series is None: import random random_series = random.Random() if seed is not None: random_series.seed(seed) X = self.getValueArray() for (i, (l,u)) in enumerate(zip(*self.getBoundsVectors())): sign = random_series.choice([-1, +1]) step = random_series.uniform(+0.05, +0.025) X[i] = max(l,min(u,(1.0 + sign*step*X[i]))) self.testoptparvector(X) self.optimised = False def testoptparvector(self, values): """AKA self(). Called by optimisers. Returns the output value after doing any recalculation required for the new input 'values' array""" assert len(values) == len(self.opt_pars) changes = [(i, new) for (i, (old, new)) in enumerate(zip(self.last_values, values)) if old != new] return self.change(changes) __call__ = testoptparvector def testfunction(self): """Return the current output value without changing any inputs""" return self._getCurrentCellValue(self._cells[-1]) def change(self, changes): """Returns the output value after applying 'changes', a list of (optimisable_parameter_ordinal, new_value) tuples.""" t0 = time.time() self.evaluations += 1 assert parallel.getContext() is self.overall_parallel_context, ( parallel.getContext(), self.overall_parallel_context) # If ALL of the changes made in the last step are reversed in this step # then it is safe to undo them first, taking advantage of the 1-deep # cache. 
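The recalculation machinery in `change`/`cellsChangedBy` rests on a `consequences` map built once in `Calculator.__init__`: walking the cells in reverse evaluation order, each argument inherits its client's rank plus everything downstream of it, so the set of cells to recompute for any changed parameter is a precomputed lookup. A toy sketch of that bookkeeping on a small dependency DAG (names and dict-of-sets representation are illustrative, not the PyCogent data structures):

```python
def build_consequences(cells, deps):
    """cells: cell names in evaluation (topological) order.
    deps: name -> list of argument names it is computed from.
    Walking in reverse order lets each argument absorb its client's
    already-complete downstream set in one pass."""
    consequences = {name: set() for name in cells}
    for name in reversed(cells):
        for arg in deps.get(name, []):
            consequences[arg].add(name)
            consequences[arg] |= consequences[name]
    return consequences

# a, b are parameters; ab depends on both; out depends on ab.
cells = ['a', 'b', 'ab', 'out']
deps = {'ab': ['a', 'b'], 'out': ['ab']}
cons = build_consequences(cells, deps)
# the "program" for a change to 'a': affected cells in evaluation order
program = [c for c in cells if c in cons['a']]
```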
        if self.with_undo and self.last_undo:
            for (i, v) in self.last_undo:
                if (i, v) not in changes:
                    break
            else:
                changes = [ch for ch in changes if ch not in self.last_undo]
                self._switch = not self._switch
                for (i, v) in self.last_undo:
                    self.last_values[i] = v
        self.last_undo = []
        program = self.cellsChangedBy(changes)

        if self.with_undo:
            self._switch = not self._switch
            data = self.cell_values[self._switch]
            base = self.cell_values[not self._switch]

            # recycle and undo interact in bad ways
            for rank in self.recycled_cells:
                if data[rank] is not base[rank]:
                    self.spare[rank] = data[rank]
            data[:] = base[:]
            for cell in program:
                if cell.recycled:
                    if data[cell.rank] is base[cell.rank]:
                        data[cell.rank] = self.spare[cell.rank]
                        assert data[cell.rank] is not base[cell.rank]
        else:
            data = self.cell_values[self._switch]

        # Set new OptPar values
        changed_optpars = []
        for (i, v) in changes:
            if i < len(self.opt_pars):
                assert isinstance(v * 1.0, float), v
                changed_optpars.append((i, self.last_values[i]))
                self.last_values[i] = v
                data[i] = self.opt_pars[i].transformFromOptimiser(v)
            else:
                data[i] = v

        with parallel.parallel_context(self.remaining_parallel_context):
            try:
                if self.trace:
                    self.tracingUpdate(changes, program, data)
                else:
                    self.plainUpdate(program, data)

                # if non-optimiser parameter was set then undo is invalid
                if (self.last_undo and
                        max(self.last_undo)[0] >= len(self.opt_pars)):
                    self.last_undo = []
                else:
                    self.last_undo = changed_optpars
            except CalculationInterupted, detail:
                if self.with_undo:
                    self._switch = not self._switch
                for (i, v) in changed_optpars:
                    self.last_values[i] = v
                self.last_undo = []
                (cell, exception) = detail.args
                raise exception
            finally:
                self.elapsed_time += time.time() - t0

        return self.cell_values[self._switch][-1]

    def cellsChangedBy(self, changes):
        # What OptPars have been changed determines cells to update
        change_key = dict(changes).keys()
        change_key.sort()
        change_key = tuple(change_key)
        if change_key in self._programs:
            program = self._programs[change_key]
        else:
            # Make a list of the cells to update and cache it.
            consequences = {}
            for i in change_key:
                consequences.update(self._cells[i].consequences)
            self._programs[change_key] = program = (
                [cell for cell in self._cells if cell.rank in consequences])
        return program

    def plainUpdate(self, program, data):
        try:
            for cell in program:
                data[cell.rank] = cell.calc(*[data[a] for a in cell.arg_ranks])
        except ParameterOutOfBoundsError, detail:
            # Non-fatal error, just cancel this calculation.
            raise CalculationInterupted(cell, detail)
        except ArithmeticError, detail:
            # Non-fatal but unexpected error.  Warn and cancel this
            # calculation.
            cell.reportError(detail, data)
            raise CalculationInterupted(cell, detail)

    def tracingUpdate(self, changes, program, data):
        # Does the same thing as plainUpdate, but also produces lots of
        # output showing how long each step of the calculation takes.
        # One line per call, '-' for undo, '+' for calculation
        exception = None
        elapsed = {}
        for cell in program:
            try:
                t0 = time.time()
                data[cell.rank] = cell.calc(*[data[a] for a in cell.arg_ranks])
                t1 = time.time()
            except (ParameterOutOfBoundsError, ArithmeticError), exception:
                error_cell = cell
                break
            elapsed[cell.rank] = (t1 - t0)

        tds = []
        for ((name, cells), width) in self._cellsGroupedForDisplay:
            text = ''.join([' +'[cell.rank in elapsed] for cell in cells])
            elap = sum([elapsed.get(cell.rank, 0) for cell in cells])
            if len(text) > width - 4:
                edge_width = min(len(text), (width - 4 - 3)) // 2
                ellipsis = [' ', '...'][not not text.strip()]
                text = text[:edge_width] + ellipsis + text[-edge_width:]
            tds.append('%s%4s' % (text, int(TRACE_SCALE * elap + 0.5) or ''))

        par_descs = []
        for (i, v) in changes:
            cell = self._cells[i]
            if isinstance(cell, OptPar):
                par_descs.append('%s=%8.6f' % (cell.name, v))
            else:
                par_descs.append('%s=?' % cell.name)
        par_descs = ', '.join(par_descs)[:22].ljust(22)
        print ' | '.join(tds + ['']),
        if exception:
            print '%15s | %s' % ('', par_descs)
            error_cell.reportError(exception, data)
            raise CalculationInterupted(cell, exception)
        else:
            print '%-15s | %s' % (repr(data[-1])[:15], par_descs)

    def measureEvalsPerSecond(self, time_limit=1.0, wall=True, sa=False):
        # Returns an estimate of the number of evaluations per second
        # an each-optpar-in-turn simulated annealing type optimiser
        # can achieve, spending not much more than 'time_limit' doing
        # so.  'wall'=False causes process time to be used instead of
        # wall time.
        # 'sa' makes it simulated-annealing-like, with frequent backtracks
        if wall:
            now = time.time
        else:
            now = time.clock
        x = self.getValueArray()
        samples = []
        elapsed = 0.0
        rounds_per_sample = 2
        comm = parallel.getCommunicator()
        while elapsed < time_limit and len(samples) < 5:
            time.sleep(0.01)
            t0 = now()
            last = []
            for j in range(rounds_per_sample):
                for (i, v) in enumerate(x):
                    # Not a real change, but works like one.
                    self.change(last + [(i, v)])
                    if sa and (i + j) % 2:
                        last = [(i, v)]
                    else:
                        last = []
            # Use one agreed on delta otherwise different cpus will finish
            # the loop at different times causing chaos.
            delta = comm.allreduce(now() - t0, parallel.MPI.MAX)
            if delta < 0.1:
                # time.clock is low res, so need to ensure each sample
                # is long enough to take SOME time.
                rounds_per_sample *= 2
                continue
            else:
                rate = rounds_per_sample * len(x) / delta
                samples.append(rate)
                elapsed += delta

        if wall:
            samples.sort()
            return samples[len(samples) // 2]
        else:
            return sum(samples) / len(samples)

    def _getCurrentCellValue(self, cell):
        return self.cell_values[self._switch][cell.rank]

    def getCurrentCellValuesForDefn(self, defn):
        cells = self.results_by_id[id(defn)]
        return [self.cell_values[self._switch][cell.rank] for cell in cells]

    def __getBoundedRoot(self, func, origX, direction, bound, xtol):
        return find_root(func, origX, direction, bound, xtol=xtol,
                expected_exception=(
                    ParameterOutOfBoundsError, ArithmeticError))

    def _getCurrentCellInterval(self, opt_par, dropoff, xtol=None):
        # (min, opt, max) tuples for each parameter where f(min) ==
        # f(max) == f(opt)-dropoff.  Uses None when a bound is hit.
        # assert self.optimised, "Call optimise() first"
        origY = self.testfunction()
        (lower, upper) = opt_par.getOptimiserBounds()
        opt_value = self._getCurrentCellValue(opt_par)
        origX = opt_par.transformToOptimiser(opt_value)
        def func(x):
            Y = self.change([(opt_par.rank, x)])
            return Y - (origY - dropoff)
        try:
            lowX = self.__getBoundedRoot(func, origX, -1, lower, xtol)
            highX = self.__getBoundedRoot(func, origX, +1, upper, xtol)
        finally:
            func(origX)

        triple = []
        for x in [lowX, origX, highX]:
            if x is not None:
                x = opt_par.transformFromOptimiser(x)
            triple.append(x)
        return tuple(triple)

PyCogent-1.5.3/cogent/recalculation/definition.py

#!/usr/bin/env python
"""A recalculation engine, something like a spreadsheet.

Goals:
 - Allow construction of a calculation in a flexible and declarative way.
 - Enable caching at any step in the calculation where it makes sense.

Terms:
 - Definition - defines one cachable step in a complex calculation.
 - ParameterController - Sets parameter scope rules on a DAG of Definitions.
 - Calculator - An instance of an internally caching function.
 - Category - An arbitrary label.
 - Dimension - A named set of categories.
 - Scope - A subset of the categories from each dimension.
 - Setting - A variable (Var) or constant (ConstVal).
 - Assignments - A mapping from Scopes to Settings.
 - Cell - Evaluates one Scope of one Definition.
 - OptPar - A cell with indegree 0.

Structure:
 - A Calculator holds a list of Cells: OptPars and EvaluatedCells.
 - EvaluatedCells take their arguments from other Cells.
 - Each type of cell (motifs, Qs, Psubs) made by a different CalculationDefn.
 - No two cells from the same CalculationDefn have the same inputs, so
   nothing is calculated twice.

Interface:
 1) Define a function for each step in the calculation.
 2) Instantiate a DAG of ParamDefns and CalcDefns, each CalcDefn like
    CalcDefn(f)(*args) where 'f' is one of your functions and the '*args'
    are Defns that correspond to the arguments of 'f'.
 3) With your final CalcDefn called say 'top', PC = ParameterController(top)
    to get a ParameterController.
 4) PC.assignAll(param, value=value, **scope) to define the parameter scopes.
    'value' can be a constant float or an instance of Var.
 5) calculator = PC.makeCalculator() to get a live Calculator.
 6) calculator.optimise() etc.

Caching:
In addition to the caching provided by the update strategy (not
recalculating anything that hasn't changed), the calculator keeps a 1-deep
cache of the previous value for each cell so that it has a 1-deep undo
capability.  This is ideal for the behaviour of a one-change-at-a-time
simanneal optimiser, which backtracks when a new value isn't accepted,
ie it tries sequences like:
  [0,0,0] [0,0,3] [0,8,0] [7,0,0] [0,0,4] [0,6,0] ...
when it isn't making progress, and
  [0,0,0] [0,0,3] [0,8,3] [7,8,3] [7,8,9] ...
when it's having a lucky streak.

Each cell knows when it's out of date, but doesn't know why (ie: what
input changed), which limits the undo strategy to all-or-nothing.  An
optimiser that tried values [0,0,0] [0,3,8] [0,3,0] ...
(ie: the third step is a recombination of the previous two) would not get
any help from caching.  This does keep things simple and fast though.

Recycling:
If defn.recycling is True then defn.calc() will be passed the previous
result as its first argument so it can be reused.  This is to avoid having
to reallocate memory for say a large numpy array just at the very moment
that an old one of the same shape is being disposed of.  To prevent
recycling from invalidating the caching system 3 values are stored for
each cell - current, previous and spare.  The spare value is the one to be
used next for recycling.
"""

from __future__ import division, with_statement

import warnings

import numpy

# In this module we bring together scopes, settings and calculations.
# Most of the classes are 'Defns' with their superclasses in scope.py.
# These supply a makeCells() method which instantiates 'Cell'
# classes from calculation.py

from .calculation import EvaluatedCell, OptPar, LogOptPar, ConstCell
from .scope import _NonLeafDefn, _LeafDefn, _Defn, ParameterController
from .setting import Var, ConstVal
from cogent.util.dict_array import DictArrayTemplate
from cogent.maths.stats.distribution import chdtri
from cogent.util import parallel

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ = "pm67nz@gmail.com"
__status__ = "Production"


class CalculationDefn(_NonLeafDefn):
    """Defn for a derived value.  In most cases use CalcDefn instead.

    The only reason for subclassing this directly would be to override
    .makeCalcFunction() or setup()."""

    recycling = False

    # positional arguments are inputs to this step of the calculation,
    # keyword arguments are passed on to self.setup(), likely to end up
    # as static attributes of this CalculationDefn, to be used (as self.X)
    # by its 'calc' method.
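The "spreadsheet-like" update strategy described in the module docstring — recompute only the cells downstream of a changed input, serve everything else from cache — can be illustrated with a minimal, self-contained toy. The `ToyCell`/`toy_cell` names below are hypothetical and are not part of the `cogent.recalculation` API; this is a sketch of the idea, not of PyCogent's implementation.

```python
# Toy recalculation engine: a DAG of cells where setting a leaf marks only
# its downstream cells stale, and get() recomputes only stale cells.

class ToyCell(object):
    def __init__(self, func, args):
        self.func = func      # callable computing this step (None for leaves)
        self.args = args      # input ToyCells (empty for leaf inputs)
        self.clients = []     # cells that consume this one
        self.value = None
        self.stale = True     # out of date, like a dirty spreadsheet cell
        self.evaluations = 0  # count of real recomputations

    def set(self, value):
        # Changing a leaf invalidates every cell downstream of it.
        self.value = value
        self._invalidate()

    def _invalidate(self):
        self.stale = True
        for client in self.clients:
            client._invalidate()

    def get(self):
        if self.stale:
            if self.args:
                self.value = self.func(*[a.get() for a in self.args])
                self.evaluations += 1
            self.stale = False
        return self.value

def toy_cell(func=None, args=()):
    cell = ToyCell(func, list(args))
    for a in args:
        a.clients.append(cell)
    return cell

# Build a two-step calculation: total = (x * y) + z
x, y, z = toy_cell(), toy_cell(), toy_cell()
prod = toy_cell(lambda a, b: a * b, (x, y))
total = toy_cell(lambda p, c: p + c, (prod, z))

x.set(2.0); y.set(3.0); z.set(1.0)
assert total.get() == 7.0
z.set(4.0)                    # only the '+' step is downstream of z...
assert total.get() == 10.0
assert prod.evaluations == 1  # ...so the '*' step came from the cache
```

As in the docstring's Caching section, each toy cell knows only that it is out of date, not why; a real undo (the calculator's 1-deep cache) would additionally keep the previous value per cell.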
    def makeParamController(self):
        return ParameterController(self)

    def setup(self):
        pass

    def makeCalcFunction(self):
        return self.calc

    def makeCell(self, *args):
        calc = self.makeCalcFunction()
        # can't calc outside correct parallel context, so can't do
        # if [arg for arg in args if not arg.is_constant]:
        cell = EvaluatedCell(self.name, calc, args,
                recycling=self.recycling, default=self.default)
        return cell

    def makeCells(self, input_soup, variable=None):
        # input soup contains all necessary values for calc on self.
        # Going from defns to cells.
        cells = []
        for input_nums in self.uniq:
            args = []
            for (arg, u) in zip(self.args, input_nums):
                arg = input_soup[id(arg)][u]
                args.append(arg)
            cell = self.makeCell(*args)
            cells.append(cell)
        return (cells, cells)


class _FuncDefn(CalculationDefn):
    def __init__(self, calc, *args, **kw):
        self.calc = calc
        CalculationDefn.__init__(self, *args, **kw)


# Use this rather than having to subclass CalculationDefn
# just to supply the 'calc' method.
class CalcDefn(object):
    """CalcDefn(function)(arg1, arg2)"""

    def __init__(self, calc, name=None, **kw):
        self.calc = calc
        if name is None:
            name = self.calc.__name__
        else:
            assert isinstance(name, basestring), name
        kw['name'] = name
        self.kw = kw

    def __call__(self, *args):
        return _FuncDefn(self.calc, *args, **self.kw)


class WeightedPartitionDefn(CalculationDefn):
    """Uses a PartitionDefn (ie: N-1 optimiser parameters) to make an
    array of floats with weighted average of 1.0"""

    def __init__(self, weights, name):
        N = len(weights.bin_names)
        partition = PartitionDefn(size=N, name=name + '_partition')
        partition.user_param = False
        CalculationDefn.__init__(self, weights, partition,
                name=name + '_distrib')

    def calc(self, weights, values):
        scale = numpy.sum(weights * values)
        return values / scale


class MonotonicDefn(WeightedPartitionDefn):
    """Uses a PartitionDefn (ie: N-1 optimiser parameters) to make an
    ordered array of floats with weighted average of 1.0"""

    def calc(self, weights, increments):
        values = numpy.add.accumulate(increments)
        scale = numpy.sum(weights * values)
        return values / scale


class GammaDefn(MonotonicDefn):
    """Uses 1 optimiser parameter to define a gamma distribution, divides
    the distribution into N equal probability bins and makes an array of
    their medians.  If N > 2 medians are approx means so their average is
    approx 1.0, but not quite, so we scale them to make it exactly 1.0"""

    name = 'gamma'

    def __init__(self, weights, name=None, default_shape=1.0,
            extra_label=None, dimensions=()):
        name = self.makeName(name, extra_label)
        shape = PositiveParamDefn(name + '_shape', default=default_shape,
                dimensions=dimensions, lower=1e-2)
        CalculationDefn.__init__(self, weights, shape,
                name=name + '_distrib')

    def calc(self, weights, a):
        from cogent.maths.stats.distribution import gdtri
        weights = weights / numpy.sum(weights)
        percentiles = (numpy.add.accumulate(weights) - weights * 0.5)
        medians = numpy.array([gdtri(a, a, p) for p in percentiles])
        scale = numpy.sum(medians * weights)
        # assert 0.5 < scale < 2.0, scale  # medians as approx. to means.
        return medians / scale


class _InputDefn(_LeafDefn):
    user_param = True

    def __init__(self, name=None, default=None, dimensions=None,
            lower=None, upper=None, **kw):
        _LeafDefn.__init__(self, name=name, dimensions=dimensions, **kw)
        if default is not None:
            if hasattr(default, '__len__'):
                default = numpy.array(default)
            self.default = default
        # these two have no effect on constants
        if lower is not None:
            self.lower = lower
        if upper is not None:
            self.upper = upper

    def makeParamController(self):
        return ParameterController(self)

    def updateFromCalculator(self, calc):
        outputs = calc.getCurrentCellValuesForDefn(self)
        for (output, setting) in zip(outputs, self.uniq):
            setting.value = output

    def getNumFreeParams(self):
        (cells, outputs) = self.makeCells({}, None)
        return len([c for c in cells if isinstance(c, OptPar)])


class ParamDefn(_InputDefn):
    """Defn for an optimisable, scalar input to the calculation"""
    numeric = True
    const_by_default = False
    independent_by_default = False
    opt_par_class = OptPar

    # These can be overridden in a subclass or the constructor
    default = 1.0
    lower = -1e10
    upper = +1e10

    def makeDefaultSetting(self):
        return Var(bounds=(self.lower, self.default, self.upper))

    def checkSettingIsValid(self, setting):
        pass

    def makeCells(self, input_soup={}, variable=None):
        uniq_cells = []
        for (i, v) in enumerate(self.uniq):
            scope = [key for key in self.assignments
                    if self.assignments[key] is v]
            if v.is_constant or (variable is not None and variable is not v):
                cell = ConstCell(self.name, v.value)
            else:
                cell = self.opt_par_class(self.name, scope, v.getBounds())
            uniq_cells.append(cell)
        return (uniq_cells, uniq_cells)


# Example / basic ParamDefn subclasses

class PositiveParamDefn(ParamDefn):
    lower = 0.0

class ProbabilityParamDefn(PositiveParamDefn):
    upper = 1.0

class RatioParamDefn(PositiveParamDefn):
    lower = 1e-6
    upper = 1e+6
    opt_par_class = LogOptPar


class NonScalarDefn(_InputDefn):
    """Defn for an array or other such object that is an input but can
    not be optimised"""
    user_param = False
    numeric = False
    const_by_default = True
    independent_by_default = False
    default = None

    def makeDefaultSetting(self):
        if self.default is None:
            return None
        else:
            return ConstVal(self.default)

    def checkSettingIsValid(self, setting):
        if not isinstance(setting, ConstVal):
            raise ValueError("%s can only be constant" % self.name)

    def makeCells(self, input_soup={}, variable=None):
        if None in self.uniq:
            if [v for v in self.uniq if v is not None]:
                scope = [key for key in self.assignments
                        if self.assignments[key] is None]
                msg = 'Unoptimisable input "%%s" not set for %s' % scope
            else:
                msg = 'Unoptimisable input "%s" not given'
            raise ValueError(msg % self.name)
        uniq_cells = [ConstCell(self.name, v.value) for v in self.uniq]
        return (uniq_cells, uniq_cells)

    def getNumFreeParams(self):
        return 0

    def updateFromCalculator(self, calc):
        pass  # don't reset parallel_context etc.


def _proportions(total, params):
    """List of N proportions from N-1 ratios

    >>> _proportions(1.0, [3, 1, 1])
    [0.125, 0.125, 0.375, 0.375]"""
    if len(params) == 0:
        return [total]
    half = (len(params) + 1) // 2
    part = 1.0 / (params[0] + 1.0)  # ratio -> proportion
    return _proportions(total * part, params[1:half]) + \
            _proportions(total * (1.0 - part), params[half:])


def _unpack_proportions(values):
    """List of N-1 ratios from N proportions"""
    if len(values) == 1:
        return []
    half = len(values) // 2
    (num, denom) = (sum(values[half:]), sum(values[:half]))
    assert num > 0 and denom > 0
    ratio = num / denom
    return [ratio] + _unpack_proportions(values[:half]) + \
            _unpack_proportions(values[half:])


class PartitionDefn(_InputDefn):
    """A partition such as mprobs can be const or optimised.
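The `_proportions` / `_unpack_proportions` helpers above implement the N-1 ratio parametrisation that PartitionDefn relies on: N probabilities summing to 1.0 are stored as N-1 unconstrained positive ratios. Here is a self-contained sketch of that round trip; the functions are reimplemented (under different names) so the check can run outside PyCogent, and are assumed to behave like the helpers in this file.

```python
# Standalone copies of the ratio <-> proportion conversion used by
# PartitionDefn, for illustration only.

def proportions(total, params):
    """List of N proportions from N-1 ratios."""
    if len(params) == 0:
        return [total]
    half = (len(params) + 1) // 2
    part = 1.0 / (params[0] + 1.0)  # ratio -> proportion
    return (proportions(total * part, params[1:half]) +
            proportions(total * (1.0 - part), params[half:]))

def unpack_proportions(values):
    """List of N-1 ratios from N proportions."""
    if len(values) == 1:
        return []
    half = len(values) // 2
    num, denom = sum(values[half:]), sum(values[:half])
    assert num > 0 and denom > 0
    ratio = num / denom
    return ([ratio] + unpack_proportions(values[:half]) +
            unpack_proportions(values[half:]))

# The doctest value from _proportions:
assert proportions(1.0, [3, 1, 1]) == [0.125, 0.125, 0.375, 0.375]

# Round trip: a positive partition summing to 1.0 survives unchanged.
probs = [0.1, 0.2, 0.3, 0.4]
ratios = unpack_proportions(probs)
assert len(ratios) == len(probs) - 1
recovered = proportions(1.0, ratios)
assert all(abs(a - b) < 1e-12 for a, b in zip(recovered, probs))
```

Because each ratio is strictly positive and unbounded above, the ratios suit the log-scale optimiser parameters (LogOptPar) that `_makePartitionCell` below creates, while the sum-to-one constraint is enforced by construction.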
    Optimised is a bit tricky since it isn't just a scalar."""

    numeric = False  # well, not scalar anyway
    const_by_default = False
    independent_by_default = False

    def __init__(self, default=None, name=None, dimensions=None,
            dimension=None, size=None, **kw):
        assert name
        if size is not None:
            pass
        elif default is not None:
            size = len(default)
        elif dimension is not None:
            size = len(dimension[1])
        self.size = size
        if dimension is not None:
            self.internal_dimension = dimension
            (dim_name, dim_cats) = dimension
            self.bin_names = dim_cats
            self.array_template = DictArrayTemplate(dim_cats)
            self.internal_dimensions = (dim_name,)
        if default is None:
            default = self._makeDefaultValue()
        elif self.array_template is not None:
            default = self.array_template.unwrap(default)
        else:
            default = numpy.asarray(default)
        _InputDefn.__init__(self, name=name, default=default,
                dimensions=dimensions, **kw)
        self.checkValueIsValid(default, True)

    def _makeDefaultValue(self):
        return numpy.array([1.0 / self.size] * self.size)

    def makeDefaultSetting(self):
        # return ConstVal(self.default)
        return Var((None, self.default.copy(), None))

    def checkSettingIsValid(self, setting):
        value = setting.getDefaultValue()
        return self.checkValueIsValid(value, setting.is_constant)

    def checkValueIsValid(self, value, is_constant):
        if value.shape != (self.size,):
            raise ValueError("Wrong array shape %s for %s, expected (%s,)" %
                    (value.shape, self.name, self.size))
        for part in value:
            if part < 0:
                raise ValueError("Negative probability in %s" % self.name)
            if part > 1:
                raise ValueError("Probability > 1 in %s" % self.name)
            if not is_constant:
                # 0 or 1 leads to log(0) or log(inf) in optimiser code
                if part == 0:
                    raise ValueError(
                        "Zeros allowed in %s only when constant" % self.name)
                if part == 1:
                    raise ValueError(
                        "Ones allowed in %s only when constant" % self.name)
        if abs(sum(value) - 1.0) > .00001:
            raise ValueError("Elements of %s must sum to 1.0, not %s" %
                    (self.name, sum(value)))

    def _makePartitionCell(self, name, scope, value):
        # This was originally put in its own function so as to provide a
        # closure containing the value of sum(value), which is no longer
        # required since it is now always 1.0.
        N = len(value)
        assert abs(sum(value) - 1.0) < .00001
        ratios = _unpack_proportions(value)
        ratios = [LogOptPar(name + '_ratio', scope, (1e-6, r, 1e+6))
                for r in ratios]
        def r2p(*ratios):
            return numpy.asarray(_proportions(1.0, ratios))
        partition = EvaluatedCell(name, r2p, tuple(ratios))
        return (ratios, partition)

    def makeCells(self, input_soup={}, variable=None):
        uniq_cells = []
        all_cells = []
        for (i, v) in enumerate(self.uniq):
            if v is None:
                raise ValueError("input %s not set" % self.name)
            assert hasattr(v, 'getDefaultValue'), v
            value = v.getDefaultValue()
            assert hasattr(value, 'shape'), value
            assert value.shape == (self.size,)
            scope = [key for key in self.assignments
                    if self.assignments[key] is v]
            assert value is not None
            if v.is_constant or (variable is not None and variable is not v):
                partition = ConstCell(self.name, value)
            else:
                (ratios, partition) = self._makePartitionCell(
                        self.name, scope, value)
                all_cells.extend(ratios)
            all_cells.append(partition)
            uniq_cells.append(partition)
        return (all_cells, uniq_cells)


def NonParamDefn(name, dimensions=None, **kw):
    # Just to get 2nd arg as dimensions
    return NonScalarDefn(name=name, dimensions=dimensions, **kw)


class ConstDefn(NonScalarDefn):
    # This isn't really needed - just use NonParamDefn
    name_required = False

    def __init__(self, value, name=None, **kw):
        NonScalarDefn.__init__(self, default=value, name=name, **kw)

    def checkSettingIsValid(self, setting):
        if setting is not None and setting.value is not self.default:
            raise ValueError("%s is constant" % self.name)


class SelectForDimension(_Defn):
    """A special kind of Defn used to bridge from Defns where a
    particular dimension is wrapped up inside an array to later Defns
    where each value has its own Defn, eg: gamma distributed rates"""

    name = 'select'
    user_param = True
    numeric = True  # not guaranteed!
    internal_dimensions = ()

    def __init__(self, arg, dimension, name=None):
        assert not arg.activated, arg.name
        if name is not None:
            self.name = name
        _Defn.__init__(self)
        self.args = (arg,)
        self.arg = arg
        self.valid_dimensions = arg.valid_dimensions
        if dimension not in self.valid_dimensions:
            self.valid_dimensions = self.valid_dimensions + (dimension,)
        self.dimension = dimension
        arg.addClient(self)

    def update(self):
        for scope_t in self.assignments:
            scope = dict(zip(self.valid_dimensions, scope_t))
            scope2 = dict((n, v) for (n, v) in scope.items()
                    if n != self.dimension)
            input_num = self.arg.outputOrdinalFor(scope2)
            pos = self.arg.bin_names.index(scope[self.dimension])
            self.assignments[scope_t] = (input_num, pos)
        self._update_from_assignments()
        self.values = [self.arg.values[i][p] for (i, p) in self.uniq]

    def _select(self, arg, p):
        return arg[p]

    def makeCells(self, input_soup, variable=None):
        cells = []
        distribs = input_soup[id(self.arg)]
        for (input_num, bin_num) in self.uniq:
            cell = EvaluatedCell(self.name,
                    (lambda x, p=bin_num: x[p]), (distribs[input_num],))
            cells.append(cell)
        return (cells, cells)


# Some simple CalcDefns

# SumDefn = CalcDefn(lambda *args: sum(args), 'sum')
# ProductDefn = CalcDefn(lambda *args: numpy.product(args), 'product')
# CallDefn = CalcDefn(lambda func, *args: func(*args), 'call')
# ParallelSumDefn = CalcDefn(lambda comm, local: comm.sum(local),
#         'parallel_sum')

class SwitchDefn(CalculationDefn):
    name = 'switch'

    def calc(self, condition, *args):
        return args[condition]

    def getShortcutCell(self, condition, *args):
        if condition.is_constant:
            return self.calc(self, condition.value, *args)


class VectorMatrixInnerDefn(CalculationDefn):
    name = 'evolve'

    def calc(self, pi, psub):
        return numpy.inner(pi, psub)

    def getShortcutCell(self, pi, psub):
        if psub.is_stationary:
            return pi


class SumDefn(CalculationDefn):
    name = 'sum'

    def calc(self, *args):
        return sum(args)


class ProductDefn(CalculationDefn):
    name = 'product'

    def calc(self, *args):
        return numpy.product(args)


class CallDefn(CalculationDefn):
    name = 'call'

    def calc(self, func, *args):
        return func(*args)


class ParallelSumDefn(CalculationDefn):
    name = 'parallel_sum'

    def calc(self, comm, local):
        return comm.allreduce(local)  # default MPI op is SUM


__all__ = ['ConstDefn', 'NonParamDefn', 'CalcDefn', 'SumDefn', 'ProductDefn',
        'CallDefn', 'ParallelSumDefn'] + [
        n for (n, c) in vars().items()
        if (isinstance(c, type) and issubclass(c, _Defn) and n[0] != '_')
        or isinstance(c, CalcDefn)]

PyCogent-1.5.3/cogent/recalculation/scope.py

#!/usr/bin/env python
from __future__ import division, with_statement

import warnings

import numpy

from contextlib import contextmanager

from .setting import Var, ConstVal
from .calculation import Calculator
from cogent.util import parallel
from cogent.maths.stats.distribution import chdtri
from cogent.maths.optimisers import MaximumEvaluationsReached

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ = "pm67nz@gmail.com"
__status__ = "Production"


class ScopeError(KeyError):
    pass

class InvalidScopeError(ScopeError):
    """for scopes including an unknown value for a known dimension"""
    pass

class InvalidDimensionError(ScopeError):
    """for scopes including an unknown dimension"""
    pass

class IncompleteScopeError(ScopeError):
    """For underspecified scope when retrieving values"""
    pass


# Can be passed to _LeafDefn.interpretScopes()
class _ExistentialQualifier(object):
    def __init__(self, cats=None):
        self.cats = cats

    def __repr__(self):
        if self.cats is None:
            return self.__class__.__name__
        else:
            return '%s(%s)' % (self.__class__.__name__, self.cats)

class EACH(_ExistentialQualifier):
    independent = True

class ALL(_ExistentialQualifier):
    independent = False


def theOneItemIn(items):
    assert len(items) == 1, items
    return iter(items).next()


def _indexed(values):
    # This is the core of the redundancy elimination, used to group
    # identical calculations.
    # >>> _indexed({'a':1.0, 'b':2.0, 'c':3.0, 'd':1.0, 'e':1.0})
    # ([1.0, 2.0, 3.0], {'a':0, 'b':1, 'c':2, 'd':0, 'e':0})
    uniq = []
    index = {}
    values = values.items()
    values.sort()
    for (key, value) in values:
        if value in uniq:
            u = uniq.index(value)
        else:
            u = len(uniq)
            uniq.append(value)
        index[key] = u
    return uniq, index


def _fmtrow(width, values, maxwidth):
    if (len(dict([(id(v), 1) for v in values])) == 1 and
            len(str(values[0])) > width):
        s = str(values[0]).replace('\n', ' ')
        if len(s) > maxwidth:
            s = s[:maxwidth - 4] + '...'
    else:
        template = '%%%ss' % width
        s = ''.join([(template % (v,)).replace('\n', ' ')[:width]
                for v in values])
    return s


class Undefined(object):
    # Placeholder for a value that can't be calculated
    # because input 'name' has not been provided.
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'Undef(%s)' % self.name


def nullor(name, f, recycled=False):
    # If None, record as undefined.
    # If undefined, propagate error up.
    # Otherwise, do the calculation.
    def g(*args):
        undef = [x for x in args if isinstance(x, Undefined)]
        if undef:
            return undef[0]
        elif None in args:
            return Undefined(name)
        else:
            if recycled:
                args = (None,) + args
            return f(*args)
    return g


# Level1: D E F I N I T I O N S

# Each ParamDefn supplied a .calc(args) method.  Used to define the
# calculation as a DAG of ParamDefns.

# A _Defn has two phases in its life: pre activation it just has .args,
# post activation (once it becomes part of a parameter controller) it
# holds a dynamic list of scope assignments.
# This means defn.makeParamController() can only be called once.

class _Defn(object):
    name = '?'
    default = None
    user_param = False

    def __init__(self):
        self.clients = []
        self.selection = {}
        self.assignments = {}
        self.activated = False

    def makeName(self, name, extra_label=None):
        if name is None:
            name = self.name
        if extra_label is not None:
            name += extra_label
        return name

    def getDefaultSetting(self):
        return None

    def addClient(self, client):
        assert not self.activated, self.name
        assert not self.assignments, self.assignments
        self.clients.append(client)

    def acrossDimension(self, dimension, cats):
        return [self.selectFromDimension(dimension, cat) for cat in cats]

    def selectFromDimension(self, dimension, cat):
        return SelectFromDimension(self, **{dimension: cat})

    def getRequiredScopes(self, arg_dimensions):
        # A list of scope dictionaries: [{dimension:value},] that this
        # Defn needs from an input Defn with `arg_dimensions`
        if not self.activated:
            assert not self.clients, self.clients
            raise RuntimeError('Value at "%s" step never used' % self.name)
        if self.assignments:
            result = []
            for scope_t in self.assignments:
                sel = {}
                sel.update(self.selection)
                for (d, c) in zip(self.valid_dimensions, scope_t):
                    if d in arg_dimensions:
                        sel[d] = c
                result.append(sel)
        else:
            result = [self.selection]
        return result

    def addScopes(self, scopes):
        assert not self.activated
        for scope in scopes:
            scope_t = [scope.get(d, 'all') for d in self.valid_dimensions]
            scope_t = tuple(scope_t)
            if scope_t not in self.assignments:
                self.assignments[scope_t] = self.getDefaultSetting()

    def outputOrdinalFor(self, scope):
        scope_t = tuple([scope[d] for d in self.valid_dimensions])
        return self.index[scope_t]

    def usedDimensions(self):
        used = []
        for (d, dim) in enumerate(self.valid_dimensions):
            seen = {}
            for (scope_t, i) in self.index.items():
                rest_of_scope = scope_t[:d] + scope_t[d + 1:]
                if rest_of_scope in seen:
                    if i != seen[rest_of_scope]:
                        used.append(dim)
                        break
                else:
                    seen[rest_of_scope] = i
        return tuple(used) + self.internal_dimensions

    def _getPosnForScope(self, *args, **scope):
        scope = self.interpretPositionalScopeArgs(*args, **scope)
        posns = set()
        for scope_t in self.interpretScope(**scope):
            posns.add(self.index[scope_t])
        if len(posns) == 0:
            raise InvalidScopeError("no value for %s at %s" %
                    (self.name, scope))
        if len(posns) > 1:
            raise IncompleteScopeError("%s distinct values of %s within %s" %
                    (len(posns), self.name, scope))
        return theOneItemIn(posns)

    def wrapValue(self, value):
        if isinstance(value, Undefined):
            raise ValueError('Input "%s" is not defined' % value.name)
        if getattr(self, 'array_template', None) is not None:
            value = self.array_template.wrap(value)
        return value

    def unwrapValue(self, value):
        if getattr(self, 'array_template', None) is not None:
            value = self.array_template.unwrap(value)
        return value

    def getCurrentValueForScope(self, *args, **scope):
        posn = self._getPosnForScope(*args, **scope)
        return self.wrapValue(self.values[posn])

    def getCurrentSettingForScope(self, *args, **scope):
        posn = self._getPosnForScope(*args, **scope)
        return self.uniq[posn]

    def interpretPositionalScopeArgs(self, *args, **scope):
        # Carefully turn scope args into scope kwargs
        assert len(args) <= len(self.valid_dimensions), args
        for (dimension, arg) in zip(self.valid_dimensions, args):
            assert dimension not in scope, dimension
            scope[dimension] = arg
        return scope

    def interpretScopes(self, independent=None, **kw):
        """A list of the scopes defined by the selecting keyword
        arguments.

        Keyword arguments should be of the form dimension=settings,
        where settings are a list of categories from that dimension,
        or an instance of EACH or ALL wrapping such a list.

        A missing list, None, or an uninstantiated ALL / EACH class is
        taken to mean the entire dimension.

        If 'independent' (which defaults to self.independent_by_default)
        is true then category lists not wrapped as an EACH or an ALL
        will be treated as an EACH, otherwise as an ALL.
        There will only be one scope in the resulting list unless at
        least one dimension is set to EACH."""
        if independent is None:
            independent = self.independent_by_default
        # interpretScopes is used for assigning, so should specify
        # the scope exactly
        for d in kw:
            if d not in self.valid_dimensions:
                raise InvalidDimensionError(d)
        # Initially ignore EACH, just get a full ungrouped set
        kw2 = {}
        independent_dimensions = []
        for (i, dimension) in enumerate(self.valid_dimensions):
            selection = kw.get(dimension, None)
            if selection in [EACH, ALL]:
                dimension_independent = selection.independent
                selection = None
            elif isinstance(selection, (EACH, ALL)):
                dimension_independent = selection.independent
                selection = selection.cats
            else:
                dimension_independent = independent
            if dimension_independent:
                independent_dimensions.append(i)
            if selection is not None:
                kw2[dimension] = selection
        all = self.interpretScope(**kw2)
        # Group independent scopes
        result = {}
        for scope_t in all:
            key = tuple([scope_t[i] for i in independent_dimensions])
            if key not in result:
                result[key] = set()
            result[key].add(scope_t)
        return result.values()

    def interpretScope(self, **kw):
        """A set of the scope-tuples that match the input dict like
        {dimension:[categories]}"""
        selector = []
        unused = {}
        valid_dimensions = list(self.valid_dimensions)
        for d in kw:
            if d not in valid_dimensions:
                continue
            if kw[d] is None:  # i.e.: ALL
                continue
            if isinstance(kw[d], str):
                kw[d] = [kw[d]]
            assert type(kw[d]) in [tuple, list], (d, kw[d])
            assert len(kw[d]), kw[d]
            selector.append((valid_dimensions.index(d), d, kw[d]))
            unused[d] = kw[d][:]

        result = set()
        for scope_t in self.assignments:
            for (i, d, cs) in selector:
                if scope_t[i] not in cs:
                    break
            else:
                result.add(scope_t)
                for (i, d, cs) in selector:
                    if d in unused:
                        if scope_t[i] in unused[d]:
                            unused[d].remove(scope_t[i])
                            if not unused[d]:
                                del unused[d]

        if unused:
            # print unused, self.assignments.keys()
            raise InvalidScopeError(unused)

        return result

    def fillParValueDict(self, result, dimensions, cell_value_lookup):
        """Low level method for extracting values.  Pushes values of
        this particular parameter/defn into the dict tree 'result',
        eg: length_defn.fillParValueDict(['edge']) populates 'result'
        like {'length':{'human':1.0, 'mouse':1.0}}"""

        assert self.name not in result, self.name
        posns = [
            list(self.valid_dimensions).index(d)
            for d in dimensions if d in self.valid_dimensions]
        for (scope_t, i) in self.index.items():
            value = cell_value_lookup(self, i)
            value = self.wrapValue(value)
            scope = tuple([scope_t[i] for i in posns])

            (d, key) = (result, self.name)
            for key2 in scope:
                if key not in d:
                    d[key] = {}
                (d, key) = (d[key], key2)

            if key in d and value != d[key]:
                msg = 'Multiple values for %s' % self.name
                if scope:
                    msg += ' within scope %s' % '/'.join(scope)
                raise IncompleteScopeError(msg)
            d[key] = value

    def _update_from_assignments(self):
        (self.uniq, self.index) = _indexed(self.assignments)

    def _local_repr(self, col_width, max_width):
        body = []
        for (i, arg) in enumerate(self.args):
            row = []
            if isinstance(arg, SelectFromDimension):
                argname = arg.arg.name
                for nums in self.uniq:
                    num = arg.uniq[nums[i]]
                    row.append(theOneItemIn(num))
            else:
                argname = arg.name
                for nums in self.uniq:
                    row.append(nums[i])
            body.append((['', self.name][i == 0], argname, row))
        return '\n'.join(
            ['%-10s%-10s%s' % (label1[:9], label2[:9],
                    _fmtrow(col_width + 1, settings, max_width))
                for (label1, label2, settings) in body])

    def __repr__(self):
        return '%s(%s x %s)' % (self.__class__.__name__, self.name,
                len(getattr(self, 'cells', [])))


class SelectFromDimension(_Defn):
    """A special kind of Defn used to bridge from Defns where a
    particular dimension is just part of the scope rules to later Defns
    where each value has its own Defn, eg: edges of a tree"""

    name = 'select'
    # params = {}

    def __init__(self, arg, **kw):
        assert not arg.activated, arg.name
        _Defn.__init__(self)
        self.args = (arg,)
        self.arg = arg
        self.valid_dimensions = tuple([
            d for d in arg.valid_dimensions if d not in kw])
        self.selection = kw
arg.addClient(self) def update(self): for scope_t in self.assignments: scope = dict(zip(self.valid_dimensions, scope_t)) scope.update(self.selection) input_num = self.arg.outputOrdinalFor(scope) self.assignments[scope_t] = (input_num,) self._update_from_assignments() self.values = [self.arg.values[i] for (i,) in self.uniq] def makeCells(self, input_soup, variable=None): arg = input_soup[id(self.arg)] outputs = [arg[input_num] for (input_num,) in self.uniq] return ([], outputs) class _NonLeafDefn(_Defn): def __init__(self, *args, **kw): _Defn.__init__(self) valid_dimensions = [] for arg in args: assert isinstance(arg, _Defn), type(arg) assert not arg.activated, arg.name for dimension in arg.valid_dimensions: if dimension not in valid_dimensions: valid_dimensions.append(dimension) #print >>sys.stderr, arg.name, '>', valid_dimensions, '>', self.name arg.addClient(self) valid_dimensions.sort() self.valid_dimensions = tuple(valid_dimensions) self.args = args if 'name' in kw: self.name = kw.pop('name') self.setup(**kw) def setup(self): pass def update(self): for scope_t in self.assignments: scope = dict(zip(self.valid_dimensions, scope_t)) input_nums = [arg.outputOrdinalFor(scope) for arg in self.args] self.assignments[scope_t] = tuple(input_nums) self._update_from_assignments() calc = self.makeCalcFunction() self.values = [nullor(self.name, calc, self.recycling)(*[a.values[i] for (i,a) in zip(u, self.args)]) for u in self.uniq] class _LeafDefn(_Defn): """An input to the calculator, ie: a Defn with no inputs itself. This class is incomplete - subclasses provide: makeDefaultSetting() adaptSetting(setting) makeCells(input_soup)""" args = () name = None name_required = True # These can be overriden in a subclass or the constuctor. 
valid_dimensions = () numeric = False array_template = None internal_dimensions = () def __init__(self, name=None, extra_label=None, dimensions=None, independent_by_default=None): _Defn.__init__(self) if dimensions is not None: assert type(dimensions) in [list, tuple], type(dimensions) self.valid_dimensions = tuple(dimensions) if independent_by_default is not None: self.independent_by_default = independent_by_default if name is not None: self.name = name if self.name_required: assert isinstance(self.name, basestring), self.name if extra_label is not None: self.name = self.name + extra_label def getDefaultSetting(self): if (getattr(self, '_default_setting', None) is None or self.independent_by_default): self._default_setting = self.makeDefaultSetting() return self._default_setting def update(self): self._update_from_assignments() gdv = lambda x:x.getDefaultValue() self.values = [nullor(self.name, gdv)(u) for u in self.uniq] def assignAll(self, scope_spec=None, value=None, lower=None, upper=None, const=None, independent=None): settings = [] if const is None: const = self.const_by_default for scope in self.interpretScopes( independent=independent, **(scope_spec or {})): if value is None: s_value = self.getMeanCurrentValue(scope) else: s_value = self.unwrapValue(value) if const: setting = ConstVal(s_value) elif not self.numeric: if lower is not None or upper is not None: raise ValueError( "Non-scalar input '%s' doesn't support bounds" % self.name) setting = Var((None, s_value, None)) else: (s_lower, s_upper) = self.getCurrentBounds(scope) if lower is not None: s_lower = lower if upper is not None: s_upper = upper if s_lower > s_upper: raise ValueError("Bounds: upper < lower") elif (s_lower is not None) and s_value < s_lower: s_value = s_lower warnings.warn("Value of %s increased to keep within bounds" % self.name, stacklevel=3) elif (s_upper is not None) and s_value > s_upper: s_value = s_upper warnings.warn("Value of %s decreased to keep within bounds" % self.name, 
stacklevel=3) setting = Var((s_lower, s_value, s_upper)) self.checkSettingIsValid(setting) settings.append((scope, setting)) for (scope, setting) in settings: for scope_t in scope: assert scope_t in self.assignments, scope_t self.assignments[scope_t] = setting def getMeanCurrentValue(self, scope): values = [self.assignments[s].getDefaultValue() for s in scope] if len(values) == 1: s_value = values[0] else: s_value = sum(values) / len(values) for value in values: if not numpy.all(value==s_value): warnings.warn("Used mean of %s %s values" % (len(values), self.name), stacklevel=4) break return s_value def getCurrentBounds(self, scope): lowest = highest = None for s in scope: (lower, init, upper) = self.assignments[s].getBounds() if upper == lower: continue if lowest is None or lower < lowest: lowest = lower if highest is None or upper > highest: highest = upper if lowest is None or highest is None: # All current settings are consts so use the class defaults (lowest, default, highest) = self.getDefaultSetting().getBounds() return (lowest, highest) def __repr__(self): return "%s(%s)" % (self.__class__.__name__, self._local_repr(col_width=6, max_width=60)) def _local_repr(self, col_width, max_width): template = "%%%s.%sf" % (col_width, (col_width-1)//2) assignments = [] for (i,a) in self.assignments.items(): if a is None: assignments.append('None') elif a.is_constant: if isinstance(a.value, float): assignments.append(template % a.value) else: assignments.append(a.value) else: assignments.append('Var') # %s' % str(i)) return '%-20s%s' % (self.name[:19], _fmtrow(col_width+1, assignments, max_width)) class ParameterController(object): """Holds a set of activated CalculationDefns, including their parameter scopes. 
    Makes calculators on demand."""

    def __init__(self, top_defn):
        # topological sort
        indegree = {id(top_defn): 0}
        Q = [top_defn]
        while Q:
            pd = Q.pop(0)
            for arg in pd.args:
                arg_id = id(arg)
                if arg_id in indegree:
                    indegree[arg_id] += 1
                else:
                    indegree[arg_id] = 1
                    Q.append(arg)
        topdown = []
        Q = [top_defn]
        while Q:
            pd = Q.pop(0)
            topdown.append(pd)
            for arg in pd.args:
                arg_id = id(arg)
                indegree[arg_id] -= 1
                if indegree[arg_id] == 0:
                    Q.append(arg)

        # propagate categories downwards
        top_defn.assignments = {}
        for pd in topdown:
            pd.assignments = {}
            for client in pd.clients:
                scopes = client.getRequiredScopes(pd.valid_dimensions)
                # print pd.valid_dimensions, pd.name, '<', scopes, '<', client.name, client.valid_dimensions
                pd.addScopes(scopes)
            if not pd.assignments:
                pd.addScopes([{}])
            pd.activated = True

        self.defns = topdown[::-1]

        self.defn_for = {}
        for defn in self.defns:
            # if not defn.args:
            #     assert defn.name not in self.defn_for, defn.name
            if defn.name in self.defn_for:
                self.defn_for[defn.name] = None  # duplicate
            else:
                self.defn_for[defn.name] = defn

        self._changed = set()
        self._update_suspended = False
        self.updateIntermediateValues(self.defns)
        self.setupParallelContext()

    def getParamNames(self, scalar_only=False):
        """The names of the numerical inputs to the calculation."""
        return [defn.name for defn in self.defns
                if defn.user_param and (defn.numeric or not scalar_only)]

    def getUsedDimensions(self, par_name):
        return self.defn_for[par_name].usedDimensions()

    def getParamValue(self, par_name, *args, **kw):
        """The value for 'par_name'. Additional arguments specify the scope.
        Despite the name intermediate values can also be retrieved this way."""
        callback = self._makeValueCallback(None, None)
        defn = self.defn_for[par_name]
        posn = defn._getPosnForScope(*args, **kw)
        return callback(defn, posn)

    def getParamInterval(self, par_name, *args, **kw):
        """Confidence interval for 'par_name' found by adjusting the single
        parameter until the final result falls by 'dropoff', which can be
        specified directly or via 'p' as chdtri(1, p). Additional arguments
        are taken to specify the scope."""
        dropoff = kw.pop('dropoff', None)
        p = kw.pop('p', None)
        if dropoff is None and p is None:
            p = 0.05
        callback = self._makeValueCallback(dropoff, p, kw.pop('xtol', None))
        defn = self.defn_for[par_name]
        posn = defn._getPosnForScope(*args, **kw)
        return callback(defn, posn)

    def getFinalResult(self):
        return self.defns[-1].getCurrentValueForScope()

    def getParamValueDict(self, dimensions, p=None, dropoff=None,
            params=None, xtol=None):
        """A dict tree of parameter values, with parameter names as the
        top level keys, and the various dimensions ('edge', 'bin', etc.)
        supplying lower level keys: edge names, bin names etc.
        If 'p' or 'dropoff' is specified returns chi-square intervals
        instead of simple values."""
        callback = self._makeValueCallback(dropoff, p, xtol)
        if params is None:
            params = self.getParamNames(scalar_only=True)
        result = {}
        for param_name in params:
            ev = self.defn_for[param_name]
            ev.fillParValueDict(result, dimensions, callback)
        return result

    def _makeValueCallback(self, dropoff, p, xtol=None):
        """Make a setting -> value function"""
        if p is not None:
            assert dropoff is None, (p, dropoff)
            dropoff = chdtri(1, p) / 2.0
        if dropoff is None:
            def callback(defn, posn):
                return defn.values[posn]
        else:
            assert dropoff > 0, dropoff
            def callback(defn, posn):
                lc = self.makeCalculator(variable=defn.uniq[posn])
                assert len(lc.opt_pars) == 1, lc.opt_pars
                opt_par = lc.opt_pars[0]
                return lc._getCurrentCellInterval(opt_par, dropoff, xtol)
        return callback

    @contextmanager
    def updatesPostponed(self):
        "Temporarily turn off calculation for faster input setting"
        (old, self._update_suspended) = (self._update_suspended, True)
        yield
        self._update_suspended = old
        self._updateIntermediateValues()

    def updateIntermediateValues(self, changed=None):
        if changed is None:
            changed = self.defns  # all
        self._changed.update(id(defn) for defn in changed)
        self._updateIntermediateValues()

    def _updateIntermediateValues(self):
        if self._update_suspended:
            return
        # use topological sort order
        # xxx parallel context check?
        for defn in self.defns:
            if id(defn) in self._changed:
                defn.update()
                for c in defn.clients:
                    self._changed.add(id(c))
        self._changed.clear()

    def assignAll(self, par_name, *args, **kw):
        defn = self.defn_for[par_name]
        if not isinstance(defn, _LeafDefn):
            args = ' and '.join(['"%s"' % a.name for a in defn.args])
            msg = '"%s" is not settable as it is derived from %s.' % (
                    par_name, args)
            raise ValueError(msg)
        defn.assignAll(*args, **kw)
        self.updateIntermediateValues([defn])

    def measureEvalsPerSecond(self, *args, **kw):
        return self.makeCalculator().measureEvalsPerSecond(*args, **kw)

    def setupParallelContext(self, parallel_split=None):
        self.overall_parallel_context = parallel.getContext()
        with parallel.split(parallel_split) as parallel_context:
            parallel_context = parallel_context.getCommunicator()
            self.remaining_parallel_context = parallel.getContext()
            if 'parallel_context' in self.defn_for:
                self.assignAll(
                        'parallel_context', value=parallel_context, const=True)

    def makeCalculator(self, calculatorClass=None, variable=None, **kw):
        cells = []
        input_soup = {}
        for defn in self.defns:
            defn.update()
            (newcells, outputs) = defn.makeCells(input_soup, variable)
            cells.extend(newcells)
            input_soup[id(defn)] = outputs
        if calculatorClass is None:
            calculatorClass = Calculator
        kw['overall_parallel_context'] = self.overall_parallel_context
        kw['remaining_parallel_context'] = self.remaining_parallel_context
        return calculatorClass(cells, input_soup, **kw)

    def updateFromCalculator(self, calc):
        changed = []
        for defn in self.defn_for.values():
            if isinstance(defn, _LeafDefn):
                defn.updateFromCalculator(calc)
                changed.append(defn)
        self.updateIntermediateValues(changed)

    def getNumFreeParams(self):
        return sum(defn.getNumFreeParams() for defn in self.defns
                if isinstance(defn, _LeafDefn))

    def optimise(self, local=None, filename=None, interval=None,
            limit_action='warn', max_evaluations=None, tolerance=1e-6,
            global_tolerance=1e-1, **kw):
        """Find input values that optimise this function.
        'local' controls the choice of optimiser, the default being to run
        both the global and local optimisers. 'filename' and 'interval'
        control checkpointing.
        Unknown keyword arguments get passed on to the optimiser(s)."""
        return_calculator = kw.pop('return_calculator', False)  # only for debug
        for n in ['local', 'filename', 'interval', 'max_evaluations',
                'tolerance', 'global_tolerance']:
            kw[n] = locals()[n]
        lc = self.makeCalculator()
        try:
            lc.optimise(**kw)
        except MaximumEvaluationsReached, detail:
            evals = detail[0]
            err_msg = 'FORCED EXIT from optimiser after %s evaluations' % evals
            if limit_action == 'ignore':
                pass
            elif limit_action == 'warn':
                warnings.warn(err_msg, stacklevel=2)
            else:
                raise ArithmeticError(err_msg)
        finally:
            self.updateFromCalculator(lc)
        if return_calculator:
            return lc

    def graphviz(self, **kw):
        lc = self.makeCalculator()
        return lc.graphviz(**kw)
PyCogent-1.5.3/cogent/recalculation/setting.py000644 000765 000024 00000003742 12024702176 022402 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
"""Instances of these classes are assigned to different parameter/scopes
by a parameter controller"""

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ = "pm67nz@gmail.com"
__status__ = "Production"


class Setting(object):
    pass


class Var(Setting):
    # placeholder for a single optimiser parameter
    is_constant = False

    def __init__(self, bounds=None):
        if bounds is None:
            bounds = (None, None, None)
        else:
            assert len(bounds) == 3, bounds
        (self.lower, self.value, self.upper) = bounds

    def getBounds(self):
        return (self.lower, self.value, self.upper)

    def getDefaultValue(self):
        return self.value

    def __str__(self):
        return "Var"  # short as in table

    def __repr__(self):
        constraints = []
        for (template, bound) in [
                ("%s<", self.lower), ("(%s)", self.value),
                ("<%s", self.upper)]:
            if bound is not None:
                constraints.append(template % bound)
        return "Var(%s)" % " ".join(constraints)


class ConstVal(Setting):
    # Not to be confused with defns.Const. This is like a Var,
    # assigned to a parameter which may have other Var values
    # for other scopes.
    is_constant = True

    # Val interface
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)  # short as in table

    def __repr__(self):
        return "ConstVal(%s)" % repr(self.value)

    # indep useful sometimes!
    # def __eq__(self, other):
    #     return type(self) is type(other) and other.value == self.value

    def getDefaultValue(self):
        return self.value

    def getBounds(self):
        return (None, self.value, None)
PyCogent-1.5.3/cogent/phylo/__init__.py000644 000765 000024 00000000646 12024702176 020772 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python

__all__ = ['consensus', 'distance', 'least_squares', 'maximum_likelihood',
           'nj', 'tree_space', 'util']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Peter Maxwell", "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
PyCogent-1.5.3/cogent/phylo/consensus.py000644 000765 000024 00000010476 12024702176 021255 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
"""This module implements methods for generating consensus trees from a list
of trees"""

from cogent.core.tree import TreeBuilder
from cogent import LoadTree

__author__ = "Matthew Wakefield"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Matthew Wakefield", "Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Matthew Wakefield"
__email__ = "wakefield@wehi.edu.au"
__status__ = "Production"


def majorityRule(trees, strict=False):
    """Determines the consensus tree from a list of rooted trees using the
    majority rules method of Margush and McMorris 1981

    Arguments:
        - trees: A list of cogent.evolve.tree objects
        - strict: A boolean flag for strict majority rule tree construction.
          When True only nodes occurring >50% will be used. When False the
          highest scoring < 50% node will be used; if there is more than one
          node with the same score this will be arbitrarily chosen on sort
          order.

    Returns:
        a list of cogent.evolve.tree objects
    """
    trees = [(1, tree) for tree in trees]
    return weightedMajorityRule(trees, strict, "count")


def weightedMajorityRule(weighted_trees, strict=False, attr="support"):
    cladecounts = {}
    edgelengths = {}
    total = 0
    for (weight, tree) in weighted_trees:
        total += weight
        edges = tree.getEdgeVector()
        for edge in edges:
            tips = edge.getTipNames(includeself=True)
            tips = frozenset(tips)
            if tips not in cladecounts:
                cladecounts[tips] = 0
            cladecounts[tips] += weight
            length = edge.Length and edge.Length * weight
            if edgelengths.get(tips, None):
                edgelengths[tips] += length
            else:
                edgelengths[tips] = length
    cladecounts = [(count, clade) for (clade, count) in cladecounts.items()]
    cladecounts.sort()
    cladecounts.reverse()

    if strict:
        # Remove any with support < 50%
        for index, (count, clade) in enumerate(cladecounts):
            if count <= 0.5 * total:
                cladecounts = cladecounts[:index]
                break

    # Remove conflicts
    accepted_clades = set()
    counts = {}
    for (count, clade) in cladecounts:
        for accepted_clade in accepted_clades:
            if clade.intersection(accepted_clade) and not (
                    clade.issubset(accepted_clade) or
                    clade.issuperset(accepted_clade)):
                break
        else:
            accepted_clades.add(clade)
            counts[clade] = count
            weighted_length = edgelengths[clade]
            edgelengths[clade] = weighted_length and weighted_length / total

    nodes = {}
    queue = []
    tree_build = TreeBuilder().createEdge
    for clade in accepted_clades:
        if len(clade) == 1:
            tip_name = iter(clade).next()
            params = {'length': edgelengths[clade], attr: counts[clade]}
            nodes[tip_name] = tree_build([], tip_name, params)
        else:
            queue.append((len(clade), clade))

    while queue:
        queue.sort()
        (size, clade) = queue.pop(0)
        new_queue = []
        for (size2, ancestor) in queue:
            if clade.issubset(ancestor):
                new_ancestor = (ancestor - clade) | frozenset([clade])
                counts[new_ancestor] = counts.pop(ancestor)
                edgelengths[new_ancestor] = edgelengths.pop(ancestor)
                ancestor = new_ancestor
            new_queue.append((len(ancestor), ancestor))
        children = [nodes.pop(c) for c in clade]
        assert len(children)
        nodes[clade] = tree_build(children, None,
                {attr: counts[clade], 'length': edgelengths[clade]})
        queue = new_queue

    for root in nodes.values():
        root.Name = 'root'  # Yuk
    return [root for root in nodes.values()]


if __name__ == "__main__":
    import sys
    trees = []
    for filename in sys.argv[1:]:
        for tree in open(filename):
            trees.append(LoadTree(treestring=tree))
    print "Consensus of %s trees from %s" % (len(trees), sys.argv[1:])
    outtrees = majorityRule(trees, strict=True)
    for tree in outtrees:
        print tree.asciiArt(compact=True, show_internal=False)
PyCogent-1.5.3/cogent/phylo/distance.py000644 000765 000024 00000031673 12024702176 021031 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
"""Estimating pairwise distances between sequences.
""" from cogent.util import parallel, table, warning, progress_display as UI from cogent.maths.stats.util import Numbers from cogent import LoadSeqs, LoadTree from warnings import warn __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Peter Maxwell", "Matthew Wakefield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class EstimateDistances(object): """Base class used for estimating pairwise distances between sequences. Can also estimate other parameters from pairs.""" def __init__(self, seqs, submodel, threeway=False, motif_probs = None, do_pair_align=False, rigorous_align=False, est_params=None, modify_lf=None): """Arguments: - seqs: an Alignment or SeqCollection instance with > 1 sequence - submodel: substitution model object Predefined models can be imported from cogent.evolve.models - threeway: a boolean flag for using threeway comparisons to estimate distances. default False. Ignored if do_pair_align is True. - do_pair_align: if the input sequences are to be pairwise aligned first and then the distance will be estimated. A pair HMM based on the submodel will be used. - rigorous_align: if True the pairwise alignments are actually numerically optimised, otherwise the current substitution model settings are used. This slows down estimation considerably. - est_params: substitution model parameters to save estimates from in addition to length (distance) - modify_lf: a callback function for that takes a likelihood function (with alignment set) and modifies it. Can be used to configure local_params, set bounds, optimise using a restriction for faster performance. Note: Unless you know a priori your alignment will be flush ended (meaning no sequence has terminal gaps) it is advisable to construct a substitution model that recodes gaps. 
Otherwise the terminal gaps will significantly bias the estimation of branch lengths when using do_pair_align. """ if do_pair_align: self.__threeway = False else: # whether pairwise is to be estimated from 3-way self.__threeway = [threeway, False][do_pair_align] self.__seq_collection = seqs self.__seqnames = seqs.getSeqNames() self.__motif_probs = motif_probs # the following may be pairs or three way combinations self.__combination_aligns = None self._do_pair_align = do_pair_align self._rigorous_align = rigorous_align # substitution model stuff self.__sm = submodel self._modify_lf = modify_lf # store for the results self.__param_ests = {} self.__est_params = list(est_params or []) self.__run = False # a flag indicating whether estimation completed # whether we're on the master CPU or not self._on_master_cpu = parallel.getCommunicator().Get_rank() == 0 def __str__(self): return str(self.getTable()) def __make_pairwise_comparison_sets(self): comps = [] names = self.__seq_collection.getSeqNames() n = len(names) for i in range(0, n - 1): for j in range(i + 1, n): comps.append((names[i], names[j])) return comps def __make_threeway_comparison_sets(self): comps = [] names = self.__seq_collection.getSeqNames() n = len(names) for i in range(0, n - 2): for j in range(i + 1, n - 1): for k in range(j + 1, n): comps.append((names[i], names[j], names[k])) return comps def __make_pair_alignment(self, seqs, opt_kwargs): lf = self.__sm.makeLikelihoodFunction(\ LoadTree(tip_names=seqs.getSeqNames()), aligned=False) lf.setSequences(seqs.NamedSeqs) # allow user to modify the lf config if self._modify_lf: lf = self._modify_lf(lf) if self._rigorous_align: lf.optimise(**opt_kwargs) lnL = lf.getLogLikelihood() (vtLnL, aln) = lnL.edge.getViterbiScoreAndAlignment() return aln @UI.display_wrap def __doset(self, sequence_names, dist_opt_args, aln_opt_args, ui): # slice the alignment seqs = self.__seq_collection.takeSeqs(sequence_names) if self._do_pair_align: ui.display('Aligning', 
progress=0.0, current=.5) align = self.__make_pair_alignment(seqs, aln_opt_args) ui.display('', progress=.5, current=.5) else: align = seqs ui.display('', progress=0.0, current=1.0) # note that we may want to consider removing the redundant gaps # create the tree object tree = LoadTree(tip_names = sequence_names) # make the parameter controller lf = self.__sm.makeLikelihoodFunction(tree) if not self.__threeway: lf.setParamRule('length', is_independent = False) if self.__motif_probs: lf.setMotifProbs(self.__motif_probs) lf.setAlignment(align) # allow user modification of lf using the modify_lf if self._modify_lf: lf = self._modify_lf(lf) lf.optimise(**dist_opt_args) # get the statistics stats_dict = lf.getParamValueDict(['edge'], params=['length'] + self.__est_params) # if two-way, grab first distance only if not self.__threeway: result = {'length': stats_dict['length'].values()[0] * 2.0} else: result = {'length': stats_dict['length']} # include any other params requested for param in self.__est_params: result[param] = stats_dict[param].values()[0] return result @UI.display_wrap def run(self, dist_opt_args=None, aln_opt_args=None, ui=None, **kwargs): """Start estimating the distances between sequences. Distance estimation is done using the Powell local optimiser. This can be changed using the dist_opt_args and aln_opt_args. Arguments: - show_progress: whether to display progress. More detailed progress information from individual optimisation is controlled by the ..opt_args. 
- dist_opt_args, aln_opt_args: arguments for the optimise method for the distance estimation and alignment estimation respectively.""" if 'local' in kwargs: warn("local argument ignored, provide it to dist_opt_args or"\ " aln_opt_args", DeprecationWarning, stacklevel=2) ui.display("Distances") dist_opt_args = dist_opt_args or {} aln_opt_args = aln_opt_args or {} # set the optimiser defaults dist_opt_args['local'] = dist_opt_args.get('local', True) aln_opt_args['local'] = aln_opt_args.get('local', True) # generate the list of unique sequence sets (pairs or triples) to be # analysed if self.__threeway: combination_aligns = self.__make_threeway_comparison_sets() desc = "triplet " else: combination_aligns = self.__make_pairwise_comparison_sets() desc = "pair " labels = [desc + ','.join(names) for names in combination_aligns] def _one_alignment(comp): result = self.__doset(comp, dist_opt_args, aln_opt_args) return (comp, result) for (comp, value) in ui.imap(_one_alignment, combination_aligns, labels=labels): self.__param_ests[comp] = value def getPairwiseParam(self, param, summary_function="mean"): """Return the pairwise statistic estimates as a dictionary keyed by (seq1, seq2) Arguments: - param: name of a parameter in est_params or 'length' - summary_function: a string naming the function used for estimating param from threeway distances. 
Valid values are 'mean' (default) and 'median'.""" summary_func = summary_function.capitalize() pairwise_stats = {} assert param in self.__est_params + ['length'], \ "unrecognised param %s" % param if self.__threeway and param == 'length': pairwise = self.__make_pairwise_comparison_sets() # get all the distances involving this pair for a, b in pairwise: values = Numbers() for comp_names, param_vals in self.__param_ests.items(): if a in comp_names and b in comp_names: values.append(param_vals[param][a] + \ param_vals[param][b]) pairwise_stats[(a,b)] = getattr(values, summary_func) else: # no additional processing of the distances is required for comp_names, param_vals in self.__param_ests.items(): pairwise_stats[comp_names] = param_vals[param] return pairwise_stats def getPairwiseDistances(self,summary_function="mean", **kwargs): """Return the pairwise distances as a dictionary keyed by (seq1, seq2). Convenience interface to getPairwiseParam. Arguments: - summary_function: a string naming the function used for estimating param from threeway distances. Valid values are 'mean' (default) and 'median'. """ return self.getPairwiseParam('length',summary_function=summary_function, **kwargs) def getParamValues(self, param, **kwargs): """Returns a Numbers object with all estimated values of param. Arguments: - param: name of a parameter in est_params or 'length' - **kwargs: arguments passed to getPairwiseParam""" ests = self.getPairwiseParam(param, **kwargs) return Numbers(ests.values()) def getTable(self,summary_function="mean", **kwargs): """returns a Table instance of the distance matrix. Arguments: - summary_function: a string naming the function used for estimating param from threeway distances. 
Valid values are 'mean' (default) and 'median'.""" d = \ self.getPairwiseDistances(summary_function=summary_function,**kwargs) if not d: d = {} for s1 in self.__seqnames: for s2 in self.__seqnames: if s1 == s2: continue else: d[(s1,s2)] = 'Not Done' twoD = [] for s1 in self.__seqnames: row = [s1] for s2 in self.__seqnames: if s1 == s2: row.append('') continue try: row.append(d[(s1,s2)]) except KeyError: row.append(d[(s2,s1)]) twoD.append(row) T = table.Table(['Seq1 \ Seq2'] + self.__seqnames, twoD, row_ids = True, missing_data = "*") return T def getNewickTrees(self): """Returns a list of Newick format trees for supertree methods.""" trees = [] for comp_names, param_vals in self.__param_ests.items(): tips = [] for name in comp_names: tips.append(repr(name)+":%s" % param_vals[name]) trees.append("("+",".join(tips)+");") return trees def writeToFile(self, filename, summary_function="mean", format='phylip', **kwargs): """Save the pairwise distances to a file using phylip format. Other formats can be obtained by getting to a Table. If running in parallel, the master CPU writes out. Arguments: - filename: where distances will be written, required. - summary_function: a string naming the function used for estimating param from threeway distances. Valid values are 'mean' (default) and 'median'. 
- format: output format of distance matrix """ if self._on_master_cpu: # only write output from 0th node table = self.getTable(summary_function=summary_function, **kwargs) table.writeToFile(filename, format=format) PyCogent-1.5.3/cogent/phylo/least_squares.py000644 000765 000024 00000006235 12024702176 022106 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import numpy from numpy.linalg import solve as solve_linear_equations from tree_space import TreeEvaluator, ancestry2tree from util import distanceDictAndNamesTo1D, distanceDictTo1D, triangularOrder __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" # This is a fairly slow implementation and NOT suitable for large trees. # Trees are represented as "ancestry" matricies in which A[i,j] iff j is an # ancestor of i. For the LS calculations the ancestry matrix is converted # to a "paths" matrix or "split metric" in which S[p,j] iff the path between # the pth pair of tips passes through edge j. def _ancestry2paths(A): """Convert edge x edge ancestry matrix to tip-to-tip path x edge split metric matrix. The paths will be in the same triangular matrix order as produced by distanceDictAndNamesTo1D, provided that the tips appear in the correct order in A""" tips = [i for i in range(A.shape[0]) if sum(A[:,i])==1] paths = [] for (tip1, tip2) in triangularOrder(tips): path = A[tip1] ^ A[tip2] paths.append(path) return numpy.array(paths) class WLS(TreeEvaluator): """(err, best_tree) = WLS(dists).trex()""" def __init__(self, dists, weights = None): """Arguments: - dists: a dict with structure (seq1, seq2): distance - weights: an equivalently structured dict with measurements of variability of the distance estimates. 
By default, the sqrt of distance is used.""" self.dists = dists self.weights = weights or \ dict((key, 1.0/(dists[key]**2)) for key in dists) (self.names, dists) = distanceDictTo1D(self.dists) def makeTreeScorer(self, names): dists = distanceDictAndNamesTo1D(self.dists, names) weights = distanceDictAndNamesTo1D(self.weights, names) # dists and weights are 1D forms of triangular tip x tip matrices # The order of the tip-to-tip paths is the same for dists, weights and A weights_dists = weights * dists def evaluate(ancestry, lengths=None, sum=sum, _ancestry2paths=_ancestry2paths, dot=numpy.dot, maximum=numpy.maximum, transpose=numpy.transpose, solve=solve_linear_equations): A = _ancestry2paths(ancestry) if lengths is None: At = transpose(A) X = dot(weights * At, A) y = dot(At, weights_dists) lengths = solve(X, y) lengths = maximum(lengths, 0.0) diffs = dot(A, lengths) - dists err = sum(diffs**2) return (err, lengths) return evaluate def result2output(self, err, ancestry, lengths, names): return (err, ancestry2tree(ancestry, lengths, names)) def wls(*args, **kw): (err, tree) = WLS(*args).trex(**kw) return tree PyCogent-1.5.3/cogent/phylo/maximum_likelihood.py000644 000765 000024 00000005104 12024702176 023105 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python' from tree_space import TreeEvaluator, ancestry2tree from least_squares import WLS from math import exp from tree_collection import LogLikelihoodScoredTreeCollection from tree_collection import LoadTrees # only for back compat. __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class ML(TreeEvaluator): """(err, best_tree) = ML(model, alignment, [dists]).trex() 'model' can be a substitution model or a likelihood function factory equivalent to SubstitutionModel.makeLikelihoodFunction(tree). 
    If 'dists' is provided uses WLS to get initial values for lengths"""

    def __init__(self, model, alignment, dists=None, opt_args={}):
        self.opt_args = opt_args
        self.names = alignment.getSeqNames()
        self.alignment = alignment
        if hasattr(model, 'makeLikelihoodFunction'):
            self.lf_factory = lambda tree: model.makeLikelihoodFunction(tree)
        else:
            self.lf_factory = model
        if dists:
            self.wlsMakeTreeScorer = WLS(dists).makeTreeScorer
        else:
            fake_wls = lambda a: (None, None)
            self.wlsMakeTreeScorer = lambda n: fake_wls

    def evaluateTree(self, tree):
        names = tree.getTipNames()
        subalign = self.alignment.takeSeqs(names)
        lf = self.lf_factory(tree)
        lf.setAlignment(subalign)
        return lf.getLogLikelihood()

    def makeTreeScorer(self, names):
        subalign = self.alignment.takeSeqs(names)
        wls_eval = self.wlsMakeTreeScorer(names)
        def evaluate(ancestry, lengths=None):
            if lengths is None:
                (wls_err, init_lengths) = wls_eval(ancestry)
            else:
                init_lengths = lengths
            tree = ancestry2tree(ancestry, init_lengths, names)
            lf = self.lf_factory(tree)
            lf.setAlignment(subalign)
            if lengths is not None:
                lf.setParamRule('length', is_constant=True)
            lf.optimise(show_progress=False, **self.opt_args)
            err = -1.0 * lf.getLogLikelihood()
            tree = lf.getAnnotatedTree()
            return (err, tree)
        return evaluate

    def result2output(self, err, ancestry, annotated_tree, names):
        return (-1.0*err, annotated_tree)

    def results2output(self, results):
        return LogLikelihoodScoredTreeCollection(results)

PyCogent-1.5.3/cogent/phylo/nj.py

#!/usr/bin/env python
"""Generalised Neighbour Joining phylogenetic tree estimation.

By default negative branch lengths are reset to 0.0 during the
calculations.

This is based on the algorithm of Studier and Keppler, as described in the
book Biological sequence analysis by Durbin et al.

Generalised as described by Pearson, Robins & Zhang, 1999.
""" from __future__ import division import numpy from cogent.core.tree import TreeBuilder from cogent.phylo.tree_collection import ScoredTreeCollection from cogent.phylo.util import distanceDictTo2D from cogent.util import progress_display as UI from collections import deque __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class LightweightTreeTip(str): def convert(self, constructor, length): node = constructor([], str(self), {}) node.Length = max(0.0, length) return node class LightweightTreeNode(frozenset): """Set of (length, child node) tuples""" def convert(self, constructor=None, length=None): if constructor is None: constructor = TreeBuilder().createEdge children = [child.convert(constructor, clength) for (clength, child) in self] node = constructor(children, None, {}) if length is not None: node.Length = max(0.0, length) return node def __or__(self, other): return type(self)(frozenset.__or__(self, other)) class PartialTree(object): """A candidate tree stored as (distance matrix, list of subtrees, list of tip sets, set of partitions, score). At each iteration (ie: call of the join method) the number of subtrees is reduced as 2 of them are joined, while the number of partitions is increased as a new edge is introduced. 
""" def __init__(self, d, nodes, tips, score): self.d = d self.nodes = nodes self.tips = tips self.score = score def getDistSavedJoinScoreMatrix(self): d = self.d L = len(d) r = numpy.sum(d, 0) Q = d - numpy.add.outer(r, r)/(L-2.0) return Q/2.0 + sum(r)/(L-2.0)/2 + self.score def join(self, i, j): tips = self.tips[:] new_tip_set = tips[i] | tips[j] nodes = self.nodes[:] d = self.d.copy() # Branch lengths from i and j to new node L = len(nodes) r = numpy.sum(d, axis=0) ij_dist_diff = (r[i]-r[j]) / (L-2.0) left_length = 0.5 * (d[i,j] + ij_dist_diff) right_length = 0.5 * (d[i,j] - ij_dist_diff) score = self.score + d[i,j] left_length = max(0.0, left_length) right_length = max(0.0, right_length) # Join i and k to make new node new_node = LightweightTreeNode( [(left_length, nodes[i]), (right_length, nodes[j])]) # Store new node at i new_dists = 0.5 * (d[i] + d[j] - d[i,j]) d[:, i] = new_dists d[i, :] = new_dists d[i, i] = 0.0 nodes[i] = new_node tips[i] = new_tip_set # Eliminate j d[j, :] = d[L-1, :] d[:, j] = d[:, L-1] assert d[j, j] == 0.0, d d = d[0:L-1, 0:L-1] nodes[j] = nodes[L-1] nodes.pop() tips[j] = tips[L-1] tips.pop() return type(self)(d, nodes, tips, score) def asScoreTreeTuple(self): assert len(self.nodes) == 3 # otherwise next line needs generalizing lengths = numpy.sum(self.d, axis=0) - numpy.sum(self.d)/4 root = LightweightTreeNode(zip(lengths, self.nodes)) tree = root.convert() tree.Name = "root" return (self.score + sum(lengths), tree) class Pair(object): """A candidate neighbour join, not turned into an actual PartialTree until and unless we decide to use it, because calculating just the topology is faster than calculating the whole new distance matrix etc. 
as well.""" __slots__ = ['tree', 'i', 'j', 'topology', 'new_partition'] def __init__(self, tree, i, j, topology, new_partition): self.tree = tree self.i = i self.j = j self.topology = topology self.new_partition = new_partition def joined(self): new_tree = self.tree.join(self.i,self.j) new_tree.topology = self.topology return new_tree def uniq_neighbour_joins(trees, encode_partition): """Generate all joinable pairs from all trees, best first, filtering out any duplicates""" L = len(trees[0].nodes) scores = numpy.zeros([len(trees), L, L]) for (k, tree) in enumerate(trees): scores[k] = tree.getDistSavedJoinScoreMatrix() topologies = set() order = numpy.argsort(scores.flat) for index in order: (k, ij) = divmod(index, L*L) (i, j) = divmod(ij, L) if i == j: continue tree = trees[k] new_tip_set = tree.tips[i] | tree.tips[j] new_partition = encode_partition(new_tip_set) # check is new topology topology = tree.topology | frozenset([new_partition]) if topology in topologies: continue yield Pair(tree, i, j, topology, new_partition) topologies.add(topology) @UI.display_wrap def gnj(dists, keep=None, dkeep=0, ui=None): """Arguments: - dists: dict of (name1, name2): distance - keep: number of best partial trees to keep at each iteration, and therefore to return. Same as Q parameter in original GNJ paper. - dkeep: number of diverse partial trees to keep at each iteration, and therefore to return. Same as D parameter in original GNJ paper. Result: - a sorted list of (tree length, tree) tuples """ (names, d) = distanceDictTo2D(dists) if keep is None: keep = len(names) * 5 all_keep = keep + dkeep # For recognising duplicate topologies, encode partitions (ie: edges) as # frozensets of tip names, which should be quickly comparable. arbitrary_anchor = names[0] all_tips = frozenset(names) def encode_partition(tips): included = frozenset(tips) if arbitrary_anchor not in included: included = all_tips - included return included # could also convert to long int, or cache, would be faster? 
    tips = [frozenset([n]) for n in names]
    nodes = [LightweightTreeTip(name) for name in names]
    star_tree = PartialTree(d, nodes, tips, 0.0)
    star_tree.topology = frozenset([])
    trees = [star_tree]

    # Progress display auxiliary code
    template = ' size %%s/%s trees %%%si' % (len(names), len(str(all_keep)))
    total_work = 0
    max_candidates = 1
    total_work_before = {}
    for L in range(len(names), 3, -1):
        total_work_before[L] = total_work
        max_candidates = min(all_keep, max_candidates*L*(L-1)//2)
        total_work += max_candidates

    def _show_progress():
        t = len(next_trees)
        work_done = total_work_before[L] + t
        ui.display(msg=template % (L, t), progress=work_done/total_work)

    for L in range(len(names), 3, -1):
        # Generator of candidate joins, best first.
        # Note that with dkeep>0 this generator is used up a bit at a time
        # by 2 different interrupted 'for' loops below.
        candidates = uniq_neighbour_joins(trees, encode_partition)

        # First take up to 'keep' best ones
        next_trees = []
        _show_progress()
        for pair in candidates:
            next_trees.append(pair)
            if len(next_trees) == keep:
                break
        _show_progress()

        # The very best one is used as an anchor for measuring the
        # topological distance to others
        best_topology = next_trees[0].topology
        prior_td = [len(best_topology ^ tree.topology) for tree in trees]

        # Maintain a separate queue of joins for each possible
        # topological distance
        max_td = (max(prior_td) + 1) // 2
        queue = [deque() for g in range(max_td+1)]
        queued = 0

        # Now take up to dkeep joins, an equal number of the best at each
        # topological distance, while not calculating any more TDs than
        # necessary.
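The partition encoding used for duplicate detection normalises each split to the side containing an arbitrary anchor tip, so the two equivalent orientations of a split map to the same frozenset. A standalone sketch mirroring encode_partition above (tip names are illustrative):

```python
names = ['a', 'b', 'c', 'd']
all_tips = frozenset(names)
anchor = names[0]

def encode_partition(tips):
    """Normalise a split to the side containing the anchor tip."""
    included = frozenset(tips)
    if anchor not in included:
        included = all_tips - included
    return included

# The split {a,b} | {c,d} encodes identically from either side.
print(encode_partition({'c', 'd'}) == encode_partition({'a', 'b'}))  # True
```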
        prior_td = dict(zip(map(id, trees), prior_td))
        target_td = 1
        while (candidates or queued) and len(next_trees) < all_keep:
            if candidates and not queue[target_td]:
                for pair in candidates:
                    diff = pair.new_partition not in best_topology
                    td = (prior_td[id(pair.tree)] + [-1,+1][diff]) // 2
                    # equiv, slower: td = len(best_topology ^ topology) // 2
                    queue[td].append(pair)
                    queued += 1
                    if td == target_td:
                        break
                else:
                    candidates = None
            if queue[target_td]:
                next_trees.append(queue[target_td].popleft())
                queued -= 1
                _show_progress()
            target_td = target_td % max_td + 1

        trees = [pair.joined() for pair in next_trees]

    result = [tree.asScoreTreeTuple() for tree in trees]
    result.sort()
    return ScoredTreeCollection(result)


def nj(dists, no_negatives=True):
    """Arguments:
        - dists: dict of (name1, name2): distance
        - no_negatives: negative branch lengths will be set to 0
    """
    assert no_negatives, "no_negatives=False is deprecated"
    (result,) = gnj(dists, keep=1)
    (score, tree) = result
    return tree

PyCogent-1.5.3/cogent/phylo/tree_collection.py

from numpy import exp
import consensus

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell"]
__license__ = "GPL"
__version__ = "1.5.3"


class _UserList(list):
    def __getitem__(self, index):
        # Helpful to keep type after truncation like [self[:10]],
        # but not self[0] or self[x,y,-1]
        result = list.__getitem__(self, index)
        if isinstance(index, slice) and index.step is None:
            result = type(self)(result)
        return result


class ScoredTreeCollection(_UserList):
    """An ordered list of (score, tree) tuples"""
    def writeToFile(self, filename):
        f = open(filename, 'w')
        for (score, tree) in self:
            f.writelines(self.scoredTreeFormat(
                    tree.getNewick(with_distances=True), str(score)))
        f.close()

    def scoredTreeFormat(self, tree, score):
        return [tree, '\t[', score, ']\n']

    def getConsensusTree(self, strict=None):
        ctrees = \
                self.getConsensusTrees(strict)
        assert len(ctrees) == 1, len(ctrees)
        return ctrees[0]

    def getConsensusTrees(self, strict=True):
        if strict is None:
            strict = True
        return consensus.weightedMajorityRule(self, strict)


class UsefullyScoredTreeCollection(ScoredTreeCollection):
    def scoredTreeFormat(self, tree, score):
        return [score, '\t', tree, '\n']


class WeightedTreeCollection(UsefullyScoredTreeCollection):
    """An ordered list of (weight, tree) tuples"""
    def getConsensusTrees(self, strict=False):
        if strict is None:
            strict = False
        return consensus.weightedMajorityRule(self, strict)


class LogLikelihoodScoredTreeCollection(UsefullyScoredTreeCollection):
    """An ordered list of (log likelihood, tree) tuples"""
    def __init__(self, trees):
        list.__init__(self, trees)
        # Quick and very dirty check of order
        assert self[0][0] >= self[-1][0]

    def getConsensusTrees(self, cutoff=None, strict=False):
        return self.getWeightedTrees(cutoff).getConsensusTrees(strict)

    def getWeightedTrees(self, cutoff=None):
        if cutoff is None:
            cutoff = 0.99
        assert 0 <= cutoff <= 1.0
        max_lnL = self[0][0]
        weights = [exp(lnL-max_lnL) for (lnL, t) in self]
        # add from smallest end to avoid rounding errors
        weights.reverse()
        tail = (1.0-cutoff) * sum(weights)
        dropped = 0.0
        for (index, weight) in enumerate(weights):
            dropped += weight
            if dropped > tail:
                weights = weights[index:]
                break
        denominator = sum(weights)
        weights.reverse()
        return WeightedTreeCollection((weight/denominator, tree)
                for (weight, (lnL, tree)) in zip(weights, self))


def LoadTrees(filename):
    """Parse a file of (score, tree) lines.
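The weighting step in getWeightedTrees converts log likelihoods to relative weights via exp(lnL - max_lnL), which avoids underflow because the differences, not the raw log likelihoods, are exponentiated. A standalone sketch with illustrative scores:

```python
from math import exp

# Log likelihoods sorted best-first, as the collection requires.
lnLs = [-1000.0, -1001.0, -1003.0]
max_lnL = lnLs[0]

# Exponentiating the *difference* keeps the values in a safe range;
# exp(-1000) alone would underflow to 0.0.
raw = [exp(x - max_lnL) for x in lnLs]
total = sum(raw)
weights = [w / total for w in raw]
print(round(weights[0], 3))  # 0.705
```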
    Scores can be positive probabilities or negative log likelihoods."""
    from cogent import LoadTree
    infile = open(filename, 'r')
    trees = []
    klass = list  # expect score, tree
    for line in infile:
        line = line.split(None, 1)
        lnL = float(line[0])
        if lnL > 1:
            raise ValueError('likelihoods expected, not %s' % lnL)
        elif lnL > 0:
            assert klass in [list, WeightedTreeCollection]
            klass = WeightedTreeCollection
        else:
            assert klass in [list, LogLikelihoodScoredTreeCollection]
            klass = LogLikelihoodScoredTreeCollection
        tree = LoadTree(treestring=line[1])
        trees.append((lnL, tree))
    trees.sort(reverse=True)
    return klass(trees)

PyCogent-1.5.3/cogent/phylo/tree_space.py

#!/usr/bin/env python
from __future__ import division

import numpy
import itertools
from cogent.core.tree import TreeBuilder
from cogent.phylo.tree_collection import ScoredTreeCollection
from cogent.util import parallel, checkpointing, progress_display as UI

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ = "pm67nz@gmail.com"
__status__ = "Production"


def ismallest(data, size):
    """There are many ways to get the k smallest items from an N sequence,
    and which one performs best depends on k, N and k/N.  This algorithm
    appears to beat anything heapq can do, and stays within a factor of 2
    of sort() and min().  It uses memory O(2*k) and so is particularly
    suitable for lazy application to large N.  It returns the smallest k
    sorted too."""
    limit = 2 * size
    data = iter(data)
    best = list(itertools.islice(data, limit))
    while True:
        best.sort()
        if len(best) <= size:
            break
        del best[size:]
        worst_of_best = best[-1]
        for item in data:
            if item < worst_of_best:
                best.append(item)
                if len(best) > limit:
                    break
    return best

# Trees are represented as "ancestry" matrices in which A[i,j] iff j is an
# ancestor of i.
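The bounded k-smallest selection above keeps at most 2*k candidates and prunes back to k whenever the buffer fills, so it works lazily over large iterators. A standalone copy demonstrates the behaviour:

```python
import itertools

def ismallest(data, size):
    """Keep at most 2*size candidates, pruning back to size each time
    the buffer fills; the result is the size smallest items, sorted."""
    limit = 2 * size
    data = iter(data)
    best = list(itertools.islice(data, limit))
    while True:
        best.sort()
        if len(best) <= size:
            break
        del best[size:]
        worst_of_best = best[-1]
        for item in data:
            if item < worst_of_best:
                best.append(item)
                if len(best) > limit:
                    break
    return best

print(ismallest([9, 4, 7, 1, 8, 2, 6, 3, 5], 3))  # [1, 2, 3]
```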
# For LS calculations the ancestry matrix is converted to a "paths" matrix
# or "split metric" in which S[p,j] iff the path between the pth pair of
# tips passes through edge j.  For ML calculations the ancestry matrix is
# converted back into an ordinary cogent tree object.

def tree2ancestry(tree, order=None):
    nodes = tree.unrooted().getEdgeVector()[:-1]
    if order is not None:
        lookup = dict([(k,i) for (i,k) in enumerate(order)])
        def _ordered_tips_first(n):
            if n.Children:
                return len(order)
            else:
                return lookup[n.Name]
        nodes.sort(key=_ordered_tips_first)
    n = len(nodes)
    A = numpy.zeros([n, n], int)
    seen = {}
    for (i, node) in enumerate(nodes):
        A[i, i] = 1
        seen[id(node)] = i
        for c in node.Children:
            A[:,i] |= A[:,seen[id(c)]]
    names = [n.Name for n in nodes if not n.Children]
    lengths = [n.Length for n in nodes]
    return (A, names, lengths)


def ancestry2tree(A, lengths, tip_names):
    """Convert edge x edge ancestry matrix to a cogent Tree object"""
    tips = {}
    tip = 0
    for i in range(len(A)):
        if numpy.sum(A[:,i]) == 1:
            tips[i] = tip_names[tip]
            tip += 1
    assert tip == len(tip_names)
    constructor = TreeBuilder().createEdge
    free = {}
    for i in numpy.argsort(numpy.sum(A, axis=0)):
        children = [j for j in range(len(A)) if A[j, i] and j != i]
        child_nodes = [free.pop(j) for j in children if j in free]
        if child_nodes:
            name = None
        else:
            name = tips[i]
        if lengths is None:
            params = {}
        else:
            params = {'length': lengths[i]}
        node = constructor(child_nodes, name, params)
        free[i] = node
    return constructor(free.values(), 'root', {})


def grown(B, split_edge):
    """Ancestry matrix 'B' with one extra leaf added at 'split_edge'.
    Row/column order within the matrix is independent of the topology it
    represents.
    The added leaf will be the last one in the matrix, which keeps the
    leaf node order the same as the order in which they are added, which
    is what is assumed by ancestry2tree and ancestry2paths"""
    n = len(B)
    A = numpy.zeros([n+2, n+2], int)
    A[:n, :n] = B
    (sibling, parent) = (n, n + 1)
    A[sibling] = A[parent] = A[split_edge]
    A[:,parent] = A[:,split_edge]
    A[sibling,split_edge] = 0
    A[parent, split_edge] = 0
    A[sibling,sibling] = 1
    A[parent,parent] = 1
    A[sibling,parent] = 1
    A[split_edge,parent] = 1
    return A


class TreeEvaluator(object):
    """Subclass must provide makeTreeScorer and result2output"""

    def results2output(self, results):
        return ScoredTreeCollection(results)

    def evaluateTopology(self, tree):
        """Optimal (score, tree) for the one topology 'tree'"""
        (ancestry, names, lengths) = tree2ancestry(tree)
        evaluate = self.makeTreeScorer(names)
        (err, lengths) = evaluate(ancestry)
        return self.result2output(err, ancestry, lengths, names)

    def evaluateTree(self, tree):
        """score for 'tree' with lengths as-is"""
        (ancestry, names, lengths) = tree2ancestry(tree)
        evaluate = self.makeTreeScorer(names)
        (err, result) = evaluate(ancestry, lengths=lengths)
        return err

    def _consistentNameOrder(self, fixed_names, ordered_names=None):
        """fixed_names followed by ordered_names without duplicates"""
        all_names = set(self.names)
        fixed_names_set = set(fixed_names)
        assert fixed_names_set.issubset(all_names)
        if ordered_names:
            assert set(ordered_names).issubset(all_names)
        else:
            ordered_names = self.names
        names = list(fixed_names) + [n for n in ordered_names
                if n not in fixed_names_set]
        return names

    @UI.display_wrap
    def trex(self, a=8, k=1000, start=None, order=None, return_all=False,
            filename=None, interval=None, ui=None):
        """TrexML policy for tree sampling - all trees up to size 'a' and
        then keep no more than 'k' best trees at each tree size.
        'order' is an optional list of tip names.
        'start' is an optional list of initial trees.  Each of the trees
        must contain the same tips.
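The grown() step can be exercised standalone: starting from the 3-tip star tree (the identity ancestry matrix), attaching a leaf adds two edges, and the tips remain identifiable as the columns with a single ancestor.

```python
import numpy

def grown(B, split_edge):
    """Standalone copy of the helper above: return ancestry matrix B
    with one extra leaf (and its parent edge) attached at split_edge."""
    n = len(B)
    A = numpy.zeros([n+2, n+2], int)
    A[:n, :n] = B
    sibling, parent = n, n + 1
    A[sibling] = A[parent] = A[split_edge]
    A[:, parent] = A[:, split_edge]
    A[sibling, split_edge] = 0
    A[parent, split_edge] = 0
    A[sibling, sibling] = 1
    A[parent, parent] = 1
    A[sibling, parent] = 1
    A[split_edge, parent] = 1
    return A

A3 = numpy.identity(3, int)   # unrooted 3-tip star tree
A4 = grown(A3, 0)             # attach a 4th leaf on edge 0
tips = [i for i in range(len(A4)) if A4[:, i].sum() == 1]
print(len(A4), tips)          # 5 [0, 1, 2, 3]
```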
        'filename' and 'interval' control checkpointing.

        Advanced step-wise addition algorithm
        M. J. Wolf, S. Easteal, M. Kahn, B. D. McKay, and L. S. Jermiin.
        Trexml: a maximum-likelihood approach for extensive tree-space
        exploration.  Bioinformatics, 16(4):383-94, 2000."""
        checkpointer = checkpointing.Checkpointer(filename, interval)
        if checkpointer.available():
            (init_tree_size, fixed_names, trees) = checkpointer.load()
            names = self._consistentNameOrder(fixed_names, order)
        elif start is not None:
            if not isinstance(start, list):
                start = [start]
            fixed_names = start[0].getTipNames()
            names = self._consistentNameOrder(fixed_names, order)
            trees = []
            for tree in start:
                # check the start tree represents a subset of tips
                assert set(tree.getTipNames()) < set(self.names), \
                    "Starting tree names not a subset of the sequence names"
                (ancestry, fixed_names2, lengths) = tree2ancestry(
                        tree, order=fixed_names)
                assert fixed_names2 == fixed_names
                trees.append((None, None, ancestry))
            init_tree_size = len(fixed_names)
        else:
            trees = [(None, None, numpy.identity(3, int))]
            names = self._consistentNameOrder([], order)
            init_tree_size = 3

        tree_size = len(names)
        assert tree_size > 3
        if a > tree_size:
            a = tree_size
        if a < 4:
            a = 4

        # All trees of size a-1, no need to compare them
        for n in range(init_tree_size+1, a):
            trees2 = []
            for (err2, lengths2, ancestry) in trees:
                for split_edge in range(len(ancestry)):
                    ancestry2 = grown(ancestry, split_edge)
                    trees2.append((None, None, ancestry2))
            trees = trees2
            init_tree_size = n

        # Pre calculate how much work is to be done, for progress display
        tree_count = len(trees)
        total_work = 0
        work_done = [0] * (init_tree_size+1)
        for n in range(init_tree_size+1, tree_size+1):
            evals = tree_count * (n*2-5)
            total_work += evals * n
            tree_count = min(k, evals)
            work_done.append(total_work)

        # For each tree size, grow at each edge of each tree.  Keep best k.
        for n in range(init_tree_size+1, tree_size+1):
            evaluate = self.makeTreeScorer(names[:n])
            def grown_tree(spec):
                (tree_ordinal, tree, split_edge) = spec
                (old_err, old_lengths, old_ancestry) = tree
                ancestry = grown(old_ancestry, split_edge)
                (err, lengths) = evaluate(ancestry)
                return (err, tree_ordinal, split_edge, lengths, ancestry)
            specs = [(i, tree, edge)
                    for (i, tree) in enumerate(trees)
                    for edge in range(n*2-5)]
            candidates = ui.imap(grown_tree, specs,
                    noun=('%s leaf tree' % n),
                    start=work_done[n-1]/total_work,
                    end=work_done[n]/total_work)
            best = ismallest(candidates, k)
            trees = [(err, lengths, ancestry)
                    for (err, parent_ordinal, split_edge, lengths, ancestry)
                    in best]
            checkpointer.record((n, names[:n], trees))

        results = (self.result2output(err, ancestry, lengths, names)
                for (err, lengths, ancestry) in trees)
        if return_all:
            result = self.results2output(results)
        else:
            result = results.next()
        return result

PyCogent-1.5.3/cogent/phylo/util.py

#!/usr/bin/env python
import numpy

Float = numpy.core.numerictypes.sctype2char(float)

# Distance matrices are presently represented as simple dictionaries,
# which need to be converted into numpy arrays before being fed into
# phylogenetic reconstruction algorithms.
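The dict-to-array conversion described in the comment above can be sketched standalone: names are collected in first-seen order, and each off-diagonal cell is looked up under either key orientation (a simplified version of distanceDictTo2D, without the consistency checking; the distances are illustrative):

```python
import numpy

# A symmetric distance dict keyed by unordered name pairs.
dists = {('a', 'b'): 3.0, ('c', 'a'): 4.0, ('b', 'c'): 5.0}

# Collect unique names in first-seen order.
names = []
for key in dists:
    for name in key:
        if name not in names:
            names.append(name)

# Fill the dense matrix, accepting either key orientation.
L = len(names)
d = numpy.zeros([L, L], float)
for (i, a) in enumerate(names):
    for (j, b) in enumerate(names):
        if i != j:
            d[i, j] = dists.get((a, b), dists.get((b, a), 0.0))
print(names, d[0, 1])  # ['a', 'b', 'c'] 3.0
```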
__author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "pm67nz@gmail.com" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def namesFromDistanceDict(dists): """Unique names from within the tuples which make up the keys of 'dists'""" names = [] for key in dists: for name in key: if name not in names: names.append(name) return names def lookupSymmetricDict(dists, a, b): """dists[a,b] or dists[b,a], whichever is present, so long as they don't contradict each other""" v1 = dists.get((a, b), None) v2 = dists.get((b, a), None) if v1 is None and v2 is None: raise KeyError((a,b)) elif v1 is None or v2 is None or v1 == v2: return v1 or v2 else: raise ValueError("d[%s,%s] != d[%s,%s]" % (a,b,b,a)) def distanceDictTo2D(dists): """(names, dists). Distances converted into a straightforward distance matrix""" names = namesFromDistanceDict(dists) L = len(names) d = numpy.zeros([L, L], Float) for (i, a) in enumerate(names): for (j, b) in enumerate(names): if i != j: d[i, j] = lookupSymmetricDict(dists, a, b) return (names, d) def triangularOrder(keys): """Indices for extracting a 1D representation of a triangular matrix where j > i and i is the inner dimension: Yields (0,1), (0,2), (1, 2), (0,3), (1,3), (2,3), (0,4)...""" N = len(keys) for j in range(1, N): for i in range(0, j): yield (keys[i], keys[j]) def distanceDictAndNamesTo1D(dists, names): """Distances converted into a triangular matrix implemented as a 1D array where j > i and i is the inner dimension: d[0,1], d[0, 2], d[1, 2], d[0, 3]...""" d = [] for (name_i, name_j) in triangularOrder(names): d.append(lookupSymmetricDict(dists, name_i, name_j)) return numpy.array(d) def distanceDictTo1D(dists): """(names, dists). 
    Distances converted into a triangular matrix implemented as a 1D
    array where j > i and i is the inner dimension:
    d[0,1], d[0,2], d[1,2], d[0,3]..."""
    names = namesFromDistanceDict(dists)
    d = distanceDictAndNamesTo1D(dists, names)
    return (names, d)

PyCogent-1.5.3/cogent/parse/__init__.py

#!/usr/bin/env python
__all__ = ['aaindex', 'agilent_microarray', 'blast', 'bpseq', 'carnac',
           'cigar', 'clustal', 'cmfinder', 'column', 'comrna', 'consan',
           'contrafold', 'cove', 'ct', 'cut', 'cutg', 'dialign',
           'dynalign', 'ebi', 'fasta', 'fastq', 'foldalign', 'gbseq',
           'gcg', 'genbank', 'gff', 'illumina_sequence', 'ilm',
           'knetfold', 'locuslink', 'macsim', 'mage', 'meme', 'mfold',
           'ncbi_taxonomy', 'newick', 'nexus', 'nupack', 'paml',
           'paml_matrix', 'pdb', 'pfold', 'phylip', 'pknotsrg', 'rdb',
           'record', 'record_finder', 'rfam', 'rna_fold', 'rnaalifold',
           'rnaforester', 'rnashapes', 'rnaview', 'sequence', 'sfold',
           'sprinzl', 'table', 'tinyseq', 'tree', 'tree_xml', 'unafold',
           'unigene']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Peter Maxwell", "Rob Knight",
               "Catherine Lozupone", "Jeremy Widmann",
               "Matthew Wakefield", "Sandra Smit", "Greg Caporaso",
               "Zongzhi Liu", "Micah Hamady", "Jason Carnes",
               "Raymond Sammut", "Hua Ying", "Andrew Butterfield",
               "Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

PyCogent-1.5.3/cogent/parse/aaindex.py

#!/usr/bin/env python
"""Parsers for the AAIndex file format.
AAIndex can be downloaded at: http://www.genome.ad.jp/dbget/aaindex.html

There are two main files: AAIndex1 contains linear measures (one number
per amino acid) of amino acid properties, while AAIndex2 contains pairwise
measures (one number per pair of amino acids, e.g. distance or similarity
matrices).
"""
import re
from cogent.parse.record_finder import DelimitedRecordFinder
from string import rstrip
from cogent.maths.matrix.distance import DistanceMatrix

__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "caporaso@colorado.edu"
__status__ = "Production"


class AAIndexParser(object):
    """ Abstract class for AAIndex file parsers

    This file is an abstract class for the parsers of the two AAIndex
    files.  The only real difference between the files is that AAIndex1
    has one additional field, labeled in here as Correlating.
    """

    def __init__(self):
        """ Initialize the object.
""" def __call__(self, infile): """ Parse AAIndex file into dict of AAIndex objects with ID as key infile = file to parse as file object or list of lines Usage: aa1p = AAIndex1Parser() aaIndex1Objects = aa1p('data/AAIndex1') aa2p = AAIndex2Parser() aaIndex2Objects = aa2p('data/AAIndex2') """ result = {} # Break down the file into records delimited by '//' and then # parse each record into AAIndexRecord objects which will be stored # in a dict keyed by the records unique ID string AAIndexRecordFinder = DelimitedRecordFinder('//', constructor=rstrip) # parser is a generator of AAIndexRecords from file parser = AAIndexRecordFinder(infile) for r in parser: new_record = self._parse_record(r) if new_record: yield new_record def _get_field(self, field_identifier, lines): """ Returns the field identified as a one line string """ i = 0 result = '' # Concatenate multi-line data with line_split line_split = ' ' # Run through all lines in the current record while (i < len(lines)): # Check each line to see if it starts with the field # identifier we are looking for if (lines[i].startswith(field_identifier)): # If we find the line we are looking for, include it in # the result, unless it's a Data line. # Data entries are multi-line, and the first is information # that we are not interested in here. 
                if (field_identifier != 'I'):
                    result += lines[i]
                    if field_identifier == 'M':
                        result += 'BRK'
                    # Get rid of the line identifier and leading white space
                    result = result[2:]
                # Move to next line
                i += 1
                # and see if it's a continuation from the above line
                while (i < len(lines) and
                        (lines[i].startswith(' ') or
                        lines[i].startswith(field_identifier))):
                    # if continuation combine the lines while treating the
                    # spaces nicely, ie, multiple spaces -> one space
                    # this is mostly just important for the
                    # lines that are strings such as title
                    result = result.rstrip() + line_split + lines[i].lstrip()
                    i += 1
                break
            i += 1
        # return the field of interest
        return result


class AAIndex1Parser(AAIndexParser):
    """ Parse AAIndex1 file & return it as dict of AAIndex1 objects"""

    def _parse_record(self, lines):
        """ Parse a single record and return it as a AAIndex1Record Object """
        # init all of the fields each time, this is so that
        # if fields are missing they don't get the value from the last
        # record
        id = None
        description = None
        LITDB = None
        authors = None
        title = None
        citations = None
        comments = None
        correlating = {}
        data = [None] * 20

        id = self._get_field('H', lines)
        description = self._get_field('D', lines)
        LITDB = self._get_field('R', lines)
        authors = self._get_field('A', lines)
        title = self._get_field('T', lines)
        citations = self._get_field('J', lines)
        comments = self._get_field('*', lines)
        correlating = self._parse_correlating(self._get_field('C', lines))
        data = self._parse_data(self._get_field('I', lines))

        return AAIndex1Record(id, description, LITDB, authors,
                title, citations, comments, correlating, data)

    def _parse_correlating(self, raw):
        """ Parse Correlating entries from the current record """
        keys = []
        values = []
        raw = raw.lstrip()
        # Split by white space
        data = re.split('\s*', raw)
        i = 0
        while (i


common_header_struct = NamedStruct('>IIQIIHHHB', common_header_fields)


def parse_common_header(sff_file):
    """Parse a Common Header section from a binary SFF file.
    Keys in the resulting dict are identical to those defined in the Roche
    documentation.  As a side effect, sets the position of the file object
    to the end of the Common Header section.
    """
    h = common_header_struct.read_from(sff_file)
    h['flow_chars'] = sff_file.read(h['number_of_flows_per_read'])
    h['key_sequence'] = sff_file.read(h['key_length'])
    seek_pad(sff_file)
    return h


def write_common_header(sff_file, header):
    """Write a common header section to a binary SFF file."""
    header_bytes = common_header_struct.pack(header)
    sff_file.write(header_bytes)
    sff_file.write(header['flow_chars'])
    sff_file.write(header['key_sequence'])
    write_pad(sff_file)


common_header_formats = [
    '  Magic Number:  0x%X\n',
    '  Version:       %04d\n',
    '  Index Offset:  %d\n',
    '  Index Length:  %d\n',
    '  # of Reads:    %d\n',
    '  Header Length: %d\n',
    '  Key Length:    %d\n',
    '  # of Flows:    %d\n',
    '  Flowgram Code: %d\n',
    ]


def format_common_header(header):
    """Format a dictionary representation of an SFF common header as text."""
    out = StringIO()
    out.write('Common Header:\n')
    for key, fmt in zip(common_header_fields, common_header_formats):
        val = header[key]
        out.write(fmt % val)
    out.write('  Flow Chars:    %s\n' % header['flow_chars'])
    out.write('  Key Sequence:  %s\n' % header['key_sequence'])
    return out.getvalue()


class UnsupportedSffError(Exception):
    pass


def validate_common_header(header):
    """Validate the Common Header section of a binary SFF file.

    Raises an UnsupportedSffError if the header is not supported.
    """
    supported_values = {
        'magic_number': 0x2E736666,
        'version': 1,
        'flowgram_format_code': 1,
        }
    for attr_name, expected_value in supported_values.items():
        observed_value = header[attr_name]
        if observed_value != expected_value:
            raise UnsupportedSffError(
                '%s not supported.
(Expected %s, observed %s)' % (
                    attr_name, expected_value, observed_value))


read_header_fields = [
    'read_header_length',
    'name_length',
    'number_of_bases',
    'clip_qual_left',
    'clip_qual_right',
    'clip_adapter_left',
    'clip_adapter_right',
    ]
read_header_struct = NamedStruct('>HHIHHHH', read_header_fields)


def parse_read_header(sff_file):
    """Parse a Read Header section from a binary SFF file.

    Keys in the resulting dict are identical to those defined in the Roche
    documentation.  As a side effect, sets the position of the file object
    to the end of the Read Header section.
    """
    data = read_header_struct.read_from(sff_file)
    data['Name'] = sff_file.read(data['name_length'])
    seek_pad(sff_file)
    return data


def write_read_header(sff_file, read_header):
    """Write a read header section to a binary SFF file."""
    header_bytes = read_header_struct.pack(read_header)
    sff_file.write(header_bytes)
    sff_file.write(read_header['Name'])
    write_pad(sff_file)


read_header_formats = [
    '  Read Header Len:  %d\n',
    '  Name Length:      %d\n',
    '  # of Bases:       %d\n',
    '  Clip Qual Left:   %d\n',
    '  Clip Qual Right:  %d\n',
    '  Clip Adap Left:   %d\n',
    '  Clip Adap Right:  %d\n',
    ]


def format_read_header(read_header):
    """Format a dictionary representation of an SFF read header as text."""
    out = StringIO()
    out.write('\n>%s\n' % read_header['Name'])
    timestamp, hashchar, region, location = decode_accession(
        read_header['Name'])
    out.write('  Run Prefix:   R_%d_%02d_%02d_%02d_%02d_%02d_\n' % timestamp)
    out.write('  Region #:     %d\n' % region)
    out.write('  XY Location:  %04d_%04d\n' % location)
    out.write('\n')
    for key, fmt in zip(read_header_fields, read_header_formats):
        val = read_header[key]
        out.write(fmt % val)
    return out.getvalue()


def parse_read_data(sff_file, number_of_bases, number_of_flows=400):
    """Parse a Read Data section from a binary SFF file.

    Keys in the resulting dict are identical to those defined in the Roche
    documentation.  As a side effect, sets the position of the file object
    to the end of the Read Header section.
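The fixed-size read-header layout above is ordinary big-endian struct packing; the stdlib `struct` module round-trips it directly. A sketch with illustrative values (the NamedStruct wrapper used by the source is essentially this plus the dict of field names):

```python
import struct

# Field names and big-endian layout matching read_header_struct above.
fields = ['read_header_length', 'name_length', 'number_of_bases',
          'clip_qual_left', 'clip_qual_right',
          'clip_adapter_left', 'clip_adapter_right']
fmt = '>HHIHHHH'  # two uint16, one uint32, four uint16

values = (32, 14, 100, 5, 95, 0, 0)  # illustrative, not from a real file
raw = struct.pack(fmt, *values)

# Unpacking and zipping with the field names recovers the header dict.
header = dict(zip(fields, struct.unpack(fmt, raw)))
print(len(raw), header['number_of_bases'])  # 16 100
```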
""" data = {} flow_fmt = '>' + ('H' * number_of_flows) base_fmt = '>' + ('B' * number_of_bases) flow_fmt_size = struct.calcsize(flow_fmt) base_fmt_size = struct.calcsize(base_fmt) buff = sff_file.read(flow_fmt_size) data['flowgram_values'] = struct.unpack(flow_fmt, buff) buff = sff_file.read(base_fmt_size) data['flow_index_per_base'] = struct.unpack(base_fmt, buff) data['Bases'] = sff_file.read(number_of_bases) buff = sff_file.read(base_fmt_size) data['quality_scores'] = struct.unpack(base_fmt, buff) seek_pad(sff_file) return data def write_read_data(sff_file, read_data): """Write a read data section to a binary SFF file. """ number_of_flows = len(read_data['flowgram_values']) number_of_bases = len(read_data['quality_scores']) flow_fmt = '>' + ('H' * number_of_flows) base_fmt = '>' + ('B' * number_of_bases) flow_bytes = struct.pack(flow_fmt, *read_data['flowgram_values']) sff_file.write(flow_bytes) index_bytes = struct.pack(base_fmt, *read_data['flow_index_per_base']) sff_file.write(index_bytes) sff_file.write(read_data['Bases']) qual_bytes = struct.pack(base_fmt, *read_data['quality_scores']) sff_file.write(qual_bytes) write_pad(sff_file) def format_read_data(read_data, read_header): """Format a dictionary representation of an SFF read data as text. The read data is expected to be in native flowgram format. 
""" out = StringIO() out.write('\n') out.write('Flowgram:') for x in read_data['flowgram_values']: out.write('\t%01.2f' % (x * 0.01)) out.write('\n') out.write('Flow Indexes:') current_index = 0 for i in read_data['flow_index_per_base']: current_index = current_index + i out.write('\t%d' % current_index) out.write('\n') out.write('Bases:\t') # Roche uses 1-based indexing left_idx = read_header['clip_qual_left'] - 1 right_idx = read_header['clip_qual_right'] - 1 for i, base in enumerate(read_data['Bases']): if (i < left_idx) or (i > right_idx): out.write(base.lower()) else: out.write(base.upper()) out.write('\n') out.write('Quality Scores:') for score in read_data['quality_scores']: out.write('\t%d' % score) out.write('\n') return out.getvalue() def parse_read(sff_file, number_of_flows=400): """Parse a single read from a binary SFF file. Keys in the resulting dict are identical to those defined in the Roche documentation for the Read Header and Read Data sections. As a side effect, sets the position of the file object to the end of the Read Data section. """ header_data = parse_read_header(sff_file) read_data = parse_read_data( sff_file, header_data['number_of_bases'], number_of_flows) read_data.update(header_data) return read_data def write_read(sff_file, read): """Write a single read to a binary SFF file. """ write_read_header(sff_file, read) write_read_data(sff_file, read) def format_read(read): """Format a dictionary representation of an SFF read as text. """ out = StringIO() out.write(format_read_header(read)) out.write(format_read_data(read, read)) return out.getvalue() def parse_binary_sff(sff_file, native_flowgram_values=False): """Parse a binary SFF file, returning the header and a sequence of reads. In the binary file, flowgram values are stored as integers, 100 times larger than the normalized floating point value. Because the conversion is relatively expensive, we allow the computation to be skipped if the keyword argument native_flowgram_values is True. 
""" header = parse_common_header(sff_file) number_of_flows = header['number_of_flows_per_read'] validate_common_header(header) def get_reads(): for i in range(header['number_of_reads']): # Skip the index section if sff_file.tell() == header['index_offset']: sff_file.seek(header['index_length'], 1) read = parse_read(sff_file, number_of_flows) if not native_flowgram_values: read['flowgram_values'] = [x * 0.01 for x in read['flowgram_values']] yield read return header, get_reads() def write_binary_sff(sff_file, header, reads): """Write a binary SFF file, using provided header and read dicts. """ sff_file.seek(0) sff_file.truncate() write_common_header(sff_file, header) for read in reads: write_read(sff_file, read) def format_binary_sff(sff_file, output_file=None): """Write a text version of a binary SFF file to an output file. If no output file is provided, an in-memory file-like buffer is used (namely, a StringIO object). """ if output_file is None: output_file = StringIO() header, reads = parse_binary_sff(sff_file, True) output_file.write(format_common_header(header)) for read in reads: output_file.write(format_read(read)) return output_file def base36_encode(n): """Convert a positive integer to a base36 string. Following the conventions outlined in the Roche 454 manual, the numbers 0-25 are represented by letters, and the numbers 36-35 are represented by digits. Based on the code example at http://en.wikipedia.org/wiki/Base_36 """ if n < 0: raise ValueError('Only poitive numbers are supported.') chars = [] while n != 0: n, remainder = divmod(n, 36) chars.append(base36_encode.alphabet[remainder]) return ''.join(chars) base36_encode.alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' def base36_decode(base36_str): """Convert a base36 string to a positive integer. Following the conventions outlined in the Roche 454 manual, the numbers 0-25 are represented by letters, and the numbers 36-35 are represented by digits. 
""" base36_str = base36_str.translate(base36_decode.translation) return int(base36_str, 36) base36_decode.translation = string.maketrans( 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', ) def decode_location(location_str): """Decode a base36-encoded well location, in Roche 454 format. Such timestamps are embedded in the final 5 characters of Roche \"universal\" accession numbers. """ return divmod(base36_decode(location_str), 4096) def decode_timestamp(timestamp_str): """Decode a base36-encoded timestamp, in Roche 454 format. Such timestamps are embedded in the first 6 characters of Roche \"universal\" accession numbers and SFF filenames. """ n = base36_decode(timestamp_str) year, n = divmod(n, 13 * 32 * 24 * 60 * 60) year = year + 2000 month, n = divmod(n, 32 * 24 * 60 * 60) day, n = divmod(n, 24 * 60 * 60) hour, n = divmod(n, 60 * 60) minute, second = divmod(n, 60) return year, month, day, hour, minute, second def decode_accession(accession): """Decode a Roche 454 \"universal\" accession number. """ assert len(accession) == 14 timestamp = decode_timestamp(accession[:6]) hashchar = accession[6] region = int(accession[7:9]) location = decode_location(accession[9:14]) return timestamp, hashchar, region, location def decode_sff_filename(sff_filename): """Decode a Roche 454 SFF filename, returning a timestamp and other info. """ assert len(sff_filename) == 13 assert sff_filename.endswith('.sff') timestamp = decode_timestamp(sff_filename[:6]) hashchar = sff_filename[6] region = int(sff_filename[7:9]) return timestamp, hashchar, region PyCogent-1.5.3/cogent/parse/blast.py000644 000765 000024 00000041630 12024702176 020315 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parsers for blast, psi-blast and blat. 
""" from cogent.parse.record_finder import LabeledRecordFinder, \ DelimitedRecordFinder, never_ignore from cogent.parse.record import RecordError from string import strip, upper __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Prototype" def iter_finder(line): """Split record on rows that start with iteration label.""" return line.startswith("# Iteration:") def query_finder(line): """Split record on rows that start with query label.""" return line.startswith("# Query:") def iteration_set_finder(line): """Split record on rows that begin a new iteration.""" return line.startswith("# Iteration: 1") def _is_junk(line, t_strs): """Ignore empty line, line with blast info, or whitespace line""" # empty or white space if not line or not line.strip(): return True # blast info line for t_str in t_strs: if line.startswith("# %s" % t_str): return True return False def is_blast_junk(line): """Ignore empty line or lines with blast info""" return _is_junk(line, ("BLAST","TBLAS")) def is_blat_junk(line): """Ignore empty line or lines with blat info""" return _is_junk(line, ("BLAT",)) label_constructors = {'ITERATION': int} #add other label constructors here def make_label(line): """Make key, value for colon-delimited comment lines. WARNING: Only maps the data type if the key is in label_constructors above. """ if not line.startswith("#"): raise ValueError, "Labels must start with a # symbol." if line.find(":") == -1: raise ValueError, "Labels must contain a : symbol." 
key, value = map(strip, line[1:].split(":", 1)) key = key.upper() if key in label_constructors: value = label_constructors[key](value) return key, value BlatFinder = LabeledRecordFinder(query_finder, constructor=strip, \ ignore=is_blat_junk) BlastFinder = LabeledRecordFinder(query_finder, constructor=strip, \ ignore=is_blast_junk) PsiBlastFinder = LabeledRecordFinder(iter_finder, constructor=strip, \ ignore=is_blast_junk) PsiBlastQueryFinder = LabeledRecordFinder(iteration_set_finder, \ constructor=strip, ignore=is_blast_junk) def GenericBlastParser9(lines, finder, make_col_headers=False): """Yields successive records from lines (props, data list) Infile must in blast9 format finder: labeled record finder function make_col_header: adds column headers (from fields entry) as first row in data output props is a dict of {UPPERCASE_KEY:value}. data_list is a list of list of strings, optionally with header first. """ for rec in finder(lines): props = {} data = [] for line in rec: if line.startswith("#"): label, value = make_label(line) props[label] = value # check if need to insert column headers if make_col_headers and label == "FIELDS": data.insert(0, map(upper, map(strip,value.split(",")))) else: data.append(map(strip, line.split("\t"))) yield props, data def TableToValues(table, constructors=None, header=None): """Converts table to values according to constructors. Returns (table, header). Use dict([(val, i) for i, val in enumerate(header)]) to get back a dict mapping the fields to indices in each row. """ if header is None: #assume first row of table header = table[0] table = table[1:] c_list = [constructors.get(k, str) for k in header] return [[c(val) for c, val in zip(c_list, row)] for row in table], header psiblast_constructors={'% identity':float, 'alignment length':int, \ 'mismatches':int, 'gap openings':int, 'q. start':int, 'q. end':int, \ 's. start':int, 's. 
end':int, 'e-value':float, 'bit score':float} #make case-insensitive for key, val in psiblast_constructors.items(): psiblast_constructors[key.upper()] = val def PsiBlastTableParser(table): return TableToValues(table, psiblast_constructors) def MinimalBlastParser9(lines, include_column_names=False): """Yields successive records from lines (props, data list). lines must be BLAST output format. """ return GenericBlastParser9(lines, BlastFinder, include_column_names) def MinimalPsiBlastParser9(lines, include_column_names=False): """Yields successive records from lines (props, data list) lines must be of psi-blast output format """ return GenericBlastParser9(lines, PsiBlastFinder, include_column_names) def MinimalBlatParser9(lines, include_column_names=True): """Yields successive records from lines (props, data list) lines must be of blat output (blast9) format """ return GenericBlastParser9(lines, BlatFinder, include_column_names) def PsiBlastParser9(lines): """Returns fully parsed PSI-BLAST result. result['query'] gives all the results for specified query sequence. result['query'][i] gives result for iteration i (offset by 1: zero-based) if x = result['query']['iteration']: x[0]['e-value'] gives the e-value of the first result. 
WARNING: designed for ease of use, not efficiency!""" result = {} for query in PsiBlastQueryFinder(lines): first_query = True #if it's the first, need to make the entry for properties, record in MinimalPsiBlastParser9(query, True): if first_query: curr_resultset = [] result[properties['QUERY'].split()[0]] = curr_resultset first_query = False table, header = PsiBlastTableParser(record) curr_resultset.append([dict(zip(header, row)) for row in table]) return result def get_blast_ids(props, data, filter_identity, threshold, keep_values): """ Extract ids from blast output """ fields = map(strip, props["FIELDS"].upper().split(",")) # get column index of protein ids we want p_ix = fields.index("SUBJECT ID") # get column index to screen by if filter_identity: e_ix = fields.index("% IDENTITY") else: e_ix = fields.index("E-VALUE") # no filter, return all if not threshold: if keep_values: return [(x[p_ix],x[e_ix]) for x in data] else: return [x[p_ix] for x in data] else: # will raise exception if invalid threshold passed max_val = float(threshold) #figure out what we're keeping def ok_val(val): if threshold: return (val <= max_val) return (val >= max_val) if keep_values: return [(x[p_ix],x[e_ix]) for x in data if ok_val(float(x[e_ix]))] else: return [x[p_ix] for x in data if ok_val(float(x[e_ix]))] def AllProteinIds9(lines, filter_identity=True, threshold=None, \ keep_below_threshold=True, output_parser=MinimalPsiBlastParser9, keep_values=False): """Helper to extract just protein ids from each blast search lines: output file in output format #9. filter_identity: when True, use % identity to filter, else use e-value threshold: when None, all results are returned. When not None, used as a threshold to filter results. keep_below_threshold: when True, keeps any rows below given threshold, else keep any rows above threshold output_parser: minimal output parser to use (e.g. minimalpsiblast) keep_values: if True, returns tuples of (id, value) rather than just ids. 
Note that you can feed it successive output from PsiBlastQueryFinder if you have a PSI-BLAST file with multiple input queries. Subject ids are stable relative to original order. """ mpbp = output_parser(lines) # get last record. props = data = None out_ids = {} out_ct = 1 for rec in mpbp: props, data = rec out_ids[out_ct] = get_blast_ids(props, data, filter_identity, threshold, keep_values) out_ct += 1 return out_ids def LastProteinIds9(lines, filter_identity=True, threshold=None, \ keep_below_threshold=True, output_parser=MinimalPsiBlastParser9, keep_values=False): """Helper to extract just protein ids from last psi-blast iteration. lines: output file in output format #9. filter_identity: when True, use % identity to filter, else use e-value threshold: when None, all results are returned. When not None, used as a threshold to filter results. keep_below_threshold: when True, keeps any rows below given threshold, else keep any rows above threshold output_parser: minimal output parser to use (e.g. minimalpsiblast) keep_values: if True, returns tuples of (id, value) rather than just ids. Note that you can feed it successive output from PsiBlastQueryFinder if you have a PSI-BLAST file with multiple input queries. Subject ids are stable relative to original order. """ mpbp = output_parser(lines) # get last record. props = data = None for rec in mpbp: props, data = rec if not (props and data): return [] return get_blast_ids(props, data, filter_identity, threshold, keep_values) def QMEBlast9(lines): """Returns query, match and e-value for each line in Blast-9 output. WARNING: Allows duplicates in result. WARNING: If you use this on PSI-BLAST output, will not check that you're only getting stuff from the last iteration but will give you everything. The advantage is that you keep stuff that drops out of the profile. The disadvantage is that you keep stuff that drops out of the profile... 
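The extraction boils down to skipping comment lines and taking the first, second, and second-to-last tab-separated fields. A sketch on made-up blast-9 style lines:

```python
# Made-up tabular (-m 9) lines; comment lines start with '#',
# and the e-value is the second-to-last field.
lines = [
    '# BLASTP',
    '# Fields: Query id, Subject id, ...',
    'q1\ts1\t90.00\t40\t4\t0\t1\t40\t1\t40\t2e-20\t80.1',
    'q1\ts2\t85.00\t40\t6\t0\t1\t40\t1\t40\t1e-10\t60.2',
]

result = []
for line in lines:
    if line.startswith('#'):
        continue
    fields = line.split('\t')
    result.append((fields[0], fields[1], float(fields[-2])))
```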
""" result = [] for line in lines: if line.startswith('#'): continue try: fields = line.split('\t') result.append((fields[0], fields[1], float(fields[-2]))) except (TypeError, ValueError, IndexError): pass return result def QMEPsiBlast9(lines): """Returns successive query, match, e-value from lines of Psi-Blast run. Assumes tabular output. Uses last iteration from each query. WARNING: Allows duplicates in result """ result = [] for query in PsiBlastQueryFinder(lines): for iteration in PsiBlastFinder(query): pass result.extend(QMEBlast9(iteration)) return result class BlastResult(dict): """Adds convenience methods to BLAST result dict. {Query:[[{Field:Value}]]} Nesting is: query: key/value iteration: list hit: list field: key/value For BLAST, there is always exactly one iteration, but PSIBLAST can have multiple. Keep interface the same. Question: should it be able to construct itself from the result string? """ # FIELD NAMES ITERATION = 'ITERATION' QUERY_ID = 'QUERY ID' SUBJECT_ID = 'SUBJECT ID' PERCENT_IDENTITY = '% IDENTITY' ALIGNMENT_LENGTH = 'ALIGNMENT LENGTH' MISMATCHES = 'MISMATCHES' GAP_OPENINGS = 'GAP OPENINGS' QUERY_START = 'Q. START' QUERY_END = 'Q. END' SUBJECT_START = 'S. START' SUBJECT_END = 'S. END' E_VALUE = 'E-VALUE' BIT_SCORE = 'BIT SCORE' #standard comparison for each field, e.g. 
#want long matches, small e-values _lt = lambda x, y: cmp(x, y) == -1 _le = lambda x, y: cmp(x, y) <= 0 _gt = lambda x, y: cmp(x, y) == 1 _ge = lambda x, y: cmp(x, y) >= 0 _eq = lambda x, y: cmp(x, y) == 0 FieldComparisonOperators = { PERCENT_IDENTITY:(_gt, float), ALIGNMENT_LENGTH:(_gt, int), MISMATCHES:(_lt, int), E_VALUE:(_lt, float), BIT_SCORE:(_gt, float) } # set up valid blast keys HitKeys = set([ ITERATION, QUERY_ID, SUBJECT_ID, PERCENT_IDENTITY, ALIGNMENT_LENGTH, MISMATCHES, GAP_OPENINGS, QUERY_START, QUERY_END, SUBJECT_START, SUBJECT_END, E_VALUE, BIT_SCORE ]) def __init__(self, data, psiblast=False): """ Init using blast results data: blast output from the m = 9 output option psiblast: if True, will expect psiblast output, else expects blast output """ parser = MinimalBlastParser9 if psiblast: parser = MinimalPsiBlastParser9 mp = parser(data, True) for props, rec_data in mp: iteration = 1 if self.ITERATION in props: iteration = int(props[self.ITERATION]) hits = [] # check if found any hits if len(rec_data) > 1: for h in rec_data[1:]: hits.append(dict(zip(rec_data[0], h))) else: hits.append(dict(zip(rec_data[0], ['' for x in rec_data[0]]))) # get blast version of query id query_id = hits[0][self.QUERY_ID] if query_id not in self: self[query_id] = [] self[query_id].append(hits) def iterHitsByQuery(self, iteration=-1): """Iterates over set of hits, returning list of hits for each query""" for query_id in self: yield query_id, self[query_id][iteration] def iterHitsByTarget(self, iteration=-1): """Iterates over set of hits, returning list of hits for each target""" raise NotImplementedError def iterAllHits(self, iteration=-1): """Iterates over all hits, one at a time""" raise NotImplementedError def filterByField(self, field='E-value', threshold=0.001): """Returns a copy of self containing hits where field better than threshold. Uses FieldComparisonOperators to figure out which direction to compare. 
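A standalone sketch of the (comparison, cast) idea behind FieldComparisonOperators: each field maps to a comparison direction and a cast, so "better" means smaller for e-values but larger for bit scores. The helper name below is hypothetical, not part of this module:

```python
import operator

# Smaller is better for e-values; larger is better for bit scores.
FIELD_OPS = {
    'E-VALUE': (operator.lt, float),
    'BIT SCORE': (operator.gt, float),
}

def is_better(field, new, old):
    cmp_fun, cast_fun = FIELD_OPS[field]
    return cmp_fun(cast_fun(new), cast_fun(old))
```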
""" raise NotImplementedError def filterByFunc(self, f): """Returns copy of self containing hits where f(entry) is True.""" raise NotImplementedError def bestHitsByQuery(self, iteration=-1, n=1, field='BIT SCORE', return_self=False): """Iterates over all queries and returns best hit for each return_self: if False, will not return best hit as itself. Uses FieldComparisonOperators to figure out which direction to compare. """ # check that given valid comparison field if field not in self.FieldComparisonOperators: raise ValueError, "Invalid field: %s. You must specify one of: %s" \ % (field, str(self.FieldComparisonOperators)) cmp_fun, cast_fun = self.FieldComparisonOperators[field] # enumerate hits for q, hits in self.iterHitsByQuery(iteration=iteration): best_hits = [] for hit in hits: # check if want to skip self hit if not return_self: if hit[self.SUBJECT_ID] == q: continue # check if better hit than ones we have if len(best_hits) < n: best_hits.append(hit) else: for ix, best_hit in enumerate(best_hits): new_val = cast_fun(hit[field]) old_val = cast_fun(best_hit[field]) if cmp_fun(new_val, old_val): best_hits[ix] = hit continue yield q, best_hits def filterByIteration(self, iteration=-1): """Returns copy of self containing only specified iteration. Negative indices count backwards.""" #raise error if both field and f passed, uses same dict as filterByField fastacmd_taxonomy_splitter = DelimitedRecordFinder(delimiter='', \ ignore=never_ignore) fasta_field_map = { 'NCBI sequence id':'seq_id', 'NCBI taxonomy id':'tax_id', 'Common name':'common_name', 'Scientific name':'scientific_name'} def FastacmdTaxonomyParser(lines): """Yields successive records from the results of fastacmd -T. Format is four lines separated by newline: NCBI sequence NCBI taxonomy Common name Scientific name Result is dict with keys by seq_id, tax_id, common_name, scientific_name. 
""" for group in fastacmd_taxonomy_splitter(lines): result = {} for line in group: try: header, data = line.split(':', 1) result[fasta_field_map[header]] = data.strip() except (TypeError, ValueError, KeyError): continue yield result PyCogent-1.5.3/cogent/parse/blast_xml.py000644 000765 000024 00000016740 12024702176 021201 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parsers for XML output of blast, psi-blast and blat. """ __author__ = "Kristian Rother" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Micah Hamady"] __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kristian Rother" __email__ = "krother@rubor.de" __status__ = "Prototype" import xml.dom.minidom """ CAUTION: This XML BLAST PARSER uses minidom. This means a bad performance for big files (>5MB), and huge XML files will for sure crash the program! (06/2009 Kristian) Possible improvements: - convert some values into floats automatically (feature request) - MH recommends sax.* for faster processing. - test against nt result - test really big file. - consider high speed parser for standard output """ from cogent.parse.blast import BlastResult # field names used to parse tags and create dict. 
HIT_XML_FIELDNAMES = ['QUERY ID','SUBJECT_ID','HIT_DEF','HIT_ACCESSION',\ 'HIT_LENGTH'] HSP_XML_FIELDS = ( ('PERCENT_IDENTITY','Hsp_identity'), ('ALIGNMENT_LENGTH','Hsp_align-len'), ('MISMATCHES',''), ('GAP_OPENINGS','Hsp_gaps'), ('QUERY_START','Hsp_query-from'), ('QUERY_END','Hsp_query-to'), ('SUBJECT_START','Hsp_hit-from'), ('SUBJECT_END','Hsp_hit-to'), ('E_VALUE','Hsp_evalue'), ('BIT_SCORE','Hsp_bit-score'), ('SCORE','Hsp_score'), ('POSITIVE','Hsp_positive'), ('QUERY_ALIGN','Hsp_qseq'), ('SUBJECT_ALIGN','Hsp_hseq'), ('MIDLINE_ALIGN','Hsp_midline'), ) HSP_XML_FIELDNAMES = [x[0] for x in HSP_XML_FIELDS] HSP_XML_TAGNAMES = [x[1] for x in HSP_XML_FIELDS] def get_tag(record, name, default=None): """ Looks in the XML tag 'record' for other tags named 'name', and returns the value of the first one. If none is found, it returns 'default'. """ tag = record.getElementsByTagName(name) if len(tag) and len(tag[0].childNodes): return tag[0].childNodes[0].nodeValue else: return default def parse_hit(hit_tag,query_id=1): """ Parses a 'Hit' dom object. Returns a list of lists with HSP data. """ result = [] # parse elements from hit tag hit_id = get_tag(hit_tag,'Hit_id') hit_def = get_tag(hit_tag,'Hit_def') accession = get_tag(hit_tag,'Hit_accession') length = int(get_tag(hit_tag,'Hit_len', 0)) hit_data = [query_id,hit_id, hit_def, accession, length] # process HSPS in this hit. for hsp_tag in hit_tag.getElementsByTagName('Hsp'): result.append(hit_data + parse_hsp(hsp_tag)) return result def parse_hsp(hsp_tag): """ Parses a 'Hsp' XML dom object. Returns a list of values, according to the items in HSP_XML_FIELDS. """ result = [] for tag_name in HSP_XML_TAGNAMES: result.append(get_tag(hsp_tag,tag_name,0)) # what about these? # self.identity = int(self.get_tag(record,'Hsp_identity', 0)) # self.positive = int(self.get_tag(record, 'Hsp_positive', 0)) return result def parse_header(tag): """ Parses a 'BlastOutput' dom object. 
Returns a dict with information from the blast header """ result = {} result['application'] = get_tag(tag,'BlastOutput_program') result['version'] = get_tag(tag,'BlastOutput_version') result['reference'] = get_tag(tag,'BlastOutput_reference') result['query'] = get_tag(tag,'BlastOutput_query-def') result['query_letters'] = int(get_tag(tag,'BlastOutput_query-len')) result['database'] = get_tag(tag,'BlastOutput_db') # add data from Parameters tag for param_tag in tag.getElementsByTagName('BlastOutput_param'): #for param_tag in tag.getElementsByTagName('Parameters'): data = parse_parameters(param_tag) for k in data: result[k] = data[k] return result def parse_parameters(tag): """Parses a 'BlastOutput_param' dom object.""" result = {} result['matrix'] = get_tag(tag,'Parameters_matrix') result['expect'] = get_tag(tag,'Parameters_expect') result['gap_open_penalty'] = float(get_tag(tag,'Parameters_gap-open')) result['gap_extend_penalty'] = float(get_tag(tag,'Parameters_gap-extend')) result['filter'] = get_tag(tag,'Parameters_filter') return result def MinimalBlastParser7(lines, include_column_names=False, format='xml'): """Yields successive records from lines (props, data list). lines must be XML BLAST output format. output: props is a dict of {UPPERCASE_KEY:value}. data_list is a list of list of strings, optionally with header first. LIST CONTAINS [HIT][HSP][strings], FIRST ENTRY IS LIST OF LABELS! """ doc = ''.join(lines) dom_obj = xml.dom.minidom.parseString(doc) query_id = 1 for record in dom_obj.getElementsByTagName('BlastOutput'): props = parse_header(record) hits = [HIT_XML_FIELDNAMES + HSP_XML_FIELDNAMES] for hit in record.getElementsByTagName('Hit'): hits += parse_hit(hit,query_id) yield props,hits class BlastXMLResult(BlastResult): """the BlastResult objects have the query sequence as keys, and the values are lists of lists of dictionaries. The FIELD NAMES given are the keys of the dict. 
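The nesting can be pictured with made-up values (result[query][iteration][hit][FIELD]):

```python
# Made-up result with one query, one iteration, two hits.
result = {
    'query1': [  # one entry per iteration (always one for plain BLAST)
        [        # one dict per hit
            {'SUBJECT ID': 's1', 'E-VALUE': '1e-20'},
            {'SUBJECT ID': 's2', 'E-VALUE': '0.005'},
        ],
    ],
}
# top hit of the last iteration
best = result['query1'][-1][0]
```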
""" # FIELD NAMES QUERY_ALIGN = 'HSP QSEQ' SUBJECT_ALIGN = 'HSP HSEQ' MIDLINE_ALIGN = 'HSP MIDLINE' HIT_DEF = 'HIT_DEF' HIT_ACCESSION = 'HIT_ACCESSION' HIT_LENGTH = 'HIT_LENGTH' SCORE = 'SCORE' POSITIVE = 'POSITIVE' #FieldComparisonOperators = ( # BlastResult.FieldComparisonOperators = { # HIT_DEF:(_gt, float) # } # .. to be done # .. extend HitKeys HitKeys = BlastResult.HitKeys.union( set([ HIT_DEF, HIT_ACCESSION, HIT_LENGTH, SCORE, POSITIVE, QUERY_ALIGN, SUBJECT_ALIGN, MIDLINE_ALIGN ])) def __init__(self, data, psiblast=False, parser=None, xml=False): # iterate blast results, generate data structure """ Init using blast 7 or blast 9 results data: blast output from the m = 9 output option psiblast: if True, will expect psiblast output, else expects blast output """ # further improvement: # add XML option to BlastResult __init__ instead of # using a separate class. if not parser: if xml: parser = MinimalBlastParser7 elif psiblast: parser = MinimalPsiBlastParser9 else: parser = MinimalBlastParser9 # code below copied from BlastResult, unchanged. 
mp = parser(data, True) for props, rec_data in mp: iteration = 1 if self.ITERATION in props: iteration = int(props[self.ITERATION]) hits = [] # check if found any hits if len(rec_data) > 1: for h in rec_data[1:]: hits.append(dict(zip(rec_data[0], h))) else: hits.append(dict(zip(rec_data[0], ['' for x in rec_data[0]]))) # get blast version of query id query_id = hits[0][self.QUERY_ID] if query_id not in self: self[query_id] = [] self[query_id].append(hits) PyCogent-1.5.3/cogent/parse/bowtie.py000644 000765 000024 00000004166 12024702176 020504 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parser for the default bowtie output Compatible with version 0.12.5 """ from cogent import LoadTable from cogent.parse.table import ConvertFields __author__ = "Gavin Huttley, Anuj Pahwa" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight","Peter Maxwell", "Gavin Huttley", "Anuj Pahwa"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Development" # The 4th and the 7th elements of the row of data returned from bowtie are # integer values and can thus be converted. row_converter = ConvertFields([(3, int), (6, int)]) def BowtieOutputParser(data, row_converter=row_converter): """yields a header and row of data from the default bowtie output Arguments: - row_converter: if not provided, uses a default converter which casts the Offset and Other Matches fields to ints. If set to None, all returned data will be strings (this is faster). 
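The per-line conversion can be sketched on a made-up bowtie line (tab-separated fields: name, strand, reference, offset, sequence, quality, other matches, mismatch descriptors):

```python
# Made-up default-format bowtie line.
line = 'read_1\t+\tchr1\t1000\tACGTACGT\tIIIIIIII\t0\t4:A>G,6:C>T\n'

row = line.rstrip('\n').split('\t')
row[3] = int(row[3])   # Offset
row[6] = int(row[6])   # Other Matches
# mismatch descriptors become a list of strings (empty list if none)
row[-1] = [] if row[-1] == '' else row[-1].split(',')
```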
""" header = ['Query Name', 'Strand Direction','Reference Name', 'Offset', 'Query Seq', 'Quality', 'Other Matches', 'Mismatches'] yield header # If given a filename for the data if type(data) == str: data = open(data) for record in data: row = record.rstrip('\n').split('\t') if row_converter: row = row_converter(row) # convert the last element to a list of strings if row[-1] is '': row[-1] = [] else: row[-1] = row[-1].split(',') yield row def BowtieToTable(data, row_converter=row_converter): """Converts bowtie output to a table Arguments: - row_converter: if not provided, uses a default converter which casts the Offset and Other Matches fields to ints. If set to None, all returned data will be strings (this is faster). """ parser = BowtieOutputParser(data, row_converter=row_converter) header = parser.next() rows = [row for row in parser] table = LoadTable(header=header, rows=rows) return table PyCogent-1.5.3/cogent/parse/bpseq.py000644 000765 000024 00000023771 12024702176 020330 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #bpseq.py """Provides parser for bpseq files downloaded from Gutell's comparative RNA website (CRW): http://www.rna.icmb.utexas.edu/ The file format is: Filename: d.16.b.E.coli.bpseq Organism: Escherichia coli Accession Number: J01695 Citation and related information available at http://www.rna.icmb.utexas.edu 1 A 0 2 A 0 3 A 0 4 U 0 5 U 0 6 G 0 7 A 0 8 A 0 9 G 25 10 A 24 11 G 23 12 U 22 13 U 21 14 U 0 So, header of four lines (Filename, Organism, Accession Number, Citation) Sequence, structure information in tuples of residue position, residue name, residue partner. The residue partner is 0 if the base is unpaired. Numbering is 1-based! 
""" from __future__ import division from string import strip from cogent.struct.rna2d import Vienna, Pairs from cogent.struct.knots import opt_single_random from cogent.core.info import Info from cogent.core.sequence import RnaSequence __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" class BpseqParseError(Exception): """Exception raised when an error occurs during parsing a bpseq file""" pass def parse_header(header_lines): """Return Info object from header information. header_lines -- list of lines or anything that behaves like it. Parses only the first three header lines with Filename, Organism, and Accession number. In general lines that contain a colon will be parsed. There's no error checking in here. If it fails to split on ':', the information is simply not added to the dictionary. The expected format for header lines is "key: value". The citation lane is parsed differently. """ info = {} for line in header_lines: if line.startswith('Citation'): info['Citation'] = line.split()[-1].strip() elif ':' in line: try: field, value = map(strip,line.split(':',1)) info[field] = value except ValueError: #no interesting header line continue else: continue return Info(info) def construct_sequence(seq_dict): """Construct RnaSequence from dict of {pos:residue}. seq_dict -- dictionary of {position: residue} Checks whether the first residue is 0. Checks whether all residues between min and max index are present. No checking on validity of residue symbols. 
""" all_pos = seq_dict.keys() min_pos, max_pos = min(all_pos), max(all_pos) if min_pos != 0: raise BpseqParseError(\ "Something went wrong with adjusting the numbering") # make sure all positions are in the dictionary for idx in range(min_pos, max_pos+1): if idx not in seq_dict: raise BpseqParseError(\ "Description of residue with index %s is missing"%(idx)) seq = [] for idx in range(min_pos, max_pos+1): seq.append(seq_dict[idx]) # Return as a simple string return ''.join(seq) def parse_residues(residue_lines, num_base, unpaired_symbol): """Return RnaSequence and Pairs object from residue lines. residue_lines -- list of lines or anything that behaves like it. Lines should contain: residue_position, residue_identiy, residue_partner. num_base -- int, basis of the residue numbering. In bpseq files from the CRW website, the numbering starts at 1. unpaired_symbol -- string, symbol in the 'partner' column that indicates that a base is unpaired. In bpseq files from the CRW website, the unpaired_symbol is '0'. This parameter should be a string to allow other symbols that can't be casted to an integer to indicate unpaired bases. Checks for double entries both in the sequence and the structure, and checks that the structre is valid in the sense that if (up,down) in there, that (down,up) is the same. 
""" #create dictionary/list for sequence and structure seq_dict = {} pairs = Pairs() for line in residue_lines: try: pos, res, partner = line.strip().split() if partner == unpaired_symbol: # adjust pos, not partner pos = int(pos) - num_base partner = None else: # adjust pos and partner pos = int(pos) - num_base partner = int(partner) - num_base pairs.append((pos,partner)) #fill seq_dict if pos in seq_dict: raise BpseqParseError(\ "Double entry for residue %s (%s in bpseq file)"\ %(str(pos), str(pos+1))) else: seq_dict[pos] = res except ValueError: raise BpseqParseError("Failed to parse line: %s"%(line)) #check for conflicts, remove unpaired bases if pairs.hasConflicts(): raise BpseqParseError("Conflicts in the list of basepairs") pairs = pairs.directed() pairs.sort() # construct sequence from seq_dict seq = RnaSequence(construct_sequence(seq_dict)) return seq, pairs def MinimalBpseqParser(lines): """Separate header and content (residue lines). lines -- a list of lines or anything that behaves like that. The standard bpseq header (from the CRW website) is recognized. Also, lines that contain a colon are accepted as header lines. Header lines that aren't accepted as header, but that can be split into three parts are residue lines (sequence and structure description). Lines that don't fall into any of these categories are ignored. """ result = {'HEADER':[], 'SEQ_STRUCT':[]} for line in lines: if line.startswith('Filename') or line.startswith('Organism') or\ line.startswith('Accession') or line.startswith('Citation') or\ ":" in line: result['HEADER'].append(line.strip()) elif len(line.split()) == 3: result['SEQ_STRUCT'].append(line.strip()) else: continue #unknown return result def BpseqParser(lines, num_base=1, unpaired_symbol='0'): """Return RnaSequence and structure (Pairs object) specified in file. lines -- filestream of bpseq file. File should contain a single record. num_base -- int, basis of the residue numbering. 
In bpseq files from the CRW website, the numbering starts at 1. unpaired_symbol -- string, symbol in the 'partner' column that indicates that a base is unpaired. In bpseq files from the CRW website, the unpaired_symbol is '0'. This parameter should be a string to allow other symbols that can't be casted to an integer to indicate unpaired bases. Bpseq file looks like this: Filename: d.16.b.E.coli.bpseq Organism: Escherichia coli Accession Number: J01695 Citation and related information available at http://www.... 1 A 0 2 A 0 3 A 0 4 U 0 5 U 0 6 G 0 7 A 0 8 A 0 9 G 25 10 A 24 11 G 23 12 U 22 13 U 21 So, 4 header lines, followed by a list of residues. Position (indexed to 1), residue, partner position """ # separate header and residue lines grouped_lines = MinimalBpseqParser(lines) # parse header and seq/struct separately header_info = parse_header(grouped_lines['HEADER']) seq, struct = parse_residues(grouped_lines['SEQ_STRUCT'],\ num_base, unpaired_symbol) #add header info to the sequence as Info object seq.Info = header_info return seq, struct # ============================================================================ # CONVENIENCE FUNCTIONS # ============================================================================ def bpseq_specify_output(lines, num_base=1, unpaired_symbol='0', return_vienna=False, remove_pseudo=False,\ pseudoknot_function=opt_single_random): """Return Vienna structure of Pairs object with or without pseudoknots lines -- filestream of bpseq file. File should contain a single record. num_base -- int, basis of the residue numbering. In bpseq files from the CRW website, the numbering starts at 1. unpaired_symbol -- string, symbol in the 'partner' column that indicates that a base is unpaired. In bpseq files from the CRW website, the unpaired_symbol is '0'. This parameter should be a string to allow other symbols that can't be casted to an integer to indicate unpaired bases. 
return_vienna -- boolean, if True, a ViennaStructure object is returned, if False, a Pairs object is returned. If return_vienna is True, pseudoknots need to be removed from the structure. remove_pseudo -- boolean, if True, pseudoknots will be removed from the structure. pseudoknot_function -- function that takes a Pairs object as input and returns a nested version of the structure. The function should return a single nested structure, not a list of structures. Default is opt_single_random, which retuns a single nested structure (it picks one at random in case of multiple structures with the maximum number of base pairs). This default is chosen to assure the code always returns something. In case the experiment needs to be reproducible, a random choice isn't the best one to make, and one should use a different pseudoknot removal function. See struct/knots.py for documentation. """ seq, pairs = BpseqParser(lines, num_base, unpaired_symbol) if pairs.hasPseudoknots() and (remove_pseudo or return_vienna): pairs = pseudoknot_function(pairs) if return_vienna: v = pairs.toVienna(len(seq)) return seq, v else: return seq, pairs if __name__ == "__main__": from sys import argv seq, struct = BpseqParser(open(argv[1])) print seq print seq.Info print struct PyCogent-1.5.3/cogent/parse/carnac.py000644 000765 000024 00000000724 12024702176 020436 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.parse.ct import ct_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def carnac_parser(lines=None): """Parser for carnac output tested in test_ct""" result = ct_parser(lines) return result PyCogent-1.5.3/cogent/parse/cigar.py000644 000765 000024 00000012020 12024702176 020264 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parsers for the cigar 
format Cigar stands for Compact Idiosyncratic Gapped Alignment Report and defines the sequence of matches/mismatches and deletions (or gaps). Cigar line is used in Ensembl database for both multiple and pairwise genomic alignment. for example, this cigar line 2MD3M2D2M will mean that the alignment contains 2 matches/ mismatches, 1 deletion (number 1 is omitted in order to save some spaces), 3 matches/ mismatches, 2 deletion and 2 matches/mismatches. if the original sequence is: AACGCTT the cigar line is: 2MD3M2D2M the aligned sequence will be: M M D M M M D D M M A A - C G C - - T T """ import re from cogent.core.location import LostSpan, Span, Map, _LostSpan from cogent import DNA, LoadSeqs __author__ = "Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Hua Ying" __email__ = "hua.ying@anu.edu.au" __status__ = "Production" pattern = re.compile('([0-9]*)([DM])') def map_to_cigar(map): """convert a Map into a cigar string""" cigar = '' for span in map.spans: if isinstance(span, Span): num_chars = span.End-span.Start char = 'M' else: num_chars = span.length char = 'D' if num_chars == 1: cigar += char else: cigar += str(num_chars)+char return cigar def cigar_to_map(cigar_text): """convert cigar string into Map""" assert 'I' not in cigar_text spans, posn = [], 0 for n, c in pattern.findall(cigar_text): if n: n = int(n) else: n = 1 if c == 'M': spans.append(Span(posn, posn+n)) posn += n else: spans.append(LostSpan(n)) map = Map(spans = spans, parent_length = posn) return map def aligned_from_cigar(cigar_text, seq, moltype=DNA): """returns an Aligned sequence from a cigar string, sequence and moltype""" if isinstance(seq, str): seq = moltype.makeSequence(seq) map = cigar_to_map(cigar_text) aligned_seq = seq.gappedByMap(map) return aligned_seq def _slice_by_aln(map, left, right): slicemap = map[left:right] if hasattr(slicemap, 'Start'): location = [slicemap.Start, 
slicemap.End] else: location = [] return slicemap, location def _slice_by_seq(map, start, end): re_map = map.inverse() slicemap = re_map[start:end] aln_start, aln_end = slicemap.Start, slicemap.End new_map = map[aln_start:aln_end] return new_map, [aln_start, aln_end] def _remap(map): start = map.Start if start == 0: new_map = map new_map.parent_length = map.End else: spans = [] for span in map.spans: if span.lost: spans.append(span) else: span.Start = span.Start - start span.End = span.End - start length = span.End spans.append(span) new_map = Map(spans = spans, parent_length = length) return new_map def slice_cigar(cigar_text, start, end, by_align=True): """slices a cigar string as an alignment""" map = cigar_to_map(cigar_text) if by_align: new_map, location = _slice_by_aln(map, start, end) else: new_map, location = _slice_by_seq(map, start, end) if hasattr(new_map, 'Start'): new_map = _remap(new_map) return new_map, location def CigarParser(seqs, cigars, sliced = False, ref_seqname = None, start = None, end = None, moltype=DNA): """return an alignment from raw sequences and cigar strings if sliced, will return an alignment correspondent to ref sequence start to end Arguments: seqs - raw sequences as {seqname: seq} cigars - corresponding cigar text as {seqname: cigar_text} cigars and seqs should have the same seqnames MolType - optional default to DNA """ data = {} if not sliced: for seqname in seqs.keys(): aligned_seq = aligned_from_cigar(cigars[seqname], seqs[seqname], moltype=moltype) data[seqname] = aligned_seq else: ref_aln_seq = aligned_from_cigar(cigars[ref_seqname], seqs[ref_seqname], moltype=moltype) m, aln_loc = slice_cigar(cigars[ref_seqname], start, end, by_align = False) data[ref_seqname] = ref_aln_seq[aln_loc[0]:aln_loc[1]] for seqname in [seqname for seqname in seqs.keys() if seqname != ref_seqname]: m, seq_loc = slice_cigar(cigars[seqname], aln_loc[0], aln_loc[1]) if seq_loc: seq = seqs[seqname] if isinstance(seq, str): seq = 
moltype.makeSequence(seq) data[seqname] = seq[seq_loc[0]:seq_loc[1]].gappedByMap(m) else: data[seqname] = DNA.makeSequence('-'*(aln_loc[1] - aln_loc[0])) aln = LoadSeqs(data = data, aligned = True) return aln PyCogent-1.5.3/cogent/parse/clustal.py000644 000765 000024 00000006550 12024702176 020661 0ustar00jrideoutstaff000000 000000 #/usr/bin/env python """Parsers for Clustal and related formats (e.g. MUSCLE). Implementation Notes: Currently, does not check whether sequences are the same length and are in order. Skips any line that starts with a blank. ClustalParser preserves the order of the sequences from the original file. However, it does use a dict as an intermediate, so two sequences can't have the same label. This is probably OK since Clustal will refuse to run on a FASTA file in which two sequences have the same label, but could potentially cause trouble with manually edited files (all the segments of the conflicting sequences would be interleaved, possibly in an unpredictable way). If the lines have trailing numbers (i.e. Clustal was run with -LINENOS=ON), silently deletes them. Does not check that the numbers actually correspond to the number of chars in the sequence printed so far. """ from cogent.parse.record import RecordError, DelimitedSplitter from string import strip __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Gavin Huttley", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" def LabelLineParser(record, splitter, strict=True): """Returns dict mapping list of data to labels, plus list with field order. Field order contains labels in order encountered in file. NOTE: doesn't care if lines are out of order in different blocks. This should never happen anyway, but it's possible that this behavior should be changed to tighten up validation. 
""" labels = [] result = {} for line in record: try: key, val = splitter(line.rstrip()) except: if strict: raise RecordError, \ "Failed to extract key and value from line %s" % line else: continue #just skip the line if not strict if key in result: result[key].append(val) else: result[key] = [val] labels.append(key) return result, labels def is_clustal_seq_line(line): """Returns True if line starts with a non-blank character but not 'CLUSTAL'. Useful for filtering other lines out of the file. """ return line and (not line[0].isspace()) and\ (not line.startswith('CLUSTAL')) and (not line.startswith('MUSCLE')) last_space = DelimitedSplitter(None, -1) def delete_trailing_number(line): """Deletes trailing number from a line. WARNING: does not preserve internal whitespace when a number is removed! (converts each whitespace run to a single space). Returns the original line if it didn't end in a number. """ pieces = line.split() try: int(pieces[-1]) return ' '.join(pieces[:-1]) except ValueError: #no trailing numbers return line def MinimalClustalParser(record, strict=True): """Returns (data, label_order) tuple. Data is dict of label -> sequence (pieces not joined). 
""" return LabelLineParser(map(delete_trailing_number, \ filter(is_clustal_seq_line, record)), last_space, strict) def ClustalParser(record, strict=True): seqs, labels = MinimalClustalParser(record, strict) for l in labels: yield l, ''.join(seqs[l]) PyCogent-1.5.3/cogent/parse/cmfinder.py000644 000765 000024 00000001324 12024702176 020773 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.util.transform import make_trans from cogent.struct.rna2d import wuss_to_vienna, Pairs from cogent.parse.rfam import RfamParser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def CMfinderParser(lines): """Parser for CMfinder output format Parser tested through RfamParser test """ for info, alignment, struct in RfamParser(lines,strict=False): struct = wuss_to_vienna(struct) pairs = struct.toPairs() return [alignment, pairs] PyCogent-1.5.3/cogent/parse/column.py000644 000765 000024 00000003202 12024702176 020476 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file: column_parser.py """Parser for column format Works for the following column format: ; COL 1 label ; COL 2 residue ; COL 3 seqpos ; COL 4 alignpos ; COL 5 align_bp ; COL 6 certainty/seqpos_bp Structure part separated by '; ------' and ends with '; ******' """ from string import split from cogent.struct.rna2d import Pairs from cogent.struct.pairs_util import adjust_base __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def column_parser(lines): """Parser column format""" record = False result = [] struct = [] seq = '' for line in lines: if line.startswith('; ------'): #structure part 
beginns record = True continue if line.startswith('; ******'): #structure part ends record = False struct = adjust_base(struct,-1) struct = Pairs(struct).directed()#remove duplicates struct.sort() result.append([seq,struct]) struct = [] seq = '' continue if record: sline = line.split() if sline[4] == '.': #skip not paired seq = ''.join([seq,sline[1]]) continue seq = ''.join([seq,sline[1]]) pair = (int(sline[3]),int(sline[4])) #(alignpos,align_bp) struct.append(pair) return result PyCogent-1.5.3/cogent/parse/comrna.py000644 000765 000024 00000020046 12024702176 020465 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file comrna_parser.py """Parser for comRNA output format To reduce number of structures that parser report set first to True, then parser will only report structures from first block (Maximum stem similarity score block). The function common can be used to have the most common occuring structure be reported first, if all structure only occurs once the structure with the most pairs will be reported as the most common. """ from cogent.util.transform import make_trans from cogent.struct.rna2d import Pairs from cogent.struct.knots import opt_single_random from string import index __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def comRNA_parser(lines=None,pseudo=True,first=False): """Parsed comRNA output. pseudo - if True, report results with pseudoknots; if flag is False pseudoknots will be removed. 
""" names = get_names(lines) result = [] for block in minimalComrnaParser(lines): for structure in blockParser(block): for struct in structParser(structure): for pairs,seq in pairsParser(struct,names): result.append([seq,pairs]) if first: break if first: break if first: break if not pseudo: tmp = [] for block in result: tmp.append([block[0],opt_single_random(block[-1])]) result = tmp return result def common(structs): """ Will return a list of sequences and structures with the most common structure first in the list. (rest or list unordered!) Don't care which of the sequences for the "winning" sequence that is reported since the are not ranked amongst them self. """ frequency = {} v = 0 indx = 0 result = [] tmp_list = [] #lookup the seq for the structures,dont care which winner seq key = [] for block in structs: tmp_list.extend(block) p = tuple(block[-1]) if frequency.__contains__(p): #everytime struct p appears count up by 1 frequency[p]+=1 else: frequency[p]=1 nr = frequency[p] if nr > v: #Which struct appears most times v = nr key = p #if winning structure has frequency == 1 all structure apper only once if frequency[key]==1: longest = 0 for block in structs: l = len(block[-1]) if l > longest: #pick longest sequence as the winner key = tuple(block[-1]) winner = Pairs(key) indx = tmp_list.index(winner)-1 result.append([tmp_list[indx],winner]) #adds the most common structure first del frequency[key] for i in frequency.keys(): #rest of structures added i = Pairs(i) indx = tmp_list.index(i)-1 result.append([tmp_list[indx],i]) return result def get_names(lines): """ Retrieves the names of the sequences in the output. """ next = False names = [] for line in lines: if next: if len(line) == 1: break else: tmp = line.split() names.append(tmp[1]) if line.startswith('Sequences loaded ...'): next = True return names def minimalComrnaParser(lines): """ Parses the output file in to blocks depending on the S score S score is the Maximum stem similarity score. 
""" block = [] first = True record = False for line in lines: if line.startswith('=========================== S ='): record = True if not first: yield block block = [] first = False if record: block.append(line) yield block def blockParser(block): """ Parses every block of S scores in to blocks of structures every S score block has 10 or less structures """ struct = [] first = True record = False for line in block: if line.startswith('Structure #'): record = True if not first: yield struct struct = [] first = False if record: struct.append(line) yield struct def structParser(lines): """ Parses a structure block into a block containing the sequens and structures lines. """ blc = 0 #blank line counter bc = 0 #block counter struct = [] record = False for line in lines: if len(line) == 1: blc +=1 record = False if blc == 2: blc = 0 bc +=1 record = True if record and bc < 3: struct.append(line) yield struct def pairsParser(seqBlock,names): """ Takes a structure block and parse that into structures """ for name in names: seq = [] sIndx = [] #start index, where in the line the sequence start struct = [] #structure lines record = False for line in seqBlock: if line.startswith(name+' '): tmp = line.split() #if seq length is shorter then 80 for one seq and longer #for another seq the following block will be empty for the #shorter sequence. 
this if statement protects against that if len(tmp) == 4: try: seq.append(tmp[2])#[name,start nr,seq,end nr] except: print 'LINE',line print 'BLOCK', seqBlock sIndx.append(index(line,tmp[2])) record = True else: continue else: if record: record = False struct.append(line) ############################################################################### # Construction of the full sequence and structure and then mapping each letter #in structure to a position Fseq = '' #full sequence Fstruct = '' #full structure for i in range(len(seq)): # slice out corresponding structure to sequence #so you can get the same index for structure and sequence tmpStruct = struct[i][sIndx[i]:(sIndx[i]+len(seq[i]))] Fseq = ''.join([Fseq,seq[i]]) Fstruct = ''.join([Fstruct,tmpStruct]) #Applies a position to every letter in structure sequence letterPos = zip(range(len(Fseq)),Fstruct) ############################################################################### #Cunstruction of dictionary for where every letter in structure has a list of #positions corresponding to that of that letter in respect to the sequence alphabet = {} for pos, letter in letterPos: indices = [] #if the dict contains the letter you want to add to that list if alphabet.__contains__(letter): indices = alphabet[letter] indices.append(pos) alphabet[letter] = indices #else you want to create a new list for that letter elif not letter==' ': indices.append(pos) alphabet[letter] = indices ############################################################################### #Each list in alphabet needs to be split in two, #oL and cL (open and close list), to be able to fold the positions into pairs pairs = [] for value in alphabet.values(): middle = len(value)/2 oL = value[:middle] cL = value[middle:] #pairs are created by making a tuple of the first in oL to #the last in cl, second in oL to second last in cL and so on pairs.extend(zip(oL,cL.__reversed__())) yield Pairs(pairs),Fseq PyCogent-1.5.3/cogent/parse/consan.py000644 000765 
000024 00000003417 12024702176 020472 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.struct.rna2d import ViennaStructure,wuss_to_vienna from cogent.util.transform import make_trans __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" to_vienna_table = make_trans('><','()') def consan_parser(lines): """ Takes a series of lines as input. Returns a list containing alignment and structure ex: [{alignment},[structure]] """ seqs = [] struct = '' pairs = [] alignment = {} for line in lines: if sequence(line): line = line.split() name = line[0].strip() seq = line[1].strip() #add sequence to alignment if name in alignment: alignment[name] += seq else: alignment[name] = seq elif line.startswith('#=GC SS_cons'): line = line.split() struct += line[2].strip() pairs = convert_to_pairs(struct) return [alignment, pairs] def convert_to_pairs(data): """ Converts format >< to () format, viennaformat. 
""" try: vienna = ViennaStructure(data.translate(to_vienna_table)) return toPairs(vienna) except IndexError: return '' def toPairs(vienna): """ Converts a vienna structure to a pairs obejct """ pairs = vienna.toPairs() return pairs def sequence(line): """Determines if line is a sequence line """ answer = False if not len(line) == 1 and not line.startswith('#') and not line.startswith('/') and not line.startswith('Using'): answer = True return answer PyCogent-1.5.3/cogent/parse/contrafold.py000644 000765 000024 00000001224 12024702176 021336 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.parse.bpseq import _parse_residues __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def contrafold_parser(lines): """Parser Contarfold output Returns a list containing sequence and structure(in pair format) Ex: [[sequence,[structure]]] Tested in tests for bpseq) """ result = [] seq,struct = _parse_residues(lines,True) result.append([seq,struct]) return result PyCogent-1.5.3/cogent/parse/cove.py000644 000765 000024 00000006123 12024702176 020142 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from string import split,strip from cogent.util.transform import make_trans from cogent.struct.rna2d import ViennaStructure,wuss_to_vienna __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def coves_parser(data=None): """ Parser for coves output using option -m Takes lines as input. Returns a list of lists(containing structure and pairs data) ex: [[seq1,pairs1],[seq2,pairs2],...] 
""" c = 1 count = 0 tmp_seq = '' tmp_struct = '' seq = '' struct = '' result = [] seqNames = NameList(data) for name in seqNames: #parse each sequence in turn name = '%s ' % (name) #add blank to differ ex seq1 from seq10 count+=1 if count != 1: struct, seq = remove_gaps(struct,seq) pairs = convert_to_vienna(struct) result.append([seq,pairs]) seq = '' struct = '' for line in data: line = str(line) line = line.strip() sline = line.split(None,1) if c==1: #sequence line (every other line) if line.startswith(name): c=0 tmp_seq = sline[-1] seq = ''.join([seq,tmp_seq]) elif c==0: #struct line (every other line) if line.startswith(name): c=1 tmp_struct = sline[-1] struct = ''.join([struct,tmp_struct]) struct,seq = remove_gaps(struct,seq) pairs = convert_to_vienna(struct) result.append([seq,pairs]) return result cove_to_vienna_table = make_trans('><','()') def remove_gaps(struct,seq): """Remove gaps function Some results comes with gaps that need to be removed """ seq = seq.replace('-','') tmp_struct = struct.split() #removes gaps tmp = '' for i in range(len(tmp_struct)): tmp = ''.join([tmp,tmp_struct[i]]) #put struct parts together struct = tmp if len(struct) != len(seq): #check so that struct and seq match in length raise ValueError, 'Sequence length don\'t match structure length' return struct,seq def NameList(data=None): """ Takes coves results and retrieves the sequence names for further parsing """ nameList = [] if not isinstance(data,list): data = open(data).readlines() for line in data: if line.__contains__('bits'): #every unique sequense begins with 'bits' line = line.split() nameList.append(line[-1]) return nameList def convert_to_vienna(data): """ Converts into vienna dot bracket format, >< to () """ try: return toPairs(ViennaStructure(data.translate(cove_to_vienna_table))) except IndexError: return '' def toPairs(vienna): """ Converts a vienna structure to a pairs obejct """ pairs = vienna.toPairs() return pairs PyCogent-1.5.3/cogent/parse/ct.py000644 000765 
000024 00000006276 12024702176 17621 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Parser for the ct RNA secondary structure format.

Works on ct files containing one or more structures. Supports:
Carnac, dynalign, mfold, sfold, unafold, knetfold.

Should work on all ct formats conforming to:
header, structure, header, structure ...

The header is the line beginning every structure, containing the length,
energy, and input file, e.g.:

72 ENERGY = -23.4 trna_phe.fasta

Currently only works on multiple-structure files if the header lines contain
the word 'Structure', 'ENERGY' or 'dG'. Further support will be added as
needed. The convention of the connect (ct) format is to include
'ENERGY = value' as the header above each structure (value left blank if
not applicable).
"""
from string import split, atof
from cogent.struct.rna2d import Pairs
from cogent.struct.pairs_util import adjust_base

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

def ct_parser(lines=None):
    """Ct format parser

    Takes lines from a ct file as input.
    Returns a list containing sequence, structure and, if available, the
    energy: [[seq1,[struct1],energy1],[seq2,[struct2],energy2],...]
    """
    count = 0
    energy = None
    seq = ''
    struct = []
    result = []
    for line in lines:
        count += 1
        sline = line.split(None, 6)  # sline = split line
        if count == 1 or new_struct(line):  # first line or new struct line
            if count > 1:
                # finish the previous structure before starting a new one
                struct = adjust_base(struct, -1)
                struct = Pairs(struct).directed()
                struct.sort()
                if energy is not None:
                    result.append([seq, struct, energy])
                    energy = None
                else:
                    result.append([seq, struct])
                struct = []
                seq = ''
            # checks if energy for predicted struct is given
            if sline.__contains__('dG') or sline.__contains__('ENERGY'):
                energy = atof(sline[3])
            if sline.__contains__('Structure'):
                energy = atof(sline[2])
        else:
            seq = ''.join([seq, sline[1]])
            if not int(sline[4]) == 0:  # paired base
                pair = (int(sline[0]), int(sline[4]))
                struct.append(pair)
    # structs are one(1) based, adjust to zero based
    struct = adjust_base(struct, -1)
    struct = Pairs(struct).directed()
    struct.sort()
    if energy is not None:
        result.append([seq, struct, energy])
    else:
        result.append([seq, struct])
    return result

def new_struct(line):
    """Determines if a new structure begins on the line in question.

    Currently only works for multiple-structure files containing the key
    words 'Structure', 'dG' or 'ENERGY' in their headers. The convention of
    the connect (ct) format is to include 'ENERGY = value' (value left
    blank if not applicable). Support for additional formats will be added
    as needed.
    """
    answer = False
    if 'Structure' in line or 'dG' in line or 'ENERGY' in line:
        answer = True
    return answer

PyCogent-1.5.3/cogent/parse/cut.py000644 000765 000024 00000001373 12024702176 020003 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""Parser for the EMBOSS .cut codon usage table.

Currently just reads the codons and their counts into a dict.
"""
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"

def cut_parser(lines):
    """cut format parser

    Takes lines from a cut file as input.
    Returns dict of {codon:count}.
""" result = {} for line in lines: if line.startswith('#'): continue if not line.strip(): continue fields = line.split() result[fields[0]] = float(fields[-1]) return result PyCogent-1.5.3/cogent/parse/cutg.py000644 000765 000024 00000016103 12024702176 020147 0ustar00jrideoutstaff000000 000000 #/usr/bin/env python """Parsers for CUTG codon and spsum files. Notes The CUTG format specifiers are highly inaccurate. For the species sum records, colons can appear in the organism name (e.g. for some bacterial strains), so it's important to split on the _last_ colon. For the codon records, the 'nt..nt' field now stores the GenBank location line in all its complexity. The length field precedes the PID field. There is not really a separator between the title and the descriptions. Some of the database _names_ contain '/', the field delimiter. For now, these are just skipped, since writing something sufficiently general to detect whether they are quoted is considerably more effort than the reward warrants. 
""" from cogent.parse.record_finder import LabeledRecordFinder from cogent.parse.record import RecordError, DelimitedSplitter from cogent.core.info import Info, DbRef from cogent.util.misc import caps_from_underscores as cfu from cogent.core.usage import CodonUsage from string import strip __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" def is_cutg_label(x): """Checks if x looks like a CUTG label line.""" return x.startswith('>') def is_cutg_species_label(x): """Checks if x looks like a CUTG label line.""" return ':' in x def is_blank(x): """Checks if x is blank.""" return (not x) or x.isspace() CutgSpeciesFinder = LabeledRecordFinder(is_cutg_species_label, ignore=is_blank) CutgFinder = LabeledRecordFinder(is_cutg_label, ignore=is_blank) codon_order = "CGA CGC CGG CGU AGA AGG CUA CUC CUG CUU UUA UUG UCA UCC UCG UCU AGC AGU ACA ACC ACG ACU CCA CCC CCG CCU GCA GCC GCG GCU GGA GGC GGG GGU GUA GUC GUG GUU AAA AAG AAC AAU CAA CAG CAC CAU GAA GAG GAC GAU UAC UAU UGC UGU UUC UUU AUA AUC AUU AUG UGG UAA UAG UGA".split() #NOTE: following field order omits Locus/CDS (first field), which needs further #processing. Use zip(field_order, fields[1:]) and handle first field specially. field_order = "GenBank Location Length GenPept Species Description".split() species_label_splitter = DelimitedSplitter(':', -1) def CutgSpeciesParser(infile, strict=True, constructor=CodonUsage): """Yields successive sequences from infile as CodonUsage objects. If strict is True (default), raises RecordError when label or seq missing. 
""" if not strict: #easier to see logic without detailed error handling for rec in CutgSpeciesFinder(infile): try: label, counts = rec if not is_cutg_species_label(label): continue species, genes = species_label_splitter(label) info = Info({'Species':species, 'NumGenes':int(genes)}) freqs = constructor(zip(codon_order, map(int, counts.split())), Info=info) yield freqs except: continue else: for rec in CutgSpeciesFinder(infile): try: label, counts = rec except ValueError: #can't have got any counts raise RecordError, "Found label without sequences: %s" % rec if not is_cutg_species_label(label): raise RecordError, "Found CUTG record without label: %s" % rec species, genes = species_label_splitter(label) info = Info({'Species':species, 'NumGenes':int(genes)}) try: d = zip(codon_order, map(int, counts.split())) freqs = constructor(d, Info=info) except: raise RecordError, "Unable to convert counts: %s" % counts yield freqs def InfoFromLabel(line): """Takes a CUTG codon description line and returns an Info object. Raises RecordError if wrong number of fields etc. """ try: raw_fields = line.split('\\') result = Info(dict(zip(field_order, map(strip, raw_fields[1:])))) #extra processing for first field first = raw_fields[0] if '#' in first: locus, cds_num = map(strip, raw_fields[0].split('#')) else: locus, cds_num = first, '1' result['Locus'] = locus[1:] #remove leading '>' result['CdsNumber'] = cds_num #additional processing for last field: mostly key="value" pairs description = result['Description'] descrs = description.split('/') for d in descrs: if '=' in d: #assume key-value pair key, val = map(strip, d.split('=', 1)) #might be '=' in value #cut off leading and trailing " if present, but _not_ internal! if val.startswith('"'): val = val[1:] if val.endswith('"'): val = val[:-1] if key == 'db_xref': #handle cross-refs specially try: key, val = val.split(':') except ValueError: #missing actual reference? 
continue #just skip the bad db records try: if result[key]: result[key].append(val) else: result[key] = [val] except (KeyError, TypeError): #didn't recognize database result[key] = val else: #remember to convert the key to MixedCase naming convention result[cfu(key)] = val return result except: raise RecordError, "Failed to read label line:\n%s" % line def CutgParser(infile, strict=True, constructor=CodonUsage): """Yields successive sequences from infile as CodonUsage objects. If strict is True (default), raises RecordError when label or seq missing. """ if not strict: #not much error checking needed: following makes logic clear for rec in CutgFinder(infile): try: label, counts = rec if not is_cutg_label(label): continue info = InfoFromLabel(label) freqs = constructor(zip(codon_order, map(int, counts.split())), Info=info) yield freqs except: continue else: #need to do more detailed error checking count = 0 for rec in CutgFinder(infile): try: label, counts = rec except ValueError: #can't have got any counts raise RecordError, "Found label without sequences: %s" % rec if not is_cutg_label(label): raise RecordError, "Found CUTG record without label: %s" % rec info = InfoFromLabel(label) try: freqs = constructor(zip(codon_order, map(int, counts.split())), Info=info) except NotImplementedError: raise RecordError, "Unable to convert counts: %s" % counts yield freqs PyCogent-1.5.3/cogent/parse/dialign.py000644 000765 000024 00000004157 12024702176 020622 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import re from cogent import ASCII __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" _header = re.compile("^\s+[=]+") _quality_scores = re.compile("^ +\d+[\s\d]*$") def align_block_lines(lines): counter = 0 for line in lines: if "Alignment (DIALIGN format):" in 
line: counter += 1 continue elif counter == 1 and _header.findall(line): counter += 2 continue elif not counter or not line: continue elif "Sequence tree:" in line: break yield line def parse_data_line(line): if _quality_scores.findall(line): line = line.split() name = None seq = "".join(line) elif line[0].isspace(): name, seq = None, None else: line = line.split() name = line[0] seq = "".join(line[2:]) return name, seq def DialignParser(lines, seq_maker=None, get_scores=False): """Yields label, sequence pairs. The alignment quality info is recorded in the sequence case and the score line. Font info can be handled by providing a custom seq_maker function. The quality scores are returned as the last value pair with name 'QualityScores' when get_scores is True.""" if seq_maker is None: seq_maker = ASCII.Sequence seqs = {} quality_scores = [] for line in align_block_lines(lines): name, seq = parse_data_line(line) if seq is None: continue elif name is None and seq: quality_scores.append(seq) continue if name in seqs: seqs[name].append(seq) else: seqs[name] = [seq] # concat sequence blocks for name, seq_segs in seqs.items(): seq = "".join(seq_segs) yield name, seq_maker(seq, Name=name) if get_scores: yield "QualityScores", "".join(quality_scores) PyCogent-1.5.3/cogent/parse/dotur.py000644 000765 000024 00000002761 12024702176 020347 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file cogent.parse.dotur.py """Parses various Dotur output formats.""" from record_finder import is_empty __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" def get_otu_lists(data): """Returns list of lists of OTUs given data. 
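The DIALIGN data-line handling above distinguishes quality-score rows (columns of digits) from sequence rows (`name offset segment ...`). A self-contained sketch of that dispatch, reproducing the `_quality_scores` regex and the core of `parse_data_line` without the rest of the library:

```python
import re

# All-digit rows with leading whitespace are quality scores, per the
# DIALIGN parser above.
_quality_scores = re.compile(r"^ +\d+[\s\d]*$")

def parse_data_line(line):
    """Return (name, seq): (None, scores) for quality rows,
    (None, None) for whitespace rows, (name, seq) for sequence rows."""
    if _quality_scores.findall(line):
        return None, ''.join(line.split())    # scores row: no name
    if line[0].isspace():
        return None, None                     # separator row
    fields = line.split()
    return fields[0], ''.join(fields[2:])     # name, concatenated segments
```

Note that, as in the original, the second field (the position offset) of a sequence row is discarded.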
    - data: list of OTUs in following format:
        ['seq_1,seq_2,seq_3','seq_4,seq_5','seq_6','seq_7,seq_8']
    """
    return [i.split(',') for i in data]

def OtuListParser(lines, ignore=is_empty):
    """Parser for *.list file format dotur result.

    - Result will be list of lists with following order:
        [[OTU distance, number of OTUs, [list of OTUs]],
         [OTU distance, number of OTUs, [list of OTUs]],
         [etc...]]
    """
    result = []
    if not lines:
        return result
    for line in lines:
        if ignore(line):
            continue
        curr_data = line.strip().split()
        # Get distance.  Replace 'unique' string with 0
        distance = float(curr_data[0].upper().replace('UNIQUE', '0'))
        # number of OTUs is second column
        num_otus = int(curr_data[1])
        # remaining columns contain lists of OTUs
        otu_list = get_otu_lists(curr_data[2:])
        result.append([distance, num_otus, otu_list])
    return result

PyCogent-1.5.3/cogent/parse/dynalign.py

#!/usr/bin/env python

from cogent.parse.ct import ct_parser

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

def dynalign_parser(lines=None):
    """Parser for dynalign output"""
    return ct_parser(lines)

PyCogent-1.5.3/cogent/parse/ebi.py

#!/usr/bin/env python
"""Provide a parser for SwissProt EBI format files.
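The dotur `OtuListParser` above can be exercised without the library by stubbing `is_empty` with a plain blank-line test; this standalone sketch (lowercase names to mark it as a copy, input rows invented) mirrors its logic:

```python
def get_otu_lists(data):
    # ['a,b', 'c'] -> [['a', 'b'], ['c']]
    return [i.split(',') for i in data]

def otu_list_parser(lines, ignore=lambda line: not line.strip()):
    # Mirrors OtuListParser above: each row is "distance  n_otus  otu,otu ..."
    result = []
    for line in lines or []:
        if ignore(line):
            continue
        curr = line.strip().split()
        # 'unique' rows are recorded as distance 0
        distance = float(curr[0].upper().replace('UNIQUE', '0'))
        num_otus = int(curr[1])
        result.append([distance, num_otus, get_otu_lists(curr[2:])])
    return result
```

Feeding it `['unique 2 seq1,seq2 seq3', '', '0.01 1 seq1,seq2,seq3']` yields one entry per non-blank row, with the 'unique' distance mapped to 0.0.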
""" import sys from string import maketrans, strip, rstrip from pprint import pprint, pformat from cogent.parse.record_finder import DelimitedRecordFinder,\ LabeledRecordFinder, is_empty, TailedRecordFinder from cogent.parse.record import RecordError, FieldError from cogent.util.misc import identity, curry,\ NestedSplitter, list_flatten from cogent.core.sequence import Sequence __author__ = "Zongzhi Liu and Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Zongzhi Liu", "Sandra Smit", "Rob Knight", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Zongzhi Liu" __email__ = "zongzhi.liu@gmail.com" __status__ = "Development" all_chars = maketrans('','') def rstrip_(chars=None): return curry(rstrip, chars=chars) EbiFinder = DelimitedRecordFinder('//', constructor=rstrip) no_indent = lambda s: not s.startswith(' ') hanging_paragraph_finder = LabeledRecordFinder(no_indent, constructor=None) endswith_period = lambda x: x.endswith('.') period_tail_finder = TailedRecordFinder(endswith_period) ################################# # pairs_to_dict def pairs_to_dict(key_values, dict_mode=None, all_keys=None, handlers={}, default_handler=None): """generate a function which return a dict from a sequence of key_value pairs. key_values: (key, value) pairs, from any sequence type. Example: [('a', 1), ('b', 2), ('b', 3)] dict_mode: one of four modes to build a dict from pairs. 'overwrite_value': default, same as dict(key_values) the key_values example get {'a': 1, 'b': 3} 'no_duplicated_key': raise error when there is duplicated key; or a dict with a list of values when there is duplicated key 'allow_muti_value': a duplicated key will have a list of values, the example get {'a': 1, 'b': [2, 3]} 'always_multi_value': always group value(s) into a list for each key; the example get {'a': [1], 'b': [2, 3]} all_keys: if a key not found in all_keys, raise error; recommend to use a dict for all_keys for efficiency. 
Each value will be converted, if a valid handler can be found in handlers or a default_handler is provided. handler = handlers.get(key, default_handler) When handlers provided but no valid handler is found for a key: raise ValueError. Always use original value, if no handlers provided, for example, pairs_to_dict(adict.items()) will return adict. Note: use default_handler=identity is often useful if you want to return the original value when no handler found. """ if not dict_mode: dict_mode = 'overwrite_value' #generate add_item for different dict_mode. if dict_mode=='always_multi_value': def add_item(dictionary, key, value): """add key, value to dictionary in place""" dictionary.setdefault(key, []).append(value) elif dict_mode=='allow_multi_value': multiples = {} #auxillary dict recording the keys with multi_values def add_item(dictionary, key, value): """add key, value to dictionary in place Warning: using outer auxillary dictionary: multiples""" if key in dictionary: if not key in multiples: multiples[key] = True dictionary[key] = [dictionary[key]] dictionary[key].append(value) else: dictionary[key] = value elif dict_mode=='no_duplicated_key': def add_item(dictionary, key, value): """add key, value to dictionary in place""" if key in dictionary: raise ValueError('Duplicated Key') dictionary[key] = value elif dict_mode=='overwrite_value': def add_item(dictionary, key, value): """add key, value to dictionary in place""" dictionary[key] = value else: # unknown dict_mode raise ValueError('Unknown dict_mode:%s. \ndict_mode must be one of ' 'overwrite_value, no_duplicated_key, allow_multi_value and ' 'always_multi_value.' % dict_mode) #generate the handle_value function. 
if not handlers and not default_handler: handle_value = lambda x, y: (x, y) else: #handlers not empty, def handle_value(key, raw_value): handler = handlers.get(key, default_handler) if handler: value = handler(raw_value) else: #no handler found for key raise ValueError('No handler found for %s' % key) return key, value #build the result dict. result = {} for key, raw_value in key_values: if all_keys and key not in all_keys: raise ValueError('key: %s not in all_keys: %s' % (repr(key), all_keys)) key, value = handle_value(key, raw_value) add_item(result, key, value) return result ################################# # generic parsers def linecode_maker(line): """return the linecode and the line. The two-character line code that begins each line is always followed by three blanks, so that the actual information begins with the sixth character.""" linecode = line.split(' ', 1)[0] return linecode, line def labeloff(lines, splice_from=5): """strip off the first splice_from characters from each line Warning: without check!""" return [line[splice_from:] for line in lines] def join_parser(lines, join_str=' ', chars_to_strip=' ;.'): """return a joined str from a list of lines, strip off chars requested from the joined str""" #a str will not be joined if isinstance(lines, basestring): result = lines else: result = join_str.join(lines) return result.strip(chars_to_strip) def join_split_parser(lines, delimiters=';', item_modifier=strip, same_level=False, **kwargs): """return a nested list from lines, join lines before using NestedSplitter. delimiters: delimiters used by NestedSplitter item_modifier: passed to NestedSplitter, modify each splitted item. 
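Of the four `dict_mode` behaviours of `pairs_to_dict` described above, 'allow_multi_value' is the least obvious; this minimal standalone sketch (hypothetical name `pairs_to_multidict`) implements just that mode, including the auxiliary set that tracks which keys have already been promoted to lists:

```python
def pairs_to_multidict(key_values):
    # 'allow_multi_value' behaviour from pairs_to_dict above: a key seen
    # once maps to its bare value; on a repeat the value is wrapped in a
    # list and further values are appended.
    result, multiples = {}, set()
    for key, value in key_values:
        if key in result:
            if key not in multiples:
                multiples.add(key)
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result
```

This reproduces the docstring example: `[('a', 1), ('b', 2), ('b', 3)]` becomes `{'a': 1, 'b': [2, 3]}`.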
    kwargs: passed to join_parser

    Examples:
    join_split_parser(['aa; bb;', 'cc.']) -> ['aa', 'bb', 'cc']
    join_split_parser(['aa; bb, bbb;', 'cc.'], delimiters=';,')
        -> ['aa', ['bb','bbb'], 'cc']
    join_split_parser('aa (bb) (cc).', delimiters='(',
        item_modifier=rstrip_(')')) -> ['aa','bb','cc']
    """
    result = join_parser(lines, **kwargs)
    return NestedSplitter(delimiters, constructor=item_modifier,
            same_level=same_level)(result)

def join_split_dict_parser(lines, delimiters=[';', ('=',1), ','],
        dict_mode=None, strict=True, **kwargs):
    """return a dict from lines, using the split pairs from
    join_split_parser and pairs_to_dict.

    delimiters, kwargs: pass to join_split_parser

    strict: when dict() fails (a pair not split by the second delimiter),
    return the unconstructed list when False, or raise an error when True
    (default).

    dict_mode: pass to pairs_to_dict. If left as None, will be
    'overwrite_value', which is the same as dict(pairs).

    Examples:
    join_split_dict_parser(['aa=1; bb=2,3; cc=4 (if aa=1);'])
        -> {'aa':'1', 'bb': ['2','3'], 'cc': '4 (if aa=1)'}
    """
    primary_delimiters, value_delimiters = delimiters[:2], delimiters[2:]
    pairs = join_split_parser(lines, delimiters=primary_delimiters,
            same_level=True, **kwargs)
    try:
        dict(pairs)  # catch error for any unsplit pair.
    except ValueError, e:  # dictionary update sequence element #1 has length 1
        if strict:
            # fixed: the exception was previously dropped ('e' was quoted)
            raise ValueError('%s\nFailed to get a dict from pairs: %s'
                    % (e, pairs))
        else:
            # return the split list without constructing a dict
            return pairs
    if value_delimiters:
        split_value = NestedSplitter(value_delimiters, same_level=False)
        # should raise ValueError here if a pair does not have two elems.
        for i, (k, v) in enumerate(pairs):
            v = split_value(v)
            # modify v only if split by the first delimiter
            if len(v) > 1:
                pairs[i][1] = v
    result = pairs_to_dict(pairs, dict_mode)
    return result

def mapping_parser(line, fields, delimiters=[';', None], flatten=list_flatten):
    """return a dict of zip(fields, split line). A None key will be deleted
    from the result dict.
line: should be a str, to be splitted. fields: field name and optional type constructor for mapping. example: ['EntryName', ('Length', int), 'MolType'] delimiters: separators used to split the line. flatten: a function used to flatten the list from nested splitting. """ splits = NestedSplitter(delimiters=delimiters)(line) values = flatten(splits) result = {} for f, v in zip(fields, values): if isinstance(f, (tuple, list)): name, type = f result[name] = type(v) else: result[f] = v if None in result: del result[None] return result ################################# # individual parsers ################################# ################################# # mapping parsers: id, sq # def id_parser(lines): """return a mapping dict from id lines (only one line). The ID (IDentification) line is always the first line of an entry. The general form of the ID line is: ID ENTRY_NAME DATA_CLASS; MOLECULE_TYPE; SEQUENCE_LENGTH. Example: ID CYC_BOVIN STANDARD; PRT; 104 AA. """ lines = labeloff(lines) return mapping_parser(lines[0], delimiters=[';',None], fields=('EntryName','DataClass','MolType',('Length',int))) def sq_parser(lines): """return a mapping dict from SQ lines (only one line). The SQ (SeQuence header) line marks the beginning of the sequence data and gives a quick summary of its content. The format of the SQ line is: SQ SEQUENCE XXXX AA; XXXXX MW; XXXXXXXXXXXXXXXX CRC64; The line contains the length of the sequence in amino acids ('AA') followed by the molecular weight ('MW') rounded to the nearest mass unit (Dalton) and the sequence 64-bit CRC (Cyclic Redundancy Check) value ('CRC64'). """ lines = labeloff(lines) return mapping_parser(lines[0], delimiters=[';',None], fields=(None, ('Length', int), None, ('MolWeight',int), None, 'Crc64')) def kw_parser(lines): """return a list of keywords from KW lines. The format of the KW line is: KW Keyword[; Keyword...]. 
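`id_parser` above delegates to `mapping_parser` and the library's `NestedSplitter`; the zip-with-optional-constructor idea can be shown standalone (hypothetical helper `parse_id_line`, no cogent imports), using the CYC_BOVIN example from the docstring:

```python
def parse_id_line(line):
    """Sketch of id_parser above: 'ID   EntryName DataClass; MolType; NNN AA.'

    Fields may be bare names or (name, constructor) pairs, mirroring the
    mapping_parser convention; extra trailing tokens (the 'AA.' unit) are
    dropped by zip.
    """
    fields = ('EntryName', 'DataClass', 'MolType', ('Length', int))
    # strip the 5-char 'ID   ' label, split on ';' then whitespace
    values = [v for part in line[5:].split(';') for v in part.split()]
    result = {}
    for f, v in zip(fields, values):
        if isinstance(f, tuple):
            name, ctor = f
            result[name] = ctor(v)
        else:
            result[f] = v
    return result
```

On `'ID   CYC_BOVIN STANDARD; PRT; 104 AA.'` this gives a dict whose `Length` is the int 104 rather than a string.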
""" lines = labeloff(lines) return join_split_parser(lines) def ac_parser(lines): """return a list of accession numbers from AC lines. The AC (ACcession number) line lists the accession number(s) associated with an entry. The format of the AC line is: AC AC_number_1;[ AC_number_2;]...[ AC_number_N;] The first accession number is commonly referred to as the 'primary accession number'. 'Secondary accession numbers' are sorted alphanumerically. """ lines = labeloff(lines) return join_split_parser(lines) def dt_parser(lines): """return the origal lines from DT lines. Note: not complete parsing The DT (DaTe) lines show the date of creation and last modification of the database entry. The format of the DT line in Swiss-Prot is: DT DD-MMM-YYYY (Rel. XX, Comment) The format of the DT line in TrEMBL is: DT DD-MMM-YYYY (TrEMBLrel. XX, Comment) There are always three DT lines in each entry, each of them is associated with a specific comment: * The first DT line indicates when the entry first appeared in the database. The comment is 'Created'; * The second DT line indicates when the sequence data was last modified. comment is 'Last sequence update'; * The third DT line indicates when data (see the note below) other than the sequence was last modified. 'Last annotation update'. Example of a block of Swiss-Prot DT lines: DT 01-AUG-1988 (Rel. 08, Created) DT 30-MAY-2000 (Rel. 39, Last sequence update) DT 10-MAY-2005 (Rel. 47, Last annotation update) """ lines = labeloff(lines) return lines ################################# # gn_parser def gn_parser(lines): """return a list of dict from GN lines. The GN (Gene Name) line indicates the name(s) of the gene(s) that code for the stored protein sequence. The GN line contains three types of information: Gene names, Ordered locus names, ORF names. format: GN Name=; Synonyms=[, ...]; OrderedLocusNames=[, ...]; GN ORFNames=[, ...]; None of the above four tokens are mandatory. 
But a "Synonyms" token can only be present if there is a "Name" token. If there is more than one gene, GN line blocks for the different genes are separated by the following line: GN and Example: GN Name=Jon99Cii; Synonyms=SER1, SER5, Ser99Da; ORFNames=CG7877; GN and GN Name=Jon99Ciii; Synonyms=SER2, SER5, Ser99Db; ORFNames=CG15519;""" lines = labeloff(lines) return map(gn_itemparser, gn_itemfinder(lines)) gn_itemparser = join_split_dict_parser gn_itemfinder = DelimitedRecordFinder('and', constructor=None, strict=False, keep_delimiter=False) def oc_parser(lines): """return a list from OC lines. The OC (Organism Classification) lines contain the taxonomic classification of the source organism. The classification is listed top-down as nodes in a taxonomic tree in which the most general grouping is given first. format: OC Node[; Node...]. """ lines = labeloff(lines) return join_split_parser(lines) def os_parser(lines): """return a list from OS lines. OS (Organism Species) line specifies the organism which was the source of the stored sequence. The last OS line is terminated by a period. The species designation consists, in most cases, of the Latin genus and species designation followed by the English name (in parentheses). For viruses, only the common English name is given. Examples of OS lines are shown here: OS Escherichia coli. OS Solanum melongena (Eggplant) (Aubergine). OS Rous sarcoma virus (strain SchRuppin A) (RSV-SRA) (Avian leukosis OS virus-RSA). """ lines = labeloff(lines) return join_split_parser(lines, delimiters='(', item_modifier=rstrip_(') ')) def ox_parser(lines): """return a dict from OX lines. The OX (Organism taxonomy cross-reference) line is used to indicate the identifier of a specific organism in a taxonomic database. 
The format: OX Taxonomy_database_Qualifier=Taxonomic code; Currently the cross-references are made to the taxonomy database of NCBI, which is associated with the qualifier 'TaxID' and a one- to six-digit taxonomic code.""" lines = labeloff(lines) return join_split_dict_parser(lines) def og_parser(lines): """return a list from OG lines The OG (OrGanelle) line indicates if the gene coding for a protein originates from the mitochondria, the chloroplast, the cyanelle, the nucleomorph or a plasmid. The format of the OG line is: OG Hydrogenosome. OG Mitochondrion. OG Nucleomorph. OG Plasmid name. OG Plastid. OG Plastid; Apicoplast. OG Plastid; Chloroplast. OG Plastid; Cyanelle. OG Plastid; Non-photosynthetic plastid. Where 'name' is the name of the plasmid. example: OG Mitochondrion. OG Plasmid R6-5, Plasmid IncFII R100 (NR1), and OG Plasmid IncFII R1-19 (R1 drd-19).""" lines = labeloff(lines) result = [] for item in period_tail_finder(lines): item = ' '.join(item).rstrip('. ') if item.startswith('Plasmid'): item = item.replace(' and', '') item = map(strip, item.split(',')) result.append(item) return result ################################# # dr_parser def dr_parser(lines): """return a dict of items from DR lines. The DR (Database cross-Reference) lines are used as pointers to information related to entries and found in data collections other than Swiss-Prot. The format of one of many DR line is: DR DATABASE_IDENTIFIER; PRIMARY_IDENTIFIER; SECONDARY_IDENTIFIER[; TERTIARY_IDENTIFIER][; QUATERNARY_IDENTIFIER]. """ lines = labeloff(lines) keyvalues = map(dr_itemparser, period_tail_finder(lines)) result = pairs_to_dict(keyvalues, 'always_multi_value') return result def dr_itemparser(lines): """return a key, value pair from lines of a DR item. 
""" fields = join_split_parser(lines) return fields[0], fields[1:] ################################# # de_parser def de_parser(lines): """return a dict of {OfficalName: str, Synonyms: str, Fragment: bool, Contains: [itemdict,], Includes: [itemdict,]} from DE lines The DE (DEscription) lines contain general descriptive information about the sequence stored. This information is generally sufficient to identify the protein precisely. The description always starts with the proposed official name of the protein. Synonyms are indicated between brackets. Examples below If a protein is known to be cleaved into multiple functional components, the description starts with the name of the precursor protein, followed by a section delimited by '[Contains: ...]'. All the individual components are listed in that section and are separated by semi-colons (';'). Synonyms are allowed at the level of the precursor and for each individual component. If a protein is known to include multiple functional domains each of which is described by a different name, the description starts with the name of the overall protein, followed by a section delimited by '[Includes: ]'. All the domains are listed in that section and are separated by semi-colons (';'). Synonyms are allowed at the level of the protein and for each individual domain. In rare cases, the functional domains of an enzyme are cleaved, but the catalytic activity can only be observed, when the individual chains reorganize in a complex. Such proteins are described in the DE line by a combination of both '[Includes:...]' and '[Contains:...]', in the order given in the following example: If the complete sequence is not determined, the last information given on the DE lines is '(Fragment)' or '(Fragments)'. Example: DE Dihydrodipicolinate reductase (EC 1.3.1.26) (DHPR) (Fragment). 
DE Arginine biosynthesis bifunctional protein argJ [Includes: Glutamate DE N-acetyltransferase (EC 2.3.1.35) (Ornithine acetyltransferase) DE (Ornithine transacetylase) (OATase); Amino-acid acetyltransferase DE (EC 2.3.1.1) (N-acetylglutamate synthase) (AGS)] [Contains: Arginine DE biosynthesis bifunctional protein argJ alpha chain; Arginine DE biosynthesis bifunctional protein argJ beta chain] (Fragment). Trouble maker: DE Amiloride-sensitive amine oxidase [copper-containing] precursor(EC DE 1.4.3.6) (Diamine oxidase) (DAO). """ labeloff_lines = labeloff(lines) joined = join_parser(labeloff_lines, chars_to_strip='). ') keys = ['Includes', 'Contains', 'Fragment'] fragment_label = '(Fragment' contains_label = '[Contains:' includes_label = '[Includes:' #Process Fragment fragment = False if joined.endswith(fragment_label): fragment = True joined = joined.rsplit('(', 1)[0] #Process Contains contains = [] if contains_label in joined: joined, contains_str = joined.split(contains_label) contains_str = contains_str.strip(' ]') contains = map(de_itemparser, contains_str.split('; ')) #Process Includes includes = [] if includes_label in joined: joined, includes_str = joined.split(includes_label) includes_str = includes_str.strip(' ]') includes = map(de_itemparser, includes_str.split('; ')) #Process Primary primary = de_itemparser(joined) result = dict(zip(keys, (includes, contains, fragment))) result.update(primary) return result def de_itemparser(line): """return a dict of {OfficalName: str, Synonyms: [str,]} from a de_item The description item is a str, always starts with the proposed official name of the protein. Synonyms are indicated between brackets. 
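The name-plus-bracketed-synonyms convention just described is what `de_itemparser` below implements; a standalone copy of its split (keeping the library's original `OfficalName` key spelling, misspelling and all, since downstream code uses it):

```python
def parse_de_item(line):
    # Mirrors de_itemparser below: official name first, then one
    # parenthesised synonym per '(' group.
    fields = [e.strip(') ') for e in line.split('(')]
    return {'OfficalName': fields[0], 'Synonyms': fields[1:]}
```

On the Annexin example from the docstring this yields the official name plus a three-element synonym list; a line with no parentheses yields an empty `Synonyms` list.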
Examples below 'Annexin A5 (Annexin V) (Lipocortin V) (Endonexin II)' """ fieldnames = ['OfficalName', 'Synonyms'] fields = [e.strip(') ') for e in line.split('(')] #if no '(', fields[1:] will be [] return dict(zip(fieldnames, [fields[0], fields[1:]])) ################################# # ft_parser def ft_parser(lines): """return a list of ft items from FT lines. The FT (Feature Table) lines lists posttranslational modifications, binding sites, enzyme active sites, local secondary structure or other characteristics reported in the cited references. Sequence conflicts between references are also included in the feature table. The FT lines have a fixed format. The column numbers allocated to each of the data items within each FT line are shown in the following table (column numbers not referred to in the table are always occupied by blanks). Columns Data item 1-2 FT 6-13 Key name 15-20 'From' endpoint 22-27 'To' endpoint 35-75 Description The key name and the endpoints are always on a single line, but the description may require one or more additional lines. The following description lines continues from column 35 onwards. For more information about individual ft keys, see http://us.expasy.org/sprot/userman.html#FT_keys 'FROM' and 'TO' endpoints: Numbering start from 1; When a feature is known to extend beyond the position that is given in the feature table, the endpoint specification will be preceded by '<' for features which continue to the left end (N-terminal direction) or by '>' for features which continue to the right end (C- terminal direction); Unknown endpoints are denoted by '?'. Uncertain endpoints are denoted by a '?' before the position, e.g. '?42'. Some features (CARBOHYD, CHAIN, PEPTIDE, PROPEP, VARIANT and VARSPLIC) are associated with a unique and stable feature identifier (FTId), which allows to construct links directly from position-specific annotation in the feature table to specialized protein-related databases. 
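The fixed-column layout in the FT table above (key in columns 6-13, 'from' in 15-20, 'to' in 22-27, description from 35) translates directly into string slices; a minimal sketch (hypothetical helper `split_ft_line`, first-line fields only, no continuation handling):

```python
def split_ft_line(line):
    # 1-based columns from the FT table above become 0-based slices.
    key = line[5:13].strip()     # columns 6-13: key name
    start = line[14:20].strip()  # columns 15-20: 'from' endpoint
    end = line[21:27].strip()    # columns 22-27: 'to' endpoint
    desc = line[34:].strip()     # columns 35+: description
    return key, start, end, desc
```

Endpoints are left as strings here, since as noted above they may carry `<`, `>`, or `?` qualifiers that do not survive `int()`.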
The FTId is always the last component of a feature in the description field. Examples: FT SIGNAL <1 10 By similarity. FT MOD_RES 41 41 Arginine amide (G-42 provides amide FT group) (By similarity). FT CONFLICT 327 327 E -> R (in Ref. 2). FT CONFLICT 77 77 Missing (in Ref. 1). FT CARBOHYD 251 251 N-linked (GlcNAc...). FT /FTId=CAR_000070. FT PROPEP 25 48 FT /FTId=PRO_0000021449. FT VARIANT 214 214 V -> I. FT /FTId=VAR_009122. FT VARSPLIC 33 83 TVGRFRRRATP -> PLTSFHPFTSQMPP (in FT isoform 2). FT /FTId=VSP_004370. Secondary structure (HELIX, STRAND, TURN) - The feature table of sequence entries of proteins whose tertiary structure is known experimentally contains the secondary structure information extracted from the coordinate data sets of the Protein Data Bank (PDB). Residues not specified in one of these classes are in a 'loop' or 'random-coil' structure. """ lines = labeloff(lines) fieldnames = 'Start End Description'.split() secondary_structure_keynames = 'HELIX STRAND TURN'.split() result = {} for item in hanging_paragraph_finder(lines): keyname, start, end, description = ft_basic_itemparser(item) #group secondary structures (as a list) into #result['SecondaryStructure'] if keyname in secondary_structure_keynames: result.setdefault('SecondaryStructure', []).\ append((keyname, start, end)) continue #further parser the description for certain keynames if keyname in ft_description_parsers: description = ft_description_parsers[keyname](description) #group current item result (as a dict) into result[keyname] curr = dict(zip(fieldnames, [start, end, description])) result.setdefault(keyname, []). append(curr) return result def ft_basic_itemparser(item_lines): """-> (key, start, end, description) from lines of a feature item. A feature item (generated by itemfinder) has the same keyname. WARNING: not complete, location fields need further work? 
""" #cut_postions: the postions to split the line into fields original_cut_positions = [15,22,35] #see doc of ft_parser #keyname will start from 0(instead of 6) after labeloff cut_positions = [e - 6 for e in original_cut_positions] #unpack the first line to fields first_line = item_lines[0] keyname, from_point, to_point, description = \ [first_line[i:j].strip() for i,j in zip([0]+cut_positions,cut_positions+[None])] #extend the description if provided following lines if len(item_lines) > 1: following_lines = item_lines[1:] desc_start = cut_positions[-1] following_description = ' '.join( [e[desc_start:].strip() for e in following_lines]) description = ' '.join((description, following_description)) #convert start and end points to int, is possible from_point, to_point = map(try_int, (from_point, to_point)) return keyname, from_point, to_point, description.strip(' .') def try_int(obj): """return int(obj), or original obj if failed""" try: return int(obj) except ValueError: #invalid literal for int() return obj ### ft description_parsers below def ft_id_parser(description): """return a dict of {'Description':,'Id':} from raw decription str Examples. FT PROPEP 25 48 FT /FTId=PRO_0000021449. FT VARIANT 214 214 V -> I. FT /FTId=VAR_009122. FT VARSPLIC 33 83 TVGRFRRRATP -> PLTSFHPFTSQMPP (in FT isoform 2). FT /FTId=VSP_004370. """ fieldnames = ['Description', 'Id'] id_sep='/FTId=' try: desc, id = [i.strip(' .') for i in description.split(id_sep)] except: desc, id = description, '' #replace desc in fields with (desc, id) to get the result result = dict(zip(fieldnames, [desc, id])) return result def ft_mutation_parser(description, mutation_comment_delimiter='('): """return a dict of {'MutateFrom': , 'MutateTo':,'Comment':} from description str Warning: if both id and mutation should be parsed, always parse id first. Note: will add exceptions later Examples. FT VARIANT 214 214 V -> I (in a lung cancer). FT /FTId=VAR_009122. FT CONFLICT 484 484 Missing (in Ref. 2). 
FT CONFLICT 802 802 K -> Q (in Ref. 4, 5 and 10). """ fieldnames = 'MutateFrom MutateTo Comment'.split() #split desc into mutation and comment desc = description.rstrip(' )') try: mutation, comment = desc.split(mutation_comment_delimiter, 1) except ValueError, e: #need more than 1 value to unpack mutation, comment = desc, '' #split mutation into mut_from, mut_to #if mut_from/to unknown, the mutation message will be in mut_from mutation_delimiter = '->' try: mut_from, mut_to = map(strip, mutation.split(mutation_delimiter, 1)) except ValueError, e: #need more than 1 value to unpack mut_from, mut_to = mutation, '' #replace desc in fields with mut_from, mut_to and comment to get the result result = dict(zip(fieldnames, [mut_from, mut_to, comment])) return result def ft_mutagen_parser(description): """return a dict from a MUTAGEN description MUTAGEN - Site which has been experimentally altered. Examples FT MUTAGEN 119 119 C->R,E,A: Loss of cADPr hydrolase and FT ADP-ribosyl cyclase activity. FT MUTAGEN 169 177 Missing: Abolishes ATP-binding. """ return ft_mutation_parser(description, mutation_comment_delimiter=':') def ft_id_mutation_parser(description): """return a dict from description str Examples. FT VARIANT 214 214 V -> I. FT /FTId=VAR_009122. FT VARSPLIC 33 83 TVGRFRRRATP -> PLTSFHPFTSQMPP (in FT isoform 2). FT /FTId=VSP_004370.
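The two-stage split these parsers perform (peel off a trailing /FTId=, then break the remainder into from/to/comment) can be sketched in a standalone form; split_ftid and split_mutation are hypothetical names for this sketch, not the library's functions:

```python
# Minimal sketch (not the library code) of the description parsing
# described above: first peel off the '/FTId=' suffix, then split the
# remaining 'V -> I (comment)' shape into its parts.

def split_ftid(description):
    """Peel a trailing /FTId=... off a feature description."""
    if '/FTId=' in description:
        desc, ftid = description.split('/FTId=', 1)
        return desc.strip(' .'), ftid.strip(' .')
    return description.strip(' .'), ''

def split_mutation(desc):
    """Split "V -> I (in a lung cancer)" into (from, to, comment)."""
    desc = desc.rstrip(' )')
    mutation, _, comment = desc.partition('(')
    mut_from, _, mut_to = mutation.partition('->')
    return mut_from.strip(), mut_to.strip(), comment.strip()

desc, ftid = split_ftid('V -> I (in a lung cancer). /FTId=VAR_009122.')
print(split_mutation(desc), ftid)
# -> ('V', 'I', 'in a lung cancer') VAR_009122
```

As in the library version, a description with no '->' leaves the whole message in the from slot (e.g. 'Missing (in Ref. 2)' gives ('Missing', '', 'in Ref. 2')).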
""" desc_id_dict = ft_id_parser(description) desc = desc_id_dict.pop('Description') result = dict(desc_id_dict, **ft_mutation_parser(desc)) return result ft_description_parsers = { 'VARIANT': ft_id_mutation_parser, 'VARSPLIC': ft_id_mutation_parser, 'CARBOHYD':ft_id_parser, 'CHAIN': ft_id_parser, 'PEPTIDE':ft_id_parser, 'PROPEP': ft_id_parser, 'CONFLICT': ft_mutation_parser, 'MUTAGEN': ft_mutagen_parser, #'NON_TER': ft_choplast_parser, #'NON_CONS': ft_choplast_parser, } ################################# # cc_parser all_cc_topics = dict.fromkeys([ 'ALLERGEN', 'ALTERNATIVE PRODUCTS', 'BIOPHYSICOCHEMICAL PROPERTIES', 'BIOTECHNOLOGY', 'CATALYTIC ACTIVITY', 'CAUTION', 'COFACTOR', 'DATABASE', 'DEVELOPMENTAL STAGE', 'DISEASE', 'DOMAIN', 'ENZYME REGULATION', 'FUNCTION', 'INDUCTION', 'INTERACTION', 'MASS SPECTROMETRY', 'MISCELLANEOUS', 'PATHWAY', 'PHARMACEUTICAL', 'POLYMORPHISM', 'PTM', 'RNA EDITING', 'SIMILARITY', 'SUBCELLULAR LOCATION', 'SUBUNIT', 'TISSUE SPECIFICITY', 'TOXIC DOSE']) def cc_parser(lines, strict=False): """return a dict of {topic: a list of values} from CC lines. some topics have special format and will use specific parsers defined in handlers The CC lines are free text comments on the entry, and are used to convey any useful information. The comments always appear below the last reference line and are grouped together in comment blocks; a block is made up of 1 or more comment lines. The first line of a block starts with the characters '-!-'. Format: CC -!- TOPIC: First line of a comment block; CC second and subsequent lines of a comment block. Examples: CC -!- DISEASE: Defects in PHKA1 are linked to X-linked muscle CC glycogenosis [MIM:311870]. It is a disease characterized by slowly CC progressive, predominantly distal muscle weakness and atrophy. CC -!- DATABASE: NAME=Alzheimer Research Forum; NOTE=APP mutations; CC WWW="http://www.alzforum.org/res/com/mut/app/default.asp". 
CC -!- INTERACTION: CC Self; NbExp=1; IntAct=EBI-123485, EBI-123485; CC Q9W158:CG4612; NbExp=1; IntAct=EBI-123485, EBI-89895; CC Q9VYI0:fne; NbExp=1; IntAct=EBI-123485, EBI-126770; CC -!- BIOPHYSICOCHEMICAL PROPERTIES: CC Kinetic parameters: CC KM=98 uM for ATP; CC KM=688 uM for pyridoxal; CC Vmax=1.604 mmol/min/mg enzyme; CC pH dependence: CC Optimum pH is 6.0. Active from pH 4.5 to 10.5; CC -!- ALTERNATIVE PRODUCTS: CC Event=Alternative splicing; Named isoforms=3; CC Comment=Additional isoforms seem to exist. Experimental CC confirmation may be lacking for some isoforms; CC Name=1; Synonyms=AIRE-1; CC IsoId=O43918-1; Sequence=Displayed; CC Name=2; Synonyms=AIRE-2; CC IsoId=O43918-2; Sequence=VSP_004089; CC Name=3; Synonyms=AIRE-3; CC IsoId=O43918-3; Sequence=VSP_004089, VSP_004090; CC -------------------------------------------------------------------------- CC This SWISS-PROT entry is copyright. It is produced a collaboration CC removed. CC -------------------------------------------------------------------------- """ lines = labeloff(lines) #cc_itemfinder yield each topic block #cc_basic_itemparser split a topic block into (topic_name, content_as_list) topic_contents = map(cc_basic_itemparser, cc_itemfinder(lines)) #content of a topic further parsed using a content_parser decided by the #topic name. result is grouped into a dict. try: result = pairs_to_dict(topic_contents, 'always_multi_value', handlers=cc_content_parsers, default_handler=join_parser) except Exception, e: pprint( lines) raise e if strict: for topic in result: if topic not in all_cc_topics: raise FieldError('Invalid topic: %s' % topic) return result def cc_basic_itemparser(topic): """return (topic_name, topic_content as a list) from a cc topic block. 
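The first-line split that cc_basic_itemparser applies to a comment block (strip the '-!- ' marker, then split on the first ':') is simple enough to restate standalone:

```python
# Sketch of the topic-line split described for CC comment blocks:
# strip the leading '-!- ' marker characters, then split once on ':'
# to separate topic name from the head of its content.
topic_head = '-!- DISEASE: Defects in PHKA1 are linked to X-linked muscle'
keyname, content_head = [s.strip() for s in
                         topic_head.lstrip(' -!').split(':', 1)]
print(keyname)  # -> DISEASE
```

Splitting with maxsplit=1 matters: topic content may itself contain colons (e.g. database URLs), and only the first one delimits the topic name.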
Format of a topic as input of this function: [ '-!- TOPIC: First line of a comment block;', ' second and subsequent lines of a comment block.'] """ num_format_leading_spaces = 4 #for topic lines except the first #get the keyname and content_head from the first line topic_head = topic[0].lstrip(' -!') try: keyname, content_head = map(strip, topic_head.split(':', 1)) except ValueError: # need more than 1 value to unpack raise FieldError('Not a valid topic line: %s', topic[0]) if content_head: content = [content_head] else: content = [] #the following lines be stripped off the format leading spaces if len(topic) > 1: content += labeloff(topic[1:], num_format_leading_spaces) return keyname, content def cc_itemfinder(lines): """yield each topic/license as a list from CC lines without label and leading spaces. Warning: hardcoded LICENSE handling""" ## all the codes except the return line tries to preprocess the #license block #two clusters of '-' are used as borders for license, as observed license_border = '-' * 74 license_headstr = '-!- LICENSE:' content_start = 4 #the idx where topic content starts if license_border in lines: #discard the bottom license border if lines[-1] == license_border: lines.pop() else: raise FieldError('No bottom line for license: %s' % lines) #normalize license lines to the format of topic lines license_idx = lines.index(license_border) lines[license_idx] = license_headstr for i in range(license_idx+1, len(lines)): lines[i] = ' ' * content_start + lines[i] #the return line is all we need, if no license block return hanging_paragraph_finder(lines) ## cc_content_parsers here below def cc_interaction_parser(content_list): """return a list of [interactor, {params}] from interaction content. 
Format: -!- INTERACTION: {{SP_Ac:identifier[ (xeno)]}|Self}; NbExp=n; IntAct=IntAct_Protein_Ac, IntAct_Protein_Ac; """ result = [] for line in content_list: interactor, params = line.split(';',1) params = join_split_dict_parser([params]) result.append((interactor.strip(), params)) return result cc_alternative_products_event_finder = LabeledRecordFinder( lambda x: x.startswith('Event=')) cc_alternative_products_name_finder = LabeledRecordFinder( lambda x: x.startswith('Name=')) def cc_alternative_products_parser(content_list): """return a list from AlternativeProducts lines. Note: parsing is not complete; consider merging Names into the preceding Event, and making the event or name the dict key. Format: CC -!- ALTERNATIVE PRODUCTS: CC Event=Alternative promoter; CC Comment=Free text; CC Event=Alternative splicing; Named isoforms=n; CC Comment=Optional free text; CC Name=Isoform_1; Synonyms=Synonym_1[, Synonym_n]; CC IsoId=Isoform_identifier_1[, Isoform_identifier_n]; Sequence=Displayed; CC Note=Free text; CC Name=Isoform_n; Synonyms=Synonym_1[, Synonym_n]; CC IsoId=Isoform_identifier_1[, Isoform_identifier_n]; Sequence=VSP_identifier_1 [, VSP_identifier_n]; CC Note=Free text; CC Event=Alternative initiation; CC Comment=Free text; """ result = [] for event in cc_alternative_products_event_finder(content_list): head_names = list(cc_alternative_products_name_finder(event)) head, names = head_names[0], head_names[1:] event_dict = join_split_dict_parser(head) if names: event_dict['Names'] = map(join_split_dict_parser, names) result.append(event_dict) return result def cc_biophysicochemical_properties_parser(content): """return a dict from the content_list of a ~ topic.
Format of a ~ topic block: CC -!- BIOPHYSICOCHEMICAL PROPERTIES: CC Absorption: CC Abs(max)=xx nm; CC Note=free_text; CC Kinetic parameters: CC KM=xx unit for substrate [(free_text)]; CC Vmax=xx unit enzyme [free_text]; CC Note=free_text; CC pH dependence: CC free_text; CC Redox potential: CC free_text; CC Temperature dependence: CC free_text; Example of a ~ topic block: CC -!- BIOPHYSICOCHEMICAL PROPERTIES: CC Kinetic parameters: CC KM=98 uM for ATP; CC KM=688 uM for pyridoxal; CC Vmax=1.604 mmol/min/mg enzyme; CC pH dependence: CC Optimum pH is 6.0. Active from pH 4.5 to 10.5; """ def get_sub_key_content(sub_topic): """return (sub_key, sub_content as parsed) from lines of a sub_topic""" sub_key = sub_topic[0].rstrip(': ') sub_content = map(strip, sub_topic[1:]) #strip the two leading spaces #further process the content here if sub_key in ['Kinetic parameters', 'Absorption']: #group into a dict which allow multiple values. subkey_values = join_split_parser(sub_content, delimiters=[';',('=',1)]) sub_content = pairs_to_dict(subkey_values, 'allow_multi_value') else: sub_content = join_parser(sub_content, chars_to_strip='; ') return sub_key, sub_content sub_key_contents = map(get_sub_key_content, hanging_paragraph_finder(content)) return pairs_to_dict(sub_key_contents, 'no_duplicated_key') cc_content_parsers = { #? not complete: further group alternative splicing? 'ALTERNATIVE PRODUCTS': cc_alternative_products_parser, 'BIOPHYSICOCHEMICAL PROPERTIES': cc_biophysicochemical_properties_parser, 'INTERACTION': cc_interaction_parser, 'DATABASE': join_split_dict_parser, 'MASS SPECTROMETRY':join_split_dict_parser, } ################################# # REFs parser def refs_parser(lines): """return a dict of {RN: single_ref_dict} These lines comprise the literature citations. The citations indicate the sources from which the data has been abstracted. if several references are given, there will be a reference block for each. 
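The grouping that refs_parser relies on (a new citation block starts at every RN line) can be sketched without the library's LabeledRecordFinder; find_ref_blocks is a hypothetical name for this sketch:

```python
# Standalone sketch of grouping reference lines into per-citation
# blocks: each 'RN' line starts a new block, as ref_finder does.

def find_ref_blocks(lines):
    """Group reference lines into blocks, one block per RN line."""
    blocks, current = [], []
    for line in lines:
        if line.startswith('RN') and current:
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)
    return blocks

ref_lines = ['RN [1]', 'RP NUCLEOTIDE SEQUENCE.',
             'RN [2]', 'RL Submitted (JUL-1999).']
blocks = find_ref_blocks(ref_lines)
print(len(blocks))  # -> 2
```

Each block then gets parsed independently, keyed by the integer extracted from its RN line.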
""" rn_ref_pairs = map(single_ref_parser, ref_finder(lines)) return pairs_to_dict(rn_ref_pairs) is_ref_line = lambda x: x.startswith('RN') ref_finder = LabeledRecordFinder(is_ref_line) required_ref_labels = 'RN RP RL RA/RG RL'.split() def single_ref_parser(lines, strict=False): """return rn, ref_dict from lines of a single reference block strict: if True (default False), raise RecordError if lacking required labels. Warning: using global required_ref_labels. The reference lines for a given citation occur in a block, and are always in the order RN, RP, RC, RX, RG, RA, RT and RL. Within each such reference block, the RN line occurs once, the RC, RX and RT lines occur zero or more times, and the RP, RG/RA and RL lines occur one or more times. """ #group by linecode label_lines = map(linecode_maker, lines) raw_dict = pairs_to_dict(label_lines, 'always_multi_value') if strict: labels = dict.fromkeys(raw_dict.keys()) if 'RA' in labels or 'RG' in labels: labels['RA/RG'] = True for rlabel in required_ref_labels: if rlabel not in labels: raise RecordError('The reference block lacks required label: '\ '%s' % rlabel) #parse each field with relevant parser parsed_dict = pairs_to_dict(raw_dict.items(), handlers=ref_parsers) rn = parsed_dict.pop('RN') return rn, parsed_dict ## ref_parsers here below def rx_parser(lines): """return a dict from RX lines. The RX (Reference cross-reference) line is an optional line which is used to indicate the identifier assigned to a specific reference in a bibliographic database. 
The format: RX Bibliographic_db=IDENTIFIER[; Bibliographic_db=IDENTIFIER...]; Where the valid bibliographic database names and their associated identifiers are: MEDLINE Eight-digit MEDLINE Unique Identifier (UI) PubMed PubMed Unique Identifier (PMID) DOI Digital Object Identifier (DOI), examples: DOI=10.2345/S1384107697000225 DOI=10.4567/0361-9230(1997)42:2.0.TX;2-B http://www.doi.org/handbook_2000/enumeration.html#2.2 """ lines = labeloff(lines) return join_split_dict_parser(lines, delimiters=['; ','=']) def rc_parser(lines): """return a dict from RC lines. The RC (Reference Comment) lines are optional lines which are used to store comments relevant to the reference cited. The format: RC TOKEN1=Text; TOKEN2=Text; ... The currently defined tokens and their order in the RC line are: STRAIN TISSUE TRANSPOSON PLASMID """ lines = labeloff(lines) return join_split_dict_parser(lines) def rg_parser(lines): """return a list of str(group names) from RG lines The Reference Group (RG) line lists the consortium name associated with a given citation. The RG line is mainly used in submission reference blocks, but can also be used in paper references, if the working group is cited as an author in the paper. RG line and RA line (Reference Author) can be present in the same reference block; at least one RG or RA line is mandatory per reference block. example: RG The mouse genome sequencing consortium; """ lines = labeloff(lines) return join_split_parser(lines) def ra_parser(lines): """return a list from RA lines. The RA (Reference Author) lines list the authors of the paper (or other work) cited. RA might be missing in references that cite a reference group (see RG line). At least one RG or RA line is mandatory per reference block. All of the authors are included, and are listed in the order given in the paper. The names are listed surname first followed by a blank, followed by initial(s) with periods. The authors' names are separated by commas and terminated by a semicolon. 
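The RA convention described above (comma-separated names, list terminated by a semicolon, continuation lines joined) amounts to a short standalone parse; parse_ra is a hypothetical name for this sketch:

```python
# Sketch of RA (Reference Author) parsing: join continuation lines,
# strip the terminating ';', split on ','.

def parse_ra(lines):
    """Return the author list from one or more RA lines."""
    joined = ' '.join(line[2:].strip() for line in lines)  # drop 'RA'
    return [name.strip() for name in joined.rstrip(';').split(',')]

authors = parse_ra(['RA   Galinier A., Bleicher F., Negre D.,',
                    'RA   Perriere G., Duclos B.;'])
print(len(authors))  # -> 5
```

Because names are never split between lines, joining on a space before splitting on ',' is safe.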
Author names are not split between lines. e.g.: RA Galinier A., Bleicher F., Negre D., Perriere G., Duclos B.; All initials of the author names are indicated and hyphens between initials are kept. An author's initials can be followed by an abbreviation such as 'Jr' (for Junior), 'Sr' (Senior), 'II', 'III' or 'IV'. Example: RA Nasoff M.S., Baker H.V. II, Wolf R.E. Jr.; """ lines = labeloff(lines) return join_split_parser(lines, chars_to_strip=';', delimiters=',') def rp_parser(lines): """return joined str stripped of '.'. The RP (Reference Position) lines describe the extent of the work relevant to the entry carried out by the authors. format: RP COMMENT. """ lines = labeloff(lines) return ' '.join(lines).strip('. ') def rl_parser(lines): """return joined str stripped of '.'. Note: parsing is not complete. The RL (Reference Location) lines contain the conventional citation information for the reference. In general, the RL lines alone are sufficient to find the paper in question. a) Journal citations RL Journal_abbrev Volume:First_page-Last_page(YYYY). When a reference is made to a paper which is 'in press' RL Int. J. Parasitol. 0:0-0(2005). b) Electronic publications include an '(er)' prefix. The format is indicated below: RL (er) Free text. c) Book citations RL (In) Editor_1 I.[, Editor_2 I., Editor_X I.] (eds.); RL Book_name, pp.[Volume:]First_page-Last_page, Publisher, City (YYYY). Examples: RL (In) Rich D.H., Gross E. (eds.); RL Proceedings symposium, pp.69-72, Pierce RL Chemical Co., Rockford Il. (1981). d) Unpublished results, e.g.: RL Unpublished results, cited by: RL Shelnutt J.A., Rousseau D.L., Dethmers J.K., Margoliash E.; RL Biochemistry 20:6485-6497(1981). e) Unpublished observations, format: RL Unpublished observations (MMM-YYYY). f) Thesis, format: RL Thesis (Year), Institution_name, Country. g) Patent applications, format: RL Patent number Pat_num, DD-MMM-YYYY. h) Submissions, format: RL Submitted (MMM-YYYY) to Database_name.
'Database_name' is one of the following: EMBL/GenBank/DDBJ, Swiss-Prot, PDB, PIR. """ lines = labeloff(lines) return ' '.join(lines).strip('. ') def rt_parser(lines): """return joined line stripped of ."; The RT (Reference Title) lines give the title of the paper (or other work) cited as exactly as possible given the limitations of the computer character set. The format of the RT line is: RT "Title.";""" lines = labeloff(lines) return ' '.join(lines).strip('.";') def rn_parser(lines): """return a integer from RN lines (only one line). The RN (Reference Number) line gives a sequential number to each reference citation in an entry. This number is used to indicate the reference in comments and feature table notes. The format of the RN line is: RN [##] """ lines = labeloff(lines) return int(lines[0].strip(' []')) ref_parsers = { 'RN': rn_parser, 'RP': rp_parser, 'RC': rc_parser, 'RX': rx_parser, 'RG': rg_parser, 'RA': ra_parser, 'RT': rt_parser, 'RL': rl_parser, } required_labels = 'ID AC DT DE OS OC OX SQ REF'.split() + [''] ################################# # Minimal Ebi parser def MinimalEbiParser(lines, strict=True, selected_labels=[]): """yield each (sequence as a str, a dict of header) from ebi record lines if strict (default), raise RecordError if a record lacks required labels. Warning: using the global required_labels. 
Line code Content Occurrence in an entry ID Identification Once; starts the entry AC Accession number(s) Once or more DT Date Three times DE Description Once or more GN Gene name(s) Optional OS Organism species Once OG Organelle Optional OC Organism classification Once or more OX Taxonomy cross-reference Once RN Reference number Once or more RP Reference position Once or more RC Reference comment(s) Optional RX Reference cross-reference(s) Optional RG Reference group Once or more (Optional if RA line) RA Reference authors Once or more (Optional if RG line) RT Reference title Optional RL Reference location Once or more CC Comments or notes Optional DR Database cross-references Optional KW Keywords Optional FT Feature table data Optional SQ Sequence header Once (blanks) Sequence data Once or more // Termination line Once; ends the entry The two-character line-type code that begins each line is always followed by three blanks, so that the actual information begins with the sixth character. Information is not extended beyond character position 75 except for one exception: CC lines that contain the 'DATABASE' topic""" for record in EbiFinder(lines): if strict and not record[0].startswith('ID'): raise RecordError('Record must begin with ID line') del record[-1] #which must be //, ensured by Finder keyvalues = map(linecode_merging_maker, record) raw_dict = pairs_to_dict(keyvalues, 'always_multi_value', all_keys=_parsers) if strict: for rlabel in required_labels: if rlabel not in raw_dict: raise RecordError('The record lacks required label: '\ '%s' % rlabel) sequence = raw_dict.pop('') #which is the linecode for sequence sequence = ''.join(sequence).translate(all_chars,'\t\n ') if selected_labels: for key in raw_dict.keys(): if key not in selected_labels: del raw_dict[key] header_dict = raw_dict yield sequence, header_dict def linecode_merging_maker(line): """return merged linecode and the line. 
All valid reference linecodes merged into REF Warning: using global ref_parsers""" linecode = linecode_maker(line)[0] if linecode in ref_parsers: linecode = 'REF' return linecode, line ################################# # EbiParser def parse_header(header_dict, strict=True): """Parses a dictionary of header lines""" return pairs_to_dict(header_dict.items(), 'no_duplicated_key', handlers = _parsers) _parsers = { 'ID': id_parser, 'AC': ac_parser, 'DE': de_parser, 'DT': dt_parser, 'GN': gn_parser, 'OC': oc_parser, 'OS': os_parser, 'OX': ox_parser, 'OG': og_parser, 'REF': refs_parser, 'CC': cc_parser, 'DR': dr_parser, 'KW': kw_parser, 'FT': ft_parser, 'SQ': sq_parser, '': None, '//': None, } def EbiParser(lines, seq_constructor=Sequence, header_constructor= parse_header, strict=True, selected_labels=[]): """Parser for the EBI data format. lines: input data (list of lines or file stream) seq_constructor: constructor function to construct sequence, 'Sequence' by default. header_constructor: function to process the header information. Default is 'parse_header' strict: whether an exception should be raised in case of a problem (strict=True) or whether the bad record should be skipped (strict=False). selected_labels: Labels from the original data format that you want returned. All the original header labels are used, except for REFERENCES, which is 'REF'. """ for sequence, header_dict in MinimalEbiParser(lines, strict=strict,\ selected_labels=selected_labels): if seq_constructor: sequence = seq_constructor(sequence) try: header = header_constructor(header_dict, strict=strict) except (RecordError, FieldError, ValueError), e: if strict: #!! 
just raise is better than raise RecordError raise #RecordError, str(e) else: continue yield sequence, header if __name__ == "__main__": from getopt import getopt, GetoptError usage = """ Usage: python __.py [options] [source] Options: -h, --help show this help -d show debugging information while parsing Examples: """ try: opts, args = getopt(sys.argv[1:], "hd", ["help"]) except GetoptError: print usage; sys.exit(2) for opt, arg in opts: if opt in ("-h", "--help"): print usage; sys.exit() if args: lines = file(args[0]) print 'Parsing the file' for i, rec in enumerate(EbiParser(lines, strict=True)): print '\r %s: %s' % (i,rec[1]['ID']['EntryName']) , else: lines="""\ ID Q9U9C5_CAEEL PRELIMINARY; PRT; 218 AA. AC Q9U9C5;hdfksfsdfs;sdfsfsfs; DT 01-MAY-2000 (TrEMBLrel. 13, Created) DT 01-MAY-2000 (TrEMBLrel. 13, Last sequence update) DT 13-SEP-2005 (TrEMBLrel. 31, Last annotation update) DE Basic salivary proline-rich protein 4 allele L (Salivary proline-rich DE protein Po) (Parotid o protein) [Contains: Peptide P-D (aa); BB (bb) DE (bbb)] (Fragment). GN Name=nob-1; ORFNames=Y75B8A.2, Y75B8A.2B; GN and GN Name=Jon99Ciii; Synonyms=SER2, SER5, Ser99Db; ORFNames=CG15519; OS Caenorhabditis elegans (aa) (bb). OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea; OC Rhabditidae; Peloderinae; Caenorhabditis. OG Plastid; Apicoplast. OG Plasmid R6-5, Plasmid IncFII R100 (NR1), and OG Plasmid IncFII R1-19 (R1 drd-19). OX NCBI_TaxID=6239; RN [1] RP NUCLEOTIDE SEQUENCE. RC STRAIN=N2; RX MEDLINE=20243724; PubMed=10781051; DOI=10.1073/pnas.97.9.4499; RA Van Auken K., Weaver D.C., Edgar L.G., Wood W.B.; RT "Caenorhabditis elegans embryonic axial patterning requires two RT recently discovered posterior-group Hox genes."; RL Proc. Natl. Acad. Sci. U.S.A. 97:4499-4503(2000). RN [2] RP NUCLEOTIDE SEQUENCE. RC STRAIN=N2; RG The mouse genome sequencing consortium; RL Submitted (JUL-1999) to the EMBL/GenBank/DDBJ databases. 
CC -!- SUBCELLULAR LOCATION: Nuclear (By similarity). CC -!- DATABASE: NAME=slkdfjAtlas Genet. Cytogenet. Oncol. Haematol.; CC WWW="http://www.infobiogen.fr/services/chromcancer/Genes/ CC -!- DATABASE: NAME=Atlas Genet. Cytogenet. Oncol. Haematol.; CC WWW="http://www.infobiogen.fr/services/chromcancer/Genes/ CC P53ID88.html". CC -!- INTERACTION: CC P51617:IRAK1; NbExp=1; IntAct=EBI-448466, EBI-358664; CC P51617:IRAK1; NbExp=1; IntAct=EBI-448472, EBI-358664; CC -!- ALTERNATIVE PRODUCTS: CC Event=Alternative splicing; Named isoforms=3; CC Comment=Additional isoforms seem to exist. Experimental CC confirmation may be lacking for some isoforms; CC Name=1; Synonyms=AIRE-1; CC IsoId=O43918-1; Sequence=Displayed; CC Name=2; Synonyms=AIRE-2; CC IsoId=O43918-2; Sequence=VSP_004089; CC Name=3; Synonyms=AIRE-3; CC IsoId=O43918-3; Sequence=VSP_004089, VSP_004090; CC -!- BIOPHYSICOCHEMICAL PROPERTIES: CC Kinetic parameters: CC KM=98 uM for ATP; CC KM=688 uM for pyridoxal; CC Vmax=1.604 mmol/min/mg enzyme; CC pH dependence: CC Optimum pH is 6.0. Active from pH 4.5 to 10.5; CC -!- MASS SPECTROMETRY: MW=13822; METHOD=MALDI; RANGE=19-140 (P15522- CC 2); NOTE=Ref.1. CC -------------------------------------------------------------------------- CC This SWISS-PROT entry is copyright. It is produced through a collaboration CC removed. CC -------------------------------------------------------------------------- DR EMBL; AF172090; AAD48874.1; -; mRNA. DR EMBL; AL033514; CAC70124.1; -; Genomic_DNA. DR HSSP; P02833; 9ANT. KW Complete proteome; DNA-binding; Developmental protein; Homeobox; KW Hypothetical protein; Nuclear protein. FT DNA_BIND >102 292 FT REGION 1 44 Transcription activation (acidic). FT CHAIN 23 611 Halfway protein. FT /FTId=PRO_0000021413. FT VARIANT 1 7 unknown (in a skin tumor). FT /FTId=VAR_005851. FT VARIANT 7 7 D -> H (in a skin tumor). FT /FTId=VAR_005851. FT CONFLICT 282 282 R -> Q (in Ref. 18). FT STRAND 103 103 FT NON_TER 80 80 non_ter. 
SQ SEQUENCE 218 AA; 24367 MW; F24AE5E8A102FAC6 CRC64; MISVMQQMIN NDSPEDSKES ITSVQQTPFF WPSAAAAIPS IQGESRSERE SETGSSPQLA PSSTGMVMPG TAGMYGFGPS RMPTANEFGM MMNPVYTDFY QNPLASTDIT IPTTAGSSAA TTPNAAMHLP WAISHDGKKK RQPYKKDQIS RLEYEYSVNQ YLTNKRRSEL SAQLMLDEKQ VKVWFQNRRM KDKKLRQRHS GPFPHGAPVT PCIERLIN // ID Q9U9C5_TEST PRELIMINARY; PRT; 218 AA. DT ddd. AC Q9U9C5;hdfksfsdfs;sdfsfsfs; SQ SEQUENCE 218 AA; 24367 MW; F24AE5E8A102FAC6 CRC64; MISVMQQMIN NDSPEDSKES ITSVQQTPFF WPSAAAAIPS IQGESRSERE // """.split('\n') pprint(list(EbiParser(lines, strict=False, selected_labels=[]))) #from time import time ##sys.exit() #if len(sys.argv) > 1: # #f = file('/home/zongzhi/Projects/SNP/working/data/uniprot_sprot_human.dat') # f = file('/home/zongzhi/Projects/SNP/working/data/uniprot_sprot_fungi.dat') # #f = file('/home/zongzhi/Projects/SNP/snp_tests/ebi_test.txt') # i = 0 # for sequence, head in MinimalEbiParser(f): # i += 1 # if i>10000: sys.exit() # print '%s \r' % i, # try: # de = ' '.join(head['OG']) # except KeyError, e: # pass # #print e # else: # if 'Plasmid' in de: # print de, '\n' PyCogent-1.5.3/cogent/parse/fasta.py #!/usr/bin/env python """Parsers for FASTA and related formats.
""" from cogent.parse.record_finder import LabeledRecordFinder from cogent.parse.record import RecordError from cogent.core.info import Info, DbRef from cogent.core.moltype import BYTES, ASCII from string import strip import cogent import re __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight","Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" Sequence = BYTES.Sequence def is_fasta_label(x): """Checks if x looks like a FASTA label line.""" return x.startswith('>') def is_gde_label(x): """Checks if x looks like a GDE label line.""" return x and x[0] in '%#' def is_blank_or_comment(x): """Checks if x is blank or a FASTA comment line.""" return (not x) or x.startswith('#') or x.isspace() def is_blank(x): """Checks if x is blank.""" return (not x) or x.isspace() FastaFinder = LabeledRecordFinder(is_fasta_label, ignore=is_blank_or_comment) def MinimalFastaParser(infile, strict=True, \ label_to_name=str, finder=FastaFinder, \ is_label=None, label_characters='>'): """Yields successive sequences from infile as (label, seq) tuples. If strict is True (default), raises RecordError when label or seq missing. 
""" for rec in finder(infile): #first line must be a label line if not rec[0][0] in label_characters: if strict: raise RecordError, "Found Fasta record without label line: %s"%\ rec else: continue #record must have at least one sequence if len(rec) < 2: if strict: raise RecordError, "Found label line without sequences: %s" % \ rec else: continue label = rec[0][1:].strip() label = label_to_name(label) seq = ''.join(rec[1:]) yield label, seq GdeFinder = LabeledRecordFinder(is_gde_label, ignore=is_blank) def MinimalGdeParser(infile, strict=True, label_to_name=str): return MinimalFastaParser(infile, strict, label_to_name, finder=GdeFinder,\ label_characters='%#') def xmfa_label_to_name(line): (loc, strand, contig) = line.split() (sp, loc) = loc.split(':') (lo, hi) = [int(x) for x in loc.split('-')] if strand == '-': (lo, hi) = (hi, lo) else: assert strand == '+' name = '%s:%s:%s-%s' % (sp, contig, lo, hi) return name def is_xmfa_blank_or_comment(x): """Checks if x is blank or an XMFA comment line.""" return (not x) or x.startswith('=') or x.isspace() XmfaFinder = LabeledRecordFinder(is_fasta_label, \ ignore=is_xmfa_blank_or_comment) def MinimalXmfaParser(infile, strict=True): # Fasta-like but with header info like ">1:10-1000 + chr1" return MinimalFastaParser(infile, strict, label_to_name=xmfa_label_to_name, finder=XmfaFinder) def MinimalInfo(label): """Minimal info data maker: returns Name, and empty dict for info{}.""" return label, {} def NameLabelInfo(label): """Returns name as label split on whitespace, and Label in Info.""" return label.split()[0], {'Label':label} def FastaParser(infile,seq_maker=None,info_maker=MinimalInfo,strict=True): """Yields successive sequences from infile as (name, sequence) tuples. Constructs the sequence using seq_maker(seq, info=Info(info_maker(label))). If strict is True (default), raises RecordError when label or seq missing. Also raises RecordError if seq_maker fails. 
It is info_maker's responsibility to raise the appropriate RecordError or FieldError on failure. Result of info_maker need not actually be an info object, but can just be a dict or other data that Info can use in its constructor. """ if seq_maker is None: seq_maker = Sequence for label, seq in MinimalFastaParser(infile, strict=strict): if strict: #need to do error checking when constructing info and sequence try: name, info = info_maker(label) #will raise exception if bad yield name, seq_maker(seq, Name=name, Info=info) except Exception, e: raise RecordError, \ "Sequence construction failed on record with label %s" % label else: #not strict: just skip any record that raises an exception try: name, info = info_maker(label) yield(name, seq_maker(seq, Name=name, Info=info)) except Exception, e: continue #labeled fields in the NCBI FASTA records NcbiLabels = { 'dbj':'DDBJ', 'emb':'EMBL', 'gb':'GenBank', 'ref':'RefSeq', } def NcbiFastaLabelParser(line): """Creates an Info object and populates it with the line contents. As of 11/12/03, all records in genpept.fsa and the human RefSeq fasta files were consistent with this format. """ info = Info() try: ignore, gi, db, db_ref, description = map(strip, line.split('|', 4)) except ValueError: #probably got wrong value raise RecordError, "Unable to parse label line %s" % line info.GI = gi info[NcbiLabels[db]] = db_ref info.Description = description return gi, info def NcbiFastaParser(infile, seq_maker=None, strict=True): return FastaParser(infile, seq_maker=seq_maker, info_maker=NcbiFastaLabelParser, strict=strict) class RichLabel(str): """Object for overloaded Fasta labels. Holds an Info object storing keyed attributes from the fasta label. The str is created from a provided format template that uses the keys from the Info object.""" def __new__(cls, info, template="%s"): """Arguments: - info: a cogent.core.info.Info instance - template: a string template, using a subset of the keys in info. Defaults to just '%s'. 
Example: label = RichLabel(Info(name='rat', species='Rattus norvegicus'), '%(name)s')""" label = template % info new = str.__new__(cls, label) new.Info = info return new def LabelParser(display_template, field_formatters, split_with=":", DEBUG=False): """returns a function for creating a RichLabel's from a string Arguments; - display_template: string format template - field_formatters: series of (field index, field name, coverter function) - split_with: characters separating fields in the label. The display_template must use at least one of the assigned field names.""" indexed = False for index, field, converter in field_formatters: if field in display_template: indexed = True assert indexed, "display_template [%s] does not use a field name"\ % display_template sep = re.compile("[%s]" % split_with) def call(label): label = [label, label[1:]][label[0] == ">"] label = sep.split(label) if DEBUG: print label info = Info() for index, name, converter in field_formatters: if callable(converter): try: info[name] = converter(label[index]) except IndexError: print label, index, name raise else: info[name] = label[index] return RichLabel(info, display_template) return call def GroupFastaParser(data, label_to_name, group_key="Group", aligned=False, moltype=ASCII, done_groups=None, DEBUG=False): """yields related sequences as a separate seq collection Arguments: - data: line iterable data source - label_to_name: LabelParser callback - group_key: name of group key in RichLabel.Info object - aligned: whether sequences are to be considered aligned - moltype: default is ASCII - done_groups: series of group keys to be excluded """ done_groups = [[], done_groups][done_groups is not None] parser = MinimalFastaParser(data, label_to_name=label_to_name, finder=XmfaFinder) group_ids = [] current_collection = {} for label, seq in parser: seq = moltype.makeSequence(seq, Name=label, Info=label.Info) if DEBUG: print "str(label) ",str(label), "repr(label)", repr(label) if not group_ids or 
label.Info[group_key] in group_ids: current_collection[label] = seq if not group_ids: group_ids.append(label.Info[group_key]) else: # we finish off check of current before creating a collection if group_ids[-1] not in done_groups: info = Info(Group=group_ids[-1]) if DEBUG: print "GroupParser collection keys", current_collection.keys() seqs = cogent.LoadSeqs(data=current_collection, moltype=moltype, aligned=aligned) seqs.Info = info yield seqs current_collection = {label: seq} group_ids.append(label.Info[group_key]) info = Info(Group=group_ids[-1]) seqs = cogent.LoadSeqs(data=current_collection, moltype=moltype, aligned=aligned) seqs.Info = info yield seqs PyCogent-1.5.3/cogent/parse/fastq.py000644 000765 000024 00000002570 12024702176 020326 0ustar00jrideoutstaff000000 000000 __author__ = "Gavin Huttley, Anuj Pahwa" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Anuj Pahwa"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "Development" def MinimalFastqParser(data, strict=True): """yields name, seq, qual from fastq file Arguments: - strict: checks the quality and sequence labels are the same """ if type(data) == str: data = open(data) # fastq format is very simple, defined by blocks of 4 lines line_num = -1 record = [] for line in data: line_num += 1 if line_num == 4: if strict: # make sure the seq and qual labels match assert record[0][1:] == record[2][1:], \ 'Invalid format: %s -- %s' % (record[0][1:], record[2][1:]) yield record[0][1:], record[1], record[3] line_num = 0 record = [] record.append(line.strip()) if record: if strict and record[0]: # make sure the seq and qual labels match assert record[0][1:] == record[2][1:], 'Invalid format' if record[0]: # could be just an empty line at eof yield record[0][1:], record[1], record[3] if type(data) == file: data.close() PyCogent-1.5.3/cogent/parse/flowgram.py000644 000765 000024 00000034322 
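The four-line block logic of MinimalFastqParser can be illustrated with a self-contained sketch. This is a Python 3 rendering of the same idea (collect four stripped lines, check that the `@` and `+` labels agree, yield name/seq/qual); the helper name and the example record are invented:

```python
# Minimal standalone sketch of FASTQ four-line-record parsing,
# following the strict label check used by MinimalFastqParser.
def minimal_fastq_parser(lines, strict=True):
    record = []
    for line in lines:
        record.append(line.strip())
        if len(record) == 4:
            if strict and record[0][1:] != record[2][1:]:
                raise ValueError('Invalid format: %s -- %s'
                                 % (record[0][1:], record[2][1:]))
            yield record[0][1:], record[1], record[3]
            record = []

data = ['@seq1', 'ACGT', '+seq1', 'IIII']
# yields one record: ('seq1', 'ACGT', 'IIII')
```

Note the library version also accepts a bare `+` on the quality-label line only when strict checking is disabled; this sketch keeps just the strict path.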
#!/usr/bin/env python """A Flowgram object for 454 sequencing data.""" __author__ = "Jens Reeder, Julia Goodrich" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jens Reeder","Julia Goodrich"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jens Reeder" __email__ = "jreeder@colorado.edu" __status__ = "Development" from copy import copy from cogent.util.unit_test import FakeRandom from cogent.core.sequence import Sequence DEFAULT_FLOWORDER = "TACG" DEFAULT_KEYSEQ = "TCAG" class Flowgram(object): """Holds a 454 flowgram object""" HeaderInfo = ["Run Prefix", "Region #","XY Location","Run Name", "Analysis Name", "Full Path","Read Header Len","Name Length", "# of Bases","Clip Qual Left","Clip Qual Right", "Clip Adap Left","Clip Adap Right"] FlowgramInfo = ['Flow Indexes','Bases','Quality Scores'] def __init__(self, flowgram = '', Name = None, KeySeq = DEFAULT_KEYSEQ, floworder = DEFAULT_FLOWORDER, header_info = None): """Initialize a flowgram. Arguments: flowgram: the raw flowgram string, or list; no other type is guaranteed to work as expected, default is '' Name: the flowgram name KeySeq: the 454 key sequence floworder: flow sequence used to transform flowgram to sequence """ if Name is None and hasattr(flowgram, 'Name'): Name = flowgram.Name if header_info is None and hasattr(flowgram, 'header_info'): header_info = flowgram.header_info self.Name = Name if hasattr(flowgram, '_flowgram'): flowgram = flowgram._flowgram if isinstance(flowgram, str): self._flowgram = ' '.join(flowgram.split()) elif isinstance(flowgram, list): self._flowgram = ' '.join(map(str, flowgram)) else: self._flowgram = str(flowgram) self.flowgram = map(float,self._flowgram.split()) self.keySeq = KeySeq self.floworder = floworder if header_info is not None: for i in header_info: setattr(self,i,header_info[i]) #info is stored twice: as attributes for direct access, and in header_info so sff keys containing spaces survive round-tripping
self.header_info = header_info def __str__(self): """__str__ returns self._flowgram unmodified.""" return '\t'.join(self._flowgram.split()) def __len__(self): """returns the length of the flowgram""" return len(self.flowgram) def cmpSeqToString(self, other): """compares the flowgram's sequence to other which is a string will first try to compare by self.Bases, then by self.toSeq""" if hasattr(self,'Bases') and self.Bases == other: return True else: return self.toSeq() == other def cmpByName(self, other): """compares based on the name, other must also be a flowgram object""" if self is other: return 0 try: return cmp(self.Name, other.Name) except AttributeError: return cmp(type(self), type(other)) def cmpBySeqs(self, other): """compares by the sequences they represent other must also be a flowgram object """ if self is other: return 0 try: return cmp(self.Bases,other.Bases) except AttributeError: return cmp(self.toSeq(), other.toSeq()) def hasProperKey(self, keyseq=DEFAULT_KEYSEQ): """Checks for the proper key sequence""" keylen = len(keyseq) keyseq_from_flow = self.toSeq(truncate=False, Bases=False)[:keylen] return (keyseq_from_flow == keyseq) def __cmp__(self, other): """compares flowgram to other which is a string or another flowgram""" if isinstance(other, Flowgram): other = other._flowgram return cmp(self._flowgram, other) def __iter__(self): """yields successive floats in flowgram""" for f in self.flowgram: yield f def __hash__(self): """__hash__ behaves like the flowgram string for dict lookup.""" return hash(self._flowgram) def __contains__(self, other): """__contains__ checks whether other is in the flowgram string.""" return other in self._flowgram def toSeq(self, Bases=True, truncate=True): """Translates flowgram to sequence and returns sequence object if Bases is True then a sequence object will be made using self.Bases instead of translating the flowgram truncate: if True strip off lowercase chars (low quality bases) """ if Bases and hasattr(self, 
"Bases"): seq = self.Bases else: seq = [] if self.floworder is None: raise ValueError, "must have self.floworder set" key = FakeRandom(self.floworder,True) flows_since_last = 0 for n in self.flowgram: signal = int(round(n)) seq.extend([key()]* signal) if (signal>0): flows_since_last = 0 else: flows_since_last += 1 if(flows_since_last ==4): seq.extend('N') flows_since_last=0 seq = ''.join(seq) #cache the result for next time self.Bases = seq if(truncate): seq = str(seq) seq = seq.rstrip("acgtn") seq = seq.lstrip("actgn") return Sequence(seq, Name = self.Name) def toFasta(self, make_seqlabel=None, LineWrap = 80): """Return string in FASTA format, no trailing newline Will use self.Bases if it is set otherwise it will translate the flowgram Arguments: - make_seqlabel: callback function that takes the seq object and returns a label str """ if hasattr(self,'Bases'): seq = self.toSeq(Bases = True) else: seq = self.toSeq() seq.LineWrap = LineWrap return seq.toFasta(make_seqlabel = make_seqlabel) def getQualityTrimmedFlowgram(self): """Returns trimmed flowgram according to Clip Qual Right""" flow_copy = copy(self) if (hasattr(self, "Clip Qual Right") and hasattr(self, "Flow Indexes")): clip_right = int(getattr(self, "Clip Qual Right")) flow_indices = getattr(self, "Flow Indexes") flow_indices = [int(k) for k in flow_indices.split('\t') if k != ''] clip_right_flowgram = flow_indices[clip_right-1] #Truncate flowgram flow_copy.flowgram = self.flowgram[:clip_right_flowgram] flow_copy._flowgram =\ "\t".join(self._flowgram.split()[:clip_right_flowgram]) #Update attributes if hasattr(flow_copy, "Quality Scores"): qual_scores = getattr(flow_copy,"Quality Scores").split('\t') setattr(flow_copy, "Quality Scores", "\t".join(qual_scores[:clip_right])) if hasattr(flow_copy, "Flow Indexes"): setattr(flow_copy, "Flow Indexes", "\t".join(map(str, flow_indices[:clip_right]))) if hasattr(flow_copy, "Bases"): flow_copy.Bases = self.Bases[:clip_right] if hasattr(flow_copy, "# of Bases"): 
setattr(flow_copy, "# of Bases", clip_right) return flow_copy def getPrimerTrimmedFlowgram(self, primerseq): """Cuts the key and primer sequences of a flowgram. primerseq: the primer seq to be truncated from flowgram """ if(primerseq==""): return self else: flow_copy = copy(self) #Key currently not reliable set by FlowgramCollection #instead pass key as part of primer #key = flow_copy.keySeq or "" flow_indices = getattr(self, "Flow Indexes") flow_indices = [int(k) for k in flow_indices.split('\t') if k != ''] #position of last primer char in flowgram primer_len = len(primerseq) pos = flow_indices[primer_len-1] signal = flow_copy.flowgram[pos-1] if (signal < 0.5): #Flowgram is not consistent with primerseq return None elif (signal < 1.5): pad_num = pos % 4 #we can simply cut off flow_copy.flowgram = flow_copy.flowgram[pos:] # and pad flowgram to the left to sync with floworder flow_copy.flowgram[:0] = pad_num*[0.00] #check that first 4 flows not are all zero else: pad_num = (pos-1)%4 # we are cutting within a signal, need to do some flowgram arithmetic lastchar = primerseq[-1] #get the position in the homopolyemer pos_in_homopoly = len(primerseq) - len(primerseq.rstrip(lastchar)) flow_copy.flowgram = flow_copy.flowgram[pos-1:] flow_copy.flowgram[0] = max(0.00, flow_copy.flowgram[0] - pos_in_homopoly) #pad flowgram to the left to sync with floworder flow_copy.flowgram[:0] = (pad_num)*[0.00] # delete first flow cycle if all <0.5 (otherwise an N would be called) if(any([sign>=0.5 for sign in flow_copy.flowgram[:4]])): #We are ok extra_shift=0 pass else: #we truncate the first 4 flows flow_copy.flowgram = flow_copy.flowgram[4:] extra_shift=4 #Update "Flow Indexes" attribute #shift all flow indices by the deleted amount # WARNING: this sets wrong flow indexes, so better set to nothing # setattr(flow_copy, "Flow Indexes", # "\t".join([ str(a-(pos+extra_shift)+pad_num) for a in\ # flow_indices[primer_len:]])) setattr(flow_copy, "Flow Indexes", "") #Update flowgram string 
representation flow_copy._flowgram = "\t".join(map(lambda a:"%.2f"%a, flow_copy.flowgram)) #Update "Quality Scores" attribute if hasattr(self, "Quality Scores"): qual_scores = getattr(flow_copy,"Quality Scores").split('\t') setattr(flow_copy, "Quality Scores", "\t".join(qual_scores[primer_len:])) #Update Bases attribute if hasattr(flow_copy, "Bases"): flow_copy.Bases = flow_copy.Bases[primer_len:] #Update "# of Bases" attribute if hasattr(flow_copy, "# of Bases"): setattr(flow_copy, "# of Bases", str(int(getattr(flow_copy, "# of Bases")) - (primer_len))) if hasattr(flow_copy, "Clip Qual Left"): setattr(flow_copy, "Clip Qual Left", str(max(0, int(getattr(flow_copy, "Clip Qual Left")) - primer_len))) if hasattr(flow_copy, "Clip Qual Right"): setattr(flow_copy, "Clip Qual Right", str(max(0, int(getattr(flow_copy, "Clip Qual Right")) - primer_len))) if hasattr(flow_copy, "Clip Adap Left"): setattr(flow_copy, "Clip Adap Left", str(max(0, int(getattr(flow_copy, "Clip Adap Left")) - primer_len))) if hasattr(flow_copy, "Clip Adap Right"): setattr(flow_copy, "Clip Adap Right", str(max(0, int(getattr(flow_copy, "Clip Adap Right")) - primer_len))) return flow_copy def createFlowHeader(self): """header_info dict turned into flowgram header""" lines = [">%s\n"%self.Name] flow_info = [] head_info = [] for i in self.FlowgramInfo: if hasattr(self,i): flow_info.append('%s:\t%s\n' % (i, getattr(self, i))) for i in self.HeaderInfo: if hasattr(self,i): head_info.append(' %s:\t%s\n' % (i, getattr(self, i))) lines.extend(head_info) lines.extend(flow_info) lines.append("Flowgram:\t%s" % str(self)) return (''.join(lines)+"\n") def seq_to_flow(seq, id = None, keyseq = None, floworder = DEFAULT_FLOWORDER): """ Transform a sequence into an ideal flow. seq: sequence to transform to flowgram id: identifier keyseq """ complete_flow = floworder * len(seq) # worst case length i = 0 # iterates over seq j = 0 # iterates over the flow sequence tcagtcagtcag... 
mask = "" while (j < len(complete_flow) and i < len(seq)): if (complete_flow[j] == seq[i]): mask += "1" i += 1 if (i >= len(seq)): break j += 1 else: mask += "0" j += 1 # pad mask to finish the last flow to a multiple of the floworder length if (len(mask) % len(floworder) != 0): right_missing = len(floworder) - (len(mask) % len(floworder)) mask += "0" * right_missing return Flowgram(map(float, mask), id, keyseq, floworder) def build_averaged_flowgram(flowgrams): """Builds an averaged flowgram from a list of raw signals.""" result=[] if(len(flowgrams)==1): return flowgrams[0] for tuple in map(None, *flowgrams): k=0 sum=0 for element in tuple: if (element!=None): k+=1 sum +=element result.append(round(sum/k,2)) return result PyCogent-1.5.3/cogent/parse/flowgram_collection.py __author__ = "Julia Goodrich" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jens Reeder","Julia Goodrich"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Julia Goodrich" __email__ = "julia.goodrich@colorado.edu" __status__ = "Development" from copy import copy from types import GeneratorType from numpy import transpose from numpy.random import multinomial from cogent.util.unit_test import FakeRandom from cogent.core.sequence import Sequence from cogent.parse.flowgram_parser import parse_sff from cogent.parse.flowgram import Flowgram from cogent.core.alignment import SequenceCollection default_floworder = "TACG" default_keyseq = "TCAG" def assign_sequential_names(ignored, num_seqs, base_name='seq', start_at=0): """Returns list of num_seqs sequential, unique names. First argument is ignored; expect this to be set as a class attribute. """ return ['%s_%s' % (base_name,i) for i in range(start_at,start_at+num_seqs)] def flows_from_array(a): """FlowgramCollection from array of pos x seq: names are integers. This is an InputHandler for FlowgramCollection.
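The mask-building idea in seq_to_flow — walk a repeated flow order and emit "1" when the current flow character consumes the next base, "0" otherwise, then pad the mask to a whole number of flow cycles — can be exercised standalone. The Python 3 sketch below follows that same logic (the comparison in this dump is partly damaged, so the loop here is an assumed reconstruction, not the library function itself):

```python
# Standalone sketch of the seq_to_flow mask construction: one character
# of mask per flow position, '1' where the flow matches the next base.
def seq_to_flow_mask(seq, floworder='TACG'):
    complete_flow = floworder * len(seq)  # worst-case number of flows
    i = j = 0   # i walks seq, j walks the repeated flow order
    mask = ''
    while j < len(complete_flow) and i < len(seq):
        if complete_flow[j] == seq[i]:
            mask += '1'
            i += 1
            if i >= len(seq):
                break
            j += 1
        else:
            mask += '0'
            j += 1
    # pad to a multiple of the flow-order length, as the library does
    if len(mask) % len(floworder):
        mask += '0' * (len(floworder) - len(mask) % len(floworder))
    return [float(c) for c in mask]

seq_to_flow_mask('TACG')  # -> [1.0, 1.0, 1.0, 1.0]
seq_to_flow_mask('TT')    # -> [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Note that a homopolymer is represented by revisiting the same base on the next flow cycle (signal 1.0 twice) rather than by a single larger signal; translating such a flowgram back with the Flowgram.toSeq rounding logic recovers the original sequence.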
It converts an arbitrary array of numbers into Flowgram objects and leaves the flowgrams unlabeled. """ return list(transpose(a)), None, None def flows_from_flowCollection(flow): """FlowgramCollection from array of pos x seq: names are integers. This is an InputHandler for FlowgramCollection. It converts an arbitrary array of numbers into Flowgram objects and leaves the flowgrams unlabeled. """ return flow.flows, flow.Names, [f.header_info for f in flow.flows] def flows_from_kv_pairs(flows): """SequenceCollection from list of (key, val) pairs. val can be str or flowgram object """ names, flows = map(list, zip(*flows)) if isinstance(flows[0], str): info = [None]*len(flows) else: info = [f.header_info for f in flows] return flows, names, info def flows_from_empty(obj, *args, **kwargs): """SequenceCollection from empty data: raise exception.""" raise ValueError, "Cannot create empty SequenceCollection." def flows_from_dict(flows): """SequenceCollection from dict of {label:flow_as_str} or {label:flow_obj}. """ names, flows = map(list, zip(*flows.items())) if isinstance(flows[0], str): info = [None]*len(flows) else: info = [f.header_info for f in flows] return flows, names, info def flows_from_sff(flows): """lines is sff file lines. """ if isinstance(flows, str): flows = flows.splitlines() flows, head = parse_sff(flows) return flows_from_generic(flows) def flows_from_generic(flows): """SequenceCollection from generic seq x pos data: seq of seqs of chars. This is an InputHandler for SequenceCollection. It converts a generic list (each item in the list will be mapped onto an object using seq_constructor and assigns sequential integers (0-based) as names. """ names = [] info = [] for f in flows: if hasattr(f, 'Name'): names.append(f.Name) else: names.append(None) if hasattr(f, 'header_info'): info.append(f.header_info) else: info.append(None) return flows, names, info class FlowgramCollection(object): """stores Flowgrams. 
- InputHandlers: support for different data types - flows: behaves like list of Flowgram objects - Names: behaves like list of names for the Flowgram objects - NamedFlows: behaves like dict of {name:flow} """ InputHandlers = { 'array': flows_from_array, 'dict': flows_from_dict, 'sff' :flows_from_sff, 'empty': flows_from_empty, 'kv_pairs':flows_from_kv_pairs, 'flowcoll':flows_from_flowCollection, 'generic':flows_from_generic } HeaderInfo = ['Magic Number', 'Version', 'Index Offset', 'Index Length', '# of Reads', 'Header Length', 'Key Length', '# of Flows', 'Flowgram Code', 'Flow Chars', 'Key Sequence'] DefaultNameFunction = assign_sequential_names def __init__(self, data, Name = None, Names = None, header_info = None, conversion_f = None, name_conversion_f = None, remove_duplicate_names = False): """Initialize self with data and optionally Info. Parameters: data: Data to convert into a FlowgramCollection Name: Name of the FlowgramCollection. conversion_f: Function to convert data into Flowgram. name_conversion_f: if present, converts name into f(name). header_info: contains info to be printed in the common header of an sff file, it is a dictionary. 
ex: Key Sequence:"ATCG" """ #read all the data in if we were passed a generator if isinstance(data, GeneratorType): data = list(data) #set the Name self.Name = Name if header_info is not None: self._check_header_info(header_info) for i in header_info: setattr(self,i,header_info[i]) if 'Key Sequence' in header_info: keyseq = header_info['Key Sequence'] else: keyseq = None if 'Flow Chars' in header_info: floworder = header_info['Flow Chars'] else: floworder = default_floworder else: keyseq = None floworder = default_floworder self.header_info = header_info per_flow_names, flows, name_order, info = \ self._names_flows_order(conversion_f, data, Names, \ name_conversion_f, remove_duplicate_names) self.Names = name_order #will take only the flows and names that are in name_order if per_flow_names != name_order: good_indices = [] for n in name_order: good_indices.append(per_flow_names.index(n)) flows = [flows[i] for i in good_indices] info = [info[i] for i in good_indices] per_flow_names = name_order self.flow_str= flows self.flows = [Flowgram(f,n,keyseq,floworder, i)\ for f,n, i in zip(flows,self.Names,info)] #create NamedFlows dict for fast lookups self.NamedFlows = self._make_named_flows(self.Names, self.flows) def _strip_duplicates(self, names, flows, info): """Internal function to strip duplicates from list of names""" if len(set(names)) == len(names): return set(), names, flows, info #if we got here, there are duplicates unique_names = {} duplicates = {} fixed_names = [] fixed_flows = [] fixed_info = [] for n, f, i in zip(names, flows,info): if n in unique_names: duplicates[n] = 1 else: unique_names[n] = 1 fixed_names.append(n) fixed_flows.append(f) fixed_info.append(i) return duplicates, fixed_names, fixed_flows, fixed_info def _names_flows_order(self, conversion_f, data, Names, \ name_conversion_f, remove_duplicate_names): """Internal function to figure out names, flows, and name_order.""" #figure out conversion function and whether it's an array if not 
conversion_f: input_type = self._guess_input_type(data) conversion_f = self.InputHandlers[input_type] #set seqs, names, and handler_info as properties flows, names, info = conversion_f(data) if names and name_conversion_f: names = map(name_conversion_f, names) #if no names were passed in as Names, if we obtained them from #the seqs we should use them, but otherwise we should use the #default names if Names is None: if (names is None) or (None in names): per_flow_names = name_order = \ self.DefaultNameFunction(len(flows)) else: #got names from seqs per_flow_names = name_order = names else: #otherwise, names were passed in as Names: use this as the order #if we got names from the sequences, but otherwise assign the #names to successive sequences in order if (names is None) or (None in names): per_flow_names = name_order = Names else: #got names from seqs, so assume name_order is in Names per_flow_names = names name_order = Names #check for duplicate names duplicates, fixed_names, fixed_flows, fixed_info = \ self._strip_duplicates(per_flow_names, flows, info) if duplicates: if remove_duplicate_names: per_flow_names, flows, info =fixed_names,fixed_flows,fixed_info #if name_order doesn't have the same names as per_seq_names, #replace it with per_seq_names if (set(name_order) != set(per_flow_names)) or\ (len(name_order) != len(per_flow_names)): name_order = per_flow_names else: raise ValueError, \ "Some names were not unique. Duplicates are:\n" + \ str(sorted(duplicates.keys())) return per_flow_names, flows, name_order, info def _check_header_info(self, info): for h in info: if h not in self.HeaderInfo: raise ValueError, "invalid key in header_info" def _make_named_flows(self, names, flows): """Returns NamedFlows: dict of name:flow.""" name_flow_tuples = zip(names, flows) for n, f in name_flow_tuples: f.Name = n return dict(name_flow_tuples) def _guess_input_type(self, data): """Guesses input type of data; returns result as key of InputHandlers. 
Returns 'empty' if check fails, i.e. if it can't recognize the data as a specific type. Note that bad data is not guaranteed to return 'empty', and may be recognized as another type incorrectly. """ if isinstance(data, dict): return 'dict' if isinstance(data, str): return 'sff' if isinstance(data, FlowgramCollection): return 'flowcoll' first = None try: first = data[0] except (IndexError, TypeError): pass try: first = iter(data).next() except (IndexError, TypeError, StopIteration): pass if first is None: return 'empty' try: if isinstance(first, Flowgram): #model sequence base type return 'generic' if hasattr(first, 'dtype'): #array object return 'array' elif isinstance(first, str) and first.startswith('Common'): return 'sff' else: try: dict(data) return 'kv_pairs' except (TypeError, ValueError): pass return "generic" except (IndexError, TypeError), e: return 'empty' def __len__(self): """returns the number of flowgrams in the collection""" return len(self.flows) def __iter__(self): """iterates over the flows in the collection""" for f in self.flows: yield f def __str__(self): """returns string like sff file given the flowgrams and header_info""" lines = self.createCommonHeader() lines.append('') lines.extend([f.createFlowHeader() for f in self.flows]) return '\n'.join(lines) def __cmp__(self, other): """cmp first tests as dict, then as str.""" c = cmp(self.NamedFlows, other) if not c: return 0 else: return cmp(str(self), str(other)) def keys(self): """keys uses self.Names Note: returns copy, not original. """ return self.Names[:] def values(self): """values returns values corresponding to self.Names.""" return [self.NamedFlows[n] for n in self.Names] def items(self): """items returns (name, value) pairs.""" return [(n, self.NamedFlows[n]) for n in self.Names] def writeToFile(self, filename=None, **kwargs): """Write the flowgrams to a file. 
Arguments: - filename: name of the output file """ if filename is None: raise ValueError('no filename specified') f = open(filename, 'w') f.write(str(self)) f.close() def createCommonHeader(self): """header_info dict turned into flowgram common header""" lines = ["Common Header:"] if self.header_info is not None: lines.extend([" %s:\t%s" % (param,self.header_info[param]) \ for param in self.header_info]) return lines def toFasta(self, exclude_ambiguous = False,Bases=False, make_seqlabel = None): """Return flowgram collection in Fasta format Arguments: - make_seqlabel: callback function that takes the seq object and returns a label str if Bases is True then a fasta string will be made using self.Bases instead of translating the flowgram """ if exclude_ambiguous: flows = self.omitAmbiguousFlows() else: flows = self seqs = flows.toSequenceCollection(Bases) return seqs.toFasta(make_seqlabel = make_seqlabel) def toPhylip(self, exclude_ambiguous = False,Bases=False,generic_label=True, make_seqlabel=None): """ Return alignment in PHYLIP format and mapping to sequence ids raises exception if invalid alignment Arguments: - make_seqlabel: callback function that takes the seq object and returns a label str if Bases is True then a fasta string will be made using self.Bases instead of translating the flowgram """ if exclude_ambiguous: flows = self.omitAmbiguousFlows() else: flows = self seqs = flows.toSequenceCollection(Bases) return seqs.toPhylip(make_seqlabel = make_seqlabel, generic_label= generic_label) def toNexus(self,seq_type, exclude_ambiguous = False, Bases = False, interleave_len=50): """ Return alignment in NEXUS format and mapping to sequence ids **NOTE** Note that every sequence in the alignment MUST come from a different species!! (You can concatenate multiple sequences from same species together before building tree) seq_type: dna, rna, or protein Raises exception if invalid alignment if Bases is True then a fasta string will be made using self.Bases instead of translating the flowgram """ if exclude_ambiguous: flows = self.omitAmbiguousFlows() else: flows = self seqs = flows.toSequenceCollection(Bases) return seqs.toNexus(seq_type,interleave_len = interleave_len) def toSequenceCollection(self, Bases = False): names = self.Names flow_dict = self.NamedFlows flows = [flow_dict[f].toSeq(Bases = Bases) for f in names] return SequenceCollection(flows) def addFlows(self, other): """Adds flowgrams from other to self. Returns a new object. other must be of same class as self or coercible to that class. """ assert not isinstance(other, str), "Must provide a series of flows "+\ "or a FlowgramCollection" self_flow_class = self.flows[0].__class__ try: combined = self.flows + other.flows except AttributeError: combined = self.flows + list(other) try: combined_info =copy(self.header_info) combined_info.update(other.header_info) except AttributeError: combined_info =self.header_info for flow in combined: assert flow.__class__ == self_flow_class,\ "classes different: Expected %s, Got %s" % \ (flow.__class__, self_flow_class) return self.__class__(data=combined,header_info=combined_info) def iterFlows(self, flow_order=None): """Iterates over values (sequences) in the alignment, in order. flow_order: list of keys giving the order in which seqs will be returned. Defaults to self.Names. Note that only these sequences will be returned, and that KeyError will be raised if there are sequences in order that have been deleted from the Alignment. If self.Names is None, returns the sequences in the same order as self.NamedSeqs.values(). Use map(f, self.seqs()) to apply the constructor f to each seq. f must accept a single list as an argument.
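The ordered-lookup pattern behind iterFlows (a list of names fixes the order, a dict gives O(1) access to each flowgram) can be shown with a tiny standalone sketch; the helper name `iter_by_name` and the toy data are invented:

```python
# Standalone sketch of the iterFlows pattern: names give the order,
# a name->object dict gives fast access.
def iter_by_name(named, name_order=None):
    for key in (name_order or list(named)):
        yield named[key]   # KeyError if a requested name is absent

named = {'a': [1.0], 'b': [2.0]}
list(iter_by_name(named, ['b', 'a']))  # -> [[2.0], [1.0]]
```

Because only references are yielded, mutating a yielded object mutates the collection's copy as well, which matches the behavior documented for iterFlows.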
Always returns references to the same objects that are values of the alignment. """ ns = self.NamedFlows get = ns.__getitem__ for key in flow_order or self.Names: yield get(key) def iterItems(self, flow_order=None, pos_order=None): """Iterates over elements in the flowgram collection. flow_order (names) can be used to select a subset of seqs. pos_order (positions) can be used to select a subset of positions. Always iterates along a seq first, then down a position (transposes normal order of a[i][j]; possibly, this should change). WARNING: FlowgramCollection.iterItems() is not the same as fc.iteritems() (which is the built-in dict iteritems that iterates over key-value pairs). """ if pos_order: for row in self.iterFlows(flow_order): for i in pos_order: yield row[i] else: for row in self.iterFlows(flow_order): for i in row: yield i Items = property(iterItems) def copy(self): """Returns deep copy of self.""" result = self.__class__(self) return result def _take_flows(self): return list(self.iterFlows()) Flows = property(_take_flows) #access as attribute if using default order. def takeFlows(self, flows, negate=False, **kwargs): """Returns new FlowgramCollection containing only specified flows. Note that the flows in the new collection will be references to the same objects as the seqs in the old collection. """ get = self.NamedFlows.__getitem__ result = {} if negate: #copy everything except the specified seqs negated_names = [] row_lookup = dict.fromkeys(flows) for r, row in self.NamedFlows.items(): if r not in row_lookup: result[r] = row negated_names.append(r) flows = negated_names #remember to invert the list of names else: #copy only the specified seqs for r in flows: result[r] = get(r) if result: return self.__class__(result, Names=flows, **kwargs) else: return {} #safe value; can't construct empty collection def getFlowIndices(self, f, negate=False, Bases = False): """Returns list of keys of flows where f(Seq) is True. List will be in the same order as self.Names, if present.
If Bases is True it will use the flowgram Bases attribute otherwise it uses the translation to a sequence """ get = self.NamedFlows.__getitem__ #negate function if necessary if negate: new_f = lambda x: not f(x) else: new_f = f #get all the seqs where the function is True return [key for key in self.Names \ if new_f(get(key).toSeq(Bases = Bases))] def takeFlowsIf(self, f, negate=False, Bases = False, **kwargs): """Returns new Alignment containing seqs where f(row) is True. Note that the seqs in the new Alignment are the same objects as the seqs in the old Alignment, not copies. If Bases is True it will use the flowgram Bases attribute otherwise it uses the translation to a sequence """ #pass negate to get SeqIndices return self.takeFlows(self.getFlowIndices(f, negate, Bases= Bases), **kwargs) def getFlow(self, flowname): """Return a flowgram object for the specified seqname. """ return self.NamedFlows[flowname] def getFlowNames(self): """Return a list of Flowgram names.""" return self.Names[:] def getIntMap(self,prefix='seq_'): """Returns a dict with names mapped to enumerates integer names. - prefix: prefix for sequence label. Default = 'seq_' - int_keys is a dict mapping int names to sorted original names. """ get = self.NamedFlows.__getitem__ int_keys = dict([(prefix+str(i),k) for i,k in \ enumerate(sorted(self.NamedFlows.keys()))]) int_map = dict([(k, copy(get(v))) for k,v in int_keys.items()]) return int_map, int_keys def toDict(self): """Returns the collection as dict of names -> strings. Note: returns strings, NOT Flowgram objects. 
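The renaming scheme used by getIntMap (stable integer labels mapped onto the sorted original names) is easy to demonstrate standalone; the helper name `get_int_map` and the toy data below are invented:

```python
# Standalone sketch of the getIntMap pattern: sort the original names,
# assign 'seq_<i>' labels, and return both the relabeled data and the
# label->original-name lookup needed to reverse the mapping.
def get_int_map(named, prefix='seq_'):
    int_keys = dict((prefix + str(i), k)
                    for i, k in enumerate(sorted(named)))
    int_map = dict((k, named[v]) for k, v in int_keys.items())
    return int_map, int_keys

int_map, int_keys = get_int_map({'rat': 'AC', 'cow': 'GT'})
# int_keys -> {'seq_0': 'cow', 'seq_1': 'rat'}
# int_map  -> {'seq_0': 'GT', 'seq_1': 'AC'}
```

Sorting before enumerating makes the integer labels deterministic, so the same collection always gets the same mapping; the library version additionally copies each flowgram rather than sharing references.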
""" collection_dict = {} for flow_name in self.Names: collection_dict[flow_name] = self.NamedFlows[flow_name]._flowgram return collection_dict def omitAmbiguousFlows(self, Bases = False): """Returns an object containing only the sequences without N's""" is_ambiguous = lambda x: 'N' not in x return self.takeFlowsIf(is_ambiguous, Bases = Bases) def setBases(self): """Sets the Bases property for each flowgram using toSeq""" for f in self.values(): f.Bases = f.toSeq() def pick_from_prob_density(pvals,bin_size): l = multinomial(1,pvals) return (l.nonzero()[0][0]) * bin_size def seqs_to_flows(seqs, keyseq = default_keyseq, floworder = default_floworder, numflows = None, probs = None, bin_size = 0.01, header_info = {}): """ Transfrom a sequence into an ideal flow seqs: a list of name sequence object tuples (name,tuple) keyseq: the flowgram key Sequence floworder: The chars needed to convert seq to flow numflows: number of total flows in each flowgram, if it is specified the flowgram will be padded to that number probs: dictionary defining the probability distribution for each homopolymer WARNING:each distributions probabilities must add to 1.0 """ flows = [] homopolymer_counter = 1.0 if probs: for p in probs: if round(sum(probs[p]),1) != 1.0: raise ValueError, 'probs[%s] does not add to 1.0' % p for name,seq in seqs: flow_seq = FakeRandom(floworder,True) flow = [] seq_len = len(seq) for i, nuc in enumerate(seq): if i < seq_len-1 and seq[i+1] == nuc: homopolymer_counter += 1.0 else: while flow_seq() != nuc: if probs is None: val = 0.0 else: val = pick_from_prob_density(probs[0],bin_size) flow.append(val) if (probs is None) or (homopolymer_counter > 9): val = homopolymer_counter else: val = pick_from_prob_density(probs[int(homopolymer_counter)],bin_size) flow.append(val) homopolymer_counter = 1.0 len_flow = len(flow) len_order = len(floworder) if numflows is not None and numflows % len_order != 0: raise ValueError, "numflows must be divisable by the length of floworder" if 
(len_flow % len_order != 0): right_missing = len_order - (len_flow % len_order) if numflows != (len_flow + right_missing) and numflows is not None: right_missing += (numflows - (len_flow+right_missing)) if probs is None: flow.extend([0.0]*right_missing) else: for i in range(0,right_missing): flow.append(pick_from_prob_density(probs[0],bin_size)) flows.append((name, Flowgram(flow, id, keyseq, floworder))) if keyseq is not None: keylen = len(keyseq) else: keylen = None header_info.update({'Key Sequence':keyseq,'Flow Chars':floworder, 'Key Length':keylen}) return FlowgramCollection(flows, header_info = header_info) PyCogent-1.5.3/cogent/parse/flowgram_parser.py000644 000765 000024 00000012367 12024702176 022407 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parser for 454 Flowgram files""" __author__ = "Jens Reeder, Julia Goodrich" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jens Reeder","Julia Goodrich", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jens Reeder" __email__ = "jreeder@colorado.edu" __status__ = "Development" from string import strip from random import sample from itertools import izip from cogent.parse.flowgram import Flowgram from cogent.parse.record_finder import LabeledRecordFinder, is_fasta_label,\ DelimitedRecordFinder, is_empty def get_header_info(lines): """Returns the Information stored in the common header as a dictionary lines can be a a list or a file handle """ header_dict = {} for line in lines: if line.startswith('Common Header'): continue if is_empty(line): break key, value = line.strip().split(':') header_dict[key] = value.strip() if isinstance(lines, file): lines.seek(0) return header_dict def get_summaries(handle, number_list = None, name_list=None, all_sums = False): """Returns specified flowgrams and sequence summaries as generator handle can be a list of lines or a file handle number_list is a list of the summaries wanted by their index in the sff file, 
starts at 0 name_list is a list of the summaries wanted by their name in the sff file all_sums if true will yield all the summaries in the order they appear in the file One and only one of the parameters must be set """ sff_info = LabeledRecordFinder(is_fasta_label,constructor=strip) sum_gen = sff_info(handle) if number_list: assert not (name_list or all_sums) num = len(number_list) for i,s in enumerate(sum_gen): if i-1 in number_list: yield s num -= 1 if num == 0: break elif name_list: assert not all_sums for s in sum_gen: if s[0].strip('>') in name_list: yield s elif all_sums: header = True for s in sum_gen: if header: header = False continue yield s else: raise ValueError, "number_list, name_list or all_sums must be specified" def get_all_summaries(lines): """Returns all the flowgrams and sequence summaries in list of lists""" sff_info = LabeledRecordFinder(is_fasta_label,constructor=strip) return list(sff_info(lines))[1::] def split_summary(summary): """Returns dictionary of one summary""" summary_dict = {} summary_dict["Name"] = summary[0].strip('>') for line in summary[1::]: key, value = line.strip().split(':') summary_dict[key] = value.strip() return summary_dict def parse_sff(lines): """Creates list of flowgram objects from a SFF file """ head = get_header_info(lines) summaries = get_all_summaries(lines) flows = [] for s in summaries: t = split_summary(s) flowgram = t["Flowgram"] del t["Flowgram"] flows.append(Flowgram(flowgram, Name = t["Name"], floworder =head["Flow Chars"], header_info = t)) return flows, head def lazy_parse_sff_handle(handle): """Returns one flowgram at a time """ sff_info = LabeledRecordFinder(is_fasta_label,constructor=strip) sff_gen = sff_info(handle) header_lines = sff_gen.next() header = get_header_info(header_lines) return (_sff_parser(sff_gen, header), header) def _sff_parser(handle, header): for s in handle: t = split_summary(s) flowgram = t["Flowgram"] del t["Flowgram"] flowgram = Flowgram(flowgram, Name = t["Name"], 
KeySeq=header["Key Sequence"],
                            floworder=header["Flow Chars"], header_info=t)
        yield flowgram

def get_random_flows_from_sff(filename, num=100, size=None):
    """Reads size many flows from filename and returns a sample of num of them.

    Note: size has to be the exact number of flowgrams in the file,
    otherwise the result won't be random, or fewer than num flowgrams
    will be returned.

    filename: sff.txt input file
    num: number of flowgrams in returned sample
    size: number of flowgrams to sample from
    """
    if size is None:
        size = count_sff(open(filename))
    # draw num distinct positions up front, then stream the file once,
    # yielding only the flowgrams at the sampled positions
    chosen = set(sample(xrange(size), min(num, size)))
    (flowgrams, header) = lazy_parse_sff_handle(open(filename))
    num_yielded = 0
    for i, f in izip(xrange(size), flowgrams):
        if i in chosen:
            yield f
            num_yielded += 1
            if num_yielded == num:
                break

def count_sff(sff_fh):
    """Counts flowgrams in an sff file"""
    (flowgrams, header) = lazy_parse_sff_handle(sff_fh)
    i = 0
    for f in flowgrams:
        i += 1
    return i

def sff_to_fasta(sff_fp, out_fp):
    """Transforms an sff file to fasta"""
    (flowgrams, header) = lazy_parse_sff_handle(open(sff_fp))
    out_fh = open(out_fp, "w")
    for f in flowgrams:
        out_fh.write(f.toFasta() + "\n")
    out_fh.close()
PyCogent-1.5.3/cogent/parse/foldalign.py000644 000765 000024 00000003350 12024702176 021144 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
from string import split
from cogent.struct.rna2d import Pairs, ViennaStructure
from cogent.struct.pairs_util import adjust_base
from cogent.parse.column import column_parser

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

def foldalign_parser(lines, col=True):
    """Parses foldalign output"""
    data = lines
    if col:
        return column_parser(data)
    else:
        return find_struct(data)

def find_struct(lines):
    """Finds structures in output data"""
    struct = ''
    name1 = ''
    name2 = ''
    seq1 = ''
    seq2 = ''
    result = []
    for line in lines:
        if line.startswith('; ========'):
            break
        if line.startswith('; ALIGNING'):
            line = line.split()
            name1 = line[2]
            name2 = line[4]
            continue
        if line.startswith('; ALIGN %s' % name1):
            line = line.split()[3:]
line = ''.join(line) seq1 = ''.join([seq1,line]) continue if line.startswith('; ALIGN %s' % name2): line = line.split()[3:] line = ''.join(line) seq2 = ''.join([seq2,line]) continue if line.startswith('; ALIGN Structure'): line = line.split()[3:] line = ''.join(line) struct = ''.join([struct,line]) continue struct = ViennaStructure(struct).toPairs() struct.sort() result.append([struct,seq1,seq2]) return result PyCogent-1.5.3/cogent/parse/gbseq.py000644 000765 000024 00000010132 12024702176 020302 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parser for NCBI Sequence Set XML format. DOCTYPE Bioseq-set PUBLIC "-//NCBI//NCBI Seqset/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_Seqset.dtd" """ import xml.dom.minidom from cogent.core import annotation, moltype __author__ = "Matthew Wakefield" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Matthew Wakefield", "Peter Maxwell", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Matthew Wakefield" __email__ = "wakefield@wehi.edu.au" __status__ = "Production" """ CAUTION: This XML PARSER uses minidom. This means a bad performance for big files (>5MB), and huge XML files will for sure crash the program! """ def GbSeqXmlParser(doc): """Parser for NCBI Sequence Set XML format. DOCTYPE Bioseq-set PUBLIC "-//NCBI//NCBI Seqset/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_Seqset.dtd" Arguments: - doc: An xml.dom.minidom.Document, file object of string Yields: - name, cogent sequence CAUTION: This XML PARSER uses minidom. This means a bad performance for big files (>5MB), and huge XML files will for sure crash the program! 
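The minidom caution above is worth illustrating. A streaming alternative (a sketch only, not part of PyCogent) can walk the same GBSeq records with `xml.etree.ElementTree.iterparse` in roughly constant memory. The tag names follow the NCBI DTD quoted above; the accession and sequence values here are made-up sample data, and the sketch is written as standalone Python 3.

```python
# Sketch: stream GBSeq records instead of building a whole DOM with minidom.
import io
import xml.etree.ElementTree as ET

def iter_gbseq(fileobj):
    """Yield (accession, sequence) for each GBSeq element, streaming."""
    for event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == "GBSeq":
            name = elem.findtext("GBSeq_accession-version")
            seq = (elem.findtext("GBSeq_sequence") or "").upper()
            yield name, seq
            elem.clear()  # free the subtree we just processed

doc = b"""<GBSet>
  <GBSeq>
    <GBSeq_accession-version>X00001.1</GBSeq_accession-version>
    <GBSeq_sequence>acgt</GBSeq_sequence>
  </GBSeq>
</GBSet>"""
records = list(iter_gbseq(io.BytesIO(doc)))
```

Because each processed `GBSeq` subtree is cleared, memory stays bounded even on files far larger than the 5 MB the docstring warns about.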
""" if isinstance(doc,xml.dom.minidom.Document): dom_obj = doc elif isinstance(doc,file): dom_obj = xml.dom.minidom.parse(doc) elif isinstance(doc,str): dom_obj = xml.dom.minidom.parseString(doc) else: raise TypeError for record in dom_obj.getElementsByTagName('GBSeq'): raw_seq = record.getElementsByTagName( 'GBSeq_sequence')[0].childNodes[0].nodeValue name = record.getElementsByTagName( 'GBSeq_accession-version')[0].childNodes[0].nodeValue #cast as string to de-unicode raw_string = str(raw_seq).upper() name=str(name) if record.getElementsByTagName( 'GBSeq_moltype')[0].childNodes[0].nodeValue == u'9': alphabet = moltype.PROTEIN else: alphabet = moltype.DNA seq = alphabet.makeSequence(raw_string, Name=name) all = annotation.Map([(0,len(seq))], parent_length=len(seq)) seq.addAnnotation(annotation.Source, all, name, all) organism = str(record.getElementsByTagName( 'GBSeq_organism')[0].childNodes[0].nodeValue) seq.addAnnotation(annotation.Feature, "organism", organism, [(0,len(seq))]) features = record.getElementsByTagName('GBFeature') for feature in features: key = str(feature.getElementsByTagName( 'GBFeature_key')[0].childNodes[0].nodeValue) if key == 'source': continue spans = [] feature_name = "" for interval in feature.getElementsByTagName("GBInterval"): try: start = int(interval.getElementsByTagName( "GBInterval_from")[0].childNodes[0].nodeValue) end= int(interval.getElementsByTagName( "GBInterval_to")[0].childNodes[0].nodeValue) spans.append((start-1, end)) except IndexError: point = int(interval.getElementsByTagName( "GBInterval_point")[0].childNodes[0].nodeValue) spans.append((point-1, point)) if spans == []: spans = [(0,len(seq))] for qualifier in feature.getElementsByTagName("GBQualifier"): qname = qualifier.getElementsByTagName( "GBQualifier_name")[0].childNodes[0].nodeValue if qname == u'gene': feature_name = qualifier.getElementsByTagName( "GBQualifier_value")[0].childNodes[0].nodeValue seq.addAnnotation(annotation.Feature, key, feature_name, spans) yield 
(name, seq) def parse(*args): return GbSeqXmlParser(*args).next()[1] PyCogent-1.5.3/cogent/parse/gcg.py000644 000765 000024 00000003312 12024702176 017743 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Matthew Wakefield" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Matthew Wakefield", "Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Matthew Wakefield" __email__ = "wakefield@wehi.edu.au" __status__ = "Production" import warnings def MsfParser(f): """Read sequences from an MSF format file""" alignmentdict = {} #parse optional header #parse optional text information #file header and sequence header are separated by a line ending in '..' line = f.readline().strip() for line in f: line = line.strip() if line.endswith('..'): break #parse sequence info seqinfo = {} for line in f: line = line.strip() if line.startswith('//'): break line = line.split() if line and line[0] == 'Name:': seqinfo[line[1]] = int(line[3]) #parse sequences sequences = {} for line in f: line = line.strip().split() if line and line[0] in sequences: sequences[line[0]] += ''.join(line[1:]) elif line and line[0] in seqinfo: sequences[line[0]] = ''.join(line[1:]) #consistency check if len(sequences) != len(seqinfo): warnings.warn("Number of loaded seqs[%s] not same as "\ "expected[%s]." % (len(sequences), len(seqinfo))) for name in sequences: if len(sequences[name]) != seqinfo[name]: warnings.warn("Length of loaded seqs [%s] is [%s] not "\ "[%s] as expected."
% (name,len(sequences[name]),seqinfo[name])) #yield sequences for name in sequences: yield (name, sequences[name]) PyCogent-1.5.3/cogent/parse/genbank.py000644 000765 000024 00000054155 12024702176 020623 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.parse.record import FieldWrapper from cogent.parse.record_finder import DelimitedRecordFinder, \ LabeledRecordFinder from cogent.core.genetic_code import GeneticCodes from string import maketrans, strip, rstrip from cogent.core.moltype import PROTEIN, DNA, ASCII from cogent.core.annotation import Feature from cogent.core.info import Info __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Peter Maxwell", "Matthew Wakefield", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" all_chars = maketrans('','') dna_lc = 'utacgrywsmkbdhvn' dna_lc_cmp = 'aatgcyrwskmvhdbn' dna_trans = maketrans(dna_lc+dna_lc.upper(),dna_lc_cmp+dna_lc_cmp.upper()) rna_lc = 'utacgrywsmkbdhvn' rna_lc_cmp = 'aaugcyrwskmvhdbn' rna_trans = maketrans(rna_lc+rna_lc.upper(),rna_lc_cmp+rna_lc_cmp.upper()) locus_fields = [None, 'locus','length', None, 'mol_type','topology','db','date'] _locus_parser = FieldWrapper(locus_fields) #need to turn off line stripping, because whitespace is significant GbFinder = DelimitedRecordFinder('//', constructor=rstrip) class PartialRecordError(Exception): pass def parse_locus(line): """Parses a locus line, including conversion of Length to an int. WARNING: Gives incorrect results on legacy records that omit the topology. All records spot-checked on 8/30/05 had been updated to include the topology even when prior versions omitted it. 
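The LOCUS line is parsed by pairing whitespace-split fields with the labels in `locus_fields`. A self-contained sketch of that idea — simplified relative to cogent's actual `FieldWrapper`, and using a made-up LOCUS line:

```python
# Sketch of the FieldWrapper idea: zip whitespace-split pieces with labels,
# dropping fields whose label is None (here, the literal 'LOCUS' and 'bp').
def field_wrapper(labels):
    def parser(line):
        return dict((label, value)
                    for label, value in zip(labels, line.split())
                    if label is not None)
    return parser

locus_fields = [None, 'locus', 'length', None, 'mol_type',
                'topology', 'db', 'date']
parse_locus_line = field_wrapper(locus_fields)
rec = parse_locus_line("LOCUS AB000001 5028 bp DNA circular BCT 21-JUN-1999")
rec['length'] = int(rec['length'])  # parse_locus converts length to int
```

As the WARNING above notes, this positional scheme is exactly why legacy records that omit the topology field parse incorrectly: every later field shifts one slot left.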
""" result = _locus_parser(line) try: result['length'] = int(result['length']) except KeyError, e: raise PartialRecordError, e if None in result: del result[None] return result def parse_single_line(line): """Generic parser: splits off the label, and return the rest.""" label, data = line.split(None, 1) return data.rstrip() def indent_splitter(lines): """Yields the lines whenever it hits a line with same indent level as first. """ first_line = True curr = [] for line in lines: #skip blank lines line = line.rstrip() if not line: continue #need to figure out indent if first line if first_line: indent = len(line) - len(line.lstrip()) curr.append(line) first_line = False elif len(line) > indent and line[indent].isspace(): curr.append(line) else: #got a line that doesn't match the indent yield curr curr = [line] if curr: yield curr def parse_sequence(lines, constructor=''.join): """Parses a GenBank sequence block. Doesn't care about ORIGIN line.""" result = [] for i in lines: if i.startswith('ORIGIN'): continue result.append(i.translate(all_chars, '0123456789 \t\n\r/')) return constructor(result) def block_consolidator(lines): """Takes block with label and multiline data, and returns (label, [data]). [data] will be list of lines of data, including first line w/o label. """ data = [] first = True label = None for line in lines: if first: #find label line = line.split(None, 1) if len(line) == 2: label, curr = line else: label = line[0] curr = "" data.append(curr) first = False else: data.append(line) return label, data def parse_organism(lines): """Takes ORGANISM block. Returns organism, [taxonomy]. NOTE: Adds species to end of taxonomy if identifiable. 
""" label, data = block_consolidator(lines) #get 'species' species = data[0].strip() #get rest of taxonomy taxonomy = ' '.join(data[1:]) #normalize whitespace, including deleting newlines taxonomy = ' '.join(taxonomy.split()) #separate by semicolons taxa = map(strip, taxonomy.split(';')) #get rid of leading/trailing spaces #delete trailing period if present last = taxa[-1] if last.endswith('.'): taxa[-1] = last[:-1] return species, taxa def is_feature_component_start(line): """Checks if a line starts with '/', ignoring whitespace.""" return line.lstrip().startswith('/') feature_component_iterator = LabeledRecordFinder(is_feature_component_start) _join_with_empty = dict.fromkeys(['translation']) _leave_as_lines = {} def parse_feature(lines): """Parses a feature. Doesn't handle subfeatures. Returns dict containing: 'type': source, gene, CDS, etc. 'location': unparsed location string ...then, key-value pairs for each annotation, e.g. '/gene="MNBH"' -> {'gene':['MNBH']} (i.e. quotes stripped) All relations are assumed 'to many', and order will be preserved. 
""" result = {} type_, data = block_consolidator(lines) result['type'] = type_ location = [] found_feature = False for curr_line_idx, line in enumerate(data): if line.lstrip().startswith('/'): found_feature = True break else: location.append(line) result['raw_location'] = location try: result['location'] = \ parse_location_line(location_line_tokenizer(location)) except (TypeError, ValueError): result['location'] = None if not found_feature: return result fci = feature_component_iterator for feature_component in fci(data[curr_line_idx:]): first = feature_component[0].lstrip()[1:] #remove leading space, '/' try: label, first_line = first.split('=', 1) except ValueError: #sometimes not delimited by = label, first_line = first, '' #chop off leading quote if appropriate if first_line.startswith('"'): first_line = first_line[1:] remainder = [first_line] + feature_component[1:] #chop off trailing quote, if appropriate last_line = remainder[-1].rstrip() if last_line.endswith('"'): remainder[-1] = last_line[:-1] if label in _join_with_empty: curr_data = ''.join(map(strip, remainder)) elif label in _leave_as_lines: curr_data = remainder else: curr_data = ' '.join(map(strip, remainder)) if label.lower() == 'type': # some source features have /type=... label = 'type_field' if label not in result: result[label.lower()] = [] result[label.lower()].append(curr_data) return result def location_line_tokenizer(lines): """Tokenizes location lines into spans, joins and complements.""" curr = [] text = ' '.join(map(strip, lines)) for char in text: if char == '(': yield ''.join(curr).strip() + char curr = [] elif char == ')': if curr: yield ''.join(curr).strip() yield char curr = [] elif char == ',': if curr: yield ''.join(curr).strip() yield ',' curr = [] else: curr.append(char) if curr: yield ''.join(curr).strip() def parse_simple_location_segment(segment): """Parses location segment of form a..b or a, incl. '<' and '>'.""" first_ambiguity, second_ambiguity = None, None if '..' 
in segment: first, second = segment.split('..') if not first[0].isdigit(): first_ambiguity = first[0] first = long(first[1:]) else: first = long(first) if not second[0].isdigit(): second_ambiguity = second[0] second = long(second[1:]) else: second = long(second) return Location([Location(first, Ambiguity=first_ambiguity), \ Location(second, Ambiguity=second_ambiguity)]) else: if not segment[0].isdigit(): first_ambiguity = segment[0] segment = segment[1:] return Location(long(segment), Ambiguity=first_ambiguity) def parse_location_line(tokens, parser=parse_simple_location_segment): """Parses location line tokens into location list.""" stack = [] curr = stack for t in tokens: if t .endswith('('): new = [curr, t] curr.append(new) curr = new elif t == ',': #ignore continue elif t == ')': parent, type_ = curr[:2] children = curr[2:] if type_ == 'complement(': children.reverse() for c in children: c.Strand *= -1 curr_index = parent.index(curr) del parent[curr_index] parent[curr_index:curr_index] = children[:] curr = parent else: curr.append(parser(t)) return LocationList(stack) class Location(object): """GenBank location object. Integer, or low, high, or 2-base bound. data must either be a long, an object that can be coerced to a long, or a sequence of two BasePosition objects. It can _not_ be two numbers. Ambiguity should be None (the default), '>', or '<'. IsBetween should be False (the default), or True. IsBounds should be False(the default, indicates range), or True. Accession should be an accession, or None (default). Db should be a database identifier, or None (default). Strand should be 1 (forward, default) or -1 (reverse). WARNING: This Location will allow you to do things that can't happen in GenBank, such as having a start and stop that aren't from the same accession. No validation is performed to prevent this. All reasonable cases should work. WARNING: Coordinates are based on 1, not 0, as in GenBank format. 
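A standalone sketch of the rule `parse_simple_location_segment` applies: a leading `<` or `>` is peeled off as the ambiguity marker and the remainder is the position. Plain tuples stand in for the `Location` objects here:

```python
# Sketch: parse '467' or '<10..>50' into (first, last, amb1, amb2),
# mirroring parse_simple_location_segment without the Location class.
def parse_segment(segment):
    def atom(a):
        if not a[0].isdigit():
            return int(a[1:]), a[0]   # leading '<' or '>' ambiguity marker
        return int(a), None
    if '..' in segment:
        lo, hi = segment.split('..')
        (first, amb1), (last, amb2) = atom(lo), atom(hi)
        return first, last, amb1, amb2
    pos, amb = atom(segment)
    return pos, pos, amb, amb
```

So `'<10..>50'` yields start 10 (possibly earlier) and end 50 (possibly later), which matches how GenBank marks features truncated at either end.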
""" def __init__(self, data, Ambiguity=None, IsBetween=False, IsBounds=False, \ Accession=None, Db=None, Strand=1): """Returns new LocalLocation object.""" try: data = long(data) except TypeError: pass #assume was two Location objects. self._data = data self.Ambiguity = Ambiguity self.IsBetween = IsBetween self.IsBounds = IsBounds self.Accession = Accession self.Db = Db self.Strand = Strand def __str__(self): """Returns self in string format. WARNING: More permissive than GenBank's Backus-Naur form allows. If you abuse this object, you'll get results that aren't valid GenBank locations. """ if self.IsBetween: #between two bases try: first, last = self._data curr = '%s^%s' % (first, last) except TypeError: #only one base? must be this or the next curr = '%s^%s' % (first, first+1) else: #not self.IsBetween try: data = long(self._data) #if the above line succeeds, we've got a single item if self.Ambiguity: curr = self.Ambiguity + str(data) else: curr = str(data) except TypeError: #if long conversion failed, should have two LocalLocation objects first, last = self._data if self.IsBounds: curr = '(%s%s%s)' % (first, '.', last) else: curr = '%s%s%s' % (first, '..', last) #check if we need to add on the accession and database if self.Accession: curr = self.Accession + ':' + curr #we're only going to add the Db if we got an accession if self.Db: curr = self.Db + '::' + curr #check if it's complemented if self.Strand == -1: curr = 'complement(%s)' % curr return curr def isAmbiguous(self): """Returns True if ambiguous (single-base ambiguity or two locations.) """ if self.Ambiguity: return True try: iter(self._data) return True except: return False def first(self): """Returns first base self could be.""" try: return long(self._data) except TypeError: return self._data[0].first() def last(self): """Returns last base self could be.""" try: return long(self._data) except TypeError: return self._data[-1].last() class LocationList(list): """List of Location objects. 
WARNING: Coordinates are based on 1, not 0, to match GenBank format. """ BIGNUM = 1e300 def first(self): """Returns first base of self.""" curr = self.BIGNUM for i in self: first = i.first() if curr > first: curr = first return curr def last(self): """Returns last base of self.""" curr = 0 for i in self: last = i.last() if last > curr: curr = last return curr def strand(self): """Returns strand of components: 1=forward, -1=reverse, 0=both """ curr = {} for i in self: curr[i.Strand] = 1 if len(curr) >= 2: #found stuff on both strands return 0 else: return curr.keys()[0] def __str__(self): """Returns (normalized) string representation of self.""" if len(self) == 0: return '' elif len(self) == 1: return str(self[0]) else: return 'join(' + ','.join(map(str, self)) + ')' def extract(self, sequence, trans_table=dna_trans): """Extracts pieces of self from sequence.""" result = [] for i in self: first, last = i.first(), i.last() + 1 #inclusive, not exclusive #translate to 0-based indices and check if it wraps around if first < last: curr = sequence[first-1:last-1] else: curr = sequence[first-1:]+sequence[:last-1] #reverse-complement if necessary if i.Strand == -1: curr = curr.translate(trans_table)[::-1] result.append(curr) return ''.join(result) def parse_feature_table(lines): """Simple parser for feature table. 
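The 1-based, inclusive coordinate convention that `LocationList.extract` relies on can be checked with a few lines of standalone code (Python 3 here for `str.maketrans`; the module above is Python 2):

```python
# Sketch of LocationList.extract's coordinate handling: GenBank spans are
# 1-based and inclusive; minus-strand spans are reverse-complemented.
COMP = str.maketrans('ACGT', 'TGCA')

def extract(sequence, spans):
    """spans: list of (first, last, strand) with 1-based inclusive ends."""
    out = []
    for first, last, strand in spans:
        piece = sequence[first - 1:last]          # 1-based inclusive slice
        if strand == -1:
            piece = piece.translate(COMP)[::-1]   # reverse complement
        out.append(piece)
    return ''.join(out)

seq = 'AACCGGTT'
```

This sketch omits the origin-wrapping branch the real method has for circular records; the slicing and strand logic are the parts being illustrated.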
Assumes starts with FEATURES line.""" if not lines: return [] if lines[0].startswith('FEATURES'): lines = lines[1:] return [parse_feature(f) for f in indent_splitter(lines)] reference_label_marker = ' ' * 11 reference_field_finder = LabeledRecordFinder(lambda x: \ not x.startswith(reference_label_marker), constructor=None) def parse_reference(lines): """Simple parser for single reference.""" result = {} for field in reference_field_finder(lines): label, data = block_consolidator(field) result[label.lower()] = ' '.join(map(strip, data)) return result def parse_source(lines): """Simple parser for source fields.""" result = {} all_lines = list(lines) source_field = reference_field_finder(all_lines).next() label, data = block_consolidator(source_field) result[label.lower()] = ' '.join(map(strip, data)) source_length = len(source_field) species, taxonomy = parse_organism(lines[source_length:]) result['species'] = species result['taxonomy'] = taxonomy return result #adaptors to update curr with data from each parser def locus_adaptor(lines, curr): curr.update(parse_locus(lines[0])) def source_adaptor(lines, curr): curr.update(parse_source(lines)) def ref_adaptor(lines, curr): if 'references' not in curr: curr['references'] = [] curr['references'].append(parse_reference(lines)) def feature_table_adaptor(lines, curr): if 'features' not in curr: curr['features'] = [] curr['features'].extend(parse_feature_table(lines)) def sequence_adaptor(lines, curr): curr['sequence'] = parse_sequence(lines) def generic_adaptor(lines, curr): label, data = block_consolidator(lines) curr[label.lower()] = ' '.join(map(strip, lines)) handlers = { 'LOCUS': locus_adaptor, 'SOURCE': source_adaptor, 'REFERENCE': ref_adaptor, 'FEATURES': feature_table_adaptor, 'ORIGIN': sequence_adaptor, '//': lambda lines, curr: None, '?': lambda lines, curr: None } def MinimalGenbankParser(lines, handlers=handlers,\ default_handler=generic_adaptor): for rec in GbFinder(lines): curr = {} bad_record = False for 
field in indent_splitter(rec): first_word = field[0].split(None, 1)[0] handler = handlers.get(first_word, default_handler) try: handler(field, curr) except: bad_record = True break if not bad_record: yield curr def parse_location_segment(location_segment): """Parses a location segment into its component pieces. Known possibilities: http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html 467 single base a..b range from a to b, including a and b >a strictly after a (a.b) a single base between a and b, inclusive a^b a site between two adjacent bases between a and b accession:a a occurs in accession, not in the current sequence db::accession:a a occurs in accession in db, not in the current sequence """ s = location_segment #save some typing... lsp = parse_location_segment #check if it's a range if '..' in s: first, second = s.split('..') return Location([lsp(first), lsp(second)]) #check if it's between two adjacent bases elif '^' in s: first, second = s.split('^') return Location([lsp(first), lsp(second)],IsBetween=True) #check if it's a single base reference -- but don't be fooled by accessions! elif '.'
in s and s.startswith('(') and s.endswith(')'): first, second = s.split('.') return Location([lsp(first[1:]), lsp(second[:-1])]) def parse_location_atom(location_atom): """Parses a location atom, supposed to be a single-base position.""" a = location_atom if a.startswith('<') or a.startswith('>'): #fuzzy position = long(a[1:]) return Location(position, Ambiguity = a[0]) #otherwise, should just be an integer return Location(long(a)) wanted_types = dict.fromkeys(['CDS']) def extract_nt_prot_seqs(rec, wanted=wanted_types): """Extracts nucleotide seqs, and, where possible, protein seqs, from recs.""" rec_seq = rec['sequence'] for f in rec['features']: if f['type'] not in wanted: continue translation = f['translation'][0] raw_seq = f['location'].extract(rec_seq) print raw_seq seq = raw_seq[long(f['codon_start'][0])-1:] print 'dt:', translation print 'ct:', GeneticCodes[f.get('transl_table', '1')[0]].translate(seq) print 's :', seq def RichGenbankParser(handle, info_excludes=None, moltype=None, skip_contigs=False): """Returns annotated sequences from GenBank formatted file. Arguments: - info_excludes: a series of fields to be excluded from the Info object - moltype: a MolType instance, such as PROTEIN, DNA. Default is ASCII. 
- skip_contigs: ignores records with no actual sequence data, typically a genomic contig.""" info_excludes = info_excludes or [] moltype = moltype or ASCII for rec in MinimalGenbankParser(handle): info = Info() # populate the Info object, excluding the sequence for label, value in rec.items(): if label in info_excludes: continue info[label] = value if rec['mol_type'] == 'protein': # which it doesn't for genbank moltype = PROTEIN elif rec['mol_type'] == 'DNA': moltype = DNA try: seq = moltype.makeSequence(rec['sequence'].upper(), Info=info, Name=rec['locus']) except KeyError: if not skip_contigs: if 'contig' in rec: yield rec['locus'], rec['contig'] elif 'WGS' in rec: yield rec['locus'], rec['WGS'] else: yield rec['locus'], None continue for feature in rec['features']: spans = [] reversed = None if feature['location'] == None or feature['type'] in ['source', \ 'organism']: continue for location in feature['location']: (lo, hi) = (location.first() - 1, location.last()) if location.Strand == -1: (lo, hi) = (hi, lo) assert reversed is not False reversed = True else: assert reversed is not True reversed = False # ensure we don't put in a span that starts beyond the sequence if lo > len(seq): continue # or that's longer than the sequence hi = [hi, len(seq)][hi > len(seq)] spans.append((lo, hi)) if reversed: spans.reverse() for id_field in ['gene', 'note', 'product', 'clone']: if id_field in feature: name = feature[id_field] if not isinstance(name, basestring): name = ' '.join(name) break else: name = None seq.addAnnotation(Feature, feature['type'], name, spans) yield (rec['locus'], seq) def parse(*args): return RichGenbankParser(*args).next()[1] if __name__ == '__main__': #demo if called from commandline from sys import argv rec = parse(open(argv[1], 'U')) print len(rec), rec.getName() for annot in rec.annotations: print annot PyCogent-1.5.3/cogent/parse/gff.py000644 000765 000024 00000003611 12024702176 017747 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python 
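The coordinate adjustment `GffParser` performs below — GFF's 1-based inclusive start/end converted to 0-based half-open spans, with (start, end) swapped to mark the minus strand — can be sketched on its own:

```python
# Standalone sketch of GffParser's coordinate adjustment.
def adjust_coords(start, end, strand):
    """Convert GFF 1-based inclusive coords to a 0-based half-open span."""
    start, end = int(start) - 1, int(end)
    if start < 0 or end < 0:          # GFF v2 allows off-sequence features
        start, end = abs(start), abs(end)
        if start > end:
            start, end = end, start
    if strand == '-':                 # reversed span encodes minus strand
        start, end = end, start
    return start, end
```

So a feature at GFF positions 1..10 on `+` becomes span (0, 10), and the same feature on `-` becomes (10, 0).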
__author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Matthew Wakefield", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" def GffParser(f): assert not isinstance(f, str) for line in f: # comments and blank lines if "#" in line: (line, comments) = line.split("#", 1) else: comments = None line = line.strip() if not line: continue # parse columns cols = line.split('\t') if len(cols) == 8: cols.append('') assert len(cols) == 9, line (seqname, source, feature, start, end, score, strand, frame, attributes) = cols # adjust for python 0-based indexing etc. (start, end) = (int(start) - 1, int(end)) # start is always meant to be less than end in GFF # and in v 2.0, features that extend beyond sequence have negative # indices if start < 0 or end < 0: start, end = abs(start), abs(end) if start > end: start, end = end, start # but we use reversal of indices when the feature is on the opposite # strand if strand == '-': (start, end) = (end, start) # should parse attributes too yield (seqname, source, feature, start, end, score, strand, frame, attributes, comments) def parse_attributes(attribute_string): """Returns region of attribute string between first pair of double quotes""" attribute_string = attribute_string[attribute_string.find('"')+1:] if '"' in attribute_string: attribute_string = attribute_string[:attribute_string.find('"')] return attribute_string PyCogent-1.5.3/cogent/parse/gibbs.py000644 000765 000024 00000015713 12024702176 020301 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file cogent/parse/gibbs.py """Parses Gibbs Sampler output file and creates MotifResults object.""" from cogent.parse.record_finder import LabeledRecordFinder from cogent.motif.util import Location, ModuleInstance, Module, Motif,\ MotifResults from cogent.core.moltype import DNA, RNA, PROTEIN from numpy import exp 
__author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Micah Hamady", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" def get_sequence_and_motif_blocks(lines): """Returns main block of data as a list. """ gibbs_map_maximization = \ LabeledRecordFinder(lambda x: 'MAP MAXIMIZATION RESULTS' in x) seq_block, motif_block = list(gibbs_map_maximization(lines)) return seq_block, motif_block def get_sequence_map(lines): """Returns dict mapping Gibbs sequence number to sequence ID. - ex: sequence numbers mapping to gis: {'1':'1091044', '2':'11467494', '3':'11499727'} """ sequence_map = {} sequence_finder = \ LabeledRecordFinder(lambda x: x.startswith('Sequences to be Searched:')) sequence_block = list(sequence_finder(lines))[-1] for i in sequence_block[2:]: if i.startswith('#'): num,label = i.strip().split(' ',1) num = num.strip() label = label.strip() sequence_map[num[1:]] = label else: break return sequence_map def get_motif_blocks(lines): """Returns list of motif blocks given main block as lines. """ gibbs_motif = LabeledRecordFinder(lambda x: 'MOTIF' in x) return list(gibbs_motif(lines))[1:] def get_motif_sequences(lines): """Returns list of tuples with motif sequence information given motif block. 
- result is list of tuples : [(seq_num, motif_start, motif_seq, motif_sig),] """ motif_list = [] motif_seq_finder = LabeledRecordFinder(lambda x: 'columns' in x) motifs = list(motif_seq_finder(lines))[-1] for m in motifs[2:]: if ',' in m: curr = m.strip().split() motif_num = curr[1] seq_num = curr[0].split(',')[0] motif_start = int(curr[2])-1 #If motif does not start at beginning of sequence: if motif_start > 0: motif_seq = curr[4] #Motif starts at beginning of sequence, no context before motif else: motif_seq = curr[3] motif_sig = float(curr[-3]) motif_list.append((seq_num,motif_start,motif_seq,motif_sig,motif_num)) else: break return motif_list def get_motif_p_value(lines): """Returns the motif p-value given motif block. """ motif_p_finder = LabeledRecordFinder(lambda x: x.startswith('Log Motif')) motif_p_block = list(motif_p_finder(lines))[-1] log_motif_portion = float(motif_p_block[0].split()[-1]) return exp(log_motif_portion) def guess_alphabet(motif_list): """Returns alphabet given a motif_list from get_motif_sequences. - temp hack...should really think of a better way to get alphabet, but Gibbs sampler help tells nothing of how it guesses the alphabet or what it does with degenerate characters. - only allows for 2 degenerate nucleotide chars. """ alpha_dict = {} for motif in motif_list: for char in motif[2]: if char not in alpha_dict: alpha_dict[char]=0 alpha_dict[char]+=1 if len(alpha_dict) > 6: alphabet=PROTEIN elif 'T' in alpha_dict: alphabet=DNA else: alphabet=RNA return alphabet def build_module_objects(motif_block, sequence_map, truncate_len=None): """Returns module object given a motif_block and sequence_map. - motif_block is list of lines resulting from calling get_motif_blocks - sequence_map is the mapping between Gibbs sequence numbering and sequence id from fasta file. 
""" #Get motif id motif_id = motif_block[0].strip().split()[-1] #Get motif_list motif_list = get_motif_sequences(motif_block) #Get motif p-value motif_p = get_motif_p_value(motif_block) #Guess alphabet from motif sequences alphabet = guess_alphabet(motif_list) #Create Module object(s) gibbs_module = {} module_keys = ["1"] for motif in motif_list: seq_id = str(sequence_map[motif[0]]) if truncate_len: seq_id = seq_id[:truncate_len] start = motif[1] seq = motif[2] sig = motif[3] motif_num = "1" #Create Location object location = Location(seq_id, start, start + len(seq)) #Create ModuleInstance mod_instance = ModuleInstance(seq,location,sig) cur_key = (seq_id,start) gibbs_module[(seq_id,start)]=mod_instance gibbs_mod = Module(gibbs_module,MolType=alphabet) gibbs_mod.Pvalue = motif_p gibbs_mod.ID = motif_id + module_keys[0] yield gibbs_mod def module_ids_to_int(modules): """Given a list of modules changes alpha ids to int ids. """ id_map = {} for m in modules: id_map[m.ID]=True for i,mid in enumerate(sorted(id_map.keys())): id_map[mid]=str(i) for m in modules: m.ID=id_map[m.ID] def GibbsParser(lines, truncate_len=None, strict=True): """Returns a MotifResults object given a Gibbs Sampler results file. 
""" - only works with results from command line version of Gibbs Sampler """ try: #Get sequence and motif blocks sequence_block, motif_block = get_sequence_and_motif_blocks(lines) except Exception, e: if strict: raise e else: return None #Create MotifResults object gibbs_motif_results = MotifResults() #Get sequence map sequence_map = get_sequence_map(sequence_block) #Get motif blocks motif_blocks = get_motif_blocks(motif_block) #Get modules for module in motif_blocks: if module[1] == 'No Motifs Detected': print "No Modules detected!!", module[0] continue for cur_smod in build_module_objects(module, sequence_map, truncate_len=truncate_len): gibbs_motif_results.Modules.append(cur_smod) module_ids_to_int(gibbs_motif_results.Modules) for ct, module in enumerate(gibbs_motif_results.Modules): gibbs_motif_results.Motifs.append(Motif(module)) gibbs_motif_results.Alphabet=module.Alphabet return gibbs_motif_results if __name__ == "__main__": from sys import argv, exit print "Running..." if len(argv) != 2: print "Usage: gibbs.py " exit(1) mr = GibbsParser(open(argv[1]), 24) print mr for module in mr.Modules: print module.ID print str(module) PyCogent-1.5.3/cogent/parse/greengenes.py000644 000765 000024 00000013500 12024702176 021325 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parse the Greengenes formatted sequence data records The script is intended to be used with the following input: http://greengenes.lbl.gov/Download/Sequence_Data/Greengenes_format/greengenes16SrRNAgenes.txt.gz """ from cogent.parse.record_finder import DelimitedRecordFinder from cogent.parse.record import DelimitedSplitter, GenericRecord __author__ = "Daniel McDonald" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "daniel.mcdonald@colorado.edu" __status__ = "Prototype" def make_ignore_f(start_line): """Make an ignore function that ignores bad gg lines""" def
ignore(line): """Return false if line is bad""" return not line or ['',''] == line or [start_line,''] == line return ignore def DefaultDelimitedSplitter(delimiter): """Wraps delimited splitter to handle empty records""" parser = DelimitedSplitter(delimiter=delimiter) def f(line): parsed = parser(line) if len(parsed) == 1: parsed.append('') return parsed return f def MinimalGreengenesParser(lines,LineDelim="=",RecStart="BEGIN",RecEnd="END"): """Parses raw Greengenes 16S rRNA Gene records lines : open records file LineDelim : individual line delimiter, eg foo=bar RecStart : start identifier for a record RecEnd : end identifier for a record """ line_parser = DefaultDelimitedSplitter(delimiter=LineDelim) # parse what the ending record looks like so it can match after being split RecordDelim = line_parser(RecEnd) # make sure to ignore the starting record ignore = make_ignore_f(RecStart) parser = DelimitedRecordFinder(RecordDelim, constructor=line_parser, keep_delimiter=False, ignore=ignore) for record in parser(lines): yield GenericRecord(record) all_ids = lambda x,y: True specific_ids = lambda x,y: x in y def SpecificGreengenesParser(lines, fields, ids=None, **kwargs): """Yield specific fields from successive Greengenes records If ids are specified, only the records for the set of ids passed in will be returned. Parser will silently ignore ids that are not present in the set of ids as well as silently ignore ids in the set that are not present in the records file.
ids : must either test True or be an iterable with prokMSA_ids Returns tuples in 'fields' order """ parser = MinimalGreengenesParser(lines, **kwargs) if ids: ids = set(ids) id_lookup = specific_ids else: id_lookup = all_ids for record in parser: if id_lookup(record['prokMSA_id'], ids): yield tuple([record[field] for field in fields]) def main(): from optparse import make_option from cogent.util.misc import parse_command_line_parameters from sys import exit, stdout script_info = {} script_info['brief_description'] = "Parse raw Greengenes 16S records" script_info['script_description'] = """Parse out specific fields from raw Greengenes 16S records. These records are rich but often only a subset of each record is required for downstream processing.""" script_info['script_usage'] = [] script_info['script_usage'].append(("""Example:""","""Greengenes taxonomy and raw sequences are needed:""","""python greengenes.py -i greengenes16SrRNAgenes.txt -o gg_seq_and_tax.txt -f prokMSA_id,greengenes_tax_string,aligned_seq""")) script_info['script_usage'].append(("""Example:""","""Spitting out the available fields from Greengenes:""","""python greengenes.py -i greengenes16SrRNAgenes.txt --print-fields""")) script_info['output_description'] = """The resulting output file will contain a header that is prefixed with a # and delimited by the specified delimiter (default is tab). All records will follow in the same order with the same delimiter. It is possible for some key/value pairs within a record to lack a value. 
In this case, the value placed will be ''""" script_info['required_options']=[make_option('--input','-i',dest='input',\ help='Greengenes Records')] script_info['optional_options']=[\ make_option('--output','-o',dest='output',help='Output file'), make_option('--fields','-f',dest='fields',\ help='Greengenes fields to keep'), make_option('--delim','-d',dest='delim',help='Output delimiter',\ default="\t"), make_option('--list-of-ids','-l',dest='ids',default=None,\ help='File with a single column list of ids to retrieve'), make_option('--print-fields','-p',dest='print_fields',\ help='Prints available fields from first Greengenes Record',\ action='store_true',default=False)] script_info['version'] = __version__ option_parser, opts, args = parse_command_line_parameters(**script_info) if opts.print_fields: gg_parser = MinimalGreengenesParser(open(opts.input)) rec = gg_parser.next() print '\n'.join(sorted(rec.keys())) exit(0) if not opts.fields: print option_parser.usage() print print "Greengenes fields must be specified!" exit(1) if not opts.output: output = stdout else: output = open(opts.output,'w') fields = opts.fields.split(',') output.write("#%s\n" % opts.delim.join(fields)) if opts.ids: ids = set([l.strip() for l in open(opts.ids, 'U')]) else: ids = None gg_parser = SpecificGreengenesParser(open(opts.input), fields, ids) for record in gg_parser: output.write(opts.delim.join(record)) output.write('\n') if opts.output: output.close() if __name__ == '__main__': main() PyCogent-1.5.3/cogent/parse/illumina_sequence.py000644 000765 000024 00000001532 12024702176 022707 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Tests of Illumina sequence file parser. 
""" __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Production" def MinimalIlluminaSequenceParser(data): """ Yields (header, sequence, quality strings) given an Illumina sequence file """ if hasattr(data,'lower'): # passed a string so open filepath data = open(data,'U') for line in data: fields = line.strip().split(':') yield fields[:-2], fields[-2], fields[-1] try: # if data is a file, close it data.close() except AttributeError: # otherwise do nothing pass return PyCogent-1.5.3/cogent/parse/ilm.py000644 000765 000024 00000002302 12024702176 017762 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.struct.rna2d import Pairs from cogent.struct.knots import opt_single_random from cogent.struct.pairs_util import adjust_base __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def ilm_parser(lines=None,pseudo=True): """Ilm format parser Takes lines as input and returns a list with Pairs object. 
Pseudo - if True returns pairs with possible pseudoknot if False removes pseudoknots """ pairs = [] for line in lines: if line.startswith('Final') or len(line)==1:#skip these lines continue line = line.strip('\n') line = map(int,line.split(None,2)) if line[1] == 0: continue #Skip this line, not a pair else: pairs.append(line) pairs = adjust_base(pairs,-1) tmp = Pairs(pairs).directed() tmp.sort() if not pseudo: tmp = opt_single_random(tmp) tmp.sort() result = [] result.append(tmp) return result PyCogent-1.5.3/cogent/parse/infernal.py000644 000765 000024 00000006332 12024702176 021006 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file cogent.parse.infernal.py """Parses various Infernal output formats for the commandline version of: Infernal 1.0 and 1.0.2 only.""" from cogent.parse.table import ConvertFields, SeparatorFormatParser,is_empty __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" def CmsearchParser(lines): """Parser for tabfile format cmsearch result. - IMPORTANT: Will not parse standard output from cmsearch. You must use --tabfile with cmsearch to get correct format to use this parser. - NOTE: Will only work with search result files with a single CM as a query. Will not work with multiple search result files that have been concatenated. - Result will be list of hits with following order: [target name, target start, target stop, query start, query stop, bit score, E-value, GC%] """ # Converting indices and %GC to integers and bit score to float. # Since E-value is only present if CM is calibrated, leaving as string. 
conversion_fields = [(2,int),(3,int),(4,int),(5,int),(6,float),(8,int)] cmsearch_converter = ConvertFields(conversion_fields) #Ignore hash characters good_lines = [] for l in lines: if not l.startswith('#'): good_lines.append(l) #make parser cmsearch_parser = SeparatorFormatParser(with_header=False,\ converter=cmsearch_converter,\ ignore=None,\ sep=None) return cmsearch_parser(good_lines) def CmalignScoreParser(lines): """Parser for tabfile format cmalign score result. - IMPORTANT: Will only parse standard output from cmalign. - NOTE: Will only work with search result files with a single CM as a query. Will not work with multiple alignment result files that have been concatenated. - Result will be list of hits with following order: [seq idx, seq name, seq len, total bit score, struct bit score, avg prob, elapsed time] """ # Converting indices and %GC to integers and bit score to float. # Since E-value is only present if CM is calibrated, leaving as string. conversion_fields = [(0,int),(2,int),(3,float),(4,float),(5,float)] cmalign_score_converter = ConvertFields(conversion_fields) #Ignore hash characters good_lines = [] for l in lines: line = l.strip() if line.startswith('# STOCKHOLM 1.0'): break if line and (not line.startswith('#')): good_lines.append(l) #make parser cmalign_score_parser = SeparatorFormatParser(with_header=False,\ converter=cmalign_score_converter,\ ignore=None,\ sep=None) return cmalign_score_parser(good_lines) PyCogent-1.5.3/cogent/parse/kegg_fasta.py000644 000765 000024 00000002573 12024702176 021306 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from string import strip from cogent.parse.fasta import MinimalFastaParser __author__ = "Jesse Zaneveld" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Zaneveld", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Zaneveld" __email__ = "zaneveld@gmail.com" __status__ = "Production" """ Parser for KEGG fasta files This code is 
useful for parsing the KEGG .nuc or .pep files """ def parse_fasta(lines): """lightweight parser for KEGG FASTA format sequences""" for label, seq in MinimalFastaParser(lines): yield '\t'.join(list(kegg_label_fields(label)) \ + [seq] + ["\n"]) def kegg_label_fields(line): """Splits line into KEGG label fields. Format is species:gene_id [optional gene_name]; description. """ fields = map(strip, line.split(None, 1)) id_ = fields[0] species, gene_id = map(strip, id_.split(':',1)) #check if we got a description gene_name = description = '' if len(fields) > 1: description = fields[1] if ';' in description: gene_name, description = map(strip, description.split(';',1)) return id_, species, gene_id, gene_name, description if __name__ == '__main__': from sys import argv filename = argv[1] for result_line in parse_fasta(open(filename)): print result_line.strip() PyCogent-1.5.3/cogent/parse/kegg_ko.py000644 000765 000024 00000026563 12024702176 020622 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from sys import argv from string import strip from os import listdir,path from optparse import OptionParser from datetime import datetime import tarfile __author__ = "Jesse Zaneveld" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Zaneveld", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Zaneveld" __email__ = "zaneveld@gmail.com" __status__ = "Development" """A parser for the KEGG 'ko' file containing information on KEGG orthology groups and their associated pathways. """ def parse_ko_file(filepath,dir_prefix=None,debug = True): """Parse the KEGG KO file lines, and output several tab-delimited files filepath - the full filepath to the input KO file from KEGG dir_prefix - the directory to which tab-delimited output files will be saved.
debug - if set to True, print debugging output to the screen """ lines = open(filepath,"U") ko_gene_fname = 'ko_to_gene.tab' ko_fname = 'ko.tab' ko_pathway_fname = 'ko_to_pathway.tab' pathway_fname = 'pathway.tab' ko_cog_fname = 'ko_to_cog.tab' ko_cazy_fname = 'ko_to_cazy.tab' ko_go_fname = 'ko_to_go.tab' fnames = [ko_gene_fname, ko_fname, ko_pathway_fname,\ pathway_fname, ko_cog_fname, ko_cazy_fname,\ ko_go_fname] if dir_prefix: fnames = [dir_prefix + '/' + f for f in fnames] if debug: for res_fp in fnames: print "Outputting parsed info to: %s" %(res_fp) ko_gene, ko, ko_pathway, pathway, ko_cog, ko_cazy, ko_go = \ [open(i, 'w') for i in fnames] #figure out what fields we want (and get them), and get pathway data fields = ['ENTRY', 'NAME', 'DEFINITION'] ko_to_pathway = {} for rec in parse_ko(lines): ko.write('\t'.join([rec.get(f,'') for f in fields])) ko.write('\n') entry = rec['ENTRY'] if 'GENES' not in rec: continue #apparently, some records don't have genes... genes = rec['GENES'] for species, gene_list in genes.items(): for g in gene_list: ko_gene.write('%s\t%s:%s\n' % (entry, species.lower(), g)) if 'CLASS' not in rec: continue #apparently they also lack classes...
ko_to_pathway[entry] = rec['CLASS'] dblinks = rec.get('DBLINKS', None) if dblinks: cogs = dblinks.get('COG', None) cazy = dblinks.get('CAZy', None) go = dblinks.get('GO', None) if cogs: for c in cogs: ko_cog.write("%s\t%s\n" % (entry, c)) if go: for g in go: ko_go.write("%s\t%s\n" % (entry, g)) if cazy: for c in cazy: ko_cazy.write("%s\t%s\n" % (entry,c)) #postprocess the ko_to_pathway data to find out what the pathway terms #are and to write them out into a join file max_terms = 10 unique_recs = {} #will hold tuple(fields) -> unique_id curr_uid = 0 for ko, classes in ko_to_pathway.items(): for (id_, fields) in classes: if fields not in unique_recs: unique_recs[fields] = curr_uid fields_for_output = fields[:] if len(fields_for_output) > max_terms: fields_for_output = fields_for_output[:max_terms] elif len(fields_for_output) < max_terms: fields_for_output += \ ('',)*(max_terms - len(fields_for_output)) pathway.write('\t'.join((str(curr_uid),str(id_)) +\ fields_for_output)+'\n') curr_uid += 1 uid = unique_recs[fields] ko_pathway.write(str(ko)+ '\t'+ str(uid) + '\n') def make_tab_delimited_line_parser(columns_to_convert): """Generates a function that parses a tab-delimited line columns_to_convert: a list of column indexes to convert into integers by splitting on ':' and taking the second entry (e.g. to convert listings like GO:0008150 to 0008150 or ncbi-gi:14589889 to 14589889)""" def parse_tab_delimited_line(line): """Parse a tab-delimited line taking only the second item of cols %s""" %\ str(columns_to_convert) fields = line.split("\t") for index in columns_to_convert: fields[index] = fields[index].split(":")[1] return "\t".join(fields) return parse_tab_delimited_line def ko_default_parser(lines): """Handle default KEGG KO entry lines lines -- default format of space separated lines. 
Examples include the NAME and DEFINITION entries Strips out newlines and joins lines together.""" return ' '.join(map(strip, lines)).split(None, 1)[1] def ko_first_field_parser(lines): """Handles KEGG KO entries where only the first field is of interest For example, ENTRY fields like: 'ENTRY K01559 KO\n' Strips out newlines and joins lines together for the first field only.""" return ' '.join(map(strip, lines)).split()[1] def delete_comments(line): """Deletes comments in parentheses from a line.""" fields = line.split(')') result = [] for f in fields: if '(' in f: result.append(f.split('(',1)[0]) else: result.append(f) return ''.join(result) def ko_colon_fields(lines, without_comments=True): """Converts line to (key, [list of values]) lines -- colon fields such as DBLINKS or GENES in the KEGG KO file. Example: ' BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n' """ merged = ' '.join(map(strip, lines)) if without_comments: merged = delete_comments(merged) key, remainder = merged.split(':',1) vals = remainder.split() return key, vals def ko_colon_delimited_parser(lines, without_comments=True): """For lines of the form LABEL: id: values. Returns dict of id:values. 
""" first_line = lines[0] without_first_field = first_line.split(None, 1)[1] data_start = len(first_line) - len(without_first_field) result = {} curr = [] for line in lines: line = line[data_start:] if line[0] != ' ': #start of new block if curr: key, vals = ko_colon_fields(curr, without_comments) result[key] = vals curr = [] curr.append(line) if curr: key, vals = ko_colon_fields(curr, without_comments) result[key] = vals return result def _is_new_kegg_rec_group(prev, curr): """Check for irregular record group terminators""" return curr[0].isupper() and not prev.endswith(';') and \ not curr.startswith('CoA biosynthesis') and not prev.endswith(' and') and \ not prev.endswith('-') and not prev.endswith(' in') and not \ prev.endswith(' type') and not prev.endswith('Bindng') and not \ prev.endswith('Binding') def group_by_end_char(lines, end_char = ']', \ is_new_rec=_is_new_kegg_rec_group): """Yields successive groups of lines that end with the specified char. Note: also returns the last group of lines whether or not the end char is present. """ curr_lines = [] prev_line = '' for line in lines: stripped = line.strip() #unfortunately, not all records in kegg actually end with the #terminator, so need to check for termination condition if is_new_rec(prev_line, stripped): if curr_lines: yield curr_lines curr_lines = [] #if the line ends with the character we're looking for, assume we've #found a new record if stripped.endswith(end_char): yield curr_lines + [line] curr_lines = [] else: curr_lines.append(line) prev_line = stripped if curr_lines: yield curr_lines def class_lines_to_fields(lines): """Converts a list of lines in a single pathway within one KO class definition. 
""" rec = ' '.join(map(strip, lines)) #need to split off the class declaration if it is present if rec.startswith('CLASS'): rec = rec.split(None,1)[1] #figure out if it has an id and process accordingly if rec.endswith(']'): rec, class_id = rec.rsplit('[', 1) class_id = class_id[:-1] else: class_id = None rec_fields = map(strip, rec.split(';')) return class_id, tuple(rec_fields) def ko_class_parser(lines, without_comments='ignored'): """For the CLASS declaration lines. These take the form of multi-line semicolon-delimited fields (where each field is a successive entry in the KEGG pathway hierarchy), ending in a field of the form [PATH:ko00071]. Strategy: - iterate over groups of lines that end in ] (each represents one pathway) - for each line: - split off and extract the pathway id - split the rest of the terms on semicolon - return a tuple of (pathway_id, [terms_in_order]) Don't consolidate the terms in this parser because each KO group has its own class declaration so we would have to merge them for each class: instead, merge at higher level. """ for group in group_by_end_char(lines): yield class_lines_to_fields(group) def parse_ko(lines): """Parses a KO record into fields.""" # Here we define records by their category # to allow parsers to be reused on # similar entries. 
default_fields = ['NAME', 'DEFINITION'] colon_fields = ['DBLINKS', 'GENES'] first_field_only = ['ENTRY'] class_fields = ['CLASS'] for rec in ko_record_iterator(lines): split_fields = ko_record_splitter(rec) result = {} for k, v in split_fields.items(): if k in default_fields: result[k] = ko_default_parser(v) elif k in colon_fields: result[k] = ko_colon_delimited_parser(v) elif k in first_field_only: result[k] = ko_first_field_parser(v) elif k in class_fields: result[k] = list(ko_class_parser(v)) yield result #parse_ko: lightweight standalone ko parser def ko_record_iterator(lines): """Iterates over KO records, delimited by '///'""" curr = [] for line in lines: if line.startswith('///') and curr: yield curr curr = [] else: curr.append(line) if curr: yield curr def ko_record_splitter(lines): """Splits KO lines into dict of groups keyed by type.""" result = {} curr_label = None curr = [] i = 0 for line in lines: i+= 1 if line[0] != ' ': if curr_label is not None: result[curr_label] = curr fields = line.split(None, 1) # Annoyingly, can have blank REFERENCE lines # Lacking PMID, these still have auth/title info, however... 
if len(fields) == 1: curr_label = fields[0] curr_line = '' else: curr_label, curr_line = fields curr = [line] else: curr.append(line) if curr: result[curr_label] = curr return result if __name__ == '__main__': from sys import argv filename = argv[1] out_dir = argv[2] parse_ko_file(filename, \ dir_prefix = out_dir, \ debug = True) PyCogent-1.5.3/cogent/parse/kegg_pos.py000644 000765 000024 00000002461 12024702176 021005 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Jesse Zaneveld" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Zaneveld", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Zaneveld" __email__ = "zaneveld@gmail.com" __status__ = "Release" """ Parser for kegg .pos files Currently this is quite bare-bones, and primarily useful for associating the species name with the results, which is essential if combining multiple .pos files into a single database. """ # Pos file parsers def parse_pos_file(fname): """Opens fname, extracts pos fields and prepends filename""" curr_file = open(fname,"U") for line in parse_pos_lines(curr_file,fname): yield line def parse_pos_lines(lines, file_name): """Parse lines from a KEGG .pos file, yielding tab- delimited strings file name -- the file name, for deriving the species for the pos file (this is not available within the pos file, but important for mapping to other KEGG data) """ species_name = file_name.split('/')[-1].rsplit('.',1)[0] for line in lines: yield species_name + '\t' + line[:-1] + "\n" if __name__ == '__main__': from sys import argv filename = argv[1] for result_line in parse_pos_file(filename): print result_line.strip() PyCogent-1.5.3/cogent/parse/kegg_taxonomy.py000644 000765 000024 00000006616 12024702176 022070 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Jesse Zaneveld" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Zaneveld", "Rob Knight"] __license__ = "GPL" 
__version__ = "1.5.3" __maintainer__ = "Jesse Zaneveld" __email__ = "zaneveld@gmail.com" __status__ = "Development" from sys import argv from string import strip from os import listdir,path from optparse import OptionParser from datetime import datetime def parse_kegg_taxonomy(lines): """Returns successive taxonomy entries from lines. Format of return value is four levels of taxonomy (sometimes empty), unique id, three-letter kegg code, abbreviated name, full name, genus, species, and common name if present. Taxonomic level information is implicit in the number of hashes read at the beginning of the last line with hashes. Need to keep track of the last level read. Each hash line has a number of hashes indicating the level, and a name for that taxonomic level. Note that this is not as detailed as the real taxonomy in the genome file! Maximum taxonomic level as of this writing is 4: exclude any levels more detailed than this. Each non-taxon line is tab-delimited: has a unique id of some kind, then the three-letter KEGG code, then the short name (should be the same as the names of the individual species files for genes, etc.), then the genus and species names which may have a common name in parentheses afterwards. 
""" max_taxonomy_length = 4 taxonomy_stack = [] for line in lines: #bail out if it's a blank line line = line.rstrip() if not line: continue if line.startswith('#'): #line defining taxonomic level hashes, name = line.split(None, 1) name = name.strip() level = len(hashes) if level == len(taxonomy_stack): #new entry at same level taxonomy_stack[-1] = name elif level > len(taxonomy_stack): #add level: assume sequential taxonomy_stack.append(name) else: #level must be less than stack length: truncate del taxonomy_stack[level:] taxonomy_stack[level-1] = name else: #line defining an individual taxonomy entry fields = map(strip, line.split('\t')) #add genus, species, and common name as three additional fields raw_species_name = fields[-1] species_fields = raw_species_name.split() if not species_fields: print "ERROR" print line genus_name = species_fields[0] if len(species_fields) > 1: species_name = species_fields[1] else: species_name = '' #check for common name if '(' in raw_species_name: prefix, common_name = raw_species_name.split('(', 1) common_name, ignored = common_name.split(')', 1) else: common_name = '' output_taxon = taxonomy_stack + \ ['']*(max_taxonomy_length-len(taxonomy_stack)) \ + fields + [genus_name, species_name, common_name] yield "\t".join(output_taxon) + "\n" if __name__ == '__main__': from sys import argv filename = argv[1] for result_line in parse_kegg_taxonomy(open(filename,"U")): print result_line.strip() PyCogent-1.5.3/cogent/parse/knetfold.py000644 000765 000024 00000000670 12024702176 021015 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.parse.ct import ct_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def knetfold_parser(lines): """Parser for knetfold output""" result = ct_parser(lines) return result 
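The hash-counting bookkeeping that parse_kegg_taxonomy performs (each `#`-prefixed line sets the taxonomy name at the depth given by its number of hashes; deeper lines push, shallower lines truncate the stack) can be sketched in isolation. This is a simplified illustration with a hypothetical helper name `track_levels` and toy input, not part of PyCogent:

```python
# Minimal sketch (hypothetical helper, simplified input) of the taxonomy-level
# stack used by parse_kegg_taxonomy. Runs under Python 2 or 3.
def track_levels(lines):
    """Pair each non-'#' line with the taxonomy stack in effect at that point."""
    stack = []
    out = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith('#'):
            hashes, name = line.split(None, 1)
            level = len(hashes)
            if level == len(stack):
                stack[-1] = name.strip()       # new entry at the same depth
            elif level > len(stack):
                stack.append(name.strip())     # descend: assume sequential levels
            else:
                del stack[level:]              # pop back up, then replace
                stack[level - 1] = name.strip()
        else:
            out.append((tuple(stack), line))   # a species entry line
    return out

demo = ['# Bacteria', '## Proteobacteria', 'eco\tE. coli',
        '# Archaea', 'mja\tM. jannaschii']
for taxonomy, entry in track_levels(demo):
    print(taxonomy)
```

The real parser additionally pads the stack to four levels and splits genus, species, and common name out of each entry line; this sketch only shows how the level stack evolves.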
PyCogent-1.5.3/cogent/parse/locuslink.py000644 000765 000024 00000023633 12024702176 021216 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parsers for the LL_tmpl file from LocusLink. Notes: The LocusLink format is documented in the README file, but unfortunately this documentation is mostly lies. Fields that are supposed to be unique are repeated, fields whose only association with each other is their order are found out of order, etc. I suspect that it is impossible to parse the entire file as it was intended, and writing a parser that conforms to the specification is not useful because the file does not match the specification. Consequently, I chose to break the association between fields that are supposed to form 'sets' within a subrecord rather than trying to figure out what the sets are from incomplete data. This means that e.g. products will not be associated with _particular_ RNAs: however, all RNAs and all products produced by a locus will be returned. The following fields are assumed to be unique (* = required): *LOCUSID CURRENT_LOCUSID LOCUS_CONFIRMED LOCUS_TYPE *ORGANISM STATUS OFFICIAL_SYMBOL PREFERRED_SYMBOL OFFICIAL_GENE_NAME PREFERRED_GENE_NAME All other fields are assumed to be multiple, so will return a 1-item list if they have a single record rather than returning a single item. All records will be parsed if possible, typically as MappedRecord objects. This applies especially to lines with pipe-delimited fields such as GO, CDD, CONTIG, etc. It is _likely_, but not necessarily true, that items at corresponding indices in the lists for grouped fields (e.g. MAP and MAPLINK, PHENOTYPE and PHENOTYPE_ID, BUTTON and LINK) refer to the same item (i.e. MAP[0] and MAPLINK[0] are probably a map and its corresponding link).
""" from cogent.parse.record import MappedRecord, FieldWrapper, DelimitedSplitter,\ list_adder, list_extender, int_setter, LineOrientedConstructor from cogent.parse.record_finder import LabeledRecordFinder from string import maketrans, strip __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" def ll_start(line): """Returns True if line looks like the start of a LocusLink record.""" return line.startswith('>>') LLFinder = LabeledRecordFinder(ll_start) pipes = DelimitedSplitter('|', None) first_pipe = DelimitedSplitter('|') commas = DelimitedSplitter(',', None) first_colon = DelimitedSplitter(':', 1) accession_wrapper = FieldWrapper(['Accession', 'Gi', 'Strain'], pipes) def _read_accession(line): """Reads accession lines: format is Accession | Gi | Strain.""" return MappedRecord(accession_wrapper(line)) rell_wrapper = FieldWrapper(['Description', 'Id', 'IdType', 'Printable'], pipes) def _read_rell(line): """Reads RELL lines: format is Description|Id|IdType|Printable""" return MappedRecord(rell_wrapper(line)) accnum_wrapper = FieldWrapper(['Accession','Gi','Strain','Start','End'], pipes) def _read_accnum(line): """Reads ACCNUM lines: format is Accession|Gi|Strain|Start|End.""" return MappedRecord(accnum_wrapper(line)) map_wrapper = FieldWrapper(['Location','Source','Type'], pipes) def _read_map(line): """Reads MAP lines: format is Location|Source|Type.""" return MappedRecord(map_wrapper(line)) sts_wrapper = FieldWrapper(['Name','Chromosome','StsId','Segment',\ 'SequenceKnown', 'Evidence'], pipes) def _read_sts(line): """Reads STS lines: format is in the full docstring. 
Format: Name|Chromosome|StsId|Segment|SequenceKnown|Evidence """ return MappedRecord(sts_wrapper(line)) cdd_wrapper = FieldWrapper(['Name','Key','Score','EValue','BitScore'],pipes) def _read_cdd(line): """Reads CDD lines: format is Name|Key|Score|EValue|BitScore.""" return MappedRecord(cdd_wrapper(line)) comp_wrapper = FieldWrapper(['TaxonId','Symbol','Chromosome','Position',\ 'LocusId', 'ChromosomeSelf','SymbolSelf','MapName'], pipes) def _read_comp(line): """Reads COMP lines: format is in the full docstring. TaxonId|Symbol|Chromosome|Position|LocusId|ChromosomeSelf|SymbolSelf|MapName """ return MappedRecord(comp_wrapper(line)) grif_wrapper = FieldWrapper(['PubMedId', 'Description'], first_pipe) def _read_grif(line): """Reads GRIF lines: format is PubMedId|Description.""" return MappedRecord(grif_wrapper(line)) def _read_pmid(line): """Reads PMID lines: format is comma-delimited list of pubmed IDs.""" return commas(line) go_wrapper = FieldWrapper(['Category','Term','EvidenceCode','GoId','Source',\ 'PubMedId'], pipes) def _read_go(line): """Reads GO lines. Format: Category|Term|EvidenceCode|GoId|Source|PubMedId""" return MappedRecord(go_wrapper(line)) extannot_wrapper = FieldWrapper(['Category','Term','EvidenceCode','Source',\ 'PubMedId'], pipes) def _read_extannot(line): """Reads EXTANNOT lines. format: Category|Term|EvidenceCode|Source|PubMedId""" return MappedRecord(extannot_wrapper(line)) contig_wrapper = FieldWrapper(['Accession', 'Gi', 'Strain', 'From', 'To', \ 'Orientation', 'Chromosome', 'Assembly'], pipes) def _read_contig(line): """Reads CONTIG lines. Format described in full docstring. 
Accession|Gi|Strain|From|To|Orientation|Chromosome|Assembly """ return MappedRecord(contig_wrapper(line)) _ll_multi = dict.fromkeys('RELL NG NR NM NC NP PRODUCT TRANSVAR ASSEMBLY CONTIG XG XR EVID XM XP CDD ACCNUM TYPE PROT PREFERRED_PRODUCT ALIAS_SYMBOL ALIAS_PROT PHENOTYPE PHENOTYPE_ID SUMMARY UNIGENE OMIM CHR MAP MAPLINK STS COMP ECNUM BUTTON LINK DB_DESCR DB_LINK PMID GRIF SUMFUNC GO EXTANNOT'.split()) for i in _ll_multi.keys(): _ll_multi[i] = [] class LocusLink(MappedRecord): """Holds data for a LocusLink record.""" Required = _ll_multi Aliases = {'LOCUSID':'LocusLinkId','CURRENT_LOCUS_ID':'CurrentLocusId', 'LOCUS_CONFIRMED':'LocusConfirmed','LOCUS_TYPE':'LocusType', 'ORGANISM':'Species','RELL':'RelatedLoci','STATUS':'LocusStatus', 'PRODUCT':'Products','TRANSVAR':'TranscriptionVariants', 'ASSEMBLY':'Assemblies','CONTIG':'Contigs','EVID':'ContigEvidenceCodes', 'ACCNUM':'AccessionNumbers','TYPE':'AccessionTypes','PROT':'ProteinIds', 'OFFICIAL_SYMBOL':'OfficialSymbol','PREFERRED_SYMBOL':'PreferredSymbol', 'OFFICIAL_GENE_NAME':'OfficialGeneName', 'PREFERRED_GENE_NAME':'PreferredGeneName', 'PREFERRED_PRODUCT':'PreferredProducts', 'ALIAS_SYMBOL':'SymbolAliases','ALIAS_PROT':'ProteinAliases', 'PHENOTYPE':'Phenotypes', 'PHENOTYPE_ID':'PhenotypeIds', 'SUMMARY':'Summaries', 'UNIGENE':'UnigeneIds','OMIM':'OmimIds', 'CHR':'Chromosomes','MAP':'Maps','MAPLINK':'MapLinks','STS':'Sts', 'COMP':'ComparativeMapLinks','ECNUM':'EcIds','BUTTON':'Buttons', 'LINK':'Links', 'DB_DESCR':'DbDescriptions','DB_LINK':'DbLinks', 'PMID':'PubMedIds','GRIF':'Grifs', 'SUMFUNC':'FunctionSummaries', 'GO':'GoIds', 'EXTANNOT':'ExternalAnnotations'} def _accession_adder(obj, field, line): """Adds accessions to relevant field""" list_adder(obj, field, _read_accession(line)) def _accnum_adder(obj, field, line): """Adds accnum to relevant field""" list_adder(obj, field, _read_accnum(line)) def _rell_adder(obj, field, line): """Adds rell to relevant field""" list_adder(obj, field, _read_rell(line)) def 
_map_adder(obj, field, line): """Adds map to relevant field""" list_adder(obj, field, _read_map(line)) def _sts_adder(obj, field, line): """Adds sts to relevant field""" list_adder(obj, field, _read_sts(line)) def _cdd_adder(obj, field, line): """Adds cdd to relevant field""" list_adder(obj, field, _read_cdd(line)) def _comp_adder(obj, field, line): """Adds comp to relevant field""" list_adder(obj, field, _read_comp(line)) def _grif_adder(obj, field, line): """Adds grif to relevant field""" list_adder(obj, field, _read_grif(line)) def _pmid_adder(obj, field, line): """Adds pmid to relevant field""" list_extender(obj, field, _read_pmid(line)) def _assembly_adder(obj, field, line): """Adds assembly to relevant field""" list_adder(obj, field, commas(line)) def _go_adder(obj, field, line): """Adds go to relevant field""" list_adder(obj, field, _read_go(line)) def _extannot_adder(obj, field, line): """Adds extannot to relevant field""" list_adder(obj, field, _read_extannot(line)) def _generic_adder(obj, field, line): """Adds line to relevant field, unparsed""" list_adder(obj, field, line.strip()) def _contig_adder(obj, field, line): """Adds contig to relevant field""" list_adder(obj, field, _read_contig(line)) _ll_fieldmap = {} for field in ['LOCUSID', 'CURRENT_LOCUSID']: _ll_fieldmap[field] = int_setter _ll_fieldmap['RELL'] = _rell_adder _ll_fieldmap['MAP'] = _map_adder _ll_fieldmap['STS'] = _sts_adder _ll_fieldmap['COMP'] = _comp_adder _ll_fieldmap['GRIF'] = _grif_adder _ll_fieldmap['PMID'] = _pmid_adder _ll_fieldmap['GO'] = _go_adder _ll_fieldmap['EXTANNOT'] = _extannot_adder _ll_fieldmap['CDD'] = _cdd_adder _ll_fieldmap['ASSEMBLY'] = _assembly_adder _ll_fieldmap['CONTIG'] = _contig_adder for field in 'NG ACCNUM'.split(): _ll_fieldmap[field] = _accnum_adder for field in 'NR NM NC NP XG XR XM XP PROT'.split(): _ll_fieldmap[field] = _accession_adder for field in _ll_multi: if field not in _ll_fieldmap: _ll_fieldmap[field] = _generic_adder 
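The adder functions above all follow the same pattern: a DelimitedSplitter breaks a pipe-delimited line into pieces, a FieldWrapper pairs those pieces with field names, and the resulting MappedRecord is appended to the matching list on the record. A minimal stand-alone sketch of that idea in plain stdlib Python (this is not the actual cogent.parse.record implementation; `pipe_split` and `wrap_fields` are hypothetical stand-ins, and the accession line is dummy data):

```python
# Hypothetical stand-ins for cogent's DelimitedSplitter and FieldWrapper,
# shown only to illustrate how a pipe-delimited LocusLink line becomes a record.

def pipe_split(line):
    """Split on '|' and strip whitespace from every piece."""
    return [piece.strip() for piece in line.split('|')]

def wrap_fields(names, line):
    """Pair field names with split pieces, like FieldWrapper(names, pipes)(line)."""
    return dict(zip(names, pipe_split(line)))

# An ACCNUM-style line has the shape Accession|Gi|Strain|Start|End
record = wrap_fields(['Accession', 'Gi', 'Strain', 'Start', 'End'],
                     'NM_000000|12345| |100|2000')
```

A `_accnum_adder`-style helper would then simply append `record` to the list stored under the ACCNUM field of the LocusLink object.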
LinesToLocusLink = LineOrientedConstructor() LinesToLocusLink.Constructor = LocusLink LinesToLocusLink.FieldMap = _ll_fieldmap LinesToLocusLink.LabelSplitter = first_colon def LocusLinkParser(lines): """Treats lines as a stream of LocusLink records""" for record in LLFinder(lines): curr = LinesToLocusLink(record) yield curr if __name__ == '__main__': from sys import argv, stdout filename = argv[1] count = 0 for record in LocusLinkParser(open(filename)): print record PyCogent-1.5.3/cogent/parse/macsim.py000644 000765 000024 00000002456 12024702176 020464 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.core import annotation, moltype __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Raymond Sammut", "Peter Maxwell", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" # # # As used by BAliBASE def MacsimParser(doc): doc = doc.getElementsByTagName('macsim')[0] align = doc.getElementsByTagName('alignment')[0] for record in align.getElementsByTagName('sequence'): name = record.getElementsByTagName( 'seq-name')[0].childNodes[0].nodeValue raw_seq = record.getElementsByTagName( 'seq-data')[0].childNodes[0].nodeValue #cast as string to de-unicode raw_string = ''.join(str(raw_seq).upper().split()) name=str(name).strip() if str(record.getAttribute('seq-type')).lower() == 'protein': alphabet = moltype.PROTEIN else: alphabet = moltype.DNA seq = alphabet.makeSequence(raw_string, Name=name) yield (name, seq) PyCogent-1.5.3/cogent/parse/mage.py000644 000765 000024 00000016752 12024702176 020130 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides a parser to create a Kinemage object from a kinemage file """ import re from cogent.format.mage import Kinemage,MageGroup,MageList,MagePoint,\ SimplexHeader from cogent.util.misc import extract_delimited from cogent.parse.record_finder import 
LabeledRecordFinder __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" def MageGroupFromString(line): """Returns a new MageGroup, created from a string representation""" result = MageGroup([],RecessiveOn=False) trans = {'off':('Off',True),\ 'on':('Off',False),\ 'recessiveon':('RecessiveOn',True),\ 'color':'Color',\ 'radius':'Radius',\ 'nobutton':('NoButton',True),\ 'dominant':('Dominant',True),\ 'lens':('Lens',True), 'master':'Master',\ 'instance':'Instance',\ 'clone':'Clone'} #extract all delimited fields: label & KeyWordOptions (master, etc) delimited_fields = [] while 1: part = extract_delimited(line,'{','}') if part is not None: delimited_fields.append(part) line = line.replace('{'+part+'}','') else: break #the first one is always the label label = delimited_fields[0] #the later ones (starting with 1) are keyword options field_idx = 1 #gather all left-over pieces pieces = line.split() if 'sub' in pieces[0]: #@(sub)group result.Subgroup = True result.Label = label #process all optional pieces for piece in pieces[1:]: try: #here we're finding the key. 
The value will be '', because it #is stored in delimited_fields (accessible by field_idx) key,value = piece.split('=') setattr(result,trans[key],delimited_fields[field_idx]) field_idx += 1 except ValueError: setattr(result,trans[piece][0],trans[piece][1]) return result def MageListFromString(line): """Returns a new MageList, created from a string representation""" result = MageList() trans = {'off':('Off',True),\ 'on':('Off',False),\ 'color':'Color',\ 'radius':'Radius',\ 'nobutton':('NoButton',True),\ 'angle':'Angle',\ 'width':'Width',\ 'face':'Face',\ 'font':'Font',\ 'size':'Size'} line = re.sub('=\s*','=',line) label = extract_delimited(line, '{', '}') if label is not None: pieces = line.replace('{'+label+'}', '').split() else: pieces = line.split() style = pieces[0].strip('@')[:-4] #take off the 'list' part if style in MageList.KnownStyles: result.Style = style else: raise ValueError,"Unknown style: %s"%(style) result.Label = label #process all optional pieces for piece in pieces[1:]: try: key,value = [item.strip() for item in piece.split('=')] key = trans[key] except ValueError: #unpack list of wrong size key,value = trans[piece][0],trans[piece][1] #KeyError will be raised in case of an unknown key setattr(result,key,value) return result def MagePointFromString(line): """Constructs a new MagePoint from a one-line string.""" #handle the label if there is one: note that it might contain spaces line = line.strip() result = MagePoint() label = extract_delimited(line, '{', '}') if label: pieces = line.replace('{'+label+'}', '').split() else: pieces = line.split() fields = [] #also have to take into account the possibility of comma-delimited parts for p in pieces: fields.extend(filter(None, p.split(','))) pieces = fields result.Label = label #get the coordinates and remove them from the list of items result.Coordinates = map(float, pieces[-3:]) pieces = pieces[:-3] #parse the remaining attributes in more detail result.State = None result.Width = None result.Radius = 
None result.Color = None for attr in pieces: #handle radius if attr.startswith('r='): #radius: note case sensitivity result.Radius = float(attr[2:]) #handle single-character attributes elif len(attr) == 1: result.State = attr #handle line width elif attr.startswith('width'): result.Width = int(attr[5:]) else: #otherwise assume it's a color label result.Color = attr return result def _is_keyword(line): if line.startswith('@'): return True return False KeywordFinder = LabeledRecordFinder(_is_keyword) def MageParser(infile): """MageParser returns a new kinemage object, created from a string repr. infile: should be an iterable file object The MageParser works only on ONE kinemage object, so files containing more than one kinemage should be split beforehand. This can easily be adjusted if it would be useful in the future. The MageParser handles only certain keywords (@kinemage, @text, @caption, @___group, @____list) and MagePoints at this point in time. All unknown keywords are assumed to be part of the header, so you can find them in the header information. The lists that are part of the Simplex header are treated as normal lists. The 'text' and 'caption' are printed between the header and the first group (see cogent.format.mage). All text found after @text keywords is grouped and output as Kinemage.Text. WARNING: MageParser should work on all .kin files generated by the code in cogent.format.mage. There are no guarantees using it on other kinemage files, so always check your output!!! 
""" text = [] caption = [] header = [] group_pat = re.compile('@(sub)?group') list_pat = re.compile('@\w{3,8}list') last_group = None for rec in KeywordFinder(infile): first = rec[0] other=None if len(rec)>1: other = rec[1:] if first.startswith('@kinemage'): #kinemage k = Kinemage() count = int(first.replace('@kinemage','').strip()) k.Count = count elif first.startswith('@text'): #text if other: text.extend(other) elif first.startswith('@caption'): #caption if other: caption.extend(other) elif group_pat.match(first): #group m = MageGroupFromString(first) if m.Subgroup: #subgroup, should be appended to some other group if last_group is None: raise ValueError,"Subgroup found before first group" last_group.append(m) else: #normal group k.Groups.append(m) last_group = m elif list_pat.match(first): #list l = MageListFromString(first) if other: points = [MagePointFromString(p) for p in other] l.extend(points) last_group.append(l) else: #something else header.append(first) if other: header.extend(other) if text: k.Text = '\n'.join(text) if caption: k.Caption = '\n'.join(caption) if header: k.Header = '\n'.join(header) return k PyCogent-1.5.3/cogent/parse/meme.py000644 000765 000024 00000022746 12024702176 020142 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parses MEME output file and creates Module objects. Supports MEME version 3.0 - 4.8.1 """ from cogent.parse.record_finder import LabeledRecordFinder from cogent.parse.record import DelimitedSplitter from cogent.motif.util import Location, ModuleInstance, Module, Motif,\ MotifResults, make_remap_dict from cogent.core.moltype import DNA, RNA, PROTEIN __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" def getDataBlock(lines): """Returns main block of data as list. 
""" #Get main data block: All lines following "COMMAND LINE SUMMARY" meme_command = LabeledRecordFinder(lambda x: x.startswith('COMMAND')) main_block = list(meme_command(lines)) alphabet = getMolType(main_block[0]) return main_block[1], alphabet def getMolType(lines): """Returns alphabet type that sequences belong to. """ for line in lines: if 'ALPHABET' in line: alphabet_line = line #Split on equal sign alphabet_line = alphabet_line.strip().split('ALPHABET= ')[1].split()[0] #Get Set of alphabet letters alphabet = set(alphabet_line) #get Protein Set protein_order = set(PROTEIN.Alphabet) #get RNA Set rna_order = set(RNA.Alphabet) #Find out which alphabet is used if len(alphabet) >= 20: return PROTEIN elif alphabet == rna_order: return RNA else: return DNA def getCommandModuleBlocks(main_block): """Returns command line summary block and list of module blocks. """ #Get Command line summary and all module information meme_module = LabeledRecordFinder(lambda x: x.startswith('MOTIF')) main_block = list(meme_module(main_block)) command_block = main_block[0] module_blocks = [] if len(main_block) > 1: module_blocks = main_block[1:] return command_block, module_blocks def getSummaryBlock(module_blocks): """Returns summary of motifs block. """ meme_summary = LabeledRecordFinder(lambda x: x.startswith('SUMMARY'),\ constructor=None,ignore=lambda x: x.startswith(' ')) summary_block = list(meme_summary(module_blocks)) return summary_block[1] def dictFromList(data_list): """Returns a dict given a list. - Dict created from a list where list contains alternating key, value pairs. 
- ex: [key1, value1, key2, value2] returns: {key1:value1, key2:value2} """ data_dict = {} for i in range(0,len(data_list)-1,2): #If there is already a value for the given key if data_list[i] in data_dict: #Add the rest of data to the value string data_dict[data_list[i]] = data_dict[data_list[i]] + ' ' + \ data_list[i+1] else: #Otherwise add value to given key data_dict[data_list[i]] = data_list[i+1] return data_dict def extractCommandLineData(command_block): """Returns a dict of all command line data from MEME output. """ data_dict = {} #Get only necessary Command Line Summary data ignore = lambda x: x.startswith('*') meme_model = LabeledRecordFinder(lambda x: 'model:' in x, ignore=ignore) cmd_data = list(meme_model(command_block)) cmd_data = cmd_data[1] cmd_data = cmd_data[:-4] #Just return list of strings rather than parse data """ cmd_data = '^'.join(cmd_data) cmd_data = cmd_data.split() cmd_data = ' '.join(cmd_data) cmd_data = cmd_data.split(': ') lastkarat = DelimitedSplitter('^',-1) cmd_data_temp = [] for line in cmd_data: cmd_data_temp.extend(lastkarat(line)) cmd_data = '>'.join(cmd_data_temp) cmd_data = cmd_data.replace('= ','=') cmd_data = cmd_data.replace('^',' ') cmd_data = cmd_data.split('>') """ return cmd_data def getModuleDataBlocks(module_blocks): """Returns list data blocks for each module. """ #Get blocks of module information for each module meme_module_data = LabeledRecordFinder(lambda x: x.startswith('Motif')) module_data_blocks = [] for module in module_blocks: module_data_blocks.append(list(meme_module_data(module))) return module_data_blocks def extractModuleData(module_data, alphabet, remap_dict): """Creates Module object given module_data list. - Only works on 1 module at a time: only pass in data from one module. 
""" #Create Module object meme_module = {} #Only keep first 3 elements of the list module_data = module_data[:3] #Get Module general information: module_data[0] #Only need to keep first line general_dict = getModuleGeneralInfo(module_data[0][0]) module_length = int(general_dict['width']) #Get ModuleInstances: module_data[2] instance_data = module_data[2][4:-2] for i in xrange(len(instance_data)): instance_data[i] = instance_data[i].split() #Create a ModuleInstance object and add it to Module for each instance for instance in instance_data: seqId = remap_dict[instance[0]] start = int(instance[1])-1 Pvalue = float(instance[2]) sequence = instance[4] #Create Location object for ModuleInstance location = Location(seqId, start, start + module_length) #Create ModuleInstance mod_instance = ModuleInstance(sequence,location,Pvalue) #Add ModuleInstance to Module meme_module[(seqId,start)] = mod_instance meme_module = Module(meme_module, MolType=alphabet) #Get Multilevel Consensus Sequence meme_module.ConsensusSequence = getConsensusSequence(module_data[1]) #Pull out desired values from dict meme_module.Llr = int(general_dict['llr']) meme_module.Evalue = float(general_dict['E-value']) meme_module.ID = general_dict['MOTIF'] return meme_module def getConsensusSequence(first_block): """Returns multilevel consensus sequences string. """ for line in first_block: if line.upper().startswith('MULTILEVEL'): return line.split()[1] def getModuleGeneralInfo(module_general): """Returns dict with Module general information. - Module general information includes: - width, sites, llr, E-value """ module_id = module_general[:8] module_general = module_general[8:].strip().replace(' =','') module_general = module_general.split() #Get dict of Module general info from list general_dict = dictFromList(module_general) general_dict['MOTIF']=module_id[5:].strip() return general_dict def extractSummaryData(summary_block): """Returns dict of sequences and combined P values. 
- {'CombinedP':{ 'seqId1': Pvalue1, 'seqId2': Pvalue2,} } """ #Get slice of necessary data from summary_block summary = summary_block[7:] #print summary summary_dict = {} #Split on whitespace for i in xrange(len(summary)): summary[i] = summary[i].split() #Add necessary data to dict for seq in summary: #Stop when '--------------------' is found: end of data if seq[0].startswith('--------------------'): break if len(seq) < 3: continue summary_dict[seq[0]] = float(seq[1]) return {'CombinedP':summary_dict} def MemeParser(lines, allowed_ids=[]): """Returns a MotifResults object given a MEME results file. """ warnings = [] #Create MotifResults object meme_motif_results = MotifResults() #Get main block and alphabet main_block, alphabet = getDataBlock(lines) #Add alphabet to MotifResults object meme_motif_results.MolType = alphabet #Get command line summary block and module blocks command_block, module_blocks = getCommandModuleBlocks(main_block) if command_block: #Extract command line data and put in dict parameters_list = extractCommandLineData(command_block) #Add parameters dict to MotifResults object parameters meme_motif_results.Parameters = parameters_list #make sure modules were found if len(module_blocks) > 0: #Get Summary of motifs block summary_block = getSummaryBlock(module_blocks[-1]) #Extract summary data and get summary_dict summary_dict = extractSummaryData(summary_block) seq_names = summary_dict['CombinedP'].keys() if allowed_ids: remap_dict,warning = make_remap_dict(seq_names,allowed_ids) if warning: warnings.append(warning) sd = {} for k,v in summary_dict['CombinedP'].items(): sd[remap_dict[k]]=v summary_dict['CombinedP']=sd else: remap_dict = dict(zip(seq_names,seq_names)) #Add summary dict to MotifResults object meme_motif_results.Results = summary_dict #Add warnings to MotifResults object meme_motif_results.Results['Warnings']=warnings #Get blocks for each module module_blocks = getModuleDataBlocks(module_blocks) #Extract modules and put in 
MotifResults.Modules list for module in module_blocks: meme_motif_results.Modules.append(extractModuleData(module,\ alphabet,remap_dict)) for module in meme_motif_results.Modules: meme_motif_results.Motifs.append(Motif(module)) return meme_motif_results PyCogent-1.5.3/cogent/parse/mfold.py000644 000765 000024 00000000674 12024702176 020314 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.parse.ct import ct_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def mfold_parser(lines=None): """Parser for Mfold output""" result = ct_parser(lines) return result PyCogent-1.5.3/cogent/parse/mothur.py000644 000765 000024 00000003026 12024702176 020523 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file cogent.parse.mothur.py """Parses Mothur otu list""" from record_finder import is_empty __author__ = "Kyle Bittinger" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Kyle Bittinger"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kyle Bittinger" __email__ = "kylebittinger@gmail.com" __status__ = "Prototype" def parse_otu_list(lines, precision=0.0049): """Parser for mothur *.list file To ensure all distances are of type float, the parser returns a distance of 0.0 for the unique groups. However, if some sequences are very similar, mothur may return a grouping at zero distance. What Mothur really means by this, however, is that the clustering is at the level of Mothur's precision. In this case, the parser returns the distance explicitly. If you are parsing otu's with a non-default precision, you must specify the precision here to ensure that the parsed distances are in order. 
Returns an iterator over (distance, otu_list) """ for line in lines: if is_empty(line): continue tokens = line.strip().split('\t') distance_str = tokens.pop(0) if distance_str.lstrip().lower().startswith('u'): distance = 0.0 elif distance_str == '0.0': distance = float(precision) else: distance = float(distance_str) num_otus = int(tokens.pop(0)) otu_list = [t.split(',') for t in tokens] yield (distance, otu_list) PyCogent-1.5.3/cogent/parse/msms.py000644 000765 000024 00000001603 12024702176 020163 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parsers for the MSMS commandline applications. """ import numpy as np __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Production" def parse_VertFile(VertFile): """Read the vertex file (with vert extension) into a numpy array. Arguments: * VertFile - open vertex file as returned by the ``Msms`` application controller. Returns a numpy array of vertices. """ vertex_list = [] for line in VertFile.readlines(): elements = line.split() try: vertex = map(float, elements[0:3]) except ValueError: continue vertex_list.append(vertex) return np.array(vertex_list) PyCogent-1.5.3/cogent/parse/ncbi_taxonomy.py000644 000765 000024 00000021204 12024702176 022054 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Extracts data from NCBI nodes.dmp and names.dmp files. """ from cogent.core.tree import TreeNode from string import strip __author__ = "Jason Carnes" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jason Carnes", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jason Carnes" __email__ = "jason.carnes@sbri.org" __status__ = "Development" class MissingParentError(Exception): pass #Note: numbers not guaranteed to be consistent if new taxa are invented... 
RanksToNumbers = { 'forma':1, 'varietas':2, 'subspecies':3, 'species':4, 'species subgroup':5, 'species group':6, 'subgenus':7, 'genus':8, 'subtribe':9, 'tribe':10, 'subfamily':11, 'family':12, 'superfamily':13, 'parvorder':14, 'infraorder':15, 'suborder':16, 'order':17, 'superorder':18, 'infraclass':19, 'subclass':20, 'class':21, 'superclass':22, 'subphylum':23, 'phylum':24, 'superphylum':25, 'kingdom':26, 'superkingdom':27, 'no rank':28, } class NcbiTaxon(object): """Extracts taxon information: init from one line of NCBI's nodes.dmp. Properties: TaxonId ID of this node ParentId ID of this node's parent Rank Rank of this node: genus, species, etc. EmblCode Locus name prefix; not unique DivisionId From division.dmp DivisionInherited 1 or 0; 1 if node inherits division from parent TranslTable ID of this node's genetic code from gencode.dmp GCInherit 1 or 0; 1 if node inherits genetic code from parent TranslTableMt ID of this node's mitochondrial code from gencode.dmp TranslTableMtInherited 1 or 0; 1 if node inherits mt code from parent Hidden 1 or 0; 1 if hidden by default in GenBank's listing HiddenSubtreeRoot 1 or 0; 1 if no sequences from this subtree exist Comments free-text comments RankId Arbitrary number corresponding to rank. See RanksToNumbers. Name Name of this node: must get from external source. Thanks so much, NCBI... Expect a string: '' by default. 
""" Fields = ['TaxonId', 'ParentId', 'Rank', 'EmblCode', 'DivisionId', 'DivisionInherited', 'TranslTable', 'TranslTableInherited', 'TranslTableMt', 'TranslTableMtInherited', 'Hidden', 'HiddenSubtreeRoot', 'Comments'] def __init__(self, line): """Returns new NcbiTaxon from line containing taxonomy data.""" line_pieces = map(strip, line.split('|')) for i in [0, 1, 5, 6, 7, 8, 9, 10, 11]: line_pieces[i] = int(line_pieces[i]) #fix trailing delimiter last = line_pieces[-1] if last.endswith('|'): line_pieces[-1] = last[:-1] self.__dict__ = dict(zip(self.Fields, line_pieces)) self.Name = '' #will get name field from names.dmp; fillNames self.RankId = RanksToNumbers.get(self.Rank, None) def __str__(self): """Writes data out in format we got it.""" pieces = [str(getattr(self,f)) for f in self.Fields] #remember to set the parent of the root to itself if pieces[1] == 'None': pieces[1] = pieces[0] return '\t|\t'.join(pieces) + '\t|\n' def __cmp__(self, other): """Compare by taxon rank.""" try: return cmp(self.RankId, other.RankId) except AttributeError: return 1 #always sort ranked nodes above unranked def NcbiTaxonParser(infile): """Returns a sequence of NcbiTaxon objects from sequence of lines.""" for line in infile: if line.strip(): yield NcbiTaxon(line) def NcbiTaxonLookup(taxa): """Returns dict of TaxonId -> NcbiTaxon object.""" result = {} for t in taxa: result[t.TaxonId] = t return result class NcbiName(object): """Extracts name information: init from one line of NCBI's names.dmp. Properties: TaxonId TaxonId of this node Name Text representation of the name, e.g. Homo sapiens UniqueName The unique variant of this name if Name not unique NameClass Kind of name, e.g. scientific name, synonym, etc. 
""" Fields = ['TaxonId', 'Name', 'UniqueName', 'NameClass'] def __init__(self, line): """Returns new NcbiName from line containing name data.""" line_pieces = map(strip, line.split('|')) line_pieces[0] = int(line_pieces[0]) #convert taxon_id self.__dict__ = dict(zip(self.Fields, line_pieces)) def __str__(self): """Writes data out in similar format as the one we got it from.""" return '\t|\t'.join([str(getattr(self, f)) for f in self.Fields]) \ + '|\n' def NcbiNameParser(infile): """Returns sequence of NcbiName objects from sequence of lines.""" for line in infile: if line.strip(): yield NcbiName(line) def NcbiNameLookup(names): """Returns dict mapping taxon id -> NCBI scientific name.""" result = {} for name in names: if name.NameClass == 'scientific name': result[name.TaxonId] = name return result class NcbiTaxonomy(object): """Holds root node of a taxonomy tree, plus lookup by id or name.""" def __init__(self, taxa, names, strict=False): """Creates new taxonomy, using data in Taxa and Names. taxa should be the product of NcbiTaxonLookup. names should be the product of NcbiNameLookup. strict, if True, raises an error on finding taxa whose parents don't exist. Otherwise, will put them in self.Deadbeats keyed by parent ID. Note: because taxa is a dict, nodes will be added in arbitrary order. """ names_to_nodes = {} ids_to_nodes = {} for t_id, t in taxa.iteritems(): name_rec = names.get(t_id, None) if name_rec: name = name_rec.Name else: name = 'Unknown' t.Name = name node = NcbiTaxonNode(t) names_to_nodes[name] = node ids_to_nodes[t_id] = node self.ByName = names_to_nodes self.ById = ids_to_nodes deadbeats = {} #build the tree by connecting each node to its parent for t_id, t in ids_to_nodes.iteritems(): if t.ParentId == t.TaxonId: t.Parent = None else: try: ids_to_nodes[t.ParentId].append(t) except KeyError: #found a child whose parent doesn't exist if strict: raise MissingParentError, \ "Node %s has parent %s, which isn't in taxa." 
% \ (t_id, t.ParentId) else: deadbeats[t.ParentId] = t self.Deadbeats = deadbeats self.Root = t.root() def __getitem__(self, item): """If item is int, returns taxon by id: otherwise, searches by name. Returns the relevant NcbiTaxonNode. Will raise KeyError if not present. """ try: return self.ById[int(item)] except ValueError: return self.ByName[item] class NcbiTaxonNode(TreeNode): """Provides some additional methods specific to Ncbi taxa.""" def __init__(self, Data): """Returns a new NcbiTaxonNode object; requires NcbiTaxon to initialize.""" self.Data = Data self._parent = None self.Children = [] def getRankedDescendants(self, rank): """Returns all descendants of self with specified rank as flat list.""" curr = self.Rank if curr == rank: result = [self] else: result = [] for i in self: result.extend(i.getRankedDescendants(rank)) return result def _get_parent_id(self): return self.Data.ParentId ParentId = property(_get_parent_id) def _get_taxon_id(self): return self.Data.TaxonId TaxonId = property(_get_taxon_id) def _get_rank(self): return self.Data.Rank Rank = property(_get_rank) def _get_name(self): return self.Data.Name Name = property(_get_name) def NcbiTaxonomyFromFiles(nodes_file, names_file, strict=False): """Returns new NcbiTaxonomy from nodes and names files.""" taxa = NcbiTaxonLookup(NcbiTaxonParser(nodes_file)) names = NcbiNameLookup(NcbiNameParser(names_file)) return NcbiTaxonomy(taxa, names, strict) PyCogent-1.5.3/cogent/parse/newick.py000644 000765 000024 00000016132 12024702176 020467 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Newick format with all features as per the specs at: http://evolution.genetics.washington.edu/phylip/newick_doc.html http://evolution.genetics.washington.edu/phylip/newicktree.html ie: Unquoted label underscore munging Quoted labels Inner node labels Lengths [ ... ] Comments (discarded) Unlabeled tips also: Double quotes can be used. Spaces and quote marks are OK inside unquoted labels. 
""" from cogent.parse.record import FileFormatError import re EOT = None __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Andrew Butterfield", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class TreeParseError(FileFormatError): pass class _Tokeniser(object): """Supplies an iterable stream of Newick tokens from 'text' By default this is very forgiving of non-standard unquoted labels. Two options can change how unquoted labels are interpreted: To prohibit internal spaces and quotes set strict_labels=True. To disable conversion of '_' to ' ' set underscore_unmunge=False. NOTE: underscore_unmunging is part of the Newick standard, although it is often inconvenient for other purposes. """ def __init__(self, text, strict_labels=False, underscore_unmunge=True): self.text = text self.posn = None self.strict_unquoted_labels = strict_labels self.underscore_unmunge = underscore_unmunge def error(self, detail=""): if self.token: msg = 'Unexpected "%s" at ' % self.token else: msg = 'At ' (line, column) = self.posn sample = self.text.split('\n')[line][:column] if column > 30: sample = "..." + sample[-20:] if line > 0: msg += 'line %s:%s "%s"' % (line+1, column, sample) else: msg += 'char %s "%s"' % (column, sample) return TreeParseError(msg + '. 
' + detail) def tokens(self): closing_quote_token = None column = 0 line = 0 text = None closing_quote_token = None in_comment = False for token in re.split("""([\\t ]+|\\n|''|""|[]['"(),:;])""", self.text)+[EOT]: label_complete = False token_consumed = True self.token = token column += len(token or '') self.posn = (line, column) if token == "": pass elif in_comment: if token is EOT: raise self.error('Ended with unclosed comment') if token == ']': in_comment = False elif closing_quote_token: if token is EOT: raise self.error('Text ended inside quoted label') if token == '\n': raise self.error('Line ended inside quoted label') if token == closing_quote_token: label_complete = True closing_quote_token = None else: if token == closing_quote_token*2: token = token[0] text += token elif token is EOT or token in '\n[():,;': if text: text = text.strip() if self.underscore_unmunge and '_' in text: text = text.replace('_', ' ') label_complete = True if token == '\n': line += 1 column = 1 elif token == '[': in_comment = True else: token_consumed = False elif text is not None: text += token elif token in ["''", '""']: label_complete = True text = "" elif token in ["'", '"']: closing_quote_token = token text = "" elif token.strip(): text = token label_complete = self.strict_unquoted_labels if label_complete: self.token = None yield text text = None if not token_consumed: self.token = token yield token def parse_string(text, constructor, **kw): """Parses a Newick-format string, using specified constructor for tree. Calls constructor(children, name, attributes) Note: underscore_unmunge, if True, replaces underscores with spaces in the data that's read in. This is part of the Newick format, but it is often useful to suppress this behavior. 
""" if "(" not in text and ";" not in text and text.strip(): # otherwise "filename" is a valid (if small) tree raise TreeParseError('Not a Newick tree: "%s"' % text[:10]) sentinals = [';', EOT] stack = [] nodes = [] children = name = expected_attribute = None attributes = {} tokeniser = _Tokeniser(text, **kw) for token in tokeniser.tokens(): if expected_attribute is not None: (attr_name, attr_cast) = expected_attribute try: attributes[attr_name] = attr_cast(token) except ValueError: raise tokeniser.error("Can't convert %s '%s'" % (attr_name, token)) expected_attribute = None elif token == '(': if children is not None: raise tokeniser.error( "Two subtrees in one node, missing comma?") elif name or attributes: raise tokeniser.error( "Subtree must be first element of the node.") stack.append((nodes, sentinals, attributes)) (nodes, sentinals, attributes) = ([], [')'], {}) elif token == ':': if 'length' in attributes: raise tokeniser.error("Already have a length.") expected_attribute = ('length', float) elif token in [')', ';', ',', EOT]: nodes.append(constructor(children, name, attributes)) children = name = expected_attribute = None attributes = {} if token in sentinals: if stack: children = nodes (nodes, sentinals, attributes) = stack.pop() else: break elif token == ',' and ')' in sentinals: pass else: raise tokeniser.error("Was expecting to end with %s" % ' or '.join([repr(s) for s in sentinals])) else: if name is not None: raise tokeniser.error("Already have a name '%s' for this node." 
% name) elif attributes: raise tokeniser.error("Name should come before length.") name = token assert not stack, stack assert len(nodes) == 1, len(nodes) return nodes[0] PyCogent-1.5.3/cogent/parse/nexus.py000644 000765 000024 00000013110 12024702176 020342 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ parses Nexus formatted tree files and Branchlength info in log files """ import re from string import strip from cogent.parse.record import RecordError __author__ = "Catherine Lozupone" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Catherine Lozuopone", "Rob Knight", "Micah Hamady"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Catherine Lozupone" __email__ = "lozupone@colorado.edu" __status__ = "Production" def parse_nexus_tree(tree_f): """returns a dict mapping taxa # to name from the translation table, and a dict mapping tree name to dnd string; takes a handle for a Nexus formatted file as input""" trans_table = None tree_info = get_tree_info(tree_f) check_tree_info(tree_info) header_s, trans_table_s, dnd_s = split_tree_info(tree_info) if trans_table_s: trans_table = parse_trans_table(trans_table_s) dnd = parse_dnd(dnd_s) return trans_table, dnd def get_tree_info(tree_f): """returns the trees section of a Nexus file: takes a handle for a Nexus formatted file as input: returns the section describing trees as a list of strings""" in_tree = False result = [] for line in tree_f: #get lines from the 'Begin trees;' tag to the 'End;' tag line_lower = line.lower() if line_lower.startswith('begin trees;'): in_tree = True if in_tree: if line_lower.startswith('end;') or \ line_lower.startswith('endblock;'): return result else: result.append(line) def check_tree_info(tree_info): """makes sure that there is a tree section in the file""" if tree_info: pass else: raise RecordError, "not a valid Nexus Tree File" def split_tree_info(tree_info): """Returns header, table, and dnd info from tree section of Nexus file.: Expects 
to receive the output of get_tree_info""" header = [] trans_table = [] dnd = [] state = "in_header" for line in tree_info: line_lower = line.lower() if state == "in_header": header.append(line) if line_lower.strip() == 'translate': state = "in_trans" elif line_lower.startswith('tree'): state = "in_dnd" dnd.append(line) elif state == "in_trans": trans_table.append(line) if line.strip() == ';': state = "in_dnd" elif state == "in_dnd": dnd.append(line) return header, trans_table, dnd def parse_trans_table(trans_table): """returns a dict with the taxa names indexed by number""" result = {} for line in trans_table: line = line.strip() if line == ';': pass else: label, name = line.split(None, 1) #take comma out of name if it is there if name.endswith(','): name = name[0:-1] # remove single quotes if name.startswith("'") and name.endswith("'"): name = name[1:-1] result[label] = name return result def parse_dnd(dnd):#get rooted info """returns a dict with dnd indexed by name""" dnd_dict = {} for line in dnd: line = line.strip() name, dnd_s = map(strip, line.split('=', 1)) #get dnd from dnd_s and populate dnd_index = dnd_s.find('(') data = dnd_s[dnd_index:] dnd_dict[name] = data return dnd_dict def get_BL_table(branch_lengths): """returns the section of the log file with the BL table as a list of strings""" in_table = 0 result = [] beg_tag = re.compile('\s+Node\s+to node\s+length') end_tag = re.compile('Sum') for line in branch_lengths: if end_tag.match(line): in_table = 0 if beg_tag.match(line): in_table = 1 if in_table == 1: if line.startswith("---") or beg_tag.match(line) or line.strip()== '': pass else: result.append(line) return result def find_fields(line, field_order = ["taxa", "parent", "bl"], \ field_delims = [0, 21, 36, 49]): """takes line from BL table and returns dict with field names mapped to info field order is the order of field names to extract from the file and field_delims is a list of index numbers indicating where the field is split """ field_dict = {} 
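A stdlib-only sketch (hypothetical helper name, not PyCogent code) of the fixed-width slicing that find_fields performs on a PAUP* branch-length table row: each field is the stripped text between successive column offsets.

```python
# Hypothetical helper illustrating find_fields: map field names to the
# stripped text between successive column offsets of a fixed-width row.
def find_fields_sketch(line, field_order=("taxa", "parent", "bl"),
                       field_delims=(0, 21, 36, 49)):
    fields = {}
    for i, name in enumerate(field_order):
        start = field_delims[i]
        # the last named field runs to the end of the line
        end = field_delims[i + 1] if i + 1 < len(field_delims) else None
        fields[name] = line[start:end].strip()
    return fields

# a row padded to the expected column widths
row = "node_1 (1)".ljust(21) + "2".ljust(15) + "0.0213"
print(find_fields_sketch(row))
# -> {'taxa': 'node_1 (1)', 'parent': '2', 'bl': '0.0213'}
```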
for i, f in enumerate(field_order): start = field_delims[i] try: end = field_delims[i + 1] except IndexError: end = None field_dict[f] = line[start:end].strip() return field_dict def parse_taxa(taxa_field): """gets taxa # from taxa field extracted with find_fields""" #look for lines with a number in parentheses term_match = re.search(r'\(\d+\)', taxa_field) if not term_match: data = taxa_field else: term = term_match.group(0) data_match = re.search(r'\d+', term) data = data_match.group(0) return data def parse_PAUP_log(branch_lengths): """gets branch length info from a PAUP log file returns a dictionary mapping the taxon number to the parent number and the branch length""" BL_table = get_BL_table(branch_lengths) BL_dict = {} for line in BL_table: info = find_fields(line) parent = info["parent"] bl = float(info["bl"]) taxa = parse_taxa(info["taxa"]) BL_dict[taxa] = (parent, bl) return BL_dict PyCogent-1.5.3/cogent/parse/nupack.py000644 000765 000024 00000005167 12024702176 020476 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from string import strip,split,atof from cogent.util.transform import make_trans from cogent.struct.rna2d import Pairs,ViennaStructure from cogent.struct.knots import opt_single_random __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def nupack_parser(lines=None,pseudo=True): """Parser for NUPACK output format pseudo - If True pseudoknot will be keept if False it will be removed """ result = line_parser(lines,pseudo) return result curly_to_dots_table = make_trans('{}','..') bracket_to_dots_table = make_trans('()','..') curly_to_bracket_table = make_trans('{}','()') def line_parser(lines=None,pseudo=True): """Parser for nupack output format Returns list containing: sequence, paris and energy ex: [[seq,[struct],energy]] """ record 
= False result = [] SSEList = [] #Sequence,Structure,Energy for line in lines: if line.startswith('Error'):#Error no structure found result = [Pairs([])] #return empty pairs list return result if line.startswith('Sequence and a Minimum Energy Structure'): record = True elif record: line = line.strip('\n') SSEList.append(line) SSEList[1] = to_pairs(SSEList,pseudo) #pairs SSEList = SSEList[:3] SSEList[-1] = atof(SSEList[-1].split()[2]) #energy result.append(SSEList) return result def to_pairs(list=None,pseudo=True): """ Converts nupack structure string into pairs object pseudoknotted and not pseudoknotted. """ tmp = list[1] pairs = [] if list.__contains__('pseudoknotted!'): #since pseudoknotted is denoted by {} it divides into {} and () string #they are then turned into pairs lists seperatly and then the lists are #joined to form the complete set of pairs first = second = tmp first = ViennaStructure(first.translate(curly_to_dots_table)) second = second.translate(bracket_to_dots_table) second = ViennaStructure(second.translate(curly_to_bracket_table)) pairs = first.toPairs() pairs.extend(second.toPairs()) pairs.sort() if not pseudo: pairs = opt_single_random(pairs) pairs.sort() else: structure = ViennaStructure(tmp.translate(curly_to_bracket_table)) pairs = structure.toPairs() pairs.sort() return pairs PyCogent-1.5.3/cogent/parse/paml.py000644 000765 000024 00000001663 12024702176 020143 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" def PamlParser(f): d = f.readline().split() numseqs, seqlen = int(d[0]), int(d[1]) for i in range(numseqs): seqname = f.readline().strip() if not seqname: raise ValueError('Sequence name missing') currseq = [] length = 0 while length < seqlen: seq_line = 
f.readline() if not seq_line: raise ValueError('Sequence "%s" is short: %s < %s' % (seqname, length, seqlen)) seq_line = seq_line.strip() length += len(seq_line) currseq.append(seq_line) yield (seqname, ''.join(currseq)) PyCogent-1.5.3/cogent/parse/paml_matrix.py000644 000765 000024 00000003057 12024702176 021526 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import numpy Float = numpy.core.numerictypes.sctype2char(float) from cogent.evolve import substitution_model __author__ = "Matthew Wakefield" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Matthew Wakefield", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Matthew Wakefield" __email__ = "wakefield@wehi.edu.au" __status__ = "Production" three_letter_order = 'ARNDCQEGHILKMFPSTWYV' aa_order = numpy.array([ord(aa) for aa in three_letter_order]) reorder = numpy.argsort(aa_order) def numbers_in(f): for line in f: for word in line.split(): yield float(word) def PamlMatrixParser(f): """Parses a matrix of amino acid transition probabilities and amino acid frequencies in the format used by PAML and returns a symetric array in single letter alphabetical order and a dictionary of frequencies for use by substitution_model.EmpiricalProteinMatrix""" matrix = numpy.zeros([20,20], Float) next_number = numbers_in(f).next for row in range(1,20): for col in range(0, row): matrix[row,col] = matrix[col,row] = next_number() freqs = [next_number() for i in range(20)] total = sum(freqs) assert abs(total-1) < 0.001, freqs freqs = [freq/total for freq in freqs] matrix = numpy.take(matrix, reorder, 0) matrix = numpy.take(matrix, reorder, 1) assert numpy.alltrue(matrix == numpy.transpose(matrix)) freqs = dict(zip(three_letter_order, freqs)) return (matrix, freqs) PyCogent-1.5.3/cogent/parse/pdb.py000644 000765 000024 00000025357 12024702176 017765 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """PDB parser class and parsing utility functions.""" from re import 
compile from numpy import array, linalg from cogent.data.protein_properties import AA_NAMES from cogent.core.entity import StructureBuilder, ConstructionWarning, ConstructionError __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" match_coords = compile('^HETATM|ATOM|MODEL') # start of coordinates match_trailer = compile('^CONECT') # end of coordinates # default PDB format string -> \n neccesery for writelines # try not to parse a second \n eg. by trying to parse charge PDB_COORDS_STRING = "%s%5i %-4s%c%3s %c%4i%c %8.3f%8.3f%8.3f%6.2f%6.2f %4s%2s\n" PDB_TER_STRING = "%s%5i %-4s%c%3s %c%4i%c\n" def dict2pdb(d): """Transform an atom dictionary into a valid PDB line.""" (x, y, z) = d['coords'] args = (d['at_type'], d['ser_num'], d['at_name'], d['alt_loc'], d['res_name'][-3:], d['chain_id'], d['res_id'], d['res_ic'], x , y , z , d['occupancy'], d['bfactor'], d['seg_id'], d['element']) return PDB_COORDS_STRING % args def dict2ter(d): """Transforms an atom dictionary into a valid TER line AFTER it.""" args = ('TER ', d['ser_num'] + 1, ' ', ' ', d['res_name'][-3:], d['chain_id'], d['res_id'], d['res_ic']) return PDB_TER_STRING % args def pdb2dict(line): """Parses a valid PDB line into an atomic dictionary.""" at_type = line[0:6] ser_num = int(line[6:11]) # numbers are ints's? at_name = line[12:16] # " N B " at_id = at_name.strip() # "N B" alt_loc = line[16:17] # always keep \s res_name = line[17:20] # non standard is 4chars long chain_id = line[21:22] res_id = int(line[22:26].strip()) # pdb requirement int res_ic = line[26:27] x = line[30:38] # no float conversion necessery y = line[38:46] # it gets loaded straight into an z = line[46:54] # numpy array. 
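The column offsets used by pdb2dict follow the fixed-width layout of PDB ATOM/HETATM records. A minimal stdlib-only sketch (hypothetical helper, not the PyCogent implementation) shows the same slicing; the record is built with a format string so the column arithmetic is easy to verify.

```python
# Hypothetical helper (not part of PyCogent): slice an ATOM record by the
# fixed columns of the PDB format, mirroring pdb2dict above.
def atom_fields(line):
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21:22],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),
    }

# Build a correctly aligned record with a format string so the columns
# (serial at 6-10, coords starting at 30, etc.) line up exactly.
rec = "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    "ATOM", 1, " N", " ", "MET", "A", 1, " ",
    11.104, 13.207, 2.100, 1.00, 20.00)
print(atom_fields(rec)["xyz"])  # -> (11.104, 13.207, 2.1)
```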
occupancy = float(line[54:60]) bfactor = float(line[60:66]) seg_id = line[72:76] element = line[76:78] h_flag = ' ' # this is the default if at_type == 'HETATM': # hetatms get the het flag (it's not used for writing) if res_name not in ('MSE', 'SEL'): # SeMet are not ligands h_flag = 'H' res_name = '%s_%s' % (h_flag, res_name) at_long_id = (at_id, alt_loc) res_long_id = (res_name, res_id, res_ic) coords = array((x, y, z)).astype("double") result = { 'at_type': at_type, 'ser_num': ser_num, 'at_name': at_name, 'at_id': at_id, 'alt_loc': alt_loc, 'res_name': res_name, 'chain_id': chain_id, 'res_id': res_id, 'res_ic': res_ic, 'h_flag': h_flag, 'coords': coords, 'occupancy': occupancy, 'bfactor': bfactor, 'seg_id': seg_id, 'res_long_id': res_long_id, 'at_long_id': at_long_id, 'element':element} return result def get_symmetry(header): """Extracts symmetry operations from header, either by parsing of conversion matrices (SMTRY or BIOMT) or using CCTBX based on the space group group data name and a,b,c,alpha,beta,gamma. """ header_result = {} for (mode, remark) in (('uc', 'REMARK 290 SMTRY'), ('bio', 'REMARK 350 BIOMT')): # parsing the raw-header matrices # uc: parsing the symmetry matrices for a given space group # bio: parsing the symmetry matrices to construct the biological molecule #'REMARK 290 SMTRY3 96 -1.000000 0.000000 0.000000 0.00000' # how should we deal with the ORIGx matrix? # needed function to check if the SCALEn matrix is correct with # respect to the CRYST1 card. 
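The matrix-grouping logic described in the comments above can be sketched with a hedged, stdlib-only helper (hypothetical name; the real parser also handles BIOMT, SCALE, and APPLY records): consecutive SMTRY1-3 rows sharing one operator number form a 3x4 matrix, which is closed with a homogeneous [0, 0, 0, 1] row.

```python
# Hypothetical helper: group REMARK 290 SMTRY rows into 4x4 symmetry
# operator matrices, a simplified version of the parsing above.
def smtry_matrices(header_lines):
    ops = {}
    for line in header_lines:
        parts = line.split()
        if len(parts) < 8 or parts[:2] != ["REMARK", "290"] \
                or not parts[2].startswith("SMTRY"):
            continue
        op = int(parts[3])  # operator number shared by SMTRY1/2/3 rows
        ops.setdefault(op, []).append([float(x) for x in parts[4:]])
    # close each 3x4 operator with the homogeneous row
    return [rows + [[0.0, 0.0, 0.0, 1.0]] for _, rows in sorted(ops.items())]

demo = [
    "REMARK 290   SMTRY1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 290   SMTRY2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 290   SMTRY3   1  0.000000  0.000000  1.000000        0.00000",
]
print(smtry_matrices(demo))  # one 4x4 identity operator
```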
mx_num = 1 new_chain = False fmx, mxs, mx, cmx = [], [], [], [] for line in header: if line.startswith('SCALE') and mode == 'uc': data = map(float, line[6:].split()[:-1]) fmx.append(data) elif line.startswith(remark): m_line = map(float, line[20:].split()) if mx_num != m_line[0] or new_chain: mx.append([ 0., 0., 0., 1.]) mxs.append(mx) mx = [] if mode == 'bio': if new_chain: new_chain = False else: cmx[-1][1] += 1 mx_num = m_line[0] mx.append(m_line[1:]) elif line.startswith('REMARK 350 APPLY') and mode == 'bio': if cmx: new_chain = True; cmx[-1][1] += 1 chains = [(c.strip(),) for c in line[42:].split(',')] cmx.append([chains, 0]) mx.append([ 0., 0., 0., 1.]) # finish the last matrix mxs.append(mx) mxs = array(mxs) # symmetry matrices header_result[mode + "_mxs"] = mxs if mode == 'uc': fmx = array(fmx) # fractionalization_matrix omx = linalg.inv(fmx) # orthogonalization_matrix header_result["uc_fmx"] = fmx header_result["uc_omx"] = omx elif mode == 'bio': cmx[-1][1] += 1 header_result[mode + "_cmx"] = cmx return header_result def get_coords_offset(line_list): """Determine the line number where coordinates begin.""" i = 0 for i, line in enumerate(line_list): if match_coords.match(line): break return i def get_trailer_offset(line_list): """Determine the line number where coordinates end.""" i = 0 for i, line in enumerate(line_list): if match_trailer.match(line): break return i def parse_header(header): """Parse parts of the PDB header.""" id = (compile('HEADER\s{4}\S+.*\S+\s*\S{9}\s+(\S{4})\s*$'), 'id') dt = (compile('HEADER\s{4}\S+.*\S+\s+(\S{9})\s+\S{4}\s*$'), 'date') nm = (compile('HEADER\s{4}(\S+.*\S+)\s+\S{9}\s+\S{4}\s*$'), 'name') mc = (compile('REMARK 280\s+MATTHEWS COEFFICIENT,\s+VM\s+\(ANGSTROMS\*\*3/DA\):\s+(\d+\.\d+)'), 'matthews') sc = (compile('REMARK 280\s+SOLVENT CONTENT,\s+VS\s+\(%\):\s+(\d+\.\d+)'), 'solvent_content') sg = (compile('REMARK 290\s+SYMMETRY OPERATORS FOR SPACE GROUP:\s+(.*\S)'), 'space_group') xt = (compile('REMARK 200\s+EXPERIMENT 
TYPE\s+:\s+(.*\S)'), 'experiment_type') rs = (compile('REMARK 2\s+RESOLUTION\.\s+(\d+\.\d+)\s+ANGSTROMS\.'), 'resolution') rf = (compile('REMARK 3\s+FREE R VALUE\s+:\s+(\d+\.\d+)'), 'r_free') xd = (compile('EXPDTA\s+([\w\-]+).*'), 'expdta') c1 = (compile('CRYST1\s+((\d+\.\d+\s+){6})'), 'cryst1') ra = (compile('DBREF\s+\S{4}\s+\S\s+\d+\s+\d+\s+\S+\s+(\S+)\s+\S+\s+\d+\s+\d+\s+$'), 'dbref_acc') rx = (compile('DBREF\s+\S{4}\s+\S\s+\d+\s+\d+\s+\S+\s+\S+\s+(\S+)\s+\d+\s+\d+\s+$'), 'dbref_acc_full') #CRYST1 60.456 60.456 82.526 90.00 90.00 90.00 P 41 4 #DBREF 1UI9 A 1 122 UNP Q84FH6 Q84FH6_THETH 1 122 \n' tests = [id, dt, nm, mc, sc, sg, xt, rs, rf, xd, c1, ra, rx] results = {} for line in header: for (regexp, name) in tests: try: results.update({name:regexp.search(line).group(1).strip()}) continue # taking only the first grep. except AttributeError: pass return results def parse_coords(builder, coords, forgive=1): """Parse coordinate lines.""" current_model_id = 0 model_open = False current_chain_id = None current_seg_id = None current_res_long_id = None current_res_name = None for line in coords: record_type = line[0:6] if record_type == 'MODEL ': builder.initModel(current_model_id) current_model_id += 1 model_open = True current_chain_id = None current_res_id = None elif record_type == 'ENDMDL': model_open = False current_chain_id = None current_res_id = None elif record_type == 'ATOM ' or record_type == 'HETATM': if not model_open: builder.initModel(current_model_id) current_model_id += 1 model_open = 1 new_chain = False fields = pdb2dict(line) if current_seg_id != fields['seg_id']: current_seg_id = fields['seg_id'] builder.initSeg(current_seg_id) if current_chain_id != fields['chain_id']: current_chain_id = fields['chain_id'] new_chain = True try: builder.initChain(current_chain_id) except ConstructionWarning: if not forgive: raise ConstructionError if current_res_name != fields['res_name'] or current_res_long_id != fields['res_long_id'] or new_chain: 
current_res_long_id = fields['res_long_id'] current_res_name = fields['res_name'] try: builder.initResidue(fields['res_long_id'], fields['h_flag']) except ConstructionWarning: if not forgive: raise ConstructionError new_chain = False try: builder.initAtom(fields['at_long_id'], fields['at_name'], fields['ser_num'], \ fields['coords'], fields['occupancy'], fields['bfactor'], \ fields['element']) except ConstructionError: if not forgive > 1: raise ConstructionError return builder.getStructure() def parse_trailer(trailer): return {} def PDBParser(open_file, structure_id=None, forgive=2): """Parse a PDB file and return a Structure object.""" file_ = open_file.readlines() builder = StructureBuilder() c_offset = get_coords_offset(file_) t_offset = get_trailer_offset(file_) raw_coords = file_[c_offset:t_offset] raw_trailer = file_[t_offset:] raw_header = file_[:c_offset] parsed_header = parse_header(raw_header) parsed_trailer = parse_trailer(raw_trailer) structure_id = (structure_id or parsed_header.get('id')) builder.initStructure(structure_id) structure = parse_coords(builder, raw_coords, forgive) # only X-ray structures will contain crystallographic data if parsed_header.get('expdta') == 'X-RAY': symetry_info = get_symmetry(raw_header) parsed_header.update(symetry_info) structure.header = parsed_header structure.trailer = parsed_trailer structure.raw_header = raw_header structure.raw_trailer = raw_trailer return structure PyCogent-1.5.3/cogent/parse/pfold.py000644 000765 000024 00000001613 12024702176 020311 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from string import split,strip from cogent.struct.rna2d import Pairs,ViennaStructure from cogent.parse.column import column_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def 
pfold_parser(lines): """Parser for Pfold output """ tree,lines = tree_struct_sep(lines) result = column_parser(lines) return result def tree_struct_sep(lines): """Separates tree structure from rest of the data. This is done to get an excepted format for the column_parser """ indx = None for line in lines: if line.startswith('; ********'): indx = lines.index(line)+1 break return lines[:indx],lines[indx:] PyCogent-1.5.3/cogent/parse/phylip.py000644 000765 000024 00000007543 12024702176 020522 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.parse.record import RecordError from cogent.core.alignment import Alignment __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Peter Maxwell", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Prototype" def is_blank(x): """Checks if x is blank.""" return not x.strip() def _get_header_info(line): """ Get number of sequences and length of sequence """ header_parts = line.split() num_seqs, length = map(int, header_parts[:2]) is_interleaved = len(header_parts) > 2 return num_seqs, length, is_interleaved def _split_line(line, id_offset): """ First 10 chars must be blank or contain id info """ if not line or not line.strip(): return None, None # extract id and sequence curr_id = line[0:id_offset].strip() curr_seq = line[id_offset:].strip().replace(" ", "") return curr_id, curr_seq def MinimalPhylipParser(data, id_map=None, interleaved=True): """Yields successive sequences from data as (label, seq) tuples. **Need to implement id map. **NOTE if using phylip interleaved format, will cache entire file in memory before returning sequences. If phylip file not interleaved then will yield each successive sequence. 
data: sequence of lines in phylip format (an open file, list, etc) id_map: optional id mapping from external ids to phylip labels - not sure if we're going to implement this returns (id, sequence) tuples """ seq_cache = {} interleaved_id_map = {} id_offset = 10 curr_ct = -1 for line in data: if curr_ct == -1: # get header info num_seqs, seq_len, interleaved = _get_header_info(line) if not num_seqs or not seq_len: return curr_ct += 1 continue curr_id, curr_seq = _split_line(line, id_offset) # skip blank lines if not curr_id and not curr_seq: continue if not interleaved: if curr_id: if seq_cache: yield seq_cache[0], ''.join(seq_cache[1:]) seq_cache = [curr_id, curr_seq] else: seq_cache.append(curr_seq) else: curr_id_ix = curr_ct % num_seqs if (curr_ct + 1) % num_seqs == 0: id_offset = 0 if curr_id_ix not in interleaved_id_map: interleaved_id_map[curr_id_ix] = curr_id seq_cache[curr_id_ix] = [] seq_cache[curr_id_ix].append(curr_seq) curr_ct += 1 # return joined sequences if interleaved if interleaved: for curr_id_ix, seq_parts in seq_cache.items(): join_seq = ''.join(seq_parts) if len(join_seq) != seq_len: raise RecordError( "Length of sequence '%s' is not the same as in header " "Found %d, Expected %d" % ( interleaved_id_map[curr_id_ix], len(join_seq), seq_len)) yield interleaved_id_map[curr_id_ix], join_seq #return last seq if not interleaved else: if seq_cache: yield seq_cache[0], ''.join(seq_cache[1:]) def get_align_for_phylip(data, id_map=None): """ Convenience function to return aligment object from phylip data data: sequence of lines in phylip format (an open file, list, etc) id_map: optional id mapping from external ids to phylip labels - not sure if we're going to implement this returns Alignment object """ mpp = MinimalPhylipParser(data, id_map) tuples = [] for tup in mpp: tuples.append(tup) return Alignment(tuples) PyCogent-1.5.3/cogent/parse/pknotsrg.py000644 000765 000024 00000004744 12024702176 021064 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env 
python """Parser for NUPACK output format If pseudoknotted first steam will be denoted by [] brackets and second steam with {} brackets """ from string import strip,split,atof from cogent.util.transform import make_trans from cogent.struct.rna2d import Pairs,ViennaStructure from cogent.struct.knots import opt_single_random __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def pknotsrg_parser(lines=None,pseudo=True): """Parser for pknotsrg output format Returns a list containing: sequence, structure and energy ex: [[seq,[structure],energy]] pseudo - If True pairs will be returned with pseudoknots If False pairs will be returned without pseudoknots """ result = [] struct = str(lines[1]).strip('\n') seq = lines[0].strip() tmp_pairs,energy = to_pairs(struct) tmp_pairs.sort() if not pseudo: tmp_pairs = opt_single_random(tmp_pairs) tmp_pairs.sort() result.append([seq,tmp_pairs,energy]) return result primary_table = make_trans('{[]}','....') first_table = make_trans('({[]})','..()..') second_table = make_trans('([{}])','..()..') def to_pairs(struct=None): """ Converts structure string in to a pairs object. Starts by checking for pseudoknots if pseudoknotted it translates each steam in to vienna notation and from there makes a pairs object. 
Each pairs object is then joined to form the final pairs object of the entire structure Returns a tuple of the pairs object and the energy """ primary = first = second = struct.split(None,2)[0] energy = atof(struct.split(None,2)[1].strip('()')) if struct.__contains__('['): #Checks for first pseudoknot steam primary = ViennaStructure(primary.translate(primary_table)) first = ViennaStructure(first.translate(first_table)) pairs = primary.toPairs() pairs.extend(first.toPairs()) #Adds the first steam to pairs object if struct.__contains__('{'): #Checks for second pseudo steam second = ViennaStructure(second.translate(second_table)) pairs.extend(second.toPairs()) else: primary = ViennaStructure(primary.translate(primary_table)) pairs = primary.toPairs() return pairs,energy PyCogent-1.5.3/cogent/parse/psl.py000644 000765 000024 00000004620 12024702176 020004 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parser for PSL format (default output by blat). Compatible with blat v.34 """ from cogent import LoadTable from cogent.parse.table import ConvertFields __author__ = "Gavin Huttley, Anuj Pahwa" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight","Peter Maxwell", "Gavin Huttley", "Anuj Pahwa"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Development" def make_header(lines): """returns one header line from multiple header lines""" lengths = map(len, lines) max_length = max(lengths) for index, line in enumerate(lines): if lengths[index] != max_length: for i in range(lengths[index], max_length): line.append('') header = [] for t, b in zip(*lines): if t.strip().endswith('-'): c = t.strip()+b else: c = ' '.join([t.strip(), b.strip()]) header += [c.strip()] return header int_series = lambda x: map(int, x.replace(',',' ').split()) row_converter = ConvertFields([(i, int) for i in range(8)]+\ [(i, int) for i in range(10, 13)]+\ [(i, int) for i in 
range(14, 18)]+\ [(i, int_series) for i in range(18, 21)]) def MinimalPslParser(data, row_converter=row_converter): """returns version, header and rows from data""" if type(data) == str: data = open(data) psl_version = None header = None rows = [] for record in data: if psl_version is None: assert 'psLayout version' in record psl_version = record.strip() yield psl_version continue if not record.strip(): continue if header is None and record[0] == '-': header = make_header(rows) yield header rows = [] continue rows += [record.rstrip().split('\t')] if header is not None: yield row_converter(rows[0]) rows = [] def PslToTable(data): """converts psl format to a table""" parser = MinimalPslParser(data) version = parser.next() header = parser.next() rows = [row for row in parser] table = LoadTable(header=header, rows=rows, title=version) return table PyCogent-1.5.3/cogent/parse/rdb.py000644 000765 000024 00000012470 12024702176 017757 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides a parser for Rdb format files. Data in from the European rRNA database in distribution format. 
""" from string import strip, maketrans from cogent.parse.record_finder import DelimitedRecordFinder from cogent.parse.record import RecordError from cogent.core.sequence import Sequence, RnaSequence from cogent.core.info import Info from cogent.core.alphabet import AlphabetError __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" RdbFinder = DelimitedRecordFinder('//') _field_names = {'acc':'rRNA',\ 'src':'Source',\ 'str':'Strain',\ 'ta1':'Taxonomy1',\ 'ta2':'Taxonomy2',\ 'ta3':'Taxonomy3',\ 'ta4':'Taxonomy4',\ 'chg':'Changes',\ 'rem':'Remarks',\ 'aut':'Authors',\ 'ttl':'Title',\ 'jou':'Journal',\ 'dat':'JournalYear',\ 'vol':'JournalVolume',\ 'pgs':'JournalPages',\ 'mty':'Gene',\ 'del':'Deletions',\ 'seq':'Species'} def InfoMaker(header_lines): """Returns an Info object constructed from the headerLines.""" info = Info() for line in header_lines: all = line.strip().split(':',1) #strip out empty lines, lines without name, lines without colon if not all[0] or len(all) != 2: continue try: name = _field_names[all[0]] except KeyError: name = all[0] value = all[1].strip() info[name] = value return info def is_seq_label(x): "Check if x looks like a sequence label line.""" return x.startswith('seq:') def MinimalRdbParser(infile,strict=True): """Yield successive sequences as (headerLines, sequence) tuples. If strict is True (default) raises RecordError when 'seq' label is missing and if the record doesn't contain any sequences. 
""" for rec in RdbFinder(infile): index = None for line in rec: if is_seq_label(line): index = rec.index(line) + 1 #index of first sequence line # if there is no line that starts with 'seq:' throw error or skip if not index: if strict: raise RecordError, "Found Rdb record without seq label "\ + "line: %s"%rec[0] else: continue headerLines = rec[:index] sequence = ''.join(rec[index:-1]) #strip off the delimiter if sequence.endswith('*'): sequence = sequence[:-1] #strip off '*' #if there are no sequences throw error or skip if not sequence: if strict: raise RecordError, "Found Rdb record without sequences: %s"\ %rec[0] else: continue yield headerLines, sequence def create_acceptable_sequence(sequence): """Return clean sequence as string, 'o' -> '?', sec. structure deleted sequence: string of characters SeqConstructor: constructor function for sequence creation Will replace 'o' by '?'. Will strip out secondary structure annotation. """ t = maketrans('o','?') # strip out secondary structure annotation {}[]()^ return sequence.translate(t, "{}[]()^") #should be accepted by RnaSequence def RdbParser(lines, SeqConstructor=RnaSequence, LabelConstructor=InfoMaker, \ strict=True): """Yield sequences from the Rdb record. lines: a stream of Rdb records. SeqConstructor: constructor function to create the final sequence object LabelConstructor: function that creates Info dictionary from label lines strict: boolean, when True, an error is raised when one occurs, when False, the record is ignored when an error occurs. This function returns proper RnaSequence objects when possible. It strips out the secondary structure information, and it replaces 'o' by '?'. The original sequence is stored in the info dictionary under 'OriginalSeq'. If the original sequence is the desired end product, use MinimalRdbParser. 
""" for header, sequence in MinimalRdbParser(lines,strict=strict): info = LabelConstructor(header) clean_seq = create_acceptable_sequence(sequence) # add original raw sequence to info info['OriginalSeq'] = sequence if strict: #need to do error checking while constructing info and sequence try: yield SeqConstructor(clean_seq, Info = info) except AlphabetError: raise RecordError(\ "Sequence construction failed on record with reference %s."\ %(info.Refs)) else: #not strict: just skip any record that raises an exception try: yield SeqConstructor(clean_seq, Info=info) except: continue if __name__ == '__main__': from sys import argv filename = argv[1] for sequence in RdbParser(open(filename)): print sequence.Info.Species print sequence PyCogent-1.5.3/cogent/parse/record.py000644 000765 000024 00000047632 12024702176 020476 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides support functions and classes for parsers. """ from copy import deepcopy from cogent.util.misc import iterable __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" class FileFormatError(Exception): """Exception raised when a file can not be parsed.""" pass class RecordError(FileFormatError): """Exception raised when a record is bad.""" pass class FieldError(RecordError): """Exception raised when a field within a record is bad.""" pass class Grouper(object): """Acts as iterator that returns lists of n items at a time from seq. Note: returns a partial list if not evenly divisible by n. 
""" def __init__(self, NumItems=1): """Returns new Grouper object: will return n items at a time from seq""" self.NumItems = NumItems def __call__(self, seq): """Returns iterator over seq, returning n items at a time.""" try: num = int(self.NumItems) assert num >= 1 except: raise ValueError, "Grouper.NumItems must be positive int, not %s" \ % (self.NumItems) curr = [] for i, item in enumerate(seq): if (i % num == 0) and curr: yield curr curr = [item] else: curr.append(item) #return any leftover items if curr: yield curr #Example of the instances Grouper provides: ByPairs = Grouper(2) def string_and_strip(*items): """Converts items to strings and strips them.""" return [str(i).strip() for i in items] def DelimitedSplitter(delimiter=None, max_splits=1): """Returns function that returns stripped fields split by delimiter. Unlike the default behavior of split, max_splits can be negative, in which case it counts from the end instead of the start (i.e. splits at the _last_ delimiter, last two delimiters, etc. for -1, -2, etc.) However, if the delimiter is None (the default) and max_splits is negative, will not preserve internal spaces. Note: leaves empty fields in place. 
""" is_int = isinstance(max_splits, int) or isinstance(max_splits, long) if is_int and (max_splits > 0): def parser(line): return [i.strip() for i in line.split(delimiter, max_splits)] elif is_int and (max_splits < 0): def parser(line): to_insert = delimiter or ' ' #re-join fields w/ space if None fields = line.split(delimiter) if (fields == []) or (fields == ['']): return [] #empty string or only delimiter: return nothing #if not enough fields, count from the start, not the end if len(fields) < max_splits: first_fields = fields[0] last_fields = fields[1:] #otherwise, count off the last n fields and join the remainder else: first_fields = fields[:max_splits] last_fields = fields[max_splits:] pieces = [] #if first_fields is empty, don't make up an extra empty string if first_fields: pieces.append(to_insert.join(first_fields)) pieces.extend(last_fields) return [i.strip() for i in pieces] else: #ignore max_splits if it was 0 def parser(line): return [i.strip() for i in line.split(delimiter)] return parser #The following provide examples of the kinds of functions DelimitedSplitter #returns. semi_splitter = DelimitedSplitter(';', None) space_pairs = DelimitedSplitter(None) equal_pairs = DelimitedSplitter('=') last_colon = DelimitedSplitter(':', -1) class GenericRecord(dict): """Holds data for a generic field ->: value mapping. Override Required with {name:prototype} mapping. Each required name will get a deepcopy of its prototype. For example, use an empty list to guarantee that each instance has its own list for a particular field to which items can be appended. Raises AttributeError on attempt to delete required item, but does not raise an exception on attempt to delete absent item. This class explicitly does _not_ override __getitem__ or __setitem__ for performance reasons: if you need to transform keys on get/set or if you need to access items as attributes and vice versa, use MappedRecord instead. 
""" Required = {} def __init__(self, *args, **kwargs): """Reads kwargs as properties of self.""" #perform init on temp dict to preserve interface: will then translate #aliased keys when loading into self temp = {} dict.__init__(temp, *args, **kwargs) self.update(temp) for name, prototype in self.Required.iteritems(): if not name in self: self[name] = deepcopy(prototype) def __delitem__(self, item): """Deletes item or raises exception if item required. Note: Fails silently if item absent. """ if item in self.Required: raise AttributeError, "%s is a required item" % (item,) try: super(GenericRecord, self).__delitem__(item) except KeyError: pass def copy(self): """Coerces copy to correct type""" temp = self.__class__(super(GenericRecord,self).copy()) #don't forget to copy attributes! for attr, val in self.__dict__.iteritems(): temp.__dict__[attr] = deepcopy(val) return temp class MappedRecord(GenericRecord): """GenericRecord that maps names of fields onto standardized names. Override Aliases in subclass for new mapping of OldName->NewName. Each OldName can have only one NewName, but it's OK if several OldNames map to the same NewName. Note: can access fields either as items or as attributes. In addition, can access either using nonstandard names or using standard names. Implementation note: currently, just a dict with appropriate get/set overrides and ability to access items as attributes. Attribute access is about 10x slower than in GenericRecord, so make sure you need the additional capabilities if you use MappedRecord instead of GenericRecord. WARNING: MappedRecord pretends to have every attribute, so will never raise AttributeError when trying to find an unknown attribute. This feature can cause surprising interactions when a Delegator is delegating its attributes to a MappedRecord, since any attributes defined in __init__ will be set in the MappedRecord and not in the object itself. 
The solution is to use the self.__dict__['AttributeName'] = foo syntax to force the attributes to be set in the object and not the MappedRecord to which it forwards. """ Aliases = {} DefaultValue = None def _copy(self, prototype): """Returns a copy of item.""" if hasattr(prototype, 'copy'): return prototype.copy() elif isinstance(prototype, list): return prototype[:] elif isinstance(prototype,str) or isinstance(prototype,int) or\ isinstance(prototype,long) or isinstance(prototype,tuple)\ or isinstance(prototype,complex) or prototype is None: return prototype #immutable type: use directly else: return deepcopy(prototype) def __init__(self, *args, **kwargs): """Reads kwargs as properties of self.""" #perform init on temp dict to preserve interface: will then translate #aliased keys when loading into self temp = {} unalias = self.unalias dict.__init__(temp, *args, **kwargs) for key, val in temp.iteritems(): self[unalias(key)] = val for name, prototype in self.Required.iteritems(): new_name = unalias(name) if not new_name in self: self[new_name] = self._copy(prototype) def unalias(self, key): """Returns dealiased name for key, or key if not in alias.""" try: return self.Aliases.get(key, key) except TypeError: return key def __getattr__(self, attr): """Returns None if field is absent, rather than raising exception.""" if attr in self: return self[attr] elif attr in self.__dict__: return self.__dict__[attr] elif attr.startswith('__'): #don't retrieve private class attrs raise AttributeError elif hasattr(self.__class__, attr): return getattr(self.__class__, attr) else: return self._copy(self.DefaultValue) def __setattr__(self, attr, value): """Sets attribute in self if absent, converting name if necessary.""" normal_attr = self.unalias(attr) #we overrode __getattr__, so have to simulate getattr(self, attr) by #calling superclass method and checking for AttributeError. #BEWARE: dict defines __getattribute__, not __getattr__! 
try: super(MappedRecord, self).__getattribute__(normal_attr) super(MappedRecord, self).__setattr__(normal_attr, value) except AttributeError: self[normal_attr] = value def __delattr__(self, attr): """Deletes attribute, converting name if necessary. Fails silently.""" normal_attr = self.unalias(attr) if normal_attr in self.Required: raise AttributeError, "%s is a required attribute" % (attr,) else: try: super(MappedRecord, self).__delattr__(normal_attr) except AttributeError: del self[normal_attr] def __getitem__(self, item): """Returns default if item is absent, rather than raising exception.""" normal_item = self.unalias(item) return self.get(normal_item, self._copy(self.DefaultValue)) def __setitem__(self, item, val): """Sets item, converting name if necessary.""" super(MappedRecord, self).__setitem__(self.unalias(item), val) def __delitem__(self, item): """Deletes item, converting name if necessary. Fails silently.""" normal_item = self.unalias(item) super(MappedRecord, self).__delitem__(normal_item) def __contains__(self, item): """Tests membership, converting name if necessary.""" return super(MappedRecord, self).__contains__(self.unalias(item)) def get(self, item, default): """Returns self[item] or default if not present. Silent when unhashable.""" try: return super(MappedRecord, self).get(self.unalias(item), default) except TypeError: return default def setdefault(self, key, default=None): """Returns self[key] or default (and sets self[key]=default)""" return super(MappedRecord, self).setdefault(self.unalias(key),default) def update(self, *args, **kwargs): """Updates self with items in other""" temp = {} unalias = self.unalias temp.update(*args, **kwargs) for key, val in temp.iteritems(): self[unalias(key)] = val #The following methods are useful for handling particular types of fields in #line-oriented parsers def TypeSetter(constructor=None): """Returns function that takes obj, field, val and sets obj.field = val. 
constructor can be any callable that returns an object. """ if constructor: def setter(obj, field, val): setattr(obj, field, constructor(val)) else: def setter(obj, field, val): setattr(obj, field, val) return setter int_setter = TypeSetter(int) str_setter = TypeSetter(str) list_setter = TypeSetter(list) tuple_setter = TypeSetter(tuple) dict_setter = TypeSetter(dict) float_setter = TypeSetter(float) complex_setter = TypeSetter(complex) bool_setter = TypeSetter(bool) identity_setter = TypeSetter() def list_adder(obj, field, val): """Adds val to list in obj.field, creating list if necessary.""" try: getattr(obj, field).append(val) except AttributeError: setattr(obj, field, [val]) def list_extender(obj, field, val): """Adds val to list in obj.field, creating list if necessary.""" try: getattr(obj, field).extend(iterable(val)) except AttributeError: setattr(obj, field, list(val)) def dict_adder(obj, field, val): """If val is a sequence, adds key/value pair in obj.field: else adds key.""" try: key, value = val except (ValueError, TypeError): key, value = val, None try: getattr(obj, field)[key] = value except AttributeError: setattr(obj, field, {key:value}) class LineOrientedConstructor(object): """Constructs a MappedRecord from a sequence of lines.""" def __init__(self, Lines=None, LabelSplitter=space_pairs, FieldMap=None, Constructor=MappedRecord, Strict=False): """Returns new LineOrientedConstructor. Fields: Lines: set of lines to construct record from (for convenience). Default is None. LabelSplitter: function that returns (label, data) tuple. Default is to split on first space and strip components. FieldMap: dict of {fieldname:handler} functions. Each function has the signature (obj, field, val) and performs an inplace action like setting field to val or appending val to field. Default is empty dict. Constructor: constructor for the resulting object. Default is MappedRecord: beware of using constructors that don't subclass MappedRecord. 
Strict: boolean controlling whether to raise error on unrecognized field. Default is False. """ self.Lines = Lines or [] self.LabelSplitter = LabelSplitter self.FieldMap = FieldMap or {} self.Constructor = Constructor self.Strict = Strict def __call__(self, Lines=None): """Returns the record constructed from Lines, or self.Lines""" if Lines is None: Lines = self.Lines result = self.Constructor() fieldmap = self.FieldMap aka = result.unalias splitter = self.LabelSplitter for line in Lines: #find out how many items we got, setting key and val appropiately items = list(splitter(line)) num_items = len(items) if num_items == 2: #typical case: key-value pair raw_field, val = items elif num_items > 2: raw_field = items[0] val = items[1:] elif len(items) == 1: raw_field, val = items[0], None elif not items: #presumably had line with just a delimiter? continue #figure out if we know the field under its original name or as #an alias if raw_field in fieldmap: field, mapper = raw_field, fieldmap[raw_field] else: new_field = aka(raw_field) if new_field in fieldmap: field, mapper = new_field, fieldmap[new_field] else: if self.Strict: raise FieldError, \ "Got unrecognized field %s" % (raw_field,) else: identity_setter(result, raw_field, val) continue #if we found the field in the fieldmap, apply the correct function try: mapper(result, field, val) except: #Warning: this is a catchall for _any_ exception, #and may mask what's actually going wrong. if self.Strict: raise FieldError, "Could not handle line %s" % (line,) return result def FieldWrapper(fields, splitter=None, constructor=None): """Returns dict containing field->val mapping, one level. fields should be list of fields, in order. splitter should be something like a DelimitedSplitter that converts the line into a sequence of fields. constructor is a callable applied to the dict after construction. Call result on a _single_ line, not a list of lines. 
Note that the constructor should take a dict and return an object of some useful type. Additionally, it is the _constructor's_ responsibility to complain if there are not enough fields, since zip will silently truncate at the shorter sequence. This is actually useful in the case where many of the later fields are optional. """ if splitter is None: splitter = DelimitedSplitter(None, None) if constructor: def parser(line): return constructor(dict(zip(fields, splitter(line)))) else: def parser(line): return dict(zip(fields, splitter(line))) return parser def StrictFieldWrapper(fields, splitter=None, constructor=None): """Returns dict containing field->val mapping, one level. fields should be list of fields, in order. splitter should be something like a DelimitedSplitter that converts the line into a sequence of fields. constructor is a callable applied to the dict after construction. Call result on a _single_ line, not a list of lines. Note that the constructor should take a dict and return an object of some useful type. Raises RecordError if the wrong number of fields are returned from the split. """ if splitter is None: splitter = DelimitedSplitter(None, None) if constructor: def parser(line): items = splitter(line) if len(items) != len(fields): raise FieldError, "Expected %s items but got %s: %s" % \ (len(fields), len(items), items) return constructor(dict(zip(fields, items))) else: def parser(line): items = splitter(line) if len(items) != len(fields): raise FieldError, "Expected %s items but got %s: %s" % \ (len(fields), len(items), items) return dict(zip(fields, items)) return parser def raise_unknown_field(field, data): """Raises a FieldError, displaying the offending field and data.""" raise FieldError, "Got unknown field %s with data %s" % (field, data) class FieldMorpher(object): """When called, applies appropriate constructors to each value of dict. Initialize using a dict of fieldname:constructor pairs. 
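The FieldMorpher call logic amounts to a per-field dispatch; a standalone sketch (hypothetical function form of the class below):

```python
def morph_fields(constructors, data, default=None):
    # Apply each field's constructor where one is known; hand unknown
    # fields to the default handler, which returns a (key, value) pair.
    result = {}
    for key, val in data.items():
        if key in constructors:
            result[key] = constructors[key](val)
        elif default is not None:
            new_key, new_val = default(key, val)
            result[new_key] = new_val
        else:
            raise KeyError('unknown field %r' % key)
    return result
```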
""" def __init__(self, Constructors, Default=raise_unknown_field): """Returns a new FieldMorpher, using appropriate constructors. If a field is unknown, will try to set key and value to the results of Default(key, value): in other words, the signature of Default should take a key and a value and should return a key and a value. The built-in value of Default raises a FieldError instead, but it will often be useful to do things like return the key/value pair unchanged, or to strip the key and the value and then add them. """ self.Constructors = Constructors self.Default = Default def __call__(self, data): """Returns a new dict containing information converted from data.""" result = {} default = self.Default cons = self.Constructors for key, val in data.iteritems(): if key in cons: result[key] = cons[key](val) else: new_key, new_val = default(key, val) #if we now recognize the key, use its constructor on the old val if new_key in cons: result[new_key] = cons[new_key](val) #otherwise, enter the new key and the new val else: result[new_key] = new_val return result PyCogent-1.5.3/cogent/parse/record_finder.py000644 000765 000024 00000015315 12024702176 022016 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides some classes for treating files as sequences of records. Typically more useful as subclasses. Covers the three main types of records: DelimitedRecordFinder: Records demarcated by an end line, e.g. '\\' LabeledRecordFinder: Records demarcated by a start line, e.g. '>label' LineGrouper: Records consisting of a certain number of lines. TailedRecordFinder: Records demarcated by an end mark, e.g. 'blah.' All the first classes ignore/delete blank lines and strip leading and trailing whitespace. The TailedRecodeFinder is Functional similar to DelimitedRecordFinder except that it accept a is_tail function instead of a str. Note that its default constuctor is rstrip instead of strip. 
""" from cogent.parse.record import RecordError, FieldError from string import strip, rstrip __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Gavin Huttley", "Zongzhi Liu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def is_empty(line): """Returns True empty lines and lines consisting only of whitespace.""" return (not line) or line.isspace() def never_ignore(line): """Always returns False.""" return False def DelimitedRecordFinder(delimiter, constructor=strip, ignore=is_empty, keep_delimiter=True, strict=True): """Returns function that returns successive delimited records from file. Includes delimiter in return value. Returns list of relevant lines. Default constructor is string.strip, but can supply another constructor to transform lines and/or coerce into correct type. If constructor is None, passes along the lines without alteration. Skips any lines for which ignore(line) evaluates True (default is to skip whitespace). keep_delimiter: keep delimiter line at the end of last block if True (default), otherwise discard delimiter line. strict: when lines found after the last delimiter -- raise error if True (default), otherwise yield the lines silently """ def parser(lines): curr = [] for line in lines: if constructor: line = constructor(line) #else: # line = l #ignore blank lines if ignore(line): continue #if we find the delimiter, return the line; otherwise, keep it if line == delimiter: if keep_delimiter: curr.append(line) yield curr curr = [] else: curr.append(line) if curr: if strict: raise RecordError, "Found additional data after records: %s"%\ (curr) else: yield curr return parser #The following is an example of the sorts of iterators RecordFinder returns. 
GbFinder = DelimitedRecordFinder('//') def TailedRecordFinder(is_tail_line, constructor=rstrip, ignore=is_empty, strict=True): """Returns function that returns successive tailed records from lines. Includes tail line in return value. Returns list of relevant lines. constructor: a modifier for each line, default is string.rstrip: to remove \n and trailing spaces. Skips over any lines for which ignore(line) evaluates True (default is to skip empty lines). note that the line maybe modified by constructor. strict: if True(default), raise error if the last line is not a tail. otherwise, yield the last lines. """ def parser(lines): curr = [] for line in lines: if constructor: line = constructor(line) if ignore(line): continue curr.append(line) #if we find the label, return the previous record if is_tail_line(line): yield curr curr = [] #don't forget to return the last record in the file if curr: if strict: raise RecordError('lines exist after the last tail_line ' 'or no tail_line at all') else: yield curr return parser def LabeledRecordFinder(is_label_line, constructor=strip, ignore=is_empty): """Returns function that returns successive labeled records from file. Includes label line in return value. Returns list of relevant lines. Default constructor is string.strip, but can supply another constructor to transform lines and/or coerce into correct type. If constructor is None, passes along the lines without alteration. Skips over any lines for which ignore(line) evaluates True (default is to skip empty lines). NOTE: Does _not_ raise an exception if the last line is a label line: for some formats, this is acceptable. It is the responsibility of whatever is parsing the sets of lines returned into records to complain if a record is incomplete. 
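The label-started grouping can be sketched without the constructor/ignore hooks (hypothetical simplified version of LabeledRecordFinder, shown with a FASTA-style label test):

```python
def labeled_record_finder(is_label_line):
    # A new record starts at every label line; blank lines are skipped;
    # the last record is yielded even without a terminator.
    def parser(lines):
        curr = []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if is_label_line(line) and curr:
                yield curr
                curr = []
            curr.append(line)
        if curr:
            yield curr
    return parser
```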
""" def parser(lines): curr = [] for l in lines: if constructor: line = constructor(l) else: line = l if ignore(line): continue #if we find the label, return the previous record if is_label_line(line): if curr: yield curr curr = [] curr.append(line) #don't forget to return the last record in the file if curr: yield curr return parser def is_fasta_label(x): """Checks if x looks like a FASTA label line.""" return x.startswith('>') #The following is an example of the sorts of iterators RecordFinder returns. FastaFinder = LabeledRecordFinder(is_fasta_label) def LineGrouper(num, constructor=strip, ignore=is_empty): """Returns num lines at a time, stripping and ignoring blanks. Default constructor is string.strip, but can supply another constructor to transform lines and/or coerce into correct type. If constructor is None, passes along the lines without alteration. Skips over any lines for which ignore(line) evaluates True: default is to skip whitespace lines. """ def parser(lines): curr = [] for l in lines: if constructor: line = constructor(l) else: line = l if ignore(line): continue curr.append(line) if len(curr) == num: yield curr curr = [] if curr: raise RecordError, "Non-blank lines not even multiple of %s" % num return parser PyCogent-1.5.3/cogent/parse/rfam.py000644 000765 000024 00000026243 12024702176 020140 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides a parser for Rfam format files. 
""" from string import strip from cogent.parse.record import RecordError from cogent.parse.record_finder import DelimitedRecordFinder from cogent.parse.clustal import ClustalParser from cogent.core.sequence import RnaSequence as Rna from cogent.core.sequence import DnaSequence as Dna from cogent.core.sequence import Sequence from cogent.core.moltype import BYTES from cogent.core.info import Info from cogent.struct.rna2d import WussStructure from cogent.util.transform import trans_all,keep_chars from cogent.core.alignment import Alignment, DataError, SequenceCollection __author__ = "Sandra Smit and Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Sandra Smit", "Gavin Huttley", "Rob Knight", "Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" def is_empty_or_html(line): """Return True for HTML line and empty (or whitespace only) line. line -- string The Rfam adaptor that retrieves records inlcudes two HTML tags in the record. These lines need to be ignored in addition to empty lines. 
""" if line.startswith('1 and s==False: seq = line.strip() s = True elif s == True: s=False struct = line.split(None,2)[0].strip('\n') energy = atof(line.split(' (',1)[1].split(None,1)[0].strip()) pairs = to_pairs(struct) pairs.sort() result.append([seq,pairs,energy]) return result def to_pairs(struct=None): """ Converts a vienna structure into a pairs object Returns pairs object pairs functions tested in test for rna2d.py """ struct = ViennaStructure(struct) pairs = struct.toPairs() return pairs PyCogent-1.5.3/cogent/parse/rnaforester.py000644 000765 000024 00000005616 12024702176 021546 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from string import strip,split from cogent.struct.rna2d import Pairs,ViennaStructure __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def rnaforester_parser(lines): """Parser for RNAforester output format Returns a list containing: alignemnt,consensus sequence and consensus structure Ex: [{alignment},consensus sequence,[consensus structure]] """ result = [] for block in cluster_parser(lines): for struct in line_parser(block): result.append(struct) return result def cluster_parser(lines): """To parse lines into rnaforester clluster blocks""" block = [] first = True record = False for line in lines: if line.startswith('RNA Structure Cluster Nr:'): #new cluster block record = True if not first: yield block block = [] first = False if record: block.append(line) yield block def line_parser(block): """Parses the stdout output from RNAforester and return the concensus structure prediction along with alignment and consensus sequence. 
""" odd = True record = False first = True seq = '' con_seq = '' struct = '' alignment = {} for line in block: #find alignments if line.startswith('seq'): if line.__contains__(')') or line.__contains__('('): continue else: sline = line.strip().split() name = sline[0] tmp_seq = sline[-1] if alignment.__contains__(name): seq = alignment[name] seq = ''.join([seq,tmp_seq]) alignment[name] = seq else: alignment[name] = tmp_seq if line.startswith('Consensus sequence/structure:'): #start record = True if not first: struct = to_pairs(struct) yield [alignment,con_seq,struct] result = [] first = True elif record: if line.startswith(' '): line = line.strip() if odd: con_seq = ''.join([con_seq,line]) odd = False else: struct = ''.join([struct,line]) odd = True struct = to_pairs(struct) yield [alignment,con_seq,struct] def to_pairs(struct): """ Takes a structure in dot-bracket notation and converts it to pairs notation functions tested in rna2d.py """ struct = ViennaStructure(struct) struct = struct.toPairs() return struct PyCogent-1.5.3/cogent/parse/rnashapes.py000644 000765 000024 00000004665 12024702176 021203 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file: rnashapes_parser.py """ Author: Shandy Wikman (ens01svn@cs.umu.se) Status: Development. 
According to future requirements behavior will be changed Parser to parse RNAshapes output and returns list of lists [Seq,Pairs,Ene] Revision History: 2006 Shandy Wikman created file """ from string import split,strip,atof from cogent.util.transform import make_trans from cogent.struct.rna2d import Pairs,ViennaStructure __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def RNAshapes_parser(lines=None,order=True): """ Returns a list containing tuples of (sequence,pairs object,energy) for every sequence [[Seq,Pairs,Ene],[Seq,Pairs,Ene],...] Structures will be ordered by the structure energy by default, of ordered isnt desired set order to False """ result = lineParser(lines) if order: result = order_structs(result) return result def lineParser(list=None): """ Parses Lines from output and returns a list of tuples """ result = [] s = False seq = '' energy = 0.0 pairs = '' for line in list: if len(seq)>1 and energy != 0.0 and len(pairs)>1: result.append([seq,pairs,energy]) seq = energy = pairs = '' if line.startswith('>'): name = line.strip('>\n') s = True #signals that sequence is next line elif s: seq = line.strip() s = False elif line.startswith('-'): struct = line.split(None,2)[1].strip('\n') energy = atof(line.split(None,2)[0].strip('\n')) pairs = to_Pairs(struct) return result def to_Pairs(struct=None): """ Converts a vienna structure into a pairs object Returns pairs object """ struct = ViennaStructure(struct) pairs = struct.toPairs() return pairs def order_structs(result): """ order structure so that the structure whit highetst MFE(most negative) will be placed first and so on to the lowest MFE structure. 
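The reverse/sort/reverse sequence in order_structs amounts to sorting the [seq, pairs, energy] records by their trailing energy value; an equivalent non-mutating sketch (hypothetical name):

```python
def order_by_energy(result):
    # Sort records so the most negative (best MFE) energy comes first,
    # without reversing each record in place.
    return sorted(result, key=lambda rec: rec[-1])
```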
""" for i in result: i.reverse() result.sort() #result.reverse() #to test with the lowest negetiv value as the best struct for i in result: i.reverse() return result PyCogent-1.5.3/cogent/parse/rnaview.py000644 000765 000024 00000067507 12024702176 020676 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ This module provides a parser for RNAView output. Authors: Greg Caporaso (Gregory.Caporaso@UCHSC.edu) Sandra Smit (Sandra.Smit@colorado.edu) RNAView reports annotated base pairs found in a PDB file. http://ndbserver.rutgers.edu/services/download/index.html Constants: RNAVIEW_ACCEPTED -- residues accepted by RNAView WC_PAIRS -- dict of Watson-Crick pairs WOBBLE_PAIRS -- dict of Wobble pairs Important objects: Base -- holds an RNA residue BasePair -- holds a base pair BasePairs -- list of BasePair objects BaseMultiplet -- holds 3 or more residues BaseMultiplets -- list of BaseMultiplet objects Important functions: RnaviewParser -- parses RNAView output several selection functions to select a subset of base pairs, for example: is_canonical -- selects all the pairs annotated as canonical by RNAView Exceptions: RnaViewObjectError -- raised when an RNAView object detects a problem RnaViewParseError -- raised when one of the parser fucntions finds a problem """ __author__ = "Greg Caporaso and Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" RNAVIEW_ACCEPTED = dict.fromkeys('AGUCTIPaguct') WC_PAIRS = dict.fromkeys(['GC','CG','AU','UA']) WOBBLE_PAIRS = dict.fromkeys(['UG','GU']) class RnaViewObjectError(ValueError): pass class RnaViewParseError(SyntaxError): pass # ============================================================================= # RNAVIEW OBJECTS # ============================================================================= class 
Base(object): """Describes a residue""" def __init__(self, ChainId, ResId, ResName, RnaViewSeqPos=None): """Initialize Base object. ChainId -- Chain identifier, string. If no chain is specified, the ChainId is set to ' ' (for compatibility with PyMol) ResId -- Residue identifier, string. Cannot be None ResName -- Name of the residue, string. Accepted are AGUCTIPaguct. Cannot be None. RnaViewSeqPos -- string, position that RnaView assigns to this base. """ if not ResId or not ResName: raise RnaViewObjectError(\ "Found missing attribute: ResId=%s, ResName=%s"\ %(ResId, ResName)) self.ChainId = ChainId self.ResId = ResId self.ResName = ResName self.RnaViewSeqPos = RnaViewSeqPos def __str__(self): """Return string representation of Base object.""" return ' '.join(map(str,[self.ChainId, self.ResId, self.ResName])) def __eq__(self, other): """Overwrite the == operator.""" if self.ChainId != other.ChainId: return False if self.ResId != other.ResId: return False if self.ResName != other.ResName: return False if self.RnaViewSeqPos != other.RnaViewSeqPos: return False return True def __ne__(self,other): """Overwrite the != operator.""" return not self == other class BasePair(object): """Object for storing paired RNA bases. This object is made to store two Base objects which are involved in a base pairing interaction with each other. """ def __init__(self, Up, Down, Edges=None, Orientation=None,\ Conformation=None, Saenger=None): """Initialize the object. Up -- the upstream Base object Down -- the downstream Base object Edges -- string, the pairing edges of the base pair or 'stacked' Orientation -- string, the orientation of the bases: 'cis' or 'tran' Conformation -- string, 'syn' or 'syn syn' Saenger -- string, the Saenger classification of the base pair, either roman numeral, 'n/a', or something starting with '!' 
""" self.Up = Up self.Down = Down self.Edges = Edges self.Orientation = Orientation self.Conformation = Conformation self.Saenger = Saenger def __str__(self): """Return string representation of BasePair object.""" return "Bases: %s -- %s; Annotation: %s -- %s -- %s -- %s;"\ %(self.Up, self.Down, self.Edges, self.Orientation,\ self.Conformation,self.Saenger) def __eq__(self,other): """Overwrite the == operator.""" if self.Up != other.Up: return False if self.Down != other.Down: return False if self.Edges != other.Edges: return False if self.Orientation != other.Orientation: return False if self.Conformation != other.Conformation: return False if self.Saenger != other.Saenger: return False return True def __ne__(self,other): """Overwrite the != operator.""" return not self == other def isWC(self): """Return True if base pair is a Watson-Crick pair. WARNING: this method returns True for GC and AU pairs independent of the rest of the annotation. It does not check the specific edges in the base pair etc. """ # The complicated looking one-liner just makes an upper-case string # out of the two Base identities and checks to see if it exists in # WC_PAIRS return ''.join([self.Up.ResName, self.Down.ResName]).upper()\ in WC_PAIRS def isWobble(self): """Return True if base pair is a wobble pair. WARNING: this method returns True for GU pairs independent of the rest of the annotation. It does not check the specific edges in the base pair etc. """ # The complicated looking one-liner just makes an upper-case string # out of the two Base identities and checks to see if it exists in # WOBBLE_PAIRS return ''.join([self.Up.ResName, self.Down.ResName]).upper()\ in WOBBLE_PAIRS class BasePairs(list): """A list of BasePair objects.""" def __init__(self, base_pairs=None): """Initialize BasePairs object. base_pairs -- list or tuple of BasePair objects. 
""" if base_pairs is None: base_pairs = [] try: self[:] = list(base_pairs) except TypeError: raise RnaViewObjectError(\ 'base_pairs must be convertable to a list') def __str__(self): """Return string representation of BasePairs object.""" header = [\ "===================================================================", "Bases: Up -- Down; Annotation: Edges -- Orient. -- Conf. -- Saenger", "==================================================================="] return '\n'.join(header + [str(i) for i in self]) def select(self, selection_function, rest=False): """Return a new BasePairs object of pairs that pass the selection. selection_function -- should be function that works on a BasePair and returns True or False. rest -- boolean, if True, in addition to the selection, all the pairs that didn't make it into the selection are returned in a list. The return value is thus a tuple in this case. """ sel = [] if rest: not_sel = [] for bp in self: if selection_function(bp): sel.append(bp) else: if rest: not_sel.append(bp) else: continue if rest: return BasePairs(sel), BasePairs(not_sel) else: return BasePairs(sel) def _get_present_chains(self): """Return the chains present in the BasePairs object (in ORDER).""" up_chains = set([i.Up.ChainId for i in self]) down_chains = set([i.Down.ChainId for i in self]) result = list(up_chains | down_chains) result.sort() return result PresentChains = property(_get_present_chains) def cliques(self): """Yield all cliques in the base pairs as new BasePairs objects. A clique is a set of interacting chains. All base pairs involved in a clique are returned together in a new BasePairs object. 
""" if len(self.PresentChains) == 1: yield self else: ia = {} #interactions for i in self: chain_up = i.Up.ChainId chain_down = i.Down.ChainId if not ia: bp_list = [i] ia[chain_up] = bp_list ia[chain_down] = bp_list elif chain_up in ia and not chain_down in ia: ia[chain_up].append(i) ia[chain_down] = ia[chain_up] elif chain_down in ia and not chain_up in ia: ia[chain_down].append(i) ia[chain_up] = ia[chain_down] elif chain_up not in ia and chain_down not in ia: bp_list = [i] ia[chain_up] = bp_list ia[chain_down] = bp_list elif chain_up in ia and chain_down in ia and\ ia[chain_up] is ia[chain_down]: ia[chain_up].append(i) elif chain_up in ia and chain_down in ia and not\ ia[chain_up] is ia[chain_down]: ia[chain_up].append(i) ia[chain_up].extend(ia[chain_down]) for k,v in ia.items(): if k != chain_down and v is ia[chain_down]: ia[k] = ia[chain_up] ia[chain_down] = ia[chain_up] else: continue uniques = [] for k,v in ia.items(): if v not in uniques: uniques.append(v) for bps in uniques: yield BasePairs(bps) def hasConflicts(self, return_conflict=False): """Return True if there is a conflict in the list of base pairs. return_conflict -- if True, return value is a tuple of a boolean and the string representation of the base involved in the conflict. A conflict occurs when according to RnaView a base is involved in more than one interaction. For example, base A pairs with base B, and base A pairs with base C. 
""" seen = {} for bp in self: up = bp.Up down = bp.Down if (up.ChainId, up.ResId, up.ResName) not in seen: seen[(up.ChainId, up.ResId, up.ResName)] = None else: if return_conflict: return True, str(up) else: return True if (down.ChainId, down.ResId, down.ResName) not in seen: seen[(down.ChainId, down.ResId, down.ResName)] = None else: if return_conflict: return True, str(down) else: return True if return_conflict: return False, None else: return False class BaseMultiplet(list): """Hold a base multiplet (3 or more residues) found by RnaView.""" def __init__(self, bases=None, NumberOfBases=None): """Initialize a BaseMultiplet object. bases -- a list or tuple of Base objects. NumberOfBases -- int, the number of bases in this multiplet. """ if bases is None: bases = [] try: self[:] = list(bases) except TypeError: raise RnaViewObjectError(\ 'bases must be convertable to a list') self.NumberOfBases = NumberOfBases def __str__(self): """Return string representation of a BaseMultiplet object.""" return ' -- '.join([str(i) for i in self]) + ';' class BaseMultiplets(list): """A list of BaseMultiplet objects.""" def __init__(self, multiplets=None): """Initialize a BaseMultiplets object. multiplets -- list or tuple of BaseMultiplet objects. """ if multiplets is None: multiplets = [] try: self[:] = list(multiplets) except TypeError: raise RnaViewObjectError(\ 'multiplets must be convertable to a list') def __str__(self): """Return string representation of BaseMultiplets object.""" return '\n'.join([str(i) for i in self]) class PairCounts(dict): """A dictionary of base pair counts.""" pass # ============================================================================= # SELECTION FUNCTIONS # ============================================================================= def in_chain(chains): """Return selection function that returns True when Up and Down in chains. chains -- list of chain IDs, e.g. 
"AB" or ['A','0'] """ def apply_to(bp): """Return True if Up and Down both in chains. bp -- BasePair object """ if bp.Up.ChainId in chains and bp.Down.ChainId in chains: return True return False return apply_to def is_canonical(bp): """Return True if base pair is standard canonical GC, AU, or wobble GU pair. bp -- BasePair object This functions only looks at the annotation, it doesn't check the base identity. """ if bp.Edges == '+/+': #GC pair return True if bp.Edges == '-/-': #AU pair return True if bp.Edges == 'W/W' and bp.Orientation == 'cis' and\ bp.Saenger == 'XXVIII': #GU pair return True return False def is_not_canonical(bp): """Return True if base pair is not canonical. bp -- BasePair object See is_canoncical for a definition. """ return not is_canonical(bp) def is_stacked(bp): """Return True if base pair is stacked. bp -- BasePair object """ if bp.Edges == 'stacked': return True return False def is_not_stacked(bp): """Return True if bp is not stacked. bp -- BasePair object """ if bp.Edges == 'stacked': return False return True def is_tertiary(bp): """Return True if bp is a tertiary interaction. bp -- BasePair object """ if bp.Saenger and bp.Saenger.startswith('!'): return True return False def is_not_stacked_or_tertiary(bp): """Return True if the base pair is not stacked or a tertiary interaction. bp -- BasePair object """ if bp.Edges == 'stacked': return False if bp.Saenger.startswith('!'): return False return True def is_tertiary_base_base(bp): """Return True if base pair is a tertiary interaction between two bases. bp -- BasePair object Other tertiary interactions can for example be between bases and sugars. """ if bp.Saenger and bp.Saenger.startswith('!1H(b_b)'): return True return False #============================================================================== # RNAVIEW PARSER #============================================================================== def is_roman_numeral(s): """Return True if s is a roman numeral. 
s -- string """ if not s: return False # there is a comma in the alphabet, because some categories are # combined, split by comma alphabet = dict.fromkeys("IVXDCM,") for i in s: if i not in alphabet: return False return True def is_edge(s): """Return True if s is a description of the base pair edges. s -- string X is added to the list of accepted symbols. X might be used for modified bases. """ chars = dict.fromkeys("WHsS+-?.X") try: first, second = s.split('/') except ValueError: return False if first not in chars or second not in chars: return False return True def is_orientation(s): """Return True if s is a description of the base pair orientation. s -- string """ if s == 'cis' or s == 'tran': return True return False def parse_annotation(data): """Parse the annotation of a base pair, returns tuple of 4 elements. data -- a single string containing the annotation of a single base pair. For example: '-/- cis XX' or 'stacked' Edges: could be 'stacked' or two chars separated by '/', e.g. 'W/H' Orientation: could be 'cis' or 'tran' Conformation: could be 'syn' or 'syn syn' Saenger: could be some roman numeral (or two separated by a comma) or 'n/a' or anything starting with an exclamation mark. """ edges = None orientation = None conformation = None saenger = None for field in data: if field == 'stacked' or is_edge(field): edges = field elif is_orientation(field): orientation = field elif is_roman_numeral(field) or field == 'n/a' or\ field.startswith('!'): saenger = field elif field == 'syn': if not conformation: conformation = field else: conformation += ' '+field else: raise RnaViewParseError("Unknown annotation field: %s"%(field)) return edges, orientation, conformation, saenger def parse_filename(lines): """Return PDB filename from the rnaview output file. 
lines -- list of strings or anything that behaves like a list of lines """ if len(lines) != 1: raise RnaViewParseError(\ "Parse filename: expected only one line to parse") return lines[0].split(':')[1].strip() def parse_uncommon_residues(lines): """Return dictionary of {(chain_id, res_id, res_name): res_assigned}. lines -- list of strings or anything that behaves like a list of lines Parses the uncommon residue lines from an rnaview file. The chain_id could be empty (parsed as ' '), all other field should have some value. If any of these is missing, an error will be raised. """ result = {} for line in lines: res_name = line[17:20].strip() res_id = line[20:26].strip() chain_id = line[36] res_assigned = line.split(':')[1].strip() if not res_name or not res_id or not res_assigned: raise RnaViewParseError(\ "Found missing field in uncommon residue line.\n"+\ "res_id: %s, res_name: %s, res_assigned: %s"\ %(res_id, res_name, res_assigned)) result[(chain_id, res_id, res_name)] = res_assigned return result def parse_base_pairs(lines): """Return BasePairs object. Parse BASE_PAIR section of rnaview output. lines -- list of strings or anything that behaves like a list of lines An empty chain_id is parsed as ' ' for compatibility with PyMol. 
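The per-line field handling in `parse_base_pairs` can be sketched on its own. The sample line below is a guess at the whitespace-separated BASE_PAIR layout implied by the field indices in the parser (position pair, chain, residue id, residue pair, residue id, chain, annotation); it is illustrative, not taken from real RNAView output:

```python
# Rough standalone sketch of the field slicing done by parse_base_pairs.
# The input line format here is assumed from the parser's indexing.
def split_pair_line(line):
    data = line.split()
    # e.g. data[0] == '1_72,' -> RnaView sequence positions ('1', '72')
    pos_up, pos_down = data[0].strip(', ').split('_')
    up_chain = data[1].rstrip(':') or ' '    # empty chain id becomes ' '
    down_chain = data[5].rstrip(':') or ' '
    up_res, down_res = data[3].split('-')    # e.g. 'G-C'
    up = (up_chain, data[2], up_res)
    down = (down_chain, data[4], down_res)
    return up, down, data[6:]                # remaining fields = annotation
```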
""" pairs = [] for line in lines: data = line.split() seq_pos_up, seq_pos_down = data[0].strip(", ").split("_") up_chain_id = data[1].rstrip(':') or ' ' down_chain_id = data[5].rstrip(':') or ' ' up_res_id = data[2].strip() down_res_id = data[4].strip() up_res_name, down_res_name = data[3].split('-') #verify found residues are standard in Rnaview if up_res_name not in RNAVIEW_ACCEPTED or\ down_res_name not in RNAVIEW_ACCEPTED: raise RnaViewParseError(\ "Base found that is not generally accepted by Rnaview:"+\ " %s or %s"%(up_res_name, down_res_name)) #Build upstream and downstream Base objects up = Base(up_chain_id, up_res_id, up_res_name, RnaViewSeqPos=seq_pos_up) down = Base(down_chain_id, down_res_id, down_res_name,\ RnaViewSeqPos=seq_pos_down) e, o, c, s = parse_annotation(data[6:]) pairs.append(BasePair(up, down, Edges=e, Orientation=o,\ Conformation=c, Saenger=s)) return BasePairs(pairs) def parse_base_multiplets(lines): """Return BaseMultiplets object. Parses BASE_MULTIPLETS section. lines -- list of strings or anything that behaves like a list of lines An empty chain_id is parsed as ' ' for compatibility with PyMol. 
""" residues = [] for line in lines: multi = [] pos_info, rest = line.strip().split('|',1) rnaview_pos = [i.strip() for i in pos_info.split('_') if i != ''] num_info, bases_info = rest.strip().split(']',1) num_bases = int(num_info.split()[1]) bases = bases_info.split('+') if len(rnaview_pos) != len(bases): raise RnaViewParseError(\ "Number of bases (%s) doesn't match number of positions (%s)"\ %(len(bases),len(rnaview_pos))) if num_bases != len(bases): raise RnaViewParseError(\ "Reported number of bases (%s) "%(num_bases)+\ "doesn't match number of found bases (%s)"%(len(bases))) for rv_pos, b in zip(rnaview_pos, bases): data = b.split() chain_id = data[0].rstrip(':') or ' ' res_id = data[1].strip() res_name = data[2].strip() #verify found residue is standard in Rnaview if res_name not in RNAVIEW_ACCEPTED: raise RnaViewParseError(\ "Base found that is not generally accepted by RNAVIEW: %s"\ %(str(res_name))) multi.append(Base(chain_id, res_id, res_name, RnaViewSeqPos=rv_pos)) residues.append(BaseMultiplet(multi, NumberOfBases=num_bases)) return BaseMultiplets(residues) def parse_number_of_pairs(lines): """Return dict with the number of bases and number of base pairs. lines -- list of strings or anything that behaves like a list of lines The two keys in the dictionary are: NUM_PAIRS: contains the number of base pairs (note this number only includes normal base pairs, not stacked pairs, or tertiary interactions NUM_BASES: contains the number of RNA/DNA bases """ if len(lines) != 1: raise RnaViewParseError(\ "parse_number_of_parse should get only a single line") result = {} parts = lines[0].split('=')[1].split() if len(parts) != 4: raise RnaViewParseError("Can't parse 'total base pairs' line") result['NUM_PAIRS'] = int(parts[0]) result['NUM_BASES'] = int(parts[2]) return result def parse_pair_counts(lines): """Parse summary of base pairs at the end of the rnaview output file. 
lines -- list of strings or anything that behaves like a list of lines Returns PairCounts object which is a dictionary of {Label: count} where label is a string (name from rnaview output), e.g. 'WW--cis', and count is an integer representing the base pair count of that type of base pair. Will raise an error if the number of lines is odd (it expects (multiple) label lines followed by the line with the counts. If lines is empty it'll return an empty PairCounts object (=empty dict). """ if not len(lines)%2 == 0: raise RnaViewParseError("Weird base pair counts format:\n%s"\ %('\n'.join(lines))) res = PairCounts() for x in range(0,len(lines),2): res.update(zip(lines[x].split(),map(int,lines[x+1].split()))) return res def verify_bp_counts(bps, np, pair_counts): """Will raise an error on mismatch of reported/actual number of base pairs. bps -- BasePairs object (the full set from rnaview) np -- reported number of pairs (from rnaview output) pair_counts -- the dictionary of pair counts This function won't return anything. I'll just raise an error if the reported and actual numbers don't match. The "total base pairs" doesn't have to match the counts reported in the dictionary if "X/X" base pairs are present. The lines of code dealing with this are removed. Original check was: #if np != sum(pair_counts.values()): # raise RnaViewParseError(\ # "Reported number of base pairs (%s)"%(np)+\ # " doesn't match detailed "+\ # "counts (%s)"%(sum(pair_counts.values()))) """ subset = bps.select(is_not_stacked_or_tertiary) if np != len(subset): raise RnaViewParseError(\ "Reported number of base pairs (%s)"%(np)+\ " doesn't match number"+\ " of found base pairs (%s)"%(len(subset))) def MinimalRnaviewParser(lines): """Return line groups for uncommon res, base pairs, base multiples, counts. lines -- list of strings or anything that behaves like a list of lines This function groups all the lines into particular groups. 
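The sectioning strategy described here (flip a flag at BEGIN/END markers, collect lines while the flag is set) can be sketched for a single section. This is a simplified illustration of one branch of `MinimalRnaviewParser`, not the full grouping function:

```python
# Simplified sketch of MinimalRnaviewParser's line grouping for the
# base-pair section: a boolean flag tracks whether we are between the
# BEGIN_base-pair and END_base-pair markers.
def group_base_pair_lines(lines):
    collected = []
    in_bp = False
    for line in lines:
        line = line.strip()
        if line.startswith('BEGIN_base-pair'):
            in_bp = True
        elif line.startswith('END_base-pair'):
            in_bp = False
        elif in_bp and line:
            collected.append(line)
    return collected
```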
It recognizes: FN: filename, should be a single line UC: uncommon residues, lines that start with 'uncommon' BP: base pairs, everything between BEGIN_base-pair and END_base_pair BM: base multiplets, everything between BEGIN_multiplets and END_multiplets NP: number of pairs and bases, single line PC: pair counts, lines between '-----------------------' """ result = {'UC':[],'BP':[],'BM':[],'PC':[], 'FN':[], 'NP':[]} in_bp = False in_bm = False in_pc = False for line in lines: line = line.strip() if line: if line.startswith('PDB data file name'): # filename result['FN'].append(line) continue if line.startswith('The total base pairs ='): # number of pairs result['NP'].append(line) continue if line.startswith('uncommon'): # uncommon residues result['UC'].append(line) elif line.startswith('BEGIN_base-pair'): # base pairs in_bp = True elif line.startswith('END_base-pair'): in_bp = False elif line.startswith('BEGIN_multiplets'): # base multiplets in_bm = True elif line.startswith('END_multiplets'): in_bm = False elif line.startswith('-------'): # pair counts in_pc = True else: if in_bp: result['BP'].append(line) elif in_bm: result['BM'].append(line) elif in_pc: result['PC'].append(line) else: continue return result def RnaviewParser(lines, strict=True): """Parse output from the Rnaview program. lines -- list of strings or anything that behaves like a list of lines, this should be the rnaview output strict -- boolean indicating whether an error should be raised or not, default=True The purpose of this parser is to extract every piece of information from an Rnaview output file. It parses the filename (FN), uncommon residues (UC), base pairs (BP), base multiplets (BM), number of base pairs (NP), and the pair counts (PC). 
""" parsers = {'FN': parse_filename,\ 'UC':parse_uncommon_residues,\ 'BP':parse_base_pairs,\ 'BM':parse_base_multiplets,\ 'PC':parse_pair_counts, 'NP':parse_number_of_pairs} grouped_lines = MinimalRnaviewParser(lines) # parse the line groups result = {} for section in parsers.keys(): try: result[section] = parsers[section](grouped_lines[section]) except RnaViewParseError, e: if strict: raise RnaViewParseError(str(e)) else: result[section] = None # verify reported versus actual number of base pairs if result['NP'] is not None: num_bps_exp = result['NP']['NUM_PAIRS'] try: verify_bp_counts(result['BP'], num_bps_exp, result['PC']) except RnaViewParseError, e: if strict: raise RnaViewParseError(str(e)) else: pass return result if __name__ == "__main__": pass PyCogent-1.5.3/cogent/parse/sequence.py000644 000765 000024 00000005563 12024702176 021025 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Classes for reading multiple sequence alignment files in different formats.""" import xml.dom.minidom from cogent.parse import fasta, phylip, paml, clustal, genbank from cogent.parse import gbseq, tinyseq, macsim, gcg from cogent.parse.record import FileFormatError __author__ = "Cath Lawrence" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Cath Lawrence", "Gavin Huttley", "Peter Maxwell", "Matthew Wakefield", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" _lc_to_wc = ''.join([[chr(x),'?']['A' <= chr(x) <= 'Z'] for x in range(256)]) def FromFilenameParser(filename, format=None, **kw): """Arguments: - filename: name of the sequence alignment file - format: the multiple sequence file format """ format = format_from_filename(filename, format) f = open(filename, 'U') return FromFileParser(f, format, **kw) def FromFileParser(f, format, dialign_recode=False, **kw): format = format.lower() if format in XML_PARSERS: doctype = format format = 
'xml' else: doctype = None if format == 'xml': source = dom = xml.dom.minidom.parse(f) if doctype is None: doctype = str(dom.doctype.name).lower() if doctype not in XML_PARSERS: raise FileFormatError("Unsupported XML doctype %s" % doctype) parser = XML_PARSERS[doctype] else: if format not in PARSERS: raise FileFormatError("Unsupported file format %s" % format) parser = PARSERS[format] source = f for (name, seq) in parser(source, **kw): if isinstance(seq, basestring): if dialign_recode: seq = seq.translate(_lc_to_wc) if not seq.isupper(): seq = seq.upper() yield (name, seq) def format_from_filename(filename, format=None): """Detects format based on filename.""" if format: return format else: return filename[filename.rfind('.')+1:] PARSERS = { 'phylip': phylip.MinimalPhylipParser, 'paml': paml.PamlParser, 'fasta': fasta.MinimalFastaParser, 'mfa': fasta.MinimalFastaParser, 'fa': fasta.MinimalFastaParser, 'faa': fasta.MinimalFastaParser, 'fna': fasta.MinimalFastaParser, 'xmfa': fasta.MinimalXmfaParser, 'gde': fasta.MinimalGdeParser, 'aln': clustal.ClustalParser, 'clustal': clustal.ClustalParser, 'gb': genbank.RichGenbankParser, 'gbk': genbank.RichGenbankParser, 'genbank': genbank.RichGenbankParser, 'msf': gcg.MsfParser, } XML_PARSERS = { 'gbseq': gbseq.GbSeqXmlParser, 'tseq': tinyseq.TinyseqParser, 'macsim': macsim.MacsimParser, } PyCogent-1.5.3/cogent/parse/sfold.py000644 000765 000024 00000000655 12024702176 020321 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.parse.ct import ct_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def sfold_parser(lines): """Parser for Sfold output""" result = ct_parser(lines) return result PyCogent-1.5.3/cogent/parse/sprinzl.py000644 000765 000024 00000020213 12024702176 020703 
0ustar00jrideoutstaff000000 000000 #/usr/bin/env python """Parsers for the Sprinzl tRNA databases. """ from cogent.util.misc import InverseDict from string import strip, maketrans from cogent.core.sequence import RnaSequence from cogent.core.info import Info as InfoClass __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Jeremy Widmann", "Sandra Smit"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" def Rna(x, Info=None): if isinstance(x, list): x = ''.join(x) if Info is None: Info = {} return RnaSequence(x.upper().replace('T','U'), Info=InfoClass(Info)) SprinzlFields =['Accession', 'AA', 'Anticodon', 'Species', 'Strain'] def OneLineSprinzlParser(infile): """Returns successive records from the tRNA database. First line labels. This was the first attempt at the parser, and requires quite a lot of preprocessing. Use SprinzlParser for something more general. Works on a file obtained by the following method: 1. Do the default search. 2. Show all the columns and autofit them. 3. Delete the first column of numbers and all blank columns. 4. Name the first 5 columns "Accession, AA, Anticodon, Species, Strain". 5. Save the worksheet as plain text. """ first = True for l in infile: line = l.strip() if not line: continue fields = line.split('\t') if first: #label line label_fields = fields[5:] labels = InverseDict(enumerate(label_fields)) first = False else: info = dict(zip(SprinzlFields, map(strip, fields[0:5]))) info['Labels'] = labels yield Rna(map(strip, fields[5:]), Info=info) GenomicFields = ['', 'Accession', 'AA', '', 'Anticodon', '', 'Species', \ '', '', '', '', '', '', '', '', '', 'Strain', '', '', '', 'Taxonomy'] def _fix_structure(fields, seq): """Returns a string with correct # chars from db struct line. 
fields should be the result of line.split('\t') Implementation notes: Pairing line uses strange format: = is pair, * is GU pair, and nothing is unpaired. Cells are not padded out to the start or end of the sequence length, presumably to infuriate the unwary. I don't _think_ it's possible to convert these into ViennaStructures since we don't know where each helix starts and ends, and the lengths of each piece can vary. I'd be happy to be proven wrong on this... For some reason, _sometimes_ spaces are inserted, and _sometimes_ the cells are left entirely blank. Also, when there's a noncanonical pair in the helix, the helix is broken into two pieces, so counting pieces isn't going to work for figuring out the ViennaStructure. Expects as input the sequence and the raw structure line. """ num_blanks = 4 pieces = fields[num_blanks:] result = ['.'] * len(seq) for i, p in enumerate(pieces): if p and (p != ' '): result[i] = p return ''.join(result) def _fix_sequence(seq): """Returns string where terminal gaps are replaced with terminal CCA. Some of the sequence in the Genomic tRNA Database have gaps where the acceptor stem (terminal CCA) should be. This function checks the number of terminal gaps and replaces with appropriate part of terminal CCA. """ if seq.endswith('---'): seq = seq[:-3]+'CCA' elif seq.endswith('--'): seq = seq[:-2]+'CA' elif seq.endswith('-'): seq = seq[:-1]+'A' return seq def GenomicSprinzlParser(infile,fix_sequence=False): """Parser for the Genomic tRNA Database. Assumes the file has been prepared by the following method: 1. Set all search fields to empty. 2. Check all the results fields. 3. Perform the search (this takes a while). 4. Save the results worksheet as tab-delimited text. Note that the alignment length is supposed to be 99 bases, but not all the sequences have been padded out with the correct number of hyphens. 
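The terminal-gap repair that `_fix_sequence` performs can be shown as a small standalone sketch (same logic, illustrative name):

```python
# Sketch of _fix_sequence: up to three trailing gap characters are
# replaced by the matching tail of the acceptor stem's terminal 'CCA'.
def fix_terminal_cca(seq):
    if seq.endswith('---'):
        return seq[:-3] + 'CCA'
    if seq.endswith('--'):
        return seq[:-2] + 'CA'
    if seq.endswith('-'):
        return seq[:-1] + 'A'
    return seq
```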
""" num_blanks = 4 first = True for l in infile: #skip blank lines line = l.rstrip() if not line: continue fields = line.split('\t') if first: #label line #for unknown reasons, some of the field headers have '.' instead #of '0', e.g. '7.' instead of '70'. line = line.replace('.', '0') fields = line.split('\t') labels = InverseDict(enumerate(fields[num_blanks:])) first = False offset = 0 else: #expect 3 record lines at a time if offset == 0: #label line info = dict(zip(GenomicFields, map(strip, fields))) #add in the labels info['Labels'] = labels #convert the taxonomy from a string to a list info['Taxonomy'] = map(strip, info['Taxonomy'].split(';')) #convert the anticodon into RNA info['Anticodon'] = Rna(info['Anticodon']) #get rid of the empty fields del info[''] elif offset == 1: #sequence line raw_seq = ''.join(map(strip, fields)) #for some reason, there are underscores in some sequences raw_seq = raw_seq.replace('_', '-') if fix_sequence: raw_seq = _fix_sequence(raw_seq) seq = Rna(raw_seq, Info=info) elif offset == 2: #structure line seq.Pairing = _fix_structure(fields, seq) yield seq #figure out which type of line we're expecting next offset += 1 if offset > 2: offset = 0 def get_pieces(struct, splits): """Breaks up the structure at fixed positions, returns the pieces. struct: structure string in sprinzl format splits: list or tuple of positions to split on This is a helper function for the sprinzl_to_vienna function. struct = '...===...===.' splits = [0,3,7,-1,13] pieces -> ['...','===.','..===','.'] """ pieces = [] for x in range(len(splits)-1): pieces.append(struct[splits[x]:splits[x+1]]) return pieces def get_counts(struct_piece): """Returns a list of the lengths or the paired regions in the structure. struct_pieces: string, piece of structure in sprinzl format This is a helper function for the sprinzl_to_vienna function struct_piece = '.===.=..' 
returns [3,1] """ return map(len, filter(None, [i.strip('.') for i in \ struct_piece.split('.')])) def sprinzl_to_vienna(sprinzl_struct): """Constructs vienna structure from sprinzl sec. structure format sprinzl_struct: structure string in sprinzl format Many things are hardcoded in here, so if the format or the alignment changes, these values have to be adjusted!!! The correctness of the splits has been tested on the GenomicDB database from Jan 2006, containing 8163 sequences. """ assert len(sprinzl_struct) == 99 gu='*' wc='=' splits = [0,8,19,29,38,55,79,-11,len(sprinzl_struct)] direction = ['(','(',')','(',')','(',')',')'] #get structural pieces s = sprinzl_struct.replace(gu,wc) pieces = get_pieces(s, splits) assert len(pieces) == len(splits)-1 #get counts of structured regions in each piece, check validity counts = map(get_counts,pieces) pairs = [(0,-1),(1,2),(3,4),(5,6)] for i,j in pairs: assert sum(counts[i]) == sum(counts[j]) #check counts matches directions assert len(counts) == len(direction) #construct string string of brackets brackets = [] for lengths, br in zip(counts,direction): for l in lengths: brackets.append(l*br) brackets = ''.join(brackets) #build vienna structure vienna = [] x=0 for sym in s: if sym == '.': vienna.append(sym) else: vienna.append(brackets[x]) x += 1 return ''.join(vienna) PyCogent-1.5.3/cogent/parse/stockholm.py000644 000765 000024 00000037002 12024702176 021211 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides a parser for Stockholm format files. 
""" from string import strip from cogent.parse.record import RecordError from cogent.parse.record_finder import DelimitedRecordFinder from cogent.parse.clustal import ClustalParser from cogent.core.sequence import RnaSequence as Rna from cogent.core.sequence import ProteinSequence as Protein from cogent.core.moltype import BYTES from cogent.core.info import Info from cogent.struct.rna2d import WussStructure from cogent.util.transform import trans_all,keep_chars from cogent.core.alignment import Alignment, DataError, SequenceCollection from collections import defaultdict __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2008, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" def is_empty_or_html(line): """Return True for HTML line and empty (or whitespace only) line. line -- string The Stockholm adaptor that retrieves records inlcudes two HTML tags in the record. These lines need to be ignored in addition to empty lines. 
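The body of `is_empty_or_html` is mangled in this copy; a plausible reconstruction based on its docstring (the `'<'` prefix test for the adaptor's HTML tags is an assumption):

```python
def is_empty_or_html(line):
    # assumed behaviour: skip the stray HTML tag lines the Stockholm
    # web adaptor leaves in a record, plus empty/whitespace-only lines
    return line.startswith('<') or not line.strip()
```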
""" if line.startswith(' '-' rename chain_id = ' ' try: res_id = int(float(line[10:15])) res_ic = ' ' except ValueError: res_id = int(float(line[10:14])) res_ic = line[14] ss_code = line[24] phi = float(line[43:49]) psi = float(line[53:59]) asa = float(line[62:69]) data[(None, None, chain_id, (res_name, res_id, res_ic), None)] = \ { 'STRIDE_SS': ss_code, 'STRIDE_PHI': phi, 'STRIDE_PSI': psi, 'STRIDE_ASA': asa } return data PyCogent-1.5.3/cogent/parse/structure.py000644 000765 000024 00000003664 12024702176 021255 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Classes for reading macromolecular structure files in different formats.""" import xml.dom.minidom from cogent.parse import pdb from cogent.parse.sequence import format_from_filename from cogent.parse.record import FileFormatError __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" def FromFilenameStructureParser(filename, format=None, **kw): """ Returns a structure parser for a specified format for given filename. Arguments: - filename: name of the structure file - format: the structure file format """ format = format_from_filename(filename, format) f = open(filename, 'U') return FromFileStructureParser(f, format, **kw) def FromFileStructureParser(f, format, dialign_recode=False, **kw): """ Returns a structure parser for a specified format for given filename. 
Arguments: - filename: name of the structure file - format: the structure file format """ if not type(f) is file: raise TypeError('%s is not a file' % f) format = format.lower() if format in XML_PARSERS: doctype = format format = 'xml' else: doctype = None if format == 'xml': source = dom = xml.dom.minidom.parse(f) if doctype is None: doctype = str(dom.doctype.name).lower() if doctype not in XML_PARSERS: raise FileFormatError("Unsupported XML doctype %s" % doctype) parser = XML_PARSERS[doctype] else: if format not in PARSERS: raise FileFormatError("Unsupported file format %s" % format) parser = PARSERS[format] source = f return parser(source, **kw) PARSERS = { 'pdb': pdb.PDBParser, } XML_PARSERS = { } PyCogent-1.5.3/cogent/parse/table.py000644 000765 000024 00000012254 12024702176 020277 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import cPickle, csv from record_finder import is_empty from gzip import GzipFile __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class ConvertFields(object): """converter for input data to Table""" def __init__(self, conversion, by_column=True): """handles conversions of columns or lines Arguments: - by_column: conversion will by done for each column, otherwise done by entire line - """ super(ConvertFields, self).__init__() self.conversion = conversion self.by_column = by_column self._func = self.convertByColumns if not self.by_column: assert callable(conversion), \ "conversion must be callable to convert by line" self._func = self.convertByLine def convertByColumns(self, line): """converts each column in a line""" for index, cast in self.conversion: line[index] = cast(line[index]) return line def convertByLine(self, line): """converts each column in a line""" return self.conversion(line) def _call(self, *args, 
**kwargs): return self._func(*args, **kwargs) __call__ = _call def SeparatorFormatParser(with_header=True, converter = None, ignore = None, sep=",", strip_wspace=True, limit=None, **kw): """Returns a parser for a delimited tabular file. Arguments: - with_header: when True, first line is taken to be the header. Not passed to converter. - converter: a callable that returns a correctly formatted line. - ignore: lines for which ignore returns True are ignored. White-space lines are always skipped. - sep: the delimiter deparating fields. - strip_wspace: removes redundant white-space from strings. - limit: exits after this many lines""" sep = kw.get("delim", sep) if ignore is None: # keep all lines ignore = lambda x: False by_column = getattr(converter, 'by_column', True) def callable(lines): num_lines = 0 header = None for line in lines: if is_empty(line): continue line = line.strip('\n').split(sep) if strip_wspace and by_column: line = [field.strip() for field in line] if with_header and not header: header = True yield line continue if converter: line = converter(line) if ignore(line): continue yield line num_lines += 1 if limit is not None and num_lines >= limit: break return callable def autogen_reader(infile, sep, with_title, limit=None): """returns a SeparatorFormatParser with field convertor for numeric column types.""" seen_title_line = False for first_data_row in infile: if seen_title_line: break if sep in first_data_row and not seen_title_line: seen_title_line = True infile.seek(0) # reset to start of file numeric_fields = [] for index, value in enumerate(first_data_row.strip().split(sep)): try: v = float(value) except ValueError: try: v = long(value) except ValueError: continue numeric_fields += [(index, eval(value).__class__)] return SeparatorFormatParser(converter=ConvertFields(numeric_fields), sep=sep, limit=limit) def load_delimited(filename, header = True, delimiter = ',', with_title = False, with_legend = False, limit=None): if limit is not None: limit 
+= 1 # don't count header line if filename.endswith('gz'): f = GzipFile(filename, 'rb') else: f = file(filename, "U") reader = csv.reader(f, dialect = 'excel', delimiter = delimiter) rows = [] num_lines = 0 for row in reader: rows.append(row) num_lines += 1 if limit is not None and num_lines >= limit: break f.close() if with_title: title = ''.join(rows.pop(0)) else: title = '' if header: header = rows.pop(0) else: header = None if with_legend: legend = ''.join(rows.pop(-1)) else: legend = '' # now do type casting in the order int, float, default is string for row in rows: for cdex, cell in enumerate(row): try: cell = int(cell) row[cdex] = cell except ValueError: try: cell = float(cell) row[cdex] = cell except ValueError: pass pass return header, rows, title, legend PyCogent-1.5.3/cogent/parse/tinyseq.py000644 000765 000024 00000004754 12024702176 020712 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parser for NCBI Tiny Seq XML format. DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd" """ import xml.dom.minidom from cogent.core import annotation, moltype __author__ = "Matthew Wakefield" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Matthew Wakefield", "Peter Maxwell", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Matthew Wakefield" __email__ = "wakefield@wehi.edu.au" __status__ = "Production" """ CAUTION: This XML PARSER uses minidom. This means a bad performance for big files (>5MB), and huge XML files will for sure crash the program! """ def TinyseqParser(doc): """Parser for NCBI Tiny Seq XML format. DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd" Arguments: - doc: An xml.dom.minidom.Document, file object of string Yields: - name, cogent sequence CAUTION: This XML PARSER uses minidom. This means a bad performance for big files (>5MB), and huge XML files will for sure crash the program! 
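The minidom access pattern `TinyseqParser` relies on can be exercised on a made-up minimal TSeq record (the accession and sequence below are invented sample data, not a real NCBI record):

```python
import xml.dom.minidom

TSEQ = """<TSeqSet><TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_accver>X00001.1</TSeq_accver>
  <TSeq_sequence>acgt</TSeq_sequence>
</TSeq></TSeqSet>"""

def read_tseqs(text):
    dom = xml.dom.minidom.parseString(text)
    for rec in dom.getElementsByTagName('TSeq'):
        # each value lives in a single text child; str() de-unicodes
        name = str(rec.getElementsByTagName(
            'TSeq_accver')[0].childNodes[0].nodeValue)
        seq = str(rec.getElementsByTagName(
            'TSeq_sequence')[0].childNodes[0].nodeValue).upper()
        yield name, seq
```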
""" if isinstance(doc,xml.dom.minidom.Document): dom_obj = doc elif isinstance(doc,file): dom_obj = xml.dom.minidom.parse(doc) elif isinstance(doc,str): dom_obj = xml.dom.minidom.parseString(doc) else: raise TypeError for record in dom_obj.getElementsByTagName('TSeq'): raw_seq = record.getElementsByTagName( 'TSeq_sequence')[0].childNodes[0].nodeValue name = record.getElementsByTagName( 'TSeq_accver')[0].childNodes[0].nodeValue #cast as string to de-unicode raw_string = str(raw_seq).upper() name=str(name) if record.getElementsByTagName( 'TSeq_seqtype')[0].getAttribute('value') == u'protein': alphabet = moltype.PROTEIN else: alphabet = moltype.DNA seq = alphabet.makeSequence(raw_string, Name=name) seq.addAnnotation(annotation.Feature, "genbank_id", name, [(0,len(seq))]) organism = str(record.getElementsByTagName( 'TSeq_orgname')[0].childNodes[0].nodeValue) seq.addAnnotation(annotation.Feature, "organism", organism, [(0,len(seq))]) yield (name, seq) def parse(*args): return TinyseqParser(*args).next()[1] PyCogent-1.5.3/cogent/parse/tree.py000644 000765 000024 00000014307 12024702176 020150 0ustar00jrideoutstaff000000 000000 #/usr/bin/env python """Parsers for tree formats. Implementation Notes The algorithm used here is fairly general: should possibly make the code generalizable to tree strings that use alternative delimiters and symbols. However, I can't think of any cases where alternatives are used, so this is left to future work. Should possibly build a dict of {label:TreeNode} while parsing to make it convenient to fill in additional data later, e.g. to fill in sequences from their numeric labels in Newick format. Alternatively, maybe TreeNode should get a buildIndex() method that performs the equivalent task. As of 12/27/03, should be capable of parsing the ClustalW .dnd files without difficulty. 
""" from cogent.core.tree import PhyloNode from cogent.parse.record import RecordError from string import strip, maketrans __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Catherine Lozupone", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" _dnd_token_str = '(:),;' _dnd_tokens = dict.fromkeys(_dnd_token_str) _dnd_tokens_and_spaces = _dnd_token_str + ' \t\v\n' remove_dnd_tokens = maketrans(_dnd_tokens_and_spaces, \ '-'*len(_dnd_tokens_and_spaces)) def safe_for_tree(s): """Makes string s safe for DndParser by removing significant chars.""" return s.translate(remove_dnd_tokens) def bad_dnd_tokens(s, is_valid_name): """Returns list of bad dnd tokens from s, using is_valid_name for names. Useful for finding trees with misformatted names that break parsing. """ for t in DndTokenizer(s): if t in _dnd_tokens: continue #also OK if it's a number try: float(t) continue except: #wasn't a number -- further tests pass if is_valid_name(t): continue #if we got here, nothing worked, so yield the current token yield t def DndTokenizer(data): """Tokenizes data into a stream of punctuation, labels and lengths. Note: data should all be a single sequence, e.g. a single string. """ in_quotes = False saved = [] sa = saved.append for d in data: if d == "'": in_quotes = not(in_quotes) if d in _dnd_tokens and not in_quotes: curr = ''.join(saved).strip() if curr: yield curr yield d saved = [] sa = saved.append else: sa(d) def DndParser(lines, constructor=PhyloNode, unescape_name=False): """Returns tree from the Clustal .dnd file format, and anything equivalent. Tree is made up of cogent.base.tree.PhyloNode objects, with branch lengths (by default, although you can pass in an alternative constructor explicitly). 
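`DndTokenizer`'s quote-aware splitting can be demonstrated standalone; this trimmed version (hypothetical `tokenize_newick` name) follows the same single-pass logic, emitting punctuation and labels while leaving quoted labels intact:

```python
def tokenize_newick(data, punctuation="(:),;"):
    """Yield Newick punctuation and label/length tokens from a string,
    treating punctuation inside single quotes as literal text."""
    in_quotes = False
    saved = []
    for ch in data:
        if ch == "'":
            in_quotes = not in_quotes
        if ch in punctuation and not in_quotes:
            curr = ''.join(saved).strip()
            if curr:
                yield curr
            yield ch
            saved = []
        else:
            saved.append(ch)
```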
""" if isinstance(lines, str): data = lines else: data = ''.join(lines) #skip arb comment stuff if present: start at first paren paren_index = data.find('(') data = data[paren_index:] left_count = data.count('(') right_count = data.count(')') if left_count != right_count: raise RecordError, "Found %s left parens but %s right parens." % \ (left_count, right_count) tokens = DndTokenizer(data) curr_node = None state = 'PreColon' state1 = 'PreClosed' last_token = None for t in tokens: if t == ':': #expecting branch length state = 'PostColon' #prevent state reset last_token = t continue if t == ')' and (last_token == ',' or last_token == '('): # node without name new_node = _new_child(curr_node, constructor) new_node.Name = None curr_node = new_node.Parent state1 = 'PostClosed' last_token = t continue if t == ')': #closing the current node curr_node = curr_node.Parent state1 = 'PostClosed' last_token = t continue if t == '(': #opening a new node curr_node = _new_child(curr_node, constructor) elif t == ';': #end of data last_token = t break # node without name elif t == ',' and (last_token == ',' or last_token == '('): new_node = _new_child(curr_node, constructor) new_node.Name = None curr_node = new_node.Parent elif t == ',': #separator: next node adds to this node's parent curr_node = curr_node.Parent elif state == 'PreColon' and state1 == 'PreClosed': #data for the current node new_node = _new_child(curr_node, constructor) if unescape_name: if t.startswith("'") and t.endswith("'"): while t.startswith("'") and t.endswith("'"): t = t[1:-1] else: if '_' in t: t = t.replace('_', ' ') new_node.Name = t curr_node = new_node elif state == 'PreColon' and state1 == 'PostClosed': if unescape_name: while t.startswith("'") and t.endswith("'"): t = t[1:-1] curr_node.Name = t elif state == 'PostColon': #length data for the current node curr_node.Length = float(t) else: #can't think of a reason to get here raise RecordError, "Incorrect PhyloNode state? 
%s" % t state = 'PreColon' #get here for any non-colon token state1 = 'PreClosed' last_token = t if curr_node is not None and curr_node.Parent is not None: raise RecordError, "Didn't get back to root of tree." if curr_node is None: #no data -- return empty node return constructor() return curr_node #this should be the root of the tree def _new_child(old_node, constructor): """Returns new_node which has old_node as its parent.""" new_node = constructor() new_node.Parent = old_node if old_node is not None: if id(new_node) not in map(id, old_node.Children): old_node.Children.append(new_node) return new_node PyCogent-1.5.3/cogent/parse/tree_xml.py000644 000765 000024 00000005056 12024702176 021031 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parses a simple but verbose XML representation of a phylogenetic tree, with elements , , and . XML attributes are not used, so the syntax of parameter names is not restricted at all. Newick ------ ((a,b:3),c); XML --- a b length3.0 c Parameters are inherited by contained clades unless overridden. 
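The params-inherited-by-copy behaviour that `TreeHandler` implements can be shown with a cut-down SAX handler; `ParamScopes`, the `TREE` sample, and the `named` mapping are illustrative stand-ins, not part of this module:

```python
import xml.sax

TREE = ('<clade>'
        '<param><name>length</name><value>1.0</value></param>'
        '<clade><name>a</name></clade>'
        '<clade><name>b</name>'
        '<param><name>length</name><value>3.0</value></param>'
        '</clade>'
        '</clade>')

class ParamScopes(xml.sax.ContentHandler):
    """Records params per named clade; each <clade> starts from a copy
    of its parent's params, so values are inherited unless overridden."""
    def startDocument(self):
        self.scopes = [{}]   # param dicts, innermost scope last
        self.named = {}      # clade name -> params at that clade
        self.in_param = False
        self.text = ''
        self.param_name = None

    def startElement(self, tag, attrs):
        self.text = ''
        if tag == 'clade':
            self.scopes.append(dict(self.scopes[-1]))  # inherit a copy
        elif tag == 'param':
            self.in_param = True

    def characters(self, data):
        self.text += data

    def endElement(self, tag):
        if tag == 'name':
            if self.in_param:
                self.param_name = self.text
            else:
                self.named[self.text] = self.scopes[-1]
        elif tag == 'value':
            self.scopes[-1][self.param_name] = float(self.text)
        elif tag == 'param':
            self.in_param = False
        elif tag == 'clade':
            self.scopes.pop()

handler = ParamScopes()
xml.sax.parseString(TREE.encode(), handler)
```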
""" import xml.sax __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class TreeHandler(xml.sax.ContentHandler): def __init__(self, tree_builder): self.build_edge = tree_builder def startDocument(self): self.stack = [({}, None, None)] self.data = {'clades':[], 'params':{}} self.in_clade = False self.current = None def startElement(self, name, attrs): self.parent = self.data self.stack.append((self.data, self.in_clade, self.current)) self.current = "" if name == "clade": self.data = {'params':self.data['params'].copy(), 'clades':[], 'name':None} self.in_clade = True else: self.data = {} self.in_clade = False def characters(self, text): self.current += str(text) def endElement(self, name): getattr(self, 'process_%s' % name)(self.current, **self.data) (self.data, self.in_clade, self.current) = self.stack.pop() self.parent = self.stack[-1][0] def endDocument(self): pass def process_clade(self, text, name, params, clades): edge = self.build_edge(clades, name, params) self.parent['clades'].append(edge) def process_param(self, text, name, value): self.parent['params'][name] = value def process_name(self, text): self.parent['name'] = text.strip() def process_value(self, text): if text == "None": self.parent['value'] = None else: self.parent['value'] = float(text) def parse_string(text, tree_builder): handler = TreeHandler(tree_builder) xml.sax.parseString(text, handler) trees = handler.data['clades'] assert len(trees) == 1, trees return trees[0] PyCogent-1.5.3/cogent/parse/unafold.py000644 000765 000024 00000001576 12024702176 020645 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.parse.ct import ct_parser __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ 
= "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" def unafold_parser(lines=None): """Parser for unafold output""" result = ct_parser(lines) return result def order_structs(result): """Order structures according to energy value Order the structures so that the structure with lowest energy is ranked first and so on... Unafold returns results in the same order as the input files """ for i in result: i.reverse() result.sort() #result.reverse() #to test with the lowest negetiv value as the best struct for i in result: i.reverse() return result PyCogent-1.5.3/cogent/parse/unigene.py000644 000765 000024 00000007574 12024702176 020653 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Parsers for the various files in the UniGene database. """ from cogent.parse.record import MappedRecord, ByPairs, semi_splitter, \ equal_pairs, LineOrientedConstructor, list_adder, int_setter from cogent.parse.record_finder import GbFinder from string import maketrans, strip __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" def _read_sts(line): """Turns an STS line (without label) into a record. Infuritatingly, STS lines are not semicolon-delimited, and spaces appear in places they shouldn't. This was the case as of 10/9/03: expect this 'feature' to be unstable! 
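`_read_sts`'s strategy — normalize `=` to whitespace, then pair up consecutive tokens — in a standalone sketch returning a plain dict instead of a `MappedRecord` (the sample line is hypothetical):

```python
def read_sts(line):
    """Parse an STS line whose fields are '='/space delimited, with
    spaces sometimes appearing after the '='."""
    # flattening '=' to spaces makes every key/value an adjacent pair
    tokens = line.replace('=', ' ').split()
    return dict(zip(tokens[::2], tokens[1::2]))
```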
""" filtered = line.replace('=', ' ') return MappedRecord(list(ByPairs(filtered.split()))) def _read_expression(line): """Turns a semicolon-delimited expression line into list of expressions""" return semi_splitter(line) class UniGeneSeqRecord(MappedRecord): Aliases = {'ACC':'Accession', 'CLONE':'CloneId', 'END':'End',\ 'LID':'LibraryId', 'SEQTYPE':'SequenceType', 'TRACE':'Trace', \ 'EST':'EstId', 'NID':'NucleotideId', 'PID':'ProteinId'} class UniGeneProtSimRecord(MappedRecord): Aliases = {'ORG':'Species', 'PROTGI':'ProteinGi', 'ProtId':'ProteinId',\ 'PCT':'PercentSimilarity', 'ALN':'AlignmentScore'} def _read_seq(line): """Turns a sequence line into a UniGeneSeqRecord. BEWARE: first level delimiter is ';' and second level delimiter is '=', but '=' can also appear inside the _value_ of the second level! """ first_level = semi_splitter(line) second_level = map(equal_pairs, first_level) return UniGeneSeqRecord(second_level) def _read_protsim(line): """Turns a protsim line into a UniGeneProtSim record. BEWARE: first level delimiter is ';' and second level delimiter is '=', but '=' can also appear inside the _value_ of the second level! 
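Because `=` can also appear inside values, the second-level split must break on the first `=` only; `str.partition` does exactly that (the field values in the sample are invented for illustration):

```python
def split_fields(line):
    """Split a ';'-delimited line into (key, value) pairs, breaking
    each field on its FIRST '=' so values may contain '='."""
    pairs = []
    for field in line.split(';'):
        key, _, value = field.strip().partition('=')
        pairs.append((key, value))
    return pairs
```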
""" first_level = semi_splitter(line) second_level = map(equal_pairs, first_level) return UniGeneProtSimRecord(second_level) class UniGene(MappedRecord): """Holds data for a UniGene record.""" Required = { 'STS':[], 'PROTSIM':[], 'SEQUENCE':[], 'EXPRESS': []} Aliases = {'STS':'Sts', 'PROTSIM':'ProteinSimilarities',\ 'SEQUENCE':'SequenceIds','SCOUNT':'SequenceCount','CTYOBAND':'CytoBand',\ 'EXPRESS':'ExpressedIn', 'CHROMOSOME':'Chromosome','ID':'UniGeneId', \ 'TITLE':'UniGeneTitle','LOCUSLINK':'LocusLinkId'} def _expressions_setter(obj, field, val): """Sets specified field to a list of expressions""" setattr(obj, field, semi_splitter(val)) def _sts_adder(obj, field, val): """Appends the current STS-type record to specified field""" list_adder(obj, field, _read_sts(val)) def _seq_adder(obj, field, val): """Appends the current Sequence-type record to specified field""" list_adder(obj, field, _read_seq(val)) def _protsim_adder(obj, field, val): """Appends the current ProtSim record to specified field""" list_adder(obj, field, _read_protsim(val)) LinesToUniGene = LineOrientedConstructor() LinesToUniGene.Constructor = UniGene LinesToUniGene.FieldMap = { 'LOCUSLINK':int_setter, 'EXPRESS':_expressions_setter, 'PROTSIM':_protsim_adder, 'SCOUNT':int_setter, 'SEQUENCE':_seq_adder, 'STS':_sts_adder, } def UniGeneParser(lines): """Treats lines as a stream of unigene records""" for record in GbFinder(lines): curr = LinesToUniGene(record) del curr['//'] #clean up delimiter yield curr if __name__ == '__main__': from sys import argv, stdout filename = argv[1] count = 0 for record in UniGeneParser(open(filename)): stdout.write('.') stdout.flush() count += 1 print "read %s records" % count PyCogent-1.5.3/cogent/motif/__init__.py000644 000765 000024 00000000616 12024702176 020752 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Code for dealing with modules and motifs. 
""" __all__ = ['k_word','util'] __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" PyCogent-1.5.3/cogent/motif/k_word.py000644 000765 000024 00000032751 12024702176 020505 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """MotifFinder that searches for over-represented k-words in an alignment.""" from __future__ import division from cogent.motif.util import Motif, Module, ModuleFinder, ModuleConsolidator, \ MotifFinder, Location, ModuleInstance, MotifResults from cogent.core.bitvector import PackedBases from cogent.maths.stats.test import combinations, multiple_comparisons from cogent.maths.stats.distribution import poisson_high from numpy import array, fromstring __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Gavin Huttley", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" class KWordModuleFinder(ModuleFinder): """ModuleFinder that finds all k-words in an alignment. """ def __init__(self,Alignment,MolType): """Initializing KWordModuleFinder. """ self.ModuleDict = {} self.ModuleOrder = [] self.Alignment = Alignment self.MolType = MolType self._make_char_array_aln() def _make_char_array_aln(self): """Turns self.Alignment into a character array. """ for k,v in self.Alignment.items(): self.Alignment[k]=array(v,'c') def __call__(self,word_length): """Builds a dict of all Modules and a list of their order. - module_dict is {module pattern:Module object} - module_order is a list in descending order of their count. 
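The dictionary-building pass in `KWordModuleFinder.__call__` reduces to a few lines of plain Python; this sketch records positions rather than `ModuleInstance` objects and breaks count ties arbitrarily (the original sorts ties by word):

```python
def count_kwords(seqs, k):
    """Map each k-word to its (seq name, position) hits, plus the words
    ordered by descending hit count."""
    hits = {}
    for name, seq in seqs.items():
        # one window per possible word start in this sequence
        for i in range(len(seq) - k + 1):
            hits.setdefault(seq[i:i + k], []).append((name, i))
    order = sorted(hits, key=lambda w: len(hits[w]), reverse=True)
    return hits, order
```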
""" #Dictionary keying k-word to Module self.ModuleDict = {} #For each sequence in the alignment for key,seq in self.Alignment.items(): #For each position in seq till end - word_length for i in range(0,len(seq)-word_length+1): #Get the current k-word word = seq[i:i+word_length].tostring() #Create a location object location = Location(key,i,i+word_length) #Create a ModuleInstance curr_instance = ModuleInstance(word,location) #Check to see if pattern is already in dict if word in self.ModuleDict: #Add instance to Module self.ModuleDict[word][(key,i)]=curr_instance #Not in dict else: #Create a new module and add to dict self.ModuleDict[word]=Module({(key,i):curr_instance},\ MolType=self.MolType) #Get list of counts module_counts = \ [(len(mod.Names),word) for word,mod in self.ModuleDict.items()] #Sort and put in descending order module_counts.sort() module_counts.reverse() #Get list of only the words in descending order self.ModuleOrder = [word for i,word in module_counts] class KWordModuleConsolidatorNucleotide(ModuleConsolidator): """Consolidates module instances obtained from an alignment into modules. - Must be initialized with an instance of KWordModuleFinder """ def __init__(self,KFinder): """Initializing KWordModuleConsolidatorNucleotide. """ self.Modules=[] self.KFinder=KFinder def __call__(self,mismatches): """Consolidates ModuleInstances in KFinder.ModuleInstances into Modules. - mismatches accounts for the difference between two ModuleInstances in terms of purine vs pyrimidine or strong vs weak bonding. 
""" #New list to hold consolidated modules consolidated_list = [] #Bind module_dict and module_order locally module_dict = self.KFinder.ModuleDict module_order = self.KFinder.ModuleOrder #Check that dict is not empty if module_dict: #Iterate through modules in order for pattern in module_order: #Create a Bitvector representation of pattern pat_vec = PackedBases(pattern) added=False #Iterate through consolidated_list for curr_module,curr_vec in consolidated_list: #If pat_vec and curr_vec are in the allowed mismatch cutoff if sum(pat_vec ^ curr_vec) <= mismatches: #Add module information to curr_module curr_module.update(module_dict[pattern]) added=True break if not added: #Add new module to consolidated list consolidated_list.append([module_dict[pattern],pat_vec]) #self.Modules should have only the list of modules self.Modules = [i[0] for i in consolidated_list] for mod in self.Modules: mod.Template = str(mod) class KWordModuleConsolidatorProtein(ModuleConsolidator): """Consolidates module instances obtained from an alignment into modules. - Must be initialized with an instance of KWordModuleFinder """ def __init__(self,KFinder): """Initializing KWordModuleConsolidatorProtein. """ self.Modules=[] self.KFinder=KFinder def __call__(self,mismatches): """Consolidates ModuleInstances in KFinder.ModuleInstances into Modules. - mismatches accounts for the difference between two ModuleInstances in terms of purine vs pyrimidine or strong vs weak bonding. 
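Both consolidators above implement the same greedy merge, differing only in how the distance is computed (bitvector XOR for nucleotides, character-array comparison for proteins); a plain-string Hamming sketch of that shared logic:

```python
def consolidate(words_by_count, max_mismatches):
    """Greedy merge: walk words in descending count order and attach
    each to the first kept representative within the mismatch cutoff."""
    kept = []  # list of (representative, [member words])
    for word in words_by_count:
        for rep, members in kept:
            # Hamming distance between equal-length words
            if sum(a != b for a, b in zip(word, rep)) <= max_mismatches:
                members.append(word)
                break
        else:
            kept.append((word, [word]))
    return kept
```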
""" #New list to hold consolidated modules consolidated_list = [] #Bind module_dict and module_order locally module_dict = self.KFinder.ModuleDict module_order = self.KFinder.ModuleOrder #Check that dict is not empty if module_dict: #Iterate through modules in order for pattern in module_order: #Create a Bitvector representation of pattern pat_array = fromstring(pattern,'c') added=False #Iterate through consolidated_list for curr_module,curr_array in consolidated_list: #If pat_vec and curr_vec are in the allowed mismatch cutoff if sum(pat_array != curr_array) <= mismatches: #Add module information to curr_module curr_module.update(module_dict[pattern]) added=True break if not added: #Add new module to consolidated list consolidated_list.append([module_dict[pattern],pat_array]) #self.Modules should have only the list of modules self.Modules = [i[0] for i in consolidated_list] class KWordModuleFilterProtein(ModuleConsolidator): """Filters list of Modules by number of Modules and seqs where Module is in. - This is a strict ModuleConsolidator. Does not allow mismatches, but rather consolidates based on a minimum allowed modules and minimum number of sequences that a module must be present in. """ def __init__(self,KFinder,Alignment): """Initializing KWordModuleFilterProtein. """ self.Modules=[] self.KFinder=KFinder self.Alignment=Alignment def __call__(self,min_modules,min_seqs): """Adds modules to self.Modules from self.KFinder.ModuleDict as follows: - modules that have at least min_modules and are present in at least min_seqs will be added to self.Modules. 
""" #First filter by min_modules for k,v in self.KFinder.ModuleDict.items(): #If number of modules is greater than the minimum if len(v) >= min_modules: #Check to see that module is present in at least min_seqs curr_seqs = {} for curr in v.keys(): curr_seqs[curr[0]]=True if len(curr_seqs) >= min_seqs: self.Modules.append(self.fixModuleSequence(v)) def fixModuleSequence(self,module): """Remaps original (non-reduced) sequence string for each ModuleInstance """ module_len=len(str(module)) module.Template=str(module) for k,v in module.items(): seq_id, module_start = k module_end = module_start+module_len loc = Location(seq_id,module_start,module_end) curr_str = \ self.Alignment[seq_id][module_start:module_end] curr_instance = ModuleInstance(curr_str,loc) module[k]=curr_instance return module class KWordMotifFinder(MotifFinder): """Constructs a list of Motifs from a list of Modules. - Must be initialized with a list of Modules and an Alignment. """ def __init__(self,Modules,Alignment,Mismatches,BaseFrequency): """Initializing KWordMotifFinder. """ self.Modules=Modules #List of modules from Alignment self.Alignment=Alignment #Alignment used to find modules self.Mismatches=Mismatches #Number of mismatches used for consolidation self.BaseFrequency=BaseFrequency def __call__(self,threshold,alphabet_len=None,use_loose=True): """Returns a MotifResults object containing a list of significant Motifs - When a Module is found that is less than the threshold, a new Motif object is created and added to the list of significant motifs. 
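The strict significance test in `getPValueStrict` is a Poisson tail probability with a correction over all words of the same length; a self-contained sketch (`poisson_high` here is a direct stdlib reimplementation of the tail, and the `1-(1-p)^n` form of the multiple-comparisons correction is an assumption about what `multiple_comparisons` computes):

```python
from math import exp, factorial

def poisson_high(x, mean):
    # P(X > x) for X ~ Poisson(mean)
    return 1.0 - sum(exp(-mean) * mean ** k / factorial(k)
                     for k in range(x + 1))

def kword_p_value(observed, module_p, module_len, aln_len, num_seqs,
                  alphabet_len):
    """P-value for seeing `observed` copies of a word whose per-site
    probability is module_p (product of base frequencies)."""
    # expected count: one trial per possible word start in each sequence
    mean = (aln_len + num_seqs * (1 - module_len)) * module_p
    p = poisson_high(observed - 1, mean)  # P(count >= observed)
    # correct for having tested every possible word of this length
    n_tests = alphabet_len ** module_len
    return 1.0 - (1.0 - p) ** n_tests
```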
""" #Create new MotifResults object k_word_results = MotifResults() k_word_results.Modules=[] k_word_results.Motifs=[] #Add alignment to MotifResults k_word_results.Alignment = self.Alignment #Give each module a unique ID module_id = 0 #For each module for i,module in enumerate(self.Modules): if use_loose: p_curr = self.getPValueLoose(module,alphabet_len) else: p_curr = self.getPValueStrict(module,alphabet_len) #If the P value is less than or equal to the threshold if p_curr <= threshold: #Add the P value to the current module module.Pvalue = p_curr module.ID = module_id module_id+=1 #Append the module to the k_word_results Modules list k_word_results.Modules.append(module) #Create a Motif from the module and append to Motifs list k_word_results.Motifs.append(Motif(module)) return k_word_results def getPValueStrict(self,module, alphabet_len=None): """Returns the Pvalue of a module. - pass alphabet_len if alphabet other than module.MolType was used to find modules (i.e. using a protein reduced alphabet) """ #Length of the module module_len = len(str(module)) #if moltype length has not been passed, get it from module.MolType # if not alphabet_len: alphabet_len = len(module.MolType.Alphabet) #Total length of the alignment aln_len=sum(map(len,self.Alignment.values())) #Number of sequences in the alignment num_seqs = len(self.Alignment) #get module_p module_p = 1 for char in module.Template: module_p *= self.BaseFrequency.get(char,1) #Mean for passing to poisson_high NOT using loose correction strict_mean = \ (aln_len + num_seqs*(1-module_len))*(module_p) #Strict P value from poisson_high strict_p_value = poisson_high(len(module)-1,strict_mean) #Correct P value for multiple comparisons strict_p_corrected = \ multiple_comparisons(strict_p_value,alphabet_len**module_len) return strict_p_corrected def getPValueLoose(self,module, alphabet_len=None): """Returns the Pvalue of a module. - pass alphabet_len if alphabet other than module.MolType was used to find modules (i.e. 
using a protein reduced alphabet) """ #Length of the module module_len = len(module.Template) #if moltype length has not been passed, get it from module.MolType # if not alphabet_len: alphabet_len = len(module.MolType.Alphabet) #Total length of the alignment aln_len=sum(map(len,self.Alignment.values())) #Number of sequences in the alignment num_seqs = len(self.Alignment) #get module_p module_p = 1 for char in module.Template: module_p *= self._degen_p(char,module.MolType) #Mean for passing to poisson_high using loose correction loose_mean = \ (aln_len + num_seqs*(1-module_len))*(module_p) #Loose P value from poisson_high loose_p_value = poisson_high(len(module)-1,loose_mean) #Correct P value for multiple comparisons loose_p_corrected = \ multiple_comparisons(loose_p_value,alphabet_len**module_len) return loose_p_corrected def _degen_p(self,char,alphabet): """Returns the sum of the probabilities of seeing each degenerate char. """ all = alphabet.Degenerates.get(char,char) return sum([self.BaseFrequency.get(x,1) for x in all]) PyCogent-1.5.3/cogent/motif/util.py000644 000765 000024 00000041506 12024702176 020173 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Utility classes for general motif and module API.""" from __future__ import division from cogent.core.alignment import Alignment from cogent.core.location import Span __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Prototype" class Location(Span): """Object that stores location information for a module -Sequence refers to the original sequence the module came from -SeqId is the key of the sequence in the alignment -Start is the position in the sequence """ def __init__(self, SeqId, Start, End=None): """Initializes location object""" self.SeqId = SeqId Span.__init__(self,Start, End) def 
__cmp__(self,other): """Overwriting __cmp__ for sorting purposes""" return cmp(self.SeqId, other.SeqId) class ModuleInstanceI(object): """Object that stores individual module instance information. Contains sequence, location, Pvalue and Evalue of a module instance as well as some basic instance functions. """ def __init__(self, Sequence, Location, Pvalue=None, Evalue=None): """Initializes ModuleInstance object""" self.Sequence = Sequence self.Location = Location #Location Object self.Pvalue = Pvalue self.Evalue = Evalue def distance(self,other): """Calculates the distance between two ModuleInstances""" raise NotImplementedError def __cmp__(self,other): """Overwriting __cmp__ function to compare ModuleInstance objects""" if self is other: return 0 return cmp(self.Pvalue,other.Pvalue) \ or cmp(self.Evalue,other.Evalue) \ or cmp(self.Location,other.Location) \ or cmp(str(self),str(other)) def __lt__(self, other): return cmp(self, other) == -1 def __le__(self, other): return cmp(self, other) <= 0 def __gt__(self, other): return cmp(self, other) == 1 def __ge__(self, other): return cmp(self, other) >= 0 def __eq__(self, other): return self.__cmp__(other) == 0 def __ne__(self, other): return cmp(self, other) != 0 class ModuleInstanceStr(ModuleInstanceI, str): """Constructor for ModuleInstance inheriting from string.""" def __new__(cls, data='', *args, **kwargs): return str.__new__(cls, data) def __init__(self, *args, **kwargs): return ModuleInstanceI.__init__(self, *args, **kwargs) def ModuleInstance(data, Location, Pvalue=None, Evalue=None, constructor=None): """Creates ModuleInstance given a constructor.""" if constructor is None: #maybe add code to try to figure out what to do from the data later constructor=ModuleInstanceStr return constructor(data, Location, Pvalue, Evalue) def seqs_from_empty(obj, *args, **kwargs): """Allows empty initialization of Module, useful when data must be added.""" return [], [] class Module(Alignment): """Object that stores module 
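ModuleInstanceI orders instances by the first non-zero comparison in the chain Pvalue, Evalue, Location, string value. A standalone sketch of that tie-breaking pattern, using tuples as hypothetical stand-ins for instances and spelling out Python 2's cmp so the snippet is version-agnostic:

```python
def _cmp(a, b):
    # Python 2's cmp(), written out explicitly.
    return (a > b) - (a < b)

def instance_cmp(x, y):
    # Mirrors ModuleInstanceI.__cmp__: `or` returns the first non-zero
    # comparison, so later fields only break ties in earlier ones.
    return _cmp(x[0], y[0]) or _cmp(x[1], y[1]) or _cmp(x[2], y[2])

a = (0.01, 0.5, 'GGAAA')   # (Pvalue, Evalue, sequence)
b = (0.01, 0.5, 'GGACA')   # ties on Pvalue and Evalue; string decides
c = (0.05, 0.1, 'AAAAA')   # larger Pvalue dominates everything else
```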
information. Module is an Alignment of ModuleInstances. Constructed as a dict keyed by location with ModuleInstance sequence as the value: - {(SeqId, Start): ModuleInstance} """ InputHandlers = Alignment.InputHandlers.copy() InputHandlers['empty'] = seqs_from_empty def __init__(self, data=None, Template=None, MolType=None,\ Locations=None, Pvalue=None, Evalue=None, Llr=None,\ ID=None,ConsensusSequence=None): """Initializes Module object""" self.Template = Template if MolType is not None: self.MolType = MolType self.Pvalue = Pvalue self.Evalue = Evalue self.Llr = Llr #Log likelihood ratio self.ID = ID self.ConsensusSequence = ConsensusSequence if isinstance(data, dict): data = sorted(data.items()) else: try: data = sorted(data) except TypeError: pass super(Module, self).__init__(data, MolType=MolType) def update(self, other): """Updates self with info in other, in-place. WARNING: No validation!""" self.Names += other.Names self.NamedSeqs.update(other.NamedSeqs) def __setitem__(self, item, val): """Replaces item in self.NamedSeqs. WARNING: No validation!""" if item not in self.NamedSeqs: self.Names.append(item) self.NamedSeqs[item] = val def __repr__(self): return str(self.NamedSeqs) def __str__(self): """Returns string representation of IUPAC consensus sequence""" if len(self.MolType.Alphabet) < 20: return str(self.IUPACConsensus(self.MolType)) return str(''.join(self.majorityConsensus())) def distance(self,other): """Calculates the distance between two Modules""" raise NotImplementedError def __cmp__(self,other): """Overwriting __cmp__ function to compare Module objects""" return cmp(self.Pvalue,other.Pvalue) \ or cmp(self.Evalue,other.Evalue) def __hash__(self): """overwriting __hash__ function to hash Module object""" return id(self) def _get_location_dict(self): """Returns a dict of module locations. 
Represented as a dict with SeqId as key and [indices] as values: {SeqId:[indices]} """ location_dict = {} for key in self.Names: try: location_dict[key[0]].append(key[1]) except: location_dict[key[0]]=[key[1]] return location_dict LocationDict = property(_get_location_dict) def _get_loose(self): """Returns a list of all ModuleInstances not in self.Strict. """ loose_list = [] strict = self.Strict[0].Sequence for instance in self.values(): if instance.Sequence != strict: loose_list.append(instance) return loose_list Loose = property(_get_loose) def _get_strict(self): """Returns a list of ModuleInstances with the most common sequence. """ strict_dict = {} #Dictionary to hold counts of instance strings. #For each ModuleInstance in self. for instance in self.values(): #If instance already in strict_dict then increment and append. if instance.Sequence in strict_dict: strict_dict[instance.Sequence][0]+=1 strict_dict[instance.Sequence][1].append(instance) #Else, add count and instance to dict. else: strict_dict[instance.Sequence]=[1,[instance]] #List with all counts and instances count_list = strict_dict.values() count_list.sort() count_list.reverse() #Set self.Template as the Strict ModuleInstance sequence. self.Template = count_list[0][1][0].Sequence #Return list of ModuleInstances with the most common sequence. return count_list[0][1] Strict = property(_get_strict) def basePossibilityCount(self,degenerate_dict=None): """Returns number of possible combinations to form a degenerate string. 
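Strict collects the instances sharing the most common sequence and Loose collects everything else. The counting logic above can be expressed compactly with collections.Counter (a sketch over plain strings, not the PyCogent implementation):

```python
from collections import Counter

# Hypothetical instance sequences for one module.
instances = ['GGAUA', 'GGAUA', 'GGACA', 'GGAUA', 'GGAAA']

# Template = most common sequence; Strict/Loose split around it.
template, count = Counter(instances).most_common(1)[0]
strict = [s for s in instances if s == template]
loose = [s for s in instances if s != template]
```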
""" if degenerate_dict is None: degenerate_dict = self.MolType.Degenerates #Get degenerate string representation of module degenerate_string = self.__str__() #Get length of first degenerate character combinations = len(degenerate_dict.get(degenerate_string[0],'-')) #Multiply number of possibilities for each degenerate character together for i in range(1, len(degenerate_string)): combinations *= len(degenerate_dict.get(degenerate_string[i],'-')) #Return total possible ways to make module return combinations def _coerce_seqs(self, seqs, is_array): """Override _coerce_seqs so we keep the orig objects.""" return seqs def _seq_to_aligned(self, seq, key): """Override _seq_to_aligned so we keep the orig objects.""" return seq class ModuleFinder(object): """Object that constructs a dict of modules given an alignment""" def __call__(self, *args): """Call method for ModuleFinder""" raise NotImplementedError class ModuleConsolidator(object): """Object that takes in a list of modules and returns a consolidated list. Modules that are very similar are considered the same module. """ def __call__(self, *args): """Call method for ModuleConsolidator""" raise NotImplementedError class Motif(object): """Object that stores modules that are considered the same motif """ def __init__(self, Modules=None, Info=None): """Initializes Motif object""" self.Modules = [] try: #only one module in motif self.Modules.append(Modules) except: #list of modules self.Modules.extend(Modules) self.Info = Info class MotifFinder(object): """Object that takes modules and constructs motifs - Takes in a list of modules and constructs a list of Motifs""" def __call__(self, *args): """Call method for MotifFinder""" raise NotImplementedError class MotifFormatter(object): """Object that takes a list of Motifs and formats them for output to browser - Takes in a list of motifs and generates specified output format. 
""" COLORS = [ "#00FF00","#0000FF", "#FFFF00", "#00FFFF", "#FF00FF", "#FAEBD7", "#8A2BE2", "#A52A2A", "#00CC00", "#FF6600", "#FF33CC", "#CC33CC", "#9933FF", "#FFCCCC", "#00CCCC", "#CC6666", "#CCCC33", "#66CCFF", "#6633CC", "#FF6633" ] STYLES = ["", "font-weight: bold", "font-style: italic"] def getColorMapS0(self, module_ids): """ Standalone version - needed b/c of pickle problem """ color_map = {} mod = len(MotifFormatter.COLORS) smod = len(MotifFormatter.STYLES) for module_id in module_ids: ix = int(module_id) cur_color = ix % mod cur_style = int(round((ix / mod))) % smod style_str = """background-color: %s; %s; font-family: 'Courier New', Courier""" color_map[module_id] = style_str % ( MotifFormatter.COLORS[cur_color], MotifFormatter.STYLES[cur_style]) return color_map def getColorMap(self, motif_results): """ Return color mapping for motif_results """ module_ids = [] for motif in motif_results.Motifs: for module in motif.Modules: module_ids.append(module.ID) return self.getColorMapS0(sorted(module_ids)) def getColorMapRgb(self, motif_results): """ Return color mapping for motif_results using RGB rather than hex. 
""" module_ids = [] for motif in motif_results.Motifs: for module in motif.Modules: module_ids.append(module.ID) color_map = {} mod = len(MotifFormatter.COLORS) for module_id in module_ids: ix = int(module_id) cur_color = ix % mod color_map["color_" + str(module_id)] = \ html_color_to_rgb(MotifFormatter.COLORS[cur_color]) return color_map def __init__(self, *args): """Init method for MotifFormatter""" self.ConsCache={} self.ConservationThresh=None def __call__(self, *args): """Call method for MotifFormatter""" raise NotImplementedError def _make_conservation_consensus(self, module): """ Return conservation consensus string """ mod_id = module.ID if mod_id in self.ConsCache: return self.ConsCache[mod_id] cons_thresh = self.ConservationThresh cons_seq = ''.join(module.majorityConsensus()) col_freqs = module.columnFreqs() cons_con_seq = [] for ix, col in enumerate(col_freqs): col_sum = sum(col.values()) keep = False for b, v in col.items(): cur_cons = v / col_sum if cur_cons >= cons_thresh: keep = True if keep: cons_con_seq.append(cons_seq[ix]) else: cons_con_seq.append(" ") self.ConsCache[mod_id] = (cons_seq, ''.join(cons_con_seq)) return self.ConsCache[mod_id] def _flag_conserved_consensus(self, cons_con_seq, cons_seq, cur_seq): """ Annotate consensus """ color_style = """background-color: %s; font-family: 'Courier New', Courier""" span_fmt = """%s""" h_str = [] for ix in range(len(cur_seq)): cur_c = cur_seq[ix] if cur_c == cons_con_seq[ix]: h_str.append(span_fmt % (color_style % "#eeeeee", "+")) elif cons_con_seq[ix] != " ": #h_str.append("-") h_str.append(span_fmt % (color_style % "#ff0000", "-")) elif cons_seq[ix] == cur_c: #h_str.append("*") h_str.append(span_fmt % (color_style % "white", "*")) else: h_str.append(" ") return h_str #return """%s""" % ''.join(h_str) class MotifResults(object): """Object that holds a list of Modules, Motifs and a dict of Results. 
""" def __init__(self,Modules=None, Motifs=None, Results=None, Parameters=None, Alignment=None,MolType=None): """Initializes MotifResults object.""" self.Modules = Modules or [] self.Motifs = Motifs or [] self.Results = Results or {} #Results not belonging to other categories. if Parameters: self.__dict__.update(Parameters) self.Alignment = Alignment self.MolType = MolType def makeModuleMap(self): """Returns dict of sequence ID keyed to modules. - result = {sequence_id:(index_in_sequence, module_id, module_len)} """ module_map = {} #Dict with locations of every motif keyed by module if self: for motif in self.Motifs: for module in motif.Modules: mod_len = len(module) mod_id = str(module.ID) for skey, indexes in module.LocationDict.items(): if skey not in module_map: module_map[skey] = [] for ix in indexes: module_map[skey].append((ix, mod_id, mod_len)) return module_map def html_color_to_rgb(colorstring): """ convert #RRGGBB to an (R, G, B) tuple - From Python Cookbook. """ colorstring = colorstring.strip() if colorstring[0] == '#': colorstring = colorstring[1:] if len(colorstring) != 6: raise ValueError, "input #%s is not in #RRGGBB format" % colorstring r, g, b = colorstring[:2], colorstring[2:4], colorstring[4:] r, g, b = [int(n, 16) for n in (r, g, b)] #Divide each rgb value by 255.0 so to get float from 0.0-1.0 so colors # work in PyMOL r = r/255.0 g = g/255.0 b = b/255.0 return (r, g, b) def make_remap_dict(results_ids, allowed_ids): """Returns a dict mapping results_ids to allowed_ids. """ remap_dict = {} warning = None if sorted(results_ids) == sorted(allowed_ids): remap_dict = dict(zip(results_ids,results_ids)) else: warning = 'Sequence IDs do not match allowed IDs. IDs were remapped.' 
for ri in results_ids: curr_match = [] for ai in allowed_ids: if ai.startswith(ri): curr_match.append(ai) if not curr_match: raise ValueError, \ 'Sequence ID "%s" was not found in allowed IDs'%(ri) #if current results id was prefix of more than one allowed ID elif len(curr_match)>1: #Check if any allowed ID matches map to other results IDs for cm in curr_match: #Remove any matches that map to other results IDs for ri2 in results_ids: if ri2 != ri and cm.startswith(ri2): curr_match.remove(cm) #Raise error if still more than one match if len(curr_match)>1: raise ValueError, \ 'Sequence ID "%s" had more than one match in allowed IDs: "%s"'%(ri,str(curr_match)) remap_dict[ri]=curr_match[0] return remap_dict, warning PyCogent-1.5.3/cogent/maths/__init__.py000644 000765 000024 00000001410 12024702176 020741 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['fit_function' 'function_optimisation', 'geometry', 'markov', 'matrix_exponentiation', 'matrix_logarithm', 'optimiser', 'optimisers', 'scipy_optimisers', 'scipy_optimize', 'simannealingoptimiser', 'solve', 'spatial', 'svd'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Peter Maxwell", "Matthew Wakefield", "Rob Knight", "Edward Lang", "Sandra Smit", "Antonio Gonzalez Pena"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" PyCogent-1.5.3/cogent/maths/_period.c000644 000765 000024 00001034407 12014704604 020427 0ustar00jrideoutstaff000000 000000 /* Generated by Cython 0.16 on Tue Aug 21 23:07:08 2012 */ #define PY_SSIZE_T_CLEAN #include "Python.h" #ifndef Py_PYTHON_H #error Python headers needed to compile C extensions, please install development version of Python. #elif PY_VERSION_HEX < 0x02040000 #error Cython requires Python 2.4+. 
#else #include /* For offsetof */ #ifndef offsetof #define offsetof(type, member) ( (size_t) & ((type*)0) -> member ) #endif #if !defined(WIN32) && !defined(MS_WINDOWS) #ifndef __stdcall #define __stdcall #endif #ifndef __cdecl #define __cdecl #endif #ifndef __fastcall #define __fastcall #endif #endif #ifndef DL_IMPORT #define DL_IMPORT(t) t #endif #ifndef DL_EXPORT #define DL_EXPORT(t) t #endif #ifndef PY_LONG_LONG #define PY_LONG_LONG LONG_LONG #endif #ifndef Py_HUGE_VAL #define Py_HUGE_VAL HUGE_VAL #endif #ifdef PYPY_VERSION #define CYTHON_COMPILING_IN_PYPY 1 #define CYTHON_COMPILING_IN_CPYTHON 0 #else #define CYTHON_COMPILING_IN_PYPY 0 #define CYTHON_COMPILING_IN_CPYTHON 1 #endif #if CYTHON_COMPILING_IN_PYPY #define __Pyx_PyCFunction_Call PyObject_Call #else #define __Pyx_PyCFunction_Call PyCFunction_Call #endif #if PY_VERSION_HEX < 0x02050000 typedef int Py_ssize_t; #define PY_SSIZE_T_MAX INT_MAX #define PY_SSIZE_T_MIN INT_MIN #define PY_FORMAT_SIZE_T "" #define PyInt_FromSsize_t(z) PyInt_FromLong(z) #define PyInt_AsSsize_t(o) __Pyx_PyInt_AsInt(o) #define PyNumber_Index(o) PyNumber_Int(o) #define PyIndex_Check(o) PyNumber_Check(o) #define PyErr_WarnEx(category, message, stacklevel) PyErr_Warn(category, message) #define __PYX_BUILD_PY_SSIZE_T "i" #else #define __PYX_BUILD_PY_SSIZE_T "n" #endif #if PY_VERSION_HEX < 0x02060000 #define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt) #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type) #define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size) #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size, #define PyType_Modified(t) typedef struct { void *buf; PyObject *obj; Py_ssize_t len; Py_ssize_t itemsize; int readonly; int ndim; char *format; Py_ssize_t *shape; Py_ssize_t *strides; Py_ssize_t *suboffsets; void *internal; } Py_buffer; #define PyBUF_SIMPLE 0 #define PyBUF_WRITABLE 0x0001 #define PyBUF_FORMAT 0x0004 #define PyBUF_ND 0x0008 #define PyBUF_STRIDES (0x0010 | PyBUF_ND) #define PyBUF_C_CONTIGUOUS (0x0020 
| PyBUF_STRIDES) #define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES) #define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES) #define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES) #define PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_FORMAT | PyBUF_WRITABLE) #define PyBUF_FULL (PyBUF_INDIRECT | PyBUF_FORMAT | PyBUF_WRITABLE) typedef int (*getbufferproc)(PyObject *, Py_buffer *, int); typedef void (*releasebufferproc)(PyObject *, Py_buffer *); #endif #if PY_MAJOR_VERSION < 3 #define __Pyx_BUILTIN_MODULE_NAME "__builtin__" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #else #define __Pyx_BUILTIN_MODULE_NAME "builtins" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #endif #if PY_MAJOR_VERSION < 3 && PY_MINOR_VERSION < 6 #define PyUnicode_FromString(s) PyUnicode_Decode(s, strlen(s), "UTF-8", "strict") #endif #if PY_MAJOR_VERSION >= 3 #define Py_TPFLAGS_CHECKTYPES 0 #define Py_TPFLAGS_HAVE_INDEX 0 #endif #if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3) #define Py_TPFLAGS_HAVE_NEWBUFFER 0 #endif #if PY_VERSION_HEX > 0x03030000 && defined(PyUnicode_GET_LENGTH) #define CYTHON_PEP393_ENABLED 1 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_LENGTH(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) PyUnicode_READ_CHAR(u, i) #else #define CYTHON_PEP393_ENABLED 0 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_SIZE(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) ((Py_UCS4)(PyUnicode_AS_UNICODE(u)[i])) #endif #if PY_MAJOR_VERSION >= 3 #define PyBaseString_Type PyUnicode_Type #define PyStringObject PyUnicodeObject #define PyString_Type PyUnicode_Type #define PyString_Check PyUnicode_Check #define PyString_CheckExact PyUnicode_CheckExact #endif #if PY_VERSION_HEX < 0x02060000 #define PyBytesObject PyStringObject #define PyBytes_Type PyString_Type #define 
PyBytes_Check PyString_Check #define PyBytes_CheckExact PyString_CheckExact #define PyBytes_FromString PyString_FromString #define PyBytes_FromStringAndSize PyString_FromStringAndSize #define PyBytes_FromFormat PyString_FromFormat #define PyBytes_DecodeEscape PyString_DecodeEscape #define PyBytes_AsString PyString_AsString #define PyBytes_AsStringAndSize PyString_AsStringAndSize #define PyBytes_Size PyString_Size #define PyBytes_AS_STRING PyString_AS_STRING #define PyBytes_GET_SIZE PyString_GET_SIZE #define PyBytes_Repr PyString_Repr #define PyBytes_Concat PyString_Concat #define PyBytes_ConcatAndDel PyString_ConcatAndDel #endif #if PY_VERSION_HEX < 0x02060000 #define PySet_Check(obj) PyObject_TypeCheck(obj, &PySet_Type) #define PyFrozenSet_Check(obj) PyObject_TypeCheck(obj, &PyFrozenSet_Type) #endif #ifndef PySet_CheckExact #define PySet_CheckExact(obj) (Py_TYPE(obj) == &PySet_Type) #endif #define __Pyx_TypeCheck(obj, type) PyObject_TypeCheck(obj, (PyTypeObject *)type) #if PY_MAJOR_VERSION >= 3 #define PyIntObject PyLongObject #define PyInt_Type PyLong_Type #define PyInt_Check(op) PyLong_Check(op) #define PyInt_CheckExact(op) PyLong_CheckExact(op) #define PyInt_FromString PyLong_FromString #define PyInt_FromUnicode PyLong_FromUnicode #define PyInt_FromLong PyLong_FromLong #define PyInt_FromSize_t PyLong_FromSize_t #define PyInt_FromSsize_t PyLong_FromSsize_t #define PyInt_AsLong PyLong_AsLong #define PyInt_AS_LONG PyLong_AS_LONG #define PyInt_AsSsize_t PyLong_AsSsize_t #define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask #endif #if PY_MAJOR_VERSION >= 3 #define PyBoolObject PyLongObject #endif #if PY_VERSION_HEX < 0x03020000 typedef long Py_hash_t; #define __Pyx_PyInt_FromHash_t PyInt_FromLong #define __Pyx_PyInt_AsHash_t PyInt_AsLong #else #define __Pyx_PyInt_FromHash_t PyInt_FromSsize_t #define __Pyx_PyInt_AsHash_t PyInt_AsSsize_t #endif #if (PY_MAJOR_VERSION < 3) || (PY_VERSION_HEX >= 
0x03010300) #define __Pyx_PySequence_GetSlice(obj, a, b) PySequence_GetSlice(obj, a, b) #define __Pyx_PySequence_SetSlice(obj, a, b, value) PySequence_SetSlice(obj, a, b, value) #define __Pyx_PySequence_DelSlice(obj, a, b) PySequence_DelSlice(obj, a, b) #else #define __Pyx_PySequence_GetSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), (PyObject*)0) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_GetSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object is unsliceable", (obj)->ob_type->tp_name), (PyObject*)0))) #define __Pyx_PySequence_SetSlice(obj, a, b, value) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_SetSlice(obj, a, b, value)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice assignment", (obj)->ob_type->tp_name), -1))) #define __Pyx_PySequence_DelSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_DelSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice deletion", (obj)->ob_type->tp_name), -1))) #endif #if PY_MAJOR_VERSION >= 3 #define PyMethod_New(func, self, klass) ((self) ? 
PyMethod_New(func, self) : PyInstanceMethod_New(func)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),((char *)(n))) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),((char *)(n)),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),((char *)(n))) #else #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),(n)) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),(n),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),(n)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_NAMESTR(n) ((char *)(n)) #define __Pyx_DOCSTR(n) ((char *)(n)) #else #define __Pyx_NAMESTR(n) (n) #define __Pyx_DOCSTR(n) (n) #endif #if PY_MAJOR_VERSION >= 3 #define __Pyx_PyNumber_Divide(x,y) PyNumber_TrueDivide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceTrueDivide(x,y) #else #define __Pyx_PyNumber_Divide(x,y) PyNumber_Divide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceDivide(x,y) #endif #ifndef __PYX_EXTERN_C #ifdef __cplusplus #define __PYX_EXTERN_C extern "C" #else #define __PYX_EXTERN_C extern #endif #endif #if defined(WIN32) || defined(MS_WINDOWS) #define _USE_MATH_DEFINES #endif #include #define __PYX_HAVE__cogent__maths___period #define __PYX_HAVE_API__cogent__maths___period #include "stdio.h" #include "stdlib.h" #include "numpy/arrayobject.h" #include "numpy/ufuncobject.h" #ifdef _OPENMP #include #endif /* _OPENMP */ #ifdef PYREX_WITHOUT_ASSERTIONS #define CYTHON_WITHOUT_ASSERTIONS #endif /* inline attribute */ #ifndef CYTHON_INLINE #if defined(__GNUC__) #define CYTHON_INLINE __inline__ #elif defined(_MSC_VER) #define CYTHON_INLINE __inline #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L #define CYTHON_INLINE inline #else #define CYTHON_INLINE #endif #endif /* unused attribute */ #ifndef CYTHON_UNUSED # if defined(__GNUC__) # if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) # define 
CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif # elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif #endif typedef struct {PyObject **p; char *s; const long n; const char* encoding; const char is_unicode; const char is_str; const char intern; } __Pyx_StringTabEntry; /*proto*/ /* Type Conversion Predeclarations */ #define __Pyx_PyBytes_FromUString(s) PyBytes_FromString((char*)s) #define __Pyx_PyBytes_AsUString(s) ((unsigned char*) PyBytes_AsString(s)) #define __Pyx_Owned_Py_None(b) (Py_INCREF(Py_None), Py_None) #define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False)) static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject*); static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x); static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*); static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t); static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject*); #define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x)) #define __pyx_PyFloat_AsFloat(x) ((float) __pyx_PyFloat_AsDouble(x)) #ifdef __GNUC__ /* Test for GCC > 2.95 */ #if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)) #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0) #else /* __GNUC__ > 2 ... */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ > 2 ... 
*/ #else /* __GNUC__ */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ */ static PyObject *__pyx_m; static PyObject *__pyx_b; static PyObject *__pyx_empty_tuple; static PyObject *__pyx_empty_bytes; static int __pyx_lineno; static int __pyx_clineno = 0; static const char * __pyx_cfilenm= __FILE__; static const char *__pyx_filename; #if !defined(CYTHON_CCOMPLEX) #if defined(__cplusplus) #define CYTHON_CCOMPLEX 1 #elif defined(_Complex_I) #define CYTHON_CCOMPLEX 1 #else #define CYTHON_CCOMPLEX 0 #endif #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus #include #else #include #endif #endif #if CYTHON_CCOMPLEX && !defined(__cplusplus) && defined(__sun__) && defined(__GNUC__) #undef _Complex_I #define _Complex_I 1.0fj #endif static const char *__pyx_f[] = { "_period.pyx", "numpy.pxd", }; #define IS_UNSIGNED(type) (((type) -1) > 0) struct __Pyx_StructField_; #define __PYX_BUF_FLAGS_PACKED_STRUCT (1 << 0) typedef struct { const char* name; /* for error messages only */ struct __Pyx_StructField_* fields; size_t size; /* sizeof(type) */ size_t arraysize[8]; /* length of array in each dimension */ int ndim; char typegroup; /* _R_eal, _C_omplex, Signed _I_nt, _U_nsigned int, _S_truct, _P_ointer, _O_bject */ char is_unsigned; int flags; } __Pyx_TypeInfo; typedef struct __Pyx_StructField_ { __Pyx_TypeInfo* type; const char* name; size_t offset; } __Pyx_StructField; typedef struct { __Pyx_StructField* field; size_t parent_offset; } __Pyx_BufFmt_StackElem; typedef struct { __Pyx_StructField root; __Pyx_BufFmt_StackElem* head; size_t fmt_offset; size_t new_count, enc_count; size_t struct_alignment; int is_complex; char enc_type; char new_packmode; char enc_packmode; char is_valid_array; } __Pyx_BufFmt_Context; /* "numpy.pxd":722 * # in Cython to enable them only on the right systems. 
* * ctypedef npy_int8 int8_t # <<<<<<<<<<<<<< * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t */ typedef npy_int8 __pyx_t_5numpy_int8_t; /* "numpy.pxd":723 * * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t # <<<<<<<<<<<<<< * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t */ typedef npy_int16 __pyx_t_5numpy_int16_t; /* "numpy.pxd":724 * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t # <<<<<<<<<<<<<< * ctypedef npy_int64 int64_t * #ctypedef npy_int96 int96_t */ typedef npy_int32 __pyx_t_5numpy_int32_t; /* "numpy.pxd":725 * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t # <<<<<<<<<<<<<< * #ctypedef npy_int96 int96_t * #ctypedef npy_int128 int128_t */ typedef npy_int64 __pyx_t_5numpy_int64_t; /* "numpy.pxd":729 * #ctypedef npy_int128 int128_t * * ctypedef npy_uint8 uint8_t # <<<<<<<<<<<<<< * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t */ typedef npy_uint8 __pyx_t_5numpy_uint8_t; /* "numpy.pxd":730 * * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t # <<<<<<<<<<<<<< * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t */ typedef npy_uint16 __pyx_t_5numpy_uint16_t; /* "numpy.pxd":731 * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t # <<<<<<<<<<<<<< * ctypedef npy_uint64 uint64_t * #ctypedef npy_uint96 uint96_t */ typedef npy_uint32 __pyx_t_5numpy_uint32_t; /* "numpy.pxd":732 * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t # <<<<<<<<<<<<<< * #ctypedef npy_uint96 uint96_t * #ctypedef npy_uint128 uint128_t */ typedef npy_uint64 __pyx_t_5numpy_uint64_t; /* "numpy.pxd":736 * #ctypedef npy_uint128 uint128_t * * ctypedef npy_float32 float32_t # <<<<<<<<<<<<<< * ctypedef npy_float64 float64_t * #ctypedef npy_float80 float80_t */ typedef npy_float32 __pyx_t_5numpy_float32_t; /* "numpy.pxd":737 * * ctypedef npy_float32 float32_t * ctypedef npy_float64 float64_t 
# <<<<<<<<<<<<<< * #ctypedef npy_float80 float80_t * #ctypedef npy_float128 float128_t */ typedef npy_float64 __pyx_t_5numpy_float64_t; /* "numpy.pxd":746 * # The int types are mapped a bit surprising -- * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t # <<<<<<<<<<<<<< * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t */ typedef npy_long __pyx_t_5numpy_int_t; /* "numpy.pxd":747 * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t * ctypedef npy_longlong long_t # <<<<<<<<<<<<<< * ctypedef npy_longlong longlong_t * */ typedef npy_longlong __pyx_t_5numpy_long_t; /* "numpy.pxd":748 * ctypedef npy_long int_t * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t # <<<<<<<<<<<<<< * * ctypedef npy_ulong uint_t */ typedef npy_longlong __pyx_t_5numpy_longlong_t; /* "numpy.pxd":750 * ctypedef npy_longlong longlong_t * * ctypedef npy_ulong uint_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t */ typedef npy_ulong __pyx_t_5numpy_uint_t; /* "numpy.pxd":751 * * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulonglong_t * */ typedef npy_ulonglong __pyx_t_5numpy_ulong_t; /* "numpy.pxd":752 * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t # <<<<<<<<<<<<<< * * ctypedef npy_intp intp_t */ typedef npy_ulonglong __pyx_t_5numpy_ulonglong_t; /* "numpy.pxd":754 * ctypedef npy_ulonglong ulonglong_t * * ctypedef npy_intp intp_t # <<<<<<<<<<<<<< * ctypedef npy_uintp uintp_t * */ typedef npy_intp __pyx_t_5numpy_intp_t; /* "numpy.pxd":755 * * ctypedef npy_intp intp_t * ctypedef npy_uintp uintp_t # <<<<<<<<<<<<<< * * ctypedef npy_double float_t */ typedef npy_uintp __pyx_t_5numpy_uintp_t; /* "numpy.pxd":757 * ctypedef npy_uintp uintp_t * * ctypedef npy_double float_t # <<<<<<<<<<<<<< * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t */ typedef 
npy_double __pyx_t_5numpy_float_t; /* "numpy.pxd":758 * * ctypedef npy_double float_t * ctypedef npy_double double_t # <<<<<<<<<<<<<< * ctypedef npy_longdouble longdouble_t * */ typedef npy_double __pyx_t_5numpy_double_t; /* "numpy.pxd":759 * ctypedef npy_double float_t * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cfloat cfloat_t */ typedef npy_longdouble __pyx_t_5numpy_longdouble_t; #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< double > __pyx_t_double_complex; #else typedef double _Complex __pyx_t_double_complex; #endif #else typedef struct { double real, imag; } __pyx_t_double_complex; #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< float > __pyx_t_float_complex; #else typedef float _Complex __pyx_t_float_complex; #endif #else typedef struct { float real, imag; } __pyx_t_float_complex; #endif /*--- Type declarations ---*/ /* "numpy.pxd":761 * ctypedef npy_longdouble longdouble_t * * ctypedef npy_cfloat cfloat_t # <<<<<<<<<<<<<< * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t */ typedef npy_cfloat __pyx_t_5numpy_cfloat_t; /* "numpy.pxd":762 * * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t # <<<<<<<<<<<<<< * ctypedef npy_clongdouble clongdouble_t * */ typedef npy_cdouble __pyx_t_5numpy_cdouble_t; /* "numpy.pxd":763 * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cdouble complex_t */ typedef npy_clongdouble __pyx_t_5numpy_clongdouble_t; /* "numpy.pxd":765 * ctypedef npy_clongdouble clongdouble_t * * ctypedef npy_cdouble complex_t # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew1(a): */ typedef npy_cdouble __pyx_t_5numpy_complex_t; #ifndef CYTHON_REFNANNY #define CYTHON_REFNANNY 0 #endif #if CYTHON_REFNANNY typedef struct { void (*INCREF)(void*, PyObject*, int); void (*DECREF)(void*, PyObject*, int); void (*GOTREF)(void*, 
PyObject*, int); void (*GIVEREF)(void*, PyObject*, int); void* (*SetupContext)(const char*, int, const char*); void (*FinishContext)(void**); } __Pyx_RefNannyAPIStruct; static __Pyx_RefNannyAPIStruct *__Pyx_RefNanny = NULL; static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname); /*proto*/ #define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL; #ifdef WITH_THREAD #define __Pyx_RefNannySetupContext(name, acquire_gil) \ if (acquire_gil) { \ PyGILState_STATE __pyx_gilstate_save = PyGILState_Ensure(); \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ PyGILState_Release(__pyx_gilstate_save); \ } else { \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ } #else #define __Pyx_RefNannySetupContext(name, acquire_gil) \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__) #endif #define __Pyx_RefNannyFinishContext() \ __Pyx_RefNanny->FinishContext(&__pyx_refnanny) #define __Pyx_INCREF(r) __Pyx_RefNanny->INCREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_DECREF(r) __Pyx_RefNanny->DECREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GOTREF(r) __Pyx_RefNanny->GOTREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GIVEREF(r) __Pyx_RefNanny->GIVEREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_XINCREF(r) do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0) #define __Pyx_XDECREF(r) do { if((r) != NULL) {__Pyx_DECREF(r); }} while(0) #define __Pyx_XGOTREF(r) do { if((r) != NULL) {__Pyx_GOTREF(r); }} while(0) #define __Pyx_XGIVEREF(r) do { if((r) != NULL) {__Pyx_GIVEREF(r);}} while(0) #else #define __Pyx_RefNannyDeclarations #define __Pyx_RefNannySetupContext(name, acquire_gil) #define __Pyx_RefNannyFinishContext() #define __Pyx_INCREF(r) Py_INCREF(r) #define __Pyx_DECREF(r) Py_DECREF(r) #define __Pyx_GOTREF(r) #define __Pyx_GIVEREF(r) #define __Pyx_XINCREF(r) Py_XINCREF(r) #define __Pyx_XDECREF(r) Py_XDECREF(r) #define __Pyx_XGOTREF(r) 
#define __Pyx_XGIVEREF(r) #endif /* CYTHON_REFNANNY */ #define __Pyx_CLEAR(r) do { PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);} while(0) #define __Pyx_XCLEAR(r) do { if((r) != NULL) {PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);}} while(0) static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name); /*proto*/ static void __Pyx_RaiseArgtupleInvalid(const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found); /*proto*/ static void __Pyx_RaiseDoubleKeywordsError(const char* func_name, PyObject* kw_name); /*proto*/ static int __Pyx_ParseOptionalKeywords(PyObject *kwds, PyObject **argnames[], \ PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, \ const char* function_name); /*proto*/ static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact); /*proto*/ static CYTHON_INLINE int __Pyx_GetBufferAndValidate(Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack); static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info); static void __Pyx_RaiseBufferIndexError(int axis); /*proto*/ #define __Pyx_BufPtrStrided1d(type, buf, i0, s0) (type)((char*)buf + i0 * s0) static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb); /*proto*/ static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb); /*proto*/ #include static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Generic(PyObject *o, PyObject* j) { PyObject *r; if (!j) return NULL; r = PyObject_GetItem(o, j); Py_DECREF(j); return r; } #define __Pyx_GetItemInt_List(o, i, size, to_py_func) (((size) <= sizeof(Py_ssize_t)) ? 
\ __Pyx_GetItemInt_List_Fast(o, i) : \ __Pyx_GetItemInt_Generic(o, to_py_func(i))) static CYTHON_INLINE PyObject *__Pyx_GetItemInt_List_Fast(PyObject *o, Py_ssize_t i) { if (likely(o != Py_None)) { if (likely((0 <= i) & (i < PyList_GET_SIZE(o)))) { PyObject *r = PyList_GET_ITEM(o, i); Py_INCREF(r); return r; } else if ((-PyList_GET_SIZE(o) <= i) & (i < 0)) { PyObject *r = PyList_GET_ITEM(o, PyList_GET_SIZE(o) + i); Py_INCREF(r); return r; } } return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i)); } #define __Pyx_GetItemInt_Tuple(o, i, size, to_py_func) (((size) <= sizeof(Py_ssize_t)) ? \ __Pyx_GetItemInt_Tuple_Fast(o, i) : \ __Pyx_GetItemInt_Generic(o, to_py_func(i))) static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Tuple_Fast(PyObject *o, Py_ssize_t i) { if (likely(o != Py_None)) { if (likely((0 <= i) & (i < PyTuple_GET_SIZE(o)))) { PyObject *r = PyTuple_GET_ITEM(o, i); Py_INCREF(r); return r; } else if ((-PyTuple_GET_SIZE(o) <= i) & (i < 0)) { PyObject *r = PyTuple_GET_ITEM(o, PyTuple_GET_SIZE(o) + i); Py_INCREF(r); return r; } } return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i)); } #define __Pyx_GetItemInt(o, i, size, to_py_func) (((size) <= sizeof(Py_ssize_t)) ? \ __Pyx_GetItemInt_Fast(o, i) : \ __Pyx_GetItemInt_Generic(o, to_py_func(i))) static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Fast(PyObject *o, Py_ssize_t i) { if (PyList_CheckExact(o)) { Py_ssize_t n = (likely(i >= 0)) ? i : i + PyList_GET_SIZE(o); if (likely((n >= 0) & (n < PyList_GET_SIZE(o)))) { PyObject *r = PyList_GET_ITEM(o, n); Py_INCREF(r); return r; } } else if (PyTuple_CheckExact(o)) { Py_ssize_t n = (likely(i >= 0)) ? 
i : i + PyTuple_GET_SIZE(o); if (likely((n >= 0) & (n < PyTuple_GET_SIZE(o)))) { PyObject *r = PyTuple_GET_ITEM(o, n); Py_INCREF(r); return r; } } else if (likely(i >= 0)) { PySequenceMethods *m = Py_TYPE(o)->tp_as_sequence; if (likely(m && m->sq_item)) { return m->sq_item(o, i); } } return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i)); } static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause); /*proto*/ static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index); static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected); static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void); static void __Pyx_UnpackTupleError(PyObject *, Py_ssize_t index); /*proto*/ static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type); /*proto*/ typedef struct { Py_ssize_t shape, strides, suboffsets; } __Pyx_Buf_DimInfo; typedef struct { size_t refcount; Py_buffer pybuffer; } __Pyx_Buffer; typedef struct { __Pyx_Buffer *rcbuffer; char *data; __Pyx_Buf_DimInfo diminfo[8]; } __Pyx_LocalBuf_ND; #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags); static void __Pyx_ReleaseBuffer(Py_buffer *view); #else #define __Pyx_GetBuffer PyObject_GetBuffer #define __Pyx_ReleaseBuffer PyBuffer_Release #endif static Py_ssize_t __Pyx_zeros[] = {0, 0, 0, 0, 0, 0, 0, 0}; static Py_ssize_t __Pyx_minusones[] = {-1, -1, -1, -1, -1, -1, -1, -1}; static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, long level); /*proto*/ static CYTHON_INLINE void __Pyx_RaiseImportError(PyObject *name); #if CYTHON_CCOMPLEX #ifdef __cplusplus #define __Pyx_CREAL(z) ((z).real()) #define __Pyx_CIMAG(z) ((z).imag()) #else #define __Pyx_CREAL(z) (__real__(z)) #define __Pyx_CIMAG(z) (__imag__(z)) #endif #else #define __Pyx_CREAL(z) ((z).real) #define __Pyx_CIMAG(z) ((z).imag) #endif #if defined(_WIN32) && defined(__cplusplus) && CYTHON_CCOMPLEX #define __Pyx_SET_CREAL(z,x) ((z).real(x)) 
#define __Pyx_SET_CIMAG(z,y) ((z).imag(y)) #else #define __Pyx_SET_CREAL(z,x) __Pyx_CREAL(z) = (x) #define __Pyx_SET_CIMAG(z,y) __Pyx_CIMAG(z) = (y) #endif static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double, double); #if CYTHON_CCOMPLEX #define __Pyx_c_eq(a, b) ((a)==(b)) #define __Pyx_c_sum(a, b) ((a)+(b)) #define __Pyx_c_diff(a, b) ((a)-(b)) #define __Pyx_c_prod(a, b) ((a)*(b)) #define __Pyx_c_quot(a, b) ((a)/(b)) #define __Pyx_c_neg(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zero(z) ((z)==(double)0) #define __Pyx_c_conj(z) (::std::conj(z)) #if 1 #define __Pyx_c_abs(z) (::std::abs(z)) #define __Pyx_c_pow(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zero(z) ((z)==0) #define __Pyx_c_conj(z) (conj(z)) #if 1 #define __Pyx_c_abs(z) (cabs(z)) #define __Pyx_c_pow(a, b) (cpow(a, b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex); static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex); #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex, __pyx_t_double_complex); #endif #endif static CYTHON_INLINE int __Pyx_PyBytes_Equals(PyObject* s1, PyObject* s2, int equals); /*proto*/ static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float, float); #if CYTHON_CCOMPLEX 
#define __Pyx_c_eqf(a, b) ((a)==(b)) #define __Pyx_c_sumf(a, b) ((a)+(b)) #define __Pyx_c_difff(a, b) ((a)-(b)) #define __Pyx_c_prodf(a, b) ((a)*(b)) #define __Pyx_c_quotf(a, b) ((a)/(b)) #define __Pyx_c_negf(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zerof(z) ((z)==(float)0) #define __Pyx_c_conjf(z) (::std::conj(z)) #if 1 #define __Pyx_c_absf(z) (::std::abs(z)) #define __Pyx_c_powf(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zerof(z) ((z)==0) #define __Pyx_c_conjf(z) (conjf(z)) #if 1 #define __Pyx_c_absf(z) (cabsf(z)) #define __Pyx_c_powf(a, b) (cpowf(a, b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex); static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex); #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex, __pyx_t_float_complex); #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject *); static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject *); static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject *); static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject *); static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject *); static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject *); static 
CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject *); static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject *); static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject *); static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject *); static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject *); static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject *); static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject *); static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject *); static int __Pyx_check_binary_version(void); #if !defined(__Pyx_PyIdentifier_FromString) #if PY_MAJOR_VERSION < 3 #define __Pyx_PyIdentifier_FromString(s) PyString_FromString(s) #else #define __Pyx_PyIdentifier_FromString(s) PyUnicode_FromString(s) #endif #endif static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict); /*proto*/ static PyObject *__Pyx_ImportModule(const char *name); /*proto*/ typedef struct { int code_line; PyCodeObject* code_object; } __Pyx_CodeObjectCacheEntry; struct __Pyx_CodeObjectCache { int count; int max_count; __Pyx_CodeObjectCacheEntry* entries; }; static struct __Pyx_CodeObjectCache __pyx_code_cache = {0,0,NULL}; static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line); static PyCodeObject *__pyx_find_code_object(int code_line); static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object); static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename); /*proto*/ static int __Pyx_InitStrings(__Pyx_StringTabEntry *t); /*proto*/ /* Module declarations from 'cpython.buffer' */ /* Module declarations from 'cpython.ref' */ /* Module declarations from 'libc.stdio' */ /* Module declarations from 'cpython.object' */ /* Module declarations from 'libc.stdlib' */ /* 
Module declarations from 'numpy' */ /* Module declarations from 'numpy' */ static PyTypeObject *__pyx_ptype_5numpy_dtype = 0; static PyTypeObject *__pyx_ptype_5numpy_flatiter = 0; static PyTypeObject *__pyx_ptype_5numpy_broadcast = 0; static PyTypeObject *__pyx_ptype_5numpy_ndarray = 0; static PyTypeObject *__pyx_ptype_5numpy_ufunc = 0; static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *, char *, char *, int *); /*proto*/ /* Module declarations from 'cogent.maths._period' */ static __Pyx_TypeInfo __Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t = { "float64_t", NULL, sizeof(__pyx_t_5numpy_float64_t), { 0 }, 0, 'R', 0, 0 }; static __Pyx_TypeInfo __Pyx_TypeInfo___pyx_t_double_complex = { "double complex", NULL, sizeof(__pyx_t_double_complex), { 0 }, 0, 'C', 0, 0 }; static __Pyx_TypeInfo __Pyx_TypeInfo_nn___pyx_t_5numpy_uint8_t = { "uint8_t", NULL, sizeof(__pyx_t_5numpy_uint8_t), { 0 }, 0, 'U', IS_UNSIGNED(__pyx_t_5numpy_uint8_t), 0 }; #define __Pyx_MODULE_NAME "cogent.maths._period" int __pyx_module_is_main_cogent__maths___period = 0; /* Implementation of 'cogent.maths._period' */ static PyObject *__pyx_builtin_range; static PyObject *__pyx_builtin_ValueError; static PyObject *__pyx_builtin_RuntimeError; static PyObject *__pyx_pf_6cogent_5maths_7_period_goertzel_inner(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_x, int __pyx_v_N, int __pyx_v_period); /* proto */ static PyObject *__pyx_pf_6cogent_5maths_7_period_2ipdft_inner(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_x, PyArrayObject *__pyx_v_X, PyArrayObject *__pyx_v_W, int __pyx_v_ulim, int __pyx_v_N); /* proto */ static PyObject *__pyx_pf_6cogent_5maths_7_period_4autocorr_inner(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_x, PyArrayObject *__pyx_v_xc, int __pyx_v_N); /* proto */ static PyObject *__pyx_pf_6cogent_5maths_7_period_6seq_to_symbols(CYTHON_UNUSED PyObject *__pyx_self, char *__pyx_v_seq, PyObject *__pyx_v_motifs, int __pyx_v_motif_length, 
PyArrayObject *__pyx_v_result); /* proto */ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /* proto */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info); /* proto */ static char __pyx_k_1[] = "ndarray is not C contiguous"; static char __pyx_k_3[] = "ndarray is not Fortran contiguous"; static char __pyx_k_5[] = "Non-native byte order not supported"; static char __pyx_k_7[] = "unknown dtype code in numpy.pxd (%d)"; static char __pyx_k_8[] = "Format string allocated too short, see comment in numpy.pxd"; static char __pyx_k_11[] = "Format string allocated too short."; static char __pyx_k_15[] = "/Users/gavin/DevRepos/PyCogent-hg/cogent/maths/_period.pyx"; static char __pyx_k_16[] = "cogent.maths._period"; static char __pyx_k__B[] = "B"; static char __pyx_k__H[] = "H"; static char __pyx_k__I[] = "I"; static char __pyx_k__L[] = "L"; static char __pyx_k__N[] = "N"; static char __pyx_k__O[] = "O"; static char __pyx_k__Q[] = "Q"; static char __pyx_k__W[] = "W"; static char __pyx_k__X[] = "X"; static char __pyx_k__b[] = "b"; static char __pyx_k__d[] = "d"; static char __pyx_k__f[] = "f"; static char __pyx_k__g[] = "g"; static char __pyx_k__h[] = "h"; static char __pyx_k__i[] = "i"; static char __pyx_k__j[] = "j"; static char __pyx_k__l[] = "l"; static char __pyx_k__m[] = "m"; static char __pyx_k__n[] = "n"; static char __pyx_k__p[] = "p"; static char __pyx_k__q[] = "q"; static char __pyx_k__s[] = "s"; static char __pyx_k__w[] = "w"; static char __pyx_k__x[] = "x"; static char __pyx_k__Zd[] = "Zd"; static char __pyx_k__Zf[] = "Zf"; static char __pyx_k__Zg[] = "Zg"; static char __pyx_k__pi[] = "pi"; static char __pyx_k__xc[] = "xc"; static char __pyx_k__cos[] = "cos"; static char __pyx_k__exp[] = "exp"; static char __pyx_k__got[] = "got"; static char __pyx_k__seq[] = "seq"; static char __pyx_k__sqrt[] = "sqrt"; static char __pyx_k__ulim[] = 
"ulim"; static char __pyx_k__coeff[] = "coeff"; static char __pyx_k__numpy[] = "numpy"; static char __pyx_k__power[] = "power"; static char __pyx_k__range[] = "range"; static char __pyx_k__motifs[] = "motifs"; static char __pyx_k__period[] = "period"; static char __pyx_k__result[] = "result"; static char __pyx_k__s_prev[] = "s_prev"; static char __pyx_k__s_prev2[] = "s_prev2"; static char __pyx_k____main__[] = "__main__"; static char __pyx_k____test__[] = "__test__"; static char __pyx_k__ValueError[] = "ValueError"; static char __pyx_k__num_motifs[] = "num_motifs"; static char __pyx_k__ipdft_inner[] = "ipdft_inner"; static char __pyx_k__RuntimeError[] = "RuntimeError"; static char __pyx_k__motif_length[] = "motif_length"; static char __pyx_k__autocorr_inner[] = "autocorr_inner"; static char __pyx_k__goertzel_inner[] = "goertzel_inner"; static char __pyx_k__seq_to_symbols[] = "seq_to_symbols"; static PyObject *__pyx_kp_u_1; static PyObject *__pyx_kp_u_11; static PyObject *__pyx_kp_s_15; static PyObject *__pyx_n_s_16; static PyObject *__pyx_kp_u_3; static PyObject *__pyx_kp_u_5; static PyObject *__pyx_kp_u_7; static PyObject *__pyx_kp_u_8; static PyObject *__pyx_n_s__N; static PyObject *__pyx_n_s__RuntimeError; static PyObject *__pyx_n_s__ValueError; static PyObject *__pyx_n_s__W; static PyObject *__pyx_n_s__X; static PyObject *__pyx_n_s____main__; static PyObject *__pyx_n_s____test__; static PyObject *__pyx_n_s__autocorr_inner; static PyObject *__pyx_n_s__coeff; static PyObject *__pyx_n_s__cos; static PyObject *__pyx_n_s__exp; static PyObject *__pyx_n_s__goertzel_inner; static PyObject *__pyx_n_s__got; static PyObject *__pyx_n_s__i; static PyObject *__pyx_n_s__ipdft_inner; static PyObject *__pyx_n_s__j; static PyObject *__pyx_n_s__m; static PyObject *__pyx_n_s__motif_length; static PyObject *__pyx_n_s__motifs; static PyObject *__pyx_n_s__n; static PyObject *__pyx_n_s__num_motifs; static PyObject *__pyx_n_s__numpy; static PyObject *__pyx_n_s__p; static PyObject 
*__pyx_n_s__period; static PyObject *__pyx_n_s__pi; static PyObject *__pyx_n_s__power; static PyObject *__pyx_n_s__range; static PyObject *__pyx_n_s__result; static PyObject *__pyx_n_s__s; static PyObject *__pyx_n_s__s_prev; static PyObject *__pyx_n_s__s_prev2; static PyObject *__pyx_n_s__seq; static PyObject *__pyx_n_s__seq_to_symbols; static PyObject *__pyx_n_s__sqrt; static PyObject *__pyx_n_s__ulim; static PyObject *__pyx_n_s__w; static PyObject *__pyx_n_s__x; static PyObject *__pyx_n_s__xc; static PyObject *__pyx_int_2; static PyObject *__pyx_int_15; static PyObject *__pyx_k_tuple_2; static PyObject *__pyx_k_tuple_4; static PyObject *__pyx_k_tuple_6; static PyObject *__pyx_k_tuple_9; static PyObject *__pyx_k_tuple_10; static PyObject *__pyx_k_tuple_12; static PyObject *__pyx_k_tuple_13; static PyObject *__pyx_k_tuple_17; static PyObject *__pyx_k_tuple_19; static PyObject *__pyx_k_tuple_21; static PyObject *__pyx_k_codeobj_14; static PyObject *__pyx_k_codeobj_18; static PyObject *__pyx_k_codeobj_20; static PyObject *__pyx_k_codeobj_22; /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7_period_1goertzel_inner(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static char __pyx_doc_6cogent_5maths_7_period_goertzel_inner[] = "returns the power from series x for period"; static PyMethodDef __pyx_mdef_6cogent_5maths_7_period_1goertzel_inner = {__Pyx_NAMESTR("goertzel_inner"), (PyCFunction)__pyx_pw_6cogent_5maths_7_period_1goertzel_inner, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(__pyx_doc_6cogent_5maths_7_period_goertzel_inner)}; static PyObject *__pyx_pw_6cogent_5maths_7_period_1goertzel_inner(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_x = 0; int __pyx_v_N; int __pyx_v_period; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__x,&__pyx_n_s__N,&__pyx_n_s__period,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("goertzel_inner (wrapper)", 0); 
__pyx_self = __pyx_self; { PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__x); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__N); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("goertzel_inner", 1, 3, 3, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__period); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("goertzel_inner", 1, 3, 3, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "goertzel_inner") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_x = ((PyArrayObject *)values[0]); __pyx_v_N = __Pyx_PyInt_AsInt(values[1]); if (unlikely((__pyx_v_N == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_period = __Pyx_PyInt_AsInt(values[2]); if (unlikely((__pyx_v_period == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; 
goto __pyx_L3_error;} } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("goertzel_inner", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.maths._period.goertzel_inner", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_x), __pyx_ptype_5numpy_ndarray, 1, "x", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_5maths_7_period_goertzel_inner(__pyx_self, __pyx_v_x, __pyx_v_N, __pyx_v_period); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/_period.pyx":6 * # TODO intro the version idea of peter's see email from him on Wednesday, 26 May 2010 * * def goertzel_inner(np.ndarray[np.float64_t, ndim=1] x, int N, int period): # <<<<<<<<<<<<<< * """returns the power from series x for period""" * cdef int n */ static PyObject *__pyx_pf_6cogent_5maths_7_period_goertzel_inner(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_x, int __pyx_v_N, int __pyx_v_period) { int __pyx_v_n; __pyx_t_5numpy_float64_t __pyx_v_coeff; __pyx_t_5numpy_float64_t __pyx_v_s; __pyx_t_5numpy_float64_t __pyx_v_s_prev; __pyx_t_5numpy_float64_t __pyx_v_s_prev2; __pyx_t_5numpy_float64_t __pyx_v_power; __Pyx_LocalBuf_ND __pyx_pybuffernd_x; __Pyx_Buffer __pyx_pybuffer_x; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; PyObject *__pyx_t_2 = NULL; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; PyObject *__pyx_t_5 = NULL; __pyx_t_5numpy_float64_t __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; int __pyx_t_10; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; 
__Pyx_RefNannySetupContext("goertzel_inner", 0); __pyx_pybuffer_x.pybuffer.buf = NULL; __pyx_pybuffer_x.refcount = 0; __pyx_pybuffernd_x.data = NULL; __pyx_pybuffernd_x.rcbuffer = &__pyx_pybuffer_x; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_x.rcbuffer->pybuffer, (PyObject*)__pyx_v_x, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_x.diminfo[0].strides = __pyx_pybuffernd_x.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_x.diminfo[0].shape = __pyx_pybuffernd_x.rcbuffer->pybuffer.shape[0]; /* "cogent/maths/_period.pyx":11 * cdef np.float64_t coeff, s, s_prev, s_prev2, power * * coeff = 2.0 * cos(2 * pi / period) # <<<<<<<<<<<<<< * s_prev = 0.0 * s_prev2 = 0.0 */ __pyx_t_1 = PyFloat_FromDouble(2.0); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = __Pyx_GetName(__pyx_m, __pyx_n_s__cos); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_3 = __Pyx_GetName(__pyx_m, __pyx_n_s__pi); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = PyNumber_Multiply(__pyx_int_2, __pyx_t_3); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyInt_FromLong(__pyx_v_period); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = __Pyx_PyNumber_Divide(__pyx_t_4, __pyx_t_3); if (unlikely(!__pyx_t_5)) 
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_t_2, ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __pyx_t_3 = PyNumber_Multiply(__pyx_t_1, __pyx_t_5); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __pyx_PyFloat_AsDouble(__pyx_t_3); if (unlikely((__pyx_t_6 == (npy_float64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 11; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_v_coeff = __pyx_t_6; /* "cogent/maths/_period.pyx":12 * * coeff = 2.0 * cos(2 * pi / period) * s_prev = 0.0 # <<<<<<<<<<<<<< * s_prev2 = 0.0 * for n in range(N): */ __pyx_v_s_prev = 0.0; /* "cogent/maths/_period.pyx":13 * coeff = 2.0 * cos(2 * pi / period) * s_prev = 0.0 * s_prev2 = 0.0 # <<<<<<<<<<<<<< * for n in range(N): * s = x[n] + coeff * s_prev - s_prev2 */ __pyx_v_s_prev2 = 0.0; /* "cogent/maths/_period.pyx":14 * s_prev = 0.0 * s_prev2 = 0.0 * for n in range(N): # <<<<<<<<<<<<<< * s = x[n] + coeff * s_prev - s_prev2 * s_prev2 = s_prev */ __pyx_t_7 = __pyx_v_N; for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) { __pyx_v_n = __pyx_t_8; /* "cogent/maths/_period.pyx":15 
* s_prev2 = 0.0 * for n in range(N): * s = x[n] + coeff * s_prev - s_prev2 # <<<<<<<<<<<<<< * s_prev2 = s_prev * s_prev = s */ __pyx_t_9 = __pyx_v_n; __pyx_t_10 = -1; if (__pyx_t_9 < 0) { __pyx_t_9 += __pyx_pybuffernd_x.diminfo[0].shape; if (unlikely(__pyx_t_9 < 0)) __pyx_t_10 = 0; } else if (unlikely(__pyx_t_9 >= __pyx_pybuffernd_x.diminfo[0].shape)) __pyx_t_10 = 0; if (unlikely(__pyx_t_10 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_10); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_v_s = (((*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_x.rcbuffer->pybuffer.buf, __pyx_t_9, __pyx_pybuffernd_x.diminfo[0].strides)) + (__pyx_v_coeff * __pyx_v_s_prev)) - __pyx_v_s_prev2); /* "cogent/maths/_period.pyx":16 * for n in range(N): * s = x[n] + coeff * s_prev - s_prev2 * s_prev2 = s_prev # <<<<<<<<<<<<<< * s_prev = s * */ __pyx_v_s_prev2 = __pyx_v_s_prev; /* "cogent/maths/_period.pyx":17 * s = x[n] + coeff * s_prev - s_prev2 * s_prev2 = s_prev * s_prev = s # <<<<<<<<<<<<<< * * power = sqrt(s_prev2**2 + s_prev**2 - coeff * s_prev2 * s_prev) */ __pyx_v_s_prev = __pyx_v_s; } /* "cogent/maths/_period.pyx":19 * s_prev = s * * power = sqrt(s_prev2**2 + s_prev**2 - coeff * s_prev2 * s_prev) # <<<<<<<<<<<<<< * return power * */ __pyx_t_3 = __Pyx_GetName(__pyx_m, __pyx_n_s__sqrt); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 19; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyFloat_FromDouble(((pow(__pyx_v_s_prev2, 2.0) + pow(__pyx_v_s_prev, 2.0)) - ((__pyx_v_coeff * __pyx_v_s_prev2) * __pyx_v_s_prev))); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 19; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 19; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); 
PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_t_3, ((PyObject *)__pyx_t_1), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 19; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; __pyx_t_6 = __pyx_PyFloat_AsDouble(__pyx_t_5); if (unlikely((__pyx_t_6 == (npy_float64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 19; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_v_power = __pyx_t_6; /* "cogent/maths/_period.pyx":20 * * power = sqrt(s_prev2**2 + s_prev**2 - coeff * s_prev2 * s_prev) * return power # <<<<<<<<<<<<<< * * def ipdft_inner(np.ndarray[np.float64_t, ndim=1] x, */ __Pyx_XDECREF(__pyx_r); __pyx_t_5 = PyFloat_FromDouble(__pyx_v_power); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 20; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_r = __pyx_t_5; __pyx_t_5 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_x.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.maths._period.goertzel_inner", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_x.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7_period_3ipdft_inner(PyObject *__pyx_self, PyObject *__pyx_args, PyObject 
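The generated C above implements `goertzel_inner` from `cogent/maths/_period.pyx`; the original Cython source is visible in the interleaved comments. A pure-Python sketch of the same Goertzel recurrence, handy as a readable reference for the compiled version (the name `goertzel_power` is mine, not PyCogent's):

```python
from math import cos, pi, sqrt

def goertzel_power(x, period):
    # Same recurrence as goertzel_inner: spectral power of one candidate
    # period in the signal x, computed with one multiply-add per sample.
    coeff = 2.0 * cos(2.0 * pi / period)
    s_prev = 0.0
    s_prev2 = 0.0
    for value in x:
        s = value + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return sqrt(s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev2 * s_prev)
```

For a pure cosine the power at the matching period dominates neighbouring periods, which is what makes the recurrence useful for period detection.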
*__pyx_kwds); /*proto*/ static char __pyx_doc_6cogent_5maths_7_period_2ipdft_inner[] = "use this when repeated calls for window of same length are to be\n made"; static PyMethodDef __pyx_mdef_6cogent_5maths_7_period_3ipdft_inner = {__Pyx_NAMESTR("ipdft_inner"), (PyCFunction)__pyx_pw_6cogent_5maths_7_period_3ipdft_inner, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(__pyx_doc_6cogent_5maths_7_period_2ipdft_inner)}; static PyObject *__pyx_pw_6cogent_5maths_7_period_3ipdft_inner(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_x = 0; PyArrayObject *__pyx_v_X = 0; PyArrayObject *__pyx_v_W = 0; int __pyx_v_ulim; int __pyx_v_N; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__x,&__pyx_n_s__X,&__pyx_n_s__W,&__pyx_n_s__ulim,&__pyx_n_s__N,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("ipdft_inner (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[5] = {0,0,0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4); case 4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3); case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__x); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__X); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("ipdft_inner", 1, 5, 5, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__W); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("ipdft_inner", 1, 5, 5, 2); 
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 3: values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__ulim); if (likely(values[3])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("ipdft_inner", 1, 5, 5, 3); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 4: values[4] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__N); if (likely(values[4])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("ipdft_inner", 1, 5, 5, 4); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "ipdft_inner") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 5) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); values[3] = PyTuple_GET_ITEM(__pyx_args, 3); values[4] = PyTuple_GET_ITEM(__pyx_args, 4); } __pyx_v_x = ((PyArrayObject *)values[0]); __pyx_v_X = ((PyArrayObject *)values[1]); __pyx_v_W = ((PyArrayObject *)values[2]); __pyx_v_ulim = __Pyx_PyInt_AsInt(values[3]); if (unlikely((__pyx_v_ulim == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 25; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_N = __Pyx_PyInt_AsInt(values[4]); if (unlikely((__pyx_v_N == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 25; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("ipdft_inner", 1, 5, 5, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; 
__Pyx_AddTraceback("cogent.maths._period.ipdft_inner", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_x), __pyx_ptype_5numpy_ndarray, 1, "x", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_X), __pyx_ptype_5numpy_ndarray, 1, "X", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 23; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_W), __pyx_ptype_5numpy_ndarray, 1, "W", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 24; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_5maths_7_period_2ipdft_inner(__pyx_self, __pyx_v_x, __pyx_v_X, __pyx_v_W, __pyx_v_ulim, __pyx_v_N); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/_period.pyx":22 * return power * * def ipdft_inner(np.ndarray[np.float64_t, ndim=1] x, # <<<<<<<<<<<<<< * np.ndarray[np.complex128_t, ndim=1] X, * np.ndarray[np.complex128_t, ndim=1] W, */ static PyObject *__pyx_pf_6cogent_5maths_7_period_2ipdft_inner(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_x, PyArrayObject *__pyx_v_X, PyArrayObject *__pyx_v_W, int __pyx_v_ulim, int __pyx_v_N) { int __pyx_v_n; int __pyx_v_p; __pyx_t_double_complex __pyx_v_w; __Pyx_LocalBuf_ND __pyx_pybuffernd_W; __Pyx_Buffer __pyx_pybuffer_W; __Pyx_LocalBuf_ND __pyx_pybuffernd_X; __Pyx_Buffer __pyx_pybuffer_X; __Pyx_LocalBuf_ND __pyx_pybuffernd_x; __Pyx_Buffer __pyx_pybuffer_x; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations int __pyx_t_1; int __pyx_t_2; int __pyx_t_3; int __pyx_t_4; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; int __pyx_t_10; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; 
__Pyx_RefNannySetupContext("ipdft_inner", 0); __pyx_pybuffer_x.pybuffer.buf = NULL; __pyx_pybuffer_x.refcount = 0; __pyx_pybuffernd_x.data = NULL; __pyx_pybuffernd_x.rcbuffer = &__pyx_pybuffer_x; __pyx_pybuffer_X.pybuffer.buf = NULL; __pyx_pybuffer_X.refcount = 0; __pyx_pybuffernd_X.data = NULL; __pyx_pybuffernd_X.rcbuffer = &__pyx_pybuffer_X; __pyx_pybuffer_W.pybuffer.buf = NULL; __pyx_pybuffer_W.refcount = 0; __pyx_pybuffernd_W.data = NULL; __pyx_pybuffernd_W.rcbuffer = &__pyx_pybuffer_W; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_x.rcbuffer->pybuffer, (PyObject*)__pyx_v_x, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_x.diminfo[0].strides = __pyx_pybuffernd_x.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_x.diminfo[0].shape = __pyx_pybuffernd_x.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_X.rcbuffer->pybuffer, (PyObject*)__pyx_v_X, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_X.diminfo[0].strides = __pyx_pybuffernd_X.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_X.diminfo[0].shape = __pyx_pybuffernd_X.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_W.rcbuffer->pybuffer, (PyObject*)__pyx_v_W, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_W.diminfo[0].strides = __pyx_pybuffernd_W.rcbuffer->pybuffer.strides[0]; 
__pyx_pybuffernd_W.diminfo[0].shape = __pyx_pybuffernd_W.rcbuffer->pybuffer.shape[0]; /* "cogent/maths/_period.pyx":31 * cdef np.complex128_t w * * for p in range(ulim): # <<<<<<<<<<<<<< * w = 1.0 * for n in range(N): */ __pyx_t_1 = __pyx_v_ulim; for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) { __pyx_v_p = __pyx_t_2; /* "cogent/maths/_period.pyx":32 * * for p in range(ulim): * w = 1.0 # <<<<<<<<<<<<<< * for n in range(N): * if n != 0: */ __pyx_v_w = __pyx_t_double_complex_from_parts(1.0, 0); /* "cogent/maths/_period.pyx":33 * for p in range(ulim): * w = 1.0 * for n in range(N): # <<<<<<<<<<<<<< * if n != 0: * w = w * W[p] */ __pyx_t_3 = __pyx_v_N; for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) { __pyx_v_n = __pyx_t_4; /* "cogent/maths/_period.pyx":34 * w = 1.0 * for n in range(N): * if n != 0: # <<<<<<<<<<<<<< * w = w * W[p] * X[p] = X[p] + x[n]*w */ __pyx_t_5 = (__pyx_v_n != 0); if (__pyx_t_5) { /* "cogent/maths/_period.pyx":35 * for n in range(N): * if n != 0: * w = w * W[p] # <<<<<<<<<<<<<< * X[p] = X[p] + x[n]*w * return X */ __pyx_t_6 = __pyx_v_p; __pyx_t_7 = -1; if (__pyx_t_6 < 0) { __pyx_t_6 += __pyx_pybuffernd_W.diminfo[0].shape; if (unlikely(__pyx_t_6 < 0)) __pyx_t_7 = 0; } else if (unlikely(__pyx_t_6 >= __pyx_pybuffernd_W.diminfo[0].shape)) __pyx_t_7 = 0; if (unlikely(__pyx_t_7 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_7); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 35; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_v_w = __Pyx_c_prod(__pyx_v_w, (*__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_W.rcbuffer->pybuffer.buf, __pyx_t_6, __pyx_pybuffernd_W.diminfo[0].strides))); goto __pyx_L7; } __pyx_L7:; /* "cogent/maths/_period.pyx":36 * if n != 0: * w = w * W[p] * X[p] = X[p] + x[n]*w # <<<<<<<<<<<<<< * return X * */ __pyx_t_7 = __pyx_v_p; __pyx_t_8 = -1; if (__pyx_t_7 < 0) { __pyx_t_7 += __pyx_pybuffernd_X.diminfo[0].shape; if (unlikely(__pyx_t_7 < 0)) __pyx_t_8 = 0; } else if (unlikely(__pyx_t_7 >= 
__pyx_pybuffernd_X.diminfo[0].shape)) __pyx_t_8 = 0; if (unlikely(__pyx_t_8 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_8); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 36; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_8 = __pyx_v_n; __pyx_t_9 = -1; if (__pyx_t_8 < 0) { __pyx_t_8 += __pyx_pybuffernd_x.diminfo[0].shape; if (unlikely(__pyx_t_8 < 0)) __pyx_t_9 = 0; } else if (unlikely(__pyx_t_8 >= __pyx_pybuffernd_x.diminfo[0].shape)) __pyx_t_9 = 0; if (unlikely(__pyx_t_9 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_9); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 36; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_9 = __pyx_v_p; __pyx_t_10 = -1; if (__pyx_t_9 < 0) { __pyx_t_9 += __pyx_pybuffernd_X.diminfo[0].shape; if (unlikely(__pyx_t_9 < 0)) __pyx_t_10 = 0; } else if (unlikely(__pyx_t_9 >= __pyx_pybuffernd_X.diminfo[0].shape)) __pyx_t_10 = 0; if (unlikely(__pyx_t_10 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_10); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 36; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } *__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_X.rcbuffer->pybuffer.buf, __pyx_t_9, __pyx_pybuffernd_X.diminfo[0].strides) = __Pyx_c_sum((*__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_X.rcbuffer->pybuffer.buf, __pyx_t_7, __pyx_pybuffernd_X.diminfo[0].strides)), __Pyx_c_prod(__pyx_t_double_complex_from_parts((*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_x.rcbuffer->pybuffer.buf, __pyx_t_8, __pyx_pybuffernd_x.diminfo[0].strides)), 0), __pyx_v_w)); } } /* "cogent/maths/_period.pyx":37 * w = w * W[p] * X[p] = X[p] + x[n]*w * return X # <<<<<<<<<<<<<< * * def autocorr_inner(np.ndarray[np.float64_t, ndim=1] x, np.ndarray[np.float64_t, ndim=1] xc, int N): */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_X)); __pyx_r = ((PyObject *)__pyx_v_X); goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; { PyObject *__pyx_type, 
*__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_W.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_X.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_x.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.maths._period.ipdft_inner", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_W.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_X.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_x.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7_period_5autocorr_inner(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_5maths_7_period_5autocorr_inner = {__Pyx_NAMESTR("autocorr_inner"), (PyCFunction)__pyx_pw_6cogent_5maths_7_period_5autocorr_inner, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_5maths_7_period_5autocorr_inner(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_x = 0; PyArrayObject *__pyx_v_xc = 0; int __pyx_v_N; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__x,&__pyx_n_s__xc,&__pyx_n_s__N,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("autocorr_inner (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: 
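`ipdft_inner`, whose Cython source likewise appears in the comments above, accumulates X[p] = sum over n of x[n] * W[p]**n for each candidate period p, building the running power of W[p] by one multiply per sample instead of calling pow(). A pure-Python mirror (the W values in the test are illustrative; in PyCogent the W array is supplied by the caller):

```python
import cmath

def ipdft_inner_py(x, X, W, ulim, N):
    # For each period index p, accumulate X[p] = sum_n x[n] * W[p]**n,
    # updating the running power w incrementally (w stays 1 for n == 0).
    for p in range(ulim):
        w = 1.0 + 0.0j
        for n in range(N):
            if n != 0:
                w = w * W[p]
            X[p] = X[p] + x[n] * w
    return X
```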
values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__x); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__xc); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("autocorr_inner", 1, 3, 3, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__N); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("autocorr_inner", 1, 3, 3, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "autocorr_inner") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_x = ((PyArrayObject *)values[0]); __pyx_v_xc = ((PyArrayObject *)values[1]); __pyx_v_N = __Pyx_PyInt_AsInt(values[2]); if (unlikely((__pyx_v_N == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("autocorr_inner", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.maths._period.autocorr_inner", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_x), __pyx_ptype_5numpy_ndarray, 1, "x", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_xc), __pyx_ptype_5numpy_ndarray, 1, "xc", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_5maths_7_period_4autocorr_inner(__pyx_self, __pyx_v_x, __pyx_v_xc, __pyx_v_N); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/_period.pyx":39 * return X * * def autocorr_inner(np.ndarray[np.float64_t, ndim=1] x, np.ndarray[np.float64_t, ndim=1] xc, int N): # <<<<<<<<<<<<<< * cdef int m, n * */ static PyObject *__pyx_pf_6cogent_5maths_7_period_4autocorr_inner(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_x, PyArrayObject *__pyx_v_xc, int __pyx_v_N) { int __pyx_v_m; int __pyx_v_n; __Pyx_LocalBuf_ND __pyx_pybuffernd_x; __Pyx_Buffer __pyx_pybuffer_x; __Pyx_LocalBuf_ND __pyx_pybuffernd_xc; __Pyx_Buffer __pyx_pybuffer_xc; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations int __pyx_t_1; int __pyx_t_2; int __pyx_t_3; int __pyx_t_4; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; long __pyx_t_9; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("autocorr_inner", 0); __pyx_pybuffer_x.pybuffer.buf = NULL; __pyx_pybuffer_x.refcount = 0; __pyx_pybuffernd_x.data = NULL; __pyx_pybuffernd_x.rcbuffer = &__pyx_pybuffer_x; __pyx_pybuffer_xc.pybuffer.buf = NULL; __pyx_pybuffer_xc.refcount = 0; __pyx_pybuffernd_xc.data = NULL; __pyx_pybuffernd_xc.rcbuffer = &__pyx_pybuffer_xc; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_x.rcbuffer->pybuffer, (PyObject*)__pyx_v_x, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_x.diminfo[0].strides = 
__pyx_pybuffernd_x.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_x.diminfo[0].shape = __pyx_pybuffernd_x.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_xc.rcbuffer->pybuffer, (PyObject*)__pyx_v_xc, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_xc.diminfo[0].strides = __pyx_pybuffernd_xc.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_xc.diminfo[0].shape = __pyx_pybuffernd_xc.rcbuffer->pybuffer.shape[0]; /* "cogent/maths/_period.pyx":42 * cdef int m, n * * for m in range(-N+1, N): # <<<<<<<<<<<<<< * for n in range(N): * if 0 <= n-m < N: */ __pyx_t_1 = __pyx_v_N; for (__pyx_t_2 = ((-__pyx_v_N) + 1); __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) { __pyx_v_m = __pyx_t_2; /* "cogent/maths/_period.pyx":43 * * for m in range(-N+1, N): * for n in range(N): # <<<<<<<<<<<<<< * if 0 <= n-m < N: * xc[m+N-1] += (x[n]*x[n-m]) */ __pyx_t_3 = __pyx_v_N; for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) { __pyx_v_n = __pyx_t_4; /* "cogent/maths/_period.pyx":44 * for m in range(-N+1, N): * for n in range(N): * if 0 <= n-m < N: # <<<<<<<<<<<<<< * xc[m+N-1] += (x[n]*x[n-m]) * */ __pyx_t_5 = (__pyx_v_n - __pyx_v_m); __pyx_t_6 = (0 <= __pyx_t_5); if (__pyx_t_6) { __pyx_t_6 = (__pyx_t_5 < __pyx_v_N); } if (__pyx_t_6) { /* "cogent/maths/_period.pyx":45 * for n in range(N): * if 0 <= n-m < N: * xc[m+N-1] += (x[n]*x[n-m]) # <<<<<<<<<<<<<< * * def seq_to_symbols(char* seq, list motifs, int motif_length, */ __pyx_t_5 = __pyx_v_n; __pyx_t_7 = -1; if (__pyx_t_5 < 0) { __pyx_t_5 += __pyx_pybuffernd_x.diminfo[0].shape; if (unlikely(__pyx_t_5 < 0)) __pyx_t_7 = 0; } else if (unlikely(__pyx_t_5 >= __pyx_pybuffernd_x.diminfo[0].shape)) __pyx_t_7 = 0; if (unlikely(__pyx_t_7 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_7); {__pyx_filename 
= __pyx_f[0]; __pyx_lineno = 45; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_7 = (__pyx_v_n - __pyx_v_m); __pyx_t_8 = -1; if (__pyx_t_7 < 0) { __pyx_t_7 += __pyx_pybuffernd_x.diminfo[0].shape; if (unlikely(__pyx_t_7 < 0)) __pyx_t_8 = 0; } else if (unlikely(__pyx_t_7 >= __pyx_pybuffernd_x.diminfo[0].shape)) __pyx_t_8 = 0; if (unlikely(__pyx_t_8 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_8); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 45; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_9 = ((__pyx_v_m + __pyx_v_N) - 1); __pyx_t_8 = -1; if (__pyx_t_9 < 0) { __pyx_t_9 += __pyx_pybuffernd_xc.diminfo[0].shape; if (unlikely(__pyx_t_9 < 0)) __pyx_t_8 = 0; } else if (unlikely(__pyx_t_9 >= __pyx_pybuffernd_xc.diminfo[0].shape)) __pyx_t_8 = 0; if (unlikely(__pyx_t_8 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_8); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 45; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } *__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_xc.rcbuffer->pybuffer.buf, __pyx_t_9, __pyx_pybuffernd_xc.diminfo[0].strides) += ((*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_x.rcbuffer->pybuffer.buf, __pyx_t_5, __pyx_pybuffernd_x.diminfo[0].strides)) * (*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_x.rcbuffer->pybuffer.buf, __pyx_t_7, __pyx_pybuffernd_x.diminfo[0].strides))); goto __pyx_L7; } __pyx_L7:; } } __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_x.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_xc.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.maths._period.autocorr_inner", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_x.rcbuffer->pybuffer); 
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_xc.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7_period_7seq_to_symbols(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_5maths_7_period_7seq_to_symbols = {__Pyx_NAMESTR("seq_to_symbols"), (PyCFunction)__pyx_pw_6cogent_5maths_7_period_7seq_to_symbols, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_5maths_7_period_7seq_to_symbols(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { char *__pyx_v_seq; PyObject *__pyx_v_motifs = 0; int __pyx_v_motif_length; PyArrayObject *__pyx_v_result = 0; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__seq,&__pyx_n_s__motifs,&__pyx_n_s__motif_length,&__pyx_n_s__result,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("seq_to_symbols (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[4] = {0,0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3); case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__seq); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__motifs); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("seq_to_symbols", 1, 4, 4, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__motif_length); if 
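`autocorr_inner` fills `xc` (length 2N-1) with the full two-sided autocorrelation of `x`, with lag m running from -N+1 to N-1 and stored at index m+N-1; like the Cython version, the plain-Python equivalent mutates `xc` in place and returns nothing:

```python
def autocorr_inner_py(x, xc, N):
    # Full autocorrelation: xc[m + N - 1] = sum_n x[n] * x[n - m],
    # summing only over the n for which n - m is a valid index.
    for m in range(-N + 1, N):
        for n in range(N):
            if 0 <= n - m < N:
                xc[m + N - 1] += x[n] * x[n - m]
```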
(likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("seq_to_symbols", 1, 4, 4, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 3: values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__result); if (likely(values[3])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("seq_to_symbols", 1, 4, 4, 3); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "seq_to_symbols") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 4) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); values[3] = PyTuple_GET_ITEM(__pyx_args, 3); } __pyx_v_seq = PyBytes_AsString(values[0]); if (unlikely((!__pyx_v_seq) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_motifs = ((PyObject*)values[1]); __pyx_v_motif_length = __Pyx_PyInt_AsInt(values[2]); if (unlikely((__pyx_v_motif_length == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_result = ((PyArrayObject *)values[3]); } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("seq_to_symbols", 1, 4, 4, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.maths._period.seq_to_symbols", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_motifs), (&PyList_Type), 
1, "motifs", 1))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_result), __pyx_ptype_5numpy_ndarray, 1, "result", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_5maths_7_period_6seq_to_symbols(__pyx_self, __pyx_v_seq, __pyx_v_motifs, __pyx_v_motif_length, __pyx_v_result); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/_period.pyx":47 * xc[m+N-1] += (x[n]*x[n-m]) * * def seq_to_symbols(char* seq, list motifs, int motif_length, # <<<<<<<<<<<<<< * np.ndarray[np.uint8_t, ndim=1] result): * cdef int i, j, N, num_motifs */ static PyObject *__pyx_pf_6cogent_5maths_7_period_6seq_to_symbols(CYTHON_UNUSED PyObject *__pyx_self, char *__pyx_v_seq, PyObject *__pyx_v_motifs, int __pyx_v_motif_length, PyArrayObject *__pyx_v_result) { int __pyx_v_i; int __pyx_v_j; int __pyx_v_N; int __pyx_v_num_motifs; PyObject *__pyx_v_got = 0; __Pyx_LocalBuf_ND __pyx_pybuffernd_result; __Pyx_Buffer __pyx_pybuffer_result; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations size_t __pyx_t_1; Py_ssize_t __pyx_t_2; PyObject *__pyx_t_3 = NULL; long __pyx_t_4; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; int __pyx_t_10; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("seq_to_symbols", 0); __pyx_pybuffer_result.pybuffer.buf = NULL; __pyx_pybuffer_result.refcount = 0; __pyx_pybuffernd_result.data = NULL; __pyx_pybuffernd_result.rcbuffer = &__pyx_pybuffer_result; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_result.rcbuffer->pybuffer, (PyObject*)__pyx_v_result, &__Pyx_TypeInfo_nn___pyx_t_5numpy_uint8_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; 
__pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_result.diminfo[0].strides = __pyx_pybuffernd_result.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_result.diminfo[0].shape = __pyx_pybuffernd_result.rcbuffer->pybuffer.shape[0]; /* "cogent/maths/_period.pyx":52 * cdef bytes got * * N = len(seq) # <<<<<<<<<<<<<< * num_motifs = len(motifs) * motif_length = len(motifs[0]) */ __pyx_t_1 = strlen(__pyx_v_seq); __pyx_v_N = __pyx_t_1; /* "cogent/maths/_period.pyx":53 * * N = len(seq) * num_motifs = len(motifs) # <<<<<<<<<<<<<< * motif_length = len(motifs[0]) * for i in range(N - motif_length + 1): */ if (unlikely(((PyObject *)__pyx_v_motifs) == Py_None)) { PyErr_SetString(PyExc_TypeError, "object of type 'NoneType' has no len()"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_2 = PyList_GET_SIZE(((PyObject *)__pyx_v_motifs)); __pyx_v_num_motifs = __pyx_t_2; /* "cogent/maths/_period.pyx":54 * N = len(seq) * num_motifs = len(motifs) * motif_length = len(motifs[0]) # <<<<<<<<<<<<<< * for i in range(N - motif_length + 1): * got = seq[i: i+motif_length] */ __pyx_t_3 = __Pyx_GetItemInt_List(((PyObject *)__pyx_v_motifs), 0, sizeof(long), PyInt_FromLong); if (!__pyx_t_3) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_2 = PyObject_Length(__pyx_t_3); if (unlikely(__pyx_t_2 == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_v_motif_length = __pyx_t_2; /* "cogent/maths/_period.pyx":55 * num_motifs = len(motifs) * motif_length = len(motifs[0]) * for i in range(N - motif_length + 1): # <<<<<<<<<<<<<< * got = seq[i: i+motif_length] * for j in range(num_motifs): */ __pyx_t_4 = ((__pyx_v_N - __pyx_v_motif_length) + 1); for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_4; __pyx_t_5+=1) { __pyx_v_i = __pyx_t_5; /* 
"cogent/maths/_period.pyx":56 * motif_length = len(motifs[0]) * for i in range(N - motif_length + 1): * got = seq[i: i+motif_length] # <<<<<<<<<<<<<< * for j in range(num_motifs): * if got == motifs[j]: */ __pyx_t_3 = PyBytes_FromStringAndSize(__pyx_v_seq + __pyx_v_i, (__pyx_v_i + __pyx_v_motif_length) - __pyx_v_i); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_3)); __Pyx_XDECREF(((PyObject *)__pyx_v_got)); __pyx_v_got = __pyx_t_3; __pyx_t_3 = 0; /* "cogent/maths/_period.pyx":57 * for i in range(N - motif_length + 1): * got = seq[i: i+motif_length] * for j in range(num_motifs): # <<<<<<<<<<<<<< * if got == motifs[j]: * result[i] = 1 */ __pyx_t_6 = __pyx_v_num_motifs; for (__pyx_t_7 = 0; __pyx_t_7 < __pyx_t_6; __pyx_t_7+=1) { __pyx_v_j = __pyx_t_7; /* "cogent/maths/_period.pyx":58 * got = seq[i: i+motif_length] * for j in range(num_motifs): * if got == motifs[j]: # <<<<<<<<<<<<<< * result[i] = 1 * return result */ __pyx_t_3 = __Pyx_GetItemInt_List(((PyObject *)__pyx_v_motifs), __pyx_v_j, sizeof(int), PyInt_FromLong); if (!__pyx_t_3) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 58; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_8 = __Pyx_PyBytes_Equals(((PyObject *)__pyx_v_got), __pyx_t_3, Py_EQ); if (unlikely(__pyx_t_8 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 58; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_8) { /* "cogent/maths/_period.pyx":59 * for j in range(num_motifs): * if got == motifs[j]: * result[i] = 1 # <<<<<<<<<<<<<< * return result * */ __pyx_t_9 = __pyx_v_i; __pyx_t_10 = -1; if (__pyx_t_9 < 0) { __pyx_t_9 += __pyx_pybuffernd_result.diminfo[0].shape; if (unlikely(__pyx_t_9 < 0)) __pyx_t_10 = 0; } else if (unlikely(__pyx_t_9 >= __pyx_pybuffernd_result.diminfo[0].shape)) __pyx_t_10 = 0; if (unlikely(__pyx_t_10 != -1)) { 
__Pyx_RaiseBufferIndexError(__pyx_t_10); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } *__Pyx_BufPtrStrided1d(__pyx_t_5numpy_uint8_t *, __pyx_pybuffernd_result.rcbuffer->pybuffer.buf, __pyx_t_9, __pyx_pybuffernd_result.diminfo[0].strides) = 1; goto __pyx_L7; } __pyx_L7:; } } /* "cogent/maths/_period.pyx":60 * if got == motifs[j]: * result[i] = 1 * return result # <<<<<<<<<<<<<< * */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_result)); __pyx_r = ((PyObject *)__pyx_v_result); goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_3); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_result.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.maths._period.seq_to_symbols", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_result.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF(__pyx_v_got); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_r; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__getbuffer__ (wrapper)", 0); __pyx_r = __pyx_pf_5numpy_7ndarray___getbuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info), ((int)__pyx_v_flags)); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":193 * # experimental exception made for __getbuffer__ and __releasebuffer__ * # -- the details of this may change. 
* def __getbuffer__(ndarray self, Py_buffer* info, int flags): # <<<<<<<<<<<<<< * # This implementation of getbuffer is geared towards Cython * # requirements, and does not yet fullfill the PEP. */ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_v_copy_shape; int __pyx_v_i; int __pyx_v_ndim; int __pyx_v_endian_detector; int __pyx_v_little_endian; int __pyx_v_t; char *__pyx_v_f; PyArray_Descr *__pyx_v_descr = 0; int __pyx_v_offset; int __pyx_v_hasfields; int __pyx_r; __Pyx_RefNannyDeclarations int __pyx_t_1; int __pyx_t_2; int __pyx_t_3; PyObject *__pyx_t_4 = NULL; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; PyObject *__pyx_t_8 = NULL; char *__pyx_t_9; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("__getbuffer__", 0); if (__pyx_v_info != NULL) { __pyx_v_info->obj = Py_None; __Pyx_INCREF(Py_None); __Pyx_GIVEREF(__pyx_v_info->obj); } /* "numpy.pxd":199 * # of flags * * if info == NULL: return # <<<<<<<<<<<<<< * * cdef int copy_shape, i, ndim */ __pyx_t_1 = (__pyx_v_info == NULL); if (__pyx_t_1) { __pyx_r = 0; goto __pyx_L0; goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":202 * * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * */ __pyx_v_endian_detector = 1; /* "numpy.pxd":203 * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * * ndim = PyArray_NDIM(self) */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":205 * cdef bint little_endian = ((&endian_detector)[0] != 0) * * ndim = PyArray_NDIM(self) # <<<<<<<<<<<<<< * * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_v_ndim = PyArray_NDIM(__pyx_v_self); /* "numpy.pxd":207 * ndim = PyArray_NDIM(self) * * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * 
copy_shape = 1 * else: */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":208 * * if sizeof(npy_intp) != sizeof(Py_ssize_t): * copy_shape = 1 # <<<<<<<<<<<<<< * else: * copy_shape = 0 */ __pyx_v_copy_shape = 1; goto __pyx_L4; } /*else*/ { /* "numpy.pxd":210 * copy_shape = 1 * else: * copy_shape = 0 # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) */ __pyx_v_copy_shape = 0; } __pyx_L4:; /* "numpy.pxd":212 * copy_shape = 0 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") */ __pyx_t_1 = ((__pyx_v_flags & PyBUF_C_CONTIGUOUS) == PyBUF_C_CONTIGUOUS); if (__pyx_t_1) { /* "numpy.pxd":213 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not C contiguous") * */ __pyx_t_2 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_C_CONTIGUOUS)); __pyx_t_3 = __pyx_t_2; } else { __pyx_t_3 = __pyx_t_1; } if (__pyx_t_3) { /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_2), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":216 * raise ValueError(u"ndarray is not C contiguous") * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == 
pybuf.PyBUF_F_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") */ __pyx_t_3 = ((__pyx_v_flags & PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS); if (__pyx_t_3) { /* "numpy.pxd":217 * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not Fortran contiguous") * */ __pyx_t_1 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_F_CONTIGUOUS)); __pyx_t_2 = __pyx_t_1; } else { __pyx_t_2 = __pyx_t_3; } if (__pyx_t_2) { /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_4), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":220 * raise ValueError(u"ndarray is not Fortran contiguous") * * info.buf = PyArray_DATA(self) # <<<<<<<<<<<<<< * info.ndim = ndim * if copy_shape: */ __pyx_v_info->buf = PyArray_DATA(__pyx_v_self); /* "numpy.pxd":221 * * info.buf = PyArray_DATA(self) * info.ndim = ndim # <<<<<<<<<<<<<< * if copy_shape: * # Allocate new buffer for strides and shape info. */ __pyx_v_info->ndim = __pyx_v_ndim; /* "numpy.pxd":222 * info.buf = PyArray_DATA(self) * info.ndim = ndim * if copy_shape: # <<<<<<<<<<<<<< * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. 
*/ if (__pyx_v_copy_shape) { /* "numpy.pxd":225 * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) # <<<<<<<<<<<<<< * info.shape = info.strides + ndim * for i in range(ndim): */ __pyx_v_info->strides = ((Py_ssize_t *)malloc((((sizeof(Py_ssize_t)) * ((size_t)__pyx_v_ndim)) * 2))); /* "numpy.pxd":226 * # This is allocated as one block, strides first. * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim # <<<<<<<<<<<<<< * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] */ __pyx_v_info->shape = (__pyx_v_info->strides + __pyx_v_ndim); /* "numpy.pxd":227 * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim * for i in range(ndim): # <<<<<<<<<<<<<< * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] */ __pyx_t_5 = __pyx_v_ndim; for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) { __pyx_v_i = __pyx_t_6; /* "numpy.pxd":228 * info.shape = info.strides + ndim * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] # <<<<<<<<<<<<<< * info.shape[i] = PyArray_DIMS(self)[i] * else: */ (__pyx_v_info->strides[__pyx_v_i]) = (PyArray_STRIDES(__pyx_v_self)[__pyx_v_i]); /* "numpy.pxd":229 * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] # <<<<<<<<<<<<<< * else: * info.strides = PyArray_STRIDES(self) */ (__pyx_v_info->shape[__pyx_v_i]) = (PyArray_DIMS(__pyx_v_self)[__pyx_v_i]); } goto __pyx_L7; } /*else*/ { /* "numpy.pxd":231 * info.shape[i] = PyArray_DIMS(self)[i] * else: * info.strides = PyArray_STRIDES(self) # <<<<<<<<<<<<<< * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL */ __pyx_v_info->strides = ((Py_ssize_t *)PyArray_STRIDES(__pyx_v_self)); /* "numpy.pxd":232 * else: * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) # 
<<<<<<<<<<<<<< * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) */ __pyx_v_info->shape = ((Py_ssize_t *)PyArray_DIMS(__pyx_v_self)); } __pyx_L7:; /* "numpy.pxd":233 * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL # <<<<<<<<<<<<<< * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) */ __pyx_v_info->suboffsets = NULL; /* "numpy.pxd":234 * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) # <<<<<<<<<<<<<< * info.readonly = not PyArray_ISWRITEABLE(self) * */ __pyx_v_info->itemsize = PyArray_ITEMSIZE(__pyx_v_self); /* "numpy.pxd":235 * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) # <<<<<<<<<<<<<< * * cdef int t */ __pyx_v_info->readonly = (!PyArray_ISWRITEABLE(__pyx_v_self)); /* "numpy.pxd":238 * * cdef int t * cdef char* f = NULL # <<<<<<<<<<<<<< * cdef dtype descr = self.descr * cdef list stack */ __pyx_v_f = NULL; /* "numpy.pxd":239 * cdef int t * cdef char* f = NULL * cdef dtype descr = self.descr # <<<<<<<<<<<<<< * cdef list stack * cdef int offset */ __Pyx_INCREF(((PyObject *)__pyx_v_self->descr)); __pyx_v_descr = __pyx_v_self->descr; /* "numpy.pxd":243 * cdef int offset * * cdef bint hasfields = PyDataType_HASFIELDS(descr) # <<<<<<<<<<<<<< * * if not hasfields and not copy_shape: */ __pyx_v_hasfields = PyDataType_HASFIELDS(__pyx_v_descr); /* "numpy.pxd":245 * cdef bint hasfields = PyDataType_HASFIELDS(descr) * * if not hasfields and not copy_shape: # <<<<<<<<<<<<<< * # do not call releasebuffer * info.obj = None */ __pyx_t_2 = (!__pyx_v_hasfields); if (__pyx_t_2) { __pyx_t_3 = (!__pyx_v_copy_shape); __pyx_t_1 = __pyx_t_3; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":247 * if not hasfields and not copy_shape: * # do not call releasebuffer * info.obj = None # <<<<<<<<<<<<<< * else: * # need to call releasebuffer */ 
__Pyx_INCREF(Py_None); __Pyx_GIVEREF(Py_None); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = Py_None; goto __pyx_L10; } /*else*/ { /* "numpy.pxd":250 * else: * # need to call releasebuffer * info.obj = self # <<<<<<<<<<<<<< * * if not hasfields: */ __Pyx_INCREF(((PyObject *)__pyx_v_self)); __Pyx_GIVEREF(((PyObject *)__pyx_v_self)); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = ((PyObject *)__pyx_v_self); } __pyx_L10:; /* "numpy.pxd":252 * info.obj = self * * if not hasfields: # <<<<<<<<<<<<<< * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or */ __pyx_t_1 = (!__pyx_v_hasfields); if (__pyx_t_1) { /* "numpy.pxd":253 * * if not hasfields: * t = descr.type_num # <<<<<<<<<<<<<< * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): */ __pyx_v_t = __pyx_v_descr->type_num; /* "numpy.pxd":254 * if not hasfields: * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_1 = (__pyx_v_descr->byteorder == '>'); if (__pyx_t_1) { __pyx_t_2 = __pyx_v_little_endian; } else { __pyx_t_2 = __pyx_t_1; } if (!__pyx_t_2) { /* "numpy.pxd":255 * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" */ __pyx_t_1 = (__pyx_v_descr->byteorder == '<'); if (__pyx_t_1) { __pyx_t_3 = (!__pyx_v_little_endian); __pyx_t_7 = __pyx_t_3; } else { __pyx_t_7 = __pyx_t_1; } __pyx_t_1 = __pyx_t_7; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # 
<<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_6), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L12; } __pyx_L12:; /* "numpy.pxd":257 * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" */ __pyx_t_1 = (__pyx_v_t == NPY_BYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__b; goto __pyx_L13; } /* "numpy.pxd":258 * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" */ __pyx_t_1 = (__pyx_v_t == NPY_UBYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__B; goto __pyx_L13; } /* "numpy.pxd":259 * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" */ __pyx_t_1 = (__pyx_v_t == NPY_SHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__h; goto __pyx_L13; } /* "numpy.pxd":260 * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" */ __pyx_t_1 = (__pyx_v_t == NPY_USHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__H; goto __pyx_L13; } /* "numpy.pxd":261 * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" */ __pyx_t_1 = (__pyx_v_t == NPY_INT); if (__pyx_t_1) { __pyx_v_f 
= __pyx_k__i; goto __pyx_L13; } /* "numpy.pxd":262 * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" # <<<<<<<<<<<<<< * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" */ __pyx_t_1 = (__pyx_v_t == NPY_UINT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__I; goto __pyx_L13; } /* "numpy.pxd":263 * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" */ __pyx_t_1 = (__pyx_v_t == NPY_LONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__l; goto __pyx_L13; } /* "numpy.pxd":264 * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__L; goto __pyx_L13; } /* "numpy.pxd":265 * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__q; goto __pyx_L13; } /* "numpy.pxd":266 * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Q; goto __pyx_L13; } /* "numpy.pxd":267 * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" */ __pyx_t_1 = (__pyx_v_t == NPY_FLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__f; goto __pyx_L13; } /* "numpy.pxd":268 * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f = "g" * elif t == 
NPY_CFLOAT: f = "Zf" */ __pyx_t_1 = (__pyx_v_t == NPY_DOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__d; goto __pyx_L13; } /* "numpy.pxd":269 * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__g; goto __pyx_L13; } /* "numpy.pxd":270 * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" */ __pyx_t_1 = (__pyx_v_t == NPY_CFLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zf; goto __pyx_L13; } /* "numpy.pxd":271 * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" */ __pyx_t_1 = (__pyx_v_t == NPY_CDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zd; goto __pyx_L13; } /* "numpy.pxd":272 * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f = "O" * else: */ __pyx_t_1 = (__pyx_v_t == NPY_CLONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zg; goto __pyx_L13; } /* "numpy.pxd":273 * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_1 = (__pyx_v_t == NPY_OBJECT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__O; goto __pyx_L13; } /*else*/ { /* "numpy.pxd":275 * elif t == NPY_OBJECT: f = "O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * info.format = f * return */ __pyx_t_4 = PyInt_FromLong(__pyx_v_t); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_GOTREF(__pyx_t_4); __pyx_t_8 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_t_4); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_8)); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_8)); __Pyx_GIVEREF(((PyObject *)__pyx_t_8)); __pyx_t_8 = 0; __pyx_t_8 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_8); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_8, 0, 0, 0); __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L13:; /* "numpy.pxd":276 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f # <<<<<<<<<<<<<< * return * else: */ __pyx_v_info->format = __pyx_v_f; /* "numpy.pxd":277 * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f * return # <<<<<<<<<<<<<< * else: * info.format = stdlib.malloc(_buffer_format_string_len) */ __pyx_r = 0; goto __pyx_L0; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":279 * return * else: * info.format = stdlib.malloc(_buffer_format_string_len) # <<<<<<<<<<<<<< * info.format[0] = '^' # Native data types, manual alignment * offset = 0 */ __pyx_v_info->format = ((char *)malloc(255)); /* "numpy.pxd":280 * else: * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment # <<<<<<<<<<<<<< * offset = 0 * f = _util_dtypestring(descr, info.format + 1, */ (__pyx_v_info->format[0]) = '^'; /* 
"numpy.pxd":281 * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment * offset = 0 # <<<<<<<<<<<<<< * f = _util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, */ __pyx_v_offset = 0; /* "numpy.pxd":284 * f = _util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, * &offset) # <<<<<<<<<<<<<< * f[0] = 0 # Terminate format string * */ __pyx_t_9 = __pyx_f_5numpy__util_dtypestring(__pyx_v_descr, (__pyx_v_info->format + 1), (__pyx_v_info->format + 255), (&__pyx_v_offset)); if (unlikely(__pyx_t_9 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_9; /* "numpy.pxd":285 * info.format + _buffer_format_string_len, * &offset) * f[0] = 0 # Terminate format string # <<<<<<<<<<<<<< * * def __releasebuffer__(ndarray self, Py_buffer* info): */ (__pyx_v_f[0]) = 0; } __pyx_L11:; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_8); __Pyx_AddTraceback("numpy.ndarray.__getbuffer__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = -1; if (__pyx_v_info != NULL && __pyx_v_info->obj != NULL) { __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = NULL; } goto __pyx_L2; __pyx_L0:; if (__pyx_v_info != NULL && __pyx_v_info->obj == Py_None) { __Pyx_GOTREF(Py_None); __Pyx_DECREF(Py_None); __pyx_v_info->obj = NULL; } __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_descr); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info); /*proto*/ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__releasebuffer__ (wrapper)", 0); __pyx_pf_5numpy_7ndarray_2__releasebuffer__(((PyArrayObject 
*)__pyx_v_self), ((Py_buffer *)__pyx_v_info)); __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":287 * f[0] = 0 # Terminate format string * * def __releasebuffer__(ndarray self, Py_buffer* info): # <<<<<<<<<<<<<< * if PyArray_HASFIELDS(self): * stdlib.free(info.format) */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("__releasebuffer__", 0); /* "numpy.pxd":288 * * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): # <<<<<<<<<<<<<< * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_t_1 = PyArray_HASFIELDS(__pyx_v_self); if (__pyx_t_1) { /* "numpy.pxd":289 * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): * stdlib.free(info.format) # <<<<<<<<<<<<<< * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) */ free(__pyx_v_info->format); goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":290 * if PyArray_HASFIELDS(self): * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * stdlib.free(info.strides) * # info.shape was stored after info.strides in the same block */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":291 * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) # <<<<<<<<<<<<<< * # info.shape was stored after info.strides in the same block * */ free(__pyx_v_info->strides); goto __pyx_L4; } __pyx_L4:; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":767 * ctypedef npy_cdouble complex_t * * cdef inline object PyArray_MultiIterNew1(a): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(1, a) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew1(PyObject *__pyx_v_a) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename 
= NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew1", 0); /* "numpy.pxd":768 * * cdef inline object PyArray_MultiIterNew1(a): * return PyArray_MultiIterNew(1, a) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew2(a, b): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(1, ((void *)__pyx_v_a)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 768; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew1", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":770 * return PyArray_MultiIterNew(1, a) * * cdef inline object PyArray_MultiIterNew2(a, b): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(2, a, b) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew2(PyObject *__pyx_v_a, PyObject *__pyx_v_b) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew2", 0); /* "numpy.pxd":771 * * cdef inline object PyArray_MultiIterNew2(a, b): * return PyArray_MultiIterNew(2, a, b) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew3(a, b, c): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(2, ((void *)__pyx_v_a), ((void *)__pyx_v_b)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 771; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew2", __pyx_clineno, 
__pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":773 * return PyArray_MultiIterNew(2, a, b) * * cdef inline object PyArray_MultiIterNew3(a, b, c): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(3, a, b, c) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew3(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew3", 0); /* "numpy.pxd":774 * * cdef inline object PyArray_MultiIterNew3(a, b, c): * return PyArray_MultiIterNew(3, a, b, c) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(3, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 774; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew3", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":776 * return PyArray_MultiIterNew(3, a, b, c) * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(4, a, b, c, d) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew4(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; 
__Pyx_RefNannySetupContext("PyArray_MultiIterNew4", 0); /* "numpy.pxd":777 * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): * return PyArray_MultiIterNew(4, a, b, c, d) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(4, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 777; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew4", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":779 * return PyArray_MultiIterNew(4, a, b, c, d) * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(5, a, b, c, d, e) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew5(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d, PyObject *__pyx_v_e) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew5", 0); /* "numpy.pxd":780 * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): * return PyArray_MultiIterNew(5, a, b, c, d, e) # <<<<<<<<<<<<<< * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(5, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d), ((void *)__pyx_v_e)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 780; __pyx_clineno = __LINE__; 
goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew5", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":782 * return PyArray_MultiIterNew(5, a, b, c, d, e) * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: # <<<<<<<<<<<<<< * # Recursive utility function used in __getbuffer__ to get format * # string. The new location in the format string is returned. */ static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *__pyx_v_descr, char *__pyx_v_f, char *__pyx_v_end, int *__pyx_v_offset) { PyArray_Descr *__pyx_v_child = 0; int __pyx_v_endian_detector; int __pyx_v_little_endian; PyObject *__pyx_v_fields = 0; PyObject *__pyx_v_childname = NULL; PyObject *__pyx_v_new_offset = NULL; PyObject *__pyx_v_t = NULL; char *__pyx_r; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; Py_ssize_t __pyx_t_2; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; PyObject *__pyx_t_5 = NULL; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; long __pyx_t_10; char *__pyx_t_11; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("_util_dtypestring", 0); /* "numpy.pxd":789 * cdef int delta_offset * cdef tuple i * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * cdef tuple fields */ __pyx_v_endian_detector = 1; /* "numpy.pxd":790 * cdef tuple i * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * cdef tuple fields * */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":793 * cdef tuple fields * * for childname in 
descr.names: # <<<<<<<<<<<<<< * fields = descr.fields[childname] * child, new_offset = fields */ if (unlikely(((PyObject *)__pyx_v_descr->names) == Py_None)) { PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 793; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_1 = ((PyObject *)__pyx_v_descr->names); __Pyx_INCREF(__pyx_t_1); __pyx_t_2 = 0; for (;;) { if (__pyx_t_2 >= PyTuple_GET_SIZE(__pyx_t_1)) break; __pyx_t_3 = PyTuple_GET_ITEM(__pyx_t_1, __pyx_t_2); __Pyx_INCREF(__pyx_t_3); __pyx_t_2++; __Pyx_XDECREF(__pyx_v_childname); __pyx_v_childname = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":794 * * for childname in descr.names: * fields = descr.fields[childname] # <<<<<<<<<<<<<< * child, new_offset = fields * */ __pyx_t_3 = PyObject_GetItem(__pyx_v_descr->fields, __pyx_v_childname); if (!__pyx_t_3) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); if (!(likely(PyTuple_CheckExact(__pyx_t_3))||((__pyx_t_3) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected tuple, got %.200s", Py_TYPE(__pyx_t_3)->tp_name), 0))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_fields)); __pyx_v_fields = ((PyObject*)__pyx_t_3); __pyx_t_3 = 0; /* "numpy.pxd":795 * for childname in descr.names: * fields = descr.fields[childname] * child, new_offset = fields # <<<<<<<<<<<<<< * * if (end - f) - (new_offset - offset[0]) < 15: */ if (likely(PyTuple_CheckExact(((PyObject *)__pyx_v_fields)))) { PyObject* sequence = ((PyObject *)__pyx_v_fields); if (unlikely(PyTuple_GET_SIZE(sequence) != 2)) { if (PyTuple_GET_SIZE(sequence) > 2) __Pyx_RaiseTooManyValuesError(2); else __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(sequence)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_3 = PyTuple_GET_ITEM(sequence, 0); 
__pyx_t_4 = PyTuple_GET_ITEM(sequence, 1); __Pyx_INCREF(__pyx_t_3); __Pyx_INCREF(__pyx_t_4); } else { __Pyx_UnpackTupleError(((PyObject *)__pyx_v_fields), 2); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } if (!(likely(((__pyx_t_3) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_3, __pyx_ptype_5numpy_dtype))))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_child)); __pyx_v_child = ((PyArray_Descr *)__pyx_t_3); __pyx_t_3 = 0; __Pyx_XDECREF(__pyx_v_new_offset); __pyx_v_new_offset = __pyx_t_4; __pyx_t_4 = 0; /* "numpy.pxd":797 * child, new_offset = fields * * if (end - f) - (new_offset - offset[0]) < 15: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * */ __pyx_t_4 = PyInt_FromLong((__pyx_v_end - __pyx_v_f)); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyNumber_Subtract(__pyx_v_new_offset, __pyx_t_3); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyNumber_Subtract(__pyx_t_4, __pyx_t_5); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = PyObject_RichCompare(__pyx_t_3, __pyx_int_15, Py_LT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_t_5 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_9), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":800 * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * * if ((child.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_6 = (__pyx_v_child->byteorder == '>'); if (__pyx_t_6) { __pyx_t_7 = __pyx_v_little_endian; } else { __pyx_t_7 = __pyx_t_6; } if (!__pyx_t_7) { /* "numpy.pxd":801 * * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * # One could encode it in the format string and have Cython */ __pyx_t_6 = (__pyx_v_child->byteorder == '<'); if (__pyx_t_6) { __pyx_t_8 = (!__pyx_v_little_endian); __pyx_t_9 = __pyx_t_8; } else { __pyx_t_9 = __pyx_t_6; } __pyx_t_6 = __pyx_t_9; } else { __pyx_t_6 = __pyx_t_7; } if (__pyx_t_6) { /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and 
not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # complain instead, BUT: < and > in format strings also imply */ __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_10), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":812 * * # Output padding bytes * while offset[0] < new_offset: # <<<<<<<<<<<<<< * f[0] = 120 # "x"; pad byte * f += 1 */ while (1) { __pyx_t_5 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_v_new_offset, Py_LT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!__pyx_t_6) break; /* "numpy.pxd":813 * # Output padding bytes * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte # <<<<<<<<<<<<<< * f += 1 * offset[0] += 1 */ (__pyx_v_f[0]) = 120; /* "numpy.pxd":814 * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte * f += 1 # <<<<<<<<<<<<<< * offset[0] += 1 * */ __pyx_v_f = (__pyx_v_f + 1); /* "numpy.pxd":815 * f[0] = 120 # "x"; pad byte * f += 1 * offset[0] += 1 # <<<<<<<<<<<<<< * * offset[0] += child.itemsize */ __pyx_t_10 = 0; 
(__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + 1); } /* "numpy.pxd":817 * offset[0] += 1 * * offset[0] += child.itemsize # <<<<<<<<<<<<<< * * if not PyDataType_HASFIELDS(child): */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + __pyx_v_child->elsize); /* "numpy.pxd":819 * offset[0] += child.itemsize * * if not PyDataType_HASFIELDS(child): # <<<<<<<<<<<<<< * t = child.type_num * if end - f < 5: */ __pyx_t_6 = (!PyDataType_HASFIELDS(__pyx_v_child)); if (__pyx_t_6) { /* "numpy.pxd":820 * * if not PyDataType_HASFIELDS(child): * t = child.type_num # <<<<<<<<<<<<<< * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") */ __pyx_t_3 = PyInt_FromLong(__pyx_v_child->type_num); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 820; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_XDECREF(__pyx_v_t); __pyx_v_t = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":821 * if not PyDataType_HASFIELDS(child): * t = child.type_num * if end - f < 5: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short.") * */ __pyx_t_6 = ((__pyx_v_end - __pyx_v_f) < 5); if (__pyx_t_6) { /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_t_3 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_12), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_Raise(__pyx_t_3, 0, 0, 0); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L10; } __pyx_L10:; /* "numpy.pxd":825 * * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" # <<<<<<<<<<<<<< * elif t == 
NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" */ __pyx_t_3 = PyInt_FromLong(NPY_BYTE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 98; goto __pyx_L11; } /* "numpy.pxd":826 * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" */ __pyx_t_5 = PyInt_FromLong(NPY_UBYTE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 66; goto __pyx_L11; } /* "numpy.pxd":827 * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" */ __pyx_t_3 = 
PyInt_FromLong(NPY_SHORT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 104; goto __pyx_L11; } /* "numpy.pxd":828 * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" */ __pyx_t_5 = PyInt_FromLong(NPY_USHORT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 72; goto __pyx_L11; } /* "numpy.pxd":829 * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" */ __pyx_t_3 = PyInt_FromLong(NPY_INT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = 
__LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 105; goto __pyx_L11; } /* "numpy.pxd":830 * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" # <<<<<<<<<<<<<< * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" */ __pyx_t_5 = PyInt_FromLong(NPY_UINT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 73; goto __pyx_L11; } /* "numpy.pxd":831 * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" */ __pyx_t_3 = PyInt_FromLong(NPY_LONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if 
(unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 108; goto __pyx_L11; } /* "numpy.pxd":832 * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 76; goto __pyx_L11; } /* "numpy.pxd":833 * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" */ __pyx_t_3 = PyInt_FromLong(NPY_LONGLONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 113; goto __pyx_L11; } /* "numpy.pxd":834 * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONGLONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 81; goto __pyx_L11; } /* "numpy.pxd":835 * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" */ __pyx_t_3 = PyInt_FromLong(NPY_FLOAT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; 
__pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 102; goto __pyx_L11; } /* "numpy.pxd":836 * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf */ __pyx_t_5 = PyInt_FromLong(NPY_DOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 100; goto __pyx_L11; } /* "numpy.pxd":837 * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd */ __pyx_t_3 = PyInt_FromLong(NPY_LONGDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = 
__Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 103; goto __pyx_L11; } /* "numpy.pxd":838 * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg */ __pyx_t_5 = PyInt_FromLong(NPY_CFLOAT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 102; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":839 * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" */ __pyx_t_3 = PyInt_FromLong(NPY_CDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 100; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":840 * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: */ __pyx_t_5 = PyInt_FromLong(NPY_CLONGDOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 103; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":841 * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_3 = PyInt_FromLong(NPY_OBJECT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 79; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":843 * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * f += 1 * else: */ __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_v_t); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L11:; /* "numpy.pxd":844 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * f += 1 # <<<<<<<<<<<<<< * else: * # Cython ignores struct boundary information ("T{...}"), */ __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L9; } /*else*/ { /* "numpy.pxd":848 * # Cython ignores struct boundary information ("T{...}"), * # so don't output it * f = _util_dtypestring(child, f, end, offset) # 
<<<<<<<<<<<<<< * return f * */ __pyx_t_11 = __pyx_f_5numpy__util_dtypestring(__pyx_v_child, __pyx_v_f, __pyx_v_end, __pyx_v_offset); if (unlikely(__pyx_t_11 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 848; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_11; } __pyx_L9:; } __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "numpy.pxd":849 * # so don't output it * f = _util_dtypestring(child, f, end, offset) * return f # <<<<<<<<<<<<<< * * */ __pyx_r = __pyx_v_f; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_AddTraceback("numpy._util_dtypestring", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF((PyObject *)__pyx_v_child); __Pyx_XDECREF(__pyx_v_fields); __Pyx_XDECREF(__pyx_v_childname); __Pyx_XDECREF(__pyx_v_new_offset); __Pyx_XDECREF(__pyx_v_t); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":964 * * * cdef inline void set_array_base(ndarray arr, object base): # <<<<<<<<<<<<<< * cdef PyObject* baseptr * if base is None: */ static CYTHON_INLINE void __pyx_f_5numpy_set_array_base(PyArrayObject *__pyx_v_arr, PyObject *__pyx_v_base) { PyObject *__pyx_v_baseptr; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("set_array_base", 0); /* "numpy.pxd":966 * cdef inline void set_array_base(ndarray arr, object base): * cdef PyObject* baseptr * if base is None: # <<<<<<<<<<<<<< * baseptr = NULL * else: */ __pyx_t_1 = (__pyx_v_base == Py_None); if (__pyx_t_1) { /* "numpy.pxd":967 * cdef PyObject* baseptr * if base is None: * baseptr = NULL # <<<<<<<<<<<<<< * else: * Py_INCREF(base) # important to do this before decref below! */ __pyx_v_baseptr = NULL; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":969 * baseptr = NULL * else: * Py_INCREF(base) # important to do this before decref below! 
# <<<<<<<<<<<<<< * baseptr = base * Py_XDECREF(arr.base) */ Py_INCREF(__pyx_v_base); /* "numpy.pxd":970 * else: * Py_INCREF(base) # important to do this before decref below! * baseptr = base # <<<<<<<<<<<<<< * Py_XDECREF(arr.base) * arr.base = baseptr */ __pyx_v_baseptr = ((PyObject *)__pyx_v_base); } __pyx_L3:; /* "numpy.pxd":971 * Py_INCREF(base) # important to do this before decref below! * baseptr = base * Py_XDECREF(arr.base) # <<<<<<<<<<<<<< * arr.base = baseptr * */ Py_XDECREF(__pyx_v_arr->base); /* "numpy.pxd":972 * baseptr = base * Py_XDECREF(arr.base) * arr.base = baseptr # <<<<<<<<<<<<<< * * cdef inline object get_array_base(ndarray arr): */ __pyx_v_arr->base = __pyx_v_baseptr; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObject *__pyx_v_arr) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("get_array_base", 0); /* "numpy.pxd":975 * * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: # <<<<<<<<<<<<<< * return None * else: */ __pyx_t_1 = (__pyx_v_arr->base == NULL); if (__pyx_t_1) { /* "numpy.pxd":976 * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: * return None # <<<<<<<<<<<<<< * else: * return arr.base */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(Py_None); __pyx_r = Py_None; goto __pyx_L0; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":978 * return None * else: * return arr.base # <<<<<<<<<<<<<< */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_arr->base)); __pyx_r = ((PyObject *)__pyx_v_arr->base); goto __pyx_L0; } __pyx_L3:; __pyx_r = Py_None; __Pyx_INCREF(Py_None); __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyMethodDef __pyx_methods[] = { {0, 0, 0, 0} }; #if PY_MAJOR_VERSION >= 3 static struct PyModuleDef 
__pyx_moduledef = { PyModuleDef_HEAD_INIT, __Pyx_NAMESTR("_period"), 0, /* m_doc */ -1, /* m_size */ __pyx_methods /* m_methods */, NULL, /* m_reload */ NULL, /* m_traverse */ NULL, /* m_clear */ NULL /* m_free */ }; #endif static __Pyx_StringTabEntry __pyx_string_tab[] = { {&__pyx_kp_u_1, __pyx_k_1, sizeof(__pyx_k_1), 0, 1, 0, 0}, {&__pyx_kp_u_11, __pyx_k_11, sizeof(__pyx_k_11), 0, 1, 0, 0}, {&__pyx_kp_s_15, __pyx_k_15, sizeof(__pyx_k_15), 0, 0, 1, 0}, {&__pyx_n_s_16, __pyx_k_16, sizeof(__pyx_k_16), 0, 0, 1, 1}, {&__pyx_kp_u_3, __pyx_k_3, sizeof(__pyx_k_3), 0, 1, 0, 0}, {&__pyx_kp_u_5, __pyx_k_5, sizeof(__pyx_k_5), 0, 1, 0, 0}, {&__pyx_kp_u_7, __pyx_k_7, sizeof(__pyx_k_7), 0, 1, 0, 0}, {&__pyx_kp_u_8, __pyx_k_8, sizeof(__pyx_k_8), 0, 1, 0, 0}, {&__pyx_n_s__N, __pyx_k__N, sizeof(__pyx_k__N), 0, 0, 1, 1}, {&__pyx_n_s__RuntimeError, __pyx_k__RuntimeError, sizeof(__pyx_k__RuntimeError), 0, 0, 1, 1}, {&__pyx_n_s__ValueError, __pyx_k__ValueError, sizeof(__pyx_k__ValueError), 0, 0, 1, 1}, {&__pyx_n_s__W, __pyx_k__W, sizeof(__pyx_k__W), 0, 0, 1, 1}, {&__pyx_n_s__X, __pyx_k__X, sizeof(__pyx_k__X), 0, 0, 1, 1}, {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1}, {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1}, {&__pyx_n_s__autocorr_inner, __pyx_k__autocorr_inner, sizeof(__pyx_k__autocorr_inner), 0, 0, 1, 1}, {&__pyx_n_s__coeff, __pyx_k__coeff, sizeof(__pyx_k__coeff), 0, 0, 1, 1}, {&__pyx_n_s__cos, __pyx_k__cos, sizeof(__pyx_k__cos), 0, 0, 1, 1}, {&__pyx_n_s__exp, __pyx_k__exp, sizeof(__pyx_k__exp), 0, 0, 1, 1}, {&__pyx_n_s__goertzel_inner, __pyx_k__goertzel_inner, sizeof(__pyx_k__goertzel_inner), 0, 0, 1, 1}, {&__pyx_n_s__got, __pyx_k__got, sizeof(__pyx_k__got), 0, 0, 1, 1}, {&__pyx_n_s__i, __pyx_k__i, sizeof(__pyx_k__i), 0, 0, 1, 1}, {&__pyx_n_s__ipdft_inner, __pyx_k__ipdft_inner, sizeof(__pyx_k__ipdft_inner), 0, 0, 1, 1}, {&__pyx_n_s__j, __pyx_k__j, sizeof(__pyx_k__j), 0, 0, 1, 1}, {&__pyx_n_s__m, __pyx_k__m, 
sizeof(__pyx_k__m), 0, 0, 1, 1}, {&__pyx_n_s__motif_length, __pyx_k__motif_length, sizeof(__pyx_k__motif_length), 0, 0, 1, 1}, {&__pyx_n_s__motifs, __pyx_k__motifs, sizeof(__pyx_k__motifs), 0, 0, 1, 1}, {&__pyx_n_s__n, __pyx_k__n, sizeof(__pyx_k__n), 0, 0, 1, 1}, {&__pyx_n_s__num_motifs, __pyx_k__num_motifs, sizeof(__pyx_k__num_motifs), 0, 0, 1, 1}, {&__pyx_n_s__numpy, __pyx_k__numpy, sizeof(__pyx_k__numpy), 0, 0, 1, 1}, {&__pyx_n_s__p, __pyx_k__p, sizeof(__pyx_k__p), 0, 0, 1, 1}, {&__pyx_n_s__period, __pyx_k__period, sizeof(__pyx_k__period), 0, 0, 1, 1}, {&__pyx_n_s__pi, __pyx_k__pi, sizeof(__pyx_k__pi), 0, 0, 1, 1}, {&__pyx_n_s__power, __pyx_k__power, sizeof(__pyx_k__power), 0, 0, 1, 1}, {&__pyx_n_s__range, __pyx_k__range, sizeof(__pyx_k__range), 0, 0, 1, 1}, {&__pyx_n_s__result, __pyx_k__result, sizeof(__pyx_k__result), 0, 0, 1, 1}, {&__pyx_n_s__s, __pyx_k__s, sizeof(__pyx_k__s), 0, 0, 1, 1}, {&__pyx_n_s__s_prev, __pyx_k__s_prev, sizeof(__pyx_k__s_prev), 0, 0, 1, 1}, {&__pyx_n_s__s_prev2, __pyx_k__s_prev2, sizeof(__pyx_k__s_prev2), 0, 0, 1, 1}, {&__pyx_n_s__seq, __pyx_k__seq, sizeof(__pyx_k__seq), 0, 0, 1, 1}, {&__pyx_n_s__seq_to_symbols, __pyx_k__seq_to_symbols, sizeof(__pyx_k__seq_to_symbols), 0, 0, 1, 1}, {&__pyx_n_s__sqrt, __pyx_k__sqrt, sizeof(__pyx_k__sqrt), 0, 0, 1, 1}, {&__pyx_n_s__ulim, __pyx_k__ulim, sizeof(__pyx_k__ulim), 0, 0, 1, 1}, {&__pyx_n_s__w, __pyx_k__w, sizeof(__pyx_k__w), 0, 0, 1, 1}, {&__pyx_n_s__x, __pyx_k__x, sizeof(__pyx_k__x), 0, 0, 1, 1}, {&__pyx_n_s__xc, __pyx_k__xc, sizeof(__pyx_k__xc), 0, 0, 1, 1}, {0, 0, 0, 0, 0, 0, 0} }; static int __Pyx_InitCachedBuiltins(void) { __pyx_builtin_range = __Pyx_GetName(__pyx_b, __pyx_n_s__range); if (!__pyx_builtin_range) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_ValueError = __Pyx_GetName(__pyx_b, __pyx_n_s__ValueError); if (!__pyx_builtin_ValueError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} __pyx_builtin_RuntimeError = __Pyx_GetName(__pyx_b, __pyx_n_s__RuntimeError); if (!__pyx_builtin_RuntimeError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} return 0; __pyx_L1_error:; return -1; } static int __Pyx_InitCachedConstants(void) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__Pyx_InitCachedConstants", 0); /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_k_tuple_2 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_2); __Pyx_INCREF(((PyObject *)__pyx_kp_u_1)); PyTuple_SET_ITEM(__pyx_k_tuple_2, 0, ((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_2)); /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_k_tuple_4 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_4); __Pyx_INCREF(((PyObject *)__pyx_kp_u_3)); PyTuple_SET_ITEM(__pyx_k_tuple_4, 0, ((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_4)); /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_k_tuple_6 = 
PyTuple_New(1); if (unlikely(!__pyx_k_tuple_6)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_6); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_6, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_6)); /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_k_tuple_9 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_9)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_9); __Pyx_INCREF(((PyObject *)__pyx_kp_u_8)); PyTuple_SET_ITEM(__pyx_k_tuple_9, 0, ((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_9)); /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # complain instead, BUT: < and > in format strings also imply */ __pyx_k_tuple_10 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_10)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_10); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_10, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_10)); /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_k_tuple_12 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_12)) 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_12); __Pyx_INCREF(((PyObject *)__pyx_kp_u_11)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 0, ((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_12)); /* "cogent/maths/_period.pyx":6 * # TODO intro the version idea of peter's see email from him on Wednesday, 26 May 2010 * * def goertzel_inner(np.ndarray[np.float64_t, ndim=1] x, int N, int period): # <<<<<<<<<<<<<< * """returns the power from series x for period""" * cdef int n */ __pyx_k_tuple_13 = PyTuple_New(9); if (unlikely(!__pyx_k_tuple_13)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_13); __Pyx_INCREF(((PyObject *)__pyx_n_s__x)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 0, ((PyObject *)__pyx_n_s__x)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__x)); __Pyx_INCREF(((PyObject *)__pyx_n_s__N)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 1, ((PyObject *)__pyx_n_s__N)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__N)); __Pyx_INCREF(((PyObject *)__pyx_n_s__period)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 2, ((PyObject *)__pyx_n_s__period)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__period)); __Pyx_INCREF(((PyObject *)__pyx_n_s__n)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 3, ((PyObject *)__pyx_n_s__n)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__n)); __Pyx_INCREF(((PyObject *)__pyx_n_s__coeff)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 4, ((PyObject *)__pyx_n_s__coeff)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__coeff)); __Pyx_INCREF(((PyObject *)__pyx_n_s__s)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 5, ((PyObject *)__pyx_n_s__s)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__s)); __Pyx_INCREF(((PyObject *)__pyx_n_s__s_prev)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 6, ((PyObject *)__pyx_n_s__s_prev)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__s_prev)); __Pyx_INCREF(((PyObject *)__pyx_n_s__s_prev2)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 
7, ((PyObject *)__pyx_n_s__s_prev2)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__s_prev2)); __Pyx_INCREF(((PyObject *)__pyx_n_s__power)); PyTuple_SET_ITEM(__pyx_k_tuple_13, 8, ((PyObject *)__pyx_n_s__power)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__power)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_13)); __pyx_k_codeobj_14 = (PyObject*)__Pyx_PyCode_New(3, 0, 9, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_13, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_15, __pyx_n_s__goertzel_inner, 6, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/maths/_period.pyx":22 * return power * * def ipdft_inner(np.ndarray[np.float64_t, ndim=1] x, # <<<<<<<<<<<<<< * np.ndarray[np.complex128_t, ndim=1] X, * np.ndarray[np.complex128_t, ndim=1] W, */ __pyx_k_tuple_17 = PyTuple_New(8); if (unlikely(!__pyx_k_tuple_17)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_17); __Pyx_INCREF(((PyObject *)__pyx_n_s__x)); PyTuple_SET_ITEM(__pyx_k_tuple_17, 0, ((PyObject *)__pyx_n_s__x)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__x)); __Pyx_INCREF(((PyObject *)__pyx_n_s__X)); PyTuple_SET_ITEM(__pyx_k_tuple_17, 1, ((PyObject *)__pyx_n_s__X)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__X)); __Pyx_INCREF(((PyObject *)__pyx_n_s__W)); PyTuple_SET_ITEM(__pyx_k_tuple_17, 2, ((PyObject *)__pyx_n_s__W)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__W)); __Pyx_INCREF(((PyObject *)__pyx_n_s__ulim)); PyTuple_SET_ITEM(__pyx_k_tuple_17, 3, ((PyObject *)__pyx_n_s__ulim)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__ulim)); __Pyx_INCREF(((PyObject *)__pyx_n_s__N)); PyTuple_SET_ITEM(__pyx_k_tuple_17, 4, ((PyObject *)__pyx_n_s__N)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__N)); __Pyx_INCREF(((PyObject *)__pyx_n_s__n)); PyTuple_SET_ITEM(__pyx_k_tuple_17, 5, ((PyObject *)__pyx_n_s__n)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__n)); 
__Pyx_INCREF(((PyObject *)__pyx_n_s__p)); PyTuple_SET_ITEM(__pyx_k_tuple_17, 6, ((PyObject *)__pyx_n_s__p)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__p)); __Pyx_INCREF(((PyObject *)__pyx_n_s__w)); PyTuple_SET_ITEM(__pyx_k_tuple_17, 7, ((PyObject *)__pyx_n_s__w)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__w)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_17)); __pyx_k_codeobj_18 = (PyObject*)__Pyx_PyCode_New(5, 0, 8, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_17, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_15, __pyx_n_s__ipdft_inner, 22, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_18)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/maths/_period.pyx":39 * return X * * def autocorr_inner(np.ndarray[np.float64_t, ndim=1] x, np.ndarray[np.float64_t, ndim=1] xc, int N): # <<<<<<<<<<<<<< * cdef int m, n * */ __pyx_k_tuple_19 = PyTuple_New(5); if (unlikely(!__pyx_k_tuple_19)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_19); __Pyx_INCREF(((PyObject *)__pyx_n_s__x)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 0, ((PyObject *)__pyx_n_s__x)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__x)); __Pyx_INCREF(((PyObject *)__pyx_n_s__xc)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 1, ((PyObject *)__pyx_n_s__xc)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__xc)); __Pyx_INCREF(((PyObject *)__pyx_n_s__N)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 2, ((PyObject *)__pyx_n_s__N)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__N)); __Pyx_INCREF(((PyObject *)__pyx_n_s__m)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 3, ((PyObject *)__pyx_n_s__m)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__m)); __Pyx_INCREF(((PyObject *)__pyx_n_s__n)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 4, ((PyObject *)__pyx_n_s__n)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__n)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_19)); __pyx_k_codeobj_20 = (PyObject*)__Pyx_PyCode_New(3, 0, 5, 0, 0, __pyx_empty_bytes, 
__pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_19, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_15, __pyx_n_s__autocorr_inner, 39, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_20)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/maths/_period.pyx":47 * xc[m+N-1] += (x[n]*x[n-m]) * * def seq_to_symbols(char* seq, list motifs, int motif_length, # <<<<<<<<<<<<<< * np.ndarray[np.uint8_t, ndim=1] result): * cdef int i, j, N, num_motifs */ __pyx_k_tuple_21 = PyTuple_New(9); if (unlikely(!__pyx_k_tuple_21)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_21); __Pyx_INCREF(((PyObject *)__pyx_n_s__seq)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 0, ((PyObject *)__pyx_n_s__seq)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__seq)); __Pyx_INCREF(((PyObject *)__pyx_n_s__motifs)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 1, ((PyObject *)__pyx_n_s__motifs)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__motifs)); __Pyx_INCREF(((PyObject *)__pyx_n_s__motif_length)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 2, ((PyObject *)__pyx_n_s__motif_length)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__motif_length)); __Pyx_INCREF(((PyObject *)__pyx_n_s__result)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 3, ((PyObject *)__pyx_n_s__result)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__result)); __Pyx_INCREF(((PyObject *)__pyx_n_s__i)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 4, ((PyObject *)__pyx_n_s__i)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__i)); __Pyx_INCREF(((PyObject *)__pyx_n_s__j)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 5, ((PyObject *)__pyx_n_s__j)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__j)); __Pyx_INCREF(((PyObject *)__pyx_n_s__N)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 6, ((PyObject *)__pyx_n_s__N)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__N)); __Pyx_INCREF(((PyObject *)__pyx_n_s__num_motifs)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 7, ((PyObject *)__pyx_n_s__num_motifs)); __Pyx_GIVEREF(((PyObject 
*)__pyx_n_s__num_motifs)); __Pyx_INCREF(((PyObject *)__pyx_n_s__got)); PyTuple_SET_ITEM(__pyx_k_tuple_21, 8, ((PyObject *)__pyx_n_s__got)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__got)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_21)); __pyx_k_codeobj_22 = (PyObject*)__Pyx_PyCode_New(4, 0, 9, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_21, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_15, __pyx_n_s__seq_to_symbols, 47, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_22)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_RefNannyFinishContext(); return 0; __pyx_L1_error:; __Pyx_RefNannyFinishContext(); return -1; } static int __Pyx_InitGlobals(void) { if (__Pyx_InitStrings(__pyx_string_tab) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_2 = PyInt_FromLong(2); if (unlikely(!__pyx_int_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_15 = PyInt_FromLong(15); if (unlikely(!__pyx_int_15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; return 0; __pyx_L1_error:; return -1; } #if PY_MAJOR_VERSION < 3 PyMODINIT_FUNC init_period(void); /*proto*/ PyMODINIT_FUNC init_period(void) #else PyMODINIT_FUNC PyInit__period(void); /*proto*/ PyMODINIT_FUNC PyInit__period(void) #endif { PyObject *__pyx_t_1 = NULL; PyObject *__pyx_t_2 = NULL; __Pyx_RefNannyDeclarations #if CYTHON_REFNANNY __Pyx_RefNanny = __Pyx_RefNannyImportAPI("refnanny"); if (!__Pyx_RefNanny) { PyErr_Clear(); __Pyx_RefNanny = __Pyx_RefNannyImportAPI("Cython.Runtime.refnanny"); if (!__Pyx_RefNanny) Py_FatalError("failed to import 'refnanny' module"); } #endif __Pyx_RefNannySetupContext("PyMODINIT_FUNC PyInit__period(void)", 0); if ( __Pyx_check_binary_version() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__pyx_empty_tuple = PyTuple_New(0); if (unlikely(!__pyx_empty_tuple)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_bytes = PyBytes_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_bytes)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #ifdef __Pyx_CyFunction_USED if (__Pyx_CyFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_FusedFunction_USED if (__pyx_FusedFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_Generator_USED if (__pyx_Generator_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif /*--- Library function declarations ---*/ /*--- Threads initialization code ---*/ #if defined(__PYX_FORCE_INIT_THREADS) && __PYX_FORCE_INIT_THREADS #ifdef WITH_THREAD /* Python build with threading support? */ PyEval_InitThreads(); #endif #endif /*--- Module creation code ---*/ #if PY_MAJOR_VERSION < 3 __pyx_m = Py_InitModule4(__Pyx_NAMESTR("_period"), __pyx_methods, 0, 0, PYTHON_API_VERSION); #else __pyx_m = PyModule_Create(&__pyx_moduledef); #endif if (!__pyx_m) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; #if PY_MAJOR_VERSION < 3 Py_INCREF(__pyx_m); #endif __pyx_b = PyImport_AddModule(__Pyx_NAMESTR(__Pyx_BUILTIN_MODULE_NAME)); if (!__pyx_b) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; if (__Pyx_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; /*--- Initialize various global constants etc. 
---*/ if (unlikely(__Pyx_InitGlobals() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__pyx_module_is_main_cogent__maths___period) { if (__Pyx_SetAttrString(__pyx_m, "__name__", __pyx_n_s____main__) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; } /*--- Builtin init code ---*/ if (unlikely(__Pyx_InitCachedBuiltins() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Constants init code ---*/ if (unlikely(__Pyx_InitCachedConstants() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Global init code ---*/ /*--- Variable export code ---*/ /*--- Function export code ---*/ /*--- Type init code ---*/ /*--- Type import code ---*/ __pyx_ptype_5numpy_dtype = __Pyx_ImportType("numpy", "dtype", sizeof(PyArray_Descr), 0); if (unlikely(!__pyx_ptype_5numpy_dtype)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 154; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_flatiter = __Pyx_ImportType("numpy", "flatiter", sizeof(PyArrayIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_flatiter)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 164; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_broadcast = __Pyx_ImportType("numpy", "broadcast", sizeof(PyArrayMultiIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_broadcast)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 168; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ndarray = __Pyx_ImportType("numpy", "ndarray", sizeof(PyArrayObject), 0); if (unlikely(!__pyx_ptype_5numpy_ndarray)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 177; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ufunc = __Pyx_ImportType("numpy", "ufunc", sizeof(PyUFuncObject), 0); if (unlikely(!__pyx_ptype_5numpy_ufunc)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 860; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Variable import code ---*/ /*--- Function import code ---*/ /*--- Execution code ---*/ /* "cogent/maths/_period.pyx":1 * from numpy import pi, exp, sqrt, cos # <<<<<<<<<<<<<< * cimport numpy as np * */ __pyx_t_1 = PyList_New(4); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __Pyx_INCREF(((PyObject *)__pyx_n_s__pi)); PyList_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_n_s__pi)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__pi)); __Pyx_INCREF(((PyObject *)__pyx_n_s__exp)); PyList_SET_ITEM(__pyx_t_1, 1, ((PyObject *)__pyx_n_s__exp)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__exp)); __Pyx_INCREF(((PyObject *)__pyx_n_s__sqrt)); PyList_SET_ITEM(__pyx_t_1, 2, ((PyObject *)__pyx_n_s__sqrt)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__sqrt)); __Pyx_INCREF(((PyObject *)__pyx_n_s__cos)); PyList_SET_ITEM(__pyx_t_1, 3, ((PyObject *)__pyx_n_s__cos)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__cos)); __pyx_t_2 = __Pyx_Import(((PyObject *)__pyx_n_s__numpy), ((PyObject *)__pyx_t_1), -1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; __pyx_t_1 = PyObject_GetAttr(__pyx_t_2, __pyx_n_s__pi); if (__pyx_t_1 == NULL) { if (PyErr_ExceptionMatches(PyExc_AttributeError)) __Pyx_RaiseImportError(__pyx_n_s__pi); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__pi, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyObject_GetAttr(__pyx_t_2, __pyx_n_s__exp); if (__pyx_t_1 == NULL) { if (PyErr_ExceptionMatches(PyExc_AttributeError)) 
__Pyx_RaiseImportError(__pyx_n_s__exp); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__exp, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyObject_GetAttr(__pyx_t_2, __pyx_n_s__sqrt); if (__pyx_t_1 == NULL) { if (PyErr_ExceptionMatches(PyExc_AttributeError)) __Pyx_RaiseImportError(__pyx_n_s__sqrt); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__sqrt, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyObject_GetAttr(__pyx_t_2, __pyx_n_s__cos); if (__pyx_t_1 == NULL) { if (PyErr_ExceptionMatches(PyExc_AttributeError)) __Pyx_RaiseImportError(__pyx_n_s__cos); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__cos, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/maths/_period.pyx":6 * # TODO intro the version idea of peter's see email from him on Wednesday, 26 May 2010 * * def goertzel_inner(np.ndarray[np.float64_t, ndim=1] x, int N, int period): # <<<<<<<<<<<<<< * """returns the power from series x for period""" * cdef int n */ __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_6cogent_5maths_7_period_1goertzel_inner, NULL, __pyx_n_s_16); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_GOTREF(__pyx_t_2); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__goertzel_inner, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/maths/_period.pyx":22 * return power * * def ipdft_inner(np.ndarray[np.float64_t, ndim=1] x, # <<<<<<<<<<<<<< * np.ndarray[np.complex128_t, ndim=1] X, * np.ndarray[np.complex128_t, ndim=1] W, */ __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_6cogent_5maths_7_period_3ipdft_inner, NULL, __pyx_n_s_16); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__ipdft_inner, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/maths/_period.pyx":39 * return X * * def autocorr_inner(np.ndarray[np.float64_t, ndim=1] x, np.ndarray[np.float64_t, ndim=1] xc, int N): # <<<<<<<<<<<<<< * cdef int m, n * */ __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_6cogent_5maths_7_period_5autocorr_inner, NULL, __pyx_n_s_16); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__autocorr_inner, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 39; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/maths/_period.pyx":47 * xc[m+N-1] += (x[n]*x[n-m]) * * def seq_to_symbols(char* seq, list motifs, int motif_length, # <<<<<<<<<<<<<< * np.ndarray[np.uint8_t, ndim=1] result): * cdef int i, j, N, num_motifs */ __pyx_t_2 = PyCFunction_NewEx(&__pyx_mdef_6cogent_5maths_7_period_7seq_to_symbols, NULL, __pyx_n_s_16); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_GOTREF(__pyx_t_2); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__seq_to_symbols, __pyx_t_2) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/maths/_period.pyx":1 * from numpy import pi, exp, sqrt, cos # <<<<<<<<<<<<<< * cimport numpy as np * */ __pyx_t_2 = PyDict_New(); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); if (PyObject_SetAttr(__pyx_m, __pyx_n_s____test__, ((PyObject *)__pyx_t_2)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); if (__pyx_m) { __Pyx_AddTraceback("init cogent.maths._period", __pyx_clineno, __pyx_lineno, __pyx_filename); Py_DECREF(__pyx_m); __pyx_m = 0; } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_ImportError, "init cogent.maths._period"); } __pyx_L0:; __Pyx_RefNannyFinishContext(); #if PY_MAJOR_VERSION < 3 return; #else return __pyx_m; #endif } /* Runtime support code */ #if CYTHON_REFNANNY static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname) { PyObject *m = NULL, *p = NULL; void *r = NULL; m = PyImport_ImportModule((char *)modname); if (!m) goto end; p = PyObject_GetAttrString(m, (char *)"RefNannyAPI"); if (!p) goto end; r = PyLong_AsVoidPtr(p); end: Py_XDECREF(p); Py_XDECREF(m); return (__Pyx_RefNannyAPIStruct *)r; } #endif /* CYTHON_REFNANNY */ static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name) { PyObject *result; result = PyObject_GetAttr(dict, name); if (!result) { if (dict != __pyx_b) { PyErr_Clear(); result = PyObject_GetAttr(__pyx_b, name); 
} if (!result) { PyErr_SetObject(PyExc_NameError, name); } } return result; } static void __Pyx_RaiseArgtupleInvalid( const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found) { Py_ssize_t num_expected; const char *more_or_less; if (num_found < num_min) { num_expected = num_min; more_or_less = "at least"; } else { num_expected = num_max; more_or_less = "at most"; } if (exact) { more_or_less = "exactly"; } PyErr_Format(PyExc_TypeError, "%s() takes %s %"PY_FORMAT_SIZE_T"d positional argument%s (%"PY_FORMAT_SIZE_T"d given)", func_name, more_or_less, num_expected, (num_expected == 1) ? "" : "s", num_found); } static void __Pyx_RaiseDoubleKeywordsError( const char* func_name, PyObject* kw_name) { PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION >= 3 "%s() got multiple values for keyword argument '%U'", func_name, kw_name); #else "%s() got multiple values for keyword argument '%s'", func_name, PyString_AS_STRING(kw_name)); #endif } static int __Pyx_ParseOptionalKeywords( PyObject *kwds, PyObject **argnames[], PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, const char* function_name) { PyObject *key = 0, *value = 0; Py_ssize_t pos = 0; PyObject*** name; PyObject*** first_kw_arg = argnames + num_pos_args; while (PyDict_Next(kwds, &pos, &key, &value)) { name = first_kw_arg; while (*name && (**name != key)) name++; if (*name) { values[name-argnames] = value; } else { #if PY_MAJOR_VERSION < 3 if (unlikely(!PyString_CheckExact(key)) && unlikely(!PyString_Check(key))) { #else if (unlikely(!PyUnicode_Check(key))) { #endif goto invalid_keyword_type; } else { for (name = first_kw_arg; *name; name++) { #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) break; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) break; #endif } if (*name) { values[name-argnames] = value; } else { for (name=argnames; name != 
first_kw_arg; name++) { if (**name == key) goto arg_passed_twice; #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) goto arg_passed_twice; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) goto arg_passed_twice; #endif } if (kwds2) { if (unlikely(PyDict_SetItem(kwds2, key, value))) goto bad; } else { goto invalid_keyword; } } } } } return 0; arg_passed_twice: __Pyx_RaiseDoubleKeywordsError(function_name, **name); goto bad; invalid_keyword_type: PyErr_Format(PyExc_TypeError, "%s() keywords must be strings", function_name); goto bad; invalid_keyword: PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION < 3 "%s() got an unexpected keyword argument '%s'", function_name, PyString_AsString(key)); #else "%s() got an unexpected keyword argument '%U'", function_name, key); #endif bad: return -1; } static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact) { if (!type) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (none_allowed && obj == Py_None) return 1; else if (exact) { if (Py_TYPE(obj) == type) return 1; } else { if (PyObject_TypeCheck(obj, type)) return 1; } PyErr_Format(PyExc_TypeError, "Argument '%s' has incorrect type (expected %s, got %s)", name, type->tp_name, Py_TYPE(obj)->tp_name); return 0; } static CYTHON_INLINE int __Pyx_IsLittleEndian(void) { unsigned int n = 1; return *(unsigned char*)(&n) != 0; } static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx, __Pyx_BufFmt_StackElem* stack, __Pyx_TypeInfo* type) { stack[0].field = &ctx->root; stack[0].parent_offset = 0; ctx->root.type = type; ctx->root.name = "buffer dtype"; ctx->root.offset = 0; ctx->head = stack; ctx->head->field = &ctx->root; ctx->fmt_offset = 0; ctx->head->parent_offset = 0; ctx->new_packmode = '@'; ctx->enc_packmode = '@'; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->is_complex = 0; 
ctx->is_valid_array = 0; ctx->struct_alignment = 0; while (type->typegroup == 'S') { ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = 0; type = type->fields->type; } } static int __Pyx_BufFmt_ParseNumber(const char** ts) { int count; const char* t = *ts; if (*t < '0' || *t > '9') { return -1; } else { count = *t++ - '0'; while (*t >= '0' && *t <= '9') { count *= 10; count += *t++ - '0'; } } *ts = t; return count; } static int __Pyx_BufFmt_ExpectNumber(const char **ts) { int number = __Pyx_BufFmt_ParseNumber(ts); if (number == -1) /* First char was not a digit */ PyErr_Format(PyExc_ValueError, "Does not understand character buffer dtype format string ('%c')", **ts); return number; } static void __Pyx_BufFmt_RaiseUnexpectedChar(char ch) { PyErr_Format(PyExc_ValueError, "Unexpected format string character: '%c'", ch); } static const char* __Pyx_BufFmt_DescribeTypeChar(char ch, int is_complex) { switch (ch) { case 'b': return "'char'"; case 'B': return "'unsigned char'"; case 'h': return "'short'"; case 'H': return "'unsigned short'"; case 'i': return "'int'"; case 'I': return "'unsigned int'"; case 'l': return "'long'"; case 'L': return "'unsigned long'"; case 'q': return "'long long'"; case 'Q': return "'unsigned long long'"; case 'f': return (is_complex ? "'complex float'" : "'float'"); case 'd': return (is_complex ? "'complex double'" : "'double'"); case 'g': return (is_complex ? "'complex long double'" : "'long double'"); case 'T': return "a struct"; case 'O': return "Python object"; case 'P': return "a pointer"; case 's': case 'p': return "a string"; case 0: return "end"; default: return "unparseable format string"; } } static size_t __Pyx_BufFmt_TypeCharToStandardSize(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return 2; case 'i': case 'I': case 'l': case 'L': return 4; case 'q': case 'Q': return 8; case 'f': return (is_complex ? 
8 : 4); case 'd': return (is_complex ? 16 : 8); case 'g': { PyErr_SetString(PyExc_ValueError, "Python does not define a standard format string size for long double ('g')."); return 0; } case 'O': case 'P': return sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static size_t __Pyx_BufFmt_TypeCharToNativeSize(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(short); case 'i': case 'I': return sizeof(int); case 'l': case 'L': return sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(float) * (is_complex ? 2 : 1); case 'd': return sizeof(double) * (is_complex ? 2 : 1); case 'g': return sizeof(long double) * (is_complex ? 2 : 1); case 'O': case 'P': return sizeof(void*); default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } typedef struct { char c; short x; } __Pyx_st_short; typedef struct { char c; int x; } __Pyx_st_int; typedef struct { char c; long x; } __Pyx_st_long; typedef struct { char c; float x; } __Pyx_st_float; typedef struct { char c; double x; } __Pyx_st_double; typedef struct { char c; long double x; } __Pyx_st_longdouble; typedef struct { char c; void *x; } __Pyx_st_void_p; #ifdef HAVE_LONG_LONG typedef struct { char c; PY_LONG_LONG x; } __Pyx_st_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToAlignment(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_st_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_st_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_st_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_st_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_st_float) - sizeof(float); case 'd': return sizeof(__Pyx_st_double) - sizeof(double); case 'g': return 
sizeof(__Pyx_st_longdouble) - sizeof(long double); case 'P': case 'O': return sizeof(__Pyx_st_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } /* These are for computing the padding at the end of the struct to align on the first member of the struct. This will probably be the same as above, but we don't have any guarantees. */ typedef struct { short x; char c; } __Pyx_pad_short; typedef struct { int x; char c; } __Pyx_pad_int; typedef struct { long x; char c; } __Pyx_pad_long; typedef struct { float x; char c; } __Pyx_pad_float; typedef struct { double x; char c; } __Pyx_pad_double; typedef struct { long double x; char c; } __Pyx_pad_longdouble; typedef struct { void *x; char c; } __Pyx_pad_void_p; #ifdef HAVE_LONG_LONG typedef struct { PY_LONG_LONG x; char c; } __Pyx_pad_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToPadding(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_pad_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_pad_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_pad_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_pad_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_pad_float) - sizeof(float); case 'd': return sizeof(__Pyx_pad_double) - sizeof(double); case 'g': return sizeof(__Pyx_pad_longdouble) - sizeof(long double); case 'P': case 'O': return sizeof(__Pyx_pad_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static char __Pyx_BufFmt_TypeCharToGroup(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'h': case 'i': case 'l': case 'q': case 's': case 'p': return 'I'; case 'B': case 'H': case 'I': case 'L': case 'Q': return 'U'; case 'f': case 'd': case 'g': return (is_complex ? 
'C' : 'R'); case 'O': return 'O'; case 'P': return 'P'; default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } static void __Pyx_BufFmt_RaiseExpected(__Pyx_BufFmt_Context* ctx) { if (ctx->head == NULL || ctx->head->field == &ctx->root) { const char* expected; const char* quote; if (ctx->head == NULL) { expected = "end"; quote = ""; } else { expected = ctx->head->field->type->name; quote = "'"; } PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected %s%s%s but got %s", quote, expected, quote, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex)); } else { __Pyx_StructField* field = ctx->head->field; __Pyx_StructField* parent = (ctx->head - 1)->field; PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected '%s' but got %s in '%s.%s'", field->type->name, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex), parent->type->name, field->name); } } static int __Pyx_BufFmt_ProcessTypeChunk(__Pyx_BufFmt_Context* ctx) { char group; size_t size, offset, arraysize = 1; if (ctx->enc_type == 0) return 0; if (ctx->head->field->type->arraysize[0]) { int i, ndim = 0; if (ctx->enc_type == 's' || ctx->enc_type == 'p') { ctx->is_valid_array = ctx->head->field->type->ndim == 1; ndim = 1; if (ctx->enc_count != ctx->head->field->type->arraysize[0]) { PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %zu", ctx->head->field->type->arraysize[0], ctx->enc_count); return -1; } } if (!ctx->is_valid_array) { PyErr_Format(PyExc_ValueError, "Expected %d dimensions, got %d", ctx->head->field->type->ndim, ndim); return -1; } for (i = 0; i < ctx->head->field->type->ndim; i++) { arraysize *= ctx->head->field->type->arraysize[i]; } ctx->is_valid_array = 0; ctx->enc_count = 1; } group = __Pyx_BufFmt_TypeCharToGroup(ctx->enc_type, ctx->is_complex); do { __Pyx_StructField* field = ctx->head->field; __Pyx_TypeInfo* type = field->type; if (ctx->enc_packmode == '@' || ctx->enc_packmode == '^') { size = 
__Pyx_BufFmt_TypeCharToNativeSize(ctx->enc_type, ctx->is_complex); } else { size = __Pyx_BufFmt_TypeCharToStandardSize(ctx->enc_type, ctx->is_complex); } if (ctx->enc_packmode == '@') { size_t align_at = __Pyx_BufFmt_TypeCharToAlignment(ctx->enc_type, ctx->is_complex); size_t align_mod_offset; if (align_at == 0) return -1; align_mod_offset = ctx->fmt_offset % align_at; if (align_mod_offset > 0) ctx->fmt_offset += align_at - align_mod_offset; if (ctx->struct_alignment == 0) ctx->struct_alignment = __Pyx_BufFmt_TypeCharToPadding(ctx->enc_type, ctx->is_complex); } if (type->size != size || type->typegroup != group) { if (type->typegroup == 'C' && type->fields != NULL) { size_t parent_offset = ctx->head->parent_offset + field->offset; ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = parent_offset; continue; } __Pyx_BufFmt_RaiseExpected(ctx); return -1; } offset = ctx->head->parent_offset + field->offset; if (ctx->fmt_offset != offset) { PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch; next field is at offset %"PY_FORMAT_SIZE_T"d but %"PY_FORMAT_SIZE_T"d expected", (Py_ssize_t)ctx->fmt_offset, (Py_ssize_t)offset); return -1; } ctx->fmt_offset += size; if (arraysize) ctx->fmt_offset += (arraysize - 1) * size; --ctx->enc_count; /* Consume from buffer string */ while (1) { if (field == &ctx->root) { ctx->head = NULL; if (ctx->enc_count != 0) { __Pyx_BufFmt_RaiseExpected(ctx); return -1; } break; /* breaks both loops as ctx->enc_count == 0 */ } ctx->head->field = ++field; if (field->type == NULL) { --ctx->head; field = ctx->head->field; continue; } else if (field->type->typegroup == 'S') { size_t parent_offset = ctx->head->parent_offset + field->offset; if (field->type->fields->type == NULL) continue; /* empty struct */ field = field->type->fields; ++ctx->head; ctx->head->field = field; ctx->head->parent_offset = parent_offset; break; } else { break; } } } while (ctx->enc_count); ctx->enc_type = 0; ctx->is_complex = 0; return 0; } static 
CYTHON_INLINE PyObject * __pyx_buffmt_parse_array(__Pyx_BufFmt_Context* ctx, const char** tsp) { const char *ts = *tsp; int i = 0, number; int ndim = ctx->head->field->type->ndim; ++ts; if (ctx->new_count != 1) { PyErr_SetString(PyExc_ValueError, "Cannot handle repeated arrays in format string"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; while (*ts && *ts != ')') { if (isspace(*ts)) { ++ts; continue; } number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; if (i < ndim && (size_t) number != ctx->head->field->type->arraysize[i]) return PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %d", ctx->head->field->type->arraysize[i], number); if (*ts != ',' && *ts != ')') return PyErr_Format(PyExc_ValueError, "Expected a comma in format string, got '%c'", *ts); if (*ts == ',') ts++; i++; } if (i != ndim) return PyErr_Format(PyExc_ValueError, "Expected %d dimension(s), got %d", ctx->head->field->type->ndim, i); if (!*ts) { PyErr_SetString(PyExc_ValueError, "Unexpected end of format string, expected ')'"); return NULL; } ctx->is_valid_array = 1; ctx->new_count = 1; *tsp = ++ts; return Py_None; } static const char* __Pyx_BufFmt_CheckString(__Pyx_BufFmt_Context* ctx, const char* ts) { int got_Z = 0; while (1) { switch(*ts) { case 0: if (ctx->enc_type != 0 && ctx->head == NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; if (ctx->head != NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } return ts; case ' ': case 10: case 13: ++ts; break; case '<': if (!__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Little-endian buffer not supported on big-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '>': case '!': if (__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Big-endian buffer not supported on little-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '=': case '@': case '^': 
ctx->new_packmode = *ts++; break; case 'T': /* substruct */ { const char* ts_after_sub; size_t i, struct_count = ctx->new_count; size_t struct_alignment = ctx->struct_alignment; ctx->new_count = 1; ++ts; if (*ts != '{') { PyErr_SetString(PyExc_ValueError, "Buffer acquisition: Expected '{' after 'T'"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ ctx->enc_count = 0; ctx->struct_alignment = 0; ++ts; ts_after_sub = ts; for (i = 0; i != struct_count; ++i) { ts_after_sub = __Pyx_BufFmt_CheckString(ctx, ts); if (!ts_after_sub) return NULL; } ts = ts_after_sub; if (struct_alignment) ctx->struct_alignment = struct_alignment; } break; case '}': /* end of substruct; either repeat or move on */ { size_t alignment = ctx->struct_alignment; ++ts; if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ if (alignment && ctx->fmt_offset % alignment) { ctx->fmt_offset += alignment - (ctx->fmt_offset % alignment); } } return ts; case 'x': if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->fmt_offset += ctx->new_count; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->enc_packmode = ctx->new_packmode; ++ts; break; case 'Z': got_Z = 1; ++ts; if (*ts != 'f' && *ts != 'd' && *ts != 'g') { __Pyx_BufFmt_RaiseUnexpectedChar('Z'); return NULL; } /* fall through */ case 'c': case 'b': case 'B': case 'h': case 'H': case 'i': case 'I': case 'l': case 'L': case 'q': case 'Q': case 'f': case 'd': case 'g': case 'O': case 's': case 'p': if (ctx->enc_type == *ts && got_Z == ctx->is_complex && ctx->enc_packmode == ctx->new_packmode) { ctx->enc_count += ctx->new_count; } else { if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_count = ctx->new_count; ctx->enc_packmode = ctx->new_packmode; ctx->enc_type = *ts; ctx->is_complex = got_Z; } ++ts; ctx->new_count = 1; got_Z = 0; break; case ':': ++ts; while(*ts 
!= ':') ++ts; ++ts; break; case '(': if (!__pyx_buffmt_parse_array(ctx, &ts)) return NULL; break; default: { int number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; ctx->new_count = (size_t)number; } } } } static CYTHON_INLINE void __Pyx_ZeroBuffer(Py_buffer* buf) { buf->buf = NULL; buf->obj = NULL; buf->strides = __Pyx_zeros; buf->shape = __Pyx_zeros; buf->suboffsets = __Pyx_minusones; } static CYTHON_INLINE int __Pyx_GetBufferAndValidate( Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack) { if (obj == Py_None || obj == NULL) { __Pyx_ZeroBuffer(buf); return 0; } buf->buf = NULL; if (__Pyx_GetBuffer(obj, buf, flags) == -1) goto fail; if (buf->ndim != nd) { PyErr_Format(PyExc_ValueError, "Buffer has wrong number of dimensions (expected %d, got %d)", nd, buf->ndim); goto fail; } if (!cast) { __Pyx_BufFmt_Context ctx; __Pyx_BufFmt_Init(&ctx, stack, dtype); if (!__Pyx_BufFmt_CheckString(&ctx, buf->format)) goto fail; } if ((unsigned)buf->itemsize != dtype->size) { PyErr_Format(PyExc_ValueError, "Item size of buffer (%"PY_FORMAT_SIZE_T"d byte%s) does not match size of '%s' (%"PY_FORMAT_SIZE_T"d byte%s)", buf->itemsize, (buf->itemsize > 1) ? "s" : "", dtype->name, (Py_ssize_t)dtype->size, (dtype->size > 1) ? 
"s" : ""); goto fail; } if (buf->suboffsets == NULL) buf->suboffsets = __Pyx_minusones; return 0; fail:; __Pyx_ZeroBuffer(buf); return -1; } static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info) { if (info->buf == NULL) return; if (info->suboffsets == __Pyx_minusones) info->suboffsets = NULL; __Pyx_ReleaseBuffer(info); } static void __Pyx_RaiseBufferIndexError(int axis) { PyErr_Format(PyExc_IndexError, "Out of bounds on buffer access (axis %d)", axis); } static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb) { #if CYTHON_COMPILING_IN_CPYTHON PyObject *tmp_type, *tmp_value, *tmp_tb; PyThreadState *tstate = PyThreadState_GET(); tmp_type = tstate->curexc_type; tmp_value = tstate->curexc_value; tmp_tb = tstate->curexc_traceback; tstate->curexc_type = type; tstate->curexc_value = value; tstate->curexc_traceback = tb; Py_XDECREF(tmp_type); Py_XDECREF(tmp_value); Py_XDECREF(tmp_tb); #else PyErr_Restore(type, value, tb); #endif } static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb) { #if CYTHON_COMPILING_IN_CPYTHON PyThreadState *tstate = PyThreadState_GET(); *type = tstate->curexc_type; *value = tstate->curexc_value; *tb = tstate->curexc_traceback; tstate->curexc_type = 0; tstate->curexc_value = 0; tstate->curexc_traceback = 0; #else PyErr_Fetch(type, value, tb); #endif } #if PY_MAJOR_VERSION < 3 static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, CYTHON_UNUSED PyObject *cause) { Py_XINCREF(type); Py_XINCREF(value); Py_XINCREF(tb); if (tb == Py_None) { Py_DECREF(tb); tb = 0; } else if (tb != NULL && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto raise_error; } if (value == NULL) { value = Py_None; Py_INCREF(value); } #if PY_VERSION_HEX < 0x02050000 if (!PyClass_Check(type)) #else if (!PyType_Check(type)) #endif { if (value != Py_None) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a 
separate value"); goto raise_error; } Py_DECREF(value); value = type; #if PY_VERSION_HEX < 0x02050000 if (PyInstance_Check(type)) { type = (PyObject*) ((PyInstanceObject*)type)->in_class; Py_INCREF(type); } else { type = 0; PyErr_SetString(PyExc_TypeError, "raise: exception must be an old-style class or instance"); goto raise_error; } #else type = (PyObject*) Py_TYPE(type); Py_INCREF(type); if (!PyType_IsSubtype((PyTypeObject *)type, (PyTypeObject *)PyExc_BaseException)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto raise_error; } #endif } __Pyx_ErrRestore(type, value, tb); return; raise_error: Py_XDECREF(value); Py_XDECREF(type); Py_XDECREF(tb); return; } #else /* Python 3+ */ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause) { if (tb == Py_None) { tb = 0; } else if (tb && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto bad; } if (value == Py_None) value = 0; if (PyExceptionInstance_Check(type)) { if (value) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto bad; } value = type; type = (PyObject*) Py_TYPE(value); } else if (!PyExceptionClass_Check(type)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto bad; } if (cause) { PyObject *fixed_cause; if (PyExceptionClass_Check(cause)) { fixed_cause = PyObject_CallObject(cause, NULL); if (fixed_cause == NULL) goto bad; } else if (PyExceptionInstance_Check(cause)) { fixed_cause = cause; Py_INCREF(fixed_cause); } else { PyErr_SetString(PyExc_TypeError, "exception causes must derive from " "BaseException"); goto bad; } if (!value) { value = PyObject_CallObject(type, NULL); } PyException_SetCause(value, fixed_cause); } PyErr_SetObject(type, value); if (tb) { PyThreadState *tstate = PyThreadState_GET(); PyObject* tmp_tb = tstate->curexc_traceback; if (tb != tmp_tb) { 
Py_INCREF(tb); tstate->curexc_traceback = tb; Py_XDECREF(tmp_tb); } } bad: return; } #endif static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index) { PyErr_Format(PyExc_ValueError, "need more than %"PY_FORMAT_SIZE_T"d value%s to unpack", index, (index == 1) ? "" : "s"); } static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected) { PyErr_Format(PyExc_ValueError, "too many values to unpack (expected %"PY_FORMAT_SIZE_T"d)", expected); } static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void) { PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); } static void __Pyx_UnpackTupleError(PyObject *t, Py_ssize_t index) { if (t == Py_None) { __Pyx_RaiseNoneNotIterableError(); } else if (PyTuple_GET_SIZE(t) < index) { __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(t)); } else { __Pyx_RaiseTooManyValuesError(index); } } static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type) { if (unlikely(!type)) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (likely(PyObject_TypeCheck(obj, type))) return 1; PyErr_Format(PyExc_TypeError, "Cannot convert %.200s to %.200s", Py_TYPE(obj)->tp_name, type->tp_name); return 0; } #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags) { PyObject *getbuffer_cobj; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) return PyObject_GetBuffer(obj, view, flags); #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) return __pyx_pw_5numpy_7ndarray_1__getbuffer__(obj, view, flags); #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (getbuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_getbuffer"))) { getbufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (getbufferproc) PyCapsule_GetPointer(getbuffer_cobj, "getbuffer(obj, view, flags)"); #else func = (getbufferproc) PyCObject_AsVoidPtr(getbuffer_cobj); 
#endif Py_DECREF(getbuffer_cobj); if (!func) goto fail; return func(obj, view, flags); } else { PyErr_Clear(); } #endif PyErr_Format(PyExc_TypeError, "'%100s' does not have the buffer interface", Py_TYPE(obj)->tp_name); #if PY_VERSION_HEX < 0x02060000 fail: #endif return -1; } static void __Pyx_ReleaseBuffer(Py_buffer *view) { PyObject *obj = view->obj; PyObject *releasebuffer_cobj; if (!obj) return; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) { PyBuffer_Release(view); return; } #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) { __pyx_pw_5numpy_7ndarray_3__releasebuffer__(obj, view); return; } #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (releasebuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_releasebuffer"))) { releasebufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (releasebufferproc) PyCapsule_GetPointer(releasebuffer_cobj, "releasebuffer(obj, view)"); #else func = (releasebufferproc) PyCObject_AsVoidPtr(releasebuffer_cobj); #endif Py_DECREF(releasebuffer_cobj); if (!func) goto fail; func(obj, view); return; } else { PyErr_Clear(); } #endif goto nofail; #if PY_VERSION_HEX < 0x02060000 fail: #endif PyErr_WriteUnraisable(obj); nofail: Py_DECREF(obj); view->obj = NULL; } #endif /* PY_MAJOR_VERSION < 3 */ static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, long level) { PyObject *py_import = 0; PyObject *empty_list = 0; PyObject *module = 0; PyObject *global_dict = 0; PyObject *empty_dict = 0; PyObject *list; py_import = __Pyx_GetAttrString(__pyx_b, "__import__"); if (!py_import) goto bad; if (from_list) list = from_list; else { empty_list = PyList_New(0); if (!empty_list) goto bad; list = empty_list; } global_dict = PyModule_GetDict(__pyx_m); if (!global_dict) goto bad; empty_dict = PyDict_New(); if (!empty_dict) goto bad; #if PY_VERSION_HEX >= 0x02050000 { #if PY_MAJOR_VERSION >= 3 if (level == -1) { if 
(strchr(__Pyx_MODULE_NAME, '.')) { /* try package relative import first */ PyObject *py_level = PyInt_FromLong(1); if (!py_level) goto bad; module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, py_level, NULL); Py_DECREF(py_level); if (!module) { if (!PyErr_ExceptionMatches(PyExc_ImportError)) goto bad; PyErr_Clear(); } } level = 0; /* try absolute import on failure */ } #endif if (!module) { PyObject *py_level = PyInt_FromLong(level); if (!py_level) goto bad; module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, py_level, NULL); Py_DECREF(py_level); } } #else if (level>0) { PyErr_SetString(PyExc_RuntimeError, "Relative import is not supported for Python <=2.4."); goto bad; } module = PyObject_CallFunctionObjArgs(py_import, name, global_dict, empty_dict, list, NULL); #endif bad: Py_XDECREF(empty_list); Py_XDECREF(py_import); Py_XDECREF(empty_dict); return module; } static CYTHON_INLINE void __Pyx_RaiseImportError(PyObject *name) { #if PY_MAJOR_VERSION < 3 PyErr_Format(PyExc_ImportError, "cannot import name %.230s", PyString_AsString(name)); #else PyErr_Format(PyExc_ImportError, "cannot import name %S", name); #endif } #if CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return ::std::complex< double >(x, y); } #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return x + y*(__pyx_t_double_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { __pyx_t_double_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex a, __pyx_t_double_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex a, __pyx_t_double_complex b) { 
__pyx_t_double_complex z; z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = a.real; z.imag = -a.imag; return z; } #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrt(z.real*z.real + z.imag*z.imag); #else return hypot(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { double denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 1: return a; case 2: z = __Pyx_c_prod(a, a); return z; case 3: z = __Pyx_c_prod(a, a); 
return __Pyx_c_prod(z, a); case 4: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_abs(a); theta = atan2(a.imag, a.real); } lnr = log(r); z_r = exp(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cos(z_theta); z.imag = z_r * sin(z_theta); return z; } #endif #endif static CYTHON_INLINE int __Pyx_PyBytes_Equals(PyObject* s1, PyObject* s2, int equals) { if (s1 == s2) { return (equals == Py_EQ); } else if (PyBytes_CheckExact(s1) & PyBytes_CheckExact(s2)) { if (PyBytes_GET_SIZE(s1) != PyBytes_GET_SIZE(s2)) { return (equals == Py_NE); } else if (PyBytes_GET_SIZE(s1) == 1) { if (equals == Py_EQ) return (PyBytes_AS_STRING(s1)[0] == PyBytes_AS_STRING(s2)[0]); else return (PyBytes_AS_STRING(s1)[0] != PyBytes_AS_STRING(s2)[0]); } else { int result = memcmp(PyBytes_AS_STRING(s1), PyBytes_AS_STRING(s2), (size_t)PyBytes_GET_SIZE(s1)); return (equals == Py_EQ) ? (result == 0) : (result != 0); } } else if ((s1 == Py_None) & PyBytes_CheckExact(s2)) { return (equals == Py_NE); } else if ((s2 == Py_None) & PyBytes_CheckExact(s1)) { return (equals == Py_NE); } else { int result; PyObject* py_result = PyObject_RichCompare(s1, s2, equals); if (!py_result) return -1; result = __Pyx_PyObject_IsTrue(py_result); Py_DECREF(py_result); return result; } } #if CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return ::std::complex< float >(x, y); } #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return x + y*(__pyx_t_float_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { __pyx_t_float_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex a, 
__pyx_t_float_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = a.real; z.imag = -a.imag; return z; } #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrtf(z.real*z.real + z.imag*z.imag); #else return hypotf(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { float denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch 
((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 1: return a; case 2: z = __Pyx_c_prodf(a, a); return z; case 3: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, a); case 4: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_absf(a); theta = atan2f(a.imag, a.real); } lnr = logf(r); z_r = expf(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cosf(z_theta); z.imag = z_r * sinf(z_theta); return z; } #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject* x) { const unsigned char neg_one = (unsigned char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned char" : "value too large to convert to unsigned char"); } return (unsigned char)-1; } return (unsigned char)val; } return (unsigned char)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject* x) { const unsigned short neg_one = (unsigned short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to unsigned short" : "value too large to convert to unsigned short"); } return (unsigned short)-1; } return (unsigned short)val; } return (unsigned short)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject* x) { const unsigned int neg_one = (unsigned int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned int" : "value too large to convert to unsigned int"); } return (unsigned int)-1; } return (unsigned int)val; } return (unsigned int)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject* x) { const char neg_one = (char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to char" : "value too large to convert to char"); } return (char)-1; } return (char)val; } return (char)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject* x) { const short neg_one = (short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to short" : "value too large to convert to short"); } return (short)-1; } return (short)val; } return (short)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject* x) { const signed char neg_one = (signed char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed char" : "value too large to convert to signed char"); } return (signed char)-1; } return (signed char)val; } return (signed char)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject* x) { const signed short neg_one = (signed short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to signed short" : "value too large to convert to signed short"); } return (signed short)-1; } return (signed short)val; } return (signed short)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject* x) { const signed int neg_one = (signed int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed int" : "value too large to convert to signed int"); } return (signed int)-1; } return (signed int)val; } return (signed int)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject* x) { const unsigned long neg_one = (unsigned long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)PyLong_AsUnsignedLong(x); } else { return (unsigned long)PyLong_AsLong(x); } } else { unsigned long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned long)-1; val = __Pyx_PyInt_AsUnsignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject* x) { const unsigned PY_LONG_LONG neg_one = (unsigned PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (unsigned PY_LONG_LONG)PyLong_AsLongLong(x); } } else { unsigned PY_LONG_LONG val; 
PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned PY_LONG_LONG)-1; val = __Pyx_PyInt_AsUnsignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject* x) { const long neg_one = (long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)PyLong_AsUnsignedLong(x); } else { return (long)PyLong_AsLong(x); } } else { long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (long)-1; val = __Pyx_PyInt_AsLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject* x) { const PY_LONG_LONG neg_one = (PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (PY_LONG_LONG)PyLong_AsLongLong(x); } } else { PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1; val = __Pyx_PyInt_AsLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject* x) { const signed 
long neg_one = (signed long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)PyLong_AsUnsignedLong(x); } else { return (signed long)PyLong_AsLong(x); } } else { signed long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed long)-1; val = __Pyx_PyInt_AsSignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject* x) { const signed PY_LONG_LONG neg_one = (signed PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (signed PY_LONG_LONG)PyLong_AsLongLong(x); } } else { signed PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed PY_LONG_LONG)-1; val = __Pyx_PyInt_AsSignedLongLong(tmp); Py_DECREF(tmp); return val; } } static int __Pyx_check_binary_version(void) { char ctversion[4], rtversion[4]; PyOS_snprintf(ctversion, 4, "%d.%d", PY_MAJOR_VERSION, 
PY_MINOR_VERSION); PyOS_snprintf(rtversion, 4, "%s", Py_GetVersion()); if (ctversion[0] != rtversion[0] || ctversion[2] != rtversion[2]) { char message[200]; PyOS_snprintf(message, sizeof(message), "compiletime version %s of module '%.100s' " "does not match runtime version %s", ctversion, __Pyx_MODULE_NAME, rtversion); #if PY_VERSION_HEX < 0x02050000 return PyErr_Warn(NULL, message); #else return PyErr_WarnEx(NULL, message, 1); #endif } return 0; } #ifndef __PYX_HAVE_RT_ImportType #define __PYX_HAVE_RT_ImportType static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict) { PyObject *py_module = 0; PyObject *result = 0; PyObject *py_name = 0; char warning[200]; py_module = __Pyx_ImportModule(module_name); if (!py_module) goto bad; py_name = __Pyx_PyIdentifier_FromString(class_name); if (!py_name) goto bad; result = PyObject_GetAttr(py_module, py_name); Py_DECREF(py_name); py_name = 0; Py_DECREF(py_module); py_module = 0; if (!result) goto bad; if (!PyType_Check(result)) { PyErr_Format(PyExc_TypeError, "%s.%s is not a type object", module_name, class_name); goto bad; } if (!strict && (size_t)((PyTypeObject *)result)->tp_basicsize > size) { PyOS_snprintf(warning, sizeof(warning), "%s.%s size changed, may indicate binary incompatibility", module_name, class_name); #if PY_VERSION_HEX < 0x02050000 if (PyErr_Warn(NULL, warning) < 0) goto bad; #else if (PyErr_WarnEx(NULL, warning, 0) < 0) goto bad; #endif } else if ((size_t)((PyTypeObject *)result)->tp_basicsize != size) { PyErr_Format(PyExc_ValueError, "%s.%s has the wrong size, try recompiling", module_name, class_name); goto bad; } return (PyTypeObject *)result; bad: Py_XDECREF(py_module); Py_XDECREF(result); return NULL; } #endif #ifndef __PYX_HAVE_RT_ImportModule #define __PYX_HAVE_RT_ImportModule static PyObject *__Pyx_ImportModule(const char *name) { PyObject *py_name = 0; PyObject *py_module = 0; py_name = __Pyx_PyIdentifier_FromString(name); if (!py_name) goto 
bad; py_module = PyImport_Import(py_name); Py_DECREF(py_name); return py_module; bad: Py_XDECREF(py_name); return 0; } #endif static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line) { int start = 0, mid = 0, end = count - 1; if (end >= 0 && code_line > entries[end].code_line) { return count; } while (start < end) { mid = (start + end) / 2; if (code_line < entries[mid].code_line) { end = mid; } else if (code_line > entries[mid].code_line) { start = mid + 1; } else { return mid; } } if (code_line <= entries[mid].code_line) { return mid; } else { return mid + 1; } } static PyCodeObject *__pyx_find_code_object(int code_line) { PyCodeObject* code_object; int pos; if (unlikely(!code_line) || unlikely(!__pyx_code_cache.entries)) { return NULL; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if (unlikely(pos >= __pyx_code_cache.count) || unlikely(__pyx_code_cache.entries[pos].code_line != code_line)) { return NULL; } code_object = __pyx_code_cache.entries[pos].code_object; Py_INCREF(code_object); return code_object; } static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object) { int pos, i; __Pyx_CodeObjectCacheEntry* entries = __pyx_code_cache.entries; if (unlikely(!code_line)) { return; } if (unlikely(!entries)) { entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Malloc(64*sizeof(__Pyx_CodeObjectCacheEntry)); if (likely(entries)) { __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = 64; __pyx_code_cache.count = 1; entries[0].code_line = code_line; entries[0].code_object = code_object; Py_INCREF(code_object); } return; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if ((pos < __pyx_code_cache.count) && unlikely(__pyx_code_cache.entries[pos].code_line == code_line)) { PyCodeObject* tmp = entries[pos].code_object; entries[pos].code_object = code_object; Py_DECREF(tmp); return; } if (__pyx_code_cache.count == 
__pyx_code_cache.max_count) { int new_max = __pyx_code_cache.max_count + 64; entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Realloc( __pyx_code_cache.entries, new_max*sizeof(__Pyx_CodeObjectCacheEntry)); if (unlikely(!entries)) { return; } __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = new_max; } for (i=__pyx_code_cache.count; i>pos; i--) { entries[i] = entries[i-1]; } entries[pos].code_line = code_line; entries[pos].code_object = code_object; __pyx_code_cache.count++; Py_INCREF(code_object); } #include "compile.h" #include "frameobject.h" #include "traceback.h" static PyCodeObject* __Pyx_CreateCodeObjectForTraceback( const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_srcfile = 0; PyObject *py_funcname = 0; #if PY_MAJOR_VERSION < 3 py_srcfile = PyString_FromString(filename); #else py_srcfile = PyUnicode_FromString(filename); #endif if (!py_srcfile) goto bad; if (c_line) { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #else py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #endif } else { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromString(funcname); #else py_funcname = PyUnicode_FromString(funcname); #endif } if (!py_funcname) goto bad; py_code = __Pyx_PyCode_New( 0, /*int argcount,*/ 0, /*int kwonlyargcount,*/ 0, /*int nlocals,*/ 0, /*int stacksize,*/ 0, /*int flags,*/ __pyx_empty_bytes, /*PyObject *code,*/ __pyx_empty_tuple, /*PyObject *consts,*/ __pyx_empty_tuple, /*PyObject *names,*/ __pyx_empty_tuple, /*PyObject *varnames,*/ __pyx_empty_tuple, /*PyObject *freevars,*/ __pyx_empty_tuple, /*PyObject *cellvars,*/ py_srcfile, /*PyObject *filename,*/ py_funcname, /*PyObject *name,*/ py_line, /*int firstlineno,*/ __pyx_empty_bytes /*PyObject *lnotab*/ ); Py_DECREF(py_srcfile); Py_DECREF(py_funcname); return py_code; bad: Py_XDECREF(py_srcfile); Py_XDECREF(py_funcname); return NULL; } static 
void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_globals = 0; PyFrameObject *py_frame = 0; py_code = __pyx_find_code_object(c_line ? c_line : py_line); if (!py_code) { py_code = __Pyx_CreateCodeObjectForTraceback( funcname, c_line, py_line, filename); if (!py_code) goto bad; __pyx_insert_code_object(c_line ? c_line : py_line, py_code); } py_globals = PyModule_GetDict(__pyx_m); if (!py_globals) goto bad; py_frame = PyFrame_New( PyThreadState_GET(), /*PyThreadState *tstate,*/ py_code, /*PyCodeObject *code,*/ py_globals, /*PyObject *globals,*/ 0 /*PyObject *locals*/ ); if (!py_frame) goto bad; py_frame->f_lineno = py_line; PyTraceBack_Here(py_frame); bad: Py_XDECREF(py_code); Py_XDECREF(py_frame); } static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) { while (t->p) { #if PY_MAJOR_VERSION < 3 if (t->is_unicode) { *t->p = PyUnicode_DecodeUTF8(t->s, t->n - 1, NULL); } else if (t->intern) { *t->p = PyString_InternFromString(t->s); } else { *t->p = PyString_FromStringAndSize(t->s, t->n - 1); } #else /* Python 3+ has unicode identifiers */ if (t->is_unicode | t->is_str) { if (t->intern) { *t->p = PyUnicode_InternFromString(t->s); } else if (t->encoding) { *t->p = PyUnicode_Decode(t->s, t->n - 1, t->encoding, NULL); } else { *t->p = PyUnicode_FromStringAndSize(t->s, t->n - 1); } } else { *t->p = PyBytes_FromStringAndSize(t->s, t->n - 1); } #endif if (!*t->p) return -1; ++t; } return 0; } /* Type Conversion Functions */ static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) { int is_true = x == Py_True; if (is_true | (x == Py_False) | (x == Py_None)) return is_true; else return PyObject_IsTrue(x); } static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x) { PyNumberMethods *m; const char *name = NULL; PyObject *res = NULL; #if PY_VERSION_HEX < 0x03000000 if (PyInt_Check(x) || PyLong_Check(x)) #else if (PyLong_Check(x)) #endif return Py_INCREF(x), x; m = Py_TYPE(x)->tp_as_number; 
#if PY_VERSION_HEX < 0x03000000 if (m && m->nb_int) { name = "int"; res = PyNumber_Int(x); } else if (m && m->nb_long) { name = "long"; res = PyNumber_Long(x); } #else if (m && m->nb_int) { name = "int"; res = PyNumber_Long(x); } #endif if (res) { #if PY_VERSION_HEX < 0x03000000 if (!PyInt_Check(res) && !PyLong_Check(res)) { #else if (!PyLong_Check(res)) { #endif PyErr_Format(PyExc_TypeError, "__%s__ returned non-%s (type %.200s)", name, name, Py_TYPE(res)->tp_name); Py_DECREF(res); return NULL; } } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_TypeError, "an integer is required"); } return res; } static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) { Py_ssize_t ival; PyObject* x = PyNumber_Index(b); if (!x) return -1; ival = PyInt_AsSsize_t(x); Py_DECREF(x); return ival; } static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) { #if PY_VERSION_HEX < 0x02050000 if (ival <= LONG_MAX) return PyInt_FromLong((long)ival); else { unsigned char *bytes = (unsigned char *) &ival; int one = 1; int little = (int)*(unsigned char*)&one; return _PyLong_FromByteArray(bytes, sizeof(size_t), little, 0); } #else return PyInt_FromSize_t(ival); #endif } static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject* x) { unsigned PY_LONG_LONG val = __Pyx_PyInt_AsUnsignedLongLong(x); if (unlikely(val == (unsigned PY_LONG_LONG)-1 && PyErr_Occurred())) { return (size_t)-1; } else if (unlikely(val != (unsigned PY_LONG_LONG)(size_t)val)) { PyErr_SetString(PyExc_OverflowError, "value too large to convert to size_t"); return (size_t)-1; } return (size_t)val; } #endif /* Py_PYTHON_H */ PyCogent-1.5.3/cogent/maths/_period.pyx000644 000765 000024 00000003455 11443272207 021027 0ustar00jrideoutstaff000000 000000 from numpy import pi, exp, sqrt, cos cimport numpy as np # TODO intro the version idea of peter's see email from him on Wednesday, 26 May 2010 def goertzel_inner(np.ndarray[np.float64_t, ndim=1] x, int N, int period): """returns the power from series x for 
period""" cdef int n cdef np.float64_t coeff, s, s_prev, s_prev2, power coeff = 2.0 * cos(2 * pi / period) s_prev = 0.0 s_prev2 = 0.0 for n in range(N): s = x[n] + coeff * s_prev - s_prev2 s_prev2 = s_prev s_prev = s power = sqrt(s_prev2**2 + s_prev**2 - coeff * s_prev2 * s_prev) return power def ipdft_inner(np.ndarray[np.float64_t, ndim=1] x, np.ndarray[np.complex128_t, ndim=1] X, np.ndarray[np.complex128_t, ndim=1] W, int ulim, int N): """use this when repeated calls for window of same length are to be made""" cdef int n, p cdef np.complex128_t w for p in range(ulim): w = 1.0 for n in range(N): if n != 0: w = w * W[p] X[p] = X[p] + x[n]*w return X def autocorr_inner(np.ndarray[np.float64_t, ndim=1] x, np.ndarray[np.float64_t, ndim=1] xc, int N): cdef int m, n for m in range(-N+1, N): for n in range(N): if 0 <= n-m < N: xc[m+N-1] += (x[n]*x[n-m]) def seq_to_symbols(char* seq, list motifs, int motif_length, np.ndarray[np.uint8_t, ndim=1] result): cdef int i, j, N, num_motifs cdef bytes got N = len(seq) num_motifs = len(motifs) motif_length = len(motifs[0]) for i in range(N - motif_length + 1): got = seq[i: i+motif_length] for j in range(num_motifs): if got == motifs[j]: result[i] = 1 return result PyCogent-1.5.3/cogent/maths/distance_transform.py000644 000765 000024 00000146504 12024702176 023105 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ matrix based distance metrics, and related coordinate transforms functions to compute distance matrices row by row from abundance matrices, typically samples (rows) vs. species/OTU's (cols) DISTANCE FUNCTIONS For distance functions, the API resembles the following (but see function docstring for specifics): * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. 
shape (inputrows, inputrows) for sane input data * two rows of all zeros *typically* returns 0 distance between them * negative values are only allowed for some distance metrics, in these cases if strict==True, negative input values raise a ValueError, and if strict==False, errors or misleading return values may result * functions prefaced with "binary" consider only presence/absence in input data (qualitative rather than quantitative) TRANSFORM FUNCTIONS * For transform functions, very little error checking exists. 0/0 evals in transform formulas will throw errors, and negative data will return spurious results or throw errors * The transform functions are as described in Legendre, P. and E. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia: 129: 271-280. These transformations allow the use of ordination methods such as PCA and RDA, which are Euclidean-based, for the analysis of community data, while circumventing the problems associated with the Euclidean distance. The matrix that is returned still has samples as rows and species as columns, but the values are transformed so that when programs such as PCA calculate euclidean distances on the matrix, chord, chisquare, 'species profile', or hellinger distances will result. EXAMPLE USAGE: >from distance_transform import dist_euclidean >from numpy import array >abundance_data = array([[1, 3], [5, 2], [0.1, 22]],'d') >dists = dist_euclidean(abundance_data) >print dists array([[ 0. , 4.12310563, 19.02130385], [ 4.12310563, 0. , 20.5915031 ], [ 19.02130385, 20.5915031 , 0. 
]]) """ from __future__ import division import numpy from numpy import (array, zeros, logical_and, logical_or, logical_xor, where, mean, std, argsort, take, ravel, logical_not, shape, sqrt, abs, sum, square, asmatrix, asarray, multiply, min, rank, any, all, isfinite, nonzero, nan_to_num, geterr, seterr, isnan) # any, all from numpy override built in any, all, preventing: # ValueError: The truth value of an array with more than one element is # ambiguous. Use a.any() or a.all() from numpy.linalg import norm __author__ = "Justin Kuczynski" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Micah Hamady", "Justin Kuczynski", "Zongzhi Liu", "Catherine Lozupone", "Antonio Gonzalez Pena", "Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Justin Kuczynski" __email__ = "justinak@gmail.com" __status__ = "Prototype" def _rankdata(a): """ Ranks the data in a, dealing with ties appropritely. First ravels a. Adapted from Gary Perlman's |Stat ranksort. private helper function Returns: array of length equal to a, containing rank scores """ a = ravel(a) n = len(a) ivec = argsort(a) svec = take(a, ivec) sumranks = dupcount = 0 newarray = zeros(n,'d') for i in range(n): sumranks = sumranks + i dupcount = dupcount + 1 if i==n-1 or svec[i] <> svec[i+1]: averank = sumranks / float(dupcount) + 1 for j in range(i-dupcount+1,i+1): newarray[ivec[j]] = averank sumranks = dupcount = 0 return newarray def trans_chord(m): """perform a chord distance transformation on the rows of m transforms m to m' so that the euclidean dist between the rows of m' equals the chord dist between the rows of m. Ref: Legendre, P. and E. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia: 129: 271-280. 
""" m = asmatrix(m) row_norms = sqrt(sum(square(m), axis=1)) result = m / row_norms return result def trans_chisq(m): """perform a chi squared distance transformation on the rows of m transforms m to m' so that the euclidean dist between the rows of m' equals the chi squared dist between the rows of m. Ref: Legendre, P. and E. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia: 129: 271-280. """ m = asmatrix(m) grand_sum, row_sums, col_sums = m.sum(), m.sum(1), m.sum(0) result = m * sqrt(grand_sum) result /= row_sums result /= sqrt(col_sums) return result def trans_specprof(m): """perform a species profile distance transformation on the rows of m transforms m to m' so that the euclidean dist between the rows of m' equals the species profile dist between the rows of m. Ref: Legendre, P. and E. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia: 129: 271-280. """ m = asmatrix(m) row_sums = sum(m, axis=1) result = m / row_sums return result def trans_hellinger(m): """perform a hellinger distance transformation on the rows of m transforms m to m' so that the euclidean dist between the rows of m' equals the hellinger dist between the rows of m. Ref: Legendre, P. and E. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia: 129: 271-280. """ m = asmatrix(m) row_sums = sum(m, axis=1) result = sqrt(m / row_sums) return result def dist_bray_curtis(datamtx, strict=True): """ returns bray curtis distance (normalized manhattan distance) btw rows dist(a,b) = manhattan distance / sum on i( (a_i + b_i) ) see for example: Faith et al. 1987 Compositional dissimilarity as a robust measure of ecological distance Vegitatio * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. 
shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. """ if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') for i in range(numrows): r1 = datamtx[i,:] for j in range(i): r2 = datamtx[j,:] abs_v = float(sum(abs(r1 - r2))) v = sum(r1 + r2) cur_d = 0.0 if v > 0: cur_d = abs_v/v dists[i][j] = dists[j][i] = cur_d return dists dist_bray_curtis_faith = dist_bray_curtis def dist_bray_curtis_magurran(datamtx, strict=True): """ returns bray curtis distance (quantitative sorensen) btw rows dist(a,b) = 2*sum on i( min( a_i, b_i)) / sum on i( (a_i + b_i) ) see for example: Magurran 2004 Bray 1957 * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). 
If 0 rows or 0 colunms, also returns an empty 2d array. """ if strict: if not numpy.all(numpy.isfinite(datamtx)): raise ValueError("non finite number in input matrix") if numpy.any(datamtx<0.0): raise ValueError("negative value in input matrix") if numpy.rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = numpy.shape(datamtx) else: try: numrows, numcols = numpy.shape(datamtx) except ValueError: return numpy.zeros((0,0),'d') if numrows == 0 or numcols == 0: return numpy.zeros((0,0),'d') dists = numpy.zeros((numrows,numrows),'d') for i in range(numrows): r1 = datamtx[i,:] r1sum = r1.sum() for j in range(i): r2 = datamtx[j,:] r2sum = r2.sum() minvals = numpy.min([r1,r2],axis=0) if (r1sum + r2sum) == 0: dists[i][j] = dists[j][i] = 0.0 else: dissim = 1 - ( (2*minvals.sum()) / (r1sum + r2sum) ) dists[i][j] = dists[j][i] = dissim return dists def dist_canberra(datamtx, strict=True): """returns a row-row canberra dist matrix see for example: Faith et al. 1987 Compositional dissimilarity as a robust measure of ecological distance Vegitatio * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
* chisq dist normalizes by column sums - empty columns (all zeros) are ignored here """ if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') oldstate = seterr(invalid='ignore',divide='ignore') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') for i in range(numrows): r1 = datamtx[i] for j in range(i): r2 = datamtx[j] dist = 0.0 net = abs( r1 - r2 ) / (r1 + r2) net = nan_to_num(net) num_nonzeros = nonzero(net)[0].size dists[i,j] = dists[j,i] = nan_to_num(net.sum()/num_nonzeros) seterr(**oldstate) return dists def dist_chisq(datamtx, strict=True): """returns a row-row chisq dist matrix see for example: Faith et al. 1987 Compositional dissimilarity as a robust measure of ecological distance Vegitatio * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
* chisq dist normalizes by column sums - empty columns (all zeros) are ignored here """ if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') sqrt_grand_sum = sqrt(sum(datamtx)) rowsums, colsums = sum(datamtx, axis=1), sum(datamtx, axis=0) if not colsums.all(): for i in range(len(colsums)): if colsums[i] == 0.0: colsums[i] = 1.0 for i in range(numrows): r1 = datamtx[i] r1sum = rowsums[i] for j in range(i): r2 = datamtx[j] r2sum = rowsums[j] if r1sum == 0.0 or r2sum == 0.0: if r1sum == 0.0 and r2sum == 0.0: dist = 0.0 else: dist = 1.0 else: dist = sqrt_grand_sum *\ sqrt(sum( multiply((1./colsums) , square(r1/r1sum - r2/r2sum)) )) dists[i,j] = dists[j,i] = dist return dists def dist_chord(datamtx, strict=True): """returns a row-row chord dist matrix attributed to Orloci (with accent). see Legendre 2001, ecologically meaningful... * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a 2d matrix. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') for i in range(numrows): r1 = datamtx[i] # cache here r1norm = norm(r1) for j in range(i): r2 = datamtx[j] r2norm = norm(r2) if r1norm == 0.0 or r2norm == 0.0: if r1norm == 0.0 and r2norm == 0.0: dist = 0.0 else: dist = 1.0 else: dist = norm(r1/r1norm - r2/r2norm) dists[i,j] = dists[j,i] = dist return dists def dist_euclidean(datamtx, strict=True): """returns a row by row euclidean dist matrix returns the euclidean norm of row1 - row2 for all rows in datamtx * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * if strict==True, raises ValueError if any of the input data is not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a 2d matrix. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" datamtx = asarray(datamtx, 'd') if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') for r in range(numrows): for c in range(r): dist = norm(datamtx[r] - datamtx[c]) if isnan(dist): raise RuntimeError('ERROR: overflow when computing euclidean distance') dists[r,c] = dists[c,r] = dist return dists def dist_gower(datamtx, strict=True): """returns a row-row gower dist matrix see for example, Faith et al., 1987 * note that the comparison between any two rows is dependent on the entire data matrix, d_ij is a fn of all of datamtx, not just i,j * comparisons are between rows (samples) * any column containing identical data for all rows is ignored (this prevents a 0/0 error in the formula for gower distance * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * if strict==True, raises ValueError if any of the input data is not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a 2d matrix. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') coldiffs = datamtx.max(axis=0) - datamtx.min(axis=0) for i in range(numcols): if coldiffs[i] == 0.0: coldiffs[i] = 1.0 # numerator will be zero anyway for i in range(numrows): r1 = datamtx[i] for j in range(i): r2 = datamtx[j] rowdiff = r2 - r1 dist = sum(abs(r1 - r2) / coldiffs) dists[i,j] = dists[j,i] = dist return dists def dist_hellinger(datamtx, strict=True): """returns a row-row hellinger dist matrix * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') for i in range(numrows): r1 = datamtx[i] r1sum = sum(r1) for j in range(i): r2 = datamtx[j] r2sum = sum(r2) if r1sum == 0.0 or r2sum == 0.0: if r1sum == 0.0 and r2sum == 0.0: dist = 0.0 else: dist = 1.0 else: dist = norm(sqrt(r1/r1sum) - sqrt(r2/r2sum)) dists[i,j] = dists[j,i] = dist return dists def dist_kulczynski(datamtx, strict=True): """ calculates the kulczynski distances between rows of a matrix see for example Faith et al., composiitonal dissimilarity, 1987 returns a distance of 1 between a row of zeros and a row with at least one nonzero element * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') rowsums = datamtx.sum(axis=1) # rowsum: the sum of elements in a row # cache to avoid recalculating for each pair for i in range(numrows): irowsum = rowsums[i] r1 = datamtx[i] for j in range(i): r2 = datamtx[j] jrowsum = rowsums[j] rowminsum = float(sum(where(r1 two rows of zeros elif (irowsum == 0.0 or jrowsum == 0.0): cur_d = 1.0 # one row zeros, one not all zeros else: cur_d = 1.0 - (((rowminsum/irowsum) + (rowminsum/jrowsum))/2.0) dists[i][j] = dists[j][i] = cur_d return dists def dist_manhattan(datamtx, strict=True): """ returns manhattan (city block) distance between rows dist(a,b) = sum on i( abs(a_i - b_i) ) negative values ok (but not tested) * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * if strict==True, raises ValueError if any of the input data is not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a 2d matrix. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') for i in range(numrows): r1 = datamtx[i] # cache here for j in range(i): dists[i,j] = dists[j,i] = sum(abs(r1 - datamtx[j])) return dists def dist_abund_jaccard(datamtx, strict=True): """Calculate abundance-based Jaccard distance between rows The abundance-based Jaccard index is defined in Chao et. al., Ecology Lett. 8, 148 (2005), eq. 5: J_abd = UV / (U + V - UV), where U = sum of relative abundances of shared species in a V = sum of relative abundances of shared species in b The Chao-Jaccard distance is 1 - J_abd * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') rowsums = datamtx.sum(axis=1, dtype='float') for i in range(numrows): row1 = datamtx[i] N1 = rowsums[i] for j in range(i): row2 = datamtx[j] N2 = rowsums[j] if N1 == 0.0 and N2 == 0.0: similarity = 1.0 elif N1 == 0.0 or N2 == 0.0: similarity = 0.0 else: shared = logical_and(row1, row2) u = sum(row1[shared]) / N1 v = sum(row2[shared]) / N2 # Verified by graphical inspection if u == 0.0 and v == 0.0: similarity = 0.0 else: similarity = (u * v) / (u + v - (u * v)) dists[i][j] = dists[j][i] = 1 - similarity return dists def dist_morisita_horn(datamtx, strict=True): """ returns morisita-horn distance between rows dist(a,b) = 1 - 2*sum(a_i * b_i) /( (d_a + d_b)* N_a * N_b ) see book: magurran 2004 pg 246 * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') rowsums = datamtx.sum(axis=1, dtype='float') row_ds = (datamtx**2).sum(axis=1, dtype='float') # these are d_a, etc for i in range(numrows): if row_ds[i] !=0.: row_ds[i] = row_ds[i] / rowsums[i]**2 # this leaves row_ds zero if actually 0/0 for i in range(numrows): row1 = datamtx[i] N1 = rowsums[i] d1 = row_ds[i] for j in range(i): row2 = datamtx[j] N2 = rowsums[j] d2 = row_ds[j] if N2 == 0.0 and N1==0.0: dist = 0.0 elif N2 == 0.0 or N1==0.0: dist = 1.0 else: # d's zero only if N's zero, and we already checked for that similarity = 2*sum(row1*row2) similarity = similarity / ( (d1 + d2) * N1 * N2 ) dist = 1 - similarity dists[i][j] = dists[j][i] = dist return dists def dist_pearson(datamtx, strict=True): """ Calculates pearson distance (1-r) between rows note that the literature sometimer refers to the pearson dissimilarity as (1 - r)/2 (e.g.: BC Blaxall et al. 2003: Differential Myocardial Gene Expression in the Development and Rescue of Murine Heart Failure) for pearson's r, see for example: Thirteen Ways to Look at the Correlation Coefficient by J rodgers, 1988 * distance varies between 0-2, inclusive. * Flat rows (all elements itentical) will return a distance of 1 relative to any non-flat row, and a distance of zero to another flat row * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. 
shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a 2d matrix If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. """ if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') rowmeans = mean(datamtx, axis=1) rowstds = std(datamtx, axis=1) dists = zeros((numrows,numrows),'d') n = float(numrows) for i in range(numrows): r1 = datamtx[i,:] r1m = rowmeans[i] r1dev = r1 - r1m for j in range(i): r2 = datamtx[j,:] r2m = rowmeans[j] r2dev = r2 - r2m top = sum(r1dev*r2dev) sum1 = sum(r1dev**2) sum2 = sum(r2dev**2) if (sum1 == 0.0 and sum2 == 0.0): r = 1.0 elif (sum1 == 0.0 or sum2 == 0.0): r = 0.0 else: bottom = sqrt(sum1 * sum2) r = top/bottom dists[i][j] = dists[j][i] = 1.0 - r return dists def dist_soergel(datamtx, strict=True): """ Calculate soergel distance between rows of a matrix see for example Evaluation of Distance Metrics..., Fechner 2004 dist(a,b) = sum on i( abs(a_i - b_i) ) / sum on i( max(a_i, b_i) ) returns: a symmetric distance matrix, numrows X numrows * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. 
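The pearson distance (1 - r) described above can be sketched directly from the definition of r (illustrative only, not part of this module; no guard for flat rows, which the real function handles):

```python
from math import sqrt

def pearson_dist(r1, r2):
    n = len(r1)
    m1, m2 = sum(r1) / float(n), sum(r2) / float(n)
    dev1 = [x - m1 for x in r1]
    dev2 = [x - m2 for x in r2]
    top = sum(a * b for a, b in zip(dev1, dev2))
    bottom = sqrt(sum(a * a for a in dev1) * sum(b * b for b in dev2))
    return 1.0 - top / bottom

d = pearson_dist([1, 2, 3], [3, 2, 1])  # perfectly anti-correlated: r = -1
```

As the docstring notes, the distance ranges over [0, 2]; this anti-correlated pair lands at the maximum, 2.0.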
shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. """ if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') for i in range(numrows): r1 = datamtx[i,:] for j in range(i): r2 = datamtx[j,:] top = float(sum(abs(r1 - r2))) bot = float(sum(where(r1>r2, r1,r2))) if bot <= 0.0: cur_d = 0.0 else: cur_d = top/bot dists[i][j] = dists[j][i] = cur_d return dists def dist_spearman_approx(datamtx, strict=True): """ Calculate spearman rank distance (1-r) using an approximation formula considers only rank order of elements in a row, averaging ties [19.2, 2.1, 0.03, 2.1] -> [3, 1.5, 0, 1.5] then performs dist(a,b) = 6 * sum(D^2) / (N*(N^2 - 1)) where D is difference in rank of element i between row a and row b, N is the length of either row * formula fails for < 2 columns, returns a zeros matrix * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. 
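The soergel distance above divides the summed absolute differences by the summed elementwise maxima. A minimal sketch (illustrative only, not part of this module):

```python
def soergel_dist(r1, r2):
    # sum |a_i - b_i| / sum max(a_i, b_i); 0 when both rows are all zeros
    top = float(sum(abs(a - b) for a, b in zip(r1, r2)))
    bot = float(sum(max(a, b) for a, b in zip(r1, r2)))
    return top / bot if bot > 0 else 0.0

d = soergel_dist([2, 2], [1, 1])  # 2 / 4 = 0.5
```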
shape (inputrows, inputrows) for sane input data * if strict==True, raises ValueError if any of the input data is not finite, or if the input data is not a rank 2 array (a matrix), or if there are less than 2 colunms * if strict==False, assumes input data is a 2d matrix. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. """ if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) if numcols < 2: raise ValueError("input matrix has < 2 colunms") else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') if numcols < 2: return dists # formula fails for < 2 elements per row for i in range(numrows): r1 = datamtx[i,:] rank1 = _rankdata(r1) for j in range(i): r2 = datamtx[j,:] rank2 = _rankdata(r2) rankdiff = rank1 - rank2 dsqsum = sum((rankdiff)**2) dist = 6*dsqsum / float(numcols*(numcols**2-1)) dists[i][j] = dists[j][i] = dist return dists def dist_specprof(datamtx, strict=True): """returns a row-row species profile distance matrix * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
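The spearman approximation above ranks each row (zero-based, ties averaged, per the docstring's example) and applies 6*sum(D^2)/(N*(N^2-1)). A minimal sketch of both steps (illustrative only, not part of this module; the rank helper mirrors what _rankdata is described as doing):

```python
def zero_based_ranks(row):
    # average ranks starting at 0, ties averaged:
    # [19.2, 2.1, 0.03, 2.1] -> [3, 1.5, 0, 1.5]
    order = sorted(range(len(row)), key=lambda k: row[k])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0
        i = j + 1
    return ranks

def spearman_approx_dist(r1, r2):
    n = len(r1)
    dsqsum = sum((a - b) ** 2
                 for a, b in zip(zero_based_ranks(r1), zero_based_ranks(r2)))
    return 6.0 * dsqsum / (n * (n * n - 1))

d = spearman_approx_dist([1, 2, 3], [3, 2, 1])  # fully reversed ranks
```

For fully reversed ranks the formula gives 6*8/(3*8) = 2.0, the maximum.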
""" if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') for i in range(numrows): r1 = datamtx[i] r1sum = sum(r1) for j in range(i): r2 = datamtx[j] r2sum = sum(r2) if r1sum == 0.0 or r2sum == 0.0: if r1sum == 0.0 and r2sum == 0.0: dist = 0.0 else: dist = 1.0 else: dist = norm((r1/r1sum) - (r2/r2sum)) dists[i,j] = dists[j,i] = dist return dists def binary_dist_otu_gain(otumtx): """ Calculates number of new OTUs observed in sample A wrt sample B This is an non-phylogenetic distance matrix analagous to unifrac_g. The number of OTUs gained in each sample is computed with respect to each other sample. """ result = [] for i in otumtx: row = [] for j in otumtx: gain = 0 for i_val, j_val in zip(i, j): if i_val > 0 and j_val == 0: gain += 1 row.append(gain) result.append(row) return array(result) def binary_dist_chisq(datamtx, strict=True): """Calculates binary chi-square dist between rows, returns dist matrix. converts input array to bool, then uses dist_chisq """ datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) return dist_chisq(datamtx, strict=True) def binary_dist_chord(datamtx, strict=True): """Calculates binary chord dist between rows, returns dist matrix. converts input array to bool, then uses dist_chisq for binary data, this is identical to a binary hellinger distance """ datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) return dist_chord(datamtx, strict=True) def binary_dist_sorensen_dice(datamtx, strict=True): """Calculates Sorensen-Dice distance btw rows, returning distance matrix. Note: Treats array as bool. 
This distance = 1 - dice's coincidence index see Measures of the Amount of Ecologic Association Between Species Author(s): Lee R. Dice, 1945 The 'o' in sorensen should be a non-ascii char, but isn't here for ease of use this is identical to a binary bray-curtis distance, as well as the binary whittaker distance metric. a = num 1's in a b = num 1's in b c = num that are 1's in both a and b Dice dist = 1 - (2*c)/(a + b). also known as whittaker: whittaker = (a + b - c)/( 0.5*(a+b) ) - 1 * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * negative input values are not allowed - will return nonsensical results and/or throw errors """ datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') rowsums = datamtx.sum(axis=1) for i in range(numrows): row1 = datamtx[i] for j in range(i): row2 = datamtx[j] bottom = float(rowsums[i] + rowsums[j]) cur_d = 0.0 if bottom: cur_d = 1-(2*logical_and(row1,row2).sum()/bottom) dists[i][j] = dists[j][i] = cur_d return dists def binary_dist_euclidean(datamtx, strict=True): """Calculates binary euclidean distance between rows, returns dist matrix. 
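Using the docstring's notation (a and b are the presence counts of each row, c the shared presences), the Sorensen-Dice distance 1 - 2c/(a + b) can be sketched in a few lines (illustrative only, not part of this module):

```python
def dice_dist(r1, r2):
    # a, b: presences per row; c: positions present in both
    a = sum(1 for x in r1 if x)
    b = sum(1 for x in r2 if x)
    c = sum(1 for x, y in zip(r1, r2) if x and y)
    return 1.0 - 2.0 * c / (a + b) if (a + b) else 0.0

d = dice_dist([1, 1, 0], [1, 0, 1])  # a=2, b=2, c=1 -> 1 - 2/4 = 0.5
```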
converts input array to bool, then uses dist_euclidean """ datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) return dist_euclidean(datamtx, strict=strict) def binary_dist_hamming(datamtx, strict=True): """Calculates hamming distance btw rows, returning distance matrix. Note: Treats array as bool. see for example wikipedia hamming_distance, 20 jan 2008 hamming is identical to binary manhattan distance Binary hamming: a = num 1's in a b = num 1's in b c = num that are 1's in both a and b hamm = a + b - 2c * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 columns, also returns an empty 2d array.
""" datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') rowsums = datamtx.sum(axis=1) for i in range(numrows): first = datamtx[i] a = rowsums[i] for j in range(i): second = datamtx[j] b = rowsums[j] c = float(logical_and(first, second).sum()) dist = a + b - (2.0*c) dists[i][j] = dists[j][i] = dist return dists def binary_dist_jaccard(datamtx, strict=True): """Calculates jaccard distance between rows, returns distance matrix. converts matrix to boolean. jaccard dist = 1 - jaccard index see for example: wikipedia jaccard index (20 jan 2009) this is identical to a binary version of the soergel distance Binary jaccard: a = num 1's in a b = num 1's in b c = num that are 1's in both a and b jaccard = 1 - (c/(a+b-c)) * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') rowsums = datamtx.sum(axis=1) for i in range(numrows): first = datamtx[i] a = rowsums[i] for j in range(i): second = datamtx[j] b = rowsums[j] c = float(logical_and(first, second).sum()) if a==0.0 and b==0.0: dist = 0.0 else: dist = 1.0 - (c/(a+b-c)) dists[i][j] = dists[j][i] = dist return dists def binary_dist_lennon(datamtx, strict=True): """Calculates lennon distance between rows, returns distance matrix. converts matrix to boolean. jaccard dist = 1 - lennon similarity lennon's similarity is a modification of simpson's index see Jack J. Lennon, The geographical structure of British bird distributions: diversity, spatial turnover and scale Binary lennon: a = num 1's in a b = num 1's in b c = num that are 1's in both a and b lennon = 1 - (c/(c + min(a-c,b-c))) * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') rowsums = datamtx.sum(axis=1) for i in range(numrows): first = datamtx[i] a = rowsums[i] for j in range(i): second = datamtx[j] b = rowsums[j] c = float(logical_and(first, second).sum()) if a==0.0 and b==0.0: dist = 0.0 elif c==0.0: dist = 1.0 else: dist = 1.0 - (c/(c + min([a-c,b-c]))) dists[i][j] = dists[j][i] = dist return dists def binary_dist_ochiai(datamtx, strict=True): """Calculates ochiai distance btw rows, returning distance matrix. Note: Treats array as bool. see for example: On the Mathematical Significance of the Similarity Index of Ochiai... Bolton, 1991 a = num 1's in a b = num 1's in b c = num that are 1's in both a and b ochiai = 1 - (c/sqrt(a*b)) * comparisons are between rows (samples) * input: 2D numpy array. Limited support for non-2D arrays if strict==False * output: numpy 2D array float ('d') type. shape (inputrows, inputrows) for sane input data * two rows of all zeros returns 0 distance between them * an all zero row compared with a not all zero row returns a distance of 1 * if strict==True, raises ValueError if any of the input data is negative, not finite, or if the input data is not a rank 2 array (a matrix). * if strict==False, assumes input data is a matrix with nonnegative entries. If rank of input data is < 2, returns an empty 2d array (shape: (0, 0) ). If 0 rows or 0 colunms, also returns an empty 2d array. 
""" datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) if strict: if not all(isfinite(datamtx)): raise ValueError("non finite number in input matrix") if any(datamtx<0.0): raise ValueError("negative value in input matrix") if rank(datamtx) != 2: raise ValueError("input matrix not 2D") numrows, numcols = shape(datamtx) else: try: numrows, numcols = shape(datamtx) except ValueError: return zeros((0,0),'d') if numrows == 0 or numcols == 0: return zeros((0,0),'d') dists = zeros((numrows,numrows),'d') rowsums = datamtx.sum(axis=1) for i in range(numrows): first = datamtx[i] a = rowsums[i] for j in range(i): second = datamtx[j] b = rowsums[j] c = float(logical_and(first, second).sum()) if a==0.0 and b==0.0: dist = 0.0 elif a==0.0 or b==0.0: dist = 1.0 else: dist = 1.0 - (c/sqrt(a*b)) dists[i][j] = dists[j][i] = dist return dists def binary_dist_pearson(datamtx, strict=True): """Calculates binary pearson distance between rows, returns distance matrix converts input array to bool, then uses dist_pearson """ datamtx = datamtx.astype(bool) datamtx = datamtx.astype(float) return dist_pearson(datamtx, strict=True) if __name__ == "__main__": """ just a test run""" matrix1 = array( [ [10,8,4,1], [8,6,2,1], [0,0,0,0], [0,0,1,0], [1,1,0,1], [1,0,8,10], [0,0,0,0], [8,6,2,1], ]) res = dist_euclidean(matrix1) print "euclidean distance result: \n" print res PyCogent-1.5.3/cogent/maths/fit_function.py000644 000765 000024 00000003411 12024702176 021674 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ fitting funtions module to fit x and y samples to a model """ from __future__ import division from numpy import array from cogent.maths.scipy_optimize import fmin __author__ = "Antonio Gonzalez Pena" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Antonio Gonzalez Pena"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Antonio Gonzalez Pena" __email__ = "antgonza@gmail.com" __status__ = "Prototype" def fit_function(x_vals, y_vals, 
func, n_params, iterations=2): """ Fit any function to any array of values of x and y. :Parameters: x_vals : array Values for x to fit the function func. y_vals : array Values for y to fit the function func. func : callable ``f(x, a)`` Objective function (model) to be fitted to the data. This function should return either an array for models that are not a constant, i.e. f(x)=exp(a[0]+x*a[1]), or a single value for models that are a constant, i.e. f(x)=a[0] n_params : int Number of parameters to fit in func iterations : int Number of iterations to fit func :Returns: param_guess param_guess : array Values for each of the arguments to fit func to x_vals and y_vals :Notes: Fit a function to a given array of values x and y using simplex to minimize the error. """ # internal function to minimize the error def f2min(a): #sum square deviation return ((func(x_vals, a) - y_vals)**2).sum() param_guess = array(range(n_params)) for i in range(iterations): xopt = fmin(f2min, param_guess, disp=0) param_guess = xopt return xopt #if __name__ == "__main__": # main() PyCogent-1.5.3/cogent/maths/function_optimisation.py #!/usr/bin/env python """Algorithms for function optimisation great_deluge() is a hillclimbing algorithm based on: Gunter Dueck: New Optimization Heuristics, The Great Deluge Algorithm and the Record-to-Record Travel. Journal of Computational Physics, Vol. 104, 1993, pp. 86 - 92 ga_evolve() is a basic genetic algorithm in which all internal functions can be overridden NOTE: both optimisation functions are generators.
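fit_function minimizes the sum-of-squares objective (its internal f2min) with the Nelder-Mead simplex optimizer (fmin). As a dependency-free sketch of the same idea — illustrative only, with an exhaustive integer grid standing in for the simplex search and all names hypothetical:

```python
def f2min(a, x_vals, y_vals, func):
    # sum of squared deviations: the objective fit_function minimizes
    return sum((func(x, a) - y) ** 2 for x, y in zip(x_vals, y_vals))

line = lambda x, a: a[0] + a[1] * x          # model: y = a0 + a1*x
x_vals = [0.0, 1.0, 2.0, 3.0]
y_vals = [1.0, 3.0, 5.0, 7.0]                # generated by y = 1 + 2x exactly

# crude stand-in for the simplex: search a small integer grid of parameters
best = min(((a0, a1) for a0 in range(-3, 4) for a1 in range(-3, 4)),
           key=lambda a: f2min(a, x_vals, y_vals, line))
```

The grid search recovers (1, 2), where the sum of squared deviations is exactly zero; the real function refines an arbitrary starting guess with fmin instead.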
""" from numpy.random import normal __author__ = "Daniel McDonald and Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "mcdonadt@colorado.edu" __status__ = "Production" def _simple_breed(best, num, mutation_rate, random_f): """Returns num copies of parent with mutation_rate changes""" result = [] score, parent = best for child_number in range(num): if random_f() <= mutation_rate: child = parent.mutate() result.append(child) else: result.append(parent) return result def _simple_score(child, target): """Returns the childs score as defined by the childs scoring function""" return child.score(target) def _simple_init(parent, num): """Creates a list parent copies""" return [parent.copy() for i in range(num)] def _simple_select(population, scores): """Returns a tuple: (best_score, best_child)""" scored = zip(scores, population) scored.sort() return scored[0] def great_deluge(a, step_factor=500, max_iter=100, max_total_iters=1000): """This generator makes random variations of the object a to minimize cost. Yields are performed at the end of each iteration and a tuple containing ((iter_count, total_iters), a) is returned. iter_count is used to kill the while loop in the event that no new objects are found with a better cost. iter_count gets reset each time an object with a better cost is found. total_iters will kill the while loop when the total number of iterations through the loop reaches max_total_iters Object a must implement methods cost() and perturb() for evaluating the score and making mutations respectively. Usually, you'll want to write a wrapper that passes these through to methods of an internal data object, or functions acting on that object. 
""" water_level = curr_cost = a.cost() # can't be worse than initial guess step_size = abs(water_level)/step_factor iter_count = 0 total_iters = 0 while iter_count < max_iter and total_iters < max_total_iters: new = a.perturb() new_cost = new.cost() if new_cost < water_level: if new_cost < curr_cost: water_level = max(curr_cost, water_level - step_size) iter_count = 0 # WARNING: iter_count is reset here! curr_cost = new_cost a = new else: iter_count += 1 yield ((iter_count, total_iters), a) total_iters += 1 def ga_evolve(parent, target, num, mutation_rate=0.01, score_f=_simple_score, breed_f=_simple_breed, select_f=_simple_select, init_f=_simple_init, random_f=normal, max_generations=1000): """Evolves a population based on the parent to the target Parent must implement methods copy(), mutate(), and score(target) to be used with the simple default functions. Yields are performed at the end of each iteration and contain the tuple (generation, best). The default functions return the tuple (generation, (best_score, best_obj)). Arguments: parent: Object to create initial population from. target: The goal of the evolution. num: Population size. mutation_rate: Rate at which objects in the population are mutated. score_f: Function to score the object against the target. breed_f: Function to create new population with mutations select_f: Function to select best object(s) from the population random_f: Function to be used in breed_f max_generations: Kills while loop if max_generations is reached Overload default functions: score_f: Must take an object and a target score. Returns objects score. breed_f: Must take a tuple containing (scores, objects), the size of population, a mutation rate and random function to use. Returns a list containing the initial population. Default function takes only the best object, but this may not be desired behavior. select_f: Must take a population and scores. Returns a tuple containing the best scores and objects in the population. 
Default function returns only the best score and object. init_f: Must take an object and the size of the population. Returns a list containing the starting population """ generation = 0 population = init_f(parent, num) while generation < max_generations: scores = [score_f(child, target) for child in population] best = select_f(population, scores) population = breed_f(best, num, mutation_rate, random_f) yield (generation, best) generation += 1 PyCogent-1.5.3/cogent/maths/geometry.py000644 000765 000024 00000013322 12024702176 021042 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Code for geometric operations, e.g. distances and center of mass.""" from __future__ import division from numpy import array, take, sum, newaxis, sqrt, sqrt, sin, cos, pi, c_, \ vstack, dot, ones __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Gavin Huttley", "Rob Knight", "Daniel McDonald", "Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Production" def center_of_mass(coordinates, weights= -1): """Calculates the center of mass for a dataset. coordinates, weights can be two things: either: coordinates = array of coordinates, where one column contains weights, weights = index of column that contains the weights or: coordinates = array of coordinates, weights = array of weights weights = -1 by default, because the simplest case is one dataset, where the last column contains the weights. If weights is given as a vector, it can be passed in as row or column. 
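For the two-array form, a small self-contained numpy illustration (variable names are local to this example only):

```python
import numpy as np

# Two points on the x-axis; the second carries three times the weight,
# so the centre of mass sits three quarters of the way towards it.
coords = np.array([[0.0, 0.0], [2.0, 0.0]])
weights = np.array([1.0, 3.0])
com = (coords * weights[:, np.newaxis]).sum(axis=0) / weights.sum()
# com is [1.5, 0.0]
```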
""" if isinstance(weights, int): return center_of_mass_one_array(coordinates, weights) else: return center_of_mass_two_array(coordinates, weights) def center_of_mass_one_array(data, weight_idx= -1): """Calculates the center of mass for a dataset data should be an array of x1,...,xn,r coordinates, where r is the weight of the point """ data = array(data) coord_idx = range(data.shape[1]) del coord_idx[weight_idx] coordinates = take(data, (coord_idx), 1) weights = take(data, (weight_idx,), 1) return sum(coordinates * weights, 0) / sum(weights, 0) def center_of_mass_two_array(coordinates, weights): """Calculates the center of mass for a set of weighted coordinates coordinates should be an array of coordinates weights should be an array of weights. Should have same number of items as the coordinates. Can be either row or column. """ coordinates = array(coordinates) weights = array(weights) try: return sum(coordinates * weights, 0) / sum(weights, 0) except ValueError: weights = weights[:, newaxis] return sum(coordinates * weights, 0) / sum(weights, 0) def distance(first, second): """Calculates Euclideas distance between two vectors (or arrays). WARNING: Vectors have to be the same dimension. """ return sqrt(sum(((first - second) ** 2).ravel())) def sphere_points(n): """Calculates uniformly distributed points on a unit sphere using the Golden Section Spiral algorithm. Arguments: -n: number of points """ points = [] inc = pi * (3 - sqrt(5)) offset = 2 / float(n) for k in xrange(int(n)): y = k * offset - 1 + (offset / 2) r = sqrt(1 - y * y) phi = k * inc points.append([cos(phi) * r, y, sin(phi) * r]) return array(points) def coords_to_symmetry(coords, fmx, omx, mxs, mode): """Applies symmetry transformation matrices on coordinates. This is used to create a crystallographic unit cell or a biological molecule, requires orthogonal coordinates, a fractionalization matrix (fmx), an orthogonalization matrix (omx) and rotation matrices (mxs). 
Returns all coordinates with included identity, which should be the first matrix in mxs. Arguments: - coords: an array of orthogonal coordinates - fmx: fractionalization matrix - omx: orthogonalization matrix - mxs: a sequence of 4x4 rotation matrices - mode: if mode 'table' assumes rotation matrices operate on fractional coordinates (like in crystallographic tables). """ all_coords = [coords] # the first matrix is identity if mode == 'fractional': # working with fractional matrices coords = dot(coords, fmx.transpose()) # add column of 1. coords4 = c_[coords, array([ones(len(coords))]).transpose()] for i in xrange(1, len(mxs)): # skip identity rot_mx = mxs[i].transpose() new_coords = dot(coords4, rot_mx)[:, :3] # rotate and translate, remove if mode == 'fractional': # ones column new_coords = dot(new_coords, omx.transpose()) # return to orthogonal all_coords.append(new_coords) # a vstack(arrays) with a following reshape is faster than # the equivalent creation of a new array via array(arrays). return vstack(all_coords).reshape((len(all_coords), coords.shape[0], 3)) def coords_to_crystal(coords, fmx, omx, n=1): """Applies primitive lattice translations to produce a crystal from the contents of a unit cell. Returns all coordinates with included zero translation (0, 0, 0).
Arguments: - coords: an array of orthogonal coordinates - fmx: fractionalization matrix - omx: orthogonalization matrix - n: number of layers of unit-cells == (2*n+1)^3 unit-cells """ rng = range(-n, n + 1) # a range like -2, -1, 0, 1, 2 fcoords = dot(coords, fmx.transpose()) # fractionalize vectors = [(x, y, z) for x in rng for y in rng for z in rng] # looking for the center Thickened cube numbers: # a(n)=n*(n^2+(n-1)^2)+(n-1)*2*n*(n-1) ;) all_coords = [] for primitive_vector in vectors: all_coords.append(fcoords + primitive_vector) # a vstack(arrays) with a following reshape is faster than # the equivalent creation of a new array via array(arrays) all_coords = vstack(all_coords).reshape((len(all_coords), \ coords.shape[0], coords.shape[1], 3)) all_coords = dot(all_coords, omx.transpose()) # orthogonalize return all_coords PyCogent-1.5.3/cogent/maths/markov.py000644 000765 000024 00000016303 12024702176 020510 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import numpy import bisect Float = numpy.core.numerictypes.sctype2char(float) __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class TransitionMatrix(object): """The transition matrix for a Markov process.
Just a square numpy array plus a list of state 'tags', eg: >>> a = numpy.array([ [.9, .0, .1], [.0, .9, .1], [.5, .5, .0],], Float) >>> T = TransitionMatrix(a, ['x', 'y', 'm']) """ def __init__(self, matrix, tags, stationary_probs=None): self.Matrix = numpy.array(matrix, Float) self.Tags = list(tags) self.size = len(matrix) assert matrix.shape == (self.size, self.size) assert len(tags) == self.size if stationary_probs is None: self._stationary_probs = None else: # Could recalculate, but if it is provided then faster to # just trust it assert len(stationary_probs) == self.size self._stationary_probs = numpy.array( stationary_probs, Float) def _getStationaryProbs(self): if self._stationary_probs is None: matrix = self.Matrix for i in range(10): matrix = numpy.core.multiarray.dot(matrix, matrix) self._stationary_probs = matrix[0] return self._stationary_probs StationaryProbs = property(_getStationaryProbs) def emit(self, random_series): """Generates an infinite sequence of states""" partitions = numpy.add.accumulate(self.Matrix, axis=1) for (state, row) in enumerate(partitions[:-1]): assert abs(row[-1]-1.0) < 1e-6, (state, self.Matrix[state]) x = random_series.uniform(0.0, 1.0) state = bisect.bisect_left( numpy.add.accumulate(self.StationaryProbs), x) while 1: yield self.Tags[state] x = random_series.uniform(0.0, 1.0) state = bisect.bisect_left(partitions[state], x) def __repr__(self): from cogent.util.table import Table labels = [] for (i, label) in enumerate(self.Tags): if hasattr(label, '__len__') and not isinstance( label, basestring): label = ','.join(str(z) for z in label) # Table needs unique labels label = "%s (%s)" % (label, i) labels.append(label) heading = [''] + labels a = [[name] + list(row) for (name, row) in zip(labels, self.Matrix)] return str(Table(header = heading, rows = a)) def withoutSilentStates(self): """An equivalent matrix without any of the states that have a false tag value""" N = self.size silent = numpy.array( [(not max(tag)) for tag in 
self.Tags], Float) matrix = numpy.zeros([N, N], Float) for i in range(N): row = numpy.zeros([N], Float) emt = numpy.zeros([N], Float) row[i] = 1.0 while max((row+emt)-emt): row = numpy.dot(row, self.Matrix) nul = silent * row emt += row - nul row = nul matrix[i] = emt keep = [i for i in range(self.size) if max(matrix[:,i])] matrix = numpy.take(matrix, keep, axis=0) matrix = numpy.take(matrix, keep, axis=1) tags = numpy.take(self.Tags, keep, axis=0) return type(self)(matrix, tags) def getLikelihoodOfSequence(self, obs, backward=False): """Just for testing really""" profile = numpy.zeros([len(obs), self.size], Float) for (i,a) in enumerate(obs): # This is suspiciously alphabet-like! profile[i, self.Tags.index(obs[i])] = 1.0 return self.getLikelihoodOfProfile(profile, backward=backward) def getLikelihoodOfProfile(self, obs, backward=False): """Just for testing really""" if not backward: state_probs = self.StationaryProbs.copy() for i in range(len(obs)): state_probs = numpy.dot(state_probs, self.Matrix) * obs[i] return sum(state_probs) else: state_probs = numpy.ones([self.size], Float) for i in range(len(obs)-1, -1, -1): state_probs = numpy.dot(self.Matrix, state_probs * obs[i]) return sum(state_probs * self.StationaryProbs) def getPosteriorProbs(self, obs): """'obs' is a sequence of state probability vectors""" result = numpy.zeros([len(obs), self.size], Float) # Forward state_probs = self.StationaryProbs.copy() for i in range(len(obs)): state_probs = numpy.dot(state_probs, self.Matrix) * obs[i] state_probs /= sum(state_probs) result[i] = state_probs # and Backward state_probs = numpy.ones([self.size], Float) for i in range(len(obs)-1, -1, -1): state_probs = numpy.dot(self.Matrix, state_probs) state_probs /= sum(state_probs) result[i] *= state_probs result[i] /= sum(result[i]) state_probs *= obs[i] return result def nestTransitionMatricies(self, Ts, blended=None): """Useful for combining several X/Y/M Pair HMMs into one large HMM. 
The transition matrices 'Ts' end up along the diagonal blocks of the result, and the off diagonal values depend on the region switching probabilities defined by 'self'""" if blended is None: blended = lambda a,b: (a+b)/2.0 #blended = lambda a,b: numpy.sqrt(a*b) #blended = lambda a,b: b R = self.Matrix n = len(R) assert len(Ts) == n result = None for (x, a) in enumerate(self.Tags): a = Ts[a-1] if result is None: tags = a.Tags c = len(tags) result = numpy.zeros([c*n, c*n], Float) else: assert a.Tags == tags, (a.Tags, tags) a = a.Matrix for (y, b) in enumerate(self.Tags): b = Ts[b-1].Matrix block = self.Matrix[x,y] * blended(a, b) result[x*c:(x+1)*c, y*c:(y+1)*c] = block all_tags = [] for i in self.Tags: all_tags.extend([[i*e for e in tag] for tag in tags]) return TransitionMatrix(result, all_tags) def SiteClassTransitionMatrix(switch, probs): """TM defined by stationary probabilities and a 'switch' parameter, switch=0.0 gives an identity Matrix, switch=1.0 gives independence, ie: zero-order markov process""" probs = numpy.asarray(probs) assert numpy.allclose(sum(probs), 1.0), probs I = numpy.identity(len(probs), Float) switch_probs = (1.0 - I) * (probs * switch) + \ I * (1.0 - (1.0 - probs) * switch) tags = [i+1 for i in range(len(switch_probs))] return TransitionMatrix( switch_probs, tags, stationary_probs=probs.copy()) PyCogent-1.5.3/cogent/maths/matrix/000755 000765 000024 00000000000 12024703631 020135 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/cogent/maths/matrix_exponentiation.py000644 000765 000024 00000011612 12024702176 023637 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # 4 implementations of P = exp(Q*t) # APIs along the lines of: # exponentiator = WhateverExponentiator(Q or Q derivative(s)) # P = exponentiator(t) # # Class(Q) instance(t) Limitations # Eigen slow fast not too asymm # SemiSym slow fast mprobs > 0 # Pade instant slow # Taylor instant very slow from cogent.util.modules import importVersionedModule, ExpectedImportError
import warnings import numpy from numpy.linalg import inv, eig, solve, LinAlgError __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Zongzhi Liu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class _Exponentiator(object): def __init__(self, Q): self.Q = Q def __repr__(self): return "%s(%s)" % (self.__class__.__name__, repr(self.Q)) class EigenExponentiator(_Exponentiator): """A matrix ready for fast exponentiation. P=exp(Q*t)""" __slots__ = ['Q', 'ev', 'roots', 'evI', 'evT'] def __init__(self, Q, roots, ev, evT, evI): self.Q = Q self.evI = evI self.evT = evT self.ev = ev self.roots = roots def __call__(self, t): exp_roots = numpy.exp(t*self.roots) result = numpy.inner(self.evT * exp_roots, self.evI) if result.dtype.kind == "c": result = numpy.asarray(result.real) result = numpy.maximum(result, 0.0) return result def SemiSymmetricExponentiator(motif_probs, Q): """Like EigenExponentiator, but more numerically stable and 30% faster when the rate matrix (Q/motif_probs) is symmetrical. Only usable when all motif probs > 0. Unlike the others it needs to know the motif probabilities.""" H = numpy.sqrt(motif_probs) H2 = numpy.divide.outer(H, H) #A = Q * H2 #assert numpy.allclose(A, numpy.transpose(A)), A (roots, R) = eig(Q*H2) ev = R.T / H2 evI = (R*H2).T #self.evT = numpy.transpose(self.ev) return EigenExponentiator(Q, roots, ev, ev.T, evI) # These next two are slow exponentiators, they don't get any speed up # from reusing Q with a new t, but for compatibility with the diagonalising # approach they look like they do. They are derived from code in SciPy.
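The eigendecomposition route used by EigenExponentiator can be sanity-checked against a truncated Taylor sum. The following self-contained sketch (numpy only; all names are local to this example, not part of the module) does so on a toy two-state rate matrix:

```python
import numpy as np

# Toy 2-state rate matrix: rows sum to zero, off-diagonal rates >= 0.
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
t = 0.5

# Eigendecomposition route, as EigenExponentiator does it:
# exp(Q*t) = V diag(exp(roots*t)) V^-1.
roots, V = np.linalg.eig(Q)
P_eigen = np.inner(V * np.exp(t * roots), np.linalg.inv(V.T)).real

# Independent check: truncated Taylor series sum_k (Q*t)^k / k!.
P_taylor = np.zeros_like(Q)
term = np.eye(2)
for k in range(1, 30):
    P_taylor = P_taylor + term
    term = term.dot(Q * t) / float(k)
```

Both routes should agree, and since Q is a rate matrix each row of exp(Q*t) sums to one, a convenient extra invariant to check.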
class TaylorExponentiator(_Exponentiator): def __init__(self, Q): self.Q = Q self.q = 21 def __call__(self, t=1.0): """Compute the matrix exponential using a Taylor series of order q.""" A = self.Q * t M = A.shape[0] eA = numpy.identity(M, float) trm = eA for k in range(1, self.q): trm = numpy.dot(trm, A/float(k)) eA += trm while not numpy.allclose(eA, eA-trm): k += 1 trm = numpy.dot(trm, A/float(k)) eA += trm if k >= self.q: warnings.warn("Taylor series lengthened from %s to %s" % (self.q, k+1)) self.q = k + 1 return eA class PadeExponentiator(_Exponentiator): def __init__(self, Q): self.Q = Q def __call__(self, t=1.0): """Compute the matrix exponential using Pade approximation of order q. """ A = self.Q * t M = A.shape[0] # Scale A so that norm is < 1/2 norm = numpy.maximum.reduce(numpy.sum(numpy.absolute(A), axis=1)) j = int(numpy.floor(numpy.log(max(norm, 0.5))/numpy.log(2.0))) + 1 A = A / 2.0**j # How many iterations required e = 1.0 q = 0 qf = 1.0 while e > 1e-12: q += 1 q2 = 2.0 * q qf *= q**2 / (q2 * (q2-1) * q2 * (q2+1)) e = 8 * (norm/(2**j))**(2*q) * qf # Pade Approximation for exp(A) X = A c = 1.0/2 N = numpy.identity(M) + c*A D = numpy.identity(M) - c*A for k in range(2,q+1): c = c * (q-k+1) / (k*(2*q-k+1)) X = numpy.dot(A,X) cX = c*X N = N + cX if not k % 2: D = D + cX; else: D = D - cX; F = solve(D,N) for k in range(1,j+1): F = numpy.dot(F,F) return F def chooseFastExponentiators(Q): return (FastExponentiator, CheckedExponentiator) def FastExponentiator(Q): (roots, evT) = eig(Q) ev = evT.T return EigenExponentiator(Q, roots, ev, evT, inv(ev)) def CheckedExponentiator(Q): (roots, evT) = eig(Q) ev = evT.T evI = inv(ev) reQ = numpy.inner(ev.T * roots, evI).real if not numpy.allclose(Q, reQ): raise ArithmeticError, "eigen failed precision test" return EigenExponentiator(Q, roots, ev, evT, evI) def RobustExponentiator(Q): return PadeExponentiator(Q) PyCogent-1.5.3/cogent/maths/matrix_logarithm.py000644 000765 000024 00000003213 12024702176 022557 
0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Alternate matrix log algorithms. A simple implementation of matrix log, following Brett Easton's suggestion, and a Taylor series expansion approach. WARNING: The methods are not robust! """ from numpy import array, dot, eye, zeros, transpose, log, inner as innerproduct from numpy.linalg import inv as inverse, eig as eigenvectors, norm __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Gavin Huttley", "Von Bing Yap"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def logm(P): """Returns logarithm of a matrix. This method should work if the matrix is positive definite and diagonalizable. """ roots, ev = eigenvectors(P) evI = inverse(ev.T) evT = ev log_roots = log(roots) return innerproduct(evT * log_roots, evI) def logm_taylor(P, tol=1e-30): """returns the matrix log computed using the Taylor series. If the Frobenius norm of P-I is > 1, raises an exception since the series is not guaranteed to converge. The series is continued until the Frobenius norm of the current element is < tol. Note: This exit condition is theoretically crude but seems to work reasonably well.
Arguments: tol - the tolerance """ P = array(P) I = eye(P.shape[0]) X = P - I assert norm(X, ord='fro') < 1, "Frobenius norm > 1" Y = I Q = zeros(P.shape, dtype="double") i = 1 while norm(Y/i, ord='fro') > tol: Y = dot(Y,X) Q += ((-1)**(i-1)*Y/i) i += 1 return Q PyCogent-1.5.3/cogent/maths/optimisers.py000644 000765 000024 00000015251 12024702176 021410 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # -*- coding: utf-8 -*- """Local or Global-then-local optimisation with progress display """ from cogent.util import progress_display as UI from simannealingoptimiser import SimulatedAnnealing from scipy_optimisers import DownhillSimplex, Powell import warnings import numpy GlobalOptimiser = SimulatedAnnealing LocalOptimiser = Powell __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Andrew Butterfield", "Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def unsteadyProgressIndicator(display_progress, label='', start=0.0, end=1.0): template = u'f = % #10.6g ± % 9.3e evals = %6i ' label = label.rjust(5) goal = [1.0e-20] def _display_progress(remaining, *args): if remaining > goal[0]: goal[0] = remaining progress = (goal[0]-remaining)/goal[0] * (end-start) + start msg = template % args + label return display_progress(msg, progress=progress, current=0) return _display_progress class ParameterOutOfBoundsError(Exception): pass class MaximumEvaluationsReached(Exception): pass # The following functions are used to wrap the optimised function to # adapt it to the optimiser in various ways. They can be combined. 
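The wrapper composition described above can be sketched in isolation. The names below (OutOfBounds, bounded, catching) are illustrative stand-ins for ParameterOutOfBoundsError, bounded_function and bounds_exception_catching_function:

```python
import numpy as np

class OutOfBounds(Exception):
    """Raised instead of calling f() with an out-of-bounds input."""
    pass

def bounded(f, lower, upper):
    # Reject inputs outside [lower, upper] before they reach f.
    def wrapped(x):
        if np.all((lower <= x) & (x <= upper)):
            return f(x)
        raise OutOfBounds(x)
    return wrapped

def catching(f):
    # Convert failures to -inf so a maximiser can keep going.
    def wrapped(x):
        try:
            return f(x)
        except (ArithmeticError, OutOfBounds):
            return -np.inf
    return wrapped

# A concave toy objective, maximised at x = [1, 1], bounded to [-2, 2].
f = catching(bounded(lambda x: -np.sum((x - 1.0) ** 2), -2.0, 2.0))
# f([1, 1]) evaluates normally; f([5, 0]) is out of bounds and yields -inf.
```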
def limited_use(f, max_evaluations=None): if max_evaluations is None: max_evaluations = numpy.inf evals = [0] best_fval = [-numpy.inf] best_x = [None] def wrapped_f(x): if evals[0] >= max_evaluations: raise MaximumEvaluationsReached(evals[0]) evals[0] += 1 fval = f(x) if fval > best_fval[0]: best_fval[0] = fval best_x[0] = x.copy() return fval def get_best(): f(best_x[0]) # for calculator, ensure best last return best_fval[0], best_x[0], evals[0] return get_best, wrapped_f def bounded_function(f, lower_bounds, upper_bounds): """Returns a function that raises an exception on out-of-bounds input rather than bothering the real function with invalid input. This is enough to get some unbounded optimisers working on bounded problems""" def _wrapper(x, **kw): if numpy.alltrue(numpy.logical_and(lower_bounds <= x, x <= upper_bounds)): return f(x, **kw) else: raise ParameterOutOfBoundsError((lower_bounds, x, upper_bounds)) return _wrapper def bounds_exception_catching_function(f): """Returns a function that return -inf on out-of-bounds or otherwise impossible to evaluate input. This only helps if the function is to be MAXIMISED.""" out_of_bounds_value = -numpy.inf acceptable_inf = numpy.isneginf def _wrapper(x, **kw): try: result = f(x, **kw) if not numpy.isfinite(result): if not acceptable_inf(result): warnings.warn('Non-finite f %s from %s' % (result, x)) raise ParameterOutOfBoundsError except (ArithmeticError, ParameterOutOfBoundsError), detail: result = out_of_bounds_value return result return _wrapper def minimise(f, *args, **kw): """See maximise""" def nf(x): return -1 * f(x) return maximise(nf, *args, **kw) @UI.display_wrap def maximise(f, xinit, bounds=None, local=None, filename=None, interval=None, max_restarts=None, max_evaluations=None, limit_action='warn', tolerance=1e-6, global_tolerance=1e-1, ui=None, return_eval_count=False, **kw): """Find input values that optimise this function. 
'local' controls the choice of optimiser, the default being to run both the global and local optimisers. 'filename' and 'interval' control checkpointing. Unknown keyword arguments get passed on to the global optimiser. """ do_global = (not local) or local is None do_local = local or local is None assert limit_action in ['ignore', 'warn', 'raise', 'error'] (get_best, f) = limited_use(f, max_evaluations) x = numpy.array(xinit, float) multidimensional_input = x.shape != () if not multidimensional_input: x = numpy.atleast_1d(x) if bounds is not None: (upper, lower) = bounds if upper is not None or lower is not None: if upper is None: upper = numpy.inf if lower is None: lower = -numpy.inf f = bounded_function(f, upper, lower) try: fval = f(x) except (ArithmeticError, ParameterOutOfBoundsError), detail: raise ValueError("Initial parameter values must be valid %s" % repr(detail.args)) if not numpy.isfinite(fval): raise ValueError("Initial parameter values must evaluate to a finite value, not %s. %s" % (fval, x)) f = bounds_exception_catching_function(f) try: # Global optimisation if do_global: if 0 and not do_local: warnings.warn( 'local=False causes the post-global optimisation local ' '"polishing" optimisation to be skipped entirely, which seems ' 'pointless, so its meaning may change to a simple boolean ' 'flag: local or global.') # It also needlessly complicates this function. 
gend = 1.0 else: gend = 0.9 callback = unsteadyProgressIndicator(ui.display, 'Global', 0.0, gend) gtol = [tolerance, global_tolerance][do_local] opt = GlobalOptimiser(filename=filename, interval=interval) x = opt.maximise(f, x, tolerance=gtol, show_remaining=callback, **kw) else: gend = 0.0 for k in kw: warnings.warn('Unused arg for local alignment: ' + k) # Local optimisation if do_local: callback = unsteadyProgressIndicator(ui.display, 'Local', gend, 1.0) #ui.display('local opt', 1.0-per_opt, per_opt) opt = LocalOptimiser() x = opt.maximise(f, x, tolerance=tolerance, max_restarts=max_restarts, show_remaining=callback) finally: # ensure state of calculator reflects optimised result, or # partially optimised result if exiting on an exception. (f, x, evals) = get_best() # ... and returning this info the obvious way keeps this function # potentially applicable to optimising non-caching pure functions too. if not multidimensional_input: x = numpy.squeeze(x) if return_eval_count: return x, evals return x PyCogent-1.5.3/cogent/maths/period.py000644 000765 000024 00000021455 12024702176 020477 0ustar00jrideoutstaff000000 000000 from numpy import zeros, array, exp, pi, cos, fft, arange, power, sqrt, sum,\ multiply, float64, polyval __author__ = "Hua Ying, Julien Epps and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Julien Epps", "Hua Ying", "Gavin Huttley", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "Production" def _goertzel_inner(x, N, period): coeff = 2.0 * cos(2 * pi / period) s_prev = 0.0 s_prev2 = 0.0 for n in range(N): s = x[n] + coeff * s_prev - s_prev2 s_prev2 = s_prev s_prev = s pwr = sqrt(s_prev2**2 + s_prev**2 - coeff * s_prev2 * s_prev) return pwr def _ipdft_inner(x, X, W, ulim, N): # naive python for p in range(ulim): w = 1 for n in range(N): if n != 0: w *= W[p] X[p] = X[p] + x[n] * w return X def _ipdft_inner2(x, X,
W, ulim, N): # fastest python p = x[::-1] # reversed X = polyval(p, W) return X def _autocorr_inner2(x, xc, N): # fastest python products = multiply.outer(x, x) v = [products.trace(offset=m) for m in range(-len(x)+1, len(x))] xc.put(xrange(xc.shape[0]), v) def _autocorr_inner(x, xc, N): # naive python for m in range(-N+1, N): for n in range(N): if 0 <= n-m < N: xc[m+N-1] += (x[n]*x[n-m]) try: # try using pyrexed versions from _period import ipdft_inner, autocorr_inner, goertzel_inner # raise ImportError # for profiling except ImportError: # fastest python versions ipdft_inner = _ipdft_inner2 autocorr_inner = _autocorr_inner2 goertzel_inner = _goertzel_inner def goertzel(x, period): """returns the array(power), array(period) from series x for period result objects are arrays for consistency with the other period estimation functions""" calc = Goertzel(len(x), period=period) return calc(x) class _PeriodEstimator(object): """parent class for period estimation""" def __init__(self, length, llim=None, ulim=None, period=None): super(_PeriodEstimator, self).__init__() self.length = length self.llim = llim or 2 self.ulim = ulim or (length-1) if self.ulim > length: raise RuntimeError, 'Error: ulim > length' self.period = period def getNumStats(self): """returns the number of statistics computed by this calculator""" return 1 class AutoCorrelation(_PeriodEstimator): def __init__(self, length, llim=None, ulim=None, period=None): """class for repetitive calculation of autocorrelation for series of fixed length e.g.
if x = [1,1,1,1], xc = [1,2,3,4,3,2,1] The middle element of xc corresponds to a lag (period) of 0 xc is always symmetric for real x N is the length of x""" super(AutoCorrelation, self).__init__(length, llim, ulim, period) periods = range(-length+1, length) self.min_idx = periods.index(self.llim) self.max_idx = periods.index(self.ulim) self.periods = array(periods[self.min_idx: self.max_idx + 1]) self.xc = zeros(2*self.length-1) def evaluate(self, x): x = array(x, float64) self.xc.fill(0.0) autocorr_inner(x, self.xc, self.length) xc = self.xc[self.min_idx: self.max_idx + 1] if self.period is not None: return xc[self.period-self.llim] return xc, self.periods __call__ = evaluate def auto_corr(x, llim=None, ulim=None): """returns the autocorrelation of x e.g. if x = [1,1,1,1], xc = [1,2,3,4,3,2,1] The middle element of xc corresponds to a lag (period) of 0 xc is always symmetric for real x N is the length of x """ _autocorr = AutoCorrelation(len(x), llim=llim, ulim=ulim) return _autocorr(x) class Ipdft(_PeriodEstimator): def __init__(self, length, llim=None, ulim=None, period=None, abs_ft_sig=True): """factory function for computing the integer period discrete Fourier transform for repeated application to signals of the same length. 
Argument: - length: the signal length - llim: lower limit - ulim: upper limit - period: a specific period to return the IPDFT power for - abs_ft_sig: if True, returns absolute value of signal """ if period is not None: llim = period ulim = period super(Ipdft, self).__init__(length, llim, ulim, period) self.periods = array(range(self.llim, self.ulim+1)) self.W = exp(-1j * 2 * pi / arange(1, self.ulim+1)) self.X = array([0+0j] * self.length) self.abs_ft_sig = abs_ft_sig def evaluate(self, x): x = array(x, float64) self.X.fill(0+0j) self.X = ipdft_inner(x, self.X, self.W, self.ulim, self.length) pwr = self.X[self.llim-1:self.ulim] if self.abs_ft_sig: pwr = abs(pwr) if self.period is not None: return pwr[self.period-self.llim] return array(pwr), self.periods __call__ = evaluate class Goertzel(_PeriodEstimator): """Computes the power of a signal for a specific period""" def __init__(self, length=None, llim=None, ulim=None, period=None, abs_ft_sig=True): assert period is not None, "Goertzel requires a period" super(Goertzel, self).__init__(length=length, period=period) def evaluate(self, x): x = array(x, float64) return _goertzel_inner(x, self.length, self.period) __call__ = evaluate class Hybrid(_PeriodEstimator): """hybrid statistic and corresponding periods for signal x See Epps. 
    EURASIP Journal on Bioinformatics and Systems Biology, 2009"""

    def __init__(self, length, llim=None, ulim=None, period=None,
                 abs_ft_sig=True, return_all=False):
        """Arguments:
            - length: the length of signals to be encountered
            - period: specified period at which to return the signal
            - llim, ulim: the smallest, largest periods to evaluate
            - return_all: whether to return the hybrid, ipdft, autocorr
              statistics as a numpy array, or just the hybrid statistic
        """
        super(Hybrid, self).__init__(length, llim, ulim, period)
        self.ipdft = Ipdft(length, llim, ulim, period, abs_ft_sig)
        self.auto = AutoCorrelation(length, llim, ulim, period)
        self._return_all = return_all

    def getNumStats(self):
        """the number of stats computed by this calculator"""
        num = [1, 3][self._return_all]
        return num

    def evaluate(self, x):
        if self.period is None:
            auto_sig, auto_periods = self.auto(x)
            ft_sig, ft_periods = self.ipdft(x)
            hybrid = auto_sig * ft_sig
            if self._return_all:
                result = array([hybrid, ft_sig, auto_sig]), ft_periods
            else:
                result = hybrid, ft_periods
        else:
            auto_sig = self.auto(x)
            # ft_sig = goertzel(x, period)  # performance slower than ipdft!
            ft_sig = self.ipdft(x)
            hybrid = auto_sig * ft_sig
            if self._return_all:
                result = array([abs(hybrid), ft_sig, auto_sig])
            else:
                result = abs(hybrid)
        return result

    __call__ = evaluate


def ipdft(x, llim=None, ulim=None, period=None):
    """returns the integer period discrete Fourier transform of the signal x

    Arguments:
        - x: series of symbols
        - llim: lower limit
        - ulim: upper limit
    """
    x = array(x, float64)
    ipdft_calc = Ipdft(len(x), llim, ulim, period)
    return ipdft_calc(x)


def hybrid(x, llim=None, ulim=None, period=None, return_all=False):
    """Return hybrid statistic and corresponding periods for signal x

    Arguments:
        - return_all: whether to return the hybrid, ipdft, autocorr
          statistics as a numpy array, or just the hybrid statistic

    See Epps.
    EURASIP Journal on Bioinformatics and Systems Biology, 2009, 9
    """
    hybrid_calc = Hybrid(len(x), llim, ulim, period, return_all=return_all)
    x = array(x, float)
    return hybrid_calc(x)


def dft(x, **kwargs):
    """Return discrete fft and corresponding periods for signal x"""
    n = len(x) / 2 * 2
    x = array(x[:n])
    pwr = fft.rfft(x, n)[1:]
    freq = (arange(n/2+1)/(float(n)))[1:]
    pwr = list(pwr)
    periods = [1/f for f in freq]
    pwr.reverse()
    periods.reverse()
    return array(pwr), array(periods)


if __name__ == "__main__":
    from numpy import sin
    x = sin(2*pi/5*arange(1,9))
    print x
    print goertzel(x, 4)
    print goertzel(x, 8)

PyCogent-1.5.3/cogent/maths/scipy_optimisers.py000644 000765 000024 00000007642 12024702176 022624 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
from __future__ import division
import numpy, math, warnings
from cogent.maths.scipy_optimize import fmin_bfgs, fmin_powell, fmin, brent

__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"


def bound_brent(func, brack=None, **kw):
    """Given a function and an initial point, find another point within the
    bounds, then use the two points to bracket a minimum.

    This differs from ordinary brent() only in that it protects bracket()
    from infinities.  bracket() may find an infinity as the third point,
    but that's OK because it finishes as soon as that happens.  If bracket()
    returns an invalid 3rd point then we will pass it on to brent(), but
    brent() knows to use golden section until all 3 points are finite so it
    will cope OK.
    """
    assert not brack, brack
    xa = 0.0
    fa = func(xa)
    assert fa is not numpy.inf, "Starting point is infinite"

    # if dx sends us over the boundary, shrink and reflect it until
    # it doesn't any more.
    dx = -2.0  # this would be -2.0 in orig impl, but is smaller better?
    xb = xa + dx
    fb = numpy.inf
    while fb is numpy.inf and xb != xa:
        dx = dx * -0.5
        xb = xa + dx
        fb = func(xb)
    assert xb != xa, "Can't find a second in-bounds point on this line"
    return brent(func, brack=(xa, xb), **kw)


class _SciPyOptimiser(object):
    """This class is abstract.  Subclasses must provide a
    _minimise(self, f, x) that can sanely handle +inf.

    Since these are local optimisers, we sometimes restart them to check the
    result is stable.  Cost is less than 2-fold slowdown."""

    def maximise(self, function, *args, **kw):
        def nf(x):
            return -1 * function(x)
        return self.minimise(nf, *args, **kw)

    def minimise(self, function, xopt, show_remaining,
                 max_restarts=None, tolerance=None):
        if max_restarts is None:
            max_restarts = 0
        if tolerance is None:
            tolerance = 1e-6

        fval_last = fval = numpy.inf
        if len(xopt) == 0:
            return function(xopt), xopt

        if show_remaining:
            def _callback(fcalls, x, fval, delta):
                remaining = math.log(max(abs(delta)/tolerance, 1.0))
                show_remaining(remaining, -fval, delta, fcalls)
        else:
            _callback = None

        for i in range(max_restarts + 1):
            (xopt, fval, iterations, func_calls, warnflag) = self._minimise(
                function, xopt, disp=False, callback=_callback,
                ftol=tolerance, full_output=True)

            xopt = numpy.atleast_1d(xopt)  # unsqueeze in case only one param

            if warnflag:
                warnings.warn('Unexpected warning from scipy %s' % warnflag)

            # same tolerance check as in fmin_powell
            if abs(fval_last - fval) < tolerance:
                break
            fval_last = fval  # fval <= fval_last

        return xopt


class Powell(_SciPyOptimiser):
    """Uses an infinity-avoiding version of the Brent line search."""
    def _minimise(self, f, x, **kw):
        result = fmin_powell(f, x, linesearch=bound_brent, **kw)
        # same length full-results tuple as simplex:
        (xopt, fval, directions, iterations, func_calls, warnflag) = result
        return (xopt, fval, iterations, func_calls, warnflag)


class DownhillSimplex(_SciPyOptimiser):
    """On a small brca1 tree this fails to find a minimum as good as the
    other optimisers.
    Restarts help a lot though."""
    def _minimise(self, f, x, **kw):
        return fmin(f, x, **kw)


DefaultLocalOptimiser = Powell

PyCogent-1.5.3/cogent/maths/scipy_optimize.py000644 000765 000024 00000175562 12024702176 022275 0ustar00jrideoutstaff000000 000000
#!/usr/bin/env python
# We don't want to depend on the monolithic, fortranish,
# Num-overlapping, mac-unfriendly SciPy.  But this
# module is too good to pass up.  It has been lightly customised for
# use in Cogent.  Changes made to fmin_powell and brent: allow custom
# line search function (to allow bound_brent to be passed in), cope with
# infinity, tol specified as an absolute value, not a proportion of f,
# and more info passed out via callback.

# ******NOTICE***************
# optimize.py module by Travis E. Oliphant
#
# You may copy and use this module as you see fit with no
# guarantee implied provided you keep this notice in all copies.
# *****END NOTICE************

# Minimization routines

__all__ = ['fmin', 'fmin_powell', 'fmin_bfgs', 'fmin_ncg', 'fmin_cg',
           'fminbound', 'brent', 'golden', 'bracket', 'rosen', 'rosen_der',
           'rosen_hess', 'rosen_hess_prod', 'brute', 'approx_fprime',
           'line_search', 'check_grad']

__docformat__ = "restructuredtext en"

import numpy
from numpy import atleast_1d, eye, mgrid, argmin, zeros, shape, empty, \
     squeeze, vectorize, asarray, absolute, sqrt, Inf, asfarray, isinf

try:
    import linesearch  # from SciPy
except ImportError:
    linesearch = None

# These have been copied from Numeric's MLab.py
# I don't think they made the transition to scipy_core

def max(m, axis=0):
    """max(m,axis=0) returns the maximum of m along dimension axis."""
    m = asarray(m)
    return numpy.maximum.reduce(m, axis)

def min(m, axis=0):
    """min(m,axis=0) returns the minimum of m along dimension axis."""
    m = asarray(m)
    return numpy.minimum.reduce(m, axis)

def is_array_scalar(x):
    """Test whether `x` is either a scalar or an array scalar.
""" return len(atleast_1d(x) == 1) abs = absolute import __builtin__ pymin = __builtin__.min pymax = __builtin__.max __version__ = "1.5.3" _epsilon = sqrt(numpy.finfo(float).eps) __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" def vecnorm(x, ord=2): if ord == Inf: return numpy.amax(abs(x)) elif ord == -Inf: return numpy.amin(abs(x)) else: return numpy.sum(abs(x)**ord,axis=0)**(1.0/ord) def rosen(x): # The Rosenbrock function x = asarray(x) return numpy.sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0,axis=0) def rosen_der(x): x = asarray(x) xm = x[1:-1] xm_m1 = x[:-2] xm_p1 = x[2:] der = numpy.zeros_like(x) der[1:-1] = 200*(xm-xm_m1**2) - 400*(xm_p1 - xm**2)*xm - 2*(1-xm) der[0] = -400*x[0]*(x[1]-x[0]**2) - 2*(1-x[0]) der[-1] = 200*(x[-1]-x[-2]**2) return der def rosen_hess(x): x = atleast_1d(x) H = numpy.diag(-400*x[:-1],1) - numpy.diag(400*x[:-1],-1) diagonal = numpy.zeros(len(x), dtype=x.dtype) diagonal[0] = 1200*x[0]-400*x[1]+2 diagonal[-1] = 200 diagonal[1:-1] = 202 + 1200*x[1:-1]**2 - 400*x[2:] H = H + numpy.diag(diagonal) return H def rosen_hess_prod(x,p): x = atleast_1d(x) Hp = numpy.zeros(len(x), dtype=x.dtype) Hp[0] = (1200*x[0]**2 - 400*x[1] + 2)*p[0] - 400*x[0]*p[1] Hp[1:-1] = -400*x[:-2]*p[:-2]+(202+1200*x[1:-1]**2-400*x[2:])*p[1:-1] \ -400*x[1:-1]*p[2:] Hp[-1] = -400*x[-2]*p[-2] + 200*p[-1] return Hp def wrap_function(function, args): ncalls = [0] def function_wrapper(x): ncalls[0] += 1 return function(x, *args) return ncalls, function_wrapper def fmin(func, x0, args=(), xtol=1e-4, ftol=1e-4, maxiter=None, maxfun=None, full_output=0, disp=1, retall=0, callback=None): """Minimize a function using the downhill simplex algorithm. :Parameters: func : callable func(x,*args) The objective function to be minimized. x0 : ndarray Initial guess. args : tuple Extra arguments passed to func, i.e. ``f(x,*args)``. callback : callable Called after each iteration, as callback(xk), where xk is the current parameter vector. 
:Returns: (xopt, {fopt, iter, funcalls, warnflag}) xopt : ndarray Parameter that minimizes function. fopt : float Value of function at minimum: ``fopt = func(xopt)``. iter : int Number of iterations performed. funcalls : int Number of function calls made. warnflag : int 1 : Maximum number of function evaluations made. 2 : Maximum number of iterations reached. allvecs : list Solution at each iteration. *Other Parameters*: xtol : float Relative error in xopt acceptable for convergence. ftol : number Relative error in func(xopt) acceptable for convergence. maxiter : int Maximum number of iterations to perform. maxfun : number Maximum number of function evaluations to make. full_output : bool Set to True if fval and warnflag outputs are desired. disp : bool Set to True to print convergence messages. retall : bool Set to True to return list of solutions at each iteration. :Notes: Uses a Nelder-Mead simplex algorithm to find the minimum of function of one or more variables. """ fcalls, func = wrap_function(func, args) x0 = asfarray(x0).flatten() N = len(x0) rank = len(x0.shape) if not -1 < rank < 2: raise ValueError, "Initial guess must be a scalar or rank-1 sequence." 
if maxiter is None: maxiter = N * 200 if maxfun is None: maxfun = N * 200 rho = 1; chi = 2; psi = 0.5; sigma = 0.5; one2np1 = range(1,N+1) if rank == 0: sim = numpy.zeros((N+1,), dtype=x0.dtype) else: sim = numpy.zeros((N+1,N), dtype=x0.dtype) fsim = numpy.zeros((N+1,), float) sim[0] = x0 if retall: allvecs = [sim[0]] fsim[0] = func(x0) nonzdelt = 0.05 zdelt = 0.00025 for k in range(0,N): y = numpy.array(x0,copy=True) if y[k] != 0: y[k] = (1+nonzdelt)*y[k] else: y[k] = zdelt sim[k+1] = y f = func(y) fsim[k+1] = f ind = numpy.argsort(fsim) fsim = numpy.take(fsim,ind,0) # sort so sim[0,:] has the lowest function value sim = numpy.take(sim,ind,0) iterations = 1 while (fcalls[0] < maxfun and iterations < maxiter): if (max(numpy.ravel(abs(sim[1:]-sim[0]))) <= xtol \ and max(abs(fsim[0]-fsim[1:])) <= ftol): break xbar = numpy.add.reduce(sim[:-1],0) / N xr = (1+rho)*xbar - rho*sim[-1] fxr = func(xr) doshrink = 0 if fxr < fsim[0]: xe = (1+rho*chi)*xbar - rho*chi*sim[-1] fxe = func(xe) if fxe < fxr: sim[-1] = xe fsim[-1] = fxe else: sim[-1] = xr fsim[-1] = fxr else: # fsim[0] <= fxr if fxr < fsim[-2]: sim[-1] = xr fsim[-1] = fxr else: # fxr >= fsim[-2] # Perform contraction if fxr < fsim[-1]: xc = (1+psi*rho)*xbar - psi*rho*sim[-1] fxc = func(xc) if fxc <= fxr: sim[-1] = xc fsim[-1] = fxc else: doshrink=1 else: # Perform an inside contraction xcc = (1-psi)*xbar + psi*sim[-1] fxcc = func(xcc) if fxcc < fsim[-1]: sim[-1] = xcc fsim[-1] = fxcc else: doshrink = 1 if doshrink: for j in one2np1: sim[j] = sim[0] + sigma*(sim[j] - sim[0]) fsim[j] = func(sim[j]) ind = numpy.argsort(fsim) sim = numpy.take(sim,ind,0) fsim = numpy.take(fsim,ind,0) if callback is not None: callback(fcalls[0], sim[0], min(fsim)) iterations += 1 if retall: allvecs.append(sim[0]) x = sim[0] fval = min(fsim) warnflag = 0 if fcalls[0] >= maxfun: warnflag = 1 if disp: print "Warning: Maximum number of function evaluations has "\ "been exceeded." 
elif iterations >= maxiter: warnflag = 2 if disp: print "Warning: Maximum number of iterations has been exceeded" else: if disp: print "Optimization terminated successfully." print " Current function value: %f" % fval print " Iterations: %d" % iterations print " Function evaluations: %d" % fcalls[0] if full_output: retlist = x, fval, iterations, fcalls[0], warnflag if retall: retlist += (allvecs,) else: retlist = x if retall: retlist = (x, allvecs) return retlist def _cubicmin(a,fa,fpa,b,fb,c,fc): # finds the minimizer for a cubic polynomial that goes through the # points (a,fa), (b,fb), and (c,fc) with derivative at a of fpa. # # if no minimizer can be found return None # # f(x) = A *(x-a)^3 + B*(x-a)^2 + C*(x-a) + D C = fpa D = fa db = b-a dc = c-a if (db == 0) or (dc == 0) or (b==c): return None denom = (db*dc)**2 * (db-dc) d1 = empty((2,2)) d1[0,0] = dc**2 d1[0,1] = -db**2 d1[1,0] = -dc**3 d1[1,1] = db**3 [A,B] = numpy.dot(d1,asarray([fb-fa-C*db,fc-fa-C*dc]).flatten()) A /= denom B /= denom radical = B*B-3*A*C if radical < 0: return None if (A == 0): return None xmin = a + (-B + sqrt(radical))/(3*A) return xmin def _quadmin(a,fa,fpa,b,fb): # finds the minimizer for a quadratic polynomial that goes through # the points (a,fa), (b,fb) with derivative at a of fpa # f(x) = B*(x-a)^2 + C*(x-a) + D D = fa C = fpa db = b-a*1.0 if (db==0): return None B = (fb-D-C*db)/(db*db) if (B <= 0): return None xmin = a - C / (2.0*B) return xmin def zoom(a_lo, a_hi, phi_lo, phi_hi, derphi_lo, phi, derphi, phi0, derphi0, c1, c2): maxiter = 10 i = 0 delta1 = 0.2 # cubic interpolant check delta2 = 0.1 # quadratic interpolant check phi_rec = phi0 a_rec = 0 while 1: # interpolate to find a trial step length between a_lo and a_hi # Need to choose interpolation here. 
Use cubic interpolation and then if the # result is within delta * dalpha or outside of the interval bounded by a_lo or a_hi # then use quadratic interpolation, if the result is still too close, then use bisection dalpha = a_hi-a_lo; if dalpha < 0: a,b = a_hi,a_lo else: a,b = a_lo, a_hi # minimizer of cubic interpolant # (uses phi_lo, derphi_lo, phi_hi, and the most recent value of phi) # if the result is too close to the end points (or out of the interval) # then use quadratic interpolation with phi_lo, derphi_lo and phi_hi # if the result is stil too close to the end points (or out of the interval) # then use bisection if (i > 0): cchk = delta1*dalpha a_j = _cubicmin(a_lo, phi_lo, derphi_lo, a_hi, phi_hi, a_rec, phi_rec) if (i==0) or (a_j is None) or (a_j > b-cchk) or (a_j < a+cchk): qchk = delta2*dalpha a_j = _quadmin(a_lo, phi_lo, derphi_lo, a_hi, phi_hi) if (a_j is None) or (a_j > b-qchk) or (a_j < a+qchk): a_j = a_lo + 0.5*dalpha # print "Using bisection." # else: print "Using quadratic." # else: print "Using cubic." # Check new value of a_j phi_aj = phi(a_j) if (phi_aj > phi0 + c1*a_j*derphi0) or (phi_aj >= phi_lo): phi_rec = phi_hi a_rec = a_hi a_hi = a_j phi_hi = phi_aj else: derphi_aj = derphi(a_j) if abs(derphi_aj) <= -c2*derphi0: a_star = a_j val_star = phi_aj valprime_star = derphi_aj break if derphi_aj*(a_hi - a_lo) >= 0: phi_rec = phi_hi a_rec = a_hi a_hi = a_lo phi_hi = phi_lo else: phi_rec = phi_lo a_rec = a_lo a_lo = a_j phi_lo = phi_aj derphi_lo = derphi_aj i += 1 if (i > maxiter): a_star = a_j val_star = phi_aj valprime_star = None break return a_star, val_star, valprime_star def line_search(f, myfprime, xk, pk, gfk, old_fval, old_old_fval, args=(), c1=1e-4, c2=0.9, amax=50): """Find alpha that satisfies strong Wolfe conditions. :Parameters: f : callable f(x,*args) Objective function. myfprime : callable f'(x,*args) Objective function gradient (can be None). xk : ndarray Starting point. pk : ndarray Search direction. 
gfk : ndarray Gradient value for x=xk (xk being the current parameter estimate). args : tuple Additional arguments passed to objective function. c1 : float Parameter for Armijo condition rule. c2 : float Parameter for curvature condition rule. :Returns: alpha0 : float Alpha for which ``x_new = x0 + alpha * pk``. fc : int Number of function evaluations made. gc : int Number of gradient evaluations made. :Notes: Uses the line search algorithm to enforce strong Wolfe conditions. See Wright and Nocedal, 'Numerical Optimization', 1999, pg. 59-60. For the zoom phase it uses an algorithm by [...]. """ global _ls_fc, _ls_gc, _ls_ingfk _ls_fc = 0 _ls_gc = 0 _ls_ingfk = None def phi(alpha): global _ls_fc _ls_fc += 1 return f(xk+alpha*pk,*args) if isinstance(myfprime,type(())): def phiprime(alpha): global _ls_fc, _ls_ingfk _ls_fc += len(xk)+1 eps = myfprime[1] fprime = myfprime[0] newargs = (f,eps) + args _ls_ingfk = fprime(xk+alpha*pk,*newargs) # store for later use return numpy.dot(_ls_ingfk,pk) else: fprime = myfprime def phiprime(alpha): global _ls_gc, _ls_ingfk _ls_gc += 1 _ls_ingfk = fprime(xk+alpha*pk,*args) # store for later use return numpy.dot(_ls_ingfk,pk) alpha0 = 0 phi0 = old_fval derphi0 = numpy.dot(gfk,pk) alpha1 = pymin(1.0,1.01*2*(phi0-old_old_fval)/derphi0) if alpha1 == 0: # This shouldn't happen. Perhaps the increment has slipped below # machine precision? For now, set the return variables skip the # useless while loop, and raise warnflag=2 due to possible imprecision. 
alpha_star = None fval_star = old_fval old_fval = old_old_fval fprime_star = None phi_a1 = phi(alpha1) #derphi_a1 = phiprime(alpha1) evaluated below phi_a0 = phi0 derphi_a0 = derphi0 i = 1 maxiter = 10 while 1: # bracketing phase if alpha1 == 0: break if (phi_a1 > phi0 + c1*alpha1*derphi0) or \ ((phi_a1 >= phi_a0) and (i > 1)): alpha_star, fval_star, fprime_star = \ zoom(alpha0, alpha1, phi_a0, phi_a1, derphi_a0, phi, phiprime, phi0, derphi0, c1, c2) break derphi_a1 = phiprime(alpha1) if (abs(derphi_a1) <= -c2*derphi0): alpha_star = alpha1 fval_star = phi_a1 fprime_star = derphi_a1 break if (derphi_a1 >= 0): alpha_star, fval_star, fprime_star = \ zoom(alpha1, alpha0, phi_a1, phi_a0, derphi_a1, phi, phiprime, phi0, derphi0, c1, c2) break alpha2 = 2 * alpha1 # increase by factor of two on each iteration i = i + 1 alpha0 = alpha1 alpha1 = alpha2 phi_a0 = phi_a1 phi_a1 = phi(alpha1) derphi_a0 = derphi_a1 # stopping test if lower function not found if (i > maxiter): alpha_star = alpha1 fval_star = phi_a1 fprime_star = None break if fprime_star is not None: # fprime_star is a number (derphi) -- so use the most recently # calculated gradient used in computing it derphi = gfk*pk # this is the gradient at the next step no need to compute it # again in the outer loop. fprime_star = _ls_ingfk return alpha_star, _ls_fc, _ls_gc, fval_star, old_fval, fprime_star def line_search_BFGS(f, xk, pk, gfk, old_fval, args=(), c1=1e-4, alpha0=1): """Minimize over alpha, the function ``f(xk+alpha pk)``. Uses the interpolation algorithm (Armiijo backtracking) as suggested by Wright and Nocedal in 'Numerical Optimization', 1999, pg. 
56-57 :Returns: (alpha, fc, gc) """ xk = atleast_1d(xk) fc = 0 phi0 = old_fval # compute f(xk) -- done in past loop phi_a0 = f(*((xk+alpha0*pk,)+args)) fc = fc + 1 derphi0 = numpy.dot(gfk,pk) if (phi_a0 <= phi0 + c1*alpha0*derphi0): return alpha0, fc, 0, phi_a0 # Otherwise compute the minimizer of a quadratic interpolant: alpha1 = -(derphi0) * alpha0**2 / 2.0 / (phi_a0 - phi0 - derphi0 * alpha0) phi_a1 = f(*((xk+alpha1*pk,)+args)) fc = fc + 1 if (phi_a1 <= phi0 + c1*alpha1*derphi0): return alpha1, fc, 0, phi_a1 # Otherwise loop with cubic interpolation until we find an alpha which # satifies the first Wolfe condition (since we are backtracking, we will # assume that the value of alpha is not too small and satisfies the second # condition. while 1: # we are assuming pk is a descent direction factor = alpha0**2 * alpha1**2 * (alpha1-alpha0) a = alpha0**2 * (phi_a1 - phi0 - derphi0*alpha1) - \ alpha1**2 * (phi_a0 - phi0 - derphi0*alpha0) a = a / factor b = -alpha0**3 * (phi_a1 - phi0 - derphi0*alpha1) + \ alpha1**3 * (phi_a0 - phi0 - derphi0*alpha0) b = b / factor alpha2 = (-b + numpy.sqrt(abs(b**2 - 3 * a * derphi0))) / (3.0*a) phi_a2 = f(*((xk+alpha2*pk,)+args)) fc = fc + 1 if (phi_a2 <= phi0 + c1*alpha2*derphi0): return alpha2, fc, 0, phi_a2 if (alpha1 - alpha2) > alpha1 / 2.0 or (1 - alpha2/alpha1) < 0.96: alpha2 = alpha1 / 2.0 alpha0 = alpha1 alpha1 = alpha2 phi_a0 = phi_a1 phi_a1 = phi_a2 def approx_fprime(xk,f,epsilon,*args): f0 = f(*((xk,)+args)) grad = numpy.zeros((len(xk),), float) ei = numpy.zeros((len(xk),), float) for k in range(len(xk)): ei[k] = epsilon grad[k] = (f(*((xk+ei,)+args)) - f0)/epsilon ei[k] = 0.0 return grad def check_grad(func, grad, x0, *args): return sqrt(sum((grad(x0,*args)-approx_fprime(x0,func,_epsilon,*args))**2)) def approx_fhess_p(x0,p,fprime,epsilon,*args): f2 = fprime(*((x0+epsilon*p,)+args)) f1 = fprime(*((x0,)+args)) return (f2 - f1)/epsilon def fmin_bfgs(f, x0, fprime=None, args=(), gtol=1e-5, norm=Inf, epsilon=_epsilon, 
maxiter=None, full_output=0, disp=1, retall=0, callback=None): """Minimize a function using the BFGS algorithm. :Parameters: f : callable f(x,*args) Objective function to be minimized. x0 : ndarray Initial guess. fprime : callable f'(x,*args) Gradient of f. args : tuple Extra arguments passed to f and fprime. gtol : float Gradient norm must be less than gtol before succesful termination. norm : float Order of norm (Inf is max, -Inf is min) epsilon : int or ndarray If fprime is approximated, use this value for the step size. callback : callable An optional user-supplied function to call after each iteration. Called as callback(xk), where xk is the current parameter vector. :Returns: (xopt, {fopt, gopt, Hopt, func_calls, grad_calls, warnflag}, ) xopt : ndarray Parameters which minimize f, i.e. f(xopt) == fopt. fopt : float Minimum value. gopt : ndarray Value of gradient at minimum, f'(xopt), which should be near 0. Bopt : ndarray Value of 1/f''(xopt), i.e. the inverse hessian matrix. func_calls : int Number of function_calls made. grad_calls : int Number of gradient calls made. warnflag : integer 1 : Maximum number of iterations exceeded. 2 : Gradient and/or function calls not changing. allvecs : list Results at each iteration. Only returned if retall is True. *Other Parameters*: maxiter : int Maximum number of iterations to perform. full_output : bool If True,return fopt, func_calls, grad_calls, and warnflag in addition to xopt. disp : bool Print convergence message if True. retall : bool Return a list of results at each iteration if True. :Notes: Optimize the function, f, whose gradient is given by fprime using the quasi-Newton method of Broyden, Fletcher, Goldfarb, and Shanno (BFGS) See Wright, and Nocedal 'Numerical Optimization', 1999, pg. 198. *See Also*: scikits.openopt : SciKit which offers a unified syntax to call this and other solvers. 
""" x0 = asarray(x0).squeeze() if x0.ndim == 0: x0.shape = (1,) if maxiter is None: maxiter = len(x0)*200 func_calls, f = wrap_function(f, args) if fprime is None: grad_calls, myfprime = wrap_function(approx_fprime, (f, epsilon)) else: grad_calls, myfprime = wrap_function(fprime, args) gfk = myfprime(x0) k = 0 N = len(x0) I = numpy.eye(N,dtype=int) Hk = I old_fval = f(x0) old_old_fval = old_fval + 5000 xk = x0 if retall: allvecs = [x0] sk = [2*gtol] warnflag = 0 gnorm = vecnorm(gfk,ord=norm) while (gnorm > gtol) and (k < maxiter): pk = -numpy.dot(Hk,gfk) alpha_k = None if linesearch is not None: alpha_k, fc, gc, old_fval, old_old_fval, gfkp1 = \ linesearch.line_search(f,myfprime,xk,pk,gfk, old_fval,old_old_fval) if alpha_k is None: # line search failed try different one. alpha_k, fc, gc, old_fval, old_old_fval, gfkp1 = \ line_search(f,myfprime,xk,pk,gfk, old_fval,old_old_fval) if alpha_k is None: # line search(es) failed to find a better solution. warnflag = 2 break xkp1 = xk + alpha_k * pk if retall: allvecs.append(xkp1) sk = xkp1 - xk xk = xkp1 if gfkp1 is None: gfkp1 = myfprime(xkp1) yk = gfkp1 - gfk gfk = gfkp1 if callback is not None: callback(func_calls[0], xk, old_fval) k += 1 gnorm = vecnorm(gfk,ord=norm) if (gnorm <= gtol): break try: # this was handled in numeric, let it remaines for more safety rhok = 1.0 / (numpy.dot(yk,sk)) except ZeroDivisionError: rhok = 1000.0 print "Divide-by-zero encountered: rhok assumed large" if isinf(rhok): # this is patch for numpy rhok = 1000.0 print "Divide-by-zero encountered: rhok assumed large" A1 = I - sk[:,numpy.newaxis] * yk[numpy.newaxis,:] * rhok A2 = I - yk[:,numpy.newaxis] * sk[numpy.newaxis,:] * rhok Hk = numpy.dot(A1,numpy.dot(Hk,A2)) + rhok * sk[:,numpy.newaxis] \ * sk[numpy.newaxis,:] if disp or full_output: fval = old_fval if warnflag == 2: if disp: print "Warning: Desired error not necessarily achieved" \ "due to precision loss" print " Current function value: %f" % fval print " Iterations: %d" % k print " 
Function evaluations: %d" % func_calls[0] print " Gradient evaluations: %d" % grad_calls[0] elif k >= maxiter: warnflag = 1 if disp: print "Warning: Maximum number of iterations has been exceeded" print " Current function value: %f" % fval print " Iterations: %d" % k print " Function evaluations: %d" % func_calls[0] print " Gradient evaluations: %d" % grad_calls[0] else: if disp: print "Optimization terminated successfully." print " Current function value: %f" % fval print " Iterations: %d" % k print " Function evaluations: %d" % func_calls[0] print " Gradient evaluations: %d" % grad_calls[0] if full_output: retlist = xk, fval, gfk, Hk, func_calls[0], grad_calls[0], warnflag if retall: retlist += (allvecs,) else: retlist = xk if retall: retlist = (xk, allvecs) return retlist def fmin_cg(f, x0, fprime=None, args=(), gtol=1e-5, norm=Inf, epsilon=_epsilon, maxiter=None, full_output=0, disp=1, retall=0, callback=None): """Minimize a function using a nonlinear conjugate gradient algorithm. :Parameters: f : callable f(x,*args) Objective function to be minimized. x0 : ndarray Initial guess. fprime : callable f'(x,*args) Function which computes the gradient of f. args : tuple Extra arguments passed to f and fprime. gtol : float Stop when norm of gradient is less than gtol. norm : float Order of vector norm to use. -Inf is min, Inf is max. epsilon : float or ndarray If fprime is approximated, use this value for the step size (can be scalar or vector). callback : callable An optional user-supplied function, called after each iteration. Called as callback(xk), where xk is the current parameter vector. :Returns: (xopt, {fopt, func_calls, grad_calls, warnflag}, {allvecs}) xopt : ndarray Parameters which minimize f, i.e. f(xopt) == fopt. fopt : float Minimum value found, f(xopt). func_calls : int The number of function_calls made. grad_calls : int The number of gradient calls made. warnflag : int 1 : Maximum number of iterations exceeded. 
2 : Gradient and/or function calls not changing. allvecs : ndarray If retall is True (see other parameters below), then this vector containing the result at each iteration is returned. *Other Parameters*: maxiter : int Maximum number of iterations to perform. full_output : bool If True then return fopt, func_calls, grad_calls, and warnflag in addition to xopt. disp : bool Print convergence message if True. retall : bool return a list of results at each iteration if True. :Notes: Optimize the function, f, whose gradient is given by fprime using the nonlinear conjugate gradient algorithm of Polak and Ribiere See Wright, and Nocedal 'Numerical Optimization', 1999, pg. 120-122. """ x0 = asarray(x0).flatten() if maxiter is None: maxiter = len(x0)*200 func_calls, f = wrap_function(f, args) if fprime is None: grad_calls, myfprime = wrap_function(approx_fprime, (f, epsilon)) else: grad_calls, myfprime = wrap_function(fprime, args) gfk = myfprime(x0) k = 0 N = len(x0) xk = x0 old_fval = f(xk) old_old_fval = old_fval + 5000 if retall: allvecs = [xk] sk = [2*gtol] warnflag = 0 pk = -gfk gnorm = vecnorm(gfk,ord=norm) while (gnorm > gtol) and (k < maxiter): deltak = numpy.dot(gfk,gfk) # These values are modified by the line search, even if it fails old_fval_backup = old_fval old_old_fval_backup = old_old_fval alpha_k = None if linesearch is not None: alpha_k, fc, gc, old_fval, old_old_fval, gfkp1 = \ linesearch.line_search(f,myfprime,xk,pk,gfk,old_fval, old_old_fval,c2=0.4) if alpha_k is None: # line search failed -- use different one. alpha_k, fc, gc, old_fval, old_old_fval, gfkp1 = \ line_search(f,myfprime,xk,pk,gfk, old_fval_backup,old_old_fval_backup) if alpha_k is None or alpha_k == 0: # line search(es) failed to find a better solution. 
warnflag = 2 break xk = xk + alpha_k*pk if retall: allvecs.append(xk) if gfkp1 is None: gfkp1 = myfprime(xk) yk = gfkp1 - gfk beta_k = pymax(0,numpy.dot(yk,gfkp1)/deltak) pk = -gfkp1 + beta_k * pk gfk = gfkp1 gnorm = vecnorm(gfk,ord=norm) if callback is not None: callback(func_calls[0], xk, old_fval) k += 1 if disp or full_output: fval = old_fval if warnflag == 2: if disp: print "Warning: Desired error not necessarily achieved due to precision loss" print " Current function value: %f" % fval print " Iterations: %d" % k print " Function evaluations: %d" % func_calls[0] print " Gradient evaluations: %d" % grad_calls[0] elif k >= maxiter: warnflag = 1 if disp: print "Warning: Maximum number of iterations has been exceeded" print " Current function value: %f" % fval print " Iterations: %d" % k print " Function evaluations: %d" % func_calls[0] print " Gradient evaluations: %d" % grad_calls[0] else: if disp: print "Optimization terminated successfully." print " Current function value: %f" % fval print " Iterations: %d" % k print " Function evaluations: %d" % func_calls[0] print " Gradient evaluations: %d" % grad_calls[0] if full_output: retlist = xk, fval, func_calls[0], grad_calls[0], warnflag if retall: retlist += (allvecs,) else: retlist = xk if retall: retlist = (xk, allvecs) return retlist def fmin_ncg(f, x0, fprime, fhess_p=None, fhess=None, args=(), avextol=1e-5, epsilon=_epsilon, maxiter=None, full_output=0, disp=1, retall=0, callback=None): """Minimize a function using the Newton-CG method. :Parameters: f : callable f(x,*args) Objective function to be minimized. x0 : ndarray Initial guess. fprime : callable f'(x,*args) Gradient of f. fhess_p : callable fhess_p(x,p,*args) Function which computes the Hessian of f times an arbitrary vector, p. fhess : callable fhess(x,*args) Function to compute the Hessian matrix of f. args : tuple Extra arguments passed to f, fprime, fhess_p, and fhess (the same set of extra arguments is supplied to all of these functions). 
epsilon : float or ndarray If fhess is approximated, use this value for the step size. callback : callable An optional user-supplied function which is called after each iteration. Called as callback(n,xk,f), where xk is the current parameter vector. :Returns: (xopt, {fopt, fcalls, gcalls, hcalls, warnflag},{allvecs}) xopt : ndarray Parameters which minimizer f, i.e. ``f(xopt) == fopt``. fopt : float Value of the function at xopt, i.e. ``fopt = f(xopt)``. fcalls : int Number of function calls made. gcalls : int Number of gradient calls made. hcalls : int Number of hessian calls made. warnflag : int Warnings generated by the algorithm. 1 : Maximum number of iterations exceeded. allvecs : list The result at each iteration, if retall is True (see below). *Other Parameters*: avextol : float Convergence is assumed when the average relative error in the minimizer falls below this amount. maxiter : int Maximum number of iterations to perform. full_output : bool If True, return the optional outputs. disp : bool If True, print convergence message. retall : bool If True, return a list of results at each iteration. :Notes: 1. scikits.openopt offers a unified syntax to call this and other solvers. 2. Only one of `fhess_p` or `fhess` need to be given. If `fhess` is provided, then `fhess_p` will be ignored. If neither `fhess` nor `fhess_p` is provided, then the hessian product will be approximated using finite differences on `fprime`. `fhess_p` must compute the hessian times an arbitrary vector. If it is not given, finite-differences on `fprime` are used to compute it. See Wright, and Nocedal 'Numerical Optimization', 1999, pg. 140. 
""" x0 = asarray(x0).flatten() fcalls, f = wrap_function(f, args) gcalls, fprime = wrap_function(fprime, args) hcalls = 0 if maxiter is None: maxiter = len(x0)*200 xtol = len(x0)*avextol update = [2*xtol] xk = x0 if retall: allvecs = [xk] k = 0 old_fval = f(x0) while (numpy.add.reduce(abs(update)) > xtol) and (k < maxiter): # Compute a search direction pk by applying the CG method to # del2 f(xk) p = - grad f(xk) starting from 0. b = -fprime(xk) maggrad = numpy.add.reduce(abs(b)) eta = min([0.5,numpy.sqrt(maggrad)]) termcond = eta * maggrad xsupi = zeros(len(x0), dtype=x0.dtype) ri = -b psupi = -ri i = 0 dri0 = numpy.dot(ri,ri) if fhess is not None: # you want to compute hessian once. A = fhess(*(xk,)+args) hcalls = hcalls + 1 while numpy.add.reduce(abs(ri)) > termcond: if fhess is None: if fhess_p is None: Ap = approx_fhess_p(xk,psupi,fprime,epsilon) else: Ap = fhess_p(xk,psupi, *args) hcalls = hcalls + 1 else: Ap = numpy.dot(A,psupi) # check curvature Ap = asarray(Ap).squeeze() # get rid of matrices... curv = numpy.dot(psupi,Ap) if curv == 0.0: break elif curv < 0: if (i > 0): break else: xsupi = xsupi + dri0/curv * psupi break alphai = dri0 / curv xsupi = xsupi + alphai * psupi ri = ri + alphai * Ap dri1 = numpy.dot(ri,ri) betai = dri1 / dri0 psupi = -ri + betai * psupi i = i + 1 dri0 = dri1 # update numpy.dot(ri,ri) for next time. pk = xsupi # search direction is solution to system. 
gfk = -b # gradient at xk alphak, fc, gc, old_fval = line_search_BFGS(f,xk,pk,gfk,old_fval) update = alphak * pk xk = xk + update # upcast if necessary if callback is not None: callback(fcalls[0], xk, old_fval) if retall: allvecs.append(xk) k += 1 if disp or full_output: fval = old_fval if k >= maxiter: warnflag = 1 if disp: print "Warning: Maximum number of iterations has been exceeded" print " Current function value: %f" % fval print " Iterations: %d" % k print " Function evaluations: %d" % fcalls[0] print " Gradient evaluations: %d" % gcalls[0] print " Hessian evaluations: %d" % hcalls else: warnflag = 0 if disp: print "Optimization terminated successfully." print " Current function value: %f" % fval print " Iterations: %d" % k print " Function evaluations: %d" % fcalls[0] print " Gradient evaluations: %d" % gcalls[0] print " Hessian evaluations: %d" % hcalls if full_output: retlist = xk, fval, fcalls[0], gcalls[0], hcalls, warnflag if retall: retlist += (allvecs,) else: retlist = xk if retall: retlist = (xk, allvecs) return retlist def fminbound(func, x1, x2, args=(), xtol=1e-5, maxfun=500, full_output=0, disp=1): """Bounded minimization for scalar functions. :Parameters: func : callable f(x,*args) Objective function to be minimized (must accept and return scalars). x1, x2 : float or array scalar The optimization bounds. args : tuple Extra arguments passed to function. xtol : float The convergence tolerance. maxfun : int Maximum number of function evaluations allowed. full_output : bool If True, return optional outputs. disp : int If non-zero, print messages. 0 : no message printing. 1 : non-convergence notification messages only. 2 : print a message on convergence too. 3 : print iteration results. :Returns: (xopt, {fval, ierr, numfunc}) xopt : ndarray Parameters (over given interval) which minimize the objective function. fval : number The function value at the minimum point. 
ierr : int An error flag (0 if converged, 1 if maximum number of function calls reached). numfunc : int The number of function calls made. :Notes: Finds a local minimizer of the scalar function `func` in the interval x1 < xopt < x2 using Brent's method. (See `brent` for auto-bracketing). """ # Test bounds are of correct form if not (is_array_scalar(x1) and is_array_scalar(x2)): raise ValueError("Optimisation bounds must be scalars" " or array scalars.") if x1 > x2: raise ValueError("The lower bound exceeds the upper bound.") flag = 0 header = ' Func-count x f(x) Procedure' step=' initial' sqrt_eps = sqrt(2.2e-16) golden_mean = 0.5*(3.0-sqrt(5.0)) a, b = x1, x2 fulc = a + golden_mean*(b-a) nfc, xf = fulc, fulc rat = e = 0.0 x = xf fx = func(x,*args) num = 1 fmin_data = (1, xf, fx) ffulc = fnfc = fx xm = 0.5*(a+b) tol1 = sqrt_eps*abs(xf) + xtol / 3.0 tol2 = 2.0*tol1 if disp > 2: print (" ") print (header) print "%5.0f %12.6g %12.6g %s" % (fmin_data + (step,)) while ( abs(xf-xm) > (tol2 - 0.5*(b-a)) ): golden = 1 # Check for parabolic fit if abs(e) > tol1: golden = 0 r = (xf-nfc)*(fx-ffulc) q = (xf-fulc)*(fx-fnfc) p = (xf-fulc)*q - (xf-nfc)*r q = 2.0*(q-r) if q > 0.0: p = -p q = abs(q) r = e e = rat # Check for acceptability of parabola if ( (abs(p) < abs(0.5*q*r)) and (p > q*(a-xf)) and \ (p < q*(b-xf))): rat = (p+0.0) / q; x = xf + rat step = ' parabolic' if ((x-a) < tol2) or ((b-x) < tol2): si = numpy.sign(xm-xf) + ((xm-xf)==0) rat = tol1*si else: # do a golden section step golden = 1 if golden: # Do a golden-section step if xf >= xm: e=a-xf else: e=b-xf rat = golden_mean*e step = ' golden' si = numpy.sign(rat) + (rat == 0) x = xf + si*max([abs(rat), tol1]) fu = func(x,*args) num += 1 fmin_data = (num, x, fu) if disp > 2: print "%5.0f %12.6g %12.6g %s" % (fmin_data + (step,)) if fu <= fx: if x >= xf: a = xf else: b = xf fulc, ffulc = nfc, fnfc nfc, fnfc = xf, fx xf, fx = x, fu else: if x < xf: a = x else: b = x if (fu <= fnfc) or (nfc == xf): fulc, ffulc = nfc, 
fnfc nfc, fnfc = x, fu elif (fu <= ffulc) or (fulc == xf) or (fulc == nfc): fulc, ffulc = x, fu xm = 0.5*(a+b) tol1 = sqrt_eps*abs(xf) + xtol/3.0 tol2 = 2.0*tol1 if num >= maxfun: flag = 1 fval = fx if disp > 0: _endprint(x, flag, fval, maxfun, xtol, disp) if full_output: return xf, fval, flag, num else: return xf fval = fx if disp > 0: _endprint(x, flag, fval, maxfun, xtol, disp) if full_output: return xf, fval, flag, num else: return xf class Brent: #need to rethink design of __init__ def __init__(self, func, tol=1.48e-8, maxiter=500): self.func = func self.tol = tol self.maxiter = maxiter self._mintol = 1.0e-11 self._cg = 0.3819660 self.xmin = None self.fval = None self.iter = 0 self.funcalls = 0 self.brack = None self._brack_info = None #need to rethink design of set_bracket (new options, etc) def set_bracket(self, brack = None): self.brack = brack self._brack_info = self.get_bracket_info() def get_bracket_info(self): #set up func = self.func brack = self.brack ### BEGIN core bracket_info code ### ### carefully DOCUMENT any CHANGES in core ## if brack is None: xa,xb,xc,fa,fb,fc,funcalls = bracket(func) elif len(brack) == 2: xa,xb,xc,fa,fb,fc,funcalls = bracket(func, xa=brack[0], xb=brack[1]) elif len(brack) == 3: xa,xb,xc = brack if (xa > xc): # swap so xa < xc can be assumed dum = xa; xa=xc; xc=dum assert ((xa < xb) and (xb < xc)), "Not a bracketing interval." fa = func(xa) fb = func(xb) fc = func(xc) assert ((fb=xmid): deltax=a-x # do a golden section step else: deltax=b-x rat = _cg*deltax else: # do a parabolic step tmp1 = (x-w)*(fx-fv) tmp2 = (x-v)*(fx-fw) p = (x-v)*tmp2 - (x-w)*tmp1; tmp2 = 2.0*(tmp2-tmp1) if (tmp2 > 0.0): p = -p tmp2 = abs(tmp2) dx_temp = deltax deltax= rat # check parabolic fit if ((p > tmp2*(a-x)) and (p < tmp2*(b-x)) and (abs(p) < abs(0.5*tmp2*dx_temp))): rat = p*1.0/tmp2 # if parabolic step is useful. 
u = x + rat if ((u-a) < tol2 or (b-u) < tol2): if xmid-x >= 0: rat = tol1 else: rat = -tol1 else: if (x>=xmid): deltax=a-x # if it's not do a golden section step else: deltax=b-x rat = _cg*deltax if (abs(rat) < tol1): # update by at least tol1 if rat >= 0: u = x + tol1 else: u = x - tol1 else: u = x + rat fu = func(u) # calculate new output value funcalls += 1 if (fu > fx): # if it's bigger than current if (u= x): a = x else: b = x v=w; w=x; x=u fv=fw; fw=fx; fx=fu iter += 1 ################################# #END CORE ALGORITHM ################################# self.xmin = x self.fval = fx self.iter = iter self.funcalls = funcalls def get_result(self, full_output=False): if full_output: return self.xmin, self.fval, self.iter, self.funcalls else: return self.xmin def brent(func, brack=None, tol=1.48e-8, full_output=0, maxiter=500): """Given a function of one-variable and a possible bracketing interval, return the minimum of the function isolated to a fractional precision of tol. :Parameters: func : callable f(x) Objective function. brack : tuple Triple (a,b,c) where (a xc): # swap so xa < xc can be assumed dum = xa; xa=xc; xc=dum assert ((xa < xb) and (xb < xc)), "Not a bracketing interval." 
fa = func(*((xa,)+args)) fb = func(*((xb,)+args)) fc = func(*((xc,)+args)) assert ((fb abs(xb-xa)): x1 = xb x2 = xb + _gC*(xc-xb) else: x2 = xb x1 = xb - _gC*(xb-xa) f1 = func(*((x1,)+args)) f2 = func(*((x2,)+args)) funcalls += 2 while (abs(x3-x0) > tol*(abs(x1)+abs(x2))): if (f2 < f1): x0 = x1; x1 = x2; x2 = _gR*x1 + _gC*x3 f1 = f2; f2 = func(*((x2,)+args)) else: x3 = x2; x2 = x1; x1 = _gR*x2 + _gC*x0 f2 = f1; f1 = func(*((x1,)+args)) funcalls += 1 if (f1 < f2): xmin = x1 fval = f1 else: xmin = x2 fval = f2 if full_output: return xmin, fval, funcalls else: return xmin def bracket(func, xa=0.0, xb=1.0, args=(), grow_limit=110.0, maxiter=1000): """Given a function and distinct initial points, search in the downhill direction (as defined by the initital points) and return new points xa, xb, xc that bracket the minimum of the function f(xa) > f(xb) < f(xc). It doesn't always mean that obtained solution will satisfy xa<=x<=xb :Parameters: func : callable f(x,*args) Objective function to minimize. xa, xb : float Bracketing interval. args : tuple Additional arguments (if present), passed to `func`. grow_limit : float Maximum grow limit. maxiter : int Maximum number of iterations to perform. :Returns: xa, xb, xc, fa, fb, fc, funcalls xa, xb, xc : float Bracket. fa, fb, fc : float Objective function values in bracket. funcalls : int Number of function evaluations made. """ _gold = 1.618034 _verysmall_num = 1e-21 fa = func(*(xa,)+args) fb = func(*(xb,)+args) if (fa < fb): # Switch so fa > fb dum = xa; xa = xb; xb = dum dum = fa; fa = fb; fb = dum xc = xb + _gold*(xb-xa) fc = func(*((xc,)+args)) funcalls = 3 iter = 0 while (fc < fb): tmp1 = (xb - xa)*(fb-fc) tmp2 = (xb - xc)*(fb-fa) val = tmp2-tmp1 if abs(val) < _verysmall_num: denom = 2.0*_verysmall_num else: denom = 2.0*val w = xb - ((xb-xc)*tmp2-(xb-xa)*tmp1)/denom wlim = xb + grow_limit*(xc-xb) if iter > maxiter: raise RuntimeError, "Too many iterations." 
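The `golden` routine above shrinks a bracketing triple by the golden ratio until the interval is tight. A self-contained sketch of the same golden-section idea on a simple interval (this is an independent illustration, not the module's `golden`, which works on a three-point bracket):

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Locate the minimum of a unimodal f on [a, b] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi ~= 0.618
    x1 = b - invphi * (b - a)
    x2 = a + invphi * (b - a)
    f1, f2 = f(x1), f(x2)
    while (b - a) > tol:
        if f1 < f2:
            # minimum lies in [a, x2]; reuse x1 as the new upper probe
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = f(x1)
        else:
            # minimum lies in [x1, b]; reuse x2 as the new lower probe
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

xmin = golden_section(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

Each iteration reuses one previous function value, so the interval contracts by a factor of ~0.618 per single new evaluation, which is what makes the method efficient despite using no derivatives.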
iter += 1 if (w-xc)*(xb-w) > 0.0: fw = func(*((w,)+args)) funcalls += 1 if (fw < fc): xa = xb; xb=w; fa=fb; fb=fw return xa, xb, xc, fa, fb, fc, funcalls elif (fw > fb): xc = w; fc=fw return xa, xb, xc, fa, fb, fc, funcalls w = xc + _gold*(xc-xb) fw = func(*((w,)+args)) funcalls += 1 elif (w-wlim)*(wlim-xc) >= 0.0: w = wlim fw = func(*((w,)+args)) funcalls += 1 elif (w-wlim)*(xc-w) > 0.0: fw = func(*((w,)+args)) funcalls += 1 if (fw < fc): xb=xc; xc=w; w=xc+_gold*(xc-xb) fb=fc; fc=fw; fw=func(*((w,)+args)) funcalls += 1 else: w = xc + _gold*(xc-xb) fw = func(*((w,)+args)) funcalls += 1 xa=xb; xb=xc; xc=w fa=fb; fb=fc; fc=fw return xa, xb, xc, fa, fb, fc, funcalls def _linesearch_powell(linesearch, func, p, xi, tol): """Line-search algorithm using fminbound. Find the minimium of the function ``func(x0+ alpha*direc)``. """ def myfunc(alpha): return func(p + alpha * xi) alpha_min, fret, iter, num = linesearch(myfunc, full_output=1, tol=tol) xi = alpha_min*xi return squeeze(fret), p+xi, xi def fmin_powell(func, x0, args=(), xtol=1e-4, ftol=1e-4, maxiter=None, maxfun=None, full_output=0, disp=1, retall=0, callback=None, direc=None, linesearch=brent): """Minimize a function using modified Powell's method. :Parameters: func : callable f(x,*args) Objective function to be minimized. x0 : ndarray Initial guess. args : tuple Eextra arguments passed to func. callback : callable An optional user-supplied function, called after each iteration. Called as ``callback(n,xk,f)``, where ``xk`` is the current parameter vector. direc : ndarray Initial direction set. :Returns: (xopt, {fopt, xi, direc, iter, funcalls, warnflag}, {allvecs}) xopt : ndarray Parameter which minimizes `func`. fopt : number Value of function at minimum: ``fopt = func(xopt)``. direc : ndarray Current direction set. iter : int Number of iterations. funcalls : int Number of function calls made. warnflag : int Integer warning flag: 1 : Maximum number of function evaluations. 2 : Maximum number of iterations. 
allvecs : list List of solutions at each iteration. *Other Parameters*: xtol : float Line-search error tolerance. ftol : float Absolute error in ``func(xopt)`` acceptable for convergence. maxiter : int Maximum number of iterations to perform. maxfun : int Maximum number of function evaluations to make. full_output : bool If True, fopt, xi, direc, iter, funcalls, and warnflag are returned. disp : bool If True, print convergence messages. retall : bool If True, return a list of the solution at each iteration. :Notes: Uses a modification of Powell's method to find the minimum of a function of N variables. """ # we need to use a mutable object here that we can update in the # wrapper function fcalls, func = wrap_function(func, args) x = asarray(x0).flatten() if retall: allvecs = [x] N = len(x) rank = len(x.shape) if not -1 < rank < 2: raise ValueError, "Initial guess must be a scalar or rank-1 sequence." if maxiter is None: maxiter = N * 1000 if maxfun is None: maxfun = N * 1000 if direc is None: direc = eye(N, dtype=float) else: direc = asarray(direc, dtype=float) fval = squeeze(func(x)) x1 = x.copy() iter = 0; ilist = range(N) while True: fx = fval bigind = 0 delta = 0.0 for i in ilist: direc1 = direc[i] fx2 = fval fval, x, direc1 = _linesearch_powell(linesearch, func, x, direc1, xtol*100) if (fx2 - fval) > delta: delta = fx2 - fval bigind = i iter += 1 if callback is not None: callback(fcalls[0], x, fval, delta) if retall: allvecs.append(x) if abs(fx - fval) < ftol: break if fcalls[0] >= maxfun: break if iter >= maxiter: break # Construct the extrapolated point direc1 = x - x1 x2 = 2*x - x1 x1 = x.copy() fx2 = squeeze(func(x2)) if (fx > fx2): t = 2.0*(fx+fx2-2.0*fval) temp = (fx-fval-delta) t *= temp*temp temp = fx-fx2 t -= delta*temp*temp if t < 0.0: fval, x, direc1 = _linesearch_powell(linesearch, func, x, direc1, xtol*100) direc[bigind] = direc[-1] direc[-1] = direc1 warnflag = 0 if fcalls[0] >= maxfun: warnflag = 1 if disp: print "Warning: Maximum number of 
function evaluations has "\ "been exceeded." elif iter >= maxiter: warnflag = 2 if disp: print "Warning: Maximum number of iterations has been exceeded" else: if disp: print "Optimization terminated successfully." print " Current function value: %f" % fval print " Iterations: %d" % iter print " Function evaluations: %d" % fcalls[0] x = squeeze(x) if full_output: retlist = x, fval, direc, iter, fcalls[0], warnflag if retall: retlist += (allvecs,) else: retlist = x if retall: retlist = (x, allvecs) return retlist def _endprint(x, flag, fval, maxfun, xtol, disp): if flag == 0: if disp > 1: print "\nOptimization terminated successfully;\n" \ "The returned value satisfies the termination criteria\n" \ "(using xtol = ", xtol, ")" if flag == 1: print "\nMaximum number of function evaluations exceeded --- " \ "increase maxfun argument.\n" return def brute(func, ranges, args=(), Ns=20, full_output=0, finish=fmin): """Minimize a function over a given range by brute force. :Parameters: func : callable ``f(x,*args)`` Objective function to be minimized. ranges : tuple Each element is a tuple of parameters or a slice object to be handed to ``numpy.mgrid``. args : tuple Extra arguments passed to function. Ns : int Default number of samples, if those are not provided. full_output : bool If True, return the evaluation grid. :Returns: (x0, fval, {grid, Jout}) x0 : ndarray Value of arguments to `func`, giving minimum over the grid. fval : int Function value at minimum. grid : tuple Representation of the evaluation grid. It has the same length as x0. Jout : ndarray Function values over grid: ``Jout = func(*grid)``. :Notes: Find the minimum of a function evaluated on a grid given by the tuple ranges. """ N = len(ranges) if N > 40: raise ValueError, "Brute Force not possible with more " \ "than 40 variables." 
lrange = list(ranges) for k in range(N): if type(lrange[k]) is not type(slice(None)): if len(lrange[k]) < 3: lrange[k] = tuple(lrange[k]) + (complex(Ns),) lrange[k] = slice(*lrange[k]) if (N==1): lrange = lrange[0] def _scalarfunc(*params): params = squeeze(asarray(params)) return func(params,*args) vecfunc = vectorize(_scalarfunc) grid = mgrid[lrange] if (N==1): grid = (grid,) Jout = vecfunc(*grid) Nshape = shape(Jout) indx = argmin(Jout.ravel(),axis=-1) Nindx = zeros(N,int) xmin = zeros(N,float) for k in range(N-1,-1,-1): thisN = Nshape[k] Nindx[k] = indx % Nshape[k] indx = indx / thisN for k in range(N): xmin[k] = grid[k][tuple(Nindx)] Jmin = Jout[tuple(Nindx)] if (N==1): grid = grid[0] xmin = xmin[0] if callable(finish): vals = finish(func,xmin,args=args,full_output=1, disp=0) xmin = vals[0] Jmin = vals[1] if vals[-1] > 0: print "Warning: Final optimization did not succeed" if full_output: return xmin, Jmin, grid, Jout else: return xmin def main(): import time times = [] algor = [] x0 = [0.8,1.2,0.7] print "Nelder-Mead Simplex" print "===================" start = time.time() x = fmin(rosen,x0) print x times.append(time.time() - start) algor.append('Nelder-Mead Simplex\t') print print "Powell Direction Set Method" print "===========================" start = time.time() x = fmin_powell(rosen,x0) print x times.append(time.time() - start) algor.append('Powell Direction Set Method.') print print "Nonlinear CG" print "============" start = time.time() x = fmin_cg(rosen, x0, fprime=rosen_der, maxiter=200) print x times.append(time.time() - start) algor.append('Nonlinear CG \t') print print "BFGS Quasi-Newton" print "=================" start = time.time() x = fmin_bfgs(rosen, x0, fprime=rosen_der, maxiter=80) print x times.append(time.time() - start) algor.append('BFGS Quasi-Newton\t') print print "BFGS approximate gradient" print "=========================" start = time.time() x = fmin_bfgs(rosen, x0, gtol=1e-4, maxiter=100) print x times.append(time.time() - start) 
algor.append('BFGS without gradient\t') print print "Newton-CG with Hessian product" print "==============================" start = time.time() x = fmin_ncg(rosen, x0, rosen_der, fhess_p=rosen_hess_prod, maxiter=80) print x times.append(time.time() - start) algor.append('Newton-CG with hessian product') print print "Newton-CG with full Hessian" print "===========================" start = time.time() x = fmin_ncg(rosen, x0, rosen_der, fhess=rosen_hess, maxiter=80) print x times.append(time.time() - start) algor.append('Newton-CG with full hessian') print print "\nMinimizing the Rosenbrock function of order 3\n" print " Algorithm \t\t\t Seconds" print "===========\t\t\t =========" for k in range(len(algor)): print algor[k], "\t -- ", times[k] if __name__ == "__main__": main() PyCogent-1.5.3/cogent/maths/simannealingoptimiser.py000644 000765 000024 00000021777 12024702176 023625 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Simulated annealing optimiser. Derives from basic optimiser class. The simulated annealing optimiser is a translation into Python of the fortran program simman.f authored by Bill Goffe (bgoffe@whale.st.usm.edu). The original citation is "Global Optimization of Statistical Functions with Simulated Annealing," Goffe, Ferrier and Rogers, Journal of Econometrics, vol. 60, no. 1/2, Jan./Feb. 1994, pp. 65-100. 
""" from __future__ import division import numpy import random from collections import deque from cogent.util import checkpointing __author__ = "Andrew Butterfield and Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Andrew Butterfield", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class AnnealingSchedule(object): """Responsible for the shape of the simulated annealing temperature profile""" def __init__(self, temp_reduction, initial_temp, temp_iterations, step_cycles): if initial_temp < 0.0 : raise ValueError, "Initial temperature not +ve" self.T = self.initial_temp = initial_temp self.temp_reduction = temp_reduction self.temp_iterations = temp_iterations self.step_cycles = step_cycles self.dwell = temp_iterations * step_cycles def checkSameConditions(self, other): for attr in ['temp_reduction', 'initial_temp', 'temp_iterations', 'step_cycles']: if getattr(self, attr) != getattr(other, attr): raise ValueError('Checkpoint file ignored - %s different' % attr) def roundsToReach(self, T): from math import log return int(-log(self.initial_temp/T) / log(self.temp_reduction)) + 1 def cool(self): self.T = self.temp_reduction * self.T def willAccept(self, newF, oldF, random_series): deltaF = newF - oldF return deltaF >= 0 or random_series.uniform(0.0, 1.0) < numpy.exp(deltaF / self.T) class AnnealingHistory(object): """Keeps the last few results, for convergence testing""" def __init__(self, sample=4): self.sample_size = sample #self.values = deque([None]*sample, sample) Py2.6 self.values = deque([None]*sample) def note(self, F): self.values.append(F) # Next 2 lines not required once above Py2.6 line is uncommented if len(self.values) > self.sample_size: self.values.popleft() def minRemainingRounds(self, tolerance): last = self.values[-1] return max([0]+[i+1 for (i,v) in enumerate(self.values) if v is None 
or abs(v-last)>tolerance]) class AnnealingState(object): def __init__(self, X, function, random_series): self.random_series = random_series self.NFCNEV = 1 self.VM = numpy.ones(len(X), float) self.setX(X, function(X)) (self.XOPT, self.FOPT) = (X, self.F) self.NACP = [0] * len(X) self.NTRY = 0 def setX(self, X, F): self.X = numpy.array(X, float) self.F = F def step(self, function, accept_test): # One attempted move in each dimension X = self.X self.NTRY += 1 for H in range(len(X)): self.NFCNEV += 1 current_value = X[H] X[H] += self.VM[H] * self.random_series.uniform(-1.0, 1.0) F = function(X) if accept_test(F, self.F, self.random_series): self.NACP[H] += 1 self.F = F if F > self.FOPT: (self.FOPT, self.XOPT) = (F, X.copy()) else: X[H] = current_value def adjustStepSizes(self): # Adjust velocity in each dimension to keep acceptance ratios near 50% if self.NTRY == 0: return for I in range(len(self.X)): RATIO = (self.NACP[I]*1.0) / self.NTRY if RATIO > 0.6: self.VM[I] *= (1.0 + (2.0 * ((RATIO-0.6)/0.4))) elif RATIO < 0.4: self.VM[I] /= (1.0 + (2.0 * ((0.4 - RATIO)/0.4))) self.NACP[I] = 0 self.NTRY = 0 class AnnealingRun(object): def __init__(self, function, X, schedule, random_series): self.history = AnnealingHistory() self.schedule = schedule self.state = AnnealingState(X, function, random_series) self.test_count = 0 def checkFunction(self, function, xopt, checkpointing_filename): if len(xopt) != len(self.state.XOPT): raise ValueError( "Number of parameters in checkpoint file '%s' (%s) " \ "don't match current function (%s)" % ( checkpointing_filename, len(self.state.XOPT), len(xopt))) # if f(x) != g(x) then f isn't g. then = self.state.FOPT now = function(self.state.XOPT) if not numpy.allclose(now, then, 1e-8): raise ValueError( "Function to optimise doesn't match checkpoint file " \ "'%s': F=%s now, %s in file." 
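`AnnealingSchedule.willAccept` above is the Metropolis criterion for a *maximiser*: uphill moves are always taken, downhill moves with probability exp(deltaF/T). A standalone sketch showing how the temperature controls that acceptance (illustrative names; behaviour mirrors `willAccept`):

```python
import math
import random

def metropolis_accept(new_f, old_f, temperature, rng):
    """Accept improvements always; accept a worse value with prob exp(delta/T).

    This mirrors a maximising annealer, so delta = new_f - old_f.
    """
    delta = new_f - old_f
    return delta >= 0 or rng.uniform(0.0, 1.0) < math.exp(delta / temperature)

# At high temperature most downhill moves get through; as T cools the
# walk becomes effectively greedy.
rng = random.Random(42)
hot = sum(metropolis_accept(0.0, 1.0, 10.0, rng) for _ in range(1000))
cold = sum(metropolis_accept(0.0, 1.0, 0.01, rng) for _ in range(1000))
```

This is why the schedule matters: cooling too fast freezes the search into the nearest local optimum, while cooling slowly lets it escape early traps at the cost of more function evaluations.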
% ( checkpointing_filename, now, then)) def run(self, function, tolerance, checkpointer, show_remaining): state = self.state history = self.history schedule = self.schedule est_anneal_remaining = schedule.roundsToReach(tolerance/10) + 3 while True: min_history_remaining = history.minRemainingRounds(tolerance) if min_history_remaining == 0: break self.save(checkpointer) remaining = max(min_history_remaining, est_anneal_remaining) est_anneal_remaining += -1 for i in range(self.schedule.dwell): show_remaining(remaining + 1 - i/self.schedule.dwell, state.FOPT, schedule.T, state.NFCNEV) state.step(function, self.schedule.willAccept) self.test_count += 1 if self.test_count % schedule.step_cycles == 0: state.adjustStepSizes() history.note(state.F) state.setX(state.XOPT, state.FOPT) schedule.cool() self.save(checkpointer, final=True) return state def save(self, checkpointer, final=False): msg = "Number of function evaluations = %d; current F = %s" % \ (self.state.NFCNEV, self.state.FOPT) checkpointer.record(self, msg, final) class SimulatedAnnealing(object): """Simulated annealing optimiser for bounded functions """ def __init__(self, filename=None, interval=None, restore=True): """ Set the checkpointing filename and time interval. Arguments: - filename: name of the file to which data will be written. If None, no checkpointing will be done. - interval: time expressed in seconds - restore: flag to restore from this filename or not. will be set to 0 after restoration """ self.checkpointer = checkpointing.Checkpointer(filename, interval) self.restore = restore def maximise(self, function, xopt, show_remaining, random_series = None, seed = None, tolerance = None, temp_reduction = 0.5, init_temp=5.0, temp_iterations = 5, step_cycles = 20): """Optimise function(xopt). Arguments: - show_progress: whether the function values are printed as the optimisation proceeds. Default is True. 
- tolerance: the error condition for termination, default is 1E-6 - temp_reduction: the factor by which the annealing "temperature" is reduced, default is 0.5 - temp_iterations: the number of iterations before a temperature reduction, default is 5 - step_cycles: the number of cycles after which the step size is modified, default is 20 Returns optimised parameter vector xopt """ if tolerance is None: tolerance = 1E-6 if len(xopt) == 0: return xopt random_series = random_series or random.Random() if seed is not None: random_series.seed(seed) schedule = AnnealingSchedule( temp_reduction, init_temp, temp_iterations, step_cycles) if self.restore and self.checkpointer.available(): run = self.checkpointer.load() run.checkFunction(function, xopt, self.checkpointer.filename) run.schedule.checkSameConditions(schedule) else: run = AnnealingRun(function, xopt, schedule, random_series) self.restore = False result = run.run( function, tolerance, checkpointer = self.checkpointer, show_remaining = show_remaining) return result.XOPT PyCogent-1.5.3/cogent/maths/solve.py000644 000765 000024 00000010005 12024702176 020332 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" EPS = 1e-15 def bisection(func, a, b, args=(), xtol=1e-10, maxiter=400): """Bisection root-finding method. Given a function and an interval with func(a) * func(b) < 0, find the root between a and b. 
""" if b < a: (a, b) = (b, a) i = 1 eva = func(a,*args) evb = func(b,*args) assert (eva*evb < 0), "Must start with interval with func(a) * func(b) <0" while i<=maxiter: dist = (b-a)/2.0 p = a + dist if dist/max(1.0, abs(p)) < xtol: return p ev = func(p,*args) if ev == 0: return p i += 1 if ev*eva > 0: a = p else: b = p raise RuntimeError("bisection failed after %d iterations." % maxiter) def brent(func, a, b, args=(), xtol=1e-10, maxiter=100): """Fast and robust root-finding method. Given a function and an interval with func(a) * func(b) < 0, find the root between a and b. From Numerical Recipes """ if b < a: (a, b) = (b, a) i = 1 fa = func(a,*args) fb = func(b,*args) assert (fa*fb < 0), "Must start with interval with func(a) * func(b) <0" (c, fc) = (b, fb) while i<=maxiter: if fb * fc > 0.0: (c, fc) = (a, fa) d = e = b - a if abs(fc) < abs(fb): (a, fa) = (b, fb) (b, fb) = (c, fc) (c, fc) = (a, fa) tol1 = 2.0*EPS*abs(b)+0.5*xtol xm = 0.5 * (c-b) if abs(xm) <= tol1 or fb == 0: return b if abs(e) >= tol1 and abs(fa) > abs(fb): s = fb / fa if a == c: p = 2.0 * xm * s q = 1.0 - s else: q = fa / fc r = fb / fc p = s * (2.0*xm*q*(q-r)-(b-a)*(r-1.0)) q = (q-1.0)*(r-1.0)*(s-1.0) if p > 0.0: q = -1.0 * q p = abs(p) min1 = 3.0 * xm * q - abs(tol1*q) min2 = abs(e*q) if 2.0 * p < min(min1, min2): e = d d = p / q else: d = xm e = d else: d = xm e = d (a, fa) = (b, fb) if abs(d) > tol1: b += d elif xm < 0.0: b -= tol1 else: b += tol1 fb = func(b, *args) i += 1 raise RuntimeError("solver failed after %d iterations." 
% maxiter) def find_root(func, x, direction, bound, xtol=None, expected_exception=None): if xtol is None: xtol = 1e-10 def sign_func(z): # +ve if f(z) is +ve # zero if f(z) is -ve (what we want) # -ve if f(z) causes an error try: y = func(z) if y < 0: return 0 else: return 1 except expected_exception: return -1 # Bracket root # Start out ignoring the bound as that is likely to be an error- # prone part of the range, and if there are multiple roots we want the # one closest to x. x_range = abs(bound-x) max_delta = x_range / 5 if not max_delta: return None delta = min(0.01, max_delta) assert func(x) > 0 x2 = x while 1: x1 = x2 delta = min(delta*2, max_delta) x2 = x1 + direction * delta if direction * (x2 - bound) > 0: x2 = bound y = sign_func(x2) if y <= 0 or x2 == bound: break # Hit a bound (or error). # Look for -ve between the +ve x1 and error x2 if y == -1: x2 = bisection(sign_func, x1, x2, xtol=max(xtol, 1e-5)) y = sign_func(x2) if y != 0: return None return brent(func, x1, x2, xtol=xtol) PyCogent-1.5.3/cogent/maths/spatial/000755 000765 000024 00000000000 12024703630 020265 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/cogent/maths/stats/000755 000765 000024 00000000000 12024703630 017766 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/cogent/maths/svd.py000644 000765 000024 00000007375 12024702176 020016 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Performs singular-value decomposition on a set of Q-matrices.""" from __future__ import division from cogent.maths.stats.test import std # numpy.std is biased from cogent.maths.matrix_exponentiation import FastExponentiator as expm from cogent.maths.matrix_logarithm import logm #note: corrcoef and cov assume rows are observations, cols are variables from numpy import log, newaxis as NewAxis, array, zeros, product, sqrt, ravel,\ sum, sort, reshape, corrcoef, cov, mean from numpy.random import random from numpy.linalg import svd, eigvals __author__ = "Rob Knight" __copyright__ = "Copyright 
2007-2012, The Cogent Project" __contributors__ = ["Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" array_type = type(array([0.1])) def var(x): return std(x)**2 def ratio_two_best(eigenvalues): """Returns ratio of best to second best eigenvalue (from vector).""" try: sorted = sort(eigenvalues) return sorted[-1]/sorted[-2] except TypeError: #probably complex-valued eigs = abs(eigenvalues) sorted = sort(eigs) return sorted[-1]/sorted[-2] def ratio_best_to_sum(eigenvalues): """Returns ratio of best singular value to sum. Expects a vector. Corresponds to fraction of variance explained by best singular value.""" try: sorted = sort(eigenvalues) return sorted[-1]/sum(eigenvalues, axis=0) except TypeError: #probably complex-valued eigs = abs(eigenvalues) sorted = sort(eigs) return sorted[-1]/sum(eigenvalues, axis=0) def euclidean_distance(q1, q2): """Returns Euclidean distance between arrays q1 and q2.""" diff = ravel(q1 - q2) return sqrt(sum(diff*diff, axis=0)) def euclidean_norm(m): """Returns Euclidean norm of an array or matrix m.""" flattened = ravel(m) return sqrt(sum(flattened*flattened, axis=0)) def _dists_from_mean_slow(qs): """Returns distance of each item in qs from the mean. WARNING: Slow method used only for compatibility testing. Do not use. """ n = len(qs) average = mean(qs, axis=0) result = zeros(n, 'float64') for i in range(n): result[i] = euclidean_distance(average, qs[i]) return result def dists_from_v(a, v=None): """Returns vector of distances between each row in a from v. If v is None, returns distance between each row and the mean. """ if v is None: v = mean(a, axis=0) diff = a - v return(sqrt(sum(diff*diff, axis=1))) def weiss(eigens): """Returns Weiss(20003) statistic, sum(ln(1+i)) for i in vector of eigens.""" return sum(log(1+eigens), axis=0) def three_item_combos(items): """Iterates over the 3-item sets from items. 
Doesn't check that items are unique. """ total = len(items) for i in range(total-2): curr_i = items[i] for j in range(i+1, total-1): curr_j = items[j] for k in range(j+1, total): yield curr_i, curr_j, items[k] def two_item_combos(items): """Iterates over the 2-item sets from items. Doesn't check that items are unique. """ total = len(items) for i in range(total-1): curr_i = items[i] for j in range(i+1, total): yield curr_i, items[j] def pca_qs(flat_qs): """Returns Principal Components vector from correlations in flat_qs.""" return eigvals(corrcoef(flat_qs)) def pca_cov_qs(flat_qs): """Returns Principal Components vector from covariance in flat_qs.""" return eigvals(cov(flat_qs)) def svd_qs(flat_qs): """Returns singular vals from flat_qs directly (returns v, ignores u,w).""" return svd(flat_qs)[1] PyCogent-1.5.3/cogent/maths/unifrac/000755 000765 000024 00000000000 12024703630 020257 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/cogent/maths/unifrac/__init__.py000644 000765 000024 00000000534 12024702176 022376 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['fast_tree', 'fast_unifrac' ] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Micah Hamady", "Justin Kuczynski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Prototype" PyCogent-1.5.3/cogent/maths/unifrac/fast_tree.py000644 000765 000024 00000056045 12024702176 022623 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Fast tree and support functions for fast implementation of UniFrac """ from numpy import (logical_and, logical_or, sum, take, nonzero, repeat, array, concatenate, zeros, put, transpose, flatnonzero, newaxis, logical_xor, logical_not) from numpy.random import permutation from cogent.core.tree import PhyloNode __author__ = "Rob Knight and Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob 
Knight", "Micah Hamady", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight, Micah Hamady" __email__ = "rob@spot.colorado.edu, hamady@colorado.edu" __status__ = "Prototype" #bind reduce to local variables for speed in inner loops lar = logical_and.reduce lor = logical_or.reduce class UniFracTreeNode(PhyloNode): """Slightly extended PhyloNode treenode for use with UniFrac. Can expect Length and Name to be set by DndParser. """ def __nonzero__(self): """Returns True if self.Children.""" return bool(self.Children) def index_tree(t): """Returns tuple containing {node_id:node}, [node_id,first_child,last_child] Indexes nodes in-place as n._leaf_index. Algorithm is as follows: for each node in post-order traversal over tree: if the node has children: set an index on each child for each child with children: add the child and its start and end tips to the result """ id_index = {} #needs to be dict, not list, b/c adding out of order child_index = [] curr_index = 0 for n in t.traverse(self_before=False, self_after=True): for c in n.Children: c._leaf_index = curr_index id_index[curr_index] = c curr_index += 1 if c: #c has children itself, so need to add to result child_index.append((c._leaf_index, c.Children[0]._leaf_index,\ c.Children[-1]._leaf_index)) #handle root, which should be t itself t._leaf_index = curr_index id_index[curr_index] = t #only want to add to the child_index if t has children... if t.Children: child_index.append((t._leaf_index, t.Children[0]._leaf_index,\ t.Children[-1]._leaf_index)) return id_index, child_index def count_envs(lines, ignore_chars=0): """Reads env counts from lines. Returns dict of {name:{env:count}}. Assumes all name-env mappings are unique -- will overwrite counts with the last observed count rather than adding.
""" result = {} for line in lines: fields = line.split() #skip if we don't have at least label and field if len(fields) < 2: continue name, env = fields[:2] if ignore_chars: env = env[ignore_chars:] if len(fields) > 2: count = int(fields[2]) else: count = 1 if name not in result: result[name] = {} result[name][env] = count return result def sum_env_dict(envs): """Sums counts from the data structure produced by count_envs.""" return sum([sum(env.values()) for env in envs.values()]) def get_unique_envs(envs): """extract all unique envs from envs dict""" result = set() for v in envs.values(): result.update(v.keys()) #sort envs for convenience in testing and display return sorted(result), len(result) def index_envs(env_counts, tree_index, array_constructor=int): """Returns array of taxon x env with counts of the taxon in each env. env_counts should be the output of count_envs(lines). tree_index should be the id_index of index_tree(t). array_constructor is int by default (may need to change to float later to handle microarray data). 
""" num_nodes = len(tree_index) unique_envs, num_envs = get_unique_envs(env_counts) env_to_index = dict([(e, i) for i, e in enumerate(unique_envs)]) result = zeros((num_nodes, num_envs), array_constructor) #figure out taxon label to index map node_to_index = {} for i, node in tree_index.items(): if node.Name is not None: node_to_index[node.Name] = i #walk over env_counts, adding correct slots in array for name in env_counts: curr_row_index = node_to_index[name] for env, count in env_counts[name].items(): result[curr_row_index, env_to_index[env]] = count #return all the data structures we created; will be useful for other tasks return result, unique_envs, env_to_index, node_to_index def get_branch_lengths(tree_index): """Returns array of branch lengths, in tree index order.""" result = zeros(len(tree_index), float) for i, node in tree_index.items(): try: if node.Length is not None: result[i] = node.Length except AttributeError: pass return result def bind_to_array(tree_index, a): """Binds tree_index to array a, returning result in list. Takes as input list of (node, first_child, last_child) returns list of (node_row, child_rows) such that node_row points to the row of a that corresponds to the current node, and child_rows points to the row or rows of a that correspond to the direct children of the current node. Order is assumed to be traversal order, i.e. for the typical case of postorder traversal iterating over the items in the result and consolidating each time should give the same result as postorder traversal of the original tree. Should also be able to modify for preorder traversal. """ #note: range ends with end+1, not end, b/c end is included return [(a[node], a[start:end+1]) for node, start, end in tree_index] def bind_to_parent_array(t, a): """Binds tree to array a, returning result in list. Takes as input tree t with _leaf_index set. 
Returns list of (node_row, parent_row) such that node_row points to the row of a that corresponds to the current node, and parent_row points to the row of the parent. Order will be preorder traversal, i.e. for propagating attributes from the root to the tip. Typical usage of this function is to set up an array structure for many preorder traversals on the same tree, especially where you plan to change the data between traversals. """ result = [] for n in t.traverse(self_before=True, self_after=False): if n is not t: result.append([a[n._leaf_index], a[n.Parent._leaf_index]]) return result def _is_parent_empty(parent_children): """Returns True if the first element in a (parent,children) tuple is non-empty. This is used by delete_empty_parents to figure out which elements to keep. """ return bool(parent_children[0].sum()) def delete_empty_parents(bound_indices): """Deletes empty parents from a list of (parent, children) bound indices. Expects as input the output of bind_to_array. Returns copy rather than acting in-place because deletions from long lists are expensive. This has the effect of pruning the tree, but by just skipping over the parents who have no children rather than altering memory. This is expected to be faster than trimming the array because it avoids copy operations. For pairwise environment bootstrapping or jackknifing, run this after running bool_descendants (or similar) to delete parents that only have offspring that are in other environments. Note that this does _not_ collapse nodes so that you might have long "stalks" with many serial nodes. It might be worth testing whether collapsing these stalks provides time savings. """ return filter(_is_parent_empty, bound_indices) def traverse_reduce(bound_indices, f): """Applies a[i] = f(a[j:k]) over list of [(a[i], a[j:k])]. If list is in traversal order, has same effect as consolidating the function over the tree, only much faster.
Note that f(a[j:k]) must return an object that can be broadcast to the same shape as a[i], e.g. summing a 2D array to get a vector. """ for i, s in bound_indices: i[:] = f(s, 0) def bool_descendants(bound_indices): """For each internal node, sets col to True if any descendant is True.""" traverse_reduce(bound_indices, lor) def zero_branches_past_roots(bound_indices, sums): """Zeroes out internal nodes that are roots of each subtree.""" for i, ignore in bound_indices: i *= (i != sums) def sum_descendants(bound_indices): """For each internal node, sets col to sum of values in descendants.""" traverse_reduce(bound_indices, sum) class FitchCounterDense(object): """Returns parsimony result for set of child states, counting changes. WARNING: this version assumes that all tips are assigned to at least one env, and produces incorrect parsimony counts if this is not the case. """ def __init__(self): """Returns new FitchCounter, with Changes = 0.""" self.Changes = 0 def __call__(self, a, ignored): """Returns intersection(a), or, if zero, union(a).""" result = lar(a) if not result.any(): result = lor(a) self.Changes += 1 return result class FitchCounter(object): """Returns parsimony result for set of child states, counting changes. This version is slower but is robust to the case where some tips are missing envs. WARNING: logical_and.reduce(), if called on an empty array, will return all True if there are no values (I can only assume that it returns True because it is reporting that there are no values that return False: this isn't what I expected). Hence the code to explicitly trap this case based on the shape parameter. 
""" def __init__(self): """Returns new FitchCounter, with Changes = 0.""" self.Changes = 0 def __call__(self, a, ignored): """Returns intersection(a), or, if zero, union(a).""" nonzero_rows = a[a.sum(1).nonzero()] if len(nonzero_rows): result = lar(nonzero_rows) else: result = zeros(nonzero_rows.shape[-1], bool) if not result.any(): if nonzero_rows.any(): result = lor(nonzero_rows) self.Changes += 1 return result def fitch_descendants(bound_indices, counter=FitchCounter): """Sets each internal node to Fitch parsimony assignment, returns # changes.""" f = counter() traverse_reduce(bound_indices, f.__call__) return f.Changes def tip_distances(a, bound_indices, tip_indices): """Sets each tip to its distance from the root.""" for i, s in bound_indices: i += s mask = zeros(len(a)) put(mask, tip_indices, 1) a *= mask[:,newaxis] def permute_selected_rows(rows, orig, new, permutation_f=permutation): """Takes selected rows from orig, inserts into new in permuted order. NOTE: the traditional UniFrac permutation test, the P test, etc. shuffle the envs, i.e. they preserve all the correlations of seqs between envs. This function can also be used to shuffle each env (i.e. column) individually by applying it to column slices of orig and new. This latter method provides a potentially less biased but less conservative test. """ shuffled = take(rows, permutation_f(len(rows))) for r, s in zip(rows, shuffled): new[s] = orig[r] def prep_items_for_jackknife(col): """Takes column of a, returns vector with multicopy states unpacked. e.g. if index 3 has value 4, there will be 4 copies of index 3 in result. """ nz = flatnonzero(col) result = [repeat(array((i,)), col[i]) for i in nz] return concatenate(result) def jackknife_bool(orig_items, n, length, permutation_f=permutation): """Jackknifes vector of items so that only n remain. orig = flatnonzero(vec) length = len(vec) Returns all items if requested sample is larger than number of items. 
""" permuted = take(orig_items, permutation_f(len(orig_items))[:n]) result = zeros(length) put(result, permuted, 1) return result def jackknife_int(orig_items, n, length, permutation_f=permutation): """Jackknifes new vector from vector of orig items. Returns all items if requested sample is larger than number of items. """ result = zeros(length) permuted = take(orig_items, permutation_f(len(orig_items))[:n]) for p in permuted: result[p] += 1 return result def jackknife_array(mat, num_keep, axis=1, jackknife_f=jackknife_int, permutation_f=permutation): """ Jackknife array along specified axis, keeping specified num_keep""" cur_mat = mat if axis: cur_mat = mat.T num_r, num_c = cur_mat.shape jack_mat = zeros((num_r, num_c)) for row_ix in range(num_r): in_prepped_array = prep_items_for_jackknife(cur_mat[row_ix,:]) jack_mat[row_ix,:] = jackknife_f(orig_items=in_prepped_array, n=num_keep, length=num_c, permutation_f=permutation_f) if axis: jack_mat = jack_mat.T return jack_mat def unifrac(branch_lengths, i, j): """Calculates unifrac(i,j) from branch lengths and cols i and j of m. This is the original, unweighted UniFrac metric. branch_lengths should be a row vector, same length as # nodes in tree. i and j should be slices of states from m, same length as # nodes in tree. Slicing m (e.g. m[:,i]) returns a vector in the right format; note that it should be a row vector (the default), not a column vector. """ return 1 - ((branch_lengths*logical_and(i,j)).sum()/\ (branch_lengths*logical_or(i,j)).sum()) def unnormalized_unifrac(branch_lengths, i, j): """UniFrac, but omits normalization for frac of tree covered.""" return (branch_lengths*logical_xor(i,j)).sum()/branch_lengths.sum() def G(branch_lengths, i, j): """Calculates G(i,j) from branch lengths and cols i,j of m. This calculates fraction gain in branch length in i with respect to i+j, i.e. normalized for the parts of the tree that i and j cover. Note: G is the metric that we also call "asymmetric unifrac".
""" return (branch_lengths*logical_and(i, logical_not(j))).sum()/\ (branch_lengths*logical_or(i,j)).sum() def PD(branch_lengths, i): """Calculate PD(i) from branch lengths and col i of m. Calculates raw amount of branch length leading to tips in i, including branch length from the root. """ return (branch_lengths * i.astype(bool)).sum() def unnormalized_G(branch_lengths, i, j): """Calculates G(i,j) from branch length and cols i,j of m. This calculates the fraction gain in branch length of i with respect to j, divided by all the branch length in the tree. """ return (branch_lengths*logical_and(i, logical_not(j))).sum()/\ branch_lengths.sum() def unifrac_matrix(branch_lengths, m, metric=unifrac, is_symmetric=True): """Calculates unifrac(i,j) for all i,j in m. branch_lengths is the array of branch lengths. m is 2D array: rows are taxa, states are columns. Assumes that ancestral states have already been calculated (either by logical_or or Fitch). metric: metric to use for combining each pair of columns i and j. Default is unifrac. is_symmetric indicates whether the metric is symmetric. Default is True. """ num_cols = m.shape[-1] cols = [m[:,i] for i in range(num_cols)] result = zeros((num_cols,num_cols), float) if is_symmetric: #only calc half matrix and transpose for i in range(1, num_cols): first_col = cols[i] row_result = [] for j in range(i): second_col = cols[j] row_result.append(metric(branch_lengths, first_col, second_col)) result[i,:j+1] = row_result #note: can't use += because shared memory between a and transpose(a) result = result + transpose(result) else: #calc full matrix, incl. diagonal (which is probably 0...) for i in range(num_cols): first_col = cols[i] result[i] = [metric(branch_lengths, first_col, cols[j]) for \ j in range(num_cols)] return result def unifrac_one_sample(one_sample_idx, branch_lengths, m, metric=unifrac): """Calculates unifrac(one_sample_idx,j) for all environments j in m. branch_lengths is the array of branch lengths. 
m is 2D count array: rows are taxa (corresponding to branch_lengths), samples/states/envs are columns. Assumes that ancestral states have already been calculated (either by logical_or or Fitch). metric: metric to use when comparing two environments. Default is unifrac. must be called like: metric(branch_lengths, env1_counts, env2_counts) returns a numpy 1d array; for an asymmetric metric, entries are metric(one_sample, other), usually a row in the matrix returned by unifrac_matrix """ num_cols = m.shape[-1] cols = [m[:,i] for i in range(num_cols)] # result = zeros((num_cols), float) first_col = cols[one_sample_idx] # better to do loop into preallocated numpy array here? result = array([metric(branch_lengths, first_col, cols[j]) for \ j in range(num_cols)],'float') return result def env_unique_fraction(branch_lengths, m): """ Calculates unique branch length for each env. Returns unique branch len and unique fraction """ total_bl = branch_lengths.sum() if total_bl <= 0: raise ValueError, "total branch length in tree must be > 0" n_rows_nodes, n_col_envs = m.shape cols = [m[:, i] for i in range(n_col_envs)] col_sum = m.sum(1) env_bl_sums = zeros(n_col_envs) for env_ix, f in enumerate(cols): sing = (f == col_sum) # have to mask zeros put(sing, nonzero(f == 0), 0) env_bl_sums[env_ix] = (sing * branch_lengths).sum() return env_bl_sums, env_bl_sums/total_bl def unifrac_vector(branch_lengths, m, metric=unifrac): """Calculates unifrac(i, others) for each column i of m. Parameters as for unifrac_matrix. Use this when you want to calculate UniFrac or G of each state against the rest of the states, rather than of each state against each other state. """ num_cols = m.shape[-1] cols = [m[:, i] for i in range(num_cols)] col_sum = m.sum(1) return array([metric(branch_lengths, col, col_sum-col) for col in cols]) def PD_vector(branch_lengths, m, metric=PD): """Calculates metric(i) for each column i of m. Parameters as for unifrac_matrix.
Use this when you want to calculate PD or some other alpha diversity metric that depends solely on the branches within each state, rather than calculations that compare each state against each other state. """ return array([metric(branch_lengths, col) for col in m.T]) def _weighted_unifrac(branch_lengths, i, j, i_sum, j_sum): """Calculates weighted unifrac(i,j) from branch lengths and cols i,j of m. """ return (branch_lengths * abs((i/float(i_sum))-(j/float(j_sum)))).sum() def _branch_correct(tip_distances, i, j, i_sum, j_sum): """Calculates weighted unifrac branch length correction. tip_distances must be 0 except for tips. """ result = tip_distances.ravel()*((i/float(i_sum))+(j/float(j_sum))) return result.sum() def weighted_unifrac(branch_lengths, i, j, tip_indices, \ unifrac_f=_weighted_unifrac): """Returns weighted unifrac(i,j) from branch lengths and cols i,j of m. Must pass in tip indices to calculate sums. Note: this calculation is not used in practice because it has to recalc. the sum each time. More efficient to calculate the sum first and pass it into _weighted_unifrac directly, as weighted_unifrac_matrix does. """ i_sum = (take(i, tip_indices)).sum() j_sum = (take(j, tip_indices)).sum() return unifrac_f(branch_lengths, i, j, i_sum, j_sum) def weighted_unifrac_matrix(branch_lengths, m, tip_indices, bl_correct=False, tip_distances=None, unifrac_f=_weighted_unifrac): """Calculates weighted_unifrac(i,j) for all i,j in m. Requires tip_indices for calculating sums, etc. bl_correct: if True (default: False), applies branch length correction. tip_distances is required for normalization for weighted unifrac. 
""" num_cols = m.shape[-1] cols = [m[:,i] for i in range(num_cols)] #note that these will be row vecs sums = [take(m[:,i], tip_indices).sum() for i in range(num_cols)] result = zeros((num_cols,num_cols),float) for i in range(1, num_cols): i_sum = sums[i] first_col = cols[i] row_result = [] for j in range(i): second_col = cols[j] j_sum = sums[j] curr = unifrac_f(branch_lengths, first_col, \ second_col, i_sum, j_sum) if bl_correct: curr /= _branch_correct(tip_distances, first_col, \ second_col, i_sum, j_sum) row_result.append(curr) result[i,:j+1] = row_result result = result + transpose(result) return result def weighted_one_sample(one_sample_idx, branch_lengths, m, tip_indices, bl_correct=False, tip_distances=None, unifrac_f=_weighted_unifrac): """Calculates weighted_unifrac(one_sample_idx,j) for all environments j in m Requires tip_indices for calculating sums, etc. bl_correct: if True (default: False), applies branch length correction. tip_distances is required for normalization for weighted unifrac. """ num_cols = m.shape[-1] cols = [m[:,i] for i in range(num_cols)] #note that these will be row vecs sums = [take(m[:,i], tip_indices).sum() for i in range(num_cols)] result = zeros((num_cols),float) i_sum = sums[one_sample_idx] first_col = cols[one_sample_idx] row_result = [] for j in range(num_cols): second_col = cols[j] j_sum = sums[j] curr = unifrac_f(branch_lengths, first_col, \ second_col, i_sum, j_sum) if bl_correct: curr /= _branch_correct(tip_distances, first_col, \ second_col, i_sum, j_sum) result[j] = curr return result def weighted_unifrac_vector(branch_lengths, m, tip_indices, bl_correct=False, tip_distances=None, unifrac_f=_weighted_unifrac): """Calculates weighted_unifrac(i,rest) for i in m. Requires tip_indices for calculating sums, etc. bl_correct: if True (default: False), applies branch length correction. tip_distances is required for normalization for weighted unifrac. 
""" num_cols = m.shape[-1] cols = [m[:,i] for i in range(num_cols)] sums = [take(m[:,i], tip_indices).sum() for i in range(num_cols)] sum_of_cols = m.sum(1) sum_of_sums = sum(sums) result = [] for i, col in enumerate(cols): i_sum = sums[i] i_col = cols[i] rest_col = sum_of_cols - i_col rest_sum = sum_of_sums - i_sum curr = unifrac_f(branch_lengths, i_col, rest_col, i_sum, rest_sum) if bl_correct: curr /= _branch_correct(tip_distances, i_col, rest_col, i_sum, rest_sum) result.append(curr) return array(result) PyCogent-1.5.3/cogent/maths/unifrac/fast_unifrac.py000644 000765 000024 00000125323 12024702176 023307 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Fast implementation of UniFrac for use with very large datasets""" from random import shuffle from numpy import ones, ma, where from numpy.random import permutation from cogent.maths.unifrac.fast_tree import * # not imported by import * from cogent.maths.unifrac.fast_tree import _weighted_unifrac, _branch_correct from cogent.parse.tree import DndParser from cogent.cluster.metric_scaling import * from cogent.core.tree import PhyloNode, TreeError from cogent.cluster.UPGMA import UPGMA_cluster from cogent.phylo.nj import nj from StringIO import StringIO __author__ = "Rob Knight and Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Micah Hamady", "Daniel McDonald", "Justin Kuczynski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight, Micah Hamady" __email__ = "rob@spot.colorado.edu, hamady@colorado.edu" __status__ = "Prototype" UNIFRAC_DIST_MATRIX = "distance_matrix" UNIFRAC_PCOA = "pcoa" UNIFRAC_CLUST_ENVS = "cluster_envs" UNIFRAC_NJ_ENVS = "nj_envs" UNIFRAC_DIST_VECTOR = "distance_vector" UNIFRAC_VALID_MODES = set([UNIFRAC_DIST_MATRIX, UNIFRAC_PCOA, UNIFRAC_CLUST_ENVS,UNIFRAC_DIST_VECTOR, UNIFRAC_NJ_ENVS]) UNIFRAC_DEFAULT_MODES = set([UNIFRAC_DIST_MATRIX, UNIFRAC_PCOA, UNIFRAC_CLUST_ENVS]) TEST_ON_PAIRWISE = "Pairwise" 
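The distance-matrix modes defined above are ultimately built on the unweighted unifrac() metric from cogent.maths.unifrac.fast_tree (imported above via *). The following is a minimal, self-contained numpy sketch of that formula on toy data; the branch lengths and presence/absence columns are invented purely for illustration:

```python
import numpy as np

# Hypothetical toy data: 4 tree nodes with branch lengths, and two
# environment presence/absence columns as produced by bool_descendants.
branch_lengths = np.array([1.0, 2.0, 1.0, 0.5])
env_i = np.array([1, 1, 0, 0], bool)
env_j = np.array([0, 1, 1, 0], bool)

def unweighted_unifrac(bl, i, j):
    # Fraction of branch length NOT shared by the two environments,
    # normalized by the branch length covered by their union
    # (same formula as fast_tree.unifrac).
    shared = (bl * np.logical_and(i, j)).sum()
    union = (bl * np.logical_or(i, j)).sum()
    return 1 - shared / union

print(unweighted_unifrac(branch_lengths, env_i, env_j))  # 1 - 2.0/4.0 = 0.5
```

Identical columns give 0 (all covered branch length is shared) and disjoint columns give 1, matching the metric's interpretation as a dissimilarity.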
TEST_ON_TREE = "Tree" TEST_ON_ENVS = "Envs" def identity(x): return range(x) def num_comps(num_envs): """ Calc number of comparisons for Bonferroni correction """ return (num_envs * (num_envs-1.0)) / 2.0 def mcarlo_sig(real_val, sim_vals, num_comps, tail='low'): """ Calc significance value for Monte Carlo simulation real_val: real value of test sim_vals: numpy array of values from simulation runs num_comps: number of comparisons (for Bonferroni correction) tail: which side of distribution to check, high=larger, low=smaller returns (raw pval, corrected pval) """ if num_comps < 1: raise ValueError, "num_comps must be > 0" sim_vals = array(sim_vals) out_count = 0 pop_size = float(len(sim_vals)) if tail == 'low': out_count = sum(sim_vals < real_val) elif tail == 'high': out_count = sum(sim_vals > real_val) else: raise ValueError, "tail must be 'low' or 'high'" raw_pval = out_count/pop_size cor_pval = raw_pval * num_comps # reset to 1.0 if corrected value > 1.0 if cor_pval > 1.0: cor_pval = 1.0 elif raw_pval == 0.0: cor_pval = "<=%.1e" % (1.0/pop_size) return (raw_pval, cor_pval) def fast_unifrac_file(tree_in, envs_in, weighted=False, metric=unifrac, is_symmetric=True, modes=UNIFRAC_DEFAULT_MODES): """Takes tree and envs file and returns fast_unifrac() results typical results: distance matrix, UPGMA cluster and PCoA tree_in: open file object or list of lines with tree in Newick format envs_in: open file object or list of lines with tab delimited sample mapping file see fast_unifrac() for further description """ tree = DndParser(tree_in, UniFracTreeNode) envs = count_envs(envs_in) return fast_unifrac(tree, envs, weighted, metric, is_symmetric=is_symmetric, modes=modes) def fast_unifrac_permutations_file(tree_in, envs_in, weighted=False, num_iters=1000, verbose=False, test_on=TEST_ON_PAIRWISE): """ Wrapper to read tree and envs from files.
""" result = [] t = DndParser(tree_in, UniFracTreeNode) envs = count_envs(envs_in) if test_on == TEST_ON_PAIRWISE: # calculate real values results = fast_unifrac(t, envs, weighted=weighted, metric=unifrac, is_symmetric=True, modes=[UNIFRAC_DIST_MATRIX]) real_env_mat, unique_envs = results[UNIFRAC_DIST_MATRIX] num_uenvs = real_env_mat.shape[0] cur_num_comps = num_comps(num_uenvs) for i in range(num_uenvs): first_env = unique_envs[i] for j in range(i+1, num_uenvs): second_env = unique_envs[j] real = real_env_mat[i][j] sim = fast_unifrac_permutations(t, envs, weighted, num_iters, first_env=first_env, second_env=second_env) raw_pval, cor_pval = mcarlo_sig(real, sim, cur_num_comps, tail='high') result.append((first_env, second_env, raw_pval, cor_pval)) if verbose: print "env %s vs %s" % (first_env, second_env) print raw_pval, cor_pval, num_uenvs, cur_num_comps, 'high' # calculate single p-value for whole tree elif test_on == TEST_ON_TREE: # will be using env_unique_fraction real_ufracs, sim_ufracs = fast_unifrac_whole_tree(t, envs, num_iters) raw_pval, cor_pval = mcarlo_sig(sum(real_ufracs), [sum(x) for x in sim_ufracs], 1, tail='high') result.append(('whole tree', raw_pval, cor_pval)) # calculate one p-value per env elif test_on == TEST_ON_ENVS: unique_envs, num_uenvs = get_unique_envs(envs) real_ufracs, sim_ufracs = fast_unifrac_whole_tree(t, envs, num_iters) sim_m = array(sim_ufracs) # for each env, calc p-val for i in range(len(real_ufracs)): raw_pval, cor_pval = mcarlo_sig(real_ufracs[i], sim_m[:,i], 1, tail='high') result.append((unique_envs[i], raw_pval, cor_pval)) else: raise ValueError, "Invalid test_on value: %s" % str(test_on) return result def fast_p_test_file(tree_in, envs_in, num_iters=1000, verbose=False, test_on=TEST_ON_PAIRWISE): """ Wrapper to read tree and envs from files.
""" result = [] t = DndParser(tree_in, UniFracTreeNode) envs = count_envs(envs_in) unique_envs, num_uenvs = get_unique_envs(envs) # calculate real, sim vals and p-vals for each pair of envs in tree if test_on == TEST_ON_PAIRWISE: cur_num_comps = num_comps(num_uenvs) for i in range(num_uenvs): first_env = unique_envs[i] for j in range(i+1, num_uenvs): second_env = unique_envs[j] real = fast_p_test(t, envs, num_iters=1, first_env=first_env, second_env=second_env, permutation_f=identity)[0] sim = fast_p_test(t, envs, num_iters, first_env=first_env, second_env=second_env) raw_pval, cor_pval = mcarlo_sig(real, sim, cur_num_comps, tail='low') result.append((first_env, second_env, raw_pval, cor_pval)) if verbose: print "P Test: env %s vs %s" % (first_env, second_env) print raw_pval, cor_pval, num_uenvs, cur_num_comps, 'low' # calculate real, sim vals and p-vals for whole tree elif test_on == TEST_ON_TREE: real = fast_p_test(t, envs, num_iters=1, permutation_f=identity)[0] sim = fast_p_test(t, envs, num_iters) raw_pval, cor_pval = mcarlo_sig(real, sim, 1, tail='low') result.append(('Whole Tree', raw_pval, cor_pval)) else: raise ValueError, "Invalid test_on value: %s" % str(test_on) return result def _fast_unifrac_setup(t, envs, make_subtree=True): """Setup shared by fast_unifrac and by significance tests.""" if make_subtree: t2 = t.copy() wanted = set(envs.keys()) def delete_test(node): if node.istip() and node.Name not in wanted: return True return False t2.removeDeleted(delete_test) t2.prune() t = t2 #index tree node_index, nodes = index_tree(t) #get good nodes, defined as those that are in the env file. good_nodes=dict([(i.Name,envs[i.Name]) for i in t.tips() if i.Name in envs]) envs = good_nodes count_array, unique_envs, env_to_index, node_to_index = index_envs(envs, node_index) env_names = sorted(unique_envs) #Note: envs get sorted at the step above branch_lengths = get_branch_lengths(node_index) if not envs: raise ValueError, "No valid samples/environments found.
Check whether tree tips match otus/taxa present in samples/environments" return envs, count_array, unique_envs, env_to_index, node_to_index, env_names, branch_lengths, nodes, t def fast_unifrac_whole_tree(t, envs, num_iters, permutation_f=permutation): """Performs UniFrac permutations on whole tree """ sim_ufracs = [] envs, count_array, unique_envs, env_to_index, node_to_index, env_names, \ branch_lengths, nodes, t = _fast_unifrac_setup(t, envs) bound_indices = bind_to_array(nodes, count_array) orig_count_array = count_array.copy() # calculate real values bool_descendants(bound_indices) real_bl_sums, real_bl_ufracs = env_unique_fraction(branch_lengths, count_array) tip_indices = [n._leaf_index for n in t.tips()] for i in range(num_iters): permute_selected_rows(tip_indices, orig_count_array, count_array, permutation_f) bool_descendants(bound_indices) cur_bl_sums, cur_bl_ufracs = env_unique_fraction(branch_lengths, count_array) sim_ufracs.append(cur_bl_ufracs) return real_bl_ufracs, sim_ufracs def PD_whole_tree(t, envs): """Run PD on t and envs for each env. Note: this is specific for PD per se, use PD_generic_whole_tree if you want to calculate a related metric. """ envs, count_array, unique_envs, env_to_index, node_to_index, env_names, \ branch_lengths, nodes, t = _fast_unifrac_setup(t, envs) count_array = count_array.astype(bool) bound_indices = bind_to_array(nodes, count_array) #initialize result bool_descendants(bound_indices) result = (branch_lengths * count_array.T).sum(1) return unique_envs, result def PD_generic_whole_tree(t, envs, metric=PD): """Run metric on t and envs for each env. Note: this is the generic version of PD_whole_tree; pass a different metric (default is PD) to calculate a related measure.
""" envs, count_array, unique_envs, env_to_index, node_to_index, env_names, \ branch_lengths, nodes, t = _fast_unifrac_setup(t, envs) count_array = count_array.astype(bool) bound_indices = bind_to_array(nodes, count_array) #initialize result bool_descendants(bound_indices) result = PD_vector(branch_lengths, count_array,metric) return unique_envs, result def fast_unifrac_permutations(t, envs, weighted, num_iters, first_env, second_env, permutation_f=permutation, unifrac_f=_weighted_unifrac): """Performs UniFrac permutations between specified pair of environments. NOTE: this function just gives you the result of the permutations, need to compare to real values from doing a single unifrac. """ result = [] envs, count_array, unique_envs, env_to_index, node_to_index, env_names, branch_lengths, nodes, t = _fast_unifrac_setup(t, envs) first_index,second_index = env_to_index[first_env], env_to_index[second_env] count_array = count_array[:,[first_index,second_index]] #ditch rest of array bound_indices = bind_to_array(nodes, count_array) orig_count_array = count_array.copy() first_col, second_col = count_array[:,0], count_array[:,1] tip_indices = [n._leaf_index for n in t.tips()] #figure out whether doing weighted or unweighted analysis: for weighted, #need to figure out root-to-tip distances, but can skip this step if #doing unweighted analysis. 
    if weighted:
        tip_ds = branch_lengths.copy()[:,newaxis]
        bindings = bind_to_parent_array(t, tip_ds)
        tip_distances(tip_ds, bindings, tip_indices)
        if weighted == 'correct':
            bl_correct = True
        else:
            bl_correct = False
        first_sum, second_sum = [sum(take(count_array[:,i], tip_indices))
            for i in range(2)]
        for i in range(num_iters):
            permute_selected_rows(tip_indices, orig_count_array, count_array,
                permutation_f)
            sum_descendants(bound_indices)
            curr = unifrac_f(branch_lengths, first_col, second_col,
                first_sum, second_sum)
            if bl_correct:
                curr /= _branch_correct(tip_ds, first_col, second_col,
                    first_sum, second_sum)
            result.append(curr)
    else:
        for i in range(num_iters):
            permute_selected_rows(tip_indices, orig_count_array, count_array,
                permutation_f)
            bool_descendants(bound_indices)
            curr = unifrac(branch_lengths, first_col, second_col)
            result.append(curr)
    return result

def fast_p_test(t, envs, num_iters, first_env=None, second_env=None,
    permutation_f=permutation):
    """Performs Andy Martin's p test between specified pair of environments.

    t: tree
    envs: envs
    first_env: name of first env, or None if doing whole tree
    second_env: name of second env, or None if doing whole tree

    NOTE: this function just gives you the result of the permutations; you
    need to compare to real Fitch parsimony values. A quick way to get the
    real values is to set num_iters to 1 and permutation_f to identity.
    """
    result = []
    envs, count_array, unique_envs, env_to_index, node_to_index, env_names, \
        branch_lengths, nodes, t = _fast_unifrac_setup(t, envs)
    # check if doing pairwise
    if not (first_env is None or second_env is None):
        first_ix, second_ix = env_to_index[first_env], env_to_index[second_env]
        count_array = count_array[:,[first_ix,second_ix]] #ditch rest of array
    # error if exactly one env was specified; both None means whole tree
    elif not (first_env is None and second_env is None):
        raise ValueError, "Both envs must either have a value or be None."
    bound_indices = bind_to_array(nodes, count_array)
    orig_count_array = count_array.copy()
    tip_indices = [n._leaf_index for n in t.tips()]
    for i in range(num_iters):
        count_array *= 0
        permute_selected_rows(tip_indices, orig_count_array, count_array,
            permutation_f=permutation_f)
        curr = fitch_descendants(bound_indices)
        result.append(curr)
    return result

def shared_branch_length(t, envs, env_count=1):
    """Returns the shared branch length for env_count combinations of envs.

    t: phylogenetic tree relating the sequences.
    envs: dict of {sequence:{env:count}} showing environmental abundance.
    env_count: number of envs that must be within the subtree

    Returns {(env1,env2,...env_count):shared_branch_length}
    """
    envs, count_array, unique_envs, env_to_index, node_to_index, env_names, \
        branch_lengths, nodes, t = _fast_unifrac_setup(t, envs)
    if len(unique_envs) < env_count:
        raise ValueError, "Not enough environments for env_count"
    index_to_env = dict([(i,e) for i,e in enumerate(unique_envs)])
    bound_indices = bind_to_array(nodes, count_array)
    bool_descendants(bound_indices)
    # determine what taxa meet the required number of environments
    count_array = where(count_array > 0, 1, 0)
    counts = count_array.sum(axis=1)
    taxa_to_investigate = (counts == env_count).nonzero()[0]
    # determine which environments correspond to which taxa
    envs_to_investigate = {}
    for row_index in taxa_to_investigate:
        taxa_envs = count_array[row_index]
        row_envs = tuple([index_to_env[i] for i,v in enumerate(taxa_envs) if v])
        try:
            envs_to_investigate[row_envs].append(row_index)
        except KeyError:
            envs_to_investigate[row_envs] = [row_index]
    # compute shared branch length for each combination of environments
    result = {}
    for envs_tuple, taxa_indices in envs_to_investigate.items():
        valid_rows = zeros(len(count_array))
        for i in taxa_indices:
            valid_rows[i] = 1.0
        result[envs_tuple] = sum(branch_lengths * valid_rows)
    return result

def shared_branch_length_to_root(t, envs):
    """Returns the shared branch length for a single env from tips to root.

    t: phylogenetic tree relating sequences
    envs: dict of {sequence:{env:count}} showing environmental abundance

    Returns {env:shared_branch_length}
    """
    working_t = t.copy()
    result = {}
    # decorate nodes with environment information
    for n in working_t.postorder():
        # for tip, grab and set env information
        if n.isTip():
            curr_envs = envs.get(n.Name, None)
            if curr_envs is None:
                n.Envs = set([])
            else:
                n.Envs = set(curr_envs.keys())
        # for internal node, collect descending env information
        else:
            n.Envs = set([])
            # should only visit each internal node once
            for c in n.Children:
                n.Envs.update(c.Envs)
    # collect branch length for each environment
    for n in working_t.preorder(include_self=False):
        if not hasattr(n, 'Length') or n.Length is None:
            continue
        for e in n.Envs:
            if e not in result:
                result[e] = 0.0
            result[e] += n.Length
    return result

def fast_unifrac(t, envs, weighted=False, metric=unifrac, is_symmetric=True,
    modes=UNIFRAC_DEFAULT_MODES, weighted_unifrac_f=_weighted_unifrac,
    make_subtree=True):
    """Run fast unifrac.

    t: phylogenetic tree relating the sequences. pycogent PhyloNode object
    envs: dict of {sequence:{env:count}} showing environmental abundance.
    weighted: if True, performs the weighted UniFrac procedure.
    metric: distance metric to use. Currently you must leave this as unifrac
        if weighted=True. See fast_tree.py for metrics (e.g.: G,
        unnormalized_G, unifrac, etc.)
    modes: tasks to perform on running unifrac. See fast_unifrac.py. Default
        is to get a unifrac distance matrix, pcoa on that matrix, and a
        cluster of the environments
    is_symmetric: if the desired distance matrix is symmetric
        (dist(sampleA, sampleB) == dist(sampleB, sampleA)), set this True to
        avoid calculating the same number twice

    Using default modes, returns a dictionary with the following (key:value)
    pairs:
    'distance_matrix': a tuple with a numpy array of pairwise distances
        between samples and a list of names describing the order of samples
        in the array
    'cluster_envs': cogent.core.PhyloNode object containing results of
        running UPGMA on the distance matrix.
    'pcoa': a cogent.util.Table object with the results of running Principal
        Coordinates Analysis on the distance matrix.
Usage examples: (these assume the example files exist) from cogent.parse.tree import DndParser from cogent.maths.unifrac.fast_unifrac import count_envs, fast_unifrac from cogent.maths.unifrac.fast_tree import UniFracTreeNode, count_envs, G tree_in = open('Crump_et_al_example.tree') envs_in = open('Crump_et_al_example_env_file.txt') tree = DndParser(tree_in, UniFracTreeNode) envs = count_envs(envs_in) unifrac_result = fast_unifrac(tree, envs) G_result = fast_unifrac(tree, envs, metric=G, is_symmetric=False) WARNING: PCoA on asymmetric matrices (e.g.: G metric) is meaningless because these are not distance matrices. """ modes = set(modes) #allow list, etc. of modes to be passed in. if not modes or modes - UNIFRAC_VALID_MODES: raise ValueError, "Invalid run modes: %s, valid: %s" % (str(modes),str(UNIFRAC_VALID_MODES)) envs, count_array, unique_envs, env_to_index, node_to_index, env_names, branch_lengths, nodes, t = _fast_unifrac_setup(t, envs, make_subtree) bound_indices = bind_to_array(nodes, count_array) #initialize result result = {} #figure out whether doing weighted or unweighted analysis: for weighted, #need to figure out root-to-tip distances, but can skip this step if #doing unweighted analysis. 
    if weighted:
        tip_indices = [n._leaf_index for n in t.tips()]
        sum_descendants(bound_indices)
        tip_ds = branch_lengths.copy()[:,newaxis]
        bindings = bind_to_parent_array(t, tip_ds)
        tip_distances(tip_ds, bindings, tip_indices)
        if weighted == 'correct':
            bl_correct = True
        else:
            bl_correct = False
        u = weighted_unifrac_matrix(branch_lengths, count_array, tip_indices,
            bl_correct=bl_correct, tip_distances=tip_ds,
            unifrac_f=weighted_unifrac_f)
        #figure out if we need the vector
        if UNIFRAC_DIST_VECTOR in modes:
            result[UNIFRAC_DIST_VECTOR] = (weighted_unifrac_vector(
                branch_lengths, count_array, tip_indices,
                bl_correct=bl_correct, tip_distances=tip_ds,
                unifrac_f=weighted_unifrac_f), env_names)
    else:
        bool_descendants(bound_indices)
        u = unifrac_matrix(branch_lengths, count_array, metric=metric,
            is_symmetric=is_symmetric)
        if UNIFRAC_DIST_VECTOR in modes:
            result[UNIFRAC_DIST_VECTOR] = (unifrac_vector(branch_lengths,
                count_array), env_names)
    #check if we have to do the matrix calculations, which are expensive
    if modes - set([UNIFRAC_DIST_VECTOR]):
        result.update(unifrac_tasks_from_matrix(u, env_names, modes=modes))
    return result

def fast_unifrac_one_sample(one_sample_name, t, envs, weighted=False,
    metric=unifrac, weighted_unifrac_f=_weighted_unifrac, make_subtree=False):
    """Performs fast unifrac between a specified sample and all other samples.

    one_sample_name: unifrac will be calculated between this and each other
        sample
    t: phylogenetic tree relating the sequences. pycogent PhyloNode object
    envs: dict of {sequence:{env:count}} showing environmental abundance.
    weighted: if True, performs the weighted UniFrac procedure. If "correct",
        performs weighted unifrac with branch length correction. If False
        (default), performs the supplied metric (default: unweighted unifrac)
    metric: distance metric to use, unless weighted=True. See fast_tree.py
        for metrics (e.g.: G, unnormalized_G, unifrac, etc.)

    Returns distances, environments; e.g. when one_sample_name = 'B':
    array([ 0.623, 0., 0.4705]), ['A', 'B', 'C']

    Raises a ValueError if we have no data on one_sample_name.
    """
    envs, count_array, unique_envs, env_to_index, node_to_index, env_names, \
        branch_lengths, nodes, t = _fast_unifrac_setup(t, envs, make_subtree)
    bound_indices = bind_to_array(nodes, count_array)
    result = {}
    try:
        one_sample_idx = env_names.index(one_sample_name)
    except ValueError:
        raise ValueError('one_sample_name not found, ensure that there are'
            ' tree tips and corresponding envs counts for sample '
            + one_sample_name)
    if weighted:
        tip_indices = [n._leaf_index for n in t.tips()]
        sum_descendants(bound_indices)
        tip_ds = branch_lengths.copy()[:,newaxis]
        bindings = bind_to_parent_array(t, tip_ds)
        tip_distances(tip_ds, bindings, tip_indices)
        if weighted == 'correct':
            bl_correct = True
        else:
            bl_correct = False
        u = weighted_one_sample(one_sample_idx, branch_lengths, count_array,
            tip_indices, bl_correct=bl_correct, tip_distances=tip_ds,
            unifrac_f=weighted_unifrac_f)
    else:  # unweighted
        bool_descendants(bound_indices)
        u = unifrac_one_sample(one_sample_idx, branch_lengths, count_array,
            metric=metric)
    return (u, env_names)

def unifrac_tasks_from_matrix(u, env_names, modes=UNIFRAC_DEFAULT_MODES):
    """Returns the UniFrac matrix, PCoA, and/or cluster from the matrix."""
    result = {}
    if UNIFRAC_DIST_MATRIX in modes:
        result[UNIFRAC_DIST_MATRIX] = (u, env_names)
    if UNIFRAC_PCOA in modes:
        point_matrix, eigvals = principal_coordinates_analysis(u)
        result[UNIFRAC_PCOA] = output_pca(point_matrix, eigvals, env_names)
    if UNIFRAC_CLUST_ENVS in modes:
        nodes = map(PhyloNode, env_names)
        BIG = 1e305
        U = u.copy()
        for i in range(len(U)):
            U[i,i] = BIG
        c = UPGMA_cluster(U, nodes, BIG)
        result[UNIFRAC_CLUST_ENVS] = c
    if UNIFRAC_NJ_ENVS in modes:
        c = nj(dists_to_nj(u, env_names))
        result[UNIFRAC_NJ_ENVS] = c
    return result

def unifrac_recursive(tree, envs, ref_tree):  #, metric=weighted):
    """Performs UniFrac recursively over a tree.
Specifically, for each node in the tree, performs UniFrac clustering. Then compares the UniFrac tree to a reference tree of the same taxa using the tip-to-tip distances and the subset distances. Assumption is that if the two trees match, the node represents a group in which evolution has mirrored the evolution of the reference tree. tree: contains the tree on which UniFrac will be performed recursively. envs: environments for UniFrac clustering (these envs should match the taxon labels in the ref_tree) ref_tree: reference tree that the clustering is supposed to match. metric: metric for UniFrac clustering. Typically, will want to estimate significance by comparing the actual values from ref_tree to values obtained with one or more shuffled versions of ref_tree (can make these with permute_tip_labels). """ lengths, dists, sets = [], [], [] for node in tree.traverse(self_before=True, self_after=False): try: result = fast_unifrac(node, envs, weighted=False, modes=set([UNIFRAC_CLUST_ENVS])) curr_tree = result[UNIFRAC_CLUST_ENVS] except AttributeError: #hit a zero branch length continue if curr_tree is None: #hit single node? continue try: l = len(curr_tree.tips()) d = curr_tree.compareByTipDistances(ref_tree) s = curr_tree.compareBySubsets(ref_tree, True) #want to calculate all values before appending so we can bail out #if any of the calculations fails: this ensures that the lists #remain synchronized. 
lengths.append(l) dists.append(d) sets.append(s) except ValueError: #no common taxa continue return lengths, dists, sets def dists_to_nj(matrix, labels): """Wraps matrix and labels together for format NJ requires.""" result = {} for outer, row in zip(labels, matrix): for inner, i in zip(labels, row): result[(outer, inner)] = i return result def shuffle_tipnames(t): """Returns copy of tree t with tip names shuffled.""" result = t.copy() names = [i.Name for i in t.tips()] shuffle(names) for name, tip in zip(names, result.tips()): tip.Name = name return result def weight_equally(tree_list, envs_list): """Returns equal weights for all trees.""" num_trees = len(tree_list) return ones(num_trees) def weight_by_num_tips(tree_list, envs_list): """Weights each tree by the number of tips it contains.""" return array([len(list(t.tips())) for t in tree_list]) def weight_by_branch_length(tree_list, envs_list): """Weights each tree by the sum of its branch length.""" return array([sum(filter(None, [i.Length for i in \ t.traverse(self_before=True,self_after=False) if hasattr(i, 'Length')])) for t in tree_list]) def weight_by_num_seqs(tree_list, envs_list): """Weights each tree by the number of seqs it contains.""" return array(map(sum_env_dict, envs_list)) def get_all_env_names(envs): """Returns set of all env names from envs.""" result = set() for e in envs.values(): result.update(e.keys()) return result def consolidate_skipping_missing_matrices(matrices, env_names, weights, all_env_names): """Consolidates matrices, skipping any that are missing envs""" weight_sum = 0 result = zeros((len(all_env_names),len(all_env_names)), float) for m, e, w in zip(matrices, env_names, weights): if e == all_env_names: #note -- assumes sorted result += m * w weight_sum += w #readjust weights for missing matrices result /= weight_sum return result def consolidate_missing_zero(matrices, env_names, weights, all_env_names): """Consolidates matrices, setting missing values to 0 distance""" result = 
zeros((len(all_env_names),len(all_env_names)), float) for m, e, w in zip(matrices, env_names, weights): result += reshape_by_name(m, e, all_env_names, 0) * w return result def consolidate_missing_one(matrices, env_names, weights, all_env_names): """Consolidates matrices, setting missing values to 1 distance""" result = zeros((len(all_env_names),len(all_env_names)), float) for m, e, w in zip(matrices, env_names, weights): result += reshape_by_name(m, e, all_env_names, 1) * w return result def consolidate_skipping_missing_values(matrices, env_names, weights, all_env_names): """Consolidates matrices, skipping only values from missing envs""" result = [] for m, e, w in zip(matrices, env_names, weights): reshaped = reshape_by_name(m, e, all_env_names, masked=True) reshaped *= w result.append(reshaped) data = array([i.data for i in result], float) masks = array([i.mask for i in result], bool) masked_result = ma.array(data, mask=masks) #figure out mask of weights so we can figure out per-element weighting masked_weights = ma.array(zeros(data.shape), mask=masks) + \ array(weights,float).reshape((len(weights),1,1)) return masked_result.sum(0)/masked_weights.sum(0) def reshape_by_name(m, old_names, new_names, default_off_diag=0,default_diag=0, masked=False): """Reshape matrix m mapping slots from old names to new names. 
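    As an illustration of the remapping, here is a standalone plain-Python
    sketch (the helper name and the numbers below are hypothetical, not part
    of this module):

    ```python
    def reshape_by_name_demo(m, old_names, new_names,
                             default_off_diag=0.0, default_diag=0.0):
        # start from a matrix of defaults in the new ordering
        n = len(new_names)
        result = [[default_diag if i == j else default_off_diag
                   for j in range(n)] for i in range(n)]
        # map each old name to its slot in the new ordering
        slot = {name: new_names.index(name) for name in old_names}
        # copy over the values we do have
        for i, row in enumerate(m):
            for j, val in enumerate(row):
                result[slot[old_names[i]]][slot[old_names[j]]] = val
        return result

    # 2x2 matrix over envs A and C, remapped into A,B,C order; the missing
    # env B gets the off-diagonal default (here 1.0)
    print(reshape_by_name_demo([[0.0, 0.5], [0.5, 0.0]], ['A', 'C'],
                               ['A', 'B', 'C'], default_off_diag=1.0))
    # [[0.0, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.0]]
    ```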
""" num_names = len(new_names) result = zeros((num_names,num_names), float) + default_off_diag for i in range(num_names): result[i,i] = default_diag pairs = {} for i, n in enumerate(old_names): if n in new_names: pairs[i] = new_names.index(n) for i, row in enumerate(m): new_i = pairs[i] for j, val in enumerate(row): new_j = pairs[j] result[new_i, new_j] = val if masked: mask = ones((num_names, num_names), float) for i in pairs.values(): for j in pairs.values(): mask[i,j] = 0 result = ma.array(result, mask=mask) return result def meta_unifrac(tree_list, envs_list, weighting_f, consolidation_f=consolidate_skipping_missing_values, modes=UNIFRAC_DEFAULT_MODES, **unifrac_params): """Perform metagenomic UniFrac on a list of trees and envs. tree_list: list of tree objects env_list: list of sample x env count arrays weighting_f: f(trees, envs) -> weights consolidation_f: f(matrix_list, name_list, weight_list, all_env_names) -> matrix unifrac_params: parameters that will be passed to unifrac to build the matrix Notes: - tree list and env list must be same length and consist of matched pairs, i.e. each tree must have a corresponding envs array. 
""" all_weights = weighting_f(tree_list, envs_list) all_env_names = set() for e in envs_list: all_env_names.update(get_all_env_names(e)) all_env_names = sorted(all_env_names) matrices = [] env_names = [] weights = [] #need to be robust to failure to build UniFrac tree, and to combine the #matrices that survive in a reasonable way that handles missing data for t, e, w in zip(tree_list, envs_list, all_weights): #try: u, en = fast_unifrac(t, e, modes=[UNIFRAC_DIST_MATRIX], **unifrac_params)[UNIFRAC_DIST_MATRIX] matrices.append(u) env_names.append(en) weights.append(w) #except ValueError: # pass #normalize weights so sum to 1 weights = array(weights, float)/sum(weights) final_matrix = consolidation_f(matrices, env_names, weights, all_env_names) return unifrac_tasks_from_matrix(final_matrix, all_env_names, modes) # Crump example tree CRUMP_TREE = """((((((((((((((((AF141409:0.25346,(((AF141563:0.00000,AF141568:0.00000):0.03773,AF141410:0.06022)Pseudomonadaceae:0.08853,((AF141521:0.12012,AF141439:0.03364):0.01884,(((((((((((AF141494:0.00000,AF141503:0.00000):0.00000,AF141553:0.00000):0.00000,AF141555:0.00000):0.00000,AF141527:0.00000):0.00000,AF141477:0.00513):0.00000,AF141554:0.00000):0.00000,AF141438:0.00000):0.00000,AF141493:0.00000):0.00000,AF141489:0.00000):0.00000,AF141437:0.00513):0.00000,AF141490:0.00771):0.07637):0.00573):0.00896):0.00327,((AF141546:0.03666,AF141510:0.04080):0.03273,AF141586:0.05358):0.00492):0.02041,((AF141577:0.04055,((AF141558:0.00000,AF141569:0.00771):0.01436,AF141430:0.01282):0.01144)Legionellaceae:0.06630,(AF141501:0.01323,AF141528:0.00000)SUP05:0.06854):0.02220):0.00651,AF141498:0.12862):0.00081,AF141428:0.08582):0.00570,((((((((((AF141464:0.00485,AF141422:0.00256):0.00000,AF141473:0.00770)Alcaligenaceae:0.06178,((((((AF141492:0.01026,AF141453:0.00513):0.00000,AF141443:0.01027):0.00323,(AF141441:0.00162,(AF141404:0.00000,AF141398:0.00000):0.00161):0.02267):0.01703,((AF141465:0.00513,AF141491:0.00256):0.00000,AF141405:0.00256):0.01214)Polyn
ucleobacter:0.02121,AF141431:0.03263)Ralstoniaceae:0.01786,AF141456:0.01328):0.01630):0.00325,((((((((AF141462:0.01064,AF141414:0.00513):0.00000,AF141457:0.00000):0.00161,AF141413:0.00383):0.00646,AF141459:0.08285):0.02827,((((((AF141468:0.00256,AF141392:0.00256):0.00324,AF141394:0.00000):0.00323,AF141388:0.05424):0.01136,AF141581:0.00998):0.00323,(AF141444:0.00000,AF141446:0.00000):0.01673):0.02266,AF141478:0.02523):0.00162):0.02266,AF141595:0.05097):0.02108,((AF141393:0.00513,AF141458:0.00256):0.02621,AF141469:0.03147):0.02285):0.01541,AF141538:0.10137)Comamonadaceae:0.06378):0.01134,AF141449:0.06970)Burkholderiales:0.02283,(AF141600:0.07983,AF141482:0.04023):0.00573):0.01304,(((AF141486:0.00000,AF141403:0.00256):0.00000,AF141525:0.01542):0.01546,AF141461:0.01645)Methylophilales:0.06427):0.00326,(AF141551:0.02426,AF141507:0.04615)Neisseriales:0.09487)Betaproteobacteria:0.07947,AF141517:0.09902):0.05527,((AF141532:0.01292,AF141542:0.00257):0.00000,AF141424:0.00262)Ellin307/WD2124:0.12728):0.01546)Gamma_beta_proteobacteria:0.02854,((((((AF141557:0.00513,AF141480:0.00256)Roseobacter:0.13561,((AF141529:0.01265,AF141434:0.03671):0.02966,(((AF141526:0.00256,AF141495:0.00000):0.00000,AF141556:0.00513):0.00000,AF141548:0.00514):0.00164)Rhodobacter:0.04051)Rhodobacterales:0.22265,(AF141421:0.04514,AF141432:0.01539)Beijerinckiaceae:0.06527):0.00244,(((AF141544:0.00000,AF141545:0.00000):0.11111,AF141450:0.02767):0.10513,AF141396:0.07575):0.00653):0.00487,((AF141530:0.00000,AF141531:0.00000):0.02863,AF141448:0.04585)Sphingomonadales:0.13392):0.04010,(((((((((((AF141598:0.00323,AF141479:0.00836):0.00646,AF141593:0.01564):0.01291,AF141588:0.01033)Pelagibacter:0.00646,AF141583:0.00257):0.00000,AF141539:0.00000):0.01778,(AF141601:0.00162,AF141580:0.00769):0.02595):0.06022,(((AF141582:0.00000,AF141585:0.00000):0.02425,AF141590:0.02328):0.02274,AF141395:0.04863):0.00498)SAR11:0.04348,(AF141447:0.05616,(AF141594:0.00256,AF141584:0.00000):0.06659):0.02347):0.00663,AF141567:0.22932):0
.01333,AF141589:0.05577):0.01247,AF141435:0.09271)Consistiales:0.05054)Alphaproteobacteria:0.08879):0.07221,((((AF141472:0.16321,(AF141537:0.01302,AF141496:0.00838)Desulfobulbaceae:0.12237):0.01639,AF141454:0.14646):0.01228,AF141504:0.15471):0.01396,AF141505:0.13406)'Deltaproteobacteria':0.00581)Proteobacteria:0.00332,AF141560:0.18908):0.00658,((((((AF141549:0.10860,((((((((AF141499:0.00258,AF141547:0.00258):0.00000,AF141559:0.00258):0.07190,AF141518:0.07560):0.04246,AF141474:0.07724)Cytophaga:0.01871,((((AF141515:0.02581,(AF141519:0.02952,AF141452:0.04379):0.01140):0.00484,AF141451:0.02101):0.01471,AF141466:0.06719)Sporocytophaga:0.02370,AF141524:0.04740):0.01239):0.01304,((AF141500:0.00256,AF141552:0.00000):0.00000,AF141502:0.00000):0.02059):0.04357,AF141407:0.14560):0.03134,AF141488:0.10821)Flavobacteriales:0.01232):0.02462,AF141543:0.10581):0.00743,AF141397:0.07576):0.04542,((AF141436:0.01862,AF141497:0.00325):0.00162,AF141460:0.00797)Flexibacteraceae:0.13515):0.00494,(AF141550:0.09051,AF141418:0.03264)Saprospiraceae:0.16732):0.01398,AF141514:0.25839)Bacteroidetes:0.13111):0.00907,((((AF141516:0.06810,AF141562:0.03493)'"Planctomycetacia"':0.09418,(AF141399:0.00528,AF141417:0.00262)'"Gemmatae"':0.21158)Planctomycetes:0.09116,((AF141391:0.02575,(AF141508:0.16401,(((AF141475:0.07537,AF141487:0.02921):0.02882,AF141513:0.08164):0.03217,AF141541:0.05472):0.02157)'Verrucomicrobiae(1)':0.00906):0.03990,((AF141455:0.00512,AF141387:0.00256):0.01132,(AF141408:0.00767,AF141406:0.00257):0.00815)'Opitutae(4)':0.18896)Verrucomicrobia:0.10524):0.01404,AF141536:0.11511):0.00906):0.00332,((((AF141463:0.01550,AF141476:0.01081)'Agrococcusetal.':0.09011,(((((AF141592:0.00664,AF141426:0.01026):0.02268,(AF141402:0.01525,(AF141484:0.00418,AF141445:0.00257):0.00486):0.00811):0.00486,AF141587:0.00170):0.02354,(AF141442:0.00674,AF141389:0.00000):0.01457):0.00489,AF141411:0.06787)Cellulomonadaceae:0.10377)Actinobacteridae:0.09674,((((AF141471:0.01091,AF141467:0.01136):0.01798,((AF141400:0.
00767,AF141481:0.00000):0.05116,(AF141401:0.00512,AF141433:0.00256):0.03594):0.01066):0.01065,(((AF141522:0.00000,AF141423:0.01279):0.00000,AF141427:0.00000):0.00000,AF141420:0.00257):0.04350):0.09155,(((AF141440:0.00421,AF141520:0.00256):0.00161,AF141597:0.00256):0.00000,AF141485:0.00256)'BD2-10group':0.11726)Acidimicrobidae:0.04260)Actinobacteria:0.05311,((((((((AF141574:0.00000,AF141591:0.00256):0.00000,AF141579:0.00256):0.00000,AF141572:0.00769):0.00000,AF141565:0.00256):0.00000,AF141571:0.00513):0.00000,AF141575:0.00771):0.00324,((AF141596:0.00000,AF141564:0.00513):0.00162,(AF141599:0.00256,AF141578:0.01026):0.00162):0.00809)Prochlorales:0.18823,((((((AF141523:0.01285,AF141425:0.00258):0.00000,AF141429:0.00260):0.02350,AF141470:0.10407):0.00486,((AF141534:0.00809,AF141533:0.04316):0.00647,(AF141412:0.00000,AF141419:0.00257):0.00162):0.01456):0.04144,(((((AF141561:0.00372,AF141570:0.00000):0.00000,AF141576:0.00000):0.00000,AF141506:0.00743):0.00000,AF141573:0.00743):0.05520,AF141566:0.02577):0.01571)'Euglenaetal.chloroplasts':0.08248,AF141512:0.08318)Chloroplasts:0.03457)Cyanobacteria:0.16029):0.00661):0.01395,(AF141416:0.20329,AF141511:0.19093)'"Anaerolines"':0.09929):0.00986,AF141540:0.22389):0.03152,(AF141509:0.23520,(AF141390:0.25622,AF141483:0.15819)OP11-5:0.02369):0.08699):0.00913,(AF141415:0.00647,AF141535:0.00260)OP10:0.19926)Bacteria; """ # Crump example envs CRUMP_ENVS = """AF141399 R_FL 1 AF141411 R_FL 1 AF141408 R_FL 2 AF141403 R_FL 1 AF141410 R_FL 1 AF141398 R_FL 2 AF141391 R_FL 1 AF141389 R_FL 1 AF141395 R_FL 1 AF141401 R_FL 1 AF141390 R_FL 1 AF141393 R_FL 1 AF141396 R_FL 1 AF141402 R_FL 1 AF141407 R_FL 1 AF141387 R_FL 1 AF141394 R_FL 2 AF141409 R_FL 1 AF141400 R_FL 1 AF141397 R_FL 1 AF141405 R_FL 1 AF141388 R_FL 1 AF141424 R_PA 1 AF141421 R_PA 1 AF141433 R_PA 1 AF141428 R_PA 1 AF141432 R_PA 1 AF141426 R_PA 1 AF141430 R_PA 1 AF141413 R_PA 1 AF141419 R_PA 1 AF141423 R_PA 2 AF141429 R_PA 2 AF141422 R_PA 1 AF141431 R_PA 1 AF141415 R_PA 1 AF141418 
R_PA 1 AF141416 R_PA 1 AF141420 R_PA 1 AF141417 R_PA 1 AF141434 R_PA 1 AF141412 R_PA 1 AF141414 R_PA 1 AF141463 E_FL 1 AF141493 E_FL 14 AF141459 E_FL 1 AF141461 E_FL 1 AF141447 E_FL 1 AF141479 E_FL 1 AF141449 E_FL 1 AF141465 E_FL 2 AF141435 E_FL 1 AF141457 E_FL 3 AF141468 E_FL 1 AF141487 E_FL 1 AF141472 E_FL 1 AF141466 E_FL 1 AF141444 E_FL 4 AF141473 E_FL 3 AF141439 E_FL 1 AF141436 E_FL 1 AF141455 E_FL 1 AF141443 E_FL 3 AF141483 E_FL 1 AF141476 E_FL 1 AF141441 E_FL 1 AF141440 E_FL 2 AF141474 E_FL 1 AF141486 E_FL 1 AF141481 E_FL 1 AF141480 E_FL 1 AF141470 E_FL 1 AF141458 E_FL 1 AF141460 E_FL 1 AF141478 E_FL 1 AF141450 E_FL 1 AF141471 E_FL 2 AF141442 E_FL 1 AF141454 E_FL 1 AF141488 E_FL 1 AF141451 E_FL 1 AF141456 E_FL 1 AF141452 E_FL 1 AF141482 E_FL 1 AF141448 E_FL 1 AF141484 E_FL 3 AF141475 E_FL 1 AF141548 E_PA 4 AF141520 E_PA 1 AF141508 E_PA 1 AF141523 E_PA 1 AF141547 E_PA 3 AF141530 E_PA 2 AF141510 E_PA 1 AF141513 E_PA 1 AF141524 E_PA 1 AF141516 E_PA 1 AF141543 E_PA 1 AF141503 E_PA 11 AF141498 E_PA 1 AF141532 E_PA 2 AF141549 E_PA 1 AF141501 E_PA 2 AF141525 E_PA 1 AF141534 E_PA 1 AF141506 E_PA 1 AF141519 E_PA 1 AF141550 E_PA 1 AF141496 E_PA 1 AF141535 E_PA 3 AF141512 E_PA 1 AF141514 E_PA 1 AF141537 E_PA 1 AF141533 E_PA 2 AF141511 E_PA 1 AF141539 E_PA 1 AF141545 E_PA 2 AF141500 E_PA 3 AF141505 E_PA 1 AF141497 E_PA 1 AF141541 E_PA 1 AF141518 E_PA 1 AF141551 E_PA 1 AF141504 E_PA 1 AF141546 E_PA 1 AF141515 E_PA 1 AF141529 E_PA 1 AF141507 E_PA 1 AF141540 E_PA 1 AF141521 E_PA 1 AF141538 E_PA 1 AF141522 E_PA 1 AF141509 E_PA 1 AF141517 E_PA 1 AF141536 E_PA 1 AF141557 O_UN 1 AF141569 O_UN 2 AF141561 O_UN 4 AF141578 O_UN 2 AF141567 O_UN 1 AF141566 O_UN 1 AF141568 O_UN 2 AF141579 O_UN 6 AF141562 O_UN 1 AF141559 O_UN 1 AF141560 O_UN 1 AF141577 O_UN 1 AF141583 O_FL 2 AF141595 O_FL 1 AF141599 O_FL 2 AF141582 O_FL 2 AF141600 O_FL 1 AF141594 O_FL 2 AF141592 O_FL 1 AF141597 O_FL 1 AF141590 O_FL 3 AF141581 O_FL 1 AF141587 O_FL 1 AF141588 O_FL 1 AF141586 O_FL 1 AF141591 O_FL 1 
AF141589 O_FL 1
AF141593 O_FL 1
"""

def _load_tree(tree_str, envs_str, title):
    """Wrap tree and envs strings as files, announcing the example title."""
    tree_in = StringIO(tree_str)
    envs_in = StringIO(envs_str)
    print """\n(((((Running: %s""" % title
    return tree_in, envs_in

def _display_results(out):
    print "Results:", out, "\n)))))"

if __name__ == '__main__':
    """ Examples below using Crump tree and envs """
    from StringIO import StringIO
    tree_in, envs_in = _load_tree(CRUMP_TREE, CRUMP_ENVS,
        "unifrac - pairwise example (unweighted)")
    out = fast_unifrac_permutations_file(tree_in, envs_in, weighted=False,
        num_iters=100, verbose=True)
    _display_results(out)
    tree_in, envs_in = _load_tree(CRUMP_TREE, CRUMP_ENVS,
        "unifrac - pairwise example (weighted, normalized)")
    out = fast_unifrac_permutations_file(tree_in, envs_in, weighted='correct',
        num_iters=100, verbose=True)
    _display_results(out)
    tree_in, envs_in = _load_tree(CRUMP_TREE, CRUMP_ENVS,
        "unifrac - pairwise example (weighted)")
    out = fast_unifrac_permutations_file(tree_in, envs_in, weighted=True,
        num_iters=100, verbose=True)
    _display_results(out)
    tree_in, envs_in = _load_tree(CRUMP_TREE, CRUMP_ENVS,
        "unifrac - whole tree example")
    out = fast_unifrac_permutations_file(tree_in, envs_in, weighted=True,
        num_iters=100, verbose=True, test_on=TEST_ON_TREE)
    _display_results(out)
    tree_in, envs_in = _load_tree(CRUMP_TREE, CRUMP_ENVS,
        "unifrac - each env example")
    out = fast_unifrac_permutations_file(tree_in, envs_in, weighted=True,
        num_iters=100, verbose=True, test_on=TEST_ON_ENVS)
    _display_results(out)
    tree_in, envs_in = _load_tree(CRUMP_TREE, CRUMP_ENVS,
        "p test - pairwise example")
    out = fast_p_test_file(tree_in, envs_in, num_iters=10, verbose=True,
        test_on=TEST_ON_PAIRWISE)
    _display_results(out)
    tree_in, envs_in = _load_tree(CRUMP_TREE, CRUMP_ENVS,
        "p test - whole tree example")
    out = fast_p_test_file(tree_in, envs_in, num_iters=10, verbose=True,
        test_on=TEST_ON_TREE)
    _display_results(out)
    print "Done examples."
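The quantity permuted in the pairwise examples above, unweighted UniFrac, reduces to the fraction of branch length unique to a single environment. A standalone toy illustration (hand-built branch list with assumed env memberships; does not use this module's functions):

```python
# each branch is (length, set of envs observed below it) on a toy tree
branches = [
    (1.0, {'A'}),       # tip branch seen only in env A
    (1.0, {'B'}),       # tip branch seen only in env B
    (2.0, {'A', 'B'}),  # deeper branch shared by both envs
]
# unweighted UniFrac = unique branch length / total branch length
unique = sum(l for l, envs in branches if len(envs) == 1)
total = sum(l for l, envs in branches)
print(unique / total)  # 2.0 unique / 4.0 total = 0.5
```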
PyCogent-1.5.3/cogent/maths/stats/__init__.py000644 000765 000024 00000001457 12024702176 022112 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides statistical tests and distributions. Also provides NumberList and FrequencyDistribution, two classes for working with statistical data. """ __all__ = ['alpha_diversity', 'distribution', 'histogram', 'information_criteria', 'kendall', 'ks', 'rarefaction', 'special', 'test', 'util'] # GAH: this is a temporary introduction, so users get notice of structure change and # renaming of this function from distribution import chi_high as chisqprob __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Rob Knight", "Sandra Smit", "Catherine Lozupone", "Micah Hamady"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/cogent/maths/stats/alpha_diversity.py000644 000765 000024 00000041341 12024702176 023536 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division from cogent.maths.stats.special import lgam from cogent.maths.optimisers import minimise from math import ceil, e from numpy import array, zeros, concatenate, arange, log, sqrt, exp, asarray from cogent.maths.scipy_optimize import fmin_powell import cogent.maths.stats.rarefaction as rarefaction __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Justin Kuczynski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def expand_counts(counts): """Converts vector of counts at each index to vector of indices.""" result = [] for i, c in enumerate(counts): result.append(zeros(c, int) + i) return concatenate(result) def counts(indices, result=None): """Converts vector of indices to counts of each index. 
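    For example, the round trip between a counts vector and an indices
    vector looks like this (a plain-Python sketch, not using the functions
    above):

    ```python
    counts_in = [2, 0, 1]  # two observations of species 0, one of species 2
    # expand counts to indices
    indices = [i for i, c in enumerate(counts_in) for _ in range(c)]
    print(indices)         # [0, 0, 2]
    # collapse indices back to counts
    recounted = [0] * (max(indices) + 1)
    for i in indices:
        recounted[i] += 1
    print(recounted)       # [2, 0, 1]
    ```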
WARNING: does not check that 'result' array is big enough to store new counts, suggest preallocating based on whole dataset if doing cumulative analysis. """ if result is None: max_val = indices.max() result = zeros(max_val+1) for i in indices: result[i] += 1 return result def singles(counts): """Returns count of single occurrences.""" return (counts==1).sum() def doubles(counts): """Returns count of double occurrences.""" return (counts==2).sum() def observed_species(counts): """Calculates number of distinct species.""" return (counts!=0).sum() def osd(counts): """Returns observed, singles and doubles from counts. Handy for diversity calculations.""" return (counts!=0).sum(), (counts==1).sum(), (counts==2).sum() def margalef(counts): """Margalef's index, assumes log accumulation. Magurran 2004, p 77.""" return (observed_species(counts)-1)/log(counts.sum()) def menhinick(counts): """Menhinick's index, assumes sqrt accumulation. Magurran 2004, p 77.""" return observed_species(counts)/sqrt(counts.sum()) def dominance(counts): """Dominance = 1 - Simpson's index, sum of squares of probs. Briefly, gives probability that two species sampled are the same.""" freqs = counts/float(counts.sum()) return (freqs*freqs).sum() def simpson(counts): """Simpson's index = 1-dominance.""" return 1 - dominance(counts) def reciprocal_simpson(counts): """1/Simpson's index""" return 1.0/simpson(counts) def simpson_reciprocal(counts): """1/D (1/dominance)""" return 1.0/dominance(counts) def shannon(counts, base=2): """Returns Shannon entropy of counts, default in bits.""" freqs = counts/float(counts.sum()) nonzero_freqs = freqs[freqs.nonzero()] return -sum(nonzero_freqs*log(nonzero_freqs))/log(base) def equitability(counts, base=2): """Returns Shannon index corrected for # species, pure evenness.""" return shannon(counts, base)/(log((counts!=0).sum())/log(base)) def berger_parker_d(counts): """Fraction of the sample that belongs to the most abundant species. 
Berger & Parker 1970, by way of SDR-IV online help. """ return counts.max()/float(counts.sum()) def mcintosh_d(counts): """McIntosh index of alpha diversity (McIntosh 1967, by way of SDR-IV).""" u = sqrt((counts*counts).sum()) n = counts.sum() return (n-u)/(n-sqrt(n)) def brillouin_d(counts): """Brillouin index of alpha diversity: Pielou 1975, by way of SDR-IV.""" nz = counts[counts.nonzero()] n = nz.sum() return (lgam(n+1) - array(map(lgam, nz+1)).sum())/n def kempton_taylor_q(counts, lower_quantile=.25, upper_quantile=.75): """Kempton-Taylor (1976) q index of alpha diversity, by way of SDR-IV. Estimates the slope of the cumulative abundance curve in the interquantile range. By default, uses lower and upper quartiles, rounding inwards. Note: this differs slightly from the results given in Magurran 1998. Specifically, we have 14 in the numerator rather than 15. Magurran recommends counting half of the species with the same # counts as the point where the UQ falls and the point where the LQ falls, but the justification for this is unclear (e.g. if there were a very large # species that just overlapped one of the quantiles, the results would be considerably off). Leaving the calculation as-is for now, but consider changing. """ n = len(counts) lower = int(ceil(n*lower_quantile)) upper = int(n*upper_quantile) sorted = counts.copy() sorted.sort() return (upper-lower)/log(sorted[upper]/sorted[lower]) def strong(counts): """Strong's 2002 dominance index, by way of SDR-IV.""" cc = counts.copy() cc.sort() cc = cc[::-1] sorted_sum = cc.cumsum() n = counts.sum() s = (counts != 0).sum() i = arange(1,len(counts)+1) return (sorted_sum/float(n) - (i/float(s))).max() def fisher_alpha(counts, bounds=(1e-3,1e12)): """Fisher's alpha: S = alpha ln(1+N/alpha) where S=species, N=individuals bounds are guess for Powell optimizer bracket search. WARNING: may need to adjust bounds for some datasets.
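`fisher_alpha` above solves S = alpha*ln(1 + N/alpha) with a Powell-style optimiser; because the predicted S is monotonically increasing in alpha, the same root can be found by plain bisection with no optimiser dependency. A sketch (the function name `fisher_alpha_bisect` and the default bracket are ours):

```python
from math import log

def fisher_alpha_bisect(n_individuals, n_species, lo=1e-3, hi=1e12, tol=1e-9):
    """Solve S = alpha*ln(1 + N/alpha) for alpha by bisection.

    Predicted S increases with alpha and approaches N as alpha grows,
    so a root exists whenever n_species < n_individuals.
    """
    def predicted_s(alpha):
        return alpha * log(1 + n_individuals / alpha)
    for _ in range(200):
        mid = (lo + hi) / 2
        if predicted_s(mid) < n_species:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

alpha = fisher_alpha_bisect(n_individuals=1000, n_species=50)
```

The bracket plays the same role as the `bounds` argument above, and may likewise need widening for extreme datasets.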
""" n = counts.sum() s = (counts!=0).sum() def f(alpha): return (alpha * log(1 + (n/alpha)) - s)**2 alpha = minimise(f, 1.0, bounds, local=True) if f(alpha) > 1.0: raise RuntimeError("optimizer failed to converge (error > 1.0)," +\ " so no fisher alpha returned") return alpha def mcintosh_e(counts): """McIntosh's evenness measure: Heip & Engels 1974 p 560 (wrong in SDR-IV).""" numerator = sqrt((counts*counts).sum()) n = counts.sum() s = (counts!=0).sum() denominator = sqrt((n-s+1)**2 + s - 1) return numerator/denominator def heip_e(counts): """Heip's evenness measure: Heip & Engels 1974.""" return exp(shannon(counts, base=e)-1)/((counts!=0).sum()-1) def simpson_e(counts): """Simpson's evenness, from SDR-IV.""" return reciprocal_simpson(counts)/(counts!=0).sum() def robbins(counts): """Robbins 1968 estimator for Pr(unobserved) at n trials. H. E. Robbins (1968, Ann. of Stats. Vol 36, pp. 256-257) probability_of_unobserved_colors = S/(n+1), (where s = singletons). Note that this is the estimate for (n-1) counts, i.e. x axis is off by 1. """ return float(singles(counts))/counts.sum() def robbins_confidence(counts, alpha=0.05): """Robbins 1968 confidence interval for counts given alpha. Note: according to Manuel's equation, if alpha=0.05, we get a 95% CI. """ s = singles(counts) n = counts.sum() k = sqrt((n+1)/alpha) return (s-k)/(n+1), (s+k)/(n+1) #TODO: SDR-IV also implements NHC, Carmago, Smith & Wilson, Gini indices. #possibly worth adding. #http://www.pisces-conservation.com/sdrhelp/index.html def michaelis_menten_fit(counts, num_repeats=1, params_guess=None, return_b=False): """Michaelis-Menten fit to rarefaction curve of observed species Note: there is some controversy about how to do the fitting. The ML model givem by Raaijmakers 1987 is based on the assumption that error is roughly proportional to magnitude of observation, reasonable for enzyme kinetics but not reasonable for rarefaction data. 
Here we just do a nonlinear curve fit for the parameters using least-squares. S = Smax*n/(B + n) . n: number of individuals, S: # of species returns Smax inputs: num_repeats: will perform rarefaction (subsampling without replacement) this many times at each value of n params_guess: initial guess of Smax, B (None => default) return_b: if True will return the estimate for Smax, B. Default is just Smax the fit is made to datapoints where n = 1,2,...counts.sum(), S = species represented in random sample of n individuals """ counts = asarray(counts) if params_guess is None: params_guess = array([100,500]) # observed # of species vs # of individuals sampled, S vs n xvals = arange(1,counts.sum()+1) ymtx = [] for i in range(num_repeats): ymtx.append( array([observed_species(rarefaction.subsample(counts,n)) \ for n in xvals])) ymtx = asarray(ymtx) yvals = ymtx.mean(0) # fit to obs_sp = max_sp * num_indiv / (num_indiv + B) # return max_sp def fitfn(p,n): # works with vectors of n, returns vector of S return p[0]*n/(p[1] + n) def errfn(p,n,y): # vectors of actual vals y and number of individuals n return ((fitfn(p,n) - y)**2).sum() p1 = fmin_powell(errfn, params_guess, args=(xvals,yvals), disp=0) if return_b: return p1 else: return p1[0] # return only S_max, not the K_m (B) param def chao1_uncorrected(observed, singles, doubles): """Calculates chao1 given counts. Eq. 1 in EstimateS manual. Formula: chao1 = S_obs + N_1^2/(2*N_2) where N_1 and N_2 are count of singletons and doubletons respectively. Note: this is the original formula from Chao 1984, not bias-corrected, and is Equation 1 in the EstimateS manual. """ return observed + singles**2/float(doubles*2) def chao1_bias_corrected(observed, singles, doubles): """Calculates bias-corrected chao1 given counts: Eq. 2 in EstimateS manual. Formula: chao1 = S_obs + N_1(N_1-1)/(2*(N_2+1)) where N_1 and N_2 are count of singletons and doubletons respectively. Note: this is the bias-corrected formula from Chao 1987, Eq.
2 in the EstimateS manual. """ return observed + singles*(singles-1) / (2.0*(doubles+1)) def chao1(counts, bias_corrected=True): """Calculates chao1 according to table in EstimateS manual. Specifically, uses bias-corrected version unless bias_corrected is set to False _and_ there are both singletons and doubletons.""" o, s, d = osd(counts) if not bias_corrected: if s: if d: return chao1_uncorrected(o, s, d) return chao1_bias_corrected(o, s, d) def chao1_var_uncorrected(singles, doubles): """Calculates chao1 variance, uncorrected. From EstimateS manual, equation 5. """ r = float(singles)/doubles return doubles*(.5*r**2 + r**3 + .24*r**4) def chao1_var_bias_corrected(singles, doubles): """Calculates chao1 variance, bias-corrected. From EstimateS manual, equation 6. """ s, d = float(singles), float(doubles) return s*(s-1)/(2*(d+1)) + (s*(2*s-1)**2)/(4*(d+1)**2) + \ (s**2 * d * (s-1)**2)/(4*(d+1)**4) def chao1_var_no_doubletons(singles, chao1): """Calculates chao1 variance in absence of doubletons. From EstimateS manual, equation 7. chao1 is the estimate of the mean of Chao1 from the same dataset. """ s = float(singles) return s*(s-1)/2 + s*(2*s-1)**2/4 - s**4/(4*chao1) def chao1_var_no_singletons(n, observed): """Calculates chao1 variance in absence of singletons. n = # individuals. From EstimateS manual, equation 8. """ o = float(observed) return o*exp(-n/o)*(1-exp(-n/o)) def chao1_var(counts, bias_corrected=True): """Calculates chao1 variance using decision rules in EstimateS.""" o, s, d = osd(counts) if not d: c = chao1(counts, bias_corrected) return chao1_var_no_doubletons(s, c) if not s: n = counts.sum() return chao1_var_no_singletons(n, o) if bias_corrected: return chao1_var_bias_corrected(s, d) else: return chao1_var_uncorrected(s, d) def chao_confidence_with_singletons(chao, observed, var_chao, zscore=1.96): """Calculates confidence bounds for chao1 or chao2. Uses Eq. 13 of EstimateS manual. zscore = score to use for confidence, default = 1.96 for 95% confidence.
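The bias-corrected chao1 point estimate (Eq. 2) and the log-normal confidence bounds (Eq. 13) combine as sketched below on made-up counts; this restates the formulas above rather than calling the module:

```python
from math import exp, log, sqrt

def chao1_bias_corrected(observed, singles, doubles):
    # S_chao1 = S_obs + F1*(F1-1) / (2*(F2+1))
    return observed + singles * (singles - 1) / (2.0 * (doubles + 1))

def chao_log_normal_ci(chao, observed, var_chao, zscore=1.96):
    # Eq. 13: bounds are S_obs + T/K and S_obs + T*K,
    # with T = chao - S_obs and K = exp(z*sqrt(ln(1 + var/T^2)))
    T = chao - observed
    if T == 0:
        return observed, observed
    K = exp(zscore * sqrt(log(1 + var_chao / T ** 2)))
    return observed + T / K, observed + T * K

est = chao1_bias_corrected(observed=20, singles=4, doubles=2)  # 20 + 12/6 = 22.0
lo, hi = chao_log_normal_ci(est, 20, var_chao=4.0)
```

The interval is asymmetric around the estimate, as expected for a log-normal approximation with the observed richness as a hard lower bound.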
""" T = float(chao - observed) #if no diff betweeh chao and observed, CI is just point estimate of observed if T == 0: return observed, observed K = exp(abs(zscore)*sqrt(log(1+(var_chao/T**2)))) return observed + T/K, observed + T*K def chao_confidence_no_singletons(n, observed, zscore=1.96): """Calculates confidence bounds for chao1/chao2 in absence of singletons. Uses Eq. 14 of EstimateS manual. n = number of individuals, observed = number of species. """ s = float(observed) P = exp(-n/s) return max(s, s/(1-P)-zscore*sqrt((s*P/(1-P)))), \ s/(1-P) + zscore*sqrt(s*P/(1-P)) def chao1_confidence(counts, bias_corrected=True, zscore=1.96): """Returns chao1 confidence (lower, upper) from counts.""" o, s, d = osd(counts) if s: chao = chao1(counts, bias_corrected) chaovar = chao1_var(counts, bias_corrected) return chao_confidence_with_singletons(chao, o, chaovar, zscore) else: n = counts.sum() return chao_confidence_no_singletons(n, o, zscore) def chao1_lower(*args,**kwargs): """Convenience wrapper for chao1_confidence to fit overall diversity API.""" return chao1_confidence(*args,**kwargs)[0] def chao1_upper(*args,**kwargs): """Convenience wrapper for chao1_confidence to fit overall diversity API.""" return chao1_confidence(*args,**kwargs)[1] def ACE(count, rare_threshold=10): """Implements the ACE metric from EstimateS. Based on the equations given under ACE:Abundance-based Coverage Estimator. count = an OTU by sample vector rare_threshold = threshold at which a species containing as many or fewer individuals will be considered rare. IMPORTANT NOTES: Raises a value error if every rare species is a singleton. if no rare species exist, just returns the number of abundant species rare_threshold default value is 10. Based on Chao 2000 in Statistica Sinica pg. 229 citing empirical observations by Chao, Ma, Yang 1993. 
If the count vector contains 0's, indicating species which are known to exist in the environment but did not appear in the sample, they will be ignored for the purpose of calculating s_rare.""" def frequency_counter(count): """Creates a frequency count array to be used by every other function.""" return counts(count) def species_rare(freq_counts, rare_threshold): """Returns the number of rare species in freq_counts. Default value of rare is 10 or fewer individuals. Based on Chao 2000 in Statistica Sinica pg. 229 citing empirical observations by Chao, Ma and Yang in 1993.""" return freq_counts[1:rare_threshold+1].sum() def species_abundant(freq_counts, rare_threshold): """Returns the number of abundant species in freq_counts. Default value of abundant is greater than 10 individuals. Based on Chao 2000 in Statistica Sinica pg.229 citing observations by Chao, Ma and Yang in 1993.""" return freq_counts[rare_threshold+1:].sum() def number_rare(freq_counts, gamma=False): """Number of individuals in rare species. gamma=True generates the n_rare used for the variation coefficient.""" n_rare=0 if gamma == True: for i, j in enumerate(freq_counts[:rare_threshold+1]): n_rare = n_rare + (i*j)*(i-1) return n_rare for i, j in enumerate(freq_counts[:rare_threshold+1]): n_rare = n_rare + (i*j) return n_rare # calculations begin freq_counts = frequency_counter(count) if freq_counts[1:rare_threshold].sum() == 0: return species_abundant(freq_counts, rare_threshold) if freq_counts[1] == freq_counts[1:rare_threshold].sum(): raise ValueError("only rare species are singletons, ACE "+\ "metric is undefined.
EstimateS suggests using bias corrected Chao1") s_abun = species_abundant(freq_counts, rare_threshold) s_rare = species_rare(freq_counts, rare_threshold) n_rare = number_rare(freq_counts) c_ace = 1 - (freq_counts[1]).sum()/float(n_rare) top = s_rare*number_rare(freq_counts, gamma=True) bottom = c_ace*n_rare*(n_rare-1.0) gamma_ace = (top/bottom) - 1.0 if 0 > gamma_ace: gamma_ace = 0 return s_abun + (s_rare/c_ace) + ((freq_counts[1]/c_ace)*gamma_ace) def diversity(indices,f=chao1,step=1,start=None,verbose=False): """Calculates diversity index (default:chao1) for each window of size step. indices: vector of indices of species f: f(counts) -> diversity measure (default: chao1) start: first index to sum up to (default: step) step: step size (default:1) Note: use rarefaction module if you need more versatility/speed. """ result = [] if start is None: start = step freqs = zeros(max(indices) + 1) i = 0 for j in range(start, len(indices)+1, step): freqs = counts(indices[i:j], freqs) try: curr = f(freqs) except (ZeroDivisionError, FloatingPointError): curr = 0 if verbose: print curr result.append(curr) i=j return array(result) PyCogent-1.5.3/cogent/maths/stats/cai/000755 000765 000024 00000000000 12024703630 020522 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/cogent/maths/stats/distribution.py000644 000765 000024 00000033614 12024702176 023072 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Translations of functions from Release 2.3 of the Cephes Math Library, which is (c) Stephen L. Moshier 1984, 1995.
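The normal-tail functions that open this module (z_low, z_high, zprob) reduce to the complementary error function; the standard-library `math.erfc` reproduces the same quantities without the Cephes translation, as this sketch shows:

```python
from math import erfc, sqrt

def z_high(x):
    """Right-hand tail of the standard normal, Pr(Z > x)."""
    return 0.5 * erfc(x / sqrt(2))

def zprob(x):
    """Two-tailed probability, Pr(|Z| > |x|)."""
    return 2 * z_high(abs(x))

# the familiar 95% two-tailed critical value
p = zprob(1.959964)   # close to 0.05
```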
""" from __future__ import division from cogent.maths.stats.special import erf, erfc, igamc, igam, betai, log1p, \ expm1, SQRTH, MACHEP, MAXNUM, PI, ndtri, incbi, igami, fix_rounding_error,\ ln_binomial #ndtri import b/c it should be available via this module from numpy import sqrt, exp, arctan as atan __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Gavin Huttley", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" incbet = betai #shouldn't have renamed it... #Probability integrals: low gives left-hand tail, high gives right-hand tail. def z_low(x): """Returns left-hand tail of z distribution (0 to x). x ranges from -infinity to +infinity; result ranges from 0 to 1 See Cephes docs for details.""" y = x * SQRTH z = abs(y) #distribution is symmetric if z < SQRTH: return 0.5 + 0.5 * erf(y) else: if y > 0: return 1 - 0.5 * erfc(z) else: return 0.5 * erfc(z) def z_high(x): """Returns right-hand tail of z distribution (0 to x). x ranges from -infinity to +infinity; result ranges from 0 to 1 See Cephes docs for details.""" y = x * SQRTH z = abs(y) if z < SQRTH: return 0.5 - 0.5 * erf(y) else: if x < 0: return 1 - 0.5 * erfc(z) else: return 0.5 * erfc(z) def zprob(x): """Returns both tails of z distribution (-inf to -x, inf to x).""" return 2 * z_high(abs(x)) def chi_low(x, df): """Returns left-hand tail of chi-square distribution (0 to x), given df. x ranges from 0 to infinity. df, the degrees of freedom, ranges from 1 to infinity (assume integers). Typically, df is (r-1)*(c-1) for a r by c table. Result ranges from 0 to 1. See Cephes docs for details. """ x = fix_rounding_error(x) if x < 0: raise ValueError, "chi_low: x must be >= 0 (got %s)." % x if df < 1: raise ValueError, "chi_low: df must be >= 1 (got %s)." 
% df return igam(df/2, x/2) def chi_high(x, df): """Returns right-hand tail of chi-square distribution (x to infinity). df, the degrees of freedom, ranges from 1 to infinity (assume integers). Typically, df is (r-1)*(c-1) for a r by c table. Result ranges from 0 to 1. See Cephes docs for details. """ x = fix_rounding_error(x) if x < 0: raise ValueError, "chi_high: x must be >= 0 (got %s)." % x if df < 1: raise ValueError, "chi_high: df must be >= 1 (got %s)." % df return igamc(df/2, x/2) def t_low(t, df): """Returns left-hand tail of Student's t distribution (-infinity to x). df, the degrees of freedom, ranges from 1 to infinity. Typically, df is (n-1) for a sample size of n. Result ranges from 0 to 1. See Cephes docs for details. """ if df < 1: raise ValueError, "t_low: df must be >= 1 (got %s)." % df return stdtr(df, t) def t_high(t, df): """Returns right-hand tail of Student's t distribution (x to infinity). df, the degrees of freedom, ranges from 1 to infinity. Typically, df is (n-1) for a sample size of n. Result ranges from 0 to 1. See Cephes docs for details. """ if df < 1: raise ValueError, "t_high: df must be >= 1 (got %s)." % df return stdtr(df, -t) #distribution is symmetric def tprob(t, df): """Returns both tails of t distribution (-infinity to -x, infinity to x)""" return 2 * t_high(abs(t), df) def poisson_high(successes, mean): """Returns right tail of Poisson distribution, Pr(X > x). successes ranges from 0 to infinity. mean must be positive. """ return pdtrc(successes, mean) def poisson_low(successes, mean): """Returns left tail of Poisson distribution, Pr(X <= x). successes ranges from 0 to infinity. mean must be positive. """ return pdtr(successes, mean) def poisson_exact(successes, mean): """Returns Poisson probability for exactly Pr(X=successes). Formula is e^-(mean) * mean^(successes) / (successes)!
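The Poisson tails computed here via the incomplete gamma functions (pdtr/pdtrc) can be cross-checked for small k by summing the probability mass function directly:

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Pr(X = k): e^-mean * mean^k / k!"""
    return exp(-mean) * mean ** k / factorial(k)

def poisson_low(k, mean):
    """Left tail Pr(X <= k) by direct summation."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))

def poisson_high(k, mean):
    """Right tail Pr(X > k)."""
    return 1 - poisson_low(k, mean)

# Pr(X <= 2) at mean 1 is e^-1 * (1 + 1 + 1/2)
p = poisson_low(2, 1.0)
```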
""" if successes == 0: return pdtr(0, mean) elif successes < mean: #use left tail return pdtr(successes, mean) - pdtr(successes-1, mean) else: #successes > mean: use right tail return pdtrc(successes-1, mean) - pdtrc(successes, mean) def binomial_high(successes, trials, prob): """Returns right-hand binomial tail (X > successes) given prob(success).""" if -1 <= successes < 0: return 1 return bdtrc(successes, trials, prob) def binomial_low(successes, trials, prob): """Returns left-hand binomial tail (X <= successes) given prob(success).""" return bdtr(successes, trials, prob) def binomial_exact(successes, trials, prob): """Returns binomial probability of exactly X successes. Works for integer and floating point values. Note: this function is only a probability mass function for integer values of 'trials' and 'successes', i.e. if you sum up non-integer values you probably won't get a sum of 1. """ if (prob < 0) or (prob > 1): raise ValueError, "Binomial prob must be between 0 and 1." if (successes < 0) or (trials < successes): raise ValueError, "Binomial successes must be between 0 and trials." return exp(ln_binomial(successes, trials, prob)) def f_low(df1, df2, x): """Returns left-hand tail of f distribution (0 to x). x ranges from 0 to infinity. Result ranges from 0 to 1. See Cephes docs for details. """ return fdtr(df1, df2, x) def f_high(df1, df2, x): """Returns right-hand tail of f distribution (x to infinity). Result ranges from 0 to 1. See Cephes docs for details. """ return fdtrc(df1, df2, x) def fprob(dfn, dfd, F, side='right'): """Returns both tails of F distribution (-inf to F and F to inf) Use in case of two-tailed test. Usually this method is called by f_two_sample, so you don't have to worry about choosing the right side. side: right means return twice the right-hand tail of the F-distribution. Use in case var(a) > var (b) left means return twice the left-hand tail of the F-distribution. 
Use in case var(a) < var(b) """ if F < 0: raise ValueError, "fprob: F must be >= 0 (got %s)." % F if side=='right': return 2*f_high(dfn, dfd, F) elif side=='left': return 2*f_low(dfn, dfd, F) else: raise ValueError, "Not a valid value for side %s"%(side) def stdtr(k, t): """Student's t distribution, -infinity to t. See Cephes docs for details. """ if k <= 0: raise ValueError, 'stdtr: df must be > 0.' if t == 0: return 0.5 if t < -2: rk = k z = rk / (rk + t * t) return 0.5 * betai(0.5 * rk, 0.5, z) #compute integral from -t to + t if t < 0: x = -t else: x = t rk = k #degrees of freedom z = 1 + (x * x)/rk #test if k is odd or even if (k & 1) != 0: #odd k xsqk = x/sqrt(rk) p = atan(xsqk) if k > 1: f = 1 tz = 1 j = 3 while (j <= (k-2)) and ((tz/f) > MACHEP): tz *= (j-1)/(z*j) f += tz j += 2 p += f * xsqk/z p *= 2/PI else: #even k f = 1 tz = 1 j = 2 while (j <= (k-2)) and ((tz/f) > MACHEP): tz *= (j-1)/(z*j) f += tz j += 2 p = f * x/sqrt(z*rk) #common exit if t < 0: p = -p #note destruction of relative accuracy p = 0.5 + 0.5 * p return p def bdtr(k, n, p): """Binomial distribution, 0 through k. Uses formula bdtr(k, n, p) = betai(n-k, k+1, 1-p) See Cephes docs for details. """ p = fix_rounding_error(p) if (p < 0) or (p > 1): raise ValueError, "Binomial p must be between 0 and 1." if (k < 0) or (n < k): raise ValueError, "Binomial k must be between 0 and n." if k == n: return 1 dn = n - k if k == 0: return pow(1-p, dn) else: return betai(dn, k+1, 1-p) def bdtrc(k, n, p): """Complement of binomial distribution, k+1 through n. Uses formula bdtrc(k, n, p) = betai(k+1, n-k, p) See Cephes docs for details. """ p = fix_rounding_error(p) if (p < 0) or (p > 1): raise ValueError, "Binomial p must be between 0 and 1." if (k < 0) or (n < k): raise ValueError, "Binomial k must be between 0 and n." 
if k == n: return 0 dn = n - k if k == 0: if p < .01: dk = -expm1(dn * log1p(-p)) else: dk = 1 - pow(1.0-p, dn) else: dk = k + 1 dk = betai(dk, dn, p) return dk def pdtr(k, m): """Returns sum of left tail of Poisson distribution, 0 through k. See Cephes docs for details. """ if k < 0: raise ValueError, "Poisson k must be >= 0." if m < 0: raise ValueError, "Poisson m must be >= 0." return igamc(k+1, m) def pdtrc(k, m): """Returns sum of right tail of Poisson distribution, k+1 through infinity. See Cephes docs for details. """ if k < 0: raise ValueError, "Poisson k must be >= 0." if m < 0: raise ValueError, "Poisson m must be >= 0." return igam(k+1, m) def fdtr(a, b, x): """Returns left tail of F distribution, 0 to x. See Cephes docs for details. """ if min(a, b) < 1: raise ValueError, "F a and b (degrees of freedom) must both be >= 1." if x < 0: raise ValueError, "F distribution value of f must be >= 0." w = a * x w /= float(b + w) return betai(0.5 * a, 0.5 * b, w) def fdtrc(a, b, x): """Returns right tail of F distribution, x to infinity. See Cephes docs for details. """ if min(a, b) < 1: raise ValueError, "F a and b (degrees of freedom) must both be >= 1." if x < 0: raise ValueError, "F distribution value of f must be >= 0." w = float(b) / (b + a * x) return betai(0.5 * b, 0.5 * a, w) def gdtr(a, b, x): """Returns integral from 0 to x of Gamma distribution with params a and b. """ if x < 0.0: raise ZeroDivisionError, "x must be at least 0." return igam( b, a * x) def gdtrc(a, b, x): """Returns integral from x to inf of Gamma distribution with params a and b. """ if x < 0.0: raise ZeroDivisionError, "x must be at least 0." 
return igamc(b, a * x) #note: ndtri for the normal distribution is already imported def chdtri(df, y): """Returns inverse of chi-squared distribution.""" y = fix_rounding_error(y) if(y < 0.0 or y > 1.0 or df < 1.0): raise ZeroDivisionError, "y must be between 0 and 1; df >= 1" return 2 * igami(0.5*df, y) def stdtri(k, p): """Returns inverse of Student's t distribution. k = df.""" p = fix_rounding_error(p) # handle easy cases if k <= 0 or p < 0.0 or p > 1.0: raise ZeroDivisionError, "k must be >= 1, p between 1 and 0." rk = k #handle intermediate values if p > 0.25 and p < 0.75: if p == 0.5: return 0.0 z = 1.0 - 2.0 * p; z = incbi(0.5, 0.5*rk, abs(z)) t = sqrt(rk*z/(1.0-z)) if p < 0.5: t = -t return t #handle extreme values rflg = -1 if p >= 0.5: p = 1.0 - p; rflg = 1 z = incbi(0.5*rk, 0.5, 2.0*p) if MAXNUM * z < rk: return rflg * MAXNUM t = sqrt(rk/z - rk) return rflg * t def pdtri(k, p): """Inverse of Poisson distribution. Finds Poisson mean such that integral from 0 to k is p. """ p = fix_rounding_error(p) if k < 0 or p < 0.0 or p >= 1.0: raise ZeroDivisionError, "k must be >=0, p between 1 and 0." v = k+1; return igami(v, p) def bdtri(k, n, y): """Inverse of binomial distribution. Finds binomial p such that sum of terms 0-k reaches cum probability y. """ y = fix_rounding_error(y) if y < 0.0 or y > 1.0: raise ZeroDivisionError, "y must be between 1 and 0." if k < 0 or n <= k: raise ZeroDivisionError, "k must be between 0 and n" dn = n - k if k == 0: if y > 0.8: p = -expm1(log1p(y-1.0) / dn) else: p = 1.0 - y**(1.0/dn) else: dk = k + 1; p = incbet(dn, dk, 0.5) if p > 0.5: p = incbi(dk, dn, 1.0-y) else: p = 1.0 - incbi(dn, dk, y) return p def gdtri(a, b, y): """Returns Gamma such that y is the probability in the integral. WARNING: if 1-y == 1, gives incorrect result. The scipy implementation gets around this by using cdflib, which is in Fortran. Until someone gets around to translating that, only use this function for values of p greater than 1e-15 or so!
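The inverse functions in this stretch (chdtri, stdtri, pdtri, bdtri, gdtri) all invert a CDF analytically via incbi/igami. The idea can be illustrated with a generic numeric inverter: given any nondecreasing CDF, bisect until the target probability is hit. This sketch recovers the normal quantile, not any of the module's own inverses:

```python
from math import erfc, sqrt

def invert_cdf(cdf, target, lo, hi, iters=100):
    """Find x with cdf(x) ~= target by bisection; cdf must be nondecreasing."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def normal_cdf(x):
    return 1 - 0.5 * erfc(x / sqrt(2))

# recover the 97.5% point of the standard normal (~1.96)
x = invert_cdf(normal_cdf, 0.975, -10, 10)
```

The analytic inverses above are preferable where available; bisection is the fallback when only the forward CDF exists.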
""" y = fix_rounding_error(y) if y < 0.0 or y > 1.0 or a <= 0.0 or b < 0.0: raise ZeroDivisionError, "a and b must be non-negative, y from 0 to 1." return igami(b, 1.0-y) / a def fdtri(a, b, y): """Returns inverse of F distribution.""" y = fix_rounding_error(y) if( a < 1.0 or b < 1.0 or y <= 0.0 or y > 1.0): raise ZeroDivisionError, "y must be between 0 and 1; a and b >= 1" y = 1.0-y # Compute probability for x = 0.5 w = incbet(0.5*b, 0.5*a, 0.5) # If that is greater than y, then the solution w < .5. # Otherwise, solve at 1-y to remove cancellation in (b - b*w). if w > y or y < 0.001: w = incbi(0.5*b, 0.5*a, y) x = (b - b*w)/(a*w) else: w = incbi(0.5*a, 0.5*b, 1.0-y) x = b*w/(a*(1.0-w)) return x PyCogent-1.5.3/cogent/maths/stats/histogram.py000644 000765 000024 00000006622 12024702176 022347 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides Histogram, which bins arbitrary objects. """ from cogent.maths.stats.util import Freqs __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" class Histogram(object): """Stores a list of _bins and a list of corresponding objects. It contains always a similar number of bins and value lists. A value list can contain multiple objects. _bins should implement __contains__. """ Multi = False Mapping = None def __init__(self, data='', bins=None, Mapping=None, Multi=None): """Returns a new Histogram object. Data is any sequence of data bins is a list of objects that implement __contains__ Mapping is a function to be applied on each object in data. Gives you the option to order the objects according to some value e.g. make a distribution of sequence lengths. 
def seq_length(s): return len(s) h = Histogram(data=sequences, bins=spans, Mapping=seq_length) Multi determines whether an object might end up in multiple bins. False by default. All parameters other than data are None by default. This gives you the opportunity to subclass Histogram and store _bins, Multi, and Mapping as class data. """ if bins is not None: self._bins = bins if Mapping is not None: self.Mapping = Mapping if Multi is not None: self.Multi = Multi self.clear() self(data) def __iter__(self): """Iterates through Bin, Values pairs""" return iter(zip(self._bins, self._values)) def __call__(self, data): """Puts all objects in data in the bins. Keeps old data that was already in histogram, so updates the data. """ function = self.Mapping if function: transformed = map(function, data) else: transformed = data for t, d in zip(transformed, data): in_bin = False for bin, values in self: if t in bin: values.append(d) in_bin = True if not self.Multi: break else: continue if not in_bin: self.Other.append(d) def clear(self): """Erases all data in the bins. Note: creates new, empty lists, so if you have references to the old data elsewhere in your code they won't be updated. """ self._values = [[] for i in self._bins] self.Other = [] def __str__(self): """Returns string representation of the histogram""" result = [] for bin, values in self: result.append(str(bin)+'\t'+str(values)) return '\n'.join(result) def toFreqs(self): """Returns a Freqs object based on the histogram.
Labels of Freqs will be _bins converted into strings Values of Freqs will be the number of objects in a Bin """ result = Freqs() for bin,values in self: result[str(bin)] = len(values) return result PyCogent-1.5.3/cogent/maths/stats/information_criteria.py000644 000765 000024 00000002171 12024702176 024554 0ustar00jrideoutstaff000000 000000 from __future__ import division import numpy __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "Production" def aic(lnL, nfp, sample_size=None): """returns Akaike Information Criterion Arguments: - lnL: the maximum log-likelihood of a model - nfp: the number of free parameters in the model - sample_size: if provided, the second order AIC is returned """ if sample_size is None: correction = 1 else: assert sample_size > 0, "Invalid sample_size %s" % sample_size correction = sample_size / (sample_size - nfp - 1) return -2* lnL + 2 * nfp * correction def bic(lnL, nfp, sample_size): """returns Bayesian Information Criterion Arguments: - lnL: the maximum log-likelihood of a model - nfp: the number of free parameters in the model - sample_size: size of the sample """ return -2* lnL + nfp * numpy.log(sample_size) PyCogent-1.5.3/cogent/maths/stats/jackknife.py000644 000765 000024 00000013533 12024702176 022276 0ustar00jrideoutstaff000000 000000 from __future__ import division import numpy as np from cogent import LoadTable __author__ = "Anuj Pahwa, Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Anuj Pahwa", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "Production" def IndexGen(length): data = tuple(range(length)) def gen(i): temp = list(data) temp.pop(i) return temp return gen class JackknifeStats(object):
"""Computes the jackknife statistic for a particular statistical function as outlined by 'Tukey's Jackknife Method' Biometry by Sokal/Rohlf.""" def __init__(self, length, calc_stat, gen_index=IndexGen): """Initialise the jackknife class: length: The length of the data set (since data is not passed to this class). calc_stat: A callback function that computes the required statistic of a defined dataset. gen_index: A callback function that generates a list of indices that are used to sub-sample the dataset.""" super(JackknifeStats, self).__init__() self.n = length self.calc_stat = calc_stat self.gen_index = gen_index(self.n) self._subset_statistics = None self._pseudovalues = None self._jackknifed_stat = None self._sample_statistic = None self._standard_error = None def jackknife(self): """Computes the jackknife statistics and standard error""" n = self.n n_minus_1 = n - 1 # compute the statistic in question on the whole data set self._sample_statistic = self.calc_stat(range(self.n)) n_sample_statistic = n * self._sample_statistic # compute the jackknife statistic for the data by removing an element # in each iteration and computing the statistic. 
subset_statistics = [] pseudovalues = [] for index in range(self.n): stat = self.calc_stat(self.gen_index(index)) subset_statistics.append(stat) pseudovalue = n_sample_statistic - n_minus_1 * stat pseudovalues.append(pseudovalue) self._pseudovalues = np.array(pseudovalues) self._subset_statistics = np.array(subset_statistics) self._jackknifed_stat = self._pseudovalues.mean(axis=0) # Compute the approximate standard error of the jackknifed estimate # of the statistic variance = np.square(self._pseudovalues - self._jackknifed_stat).sum(axis=0) variance_norm = np.divide(variance, n * n_minus_1) self._standard_error = np.sqrt(variance_norm) @property def SampleStat(self): if self._sample_statistic is None: self.jackknife() return self._sample_statistic @property def JackknifedStat(self): if self._jackknifed_stat is None: self.jackknife() return self._jackknifed_stat @property def StandardError(self): if self._standard_error is None: self.jackknife() return self._standard_error @property def SubSampleStats(self): """Return a table of the sub-sample statistics""" # if the statistics haven't been run yet. if self._subset_statistics is None: self.jackknife() # generate table title = 'Subsample Stats' rows = [] for index in range(self.n): row = [] row.append(index) subset_statistics = self._subset_statistics[index] try: for value in subset_statistics: row.append(value) except TypeError: row.append(subset_statistics) rows.append(row) header = ['i'] subset_stats = self._subset_statistics[0] try: num_datasets = len(subset_stats) for i in range(num_datasets): header.append('Stat_%s-i'%i) except TypeError: header.append('Stat-i') return LoadTable(rows=rows, header=header,title=title) @property def Pseudovalues(self): """Return a table of the Pseudovalues""" # if the statistics haven't been run yet. 
        if self._pseudovalues is None:
            self.jackknife()

        # detailed table
        title = 'Pseudovalues'
        rows = []
        for index in range(self.n):
            row = [index]
            pseudovalues = self._pseudovalues[index]
            try:
                for value in pseudovalues:
                    row.append(value)
            except TypeError:
                row.append(pseudovalues)
            rows.append(row)

        header = ['i']
        pseudovalues = self._pseudovalues[0]
        try:
            num_datasets = len(pseudovalues)
            for i in range(num_datasets):
                header.append('Pseudovalue_%s-i' % i)
        except TypeError:
            header.append('Pseudovalue-i')

        return LoadTable(rows=rows, header=header, title=title)

    @property
    def SummaryStats(self):
        """Return a summary table with the statistic value(s) calculated for
        the full data-set, the jackknife statistics and standard errors."""
        # if the statistics haven't been run yet.
        if self._jackknifed_stat is None:
            self.jackknife()

        header = ['Sample Stat', 'Jackknife Stat', 'Standard Error']
        title = 'Summary Statistics'
        rows = np.vstack((self._sample_statistic, self._jackknifed_stat,
                          self._standard_error))
        rows = rows.transpose()
        return LoadTable(header=header, rows=rows, title=title)

PyCogent-1.5.3/cogent/maths/stats/kendall.py
"""Computes Kendall's tau statistic and associated probabilities.
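The pseudovalue arithmetic used by the JackknifeStats class above can be sanity-checked with a minimal standalone sketch (stdlib only; the `jackknife_mean` name is illustrative, not part of PyCogent). For the mean, each pseudovalue n*theta - (n-1)*theta_(-i) reduces to the observation x_i itself, so the jackknifed estimate equals the sample mean and the standard error is the familiar s/sqrt(n).

```python
import math

def jackknife_mean(data):
    """Return (sample stat, jackknifed stat, standard error) for the mean."""
    n = len(data)
    sample_stat = sum(data) / float(n)
    pseudovalues = []
    for i in range(n):
        # leave-one-out statistic, then Tukey pseudovalue
        loo = (sum(data) - data[i]) / float(n - 1)
        pseudovalues.append(n * sample_stat - (n - 1) * loo)
    jack_stat = sum(pseudovalues) / float(n)
    variance = sum((p - jack_stat) ** 2 for p in pseudovalues) / (n * (n - 1))
    return sample_stat, jack_stat, math.sqrt(variance)
```

For [1, 2, 3, 4, 5] the pseudovalues are the data themselves, giving a jackknifed mean of 3.0 and standard error sqrt(0.5).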
A wrapper function is provided in cogent.maths.stats.kendall_correlation

Translated from R 2.5 by Gavin Huttley
"""
from __future__ import division
from numpy import floor, sqrt, array
from cogent.maths.stats.util import Freqs
from cogent.maths.stats.distribution import zprob

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "Production"

def as_paired_ranks(x, y):
    """return as matrix of paired ranks"""
    n = len(x)
    paired = zip(x, y)
    x = list(x)
    y = list(y)
    x.sort()
    y.sort()
    rank_val_map_x = dict(zip(x, range(n)))
    rank_val_map_y = dict(zip(y, range(n)))
    ranked = []
    for i in range(n):
        ranked += [[rank_val_map_x[paired[i][0]],
                    rank_val_map_y[paired[i][1]]]]
    return ranked

def ckendall(k, n, w):
    # translated from R 2.5
    combin = n * (n - 1) / 2
    if k < 0 or k > combin:
        return 0
    if w[n][k] < 0:
        if n == 1:
            w[n][k] = k == 0
        else:
            s = 0
            for i in range(n):
                result = ckendall(k - i, n - 1, w)
                s += result
            w[n][k] = s
    return w[n][k]

def pkendall(x, n, divisor, working):
    # translated from R 2.5
    q = floor(x + 1e-7)
    if q < 0:
        x = 0
    elif q > n * (n - 1) / 2:
        x = 1
    else:
        p = 0
        for k in range(int(q) + 1):
            result = ckendall(k, n, working)
            p += result
        x = p / divisor
    return x

def kendalls_tau(x, y, return_p=True):
    """returns Kendall's tau

    Arguments:
        - return_p: returns the probability from the normal approximation
          when True, otherwise just returns tau"""
    ranked = as_paired_ranks(x, y)
    n = len(ranked)
    con = 0
    discor = 0
    x_tied = 0
    y_tied = 0
    for i in range(n - 1):
        x_1 = ranked[i][0]
        y_1 = ranked[i][1]
        for j in range(i + 1, n):
            x_2 = ranked[j][0]
            y_2 = ranked[j][1]
            x_diff = x_1 - x_2
            y_diff = y_1 - y_2
            if x_diff * y_diff > 0:
                con += 1
            elif x_diff and y_diff:
                discor += 1
            else:
                if x_diff:
                    y_tied += 1
                if y_diff:
                    x_tied += 1

    diff = con - discor
    total = con + discor
    denom = ((total + y_tied) * (total + x_tied)) ** 0.5
    variance = (4 * n + 10) / (9 * n * (n - 1))
    tau = diff / denom
    stat = tau
    if x_tied or y_tied:
        x_tied = array([v for v in Freqs(x).itervalues() if v > 1])
        y_tied = array([v for v in Freqs(y).itervalues() if v > 1])
        t0 = n * (n - 1) / 2
        t1 = sum(x_tied * (x_tied - 1)) / 2
        t2 = sum(y_tied * (y_tied - 1)) / 2
        stat = tau * sqrt((t0 - t1) * (t0 - t2))
        v0 = n * (n - 1) * (2 * n + 5)
        vt = sum(x_tied * (x_tied - 1) * (2 * x_tied + 5))
        vu = sum(y_tied * (y_tied - 1) * (2 * y_tied + 5))
        v1 = sum(x_tied * (x_tied - 1)) * sum(y_tied * (y_tied - 1))
        v2 = sum(x_tied * (x_tied - 1) * (x_tied - 2)) * \
             sum(y_tied * (y_tied - 1) * (y_tied - 2))
        variance = (v0 - vt - vu) / 18 + v1 / (2 * n * (n - 1)) + \
                   v2 / (9 * n * (n - 1) * (n - 2))
    if return_p:
        return tau, zprob(stat / variance ** 0.5)
    else:
        return tau

PyCogent-1.5.3/cogent/maths/stats/ks.py
#!/usr/bin/env python
"""computes probabilities for the kolmogorov distribution.

Translated from R 2.4 by Gavin Huttley
"""
from __future__ import division
from numpy import sqrt, log, pi, exp, fabs, floor, zeros, asarray,\
    dot as matrixmultiply, ones, array, reshape, ravel, sum, arange
from cogent.maths.stats.special import combinations

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

PIO4 = pi / 4
PIO2 = pi / 2
INVSQRT2PI = 1 / sqrt(2 * pi)

def mpower(A, exponent):
    """matrix power"""
    new = A
    for i in range(1, exponent):
        new = matrixmultiply(new, A)
    return new

def pkolmogorov1x(statistic, n):
    """Probability function for the one-sided one sample Kolmogorov
    statistics. Translated from R 2.4."""
    statistic = asarray(statistic)
    if statistic <= 0:
        return 0.
    if statistic >= 1:
        return 1.
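The kendalls_tau function above counts concordant and discordant pairs over ranks; for tie-free data tau reduces to (C - D) / C(n, 2). A brute-force cross-check of that reduction (stdlib only; `naive_tau` is an illustrative name, not a PyCogent function):

```python
from itertools import combinations

def naive_tau(x, y):
    """Tie-free Kendall's tau: (concordant - discordant) / C(n, 2)."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / float(len(pairs))
```

Identical orderings give tau = 1, fully reversed orderings give tau = -1, and a single swapped pair in three items gives 1/3.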
to = floor(n * (1-statistic))+1 j = arange(0, to) coeffs = asarray([log(combinations(n, i)) for i in j]) p = sum(exp(coeffs + (n-j)*log(1-statistic-j/n) + \ (j-1)*(log(statistic+j/n)))) return 1 - statistic * p def pkolmogorov2x(statistic, n): """Probability function for Kolmogorovs distribution.""" k=int(n*statistic)+1 m=2*k-1 h=k-n*statistic H = ones(m**2, 'd') Q = zeros(m**2, 'd') for i in range(m): for j in range(m): if(i-j+1<0): H[i*m+j]=0 for i in range(m): H[i*m] -= h**(i+1) H[(m-1) * m+i] -= h**(m-i) H[(m-1)*m] += [0, (2*h-1)**m][2 * h - 1 > 0] for i in range(m): for j in range(m): if(i-j+1>0): for g in range(1, i-j+2): H[i*m+j] /= g Q = ravel(mpower(reshape(H, (m,m)), n)) s = Q[(k-1)*m+k-1] for i in range(1,n+1): s *= i/n return s def pkstwo(x_vector, tolerance=1e-6): """Probability from the Kolmogorov asymptotic distribution.""" #if isinstance(x_vector, float): # x_vector = asarray(x_vector) x_vector = array(x_vector, ndmin=1) size = len(x_vector) k_max = int(sqrt(2-log(tolerance))) for i in range(size): if x_vector[i] < 1: z = -(PIO2 * PIO4) / x_vector[i]**2 w = log(x_vector[i]) s = 0 for k in range(1, k_max, 2): s += exp(k**2 * z - w) x_vector[i] = s / INVSQRT2PI else: z = -2 * x_vector[i]**2 s = -1 k = 1 old = 0 new = 1 while fabs(old - new) > tolerance: old = new new += (2 * s * exp(z * k**2)) s *= -1 k += 1 x_vector[i] = new return x_vector def psmirnov2x(statistic, least, most): if least > most: least, most = most, least q = floor(statistic * most * least - 1e-7) / (least * most) u_vector = zeros(most+1, 'd') for j in range(most+1): #SUPPORT2425 u_vector[j] = [1,0][int(j / most > q)] for i in range(1,least+1): w = i / (i+most) if i/least > q: u_vector[0] = 0 else: u_vector[0] = w * u_vector[0] for j in range(1, most+1): if fabs(i/least - j/most) > q: u_vector[j] = 0 else: u_vector[j] = w * u_vector[j] + u_vector[j-1] return u_vector[most] PyCogent-1.5.3/cogent/maths/stats/period.py000644 000765 000024 00000015600 12024702176 021630 
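The large-x branch of pkstwo above evaluates the asymptotic Kolmogorov distribution as the alternating series 1 - 2*sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2). A standalone sketch of just that series (stdlib math only; `ks_cdf_asymptotic` is an illustrative name):

```python
import math

def ks_cdf_asymptotic(x, tol=1e-10):
    """P(K <= x) for the Kolmogorov distribution via the alternating
    series 1 - 2*sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * x**2)."""
    if x <= 0:
        return 0.0
    total = 0.0
    k = 1
    while True:
        term = (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
        total += term
        if abs(term) < tol:
            break
        k += 1
    return 1.0 - 2.0 * total
```

The familiar 5% critical value ~1.358 maps to a CDF value of about 0.95 under this series.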
from __future__ import division
from random import shuffle, random, choice
import numpy

try:
    from math import factorial
except ImportError:  # python version < 2.6
    from cogent.maths.stats.special import Gamma
    factorial = lambda x: Gamma(x + 1)

from cogent.maths.stats.special import igam

__author__ = "Hua Ying, Julien Epps and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Julien Epps", "Hua Ying", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "Production"

def chi_square(x, p, df=1):
    """returns the chisquare statistic and its probability"""
    N = len(x)
    end = N
    sim = numpy.logical_not(numpy.logical_xor(x[0:end - p], x[p:end])) * 1
    s = ((numpy.ones((N - p,), float) - sim) ** 2).sum()
    D = s / (N - p)
    p_val = 1 - igam(df / 2.0, D / 2)
    return D, p_val

def g_statistic(X, p=None, idx=None):
    """return g statistic and p value

    Arguments:
        X - the periodicity profile (e.g. DFT magnitudes, autocorrelation
        etc). X needs to contain only those period values being
        considered, i.e.
        only periods in the range [llim, ulim]
    """
    # X should be real
    X = abs(numpy.array(X))
    if p is None:
        power = X.max(0)
        idx = X.argmax(0)
    else:
        assert idx is not None
        power = X[idx]

    g_obs = power / X.sum()
    M = numpy.floor(1 / g_obs)
    pmax = len(X)
    result = numpy.zeros((int(M + 1),), float)
    pmax_fact = factorial(pmax)
    for index in xrange(1, min(pmax, int(M)) + 1):
        v = (-1) ** (index - 1) * pmax_fact / factorial(pmax - index) / factorial(index)
        v *= (1 - index * g_obs) ** (pmax - 1)
        result[index] = v
    p_val = result.sum()
    return g_obs, p_val

def _seq_to_symbols(seq, motifs, motif_length, result=None):
    """return symbolic representation of the sequence

    Arguments:
        - seq: a sequence
        - motifs: a list of sequence motifs
        - motif_length: length of first motif
    """
    if result is None:
        result = numpy.zeros(len(seq), numpy.uint8)
    else:
        result.fill(0)

    if motif_length is None:
        motif_length = len(motifs[0])

    for i in xrange(len(seq) - motif_length + 1):
        if seq[i: i + motif_length] in motifs:
            result[i] = 1
    return result

try:
    from cogent.maths._period import seq_to_symbols
    # raise ImportError
except ImportError:
    seq_to_symbols = _seq_to_symbols

class SeqToSymbols(object):
    """class for converting all occurrences of motifs in passed sequence
    to 1/0 otherwise"""
    def __init__(self, motifs, length=None, motif_length=None):
        super(SeqToSymbols, self).__init__()
        if type(motifs) == str:
            motifs = [motifs]
        self.motifs = motifs
        self.length = length
        self.motif_length = motif_length or len(motifs[0])
        self.working = None
        if length is not None:
            self.setResultArray(length)

    def setResultArray(self, length):
        """sets a result array for length"""
        self.working = numpy.zeros(length, numpy.uint8)
        self.length = length

    def __call__(self, seq, result=None):
        if result is None and self.working is None:
            self.setResultArray(len(seq))
        elif self.working is not None:
            if len(seq) != self.working.shape[0]:
                self.setResultArray(len(seq))
            result = self.working
            result.fill(0)
        if type(seq) != str:
            seq = ''.join(seq)
        return seq_to_symbols(seq, self.motifs,
                              self.motif_length, result)

def circular_indices(vector, start, length, num):
    """return num elements of vector from start, wrapping back to the
    beginning of vector when the end is reached"""
    if start > length:
        start = start - length

    if start + num < length:
        return vector[start: start + num]
    # get all till end, then from beginning
    return vector[start:] + vector[:start + num - length]

def sampled_places(block_size, length):
    """returns randomly sampled positions with block_size to make a new
    vector with length"""
    # Main condition is to identify when a draw would run off end, we want
    # to draw from beginning
    num_seg, remainder = divmod(length, block_size)
    vector = range(length)
    result = []
    for seg_num in xrange(num_seg):
        i = choice(vector)
        result += circular_indices(vector, i, length, block_size)

    if remainder:
        result += circular_indices(vector, i + block_size, length, remainder)

    assert len(result) == length, len(result)
    return result

def blockwise_bootstrap(signal, calc, block_size, num_reps,
                        seq_to_symbols=None, num_stats=None):
    """returns observed statistic and the probability from the bootstrap
    test of observing more `power' by chance than that estimated from the
    observed signal

    Arguments:
        - signal: a series, can be a sequence object
        - calc: function to calculate the period power, e.g. ipdft, hybrid,
          auto_corr or any other statistic.
        - block_size: size of contiguous values for resampling
        - num_reps: number of randomly generated permutations
        - seq_to_symbols: function to convert a sequence to 1/0. If not
          provided, the raw data is used.
        - num_stats: the number of statistics being evaluated for each
          iteration. Defaults to 1.
""" signal_length = len(signal) if seq_to_symbols is not None: dtype='c' else: dtype=None # let numpy guess signal = numpy.array(list(signal), dtype=dtype) if seq_to_symbols is not None: symbolic = seq_to_symbols(signal) data = symbolic else: data = signal obs_stat = calc(data) if seq_to_symbols is not None: if sum(symbolic) == 0: p = [numpy.array([1.0, 1.0, 1.0]), 1.0][num_stats == 1] return obs_stat, p if num_stats is None: try: num_stats = calc.getNumStats() except AttributeError: num_stats = 1 if num_stats == 1: count = 0 else: count = numpy.zeros(num_stats) for rep in range(num_reps): # get sample positions sampled_indices = sampled_places(block_size, signal_length) new_signal = signal.take(sampled_indices) if seq_to_symbols is not None: symbolic = seq_to_symbols(new_signal) data = symbolic else: data = new_signal sim_stat = calc(data) # count if > than observed if num_stats > 1: count[sim_stat >= obs_stat] += 1 elif sim_stat >= obs_stat: count += 1 return obs_stat, count / num_reps # def percrb4(): # """Return SNR and CRB for periodicity estimates from symbolic signals""" # # TODO: complete the function according to Julien's percrb4.m # pass # PyCogent-1.5.3/cogent/maths/stats/rarefaction.py000644 000765 000024 00000017614 12024702176 022652 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from numpy import concatenate, repeat, array, zeros, histogram, arange, uint, zeros from numpy.random import permutation, randint, sample, multinomial from random import Random, _ceil, _log """Given array of objects (counts or indices), perform rarefaction analyses.""" __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" class MyRandom(Random): """Adding a method to sample from array only""" def sample_array(self, population, k): """Chooses k unique random elements from a 
population sequence. Returns a new list containing elements from the population while leaving the original population unchanged. The resulting list is in selection order so that all sub-slices will also be valid random samples. This allows raffle winners (the sample) to be partitioned into grand prize and second place winners (the subslices). Members of the population need not be hashable or unique. If the population contains repeats, then each occurrence is a possible selection in the sample. To choose a sample in a range of integers, use xrange as an argument. This is especially fast and space efficient for sampling from a large population: sample(xrange(10000000), 60) """ # Sampling without replacement entails tracking either potential # selections (the pool) in a list or previous selections in a set. # When the number of selections is small compared to the # population, then tracking selections is efficient, requiring # only a small set and an occasional reselection. For # a larger number of selections, the pool tracking method is # preferred since the list takes less space than the # set and it doesn't suffer from frequent reselections. n = len(population) if not 0 <= k <= n: raise ValueError("sample larger than population") random = self.random _int = int result = zeros(k) setsize = 21 # size of a small set minus size of an empty list if k > 5: setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets if n <= setsize or hasattr(population, "keys"): # An n-length list is smaller than a k-length set, or this is a # mapping type so the other algorithm wouldn't work. 
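The pool-tracking strategy described in the comment above — swap each selected item to the end of the live region so later draws cannot pick it again — can be sketched standalone (stdlib only; `sample_without_replacement` is an illustrative name, not the PyCogent method):

```python
import random

def sample_without_replacement(population, k, rng=None):
    """Pool-tracking draw of k unique positions' items: each selection is
    replaced by an item from the shrinking tail of the pool."""
    if rng is None:
        rng = random
    pool = list(population)
    n = len(pool)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    result = []
    for i in range(k):
        j = int(rng.random() * (n - i))
        result.append(pool[j])
        pool[j] = pool[n - i - 1]  # move a non-selected item into the vacancy
    return result
```

Each draw costs O(1) and no rejection loop is needed, which is why the class above prefers this route when k is large relative to n.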
pool = array(list(population)) for i in xrange(k): # invariant: non-selected at [0,n-i) j = _int(random() * (n-i)) result[i] = pool[j] pool[j] = pool[n-i-1] # move non-selected item into vacancy else: try: selected = set() selected_add = selected.add for i in xrange(k): j = _int(random() * n) while j in selected: j = _int(random() * n) selected_add(j) result[i] = population[j] except (TypeError, KeyError): # handle (at least) sets if isinstance(population, list): raise return self.sample_array(tuple(population), k) return result _inst = MyRandom() sample = _inst.sample_array def subsample(counts, n): """Subsamples new vector from vector of orig items. Returns all items if requested sample is larger than number of items. """ if counts.sum() <= n: return counts nz = counts.nonzero()[0] unpacked = concatenate([repeat(array(i,), counts[i]) for i in nz]) permuted = permutation(unpacked)[:n] result = zeros(len(counts)) for p in permuted: result[p] += 1 return result def subsample_freq_dist_nonzero(counts, n, dtype=uint): """Subsamples new vector from vector of orig items. Returns all items if requested sample is larger than number of items. This version uses the cumsum/frequency distribution method. """ if counts.sum() <= n: return counts result = zeros(len(counts), dtype=dtype) nz = counts.nonzero()[0] compressed = counts.take(nz) sums = compressed.cumsum() total = sums[-1] del compressed curr = n while curr: pick = randint(0, total) #print pick, sums, sums.searchsorted(pick), '\n' index = sums.searchsorted(pick,side='right') result[nz[index]] += 1 sums[index:] -= 1 curr -= 1 total -= 1 return result def subsample_random(counts, n, dtype=uint): """Subsamples new vector from vector of orig items. Returns all items if requested sample is larger than number of items. This version uses random.sample. 
""" if counts.sum() <= n: return counts nz = counts.nonzero()[0] unpacked = concatenate([repeat(array(i,), counts[i]) for i in nz]) permuted = sample(unpacked, n) result = zeros(len(counts),dtype=dtype) for p in permuted: result[p] += 1 return result def subsample_multinomial(counts, n, dtype=None): """Subsamples new vector from vector of orig items. Returns all items if requested sample is larger than number of items. This version uses the multinomial to sample WITH replacement. """ if dtype == None: dtype=counts.dtype if counts.sum() <= n: return counts result = zeros(len(counts), dtype=dtype) nz = counts.nonzero()[0] compressed = counts.take(nz).astype(float) compressed /= compressed.sum() result = multinomial(n, compressed).astype(dtype) counts[nz] = result return counts def naive_histogram(vals, max_val=None, result=None): """Naive histogram for performance testing vs. numpy's. Apparently numpy's is 3x faster (non-cumulative) for larger step sizes (e.g. 1000) and 10x slower for small step sizes (e.g. 1), so will use logic to switch over depending on conditions. """ if max_val is None: max_val = vals.max() if result is None: result = zeros(max_val+1, dtype=int) for v in vals: result[v] += 1 return result def wrap_numpy_histogram(max_val): """return convenience wrapper for numpy histogram""" bins = arange(max_val+2, dtype = int) #+1 for length, +1 for leading 0 def f(vals, max_val='ignored'): return histogram(vals, bins)[0] return f def rarefaction(data, start=0, stop=None, stride=1, histogram_f=None, \ permutation_f=permutation, is_counts=True): """Yields successive subsamples as vectors from vector of orig items. data can either be array of counts or array of observations. Default is to assume counts; set is_counts to False if this is not the case for your input. Returns all items if requested sample is larger than number of items. 
WARNING: each successive result is written into the same object (for convenience) so if you want the actual vectors for each rarefaction you need to do something like res = [r.copy() for r in rarefaction(params)]. """ if is_counts: #need to transform data into indices nz = array(data).nonzero()[0] indices = concatenate([repeat(array(i,), data[i]) for i in nz]) else: indices = array(data) if stop is None: stop = len(indices) if not stride: stride = 1 #avoid zero or None as stride max_val=indices.max() if histogram_f is None: if stride < 100: histogram_f = naive_histogram else: histogram_f = wrap_numpy_histogram(max_val) permuted = permutation_f(indices) result = zeros(max_val+1, dtype=int) while start < stop: curr_slice = permuted[start:start+stride] result += histogram_f(curr_slice, max_val=max_val) yield result start += stride PyCogent-1.5.3/cogent/maths/stats/special.py000644 000765 000024 00000077322 12024702176 021777 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Translations of functions from Release 2.3 of the Cephes Math Library, (c) Stephen L. Moshier 1984, 1995. 
""" from __future__ import division from numpy import exp, log, floor, sin, sqrt __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Rob Knight", "Sandra Smit", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" log_epsilon = 1e-6 #for threshold in log/exp close to 1 #For IEEE arithmetic (IBMPC): MACHEP = 1.11022302462515654042E-16 #2**-53 MAXLOG = 7.09782712893383996843E2 #log(2**1024) MINLOG = -7.08396418532264106224E2 #log(2**-1022) MAXNUM = 1.7976931348623158E308 #2**1024 PI = 3.14159265358979323846 #pi PIO2 = 1.57079632679489661923 #pi/2 PIO4 = 7.85398163397448309616E-1 #pi/4 SQRT2 = 1.41421356237309504880 #sqrt(2) SQRTH = 7.07106781186547524401E-1 #sqrt(2)/2 LOG2E = 1.4426950408889634073599 #1/log(2) SQ2OPI = 7.9788456080286535587989E-1 #sqrt( 2/pi ) LOGE2 = 6.93147180559945309417E-1 #log(2) LOGSQ2 = 3.46573590279972654709E-1 #log(2)/2 THPIO4 = 2.35619449019234492885 #3*pi/4 TWOOPI = 6.36619772367581343075535E-1 #2/pi ROUND_ERROR = 1e-14 # fp rounding error: causes some tests to fail # will round to 0 if smaller in magnitude than this def fix_rounding_error(x): """If x is almost in the range 0-1, fixes it. Specifically, if x is between -ROUND_ERROR and 0, returns 0. If x is between 1 and 1+ROUND_ERROR, returns 1. """ if -ROUND_ERROR < x < 0: return 0 elif 1 < x < 1+ROUND_ERROR: return 1 else: return x def log_one_minus(x): """Returns natural log of (1-x). Useful for probability calculations. """ if abs(x) < log_epsilon: return -x else: return log(1 - x) def one_minus_exp(x): """Returns 1-exp(x). Useful for probability calculations. """ if abs(x) < log_epsilon: return -x else: return 1 - exp(x) def permutations(n, k): """Returns the number of ways of choosing k items from n, in order. Defined as n!/(n-k)!. """ #Validation: k must be be between 0 and n (inclusive), and n must be >=0. 
if k > n: raise IndexError, "Can't choose %s items from %s" % (k, n) elif k < 0: raise IndexError, "Can't choose negative number of items" elif n < 0: raise IndexError, "Can't choose from negative number of items" if min(n, k) < 20 and isinstance(n,int) and isinstance(k,int): return permutations_exact(n, k) else: return exp(ln_permutations(n, k)) def permutations_exact(n, k): """Calculates permutations by integer division. Preferred method for small permutations, but slow on larger ones. Note: no error checking (expects to be called through permutations()) """ product = 1 for i in xrange(n-k+1, n+1): product *= i return product def ln_permutations(n, k): """Calculates permutations by difference in log of gamma function. Preferred method for large permutations, but slow on smaller ones. Note: no error checking (expects to be called through permutations()) """ return lgam(n+1) - lgam(n-k+1) def combinations(n, k): """Returns the number of ways of choosing k items from n, in order. Defined as n!/(k!(n-k)!). """ #Validation: k must be be between 0 and n (inclusive), and n must be >=0. if k > n: raise IndexError, "Can't choose %s items from %s" % (k, n) elif k < 0: raise IndexError, "Can't choose negative number of items" elif n < 0: raise IndexError, "Can't choose from negative number of items" #if min(n, k) < 20: if min(n, k) < 20 and isinstance(n,int) and isinstance(k,int): return combinations_exact(n, k) else: return exp(ln_combinations(n, k)) def combinations_exact(n, k): """Calculates combinations by integer division. Preferred method for small combinations, but slow on larger ones. Note: no error checking (expects to be called through combinations()) """ #permutations(n, k) = permutations(n, n-k), so reduce computation by #figuring out which requires calculation of fewer terms. if k > (n - k): larger = k smaller = n - k else: larger = n - k smaller = k product = 1 #compute n!/(n-larger)! 
by multiplying terms from n to (n-larger+1) for i in xrange(larger+1, n+1): product *= i #divide by (smaller)! by multiplying terms from 2 to smaller for i in xrange(2, smaller+1): #no need to divide by 1... product /= i #ok to use integer division: should always be factor return product def ln_combinations(n, k): """Calculates combinations by difference in log of gamma function. Preferred method for large combinations, but slow on smaller ones. Note: no error checking (expects to be called through combinations()) """ return lgam(n+1) - lgam(k+1) - lgam(n-k+1) def ln_binomial(successes, trials, prob): """Returns the natural log of the binomial distribution. successes: number of successes trials: number of trials prob: probability of success Works for int and float values. Approximated by the gamma function. Note: no error checking (expects to be called through binomial_exact()) """ prob = fix_rounding_error(prob) return ln_combinations(trials, successes) + successes * log(prob) + \ (trials-successes) * log(1.0-prob) #Translations of functions from Cephes Math Library, by Stephen L. Moshier def polevl(x, coef): """evaluates a polynomial y = C_0 + C_1x + C_2x^2 + ... + C_Nx^N Coefficients are stored in reverse order, i.e. 
coef[0] = C_N """ result = 0 for c in coef: result = result * x + c return result #Coefficients for zdist follow: ZP = [ 2.46196981473530512524E-10, 5.64189564831068821977E-1, 7.46321056442269912687E0, 4.86371970985681366614E1, 1.96520832956077098242E2, 5.26445194995477358631E2, 9.34528527171957607540E2, 1.02755188689515710272E3, 5.57535335369399327526E2, ] ZQ = [ 1.0, 1.32281951154744992508E1, 8.67072140885989742329E1, 3.54937778887819891062E2, 9.75708501743205489753E2, 1.82390916687909736289E3, 2.24633760818710981792E3, 1.65666309194161350182E3, 5.57535340817727675546E2, ] ZR = [ 5.64189583547755073984E-1, 1.27536670759978104416E0, 5.01905042251180477414E0, 6.16021097993053585195E0, 7.40974269950448939160E0, 2.97886665372100240670E0, ] ZS = [ 1.00000000000000000000E0, 2.26052863220117276590E0, 9.39603524938001434673E0, 1.20489539808096656605E1, 1.70814450747565897222E1, 9.60896809063285878198E0, 3.36907645100081516050E0, ] ZT = [ 9.60497373987051638749E0, 9.00260197203842689217E1, 2.23200534594684319226E3, 7.00332514112805075473E3, 5.55923013010394962768E4, ] ZU = [ 1.00000000000000000000E0, 3.35617141647503099647E1, 5.21357949780152679795E2, 4.59432382970980127987E3, 2.26290000613890934246E4, 4.92673942608635921086E4, ] def erf(a): """Returns the error function of a: see Cephes docs.""" if abs(a) > 1: return 1 - erfc(a) z = a * a return a * polevl(z, ZT)/polevl(z, ZU) def erfc(a): """Returns the complement of the error function of a: see Cephes docs.""" if a < 0: x = -a else: x = a if x < 1: return 1 - erf(a) z = -a * a if z < -MAXLOG: #underflow if a < 0: return 2 else: return 0 z = exp(z) if x < 8: p = polevl(x, ZP) q = polevl(x, ZQ) else: p = polevl(x, ZR) q = polevl(x, ZS) y = z * p / q if a < 0: y = 2 - y if y == 0: #underflow if a < 0: return 2 else: return 0 else: return y #Coefficients for Gamma follow: GA = [ 8.11614167470508450300E-4, -5.95061904284301438324E-4, 7.93650340457716943945E-4, -2.77777777730099687205E-3, 8.33333333333331927722E-2, ] GB = [ 
-1.37825152569120859100E3, -3.88016315134637840924E4, -3.31612992738871184744E5, -1.16237097492762307383E6, -1.72173700820839662146E6, -8.53555664245765465627E5, ] GC = [ 1.00000000000000000000E0, -3.51815701436523470549E2, -1.70642106651881159223E4, -2.20528590553854454839E5, -1.13933444367982507207E6, -2.53252307177582951285E6, -2.01889141433532773231E6, ] GP = [ 1.60119522476751861407E-4, 1.19135147006586384913E-3, 1.04213797561761569935E-2, 4.76367800457137231464E-2, 2.07448227648435975150E-1, 4.94214826801497100753E-1, 9.99999999999999996796E-1, ] GQ = [ -2.31581873324120129819E-5, 5.39605580493303397842E-4, -4.45641913851797240494E-3, 1.18139785222060435552E-2, 3.58236398605498653373E-2, -2.34591795718243348568E-1, 7.14304917030273074085E-2, 1.00000000000000000320E0, ] STIR = [ 7.87311395793093628397E-4, -2.29549961613378126380E-4, -2.68132617805781232825E-3, 3.47222221605458667310E-3, 8.33333333333482257126E-2, ] MAXSTIR = 143.01608 MAXLGM = 2.556348e305 MAXGAM = 171.624376956302725 LOGPI = 1.14472988584940017414 SQTPI = 2.50662827463100050242E0 LS2PI = 0.91893853320467274178 #Generally useful constants SQRTH = 7.07106781186547524401E-1 SQRT2 = 1.41421356237309504880 MAXLOG = 7.09782712893383996843E2 MINLOG = -7.08396418532264106224E2 MACHEP = 1.11022302462515654042E-16 PI = 3.14159265358979323846 big = 4.503599627370496e15 biginv = 2.22044604925031308085e-16 def igamc(a,x): """Complemented incomplete Gamma integral: see Cephes docs.""" if x <= 0 or a <= 0: return 1 if x < 1 or x < a: return 1 - igam(a, x) ax = a * log(x) - x - lgam(a) if ax < -MAXLOG: #underflow return 0 ax = exp(ax) #continued fraction y = 1 - a z = x + y + 1 c = 0 pkm2 = 1 qkm2 = x pkm1 = x + 1 qkm1 = z * x ans = pkm1/qkm1 while 1: c += 1 y += 1 z += 2 yc = y * c pk = pkm1 * z - pkm2 * yc qk = qkm1 * z - qkm2 * yc if qk != 0: r = pk/qk t = abs((ans-r)/r) ans = r else: t = 1 pkm2 = pkm1 pkm1 = pk qkm2 = qkm1 qkm1 = qk if abs(pk) > big: pkm2 *= biginv pkm1 *= biginv qkm2 *= biginv qkm1 *= 
biginv if t <= MACHEP: break return ans * ax def igam(a, x): """Left tail of incomplete gamma function: see Cephes docs for details""" if x <= 0 or a <= 0: return 0 if x > 1 and x > a: return 1 - igamc(a,x) #Compute x**a * exp(x) / Gamma(a) ax = a * log(x) - x - lgam(a) if ax < -MAXLOG: #underflow return 0.0 ax = exp(ax) #power series r = a c = 1 ans = 1 while 1: r += 1 c *= x/r ans += c if c/ans <= MACHEP: break return ans * ax / a def lgam(x): """Natural log of the gamma fuction: see Cephes docs for details""" sgngam = 1 if x < -34: q = -x w = lgam(q) p = floor(q) if p == q: raise OverflowError, "lgam returned infinity." i = p if i & 1 == 0: sgngam = -1 else: sgngam = 1 z = q - p if z > 0.5: p += 1 z = p - q z = q * sin(PI * z) if z == 0: raise OverflowError, "lgam returned infinity." z = LOGPI - log(z) - w return z if x < 13: z = 1 p = 0 u = x while u >= 3: p -= 1 u = x + p z *= u while u < 2: if u == 0: raise OverflowError, "lgam returned infinity." z /= u p += 1 u = x + p if z < 0: sgngam = -1 z = -z else: sgngam = 1 if u == 2: return log(z) p -= 2 x = x + p p = x * polevl(x, GB)/polevl(x,GC) return log(z) + p if x > MAXLGM: raise OverflowError, "Too large a value of x in lgam." q = (x - 0.5) * log(x) - x + LS2PI if x > 1.0e8: return q p = 1/(x*x) if x >= 1000: q += (( 7.9365079365079365079365e-4 * p -2.7777777777777777777778e-3) *p + 0.0833333333333333333333) / x else: q += polevl(p, GA)/x return q def betai(aa, bb, xx): """Returns integral of the incomplete beta density function, from 0 to x. See Cephes docs for details. """ if aa <= 0 or bb <= 0: raise ValueError, "betai: a and b must both be > 0." if xx == 0: return 0 if xx == 1: return 1 if xx < 0 or xx > 1: raise ValueError, "betai: x must be between 0 and 1." 
flag = 0 if (bb * xx <= 1) and (xx <= 0.95): t = pseries(aa, bb, xx) return betai_result(t, flag) w = 1 - xx #reverse a and b if x is greater than the mean if xx > (aa/(aa + bb)): flag = 1 a = bb b = aa xc = xx x = w else: a = aa b = bb xc = w x = xx if (flag == 1) and ((b * x) <= 1) and (x <= 0.95): t = pseries(a, b, x) return betai_result(t, flag) #choose expansion for better convergence y = x * (a + b - 2) - (a - 1) if y < 0: w = incbcf(a, b, x) else: w = incbd(a, b, x) / xc y = a * log(x) t = b * log(xc) if ((a + b) < MAXGAM) and (abs(y) < MAXLOG) and (abs(t) < MAXLOG): t = pow(xc, b) t *= pow(x, a) t /= a t *= w t *= Gamma(a+b) / (Gamma(a) * Gamma(b)) return betai_result(t, flag) #resort to logarithms y += t + lgam(a+b) - lgam(a) - lgam(b) y += log(w/a) if y < MINLOG: t = 0 else: t = exp(y) return betai_result(t, flag) def betai_result(t, flag): if flag == 1: if t <= MACHEP: t = 1 - MACHEP else: t = 1 - t return t incbet = betai #shouldn't have renamed in first place... def incbcf(a, b, x): """Incomplete beta integral, first continued fraction representation. See Cephes docs for details.""" k1 = a k2 = a + b k3 = a k4 = a + 1 k5 = 1 k6 = b - 1 k7 = k4 k8 = a + 2 pkm2 = 0 qkm2 = 1 pkm1 = 1 qkm1 = 1 ans = 1 r = 1 n = 0 thresh = 3 * MACHEP while 1: xk = -(x * k1 * k2)/(k3 * k4) pk = pkm1 + pkm2 * xk qk = qkm1 + qkm2 * xk pkm2 = pkm1 pkm1 = pk qkm2 = qkm1 qkm1 = qk xk = (x * k5 * k6)/(k7 * k8) pk = pkm1 + pkm2 * xk qk = qkm1 + qkm2 * xk pkm2 = pkm1 pkm1 = pk qkm2 = qkm1 qkm1 = qk if qk != 0: r = pk/qk if r != 0: t = abs((ans-r)/r) ans = r else: t = 1 if t < thresh: return ans k1 += 1 k2 += 1 k3 += 2 k4 += 2 k5 += 1 k6 -= 1 k7 += 2 k8 += 2 if (abs(qk) + abs(pk)) > big: pkm2 *= biginv pkm1 *= biginv qkm2 *= biginv qkm1 *= biginv if (abs(qk) < biginv) or (abs(pk) < biginv): pkm2 *= big pkm1 *= big qkm2 *= big qkm1 *= big n += 1 if n >= 300: return ans def incbd(a, b, x): """Incomplete beta integral, second continued fraction representation. 
See Cephes docs for details.""" k1 = a k2 = b - 1.0 k3 = a k4 = a + 1.0 k5 = 1.0 k6 = a + b k7 = a + 1.0 k8 = a + 2.0 pkm2 = 0.0 qkm2 = 1.0 pkm1 = 1.0 qkm1 = 1.0 z = x / (1.0-x) ans = 1.0 r = 1.0 n = 0 thresh = 3 * MACHEP while 1: xk = - (z * k1 * k2)/(k3 * k4) pk = pkm1 + pkm2 * xk qk = qkm1 + qkm2 * xk pkm2 = pkm1 pkm1 = pk qkm2 = qkm1 qkm1 = qk xk = (z * k5 * k6)/(k7 * k8) pk = pkm1 + pkm2 * xk qk = qkm1 + qkm2 * xk pkm2 = pkm1 pkm1 = pk qkm2 = qkm1 qkm1 = qk if qk != 0: r = pk/qk if r != 0: t = abs((ans - r)/r) ans = r else: t = 1.0 if t < thresh: return ans k1 += 1 k2 -= 1 k3 += 2 k4 += 2 k5 += 1 k6 += 1 k7 += 2 k8 += 2 if (abs(qk) + abs(pk)) > big: pkm2 *= biginv pkm1 *= biginv qkm2 *= biginv qkm1 *= biginv if (abs(qk) < biginv) or (abs(pk) < biginv): pkm2 *= big pkm1 *= big qkm2 *= big qkm1 *= big n += 1 if n >= 300: return ans def Gamma(x): """Returns the gamma function, a generalization of the factorial. See Cephes docs for details.""" sgngam = 1 q = abs(x) if q > 33: if x < 0: p = floor(q) if p == q: raise OverflowError, "Bad value of x in Gamma function." i = p if (i & 1) == 0: sgngam = -1 z = q - p if z > 0.5: p += 1 z = q - p z = q * sin(PI * z) if z == 0: raise OverflowError, "Bad value of x in Gamma function." z = abs(z) z = PI/(z * stirf(q)) else: z = stirf(x) return sgngam * z z = 1 while x >= 3: x -= 1 z *= x while x < 0: if x > -1e9: return Gamma_small(x, z) while x < 2: if x < 1e-9: return Gamma_small(x, z) z /= x x += 1 if x == 2: return z x -= 2 p = polevl(x, GP) q = polevl(x, GQ) return z * p / q def Gamma_small(x, z): if x == 0: raise OverflowError, "Bad value of x in Gamma function." else: return z / ((1 + 0.5772156649015329 * x) * x) def stirf(x): """Stirling's approximation for the Gamma function. Valid for 33 <= x <= 162. See Cephes docs for details. 
""" w = 1.0/x w = 1 + w * polevl(w, STIR) y = exp(x) if x > MAXSTIR: #avoid overflow in pow() v = pow(x, 0.5 * x - 0.25) y = v * (v/y) else: y = pow(x, x - 0.5) / y return SQTPI * y * w def pseries(a, b, x): """Power series for incomplete beta integral. Use when b * x is small and x not too close to 1. See Cephes docs for details. """ ai = 1 / a u = (1-b) * x v = u / (a + 1) t1 = v t = u n = 2 s = 0 z = MACHEP * ai while abs(v) > z: u = (n - b) * x / n t *= u v = t / (a + n) s += v n += 1 s += t1 s += ai u = a * log(x) if ((a + b) < MAXGAM) and (abs(u) < MAXLOG): t = Gamma(a+b)/(Gamma(a)*Gamma(b)) s = s * t * pow(x, a) else: t = lgam(a+b) - lgam(a) - lgam(b) + u + log(s) if t < MINLOG: s = 0 else: s = exp(t) return(s) def log1p(x): """Log for values close to 1: from Cephes math library""" z = 1+x if (z < SQRTH) or (z > SQRT2): return log(z) z = x * x z = -0.5 * z + x * (z * polevl(x, LP) / polevl(x, LQ)) return x + z LP = [ 4.5270000862445199635215E-5, 4.9854102823193375972212E-1, 6.5787325942061044846969E0, 2.9911919328553073277375E1, 6.0949667980987787057556E1, 5.7112963590585538103336E1, 2.0039553499201281259648E1, ] LQ = [ 1, 1.5062909083469192043167E1, 8.3047565967967209469434E1, 2.2176239823732856465394E2, 3.0909872225312059774938E2, 2.1642788614495947685003E2, 6.0118660497603843919306E1, ] def expm1(x): """Something to do with exp? 
From Cephes.""" if (x < -0.5) or (x > 0.5): return (exp(x) - 1.0) xx = x * x r = x * polevl(xx, EP) r /= polevl(xx, EQ) - r return r + r EP = [ 1.2617719307481059087798E-4, 3.0299440770744196129956E-2, 9.9999999999999999991025E-1, ] EQ = [ 3.0019850513866445504159E-6, 2.5244834034968410419224E-3, 2.2726554820815502876593E-1, 2.0000000000000000000897E0, ] def igami(a, y0): #bound the solution x0 = MAXNUM yl = 0 x1 = 0 yh = 1.0 dithresh = 5.0 * MACHEP #handle easy cases if ((y0<0.0) or (y0>1.0) or (a<=0)): raise ZeroDivisionError, "y0 must be between 0 and 1; a >= 0" elif (y0==0.0): return MAXNUM elif (y0==1.0): return 0.0 #approximation to inverse function d = 1.0/(9.0*a) y = ( 1.0 - d - ndtri(y0) * sqrt(d) ) x = a * y * y * y lgm = lgam(a); for i in range(10): #this loop is just to eliminate gotos while 1: if( x > x0 or x < x1 ): break y = igamc(a,x); if( y < yl or y > yh ): break if( y < y0 ): x0 = x yl = y else: x1 = x yh = y #compute the derivative of the function at this point d = (a - 1.0) * log(x) - x - lgm if d < -MAXLOG: break d = -exp(d) #compute the step to the next approximation of x d = (y - y0)/d if abs(d/x) < MACHEP: return x x -= d break #Resort to interval halving if Newton iteration did not converge. 
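That interval-halving fallback is easiest to see on the a = 1 special case, where igamc(1, x) = exp(-x) and the exact inverse is -log(y0). A standalone sketch under that assumption (`igami_bisect` is an illustrative name, not part of the module):

```python
import math

def igami_bisect(y0, lo=0.0, hi=1e3, iters=200):
    # Invert igamc(1, x) = exp(-x) by repeated interval halving: exp(-x)
    # is strictly decreasing, so each midpoint evaluation tells us which
    # half of [lo, hi] still brackets the root.
    f = lambda x: math.exp(-x)      # igamc(1, x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y0:
            hi = mid                # f too small: x is below mid
        else:
            lo = mid                # f too large (or equal): x is above mid
    return 0.5 * (lo + hi)

print(igami_bisect(0.25))  # ~1.3863, i.e. log(4)
```

Bisection is slower than the Newton step above but never diverges, which is why Cephes uses it as the safety net.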
d = 0.0625 if x0 == MAXNUM: if x <= 0.0: x = 1.0 while x0 == MAXNUM: x = (1.0 + d) * x y = igamc(a, x) if y < y0: x0 = x yl = y break d += d d = 0.5 dir = 0 for i in range(400): x = x1 + d * (x0 - x1) y = igamc(a, x) lgm = (x0 - x1)/(x1 + x0) if abs(lgm) < dithresh: break lgm = (y - y0)/y0 if abs(lgm) < dithresh: break if x <= 0.0: break if y >= y0: x1 = x yh = y if dir < 0: dir = 0 d = 0.5 elif dir > 1: d = 0.5 * d + 0.5 else: d = (y0 - yl)/(yh - yl) dir += 1 else: x0 = x yl = y if dir > 0: dir = 0 d = 0.5 elif dir < -1: d *= 0.5 else: d = (y0 - yl)/(yh - yl) dir -= 1 if x == 0.0: return 0 return x P0 = [ -5.99633501014107895267E1, 9.80010754185999661536E1, -5.66762857469070293439E1, 1.39312609387279679503E1, -1.23916583867381258016E0, ] Q0 = [ 1.00000000000000000000E0, 1.95448858338141759834E0, 4.67627912898881538453E0, 8.63602421390890590575E1, -2.25462687854119370527E2, 2.00260212380060660359E2, -8.20372256168333339912E1, 1.59056225126211695515E1, -1.18331621121330003142E0, ] s2pi = 2.50662827463100050242E0 P1 = [ 4.05544892305962419923E0, 3.15251094599893866154E1, 5.71628192246421288162E1, 4.40805073893200834700E1, 1.46849561928858024014E1, 2.18663306850790267539E0, -1.40256079171354495875E-1, -3.50424626827848203418E-2, -8.57456785154685413611E-4, ] Q1 = [ 1.00000000000000000000E0, 1.57799883256466749731E1, 4.53907635128879210584E1, 4.13172038254672030440E1, 1.50425385692907503408E1, 2.50464946208309415979E0, -1.42182922854787788574E-1, -3.80806407691578277194E-2, -9.33259480895457427372E-4, ] P2 = [ 3.23774891776946035970E0, 6.91522889068984211695E0, 3.93881025292474443415E0, 1.33303460815807542389E0, 2.01485389549179081538E-1, 1.23716634817820021358E-2, 3.01581553508235416007E-4, 2.65806974686737550832E-6, 6.23974539184983293730E-9, ] Q2 = [ 1.00000000000000000000E0, 6.02427039364742014255E0, 3.67983563856160859403E0, 1.37702099489081330271E0, 2.16236993594496635890E-1, 1.34204006088543189037E-2, 3.28014464682127739104E-4, 2.89247864745380683936E-6, 
6.79019408009981274425E-9, ] exp_minus_2 = 0.13533528323661269189 def ndtri(y0): """Inverse normal distribution function. This is here and not in distributions because igami depends on it...""" y0 = fix_rounding_error(y0) #handle easy cases if y0 <= 0.0: return -MAXNUM elif y0 >= 1.0: return MAXNUM code = 1; y = y0; if y > (1.0 - exp_minus_2): y = 1.0 - y code = 0 if y > exp_minus_2: y -= 0.5 y2 = y * y x = y + y * (y2 * polevl(y2, P0)/polevl(y2, Q0)) x = x * s2pi return x x = sqrt(-2.0 * log(y)) x0 = x - log(x)/x z = 1.0/x if x < 8.0: # y > exp(-32) = 1.2664165549e-14 x1 = z * polevl(z, P1)/polevl(z, Q1) else: x1 = z * polevl(z, P2)/polevl(z, Q2) x = x0 - x1 if code != 0: x = -x return x def incbi(aa, bb, yy0): """Incomplete beta inverse function. See Cephes for docs.""" #handle easy cases first i = 0 if yy0 <= 0: return 0.0 elif yy0 >= 1.0: return 1.0 #define inscrutable parameters x0 = 0.0 yl = 0.0 x1 = 1.0 yh = 1.0 nflg = 0 if aa <= 1.0 or bb <= 1.0: dithresh = 1.0e-6 rflg = 0 a = aa b = bb y0 = yy0 x = a/(a+b) y = incbet(a, b, x) return _incbi_ihalve(\ dithresh, rflg, nflg, a, b, x0, yl, x1, yh, y0, x, y, aa, bb, yy0) else: dithresh = 1.0e-4 # approximation to inverse function yp = -ndtri(yy0) if yy0 > 0.5: rflg = 1 a = bb b = aa y0 = 1.0 - yy0 yp = -yp else: rflg = 0 a = aa b = bb y0 = yy0 lgm = (yp * yp - 3.0)/6.0 x = 2.0/(1.0/(2.0*a-1.0) + 1.0/(2.0*b-1.0)) d = yp * sqrt(x + lgm) / x \ - ( 1.0/(2.0*b-1.0) - 1.0/(2.0*a-1.0) ) \ * (lgm + 5.0/6.0 - 2.0/(3.0*x)) d *= 2.0 if d < MINLOG: x = 1.0 return _incbi_under(rflg, x) x = a/(a + b * exp(d)) y = incbet(a, b, x) yp = (y - y0)/y0 if abs(yp) < 0.2: return _incbi_newt(\ dithresh, rflg, nflg, a, b, x0, yl, x1, yh, y0, x, y, aa, bb, yy0) else: return _incbi_ihalve(\ dithresh, rflg, nflg, a, b, x0, yl, x1, yh, y0, x, y, aa, bb, yy0) def _incbi_done(rflg, x): """Final test in incbi.""" if rflg: if x <= MACHEP: x = 1.0 - MACHEP else: x = 1.0 - x return x def _incbi_under(rflg, x): """Underflow handler in incbi.""" x = 
0.0 return _incbi_done(rflg, x) class IhalveRepeat(Exception): pass # Resort to interval halving if not close enough. def _incbi_ihalve(\ dithresh, rflg, nflg, a, b, x0, yl, x1, yh, y0, x, y, aa, bb, yy0): """Interval halving in incbi.""" while 1: try: dir = 0 di = 0.5 for i in range(100): if i != 0: x = x0 + di * (x1 - x0) if x == 1.0: x = 1.0 - MACHEP if x == 0.0: di = 0.5 x = x0 + di * (x1 - x0) if x == 0.0: return _incbi_under(rflg, x) y = incbet(a, b, x) yp = (x1 - x0)/(x1 + x0) if abs(yp) < dithresh: return _incbi_newt(\ dithresh, rflg, nflg, a, b, x0, yl, x1, yh, y0, x, y, aa, bb, yy0) yp = (y-y0)/y0 if abs(yp) < dithresh: return _incbi_newt(\ dithresh, rflg, nflg, a, b, x0, yl, x1, yh, y0, x, y, aa, bb, yy0) if y < y0: x0 = x yl = y if dir < 0: dir = 0 di = 0.5 elif dir > 3: di = 1.0 - (1.0 - di) * (1.0 - di) elif dir > 1: di = 0.5 * di + 0.5 else: di = (y0 - y)/(yh - yl) dir += 1 if x0 > 0.75: if rflg == 1: rflg = 0 a = aa b = bb y0 = yy0 else: rflg = 1 a = bb b = aa y0 = 1.0 - yy0 x = 1.0 - x y = incbet(a, b, x) x0 = 0.0 yl = 0.0 x1 = 1.0 yh = 1.0 raise IhalveRepeat else: x1 = x; if rflg == 1 and x1 < MACHEP: x = 0.0 return _incbi_done(rflg, x) yh = y if dir > 0: dir = 0 di = 0.5 elif dir < -3: di *= di elif dir < -1: di *= 0.5 else: di = (y - y0)/(yh - yl) dir -= 1 if x0 >= 1.0: x = 1.0 - MACHEP return _incbi_done(rflg, x) if x <= 0.0: return _incbi_under(rflg, x) except IhalveRepeat: continue def _incbi_newt(\ dithresh, rflg, nflg, a, b, x0, yl, x1, yh, y0, x, y, aa, bb, yy0): """Newton's method for incbi.""" if nflg: return _incbi_done(rflg, x) nflg = 1 lgm = lgam(a+b) - lgam(a) - lgam(b) for i in range(8): # Compute the function at this point. if i != 0: y = incbet(a,b,x); if y < yl: x = x0 y = yl elif y > yh: x = x1 y = yh elif y < y0: x0 = x yl = y else: x1 = x yh = y if x == 1.0 or x == 0.0: break # Compute the derivative of the function at this point. 
d = (a - 1.0) * log(x) + (b - 1.0) * log(1.0-x) + lgm if d < MINLOG: return _incbi_done(rflg, x) if d > MAXLOG: break d = exp(d) # Compute the step to the next approximation of x. d = (y - y0)/d xt = x - d if xt <= x0: y = (x - x0) / (x1 - x0) xt = x0 + 0.5 * y * (x - x0) if xt <= 0.0: break if xt >= x1: y = (x1 - x) / (x1 - x0) xt = x1 - 0.5 * y * (x1 - x) if xt >= 1.0: break x = xt if abs(d/x) < 128.0 * MACHEP: return _incbi_done(rflg, x) # Did not converge. dithresh = 256.0 * MACHEP return _incbi_ihalve(\ dithresh, rflg, nflg, a, b, x0, yl, x1, yh, y0, x, y, aa, bb, yy0) PyCogent-1.5.3/cogent/maths/stats/test.py000644 000765 000024 00000167207 12024702176 021340 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides standard statistical tests. Tests produce statistic and P-value. """ from __future__ import division import warnings from cogent.maths.stats.distribution import chi_high, z_low, z_high, zprob, \ t_high, t_low, tprob, f_high, f_low, fprob, binomial_high, binomial_low, \ ndtri from cogent.maths.stats.special import lgam, log_one_minus, one_minus_exp,\ MACHEP from cogent.maths.stats.ks import psmirnov2x, pkstwo from cogent.maths.stats.kendall import pkendall, kendalls_tau from cogent.maths.stats.special import Gamma from numpy import absolute, arctanh, array, asarray, concatenate, transpose, \ ravel, take, nonzero, log, sum, mean, cov, corrcoef, fabs, any, \ reshape, tanh, clip, nan, isnan, isinf, sqrt, trace, exp, \ median as _median, zeros, ones #, std - currently incorrect from numpy.random import permutation, randint from cogent.maths.stats.util import Numbers from operator import add from random import choice __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Rob Knight", "Catherine Lozupone", "Sandra Smit", "Micah Hamady", "Daniel McDonald", "Greg Caporaso", "Jai Ram Rideout", "Michael Dwan"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" 
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

class IndexOrValueError(IndexError, ValueError):
    pass

var = cov   #cov will calculate variance if called on a vector

def std_(x, axis=None):
    """Returns standard deviations by axis (similar to numpy.std)

    The result is unbiased, matching the result from MLab.std
    """
    x = asarray(x)
    if axis is None:
        d = x - mean(x)
        return sqrt(sum(d**2)/(len(x)-1))
    elif axis == 0:
        result = []
        for col in range(x.shape[1]):
            vals = x[:,col]
            d = vals - mean(vals)
            result.append(sqrt(sum(d**2)/(len(vals)-1)))
        return result
    elif axis == 1:
        result = []
        for row in range(x.shape[0]):
            vals = x[row,:]
            d = vals - mean(vals)
            #divide by the length of this row, not len(x) (the row count)
            result.append(sqrt(sum(d**2)/(len(vals)-1)))
        return result
    else:
        raise ValueError, "axis out of bounds"

# tested only by std
def var(x, axis=None):
    """Returns unbiased variance over given axis.

    Similar to numpy.var, except that it is unbiased. (var = SS/(n-1))

    x: a float ndarray or asarray(x) is a float ndarray.
    axis=None: computed for the flattened array by default, or compute along
    a given integer axis.

    Implementation Notes:
    Change the SS calculation from: SumSq(x-x_bar) to SumSq(x) - SqSum(x)/n
    See p. 37 of Zar (1999) Biostatistical Analysis.
    """
    x = asarray(x)
    #figure out sample size along the axis
    if axis is None:
        n = x.size
    else:
        n = x.shape[axis]
    #compute the sum of squares from the mean(s)
    sample_SS = sum(x**2, axis) - sum(x, axis)**2/n
    return sample_SS / (n-1)

def std(x, axis=None):
    """Computes unbiased standard deviations along given axis or flat array.

    Similar to numpy.std, except that it is unbiased. (std = sqrt(SS/(n-1)))

    x: a float ndarray or asarray(x) is a float ndarray.
    axis=None: computed for the flattened array by default, or compute along
    a given integer axis.
    """
    try:
        sample_variance = var(x, axis=axis)
    except IndexError, e:
        #just to avoid breaking the old test code
        raise IndexOrValueError(e)
    return sqrt(sample_variance)

def median(m, axis=None):
    """Returns medians by axis (similar to numpy.median)

    numpy.median does not accept an axis parameter. It is safe to substitute
    this function for numpy.median.
    """
    median_vals = []
    rows, cols = m.shape
    if axis is None:
        return _median(ravel(m))
    elif axis == 0:
        for col in range(cols):
            median_vals.append(_median(m[:,col]))
    elif axis == 1 or axis == -1:
        for row in range(rows):
            median_vals.append(_median(m[row,:]))
    else:
        raise ValueError, "axis(=%s) out of bounds" % axis
    return array(median_vals)

class ZeroExpectedError(ValueError):
    """Class for handling tests where an expected value was zero."""
    pass

def G_2_by_2(a, b, c, d, williams=1, directional=1):
    """G test for independence in a 2 x 2 table.

    Usage: G, prob = G_2_by_2(a, b, c, d, williams, directional)

    Cells are in the order:
        a b
        c d

    a, b, c, and d can be int, float, or long.
    williams is a boolean stating whether to do the Williams correction.
    directional is a boolean stating whether the test is 1-tailed.

    Briefly, computes sum(f ln f) for cells - sum(f ln f) for rows and
    columns + f ln f for the table. Always has 1 degree of freedom.

    To generalize the test to r x c, use the same protocol:
    2*(cells - rows/cols + table), then with (r-1)(c-1) df.

    Note that G is always positive: to get a directional test, the
    appropriate ratio (e.g. a/b > c/d) must be tested as a separate
    procedure. Find the probability for the observed G, and then either halve
    or halve and subtract from one depending on whether the directional
    prediction was upheld.

    The default test is now one-tailed (Rob Knight 4/21/03).

    See Sokal & Rohlf (1995), ch. 17. Specifically, see box 17.6 (p731).
""" cells = [a, b, c, d] n = sum(cells) #return 0 if table was empty if not n: return (0, 1) #raise error if any counts were negative if min(cells) < 0: raise ValueError, \ "G_2_by_2 got negative cell counts(s): must all be >= 0." G = 0 #Add x ln x for items, adding zero for items whose counts are zero for i in filter(None, cells): G += i * log(i) #Find totals for rows and cols ab = a + b cd = c + d ac = a + c bd = b + d rows_cols = [ab, cd, ac, bd] #exit if we are missing a row or column entirely: result counts as #never significant if min(rows_cols) == 0: return (0, 1) #Subtract x ln x for rows and cols for i in filter(None, rows_cols): G -= i * log(i) #Add x ln x for table G += n * log(n) #Result needs to be multiplied by 2 G *= 2 #apply Williams correction if williams: q = 1 + (( ( (n/ab) + (n/cd) ) -1 ) * ( ( (n/ac) + (n/bd) ) -1))/(6*n) G /= q p = chi_high(max(G,0), 1) #find which tail we were in if the test was directional if directional: is_high = ((b == 0) or (d != 0 and (a/b > c/d))) p = tail(p, is_high) if not is_high: G = -G return G, p def safe_sum_p_log_p(a, base=None): """Calculates p * log(p) safely for an array that may contain zeros.""" flat = ravel(a) nz = take(flat, nonzero(flat)[0]) logs = log(nz) if base: logs /= log(base) return sum(nz * logs, 0) def G_ind(m, williams=False): """Returns G test for independence in an r x c table. Requires input data as a numpy array. From Sokal and Rohlf p 738. 
""" f_ln_f_elements = safe_sum_p_log_p(m) f_ln_f_rows = safe_sum_p_log_p(sum(m,0)) f_ln_f_cols = safe_sum_p_log_p(sum(m,1)) tot = sum(ravel(m)) f_ln_f_table = tot * log(tot) df = (len(m)-1) * (len(m[0])-1) G = 2*(f_ln_f_elements-f_ln_f_rows-f_ln_f_cols+f_ln_f_table) if williams: q = 1+((tot*sum(1.0/sum(m,1))-1)*(tot*sum(1.0/sum(m,0))-1)/ \ (6*tot*df)) G = G/q return G, chi_high(max(G,0), df) def calc_contingency_expected(matrix): """Calculates expected frequencies from a table of observed frequencies The input matrix is a dict2D object and represents a frequency table with different variables in the rows and columns. (observed frequencies as values) The expected value is calculated with the following equation: Expected = row_total x column_total / overall_total The returned matrix (dict2D) has lists of the observed and the expected frequency as values """ #transpose matrix for calculating column totals t_matrix = matrix.copy() t_matrix.transpose() overall_total = sum(list(matrix.Items)) #make new matrix for storing results result = matrix.copy() #populate result with expected values for row in matrix: row_sum = sum(matrix[row].values()) for item in matrix[row]: column_sum = sum(t_matrix[item].values()) #calculate expected frequency Expected = (row_sum * column_sum)/overall_total result[row][item] = [result[row][item]] result[row][item].append(Expected) return result def G_fit(obs, exp, williams=1): """G test for fit between two lists of counts. Usage: test, prob = G_fit(obs, exp, williams) obs and exp are two lists of numbers. williams is a boolean stating whether to do the Williams correction. SUM(2 f(obs)ln (f(obs)/f(exp))) See Sokal and Rohlf chapter 17. """ k = len(obs) if k != len(exp): raise ValueError, "G_fit requires two lists of equal length." G = 0 n = 0 for o, e in zip(obs, exp): if o < 0: raise ValueError, \ "G_fit requires all observed values to be positive." if e <= 0: raise ZeroExpectedError, \ "G_fit requires all expected values to be positive." 
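At its core, G_fit computes the statistic 2 * sum(o * ln(o/e)) before the optional Williams correction. A minimal standalone sketch of just that sum, with a worked two-category example (`g_statistic` is an illustrative name):

```python
import math

def g_statistic(obs, exp):
    # 2 * sum(o * ln(o / e)), skipping zero observations since
    # o * ln(o/e) -> 0 as o -> 0 (Williams correction omitted).
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o)

# Observed [10, 20] against a uniform expectation [15, 15]:
# G = 2 * (10*ln(2/3) + 20*ln(4/3))
print(round(g_statistic([10, 20], [15, 15]), 3))  # ~3.398
```

Like the chi-square statistic, G is referred to a chi-square distribution with k - 1 degrees of freedom, which is what the chi_high call in G_fit does.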
if o: #if o is zero, o * log(o/e) must be zero as well. G += o * log(o/e) n += o G *= 2 if williams: q = 1 + (k + 1)/(6*n) G /= q return G, chi_high(G, k - 1) def G_fit_from_Dict2D(data): """G test for fit on a Dict2D data is a dict2D. Values are a list containing the observed and expected frequencies (can be created with calc_contingency_expected) """ obs_counts = [] exp_counts = [] for item in data.Items: if len(item) == 2: obs_counts.append(item[0]) exp_counts.append(item[1]) g_val, prob = G_fit(obs_counts, exp_counts) return g_val, prob def chi_square_from_Dict2D(data): """Chi Square test on a Dict2D data is a Dict2D. The values are a list of the observed (O) and expected (E) frequencies,(can be created with calc_contingency_expected) The chi-square value (test) is the sum of (O-E)^2/E over the items in data degrees of freedom are calculated from data as: (r-1)*(c-1) if cols and rows are both > 1 otherwise is just 1 - the # of rows or columns (whichever is greater than 1) """ test = sum([((item[0] - item[1]) * (item[0] - item[1]))/item[1] \ for item in data.Items]) num_rows = len(data) num_cols = len([col for col in data.Cols]) if num_rows == 1: df = num_cols - 1 elif num_cols == 1: df = num_rows - 1 elif num_rows == 0 or num_cols == 0: raise ValueError, "data matrix must have data" else: df = (len(data) - 1) * (len([col for col in data.Cols]) - 1) return test, chi_high(test, df) def likelihoods(d_given_h, priors): """Calculate likelihoods through marginalization, given Pr(D|H) and priors. Usage: scores = likelihoods(d_given_h, priors) d_given_h and priors are equal-length lists of probabilities. Returns a list of the same length of numbers (not probabilities). """ #check that the lists of Pr(D|H_i) and priors are equal length = len(d_given_h) if length != len(priors): raise ValueError, "Lists not equal lengths." 
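The likelihoods() function divides each Pr(D|H_i) by the weighted sum of Pr(H_i) * Pr(D|H_i); combined with posteriors(), one round of evidence amounts to prior-times-likelihood, renormalized. A standalone sketch of that update (`bayes_update` is an illustrative name, not part of the module):

```python
def bayes_update(priors, d_given_h):
    # prior * likelihood for each hypothesis, renormalized to sum to 1 --
    # the combined effect of likelihoods() followed by posteriors().
    weighted = [p * d for p, d in zip(priors, d_given_h)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Two equally likely hypotheses; the evidence favors the first 3:1.
print(bayes_update([0.5, 0.5], [0.9, 0.3]))  # ~[0.75, 0.25]
```

bayes_updates() below iterates exactly this step over successive pieces of evidence, feeding each posterior back in as the next prior.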
#find weighted sum of Pr(H_i) * Pr(D|H_i) wt_sum = 0 for d, p in zip(d_given_h, priors): wt_sum += d * p #divide each Pr(D|H_i) by the weighted sum and multiply by its prior #to get its likelihood return [d/wt_sum for d in d_given_h] def posteriors(likelihoods, priors): """Calculate posterior probabilities given priors and likelihoods. Usage: probabilities = posteriors(likelihoods, priors) likelihoods is a list of numbers. priors is a list of probabilities. Returns a list of probabilities (0-1). """ #Check that there is a prior for each likelihood if len(likelihoods) != len(priors): raise ValueError, "Lists not equal lengths." #Posterior probability is defined as prior * likelihood return [l * p for l, p in zip(likelihoods, priors)] def bayes_updates(ds_given_h, priors = None): """Successively apply lists of Pr(D|H) to get Pr(H|D) by marginalization. Usage: final_probs = bayes_updates(ds_given_h, [priors]) ds_given_h is a list (for each form of evidence) of lists of probabilities. priors is optionally a list of the prior probabilities. Returns a list of posterior probabilities. """ try: first_list = ds_given_h[0] length = len(first_list) #calculate flat prior if none was passed if not priors: priors = [1/length] * length #apply each form of data to the priors to get posterior probabilities for index, d in enumerate(ds_given_h): #first, ignore the form of data if all the d's are the same all_the_same = True first_element = d[0] for i in d: if i != first_element: all_the_same = False break if not all_the_same: #probabilities won't change if len(d) != length: raise ValueError, "bayes_updates requires equal-length lists." liks = likelihoods(d, priors) pr = posteriors(liks, priors) priors = pr return priors #posteriors after last calculation are 'priors' for next #return column of zeroes if anything went wrong, e.g. if the sum of one of #the ds_given_h is zero. 
except (ZeroDivisionError, FloatingPointError): return [0] * length def t_paired (a,b, tails=None, exp_diff=0): """Returns t and prob for TWO RELATED samples of scores a and b. From Sokal and Rohlf (1995), p. 354. Calculates the vector of differences and compares it to exp_diff using the 1-sample t test. Usage: t, prob = t_paired(a, b, tails, exp_diff) t is a float; prob is a probability. a and b should be equal-length lists of paired observations (numbers). tails should be None (default), 'high', or 'low'. exp_diff should be the expected difference in means (a-b); 0 by default. """ n = len(a) if n != len(b): raise ValueError, 'Unequal length lists in ttest_paired.' try: diffs = array(a) - array(b) return t_one_sample(diffs, popmean=exp_diff, tails=tails) except (ZeroDivisionError, ValueError, AttributeError, TypeError, \ FloatingPointError): return (None, None) def t_one_sample(a,popmean=0, tails=None): """Returns t for ONE group of scores a, given a population mean. Usage: t, prob = t_one_sample(a, popmean, tails) t is a float; prob is a probability. a should support Mean, StandardDeviation, and Count. popmean should be the expected mean; 0 by default. tails should be None (default), 'high', or 'low'. """ try: n = len(a) t = (mean(a) - popmean)/(std(a)/sqrt(n)) except (ZeroDivisionError, ValueError, AttributeError, TypeError, \ FloatingPointError): return None, None if isnan(t) or isinf(t): return None, None prob = t_tailed_prob(t, n-1, tails) return t, prob def t_two_sample (a, b, tails=None, exp_diff=0): """Returns t, prob for two INDEPENDENT samples of scores a, and b. From Sokal and Rohlf, p 223. Usage: t, prob = t_two_sample(a,b, tails, exp_diff) t is a float; prob is a probability. a and b should be sequences of observations (numbers). Need not be equal lengths. tails should be None (default), 'high', or 'low'. exp_diff should be the expected difference in means (a-b); 0 by default. 
    """
    try:
        #see if we need to back off to the single-observation t-test for
        #single-item groups
        n1 = len(a)
        if n1 < 2:
            return t_one_observation(sum(a), b, tails, exp_diff)
        n2 = len(b)
        if n2 < 2:
            return t_one_observation(sum(b), a, reverse_tails(tails),
                                     exp_diff)
        #otherwise, calculate things properly
        x1 = mean(a)
        x2 = mean(b)
        df = n1+n2-2
        svar = ((n1-1)*var(a) + (n2-1)*var(b))/df
        t = (x1-x2-exp_diff)/sqrt(svar*(1/n1 + 1/n2))
    except (ZeroDivisionError, ValueError, AttributeError, TypeError, \
            FloatingPointError), e:
        #bail out if the sample sizes are wrong, the values aren't numeric or
        #aren't present, etc.
        return (None, None)
    if isnan(t) or isinf(t):
        return (None, None)
    prob = t_tailed_prob(t, df, tails)
    return t, prob

def mc_t_two_sample(x_items, y_items, tails=None, permutations=999,
                    exp_diff=0):
    """Performs a two-sample t-test with Monte Carlo permutations.

    x_items and y_items must be INDEPENDENT observations (sequences of
    numbers). They do not need to be of equal length.

    Returns the observed t statistic, the parametric p-value, a list of t
    statistics obtained through Monte Carlo permutations, and the
    nonparametric p-value obtained from the Monte Carlo permutations test.

    This code is partially based on Jeremy Widmann's
    qiime.make_distance_histograms.monte_carlo_group_distances code.

    Arguments:
        x_items - the first list of observations
        y_items - the second list of observations
        tails - if None (the default), a two-sided test is performed. 'high'
            or 'low' for one-tailed tests
        permutations - the number of permutations to use in calculating the
            nonparametric p-value. Must be a number greater than or equal to
            0. If 0, the nonparametric test will not be performed. In this
            case, the list of t statistics obtained from permutations will be
            empty, and the nonparametric p-value will be None
        exp_diff - the expected difference in means (x_items - y_items)
    """
    if tails is not None and tails != 'high' and tails != 'low':
        raise ValueError("Invalid tail type '%s'. Must be either None, "
                         "'high', or 'low'." % tails)
    if permutations < 0:
        raise ValueError("Invalid number of permutations: %d. Must be "
                         "greater than or equal to zero." % permutations)

    if (len(x_items) == 1 and len(y_items) == 1) or \
       (len(x_items) < 1 or len(y_items) < 1):
        raise ValueError("At least one of the sequences of observations is "
                         "empty, or the sequences each contain only a single "
                         "observation. Cannot perform the t-test.")

    # Perform t-test using original observations.
    obs_t, param_p_val = t_two_sample(x_items, y_items, tails=tails,
                                      exp_diff=exp_diff)

    # Only perform the Monte Carlo test if we got a sane answer back from the
    # initial t-test and permutations were specified.
    nonparam_p_val = None
    perm_t_stats = []
    if permutations > 0 and obs_t is not None and param_p_val is not None:
        # Permute observations between x_items and y_items the specified
        # number of times.
        perm_x_items, perm_y_items = _permute_observations(x_items, y_items,
                                                           permutations)
        perm_t_stats = [t_two_sample(perm_x_items[n], perm_y_items[n],
                                     tails=tails, exp_diff=exp_diff)[0]
                        for n in range(permutations)]

        # Compute nonparametric p-value based on the permuted t-test results.
        if tails is None:
            better = (absolute(array(perm_t_stats)) >=
                      absolute(obs_t)).sum()
        elif tails == 'low':
            better = (array(perm_t_stats) <= obs_t).sum()
        elif tails == 'high':
            better = (array(perm_t_stats) >= obs_t).sum()
        nonparam_p_val = (better + 1) / (permutations + 1)
    return obs_t, param_p_val, perm_t_stats, nonparam_p_val

def _permute_observations(x_items, y_items, permutations,
                          permute_f=permutation):
    """Returns permuted versions of the sequences of observations.

    Values are permuted between x_items and y_items (i.e. shuffled between
    the two input sequences of observations).

    This code is based on Jeremy Widmann's
    qiime.make_distance_histograms.permute_between_groups code.
""" num_x = len(x_items) num_y = len(y_items) num_total_obs = num_x + num_y combined_obs = concatenate((x_items, y_items)) # Generate a list of all permutations. perms = [permute_f(num_total_obs) for i in range(permutations)] # Use random permutations to split into groups. rand_xs = [combined_obs[perm[:num_x]] for perm in perms] rand_ys = [combined_obs[perm[num_x:num_total_obs]] for perm in perms] return rand_xs, rand_ys def t_one_observation(x, sample, tails=None, exp_diff=0): """Returns t-test for significance of single observation versus a sample. Equation for 1-observation t (Sokal and Rohlf 1995 p 228): t = obs - mean - exp_diff / (var * sqrt((n+1)/n)) df = n - 1 """ try: n = len(sample) t = (x - mean(sample) - exp_diff)/std(sample)/sqrt((n+1)/n) except (ZeroDivisionError, ValueError, AttributeError, TypeError, \ FloatingPointError): return (None, None) prob = t_tailed_prob(t, n-1, tails) return t, prob def pearson(x_items, y_items): """Returns Pearson's product moment correlation coefficient. This will always be a value between -1.0 and +1.0. x_items and y_items must be the same length, and cannot have fewer than 2 elements each. If one or both of the input vectors do not have any variation, the return value will be 0.0. Arguments: x_items - the first list of observations y_items - the second list of observations """ x_items, y_items = array(x_items), array(y_items) if len(x_items) != len(y_items): raise ValueError("The length of the two vectors must be the same in " "order to calculate the Pearson correlation " "coefficient.") if len(x_items) < 2: raise ValueError("The two vectors must both contain at least 2 " "elements. The vectors are of length %d." 
% len(x_items)) sum_x = sum(x_items) sum_y = sum(y_items) sum_x_sq = sum(x_items*x_items) sum_y_sq = sum(y_items*y_items) sum_xy = sum(x_items*y_items) n = len(x_items) try: r = 1.0 * ((n * sum_xy) - (sum_x * sum_y)) / \ (sqrt((n * sum_x_sq)-(sum_x*sum_x))*sqrt((n*sum_y_sq)-(sum_y*sum_y))) except (ZeroDivisionError, ValueError, FloatingPointError): #no variation r = 0.0 #check we didn't get a naughty value for r due to rounding error if r > 1.0: r = 1.0 elif r < -1.0: r = -1.0 return r def spearman(x_items, y_items): """Returns Spearman's rho. This will always be a value between -1.0 and +1.0. x_items and y_items must be the same length, and cannot have fewer than 2 elements each. If one or both of the input vectors do not have any variation, the return value will be 0.0. Arguments: x_items - the first list of observations y_items - the second list of observations """ x_items, y_items = array(x_items), array(y_items) if len(x_items) != len(y_items): raise ValueError("The length of the two vectors must be the same in " "order to calculate Spearman's rho.") if len(x_items) < 2: raise ValueError("The two vectors must both contain at least 2 " "elements. The vectors are of length %d." % len(x_items)) # Rank the two input vectors. rank1, ties1 = _get_rank(x_items) rank2, ties2 = _get_rank(y_items) if ties1 == 0 and ties2 == 0: n = len(rank1) sum_sqr = sum([(x-y)**2 for x,y in zip(rank1,rank2)]) rho = 1 - (6*sum_sqr/(n*(n**2 - 1))) else: avg = lambda x: sum(x)/len(x) x_bar = avg(rank1) y_bar = avg(rank2) numerator = sum([(x-x_bar)*(y-y_bar) for x,y in zip(rank1, rank2)]) denominator = sqrt(sum([(x-x_bar)**2 for x in rank1])* sum([(y-y_bar)**2 for y in rank2])) # Calculate rho. Handle the case when there is no variation in one or # both of the input vectors. if denominator == 0.0: rho = 0.0 else: rho = numerator/denominator return rho def _get_rank(data): """Ranks the elements of a list. 
Used in Spearman correlation.""" indices = range(len(data)) ranks = range(1,len(data)+1) indices.sort(key=lambda index:data[index]) ranks.sort(key=lambda index:indices[index-1]) data_len = len(data) i = 0 ties = 0 while i < data_len: j = i + 1 val = data[indices[i]] try: val += 0 except TypeError: raise(TypeError) while j < data_len and data[indices[j]] == val: j += 1 dup_ranks = j - i val = float(ranks[indices[i]]) + (dup_ranks-1)/2.0 for k in range(i, i+dup_ranks): ranks[indices[k]] = val i += dup_ranks ties += dup_ranks-1 return ranks, ties def correlation(x_items, y_items): """Returns Pearson correlation between x and y, and its significance. WARNING: x_items and y_items must be same length! This function is retained for backwards-compatibility. Please use correlation_test() for more control over how the test is performed. """ return correlation_test(x_items, y_items, method='pearson', tails=None, permutations=0)[:2] def correlation_test(x_items, y_items, method='pearson', tails=None, permutations=999, confidence_level=0.95): """Computes the correlation between two vectors and its significance. Computes a parametric p-value by using Student's t-distribution with df=n-2 to perform the test of significance, as well as a nonparametric p-value obtained by permuting one of the input vectors the specified number of times given by the permutations parameter. A confidence interval is also computed using Fisher's Z transform if the number of observations is greater than 3. Please see Sokal and Rohlf pp. 575-580 and pg. 598-601 for more details regarding these techniques. Warning: the parametric p-value is unreliable when the method is spearman and there are less than 11 observations in each vector. 
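    For illustration, the statistic this test is built on can be sketched
    with the stdlib only (pearson_r is an illustrative stand-in for this
    module's pearson; the data are chosen so r = 0.6, and the parametric t
    uses df = n - 2 as described above):

```python
from math import sqrt

def pearson_r(xs, ys):
    # Same product-moment formula as pearson() above, stdlib-only.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    den = sqrt(n * sxx - sx * sx) * sqrt(n * syy - sy * sy)
    return (n * sxy - sx * sy) / den if den else 0.0

r = pearson_r([1, 2, 3, 4], [2, 1, 4, 3])   # r == 0.6
t = r * sqrt((4 - 2) / (1 - r * r))         # parametric t with df = n - 2
```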
Returns the correlation coefficient (r or rho), the parametric p-value, a list of the r or rho values obtained from permuting the input, the nonparametric p-value, and a tuple for the confidence interval, with the first element being the lower bound of the confidence interval and the second element being the upper bound for the confidence interval. The confidence interval will be (None, None) if the number of observations is not greater than 3. x_items and y_items must be the same length, and cannot have fewer than 2 elements each. If one or both of the input vectors do not have any variation, r or rho will be 0.0. Note: the parametric portion of this function is based on the correlation function in this module. Arguments: x_items - the first list of observations y_items - the second list of observations method - 'pearson' or 'spearman' tails - if None (the default), a two-sided test is performed. 'high' for a one-tailed test for positive association, or 'low' for a one-tailed test for negative association. This parameter affects both the parametric and nonparametric tests, but the confidence interval will always be two-sided permutations - the number of permutations to use in the nonparametric test. Must be a number greater than or equal to 0. If 0, the nonparametric test will not be performed. In this case, the list of correlation coefficients obtained from permutations will be empty, and the nonparametric p-value will be None confidence_level - the confidence level to use when constructing the confidence interval. Must be between 0 and 1 (exclusive) """ # Perform some initial error checking. if method == 'pearson': corr_fn = pearson elif method == 'spearman': corr_fn = spearman else: raise ValueError("Invalid method '%s'. Must be either 'pearson' or " "'spearman'." % method) if tails is not None and tails != 'high' and tails != 'low': raise ValueError("Invalid tail type '%s'. Must be either None, " "'high', or 'low'." 
% tails) if permutations < 0: raise ValueError("Invalid number of permutations: %d. Must be greater " "than or equal to zero." % permutations) if confidence_level <= 0 or confidence_level >= 1: raise ValueError("Invalid confidence level: %.4f. Must be between " "zero and one." % confidence_level) # Calculate the correlation coefficient. corr_coeff = corr_fn(x_items, y_items) # Perform the parametric test first. x_items, y_items = array(x_items), array(y_items) n = len(x_items) df = n - 2 if n < 3: parametric_p_val = 1 else: try: t = corr_coeff / sqrt((1 - (corr_coeff * corr_coeff)) / df) parametric_p_val = t_tailed_prob(t, df, tails) except (ZeroDivisionError, FloatingPointError): # r/rho was presumably 1. parametric_p_val = 0 # Perform the nonparametric test. permuted_corr_coeffs = [] nonparametric_p_val = None better = 0 for i in range(permutations): permuted_y_items = y_items[permutation(n)] permuted_corr_coeff = corr_fn(x_items, permuted_y_items) permuted_corr_coeffs.append(permuted_corr_coeff) if tails is None: if abs(permuted_corr_coeff) >= abs(corr_coeff): better += 1 elif tails == 'high': if permuted_corr_coeff >= corr_coeff: better += 1 elif tails == 'low': if permuted_corr_coeff <= corr_coeff: better += 1 else: # Not strictly necessary since this was checked above, but included # for safety in case the above check gets removed or messed up. We # don't want to return a p-value of 0 if someone passes in a bogus # tail type somehow. raise ValueError("Invalid tail type '%s'. Must be either None, " "'high', or 'low'." % tails) if permutations > 0: nonparametric_p_val = (better + 1) / (permutations + 1) # Compute the confidence interval for corr_coeff using Fisher's Z # transform. 
z_crit = abs(ndtri((1 - confidence_level) / 2)) ci_low, ci_high = None, None if n > 3: try: ci_low = tanh(arctanh(corr_coeff) - (z_crit / sqrt(n - 3))) ci_high = tanh(arctanh(corr_coeff) + (z_crit / sqrt(n - 3))) except (ZeroDivisionError, FloatingPointError): # r/rho was presumably 1 or -1. Match what R does in this case. ci_low, ci_high = corr_coeff, corr_coeff return (corr_coeff, parametric_p_val, permuted_corr_coeffs, nonparametric_p_val, (ci_low, ci_high)) def correlation_matrix(series, as_rows=True): """Returns pairwise correlations between each pair of series. """ return corrcoef(series, rowvar=as_rows) #unused codes below if as_rows: return corrcoef(transpose(array(series))) else: return corrcoef(array(series)) def regress(x, y): """Returns coefficients to the regression line "y=ax+b" from x[] and y[]. Specifically, returns (slope, intercept) as a tuple from the regression of y on x, minimizing the error in y assuming that x is precisely known. Basically, it solves Sxx a + Sx b = Sxy Sx a + N b = Sy where Sxy = \sum_i x_i y_i, Sx = \sum_i x_i, and Sy = \sum_i y_i. The solution is a = (Sxy N - Sy Sx)/det b = (Sxx Sy - Sx Sxy)/det where det = Sxx N - Sx^2. In addition, Var|a| = s^2 |Sxx Sx|^-1 = s^2 | N -Sx| / det |b| |Sx N | |-Sx Sxx| s^2 = {\sum_i (y_i - \hat{y_i})^2 \over N-2} = {\sum_i (y_i - ax_i - b)^2 \over N-2} = residual / (N-2) R^2 = 1 - {\sum_i (y_i - \hat{y_i})^2 \over \sum_i (y_i - \mean{y})^2} = 1 - residual/meanerror Adapted from the following URL: http://www.python.org/topics/scicomp/recipes_in_python.html """ x, y = array(x,'Float64'), array(y,'Float64') N = len(x) Sx = sum(x) Sy = sum(y) Sxx = sum(x*x) Syy = sum(y*y) Sxy = sum(x*y) det = Sxx * N - Sx * Sx return (Sxy * N - Sy * Sx)/det, (Sxx * Sy - Sx * Sxy)/det def regress_origin(x,y): """Returns coefficients to regression "y=ax+b" passing through origin. Requires vectors x and y of same length. See p. 351 of Zar (1999) Biostatistical Analysis. returns slope, intercept as a tuple. 
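    The normal-equation solution used by regress() above can be sketched
    with the stdlib only (regress_sketch is an illustrative name, not part
    of this module):

```python
def regress_sketch(xs, ys):
    # a = (Sxy*N - Sy*Sx)/det, b = (Sxx*Sy - Sx*Sxy)/det, det = Sxx*N - Sx^2
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = sxx * n - sx * sx
    slope = (sxy * n - sy * sx) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept

slope, intercept = regress_sketch([0, 1, 2, 3], [1, 3, 5, 7])   # (2.0, 1.0)
```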
""" x, y = array(x,'Float64'), array(y,'Float64') return sum(x*y)/sum(x*x), 0 def regress_R2(x, y): """Returns the R^2 value for the regression of x and y Used the method explained on pg 334 ofJ.H. Zar, Biostatistical analysis, fourth edition. 1999 """ slope, intercept = regress(x,y) coords = zip(x, y) Sx = Sy = Syy = SXY = 0.0 n = float(len(y)) for x, y in coords: SXY += x*y Sx += x Sy += y Syy += y*y Sxy = SXY - (Sx*Sy)/n regSS = slope * Sxy totSS = Syy - ((Sy*Sy)/n) return regSS/totSS def regress_residuals(x, y): """reports the residual (error) for each point from the linear regression""" slope, intercept = regress(x, y) coords = zip(x, y) residuals = [] for x, y in coords: e = y - (slope * x) - intercept residuals.append(e) return residuals def stdev_from_mean(x): """returns num standard deviations from the mean of each val in x[]""" x = array(x) return (x - mean(x))/std(x) def regress_major(x, y): """Returns major-axis regression line of y on x. Use in cases where there is error in both x and y. """ x, y = array(x), array(y) N = len(x) Sx = sum(x) Sy = sum(y) Sxx = sum(x*x) Syy = sum(y*y) Sxy = sum(x*y) var_y = (Syy-((Sy*Sy)/N))/(N-1) var_x = (Sxx-((Sx*Sx)/N))/(N-1) cov = (Sxy-((Sy*Sx)/N))/(N-1) mean_y = Sy/N mean_x = Sx/N D = sqrt((var_y + var_x)*(var_y + var_x) - 4*(var_y*var_x - (cov*cov))) eigen_1 = (var_y + var_x + D)/2 slope = cov/(eigen_1 - var_y) intercept = mean_y - (mean_x * slope) return (slope, intercept) def z_test(a, popmean=0, popstdev=1, tails=None): """Returns z and probability score for a single sample of items. Calculates the z-score on ONE sample of items with mean x, given a population mean and standard deviation (parametric). Usage: z, prob = z_test(a, popmean, popstdev, tails) z is a float; prob is a probability. a is a sample with Mean and Count. popmean should be the parametric population mean; 0 by default. popstdev should be the parametric population standard deviation, 1 by default. tails should be None (default), 'high', or 'low'. 
""" try: z = (mean(a) - popmean)/popstdev*sqrt(len(a)) return z, z_tailed_prob(z, tails) except (ValueError, TypeError, ZeroDivisionError, AttributeError, \ FloatingPointError): return None def z_tailed_prob(z, tails): """Returns appropriate p-value for given z, depending on tails.""" if tails == 'high': return z_high(z) elif tails == 'low': return z_low(z) else: return zprob(z) def t_tailed_prob(t, df, tails): """Return appropriate p-value for given t and df, depending on tails.""" if tails == 'high': return t_high(t, df) elif tails == 'low': return t_low(t, df) else: return tprob(t,df) def reverse_tails(tails): """Swaps high for low or vice versa, leaving other values alone.""" if tails == 'high': return 'low' elif tails == 'low': return 'high' else: return tails def tail(prob, test): """If test is true, returns prob/2. Otherwise returns 1-(prob/2). """ prob /= 2 if test: return prob else: return 1 - prob def combinations(n, k): """Returns the number of ways of choosing k items from n. """ return exp(lgam(n+1) - lgam(k+1) - lgam(n-k+1)) def multiple_comparisons(p, n): """Corrects P-value for n multiple comparisons. Calculates directly if p is large and n is small; resorts to logs otherwise to avoid rounding (1-p) to 1 """ if p > 1e-6: #if p is large and n small, calculate directly return 1 - (1-p)**n else: return one_minus_exp(-n * p) def multiple_inverse(p_final, n): """Returns p_initial for desired p_final with n multiple comparisons. WARNING: multiple_inverse is not very reliable when p_final is very close to 1 (say, within 1e-4) since we then take the ratio of two very similar numbers. """ return one_minus_exp(log_one_minus(p_final)/n) def multiple_n(p_initial, p_final): """Returns number of comparisons such that p_initial maps to p_final. WARNING: not very accurate when p_final is very close to 1. """ return log_one_minus(p_final)/log_one_minus(p_initial) def fisher(probs): """Uses Fisher's method to combine multiple tests of a hypothesis. 
    -2 * SUM(ln(P)) gives a chi-squared distribution with 2n degrees of
    freedom.
    """
    try:
        return chi_high(-2 * sum(map(log, probs)), 2 * len(probs))
    except OverflowError, e:
        return 0.0

def f_value(a, b):
    """Returns the num df, the denom df, and the F value.

    a, b: lists of values; must have a Variance attribute (recommended to
    make them Numbers objects).

    The F value is always calculated by dividing the variance of a by the
    variance of b, because R uses the same approach. In f_two_sample it is
    decided what p-value is returned, based on the relative sizes of the
    variances.
    """
    if not any(a) or not any(b) or len(a) <= 1 or len(b) <= 1:
        raise ValueError, "Vectors should contain more than 1 element"
    F = var(a)/var(b)
    dfn = len(a)-1
    dfd = len(b)-1
    return dfn, dfd, F

def f_two_sample(a, b, tails=None):
    """Returns the dfn, dfd, F-value and probability for two samples a and b.

    a and b should be independent samples of scores: lists of observations
    (numbers). tails should be None (default, two-sided test), 'high', or
    'low'.

    This implementation returns the same results as the F test in R.
    """
    dfn, dfd, F = f_value(a, b)
    if tails == 'low':
        return dfn, dfd, F, f_low(dfn, dfd, F)
    elif tails == 'high':
        return dfn, dfd, F, f_high(dfn, dfd, F)
    else:
        if var(a) >= var(b):
            side = 'right'
        else:
            side = 'left'
        return dfn, dfd, F, fprob(dfn, dfd, F, side=side)

def ANOVA_one_way(a):
    """Performs a one-way analysis of variance.

    a is a list of lists of observed values. Each list is the values within
    a category. The analysis must include 2 or more categories (lists). The
    lists must have Mean and Variance attributes (recommended to make them
    Numbers objects).

    An F value is first calculated as the variance of the group means
    divided by the mean of the within-group variances.
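    The F computation can be sketched with the stdlib only (one_way_F is an
    illustrative name, not part of this module):

```python
def one_way_F(groups):
    # F = between-group mean square / within-group mean square
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2
                  for g in groups) / (k - 1)
    within = sum((x - sum(g) / len(g)) ** 2
                 for g in groups for x in g) / (n - k)
    return between / within

F = one_way_F([[1, 2, 3], [5, 6, 7]])   # between MS 24, within MS 1, F == 24
```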
""" group_means = [] group_variances = [] num_cases = 0 all_vals = [] for i in a: num_cases += len(i) group_means.append(i.Mean) group_variances.append(i.Variance * (len(i)-1)) all_vals.extend(i) group_means = Numbers(group_means) #get within group variances (denominator) group_variances = Numbers(group_variances) dfd = num_cases - len(group_means) within_MS = sum(group_variances)/dfd #get between group variances (numerator) grand_mean = Numbers(all_vals).Mean between_MS = 0 for i in a: diff = i.Mean - grand_mean diff_sq = diff * diff x = diff_sq * len(i) between_MS += x dfn = len(group_means) - 1 between_MS = between_MS/dfn F = between_MS/within_MS return dfn, dfd, F, between_MS, within_MS, group_means, f_high(dfn, dfd, F) def MonteCarloP(value, rand_values, tail = 'high'): """takes a true value and a list of random values as input and returns a p-value tail indicates which side of the distribution to look at: low = look for smaller values than expected by chance high = look for larger values than expected by chance """ pop_size= len(rand_values) rand_values.sort() if tail == 'high': num_better = pop_size for i, curr_val in enumerate(rand_values): if value <= curr_val: num_better = i break p_val = 1-(num_better / pop_size) elif tail == 'low': num_better = pop_size for i, curr_val in enumerate(rand_values): if value < curr_val: num_better = i break p_val = num_better / pop_size return p_val def sign_test(success, trials, alt="two sided"): """Returns the probability for the sign test. Arguments: - success: the number of successes - trials: the number of trials - alt: the alternate hypothesis, one of 'less', 'greater', 'two sided' (default). 
""" lo = ["less", "lo", "lower", "l"] hi = ["greater", "hi", "high", "h", "g"] two = ["two sided", "2", 2, "two tailed", "two"] alt = alt.lower().strip() if alt in lo: p = binomial_low(success, trials, 0.5) elif alt in hi: success -= 1 p = binomial_high(success, trials, 0.5) elif alt in two: success = min(success, trials-success) hi = 1 - binomial_high(success, trials, 0.5) lo = binomial_low(success, trials, 0.5) p = hi+lo else: raise RuntimeError("alternate [%s] not in %s" % (lo+hi+two)) return p def ks_test(x, y=None, alt="two sided", exact = None, warn_for_ties = True): """Returns the statistic and probability from the Kolmogorov-Smirnov test. Arguments: - x, y: vectors of numbers whose distributions are to be compared. - alt: the alternative hypothesis, default is 2-sided. - exact: whether to compute the exact probability - warn_for_ties: warns when values are tied. This should left at True unless a monte carlo variant, like ks_boot, is being used. Note the 1-sample cases are not implemented, although their cdf's are implemented in ks.py""" # translation from R 2.4 num_x = len(x) num_y = None x = zip(x, zeros(len(x), int)) lo = ["less", "lo", "lower", "l", "lt"] hi = ["greater", "hi", "high", "h", "g", "gt"] two = ["two sided", "2", 2, "two tailed", "two", "two.sided"] Pval = None if y is not None: # in anticipation of actually implementing the 1-sample cases num_y = len(y) y = zip(y, ones(len(y), int)) n = num_x * num_y / (num_x + num_y) combined = x + y if len(set(combined)) < num_x + num_y: ties = True else: ties = False combined = array(combined, dtype=[('stat', float), ('sample', int)]) combined.sort(order='stat') cumsum = zeros(combined.shape[0], float) scales = array([1/num_x, -1/num_y]) indices = combined['sample'] cumsum = scales.take(indices) cumsum = cumsum.cumsum() if exact == None: exact = num_x * num_y < 1e4 if alt in two: stat = max(fabs(cumsum)) elif alt in lo: stat = -cumsum.min() elif alt in hi: stat = cumsum.max() else: raise RuntimeError, 
"Unknown alt: %s" % alt if exact and alt in two and not ties: Pval = 1 - psmirnov2x(stat, num_x, num_y) else: raise NotImplementedError if Pval == None: if alt in two: Pval = 1 - pkstwo(sqrt(n) * stat) else: Pval = exp(-2 * n * stat**2) if ties and warn_for_ties: warnings.warn("Cannot compute correct KS probability with ties") try: # if numpy arrays were input, the Pval can be an array of len==1 Pval = Pval[0] except (TypeError, IndexError): pass return stat, Pval def _get_bootstrap_sample(x, y, num_reps): """yields num_reps random samples drawn with replacement from x and y""" combined = array(list(x) + list(y)) total_obs = len(combined) num_x = len(x) for i in range(num_reps): # sampling with replacement indices = randint(0, total_obs, total_obs) sampled = combined.take(indices) # split into the two populations sampled_x = sampled[:num_x] sampled_y = sampled[num_x:] yield sampled_x, sampled_y def ks_boot(x, y, alt = "two sided", num_reps=1000): """Monte Carlo (bootstrap) variant of the Kolmogorov-Smirnov test. Useful for when there are ties. 
Arguments: - x, y: vectors of numbers - alt: alternate hypothesis, as per ks_test - num_reps: number of replicates for the bootstrap""" # based on the ks_boot method in the R Matching package # see http://sekhon.berkeley.edu/matching/ # One important difference is I preserve the original sample sizes # instead of making them equal tol = MACHEP * 100 combined = array(list(x) + list(y)) observed_stat, _p = ks_test(x, y, exact=False, warn_for_ties=False) total_obs = len(combined) num_x = len(x) num_greater = 0 for sampled_x, sampled_y in _get_bootstrap_sample(x, y, num_reps): sample_stat, _p = ks_test(sampled_x, sampled_y, alt=alt, exact=False, warn_for_ties=False) if sample_stat >= (observed_stat - tol): num_greater += 1 return observed_stat, num_greater / num_reps def _average_rank(start_rank, end_rank): ave_rank = sum(range(start_rank, end_rank+1)) / (1+end_rank-start_rank) return ave_rank def mw_test(x, y): """computes the Mann-Whitney U statistic and the probability using the normal approximation""" if len(x) > len(y): x, y = y, x num_x = len(x) num_y = len(y) x = zip(x, zeros(len(x), int), zeros(len(x), int)) y = zip(y, ones(len(y), int), zeros(len(y), int)) combined = x+y combined = array(combined, dtype=[('stat', float), ('sample', int), ('rank', float)]) combined.sort(order='stat') prev = None start = None ties = False T = 0.0 for index in range(combined.shape[0]): value = combined['stat'][index] sample = combined['sample'][index] if value == prev and start is None: start = index continue if value != prev and start is not None: ties = True ave_rank = _average_rank(start, index) num_tied = index - start + 1 T += (num_tied**3 - num_tied) for i in range(start-1, index): combined['rank'][i] = ave_rank start = None combined['rank'][index] = index+1 prev = value if start is not None: ave_rank = _average_rank(start, index) num_tied = index - start + 2 T += (num_tied**3 - num_tied) for i in range(start-1, index+1): combined['rank'][i] = ave_rank total = 
combined.shape[0] x_ranks_sum = sum(combined['rank'][i] for i in range(total) if combined['sample'][i] == 0) prod = num_x * num_y U1 = prod + (num_x * (num_x+1) / 2) - x_ranks_sum U2 = prod - U1 U = max([U1, U2]) numerator = U - prod / 2 denominator = sqrt((prod / (total * (total-1)))*((total**3 - total - T)/12)) z = (numerator/denominator) p = zprob(z) return U, p def mw_boot(x, y, num_reps=1000): """Monte Carlo (bootstrap) variant of the Mann-Whitney test. Arguments: - x, y: vectors of numbers - num_reps: number of replicates for the bootstrap Uses the same Monte-Carlo resampling code as kw_boot """ tol = MACHEP * 100 combined = array(list(x) + list(y)) observed_stat, obs_p = mw_test(x, y) total_obs = len(combined) num_x = len(x) num_greater = 0 for sampled_x, sampled_y in _get_bootstrap_sample(x, y, num_reps): sample_stat, sample_p = mw_test(sampled_x, sampled_y) if sample_stat >= (observed_stat - tol): num_greater += 1 return observed_stat, num_greater / num_reps def permute_2d(m, p): """Performs 2D permutation of matrix m according to p.""" return m[p][:, p] #unused below m_t = transpose(m) r_t = take(m_t, p, axis=0) return take(transpose(r_t), p, axis=0) def mantel(m1, m2, n): """Compares two distance matrices. Reports P-value for correlation. The p-value is based on a two-sided test. WARNING: The two distance matrices must be symmetric, hollow distance matrices, as only the lower triangle (excluding the diagonal) will be used in the calculations (matching R's vegan::mantel function). This function is retained for backwards-compatibility. Please use mantel_test() for more control over how the test is performed. """ return mantel_test(m1, m2, n)[0] def mantel_test(m1, m2, n, alt="two sided", suppress_symmetry_and_hollowness_check=False): """Runs a Mantel test on two distance matrices. Returns the p-value, Mantel correlation statistic, and a list of Mantel correlation statistics for each permutation test. 
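    The permutation p-value convention shared by this function and
    correlation_test can be sketched with the stdlib only (permutation_p is
    an illustrative name, not part of this module):

```python
def permutation_p(observed, permuted, tails=None):
    # The +1 in numerator and denominator counts the observed labelling
    # itself, so a permutation p-value is never exactly zero.
    if tails is None:
        better = sum(1 for s in permuted if abs(s) >= abs(observed))
    elif tails == 'high':
        better = sum(1 for s in permuted if s >= observed)
    elif tails == 'low':
        better = sum(1 for s in permuted if s <= observed)
    else:
        raise ValueError("tails must be None, 'high', or 'low'")
    return (better + 1) / (len(permuted) + 1)

p = permutation_p(0.9, [0.1, -0.95, 0.3, 0.2])   # (1 + 1) / (4 + 1) == 0.4
```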
WARNING: The two distance matrices must be symmetric, hollow distance matrices, as only the lower triangle (excluding the diagonal) will be used in the calculations (matching R's vegan::mantel function). Arguments: m1 - the first distance matrix to use in the test (should be a numpy array or convertible to a numpy array) m2 - the second distance matrix to use in the test (should be a numpy array or convertible to a numpy array) n - the number of permutations to test when calculating the p-value alt - the type of alternative hypothesis to test (can be either 'two sided' for a two-sided test, 'greater' or 'less' for one-sided tests) suppress_symmetry_and_hollowness_check - by default, the input distance matrices will be checked for symmetry and hollowness. It is recommended to leave this check in place for safety, as the check is fairly fast. However, if you *know* you have symmetric and hollow distance matrices, you can disable this check for small performance gains on extremely large distance matrices """ # Perform some sanity checks on our input. if alt not in ("two sided", "greater", "less"): raise ValueError("Unrecognized alternative hypothesis. Must be either " "'two sided', 'greater', or 'less'.") m1, m2 = asarray(m1), asarray(m2) if m1.shape != m2.shape: raise ValueError("Both distance matrices must be the same size.") if n < 1: raise ValueError("The number of permutations must be greater than or " "equal to one.") if not suppress_symmetry_and_hollowness_check: if not (is_symmetric_and_hollow(m1) and is_symmetric_and_hollow(m2)): raise ValueError("Both distance matrices must be symmetric and " "hollow.") # Get a flattened list of lower-triangular matrix elements (excluding the # diagonal) in column-major order. Use these values to calculate the # correlation statistic. m1_flat, m2_flat = _flatten_lower_triangle(m1), _flatten_lower_triangle(m2) orig_stat = pearson(m1_flat, m2_flat) # Run our permutation tests so we can calculate a p-value for the test. 
size = len(m1) better = 0 perm_stats = [] for i in range(n): perm = permute_2d(m1, permutation(size)) perm_flat = _flatten_lower_triangle(perm) r = pearson(perm_flat, m2_flat) if alt == 'two sided': if abs(r) >= abs(orig_stat): better += 1 else: if ((alt == 'greater' and r >= orig_stat) or (alt == 'less' and r <= orig_stat)): better += 1 perm_stats.append(r) return (better + 1) / (n + 1), orig_stat, perm_stats def is_symmetric_and_hollow(matrix): return (matrix.T == matrix).all() and (trace(matrix) == 0) def _flatten_lower_triangle(matrix): """Returns a list containing the flattened lower triangle of the matrix. The returned list will contain the elements in column-major order. The diagonal will be excluded. Arguments: matrix - numpy array containing the matrix data """ matrix = asarray(matrix) flattened = [] for col_num in range(matrix.shape[1]): for row_num in range(matrix.shape[0]): if col_num < row_num: flattened.append(matrix[row_num][col_num]) return flattened def kendall_correlation(x, y, alt="two sided", exact=None, warn=True): """returns the statistic (tau) and probability from Kendall's non-parametric test of association that tau==0. Uses the large sample approximation when len(x) >= 50 or when there are ties, otherwise it computes the probability exactly. Based on the algorithm implemented in R v2.5 Arguments: - alt: the alternate hypothesis (greater, less, two sided) - exact: when False, forces use of the large sample approximation (normal distribution). Not allowed for len(x) >= 50. 
- warn: whether to warn about tied values """ assert len(x) == len(y), "data (x, y) not of same length" assert len(x) > 2, "not enough observations" # possible alternate hypotheses arguments lo = ["less", "lo", "lower", "l", "lt"] hi = ["greater", "hi", "high", "h", "g", "gt"] two = ["two sided", "2", 2, "two tailed", "two", "two.sided", "ts"] ties = False num = len(x) ties = len(set(x)) != num or len(set(y)) != num if ties and warn: warnings.warn("Tied values, using normal approximation") if not ties and num < 50: exact = True if num < 50 and not ties and exact: combs = int(num * (num-1) / 2) working = [] for i in range(combs): row = [-1 for j in range(combs)] working.append(row) tau = kendalls_tau(x, y, False) q = round((tau+1)*num*(num-1) / 4) if alt in two: if q > num * (num - 1) / 4: p = 1 - pkendall(q-1, num, Gamma(num+1), working) else: p = pkendall(q, num, Gamma(num+1), working) p = min(2*p, 1) elif alt in hi: p = 1 - pkendall(q-1, num, Gamma(num+1), working) elif alt in lo: p = pkendall(q, num, Gamma(num+1), working) else: tau, p = kendalls_tau(x, y, True) if alt in hi: p /= 2 elif alt in lo: p = 1 - p/2 return tau, p ## Start functions for distance_matrix_permutation_test def distance_matrix_permutation_test(matrix, cells, cells2=None,\ f=t_two_sample, tails=None, n=1000, return_scores=False,\ is_symmetric=True): """performs a monte carlo permutation test to determine if the values denoted in cells are significantly different than the rest of the values in the matrix matrix: a numpy array cells: a list of indices of special cells to compare to the rest of the matrix cells2: an optional list of indices to compare cells to. If set to None (default), compares cells to the rest of the matrix f: the statistical test used. Should take a "tails" parameter as input tails: can be None(default), 'high', or 'low'. Input into f. n: the number of replicates in the Monte Carlo simulations is_symmetric: corrects if the matrix is symmetric. 
    Need to only look at one half otherwise the degrees of freedom value
    will be incorrect.
    """
    # if matrix is symmetric, convert all indices to lower triangular
    if is_symmetric:
        cells = get_ltm_cells(cells)
        if cells2:
            cells2 = get_ltm_cells(cells2)
    # pull out the special values
    special_values, other_values = \
        get_values_from_matrix(matrix, cells, cells2, is_symmetric)
    # calc the stat and parametric p-value for real data
    stat, p = f(special_values, other_values, tails)
    # calc for randomized matrices
    count_more_extreme = 0
    stats = []
    indices = range(len(matrix))
    for k in range(n):
        # shuffle the order of indices, and use those to permute the matrix
        permuted_matrix = permute_2d(matrix, permutation(indices))
        special_values, other_values = \
            get_values_from_matrix(permuted_matrix, cells,
                                   cells2, is_symmetric)
        # calc the stat and p for a random subset (we don't do anything
        # with these p-values, we only use the current_stat value)
        current_stat, current_p = f(special_values, other_values, tails)
        stats.append(current_stat)
        if tails == None:
            if abs(current_stat) > abs(stat):
                count_more_extreme += 1
        elif tails == 'low':
            if current_stat < stat:
                count_more_extreme += 1
        elif tails == 'high':
            if current_stat > stat:
                count_more_extreme += 1
    # pack up the parametric stat, parametric p, and empirical p; calc the
    # latter in the process
    result = [stat, p, count_more_extreme/n]
    # append the scores of the n tests if requested
    if return_scores:
        result.append(stats)
    return tuple(result)

def get_values_from_matrix(matrix, cells, cells2=None, is_symmetric=True):
    """get values from matrix positions in cells and cells2

    matrix: the numpy array from which values should be taken
    cells: indices of first set of requested values
    cells2: indices of second set of requested values or None if they
        should be randomly selected
    is_symmetric: True if matrix is symmetric
    """
    # pull cells values
    cells_values = [matrix[i] for i in cells]
    # pull cells2 values
    if cells2:
        cells2_values = [matrix[i] for i in cells2]
    # or generate the indices and grab them if they weren't passed in
    else:
        cells2_values = []
        for i, val_i in enumerate(matrix):
            for j, val in enumerate(val_i):
                if is_symmetric:
                    if (i, j) not in cells and i > j:
                        cells2_values.append(val)
                else:
                    if (i, j) not in cells:
                        cells2_values.append(val)
    return cells_values, cells2_values

def get_ltm_cells(cells):
    """converts matrix indices so all are below the diagonal

    cells: list of indices into a 2D integer-indexable object (typically a
    list of lists or array of arrays)
    """
    new_cells = []
    for cell in cells:
        if cell[0] < cell[1]:
            new_cells.append((cell[1], cell[0]))
        elif cell[0] > cell[1]:
            new_cells.append(cell)
    # remove duplicates
    new_cells = set(new_cells)
    return list(new_cells)

## End functions for distance_matrix_permutation_test

PyCogent-1.5.3/cogent/maths/stats/util.py
#!/usr/bin/env python
#file cogent/maths/stats/util.py
"""Provides classes and utility methods for statistics.

Classes: Numbers, Freqs, NumberFreqs, SummaryStatistics.

Owner: Rob Knight rob@spot.colorado.edu

Status: Stable

Notes

SHOULD THIS FILE BE DELETED? Numbers is clearly superfluous since we're now
using numpy. Should some of the Freqs classes be kept? Should UnsafeFreqs
become Freqs, and the full MappedDict version of Freqs be deleted?
-- RK 8/2/05

The distinctions among the classes can be somewhat subtle, so here's a quick
guide to usage.

The core classes are Numbers, Freqs, NumberFreqs, and SummaryStatistics. For
each of the first three, there is also an Unsafe version (e.g. UnsafeFreqs),
which has the same interface but avoids overriding built-in methods for
performance reasons. The performance differences can be very large (the
unsafe version can be an order of magnitude faster). However, the unsafe
versions do not validate their input, so methods will fail if the data are
invalid.
SummaryStatistics holds a Count, Sum, Variance, Mean, StandardDeviation, and SumSquares (all of these are optional). If initialized with only some of these statistics, it will calculate the others when needed when it can (e.g. it can calculate the Mean given the Sum and the Count). SummaryStatistics is useful for holding information about a large data set when you need to throw away the original data to free memory. The statistics functions all work on properties, and most functions that work on Numbers or Freqs will also work fine on the equivalent SummaryStatistics object. Numbers holds a list of values that are all numbers (i.e. can be converted to float -- it does not try to do anything with complex values). Numbers supports the full list interface, and also supports methods like items (returns key, value pairs), toFixedWidth (for formatting), normalize, accumulate, firstIndexLessThan (and its relatives), randomSequence, round, and so forth. UnsafeNumbers behaves like Numbers, except that it does not complain if you initialize it with non-numeric values or insert such values later (with insert, __setitem__, extend, and so forth). Also, UnsafeNumbers does not automatically map strings to numbers during __init__. Freqs holds a dict of key, value pairs where the keys are assumed to be separate categories. Statistics functions act on the categories: in other words, Count returns the number of categories, Sum returns the number of observations in all categories, Mean returns the mean number of observations per category, and so on. Freqs (and its subclasses) notably supports addition of counts from a sequence, a dict, a list of dicts, a sequence of key, value pairs, and a sequence of sequences: in all cases, the counts are added (unlike dict.update(), which overwrites the value with the last one read from that key). Use +, +=, -, and -= to accumulate counts. Freqs supports all the stats functions that Numbers does. 
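The count-summing behaviour of Freqs described above can be sketched with a
plain dict (accumulate is an illustrative helper, not part of this module):

```python
def accumulate(pairs):
    # Freqs-style addition: counts for repeated keys are summed,
    # unlike dict(), which keeps only the last value seen per key.
    out = {}
    for key, count in pairs:
        out[key] = out.get(key, 0) + count
    return out

summed = accumulate([('a', 2), ('b', 3), ('a', 1)])   # {'a': 3, 'b': 3}
clobbered = dict([('a', 2), ('b', 3), ('a', 1)])      # {'a': 1, 'b': 3}
```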
Notable features of Freqs include randomSequence (generates a random sequence of keys according to their frequencies), rekey (maps the freqs onto new keys, given a conversion dictionary), normalize, scale, round, and getSortedList (sorts items by key or value, ascending or descending). Freqs refuses to hold anything except non-negative floats (performs conversion on addition), and checks this constraint on all methods that mutate the object. UnsafeFreqs is like Freqs except that it is non-validating. One important consequence is that __init__ behaves differently: UnsafeFreqs inherits the raw dict __init__. This means that Freqs([('a',2),('b',3),('a',1)]) gives you an object that compares equal to Freqs({'a':3,'b':3}) because it sums the repeated keys, but UnsafeFreqs([('a',2),('b',3),('a',1)]) gives you an object that compares equal to Freqs({'a':1,'b':3}) because the last key encountered overwrites the previous value for that key. Additionally, the UnsafeFreqs constructor can't accept all the things the Freqs constructor can -- the easiest workaround is to create an empty UnsafeFreqs and use += to fill it with data, although if you know the form of the data in advance it's much faster to use the appropriate method (e.g. fromTuples, fromDict) than to let += guess what you gave it. UnsafeFreqs does not check that operations that mutate the dictionary preserve validity. NumberFreqs holds a dict of key, value pairs where the keys are assumed to be numbers along an axis, and the values are assumed to be the counts at each point. Thus, Count returns the number of _observations_ (not categories), Sum returns the sum of key * value, Mean returns the mean value of the observations, and so forth. An example of how this works is as follows: Freqs([1,2,2,1,1,1,3,3]) gives you the dict {1:4,2:2,3:2}. These values are interpreted as coming from 3 categories, so the Count is 3. There are 8 observations, so the Sum is 8. The Mean across categories is 8/3.
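The Freqs-versus-NumberFreqs arithmetic just described can be reproduced with plain standard-library code (a sketch using `collections.Counter`, not the PyCogent classes):

```python
from collections import Counter

data = [1, 2, 2, 1, 1, 1, 3, 3]
counts = Counter(data)                      # {1: 4, 2: 2, 3: 2}

# Freqs-style view: keys are categories, values are per-category counts.
freqs_count = len(counts)                   # 3 categories
freqs_sum = sum(counts.values())            # 8 observations
freqs_mean = freqs_sum / freqs_count        # 8/3, mean per category

# NumberFreqs-style view: keys lie on a number axis, weighted by counts.
nf_count = sum(counts.values())                     # 8 observations
nf_sum = sum(k * v for k, v in counts.items())      # 1*4 + 2*2 + 3*2 = 14
nf_mean = nf_sum / nf_count                         # 14/8, mean observation
```

The same {1:4, 2:2, 3:2} dict thus yields a mean of 8/3 under the Freqs interpretation and 14/8 under the NumberFreqs interpretation.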
NumberFreqs([1,2,2,1,1,1,3,3]) gives you the dict {1:4,2:2,3:2}. These values are interpreted as 4 counts of 1, 2 counts of 2, and 2 counts of 3. All the values are treated as coming from the same category, so the Count is 8. The sum is calculated by weighting the value by the key, so is 1*4 + 2*2 + 3*2; thus, the Sum is 14. Consequently, the Mean of the values is 14/8. Thus, NumberFreqs is appropriate when you want to treat the distribution of key, value pairs like a histogram in the keys, and find the average along the key axis. Freqs is appropriate when you want to treat the distribution of key, value pairs like a bar graph where the keys are separate categories, and find the average along the value axis. This distinction between Freqs and NumberFreqs holds for all the stats functions, which rely on the Sum, Count, and other properties whose behavior differs between Freqs and NumberFreqs. UnsafeNumberFreqs behaves like NumberFreqs except that it doesn't validate on input or mutation of the dict. It's much faster, though. """ from __future__ import division from cogent.util.misc import FunctionWrapper, MappedList, MappedDict, \ ConstraintError from cogent.util.table import Table from numpy import array, sqrt, log2, e, floor, ceil from random import choice, random from operator import gt, ge, lt, le, add, sub __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Gavin Huttley", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class SummaryStatisticsError(ValueError): """Raised when not possible to calculate a requested summary statistic.""" pass class SummaryStatistics(object): """Minimal statistics interface.
Object is read-only once created.""" def __init__(self, Count=None, Sum=None, Mean=None, StandardDeviation=None, Variance=None, SumSquares=None, Median=None): """Returns a new SummaryStatistics object.""" self._count = Count self._sum = Sum self._mean = Mean self._standard_deviation = StandardDeviation self._variance = Variance self._sum_squares = SumSquares self._median = Median def __str__(self): """Returns string representation of SummaryStatistics object.""" result = [] for field in ["Count", "Sum", "Median", "Mean", "StandardDeviation", \ "Variance", "SumSquares"]: try: val = getattr(self, field) if not val: continue result.append([field, val]) except: pass if not result: return '' return str(Table("Statistic Value".split(), result, column_templates={'Value': "%.4g"})) def _get_count(self): """Returns Count if possible (tries to calculate as sum/mean).""" if self._count is None: try: self._count = self._sum/self._mean except (TypeError, ZeroDivisionError, FloatingPointError): raise SummaryStatisticsError, \ "Insufficient data to calculate count." return self._count Count = property(_get_count) def _get_sum(self): """Returns Sum if possible (tries to calculate as count*mean).""" if self._sum is None: try: self._sum = self._count * self._mean except TypeError: raise SummaryStatisticsError, \ "Insufficient data to calculate sum." return self._sum Sum = property(_get_sum) def _get_mean(self): """Returns Mean if possible (tries to calculate as sum/count).""" if self._mean is None: try: self._mean = self._sum / self._count except (TypeError, ZeroDivisionError, FloatingPointError): raise SummaryStatisticsError, \ "Insufficient data to calculate mean." 
return self._mean Mean = property(_get_mean) def _get_median(self): """Returns Median.""" return self._median Median = property(_get_median) def _get_standard_deviation(self): """Returns StandardDeviation if possible (calculates as sqrt(var)).""" if self._standard_deviation is None: try: self._standard_deviation = sqrt(abs(self._variance)) except TypeError: raise SummaryStatisticsError, \ "Insufficient data to calculate standard deviation." return self._standard_deviation StandardDeviation = property(_get_standard_deviation) def _get_variance(self): """Returns Variance if possible (calculates as stdev ** 2).""" if self._variance is None: try: self._variance = self._standard_deviation * \ self._standard_deviation except TypeError: raise SummaryStatisticsError, \ "Insufficient data to calculate variance." return self._variance Variance = property(_get_variance) def _get_sum_squares(self): """Returns SumSquares if possible.""" if self._sum_squares is None: raise SummaryStatisticsError, \ "Insufficient data to calculate sum of squares." return self._sum_squares SumSquares = property(_get_sum_squares) def __cmp__(self, other): """SummaryStatistics compares by count, then sum, then variance. Absent values compare as 0. """ result = 0 for attr in ['Count', 'Sum', 'Variance', 'SumSquares']: try: my_attr = getattr(self, attr) except SummaryStatisticsError: my_attr = 0 try: other_attr = getattr(other, attr) except SummaryStatisticsError: other_attr = 0 result = result or cmp(my_attr, other_attr) return result class NumbersI(object): """Interface for Numbers, a list that performs numeric operations.""" _is_sorted = False def isValid(self): """Checks that all items in self are numbers.""" for i in self: if not (isinstance(i, int) or isinstance(i, float)): return False return True def items(self): """Returns list of (item, 1) tuples for each item in self.
This is necessary because we want to delegate attribute accesses (specifically, method calls for stats functions) to a Freqs object. Freqs tests whether an object is dictionary-like by calling items() on it, expecting an error if it doesn't have the method. However, NumericList delegates items() back to a new Freqs, calling the constructor in a cycle... The workaround, since we don't want to explicitly specify which items get passed on to Freqs, is just to give Numbers its own items() method. """ return zip(self, [1] * len(self)) def toFixedWidth(self, fieldwidth=10): """Returns string with elements mapped to fixed field width. Always converts to scientific notation. Minimum fieldwidth is 7, since it's necessary to account for '-xe-yyy'. Result has (fieldwidth - 7) significant figures of precision, or 1 if fieldwidth is 7. """ if fieldwidth < 7: raise ValueError, "toFixedWidth requires fieldwidth of at least 7." if fieldwidth == 7: decimals = 0 else: decimals = fieldwidth-8 format = ''.join(["%+", str(fieldwidth), '.',str(decimals),'e']) return ''.join( [format % i for i in self]) def normalize(self, x=None): """Normalizes items in Numbers by dividing by x (sum by default).""" if not self: return #do nothing if empty if x is None: x = self.Sum if not x: #do nothing if items are empty return x = float(x) for index, item in enumerate(self): self[index] = item/x def accumulate(self): """Converts self to cumulative sum, in place""" if self: curr = self[0] for i in xrange(1, len(self)): curr += self[i] self[i] = curr def firstIndexGreaterThan(self, value, inclusive=False, stop_at_ends=False): """Returns first index of self that is greater than value. inclusive: whether to use i > value or i >= value for the test. stop_at_ends: whether to return None or the last index in self if none of the items in self are greater than the value.
""" if inclusive: operator = ge else: operator = gt for i, curr in enumerate(self): if operator(curr, value): return i #only get here if we didn't find anything greater if stop_at_ends: return i #default is to return None def firstIndexLessThan(self, value, inclusive=False, stop_at_ends=False): """Returns first index of self that is less than value. inclusive: whether to use i < value or i <= value for the test. stop_at_ends: whether to return None or the last index in self if none of the items in self are less than the value. """ if inclusive: operator = le else: operator = lt for i, curr in enumerate(self): if operator(curr, value): return i #only get here if we didn't find anything greater if stop_at_ends: return i #default is to return None def lastIndexGreaterThan(self, value, inclusive=False, stop_at_ends=False): """Returns last index of self that is greater than value. inclusive: whether to use i > value or i >= value for the test. stop_at_ends: whether to return None or 0 if none of the items in self is greater than the value. """ if inclusive: operator = ge else: operator = gt latest = None for i, curr in enumerate(self): if operator(curr, value): latest = i if stop_at_ends and (latest is None): return 0 else: return latest def lastIndexLessThan(self, value, inclusive=False, stop_at_ends=False): """Returns last index of self that is less than value. inclusive: whether to use i < value or i <= value for the test. stop_at_ends: whether to return None or 0 if none of the items in self is less than the value. 
""" if inclusive: operator = le else: operator = lt latest = None for i, curr in enumerate(self): if operator(curr, value): latest = i if stop_at_ends and (latest is None): return 0 else: return latest def _get_sum(self): """Returns sum of items in self.""" return sum(self) Sum = property(_get_sum) Count = property(list.__len__) def _get_sum_squares(self): """Returns sum of squares of items in self.""" return sum([i*i for i in self]) SumSquares = property(_get_sum_squares) def _get_variance(self): """Returns sample variance of items in self. Fault-tolerant: returns zero if one or no items. """ if not self: return None total = self.Sum count = self.Count if count <= 1: #no variance for a single item variance = 0.0 else: variance = (self.SumSquares-(total*total)/count)/(count-1) return variance Variance = property(_get_variance) def _get_standard_deviation(self): """Returns sample standard deviation of items in self.""" if not self: return None return sqrt(abs(self.Variance)) StandardDeviation = property(_get_standard_deviation) def _get_mean(self): """Returns mean of items in self.""" if not self: return None try: mean = self.Sum/self.Count except (ZeroDivisionError, FloatingPointError): mean = 0.0 return mean Mean = property(_get_mean) def _get_mode(self): """Returns the most frequent item. If a tie, picks one at random. Usage: most_frequent = self.mode() """ best = None best_count = 0 for item, count in self.items(): if count > best_count: best_count = count best = item return best Mode = property(_get_mode) def quantile(self, quantile): """Returns the specified quantile. Uses method type 7 from R. 
Only sorts on first call, so subsequent modifications may result in incorrect estimates unless directly sorted prior to using.""" if not self._is_sorted: self.sort() self._is_sorted = True index = quantile * (len(self)-1) lo = int(floor(index)) hi = int(ceil(index)) diff = index - lo stat = (1-diff) * self[lo] + diff * self[hi] return stat def _get_median(self): """Returns the median""" return self.quantile(0.5) Median = property(_get_median) def summarize(self): """Returns summary statistics for self.""" return SummaryStatistics(Count=self.Count, Sum=self.Sum, \ Variance=self.Variance, Median = self.Median) def choice(self): """Returns random element from self.""" return choice(self) def randomSequence(self, n): """Returns list of n random choices from self, with replacement.""" return [choice(self) for i in range(n)] def subset(self, items, keep=True): """Retains (or deletes) everything in self contained in items, in place. For efficiency, items should be a dict. """ if keep: self[:] = filter(items.__contains__, self) else: self[:] = filter(lambda x: x not in items, self) def copy(self): """Returns new copy of self, typecast to same class.""" return self.__class__(self[:]) def round(self, ndigits=0): """Rounds each item in self to ndigits.""" self[:] = [round(i, ndigits) for i in self] #following properties/methods are handled by conversion to Freqs. # Uncertainty, Mode def _get_uncertainty(self): """Returns Shannon entropy of items in self.""" return UnsafeFreqs().fromSeq(self).Uncertainty Uncertainty = property(_get_uncertainty) def _get_mode(self): """Returns most common element in self.""" return UnsafeFreqs().fromSeq(self).Mode Mode = property(_get_mode) class UnsafeNumbers(NumbersI, list): """Subclass of list that should only hold floating-point numbers. Usage: nl = Numbers(data) WARNING: UnsafeNumbers does not check that items added into it are really numbers, and performs almost no validation. 
""" pass class Numbers(NumbersI, MappedList): """Safe version of Numbers that validates on all list operations. For each item in data (which must be iterable), tests whether the item is a number and, if so, adds it to the Numbers. Note: this means we have to override _all_ the list methods that might potentially add new data to the list. This makes it much slower than UnsafeNumbers, but impossible for it to hold invalid data. """ Mask = FunctionWrapper(float) def __init__(self, data=None, Constraint=None, Mask=None): """Initializes a new Numbers object. Usage: nl = Numbers(data) For each item in data, tries to convert to a float. If successful, produces new Numbers with data. Note: this means that a single string of digits will be treated as a list of digits, _not_ as a single number. This might not be what you expected. Also, data must be iterable (so a 1-element list containing a number is OK, but a single number by itself is not OK). """ if data is not None: data = map(float, data) #fails if any items are not floatable else: data = [] MappedList.__init__(self, data, Constraint, Mask) class FreqsI(object): """Interface for frequency distribution, i.e. a set of value -> count pairs. """ RequiredKeys = {} def fromTuples(self, other, op=add, uses_key=False): """Adds counts to self inplace from list of key, val tuples. op: operator to apply to old and new values (default is add, but sub and mul might also be useful. Use names from the operator module). uses_key: whether or not op expects the key as the first argument, i.e. it gets (key, old, new) rather than just (old, new). Returns modified version of self as result. 
""" for key, val in other: curr = self.get(key, 0) if uses_key: self[key] = op(key, curr, val) else: self[key] = op(curr, val) return self def newFromTuples(cls, other, op=add, uses_key=False): """Classmethod: returns new FreqsI object from tuples.""" result = cls() return result.fromTuples(other, op, uses_key) newFromTuples = classmethod(newFromTuples) def fromDict(self, other, op=add, uses_key=False): """Adds counts to self inplace from dict of {key:count}. op: operator to apply to old and new values (default is add, but sub and mul might also be useful. Use names from the operator module). Returns modified version of self as result. """ for key, val in other.items(): curr = self.get(key, 0) if uses_key: self[key] = op(key, curr, val) else: self[key] = op(curr, val) return self def newFromDict(cls, other, op=add, uses_key=False): """Classmethod: returns new FreqsI object from single dict.""" result = cls() return result.fromDict(other, op, uses_key) newFromDict = classmethod(newFromDict) def fromDicts(self, others, op=add, uses_key=False): """Adds counts to self inplace from list of dicts of {key:count}. op: operator to apply to old and new values (default is add, but sub and mul might also be useful. Use names from the operator module). Returns modified version of self as result. """ for i in others: self.fromDict(i, op, uses_key) return self def newFromDicts(cls, others, op=add, uses_key=False): """Classmethod: returns new FreqsI object from single dict.""" result = cls() return result.fromDicts(others, op, uses_key) newFromDicts = classmethod(newFromDicts) def fromSeq(self, seq, op=add, weight=1, uses_key=False): """Adds counts to self inplace from seq. Each item adds 'weight' counts. op: operator to apply to old and new values (default is add, but sub and mul might also be useful. Use names from the operator module). weight: increment to apply each time an item is found. Applies self[i] += op(curr, weight) for each item in seq. 
Returns modified version of self as result. """ if uses_key: for i in seq: curr = self.get(i, 0) self[i] = op(i, curr, weight) else: for i in seq: curr = self.get(i, 0) self[i] = op(curr, weight) return self def newFromSeq(cls, seq, op=add, weight=1, uses_key=False): """Classmethod: returns new FreqsI object from a sequence.""" result = cls() return result.fromSeq(seq, op, weight, uses_key) newFromSeq = classmethod(newFromSeq) def fromSeqs(self, seqs, op=add, weight=1, uses_key=False): """Adds counts to self inplace from seq of sequences. Uses weight. op: operator to apply to old and new values (default is add). weight: increment to apply each time an item is found. """ for s in seqs: self.fromSeq(s, op, weight, uses_key) return self def newFromSeqs(cls, seqs, op=add, weight=1, uses_key=False): """Classmethod: returns new FreqsI object from a sequence of sequences.""" result = cls() return result.fromSeqs(seqs, op, weight, uses_key) newFromSeqs = classmethod(newFromSeqs) def _find_conversion_function(self, data): """Figures out which conversion function to use for data.""" # if the data is empty, it's easy... if not data: return None # if it has items, treat as a dict (fromDict only uses items() from the # dict interface). if hasattr(data, 'items'): return self.fromDict # if it's a string, treat it as a sequence if isinstance(data, str): return self.fromSeq # Otherwise, we need to check what it is. Must be a sequence of some # kind: elements could be dicts, tuples, or second-level sequences. # We know it's not empty, so we can get the first element first = data[0] #if the first item is a dict, assume they all are if isinstance(first, dict): return self.fromDicts # otherwise, if all items have two elements and the second is a number, # assume they're key-value pairs try: for key, value in data: v = float(value) # if it did work, data can be treated as a sequence of key-value # pairs. Note that if you _really_ have e.g.
a sequence of pairs # of numbers that you want to add individually, you need to use # fromSeqs explicitly. return self.fromTuples except (TypeError, ValueError): # if that didn't work, data is either a sequence or a sequence of # sequences. # if first is iterable and not a string, treat as seq of seqs; # otherwise, treat as seq. # Note that this means that lists of strings will always be treated # as though each string is a key that's being counted (e.g. if you # pass in a list of words, you'll get word frequencies rather than # character frequencies). If you want the character frequencies, # call fromSeqs explicitly -- there's no way to detect what's # desired automatically. if isinstance(first, str): return self.fromSeq else: # if first item is iterable, treat as seq of seqs. otherwise, # treat as single seq of items. try: i = iter(first) return self.fromSeqs except TypeError: return self.fromSeq # should never get here because of return values raise NotImplementedError, "Fell off end of _find_conversion_function" def isValid(self): """Checks presence of required keys, and that all vals are numbers.""" for k in self.RequiredKeys: if k not in self: return False for v in self.values(): if not (isinstance(v, float) or isinstance(v, int)): return False if v < 0: return False return True def __iadd__(self, other): """Adds items from other to self.""" f = self._find_conversion_function(other) if f: #do nothing if we got None, since it means other was empty f(other, op=add) return self def __add__(self, other): """Returns new Freqs object with counts from self and other.""" result = self.copy() result += other return result def __isub__(self, other): """Subtracts items in other from self.""" f = self._find_conversion_function(other) if f: #do nothing if we got None, since it means other was empty f(other, op=sub) return self def __sub__(self, other): """Returns new Freqs containing difference between self and other.""" result = self.copy() result -= other return result 
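The accumulation semantics behind __iadd__ and __isub__ above can be sketched with plain dicts (the `merge` helper is hypothetical, not part of PyCogent): counts for repeated keys are combined with an operator instead of being overwritten the way dict.update() would overwrite them.

```python
from operator import add, sub

def merge(freqs, pairs, op=add):
    """Merge (key, count) pairs into freqs using op, summing repeated keys
    rather than overwriting them."""
    for key, val in pairs:
        freqs[key] = op(freqs.get(key, 0), val)
    return freqs


f = {}
merge(f, [('a', 2), ('b', 3), ('a', 1)])  # repeated 'a' keys are summed
merge(f, [('b', 1)], op=sub)              # op=sub behaves like -=
print(f)  # {'a': 3, 'b': 2}
```

Compare dict.update(), which applied to the same pairs would leave 'a' at 1, the last value read for that key.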
def __str__(self): """Prints the items of self out as tab-delimited text. Value, Frequency pairs are printed one pair to a line. Headers are 'Value' and 'Count'. """ if self: lines = ["Value\tCount"] items = self.items() #make and sort list of (key, value) pairs items.sort() for key, val in items: lines.append("\t".join([str(key), str(val)])) #add pair return "\n".join(lines) else: return "Empty frequency distribution" def __delitem__(self, key): """May not delete key if it is required. WARNING: Will not work if your class doesn't subclass dict as well as FreqsI. """ r = self.RequiredKeys if r and (key in r): raise KeyError, "May not delete required key %s" % key else: dict.__delitem__(self, key) def rekey(self, key_map, default=None, constructor=None): """Returns new Freqs with keys remapped using key_map. key_map should be a dict of {old_key:new_key}. Values are summed across all keys that map to the same new value. Keys that are not in the key_map are omitted (if default is None), or set to the default. constructor defaults to self.__class__. However, if you're doing something like mapping amino acid frequencies onto charge frequencies, you probably want to specify the constructor since the result won't be valid on the alphabet of the current class. Note that the resulting Freqs object is not required to contain values for all the possible keys. """ if constructor is None: constructor = self.__class__ result = constructor() for key, val in self.items(): new_key = key_map.get(key, default) curr = result.get(new_key, 0) result[new_key] = curr + val return result def purge(self): """If self.RequiredKeys is nonzero, deletes everything not in it.""" req = self.RequiredKeys if req: for key in self.keys(): if key not in req: del self[key] def normalize(self, total=None, purge=True): """Converts all the counts into probabilities, i.e. normalized to 1. Ensures that all the required keys are present, if self.RequiredKeys is set. Does the transformation in place. 
If purge is True (the default), purges any keys that are not in self.RequiredKeys before normalizing. Can also pass in a number to divide by instead of using the total, which is useful for getting the average across n data sets. Usage: self.normalize() """ if purge: self.purge() req = self.RequiredKeys if req: for r in req: if r not in self: self[r] = 0 if total is None: total = self.Sum if total != 0: #avoid divide by zero for item, freq in self.items(): f = float(freq) if f < 0: raise ValueError, \ "Freqs.normalize(): found negative count!" self[item] = f/total def choice(self, prob): """If self is normalized, returns item corresponding to Pr(prob).""" sum = 0 items = self.items() for item, freq in items: sum += freq if prob <= sum: return item return items[-1][0] #return the last value if we run off the end def randomSequence(self, n): """Returns list of n random choices, with replacement. Will raise IndexError if there are no items in self. """ num_items = self.Sum return [self.choice(random()*num_items) for i in xrange(n)] def subset(self, items, keep=True): """Deletes keys for all but items from self, in place.""" if keep: to_delete = [] for i in self: try: #fails if i is not a string but items is delete = i not in items except TypeError: #i was wrong type, so can't be in items... to_delete.append(i) else: if delete: to_delete.append(i) else: to_delete = items for i in to_delete: try: del self[i] except KeyError: pass #don't care if it wasn't in the dictionary def scale(self, factor=1, offset=0): """Linear transform of values in freqs where val = factor*val + offset. Usage: f.scale(factor, offset) Does the transformation in place. """ for k,v in self.items(): self[k] = v * factor + offset def round(self, ndigits=0): """Rounds frequencies in Freqs to ndigits (default:0, i.e. integers).
Usage: f.round() Does the transformation in place """ for k,v in self.items(): self[k] = round(v, ndigits) def expand(self, order=None, convert_to=None, scale=None): """Expands the Freqs into a sequence of symbols. Usage: f.expand(self, order=None, convert_to=list) order should be a sequence of symbols. Symbols that are not in f will be silently ignored (so it's safe to use on e.g. codon usage tables where some of the codons might not appear). Each item should appear in the list as many times as its frequency. convert_to should be a callable that takes an arbitrary sequence and converts it into the desired output format. Default is list, but ''.join is also popular. scale should be the number you want your frequencies multiplied with. Scaling only makes sense if your original freqs are fractions (and otherwise it won't work anyway). The scaling and rounding are done on a copy of the original, so the original data is not changed. Calls round() on each frequency, so if your values are normalized you'll need to renormalize them (e.g. self.normalize(); self.normalize(1.0/100) to get percentages) or the counts will be zero for anything that's less frequent than 0.5. You _can_ use this to check whether any one symbol constitutes a majority, but it's probably more efficient to use mode()... """ if scale: if sum([round(scale*v) for v in self.values()]) != scale: raise ValueError,\ "Can't round to the desired number (%d)"%(scale) else: used_freq = self.copy() used_freq.scale(factor=scale) used_freq.round() else: used_freq = self if order is None: order = used_freq.keys() result = [] for key in order: result.extend([key] * int(round(used_freq.get(key, 0)))) if convert_to: return convert_to(result) else: return result def _get_count(self): """Calculates number of categories in the frequency distribution. Useful for other stats functions. Assumes that keys are categories. 
Usage: count = self.count() Note that for NumberFreqs, Count will instead return the total number of observations. """ return len(self) Count = property(_get_count) def _get_sum(self): """Returns sum of items in self.""" return sum(self.values()) Sum = property(_get_sum) def _get_sum_squares(self): """Returns sum of squares of items in self.""" return sum([i*i for i in self.values()]) SumSquares = property(_get_sum_squares) def _get_variance(self): """Returns sample variance of counts in categories in self. Fault-tolerant: returns 0 if 0 or 1 items. """ if not self: return None total = self.Sum count = self.Count if count <= 1: #no variance for a single item variance = 0.0 else: variance = (self.SumSquares-(total*total)/count)/(count-1) return variance Variance = property(_get_variance) def _get_standard_deviation(self): """Returns sample standard deviation of items in self.""" if not self: return None return sqrt(abs(self.Variance)) StandardDeviation = property(_get_standard_deviation) def _get_mean(self): """Returns mean of items in self.""" if not self: return None try: mean = self.Sum/self.Count except (ZeroDivisionError, FloatingPointError): mean = 0.0 return mean Mean = property(_get_mean) def _get_uncertainty(self): """Returns the uncertainty of the Freqs. Calculates the Shannon uncertainty, defined as the sum of weighted log probabilities (multiplied by -1). Usage: H = self.uncertainty() """ normalized = self.copy() normalized.normalize() total = 0 for prob in normalized.values(): if prob: total -= prob * log2(prob) return total Uncertainty = property(_get_uncertainty) def _get_mode(self): """Returns the most frequent item. If a tie, picks one at random. 
Usage: most_frequent = self.mode() """ best = None best_count = 0 for item, count in self.items(): if count > best_count: best_count = count best = item return best Mode = property(_get_mode) def summarize(self): """Returns summary statistics for self.""" return SummaryStatistics(Count=self.Count, Sum=self.Sum, \ SumSquares=self.SumSquares, Variance=self.Variance) def getSortedList(self, descending=True, by_val=True): """Returns sorted list of tuples. descending: whether to sort highest to lowest (default True). by_val: whether to sort by val instead of key (default True). """ if by_val: items = [(v, (k,v)) for k, v in self.items()] else: items = self.items() items.sort() if descending: items.reverse() if by_val: return [i[1] for i in items] else: return items class UnsafeFreqs(FreqsI, dict): """Holds a frequency distribution, i.e. a set of category -> count pairs. Note: does not perform any validation. Use Freqs if data consistency is more important than speed. """ pass def copy(self): """Returns copy of data in self, preserving class.""" result = self.__class__() result.update(self) return result def freqwatcher(x): """Checks frequencies are correct type and >= 0.""" try: x = float(x) except: raise ConstraintError, "Could not convert frequency %s to float." % x if x >= 0: return x else: raise ConstraintError, "Got frequency %s < 0." % x class Freqs(FreqsI, MappedDict): """Holds a frequency distribution, i.e. a set of category -> count pairs. Class data: ValueMask: function that transforms values before they are entered. RequiredKeys: keys that are automatically added with frequency 0 before frequencies are added. Performs (expensive) validation on many operations that change the dictionary. Use UnsafeFreqs if speed is more important than validation. """ ValueMask = FunctionWrapper(freqwatcher) def __init__(self, data=None, Constraint=None, Mask=None, ValueMask=None): """Passes on to superclass, but adds required keys if absent.
Parameters (for polymorphism with MappedDict superclass): data: data to load into self Constraint: only items that Constraint __contains__ are allowed Mask: function applied to keys before lookup ValueMask: function applied to values before addition """ super(Freqs, self).__init__(Constraint=Constraint, Mask=Mask, \ ValueMask=ValueMask) self += data for key in self.RequiredKeys: if key not in self: self[key] = 0.0 class NumberFreqsI(FreqsI): """Interface for frequency distribution where values and counts are numbers. NOTE: In NumberFreqs (as opposed to Freqs), keys and values are assumed to be one axis. In other words, the data {1:5, 2:10} in Freqs is assumed to mean 5 items in category 1 and 10 items in category 2, for a mean of 7.5 items per category. In NumberFreqs, the same data would mean 5 counts of 1 and 10 counts of 2, for a mean of (5*1 + 10*2)/15 = 1.66. """ def isValid(self): """Returns True if all keys and values are numbers.""" for item in self.items(): for i in item: if not (isinstance(i, int) or isinstance(i, float)): return False return True def _get_count(self): """Calculates sum of frequencies in the frequency distribution (i.e. number of occurrences). Useful for other stats functions. Usage: count = self.count() """ return sum(self.values()) Count = property(_get_count) def _get_sum(self): """Returns sum of items in self.""" if not self: return None return sum([item*frequency for item,frequency in self.items()]) Sum = property(_get_sum) def _get_sum_squares(self): """Returns sum of squares of items in self.""" if not self: return None return sum([i*i*count for i, count in self.items()]) SumSquares = property(_get_sum_squares) def _get_uncertainty(self): """Returns the uncertainty of the NumberFreqs. Calculates the Shannon uncertainty, defined as the sum of weighted log probabilities (multiplied by -1). Handled by conversion to Freqs, since numbers treated as categories.
Usage: H = self.uncertainty() """ f = UnsafeFreqs() f += self return f.Uncertainty Uncertainty = property(_get_uncertainty) def quantile(self, quantile): """Returns the specified quantile. Uses method type 7 from R.""" def value_at_expanded_index(values, counts, index): cumsum = counts.cumsum() for i in range(cumsum.shape[0]): if cumsum[i] > index: return values[i] vals = sorted(self.keys()) counts = array([self[val] for val in vals]) index = quantile * (counts.sum()-1) lo = int(floor(index)) hi = int(ceil(index)) diff = index - lo lo_val = value_at_expanded_index(vals, counts, lo) if diff != 0: hi_val = value_at_expanded_index(vals, counts, hi) else: hi_val = 0 stat = (1-diff) * lo_val + diff * hi_val return stat def _get_median(self): """returns the median""" return self.quantile(0.5) Median = property(_get_median) class UnsafeNumberFreqs(NumberFreqsI, dict): """Class holding freqs where keys and values are assumed to be numbers. Changes calculation of mean, standard deviation, etc. by assuming that the keys have weight proportional to their values (i.e. if the key is 5 and the value is 3, it contributes 15 'units' rather than 3 to things like mean() and normalize()). Does not perform validation to check whether the keys and values are valid (will raise various exceptions depending on circumstances). """ RequiredKeys = None def copy(self): """Returns copy of data in self, preserving class.""" result = self.__class__() result.update(self) return result class NumberFreqs(NumberFreqsI, MappedDict): """Class holding freqs where both keys and values are numbers. Mean, variance etc. assume that the data are frequencies of other numbers rather than treating each key as a separate category. Changes calculation of mean, standard deviation, etc. by assuming that the keys have weight proportional to their values (i.e. if the key is 5 and the value is 3, it contributes 15 'units' rather than 3 to things like mean() and normalize()). 
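The "keys weighted by their values" convention contrasted in the docstrings above can be made concrete with plain dicts; these helper names are illustrative only, not the Freqs/NumberFreqs API:

```python
def category_mean(freqs):
    """Freqs-style mean: average count per category (keys are labels)."""
    return sum(freqs.values()) / float(len(freqs))

def weighted_mean(freqs):
    """NumberFreqs-style mean: keys are numbers, values are their counts."""
    total = float(sum(freqs.values()))
    return sum(k * v for k, v in freqs.items()) / total

data = {1: 5, 2: 10}
print(category_mean(data))  # 7.5
print(weighted_mean(data))  # (5*1 + 10*2)/15 = 1.666...
```

The same input dict yields 7.5 under the categorical reading and about 1.67 under the number-frequency reading, which is exactly the distinction the NumberFreqsI docstring draws.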
Performs (expensive) validation to ensure that keys are floats and values are non-negative floats. All keys and values are automatically converted to float. """ RequiredKeys = None Mask = FunctionWrapper(float) ValueMask = FunctionWrapper(freqwatcher) def __init__(self, data=None, Constraint=None, Mask=None, ValueMask=None): """Passes on to superclass, but adds required keys if absent. Parameters (for polymorphism with MappedDict superclass): data: data to load into self Constraint: only items that Constraint __contains__ are allowed Mask: function applied to keys before lookup ValueMask: function applied to values before addition """ super(NumberFreqs, self).__init__(Constraint=Constraint, Mask=Mask, \ ValueMask=ValueMask) self += data r = self.RequiredKeys if r: for key in r: if key not in self: self[key] = 0.0 PyCogent-1.5.3/cogent/maths/stats/cai/__init__.py000644 000765 000024 00000000536 12024702176 022643 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['adaptor', 'get_by_cai','util'] __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Stephanie Wilson", "Michael Eaton"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" PyCogent-1.5.3/cogent/maths/stats/cai/adaptor.py000644 000765 000024 00000024704 12024702176 022541 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Adaptors to fit data read from CUTG or fasta-format files into CAI graphs. 
""" from cogent.core.usage import UnsafeCodonUsage as CodonUsage from cogent.core.info import Info from cogent.parse.cutg import CutgParser from cogent.parse.fasta import MinimalFastaParser from cogent.core.genetic_code import GeneticCode from numpy import array, arange, searchsorted, sort, sqrt, zeros, \ concatenate, transpose from sys import argv from string import split from cogent.maths.stats.cai.util import cais __author__ = "Stephanie Wilson" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Stephanie Wilson"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" def data_from_file(lines): """reads lines returns array """ return array([map(float, i) for i in map(split, lines)]) empty_codons = dict.fromkeys([i+j+k for i in 'TCAG' for j in 'TCAG' for k in 'TCAG'], 0.0) def read_cutg(lines): """Returns list of CUTG objects from file-like object lines. Warning: reads whole file into memory as objects. 
""" return list(CutgParser(lines)) def get_ribosomal_proteins(usages): """Returns list of ribosomal proteins.""" keys = ['rpl', 'rps'] result=[] for u in usages: for k in keys: if str(u.Gene).lower().startswith(k): result.append(u) break return result def consolidate(usages): """Sums frequencies of a list of usages into one usage.""" result = CodonUsage() for u in usages: result += u result.normalize() return result def make_output(training_freqs, usages, funcs=cais): """Makes results as table.""" result = [] header = funcs.keys() header.sort() result.append(['gene', 'P3'] + header) for u in usages: u.normalize() curr_line = [] curr_line.append(u.Gene) thirdpos = u.positionalBases(purge_unwanted=True).Third thirdpos.normalize() curr_line.append(thirdpos['G'] + thirdpos['C']) for h in header: curr_line.append(funcs[h](training_freqs, u)) result.append(curr_line) return result def read_nt(infile): """Returns list of usage objects from Fasta-format infile""" result = [] for label, seq in MinimalFastaParser(infile): u = UnsafeCodonsFromString(seq.upper().replace('T','U')) u.Gene = label.split()[1] result.append(u) return result def print_output(table): """Prints table as tab-delimited text.""" for line in table: print '\t'.join(map(str, line)) def seq_to_codon_dict(seq): """Converts sequence into codon dict.""" leftover = len(seq) % 3 if leftover: seq += 'A' * (3-leftover) result = empty_codons.copy() for i in range(0, len(seq), 3): curr = seq[i:i+3] if curr in result: #ignore others result[curr] += 1 return result def kegg_fasta_to_codon_list(lines): """Reads list of CodonUsage objects from KEGG-format FASTA file.""" result = [] for label, seq in MinimalFastaParser(lines): seq = seq.upper() curr_info = {} fields = label.split() curr_info['SpeciesAbbreviation'], curr_info['GeneId'] = \ fields[0].split(':') if len(fields) > 1: #additional annotation first_word = fields[1] if first_word.endswith(';'): #gene label curr_info['Gene'] = first_word[:-1] 
curr_info['Description'] = ' '.join(fields[2:]) else: curr_info['Description'] = ' '.join(fields[1:]) curr_codon_usage = CodonUsage(seq_to_codon_dict(seq), Info=curr_info) curr_codon_usage.__dict__.update(curr_info) result.append(curr_codon_usage) return result def group_codon_usages_by_ids(codon_usages, ids, field='GeneId'): """Sorts codon usages into 2 lists: non-matching and matching ids.""" result = [[],[]] for c in codon_usages: result[getattr(c, field) in ids].append(c) return result def file_to_codon_list(infilename): """converts a file from the cutg parser to a list of codon usages """ return list(CutgParser(open(infilename), constructor=CodonUsage)) def adapt_fingerprint(codon_usages, which_blocks='quartets', \ include_mean=True, normalize=True): """takes a sequence of CodonUsage objects and returns an array for a fingerprint plot with: x: the g3/(g3+c3) y: the a3/(a3+u3) frequency: total of the base/total of all in the order: alanine, arginine4, glycine, leucine4, proline, serine4, threonine, valine (if which_blocks is 'quartets'). codon_usages: list of CodonUsage objects which_blocks: 'quartets' returns only the quartets that all code for the same aa (the default); other settings yield a 16-block fingerprint include_mean: include a point for the mean in the result (True) normalize: ensure the frequencies returned sum to 1 (True) """ tot_codon_usage = CodonUsage() for c in codon_usages: tot_codon_usage += c return tot_codon_usage.fingerprint(which_blocks=which_blocks, \ include_mean=include_mean, normalize=normalize) def make_bin(lowerbound, upperbound, binwidth): """Returns range suitable for use in searchsorted(), incl. upper bound. 
takes: lowerbound, upperbound and bin width of number to be sorted outputs: a bin that will correspond to array indexes """ return arange((lowerbound+binwidth),(upperbound-binwidth+0.0001),binwidth) def bin_by_p3(codon_usages, bin_lowbound=0.0, bin_upbound=1.0, bin_width=0.1): """takes a list of one or more codon usage and bin range returns list of lists of codon usages, split into specified lists """ p3_bin = make_bin(bin_lowbound, bin_upbound, bin_width) p3_list = [[] for i in p3_bin] p3_list.append([]) for cu in codon_usages: third_usage=cu.positionalBases().Third #calculating the P3 value (overall) third_CG=(third_usage['C']+third_usage['G']) third_AT=(third_usage['A']+third_usage['U']) #stored as RNA P3=array((third_CG/float(third_CG+third_AT)))#test if value works cur_p3=searchsorted(p3_bin,P3) p3_list[cur_p3].append(cu) return p3_list def adapt_pr2_bias(codon_usages, block='GC', bin_lowbound=0.0, bin_upbound=1.0,\ binwidth=0.1): """Returns the bin midpoint and the PR2 biases for each bin of GC3.""" result = [] for i, bin in enumerate(bin_by_p3(codon_usages, bin_lowbound, bin_upbound, \ binwidth)): if not bin: continue try: tot_usage = CodonUsage() for c in bin: tot_usage += c curr_pr2 = tot_usage.pr2bias(block) midbin = bin_lowbound + (i+0.5)*binwidth result.append([midbin]+list(curr_pr2)) except (ZeroDivisionError, FloatingPointError): pass return array(result) def adapt_p12(codon_usages, purge_unwanted=True): """From list of codon usages, returns [P3, (P1+P2)/2] for each usage. purge_unwanted: get rid of singleton codons and stop codons (True). P3 is the resulting x axis; P12 is the resulting y axis. """ data = array([c.positionalGC(purge_unwanted) for c in codon_usages]) return [data[:,3],(data[:,1]+data[:,2])/2] def adapt_p123gc(codon_usages, purge_unwanted=True): """From list of codon usages, returns [P1,P2,P3,GC] for each usage. purge_unwanted: get rid of singleton codons and stop codons (True). 
""" return(transpose(array(\ [c.positionalGC(purge_unwanted) for c in codon_usages]))) def cu_gene(obj): """Extracts the gene name from a codon usage object in std. format""" return str(obj.Gene).lower() def bin_codon_usage_by_patterns(codon_usages, patterns, extract_gene_f=cu_gene): """Returns two lists of codon usages, matching vs not matching the strings. no_match is result[0], match is result[1]. """ result = [[],[]] for u in codon_usages: matched = 0 norm_str = extract_gene_f(u) for p in patterns: if norm_str.startswith(p): matched = 1 break result[matched].append(u) return result def adapt_p3_histogram(codon_usages, purge_unwanted=True): """Returns P3 from each set of codon usage for feeding to hist().""" return [array([c.positionalGC(purge_unwanted=True)[3] for c in curr])\ for curr in codon_usages] def adapt_cai_histogram(codon_usages, cai_model='2g', patterns=['rpl','rps'],\ purge_unwanted=True): """Returns two arrays of CAIs for non-training and training genes. Result is suitable for feeding to hist(). codon_usages: list of codon usages to examine. cai_model: 1a, 1g, 2a, 2g, 3a, 3g depending on model and arithmetic mean. See CAI.py for documentation. patterns: list of patterns used to pick the training set. Default is ['rpl', 'rps'] for ribosomal proteins. Returns list of results for each set. Each result for a set is an array of CAI for each gene. 
""" normal, training = bin_codon_usage_by_patterns(codon_usages, patterns) cai_f = cais[cai_model] ref = consolidate(training) result = [] for series in [normal, training]: curr_result = [] for c in series: c = c.copy() c.normalize() curr_result.append(cai_f(ref,c)) result.append(array(curr_result)) return result def adapt_cai_p3(codon_usages,cai_model='2g', patterns=['rpl','rps'], \ purge_unwanted=True, both_series=False): """Returns array of [P3,CAI] for each gene, based on training set.""" normal, training = bin_codon_usage_by_patterns(codon_usages, patterns) cai_f = cais[cai_model] ref = consolidate(training) result = [] if both_series: for series in [normal, training]: curr_result = [] for c in series: c = c.copy() c.normalize() curr_result.append([c.positionalGC()[3], cai_f(ref,c)]) result.extend(transpose(array(curr_result))) return result #if only one series was given, add all of them to the list. for c in codon_usages: c = c.copy() c.normalize() result.append([c.positionalGC()[3], cai_f(ref, c)]) return transpose(array(result)) PyCogent-1.5.3/cogent/maths/stats/cai/get_by_cai.py000644 000765 000024 00000004342 12024702176 023170 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Produces data that matches the CAI for a gene against its P3.""" from cogent.parse.cutg import CutgParser from cogent.parse.fasta import MinimalFastaParser from cogent.core.usage import UnsafeCodonUsage as CodonUsage, \ UnsafeCodonsFromString from cogent.maths.stats.cai.util import cais from cogent.maths.stats.cai.adaptor import consolidate, read_nt __author__ = "Stephanie Wilson" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Stephanie Wilson"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class CaiFilter(object): """Returns filter that checks objects for CAI range. 
Abstract.""" def __init__(self, training_set, min_cai, max_cai=1, cai_type='2g'): self._cai_type = cai_type self._training_set = training_set self._min_cai = min_cai self._max_cai = max_cai def __call__(self, *args, **kwargs): raise NotImplemented class CaiSeqFilter(CaiFilter): """Returns filter that checks seqs for CAI range.""" def __call__(self, seq): """Returns True if within CAI threshold.""" u = UnsafeCodonsFromString(seq.upper().replace('T','U')) return self._min_cai < cais[self._cai_type]([self._training_set], u) \ < self._max_cai class CaiUsageFilter(CaiFilter): """Returns filter that checks usages for CAI range.""" def __call__(self, u): """Returns True if within CAI threshold.""" return self._min_cai < cais[self._cai_type]([self._training_set], u) \ < self._max_cai def filter_seqs(infile, f): """Iterates (seq, label) from Fasta-format infile""" for label, seq in MinimalFastaParser(infile): if f(seq): yield label, seq #run from command-line if __name__ == '__main__': from sys import argv infile = open(argv[1]) training = read_nt(open(argv[2])) training_freqs = consolidate(training) min_cai = float(argv[3]) try: max_cai = float(argv[4]) except: max_cai = 1 f = CaiSeqFilter(training_freqs, min_cai, max_cai) for label, seq in filter_seqs(infile, f): print '>'+label+'\n'+seq+'\n' PyCogent-1.5.3/cogent/maths/stats/cai/util.py000644 000765 000024 00000024234 12024702176 022062 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Provides codon usage bias measurements, the Codon Adaptation Index (CAI). Codon bias patterns for a gene have been shown to coorelate with translation levels and the accuracy of translation. A CAI is derived from codon preference statistics. This CAI evaluation measures the extent to which codon usage in a particular gene matches the usage pattern of a highly expressed gene set. 
Three models have been implemented here, each as a function of either an arithmetic or geometric mean to test multiplicative versus additive fitness: 1) Assumes each gene selects codons and amino acids to maximize translation rate. Gene length may also be selected. 2) Assumes each gene selects codons to maximize translation rate, but not amino acids. Assumes the absolute translation rate, independent of amino acid frequency, is what's important (Sharp and Li, 1987). 3) Same as (2) above, but assumes the per codon translation rate is what's important (Eyre-Walker, 1996). Note: - Frequencies for single codon families (Met and Trp) are omitted from the calculations since they cannot exhibit bias. - Stop codon frequencies are not omitted. - The Bulmer (1988) correction is used to prevent small normalized frequencies from creating a bias. """ from __future__ import division from numpy import log, exp from cogent.core.genetic_code import GeneticCodes __author__ = "Michael Eaton" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Michael Eaton", "Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" # Default cutoff for small codon frequencies to avoid bias FREQUENCY_MIN = 1.0e-3 # Default Standard Genetic Code NCBI number SGC = 1 # Default set of codons -- note: need to be RNA for compatibility with #the standard CodonUsage object. 
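The models above all combine the same two ingredients: per-block relative adaptiveness values computed from a reference set, and a (usually geometric) mean over the codons of the gene of interest. A minimal sketch of the Sharp and Li (1987) style calculation on a toy one-family reference set, independent of this module's CodonUsage machinery (all names here are illustrative):

```python
from math import exp, log

def relative_adaptiveness(ref_freqs, blocks):
    """w_ij: each codon's frequency over the max within its synonymous block."""
    w = {}
    for codons in blocks.values():
        best = max(ref_freqs[c] for c in codons)
        for c in codons:
            w[c] = ref_freqs[c] / best
    return w

def cai_geometric(w, gene_counts):
    """Count-weighted geometric mean of adaptiveness values."""
    total = float(sum(gene_counts.values()))
    return exp(sum(n * log(w[c]) for c, n in gene_counts.items()) / total)

# toy reference: in the Phe family, UUC is used three times as often as UUU
blocks = {'F': ['UUU', 'UUC']}
ref = {'UUU': 0.25, 'UUC': 0.75}
w = relative_adaptiveness(ref, blocks)  # {'UUU': 1/3, 'UUC': 1.0}
print(cai_geometric(w, {'UUU': 1, 'UUC': 1}))  # sqrt(1/3) ~= 0.577
```

A gene using only the preferred codon would score 1.0; mixing in the rare codon pulls the geometric mean down, which is the bias the index measures.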
bases = 'UGAC' cu = dict.fromkeys([i+j+k for i in bases for j in bases for k in bases], 0.0) def as_rna(s): """Converts string to uppercase RNA""" return s.upper().replace('T','U') def synonyms_to_rna(syn): """Converts values in synonyms dict to uppercase RNA""" result = {} for k, v in syn.items(): result[k] = map(as_rna, v) return result def get_synonyms(genetic_code=SGC, singles_removed=True, stops_removed=True): """Gets synonymous codon blocks as dict, keyed by amino acid.""" GC = GeneticCodes[genetic_code] synonyms = GC.Synonyms.copy() #don't modify original if stops_removed: if '*' in synonyms: del synonyms['*'] if singles_removed: for aa, family in synonyms.items(): if len(family) < 2: #delete any empty ones as well del synonyms[aa] return synonyms_to_rna(synonyms) def sum_codon_freqs(freqs): """Sums a set of individual codon freqs, assuming dicts. Omits invalid keys. """ result = cu.copy() for f in freqs: for k, v in f.items(): if k in result: result[k] += v return result def norm_to_max(vals): """Normalizes items in vals relative to max val: returns copy.""" best = max(vals) return [i/best for i in vals] def arithmetic_mean(vals, freqs=None): """Returns arithmetic mean of vals.""" if freqs is None: return sum(vals)/float(len(vals)) else: return sum([v*i for v, i in zip(vals, freqs)])/sum(freqs) def geometric_mean(vals, freqs=None): """Returns geometric mean of vals.""" if freqs is None: return exp(log(vals).sum()/float(len(vals))) else: return exp(sum([v*i for v,i in zip(log(vals), freqs)])/sum(freqs)) def codon_adaptiveness_all(freqs): """Calculates relative codon adaptiveness, using all codons.""" return dict(zip(freqs.keys(), norm_to_max(freqs.values()))) def codon_adaptiveness_blocks(freqs, blocks): """Calculates relative codon adaptiveness, using codon blocks.""" result = freqs.copy() for b, codons in blocks.items(): codon_vals = norm_to_max([result[c] for c in codons]) for c, v in zip(codons, 
codon_vals): result[c] = v return result def set_min(freqs, threshold): """Sets all values in freqs below min to specified threshold, in-place.""" for k, v in freqs.items(): if v < threshold: freqs[k] = threshold def valid_codons(blocks): """Gets all valid codons from blocks""" result = [] for b in blocks.values(): result.extend(b) return result def make_cai_1(genetic_code=SGC): """Returns function that calculates CAI model 1 using specified gen code.""" blocks = get_synonyms(genetic_code) codons = frozenset(valid_codons(blocks)) def cai_1(ref_freqs, gene_freqs, average=arithmetic_mean, \ threshold=FREQUENCY_MIN): """ Assumes codon and amino acid selection to maximize translation rate. Gene length may also be under selection. ref_freqs: dict mapping reference set of codons for highly expressed genes to their frequencies of occurrence. gene_freqs: dictionary mapping codons to their usage in gene of interest. average: function for normalizing and averaging gene of interest codon frequencies. threshold: cutoff for small normalized codon frequencies to avoid bias. """ r = ref_freqs.copy() set_min(r, threshold) adaptiveness_values = codon_adaptiveness_all(r) curr_codons = [k for k in gene_freqs if k in codons] return average([adaptiveness_values[i] for i in curr_codons],\ [gene_freqs[i] for i in curr_codons]) return cai_1 cai_1 = make_cai_1() def make_cai_2(genetic_code=SGC): """Returns function that calculates CAI model 2 using specified gen code.""" blocks = get_synonyms(genetic_code) codons = frozenset(valid_codons(blocks)) def cai_2(ref_freqs, gene_freqs, average=arithmetic_mean, \ threshold=FREQUENCY_MIN): """ Assumes codon, but not amino acid, selection to maximize translation rate (using geometric mean - Sharp and Li, 1987). ref_freqs: dict mapping reference set of codons for highly expressed genes to their frequencies of occurrence. gene_freqs: dictionary mapping codons to their usage in the gene of interest. 
average: function for normalizing and averaging the gene of interest codon frequencies. threshold: cutoff for small normalized codon frequencies to avoid bias. """ r = ref_freqs.copy() set_min(r, threshold) adaptiveness_values = codon_adaptiveness_blocks(r, blocks) curr_codons = [k for k in gene_freqs if k in codons] return average([adaptiveness_values[i] for i in curr_codons],\ [gene_freqs[i] for i in curr_codons]) return cai_2 cai_2 = make_cai_2() def make_cai_3(genetic_code=SGC): """Returns function that calculates CAI model 3 using specified gen code.""" blocks = get_synonyms(genetic_code) def cai_3(ref_freqs, gene_freqs, average=arithmetic_mean, \ threshold=FREQUENCY_MIN): """ Assumes codon, but not amino acid, selection to maximize translation rate, and per codon translation rate is paramount (using geometric mean -- Eyre-Walker, 1996). ref_freqs: dict mapping reference set of codons for highly expressed genes to their frequencies of occurrence. gene_freqs: dictionary mapping codons to their usage in the gene of interest. average: function for normalizing and averaging the gene of interest codon frequencies. if average is the string 'eyre_walker', will calculate CAI exactly according to the formula in his 1996 paper: CAI = exp(sum_i(sum_j(X_ij*ln_w_ij)/sum(X_ij))/m) ...where i is the codon block, j is the codon, X_ij is the freq, and w_ij is the relative adaptiveness of each codon. In other words, this version averages the log of the geometric means for each codon family, then exponentiates the result. Note that this actually produces exactly the same results as performing the geometric mean at both steps, so in general you'll just want to pass in the geometric mean as the averaging function if you want to do this. Otherwise, if average is a function of the weights and frequencies, the same averaging function will be applied within and between families. threshold: cutoff for small normalized codon frequencies to avoid bias. 
""" if average == 'eyre_walker': eyre_walker = True average = geometric_mean else: eyre_walker = False r = ref_freqs.copy() set_min(r, threshold) adaptiveness_values = codon_adaptiveness_blocks(r, blocks) block_results = [] for b, codons in blocks.items(): vals = [adaptiveness_values[i] for i in codons if i in gene_freqs] freqs = [gene_freqs[i] for i in codons if i in gene_freqs] #will skip if freqs missing if sum(freqs): block_results.append(average(vals, freqs)) if eyre_walker: return exp(arithmetic_mean(log(block_results))) else: return average(block_results) return cai_3 cai_3 = make_cai_3() #Define dict containing standard CAI variants cais = { '1a' : lambda ref, gene, threshold=FREQUENCY_MIN: \ cai_1(ref, gene, arithmetic_mean, threshold), '1g' : lambda ref, gene, threshold=FREQUENCY_MIN: \ cai_1(ref, gene, geometric_mean, threshold), '2a' : lambda ref, gene, threshold=FREQUENCY_MIN: \ cai_2(ref, gene, arithmetic_meani, threshold), '2g' : lambda ref, gene, threshold=FREQUENCY_MIN: \ cai_2(ref, gene, geometric_mean, threshold), '3a' : lambda ref, gene, threshold=FREQUENCY_MIN: \ cai_3(ref,gene, arithmetic_mean, threshold), '3g' : lambda ref, gene, threshold=FREQUENCY_MIN: \ cai_3(ref,gene, geometric_mean, threshold), } PyCogent-1.5.3/cogent/maths/spatial/__init__.py000644 000765 000024 00000000434 12024702176 022403 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['ckd3'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Production" PyCogent-1.5.3/cogent/maths/spatial/ckd3.c000644 000765 000024 00001175104 12024702176 021272 0ustar00jrideoutstaff000000 000000 /* Generated by Cython 0.16 on Fri Sep 14 12:12:07 2012 */ #define PY_SSIZE_T_CLEAN #include "Python.h" #ifndef Py_PYTHON_H #error Python headers needed to compile C extensions, please install development 
version of Python. #elif PY_VERSION_HEX < 0x02040000 #error Cython requires Python 2.4+. #else #include /* For offsetof */ #ifndef offsetof #define offsetof(type, member) ( (size_t) & ((type*)0) -> member ) #endif #if !defined(WIN32) && !defined(MS_WINDOWS) #ifndef __stdcall #define __stdcall #endif #ifndef __cdecl #define __cdecl #endif #ifndef __fastcall #define __fastcall #endif #endif #ifndef DL_IMPORT #define DL_IMPORT(t) t #endif #ifndef DL_EXPORT #define DL_EXPORT(t) t #endif #ifndef PY_LONG_LONG #define PY_LONG_LONG LONG_LONG #endif #ifndef Py_HUGE_VAL #define Py_HUGE_VAL HUGE_VAL #endif #ifdef PYPY_VERSION #define CYTHON_COMPILING_IN_PYPY 1 #define CYTHON_COMPILING_IN_CPYTHON 0 #else #define CYTHON_COMPILING_IN_PYPY 0 #define CYTHON_COMPILING_IN_CPYTHON 1 #endif #if CYTHON_COMPILING_IN_PYPY #define __Pyx_PyCFunction_Call PyObject_Call #else #define __Pyx_PyCFunction_Call PyCFunction_Call #endif #if PY_VERSION_HEX < 0x02050000 typedef int Py_ssize_t; #define PY_SSIZE_T_MAX INT_MAX #define PY_SSIZE_T_MIN INT_MIN #define PY_FORMAT_SIZE_T "" #define PyInt_FromSsize_t(z) PyInt_FromLong(z) #define PyInt_AsSsize_t(o) __Pyx_PyInt_AsInt(o) #define PyNumber_Index(o) PyNumber_Int(o) #define PyIndex_Check(o) PyNumber_Check(o) #define PyErr_WarnEx(category, message, stacklevel) PyErr_Warn(category, message) #define __PYX_BUILD_PY_SSIZE_T "i" #else #define __PYX_BUILD_PY_SSIZE_T "n" #endif #if PY_VERSION_HEX < 0x02060000 #define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt) #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type) #define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size) #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size, #define PyType_Modified(t) typedef struct { void *buf; PyObject *obj; Py_ssize_t len; Py_ssize_t itemsize; int readonly; int ndim; char *format; Py_ssize_t *shape; Py_ssize_t *strides; Py_ssize_t *suboffsets; void *internal; } Py_buffer; #define PyBUF_SIMPLE 0 #define PyBUF_WRITABLE 0x0001 #define PyBUF_FORMAT 0x0004 #define 
PyBUF_ND 0x0008 #define PyBUF_STRIDES (0x0010 | PyBUF_ND) #define PyBUF_C_CONTIGUOUS (0x0020 | PyBUF_STRIDES) #define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES) #define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES) #define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES) #define PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_FORMAT | PyBUF_WRITABLE) #define PyBUF_FULL (PyBUF_INDIRECT | PyBUF_FORMAT | PyBUF_WRITABLE) typedef int (*getbufferproc)(PyObject *, Py_buffer *, int); typedef void (*releasebufferproc)(PyObject *, Py_buffer *); #endif #if PY_MAJOR_VERSION < 3 #define __Pyx_BUILTIN_MODULE_NAME "__builtin__" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #else #define __Pyx_BUILTIN_MODULE_NAME "builtins" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #endif #if PY_MAJOR_VERSION < 3 && PY_MINOR_VERSION < 6 #define PyUnicode_FromString(s) PyUnicode_Decode(s, strlen(s), "UTF-8", "strict") #endif #if PY_MAJOR_VERSION >= 3 #define Py_TPFLAGS_CHECKTYPES 0 #define Py_TPFLAGS_HAVE_INDEX 0 #endif #if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3) #define Py_TPFLAGS_HAVE_NEWBUFFER 0 #endif #if PY_VERSION_HEX > 0x03030000 && defined(PyUnicode_GET_LENGTH) #define CYTHON_PEP393_ENABLED 1 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_LENGTH(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) PyUnicode_READ_CHAR(u, i) #else #define CYTHON_PEP393_ENABLED 0 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_SIZE(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) ((Py_UCS4)(PyUnicode_AS_UNICODE(u)[i])) #endif #if PY_MAJOR_VERSION >= 3 #define PyBaseString_Type PyUnicode_Type #define PyStringObject PyUnicodeObject #define PyString_Type PyUnicode_Type #define PyString_Check PyUnicode_Check #define PyString_CheckExact PyUnicode_CheckExact #endif #if PY_VERSION_HEX < 
0x02060000 #define PyBytesObject PyStringObject #define PyBytes_Type PyString_Type #define PyBytes_Check PyString_Check #define PyBytes_CheckExact PyString_CheckExact #define PyBytes_FromString PyString_FromString #define PyBytes_FromStringAndSize PyString_FromStringAndSize #define PyBytes_FromFormat PyString_FromFormat #define PyBytes_DecodeEscape PyString_DecodeEscape #define PyBytes_AsString PyString_AsString #define PyBytes_AsStringAndSize PyString_AsStringAndSize #define PyBytes_Size PyString_Size #define PyBytes_AS_STRING PyString_AS_STRING #define PyBytes_GET_SIZE PyString_GET_SIZE #define PyBytes_Repr PyString_Repr #define PyBytes_Concat PyString_Concat #define PyBytes_ConcatAndDel PyString_ConcatAndDel #endif #if PY_VERSION_HEX < 0x02060000 #define PySet_Check(obj) PyObject_TypeCheck(obj, &PySet_Type) #define PyFrozenSet_Check(obj) PyObject_TypeCheck(obj, &PyFrozenSet_Type) #endif #ifndef PySet_CheckExact #define PySet_CheckExact(obj) (Py_TYPE(obj) == &PySet_Type) #endif #define __Pyx_TypeCheck(obj, type) PyObject_TypeCheck(obj, (PyTypeObject *)type) #if PY_MAJOR_VERSION >= 3 #define PyIntObject PyLongObject #define PyInt_Type PyLong_Type #define PyInt_Check(op) PyLong_Check(op) #define PyInt_CheckExact(op) PyLong_CheckExact(op) #define PyInt_FromString PyLong_FromString #define PyInt_FromUnicode PyLong_FromUnicode #define PyInt_FromLong PyLong_FromLong #define PyInt_FromSize_t PyLong_FromSize_t #define PyInt_FromSsize_t PyLong_FromSsize_t #define PyInt_AsLong PyLong_AsLong #define PyInt_AS_LONG PyLong_AS_LONG #define PyInt_AsSsize_t PyLong_AsSsize_t #define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask #endif #if PY_MAJOR_VERSION >= 3 #define PyBoolObject PyLongObject #endif #if PY_VERSION_HEX < 0x03020000 typedef long Py_hash_t; #define __Pyx_PyInt_FromHash_t PyInt_FromLong #define __Pyx_PyInt_AsHash_t PyInt_AsLong #else #define __Pyx_PyInt_FromHash_t PyInt_FromSsize_t #define 
__Pyx_PyInt_AsHash_t PyInt_AsSsize_t #endif #if (PY_MAJOR_VERSION < 3) || (PY_VERSION_HEX >= 0x03010300) #define __Pyx_PySequence_GetSlice(obj, a, b) PySequence_GetSlice(obj, a, b) #define __Pyx_PySequence_SetSlice(obj, a, b, value) PySequence_SetSlice(obj, a, b, value) #define __Pyx_PySequence_DelSlice(obj, a, b) PySequence_DelSlice(obj, a, b) #else #define __Pyx_PySequence_GetSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), (PyObject*)0) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_GetSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object is unsliceable", (obj)->ob_type->tp_name), (PyObject*)0))) #define __Pyx_PySequence_SetSlice(obj, a, b, value) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_SetSlice(obj, a, b, value)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice assignment", (obj)->ob_type->tp_name), -1))) #define __Pyx_PySequence_DelSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_DelSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice deletion", (obj)->ob_type->tp_name), -1))) #endif #if PY_MAJOR_VERSION >= 3 #define PyMethod_New(func, self, klass) ((self) ? 
PyMethod_New(func, self) : PyInstanceMethod_New(func)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),((char *)(n))) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),((char *)(n)),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),((char *)(n))) #else #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),(n)) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),(n),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),(n)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_NAMESTR(n) ((char *)(n)) #define __Pyx_DOCSTR(n) ((char *)(n)) #else #define __Pyx_NAMESTR(n) (n) #define __Pyx_DOCSTR(n) (n) #endif #if PY_MAJOR_VERSION >= 3 #define __Pyx_PyNumber_Divide(x,y) PyNumber_TrueDivide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceTrueDivide(x,y) #else #define __Pyx_PyNumber_Divide(x,y) PyNumber_Divide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceDivide(x,y) #endif #ifndef __PYX_EXTERN_C #ifdef __cplusplus #define __PYX_EXTERN_C extern "C" #else #define __PYX_EXTERN_C extern #endif #endif #if defined(WIN32) || defined(MS_WINDOWS) #define _USE_MATH_DEFINES #endif #include <math.h> #define __PYX_HAVE__cogent__maths__spatial__ckd3 #define __PYX_HAVE_API__cogent__maths__spatial__ckd3 #include "stdio.h" #include "stdlib.h" #include "numpy/arrayobject.h" #include "numpy/ufuncobject.h" #ifdef _OPENMP #include <omp.h> #endif /* _OPENMP */ #ifdef PYREX_WITHOUT_ASSERTIONS #define CYTHON_WITHOUT_ASSERTIONS #endif /* inline attribute */ #ifndef CYTHON_INLINE #if defined(__GNUC__) #define CYTHON_INLINE __inline__ #elif defined(_MSC_VER) #define CYTHON_INLINE __inline #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L #define CYTHON_INLINE inline #else #define CYTHON_INLINE #endif #endif /* unused attribute */ #ifndef CYTHON_UNUSED # if defined(__GNUC__) # if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) #
define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif # elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif #endif typedef struct {PyObject **p; char *s; const long n; const char* encoding; const char is_unicode; const char is_str; const char intern; } __Pyx_StringTabEntry; /*proto*/ /* Type Conversion Predeclarations */ #define __Pyx_PyBytes_FromUString(s) PyBytes_FromString((char*)s) #define __Pyx_PyBytes_AsUString(s) ((unsigned char*) PyBytes_AsString(s)) #define __Pyx_Owned_Py_None(b) (Py_INCREF(Py_None), Py_None) #define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False)) static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject*); static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x); static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*); static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t); static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject*); #define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x)) #define __pyx_PyFloat_AsFloat(x) ((float) __pyx_PyFloat_AsDouble(x)) #ifdef __GNUC__ /* Test for GCC > 2.95 */ #if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)) #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0) #else /* __GNUC__ > 2 ... */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ > 2 ... 
*/ #else /* __GNUC__ */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ */ static PyObject *__pyx_m; static PyObject *__pyx_b; static PyObject *__pyx_empty_tuple; static PyObject *__pyx_empty_bytes; static int __pyx_lineno; static int __pyx_clineno = 0; static const char * __pyx_cfilenm= __FILE__; static const char *__pyx_filename; #if !defined(CYTHON_CCOMPLEX) #if defined(__cplusplus) #define CYTHON_CCOMPLEX 1 #elif defined(_Complex_I) #define CYTHON_CCOMPLEX 1 #else #define CYTHON_CCOMPLEX 0 #endif #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus #include <complex> #else #include <complex.h> #endif #endif #if CYTHON_CCOMPLEX && !defined(__cplusplus) && defined(__sun__) && defined(__GNUC__) #undef _Complex_I #define _Complex_I 1.0fj #endif static const char *__pyx_f[] = { "ckd3.pyx", "numpy.pxd", }; #define IS_UNSIGNED(type) (((type) -1) > 0) struct __Pyx_StructField_; #define __PYX_BUF_FLAGS_PACKED_STRUCT (1 << 0) typedef struct { const char* name; /* for error messages only */ struct __Pyx_StructField_* fields; size_t size; /* sizeof(type) */ size_t arraysize[8]; /* length of array in each dimension */ int ndim; char typegroup; /* _R_eal, _C_omplex, Signed _I_nt, _U_nsigned int, _S_truct, _P_ointer, _O_bject */ char is_unsigned; int flags; } __Pyx_TypeInfo; typedef struct __Pyx_StructField_ { __Pyx_TypeInfo* type; const char* name; size_t offset; } __Pyx_StructField; typedef struct { __Pyx_StructField* field; size_t parent_offset; } __Pyx_BufFmt_StackElem; typedef struct { __Pyx_StructField root; __Pyx_BufFmt_StackElem* head; size_t fmt_offset; size_t new_count, enc_count; size_t struct_alignment; int is_complex; char enc_type; char new_packmode; char enc_packmode; char is_valid_array; } __Pyx_BufFmt_Context; /* "numpy.pxd":722 * # in Cython to enable them only on the right systems.
* * ctypedef npy_int8 int8_t # <<<<<<<<<<<<<< * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t */ typedef npy_int8 __pyx_t_5numpy_int8_t; /* "numpy.pxd":723 * * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t # <<<<<<<<<<<<<< * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t */ typedef npy_int16 __pyx_t_5numpy_int16_t; /* "numpy.pxd":724 * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t # <<<<<<<<<<<<<< * ctypedef npy_int64 int64_t * #ctypedef npy_int96 int96_t */ typedef npy_int32 __pyx_t_5numpy_int32_t; /* "numpy.pxd":725 * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t # <<<<<<<<<<<<<< * #ctypedef npy_int96 int96_t * #ctypedef npy_int128 int128_t */ typedef npy_int64 __pyx_t_5numpy_int64_t; /* "numpy.pxd":729 * #ctypedef npy_int128 int128_t * * ctypedef npy_uint8 uint8_t # <<<<<<<<<<<<<< * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t */ typedef npy_uint8 __pyx_t_5numpy_uint8_t; /* "numpy.pxd":730 * * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t # <<<<<<<<<<<<<< * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t */ typedef npy_uint16 __pyx_t_5numpy_uint16_t; /* "numpy.pxd":731 * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t # <<<<<<<<<<<<<< * ctypedef npy_uint64 uint64_t * #ctypedef npy_uint96 uint96_t */ typedef npy_uint32 __pyx_t_5numpy_uint32_t; /* "numpy.pxd":732 * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t # <<<<<<<<<<<<<< * #ctypedef npy_uint96 uint96_t * #ctypedef npy_uint128 uint128_t */ typedef npy_uint64 __pyx_t_5numpy_uint64_t; /* "numpy.pxd":736 * #ctypedef npy_uint128 uint128_t * * ctypedef npy_float32 float32_t # <<<<<<<<<<<<<< * ctypedef npy_float64 float64_t * #ctypedef npy_float80 float80_t */ typedef npy_float32 __pyx_t_5numpy_float32_t; /* "numpy.pxd":737 * * ctypedef npy_float32 float32_t * ctypedef npy_float64 float64_t 
# <<<<<<<<<<<<<< * #ctypedef npy_float80 float80_t * #ctypedef npy_float128 float128_t */ typedef npy_float64 __pyx_t_5numpy_float64_t; /* "numpy.pxd":746 * # The int types are mapped a bit surprising -- * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t # <<<<<<<<<<<<<< * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t */ typedef npy_long __pyx_t_5numpy_int_t; /* "numpy.pxd":747 * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t * ctypedef npy_longlong long_t # <<<<<<<<<<<<<< * ctypedef npy_longlong longlong_t * */ typedef npy_longlong __pyx_t_5numpy_long_t; /* "numpy.pxd":748 * ctypedef npy_long int_t * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t # <<<<<<<<<<<<<< * * ctypedef npy_ulong uint_t */ typedef npy_longlong __pyx_t_5numpy_longlong_t; /* "numpy.pxd":750 * ctypedef npy_longlong longlong_t * * ctypedef npy_ulong uint_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t */ typedef npy_ulong __pyx_t_5numpy_uint_t; /* "numpy.pxd":751 * * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulonglong_t * */ typedef npy_ulonglong __pyx_t_5numpy_ulong_t; /* "numpy.pxd":752 * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t # <<<<<<<<<<<<<< * * ctypedef npy_intp intp_t */ typedef npy_ulonglong __pyx_t_5numpy_ulonglong_t; /* "numpy.pxd":754 * ctypedef npy_ulonglong ulonglong_t * * ctypedef npy_intp intp_t # <<<<<<<<<<<<<< * ctypedef npy_uintp uintp_t * */ typedef npy_intp __pyx_t_5numpy_intp_t; /* "numpy.pxd":755 * * ctypedef npy_intp intp_t * ctypedef npy_uintp uintp_t # <<<<<<<<<<<<<< * * ctypedef npy_double float_t */ typedef npy_uintp __pyx_t_5numpy_uintp_t; /* "numpy.pxd":757 * ctypedef npy_uintp uintp_t * * ctypedef npy_double float_t # <<<<<<<<<<<<<< * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t */ typedef 
npy_double __pyx_t_5numpy_float_t; /* "numpy.pxd":758 * * ctypedef npy_double float_t * ctypedef npy_double double_t # <<<<<<<<<<<<<< * ctypedef npy_longdouble longdouble_t * */ typedef npy_double __pyx_t_5numpy_double_t; /* "numpy.pxd":759 * ctypedef npy_double float_t * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cfloat cfloat_t */ typedef npy_longdouble __pyx_t_5numpy_longdouble_t; /* "cogent/maths/spatial/ckd3.pxd":2 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t # <<<<<<<<<<<<<< * ctypedef np.npy_uint64 UTYPE_t * */ typedef npy_float64 __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t; /* "cogent/maths/spatial/ckd3.pxd":3 * cimport numpy as np * ctypedef np.npy_float64 DTYPE_t * ctypedef np.npy_uint64 UTYPE_t # <<<<<<<<<<<<<< * * cdef enum constants: */ typedef npy_uint64 __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t; #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< float > __pyx_t_float_complex; #else typedef float _Complex __pyx_t_float_complex; #endif #else typedef struct { float real, imag; } __pyx_t_float_complex; #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< double > __pyx_t_double_complex; #else typedef double _Complex __pyx_t_double_complex; #endif #else typedef struct { double real, imag; } __pyx_t_double_complex; #endif /*--- Type declarations ---*/ struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree; /* "numpy.pxd":761 * ctypedef npy_longdouble longdouble_t * * ctypedef npy_cfloat cfloat_t # <<<<<<<<<<<<<< * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t */ typedef npy_cfloat __pyx_t_5numpy_cfloat_t; /* "numpy.pxd":762 * * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t # <<<<<<<<<<<<<< * ctypedef npy_clongdouble clongdouble_t * */ typedef npy_cdouble __pyx_t_5numpy_cdouble_t; /* "numpy.pxd":763 * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t # 
<<<<<<<<<<<<<< * * ctypedef npy_cdouble complex_t */ typedef npy_clongdouble __pyx_t_5numpy_clongdouble_t; /* "numpy.pxd":765 * ctypedef npy_clongdouble clongdouble_t * * ctypedef npy_cdouble complex_t # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew1(a): */ typedef npy_cdouble __pyx_t_5numpy_complex_t; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode; /* "cogent/maths/spatial/ckd3.pxd":5 * ctypedef np.npy_uint64 UTYPE_t * * cdef enum constants: # <<<<<<<<<<<<<< * NSTACK = 100 * */ enum __pyx_t_6cogent_5maths_7spatial_4ckd3_constants { __pyx_e_6cogent_5maths_7spatial_4ckd3_NSTACK = 100 }; /* "cogent/maths/spatial/ckd3.pxd":8 * NSTACK = 100 * * cdef struct kdpoint: # <<<<<<<<<<<<<< * UTYPE_t index * DTYPE_t *coords */ struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t index; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *coords; }; /* "cogent/maths/spatial/ckd3.pxd":12 * DTYPE_t *coords * * cdef struct kdnode: # <<<<<<<<<<<<<< * UTYPE_t bucket # 1 if leaf-bucket, 0 if node * int dimension */ struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t bucket; int dimension; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t position; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t start; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t end; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *left; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *right; }; /* "cogent/maths/spatial/ckd3.pyx":232 * return count * * cdef class KDTree: # <<<<<<<<<<<<<< * """Implements the KDTree data structure for fast nearest neighbor queries.""" * cdef np.ndarray n_array */ struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree { PyObject_HEAD PyArrayObject *n_array; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *c_array; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *kdpnts; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode
*tree; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t dims; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t pnts; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t bucket_size; }; #ifndef CYTHON_REFNANNY #define CYTHON_REFNANNY 0 #endif #if CYTHON_REFNANNY typedef struct { void (*INCREF)(void*, PyObject*, int); void (*DECREF)(void*, PyObject*, int); void (*GOTREF)(void*, PyObject*, int); void (*GIVEREF)(void*, PyObject*, int); void* (*SetupContext)(const char*, int, const char*); void (*FinishContext)(void**); } __Pyx_RefNannyAPIStruct; static __Pyx_RefNannyAPIStruct *__Pyx_RefNanny = NULL; static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname); /*proto*/ #define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL; #ifdef WITH_THREAD #define __Pyx_RefNannySetupContext(name, acquire_gil) \ if (acquire_gil) { \ PyGILState_STATE __pyx_gilstate_save = PyGILState_Ensure(); \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ PyGILState_Release(__pyx_gilstate_save); \ } else { \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ } #else #define __Pyx_RefNannySetupContext(name, acquire_gil) \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__) #endif #define __Pyx_RefNannyFinishContext() \ __Pyx_RefNanny->FinishContext(&__pyx_refnanny) #define __Pyx_INCREF(r) __Pyx_RefNanny->INCREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_DECREF(r) __Pyx_RefNanny->DECREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GOTREF(r) __Pyx_RefNanny->GOTREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GIVEREF(r) __Pyx_RefNanny->GIVEREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_XINCREF(r) do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0) #define __Pyx_XDECREF(r) do { if((r) != NULL) {__Pyx_DECREF(r); }} while(0) #define __Pyx_XGOTREF(r) do { if((r) != NULL) {__Pyx_GOTREF(r); }} while(0) #define __Pyx_XGIVEREF(r) do { if((r) != NULL) {__Pyx_GIVEREF(r);}} 
while(0) #else #define __Pyx_RefNannyDeclarations #define __Pyx_RefNannySetupContext(name, acquire_gil) #define __Pyx_RefNannyFinishContext() #define __Pyx_INCREF(r) Py_INCREF(r) #define __Pyx_DECREF(r) Py_DECREF(r) #define __Pyx_GOTREF(r) #define __Pyx_GIVEREF(r) #define __Pyx_XINCREF(r) Py_XINCREF(r) #define __Pyx_XDECREF(r) Py_XDECREF(r) #define __Pyx_XGOTREF(r) #define __Pyx_XGIVEREF(r) #endif /* CYTHON_REFNANNY */ #define __Pyx_CLEAR(r) do { PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);} while(0) #define __Pyx_XCLEAR(r) do { if((r) != NULL) {PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);}} while(0) static void __Pyx_RaiseDoubleKeywordsError(const char* func_name, PyObject* kw_name); /*proto*/ static int __Pyx_ParseOptionalKeywords(PyObject *kwds, PyObject **argnames[], \ PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, \ const char* function_name); /*proto*/ static void __Pyx_RaiseArgtupleInvalid(const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found); /*proto*/ static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact); /*proto*/ static CYTHON_INLINE int __Pyx_GetBufferAndValidate(Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack); static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info); static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb); /*proto*/ static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb); /*proto*/ static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type); /*proto*/ static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name); /*proto*/ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause); /*proto*/ static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index); static CYTHON_INLINE void 
__Pyx_RaiseTooManyValuesError(Py_ssize_t expected); static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void); static void __Pyx_UnpackTupleError(PyObject *, Py_ssize_t index); /*proto*/ static CYTHON_INLINE PyObject *__Pyx_PyInt_to_py_npy_uint64(npy_uint64); typedef struct { Py_ssize_t shape, strides, suboffsets; } __Pyx_Buf_DimInfo; typedef struct { size_t refcount; Py_buffer pybuffer; } __Pyx_Buffer; typedef struct { __Pyx_Buffer *rcbuffer; char *data; __Pyx_Buf_DimInfo diminfo[8]; } __Pyx_LocalBuf_ND; #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags); static void __Pyx_ReleaseBuffer(Py_buffer *view); #else #define __Pyx_GetBuffer PyObject_GetBuffer #define __Pyx_ReleaseBuffer PyBuffer_Release #endif static Py_ssize_t __Pyx_zeros[] = {0, 0, 0, 0, 0, 0, 0, 0}; static Py_ssize_t __Pyx_minusones[] = {-1, -1, -1, -1, -1, -1, -1, -1}; static CYTHON_INLINE npy_uint64 __Pyx_PyInt_from_py_npy_uint64(PyObject *); #if CYTHON_CCOMPLEX #ifdef __cplusplus #define __Pyx_CREAL(z) ((z).real()) #define __Pyx_CIMAG(z) ((z).imag()) #else #define __Pyx_CREAL(z) (__real__(z)) #define __Pyx_CIMAG(z) (__imag__(z)) #endif #else #define __Pyx_CREAL(z) ((z).real) #define __Pyx_CIMAG(z) ((z).imag) #endif #if defined(_WIN32) && defined(__cplusplus) && CYTHON_CCOMPLEX #define __Pyx_SET_CREAL(z,x) ((z).real(x)) #define __Pyx_SET_CIMAG(z,y) ((z).imag(y)) #else #define __Pyx_SET_CREAL(z,x) __Pyx_CREAL(z) = (x) #define __Pyx_SET_CIMAG(z,y) __Pyx_CIMAG(z) = (y) #endif static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float, float); #if CYTHON_CCOMPLEX #define __Pyx_c_eqf(a, b) ((a)==(b)) #define __Pyx_c_sumf(a, b) ((a)+(b)) #define __Pyx_c_difff(a, b) ((a)-(b)) #define __Pyx_c_prodf(a, b) ((a)*(b)) #define __Pyx_c_quotf(a, b) ((a)/(b)) #define __Pyx_c_negf(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zerof(z) ((z)==(float)0) #define __Pyx_c_conjf(z) (::std::conj(z)) #if 1 #define __Pyx_c_absf(z) (::std::abs(z)) 
#define __Pyx_c_powf(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zerof(z) ((z)==0) #define __Pyx_c_conjf(z) (conjf(z)) #if 1 #define __Pyx_c_absf(z) (cabsf(z)) #define __Pyx_c_powf(a, b) (cpowf(a, b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex); static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex); #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex, __pyx_t_float_complex); #endif #endif static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double, double); #if CYTHON_CCOMPLEX #define __Pyx_c_eq(a, b) ((a)==(b)) #define __Pyx_c_sum(a, b) ((a)+(b)) #define __Pyx_c_diff(a, b) ((a)-(b)) #define __Pyx_c_prod(a, b) ((a)*(b)) #define __Pyx_c_quot(a, b) ((a)/(b)) #define __Pyx_c_neg(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zero(z) ((z)==(double)0) #define __Pyx_c_conj(z) (::std::conj(z)) #if 1 #define __Pyx_c_abs(z) (::std::abs(z)) #define __Pyx_c_pow(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zero(z) ((z)==0) #define __Pyx_c_conj(z) (conj(z)) #if 1 #define __Pyx_c_abs(z) (cabs(z)) #define __Pyx_c_pow(a, b) (cpow(a, b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex 
__Pyx_c_sum(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex); static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex); #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex, __pyx_t_double_complex); #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject *); static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject *); static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject *); static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject *); static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject *); static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject *); static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject *); static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject *); static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject *); static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject *); static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject *); static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject *); static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject *); static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject *); static void __Pyx_WriteUnraisable(const char *name, int clineno, int lineno, const 
char *filename); /*proto*/ static CYTHON_INLINE Py_intptr_t __Pyx_PyInt_from_py_Py_intptr_t(PyObject *); static int __Pyx_check_binary_version(void); static int __Pyx_ExportFunction(const char *name, void (*f)(void), const char *sig); /*proto*/ #if !defined(__Pyx_PyIdentifier_FromString) #if PY_MAJOR_VERSION < 3 #define __Pyx_PyIdentifier_FromString(s) PyString_FromString(s) #else #define __Pyx_PyIdentifier_FromString(s) PyUnicode_FromString(s) #endif #endif static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict); /*proto*/ static PyObject *__Pyx_ImportModule(const char *name); /*proto*/ typedef struct { int code_line; PyCodeObject* code_object; } __Pyx_CodeObjectCacheEntry; struct __Pyx_CodeObjectCache { int count; int max_count; __Pyx_CodeObjectCacheEntry* entries; }; static struct __Pyx_CodeObjectCache __pyx_code_cache = {0,0,NULL}; static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line); static PyCodeObject *__pyx_find_code_object(int code_line); static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object); static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename); /*proto*/ static int __Pyx_InitStrings(__Pyx_StringTabEntry *t); /*proto*/ /* Module declarations from 'cpython.buffer' */ /* Module declarations from 'cpython.ref' */ /* Module declarations from 'libc.stdio' */ /* Module declarations from 'cpython.object' */ /* Module declarations from 'libc.stdlib' */ /* Module declarations from 'numpy' */ /* Module declarations from 'numpy' */ static PyTypeObject *__pyx_ptype_5numpy_dtype = 0; static PyTypeObject *__pyx_ptype_5numpy_flatiter = 0; static PyTypeObject *__pyx_ptype_5numpy_broadcast = 0; static PyTypeObject *__pyx_ptype_5numpy_ndarray = 0; static PyTypeObject *__pyx_ptype_5numpy_ufunc = 0; static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *, char *, char *, int *); 
/*proto*/ /* Module declarations from 'stdlib' */ /* Module declarations from 'cogent.maths.spatial.ckd3' */ static PyTypeObject *__pyx_ptype_6cogent_5maths_7spatial_4ckd3_KDTree = 0; static CYTHON_INLINE void __pyx_f_6cogent_5maths_7spatial_4ckd3_swap(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *); /*proto*/ static struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_f_6cogent_5maths_7spatial_4ckd3_points(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static CYTHON_INLINE __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_f_6cogent_5maths_7spatial_4ckd3_dist(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static void __pyx_f_6cogent_5maths_7spatial_4ckd3_qsort(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_f_6cogent_5maths_7spatial_4ckd3_rn(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, 
__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static void *__pyx_f_6cogent_5maths_7spatial_4ckd3_knn(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t); /*proto*/ static __Pyx_TypeInfo __Pyx_TypeInfo_nn___pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t = { "DTYPE_t", NULL, sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t), { 0 }, 0, 'R', 0, 0 }; #define __Pyx_MODULE_NAME "cogent.maths.spatial.ckd3" int __pyx_module_is_main_cogent__maths__spatial__ckd3 = 0; /* Implementation of 'cogent.maths.spatial.ckd3' */ static PyObject *__pyx_builtin_ValueError; static PyObject *__pyx_builtin_range; static PyObject *__pyx_builtin_RuntimeError; static int __pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree___init__(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self, PyArrayObject *__pyx_v_n_array, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_bucket_size); /* proto */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_2knn(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self, PyArrayObject *__pyx_v_point, npy_intp __pyx_v_k); /* proto */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4rn(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self, PyArrayObject *__pyx_v_point, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_r); /* proto */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4dims___get__(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self); /* proto */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4pnts___get__(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self); 
/* proto */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_11bucket_size___get__(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self); /* proto */ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /* proto */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info); /* proto */ static char __pyx_k_1[] = "ndarray is not C contiguous"; static char __pyx_k_3[] = "ndarray is not Fortran contiguous"; static char __pyx_k_5[] = "Non-native byte order not supported"; static char __pyx_k_7[] = "unknown dtype code in numpy.pxd (%d)"; static char __pyx_k_8[] = "Format string allocated too short, see comment in numpy.pxd"; static char __pyx_k_11[] = "Format string allocated too short."; static char __pyx_k_13[] = "('1', '5', '3')"; static char __pyx_k__B[] = "B"; static char __pyx_k__H[] = "H"; static char __pyx_k__I[] = "I"; static char __pyx_k__L[] = "L"; static char __pyx_k__O[] = "O"; static char __pyx_k__Q[] = "Q"; static char __pyx_k__b[] = "b"; static char __pyx_k__d[] = "d"; static char __pyx_k__f[] = "f"; static char __pyx_k__g[] = "g"; static char __pyx_k__h[] = "h"; static char __pyx_k__i[] = "i"; static char __pyx_k__k[] = "k"; static char __pyx_k__l[] = "l"; static char __pyx_k__q[] = "q"; static char __pyx_k__r[] = "r"; static char __pyx_k__Zd[] = "Zd"; static char __pyx_k__Zf[] = "Zf"; static char __pyx_k__Zg[] = "Zg"; static char __pyx_k__size[] = "size"; static char __pyx_k__point[] = "point"; static char __pyx_k__range[] = "range"; static char __pyx_k__n_array[] = "n_array"; static char __pyx_k____main__[] = "__main__"; static char __pyx_k____test__[] = "__test__"; static char __pyx_k__ValueError[] = "ValueError"; static char __pyx_k____version__[] = "__version__"; static char __pyx_k__bucket_size[] = "bucket_size"; static char __pyx_k__RuntimeError[] = "RuntimeError"; static PyObject *__pyx_kp_u_1; 
static PyObject *__pyx_kp_u_11; static PyObject *__pyx_kp_s_13; static PyObject *__pyx_kp_u_3; static PyObject *__pyx_kp_u_5; static PyObject *__pyx_kp_u_7; static PyObject *__pyx_kp_u_8; static PyObject *__pyx_n_s__RuntimeError; static PyObject *__pyx_n_s__ValueError; static PyObject *__pyx_n_s____main__; static PyObject *__pyx_n_s____test__; static PyObject *__pyx_n_s____version__; static PyObject *__pyx_n_s__bucket_size; static PyObject *__pyx_n_s__k; static PyObject *__pyx_n_s__n_array; static PyObject *__pyx_n_s__point; static PyObject *__pyx_n_s__r; static PyObject *__pyx_n_s__range; static PyObject *__pyx_n_s__size; static PyObject *__pyx_int_1; static PyObject *__pyx_int_15; static PyObject *__pyx_k_tuple_2; static PyObject *__pyx_k_tuple_4; static PyObject *__pyx_k_tuple_6; static PyObject *__pyx_k_tuple_9; static PyObject *__pyx_k_tuple_10; static PyObject *__pyx_k_tuple_12; /* "cogent/maths/spatial/ckd3.pyx":23 * cdef int import_array1(int ret) * * cdef kdpoint *points(DTYPE_t *c_array, UTYPE_t points, UTYPE_t dims): # <<<<<<<<<<<<<< * """creates an array of kdpoints from c-array of numpy doubles.""" * cdef kdpoint *pnts = <kdpoint *>malloc(sizeof(kdpoint)*points) */ static struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_f_6cogent_5maths_7spatial_4ckd3_points(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *__pyx_v_c_array, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_points, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_dims) { struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_pnts; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_i; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_r; __Pyx_RefNannyDeclarations __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_t_1; __Pyx_RefNannySetupContext("points", 0); /* "cogent/maths/spatial/ckd3.pyx":25 * cdef kdpoint *points(DTYPE_t *c_array, UTYPE_t points, UTYPE_t dims): * """creates an array of kdpoints from c-array of numpy doubles.""" * cdef kdpoint *pnts =
<kdpoint *>malloc(sizeof(kdpoint)*points) # <<<<<<<<<<<<<< * cdef UTYPE_t i * for 0 <= i < points: */ __pyx_v_pnts = ((struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *)malloc(((sizeof(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint)) * __pyx_v_points))); /* "cogent/maths/spatial/ckd3.pyx":27 * cdef kdpoint *pnts = <kdpoint *>malloc(sizeof(kdpoint)*points) * cdef UTYPE_t i * for 0 <= i < points: # <<<<<<<<<<<<<< * pnts[i].index = i * pnts[i].coords = c_array+i*dims */ __pyx_t_1 = __pyx_v_points; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_1; __pyx_v_i++) { /* "cogent/maths/spatial/ckd3.pyx":28 * cdef UTYPE_t i * for 0 <= i < points: * pnts[i].index = i # <<<<<<<<<<<<<< * pnts[i].coords = c_array+i*dims * return pnts */ (__pyx_v_pnts[__pyx_v_i]).index = __pyx_v_i; /* "cogent/maths/spatial/ckd3.pyx":29 * for 0 <= i < points: * pnts[i].index = i * pnts[i].coords = c_array+i*dims # <<<<<<<<<<<<<< * return pnts * */ (__pyx_v_pnts[__pyx_v_i]).coords = (__pyx_v_c_array + (__pyx_v_i * __pyx_v_dims)); } /* "cogent/maths/spatial/ckd3.pyx":30 * pnts[i].index = i * pnts[i].coords = c_array+i*dims * return pnts # <<<<<<<<<<<<<< * * cdef inline void swap(kdpoint *a, kdpoint *b): */ __pyx_r = __pyx_v_pnts; goto __pyx_L0; __pyx_r = 0; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":32 * return pnts * * cdef inline void swap(kdpoint *a, kdpoint *b): # <<<<<<<<<<<<<< * """swaps two pointers to kdpoint structs.""" * cdef kdpoint t */ static CYTHON_INLINE void __pyx_f_6cogent_5maths_7spatial_4ckd3_swap(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_a, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_b) { struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint __pyx_v_t; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("swap", 0); /* "cogent/maths/spatial/ckd3.pyx":35 * """swaps two pointers to kdpoint structs.""" * cdef kdpoint t * t = a[0] # <<<<<<<<<<<<<< * a[0] = b[0] * b[0] = t */ __pyx_v_t = (__pyx_v_a[0]); /*
"cogent/maths/spatial/ckd3.pyx":36 * cdef kdpoint t * t = a[0] * a[0] = b[0] # <<<<<<<<<<<<<< * b[0] = t * */ (__pyx_v_a[0]) = (__pyx_v_b[0]); /* "cogent/maths/spatial/ckd3.pyx":37 * t = a[0] * a[0] = b[0] * b[0] = t # <<<<<<<<<<<<<< * * cdef inline DTYPE_t dist(kdpoint *a, kdpoint *b, UTYPE_t dims): */ (__pyx_v_b[0]) = __pyx_v_t; __Pyx_RefNannyFinishContext(); } /* "cogent/maths/spatial/ckd3.pyx":39 * b[0] = t * * cdef inline DTYPE_t dist(kdpoint *a, kdpoint *b, UTYPE_t dims): # <<<<<<<<<<<<<< * """calculates the squared distance between two points.""" * cdef UTYPE_t i */ static CYTHON_INLINE __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_f_6cogent_5maths_7spatial_4ckd3_dist(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_a, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_b, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_dims) { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_i; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_dif; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_dst; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_r; __Pyx_RefNannyDeclarations __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_t_1; __Pyx_RefNannySetupContext("dist", 0); /* "cogent/maths/spatial/ckd3.pyx":42 * """calculates the squared distance between two points.""" * cdef UTYPE_t i * cdef DTYPE_t dif, dst = 0 # <<<<<<<<<<<<<< * for 0 <= i < dims: * dif = a.coords[i] - b.coords[i] */ __pyx_v_dst = 0.0; /* "cogent/maths/spatial/ckd3.pyx":43 * cdef UTYPE_t i * cdef DTYPE_t dif, dst = 0 * for 0 <= i < dims: # <<<<<<<<<<<<<< * dif = a.coords[i] - b.coords[i] * dst += dif * dif */ __pyx_t_1 = __pyx_v_dims; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_1; __pyx_v_i++) { /* "cogent/maths/spatial/ckd3.pyx":44 * cdef DTYPE_t dif, dst = 0 * for 0 <= i < dims: * dif = a.coords[i] - b.coords[i] # <<<<<<<<<<<<<< * dst += dif * dif * return dst */ __pyx_v_dif = ((__pyx_v_a->coords[__pyx_v_i]) - (__pyx_v_b->coords[__pyx_v_i])); /* 
"cogent/maths/spatial/ckd3.pyx":45 * for 0 <= i < dims: * dif = a.coords[i] - b.coords[i] * dst += dif * dif # <<<<<<<<<<<<<< * return dst * */ __pyx_v_dst = (__pyx_v_dst + (__pyx_v_dif * __pyx_v_dif)); } /* "cogent/maths/spatial/ckd3.pyx":46 * dif = a.coords[i] - b.coords[i] * dst += dif * dif * return dst # <<<<<<<<<<<<<< * * cdef void qsort(kdpoint *A, UTYPE_t l, UTYPE_t r, UTYPE_t dim): */ __pyx_r = __pyx_v_dst; goto __pyx_L0; __pyx_r = 0; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":48 * return dst * * cdef void qsort(kdpoint *A, UTYPE_t l, UTYPE_t r, UTYPE_t dim): # <<<<<<<<<<<<<< * """implements the quick sort algorithm on kdpoint arrays.""" * cdef UTYPE_t i, j, jstack = 0 */ static void __pyx_f_6cogent_5maths_7spatial_4ckd3_qsort(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_A, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_l, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_r, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_dim) { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_i; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_j; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_jstack; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_v; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *__pyx_v_istack; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("qsort", 0); /* "cogent/maths/spatial/ckd3.pyx":50 * cdef void qsort(kdpoint *A, UTYPE_t l, UTYPE_t r, UTYPE_t dim): * """implements the quick sort algorithm on kdpoint arrays.""" * cdef UTYPE_t i, j, jstack = 0 # <<<<<<<<<<<<<< * cdef DTYPE_t v * cdef UTYPE_t *istack = malloc(NSTACK * sizeof(UTYPE_t)) */ __pyx_v_jstack = 0; /* "cogent/maths/spatial/ckd3.pyx":52 * cdef UTYPE_t i, j, jstack = 0 * cdef DTYPE_t v * cdef UTYPE_t *istack = malloc(NSTACK * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * while True: * if r - l > 2: */ __pyx_v_istack = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t 
*)malloc((__pyx_e_6cogent_5maths_7spatial_4ckd3_NSTACK * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":53 * cdef DTYPE_t v * cdef UTYPE_t *istack = malloc(NSTACK * sizeof(UTYPE_t)) * while True: # <<<<<<<<<<<<<< * if r - l > 2: * i = (l + r) >> 1 */ while (1) { if (!1) break; /* "cogent/maths/spatial/ckd3.pyx":54 * cdef UTYPE_t *istack = malloc(NSTACK * sizeof(UTYPE_t)) * while True: * if r - l > 2: # <<<<<<<<<<<<<< * i = (l + r) >> 1 * if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) */ __pyx_t_1 = ((__pyx_v_r - __pyx_v_l) > 2); if (__pyx_t_1) { /* "cogent/maths/spatial/ckd3.pyx":55 * while True: * if r - l > 2: * i = (l + r) >> 1 # <<<<<<<<<<<<<< * if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) */ __pyx_v_i = ((__pyx_v_l + __pyx_v_r) >> 1); /* "cogent/maths/spatial/ckd3.pyx":56 * if r - l > 2: * i = (l + r) >> 1 * if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) # <<<<<<<<<<<<<< * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) */ __pyx_t_1 = (((__pyx_v_A[__pyx_v_l]).coords[__pyx_v_dim]) > ((__pyx_v_A[__pyx_v_i]).coords[__pyx_v_dim])); if (__pyx_t_1) { __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_l])), (&(__pyx_v_A[__pyx_v_i]))); goto __pyx_L6; } __pyx_L6:; /* "cogent/maths/spatial/ckd3.pyx":57 * i = (l + r) >> 1 * if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) # <<<<<<<<<<<<<< * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) * j = r - 1 */ __pyx_t_1 = (((__pyx_v_A[__pyx_v_l]).coords[__pyx_v_dim]) > ((__pyx_v_A[__pyx_v_r]).coords[__pyx_v_dim])); if (__pyx_t_1) { __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_l])), (&(__pyx_v_A[__pyx_v_r]))); goto __pyx_L7; } __pyx_L7:; /* "cogent/maths/spatial/ckd3.pyx":58 * if A[l].coords[dim] > 
A[i].coords[dim]: swap(&A[l], &A[i]) * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) # <<<<<<<<<<<<<< * j = r - 1 * swap(&A[i], &A[j]) */ __pyx_t_1 = (((__pyx_v_A[__pyx_v_i]).coords[__pyx_v_dim]) > ((__pyx_v_A[__pyx_v_r]).coords[__pyx_v_dim])); if (__pyx_t_1) { __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_i])), (&(__pyx_v_A[__pyx_v_r]))); goto __pyx_L8; } __pyx_L8:; /* "cogent/maths/spatial/ckd3.pyx":59 * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) * j = r - 1 # <<<<<<<<<<<<<< * swap(&A[i], &A[j]) * i = l */ __pyx_v_j = (__pyx_v_r - 1); /* "cogent/maths/spatial/ckd3.pyx":60 * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) * j = r - 1 * swap(&A[i], &A[j]) # <<<<<<<<<<<<<< * i = l * v = A[j].coords[dim] */ __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_i])), (&(__pyx_v_A[__pyx_v_j]))); /* "cogent/maths/spatial/ckd3.pyx":61 * j = r - 1 * swap(&A[i], &A[j]) * i = l # <<<<<<<<<<<<<< * v = A[j].coords[dim] * while True: */ __pyx_v_i = __pyx_v_l; /* "cogent/maths/spatial/ckd3.pyx":62 * swap(&A[i], &A[j]) * i = l * v = A[j].coords[dim] # <<<<<<<<<<<<<< * while True: * while A[i+1].coords[dim] < v: i+=1 */ __pyx_v_v = ((__pyx_v_A[__pyx_v_j]).coords[__pyx_v_dim]); /* "cogent/maths/spatial/ckd3.pyx":63 * i = l * v = A[j].coords[dim] * while True: # <<<<<<<<<<<<<< * while A[i+1].coords[dim] < v: i+=1 * i+=1 */ while (1) { if (!1) break; /* "cogent/maths/spatial/ckd3.pyx":64 * v = A[j].coords[dim] * while True: * while A[i+1].coords[dim] < v: i+=1 # <<<<<<<<<<<<<< * i+=1 * while A[j-1].coords[dim] > v: j-=1 */ while (1) { __pyx_t_1 = (((__pyx_v_A[(__pyx_v_i + 1)]).coords[__pyx_v_dim]) < __pyx_v_v); if (!__pyx_t_1) break; __pyx_v_i = (__pyx_v_i + 1); } /* "cogent/maths/spatial/ckd3.pyx":65 * while True: * while A[i+1].coords[dim] < v: i+=1 * i+=1 # <<<<<<<<<<<<<< * while 
A[j-1].coords[dim] > v: j-=1 * j-=1 */ __pyx_v_i = (__pyx_v_i + 1); /* "cogent/maths/spatial/ckd3.pyx":66 * while A[i+1].coords[dim] < v: i+=1 * i+=1 * while A[j-1].coords[dim] > v: j-=1 # <<<<<<<<<<<<<< * j-=1 * if j < i: */ while (1) { __pyx_t_1 = (((__pyx_v_A[(__pyx_v_j - 1)]).coords[__pyx_v_dim]) > __pyx_v_v); if (!__pyx_t_1) break; __pyx_v_j = (__pyx_v_j - 1); } /* "cogent/maths/spatial/ckd3.pyx":67 * i+=1 * while A[j-1].coords[dim] > v: j-=1 * j-=1 # <<<<<<<<<<<<<< * if j < i: * break */ __pyx_v_j = (__pyx_v_j - 1); /* "cogent/maths/spatial/ckd3.pyx":68 * while A[j-1].coords[dim] > v: j-=1 * j-=1 * if j < i: # <<<<<<<<<<<<<< * break * swap(&A[i], &A[j]) */ __pyx_t_1 = (__pyx_v_j < __pyx_v_i); if (__pyx_t_1) { /* "cogent/maths/spatial/ckd3.pyx":69 * j-=1 * if j < i: * break # <<<<<<<<<<<<<< * swap(&A[i], &A[j]) * swap(&A[i], &A[r-1]) */ goto __pyx_L10_break; goto __pyx_L15; } __pyx_L15:; /* "cogent/maths/spatial/ckd3.pyx":70 * if j < i: * break * swap(&A[i], &A[j]) # <<<<<<<<<<<<<< * swap(&A[i], &A[r-1]) * jstack += 2 */ __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_i])), (&(__pyx_v_A[__pyx_v_j]))); } __pyx_L10_break:; /* "cogent/maths/spatial/ckd3.pyx":71 * break * swap(&A[i], &A[j]) * swap(&A[i], &A[r-1]) # <<<<<<<<<<<<<< * jstack += 2 * if r - i >= j: */ __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_i])), (&(__pyx_v_A[(__pyx_v_r - 1)]))); /* "cogent/maths/spatial/ckd3.pyx":72 * swap(&A[i], &A[j]) * swap(&A[i], &A[r-1]) * jstack += 2 # <<<<<<<<<<<<<< * if r - i >= j: * istack[jstack] = r */ __pyx_v_jstack = (__pyx_v_jstack + 2); /* "cogent/maths/spatial/ckd3.pyx":73 * swap(&A[i], &A[r-1]) * jstack += 2 * if r - i >= j: # <<<<<<<<<<<<<< * istack[jstack] = r * istack[jstack - 1] = i */ __pyx_t_1 = ((__pyx_v_r - __pyx_v_i) >= __pyx_v_j); if (__pyx_t_1) { /* "cogent/maths/spatial/ckd3.pyx":74 * jstack += 2 * if r - i >= j: * istack[jstack] = r # <<<<<<<<<<<<<< * istack[jstack - 1] = i * r = j */ 
(__pyx_v_istack[__pyx_v_jstack]) = __pyx_v_r; /* "cogent/maths/spatial/ckd3.pyx":75 * if r - i >= j: * istack[jstack] = r * istack[jstack - 1] = i # <<<<<<<<<<<<<< * r = j * else: */ (__pyx_v_istack[(__pyx_v_jstack - 1)]) = __pyx_v_i; /* "cogent/maths/spatial/ckd3.pyx":76 * istack[jstack] = r * istack[jstack - 1] = i * r = j # <<<<<<<<<<<<<< * else: * istack[jstack] = j */ __pyx_v_r = __pyx_v_j; goto __pyx_L16; } /*else*/ { /* "cogent/maths/spatial/ckd3.pyx":78 * r = j * else: * istack[jstack] = j # <<<<<<<<<<<<<< * istack[jstack - 1] = l * l = i */ (__pyx_v_istack[__pyx_v_jstack]) = __pyx_v_j; /* "cogent/maths/spatial/ckd3.pyx":79 * else: * istack[jstack] = j * istack[jstack - 1] = l # <<<<<<<<<<<<<< * l = i * else: */ (__pyx_v_istack[(__pyx_v_jstack - 1)]) = __pyx_v_l; /* "cogent/maths/spatial/ckd3.pyx":80 * istack[jstack] = j * istack[jstack - 1] = l * l = i # <<<<<<<<<<<<<< * else: * i = (l + r) >> 1 */ __pyx_v_l = __pyx_v_i; } __pyx_L16:; goto __pyx_L5; } /*else*/ { /* "cogent/maths/spatial/ckd3.pyx":82 * l = i * else: * i = (l + r) >> 1 # <<<<<<<<<<<<<< * if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) */ __pyx_v_i = ((__pyx_v_l + __pyx_v_r) >> 1); /* "cogent/maths/spatial/ckd3.pyx":83 * else: * i = (l + r) >> 1 * if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) # <<<<<<<<<<<<<< * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) */ __pyx_t_1 = (((__pyx_v_A[__pyx_v_l]).coords[__pyx_v_dim]) > ((__pyx_v_A[__pyx_v_i]).coords[__pyx_v_dim])); if (__pyx_t_1) { __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_l])), (&(__pyx_v_A[__pyx_v_i]))); goto __pyx_L17; } __pyx_L17:; /* "cogent/maths/spatial/ckd3.pyx":84 * i = (l + r) >> 1 * if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) # <<<<<<<<<<<<<< * if A[i].coords[dim] > 
A[r].coords[dim]: swap(&A[i], &A[r]) * if jstack == 0: */ __pyx_t_1 = (((__pyx_v_A[__pyx_v_l]).coords[__pyx_v_dim]) > ((__pyx_v_A[__pyx_v_r]).coords[__pyx_v_dim])); if (__pyx_t_1) { __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_l])), (&(__pyx_v_A[__pyx_v_r]))); goto __pyx_L18; } __pyx_L18:; /* "cogent/maths/spatial/ckd3.pyx":85 * if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) # <<<<<<<<<<<<<< * if jstack == 0: * break */ __pyx_t_1 = (((__pyx_v_A[__pyx_v_i]).coords[__pyx_v_dim]) > ((__pyx_v_A[__pyx_v_r]).coords[__pyx_v_dim])); if (__pyx_t_1) { __pyx_f_6cogent_5maths_7spatial_4ckd3_swap((&(__pyx_v_A[__pyx_v_i])), (&(__pyx_v_A[__pyx_v_r]))); goto __pyx_L19; } __pyx_L19:; /* "cogent/maths/spatial/ckd3.pyx":86 * if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) * if jstack == 0: # <<<<<<<<<<<<<< * break * r = istack[jstack] */ __pyx_t_1 = (__pyx_v_jstack == 0); if (__pyx_t_1) { /* "cogent/maths/spatial/ckd3.pyx":87 * if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) * if jstack == 0: * break # <<<<<<<<<<<<<< * r = istack[jstack] * jstack-=1 */ goto __pyx_L4_break; goto __pyx_L20; } __pyx_L20:; /* "cogent/maths/spatial/ckd3.pyx":88 * if jstack == 0: * break * r = istack[jstack] # <<<<<<<<<<<<<< * jstack-=1 * l = istack[jstack] */ __pyx_v_r = (__pyx_v_istack[__pyx_v_jstack]); /* "cogent/maths/spatial/ckd3.pyx":89 * break * r = istack[jstack] * jstack-=1 # <<<<<<<<<<<<<< * l = istack[jstack] * jstack-=1 */ __pyx_v_jstack = (__pyx_v_jstack - 1); /* "cogent/maths/spatial/ckd3.pyx":90 * r = istack[jstack] * jstack-=1 * l = istack[jstack] # <<<<<<<<<<<<<< * jstack-=1 * free(istack) */ __pyx_v_l = (__pyx_v_istack[__pyx_v_jstack]); /* "cogent/maths/spatial/ckd3.pyx":91 * jstack-=1 * l = istack[jstack] * jstack-=1 # <<<<<<<<<<<<<< * free(istack) * */ 
__pyx_v_jstack = (__pyx_v_jstack - 1); } __pyx_L5:; } __pyx_L4_break:; /* "cogent/maths/spatial/ckd3.pyx":92 * l = istack[jstack] * jstack-=1 * free(istack) # <<<<<<<<<<<<<< * * cdef kdnode *build_tree(kdpoint *point_list, UTYPE_t start, UTYPE_t end,\ */ free(__pyx_v_istack); __Pyx_RefNannyFinishContext(); } /* "cogent/maths/spatial/ckd3.pyx":94 * free(istack) * * cdef kdnode *build_tree(kdpoint *point_list, UTYPE_t start, UTYPE_t end,\ # <<<<<<<<<<<<<< * UTYPE_t dims, UTYPE_t bucket_size, UTYPE_t depth): * """recursive tree building function.""" */ static struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_point_list, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_start, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_end, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_dims, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_bucket_size, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_depth) { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_split; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_node; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_r; __Pyx_RefNannyDeclarations int __pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("build_tree", 0); /* "cogent/maths/spatial/ckd3.pyx":99 * # cannot make variable in if/else * cdef UTYPE_t split, i * cdef kdnode *node = malloc(sizeof(kdnode)) # <<<<<<<<<<<<<< * node.dimension = depth % dims * node.start = start */ __pyx_v_node = ((struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *)malloc((sizeof(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode)))); /* "cogent/maths/spatial/ckd3.pyx":100 * cdef UTYPE_t split, i * cdef kdnode *node = malloc(sizeof(kdnode)) * node.dimension = depth % dims # <<<<<<<<<<<<<< * node.start = start * node.end = end */ if (unlikely(__pyx_v_dims == 0)) { 
PyErr_Format(PyExc_ZeroDivisionError, "integer division or modulo by zero"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 100; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_v_node->dimension = (__pyx_v_depth % __pyx_v_dims); /* "cogent/maths/spatial/ckd3.pyx":101 * cdef kdnode *node = malloc(sizeof(kdnode)) * node.dimension = depth % dims * node.start = start # <<<<<<<<<<<<<< * node.end = end * if end - start <= bucket_size: */ __pyx_v_node->start = __pyx_v_start; /* "cogent/maths/spatial/ckd3.pyx":102 * node.dimension = depth % dims * node.start = start * node.end = end # <<<<<<<<<<<<<< * if end - start <= bucket_size: * # make bucket node */ __pyx_v_node->end = __pyx_v_end; /* "cogent/maths/spatial/ckd3.pyx":103 * node.start = start * node.end = end * if end - start <= bucket_size: # <<<<<<<<<<<<<< * # make bucket node * node.bucket = 1 */ __pyx_t_1 = ((__pyx_v_end - __pyx_v_start) <= __pyx_v_bucket_size); if (__pyx_t_1) { /* "cogent/maths/spatial/ckd3.pyx":105 * if end - start <= bucket_size: * # make bucket node * node.bucket = 1 # <<<<<<<<<<<<<< * node.position = -1.0 * node.left = NULL */ __pyx_v_node->bucket = 1; /* "cogent/maths/spatial/ckd3.pyx":106 * # make bucket node * node.bucket = 1 * node.position = -1.0 # <<<<<<<<<<<<<< * node.left = NULL * node.right = NULL */ __pyx_v_node->position = -1.0; /* "cogent/maths/spatial/ckd3.pyx":107 * node.bucket = 1 * node.position = -1.0 * node.left = NULL # <<<<<<<<<<<<<< * node.right = NULL * else: */ __pyx_v_node->left = NULL; /* "cogent/maths/spatial/ckd3.pyx":108 * node.position = -1.0 * node.left = NULL * node.right = NULL # <<<<<<<<<<<<<< * else: * ## make branch node */ __pyx_v_node->right = NULL; goto __pyx_L3; } /*else*/ { /* "cogent/maths/spatial/ckd3.pyx":111 * else: * ## make branch node * node.bucket = 0 # <<<<<<<<<<<<<< * split = (start + end) / 2 * qsort(point_list, start, end, node.dimension) */ __pyx_v_node->bucket = 0; /* "cogent/maths/spatial/ckd3.pyx":112 * ## make branch node * 
node.bucket = 0 * split = (start + end) / 2 # <<<<<<<<<<<<<< * qsort(point_list, start, end, node.dimension) * node.position = point_list[split].coords[node.dimension] */ __pyx_v_split = ((__pyx_v_start + __pyx_v_end) / 2); /* "cogent/maths/spatial/ckd3.pyx":113 * node.bucket = 0 * split = (start + end) / 2 * qsort(point_list, start, end, node.dimension) # <<<<<<<<<<<<<< * node.position = point_list[split].coords[node.dimension] * # recurse */ __pyx_f_6cogent_5maths_7spatial_4ckd3_qsort(__pyx_v_point_list, __pyx_v_start, __pyx_v_end, __pyx_v_node->dimension); /* "cogent/maths/spatial/ckd3.pyx":114 * split = (start + end) / 2 * qsort(point_list, start, end, node.dimension) * node.position = point_list[split].coords[node.dimension] # <<<<<<<<<<<<<< * # recurse * node.left = build_tree(point_list, start, split, dims , bucket_size , depth+1) */ __pyx_v_node->position = ((__pyx_v_point_list[__pyx_v_split]).coords[__pyx_v_node->dimension]); /* "cogent/maths/spatial/ckd3.pyx":116 * node.position = point_list[split].coords[node.dimension] * # recurse * node.left = build_tree(point_list, start, split, dims , bucket_size , depth+1) # <<<<<<<<<<<<<< * node.right = build_tree(point_list, split+1, end, dims , bucket_size , depth+1) * return node */ __pyx_v_node->left = __pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree(__pyx_v_point_list, __pyx_v_start, __pyx_v_split, __pyx_v_dims, __pyx_v_bucket_size, (__pyx_v_depth + 1)); /* "cogent/maths/spatial/ckd3.pyx":117 * # recurse * node.left = build_tree(point_list, start, split, dims , bucket_size , depth+1) * node.right = build_tree(point_list, split+1, end, dims , bucket_size , depth+1) # <<<<<<<<<<<<<< * return node * */ __pyx_v_node->right = __pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree(__pyx_v_point_list, (__pyx_v_split + 1), __pyx_v_end, __pyx_v_dims, __pyx_v_bucket_size, (__pyx_v_depth + 1)); } __pyx_L3:; /* "cogent/maths/spatial/ckd3.pyx":118 * node.left = build_tree(point_list, start, split, dims , bucket_size , depth+1) * 
node.right = build_tree(point_list, split+1, end, dims , bucket_size , depth+1) * return node # <<<<<<<<<<<<<< * * cdef void *knn(kdnode *root, kdpoint *point_list, kdpoint point, DTYPE_t *dst,\ */ __pyx_r = __pyx_v_node; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_WriteUnraisable("cogent.maths.spatial.ckd3.build_tree", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":120 * return node * * cdef void *knn(kdnode *root, kdpoint *point_list, kdpoint point, DTYPE_t *dst,\ # <<<<<<<<<<<<<< * UTYPE_t *idx, UTYPE_t k, UTYPE_t dims): * """finds the K-Nearest Neighbors.""" */ static void *__pyx_f_6cogent_5maths_7spatial_4ckd3_knn(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_root, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_point_list, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint __pyx_v_point, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *__pyx_v_dst, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *__pyx_v_idx, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_k, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_dims) { struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_lstack[100]; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_i; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_j; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_jold; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_ia; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_kmin; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_a; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_i_dist; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_diff; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_node; int __pyx_v_jstack; void *__pyx_r; __Pyx_RefNannyDeclarations __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_t_1; int __pyx_t_2; int __pyx_t_3; int __pyx_t_4; 
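/*
 * Note on the bucket-scan loop below (comment only): dst[0..k-1] is kept
 * as a 0-based max-heap of the k best squared distances found so far,
 * with dst[0] the current worst of the best. When a closer point is
 * found it replaces the root and is sifted down (jold steps to the
 * larger of children 2*jold+1 and 2*jold+2), which keeps the pruning
 * test dst[0] >= diff*diff valid at the branch nodes. In pseudocode,
 * using the same variable names as the generated code:
 *
 *     if i_dist < dst[0]:            # better than current worst
 *         dst[0], idx[0] = i_dist, i
 *         sift_down(dst, idx, 0)     # restore the max-heap property
 */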
__Pyx_RefNannySetupContext("knn", 0); /* "cogent/maths/spatial/ckd3.pyx":132 * * # set helper variable to heap-queue * kmin = k - 1 # <<<<<<<<<<<<<< * * # initialize stack */ __pyx_v_kmin = (__pyx_v_k - 1); /* "cogent/maths/spatial/ckd3.pyx":135 * * # initialize stack * cdef int jstack = 1 # <<<<<<<<<<<<<< * lstack[jstack] = root * */ __pyx_v_jstack = 1; /* "cogent/maths/spatial/ckd3.pyx":136 * # initialize stack * cdef int jstack = 1 * lstack[jstack] = root # <<<<<<<<<<<<<< * * # initialize arrays */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_root; /* "cogent/maths/spatial/ckd3.pyx":139 * * # initialize arrays * for 0 <= i < k: # <<<<<<<<<<<<<< * dst[i] = 1000000000.00 # DBL_MAX * idx[i] = 2147483647 # INT_MAX */ __pyx_t_1 = __pyx_v_k; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_1; __pyx_v_i++) { /* "cogent/maths/spatial/ckd3.pyx":140 * # initialize arrays * for 0 <= i < k: * dst[i] = 1000000000.00 # DBL_MAX # <<<<<<<<<<<<<< * idx[i] = 2147483647 # INT_MAX * */ (__pyx_v_dst[__pyx_v_i]) = 1000000000.00; /* "cogent/maths/spatial/ckd3.pyx":141 * for 0 <= i < k: * dst[i] = 1000000000.00 # DBL_MAX * idx[i] = 2147483647 # INT_MAX # <<<<<<<<<<<<<< * * while jstack: */ (__pyx_v_idx[__pyx_v_i]) = 2147483647; } /* "cogent/maths/spatial/ckd3.pyx":143 * idx[i] = 2147483647 # INT_MAX * * while jstack: # <<<<<<<<<<<<<< * node = lstack[jstack] * jstack -= 1 */ while (1) { if (!__pyx_v_jstack) break; /* "cogent/maths/spatial/ckd3.pyx":144 * * while jstack: * node = lstack[jstack] # <<<<<<<<<<<<<< * jstack -= 1 * if node.bucket: */ __pyx_v_node = (__pyx_v_lstack[__pyx_v_jstack]); /* "cogent/maths/spatial/ckd3.pyx":145 * while jstack: * node = lstack[jstack] * jstack -= 1 # <<<<<<<<<<<<<< * if node.bucket: * for node.start <= i <= node.end: */ __pyx_v_jstack = (__pyx_v_jstack - 1); /* "cogent/maths/spatial/ckd3.pyx":146 * node = lstack[jstack] * jstack -= 1 * if node.bucket: # <<<<<<<<<<<<<< * for node.start <= i <= node.end: * i_dist = dist(&point_list[i], &point, dims) */ if 
(__pyx_v_node->bucket) { /* "cogent/maths/spatial/ckd3.pyx":147 * jstack -= 1 * if node.bucket: * for node.start <= i <= node.end: # <<<<<<<<<<<<<< * i_dist = dist(&point_list[i], &point, dims) * if i_dist < dst[0]: */ __pyx_t_1 = __pyx_v_node->end; for (__pyx_v_i = __pyx_v_node->start; __pyx_v_i <= __pyx_t_1; __pyx_v_i++) { /* "cogent/maths/spatial/ckd3.pyx":148 * if node.bucket: * for node.start <= i <= node.end: * i_dist = dist(&point_list[i], &point, dims) # <<<<<<<<<<<<<< * if i_dist < dst[0]: * dst[0] = i_dist */ __pyx_v_i_dist = __pyx_f_6cogent_5maths_7spatial_4ckd3_dist((&(__pyx_v_point_list[__pyx_v_i])), (&__pyx_v_point), __pyx_v_dims); /* "cogent/maths/spatial/ckd3.pyx":149 * for node.start <= i <= node.end: * i_dist = dist(&point_list[i], &point, dims) * if i_dist < dst[0]: # <<<<<<<<<<<<<< * dst[0] = i_dist * idx[0] = i */ __pyx_t_2 = (__pyx_v_i_dist < (__pyx_v_dst[0])); if (__pyx_t_2) { /* "cogent/maths/spatial/ckd3.pyx":150 * i_dist = dist(&point_list[i], &point, dims) * if i_dist < dst[0]: * dst[0] = i_dist # <<<<<<<<<<<<<< * idx[0] = i * if k > 1: */ (__pyx_v_dst[0]) = __pyx_v_i_dist; /* "cogent/maths/spatial/ckd3.pyx":151 * if i_dist < dst[0]: * dst[0] = i_dist * idx[0] = i # <<<<<<<<<<<<<< * if k > 1: * a = dst[0] */ (__pyx_v_idx[0]) = __pyx_v_i; /* "cogent/maths/spatial/ckd3.pyx":152 * dst[0] = i_dist * idx[0] = i * if k > 1: # <<<<<<<<<<<<<< * a = dst[0] * ia = idx[0] */ __pyx_t_2 = (__pyx_v_k > 1); if (__pyx_t_2) { /* "cogent/maths/spatial/ckd3.pyx":153 * idx[0] = i * if k > 1: * a = dst[0] # <<<<<<<<<<<<<< * ia = idx[0] * jold = 0 */ __pyx_v_a = (__pyx_v_dst[0]); /* "cogent/maths/spatial/ckd3.pyx":154 * if k > 1: * a = dst[0] * ia = idx[0] # <<<<<<<<<<<<<< * jold = 0 * j = 1 */ __pyx_v_ia = (__pyx_v_idx[0]); /* "cogent/maths/spatial/ckd3.pyx":155 * a = dst[0] * ia = idx[0] * jold = 0 # <<<<<<<<<<<<<< * j = 1 * while j <= kmin: */ __pyx_v_jold = 0; /* "cogent/maths/spatial/ckd3.pyx":156 * ia = idx[0] * jold = 0 * j = 1 # <<<<<<<<<<<<<< * while 
j <= kmin: * if (j < kmin) and (dst[j] < dst[j+1]): */ __pyx_v_j = 1; /* "cogent/maths/spatial/ckd3.pyx":157 * jold = 0 * j = 1 * while j <= kmin: # <<<<<<<<<<<<<< * if (j < kmin) and (dst[j] < dst[j+1]): * j+=1 */ while (1) { __pyx_t_2 = (__pyx_v_j <= __pyx_v_kmin); if (!__pyx_t_2) break; /* "cogent/maths/spatial/ckd3.pyx":158 * j = 1 * while j <= kmin: * if (j < kmin) and (dst[j] < dst[j+1]): # <<<<<<<<<<<<<< * j+=1 * if (a >= dst[j]): */ __pyx_t_2 = (__pyx_v_j < __pyx_v_kmin); if (__pyx_t_2) { __pyx_t_3 = ((__pyx_v_dst[__pyx_v_j]) < (__pyx_v_dst[(__pyx_v_j + 1)])); __pyx_t_4 = __pyx_t_3; } else { __pyx_t_4 = __pyx_t_2; } if (__pyx_t_4) { /* "cogent/maths/spatial/ckd3.pyx":159 * while j <= kmin: * if (j < kmin) and (dst[j] < dst[j+1]): * j+=1 # <<<<<<<<<<<<<< * if (a >= dst[j]): * break */ __pyx_v_j = (__pyx_v_j + 1); goto __pyx_L14; } __pyx_L14:; /* "cogent/maths/spatial/ckd3.pyx":160 * if (j < kmin) and (dst[j] < dst[j+1]): * j+=1 * if (a >= dst[j]): # <<<<<<<<<<<<<< * break * dst[jold] = dst[j] */ __pyx_t_4 = (__pyx_v_a >= (__pyx_v_dst[__pyx_v_j])); if (__pyx_t_4) { /* "cogent/maths/spatial/ckd3.pyx":161 * j+=1 * if (a >= dst[j]): * break # <<<<<<<<<<<<<< * dst[jold] = dst[j] * idx[jold] = idx[j] */ goto __pyx_L13_break; goto __pyx_L15; } __pyx_L15:; /* "cogent/maths/spatial/ckd3.pyx":162 * if (a >= dst[j]): * break * dst[jold] = dst[j] # <<<<<<<<<<<<<< * idx[jold] = idx[j] * jold = j */ (__pyx_v_dst[__pyx_v_jold]) = (__pyx_v_dst[__pyx_v_j]); /* "cogent/maths/spatial/ckd3.pyx":163 * break * dst[jold] = dst[j] * idx[jold] = idx[j] # <<<<<<<<<<<<<< * jold = j * j = 2*j + 1 */ (__pyx_v_idx[__pyx_v_jold]) = (__pyx_v_idx[__pyx_v_j]); /* "cogent/maths/spatial/ckd3.pyx":164 * dst[jold] = dst[j] * idx[jold] = idx[j] * jold = j # <<<<<<<<<<<<<< * j = 2*j + 1 * dst[jold] = a */ __pyx_v_jold = __pyx_v_j; /* "cogent/maths/spatial/ckd3.pyx":165 * idx[jold] = idx[j] * jold = j * j = 2*j + 1 # <<<<<<<<<<<<<< * dst[jold] = a * idx[jold] = ia */ __pyx_v_j = ((2 * __pyx_v_j) + 
1); } __pyx_L13_break:; /* "cogent/maths/spatial/ckd3.pyx":166 * jold = j * j = 2*j + 1 * dst[jold] = a # <<<<<<<<<<<<<< * idx[jold] = ia * else: */ (__pyx_v_dst[__pyx_v_jold]) = __pyx_v_a; /* "cogent/maths/spatial/ckd3.pyx":167 * j = 2*j + 1 * dst[jold] = a * idx[jold] = ia # <<<<<<<<<<<<<< * else: * diff = point.coords[node.dimension] - node.position */ (__pyx_v_idx[__pyx_v_jold]) = __pyx_v_ia; goto __pyx_L11; } __pyx_L11:; goto __pyx_L10; } __pyx_L10:; } goto __pyx_L7; } /*else*/ { /* "cogent/maths/spatial/ckd3.pyx":169 * idx[jold] = ia * else: * diff = point.coords[node.dimension] - node.position # <<<<<<<<<<<<<< * if diff < 0: * if dst[0] >= diff * diff: */ __pyx_v_diff = ((__pyx_v_point.coords[__pyx_v_node->dimension]) - __pyx_v_node->position); /* "cogent/maths/spatial/ckd3.pyx":170 * else: * diff = point.coords[node.dimension] - node.position * if diff < 0: # <<<<<<<<<<<<<< * if dst[0] >= diff * diff: * jstack+=1 */ __pyx_t_4 = (__pyx_v_diff < 0.0); if (__pyx_t_4) { /* "cogent/maths/spatial/ckd3.pyx":171 * diff = point.coords[node.dimension] - node.position * if diff < 0: * if dst[0] >= diff * diff: # <<<<<<<<<<<<<< * jstack+=1 * lstack[jstack] = node.right */ __pyx_t_4 = ((__pyx_v_dst[0]) >= (__pyx_v_diff * __pyx_v_diff)); if (__pyx_t_4) { /* "cogent/maths/spatial/ckd3.pyx":172 * if diff < 0: * if dst[0] >= diff * diff: * jstack+=1 # <<<<<<<<<<<<<< * lstack[jstack] = node.right * jstack+=1 */ __pyx_v_jstack = (__pyx_v_jstack + 1); /* "cogent/maths/spatial/ckd3.pyx":173 * if dst[0] >= diff * diff: * jstack+=1 * lstack[jstack] = node.right # <<<<<<<<<<<<<< * jstack+=1 * lstack[jstack] = node.left */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_node->right; goto __pyx_L17; } __pyx_L17:; /* "cogent/maths/spatial/ckd3.pyx":174 * jstack+=1 * lstack[jstack] = node.right * jstack+=1 # <<<<<<<<<<<<<< * lstack[jstack] = node.left * else: */ __pyx_v_jstack = (__pyx_v_jstack + 1); /* "cogent/maths/spatial/ckd3.pyx":175 * lstack[jstack] = node.right * jstack+=1 * 
lstack[jstack] = node.left # <<<<<<<<<<<<<< * else: * if dst[0] >= diff * diff: */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_node->left; goto __pyx_L16; } /*else*/ { /* "cogent/maths/spatial/ckd3.pyx":177 * lstack[jstack] = node.left * else: * if dst[0] >= diff * diff: # <<<<<<<<<<<<<< * jstack+=1 * lstack[jstack] = node.left */ __pyx_t_4 = ((__pyx_v_dst[0]) >= (__pyx_v_diff * __pyx_v_diff)); if (__pyx_t_4) { /* "cogent/maths/spatial/ckd3.pyx":178 * else: * if dst[0] >= diff * diff: * jstack+=1 # <<<<<<<<<<<<<< * lstack[jstack] = node.left * jstack+=1 */ __pyx_v_jstack = (__pyx_v_jstack + 1); /* "cogent/maths/spatial/ckd3.pyx":179 * if dst[0] >= diff * diff: * jstack+=1 * lstack[jstack] = node.left # <<<<<<<<<<<<<< * jstack+=1 * lstack[jstack] = node.right */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_node->left; goto __pyx_L18; } __pyx_L18:; /* "cogent/maths/spatial/ckd3.pyx":180 * jstack+=1 * lstack[jstack] = node.left * jstack+=1 # <<<<<<<<<<<<<< * lstack[jstack] = node.right * */ __pyx_v_jstack = (__pyx_v_jstack + 1); /* "cogent/maths/spatial/ckd3.pyx":181 * lstack[jstack] = node.left * jstack+=1 * lstack[jstack] = node.right # <<<<<<<<<<<<<< * * cdef UTYPE_t rn(kdnode *root, kdpoint *point_list, kdpoint point, DTYPE_t **dstptr,\ */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_node->right; } __pyx_L16:; } __pyx_L7:; } __pyx_r = 0; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":183 * lstack[jstack] = node.right * * cdef UTYPE_t rn(kdnode *root, kdpoint *point_list, kdpoint point, DTYPE_t **dstptr,\ # <<<<<<<<<<<<<< * UTYPE_t **idxptr, DTYPE_t r, UTYPE_t dims, UTYPE_t buf): * """finds points within radius of query.""" */ static __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_f_6cogent_5maths_7spatial_4ckd3_rn(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_root, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *__pyx_v_point_list, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint __pyx_v_point, 
__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **__pyx_v_dstptr, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **__pyx_v_idxptr, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_r, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_dims, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_buf) { struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_lstack[100]; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_i; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_count; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_i_dist; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_diff; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *__pyx_v_node; int __pyx_v_jstack; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_r; __Pyx_RefNannyDeclarations __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_t_1; int __pyx_t_2; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("rn", 0); /* "cogent/maths/spatial/ckd3.pyx":189 * # left nodes will be explored first. 
* cdef kdnode *lstack[100] * dstptr[0] = malloc(buf * sizeof(DTYPE_t)) # <<<<<<<<<<<<<< * idxptr[0] = malloc(buf * sizeof(UTYPE_t)) * */ (__pyx_v_dstptr[0]) = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *)malloc((__pyx_v_buf * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":190 * cdef kdnode *lstack[100] * dstptr[0] = malloc(buf * sizeof(DTYPE_t)) * idxptr[0] = malloc(buf * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * * cdef UTYPE_t i, count # counter and index */ (__pyx_v_idxptr[0]) = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *)malloc((__pyx_v_buf * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":197 * * # initialize stack * cdef int jstack = 1 # <<<<<<<<<<<<<< * lstack[jstack] = root * */ __pyx_v_jstack = 1; /* "cogent/maths/spatial/ckd3.pyx":198 * # initialize stack * cdef int jstack = 1 * lstack[jstack] = root # <<<<<<<<<<<<<< * * count = 0 */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_root; /* "cogent/maths/spatial/ckd3.pyx":200 * lstack[jstack] = root * * count = 0 # <<<<<<<<<<<<<< * while jstack: * node = lstack[jstack] */ __pyx_v_count = 0; /* "cogent/maths/spatial/ckd3.pyx":201 * * count = 0 * while jstack: # <<<<<<<<<<<<<< * node = lstack[jstack] * jstack -= 1 */ while (1) { if (!__pyx_v_jstack) break; /* "cogent/maths/spatial/ckd3.pyx":202 * count = 0 * while jstack: * node = lstack[jstack] # <<<<<<<<<<<<<< * jstack -= 1 * if node.bucket: */ __pyx_v_node = (__pyx_v_lstack[__pyx_v_jstack]); /* "cogent/maths/spatial/ckd3.pyx":203 * while jstack: * node = lstack[jstack] * jstack -= 1 # <<<<<<<<<<<<<< * if node.bucket: * for node.start <= i <= node.end: */ __pyx_v_jstack = (__pyx_v_jstack - 1); /* "cogent/maths/spatial/ckd3.pyx":204 * node = lstack[jstack] * jstack -= 1 * if node.bucket: # <<<<<<<<<<<<<< * for node.start <= i <= node.end: * i_dist = dist(&point_list[i], &point, dims) */ if (__pyx_v_node->bucket) { /* "cogent/maths/spatial/ckd3.pyx":205 * jstack -= 
1 * if node.bucket: * for node.start <= i <= node.end: # <<<<<<<<<<<<<< * i_dist = dist(&point_list[i], &point, dims) * if i_dist < r: */ __pyx_t_1 = __pyx_v_node->end; for (__pyx_v_i = __pyx_v_node->start; __pyx_v_i <= __pyx_t_1; __pyx_v_i++) { /* "cogent/maths/spatial/ckd3.pyx":206 * if node.bucket: * for node.start <= i <= node.end: * i_dist = dist(&point_list[i], &point, dims) # <<<<<<<<<<<<<< * if i_dist < r: * dstptr[0][count] = i_dist */ __pyx_v_i_dist = __pyx_f_6cogent_5maths_7spatial_4ckd3_dist((&(__pyx_v_point_list[__pyx_v_i])), (&__pyx_v_point), __pyx_v_dims); /* "cogent/maths/spatial/ckd3.pyx":207 * for node.start <= i <= node.end: * i_dist = dist(&point_list[i], &point, dims) * if i_dist < r: # <<<<<<<<<<<<<< * dstptr[0][count] = i_dist * idxptr[0][count] = i */ __pyx_t_2 = (__pyx_v_i_dist < __pyx_v_r); if (__pyx_t_2) { /* "cogent/maths/spatial/ckd3.pyx":208 * i_dist = dist(&point_list[i], &point, dims) * if i_dist < r: * dstptr[0][count] = i_dist # <<<<<<<<<<<<<< * idxptr[0][count] = i * count += 1 */ ((__pyx_v_dstptr[0])[__pyx_v_count]) = __pyx_v_i_dist; /* "cogent/maths/spatial/ckd3.pyx":209 * if i_dist < r: * dstptr[0][count] = i_dist * idxptr[0][count] = i # <<<<<<<<<<<<<< * count += 1 * if count % buf == 0: */ ((__pyx_v_idxptr[0])[__pyx_v_count]) = __pyx_v_i; /* "cogent/maths/spatial/ckd3.pyx":210 * dstptr[0][count] = i_dist * idxptr[0][count] = i * count += 1 # <<<<<<<<<<<<<< * if count % buf == 0: * dstptr[0] = realloc(dstptr[0], (count + buf) * sizeof(DTYPE_t)) */ __pyx_v_count = (__pyx_v_count + 1); /* "cogent/maths/spatial/ckd3.pyx":211 * idxptr[0][count] = i * count += 1 * if count % buf == 0: # <<<<<<<<<<<<<< * dstptr[0] = realloc(dstptr[0], (count + buf) * sizeof(DTYPE_t)) * idxptr[0] = realloc(idxptr[0], (count + buf) * sizeof(UTYPE_t)) */ if (unlikely(__pyx_v_buf == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "integer division or modulo by zero"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 211; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} } __pyx_t_2 = ((__pyx_v_count % __pyx_v_buf) == 0); if (__pyx_t_2) { /* "cogent/maths/spatial/ckd3.pyx":212 * count += 1 * if count % buf == 0: * dstptr[0] = realloc(dstptr[0], (count + buf) * sizeof(DTYPE_t)) # <<<<<<<<<<<<<< * idxptr[0] = realloc(idxptr[0], (count + buf) * sizeof(UTYPE_t)) * else: */ (__pyx_v_dstptr[0]) = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *)realloc((__pyx_v_dstptr[0]), ((__pyx_v_count + __pyx_v_buf) * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":213 * if count % buf == 0: * dstptr[0] = realloc(dstptr[0], (count + buf) * sizeof(DTYPE_t)) * idxptr[0] = realloc(idxptr[0], (count + buf) * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * else: * diff = point.coords[node.dimension] - node.position */ (__pyx_v_idxptr[0]) = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *)realloc((__pyx_v_idxptr[0]), ((__pyx_v_count + __pyx_v_buf) * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t))))); goto __pyx_L9; } __pyx_L9:; goto __pyx_L8; } __pyx_L8:; } goto __pyx_L5; } /*else*/ { /* "cogent/maths/spatial/ckd3.pyx":215 * idxptr[0] = realloc(idxptr[0], (count + buf) * sizeof(UTYPE_t)) * else: * diff = point.coords[node.dimension] - node.position # <<<<<<<<<<<<<< * if diff < 0: * if r >= diff * diff: */ __pyx_v_diff = ((__pyx_v_point.coords[__pyx_v_node->dimension]) - __pyx_v_node->position); /* "cogent/maths/spatial/ckd3.pyx":216 * else: * diff = point.coords[node.dimension] - node.position * if diff < 0: # <<<<<<<<<<<<<< * if r >= diff * diff: * jstack+=1 */ __pyx_t_2 = (__pyx_v_diff < 0.0); if (__pyx_t_2) { /* "cogent/maths/spatial/ckd3.pyx":217 * diff = point.coords[node.dimension] - node.position * if diff < 0: * if r >= diff * diff: # <<<<<<<<<<<<<< * jstack+=1 * lstack[jstack] = node.right */ __pyx_t_2 = (__pyx_v_r >= (__pyx_v_diff * __pyx_v_diff)); if (__pyx_t_2) { /* "cogent/maths/spatial/ckd3.pyx":218 * if diff < 0: * if r >= diff * diff: * jstack+=1 # <<<<<<<<<<<<<< * lstack[jstack] 
= node.right * jstack+=1 */ __pyx_v_jstack = (__pyx_v_jstack + 1); /* "cogent/maths/spatial/ckd3.pyx":219 * if r >= diff * diff: * jstack+=1 * lstack[jstack] = node.right # <<<<<<<<<<<<<< * jstack+=1 * lstack[jstack] = node.left */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_node->right; goto __pyx_L11; } __pyx_L11:; /* "cogent/maths/spatial/ckd3.pyx":220 * jstack+=1 * lstack[jstack] = node.right * jstack+=1 # <<<<<<<<<<<<<< * lstack[jstack] = node.left * else: */ __pyx_v_jstack = (__pyx_v_jstack + 1); /* "cogent/maths/spatial/ckd3.pyx":221 * lstack[jstack] = node.right * jstack+=1 * lstack[jstack] = node.left # <<<<<<<<<<<<<< * else: * if r >= diff * diff: */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_node->left; goto __pyx_L10; } /*else*/ { /* "cogent/maths/spatial/ckd3.pyx":223 * lstack[jstack] = node.left * else: * if r >= diff * diff: # <<<<<<<<<<<<<< * jstack+=1 * lstack[jstack] = node.left */ __pyx_t_2 = (__pyx_v_r >= (__pyx_v_diff * __pyx_v_diff)); if (__pyx_t_2) { /* "cogent/maths/spatial/ckd3.pyx":224 * else: * if r >= diff * diff: * jstack+=1 # <<<<<<<<<<<<<< * lstack[jstack] = node.left * jstack+=1 */ __pyx_v_jstack = (__pyx_v_jstack + 1); /* "cogent/maths/spatial/ckd3.pyx":225 * if r >= diff * diff: * jstack+=1 * lstack[jstack] = node.left # <<<<<<<<<<<<<< * jstack+=1 * lstack[jstack] = node.right */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_node->left; goto __pyx_L12; } __pyx_L12:; /* "cogent/maths/spatial/ckd3.pyx":226 * jstack+=1 * lstack[jstack] = node.left * jstack+=1 # <<<<<<<<<<<<<< * lstack[jstack] = node.right * dstptr[0] = realloc(dstptr[0], count * sizeof(DTYPE_t)) */ __pyx_v_jstack = (__pyx_v_jstack + 1); /* "cogent/maths/spatial/ckd3.pyx":227 * lstack[jstack] = node.left * jstack+=1 * lstack[jstack] = node.right # <<<<<<<<<<<<<< * dstptr[0] = realloc(dstptr[0], count * sizeof(DTYPE_t)) * idxptr[0] = realloc(idxptr[0], count * sizeof(UTYPE_t)) */ (__pyx_v_lstack[__pyx_v_jstack]) = __pyx_v_node->right; } __pyx_L10:; } __pyx_L5:; } /* 
"cogent/maths/spatial/ckd3.pyx":228 * jstack+=1 * lstack[jstack] = node.right * dstptr[0] = realloc(dstptr[0], count * sizeof(DTYPE_t)) # <<<<<<<<<<<<<< * idxptr[0] = realloc(idxptr[0], count * sizeof(UTYPE_t)) * return count */ (__pyx_v_dstptr[0]) = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *)realloc((__pyx_v_dstptr[0]), (__pyx_v_count * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":229 * lstack[jstack] = node.right * dstptr[0] = realloc(dstptr[0], count * sizeof(DTYPE_t)) * idxptr[0] = realloc(idxptr[0], count * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * return count * */ (__pyx_v_idxptr[0]) = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *)realloc((__pyx_v_idxptr[0]), (__pyx_v_count * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":230 * dstptr[0] = realloc(dstptr[0], count * sizeof(DTYPE_t)) * idxptr[0] = realloc(idxptr[0], count * sizeof(UTYPE_t)) * return count # <<<<<<<<<<<<<< * * cdef class KDTree: */ __pyx_r = __pyx_v_count; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_WriteUnraisable("cogent.maths.spatial.ckd3.rn", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static int __pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_1__init__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static int __pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_1__init__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_n_array = 0; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_bucket_size; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__n_array,&__pyx_n_s__bucket_size,0}; int __pyx_r; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__init__ (wrapper)", 0); { PyObject* values[2] = {0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = 
PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__n_array); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: if (kw_args > 0) { PyObject* value = PyDict_GetItem(__pyx_kwds, __pyx_n_s__bucket_size); if (value) { values[1] = value; kw_args--; } } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "__init__") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 241; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } if (values[1]) { } else { __pyx_v_bucket_size = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)5); } } else { switch (PyTuple_GET_SIZE(__pyx_args)) { case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); break; default: goto __pyx_L5_argtuple_error; } } __pyx_v_n_array = ((PyArrayObject *)values[0]); if (values[1]) { __pyx_v_bucket_size = __Pyx_PyInt_from_py_npy_uint64(values[1]); if (unlikely((__pyx_v_bucket_size == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 242; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } else { __pyx_v_bucket_size = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)5); } } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("__init__", 0, 1, 2, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 241; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.__init__", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return -1; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_n_array), 
__pyx_ptype_5numpy_ndarray, 1, "n_array", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 241; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree___init__(((struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)__pyx_v_self), __pyx_v_n_array, __pyx_v_bucket_size); goto __pyx_L0; __pyx_L1_error:; __pyx_r = -1; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":241 * cdef readonly UTYPE_t pnts * cdef readonly UTYPE_t bucket_size * def __init__(self, np.ndarray[DTYPE_t, ndim =2] n_array, \ # <<<<<<<<<<<<<< * UTYPE_t bucket_size =5): * self.bucket_size = bucket_size */ static int __pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree___init__(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self, PyArrayObject *__pyx_v_n_array, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_bucket_size) { __Pyx_LocalBuf_ND __pyx_pybuffernd_n_array; __Pyx_Buffer __pyx_pybuffer_n_array; int __pyx_r; __Pyx_RefNannyDeclarations int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("__init__", 0); __pyx_pybuffer_n_array.pybuffer.buf = NULL; __pyx_pybuffer_n_array.refcount = 0; __pyx_pybuffernd_n_array.data = NULL; __pyx_pybuffernd_n_array.rcbuffer = &__pyx_pybuffer_n_array; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_n_array.rcbuffer->pybuffer, (PyObject*)__pyx_v_n_array, &__Pyx_TypeInfo_nn___pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 241; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_n_array.diminfo[0].strides = __pyx_pybuffernd_n_array.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_n_array.diminfo[0].shape = __pyx_pybuffernd_n_array.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_n_array.diminfo[1].strides = 
__pyx_pybuffernd_n_array.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_n_array.diminfo[1].shape = __pyx_pybuffernd_n_array.rcbuffer->pybuffer.shape[1]; /* "cogent/maths/spatial/ckd3.pyx":243 * def __init__(self, np.ndarray[DTYPE_t, ndim =2] n_array, \ * UTYPE_t bucket_size =5): * self.bucket_size = bucket_size # <<<<<<<<<<<<<< * self.pnts = n_array.shape[0] * self.dims = n_array.shape[1] */ __pyx_v_self->bucket_size = __pyx_v_bucket_size; /* "cogent/maths/spatial/ckd3.pyx":244 * UTYPE_t bucket_size =5): * self.bucket_size = bucket_size * self.pnts = n_array.shape[0] # <<<<<<<<<<<<<< * self.dims = n_array.shape[1] * self.n_array = n_array */ __pyx_v_self->pnts = (__pyx_v_n_array->dimensions[0]); /* "cogent/maths/spatial/ckd3.pyx":245 * self.bucket_size = bucket_size * self.pnts = n_array.shape[0] * self.dims = n_array.shape[1] # <<<<<<<<<<<<<< * self.n_array = n_array * self.c_array = n_array.data */ __pyx_v_self->dims = (__pyx_v_n_array->dimensions[1]); /* "cogent/maths/spatial/ckd3.pyx":246 * self.pnts = n_array.shape[0] * self.dims = n_array.shape[1] * self.n_array = n_array # <<<<<<<<<<<<<< * self.c_array = n_array.data * self.kdpnts = points(self.c_array, \ */ __Pyx_INCREF(((PyObject *)__pyx_v_n_array)); __Pyx_GIVEREF(((PyObject *)__pyx_v_n_array)); __Pyx_GOTREF(__pyx_v_self->n_array); __Pyx_DECREF(((PyObject *)__pyx_v_self->n_array)); __pyx_v_self->n_array = ((PyArrayObject *)__pyx_v_n_array); /* "cogent/maths/spatial/ckd3.pyx":247 * self.dims = n_array.shape[1] * self.n_array = n_array * self.c_array = n_array.data # <<<<<<<<<<<<<< * self.kdpnts = points(self.c_array, \ * self.pnts, self.dims) */ __pyx_v_self->c_array = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *)__pyx_v_n_array->data); /* "cogent/maths/spatial/ckd3.pyx":248 * self.n_array = n_array * self.c_array = n_array.data * self.kdpnts = points(self.c_array, \ # <<<<<<<<<<<<<< * self.pnts, self.dims) * self.tree = build_tree(self.kdpnts, 0, self.pnts-1, \ */ __pyx_v_self->kdpnts = 
__pyx_f_6cogent_5maths_7spatial_4ckd3_points(__pyx_v_self->c_array, __pyx_v_self->pnts, __pyx_v_self->dims); /* "cogent/maths/spatial/ckd3.pyx":250 * self.kdpnts = points(self.c_array, \ * self.pnts, self.dims) * self.tree = build_tree(self.kdpnts, 0, self.pnts-1, \ # <<<<<<<<<<<<<< * self.dims,self.bucket_size,0) * import_array1(0) */ __pyx_v_self->tree = __pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree(__pyx_v_self->kdpnts, 0, (__pyx_v_self->pnts - 1), __pyx_v_self->dims, __pyx_v_self->bucket_size, 0); /* "cogent/maths/spatial/ckd3.pyx":252 * self.tree = build_tree(self.kdpnts, 0, self.pnts-1, \ * self.dims,self.bucket_size,0) * import_array1(0) # <<<<<<<<<<<<<< * * def knn(self, np.ndarray[DTYPE_t, ndim =1] point, npy_intp k): */ import_array1(0); __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_n_array.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.__init__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = -1; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_n_array.rcbuffer->pybuffer); __pyx_L2:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_3knn(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static char __pyx_doc_6cogent_5maths_7spatial_4ckd3_6KDTree_2knn[] = "Finds the K-Nearest Neighbors of given point.\n Arguments:\n - point: 1-d numpy array (query point).\n - k: number of neighbors to find."; static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_3knn(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_point = 0; npy_intp __pyx_v_k; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__point,&__pyx_n_s__k,0}; PyObject *__pyx_r = 0; 
__Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("knn (wrapper)", 0); { PyObject* values[2] = {0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__point); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__k); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("knn", 1, 2, 2, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 254; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "knn") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 254; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 2) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); } __pyx_v_point = ((PyArrayObject *)values[0]); __pyx_v_k = __Pyx_PyInt_from_py_Py_intptr_t(values[1]); if (unlikely((__pyx_v_k == (npy_intp)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 254; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("knn", 1, 2, 2, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 254; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.knn", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject 
*)__pyx_v_point), __pyx_ptype_5numpy_ndarray, 1, "point", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 254; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_2knn(((struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)__pyx_v_self), __pyx_v_point, __pyx_v_k); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":254 * import_array1(0) * * def knn(self, np.ndarray[DTYPE_t, ndim =1] point, npy_intp k): # <<<<<<<<<<<<<< * """Finds the K-Nearest Neighbors of given point. * Arguments: */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_2knn(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self, PyArrayObject *__pyx_v_point, npy_intp __pyx_v_k) { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_i; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint __pyx_v_pnt; CYTHON_UNUSED __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_size; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *__pyx_v_dst; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *__pyx_v_idx; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *__pyx_v_ridx; PyArrayObject *__pyx_v_dist = 0; PyArrayObject *__pyx_v_index = 0; __Pyx_LocalBuf_ND __pyx_pybuffernd_point; __Pyx_Buffer __pyx_pybuffer_point; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations int __pyx_t_1; PyObject *__pyx_t_2 = NULL; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_t_3; npy_intp __pyx_t_4; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("knn", 0); __pyx_pybuffer_point.pybuffer.buf = NULL; __pyx_pybuffer_point.refcount = 0; __pyx_pybuffernd_point.data = NULL; __pyx_pybuffernd_point.rcbuffer = &__pyx_pybuffer_point; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_point.rcbuffer->pybuffer, (PyObject*)__pyx_v_point, 
&__Pyx_TypeInfo_nn___pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 254; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_point.diminfo[0].strides = __pyx_pybuffernd_point.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_point.diminfo[0].shape = __pyx_pybuffernd_point.rcbuffer->pybuffer.shape[0]; /* "cogent/maths/spatial/ckd3.pyx":259 * - point: 1-d numpy array (query point). * - k: number of neighbors to find.""" * if self.pnts < k: # <<<<<<<<<<<<<< * return 1 * cdef UTYPE_t i */ __pyx_t_1 = (__pyx_v_self->pnts < __pyx_v_k); if (__pyx_t_1) { /* "cogent/maths/spatial/ckd3.pyx":260 * - k: number of neighbors to find.""" * if self.pnts < k: * return 1 # <<<<<<<<<<<<<< * cdef UTYPE_t i * cdef kdpoint pnt */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(__pyx_int_1); __pyx_r = __pyx_int_1; goto __pyx_L0; goto __pyx_L3; } __pyx_L3:; /* "cogent/maths/spatial/ckd3.pyx":263 * cdef UTYPE_t i * cdef kdpoint pnt * pnt.coords = point.data # <<<<<<<<<<<<<< * cdef UTYPE_t size = point.size * cdef DTYPE_t *dst = malloc(k * sizeof(DTYPE_t)) */ __pyx_v_pnt.coords = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *)__pyx_v_point->data); /* "cogent/maths/spatial/ckd3.pyx":264 * cdef kdpoint pnt * pnt.coords = point.data * cdef UTYPE_t size = point.size # <<<<<<<<<<<<<< * cdef DTYPE_t *dst = malloc(k * sizeof(DTYPE_t)) * cdef UTYPE_t *idx = malloc(k * sizeof(UTYPE_t)) */ __pyx_t_2 = PyObject_GetAttr(((PyObject *)__pyx_v_point), __pyx_n_s__size); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 264; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_3 = __Pyx_PyInt_from_py_npy_uint64(__pyx_t_2); if (unlikely((__pyx_t_3 == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 264; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_v_size = __pyx_t_3; /* 
"cogent/maths/spatial/ckd3.pyx":265 * pnt.coords = point.data * cdef UTYPE_t size = point.size * cdef DTYPE_t *dst = malloc(k * sizeof(DTYPE_t)) # <<<<<<<<<<<<<< * cdef UTYPE_t *idx = malloc(k * sizeof(UTYPE_t)) * cdef UTYPE_t *ridx = malloc(k * sizeof(UTYPE_t)) */ __pyx_v_dst = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *)malloc((__pyx_v_k * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":266 * cdef UTYPE_t size = point.size * cdef DTYPE_t *dst = malloc(k * sizeof(DTYPE_t)) * cdef UTYPE_t *idx = malloc(k * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * cdef UTYPE_t *ridx = malloc(k * sizeof(UTYPE_t)) * knn(self.tree, self.kdpnts, pnt, dst, idx, k, self.dims) */ __pyx_v_idx = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *)malloc((__pyx_v_k * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":267 * cdef DTYPE_t *dst = malloc(k * sizeof(DTYPE_t)) * cdef UTYPE_t *idx = malloc(k * sizeof(UTYPE_t)) * cdef UTYPE_t *ridx = malloc(k * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * knn(self.tree, self.kdpnts, pnt, dst, idx, k, self.dims) * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &k, NPY_DOUBLE, dst) */ __pyx_v_ridx = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *)malloc((__pyx_v_k * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":268 * cdef UTYPE_t *idx = malloc(k * sizeof(UTYPE_t)) * cdef UTYPE_t *ridx = malloc(k * sizeof(UTYPE_t)) * knn(self.tree, self.kdpnts, pnt, dst, idx, k, self.dims) # <<<<<<<<<<<<<< * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &k, NPY_DOUBLE, dst) * for 0 <= i < k: */ __pyx_f_6cogent_5maths_7spatial_4ckd3_knn(__pyx_v_self->tree, __pyx_v_self->kdpnts, __pyx_v_pnt, __pyx_v_dst, __pyx_v_idx, __pyx_v_k, __pyx_v_self->dims); /* "cogent/maths/spatial/ckd3.pyx":269 * cdef UTYPE_t *ridx = malloc(k * sizeof(UTYPE_t)) * knn(self.tree, self.kdpnts, pnt, dst, idx, k, self.dims) * cdef np.ndarray dist = 
PyArray_SimpleNewFromData(1, &k, NPY_DOUBLE, dst) # <<<<<<<<<<<<<< * for 0 <= i < k: * ridx[i] = self.kdpnts[idx[i]].index */ __pyx_t_2 = PyArray_SimpleNewFromData(1, (&__pyx_v_k), NPY_DOUBLE, ((void *)__pyx_v_dst)); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 269; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); if (!(likely(((__pyx_t_2) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_2, __pyx_ptype_5numpy_ndarray))))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 269; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_dist = ((PyArrayObject *)__pyx_t_2); __pyx_t_2 = 0; /* "cogent/maths/spatial/ckd3.pyx":270 * knn(self.tree, self.kdpnts, pnt, dst, idx, k, self.dims) * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &k, NPY_DOUBLE, dst) * for 0 <= i < k: # <<<<<<<<<<<<<< * ridx[i] = self.kdpnts[idx[i]].index * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &k, NPY_ULONGLONG, ridx) */ __pyx_t_4 = __pyx_v_k; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_4; __pyx_v_i++) { /* "cogent/maths/spatial/ckd3.pyx":271 * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &k, NPY_DOUBLE, dst) * for 0 <= i < k: * ridx[i] = self.kdpnts[idx[i]].index # <<<<<<<<<<<<<< * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &k, NPY_ULONGLONG, ridx) * free(idx) */ (__pyx_v_ridx[__pyx_v_i]) = (__pyx_v_self->kdpnts[(__pyx_v_idx[__pyx_v_i])]).index; } /* "cogent/maths/spatial/ckd3.pyx":272 * for 0 <= i < k: * ridx[i] = self.kdpnts[idx[i]].index * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &k, NPY_ULONGLONG, ridx) # <<<<<<<<<<<<<< * free(idx) * return (index, dist) */ __pyx_t_2 = PyArray_SimpleNewFromData(1, (&__pyx_v_k), NPY_ULONGLONG, ((void *)__pyx_v_ridx)); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 272; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); if (!(likely(((__pyx_t_2) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_2, __pyx_ptype_5numpy_ndarray))))) 
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 272; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_index = ((PyArrayObject *)__pyx_t_2); __pyx_t_2 = 0; /* "cogent/maths/spatial/ckd3.pyx":273 * ridx[i] = self.kdpnts[idx[i]].index * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &k, NPY_ULONGLONG, ridx) * free(idx) # <<<<<<<<<<<<<< * return (index, dist) * */ free(__pyx_v_idx); /* "cogent/maths/spatial/ckd3.pyx":274 * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &k, NPY_ULONGLONG, ridx) * free(idx) * return (index, dist) # <<<<<<<<<<<<<< * * def rn(self, np.ndarray[DTYPE_t, ndim =1] point, DTYPE_t r): */ __Pyx_XDECREF(__pyx_r); __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 274; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_INCREF(((PyObject *)__pyx_v_index)); PyTuple_SET_ITEM(__pyx_t_2, 0, ((PyObject *)__pyx_v_index)); __Pyx_GIVEREF(((PyObject *)__pyx_v_index)); __Pyx_INCREF(((PyObject *)__pyx_v_dist)); PyTuple_SET_ITEM(__pyx_t_2, 1, ((PyObject *)__pyx_v_dist)); __Pyx_GIVEREF(((PyObject *)__pyx_v_dist)); __pyx_r = ((PyObject *)__pyx_t_2); __pyx_t_2 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_2); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_point.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.knn", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_point.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_dist); __Pyx_XDECREF((PyObject *)__pyx_v_index); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_5rn(PyObject 
*__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static char __pyx_doc_6cogent_5maths_7spatial_4ckd3_6KDTree_4rn[] = "Returns Radius Neighbors i.e. within radius from query point."; static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_5rn(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_point = 0; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_r; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__point,&__pyx_n_s__r,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("rn (wrapper)", 0); { PyObject* values[2] = {0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__point); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__r); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("rn", 1, 2, 2, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 276; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "rn") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 276; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 2) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); } __pyx_v_point = ((PyArrayObject *)values[0]); __pyx_v_r = __pyx_PyFloat_AsDouble(values[1]); if (unlikely((__pyx_v_r == (npy_float64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 276; __pyx_clineno 
= __LINE__; goto __pyx_L3_error;} } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("rn", 1, 2, 2, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 276; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.rn", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_point), __pyx_ptype_5numpy_ndarray, 1, "point", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 276; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4rn(((struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)__pyx_v_self), __pyx_v_point, __pyx_v_r); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":276 * return (index, dist) * * def rn(self, np.ndarray[DTYPE_t, ndim =1] point, DTYPE_t r): # <<<<<<<<<<<<<< * """Returns Radius Neighbors i.e. 
within radius from query point.""" * cdef UTYPE_t i */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4rn(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self, PyArrayObject *__pyx_v_point, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t __pyx_v_r) { __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_i; npy_intp __pyx_v_j; struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint __pyx_v_pnt; CYTHON_UNUSED __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_v_size; __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **__pyx_v_dstptr; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **__pyx_v_idxptr; PyArrayObject *__pyx_v_dist = 0; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *__pyx_v_ridx; PyArrayObject *__pyx_v_index = 0; __Pyx_LocalBuf_ND __pyx_pybuffernd_point; __Pyx_Buffer __pyx_pybuffer_point; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t __pyx_t_2; npy_intp __pyx_t_3; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("rn", 0); __pyx_pybuffer_point.pybuffer.buf = NULL; __pyx_pybuffer_point.refcount = 0; __pyx_pybuffernd_point.data = NULL; __pyx_pybuffernd_point.rcbuffer = &__pyx_pybuffer_point; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_point.rcbuffer->pybuffer, (PyObject*)__pyx_v_point, &__Pyx_TypeInfo_nn___pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 276; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_point.diminfo[0].strides = __pyx_pybuffernd_point.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_point.diminfo[0].shape = __pyx_pybuffernd_point.rcbuffer->pybuffer.shape[0]; /* "cogent/maths/spatial/ckd3.pyx":281 * cdef npy_intp j * cdef kdpoint pnt * pnt.coords = point.data # <<<<<<<<<<<<<< * cdef UTYPE_t size = point.size * cdef 
DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) */ __pyx_v_pnt.coords = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *)__pyx_v_point->data); /* "cogent/maths/spatial/ckd3.pyx":282 * cdef kdpoint pnt * pnt.coords = point.data * cdef UTYPE_t size = point.size # <<<<<<<<<<<<<< * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) */ __pyx_t_1 = PyObject_GetAttr(((PyObject *)__pyx_v_point), __pyx_n_s__size); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = __Pyx_PyInt_from_py_npy_uint64(__pyx_t_1); if (unlikely((__pyx_t_2 == (npy_uint64)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_v_size = __pyx_t_2; /* "cogent/maths/spatial/ckd3.pyx":283 * pnt.coords = point.data * cdef UTYPE_t size = point.size * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) # <<<<<<<<<<<<<< * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) * j = rn(self.tree, self.kdpnts, pnt, dstptr, idxptr, r, self.dims, 100) */ __pyx_v_dstptr = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **)malloc((sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *)))); /* "cogent/maths/spatial/ckd3.pyx":284 * cdef UTYPE_t size = point.size * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) # <<<<<<<<<<<<<< * j = rn(self.tree, self.kdpnts, pnt, dstptr, idxptr, r, self.dims, 100) * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &j, NPY_DOUBLE, dstptr[0]) */ __pyx_v_idxptr = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **)malloc((sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *)))); /* "cogent/maths/spatial/ckd3.pyx":285 * cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) * j = rn(self.tree, self.kdpnts, pnt, dstptr, idxptr, r, 
self.dims, 100) # <<<<<<<<<<<<<< * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &j, NPY_DOUBLE, dstptr[0]) * cdef UTYPE_t *ridx = malloc(j * sizeof(UTYPE_t)) */ __pyx_v_j = ((npy_intp)__pyx_f_6cogent_5maths_7spatial_4ckd3_rn(__pyx_v_self->tree, __pyx_v_self->kdpnts, __pyx_v_pnt, __pyx_v_dstptr, __pyx_v_idxptr, __pyx_v_r, __pyx_v_self->dims, 100)); /* "cogent/maths/spatial/ckd3.pyx":286 * cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) * j = rn(self.tree, self.kdpnts, pnt, dstptr, idxptr, r, self.dims, 100) * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &j, NPY_DOUBLE, dstptr[0]) # <<<<<<<<<<<<<< * cdef UTYPE_t *ridx = malloc(j * sizeof(UTYPE_t)) * for 0 <= i < j: */ __pyx_t_1 = PyArray_SimpleNewFromData(1, (&__pyx_v_j), NPY_DOUBLE, ((void *)(__pyx_v_dstptr[0]))); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 286; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); if (!(likely(((__pyx_t_1) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_1, __pyx_ptype_5numpy_ndarray))))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 286; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_dist = ((PyArrayObject *)__pyx_t_1); __pyx_t_1 = 0; /* "cogent/maths/spatial/ckd3.pyx":287 * j = rn(self.tree, self.kdpnts, pnt, dstptr, idxptr, r, self.dims, 100) * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &j, NPY_DOUBLE, dstptr[0]) * cdef UTYPE_t *ridx = malloc(j * sizeof(UTYPE_t)) # <<<<<<<<<<<<<< * for 0 <= i < j: * ridx[i] = self.kdpnts[idxptr[0][i]].index */ __pyx_v_ridx = ((__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *)malloc((__pyx_v_j * (sizeof(__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t))))); /* "cogent/maths/spatial/ckd3.pyx":288 * cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &j, NPY_DOUBLE, dstptr[0]) * cdef UTYPE_t *ridx = malloc(j * sizeof(UTYPE_t)) * for 0 <= i < j: # <<<<<<<<<<<<<< * ridx[i] = self.kdpnts[idxptr[0][i]].index * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &j, NPY_ULONGLONG, 
ridx) */ __pyx_t_3 = __pyx_v_j; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_3; __pyx_v_i++) { /* "cogent/maths/spatial/ckd3.pyx":289 * cdef UTYPE_t *ridx = malloc(j * sizeof(UTYPE_t)) * for 0 <= i < j: * ridx[i] = self.kdpnts[idxptr[0][i]].index # <<<<<<<<<<<<<< * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &j, NPY_ULONGLONG, ridx) * free(idxptr[0]) */ (__pyx_v_ridx[__pyx_v_i]) = (__pyx_v_self->kdpnts[((__pyx_v_idxptr[0])[__pyx_v_i])]).index; } /* "cogent/maths/spatial/ckd3.pyx":290 * for 0 <= i < j: * ridx[i] = self.kdpnts[idxptr[0][i]].index * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &j, NPY_ULONGLONG, ridx) # <<<<<<<<<<<<<< * free(idxptr[0]) * free(idxptr) */ __pyx_t_1 = PyArray_SimpleNewFromData(1, (&__pyx_v_j), NPY_ULONGLONG, ((void *)__pyx_v_ridx)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 290; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); if (!(likely(((__pyx_t_1) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_1, __pyx_ptype_5numpy_ndarray))))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 290; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_index = ((PyArrayObject *)__pyx_t_1); __pyx_t_1 = 0; /* "cogent/maths/spatial/ckd3.pyx":291 * ridx[i] = self.kdpnts[idxptr[0][i]].index * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &j, NPY_ULONGLONG, ridx) * free(idxptr[0]) # <<<<<<<<<<<<<< * free(idxptr) * free(dstptr) */ free((__pyx_v_idxptr[0])); /* "cogent/maths/spatial/ckd3.pyx":292 * cdef np.ndarray index = PyArray_SimpleNewFromData(1, &j, NPY_ULONGLONG, ridx) * free(idxptr[0]) * free(idxptr) # <<<<<<<<<<<<<< * free(dstptr) * return (index, dist) */ free(__pyx_v_idxptr); /* "cogent/maths/spatial/ckd3.pyx":293 * free(idxptr[0]) * free(idxptr) * free(dstptr) # <<<<<<<<<<<<<< * return (index, dist) * */ free(__pyx_v_dstptr); /* "cogent/maths/spatial/ckd3.pyx":294 * free(idxptr) * free(dstptr) * return (index, dist) # <<<<<<<<<<<<<< * */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = 
PyTuple_New(2); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 294; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __Pyx_INCREF(((PyObject *)__pyx_v_index)); PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_v_index)); __Pyx_GIVEREF(((PyObject *)__pyx_v_index)); __Pyx_INCREF(((PyObject *)__pyx_v_dist)); PyTuple_SET_ITEM(__pyx_t_1, 1, ((PyObject *)__pyx_v_dist)); __Pyx_GIVEREF(((PyObject *)__pyx_v_dist)); __pyx_r = ((PyObject *)__pyx_t_1); __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_point.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.rn", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_point.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_dist); __Pyx_XDECREF((PyObject *)__pyx_v_index); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_4dims_1__get__(PyObject *__pyx_v_self); /*proto*/ static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_4dims_1__get__(PyObject *__pyx_v_self) { PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__get__ (wrapper)", 0); __pyx_r = __pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4dims___get__(((struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)__pyx_v_self)); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":238 * cdef kdpoint *kdpnts * cdef kdnode *tree * cdef readonly UTYPE_t dims # <<<<<<<<<<<<<< * cdef readonly UTYPE_t pnts * cdef readonly UTYPE_t bucket_size */ static PyObject 
*__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4dims___get__(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("__get__", 0); __Pyx_XDECREF(__pyx_r); __pyx_t_1 = __Pyx_PyInt_to_py_npy_uint64(__pyx_v_self->dims); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 238; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.dims.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_4pnts_1__get__(PyObject *__pyx_v_self); /*proto*/ static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_4pnts_1__get__(PyObject *__pyx_v_self) { PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__get__ (wrapper)", 0); __pyx_r = __pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4pnts___get__(((struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)__pyx_v_self)); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":239 * cdef kdnode *tree * cdef readonly UTYPE_t dims * cdef readonly UTYPE_t pnts # <<<<<<<<<<<<<< * cdef readonly UTYPE_t bucket_size * def __init__(self, np.ndarray[DTYPE_t, ndim =2] n_array, \ */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_4pnts___get__(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int 
__pyx_clineno = 0; __Pyx_RefNannySetupContext("__get__", 0); __Pyx_XDECREF(__pyx_r); __pyx_t_1 = __Pyx_PyInt_to_py_npy_uint64(__pyx_v_self->pnts); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 239; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.pnts.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_11bucket_size_1__get__(PyObject *__pyx_v_self); /*proto*/ static PyObject *__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_11bucket_size_1__get__(PyObject *__pyx_v_self) { PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__get__ (wrapper)", 0); __pyx_r = __pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_11bucket_size___get__(((struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)__pyx_v_self)); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/maths/spatial/ckd3.pyx":240 * cdef readonly UTYPE_t dims * cdef readonly UTYPE_t pnts * cdef readonly UTYPE_t bucket_size # <<<<<<<<<<<<<< * def __init__(self, np.ndarray[DTYPE_t, ndim =2] n_array, \ * UTYPE_t bucket_size =5): */ static PyObject *__pyx_pf_6cogent_5maths_7spatial_4ckd3_6KDTree_11bucket_size___get__(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *__pyx_v_self) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("__get__", 0); __Pyx_XDECREF(__pyx_r); __pyx_t_1 = __Pyx_PyInt_to_py_npy_uint64(__pyx_v_self->bucket_size); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 240; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("cogent.maths.spatial.ckd3.KDTree.bucket_size.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_r; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__getbuffer__ (wrapper)", 0); __pyx_r = __pyx_pf_5numpy_7ndarray___getbuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info), ((int)__pyx_v_flags)); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":193 * # experimental exception made for __getbuffer__ and __releasebuffer__ * # -- the details of this may change. * def __getbuffer__(ndarray self, Py_buffer* info, int flags): # <<<<<<<<<<<<<< * # This implementation of getbuffer is geared towards Cython * # requirements, and does not yet fullfill the PEP. 
*/ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_v_copy_shape; int __pyx_v_i; int __pyx_v_ndim; int __pyx_v_endian_detector; int __pyx_v_little_endian; int __pyx_v_t; char *__pyx_v_f; PyArray_Descr *__pyx_v_descr = 0; int __pyx_v_offset; int __pyx_v_hasfields; int __pyx_r; __Pyx_RefNannyDeclarations int __pyx_t_1; int __pyx_t_2; int __pyx_t_3; PyObject *__pyx_t_4 = NULL; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; PyObject *__pyx_t_8 = NULL; char *__pyx_t_9; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("__getbuffer__", 0); if (__pyx_v_info != NULL) { __pyx_v_info->obj = Py_None; __Pyx_INCREF(Py_None); __Pyx_GIVEREF(__pyx_v_info->obj); } /* "numpy.pxd":199 * # of flags * * if info == NULL: return # <<<<<<<<<<<<<< * * cdef int copy_shape, i, ndim */ __pyx_t_1 = (__pyx_v_info == NULL); if (__pyx_t_1) { __pyx_r = 0; goto __pyx_L0; goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":202 * * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * */ __pyx_v_endian_detector = 1; /* "numpy.pxd":203 * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * * ndim = PyArray_NDIM(self) */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":205 * cdef bint little_endian = ((&endian_detector)[0] != 0) * * ndim = PyArray_NDIM(self) # <<<<<<<<<<<<<< * * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_v_ndim = PyArray_NDIM(__pyx_v_self); /* "numpy.pxd":207 * ndim = PyArray_NDIM(self) * * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * copy_shape = 1 * else: */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":208 * * if sizeof(npy_intp) != sizeof(Py_ssize_t): * copy_shape = 1 # 
<<<<<<<<<<<<<< * else: * copy_shape = 0 */ __pyx_v_copy_shape = 1; goto __pyx_L4; } /*else*/ { /* "numpy.pxd":210 * copy_shape = 1 * else: * copy_shape = 0 # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) */ __pyx_v_copy_shape = 0; } __pyx_L4:; /* "numpy.pxd":212 * copy_shape = 0 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") */ __pyx_t_1 = ((__pyx_v_flags & PyBUF_C_CONTIGUOUS) == PyBUF_C_CONTIGUOUS); if (__pyx_t_1) { /* "numpy.pxd":213 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not C contiguous") * */ __pyx_t_2 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_C_CONTIGUOUS)); __pyx_t_3 = __pyx_t_2; } else { __pyx_t_3 = __pyx_t_1; } if (__pyx_t_3) { /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_2), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":216 * raise ValueError(u"ndarray is not C contiguous") * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") */ __pyx_t_3 = ((__pyx_v_flags & 
PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS); if (__pyx_t_3) { /* "numpy.pxd":217 * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not Fortran contiguous") * */ __pyx_t_1 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_F_CONTIGUOUS)); __pyx_t_2 = __pyx_t_1; } else { __pyx_t_2 = __pyx_t_3; } if (__pyx_t_2) { /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_4), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":220 * raise ValueError(u"ndarray is not Fortran contiguous") * * info.buf = PyArray_DATA(self) # <<<<<<<<<<<<<< * info.ndim = ndim * if copy_shape: */ __pyx_v_info->buf = PyArray_DATA(__pyx_v_self); /* "numpy.pxd":221 * * info.buf = PyArray_DATA(self) * info.ndim = ndim # <<<<<<<<<<<<<< * if copy_shape: * # Allocate new buffer for strides and shape info. */ __pyx_v_info->ndim = __pyx_v_ndim; /* "numpy.pxd":222 * info.buf = PyArray_DATA(self) * info.ndim = ndim * if copy_shape: # <<<<<<<<<<<<<< * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. */ if (__pyx_v_copy_shape) { /* "numpy.pxd":225 * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. 
* info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) # <<<<<<<<<<<<<< * info.shape = info.strides + ndim * for i in range(ndim): */ __pyx_v_info->strides = ((Py_ssize_t *)malloc((((sizeof(Py_ssize_t)) * ((size_t)__pyx_v_ndim)) * 2))); /* "numpy.pxd":226 * # This is allocated as one block, strides first. * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim # <<<<<<<<<<<<<< * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] */ __pyx_v_info->shape = (__pyx_v_info->strides + __pyx_v_ndim); /* "numpy.pxd":227 * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim * for i in range(ndim): # <<<<<<<<<<<<<< * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] */ __pyx_t_5 = __pyx_v_ndim; for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) { __pyx_v_i = __pyx_t_6; /* "numpy.pxd":228 * info.shape = info.strides + ndim * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] # <<<<<<<<<<<<<< * info.shape[i] = PyArray_DIMS(self)[i] * else: */ (__pyx_v_info->strides[__pyx_v_i]) = (PyArray_STRIDES(__pyx_v_self)[__pyx_v_i]); /* "numpy.pxd":229 * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] # <<<<<<<<<<<<<< * else: * info.strides = PyArray_STRIDES(self) */ (__pyx_v_info->shape[__pyx_v_i]) = (PyArray_DIMS(__pyx_v_self)[__pyx_v_i]); } goto __pyx_L7; } /*else*/ { /* "numpy.pxd":231 * info.shape[i] = PyArray_DIMS(self)[i] * else: * info.strides = PyArray_STRIDES(self) # <<<<<<<<<<<<<< * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL */ __pyx_v_info->strides = ((Py_ssize_t *)PyArray_STRIDES(__pyx_v_self)); /* "numpy.pxd":232 * else: * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) # <<<<<<<<<<<<<< * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) */ __pyx_v_info->shape = ((Py_ssize_t *)PyArray_DIMS(__pyx_v_self)); } 
__pyx_L7:; /* "numpy.pxd":233 * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL # <<<<<<<<<<<<<< * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) */ __pyx_v_info->suboffsets = NULL; /* "numpy.pxd":234 * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) # <<<<<<<<<<<<<< * info.readonly = not PyArray_ISWRITEABLE(self) * */ __pyx_v_info->itemsize = PyArray_ITEMSIZE(__pyx_v_self); /* "numpy.pxd":235 * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) # <<<<<<<<<<<<<< * * cdef int t */ __pyx_v_info->readonly = (!PyArray_ISWRITEABLE(__pyx_v_self)); /* "numpy.pxd":238 * * cdef int t * cdef char* f = NULL # <<<<<<<<<<<<<< * cdef dtype descr = self.descr * cdef list stack */ __pyx_v_f = NULL; /* "numpy.pxd":239 * cdef int t * cdef char* f = NULL * cdef dtype descr = self.descr # <<<<<<<<<<<<<< * cdef list stack * cdef int offset */ __Pyx_INCREF(((PyObject *)__pyx_v_self->descr)); __pyx_v_descr = __pyx_v_self->descr; /* "numpy.pxd":243 * cdef int offset * * cdef bint hasfields = PyDataType_HASFIELDS(descr) # <<<<<<<<<<<<<< * * if not hasfields and not copy_shape: */ __pyx_v_hasfields = PyDataType_HASFIELDS(__pyx_v_descr); /* "numpy.pxd":245 * cdef bint hasfields = PyDataType_HASFIELDS(descr) * * if not hasfields and not copy_shape: # <<<<<<<<<<<<<< * # do not call releasebuffer * info.obj = None */ __pyx_t_2 = (!__pyx_v_hasfields); if (__pyx_t_2) { __pyx_t_3 = (!__pyx_v_copy_shape); __pyx_t_1 = __pyx_t_3; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":247 * if not hasfields and not copy_shape: * # do not call releasebuffer * info.obj = None # <<<<<<<<<<<<<< * else: * # need to call releasebuffer */ __Pyx_INCREF(Py_None); __Pyx_GIVEREF(Py_None); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = Py_None; goto __pyx_L10; } 
/*else*/ { /* "numpy.pxd":250 * else: * # need to call releasebuffer * info.obj = self # <<<<<<<<<<<<<< * * if not hasfields: */ __Pyx_INCREF(((PyObject *)__pyx_v_self)); __Pyx_GIVEREF(((PyObject *)__pyx_v_self)); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = ((PyObject *)__pyx_v_self); } __pyx_L10:; /* "numpy.pxd":252 * info.obj = self * * if not hasfields: # <<<<<<<<<<<<<< * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or */ __pyx_t_1 = (!__pyx_v_hasfields); if (__pyx_t_1) { /* "numpy.pxd":253 * * if not hasfields: * t = descr.type_num # <<<<<<<<<<<<<< * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): */ __pyx_v_t = __pyx_v_descr->type_num; /* "numpy.pxd":254 * if not hasfields: * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_1 = (__pyx_v_descr->byteorder == '>'); if (__pyx_t_1) { __pyx_t_2 = __pyx_v_little_endian; } else { __pyx_t_2 = __pyx_t_1; } if (!__pyx_t_2) { /* "numpy.pxd":255 * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" */ __pyx_t_1 = (__pyx_v_descr->byteorder == '<'); if (__pyx_t_1) { __pyx_t_3 = (!__pyx_v_little_endian); __pyx_t_7 = __pyx_t_3; } else { __pyx_t_7 = __pyx_t_1; } __pyx_t_1 = __pyx_t_7; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_6), NULL); if 
(unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L12; } __pyx_L12:; /* "numpy.pxd":257 * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" */ __pyx_t_1 = (__pyx_v_t == NPY_BYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__b; goto __pyx_L13; } /* "numpy.pxd":258 * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" */ __pyx_t_1 = (__pyx_v_t == NPY_UBYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__B; goto __pyx_L13; } /* "numpy.pxd":259 * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" */ __pyx_t_1 = (__pyx_v_t == NPY_SHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__h; goto __pyx_L13; } /* "numpy.pxd":260 * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" */ __pyx_t_1 = (__pyx_v_t == NPY_USHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__H; goto __pyx_L13; } /* "numpy.pxd":261 * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" */ __pyx_t_1 = (__pyx_v_t == NPY_INT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__i; goto __pyx_L13; } /* "numpy.pxd":262 * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" # <<<<<<<<<<<<<< * elif t 
== NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" */ __pyx_t_1 = (__pyx_v_t == NPY_UINT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__I; goto __pyx_L13; } /* "numpy.pxd":263 * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" */ __pyx_t_1 = (__pyx_v_t == NPY_LONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__l; goto __pyx_L13; } /* "numpy.pxd":264 * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__L; goto __pyx_L13; } /* "numpy.pxd":265 * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__q; goto __pyx_L13; } /* "numpy.pxd":266 * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Q; goto __pyx_L13; } /* "numpy.pxd":267 * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" */ __pyx_t_1 = (__pyx_v_t == NPY_FLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__f; goto __pyx_L13; } /* "numpy.pxd":268 * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" */ __pyx_t_1 = (__pyx_v_t == NPY_DOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__d; goto __pyx_L13; } /* "numpy.pxd":269 * elif t == NPY_FLOAT: f = "f" * 
elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__g; goto __pyx_L13; } /* "numpy.pxd":270 * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" */ __pyx_t_1 = (__pyx_v_t == NPY_CFLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zf; goto __pyx_L13; } /* "numpy.pxd":271 * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" */ __pyx_t_1 = (__pyx_v_t == NPY_CDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zd; goto __pyx_L13; } /* "numpy.pxd":272 * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f = "O" * else: */ __pyx_t_1 = (__pyx_v_t == NPY_CLONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zg; goto __pyx_L13; } /* "numpy.pxd":273 * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_1 = (__pyx_v_t == NPY_OBJECT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__O; goto __pyx_L13; } /*else*/ { /* "numpy.pxd":275 * elif t == NPY_OBJECT: f = "O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * info.format = f * return */ __pyx_t_4 = PyInt_FromLong(__pyx_v_t); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_8 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_t_4); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_8)); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_8)); __Pyx_GIVEREF(((PyObject *)__pyx_t_8)); __pyx_t_8 = 0; __pyx_t_8 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_8); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_8, 0, 0, 0); __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L13:; /* "numpy.pxd":276 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f # <<<<<<<<<<<<<< * return * else: */ __pyx_v_info->format = __pyx_v_f; /* "numpy.pxd":277 * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f * return # <<<<<<<<<<<<<< * else: * info.format = stdlib.malloc(_buffer_format_string_len) */ __pyx_r = 0; goto __pyx_L0; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":279 * return * else: * info.format = stdlib.malloc(_buffer_format_string_len) # <<<<<<<<<<<<<< * info.format[0] = '^' # Native data types, manual alignment * offset = 0 */ __pyx_v_info->format = ((char *)malloc(255)); /* "numpy.pxd":280 * else: * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment # <<<<<<<<<<<<<< * offset = 0 * f = _util_dtypestring(descr, info.format + 1, */ (__pyx_v_info->format[0]) = '^'; /* "numpy.pxd":281 * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment * offset = 0 # <<<<<<<<<<<<<< * f = 
_util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, */ __pyx_v_offset = 0; /* "numpy.pxd":284 * f = _util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, * &offset) # <<<<<<<<<<<<<< * f[0] = 0 # Terminate format string * */ __pyx_t_9 = __pyx_f_5numpy__util_dtypestring(__pyx_v_descr, (__pyx_v_info->format + 1), (__pyx_v_info->format + 255), (&__pyx_v_offset)); if (unlikely(__pyx_t_9 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_9; /* "numpy.pxd":285 * info.format + _buffer_format_string_len, * &offset) * f[0] = 0 # Terminate format string # <<<<<<<<<<<<<< * * def __releasebuffer__(ndarray self, Py_buffer* info): */ (__pyx_v_f[0]) = 0; } __pyx_L11:; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_8); __Pyx_AddTraceback("numpy.ndarray.__getbuffer__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = -1; if (__pyx_v_info != NULL && __pyx_v_info->obj != NULL) { __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = NULL; } goto __pyx_L2; __pyx_L0:; if (__pyx_v_info != NULL && __pyx_v_info->obj == Py_None) { __Pyx_GOTREF(Py_None); __Pyx_DECREF(Py_None); __pyx_v_info->obj = NULL; } __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_descr); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info); /*proto*/ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__releasebuffer__ (wrapper)", 0); __pyx_pf_5numpy_7ndarray_2__releasebuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info)); __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":287 * f[0] = 0 # Terminate format string * * def __releasebuffer__(ndarray self, 
Py_buffer* info): # <<<<<<<<<<<<<< * if PyArray_HASFIELDS(self): * stdlib.free(info.format) */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("__releasebuffer__", 0); /* "numpy.pxd":288 * * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): # <<<<<<<<<<<<<< * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_t_1 = PyArray_HASFIELDS(__pyx_v_self); if (__pyx_t_1) { /* "numpy.pxd":289 * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): * stdlib.free(info.format) # <<<<<<<<<<<<<< * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) */ free(__pyx_v_info->format); goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":290 * if PyArray_HASFIELDS(self): * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * stdlib.free(info.strides) * # info.shape was stored after info.strides in the same block */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":291 * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) # <<<<<<<<<<<<<< * # info.shape was stored after info.strides in the same block * */ free(__pyx_v_info->strides); goto __pyx_L4; } __pyx_L4:; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":767 * ctypedef npy_cdouble complex_t * * cdef inline object PyArray_MultiIterNew1(a): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(1, a) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew1(PyObject *__pyx_v_a) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew1", 0); /* "numpy.pxd":768 * * cdef inline object PyArray_MultiIterNew1(a): * return 
PyArray_MultiIterNew(1, a) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew2(a, b): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(1, ((void *)__pyx_v_a)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 768; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew1", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":770 * return PyArray_MultiIterNew(1, a) * * cdef inline object PyArray_MultiIterNew2(a, b): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(2, a, b) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew2(PyObject *__pyx_v_a, PyObject *__pyx_v_b) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew2", 0); /* "numpy.pxd":771 * * cdef inline object PyArray_MultiIterNew2(a, b): * return PyArray_MultiIterNew(2, a, b) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew3(a, b, c): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(2, ((void *)__pyx_v_a), ((void *)__pyx_v_b)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 771; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew2", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":773 * return 
PyArray_MultiIterNew(2, a, b) * * cdef inline object PyArray_MultiIterNew3(a, b, c): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(3, a, b, c) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew3(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew3", 0); /* "numpy.pxd":774 * * cdef inline object PyArray_MultiIterNew3(a, b, c): * return PyArray_MultiIterNew(3, a, b, c) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(3, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 774; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew3", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":776 * return PyArray_MultiIterNew(3, a, b, c) * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(4, a, b, c, d) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew4(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew4", 0); /* "numpy.pxd":777 * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): * return PyArray_MultiIterNew(4, a, b, c, d) # 
<<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(4, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 777; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew4", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":779 * return PyArray_MultiIterNew(4, a, b, c, d) * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(5, a, b, c, d, e) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew5(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d, PyObject *__pyx_v_e) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew5", 0); /* "numpy.pxd":780 * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): * return PyArray_MultiIterNew(5, a, b, c, d, e) # <<<<<<<<<<<<<< * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(5, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d), ((void *)__pyx_v_e)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 780; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; 
__Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew5", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":782 * return PyArray_MultiIterNew(5, a, b, c, d, e) * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: # <<<<<<<<<<<<<< * # Recursive utility function used in __getbuffer__ to get format * # string. The new location in the format string is returned. */ static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *__pyx_v_descr, char *__pyx_v_f, char *__pyx_v_end, int *__pyx_v_offset) { PyArray_Descr *__pyx_v_child = 0; int __pyx_v_endian_detector; int __pyx_v_little_endian; PyObject *__pyx_v_fields = 0; PyObject *__pyx_v_childname = NULL; PyObject *__pyx_v_new_offset = NULL; PyObject *__pyx_v_t = NULL; char *__pyx_r; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; Py_ssize_t __pyx_t_2; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; PyObject *__pyx_t_5 = NULL; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; long __pyx_t_10; char *__pyx_t_11; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("_util_dtypestring", 0); /* "numpy.pxd":789 * cdef int delta_offset * cdef tuple i * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * cdef tuple fields */ __pyx_v_endian_detector = 1; /* "numpy.pxd":790 * cdef tuple i * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * cdef tuple fields * */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":793 * cdef tuple fields * * for childname in descr.names: # <<<<<<<<<<<<<< * fields = descr.fields[childname] * child, new_offset = fields */ if (unlikely(((PyObject *)__pyx_v_descr->names) == Py_None)) { 
PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 793; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_1 = ((PyObject *)__pyx_v_descr->names); __Pyx_INCREF(__pyx_t_1); __pyx_t_2 = 0; for (;;) { if (__pyx_t_2 >= PyTuple_GET_SIZE(__pyx_t_1)) break; __pyx_t_3 = PyTuple_GET_ITEM(__pyx_t_1, __pyx_t_2); __Pyx_INCREF(__pyx_t_3); __pyx_t_2++; __Pyx_XDECREF(__pyx_v_childname); __pyx_v_childname = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":794 * * for childname in descr.names: * fields = descr.fields[childname] # <<<<<<<<<<<<<< * child, new_offset = fields * */ __pyx_t_3 = PyObject_GetItem(__pyx_v_descr->fields, __pyx_v_childname); if (!__pyx_t_3) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); if (!(likely(PyTuple_CheckExact(__pyx_t_3))||((__pyx_t_3) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected tuple, got %.200s", Py_TYPE(__pyx_t_3)->tp_name), 0))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_fields)); __pyx_v_fields = ((PyObject*)__pyx_t_3); __pyx_t_3 = 0; /* "numpy.pxd":795 * for childname in descr.names: * fields = descr.fields[childname] * child, new_offset = fields # <<<<<<<<<<<<<< * * if (end - f) - (new_offset - offset[0]) < 15: */ if (likely(PyTuple_CheckExact(((PyObject *)__pyx_v_fields)))) { PyObject* sequence = ((PyObject *)__pyx_v_fields); if (unlikely(PyTuple_GET_SIZE(sequence) != 2)) { if (PyTuple_GET_SIZE(sequence) > 2) __Pyx_RaiseTooManyValuesError(2); else __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(sequence)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_3 = PyTuple_GET_ITEM(sequence, 0); __pyx_t_4 = PyTuple_GET_ITEM(sequence, 1); __Pyx_INCREF(__pyx_t_3); __Pyx_INCREF(__pyx_t_4); } else { __Pyx_UnpackTupleError(((PyObject *)__pyx_v_fields), 2); 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } if (!(likely(((__pyx_t_3) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_3, __pyx_ptype_5numpy_dtype))))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_child)); __pyx_v_child = ((PyArray_Descr *)__pyx_t_3); __pyx_t_3 = 0; __Pyx_XDECREF(__pyx_v_new_offset); __pyx_v_new_offset = __pyx_t_4; __pyx_t_4 = 0; /* "numpy.pxd":797 * child, new_offset = fields * * if (end - f) - (new_offset - offset[0]) < 15: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * */ __pyx_t_4 = PyInt_FromLong((__pyx_v_end - __pyx_v_f)); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyNumber_Subtract(__pyx_v_new_offset, __pyx_t_3); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyNumber_Subtract(__pyx_t_4, __pyx_t_5); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = PyObject_RichCompare(__pyx_t_3, __pyx_int_15, Py_LT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_t_5 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_9), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":800 * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * * if ((child.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_6 = (__pyx_v_child->byteorder == '>'); if (__pyx_t_6) { __pyx_t_7 = __pyx_v_little_endian; } else { __pyx_t_7 = __pyx_t_6; } if (!__pyx_t_7) { /* "numpy.pxd":801 * * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * # One could encode it in the format string and have Cython */ __pyx_t_6 = (__pyx_v_child->byteorder == '<'); if (__pyx_t_6) { __pyx_t_8 = (!__pyx_v_little_endian); __pyx_t_9 = __pyx_t_8; } else { __pyx_t_9 = __pyx_t_6; } __pyx_t_6 = __pyx_t_9; } else { __pyx_t_6 = __pyx_t_7; } if (__pyx_t_6) { /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # 
complain instead, BUT: < and > in format strings also imply */ __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_10), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":812 * * # Output padding bytes * while offset[0] < new_offset: # <<<<<<<<<<<<<< * f[0] = 120 # "x"; pad byte * f += 1 */ while (1) { __pyx_t_5 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_v_new_offset, Py_LT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!__pyx_t_6) break; /* "numpy.pxd":813 * # Output padding bytes * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte # <<<<<<<<<<<<<< * f += 1 * offset[0] += 1 */ (__pyx_v_f[0]) = 120; /* "numpy.pxd":814 * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte * f += 1 # <<<<<<<<<<<<<< * offset[0] += 1 * */ __pyx_v_f = (__pyx_v_f + 1); /* "numpy.pxd":815 * f[0] = 120 # "x"; pad byte * f += 1 * offset[0] += 1 # <<<<<<<<<<<<<< * * offset[0] += child.itemsize */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + 1); } /* "numpy.pxd":817 * offset[0] += 1 * * offset[0] += child.itemsize # <<<<<<<<<<<<<< * * if not 
PyDataType_HASFIELDS(child): */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + __pyx_v_child->elsize); /* "numpy.pxd":819 * offset[0] += child.itemsize * * if not PyDataType_HASFIELDS(child): # <<<<<<<<<<<<<< * t = child.type_num * if end - f < 5: */ __pyx_t_6 = (!PyDataType_HASFIELDS(__pyx_v_child)); if (__pyx_t_6) { /* "numpy.pxd":820 * * if not PyDataType_HASFIELDS(child): * t = child.type_num # <<<<<<<<<<<<<< * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") */ __pyx_t_3 = PyInt_FromLong(__pyx_v_child->type_num); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 820; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_XDECREF(__pyx_v_t); __pyx_v_t = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":821 * if not PyDataType_HASFIELDS(child): * t = child.type_num * if end - f < 5: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short.") * */ __pyx_t_6 = ((__pyx_v_end - __pyx_v_f) < 5); if (__pyx_t_6) { /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_t_3 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_12), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_Raise(__pyx_t_3, 0, 0, 0); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L10; } __pyx_L10:; /* "numpy.pxd":825 * * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" */ __pyx_t_3 = PyInt_FromLong(NPY_BYTE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 98; goto __pyx_L11; } /* "numpy.pxd":826 * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" */ __pyx_t_5 = PyInt_FromLong(NPY_UBYTE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 66; goto __pyx_L11; } /* "numpy.pxd":827 * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" */ __pyx_t_3 = PyInt_FromLong(NPY_SHORT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 
= PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 104; goto __pyx_L11; } /* "numpy.pxd":828 * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" */ __pyx_t_5 = PyInt_FromLong(NPY_USHORT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 72; goto __pyx_L11; } /* "numpy.pxd":829 * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" */ __pyx_t_3 = PyInt_FromLong(NPY_INT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 
829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 105; goto __pyx_L11; } /* "numpy.pxd":830 * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" # <<<<<<<<<<<<<< * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" */ __pyx_t_5 = PyInt_FromLong(NPY_UINT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 73; goto __pyx_L11; } /* "numpy.pxd":831 * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" */ __pyx_t_3 = PyInt_FromLong(NPY_LONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; 
__pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 108; goto __pyx_L11; } /* "numpy.pxd":832 * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 76; goto __pyx_L11; } /* "numpy.pxd":833 * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" */ __pyx_t_3 = PyInt_FromLong(NPY_LONGLONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = 
__pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 113; goto __pyx_L11; } /* "numpy.pxd":834 * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONGLONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 81; goto __pyx_L11; } /* "numpy.pxd":835 * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" */ __pyx_t_3 = PyInt_FromLong(NPY_FLOAT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 102; goto __pyx_L11; } /* "numpy.pxd":836 * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf */ __pyx_t_5 = PyInt_FromLong(NPY_DOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 100; goto __pyx_L11; } /* "numpy.pxd":837 * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd */ __pyx_t_3 = PyInt_FromLong(NPY_LONGDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); 
__pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 103; goto __pyx_L11; } /* "numpy.pxd":838 * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg */ __pyx_t_5 = PyInt_FromLong(NPY_CFLOAT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 102; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":839 * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" */ __pyx_t_3 = PyInt_FromLong(NPY_CDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 100; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":840 * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: */ __pyx_t_5 = PyInt_FromLong(NPY_CLONGDOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 103; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":841 * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_3 = PyInt_FromLong(NPY_OBJECT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = 
__Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 79; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":843 * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * f += 1 * else: */ __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_v_t); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L11:; /* "numpy.pxd":844 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * f += 1 # <<<<<<<<<<<<<< * else: * # Cython ignores struct boundary information ("T{...}"), */ __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L9; } /*else*/ { /* "numpy.pxd":848 * # Cython ignores struct boundary information ("T{...}"), * # so don't output it * f = _util_dtypestring(child, f, end, offset) # <<<<<<<<<<<<<< * return f * */ __pyx_t_11 = __pyx_f_5numpy__util_dtypestring(__pyx_v_child, __pyx_v_f, __pyx_v_end, __pyx_v_offset); if (unlikely(__pyx_t_11 == NULL)) 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 848; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_11; } __pyx_L9:; } __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "numpy.pxd":849 * # so don't output it * f = _util_dtypestring(child, f, end, offset) * return f # <<<<<<<<<<<<<< * * */ __pyx_r = __pyx_v_f; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_AddTraceback("numpy._util_dtypestring", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF((PyObject *)__pyx_v_child); __Pyx_XDECREF(__pyx_v_fields); __Pyx_XDECREF(__pyx_v_childname); __Pyx_XDECREF(__pyx_v_new_offset); __Pyx_XDECREF(__pyx_v_t); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":964 * * * cdef inline void set_array_base(ndarray arr, object base): # <<<<<<<<<<<<<< * cdef PyObject* baseptr * if base is None: */ static CYTHON_INLINE void __pyx_f_5numpy_set_array_base(PyArrayObject *__pyx_v_arr, PyObject *__pyx_v_base) { PyObject *__pyx_v_baseptr; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("set_array_base", 0); /* "numpy.pxd":966 * cdef inline void set_array_base(ndarray arr, object base): * cdef PyObject* baseptr * if base is None: # <<<<<<<<<<<<<< * baseptr = NULL * else: */ __pyx_t_1 = (__pyx_v_base == Py_None); if (__pyx_t_1) { /* "numpy.pxd":967 * cdef PyObject* baseptr * if base is None: * baseptr = NULL # <<<<<<<<<<<<<< * else: * Py_INCREF(base) # important to do this before decref below! */ __pyx_v_baseptr = NULL; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":969 * baseptr = NULL * else: * Py_INCREF(base) # important to do this before decref below! # <<<<<<<<<<<<<< * baseptr = base * Py_XDECREF(arr.base) */ Py_INCREF(__pyx_v_base); /* "numpy.pxd":970 * else: * Py_INCREF(base) # important to do this before decref below! 
* baseptr = base # <<<<<<<<<<<<<< * Py_XDECREF(arr.base) * arr.base = baseptr */ __pyx_v_baseptr = ((PyObject *)__pyx_v_base); } __pyx_L3:; /* "numpy.pxd":971 * Py_INCREF(base) # important to do this before decref below! * baseptr = base * Py_XDECREF(arr.base) # <<<<<<<<<<<<<< * arr.base = baseptr * */ Py_XDECREF(__pyx_v_arr->base); /* "numpy.pxd":972 * baseptr = base * Py_XDECREF(arr.base) * arr.base = baseptr # <<<<<<<<<<<<<< * * cdef inline object get_array_base(ndarray arr): */ __pyx_v_arr->base = __pyx_v_baseptr; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObject *__pyx_v_arr) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("get_array_base", 0); /* "numpy.pxd":975 * * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: # <<<<<<<<<<<<<< * return None * else: */ __pyx_t_1 = (__pyx_v_arr->base == NULL); if (__pyx_t_1) { /* "numpy.pxd":976 * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: * return None # <<<<<<<<<<<<<< * else: * return arr.base */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(Py_None); __pyx_r = Py_None; goto __pyx_L0; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":978 * return None * else: * return arr.base # <<<<<<<<<<<<<< */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_arr->base)); __pyx_r = ((PyObject *)__pyx_v_arr->base); goto __pyx_L0; } __pyx_L3:; __pyx_r = Py_None; __Pyx_INCREF(Py_None); __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_tp_new_6cogent_5maths_7spatial_4ckd3_KDTree(PyTypeObject *t, PyObject *a, PyObject *k) { struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *p; PyObject *o = (*t->tp_alloc)(t, 0); if (!o) return 0; p = ((struct 
__pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)o); p->n_array = ((PyArrayObject *)Py_None); Py_INCREF(Py_None); return o; } static void __pyx_tp_dealloc_6cogent_5maths_7spatial_4ckd3_KDTree(PyObject *o) { struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *p = (struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)o; Py_XDECREF(((PyObject *)p->n_array)); (*Py_TYPE(o)->tp_free)(o); } static int __pyx_tp_traverse_6cogent_5maths_7spatial_4ckd3_KDTree(PyObject *o, visitproc v, void *a) { int e; struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *p = (struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)o; if (p->n_array) { e = (*v)(((PyObject*)p->n_array), a); if (e) return e; } return 0; } static int __pyx_tp_clear_6cogent_5maths_7spatial_4ckd3_KDTree(PyObject *o) { struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *p = (struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree *)o; PyObject* tmp; tmp = ((PyObject*)p->n_array); p->n_array = ((PyArrayObject *)Py_None); Py_INCREF(Py_None); Py_XDECREF(tmp); return 0; } static PyObject *__pyx_getprop_6cogent_5maths_7spatial_4ckd3_6KDTree_dims(PyObject *o, void *x) { return __pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_4dims_1__get__(o); } static PyObject *__pyx_getprop_6cogent_5maths_7spatial_4ckd3_6KDTree_pnts(PyObject *o, void *x) { return __pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_4pnts_1__get__(o); } static PyObject *__pyx_getprop_6cogent_5maths_7spatial_4ckd3_6KDTree_bucket_size(PyObject *o, void *x) { return __pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_11bucket_size_1__get__(o); } static PyMethodDef __pyx_methods_6cogent_5maths_7spatial_4ckd3_KDTree[] = { {__Pyx_NAMESTR("knn"), (PyCFunction)__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_3knn, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(__pyx_doc_6cogent_5maths_7spatial_4ckd3_6KDTree_2knn)}, {__Pyx_NAMESTR("rn"), (PyCFunction)__pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_5rn, METH_VARARGS|METH_KEYWORDS, 
__Pyx_DOCSTR(__pyx_doc_6cogent_5maths_7spatial_4ckd3_6KDTree_4rn)}, {0, 0, 0, 0} }; static struct PyGetSetDef __pyx_getsets_6cogent_5maths_7spatial_4ckd3_KDTree[] = { {(char *)"dims", __pyx_getprop_6cogent_5maths_7spatial_4ckd3_6KDTree_dims, 0, 0, 0}, {(char *)"pnts", __pyx_getprop_6cogent_5maths_7spatial_4ckd3_6KDTree_pnts, 0, 0, 0}, {(char *)"bucket_size", __pyx_getprop_6cogent_5maths_7spatial_4ckd3_6KDTree_bucket_size, 0, 0, 0}, {0, 0, 0, 0, 0} }; static PyNumberMethods __pyx_tp_as_number_KDTree = { 0, /*nb_add*/ 0, /*nb_subtract*/ 0, /*nb_multiply*/ #if PY_MAJOR_VERSION < 3 0, /*nb_divide*/ #endif 0, /*nb_remainder*/ 0, /*nb_divmod*/ 0, /*nb_power*/ 0, /*nb_negative*/ 0, /*nb_positive*/ 0, /*nb_absolute*/ 0, /*nb_nonzero*/ 0, /*nb_invert*/ 0, /*nb_lshift*/ 0, /*nb_rshift*/ 0, /*nb_and*/ 0, /*nb_xor*/ 0, /*nb_or*/ #if PY_MAJOR_VERSION < 3 0, /*nb_coerce*/ #endif 0, /*nb_int*/ #if PY_MAJOR_VERSION < 3 0, /*nb_long*/ #else 0, /*reserved*/ #endif 0, /*nb_float*/ #if PY_MAJOR_VERSION < 3 0, /*nb_oct*/ #endif #if PY_MAJOR_VERSION < 3 0, /*nb_hex*/ #endif 0, /*nb_inplace_add*/ 0, /*nb_inplace_subtract*/ 0, /*nb_inplace_multiply*/ #if PY_MAJOR_VERSION < 3 0, /*nb_inplace_divide*/ #endif 0, /*nb_inplace_remainder*/ 0, /*nb_inplace_power*/ 0, /*nb_inplace_lshift*/ 0, /*nb_inplace_rshift*/ 0, /*nb_inplace_and*/ 0, /*nb_inplace_xor*/ 0, /*nb_inplace_or*/ 0, /*nb_floor_divide*/ 0, /*nb_true_divide*/ 0, /*nb_inplace_floor_divide*/ 0, /*nb_inplace_true_divide*/ #if PY_VERSION_HEX >= 0x02050000 0, /*nb_index*/ #endif }; static PySequenceMethods __pyx_tp_as_sequence_KDTree = { 0, /*sq_length*/ 0, /*sq_concat*/ 0, /*sq_repeat*/ 0, /*sq_item*/ 0, /*sq_slice*/ 0, /*sq_ass_item*/ 0, /*sq_ass_slice*/ 0, /*sq_contains*/ 0, /*sq_inplace_concat*/ 0, /*sq_inplace_repeat*/ }; static PyMappingMethods __pyx_tp_as_mapping_KDTree = { 0, /*mp_length*/ 0, /*mp_subscript*/ 0, /*mp_ass_subscript*/ }; static PyBufferProcs __pyx_tp_as_buffer_KDTree = { #if PY_MAJOR_VERSION < 3 0, 
/*bf_getreadbuffer*/ #endif #if PY_MAJOR_VERSION < 3 0, /*bf_getwritebuffer*/ #endif #if PY_MAJOR_VERSION < 3 0, /*bf_getsegcount*/ #endif #if PY_MAJOR_VERSION < 3 0, /*bf_getcharbuffer*/ #endif #if PY_VERSION_HEX >= 0x02060000 0, /*bf_getbuffer*/ #endif #if PY_VERSION_HEX >= 0x02060000 0, /*bf_releasebuffer*/ #endif }; static PyTypeObject __pyx_type_6cogent_5maths_7spatial_4ckd3_KDTree = { PyVarObject_HEAD_INIT(0, 0) __Pyx_NAMESTR("cogent.maths.spatial.ckd3.KDTree"), /*tp_name*/ sizeof(struct __pyx_obj_6cogent_5maths_7spatial_4ckd3_KDTree), /*tp_basicsize*/ 0, /*tp_itemsize*/ __pyx_tp_dealloc_6cogent_5maths_7spatial_4ckd3_KDTree, /*tp_dealloc*/ 0, /*tp_print*/ 0, /*tp_getattr*/ 0, /*tp_setattr*/ #if PY_MAJOR_VERSION < 3 0, /*tp_compare*/ #else 0, /*reserved*/ #endif 0, /*tp_repr*/ &__pyx_tp_as_number_KDTree, /*tp_as_number*/ &__pyx_tp_as_sequence_KDTree, /*tp_as_sequence*/ &__pyx_tp_as_mapping_KDTree, /*tp_as_mapping*/ 0, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ 0, /*tp_getattro*/ 0, /*tp_setattro*/ &__pyx_tp_as_buffer_KDTree, /*tp_as_buffer*/ Py_TPFLAGS_DEFAULT|Py_TPFLAGS_CHECKTYPES|Py_TPFLAGS_HAVE_NEWBUFFER|Py_TPFLAGS_BASETYPE|Py_TPFLAGS_HAVE_GC, /*tp_flags*/ __Pyx_DOCSTR("Implements the KDTree data structure for fast nearest neighbor queries."), /*tp_doc*/ __pyx_tp_traverse_6cogent_5maths_7spatial_4ckd3_KDTree, /*tp_traverse*/ __pyx_tp_clear_6cogent_5maths_7spatial_4ckd3_KDTree, /*tp_clear*/ 0, /*tp_richcompare*/ 0, /*tp_weaklistoffset*/ 0, /*tp_iter*/ 0, /*tp_iternext*/ __pyx_methods_6cogent_5maths_7spatial_4ckd3_KDTree, /*tp_methods*/ 0, /*tp_members*/ __pyx_getsets_6cogent_5maths_7spatial_4ckd3_KDTree, /*tp_getset*/ 0, /*tp_base*/ 0, /*tp_dict*/ 0, /*tp_descr_get*/ 0, /*tp_descr_set*/ 0, /*tp_dictoffset*/ __pyx_pw_6cogent_5maths_7spatial_4ckd3_6KDTree_1__init__, /*tp_init*/ 0, /*tp_alloc*/ __pyx_tp_new_6cogent_5maths_7spatial_4ckd3_KDTree, /*tp_new*/ 0, /*tp_free*/ 0, /*tp_is_gc*/ 0, /*tp_bases*/ 0, /*tp_mro*/ 0, /*tp_cache*/ 0, /*tp_subclasses*/ 0,
/*tp_weaklist*/ 0, /*tp_del*/ #if PY_VERSION_HEX >= 0x02060000 0, /*tp_version_tag*/ #endif }; static PyMethodDef __pyx_methods[] = { {0, 0, 0, 0} }; #if PY_MAJOR_VERSION >= 3 static struct PyModuleDef __pyx_moduledef = { PyModuleDef_HEAD_INIT, __Pyx_NAMESTR("ckd3"), 0, /* m_doc */ -1, /* m_size */ __pyx_methods /* m_methods */, NULL, /* m_reload */ NULL, /* m_traverse */ NULL, /* m_clear */ NULL /* m_free */ }; #endif static __Pyx_StringTabEntry __pyx_string_tab[] = { {&__pyx_kp_u_1, __pyx_k_1, sizeof(__pyx_k_1), 0, 1, 0, 0}, {&__pyx_kp_u_11, __pyx_k_11, sizeof(__pyx_k_11), 0, 1, 0, 0}, {&__pyx_kp_s_13, __pyx_k_13, sizeof(__pyx_k_13), 0, 0, 1, 0}, {&__pyx_kp_u_3, __pyx_k_3, sizeof(__pyx_k_3), 0, 1, 0, 0}, {&__pyx_kp_u_5, __pyx_k_5, sizeof(__pyx_k_5), 0, 1, 0, 0}, {&__pyx_kp_u_7, __pyx_k_7, sizeof(__pyx_k_7), 0, 1, 0, 0}, {&__pyx_kp_u_8, __pyx_k_8, sizeof(__pyx_k_8), 0, 1, 0, 0}, {&__pyx_n_s__RuntimeError, __pyx_k__RuntimeError, sizeof(__pyx_k__RuntimeError), 0, 0, 1, 1}, {&__pyx_n_s__ValueError, __pyx_k__ValueError, sizeof(__pyx_k__ValueError), 0, 0, 1, 1}, {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1}, {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1}, {&__pyx_n_s____version__, __pyx_k____version__, sizeof(__pyx_k____version__), 0, 0, 1, 1}, {&__pyx_n_s__bucket_size, __pyx_k__bucket_size, sizeof(__pyx_k__bucket_size), 0, 0, 1, 1}, {&__pyx_n_s__k, __pyx_k__k, sizeof(__pyx_k__k), 0, 0, 1, 1}, {&__pyx_n_s__n_array, __pyx_k__n_array, sizeof(__pyx_k__n_array), 0, 0, 1, 1}, {&__pyx_n_s__point, __pyx_k__point, sizeof(__pyx_k__point), 0, 0, 1, 1}, {&__pyx_n_s__r, __pyx_k__r, sizeof(__pyx_k__r), 0, 0, 1, 1}, {&__pyx_n_s__range, __pyx_k__range, sizeof(__pyx_k__range), 0, 0, 1, 1}, {&__pyx_n_s__size, __pyx_k__size, sizeof(__pyx_k__size), 0, 0, 1, 1}, {0, 0, 0, 0, 0, 0, 0} }; static int __Pyx_InitCachedBuiltins(void) { __pyx_builtin_ValueError = __Pyx_GetName(__pyx_b, __pyx_n_s__ValueError); if 
(!__pyx_builtin_ValueError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_range = __Pyx_GetName(__pyx_b, __pyx_n_s__range); if (!__pyx_builtin_range) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 227; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_RuntimeError = __Pyx_GetName(__pyx_b, __pyx_n_s__RuntimeError); if (!__pyx_builtin_RuntimeError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} return 0; __pyx_L1_error:; return -1; } static int __Pyx_InitCachedConstants(void) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__Pyx_InitCachedConstants", 0); /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_k_tuple_2 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_2); __Pyx_INCREF(((PyObject *)__pyx_kp_u_1)); PyTuple_SET_ITEM(__pyx_k_tuple_2, 0, ((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_2)); /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_k_tuple_4 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_4); __Pyx_INCREF(((PyObject *)__pyx_kp_u_3)); PyTuple_SET_ITEM(__pyx_k_tuple_4, 0, ((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject 
*)__pyx_k_tuple_4)); /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_k_tuple_6 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_6)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_6); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_6, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_6)); /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_k_tuple_9 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_9)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_9); __Pyx_INCREF(((PyObject *)__pyx_kp_u_8)); PyTuple_SET_ITEM(__pyx_k_tuple_9, 0, ((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_9)); /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # complain instead, BUT: < and > in format strings also imply */ __pyx_k_tuple_10 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_10)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_10); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_10, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject 
*)__pyx_k_tuple_10)); /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_k_tuple_12 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_12)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_12); __Pyx_INCREF(((PyObject *)__pyx_kp_u_11)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 0, ((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_12)); __Pyx_RefNannyFinishContext(); return 0; __pyx_L1_error:; __Pyx_RefNannyFinishContext(); return -1; } static int __Pyx_InitGlobals(void) { if (__Pyx_InitStrings(__pyx_string_tab) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_1 = PyInt_FromLong(1); if (unlikely(!__pyx_int_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_15 = PyInt_FromLong(15); if (unlikely(!__pyx_int_15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; return 0; __pyx_L1_error:; return -1; } #if PY_MAJOR_VERSION < 3 PyMODINIT_FUNC initckd3(void); /*proto*/ PyMODINIT_FUNC initckd3(void) #else PyMODINIT_FUNC PyInit_ckd3(void); /*proto*/ PyMODINIT_FUNC PyInit_ckd3(void) #endif { PyObject *__pyx_t_1 = NULL; __Pyx_RefNannyDeclarations #if CYTHON_REFNANNY __Pyx_RefNanny = __Pyx_RefNannyImportAPI("refnanny"); if (!__Pyx_RefNanny) { PyErr_Clear(); __Pyx_RefNanny = __Pyx_RefNannyImportAPI("Cython.Runtime.refnanny"); if (!__Pyx_RefNanny) Py_FatalError("failed to import 'refnanny' module"); } #endif __Pyx_RefNannySetupContext("PyMODINIT_FUNC PyInit_ckd3(void)", 0); if ( __Pyx_check_binary_version() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_tuple = 
PyTuple_New(0); if (unlikely(!__pyx_empty_tuple)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_bytes = PyBytes_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_bytes)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #ifdef __Pyx_CyFunction_USED if (__Pyx_CyFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_FusedFunction_USED if (__pyx_FusedFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_Generator_USED if (__pyx_Generator_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif /*--- Library function declarations ---*/ /*--- Threads initialization code ---*/ #if defined(__PYX_FORCE_INIT_THREADS) && __PYX_FORCE_INIT_THREADS #ifdef WITH_THREAD /* Python build with threading support? */ PyEval_InitThreads(); #endif #endif /*--- Module creation code ---*/ #if PY_MAJOR_VERSION < 3 __pyx_m = Py_InitModule4(__Pyx_NAMESTR("ckd3"), __pyx_methods, 0, 0, PYTHON_API_VERSION); #else __pyx_m = PyModule_Create(&__pyx_moduledef); #endif if (!__pyx_m) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; #if PY_MAJOR_VERSION < 3 Py_INCREF(__pyx_m); #endif __pyx_b = PyImport_AddModule(__Pyx_NAMESTR(__Pyx_BUILTIN_MODULE_NAME)); if (!__pyx_b) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; if (__Pyx_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; /*--- Initialize various global constants etc. 
---*/ if (unlikely(__Pyx_InitGlobals() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__pyx_module_is_main_cogent__maths__spatial__ckd3) { if (__Pyx_SetAttrString(__pyx_m, "__name__", __pyx_n_s____main__) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; } /*--- Builtin init code ---*/ if (unlikely(__Pyx_InitCachedBuiltins() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Constants init code ---*/ if (unlikely(__Pyx_InitCachedConstants() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Global init code ---*/ /*--- Variable export code ---*/ /*--- Function export code ---*/ if (__Pyx_ExportFunction("swap", (void (*)(void))__pyx_f_6cogent_5maths_7spatial_4ckd3_swap, "void (struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ExportFunction("points", (void (*)(void))__pyx_f_6cogent_5maths_7spatial_4ckd3_points, "struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *(__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ExportFunction("dist", (void (*)(void))__pyx_f_6cogent_5maths_7spatial_4ckd3_dist, "__pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t (struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ExportFunction("qsort", (void 
(*)(void))__pyx_f_6cogent_5maths_7spatial_4ckd3_qsort, "void (struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ExportFunction("build_tree", (void (*)(void))__pyx_f_6cogent_5maths_7spatial_4ckd3_build_tree, "struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ExportFunction("rn", (void (*)(void))__pyx_f_6cogent_5maths_7spatial_4ckd3_rn, "__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t (struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t **, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_ExportFunction("knn", (void (*)(void))__pyx_f_6cogent_5maths_7spatial_4ckd3_knn, "void *(struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdnode *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint *, struct __pyx_t_6cogent_5maths_7spatial_4ckd3_kdpoint, __pyx_t_6cogent_5maths_7spatial_4ckd3_DTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t *, __pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t, 
__pyx_t_6cogent_5maths_7spatial_4ckd3_UTYPE_t)") < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Type init code ---*/ if (PyType_Ready(&__pyx_type_6cogent_5maths_7spatial_4ckd3_KDTree) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 232; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__Pyx_SetAttrString(__pyx_m, "KDTree", (PyObject *)&__pyx_type_6cogent_5maths_7spatial_4ckd3_KDTree) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 232; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_6cogent_5maths_7spatial_4ckd3_KDTree = &__pyx_type_6cogent_5maths_7spatial_4ckd3_KDTree; /*--- Type import code ---*/ __pyx_ptype_5numpy_dtype = __Pyx_ImportType("numpy", "dtype", sizeof(PyArray_Descr), 0); if (unlikely(!__pyx_ptype_5numpy_dtype)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 154; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_flatiter = __Pyx_ImportType("numpy", "flatiter", sizeof(PyArrayIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_flatiter)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 164; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_broadcast = __Pyx_ImportType("numpy", "broadcast", sizeof(PyArrayMultiIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_broadcast)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 168; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ndarray = __Pyx_ImportType("numpy", "ndarray", sizeof(PyArrayObject), 0); if (unlikely(!__pyx_ptype_5numpy_ndarray)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 177; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ufunc = __Pyx_ImportType("numpy", "ufunc", sizeof(PyUFuncObject), 0); if (unlikely(!__pyx_ptype_5numpy_ufunc)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 860; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Variable import code ---*/ /*--- Function import code ---*/ /*--- Execution code ---*/ /* "cogent/maths/spatial/ckd3.pyx":16 * 
from stdlib cimport malloc, realloc, free * * __version__ = "('1', '5', '3')" # <<<<<<<<<<<<<< * * cdef extern from "numpy/arrayobject.h": */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s____version__, ((PyObject *)__pyx_kp_s_13)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 16; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/maths/spatial/ckd3.pyx":1 * #cython: boundscheck=False # <<<<<<<<<<<<<< * #(not slicing or indexing any numpy arrays) * */ __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_1)); if (PyObject_SetAttr(__pyx_m, __pyx_n_s____test__, ((PyObject *)__pyx_t_1)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); if (__pyx_m) { __Pyx_AddTraceback("init cogent.maths.spatial.ckd3", __pyx_clineno, __pyx_lineno, __pyx_filename); Py_DECREF(__pyx_m); __pyx_m = 0; } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_ImportError, "init cogent.maths.spatial.ckd3"); } __pyx_L0:; __Pyx_RefNannyFinishContext(); #if PY_MAJOR_VERSION < 3 return; #else return __pyx_m; #endif } /* Runtime support code */ #if CYTHON_REFNANNY static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname) { PyObject *m = NULL, *p = NULL; void *r = NULL; m = PyImport_ImportModule((char *)modname); if (!m) goto end; p = PyObject_GetAttrString(m, (char *)"RefNannyAPI"); if (!p) goto end; r = PyLong_AsVoidPtr(p); end: Py_XDECREF(p); Py_XDECREF(m); return (__Pyx_RefNannyAPIStruct *)r; } #endif /* CYTHON_REFNANNY */ static void __Pyx_RaiseDoubleKeywordsError( const char* func_name, PyObject* kw_name) { PyErr_Format(PyExc_TypeError, #if 
PY_MAJOR_VERSION >= 3 "%s() got multiple values for keyword argument '%U'", func_name, kw_name); #else "%s() got multiple values for keyword argument '%s'", func_name, PyString_AS_STRING(kw_name)); #endif } static int __Pyx_ParseOptionalKeywords( PyObject *kwds, PyObject **argnames[], PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, const char* function_name) { PyObject *key = 0, *value = 0; Py_ssize_t pos = 0; PyObject*** name; PyObject*** first_kw_arg = argnames + num_pos_args; while (PyDict_Next(kwds, &pos, &key, &value)) { name = first_kw_arg; while (*name && (**name != key)) name++; if (*name) { values[name-argnames] = value; } else { #if PY_MAJOR_VERSION < 3 if (unlikely(!PyString_CheckExact(key)) && unlikely(!PyString_Check(key))) { #else if (unlikely(!PyUnicode_Check(key))) { #endif goto invalid_keyword_type; } else { for (name = first_kw_arg; *name; name++) { #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) break; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) break; #endif } if (*name) { values[name-argnames] = value; } else { for (name=argnames; name != first_kw_arg; name++) { if (**name == key) goto arg_passed_twice; #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) goto arg_passed_twice; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) goto arg_passed_twice; #endif } if (kwds2) { if (unlikely(PyDict_SetItem(kwds2, key, value))) goto bad; } else { goto invalid_keyword; } } } } } return 0; arg_passed_twice: __Pyx_RaiseDoubleKeywordsError(function_name, **name); goto bad; invalid_keyword_type: PyErr_Format(PyExc_TypeError, "%s() keywords must be strings", function_name); goto bad; invalid_keyword: PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION < 3 "%s() got an unexpected keyword argument '%s'", 
function_name, PyString_AsString(key)); #else "%s() got an unexpected keyword argument '%U'", function_name, key); #endif bad: return -1; } static void __Pyx_RaiseArgtupleInvalid( const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found) { Py_ssize_t num_expected; const char *more_or_less; if (num_found < num_min) { num_expected = num_min; more_or_less = "at least"; } else { num_expected = num_max; more_or_less = "at most"; } if (exact) { more_or_less = "exactly"; } PyErr_Format(PyExc_TypeError, "%s() takes %s %"PY_FORMAT_SIZE_T"d positional argument%s (%"PY_FORMAT_SIZE_T"d given)", func_name, more_or_less, num_expected, (num_expected == 1) ? "" : "s", num_found); } static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact) { if (!type) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (none_allowed && obj == Py_None) return 1; else if (exact) { if (Py_TYPE(obj) == type) return 1; } else { if (PyObject_TypeCheck(obj, type)) return 1; } PyErr_Format(PyExc_TypeError, "Argument '%s' has incorrect type (expected %s, got %s)", name, type->tp_name, Py_TYPE(obj)->tp_name); return 0; } static CYTHON_INLINE int __Pyx_IsLittleEndian(void) { unsigned int n = 1; return *(unsigned char*)(&n) != 0; } static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx, __Pyx_BufFmt_StackElem* stack, __Pyx_TypeInfo* type) { stack[0].field = &ctx->root; stack[0].parent_offset = 0; ctx->root.type = type; ctx->root.name = "buffer dtype"; ctx->root.offset = 0; ctx->head = stack; ctx->head->field = &ctx->root; ctx->fmt_offset = 0; ctx->head->parent_offset = 0; ctx->new_packmode = '@'; ctx->enc_packmode = '@'; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->is_complex = 0; ctx->is_valid_array = 0; ctx->struct_alignment = 0; while (type->typegroup == 'S') { ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = 0; type = type->fields->type; } } static int 
__Pyx_BufFmt_ParseNumber(const char** ts) { int count; const char* t = *ts; if (*t < '0' || *t > '9') { return -1; } else { count = *t++ - '0'; while (*t >= '0' && *t <= '9') { count *= 10; count += *t++ - '0'; } } *ts = t; return count; } static int __Pyx_BufFmt_ExpectNumber(const char **ts) { int number = __Pyx_BufFmt_ParseNumber(ts); if (number == -1) /* First char was not a digit */ PyErr_Format(PyExc_ValueError,\ "Does not understand character buffer dtype format string ('%c')", **ts); return number; } static void __Pyx_BufFmt_RaiseUnexpectedChar(char ch) { PyErr_Format(PyExc_ValueError, "Unexpected format string character: '%c'", ch); } static const char* __Pyx_BufFmt_DescribeTypeChar(char ch, int is_complex) { switch (ch) { case 'b': return "'char'"; case 'B': return "'unsigned char'"; case 'h': return "'short'"; case 'H': return "'unsigned short'"; case 'i': return "'int'"; case 'I': return "'unsigned int'"; case 'l': return "'long'"; case 'L': return "'unsigned long'"; case 'q': return "'long long'"; case 'Q': return "'unsigned long long'"; case 'f': return (is_complex ? "'complex float'" : "'float'"); case 'd': return (is_complex ? "'complex double'" : "'double'"); case 'g': return (is_complex ? "'complex long double'" : "'long double'"); case 'T': return "a struct"; case 'O': return "Python object"; case 'P': return "a pointer"; case 's': case 'p': return "a string"; case 0: return "end"; default: return "unparseable format string"; } } static size_t __Pyx_BufFmt_TypeCharToStandardSize(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return 2; case 'i': case 'I': case 'l': case 'L': return 4; case 'q': case 'Q': return 8; case 'f': return (is_complex ? 8 : 4); case 'd': return (is_complex ?
16 : 8); case 'g': { PyErr_SetString(PyExc_ValueError, "Python does not define a standard format string size for long double ('g').."); return 0; } case 'O': case 'P': return sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static size_t __Pyx_BufFmt_TypeCharToNativeSize(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(short); case 'i': case 'I': return sizeof(int); case 'l': case 'L': return sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(float) * (is_complex ? 2 : 1); case 'd': return sizeof(double) * (is_complex ? 2 : 1); case 'g': return sizeof(long double) * (is_complex ? 2 : 1); case 'O': case 'P': return sizeof(void*); default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } typedef struct { char c; short x; } __Pyx_st_short; typedef struct { char c; int x; } __Pyx_st_int; typedef struct { char c; long x; } __Pyx_st_long; typedef struct { char c; float x; } __Pyx_st_float; typedef struct { char c; double x; } __Pyx_st_double; typedef struct { char c; long double x; } __Pyx_st_longdouble; typedef struct { char c; void *x; } __Pyx_st_void_p; #ifdef HAVE_LONG_LONG typedef struct { char c; PY_LONG_LONG x; } __Pyx_st_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToAlignment(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_st_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_st_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_st_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_st_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_st_float) - sizeof(float); case 'd': return sizeof(__Pyx_st_double) - sizeof(double); case 'g': return sizeof(__Pyx_st_longdouble) - sizeof(long double); case 'P': 
case 'O': return sizeof(__Pyx_st_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } /* These are for computing the padding at the end of the struct to align on the first member of the struct. This will probably be the same as above, but we don't have any guarantees. */ typedef struct { short x; char c; } __Pyx_pad_short; typedef struct { int x; char c; } __Pyx_pad_int; typedef struct { long x; char c; } __Pyx_pad_long; typedef struct { float x; char c; } __Pyx_pad_float; typedef struct { double x; char c; } __Pyx_pad_double; typedef struct { long double x; char c; } __Pyx_pad_longdouble; typedef struct { void *x; char c; } __Pyx_pad_void_p; #ifdef HAVE_LONG_LONG typedef struct { PY_LONG_LONG x; char c; } __Pyx_pad_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToPadding(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_pad_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_pad_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_pad_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_pad_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_pad_float) - sizeof(float); case 'd': return sizeof(__Pyx_pad_double) - sizeof(double); case 'g': return sizeof(__Pyx_pad_longdouble) - sizeof(long double); case 'P': case 'O': return sizeof(__Pyx_pad_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static char __Pyx_BufFmt_TypeCharToGroup(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'h': case 'i': case 'l': case 'q': case 's': case 'p': return 'I'; case 'B': case 'H': case 'I': case 'L': case 'Q': return 'U'; case 'f': case 'd': case 'g': return (is_complex ?
'C' : 'R'); case 'O': return 'O'; case 'P': return 'P'; default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } static void __Pyx_BufFmt_RaiseExpected(__Pyx_BufFmt_Context* ctx) { if (ctx->head == NULL || ctx->head->field == &ctx->root) { const char* expected; const char* quote; if (ctx->head == NULL) { expected = "end"; quote = ""; } else { expected = ctx->head->field->type->name; quote = "'"; } PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected %s%s%s but got %s", quote, expected, quote, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex)); } else { __Pyx_StructField* field = ctx->head->field; __Pyx_StructField* parent = (ctx->head - 1)->field; PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected '%s' but got %s in '%s.%s'", field->type->name, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex), parent->type->name, field->name); } } static int __Pyx_BufFmt_ProcessTypeChunk(__Pyx_BufFmt_Context* ctx) { char group; size_t size, offset, arraysize = 1; if (ctx->enc_type == 0) return 0; if (ctx->head->field->type->arraysize[0]) { int i, ndim = 0; if (ctx->enc_type == 's' || ctx->enc_type == 'p') { ctx->is_valid_array = ctx->head->field->type->ndim == 1; ndim = 1; if (ctx->enc_count != ctx->head->field->type->arraysize[0]) { PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %zu", ctx->head->field->type->arraysize[0], ctx->enc_count); return -1; } } if (!ctx->is_valid_array) { PyErr_Format(PyExc_ValueError, "Expected %d dimensions, got %d", ctx->head->field->type->ndim, ndim); return -1; } for (i = 0; i < ctx->head->field->type->ndim; i++) { arraysize *= ctx->head->field->type->arraysize[i]; } ctx->is_valid_array = 0; ctx->enc_count = 1; } group = __Pyx_BufFmt_TypeCharToGroup(ctx->enc_type, ctx->is_complex); do { __Pyx_StructField* field = ctx->head->field; __Pyx_TypeInfo* type = field->type; if (ctx->enc_packmode == '@' || ctx->enc_packmode == '^') { size = 
__Pyx_BufFmt_TypeCharToNativeSize(ctx->enc_type, ctx->is_complex); } else { size = __Pyx_BufFmt_TypeCharToStandardSize(ctx->enc_type, ctx->is_complex); } if (ctx->enc_packmode == '@') { size_t align_at = __Pyx_BufFmt_TypeCharToAlignment(ctx->enc_type, ctx->is_complex); size_t align_mod_offset; if (align_at == 0) return -1; align_mod_offset = ctx->fmt_offset % align_at; if (align_mod_offset > 0) ctx->fmt_offset += align_at - align_mod_offset; if (ctx->struct_alignment == 0) ctx->struct_alignment = __Pyx_BufFmt_TypeCharToPadding(ctx->enc_type, ctx->is_complex); } if (type->size != size || type->typegroup != group) { if (type->typegroup == 'C' && type->fields != NULL) { size_t parent_offset = ctx->head->parent_offset + field->offset; ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = parent_offset; continue; } __Pyx_BufFmt_RaiseExpected(ctx); return -1; } offset = ctx->head->parent_offset + field->offset; if (ctx->fmt_offset != offset) { PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch; next field is at offset %"PY_FORMAT_SIZE_T"d but %"PY_FORMAT_SIZE_T"d expected", (Py_ssize_t)ctx->fmt_offset, (Py_ssize_t)offset); return -1; } ctx->fmt_offset += size; if (arraysize) ctx->fmt_offset += (arraysize - 1) * size; --ctx->enc_count; /* Consume from buffer string */ while (1) { if (field == &ctx->root) { ctx->head = NULL; if (ctx->enc_count != 0) { __Pyx_BufFmt_RaiseExpected(ctx); return -1; } break; /* breaks both loops as ctx->enc_count == 0 */ } ctx->head->field = ++field; if (field->type == NULL) { --ctx->head; field = ctx->head->field; continue; } else if (field->type->typegroup == 'S') { size_t parent_offset = ctx->head->parent_offset + field->offset; if (field->type->fields->type == NULL) continue; /* empty struct */ field = field->type->fields; ++ctx->head; ctx->head->field = field; ctx->head->parent_offset = parent_offset; break; } else { break; } } } while (ctx->enc_count); ctx->enc_type = 0; ctx->is_complex = 0; return 0; } static 
CYTHON_INLINE PyObject * __pyx_buffmt_parse_array(__Pyx_BufFmt_Context* ctx, const char** tsp) { const char *ts = *tsp; int i = 0, number; int ndim = ctx->head->field->type->ndim; ++ts; if (ctx->new_count != 1) { PyErr_SetString(PyExc_ValueError, "Cannot handle repeated arrays in format string"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; while (*ts && *ts != ')') { if (isspace(*ts)) { ++ts; continue; } number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; if (i < ndim && (size_t) number != ctx->head->field->type->arraysize[i]) return PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %d", ctx->head->field->type->arraysize[i], number); if (*ts != ',' && *ts != ')') return PyErr_Format(PyExc_ValueError, "Expected a comma in format string, got '%c'", *ts); if (*ts == ',') ts++; i++; } if (i != ndim) return PyErr_Format(PyExc_ValueError, "Expected %d dimension(s), got %d", ctx->head->field->type->ndim, i); if (!*ts) { PyErr_SetString(PyExc_ValueError, "Unexpected end of format string, expected ')'"); return NULL; } ctx->is_valid_array = 1; ctx->new_count = 1; *tsp = ++ts; return Py_None; } static const char* __Pyx_BufFmt_CheckString(__Pyx_BufFmt_Context* ctx, const char* ts) { int got_Z = 0; while (1) { switch(*ts) { case 0: if (ctx->enc_type != 0 && ctx->head == NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; if (ctx->head != NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } return ts; case ' ': case 10: case 13: ++ts; break; case '<': if (!__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Little-endian buffer not supported on big-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '>': case '!': if (__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Big-endian buffer not supported on little-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '=': case '@': case '^':
ctx->new_packmode = *ts++; break; case 'T': /* substruct */ { const char* ts_after_sub; size_t i, struct_count = ctx->new_count; size_t struct_alignment = ctx->struct_alignment; ctx->new_count = 1; ++ts; if (*ts != '{') { PyErr_SetString(PyExc_ValueError, "Buffer acquisition: Expected '{' after 'T'"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ ctx->enc_count = 0; ctx->struct_alignment = 0; ++ts; ts_after_sub = ts; for (i = 0; i != struct_count; ++i) { ts_after_sub = __Pyx_BufFmt_CheckString(ctx, ts); if (!ts_after_sub) return NULL; } ts = ts_after_sub; if (struct_alignment) ctx->struct_alignment = struct_alignment; } break; case '}': /* end of substruct; either repeat or move on */ { size_t alignment = ctx->struct_alignment; ++ts; if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ if (alignment && ctx->fmt_offset % alignment) { ctx->fmt_offset += alignment - (ctx->fmt_offset % alignment); } } return ts; case 'x': if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->fmt_offset += ctx->new_count; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->enc_packmode = ctx->new_packmode; ++ts; break; case 'Z': got_Z = 1; ++ts; if (*ts != 'f' && *ts != 'd' && *ts != 'g') { __Pyx_BufFmt_RaiseUnexpectedChar('Z'); return NULL; } /* fall through */ case 'c': case 'b': case 'B': case 'h': case 'H': case 'i': case 'I': case 'l': case 'L': case 'q': case 'Q': case 'f': case 'd': case 'g': case 'O': case 's': case 'p': if (ctx->enc_type == *ts && got_Z == ctx->is_complex && ctx->enc_packmode == ctx->new_packmode) { ctx->enc_count += ctx->new_count; } else { if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_count = ctx->new_count; ctx->enc_packmode = ctx->new_packmode; ctx->enc_type = *ts; ctx->is_complex = got_Z; } ++ts; ctx->new_count = 1; got_Z = 0; break; case ':': ++ts; while(*ts 
!= ':') ++ts; ++ts; break; case '(': if (!__pyx_buffmt_parse_array(ctx, &ts)) return NULL; break; default: { int number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; ctx->new_count = (size_t)number; } } } } static CYTHON_INLINE void __Pyx_ZeroBuffer(Py_buffer* buf) { buf->buf = NULL; buf->obj = NULL; buf->strides = __Pyx_zeros; buf->shape = __Pyx_zeros; buf->suboffsets = __Pyx_minusones; } static CYTHON_INLINE int __Pyx_GetBufferAndValidate( Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack) { if (obj == Py_None || obj == NULL) { __Pyx_ZeroBuffer(buf); return 0; } buf->buf = NULL; if (__Pyx_GetBuffer(obj, buf, flags) == -1) goto fail; if (buf->ndim != nd) { PyErr_Format(PyExc_ValueError, "Buffer has wrong number of dimensions (expected %d, got %d)", nd, buf->ndim); goto fail; } if (!cast) { __Pyx_BufFmt_Context ctx; __Pyx_BufFmt_Init(&ctx, stack, dtype); if (!__Pyx_BufFmt_CheckString(&ctx, buf->format)) goto fail; } if ((unsigned)buf->itemsize != dtype->size) { PyErr_Format(PyExc_ValueError, "Item size of buffer (%"PY_FORMAT_SIZE_T"d byte%s) does not match size of '%s' (%"PY_FORMAT_SIZE_T"d byte%s)", buf->itemsize, (buf->itemsize > 1) ? "s" : "", dtype->name, (Py_ssize_t)dtype->size, (dtype->size > 1) ? 
"s" : ""); goto fail; } if (buf->suboffsets == NULL) buf->suboffsets = __Pyx_minusones; return 0; fail:; __Pyx_ZeroBuffer(buf); return -1; } static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info) { if (info->buf == NULL) return; if (info->suboffsets == __Pyx_minusones) info->suboffsets = NULL; __Pyx_ReleaseBuffer(info); } static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb) { #if CYTHON_COMPILING_IN_CPYTHON PyObject *tmp_type, *tmp_value, *tmp_tb; PyThreadState *tstate = PyThreadState_GET(); tmp_type = tstate->curexc_type; tmp_value = tstate->curexc_value; tmp_tb = tstate->curexc_traceback; tstate->curexc_type = type; tstate->curexc_value = value; tstate->curexc_traceback = tb; Py_XDECREF(tmp_type); Py_XDECREF(tmp_value); Py_XDECREF(tmp_tb); #else PyErr_Restore(type, value, tb); #endif } static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb) { #if CYTHON_COMPILING_IN_CPYTHON PyThreadState *tstate = PyThreadState_GET(); *type = tstate->curexc_type; *value = tstate->curexc_value; *tb = tstate->curexc_traceback; tstate->curexc_type = 0; tstate->curexc_value = 0; tstate->curexc_traceback = 0; #else PyErr_Fetch(type, value, tb); #endif } static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type) { if (unlikely(!type)) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (likely(PyObject_TypeCheck(obj, type))) return 1; PyErr_Format(PyExc_TypeError, "Cannot convert %.200s to %.200s", Py_TYPE(obj)->tp_name, type->tp_name); return 0; } static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name) { PyObject *result; result = PyObject_GetAttr(dict, name); if (!result) { if (dict != __pyx_b) { PyErr_Clear(); result = PyObject_GetAttr(__pyx_b, name); } if (!result) { PyErr_SetObject(PyExc_NameError, name); } } return result; } #if PY_MAJOR_VERSION < 3 static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, CYTHON_UNUSED PyObject *cause) { 
Py_XINCREF(type); Py_XINCREF(value); Py_XINCREF(tb); if (tb == Py_None) { Py_DECREF(tb); tb = 0; } else if (tb != NULL && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto raise_error; } if (value == NULL) { value = Py_None; Py_INCREF(value); } #if PY_VERSION_HEX < 0x02050000 if (!PyClass_Check(type)) #else if (!PyType_Check(type)) #endif { if (value != Py_None) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto raise_error; } Py_DECREF(value); value = type; #if PY_VERSION_HEX < 0x02050000 if (PyInstance_Check(type)) { type = (PyObject*) ((PyInstanceObject*)type)->in_class; Py_INCREF(type); } else { type = 0; PyErr_SetString(PyExc_TypeError, "raise: exception must be an old-style class or instance"); goto raise_error; } #else type = (PyObject*) Py_TYPE(type); Py_INCREF(type); if (!PyType_IsSubtype((PyTypeObject *)type, (PyTypeObject *)PyExc_BaseException)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto raise_error; } #endif } __Pyx_ErrRestore(type, value, tb); return; raise_error: Py_XDECREF(value); Py_XDECREF(type); Py_XDECREF(tb); return; } #else /* Python 3+ */ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause) { if (tb == Py_None) { tb = 0; } else if (tb && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto bad; } if (value == Py_None) value = 0; if (PyExceptionInstance_Check(type)) { if (value) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto bad; } value = type; type = (PyObject*) Py_TYPE(value); } else if (!PyExceptionClass_Check(type)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto bad; } if (cause) { PyObject *fixed_cause; if (PyExceptionClass_Check(cause)) { fixed_cause = PyObject_CallObject(cause, NULL); if 
(fixed_cause == NULL) goto bad; } else if (PyExceptionInstance_Check(cause)) { fixed_cause = cause; Py_INCREF(fixed_cause); } else { PyErr_SetString(PyExc_TypeError, "exception causes must derive from " "BaseException"); goto bad; } if (!value) { value = PyObject_CallObject(type, NULL); } PyException_SetCause(value, fixed_cause); } PyErr_SetObject(type, value); if (tb) { PyThreadState *tstate = PyThreadState_GET(); PyObject* tmp_tb = tstate->curexc_traceback; if (tb != tmp_tb) { Py_INCREF(tb); tstate->curexc_traceback = tb; Py_XDECREF(tmp_tb); } } bad: return; } #endif static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index) { PyErr_Format(PyExc_ValueError, "need more than %"PY_FORMAT_SIZE_T"d value%s to unpack", index, (index == 1) ? "" : "s"); } static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected) { PyErr_Format(PyExc_ValueError, "too many values to unpack (expected %"PY_FORMAT_SIZE_T"d)", expected); } static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void) { PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); } static void __Pyx_UnpackTupleError(PyObject *t, Py_ssize_t index) { if (t == Py_None) { __Pyx_RaiseNoneNotIterableError(); } else if (PyTuple_GET_SIZE(t) < index) { __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(t)); } else { __Pyx_RaiseTooManyValuesError(index); } } static CYTHON_INLINE PyObject *__Pyx_PyInt_to_py_npy_uint64(npy_uint64 val) { const npy_uint64 neg_one = (npy_uint64)-1, const_zero = (npy_uint64)0; const int is_unsigned = const_zero < neg_one; if ((sizeof(npy_uint64) == sizeof(char)) || (sizeof(npy_uint64) == sizeof(short))) { return PyInt_FromLong((long)val); } else if ((sizeof(npy_uint64) == sizeof(int)) || (sizeof(npy_uint64) == sizeof(long))) { if (is_unsigned) return PyLong_FromUnsignedLong((unsigned long)val); else return PyInt_FromLong((long)val); } else if (sizeof(npy_uint64) == sizeof(PY_LONG_LONG)) { if (is_unsigned) return PyLong_FromUnsignedLongLong((unsigned 
PY_LONG_LONG)val); else return PyLong_FromLongLong((PY_LONG_LONG)val); } else { int one = 1; int little = (int)*(unsigned char *)&one; unsigned char *bytes = (unsigned char *)&val; return _PyLong_FromByteArray(bytes, sizeof(npy_uint64), little, !is_unsigned); } } #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags) { PyObject *getbuffer_cobj; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) return PyObject_GetBuffer(obj, view, flags); #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) return __pyx_pw_5numpy_7ndarray_1__getbuffer__(obj, view, flags); #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (getbuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_getbuffer"))) { getbufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (getbufferproc) PyCapsule_GetPointer(getbuffer_cobj, "getbuffer(obj, view, flags)"); #else func = (getbufferproc) PyCObject_AsVoidPtr(getbuffer_cobj); #endif Py_DECREF(getbuffer_cobj); if (!func) goto fail; return func(obj, view, flags); } else { PyErr_Clear(); } #endif PyErr_Format(PyExc_TypeError, "'%100s' does not have the buffer interface", Py_TYPE(obj)->tp_name); #if PY_VERSION_HEX < 0x02060000 fail: #endif return -1; } static void __Pyx_ReleaseBuffer(Py_buffer *view) { PyObject *obj = view->obj; PyObject *releasebuffer_cobj; if (!obj) return; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) { PyBuffer_Release(view); return; } #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) { __pyx_pw_5numpy_7ndarray_3__releasebuffer__(obj, view); return; } #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (releasebuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_releasebuffer"))) { releasebufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (releasebufferproc) 
PyCapsule_GetPointer(releasebuffer_cobj, "releasebuffer(obj, view)"); #else func = (releasebufferproc) PyCObject_AsVoidPtr(releasebuffer_cobj); #endif Py_DECREF(releasebuffer_cobj); if (!func) goto fail; func(obj, view); return; } else { PyErr_Clear(); } #endif goto nofail; #if PY_VERSION_HEX < 0x02060000 fail: #endif PyErr_WriteUnraisable(obj); nofail: Py_DECREF(obj); view->obj = NULL; } #endif /* PY_MAJOR_VERSION < 3 */ static CYTHON_INLINE npy_uint64 __Pyx_PyInt_from_py_npy_uint64(PyObject* x) { const npy_uint64 neg_one = (npy_uint64)-1, const_zero = (npy_uint64)0; const int is_unsigned = const_zero < neg_one; if (sizeof(npy_uint64) == sizeof(char)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedChar(x); else return (npy_uint64)__Pyx_PyInt_AsSignedChar(x); } else if (sizeof(npy_uint64) == sizeof(short)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedShort(x); else return (npy_uint64)__Pyx_PyInt_AsSignedShort(x); } else if (sizeof(npy_uint64) == sizeof(int)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedInt(x); else return (npy_uint64)__Pyx_PyInt_AsSignedInt(x); } else if (sizeof(npy_uint64) == sizeof(long)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedLong(x); else return (npy_uint64)__Pyx_PyInt_AsSignedLong(x); } else if (sizeof(npy_uint64) == sizeof(PY_LONG_LONG)) { if (is_unsigned) return (npy_uint64)__Pyx_PyInt_AsUnsignedLongLong(x); else return (npy_uint64)__Pyx_PyInt_AsSignedLongLong(x); } else { npy_uint64 val; PyObject *v = __Pyx_PyNumber_Int(x); #if PY_VERSION_HEX < 0x03000000 if (likely(v) && !PyLong_Check(v)) { PyObject *tmp = v; v = PyNumber_Long(tmp); Py_DECREF(tmp); } #endif if (likely(v)) { int one = 1; int is_little = (int)*(unsigned char *)&one; unsigned char *bytes = (unsigned char *)&val; int ret = _PyLong_AsByteArray((PyLongObject *)v, bytes, sizeof(val), is_little, !is_unsigned); Py_DECREF(v); if (likely(!ret)) return val; } return (npy_uint64)-1; } } #if CYTHON_CCOMPLEX #ifdef 
__cplusplus static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return ::std::complex< float >(x, y); } #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return x + y*(__pyx_t_float_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { __pyx_t_float_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex a, __pyx_t_float_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = a.real; z.imag = -a.imag; return z; } 
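The `#else` branches above are Cython's fallback for compilers with neither C99 `_Complex` nor C++ `std::complex`: struct-based helpers such as `__Pyx_c_prodf` and `__Pyx_c_quotf` spell the complex arithmetic out by hand (the quotient multiplies by the conjugate of the divisor and divides by its squared modulus). Those formulas are easy to sanity-check by mirroring them on `(real, imag)` pairs and comparing against a native complex type — an illustrative sketch, not part of the generated code:

```python
def c_prod(a, b):
    # mirrors __Pyx_c_prodf: (ar*br - ai*bi, ar*bi + ai*br)
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def c_quot(a, b):
    # mirrors __Pyx_c_quotf: multiply by conj(b), divide by |b|**2
    denom = b[0] * b[0] + b[1] * b[1]
    return ((a[0] * b[0] + a[1] * b[1]) / denom,
            (a[1] * b[0] - a[0] * b[1]) / denom)

def c_pow_int(a, n):
    # small non-negative integer powers by repeated multiplication,
    # analogous to the switch on (int)b.real in __Pyx_c_powf
    z = (1.0, 0.0)
    for _ in range(n):
        z = c_prod(z, a)
    return z
```

For non-integer exponents the generated `__Pyx_c_powf` switches to polar form (`log`, `exp`, `atan2`), as the `r`/`theta` branch above shows.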
#if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrtf(z.real*z.real + z.imag*z.imag); #else return hypotf(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { float denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 1: return a; case 2: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(a, a); case 3: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, a); case 4: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_absf(a); theta = atan2f(a.imag, a.real); } lnr = logf(r); z_r = expf(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cosf(z_theta); z.imag = z_r * sinf(z_theta); return z; } #endif #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return ::std::complex< double >(x, y); } #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return x + y*(__pyx_t_double_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { __pyx_t_double_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex a, __pyx_t_double_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real + b.real; 
z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = a.real; z.imag = -a.imag; return z; } #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrt(z.real*z.real + z.imag*z.imag); #else return hypot(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { double denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 1: return a; case 2: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(a, a); case 3: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(z, a); case 4: z = 
__Pyx_c_prod(a, a); return __Pyx_c_prod(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_abs(a); theta = atan2(a.imag, a.real); } lnr = log(r); z_r = exp(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cos(z_theta); z.imag = z_r * sin(z_theta); return z; } #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject* x) { const unsigned char neg_one = (unsigned char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned char" : "value too large to convert to unsigned char"); } return (unsigned char)-1; } return (unsigned char)val; } return (unsigned char)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject* x) { const unsigned short neg_one = (unsigned short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to unsigned short" : "value too large to convert to unsigned short"); } return (unsigned short)-1; } return (unsigned short)val; } return (unsigned short)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject* x) { const unsigned int neg_one = (unsigned int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned int" : "value too large to convert to unsigned int"); } return (unsigned int)-1; } return (unsigned int)val; } return (unsigned int)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject* x) { const char neg_one = (char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to char" : "value too large to convert to char"); } return (char)-1; } return (char)val; } return (char)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject* x) { const short neg_one = (short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to short" : "value too large to convert to short"); } return (short)-1; } return (short)val; } return (short)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject* x) { const signed char neg_one = (signed char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed char" : "value too large to convert to signed char"); } return (signed char)-1; } return (signed char)val; } return (signed char)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject* x) { const signed short neg_one = (signed short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to signed short" : "value too large to convert to signed short"); } return (signed short)-1; } return (signed short)val; } return (signed short)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject* x) { const signed int neg_one = (signed int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed int" : "value too large to convert to signed int"); } return (signed int)-1; } return (signed int)val; } return (signed int)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject* x) { const unsigned long neg_one = (unsigned long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)PyLong_AsUnsignedLong(x); } else { return (unsigned long)PyLong_AsLong(x); } } else { unsigned long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned long)-1; val = __Pyx_PyInt_AsUnsignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject* x) { const unsigned PY_LONG_LONG neg_one = (unsigned PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (unsigned PY_LONG_LONG)PyLong_AsLongLong(x); } } else { unsigned PY_LONG_LONG val; 
PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned PY_LONG_LONG)-1; val = __Pyx_PyInt_AsUnsignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject* x) { const long neg_one = (long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)PyLong_AsUnsignedLong(x); } else { return (long)PyLong_AsLong(x); } } else { long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (long)-1; val = __Pyx_PyInt_AsLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject* x) { const PY_LONG_LONG neg_one = (PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (PY_LONG_LONG)PyLong_AsLongLong(x); } } else { PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1; val = __Pyx_PyInt_AsLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject* x) { const signed 
long neg_one = (signed long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)PyLong_AsUnsignedLong(x); } else { return (signed long)PyLong_AsLong(x); } } else { signed long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed long)-1; val = __Pyx_PyInt_AsSignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject* x) { const signed PY_LONG_LONG neg_one = (signed PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (signed PY_LONG_LONG)PyLong_AsLongLong(x); } } else { signed PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed PY_LONG_LONG)-1; val = __Pyx_PyInt_AsSignedLongLong(tmp); Py_DECREF(tmp); return val; } } static void __Pyx_WriteUnraisable(const char *name, int clineno, int lineno, const char *filename) { PyObject *old_exc, *old_val, *old_tb; 
PyObject *ctx; __Pyx_ErrFetch(&old_exc, &old_val, &old_tb); #if PY_MAJOR_VERSION < 3 ctx = PyString_FromString(name); #else ctx = PyUnicode_FromString(name); #endif __Pyx_ErrRestore(old_exc, old_val, old_tb); if (!ctx) { PyErr_WriteUnraisable(Py_None); } else { PyErr_WriteUnraisable(ctx); Py_DECREF(ctx); } } static CYTHON_INLINE Py_intptr_t __Pyx_PyInt_from_py_Py_intptr_t(PyObject* x) { const Py_intptr_t neg_one = (Py_intptr_t)-1, const_zero = (Py_intptr_t)0; const int is_unsigned = const_zero < neg_one; if (sizeof(Py_intptr_t) == sizeof(char)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedChar(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedChar(x); } else if (sizeof(Py_intptr_t) == sizeof(short)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedShort(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedShort(x); } else if (sizeof(Py_intptr_t) == sizeof(int)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedInt(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedInt(x); } else if (sizeof(Py_intptr_t) == sizeof(long)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedLong(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedLong(x); } else if (sizeof(Py_intptr_t) == sizeof(PY_LONG_LONG)) { if (is_unsigned) return (Py_intptr_t)__Pyx_PyInt_AsUnsignedLongLong(x); else return (Py_intptr_t)__Pyx_PyInt_AsSignedLongLong(x); } else { Py_intptr_t val; PyObject *v = __Pyx_PyNumber_Int(x); #if PY_VERSION_HEX < 0x03000000 if (likely(v) && !PyLong_Check(v)) { PyObject *tmp = v; v = PyNumber_Long(tmp); Py_DECREF(tmp); } #endif if (likely(v)) { int one = 1; int is_little = (int)*(unsigned char *)&one; unsigned char *bytes = (unsigned char *)&val; int ret = _PyLong_AsByteArray((PyLongObject *)v, bytes, sizeof(val), is_little, !is_unsigned); Py_DECREF(v); if (likely(!ret)) return val; } return (Py_intptr_t)-1; } } static int __Pyx_check_binary_version(void) { char ctversion[4], rtversion[4]; PyOS_snprintf(ctversion, 4, "%d.%d", 
PY_MAJOR_VERSION, PY_MINOR_VERSION); PyOS_snprintf(rtversion, 4, "%s", Py_GetVersion()); if (ctversion[0] != rtversion[0] || ctversion[2] != rtversion[2]) { char message[200]; PyOS_snprintf(message, sizeof(message), "compiletime version %s of module '%.100s' " "does not match runtime version %s", ctversion, __Pyx_MODULE_NAME, rtversion); #if PY_VERSION_HEX < 0x02050000 return PyErr_Warn(NULL, message); #else return PyErr_WarnEx(NULL, message, 1); #endif } return 0; } static int __Pyx_ExportFunction(const char *name, void (*f)(void), const char *sig) { PyObject *d = 0; PyObject *cobj = 0; union { void (*fp)(void); void *p; } tmp; d = PyObject_GetAttrString(__pyx_m, (char *)"__pyx_capi__"); if (!d) { PyErr_Clear(); d = PyDict_New(); if (!d) goto bad; Py_INCREF(d); if (PyModule_AddObject(__pyx_m, (char *)"__pyx_capi__", d) < 0) goto bad; } tmp.fp = f; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION==3&&PY_MINOR_VERSION==0) cobj = PyCapsule_New(tmp.p, sig, 0); #else cobj = PyCObject_FromVoidPtrAndDesc(tmp.p, (void *)sig, 0); #endif if (!cobj) goto bad; if (PyDict_SetItemString(d, name, cobj) < 0) goto bad; Py_DECREF(cobj); Py_DECREF(d); return 0; bad: Py_XDECREF(cobj); Py_XDECREF(d); return -1; } #ifndef __PYX_HAVE_RT_ImportType #define __PYX_HAVE_RT_ImportType static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict) { PyObject *py_module = 0; PyObject *result = 0; PyObject *py_name = 0; char warning[200]; py_module = __Pyx_ImportModule(module_name); if (!py_module) goto bad; py_name = __Pyx_PyIdentifier_FromString(class_name); if (!py_name) goto bad; result = PyObject_GetAttr(py_module, py_name); Py_DECREF(py_name); py_name = 0; Py_DECREF(py_module); py_module = 0; if (!result) goto bad; if (!PyType_Check(result)) { PyErr_Format(PyExc_TypeError, "%s.%s is not a type object", module_name, class_name); goto bad; } if (!strict && (size_t)((PyTypeObject *)result)->tp_basicsize > size) { PyOS_snprintf(warning, 
sizeof(warning), "%s.%s size changed, may indicate binary incompatibility", module_name, class_name); #if PY_VERSION_HEX < 0x02050000 if (PyErr_Warn(NULL, warning) < 0) goto bad; #else if (PyErr_WarnEx(NULL, warning, 0) < 0) goto bad; #endif } else if ((size_t)((PyTypeObject *)result)->tp_basicsize != size) { PyErr_Format(PyExc_ValueError, "%s.%s has the wrong size, try recompiling", module_name, class_name); goto bad; } return (PyTypeObject *)result; bad: Py_XDECREF(py_module); Py_XDECREF(result); return NULL; } #endif #ifndef __PYX_HAVE_RT_ImportModule #define __PYX_HAVE_RT_ImportModule static PyObject *__Pyx_ImportModule(const char *name) { PyObject *py_name = 0; PyObject *py_module = 0; py_name = __Pyx_PyIdentifier_FromString(name); if (!py_name) goto bad; py_module = PyImport_Import(py_name); Py_DECREF(py_name); return py_module; bad: Py_XDECREF(py_name); return 0; } #endif static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line) { int start = 0, mid = 0, end = count - 1; if (end >= 0 && code_line > entries[end].code_line) { return count; } while (start < end) { mid = (start + end) / 2; if (code_line < entries[mid].code_line) { end = mid; } else if (code_line > entries[mid].code_line) { start = mid + 1; } else { return mid; } } if (code_line <= entries[mid].code_line) { return mid; } else { return mid + 1; } } static PyCodeObject *__pyx_find_code_object(int code_line) { PyCodeObject* code_object; int pos; if (unlikely(!code_line) || unlikely(!__pyx_code_cache.entries)) { return NULL; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if (unlikely(pos >= __pyx_code_cache.count) || unlikely(__pyx_code_cache.entries[pos].code_line != code_line)) { return NULL; } code_object = __pyx_code_cache.entries[pos].code_object; Py_INCREF(code_object); return code_object; } static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object) { int pos, i; 
__Pyx_CodeObjectCacheEntry* entries = __pyx_code_cache.entries; if (unlikely(!code_line)) { return; } if (unlikely(!entries)) { entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Malloc(64*sizeof(__Pyx_CodeObjectCacheEntry)); if (likely(entries)) { __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = 64; __pyx_code_cache.count = 1; entries[0].code_line = code_line; entries[0].code_object = code_object; Py_INCREF(code_object); } return; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if ((pos < __pyx_code_cache.count) && unlikely(__pyx_code_cache.entries[pos].code_line == code_line)) { PyCodeObject* tmp = entries[pos].code_object; entries[pos].code_object = code_object; Py_DECREF(tmp); return; } if (__pyx_code_cache.count == __pyx_code_cache.max_count) { int new_max = __pyx_code_cache.max_count + 64; entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Realloc( __pyx_code_cache.entries, new_max*sizeof(__Pyx_CodeObjectCacheEntry)); if (unlikely(!entries)) { return; } __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = new_max; } for (i=__pyx_code_cache.count; i>pos; i--) { entries[i] = entries[i-1]; } entries[pos].code_line = code_line; entries[pos].code_object = code_object; __pyx_code_cache.count++; Py_INCREF(code_object); } #include "compile.h" #include "frameobject.h" #include "traceback.h" static PyCodeObject* __Pyx_CreateCodeObjectForTraceback( const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_srcfile = 0; PyObject *py_funcname = 0; #if PY_MAJOR_VERSION < 3 py_srcfile = PyString_FromString(filename); #else py_srcfile = PyUnicode_FromString(filename); #endif if (!py_srcfile) goto bad; if (c_line) { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #else py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #endif } else { #if PY_MAJOR_VERSION < 3 py_funcname 
= PyString_FromString(funcname); #else py_funcname = PyUnicode_FromString(funcname); #endif } if (!py_funcname) goto bad; py_code = __Pyx_PyCode_New( 0, /*int argcount,*/ 0, /*int kwonlyargcount,*/ 0, /*int nlocals,*/ 0, /*int stacksize,*/ 0, /*int flags,*/ __pyx_empty_bytes, /*PyObject *code,*/ __pyx_empty_tuple, /*PyObject *consts,*/ __pyx_empty_tuple, /*PyObject *names,*/ __pyx_empty_tuple, /*PyObject *varnames,*/ __pyx_empty_tuple, /*PyObject *freevars,*/ __pyx_empty_tuple, /*PyObject *cellvars,*/ py_srcfile, /*PyObject *filename,*/ py_funcname, /*PyObject *name,*/ py_line, /*int firstlineno,*/ __pyx_empty_bytes /*PyObject *lnotab*/ ); Py_DECREF(py_srcfile); Py_DECREF(py_funcname); return py_code; bad: Py_XDECREF(py_srcfile); Py_XDECREF(py_funcname); return NULL; } static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_globals = 0; PyFrameObject *py_frame = 0; py_code = __pyx_find_code_object(c_line ? c_line : py_line); if (!py_code) { py_code = __Pyx_CreateCodeObjectForTraceback( funcname, c_line, py_line, filename); if (!py_code) goto bad; __pyx_insert_code_object(c_line ? 
c_line : py_line, py_code); } py_globals = PyModule_GetDict(__pyx_m); if (!py_globals) goto bad; py_frame = PyFrame_New( PyThreadState_GET(), /*PyThreadState *tstate,*/ py_code, /*PyCodeObject *code,*/ py_globals, /*PyObject *globals,*/ 0 /*PyObject *locals*/ ); if (!py_frame) goto bad; py_frame->f_lineno = py_line; PyTraceBack_Here(py_frame); bad: Py_XDECREF(py_code); Py_XDECREF(py_frame); } static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) { while (t->p) { #if PY_MAJOR_VERSION < 3 if (t->is_unicode) { *t->p = PyUnicode_DecodeUTF8(t->s, t->n - 1, NULL); } else if (t->intern) { *t->p = PyString_InternFromString(t->s); } else { *t->p = PyString_FromStringAndSize(t->s, t->n - 1); } #else /* Python 3+ has unicode identifiers */ if (t->is_unicode | t->is_str) { if (t->intern) { *t->p = PyUnicode_InternFromString(t->s); } else if (t->encoding) { *t->p = PyUnicode_Decode(t->s, t->n - 1, t->encoding, NULL); } else { *t->p = PyUnicode_FromStringAndSize(t->s, t->n - 1); } } else { *t->p = PyBytes_FromStringAndSize(t->s, t->n - 1); } #endif if (!*t->p) return -1; ++t; } return 0; } /* Type Conversion Functions */ static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) { int is_true = x == Py_True; if (is_true | (x == Py_False) | (x == Py_None)) return is_true; else return PyObject_IsTrue(x); } static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x) { PyNumberMethods *m; const char *name = NULL; PyObject *res = NULL; #if PY_VERSION_HEX < 0x03000000 if (PyInt_Check(x) || PyLong_Check(x)) #else if (PyLong_Check(x)) #endif return Py_INCREF(x), x; m = Py_TYPE(x)->tp_as_number; #if PY_VERSION_HEX < 0x03000000 if (m && m->nb_int) { name = "int"; res = PyNumber_Int(x); } else if (m && m->nb_long) { name = "long"; res = PyNumber_Long(x); } #else if (m && m->nb_int) { name = "int"; res = PyNumber_Long(x); } #endif if (res) { #if PY_VERSION_HEX < 0x03000000 if (!PyInt_Check(res) && !PyLong_Check(res)) { #else if (!PyLong_Check(res)) { #endif PyErr_Format(PyExc_TypeError, 
"__%s__ returned non-%s (type %.200s)", name, name, Py_TYPE(res)->tp_name); Py_DECREF(res); return NULL; } } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_TypeError, "an integer is required"); } return res; } static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) { Py_ssize_t ival; PyObject* x = PyNumber_Index(b); if (!x) return -1; ival = PyInt_AsSsize_t(x); Py_DECREF(x); return ival; } static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) { #if PY_VERSION_HEX < 0x02050000 if (ival <= LONG_MAX) return PyInt_FromLong((long)ival); else { unsigned char *bytes = (unsigned char *) &ival; int one = 1; int little = (int)*(unsigned char*)&one; return _PyLong_FromByteArray(bytes, sizeof(size_t), little, 0); } #else return PyInt_FromSize_t(ival); #endif } static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject* x) { unsigned PY_LONG_LONG val = __Pyx_PyInt_AsUnsignedLongLong(x); if (unlikely(val == (unsigned PY_LONG_LONG)-1 && PyErr_Occurred())) { return (size_t)-1; } else if (unlikely(val != (unsigned PY_LONG_LONG)(size_t)val)) { PyErr_SetString(PyExc_OverflowError, "value too large to convert to size_t"); return (size_t)-1; } return (size_t)val; } #endif /* Py_PYTHON_H */ PyCogent-1.5.3/cogent/maths/spatial/ckd3.pxd000644 000765 000024 00000001634 11307235442 021636 0ustar00jrideoutstaff000000 000000 cimport numpy as np ctypedef np.npy_float64 DTYPE_t ctypedef np.npy_uint64 UTYPE_t cdef enum constants: NSTACK = 100 cdef struct kdpoint: UTYPE_t index DTYPE_t *coords cdef struct kdnode: UTYPE_t bucket # 1 if leaf-bucket, 0 if node int dimension DTYPE_t position UTYPE_t start, end # start and end index of data points kdnode *left, *right # pointers to left and right nodes cdef inline void swap(kdpoint*, kdpoint*) cdef kdpoint *points(DTYPE_t*, UTYPE_t, UTYPE_t) cdef inline DTYPE_t dist(kdpoint*, kdpoint*, UTYPE_t) cdef void qsort(kdpoint*, UTYPE_t, UTYPE_t, UTYPE_t) cdef kdnode *build_tree(kdpoint*, UTYPE_t, UTYPE_t, UTYPE_t, UTYPE_t, 
UTYPE_t) cdef UTYPE_t rn(kdnode*, kdpoint*, kdpoint, DTYPE_t**,UTYPE_t**, DTYPE_t, UTYPE_t, UTYPE_t) cdef void *knn(kdnode*, kdpoint*, kdpoint, DTYPE_t*, UTYPE_t*, UTYPE_t, UTYPE_t) PyCogent-1.5.3/cogent/maths/spatial/ckd3.pyx000644 000765 000024 00000025250 12024702176 021663 0ustar00jrideoutstaff000000 000000 #cython: boundscheck=False #(not slicing or indexing any numpy arrays) #TODO: # - extend to more then 3-dimensions (feature) # - replace build_tree with non-recursive function (speed) # - add the option to determine splitting planes based on point position spread # (feature) # - enable bottom-up algorithms by keeping track of ancestral node # - common stack size constant, how? cimport numpy as np from numpy cimport NPY_DOUBLE, NPY_ULONGLONG, npy_intp from stdlib cimport malloc, realloc, free __version__ = "('1', '5', '3')" cdef extern from "numpy/arrayobject.h": cdef object PyArray_SimpleNewFromData(int nd, npy_intp *dims,\ int typenum, void *data) cdef int import_array1(int ret) cdef kdpoint *points(DTYPE_t *c_array, UTYPE_t points, UTYPE_t dims): """creates an array of kdpoints from c-array of numpy doubles.""" cdef kdpoint *pnts = malloc(sizeof(kdpoint)*points) cdef UTYPE_t i for 0 <= i < points: pnts[i].index = i pnts[i].coords = c_array+i*dims return pnts cdef inline void swap(kdpoint *a, kdpoint *b): """swaps two pointers to kdpoint structs.""" cdef kdpoint t t = a[0] a[0] = b[0] b[0] = t cdef inline DTYPE_t dist(kdpoint *a, kdpoint *b, UTYPE_t dims): """calculates the squared distance between two points.""" cdef UTYPE_t i cdef DTYPE_t dif, dst = 0 for 0 <= i < dims: dif = a.coords[i] - b.coords[i] dst += dif * dif return dst cdef void qsort(kdpoint *A, UTYPE_t l, UTYPE_t r, UTYPE_t dim): """implements the quick sort algorithm on kdpoint arrays.""" cdef UTYPE_t i, j, jstack = 0 cdef DTYPE_t v cdef UTYPE_t *istack = malloc(NSTACK * sizeof(UTYPE_t)) while True: if r - l > 2: i = (l + r) >> 1 if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) if 
A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) j = r - 1 swap(&A[i], &A[j]) i = l v = A[j].coords[dim] while True: while A[i+1].coords[dim] < v: i+=1 i+=1 while A[j-1].coords[dim] > v: j-=1 j-=1 if j < i: break swap(&A[i], &A[j]) swap(&A[i], &A[r-1]) jstack += 2 if r - i >= j: istack[jstack] = r istack[jstack - 1] = i r = j else: istack[jstack] = j istack[jstack - 1] = l l = i else: i = (l + r) >> 1 if A[l].coords[dim] > A[i].coords[dim]: swap(&A[l], &A[i]) if A[l].coords[dim] > A[r].coords[dim]: swap(&A[l], &A[r]) if A[i].coords[dim] > A[r].coords[dim]: swap(&A[i], &A[r]) if jstack == 0: break r = istack[jstack] jstack-=1 l = istack[jstack] jstack-=1 free(istack) cdef kdnode *build_tree(kdpoint *point_list, UTYPE_t start, UTYPE_t end,\ UTYPE_t dims, UTYPE_t bucket_size, UTYPE_t depth): """recursive tree building function.""" # cannot make variable in if/else cdef UTYPE_t split, i cdef kdnode *node = malloc(sizeof(kdnode)) node.dimension = depth % dims node.start = start node.end = end if end - start <= bucket_size: # make bucket node node.bucket = 1 node.position = -1.0 node.left = NULL node.right = NULL else: ## make branch node node.bucket = 0 split = (start + end) / 2 qsort(point_list, start, end, node.dimension) node.position = point_list[split].coords[node.dimension] # recurse node.left = build_tree(point_list, start, split, dims , bucket_size , depth+1) node.right = build_tree(point_list, split+1, end, dims , bucket_size , depth+1) return node cdef void *knn(kdnode *root, kdpoint *point_list, kdpoint point, DTYPE_t *dst,\ UTYPE_t *idx, UTYPE_t k, UTYPE_t dims): """finds the K-Nearest Neighbors.""" # arrays of pointers will be used as a stack for left and right nodes # left nodes will be explored first. 
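The recursive build_tree above splits each segment at its median index along a dimension that cycles with depth, and stops with a bucket leaf once a segment is no larger than bucket_size (the median point stays in the left child). A minimal pure-Python sketch of the same median-split construction — all names here are illustrative, not part of the cogent API, and the real code sorts in place with the custom qsort rather than Python's sorted:

```python
# Median-split k-d tree construction mirroring build_tree: cycle the
# split dimension with depth, order the segment on that dimension,
# keep the median in the left child, stop at bucket-sized leaves.
def build(points, depth=0, bucket_size=5, dims=None):
    if dims is None:
        dims = len(points[0])
    if len(points) <= bucket_size:
        return {'bucket': list(points)}        # leaf bucket of points
    axis = depth % dims                        # cycling split dimension
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {'axis': axis,
            'position': pts[mid][axis],        # splitting plane coordinate
            'left': build(pts[:mid + 1], depth + 1, bucket_size, dims),
            'right': build(pts[mid + 1:], depth + 1, bucket_size, dims)}

def count(node):
    """Count points in the tree (every input point lands in one leaf)."""
    if 'bucket' in node:
        return len(node['bucket'])
    return count(node['left']) + count(node['right'])
```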
cdef kdnode *lstack[100] cdef UTYPE_t i, j, jold, ia, kmin # counter and index cdef DTYPE_t a, i_dist, diff cdef kdnode *node # set helper variable to heap-queue kmin = k - 1 # initialize stack cdef int jstack = 1 lstack[jstack] = root # initialize arrays for 0 <= i < k: dst[i] = 1000000000.00 # DBL_MAX idx[i] = 2147483647 # INT_MAX while jstack: node = lstack[jstack] jstack -= 1 if node.bucket: for node.start <= i <= node.end: i_dist = dist(&point_list[i], &point, dims) if i_dist < dst[0]: dst[0] = i_dist idx[0] = i if k > 1: a = dst[0] ia = idx[0] jold = 0 j = 1 while j <= kmin: if (j < kmin) and (dst[j] < dst[j+1]): j+=1 if (a >= dst[j]): break dst[jold] = dst[j] idx[jold] = idx[j] jold = j j = 2*j + 1 dst[jold] = a idx[jold] = ia else: diff = point.coords[node.dimension] - node.position if diff < 0: if dst[0] >= diff * diff: jstack+=1 lstack[jstack] = node.right jstack+=1 lstack[jstack] = node.left else: if dst[0] >= diff * diff: jstack+=1 lstack[jstack] = node.left jstack+=1 lstack[jstack] = node.right cdef UTYPE_t rn(kdnode *root, kdpoint *point_list, kdpoint point, DTYPE_t **dstptr,\ UTYPE_t **idxptr, DTYPE_t r, UTYPE_t dims, UTYPE_t buf): """finds points within radius of query.""" # arrays of pointers will be used as a stack for left and right nodes # left nodes will be explored first. 
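The knn traversal above keeps the k best candidates with dst[0] holding the current worst squared distance, and prunes a subtree when the squared distance to its splitting plane already exceeds that bound. A brute-force reference with the same squared-distance semantics is handy for checking the tree traversal against; this is illustrative code, not part of cogent:

```python
# Brute-force k nearest neighbours using the same squared euclidean
# distance as dist() above; returns (indices, squared_distances)
# ordered from nearest to farthest. Reference implementation only.
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_brute(points, query, k):
    ranked = sorted(range(len(points)), key=lambda i: sqdist(points[i], query))
    best = ranked[:k]
    return best, [sqdist(points[i], query) for i in best]
```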
cdef kdnode *lstack[100] dstptr[0] = malloc(buf * sizeof(DTYPE_t)) idxptr[0] = malloc(buf * sizeof(UTYPE_t)) cdef UTYPE_t i, count # counter and index cdef DTYPE_t i_dist, diff cdef kdnode *node # initialize stack cdef int jstack = 1 lstack[jstack] = root count = 0 while jstack: node = lstack[jstack] jstack -= 1 if node.bucket: for node.start <= i <= node.end: i_dist = dist(&point_list[i], &point, dims) if i_dist < r: dstptr[0][count] = i_dist idxptr[0][count] = i count += 1 if count % buf == 0: dstptr[0] = realloc(dstptr[0], (count + buf) * sizeof(DTYPE_t)) idxptr[0] = realloc(idxptr[0], (count + buf) * sizeof(UTYPE_t)) else: diff = point.coords[node.dimension] - node.position if diff < 0: if r >= diff * diff: jstack+=1 lstack[jstack] = node.right jstack+=1 lstack[jstack] = node.left else: if r >= diff * diff: jstack+=1 lstack[jstack] = node.left jstack+=1 lstack[jstack] = node.right dstptr[0] = realloc(dstptr[0], count * sizeof(DTYPE_t)) idxptr[0] = realloc(idxptr[0], count * sizeof(UTYPE_t)) return count cdef class KDTree: """Implements the KDTree data structure for fast neares neighbor queries.""" cdef np.ndarray n_array cdef DTYPE_t *c_array cdef kdpoint *kdpnts cdef kdnode *tree cdef readonly UTYPE_t dims cdef readonly UTYPE_t pnts cdef readonly UTYPE_t bucket_size def __init__(self, np.ndarray[DTYPE_t, ndim =2] n_array, \ UTYPE_t bucket_size =5): self.bucket_size = bucket_size self.pnts = n_array.shape[0] self.dims = n_array.shape[1] self.n_array = n_array self.c_array = n_array.data self.kdpnts = points(self.c_array, \ self.pnts, self.dims) self.tree = build_tree(self.kdpnts, 0, self.pnts-1, \ self.dims,self.bucket_size,0) import_array1(0) def knn(self, np.ndarray[DTYPE_t, ndim =1] point, npy_intp k): """Finds the K-Nearest Neighbors of given point. Arguments: - point: 1-d numpy array (query point). 
- k: number of neighbors to find.""" if self.pnts < k: return 1 cdef UTYPE_t i cdef kdpoint pnt pnt.coords = point.data cdef UTYPE_t size = point.size cdef DTYPE_t *dst = malloc(k * sizeof(DTYPE_t)) cdef UTYPE_t *idx = malloc(k * sizeof(UTYPE_t)) cdef UTYPE_t *ridx = malloc(k * sizeof(UTYPE_t)) knn(self.tree, self.kdpnts, pnt, dst, idx, k, self.dims) cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &k, NPY_DOUBLE, dst) for 0 <= i < k: ridx[i] = self.kdpnts[idx[i]].index cdef np.ndarray index = PyArray_SimpleNewFromData(1, &k, NPY_ULONGLONG, ridx) free(idx) return (index, dist) def rn(self, np.ndarray[DTYPE_t, ndim =1] point, DTYPE_t r): """Returns Radius Neighbors i.e. within radius from query point.""" cdef UTYPE_t i cdef npy_intp j cdef kdpoint pnt pnt.coords = point.data cdef UTYPE_t size = point.size cdef DTYPE_t **dstptr = malloc(sizeof(DTYPE_t*)) cdef UTYPE_t **idxptr = malloc(sizeof(UTYPE_t*)) j = rn(self.tree, self.kdpnts, pnt, dstptr, idxptr, r, self.dims, 100) cdef np.ndarray dist = PyArray_SimpleNewFromData(1, &j, NPY_DOUBLE, dstptr[0]) cdef UTYPE_t *ridx = malloc(j * sizeof(UTYPE_t)) for 0 <= i < j: ridx[i] = self.kdpnts[idxptr[0][i]].index cdef np.ndarray index = PyArray_SimpleNewFromData(1, &j, NPY_ULONGLONG, ridx) free(idxptr[0]) free(idxptr) free(dstptr) return (index, dist) PyCogent-1.5.3/cogent/maths/matrix/__init__.py000644 000765 000024 00000000563 12024702176 022255 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides classes and algorithms for dealing with matrices. 
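The rn query above compares i_dist (a squared distance) against r, so the threshold is effectively a squared radius, and it grows its result buffers with realloc in increments of buf. The same query expressed in plain Python — a sketch with illustrative names; cogent's version additionally maps hits back to original point indices via kdpnts[...].index:

```python
# Radius query mirroring rn: keep points whose squared distance to the
# query is strictly less than r (note: r is compared against the
# squared distance, matching the i_dist < r test above).
def radius_neighbors(points, query, r):
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    hits = [(i, sqdist(p, query)) for i, p in enumerate(points)]
    hits = [(i, d) for i, d in hits if d < r]
    return [i for i, _ in hits], [d for _, d in hits]
```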
""" __all__ = ['distance'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "caporaso@colorado.edu" __status__ = "Production" PyCogent-1.5.3/cogent/maths/matrix/distance.py000644 000765 000024 00000007440 12024702176 022311 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Code supporting distance matrices with arbitrary row/column labels. Currently used to support amino acid distance matrices and similar. NOTE: This is _much_ slower than using a numpy array. It is primarily convenient when you want to index into a matrix by keys (e.g. by amino acid labels) and when you expect to have a lot of missing values. You will probably want to use this for prototyping, then move to numpy arrays if and when performance becomes an issue. """ from cogent.util.misc import Delegator from cogent.util.dict2d import Dict2D from copy import deepcopy __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "caporaso@colorado.edu" __status__ = "Production" class DistanceMatrix(Dict2D, Delegator): """ 2D dict giving distances from A to B and vice versa """ # default set of amino acids RowOrder = list('ACDEFGHIKLMNPQRSTVWY') ColOrder = list('ACDEFGHIKLMNPQRSTVWY') Pad = True def __init__(self, data=None, RowOrder=None, ColOrder=None, Default=None, Pad=None, RowConstructor=None, info=None): """ Init dict with pre-exisitng data: dict of dicts Usage: data = distance matrix in form acceptable by Dict2D class RowOrder = list of 'interesting keys', default is the set of all amino acids ColOrder = list of 'interesting keys', default is the set of all amino acids Default = value to set padded elements to Pad = boolean describing whether to fill object to hold all 
possible elements based on RowOrder and ColOrder RowConstructor = constructor to use when building inner objects, default dict info = the AAIndexRecord object Power = Power the original matrix has been raised to yield current matrix """ if RowOrder is not None: self.RowOrder = RowOrder if ColOrder is not None: self.ColOrder = ColOrder if Pad is not None: self.Pad = Pad # Initialize super class attributes Dict2D.__init__(self, data=data, RowOrder=self.RowOrder,\ ColOrder=self.ColOrder, Default=Default, Pad=self.Pad,\ RowConstructor=RowConstructor) Delegator.__init__(self, info) # The power to which the original data has been raised to give # the current data, starts at 1., modified by elementPow() # accessed as self.Power self.__dict__['Power'] = 1. def elementPow(self, power, ignore_invalid=True): """ Raises all elements in matrix to power power: the power to raise all elements in the matrix to, must be a floatable value or a TypeError is raise ignore_invalid: leaves invalid (not floatable) matrix data untouched """ try: n = float(power) except ValueError: raise TypeError, 'Must pass a floatable value to elementPow' if ignore_invalid: def Pow(x): try: return x**n except TypeError: return x else: def Pow(x): return x**n self.scale(Pow) self.Power = self.Power * n def copy(self): """ Returns a deep copy of the DistanceMatrix object """ # Is there a better way to do this? It's tricky to keep the delegator # part functioning return deepcopy(self) PyCogent-1.5.3/cogent/format/__init__.py000644 000765 000024 00000001462 12024702176 021124 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Package cogent.format: provides modules for writing specific file formats. 
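DistanceMatrix.elementPow above applies x**n across the matrix via scale() and accumulates the exponent in Power, leaving invalid (non-numeric) cells untouched when ignore_invalid is set. A standalone dict-of-dicts sketch of that behaviour — illustrative only; the real class delegates the traversal to Dict2D.scale:

```python
# Element-wise power over a dict-of-dicts matrix; with ignore_invalid,
# values that do not support ** (e.g. None) are passed through intact,
# mirroring elementPow's tolerant mode.
def element_pow(matrix, power, ignore_invalid=True):
    n = float(power)  # non-floatable power raises here, as in elementPow
    out = {}
    for row, cols in matrix.items():
        out[row] = {}
        for col, val in cols.items():
            try:
                out[row][col] = val ** n
            except TypeError:
                if not ignore_invalid:
                    raise
                out[row][col] = val
    return out
```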
Currently provides: mage: writers for the MAGE 3D visualization program xml: xml base class file: general functions to read and write files """ __all__ = ['alignment', 'clustal', 'fasta', 'mage', 'motif', 'nexus', 'pdb_color', 'pdb', 'phylip', 'rna_struct', 'stockholm', 'structure', 'table', 'text_tree', 'xyzrn'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Gavin Huttley", "Matthew Wakefield", "Rob Knight", "Sandra Smit", "Peter Maxwell", "Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" PyCogent-1.5.3/cogent/format/alignment.py000644 000765 000024 00000020525 12024702176 021344 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os import warnings from cogent.parse.record import FileFormatError __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def save_to_filename(alignment, filename, format, **kw): """Arguments: - alignment: to be written - filename: name of the sequence alignment file - format: the multiple sequence file format """ if format is None: raise FileFormatError("format not known") f = open(filename, 'w') try: write_alignment_to_file(f, alignment, format, **kw) except Exception: try: os.unlink(filename) except Exception: pass raise f.close() def write_alignment_to_file(f, alignment, format, **kw): format = format.lower() if format not in WRITERS: raise FileFormatError("Unsupported file format %s" % format) writer = WRITERS[format](f) writer.writealignment(alignment, **kw) class _AlignmentWriter(file): """A virtual class for writing sequence files.""" def __init__(self, f): self.file = f # other utility function def setblocksize(self, 
size): """Set the length of the sequence to be printed to each line. Arguments: - size: the sequence length to put on each line.""" self.block_size = size def setaligninfo(self, alignmentdict, order=[]): """Set alignment attributes for writing. Arguments: - alignmentdict: dictionary of seqname -> seqstring - order: a list of seqname's in the order for writing """ self.number_sequences = len(alignmentdict) # supersede the use of alignment length self.align_length = len(alignmentdict[alignmentdict.keys()[0]]) if order != [] and len(order) == len(alignmentdict): # not testing contents - possibly should. self.align_order = order else: self.align_order = alignmentdict.keys().sort() def slicestringinblocks(self, seqstring, altblocksize=0): """Return a list of string slices of specified length. No line returns. Arguments: - seqstring: the raw sequence string - altblocksize: the length of sequence for writing to each line, default (0) means default value specified by blocksize will be used. """ if altblocksize: block_size = altblocksize else: block_size = self.block_size blocklist = [] seqlength = len(seqstring) for block in range(0, seqlength, block_size): if block + block_size < seqlength: blocklist.append(seqstring[block: block + block_size]) else: blocklist.append(seqstring[block:]) return blocklist def wrapstringtoblocksize(self, seqstring, altblocksize=0): """Return sequence slices with line returns inserted at the end of each slice. Arguments: - seqstring: the raw sequence string - altblocksize: the length of sequence for writing to each line, default (0) means default value specified by blocksize will be used. """ if altblocksize: self.block_size = altblocksize strlist = self.slicestringinblocks(seqstring, self.block_size) return '\n'.join(strlist) + "\n" class PhylipWriter(_AlignmentWriter): def writealignment(self, alignmentdict, block_size=60, order=[]): """Write the alignment to a file. Arguments: - alignmentdict: dict of seqname -> seqstring. 
- blocksize: the sequence length to write to each line, default is 60 - order: optional list of sequence names, which order to print in. (Assumes complete and correct list of names) """ #setup if not order: order = alignmentdict.keys() self.setaligninfo(alignmentdict, order) self.setblocksize(block_size) # header self.file.write('%d %d\n' %(self.number_sequences, self.align_length)) # sequences (pretty much as writ by Gavin) for seqname in self.align_order: seq = alignmentdict[seqname] for block in range(0, self.align_length, self.block_size): if not block: # write the otu name if len(seqname) > 9: warnings.warn('Name "%s" too long, truncated to "%s"' % (seqname, seqname[:9])) prefix = '%-10s' % seqname[:9] else: prefix = '%-10s' % seqname else: prefix = ' ' * 10 if block + self.block_size > self.align_length: to = self.align_length else: to = block + self.block_size self.file.write('%s%s\n' % (prefix, seq[block:to])) class PamlWriter(_AlignmentWriter): def writealignment(self, alignmentdict, block_size=60, order=[]): """Write the alignment to a file. Arguments: - alignmentdict: dict of seqname -> seqstring. - blocksize: the sequence length to write to each line, default is 60 - order: optional list of sequence names, which order to print in. (Assumes order is a complete and correct list of names) """ #setup if not order: order = alignmentdict.keys() self.setaligninfo(alignmentdict, order) self.setblocksize(block_size) #header self.file.write('%d %d\n' % (self.number_sequences, self.align_length)) #sequences for seq in self.align_order: self.file.writelines('%s\n%s' % (seq,self.wrapstringtoblocksize(alignmentdict[seq], altblocksize = block_size))) class FastaWriter(_AlignmentWriter): def writealignment(self, alignmentdict, block_size=60, order=[]): """Write the alignment to a file. Arguments: - alignmentdict: dict of seqname -> seqstring. 
- blocksize: the sequence length to write to each line, default is 60 - order: optional list of sequence names, which order to print in. (Assumes complete and correct list of names) """ #setup if not order: order = alignmentdict.keys() self.setaligninfo(alignmentdict, order) self.setblocksize(block_size) #sequences for seq in self.align_order: self.file.writelines('>%s\n%s' % (seq, self.wrapstringtoblocksize(alignmentdict[seq], altblocksize = block_size))) class GDEWriter(_AlignmentWriter): def writealignment(self, alignmentdict, block_size=60, order=[]): """Write the alignment to a file. Arguments: - alignmentdict: dict of seqname -> seqstring. - blocksize: the sequence length to write to each line, default is 60 - order: optional list of sequence names, which order to print in. (Assumes complete and correct list of names) """ #setup if not order: order = alignmentdict.keys() self.setaligninfo(alignmentdict, order) self.setblocksize(block_size) for seq in self.align_order: self.file.writelines('%s%s\n%s' % ("%", seq, self.wrapstringtoblocksize(alignmentdict[seq], altblocksize = block_size))) # to add a new file format add it's suffix and class name here WRITERS = { 'phylip': PhylipWriter, 'paml': PamlWriter, 'fasta': FastaWriter, 'mfa': FastaWriter, 'fa': FastaWriter, 'gde': GDEWriter, } PyCogent-1.5.3/cogent/format/bedgraph.py000644 000765 000024 00000011753 12024702176 021145 0ustar00jrideoutstaff000000 000000 from cogent.util.misc import get_merged_by_value_coords __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "alpha" # following from https://cgwb.nci.nih.gov/goldenPath/help/bedgraph.html # track type=bedGraph name=track_label description=center_label # visibility=display_mode color=r,g,b altColor=r,g,b # priority=priority autoScale=on|off alwaysZero=on|off # 
gridDefault=on|off maxHeightPixels=max:default:min # graphType=bar|points viewLimits=lower:upper # yLineMark=real-value yLineOnOff=on|off # windowingFunction=maximum|mean|minimum smoothingWindow=off|2-16 # Data Values # Bedgraph track data values can be integer or real, positive or negative # values. Chromosome positions are specified as 0-relative. The first # chromosome position is 0. The last position in a chromosome of length N # would be N - 1. Only positions specified have data. Positions not # specified do not have data and will not be graphed. All positions specified # in the input data must be in numerical order. The bedGraph format has four # columns of data: bedgraph_fields = ('name', 'description', 'visibility', 'color', 'altColor', 'priority', 'autoScale', 'alwaysZero', 'gridDefault', 'maxHeightPixels', 'graphType', 'viewLimits', 'yLineMark', 'yLineOnOff', 'windowingFunction', 'smoothingWindow') _booleans = ('autoScale', 'alwaysZero', 'gridDefault', 'yLineOnOff') valid_values = dict(autoScale=['on', 'off'], graphType=['bar', 'points'], windowingFunction=['maximum', 'mean', 'minimum'], smoothingWindow=['off']+map(str,range(2,17))) def raise_invalid_vals(key, val): """raises RuntimeError on invalid values for keys """ if key not in valid_values: return True if not str(val) in valid_values[key]: raise AssertionError('Invalid bedgraph key/val pair: '\ + 'got %s=%s; valid values are %s' % (key, val, valid_values[key])) def booleans(key, val): """returns ucsc formatted boolean""" if val in (1, True, 'on', 'On', 'ON'): val = 'on' else: val = 'off' return val def get_header(name=None, description=None, color=None, **kwargs): """returns header line for bedgraph""" min_header = 'track type=bedGraph name="%(name)s" '\ + 'description="%(description)s" color=%(color)s' assert None not in (name, description, color) header = [min_header % {'name': name, 'description': description, 'color': ','.join(map(str,color))}] if kwargs: if not set(kwargs) <= 
set(bedgraph_fields): not_allowed = set(kwargs) - set(bedgraph_fields) raise RuntimeError( "incorrect arguments provided to bedgraph %s" % str(list(not_allowed))) if 'altColor' in kwargs: kwargs['altColor'] = ','.join(map(str,kwargs['altColor'])) header_suffix = [] for key in kwargs: if key in _booleans: kwargs[key] = booleans(key, kwargs[key]) raise_invalid_vals(key, kwargs[key]) header_suffix.append('%s=%s' % (key, kwargs[key])) header += header_suffix return ' '.join(header) def bedgraph(chrom_start_end_val, digits=2, name=None, description=None, color=None, **kwargs): """returns a bed formatted string. Input data must be provided as [(chrom, start, end, val), ...]. These will be merged such that adjacent records with the same value will be combined. Arguments: - name: track name - description: track description - color: (R,G,B) tuple of ints where max val of int is 255, e.g. red is (255, 0, 0) - **kwargs: keyword=val, .. valid bedgraph format modifiers see https://cgwb.nci.nih.gov/goldenPath/help/bedgraph.html """ header = get_header(name=name, description=description, color=color, **kwargs) make_data_row = lambda x: '\t'.join(map(str, x)) # get independent spans for each chromosome bedgraph_data = [] data = [] curr_chrom = None for chrom, start, end, val in chrom_start_end_val: if curr_chrom is None: curr_chrom = chrom if curr_chrom != chrom: data = get_merged_by_value_coords(data, digits=digits) bedgraph_data += [make_data_row([curr_chrom, s, e, v]) for s, e, v in data] data = [] curr_chrom = chrom else: data.append([start, end, val]) if data != []: data = get_merged_by_value_coords(data, digits=digits) bedgraph_data += [make_data_row([curr_chrom, s, e, v]) for s, e, v in data] bedgraph_data = [header] + bedgraph_data return '\n'.join(bedgraph_data) PyCogent-1.5.3/cogent/format/clustal.py000644 000765 000024 00000003674 12024702176 021043 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Writer for Clustal format. 
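The bedgraph writer relies on get_merged_by_value_coords (from cogent.util.misc) to collapse runs of records whose values agree after rounding to `digits`. A sketch of that merging step, under the simplifying assumption that spans arrive sorted and that runs to be merged are contiguous (an approximation of the real helper, not its exact behaviour):

```python
# Collapse adjacent [start, end, value] spans whose rounded values
# match, keeping the first start and last end of each run. Assumes
# sorted, contiguous input; approximates get_merged_by_value_coords.
def merge_by_value(spans, digits=2):
    merged = []
    for start, end, val in spans:
        val = round(val, digits)
        if merged and merged[-1][2] == val and merged[-1][1] == start:
            merged[-1][1] = end            # extend the current run
        else:
            merged.append([start, end, val])
    return merged
```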
""" from cogent.core.alignment import SequenceCollection from copy import copy __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" def clustal_from_alignment(aln, interleave_len=None): """Returns a string in Clustal format. - aln: can be an Alignment object or a dict. - interleave_len: sequence line width. Only available if sequences are aligned. """ if not aln: return '' # get seq output order try: order = aln.RowOrder except: order = aln.keys() order.sort() seqs = SequenceCollection(aln) clustal_list = ["CLUSTAL\n"] if seqs.isRagged(): raise ValueError,\ "Sequences in alignment are not all the same length." +\ "Cannot generate Clustal format." aln_len = seqs.SeqLen #Get all labels labels = copy(seqs.Names) #Find all label lengths in order to get padding. label_lengths = [len(l) for l in labels] label_max = max(label_lengths) max_spaces = label_max+4 #Get ordered seqs ordered_seqs = [seqs.NamedSeqs[label] for label in order] if interleave_len is not None: curr_ix = 0 while curr_ix < aln_len: clustal_list.extend(["%s%s%s"%(x,' '*(max_spaces-len(x)),\ y[curr_ix:curr_ix+ \ interleave_len]) for x,y in zip(order,ordered_seqs)]) clustal_list.append("") curr_ix += interleave_len else: clustal_list.extend(["%s%s%s"%(x,' '*(max_spaces-len(x)),y) \ for x,y in zip(order,ordered_seqs)]) clustal_list.append("") return '\n'.join(clustal_list) PyCogent-1.5.3/cogent/format/fasta.py000644 000765 000024 00000005731 12024702176 020466 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Writer for FASTA sequence format """ __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = 
"jeremy.widmann@colorado.edu" __status__ = "Production" class _fake_seq(str): """a holder for string sequences that allows provision of a seq.Label attribute, required by fasta formatting funcs.""" def __new__(cls, Label, Seq): new = str.__new__(cls, Seq) new.Label = Label return new def __getslice__(self, *args, **kwargs): new_seq = str.__getslice__(self, *args, **kwargs) return self.__new__(self.__class__,self.Label, new_seq) def fasta_from_sequences(seqs, make_seqlabel = None, line_wrap = None): """Returns a FASTA string given a list of sequences. A sequence.Label attribute takes precedence over sequence.Name. - seqs can be a list of sequence objects or strings. - make_seqlabel: callback function that takes the seq object and returns a label str - line_wrap: a integer for maximum line width """ fasta_list = [] for i,seq in enumerate(seqs): # Check if it has a label, or one is to be created label = str(i) if make_seqlabel is not None: label = make_seqlabel(seq) elif hasattr(seq, 'Label') and seq.Label: label = seq.Label elif hasattr(seq, 'Name') and seq.Name: label = seq.Name # wrap sequence lines seq_str = str(seq) if line_wrap is not None: numlines,remainder = divmod(len(seq_str),line_wrap) if remainder: numlines += 1 body = ["%s" % seq_str[j*line_wrap:(j+1)*line_wrap] \ for j in range(numlines)] else: body = ["%s" % seq_str] fasta_list.append('>'+label) fasta_list += body return '\n'.join(fasta_list) def fasta_from_alignment(aln, make_seqlabel=None, line_wrap=None, sorted=True): """Returns a FASTA string given an alignment. - aln can be an Alignment object or dict. 
- make_seqlabel: callback function that takes the seq object and returns a label str - line_wrap: a integer for maximum line width """ # get seq output order try: order = aln.Names[:] except AttributeError: order = aln.keys() if sorted: order.sort() try: seq_dict = aln.NamedSeqs except AttributeError: seq_dict = aln ordered_seqs = [] for label in order: seq = seq_dict[label] if isinstance(seq, str): seq = _fake_seq(label, seq) ordered_seqs.append(seq) return fasta_from_sequences(ordered_seqs, make_seqlabel=make_seqlabel, line_wrap=line_wrap) PyCogent-1.5.3/cogent/format/mage.py000644 000765 000024 00000062303 12024702176 020277 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ MAGE format writer, especially useful for writing the RNA/DNA simplex. The MAGE format is documented here: ftp://kinemage.biochem.duke.edu/pub/kinfiles/docsDemos/KinFmt6.19.txt Implementation notes Mage seems to have a dynamic range of 'only' 2-3 orders of magnitude for balls and spheres. Consequently, although we have to truncate small values when writing in the RNA simplex, no additional output accuracy is gained by scaling everything by a large constant factor (e.g. by 10,000). Consequently, this code preserves the property of writing the simplex in the interval (0,1) since the numbers reflect the fractions of A, C, G and U directly. 
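fasta_from_sequences above wraps each sequence with divmod(len(seq), line_wrap), adding one extra line when there is a remainder. The wrapping step in isolation, with illustrative names:

```python
# Format one FASTA record, wrapping the sequence at line_wrap columns
# when given; mirrors the divmod-based wrapping in fasta_from_sequences.
def wrap_fasta(label, seq, line_wrap=None):
    lines = ['>' + label]
    if line_wrap is None:
        lines.append(seq)
    else:
        numlines, remainder = divmod(len(seq), line_wrap)
        if remainder:
            numlines += 1
        lines += [seq[j * line_wrap:(j + 1) * line_wrap]
                  for j in range(numlines)]
    return '\n'.join(lines)
```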
""" from __future__ import division from numpy import array, fabs from copy import deepcopy from cogent.util.misc import extract_delimited from cogent.maths.stats.util import Freqs from cogent.maths.stats.special import fix_rounding_error __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Gavin Huttley", "Sandra Smit", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Development" def is_essentially(ref_value, value, round_error=1e-14): """Returns True if value is within round_error distance from ref_value. ref_value -- int or float, number value -- int or float, number round_error -- limit on deviation from ref_value, default=1e-14 This function differs from fix_rounding_error: - it returns a boolean, not 1 or 0. - it checks the lower bound as well, not only if ref_value < value < ref_value+round_error """ if ref_value-round_error < value < ref_value+round_error: return True return False #Note: in addition to the following 24 colors, there's also deadwhite and #deadblack (force white or black always). The color names cannot be changed. #MAGE uses an internal color table to look up 4 'steps' in each color #corresponding to distance from the camera, and varies the colors depending #on whether the background is black or white. A @fullrgbpalette declaration #can be used to reassign colors by number, but it seems to be tricky to #get it right. MageColors = [ 'red', 'orange', 'gold', 'yellow', 'lime', 'green', 'sea', 'cyan', 'sky', 'blue', 'purple', 'magenta', 'hotpink', 'pink', 'lilac', 'peach', 'peachtint', 'yellowtint', 'greentint', 'bluetint', 'lilactint', 'pinktint', 'white', 'gray', 'brown'] #use these for 7-point scales RainbowColors = ['red', 'orange', 'yellow', 'green', 'cyan', 'blue', 'purple'] #use these for matched series of up to 3 items. 
The first 5 series work pretty #well, but the last series doesn't really match so only use it if desperate. MatchedColors = [['red', 'green', 'blue'], ['pinktint', 'greentint', 'bluetint'], ['orange', 'lime', 'sky'], ['magenta', 'yellow', 'cyan'], ['peach', 'sea', 'purple'], ['white', 'gray', 'brown']] class MagePoint(object): """Holds information for a Mage point: label, attributes, coordinates.""" MinRadius = 0.001 DefaultRadius = MinRadius def __init__(self, Coordinates=None, Label=None, Color=None, State=None, Width=None, Radius=None): """Returns a new MagePoint with specified coordinates. Usage: m = MagePoint(Coordinates=None, Label=None, Color=None, Attributes=None, Width=None, Radius=None) Coordinates is a list of floats: x, y, z Label is a text label (warning: does not check length). Color is the name of the point's color State can be one of: P beginning of new point in vectorlist U unpickable L line to this point B ball at this point in a vectorlist S sphere at this point in a vectorlist R ring at this point in a vectorlist Q square at this point in a vectorlist A unknown (arrow?): something to do with vectorlist T interpret as part of triangle in vectorlist Width is an integer that defines pen width Radius is a float that defines the point radius for balls, spheres, etc. 
If Coordinates is None, the Coordinates will be set to [0,0,0] """ if Coordinates is None: self.Coordinates = [0,0,0] else: self.Coordinates = Coordinates self.Label = Label self.Color = Color self.State = State self.Width = Width self.Radius = Radius def __str__(self): """Prints out current point as string.""" coords = self.Coordinates if len(coords) != 3: raise ValueError, "Must have exactly 3 coordinates: found %s" \ % coords pieces = [] #add the pointID if self.Label is not None: pieces.append('{%s}' % self.Label) #collect attributes and add them if necessary attr = [] if self.State: attr.append(self.State) if self.Color: attr.append(self.Color) if self.Width: attr.append('width%s' % self.Width) r = self.Radius #NOTE: as of version 6.25, MAGE could not handle radius < 0.001 if r is not None: if r > self.MinRadius: attr.append('r=%s' % r) else: attr.append('r=%s' % self.DefaultRadius) if attr: pieces += attr #add the coordinates pieces.extend(map(str, coords)) #return the result return ' '.join(pieces) def __cmp__(self, other): """Checks equality/inequality on all fields. Order: Coordinates, Label, Color, State, Width, Radius""" try: return cmp(self.__dict__, other.__dict__) except AttributeError: #some objects don't have __dict__... return cmp(self.__dict__, other) def _get_coord(coord_idx): """Gets the coordinate asked for; X is first, Y second, Z third coord """ def get_it(self): return self.Coordinates[coord_idx] return get_it def _set_coord(coord_idx): """Sets the given coordinate; X is first, Y second, Z third coord""" def set_it(self,val): self.Coordinates[coord_idx] = val return set_it _lookup = {'x':(_get_coord(0),_set_coord(0)), 'y':(_get_coord(1),_set_coord(1)), 'z':(_get_coord(2),_set_coord(2))} X = property(*_lookup['x']) Y = property(*_lookup['y']) Z = property(*_lookup['z']) def toCartesian(self, round_error=1e-14): """Returns new MagePoint with UC,UG,UA coordinates from A,C,G(,U). 
round_error -- float, accepted rounding error x=u+c, y=u+g, z=u+a This will only work for coordinates in a simplex, where all of them (including the implicit one) add up to one. """ # all values have to be between 0 and 1 a = fix_rounding_error(self.X) c = fix_rounding_error(self.Y) g = fix_rounding_error(self.Z) #u = fix_rounding_error(1-a-c-g) if is_essentially(1, a+c+g, round_error=round_error): u = 0.0 else: u = fix_rounding_error(1-a-c-g) for coord in [a,c,g,u]: if not 0 <= coord <= 1: raise ValueError,\ "%s is not in unit simplex (between 0 and 1)"%(coord) cart_x, cart_y, cart_z = u+c, u+g, u+a result = deepcopy(self) result.Coordinates = [cart_x, cart_y, cart_z] return result def fromCartesian(self): """Returns new MagePoint with A,C,G(,U) coordinates from UC,UG,UA. From UC,UG,UA to A,C,G(,U). This will only work when the original coordinates come from a simplex, where U+C+A+G=1 """ # x=U+C, y=U+G, z=U+A, U+C+A+G=1 # U=(1-x-y-z)/-2 x,y,z = self.X, self.Y, self.Z u = fix_rounding_error((1-x-y-z)/-2) a, c, g = map(fix_rounding_error,[z-u, x-u, y-u]) result = deepcopy(self) result.Coordinates = [a, c, g] return result def MagePointFromBaseFreqs(freqs, get_label=None, get_color=None, \ get_radius=None): """Returns a MagePoint from an object with counts for the bases. get_label should be a function that calculates a label from the freqs. If get_label is not supplied, checks freqs.Label, freqs.Species, freqs.Id, freqs.Accession, and freqs.Name in that order. If get_label fails or none of the attributes is found, no label is written. get_color should be a function that calculates a color from the freqs. Default is no color (i.e. the point has the color for the series), which will also happen if get_color fails. get_radius is similar to get_color. 
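The toCartesian/fromCartesian pair above implements the simplex-to-Cartesian change of coordinates (x=u+c, y=u+g, z=u+a, with u the implicit fourth frequency). A minimal standalone sketch of the round trip, with hypothetical helper names and none of MagePoint's rounding-error handling:

```python
# Standalone sketch of the mapping used by MagePoint.toCartesian() /
# fromCartesian(). Helper names are hypothetical, not part of this module.

def simplex_to_cartesian(a, c, g):
    """Map (A, C, G) simplex coordinates (with implicit U) to (x, y, z)."""
    u = 1.0 - a - c - g           # implicit fourth coordinate
    return (u + c, u + g, u + a)  # x = U+C, y = U+G, z = U+A

def cartesian_to_simplex(x, y, z):
    """Invert the mapping: x+y+z = 2U+1, so U = (1-x-y-z)/-2."""
    u = (1.0 - x - y - z) / -2.0
    return (z - u, x - u, y - u)  # (A, C, G)

# Round trip: A=0.5, C=0.25, G=0.125 (so U=0.125)
x, y, z = simplex_to_cartesian(0.5, 0.25, 0.125)
a, c, g = cartesian_to_simplex(x, y, z)  # recovers (0.5, 0.25, 0.125)
```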
""" label = None if get_label: try: label = get_label(freqs) except: pass #label will be assigned None below else: for attr in ['Label', 'Species', 'Id', 'Accession', 'Name']: if hasattr(freqs, attr): label = getattr(freqs, attr) #keep going if the label is empty if label is not None and label != '': break if not label and label != 0: label = None if get_color: try: color = get_color(freqs) except: color=None else: if hasattr(freqs, 'Color'): color = freqs.Color else: color = None if get_radius: try: radius = get_radius(freqs) except: radius=None else: if hasattr(freqs, 'Radius'): try: radius = float(freqs.Radius) except: radius = None else: radius = None relevant = Freqs({'A':freqs.get('A',0), 'C':freqs.get('C',0), 'G':freqs.get('G',0), 'U':freqs.get('U',0) or freqs.get('T',0)}) relevant.normalize() return MagePoint((relevant['A'],relevant['C'],relevant['G']), Label=label,\ Color=color, Radius=radius) class MageList(list): """Holds information about a list of Mage points. """ KnownStyles = ['vector', 'dot', 'label', 'word', 'ball', 'sphere', \ 'triangle', 'ribbon', 'arrow', 'mark', 'ring', 'fan'] BooleanOptions = ['Off', 'NoButton'] KeywordOptions = ['Color','Radius','Angle','Width','Face','Font','Size'] def __init__(self, Data=None, Label=None, Style=None, Color=None, \ Off=False, NoButton=False, Radius=None, Angle=None, Width=None, \ Face=None, Font=None, Size=None): """Returns new MageList object. Must initialize with list of MagePoints. 
""" if Data is None: super(MageList,self).__init__([]) else: super(MageList, self).__init__(Data) self.Label = Label self.Style = Style self.Color = Color self.Off = Off self.NoButton = NoButton self.Radius = Radius self.Angle = Angle self.Width = Width self.Face = Face self.Font = Font self.Size = Size def __str__(self): """Returns data as a mage-format string, one point to a line.""" pieces = [] curr_style = self.Style or 'dot' #dotlists by default pieces.append('@%slist' % curr_style.lower()) if self.Label: pieces.append('{%s}' % self.Label) for opt in self.BooleanOptions: if getattr(self, opt): pieces.append(opt.lower()) for opt in self.KeywordOptions: curr = getattr(self, opt) if curr: pieces.append('%s=%s' % (opt.lower(), curr)) lines = [' '.join(pieces)] + [str(d) for d in self] return '\n'.join(lines) def toArray(self, include_radius=True): """Returns an array of the three coordinates and the radius for all MP """ elem = [] if include_radius: for point in self: coords = point.Coordinates radius = point.Radius or self.Radius if radius is None: raise ValueError,\ "Radius is not set for point or list, %s"%(str(point)) else: elem.append(coords + [radius]) else: for point in self: coords = point.Coordinates elem.append(coords) return array(elem) def iterPoints(self): """Iterates over all points in the list""" for point in self: yield point def toCartesian(self, round_error=1e-14): """Returns new MageList where points are in Cartesian coordinates""" #create new list result = MageList() #copy all attributes from the original result.__dict__ = self.__dict__.copy() #fill with new points for point in self.iterPoints(): result.append(point.toCartesian(round_error=round_error)) return result def fromCartesian(self): """Returns new MageList where points are in ACG coordinates""" #create new list result = MageList() #copy all attributes from the original result.__dict__ = self.__dict__.copy() #fill with new points for point in self.iterPoints(): 
result.append(point.fromCartesian()) return result class MageGroup(list): """Holds information about a MAGE Group, which has a list of lists/groups. Specifically, any nested MageGroups will be treated as MAGE @subgroups, while any dotlists, vectorlists, etc. will be treated directly. """ BooleanOptions = ['Off', 'NoButton', 'RecessiveOn', 'Dominant', 'Lens'] KeywordOptions = ['Master', 'Instance', 'Clone'] Cascaders = ['Style', 'Color','Radius','Angle','Width','Face','Font','Size'] def __init__(self, Data=None, Label=None, Subgroup=False, Off=False, \ NoButton=False, RecessiveOn=True, Dominant=False, Master=None, \ Instance=None, Clone=None, Lens=False, Style=None, Color=None, \ Radius=None, Angle=None, Width=None, Face=None, Font=None, Size=None): """Returns new MageGroup object. Must initialize with list of MageLists or MageGroups. """ if Data is None: super(MageGroup,self).__init__([]) else: super(MageGroup,self).__init__(Data) self.Label = Label self.Subgroup = Subgroup self.Off = Off self.NoButton = NoButton self.RecessiveOn = RecessiveOn self.Dominant = Dominant self.Master = Master self.Instance = Instance self.Clone = Clone self.Lens = Lens #properties of lists. Group settings will be expressed when Kinemage #object is printed self.Style = Style self.Color = Color self.Radius = Radius self.Angle = Angle self.Width = Width self.Face = Face self.Font = Font self.Size = Size def __str__(self): """Returns data as a mage-format string, one point to a line. Children are temporarily mutated to output the correct string. Changes are undone afterwards, so the original doesn't change. 
""" pieces = [] if self.Subgroup: pieces.append('@subgroup') else: pieces.append('@group') if self.Label: pieces.append('{%s}' % self.Label) for opt in self.BooleanOptions: if getattr(self, opt): pieces.append(opt.lower()) for opt in self.KeywordOptions: curr = getattr(self, opt) if curr: pieces.append('%s={%s}' % (opt.lower(), curr)) for d in self: if isinstance(d, MageGroup): d.Subgroup = True # cascade values 1 level down (to either child lists or groups) changed = {} for attr in self.Cascaders: val = getattr(self,attr) if val is not None: changed[attr] = [] for item in self: #either group or list if getattr(item,attr) is None: setattr(item,attr,val) changed[attr].append(item) #gather all lines lines = [' '.join(pieces)] + [str(d) for d in self] #reset cascaded values for k,v in changed.items(): for l in v: setattr(l,k,None) return '\n'.join(lines) def iterGroups(self): """Iterates over all groups in this group""" for item in self: if isinstance(item,MageGroup): yield item for j in item.iterGroups(): yield j def iterLists(self): """Iterates over all lists in this group""" for item in self: if isinstance(item,MageList): yield item else: for j in item.iterLists(): yield j def iterGroupsAndLists(self): """Iterates over all groups and lists in this group Groups and Lists have multiple elements in common. You might want to change a property for both groups and lists. This iterator doesn't distinguish between them, but returns them all. 
""" for i in self: if isinstance(i,MageGroup): yield i for j in i.iterGroupsAndLists(): yield j elif isinstance(i,MageList): yield i def iterPoints(self): """Iterates over all points in this group""" for item in self.iterLists(): for p in item.iterPoints(): yield p def toCartesian(self, round_error=1e-14): """Returns a new MageGroup where all points are Cartesian coordinates """ #create an empty group result = MageGroup() #copy the attributes of the original result.__dict__ = self.__dict__.copy() #fill with new groups and lists for item in self: result.append(item.toCartesian(round_error=round_error)) return result def fromCartesian(self): """Returns a new MageGroup where all points are ACG coordinates """ #create an empty group result = MageGroup() #copy the attributes of the original result.__dict__ = self.__dict__.copy() #fill with new groups and lists for item in self: result.append(item.fromCartesian()) return result class MageHeader(object): """Defines the header for a kinemage. For now, just returns the text string it was initialized with. """ def __init__(self, data): """Returns a new MageHeader object.""" self.Data = data def __str__(self): """Writes the header information out as a string.""" return str(self.Data) class SimplexHeader(MageHeader): """Defines the header for the RNA simplex. May have a bunch of scaling and coloring options later. """ def __init__(self, *args, **kwargs): """Returns a new SimplexHeader object. May have options added later, but none for now. 
""" self.Data = \ """ @viewid {oblique} @zoom 1.05 @zslab 467 @center 0.500 0.289 0.204 @matrix -0.55836 -0.72046 -0.41133 0.82346 -0.42101 -0.38036 0.10085 -0.55108 0.82833 @2viewid {top} @2zoom 0.82 @2zslab 470 @2center 0.500 0.289 0.204 @2matrix -0.38337 0.43731 -0.81351 0.87217 -0.11840 -0.47466 -0.30389 -0.89148 -0.33602 @3viewid {side} @3zoom 0.82 @3zslab 470 @3center 0.500 0.289 0.204 @3matrix -0.49808 -0.81559 -0.29450 0.86714 -0.46911 -0.16738 -0.00164 -0.33875 0.94088 @4viewid {End-O-Line} @4zoom 1.43 @4zslab 469 @4center 0.500 0.289 0.204 @4matrix 0.00348 -0.99984 -0.01766 0.57533 -0.01244 0.81784 -0.81792 -0.01301 0.57519 @perspective @fontsizelabel 24 @onewidth @zclipoff @localrotation 1 0 0 .5 .866 0 .5 .289 .816 @group {Tetrahedron} @vectorlist {Edges} color=white nobutton P {0 0 0} 0 0 0 0.5 0 0 {1 0 0} 1 0 0 0.5 0.5 0 {0 1 0} 0 1 0 P 0 0 0 0 0.5 0 {0 1 0} 0 1 0 0 0.5 0.5 {0 0 1} 0 0 1 P 0 0 0 0 0 0.5 {0 0 1} 0 0 1 0.5 0 0.5 {1 0 0} 1 0 0 @labellist {labels} color=white nobutton {U} 0 0 0 {A} 1.1 0 0 {C} 0 1.05 0 {G} 0 0 1.08 @group {Lines} @vectorlist {A=U&C=G} color= green off P 0 0.5 0.5 .1 .4 .4 .25 .25 .25 .4 .1 .1 L 0.500, 0.000, 0.000 @vectorlist {A=G&C=U} color= red off P 0.5 0 0.5 .25 .25 .25 L 0, 0.500, 0.000 @vectorlist {A=C&G=U} color= red off P 0.5 0.5 0 .25 .25 .25 L 0.000, 0.000, 0.500 """ class CartesianSimplexHeader(MageHeader): """Defines the header for the RNA simplex: coordinates in Cartesian system. May have a bunch of scaling and coloring options later. """ def __init__(self, *args, **kwargs): """Returns a new CartesianSimplexHeader object. May have options added later, but none for now. 
""" self.Data = \ """ @1viewid {Oblique} @1span 2.2 @1zslab 200.0 @1center 0.5 0.5 0.5 @1matrix 0.2264 0.406 0.8854 0.6132 0.6468 -0.4535 -0.7567 0.6456 -0.1025 @2viewid {Down UA axis} @2span 2.0 @2zslab 200.0 @2center 0.5 0.5 0.5 @2matrix 1 0 0 0 1 0 0 0 1 @3viewid {Down UC axis} @3span 2.0 @3zslab 200.0 @3center 0.5 0.5 0.5 @3matrix 0 0 1 0 1 0 -1 0 0 @4viewid {Down UG axis} @4span 2.0 @4zslab 200.0 @4center 0.5 0.5 0.5 @4matrix 0 -1 0 0 0 1 -1 0 0 @5viewid {Cube} @5span 2.0 @5zslab 200.0 @5center .5 0.5 0.5 @5matrix 0.956 -0.0237 0.2923 0.2933 0.0739 -0.9532 0.001 0.997 0.0776 @group {Tetrahedron} @vectorlist {Edges} nobutton color= white {Edges}P 1 1 1 1 0 0 0 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 1 1 0 1 0 @labellist {labels} nobutton color= white {U}1.05 1.05 1.05 {A}-0.05 -0.05 1.05 {C}1.05 -0.05 -0.05 {G}-0.05 1.05 -0.05 @group {Lines} @vectorlist {A=U&C=G} color= green {A=U&C=G}P 0.5 0.5 1 0.5 0.5 0 @vectorlist {A=G&C=U} color= red {A=G&C=U}P 0 0.5 0.5 1 0.5 0.5 @vectorlist {A=C&G=U} color= red {A=C&G=U}P 0.5 0 0.5 0.5 1 0.5 @group {Cube} off @vectorlist nobutton color=yellow {"}P 0 0 0 {"}1 0 0 {"}1 1 0 {"}0 1 0 {"}0 0 0 {"}P 0 0 1 {"}1 0 1 {"}1 1 1 {"}0 1 1 {"}0 0 1 {"}P 0 0 0 {"}0 0 1 {"}P 1 0 0 {"}1 0 1 {"}P 1 1 0 {"}1 1 1 {"}P 0 1 0 {"}0 1 1 @labellist {Cube labels} color= red off {0,0,0}-0.1 -0.1 -0.1 {1,1,1}1.1 1.1 1.1 {0,0,1} -.1 -.1 1.1 {0,1,1} -.1 1.1 1.1 {0,1,0} -.1 0.9 0.05 {1,0,1} 1.05 -.1 0.95 {1,1,0} 1.1 1.1 -.1 {1,0,0} 1.1 -.1 -.1 {x++}1.03 0.5 0.5 {y++}0.5 1.1 0.475 {z++}0.5 0.5 1.05 {x--}-0.075 0.5 0.5 {y--}0.475 -0.15 0.5 {z--}0.5 0.5 -0.08 """ class Kinemage(object): """Stores information associated with a kinemage: header, caption, groups. A single file can have multiple kinemages. """ def __init__(self,Count=None,Header=None,Groups=None,Caption=None,\ Text=None): """Returns a new Kinemage object. 
Usage: k = Kinemage(count, header, groups, caption=None, text=None) """ self.Count = Count #integer, required for MAGE self.Header = Header self.Groups = Groups or [] self.Caption = Caption self.Text = Text def __str__(self): """String representation suitable for writing to file.""" if not self.Count: raise ValueError, "Must set a count to display a kinemage." pieces = ['@kinemage %s' % self.Count] if self.Header: pieces.append(str(self.Header)) if self.Text: pieces.append('@text') pieces.append(str(self.Text)) if self.Caption: pieces.append('@caption') pieces.append(str(self.Caption)) for g in self.Groups: pieces.append(str(g)) return '\n'.join(pieces) def iterGroups(self): """Iterates over all groups in Kinemage object""" for i in self.Groups: yield i for j in i.iterGroups(): yield j def iterLists(self): """Iterates over all lists in Kinemage object""" for gr in self.Groups: for l in gr.iterLists(): yield l def iterPoints(self): """Iterates over all points in Kinemage object""" for gr in self.Groups: for p in gr.iterPoints(): yield p def iterGroupsAndLists(self): """Iterates over all groups and lists in Kinemage object""" for gr in self.Groups: yield gr for j in gr.iterGroupsAndLists(): yield j def toCartesian(self, round_error=1e-14): """Returns a new Kinemage where all coordinates are UC,UG,UA""" result = deepcopy(self) result.Groups = [] for item in self.Groups: result.Groups.append(item.toCartesian(round_error=round_error)) return result def fromCartesian(self): """Returns a new Kinemage where all coordinates are ACG again""" result = deepcopy(self) result.Groups = [] for item in self.Groups: result.Groups.append(item.fromCartesian()) return result PyCogent-1.5.3/cogent/format/motif.py #!/usr/bin/env python """Format classes for MotifResults objects.""" from __future__ import division from matplotlib import use use('Agg') #suppress graphical rendering from cogent.motif.util 
import MotifFormatter from cogent.format.pdb_color import get_matching_chains,\ align_subject_to_pdb, PYMOL_FUNCTION_STRING, MAIN_FUNCTION_STRING from cogent.format.rna_struct import color_on_structure, draw_structure from cogent.format.fasta import fasta_from_alignment from cogent.core.moltype import PROTEIN, RNA, DNA from cogent.core.alignment import Alignment,SequenceCollection from cogent.util.dict2d import Dict2D from numpy import zeros, nonzero from cogent.align.weights.util import AlnToProfile from zipfile import ZipFile from cogent.app.util import get_tmp_filename from gzip import GzipFile from pylab import savefig,clf __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Micah Hamady"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Prototype" def _format_number(number): if number is None: return "None" return "%.2e" % number def avg(l): """Returns the average of a list of numbers.""" if not l: return None return sum(l)/len(l) class MotifStatsBySequence(MotifFormatter): """Generates HTML table with Motifs organized by sequence. - Each sequence is listed as follows: Seq-ID Combined-P-Value Motif-ID Motif-P-Value Motif-Seq """ def __init__(self,MotifResults=None): """init function for MotifStatsBySequence class""" MotifFormatter.__init__(self) self.MotifResults = MotifResults self.ColorMap = self.getColorMap(MotifResults) def __call__(self,order=None, wrap_html=False, title_class="ntitle", normal_class="normal", cons_thresh=.9): """Call method for MotifStatsBySequence class. 
wrap_html: if True, wrap in html + body, else just return the table title_class: css class to use to format titles normal_class: css class to use to format normal text cons_thresh: conservation threshold """ self.ConservationThresh = cons_thresh html_list = [] #if MotifResults is not None if self.MotifResults: #Find out if the alignment has a combined P value from results if 'CombinedP' in self.MotifResults.Results: combined_p_string = 'Combined P-Value' self.combinedP = True else: combined_p_string = ' ' self.combinedP = False #Start HTML string with table and table headers html_list = ["""
""" % ( title_class, combined_p_string)] #For each sequence in alignment get HTML for that sequence if not order: order = sorted(self.MotifResults.Alignment.keys()) for seqID in order: html_list.append(self.seqLines(seqID, title_class, normal_class)) html_list.append("
Sequence ID %s Motif ID Motif P-Value Motif Sequence
") if wrap_html: return """Motif Finder Results%s
""" % ''.join(html_list) return ''.join(html_list) return "" def _get_location_dict(self): """Builds dict of all locations. {module:{seqID:[indices]}} """ location_dict = {} #Dict with locations of every motif keyed by module #Build dict of all the locations: # {module:{seqID:[indices]}} if self.MotifResults: for motif in self.MotifResults.Motifs: for module in motif.Modules: location_dict[module]=module.LocationDict return location_dict Locations = property(_get_location_dict) def seqLines(self, seqID, title_class="ntitle", normal_class="normal"): """Returns HTML string for single sequence in alignment. - Must call for each sequence in the alignment. normal_class: css class to use to format rows title_class: css class to use to format title cells """ #Variable which signifies if given sequence in alignment contains motifs contains_motifs=False #Generate first row in table html_list = ["""%s """%(normal_class, title_class, seqID)] #If there is a combined P for the sequences, put it in first row if self.combinedP: try: html_list = [""" %s %s       """ % ( normal_class, title_class, seqID, _format_number(float( self.MotifResults.Results['CombinedP'][seqID])))] except KeyError: pass #For each module for module in self.Locations.keys(): cons_seq, cons_con_seq = self._make_conservation_consensus(module) #Check to see if it appeared in the sequence if seqID in self.Locations[module]: contains_motifs=True for index in self.Locations[module][seqID]: cur_seq = str(module.NamedSeqs[(seqID,index)]) html_list.append("""   %s %s %s
%s """ %(normal_class, module.ID, _format_number(module.Pvalue), self.ColorMap[module.ID], cur_seq, """%s""" % ''.join( self._flag_conserved_consensus(cons_con_seq, cons_seq, cur_seq)) )) if not contains_motifs: html_list=[] return ''.join(html_list) class MotifLocationsBySequence(MotifFormatter): """Generates HTML table with Motifs organized by sequence. - Each sequence is listed as follows: Seq-ID #_bases-module_sequence-#_bases-module_sequence-#_bases-etc """ def __init__(self,MotifResults=None): """init function for MotifLocationsBySequence class""" MotifFormatter.__init__(self) self.MotifResults = MotifResults self.ColorMap = self.getColorMap(MotifResults) def _get_location_dict(self): """Builds dict of all locations. {seqID:{index:module}} """ #Dict with locations of every module keyed by seqID locations_list = [] module_map = {} #If MotifResults object exists if self.MotifResults: #Build dict of all the locations: # {seqID:{index:module}} #For each motif in MotifResults object for motif in self.MotifResults.Motifs: #For each module in the Motif for module in motif.Modules: #Get the location dict for that module location_dict = module.LocationDict #For each sequence the module is in for seqID in location_dict.keys(): #For each module instance in the sequence for index in location_dict[seqID]: #Add module to dict locations_list.append((seqID,index,module)) module_map[(seqID,index)] = module locations = Dict2D() locations.fromIndices(locations_list) self.ModuleMap = module_map return locations Locations = property(_get_location_dict) def formatLine(self, seqID, max_index_len): """Returns motif line """ #Variable which signifies if given sequence in alignment contains motifs contains_motifs=False #List of strings for sequence line seq_line_list = [] #Get all indices for the sequence indices = [] if seqID in self.Locations: indices = self.Locations[seqID].keys() contains_motifs=True indices.sort() #Current position in sequence pos=0 #For each index in sequence 
fmt_str = "%0" + max_index_len + "d" for index in indices: #Add distance between motifs or ends of sequence to list if > 0 seq_line_list.append(fmt_str % (index-pos)) #Add module instance sequence to list mod_id = self.ModuleMap[(seqID, index)].ID seq_line_list.append( """%s""" % \ (self.ColorMap[mod_id], mod_id,\ self.Locations[seqID][index].NamedSeqs[(seqID,index)].Sequence)) #Find new position in sequence pos = \ self.Locations[seqID][index].NamedSeqs[(seqID,index)].Location.End #Add distance from end of last module to end of sequence to list if > 0 if len(self.MotifResults.Alignment.NamedSeqs[seqID])-pos > 0: seq_line_list.append(\ str(len(self.MotifResults.Alignment.NamedSeqs[seqID])\ -pos)) return '-'.join(seq_line_list), contains_motifs def seqLines(self, seqID, max_index_len, title_class="ntitle", normal_class="normal"): """Returns HTML string for single sequence in alignment. - Must call for each sequence in the alignment. title_class: css class to use to format title normal_class: css class to use to format text """ seq_line_list, contains_motifs = self.formatLine(seqID, max_index_len) if contains_motifs: #Return sequence string return """ %s %s"""%( normal_class, title_class, seqID, seq_line_list) else: return '' def __call__(self, order=None, wrap_html=False, title_class="ntitle", normal_class="normal"): """Call method for MotifLocationsBySequence class. - must pass in an alignment order """ html_list = [] # need to calculate this acrosss all all_indicies = [] for locs in self.Locations.values(): all_indicies.extend(locs.keys()) max_index_len = str(len(str(max(all_indicies)))) #If MotifResults is not None if self.MotifResults: #For each sequence in alignment get HTML for that sequence html_list.append("""Sequence IDMotif Locations""" % title_class) if not order: order=sorted(self.Locations.keys()) for seqID in order: html_list.append(self.seqLines(seqID, max_index_len, title_class, normal_class)) out_str = "%s
" % ''.join(html_list) if wrap_html: return """Motif Finder Results%s""" % out_str return out_str return "" class SequenceByMotif(MotifFormatter): """Generates HTML table with sequences organized by Motif. """ def __init__(self,MotifResults=None): """init function for MotifLocationsBySequence class""" MotifFormatter.__init__(self) self.MotifResults = MotifResults self.ColorMap = self.getColorMap(MotifResults) def _get_location_dict(self): """Build dict of all the locations: {module:{seqID:[indices]}} """ location_dict = {} #Dict with locations of every motif keyed by module if self.MotifResults: for motif in self.MotifResults.Motifs: for module in motif.Modules: location_dict[module]=module.LocationDict return location_dict Locations = property(_get_location_dict) def __call__(self, wrap_html=False, title_class="ntitle", normal_class="normal", cons_thresh=.9): """Call method for SequenceByMotif class. """ #Start HTML string with table and table headers html_list = [] #Get modules modules = self.Locations.keys() modules.sort() self.ConservationThresh = cons_thresh #For each module for module in modules: html_list.append(self.moduleLines(module, title_class, normal_class)) out_str = """ %s
Motif IDCombined P-Value Sequence ID Motif Sequence
""" % (title_class, ''.join(html_list)) if html_list: if wrap_html: return"""Motif Finder Results%s""" % out_str else: return out_str return "" def _highlightConsensus(self, con_seq, cons_con_seq, cur_seq, cur_color): """ Hightlight positions identical to consensus """ grey_style = """background-color: #dddddd; font-family: 'Courier New', Courier""" span_fmt = """%s""" h_str = [] for ix in range(len(cur_seq)): cur_c = cur_seq[ix] if cur_c == cons_con_seq[ix]: h_str.append(span_fmt % (cur_color,cur_c)) elif cur_c == con_seq[ix]: h_str.append(span_fmt % (grey_style,cur_c)) else: h_str.append(cur_c) return ''.join(h_str) def moduleLines(self, module, title_class="ntitle", normal_class="normal"): """Returns HTML string for single module. - Must call for each module. """ cons_seq, cons_con_seq = self._make_conservation_consensus(module) cur_color = self.ColorMap[module.ID] #Generate first row in table html_list = ['%s%s %s'%\ (normal_class, title_class, module.ID, _format_number(module.Pvalue), cur_color, cons_seq, )] sequences = self.Locations[module].keys() sequences.sort() #For each sequence for seq in sequences: cur_seq = str(module.NamedSeqs[(seq,self.Locations[module][seq][0])] ) html_list.append("""     %s %s """ % (normal_class, title_class, seq, #_format_number(module.Pvalue), self._highlightConsensus(cons_seq, cons_con_seq, cur_seq, cur_color) )) return ''.join(html_list) class HighlightMotifs(MotifFormatter): """Generates HTML table with sequences highlighted """ def makeModuleMap(self, motif_results): """ Need to extract this b/c can't pickle motif_results... grr. 
motif_results: MotifResults object """ module_map = {} #Dict with locations of every motif keyed by module if motif_results: for motif in motif_results.Motifs: for module in motif.Modules: mod_len = len(module) mod_id = str(module.ID) for skey, indexes in module.LocationDict.items(): if skey not in module_map: module_map[skey] = [] for ix in indexes: module_map[skey].append((ix, mod_id, mod_len)) return module_map def __init__(self, MotifResults, NodeOrder=None, KeepIds=None,\ KeepAll=False, MolType=PROTEIN): """Set up color map and motif results ModuleMap: flattened map (b/c of pickle problem.) generate using make_module_map() function Alignment: SequenceCollection or Alignment object KeepIds: list of module ids to keep KeepAll: When True, ignores KeepIds and highlights all motifs """ MotifFormatter.__init__(self) ModuleMap = self.makeModuleMap(MotifResults) module_ids = set([]) for skey, slist in ModuleMap.items(): for stup in slist: module_ids.add(stup[1]) self.ColorMap = self.getColorMapS0(sorted(list(module_ids))) self.ModuleMap = ModuleMap self.Alignment = MotifResults.Alignment if KeepIds is None: KeepIds = [] self.KeepIds = set(KeepIds) self.KeepAll = KeepAll if not NodeOrder: NodeOrder=self.Alignment.Names self.NodeOrder = NodeOrder self.MolType = MolType self.GapMap = self.getGapMap() self.HighlightMap = {} def __call__(self, title_class="ntitle", normal_class="normal", row_class="highlight", table_style=''): """Call method for HighlightMotifs class. """ #Start HTML string with table and table headers html_list = [] #For each sequence for seq_id in self.NodeOrder: html_list.append(self.highlightSeq(seq_id, title_class, row_class)) out_str = """ %s

Selected Motifs Highlighted on Sequences:

Sequence ID Sequence
""" % (table_style, title_class, ''.join(html_list)) if html_list: return out_str return "" def getGapMap(self): """Returns dict mapping gapped_coord to ungapped_coord in self.Alignment - {seq_id:{gapped_coord:ungapped_coord}} """ gap_map = {} for k,v in self.Alignment.items(): gapped, ungapped = self.MolType.gapMaps(v) gap_map[k] = gapped return gap_map def highlightSeq(self, seq_id, title_class="ntitle", row_class="highlight"): """Returns HTML string for single sequence. seq_id: sequence_id to highlight """ #Generate first row in table row_tmpl = """ %s %s """ mo_span_tmpl = """%s""" seq_list = list(self.Alignment.NamedSeqs[seq_id]) seq_len = len(seq_list) seq_mask = zeros(seq_len) mod_id_map = {} if seq_id in self.ModuleMap: for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup # skip modules we con't care about if not self.KeepAll and mod_id not in self.KeepIds: continue mod_mask = zeros(seq_len) # mask motif region for i in range(ix,ix+mod_len): gapped_ix = self.GapMap[seq_id][i] mod_mask[gapped_ix] = 1 # add to sequence map seq_mask += mod_mask # map module ids to indexes for jx in range(ix,ix+mod_len): gapped_jx = self.GapMap[seq_id][jx] if gapped_jx not in mod_id_map: mod_id_map[gapped_jx] = [] mod_id_map[gapped_jx].append(mod_id) # get module regions #need to take [0] element of nonzero() since numpy returns a tuple # where Numeric did not for jx in nonzero(seq_mask)[0]: # if overlapping use red background, otherwise display color if seq_mask[jx] > 1: style = "background-color: red" else: #style = self.ColorMap[mod_id_map[jx][0]].replace("font-family: 'Courier New', Courier, monospace", "") style = self.ColorMap[mod_id_map[jx][0]] seq_list[jx] = mo_span_tmpl % (style, "
".join(['Motif ID: %s' % \ x for x in mod_id_map[jx]]), seq_list[jx]) # cache data self.HighlightMap[seq_id] = ''.join(seq_list) # return row output return row_tmpl % (row_class, title_class, seq_id, ''.join(seq_list)) class HighlightMotifsForm(MotifFormatter): """Generates HTML form to submit module ids """ def __init__(self, MotifResults, FormAction="/cgi-bin/motifcluster/highlight.py", FormTarget="_blank"): """Set up color map and motif results MotifResults: MotifResults object ModuleIds: List of module ids to highlight FormAction: Form action cgi-script FormTarget: Form target """ MotifFormatter.__init__(self) self.MotifResults = MotifResults self.ColorMap = self.getColorMap(MotifResults) self.Modules= self._get_modules() self.FormAction = FormAction self.FormTarget = FormTarget def _get_modules(self): """Build map of modules {seq_id:[(ix, module_id, module_len)] """ modules = [] if self.MotifResults: for motif in self.MotifResults.Motifs: for module in motif.Modules: modules.append(module) return modules def __call__(self, title_class="ntitle", normal_class="normal", highlight_class="highlight"): """Call method for HightlightMotifs class. 
""" #Start HTML string with table and table headers cells = [] # format cells for module in self.Modules: cur_tup = (module.Pvalue, len(module.LocationDict), self.moduleRow(module)) cells.append(cur_tup) # sort by p value, then frequency cells = [x[-1] for x in sorted(cells)] cur_cells = [] header_tmpl = """ ID Motif Fequency P-Value""" header_cells = [] num_headers = len(cells) if num_headers > 3: num_headers = 3 for i in range(num_headers): header_cells.append(header_tmpl) header_row = """%s""" % (title_class, ''.join(header_cells)) html_out = [] html_out.append("""""" % highlight_class) for ix, cell in enumerate(cells): if ix % 3 == 0: if cur_cells: html_out.append("""%s""" % (''.join(cur_cells), highlight_class)) cur_cells = [] cur_cells.append(cell) else: cur_cells.append(cell) html_out.append("%s" % ''.join(cur_cells)) out_str = """
%s %s

""" % (self.FormAction, self.FormTarget, header_row, ''.join(html_out)) if cells: return out_str return "" def moduleRow(self, module): """Returns HTML string for single module. module: module to generate """ #Generate first row in table cells_tmpl = """%s%s%d%s""" # return row output return cells_tmpl % (module.ID, module.ID, self.ColorMap[module.ID], str(module), #module.ConsensusSequence, len(module.LocationDict), _format_number(module.Pvalue)) class HighlightOnCrystal(MotifFormatter): """Generates pymol script to highlight motifs on crystal structure. """ def __init__(self, MotifResults=None, cons_thresh=0.9, MolType=PROTEIN): """init function for HighlightOnAlignment class. MotifResults: motif results object """ MotifFormatter.__init__(self) self.ConservationThresh = float(cons_thresh) ModuleMap, ModuleConsMap = self.makeModuleMap(MotifResults) module_ids = set([]) for skey, slist in ModuleMap.items(): for stup in slist: module_ids.add(stup[1]) self.ColorMapHex = self.getColorMapS0(sorted(list(module_ids))) self.ModuleMap = ModuleMap self.ModuleConsMap = ModuleConsMap self.MotifResults = MotifResults self.ColorMap = self.getColorMapRgb(MotifResults) self.GapMap = {} self.HighlightMap = {} self.MolType=MolType self.RunScriptString = \ ''' from pymol import cmd cmd.load("%s") cmd.do("run %s") ''' self.ColorFunctionString = \ ''' color_map = %s color_command_list = %s sticks_command_list = %s #Set color list using color_map set_color_list(list(color_map.items())) #Set seq colors for color_cmd in color_command_list: colors,indices,chain_id = color_cmd set_seq_colors(colors,indices,chain_id) #Set sticks for sticks_cmd in sticks_command_list: indices,chain_id = sticks_cmd set_show_shapes(indices,chain_id) ''' def __call__(self,seq_id,pdb_id,\ sequence_type='Protein',\ zipfile_dir='.', pdb_dir='/quicksand2/hamady/data/cron_sync/pdb/'): """call method for HighlightOnCrystal class. 
Generates pymol script for highlighting motifs on crystal structure and creates .zip archive with pdb file and pymol script. """ #Get PDB file curr_pdb = \ [x.rstrip("\n") for x in self.getPdb(pdb_id,pdb_dir).readlines()] #Get subject sequence subject_seq = self.MotifResults.Alignment.NamedSeqs[seq_id] #Get PDB chains pdb_matching, ungapped_to_pdb = \ get_matching_chains(subject_seq,curr_pdb,sequence_type) pdb_aligned = align_subject_to_pdb(subject_seq,pdb_matching) #get color command list color_command_list, found_seq_motifs, missed_seq_motifs = \ self.makeColorCommandLists(seq_id,pdb_aligned,ungapped_to_pdb) #get sticks command list sticks_command_list = \ self.makeSticksCommandsConservedPositions(seq_id, pdb_aligned,\ ungapped_to_pdb) #Generate pdb file pdb_out = pdb_id+'.pdb' #Generate pymol script pymol_script_list = [PYMOL_FUNCTION_STRING,MAIN_FUNCTION_STRING] pymol_script_list.append(self.ColorFunctionString % (self.ColorMap,\ color_command_list,sticks_command_list)) pymol_script_string = ''.join(pymol_script_list) pymol_script_name = '%s_motif_coloring.pml' % (pdb_id) pymol_execute_name = '%s_double_click_me.pml' % (pdb_id) pymol_execute_string = \ self.RunScriptString % (pdb_out,pymol_script_name) #Generate zip file output_pre = get_tmp_filename(zipfile_dir, prefix="pdb_%s_" % pdb_id) if output_pre.endswith(".txt"): output_pre = output_pre[:-4] zip_dir = output_pre.split("/")[-1] output_filename = output_pre + ".zip" web_name = output_filename.split("/")[-1] curr_zip = ZipFile(output_filename,'w') curr_zip.writestr(zip_dir + "/" + pymol_script_name,pymol_script_string) curr_zip.writestr(zip_dir + "/" + pdb_out,'\n'.join(curr_pdb)) curr_zip.writestr(zip_dir + "/" + pymol_execute_name,pymol_execute_string) curr_zip.close() alignment_html = {} for k,v in pdb_aligned.items(): alignment_html[k]=self.highlightSeq(seq_id,v[0],pdb_id,v[1]) #set up return dictionary return_dir = { "output_filename":output_filename, "web_name":web_name, 
"colored_alignment":alignment_html, #"found_seq_motifs":found_seq_motifs, "found_seq_motifs":[(module.ID, _format_number(module.Pvalue)) for module in self.MotifResults.Modules if module.ID in found_seq_motifs], "missed_seq_motifs":missed_seq_motifs, "all_motifs":[(module.ID, _format_number(module.Pvalue)) for module in self.MotifResults.Modules], "all_motif_colors":self.ColorMapHex} return return_dir def getConservedPositions(self,min_conservation=1.0): """Returns dict mapping motif id to list of conserved positions. """ conserved_positions = {} for motif in self.MotifResults.Motifs: for module in motif.Modules: curr_id = module.ID conserved_positions[curr_id]=[] curr_profile = AlnToProfile(module,self.MotifResults.MolType) for ix,pos in enumerate(curr_profile.rowMax()): if pos >= min_conservation: conserved_positions[curr_id].append(ix) return conserved_positions def makeColorCommandLists(self,seq_id, pdb_aligned, ungapped_to_pdb): """Returns lists of (colors, indices, and chain_id) for coloring. - each chain is a separate tuple in the list. 
- colors are named by motif id """ #list of motifs found in pdb sequence found_seq_motifs = [] #list of motifs not in pdb sequence missed_seq_motifs = [] color_command_list = [] #Get locations by sequence locations = \ MotifLocationsBySequence(self.MotifResults).Locations[seq_id] for chain, aligned in pdb_aligned.items(): #Get subject gap map subject_gapped,subject_ungapped = \ self.MotifResults.MolType.gapMaps(aligned[0]) #Get pdb gap map pdb_gapped,pdb_ungapped = \ self.MotifResults.MolType.gapMaps(aligned[1]) for curr_ix, curr_module in locations.items(): curr_module_len = len(str(curr_module)) curr_module_id = curr_module.ID #curr_color = self.ColorMap[curr_module_id] curr_color = "color_" + str(curr_module_id) #get index list ix_list = [] #for each position in motif for i in range(curr_ix,curr_ix+curr_module_len): #Get the gapped index of the motif in the subject seq try: sub_gap = subject_gapped[i] except KeyError: continue #Get the ungapped index of the motif in the pdb seq try: pdb_ungap = pdb_ungapped[sub_gap] #Get the index of the position in pdb coordinates pdb_ix = ungapped_to_pdb[chain][pdb_ungap] ix_list.append(pdb_ix) except KeyError: continue if ix_list: ix_string = '+'.join(map(str,ix_list)) color_command_list.append(([curr_color],[ix_string],chain)) found_seq_motifs.append(curr_module_id) else: missed_seq_motifs.append(curr_module_id) return color_command_list, found_seq_motifs, missed_seq_motifs def makeSticksCommandsConservedPositions(self,seq_id, pdb_aligned,\ ungapped_to_pdb, min_conservation=1.0): """Returns list of (indices, chain_id) to show sticks at indices. - each chain is a separate tuple in the list. 
""" conserved_positions = self.getConservedPositions() show_command_list = [] #Get locations by sequence locations = \ MotifLocationsBySequence(self.MotifResults).Locations[seq_id] for chain, aligned in pdb_aligned.items(): #Get subject gap map subject_gapped,subject_ungapped = \ self.MotifResults.MolType.gapMaps(aligned[0]) #Get pdb gap map pdb_gapped,pdb_ungapped = \ self.MotifResults.MolType.gapMaps(aligned[1]) for curr_ix, curr_module in locations.items(): curr_module_id = curr_module.ID curr_module_conserved = conserved_positions[curr_module_id] #get index list ix_list = [] #for each position in motif for i in curr_module_conserved: #Get the gapped index of the motif in the subject seq try: sub_gap = subject_ungapped[i] except KeyError: continue #Get the ungapped index of the motif in the pdb seq try: pdb_ungap = pdb_gapped[sub_gap] #Get the index of the position in pdb coordinates pdb_ix = ungapped_to_pdb[pdb_ungap] ix_list.append(pdb_ix) except KeyError: continue if ix_list: show_command_list.append((ix_list,chain)) return show_command_list def getPdb(self,pdb_id, pdb_dir): """Returns open pdb file. - currently gets pdb file from pdb website. """ pdb_file = pdb_dir + "pdb%s.ent" % pdb_id.lower() of = None try: of = open(pdb_file) except Exception, e: of = GzipFile(pdb_file + ".gz") return of def getGapMap(self,seq_id,gapped_seq): """Returns dict mapping gapped_coord to ungapped_coord in self.Alignment - {seq_id:{gapped_coord:ungapped_coord}} """ gap_map = {} gapped, ungapped = self.MolType.gapMaps(gapped_seq) gap_map[seq_id] = gapped return gap_map def highlightSeq(self, seq_id, seq_aligned, pdb_id, pdb_aligned,\ title_class="ntitle", row_class="highlight"): """Returns HTML string for single sequence. 
seq_id: sequence_id to highlight """ #Generate first row in table seq_row_tmpl = """ %s %s """ mo_span_tmpl = """%s""" seq_list = list(seq_aligned) pdb_list = list(pdb_aligned) seq_len = len(seq_list) seq_mask = zeros(seq_len) #pdb sequence mask pdb_mask = zeros(seq_len) mod_id_map = {} self.GapMap = self.getGapMap(seq_id,seq_aligned) cons_str = list(" " * len(seq_list)) if seq_id in self.ModuleMap: for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup cons_seq, cons_con_seq = self.ModuleConsMap[mod_id] cur_seq = ''.join(seq_list[ix:ix+mod_len]) cc_str = self._flag_conserved_consensus(cons_con_seq, cons_seq, cur_seq) cons_str[ix:ix+mod_len] = cc_str mod_mask = zeros(seq_len) #pdb module mask pdb_mod_mask = zeros(seq_len) # mask motif region for i in range(ix,ix+mod_len): gapped_ix = self.GapMap[seq_id][i] mod_mask[gapped_ix] = 1 #only allow coloring of ungapped pdb sequence if pdb_list[gapped_ix] != '-': pdb_mod_mask[gapped_ix]=1 # add to sequence map seq_mask += mod_mask pdb_mask += pdb_mod_mask # map module ids to indexes for jx in range(ix,ix+mod_len): gapped_jx = self.GapMap[seq_id][jx] if gapped_jx not in mod_id_map: mod_id_map[gapped_jx] = [] mod_id_map[gapped_jx].append(mod_id) mm_str = [] for ix in range(len(seq_list)): if seq_list[ix] == pdb_list[ix]: mm_str.append("|") else: mm_str.append("*") # get module regions for jx in nonzero(seq_mask)[0]: # if overlapping use red background, otherwise display color if seq_mask[jx] > 1: style = "background-color: red" else: style = self.ColorMapHex[mod_id_map[jx][0]].replace("font-family: 'Courier New', Courier, monospace", "") seq_list[jx] = mo_span_tmpl % (style, "
".join(['Motif ID: %s' % \ x for x in mod_id_map[jx]]), seq_list[jx]) pdb_list[jx] = mo_span_tmpl % (style, "
".join(['Motif ID: %s' % \ x for x in mod_id_map[jx]]), pdb_list[jx]) clean_cons_str = [] for item in cons_str: if item == " ": clean_cons_str.append(" ") else: clean_cons_str.append(item) cons_str = ''.join(clean_cons_str) # cache data self.HighlightMap[seq_id] = ''.join(seq_list) # return row output seq_row = seq_row_tmpl % (row_class, title_class, seq_id, ''.join(seq_list)) pdb_row = seq_row_tmpl % (row_class, title_class, pdb_id, ''.join(pdb_list)) high_row = seq_row_tmpl % (row_class, title_class, "Cons", ''.join(cons_str)) mm_row = seq_row_tmpl % (row_class, title_class, "Mismatch", ''.join(mm_str)) return ''.join(['',high_row,seq_row,mm_row, pdb_row,'
']) def makeModuleMap(self, motif_results): """ Need to extract this b/c can't pickle motif_results... grr. motif_results: MotifResults object keep_module_ids: list of module ids to keep """ module_map = {} #Dict with locations of every motif keyed by module module_cons_map = {} if motif_results: for motif in motif_results.Motifs: for module in motif.Modules: mod_id = str(module.ID) mod_len = len(str(module)) if mod_id not in module_cons_map: module_cons_map[mod_id] = self._make_conservation_consensus(module) for skey, indexes in module.LocationDict.items(): if skey not in module_map: module_map[skey] = [] for ix in indexes: module_map[skey].append((ix, mod_id, mod_len)) return module_map, module_cons_map class ColorSecondaryStructurePostscript(MotifFormatter): """Generates postscript file with motifs highlighted on 2D structure """ def makeModuleMap(self, motif_results): """ Need to extract this b/c can't pickle motif_results... grr. motif_results: MotifResults object keep_module_ids: list of module ids to keep """ module_map = {} #Dict with locations of every motif keyed by module if motif_results: for motif in motif_results.Motifs: for module in motif.Modules: mod_len = len(module) mod_id = str(module.ID) for skey, indexes in module.LocationDict.items(): if skey not in module_map: module_map[skey] = [] for ix in indexes: module_map[skey].append((ix, mod_id, mod_len)) return module_map def __init__(self, MotifResults, KeepIds=None,\ KeepAll=True, MolType=RNA, strict=True,circle_motif_id=None,\ SkipIds=None): """Set up color map and motif results ModuleMap: flattened map (b/c of pickle problem.) 
generate using make_module_map() function Alignment: SequenceCollection or Alignment object KeepIds: list of module ids to keep KeepAll: When True, ignores KeepIds and highlights all motifs """ MotifFormatter.__init__(self) self.ModuleMap = self.makeModuleMap(MotifResults) module_ids = set([]) for skey, slist in self.ModuleMap.items(): for stup in slist: module_ids.add(stup[1]) self.ColorMap = self.getColorMapRgb(MotifResults) overlap_color = {'overlap_color':(1.0,0.0,0.0)} self.ColorMap.update(overlap_color) self.Alignment = MotifResults.Alignment if KeepIds is None: KeepIds = [] self.KeepIds = set(KeepIds) self.KeepAll = KeepAll self.MolType = MolType self.GapMap = self.getGapMap() self.HighlightMap = {} self.Strict=strict self.CircleId = circle_motif_id self.SkipIds=SkipIds def __call__(self,seq_id,sequence,struct,write_dir='.'): """Call method for ColorSecondaryStructure class. """ indices, colors = self.getColorIndices(seq_id,sequence) circle_indices = [] if self.CircleId: circle_indices = \ self.getCircleIndices(seq_id,sequence,self.CircleId) structure_postscript = color_on_structure(\ sequence=sequence,\ struct=struct,\ color_map = self.ColorMap,\ indices=indices,\ colors=colors,\ circle_indices=circle_indices) file_id = seq_id.split()[0] ps_out_path = \ write_dir+'/'+file_id+'_secondary_struct.ps' ps_out = open(ps_out_path,'w') ps_out.write(structure_postscript) ps_out.close() return ps_out_path def getGapMap(self): """Returns dict mapping gapped_coord to ungapped_coord in self.Alignment - {seq_id:{gapped_coord:ungapped_coord}} """ gap_map = {} for k,v in self.Alignment.items(): gapped, ungapped = self.MolType.gapMaps(v) gap_map[k] = gapped return gap_map def getColorIndices(self, seq_id, sequence): """Returns list of indices and colors for a given sequence. 
seq_id: sequence ID to highlight on structure """ #seq_list = list(self.Alignment.NamedSeqs[seq_id]) seq_list = list(sequence) seq_len = len(seq_list) seq_mask = zeros(seq_len) mod_id_map = {} indices = [] colors = [] gapped,ungapped = self.MolType.gapMaps(sequence) self.GapMap[seq_id]= gapped if self.Strict: if seq_id not in self.ModuleMap: raise IndexError, 'seq_id %s not in ModuleMap'%(seq_id) else: if seq_id not in self.ModuleMap: return [],[] for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup # skip modules we don't care about if not self.KeepAll and mod_id not in self.KeepIds: continue elif mod_id in self.SkipIds: continue mod_mask = zeros(seq_len) # mask motif region for i in range(ix,ix+mod_len): gapped_ix = self.GapMap[seq_id][i] mod_mask[gapped_ix] = 1 # add to sequence map seq_mask += mod_mask # map module ids to indexes for jx in range(ix,ix+mod_len): gapped_jx = self.GapMap[seq_id][jx] if gapped_jx not in mod_id_map: mod_id_map[gapped_jx] = [] mod_id_map[gapped_jx].append(mod_id) # get module regions #need to take [0] element of nonzero() since numpy returns a tuple # where Numeric did not for kx in nonzero(seq_mask)[0]: # if overlapping use red background, otherwise display color if seq_mask[kx] > 1: curr_color = 'overlap_color' else: mod_id = mod_id_map[kx][0] curr_color = 'color_'+mod_id #append indices. Must start at 1, not 0 for RNAplot to work indices.append(kx+1) colors.append(curr_color) return indices, colors def getCircleIndices(self, seq_id, sequence, motif_id): """Returns list of indices to be circled for a given sequence. seq_id: sequence ID to highlight on structure sequence: sequence string motif_id: ID of motif to circle. 
""" seq_list = list(sequence) seq_len = len(seq_list) seq_mask = zeros(seq_len) mod_id_map = {} indices = [] gapped,ungapped = self.MolType.gapMaps(sequence) self.GapMap[seq_id]= gapped if self.Strict: if seq_id not in self.ModuleMap: raise IndexError, 'seq_id %s not in ModuleMap'%(seq_id) else: if seq_id not in self.ModuleMap: return [],[] for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup if mod_id != motif_id: continue # skip modules we con't care about if not self.KeepAll and mod_id not in self.KeepIds: continue mod_mask = zeros(seq_len) # mask motif region for i in range(ix,ix+mod_len): gapped_ix = self.GapMap[seq_id][i] mod_mask[gapped_ix] = 1 # add to sequence map seq_mask += mod_mask # map module ids to indexes for jx in range(ix,ix+mod_len): gapped_jx = self.GapMap[seq_id][jx] if gapped_jx not in mod_id_map: mod_id_map[gapped_jx] = [] mod_id_map[gapped_jx].append(mod_id) # get module regions #need to take [0] element of nonzero() since numpy returns a tuple # where Numeric did not for kx in nonzero(seq_mask)[0]: #append indices. Must start at 1, not 0 for RNAplot to work indices.append(kx+1) return indices class ColorSecondaryStructureMatplotlib(MotifFormatter): """Generates png file with motifs highlighted on 2D structure """ def makeModuleMap(self, motif_results): """ Need to extract this b/c can't pickle motif_results... grr. 
motif_results: MotifResults object keep_module_ids: list of module ids to keep """ module_map = {} #Dict with locations of every motif keyed by module if motif_results: for motif in motif_results.Motifs: for module in motif.Modules: mod_len = len(module) mod_id = str(module.ID) for skey, indexes in module.LocationDict.items(): if skey not in module_map: module_map[skey] = [] for ix in indexes: module_map[skey].append((ix, mod_id, mod_len)) return module_map def __init__(self, MotifResults, KeepIds=None,\ KeepAll=True, MolType=RNA, strict=True,circle_motif_id=None,\ SkipIds=None,square_motif_id=None,square_label=None): """Set up color map and motif results ModuleMap: flattened map (b/c of pickle problem.) generate using make_module_map() function Alignment: SequenceCollection or Alignment object KeepIds: list of module ids to keep KeepAll: When True, ignores KeepIds and highlights all motifs """ MotifFormatter.__init__(self) self.ModuleMap = self.makeModuleMap(MotifResults) module_ids = set([]) for skey, slist in self.ModuleMap.items(): for stup in slist: module_ids.add(stup[1]) self.ColorMap = self.getColorMapRgb(MotifResults) overlap_color = {'overlap_color':(1.0,0.0,0.0)} self.ColorMap.update(overlap_color) self.Alignment = MotifResults.Alignment if KeepIds is None: KeepIds = [] self.KeepIds = set(KeepIds) self.KeepAll = KeepAll self.MolType = MolType self.GapMap = self.getGapMap() self.HighlightMap = {} self.Strict=strict self.CircleId = circle_motif_id self.SquareId = square_motif_id self.SquareLabel = square_label self.SkipIds=SkipIds def __call__(self,seq_id,sequence,struct,write_dir='.'): """Call method for ColorSecondaryStructure class. 
""" indices, colors = self.getColorIndices(seq_id,sequence) circle_indices = [] if self.CircleId: circle_indices = \ self.getMarkedIndices(seq_id,sequence,self.CircleId) square_indices = [] if self.SquareId: square_indices = \ self.getMarkedIndices(seq_id,sequence,self.SquareId) if self.SquareLabel is None: self.SquareLabel = '' draw_structure(sequence, struct, indices=indices, colors=colors,\ circle_indices=circle_indices, square_indices=square_indices) file_id = seq_id.split()[0] struct_out_path = '%s/%s_%s_secondary_struct.png'%(write_dir,\ file_id,self.SquareLabel) savefig(struct_out_path,format='png',dpi=150) clf() return struct_out_path def getGapMap(self): """Returns dict mapping gapped_coord to ungapped_coord in self.Alignment - {seq_id:{gapped_coord:ungapped_coord}} """ gap_map = {} for k,v in self.Alignment.items(): gapped, ungapped = self.MolType.gapMaps(v) gap_map[k] = gapped return gap_map def getColorIndices(self, seq_id, sequence): """Returns list of indices and colors for a given sequence. 
seq_id: sequence ID to highlight on structure """ #seq_list = list(self.Alignment.NamedSeqs[seq_id]) seq_list = list(sequence) seq_len = len(seq_list) seq_mask = zeros(seq_len) mod_id_map = {} indices = [] colors = [] gapped,ungapped = self.MolType.gapMaps(sequence) self.GapMap[seq_id]= gapped if self.Strict: if seq_id not in self.ModuleMap: raise IndexError, 'seq_id %s not in ModuleMap'%(seq_id) else: if seq_id not in self.ModuleMap: return [],[] for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup # skip modules we don't care about if not self.KeepAll and mod_id not in self.KeepIds: continue elif mod_id in self.SkipIds: continue mod_mask = zeros(seq_len) # mask motif region for i in range(ix,ix+mod_len): gapped_ix = self.GapMap[seq_id][i] mod_mask[gapped_ix] = 1 # add to sequence map seq_mask += mod_mask # map module ids to indexes for jx in range(ix,ix+mod_len): gapped_jx = self.GapMap[seq_id][jx] if gapped_jx not in mod_id_map: mod_id_map[gapped_jx] = [] mod_id_map[gapped_jx].append(mod_id) # get module regions #need to take [0] element of nonzero() since numpy returns a tuple # where Numeric did not for kx in nonzero(seq_mask)[0]: # if overlapping use red background, otherwise display color if seq_mask[kx] > 1: curr_color = 'overlap_color' else: mod_id = mod_id_map[kx][0] curr_color = 'color_'+mod_id #append indices. indices.append(kx) colors.append(self.ColorMap[curr_color]) return indices, colors def getMarkedIndices(self, seq_id, sequence, motif_id): """Returns list of indices to be marked for a given sequence. seq_id: sequence ID to highlight on structure sequence: sequence string motif_id: ID of motif to mark. 
""" seq_list = list(sequence) seq_len = len(seq_list) seq_mask = zeros(seq_len) mod_id_map = {} indices = [] gapped,ungapped = self.MolType.gapMaps(sequence) self.GapMap[seq_id]= gapped if self.Strict: if seq_id not in self.ModuleMap: raise IndexError, 'seq_id %s not in ModuleMap'%(seq_id) else: if seq_id not in self.ModuleMap: return [],[] for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup if mod_id != motif_id: continue # skip modules we con't care about if not self.KeepAll and mod_id not in self.KeepIds: continue mod_mask = zeros(seq_len) # mask motif region for i in range(ix,ix+mod_len): gapped_ix = self.GapMap[seq_id][i] mod_mask[gapped_ix] = 1 # add to sequence map seq_mask += mod_mask # map module ids to indexes for jx in range(ix,ix+mod_len): gapped_jx = self.GapMap[seq_id][jx] if gapped_jx not in mod_id_map: mod_id_map[gapped_jx] = [] mod_id_map[gapped_jx].append(mod_id) # get module regions #need to take [0] element of nonzero() since numpy returns a tuple # where Numeric did not for kx in nonzero(seq_mask)[0]: #append indices. indices.append(kx) return indices class MotifsUpperCase(MotifFormatter): """Generates postscript file with motifs highlighted on 2D structure """ def makeModuleMap(self, motif_results): """ Need to extract this b/c can't pickle motif_results... grr. motif_results: MotifResults object keep_module_ids: list of module ids to keep """ module_map = {} #Dict with locations of every motif keyed by module if motif_results: for motif in motif_results.Motifs: for module in motif.Modules: mod_len = len(module) mod_id = str(module.ID) for skey, indexes in module.LocationDict.items(): if skey not in module_map: module_map[skey] = [] for ix in indexes: module_map[skey].append((ix, mod_id, mod_len)) return module_map def __init__(self, MotifResults, KeepIds=None,\ KeepAll=True, MolType=RNA): """Set up color map and motif results ModuleMap: flattened map (b/c of pickle problem.) 
generate using make_module_map() function Alignment: SequenceCollection or Alignment object KeepIds: list of module ids to keep KeepAll: When True, ignores KeepIds and highlights all motifs """ MotifFormatter.__init__(self) self.ModuleMap = self.makeModuleMap(MotifResults) self.Alignment = MotifResults.Alignment if KeepIds is None: KeepIds = [] self.KeepIds = set(KeepIds) self.KeepAll = KeepAll self.MolType = MolType self.GapMap = self.getGapMap() self.HighlightMap = {} def __call__(self,aln,outfile_prefix='',write_dir='.'): """Call method for ColorSecondaryStructure class. """ aln = SequenceCollection(aln) new_aln = {} for k,v in aln.NamedSeqs.items(): new_aln[k]=self.getUpperCaseSequence(k,v) out_path = \ write_dir+'/'+outfile_prefix+'_binding_site_alignment.fasta' out_file = open(out_path,'w') out_file.write(fasta_from_alignment(new_aln)) out_file.close() print out_path return new_aln def getGapMap(self): """Returns dict mapping gapped_coord to ungapped_coord in self.Alignment - {seq_id:{gapped_coord:ungapped_coord}} """ gap_map = {} for k,v in self.Alignment.items(): gapped, ungapped = self.MolType.gapMaps(v) gap_map[k] = gapped return gap_map def getUpperCaseSequence(self, seq_id, sequence): """Returns list of indices and colors for a given sequence. seq_id: sequence ID to highlight on structure """ sequence=str(sequence) seq_len = len(sequence) seq_mask = zeros(seq_len) mod_id_map = {} gapped,ungapped = self.MolType.gapMaps(sequence) self.GapMap[seq_id] = gapped #if there are no motifs, return lower case sequence. 
if seq_id not in self.ModuleMap: return sequence.lower() for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup # skip modules we don't care about if not self.KeepAll and mod_id not in self.KeepIds: continue mod_mask = zeros(seq_len) # mask motif region for i in range(ix,ix+mod_len): gapped_ix = self.GapMap[seq_id][i] mod_mask[gapped_ix] = 1 # add to sequence map seq_mask += mod_mask # get upper case in module regions new_seq = [] for kx, lc, uc in zip(seq_mask,sequence.lower(), sequence.upper()): # if not motif region, use lower if kx < 1: new_seq.append(lc) else: new_seq.append(uc) return ''.join(new_seq) class MotifSequenceConstraints(MotifFormatter): """Generates sequence constraint strings for motifs in an alignment """ def makeModuleMap(self, motif_results): """ Need to extract this b/c can't pickle motif_results... grr. motif_results: MotifResults object keep_module_ids: list of module ids to keep """ module_map = {} #Dict with locations of every motif keyed by module if motif_results: for motif in motif_results.Motifs: for module in motif.Modules: mod_len = len(module) mod_id = str(module.ID) for skey, indexes in module.LocationDict.items(): if skey not in module_map: module_map[skey] = [] for ix in indexes: module_map[skey].append((ix, mod_id, mod_len)) return module_map def __init__(self, MotifResults, KeepIds=None,\ KeepAll=True, MolType=RNA, strict=True): """Set up color map and motif results ModuleMap: flattened map (b/c of pickle problem.) 
generate using make_module_map() function Alignment: SequenceCollection or Alignment object KeepIds: list of module ids to keep KeepAll: When True, ignores KeepIds and highlights all motifs """ MotifFormatter.__init__(self) self.ModuleMap = self.makeModuleMap(MotifResults) self.Alignment = MotifResults.Alignment if KeepIds is None: KeepIds = [] self.KeepIds = set(KeepIds) self.KeepAll = KeepAll self.MolType = MolType self.GapMap = self.getGapMap() self.Strict=strict self.MotifCharacter = \ list('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') def __call__(self,seq_id,sequence,struct,write_dir='.'): """Call method for ColorSecondaryStructure class. """ #indices, colors = self.getColorIndices(seq_id,sequence) indices, colors = self.getColorIndices(seq_id,sequence) circle_indices = [] if self.CircleId: circle_indices = \ self.getCircleIndices(seq_id,sequence,self.CircleId) structure_postscript = color_on_structure(\ sequence=sequence,\ struct=struct,\ color_map = self.ColorMap,\ indices=indices,\ colors=colors,\ circle_indices=circle_indices) file_id = seq_id.split()[0] ps_out_path = \ write_dir+'/'+file_id+'_secondary_struct.ps' ps_out = open(ps_out_path,'w') ps_out.write(structure_postscript) ps_out.close() return ps_out_path def getGapMap(self): """Returns dict mapping gapped_coord to ungapped_coord in self.Alignment - {seq_id:{gapped_coord:ungapped_coord}} """ gap_map = {} for k,v in self.Alignment.items(): gapped, ungapped = self.MolType.gapMaps(v) gap_map[k] = gapped return gap_map def getSeqMask(self, seq_id, sequence): """Returns vector where motifs are present in sequence. - seq_id: sequence ID to make seq mask. - sequence: sequence to make seq mask with. 
""" #seq_list = list(self.Alignment.NamedSeqs[seq_id]) seq_list = list(sequence) seq_len = len(seq_list) seq_mask = zeros(seq_len) mod_id_map = {} gapped,ungapped = self.MolType.gapMaps(sequence) self.GapMap[seq_id]= gapped if self.Strict: if seq_id not in self.ModuleMap: raise IndexError, 'seq_id %s not in ModuleMap'%(seq_id) else: if seq_id not in self.ModuleMap: return '','' for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup # skip modules we con't care about if not self.KeepAll and mod_id not in self.KeepIds: continue elif mod_id in self.SkipIds: continue mod_mask = zeros(seq_len) # mask motif region for i in range(ix,ix+mod_len): gapped_ix = self.GapMap[seq_id][i] mod_mask[gapped_ix] = 1 # add to sequence map seq_mask += mod_mask return seq_mask def getOverlapDicts(self,alignment): """Returns dicts of motifs that overlap in start or end positions. """ #start overlap dict start_overlap = {} #end overlap dict end_overlap = {} #Get seq_mask_dict. Calling this will construct self.GapMap for given # sequence. for seq_id,seq in alignment.items(): curr_seq_mask = self.getSeqMask(seq_id,seq) #for each module for mod_tup in self.ModuleMap[seq_id]: ix, mod_id, mod_len = mod_tup gapped_start = self.GapMap[seq_id][ix] gapped_end = self.GapMap[seq_id][ix+mod_len] if curr_seq_mask[gapped_start]>1: start_overlap[mod_id]=seq_id if curr_seq_mask[gapped_end]>1: end_overlap[mod_id]=seq_id return start_overlap, end_overlap def getConstraintStrings(self,alignment): """Returns dict of constraint strings for each sequence in alignment. 
- {seq_id:{1:constraint_string_1,2:constraint_string_2}} """ start_overlap, end_overlap = self.getOverlapDicts(alignment) PyCogent-1.5.3/cogent/format/nexus.py: #!/usr/bin/env python __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def nexus_from_alignment(aln, seq_type, interleave_len=50): """returns a nexus formatted string Arguments: - seq_type: dna, rna, or protein - interleave_len: the line width""" if aln.isRagged(): raise ValueError, "Sequences in alignment are not all the same " +\ "length. Cannot generate NEXUS format." num_seq = len(aln.Seqs) if not aln or not num_seq: return "" aln_len = aln.SeqLen nexus_out = ["#NEXUS\n\nbegin data;"] nexus_out.append(" dimensions ntax=%d nchar=%d;" % (num_seq, aln_len)) nexus_out.append(" format datatype=%s interleave=yes missing=? " % \ seq_type + "gap=-;") nexus_out.append(" matrix") cur_ix = 0 while cur_ix < aln_len: nexus_out.extend([" %s %s" % (x, y[cur_ix:cur_ix + \ interleave_len]) for x, y in aln.NamedSeqs.items()]) nexus_out.append("") cur_ix += interleave_len nexus_out.append(" ;\nend;") return '\n'.join(nexus_out) PyCogent-1.5.3/cogent/format/pdb.py: """Functions to create PDB files from entities optionally with data in the B-factor and Q (occupancy) columns. 
""" from collections import defaultdict from itertools import chain from cogent.struct.selection import einput from cogent.data.protein_properties import AA_NAMES from cogent.parse.pdb import dict2pdb, dict2ter __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" def write_header(header): if not isinstance(header, dict): return (header or []) xt = ('REMARK 200 EXPERIMENT TYPE : %s\n', 'experiment_type') sg = ('REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: %s\n', 'space_group') tests = [xt, sg] results = [] for test in tests: string, key = test try: value = header[key] results.append(string % value) except (KeyError, TypeError): pass return results def write_coords(atoms): old_fields = defaultdict(str) coords = ['MODEL 1\n'] for atom in atoms.sortedvalues(): fields = atom.getDict() if (old_fields['chain_id'] != fields['chain_id']) and old_fields['chain_id'] and old_fields['res_name'] in AA_NAMES: coords.append(dict2ter(old_fields)) if (old_fields['model'] != fields['model']) and old_fields['model'] != '': # model can be 0 :) if old_fields['chain_id'] and old_fields['res_name'] in AA_NAMES: coords.append(dict2ter(old_fields)) coords.append('ENDMDL\n') coords.append('MODEL %4i\n' % (fields['model'] + 1)) coords.append(dict2pdb(fields)) old_fields = fields if fields['res_name'] in AA_NAMES: coords.append(dict2ter(fields)) coords.append('ENDMDL\n') coords.append('END \n') return coords def write_trailer(trailer): if not isinstance(trailer, dict): return (trailer or []) def number(data, rest_val): try: return float(data) except TypeError: return rest_val def iterable(data, rest_val): try: return float(len(data)) except TypeError: return rest_val def PDBWriter(f, entities, header_=None, trailer_=None): structure = einput(entities, level='A', name='atoms') # hierarchy: 
args, dicts try: header = (header_ or entities.raw_header or entities.header) except AttributeError: header = header_ try: trailer = (trailer_ or entities.raw_trailer or entities.trailer) except AttributeError: trailer = trailer_ coords = write_coords(structure) header = write_header(header) trailer = write_trailer(trailer) for part in chain(header, coords, trailer): f.writelines(part) # did not open do not close # f.close() def PDBXWriter(f, entities, level, b_key, b_mode=None, b_val=0.0, q_key=None, \ q_mode=None, q_val=0.0): """Writes data from the ``xtra`` dictionary into B and Q columns of a PDB file. The level from which the dictionary is taken can be specified. The b_key and q_key specifies should be a key in the dictionaries, b_val and q_val are the default values, b_mode and q_mode can be "number" - ``float`` will be called to transform the data or "iterable" which will return the length (``len``) of the sequence. The B and Q columns can contain only numeric values, thus any data which we wish to store in those columns needs to be converted. The following functions convert data to numeric form. boolean type can also be treated as number. 
""" b_mode = (b_mode or 'number') # B q_mode = (q_mode or 'number') # Q entities = einput(entities, level) for entity in entities: q_data = eval(q_mode)(entity.xtra.get(q_key), b_val) # occupancy b_data = eval(b_mode)(entity.xtra.get(b_key), q_val) # b-factor if level != 'A': atoms = einput(entity, 'A') for atom in atoms: if b_key: atom.setBfactor(b_data) if q_key: atom.setOccupancy(q_data) else: if b_key: entity.setBfactor(b_data) if q_key: entity.setOccupancy(q_data) PDBWriter(f, entities) # we try to preserve the headers PyCogent-1.5.3/cogent/format/pdb_color.py000644 000765 000024 00000022741 12024702176 021333 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division from cogent.app.muscle_v38 import muscle_seqs from cogent.app.util import get_tmp_filename from cogent.parse.fasta import MinimalFastaParser from numpy import array __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Gavin Huttley", "Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" three_to_one = {'ALA':'A','CYS':'C','ASP':'D','GLU':'E', 'PHE':'F', 'GLY':'G', 'HIS':'H','ILE':'I','LYS':'K','LEU':'L','FME':'M','MET':'M','MSE':'M', 'ASN':'N','PRO':'P', 'GLN':'Q','ARG':'R','SER':'S','THR':'T','SEC':'U','VAL':'V','TRP':'W', 'TYR':'Y'} nucleotides = {'A':'A','C':'C','G':'G','U':'U','T':'T','I':'I',\ '1MA':'A','2MA':'A','SRA':'A','MIA':'A','MAD':'A','A2M':'A','AVC':'A', 'PPU':'A','AET':'A','A23':'A','5AA':'A','T6A':'A','MTU':'A','LCA':'A', '5MC':'C','CCC':'C','OMC':'C','CH':'C','10C':'C', '1MG':'G','QUO':'G','G7M':'G','GDP':'G','YG':'G','7MG':'G','OMG':'G', 'M2G':'G','2MG':'G','GTP':'G','YYG':'G', 'PSU':'U','H2U':'U','5MU':'U','4SU':'U','ONE':'U','+U':'U','DHU':'U', 'IU':'U','SSU':'U','OMU':'U','UR3':'U','MNU':'U','5BU':'U','S4U':'U', 'UMS':'U' } def get_aligned_muscle(seq1,seq2): """Returns aligned sequences and 
frac_same using MUSCLE. This needs to be moved to the muscle app controller """ outname = get_tmp_filename() res = muscle_seqs([seq1,seq2], add_seq_names=True, WorkingDir="/tmp", out_filename=outname) seq1_aligned,seq2_aligned =list(MinimalFastaParser(res['MuscleOut'].read())) res.cleanUp() del(res) seq1_aligned = seq1_aligned[1][1:] seq2_aligned = seq2_aligned[1][1:] frac_same = (array(seq1_aligned, 'c') == array(seq2_aligned, 'c')).sum(0)\ / min(len(seq1), len(seq2)) return seq1_aligned,seq2_aligned,frac_same def get_chains(lines): """From list of lines in pdb records, returns dict {chain:[(pos,residue)]}. Keeps original 1-based numbering. All residues will be returned. """ chains = {} last_resnum = None ter=False prev_chains=[] for line in lines: #skip if not an atom line if not (line.startswith('ATOM') or line.startswith('HETATM')): #If chain terminated, record chain. if line.startswith('TER'): ter=True prev_chains.append(chain) else: continue else: residue = line[17:20].strip() chain = line[21].strip() try: resnum = int(line[22:27].strip()) #Some resnum columns have non-integer values except ValueError: resnum = line[22:27].strip() #Continue until next chain is found. if ter: if chain in prev_chains: continue else: ter=False if chain not in chains: chains[chain] = [] curr_chain = chains[chain] else: if chain not in chains: chains[chain] = [] curr_chain = chains[chain] if resnum != last_resnum: curr_chain.append((resnum,residue)) last_resnum = resnum return chains def ungapped_to_pdb_numbers(chain_list): """From a chain list, map seq position -> res number.""" return dict(enumerate([i[0]for i in chain_list])) def chains_to_seqs(chains): """Returns sequences as an array of chars for each chain. Will concatenate all the residues that exist; it's the job of alignment or similar to align it with other seqs. 
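get_chains() relies on the fixed-column layout of PDB ATOM/HETATM records: residue name in line[17:20], chain id in line[21], residue number in line[22:27]. A minimal sketch of those same slices on a hand-built ATOM record (the coordinates are made up):

```python
# a single ATOM record laid out in standard PDB fixed columns
atom_line = ("ATOM      1  N   ALA A   1      "
             "11.104  13.207   2.100  1.00 20.00           N")

res_name = atom_line[17:20].strip()   # residue name, columns 18-20
chain_id = atom_line[21].strip()      # chain id, column 22
res_num = int(atom_line[22:27].strip())  # residue number, columns 23-27

print(res_name, chain_id, res_num)  # ALA A 1
```

The try/except around the int() call in the real code handles the occasional non-integer residue-number column.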
Can get numbering """ result = {} chain_to_seq_type = {} for chain_id, residues in chains.items(): seq_type = None curr_seq = [] first_res = residues[0][1].strip() if first_res in three_to_one: seq_type = 'Protein' else: seq_type = 'Nucleotide' for res_id, seq in residues: seq = seq.strip() if seq_type == 'Nucleotide': curr_seq.append(nucleotides.get(seq, 'N')) else: curr_seq.append(three_to_one.get(seq, '?')) result[chain_id] = ''.join(curr_seq) chain_to_seq_type[chain_id]=seq_type return result, chain_to_seq_type def get_best_muscle_hits(subject_seq, query_aln,threshold,use_shorter=True): """Returns subset of query_aln with alignment scores above threshold. - subject_seq is sequence aligned against query_aln seqs. - query_aln is dict or Alignment object with candidate seqs to be aligned with subject_seq. - threshold is an alignment score (fraction shared aligned length) which returned seqs must be above when aligned w/ subject_seq. - use_shorter (default=True) is to decide whether to use the length of the shorter sequence to calculate the alignment score. """ keep={} #best = 0 for query_label, query_seq in query_aln.items(): subject_aligned, query_aligned, frac_same = \ get_aligned_muscle(subject_seq,query_seq) #if frac_same > best: if frac_same > threshold: keep[query_label]=query_seq #best=frac_same return keep def get_matching_chains(subject_seq, pdb_lines,\ subject_type='Protein',threshold=0.8): """Returns PDB chains that match subject_seq. - subject_seq must be a sequence string. - pdb_lines must be a list of lines from a PDB record. - subject_type must be the type of sequence that subject_seq is. This is used to build blast database. 
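The residue-to-letter conversion in chains_to_seqs() is a plain dict lookup with a '?' fallback for unrecognised protein residues. A sketch using a small subset of the three_to_one table defined above:

```python
# tiny subset of the module's three_to_one mapping, for illustration only
three_to_one = {'ALA': 'A', 'GLY': 'G', 'MSE': 'M'}

def residues_to_seq(residues):
    """map three-letter residue codes to one-letter, '?' for unknowns"""
    return ''.join(three_to_one.get(r, '?') for r in residues)

print(residues_to_seq(['ALA', 'GLY', 'MSE', 'XYZ']))  # AGM?
```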
""" #Get PDB sequence info chains = get_chains(pdb_lines) #Get pdb numbering ungapped_to_pdb = {} for k,v in chains.items(): ungapped_to_pdb[k]=ungapped_to_pdb_numbers(v) #Get pdb alignment and sequence types pdb_aln, pdb_types = chains_to_seqs(chains) #Get only sequences of subject_type pdb_matching = \ dict([(k,pdb_aln[k]) for k,v in pdb_types.items() if v == subject_type]) #if there is more than one chain in pdb_matching if len(pdb_matching) > 1: #Get best hits using MUSCLE. pdb_matching = \ get_best_muscle_hits(subject_seq, pdb_matching, threshold) return pdb_matching, ungapped_to_pdb def align_subject_to_pdb(subject_seq, pdb_matching): """Returns pairwise aligned subject_seq and pdb_matching alignment. - result will be a dict: {pdb_chain:(aligned_subject, aligned_pdb)} """ result = {} for pdb_chain, pdb_seq in pdb_matching.items(): subject_aligned, pdb_aligned,frac_same = \ get_aligned_muscle(subject_seq, pdb_seq) result[pdb_chain]=(subject_aligned,pdb_aligned) return result #####The following code must be in the .pml script:#### def iterate_blocks(seq, max_len=100): """Yields successive blocks up to max_len from seq.""" curr = 0 while curr < len(seq): yield seq[curr:curr+max_len] curr += max_len def make_color_list(colors, prefix="color_"): """Makes list of colors, sequentially numbered after prefix.""" return [(prefix+str(i+1), color) for i, color in enumerate(colors)] def set_color_list(color_list): """Uses cmd to set all the items in a list of colors as named colors.""" for name, color in color_list: cmd.set_color(name, color) def set_seq_colors(colors, indices, chain_id): """Takes list of colors same length as seq, index mapping, and chain id.""" for i, color in enumerate(colors): idx = indices[i].split('+') for block in iterate_blocks(idx): curr = '+'.join(block) cmd.color(color, "chain %s and resi %s" % (chain_id, curr)) def set_show_shapes(indices, chain_id, shape="sticks"): """Takes list of indices and a chain id and sets to shape. 
""" for block in iterate_blocks(indices): str_indices = '+'.join(map(str,block)) cmd.show(shape,"chain %s and resi %s" % (chain_id, str_indices)) #pymol coloring functions string: PYMOL_FUNCTION_STRING = \ ''' def iterate_blocks(seq, max_len=100): """Yields successive blocks up to max_len from seq.""" curr = 0 while curr < len(seq): yield seq[curr:curr+max_len] curr += max_len def make_color_list(colors, prefix="color_"): """Makes list of colors, sequentially numbered after prefix.""" return [(prefix+str(i+1), color) for i, color in enumerate(colors)] def set_color_list(color_list): """Uses cmd to set all the items in a list of colors as named colors.""" for name, color in color_list: cmd.set_color(name, color) def set_seq_colors(colors, indices, chain_id): """Takes list of colors same length as seq, index mapping, and chain id.""" for i, color in enumerate(colors): idx = indices[i].split('+') for block in iterate_blocks(idx): curr = '+'.join(block) cmd.color(color, "chain %s and resi %s" % (chain_id, curr)) def set_show_shapes(indices, chain_id, shape="sticks"): """Takes list of indices and a chain id and sets to shape. """ str_indices = "+".join(map(str,indices)) cmd.show(shape,"chain %s and resi %s" % (chain_id, str_indices)) ''' MAIN_FUNCTION_STRING = \ ''' cmd.hide() cmd.show("cartoon") cmd.color("white") ''' PyCogent-1.5.3/cogent/format/phylip.py000644 000765 000024 00000003020 12024702176 020662 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def phylip_from_alignment(aln, generic_label=True, make_seqlabel=None): """returns a phylip formatted string and an ID map of new to original sequence names. Sequences are sequential. Fails if all sequences are not the same length. 
Arguments: - generic_label: if true then numbered seq labels are generated - make_seqlabel: callback function that takes the seq object and returns a label str. The user must ensure these are correct for Phylip format. """ assert generic_label or make_seqlabel is not None if aln.isRagged(): raise ValueError, "Sequences in alignment are not all the same " +\ "length. Cannot generate PHYLIP format." num_seqs = len(aln.Seqs) if not aln or not num_seqs: return "" phylip_out = ["%d %d" % (num_seqs, aln.SeqLen)] id_map = {} cur_seq_id = 1 for seq_name, seq in zip(aln.Names, aln.Seqs): if make_seqlabel is not None: label = make_seqlabel(seq) elif generic_label: label = "seq%07d" % cur_seq_id id_map[label] = seq_name phylip_out.append("%s %s" % (label, seq)) cur_seq_id += 1 return '\n'.join(phylip_out), id_map PyCogent-1.5.3/cogent/format/rna_struct.py000644 000765 000024 00000036054 12024702176 021556 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Writer functions for RNA 2D structures. NOTE: Still in beta testing. 
""" from matplotlib import use use('Agg') #suppress graphical rendering from cogent.app.vienna_package import plot_from_seq_and_struct from cogent.parse.rna_plot import RnaPlotParser from cogent.parse.record_finder import LabeledRecordFinder from matplotlib.patches import Circle, Rectangle, Polygon from matplotlib.text import Text from pylab import gcf,gca, draw, savefig, clf from math import ceil __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" ################################################################################ ##### Code for postscript RNA 2d Struct ######################################## ################################################################################ COLOR_FUNCTION_STRING = \ """/Colormark { % i color Colormark draw circle around base i % setcolor setrgbcolor newpath 1 sub coor exch get aload pop fsize 2 div 0 360 arc fill stroke } bind def /Backcolor { % i color Backcolor draw circle around base i % setcolor setrgbcolor newpath 1 sub coor exch get aload pop fsize 2.6 div 0 360 arc fill stroke } bind def /Backgroundbases { indices { WHITE Backcolor } forall } bind def""" #Draw structure outline, base pairs, and base background. INIT_STRUCT_COMMAND = \ """drawoutline drawpairs Backgroundbases """ def get_indices_string(sequence): """Returns postscript index definition string. - sequence: string, list, etc representing the sequence of the RNA. Calling len(sequence) must give the length of the sequence. """ index_string = '/indices [%s] def'%(' '.join(map(str,\ [i+1 for i in range(len(sequence))]))) return index_string def get_color_map_string(color_map): """Returns postscript color map string given color map. - color_map: mapping from name to RGB color. {name:[R,G,B]} - Name must be a string with no spaces. 
- [R, G, B] must be a list of fractional RGB values from 0 to 1. eg {"red": [1.0, 0.0, 0.0] """ color_list = [] for name, color in color_map.items(): r,g,b = color color_list.append('/%s { %s %s %s } def'%(name,r,g,b)) return '\n'.join(color_list) def get_color_commands(indices, colors): """Returns postscript coloring string given indices and colors. - indices: base 1 index of color. NOT BASE 0. - colors: color name corresponding to color name used to generate color_map. - indices and colors must be lists of the same length. """ color_commands = [] for index, color in zip(indices,colors): color_commands.append('%s %s Colormark'%(str(index),color)) return ' '.join(color_commands) def get_circle_commands(indices, color='seqcolor'): """Returns postscript circling string given indices. - indices: base 1 index of color. NOT BASE 0. - color: color name of circle. """ circle_commands = [] if indices: circle_commands.append(color) for index in indices: circle_commands.append(str(index)+' cmark') return ' '.join(circle_commands) def get_rnaplot_postscript(sequence, struct): """Returns postscript string for seq and struct. """ #Params for RNAplot params = {'-t':'0',\ '--pre':'%PreTextHere'} #Get the postscript list ps_list = plot_from_seq_and_struct(sequence,\ struct,params=params).split('\n') #parse it into prefix and suffix lists pre_finder = LabeledRecordFinder(\ is_label_line=lambda x: x.startswith('%PreTextHere')) prefix,suffix = list(pre_finder(ps_list)) #Remove drawoutline and drawpairs commands form suffix new_suffix = [] for s in suffix: if not (s.startswith('drawpairs') or s.startswith('drawoutline')): new_suffix.append(s) return '\n'.join(prefix), '\n'.join(new_suffix) def color_on_structure(sequence,struct,color_map,indices=None,colors=None,\ circle_indices=None): """Returns a postscript string colored at indices. 
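get_color_commands(), copied from the source, pairs 1-based residue indices with named colors into PostScript Colormark calls:

```python
def get_color_commands(indices, colors):
    """Returns postscript coloring string given 1-based indices and colors."""
    color_commands = []
    for index, color in zip(indices, colors):
        color_commands.append('%s %s Colormark' % (str(index), color))
    return ' '.join(color_commands)

print(get_color_commands([1, 5], ['red', 'blue']))
# 1 red Colormark 5 blue Colormark
```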
""" if indices is None: indices = [] if colors is None: colors = [] if len(indices) != len(colors): raise ValueError, 'indices and colors must be equal sized lists' #Get indices string indices_string = get_indices_string(str(sequence)) #Get color map string color_map_string = get_color_map_string(color_map) #Get color commands color_commands = get_color_commands(indices,colors) #Get circle commands circle_commands = get_circle_commands(circle_indices) #get RNAplot postscript prefix, suffix = get_rnaplot_postscript(sequence, struct) return '\n'.join([prefix,COLOR_FUNCTION_STRING,\ indices_string,INIT_STRUCT_COMMAND,\ color_map_string,color_commands,circle_commands,suffix]) ################################################################################ ##### END Code for postscript RNA 2d Struct #################################### ################################################################################ ################################################################################ ##### Code for matplotlib RNA 2d Struct ######################################## ################################################################################ def scale_coords(coords): """Returns coordinate list scaled to matplotlib coordintates. - coords: list of lists, of x,y coordinates: [[x1,y1],[x2,y2],[x3,y3],...] """ new_coords = [] #get min and max x coordinates max_x = max([c[0] for c in coords]) min_x = min([c[0] for c in coords]) #get min and max y coordinates max_y = max([c[1] for c in coords]) min_y = min([c[1] for c in coords]) #Get scaled max values for x and y scaled_max_x = max_x - min_x scaled_max_y = max_y - min_y #max scale value max_scale = max(scaled_max_x, scaled_max_y) scale_x = min_x scale_y = min_y for x,y in coords: new_coords.append([(x-scale_x)/max_scale,\ (y-scale_y)/max_scale]) return new_coords def set_axis_limits(axis, all_coords): """Sets axis limits based on all coordinates preventing clipping. 
axis: from calling gca() all_coords: all coordinates for shapes and labels. """ all_x = [x[0] for x in all_coords] all_y = [x[1] for x in all_coords] offset = abs(all_coords[0][0]-all_coords[1][0]) + \ abs(all_coords[0][1]-all_coords[1][1]) min_x = min(all_x) max_x = max(all_x) min_y = min(all_y) max_y = max(all_y) axis.set_xlim((-offset,1.0+offset)) axis.set_ylim((-offset,1.0+offset)) def make_circles(coords,facecolors,edgecolors,radius=0.02,alpha=1.0,fill=False): """Returns list of Circle objects, given list of coordinates. - coords: list of [x,y] coordinates, already scaled to matplotlib axes. - facecolor: color of circle face. - edgecolor: color of circle edge. - radius: radius of circle. """ recenter_divide = radius*100. recenter = radius/recenter_divide circles = [] for coord, facecolor,edgecolor in zip(coords,facecolors,edgecolors): x_coord,y_coord = coord curr_circle = Circle([x_coord+recenter,y_coord+recenter],\ radius=radius,facecolor=facecolor,edgecolor=edgecolor,alpha=alpha,\ fill=fill) circles.append(curr_circle) return circles def make_boxes(coords, facecolor='white', edgecolor='black', edge_size=0.03,\ alpha=1.0,fill=False): """Returns list of Rectangle objects, given list of coordinates. - coords: list of [x,y] coordinates, already scaled to matplotlib axes. - facecolor: color of box face. - edgecolor: color of box edge. - edge_size: length of box edges. """ boxes = [] recenter_divide = edge_size*200. recenter = edge_size/recenter_divide for x_coord,y_coord in coords: curr_box = Rectangle([x_coord-recenter,y_coord-recenter],\ edge_size,edge_size,\ facecolor=facecolor,edgecolor=edgecolor,alpha=alpha,fill=fill) boxes.append(curr_box) return boxes def make_letters(coords,letters,color='black'): """Returns list of Text objects, given list of coordinates and letteres. - coords: list of [x,y] coordinates, already scaled to matplotlib axes. - letters: list of letters to be drawn at given coordinates. Must be same size list as coords. 
- color: color of the letters. """ letter_list = [] for coord, letter in zip(coords, letters): x_coord, y_coord = coord curr_letter = Text(x_coord, y_coord, letter) letter_list.append(curr_letter) return letter_list def make_pairs(coords,pair_indices,offset=.01): """Returns list of Polygon objects, given a list of coordinates and indices. - coords: list of [x,y] coordinates, already scaled to matplotlib axes. - pair_indices: indices in the coordinate list that are paired. """ pairs = [] for first, second in pair_indices: fx, fy = coords[first] sx, sy = coords[second] pairs.append(Polygon([[fx+offset,fy+offset],[sx+offset,sy+offset]],\ alpha=0.2,linewidth=2)) return pairs def make_outline(coords,offset=.01): """Returns Polygon object given coords. """ outline_coords = [[x+offset,y+offset] for x,y in coords] outline = Polygon(outline_coords,alpha=0.2,linewidth=2,facecolor='white') finish = Polygon([outline_coords[0],outline_coords[-1]],\ alpha=1,linewidth=3,edgecolor='white',facecolor='white') return [outline,finish] def make_labels(coords): """Returns Text objects with 5' and 3' labels. """ #get five prime coordinates fp_x = coords[0][0] - (coords[1][0] - coords[0][0]) fp_y = coords[0][1] - (coords[1][1] - coords[0][1]) fp_label = Text(fp_x,fp_y,"5'") #get three prime coordinates tp_x = coords[-1][0] - (coords[-2][0] - coords[-1][0]) tp_y = coords[-1][1] - (coords[-2][1] - coords[-1][1]) tp_label = Text(tp_x,tp_y,"3'") return [fp_label,tp_label], [[fp_x,fp_y],[tp_x,tp_y]] def draw_structure(sequence,struct,indices=None,colors=None,\ circle_indices=None, square_indices=None,radial=True): """Returns a postscript string colored at indices. sequence: string of sequence characters. struct: string of ViennaStructure for sequence. Must be valid structure same length as sequence. indices: list of indices in sequence that will be colored as a solid circle. colors: list of colors, same length as list of indices. 
circle_indices: list of indices in sequence to draw an empty circle around. square_indices: list of indices in sequence to draw an empty square around. radial: draw structue in radial format (default=True). """ seq_len_scale = int(len(sequence)/50.) if seq_len_scale < 1: circle_scale_size = 0. square_scale_size = 0. else: #Get circle radius. Proportional to sequence length circle_scale_size = (.02/seq_len_scale)/4.0 #Get edge size. Proportional to sequence length square_scale_size = (.03/seq_len_scale)/4.0 circle_radius = .02 - circle_scale_size square_edge_size = .03 - square_scale_size if indices is None: indices = [] if colors is None: colors = [] if circle_indices is None: circle_indices = [] if square_indices is None: square_indices = [] if len(indices) != len(colors): raise ValueError, 'indices and colors must be equal sized lists' if radial: params = {'-t':'0'} else: params = {'-t':'1'} #Get the postscript list ps_list = plot_from_seq_and_struct(sequence,\ struct,params=params).split('\n') #Parse out seq, coords, and pairs seq, coords, pair_list = RnaPlotParser(ps_list) coords = scale_coords(coords) #get letters letters = make_letters(coords, list(seq)) #get pairs pairs = make_pairs(coords, pair_list) #get outline outline = make_outline(coords) #get labels labels,label_coords = make_labels(coords) #get plain circle coords circle_coords = [coords[i] for i in circle_indices] circle_faces = ['white']*len(circle_coords) circle_edges = ['black']*len(circle_coords) plain_circles = make_circles(circle_coords,circle_faces, circle_edges,\ radius=circle_radius) #get motif circles motif_coords = [coords[i] for i in indices] motif_circles = make_circles(motif_coords,colors,colors,fill=True,\ radius=circle_radius) #Get square coords square_coords = [coords[i] for i in square_indices] plain_squares = make_boxes(square_coords,edge_size=square_edge_size) axis = gca() axis.set_axis_off() all_coords = coords + label_coords set_axis_limits(axis, all_coords) for l in [letters, 
pairs, outline, motif_circles, plain_circles, \ plain_squares,labels]: for shape in l: axis.add_artist(shape) fig = gcf() largest = max([fig.get_figheight(), fig.get_figwidth()]) fig.set_figheight(largest) fig.set_figwidth(largest) def save_structure(sequence,struct,out_filename,indices=None,colors=None,\ circle_indices=None, square_indices=None,radial=True,format='png',\ dpi=75): """Saves figure of 2D structure generated by draw_structure(). sequence: string of sequence characters. struct: string of ViennaStructure for sequence. Must be valid structure same length as sequence. indices: list of indices in sequence that will be colored as a solid circle. colors: list of colors, same length as list of indices. circle_indices: list of indices in sequence to draw an empty circle around. square_indices: list of indices in sequence to draw an empty square around. radial: draw structue in radial format (default=True). format: format of structure figure (default=png). Must be a valid matplotlib format. dpi: resolution of figure (default=75). """ draw_structure(sequence,struct,indices=indices,colors=colors,\ circle_indices=circle_indices,\ square_indices=square_indices,radial=radial) savefig(out_filename,format=format,dpi=dpi) clf() ################################################################################ ##### End Code for matplotlib RNA 2d Struct #################################### ################################################################################ PyCogent-1.5.3/cogent/format/stockholm.py000644 000765 000024 00000006513 12024702176 021372 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Writer for Stockholm format. 
""" from cogent.core.alignment import SequenceCollection from copy import copy __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" def stockholm_from_alignment(aln, interleave_len=None, GC_annotation=None): """Returns a string in Stockholm format. - aln: can be an Alignment object or a dict. - interleave_len: sequence line width. Only available if sequences are aligned. - GC_annotation: dict containing Per-column annotation {:}, added to Stockholm file in the following format: #=GC - is an aligned text line of annotation type . - #=GC lines are associated with a sequence alignment block; - is aligned to the residues in the alignment block, and has the same length as the rest of the block. #=GC lines are placed at the end of each block. """ if not aln: return '' # get seq output order try: order = aln.RowOrder except: order = aln.keys() order.sort() seqs = SequenceCollection(aln) stockholm_list = ["# STOCKHOLM 1.0\n"] if seqs.isRagged(): raise ValueError,\ "Sequences in alignment are not all the same length." +\ "Cannot generate Stockholm format." aln_len = seqs.SeqLen #Get all labels labels = copy(seqs.Names) #Get ordered seqs ordered_seqs = [seqs.NamedSeqs[label] for label in order] if GC_annotation is not None: GC_annotation_list = \ [(k,GC_annotation[k]) for k in sorted(GC_annotation.keys())] #Add GC_annotation to list of labels. labels.extend(['#=GC '+ k for k in GC_annotation.keys()]) for k,v in GC_annotation.items(): if len(v) != aln_len: raise ValueError, """GC annotation %s is not same length as alignment. Cannot generate Stockholm format."""%(k) #Find all label lengths in order to get padding. 
label_lengths = [len(l) for l in labels] label_max = max(label_lengths) max_spaces = label_max+4 if interleave_len is not None: curr_ix = 0 while curr_ix < aln_len: stockholm_list.extend(["%s%s%s"%(x,' '*(max_spaces-len(x)),\ y[curr_ix:curr_ix+ \ interleave_len]) for x,y in zip(order, ordered_seqs)]) if GC_annotation is not None: stockholm_list.extend(["#=GC %s%s%s"%(x,\ ' '*(max_spaces-len(x)-5),\ y[curr_ix:curr_ix + interleave_len]) for x,y in\ GC_annotation_list]) stockholm_list.append("") curr_ix += interleave_len else: stockholm_list.extend(["%s%s%s"%(x,' '*(max_spaces-len(x)),y) \ for x,y in zip(order, ordered_seqs)]) if GC_annotation is not None: stockholm_list.extend(["#=GC %s%s%s"%(x,' '*(max_spaces-len(x)-5),\ y) for x,y in GC_annotation_list]) stockholm_list.append("") return '\n'.join(stockholm_list)+'//' PyCogent-1.5.3/cogent/format/structure.py000644 000765 000024 00000003074 12024702176 021426 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import os from cogent.format.pdb import PDBWriter, PDBXWriter from cogent.format.xyzrn import XYZRNWriter from cogent.parse.record import FileFormatError __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" def save_to_filename(entities, filename, format, **kw): """Saves a structure in a specified format into a given file name. Arguments: - entities: structure or entities to be written - filename: name of the structure file - format: structure file format """ f = open(filename, 'w') try: write_to_file(f, entities, format, **kw) except Exception: try: os.unlink(filename) except Exception: pass raise f.close() def write_to_file(f, entities, format, **kw): """Saves a structure in a specified format into a given file handle. 
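The format dispatch in write_to_file() is a dict lookup keyed on the lower-cased format name. A self-contained sketch with stand-in writer callables (ValueError substitutes here for cogent's FileFormatError, and the lambda writers are invented for illustration):

```python
# stand-in writers keyed by format suffix; the real table maps
# 'pdb' -> PDBWriter, 'pdbx' -> PDBXWriter, 'xyzrn' -> XYZRNWriter
WRITERS = {
    'pdb': lambda f, entities: f.append('pdb:%s' % entities),
    'xyzrn': lambda f, entities: f.append('xyzrn:%s' % entities),
}

def write_to_file(f, entities, format):
    format = format.lower()
    if format not in WRITERS:
        # the real code raises cogent's FileFormatError here
        raise ValueError('Unsupported file format %s' % format)
    WRITERS[format](f, entities)

out = []
write_to_file(out, 'structure', 'PDB')  # case-insensitive dispatch
print(out)  # ['pdb:structure']
```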
Arguments: - entities: structure or entities to be written - filename: name of the structure file - format: structure file format """ format = format.lower() if format not in WRITERS: raise FileFormatError("Unsupported file format %s" % format) writer = WRITERS[format] writer(f, entities, **kw) # to add a new file format add it's suffix and class name here WRITERS = { 'pdb': PDBWriter, 'pdbx': PDBXWriter, 'xyzrn': XYZRNWriter } PyCogent-1.5.3/cogent/format/table.py000644 000765 000024 00000046737 12024702176 020472 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Tool for creating tables and representing them as text, or writing to file for import into other packages. These classes still under development. Current formats include restructured text (keyed by 'rest'), latex, html, columns separated by a provided string, and a simple text format. """ import textwrap from cogent.util.warning import discontinued __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Peter Maxwell", "Matthew Wakefield", "Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def _merged_cell_text_wrap(text, max_line_length, space): """ left justify wraps text into multiple rows""" max_line_width = max_line_length - (2 * space) if len(text) < max_line_length: return [text] buffer = ' ' * space wrapped = textwrap.wrap(text, width=max_line_width, initial_indent = buffer, subsequent_indent = buffer) wrapped = ["%s" % line.ljust(max_line_width + 2*space) for line in wrapped] return wrapped def html(text, **kwargs): """Returns the text as html.""" from docutils.core import publish_string # assuming run from the correct directory return publish_string(source=text, writer_name='html', **kwargs) def _merge_cells(row): """merges runs of identical row cells. 
    returns a list with structure [((span_start, span_end), cell value), ..]"""
    new_row = []
    last = 0
    span = 1  # the minimum
    for i in range(1, len(row), 1):
        if row[i - 1] != row[i]:
            new_row.append(((last, last + span), row[i - 1]))
            last = i
            span = 1
            continue
        span += 1
    new_row.append(((last, last + span), row[-1]))
    return new_row

def rich_html(rows, row_cell_func=None, header=None, header_cell_func=None,
        element_formatters={}, merge_identical=True, compact=True):
    """returns just the html Table string

    Arguments:
        - rows: table rows
        - row_cell_func: callback function that formats the row values. Must
          take the row value and coordinates (row index, column index).
        - header: the table header
        - header_cell_func: callback function that formats the column headings
          must take the header label value and coordinate
        - element_formatters: a dictionary of specific callback funcs for
          formatting individual html table elements.
          e.g. {'table': lambda x: '<table border="1">'}
        - merge_identical: cells within a row are merged to one span.

    Note: header_cell_func and row_cell_func override element_formatters.
    """
    formatted = element_formatters.get
    data = [formatted('table', '<table>')]
    # TODO use the docutils writer html convertor instead of str, for correct
    # escaping of characters
    if row_cell_func is None:
        row_cell_func = lambda v, r, c: '<td>%s</td>' % v
    if header_cell_func is None:
        header_cell_func = lambda v, c: '<th>%s</th>' % v
    if merge_identical:
        row_iterator = _merge_cells
    else:
        row_iterator = enumerate
    if header:
        row = [header_cell_func(label, i) for i, label in enumerate(header)]
        data += [formatted('tr', '<tr>')] + row + ['</tr>']
    formatted_rows = []
    for ridx, row in enumerate(rows):
        new = [formatted('tr', '<tr>')]
        for cidx, cell in row_iterator(row):
            new += [row_cell_func(cell, ridx, cidx)]
        new += ['</tr>']
        formatted_rows += new
    data += formatted_rows
    data += ['</table>']
    if compact:
        data = ''.join(data)
    else:
        data = '\n'.join(data)
    return data

def latex(rows, header=None, caption=None, justify=None, label=None,
        position=None):
    """Returns the text as a LaTeX longtable.

    Arguments:
        - header: table header
        - position: table page position, default is here, top separate page
        - justify: column justification, default is right aligned.
        - caption: Table legend
        - label: for cross referencing"""
    if not justify:
        numcols = [len(header), len(rows[0])][not header]
        justify = "r" * numcols

    justify = "{ %s }" % " ".join(list(justify))
    if header:
        header = "%s \\\\" % " & ".join([r"\bf{%s}" % head.strip()
                                         for head in header])
    rows = ["%s \\\\" % " & ".join(row) for row in rows]
    table_format = [r"\begin{longtable}[%s]%s" % (position or "htp!", justify)]
    table_format.append(r"\hline")
    table_format.append(header)
    table_format.append(r"\hline")
    table_format.append(r"\hline")
    table_format += rows
    table_format.append(r"\hline")
    if caption:
        table_format.append(r"\caption{%s}" % caption)
    if label:
        table_format.append(r"\label{%s}" % label)
    table_format.append(r"\end{longtable}")
    return "\n".join(table_format)

def simpleFormat(header, formatted_table, title=None, legend=None,
        max_width=1e100, identifiers=None, borders=True, space=2):
    """Returns a table in a simple text format.

    Arguments:
        - header: series with column headings
        - formatted_table: a two dimensional structure (list/tuple) of strings
          previously formatted to the same width within a column.
        - title: optional table title
        - legend: optional table legend
        - max_width: forces wrapping of table onto successive lines if its
          width exceeds that specified
        - identifiers: column index for the last column that uniquely
          identifies rows. Required if table width exceeds max_width.
        - borders: whether to display borders.
        - space: minimum number of spaces between columns.
""" table = [] if title: table.append(title) try: space = " " * space except TypeError: pass # if we are to split the table, creating sub tables, determine # the boundaries if len(space.join(header)) > max_width: if not identifiers: identifiers = 0 # having determined the maximum string lengths we now need to # produce subtables of width <= max_width col_widths = [len(head) for head in header] sep = len(space) min_length = sep * (identifiers - 1) + \ sum(col_widths[: identifiers]) if min_length > max_width: raise RuntimeError, "Maximum width too small for identifiers" begin, width = identifiers, min_length subtable_boundaries = [] for i in range(begin, len(header)): width += col_widths[i] + sep if width > max_width: subtable_boundaries.append((begin, i, width - col_widths[i] - sep)) width = min_length + col_widths[i] + sep begin = i # add the last sub-table subtable_boundaries.append((begin, len(header), width)) # generate the table for start, end, width in subtable_boundaries: if start > identifiers: # we are doing a sub-table table.append("continued: %s" % title) subhead = space.join([space.join(header[:identifiers]), space.join(header[start: end])]) width = len(subhead) table.append("=" * width) table.append(subhead) table.append("-" * width) for row in formatted_table: row = [space.join(row[:identifiers]), space.join(row[start: end])] table.append(space.join(row)) table.append("-" * width + "\n") # create the table as a list of correctly formatted strings else: header = space.join(header) length_head = len(header) if borders: table.append('=' * length_head) table.append(header) table.append('-' * length_head) else: table.append(header) for row in formatted_table: table.append(space.join(row)) if borders: table.append('-' * length_head) # add the legend, wrapped to the table widths if legend: wrapped = _merged_cell_text_wrap(legend, max_width, 0) table += wrapped return '\n'.join(table) def gridTableFormat(header, formatted_table, title = None, legend = None): 
"""Returns a table in restructured text grid format. Arguments: - header: series with column headings - formatted_table: a two dimensional structure (list/tuple) of strings previously formatted to the same width within a column. - title: optional table title - legend: optional table legend """ space = 2 # make the delineators row_delineate = [] heading_delineate = [] col_widths = [len(col) for col in header] for width in col_widths: row_delineate.append('-' * width) heading_delineate.append('=' * width) row_delineate = '+-' + '-+-'.join(row_delineate) + '-+' heading_delineate = '+=' + '=+='.join(heading_delineate) + '=+' contiguous_delineator = '+' + '-' * (len(row_delineate) - 2) + '+' table = [] # insert the title if title: table.append(contiguous_delineator) if len(title) > len(row_delineate) - 2: wrapped = _merged_cell_text_wrap(title, len(contiguous_delineator) - 2, space) for wdex, line in enumerate(wrapped): wrapped[wdex] = '|' + line + '|' table += wrapped else: centered = title.center(len(row_delineate) - 2) table.append('|' + centered + '|') # insert the heading row table.append(row_delineate) table.append('| ' + ' | '.join(header) + ' |') table.append(heading_delineate) # concatenate the rows, separating by delineators for row in formatted_table: table.append('| ' + ' | '.join(row) + ' |') table.append(row_delineate) if legend: if len(legend) > len(row_delineate) - 2: wrapped = _merged_cell_text_wrap(legend, len(contiguous_delineator) - 2, space) for wdex, line in enumerate(wrapped): wrapped[wdex] = '|' + line + '|' table += wrapped else: ljust = legend.ljust(len(row_delineate) - 3) table.append('| ' + ljust + '|') table.append(contiguous_delineator) return '\n'.join(table) def separatorFormat(header, formatted_table, title = None, legend = None, sep = None): """Returns a table with column entries separated by a delimiter. If an entry contains the sep character, that entry is put in quotes. 
Also, title and legends (if provided) are forced to a single line and all words forced to single spaces. Arguments: - header: series with column headings - formatted_table: a two dimensional structure (list/tuple) of strings previously formatted to the same width within a column. - sep: character to separate column entries (eg tab - \t, or comma) - title: optional table title - legend: optional table legend """ if sep is None: raise RuntimeError, "no separator provided" if title: title = " ".join(" ".join(title.splitlines()).split()) if legend: legend = " ".join(" ".join(legend.splitlines()).split()) new_table = [sep.join(header)] for row in formatted_table: for cdex, cell in enumerate(row): if sep in cell: row[cdex] = '"%s"' % cell new_table += [sep.join(row) for row in formatted_table] table = '\n'.join(new_table) # add the title to top of list if title: table = '\n'.join([title, table]) if legend: table = '\n'.join([table, legend]) return table def FormatFields(formats): """Formats row fields by index. Arguments: - formats: a series consisting of index,formatter callable pairs, eg [(0, "'%s'"), (4, '%.4f')]. All non-specified columns are formatted as strings.""" index_format = [] def callable(line, index_format = index_format): if not index_format: index_format = ["%s" for index in range(len(line))] for index, format in formats: index_format[index] = format formatted = [index_format[i] % line[i] for i in range(len(line))] return formatted return callable def SeparatorFormatWriter(formatter = None, ignore = None, sep=","): """Returns a writer for a delimited tabular file. The writer has a has_header argument which ignores the formatter for a header line. Default format is string. Does not currently handle Titles or Legends. Arguments: - formatter: a callable that returns a correctly formatted line. 
- ignore: lines for which ignore returns True are ignored - sep: the delimiter separating fields.""" formatter = formatter or [] def callable(lines, formatter = formatter, has_header=False): if not formatter: formatter = FormatFields([(i, "%s") for i in range(len(lines[0]))]) header_done = None for line in lines: if has_header and not header_done: formatted = sep.join(["%s" % field for field in line]) header_done = True else: formatted = sep.join(formatter(line)) yield formatted return callable def drawToPDF(header, formatted_table, filename, pagesize=(595,792), *args, **kw): """Writes the table to a pdf file Arguments: - header: series with column headings - formatted_table: a two dimensional structure (list/tuple) of strings previously formatted to the same width within a column. - filename: the name of the file or a file object - pagesize: a tuple of the page dimensions (in points); default is (595, 792) - columns: the number of columns of feature / representation pairs""" from reportlab.platypus import SimpleDocTemplate doc = SimpleDocTemplate(filename, leftMargin=10, rightMargin=10, pagesize=pagesize) doc.build([asReportlabTable(header, formatted_table, pagesize[0]*0.8, *args, **kw)]) def formattedCells(rows, header = None, digits=4, column_templates = None, missing_data = ''): """Return rows with each column's cells formatted as an equal length string. Arguments: - rows: the series of table rows - header: optional header - digits: number of decimal places. Can be overridden by following. - column_templates: specific format templates for each column. - missing_data: default cell value.
""" if not header: num_col = max([len(row) for row in rows]) header = [''] * num_col else: num_col = len(header) col_widths = [len(col) for col in header] num_row = len(rows) column_templates = column_templates or {} float_template = '%%.%df' % digits # if we have column templates, we use those, otherwise we adaptively # apply str/num format matrix = [] for row in rows: formatted = [] for cdex, col_head in enumerate(header): try: entry = row[cdex] except IndexError: entry = '%s' % missing_data else: if not entry: try: float(entry) # could numerically be 0, so not missing except (ValueError, TypeError): entry = '%s' % missing_data # attempt formatting if col_head in column_templates: try: # for functions entry = column_templates[col_head](entry) except TypeError: entry = column_templates[col_head] % entry elif isinstance(entry, float): entry = float_template % float(entry) else: # for any other python object entry = '%s' % str(entry) formatted.append(entry) col_widths[cdex] = max(col_widths[cdex], len(entry)) matrix.append(formatted) # now normalise all cell entries to max column widths new_header = [header[i].rjust(col_widths[i]) for i in range(num_col)] for row in matrix: for cdex in range(num_col): row[cdex] = row[cdex].rjust(col_widths[cdex]) return new_header, matrix def phylipMatrix(rows, names): """Return as a distance matrix in phylip's matrix format.""" # phylip compatible format is num taxa starting at col 4 # rows start with taxa names, length 8 # distances start at 13th col, 2 spaces between each col wrapped # at 75th col # follow on dists start at col 3 # outputs a square matrix def new_name(names, oldname): # the name has to be unique in that number, the best way to ensure that # is to determine the number and revise the existing name so it has a # int as its end portion num = len(names) max_num_digits = len(str(num)) assert max_num_digits < 10, "can't create a unique name for %s" % oldname name_base = oldname[:10 - max_num_digits] newname = None for i 
in range(max_num_digits): trial_name = "%s%s" % (name_base, i) if not trial_name in names: newname = trial_name break if not newname: raise RuntimeError, "Can't create a unique name for %s" % oldname else: print 'WARN: Seqname %s changed to %s' % (oldname, newname) return newname def append_species(name, formatted_dists, mat_breaks): rows = [] name = name.ljust(12) # format the distances first for i in range(len(mat_breaks)): if i == len(mat_breaks): break start = mat_breaks[i] try: end = mat_breaks[i + 1] except IndexError: end = len(formatted_dists) prefix = ['', ' '][i > 0] rows.append("%s%s" % (prefix, " ".join(formatted_dists[start: end]))) # mod first row of formatted_dists rows[0] = "%s%s" % (name.ljust(12), rows[0]) return rows # number of seqs numseqs = len(names) # determine wrapped table boundaries, if any prefix = 13 mat_breaks = [0] line_len = 75 # for the first block col_widths = [len(col) for col in rows[0]] for i in range(numseqs): num_cols = i - mat_breaks[-1] if prefix + 2 * num_cols + sum(col_widths[mat_breaks[-1]: i]) > line_len: prefix = 3 line_len = 73 mat_breaks.append(i) # build the formatted distance matrix dmat = [' %d' % numseqs] for i in range(numseqs): name = names[i].strip() # we determine white space if len(name) > 10: name = new_name(names, name) dmat += append_species(name, rows[i], mat_breaks) return "\n".join(dmat) PyCogent-1.5.3/cogent/format/text_tree.py #!/bin/env python # file text_tree.py """Simple base text representation of a phylo tree.""" __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Prototype" class TextNode(object): """ Helper to display a text phylo tree """ def __init__(self, branch_length, name): """Set up defaults""" self.BranchLength = 0.0 if
branch_length: self.BranchLength = branch_length self.Name = name self.Printed = False self.NumChildren = 0 self.NumChildrenPrinted = 0 self.LastChild = None def display(self, max_dist, scale): """Display current node - should refactor this""" delimiter = "-" pdelimiter = "+" if self.Printed: delimiter = " " pdelimiter = "|" # compute number of children last child has last_ct = 0 if self.LastChild: last_ct = self.LastChild.NumChildren # update values self.Printed = True self.NumChildrenPrinted += 1 print_ct = self.NumChildren - last_ct if self.NumChildrenPrinted == (print_ct + 1): # or \ delimiter = " " pdelimiter = "+" elif self.NumChildrenPrinted > (print_ct + 1): pdelimiter = " " delimiter = " " if (self.NumChildren == self.NumChildrenPrinted and self.NumChildren == print_ct): delimiter = " " pdelimiter = "+" # check if leaf dout = "" if self.Name: dout = "@@%s" % self.Name pdelimiter = ">" delimiter = "-" return (int(self.BranchLength) - 1) * delimiter + pdelimiter + dout def process_nodes(all_nodes, cur_nodes, cur_node, parent): """ Recursively process nodes """ # make current node pn = TextNode(cur_node.Length, cur_node.Name) # set current node as last child of parent (last one wins) if parent: parent.LastChild = pn # handle terminal node if not cur_node.Children: all_nodes.append(cur_nodes + [pn]) # internal node else: cur_nodes.append(pn) pn.NumChildren = 0 for child in cur_node.Children: if child.Children: pn.NumChildren += len(child.tips()) else: pn.NumChildren += 1 for child in cur_node.Children: process_nodes(all_nodes, cur_nodes, child, pn) cur_nodes.pop() def generate_nodes(tree, max_dist, scale): """Iterate over list of TextNodes to display """ all_nodes = [] cur_nodes = [] # generate list of lists of TextNodes process_nodes(all_nodes, cur_nodes, tree, None) # process each list of TextNodes for node_list in all_nodes: # generate text string and node key ticks, node_key = ''.join([x.display(max_dist, scale) for x in node_list]).split("@@") # compute 
distances dist = sum([x.BranchLength for x in node_list]) #scaled_dist = sum([x.ScaledBranchLength for x in node_list]) scaled_dist = sum([x.BranchLength for x in node_list]) branch_len = node_list[-1].BranchLength # yield each node yield ticks, node_key, dist, scaled_dist, branch_len PyCogent-1.5.3/cogent/format/xyzrn.py #!/usr/bin/env python """Function for XYZRN (coordinates followed by radius and id_hash) format output. This is a rather free implementation of this non-standard format, but is compatible with MSMS.""" from itertools import chain from cogent.struct.selection import einput from cogent.data.protein_properties import AREAIMOL_VDW_RADII from cogent.data.ligand_properties import LIGAND_AREAIMOL_VDW_RADII XYZRN_COORDS_STRING = "%8.3f %8.3f %8.3f %8.3f %d %s\n" AREAIMOL_VDW_RADII.update(LIGAND_AREAIMOL_VDW_RADII) __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" def write_header(header): """Write the header. Not implemented.""" return (header or []) def write_coords(atoms, radius_type): """Write coordinate lines from a list of ``Atom`` instances. Takes a string which identifies the radius type.""" lines = [] radius_type = eval(radius_type) for atom in atoms.sortedvalues(): residue_name = atom.parent.name atom_name = atom.name radius = radius_type[(residue_name, atom_name)] if radius == 0.0: continue (x, y, z) = atom.coords args = (x, y, z, radius, 1, hash((x, y, z))) line = XYZRN_COORDS_STRING % args lines.append(line) return lines def write_trailer(trailer): """Write the trailer.
Not implemented.""" return (trailer or []) def XYZRNWriter(f, entities, radius_type=None, header=None, trailer=None): """Function which writes XYZRN files from ``Entity`` instances.""" radius_type = (radius_type or 'AREAIMOL_VDW_RADII') structure = einput(entities, level='A', name='structure') header = write_header(header) coords = write_coords(structure, radius_type) trailer = write_trailer(trailer) for part in chain(header, coords, trailer): f.writelines(part) PyCogent-1.5.3/cogent/evolve/__init__.py000644 000765 000024 00000001335 12024702176 021133 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ["best_likelihood", "bootstrap", "likelihood_calculation", "likelihood_function", "likelihood_tree", "models", "pairwise_distance", "parameter_controller", "predicate", "simulate", "substitution_calculation", "substitution_model", "coevolution"] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Peter Maxwell", "Andrew Butterfield", "Rob Knight", "Matthrew Wakefield", "Brett Easton", "Edward Lang","Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" PyCogent-1.5.3/cogent/evolve/_likelihood_tree.c000644 000765 000024 00000602771 12024702176 022502 0ustar00jrideoutstaff000000 000000 /* Generated by Cython 0.16 on Fri Sep 14 12:12:06 2012 */ #define PY_SSIZE_T_CLEAN #include "Python.h" #ifndef Py_PYTHON_H #error Python headers needed to compile C extensions, please install development version of Python. #elif PY_VERSION_HEX < 0x02040000 #error Cython requires Python 2.4+. 
#else #include /* For offsetof */ #ifndef offsetof #define offsetof(type, member) ( (size_t) & ((type*)0) -> member ) #endif #if !defined(WIN32) && !defined(MS_WINDOWS) #ifndef __stdcall #define __stdcall #endif #ifndef __cdecl #define __cdecl #endif #ifndef __fastcall #define __fastcall #endif #endif #ifndef DL_IMPORT #define DL_IMPORT(t) t #endif #ifndef DL_EXPORT #define DL_EXPORT(t) t #endif #ifndef PY_LONG_LONG #define PY_LONG_LONG LONG_LONG #endif #ifndef Py_HUGE_VAL #define Py_HUGE_VAL HUGE_VAL #endif #ifdef PYPY_VERSION #define CYTHON_COMPILING_IN_PYPY 1 #define CYTHON_COMPILING_IN_CPYTHON 0 #else #define CYTHON_COMPILING_IN_PYPY 0 #define CYTHON_COMPILING_IN_CPYTHON 1 #endif #if CYTHON_COMPILING_IN_PYPY #define __Pyx_PyCFunction_Call PyObject_Call #else #define __Pyx_PyCFunction_Call PyCFunction_Call #endif #if PY_VERSION_HEX < 0x02050000 typedef int Py_ssize_t; #define PY_SSIZE_T_MAX INT_MAX #define PY_SSIZE_T_MIN INT_MIN #define PY_FORMAT_SIZE_T "" #define PyInt_FromSsize_t(z) PyInt_FromLong(z) #define PyInt_AsSsize_t(o) __Pyx_PyInt_AsInt(o) #define PyNumber_Index(o) PyNumber_Int(o) #define PyIndex_Check(o) PyNumber_Check(o) #define PyErr_WarnEx(category, message, stacklevel) PyErr_Warn(category, message) #define __PYX_BUILD_PY_SSIZE_T "i" #else #define __PYX_BUILD_PY_SSIZE_T "n" #endif #if PY_VERSION_HEX < 0x02060000 #define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt) #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type) #define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size) #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size, #define PyType_Modified(t) typedef struct { void *buf; PyObject *obj; Py_ssize_t len; Py_ssize_t itemsize; int readonly; int ndim; char *format; Py_ssize_t *shape; Py_ssize_t *strides; Py_ssize_t *suboffsets; void *internal; } Py_buffer; #define PyBUF_SIMPLE 0 #define PyBUF_WRITABLE 0x0001 #define PyBUF_FORMAT 0x0004 #define PyBUF_ND 0x0008 #define PyBUF_STRIDES (0x0010 | PyBUF_ND) #define PyBUF_C_CONTIGUOUS (0x0020 
| PyBUF_STRIDES) #define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES) #define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES) #define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES) #define PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_FORMAT | PyBUF_WRITABLE) #define PyBUF_FULL (PyBUF_INDIRECT | PyBUF_FORMAT | PyBUF_WRITABLE) typedef int (*getbufferproc)(PyObject *, Py_buffer *, int); typedef void (*releasebufferproc)(PyObject *, Py_buffer *); #endif #if PY_MAJOR_VERSION < 3 #define __Pyx_BUILTIN_MODULE_NAME "__builtin__" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #else #define __Pyx_BUILTIN_MODULE_NAME "builtins" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #endif #if PY_MAJOR_VERSION < 3 && PY_MINOR_VERSION < 6 #define PyUnicode_FromString(s) PyUnicode_Decode(s, strlen(s), "UTF-8", "strict") #endif #if PY_MAJOR_VERSION >= 3 #define Py_TPFLAGS_CHECKTYPES 0 #define Py_TPFLAGS_HAVE_INDEX 0 #endif #if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3) #define Py_TPFLAGS_HAVE_NEWBUFFER 0 #endif #if PY_VERSION_HEX > 0x03030000 && defined(PyUnicode_GET_LENGTH) #define CYTHON_PEP393_ENABLED 1 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_LENGTH(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) PyUnicode_READ_CHAR(u, i) #else #define CYTHON_PEP393_ENABLED 0 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_SIZE(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) ((Py_UCS4)(PyUnicode_AS_UNICODE(u)[i])) #endif #if PY_MAJOR_VERSION >= 3 #define PyBaseString_Type PyUnicode_Type #define PyStringObject PyUnicodeObject #define PyString_Type PyUnicode_Type #define PyString_Check PyUnicode_Check #define PyString_CheckExact PyUnicode_CheckExact #endif #if PY_VERSION_HEX < 0x02060000 #define PyBytesObject PyStringObject #define PyBytes_Type PyString_Type #define 
PyBytes_Check PyString_Check #define PyBytes_CheckExact PyString_CheckExact #define PyBytes_FromString PyString_FromString #define PyBytes_FromStringAndSize PyString_FromStringAndSize #define PyBytes_FromFormat PyString_FromFormat #define PyBytes_DecodeEscape PyString_DecodeEscape #define PyBytes_AsString PyString_AsString #define PyBytes_AsStringAndSize PyString_AsStringAndSize #define PyBytes_Size PyString_Size #define PyBytes_AS_STRING PyString_AS_STRING #define PyBytes_GET_SIZE PyString_GET_SIZE #define PyBytes_Repr PyString_Repr #define PyBytes_Concat PyString_Concat #define PyBytes_ConcatAndDel PyString_ConcatAndDel #endif #if PY_VERSION_HEX < 0x02060000 #define PySet_Check(obj) PyObject_TypeCheck(obj, &PySet_Type) #define PyFrozenSet_Check(obj) PyObject_TypeCheck(obj, &PyFrozenSet_Type) #endif #ifndef PySet_CheckExact #define PySet_CheckExact(obj) (Py_TYPE(obj) == &PySet_Type) #endif #define __Pyx_TypeCheck(obj, type) PyObject_TypeCheck(obj, (PyTypeObject *)type) #if PY_MAJOR_VERSION >= 3 #define PyIntObject PyLongObject #define PyInt_Type PyLong_Type #define PyInt_Check(op) PyLong_Check(op) #define PyInt_CheckExact(op) PyLong_CheckExact(op) #define PyInt_FromString PyLong_FromString #define PyInt_FromUnicode PyLong_FromUnicode #define PyInt_FromLong PyLong_FromLong #define PyInt_FromSize_t PyLong_FromSize_t #define PyInt_FromSsize_t PyLong_FromSsize_t #define PyInt_AsLong PyLong_AsLong #define PyInt_AS_LONG PyLong_AS_LONG #define PyInt_AsSsize_t PyLong_AsSsize_t #define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask #endif #if PY_MAJOR_VERSION >= 3 #define PyBoolObject PyLongObject #endif #if PY_VERSION_HEX < 0x03020000 typedef long Py_hash_t; #define __Pyx_PyInt_FromHash_t PyInt_FromLong #define __Pyx_PyInt_AsHash_t PyInt_AsLong #else #define __Pyx_PyInt_FromHash_t PyInt_FromSsize_t #define __Pyx_PyInt_AsHash_t PyInt_AsSsize_t #endif #if (PY_MAJOR_VERSION < 3) || (PY_VERSION_HEX >= 
0x03010300) #define __Pyx_PySequence_GetSlice(obj, a, b) PySequence_GetSlice(obj, a, b) #define __Pyx_PySequence_SetSlice(obj, a, b, value) PySequence_SetSlice(obj, a, b, value) #define __Pyx_PySequence_DelSlice(obj, a, b) PySequence_DelSlice(obj, a, b) #else #define __Pyx_PySequence_GetSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), (PyObject*)0) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_GetSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object is unsliceable", (obj)->ob_type->tp_name), (PyObject*)0))) #define __Pyx_PySequence_SetSlice(obj, a, b, value) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_SetSlice(obj, a, b, value)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice assignment", (obj)->ob_type->tp_name), -1))) #define __Pyx_PySequence_DelSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_DelSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice deletion", (obj)->ob_type->tp_name), -1))) #endif #if PY_MAJOR_VERSION >= 3 #define PyMethod_New(func, self, klass) ((self) ? 
PyMethod_New(func, self) : PyInstanceMethod_New(func)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),((char *)(n))) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),((char *)(n)),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),((char *)(n))) #else #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),(n)) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),(n),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),(n)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_NAMESTR(n) ((char *)(n)) #define __Pyx_DOCSTR(n) ((char *)(n)) #else #define __Pyx_NAMESTR(n) (n) #define __Pyx_DOCSTR(n) (n) #endif #if PY_MAJOR_VERSION >= 3 #define __Pyx_PyNumber_Divide(x,y) PyNumber_TrueDivide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceTrueDivide(x,y) #else #define __Pyx_PyNumber_Divide(x,y) PyNumber_Divide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceDivide(x,y) #endif #ifndef __PYX_EXTERN_C #ifdef __cplusplus #define __PYX_EXTERN_C extern "C" #else #define __PYX_EXTERN_C extern #endif #endif #if defined(WIN32) || defined(MS_WINDOWS) #define _USE_MATH_DEFINES #endif #include #define __PYX_HAVE__cogent__evolve___likelihood_tree #define __PYX_HAVE_API__cogent__evolve___likelihood_tree #include "array_interface.h" #include "math.h" #ifdef _OPENMP #include #endif /* _OPENMP */ #ifdef PYREX_WITHOUT_ASSERTIONS #define CYTHON_WITHOUT_ASSERTIONS #endif /* inline attribute */ #ifndef CYTHON_INLINE #if defined(__GNUC__) #define CYTHON_INLINE __inline__ #elif defined(_MSC_VER) #define CYTHON_INLINE __inline #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L #define CYTHON_INLINE inline #else #define CYTHON_INLINE #endif #endif /* unused attribute */ #ifndef CYTHON_UNUSED # if defined(__GNUC__) # if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) # define CYTHON_UNUSED __attribute__ 
((__unused__)) # else # define CYTHON_UNUSED # endif # elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif #endif typedef struct {PyObject **p; char *s; const long n; const char* encoding; const char is_unicode; const char is_str; const char intern; } __Pyx_StringTabEntry; /*proto*/ /* Type Conversion Predeclarations */ #define __Pyx_PyBytes_FromUString(s) PyBytes_FromString((char*)s) #define __Pyx_PyBytes_AsUString(s) ((unsigned char*) PyBytes_AsString(s)) #define __Pyx_Owned_Py_None(b) (Py_INCREF(Py_None), Py_None) #define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False)) static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject*); static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x); static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*); static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t); static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject*); #define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x)) #define __pyx_PyFloat_AsFloat(x) ((float) __pyx_PyFloat_AsDouble(x)) #ifdef __GNUC__ /* Test for GCC > 2.95 */ #if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)) #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0) #else /* __GNUC__ > 2 ... */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ > 2 ... 
*/ #else /* __GNUC__ */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ */ static PyObject *__pyx_m; static PyObject *__pyx_b; static PyObject *__pyx_empty_tuple; static PyObject *__pyx_empty_bytes; static int __pyx_lineno; static int __pyx_clineno = 0; static const char * __pyx_cfilenm= __FILE__; static const char *__pyx_filename; static const char *__pyx_f[] = { "numerical_pyrex.pyx", "_likelihood_tree.pyx", }; /*--- Type declarations ---*/ /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":30 * int CONTIGUOUS * * ctypedef object ArrayType # <<<<<<<<<<<<<< * * cdef double *uncheckedArrayDouble(ArrayType A): */ typedef PyObject *__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType; #ifndef CYTHON_REFNANNY #define CYTHON_REFNANNY 0 #endif #if CYTHON_REFNANNY typedef struct { void (*INCREF)(void*, PyObject*, int); void (*DECREF)(void*, PyObject*, int); void (*GOTREF)(void*, PyObject*, int); void (*GIVEREF)(void*, PyObject*, int); void* (*SetupContext)(const char*, int, const char*); void (*FinishContext)(void**); } __Pyx_RefNannyAPIStruct; static __Pyx_RefNannyAPIStruct *__Pyx_RefNanny = NULL; static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname); /*proto*/ #define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL; #ifdef WITH_THREAD #define __Pyx_RefNannySetupContext(name, acquire_gil) \ if (acquire_gil) { \ PyGILState_STATE __pyx_gilstate_save = PyGILState_Ensure(); \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ PyGILState_Release(__pyx_gilstate_save); \ } else { \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ } #else #define __Pyx_RefNannySetupContext(name, acquire_gil) \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__) #endif #define __Pyx_RefNannyFinishContext() \ __Pyx_RefNanny->FinishContext(&__pyx_refnanny) #define __Pyx_INCREF(r) __Pyx_RefNanny->INCREF(__pyx_refnanny, (PyObject 
*)(r), __LINE__) #define __Pyx_DECREF(r) __Pyx_RefNanny->DECREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GOTREF(r) __Pyx_RefNanny->GOTREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GIVEREF(r) __Pyx_RefNanny->GIVEREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_XINCREF(r) do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0) #define __Pyx_XDECREF(r) do { if((r) != NULL) {__Pyx_DECREF(r); }} while(0) #define __Pyx_XGOTREF(r) do { if((r) != NULL) {__Pyx_GOTREF(r); }} while(0) #define __Pyx_XGIVEREF(r) do { if((r) != NULL) {__Pyx_GIVEREF(r);}} while(0) #else #define __Pyx_RefNannyDeclarations #define __Pyx_RefNannySetupContext(name, acquire_gil) #define __Pyx_RefNannyFinishContext() #define __Pyx_INCREF(r) Py_INCREF(r) #define __Pyx_DECREF(r) Py_DECREF(r) #define __Pyx_GOTREF(r) #define __Pyx_GIVEREF(r) #define __Pyx_XINCREF(r) Py_XINCREF(r) #define __Pyx_XDECREF(r) Py_XDECREF(r) #define __Pyx_XGOTREF(r) #define __Pyx_XGIVEREF(r) #endif /* CYTHON_REFNANNY */ #define __Pyx_CLEAR(r) do { PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);} while(0) #define __Pyx_XCLEAR(r) do { if((r) != NULL) {PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);}} while(0) static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name); /*proto*/ static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb); /*proto*/ static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb); /*proto*/ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause); /*proto*/ static void __Pyx_RaiseArgtupleInvalid(const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found); /*proto*/ static void __Pyx_RaiseDoubleKeywordsError(const char* func_name, PyObject* kw_name); /*proto*/ static int __Pyx_ParseOptionalKeywords(PyObject *kwds, PyObject **argnames[], \ PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, \ const 
char* function_name); /*proto*/ static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Generic(PyObject *o, PyObject* j) { PyObject *r; if (!j) return NULL; r = PyObject_GetItem(o, j); Py_DECREF(j); return r; } #define __Pyx_GetItemInt_List(o, i, size, to_py_func) (((size) <= sizeof(Py_ssize_t)) ? \ __Pyx_GetItemInt_List_Fast(o, i) : \ __Pyx_GetItemInt_Generic(o, to_py_func(i))) static CYTHON_INLINE PyObject *__Pyx_GetItemInt_List_Fast(PyObject *o, Py_ssize_t i) { if (likely(o != Py_None)) { if (likely((0 <= i) & (i < PyList_GET_SIZE(o)))) { PyObject *r = PyList_GET_ITEM(o, i); Py_INCREF(r); return r; } else if ((-PyList_GET_SIZE(o) <= i) & (i < 0)) { PyObject *r = PyList_GET_ITEM(o, PyList_GET_SIZE(o) + i); Py_INCREF(r); return r; } } return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i)); } #define __Pyx_GetItemInt_Tuple(o, i, size, to_py_func) (((size) <= sizeof(Py_ssize_t)) ? \ __Pyx_GetItemInt_Tuple_Fast(o, i) : \ __Pyx_GetItemInt_Generic(o, to_py_func(i))) static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Tuple_Fast(PyObject *o, Py_ssize_t i) { if (likely(o != Py_None)) { if (likely((0 <= i) & (i < PyTuple_GET_SIZE(o)))) { PyObject *r = PyTuple_GET_ITEM(o, i); Py_INCREF(r); return r; } else if ((-PyTuple_GET_SIZE(o) <= i) & (i < 0)) { PyObject *r = PyTuple_GET_ITEM(o, PyTuple_GET_SIZE(o) + i); Py_INCREF(r); return r; } } return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i)); } #define __Pyx_GetItemInt(o, i, size, to_py_func) (((size) <= sizeof(Py_ssize_t)) ? \ __Pyx_GetItemInt_Fast(o, i) : \ __Pyx_GetItemInt_Generic(o, to_py_func(i))) static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Fast(PyObject *o, Py_ssize_t i) { if (PyList_CheckExact(o)) { Py_ssize_t n = (likely(i >= 0)) ? i : i + PyList_GET_SIZE(o); if (likely((n >= 0) & (n < PyList_GET_SIZE(o)))) { PyObject *r = PyList_GET_ITEM(o, n); Py_INCREF(r); return r; } } else if (PyTuple_CheckExact(o)) { Py_ssize_t n = (likely(i >= 0)) ? 
i : i + PyTuple_GET_SIZE(o); if (likely((n >= 0) & (n < PyTuple_GET_SIZE(o)))) { PyObject *r = PyTuple_GET_ITEM(o, n); Py_INCREF(r); return r; } } else if (likely(i >= 0)) { PySequenceMethods *m = Py_TYPE(o)->tp_as_sequence; if (likely(m && m->sq_item)) { return m->sq_item(o, i); } } return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i)); } static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject *); static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject *); static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject *); static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject *); static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject *); static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject *); static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject *); static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject *); static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject *); static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject *); static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject *); static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject *); static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject *); static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject *); static void __Pyx_WriteUnraisable(const char *name, int clineno, int lineno, const char *filename); /*proto*/ static int __Pyx_check_binary_version(void); typedef struct { int code_line; PyCodeObject* code_object; } __Pyx_CodeObjectCacheEntry; struct __Pyx_CodeObjectCache { int count; int max_count; __Pyx_CodeObjectCacheEntry* entries; }; static struct __Pyx_CodeObjectCache __pyx_code_cache = {0,0,NULL}; static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line); static PyCodeObject 
*__pyx_find_code_object(int code_line); static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object); static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename); /*proto*/ static int __Pyx_InitStrings(__Pyx_StringTabEntry *t); /*proto*/ /* Module declarations from 'cogent.evolve._likelihood_tree' */ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType, char, int, int, int **); /*proto*/ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray1D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType, char, int, int *); /*proto*/ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray2D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType, char, int, int *, int *); /*proto*/ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray3D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType, char, int, int *, int *, int *); /*proto*/ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray4D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType, char, int, int *, int *, int *, int *); /*proto*/ static double *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble1D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType, int *); /*proto*/ static double *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble2D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType, int *, int *); /*proto*/ static long *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayLong1D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType, int *); /*proto*/ #define __Pyx_MODULE_NAME "cogent.evolve._likelihood_tree" int __pyx_module_is_main_cogent__evolve___likelihood_tree = 0; /* Implementation of 'cogent.evolve._likelihood_tree' */ static PyObject *__pyx_builtin_TypeError; static PyObject *__pyx_builtin_ValueError; static PyObject *__pyx_builtin_chr; static PyObject 
*__pyx_pf_6cogent_6evolve_16_likelihood_tree_sumInputLikelihoods(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_child_indexes, PyObject *__pyx_v_result, PyObject *__pyx_v_likelihoods); /* proto */ static PyObject *__pyx_pf_6cogent_6evolve_16_likelihood_tree_2getTotalLogLikelihood(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_counts, PyObject *__pyx_v_input_likelihoods, PyObject *__pyx_v_mprobs); /* proto */ static PyObject *__pyx_pf_6cogent_6evolve_16_likelihood_tree_4getLogSumAcrossSites(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_counts, PyObject *__pyx_v_input_likelihoods); /* proto */ static PyObject *__pyx_pf_6cogent_6evolve_16_likelihood_tree_6logDotReduce(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_index, PyObject *__pyx_v_patch_probs, PyObject *__pyx_v_switch_probs, PyObject *__pyx_v_plhs); /* proto */ static char __pyx_k_1[] = "Array required, got None"; static char __pyx_k_3[] = "Unexpected array interface version %s"; static char __pyx_k_4[] = "'%s' type array required, got '%s'"; static char __pyx_k_5[] = "'%s%s' type array required, got '%s%s'"; static char __pyx_k_6[] = "%s dimensional array required, got %s"; static char __pyx_k_7[] = "Noncontiguous array"; static char __pyx_k_9[] = "Dimension %s is %s, expected %s"; static char __pyx_k_10[] = "('1', '5', '3')"; static char __pyx_k_14[] = "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/_likelihood_tree.pyx"; static char __pyx_k_15[] = "cogent.evolve._likelihood_tree"; static char __pyx_k_18[] = "getTotalLogLikelihood"; static char __pyx_k_21[] = "getLogSumAcrossSites"; static char __pyx_k__M[] = "M"; static char __pyx_k__S[] = "S"; static char __pyx_k__U[] = "U"; static char __pyx_k__c[] = "c"; static char __pyx_k__i[] = "i"; static char __pyx_k__j[] = "j"; static char __pyx_k__k[] = "k"; static char __pyx_k__m[] = "m"; static char __pyx_k__n[] = "n"; static char __pyx_k__u[] = "u"; static char __pyx_k__pl[] = "pl"; static char __pyx_k__sp[] = "sp"; 
static char __pyx_k__chr[] = "chr"; static char __pyx_k__tmp[] = "tmp"; static char __pyx_k__BASE[] = "BASE"; static char __pyx_k__copy[] = "copy"; static char __pyx_k__plhs[] = "plhs"; static char __pyx_k__posn[] = "posn"; static char __pyx_k__prev[] = "prev"; static char __pyx_k__site[] = "site"; static char __pyx_k__uniq[] = "uniq"; static char __pyx_k__first[] = "first"; static char __pyx_k__index[] = "index"; static char __pyx_k__state[] = "state"; static char __pyx_k__total[] = "total"; static char __pyx_k__counts[] = "counts"; static char __pyx_k__length[] = "length"; static char __pyx_k__mprobs[] = "mprobs"; static char __pyx_k__result[] = "result"; static char __pyx_k____main__[] = "__main__"; static char __pyx_k____test__[] = "__test__"; static char __pyx_k__exponent[] = "exponent"; static char __pyx_k__TypeError[] = "TypeError"; static char __pyx_k__ValueError[] = "ValueError"; static char __pyx_k__index_data[] = "index_data"; static char __pyx_k____version__[] = "__version__"; static char __pyx_k__likelihoods[] = "likelihoods"; static char __pyx_k__mprobs_data[] = "mprobs_data"; static char __pyx_k__patch_probs[] = "patch_probs"; static char __pyx_k__target_data[] = "target_data"; static char __pyx_k__values_data[] = "values_data"; static char __pyx_k__logDotReduce[] = "logDotReduce"; static char __pyx_k__patch_probs1[] = "patch_probs1"; static char __pyx_k__patch_probs2[] = "patch_probs2"; static char __pyx_k__switch_probs[] = "switch_probs"; static char __pyx_k__version_info[] = "version_info"; static char __pyx_k__weights_data[] = "weights_data"; static char __pyx_k__child_indexes[] = "child_indexes"; static char __pyx_k____array_struct__[] = "__array_struct__"; static char __pyx_k__likelihoods_data[] = "likelihoods_data"; static char __pyx_k__input_likelihoods[] = "input_likelihoods"; static char __pyx_k__most_probable_state[] = "most_probable_state"; static char __pyx_k__sumInputLikelihoods[] = "sumInputLikelihoods"; static PyObject *__pyx_kp_s_1; 
static PyObject *__pyx_kp_s_10; static PyObject *__pyx_kp_s_14; static PyObject *__pyx_n_s_15; static PyObject *__pyx_n_s_18; static PyObject *__pyx_n_s_21; static PyObject *__pyx_kp_s_3; static PyObject *__pyx_kp_s_4; static PyObject *__pyx_kp_s_5; static PyObject *__pyx_kp_s_6; static PyObject *__pyx_kp_s_7; static PyObject *__pyx_kp_s_9; static PyObject *__pyx_n_s__BASE; static PyObject *__pyx_n_s__M; static PyObject *__pyx_n_s__S; static PyObject *__pyx_n_s__TypeError; static PyObject *__pyx_n_s__U; static PyObject *__pyx_n_s__ValueError; static PyObject *__pyx_n_s____array_struct__; static PyObject *__pyx_n_s____main__; static PyObject *__pyx_n_s____test__; static PyObject *__pyx_n_s____version__; static PyObject *__pyx_n_s__c; static PyObject *__pyx_n_s__child_indexes; static PyObject *__pyx_n_s__chr; static PyObject *__pyx_n_s__copy; static PyObject *__pyx_n_s__counts; static PyObject *__pyx_n_s__exponent; static PyObject *__pyx_n_s__first; static PyObject *__pyx_n_s__i; static PyObject *__pyx_n_s__index; static PyObject *__pyx_n_s__index_data; static PyObject *__pyx_n_s__input_likelihoods; static PyObject *__pyx_n_s__j; static PyObject *__pyx_n_s__k; static PyObject *__pyx_n_s__length; static PyObject *__pyx_n_s__likelihoods; static PyObject *__pyx_n_s__likelihoods_data; static PyObject *__pyx_n_s__logDotReduce; static PyObject *__pyx_n_s__m; static PyObject *__pyx_n_s__most_probable_state; static PyObject *__pyx_n_s__mprobs; static PyObject *__pyx_n_s__mprobs_data; static PyObject *__pyx_n_s__n; static PyObject *__pyx_n_s__patch_probs; static PyObject *__pyx_n_s__patch_probs1; static PyObject *__pyx_n_s__patch_probs2; static PyObject *__pyx_n_s__pl; static PyObject *__pyx_n_s__plhs; static PyObject *__pyx_n_s__posn; static PyObject *__pyx_n_s__prev; static PyObject *__pyx_n_s__result; static PyObject *__pyx_n_s__site; static PyObject *__pyx_n_s__sp; static PyObject *__pyx_n_s__state; static PyObject *__pyx_n_s__sumInputLikelihoods; static PyObject 
*__pyx_n_s__switch_probs; static PyObject *__pyx_n_s__target_data; static PyObject *__pyx_n_s__tmp; static PyObject *__pyx_n_s__total; static PyObject *__pyx_n_s__u; static PyObject *__pyx_n_s__uniq; static PyObject *__pyx_n_s__values_data; static PyObject *__pyx_n_s__version_info; static PyObject *__pyx_n_s__weights_data; static PyObject *__pyx_int_1; static PyObject *__pyx_int_2; static PyObject *__pyx_k_tuple_2; static PyObject *__pyx_k_tuple_8; static PyObject *__pyx_k_tuple_11; static PyObject *__pyx_k_tuple_12; static PyObject *__pyx_k_tuple_16; static PyObject *__pyx_k_tuple_19; static PyObject *__pyx_k_tuple_22; static PyObject *__pyx_k_codeobj_13; static PyObject *__pyx_k_codeobj_17; static PyObject *__pyx_k_codeobj_20; static PyObject *__pyx_k_codeobj_23; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":32 * ctypedef object ArrayType * * cdef double *uncheckedArrayDouble(ArrayType A): # <<<<<<<<<<<<<< * cdef PyArrayInterface *a * cobj = A.__array_struct__ */ static double *__pyx_f_6cogent_6evolve_16_likelihood_tree_uncheckedArrayDouble(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_A) { struct PyArrayInterface *__pyx_v_a; PyObject *__pyx_v_cobj = NULL; double *__pyx_r; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("uncheckedArrayDouble", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":34 * cdef double *uncheckedArrayDouble(ArrayType A): * cdef PyArrayInterface *a * cobj = A.__array_struct__ # <<<<<<<<<<<<<< * a = PyCObject_AsVoidPtr(cobj) * return a.data */ __pyx_t_1 = PyObject_GetAttr(((PyObject *)__pyx_v_A), __pyx_n_s____array_struct__); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 34; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_v_cobj = __pyx_t_1; __pyx_t_1 = 0; /* 
"/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":35 * cdef PyArrayInterface *a * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) # <<<<<<<<<<<<<< * return a.data * */ __pyx_v_a = ((struct PyArrayInterface *)PyCObject_AsVoidPtr(__pyx_v_cobj)); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":36 * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) * return a.data # <<<<<<<<<<<<<< * * cdef void *checkArray(ArrayType A, char typecode, int itemsize, */ __pyx_r = ((double *)__pyx_v_a->data); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_WriteUnraisable("cogent.evolve._likelihood_tree.uncheckedArrayDouble", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XDECREF(__pyx_v_cobj); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":38 * return a.data * * cdef void *checkArray(ArrayType A, char typecode, int itemsize, # <<<<<<<<<<<<<< * int nd, int **dims) except NULL: * cdef PyArrayInterface *a */ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_A, char __pyx_v_typecode, int __pyx_v_itemsize, int __pyx_v_nd, int **__pyx_v_dims) { struct PyArrayInterface *__pyx_v_a; PyObject *__pyx_v_cobj = NULL; char __pyx_v_typecode2; int __pyx_v_dimension; int __pyx_v_val; int *__pyx_v_var; void *__pyx_r; __Pyx_RefNannyDeclarations int __pyx_t_1; PyObject *__pyx_t_2 = NULL; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; PyObject *__pyx_t_5 = NULL; PyObject *__pyx_t_6 = NULL; int __pyx_t_7; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":43 * cdef int length, 
size * cdef char kind * if A is None: # <<<<<<<<<<<<<< * raise TypeError("Array required, got None") * cobj = A.__array_struct__ */ __pyx_t_1 = (__pyx_v_A == ((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)Py_None)); if (__pyx_t_1) { /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":44 * cdef char kind * if A is None: * raise TypeError("Array required, got None") # <<<<<<<<<<<<<< * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) */ __pyx_t_2 = PyObject_Call(__pyx_builtin_TypeError, ((PyObject *)__pyx_k_tuple_2), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 44; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_Raise(__pyx_t_2, 0, 0, 0); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 44; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L3; } __pyx_L3:; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":45 * if A is None: * raise TypeError("Array required, got None") * cobj = A.__array_struct__ # <<<<<<<<<<<<<< * a = PyCObject_AsVoidPtr(cobj) * if a.version != 2: */ __pyx_t_2 = PyObject_GetAttr(((PyObject *)__pyx_v_A), __pyx_n_s____array_struct__); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 45; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_v_cobj = __pyx_t_2; __pyx_t_2 = 0; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":46 * raise TypeError("Array required, got None") * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) # <<<<<<<<<<<<<< * if a.version != 2: * raise ValueError( */ __pyx_v_a = ((struct PyArrayInterface *)PyCObject_AsVoidPtr(__pyx_v_cobj)); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":47 * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) * if a.version != 2: # 
<<<<<<<<<<<<<< * raise ValueError( * "Unexpected array interface version %s" % str(a.version)) */ __pyx_t_1 = (__pyx_v_a->version != 2); if (__pyx_t_1) { /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":49 * if a.version != 2: * raise ValueError( * "Unexpected array interface version %s" % str(a.version)) # <<<<<<<<<<<<<< * cdef char typecode2 * typecode2 = a.typekind */ __pyx_t_2 = PyInt_FromLong(__pyx_v_a->version); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = PyObject_Call(((PyObject *)((PyObject*)(&PyString_Type))), ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __pyx_t_3 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_3), __pyx_t_2); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_3)); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, ((PyObject *)__pyx_t_3)); __Pyx_GIVEREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __pyx_t_3 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_2), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(((PyObject 
*)__pyx_t_2)); __pyx_t_2 = 0; __Pyx_Raise(__pyx_t_3, 0, 0, 0); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L4; } __pyx_L4:; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":51 * "Unexpected array interface version %s" % str(a.version)) * cdef char typecode2 * typecode2 = a.typekind # <<<<<<<<<<<<<< * if typecode2 != typecode: * raise TypeError("'%s' type array required, got '%s'" % */ __pyx_v_typecode2 = __pyx_v_a->typekind; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":52 * cdef char typecode2 * typecode2 = a.typekind * if typecode2 != typecode: # <<<<<<<<<<<<<< * raise TypeError("'%s' type array required, got '%s'" % * (chr(typecode), chr(typecode2))) */ __pyx_t_1 = (__pyx_v_typecode2 != __pyx_v_typecode); if (__pyx_t_1) { /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":54 * if typecode2 != typecode: * raise TypeError("'%s' type array required, got '%s'" % * (chr(typecode), chr(typecode2))) # <<<<<<<<<<<<<< * if a.itemsize != itemsize: * raise TypeError("'%s%s' type array required, got '%s%s'" % */ __pyx_t_3 = PyInt_FromLong(__pyx_v_typecode); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyObject_Call(__pyx_builtin_chr, ((PyObject *)__pyx_t_2), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 
0; __pyx_t_2 = PyInt_FromLong(__pyx_v_typecode2); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = PyObject_Call(__pyx_builtin_chr, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_3 = 0; __pyx_t_2 = 0; __pyx_t_2 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_4), ((PyObject *)__pyx_t_4)); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_2)); __Pyx_GIVEREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyObject_Call(__pyx_builtin_TypeError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_2, 0, 0, 0); 
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":55 * raise TypeError("'%s' type array required, got '%s'" % * (chr(typecode), chr(typecode2))) * if a.itemsize != itemsize: # <<<<<<<<<<<<<< * raise TypeError("'%s%s' type array required, got '%s%s'" % * (chr(typecode), itemsize, chr(typecode2), a.itemsize)) */ __pyx_t_1 = (__pyx_v_a->itemsize != __pyx_v_itemsize); if (__pyx_t_1) { /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":57 * if a.itemsize != itemsize: * raise TypeError("'%s%s' type array required, got '%s%s'" % * (chr(typecode), itemsize, chr(typecode2), a.itemsize)) # <<<<<<<<<<<<<< * if a.nd != nd: * raise ValueError("%s dimensional array required, got %s" % */ __pyx_t_2 = PyInt_FromLong(__pyx_v_typecode); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = PyObject_Call(__pyx_builtin_chr, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __pyx_t_4 = PyInt_FromLong(__pyx_v_itemsize); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = PyInt_FromLong(__pyx_v_typecode2); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyTuple_New(1); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyObject_Call(__pyx_builtin_chr, ((PyObject *)__pyx_t_5), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyInt_FromLong(__pyx_v_a->itemsize); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_6 = PyTuple_New(4); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); PyTuple_SET_ITEM(__pyx_t_6, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_6, 1, __pyx_t_4); __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_6, 2, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_6, 3, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_5); __pyx_t_2 = 0; __pyx_t_4 = 0; __pyx_t_3 = 0; __pyx_t_5 = 0; __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_5), ((PyObject *)__pyx_t_6)); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __Pyx_DECREF(((PyObject *)__pyx_t_6)); __pyx_t_6 = 0; __pyx_t_6 = PyTuple_New(1); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); PyTuple_SET_ITEM(__pyx_t_6, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_TypeError, ((PyObject *)__pyx_t_6), NULL); if 
(unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_6)); __pyx_t_6 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":58 * raise TypeError("'%s%s' type array required, got '%s%s'" % * (chr(typecode), itemsize, chr(typecode2), a.itemsize)) * if a.nd != nd: # <<<<<<<<<<<<<< * raise ValueError("%s dimensional array required, got %s" % * (nd, a.nd)) */ __pyx_t_1 = (__pyx_v_a->nd != __pyx_v_nd); if (__pyx_t_1) { /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":60 * if a.nd != nd: * raise ValueError("%s dimensional array required, got %s" % * (nd, a.nd)) # <<<<<<<<<<<<<< * if not a.flags & CONTIGUOUS: * raise ValueError ('Noncontiguous array') */ __pyx_t_5 = PyInt_FromLong(__pyx_v_nd); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 60; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_6 = PyInt_FromLong(__pyx_v_a->nd); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 60; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); __pyx_t_3 = PyTuple_New(2); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 60; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_6); __Pyx_GIVEREF(__pyx_t_6); __pyx_t_5 = 0; __pyx_t_6 = 0; __pyx_t_6 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_6), ((PyObject *)__pyx_t_3)); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_6)); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_t_6)); __Pyx_GIVEREF(((PyObject *)__pyx_t_6)); __pyx_t_6 = 0; __pyx_t_6 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __Pyx_Raise(__pyx_t_6, 0, 0, 0); __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L7; } __pyx_L7:; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":61 * raise ValueError("%s dimensional array required, got %s" % * (nd, a.nd)) * if not a.flags & CONTIGUOUS: # <<<<<<<<<<<<<< * raise ValueError ('Noncontiguous array') * */ __pyx_t_1 = (!(__pyx_v_a->flags & CONTIGUOUS)); if (__pyx_t_1) { /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":62 * (nd, a.nd)) * if not a.flags & CONTIGUOUS: * raise ValueError ('Noncontiguous array') # <<<<<<<<<<<<<< * * cdef int dimension, val */ __pyx_t_6 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_8), NULL); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); __Pyx_Raise(__pyx_t_6, 0, 0, 0); __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L8; } __pyx_L8:; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":66 * cdef 
int dimension, val * cdef int *var * for dimension from 0 <= dimension < nd: # <<<<<<<<<<<<<< * val = a.shape[dimension] * var = dims[dimension] */ __pyx_t_7 = __pyx_v_nd; for (__pyx_v_dimension = 0; __pyx_v_dimension < __pyx_t_7; __pyx_v_dimension++) { /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":67 * cdef int *var * for dimension from 0 <= dimension < nd: * val = a.shape[dimension] # <<<<<<<<<<<<<< * var = dims[dimension] * if var[0] == 0: */ __pyx_v_val = (__pyx_v_a->shape[__pyx_v_dimension]); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":68 * for dimension from 0 <= dimension < nd: * val = a.shape[dimension] * var = dims[dimension] # <<<<<<<<<<<<<< * if var[0] == 0: * # Length unspecified, take it from the provided array */ __pyx_v_var = (__pyx_v_dims[__pyx_v_dimension]); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":69 * val = a.shape[dimension] * var = dims[dimension] * if var[0] == 0: # <<<<<<<<<<<<<< * # Length unspecified, take it from the provided array * var[0] = val */ __pyx_t_1 = ((__pyx_v_var[0]) == 0); if (__pyx_t_1) { /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":71 * if var[0] == 0: * # Length unspecified, take it from the provided array * var[0] = val # <<<<<<<<<<<<<< * elif var[0] != val: * # Length already specified, but not the same */ (__pyx_v_var[0]) = __pyx_v_val; goto __pyx_L11; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":72 * # Length unspecified, take it from the provided array * var[0] = val * elif var[0] != val: # <<<<<<<<<<<<<< * # Length already specified, but not the same * raise ValueError("Dimension %s is %s, expected %s" % */ __pyx_t_1 = ((__pyx_v_var[0]) != __pyx_v_val); if (__pyx_t_1) { /* 
"/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":75 * # Length already specified, but not the same * raise ValueError("Dimension %s is %s, expected %s" % * (dimension, val, var[0])) # <<<<<<<<<<<<<< * else: * # Length matches what was expected */ __pyx_t_6 = PyInt_FromLong(__pyx_v_dimension); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 75; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); __pyx_t_3 = PyInt_FromLong(__pyx_v_val); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 75; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyInt_FromLong((__pyx_v_var[0])); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 75; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_4 = PyTuple_New(3); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 75; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_6); __Pyx_GIVEREF(__pyx_t_6); PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_4, 2, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_5); __pyx_t_6 = 0; __pyx_t_3 = 0; __pyx_t_5 = 0; __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_9), ((PyObject *)__pyx_t_4)); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 74; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 74; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_5)) 
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 74; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 74; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L11; } /*else*/ { } __pyx_L11:; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":79 * # Length matches what was expected * pass * return a.data # <<<<<<<<<<<<<< * * */ __pyx_r = __pyx_v_a->data; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_XDECREF(__pyx_t_6); __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArray", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF(__pyx_v_cobj); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":82 * * * cdef void *checkArray1D(ArrayType a, char typecode, int size, # <<<<<<<<<<<<<< * int *x) except NULL: * cdef int *dims[1] */ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray1D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, char __pyx_v_typecode, int __pyx_v_size, int *__pyx_v_x) { int *__pyx_v_dims[1]; void *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray1D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":85 * int *x) except NULL: * cdef int *dims[1] * dims[0] = x # <<<<<<<<<<<<<< * return checkArray(a, typecode, size, 1, dims) * */ (__pyx_v_dims[0]) = __pyx_v_x; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":86 * 
cdef int *dims[1] * dims[0] = x * return checkArray(a, typecode, size, 1, dims) # <<<<<<<<<<<<<< * * cdef void *checkArray2D(ArrayType a, char typecode, int size, */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray(__pyx_v_a, __pyx_v_typecode, __pyx_v_size, 1, __pyx_v_dims); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 86; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_t_1; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArray1D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":88 * return checkArray(a, typecode, size, 1, dims) * * cdef void *checkArray2D(ArrayType a, char typecode, int size, # <<<<<<<<<<<<<< * int *x, int *y) except NULL: * cdef int *dims[2] */ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray2D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, char __pyx_v_typecode, int __pyx_v_size, int *__pyx_v_x, int *__pyx_v_y) { int *__pyx_v_dims[2]; void *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray2D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":91 * int *x, int *y) except NULL: * cdef int *dims[2] * dims[0] = x # <<<<<<<<<<<<<< * dims[1] = y * return checkArray(a, typecode, size, 2, dims) */ (__pyx_v_dims[0]) = __pyx_v_x; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":92 * cdef int *dims[2] * dims[0] = x * dims[1] = y # <<<<<<<<<<<<<< * return checkArray(a, typecode, size, 2, dims) * */ (__pyx_v_dims[1]) = __pyx_v_y; /* 
"/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":93 * dims[0] = x * dims[1] = y * return checkArray(a, typecode, size, 2, dims) # <<<<<<<<<<<<<< * * cdef void *checkArray3D(ArrayType a, char typecode, int size, */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray(__pyx_v_a, __pyx_v_typecode, __pyx_v_size, 2, __pyx_v_dims); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 93; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_t_1; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArray2D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":95 * return checkArray(a, typecode, size, 2, dims) * * cdef void *checkArray3D(ArrayType a, char typecode, int size, # <<<<<<<<<<<<<< * int *x, int *y, int *z) except NULL: * cdef int *dims[3] */ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray3D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, char __pyx_v_typecode, int __pyx_v_size, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { int *__pyx_v_dims[3]; void *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray3D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":98 * int *x, int *y, int *z) except NULL: * cdef int *dims[3] * dims[0] = x # <<<<<<<<<<<<<< * dims[1] = y * dims[2] = z */ (__pyx_v_dims[0]) = __pyx_v_x; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":99 * cdef int *dims[3] * dims[0] = x * dims[1] = y # <<<<<<<<<<<<<< * dims[2] = z * return checkArray(a, typecode, size, 3, dims) */ 
(__pyx_v_dims[1]) = __pyx_v_y; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":100 * dims[0] = x * dims[1] = y * dims[2] = z # <<<<<<<<<<<<<< * return checkArray(a, typecode, size, 3, dims) * */ (__pyx_v_dims[2]) = __pyx_v_z; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":101 * dims[1] = y * dims[2] = z * return checkArray(a, typecode, size, 3, dims) # <<<<<<<<<<<<<< * * cdef void *checkArray4D(ArrayType a, char typecode, int size, */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray(__pyx_v_a, __pyx_v_typecode, __pyx_v_size, 3, __pyx_v_dims); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 101; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_t_1; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArray3D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":103 * return checkArray(a, typecode, size, 3, dims) * * cdef void *checkArray4D(ArrayType a, char typecode, int size, # <<<<<<<<<<<<<< * int *w, int *x, int *y, int *z) except NULL: * cdef int *dims[4] */ static void *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray4D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, char __pyx_v_typecode, int __pyx_v_size, int *__pyx_v_w, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { int *__pyx_v_dims[4]; void *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray4D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":106 * int *w, int *x, int *y, int *z) except NULL: * cdef int *dims[4] * dims[0] = w # 
<<<<<<<<<<<<<< * dims[1] = x * dims[2] = y */ (__pyx_v_dims[0]) = __pyx_v_w; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":107 * cdef int *dims[4] * dims[0] = w * dims[1] = x # <<<<<<<<<<<<<< * dims[2] = y * dims[3] = z */ (__pyx_v_dims[1]) = __pyx_v_x; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":108 * dims[0] = w * dims[1] = x * dims[2] = y # <<<<<<<<<<<<<< * dims[3] = z * return checkArray(a, typecode, size, 4, dims) */ (__pyx_v_dims[2]) = __pyx_v_y; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":109 * dims[1] = x * dims[2] = y * dims[3] = z # <<<<<<<<<<<<<< * return checkArray(a, typecode, size, 4, dims) * */ (__pyx_v_dims[3]) = __pyx_v_z; /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":110 * dims[2] = y * dims[3] = z * return checkArray(a, typecode, size, 4, dims) # <<<<<<<<<<<<<< * * */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray(__pyx_v_a, __pyx_v_typecode, __pyx_v_size, 4, __pyx_v_dims); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 110; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_t_1; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArray4D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":113 * * * cdef double * checkArrayDouble1D(ArrayType a, int *x) except NULL: # <<<<<<<<<<<<<< * return checkArray1D(a, c'f', sizeof(double), x) * */ static double *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble1D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, int *__pyx_v_x) { double *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; 
int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayDouble1D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":114 * * cdef double * checkArrayDouble1D(ArrayType a, int *x) except NULL: * return checkArray1D(a, c'f', sizeof(double), x) # <<<<<<<<<<<<<< * * cdef double * checkArrayDouble2D(ArrayType a, int *x, int *y) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray1D(__pyx_v_a, 'f', (sizeof(double)), __pyx_v_x); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 114; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((double *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArrayDouble1D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":116 * return checkArray1D(a, c'f', sizeof(double), x) * * cdef double * checkArrayDouble2D(ArrayType a, int *x, int *y) except NULL: # <<<<<<<<<<<<<< * return checkArray2D(a, c'f', sizeof(double), x, y) * */ static double *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble2D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, int *__pyx_v_x, int *__pyx_v_y) { double *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayDouble2D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":117 * * cdef double * checkArrayDouble2D(ArrayType a, int *x, int *y) except NULL: * return checkArray2D(a, c'f', sizeof(double), x, y) # <<<<<<<<<<<<<< * * cdef double * checkArrayDouble3D(ArrayType a, int *x, int *y, int *z) except NULL: */ 
__pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray2D(__pyx_v_a, 'f', (sizeof(double)), __pyx_v_x, __pyx_v_y); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 117; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((double *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArrayDouble2D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":119 * return checkArray2D(a, c'f', sizeof(double), x, y) * * cdef double * checkArrayDouble3D(ArrayType a, int *x, int *y, int *z) except NULL: # <<<<<<<<<<<<<< * return checkArray3D(a, c'f', sizeof(double), x, y, z) * */ static double *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble3D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { double *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayDouble3D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":120 * * cdef double * checkArrayDouble3D(ArrayType a, int *x, int *y, int *z) except NULL: * return checkArray3D(a, c'f', sizeof(double), x, y, z) # <<<<<<<<<<<<<< * * cdef double * checkArrayDouble4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray3D(__pyx_v_a, 'f', (sizeof(double)), __pyx_v_x, __pyx_v_y, __pyx_v_z); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 120; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((double *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; 
__Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArrayDouble3D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":122 * return checkArray3D(a, c'f', sizeof(double), x, y, z) * * cdef double * checkArrayDouble4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: # <<<<<<<<<<<<<< * return checkArray4D(a, c'f', sizeof(double), w, x, y, z) * */ static double *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble4D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, int *__pyx_v_w, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { double *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayDouble4D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":123 * * cdef double * checkArrayDouble4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: * return checkArray4D(a, c'f', sizeof(double), w, x, y, z) # <<<<<<<<<<<<<< * * */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray4D(__pyx_v_a, 'f', (sizeof(double)), __pyx_v_w, __pyx_v_x, __pyx_v_y, __pyx_v_z); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 123; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((double *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArrayDouble4D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":126 * * * cdef long * checkArrayLong1D(ArrayType a, int *x) except NULL: # <<<<<<<<<<<<<< * return checkArray1D(a, c'i', 
sizeof(long), x) * */ static long *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayLong1D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, int *__pyx_v_x) { long *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayLong1D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":127 * * cdef long * checkArrayLong1D(ArrayType a, int *x) except NULL: * return checkArray1D(a, c'i', sizeof(long), x) # <<<<<<<<<<<<<< * * cdef long * checkArrayLong2D(ArrayType a, int *x, int *y) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray1D(__pyx_v_a, 'i', (sizeof(long)), __pyx_v_x); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 127; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((long *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArrayLong1D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":129 * return checkArray1D(a, c'i', sizeof(long), x) * * cdef long * checkArrayLong2D(ArrayType a, int *x, int *y) except NULL: # <<<<<<<<<<<<<< * return checkArray2D(a, c'i', sizeof(long), x, y) * */ static long *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayLong2D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, int *__pyx_v_x, int *__pyx_v_y) { long *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayLong2D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":130 * * cdef long * 
checkArrayLong2D(ArrayType a, int *x, int *y) except NULL: * return checkArray2D(a, c'i', sizeof(long), x, y) # <<<<<<<<<<<<<< * * cdef long * checkArrayLong3D(ArrayType a, int *x, int *y, int *z) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray2D(__pyx_v_a, 'i', (sizeof(long)), __pyx_v_x, __pyx_v_y); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 130; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((long *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArrayLong2D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":132 * return checkArray2D(a, c'i', sizeof(long), x, y) * * cdef long * checkArrayLong3D(ArrayType a, int *x, int *y, int *z) except NULL: # <<<<<<<<<<<<<< * return checkArray3D(a, c'i', sizeof(long), x, y, z) * */ static long *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayLong3D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { long *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayLong3D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":133 * * cdef long * checkArrayLong3D(ArrayType a, int *x, int *y, int *z) except NULL: * return checkArray3D(a, c'i', sizeof(long), x, y, z) # <<<<<<<<<<<<<< * * cdef long * checkArrayLong4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray3D(__pyx_v_a, 'i', (sizeof(long)), __pyx_v_x, __pyx_v_y, __pyx_v_z); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 
133; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((long *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArrayLong3D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":135 * return checkArray3D(a, c'i', sizeof(long), x, y, z) * * cdef long * checkArrayLong4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: # <<<<<<<<<<<<<< * return checkArray4D(a, c'i', sizeof(long), w, x, y, z) */ static long *__pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayLong4D(__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType __pyx_v_a, int *__pyx_v_w, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { long *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayLong4D", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":136 * * cdef long * checkArrayLong4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: * return checkArray4D(a, c'i', sizeof(long), w, x, y, z) # <<<<<<<<<<<<<< */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArray4D(__pyx_v_a, 'i', (sizeof(long)), __pyx_v_w, __pyx_v_x, __pyx_v_y, __pyx_v_z); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 136; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((long *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.checkArrayLong4D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_6evolve_16_likelihood_tree_1sumInputLikelihoods(PyObject 
*__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_6evolve_16_likelihood_tree_1sumInputLikelihoods = {__Pyx_NAMESTR("sumInputLikelihoods"), (PyCFunction)__pyx_pw_6cogent_6evolve_16_likelihood_tree_1sumInputLikelihoods, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_6evolve_16_likelihood_tree_1sumInputLikelihoods(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyObject *__pyx_v_child_indexes = 0; PyObject *__pyx_v_result = 0; PyObject *__pyx_v_likelihoods = 0; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__child_indexes,&__pyx_n_s__result,&__pyx_n_s__likelihoods,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("sumInputLikelihoods (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__child_indexes); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__result); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("sumInputLikelihoods", 1, 3, 3, 1); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__likelihoods); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("sumInputLikelihoods", 1, 3, 3, 2); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if 
(unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "sumInputLikelihoods") < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_child_indexes = values[0]; __pyx_v_result = values[1]; __pyx_v_likelihoods = values[2]; } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("sumInputLikelihoods", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.sumInputLikelihoods", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_6cogent_6evolve_16_likelihood_tree_sumInputLikelihoods(__pyx_self, __pyx_v_child_indexes, __pyx_v_result, __pyx_v_likelihoods); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/evolve/_likelihood_tree.pyx":8 * double log (double x) * * def sumInputLikelihoods(child_indexes, result, likelihoods): # <<<<<<<<<<<<<< * # M is dim of alphabet, S is non-redundandt parent seq length, * # U is length */ static PyObject *__pyx_pf_6cogent_6evolve_16_likelihood_tree_sumInputLikelihoods(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_child_indexes, PyObject *__pyx_v_result, PyObject *__pyx_v_likelihoods) { int __pyx_v_M; int __pyx_v_S; int __pyx_v_U; int __pyx_v_m; int __pyx_v_i; int __pyx_v_u; int __pyx_v_c; double *__pyx_v_values_data; long *__pyx_v_index_data; double *__pyx_v_target_data; CYTHON_UNUSED long __pyx_v_first; PyObject *__pyx_v_index = NULL; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations double *__pyx_t_1; PyObject *__pyx_t_2 = NULL; 
Py_ssize_t __pyx_t_3; PyObject *(*__pyx_t_4)(PyObject *); PyObject *__pyx_t_5 = NULL; long *__pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; int __pyx_t_10; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("sumInputLikelihoods", 0); /* "cogent/evolve/_likelihood_tree.pyx":18 * cdef double *target_data * * M = S = 0 # <<<<<<<<<<<<<< * target_data = checkArrayDouble2D(result, &S, &M) * first = 1 */ __pyx_v_M = 0; __pyx_v_S = 0; /* "cogent/evolve/_likelihood_tree.pyx":19 * * M = S = 0 * target_data = checkArrayDouble2D(result, &S, &M) # <<<<<<<<<<<<<< * first = 1 * c = 0 */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble2D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_result), (&__pyx_v_S), (&__pyx_v_M)); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 19; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_target_data = __pyx_t_1; /* "cogent/evolve/_likelihood_tree.pyx":20 * M = S = 0 * target_data = checkArrayDouble2D(result, &S, &M) * first = 1 # <<<<<<<<<<<<<< * c = 0 * for index in child_indexes: */ __pyx_v_first = 1; /* "cogent/evolve/_likelihood_tree.pyx":21 * target_data = checkArrayDouble2D(result, &S, &M) * first = 1 * c = 0 # <<<<<<<<<<<<<< * for index in child_indexes: * U = 0 */ __pyx_v_c = 0; /* "cogent/evolve/_likelihood_tree.pyx":22 * first = 1 * c = 0 * for index in child_indexes: # <<<<<<<<<<<<<< * U = 0 * index_data = checkArrayLong1D(index, &S) */ if (PyList_CheckExact(__pyx_v_child_indexes) || PyTuple_CheckExact(__pyx_v_child_indexes)) { __pyx_t_2 = __pyx_v_child_indexes; __Pyx_INCREF(__pyx_t_2); __pyx_t_3 = 0; __pyx_t_4 = NULL; } else { __pyx_t_3 = -1; __pyx_t_2 = PyObject_GetIter(__pyx_v_child_indexes); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_4 = Py_TYPE(__pyx_t_2)->tp_iternext; } for (;;) { if 
(!__pyx_t_4 && PyList_CheckExact(__pyx_t_2)) { if (__pyx_t_3 >= PyList_GET_SIZE(__pyx_t_2)) break; __pyx_t_5 = PyList_GET_ITEM(__pyx_t_2, __pyx_t_3); __Pyx_INCREF(__pyx_t_5); __pyx_t_3++; } else if (!__pyx_t_4 && PyTuple_CheckExact(__pyx_t_2)) { if (__pyx_t_3 >= PyTuple_GET_SIZE(__pyx_t_2)) break; __pyx_t_5 = PyTuple_GET_ITEM(__pyx_t_2, __pyx_t_3); __Pyx_INCREF(__pyx_t_5); __pyx_t_3++; } else { __pyx_t_5 = __pyx_t_4(__pyx_t_2); if (unlikely(!__pyx_t_5)) { if (PyErr_Occurred()) { if (likely(PyErr_ExceptionMatches(PyExc_StopIteration))) PyErr_Clear(); else {__pyx_filename = __pyx_f[1]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } break; } __Pyx_GOTREF(__pyx_t_5); } __Pyx_XDECREF(__pyx_v_index); __pyx_v_index = __pyx_t_5; __pyx_t_5 = 0; /* "cogent/evolve/_likelihood_tree.pyx":23 * c = 0 * for index in child_indexes: * U = 0 # <<<<<<<<<<<<<< * index_data = checkArrayLong1D(index, &S) * values_data = checkArrayDouble2D(likelihoods[c], &U, &M) */ __pyx_v_U = 0; /* "cogent/evolve/_likelihood_tree.pyx":24 * for index in child_indexes: * U = 0 * index_data = checkArrayLong1D(index, &S) # <<<<<<<<<<<<<< * values_data = checkArrayDouble2D(likelihoods[c], &U, &M) * #if index_data[S-1] >= U: */ __pyx_t_6 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayLong1D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_index), (&__pyx_v_S)); if (unlikely(__pyx_t_6 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 24; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_index_data = __pyx_t_6; /* "cogent/evolve/_likelihood_tree.pyx":25 * U = 0 * index_data = checkArrayLong1D(index, &S) * values_data = checkArrayDouble2D(likelihoods[c], &U, &M) # <<<<<<<<<<<<<< * #if index_data[S-1] >= U: * # raise RangeError */ __pyx_t_5 = __Pyx_GetItemInt(__pyx_v_likelihoods, __pyx_v_c, sizeof(int), PyInt_FromLong); if (!__pyx_t_5) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 25; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); 
__pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble2D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_t_5), (&__pyx_v_U), (&__pyx_v_M)); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 25; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_v_values_data = __pyx_t_1; /* "cogent/evolve/_likelihood_tree.pyx":28 * #if index_data[S-1] >= U: * # raise RangeError * if c == 0: # <<<<<<<<<<<<<< * for i from 0 <= i < S: * u = index_data[i] */ __pyx_t_7 = (__pyx_v_c == 0); if (__pyx_t_7) { /* "cogent/evolve/_likelihood_tree.pyx":29 * # raise RangeError * if c == 0: * for i from 0 <= i < S: # <<<<<<<<<<<<<< * u = index_data[i] * for m from 0 <= m < M: */ __pyx_t_8 = __pyx_v_S; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_8; __pyx_v_i++) { /* "cogent/evolve/_likelihood_tree.pyx":30 * if c == 0: * for i from 0 <= i < S: * u = index_data[i] # <<<<<<<<<<<<<< * for m from 0 <= m < M: * target_data[M*i+m] = values_data[M*u+m] */ __pyx_v_u = (__pyx_v_index_data[__pyx_v_i]); /* "cogent/evolve/_likelihood_tree.pyx":31 * for i from 0 <= i < S: * u = index_data[i] * for m from 0 <= m < M: # <<<<<<<<<<<<<< * target_data[M*i+m] = values_data[M*u+m] * else: */ __pyx_t_9 = __pyx_v_M; for (__pyx_v_m = 0; __pyx_v_m < __pyx_t_9; __pyx_v_m++) { /* "cogent/evolve/_likelihood_tree.pyx":32 * u = index_data[i] * for m from 0 <= m < M: * target_data[M*i+m] = values_data[M*u+m] # <<<<<<<<<<<<<< * else: * for i from 0 <= i < S: # col of parent data */ (__pyx_v_target_data[((__pyx_v_M * __pyx_v_i) + __pyx_v_m)]) = (__pyx_v_values_data[((__pyx_v_M * __pyx_v_u) + __pyx_v_m)]); } } goto __pyx_L5; } /*else*/ { /* "cogent/evolve/_likelihood_tree.pyx":34 * target_data[M*i+m] = values_data[M*u+m] * else: * for i from 0 <= i < S: # col of parent data # <<<<<<<<<<<<<< * u = index_data[i] # col of childs data * for m from 0 <= m < M: */ __pyx_t_8 = __pyx_v_S; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_8; __pyx_v_i++) 
{ /* "cogent/evolve/_likelihood_tree.pyx":35 * else: * for i from 0 <= i < S: # col of parent data * u = index_data[i] # col of childs data # <<<<<<<<<<<<<< * for m from 0 <= m < M: * target_data[M*i+m] *= values_data[M*u+m] */ __pyx_v_u = (__pyx_v_index_data[__pyx_v_i]); /* "cogent/evolve/_likelihood_tree.pyx":36 * for i from 0 <= i < S: # col of parent data * u = index_data[i] # col of childs data * for m from 0 <= m < M: # <<<<<<<<<<<<<< * target_data[M*i+m] *= values_data[M*u+m] * c += 1 */ __pyx_t_9 = __pyx_v_M; for (__pyx_v_m = 0; __pyx_v_m < __pyx_t_9; __pyx_v_m++) { /* "cogent/evolve/_likelihood_tree.pyx":37 * u = index_data[i] # col of childs data * for m from 0 <= m < M: * target_data[M*i+m] *= values_data[M*u+m] # <<<<<<<<<<<<<< * c += 1 * return result */ __pyx_t_10 = ((__pyx_v_M * __pyx_v_i) + __pyx_v_m); (__pyx_v_target_data[__pyx_t_10]) = ((__pyx_v_target_data[__pyx_t_10]) * (__pyx_v_values_data[((__pyx_v_M * __pyx_v_u) + __pyx_v_m)])); } } } __pyx_L5:; /* "cogent/evolve/_likelihood_tree.pyx":38 * for m from 0 <= m < M: * target_data[M*i+m] *= values_data[M*u+m] * c += 1 # <<<<<<<<<<<<<< * return result * */ __pyx_v_c = (__pyx_v_c + 1); } __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* "cogent/evolve/_likelihood_tree.pyx":39 * target_data[M*i+m] *= values_data[M*u+m] * c += 1 * return result # <<<<<<<<<<<<<< * * def getTotalLogLikelihood(counts, input_likelihoods, mprobs): */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(__pyx_v_result); __pyx_r = __pyx_v_result; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_5); __Pyx_AddTraceback("cogent.evolve._likelihood_tree.sumInputLikelihoods", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF(__pyx_v_index); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_6evolve_16_likelihood_tree_3getTotalLogLikelihood(PyObject 
*__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_6evolve_16_likelihood_tree_3getTotalLogLikelihood = {__Pyx_NAMESTR("getTotalLogLikelihood"), (PyCFunction)__pyx_pw_6cogent_6evolve_16_likelihood_tree_3getTotalLogLikelihood, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_6evolve_16_likelihood_tree_3getTotalLogLikelihood(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyObject *__pyx_v_counts = 0; PyObject *__pyx_v_input_likelihoods = 0; PyObject *__pyx_v_mprobs = 0; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__counts,&__pyx_n_s__input_likelihoods,&__pyx_n_s__mprobs,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("getTotalLogLikelihood (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__counts); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__input_likelihoods); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("getTotalLogLikelihood", 1, 3, 3, 1); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__mprobs); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("getTotalLogLikelihood", 1, 3, 3, 2); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if 
(unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "getTotalLogLikelihood") < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_counts = values[0]; __pyx_v_input_likelihoods = values[1]; __pyx_v_mprobs = values[2]; } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("getTotalLogLikelihood", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.getTotalLogLikelihood", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_6cogent_6evolve_16_likelihood_tree_2getTotalLogLikelihood(__pyx_self, __pyx_v_counts, __pyx_v_input_likelihoods, __pyx_v_mprobs); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/evolve/_likelihood_tree.pyx":41 * return result * * def getTotalLogLikelihood(counts, input_likelihoods, mprobs): # <<<<<<<<<<<<<< * cdef int S, M, i, m * cdef double posn, total */ static PyObject *__pyx_pf_6cogent_6evolve_16_likelihood_tree_2getTotalLogLikelihood(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_counts, PyObject *__pyx_v_input_likelihoods, PyObject *__pyx_v_mprobs) { int __pyx_v_S; int __pyx_v_M; int __pyx_v_i; int __pyx_v_m; double __pyx_v_posn; double __pyx_v_total; double *__pyx_v_likelihoods_data; double *__pyx_v_mprobs_data; double *__pyx_v_weights_data; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations double *__pyx_t_1; double *__pyx_t_2; int __pyx_t_3; int __pyx_t_4; PyObject *__pyx_t_5 = NULL; int __pyx_lineno = 0; const char 
*__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("getTotalLogLikelihood", 0); /* "cogent/evolve/_likelihood_tree.pyx":46 * cdef double *likelihoods_data, *mprobs_data, *weights_data * * S = M = 0 # <<<<<<<<<<<<<< * mprobs_data = checkArrayDouble1D(mprobs, &M) * weights_data = checkArrayDouble1D(counts, &S) */ __pyx_v_S = 0; __pyx_v_M = 0; /* "cogent/evolve/_likelihood_tree.pyx":47 * * S = M = 0 * mprobs_data = checkArrayDouble1D(mprobs, &M) # <<<<<<<<<<<<<< * weights_data = checkArrayDouble1D(counts, &S) * likelihoods_data = checkArrayDouble2D(input_likelihoods, &S, &M) */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble1D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_mprobs), (&__pyx_v_M)); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 47; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_mprobs_data = __pyx_t_1; /* "cogent/evolve/_likelihood_tree.pyx":48 * S = M = 0 * mprobs_data = checkArrayDouble1D(mprobs, &M) * weights_data = checkArrayDouble1D(counts, &S) # <<<<<<<<<<<<<< * likelihoods_data = checkArrayDouble2D(input_likelihoods, &S, &M) * total = 0.0 */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble1D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_counts), (&__pyx_v_S)); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_weights_data = __pyx_t_1; /* "cogent/evolve/_likelihood_tree.pyx":49 * mprobs_data = checkArrayDouble1D(mprobs, &M) * weights_data = checkArrayDouble1D(counts, &S) * likelihoods_data = checkArrayDouble2D(input_likelihoods, &S, &M) # <<<<<<<<<<<<<< * total = 0.0 * for i from 0 <= i < S: */ __pyx_t_2 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble2D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_input_likelihoods), (&__pyx_v_S), (&__pyx_v_M)); if (unlikely(__pyx_t_2 == NULL)) 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_likelihoods_data = __pyx_t_2; /* "cogent/evolve/_likelihood_tree.pyx":50 * weights_data = checkArrayDouble1D(counts, &S) * likelihoods_data = checkArrayDouble2D(input_likelihoods, &S, &M) * total = 0.0 # <<<<<<<<<<<<<< * for i from 0 <= i < S: * posn = 0.0 */ __pyx_v_total = 0.0; /* "cogent/evolve/_likelihood_tree.pyx":51 * likelihoods_data = checkArrayDouble2D(input_likelihoods, &S, &M) * total = 0.0 * for i from 0 <= i < S: # <<<<<<<<<<<<<< * posn = 0.0 * for m from 0 <= m < M: */ __pyx_t_3 = __pyx_v_S; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_3; __pyx_v_i++) { /* "cogent/evolve/_likelihood_tree.pyx":52 * total = 0.0 * for i from 0 <= i < S: * posn = 0.0 # <<<<<<<<<<<<<< * for m from 0 <= m < M: * posn += likelihoods_data[i*M+m] * mprobs_data[m] */ __pyx_v_posn = 0.0; /* "cogent/evolve/_likelihood_tree.pyx":53 * for i from 0 <= i < S: * posn = 0.0 * for m from 0 <= m < M: # <<<<<<<<<<<<<< * posn += likelihoods_data[i*M+m] * mprobs_data[m] * total += log(posn)*weights_data[i] */ __pyx_t_4 = __pyx_v_M; for (__pyx_v_m = 0; __pyx_v_m < __pyx_t_4; __pyx_v_m++) { /* "cogent/evolve/_likelihood_tree.pyx":54 * posn = 0.0 * for m from 0 <= m < M: * posn += likelihoods_data[i*M+m] * mprobs_data[m] # <<<<<<<<<<<<<< * total += log(posn)*weights_data[i] * return total */ __pyx_v_posn = (__pyx_v_posn + ((__pyx_v_likelihoods_data[((__pyx_v_i * __pyx_v_M) + __pyx_v_m)]) * (__pyx_v_mprobs_data[__pyx_v_m]))); } /* "cogent/evolve/_likelihood_tree.pyx":55 * for m from 0 <= m < M: * posn += likelihoods_data[i*M+m] * mprobs_data[m] * total += log(posn)*weights_data[i] # <<<<<<<<<<<<<< * return total * */ __pyx_v_total = (__pyx_v_total + (log(__pyx_v_posn) * (__pyx_v_weights_data[__pyx_v_i]))); } /* "cogent/evolve/_likelihood_tree.pyx":56 * posn += likelihoods_data[i*M+m] * mprobs_data[m] * total += log(posn)*weights_data[i] * return total # <<<<<<<<<<<<<< * * def 
getLogSumAcrossSites(counts, input_likelihoods): */ __Pyx_XDECREF(__pyx_r); __pyx_t_5 = PyFloat_FromDouble(__pyx_v_total); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_r = __pyx_t_5; __pyx_t_5 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_5); __Pyx_AddTraceback("cogent.evolve._likelihood_tree.getTotalLogLikelihood", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_6evolve_16_likelihood_tree_5getLogSumAcrossSites(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_6evolve_16_likelihood_tree_5getLogSumAcrossSites = {__Pyx_NAMESTR("getLogSumAcrossSites"), (PyCFunction)__pyx_pw_6cogent_6evolve_16_likelihood_tree_5getLogSumAcrossSites, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_6evolve_16_likelihood_tree_5getLogSumAcrossSites(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyObject *__pyx_v_counts = 0; PyObject *__pyx_v_input_likelihoods = 0; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__counts,&__pyx_n_s__input_likelihoods,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("getLogSumAcrossSites (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[2] = {0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__counts); if (likely(values[0])) 
kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__input_likelihoods); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("getLogSumAcrossSites", 1, 2, 2, 1); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 58; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "getLogSumAcrossSites") < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 58; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 2) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); } __pyx_v_counts = values[0]; __pyx_v_input_likelihoods = values[1]; } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("getLogSumAcrossSites", 1, 2, 2, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 58; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.getLogSumAcrossSites", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_6cogent_6evolve_16_likelihood_tree_4getLogSumAcrossSites(__pyx_self, __pyx_v_counts, __pyx_v_input_likelihoods); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/evolve/_likelihood_tree.pyx":58 * return total * * def getLogSumAcrossSites(counts, input_likelihoods): # <<<<<<<<<<<<<< * cdef int S, i * cdef double total */ static PyObject *__pyx_pf_6cogent_6evolve_16_likelihood_tree_4getLogSumAcrossSites(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_counts, PyObject *__pyx_v_input_likelihoods) { int __pyx_v_S; int __pyx_v_i; double __pyx_v_total; double *__pyx_v_likelihoods_data; double *__pyx_v_weights_data; PyObject *__pyx_r = NULL; 
__Pyx_RefNannyDeclarations double *__pyx_t_1; int __pyx_t_2; PyObject *__pyx_t_3 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("getLogSumAcrossSites", 0); /* "cogent/evolve/_likelihood_tree.pyx":62 * cdef double total * cdef double *likelihoods_data, *weights_data * S = 0 # <<<<<<<<<<<<<< * weights_data = checkArrayDouble1D(counts, &S) * likelihoods_data = checkArrayDouble1D(input_likelihoods, &S) */ __pyx_v_S = 0; /* "cogent/evolve/_likelihood_tree.pyx":63 * cdef double *likelihoods_data, *weights_data * S = 0 * weights_data = checkArrayDouble1D(counts, &S) # <<<<<<<<<<<<<< * likelihoods_data = checkArrayDouble1D(input_likelihoods, &S) * total = 0.0 */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble1D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_counts), (&__pyx_v_S)); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 63; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_weights_data = __pyx_t_1; /* "cogent/evolve/_likelihood_tree.pyx":64 * S = 0 * weights_data = checkArrayDouble1D(counts, &S) * likelihoods_data = checkArrayDouble1D(input_likelihoods, &S) # <<<<<<<<<<<<<< * total = 0.0 * for i from 0 <= i < S: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble1D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_input_likelihoods), (&__pyx_v_S)); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 64; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_likelihoods_data = __pyx_t_1; /* "cogent/evolve/_likelihood_tree.pyx":65 * weights_data = checkArrayDouble1D(counts, &S) * likelihoods_data = checkArrayDouble1D(input_likelihoods, &S) * total = 0.0 # <<<<<<<<<<<<<< * for i from 0 <= i < S: * total += log(likelihoods_data[i])*weights_data[i] */ __pyx_v_total = 0.0; /* "cogent/evolve/_likelihood_tree.pyx":66 * likelihoods_data = 
checkArrayDouble1D(input_likelihoods, &S) * total = 0.0 * for i from 0 <= i < S: # <<<<<<<<<<<<<< * total += log(likelihoods_data[i])*weights_data[i] * return total */ __pyx_t_2 = __pyx_v_S; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_2; __pyx_v_i++) { /* "cogent/evolve/_likelihood_tree.pyx":67 * total = 0.0 * for i from 0 <= i < S: * total += log(likelihoods_data[i])*weights_data[i] # <<<<<<<<<<<<<< * return total * */ __pyx_v_total = (__pyx_v_total + (log((__pyx_v_likelihoods_data[__pyx_v_i])) * (__pyx_v_weights_data[__pyx_v_i]))); } /* "cogent/evolve/_likelihood_tree.pyx":68 * for i from 0 <= i < S: * total += log(likelihoods_data[i])*weights_data[i] * return total # <<<<<<<<<<<<<< * * def logDotReduce(index, patch_probs, switch_probs, plhs): */ __Pyx_XDECREF(__pyx_r); __pyx_t_3 = PyFloat_FromDouble(__pyx_v_total); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 68; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_r = __pyx_t_3; __pyx_t_3 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_3); __Pyx_AddTraceback("cogent.evolve._likelihood_tree.getLogSumAcrossSites", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_6evolve_16_likelihood_tree_7logDotReduce(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_6evolve_16_likelihood_tree_7logDotReduce = {__Pyx_NAMESTR("logDotReduce"), (PyCFunction)__pyx_pw_6cogent_6evolve_16_likelihood_tree_7logDotReduce, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_6evolve_16_likelihood_tree_7logDotReduce(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyObject *__pyx_v_index = 0; PyObject *__pyx_v_patch_probs = 0; PyObject *__pyx_v_switch_probs = 0; 
PyObject *__pyx_v_plhs = 0; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__index,&__pyx_n_s__patch_probs,&__pyx_n_s__switch_probs,&__pyx_n_s__plhs,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("logDotReduce (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[4] = {0,0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3); case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__index); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__patch_probs); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("logDotReduce", 1, 4, 4, 1); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__switch_probs); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("logDotReduce", 1, 4, 4, 2); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 3: values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__plhs); if (likely(values[3])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("logDotReduce", 1, 4, 4, 3); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "logDotReduce") < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 4) { 
goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); values[3] = PyTuple_GET_ITEM(__pyx_args, 3); } __pyx_v_index = values[0]; __pyx_v_patch_probs = values[1]; __pyx_v_switch_probs = values[2]; __pyx_v_plhs = values[3]; } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("logDotReduce", 1, 4, 4, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.evolve._likelihood_tree.logDotReduce", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_6cogent_6evolve_16_likelihood_tree_6logDotReduce(__pyx_self, __pyx_v_index, __pyx_v_patch_probs, __pyx_v_switch_probs, __pyx_v_plhs); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/evolve/_likelihood_tree.pyx":70 * return total * * def logDotReduce(index, patch_probs, switch_probs, plhs): # <<<<<<<<<<<<<< * cdef int site, i, j, k, n, uniq, exponent, length, most_probable_state * cdef double result, BASE */ static PyObject *__pyx_pf_6cogent_6evolve_16_likelihood_tree_6logDotReduce(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_index, PyObject *__pyx_v_patch_probs, PyObject *__pyx_v_switch_probs, PyObject *__pyx_v_plhs) { int __pyx_v_site; int __pyx_v_i; int __pyx_v_j; int __pyx_v_k; int __pyx_v_n; int __pyx_v_uniq; int __pyx_v_exponent; int __pyx_v_length; int __pyx_v_most_probable_state; double __pyx_v_result; double __pyx_v_BASE; double *__pyx_v_sp; double *__pyx_v_pl; double *__pyx_v_state; double *__pyx_v_prev; double *__pyx_v_tmp; long *__pyx_v_index_data; PyObject *__pyx_v_patch_probs1 = 0; PyObject *__pyx_v_patch_probs2 = 0; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; PyObject *__pyx_t_2 = NULL; double 
*__pyx_t_3; double *__pyx_t_4; long *__pyx_t_5; int __pyx_t_6; int __pyx_t_7; PyObject *__pyx_t_8 = NULL; int __pyx_t_9; int __pyx_t_10; int __pyx_t_11; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("logDotReduce", 0); /* "cogent/evolve/_likelihood_tree.pyx":76 * cdef long *index_data * cdef object patch_probs1, patch_probs2 * BASE = 2.0 ** 1000 # <<<<<<<<<<<<<< * patch_probs1 = patch_probs.copy() * patch_probs2 = patch_probs.copy() */ __pyx_v_BASE = pow(2.0, 1000.0); /* "cogent/evolve/_likelihood_tree.pyx":77 * cdef object patch_probs1, patch_probs2 * BASE = 2.0 ** 1000 * patch_probs1 = patch_probs.copy() # <<<<<<<<<<<<<< * patch_probs2 = patch_probs.copy() * n = uniq = length = 0 */ __pyx_t_1 = PyObject_GetAttr(__pyx_v_patch_probs, __pyx_n_s__copy); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 77; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = PyObject_Call(__pyx_t_1, ((PyObject *)__pyx_empty_tuple), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 77; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_v_patch_probs1 = __pyx_t_2; __pyx_t_2 = 0; /* "cogent/evolve/_likelihood_tree.pyx":78 * BASE = 2.0 ** 1000 * patch_probs1 = patch_probs.copy() * patch_probs2 = patch_probs.copy() # <<<<<<<<<<<<<< * n = uniq = length = 0 * state = checkArrayDouble1D(patch_probs1, &n) */ __pyx_t_2 = PyObject_GetAttr(__pyx_v_patch_probs, __pyx_n_s__copy); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 78; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_1 = PyObject_Call(__pyx_t_2, ((PyObject *)__pyx_empty_tuple), NULL); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 78; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; 
__pyx_v_patch_probs2 = __pyx_t_1; __pyx_t_1 = 0; /* "cogent/evolve/_likelihood_tree.pyx":79 * patch_probs1 = patch_probs.copy() * patch_probs2 = patch_probs.copy() * n = uniq = length = 0 # <<<<<<<<<<<<<< * state = checkArrayDouble1D(patch_probs1, &n) * prev = checkArrayDouble1D(patch_probs2, &n) */ __pyx_v_n = 0; __pyx_v_uniq = 0; __pyx_v_length = 0; /* "cogent/evolve/_likelihood_tree.pyx":80 * patch_probs2 = patch_probs.copy() * n = uniq = length = 0 * state = checkArrayDouble1D(patch_probs1, &n) # <<<<<<<<<<<<<< * prev = checkArrayDouble1D(patch_probs2, &n) * sp = checkArrayDouble2D(switch_probs, &n, &n) */ __pyx_t_3 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble1D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_patch_probs1), (&__pyx_v_n)); if (unlikely(__pyx_t_3 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 80; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_state = __pyx_t_3; /* "cogent/evolve/_likelihood_tree.pyx":81 * n = uniq = length = 0 * state = checkArrayDouble1D(patch_probs1, &n) * prev = checkArrayDouble1D(patch_probs2, &n) # <<<<<<<<<<<<<< * sp = checkArrayDouble2D(switch_probs, &n, &n) * pl = checkArrayDouble2D(plhs, &uniq, &n) */ __pyx_t_3 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble1D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_patch_probs2), (&__pyx_v_n)); if (unlikely(__pyx_t_3 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 81; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_prev = __pyx_t_3; /* "cogent/evolve/_likelihood_tree.pyx":82 * state = checkArrayDouble1D(patch_probs1, &n) * prev = checkArrayDouble1D(patch_probs2, &n) * sp = checkArrayDouble2D(switch_probs, &n, &n) # <<<<<<<<<<<<<< * pl = checkArrayDouble2D(plhs, &uniq, &n) * index_data = checkArrayLong1D(index, &length) */ __pyx_t_4 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble2D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_switch_probs), (&__pyx_v_n), 
(&__pyx_v_n)); if (unlikely(__pyx_t_4 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 82; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_sp = __pyx_t_4; /* "cogent/evolve/_likelihood_tree.pyx":83 * prev = checkArrayDouble1D(patch_probs2, &n) * sp = checkArrayDouble2D(switch_probs, &n, &n) * pl = checkArrayDouble2D(plhs, &uniq, &n) # <<<<<<<<<<<<<< * index_data = checkArrayLong1D(index, &length) * exponent = 0 */ __pyx_t_4 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayDouble2D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_plhs), (&__pyx_v_uniq), (&__pyx_v_n)); if (unlikely(__pyx_t_4 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 83; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_pl = __pyx_t_4; /* "cogent/evolve/_likelihood_tree.pyx":84 * sp = checkArrayDouble2D(switch_probs, &n, &n) * pl = checkArrayDouble2D(plhs, &uniq, &n) * index_data = checkArrayLong1D(index, &length) # <<<<<<<<<<<<<< * exponent = 0 * for site from 0 <= site < length: */ __pyx_t_5 = __pyx_f_6cogent_6evolve_16_likelihood_tree_checkArrayLong1D(((__pyx_t_6cogent_6evolve_16_likelihood_tree_ArrayType)__pyx_v_index), (&__pyx_v_length)); if (unlikely(__pyx_t_5 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 84; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_index_data = __pyx_t_5; /* "cogent/evolve/_likelihood_tree.pyx":85 * pl = checkArrayDouble2D(plhs, &uniq, &n) * index_data = checkArrayLong1D(index, &length) * exponent = 0 # <<<<<<<<<<<<<< * for site from 0 <= site < length: * k = index_data[site] */ __pyx_v_exponent = 0; /* "cogent/evolve/_likelihood_tree.pyx":86 * index_data = checkArrayLong1D(index, &length) * exponent = 0 * for site from 0 <= site < length: # <<<<<<<<<<<<<< * k = index_data[site] * if k >= uniq: */ __pyx_t_6 = __pyx_v_length; for (__pyx_v_site = 0; __pyx_v_site < __pyx_t_6; __pyx_v_site++) { /* "cogent/evolve/_likelihood_tree.pyx":87 * exponent = 0 * for site from 0 <= site < length: * k = 
index_data[site] # <<<<<<<<<<<<<< * if k >= uniq: * raise ValueError((k, uniq)) */ __pyx_v_k = (__pyx_v_index_data[__pyx_v_site]); /* "cogent/evolve/_likelihood_tree.pyx":88 * for site from 0 <= site < length: * k = index_data[site] * if k >= uniq: # <<<<<<<<<<<<<< * raise ValueError((k, uniq)) * tmp = prev */ __pyx_t_7 = (__pyx_v_k >= __pyx_v_uniq); if (__pyx_t_7) { /* "cogent/evolve/_likelihood_tree.pyx":89 * k = index_data[site] * if k >= uniq: * raise ValueError((k, uniq)) # <<<<<<<<<<<<<< * tmp = prev * prev = state */ __pyx_t_1 = PyInt_FromLong(__pyx_v_k); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 89; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = PyInt_FromLong(__pyx_v_uniq); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 89; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_8 = PyTuple_New(2); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 89; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_8); PyTuple_SET_ITEM(__pyx_t_8, 0, __pyx_t_1); __Pyx_GIVEREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_8, 1, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_1 = 0; __pyx_t_2 = 0; __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 89; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, ((PyObject *)__pyx_t_8)); __Pyx_GIVEREF(((PyObject *)__pyx_t_8)); __pyx_t_8 = 0; __pyx_t_8 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_2), NULL); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 89; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_8); __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __Pyx_Raise(__pyx_t_8, 0, 0, 0); __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 89; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto 
__pyx_L5; } __pyx_L5:; /* "cogent/evolve/_likelihood_tree.pyx":90 * if k >= uniq: * raise ValueError((k, uniq)) * tmp = prev # <<<<<<<<<<<<<< * prev = state * state = tmp */ __pyx_v_tmp = __pyx_v_prev; /* "cogent/evolve/_likelihood_tree.pyx":91 * raise ValueError((k, uniq)) * tmp = prev * prev = state # <<<<<<<<<<<<<< * state = tmp * most_probable_state = 0 */ __pyx_v_prev = __pyx_v_state; /* "cogent/evolve/_likelihood_tree.pyx":92 * tmp = prev * prev = state * state = tmp # <<<<<<<<<<<<<< * most_probable_state = 0 * for i from 0 <= i < n: */ __pyx_v_state = __pyx_v_tmp; /* "cogent/evolve/_likelihood_tree.pyx":93 * prev = state * state = tmp * most_probable_state = 0 # <<<<<<<<<<<<<< * for i from 0 <= i < n: * state[i] = 0 */ __pyx_v_most_probable_state = 0; /* "cogent/evolve/_likelihood_tree.pyx":94 * state = tmp * most_probable_state = 0 * for i from 0 <= i < n: # <<<<<<<<<<<<<< * state[i] = 0 * for j from 0 <= j < n: */ __pyx_t_9 = __pyx_v_n; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_9; __pyx_v_i++) { /* "cogent/evolve/_likelihood_tree.pyx":95 * most_probable_state = 0 * for i from 0 <= i < n: * state[i] = 0 # <<<<<<<<<<<<<< * for j from 0 <= j < n: * state[i] += prev[j] * sp[j*n+i] */ (__pyx_v_state[__pyx_v_i]) = 0.0; /* "cogent/evolve/_likelihood_tree.pyx":96 * for i from 0 <= i < n: * state[i] = 0 * for j from 0 <= j < n: # <<<<<<<<<<<<<< * state[i] += prev[j] * sp[j*n+i] * state[i] *= pl[k*n+i] */ __pyx_t_10 = __pyx_v_n; for (__pyx_v_j = 0; __pyx_v_j < __pyx_t_10; __pyx_v_j++) { /* "cogent/evolve/_likelihood_tree.pyx":97 * state[i] = 0 * for j from 0 <= j < n: * state[i] += prev[j] * sp[j*n+i] # <<<<<<<<<<<<<< * state[i] *= pl[k*n+i] * if state[i] > state[most_probable_state]: */ __pyx_t_11 = __pyx_v_i; (__pyx_v_state[__pyx_t_11]) = ((__pyx_v_state[__pyx_t_11]) + ((__pyx_v_prev[__pyx_v_j]) * (__pyx_v_sp[((__pyx_v_j * __pyx_v_n) + __pyx_v_i)]))); } /* "cogent/evolve/_likelihood_tree.pyx":98 * for j from 0 <= j < n: * state[i] += prev[j] * sp[j*n+i] * state[i] 
*= pl[k*n+i] # <<<<<<<<<<<<<< * if state[i] > state[most_probable_state]: * most_probable_state = i */ __pyx_t_10 = __pyx_v_i; (__pyx_v_state[__pyx_t_10]) = ((__pyx_v_state[__pyx_t_10]) * (__pyx_v_pl[((__pyx_v_k * __pyx_v_n) + __pyx_v_i)])); /* "cogent/evolve/_likelihood_tree.pyx":99 * state[i] += prev[j] * sp[j*n+i] * state[i] *= pl[k*n+i] * if state[i] > state[most_probable_state]: # <<<<<<<<<<<<<< * most_probable_state = i * while state[most_probable_state] < 1.0: */ __pyx_t_7 = ((__pyx_v_state[__pyx_v_i]) > (__pyx_v_state[__pyx_v_most_probable_state])); if (__pyx_t_7) { /* "cogent/evolve/_likelihood_tree.pyx":100 * state[i] *= pl[k*n+i] * if state[i] > state[most_probable_state]: * most_probable_state = i # <<<<<<<<<<<<<< * while state[most_probable_state] < 1.0: * for i from 0 <= i < n: */ __pyx_v_most_probable_state = __pyx_v_i; goto __pyx_L10; } __pyx_L10:; } /* "cogent/evolve/_likelihood_tree.pyx":101 * if state[i] > state[most_probable_state]: * most_probable_state = i * while state[most_probable_state] < 1.0: # <<<<<<<<<<<<<< * for i from 0 <= i < n: * state[i] *= BASE */ while (1) { __pyx_t_7 = ((__pyx_v_state[__pyx_v_most_probable_state]) < 1.0); if (!__pyx_t_7) break; /* "cogent/evolve/_likelihood_tree.pyx":102 * most_probable_state = i * while state[most_probable_state] < 1.0: * for i from 0 <= i < n: # <<<<<<<<<<<<<< * state[i] *= BASE * exponent += -1 */ __pyx_t_9 = __pyx_v_n; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_9; __pyx_v_i++) { /* "cogent/evolve/_likelihood_tree.pyx":103 * while state[most_probable_state] < 1.0: * for i from 0 <= i < n: * state[i] *= BASE # <<<<<<<<<<<<<< * exponent += -1 * result = 0.0 */ __pyx_t_10 = __pyx_v_i; (__pyx_v_state[__pyx_t_10]) = ((__pyx_v_state[__pyx_t_10]) * __pyx_v_BASE); } /* "cogent/evolve/_likelihood_tree.pyx":104 * for i from 0 <= i < n: * state[i] *= BASE * exponent += -1 # <<<<<<<<<<<<<< * result = 0.0 * for i from 0 <= i < n: */ __pyx_v_exponent = (__pyx_v_exponent + -1); } } /* 
"cogent/evolve/_likelihood_tree.pyx":105 * state[i] *= BASE * exponent += -1 * result = 0.0 # <<<<<<<<<<<<<< * for i from 0 <= i < n: * result += state[i] */ __pyx_v_result = 0.0; /* "cogent/evolve/_likelihood_tree.pyx":106 * exponent += -1 * result = 0.0 * for i from 0 <= i < n: # <<<<<<<<<<<<<< * result += state[i] * */ __pyx_t_6 = __pyx_v_n; for (__pyx_v_i = 0; __pyx_v_i < __pyx_t_6; __pyx_v_i++) { /* "cogent/evolve/_likelihood_tree.pyx":107 * result = 0.0 * for i from 0 <= i < n: * result += state[i] # <<<<<<<<<<<<<< * * return log(result) + exponent * log(BASE) */ __pyx_v_result = (__pyx_v_result + (__pyx_v_state[__pyx_v_i])); } /* "cogent/evolve/_likelihood_tree.pyx":109 * result += state[i] * * return log(result) + exponent * log(BASE) # <<<<<<<<<<<<<< * */ __Pyx_XDECREF(__pyx_r); __pyx_t_8 = PyFloat_FromDouble((log(__pyx_v_result) + (__pyx_v_exponent * log(__pyx_v_BASE)))); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 109; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_8); __pyx_r = __pyx_t_8; __pyx_t_8 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_8); __Pyx_AddTraceback("cogent.evolve._likelihood_tree.logDotReduce", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF(__pyx_v_patch_probs1); __Pyx_XDECREF(__pyx_v_patch_probs2); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyMethodDef __pyx_methods[] = { {0, 0, 0, 0} }; #if PY_MAJOR_VERSION >= 3 static struct PyModuleDef __pyx_moduledef = { PyModuleDef_HEAD_INIT, __Pyx_NAMESTR("_likelihood_tree"), 0, /* m_doc */ -1, /* m_size */ __pyx_methods /* m_methods */, NULL, /* m_reload */ NULL, /* m_traverse */ NULL, /* m_clear */ NULL /* m_free */ }; #endif static __Pyx_StringTabEntry __pyx_string_tab[] = { {&__pyx_kp_s_1, __pyx_k_1, sizeof(__pyx_k_1), 0, 0, 1, 0}, 
{&__pyx_kp_s_10, __pyx_k_10, sizeof(__pyx_k_10), 0, 0, 1, 0}, {&__pyx_kp_s_14, __pyx_k_14, sizeof(__pyx_k_14), 0, 0, 1, 0}, {&__pyx_n_s_15, __pyx_k_15, sizeof(__pyx_k_15), 0, 0, 1, 1}, {&__pyx_n_s_18, __pyx_k_18, sizeof(__pyx_k_18), 0, 0, 1, 1}, {&__pyx_n_s_21, __pyx_k_21, sizeof(__pyx_k_21), 0, 0, 1, 1}, {&__pyx_kp_s_3, __pyx_k_3, sizeof(__pyx_k_3), 0, 0, 1, 0}, {&__pyx_kp_s_4, __pyx_k_4, sizeof(__pyx_k_4), 0, 0, 1, 0}, {&__pyx_kp_s_5, __pyx_k_5, sizeof(__pyx_k_5), 0, 0, 1, 0}, {&__pyx_kp_s_6, __pyx_k_6, sizeof(__pyx_k_6), 0, 0, 1, 0}, {&__pyx_kp_s_7, __pyx_k_7, sizeof(__pyx_k_7), 0, 0, 1, 0}, {&__pyx_kp_s_9, __pyx_k_9, sizeof(__pyx_k_9), 0, 0, 1, 0}, {&__pyx_n_s__BASE, __pyx_k__BASE, sizeof(__pyx_k__BASE), 0, 0, 1, 1}, {&__pyx_n_s__M, __pyx_k__M, sizeof(__pyx_k__M), 0, 0, 1, 1}, {&__pyx_n_s__S, __pyx_k__S, sizeof(__pyx_k__S), 0, 0, 1, 1}, {&__pyx_n_s__TypeError, __pyx_k__TypeError, sizeof(__pyx_k__TypeError), 0, 0, 1, 1}, {&__pyx_n_s__U, __pyx_k__U, sizeof(__pyx_k__U), 0, 0, 1, 1}, {&__pyx_n_s__ValueError, __pyx_k__ValueError, sizeof(__pyx_k__ValueError), 0, 0, 1, 1}, {&__pyx_n_s____array_struct__, __pyx_k____array_struct__, sizeof(__pyx_k____array_struct__), 0, 0, 1, 1}, {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1}, {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1}, {&__pyx_n_s____version__, __pyx_k____version__, sizeof(__pyx_k____version__), 0, 0, 1, 1}, {&__pyx_n_s__c, __pyx_k__c, sizeof(__pyx_k__c), 0, 0, 1, 1}, {&__pyx_n_s__child_indexes, __pyx_k__child_indexes, sizeof(__pyx_k__child_indexes), 0, 0, 1, 1}, {&__pyx_n_s__chr, __pyx_k__chr, sizeof(__pyx_k__chr), 0, 0, 1, 1}, {&__pyx_n_s__copy, __pyx_k__copy, sizeof(__pyx_k__copy), 0, 0, 1, 1}, {&__pyx_n_s__counts, __pyx_k__counts, sizeof(__pyx_k__counts), 0, 0, 1, 1}, {&__pyx_n_s__exponent, __pyx_k__exponent, sizeof(__pyx_k__exponent), 0, 0, 1, 1}, {&__pyx_n_s__first, __pyx_k__first, sizeof(__pyx_k__first), 0, 0, 1, 1}, {&__pyx_n_s__i, 
__pyx_k__i, sizeof(__pyx_k__i), 0, 0, 1, 1}, {&__pyx_n_s__index, __pyx_k__index, sizeof(__pyx_k__index), 0, 0, 1, 1}, {&__pyx_n_s__index_data, __pyx_k__index_data, sizeof(__pyx_k__index_data), 0, 0, 1, 1}, {&__pyx_n_s__input_likelihoods, __pyx_k__input_likelihoods, sizeof(__pyx_k__input_likelihoods), 0, 0, 1, 1}, {&__pyx_n_s__j, __pyx_k__j, sizeof(__pyx_k__j), 0, 0, 1, 1}, {&__pyx_n_s__k, __pyx_k__k, sizeof(__pyx_k__k), 0, 0, 1, 1}, {&__pyx_n_s__length, __pyx_k__length, sizeof(__pyx_k__length), 0, 0, 1, 1}, {&__pyx_n_s__likelihoods, __pyx_k__likelihoods, sizeof(__pyx_k__likelihoods), 0, 0, 1, 1}, {&__pyx_n_s__likelihoods_data, __pyx_k__likelihoods_data, sizeof(__pyx_k__likelihoods_data), 0, 0, 1, 1}, {&__pyx_n_s__logDotReduce, __pyx_k__logDotReduce, sizeof(__pyx_k__logDotReduce), 0, 0, 1, 1}, {&__pyx_n_s__m, __pyx_k__m, sizeof(__pyx_k__m), 0, 0, 1, 1}, {&__pyx_n_s__most_probable_state, __pyx_k__most_probable_state, sizeof(__pyx_k__most_probable_state), 0, 0, 1, 1}, {&__pyx_n_s__mprobs, __pyx_k__mprobs, sizeof(__pyx_k__mprobs), 0, 0, 1, 1}, {&__pyx_n_s__mprobs_data, __pyx_k__mprobs_data, sizeof(__pyx_k__mprobs_data), 0, 0, 1, 1}, {&__pyx_n_s__n, __pyx_k__n, sizeof(__pyx_k__n), 0, 0, 1, 1}, {&__pyx_n_s__patch_probs, __pyx_k__patch_probs, sizeof(__pyx_k__patch_probs), 0, 0, 1, 1}, {&__pyx_n_s__patch_probs1, __pyx_k__patch_probs1, sizeof(__pyx_k__patch_probs1), 0, 0, 1, 1}, {&__pyx_n_s__patch_probs2, __pyx_k__patch_probs2, sizeof(__pyx_k__patch_probs2), 0, 0, 1, 1}, {&__pyx_n_s__pl, __pyx_k__pl, sizeof(__pyx_k__pl), 0, 0, 1, 1}, {&__pyx_n_s__plhs, __pyx_k__plhs, sizeof(__pyx_k__plhs), 0, 0, 1, 1}, {&__pyx_n_s__posn, __pyx_k__posn, sizeof(__pyx_k__posn), 0, 0, 1, 1}, {&__pyx_n_s__prev, __pyx_k__prev, sizeof(__pyx_k__prev), 0, 0, 1, 1}, {&__pyx_n_s__result, __pyx_k__result, sizeof(__pyx_k__result), 0, 0, 1, 1}, {&__pyx_n_s__site, __pyx_k__site, sizeof(__pyx_k__site), 0, 0, 1, 1}, {&__pyx_n_s__sp, __pyx_k__sp, sizeof(__pyx_k__sp), 0, 0, 1, 1}, {&__pyx_n_s__state, 
__pyx_k__state, sizeof(__pyx_k__state), 0, 0, 1, 1}, {&__pyx_n_s__sumInputLikelihoods, __pyx_k__sumInputLikelihoods, sizeof(__pyx_k__sumInputLikelihoods), 0, 0, 1, 1}, {&__pyx_n_s__switch_probs, __pyx_k__switch_probs, sizeof(__pyx_k__switch_probs), 0, 0, 1, 1}, {&__pyx_n_s__target_data, __pyx_k__target_data, sizeof(__pyx_k__target_data), 0, 0, 1, 1}, {&__pyx_n_s__tmp, __pyx_k__tmp, sizeof(__pyx_k__tmp), 0, 0, 1, 1}, {&__pyx_n_s__total, __pyx_k__total, sizeof(__pyx_k__total), 0, 0, 1, 1}, {&__pyx_n_s__u, __pyx_k__u, sizeof(__pyx_k__u), 0, 0, 1, 1}, {&__pyx_n_s__uniq, __pyx_k__uniq, sizeof(__pyx_k__uniq), 0, 0, 1, 1}, {&__pyx_n_s__values_data, __pyx_k__values_data, sizeof(__pyx_k__values_data), 0, 0, 1, 1}, {&__pyx_n_s__version_info, __pyx_k__version_info, sizeof(__pyx_k__version_info), 0, 0, 1, 1}, {&__pyx_n_s__weights_data, __pyx_k__weights_data, sizeof(__pyx_k__weights_data), 0, 0, 1, 1}, {0, 0, 0, 0, 0, 0, 0} }; static int __Pyx_InitCachedBuiltins(void) { __pyx_builtin_TypeError = __Pyx_GetName(__pyx_b, __pyx_n_s__TypeError); if (!__pyx_builtin_TypeError) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 44; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_ValueError = __Pyx_GetName(__pyx_b, __pyx_n_s__ValueError); if (!__pyx_builtin_ValueError) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_chr = __Pyx_GetName(__pyx_b, __pyx_n_s__chr); if (!__pyx_builtin_chr) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} return 0; __pyx_L1_error:; return -1; } static int __Pyx_InitCachedConstants(void) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__Pyx_InitCachedConstants", 0); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":44 * cdef char kind * if A is None: * raise TypeError("Array required, got None") # <<<<<<<<<<<<<< * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) */ 
__pyx_k_tuple_2 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 44; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_2); __Pyx_INCREF(((PyObject *)__pyx_kp_s_1)); PyTuple_SET_ITEM(__pyx_k_tuple_2, 0, ((PyObject *)__pyx_kp_s_1)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_s_1)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_2)); /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":62 * (nd, a.nd)) * if not a.flags & CONTIGUOUS: * raise ValueError ('Noncontiguous array') # <<<<<<<<<<<<<< * * cdef int dimension, val */ __pyx_k_tuple_8 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_8)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_8); __Pyx_INCREF(((PyObject *)__pyx_kp_s_7)); PyTuple_SET_ITEM(__pyx_k_tuple_8, 0, ((PyObject *)__pyx_kp_s_7)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_s_7)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_8)); /* "cogent/evolve/_likelihood_tree.pyx":2 * include "../../include/numerical_pyrex.pyx" * version_info = (2, 1) # <<<<<<<<<<<<<< * __version__ = "('1', '5', '3')" * */ __pyx_k_tuple_11 = PyTuple_New(2); if (unlikely(!__pyx_k_tuple_11)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_11); __Pyx_INCREF(__pyx_int_2); PyTuple_SET_ITEM(__pyx_k_tuple_11, 0, __pyx_int_2); __Pyx_GIVEREF(__pyx_int_2); __Pyx_INCREF(__pyx_int_1); PyTuple_SET_ITEM(__pyx_k_tuple_11, 1, __pyx_int_1); __Pyx_GIVEREF(__pyx_int_1); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_11)); /* "cogent/evolve/_likelihood_tree.pyx":8 * double log (double x) * * def sumInputLikelihoods(child_indexes, result, likelihoods): # <<<<<<<<<<<<<< * # M is dim of alphabet, S is non-redundandt parent seq length, * # U is length */ __pyx_k_tuple_12 = PyTuple_New(15); if (unlikely(!__pyx_k_tuple_12)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_12); __Pyx_INCREF(((PyObject *)__pyx_n_s__child_indexes)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 0, ((PyObject *)__pyx_n_s__child_indexes)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__child_indexes)); __Pyx_INCREF(((PyObject *)__pyx_n_s__result)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 1, ((PyObject *)__pyx_n_s__result)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__result)); __Pyx_INCREF(((PyObject *)__pyx_n_s__likelihoods)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 2, ((PyObject *)__pyx_n_s__likelihoods)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__likelihoods)); __Pyx_INCREF(((PyObject *)__pyx_n_s__M)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 3, ((PyObject *)__pyx_n_s__M)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__M)); __Pyx_INCREF(((PyObject *)__pyx_n_s__S)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 4, ((PyObject *)__pyx_n_s__S)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__S)); __Pyx_INCREF(((PyObject *)__pyx_n_s__U)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 5, ((PyObject *)__pyx_n_s__U)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__U)); __Pyx_INCREF(((PyObject *)__pyx_n_s__m)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 6, ((PyObject *)__pyx_n_s__m)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__m)); __Pyx_INCREF(((PyObject *)__pyx_n_s__i)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 7, ((PyObject *)__pyx_n_s__i)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__i)); __Pyx_INCREF(((PyObject *)__pyx_n_s__u)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 8, ((PyObject *)__pyx_n_s__u)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__u)); __Pyx_INCREF(((PyObject *)__pyx_n_s__c)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 9, ((PyObject *)__pyx_n_s__c)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__c)); __Pyx_INCREF(((PyObject *)__pyx_n_s__values_data)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 10, ((PyObject *)__pyx_n_s__values_data)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__values_data)); __Pyx_INCREF(((PyObject *)__pyx_n_s__index_data)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 11, ((PyObject *)__pyx_n_s__index_data)); 
__Pyx_GIVEREF(((PyObject *)__pyx_n_s__index_data)); __Pyx_INCREF(((PyObject *)__pyx_n_s__target_data)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 12, ((PyObject *)__pyx_n_s__target_data)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__target_data)); __Pyx_INCREF(((PyObject *)__pyx_n_s__first)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 13, ((PyObject *)__pyx_n_s__first)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__first)); __Pyx_INCREF(((PyObject *)__pyx_n_s__index)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 14, ((PyObject *)__pyx_n_s__index)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__index)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_12)); __pyx_k_codeobj_13 = (PyObject*)__Pyx_PyCode_New(3, 0, 15, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_12, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_14, __pyx_n_s__sumInputLikelihoods, 8, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_13)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_likelihood_tree.pyx":41 * return result * * def getTotalLogLikelihood(counts, input_likelihoods, mprobs): # <<<<<<<<<<<<<< * cdef int S, M, i, m * cdef double posn, total */ __pyx_k_tuple_16 = PyTuple_New(12); if (unlikely(!__pyx_k_tuple_16)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_16); __Pyx_INCREF(((PyObject *)__pyx_n_s__counts)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 0, ((PyObject *)__pyx_n_s__counts)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__counts)); __Pyx_INCREF(((PyObject *)__pyx_n_s__input_likelihoods)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 1, ((PyObject *)__pyx_n_s__input_likelihoods)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__input_likelihoods)); __Pyx_INCREF(((PyObject *)__pyx_n_s__mprobs)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 2, ((PyObject *)__pyx_n_s__mprobs)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__mprobs)); __Pyx_INCREF(((PyObject *)__pyx_n_s__S)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 3, 
((PyObject *)__pyx_n_s__S)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__S)); __Pyx_INCREF(((PyObject *)__pyx_n_s__M)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 4, ((PyObject *)__pyx_n_s__M)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__M)); __Pyx_INCREF(((PyObject *)__pyx_n_s__i)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 5, ((PyObject *)__pyx_n_s__i)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__i)); __Pyx_INCREF(((PyObject *)__pyx_n_s__m)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 6, ((PyObject *)__pyx_n_s__m)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__m)); __Pyx_INCREF(((PyObject *)__pyx_n_s__posn)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 7, ((PyObject *)__pyx_n_s__posn)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__posn)); __Pyx_INCREF(((PyObject *)__pyx_n_s__total)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 8, ((PyObject *)__pyx_n_s__total)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__total)); __Pyx_INCREF(((PyObject *)__pyx_n_s__likelihoods_data)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 9, ((PyObject *)__pyx_n_s__likelihoods_data)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__likelihoods_data)); __Pyx_INCREF(((PyObject *)__pyx_n_s__mprobs_data)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 10, ((PyObject *)__pyx_n_s__mprobs_data)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__mprobs_data)); __Pyx_INCREF(((PyObject *)__pyx_n_s__weights_data)); PyTuple_SET_ITEM(__pyx_k_tuple_16, 11, ((PyObject *)__pyx_n_s__weights_data)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__weights_data)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_16)); __pyx_k_codeobj_17 = (PyObject*)__Pyx_PyCode_New(3, 0, 12, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_16, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_14, __pyx_n_s_18, 41, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_17)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_likelihood_tree.pyx":58 * return total * * def getLogSumAcrossSites(counts, input_likelihoods): # <<<<<<<<<<<<<< * cdef int S, i * cdef double total */ 
__pyx_k_tuple_19 = PyTuple_New(7); if (unlikely(!__pyx_k_tuple_19)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 58; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_19); __Pyx_INCREF(((PyObject *)__pyx_n_s__counts)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 0, ((PyObject *)__pyx_n_s__counts)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__counts)); __Pyx_INCREF(((PyObject *)__pyx_n_s__input_likelihoods)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 1, ((PyObject *)__pyx_n_s__input_likelihoods)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__input_likelihoods)); __Pyx_INCREF(((PyObject *)__pyx_n_s__S)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 2, ((PyObject *)__pyx_n_s__S)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__S)); __Pyx_INCREF(((PyObject *)__pyx_n_s__i)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 3, ((PyObject *)__pyx_n_s__i)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__i)); __Pyx_INCREF(((PyObject *)__pyx_n_s__total)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 4, ((PyObject *)__pyx_n_s__total)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__total)); __Pyx_INCREF(((PyObject *)__pyx_n_s__likelihoods_data)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 5, ((PyObject *)__pyx_n_s__likelihoods_data)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__likelihoods_data)); __Pyx_INCREF(((PyObject *)__pyx_n_s__weights_data)); PyTuple_SET_ITEM(__pyx_k_tuple_19, 6, ((PyObject *)__pyx_n_s__weights_data)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__weights_data)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_19)); __pyx_k_codeobj_20 = (PyObject*)__Pyx_PyCode_New(2, 0, 7, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_19, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_14, __pyx_n_s_21, 58, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_20)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 58; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_likelihood_tree.pyx":70 * return total * * def logDotReduce(index, patch_probs, switch_probs, plhs): # <<<<<<<<<<<<<< * cdef int site, i, j, k, n, uniq, 
exponent, length, most_probable_state * cdef double result, BASE */ __pyx_k_tuple_22 = PyTuple_New(23); if (unlikely(!__pyx_k_tuple_22)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_22); __Pyx_INCREF(((PyObject *)__pyx_n_s__index)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 0, ((PyObject *)__pyx_n_s__index)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__index)); __Pyx_INCREF(((PyObject *)__pyx_n_s__patch_probs)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 1, ((PyObject *)__pyx_n_s__patch_probs)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__patch_probs)); __Pyx_INCREF(((PyObject *)__pyx_n_s__switch_probs)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 2, ((PyObject *)__pyx_n_s__switch_probs)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__switch_probs)); __Pyx_INCREF(((PyObject *)__pyx_n_s__plhs)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 3, ((PyObject *)__pyx_n_s__plhs)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__plhs)); __Pyx_INCREF(((PyObject *)__pyx_n_s__site)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 4, ((PyObject *)__pyx_n_s__site)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__site)); __Pyx_INCREF(((PyObject *)__pyx_n_s__i)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 5, ((PyObject *)__pyx_n_s__i)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__i)); __Pyx_INCREF(((PyObject *)__pyx_n_s__j)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 6, ((PyObject *)__pyx_n_s__j)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__j)); __Pyx_INCREF(((PyObject *)__pyx_n_s__k)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 7, ((PyObject *)__pyx_n_s__k)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__k)); __Pyx_INCREF(((PyObject *)__pyx_n_s__n)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 8, ((PyObject *)__pyx_n_s__n)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__n)); __Pyx_INCREF(((PyObject *)__pyx_n_s__uniq)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 9, ((PyObject *)__pyx_n_s__uniq)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__uniq)); __Pyx_INCREF(((PyObject *)__pyx_n_s__exponent)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 10, ((PyObject 
*)__pyx_n_s__exponent)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__exponent)); __Pyx_INCREF(((PyObject *)__pyx_n_s__length)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 11, ((PyObject *)__pyx_n_s__length)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__length)); __Pyx_INCREF(((PyObject *)__pyx_n_s__most_probable_state)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 12, ((PyObject *)__pyx_n_s__most_probable_state)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__most_probable_state)); __Pyx_INCREF(((PyObject *)__pyx_n_s__result)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 13, ((PyObject *)__pyx_n_s__result)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__result)); __Pyx_INCREF(((PyObject *)__pyx_n_s__BASE)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 14, ((PyObject *)__pyx_n_s__BASE)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__BASE)); __Pyx_INCREF(((PyObject *)__pyx_n_s__sp)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 15, ((PyObject *)__pyx_n_s__sp)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__sp)); __Pyx_INCREF(((PyObject *)__pyx_n_s__pl)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 16, ((PyObject *)__pyx_n_s__pl)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__pl)); __Pyx_INCREF(((PyObject *)__pyx_n_s__state)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 17, ((PyObject *)__pyx_n_s__state)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__state)); __Pyx_INCREF(((PyObject *)__pyx_n_s__prev)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 18, ((PyObject *)__pyx_n_s__prev)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__prev)); __Pyx_INCREF(((PyObject *)__pyx_n_s__tmp)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 19, ((PyObject *)__pyx_n_s__tmp)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__tmp)); __Pyx_INCREF(((PyObject *)__pyx_n_s__index_data)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 20, ((PyObject *)__pyx_n_s__index_data)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__index_data)); __Pyx_INCREF(((PyObject *)__pyx_n_s__patch_probs1)); PyTuple_SET_ITEM(__pyx_k_tuple_22, 21, ((PyObject *)__pyx_n_s__patch_probs1)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__patch_probs1)); __Pyx_INCREF(((PyObject *)__pyx_n_s__patch_probs2)); 
PyTuple_SET_ITEM(__pyx_k_tuple_22, 22, ((PyObject *)__pyx_n_s__patch_probs2)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__patch_probs2)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_22)); __pyx_k_codeobj_23 = (PyObject*)__Pyx_PyCode_New(4, 0, 23, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_22, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_14, __pyx_n_s__logDotReduce, 70, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_23)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_RefNannyFinishContext(); return 0; __pyx_L1_error:; __Pyx_RefNannyFinishContext(); return -1; } static int __Pyx_InitGlobals(void) { if (__Pyx_InitStrings(__pyx_string_tab) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_1 = PyInt_FromLong(1); if (unlikely(!__pyx_int_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_2 = PyInt_FromLong(2); if (unlikely(!__pyx_int_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; return 0; __pyx_L1_error:; return -1; } #if PY_MAJOR_VERSION < 3 PyMODINIT_FUNC init_likelihood_tree(void); /*proto*/ PyMODINIT_FUNC init_likelihood_tree(void) #else PyMODINIT_FUNC PyInit__likelihood_tree(void); /*proto*/ PyMODINIT_FUNC PyInit__likelihood_tree(void) #endif { PyObject *__pyx_t_1 = NULL; __Pyx_RefNannyDeclarations #if CYTHON_REFNANNY __Pyx_RefNanny = __Pyx_RefNannyImportAPI("refnanny"); if (!__Pyx_RefNanny) { PyErr_Clear(); __Pyx_RefNanny = __Pyx_RefNannyImportAPI("Cython.Runtime.refnanny"); if (!__Pyx_RefNanny) Py_FatalError("failed to import 'refnanny' module"); } #endif __Pyx_RefNannySetupContext("PyMODINIT_FUNC PyInit__likelihood_tree(void)", 0); if ( __Pyx_check_binary_version() < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_tuple = PyTuple_New(0); if 
(unlikely(!__pyx_empty_tuple)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_bytes = PyBytes_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_bytes)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #ifdef __Pyx_CyFunction_USED if (__Pyx_CyFunction_init() < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_FusedFunction_USED if (__pyx_FusedFunction_init() < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_Generator_USED if (__pyx_Generator_init() < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif /*--- Library function declarations ---*/ /*--- Threads initialization code ---*/ #if defined(__PYX_FORCE_INIT_THREADS) && __PYX_FORCE_INIT_THREADS #ifdef WITH_THREAD /* Python build with threading support? */ PyEval_InitThreads(); #endif #endif /*--- Module creation code ---*/ #if PY_MAJOR_VERSION < 3 __pyx_m = Py_InitModule4(__Pyx_NAMESTR("_likelihood_tree"), __pyx_methods, 0, 0, PYTHON_API_VERSION); #else __pyx_m = PyModule_Create(&__pyx_moduledef); #endif if (!__pyx_m) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; #if PY_MAJOR_VERSION < 3 Py_INCREF(__pyx_m); #endif __pyx_b = PyImport_AddModule(__Pyx_NAMESTR(__Pyx_BUILTIN_MODULE_NAME)); if (!__pyx_b) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; if (__Pyx_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; /*--- Initialize various global constants etc. 
---*/ if (unlikely(__Pyx_InitGlobals() < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__pyx_module_is_main_cogent__evolve___likelihood_tree) { if (__Pyx_SetAttrString(__pyx_m, "__name__", __pyx_n_s____main__) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; } /*--- Builtin init code ---*/ if (unlikely(__Pyx_InitCachedBuiltins() < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Constants init code ---*/ if (unlikely(__Pyx_InitCachedConstants() < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Global init code ---*/ /*--- Variable export code ---*/ /*--- Function export code ---*/ /*--- Type init code ---*/ /*--- Type import code ---*/ /*--- Variable import code ---*/ /*--- Function import code ---*/ /*--- Execution code ---*/ /* "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/../../include/numerical_pyrex.pyx":13 * # * * __version__ = "('1', '5', '3')" # <<<<<<<<<<<<<< * * cdef extern from "Python.h": */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s____version__, ((PyObject *)__pyx_kp_s_10)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_likelihood_tree.pyx":2 * include "../../include/numerical_pyrex.pyx" * version_info = (2, 1) # <<<<<<<<<<<<<< * __version__ = "('1', '5', '3')" * */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s__version_info, ((PyObject *)__pyx_k_tuple_11)) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 2; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_likelihood_tree.pyx":3 * include "../../include/numerical_pyrex.pyx" * version_info = (2, 1) * __version__ = "('1', '5', '3')" # <<<<<<<<<<<<<< * * cdef extern from "math.h": */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s____version__, ((PyObject *)__pyx_kp_s_10)) < 0) {__pyx_filename = 
__pyx_f[1]; __pyx_lineno = 3; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_likelihood_tree.pyx":8 * double log (double x) * * def sumInputLikelihoods(child_indexes, result, likelihoods): # <<<<<<<<<<<<<< * # M is dim of alphabet, S is non-redundandt parent seq length, * # U is length */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_6cogent_6evolve_16_likelihood_tree_1sumInputLikelihoods, NULL, __pyx_n_s_15); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__sumInputLikelihoods, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "cogent/evolve/_likelihood_tree.pyx":41 * return result * * def getTotalLogLikelihood(counts, input_likelihoods, mprobs): # <<<<<<<<<<<<<< * cdef int S, M, i, m * cdef double posn, total */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_6cogent_6evolve_16_likelihood_tree_3getTotalLogLikelihood, NULL, __pyx_n_s_15); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s_18, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 41; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "cogent/evolve/_likelihood_tree.pyx":58 * return total * * def getLogSumAcrossSites(counts, input_likelihoods): # <<<<<<<<<<<<<< * cdef int S, i * cdef double total */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_6cogent_6evolve_16_likelihood_tree_5getLogSumAcrossSites, NULL, __pyx_n_s_15); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 58; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s_21, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 58; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "cogent/evolve/_likelihood_tree.pyx":70 * return total * * def logDotReduce(index, patch_probs, switch_probs, plhs): # <<<<<<<<<<<<<< * cdef int site, i, j, k, n, uniq, exponent, length, most_probable_state * cdef double result, BASE */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_6cogent_6evolve_16_likelihood_tree_7logDotReduce, NULL, __pyx_n_s_15); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__logDotReduce, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 70; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "cogent/evolve/_likelihood_tree.pyx":1 * include "../../include/numerical_pyrex.pyx" # <<<<<<<<<<<<<< * version_info = (2, 1) * __version__ = "('1', '5', '3')" */ __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_1)); if (PyObject_SetAttr(__pyx_m, __pyx_n_s____test__, ((PyObject *)__pyx_t_1)) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); if (__pyx_m) { __Pyx_AddTraceback("init cogent.evolve._likelihood_tree", __pyx_clineno, __pyx_lineno, __pyx_filename); Py_DECREF(__pyx_m); __pyx_m = 0; } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_ImportError, "init cogent.evolve._likelihood_tree"); } __pyx_L0:; __Pyx_RefNannyFinishContext(); #if PY_MAJOR_VERSION < 3 return; #else return __pyx_m; #endif } /* Runtime support code */ #if CYTHON_REFNANNY static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname) { PyObject *m = NULL, *p = NULL; void *r = NULL; m = 
PyImport_ImportModule((char *)modname); if (!m) goto end; p = PyObject_GetAttrString(m, (char *)"RefNannyAPI"); if (!p) goto end; r = PyLong_AsVoidPtr(p); end: Py_XDECREF(p); Py_XDECREF(m); return (__Pyx_RefNannyAPIStruct *)r; } #endif /* CYTHON_REFNANNY */ static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name) { PyObject *result; result = PyObject_GetAttr(dict, name); if (!result) { if (dict != __pyx_b) { PyErr_Clear(); result = PyObject_GetAttr(__pyx_b, name); } if (!result) { PyErr_SetObject(PyExc_NameError, name); } } return result; } static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb) { #if CYTHON_COMPILING_IN_CPYTHON PyObject *tmp_type, *tmp_value, *tmp_tb; PyThreadState *tstate = PyThreadState_GET(); tmp_type = tstate->curexc_type; tmp_value = tstate->curexc_value; tmp_tb = tstate->curexc_traceback; tstate->curexc_type = type; tstate->curexc_value = value; tstate->curexc_traceback = tb; Py_XDECREF(tmp_type); Py_XDECREF(tmp_value); Py_XDECREF(tmp_tb); #else PyErr_Restore(type, value, tb); #endif } static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb) { #if CYTHON_COMPILING_IN_CPYTHON PyThreadState *tstate = PyThreadState_GET(); *type = tstate->curexc_type; *value = tstate->curexc_value; *tb = tstate->curexc_traceback; tstate->curexc_type = 0; tstate->curexc_value = 0; tstate->curexc_traceback = 0; #else PyErr_Fetch(type, value, tb); #endif } #if PY_MAJOR_VERSION < 3 static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, CYTHON_UNUSED PyObject *cause) { Py_XINCREF(type); Py_XINCREF(value); Py_XINCREF(tb); if (tb == Py_None) { Py_DECREF(tb); tb = 0; } else if (tb != NULL && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto raise_error; } if (value == NULL) { value = Py_None; Py_INCREF(value); } #if PY_VERSION_HEX < 0x02050000 if (!PyClass_Check(type)) #else if (!PyType_Check(type)) #endif { if (value != 
Py_None) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto raise_error; } Py_DECREF(value); value = type; #if PY_VERSION_HEX < 0x02050000 if (PyInstance_Check(type)) { type = (PyObject*) ((PyInstanceObject*)type)->in_class; Py_INCREF(type); } else { type = 0; PyErr_SetString(PyExc_TypeError, "raise: exception must be an old-style class or instance"); goto raise_error; } #else type = (PyObject*) Py_TYPE(type); Py_INCREF(type); if (!PyType_IsSubtype((PyTypeObject *)type, (PyTypeObject *)PyExc_BaseException)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto raise_error; } #endif } __Pyx_ErrRestore(type, value, tb); return; raise_error: Py_XDECREF(value); Py_XDECREF(type); Py_XDECREF(tb); return; } #else /* Python 3+ */ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause) { if (tb == Py_None) { tb = 0; } else if (tb && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto bad; } if (value == Py_None) value = 0; if (PyExceptionInstance_Check(type)) { if (value) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto bad; } value = type; type = (PyObject*) Py_TYPE(value); } else if (!PyExceptionClass_Check(type)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto bad; } if (cause) { PyObject *fixed_cause; if (PyExceptionClass_Check(cause)) { fixed_cause = PyObject_CallObject(cause, NULL); if (fixed_cause == NULL) goto bad; } else if (PyExceptionInstance_Check(cause)) { fixed_cause = cause; Py_INCREF(fixed_cause); } else { PyErr_SetString(PyExc_TypeError, "exception causes must derive from " "BaseException"); goto bad; } if (!value) { value = PyObject_CallObject(type, NULL); } PyException_SetCause(value, fixed_cause); } PyErr_SetObject(type, value); if (tb) { PyThreadState *tstate = 
PyThreadState_GET(); PyObject* tmp_tb = tstate->curexc_traceback; if (tb != tmp_tb) { Py_INCREF(tb); tstate->curexc_traceback = tb; Py_XDECREF(tmp_tb); } } bad: return; } #endif static void __Pyx_RaiseArgtupleInvalid( const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found) { Py_ssize_t num_expected; const char *more_or_less; if (num_found < num_min) { num_expected = num_min; more_or_less = "at least"; } else { num_expected = num_max; more_or_less = "at most"; } if (exact) { more_or_less = "exactly"; } PyErr_Format(PyExc_TypeError, "%s() takes %s %"PY_FORMAT_SIZE_T"d positional argument%s (%"PY_FORMAT_SIZE_T"d given)", func_name, more_or_less, num_expected, (num_expected == 1) ? "" : "s", num_found); } static void __Pyx_RaiseDoubleKeywordsError( const char* func_name, PyObject* kw_name) { PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION >= 3 "%s() got multiple values for keyword argument '%U'", func_name, kw_name); #else "%s() got multiple values for keyword argument '%s'", func_name, PyString_AS_STRING(kw_name)); #endif } static int __Pyx_ParseOptionalKeywords( PyObject *kwds, PyObject **argnames[], PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, const char* function_name) { PyObject *key = 0, *value = 0; Py_ssize_t pos = 0; PyObject*** name; PyObject*** first_kw_arg = argnames + num_pos_args; while (PyDict_Next(kwds, &pos, &key, &value)) { name = first_kw_arg; while (*name && (**name != key)) name++; if (*name) { values[name-argnames] = value; } else { #if PY_MAJOR_VERSION < 3 if (unlikely(!PyString_CheckExact(key)) && unlikely(!PyString_Check(key))) { #else if (unlikely(!PyUnicode_Check(key))) { #endif goto invalid_keyword_type; } else { for (name = first_kw_arg; *name; name++) { #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) break; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) break; 
#endif } if (*name) { values[name-argnames] = value; } else { for (name=argnames; name != first_kw_arg; name++) { if (**name == key) goto arg_passed_twice; #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) goto arg_passed_twice; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) goto arg_passed_twice; #endif } if (kwds2) { if (unlikely(PyDict_SetItem(kwds2, key, value))) goto bad; } else { goto invalid_keyword; } } } } } return 0; arg_passed_twice: __Pyx_RaiseDoubleKeywordsError(function_name, **name); goto bad; invalid_keyword_type: PyErr_Format(PyExc_TypeError, "%s() keywords must be strings", function_name); goto bad; invalid_keyword: PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION < 3 "%s() got an unexpected keyword argument '%s'", function_name, PyString_AsString(key)); #else "%s() got an unexpected keyword argument '%U'", function_name, key); #endif bad: return -1; } static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject* x) { const unsigned char neg_one = (unsigned char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to unsigned char" : "value too large to convert to unsigned char"); } return (unsigned char)-1; } return (unsigned char)val; } return (unsigned char)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject* x) { const unsigned short neg_one = (unsigned short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned short" : "value too large to convert to unsigned short"); } return (unsigned short)-1; } return (unsigned short)val; } return (unsigned short)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject* x) { const unsigned int neg_one = (unsigned int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned int" : "value too large to convert to unsigned int"); } return (unsigned int)-1; } return (unsigned int)val; } return (unsigned int)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject* x) { const char neg_one = (char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to char" : "value too large to convert to char"); } return (char)-1; } return (char)val; } return (char)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject* x) { const short neg_one = (short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to short" : "value too large to convert to short"); } return (short)-1; } return (short)val; } return (short)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject* x) { const signed char neg_one = (signed char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to signed char" : "value too large to convert to signed char"); } return (signed char)-1; } return (signed char)val; } return (signed char)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject* x) { const signed short neg_one = (signed short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed short" : "value too large to convert to signed short"); } return (signed short)-1; } return (signed short)val; } return (signed short)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject* x) { const signed int neg_one = (signed int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed int" : "value too large to convert to signed int"); } return (signed int)-1; } return (signed int)val; } return (signed int)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject* x) { const unsigned long neg_one = (unsigned long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)PyLong_AsUnsignedLong(x); } else { return (unsigned long)PyLong_AsLong(x); } } else { unsigned long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned long)-1; val = __Pyx_PyInt_AsUnsignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject* x) { const unsigned PY_LONG_LONG neg_one = (unsigned PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (unsigned PY_LONG_LONG)PyLong_AsLongLong(x); } } else { unsigned PY_LONG_LONG val; 
PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned PY_LONG_LONG)-1; val = __Pyx_PyInt_AsUnsignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject* x) { const long neg_one = (long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)PyLong_AsUnsignedLong(x); } else { return (long)PyLong_AsLong(x); } } else { long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (long)-1; val = __Pyx_PyInt_AsLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject* x) { const PY_LONG_LONG neg_one = (PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (PY_LONG_LONG)PyLong_AsLongLong(x); } } else { PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1; val = __Pyx_PyInt_AsLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject* x) { const signed 
long neg_one = (signed long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)PyLong_AsUnsignedLong(x); } else { return (signed long)PyLong_AsLong(x); } } else { signed long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed long)-1; val = __Pyx_PyInt_AsSignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject* x) { const signed PY_LONG_LONG neg_one = (signed PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (signed PY_LONG_LONG)PyLong_AsLongLong(x); } } else { signed PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed PY_LONG_LONG)-1; val = __Pyx_PyInt_AsSignedLongLong(tmp); Py_DECREF(tmp); return val; } } static void __Pyx_WriteUnraisable(const char *name, int clineno, int lineno, const char *filename) { PyObject *old_exc, *old_val, *old_tb; 
PyObject *ctx; __Pyx_ErrFetch(&old_exc, &old_val, &old_tb); #if PY_MAJOR_VERSION < 3 ctx = PyString_FromString(name); #else ctx = PyUnicode_FromString(name); #endif __Pyx_ErrRestore(old_exc, old_val, old_tb); if (!ctx) { PyErr_WriteUnraisable(Py_None); } else { PyErr_WriteUnraisable(ctx); Py_DECREF(ctx); } } static int __Pyx_check_binary_version(void) { char ctversion[4], rtversion[4]; PyOS_snprintf(ctversion, 4, "%d.%d", PY_MAJOR_VERSION, PY_MINOR_VERSION); PyOS_snprintf(rtversion, 4, "%s", Py_GetVersion()); if (ctversion[0] != rtversion[0] || ctversion[2] != rtversion[2]) { char message[200]; PyOS_snprintf(message, sizeof(message), "compiletime version %s of module '%.100s' " "does not match runtime version %s", ctversion, __Pyx_MODULE_NAME, rtversion); #if PY_VERSION_HEX < 0x02050000 return PyErr_Warn(NULL, message); #else return PyErr_WarnEx(NULL, message, 1); #endif } return 0; } static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line) { int start = 0, mid = 0, end = count - 1; if (end >= 0 && code_line > entries[end].code_line) { return count; } while (start < end) { mid = (start + end) / 2; if (code_line < entries[mid].code_line) { end = mid; } else if (code_line > entries[mid].code_line) { start = mid + 1; } else { return mid; } } if (code_line <= entries[mid].code_line) { return mid; } else { return mid + 1; } } static PyCodeObject *__pyx_find_code_object(int code_line) { PyCodeObject* code_object; int pos; if (unlikely(!code_line) || unlikely(!__pyx_code_cache.entries)) { return NULL; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if (unlikely(pos >= __pyx_code_cache.count) || unlikely(__pyx_code_cache.entries[pos].code_line != code_line)) { return NULL; } code_object = __pyx_code_cache.entries[pos].code_object; Py_INCREF(code_object); return code_object; } static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object) { int pos, i; 
__Pyx_CodeObjectCacheEntry* entries = __pyx_code_cache.entries; if (unlikely(!code_line)) { return; } if (unlikely(!entries)) { entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Malloc(64*sizeof(__Pyx_CodeObjectCacheEntry)); if (likely(entries)) { __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = 64; __pyx_code_cache.count = 1; entries[0].code_line = code_line; entries[0].code_object = code_object; Py_INCREF(code_object); } return; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if ((pos < __pyx_code_cache.count) && unlikely(__pyx_code_cache.entries[pos].code_line == code_line)) { PyCodeObject* tmp = entries[pos].code_object; entries[pos].code_object = code_object; Py_DECREF(tmp); return; } if (__pyx_code_cache.count == __pyx_code_cache.max_count) { int new_max = __pyx_code_cache.max_count + 64; entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Realloc( __pyx_code_cache.entries, new_max*sizeof(__Pyx_CodeObjectCacheEntry)); if (unlikely(!entries)) { return; } __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = new_max; } for (i=__pyx_code_cache.count; i>pos; i--) { entries[i] = entries[i-1]; } entries[pos].code_line = code_line; entries[pos].code_object = code_object; __pyx_code_cache.count++; Py_INCREF(code_object); } #include "compile.h" #include "frameobject.h" #include "traceback.h" static PyCodeObject* __Pyx_CreateCodeObjectForTraceback( const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_srcfile = 0; PyObject *py_funcname = 0; #if PY_MAJOR_VERSION < 3 py_srcfile = PyString_FromString(filename); #else py_srcfile = PyUnicode_FromString(filename); #endif if (!py_srcfile) goto bad; if (c_line) { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #else py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #endif } else { #if PY_MAJOR_VERSION < 3 py_funcname 
= PyString_FromString(funcname); #else py_funcname = PyUnicode_FromString(funcname); #endif } if (!py_funcname) goto bad; py_code = __Pyx_PyCode_New( 0, /*int argcount,*/ 0, /*int kwonlyargcount,*/ 0, /*int nlocals,*/ 0, /*int stacksize,*/ 0, /*int flags,*/ __pyx_empty_bytes, /*PyObject *code,*/ __pyx_empty_tuple, /*PyObject *consts,*/ __pyx_empty_tuple, /*PyObject *names,*/ __pyx_empty_tuple, /*PyObject *varnames,*/ __pyx_empty_tuple, /*PyObject *freevars,*/ __pyx_empty_tuple, /*PyObject *cellvars,*/ py_srcfile, /*PyObject *filename,*/ py_funcname, /*PyObject *name,*/ py_line, /*int firstlineno,*/ __pyx_empty_bytes /*PyObject *lnotab*/ ); Py_DECREF(py_srcfile); Py_DECREF(py_funcname); return py_code; bad: Py_XDECREF(py_srcfile); Py_XDECREF(py_funcname); return NULL; } static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_globals = 0; PyFrameObject *py_frame = 0; py_code = __pyx_find_code_object(c_line ? c_line : py_line); if (!py_code) { py_code = __Pyx_CreateCodeObjectForTraceback( funcname, c_line, py_line, filename); if (!py_code) goto bad; __pyx_insert_code_object(c_line ? 
c_line : py_line, py_code); } py_globals = PyModule_GetDict(__pyx_m); if (!py_globals) goto bad; py_frame = PyFrame_New( PyThreadState_GET(), /*PyThreadState *tstate,*/ py_code, /*PyCodeObject *code,*/ py_globals, /*PyObject *globals,*/ 0 /*PyObject *locals*/ ); if (!py_frame) goto bad; py_frame->f_lineno = py_line; PyTraceBack_Here(py_frame); bad: Py_XDECREF(py_code); Py_XDECREF(py_frame); } static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) { while (t->p) { #if PY_MAJOR_VERSION < 3 if (t->is_unicode) { *t->p = PyUnicode_DecodeUTF8(t->s, t->n - 1, NULL); } else if (t->intern) { *t->p = PyString_InternFromString(t->s); } else { *t->p = PyString_FromStringAndSize(t->s, t->n - 1); } #else /* Python 3+ has unicode identifiers */ if (t->is_unicode | t->is_str) { if (t->intern) { *t->p = PyUnicode_InternFromString(t->s); } else if (t->encoding) { *t->p = PyUnicode_Decode(t->s, t->n - 1, t->encoding, NULL); } else { *t->p = PyUnicode_FromStringAndSize(t->s, t->n - 1); } } else { *t->p = PyBytes_FromStringAndSize(t->s, t->n - 1); } #endif if (!*t->p) return -1; ++t; } return 0; } /* Type Conversion Functions */ static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) { int is_true = x == Py_True; if (is_true | (x == Py_False) | (x == Py_None)) return is_true; else return PyObject_IsTrue(x); } static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x) { PyNumberMethods *m; const char *name = NULL; PyObject *res = NULL; #if PY_VERSION_HEX < 0x03000000 if (PyInt_Check(x) || PyLong_Check(x)) #else if (PyLong_Check(x)) #endif return Py_INCREF(x), x; m = Py_TYPE(x)->tp_as_number; #if PY_VERSION_HEX < 0x03000000 if (m && m->nb_int) { name = "int"; res = PyNumber_Int(x); } else if (m && m->nb_long) { name = "long"; res = PyNumber_Long(x); } #else if (m && m->nb_int) { name = "int"; res = PyNumber_Long(x); } #endif if (res) { #if PY_VERSION_HEX < 0x03000000 if (!PyInt_Check(res) && !PyLong_Check(res)) { #else if (!PyLong_Check(res)) { #endif PyErr_Format(PyExc_TypeError, 
"__%s__ returned non-%s (type %.200s)", name, name, Py_TYPE(res)->tp_name); Py_DECREF(res); return NULL; } } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_TypeError, "an integer is required"); } return res; } static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) { Py_ssize_t ival; PyObject* x = PyNumber_Index(b); if (!x) return -1; ival = PyInt_AsSsize_t(x); Py_DECREF(x); return ival; } static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) { #if PY_VERSION_HEX < 0x02050000 if (ival <= LONG_MAX) return PyInt_FromLong((long)ival); else { unsigned char *bytes = (unsigned char *) &ival; int one = 1; int little = (int)*(unsigned char*)&one; return _PyLong_FromByteArray(bytes, sizeof(size_t), little, 0); } #else return PyInt_FromSize_t(ival); #endif } static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject* x) { unsigned PY_LONG_LONG val = __Pyx_PyInt_AsUnsignedLongLong(x); if (unlikely(val == (unsigned PY_LONG_LONG)-1 && PyErr_Occurred())) { return (size_t)-1; } else if (unlikely(val != (unsigned PY_LONG_LONG)(size_t)val)) { PyErr_SetString(PyExc_OverflowError, "value too large to convert to size_t"); return (size_t)-1; } return (size_t)val; } #endif /* Py_PYTHON_H */ PyCogent-1.5.3/cogent/evolve/_likelihood_tree.pyx000644 000765 000024 00000007005 12024702176 023065 0ustar00jrideoutstaff000000 000000 include "../../include/numerical_pyrex.pyx" version_info = (2, 1) __version__ = "('1', '5', '3')" cdef extern from "math.h": double log (double x) def sumInputLikelihoods(child_indexes, result, likelihoods): # M is dim of alphabet, S is non-redundandt parent seq length, # U is length cdef int M, S, U, m, i, u cdef int c cdef double *values_data cdef long *index_data cdef double *target_data M = S = 0 target_data = checkArrayDouble2D(result, &S, &M) first = 1 c = 0 for index in child_indexes: U = 0 index_data = checkArrayLong1D(index, &S) values_data = checkArrayDouble2D(likelihoods[c], &U, &M) #if index_data[S-1] >= U: # raise 
RangeError if c == 0: for i from 0 <= i < S: u = index_data[i] for m from 0 <= m < M: target_data[M*i+m] = values_data[M*u+m] else: for i from 0 <= i < S: # col of parent data u = index_data[i] # col of child's data for m from 0 <= m < M: target_data[M*i+m] *= values_data[M*u+m] c += 1 return result def getTotalLogLikelihood(counts, input_likelihoods, mprobs): cdef int S, M, i, m cdef double posn, total cdef double *likelihoods_data, *mprobs_data, *weights_data S = M = 0 mprobs_data = checkArrayDouble1D(mprobs, &M) weights_data = checkArrayDouble1D(counts, &S) likelihoods_data = checkArrayDouble2D(input_likelihoods, &S, &M) total = 0.0 for i from 0 <= i < S: posn = 0.0 for m from 0 <= m < M: posn += likelihoods_data[i*M+m] * mprobs_data[m] total += log(posn)*weights_data[i] return total def getLogSumAcrossSites(counts, input_likelihoods): cdef int S, i cdef double total cdef double *likelihoods_data, *weights_data S = 0 weights_data = checkArrayDouble1D(counts, &S) likelihoods_data = checkArrayDouble1D(input_likelihoods, &S) total = 0.0 for i from 0 <= i < S: total += log(likelihoods_data[i])*weights_data[i] return total def logDotReduce(index, patch_probs, switch_probs, plhs): cdef int site, i, j, k, n, uniq, exponent, length, most_probable_state cdef double result, BASE cdef double *sp, *pl, *state, *prev, *tmp cdef long *index_data cdef object patch_probs1, patch_probs2 BASE = 2.0 ** 1000 patch_probs1 = patch_probs.copy() patch_probs2 = patch_probs.copy() n = uniq = length = 0 state = checkArrayDouble1D(patch_probs1, &n) prev = checkArrayDouble1D(patch_probs2, &n) sp = checkArrayDouble2D(switch_probs, &n, &n) pl = checkArrayDouble2D(plhs, &uniq, &n) index_data = checkArrayLong1D(index, &length) exponent = 0 for site from 0 <= site < length: k = index_data[site] if k >= uniq: raise ValueError((k, uniq)) tmp = prev prev = state state = tmp most_probable_state = 0 for i from 0 <= i < n: state[i] = 0 for j from 0 <= j < n: state[i] += prev[j] * sp[j*n+i] state[i] *= 
pl[k*n+i] if state[i] > state[most_probable_state]: most_probable_state = i while state[most_probable_state] < 1.0: for i from 0 <= i < n: state[i] *= BASE exponent += -1 result = 0.0 for i from 0 <= i < n: result += state[i] return log(result) + exponent * log(BASE) PyCogent-1.5.3/cogent/evolve/_pairwise_distance.c000644 000765 000024 00000646030 12024702176 023031 0ustar00jrideoutstaff000000 000000 /* Generated by Cython 0.16 on Fri Sep 14 12:12:06 2012 */ #define PY_SSIZE_T_CLEAN #include "Python.h" #ifndef Py_PYTHON_H #error Python headers needed to compile C extensions, please install development version of Python. #elif PY_VERSION_HEX < 0x02040000 #error Cython requires Python 2.4+. #else #include /* For offsetof */ #ifndef offsetof #define offsetof(type, member) ( (size_t) & ((type*)0) -> member ) #endif #if !defined(WIN32) && !defined(MS_WINDOWS) #ifndef __stdcall #define __stdcall #endif #ifndef __cdecl #define __cdecl #endif #ifndef __fastcall #define __fastcall #endif #endif #ifndef DL_IMPORT #define DL_IMPORT(t) t #endif #ifndef DL_EXPORT #define DL_EXPORT(t) t #endif #ifndef PY_LONG_LONG #define PY_LONG_LONG LONG_LONG #endif #ifndef Py_HUGE_VAL #define Py_HUGE_VAL HUGE_VAL #endif #ifdef PYPY_VERSION #define CYTHON_COMPILING_IN_PYPY 1 #define CYTHON_COMPILING_IN_CPYTHON 0 #else #define CYTHON_COMPILING_IN_PYPY 0 #define CYTHON_COMPILING_IN_CPYTHON 1 #endif #if CYTHON_COMPILING_IN_PYPY #define __Pyx_PyCFunction_Call PyObject_Call #else #define __Pyx_PyCFunction_Call PyCFunction_Call #endif #if PY_VERSION_HEX < 0x02050000 typedef int Py_ssize_t; #define PY_SSIZE_T_MAX INT_MAX #define PY_SSIZE_T_MIN INT_MIN #define PY_FORMAT_SIZE_T "" #define PyInt_FromSsize_t(z) PyInt_FromLong(z) #define PyInt_AsSsize_t(o) __Pyx_PyInt_AsInt(o) #define PyNumber_Index(o) PyNumber_Int(o) #define PyIndex_Check(o) PyNumber_Check(o) #define PyErr_WarnEx(category, message, stacklevel) PyErr_Warn(category, message) #define __PYX_BUILD_PY_SSIZE_T "i" #else #define 
__PYX_BUILD_PY_SSIZE_T "n" #endif #if PY_VERSION_HEX < 0x02060000 #define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt) #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type) #define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size) #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size, #define PyType_Modified(t) typedef struct { void *buf; PyObject *obj; Py_ssize_t len; Py_ssize_t itemsize; int readonly; int ndim; char *format; Py_ssize_t *shape; Py_ssize_t *strides; Py_ssize_t *suboffsets; void *internal; } Py_buffer; #define PyBUF_SIMPLE 0 #define PyBUF_WRITABLE 0x0001 #define PyBUF_FORMAT 0x0004 #define PyBUF_ND 0x0008 #define PyBUF_STRIDES (0x0010 | PyBUF_ND) #define PyBUF_C_CONTIGUOUS (0x0020 | PyBUF_STRIDES) #define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES) #define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES) #define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES) #define PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_FORMAT | PyBUF_WRITABLE) #define PyBUF_FULL (PyBUF_INDIRECT | PyBUF_FORMAT | PyBUF_WRITABLE) typedef int (*getbufferproc)(PyObject *, Py_buffer *, int); typedef void (*releasebufferproc)(PyObject *, Py_buffer *); #endif #if PY_MAJOR_VERSION < 3 #define __Pyx_BUILTIN_MODULE_NAME "__builtin__" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #else #define __Pyx_BUILTIN_MODULE_NAME "builtins" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #endif #if PY_MAJOR_VERSION < 3 && PY_MINOR_VERSION < 6 #define PyUnicode_FromString(s) PyUnicode_Decode(s, strlen(s), "UTF-8", "strict") #endif #if PY_MAJOR_VERSION >= 3 #define Py_TPFLAGS_CHECKTYPES 0 #define Py_TPFLAGS_HAVE_INDEX 0 #endif #if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3) #define Py_TPFLAGS_HAVE_NEWBUFFER 0 #endif #if PY_VERSION_HEX > 0x03030000 && 
defined(PyUnicode_GET_LENGTH) #define CYTHON_PEP393_ENABLED 1 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_LENGTH(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) PyUnicode_READ_CHAR(u, i) #else #define CYTHON_PEP393_ENABLED 0 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_SIZE(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) ((Py_UCS4)(PyUnicode_AS_UNICODE(u)[i])) #endif #if PY_MAJOR_VERSION >= 3 #define PyBaseString_Type PyUnicode_Type #define PyStringObject PyUnicodeObject #define PyString_Type PyUnicode_Type #define PyString_Check PyUnicode_Check #define PyString_CheckExact PyUnicode_CheckExact #endif #if PY_VERSION_HEX < 0x02060000 #define PyBytesObject PyStringObject #define PyBytes_Type PyString_Type #define PyBytes_Check PyString_Check #define PyBytes_CheckExact PyString_CheckExact #define PyBytes_FromString PyString_FromString #define PyBytes_FromStringAndSize PyString_FromStringAndSize #define PyBytes_FromFormat PyString_FromFormat #define PyBytes_DecodeEscape PyString_DecodeEscape #define PyBytes_AsString PyString_AsString #define PyBytes_AsStringAndSize PyString_AsStringAndSize #define PyBytes_Size PyString_Size #define PyBytes_AS_STRING PyString_AS_STRING #define PyBytes_GET_SIZE PyString_GET_SIZE #define PyBytes_Repr PyString_Repr #define PyBytes_Concat PyString_Concat #define PyBytes_ConcatAndDel PyString_ConcatAndDel #endif #if PY_VERSION_HEX < 0x02060000 #define PySet_Check(obj) PyObject_TypeCheck(obj, &PySet_Type) #define PyFrozenSet_Check(obj) PyObject_TypeCheck(obj, &PyFrozenSet_Type) #endif #ifndef PySet_CheckExact #define PySet_CheckExact(obj) (Py_TYPE(obj) == &PySet_Type) #endif #define __Pyx_TypeCheck(obj, type) PyObject_TypeCheck(obj, (PyTypeObject *)type) #if PY_MAJOR_VERSION >= 3 #define PyIntObject PyLongObject #define PyInt_Type PyLong_Type #define PyInt_Check(op) PyLong_Check(op) #define PyInt_CheckExact(op) PyLong_CheckExact(op) #define PyInt_FromString PyLong_FromString #define PyInt_FromUnicode PyLong_FromUnicode #define 
PyInt_FromLong PyLong_FromLong #define PyInt_FromSize_t PyLong_FromSize_t #define PyInt_FromSsize_t PyLong_FromSsize_t #define PyInt_AsLong PyLong_AsLong #define PyInt_AS_LONG PyLong_AS_LONG #define PyInt_AsSsize_t PyLong_AsSsize_t #define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask #endif #if PY_MAJOR_VERSION >= 3 #define PyBoolObject PyLongObject #endif #if PY_VERSION_HEX < 0x03020000 typedef long Py_hash_t; #define __Pyx_PyInt_FromHash_t PyInt_FromLong #define __Pyx_PyInt_AsHash_t PyInt_AsLong #else #define __Pyx_PyInt_FromHash_t PyInt_FromSsize_t #define __Pyx_PyInt_AsHash_t PyInt_AsSsize_t #endif #if (PY_MAJOR_VERSION < 3) || (PY_VERSION_HEX >= 0x03010300) #define __Pyx_PySequence_GetSlice(obj, a, b) PySequence_GetSlice(obj, a, b) #define __Pyx_PySequence_SetSlice(obj, a, b, value) PySequence_SetSlice(obj, a, b, value) #define __Pyx_PySequence_DelSlice(obj, a, b) PySequence_DelSlice(obj, a, b) #else #define __Pyx_PySequence_GetSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), (PyObject*)0) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_GetSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object is unsliceable", (obj)->ob_type->tp_name), (PyObject*)0))) #define __Pyx_PySequence_SetSlice(obj, a, b, value) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_SetSlice(obj, a, b, value)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice assignment", (obj)->ob_type->tp_name), -1))) #define __Pyx_PySequence_DelSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? 
(PySequence_DelSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice deletion", (obj)->ob_type->tp_name), -1))) #endif #if PY_MAJOR_VERSION >= 3 #define PyMethod_New(func, self, klass) ((self) ? PyMethod_New(func, self) : PyInstanceMethod_New(func)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),((char *)(n))) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),((char *)(n)),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),((char *)(n))) #else #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),(n)) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),(n),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),(n)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_NAMESTR(n) ((char *)(n)) #define __Pyx_DOCSTR(n) ((char *)(n)) #else #define __Pyx_NAMESTR(n) (n) #define __Pyx_DOCSTR(n) (n) #endif #if PY_MAJOR_VERSION >= 3 #define __Pyx_PyNumber_Divide(x,y) PyNumber_TrueDivide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceTrueDivide(x,y) #else #define __Pyx_PyNumber_Divide(x,y) PyNumber_Divide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceDivide(x,y) #endif #ifndef __PYX_EXTERN_C #ifdef __cplusplus #define __PYX_EXTERN_C extern "C" #else #define __PYX_EXTERN_C extern #endif #endif #if defined(WIN32) || defined(MS_WINDOWS) #define _USE_MATH_DEFINES #endif #include <math.h> #define __PYX_HAVE__cogent__evolve___pairwise_distance #define __PYX_HAVE_API__cogent__evolve___pairwise_distance #include "stdio.h" #include "stdlib.h" #include "numpy/arrayobject.h" #include "numpy/ufuncobject.h" #ifdef _OPENMP #include <omp.h> #endif /* _OPENMP */ #ifdef PYREX_WITHOUT_ASSERTIONS #define CYTHON_WITHOUT_ASSERTIONS #endif /* inline attribute */ #ifndef CYTHON_INLINE #if defined(__GNUC__) #define CYTHON_INLINE __inline__ #elif defined(_MSC_VER) #define CYTHON_INLINE __inline #elif defined (__STDC_VERSION__) && 
__STDC_VERSION__ >= 199901L #define CYTHON_INLINE inline #else #define CYTHON_INLINE #endif #endif /* unused attribute */ #ifndef CYTHON_UNUSED # if defined(__GNUC__) # if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif # elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif #endif typedef struct {PyObject **p; char *s; const long n; const char* encoding; const char is_unicode; const char is_str; const char intern; } __Pyx_StringTabEntry; /*proto*/ /* Type Conversion Predeclarations */ #define __Pyx_PyBytes_FromUString(s) PyBytes_FromString((char*)s) #define __Pyx_PyBytes_AsUString(s) ((unsigned char*) PyBytes_AsString(s)) #define __Pyx_Owned_Py_None(b) (Py_INCREF(Py_None), Py_None) #define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False)) static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject*); static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x); static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*); static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t); static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject*); #define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x)) #define __pyx_PyFloat_AsFloat(x) ((float) __pyx_PyFloat_AsDouble(x)) #ifdef __GNUC__ /* Test for GCC > 2.95 */ #if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)) #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0) #else /* __GNUC__ > 2 ... */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ > 2 ... 
*/ #else /* __GNUC__ */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ */ static PyObject *__pyx_m; static PyObject *__pyx_b; static PyObject *__pyx_empty_tuple; static PyObject *__pyx_empty_bytes; static int __pyx_lineno; static int __pyx_clineno = 0; static const char * __pyx_cfilenm= __FILE__; static const char *__pyx_filename; #if !defined(CYTHON_CCOMPLEX) #if defined(__cplusplus) #define CYTHON_CCOMPLEX 1 #elif defined(_Complex_I) #define CYTHON_CCOMPLEX 1 #else #define CYTHON_CCOMPLEX 0 #endif #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus #include <complex> #else #include <complex.h> #endif #endif #if CYTHON_CCOMPLEX && !defined(__cplusplus) && defined(__sun__) && defined(__GNUC__) #undef _Complex_I #define _Complex_I 1.0fj #endif static const char *__pyx_f[] = { "_pairwise_distance.pyx", "numpy.pxd", }; #define IS_UNSIGNED(type) (((type) -1) > 0) struct __Pyx_StructField_; #define __PYX_BUF_FLAGS_PACKED_STRUCT (1 << 0) typedef struct { const char* name; /* for error messages only */ struct __Pyx_StructField_* fields; size_t size; /* sizeof(type) */ size_t arraysize[8]; /* length of array in each dimension */ int ndim; char typegroup; /* _R_eal, _C_omplex, Signed _I_nt, _U_nsigned int, _S_truct, _P_ointer, _O_bject */ char is_unsigned; int flags; } __Pyx_TypeInfo; typedef struct __Pyx_StructField_ { __Pyx_TypeInfo* type; const char* name; size_t offset; } __Pyx_StructField; typedef struct { __Pyx_StructField* field; size_t parent_offset; } __Pyx_BufFmt_StackElem; typedef struct { __Pyx_StructField root; __Pyx_BufFmt_StackElem* head; size_t fmt_offset; size_t new_count, enc_count; size_t struct_alignment; int is_complex; char enc_type; char new_packmode; char enc_packmode; char is_valid_array; } __Pyx_BufFmt_Context; /* "numpy.pxd":722 * # in Cython to enable them only on the right systems. 
* * ctypedef npy_int8 int8_t # <<<<<<<<<<<<<< * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t */ typedef npy_int8 __pyx_t_5numpy_int8_t; /* "numpy.pxd":723 * * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t # <<<<<<<<<<<<<< * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t */ typedef npy_int16 __pyx_t_5numpy_int16_t; /* "numpy.pxd":724 * ctypedef npy_int8 int8_t * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t # <<<<<<<<<<<<<< * ctypedef npy_int64 int64_t * #ctypedef npy_int96 int96_t */ typedef npy_int32 __pyx_t_5numpy_int32_t; /* "numpy.pxd":725 * ctypedef npy_int16 int16_t * ctypedef npy_int32 int32_t * ctypedef npy_int64 int64_t # <<<<<<<<<<<<<< * #ctypedef npy_int96 int96_t * #ctypedef npy_int128 int128_t */ typedef npy_int64 __pyx_t_5numpy_int64_t; /* "numpy.pxd":729 * #ctypedef npy_int128 int128_t * * ctypedef npy_uint8 uint8_t # <<<<<<<<<<<<<< * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t */ typedef npy_uint8 __pyx_t_5numpy_uint8_t; /* "numpy.pxd":730 * * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t # <<<<<<<<<<<<<< * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t */ typedef npy_uint16 __pyx_t_5numpy_uint16_t; /* "numpy.pxd":731 * ctypedef npy_uint8 uint8_t * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t # <<<<<<<<<<<<<< * ctypedef npy_uint64 uint64_t * #ctypedef npy_uint96 uint96_t */ typedef npy_uint32 __pyx_t_5numpy_uint32_t; /* "numpy.pxd":732 * ctypedef npy_uint16 uint16_t * ctypedef npy_uint32 uint32_t * ctypedef npy_uint64 uint64_t # <<<<<<<<<<<<<< * #ctypedef npy_uint96 uint96_t * #ctypedef npy_uint128 uint128_t */ typedef npy_uint64 __pyx_t_5numpy_uint64_t; /* "numpy.pxd":736 * #ctypedef npy_uint128 uint128_t * * ctypedef npy_float32 float32_t # <<<<<<<<<<<<<< * ctypedef npy_float64 float64_t * #ctypedef npy_float80 float80_t */ typedef npy_float32 __pyx_t_5numpy_float32_t; /* "numpy.pxd":737 * * ctypedef npy_float32 float32_t * ctypedef npy_float64 float64_t 
# <<<<<<<<<<<<<< * #ctypedef npy_float80 float80_t * #ctypedef npy_float128 float128_t */ typedef npy_float64 __pyx_t_5numpy_float64_t; /* "numpy.pxd":746 * # The int types are mapped a bit surprising -- * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t # <<<<<<<<<<<<<< * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t */ typedef npy_long __pyx_t_5numpy_int_t; /* "numpy.pxd":747 * # numpy.int corresponds to 'l' and numpy.long to 'q' * ctypedef npy_long int_t * ctypedef npy_longlong long_t # <<<<<<<<<<<<<< * ctypedef npy_longlong longlong_t * */ typedef npy_longlong __pyx_t_5numpy_long_t; /* "numpy.pxd":748 * ctypedef npy_long int_t * ctypedef npy_longlong long_t * ctypedef npy_longlong longlong_t # <<<<<<<<<<<<<< * * ctypedef npy_ulong uint_t */ typedef npy_longlong __pyx_t_5numpy_longlong_t; /* "numpy.pxd":750 * ctypedef npy_longlong longlong_t * * ctypedef npy_ulong uint_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t */ typedef npy_ulong __pyx_t_5numpy_uint_t; /* "numpy.pxd":751 * * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t # <<<<<<<<<<<<<< * ctypedef npy_ulonglong ulonglong_t * */ typedef npy_ulonglong __pyx_t_5numpy_ulong_t; /* "numpy.pxd":752 * ctypedef npy_ulong uint_t * ctypedef npy_ulonglong ulong_t * ctypedef npy_ulonglong ulonglong_t # <<<<<<<<<<<<<< * * ctypedef npy_intp intp_t */ typedef npy_ulonglong __pyx_t_5numpy_ulonglong_t; /* "numpy.pxd":754 * ctypedef npy_ulonglong ulonglong_t * * ctypedef npy_intp intp_t # <<<<<<<<<<<<<< * ctypedef npy_uintp uintp_t * */ typedef npy_intp __pyx_t_5numpy_intp_t; /* "numpy.pxd":755 * * ctypedef npy_intp intp_t * ctypedef npy_uintp uintp_t # <<<<<<<<<<<<<< * * ctypedef npy_double float_t */ typedef npy_uintp __pyx_t_5numpy_uintp_t; /* "numpy.pxd":757 * ctypedef npy_uintp uintp_t * * ctypedef npy_double float_t # <<<<<<<<<<<<<< * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t */ typedef 
npy_double __pyx_t_5numpy_float_t; /* "numpy.pxd":758 * * ctypedef npy_double float_t * ctypedef npy_double double_t # <<<<<<<<<<<<<< * ctypedef npy_longdouble longdouble_t * */ typedef npy_double __pyx_t_5numpy_double_t; /* "numpy.pxd":759 * ctypedef npy_double float_t * ctypedef npy_double double_t * ctypedef npy_longdouble longdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cfloat cfloat_t */ typedef npy_longdouble __pyx_t_5numpy_longdouble_t; #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< float > __pyx_t_float_complex; #else typedef float _Complex __pyx_t_float_complex; #endif #else typedef struct { float real, imag; } __pyx_t_float_complex; #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus typedef ::std::complex< double > __pyx_t_double_complex; #else typedef double _Complex __pyx_t_double_complex; #endif #else typedef struct { double real, imag; } __pyx_t_double_complex; #endif /*--- Type declarations ---*/ /* "numpy.pxd":761 * ctypedef npy_longdouble longdouble_t * * ctypedef npy_cfloat cfloat_t # <<<<<<<<<<<<<< * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t */ typedef npy_cfloat __pyx_t_5numpy_cfloat_t; /* "numpy.pxd":762 * * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t # <<<<<<<<<<<<<< * ctypedef npy_clongdouble clongdouble_t * */ typedef npy_cdouble __pyx_t_5numpy_cdouble_t; /* "numpy.pxd":763 * ctypedef npy_cfloat cfloat_t * ctypedef npy_cdouble cdouble_t * ctypedef npy_clongdouble clongdouble_t # <<<<<<<<<<<<<< * * ctypedef npy_cdouble complex_t */ typedef npy_clongdouble __pyx_t_5numpy_clongdouble_t; /* "numpy.pxd":765 * ctypedef npy_clongdouble clongdouble_t * * ctypedef npy_cdouble complex_t # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew1(a): */ typedef npy_cdouble __pyx_t_5numpy_complex_t; #ifndef CYTHON_REFNANNY #define CYTHON_REFNANNY 0 #endif #if CYTHON_REFNANNY typedef struct { void (*INCREF)(void*, PyObject*, int); void (*DECREF)(void*, PyObject*, int); void (*GOTREF)(void*, 
PyObject*, int); void (*GIVEREF)(void*, PyObject*, int); void* (*SetupContext)(const char*, int, const char*); void (*FinishContext)(void**); } __Pyx_RefNannyAPIStruct; static __Pyx_RefNannyAPIStruct *__Pyx_RefNanny = NULL; static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname); /*proto*/ #define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL; #ifdef WITH_THREAD #define __Pyx_RefNannySetupContext(name, acquire_gil) \ if (acquire_gil) { \ PyGILState_STATE __pyx_gilstate_save = PyGILState_Ensure(); \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ PyGILState_Release(__pyx_gilstate_save); \ } else { \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ } #else #define __Pyx_RefNannySetupContext(name, acquire_gil) \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__) #endif #define __Pyx_RefNannyFinishContext() \ __Pyx_RefNanny->FinishContext(&__pyx_refnanny) #define __Pyx_INCREF(r) __Pyx_RefNanny->INCREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_DECREF(r) __Pyx_RefNanny->DECREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GOTREF(r) __Pyx_RefNanny->GOTREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GIVEREF(r) __Pyx_RefNanny->GIVEREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_XINCREF(r) do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0) #define __Pyx_XDECREF(r) do { if((r) != NULL) {__Pyx_DECREF(r); }} while(0) #define __Pyx_XGOTREF(r) do { if((r) != NULL) {__Pyx_GOTREF(r); }} while(0) #define __Pyx_XGIVEREF(r) do { if((r) != NULL) {__Pyx_GIVEREF(r);}} while(0) #else #define __Pyx_RefNannyDeclarations #define __Pyx_RefNannySetupContext(name, acquire_gil) #define __Pyx_RefNannyFinishContext() #define __Pyx_INCREF(r) Py_INCREF(r) #define __Pyx_DECREF(r) Py_DECREF(r) #define __Pyx_GOTREF(r) #define __Pyx_GIVEREF(r) #define __Pyx_XINCREF(r) Py_XINCREF(r) #define __Pyx_XDECREF(r) Py_XDECREF(r) #define __Pyx_XGOTREF(r) 
#define __Pyx_XGIVEREF(r) #endif /* CYTHON_REFNANNY */ #define __Pyx_CLEAR(r) do { PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);} while(0) #define __Pyx_XCLEAR(r) do { if((r) != NULL) {PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);}} while(0) static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name); /*proto*/ static void __Pyx_RaiseArgtupleInvalid(const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found); /*proto*/ static void __Pyx_RaiseDoubleKeywordsError(const char* func_name, PyObject* kw_name); /*proto*/ static int __Pyx_ParseOptionalKeywords(PyObject *kwds, PyObject **argnames[], \ PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, \ const char* function_name); /*proto*/ static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact); /*proto*/ static CYTHON_INLINE int __Pyx_GetBufferAndValidate(Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack); static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info); static void __Pyx_RaiseBufferIndexError(int axis); /*proto*/ #define __Pyx_BufPtrStrided1d(type, buf, i0, s0) (type)((char*)buf + i0 * s0) #define __Pyx_BufPtrStrided2d(type, buf, i0, s0, i1, s1) (type)((char*)buf + i0 * s0 + i1 * s1) static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb); /*proto*/ static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb); /*proto*/ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause); /*proto*/ static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index); static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected); static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void); static void __Pyx_UnpackTupleError(PyObject *, Py_ssize_t index); /*proto*/ static CYTHON_INLINE int 
__Pyx_TypeTest(PyObject *obj, PyTypeObject *type); /*proto*/ typedef struct { Py_ssize_t shape, strides, suboffsets; } __Pyx_Buf_DimInfo; typedef struct { size_t refcount; Py_buffer pybuffer; } __Pyx_Buffer; typedef struct { __Pyx_Buffer *rcbuffer; char *data; __Pyx_Buf_DimInfo diminfo[8]; } __Pyx_LocalBuf_ND; #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags); static void __Pyx_ReleaseBuffer(Py_buffer *view); #else #define __Pyx_GetBuffer PyObject_GetBuffer #define __Pyx_ReleaseBuffer PyBuffer_Release #endif static Py_ssize_t __Pyx_zeros[] = {0, 0, 0, 0, 0, 0, 0, 0}; static Py_ssize_t __Pyx_minusones[] = {-1, -1, -1, -1, -1, -1, -1, -1}; #if CYTHON_CCOMPLEX #ifdef __cplusplus #define __Pyx_CREAL(z) ((z).real()) #define __Pyx_CIMAG(z) ((z).imag()) #else #define __Pyx_CREAL(z) (__real__(z)) #define __Pyx_CIMAG(z) (__imag__(z)) #endif #else #define __Pyx_CREAL(z) ((z).real) #define __Pyx_CIMAG(z) ((z).imag) #endif #if defined(_WIN32) && defined(__cplusplus) && CYTHON_CCOMPLEX #define __Pyx_SET_CREAL(z,x) ((z).real(x)) #define __Pyx_SET_CIMAG(z,y) ((z).imag(y)) #else #define __Pyx_SET_CREAL(z,x) __Pyx_CREAL(z) = (x) #define __Pyx_SET_CIMAG(z,y) __Pyx_CIMAG(z) = (y) #endif static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float, float); #if CYTHON_CCOMPLEX #define __Pyx_c_eqf(a, b) ((a)==(b)) #define __Pyx_c_sumf(a, b) ((a)+(b)) #define __Pyx_c_difff(a, b) ((a)-(b)) #define __Pyx_c_prodf(a, b) ((a)*(b)) #define __Pyx_c_quotf(a, b) ((a)/(b)) #define __Pyx_c_negf(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zerof(z) ((z)==(float)0) #define __Pyx_c_conjf(z) (::std::conj(z)) #if 1 #define __Pyx_c_absf(z) (::std::abs(z)) #define __Pyx_c_powf(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zerof(z) ((z)==0) #define __Pyx_c_conjf(z) (conjf(z)) #if 1 #define __Pyx_c_absf(z) (cabsf(z)) #define __Pyx_c_powf(a, b) (cpowf(a, b)) #endif #endif #else static CYTHON_INLINE int 
__Pyx_c_eqf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex, __pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex); static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex); #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex); static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex, __pyx_t_float_complex); #endif #endif static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double, double); #if CYTHON_CCOMPLEX #define __Pyx_c_eq(a, b) ((a)==(b)) #define __Pyx_c_sum(a, b) ((a)+(b)) #define __Pyx_c_diff(a, b) ((a)-(b)) #define __Pyx_c_prod(a, b) ((a)*(b)) #define __Pyx_c_quot(a, b) ((a)/(b)) #define __Pyx_c_neg(a) (-(a)) #ifdef __cplusplus #define __Pyx_c_is_zero(z) ((z)==(double)0) #define __Pyx_c_conj(z) (::std::conj(z)) #if 1 #define __Pyx_c_abs(z) (::std::abs(z)) #define __Pyx_c_pow(a, b) (::std::pow(a, b)) #endif #else #define __Pyx_c_is_zero(z) ((z)==0) #define __Pyx_c_conj(z) (conj(z)) #if 1 #define __Pyx_c_abs(z) (cabs(z)) #define __Pyx_c_pow(a, b) (cpow(a, b)) #endif #endif #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex, __pyx_t_double_complex); static 
CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex, __pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex); static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex); #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex); static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex, __pyx_t_double_complex); #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject *); static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject *); static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject *); static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject *); static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject *); static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject *); static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject *); static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject *); static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject *); static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject *); static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject *); static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject *); static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject *); static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject *); static int __Pyx_check_binary_version(void); #if !defined(__Pyx_PyIdentifier_FromString) #if PY_MAJOR_VERSION < 3 #define __Pyx_PyIdentifier_FromString(s) PyString_FromString(s) #else #define __Pyx_PyIdentifier_FromString(s) PyUnicode_FromString(s) #endif #endif static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, 
int strict); /*proto*/ static PyObject *__Pyx_ImportModule(const char *name); /*proto*/ typedef struct { int code_line; PyCodeObject* code_object; } __Pyx_CodeObjectCacheEntry; struct __Pyx_CodeObjectCache { int count; int max_count; __Pyx_CodeObjectCacheEntry* entries; }; static struct __Pyx_CodeObjectCache __pyx_code_cache = {0,0,NULL}; static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line); static PyCodeObject *__pyx_find_code_object(int code_line); static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object); static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename); /*proto*/ static int __Pyx_InitStrings(__Pyx_StringTabEntry *t); /*proto*/ /* Module declarations from 'cpython.buffer' */ /* Module declarations from 'cpython.ref' */ /* Module declarations from 'libc.stdio' */ /* Module declarations from 'cpython.object' */ /* Module declarations from 'libc.stdlib' */ /* Module declarations from 'numpy' */ /* Module declarations from 'numpy' */ static PyTypeObject *__pyx_ptype_5numpy_dtype = 0; static PyTypeObject *__pyx_ptype_5numpy_flatiter = 0; static PyTypeObject *__pyx_ptype_5numpy_broadcast = 0; static PyTypeObject *__pyx_ptype_5numpy_ndarray = 0; static PyTypeObject *__pyx_ptype_5numpy_ufunc = 0; static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *, char *, char *, int *); /*proto*/ /* Module declarations from 'cogent.evolve._pairwise_distance' */ static __Pyx_TypeInfo __Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t = { "float64_t", NULL, sizeof(__pyx_t_5numpy_float64_t), { 0 }, 0, 'R', 0, 0 }; static __Pyx_TypeInfo __Pyx_TypeInfo_nn___pyx_t_5numpy_int32_t = { "int32_t", NULL, sizeof(__pyx_t_5numpy_int32_t), { 0 }, 0, 'I', IS_UNSIGNED(__pyx_t_5numpy_int32_t), 0 }; #define __Pyx_MODULE_NAME "cogent.evolve._pairwise_distance" int __pyx_module_is_main_cogent__evolve___pairwise_distance = 0; /* Implementation of 
'cogent.evolve._pairwise_distance' */ static PyObject *__pyx_builtin_range; static PyObject *__pyx_builtin_ValueError; static PyObject *__pyx_builtin_RuntimeError; static PyObject *__pyx_pf_6cogent_6evolve_18_pairwise_distance__fill_diversity_matrix(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_matrix, PyArrayObject *__pyx_v_seq1, PyArrayObject *__pyx_v_seq2); /* proto */ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /* proto */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info); /* proto */ static char __pyx_k_1[] = "ndarray is not C contiguous"; static char __pyx_k_3[] = "ndarray is not Fortran contiguous"; static char __pyx_k_5[] = "Non-native byte order not supported"; static char __pyx_k_7[] = "unknown dtype code in numpy.pxd (%d)"; static char __pyx_k_8[] = "Format string allocated too short, see comment in numpy.pxd"; static char __pyx_k_11[] = "Format string allocated too short."; static char __pyx_k_13[] = "('1', '5', '3')"; static char __pyx_k_16[] = "_fill_diversity_matrix"; static char __pyx_k_17[] = "/Users/jrideout/.virtualenvs/pycogent/trunk/cogent/evolve/_pairwise_distance.pyx"; static char __pyx_k_18[] = "cogent.evolve._pairwise_distance"; static char __pyx_k__B[] = "B"; static char __pyx_k__H[] = "H"; static char __pyx_k__I[] = "I"; static char __pyx_k__L[] = "L"; static char __pyx_k__O[] = "O"; static char __pyx_k__Q[] = "Q"; static char __pyx_k__b[] = "b"; static char __pyx_k__d[] = "d"; static char __pyx_k__f[] = "f"; static char __pyx_k__g[] = "g"; static char __pyx_k__h[] = "h"; static char __pyx_k__i[] = "i"; static char __pyx_k__l[] = "l"; static char __pyx_k__q[] = "q"; static char __pyx_k__Zd[] = "Zd"; static char __pyx_k__Zf[] = "Zf"; static char __pyx_k__Zg[] = "Zg"; static char __pyx_k__seq1[] = "seq1"; static char __pyx_k__seq2[] = "seq2"; static char __pyx_k__range[] = "range"; static 
char __pyx_k__matrix[] = "matrix"; static char __pyx_k____main__[] = "__main__"; static char __pyx_k____test__[] = "__test__"; static char __pyx_k__ValueError[] = "ValueError"; static char __pyx_k____version__[] = "__version__"; static char __pyx_k__RuntimeError[] = "RuntimeError"; static PyObject *__pyx_kp_u_1; static PyObject *__pyx_kp_u_11; static PyObject *__pyx_kp_s_13; static PyObject *__pyx_n_s_16; static PyObject *__pyx_kp_s_17; static PyObject *__pyx_n_s_18; static PyObject *__pyx_kp_u_3; static PyObject *__pyx_kp_u_5; static PyObject *__pyx_kp_u_7; static PyObject *__pyx_kp_u_8; static PyObject *__pyx_n_s__RuntimeError; static PyObject *__pyx_n_s__ValueError; static PyObject *__pyx_n_s____main__; static PyObject *__pyx_n_s____test__; static PyObject *__pyx_n_s____version__; static PyObject *__pyx_n_s__i; static PyObject *__pyx_n_s__matrix; static PyObject *__pyx_n_s__range; static PyObject *__pyx_n_s__seq1; static PyObject *__pyx_n_s__seq2; static PyObject *__pyx_int_15; static PyObject *__pyx_k_tuple_2; static PyObject *__pyx_k_tuple_4; static PyObject *__pyx_k_tuple_6; static PyObject *__pyx_k_tuple_9; static PyObject *__pyx_k_tuple_10; static PyObject *__pyx_k_tuple_12; static PyObject *__pyx_k_tuple_14; static PyObject *__pyx_k_codeobj_15; /* Python wrapper */ static PyObject *__pyx_pw_6cogent_6evolve_18_pairwise_distance_1_fill_diversity_matrix(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static char __pyx_doc_6cogent_6evolve_18_pairwise_distance__fill_diversity_matrix[] = "fills the diversity matrix for valid positions.\n \n Assumes the provided sequences have been converted to indices with\n invalid characters being negative numbers (use get_moltype_index_array\n plus seq_to_indices)."; static PyMethodDef __pyx_mdef_6cogent_6evolve_18_pairwise_distance_1_fill_diversity_matrix = {__Pyx_NAMESTR("_fill_diversity_matrix"), (PyCFunction)__pyx_pw_6cogent_6evolve_18_pairwise_distance_1_fill_diversity_matrix, 
METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(__pyx_doc_6cogent_6evolve_18_pairwise_distance__fill_diversity_matrix)}; static PyObject *__pyx_pw_6cogent_6evolve_18_pairwise_distance_1_fill_diversity_matrix(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_matrix = 0; PyArrayObject *__pyx_v_seq1 = 0; PyArrayObject *__pyx_v_seq2 = 0; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__matrix,&__pyx_n_s__seq1,&__pyx_n_s__seq2,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("_fill_diversity_matrix (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__matrix); if (likely(values[0])) kw_args--; else goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__seq1); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("_fill_diversity_matrix", 1, 3, 3, 1); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__seq2); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("_fill_diversity_matrix", 1, 3, 3, 2); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "_fill_diversity_matrix") < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if 
(PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_matrix = ((PyArrayObject *)values[0]); __pyx_v_seq1 = ((PyArrayObject *)values[1]); __pyx_v_seq2 = ((PyArrayObject *)values[2]); } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("_fill_diversity_matrix", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.evolve._pairwise_distance._fill_diversity_matrix", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_matrix), __pyx_ptype_5numpy_ndarray, 1, "matrix", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_seq1), __pyx_ptype_5numpy_ndarray, 1, "seq1", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_seq2), __pyx_ptype_5numpy_ndarray, 1, "seq2", 0))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_pf_6cogent_6evolve_18_pairwise_distance__fill_diversity_matrix(__pyx_self, __pyx_v_matrix, __pyx_v_seq1, __pyx_v_seq2); goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/evolve/_pairwise_distance.pyx":6 * * # fills in a diversity matrix from sequences of integers * def _fill_diversity_matrix(np.ndarray[np.float64_t, ndim=2] matrix, np.ndarray[np.int32_t, ndim=1] seq1, np.ndarray[np.int32_t, ndim=1] seq2): # <<<<<<<<<<<<<< * """fills the diversity matrix for valid positions. 
* */ static PyObject *__pyx_pf_6cogent_6evolve_18_pairwise_distance__fill_diversity_matrix(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_matrix, PyArrayObject *__pyx_v_seq1, PyArrayObject *__pyx_v_seq2) { int __pyx_v_i; __Pyx_LocalBuf_ND __pyx_pybuffernd_matrix; __Pyx_Buffer __pyx_pybuffer_matrix; __Pyx_LocalBuf_ND __pyx_pybuffernd_seq1; __Pyx_Buffer __pyx_pybuffer_seq1; __Pyx_LocalBuf_ND __pyx_pybuffernd_seq2; __Pyx_Buffer __pyx_pybuffer_seq2; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations Py_ssize_t __pyx_t_1; int __pyx_t_2; int __pyx_t_3; int __pyx_t_4; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; int __pyx_t_10; __pyx_t_5numpy_int32_t __pyx_t_11; __pyx_t_5numpy_int32_t __pyx_t_12; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("_fill_diversity_matrix", 0); __pyx_pybuffer_matrix.pybuffer.buf = NULL; __pyx_pybuffer_matrix.refcount = 0; __pyx_pybuffernd_matrix.data = NULL; __pyx_pybuffernd_matrix.rcbuffer = &__pyx_pybuffer_matrix; __pyx_pybuffer_seq1.pybuffer.buf = NULL; __pyx_pybuffer_seq1.refcount = 0; __pyx_pybuffernd_seq1.data = NULL; __pyx_pybuffernd_seq1.rcbuffer = &__pyx_pybuffer_seq1; __pyx_pybuffer_seq2.pybuffer.buf = NULL; __pyx_pybuffer_seq2.refcount = 0; __pyx_pybuffernd_seq2.data = NULL; __pyx_pybuffernd_seq2.rcbuffer = &__pyx_pybuffer_seq2; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_matrix.rcbuffer->pybuffer, (PyObject*)__pyx_v_matrix, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 2, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_matrix.diminfo[0].strides = __pyx_pybuffernd_matrix.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_matrix.diminfo[0].shape = __pyx_pybuffernd_matrix.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_matrix.diminfo[1].strides = 
__pyx_pybuffernd_matrix.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_matrix.diminfo[1].shape = __pyx_pybuffernd_matrix.rcbuffer->pybuffer.shape[1]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_seq1.rcbuffer->pybuffer, (PyObject*)__pyx_v_seq1, &__Pyx_TypeInfo_nn___pyx_t_5numpy_int32_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_seq1.diminfo[0].strides = __pyx_pybuffernd_seq1.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_seq1.diminfo[0].shape = __pyx_pybuffernd_seq1.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_seq2.rcbuffer->pybuffer, (PyObject*)__pyx_v_seq2, &__Pyx_TypeInfo_nn___pyx_t_5numpy_int32_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_pybuffernd_seq2.diminfo[0].strides = __pyx_pybuffernd_seq2.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_seq2.diminfo[0].shape = __pyx_pybuffernd_seq2.rcbuffer->pybuffer.shape[0]; /* "cogent/evolve/_pairwise_distance.pyx":14 * cdef int i * * for i in range(len(seq1)): # <<<<<<<<<<<<<< * if seq1[i] < 0 or seq2[i] < 0: * continue */ __pyx_t_1 = PyObject_Length(((PyObject *)__pyx_v_seq1)); if (unlikely(__pyx_t_1 == -1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;} for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) { __pyx_v_i = __pyx_t_2; /* "cogent/evolve/_pairwise_distance.pyx":15 * * for i in range(len(seq1)): * if seq1[i] < 0 or seq2[i] < 0: # <<<<<<<<<<<<<< * continue * */ __pyx_t_3 = __pyx_v_i; __pyx_t_4 = -1; if (__pyx_t_3 < 0) { __pyx_t_3 += __pyx_pybuffernd_seq1.diminfo[0].shape; if (unlikely(__pyx_t_3 < 0)) __pyx_t_4 = 0; } else if (unlikely(__pyx_t_3 >= 
__pyx_pybuffernd_seq1.diminfo[0].shape)) __pyx_t_4 = 0; if (unlikely(__pyx_t_4 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_4); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_5 = ((*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_int32_t *, __pyx_pybuffernd_seq1.rcbuffer->pybuffer.buf, __pyx_t_3, __pyx_pybuffernd_seq1.diminfo[0].strides)) < 0); if (!__pyx_t_5) { __pyx_t_4 = __pyx_v_i; __pyx_t_6 = -1; if (__pyx_t_4 < 0) { __pyx_t_4 += __pyx_pybuffernd_seq2.diminfo[0].shape; if (unlikely(__pyx_t_4 < 0)) __pyx_t_6 = 0; } else if (unlikely(__pyx_t_4 >= __pyx_pybuffernd_seq2.diminfo[0].shape)) __pyx_t_6 = 0; if (unlikely(__pyx_t_6 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_6); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_7 = ((*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_int32_t *, __pyx_pybuffernd_seq2.rcbuffer->pybuffer.buf, __pyx_t_4, __pyx_pybuffernd_seq2.diminfo[0].strides)) < 0); __pyx_t_8 = __pyx_t_7; } else { __pyx_t_8 = __pyx_t_5; } if (__pyx_t_8) { /* "cogent/evolve/_pairwise_distance.pyx":16 * for i in range(len(seq1)): * if seq1[i] < 0 or seq2[i] < 0: * continue # <<<<<<<<<<<<<< * * matrix[seq1[i], seq2[i]] += 1.0 */ goto __pyx_L3_continue; goto __pyx_L5; } __pyx_L5:; /* "cogent/evolve/_pairwise_distance.pyx":18 * continue * * matrix[seq1[i], seq2[i]] += 1.0 # <<<<<<<<<<<<<< * * */ __pyx_t_6 = __pyx_v_i; __pyx_t_9 = -1; if (__pyx_t_6 < 0) { __pyx_t_6 += __pyx_pybuffernd_seq1.diminfo[0].shape; if (unlikely(__pyx_t_6 < 0)) __pyx_t_9 = 0; } else if (unlikely(__pyx_t_6 >= __pyx_pybuffernd_seq1.diminfo[0].shape)) __pyx_t_9 = 0; if (unlikely(__pyx_t_9 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_9); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_9 = __pyx_v_i; __pyx_t_10 = -1; if (__pyx_t_9 < 0) { __pyx_t_9 += __pyx_pybuffernd_seq2.diminfo[0].shape; if (unlikely(__pyx_t_9 < 0)) __pyx_t_10 = 0; } 
else if (unlikely(__pyx_t_9 >= __pyx_pybuffernd_seq2.diminfo[0].shape)) __pyx_t_10 = 0; if (unlikely(__pyx_t_10 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_10); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_11 = (*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_int32_t *, __pyx_pybuffernd_seq1.rcbuffer->pybuffer.buf, __pyx_t_6, __pyx_pybuffernd_seq1.diminfo[0].strides)); __pyx_t_12 = (*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_int32_t *, __pyx_pybuffernd_seq2.rcbuffer->pybuffer.buf, __pyx_t_9, __pyx_pybuffernd_seq2.diminfo[0].strides)); __pyx_t_10 = -1; if (__pyx_t_11 < 0) { __pyx_t_11 += __pyx_pybuffernd_matrix.diminfo[0].shape; if (unlikely(__pyx_t_11 < 0)) __pyx_t_10 = 0; } else if (unlikely(__pyx_t_11 >= __pyx_pybuffernd_matrix.diminfo[0].shape)) __pyx_t_10 = 0; if (__pyx_t_12 < 0) { __pyx_t_12 += __pyx_pybuffernd_matrix.diminfo[1].shape; if (unlikely(__pyx_t_12 < 0)) __pyx_t_10 = 1; } else if (unlikely(__pyx_t_12 >= __pyx_pybuffernd_matrix.diminfo[1].shape)) __pyx_t_10 = 1; if (unlikely(__pyx_t_10 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_10); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } *__Pyx_BufPtrStrided2d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_matrix.rcbuffer->pybuffer.buf, __pyx_t_11, __pyx_pybuffernd_matrix.diminfo[0].strides, __pyx_t_12, __pyx_pybuffernd_matrix.diminfo[1].strides) += 1.0; __pyx_L3_continue:; } __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_matrix.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_seq1.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_seq2.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("cogent.evolve._pairwise_distance._fill_diversity_matrix", __pyx_clineno, __pyx_lineno, 
__pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_matrix.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_seq1.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_seq2.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/ static int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_r; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__getbuffer__ (wrapper)", 0); __pyx_r = __pyx_pf_5numpy_7ndarray___getbuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info), ((int)__pyx_v_flags)); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":193 * # experimental exception made for __getbuffer__ and __releasebuffer__ * # -- the details of this may change. * def __getbuffer__(ndarray self, Py_buffer* info, int flags): # <<<<<<<<<<<<<< * # This implementation of getbuffer is geared towards Cython * # requirements, and does not yet fullfill the PEP. 
*/ static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) { int __pyx_v_copy_shape; int __pyx_v_i; int __pyx_v_ndim; int __pyx_v_endian_detector; int __pyx_v_little_endian; int __pyx_v_t; char *__pyx_v_f; PyArray_Descr *__pyx_v_descr = 0; int __pyx_v_offset; int __pyx_v_hasfields; int __pyx_r; __Pyx_RefNannyDeclarations int __pyx_t_1; int __pyx_t_2; int __pyx_t_3; PyObject *__pyx_t_4 = NULL; int __pyx_t_5; int __pyx_t_6; int __pyx_t_7; PyObject *__pyx_t_8 = NULL; char *__pyx_t_9; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("__getbuffer__", 0); if (__pyx_v_info != NULL) { __pyx_v_info->obj = Py_None; __Pyx_INCREF(Py_None); __Pyx_GIVEREF(__pyx_v_info->obj); } /* "numpy.pxd":199 * # of flags * * if info == NULL: return # <<<<<<<<<<<<<< * * cdef int copy_shape, i, ndim */ __pyx_t_1 = (__pyx_v_info == NULL); if (__pyx_t_1) { __pyx_r = 0; goto __pyx_L0; goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":202 * * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * */ __pyx_v_endian_detector = 1; /* "numpy.pxd":203 * cdef int copy_shape, i, ndim * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * * ndim = PyArray_NDIM(self) */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":205 * cdef bint little_endian = ((&endian_detector)[0] != 0) * * ndim = PyArray_NDIM(self) # <<<<<<<<<<<<<< * * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_v_ndim = PyArray_NDIM(__pyx_v_self); /* "numpy.pxd":207 * ndim = PyArray_NDIM(self) * * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * copy_shape = 1 * else: */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":208 * * if sizeof(npy_intp) != sizeof(Py_ssize_t): * copy_shape = 1 # 
<<<<<<<<<<<<<< * else: * copy_shape = 0 */ __pyx_v_copy_shape = 1; goto __pyx_L4; } /*else*/ { /* "numpy.pxd":210 * copy_shape = 1 * else: * copy_shape = 0 # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) */ __pyx_v_copy_shape = 0; } __pyx_L4:; /* "numpy.pxd":212 * copy_shape = 0 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") */ __pyx_t_1 = ((__pyx_v_flags & PyBUF_C_CONTIGUOUS) == PyBUF_C_CONTIGUOUS); if (__pyx_t_1) { /* "numpy.pxd":213 * * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not C contiguous") * */ __pyx_t_2 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_C_CONTIGUOUS)); __pyx_t_3 = __pyx_t_2; } else { __pyx_t_3 = __pyx_t_1; } if (__pyx_t_3) { /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_2), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":216 * raise ValueError(u"ndarray is not C contiguous") * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) # <<<<<<<<<<<<<< * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") */ __pyx_t_3 = ((__pyx_v_flags & 
PyBUF_F_CONTIGUOUS) == PyBUF_F_CONTIGUOUS); if (__pyx_t_3) { /* "numpy.pxd":217 * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): # <<<<<<<<<<<<<< * raise ValueError(u"ndarray is not Fortran contiguous") * */ __pyx_t_1 = (!PyArray_CHKFLAGS(__pyx_v_self, NPY_F_CONTIGUOUS)); __pyx_t_2 = __pyx_t_1; } else { __pyx_t_2 = __pyx_t_3; } if (__pyx_t_2) { /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_4), NULL); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":220 * raise ValueError(u"ndarray is not Fortran contiguous") * * info.buf = PyArray_DATA(self) # <<<<<<<<<<<<<< * info.ndim = ndim * if copy_shape: */ __pyx_v_info->buf = PyArray_DATA(__pyx_v_self); /* "numpy.pxd":221 * * info.buf = PyArray_DATA(self) * info.ndim = ndim # <<<<<<<<<<<<<< * if copy_shape: * # Allocate new buffer for strides and shape info. */ __pyx_v_info->ndim = __pyx_v_ndim; /* "numpy.pxd":222 * info.buf = PyArray_DATA(self) * info.ndim = ndim * if copy_shape: # <<<<<<<<<<<<<< * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. */ if (__pyx_v_copy_shape) { /* "numpy.pxd":225 * # Allocate new buffer for strides and shape info. * # This is allocated as one block, strides first. 
* info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) # <<<<<<<<<<<<<< * info.shape = info.strides + ndim * for i in range(ndim): */ __pyx_v_info->strides = ((Py_ssize_t *)malloc((((sizeof(Py_ssize_t)) * ((size_t)__pyx_v_ndim)) * 2))); /* "numpy.pxd":226 * # This is allocated as one block, strides first. * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim # <<<<<<<<<<<<<< * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] */ __pyx_v_info->shape = (__pyx_v_info->strides + __pyx_v_ndim); /* "numpy.pxd":227 * info.strides = stdlib.malloc(sizeof(Py_ssize_t) * ndim * 2) * info.shape = info.strides + ndim * for i in range(ndim): # <<<<<<<<<<<<<< * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] */ __pyx_t_5 = __pyx_v_ndim; for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) { __pyx_v_i = __pyx_t_6; /* "numpy.pxd":228 * info.shape = info.strides + ndim * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] # <<<<<<<<<<<<<< * info.shape[i] = PyArray_DIMS(self)[i] * else: */ (__pyx_v_info->strides[__pyx_v_i]) = (PyArray_STRIDES(__pyx_v_self)[__pyx_v_i]); /* "numpy.pxd":229 * for i in range(ndim): * info.strides[i] = PyArray_STRIDES(self)[i] * info.shape[i] = PyArray_DIMS(self)[i] # <<<<<<<<<<<<<< * else: * info.strides = PyArray_STRIDES(self) */ (__pyx_v_info->shape[__pyx_v_i]) = (PyArray_DIMS(__pyx_v_self)[__pyx_v_i]); } goto __pyx_L7; } /*else*/ { /* "numpy.pxd":231 * info.shape[i] = PyArray_DIMS(self)[i] * else: * info.strides = PyArray_STRIDES(self) # <<<<<<<<<<<<<< * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL */ __pyx_v_info->strides = ((Py_ssize_t *)PyArray_STRIDES(__pyx_v_self)); /* "numpy.pxd":232 * else: * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) # <<<<<<<<<<<<<< * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) */ __pyx_v_info->shape = ((Py_ssize_t *)PyArray_DIMS(__pyx_v_self)); } 
__pyx_L7:; /* "numpy.pxd":233 * info.strides = PyArray_STRIDES(self) * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL # <<<<<<<<<<<<<< * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) */ __pyx_v_info->suboffsets = NULL; /* "numpy.pxd":234 * info.shape = PyArray_DIMS(self) * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) # <<<<<<<<<<<<<< * info.readonly = not PyArray_ISWRITEABLE(self) * */ __pyx_v_info->itemsize = PyArray_ITEMSIZE(__pyx_v_self); /* "numpy.pxd":235 * info.suboffsets = NULL * info.itemsize = PyArray_ITEMSIZE(self) * info.readonly = not PyArray_ISWRITEABLE(self) # <<<<<<<<<<<<<< * * cdef int t */ __pyx_v_info->readonly = (!PyArray_ISWRITEABLE(__pyx_v_self)); /* "numpy.pxd":238 * * cdef int t * cdef char* f = NULL # <<<<<<<<<<<<<< * cdef dtype descr = self.descr * cdef list stack */ __pyx_v_f = NULL; /* "numpy.pxd":239 * cdef int t * cdef char* f = NULL * cdef dtype descr = self.descr # <<<<<<<<<<<<<< * cdef list stack * cdef int offset */ __Pyx_INCREF(((PyObject *)__pyx_v_self->descr)); __pyx_v_descr = __pyx_v_self->descr; /* "numpy.pxd":243 * cdef int offset * * cdef bint hasfields = PyDataType_HASFIELDS(descr) # <<<<<<<<<<<<<< * * if not hasfields and not copy_shape: */ __pyx_v_hasfields = PyDataType_HASFIELDS(__pyx_v_descr); /* "numpy.pxd":245 * cdef bint hasfields = PyDataType_HASFIELDS(descr) * * if not hasfields and not copy_shape: # <<<<<<<<<<<<<< * # do not call releasebuffer * info.obj = None */ __pyx_t_2 = (!__pyx_v_hasfields); if (__pyx_t_2) { __pyx_t_3 = (!__pyx_v_copy_shape); __pyx_t_1 = __pyx_t_3; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":247 * if not hasfields and not copy_shape: * # do not call releasebuffer * info.obj = None # <<<<<<<<<<<<<< * else: * # need to call releasebuffer */ __Pyx_INCREF(Py_None); __Pyx_GIVEREF(Py_None); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = Py_None; goto __pyx_L10; } 
/*else*/ { /* "numpy.pxd":250 * else: * # need to call releasebuffer * info.obj = self # <<<<<<<<<<<<<< * * if not hasfields: */ __Pyx_INCREF(((PyObject *)__pyx_v_self)); __Pyx_GIVEREF(((PyObject *)__pyx_v_self)); __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = ((PyObject *)__pyx_v_self); } __pyx_L10:; /* "numpy.pxd":252 * info.obj = self * * if not hasfields: # <<<<<<<<<<<<<< * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or */ __pyx_t_1 = (!__pyx_v_hasfields); if (__pyx_t_1) { /* "numpy.pxd":253 * * if not hasfields: * t = descr.type_num # <<<<<<<<<<<<<< * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): */ __pyx_v_t = __pyx_v_descr->type_num; /* "numpy.pxd":254 * if not hasfields: * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_1 = (__pyx_v_descr->byteorder == '>'); if (__pyx_t_1) { __pyx_t_2 = __pyx_v_little_endian; } else { __pyx_t_2 = __pyx_t_1; } if (!__pyx_t_2) { /* "numpy.pxd":255 * t = descr.type_num * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" */ __pyx_t_1 = (__pyx_v_descr->byteorder == '<'); if (__pyx_t_1) { __pyx_t_3 = (!__pyx_v_little_endian); __pyx_t_7 = __pyx_t_3; } else { __pyx_t_7 = __pyx_t_1; } __pyx_t_1 = __pyx_t_7; } else { __pyx_t_1 = __pyx_t_2; } if (__pyx_t_1) { /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_t_4 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_6), NULL); if 
(unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __Pyx_Raise(__pyx_t_4, 0, 0, 0); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L12; } __pyx_L12:; /* "numpy.pxd":257 * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" */ __pyx_t_1 = (__pyx_v_t == NPY_BYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__b; goto __pyx_L13; } /* "numpy.pxd":258 * raise ValueError(u"Non-native byte order not supported") * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" */ __pyx_t_1 = (__pyx_v_t == NPY_UBYTE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__B; goto __pyx_L13; } /* "numpy.pxd":259 * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" */ __pyx_t_1 = (__pyx_v_t == NPY_SHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__h; goto __pyx_L13; } /* "numpy.pxd":260 * elif t == NPY_UBYTE: f = "B" * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" */ __pyx_t_1 = (__pyx_v_t == NPY_USHORT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__H; goto __pyx_L13; } /* "numpy.pxd":261 * elif t == NPY_SHORT: f = "h" * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" */ __pyx_t_1 = (__pyx_v_t == NPY_INT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__i; goto __pyx_L13; } /* "numpy.pxd":262 * elif t == NPY_USHORT: f = "H" * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" # <<<<<<<<<<<<<< * elif t 
== NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" */ __pyx_t_1 = (__pyx_v_t == NPY_UINT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__I; goto __pyx_L13; } /* "numpy.pxd":263 * elif t == NPY_INT: f = "i" * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" */ __pyx_t_1 = (__pyx_v_t == NPY_LONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__l; goto __pyx_L13; } /* "numpy.pxd":264 * elif t == NPY_UINT: f = "I" * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__L; goto __pyx_L13; } /* "numpy.pxd":265 * elif t == NPY_LONG: f = "l" * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__q; goto __pyx_L13; } /* "numpy.pxd":266 * elif t == NPY_ULONG: f = "L" * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" */ __pyx_t_1 = (__pyx_v_t == NPY_ULONGLONG); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Q; goto __pyx_L13; } /* "numpy.pxd":267 * elif t == NPY_LONGLONG: f = "q" * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" */ __pyx_t_1 = (__pyx_v_t == NPY_FLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__f; goto __pyx_L13; } /* "numpy.pxd":268 * elif t == NPY_ULONGLONG: f = "Q" * elif t == NPY_FLOAT: f = "f" * elif t == NPY_DOUBLE: f = "d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" */ __pyx_t_1 = (__pyx_v_t == NPY_DOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__d; goto __pyx_L13; } /* "numpy.pxd":269 * elif t == NPY_FLOAT: f = "f" * 
elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" */ __pyx_t_1 = (__pyx_v_t == NPY_LONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__g; goto __pyx_L13; } /* "numpy.pxd":270 * elif t == NPY_DOUBLE: f = "d" * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" */ __pyx_t_1 = (__pyx_v_t == NPY_CFLOAT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zf; goto __pyx_L13; } /* "numpy.pxd":271 * elif t == NPY_LONGDOUBLE: f = "g" * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" */ __pyx_t_1 = (__pyx_v_t == NPY_CDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zd; goto __pyx_L13; } /* "numpy.pxd":272 * elif t == NPY_CFLOAT: f = "Zf" * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f = "O" * else: */ __pyx_t_1 = (__pyx_v_t == NPY_CLONGDOUBLE); if (__pyx_t_1) { __pyx_v_f = __pyx_k__Zg; goto __pyx_L13; } /* "numpy.pxd":273 * elif t == NPY_CDOUBLE: f = "Zd" * elif t == NPY_CLONGDOUBLE: f = "Zg" * elif t == NPY_OBJECT: f = "O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_1 = (__pyx_v_t == NPY_OBJECT); if (__pyx_t_1) { __pyx_v_f = __pyx_k__O; goto __pyx_L13; } /*else*/ { /* "numpy.pxd":275 * elif t == NPY_OBJECT: f = "O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * info.format = f * return */ __pyx_t_4 = PyInt_FromLong(__pyx_v_t); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_8 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_t_4); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_8)); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_8)); __Pyx_GIVEREF(((PyObject *)__pyx_t_8)); __pyx_t_8 = 0; __pyx_t_8 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_8); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_8, 0, 0, 0); __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 275; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L13:; /* "numpy.pxd":276 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f # <<<<<<<<<<<<<< * return * else: */ __pyx_v_info->format = __pyx_v_f; /* "numpy.pxd":277 * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * info.format = f * return # <<<<<<<<<<<<<< * else: * info.format = stdlib.malloc(_buffer_format_string_len) */ __pyx_r = 0; goto __pyx_L0; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":279 * return * else: * info.format = stdlib.malloc(_buffer_format_string_len) # <<<<<<<<<<<<<< * info.format[0] = '^' # Native data types, manual alignment * offset = 0 */ __pyx_v_info->format = ((char *)malloc(255)); /* "numpy.pxd":280 * else: * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment # <<<<<<<<<<<<<< * offset = 0 * f = _util_dtypestring(descr, info.format + 1, */ (__pyx_v_info->format[0]) = '^'; /* "numpy.pxd":281 * info.format = stdlib.malloc(_buffer_format_string_len) * info.format[0] = '^' # Native data types, manual alignment * offset = 0 # <<<<<<<<<<<<<< * f = 
_util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, */ __pyx_v_offset = 0; /* "numpy.pxd":284 * f = _util_dtypestring(descr, info.format + 1, * info.format + _buffer_format_string_len, * &offset) # <<<<<<<<<<<<<< * f[0] = 0 # Terminate format string * */ __pyx_t_9 = __pyx_f_5numpy__util_dtypestring(__pyx_v_descr, (__pyx_v_info->format + 1), (__pyx_v_info->format + 255), (&__pyx_v_offset)); if (unlikely(__pyx_t_9 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 282; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_9; /* "numpy.pxd":285 * info.format + _buffer_format_string_len, * &offset) * f[0] = 0 # Terminate format string # <<<<<<<<<<<<<< * * def __releasebuffer__(ndarray self, Py_buffer* info): */ (__pyx_v_f[0]) = 0; } __pyx_L11:; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_8); __Pyx_AddTraceback("numpy.ndarray.__getbuffer__", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = -1; if (__pyx_v_info != NULL && __pyx_v_info->obj != NULL) { __Pyx_GOTREF(__pyx_v_info->obj); __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = NULL; } goto __pyx_L2; __pyx_L0:; if (__pyx_v_info != NULL && __pyx_v_info->obj == Py_None) { __Pyx_GOTREF(Py_None); __Pyx_DECREF(Py_None); __pyx_v_info->obj = NULL; } __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_descr); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info); /*proto*/ static void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__releasebuffer__ (wrapper)", 0); __pyx_pf_5numpy_7ndarray_2__releasebuffer__(((PyArrayObject *)__pyx_v_self), ((Py_buffer *)__pyx_v_info)); __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":287 * f[0] = 0 # Terminate format string * * def __releasebuffer__(ndarray self, 
Py_buffer* info): # <<<<<<<<<<<<<< * if PyArray_HASFIELDS(self): * stdlib.free(info.format) */ static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *__pyx_v_self, Py_buffer *__pyx_v_info) { __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("__releasebuffer__", 0); /* "numpy.pxd":288 * * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): # <<<<<<<<<<<<<< * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): */ __pyx_t_1 = PyArray_HASFIELDS(__pyx_v_self); if (__pyx_t_1) { /* "numpy.pxd":289 * def __releasebuffer__(ndarray self, Py_buffer* info): * if PyArray_HASFIELDS(self): * stdlib.free(info.format) # <<<<<<<<<<<<<< * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) */ free(__pyx_v_info->format); goto __pyx_L3; } __pyx_L3:; /* "numpy.pxd":290 * if PyArray_HASFIELDS(self): * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): # <<<<<<<<<<<<<< * stdlib.free(info.strides) * # info.shape was stored after info.strides in the same block */ __pyx_t_1 = ((sizeof(npy_intp)) != (sizeof(Py_ssize_t))); if (__pyx_t_1) { /* "numpy.pxd":291 * stdlib.free(info.format) * if sizeof(npy_intp) != sizeof(Py_ssize_t): * stdlib.free(info.strides) # <<<<<<<<<<<<<< * # info.shape was stored after info.strides in the same block * */ free(__pyx_v_info->strides); goto __pyx_L4; } __pyx_L4:; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":767 * ctypedef npy_cdouble complex_t * * cdef inline object PyArray_MultiIterNew1(a): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(1, a) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew1(PyObject *__pyx_v_a) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew1", 0); /* "numpy.pxd":768 * * cdef inline object PyArray_MultiIterNew1(a): * return 
PyArray_MultiIterNew(1, a) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew2(a, b): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(1, ((void *)__pyx_v_a)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 768; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew1", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":770 * return PyArray_MultiIterNew(1, a) * * cdef inline object PyArray_MultiIterNew2(a, b): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(2, a, b) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew2(PyObject *__pyx_v_a, PyObject *__pyx_v_b) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew2", 0); /* "numpy.pxd":771 * * cdef inline object PyArray_MultiIterNew2(a, b): * return PyArray_MultiIterNew(2, a, b) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew3(a, b, c): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(2, ((void *)__pyx_v_a), ((void *)__pyx_v_b)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 771; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew2", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":773 * return 
PyArray_MultiIterNew(2, a, b) * * cdef inline object PyArray_MultiIterNew3(a, b, c): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(3, a, b, c) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew3(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew3", 0); /* "numpy.pxd":774 * * cdef inline object PyArray_MultiIterNew3(a, b, c): * return PyArray_MultiIterNew(3, a, b, c) # <<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(3, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 774; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew3", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":776 * return PyArray_MultiIterNew(3, a, b, c) * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(4, a, b, c, d) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew4(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew4", 0); /* "numpy.pxd":777 * * cdef inline object PyArray_MultiIterNew4(a, b, c, d): * return PyArray_MultiIterNew(4, a, b, c, d) # 
<<<<<<<<<<<<<< * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(4, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 777; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew4", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":779 * return PyArray_MultiIterNew(4, a, b, c, d) * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): # <<<<<<<<<<<<<< * return PyArray_MultiIterNew(5, a, b, c, d, e) * */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew5(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d, PyObject *__pyx_v_e) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("PyArray_MultiIterNew5", 0); /* "numpy.pxd":780 * * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e): * return PyArray_MultiIterNew(5, a, b, c, d, e) # <<<<<<<<<<<<<< * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: */ __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyArray_MultiIterNew(5, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d), ((void *)__pyx_v_e)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 780; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; 
__Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("numpy.PyArray_MultiIterNew5", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":782 * return PyArray_MultiIterNew(5, a, b, c, d, e) * * cdef inline char* _util_dtypestring(dtype descr, char* f, char* end, int* offset) except NULL: # <<<<<<<<<<<<<< * # Recursive utility function used in __getbuffer__ to get format * # string. The new location in the format string is returned. */ static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Descr *__pyx_v_descr, char *__pyx_v_f, char *__pyx_v_end, int *__pyx_v_offset) { PyArray_Descr *__pyx_v_child = 0; int __pyx_v_endian_detector; int __pyx_v_little_endian; PyObject *__pyx_v_fields = 0; PyObject *__pyx_v_childname = NULL; PyObject *__pyx_v_new_offset = NULL; PyObject *__pyx_v_t = NULL; char *__pyx_r; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; Py_ssize_t __pyx_t_2; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; PyObject *__pyx_t_5 = NULL; int __pyx_t_6; int __pyx_t_7; int __pyx_t_8; int __pyx_t_9; long __pyx_t_10; char *__pyx_t_11; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("_util_dtypestring", 0); /* "numpy.pxd":789 * cdef int delta_offset * cdef tuple i * cdef int endian_detector = 1 # <<<<<<<<<<<<<< * cdef bint little_endian = ((&endian_detector)[0] != 0) * cdef tuple fields */ __pyx_v_endian_detector = 1; /* "numpy.pxd":790 * cdef tuple i * cdef int endian_detector = 1 * cdef bint little_endian = ((&endian_detector)[0] != 0) # <<<<<<<<<<<<<< * cdef tuple fields * */ __pyx_v_little_endian = ((((char *)(&__pyx_v_endian_detector))[0]) != 0); /* "numpy.pxd":793 * cdef tuple fields * * for childname in descr.names: # <<<<<<<<<<<<<< * fields = descr.fields[childname] * child, new_offset = fields */ if (unlikely(((PyObject *)__pyx_v_descr->names) == Py_None)) { 
PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 793; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_1 = ((PyObject *)__pyx_v_descr->names); __Pyx_INCREF(__pyx_t_1); __pyx_t_2 = 0; for (;;) { if (__pyx_t_2 >= PyTuple_GET_SIZE(__pyx_t_1)) break; __pyx_t_3 = PyTuple_GET_ITEM(__pyx_t_1, __pyx_t_2); __Pyx_INCREF(__pyx_t_3); __pyx_t_2++; __Pyx_XDECREF(__pyx_v_childname); __pyx_v_childname = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":794 * * for childname in descr.names: * fields = descr.fields[childname] # <<<<<<<<<<<<<< * child, new_offset = fields * */ __pyx_t_3 = PyObject_GetItem(__pyx_v_descr->fields, __pyx_v_childname); if (!__pyx_t_3) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); if (!(likely(PyTuple_CheckExact(__pyx_t_3))||((__pyx_t_3) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected tuple, got %.200s", Py_TYPE(__pyx_t_3)->tp_name), 0))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 794; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_fields)); __pyx_v_fields = ((PyObject*)__pyx_t_3); __pyx_t_3 = 0; /* "numpy.pxd":795 * for childname in descr.names: * fields = descr.fields[childname] * child, new_offset = fields # <<<<<<<<<<<<<< * * if (end - f) - (new_offset - offset[0]) < 15: */ if (likely(PyTuple_CheckExact(((PyObject *)__pyx_v_fields)))) { PyObject* sequence = ((PyObject *)__pyx_v_fields); if (unlikely(PyTuple_GET_SIZE(sequence) != 2)) { if (PyTuple_GET_SIZE(sequence) > 2) __Pyx_RaiseTooManyValuesError(2); else __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(sequence)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_t_3 = PyTuple_GET_ITEM(sequence, 0); __pyx_t_4 = PyTuple_GET_ITEM(sequence, 1); __Pyx_INCREF(__pyx_t_3); __Pyx_INCREF(__pyx_t_4); } else { __Pyx_UnpackTupleError(((PyObject *)__pyx_v_fields), 2); 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } if (!(likely(((__pyx_t_3) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_3, __pyx_ptype_5numpy_dtype))))) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 795; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_XDECREF(((PyObject *)__pyx_v_child)); __pyx_v_child = ((PyArray_Descr *)__pyx_t_3); __pyx_t_3 = 0; __Pyx_XDECREF(__pyx_v_new_offset); __pyx_v_new_offset = __pyx_t_4; __pyx_t_4 = 0; /* "numpy.pxd":797 * child, new_offset = fields * * if (end - f) - (new_offset - offset[0]) < 15: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * */ __pyx_t_4 = PyInt_FromLong((__pyx_v_end - __pyx_v_f)); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyNumber_Subtract(__pyx_v_new_offset, __pyx_t_3); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyNumber_Subtract(__pyx_t_4, __pyx_t_5); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = PyObject_RichCompare(__pyx_t_3, __pyx_int_15, Py_LT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 797; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_t_5 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_9), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "numpy.pxd":800 * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") * * if ((child.byteorder == '>' and little_endian) or # <<<<<<<<<<<<<< * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") */ __pyx_t_6 = (__pyx_v_child->byteorder == '>'); if (__pyx_t_6) { __pyx_t_7 = __pyx_v_little_endian; } else { __pyx_t_7 = __pyx_t_6; } if (!__pyx_t_7) { /* "numpy.pxd":801 * * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): # <<<<<<<<<<<<<< * raise ValueError(u"Non-native byte order not supported") * # One could encode it in the format string and have Cython */ __pyx_t_6 = (__pyx_v_child->byteorder == '<'); if (__pyx_t_6) { __pyx_t_8 = (!__pyx_v_little_endian); __pyx_t_9 = __pyx_t_8; } else { __pyx_t_9 = __pyx_t_6; } __pyx_t_6 = __pyx_t_9; } else { __pyx_t_6 = __pyx_t_7; } if (__pyx_t_6) { /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # 
complain instead, BUT: < and > in format strings also imply */ __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_10), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "numpy.pxd":812 * * # Output padding bytes * while offset[0] < new_offset: # <<<<<<<<<<<<<< * f[0] = 120 # "x"; pad byte * f += 1 */ while (1) { __pyx_t_5 = PyInt_FromLong((__pyx_v_offset[0])); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_v_new_offset, Py_LT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 812; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!__pyx_t_6) break; /* "numpy.pxd":813 * # Output padding bytes * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte # <<<<<<<<<<<<<< * f += 1 * offset[0] += 1 */ (__pyx_v_f[0]) = 120; /* "numpy.pxd":814 * while offset[0] < new_offset: * f[0] = 120 # "x"; pad byte * f += 1 # <<<<<<<<<<<<<< * offset[0] += 1 * */ __pyx_v_f = (__pyx_v_f + 1); /* "numpy.pxd":815 * f[0] = 120 # "x"; pad byte * f += 1 * offset[0] += 1 # <<<<<<<<<<<<<< * * offset[0] += child.itemsize */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + 1); } /* "numpy.pxd":817 * offset[0] += 1 * * offset[0] += child.itemsize # <<<<<<<<<<<<<< * * if not 
PyDataType_HASFIELDS(child): */ __pyx_t_10 = 0; (__pyx_v_offset[__pyx_t_10]) = ((__pyx_v_offset[__pyx_t_10]) + __pyx_v_child->elsize); /* "numpy.pxd":819 * offset[0] += child.itemsize * * if not PyDataType_HASFIELDS(child): # <<<<<<<<<<<<<< * t = child.type_num * if end - f < 5: */ __pyx_t_6 = (!PyDataType_HASFIELDS(__pyx_v_child)); if (__pyx_t_6) { /* "numpy.pxd":820 * * if not PyDataType_HASFIELDS(child): * t = child.type_num # <<<<<<<<<<<<<< * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") */ __pyx_t_3 = PyInt_FromLong(__pyx_v_child->type_num); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 820; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_XDECREF(__pyx_v_t); __pyx_v_t = __pyx_t_3; __pyx_t_3 = 0; /* "numpy.pxd":821 * if not PyDataType_HASFIELDS(child): * t = child.type_num * if end - f < 5: # <<<<<<<<<<<<<< * raise RuntimeError(u"Format string allocated too short.") * */ __pyx_t_6 = ((__pyx_v_end - __pyx_v_f) < 5); if (__pyx_t_6) { /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_t_3 = PyObject_Call(__pyx_builtin_RuntimeError, ((PyObject *)__pyx_k_tuple_12), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_Raise(__pyx_t_3, 0, 0, 0); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L10; } __pyx_L10:; /* "numpy.pxd":825 * * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" # <<<<<<<<<<<<<< * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" */ __pyx_t_3 = PyInt_FromLong(NPY_BYTE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 825; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 98; goto __pyx_L11; } /* "numpy.pxd":826 * # Until ticket #99 is fixed, use integers to avoid warnings * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" # <<<<<<<<<<<<<< * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" */ __pyx_t_5 = PyInt_FromLong(NPY_UBYTE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 826; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 66; goto __pyx_L11; } /* "numpy.pxd":827 * if t == NPY_BYTE: f[0] = 98 #"b" * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" # <<<<<<<<<<<<<< * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" */ __pyx_t_3 = PyInt_FromLong(NPY_SHORT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 
= PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 827; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 104; goto __pyx_L11; } /* "numpy.pxd":828 * elif t == NPY_UBYTE: f[0] = 66 #"B" * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" # <<<<<<<<<<<<<< * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" */ __pyx_t_5 = PyInt_FromLong(NPY_USHORT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 828; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 72; goto __pyx_L11; } /* "numpy.pxd":829 * elif t == NPY_SHORT: f[0] = 104 #"h" * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" # <<<<<<<<<<<<<< * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" */ __pyx_t_3 = PyInt_FromLong(NPY_INT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 
829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 829; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 105; goto __pyx_L11; } /* "numpy.pxd":830 * elif t == NPY_USHORT: f[0] = 72 #"H" * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" # <<<<<<<<<<<<<< * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" */ __pyx_t_5 = PyInt_FromLong(NPY_UINT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 830; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 73; goto __pyx_L11; } /* "numpy.pxd":831 * elif t == NPY_INT: f[0] = 105 #"i" * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" # <<<<<<<<<<<<<< * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" */ __pyx_t_3 = PyInt_FromLong(NPY_LONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; 
__pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 831; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 108; goto __pyx_L11; } /* "numpy.pxd":832 * elif t == NPY_UINT: f[0] = 73 #"I" * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" # <<<<<<<<<<<<<< * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 832; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 76; goto __pyx_L11; } /* "numpy.pxd":833 * elif t == NPY_LONG: f[0] = 108 #"l" * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" # <<<<<<<<<<<<<< * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" */ __pyx_t_3 = PyInt_FromLong(NPY_LONGLONG); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = 
__pyx_f[1]; __pyx_lineno = 833; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 113; goto __pyx_L11; } /* "numpy.pxd":834 * elif t == NPY_ULONG: f[0] = 76 #"L" * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" # <<<<<<<<<<<<<< * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" */ __pyx_t_5 = PyInt_FromLong(NPY_ULONGLONG); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 834; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 81; goto __pyx_L11; } /* "numpy.pxd":835 * elif t == NPY_LONGLONG: f[0] = 113 #"q" * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" # <<<<<<<<<<<<<< * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" */ __pyx_t_3 = PyInt_FromLong(NPY_FLOAT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 835; __pyx_clineno = __LINE__; goto __pyx_L1_error;} 
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 102; goto __pyx_L11; } /* "numpy.pxd":836 * elif t == NPY_ULONGLONG: f[0] = 81 #"Q" * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" # <<<<<<<<<<<<<< * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf */ __pyx_t_5 = PyInt_FromLong(NPY_DOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 836; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 100; goto __pyx_L11; } /* "numpy.pxd":837 * elif t == NPY_FLOAT: f[0] = 102 #"f" * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" # <<<<<<<<<<<<<< * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd */ __pyx_t_3 = PyInt_FromLong(NPY_LONGDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 837; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); 
__pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 103; goto __pyx_L11; } /* "numpy.pxd":838 * elif t == NPY_DOUBLE: f[0] = 100 #"d" * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf # <<<<<<<<<<<<<< * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg */ __pyx_t_5 = PyInt_FromLong(NPY_CFLOAT); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 838; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 102; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":839 * elif t == NPY_LONGDOUBLE: f[0] = 103 #"g" * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd # <<<<<<<<<<<<<< * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" */ __pyx_t_3 = PyInt_FromLong(NPY_CDOUBLE); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 839; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 100; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":840 * elif t == NPY_CFLOAT: f[0] = 90; f[1] = 102; f += 1 # Zf * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg # <<<<<<<<<<<<<< * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: */ __pyx_t_5 = PyInt_FromLong(NPY_CLONGDOUBLE); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_3 = PyObject_RichCompare(__pyx_v_t, __pyx_t_5, Py_EQ); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_6 = __Pyx_PyObject_IsTrue(__pyx_t_3); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 840; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 90; (__pyx_v_f[1]) = 103; __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L11; } /* "numpy.pxd":841 * elif t == NPY_CDOUBLE: f[0] = 90; f[1] = 100; f += 1 # Zd * elif t == NPY_CLONGDOUBLE: f[0] = 90; f[1] = 103; f += 1 # Zg * elif t == NPY_OBJECT: f[0] = 79 #"O" # <<<<<<<<<<<<<< * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) */ __pyx_t_3 = PyInt_FromLong(NPY_OBJECT); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyObject_RichCompare(__pyx_v_t, __pyx_t_3, Py_EQ); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_6 = 
__Pyx_PyObject_IsTrue(__pyx_t_5); if (unlikely(__pyx_t_6 < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 841; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; if (__pyx_t_6) { (__pyx_v_f[0]) = 79; goto __pyx_L11; } /*else*/ { /* "numpy.pxd":843 * elif t == NPY_OBJECT: f[0] = 79 #"O" * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) # <<<<<<<<<<<<<< * f += 1 * else: */ __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_u_7), __pyx_v_t); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[1]; __pyx_lineno = 843; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_L11:; /* "numpy.pxd":844 * else: * raise ValueError(u"unknown dtype code in numpy.pxd (%d)" % t) * f += 1 # <<<<<<<<<<<<<< * else: * # Cython ignores struct boundary information ("T{...}"), */ __pyx_v_f = (__pyx_v_f + 1); goto __pyx_L9; } /*else*/ { /* "numpy.pxd":848 * # Cython ignores struct boundary information ("T{...}"), * # so don't output it * f = _util_dtypestring(child, f, end, offset) # <<<<<<<<<<<<<< * return f * */ __pyx_t_11 = __pyx_f_5numpy__util_dtypestring(__pyx_v_child, __pyx_v_f, __pyx_v_end, __pyx_v_offset); if (unlikely(__pyx_t_11 == NULL)) 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 848; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_f = __pyx_t_11; } __pyx_L9:; } __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "numpy.pxd":849 * # so don't output it * f = _util_dtypestring(child, f, end, offset) * return f # <<<<<<<<<<<<<< * * */ __pyx_r = __pyx_v_f; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_AddTraceback("numpy._util_dtypestring", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF((PyObject *)__pyx_v_child); __Pyx_XDECREF(__pyx_v_fields); __Pyx_XDECREF(__pyx_v_childname); __Pyx_XDECREF(__pyx_v_new_offset); __Pyx_XDECREF(__pyx_v_t); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "numpy.pxd":964 * * * cdef inline void set_array_base(ndarray arr, object base): # <<<<<<<<<<<<<< * cdef PyObject* baseptr * if base is None: */ static CYTHON_INLINE void __pyx_f_5numpy_set_array_base(PyArrayObject *__pyx_v_arr, PyObject *__pyx_v_base) { PyObject *__pyx_v_baseptr; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("set_array_base", 0); /* "numpy.pxd":966 * cdef inline void set_array_base(ndarray arr, object base): * cdef PyObject* baseptr * if base is None: # <<<<<<<<<<<<<< * baseptr = NULL * else: */ __pyx_t_1 = (__pyx_v_base == Py_None); if (__pyx_t_1) { /* "numpy.pxd":967 * cdef PyObject* baseptr * if base is None: * baseptr = NULL # <<<<<<<<<<<<<< * else: * Py_INCREF(base) # important to do this before decref below! */ __pyx_v_baseptr = NULL; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":969 * baseptr = NULL * else: * Py_INCREF(base) # important to do this before decref below! # <<<<<<<<<<<<<< * baseptr = base * Py_XDECREF(arr.base) */ Py_INCREF(__pyx_v_base); /* "numpy.pxd":970 * else: * Py_INCREF(base) # important to do this before decref below! 
* baseptr = base # <<<<<<<<<<<<<< * Py_XDECREF(arr.base) * arr.base = baseptr */ __pyx_v_baseptr = ((PyObject *)__pyx_v_base); } __pyx_L3:; /* "numpy.pxd":971 * Py_INCREF(base) # important to do this before decref below! * baseptr = base * Py_XDECREF(arr.base) # <<<<<<<<<<<<<< * arr.base = baseptr * */ Py_XDECREF(__pyx_v_arr->base); /* "numpy.pxd":972 * baseptr = base * Py_XDECREF(arr.base) * arr.base = baseptr # <<<<<<<<<<<<<< * * cdef inline object get_array_base(ndarray arr): */ __pyx_v_arr->base = __pyx_v_baseptr; __Pyx_RefNannyFinishContext(); } /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObject *__pyx_v_arr) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations int __pyx_t_1; __Pyx_RefNannySetupContext("get_array_base", 0); /* "numpy.pxd":975 * * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: # <<<<<<<<<<<<<< * return None * else: */ __pyx_t_1 = (__pyx_v_arr->base == NULL); if (__pyx_t_1) { /* "numpy.pxd":976 * cdef inline object get_array_base(ndarray arr): * if arr.base is NULL: * return None # <<<<<<<<<<<<<< * else: * return arr.base */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(Py_None); __pyx_r = Py_None; goto __pyx_L0; goto __pyx_L3; } /*else*/ { /* "numpy.pxd":978 * return None * else: * return arr.base # <<<<<<<<<<<<<< */ __Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_arr->base)); __pyx_r = ((PyObject *)__pyx_v_arr->base); goto __pyx_L0; } __pyx_L3:; __pyx_r = Py_None; __Pyx_INCREF(Py_None); __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyMethodDef __pyx_methods[] = { {0, 0, 0, 0} }; #if PY_MAJOR_VERSION >= 3 static struct PyModuleDef __pyx_moduledef = { PyModuleDef_HEAD_INIT, __Pyx_NAMESTR("_pairwise_distance"), 0, /* m_doc */ -1, /* m_size */ __pyx_methods /* m_methods */, NULL, /* m_reload */ NULL, /* 
m_traverse */ NULL, /* m_clear */ NULL /* m_free */ }; #endif static __Pyx_StringTabEntry __pyx_string_tab[] = { {&__pyx_kp_u_1, __pyx_k_1, sizeof(__pyx_k_1), 0, 1, 0, 0}, {&__pyx_kp_u_11, __pyx_k_11, sizeof(__pyx_k_11), 0, 1, 0, 0}, {&__pyx_kp_s_13, __pyx_k_13, sizeof(__pyx_k_13), 0, 0, 1, 0}, {&__pyx_n_s_16, __pyx_k_16, sizeof(__pyx_k_16), 0, 0, 1, 1}, {&__pyx_kp_s_17, __pyx_k_17, sizeof(__pyx_k_17), 0, 0, 1, 0}, {&__pyx_n_s_18, __pyx_k_18, sizeof(__pyx_k_18), 0, 0, 1, 1}, {&__pyx_kp_u_3, __pyx_k_3, sizeof(__pyx_k_3), 0, 1, 0, 0}, {&__pyx_kp_u_5, __pyx_k_5, sizeof(__pyx_k_5), 0, 1, 0, 0}, {&__pyx_kp_u_7, __pyx_k_7, sizeof(__pyx_k_7), 0, 1, 0, 0}, {&__pyx_kp_u_8, __pyx_k_8, sizeof(__pyx_k_8), 0, 1, 0, 0}, {&__pyx_n_s__RuntimeError, __pyx_k__RuntimeError, sizeof(__pyx_k__RuntimeError), 0, 0, 1, 1}, {&__pyx_n_s__ValueError, __pyx_k__ValueError, sizeof(__pyx_k__ValueError), 0, 0, 1, 1}, {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1}, {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1}, {&__pyx_n_s____version__, __pyx_k____version__, sizeof(__pyx_k____version__), 0, 0, 1, 1}, {&__pyx_n_s__i, __pyx_k__i, sizeof(__pyx_k__i), 0, 0, 1, 1}, {&__pyx_n_s__matrix, __pyx_k__matrix, sizeof(__pyx_k__matrix), 0, 0, 1, 1}, {&__pyx_n_s__range, __pyx_k__range, sizeof(__pyx_k__range), 0, 0, 1, 1}, {&__pyx_n_s__seq1, __pyx_k__seq1, sizeof(__pyx_k__seq1), 0, 0, 1, 1}, {&__pyx_n_s__seq2, __pyx_k__seq2, sizeof(__pyx_k__seq2), 0, 0, 1, 1}, {0, 0, 0, 0, 0, 0, 0} }; static int __Pyx_InitCachedBuiltins(void) { __pyx_builtin_range = __Pyx_GetName(__pyx_b, __pyx_n_s__range); if (!__pyx_builtin_range) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_ValueError = __Pyx_GetName(__pyx_b, __pyx_n_s__ValueError); if (!__pyx_builtin_ValueError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_RuntimeError = 
__Pyx_GetName(__pyx_b, __pyx_n_s__RuntimeError); if (!__pyx_builtin_RuntimeError) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} return 0; __pyx_L1_error:; return -1; } static int __Pyx_InitCachedConstants(void) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__Pyx_InitCachedConstants", 0); /* "numpy.pxd":214 * if ((flags & pybuf.PyBUF_C_CONTIGUOUS == pybuf.PyBUF_C_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_C_CONTIGUOUS)): * raise ValueError(u"ndarray is not C contiguous") # <<<<<<<<<<<<<< * * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) */ __pyx_k_tuple_2 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_2)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 214; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_2); __Pyx_INCREF(((PyObject *)__pyx_kp_u_1)); PyTuple_SET_ITEM(__pyx_k_tuple_2, 0, ((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_1)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_2)); /* "numpy.pxd":218 * if ((flags & pybuf.PyBUF_F_CONTIGUOUS == pybuf.PyBUF_F_CONTIGUOUS) * and not PyArray_CHKFLAGS(self, NPY_F_CONTIGUOUS)): * raise ValueError(u"ndarray is not Fortran contiguous") # <<<<<<<<<<<<<< * * info.buf = PyArray_DATA(self) */ __pyx_k_tuple_4 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_4)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 218; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_4); __Pyx_INCREF(((PyObject *)__pyx_kp_u_3)); PyTuple_SET_ITEM(__pyx_k_tuple_4, 0, ((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_3)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_4)); /* "numpy.pxd":256 * if ((descr.byteorder == '>' and little_endian) or * (descr.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * if t == NPY_BYTE: f = "b" * elif t == NPY_UBYTE: f = "B" */ __pyx_k_tuple_6 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_6)) 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 256; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_6); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_6, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_6)); /* "numpy.pxd":798 * * if (end - f) - (new_offset - offset[0]) < 15: * raise RuntimeError(u"Format string allocated too short, see comment in numpy.pxd") # <<<<<<<<<<<<<< * * if ((child.byteorder == '>' and little_endian) or */ __pyx_k_tuple_9 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_9)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 798; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_9); __Pyx_INCREF(((PyObject *)__pyx_kp_u_8)); PyTuple_SET_ITEM(__pyx_k_tuple_9, 0, ((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_8)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_9)); /* "numpy.pxd":802 * if ((child.byteorder == '>' and little_endian) or * (child.byteorder == '<' and not little_endian)): * raise ValueError(u"Non-native byte order not supported") # <<<<<<<<<<<<<< * # One could encode it in the format string and have Cython * # complain instead, BUT: < and > in format strings also imply */ __pyx_k_tuple_10 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_10)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 802; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_10); __Pyx_INCREF(((PyObject *)__pyx_kp_u_5)); PyTuple_SET_ITEM(__pyx_k_tuple_10, 0, ((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_5)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_10)); /* "numpy.pxd":822 * t = child.type_num * if end - f < 5: * raise RuntimeError(u"Format string allocated too short.") # <<<<<<<<<<<<<< * * # Until ticket #99 is fixed, use integers to avoid warnings */ __pyx_k_tuple_12 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_12)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 822; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_12); __Pyx_INCREF(((PyObject *)__pyx_kp_u_11)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 0, ((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_u_11)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_12)); /* "cogent/evolve/_pairwise_distance.pyx":6 * * # fills in a diversity matrix from sequences of integers * def _fill_diversity_matrix(np.ndarray[np.float64_t, ndim=2] matrix, np.ndarray[np.int32_t, ndim=1] seq1, np.ndarray[np.int32_t, ndim=1] seq2): # <<<<<<<<<<<<<< * """fills the diversity matrix for valid positions. * */ __pyx_k_tuple_14 = PyTuple_New(4); if (unlikely(!__pyx_k_tuple_14)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_14); __Pyx_INCREF(((PyObject *)__pyx_n_s__matrix)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 0, ((PyObject *)__pyx_n_s__matrix)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__matrix)); __Pyx_INCREF(((PyObject *)__pyx_n_s__seq1)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 1, ((PyObject *)__pyx_n_s__seq1)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__seq1)); __Pyx_INCREF(((PyObject *)__pyx_n_s__seq2)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 2, ((PyObject *)__pyx_n_s__seq2)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__seq2)); __Pyx_INCREF(((PyObject *)__pyx_n_s__i)); PyTuple_SET_ITEM(__pyx_k_tuple_14, 3, ((PyObject *)__pyx_n_s__i)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__i)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_14)); __pyx_k_codeobj_15 = (PyObject*)__Pyx_PyCode_New(3, 0, 4, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_17, __pyx_n_s_16, 6, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_RefNannyFinishContext(); return 0; __pyx_L1_error:; __Pyx_RefNannyFinishContext(); return -1; } static int __Pyx_InitGlobals(void) { if 
(__Pyx_InitStrings(__pyx_string_tab) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_15 = PyInt_FromLong(15); if (unlikely(!__pyx_int_15)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; return 0; __pyx_L1_error:; return -1; } #if PY_MAJOR_VERSION < 3 PyMODINIT_FUNC init_pairwise_distance(void); /*proto*/ PyMODINIT_FUNC init_pairwise_distance(void) #else PyMODINIT_FUNC PyInit__pairwise_distance(void); /*proto*/ PyMODINIT_FUNC PyInit__pairwise_distance(void) #endif { PyObject *__pyx_t_1 = NULL; __Pyx_RefNannyDeclarations #if CYTHON_REFNANNY __Pyx_RefNanny = __Pyx_RefNannyImportAPI("refnanny"); if (!__Pyx_RefNanny) { PyErr_Clear(); __Pyx_RefNanny = __Pyx_RefNannyImportAPI("Cython.Runtime.refnanny"); if (!__Pyx_RefNanny) Py_FatalError("failed to import 'refnanny' module"); } #endif __Pyx_RefNannySetupContext("PyMODINIT_FUNC PyInit__pairwise_distance(void)", 0); if ( __Pyx_check_binary_version() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_tuple = PyTuple_New(0); if (unlikely(!__pyx_empty_tuple)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_bytes = PyBytes_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_bytes)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #ifdef __Pyx_CyFunction_USED if (__Pyx_CyFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_FusedFunction_USED if (__pyx_FusedFunction_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_Generator_USED if (__pyx_Generator_init() < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif /*--- Library function 
declarations ---*/ /*--- Threads initialization code ---*/ #if defined(__PYX_FORCE_INIT_THREADS) && __PYX_FORCE_INIT_THREADS #ifdef WITH_THREAD /* Python build with threading support? */ PyEval_InitThreads(); #endif #endif /*--- Module creation code ---*/ #if PY_MAJOR_VERSION < 3 __pyx_m = Py_InitModule4(__Pyx_NAMESTR("_pairwise_distance"), __pyx_methods, 0, 0, PYTHON_API_VERSION); #else __pyx_m = PyModule_Create(&__pyx_moduledef); #endif if (!__pyx_m) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; #if PY_MAJOR_VERSION < 3 Py_INCREF(__pyx_m); #endif __pyx_b = PyImport_AddModule(__Pyx_NAMESTR(__Pyx_BUILTIN_MODULE_NAME)); if (!__pyx_b) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; if (__Pyx_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; /*--- Initialize various global constants etc. ---*/ if (unlikely(__Pyx_InitGlobals() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__pyx_module_is_main_cogent__evolve___pairwise_distance) { if (__Pyx_SetAttrString(__pyx_m, "__name__", __pyx_n_s____main__) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; } /*--- Builtin init code ---*/ if (unlikely(__Pyx_InitCachedBuiltins() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Constants init code ---*/ if (unlikely(__Pyx_InitCachedConstants() < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Global init code ---*/ /*--- Variable export code ---*/ /*--- Function export code ---*/ /*--- Type init code ---*/ /*--- Type import code ---*/ __pyx_ptype_5numpy_dtype = __Pyx_ImportType("numpy", "dtype", sizeof(PyArray_Descr), 0); if (unlikely(!__pyx_ptype_5numpy_dtype)) 
{__pyx_filename = __pyx_f[1]; __pyx_lineno = 154; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_flatiter = __Pyx_ImportType("numpy", "flatiter", sizeof(PyArrayIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_flatiter)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 164; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_broadcast = __Pyx_ImportType("numpy", "broadcast", sizeof(PyArrayMultiIterObject), 0); if (unlikely(!__pyx_ptype_5numpy_broadcast)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 168; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ndarray = __Pyx_ImportType("numpy", "ndarray", sizeof(PyArrayObject), 0); if (unlikely(!__pyx_ptype_5numpy_ndarray)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 177; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_ptype_5numpy_ufunc = __Pyx_ImportType("numpy", "ufunc", sizeof(PyUFuncObject), 0); if (unlikely(!__pyx_ptype_5numpy_ufunc)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 860; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Variable import code ---*/ /*--- Function import code ---*/ /*--- Execution code ---*/ /* "cogent/evolve/_pairwise_distance.pyx":3 * cimport numpy as np * * __version__ = "('1', '5', '3')" # <<<<<<<<<<<<<< * * # fills in a diversity matrix from sequences of integers */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s____version__, ((PyObject *)__pyx_kp_s_13)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 3; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_pairwise_distance.pyx":6 * * # fills in a diversity matrix from sequences of integers * def _fill_diversity_matrix(np.ndarray[np.float64_t, ndim=2] matrix, np.ndarray[np.int32_t, ndim=1] seq1, np.ndarray[np.int32_t, ndim=1] seq2): # <<<<<<<<<<<<<< * """fills the diversity matrix for valid positions. 
* */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_6cogent_6evolve_18_pairwise_distance_1_fill_diversity_matrix, NULL, __pyx_n_s_18); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s_16, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "cogent/evolve/_pairwise_distance.pyx":1 * cimport numpy as np # <<<<<<<<<<<<<< * * __version__ = "('1', '5', '3')" */ __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_1)); if (PyObject_SetAttr(__pyx_m, __pyx_n_s____test__, ((PyObject *)__pyx_t_1)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; /* "numpy.pxd":974 * arr.base = baseptr * * cdef inline object get_array_base(ndarray arr): # <<<<<<<<<<<<<< * if arr.base is NULL: * return None */ goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); if (__pyx_m) { __Pyx_AddTraceback("init cogent.evolve._pairwise_distance", __pyx_clineno, __pyx_lineno, __pyx_filename); Py_DECREF(__pyx_m); __pyx_m = 0; } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_ImportError, "init cogent.evolve._pairwise_distance"); } __pyx_L0:; __Pyx_RefNannyFinishContext(); #if PY_MAJOR_VERSION < 3 return; #else return __pyx_m; #endif } /* Runtime support code */ #if CYTHON_REFNANNY static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname) { PyObject *m = NULL, *p = NULL; void *r = NULL; m = PyImport_ImportModule((char *)modname); if (!m) goto end; p = PyObject_GetAttrString(m, (char *)"RefNannyAPI"); if (!p) goto end; r = PyLong_AsVoidPtr(p); end: Py_XDECREF(p); Py_XDECREF(m); return (__Pyx_RefNannyAPIStruct *)r; } 
#endif /* CYTHON_REFNANNY */ static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name) { PyObject *result; result = PyObject_GetAttr(dict, name); if (!result) { if (dict != __pyx_b) { PyErr_Clear(); result = PyObject_GetAttr(__pyx_b, name); } if (!result) { PyErr_SetObject(PyExc_NameError, name); } } return result; } static void __Pyx_RaiseArgtupleInvalid( const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found) { Py_ssize_t num_expected; const char *more_or_less; if (num_found < num_min) { num_expected = num_min; more_or_less = "at least"; } else { num_expected = num_max; more_or_less = "at most"; } if (exact) { more_or_less = "exactly"; } PyErr_Format(PyExc_TypeError, "%s() takes %s %"PY_FORMAT_SIZE_T"d positional argument%s (%"PY_FORMAT_SIZE_T"d given)", func_name, more_or_less, num_expected, (num_expected == 1) ? "" : "s", num_found); } static void __Pyx_RaiseDoubleKeywordsError( const char* func_name, PyObject* kw_name) { PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION >= 3 "%s() got multiple values for keyword argument '%U'", func_name, kw_name); #else "%s() got multiple values for keyword argument '%s'", func_name, PyString_AS_STRING(kw_name)); #endif } static int __Pyx_ParseOptionalKeywords( PyObject *kwds, PyObject **argnames[], PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, const char* function_name) { PyObject *key = 0, *value = 0; Py_ssize_t pos = 0; PyObject*** name; PyObject*** first_kw_arg = argnames + num_pos_args; while (PyDict_Next(kwds, &pos, &key, &value)) { name = first_kw_arg; while (*name && (**name != key)) name++; if (*name) { values[name-argnames] = value; } else { #if PY_MAJOR_VERSION < 3 if (unlikely(!PyString_CheckExact(key)) && unlikely(!PyString_Check(key))) { #else if (unlikely(!PyUnicode_Check(key))) { #endif goto invalid_keyword_type; } else { for (name = first_kw_arg; *name; name++) { #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && 
PyUnicode_Compare(**name, key) == 0) break; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) break; #endif } if (*name) { values[name-argnames] = value; } else { for (name=argnames; name != first_kw_arg; name++) { if (**name == key) goto arg_passed_twice; #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) goto arg_passed_twice; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) goto arg_passed_twice; #endif } if (kwds2) { if (unlikely(PyDict_SetItem(kwds2, key, value))) goto bad; } else { goto invalid_keyword; } } } } } return 0; arg_passed_twice: __Pyx_RaiseDoubleKeywordsError(function_name, **name); goto bad; invalid_keyword_type: PyErr_Format(PyExc_TypeError, "%s() keywords must be strings", function_name); goto bad; invalid_keyword: PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION < 3 "%s() got an unexpected keyword argument '%s'", function_name, PyString_AsString(key)); #else "%s() got an unexpected keyword argument '%U'", function_name, key); #endif bad: return -1; } static int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *type, int none_allowed, const char *name, int exact) { if (!type) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (none_allowed && obj == Py_None) return 1; else if (exact) { if (Py_TYPE(obj) == type) return 1; } else { if (PyObject_TypeCheck(obj, type)) return 1; } PyErr_Format(PyExc_TypeError, "Argument '%s' has incorrect type (expected %s, got %s)", name, type->tp_name, Py_TYPE(obj)->tp_name); return 0; } static CYTHON_INLINE int __Pyx_IsLittleEndian(void) { unsigned int n = 1; return *(unsigned char*)(&n) != 0; } static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx, __Pyx_BufFmt_StackElem* stack, __Pyx_TypeInfo* type) { stack[0].field = &ctx->root; stack[0].parent_offset = 0; ctx->root.type = type; ctx->root.name = "buffer dtype"; ctx->root.offset = 0; 
ctx->head = stack; ctx->head->field = &ctx->root; ctx->fmt_offset = 0; ctx->head->parent_offset = 0; ctx->new_packmode = '@'; ctx->enc_packmode = '@'; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->is_complex = 0; ctx->is_valid_array = 0; ctx->struct_alignment = 0; while (type->typegroup == 'S') { ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = 0; type = type->fields->type; } } static int __Pyx_BufFmt_ParseNumber(const char** ts) { int count; const char* t = *ts; if (*t < '0' || *t > '9') { return -1; } else { count = *t++ - '0'; while (*t >= '0' && *t < '9') { count *= 10; count += *t++ - '0'; } } *ts = t; return count; } static int __Pyx_BufFmt_ExpectNumber(const char **ts) { int number = __Pyx_BufFmt_ParseNumber(ts); if (number == -1) /* First char was not a digit */ PyErr_Format(PyExc_ValueError,\ "Does not understand character buffer dtype format string ('%c')", **ts); return number; } static void __Pyx_BufFmt_RaiseUnexpectedChar(char ch) { PyErr_Format(PyExc_ValueError, "Unexpected format string character: '%c'", ch); } static const char* __Pyx_BufFmt_DescribeTypeChar(char ch, int is_complex) { switch (ch) { case 'b': return "'char'"; case 'B': return "'unsigned char'"; case 'h': return "'short'"; case 'H': return "'unsigned short'"; case 'i': return "'int'"; case 'I': return "'unsigned int'"; case 'l': return "'long'"; case 'L': return "'unsigned long'"; case 'q': return "'long long'"; case 'Q': return "'unsigned long long'"; case 'f': return (is_complex ? "'complex float'" : "'float'"); case 'd': return (is_complex ? "'complex double'" : "'double'"); case 'g': return (is_complex ? 
"'complex long double'" : "'long double'"); case 'T': return "a struct"; case 'O': return "Python object"; case 'P': return "a pointer"; case 's': case 'p': return "a string"; case 0: return "end"; default: return "unparseable format string"; } } static size_t __Pyx_BufFmt_TypeCharToStandardSize(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return 2; case 'i': case 'I': case 'l': case 'L': return 4; case 'q': case 'Q': return 8; case 'f': return (is_complex ? 8 : 4); case 'd': return (is_complex ? 16 : 8); case 'g': { PyErr_SetString(PyExc_ValueError, "Python does not define a standard format string size for long double ('g').."); return 0; } case 'O': case 'P': return sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static size_t __Pyx_BufFmt_TypeCharToNativeSize(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(short); case 'i': case 'I': return sizeof(int); case 'l': case 'L': return sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(float) * (is_complex ? 2 : 1); case 'd': return sizeof(double) * (is_complex ? 2 : 1); case 'g': return sizeof(long double) * (is_complex ? 
2 : 1); case 'O': case 'P': return sizeof(void*); default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } typedef struct { char c; short x; } __Pyx_st_short; typedef struct { char c; int x; } __Pyx_st_int; typedef struct { char c; long x; } __Pyx_st_long; typedef struct { char c; float x; } __Pyx_st_float; typedef struct { char c; double x; } __Pyx_st_double; typedef struct { char c; long double x; } __Pyx_st_longdouble; typedef struct { char c; void *x; } __Pyx_st_void_p; #ifdef HAVE_LONG_LONG typedef struct { char c; PY_LONG_LONG x; } __Pyx_st_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToAlignment(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_st_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_st_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_st_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_st_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_st_float) - sizeof(float); case 'd': return sizeof(__Pyx_st_double) - sizeof(double); case 'g': return sizeof(__Pyx_st_longdouble) - sizeof(long double); case 'P': case 'O': return sizeof(__Pyx_st_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } /* These are for computing the padding at the end of the struct to align on the first member of the struct. This will probably be the same as above, but we don't have any guarantees. 
*/ typedef struct { short x; char c; } __Pyx_pad_short; typedef struct { int x; char c; } __Pyx_pad_int; typedef struct { long x; char c; } __Pyx_pad_long; typedef struct { float x; char c; } __Pyx_pad_float; typedef struct { double x; char c; } __Pyx_pad_double; typedef struct { long double x; char c; } __Pyx_pad_longdouble; typedef struct { void *x; char c; } __Pyx_pad_void_p; #ifdef HAVE_LONG_LONG typedef struct { PY_LONG_LONG x; char c; } __Pyx_pad_longlong; #endif static size_t __Pyx_BufFmt_TypeCharToPadding(char ch, int is_complex) { switch (ch) { case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1; case 'h': case 'H': return sizeof(__Pyx_pad_short) - sizeof(short); case 'i': case 'I': return sizeof(__Pyx_pad_int) - sizeof(int); case 'l': case 'L': return sizeof(__Pyx_pad_long) - sizeof(long); #ifdef HAVE_LONG_LONG case 'q': case 'Q': return sizeof(__Pyx_pad_longlong) - sizeof(PY_LONG_LONG); #endif case 'f': return sizeof(__Pyx_pad_float) - sizeof(float); case 'd': return sizeof(__Pyx_pad_double) - sizeof(double); case 'g': return sizeof(__Pyx_pad_longdouble) - sizeof(long double); case 'P': case 'O': return sizeof(__Pyx_pad_void_p) - sizeof(void*); default: __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } static char __Pyx_BufFmt_TypeCharToGroup(char ch, int is_complex) { switch (ch) { case 'c': case 'b': case 'h': case 'i': case 'l': case 'q': case 's': case 'p': return 'I'; case 'B': case 'H': case 'I': case 'L': case 'Q': return 'U'; case 'f': case 'd': case 'g': return (is_complex ? 
'C' : 'R'); case 'O': return 'O'; case 'P': return 'P'; default: { __Pyx_BufFmt_RaiseUnexpectedChar(ch); return 0; } } } static void __Pyx_BufFmt_RaiseExpected(__Pyx_BufFmt_Context* ctx) { if (ctx->head == NULL || ctx->head->field == &ctx->root) { const char* expected; const char* quote; if (ctx->head == NULL) { expected = "end"; quote = ""; } else { expected = ctx->head->field->type->name; quote = "'"; } PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected %s%s%s but got %s", quote, expected, quote, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex)); } else { __Pyx_StructField* field = ctx->head->field; __Pyx_StructField* parent = (ctx->head - 1)->field; PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch, expected '%s' but got %s in '%s.%s'", field->type->name, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex), parent->type->name, field->name); } } static int __Pyx_BufFmt_ProcessTypeChunk(__Pyx_BufFmt_Context* ctx) { char group; size_t size, offset, arraysize = 1; if (ctx->enc_type == 0) return 0; if (ctx->head->field->type->arraysize[0]) { int i, ndim = 0; if (ctx->enc_type == 's' || ctx->enc_type == 'p') { ctx->is_valid_array = ctx->head->field->type->ndim == 1; ndim = 1; if (ctx->enc_count != ctx->head->field->type->arraysize[0]) { PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %zu", ctx->head->field->type->arraysize[0], ctx->enc_count); return -1; } } if (!ctx->is_valid_array) { PyErr_Format(PyExc_ValueError, "Expected %d dimensions, got %d", ctx->head->field->type->ndim, ndim); return -1; } for (i = 0; i < ctx->head->field->type->ndim; i++) { arraysize *= ctx->head->field->type->arraysize[i]; } ctx->is_valid_array = 0; ctx->enc_count = 1; } group = __Pyx_BufFmt_TypeCharToGroup(ctx->enc_type, ctx->is_complex); do { __Pyx_StructField* field = ctx->head->field; __Pyx_TypeInfo* type = field->type; if (ctx->enc_packmode == '@' || ctx->enc_packmode == '^') { size = 
__Pyx_BufFmt_TypeCharToNativeSize(ctx->enc_type, ctx->is_complex); } else { size = __Pyx_BufFmt_TypeCharToStandardSize(ctx->enc_type, ctx->is_complex); } if (ctx->enc_packmode == '@') { size_t align_at = __Pyx_BufFmt_TypeCharToAlignment(ctx->enc_type, ctx->is_complex); size_t align_mod_offset; if (align_at == 0) return -1; align_mod_offset = ctx->fmt_offset % align_at; if (align_mod_offset > 0) ctx->fmt_offset += align_at - align_mod_offset; if (ctx->struct_alignment == 0) ctx->struct_alignment = __Pyx_BufFmt_TypeCharToPadding(ctx->enc_type, ctx->is_complex); } if (type->size != size || type->typegroup != group) { if (type->typegroup == 'C' && type->fields != NULL) { size_t parent_offset = ctx->head->parent_offset + field->offset; ++ctx->head; ctx->head->field = type->fields; ctx->head->parent_offset = parent_offset; continue; } __Pyx_BufFmt_RaiseExpected(ctx); return -1; } offset = ctx->head->parent_offset + field->offset; if (ctx->fmt_offset != offset) { PyErr_Format(PyExc_ValueError, "Buffer dtype mismatch; next field is at offset %"PY_FORMAT_SIZE_T"d but %"PY_FORMAT_SIZE_T"d expected", (Py_ssize_t)ctx->fmt_offset, (Py_ssize_t)offset); return -1; } ctx->fmt_offset += size; if (arraysize) ctx->fmt_offset += (arraysize - 1) * size; --ctx->enc_count; /* Consume from buffer string */ while (1) { if (field == &ctx->root) { ctx->head = NULL; if (ctx->enc_count != 0) { __Pyx_BufFmt_RaiseExpected(ctx); return -1; } break; /* breaks both loops as ctx->enc_count == 0 */ } ctx->head->field = ++field; if (field->type == NULL) { --ctx->head; field = ctx->head->field; continue; } else if (field->type->typegroup == 'S') { size_t parent_offset = ctx->head->parent_offset + field->offset; if (field->type->fields->type == NULL) continue; /* empty struct */ field = field->type->fields; ++ctx->head; ctx->head->field = field; ctx->head->parent_offset = parent_offset; break; } else { break; } } } while (ctx->enc_count); ctx->enc_type = 0; ctx->is_complex = 0; return 0; } static 
CYTHON_INLINE PyObject * __pyx_buffmt_parse_array(__Pyx_BufFmt_Context* ctx, const char** tsp) { const char *ts = *tsp; int i = 0, number; int ndim = ctx->head->field->type->ndim; ; ++ts; if (ctx->new_count != 1) { PyErr_SetString(PyExc_ValueError, "Cannot handle repeated arrays in format string"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; while (*ts && *ts != ')') { if (isspace(*ts)) continue; number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; if (i < ndim && (size_t) number != ctx->head->field->type->arraysize[i]) return PyErr_Format(PyExc_ValueError, "Expected a dimension of size %zu, got %d", ctx->head->field->type->arraysize[i], number); if (*ts != ',' && *ts != ')') return PyErr_Format(PyExc_ValueError, "Expected a comma in format string, got '%c'", *ts); if (*ts == ',') ts++; i++; } if (i != ndim) return PyErr_Format(PyExc_ValueError, "Expected %d dimension(s), got %d", ctx->head->field->type->ndim, i); if (!*ts) { PyErr_SetString(PyExc_ValueError, "Unexpected end of format string, expected ')'"); return NULL; } ctx->is_valid_array = 1; ctx->new_count = 1; *tsp = ++ts; return Py_None; } static const char* __Pyx_BufFmt_CheckString(__Pyx_BufFmt_Context* ctx, const char* ts) { int got_Z = 0; while (1) { switch(*ts) { case 0: if (ctx->enc_type != 0 && ctx->head == NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; if (ctx->head != NULL) { __Pyx_BufFmt_RaiseExpected(ctx); return NULL; } return ts; case ' ': case 10: case 13: ++ts; break; case '<': if (!__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Little-endian buffer not supported on big-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '>': case '!': if (__Pyx_IsLittleEndian()) { PyErr_SetString(PyExc_ValueError, "Big-endian buffer not supported on little-endian compiler"); return NULL; } ctx->new_packmode = '='; ++ts; break; case '=': case '@': case '^': 
ctx->new_packmode = *ts++; break; case 'T': /* substruct */ { const char* ts_after_sub; size_t i, struct_count = ctx->new_count; size_t struct_alignment = ctx->struct_alignment; ctx->new_count = 1; ++ts; if (*ts != '{') { PyErr_SetString(PyExc_ValueError, "Buffer acquisition: Expected '{' after 'T'"); return NULL; } if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ ctx->enc_count = 0; ctx->struct_alignment = 0; ++ts; ts_after_sub = ts; for (i = 0; i != struct_count; ++i) { ts_after_sub = __Pyx_BufFmt_CheckString(ctx, ts); if (!ts_after_sub) return NULL; } ts = ts_after_sub; if (struct_alignment) ctx->struct_alignment = struct_alignment; } break; case '}': /* end of substruct; either repeat or move on */ { size_t alignment = ctx->struct_alignment; ++ts; if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_type = 0; /* Erase processed last struct element */ if (alignment && ctx->fmt_offset % alignment) { ctx->fmt_offset += alignment - (ctx->fmt_offset % alignment); } } return ts; case 'x': if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->fmt_offset += ctx->new_count; ctx->new_count = 1; ctx->enc_count = 0; ctx->enc_type = 0; ctx->enc_packmode = ctx->new_packmode; ++ts; break; case 'Z': got_Z = 1; ++ts; if (*ts != 'f' && *ts != 'd' && *ts != 'g') { __Pyx_BufFmt_RaiseUnexpectedChar('Z'); return NULL; } /* fall through */ case 'c': case 'b': case 'B': case 'h': case 'H': case 'i': case 'I': case 'l': case 'L': case 'q': case 'Q': case 'f': case 'd': case 'g': case 'O': case 's': case 'p': if (ctx->enc_type == *ts && got_Z == ctx->is_complex && ctx->enc_packmode == ctx->new_packmode) { ctx->enc_count += ctx->new_count; } else { if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_count = ctx->new_count; ctx->enc_packmode = ctx->new_packmode; ctx->enc_type = *ts; ctx->is_complex = got_Z; } ++ts; ctx->new_count = 1; got_Z = 0; break; case ':': ++ts; while(*ts 
!= ':') ++ts; ++ts; break; case '(': if (!__pyx_buffmt_parse_array(ctx, &ts)) return NULL; break; default: { int number = __Pyx_BufFmt_ExpectNumber(&ts); if (number == -1) return NULL; ctx->new_count = (size_t)number; } } } } static CYTHON_INLINE void __Pyx_ZeroBuffer(Py_buffer* buf) { buf->buf = NULL; buf->obj = NULL; buf->strides = __Pyx_zeros; buf->shape = __Pyx_zeros; buf->suboffsets = __Pyx_minusones; } static CYTHON_INLINE int __Pyx_GetBufferAndValidate( Py_buffer* buf, PyObject* obj, __Pyx_TypeInfo* dtype, int flags, int nd, int cast, __Pyx_BufFmt_StackElem* stack) { if (obj == Py_None || obj == NULL) { __Pyx_ZeroBuffer(buf); return 0; } buf->buf = NULL; if (__Pyx_GetBuffer(obj, buf, flags) == -1) goto fail; if (buf->ndim != nd) { PyErr_Format(PyExc_ValueError, "Buffer has wrong number of dimensions (expected %d, got %d)", nd, buf->ndim); goto fail; } if (!cast) { __Pyx_BufFmt_Context ctx; __Pyx_BufFmt_Init(&ctx, stack, dtype); if (!__Pyx_BufFmt_CheckString(&ctx, buf->format)) goto fail; } if ((unsigned)buf->itemsize != dtype->size) { PyErr_Format(PyExc_ValueError, "Item size of buffer (%"PY_FORMAT_SIZE_T"d byte%s) does not match size of '%s' (%"PY_FORMAT_SIZE_T"d byte%s)", buf->itemsize, (buf->itemsize > 1) ? "s" : "", dtype->name, (Py_ssize_t)dtype->size, (dtype->size > 1) ? 
"s" : ""); goto fail; } if (buf->suboffsets == NULL) buf->suboffsets = __Pyx_minusones; return 0; fail:; __Pyx_ZeroBuffer(buf); return -1; } static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info) { if (info->buf == NULL) return; if (info->suboffsets == __Pyx_minusones) info->suboffsets = NULL; __Pyx_ReleaseBuffer(info); } static void __Pyx_RaiseBufferIndexError(int axis) { PyErr_Format(PyExc_IndexError, "Out of bounds on buffer access (axis %d)", axis); } static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb) { #if CYTHON_COMPILING_IN_CPYTHON PyObject *tmp_type, *tmp_value, *tmp_tb; PyThreadState *tstate = PyThreadState_GET(); tmp_type = tstate->curexc_type; tmp_value = tstate->curexc_value; tmp_tb = tstate->curexc_traceback; tstate->curexc_type = type; tstate->curexc_value = value; tstate->curexc_traceback = tb; Py_XDECREF(tmp_type); Py_XDECREF(tmp_value); Py_XDECREF(tmp_tb); #else PyErr_Restore(type, value, tb); #endif } static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb) { #if CYTHON_COMPILING_IN_CPYTHON PyThreadState *tstate = PyThreadState_GET(); *type = tstate->curexc_type; *value = tstate->curexc_value; *tb = tstate->curexc_traceback; tstate->curexc_type = 0; tstate->curexc_value = 0; tstate->curexc_traceback = 0; #else PyErr_Fetch(type, value, tb); #endif } #if PY_MAJOR_VERSION < 3 static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, CYTHON_UNUSED PyObject *cause) { Py_XINCREF(type); Py_XINCREF(value); Py_XINCREF(tb); if (tb == Py_None) { Py_DECREF(tb); tb = 0; } else if (tb != NULL && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto raise_error; } if (value == NULL) { value = Py_None; Py_INCREF(value); } #if PY_VERSION_HEX < 0x02050000 if (!PyClass_Check(type)) #else if (!PyType_Check(type)) #endif { if (value != Py_None) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a 
separate value"); goto raise_error; } Py_DECREF(value); value = type; #if PY_VERSION_HEX < 0x02050000 if (PyInstance_Check(type)) { type = (PyObject*) ((PyInstanceObject*)type)->in_class; Py_INCREF(type); } else { type = 0; PyErr_SetString(PyExc_TypeError, "raise: exception must be an old-style class or instance"); goto raise_error; } #else type = (PyObject*) Py_TYPE(type); Py_INCREF(type); if (!PyType_IsSubtype((PyTypeObject *)type, (PyTypeObject *)PyExc_BaseException)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto raise_error; } #endif } __Pyx_ErrRestore(type, value, tb); return; raise_error: Py_XDECREF(value); Py_XDECREF(type); Py_XDECREF(tb); return; } #else /* Python 3+ */ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause) { if (tb == Py_None) { tb = 0; } else if (tb && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto bad; } if (value == Py_None) value = 0; if (PyExceptionInstance_Check(type)) { if (value) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto bad; } value = type; type = (PyObject*) Py_TYPE(value); } else if (!PyExceptionClass_Check(type)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto bad; } if (cause) { PyObject *fixed_cause; if (PyExceptionClass_Check(cause)) { fixed_cause = PyObject_CallObject(cause, NULL); if (fixed_cause == NULL) goto bad; } else if (PyExceptionInstance_Check(cause)) { fixed_cause = cause; Py_INCREF(fixed_cause); } else { PyErr_SetString(PyExc_TypeError, "exception causes must derive from " "BaseException"); goto bad; } if (!value) { value = PyObject_CallObject(type, NULL); } PyException_SetCause(value, fixed_cause); } PyErr_SetObject(type, value); if (tb) { PyThreadState *tstate = PyThreadState_GET(); PyObject* tmp_tb = tstate->curexc_traceback; if (tb != tmp_tb) { 
Py_INCREF(tb); tstate->curexc_traceback = tb; Py_XDECREF(tmp_tb); } } bad: return; } #endif static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index) { PyErr_Format(PyExc_ValueError, "need more than %"PY_FORMAT_SIZE_T"d value%s to unpack", index, (index == 1) ? "" : "s"); } static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected) { PyErr_Format(PyExc_ValueError, "too many values to unpack (expected %"PY_FORMAT_SIZE_T"d)", expected); } static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void) { PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable"); } static void __Pyx_UnpackTupleError(PyObject *t, Py_ssize_t index) { if (t == Py_None) { __Pyx_RaiseNoneNotIterableError(); } else if (PyTuple_GET_SIZE(t) < index) { __Pyx_RaiseNeedMoreValuesError(PyTuple_GET_SIZE(t)); } else { __Pyx_RaiseTooManyValuesError(index); } } static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type) { if (unlikely(!type)) { PyErr_Format(PyExc_SystemError, "Missing type object"); return 0; } if (likely(PyObject_TypeCheck(obj, type))) return 1; PyErr_Format(PyExc_TypeError, "Cannot convert %.200s to %.200s", Py_TYPE(obj)->tp_name, type->tp_name); return 0; } #if PY_MAJOR_VERSION < 3 static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags) { PyObject *getbuffer_cobj; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) return PyObject_GetBuffer(obj, view, flags); #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) return __pyx_pw_5numpy_7ndarray_1__getbuffer__(obj, view, flags); #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (getbuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_getbuffer"))) { getbufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (getbufferproc) PyCapsule_GetPointer(getbuffer_cobj, "getbuffer(obj, view, flags)"); #else func = (getbufferproc) PyCObject_AsVoidPtr(getbuffer_cobj); 
#endif Py_DECREF(getbuffer_cobj); if (!func) goto fail; return func(obj, view, flags); } else { PyErr_Clear(); } #endif PyErr_Format(PyExc_TypeError, "'%100s' does not have the buffer interface", Py_TYPE(obj)->tp_name); #if PY_VERSION_HEX < 0x02060000 fail: #endif return -1; } static void __Pyx_ReleaseBuffer(Py_buffer *view) { PyObject *obj = view->obj; PyObject *releasebuffer_cobj; if (!obj) return; #if PY_VERSION_HEX >= 0x02060000 if (PyObject_CheckBuffer(obj)) { PyBuffer_Release(view); return; } #endif if (PyObject_TypeCheck(obj, __pyx_ptype_5numpy_ndarray)) { __pyx_pw_5numpy_7ndarray_3__releasebuffer__(obj, view); return; } #if PY_VERSION_HEX < 0x02060000 if (obj->ob_type->tp_dict && (releasebuffer_cobj = PyMapping_GetItemString(obj->ob_type->tp_dict, "__pyx_releasebuffer"))) { releasebufferproc func; #if PY_VERSION_HEX >= 0x02070000 && !(PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION == 0) func = (releasebufferproc) PyCapsule_GetPointer(releasebuffer_cobj, "releasebuffer(obj, view)"); #else func = (releasebufferproc) PyCObject_AsVoidPtr(releasebuffer_cobj); #endif Py_DECREF(releasebuffer_cobj); if (!func) goto fail; func(obj, view); return; } else { PyErr_Clear(); } #endif goto nofail; #if PY_VERSION_HEX < 0x02060000 fail: #endif PyErr_WriteUnraisable(obj); nofail: Py_DECREF(obj); view->obj = NULL; } #endif /* PY_MAJOR_VERSION < 3 */ #if CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return ::std::complex< float >(x, y); } #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { return x + y*(__pyx_t_float_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) { __pyx_t_float_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex a, __pyx_t_float_complex b) { return (a.real == 
b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_complex a) { __pyx_t_float_complex z; z.real = a.real; z.imag = -a.imag; return z; } #if 1 static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrtf(z.real*z.real + z.imag*z.imag); #else return hypotf(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_complex a, __pyx_t_float_complex b) { __pyx_t_float_complex z; float r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { float denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 
0; return z; case 1: return a; case 2: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(a, a); case 3: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, a); case 4: z = __Pyx_c_prodf(a, a); return __Pyx_c_prodf(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_absf(a); theta = atan2f(a.imag, a.real); } lnr = logf(r); z_r = expf(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cosf(z_theta); z.imag = z_r * sinf(z_theta); return z; } #endif #endif #if CYTHON_CCOMPLEX #ifdef __cplusplus static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return ::std::complex< double >(x, y); } #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { return x + y*(__pyx_t_double_complex)_Complex_I; } #endif #else static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) { __pyx_t_double_complex z; z.real = x; z.imag = y; return z; } #endif #if CYTHON_CCOMPLEX #else static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex a, __pyx_t_double_complex b) { return (a.real == b.real) && (a.imag == b.imag); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real + b.real; z.imag = a.imag + b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real - b.real; z.imag = a.imag - b.imag; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; z.real = a.real * b.real - a.imag * b.imag; z.imag = a.real * b.imag + a.imag * b.real; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex 
z; double denom = b.real * b.real + b.imag * b.imag; z.real = (a.real * b.real + a.imag * b.imag) / denom; z.imag = (a.imag * b.real - a.real * b.imag) / denom; return z; } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = -a.real; z.imag = -a.imag; return z; } static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex a) { return (a.real == 0) && (a.imag == 0); } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_complex a) { __pyx_t_double_complex z; z.real = a.real; z.imag = -a.imag; return z; } #if 1 static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex z) { #if !defined(HAVE_HYPOT) || defined(_MSC_VER) return sqrt(z.real*z.real + z.imag*z.imag); #else return hypot(z.real, z.imag); #endif } static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_complex a, __pyx_t_double_complex b) { __pyx_t_double_complex z; double r, lnr, theta, z_r, z_theta; if (b.imag == 0 && b.real == (int)b.real) { if (b.real < 0) { double denom = a.real * a.real + a.imag * a.imag; a.real = a.real / denom; a.imag = -a.imag / denom; b.real = -b.real; } switch ((int)b.real) { case 0: z.real = 1; z.imag = 0; return z; case 1: return a; case 2: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(a, a); case 3: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(z, a); case 4: z = __Pyx_c_prod(a, a); return __Pyx_c_prod(z, z); } } if (a.imag == 0) { if (a.real == 0) { return a; } r = a.real; theta = 0; } else { r = __Pyx_c_abs(a); theta = atan2(a.imag, a.real); } lnr = log(r); z_r = exp(lnr * b.real - theta * b.imag); z_theta = theta * b.real + lnr * b.imag; z.real = z_r * cos(z_theta); z.imag = z_r * sin(z_theta); return z; } #endif #endif static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject* x) { const unsigned char neg_one = (unsigned char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned char) < sizeof(long)) { long val = 
__Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned char" : "value too large to convert to unsigned char"); } return (unsigned char)-1; } return (unsigned char)val; } return (unsigned char)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject* x) { const unsigned short neg_one = (unsigned short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned short" : "value too large to convert to unsigned short"); } return (unsigned short)-1; } return (unsigned short)val; } return (unsigned short)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject* x) { const unsigned int neg_one = (unsigned int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to unsigned int" : "value too large to convert to unsigned int"); } return (unsigned int)-1; } return (unsigned int)val; } return (unsigned int)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject* x) { const char neg_one = (char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to char" : "value too large to convert to char"); } return (char)-1; } return (char)val; } return (char)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject* x) { const short neg_one = (short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to short" : "value too large to convert to short"); } return (short)-1; } return (short)val; } return (short)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject* x) { const signed char neg_one = (signed char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed char" : "value too large to convert to signed char"); } return (signed char)-1; } return (signed char)val; } return (signed char)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject* x) { const signed short neg_one = (signed short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed short" : "value too large to convert to signed short"); } return (signed short)-1; } return (signed short)val; } return (signed short)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject* x) { const signed int neg_one = (signed int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to signed int" : "value too large to convert to signed int"); } return (signed int)-1; } return (signed int)val; } return (signed int)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject* x) { const unsigned long neg_one = (unsigned long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)PyLong_AsUnsignedLong(x); } else { return (unsigned long)PyLong_AsLong(x); } } else { unsigned long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned long)-1; val = __Pyx_PyInt_AsUnsignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject* x) { const unsigned PY_LONG_LONG neg_one = (unsigned PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if 
(is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (unsigned PY_LONG_LONG)PyLong_AsLongLong(x); } } else { unsigned PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned PY_LONG_LONG)-1; val = __Pyx_PyInt_AsUnsignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject* x) { const long neg_one = (long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)PyLong_AsUnsignedLong(x); } else { return (long)PyLong_AsLong(x); } } else { long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (long)-1; val = __Pyx_PyInt_AsLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject* x) { const PY_LONG_LONG neg_one = (PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return 
(PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (PY_LONG_LONG)PyLong_AsLongLong(x); } } else { PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1; val = __Pyx_PyInt_AsLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject* x) { const signed long neg_one = (signed long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)PyLong_AsUnsignedLong(x); } else { return (signed long)PyLong_AsLong(x); } } else { signed long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed long)-1; val = __Pyx_PyInt_AsSignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject* x) { const signed PY_LONG_LONG neg_one = (signed PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if 
(unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (signed PY_LONG_LONG)PyLong_AsLongLong(x); } } else { signed PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed PY_LONG_LONG)-1; val = __Pyx_PyInt_AsSignedLongLong(tmp); Py_DECREF(tmp); return val; } } static int __Pyx_check_binary_version(void) { char ctversion[4], rtversion[4]; PyOS_snprintf(ctversion, 4, "%d.%d", PY_MAJOR_VERSION, PY_MINOR_VERSION); PyOS_snprintf(rtversion, 4, "%s", Py_GetVersion()); if (ctversion[0] != rtversion[0] || ctversion[2] != rtversion[2]) { char message[200]; PyOS_snprintf(message, sizeof(message), "compiletime version %s of module '%.100s' " "does not match runtime version %s", ctversion, __Pyx_MODULE_NAME, rtversion); #if PY_VERSION_HEX < 0x02050000 return PyErr_Warn(NULL, message); #else return PyErr_WarnEx(NULL, message, 1); #endif } return 0; } #ifndef __PYX_HAVE_RT_ImportType #define __PYX_HAVE_RT_ImportType static PyTypeObject *__Pyx_ImportType(const char *module_name, const char *class_name, size_t size, int strict) { PyObject *py_module = 0; PyObject *result = 0; PyObject *py_name = 0; char warning[200]; py_module = __Pyx_ImportModule(module_name); if (!py_module) goto bad; py_name = __Pyx_PyIdentifier_FromString(class_name); if (!py_name) goto bad; result = PyObject_GetAttr(py_module, py_name); Py_DECREF(py_name); py_name = 0; Py_DECREF(py_module); py_module = 0; if (!result) goto bad; if (!PyType_Check(result)) { PyErr_Format(PyExc_TypeError, "%s.%s is not a type object", module_name, class_name); goto bad; } if (!strict && (size_t)((PyTypeObject *)result)->tp_basicsize > size) { PyOS_snprintf(warning, sizeof(warning), "%s.%s size changed, may indicate binary incompatibility", module_name, class_name); #if PY_VERSION_HEX < 0x02050000 if (PyErr_Warn(NULL, warning) < 0) 
goto bad; #else if (PyErr_WarnEx(NULL, warning, 0) < 0) goto bad; #endif } else if ((size_t)((PyTypeObject *)result)->tp_basicsize != size) { PyErr_Format(PyExc_ValueError, "%s.%s has the wrong size, try recompiling", module_name, class_name); goto bad; } return (PyTypeObject *)result; bad: Py_XDECREF(py_module); Py_XDECREF(result); return NULL; } #endif #ifndef __PYX_HAVE_RT_ImportModule #define __PYX_HAVE_RT_ImportModule static PyObject *__Pyx_ImportModule(const char *name) { PyObject *py_name = 0; PyObject *py_module = 0; py_name = __Pyx_PyIdentifier_FromString(name); if (!py_name) goto bad; py_module = PyImport_Import(py_name); Py_DECREF(py_name); return py_module; bad: Py_XDECREF(py_name); return 0; } #endif static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line) { int start = 0, mid = 0, end = count - 1; if (end >= 0 && code_line > entries[end].code_line) { return count; } while (start < end) { mid = (start + end) / 2; if (code_line < entries[mid].code_line) { end = mid; } else if (code_line > entries[mid].code_line) { start = mid + 1; } else { return mid; } } if (code_line <= entries[mid].code_line) { return mid; } else { return mid + 1; } } static PyCodeObject *__pyx_find_code_object(int code_line) { PyCodeObject* code_object; int pos; if (unlikely(!code_line) || unlikely(!__pyx_code_cache.entries)) { return NULL; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if (unlikely(pos >= __pyx_code_cache.count) || unlikely(__pyx_code_cache.entries[pos].code_line != code_line)) { return NULL; } code_object = __pyx_code_cache.entries[pos].code_object; Py_INCREF(code_object); return code_object; } static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object) { int pos, i; __Pyx_CodeObjectCacheEntry* entries = __pyx_code_cache.entries; if (unlikely(!code_line)) { return; } if (unlikely(!entries)) { entries = 
(__Pyx_CodeObjectCacheEntry*)PyMem_Malloc(64*sizeof(__Pyx_CodeObjectCacheEntry)); if (likely(entries)) { __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = 64; __pyx_code_cache.count = 1; entries[0].code_line = code_line; entries[0].code_object = code_object; Py_INCREF(code_object); } return; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if ((pos < __pyx_code_cache.count) && unlikely(__pyx_code_cache.entries[pos].code_line == code_line)) { PyCodeObject* tmp = entries[pos].code_object; entries[pos].code_object = code_object; Py_DECREF(tmp); return; } if (__pyx_code_cache.count == __pyx_code_cache.max_count) { int new_max = __pyx_code_cache.max_count + 64; entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Realloc( __pyx_code_cache.entries, new_max*sizeof(__Pyx_CodeObjectCacheEntry)); if (unlikely(!entries)) { return; } __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = new_max; } for (i=__pyx_code_cache.count; i>pos; i--) { entries[i] = entries[i-1]; } entries[pos].code_line = code_line; entries[pos].code_object = code_object; __pyx_code_cache.count++; Py_INCREF(code_object); } #include "compile.h" #include "frameobject.h" #include "traceback.h" static PyCodeObject* __Pyx_CreateCodeObjectForTraceback( const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_srcfile = 0; PyObject *py_funcname = 0; #if PY_MAJOR_VERSION < 3 py_srcfile = PyString_FromString(filename); #else py_srcfile = PyUnicode_FromString(filename); #endif if (!py_srcfile) goto bad; if (c_line) { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #else py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #endif } else { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromString(funcname); #else py_funcname = PyUnicode_FromString(funcname); #endif } if (!py_funcname) goto bad; py_code = 
__Pyx_PyCode_New( 0, /*int argcount,*/ 0, /*int kwonlyargcount,*/ 0, /*int nlocals,*/ 0, /*int stacksize,*/ 0, /*int flags,*/ __pyx_empty_bytes, /*PyObject *code,*/ __pyx_empty_tuple, /*PyObject *consts,*/ __pyx_empty_tuple, /*PyObject *names,*/ __pyx_empty_tuple, /*PyObject *varnames,*/ __pyx_empty_tuple, /*PyObject *freevars,*/ __pyx_empty_tuple, /*PyObject *cellvars,*/ py_srcfile, /*PyObject *filename,*/ py_funcname, /*PyObject *name,*/ py_line, /*int firstlineno,*/ __pyx_empty_bytes /*PyObject *lnotab*/ ); Py_DECREF(py_srcfile); Py_DECREF(py_funcname); return py_code; bad: Py_XDECREF(py_srcfile); Py_XDECREF(py_funcname); return NULL; } static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_globals = 0; PyFrameObject *py_frame = 0; py_code = __pyx_find_code_object(c_line ? c_line : py_line); if (!py_code) { py_code = __Pyx_CreateCodeObjectForTraceback( funcname, c_line, py_line, filename); if (!py_code) goto bad; __pyx_insert_code_object(c_line ? 
c_line : py_line, py_code); } py_globals = PyModule_GetDict(__pyx_m); if (!py_globals) goto bad; py_frame = PyFrame_New( PyThreadState_GET(), /*PyThreadState *tstate,*/ py_code, /*PyCodeObject *code,*/ py_globals, /*PyObject *globals,*/ 0 /*PyObject *locals*/ ); if (!py_frame) goto bad; py_frame->f_lineno = py_line; PyTraceBack_Here(py_frame); bad: Py_XDECREF(py_code); Py_XDECREF(py_frame); } static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) { while (t->p) { #if PY_MAJOR_VERSION < 3 if (t->is_unicode) { *t->p = PyUnicode_DecodeUTF8(t->s, t->n - 1, NULL); } else if (t->intern) { *t->p = PyString_InternFromString(t->s); } else { *t->p = PyString_FromStringAndSize(t->s, t->n - 1); } #else /* Python 3+ has unicode identifiers */ if (t->is_unicode | t->is_str) { if (t->intern) { *t->p = PyUnicode_InternFromString(t->s); } else if (t->encoding) { *t->p = PyUnicode_Decode(t->s, t->n - 1, t->encoding, NULL); } else { *t->p = PyUnicode_FromStringAndSize(t->s, t->n - 1); } } else { *t->p = PyBytes_FromStringAndSize(t->s, t->n - 1); } #endif if (!*t->p) return -1; ++t; } return 0; } /* Type Conversion Functions */ static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) { int is_true = x == Py_True; if (is_true | (x == Py_False) | (x == Py_None)) return is_true; else return PyObject_IsTrue(x); } static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x) { PyNumberMethods *m; const char *name = NULL; PyObject *res = NULL; #if PY_VERSION_HEX < 0x03000000 if (PyInt_Check(x) || PyLong_Check(x)) #else if (PyLong_Check(x)) #endif return Py_INCREF(x), x; m = Py_TYPE(x)->tp_as_number; #if PY_VERSION_HEX < 0x03000000 if (m && m->nb_int) { name = "int"; res = PyNumber_Int(x); } else if (m && m->nb_long) { name = "long"; res = PyNumber_Long(x); } #else if (m && m->nb_int) { name = "int"; res = PyNumber_Long(x); } #endif if (res) { #if PY_VERSION_HEX < 0x03000000 if (!PyInt_Check(res) && !PyLong_Check(res)) { #else if (!PyLong_Check(res)) { #endif PyErr_Format(PyExc_TypeError, 
"__%s__ returned non-%s (type %.200s)", name, name, Py_TYPE(res)->tp_name); Py_DECREF(res); return NULL; } } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_TypeError, "an integer is required"); } return res; } static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) { Py_ssize_t ival; PyObject* x = PyNumber_Index(b); if (!x) return -1; ival = PyInt_AsSsize_t(x); Py_DECREF(x); return ival; } static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) { #if PY_VERSION_HEX < 0x02050000 if (ival <= LONG_MAX) return PyInt_FromLong((long)ival); else { unsigned char *bytes = (unsigned char *) &ival; int one = 1; int little = (int)*(unsigned char*)&one; return _PyLong_FromByteArray(bytes, sizeof(size_t), little, 0); } #else return PyInt_FromSize_t(ival); #endif } static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject* x) { unsigned PY_LONG_LONG val = __Pyx_PyInt_AsUnsignedLongLong(x); if (unlikely(val == (unsigned PY_LONG_LONG)-1 && PyErr_Occurred())) { return (size_t)-1; } else if (unlikely(val != (unsigned PY_LONG_LONG)(size_t)val)) { PyErr_SetString(PyExc_OverflowError, "value too large to convert to size_t"); return (size_t)-1; } return (size_t)val; } #endif /* Py_PYTHON_H */
PyCogent-1.5.3/cogent/evolve/_pairwise_distance.pyx000644 000765 000024 00000001217 12024702176 023417 0ustar00jrideoutstaff000000 000000
cimport numpy as np

__version__ = "('1', '5', '3')"

# fills in a diversity matrix from sequences of integers
def _fill_diversity_matrix(np.ndarray[np.float64_t, ndim=2] matrix,
        np.ndarray[np.int32_t, ndim=1] seq1,
        np.ndarray[np.int32_t, ndim=1] seq2):
    """fills the diversity matrix for valid positions.
    Assumes the provided sequences have been converted to indices with
    invalid characters being negative numbers (use get_moltype_index_array
    plus seq_to_indices)."""
    cdef int i
    for i in range(len(seq1)):
        if seq1[i] < 0 or seq2[i] < 0:
            continue
        matrix[seq1[i], seq2[i]] += 1.0
PyCogent-1.5.3/cogent/evolve/_solved_models.c000644 000765 000024 00000435533 12014704604 022174 0ustar00jrideoutstaff000000 000000
/* Generated by Cython 0.16 on Tue Aug 21 23:07:02 2012 */ #define PY_SSIZE_T_CLEAN #include "Python.h" #ifndef Py_PYTHON_H #error Python headers needed to compile C extensions, please install development version of Python. #elif PY_VERSION_HEX < 0x02040000 #error Cython requires Python 2.4+. #else #include <stddef.h> /* For offsetof */ #ifndef offsetof #define offsetof(type, member) ( (size_t) & ((type*)0) -> member ) #endif #if !defined(WIN32) && !defined(MS_WINDOWS) #ifndef __stdcall #define __stdcall #endif #ifndef __cdecl #define __cdecl #endif #ifndef __fastcall #define __fastcall #endif #endif #ifndef DL_IMPORT #define DL_IMPORT(t) t #endif #ifndef DL_EXPORT #define DL_EXPORT(t) t #endif #ifndef PY_LONG_LONG #define PY_LONG_LONG LONG_LONG #endif #ifndef Py_HUGE_VAL #define Py_HUGE_VAL HUGE_VAL #endif #ifdef PYPY_VERSION #define CYTHON_COMPILING_IN_PYPY 1 #define CYTHON_COMPILING_IN_CPYTHON 0 #else #define CYTHON_COMPILING_IN_PYPY 0 #define CYTHON_COMPILING_IN_CPYTHON 1 #endif #if CYTHON_COMPILING_IN_PYPY #define __Pyx_PyCFunction_Call PyObject_Call #else #define __Pyx_PyCFunction_Call PyCFunction_Call #endif #if PY_VERSION_HEX < 0x02050000 typedef int Py_ssize_t; #define PY_SSIZE_T_MAX INT_MAX #define PY_SSIZE_T_MIN INT_MIN #define PY_FORMAT_SIZE_T "" #define PyInt_FromSsize_t(z) PyInt_FromLong(z) #define PyInt_AsSsize_t(o) __Pyx_PyInt_AsInt(o) #define PyNumber_Index(o) PyNumber_Int(o) #define PyIndex_Check(o) PyNumber_Check(o) #define PyErr_WarnEx(category, message, stacklevel) PyErr_Warn(category, message) #define __PYX_BUILD_PY_SSIZE_T "i" #else #define
__PYX_BUILD_PY_SSIZE_T "n" #endif #if PY_VERSION_HEX < 0x02060000 #define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt) #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type) #define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size) #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size, #define PyType_Modified(t) typedef struct { void *buf; PyObject *obj; Py_ssize_t len; Py_ssize_t itemsize; int readonly; int ndim; char *format; Py_ssize_t *shape; Py_ssize_t *strides; Py_ssize_t *suboffsets; void *internal; } Py_buffer; #define PyBUF_SIMPLE 0 #define PyBUF_WRITABLE 0x0001 #define PyBUF_FORMAT 0x0004 #define PyBUF_ND 0x0008 #define PyBUF_STRIDES (0x0010 | PyBUF_ND) #define PyBUF_C_CONTIGUOUS (0x0020 | PyBUF_STRIDES) #define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES) #define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES) #define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES) #define PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_FORMAT | PyBUF_WRITABLE) #define PyBUF_FULL (PyBUF_INDIRECT | PyBUF_FORMAT | PyBUF_WRITABLE) typedef int (*getbufferproc)(PyObject *, Py_buffer *, int); typedef void (*releasebufferproc)(PyObject *, Py_buffer *); #endif #if PY_MAJOR_VERSION < 3 #define __Pyx_BUILTIN_MODULE_NAME "__builtin__" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #else #define __Pyx_BUILTIN_MODULE_NAME "builtins" #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) \ PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos) #endif #if PY_MAJOR_VERSION < 3 && PY_MINOR_VERSION < 6 #define PyUnicode_FromString(s) PyUnicode_Decode(s, strlen(s), "UTF-8", "strict") #endif #if PY_MAJOR_VERSION >= 3 #define Py_TPFLAGS_CHECKTYPES 0 #define Py_TPFLAGS_HAVE_INDEX 0 #endif #if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3) #define Py_TPFLAGS_HAVE_NEWBUFFER 0 #endif #if PY_VERSION_HEX > 0x03030000 && 
defined(PyUnicode_GET_LENGTH) #define CYTHON_PEP393_ENABLED 1 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_LENGTH(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) PyUnicode_READ_CHAR(u, i) #else #define CYTHON_PEP393_ENABLED 0 #define __Pyx_PyUnicode_GET_LENGTH(u) PyUnicode_GET_SIZE(u) #define __Pyx_PyUnicode_READ_CHAR(u, i) ((Py_UCS4)(PyUnicode_AS_UNICODE(u)[i])) #endif #if PY_MAJOR_VERSION >= 3 #define PyBaseString_Type PyUnicode_Type #define PyStringObject PyUnicodeObject #define PyString_Type PyUnicode_Type #define PyString_Check PyUnicode_Check #define PyString_CheckExact PyUnicode_CheckExact #endif #if PY_VERSION_HEX < 0x02060000 #define PyBytesObject PyStringObject #define PyBytes_Type PyString_Type #define PyBytes_Check PyString_Check #define PyBytes_CheckExact PyString_CheckExact #define PyBytes_FromString PyString_FromString #define PyBytes_FromStringAndSize PyString_FromStringAndSize #define PyBytes_FromFormat PyString_FromFormat #define PyBytes_DecodeEscape PyString_DecodeEscape #define PyBytes_AsString PyString_AsString #define PyBytes_AsStringAndSize PyString_AsStringAndSize #define PyBytes_Size PyString_Size #define PyBytes_AS_STRING PyString_AS_STRING #define PyBytes_GET_SIZE PyString_GET_SIZE #define PyBytes_Repr PyString_Repr #define PyBytes_Concat PyString_Concat #define PyBytes_ConcatAndDel PyString_ConcatAndDel #endif #if PY_VERSION_HEX < 0x02060000 #define PySet_Check(obj) PyObject_TypeCheck(obj, &PySet_Type) #define PyFrozenSet_Check(obj) PyObject_TypeCheck(obj, &PyFrozenSet_Type) #endif #ifndef PySet_CheckExact #define PySet_CheckExact(obj) (Py_TYPE(obj) == &PySet_Type) #endif #define __Pyx_TypeCheck(obj, type) PyObject_TypeCheck(obj, (PyTypeObject *)type) #if PY_MAJOR_VERSION >= 3 #define PyIntObject PyLongObject #define PyInt_Type PyLong_Type #define PyInt_Check(op) PyLong_Check(op) #define PyInt_CheckExact(op) PyLong_CheckExact(op) #define PyInt_FromString PyLong_FromString #define PyInt_FromUnicode PyLong_FromUnicode #define 
PyInt_FromLong PyLong_FromLong #define PyInt_FromSize_t PyLong_FromSize_t #define PyInt_FromSsize_t PyLong_FromSsize_t #define PyInt_AsLong PyLong_AsLong #define PyInt_AS_LONG PyLong_AS_LONG #define PyInt_AsSsize_t PyLong_AsSsize_t #define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask #endif #if PY_MAJOR_VERSION >= 3 #define PyBoolObject PyLongObject #endif #if PY_VERSION_HEX < 0x03020000 typedef long Py_hash_t; #define __Pyx_PyInt_FromHash_t PyInt_FromLong #define __Pyx_PyInt_AsHash_t PyInt_AsLong #else #define __Pyx_PyInt_FromHash_t PyInt_FromSsize_t #define __Pyx_PyInt_AsHash_t PyInt_AsSsize_t #endif #if (PY_MAJOR_VERSION < 3) || (PY_VERSION_HEX >= 0x03010300) #define __Pyx_PySequence_GetSlice(obj, a, b) PySequence_GetSlice(obj, a, b) #define __Pyx_PySequence_SetSlice(obj, a, b, value) PySequence_SetSlice(obj, a, b, value) #define __Pyx_PySequence_DelSlice(obj, a, b) PySequence_DelSlice(obj, a, b) #else #define __Pyx_PySequence_GetSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), (PyObject*)0) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_GetSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object is unsliceable", (obj)->ob_type->tp_name), (PyObject*)0))) #define __Pyx_PySequence_SetSlice(obj, a, b, value) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? (PySequence_SetSlice(obj, a, b, value)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice assignment", (obj)->ob_type->tp_name), -1))) #define __Pyx_PySequence_DelSlice(obj, a, b) (unlikely(!(obj)) ? \ (PyErr_SetString(PyExc_SystemError, "null argument to internal routine"), -1) : \ (likely((obj)->ob_type->tp_as_mapping) ? 
(PySequence_DelSlice(obj, a, b)) : \ (PyErr_Format(PyExc_TypeError, "'%.200s' object doesn't support slice deletion", (obj)->ob_type->tp_name), -1))) #endif #if PY_MAJOR_VERSION >= 3 #define PyMethod_New(func, self, klass) ((self) ? PyMethod_New(func, self) : PyInstanceMethod_New(func)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),((char *)(n))) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),((char *)(n)),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),((char *)(n))) #else #define __Pyx_GetAttrString(o,n) PyObject_GetAttrString((o),(n)) #define __Pyx_SetAttrString(o,n,a) PyObject_SetAttrString((o),(n),(a)) #define __Pyx_DelAttrString(o,n) PyObject_DelAttrString((o),(n)) #endif #if PY_VERSION_HEX < 0x02050000 #define __Pyx_NAMESTR(n) ((char *)(n)) #define __Pyx_DOCSTR(n) ((char *)(n)) #else #define __Pyx_NAMESTR(n) (n) #define __Pyx_DOCSTR(n) (n) #endif #if PY_MAJOR_VERSION >= 3 #define __Pyx_PyNumber_Divide(x,y) PyNumber_TrueDivide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceTrueDivide(x,y) #else #define __Pyx_PyNumber_Divide(x,y) PyNumber_Divide(x,y) #define __Pyx_PyNumber_InPlaceDivide(x,y) PyNumber_InPlaceDivide(x,y) #endif #ifndef __PYX_EXTERN_C #ifdef __cplusplus #define __PYX_EXTERN_C extern "C" #else #define __PYX_EXTERN_C extern #endif #endif #if defined(WIN32) || defined(MS_WINDOWS) #define _USE_MATH_DEFINES #endif #include <math.h> #define __PYX_HAVE__cogent__evolve___solved_models #define __PYX_HAVE_API__cogent__evolve___solved_models #include "array_interface.h" #include "math.h" #ifdef _OPENMP #include <omp.h> #endif /* _OPENMP */ #ifdef PYREX_WITHOUT_ASSERTIONS #define CYTHON_WITHOUT_ASSERTIONS #endif /* inline attribute */ #ifndef CYTHON_INLINE #if defined(__GNUC__) #define CYTHON_INLINE __inline__ #elif defined(_MSC_VER) #define CYTHON_INLINE __inline #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L #define CYTHON_INLINE inline #else #define
CYTHON_INLINE #endif #endif /* unused attribute */ #ifndef CYTHON_UNUSED # if defined(__GNUC__) # if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif # elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif #endif typedef struct {PyObject **p; char *s; const long n; const char* encoding; const char is_unicode; const char is_str; const char intern; } __Pyx_StringTabEntry; /*proto*/ /* Type Conversion Predeclarations */ #define __Pyx_PyBytes_FromUString(s) PyBytes_FromString((char*)s) #define __Pyx_PyBytes_AsUString(s) ((unsigned char*) PyBytes_AsString(s)) #define __Pyx_Owned_Py_None(b) (Py_INCREF(Py_None), Py_None) #define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False)) static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject*); static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x); static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*); static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t); static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject*); #define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x)) #define __pyx_PyFloat_AsFloat(x) ((float) __pyx_PyFloat_AsDouble(x)) #ifdef __GNUC__ /* Test for GCC > 2.95 */ #if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)) #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0) #else /* __GNUC__ > 2 ... */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ > 2 ... 
*/ #else /* __GNUC__ */ #define likely(x) (x) #define unlikely(x) (x) #endif /* __GNUC__ */ static PyObject *__pyx_m; static PyObject *__pyx_b; static PyObject *__pyx_empty_tuple; static PyObject *__pyx_empty_bytes; static int __pyx_lineno; static int __pyx_clineno = 0; static const char * __pyx_cfilenm= __FILE__; static const char *__pyx_filename; static const char *__pyx_f[] = { "numerical_pyrex.pyx", "_solved_models.pyx", }; /*--- Type declarations ---*/ /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":30 * int CONTIGUOUS * * ctypedef object ArrayType # <<<<<<<<<<<<<< * * cdef double *uncheckedArrayDouble(ArrayType A): */ typedef PyObject *__pyx_t_6cogent_6evolve_14_solved_models_ArrayType; #ifndef CYTHON_REFNANNY #define CYTHON_REFNANNY 0 #endif #if CYTHON_REFNANNY typedef struct { void (*INCREF)(void*, PyObject*, int); void (*DECREF)(void*, PyObject*, int); void (*GOTREF)(void*, PyObject*, int); void (*GIVEREF)(void*, PyObject*, int); void* (*SetupContext)(const char*, int, const char*); void (*FinishContext)(void**); } __Pyx_RefNannyAPIStruct; static __Pyx_RefNannyAPIStruct *__Pyx_RefNanny = NULL; static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname); /*proto*/ #define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL; #ifdef WITH_THREAD #define __Pyx_RefNannySetupContext(name, acquire_gil) \ if (acquire_gil) { \ PyGILState_STATE __pyx_gilstate_save = PyGILState_Ensure(); \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ PyGILState_Release(__pyx_gilstate_save); \ } else { \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__); \ } #else #define __Pyx_RefNannySetupContext(name, acquire_gil) \ __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__) #endif #define __Pyx_RefNannyFinishContext() \ __Pyx_RefNanny->FinishContext(&__pyx_refnanny) #define __Pyx_INCREF(r) __Pyx_RefNanny->INCREF(__pyx_refnanny, (PyObject *)(r), __LINE__) 
#define __Pyx_DECREF(r) __Pyx_RefNanny->DECREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GOTREF(r) __Pyx_RefNanny->GOTREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_GIVEREF(r) __Pyx_RefNanny->GIVEREF(__pyx_refnanny, (PyObject *)(r), __LINE__) #define __Pyx_XINCREF(r) do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0) #define __Pyx_XDECREF(r) do { if((r) != NULL) {__Pyx_DECREF(r); }} while(0) #define __Pyx_XGOTREF(r) do { if((r) != NULL) {__Pyx_GOTREF(r); }} while(0) #define __Pyx_XGIVEREF(r) do { if((r) != NULL) {__Pyx_GIVEREF(r);}} while(0) #else #define __Pyx_RefNannyDeclarations #define __Pyx_RefNannySetupContext(name, acquire_gil) #define __Pyx_RefNannyFinishContext() #define __Pyx_INCREF(r) Py_INCREF(r) #define __Pyx_DECREF(r) Py_DECREF(r) #define __Pyx_GOTREF(r) #define __Pyx_GIVEREF(r) #define __Pyx_XINCREF(r) Py_XINCREF(r) #define __Pyx_XDECREF(r) Py_XDECREF(r) #define __Pyx_XGOTREF(r) #define __Pyx_XGIVEREF(r) #endif /* CYTHON_REFNANNY */ #define __Pyx_CLEAR(r) do { PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);} while(0) #define __Pyx_XCLEAR(r) do { if((r) != NULL) {PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);}} while(0) static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name); /*proto*/ static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb); /*proto*/ static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb); /*proto*/ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause); /*proto*/ static void __Pyx_RaiseArgtupleInvalid(const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found); /*proto*/ static void __Pyx_RaiseDoubleKeywordsError(const char* func_name, PyObject* kw_name); /*proto*/ static int __Pyx_ParseOptionalKeywords(PyObject *kwds, PyObject **argnames[], \ PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, \ const char* 
function_name); /*proto*/ static CYTHON_INLINE long __Pyx_div_long(long, long); /* proto */ static CYTHON_INLINE long __Pyx_mod_long(long, long); /* proto */ static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject *); static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject *); static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject *); static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject *); static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject *); static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject *); static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject *); static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject *); static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject *); static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject *); static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject *); static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject *); static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject *); static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject *); static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject *); static void __Pyx_WriteUnraisable(const char *name, int clineno, int lineno, const char *filename); /*proto*/ static int __Pyx_check_binary_version(void); typedef struct { int code_line; PyCodeObject* code_object; } __Pyx_CodeObjectCacheEntry; struct __Pyx_CodeObjectCache { int count; int max_count; __Pyx_CodeObjectCacheEntry* entries; }; static struct __Pyx_CodeObjectCache __pyx_code_cache = {0,0,NULL}; static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line); static PyCodeObject *__pyx_find_code_object(int code_line); static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object); static void __Pyx_AddTraceback(const char *funcname, int c_line, int 
py_line, const char *filename); /*proto*/ static int __Pyx_InitStrings(__Pyx_StringTabEntry *t); /*proto*/ /* Module declarations from 'cogent.evolve._solved_models' */ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType, char, int, int, int **); /*proto*/ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray1D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType, char, int, int *); /*proto*/ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray2D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType, char, int, int *, int *); /*proto*/ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray3D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType, char, int, int *, int *, int *); /*proto*/ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray4D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType, char, int, int *, int *, int *, int *); /*proto*/ static double *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayDouble1D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType, int *); /*proto*/ static double *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayDouble2D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType, int *, int *); /*proto*/ #define __Pyx_MODULE_NAME "cogent.evolve._solved_models" int __pyx_module_is_main_cogent__evolve___solved_models = 0; /* Implementation of 'cogent.evolve._solved_models' */ static PyObject *__pyx_builtin_TypeError; static PyObject *__pyx_builtin_ValueError; static PyObject *__pyx_builtin_chr; static PyObject *__pyx_builtin_range; static PyObject *__pyx_pf_6cogent_6evolve_14_solved_models_calc_TN93_P(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_do_scaling, PyObject *__pyx_v_mprobs, double __pyx_v_time, PyObject *__pyx_v_alpha_1, PyObject *__pyx_v_alpha_2, PyObject *__pyx_v_result); /* proto */ static char __pyx_k_1[] = "Array required, got None"; static char __pyx_k_3[] = "Unexpected array interface version %s"; static char __pyx_k_4[] 
= "'%s' type array required, got '%s'"; static char __pyx_k_5[] = "'%s%s' type array required, got '%s%s'"; static char __pyx_k_6[] = "%s dimensional array required, got %s"; static char __pyx_k_7[] = "Noncontiguous array"; static char __pyx_k_9[] = "Dimension %s is %s, expected %s"; static char __pyx_k_10[] = "('1', '6', '0dev')"; static char __pyx_k_14[] = "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/_solved_models.pyx"; static char __pyx_k_15[] = "cogent.evolve._solved_models"; static char __pyx_k__P[] = "P"; static char __pyx_k__S[] = "S"; static char __pyx_k__i[] = "i"; static char __pyx_k__j[] = "j"; static char __pyx_k__p[] = "p"; static char __pyx_k__mu[] = "mu"; static char __pyx_k__pi[] = "pi"; static char __pyx_k__chr[] = "chr"; static char __pyx_k__row[] = "row"; static char __pyx_k__time[] = "time"; static char __pyx_k__alpha[] = "alpha"; static char __pyx_k__b_row[] = "b_row"; static char __pyx_k__motif[] = "motif"; static char __pyx_k__other[] = "other"; static char __pyx_k__range[] = "range"; static char __pyx_k__column[] = "column"; static char __pyx_k__e_mu_t[] = "e_mu_t"; static char __pyx_k__mprobs[] = "mprobs"; static char __pyx_k__result[] = "result"; static char __pyx_k__alpha_1[] = "alpha_1"; static char __pyx_k__alpha_2[] = "alpha_2"; static char __pyx_k__pi_star[] = "pi_star"; static char __pyx_k____main__[] = "__main__"; static char __pyx_k____test__[] = "__test__"; static char __pyx_k__b_column[] = "b_column"; static char __pyx_k__e_beta_t[] = "e_beta_t"; static char __pyx_k__TypeError[] = "TypeError"; static char __pyx_k__ValueError[] = "ValueError"; static char __pyx_k__do_scaling[] = "do_scaling"; static char __pyx_k__transition[] = "transition"; static char __pyx_k____version__[] = "__version__"; static char __pyx_k__calc_TN93_P[] = "calc_TN93_P"; static char __pyx_k__scale_factor[] = "scale_factor"; static char __pyx_k__transversion[] = "transversion"; static char __pyx_k__version_info[] = "version_info"; static char 
__pyx_k____array_struct__[] = "__array_struct__"; static PyObject *__pyx_kp_s_1; static PyObject *__pyx_kp_s_10; static PyObject *__pyx_kp_s_14; static PyObject *__pyx_n_s_15; static PyObject *__pyx_kp_s_3; static PyObject *__pyx_kp_s_4; static PyObject *__pyx_kp_s_5; static PyObject *__pyx_kp_s_6; static PyObject *__pyx_kp_s_7; static PyObject *__pyx_kp_s_9; static PyObject *__pyx_n_s__P; static PyObject *__pyx_n_s__S; static PyObject *__pyx_n_s__TypeError; static PyObject *__pyx_n_s__ValueError; static PyObject *__pyx_n_s____array_struct__; static PyObject *__pyx_n_s____main__; static PyObject *__pyx_n_s____test__; static PyObject *__pyx_n_s____version__; static PyObject *__pyx_n_s__alpha; static PyObject *__pyx_n_s__alpha_1; static PyObject *__pyx_n_s__alpha_2; static PyObject *__pyx_n_s__b_column; static PyObject *__pyx_n_s__b_row; static PyObject *__pyx_n_s__calc_TN93_P; static PyObject *__pyx_n_s__chr; static PyObject *__pyx_n_s__column; static PyObject *__pyx_n_s__do_scaling; static PyObject *__pyx_n_s__e_beta_t; static PyObject *__pyx_n_s__e_mu_t; static PyObject *__pyx_n_s__i; static PyObject *__pyx_n_s__j; static PyObject *__pyx_n_s__motif; static PyObject *__pyx_n_s__mprobs; static PyObject *__pyx_n_s__mu; static PyObject *__pyx_n_s__other; static PyObject *__pyx_n_s__p; static PyObject *__pyx_n_s__pi; static PyObject *__pyx_n_s__pi_star; static PyObject *__pyx_n_s__range; static PyObject *__pyx_n_s__result; static PyObject *__pyx_n_s__row; static PyObject *__pyx_n_s__scale_factor; static PyObject *__pyx_n_s__time; static PyObject *__pyx_n_s__transition; static PyObject *__pyx_n_s__transversion; static PyObject *__pyx_n_s__version_info; static PyObject *__pyx_int_0; static PyObject *__pyx_int_1; static PyObject *__pyx_k_tuple_2; static PyObject *__pyx_k_tuple_8; static PyObject *__pyx_k_tuple_11; static PyObject *__pyx_k_tuple_12; static PyObject *__pyx_k_codeobj_13; /* 
"/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":32 * ctypedef object ArrayType * * cdef double *uncheckedArrayDouble(ArrayType A): # <<<<<<<<<<<<<< * cdef PyArrayInterface *a * cobj = A.__array_struct__ */ static double *__pyx_f_6cogent_6evolve_14_solved_models_uncheckedArrayDouble(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_A) { struct PyArrayInterface *__pyx_v_a; PyObject *__pyx_v_cobj = NULL; double *__pyx_r; __Pyx_RefNannyDeclarations PyObject *__pyx_t_1 = NULL; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("uncheckedArrayDouble", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":34 * cdef double *uncheckedArrayDouble(ArrayType A): * cdef PyArrayInterface *a * cobj = A.__array_struct__ # <<<<<<<<<<<<<< * a = PyCObject_AsVoidPtr(cobj) * return a.data */ __pyx_t_1 = PyObject_GetAttr(((PyObject *)__pyx_v_A), __pyx_n_s____array_struct__); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 34; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); __pyx_v_cobj = __pyx_t_1; __pyx_t_1 = 0; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":35 * cdef PyArrayInterface *a * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) # <<<<<<<<<<<<<< * return a.data * */ __pyx_v_a = ((struct PyArrayInterface *)PyCObject_AsVoidPtr(__pyx_v_cobj)); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":36 * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) * return a.data # <<<<<<<<<<<<<< * * cdef void *checkArray(ArrayType A, char typecode, int itemsize, */ __pyx_r = ((double *)__pyx_v_a->data); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_WriteUnraisable("cogent.evolve._solved_models.uncheckedArrayDouble", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = 0; 
__pyx_L0:; __Pyx_XDECREF(__pyx_v_cobj); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":38 * return a.data * * cdef void *checkArray(ArrayType A, char typecode, int itemsize, # <<<<<<<<<<<<<< * int nd, int **dims) except NULL: * cdef PyArrayInterface *a */ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_A, char __pyx_v_typecode, int __pyx_v_itemsize, int __pyx_v_nd, int **__pyx_v_dims) { struct PyArrayInterface *__pyx_v_a; PyObject *__pyx_v_cobj = NULL; char __pyx_v_typecode2; int __pyx_v_dimension; int __pyx_v_val; int *__pyx_v_var; void *__pyx_r; __Pyx_RefNannyDeclarations int __pyx_t_1; PyObject *__pyx_t_2 = NULL; PyObject *__pyx_t_3 = NULL; PyObject *__pyx_t_4 = NULL; PyObject *__pyx_t_5 = NULL; PyObject *__pyx_t_6 = NULL; int __pyx_t_7; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":43 * cdef int length, size * cdef char kind * if A is None: # <<<<<<<<<<<<<< * raise TypeError("Array required, got None") * cobj = A.__array_struct__ */ __pyx_t_1 = (__pyx_v_A == ((__pyx_t_6cogent_6evolve_14_solved_models_ArrayType)Py_None)); if (__pyx_t_1) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":44 * cdef char kind * if A is None: * raise TypeError("Array required, got None") # <<<<<<<<<<<<<< * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) */ __pyx_t_2 = PyObject_Call(__pyx_builtin_TypeError, ((PyObject *)__pyx_k_tuple_2), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 44; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_Raise(__pyx_t_2, 0, 0, 0); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 44; 
__pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L3; } __pyx_L3:; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":45 * if A is None: * raise TypeError("Array required, got None") * cobj = A.__array_struct__ # <<<<<<<<<<<<<< * a = PyCObject_AsVoidPtr(cobj) * if a.version != 2: */ __pyx_t_2 = PyObject_GetAttr(((PyObject *)__pyx_v_A), __pyx_n_s____array_struct__); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 45; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_v_cobj = __pyx_t_2; __pyx_t_2 = 0; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":46 * raise TypeError("Array required, got None") * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) # <<<<<<<<<<<<<< * if a.version != 2: * raise ValueError( */ __pyx_v_a = ((struct PyArrayInterface *)PyCObject_AsVoidPtr(__pyx_v_cobj)); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":47 * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) * if a.version != 2: # <<<<<<<<<<<<<< * raise ValueError( * "Unexpected array interface version %s" % str(a.version)) */ __pyx_t_1 = (__pyx_v_a->version != 2); if (__pyx_t_1) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":49 * if a.version != 2: * raise ValueError( * "Unexpected array interface version %s" % str(a.version)) # <<<<<<<<<<<<<< * cdef char typecode2 * typecode2 = a.typekind */ __pyx_t_2 = PyInt_FromLong(__pyx_v_a->version); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = 
PyObject_Call(((PyObject *)((PyObject*)(&PyString_Type))), ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __pyx_t_3 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_3), __pyx_t_2); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 49; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_3)); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, ((PyObject *)__pyx_t_3)); __Pyx_GIVEREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __pyx_t_3 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_2), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __Pyx_Raise(__pyx_t_3, 0, 0, 0); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L4; } __pyx_L4:; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":51 * "Unexpected array interface version %s" % str(a.version)) * cdef char typecode2 * typecode2 = a.typekind # <<<<<<<<<<<<<< * if typecode2 != typecode: * raise TypeError("'%s' type array required, got '%s'" % */ __pyx_v_typecode2 = __pyx_v_a->typekind; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":52 * cdef char typecode2 * typecode2 = a.typekind * if typecode2 != typecode: # <<<<<<<<<<<<<< * raise TypeError("'%s' type array required, got '%s'" % * (chr(typecode), chr(typecode2))) */ __pyx_t_1 = (__pyx_v_typecode2 != 
__pyx_v_typecode); if (__pyx_t_1) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":54 * if typecode2 != typecode: * raise TypeError("'%s' type array required, got '%s'" % * (chr(typecode), chr(typecode2))) # <<<<<<<<<<<<<< * if a.itemsize != itemsize: * raise TypeError("'%s%s' type array required, got '%s%s'" % */ __pyx_t_3 = PyInt_FromLong(__pyx_v_typecode); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyObject_Call(__pyx_builtin_chr, ((PyObject *)__pyx_t_2), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyInt_FromLong(__pyx_v_typecode2); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = PyObject_Call(__pyx_builtin_chr, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); 
PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_3 = 0; __pyx_t_2 = 0; __pyx_t_2 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_4), ((PyObject *)__pyx_t_4)); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_2)); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_2)); __Pyx_GIVEREF(((PyObject *)__pyx_t_2)); __pyx_t_2 = 0; __pyx_t_2 = PyObject_Call(__pyx_builtin_TypeError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_2, 0, 0, 0); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 53; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L5; } __pyx_L5:; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":55 * raise TypeError("'%s' type array required, got '%s'" % * (chr(typecode), chr(typecode2))) * if a.itemsize != itemsize: # <<<<<<<<<<<<<< * raise TypeError("'%s%s' type array required, got '%s%s'" % * (chr(typecode), itemsize, chr(typecode2), a.itemsize)) */ __pyx_t_1 = (__pyx_v_a->itemsize != __pyx_v_itemsize); if (__pyx_t_1) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":57 * if a.itemsize != itemsize: * raise TypeError("'%s%s' type array required, got '%s%s'" % * (chr(typecode), itemsize, chr(typecode2), a.itemsize)) # <<<<<<<<<<<<<< * if a.nd != nd: * raise ValueError("%s dimensional array required, got %s" % */ 
__pyx_t_2 = PyInt_FromLong(__pyx_v_typecode); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = PyObject_Call(__pyx_builtin_chr, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __pyx_t_4 = PyInt_FromLong(__pyx_v_itemsize); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = PyInt_FromLong(__pyx_v_typecode2); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyTuple_New(1); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = PyObject_Call(__pyx_builtin_chr, ((PyObject *)__pyx_t_5), NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyInt_FromLong(__pyx_v_a->itemsize); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_6 = PyTuple_New(4); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 57; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); PyTuple_SET_ITEM(__pyx_t_6, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_6, 1, __pyx_t_4); __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_6, 2, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_6, 3, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_5); __pyx_t_2 = 0; __pyx_t_4 = 0; __pyx_t_3 = 0; __pyx_t_5 = 0; __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_5), ((PyObject *)__pyx_t_6)); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __Pyx_DECREF(((PyObject *)__pyx_t_6)); __pyx_t_6 = 0; __pyx_t_6 = PyTuple_New(1); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); PyTuple_SET_ITEM(__pyx_t_6, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_TypeError, ((PyObject *)__pyx_t_6), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_6)); __pyx_t_6 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 56; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L6; } __pyx_L6:; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":58 * raise TypeError("'%s%s' type array required, got '%s%s'" % * (chr(typecode), itemsize, chr(typecode2), a.itemsize)) * if a.nd != nd: # <<<<<<<<<<<<<< * raise ValueError("%s dimensional array required, got %s" % * (nd, a.nd)) */ __pyx_t_1 = (__pyx_v_a->nd != __pyx_v_nd); if (__pyx_t_1) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":60 * if a.nd != nd: * raise ValueError("%s dimensional array required, got %s" % * 
(nd, a.nd)) # <<<<<<<<<<<<<< * if not a.flags & CONTIGUOUS: * raise ValueError ('Noncontiguous array') */ __pyx_t_5 = PyInt_FromLong(__pyx_v_nd); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 60; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_6 = PyInt_FromLong(__pyx_v_a->nd); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 60; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); __pyx_t_3 = PyTuple_New(2); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 60; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_6); __Pyx_GIVEREF(__pyx_t_6); __pyx_t_5 = 0; __pyx_t_6 = 0; __pyx_t_6 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_6), ((PyObject *)__pyx_t_3)); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_6)); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_t_6)); __Pyx_GIVEREF(((PyObject *)__pyx_t_6)); __pyx_t_6 = 0; __pyx_t_6 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_3), NULL); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); __Pyx_DECREF(((PyObject *)__pyx_t_3)); __pyx_t_3 = 0; __Pyx_Raise(__pyx_t_6, 0, 0, 0); __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 59; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L7; } __pyx_L7:; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":61 * raise 
ValueError("%s dimensional array required, got %s" % * (nd, a.nd)) * if not a.flags & CONTIGUOUS: # <<<<<<<<<<<<<< * raise ValueError ('Noncontiguous array') * */ __pyx_t_1 = (!(__pyx_v_a->flags & CONTIGUOUS)); if (__pyx_t_1) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":62 * (nd, a.nd)) * if not a.flags & CONTIGUOUS: * raise ValueError ('Noncontiguous array') # <<<<<<<<<<<<<< * * cdef int dimension, val */ __pyx_t_6 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_k_tuple_8), NULL); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); __Pyx_Raise(__pyx_t_6, 0, 0, 0); __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L8; } __pyx_L8:; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":66 * cdef int dimension, val * cdef int *var * for dimension from 0 <= dimension < nd: # <<<<<<<<<<<<<< * val = a.shape[dimension] * var = dims[dimension] */ __pyx_t_7 = __pyx_v_nd; for (__pyx_v_dimension = 0; __pyx_v_dimension < __pyx_t_7; __pyx_v_dimension++) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":67 * cdef int *var * for dimension from 0 <= dimension < nd: * val = a.shape[dimension] # <<<<<<<<<<<<<< * var = dims[dimension] * if var[0] == 0: */ __pyx_v_val = (__pyx_v_a->shape[__pyx_v_dimension]); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":68 * for dimension from 0 <= dimension < nd: * val = a.shape[dimension] * var = dims[dimension] # <<<<<<<<<<<<<< * if var[0] == 0: * # Length unspecified, take it from the provided array */ __pyx_v_var = (__pyx_v_dims[__pyx_v_dimension]); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":69 * val = a.shape[dimension] * var = dims[dimension] * 
if var[0] == 0: # <<<<<<<<<<<<<< * # Length unspecified, take it from the provided array * var[0] = val */ __pyx_t_1 = ((__pyx_v_var[0]) == 0); if (__pyx_t_1) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":71 * if var[0] == 0: * # Length unspecified, take it from the provided array * var[0] = val # <<<<<<<<<<<<<< * elif var[0] != val: * # Length already specified, but not the same */ (__pyx_v_var[0]) = __pyx_v_val; goto __pyx_L11; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":72 * # Length unspecified, take it from the provided array * var[0] = val * elif var[0] != val: # <<<<<<<<<<<<<< * # Length already specified, but not the same * raise ValueError("Dimension %s is %s, expected %s" % */ __pyx_t_1 = ((__pyx_v_var[0]) != __pyx_v_val); if (__pyx_t_1) { /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":75 * # Length already specified, but not the same * raise ValueError("Dimension %s is %s, expected %s" % * (dimension, val, var[0])) # <<<<<<<<<<<<<< * else: * # Length matches what was expected */ __pyx_t_6 = PyInt_FromLong(__pyx_v_dimension); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 75; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_6); __pyx_t_3 = PyInt_FromLong(__pyx_v_val); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 75; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = PyInt_FromLong((__pyx_v_var[0])); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 75; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __pyx_t_4 = PyTuple_New(3); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 75; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_6); __Pyx_GIVEREF(__pyx_t_6); PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_3); 
__Pyx_GIVEREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_4, 2, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_5); __pyx_t_6 = 0; __pyx_t_3 = 0; __pyx_t_5 = 0; __pyx_t_5 = PyNumber_Remainder(((PyObject *)__pyx_kp_s_9), ((PyObject *)__pyx_t_4)); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 74; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_5)); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 74; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_4, 0, ((PyObject *)__pyx_t_5)); __Pyx_GIVEREF(((PyObject *)__pyx_t_5)); __pyx_t_5 = 0; __pyx_t_5 = PyObject_Call(__pyx_builtin_ValueError, ((PyObject *)__pyx_t_4), NULL); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 74; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(((PyObject *)__pyx_t_4)); __pyx_t_4 = 0; __Pyx_Raise(__pyx_t_5, 0, 0, 0); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; {__pyx_filename = __pyx_f[0]; __pyx_lineno = 74; __pyx_clineno = __LINE__; goto __pyx_L1_error;} goto __pyx_L11; } /*else*/ { } __pyx_L11:; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":79 * # Length matches what was expected * pass * return a.data # <<<<<<<<<<<<<< * * */ __pyx_r = __pyx_v_a->data; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_XDECREF(__pyx_t_6); __Pyx_AddTraceback("cogent.evolve._solved_models.checkArray", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF(__pyx_v_cobj); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":82 * * * cdef void *checkArray1D(ArrayType a, char typecode, int size, # 
<<<<<<<<<<<<<< * int *x) except NULL: * cdef int *dims[1] */ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray1D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, char __pyx_v_typecode, int __pyx_v_size, int *__pyx_v_x) { int *__pyx_v_dims[1]; void *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray1D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":85 * int *x) except NULL: * cdef int *dims[1] * dims[0] = x # <<<<<<<<<<<<<< * return checkArray(a, typecode, size, 1, dims) * */ (__pyx_v_dims[0]) = __pyx_v_x; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":86 * cdef int *dims[1] * dims[0] = x * return checkArray(a, typecode, size, 1, dims) # <<<<<<<<<<<<<< * * cdef void *checkArray2D(ArrayType a, char typecode, int size, */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray(__pyx_v_a, __pyx_v_typecode, __pyx_v_size, 1, __pyx_v_dims); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 86; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_t_1; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArray1D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":88 * return checkArray(a, typecode, size, 1, dims) * * cdef void *checkArray2D(ArrayType a, char typecode, int size, # <<<<<<<<<<<<<< * int *x, int *y) except NULL: * cdef int *dims[2] */ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray2D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, char __pyx_v_typecode, int __pyx_v_size, int *__pyx_v_x, int *__pyx_v_y) { int *__pyx_v_dims[2]; void 
*__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray2D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":91 * int *x, int *y) except NULL: * cdef int *dims[2] * dims[0] = x # <<<<<<<<<<<<<< * dims[1] = y * return checkArray(a, typecode, size, 2, dims) */ (__pyx_v_dims[0]) = __pyx_v_x; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":92 * cdef int *dims[2] * dims[0] = x * dims[1] = y # <<<<<<<<<<<<<< * return checkArray(a, typecode, size, 2, dims) * */ (__pyx_v_dims[1]) = __pyx_v_y; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":93 * dims[0] = x * dims[1] = y * return checkArray(a, typecode, size, 2, dims) # <<<<<<<<<<<<<< * * cdef void *checkArray3D(ArrayType a, char typecode, int size, */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray(__pyx_v_a, __pyx_v_typecode, __pyx_v_size, 2, __pyx_v_dims); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 93; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_t_1; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArray2D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":95 * return checkArray(a, typecode, size, 2, dims) * * cdef void *checkArray3D(ArrayType a, char typecode, int size, # <<<<<<<<<<<<<< * int *x, int *y, int *z) except NULL: * cdef int *dims[3] */ static void *__pyx_f_6cogent_6evolve_14_solved_models_checkArray3D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, char __pyx_v_typecode, int __pyx_v_size, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { int *__pyx_v_dims[3]; void 
*__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray3D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":98 * int *x, int *y, int *z) except NULL: * cdef int *dims[3] * dims[0] = x # <<<<<<<<<<<<<< * dims[1] = y * dims[2] = z */ (__pyx_v_dims[0]) = __pyx_v_x; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":99 * cdef int *dims[3] * dims[0] = x * dims[1] = y # <<<<<<<<<<<<<< * dims[2] = z * return checkArray(a, typecode, size, 3, dims) */ (__pyx_v_dims[1]) = __pyx_v_y; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":100 * dims[0] = x * dims[1] = y * dims[2] = z # <<<<<<<<<<<<<< * return checkArray(a, typecode, size, 3, dims) * */ (__pyx_v_dims[2]) = __pyx_v_z; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":101 * dims[1] = y * dims[2] = z * return checkArray(a, typecode, size, 3, dims) # <<<<<<<<<<<<<< * * cdef void *checkArray4D(ArrayType a, char typecode, int size, */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray(__pyx_v_a, __pyx_v_typecode, __pyx_v_size, 3, __pyx_v_dims); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 101; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_t_1; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArray3D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":103 * return checkArray(a, typecode, size, 3, dims) * * cdef void *checkArray4D(ArrayType a, char typecode, int size, # <<<<<<<<<<<<<< * int *w, int *x, int *y, int *z) except NULL: * cdef int *dims[4] */ static void 
*__pyx_f_6cogent_6evolve_14_solved_models_checkArray4D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, char __pyx_v_typecode, int __pyx_v_size, int *__pyx_v_w, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { int *__pyx_v_dims[4]; void *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArray4D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":106 * int *w, int *x, int *y, int *z) except NULL: * cdef int *dims[4] * dims[0] = w # <<<<<<<<<<<<<< * dims[1] = x * dims[2] = y */ (__pyx_v_dims[0]) = __pyx_v_w; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":107 * cdef int *dims[4] * dims[0] = w * dims[1] = x # <<<<<<<<<<<<<< * dims[2] = y * dims[3] = z */ (__pyx_v_dims[1]) = __pyx_v_x; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":108 * dims[0] = w * dims[1] = x * dims[2] = y # <<<<<<<<<<<<<< * dims[3] = z * return checkArray(a, typecode, size, 4, dims) */ (__pyx_v_dims[2]) = __pyx_v_y; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":109 * dims[1] = x * dims[2] = y * dims[3] = z # <<<<<<<<<<<<<< * return checkArray(a, typecode, size, 4, dims) * */ (__pyx_v_dims[3]) = __pyx_v_z; /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":110 * dims[2] = y * dims[3] = z * return checkArray(a, typecode, size, 4, dims) # <<<<<<<<<<<<<< * * */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray(__pyx_v_a, __pyx_v_typecode, __pyx_v_size, 4, __pyx_v_dims); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 110; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = __pyx_t_1; goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArray4D", __pyx_clineno, 
__pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":113 * * * cdef double * checkArrayDouble1D(ArrayType a, int *x) except NULL: # <<<<<<<<<<<<<< * return checkArray1D(a, c'f', sizeof(double), x) * */ static double *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayDouble1D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, int *__pyx_v_x) { double *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayDouble1D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":114 * * cdef double * checkArrayDouble1D(ArrayType a, int *x) except NULL: * return checkArray1D(a, c'f', sizeof(double), x) # <<<<<<<<<<<<<< * * cdef double * checkArrayDouble2D(ArrayType a, int *x, int *y) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray1D(__pyx_v_a, 'f', (sizeof(double)), __pyx_v_x); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 114; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((double *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArrayDouble1D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":116 * return checkArray1D(a, c'f', sizeof(double), x) * * cdef double * checkArrayDouble2D(ArrayType a, int *x, int *y) except NULL: # <<<<<<<<<<<<<< * return checkArray2D(a, c'f', sizeof(double), x, y) * */ static double *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayDouble2D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, int *__pyx_v_x, int *__pyx_v_y) { 
double *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayDouble2D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":117 * * cdef double * checkArrayDouble2D(ArrayType a, int *x, int *y) except NULL: * return checkArray2D(a, c'f', sizeof(double), x, y) # <<<<<<<<<<<<<< * * cdef double * checkArrayDouble3D(ArrayType a, int *x, int *y, int *z) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray2D(__pyx_v_a, 'f', (sizeof(double)), __pyx_v_x, __pyx_v_y); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 117; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((double *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArrayDouble2D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":119 * return checkArray2D(a, c'f', sizeof(double), x, y) * * cdef double * checkArrayDouble3D(ArrayType a, int *x, int *y, int *z) except NULL: # <<<<<<<<<<<<<< * return checkArray3D(a, c'f', sizeof(double), x, y, z) * */ static double *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayDouble3D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { double *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayDouble3D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":120 * * cdef double * checkArrayDouble3D(ArrayType a, int *x, int *y, int *z) except NULL: * return checkArray3D(a, c'f', sizeof(double), x, y, z) # 
<<<<<<<<<<<<<< * * cdef double * checkArrayDouble4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray3D(__pyx_v_a, 'f', (sizeof(double)), __pyx_v_x, __pyx_v_y, __pyx_v_z); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 120; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((double *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArrayDouble3D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":122 * return checkArray3D(a, c'f', sizeof(double), x, y, z) * * cdef double * checkArrayDouble4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: # <<<<<<<<<<<<<< * return checkArray4D(a, c'f', sizeof(double), w, x, y, z) * */ static double *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayDouble4D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, int *__pyx_v_w, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { double *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayDouble4D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":123 * * cdef double * checkArrayDouble4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: * return checkArray4D(a, c'f', sizeof(double), w, x, y, z) # <<<<<<<<<<<<<< * * */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray4D(__pyx_v_a, 'f', (sizeof(double)), __pyx_v_w, __pyx_v_x, __pyx_v_y, __pyx_v_z); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 123; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((double *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto 
__pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArrayDouble4D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":126 * * * cdef long * checkArrayLong1D(ArrayType a, int *x) except NULL: # <<<<<<<<<<<<<< * return checkArray1D(a, c'i', sizeof(long), x) * */ static long *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayLong1D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, int *__pyx_v_x) { long *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayLong1D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":127 * * cdef long * checkArrayLong1D(ArrayType a, int *x) except NULL: * return checkArray1D(a, c'i', sizeof(long), x) # <<<<<<<<<<<<<< * * cdef long * checkArrayLong2D(ArrayType a, int *x, int *y) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray1D(__pyx_v_a, 'i', (sizeof(long)), __pyx_v_x); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 127; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((long *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArrayLong1D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":129 * return checkArray1D(a, c'i', sizeof(long), x) * * cdef long * checkArrayLong2D(ArrayType a, int *x, int *y) except NULL: # <<<<<<<<<<<<<< * return checkArray2D(a, c'i', sizeof(long), x, y) * */ static long 
*__pyx_f_6cogent_6evolve_14_solved_models_checkArrayLong2D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, int *__pyx_v_x, int *__pyx_v_y) { long *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayLong2D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":130 * * cdef long * checkArrayLong2D(ArrayType a, int *x, int *y) except NULL: * return checkArray2D(a, c'i', sizeof(long), x, y) # <<<<<<<<<<<<<< * * cdef long * checkArrayLong3D(ArrayType a, int *x, int *y, int *z) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray2D(__pyx_v_a, 'i', (sizeof(long)), __pyx_v_x, __pyx_v_y); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 130; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((long *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArrayLong2D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":132 * return checkArray2D(a, c'i', sizeof(long), x, y) * * cdef long * checkArrayLong3D(ArrayType a, int *x, int *y, int *z) except NULL: # <<<<<<<<<<<<<< * return checkArray3D(a, c'i', sizeof(long), x, y, z) * */ static long *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayLong3D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { long *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayLong3D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":133 * * cdef long * 
checkArrayLong3D(ArrayType a, int *x, int *y, int *z) except NULL: * return checkArray3D(a, c'i', sizeof(long), x, y, z) # <<<<<<<<<<<<<< * * cdef long * checkArrayLong4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray3D(__pyx_v_a, 'i', (sizeof(long)), __pyx_v_x, __pyx_v_y, __pyx_v_z); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 133; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_r = ((long *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArrayLong3D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":135 * return checkArray3D(a, c'i', sizeof(long), x, y, z) * * cdef long * checkArrayLong4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: # <<<<<<<<<<<<<< * return checkArray4D(a, c'i', sizeof(long), w, x, y, z) */ static long *__pyx_f_6cogent_6evolve_14_solved_models_checkArrayLong4D(__pyx_t_6cogent_6evolve_14_solved_models_ArrayType __pyx_v_a, int *__pyx_v_w, int *__pyx_v_x, int *__pyx_v_y, int *__pyx_v_z) { long *__pyx_r; __Pyx_RefNannyDeclarations void *__pyx_t_1; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("checkArrayLong4D", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":136 * * cdef long * checkArrayLong4D(ArrayType a, int *w, int *x, int *y, int *z) except NULL: * return checkArray4D(a, c'i', sizeof(long), w, x, y, z) # <<<<<<<<<<<<<< */ __pyx_t_1 = __pyx_f_6cogent_6evolve_14_solved_models_checkArray4D(__pyx_v_a, 'i', (sizeof(long)), __pyx_v_w, __pyx_v_x, __pyx_v_y, __pyx_v_z); if (unlikely(__pyx_t_1 == NULL)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 136; __pyx_clineno = __LINE__; goto 
__pyx_L1_error;} __pyx_r = ((long *)__pyx_t_1); goto __pyx_L0; __pyx_r = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.checkArrayLong4D", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_6cogent_6evolve_14_solved_models_1calc_TN93_P(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_6cogent_6evolve_14_solved_models_1calc_TN93_P = {__Pyx_NAMESTR("calc_TN93_P"), (PyCFunction)__pyx_pw_6cogent_6evolve_14_solved_models_1calc_TN93_P, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(0)}; static PyObject *__pyx_pw_6cogent_6evolve_14_solved_models_1calc_TN93_P(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { int __pyx_v_do_scaling; PyObject *__pyx_v_mprobs = 0; double __pyx_v_time; PyObject *__pyx_v_alpha_1 = 0; PyObject *__pyx_v_alpha_2 = 0; PyObject *__pyx_v_result = 0; static PyObject **__pyx_pyargnames[] = {&__pyx_n_s__do_scaling,&__pyx_n_s__mprobs,&__pyx_n_s__time,&__pyx_n_s__alpha_1,&__pyx_n_s__alpha_2,&__pyx_n_s__result,0}; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("calc_TN93_P (wrapper)", 0); __pyx_self = __pyx_self; { PyObject* values[6] = {0,0,0,0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 6: values[5] = PyTuple_GET_ITEM(__pyx_args, 5); case 5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4); case 4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3); case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__do_scaling); if (likely(values[0])) kw_args--; else 
goto __pyx_L5_argtuple_error; case 1: values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__mprobs); if (likely(values[1])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("calc_TN93_P", 1, 6, 6, 1); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 2: values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__time); if (likely(values[2])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("calc_TN93_P", 1, 6, 6, 2); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 3: values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__alpha_1); if (likely(values[3])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("calc_TN93_P", 1, 6, 6, 3); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 4: values[4] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__alpha_2); if (likely(values[4])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("calc_TN93_P", 1, 6, 6, 4); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } case 5: values[5] = PyDict_GetItem(__pyx_kwds, __pyx_n_s__result); if (likely(values[5])) kw_args--; else { __Pyx_RaiseArgtupleInvalid("calc_TN93_P", 1, 6, 6, 5); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "calc_TN93_P") < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} } } else if (PyTuple_GET_SIZE(__pyx_args) != 6) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); values[3] = PyTuple_GET_ITEM(__pyx_args, 3); values[4] = PyTuple_GET_ITEM(__pyx_args, 4); values[5] = PyTuple_GET_ITEM(__pyx_args, 5); } __pyx_v_do_scaling = __Pyx_PyInt_AsInt(values[0]); if 
(unlikely((__pyx_v_do_scaling == (int)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_mprobs = values[1]; __pyx_v_time = __pyx_PyFloat_AsDouble(values[2]); if (unlikely((__pyx_v_time == (double)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_v_alpha_1 = values[3]; __pyx_v_alpha_2 = values[4]; __pyx_v_result = values[5]; } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("calc_TN93_P", 1, 6, 6, PyTuple_GET_SIZE(__pyx_args)); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L3_error;} __pyx_L3_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.calc_TN93_P", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_6cogent_6evolve_14_solved_models_calc_TN93_P(__pyx_self, __pyx_v_do_scaling, __pyx_v_mprobs, __pyx_v_time, __pyx_v_alpha_1, __pyx_v_alpha_2, __pyx_v_result); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* "cogent/evolve/_solved_models.pyx":8 * version_info = (1, 0) * * def calc_TN93_P(int do_scaling, mprobs, double time, alpha_1, alpha_2, result): # <<<<<<<<<<<<<< * cdef int S, motif, i, other, row, column, b_row, b_column * cdef double *pi, *P, scale_factor */ static PyObject *__pyx_pf_6cogent_6evolve_14_solved_models_calc_TN93_P(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_do_scaling, PyObject *__pyx_v_mprobs, double __pyx_v_time, PyObject *__pyx_v_alpha_1, PyObject *__pyx_v_alpha_2, PyObject *__pyx_v_result) { int __pyx_v_S; int __pyx_v_motif; int __pyx_v_i; int __pyx_v_other; int __pyx_v_row; int __pyx_v_column; double *__pyx_v_pi; double *__pyx_v_P; double __pyx_v_scale_factor; double __pyx_v_pi_star[2]; double __pyx_v_alpha[2]; double __pyx_v_mu[2]; double __pyx_v_e_mu_t[2]; double __pyx_v_e_beta_t; double 
__pyx_v_transition[2]; double __pyx_v_transversion; double __pyx_v_p; long __pyx_v_j; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations double __pyx_t_1; double *__pyx_t_2; double *__pyx_t_3; int __pyx_t_4; int __pyx_t_5; int __pyx_t_6; int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; __Pyx_RefNannySetupContext("calc_TN93_P", 0); /* "cogent/evolve/_solved_models.pyx":14 * cdef double transition[2], transversion, p * * alpha[0] = alpha_1 # <<<<<<<<<<<<<< * alpha[1] = alpha_2 * */ __pyx_t_1 = __pyx_PyFloat_AsDouble(__pyx_v_alpha_1); if (unlikely((__pyx_t_1 == (double)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 14; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (__pyx_v_alpha[0]) = __pyx_t_1; /* "cogent/evolve/_solved_models.pyx":15 * * alpha[0] = alpha_1 * alpha[1] = alpha_2 # <<<<<<<<<<<<<< * * S = 4 */ __pyx_t_1 = __pyx_PyFloat_AsDouble(__pyx_v_alpha_2); if (unlikely((__pyx_t_1 == (double)-1) && PyErr_Occurred())) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 15; __pyx_clineno = __LINE__; goto __pyx_L1_error;} (__pyx_v_alpha[1]) = __pyx_t_1; /* "cogent/evolve/_solved_models.pyx":17 * alpha[1] = alpha_2 * * S = 4 # <<<<<<<<<<<<<< * pi = checkArrayDouble1D(mprobs, &S) * P = checkArrayDouble2D(result, &S, &S) */ __pyx_v_S = 4; /* "cogent/evolve/_solved_models.pyx":18 * * S = 4 * pi = checkArrayDouble1D(mprobs, &S) # <<<<<<<<<<<<<< * P = checkArrayDouble2D(result, &S, &S) * */ __pyx_t_2 = __pyx_f_6cogent_6evolve_14_solved_models_checkArrayDouble1D(((__pyx_t_6cogent_6evolve_14_solved_models_ArrayType)__pyx_v_mprobs), (&__pyx_v_S)); if (unlikely(__pyx_t_2 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 18; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_pi = __pyx_t_2; /* "cogent/evolve/_solved_models.pyx":19 * S = 4 * pi = checkArrayDouble1D(mprobs, &S) * P = checkArrayDouble2D(result, &S, &S) # <<<<<<<<<<<<<< * * pi_star[0] = pi[0] + pi[1] */ __pyx_t_3 = 
__pyx_f_6cogent_6evolve_14_solved_models_checkArrayDouble2D(((__pyx_t_6cogent_6evolve_14_solved_models_ArrayType)__pyx_v_result), (&__pyx_v_S), (&__pyx_v_S)); if (unlikely(__pyx_t_3 == NULL)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 19; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_v_P = __pyx_t_3; /* "cogent/evolve/_solved_models.pyx":21 * P = checkArrayDouble2D(result, &S, &S) * * pi_star[0] = pi[0] + pi[1] # <<<<<<<<<<<<<< * pi_star[1] = pi[2] + pi[3] * */ (__pyx_v_pi_star[0]) = ((__pyx_v_pi[0]) + (__pyx_v_pi[1])); /* "cogent/evolve/_solved_models.pyx":22 * * pi_star[0] = pi[0] + pi[1] * pi_star[1] = pi[2] + pi[3] # <<<<<<<<<<<<<< * * mu[0] = alpha[0] * pi_star[0] + 1.0 * pi_star[1] */ (__pyx_v_pi_star[1]) = ((__pyx_v_pi[2]) + (__pyx_v_pi[3])); /* "cogent/evolve/_solved_models.pyx":24 * pi_star[1] = pi[2] + pi[3] * * mu[0] = alpha[0] * pi_star[0] + 1.0 * pi_star[1] # <<<<<<<<<<<<<< * mu[1] = 1.0 * pi_star[0] + alpha[1] * pi_star[1] * */ (__pyx_v_mu[0]) = (((__pyx_v_alpha[0]) * (__pyx_v_pi_star[0])) + (1.0 * (__pyx_v_pi_star[1]))); /* "cogent/evolve/_solved_models.pyx":25 * * mu[0] = alpha[0] * pi_star[0] + 1.0 * pi_star[1] * mu[1] = 1.0 * pi_star[0] + alpha[1] * pi_star[1] # <<<<<<<<<<<<<< * * if do_scaling: */ (__pyx_v_mu[1]) = ((1.0 * (__pyx_v_pi_star[0])) + ((__pyx_v_alpha[1]) * (__pyx_v_pi_star[1]))); /* "cogent/evolve/_solved_models.pyx":27 * mu[1] = 1.0 * pi_star[0] + alpha[1] * pi_star[1] * * if do_scaling: # <<<<<<<<<<<<<< * scale_factor = 0.0 * for motif in range(4): */ if (__pyx_v_do_scaling) { /* "cogent/evolve/_solved_models.pyx":28 * * if do_scaling: * scale_factor = 0.0 # <<<<<<<<<<<<<< * for motif in range(4): * i = motif // 2 */ __pyx_v_scale_factor = 0.0; /* "cogent/evolve/_solved_models.pyx":29 * if do_scaling: * scale_factor = 0.0 * for motif in range(4): # <<<<<<<<<<<<<< * i = motif // 2 * other = 1 - i */ for (__pyx_t_4 = 0; __pyx_t_4 < 4; __pyx_t_4+=1) { __pyx_v_motif = __pyx_t_4; /* "cogent/evolve/_solved_models.pyx":30 * 
scale_factor = 0.0 * for motif in range(4): * i = motif // 2 # <<<<<<<<<<<<<< * other = 1 - i * scale_factor += (alpha[i] * pi[2*i+1-motif%2] + pi_star[other]) * pi[motif] */ __pyx_v_i = __Pyx_div_long(__pyx_v_motif, 2); /* "cogent/evolve/_solved_models.pyx":31 * for motif in range(4): * i = motif // 2 * other = 1 - i # <<<<<<<<<<<<<< * scale_factor += (alpha[i] * pi[2*i+1-motif%2] + pi_star[other]) * pi[motif] * time /= scale_factor */ __pyx_v_other = (1 - __pyx_v_i); /* "cogent/evolve/_solved_models.pyx":32 * i = motif // 2 * other = 1 - i * scale_factor += (alpha[i] * pi[2*i+1-motif%2] + pi_star[other]) * pi[motif] # <<<<<<<<<<<<<< * time /= scale_factor * */ __pyx_v_scale_factor = (__pyx_v_scale_factor + ((((__pyx_v_alpha[__pyx_v_i]) * (__pyx_v_pi[(((2 * __pyx_v_i) + 1) - __Pyx_mod_long(__pyx_v_motif, 2))])) + (__pyx_v_pi_star[__pyx_v_other])) * (__pyx_v_pi[__pyx_v_motif]))); } /* "cogent/evolve/_solved_models.pyx":33 * other = 1 - i * scale_factor += (alpha[i] * pi[2*i+1-motif%2] + pi_star[other]) * pi[motif] * time /= scale_factor # <<<<<<<<<<<<<< * * e_beta_t = exp(-time) */ if (unlikely(__pyx_v_scale_factor == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "float division"); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 33; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } __pyx_v_time = (__pyx_v_time / __pyx_v_scale_factor); goto __pyx_L3; } __pyx_L3:; /* "cogent/evolve/_solved_models.pyx":35 * time /= scale_factor * * e_beta_t = exp(-time) # <<<<<<<<<<<<<< * transversion = 1 - e_beta_t * for i in range(2): */ __pyx_v_e_beta_t = exp((-__pyx_v_time)); /* "cogent/evolve/_solved_models.pyx":36 * * e_beta_t = exp(-time) * transversion = 1 - e_beta_t # <<<<<<<<<<<<<< * for i in range(2): * other = 1 - i */ __pyx_v_transversion = (1.0 - __pyx_v_e_beta_t); /* "cogent/evolve/_solved_models.pyx":37 * e_beta_t = exp(-time) * transversion = 1 - e_beta_t * for i in range(2): # <<<<<<<<<<<<<< * other = 1 - i * e_mu_t[i] = exp(-mu[i]*time) */ for (__pyx_t_4 = 0; __pyx_t_4 < 
2; __pyx_t_4+=1) { __pyx_v_i = __pyx_t_4; /* "cogent/evolve/_solved_models.pyx":38 * transversion = 1 - e_beta_t * for i in range(2): * other = 1 - i # <<<<<<<<<<<<<< * e_mu_t[i] = exp(-mu[i]*time) * transition[i] = 1 + (pi_star[other] * e_beta_t - e_mu_t[i]) / pi_star[i] */ __pyx_v_other = (1 - __pyx_v_i); /* "cogent/evolve/_solved_models.pyx":39 * for i in range(2): * other = 1 - i * e_mu_t[i] = exp(-mu[i]*time) # <<<<<<<<<<<<<< * transition[i] = 1 + (pi_star[other] * e_beta_t - e_mu_t[i]) / pi_star[i] * */ (__pyx_v_e_mu_t[__pyx_v_i]) = exp(((-(__pyx_v_mu[__pyx_v_i])) * __pyx_v_time)); /* "cogent/evolve/_solved_models.pyx":40 * other = 1 - i * e_mu_t[i] = exp(-mu[i]*time) * transition[i] = 1 + (pi_star[other] * e_beta_t - e_mu_t[i]) / pi_star[i] # <<<<<<<<<<<<<< * * for row in range(4): */ __pyx_t_1 = (((__pyx_v_pi_star[__pyx_v_other]) * __pyx_v_e_beta_t) - (__pyx_v_e_mu_t[__pyx_v_i])); if (unlikely((__pyx_v_pi_star[__pyx_v_i]) == 0)) { PyErr_Format(PyExc_ZeroDivisionError, "float division"); {__pyx_filename = __pyx_f[1]; __pyx_lineno = 40; __pyx_clineno = __LINE__; goto __pyx_L1_error;} } (__pyx_v_transition[__pyx_v_i]) = (1.0 + (__pyx_t_1 / (__pyx_v_pi_star[__pyx_v_i]))); } /* "cogent/evolve/_solved_models.pyx":42 * transition[i] = 1 + (pi_star[other] * e_beta_t - e_mu_t[i]) / pi_star[i] * * for row in range(4): # <<<<<<<<<<<<<< * i = row // 2 * for column in range(4): */ for (__pyx_t_4 = 0; __pyx_t_4 < 4; __pyx_t_4+=1) { __pyx_v_row = __pyx_t_4; /* "cogent/evolve/_solved_models.pyx":43 * * for row in range(4): * i = row // 2 # <<<<<<<<<<<<<< * for column in range(4): * j = column // 2 */ __pyx_v_i = __Pyx_div_long(__pyx_v_row, 2); /* "cogent/evolve/_solved_models.pyx":44 * for row in range(4): * i = row // 2 * for column in range(4): # <<<<<<<<<<<<<< * j = column // 2 * if i == j: */ for (__pyx_t_5 = 0; __pyx_t_5 < 4; __pyx_t_5+=1) { __pyx_v_column = __pyx_t_5; /* "cogent/evolve/_solved_models.pyx":45 * i = row // 2 * for column in range(4): * j = column // 2 
# <<<<<<<<<<<<<< * if i == j: * p = transition[i] */ __pyx_v_j = __Pyx_div_long(__pyx_v_column, 2); /* "cogent/evolve/_solved_models.pyx":46 * for column in range(4): * j = column // 2 * if i == j: # <<<<<<<<<<<<<< * p = transition[i] * else: */ __pyx_t_6 = (__pyx_v_i == __pyx_v_j); if (__pyx_t_6) { /* "cogent/evolve/_solved_models.pyx":47 * j = column // 2 * if i == j: * p = transition[i] # <<<<<<<<<<<<<< * else: * p = transversion */ __pyx_v_p = (__pyx_v_transition[__pyx_v_i]); goto __pyx_L12; } /*else*/ { /* "cogent/evolve/_solved_models.pyx":49 * p = transition[i] * else: * p = transversion # <<<<<<<<<<<<<< * p *= pi[column] * if row == column: */ __pyx_v_p = __pyx_v_transversion; } __pyx_L12:; /* "cogent/evolve/_solved_models.pyx":50 * else: * p = transversion * p *= pi[column] # <<<<<<<<<<<<<< * if row == column: * p += e_mu_t[i] */ __pyx_v_p = (__pyx_v_p * (__pyx_v_pi[__pyx_v_column])); /* "cogent/evolve/_solved_models.pyx":51 * p = transversion * p *= pi[column] * if row == column: # <<<<<<<<<<<<<< * p += e_mu_t[i] * P[column+4*row] = p */ __pyx_t_6 = (__pyx_v_row == __pyx_v_column); if (__pyx_t_6) { /* "cogent/evolve/_solved_models.pyx":52 * p *= pi[column] * if row == column: * p += e_mu_t[i] # <<<<<<<<<<<<<< * P[column+4*row] = p * */ __pyx_v_p = (__pyx_v_p + (__pyx_v_e_mu_t[__pyx_v_i])); goto __pyx_L13; } __pyx_L13:; /* "cogent/evolve/_solved_models.pyx":53 * if row == column: * p += e_mu_t[i] * P[column+4*row] = p # <<<<<<<<<<<<<< * */ (__pyx_v_P[(__pyx_v_column + (4 * __pyx_v_row))]) = __pyx_v_p; } } __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_AddTraceback("cogent.evolve._solved_models.calc_TN93_P", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyMethodDef __pyx_methods[] = { {0, 0, 0, 0} }; #if PY_MAJOR_VERSION >= 3 static struct PyModuleDef __pyx_moduledef = { PyModuleDef_HEAD_INIT, 
__Pyx_NAMESTR("_solved_models"), 0, /* m_doc */ -1, /* m_size */ __pyx_methods /* m_methods */, NULL, /* m_reload */ NULL, /* m_traverse */ NULL, /* m_clear */ NULL /* m_free */ }; #endif static __Pyx_StringTabEntry __pyx_string_tab[] = { {&__pyx_kp_s_1, __pyx_k_1, sizeof(__pyx_k_1), 0, 0, 1, 0}, {&__pyx_kp_s_10, __pyx_k_10, sizeof(__pyx_k_10), 0, 0, 1, 0}, {&__pyx_kp_s_14, __pyx_k_14, sizeof(__pyx_k_14), 0, 0, 1, 0}, {&__pyx_n_s_15, __pyx_k_15, sizeof(__pyx_k_15), 0, 0, 1, 1}, {&__pyx_kp_s_3, __pyx_k_3, sizeof(__pyx_k_3), 0, 0, 1, 0}, {&__pyx_kp_s_4, __pyx_k_4, sizeof(__pyx_k_4), 0, 0, 1, 0}, {&__pyx_kp_s_5, __pyx_k_5, sizeof(__pyx_k_5), 0, 0, 1, 0}, {&__pyx_kp_s_6, __pyx_k_6, sizeof(__pyx_k_6), 0, 0, 1, 0}, {&__pyx_kp_s_7, __pyx_k_7, sizeof(__pyx_k_7), 0, 0, 1, 0}, {&__pyx_kp_s_9, __pyx_k_9, sizeof(__pyx_k_9), 0, 0, 1, 0}, {&__pyx_n_s__P, __pyx_k__P, sizeof(__pyx_k__P), 0, 0, 1, 1}, {&__pyx_n_s__S, __pyx_k__S, sizeof(__pyx_k__S), 0, 0, 1, 1}, {&__pyx_n_s__TypeError, __pyx_k__TypeError, sizeof(__pyx_k__TypeError), 0, 0, 1, 1}, {&__pyx_n_s__ValueError, __pyx_k__ValueError, sizeof(__pyx_k__ValueError), 0, 0, 1, 1}, {&__pyx_n_s____array_struct__, __pyx_k____array_struct__, sizeof(__pyx_k____array_struct__), 0, 0, 1, 1}, {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1}, {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1}, {&__pyx_n_s____version__, __pyx_k____version__, sizeof(__pyx_k____version__), 0, 0, 1, 1}, {&__pyx_n_s__alpha, __pyx_k__alpha, sizeof(__pyx_k__alpha), 0, 0, 1, 1}, {&__pyx_n_s__alpha_1, __pyx_k__alpha_1, sizeof(__pyx_k__alpha_1), 0, 0, 1, 1}, {&__pyx_n_s__alpha_2, __pyx_k__alpha_2, sizeof(__pyx_k__alpha_2), 0, 0, 1, 1}, {&__pyx_n_s__b_column, __pyx_k__b_column, sizeof(__pyx_k__b_column), 0, 0, 1, 1}, {&__pyx_n_s__b_row, __pyx_k__b_row, sizeof(__pyx_k__b_row), 0, 0, 1, 1}, {&__pyx_n_s__calc_TN93_P, __pyx_k__calc_TN93_P, sizeof(__pyx_k__calc_TN93_P), 0, 0, 1, 1}, {&__pyx_n_s__chr, 
__pyx_k__chr, sizeof(__pyx_k__chr), 0, 0, 1, 1}, {&__pyx_n_s__column, __pyx_k__column, sizeof(__pyx_k__column), 0, 0, 1, 1}, {&__pyx_n_s__do_scaling, __pyx_k__do_scaling, sizeof(__pyx_k__do_scaling), 0, 0, 1, 1}, {&__pyx_n_s__e_beta_t, __pyx_k__e_beta_t, sizeof(__pyx_k__e_beta_t), 0, 0, 1, 1}, {&__pyx_n_s__e_mu_t, __pyx_k__e_mu_t, sizeof(__pyx_k__e_mu_t), 0, 0, 1, 1}, {&__pyx_n_s__i, __pyx_k__i, sizeof(__pyx_k__i), 0, 0, 1, 1}, {&__pyx_n_s__j, __pyx_k__j, sizeof(__pyx_k__j), 0, 0, 1, 1}, {&__pyx_n_s__motif, __pyx_k__motif, sizeof(__pyx_k__motif), 0, 0, 1, 1}, {&__pyx_n_s__mprobs, __pyx_k__mprobs, sizeof(__pyx_k__mprobs), 0, 0, 1, 1}, {&__pyx_n_s__mu, __pyx_k__mu, sizeof(__pyx_k__mu), 0, 0, 1, 1}, {&__pyx_n_s__other, __pyx_k__other, sizeof(__pyx_k__other), 0, 0, 1, 1}, {&__pyx_n_s__p, __pyx_k__p, sizeof(__pyx_k__p), 0, 0, 1, 1}, {&__pyx_n_s__pi, __pyx_k__pi, sizeof(__pyx_k__pi), 0, 0, 1, 1}, {&__pyx_n_s__pi_star, __pyx_k__pi_star, sizeof(__pyx_k__pi_star), 0, 0, 1, 1}, {&__pyx_n_s__range, __pyx_k__range, sizeof(__pyx_k__range), 0, 0, 1, 1}, {&__pyx_n_s__result, __pyx_k__result, sizeof(__pyx_k__result), 0, 0, 1, 1}, {&__pyx_n_s__row, __pyx_k__row, sizeof(__pyx_k__row), 0, 0, 1, 1}, {&__pyx_n_s__scale_factor, __pyx_k__scale_factor, sizeof(__pyx_k__scale_factor), 0, 0, 1, 1}, {&__pyx_n_s__time, __pyx_k__time, sizeof(__pyx_k__time), 0, 0, 1, 1}, {&__pyx_n_s__transition, __pyx_k__transition, sizeof(__pyx_k__transition), 0, 0, 1, 1}, {&__pyx_n_s__transversion, __pyx_k__transversion, sizeof(__pyx_k__transversion), 0, 0, 1, 1}, {&__pyx_n_s__version_info, __pyx_k__version_info, sizeof(__pyx_k__version_info), 0, 0, 1, 1}, {0, 0, 0, 0, 0, 0, 0} }; static int __Pyx_InitCachedBuiltins(void) { __pyx_builtin_TypeError = __Pyx_GetName(__pyx_b, __pyx_n_s__TypeError); if (!__pyx_builtin_TypeError) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 44; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_ValueError = __Pyx_GetName(__pyx_b, __pyx_n_s__ValueError); if 
(!__pyx_builtin_ValueError) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 48; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_chr = __Pyx_GetName(__pyx_b, __pyx_n_s__chr); if (!__pyx_builtin_chr) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 54; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_builtin_range = __Pyx_GetName(__pyx_b, __pyx_n_s__range); if (!__pyx_builtin_range) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 29; __pyx_clineno = __LINE__; goto __pyx_L1_error;} return 0; __pyx_L1_error:; return -1; } static int __Pyx_InitCachedConstants(void) { __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("__Pyx_InitCachedConstants", 0); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":44 * cdef char kind * if A is None: * raise TypeError("Array required, got None") # <<<<<<<<<<<<<< * cobj = A.__array_struct__ * a = PyCObject_AsVoidPtr(cobj) */ __pyx_k_tuple_2 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 44; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_2); __Pyx_INCREF(((PyObject *)__pyx_kp_s_1)); PyTuple_SET_ITEM(__pyx_k_tuple_2, 0, ((PyObject *)__pyx_kp_s_1)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_s_1)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_2)); /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":62 * (nd, a.nd)) * if not a.flags & CONTIGUOUS: * raise ValueError ('Noncontiguous array') # <<<<<<<<<<<<<< * * cdef int dimension, val */ __pyx_k_tuple_8 = PyTuple_New(1); if (unlikely(!__pyx_k_tuple_8)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 62; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_8); __Pyx_INCREF(((PyObject *)__pyx_kp_s_7)); PyTuple_SET_ITEM(__pyx_k_tuple_8, 0, ((PyObject *)__pyx_kp_s_7)); __Pyx_GIVEREF(((PyObject *)__pyx_kp_s_7)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_8)); /* "cogent/evolve/_solved_models.pyx":6 * double exp(double) * * version_info 
= (1, 0) # <<<<<<<<<<<<<< * * def calc_TN93_P(int do_scaling, mprobs, double time, alpha_1, alpha_2, result): */ __pyx_k_tuple_11 = PyTuple_New(2); if (unlikely(!__pyx_k_tuple_11)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_11); __Pyx_INCREF(__pyx_int_1); PyTuple_SET_ITEM(__pyx_k_tuple_11, 0, __pyx_int_1); __Pyx_GIVEREF(__pyx_int_1); __Pyx_INCREF(__pyx_int_0); PyTuple_SET_ITEM(__pyx_k_tuple_11, 1, __pyx_int_0); __Pyx_GIVEREF(__pyx_int_0); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_11)); /* "cogent/evolve/_solved_models.pyx":8 * version_info = (1, 0) * * def calc_TN93_P(int do_scaling, mprobs, double time, alpha_1, alpha_2, result): # <<<<<<<<<<<<<< * cdef int S, motif, i, other, row, column, b_row, b_column * cdef double *pi, *P, scale_factor */ __pyx_k_tuple_12 = PyTuple_New(26); if (unlikely(!__pyx_k_tuple_12)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_k_tuple_12); __Pyx_INCREF(((PyObject *)__pyx_n_s__do_scaling)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 0, ((PyObject *)__pyx_n_s__do_scaling)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__do_scaling)); __Pyx_INCREF(((PyObject *)__pyx_n_s__mprobs)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 1, ((PyObject *)__pyx_n_s__mprobs)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__mprobs)); __Pyx_INCREF(((PyObject *)__pyx_n_s__time)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 2, ((PyObject *)__pyx_n_s__time)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__time)); __Pyx_INCREF(((PyObject *)__pyx_n_s__alpha_1)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 3, ((PyObject *)__pyx_n_s__alpha_1)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__alpha_1)); __Pyx_INCREF(((PyObject *)__pyx_n_s__alpha_2)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 4, ((PyObject *)__pyx_n_s__alpha_2)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__alpha_2)); __Pyx_INCREF(((PyObject *)__pyx_n_s__result)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 5, ((PyObject *)__pyx_n_s__result)); 
__Pyx_GIVEREF(((PyObject *)__pyx_n_s__result)); __Pyx_INCREF(((PyObject *)__pyx_n_s__S)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 6, ((PyObject *)__pyx_n_s__S)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__S)); __Pyx_INCREF(((PyObject *)__pyx_n_s__motif)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 7, ((PyObject *)__pyx_n_s__motif)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__motif)); __Pyx_INCREF(((PyObject *)__pyx_n_s__i)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 8, ((PyObject *)__pyx_n_s__i)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__i)); __Pyx_INCREF(((PyObject *)__pyx_n_s__other)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 9, ((PyObject *)__pyx_n_s__other)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__other)); __Pyx_INCREF(((PyObject *)__pyx_n_s__row)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 10, ((PyObject *)__pyx_n_s__row)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__row)); __Pyx_INCREF(((PyObject *)__pyx_n_s__column)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 11, ((PyObject *)__pyx_n_s__column)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__column)); __Pyx_INCREF(((PyObject *)__pyx_n_s__b_row)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 12, ((PyObject *)__pyx_n_s__b_row)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__b_row)); __Pyx_INCREF(((PyObject *)__pyx_n_s__b_column)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 13, ((PyObject *)__pyx_n_s__b_column)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__b_column)); __Pyx_INCREF(((PyObject *)__pyx_n_s__pi)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 14, ((PyObject *)__pyx_n_s__pi)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__pi)); __Pyx_INCREF(((PyObject *)__pyx_n_s__P)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 15, ((PyObject *)__pyx_n_s__P)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__P)); __Pyx_INCREF(((PyObject *)__pyx_n_s__scale_factor)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 16, ((PyObject *)__pyx_n_s__scale_factor)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__scale_factor)); __Pyx_INCREF(((PyObject *)__pyx_n_s__pi_star)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 17, ((PyObject *)__pyx_n_s__pi_star)); __Pyx_GIVEREF(((PyObject 
*)__pyx_n_s__pi_star)); __Pyx_INCREF(((PyObject *)__pyx_n_s__alpha)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 18, ((PyObject *)__pyx_n_s__alpha)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__alpha)); __Pyx_INCREF(((PyObject *)__pyx_n_s__mu)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 19, ((PyObject *)__pyx_n_s__mu)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__mu)); __Pyx_INCREF(((PyObject *)__pyx_n_s__e_mu_t)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 20, ((PyObject *)__pyx_n_s__e_mu_t)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__e_mu_t)); __Pyx_INCREF(((PyObject *)__pyx_n_s__e_beta_t)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 21, ((PyObject *)__pyx_n_s__e_beta_t)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__e_beta_t)); __Pyx_INCREF(((PyObject *)__pyx_n_s__transition)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 22, ((PyObject *)__pyx_n_s__transition)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__transition)); __Pyx_INCREF(((PyObject *)__pyx_n_s__transversion)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 23, ((PyObject *)__pyx_n_s__transversion)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__transversion)); __Pyx_INCREF(((PyObject *)__pyx_n_s__p)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 24, ((PyObject *)__pyx_n_s__p)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__p)); __Pyx_INCREF(((PyObject *)__pyx_n_s__j)); PyTuple_SET_ITEM(__pyx_k_tuple_12, 25, ((PyObject *)__pyx_n_s__j)); __Pyx_GIVEREF(((PyObject *)__pyx_n_s__j)); __Pyx_GIVEREF(((PyObject *)__pyx_k_tuple_12)); __pyx_k_codeobj_13 = (PyObject*)__Pyx_PyCode_New(6, 0, 26, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_k_tuple_12, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_14, __pyx_n_s__calc_TN93_P, 8, __pyx_empty_bytes); if (unlikely(!__pyx_k_codeobj_13)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_RefNannyFinishContext(); return 0; __pyx_L1_error:; __Pyx_RefNannyFinishContext(); return -1; } static int __Pyx_InitGlobals(void) { if (__Pyx_InitStrings(__pyx_string_tab) < 0) {__pyx_filename = __pyx_f[1]; 
__pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_0 = PyInt_FromLong(0); if (unlikely(!__pyx_int_0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; __pyx_int_1 = PyInt_FromLong(1); if (unlikely(!__pyx_int_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; return 0; __pyx_L1_error:; return -1; } #if PY_MAJOR_VERSION < 3 PyMODINIT_FUNC init_solved_models(void); /*proto*/ PyMODINIT_FUNC init_solved_models(void) #else PyMODINIT_FUNC PyInit__solved_models(void); /*proto*/ PyMODINIT_FUNC PyInit__solved_models(void) #endif { PyObject *__pyx_t_1 = NULL; __Pyx_RefNannyDeclarations #if CYTHON_REFNANNY __Pyx_RefNanny = __Pyx_RefNannyImportAPI("refnanny"); if (!__Pyx_RefNanny) { PyErr_Clear(); __Pyx_RefNanny = __Pyx_RefNannyImportAPI("Cython.Runtime.refnanny"); if (!__Pyx_RefNanny) Py_FatalError("failed to import 'refnanny' module"); } #endif __Pyx_RefNannySetupContext("PyMODINIT_FUNC PyInit__solved_models(void)", 0); if ( __Pyx_check_binary_version() < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_tuple = PyTuple_New(0); if (unlikely(!__pyx_empty_tuple)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __pyx_empty_bytes = PyBytes_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_bytes)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #ifdef __Pyx_CyFunction_USED if (__Pyx_CyFunction_init() < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_FusedFunction_USED if (__pyx_FusedFunction_init() < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} #endif #ifdef __Pyx_Generator_USED if (__pyx_Generator_init() < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = 
__LINE__; goto __pyx_L1_error;} #endif /*--- Library function declarations ---*/ /*--- Threads initialization code ---*/ #if defined(__PYX_FORCE_INIT_THREADS) && __PYX_FORCE_INIT_THREADS #ifdef WITH_THREAD /* Python build with threading support? */ PyEval_InitThreads(); #endif #endif /*--- Module creation code ---*/ #if PY_MAJOR_VERSION < 3 __pyx_m = Py_InitModule4(__Pyx_NAMESTR("_solved_models"), __pyx_methods, 0, 0, PYTHON_API_VERSION); #else __pyx_m = PyModule_Create(&__pyx_moduledef); #endif if (!__pyx_m) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; #if PY_MAJOR_VERSION < 3 Py_INCREF(__pyx_m); #endif __pyx_b = PyImport_AddModule(__Pyx_NAMESTR(__Pyx_BUILTIN_MODULE_NAME)); if (!__pyx_b) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; if (__Pyx_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; /*--- Initialize various global constants etc. 
---*/ if (unlikely(__Pyx_InitGlobals() < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} if (__pyx_module_is_main_cogent__evolve___solved_models) { if (__Pyx_SetAttrString(__pyx_m, "__name__", __pyx_n_s____main__) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}; } /*--- Builtin init code ---*/ if (unlikely(__Pyx_InitCachedBuiltins() < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Constants init code ---*/ if (unlikely(__Pyx_InitCachedConstants() < 0)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /*--- Global init code ---*/ /*--- Variable export code ---*/ /*--- Function export code ---*/ /*--- Type init code ---*/ /*--- Type import code ---*/ /*--- Variable import code ---*/ /*--- Function import code ---*/ /*--- Execution code ---*/ /* "/Users/gavin/DevRepos/PyCogent-hg/cogent/evolve/../../include/numerical_pyrex.pyx":13 * # * * __version__ = "('1', '6', '0dev')" # <<<<<<<<<<<<<< * * cdef extern from "Python.h": */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s____version__, ((PyObject *)__pyx_kp_s_10)) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 13; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_solved_models.pyx":6 * double exp(double) * * version_info = (1, 0) # <<<<<<<<<<<<<< * * def calc_TN93_P(int do_scaling, mprobs, double time, alpha_1, alpha_2, result): */ if (PyObject_SetAttr(__pyx_m, __pyx_n_s__version_info, ((PyObject *)__pyx_k_tuple_11)) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;} /* "cogent/evolve/_solved_models.pyx":8 * version_info = (1, 0) * * def calc_TN93_P(int do_scaling, mprobs, double time, alpha_1, alpha_2, result): # <<<<<<<<<<<<<< * cdef int S, motif, i, other, row, column, b_row, b_column * cdef double *pi, *P, scale_factor */ __pyx_t_1 = 
PyCFunction_NewEx(&__pyx_mdef_6cogent_6evolve_14_solved_models_1calc_TN93_P, NULL, __pyx_n_s_15); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(__pyx_t_1); if (PyObject_SetAttr(__pyx_m, __pyx_n_s__calc_TN93_P, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; /* "cogent/evolve/_solved_models.pyx":1 * include "../../include/numerical_pyrex.pyx" # <<<<<<<<<<<<<< * * cdef extern from "math.h": */ __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_GOTREF(((PyObject *)__pyx_t_1)); if (PyObject_SetAttr(__pyx_m, __pyx_n_s____test__, ((PyObject *)__pyx_t_1)) < 0) {__pyx_filename = __pyx_f[1]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;} __Pyx_DECREF(((PyObject *)__pyx_t_1)); __pyx_t_1 = 0; goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); if (__pyx_m) { __Pyx_AddTraceback("init cogent.evolve._solved_models", __pyx_clineno, __pyx_lineno, __pyx_filename); Py_DECREF(__pyx_m); __pyx_m = 0; } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_ImportError, "init cogent.evolve._solved_models"); } __pyx_L0:; __Pyx_RefNannyFinishContext(); #if PY_MAJOR_VERSION < 3 return; #else return __pyx_m; #endif } /* Runtime support code */ #if CYTHON_REFNANNY static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname) { PyObject *m = NULL, *p = NULL; void *r = NULL; m = PyImport_ImportModule((char *)modname); if (!m) goto end; p = PyObject_GetAttrString(m, (char *)"RefNannyAPI"); if (!p) goto end; r = PyLong_AsVoidPtr(p); end: Py_XDECREF(p); Py_XDECREF(m); return (__Pyx_RefNannyAPIStruct *)r; } #endif /* CYTHON_REFNANNY */ static PyObject *__Pyx_GetName(PyObject *dict, PyObject *name) { PyObject *result; result = PyObject_GetAttr(dict, name); if (!result) { 
if (dict != __pyx_b) { PyErr_Clear(); result = PyObject_GetAttr(__pyx_b, name); } if (!result) { PyErr_SetObject(PyExc_NameError, name); } } return result; } static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *value, PyObject *tb) { #if CYTHON_COMPILING_IN_CPYTHON PyObject *tmp_type, *tmp_value, *tmp_tb; PyThreadState *tstate = PyThreadState_GET(); tmp_type = tstate->curexc_type; tmp_value = tstate->curexc_value; tmp_tb = tstate->curexc_traceback; tstate->curexc_type = type; tstate->curexc_value = value; tstate->curexc_traceback = tb; Py_XDECREF(tmp_type); Py_XDECREF(tmp_value); Py_XDECREF(tmp_tb); #else PyErr_Restore(type, value, tb); #endif } static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb) { #if CYTHON_COMPILING_IN_CPYTHON PyThreadState *tstate = PyThreadState_GET(); *type = tstate->curexc_type; *value = tstate->curexc_value; *tb = tstate->curexc_traceback; tstate->curexc_type = 0; tstate->curexc_value = 0; tstate->curexc_traceback = 0; #else PyErr_Fetch(type, value, tb); #endif } #if PY_MAJOR_VERSION < 3 static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, CYTHON_UNUSED PyObject *cause) { Py_XINCREF(type); Py_XINCREF(value); Py_XINCREF(tb); if (tb == Py_None) { Py_DECREF(tb); tb = 0; } else if (tb != NULL && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto raise_error; } if (value == NULL) { value = Py_None; Py_INCREF(value); } #if PY_VERSION_HEX < 0x02050000 if (!PyClass_Check(type)) #else if (!PyType_Check(type)) #endif { if (value != Py_None) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto raise_error; } Py_DECREF(value); value = type; #if PY_VERSION_HEX < 0x02050000 if (PyInstance_Check(type)) { type = (PyObject*) ((PyInstanceObject*)type)->in_class; Py_INCREF(type); } else { type = 0; PyErr_SetString(PyExc_TypeError, "raise: exception must be an old-style class or instance"); 
goto raise_error; } #else type = (PyObject*) Py_TYPE(type); Py_INCREF(type); if (!PyType_IsSubtype((PyTypeObject *)type, (PyTypeObject *)PyExc_BaseException)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto raise_error; } #endif } __Pyx_ErrRestore(type, value, tb); return; raise_error: Py_XDECREF(value); Py_XDECREF(type); Py_XDECREF(tb); return; } #else /* Python 3+ */ static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause) { if (tb == Py_None) { tb = 0; } else if (tb && !PyTraceBack_Check(tb)) { PyErr_SetString(PyExc_TypeError, "raise: arg 3 must be a traceback or None"); goto bad; } if (value == Py_None) value = 0; if (PyExceptionInstance_Check(type)) { if (value) { PyErr_SetString(PyExc_TypeError, "instance exception may not have a separate value"); goto bad; } value = type; type = (PyObject*) Py_TYPE(value); } else if (!PyExceptionClass_Check(type)) { PyErr_SetString(PyExc_TypeError, "raise: exception class must be a subclass of BaseException"); goto bad; } if (cause) { PyObject *fixed_cause; if (PyExceptionClass_Check(cause)) { fixed_cause = PyObject_CallObject(cause, NULL); if (fixed_cause == NULL) goto bad; } else if (PyExceptionInstance_Check(cause)) { fixed_cause = cause; Py_INCREF(fixed_cause); } else { PyErr_SetString(PyExc_TypeError, "exception causes must derive from " "BaseException"); goto bad; } if (!value) { value = PyObject_CallObject(type, NULL); } PyException_SetCause(value, fixed_cause); } PyErr_SetObject(type, value); if (tb) { PyThreadState *tstate = PyThreadState_GET(); PyObject* tmp_tb = tstate->curexc_traceback; if (tb != tmp_tb) { Py_INCREF(tb); tstate->curexc_traceback = tb; Py_XDECREF(tmp_tb); } } bad: return; } #endif static void __Pyx_RaiseArgtupleInvalid( const char* func_name, int exact, Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found) { Py_ssize_t num_expected; const char *more_or_less; if (num_found < num_min) { num_expected = 
num_min; more_or_less = "at least"; } else { num_expected = num_max; more_or_less = "at most"; } if (exact) { more_or_less = "exactly"; } PyErr_Format(PyExc_TypeError, "%s() takes %s %"PY_FORMAT_SIZE_T"d positional argument%s (%"PY_FORMAT_SIZE_T"d given)", func_name, more_or_less, num_expected, (num_expected == 1) ? "" : "s", num_found); } static void __Pyx_RaiseDoubleKeywordsError( const char* func_name, PyObject* kw_name) { PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION >= 3 "%s() got multiple values for keyword argument '%U'", func_name, kw_name); #else "%s() got multiple values for keyword argument '%s'", func_name, PyString_AS_STRING(kw_name)); #endif } static int __Pyx_ParseOptionalKeywords( PyObject *kwds, PyObject **argnames[], PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args, const char* function_name) { PyObject *key = 0, *value = 0; Py_ssize_t pos = 0; PyObject*** name; PyObject*** first_kw_arg = argnames + num_pos_args; while (PyDict_Next(kwds, &pos, &key, &value)) { name = first_kw_arg; while (*name && (**name != key)) name++; if (*name) { values[name-argnames] = value; } else { #if PY_MAJOR_VERSION < 3 if (unlikely(!PyString_CheckExact(key)) && unlikely(!PyString_Check(key))) { #else if (unlikely(!PyUnicode_Check(key))) { #endif goto invalid_keyword_type; } else { for (name = first_kw_arg; *name; name++) { #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) break; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) break; #endif } if (*name) { values[name-argnames] = value; } else { for (name=argnames; name != first_kw_arg; name++) { if (**name == key) goto arg_passed_twice; #if PY_MAJOR_VERSION >= 3 if (PyUnicode_GET_SIZE(**name) == PyUnicode_GET_SIZE(key) && PyUnicode_Compare(**name, key) == 0) goto arg_passed_twice; #else if (PyString_GET_SIZE(**name) == PyString_GET_SIZE(key) && _PyString_Eq(**name, key)) goto 
arg_passed_twice; #endif } if (kwds2) { if (unlikely(PyDict_SetItem(kwds2, key, value))) goto bad; } else { goto invalid_keyword; } } } } } return 0; arg_passed_twice: __Pyx_RaiseDoubleKeywordsError(function_name, **name); goto bad; invalid_keyword_type: PyErr_Format(PyExc_TypeError, "%s() keywords must be strings", function_name); goto bad; invalid_keyword: PyErr_Format(PyExc_TypeError, #if PY_MAJOR_VERSION < 3 "%s() got an unexpected keyword argument '%s'", function_name, PyString_AsString(key)); #else "%s() got an unexpected keyword argument '%U'", function_name, key); #endif bad: return -1; } static CYTHON_INLINE long __Pyx_div_long(long a, long b) { long q = a / b; long r = a - q*b; q -= ((r != 0) & ((r ^ b) < 0)); return q; } static CYTHON_INLINE long __Pyx_mod_long(long a, long b) { long r = a % b; r += ((r != 0) & ((r ^ b) < 0)) * b; return r; } static CYTHON_INLINE unsigned char __Pyx_PyInt_AsUnsignedChar(PyObject* x) { const unsigned char neg_one = (unsigned char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned char" : "value too large to convert to unsigned char"); } return (unsigned char)-1; } return (unsigned char)val; } return (unsigned char)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned short __Pyx_PyInt_AsUnsignedShort(PyObject* x) { const unsigned short neg_one = (unsigned short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to unsigned short" : "value too large to convert to unsigned short"); } return (unsigned short)-1; } return (unsigned short)val; } return (unsigned short)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE unsigned int __Pyx_PyInt_AsUnsignedInt(PyObject* x) { const unsigned int neg_one = (unsigned int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(unsigned int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(unsigned int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to unsigned int" : "value too large to convert to unsigned int"); } return (unsigned int)-1; } return (unsigned int)val; } return (unsigned int)__Pyx_PyInt_AsUnsignedLong(x); } static CYTHON_INLINE char __Pyx_PyInt_AsChar(PyObject* x) { const char neg_one = (char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to char" : "value too large to convert to char"); } return (char)-1; } return (char)val; } return (char)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE short __Pyx_PyInt_AsShort(PyObject* x) { const short neg_one = (short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to short" : "value too large to convert to short"); } return (short)-1; } return (short)val; } return (short)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsInt(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE signed char __Pyx_PyInt_AsSignedChar(PyObject* x) { const signed char neg_one = (signed char)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed char) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed char)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed char" : "value too large to convert to signed char"); } return (signed char)-1; } return (signed char)val; } return (signed char)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed short __Pyx_PyInt_AsSignedShort(PyObject* x) { const signed short neg_one = (signed short)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed short) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed short)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to signed short" : "value too large to convert to signed short"); } return (signed short)-1; } return (signed short)val; } return (signed short)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE signed int __Pyx_PyInt_AsSignedInt(PyObject* x) { const signed int neg_one = (signed int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(signed int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(signed int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? "can't convert negative value to signed int" : "value too large to convert to signed int"); } return (signed int)-1; } return (signed int)val; } return (signed int)__Pyx_PyInt_AsSignedLong(x); } static CYTHON_INLINE int __Pyx_PyInt_AsLongDouble(PyObject* x) { const int neg_one = (int)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; if (sizeof(int) < sizeof(long)) { long val = __Pyx_PyInt_AsLong(x); if (unlikely(val != (long)(int)val)) { if (!unlikely(val == -1 && PyErr_Occurred())) { PyErr_SetString(PyExc_OverflowError, (is_unsigned && unlikely(val < 0)) ? 
"can't convert negative value to int" : "value too large to convert to int"); } return (int)-1; } return (int)val; } return (int)__Pyx_PyInt_AsLong(x); } static CYTHON_INLINE unsigned long __Pyx_PyInt_AsUnsignedLong(PyObject* x) { const unsigned long neg_one = (unsigned long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned long"); return (unsigned long)-1; } return (unsigned long)PyLong_AsUnsignedLong(x); } else { return (unsigned long)PyLong_AsLong(x); } } else { unsigned long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned long)-1; val = __Pyx_PyInt_AsUnsignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE unsigned PY_LONG_LONG __Pyx_PyInt_AsUnsignedLongLong(PyObject* x) { const unsigned PY_LONG_LONG neg_one = (unsigned PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to unsigned PY_LONG_LONG"); return (unsigned PY_LONG_LONG)-1; } return (unsigned PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (unsigned PY_LONG_LONG)PyLong_AsLongLong(x); } } else { unsigned PY_LONG_LONG val; 
PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (unsigned PY_LONG_LONG)-1; val = __Pyx_PyInt_AsUnsignedLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE long __Pyx_PyInt_AsLong(PyObject* x) { const long neg_one = (long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to long"); return (long)-1; } return (long)PyLong_AsUnsignedLong(x); } else { return (long)PyLong_AsLong(x); } } else { long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (long)-1; val = __Pyx_PyInt_AsLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE PY_LONG_LONG __Pyx_PyInt_AsLongLong(PyObject* x) { const PY_LONG_LONG neg_one = (PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to PY_LONG_LONG"); return (PY_LONG_LONG)-1; } return (PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (PY_LONG_LONG)PyLong_AsLongLong(x); } } else { PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1; val = __Pyx_PyInt_AsLongLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed long __Pyx_PyInt_AsSignedLong(PyObject* x) { const signed 
long neg_one = (signed long)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed long"); return (signed long)-1; } return (signed long)PyLong_AsUnsignedLong(x); } else { return (signed long)PyLong_AsLong(x); } } else { signed long val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed long)-1; val = __Pyx_PyInt_AsSignedLong(tmp); Py_DECREF(tmp); return val; } } static CYTHON_INLINE signed PY_LONG_LONG __Pyx_PyInt_AsSignedLongLong(PyObject* x) { const signed PY_LONG_LONG neg_one = (signed PY_LONG_LONG)-1, const_zero = 0; const int is_unsigned = neg_one > const_zero; #if PY_VERSION_HEX < 0x03000000 if (likely(PyInt_Check(x))) { long val = PyInt_AS_LONG(x); if (is_unsigned && unlikely(val < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)val; } else #endif if (likely(PyLong_Check(x))) { if (is_unsigned) { if (unlikely(Py_SIZE(x) < 0)) { PyErr_SetString(PyExc_OverflowError, "can't convert negative value to signed PY_LONG_LONG"); return (signed PY_LONG_LONG)-1; } return (signed PY_LONG_LONG)PyLong_AsUnsignedLongLong(x); } else { return (signed PY_LONG_LONG)PyLong_AsLongLong(x); } } else { signed PY_LONG_LONG val; PyObject *tmp = __Pyx_PyNumber_Int(x); if (!tmp) return (signed PY_LONG_LONG)-1; val = __Pyx_PyInt_AsSignedLongLong(tmp); Py_DECREF(tmp); return val; } } static void __Pyx_WriteUnraisable(const char *name, int clineno, int lineno, const char *filename) { PyObject *old_exc, *old_val, *old_tb; 
PyObject *ctx; __Pyx_ErrFetch(&old_exc, &old_val, &old_tb); #if PY_MAJOR_VERSION < 3 ctx = PyString_FromString(name); #else ctx = PyUnicode_FromString(name); #endif __Pyx_ErrRestore(old_exc, old_val, old_tb); if (!ctx) { PyErr_WriteUnraisable(Py_None); } else { PyErr_WriteUnraisable(ctx); Py_DECREF(ctx); } } static int __Pyx_check_binary_version(void) { char ctversion[4], rtversion[4]; PyOS_snprintf(ctversion, 4, "%d.%d", PY_MAJOR_VERSION, PY_MINOR_VERSION); PyOS_snprintf(rtversion, 4, "%s", Py_GetVersion()); if (ctversion[0] != rtversion[0] || ctversion[2] != rtversion[2]) { char message[200]; PyOS_snprintf(message, sizeof(message), "compiletime version %s of module '%.100s' " "does not match runtime version %s", ctversion, __Pyx_MODULE_NAME, rtversion); #if PY_VERSION_HEX < 0x02050000 return PyErr_Warn(NULL, message); #else return PyErr_WarnEx(NULL, message, 1); #endif } return 0; } static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line) { int start = 0, mid = 0, end = count - 1; if (end >= 0 && code_line > entries[end].code_line) { return count; } while (start < end) { mid = (start + end) / 2; if (code_line < entries[mid].code_line) { end = mid; } else if (code_line > entries[mid].code_line) { start = mid + 1; } else { return mid; } } if (code_line <= entries[mid].code_line) { return mid; } else { return mid + 1; } } static PyCodeObject *__pyx_find_code_object(int code_line) { PyCodeObject* code_object; int pos; if (unlikely(!code_line) || unlikely(!__pyx_code_cache.entries)) { return NULL; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if (unlikely(pos >= __pyx_code_cache.count) || unlikely(__pyx_code_cache.entries[pos].code_line != code_line)) { return NULL; } code_object = __pyx_code_cache.entries[pos].code_object; Py_INCREF(code_object); return code_object; } static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object) { int pos, i; 
__Pyx_CodeObjectCacheEntry* entries = __pyx_code_cache.entries; if (unlikely(!code_line)) { return; } if (unlikely(!entries)) { entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Malloc(64*sizeof(__Pyx_CodeObjectCacheEntry)); if (likely(entries)) { __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = 64; __pyx_code_cache.count = 1; entries[0].code_line = code_line; entries[0].code_object = code_object; Py_INCREF(code_object); } return; } pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line); if ((pos < __pyx_code_cache.count) && unlikely(__pyx_code_cache.entries[pos].code_line == code_line)) { PyCodeObject* tmp = entries[pos].code_object; entries[pos].code_object = code_object; Py_DECREF(tmp); return; } if (__pyx_code_cache.count == __pyx_code_cache.max_count) { int new_max = __pyx_code_cache.max_count + 64; entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Realloc( __pyx_code_cache.entries, new_max*sizeof(__Pyx_CodeObjectCacheEntry)); if (unlikely(!entries)) { return; } __pyx_code_cache.entries = entries; __pyx_code_cache.max_count = new_max; } for (i=__pyx_code_cache.count; i>pos; i--) { entries[i] = entries[i-1]; } entries[pos].code_line = code_line; entries[pos].code_object = code_object; __pyx_code_cache.count++; Py_INCREF(code_object); } #include "compile.h" #include "frameobject.h" #include "traceback.h" static PyCodeObject* __Pyx_CreateCodeObjectForTraceback( const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_srcfile = 0; PyObject *py_funcname = 0; #if PY_MAJOR_VERSION < 3 py_srcfile = PyString_FromString(filename); #else py_srcfile = PyUnicode_FromString(filename); #endif if (!py_srcfile) goto bad; if (c_line) { #if PY_MAJOR_VERSION < 3 py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #else py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line); #endif } else { #if PY_MAJOR_VERSION < 3 py_funcname 
= PyString_FromString(funcname); #else py_funcname = PyUnicode_FromString(funcname); #endif } if (!py_funcname) goto bad; py_code = __Pyx_PyCode_New( 0, /*int argcount,*/ 0, /*int kwonlyargcount,*/ 0, /*int nlocals,*/ 0, /*int stacksize,*/ 0, /*int flags,*/ __pyx_empty_bytes, /*PyObject *code,*/ __pyx_empty_tuple, /*PyObject *consts,*/ __pyx_empty_tuple, /*PyObject *names,*/ __pyx_empty_tuple, /*PyObject *varnames,*/ __pyx_empty_tuple, /*PyObject *freevars,*/ __pyx_empty_tuple, /*PyObject *cellvars,*/ py_srcfile, /*PyObject *filename,*/ py_funcname, /*PyObject *name,*/ py_line, /*int firstlineno,*/ __pyx_empty_bytes /*PyObject *lnotab*/ ); Py_DECREF(py_srcfile); Py_DECREF(py_funcname); return py_code; bad: Py_XDECREF(py_srcfile); Py_XDECREF(py_funcname); return NULL; } static void __Pyx_AddTraceback(const char *funcname, int c_line, int py_line, const char *filename) { PyCodeObject *py_code = 0; PyObject *py_globals = 0; PyFrameObject *py_frame = 0; py_code = __pyx_find_code_object(c_line ? c_line : py_line); if (!py_code) { py_code = __Pyx_CreateCodeObjectForTraceback( funcname, c_line, py_line, filename); if (!py_code) goto bad; __pyx_insert_code_object(c_line ? 
c_line : py_line, py_code); } py_globals = PyModule_GetDict(__pyx_m); if (!py_globals) goto bad; py_frame = PyFrame_New( PyThreadState_GET(), /*PyThreadState *tstate,*/ py_code, /*PyCodeObject *code,*/ py_globals, /*PyObject *globals,*/ 0 /*PyObject *locals*/ ); if (!py_frame) goto bad; py_frame->f_lineno = py_line; PyTraceBack_Here(py_frame); bad: Py_XDECREF(py_code); Py_XDECREF(py_frame); } static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) { while (t->p) { #if PY_MAJOR_VERSION < 3 if (t->is_unicode) { *t->p = PyUnicode_DecodeUTF8(t->s, t->n - 1, NULL); } else if (t->intern) { *t->p = PyString_InternFromString(t->s); } else { *t->p = PyString_FromStringAndSize(t->s, t->n - 1); } #else /* Python 3+ has unicode identifiers */ if (t->is_unicode | t->is_str) { if (t->intern) { *t->p = PyUnicode_InternFromString(t->s); } else if (t->encoding) { *t->p = PyUnicode_Decode(t->s, t->n - 1, t->encoding, NULL); } else { *t->p = PyUnicode_FromStringAndSize(t->s, t->n - 1); } } else { *t->p = PyBytes_FromStringAndSize(t->s, t->n - 1); } #endif if (!*t->p) return -1; ++t; } return 0; } /* Type Conversion Functions */ static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) { int is_true = x == Py_True; if (is_true | (x == Py_False) | (x == Py_None)) return is_true; else return PyObject_IsTrue(x); } static CYTHON_INLINE PyObject* __Pyx_PyNumber_Int(PyObject* x) { PyNumberMethods *m; const char *name = NULL; PyObject *res = NULL; #if PY_VERSION_HEX < 0x03000000 if (PyInt_Check(x) || PyLong_Check(x)) #else if (PyLong_Check(x)) #endif return Py_INCREF(x), x; m = Py_TYPE(x)->tp_as_number; #if PY_VERSION_HEX < 0x03000000 if (m && m->nb_int) { name = "int"; res = PyNumber_Int(x); } else if (m && m->nb_long) { name = "long"; res = PyNumber_Long(x); } #else if (m && m->nb_int) { name = "int"; res = PyNumber_Long(x); } #endif if (res) { #if PY_VERSION_HEX < 0x03000000 if (!PyInt_Check(res) && !PyLong_Check(res)) { #else if (!PyLong_Check(res)) { #endif PyErr_Format(PyExc_TypeError, 
"__%s__ returned non-%s (type %.200s)", name, name, Py_TYPE(res)->tp_name); Py_DECREF(res); return NULL; } } else if (!PyErr_Occurred()) { PyErr_SetString(PyExc_TypeError, "an integer is required"); } return res; } static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) { Py_ssize_t ival; PyObject* x = PyNumber_Index(b); if (!x) return -1; ival = PyInt_AsSsize_t(x); Py_DECREF(x); return ival; } static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) { #if PY_VERSION_HEX < 0x02050000 if (ival <= LONG_MAX) return PyInt_FromLong((long)ival); else { unsigned char *bytes = (unsigned char *) &ival; int one = 1; int little = (int)*(unsigned char*)&one; return _PyLong_FromByteArray(bytes, sizeof(size_t), little, 0); } #else return PyInt_FromSize_t(ival); #endif } static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject* x) { unsigned PY_LONG_LONG val = __Pyx_PyInt_AsUnsignedLongLong(x); if (unlikely(val == (unsigned PY_LONG_LONG)-1 && PyErr_Occurred())) { return (size_t)-1; } else if (unlikely(val != (unsigned PY_LONG_LONG)(size_t)val)) { PyErr_SetString(PyExc_OverflowError, "value too large to convert to size_t"); return (size_t)-1; } return (size_t)val; } #endif /* Py_PYTHON_H */ PyCogent-1.5.3/cogent/evolve/_solved_models.pyx000644 000765 000024 00000003045 11524070152 022556 0ustar00jrideoutstaff000000 000000 include "../../include/numerical_pyrex.pyx" cdef extern from "math.h": double exp(double) version_info = (1, 0) def calc_TN93_P(int do_scaling, mprobs, double time, alpha_1, alpha_2, result): cdef int S, motif, i, other, row, column, b_row, b_column cdef double *pi, *P, scale_factor cdef double pi_star[2], alpha[2], mu[2], e_mu_t[2], e_beta_t cdef double transition[2], transversion, p alpha[0] = alpha_1 alpha[1] = alpha_2 S = 4 pi = checkArrayDouble1D(mprobs, &S) P = checkArrayDouble2D(result, &S, &S) pi_star[0] = pi[0] + pi[1] pi_star[1] = pi[2] + pi[3] mu[0] = alpha[0] * pi_star[0] + 1.0 * pi_star[1] mu[1] = 1.0 * pi_star[0] + alpha[1] 
* pi_star[1] if do_scaling: scale_factor = 0.0 for motif in range(4): i = motif // 2 other = 1 - i scale_factor += (alpha[i] * pi[2*i+1-motif%2] + pi_star[other]) * pi[motif] time /= scale_factor e_beta_t = exp(-time) transversion = 1 - e_beta_t for i in range(2): other = 1 - i e_mu_t[i] = exp(-mu[i]*time) transition[i] = 1 + (pi_star[other] * e_beta_t - e_mu_t[i]) / pi_star[i] for row in range(4): i = row // 2 for column in range(4): j = column // 2 if i == j: p = transition[i] else: p = transversion p *= pi[column] if row == column: p += e_mu_t[i] P[column+4*row] = p PyCogent-1.5.3/cogent/evolve/best_likelihood.py000644 000765 000024 00000011321 12024702176 022530 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Returns the likelihood resulting from a model in which motif probabilities are assumed to be equal to the observed motif frequencies, as described by Goldman (1993). This model is not informative for inferring the evolutionary process, but its likelihood indicates the maximum possible likelihood value for a site-independent evolutionary process. """ from __future__ import division from numpy import log from cogent import LoadSeqs __author__ = "Helen Lindsay, Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Helen Lindsay", "Gavin Huttley", "Daniel McDonald"] cite = "Goldman, N. (1993). Statistical tests of models of DNA substitution. 
J Mol Evol, 36: 182-98" __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def _transpose(array): new_array = [] num_cols = len(array[0]) for col in range(num_cols): new_row = [] for row in array: new_row += [row[col]] new_array.append(new_row) return new_array def _take(array, indices): new_array = [] for index in indices: new_array.append(array[index]) return new_array def aligned_columns_to_rows(aln, motif_len, exclude_chars = None, allowed_chars='ACGT'): """return alignment broken into motifs as a transposed list with sequences as columns and aligned columns as rows Arguments: exclude_chars: columns containing these characters will be excluded""" if exclude_chars: exclude_chars = set(exclude_chars) exclude_func = exclude_chars.intersection else: allowed_chars = set(allowed_chars) exclude_func = lambda x: not allowed_chars.issuperset(x) exclude_indices = set() array = [] for name in aln.Names: motifs = list(aln.getGappedSeq(name).getInMotifSize(motif_len)) array.append(motifs) for motif_index, motif in enumerate(motifs): if exclude_func(motif): exclude_indices.update([motif_index]) include_indices = set(range(len(array[0]))).difference(exclude_indices) include_indices = list(include_indices) include_indices.sort() array = _transpose(array) array = _take(array, include_indices) return array def count_column_freqs(columns_list): """return the frequency of columns""" col_freq_dict = {} for column in columns_list: column = ' '.join(column) col_freq_dict[column] = col_freq_dict.get(column, 0) + 1 return col_freq_dict def get_ML_probs(columns_list, with_patterns=False): """returns the column log-likelihoods and frequencies Argument: - with_patterns: the site patterns are returned""" n = len(columns_list) col_freq_dict = count_column_freqs(columns_list) col_lnL_freqs = [] for column_pattern, freq in col_freq_dict.items(): # note, the behaviour of / is changed due to the __future__ 
import if with_patterns: row = [column_pattern, freq/n, freq] else: row = [freq/n, freq] col_lnL_freqs.append(row) return col_lnL_freqs def get_G93_lnL_from_array(columns_list): """return the best log likelihood for a list of aligned columns""" col_stats = get_ML_probs(columns_list) log_likelihood = 0 for freq, num in col_stats: pattern_lnL = log(freq)*num log_likelihood += pattern_lnL return log_likelihood def BestLogLikelihood(aln, alphabet=None, exclude_chars = None, allowed_chars='ACGT', motif_length=None, return_length=False): """returns the best log-likelihood according to Goldman 1993. Arguments: - alphabet: a sequence alphabet object. - motif_length: 1 for nucleotide, 2 for dinucleotide, etc .. - exclude_chars: a series of characters used to exclude motifs - allowed_chars: only motifs that contain a subset of these are allowed - return_length: whether to also return the number of alignment columns """ assert alphabet or motif_length, "Must provide either an alphabet or a"\ " motif_length" # need to use the alphabet, so we can enforce character compliance if alphabet: kwargs = dict(moltype=alphabet.MolType) motif_length = alphabet.getMotifLen() else: kwargs = {} aln = LoadSeqs(data=aln.todict(), **kwargs) columns = aligned_columns_to_rows(aln, motif_length, exclude_chars, allowed_chars) num_cols = len(columns) log_likelihood = get_G93_lnL_from_array(columns) if return_length: return log_likelihood, num_cols return log_likelihood PyCogent-1.5.3/cogent/evolve/bootstrap.py000644 000765 000024 00000016335 12024702176 021417 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Provides services for parametric bootstrapping. These include the ability to estimate probabilities or estimate confidence intervals. The estimation of probabilities is done by the EstimateProbability class. Functions that provide ParameterController objects for the 'null' and 'alternative' cases are provided to the constructor. 
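The EstimateProbability class defined in this module takes the probability of the observed LR as the fraction of null-simulated LR statistics at least as large as the observed one. That rule can be sketched in isolation with the stdlib only (`estimate_prob` is an illustrative name, not part of the PyCogent API):

```python
def estimate_prob(observed_lr, sample_lrs):
    """Empirical p-value: fraction of LRs simulated under the null model
    that are >= the observed LR.  Illustrative stand-in for the tail
    counting EstimateProbability.getEstimatedProb performs."""
    hits = sum(1 for lr in sample_lrs if lr >= observed_lr)
    return hits / len(sample_lrs)

# Observed LR of 2.0 against four simulated replicates: two are >= 2.0.
p_value = estimate_prob(2.0, [3.0, 1.0, 2.0, 0.5])  # -> 0.5
```

A small fraction here indicates the observed LR would be surprising if the null model generated the data.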
Numerous aspects of the bootstrapping can be controlled such as the choice of numerical optimiser, and the number of samples from which to estimate the probability. This class can be run in serial or in parallel (at the level of each random sample). An observed Likelihood Ratio (LR) statistic is estimated using the provided 'observed' data. Random data sets are simulated under the null model and the LR estimated from these. The probability of the observed LR is taken as the number of sample LR's that were >= to the observed. Confidence interval estimation can be done using the EstimateConfidenceIntervals class. Multiple statistics associated with an analysis can be evaluated simultaneously. Similar setup, and parallelisation options as provided by the EstimateProbability class. """ from __future__ import with_statement, division from cogent.util import parallel from cogent.util import progress_display as UI import random __author__ = "Gavin Huttley, Andrew Butterfield and Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley","Andrew Butterfield", "Matthew Wakefield", "Edward Lang", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class ParametricBootstrapCore(object): """Core parametric bootstrap services.""" def __init__(self): """Constructor for core parametric bootstrap services class.""" self._numreplicates = 10 self.seed = None self.results = [] def setNumReplicates(self, num): self._numreplicates = num def setSeed(self, seed): self.seed = seed @UI.display_wrap def run(self, ui, **opt_args): # Sets self.observed and self.results (a list _numreplicates long) to # whatever is returned from self.simplify([LF result from each PC]). # self.simplify() is used as the entire LF result might not be picklable # for MPI. 
Subclass must provide self.alignment and # self.parameter_controllers if 'random_series' not in opt_args and not opt_args.get('local', None): opt_args['random_series'] = random.Random() null_pc = self.parameter_controllers[0] pcs = len(self.parameter_controllers) if pcs == 1: model_label = [''] elif pcs == 2: model_label = ['null', 'alt '] else: model_label = ['null'] + ['alt%s'%i for i in range(1,pcs)] @UI.display_wrap def each_model(alignment, ui): def one_model(pc): pc.setAlignment(alignment) return pc.optimise(return_calculator=True, **opt_args) # This is not done in parallel because we depend on the side- # effect of changing the parameter_controller current values memos = ui.eager_map(one_model, self.parameter_controllers, labels=model_label, pure=False) concise_result = self.simplify(*self.parameter_controllers) return (memos, concise_result) #optimisations = pcs * (self._numreplicates + 1) init_work = pcs / (self._numreplicates + pcs) ui.display('Original data', 0.0, init_work) (starting_points, self.observed) = each_model(self.alignment) ui.display('Randomness', init_work, 0.0) alignment_random_state = random.Random(self.seed).getstate() if self.seed is None: comm = parallel.getCommunicator() alignment_random_state = comm.bcast(alignment_random_state, 0) def one_replicate(i): for (pc, start_point) in zip(self.parameter_controllers, starting_points): # may have fewer CPUs per replicate than for original pc.setupParallelContext() # using a calculator as a memo object to reset the params pc.updateFromCalculator(start_point) aln_rnd = random.Random(0) aln_rnd.setstate(alignment_random_state) aln_rnd.jumpahead(i*10**9) simalign = null_pc.simulateAlignment(random_series=aln_rnd) (dummy, result) = each_model(simalign) return result ui.display('Bootstrap', init_work) self.results = ui.eager_map( one_replicate, range(self._numreplicates), noun='replicate', start=init_work) class EstimateProbability(ParametricBootstrapCore): # 2 parameter controllers, LR def 
__init__(self, null_parameter_controller, alt_parameter_controller, alignment): ParametricBootstrapCore.__init__(self) self.alignment = alignment self.null_parameter_controller = null_parameter_controller self.alt_parameter_controller = alt_parameter_controller self.parameter_controllers = [self.null_parameter_controller, self.alt_parameter_controller] def simplify(self, null_result, alt_result): return (null_result.getLogLikelihood(), alt_result.getLogLikelihood()) def getObservedlnL(self): return self.observed def getSamplelnL(self): return self.results def getSampleLRList(self): LR = [2 * (alt_lnL - null_lnL) for (null_lnL, alt_lnL) in self.results] LR.sort() LR.reverse() return LR def getObservedLR(self): return 2 * (self.observed[1] - self.observed[0]) def getEstimatedProb(self): """Return the estimated probability. Calculated as the number of sample LR's >= observed LR divided by the number of replicates. """ observed_LR = self.getObservedLR() sample_LRs = self.getSampleLRList() for (count, value) in enumerate(sample_LRs): if value <= observed_LR: return float(count) / len(sample_LRs) return 1.0 class EstimateConfidenceIntervals(ParametricBootstrapCore): """Estimate confidence interval(s) for one or many statistics by parametric bootstrapping.""" def __init__(self, parameter_controller, func_calcstats, alignment): # func_calcstats takes a param dict and returns the statistic of interest ParametricBootstrapCore.__init__(self) self.alignment = alignment self.parameter_controller = parameter_controller self.parameter_controllers = [parameter_controller] self.func_calcstats = func_calcstats def simplify(self, result): return (result.getLogLikelihood(), self.func_calcstats(result)) def getObservedStats(self): return self.observed[1] def getSampleStats(self): return [s for (lnL, s) in self.results] def getSamplelnL(self): return [lnL for (lnL, s) in self.results] def getObservedlnL(self): return self.observed[0] PyCogent-1.5.3/cogent/evolve/coevolution.py000755 
000765 000024 00000324503 12024702176 021752 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # Authors: Greg Caporaso (gregcaporaso@gmail.com), Brett Easton, Gavin Huttley # coevolution.py """ Description File created on 03 May 2007. Functions to perform coevolutionary analyses on pre-aligned biological sequences. Coevolutionary analyses detect correlated substitutions between alignment positions. Analyses can be performed to look for covariation between a pair of alignment positions, in which case a single 'coevolve score' is returned. (The nature of this coevolve score is determined by the method used to detect coevolution.) Alternatively, coevolution can be calculated between one position and all other positions in an alignment, in which case a vector of coevolve scores is returned. Finally, coevolution can be calculated over all pairs of positions in an alignment, in which case a matrix (usually, but not necessarily, symmetric) is returned. The functions providing the core functionality here are: coevolve_pair: coevolution between a pair of positions (float returned) coevolve_position: coevolution between a position and all other positions in the alignment (vector returned) coevolve_alignment: coevolution between all pairs of positions in an alignment (matrix returned) Each of these functions takes a coevolution calculator, an alignment, and any additional keyword arguments that should be passed to the coevolution calculator. More information on these functions and how they should be used is available as executable documentation in coevolution.rst. The methods provided for calculating coevolution are: Mutual Information (Shannon 19xx) Normalized Mutual Information (Martin 2005) Statistical Coupling Analysis (Suel 2003) *Ancestral states (Tuffery 2000 -- might not be the best ref, a better might be Shindyalov, Kolchannow, and Sander 1994, but so far I haven't been able to get my hands on that one). *Gctmpca (Yeang 2007) (Yeang CH, Haussler D. 
Detecting the coevolution in and among protein domains. PLoS Computational Biology 2007.) * These methods require a phylogenetic tree, in addition to an alignment. Trees are calculated on-the-fly, by neighbor-joining, if not provided. This file can be applied as a script to calculate a coevolution matrix given an alignment. For information, run python coevolution.py -h from the command line. """ from __future__ import division from optparse import make_option from cPickle import Pickler, Unpickler from os.path import splitext, basename, exists from sys import exit from numpy import zeros, ones, float, put, transpose, array, float64, nonzero,\ abs, sqrt, exp, ravel, take, reshape, mean, tril, nan, isnan, log, e,\ greater_equal, less_equal from random import shuffle from cogent.util.misc import parse_command_line_parameters from cogent.maths.stats.util import Freqs from cogent.util.array import norm from cogent.core.sequence import Sequence from cogent.core.moltype import IUPAC_gap, IUPAC_missing from cogent.core.profile import Profile from cogent.core.alphabet import CharAlphabet, Alphabet from cogent.maths.stats.distribution import binomial_exact from cogent.maths.stats.special import ROUND_ERROR from cogent.parse.record import FileFormatError from cogent.evolve.substitution_model import SubstitutionModel from cogent import LoadSeqs, LoadTree, PROTEIN, RNA from cogent.core.tree import TreeError from cogent.core.alignment import seqs_from_fasta, DenseAlignment from cogent.parse.newick import TreeParseError from cogent.parse.record import RecordError from cogent.app.gctmpca import Gctmpca from cogent.util.recode_alignment import recode_dense_alignment, \ alphabets, recode_freq_vector, recode_counts_and_freqs, \ square_matrix_to_dict from cogent.evolve.substitution_model import EmpiricalProteinMatrix __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso", "Gavin Huttley", "Brett Easton",\ "Sandra Smit", 
"Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Beta" gDefaultExcludes = ''.join([IUPAC_gap,IUPAC_missing]) gDefaultNullValue = nan ## Mutual Information Analysis # Mutual Information Calculators def mi(h1,h2,joint_h): """ Calc Mutual Information given two entropies and their joint entropy """ return h1 + h2 - joint_h def normalized_mi(h1,h2,joint_h): """ MI normalized by joint entropy, as described in Martin 2005 """ return mi(h1,h2,joint_h) / joint_h nmi = normalized_mi # Other functions used in MI calculations def join_positions(pos1,pos2): """ Merge two positions and return as a list of strings pos1: iterable object containing the first positions data pos2: iterable object containing the second positions data Example: >>> join_positions('ABCD','1234') ['A1', 'B2', 'C3', 'D4'] """ return [''.join([r1,r2]) for r1,r2 in zip(pos1,pos2)] def joint_entropy(pos1,pos2): """ Calculate the joint entroy of a pair of positions """ return Freqs(join_positions(pos1,pos2)).Uncertainty # Exclude handlers (functions for processing position strings with exclude # characters) def ignore_excludes(pos,excludes=gDefaultExcludes): """ Return position data as-is (results in excludes treated as other chars) """ return pos # Functions for scoring coevolution on the basis of Mutual Information def mi_pair(alignment,pos1,pos2,h1=None,h2=None,mi_calculator=mi,\ null_value=gDefaultNullValue,excludes=gDefaultExcludes,exclude_handler=None): """ Calculate mutual information of a pair of alignment positions alignment: the full alignment object pos1: index of 1st position in alignment to be compared (zero-based, not one-based) pos2: index of 2nd position in alignment to be compared (zero-based, not one-based) h1: entropy of pos1, if already calculated (to avoid time to recalc) h2: entropy of pos2, if already calculated (to avoid time to recalc) mi_calculator: a function which calculated MI from two 
entropies and their joint entropy -- see mi and normalized_mi for examples null_value: the value to be returned if mi cannot be calculated (e.g., if mi_calculator == normalized_mi and joint_h = 0.0) excludes: iterable objects containing characters that require special handling -- by default, if a position contains an exclude, null_value will be returned. For non-default handling, pass an exclude_handler exclude_handler: a function which takes position data and returns it with exclude characters processed in someway. Position data should be an iterable object containing the characters present at each position. f(position_data,excludes=gDefaultExcludes) -> position_data """ positions = alignment.Positions col1 = positions[pos1] col2 = positions[pos2] # Detect and process exclude characters. # This bit of code is slow, and not necessary if # exclude_hanlder == ignore_excludes, so I explicitly # check, and bypass this block if possible. if exclude_handler != ignore_excludes: for col in (col1,col2): states = set(col) for exclude in excludes: if exclude in states: try: col = exclude_handler(col,excludes) break except TypeError: return null_value # Calculate entropy of pos1 & pos2, if they weren't passed in. 
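mi_pair combines two column entropies with their joint entropy via the default calculator, mi(h1, h2, joint_h) = h1 + h2 - joint_h. A minimal self-contained sketch of that identity, using collections.Counter in place of cogent's Freqs (names here are illustrative, not the PyCogent calls):

```python
from collections import Counter
from math import log

def shannon(column):
    """Shannon entropy in bits -- the role Freqs(col).Uncertainty plays
    in mi_pair (stdlib stand-in, not the PyCogent implementation)."""
    n = len(column)
    return -sum((c / n) * log(c / n, 2) for c in Counter(column).values())

# Two perfectly covarying columns: H1 = H2 = Hjoint = 1 bit, so
# MI = h1 + h2 - joint_h = 1 bit, the maximum for two-state columns.
col1, col2 = "AACC", "GGUU"
joint = [a + b for a, b in zip(col1, col2)]  # same pairing as join_positions
mi_bits = shannon(col1) + shannon(col2) - shannon(joint)
```

If the columns varied independently, the joint entropy would rise toward h1 + h2 and the MI would fall toward zero.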
if not h1: h1 = Freqs(col1).Uncertainty if not h2: h2 = Freqs(col2).Uncertainty # Calculate the joint entropy of pos1 & pos2 joint_h = joint_entropy(col1,col2) # Calculate MI using the specified method -- return null_value when # the specified MI cannot be calculated # (e.g., mi_calculator=nmi and joint_h=0.0) try: result = mi_calculator(h1,h2,joint_h) if result <= ROUND_ERROR: result = 0.0 except ZeroDivisionError: result = null_value return result def mi_position(alignment,position,\ positional_entropies=None,mi_calculator=mi,null_value=gDefaultNullValue,\ excludes=gDefaultExcludes,exclude_handler=None): """ Calc mi b/w position and all other positions in an alignment alignment: the full alignment object position: the position number of interest -- NOTE: this is the position index, not the sequenece position (so zero-indexed, not one-indexed) positional_entropies: a list containing the entropy of each position in the alignment -- these can be passed in to avoid recalculating if calling this function over more than one position (e.g., in mi_alignment) mi_calculator: a function which calculated MI from two entropies and their joint entropy -- see mi and normalized_mi for examples null_value: the value to be returned if mi cannot be calculated (e.g., if mi_calculator == normalized_mi and joint_h = 0.0) excludes: iterable objects containing characters that require special handling -- by default, if a position contains an exclude, null_value will be returned. For non-default handling, pass an exclude_handler exclude_handler: a function which takes a position and returns it with exclude characters processed in someway. 
""" aln_length = len(alignment) # Create result vector result = zeros(aln_length,float) # compile positional entropies if not passed in if positional_entropies == None: positional_entropies = \ [Freqs(p).Uncertainty for p in alignment.Positions] # Will want to make a change here so that we don't need to recalculate # all values when calling from mi_alignment for i in range(aln_length): result[i] = mi_pair(alignment,pos1=position,pos2=i,\ h1=positional_entropies[position],h2=positional_entropies[i],\ mi_calculator=mi_calculator,null_value=null_value,excludes=excludes,\ exclude_handler=exclude_handler) return result def mi_alignment(alignment,mi_calculator=mi,null_value=gDefaultNullValue,\ excludes=gDefaultExcludes,exclude_handler=None): """ Calc mi over all position pairs in an alignment alignment: the full alignment object mi_calculator: a function which calculated MI from two entropies and their joint entropy -- see mi and normalized_mi for examples null_value: the value to be returned if mi cannot be calculated (e.g., if mi_calculator == normalized_mi and joint_h = 0.0) excludes: iterable objects containing characters that require special handling -- by default, if a position contains an exclude, null_value will be returned. For non-default handling, pass an exclude_handler exclude_handler: a function which takes a position and returns it with exclude characters processed in someway. """ aln_length = len(alignment) # Create result matrix result = zeros((aln_length,aln_length),float) # Compile postional entropies for each position in the alignment # I believe I started using this rather than alignment.uncertainties # b/c the latter relies on converting a DenseAlignment to an Alignment -- # need to check into this. positional_entropies = [Freqs(p).Uncertainty for p in alignment.Positions] # Calculate pairwise MI between position_number and all alignment # positions, and return the results in a vector. 
for i in range(aln_length): for j in range(i+1): result[i,j] = mi_pair(alignment,pos1=i,pos2=j,\ h1=positional_entropies[i],h2=positional_entropies[j],\ mi_calculator=mi_calculator,null_value=null_value,\ excludes=excludes,exclude_handler=exclude_handler) # copy the lower triangle to the upper triangle to make # the matrix symmetric ltm_to_symmetric(result) return result ## End Mutual Information Analysis ## Start Normalized Mutual Information Analysis (Martin 2005) def normalized_mi_pair(alignment,pos1,pos2,h1=None,h2=None,\ null_value=gDefaultNullValue,excludes=gDefaultExcludes,\ exclude_handler=None): """Calc normalized mutual information of a pair of alignment positions alignment: the full alignment object pos1: index of 1st position in alignment to be compared (zero-based, not one-based) pos2: index of 2nd position in alignment to be compared (zero-based, not one-based) h1: entropy of pos1, if already calculated (to avoid time to recalc) h2: entropy of pos2, if already calculated (to avoid time to recalc) null_value: the value to be returned if mi cannot be calculated (e.g., if mi_calculator == normalized_mi and joint_h = 0.0) excludes: iterable objects containing characters that require special handling -- by default, if a position contains an exclude, null_value will be returned. For non-default handling, pass an exclude_handler exclude_handler: a function which takes a position and returns it with exclude characters processed in someway. 
""" return mi_pair(alignment,pos1,pos2,h1=h1,h2=h2,mi_calculator=nmi,\ null_value=null_value,excludes=excludes,\ exclude_handler=exclude_handler) nmi_pair = normalized_mi_pair def normalized_mi_position(alignment,position,positional_entropies=None,\ null_value=gDefaultNullValue,excludes=gDefaultExcludes,\ exclude_handler=None): """ Calc normalized mi b/w position and all other positions in an alignment alignment: the full alignment object position: the position number of interest -- NOTE: this is the position index, not the sequenece position (so zero-indexed, not one-indexed) positional_entropies: a list containing the entropy of each position in the alignment -- these can be passed in to avoid recalculating if calling this function over more than one position (e.g., in mi_alignment) null_value: the value to be returned if mi cannot be calculated (e.g., if mi_calculator == normalized_mi and joint_h = 0.0) excludes: iterable objects containing characters that require special handling -- by default, if a position contains an exclude, null_value will be returned. For non-default handling, pass an exclude_handler exclude_handler: a function which takes a position and returns it with exclude characters processed in someway. """ return mi_position(alignment,position,\ positional_entropies=positional_entropies,\ mi_calculator=nmi,null_value=null_value,excludes=excludes,\ exclude_handler=exclude_handler) nmi_position = normalized_mi_position def normalized_mi_alignment(alignment,null_value=gDefaultNullValue,\ excludes=gDefaultExcludes,exclude_handler=None): """ Calc normalized mi over all position pairs in an alignment alignment: the full alignment object null_value: the value to be returned if mi cannot be calculated (e.g., if mi_calculator == normalized_mi and joint_h = 0.0) excludes: iterable objects containing characters that require special handling -- by default, if a position contains an exclude, null_value will be returned. 
For non-default handling, pass an exclude_handler exclude_handler: a function which takes a position and returns it with exclude characters processed in someway. """ return mi_alignment(alignment=alignment,mi_calculator=normalized_mi,\ null_value=null_value,excludes=excludes,\ exclude_handler=exclude_handler) nmi_alignment = normalized_mi_alignment ## End Normalized Mutual Information Analysis ## Start Statistical coupling analysis (SCA) (Suel 2003) class SCAError(Exception): pass # PROTEIN's alphabet contains U, so redefining the alphabet for now # rather than use PROTEIN.Alphabet. May want to revist this decision... AAGapless = CharAlphabet('ACDEFGHIKLMNPQRSTVWY') default_sca_alphabet = AAGapless #AAGapless = PROTEIN.Alphabet #Dictionary of mean AA-frequencies in all natural proteins #Compiled by Rama Ranganathan from 36,498 unique eukaryotic proteins #from the Swiss-Prot database protein_dict = { 'A': 0.072658, 'C': 0.024692, 'D': 0.050007, 'E': 0.061087, 'F': 0.041774, 'G': 0.071589, 'H': 0.023392, 'I': 0.052691, 'K': 0.063923, 'L': 0.089093, 'M': 0.02315, 'N': 0.042931, 'P': 0.052228, 'Q': 0.039871, 'R': 0.052012, 'S': 0.073087, 'T': 0.055606, 'V': 0.063321, 'W': 0.01272, 'Y': 0.032955, } default_sca_freqs = protein_dict def freqs_to_array(f,alphabet): """Takes data in freqs object and turns it into array. f = dict or Freqs object alphabet = Alphabet object or just a list that specifies the order of things to appear in the resulting array """ return array([f.get(i,0) for i in alphabet]) def get_allowed_perturbations(counts, cutoff, alphabet, num_seqs=100): """Returns list of allowed perturbations as characters count: Profile object of raw character counts at each position num_seqs: number of sequences in the alignment cutoff: minimum number of sequences in the subalignment (as fraction of the total number of seqs in the alignment. 
        A perturbation is allowed if the subalignment of sequences that
        contain the specified char at the specified position is larger than
        the cutoff value * the total number of sequences in the alignment.
    """
    result = []
    abs_cutoff = cutoff * num_seqs

    for char,count in zip(alphabet,counts):
        if count >= abs_cutoff:
            result.append(char)
    return result

def probs_from_dict(d,alphabet):
    """ Convert dict of alphabet char probabilities to list in alphabet's order

        d: probabilities of observing each character in alphabet (dict
         indexed by char)
        alphabet: the characters in the alphabet -- provided for list order.
         Must iterate over the ordered characters in the alphabet (e.g., a
         list of characters or an Alphabet object)
    """
    return array([d[c] for c in alphabet])

def freqs_from_aln(aln,alphabet,scaled_aln_size=100):
    """Return the frequencies in aln of chars in alphabet's order

        aln: the alignment object
        alphabet: the characters in the alphabet -- provided for list order.
         Must iterate over the ordered characters in the alphabet (e.g., a
         list of characters or an Alphabet object)
        scaled_aln_size: the scaled number of sequences in the alignment.
         The original SCA implementation treats all alignments as if they
         contained 100 sequences when calculating frequencies and
         probabilities. 100 is therefore the default value.

        *Warning: characters in aln that are not in alphabet are silently
         ignored. Is this the desired behavior?

        Need to combine this function with get_positional_frequencies
        (and rename that one to be more generic) since they're doing the
        same thing now.
    """
    alphabet_as_indices = array([aln.Alphabet.toIndices(alphabet)]).transpose()
    aln_data = ravel(aln.ArrayPositions)
    return (alphabet_as_indices == aln_data).sum(1) * \
        (scaled_aln_size/len(aln_data))

def get_positional_frequencies(aln,position_number,alphabet,\
    scaled_aln_size=100):
    """Return the freqs in aln[position_number] of chars in alphabet's order

        aln: the alignment object
        position_number: the index of the position of interest in aln
         (note: zero-based alignment indexing)
        alphabet: the characters in the alphabet -- provided for list order.
         Must iterate over the ordered characters in the alphabet (e.g., a
         list of characters or an Alphabet object)
        scaled_aln_size: the scaled number of sequences in the alignment.
         The original SCA implementation treats all alignments as if they
         contained 100 sequences when calculating frequencies and
         probabilities. 100 is therefore the default value.

        *Warning: characters in aln that are not in alphabet are silently
         ignored. Is this the desired behavior?
    """
    alphabet_as_indices = array([aln.Alphabet.toIndices(alphabet)]).transpose()
    position_data = aln.ArrayPositions[position_number]
    return (alphabet_as_indices == position_data).sum(1) * \
        (scaled_aln_size/len(position_data))

def get_positional_probabilities(pos_freqs,natural_probs,scaled_aln_size=100):
    """Get probs of observing the freq of each char given its natural freq

        In Suel 2003 supplementary material, this step is defined as:
         "... each element is the binomial probability of observing each
         amino acid residue at position j given its mean frequency in
         all natural proteins."
        This function performs the calculation for a single position.
        pos_freqs: the frequencies of each char in the alphabet at a
         position-of-interest in the alignment (list of floats, typically
         output of get_positional_frequencies)
        natural_probs: the natural probabilities of observing each char
         in the alphabet (list of floats: typically output of
         probs_from_dict)
        scaled_aln_size: the scaled number of sequences in the alignment.
         The original SCA implementation treats all alignments as if they
         contained 100 sequences when calculating frequencies and
         probabilities. 100 is therefore the default value.

        Note: It is critical that the values in pos_freqs and natural_probs
         are in the same order, which should be the order of chars in the
         alphabet.
    """
    results = []
    for pos_freq,natural_prob in zip(pos_freqs,natural_probs):
        try:
            results.append(\
                binomial_exact(pos_freq,scaled_aln_size,natural_prob))
        # Because of the scaling of alignments to scaled_aln_size, pos_freq is
        # a float rather than an int. So, if a position is perfectly conserved,
        # pos_freq as a float could be greater than scaled_aln_size.
        # In this case I cast it to an int. I don't like this alignment
        # scaling stuff though.
        except ValueError, e:
            results.append(binomial_exact(int(pos_freq),\
                scaled_aln_size,natural_prob))
    return array(results)

def get_subalignments(aln,position,selections):
    """ returns subalns w/ seq[pos] == selection for each in selections

        aln: an alignment object
        position: int in alignment to be checked for each perturbation
        selections: characters which must be present at seq[pos] for
         seq to be in subalignment

        Note: This method returns a list of subalignments corresponding
         to the list of selections. So, if you specify selections as
         ['A','G'], you would get two subalignments back -- the first
         containing sequences with 'A' at position, and the second
         containing sequences with 'G' at position. If you want all
         sequences containing either 'A' or 'G', merge the resulting
         subalignments.
""" result = [] for s in aln.Alphabet.toIndices(selections): seqs_to_keep = nonzero(aln.ArraySeqs[:,position] == s)[0] result.append(aln.getSubAlignment(seqs=seqs_to_keep)) return result def get_dg(position_probs,aln_probs): """ Return delta_g vector position_probs: the prob of observing each alphabet chars frequency in the alignment position-of-interest, given it's background frequency in all proteins (list of floats, typically the output of get_positional_probabilities) aln_probs: the prob of observing each alphabet chars frequency in the full alignment, given it's background frequency (list of floats) """ results = [] for position_prob,aln_prob in zip(position_probs,aln_probs): results.append(log(position_prob/aln_prob)) return array(results) def get_dgg(all_dgs,subaln_dgs,scaled_aln_size=100): """Return delta_delta_g value all_dgs: the dg vector for a position-of-interest in the alignment (list of floats, typically the output of get_dg) subaln_dgs: the dg vector for a sub-alignment of the position-of- interest in the alignment (list of floats, typically the output of get_dg applied to a sub-alignment) scaled_aln_size: the scaled number of sequences in the alignment. The original SCA implementation treats all alignments as if they contained 100 sequences when calculating frequencies and probabilities. 100 is therefore the default value. * There are two weird issues in this function with respect to the desciption of the algorithm in the Suel 2003 supplementary material. In order to get the values presented in their GPCR paper, we need to (1) divide the euclidian norm by the scaled_aln_size, and then (2) multiply the result by e. ** IT IS CRITICAL TO UNDERSTAND WHY WE NEED TO APPLY THESE STEPS BEFORE PUBLISHING ANYTHING THAT USES THIS CODE.** * A possible reason for the mysterious e scaling is that we are misinterpreting what they mean when they say ddg is 'the magnitude of this difference vector.' 
We are assuming they are referring to the Euclidian norm, but until I see their code, I can't be sure about this. """ return norm(all_dgs - subaln_dgs)/scaled_aln_size * e def sca_pair(alignment,pos1,pos2,cutoff,\ position_freqs=None,position_probs=None,dgs=None,perturbations=None,\ scaled_aln_size=100,null_value=gDefaultNullValue,return_all=False,\ alphabet=default_sca_alphabet,background_freqs=default_sca_freqs): """ Calculate statistical coupling b/w a pair of alignment columns alignment: full alignment object pos1: the first position used to probe for statistical coupling (subalignments will be generated based on allowed perturbations at this position) -- int, zero-based indexing into alignment pos2: the second position used to probe for statistical coupling -- int, zero-based indexing into alignment cutoff: the percentage of sequences that must contain a specific char at a specific pos1 to result in an allowed sub-alignment. (According to the Ranganathan papers, this should be the value determined by their 3rd criteria.) position_freqs: if precalculated, a matrix containing the output of get_positional_frequencies for each position in the alignment. This will typically be used only when sca_pair is called from sca_position, and these values are therefore pre-calculated. position_probs: if precalculated, a matrix containing the output of get_positional_probabilities for each position in the alignment. This will typically be used only when sca_pair is called from sca_position, and these values are therefore pre-calculated. dgs: if precalculated, a matrix containing the output of get_dg for each position in the alignment. This will typically be used only when sca_pair is called from sca_position, and these values are therefore pre-calculated. perturbations: if precalculated, a matrix containing the output of get_allowed_perturbations for each position in the alignment. 
         This will typically be used only when sca_pair is called from
         sca_position, and these values are therefore pre-calculated.
        scaled_aln_size: the scaled number of sequences in the alignment.
         The original SCA implementation treats all alignments as if they
         contained 100 sequences when calculating frequencies and
         probabilities. 100 is therefore the default value.
        null_value: the value which should be returned if SCA cannot or
         should not be calculated (e.g., no allowed perturbations or
         pos1==pos2, respectively).
        return_all: if cutoff <= 0.50, it is possible that there will be
         more than one allowed_perturbation per position. In these cases,
         either all of the values could be returned (return_all=True) or
         the max of the values can be returned (return_all=False, default).
         If you'd like one value, but not the max, wrap this function with
         return_all=True, and handle the return value as desired.
        alphabet: an ordered iterable object containing the characters in
         the alphabet. For example, this can be a CharAlphabet object, a
         list, or a string.

        **IMPORTANT NOTE: SCA, unlike (all?) other methods implemented
        here, requires the full alignment, even to calculate coupling
        between just a pair of positions. Because frequencies of characters
        in the full alignment are compared with frequencies at each
        position, you cannot simply pull out two columns of the alignment,
        and pass them to this function as a subalignment. Your results
        would differ from calculating coupling of the same positions with
        the full alignment. For example:
            sca_pair(aln,10,20,0.85) != \
            sca_pair(aln.takePositions([10,20]),0,1,0.85)
    """
    num_positions = len(alignment)
    num_seqs = alignment.getNumSeqs()

    # Calculate frequency distributions
    natural_probs = probs_from_dict(background_freqs,alphabet)
    aln_freqs = freqs_from_aln(alignment,alphabet,scaled_aln_size)
    aln_probs = get_positional_probabilities(\
        aln_freqs,natural_probs,scaled_aln_size)

    # get positional frequencies
    if position_freqs:
        pos1_freqs = position_freqs[pos1]
        pos2_freqs = position_freqs[pos2]
    else:
        pos1_freqs = get_positional_frequencies(alignment,pos1,\
            alphabet,scaled_aln_size)
        pos2_freqs = get_positional_frequencies(alignment,pos2,\
            alphabet,scaled_aln_size)

    # get positional probability vectors ("... each element is the binomial
    # probability of observing each amino acid residue at position j given its
    # mean frequency in all natural proteins." Suel 2003 supplementary
    # material)
    if position_probs:
        pos2_probs = position_probs[pos2]
    else:
        pos2_probs = get_positional_probabilities(pos2_freqs,\
            natural_probs,scaled_aln_size)

    # get statistical energies for pos2 in full alignment
    if dgs:
        pos2_dg = dgs[pos2]
    else:
        pos2_dg = get_dg(pos2_probs,aln_probs)

    # determine allowed perturbations
    if perturbations:
        allowed_perturbations = perturbations[pos1]
    else:
        allowed_perturbations = \
         get_allowed_perturbations(pos1_freqs,cutoff,alphabet,scaled_aln_size)
    # should we do something different here on return_all == True?
    if not allowed_perturbations:
        return null_value

    # generate the subalignments which contain each allowed
    # perturbation residue at pos1
    subalignments = get_subalignments(alignment,pos1,allowed_perturbations)

    # calculate ddg for each allowed perturbation
    ddg_values = []
    for subalignment in subalignments:
        # Calculate dg for the subalignment
        subaln_freqs = freqs_from_aln(subalignment,alphabet,scaled_aln_size)
        subaln_probs = get_positional_probabilities(\
            subaln_freqs,natural_probs,scaled_aln_size)
        subaln_pos2_freqs = get_positional_frequencies(\
            subalignment,pos2,alphabet,scaled_aln_size)
        subaln_pos2_probs = get_positional_probabilities(\
            subaln_pos2_freqs,natural_probs,scaled_aln_size)
        subaln_dg = get_dg(subaln_pos2_probs,subaln_probs)
        ddg_values.append(get_dgg(pos2_dg,subaln_dg,scaled_aln_size))

    if return_all:
        return zip(allowed_perturbations,ddg_values)
    else:
        return max(ddg_values)

def sca_position(alignment,position,cutoff,\
    position_freqs=None,position_probs=None,dgs=None,\
    perturbations=None,scaled_aln_size=100,\
    null_value=gDefaultNullValue,return_all=False,\
    alphabet=default_sca_alphabet,background_freqs=default_sca_freqs):
    """ Calculate statistical coupling b/w a column and all other columns

        alignment: full alignment object
        position: the position of interest to probe for statistical
         coupling (subalignments will be generated based on allowed
         perturbations at this position) -- int, zero-based indexing into
         alignment
        cutoff: the percentage of sequences that must contain a specific
         char at a specific pos1 to result in an allowed sub-alignment.
         (According to the Ranganathan papers, this should be the value
         determined by their 3rd criteria.)
        position_freqs: if precalculated, a matrix containing the output
         of get_positional_frequencies for each position in the alignment.
         This will typically be used only when sca_position is called from
         sca_alignment, and these values are therefore pre-calculated.
        position_probs: if precalculated, a matrix containing the output
         of get_positional_probabilities for each position in the
         alignment. This will typically be used only when sca_position is
         called from sca_alignment, and these values are therefore
         pre-calculated.
        dgs: if precalculated, a matrix containing the output of get_dg
         for each position in the alignment. This will typically be used
         only when sca_position is called from sca_alignment, and these
         values are therefore pre-calculated.
        perturbations: if precalculated, a matrix containing the output
         of get_allowed_perturbations for each position in the alignment.
         This will typically be used only when sca_position is called from
         sca_alignment, and these values are therefore pre-calculated.
        scaled_aln_size: the scaled number of sequences in the alignment.
         The original SCA implementation treats all alignments as if they
         contained 100 sequences when calculating frequencies and
         probabilities. 100 is therefore the default value.
        null_value: the value which should be returned if SCA cannot or
         should not be calculated (e.g., no allowed perturbations or
         pos1==pos2, respectively).
        return_all: if cutoff <= 0.50, it is possible that there will be
         more than one allowed_perturbation per position. In these cases,
         either all of the values could be returned (return_all=True) or
         the max of the values can be returned (return_all=False, default).
         If you'd like one value, but not the max, wrap this function with
         return_all=True, and handle the return value as desired.
        alphabet: an ordered iterable object containing the characters in
         the alphabet. For example, this can be a CharAlphabet object, a
         list, or a string.
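
        The allowed-perturbation filter applied at pos1 (see
        get_allowed_perturbations above) can be sketched without any
        cogent objects. This is a minimal sketch: the function name
        allowed_perturbations_sketch is hypothetical, and the counts,
        cutoff, and alphabet below are made up.

        ```python
        def allowed_perturbations_sketch(counts, cutoff, alphabet,
                                         num_seqs=100):
            # A character is an allowed perturbation when its (scaled)
            # count meets or exceeds cutoff * num_seqs.
            abs_cutoff = cutoff * num_seqs
            return [c for c, n in zip(alphabet, counts) if n >= abs_cutoff]

        # With made-up counts for a four-character alphabet and a 0.25
        # cutoff, only 'A' and 'C' pass the 25-sequence threshold.
        allowed = allowed_perturbations_sketch([50, 30, 15, 5], 0.25, 'ACDE')
        # -> ['A', 'C']
        ```

        Raising the cutoff shrinks (and can empty) the allowed set, which
        is why sca_pair returns null_value when no perturbation survives.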
""" num_seqs = alignment.getNumSeqs() natural_probs = probs_from_dict(background_freqs,alphabet) aln_freqs = freqs_from_aln(alignment,alphabet,scaled_aln_size) aln_probs = get_positional_probabilities(\ aln_freqs,natural_probs,scaled_aln_size) if not position_freqs: position_freqs = [] for i in range(len(alignment)): position_freqs.append(\ get_positional_frequencies(\ alignment,i,alphabet,scaled_aln_size)) if not position_probs: position_probs = [] for i in range(len(alignment)): position_probs.append(get_positional_probabilities(\ position_freqs[i],natural_probs,scaled_aln_size)) if not dgs: dgs = [] for i in range(len(alignment)): dgs.append(get_dg(position_probs[i],aln_probs)) if not perturbations: perturbations = [] for i in range(len(alignment)): perturbations.append(get_allowed_perturbations(\ position_freqs[i],cutoff,alphabet,scaled_aln_size)) result = [] for i in range(len(alignment)): result.append(sca_pair(alignment,position,i,cutoff,\ position_freqs=position_freqs,position_probs=position_probs,\ dgs=dgs,perturbations=perturbations,\ scaled_aln_size=scaled_aln_size,null_value=null_value,\ return_all=return_all,alphabet=alphabet,\ background_freqs=background_freqs)) return array(result) def sca_alignment(alignment,cutoff,null_value=gDefaultNullValue,\ scaled_aln_size=100,return_all=False,alphabet=default_sca_alphabet,\ background_freqs=default_sca_freqs): """ Calculate statistical coupling b/w all columns in alignment alignment: full alignment object cutoff: the percentage of sequences that must contain a specific char at a specific pos1 to result in an allowed sub-alignment. (According to the Ranganathan papers, this should be the value determined by their 3rd criteria.) scaled_aln_size: the scaled number of sequences in the alignment. The original SCA implementation treats all alignments as if they contained 100 sequences when calculating frequencies and probabilities. 100 is therefore the default value. 
        null_value: the value which should be returned if SCA cannot or
         should not be calculated (e.g., no allowed perturbations or
         pos1==pos2, respectively).
        return_all: if cutoff <= 0.50, it is possible that there will be
         more than one allowed_perturbation per position. In these cases,
         either all of the values could be returned (return_all=True) or
         the max of the values can be returned (return_all=False, default).
         If you'd like one value, but not the max, wrap this function with
         return_all=True, and handle the return value as desired.
        alphabet: an ordered iterable object containing the characters in
         the alphabet. For example, this can be a CharAlphabet object, a
         list, or a string.
    """
    num_seqs = alignment.getNumSeqs()
    natural_probs = probs_from_dict(background_freqs,alphabet)
    aln_freqs = freqs_from_aln(alignment,alphabet,scaled_aln_size)
    aln_probs = get_positional_probabilities(\
        aln_freqs,natural_probs,scaled_aln_size)
    # get all positional frequencies
    position_freqs = []
    for i in range(len(alignment)):
        position_freqs.append(\
            get_positional_frequencies(alignment,i,alphabet,scaled_aln_size))
    # get all positional probabilities
    position_probs = []
    for i in range(len(alignment)):
        position_probs.append(get_positional_probabilities(\
            position_freqs[i],natural_probs,scaled_aln_size))
    # get all delta_g vectors
    dgs = []
    for i in range(len(alignment)):
        dgs.append(get_dg(position_probs[i],aln_probs))
    # get all allowed perturbations
    perturbations = []
    for i in range(len(alignment)):
        perturbations.append(get_allowed_perturbations(\
            position_freqs[i],cutoff,alphabet,scaled_aln_size))

    result = []
    for i in range(len(alignment)):
        result.append(sca_position(alignment,i,cutoff,\
            position_freqs=position_freqs,position_probs=position_probs,\
            dgs=dgs,perturbations=perturbations,\
            scaled_aln_size=scaled_aln_size,null_value=null_value,\
            return_all=return_all,alphabet=alphabet,\
            background_freqs=background_freqs))
    return array(result)
## End statistical coupling analysis

## Start Resampled Mutual Information Analysis
# (developed by Hutley and Easton, and first published in
# Caporaso et al., 2008)

def make_weights(freqs, n):
    """Return the weights for replacement states for each possible character.

    We compute the weight as the normalized frequency of the replacement
    state divided by 2*n.
    """
    freqs.normalize()
    char_prob = freqs.items()
    weights = []
    for C,P in char_prob:
        alts = Freqs([(c, p) for c, p in char_prob if c!=C])
        alts.normalize()
        alts = Freqs([(c,w/(2*n)) for c,w in alts.items()])
        weights += [(C, alts)]
    return weights

def calc_pair_scale(seqs, obs1, obs2, weights1, weights2):
    """Return entropies and weights for comparable alignment.

    A comparable alignment is one in which, for each paired state ij, all
    alternate observable paired symbols are created. For instance, let the
    symbols {A,C} be observed at position i and {A,C} at position j.
    Suppose we observe the paired types {AC, AA}: a comparable alignment
    would then involve replacing an AC pair with a CC pair.
    """
    # scale is calculated as the product of mi from col1 with alternate
    # characters. This means the number of states is changed by swapping
    # between the original and selected alternate, calculating the new mi
    pair_freqs = Freqs(seqs)
    weights1 = dict(weights1)
    weights2 = dict(weights2)
    scales = []
    for a, b in pair_freqs.keys():
        weights = weights1[a]
        pr = a+b
        pair_freqs -= [pr]
        obs1 -= a
        # make comparable alignments by mods to col 1
        for c, w in weights.items():
            new_pr = c+b
            pair_freqs += [new_pr]
            obs1 += c
            entropy = mi(obs1.Uncertainty, obs2.Uncertainty,\
                pair_freqs.Uncertainty)
            scales += [(pr, entropy, w)]
            pair_freqs -= [new_pr]
            obs1 -= c
        obs1 += a
        # make comparable alignments by mods to col 2
        weights = weights2[b]
        obs2 -= b
        for c, w in weights.items():
            new_pr = a+c
            pair_freqs += [new_pr]
            obs2 += c
            entropy = mi(obs1.Uncertainty, obs2.Uncertainty,\
                pair_freqs.Uncertainty)
            scales += [(pr, entropy, w)]
            obs2 -= c
            pair_freqs -= [new_pr]
        obs2 += b
        pair_freqs += [pr]
    return scales

def resampled_mi_pair(alignment, pos1, pos2, weights=None,
                      excludes=gDefaultExcludes, exclude_handler=None,
                      null_value=gDefaultNullValue):
    """returns scaled mutual information for a pair.

    Arguments:
        - alignment: Alignment instance
        - pos1, pos2: alignment positions to be assessed
        - weights: Freq objects of weights for pos1, pos2
        - excludes: states to be excluded.
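
    The replacement-weight scheme that make_weights implements with Freqs
    objects can be sketched with plain dicts. This is a minimal sketch:
    replacement_weights is a hypothetical stand-in for make_weights, and
    the column below is made up.

    ```python
    from collections import Counter

    def replacement_weights(column, n):
        # For each observed character C, the weight of each alternate
        # character c is c's frequency renormalized over the non-C
        # characters, divided by 2*n (mirroring make_weights above).
        counts = Counter(column)
        total = float(sum(counts.values()))
        probs = {c: v / total for c, v in counts.items()}
        weights = {}
        for C in probs:
            alt_total = sum(p for c, p in probs.items() if c != C)
            weights[C] = {c: (p / alt_total) / (2 * n)
                          for c, p in probs.items() if c != C}
        return weights

    # Column 'AACG' over four sequences: P(A)=0.5, P(C)=P(G)=0.25, so the
    # weight of replacing A with C is (0.25/0.5)/(2*4) = 0.0625.
    w = replacement_weights(list("AACG"), n=4)
    ```

    As in make_weights, the weights for the alternates of any one
    character sum to 1/(2*n).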
""" positions = list(alignment.Positions) col1 = positions[pos1] col2 = positions[pos2] seqs = [''.join(p) for p in zip(col1, col2)] for col in (col1,col2): states = {}.fromkeys(col) for exclude in excludes: if exclude in states: try: col = exclude_handler(col,excludes) break except TypeError: return null_value excludes = excludes or [] num = len(seqs) col1 = Freqs(col1) col2 = Freqs(col2) seq_freqs = Freqs(seqs) if weights: weights1, weights2 = weights else: weights1 = make_weights(col1.copy(), num) weights2 = make_weights(col2.copy(), num) entropy = mi(col1.Uncertainty, col2.Uncertainty, seq_freqs.Uncertainty) scales = calc_pair_scale(seqs, col1, col2, weights1, weights2) scaled_mi = 1-sum([w * seq_freqs[pr] for pr, e, w in scales \ if entropy <= e]) return scaled_mi def resampled_mi_position(alignment, position, positional_entropies=None, excludes=gDefaultExcludes, exclude_handler=None, null_value=gDefaultNullValue): aln_length = len(alignment) result = zeros(aln_length,float) positional_entropies = positional_entropies or alignment.uncertainties() for i in range(aln_length): result[i] = resampled_mi_pair(alignment, pos1=position, pos2=i, excludes=excludes, exclude_handler=exclude_handler, null_value=null_value) return result def resampled_mi_alignment(alignment, excludes=gDefaultExcludes, exclude_handler=None, null_value=gDefaultNullValue): """returns scaled mutual information for all possible pairs.""" aln_length = len(alignment) result = zeros((aln_length,aln_length),float) positional_entropies = alignment.uncertainties() for i in range(aln_length): result[i] = resampled_mi_position(alignment=alignment, position=i, positional_entropies=positional_entropies, excludes=excludes, exclude_handler=exclude_handler, null_value=null_value) return result ## End Resampled Mutual Information Analysis ## Begin ancestral_states analysis def get_ancestral_seqs(aln, tree, sm = None, pseudocount=1e-6, optimise=True): """ Calculates ancestral sequences by maximum likelihood 
    Arguments:
        - sm: a SubstitutionModel instance. If not provided, one is
          constructed from the alignment Alphabet
        - pseudocount: unobserved sequence states must not be zero, this
          value is assigned to sequence states not observed in the
          alignment.
        - optimise: whether to optimise the likelihood function.

    Note: for the sake of reduced alphabets, we calculate the substitution
        model from the alignment. This also appears to be what was
        described in Tuffery 2000, although they're not perfectly clear
        about it.
    """
    sm = sm or SubstitutionModel(aln.Alphabet, recode_gaps=True)
    lf = sm.makeLikelihoodFunction(tree,sm.motif_probs)
    lf.setAlignment(aln, motif_pseudocount=pseudocount)
    if optimise:
        lf.optimise(local=True)
    return DenseAlignment(lf.likelyAncestralSeqs(),MolType=aln.MolType)

def ancestral_state_alignment(aln,tree,ancestral_seqs=None,\
    null_value=gDefaultNullValue):
    ancestral_seqs = ancestral_seqs or get_ancestral_seqs(aln,tree)
    result = []
    for i in range(len(aln)):
        row = [null_value] * len(aln)
        for j in range(i+1):
            row[j] = ancestral_state_pair(\
                aln,tree,i,j,ancestral_seqs,null_value)
        result.append(row)
    return ltm_to_symmetric(array(result))

def ancestral_state_position(aln,tree,position,\
    ancestral_seqs=None,null_value=gDefaultNullValue):
    ancestral_seqs = ancestral_seqs or get_ancestral_seqs(aln,tree)
    result = []
    for i in range(len(aln)):
        result.append(ancestral_state_pair(\
            aln,tree,position,i,ancestral_seqs,null_value))
    return array(result)

def ancestral_state_pair(aln,tree,pos1,pos2,\
    ancestral_seqs=None,null_value=gDefaultNullValue):
    """
    """
    ancestral_seqs = ancestral_seqs or get_ancestral_seqs(aln,tree)
    ancestral_names_to_seqs = \
        dict(zip(ancestral_seqs.Names,ancestral_seqs.ArraySeqs))
    distances = tree.getDistances()
    tips = tree.getNodeNames(tipsonly=True)
    # map names to nodes (there has to be a built-in way to do this
    # -- what is it?)
    nodes = dict([(n,tree.getNodeMatchingName(n)) for n in tips])
    # add tip branch lengths as distance b/w identical tips -- this is
    # necessary for my weighting step, where we want correlated changes
    # occurring on a single branch to be given the most weight
    distances.update(dict([((n,n),nodes[n].Length) for n in nodes]))
    result = 0
    names_to_seqs = dict(zip(aln.Names,aln.ArraySeqs))
    for i in range(len(tips)):
        org1 = tips[i]
        seq1 = names_to_seqs[org1]
        for j in range(i,len(tips)):
            org2 = tips[j]
            seq2 = names_to_seqs[org2]
            ancestor = nodes[org1].lastCommonAncestor(nodes[org2]).Name
            if ancestor == org1 == org2:
                # we're looking for correlated change along a
                # single branch
                ancestral_seq = ancestral_names_to_seqs[\
                    nodes[org1].ancestors()[0].Name]
            else:
                # we're looking for correlated change along different
                # branches (most cases)
                ancestral_seq = ancestral_names_to_seqs[ancestor]
            # get state of pos1 in org1, org2, and ancestor
            org1_p1 = seq1[pos1]
            org2_p1 = seq2[pos1]
            ancestor_p1 = ancestral_seq[pos1]
            # if pos1 has changed in both organisms since their lca,
            # this is a position of interest
            if org1_p1 != ancestor_p1 and org2_p1 != ancestor_p1:
                # get state of pos2 in org1, org2, and ancestor
                org1_p2 = seq1[pos2]
                org2_p2 = seq2[pos2]
                ancestor_p2 = ancestral_seq[pos2]
                # if pos2 has also changed in both organisms since their
                # lca, then we add a count for a correlated change
                if org1_p2 != ancestor_p2 and org2_p2 != ancestor_p2:
                    # There are a variety of ways to score. The simplest is
                    # to increment by one, which seems to be what was done
                    # in other papers. This works well, but in a quick test
                    # (alpha helices/myoglobin with several generally
                    # high scoring alphabets) weighting works better. A more
                    # detailed analysis is in order.
                    #result += 1
                    # Now I weight based on distance so
                    # changes in shorter time are scored higher than
                    # in longer time. (More ancient changes
                    # are more likely to be random than more recent changes,
                    # b/c more time has passed for the changes to occur in.)
                    # This gives results
                    # that appear to be better under some circumstances,
                    # and at worst, about the same as simply incrementing
                    # by 1.
                    result += (1/distances[(org1,org2)])
                    # Another one to try might involve discounting the score
                    # for a pair when one changes and the other doesn't.
    return result
## End ancestral_states analysis

## Begin Gctmpca method (Yeang et al., 2007)

def build_rate_matrix(count_matrix,freqs,aa_order='ACDEFGHIKLMNPQRSTVWY'):
    epm = EmpiricalProteinMatrix(count_matrix,freqs)
    word_probs = array([freqs[aa] for aa in aa_order])
    num = word_probs.shape[0]
    mprobs_matrix = ones((num,num), float)*word_probs
    return epm.calcQ(word_probs, mprobs_matrix)

def create_gctmpca_input(aln,tree):
    """ Generate the four input files as lists of lines. """
    new_tree = tree.copy()
    seqs1 = []
    seq_names = []
    seq_to_species1 = []
    seqs1.append(' '.join(map(str,[aln.getNumSeqs(),len(aln)])))
    constant_name_length = max(map(len,aln.Names))
    for n in aln.Names:
        name = ''.join([n] + ['.']*(constant_name_length - len(n)))
        new_tree.getNodeMatchingName(n).Name = name
        seqs1.append(' '.join([name,str(aln.getGappedSeq(n))]))
        seq_names.append(name)
        seq_to_species1.append('\t'.join([name,name]))
    seqs1.append('\n')
    seq_names.append('\n')
    seq_to_species1.append('\n')
    return seqs1, [str(new_tree),'\n'], seq_names, seq_to_species1

def parse_gctmpca_result_line(line):
    fields = line.strip().split()
    return int(fields[0]) - 1, int(fields[1]) - 1, float(fields[2])

def parse_gctmpca_result(f,num_positions):
    m = array([[gDefaultNullValue]*num_positions]*num_positions)
    for line in list(f)[1:]:
        pos1, pos2, score = parse_gctmpca_result_line(line)
        try:
            m[pos1,pos2] = m[pos2,pos1] = score
        except IndexError:
            raise ValueError, \
                "%d, %d out of range -- invalid num_positions?" \
                % (pos1, pos2)
    return m

def gctmpca_pair(aln,tree,pos1,pos2,epsilon=None,priors=None,sub_matrix=None,\
    null_value=gDefaultNullValue,debug=False):
    seqs1, tree1, seq_names, seq_to_species1 = create_gctmpca_input(aln,tree)
    if aln.MolType == PROTEIN:
        mol_type = 'protein'
    elif aln.MolType == RNA:
        mol_type = 'rna'
    else:
        raise ValueError, 'Unsupported mol type, must be PROTEIN or RNA.'
    gctmpca = Gctmpca(HALT_EXEC=debug)
    data = {'mol_type':mol_type,'seqs1':seqs1,'tree1':tree1,\
        'seq_names':seq_names, 'seq_to_species1':seq_to_species1,\
        'species_tree':tree1, 'char_priors':priors, \
        'sub_matrix':sub_matrix,'single_pair_only':1,'epsilon':epsilon,\
        'pos1':str(pos1),'pos2':str(pos2)}
    r = gctmpca(data)
    try:
        # parse the first line and return the score as a float
        result = float(parse_gctmpca_result_line(list(r['output'])[1])[2])
    except IndexError:
        # There is no first line, so insignificant score
        result = null_value
    # clean up the temp files
    r.cleanUp()
    return result

def gctmpca_alignment(aln,tree,epsilon=None,priors=None,\
    sub_matrix=None,null_value=gDefaultNullValue,debug=False):
    seqs1, tree1, seq_names, seq_to_species1 = create_gctmpca_input(aln,tree)
    if aln.MolType == PROTEIN:
        mol_type = 'protein'
    elif aln.MolType == RNA:
        mol_type = 'rna'
    else:
        raise ValueError, 'Unsupported mol type, must be PROTEIN or RNA.'
    gctmpca = Gctmpca(HALT_EXEC=debug)
    data = {'mol_type':mol_type,'seqs1':seqs1,'tree1':tree1,\
        'seq_names':seq_names, 'seq_to_species1':seq_to_species1,\
        'species_tree':tree1, 'char_priors':priors, \
        'sub_matrix':sub_matrix,'single_pair_only':0,'epsilon':epsilon}
    r = gctmpca(data)
    result = parse_gctmpca_result(r['output'],len(aln))
    r.cleanUp()
    return result
## End Yeang method

### Methods for running coevolutionary analyses on sequence data.
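
# All of the MI-based methods above reduce to the identity
# MI(i,j) = H(i) + H(j) - H(i,j) over column frequencies. A minimal,
# cogent-free sketch of that identity (the two columns below are made up,
# and shannon_entropy is a hypothetical helper, not part of this module):
#
# ```python
# from collections import Counter
# from math import log
#
# def shannon_entropy(symbols):
#     # H = -sum(p * log2(p)) over observed symbol frequencies
#     n = float(len(symbols))
#     return -sum((c / n) * log(c / n, 2)
#                 for c in Counter(symbols).values())
#
# # Two hypothetical alignment columns (one character per sequence).
# col1 = list("AACCG")
# col2 = list("TTGGC")
# pairs = [a + b for a, b in zip(col1, col2)]
#
# # Mutual information: H(col1) + H(col2) - H(col1, col2)
# mi_value = (shannon_entropy(col1) + shannon_entropy(col2)
#             - shannon_entropy(pairs))
# ```
#
# Here the two columns covary perfectly (each pair state is determined by
# its first character), so MI equals the marginal entropy.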
method_abbrevs_to_names = {'mi':'Mutual Information',\
                           'nmi':'Normalized Mutual Information',\
                           'sca':'Statistical Coupling Analysis',\
                           'an':'Ancestral States',\
                           'rmi':'Resampled Mutual Information',
                           'gctmpca':'Haussler/Yeang Method'}

## Method-specific error checking functions
# Some of the coevolution algorithms require method-specific input validation,
# but that code isn't included in the algorithm-specific functions (e.g.,
# sca_alignment, sca_pair) because those are sometimes run many times. For
# example, sca_alignment makes many calls to sca_pair, so we can't have
# sca_pair perform validation every time it's called. My solution is to have
# the coevolve_* functions perform the input validation, and recommend that
# users always perform analyses via these functions. So, in the above example,
# the user would access sca_alignment via coevolve_alignment('sca', ...).
# Since sca_alignment makes calls to sca_pair, not coevolve_pair, the input
# validation is only performed once by coevolve_alignment.

def sca_input_validation(alignment,**kwargs):
    """SCA specific validation steps """

    # check that all required parameters are present in kwargs
    required_parameters = ['cutoff']
    # users must provide background frequencies for MolTypes other
    # than PROTEIN -- by default, protein background frequencies are used.
    if alignment.MolType != PROTEIN:
        required_parameters.append('background_freqs')
    for rp in required_parameters:
        if rp not in kwargs:
            raise ValueError, 'Required parameter was not provided: ' + rp

    # check that the value provided for cutoff is valid (i.e., between
    # 0 and 1)
    if not 0.0 <= kwargs['cutoff'] <= 1.0:
        raise ValueError, 'Cutoff must be between zero and one.'
# check that the set of chars in alphabet and background_freqs are # identical try: alphabet = kwargs['alphabet'] except KeyError: # We want to use the PROTEIN alphabet minus the U character for # proteins since we don't have a background frequency for U if alignment.MolType == PROTEIN: alphabet = AAGapless else: alphabet = alignment.MolType.Alphabet try: background_freqs = kwargs['background_freqs'] except KeyError: background_freqs = default_sca_freqs validate_alphabet(alphabet,background_freqs) def validate_alphabet(alphabet,freqs): """SCA validation: ValueError if set(alphabet) != set(freqs.keys()) """ alphabet_chars = set(alphabet) freq_chars = set(freqs.keys()) if alphabet_chars != freq_chars: raise ValueError, \ "Alphabet and background freqs must contain identical sets of chars." def ancestral_states_input_validation(alignment,**kwargs): """Ancestral States (AS) specific validations steps """ # check that all required parameters are present in kwargs required_parameters = ['tree'] for rp in required_parameters: if rp not in kwargs: raise ValueError, 'Required parameter was not provided: ' + rp # validate the tree validate_tree(alignment,kwargs['tree']) # if ancestral seqs are provided, validate them. (If calculated on the fly, # we trust them.) if 'ancestral_seqs' in kwargs: validate_ancestral_seqs(alignment,kwargs['tree'],\ kwargs['ancestral_seqs']) def validate_ancestral_seqs(alignment,tree,ancestral_seqs): """AS validation: ValueError if incompatible aln, tree, & ancestral seqs Incompatibility between the alignment and the ancestral_seqs is different sequence lengths. Incompatbility between the tree and the ancestral seqs is imperfect overlap between the names of the ancestors in the tree and the ancestral sequence names. """ if len(alignment) != len(ancestral_seqs): raise ValueError,\ "Alignment and ancestral seqs are different lengths." # is there a better way to get all the ancestor names? why doesn't # tree.ancestors() do this? 
    edges = set(tree.getNodeNames()) - set(tree.getTipNames())
    seqs = set(ancestral_seqs.getSeqNames())
    if edges != seqs:
        raise ValueError, \
         "Must be ancestral seqs for all edges and root in tree, and no more."

def validate_tree(alignment,tree):
    """AS validation: ValueError if tip and seq names aren't the same """
    if set(tree.getTipNames()) != set(alignment.getSeqNames()):
        raise ValueError, \
         "Tree tips and seqs must have perfectly overlapping names."

## End method-specific error checking functions

## General (as opposed to algorithm-specific) validation functions

def validate_position(alignment,position):
    """ValueError if position is outside the range of the alignment """
    if not 0 <= position < len(alignment):
        raise ValueError, \
         "Position is outside the range of the alignment: " + str(position)

def validate_alignment(alignment):
    """ValueError on ambiguous alignment characters"""
    bad_seqs = []
    for name, ambiguous_pos in \
        alignment.getPerSequenceAmbiguousPositions().items():
        if ambiguous_pos:
            bad_seqs.append(name)
    if bad_seqs:
        raise ValueError, 'Ambiguous characters in sequences: %s' \
            % '; '.join(map(str,bad_seqs))

def coevolve_alignments_validation(method,alignment1,alignment2,\
    min_num_seqs,max_num_seqs,**kwargs):
    """ Validation steps required for intermolecular coevolution analyses """
    valid_methods_for_different_moltypes = {}.fromkeys(\
     [mi_alignment,nmi_alignment,resampled_mi_alignment])
    if (alignment1.MolType != alignment2.MolType) and \
       method not in valid_methods_for_different_moltypes:
        raise AssertionError, "Different MolTypes only supported for %s" %\
         ' '.join(map(str,valid_methods_for_different_moltypes.keys()))

    alignment1_names = \
     set([n.split('+')[0].strip() for n in alignment1.Names])
    alignment2_names = \
     set([n.split('+')[0].strip() for n in alignment2.Names])

    if 'tree' in kwargs:
        tip_names = \
         set([n.split('+')[0].strip() \
          for n in kwargs['tree'].getTipNames()])
        assert alignment1_names == alignment2_names == tip_names,\
         "Alignment and tree sequence names must perfectly overlap"
    else:
        # no tree passed in
        assert alignment1_names == alignment2_names,\
         "Alignment sequence names must perfectly overlap"

    # Determine if the alignments have enough sequences to proceed.
    if alignment1.getNumSeqs() < min_num_seqs:
        raise ValueError, "Too few sequences in merged alignment: %d < %d" \
         % (alignment1.getNumSeqs(), min_num_seqs)

    # Confirm that min_num_seqs <= max_num_seqs
    if max_num_seqs and min_num_seqs > max_num_seqs:
        raise ValueError, \
         "min_num_seqs (%d) cannot be greater than max_num_seqs (%d)." \
         % (min_num_seqs, max_num_seqs)

## End general validation functions

## Start alignment-wide intramolecular coevolution analysis

# coevolve alignment functions: f(alignment,**kwargs) -> 2D array
coevolve_alignment_functions = \
    {'mi': mi_alignment,'nmi': normalized_mi_alignment,\
     'rmi': resampled_mi_alignment,'sca': sca_alignment,\
     'an':ancestral_state_alignment,'gctmpca':gctmpca_alignment}

def coevolve_alignment(method,alignment,**kwargs):
    """ Apply coevolution method to alignment (for intramolecular coevolution)

        method: f(alignment,**kwargs) -> 2D array of coevolution scores
        alignment: alignment object for which coevolve scores should be
            calculated
        **kwargs: parameters to be passed to method()
    """
    # Perform method-specific validation steps
    if method == sca_alignment:
        sca_input_validation(alignment,**kwargs)
    if method == ancestral_state_alignment:
        ancestral_states_input_validation(alignment,**kwargs)
    validate_alignment(alignment)
    return method(alignment,**kwargs)

## End alignment-wide intramolecular coevolution analysis

## Start intermolecular coevolution analysis

# Mapping between coevolve_alignment functions and coevolve_pair functions.
# These are used in coevolve_alignments, b/c under some circumstances the
# alignment function is used, and under other circumstances the pair
# function is used, but the user shouldn't have to know anything about that.
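The validate-then-dispatch pattern used by coevolve_alignment (look the method up in a name-keyed dict, run its validation once, then call it) can be sketched in miniature; every name below (the toy_* functions and their fake score strings) is hypothetical illustration, not cogent API:

```python
# Hypothetical miniature of the coevolve_alignment dispatch pattern.
def _toy_mi(aln, **kwargs):
    # stand-in "analysis": report the alignment length
    return 'mi:%d' % len(aln)

def _toy_sca(aln, cutoff=None, **kwargs):
    # method-specific validation happens once, up front
    if cutoff is None:
        raise ValueError('Required parameter was not provided: cutoff')
    return 'sca:%.2f' % cutoff

# method abbreviation -> function, mirroring coevolve_alignment_functions
toy_functions = {'mi': _toy_mi, 'sca': _toy_sca}

def toy_coevolve_alignment(method, aln, **kwargs):
    """Look up the method by name and apply it with its kwargs."""
    return toy_functions[method](aln, **kwargs)
```

Keeping validation in the wrapper (rather than in the per-pair workers) is what lets sca_alignment call sca_pair thousands of times without re-validating on every call.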
coevolve_alignment_to_coevolve_pair = \
    {mi_alignment: mi_pair,normalized_mi_alignment: normalized_mi_pair,\
     resampled_mi_alignment: resampled_mi_pair, sca_alignment: sca_pair,\
     ancestral_state_alignment:ancestral_state_pair}

def merge_alignments(alignment1,alignment2):
    """ Append alignment 2 to the end of alignment 1

        This function is used by coevolve_alignments to merge two
        alignments so they can be evaluated by coevolve_alignment.
    """
    result = {}
    # Create maps from the final seq ids (i.e., seq id before the plus) to
    # the seq ids in the original alignments
    aln1_name_map = \
        dict([(n.split('+')[0].strip(),n) for n in alignment1.Names])
    aln2_name_map = \
        dict([(n.split('+')[0].strip(),n) for n in alignment2.Names])

    try:
        for merged_name,orig_name in aln1_name_map.items():
            result[merged_name] = alignment1.getGappedSeq(orig_name) +\
                alignment2.getGappedSeq(aln2_name_map[merged_name])
    except ValueError: # Differing MolTypes
        for merged_name,orig_name in aln1_name_map.items():
            result[merged_name] =\
                Sequence(alignment1.getGappedSeq(orig_name)) +\
                Sequence(alignment2.getGappedSeq(aln2_name_map[merged_name]))
    except KeyError,e:
        raise KeyError, 'A sequence identifier is in alignment2 ' +\
            'but not alignment1 -- did you filter out sequence identifiers' +\
            ' not common to both alignments?'
    return LoadSeqs(data=result,aligned=DenseAlignment)

def n_random_seqs(alignment,n):
    """Given alignment, return n random seqs in a new alignment object.

        This function is used by coevolve_alignments.
    """
    # shuffle a copy so we don't reorder the alignment's own Names list
    seq_names = list(alignment.Names)
    shuffle(seq_names)
    return alignment.takeSeqs(seq_names[:n])

def coevolve_alignments(method,alignment1,alignment2,\
    return_full=False,merged_aln_filepath=None,min_num_seqs=2,\
    max_num_seqs=None,sequence_filter=n_random_seqs,**kwargs):
    """ Apply method to a pair of alignments (for intermolecular coevolution)

        method: the *_alignment function to be applied
        alignment1: alignment of first molecule (DenseAlignment)
        alignment2: alignment of second molecule (DenseAlignment)
        return_full: if True, returns intra- and inter-molecular
         coevolution data in a square matrix (default: False)
        merged_aln_filepath: if provided, will write the merged
         alignment to file (useful for running post-processing filters)
        min_num_seqs: the minimum number of sequences that should be
         present in the merged alignment to perform the analysis
         (default: 2)
        max_num_seqs: the maximum number of sequences to include in an
         analysis - if the number of sequences exceeds max_num_seqs,
         a random selection of max_num_seqs will be used. This is a
         time-saving step as too many sequences can slow things down
         a lot. (default: None, any number of sequences is allowed)
        sequence_filter: function which takes an alignment and an int
         and returns the int number of sequences from the alignment in
         a new alignment object (default: util.n_random_seqs(alignment,n));
         if None, a ValueError will be raised if there are more than
         max_num_seqs

        This function allows for calculation of coevolve scores between
        pairs of alignments. The results are returned in a rectangular
        len(alignment1) x len(alignment2) matrix.

        There are some complications involved in preparing alignments for
        this function, because it needs to be obvious how to associate the
        putative interacting sequences.
        For example, if looking for interactions between mammalian proteins
        A and B, sequences are required from the same sets of species, and
        it must be apparent how to match the sequences that are most likely
        to be involved in biologically meaningful interactions. This
        typically means matching the sequences of proteins A&B that come
        from the same species. In other words, interaction of T. aculeatus
        proteinA and H. sapien proteinB likely doesn't form a biologically
        relevant interaction, because the species are so diverged.

        Matching of sequences is performed via the identifiers, but it is
        the responsibility of the user to correctly construct the sequence
        identifiers before passing the alignments (and tree, if applicable)
        to this function. To facilitate matching sequence identifiers,
        without having to discard the important information already present
        in a sequence identifier obtained from a database such as KEGG or
        RefSeq, sequence identifiers may contain a plus symbol (+). The
        characters before the + are used to match sequences between the
        alignments and tree. The characters after the + are ignored by
        this function. So, a good strategy is to make the text before the
        '+' a taxonomic identifier and leave the text after the '+' as the
        original sequence identifier. For example, your sequence/tip names
        could look like:

         alignment1: 'H. sapien+gi|123', 'T. aculeatus+gi|456'
         alignment2: 'T. aculeatus+gi|999', 'H. sapien+gi|424'
         tree: 'T. aculeatus+gi|456', 'H. sapien'

        If there is no plus, the full sequence identifier will be used for
        the matching (see H. sapien in tree). The order of sequences in the
        alignments is not important. Also note that we can't split on a
        colon, as would be convenient for pulling sequences from KEGG,
        because colons are special characters in newick.

        A WORD OF WARNING ON SEQUENCE IDENTIFIER CONSTRUCTION:
        A further complication is that in some cases, an organism will have
        multiple copies of proteins involved in a complex, but proteinA
        from locus 1 will not form a functional complex with proteinB from
        locus 2. An example of this is the three T6SSs in P. aeruginosa.
        Make sure this is handled correctly when building your sequence
        identifiers! Sequence identifiers are used to match the sequences
        which are suspected to form a functional complex, which may not
        simply mean sequences from the same species.
    """
    # Perform general validation step
    coevolve_alignments_validation(method,\
        alignment1,alignment2,min_num_seqs,max_num_seqs,**kwargs)
    # Append alignment 2 to the end of alignment 1 in a new alignment object
    merged_alignment = merge_alignments(alignment1,alignment2)
    validate_alignment(merged_alignment)
    if max_num_seqs and merged_alignment.getNumSeqs() > max_num_seqs:
        try:
            merged_alignment = sequence_filter(merged_alignment,max_num_seqs)
        except TypeError:
            raise ValueError, "Too many sequences for covariation analysis."

    # If the user provided a filepath for the merged alignment, write it to
    # disk. This is sometimes useful for post-processing steps.
    if merged_aln_filepath:
        merged_aln_file = open(merged_aln_filepath,'w')
        merged_aln_file.write(merged_alignment.toFasta())
        merged_aln_file.close()

    if return_full:
        # If the user requests the full result matrix (inter- and intra-
        # molecular coevolution data), call coevolve_alignment on the
        # merged alignment. Calling coevolve_alignment ensures that
        # the correct validations are performed, rather than directly
        # calling method.
        result = coevolve_alignment(method,merged_alignment,**kwargs)
        return result

    ## Note: we only get here if the above if statement comes back False,
    ## i.e., if we only want the intermolecular coevolution and don't care
    ## about the intramolecular coevolution.
    # Get the appropriate method (need the pair method,
    # not the alignment method)
    try:
        method = coevolve_alignment_to_coevolve_pair[method]
    except KeyError:
        # may have passed in the coevolve_pair function, so just
        # continue -- will fail (loudly) soon enough if not.
        pass

    # Cache the alignment lengths b/c we use them quite a bit, and build
    # the result object to be filled in.
    len_alignment1 = len(alignment1)
    len_alignment2 = len(alignment2)
    result = array([[gDefaultNullValue]*len_alignment1]*len_alignment2)

    # Some of the methods run much faster if relevant data is computed once,
    # and passed in -- that is done here, but there is a lot of repeated
    # code. I'm interested in suggestions for how to make this block of
    # code more compact (e.g., can I be making better use of kwargs?).
    if method == mi_pair or method == nmi_pair or method == normalized_mi_pair:
        positional_entropies = \
            [Freqs(p).Uncertainty for p in merged_alignment.Positions]
        for i in range(len_alignment1):
            for j in range(len_alignment2):
                result[j,i] = \
                    method(merged_alignment,j+len_alignment1,i,\
                        h1=positional_entropies[j+len_alignment1],\
                        h2=positional_entropies[i],**kwargs)
    elif method == ancestral_state_pair:
        # Perform method-specific validations so we can safely work
        # directly with method rather than the coevolve_pair wrapper,
        # and thereby avoid validation steps on each call to method.
        ancestral_states_input_validation(merged_alignment,**kwargs)
        ancestral_seqs = get_ancestral_seqs(merged_alignment,kwargs['tree'])
        for i in range(len_alignment1):
            for j in range(len_alignment2):
                result[j,i] = \
                    method(aln=merged_alignment,\
                        pos1=j+len_alignment1,pos2=i,\
                        ancestral_seqs=ancestral_seqs,**kwargs)
    else:
        # Perform method-specific validations so we can safely work
        # directly with method rather than the coevolve_pair wrapper,
        # and thereby avoid validation steps on each call to method.
        if method == sca_pair:
            sca_input_validation(merged_alignment,**kwargs)
        for i in range(len_alignment1):
            for j in range(len_alignment2):
                result[j,i] = \
                    method(merged_alignment,j+len_alignment1,i,**kwargs)
    return result

## End intermolecular coevolution analysis

## Start positional coevolution analysis

# coevolve position functions: f(alignment,position,**kwargs) -> 1D array
coevolve_position_functions = \
    {'mi': mi_position,'nmi': normalized_mi_position,\
     'rmi': resampled_mi_position,'sca': sca_position,\
     'an':ancestral_state_position}

def coevolve_position(method,alignment,position,**kwargs):
    """ Apply provided coevolution method to a column in alignment

        method: f(alignment,position,**kwargs) -> array of coevolution scores
        alignment: alignment object for which coevolve scores should be
            calculated (DenseAlignment)
        position: position of interest for coevolution analysis (int)
        **kwargs: parameters to be passed to method()
    """
    # Perform method-specific validation steps
    if method == sca_position:
        sca_input_validation(alignment,**kwargs)
    if method == ancestral_state_position:
        ancestral_states_input_validation(alignment,**kwargs)
    # Perform general validation steps
    validate_position(alignment,position)
    validate_alignment(alignment)
    # Perform the analysis and return the result vector
    return method(alignment,position=position,**kwargs)

## End positional coevolution analysis

## Start pairwise coevolution analysis

# coevolve pair functions: f(alignment,pos1,pos2,**kwargs) -> float
coevolve_pair_functions = \
    {'mi': mi_pair,'nmi': normalized_mi_pair,\
     'rmi': resampled_mi_pair,'sca': sca_pair,\
     'an':ancestral_state_pair,'gctmpca':gctmpca_pair}

def coevolve_pair(method,alignment,pos1,pos2,**kwargs):
    """ Apply provided coevolution method to columns pos1 & pos2 of alignment

        method: f(alignment,pos1,pos2,**kwargs) -> coevolution score
        alignment: alignment object for which coevolve score should be
            calculated (DenseAlignment)
        pos1, pos2: positions to evaluate coevolution between (int)
        **kwargs: parameters to be passed to method()
    """
    # Perform method-specific validation steps
    if method == sca_pair:
        sca_input_validation(alignment,**kwargs)
    if method == ancestral_state_pair:
        ancestral_states_input_validation(alignment,**kwargs)
    # Perform general validation steps
    validate_position(alignment,pos1)
    validate_position(alignment,pos2)
    validate_alignment(alignment)
    # Perform the analysis and return the result score
    return method(alignment,pos1=pos1,pos2=pos2,**kwargs)

## End pairwise coevolution analysis

### End methods for running coevolutionary analyses on sequence data

## Coevolution matrix filters: the following functions are used as
## post-processing filters for coevolution result matrices.

def filter_threshold_based_multiple_interdependency(aln,coevolution_matrix,
    threshold=0.95,max_cmp_threshold=1,cmp_function=greater_equal,\
    intermolecular_data_only=False):
    """Filters positions with more than max_cmp_threshold scores >= threshold

        This post-processing filter is based on the idea described in:
         "Using multiple interdependency to separate functional from
          phylogenetic correlations in protein alignments"
          Tillier and Lui, 2003

        The idea is that when a position achieves a high covariation score
        with many other positions, the covariation is more likely to arise
        from the phylogeny than from coevolution. They illustrate that this
        works in their paper, and I plan to test it with my
        alpha-helix-based analysis. Note that you can change cmp_function
        to change whether you're looking for high values to indicate
        covarying positions (cmp_function=greater_equal, used for most
        coevolution algorithms) or low values to indicate covarying
        positions (cmp_function=less_equal, used, e.g., for p-value
        matrices).

        aln: alignment used to generate the coevolution matrix -- this
         isn't actually used, but is required to maintain the same
         interface as other post-processing filters. Pass None if that's
         more convenient.
        coevolution_matrix: the 2D numpy array to be filtered. This should
         be a rectangular matrix for intermolecular coevolution data (in
         which case intermolecular_data_only must be set to True) or a
         symmetric square matrix (when intermolecular_data_only=False)
        threshold: the threshold coevolution score that other scores
         should be compared to
        max_cmp_threshold: the max number of scores that are allowed to be
         True with respect to cmp_function and threshold (e.g., the max
         number of positions that may be greater than the threshold)
         before setting all values associated with that position to
         gDefaultNullValue (default: 1)
        cmp_function: the function that compares each score in
         coevolution_matrix to threshold (default: greater_equal) -
         function should return True if the score is one that you're
         looking for (e.g., score >= threshold) or False otherwise
        intermolecular_data_only: True if coevolution_matrix is a
         rectangular matrix representing an intermolecular coevolution
         study, and False if the matrix is a symmetric square matrix

        NOTE: IF intermolecular_data_only == False, coevolution_matrix
         MUST BE SYMMETRIC, NOT LOWER TRIANGULAR OR OTHERWISE
         NON-SYMMETRIC!!
    """
    # Determine which rows need to be filtered (but don't filter them
    # right away or subsequent counts could be off)
    filtered_rows = []
    for row_n in range(coevolution_matrix.shape[0]):
        count_cmp_threshold = 0
        for v in coevolution_matrix[row_n,:]:
            if v != gDefaultNullValue and cmp_function(v,threshold):
                count_cmp_threshold += 1
                if count_cmp_threshold > max_cmp_threshold:
                    filtered_rows.append(row_n)
                    break

    # if the matrix is not symmetric, determine which cols need to be
    # filtered
    if intermolecular_data_only:
        filtered_cols = []
        for col_n in range(coevolution_matrix.shape[1]):
            count_cmp_threshold = 0
            for v in coevolution_matrix[:,col_n]:
                if v != gDefaultNullValue and cmp_function(v,threshold):
                    count_cmp_threshold += 1
                    if count_cmp_threshold > max_cmp_threshold:
                        filtered_cols.append(col_n)
                        break
        # filter the rows and cols in a non-symmetric matrix
        for row_n in filtered_rows:
            coevolution_matrix[row_n,:] = gDefaultNullValue
        for col_n in filtered_cols:
            coevolution_matrix[:,col_n] = gDefaultNullValue
    else:
        # filter the rows and cols in a symmetric matrix
        for row_n in filtered_rows:
            coevolution_matrix[row_n,:] =\
                coevolution_matrix[:,row_n] = gDefaultNullValue
    # return the result
    return coevolution_matrix

def is_parsimony_informative(column_freqs,minimum_count=2,\
    minimum_differences=2,ignored=gDefaultExcludes,strict=False):
    """Return True if aln_position is parsimony informative

        column_freqs: dict of characters at alignment position mapped
         to their counts -- this is the output of calling
         alignment.columnFreqs()
        minimum_count: the minimum number of times a character must show
         up for it to be acceptable (default: 2)
        minimum_differences: the minimum number of different characters
         that must show up at the alignment position (default: 2)
        ignored: characters that should not be counted toward
         minimum_differences (default are exclude characters)
        strict: if True, requires that all amino acids showing up at least
         once at the alignment position show up at least minimum_count
         times, rather than only requiring that minimum_differences amino
         acids show up minimum_count times. (default: False)

        The term parsimony informative comes from Codoner, O'Dea, and
         Fares, 2008, Reducing the false positive rate in the
         non-parametric analysis of molecular coevolution. In the paper
         they find that if positions which don't contain at least two
         different amino acids, and where each different amino acid
         doesn't show up at least twice each, are ignored (i.e., treated
         as though there is not enough information) the positive
         predictive value (PPV) and sensitivity (SN) increase on simulated
         alignments. They term this quality parsimony informative.

        I implemented this as a filter, but include some generalization.
         To determine if a column in an alignment is parsimony informative
         in the exact manner described in Codoner et al., the following
         parameter settings are required:
          minimum_count = 2 (default)
          minimum_differences = 2 (default)
          strict = True (default is False)

        To generalize this function, minimum_count and minimum_differences
         can be passed in so at least minimum_differences different amino
         acids must show up, and each amino acid must show up at least
         minimum_count times. As an additional variation, strict=False can
         be passed, requiring that only minimum_differences number of
         amino acids show up at least minimum_count times (as opposed to
         requiring that ALL amino acids show up minimum_count times). This
         is the default behavior.

        By default, the default exclude characters (- and ?) don't count.
    """
    if ignored:
        for e in ignored:
            try:
                del column_freqs[e]
            except KeyError:
                pass

    if len(column_freqs) < minimum_differences:
        return False
    count_gte_minimum = 0
    for count in column_freqs.values():
        # if not strict, only minimum_differences of the counts
        # must be greater than or equal to minimum_count, so
        # count those occurrences (this is different than the
        # exact technique presented in Codoner et al.)
        if count >= minimum_count:
            count_gte_minimum += 1
        # if strict, all counts must be greater than minimum_count,
        # so return False here if we find one that isn't. This is how
        # the algorithm is described in Codoner et al.
        elif strict:
            return False
    return count_gte_minimum >= minimum_differences

def filter_non_parsimony_informative(aln,coevolution_matrix,\
    null_value=gDefaultNullValue,minimum_count=2,minimum_differences=2,\
    ignored=gDefaultExcludes,intermolecular_data_only=False,strict=False):
    """ Replaces scores in coevolution_matrix with null_value for positions
         which are not parsimony informative.

        See the is_parsimony_informative doc string for a definition of
         parsimony informative.

        aln: the input alignment used to generate the coevolution matrix;
         if the alignment was recoded, this should be the recoded
         alignment.
        coevolution_matrix: the result matrix
        null_value: the value to place in positions which are not
         parsimony informative
    """
    if intermolecular_data_only:
        len_aln1 = coevolution_matrix.shape[1]
    column_frequencies = aln.columnFreqs()
    for i in range(len(column_frequencies)):
        if not is_parsimony_informative(column_frequencies[i],minimum_count,\
            minimum_differences,ignored,strict):
            if not intermolecular_data_only:
                coevolution_matrix[i,:] = coevolution_matrix[:,i] = null_value
            else:
                try:
                    coevolution_matrix[:,i] = null_value
                except IndexError:
                    coevolution_matrix[i-len_aln1,:] = null_value

def make_positional_exclude_percentage_function(excludes,max_exclude_percent):
    """ return function to identify aln positions with > max_exclude_percent
    """
    excludes = {}.fromkeys(excludes)
    def f(col):
        exclude_count = 0
        for c in col:
            if c in excludes:
                exclude_count += 1
        # cast to float so the ratio isn't truncated by integer division
        return float(exclude_count) / len(col) > max_exclude_percent
    return f

def filter_exclude_positions(aln,coevolution_matrix,\
    max_exclude_percent=0.1,null_value=gDefaultNullValue,\
    excludes=gDefaultExcludes,intermolecular_data_only=False):
    """ Assign null_value to positions with > max_exclude_percent excludes

        aln: the DenseAlignment object
        coevolution_matrix: the 2D numpy array -- this will be modified
        max_exclude_percent: the maximum percent of characters that may
         be exclude characters in any alignment position (column). If the
         percent of exclude characters is greater than this value, values
         in this position will be replaced with null_value (default: 0.10)
        null_value: the value to be used as null (default: gDefaultNullValue)
        excludes: the exclude characters (default: gDefaultExcludes)
        intermolecular_data_only: True if the coevolution result matrix
         contains only intermolecular data (default: False)
    """
    # construct the function to be passed to aln.getPositionIndices
    f = make_positional_exclude_percentage_function(\
        excludes,max_exclude_percent)
    # identify the positions containing too many exclude characters
    exclude_positions = aln.getPositionIndices(f)
    # replace values from exclude_positions with null_value
    if not intermolecular_data_only:
        # if working with intramolecular data (or inter + intra molecular
        # data), this is easy
        for p in exclude_positions:
            coevolution_matrix[p,:] = coevolution_matrix[:,p] = null_value
    else:
        # if working with intermolecular data only, this is more
        # complicated -- must convert from alignment positions to matrix
        # positions
        len_aln1 = coevolution_matrix.shape[1]
        for p in exclude_positions:
            try:
                coevolution_matrix[:,p] = null_value
            except IndexError:
                coevolution_matrix[p-len_aln1,:] = null_value

## Functions for archiving/retrieving coevolve results
#### These functions are extremely general -- should they go
#### somewhere else, or should I be using pre-existing code?

def pickle_coevolution_result(coevolve_result,out_filepath='output.pkl'):
    """ Pickle coevolve_result and store it at out_filepath

        coevolve_result: result from a coevolve_* function (above); this
         can be a float, an array, or a 2D array (most likely it will be
         one of the latter two, as it will usually be fast enough to
         compute a single coevolve value on-the-fly).
        out_filepath: path where the pickled result should be stored
    """
    try:
        p = Pickler(open(out_filepath,'w'))
    except IOError:
        err = "Can't access filepath. Do you have write access? " + \
            out_filepath
        raise IOError,err
    p.dump(coevolve_result)

def unpickle_coevolution_result(in_filepath):
    """ Read in coevolve_result from a pickled file

        in_filepath: filepath to unpickle
    """
    try:
        u = Unpickler(open(in_filepath))
    except IOError:
        err = \
         "Can't access filepath. Does it exist? Do you have read access? "+\
         in_filepath
        raise IOError,err
    return u.load()

def coevolution_matrix_to_csv(coevolve_matrix,out_filepath='output.csv'):
    """ Write coevolve_matrix as a csv file at out_filepath

        coevolve_matrix: result from a coevolve_alignment function (above);
         this should be a 2D numpy array
        out_filepath: path where the csv result should be stored
    """
    try:
        f = open(out_filepath,'w')
    except IOError:
        err = "Can't access filepath. Do you have write access? " + \
            out_filepath
        raise IOError,err
    f.write('\n'.join([','.join([str(v) for v in row]) \
        for row in coevolve_matrix]))
    f.close()

def csv_to_coevolution_matrix(in_filepath):
    """ Read a coevolution matrix from a csv file

        in_filepath: input filepath
    """
    try:
        f = open(in_filepath)
    except IOError:
        err = \
         "Can't access filepath. Does it exist? Do you have read access? "+\
         in_filepath
        raise IOError,err
    result = []
    for line in f:
        values = line.strip().split(',')
        result.append(map(float,values))
    f.close()
    return array(result)

## End functions for archiving/retrieving coevolve results

## Start functions for analyzing the results of a coevolution run.

def identify_aln_positions_above_threshold(coevolution_matrix,threshold,\
    aln_position,null_value=gDefaultNullValue):
    """ Returns the list of alignment positions which achieve a
         score >= threshold with aln_position

        Coevolution matrix should be symmetrical or you may get weird
         results -- scores are pulled from the row describing aln_position.
""" coevolution_scores = coevolution_matrix[aln_position] results = [] for i in range(len(coevolution_scores)): s = coevolution_scores[i] if s != null_value and s >= threshold: results.append(i) return results def aln_position_pairs_cmp_threshold(coevolution_matrix,\ threshold,cmp_function,null_value=gDefaultNullValue,\ intermolecular_data_only=False): """ Returns list of position pairs with score >= threshold coevolution_matrix: 2D numpy array threshold: value to compare matrix positions against cmp_function: function which takes a value and theshold and returns a boolean (e.g., ge(), le()) null_value: value representing null scores -- these are ignored intermolecular_data_only: True if the coevolution result matrix contains only intermolecular data (default: False) """ if not intermolecular_data_only: assert coevolution_matrix.shape[0] == coevolution_matrix.shape[1],\ "Non-square matrices only supported for intermolecular-only data." results = [] # compile the matrix positions with cmp(value,threshold) == True for i,row in enumerate(coevolution_matrix): for j,value in enumerate(row): if value != null_value and cmp_function(value,threshold): results.append((i,j)) # if working with intermolecular data only, need to convert # matrix positions to alignment positions if intermolecular_data_only: # convert matrix positions to alignment positions adjustment = coevolution_matrix.shape[1] results = [(j,i+adjustment) for i,j in results] return results def aln_position_pairs_ge_threshold(coevolution_matrix,\ threshold,null_value=gDefaultNullValue,\ intermolecular_data_only=False): """wrapper function for aln_position_pairs_cmp_threshold """ return aln_position_pairs_cmp_threshold(\ coevolution_matrix,threshold,greater_equal,null_value,intermolecular_data_only) def aln_position_pairs_le_threshold(coevolution_matrix,\ threshold,null_value=gDefaultNullValue,\ intermolecular_data_only=False): """wrapper function for aln_position_pairs_cmp_threshold """ return 
aln_position_pairs_cmp_threshold(\
        coevolution_matrix,threshold,less_equal,\
        null_value,intermolecular_data_only)

def count_cmp_threshold(m,threshold,cmp_function,null_value=gDefaultNullValue,\
    symmetric=False,ignore_diagonal=False):
    """ Returns a count of the values in m passing cmp_function against
         threshold, ignoring nulls.

        m: coevolution matrix (numpy array)
        threshold: value to compare against scores in matrix (float)
        cmp_function: function used to compare value to threshold
         (e.g., greater_equal, less_equal)
    """
    total_non_null = 0
    total_hits = 0
    if not symmetric:
        if ignore_diagonal:
            values = [m[i,j] \
                      for i in range(m.shape[0]) \
                      for j in range(m.shape[1]) \
                      if i != j]
        else:
            values = m.flat
    else:
        if ignore_diagonal:
            # has to be a better way to do this... tril doesn't work b/c it
            # sets the upper triangle to zero -- if i could get it to set
            # that to null_value, and then apply flat, that'd be fine.
            #values = tril(m,-1)
            values = [m[i,j] for i in range(len(m)) for j in range(i)]
        else:
            #values = tril(m)
            values = [m[i,j] for i in range(len(m)) for j in range(i+1)]

    if isnan(null_value):
        def is_not_null_value(v):
            return not isnan(v)
    else:
        def is_not_null_value(v):
            return isnan(v) or v != null_value

    for value in values:
        if is_not_null_value(value):
            total_non_null += 1
            if cmp_function(value, threshold):
                total_hits += 1
    return total_hits, total_non_null

def count_ge_threshold(m,threshold,null_value=gDefaultNullValue,\
    symmetric=False,ignore_diagonal=False):
    """wrapper function for count_cmp_threshold """
    return count_cmp_threshold(m,threshold,greater_equal,null_value,\
        symmetric,ignore_diagonal)

def count_le_threshold(m,threshold,null_value=gDefaultNullValue,\
    symmetric=False,ignore_diagonal=False):
    """wrapper function for count_cmp_threshold """
    return count_cmp_threshold(m,threshold,less_equal,null_value,\
        symmetric,ignore_diagonal)

def ltm_to_symmetric(m):
    """ Copies values from lower triangle to upper triangle"""
    assert m.shape[0] == m.shape[1],\
        "Making matrices symmetric only supported for square matrices"

    for i in range(len(m)):
        for j in range(i):
            m[j,i] = m[i,j]
    return m

## End functions for analyzing the results of a coevolution run

## Script functionality
def build_coevolution_matrix_filepath(input_filepath,\
    output_dir='./',method=None,alphabet=None,parameter=None):
    """ Build filepath from input filename, output dir, and list of suffixes

        input_filepath: filepath to be used for generating the output
            filepath. The path and the final suffix will be stripped to
            get the 'base' filename.
        output_dir: the path to append to the beginning of the base filename
        method: string indicating method that should be appended to filename
        alphabet: string indicating an alphabet recoding which should be
            appended to filename, or None
        parameter: parameter that should be appended to the filename, or None
            (ignored if method doesn't require parameter)

        Examples:
        >>> build_coevolution_matrix_filepath(\
            './p53.fasta','/output/path','mi','charge')
        /output/path/p53.charge.mi
        >>> build_coevolution_matrix_filepath(\
            './p53.new.fasta','/output/path','mi','charge')
        /output/path/p53.new.charge.mi
        >>> build_coevolution_matrix_filepath(\
            './p53.fasta','/output/path','sca','charge',0.75)
        /output/path/p53.charge.sca_75
    """
    if method == 'sca':
        try:
            cutoff_str = str(parameter)
            point_index = cutoff_str.rindex('.')
            method = '_'.join([method,cutoff_str[point_index+1:point_index+4]])
        except ValueError:
            raise ValueError, 'Cutoff must be provided when method == \'sca\''
    elif method == 'gctmpca':
        try:
            epsilon_str = str(parameter)
            point_index = epsilon_str.rindex('.')
            method = '_'.join([method,epsilon_str[point_index+1:point_index+4]])
        except ValueError:
            raise ValueError, 'Epsilon must be provided when method == \'gctmpca\''
    suffixes = filter(None,[alphabet,method])

    # strip path
    try:
        result = input_filepath[input_filepath.rindex('/')+1:]
    except ValueError:
        result = input_filepath
    # strip final suffix
    try:
        result = result[:result.rindex('.')]
    except ValueError:
        pass
    # append output path
    if \
output_dir.endswith('/'):
        result = ''.join([output_dir,result])
    else:
        result = ''.join([output_dir,'/',result])
    # append output suffixes
    result = '.'.join(filter(None,[result]+suffixes))
    return result

def parse_coevolution_matrix_filepath(filepath):
    """ Parses a coevolution matrix filepath into constituent parts.

        Format is very specific. Will only work on filenames such as:
         path/alignment_identifier.alphabet_id.method.pkl
         path/alignment_identifier.alphabet_id.method.csv

        This format is the recommended naming convention for coevolution
         matrices. To ensure filepaths compatible with this function, use
         cogent.evolve.coevolution.build_coevolution_matrix_filepath to build
         the filepaths for your coevolution matrices.

        Examples:
         parse_coevolution_matrix_filepath('pkls/myosin_995.a1_4.nmi.pkl')
            => ('myosin_995', 'a1_4', 'nmi')
         parse_coevolution_matrix_filepath('p53.orig.mi.csv')
            => ('p53','orig','mi')
    """
    filename = basename(filepath)
    fields = filename.split('.')
    try:
        alignment_id = fields[0]
        alphabet_id = fields[1]
        method_id = fields[2]
        extension = fields[3]
    except IndexError:
        raise ValueError,\
            'output filepath not in parsable format: %s. See doc string for format definition.' % filepath
    return (alignment_id,alphabet_id,method_id)

script_info = {}
script_info['brief_description'] = ""
script_info['script_description'] = ""
script_info['script_usage'] = [("","","")]
script_info['output_description']= ""
script_info['required_options'] = [\
 # Example required option
 make_option('-i','--alignment_fp',help='the input alignment'),
]
script_info['optional_options'] = [\
 make_option('-t','--tree_fp',
   help='the input tree [default: %default]', default=None),
 make_option('-f','--force',action='store_true',\
   dest='force',help='Force overwrite of any existing files '+\
   '[default: %default]', default=False),
 make_option('--ignore_excludes',action='store_true',
   dest='ignore_excludes',help='exclude_handler=ignore_excludes '+\
   '[default: %default]',default=False),
 make_option('-d','--delimited_output',action='store_true',
   dest='delimited_output',help='store result matrix as csv file '+\
   'instead of pkl file [default: %default]',default=False),
 make_option('-m','--method_id',action='store',
   type='choice',dest='method_id',help='coevolve method to apply '+\
   '[default: %default]',default='nmi',
   choices=coevolve_alignment_functions.keys()),
 make_option('-c','--sca_cutoff',action='store',
   type='float',dest='sca_cutoff',help='cutoff to apply when method'+\
   ' is SCA (-m sca) [default: %default]',default=0.8),
 make_option('-e','--epsilon',action='store',
   type='float',dest='epsilon',help='epsilon, only used when method'+\
   ' is Haussler/Yeang (-m gctmpca) [default: %default]',default=0.7),
 make_option('-o','--output_dir',action='store',
   type='string',dest='output_dir',help='directory to store pickled '+\
   'result matrix (when -p is specified) [default: %default]',
   default='./'),
 make_option('-a','--alphabet_id',action='store',
   dest='alphabet_id',type='choice',
   help='name of alphabet to reduce to [default: %default (i.e., full)]',
   default='orig',choices=alphabets.keys())
]
script_info['version'] = __version__

def main():
    option_parser, opts, args =\
parse_command_line_parameters(**script_info)
    verbose = opts.verbose
    force = opts.force
    method_id = opts.method_id
    output_dir = opts.output_dir
    sca_cutoff = opts.sca_cutoff
    epsilon = opts.epsilon
    alphabet_id = opts.alphabet_id
    delimited_output = opts.delimited_output
    alignment_filepath = opts.alignment_fp
    tree_filepath = opts.tree_fp

    # error checking related to the alignment
    try:
        aln = LoadSeqs(alignment_filepath,moltype=PROTEIN,aligned=DenseAlignment)
    except IndexError:
        option_parser.error('Must provide an alignment filepath.')
    except (RecordError,FileFormatError):
        option_parser.error(
            "Error parsing alignment: %s" % alignment_filepath)
    except IOError:
        option_parser.error(\
            "Can't access alignment file: %s" % alignment_filepath)

    # error checking related to the newick tree
    if tree_filepath == None:
        if (opts.method_id == 'gctmpca' or opts.method_id == 'an'):
            option_parser.error(\
                'Tree-based method, but no tree. Provide a newick formatted tree.')
    else:
        try:
            tree = LoadTree(tree_filepath)
        except TreeParseError:
            option_parser.error(\
                "Error parsing tree: %s" % tree_filepath)
        except IOError:
            option_parser.error(\
                "Can't access tree file: %s" % tree_filepath)

    # Error checking related to exclude handling
    if opts.ignore_excludes and opts.method_id not in ('mi','nmi'):
        option_parser.error(\
            'Ignoring exclude (i.e., gap) characters currently only supported for MI and NMI.')

    if delimited_output:
        output_file_extension = 'csv'
    else:
        output_file_extension = 'pkl'

    # Load the data and parameters specified by the user.
    coevolve_alignment_function = coevolve_alignment_functions[method_id]
    alphabet_def = alphabets[alphabet_id]
    aln = LoadSeqs(alignment_filepath,moltype=PROTEIN,aligned=DenseAlignment)
    if tree_filepath != None:
        tree = LoadTree(tree_filepath)
    if opts.ignore_excludes:
        exclude_handler = ignore_excludes
    else:
        exclude_handler = None

    # Recode the alignment in the specified reduced-state alphabet.
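A minimal standalone sketch of what reduced-state alphabet recoding does (the real implementation is cogent's recode_dense_alignment; the helper and the toy "charge" alphabet below are hypothetical, for illustration only): each residue is replaced by the representative character of its alphabet class.

```python
# Hypothetical sketch of reduced-state alphabet recoding; not cogent code.
# Toy 'charge' alphabet definition: (class_symbol, member_residues).
charge_alphabet = [('K', 'KRH'),                   # positively charged
                   ('D', 'DE'),                    # negatively charged
                   ('A', 'ACFGILMNPQSTVWY')]       # neutral

def recode_seq(seq, alphabet_def):
    # build residue -> class-symbol translation table, then apply it
    translation = {}
    for (cls, members) in alphabet_def:
        for aa in members:
            translation[aa] = cls
    return ''.join(translation.get(c, c) for c in seq)

# recode_seq('MKDE', charge_alphabet) -> 'AKDD'
```

Characters outside the alphabet (e.g. gaps) pass through unchanged in this sketch.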
    recoded_aln = recode_dense_alignment(aln,alphabet_def=alphabet_def)

    # Perform some preliminary steps before starting the analysis. This is
    # done here, rather than in the block below, to allow for some work
    # with the pickle filepath before starting the analysis. The trade-off
    # is that the coevolution method is checked twice (here and below), but
    # since this main block is run relatively infrequently, this is not
    # noticeably less efficient.
    if method_id == 'sca':
        # requires prior amino acid frequencies -- recode them
        # to reflect the reduced-state alphabet
        background_freqs = \
            recode_freq_vector(alphabet_def,default_sca_freqs)
        output_filepath = ''.join([\
            build_coevolution_matrix_filepath(alignment_filepath,\
            output_dir,method_id,alphabet_id,sca_cutoff),\
            '.',output_file_extension])
    elif method_id == 'gctmpca':
        # uses DSO78 data -- recode it to reflect the
        # reduced-state alphabet
        recoded_counts, recoded_freqs = \
            recode_counts_and_freqs(alphabet_def)
        recoded_q = square_matrix_to_dict(\
            build_rate_matrix(recoded_counts,recoded_freqs))
        output_filepath = ''.join([\
            build_coevolution_matrix_filepath(alignment_filepath,\
            output_dir,method_id,alphabet_id,epsilon),\
            '.',output_file_extension])
    else:
        output_filepath = ''.join([\
            build_coevolution_matrix_filepath(alignment_filepath,\
            output_dir,method_id,alphabet_id),\
            '.',output_file_extension])

    # Check for existence of output file -- we want to find this out
    # before generating the result matrix so we don't overwrite it
    # (since that can take a while). If the user specified -f to
    # force file overwriting, skip this step.
    if not force and exists(output_filepath):
        print 'Output file already exists:', output_filepath
        print 'Remove, rename, or specify -f to force overwrite.'
        exit(-1)

    # If the user specified -v, print some information to stdout. Otherwise
    # only error messages are displayed (via stderr).
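The output-filepath convention used here (`<output_dir>/<alignment_id>.<alphabet_id>.<method>.<extension>`) can be illustrated with a simplified, standalone sketch. The helper names below are hypothetical; the real implementations are build_coevolution_matrix_filepath and parse_coevolution_matrix_filepath, which additionally encode the sca cutoff / gctmpca epsilon into the method suffix.

```python
# Simplified, hypothetical sketch of the matrix-filepath naming convention;
# parameter encoding (e.g. sca_75) is omitted.
from os.path import basename, splitext, join

def build_matrix_filepath(input_fp, output_dir, method, alphabet):
    # strip path and final suffix, then append alphabet and method suffixes
    base = splitext(basename(input_fp))[0]
    return join(output_dir, '.'.join([base, alphabet, method]))

def parse_matrix_filepath(fp):
    # recover (alignment_id, alphabet_id, method); the extension is discarded
    return tuple(basename(fp).split('.')[:3])
```

For example, `build_matrix_filepath('./p53.fasta', '/output/path', 'mi', 'charge')` gives `/output/path/p53.charge.mi`, matching the doctest in build_coevolution_matrix_filepath above.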
    if verbose:
        print 'Input alignment: %s' % alignment_filepath
        try:
            print 'Input tree: %s' % tree_filepath
        except IndexError:
            pass
        print 'Output matrix filepath: %s' % output_filepath
        if alphabet_id != 'orig':
            print 'Alphabet reduction: %s' % alphabet_id
        else:
            print "No alphabet reduction (alphabet_id = 'orig')."
        if method_id == 'sca':
            print 'Coevolution method: sca, cutoff=%f' % sca_cutoff
        elif method_id == 'gctmpca':
            print 'Coevolution method: gctmpca, epsilon=%f' % epsilon
        else:
            print 'Coevolution method: %s' % method_id
        if exclude_handler == ignore_excludes:
            print \
             'Exclude (i.e., gap) character handling: gaps treated as other characters.'
        else:
            print \
             'Exclude (i.e., gap) character handling: columns with gaps = null value'

    # Perform the coevolutionary analysis. This can take a while.
    if coevolve_alignment_function == sca_alignment:
        alphabet = ''.join([c[0] for c in alphabet_def])
        matrix = coevolve_alignment(coevolve_alignment_function,recoded_aln,\
            cutoff=sca_cutoff,background_freqs=background_freqs,\
            alphabet=alphabet)
    elif coevolve_alignment_function == gctmpca_alignment:
        matrix = coevolve_alignment(coevolve_alignment_function,\
            recoded_aln,tree=tree,sub_matrix=recoded_q,priors=recoded_freqs,\
            epsilon=epsilon)
    elif coevolve_alignment_function == ancestral_state_alignment:
        matrix = coevolve_alignment(\
            coevolve_alignment_function,recoded_aln,tree=tree)
    else:
        matrix = coevolve_alignment(coevolve_alignment_function,recoded_aln,\
            exclude_handler=exclude_handler)

    # Write the coevolution matrix to disk in the requested format
    if delimited_output:
        coevolution_matrix_to_csv(matrix,output_filepath)
    else:
        pickle_coevolution_result(matrix,output_filepath)

if __name__ == "__main__":
    main()

# ==== PyCogent-1.5.3/cogent/evolve/discrete_markov.py ====

import numpy

from cogent.util.warning import deprecated
from cogent.recalculation.definition import (NonParamDefn, CalcDefn, EvaluatedCell,
PartitionDefn, ConstCell, ConstDefn, DictArrayTemplate) __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class PsubMatrixDefn(PartitionDefn): "Square 2D array made of 1D partitions" numeric = False # well, not scalar anyway const_by_default = False independent_by_default = True def __init__(self, default=None, name=None, dimensions=None, dimension=None, size=None, **kw): PartitionDefn.__init__(self, default, name, dimensions, dimension, size, **kw) (dim_name, dim_cats) = self.internal_dimension self.internal_dimensions = (dim_name, dim_name+"2") self.array_template = DictArrayTemplate(dim_cats, dim_cats) def _makeDefaultValue(self): # Purely flat default doesn't work well so start at approx t=0.5 flat = numpy.ones([self.size, self.size], float) / self.size diag = numpy.identity(self.size, float) return (flat + diag) / 2 def checkValueIsValid(self, value, is_constant): if value.shape != (self.size,self.size): raise ValueError("Wrong array shape %s for %s, expected (%s,%s)" % (value.shape, self.name, self.size, self.size)) for part in value: PartitionDefn.checkValueIsValid(self, part, is_constant) def makeCells(self, input_soup={}, variable=None): uniq_cells = [] all_cells = [] for (i, v) in enumerate(self.uniq): if v is None: raise ValueError("input %s not set" % self.name) assert hasattr(v, 'getDefaultValue'), v value = v.getDefaultValue() assert hasattr(value, 'shape'), value assert value.shape == (self.size,self.size) scope = [key for key in self.assignments if self.assignments[key] is v] if v.is_constant or (variable is not None and variable is not v): matrix = ConstCell(self.name, value) else: rows = [] for part in value: (ratios, partition) = self._makePartitionCell( self.name+'_part', scope, part) all_cells.extend(ratios) rows.append(partition) 
                all_cells.extend(rows)
                matrix = EvaluatedCell(self.name, lambda *x:numpy.array(x), rows)
            all_cells.append(matrix)
            uniq_cells.append(matrix)
        return (all_cells, uniq_cells)

def DiscreteSubstitutionModel(*args, **kw):
    deprecated("class", "cogent.evolve.discrete_markov.DiscreteSubstitutionModel",
        "cogent.evolve.substitution_model.DiscreteSubstitutionModel", '1.6')
    from cogent.evolve.substitution_model import DiscreteSubstitutionModel
    return DiscreteSubstitutionModel(*args, **kw)

class PartialyDiscretePsubsDefn(object):
    def __init__(self, alphabet, psubs, discrete_edges):
        motifs = tuple(alphabet)
        dpsubs = PsubMatrixDefn(
            name="dpsubs", dimension = ('motif', motifs), default=None,
            dimensions=('locus', 'edge'))
        self.choices = [psubs, dpsubs]
        self.discrete_edges = discrete_edges

    def selectFromDimension(self, dimension, category):
        assert dimension == 'edge', dimension
        special = category in self.discrete_edges
        return self.choices[special].selectFromDimension(dimension, category)

# ==== PyCogent-1.5.3/cogent/evolve/likelihood_calculation.py ====

#!/usr/bin/env python
"""This file controls the central function of EVOLVE, the calculation of the
log-likelihood of an alignment given a phylogenetic tree and substitution
model. The likelihood calculation is done according to Felsenstein's 1981
pruning algorithm. This file contains a Python implementation of that
algorithm and an interface to a more computationally efficient Pyrex
implementation. The two versions are maintained for the purpose of
cross-validating accuracy.

The calculations can be performed for trees that have polytomies in addition
to binary trees.
""" import numpy Float = numpy.core.numerictypes.sctype2char(float) from cogent.recalculation.definition import CalculationDefn, _FuncDefn, \ CalcDefn, ProbabilityParamDefn, NonParamDefn, SumDefn, CallDefn, \ ParallelSumDefn from cogent.evolve.likelihood_tree import LikelihoodTreeEdge from cogent.evolve.simulate import argpick from cogent.maths.markov import SiteClassTransitionMatrix __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class _PartialLikelihoodDefn(CalculationDefn): def setup(self, edge_name): self.edge_name = edge_name class LeafPartialLikelihoodDefn(_PartialLikelihoodDefn): name = "sequence" def calc(self, lh_tree): lh_leaf = lh_tree.getEdge(self.edge_name) return lh_leaf.input_likelihoods class PartialLikelihoodProductDefn(_PartialLikelihoodDefn): name = "plh" recycling = True def calc(self, recycled_result, lh_edge, *child_likelihoods): if recycled_result is None: recycled_result = lh_edge.makePartialLikelihoodsArray() return lh_edge.sumInputLikelihoodsR(recycled_result, *child_likelihoods) class PartialLikelihoodProductDefnFixedMotif(PartialLikelihoodProductDefn): def calc(self, recycled_result, fixed_motif, lh_edge, *child_likelihoods): if recycled_result is None: recycled_result = lh_edge.makePartialLikelihoodsArray() result = lh_edge.sumInputLikelihoodsR( recycled_result, *child_likelihoods) if fixed_motif not in [None, -1]: for motif in range(result.shape[-1]): if motif != fixed_motif: result[:, motif] = 0.0 return result class LhtEdgeLookupDefn(CalculationDefn): name = 'col_index' def setup(self, edge_name): self.edge_name = edge_name # so that it can be found by reconstructAncestralSeqs etc: if edge_name == 'root': self.name = 'root' def calc(self, lht): return lht.getEdge(self.edge_name) def makePartialLikelihoodDefns(edge, 
lht, psubs, fixed_motifs): kw = {'edge_name':edge.Name} if edge.istip(): plh = LeafPartialLikelihoodDefn(lht, **kw) else: lht_edge = LhtEdgeLookupDefn(lht, **kw) children = [] for child in edge.Children: child_plh = makePartialLikelihoodDefns(child, lht, psubs, fixed_motifs) psub = psubs.selectFromDimension('edge', child.Name) child_plh = CalcDefn(numpy.inner)(child_plh, psub) children.append(child_plh) if fixed_motifs: fixed_motif = fixed_motifs.selectFromDimension('edge', edge.Name) plh = PartialLikelihoodProductDefnFixedMotif( fixed_motif, lht_edge, *children, **kw) else: plh = PartialLikelihoodProductDefn(lht, *children, **kw) return plh def recursive_lht_build(edge, leaves): if edge.istip(): lhe = leaves[edge.Name] else: lht_children = [] for child in edge.Children: lht = recursive_lht_build(child, leaves) lht_children.append(lht) lhe = LikelihoodTreeEdge(lht_children, edge_name=edge.Name) return lhe class LikelihoodTreeDefn(CalculationDefn): name = 'lht' def setup(self, tree): self.tree = tree def calc(self, leaves): return recursive_lht_build(self.tree, leaves) class LikelihoodTreeAlignmentSplitterDefn(CalculationDefn): name = 'local_lht' def calc(self, parallel_context, lht): return lht.parallelShare(parallel_context) def makeTotalLogLikelihoodDefn(tree, leaves, psubs, mprobs, bprobs, bin_names, locus_names, sites_independent): fixed_motifs = NonParamDefn('fixed_motif', ['edge']) lht = LikelihoodTreeDefn(leaves, tree=tree) # Split up the alignment columns between the available CPUs. parallel_context = NonParamDefn('parallel_context') lht = LikelihoodTreeAlignmentSplitterDefn(parallel_context, lht) plh = makePartialLikelihoodDefns(tree, lht, psubs, fixed_motifs) # After the root partial likelihoods have been calculated it remains to # sum over the motifs, local sites, other sites (ie: cpus), bins and loci. # The motifs are always done first, but after that it gets complicated. 
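As a concrete illustration of the pruning recursion built up by makePartialLikelihoodDefns, here is a minimal standalone sketch (a toy 2-state model, not cogent code): each child's partial likelihood vector is premultiplied by its edge's substitution matrix with numpy.inner, and the children are then combined at the parent by elementwise product, exactly as in Felsenstein's pruning algorithm.

```python
# Standalone toy example (assumed 2-state model; not part of cogent):
# Felsenstein pruning for a single site on a two-leaf tree.
import numpy

psub = numpy.array([[0.9, 0.1],
                    [0.2, 0.8]])    # psub[i, j] = P(child state j | parent state i)
leaf_A = numpy.array([1.0, 0.0])    # leaf A observed in state 0
leaf_B = numpy.array([0.0, 1.0])    # leaf B observed in state 1

# partial likelihood at the root: product over children of inner(psub, leaf)
root_partial = numpy.inner(psub, leaf_A) * numpy.inner(psub, leaf_B)

# weight by root state frequencies to get the site likelihood
root_probs = numpy.array([0.5, 0.5])
site_likelihood = numpy.dot(root_partial, root_probs)  # 0.125 for this toy
```

With more taxa the same two steps (premultiply each child, multiply the children together) are applied recursively up the tree, and polytomies simply contribute more factors to the product.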
    # If a bin HMM is being used then the sites from the different CPUs must
    # be interleaved first, otherwise summing over the CPUs is done last to
    # minimise inter-CPU communication.

    root_mprobs = mprobs.selectFromDimension('edge', 'root')
    lh = CalcDefn(numpy.inner, name='lh')(plh, root_mprobs)
    if len(bin_names) > 1:
        if sites_independent:
            site_pattern = CalcDefn(BinnedSiteDistribution, name='bdist')(
                bprobs)
        else:
            parallel_context = None # hmm does the gathering over CPUs
            switch = ProbabilityParamDefn('bin_switch', dimensions=['locus'])
            site_pattern = CalcDefn(PatchSiteDistribution, name='bdist')(
                switch, bprobs)
        blh = CallDefn(site_pattern, lht, name='bindex')
        tll = CallDefn(blh, *lh.acrossDimension('bin', bin_names),
            **dict(name='tll'))
    else:
        lh = lh.selectFromDimension('bin', bin_names[0])
        tll = CalcDefn(log_sum_across_sites, name='logsum')(lht, lh)

    if len(locus_names) > 1 or parallel_context is None:
        # "or parallel_context is None" only because SelectFromDimension
        # currently has no .makeParamController() method.
tll = SumDefn(*tll.acrossDimension('locus', locus_names)) else: tll = tll.selectFromDimension('locus', locus_names[0]) if parallel_context is not None: tll = ParallelSumDefn(parallel_context, tll) return tll def log_sum_across_sites(root, root_lh): return root.getLogSumAcrossSites(root_lh) class BinnedSiteDistribution(object): def __init__(self, bprobs): self.bprobs = bprobs def getWeightedSumLh(self, lhs): result = numpy.zeros(lhs[0].shape, lhs[0].dtype.char) temp = numpy.empty(result.shape, result.dtype.char) for (bprob, lh) in zip(self.bprobs, lhs): temp[:] = lh temp *= bprob result += temp return result def __call__(self, root): return BinnedLikelihood(self, root) def emit(self, length, random_series): result = numpy.zeros([length], int) for i in range(length): result[i] = argpick(self.bprobs, random_series) return result class PatchSiteDistribution(object): def __init__(self, switch, bprobs): half = len(bprobs) // 2 self.alloc = [0] * half + [1] * (len(bprobs)-half) pprobs = numpy.zeros([max(self.alloc)+1], Float) for (b,p) in zip(self.alloc, bprobs): pprobs[b] += p self.bprobs = [p/pprobs[self.alloc[i]] for (i,p) in enumerate(bprobs)] self.transition_matrix = SiteClassTransitionMatrix(switch, pprobs) def getWeightedSumLhs(self, lhs): result = numpy.zeros((2,)+lhs[0].shape, lhs[0].dtype.char) temp = numpy.empty(lhs[0].shape, result.dtype.char) for (patch, weight, lh) in zip(self.alloc, self.bprobs, lhs): temp[:] = lh temp *= weight result[patch] += temp return result def __call__(self, root): return SiteHmm(self, root) def emit(self, length, random_series): bprobs = [[p for (patch,p) in zip(self.alloc, self.bprobs) if patch==a] for a in [0,1]] source = self.transition_matrix.emit(random_series) result = numpy.zeros([length], int) for i in range(length): patch = source.next() - 1 result[i] = argpick(bprobs[patch], random_series) return result class BinnedLikelihood(object): def __init__(self, distrib, root): self.distrib = distrib self.root = root def 
    __call__(self, *lhs):
        result = self.distrib.getWeightedSumLh(lhs)
        return self.root.getLogSumAcrossSites(result)

    def getPosteriorProbs(self, *lhs):
        # posterior bin probs, not motif probs
        assert len(lhs) == len(self.distrib.bprobs)
        result = numpy.array(
            [b*self.root.getFullLengthLikelihoods(p)
            for (b,p) in zip(self.distrib.bprobs, lhs)])
        result /= result.sum(axis=0)
        return result

class SiteHmm(object):
    def __init__(self, distrib, root):
        self.root = root
        self.distrib = distrib

    def __call__(self, *lhs):
        plhs = self.distrib.getWeightedSumLhs(lhs)
        plhs = numpy.ascontiguousarray(numpy.transpose(plhs))
        matrix = self.distrib.transition_matrix
        return self.root.logDotReduce(
            matrix.StationaryProbs, matrix.Matrix, plhs)

    def getPosteriorProbs(self, *lhs):
        plhs = []
        for lh in self.distrib.getWeightedSumLhs(lhs):
            plh = self.root.getFullLengthLikelihoods(lh)
            plhs.append(plh)
        plhs = numpy.transpose(plhs)
        pprobs = self.distrib.transition_matrix.getPosteriorProbs(plhs)
        pprobs = numpy.array(numpy.transpose(pprobs))

        lhs = numpy.array(lhs)
        blhs = lhs / numpy.sum(lhs, axis=0)
        blhs = numpy.array(
            [b * self.root.getFullLengthLikelihoods(p)
            for (b,p) in zip(self.distrib.bprobs, blhs)])

        binsum = numpy.zeros(pprobs.shape, Float)
        for (patch, data) in zip(self.distrib.alloc, blhs):
            binsum[patch] += data
        for (patch, data) in zip(self.distrib.alloc, blhs):
            data *= pprobs[patch] / binsum[patch]
        return blhs

# ==== PyCogent-1.5.3/cogent/evolve/likelihood_function.py ====

#!/usr/bin/env python

import random, numpy

from cogent.core.alignment import Alignment
from cogent.util.dict_array import DictArrayTemplate
from cogent.evolve.simulate import AlignmentEvolver, randomSequence
from cogent.util import parallel, table
from cogent.recalculation.definition import ParameterController
from cogent.util.warning import discontinued, deprecated

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Andrew Butterfield", "Peter Maxwell", "Matthew Wakefield", "Rob Knight", "Brett Easton"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" # cogent.evolve.parameter_controller.LikelihoodParameterController tells the # recalculation framework to use this subclass rather than the generic # recalculation Calculator. It adds methods which are useful for examining # the parameter, psub, mprob and likelihood values after the optimisation is # complete. class LikelihoodFunction(ParameterController): def setpar(self, param_name, value, edge=None, **scope): deprecated('method', 'setpar','setParamRule', '1.6') return self.setParamRule(param_name, edge=edge, value=value, is_constant=True, **scope) def testfunction(self): deprecated('method', 'testfunction','getLogLikelihood', '1.6') return self.getLogLikelihood() def getLogLikelihood(self): return self.getFinalResult() def getPsubForEdge(self, name, **kw): """returns the substitution probability matrix for the named edge""" try: # For PartialyDiscretePsubsDefn array = self.getParamValue('dpsubs', edge=name, **kw) except KeyError: array = self.getParamValue('psubs', edge=name, **kw) return DictArrayTemplate(self._motifs, self._motifs).wrap(array) def getRateMatrixForEdge(self, name, **kw): """returns the rate matrix (Q) for the named edge Note: expm(Q) will give the same result as getPsubForEdge(name)""" try: array = self.getParamValue('Q', edge=name, **kw) except KeyError as err: if err[0] == 'Q' and name != 'Q': raise RuntimeError('rate matrix not known by this model') else: raise return DictArrayTemplate(self._motifs, self._motifs).wrap(array) def _getLikelihoodValuesSummedAcrossAnyBins(self, locus=None): if self.bin_names and len(self.bin_names) > 1: root_lhs = [self.getParamValue('lh', locus=locus, bin=bin) for bin in self.bin_names] bprobs = self.getParamValue('bprobs') root_lh = bprobs.dot(root_lhs) 
else: root_lh = self.getParamValue('lh', locus=locus) return root_lh def getFullLengthLikelihoods(self, locus=None): """Array of [site, motif] likelihoods from the root of the tree""" root_lh = self._getLikelihoodValuesSummedAcrossAnyBins(locus=locus) root_lht = self.getParamValue('root', locus=locus) return root_lht.getFullLengthLikelihoods(root_lh) def getGStatistic(self, return_table=False, locus=None): """Goodness-of-fit statistic derived from the unambiguous columns""" root_lh = self._getLikelihoodValuesSummedAcrossAnyBins(locus=locus) root_lht = self.getParamValue('root', locus=locus) return root_lht.calcGStatistic(root_lh, return_table) def reconstructAncestralSeqs(self, locus=None): """returns a dict of DictArray objects containing probabilities of each alphabet state for each node in the tree. Arguments: - locus: a named locus""" result = {} array_template = None for restricted_edge in self._tree.getEdgeVector(): if restricted_edge.istip(): continue try: r = [] for motif in range(len(self._motifs)): self.setParamRule('fixed_motif', value=motif, edge=restricted_edge.Name, locus=locus, is_constant=True) likelihoods = self.getFullLengthLikelihoods(locus=locus) r.append(likelihoods) if array_template is None: array_template = DictArrayTemplate( likelihoods.shape[0], self._motifs) finally: self.setParamRule('fixed_motif', value=-1, edge=restricted_edge.Name, locus=locus, is_constant=True) # dict of site x motif arrays result[restricted_edge.Name] = array_template.wrap( numpy.transpose(numpy.asarray(r))) return result def likelyAncestralSeqs(self, locus=None): """Returns the most likely reconstructed ancestral sequences as an alignment. 
Arguments: - locus: a named locus""" prob_array = self.reconstructAncestralSeqs(locus=locus) seqs = [] for edge, probs in prob_array.items(): seq = [] for row in probs: by_p = [(p,state) for state, p in row.items()] seq.append(max(by_p)[1]) seqs += [(edge, self.model.MolType.makeSequence("".join(seq)))] return Alignment(data = seqs, MolType = self.model.MolType) def getBinProbs(self, locus=None): hmm = self.getParamValue('bindex', locus=locus) lhs = [self.getParamValue('lh', locus=locus, bin=bin) for bin in self.bin_names] array = hmm.getPosteriorProbs(*lhs) return DictArrayTemplate(self.bin_names, array.shape[1]).wrap(array) def _valuesForDimension(self, dim): # in support of __str__ if dim == 'edge': result = [e.Name for e in self._tree.getEdgeVector()] elif dim == 'bin': result = self.bin_names[:] elif dim == 'locus': result = self.locus_names[:] elif dim.startswith('motif'): result = self._mprob_motifs elif dim == 'position': result = self.posn_names[:] else: raise KeyError, dim return result def _valuesForDimensions(self, dims): # in support of __str__ result = [[]] for dim in dims: new_result = [] for r in result: for cat in self._valuesForDimension(dim): new_result.append(r+[cat]) result = new_result return result def __str__(self): if not self._name: title = 'Likelihood Function Table' else: title = self._name result = [title] result += self.getStatistics(with_motif_probs=True, with_titles=False) return '\n'.join(map(str, result)) def getAnnotatedTree(self): d = self.getParamValueDict(['edge']) tree = self._tree.deepcopy() for edge in tree.getEdgeVector(): if edge.Name == 'root': continue for par in d: edge.params[par] = d[par][edge.Name] return tree def getMotifProbs(self, edge=None, bin=None, locus=None): motif_probs_array = self.getParamValue( 'mprobs', edge=edge, bin=bin, locus=locus) return DictArrayTemplate(self._mprob_motifs).wrap(motif_probs_array) #return dict(zip(self._motifs, motif_probs_array)) def getBinPriorProbs(self, locus=None): 
bin_probs_array = self.getParamValue('bprobs', locus=locus) return DictArrayTemplate(self.bin_names).wrap(bin_probs_array) def getScaledLengths(self, predicate, bin=None, locus=None): """A dictionary of {scale:{edge:length}}""" if not hasattr(self._model, 'getScaledLengthsFromQ'): return {} def valueOf(param, **kw): return self.getParamValue(param, locus=locus, **kw) if bin is None: bin_names = self.bin_names else: bin_names = [bin] if len(bin_names) == 1: bprobs = [1.0] else: bprobs = valueOf('bprobs') mprobs = [valueOf('mprobs', bin=b) for b in bin_names] scaled_lengths = {} for edge in self._tree.getEdgeVector(): if edge.isroot(): continue Qs = [valueOf('Qd', bin=b, edge=edge.Name).Q for b in bin_names] length = valueOf('length', edge=edge.Name) scaled_lengths[edge.Name] = length * self._model.getScaleFromQs( Qs, bprobs, mprobs, predicate) return scaled_lengths def getStatistics(self, with_motif_probs=True, with_titles=True): """returns the parameter values as tables/dict Arguments: - with_motif_probs: include the motif probability table - with_titles: include a title for each table based on it's dimension""" result = [] group = {} param_names = self.getParamNames() mprob_name = [n for n in param_names if 'mprob' in n] if mprob_name: mprob_name = mprob_name[0] else: mprob_name = '' if not with_motif_probs: param_names.remove(mprob_name) for param in param_names: dims = tuple(self.getUsedDimensions(param)) if dims not in group: group[dims] = [] group[dims].append(param) table_order = group.keys() for table_dims in table_order: raw_table = self.getParamValueDict( dimensions=table_dims, params=group[table_dims]) param_names = group[table_dims] param_names.sort() if table_dims == ('edge',): if 'length' in param_names: param_names.remove('length') param_names.insert(0, 'length') raw_table['parent'] = dict([(e.Name, e.Parent.Name) for e in self._tree.getEdgeVector() if not e.isroot()]) param_names.insert(0, 'parent') list_table = [] heading_names = list(table_dims) + 
param_names row_order = self._valuesForDimensions(table_dims) for scope in row_order: row = {} row_used = False for param in param_names: d = raw_table[param] try: for part in scope: d = d[part] except KeyError: d = 'NA' else: row_used = True row[param] = d if row_used: row.update(dict(zip(table_dims, scope))) row = [row[k] for k in heading_names] list_table.append(row) if table_dims: title = ['', '%s params' % ' '.join(table_dims)][with_titles] row_ids = True else: row_ids = False title = ['', 'global params'][with_titles] result.append(table.Table( heading_names, list_table, max_width = 80, row_ids = row_ids, title=title, **self._format)) return result def getStatisticsAsDict(self, with_parent_names=True, with_edge_names=False): """Returns a dictionary containing the statistics for each edge of the tree, and any other information provided by the substitution model. The dictionary is keyed at the top-level by parameter name, and then by edge.name. Arguments: - with_edge_names: if True, an ordered list of edge names is included under the top-level key 'edge.names'. Default is False. """ discontinued('method', "'getStatisticsAsDict' " "use 'getParamValueDict(['edge'])' is nearly equivalent", '1.6') stats_dict = self.getParamValueDict(['edge']) if hasattr(self.model, 'scale_masks'): for predicate in self.model.scale_masks: stats_dict[predicate] = self.getScaledLengths(predicate) edge_vector = [e for e in self._tree.getEdgeVector() if not e.isroot()] # do the edge names if with_parent_names: parents = {} for edge in edge_vector: if edge.Parent.isroot(): parents[edge.Name] = "root" else: parents[edge.Name] = str(edge.Parent.Name) stats_dict["edge.parent"] = parents if with_edge_names: stats_dict['edge.name'] = ( [e.Name for e in edge_vector if e.istip()] + [e.Name for e in edge_vector if not e.istip()]) return stats_dict # For tests. 
    # Compat with old LF interface

    def setName(self, name):
        self._name = name

    def getName(self):
        return self._name or 'unnamed'

    def setTablesFormat(self, space=4, digits=4):
        """sets display properties for statistics tables. This affects
        results of str(lf) too."""
        space = [space, 4][type(space)!=int]
        digits = [digits, 4][type(digits)!=int]
        self._format = dict(space=space, digits=digits)

    def getMotifProbsByNode(self, edges=None, bin=None, locus=None):
        kw = dict(bin=bin, locus=locus)
        mprobs = self.getParamValue('mprobs', **kw)
        mprobs = self._model.calcWordProbs(mprobs)
        result = self._nodeMotifProbs(self._tree, mprobs, kw)
        if edges is None:
            edges = [name for (name, m) in result]
        result = dict(result)
        values = [result[name] for name in edges]
        return DictArrayTemplate(edges, self._mprob_motifs).wrap(values)

    def _nodeMotifProbs(self, tree, mprobs, kw):
        result = [(tree.Name, mprobs)]
        for child in tree.Children:
            psub = self.getPsubForEdge(child.Name, **kw)
            child_mprobs = numpy.dot(mprobs, psub)
            result.extend(self._nodeMotifProbs(child, child_mprobs, kw))
        return result

    def simulateAlignment(self, sequence_length=None, random_series=None,
            exclude_internal=True, locus=None, seed=None,
            root_sequence=None):
        """
        Returns an alignment of simulated sequences with keys corresponding
        to names from the current attached alignment.

        Arguments:
            - sequence_length: the length of the alignment to be simulated,
              default is the length of the attached alignment.
            - random_series: a random number generator.
            - exclude_internal: if True, only sequences for tips are
              returned.
            - root_sequence: a sequence from which all others evolve.
""" if sequence_length is None: lht = self.getParamValue('lht', locus=locus) sequence_length = len(lht.index) leaves = self.getParamValue('leaf_likelihoods', locus=locus) orig_ambig = {} #alignment.getPerSequenceAmbiguousPositions() for (seq_name, leaf) in leaves.items(): orig_ambig[seq_name] = leaf.getAmbiguousPositions() else: orig_ambig = {} if random_series is None: random_series = random.Random() random_series.seed(seed) parallel.sync_random(random_series) def psub_for(edge, bin): return self.getPsubForEdge(edge, bin=bin, locus=locus) if len(self.bin_names) > 1: hmm = self.getParamValue('bdist', locus=locus) site_bins = hmm.emit(sequence_length, random_series) else: site_bins = numpy.zeros([sequence_length], int) evolver = AlignmentEvolver(random_series, orig_ambig, exclude_internal, self.bin_names, site_bins, psub_for, self._motifs) if root_sequence is not None: # we convert to a vector of motifs if isinstance(root_sequence, str): root_sequence = self._model.MolType.makeSequence(root_sequence) motif_len = self._model.getAlphabet().getMotifLen() root_sequence = root_sequence.getInMotifSize(motif_len) else: mprobs = self.getParamValue('mprobs', locus=locus, edge='root') mprobs = self._model.calcWordProbs(mprobs) mprobs = dict((m, p) for (m,p) in zip(self._motifs, mprobs)) root_sequence = randomSequence( random_series, mprobs, sequence_length) simulated_sequences = evolver(self._tree, root_sequence) return Alignment( data = simulated_sequences, MolType = self._model.MolType) PyCogent-1.5.3/cogent/evolve/likelihood_tree.py000644 000765 000024 00000032577 12024702176 022552 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Leaf and Edge classes that can calculate their likelihoods. Each leaf holds a sequence. 
Used by a likelihood function."""

from __future__ import division

from cogent.util.modules import importVersionedModule, ExpectedImportError
from cogent.util.parallel import MPI
from cogent import LoadTable

import numpy
numpy.seterr(all='ignore')

numerictypes = numpy.core.numerictypes.sctype2char

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ = "pm67nz@gmail.com"
__status__ = "Production"

try:
    pyrex = importVersionedModule('_likelihood_tree', globals(),
            (2, 1), "pure Python/NumPy likelihood tree")
except ExpectedImportError:
    pyrex = None

class _LikelihoodTreeEdge(object):
    def __init__(self, children, edge_name, alignment=None):
        self.edge_name = edge_name
        self.alphabet = children[0].alphabet
        self.comm = None  # for MPI

        M = children[0].shape[-1]
        for child in children:
            assert child.shape[-1] == M

        # Unique positions are unique combos of input positions
        if alignment is None:
            # The children are pre-aligned gapped sequences
            assignments = [c.index for c in children]
        else:
            self.alignment = alignment  #XXX preserve through MPI split?
            # The children are ungapped sequences, 'alignment'
            # indicates where gaps need to go.
assignments = [] for (i, c) in enumerate(children): a = [] for align_index in alignment: col = align_index[i] if col is None: u = len(c.uniq)-1 # gap else: u = c.index[col] assert 0 <= u < len(c.uniq)-1, ( u, len(c.uniq), c.uniq[-1], align_index) a.append(u) assignments.append(a) (uniq, counts, self.index) = _indexed(zip(*assignments)) # extra column for gap uniq.append(tuple([len(c.uniq)-1 for c in children])) counts.append(0) self.uniq = numpy.asarray(uniq, self.integer_type) # For faster math, a contiguous index array for each child self.indexes = [ numpy.array(list(ch), self.integer_type) for ch in numpy.transpose(self.uniq)] # If this is the root it will need to weight the total # log likelihoods by these counts: self.counts = numpy.array(counts, self.float_type) # For product of child likelihoods self._indexed_children = zip(self.indexes, children) self.shape = [len(self.uniq), M] # Derive per-column degree of ambiguity from children's ambigs = [child.ambig[index] for (index, child) in self._indexed_children] self.ambig = numpy.product(ambigs, axis=0) def getSitePatterns(self, cols): # Recursive lookup of Site Patterns aka Alignment Columns child_motifs = [child.getSitePatterns(index[cols]) for (index, child) in self._indexed_children] return [''.join(child[u] for child in child_motifs) for u in range(len(cols))] def restrictMotif(self, input_likelihoods, fixed_motif): # for reconstructAncestralSeqs mask = numpy.zeros([input_likelihoods.shape[-1]], self.float_type) mask[fixed_motif] = 1.0 input_likelihoods *= mask def parallelShare(self, comm): """A local version of self for a single CPU in an MPI group""" if comm is None or comm.Get_size() == 1: return self assert self.comm is None U = len(self.uniq) - 1 # Gap column (size, rank) = (comm.Get_size(), comm.Get_rank()) (share, remainder) = divmod(U, size) if share == 0: return self # not enough to share share_sizes = [share+1]*remainder + [share]*(size-remainder) assert sum(share_sizes) == U (lo,hi) = 
[sum(share_sizes[:i]) for i in (rank, rank+1)] local_cols = [i for (i,u) in enumerate(self.index) if lo <= u < hi] local = self.selectColumns(local_cols) # Attributes for reconstructing/refinding the global arrays. # should maybe make a wrapping class instead. local.share_sizes = share_sizes local.comm = comm local.full_length_version = self return local def selectColumns(self, cols): children = [] for (index, child) in self._indexed_children: child = child.selectColumns(cols) children.append(child) return self.__class__(children, self.edge_name) def parallelReconstructColumns(self, likelihoods): """Recombine full uniq array (eg: likelihoods) from MPI CPUs""" if self.comm is None: return (self, likelihoods) result = numpy.empty([sum(self.share_sizes)+1], likelihoods.dtype) mpi_type = None # infer it from the likelihoods/result array dtype send = (likelihoods[:-1], mpi_type) # drop gap column recv = (result, self.share_sizes, None, mpi_type) self.comm.Allgatherv(send, recv) result[-1] = likelihoods[-1] # restore gap column return (self.full_length_version, result) def getFullLengthLikelihoods(self, likelihoods): (self, likelihoods) = self.parallelReconstructColumns(likelihoods) return likelihoods[self.index] def calcGStatistic(self, likelihoods, return_table=False): # A Goodness-of-fit statistic (self, likelihoods) = self.parallelReconstructColumns(likelihoods) unambig = (self.ambig == 1.0).nonzero()[0] observed = self.counts[unambig].astype(int) expected = likelihoods[unambig] * observed.sum() #chisq = ((observed-expected)**2 / expected).sum() G = 2 * observed.dot(numpy.log(observed/expected)) if return_table: motifs = self.getSitePatterns(unambig) rows = zip(motifs, observed, expected) rows.sort(key=lambda row:(-row[1], row[0])) table = LoadTable(header=['Pattern', 'Observed', 'Expected'], rows=rows, row_ids=True) return (G, table) else: return G def getEdge(self, name): if self.edge_name == name: return self else: for (i,c) in self._indexed_children: r = 
c.getEdge(name) if r is not None: return r return None def makePartialLikelihoodsArray(self): return numpy.ones(self.shape, self.float_type) def sumInputLikelihoods(self, *likelihoods): result = numpy.ones(self.shape, self.float_type) self.sumInputLikelihoodsR(result, *likelihoods) return result def asLeaf(self, likelihoods): (self, likelihoods) = self.parallelReconstructColumns(likelihoods) assert len(likelihoods) == len(self.counts) return LikelihoodTreeLeaf(likelihoods, likelihoods, self.counts, self.index, self.edge_name, self.alphabet, None) class _PyLikelihoodTreeEdge(_LikelihoodTreeEdge): # Should be a subclass of regular tree edge? float_type = numerictypes(float) integer_type = numerictypes(int) # For scaling very very small numbers BASE = 2.0 ** 100 LOG_BASE = numpy.log(BASE) def sumInputLikelihoodsR(self, result, *likelihoods): result[:] = 1.0 for (i, index) in enumerate(self.indexes): result *= numpy.take(likelihoods[i], index, 0) return result # For root def logDotReduce(self, patch_probs, switch_probs, plhs): (self, plhs) = self.parallelReconstructColumns(plhs) exponent = 0 state_probs = patch_probs.copy() for site in self.index: state_probs = numpy.dot(switch_probs, state_probs) * plhs[site] while max(state_probs) < 1.0: state_probs *= self.BASE exponent -= 1 return numpy.log(sum(state_probs)) + exponent * self.LOG_BASE def getTotalLogLikelihood(self, input_likelihoods, mprobs): lhs = numpy.inner(input_likelihoods, mprobs) return self.getLogSumAcrossSites(lhs) def getLogSumAcrossSites(self, lhs): return numpy.inner(numpy.log(lhs), self.counts) class _PyxLikelihoodTreeEdge(_LikelihoodTreeEdge): integer_type = numerictypes(int) # match checkArrayInt1D float_type = numerictypes(float) # match checkArrayDouble1D/2D def sumInputLikelihoodsR(self, result, *likelihoods): pyrex.sumInputLikelihoods(self.indexes, result, likelihoods) return result # For root def logDotReduce(self, patch_probs, switch_probs, plhs): (self, plhs) = 
self.parallelReconstructColumns(plhs) return pyrex.logDotReduce(self.index, patch_probs, switch_probs, plhs) def getTotalLogLikelihood(self, input_likelihoods, mprobs): return pyrex.getTotalLogLikelihood(self.counts, input_likelihoods, mprobs) def getLogSumAcrossSites(self, lhs): return pyrex.getLogSumAcrossSites(self.counts, lhs) if pyrex is None: LikelihoodTreeEdge = _PyLikelihoodTreeEdge else: LikelihoodTreeEdge = _PyxLikelihoodTreeEdge FLOAT_TYPE = LikelihoodTreeEdge.float_type INTEGER_TYPE = LikelihoodTreeEdge.integer_type def _indexed(values): # >>> _indexed(['a', 'b', 'c', 'a', 'a']) # (['a', 'b', 'c'], [3, 1, 1], [0, 1, 2, 0, 0]) index = numpy.zeros([len(values)], INTEGER_TYPE) unique = [] counts = [] seen = {} for (c, key) in enumerate(values): if key in seen: i = seen[key] counts[i] += 1 else: i = len(unique) unique.append(key) counts.append(1) seen[key] = i index[c] = i return unique, counts, index def makeLikelihoodTreeLeaf(sequence, alphabet=None, seq_name=None): if alphabet is None: alphabet = sequence.MolType.Alphabet if seq_name is None: seq_name = sequence.getName() motif_len = alphabet.getMotifLen() sequence2 = sequence.getInMotifSize(motif_len) # Convert sequence to indexed list of unique motifs (uniq_motifs, counts, index) = _indexed(sequence2) # extra column for gap uniq_motifs.append('?' 
* motif_len) counts.append(0) counts = numpy.array(counts, FLOAT_TYPE) # Convert list of unique motifs to array of unique profiles try: likelihoods = alphabet.fromAmbigToLikelihoods( uniq_motifs, FLOAT_TYPE) except alphabet.AlphabetError, detail: motif = str(detail) posn = list(sequence2).index(motif) * motif_len raise ValueError, '%s at %s:%s not in alphabet' % ( repr(motif), seq_name, posn) return LikelihoodTreeLeaf(uniq_motifs, likelihoods, counts, index, seq_name, alphabet, sequence) class LikelihoodTreeLeaf(object): def __init__(self, uniq, likelihoods, counts, index, edge_name, alphabet, sequence): if sequence is not None: self.sequence = sequence self.alphabet = alphabet self.name = self.edge_name = edge_name self.uniq = uniq self.motifs = numpy.asarray(uniq) self.input_likelihoods = likelihoods self.counts = counts self.index = index self.shape = likelihoods.shape self.ambig = numpy.sum(self.input_likelihoods, axis=-1) def backward(self): index = numpy.array(self.index[::-1,...]) result = self.__class__(self.uniq, self.input_likelihoods, self.counts, index, self.edge_name, self.alphabet, None) return result def __len__(self): return len(self.index) def __getitem__(self, index): cols = range(*index.indices(len(self.index))) return self.selectColumns(cols) def getMotifCounts(self, include_ambiguity=False): weights = self.counts / self.ambig profile = self.input_likelihoods * weights[...,numpy.newaxis] if not include_ambiguity: unambig = self.ambig == 1.0 profile = numpy.compress(unambig, profile, axis=0) return numpy.sum(profile, axis=0) def getAmbiguousPositions(self): ambig = {} for (i,u) in enumerate(self.index): if self.ambig[u] != 1.0: ambig[i] = self.uniq[u] return ambig def selectColumns(self, cols): sub_index = [self.index[i] for i in cols] (keep, counts, index) = _indexed(sub_index) keep.append(len(self.uniq)-1) # extra column for gap counts.append(0) counts = numpy.array(counts, FLOAT_TYPE) uniq = [self.uniq[u] for u in keep] likelihoods = 
self.input_likelihoods[keep]
        return self.__class__(
                uniq, likelihoods, counts, index,
                self.edge_name, self.alphabet, None)

    def getEdge(self, name):
        if self.edge_name == name:
            return self
        else:
            return None

    def getSitePatterns(self, cols):
        return numpy.asarray(self.uniq)[cols]

PyCogent-1.5.3/cogent/evolve/models.py

#! /usr/bin/env python
"""A collection of pre-defined models.  These are provided for convenience
so that users do not need to keep reconstructing the standard models.  We
encourage users to think about the assumptions in these models and consider
if their problem could benefit from a user defined model.
Note that models that do not traditionally deal with gaps are implemented
with gap recoding that will convert gaps to Ns, and model gaps set to
False."""

# The models are constructed in a straightforward manner with no attempt to
# condense this file using functions etc. to allow each model to serve as an
# example for users wishing to construct their own models

import numpy

from cogent.evolve import substitution_model
from cogent.evolve.predicate import MotifChange, replacement
from cogent.evolve.solved_models import F81, HKY85, TN93

__author__ = "Matthew Wakefield"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Matthew Wakefield", "Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Matthew Wakefield"
__email__ = "wakefield@wehi.edu.au"
__status__ = "Production"

nucleotide_models = ['JC69', 'K80', 'F81', 'HKY85', 'TN93', 'GTR']

codon_models = ['CNFGTR', 'CNFHKY', 'MG94HKY', 'MG94GTR', 'GY94',
        'H04G', 'H04GK', 'H04GGK']

protein_models = ['DSO78', 'AH96', 'AH96_mtmammals', 'JTT92', 'WG01']

# Substitution model rate matrix predicates
_gtr_preds = [MotifChange(x, y) for x, y in ['AC', 'AG', 'AT', 'CG', 'CT']]
_kappa = (~MotifChange('R', 'Y')).aliased('kappa')
_omega = replacement.aliased('omega')
_cg =
MotifChange('CG').aliased('G')
_cg_k = (_cg & _kappa).aliased('G.K')

def K80(**kw):
    """Kimura 1980"""
    return HKY85(equal_motif_probs=True, optimise_motif_probs=False, **kw)

def JC69(**kw):
    """Jukes and Cantor's 1969 model"""
    return F81(equal_motif_probs=True, optimise_motif_probs=False, **kw)

def GTR(**kw):
    """General Time Reversible nucleotide substitution model."""
    return substitution_model.Nucleotide(
            motif_probs = None,
            do_scaling = True,
            model_gaps = False,
            recode_gaps = True,
            name = 'GTR',
            predicates = _gtr_preds,
            **kw)

# Codon Models

def CNFGTR(**kw):
    """Conditional nucleotide frequency codon substitution model, GTR
    variant (with params analogous to the nucleotide GTR model).
    See Yap, Lindsay, Easteal and Huttley, Mol Biol Evol, In press."""
    return substitution_model.Codon(
            motif_probs = None,
            do_scaling = True,
            model_gaps = False,
            recode_gaps = True,
            name = 'CNFGTR',
            predicates = _gtr_preds+[_omega],
            mprob_model='conditional',
            **kw)

def CNFHKY(**kw):
    """Conditional nucleotide frequency codon substitution model, HKY
    variant (with kappa, the ratio of transitions to transversions)
    See Yap, Lindsay, Easteal and Huttley, Mol Biol Evol, In press."""
    return substitution_model.Codon(
            motif_probs = None,
            do_scaling = True,
            model_gaps = False,
            recode_gaps = True,
            name = 'CNFHKY',
            predicates = [_kappa, _omega],
            mprob_model='conditional',
            **kw)

def MG94HKY(**kw):
    """Muse and Gaut 1994 codon substitution model, HKY variant (with kappa,
    the ratio of transitions to transversions)
    see, Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24"""
    return substitution_model.Codon(
            motif_probs = None,
            do_scaling = True,
            model_gaps = False,
            recode_gaps = True,
            name = 'MG94',
            predicates = [_kappa, _omega],
            mprob_model='monomer',
            **kw)

def MG94GTR(**kw):
    """Muse and Gaut 1994 codon substitution model, GTR variant (with params
    analogous to the nucleotide GTR model)
    see, Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24"""
    return substitution_model.Codon(
            motif_probs = None,
            do_scaling = True,
model_gaps = False, recode_gaps = True, name = 'MG94', predicates = _gtr_preds+[_omega], mprob_model='monomer', **kw) def GY94(**kw): """Goldman and Yang 1994 codon substitution model. see, N Goldman and Z Yang, Mol. Biol. Evol., 11(5):725-36, 1994.""" return Y98(**kw) def Y98(**kw): """Yang's 1998 substitution model, a derivative of the GY94. see Z Yang. Mol. Biol. Evol., 15(5):568-73, 1998""" return substitution_model.Codon( motif_probs = None, do_scaling = True, model_gaps = False, recode_gaps = True, name = 'Y98', predicates = { 'kappa' : 'transition', 'omega' : 'replacement', }, mprob_model = 'tuple', **kw) def H04G(**kw): """Huttley 2004 CpG substitution model. Includes a term for substitutions to or from CpG's. see, GA Huttley. Mol Biol Evol, 21(9):1760-8""" return substitution_model.Codon( motif_probs = None, do_scaling = True, model_gaps = False, recode_gaps = True, name = 'H04G', predicates = [_cg, _kappa, _omega], mprob_model = 'tuple', **kw) def H04GK(**kw): """Huttley 2004 CpG substitution model. Includes a term for transition substitutions to or from CpG's. see, GA Huttley. Mol Biol Evol, 21(9):1760-8""" return substitution_model.Codon( motif_probs = None, do_scaling = True, model_gaps = False, recode_gaps = True, name = 'H04GK', predicates = [_cg_k, _kappa, _omega], mprob_model = 'tuple', **kw) def H04GGK(**kw): """Huttley 2004 CpG substitution model. Includes a general term for substitutions to or from CpG's and an adjustment for CpG transitions. see, GA Huttley. 
Mol Biol Evol, 21(9):1760-8""" return substitution_model.Codon( motif_probs = None, do_scaling = True, model_gaps = False, recode_gaps = True, name = 'H04GGK', predicates = [_cg, _cg_k, _kappa, _omega], mprob_model = 'tuple', **kw) # Protein Models # Empirical Protein Models DSO78_matrix = numpy.array( [[ 0.00000000e+00, 3.60000000e+01, 1.20000000e+02, 1.98000000e+02, 1.80000000e+01, 2.40000000e+02, 2.30000000e+01, 6.50000000e+01, 2.60000000e+01, 4.10000000e+01, 7.20000000e+01, 9.80000000e+01, 2.50000000e+02, 8.90000000e+01, 2.70000000e+01, 4.09000000e+02, 3.71000000e+02, 2.08000000e+02, 0.00000000e+00, 2.40000000e+01], [ 3.60000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.10000000e+01, 2.80000000e+01, 4.40000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.90000000e+01, 0.00000000e+00, 2.30000000e+01, 1.61000000e+02, 1.60000000e+01, 4.90000000e+01, 0.00000000e+00, 9.60000000e+01], [ 1.20000000e+02, 0.00000000e+00, 0.00000000e+00, 1.15300000e+03, 0.00000000e+00, 1.25000000e+02, 8.60000000e+01, 2.40000000e+01, 7.10000000e+01, 0.00000000e+00, 0.00000000e+00, 9.05000000e+02, 1.30000000e+01, 1.34000000e+02, 0.00000000e+00, 9.50000000e+01, 6.60000000e+01, 1.80000000e+01, 0.00000000e+00, 0.00000000e+00], [ 1.98000000e+02, 0.00000000e+00, 1.15300000e+03, 0.00000000e+00, 0.00000000e+00, 8.10000000e+01, 4.30000000e+01, 6.10000000e+01, 8.30000000e+01, 1.10000000e+01, 3.00000000e+01, 1.48000000e+02, 5.10000000e+01, 7.16000000e+02, 1.00000000e+00, 7.90000000e+01, 3.40000000e+01, 3.70000000e+01, 0.00000000e+00, 2.20000000e+01], [ 1.80000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.50000000e+01, 4.80000000e+01, 1.96000000e+02, 0.00000000e+00, 1.57000000e+02, 9.20000000e+01, 1.40000000e+01, 1.10000000e+01, 0.00000000e+00, 1.40000000e+01, 4.60000000e+01, 1.30000000e+01, 1.20000000e+01, 7.60000000e+01, 6.98000000e+02], [ 2.40000000e+02, 1.10000000e+01, 1.25000000e+02, 8.10000000e+01, 
1.50000000e+01, 0.00000000e+00, 1.00000000e+01, 0.00000000e+00, 2.70000000e+01, 7.00000000e+00, 1.70000000e+01, 1.39000000e+02, 3.40000000e+01, 2.80000000e+01, 9.00000000e+00, 2.34000000e+02, 3.00000000e+01, 5.40000000e+01, 0.00000000e+00, 0.00000000e+00], [ 2.30000000e+01, 2.80000000e+01, 8.60000000e+01, 4.30000000e+01, 4.80000000e+01, 1.00000000e+01, 0.00000000e+00, 7.00000000e+00, 2.60000000e+01, 4.40000000e+01, 0.00000000e+00, 5.35000000e+02, 9.40000000e+01, 6.06000000e+02, 2.40000000e+02, 3.50000000e+01, 2.20000000e+01, 4.40000000e+01, 2.70000000e+01, 1.27000000e+02], [ 6.50000000e+01, 4.40000000e+01, 2.40000000e+01, 6.10000000e+01, 1.96000000e+02, 0.00000000e+00, 7.00000000e+00, 0.00000000e+00, 4.60000000e+01, 2.57000000e+02, 3.36000000e+02, 7.70000000e+01, 1.20000000e+01, 1.80000000e+01, 6.40000000e+01, 2.40000000e+01, 1.92000000e+02, 8.89000000e+02, 0.00000000e+00, 3.70000000e+01], [ 2.60000000e+01, 0.00000000e+00, 7.10000000e+01, 8.30000000e+01, 0.00000000e+00, 2.70000000e+01, 2.60000000e+01, 4.60000000e+01, 0.00000000e+00, 1.80000000e+01, 2.43000000e+02, 3.18000000e+02, 3.30000000e+01, 1.53000000e+02, 4.64000000e+02, 9.60000000e+01, 1.36000000e+02, 1.00000000e+01, 0.00000000e+00, 1.30000000e+01], [ 4.10000000e+01, 0.00000000e+00, 0.00000000e+00, 1.10000000e+01, 1.57000000e+02, 7.00000000e+00, 4.40000000e+01, 2.57000000e+02, 1.80000000e+01, 0.00000000e+00, 5.27000000e+02, 3.40000000e+01, 3.20000000e+01, 7.30000000e+01, 1.50000000e+01, 1.70000000e+01, 3.30000000e+01, 1.75000000e+02, 4.60000000e+01, 2.80000000e+01], [ 7.20000000e+01, 0.00000000e+00, 0.00000000e+00, 3.00000000e+01, 9.20000000e+01, 1.70000000e+01, 0.00000000e+00, 3.36000000e+02, 2.43000000e+02, 5.27000000e+02, 0.00000000e+00, 1.00000000e+00, 1.70000000e+01, 1.14000000e+02, 9.00000000e+01, 6.20000000e+01, 1.04000000e+02, 2.58000000e+02, 0.00000000e+00, 0.00000000e+00], [ 9.80000000e+01, 0.00000000e+00, 9.05000000e+02, 1.48000000e+02, 1.40000000e+01, 1.39000000e+02, 5.35000000e+02, 
7.70000000e+01, 3.18000000e+02, 3.40000000e+01, 1.00000000e+00, 0.00000000e+00, 4.20000000e+01, 1.03000000e+02, 3.20000000e+01, 4.95000000e+02, 2.29000000e+02, 1.50000000e+01, 2.30000000e+01, 9.50000000e+01], [ 2.50000000e+02, 1.90000000e+01, 1.30000000e+01, 5.10000000e+01, 1.10000000e+01, 3.40000000e+01, 9.40000000e+01, 1.20000000e+01, 3.30000000e+01, 3.20000000e+01, 1.70000000e+01, 4.20000000e+01, 0.00000000e+00, 1.53000000e+02, 1.03000000e+02, 2.45000000e+02, 7.80000000e+01, 4.80000000e+01, 0.00000000e+00, 0.00000000e+00], [ 8.90000000e+01, 0.00000000e+00, 1.34000000e+02, 7.16000000e+02, 0.00000000e+00, 2.80000000e+01, 6.06000000e+02, 1.80000000e+01, 1.53000000e+02, 7.30000000e+01, 1.14000000e+02, 1.03000000e+02, 1.53000000e+02, 0.00000000e+00, 2.46000000e+02, 5.60000000e+01, 5.30000000e+01, 3.50000000e+01, 0.00000000e+00, 0.00000000e+00], [ 2.70000000e+01, 2.30000000e+01, 0.00000000e+00, 1.00000000e+00, 1.40000000e+01, 9.00000000e+00, 2.40000000e+02, 6.40000000e+01, 4.64000000e+02, 1.50000000e+01, 9.00000000e+01, 3.20000000e+01, 1.03000000e+02, 2.46000000e+02, 0.00000000e+00, 1.54000000e+02, 2.60000000e+01, 2.40000000e+01, 2.01000000e+02, 8.00000000e+00], [ 4.09000000e+02, 1.61000000e+02, 9.50000000e+01, 7.90000000e+01, 4.60000000e+01, 2.34000000e+02, 3.50000000e+01, 2.40000000e+01, 9.60000000e+01, 1.70000000e+01, 6.20000000e+01, 4.95000000e+02, 2.45000000e+02, 5.60000000e+01, 1.54000000e+02, 0.00000000e+00, 5.50000000e+02, 3.00000000e+01, 7.50000000e+01, 3.40000000e+01], [ 3.71000000e+02, 1.60000000e+01, 6.60000000e+01, 3.40000000e+01, 1.30000000e+01, 3.00000000e+01, 2.20000000e+01, 1.92000000e+02, 1.36000000e+02, 3.30000000e+01, 1.04000000e+02, 2.29000000e+02, 7.80000000e+01, 5.30000000e+01, 2.60000000e+01, 5.50000000e+02, 0.00000000e+00, 1.57000000e+02, 0.00000000e+00, 4.20000000e+01], [ 2.08000000e+02, 4.90000000e+01, 1.80000000e+01, 3.70000000e+01, 1.20000000e+01, 5.40000000e+01, 4.40000000e+01, 8.89000000e+02, 1.00000000e+01, 1.75000000e+02, 
2.58000000e+02, 1.50000000e+01, 4.80000000e+01, 3.50000000e+01, 2.40000000e+01, 3.00000000e+01, 1.57000000e+02, 0.00000000e+00, 0.00000000e+00, 2.80000000e+01], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 7.60000000e+01, 0.00000000e+00, 2.70000000e+01, 0.00000000e+00, 0.00000000e+00, 4.60000000e+01, 0.00000000e+00, 2.30000000e+01, 0.00000000e+00, 0.00000000e+00, 2.01000000e+02, 7.50000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.10000000e+01], [ 2.40000000e+01, 9.60000000e+01, 0.00000000e+00, 2.20000000e+01, 6.98000000e+02, 0.00000000e+00, 1.27000000e+02, 3.70000000e+01, 1.30000000e+01, 2.80000000e+01, 0.00000000e+00, 9.50000000e+01, 0.00000000e+00, 0.00000000e+00, 8.00000000e+00, 3.40000000e+01, 4.20000000e+01, 2.80000000e+01, 6.10000000e+01, 0.00000000e+00]]) DSO78_freqs = {'A': 0.087126912873087131, 'C': 0.033473966526033475, 'E': 0.04952995047004953, 'D': 0.046871953128046873, 'G': 0.088611911388088618, 'F': 0.039771960228039777, 'I': 0.036885963114036892, 'H': 0.033617966382033626, 'K': 0.08048191951808048, 'M': 0.014752985247014754, 'L': 0.085356914643085369, 'N': 0.040431959568040438, 'Q': 0.038254961745038257, 'P': 0.050679949320050689, 'S': 0.069576930423069588, 'R': 0.040903959096040908, 'T': 0.058541941458058543, 'W': 0.010493989506010494, 'V': 0.064717935282064723, 'Y': 0.029915970084029919} JTT92_matrix = numpy.array( [[ 0., 56., 81., 105., 15., 179., 27., 36., 35., 30., 54., 54., 194., 57., 58., 378., 475., 298., 9., 11.], [ 56., 0., 10., 5., 78., 59., 69., 17., 7., 23., 31., 34., 14., 9., 113., 223., 42., 62., 115., 209.], [ 81., 10., 0., 767., 4., 130., 112., 11., 26., 7., 15., 528., 15., 49., 16., 59., 38., 31., 4., 46.], [ 105., 5., 767., 0., 5., 119., 26., 12., 181., 9., 18., 58., 18., 323., 29., 30., 32., 45., 10., 7.], [ 15., 78., 4., 5., 0., 5., 40., 89., 4., 248., 43., 10., 17., 4., 5., 92., 12., 62., 53., 536.], [ 179., 59., 130., 119., 5., 0., 23., 6., 27., 6., 14., 81., 24., 26., 137., 201., 33., 47., 
55., 8.], [ 27., 69., 112., 26., 40., 23., 0., 16., 45., 56., 33., 391., 115., 597., 328., 73., 46., 11., 8., 573.], [ 36., 17., 11., 12., 89., 6., 16., 0., 21., 229., 479., 47., 10., 9., 22., 40., 245., 961., 9., 32.], [ 35., 7., 26., 181., 4., 27., 45., 21., 0., 14., 65., 263., 21., 292., 646., 47., 103., 14., 10., 8.], [ 30., 23., 7., 9., 248., 6., 56., 229., 14., 0., 388., 12., 102., 72., 38., 59., 25., 180., 52., 24.], [ 54., 31., 15., 18., 43., 14., 33., 479., 65., 388., 0., 30., 16., 43., 44., 29., 226., 323., 24., 18.], [ 54., 34., 528., 58., 10., 81., 391., 47., 263., 12., 30., 0., 15., 86., 45., 503., 232., 16., 8., 70.], [ 194., 14., 15., 18., 17., 24., 115., 10., 21., 102., 16., 15., 0., 164., 74., 285., 118., 23., 6., 10.], [ 57., 9., 49., 323., 4., 26., 597., 9., 292., 72., 43., 86., 164., 0., 310., 53., 51., 20., 18., 24.], [ 58., 113., 16., 29., 5., 137., 328., 22., 646., 38., 44., 45., 74., 310., 0., 101., 64., 17., 126., 20.], [ 378., 223., 59., 30., 92., 201., 73., 40., 47., 59., 29., 503., 285., 53., 101., 0., 477., 38., 35., 63.], [ 475., 42., 38., 32., 12., 33., 46., 245., 103., 25., 226., 232., 118., 51., 64., 477., 0., 112., 12., 21.], [ 298., 62., 31., 45., 62., 47., 11., 961., 14., 180., 323., 16., 23., 20., 17., 38., 112., 0., 25., 16.], [ 9., 115., 4., 10., 53., 55., 8., 9., 10., 52., 24., 8., 6., 18., 126., 35., 12., 25., 0., 71.], [ 11., 209., 46., 7., 536., 8., 573., 32., 8., 24., 18., 70., 10., 24., 20., 63., 21., 16., 71., 0.]]) JTT92_freqs = {'A': 0.076747923252076758, 'C': 0.019802980197019805, 'E': 0.061829938170061841, 'D': 0.05154394845605155, 'G': 0.073151926848073159, 'F': 0.040125959874040135, 'I': 0.053760946239053767, 'H': 0.022943977056022944, 'K': 0.058675941324058678, 'M': 0.023825976174023829, 'L': 0.091903908096091905, 'N': 0.042644957355042652, 'Q': 0.040751959248040752, 'P': 0.050900949099050907, 'S': 0.068764931235068771, 'R': 0.051690948309051694, 'T': 0.058564941435058568, 'W': 0.014260985739014262, 'V': 
0.066004933995066004, 'Y': 0.032101967898032102} AH96_matrix = numpy.array( [[ 0. , 59.93, 17.67, 9.77, 6.37, 120.71, 13.9 , 96.49, 8.36, 25.46, 141.88, 26.95, 54.31, 1.9 , 23.18, 387.86, 480.72, 195.06, 1.9 , 6.48], [ 59.93, 0. , 1.9 , 1.9 , 70.8 , 30.71, 141.49, 62.73, 1.9 , 25.65, 6.18, 58.94, 31.26, 75.24, 103.33, 277.05, 179.97, 1.9 , 33.6 , 254.77], [ 17.67, 1.9 , 0. , 583.55, 4.98, 56.77, 113.99, 4.34, 2.31, 1.9 , 1.9 , 794.38, 13.43, 55.28, 1.9 , 69.02, 28.01, 1.9 , 19.86, 21.21], [ 9.77, 1.9 , 583.55, 0. , 2.67, 28.28, 49.12, 3.31, 313.86, 1.9 , 1.9 , 63.05, 12.83, 313.56, 1.9 , 54.71, 14.82, 21.14, 1.9 , 13.12], [ 6.37, 70.8 , 4.98, 2.67, 0. , 1.9 , 48.16, 84.67, 6.44, 216.06, 90.82, 15.2 , 17.31, 19.11, 4.69, 64.29, 33.85, 6.35, 7.84, 465.58], [ 120.71, 30.71, 56.77, 28.28, 1.9 , 0. , 1.9 , 5.98, 22.73, 2.41, 1.9 , 53.3 , 1.9 , 6.75, 23.03, 125.93, 11.17, 2.53, 10.92, 3.21], [ 13.9 , 141.49, 113.99, 49.12, 48.16, 1.9 , 0. , 12.26, 127.67, 11.49, 11.97, 496.13, 60.97, 582.4 , 165.23, 77.46, 44.78, 1.9 , 7.08, 670.14], [ 96.49, 62.73, 4.34, 3.31, 84.67, 5.98, 12.26, 0. , 19.57, 329.09, 517.98, 27.1 , 20.63, 8.34, 1.9 , 47.7 , 368.43, 1222.94, 1.9 , 25.01], [ 8.36, 1.9 , 2.31, 313.86, 6.44, 22.73, 127.67, 19.57, 0. , 14.88, 91.37, 608.7 , 50.1 , 465.58, 141.4 , 105.79, 136.33, 1.9 , 24. , 51.17], [ 25.46, 25.65, 1.9 , 1.9 , 216.06, 2.41, 11.49, 329.09, 14.88, 0. , 537.53, 15.16, 40.1 , 39.7 , 15.58, 73.61, 126.4 , 91.67, 32.44, 44.15], [ 141.88, 6.18, 1.9 , 1.9 , 90.82, 1.9 , 11.97, 517.98, 91.37, 537.53, 0. , 65.41, 18.84, 47.37, 1.9 , 111.16, 528.17, 387.54, 21.71, 39.96], [ 26.95, 58.94, 794.38, 63.05, 15.2 , 53.3 , 496.13, 27.1 , 608.7 , 15.16, 65.41, 0. , 73.31, 173.56, 13.24, 494.39, 238.46, 1.9 , 10.68, 191.36], [ 54.31, 31.26, 13.43, 12.83, 17.31, 1.9 , 60.97, 20.63, 50.1 , 40.1 , 18.84, 73.31, 0. , 137.29, 23.64, 169.9 , 128.22, 8.23, 4.21, 16.21], [ 1.9 , 75.24, 55.28, 313.56, 19.11, 6.75, 582.4 , 8.34, 465.58, 39.7 , 47.37, 173.56, 137.29, 0. 
, 220.99, 54.11, 94.93, 19. , 1.9 , 38.82], [ 23.18, 103.33, 1.9 , 1.9 , 4.69, 23.03, 165.23, 1.9 , 141.4 , 15.58, 1.9 , 13.24, 23.64, 220.99, 0. , 6.04, 2.08, 7.64, 21.95, 1.9 ], [ 387.86, 277.05, 69.02, 54.71, 64.29, 125.93, 77.46, 47.7 , 105.79, 73.61, 111.16, 494.39, 169.9 , 54.11, 6.04, 0. , 597.21, 1.9 , 38.58, 64.92], [ 480.72, 179.97, 28.01, 14.82, 33.85, 11.17, 44.78, 368.43, 136.33, 126.4 , 528.17, 238.46, 128.22, 94.93, 2.08, 597.21, 0. , 204.54, 9.99, 38.73], [ 195.06, 1.9 , 1.9 , 21.14, 6.35, 2.53, 1.9 , 1222.94, 1.9 , 91.67, 387.54, 1.9 , 8.23, 19. , 7.64, 1.9 , 204.54, 0. , 5.37, 1.9 ], [ 1.9 , 33.6 , 19.86, 1.9 , 7.84, 10.92, 7.08, 1.9 , 24. , 32.44, 21.71, 10.68, 4.21, 1.9 , 21.95, 38.58, 9.99, 5.37, 0. , 26.25], [ 6.48, 254.77, 21.21, 13.12, 465.58, 3.21, 670.14, 25.01, 51.17, 44.15, 39.96, 191.36, 16.21, 38.82, 1.9 , 64.92, 38.73, 1.9 , 26.25, 0. ]]) AH96_freqs = { 'A': 0.071999999999999995, 'C': 0.0060000000000000001, 'E': 0.024, 'D': 0.019, 'G': 0.056000000000000001, 'F': 0.060999999999999999, 'I': 0.087999999999999995, 'H': 0.028000000000000001, 'K': 0.023, 'M': 0.053999999999999999, 'L': 0.16900000000000001, 'N': 0.039, 'Q': 0.025000000000000001, 'P': 0.053999999999999999, 'S': 0.071999999999999995, 'R': 0.019, 'T': 0.085999999999999993, 'W': 0.029000000000000001, 'V': 0.042999999999999997, 'Y': 0.033000000000000002} AH96_mtmammals_matrix = numpy.array( [[ 0.00000000e+00, 0.00000000e+00, 1.10000000e+01, 0.00000000e+00, 0.00000000e+00, 7.80000000e+01, 8.00000000e+00, 7.50000000e+01, 0.00000000e+00, 2.10000000e+01, 7.60000000e+01, 2.00000000e+00, 5.30000000e+01, 0.00000000e+00, 3.20000000e+01, 3.42000000e+02, 6.81000000e+02, 3.98000000e+02, 5.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 7.00000000e+00, 0.00000000e+00, 3.05000000e+02, 4.10000000e+01, 0.00000000e+00, 2.70000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.86000000e+02, 3.47000000e+02, 1.14000000e+02, 
0.00000000e+00, 6.50000000e+01, 5.30000000e+02], [ 1.10000000e+01, 0.00000000e+00, 0.00000000e+00, 5.69000000e+02, 5.00000000e+00, 7.90000000e+01, 1.10000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 8.64000000e+02, 2.00000000e+00, 4.90000000e+01, 0.00000000e+00, 1.60000000e+01, 0.00000000e+00, 1.00000000e+01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 5.69000000e+02, 0.00000000e+00, 0.00000000e+00, 2.20000000e+01, 2.20000000e+01, 0.00000000e+00, 2.15000000e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.74000000e+02, 0.00000000e+00, 2.10000000e+01, 4.00000000e+00, 2.00000000e+01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 7.00000000e+00, 5.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.70000000e+01, 0.00000000e+00, 2.46000000e+02, 1.10000000e+01, 6.00000000e+00, 1.70000000e+01, 0.00000000e+00, 0.00000000e+00, 9.00000000e+01, 8.00000000e+00, 6.00000000e+00, 0.00000000e+00, 6.82000000e+02], [ 7.80000000e+01, 0.00000000e+00, 7.90000000e+01, 2.20000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 4.70000000e+01, 0.00000000e+00, 0.00000000e+00, 1.80000000e+01, 1.12000000e+02, 0.00000000e+00, 5.00000000e+00, 0.00000000e+00, 1.00000000e+00], [ 8.00000000e+00, 3.05000000e+02, 1.10000000e+01, 2.20000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.60000000e+01, 0.00000000e+00, 4.58000000e+02, 5.30000000e+01, 5.50000000e+02, 2.32000000e+02, 2.00000000e+01, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.52500000e+03], [ 7.50000000e+01, 4.10000000e+01, 0.00000000e+00, 0.00000000e+00, 5.70000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.00000000e+00, 2.32000000e+02, 3.78000000e+02, 1.90000000e+01, 5.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.60000000e+02, 2.22000000e+03, 0.00000000e+00, 1.60000000e+01], [ 
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.15000000e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.00000000e+00, 0.00000000e+00, 4.00000000e+00, 5.90000000e+01, 4.08000000e+02, 1.80000000e+01, 2.42000000e+02, 5.00000000e+01, 6.50000000e+01, 5.00000000e+01, 0.00000000e+00, 0.00000000e+00, 6.70000000e+01], [ 2.10000000e+01, 2.70000000e+01, 0.00000000e+00, 0.00000000e+00, 2.46000000e+02, 0.00000000e+00, 2.60000000e+01, 2.32000000e+02, 4.00000000e+00, 0.00000000e+00, 6.09000000e+02, 0.00000000e+00, 4.30000000e+01, 2.00000000e+01, 6.00000000e+00, 7.40000000e+01, 3.40000000e+01, 1.00000000e+02, 1.20000000e+01, 2.50000000e+01], [ 7.60000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.10000000e+01, 0.00000000e+00, 0.00000000e+00, 3.78000000e+02, 5.90000000e+01, 6.09000000e+02, 0.00000000e+00, 2.10000000e+01, 0.00000000e+00, 2.20000000e+01, 0.00000000e+00, 4.70000000e+01, 6.91000000e+02, 8.32000000e+02, 1.30000000e+01, 0.00000000e+00], [ 2.00000000e+00, 0.00000000e+00, 8.64000000e+02, 0.00000000e+00, 6.00000000e+00, 4.70000000e+01, 4.58000000e+02, 1.90000000e+01, 4.08000000e+02, 0.00000000e+00, 2.10000000e+01, 0.00000000e+00, 3.30000000e+01, 8.00000000e+00, 4.00000000e+00, 4.46000000e+02, 1.10000000e+02, 0.00000000e+00, 6.00000000e+00, 1.56000000e+02], [ 5.30000000e+01, 0.00000000e+00, 2.00000000e+00, 0.00000000e+00, 1.70000000e+01, 0.00000000e+00, 5.30000000e+01, 5.00000000e+00, 1.80000000e+01, 4.30000000e+01, 0.00000000e+00, 3.30000000e+01, 0.00000000e+00, 5.10000000e+01, 9.00000000e+00, 2.02000000e+02, 7.80000000e+01, 0.00000000e+00, 7.00000000e+00, 8.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 4.90000000e+01, 2.74000000e+02, 0.00000000e+00, 0.00000000e+00, 5.50000000e+02, 0.00000000e+00, 2.42000000e+02, 2.00000000e+01, 2.20000000e+01, 8.00000000e+00, 5.10000000e+01, 0.00000000e+00, 2.46000000e+02, 3.00000000e+01, 0.00000000e+00, 3.30000000e+01, 0.00000000e+00, 5.40000000e+01], [ 3.20000000e+01, 1.86000000e+02, 0.00000000e+00, 
0.00000000e+00, 0.00000000e+00, 1.80000000e+01, 2.32000000e+02, 0.00000000e+00, 5.00000000e+01, 6.00000000e+00, 0.00000000e+00, 4.00000000e+00, 9.00000000e+00, 2.46000000e+02, 0.00000000e+00, 3.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.60000000e+01, 0.00000000e+00], [ 3.42000000e+02, 3.47000000e+02, 1.60000000e+01, 2.10000000e+01, 9.00000000e+01, 1.12000000e+02, 2.00000000e+01, 0.00000000e+00, 6.50000000e+01, 7.40000000e+01, 4.70000000e+01, 4.46000000e+02, 2.02000000e+02, 3.00000000e+01, 3.00000000e+00, 0.00000000e+00, 6.14000000e+02, 0.00000000e+00, 1.70000000e+01, 1.07000000e+02], [ 6.81000000e+02, 1.14000000e+02, 0.00000000e+00, 4.00000000e+00, 8.00000000e+00, 0.00000000e+00, 1.00000000e+00, 3.60000000e+02, 5.00000000e+01, 3.40000000e+01, 6.91000000e+02, 1.10000000e+02, 7.80000000e+01, 0.00000000e+00, 0.00000000e+00, 6.14000000e+02, 0.00000000e+00, 2.37000000e+02, 0.00000000e+00, 0.00000000e+00], [ 3.98000000e+02, 0.00000000e+00, 1.00000000e+01, 2.00000000e+01, 6.00000000e+00, 5.00000000e+00, 0.00000000e+00, 2.22000000e+03, 0.00000000e+00, 1.00000000e+02, 8.32000000e+02, 0.00000000e+00, 0.00000000e+00, 3.30000000e+01, 0.00000000e+00, 0.00000000e+00, 2.37000000e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 5.00000000e+00, 6.50000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.20000000e+01, 1.30000000e+01, 6.00000000e+00, 7.00000000e+00, 0.00000000e+00, 1.60000000e+01, 1.70000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.40000000e+01], [ 0.00000000e+00, 5.30000000e+02, 0.00000000e+00, 0.00000000e+00, 6.82000000e+02, 1.00000000e+00, 1.52500000e+03, 1.60000000e+01, 6.70000000e+01, 2.50000000e+01, 0.00000000e+00, 1.56000000e+02, 8.00000000e+00, 5.40000000e+01, 0.00000000e+00, 1.07000000e+02, 0.00000000e+00, 0.00000000e+00, 1.40000000e+01, 0.00000000e+00]]) AH96_mtmammals_freqs = { 'A': 0.069199999999999998, 'C': 0.0064999999999999997, 'E': 0.023599999999999999, 
'D': 0.018599999999999998, 'G': 0.0557, 'F': 0.061100000000000002, 'I': 0.090499999999999997, 'H': 0.027699999999999999, 'K': 0.022100000000000002, 'M': 0.056099999999999997, 'L': 0.16750000000000001, 'N': 0.040000000000000001, 'Q': 0.023800000000000002, 'P': 0.053600000000000002, 'S': 0.072499999999999995, 'R': 0.0184, 'T': 0.086999999999999994, 'W': 0.0293, 'V': 0.042799999999999998, 'Y': 0.034000000000000002} WG01_matrix = numpy.array( [[ 0. , 1.02704 , 0.738998, 1.58285 , 0.210494, 1.41672 , 0.316954, 0.193335, 0.906265, 0.397915, 0.893496, 0.509848, 1.43855 , 0.908598, 0.551571, 3.37079 , 2.12111 , 2.00601 , 0.113133, 0.240735 ], [ 1.02704 , 0. , 0.0302949, 0.021352, 0.39802 , 0.306674, 0.248972, 0.170135, 0.0740339, 0.384287, 0.390482, 0.265256, 0.109404, 0.0988179, 0.528191, 1.40766 , 0.512984, 1.00214 , 0.71707 , 0.543833 ], [ 0.738998, 0.0302949, 0. , 6.17416 , 0.0467304, 0.865584, 0.930676, 0.039437, 0.479855, 0.0848047, 0.103754, 5.42942 , 0.423984, 0.616783, 0.147304, 1.07176 , 0.374866, 0.152335, 0.129767, 0.325711 ], [ 1.58285 , 0.021352, 6.17416 , 0. , 0.0811339, 0.567717, 0.570025, 0.127395, 2.58443 , 0.154263, 0.315124, 0.947198, 0.682355, 5.46947 , 0.439157, 0.704939, 0.822765, 0.588731, 0.156557, 0.196303 ], [ 0.210494, 0.39802 , 0.0467304, 0.0811339, 0. , 0.049931, 0.679371, 1.05947 , 0.088836, 2.11517 , 1.19063 , 0.0961621, 0.161444, 0.0999208, 0.102711, 0.545931, 0.171903, 0.649892, 1.52964 , 6.45428 ], [ 1.41672 , 0.306674, 0.865584, 0.567717, 0.049931, 0. , 0.24941 , 0.0304501, 0.373558, 0.0613037, 0.1741 , 1.12556 , 0.24357 , 0.330052, 0.584665, 1.34182 , 0.225833, 0.187247, 0.336983, 0.103604 ], [ 0.316954, 0.248972, 0.930676, 0.570025, 0.679371, 0.24941 , 0. , 0.13819 , 0.890432, 0.499462, 0.404141, 3.95629 , 0.696198, 4.29411 , 2.13715 , 0.740169, 0.473307, 0.118358, 0.262569, 3.87344 ], [ 0.193335, 0.170135, 0.039437, 0.127395, 1.05947 , 0.0304501, 0.13819 , 0. 
, 0.323832, 3.17097 , 4.25746 , 0.554236, 0.0999288, 0.113917, 0.186979, 0.31944 , 1.45816 , 7.8213 , 0.212483, 0.42017 ], [ 0.906265, 0.0740339, 0.479855, 2.58443 , 0.088836, 0.373558, 0.890432, 0.323832, 0. , 0.257555, 0.934276, 3.01201 , 0.556896, 3.8949 , 5.35142 , 0.96713 , 1.38698 , 0.305434, 0.137505, 0.133264 ], [ 0.397915, 0.384287, 0.0848047, 0.154263, 2.11517 , 0.0613037, 0.499462, 3.17097 , 0.257555, 0. , 4.85402 , 0.131528, 0.415844, 0.869489, 0.497671, 0.344739, 0.326622, 1.80034 , 0.665309, 0.398618 ], [ 0.893496, 0.390482, 0.103754, 0.315124, 1.19063 , 0.1741 , 0.404141, 4.25746 , 0.934276, 4.85402 , 0. , 0.198221, 0.171329, 1.54526 , 0.683162, 0.493905, 1.51612 , 2.05845 , 0.515706, 0.428437 ], [ 0.509848, 0.265256, 5.42942 , 0.947198, 0.0961621, 1.12556 , 3.95629 , 0.554236, 3.01201 , 0.131528, 0.198221, 0. , 0.195081, 1.54364 , 0.635346, 3.97423 , 2.03006 , 0.196246, 0.0719167, 1.086 ], [ 1.43855 , 0.109404, 0.423984, 0.682355, 0.161444, 0.24357 , 0.696198, 0.0999288, 0.556896, 0.415844, 0.171329, 0.195081, 0. , 0.933372, 0.679489, 1.61328 , 0.795384, 0.314887, 0.139405, 0.216046 ], [ 0.908598, 0.0988179, 0.616783, 5.46947 , 0.0999208, 0.330052, 4.29411 , 0.113917, 3.8949 , 0.869489, 1.54526 , 1.54364 , 0.933372, 0. , 3.0355 , 1.02887 , 0.857928, 0.301281, 0.215737, 0.22771 ], [ 0.551571, 0.528191, 0.147304, 0.439157, 0.102711, 0.584665, 2.13715 , 0.186979, 5.35142 , 0.497671, 0.683162, 0.635346, 0.679489, 3.0355 , 0. , 1.22419 , 0.554413, 0.251849, 1.16392 , 0.381533 ], [ 3.37079 , 1.40766 , 1.07176 , 0.704939, 0.545931, 1.34182 , 0.740169, 0.31944 , 0.96713 , 0.344739, 0.493905, 3.97423 , 1.61328 , 1.02887 , 1.22419 , 0. , 4.37802 , 0.232739, 0.523742, 0.786993 ], [ 2.12111 , 0.512984, 0.374866, 0.822765, 0.171903, 0.225833, 0.473307, 1.45816 , 1.38698 , 0.326622, 1.51612 , 2.03006 , 0.795384, 0.857928, 0.554413, 4.37802 , 0. 
, 1.38823 , 0.110864, 0.291148 ], [ 2.00601 , 1.00214 , 0.152335, 0.588731, 0.649892, 0.187247, 0.118358, 7.8213 , 0.305434, 1.80034 , 2.05845 , 0.196246, 0.314887, 0.301281, 0.251849, 0.232739, 1.38823 , 0. , 0.365369, 0.31473 ], [ 0.113133, 0.71707 , 0.129767, 0.156557, 1.52964 , 0.336983, 0.262569, 0.212483, 0.137505, 0.665309, 0.515706, 0.0719167, 0.139405, 0.215737, 1.16392 , 0.523742, 0.110864, 0.365369, 0. , 2.48539 ], [ 0.240735, 0.543833, 0.325711, 0.196303, 6.45428 , 0.103604, 3.87344 , 0.42017 , 0.133264, 0.398618, 0.428437, 1.086 , 0.216046, 0.22771 , 0.381533, 0.786993, 0.291148, 0.31473 , 2.48539 , 0. ]]) WG01_freqs = { 'A': 0.086627908662790867, 'C': 0.019307801930780195, 'E': 0.058058905805890577, 'D': 0.057045105704510574, 'G': 0.083251808325180837, 'F': 0.038431903843190382, 'I': 0.048466004846600491, 'H': 0.024431302443130246, 'K': 0.062028606202860624, 'M': 0.019502701950270197, 'L': 0.086209008620900862, 'N': 0.039089403908940397, 'Q': 0.036728103672810368, 'P': 0.045763104576310464, 'S': 0.069517906951790692, 'R': 0.043972004397200441, 'T': 0.061012706101270617, 'W': 0.014385901438590145, 'V': 0.070895607089560719, 'Y': 0.035274203527420354} def DSO78(**kw): """Dayhoff et al 1978 empirical protein model Dayhoff, MO, Schwartz RM, and Orcutt, BC. 1978 A model of evolutionary change in proteins. Pp. 345-352. Atlas of protein sequence and structure, Vol 5, Suppl. 3. National Biomedical Research Foundation, Washington D. C Matrix imported from PAML dayhoff.dat file""" sm = substitution_model.EmpiricalProteinMatrix( DSO78_matrix, DSO78_freqs, name='DSO78', **kw) return sm def JTT92(**kw): """Jones, Taylor and Thornton 1992 empirical protein model Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992 Jun;8(3):275-82. 
Matrix imported from PAML jones.dat file""" sm = substitution_model.EmpiricalProteinMatrix( JTT92_matrix, JTT92_freqs, name='JTT92', **kw) return sm def AH96(**kw): """Adachi and Hasegawa 1996 empirical model for mitochondrial proteins. Adachi J, Hasegawa M. Model of amino acid substitution in proteins encoded by mitochondrial DNA. J Mol Evol. 1996 Apr;42(4):459-68. Matrix imported from PAML mtREV24.dat file""" sm = substitution_model.EmpiricalProteinMatrix( AH96_matrix, AH96_freqs, name='AH96_mtREV24', **kw) return sm def mtREV(**kw): return AH96(**kw) def AH96_mtmammals(**kw): """Adachi and Hasegawa 1996 empirical model for mammalian mitochondrial proteins. Adachi J, Hasegawa M. Model of amino acid substitution in proteins encoded by mitochondrial DNA. J Mol Evol. 1996 Apr;42(4):459-68. Matrix imported from PAML mtmam.dat file""" sm = substitution_model.EmpiricalProteinMatrix( AH96_mtmammals_matrix, AH96_mtmammals_freqs, name='AH96_mtmammals', **kw) return sm def mtmam(**kw): return AH96_mtmammals(**kw) def WG01(**kw): """Whelan and Goldman 2001 empirical model for globular proteins. Whelan S, Goldman N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001 May;18(5):691-9. 
Matrix imported from PAML wag.dat file""" sm = substitution_model.EmpiricalProteinMatrix( WG01_matrix, WG01_freqs, name='WG01', **kw) return sm PyCogent-1.5.3/cogent/evolve/motif_prob_model.py000644 000765 000024 00000025602 12024702176 022717 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import numpy import warnings import substitution_calculation from cogent.evolve.likelihood_tree import makeLikelihoodTreeLeaf __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def makeModel(mprob_model, tuple_alphabet, mask): if mprob_model == "monomers": return PosnSpecificMonomerProbModel(tuple_alphabet, mask) elif mprob_model == "monomer": return MonomerProbModel(tuple_alphabet, mask) elif mprob_model == "conditional": return ConditionalMotifProbModel(tuple_alphabet, mask) elif mprob_model in ["word", "tuple", None]: return SimpleMotifProbModel(tuple_alphabet) else: raise ValueError("Unknown mprob model '%s'" % str(mprob_model)) class MotifProbModel(object): def __init__(self, *whatever, **kw): raise NotImplementedError def calcWordProbs(self, *monomer_probs): assert len(monomer_probs) == 1 return monomer_probs[0] def calcWordWeightMatrix(self, *monomer_probs): assert len(monomer_probs) == 1 return monomer_probs[0] def makeMotifProbsDefn(self): """Makes the first part of a parameter controller definition for this model, the calculation of motif probabilities""" return substitution_calculation.PartitionDefn( name="mprobs", default=None, dimensions = ('locus','edge'), dimension=('motif', tuple(self.getInputAlphabet()))) def setParamControllerMotifProbs(self, pc, motif_probs, **kw): pc.setParamRule('mprobs', value=motif_probs, **kw) def countMotifs(self, alignment, include_ambiguity=False, recode_gaps=True): result = None for seq_name in alignment.getSeqNames(): 
sequence = alignment.getGappedSeq(seq_name, recode_gaps) leaf = makeLikelihoodTreeLeaf(sequence, self.getCountedAlphabet(), seq_name) count = leaf.getMotifCounts(include_ambiguity=include_ambiguity) if result is None: result = count.copy() else: result += count return result def adaptMotifProbs(self, motif_probs, auto=False): motif_probs = self.getInputAlphabet().adaptMotifProbs(motif_probs) assert abs(sum(motif_probs)-1.0) < 0.0001, motif_probs return motif_probs def makeEqualMotifProbs(self): alphabet = self.getInputAlphabet() p = 1.0/len(alphabet) return dict([(m,p) for m in alphabet]) def makeSampleMotifProbs(self): import random motif_probs = numpy.array( [random.uniform(0.2, 1.0) for m in self.getCountedAlphabet()]) motif_probs /= sum(motif_probs) return motif_probs class SimpleMotifProbModel(MotifProbModel): def __init__(self, alphabet): self.alphabet = alphabet def getInputAlphabet(self): return self.alphabet def getCountedAlphabet(self): return self.alphabet def makeMotifWordProbDefns(self): monomer_probs = self.makeMotifProbsDefn() return (monomer_probs, monomer_probs, monomer_probs) class ComplexMotifProbModel(MotifProbModel): def __init__(self, tuple_alphabet, mask): """Arguments: - tuple_alphabet: series of multi-letter motifs - monomers: the monomers from which the motifs are made - mask: instantaneous change matrix""" self.mask = mask self.tuple_alphabet = tuple_alphabet self.monomer_alphabet = monomers = tuple_alphabet.MolType.Alphabet self.word_length = length = tuple_alphabet.getMotifLen() size = len(tuple_alphabet) # m2w[AC, 1] = C # w2m[0, AC, A] = True # w2c[ATC, AT*] = 1 self.m2w = m2w = numpy.zeros([size, length], int) self.w2m = w2m = numpy.zeros([length, size, len(monomers)], int) contexts = monomers.getWordAlphabet(length-1) self.w2c = w2c = numpy.zeros([size, length*len(contexts)], int) for (i, word) in enumerate(tuple_alphabet): for j in range(length): monomer = monomers.index(word[j]) context = contexts.index(word[:j]+word[j+1:]) m2w[i, 
j] = monomer w2m[j, i, monomer] = 1 w2c[i, context*length+j] = 1 self.mutated_posn = numpy.zeros(mask.shape, int) self.mutant_motif = numpy.zeros(mask.shape, int) self.context_indices = numpy.zeros(mask.shape, int) for (i, old_word, j, new_word, diff) in self._mutations(): self.mutated_posn[i,j] = diff mutant_motif = new_word[diff] context = new_word[:diff]+new_word[diff+1:] self.mutant_motif[i,j] = monomers.index(mutant_motif) c = contexts.index(context) self.context_indices[i,j] = c * length + diff def _mutations(self): diff_pos = lambda x,y: [i for i in range(len(x)) if x[i] != y[i]] num_states = len(self.tuple_alphabet) for i in range(num_states): old_word = self.tuple_alphabet[i] for j in range(num_states): new_word = self.tuple_alphabet[j] if self.mask[i,j]: assert self.mask[i,j] == 1.0 diffs = diff_pos(old_word, new_word) assert len(diffs) == 1, (old_word, new_word) diff = diffs[0] yield i, old_word, j, new_word, diff class MonomerProbModel(ComplexMotifProbModel): def getInputAlphabet(self): return self.monomer_alphabet def getCountedAlphabet(self): return self.monomer_alphabet def calcMonomerProbs(self, word_probs): monomer_probs = numpy.dot(word_probs, self.w2m.sum(axis=0)) monomer_probs /= monomer_probs.sum() return monomer_probs def calcWordProbs(self, monomer_probs): result = numpy.product(monomer_probs.take(self.m2w), axis=-1) # maybe simpler but slower, works ok: #result = numpy.product(monomer_probs ** (w2m, axis=-1)) result /= result.sum() return result def calcWordWeightMatrix(self, monomer_probs): result = monomer_probs.take(self.mutant_motif) * self.mask return result def makeMotifWordProbDefns(self): monomer_probs = self.makeMotifProbsDefn() word_probs = substitution_calculation.CalcDefn( self.calcWordProbs, name="wprobs")(monomer_probs) mprobs_matrix = substitution_calculation.CalcDefn( self.calcWordWeightMatrix, name="mprobs_matrix")(monomer_probs) return (monomer_probs, word_probs, mprobs_matrix) def adaptMotifProbs(self, motif_probs, 
auto=False): try: motif_probs = self.monomer_alphabet.adaptMotifProbs(motif_probs) except ValueError: motif_probs = self.tuple_alphabet.adaptMotifProbs(motif_probs) if not auto: warnings.warn('Motif probs overspecified', stacklevel=5) motif_probs = self.calcMonomerProbs(motif_probs) return motif_probs class PosnSpecificMonomerProbModel(MonomerProbModel): def getCountedAlphabet(self): return self.tuple_alphabet def calcPosnSpecificMonomerProbs(self, word_probs): monomer_probs = numpy.dot(word_probs, self.w2m) monomer_probs /= monomer_probs.sum(axis=1)[..., numpy.newaxis] return list(monomer_probs) def calcWordProbs(self, monomer_probs): positions = range(self.word_length) assert len(monomer_probs) == self.m2w.shape[1], ( len(monomer_probs), type(monomer_probs), self.m2w.shape) result = numpy.product( [monomer_probs[i].take(self.m2w[:,i]) for i in positions], axis=0) result /= result.sum() return result def calcWordWeightMatrix(self, monomer_probs): positions = range(self.word_length) monomer_probs = numpy.array(monomer_probs) # so [posn, motif] size = monomer_probs.shape[-1] # should be constant extended_indices = self.mutated_posn * size + self.mutant_motif #print size, self.word_length #for a in [extended_indices, self.mutated_posn, self.mutant_motif, # monomer_probs]: # print a.shape, a.max() result = monomer_probs.take(extended_indices) * self.mask return result def makeMotifWordProbDefns(self): monomer_probs = substitution_calculation.PartitionDefn( name="psmprobs", default=None, dimensions = ('locus', 'position', 'edge'), dimension=('motif', tuple(self.getInputAlphabet()))) monomer_probs3 = monomer_probs.acrossDimension('position', [ str(i) for i in range(self.word_length)]) monomer_probs3 = substitution_calculation.CalcDefn( lambda *x:numpy.array(x), name='mprobs')(*monomer_probs3) word_probs = substitution_calculation.CalcDefn( self.calcWordProbs, name="wprobs")(monomer_probs3) mprobs_matrix = substitution_calculation.CalcDefn( self.calcWordWeightMatrix, 
name="mprobs_matrix")( monomer_probs3) return (monomer_probs, word_probs, mprobs_matrix) def setParamControllerMotifProbs(self, pc, motif_probs, **kw): assert len(motif_probs) == self.word_length for (i,m) in enumerate(motif_probs): pc.setParamRule('psmprobs', value=m, position=str(i), **kw) def adaptMotifProbs(self, motif_probs, auto=False): try: motif_probs = self.monomer_alphabet.adaptMotifProbs(motif_probs) except ValueError: motif_probs = self.tuple_alphabet.adaptMotifProbs(motif_probs) motif_probs = self.calcPosnSpecificMonomerProbs(motif_probs) else: motif_probs = [motif_probs] * self.word_length return motif_probs class ConditionalMotifProbModel(ComplexMotifProbModel): def getInputAlphabet(self): return self.tuple_alphabet def getCountedAlphabet(self): return self.tuple_alphabet def calcWordWeightMatrix(self, motif_probs): context_probs = numpy.dot(motif_probs, self.w2c) context_probs[context_probs==0.0] = numpy.inf result = motif_probs / context_probs.take(self.context_indices) return result def makeMotifWordProbDefns(self): mprobs = self.makeMotifProbsDefn() mprobs_matrix = substitution_calculation.CalcDefn( self.calcWordWeightMatrix, name="mprobs_matrix")(mprobs) return (mprobs, mprobs, mprobs_matrix) PyCogent-1.5.3/cogent/evolve/pairwise_distance.py000644 000765 000024 00000033510 12024702176 023071 0ustar00jrideoutstaff000000 000000 from __future__ import division from numpy import log, zeros, float64, int32, array, sqrt, dot, diag, where from numpy.linalg import det, norm, inv from cogent import DNA, RNA, LoadTable from cogent.util.progress_display import display_wrap __author__ = "Gavin Huttley and Yicheng Zhu" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Yicheng Zhu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Alpha" # pending addition of protein distance metrics def _same_moltype(ref, query): """if ref and query have 
the same states""" return set(ref) == set(query) def get_pyrimidine_indices(moltype): """returns pyrimidine indices for the moltype""" states = list(moltype) if _same_moltype(RNA, moltype): return map(states.index, 'CU') elif _same_moltype(DNA, moltype): return map(states.index, 'CT') else: raise RuntimeError('Non-nucleic acid MolType') def get_purine_indices(moltype): """returns purine indices for the moltype""" states = list(moltype) if not _same_moltype(RNA, moltype) and not _same_moltype(DNA, moltype): raise RuntimeError('Non-nucleic acid MolType') return map(states.index, 'AG') def get_matrix_diff_coords(indices): """returns coordinates for off diagonal elements""" return [(i,j) for i in indices for j in indices if i != j] def get_moltype_index_array(moltype, invalid=-9): """returns the index array for a molecular type""" canonical_chars = list(moltype) # maximum ordinal for an allowed character, this defines the length of # the required numpy array max_ord = max(map(ord, moltype.All.keys())) char_to_index = zeros(max_ord+1, int32) # all non canonical_chars are ``invalid'' char_to_index.fill(invalid) for i in range(len(canonical_chars)): c = canonical_chars[i] o = ord(c) char_to_index[o] = i return char_to_index def seq_to_indices(seq, char_to_index): """returns an array with sequence characters replaced by their index""" ords = map(ord, seq) indices = char_to_index.take(ords) return indices def _fill_diversity_matrix(matrix, seq1, seq2): """fills the diversity matrix for valid positions. 
Assumes the provided sequences have been converted to indices with invalid characters being negative numbers (use get_moltype_index_array plus seq_to_indices).""" paired = array([seq1, seq2]).T paired = paired[paired.min(axis=1) >= 0] for i in range(len(paired)): matrix[paired[i][0], paired[i][1]] += 1 def _jc69_from_matrix(matrix): """computes JC69 stats from a diversity matrix""" invalid = None, None, None, None total = matrix.sum() diffs = total - sum(matrix[i,i] for i in range(matrix.shape[0])) if total == 0: return invalid p = diffs / total if p >= 0.75: # cannot take log return invalid factor = (1 - (4 / 3) * p) dist = -3.0 * log(factor) / 4 var = p * (1 - p) / (factor * factor * total) return total, p, dist, var def _tn93_from_matrix(matrix, freqs, pur_indices, pyr_indices, pur_coords, pyr_coords, tv_coords): invalid = None, None, None, None total = matrix.sum() freqs = matrix.sum(axis=0) + matrix.sum(axis=1) freqs /= (2*total) if total == 0: return invalid # p = matrix.take(pur_coords + pyr_coords + tv_coords).sum() / total freq_purs = freqs.take(pur_indices).sum() prod_purs = freqs.take(pur_indices).prod() freq_pyrs = freqs.take(pyr_indices).sum() prod_pyrs = freqs.take(pyr_indices).prod() # purine transition diffs pur_ts_diffs = matrix.take(pur_coords).sum() pur_ts_diffs /= total # pyr transition diffs pyr_ts_diffs = matrix.take(pyr_coords).sum() pyr_ts_diffs /= total # transversions tv_diffs = matrix.take(tv_coords).sum() / total coeff1 = 2 * prod_purs / freq_purs coeff2 = 2 * prod_pyrs / freq_pyrs coeff3 = 2 * (freq_purs * freq_pyrs - \ (prod_purs * freq_pyrs / freq_purs) -\ (prod_pyrs * freq_purs / freq_pyrs)) term1 = 1 - pur_ts_diffs / coeff1 - tv_diffs / (2*freq_purs) term2 = 1 - pyr_ts_diffs / coeff2 - tv_diffs / (2*freq_pyrs) term3 = 1 - tv_diffs / (2 * freq_purs * freq_pyrs) if term1 <= 0 or term2 <= 0 or term3 <= 0: # log will fail return invalid dist = -coeff1 * log(term1) - coeff2 * log(term2) - coeff3 * log(term3) v1 = 1 / term1 v2 = 1 / term2 
v3 = 1 / term3 v4 = (coeff1 * v1 / (2 * freq_purs)) + \ (coeff2 * v2 / (2 * freq_pyrs)) + \ (coeff3 * v3 / (2 * freq_purs * freq_pyrs)) var = v1**2 * pur_ts_diffs + v2**2 * pyr_ts_diffs + v4**2 * tv_diffs - \ (v1 * pur_ts_diffs + v2 * pyr_ts_diffs + v4 * tv_diffs)**2 var /= total return total, p, dist, var def _logdet(matrix, use_tk_adjustment=True): """returns the LogDet from a diversity matrix Arguments: - use_tk_adjustment: when True, unequal state frequencies are allowed """ invalid = None, None, None, None total = matrix.sum() diffs = total - sum(matrix[i,i] for i in range(matrix.shape[0])) if total == 0: return invalid p = diffs / total if diffs == 0: # seqs identical return total, p, 0.0, None # we replace missing diagonal states with a frequency of 0.5, # then normalise frequency = matrix.copy() unobserved = where(frequency.diagonal() == 0)[0] for index in unobserved: frequency[index, index] = 0.5 frequency /= frequency.sum() # the inverse matrix of frequency, every element is squared M_matrix = inv(frequency)**2 freqs_1 = frequency.sum(axis = 0) freqs_2 = frequency.sum(axis = 1) if use_tk_adjustment: mean_state_freqs = (freqs_1 + freqs_2) / 2 coeff = (norm(mean_state_freqs)**2 - 1) / (matrix.shape[0] - 1) else: coeff = -1 / matrix.shape[0] FM_1 = diag(freqs_1) FM_2 = diag(freqs_2) try: d_xy = coeff * log(det(frequency) / sqrt(det(FM_1 * FM_2))) except FloatingPointError: return invalid if det(frequency) <= 0: #if the result is nan return invalid var_term = dot(M_matrix, frequency).transpose()[0].sum() var_denom = 16 * total if use_tk_adjustment: var = (var_term - (1 / sqrt(freqs_1 * freqs_2)).sum()) / var_denom else: # variance formula for TK adjustment is false var = (var_term - 1) / var_denom var = d_xy - 2 * var return total, p, d_xy, var try: from _pairwise_distance import \ _fill_diversity_matrix as fill_diversity_matrix # raise ImportError # for testing except ImportError: fill_diversity_matrix = _fill_diversity_matrix def 
_number_formatter(template): """flexible number formatter""" def call(val): try: result = template % val except TypeError: result = val return result return call class _PairwiseDistance(object): """base class for computing pairwise distances""" def __init__(self, moltype, invalid=-9, alignment=None): super(_PairwiseDistance, self).__init__() self.moltype = moltype self.char_to_indices = get_moltype_index_array(moltype) self._dim = len(list(moltype)) self._dists = None self.Names = None self.IndexedSeqs = None if alignment is not None: self._convert_seqs_to_indices(alignment) self._func_args = [] def _convert_seqs_to_indices(self, alignment): assert type(alignment.MolType) == type(self.moltype), \ 'Alignment does not have correct MolType' self._dists = {} self.Names = alignment.Names[:] indexed_seqs = [] for name in self.Names: seq = alignment.getGappedSeq(name) indexed = seq_to_indices(str(seq), self.char_to_indices) indexed_seqs.append(indexed) self.IndexedSeqs = array(indexed_seqs) @staticmethod def func(): pass # over ride in subclasses @display_wrap def run(self, alignment=None, ui=None): """computes the pairwise distances""" if alignment is not None: self._convert_seqs_to_indices(alignment) matrix = zeros((self._dim, self._dim), float64) done = 0.0 to_do = (len(self.Names) * len(self.Names) - 1) / 2 for i in range(len(self.Names)-1): name_1 = self.Names[i] s1 = self.IndexedSeqs[i] for j in range(i+1, len(self.Names)): name_2 = self.Names[j] ui.display('%s vs %s' % (name_1, name_2), done / to_do ) done += 1 matrix.fill(0) s2 = self.IndexedSeqs[j] fill_diversity_matrix(matrix, s1, s2) total, p, dist, var = self.func(matrix, *self._func_args) self._dists[(name_1, name_2)] = (total, p, dist, var) self._dists[(name_2, name_1)] = (total, p, dist, var) def getPairwiseDistances(self): """returns a 2D dictionary of pairwise distances.""" if self._dists is None: return None dists = {} for name_1 in self.Names: for name_2 in self.Names: if name_1 == name_2: continue val 
= self._dists[(name_1, name_2)][2] dists[(name_1, name_2)] = val dists[(name_2, name_1)] = val return dists def _get_stats(self, stat, transform=None, **kwargs): """returns a table for the indicated statistics""" if self._dists is None: return None rows = [] for row_name in self.Names: row = [row_name] for col_name in self.Names: if row_name == col_name: row.append('') continue val = self._dists[(row_name, col_name)][stat] if transform is not None: val = transform(val) row.append(val) rows.append(row) header = [r'Seq1 \ Seq2'] + self.Names table = LoadTable(header=header, rows=rows, row_ids = True, missing_data='*', **kwargs) return table @property def Dists(self): kwargs = dict(title='Pairwise Distances', digits=4) return self._get_stats(2, **kwargs) @property def StdErr(self): stderr = lambda x: sqrt(x) kwargs = dict(title='Standard Error of Pairwise Distances', digits=4) return self._get_stats(3, transform=stderr, **kwargs) @property def Variances(self): kwargs = dict(title='Variances of Pairwise Distances', digits=4) table = self._get_stats(3, **kwargs) var_formatter = _number_formatter("%.2e") if table is not None: for name in self.Names: table.setColumnFormat(name, var_formatter) return table @property def Proportions(self): kwargs = dict(title='Proportion variable sites', digits=4) return self._get_stats(1, **kwargs) @property def Lengths(self): kwargs = dict(title='Pairwise Aligned Lengths', digits=0) return self._get_stats(0, **kwargs) class _NucleicSeqPair(_PairwiseDistance): """docstring for _NucleicSeqPair""" def __init__(self, *args, **kwargs): super(_NucleicSeqPair, self).__init__(*args, **kwargs) if not _same_moltype(DNA, self.moltype) and \ not _same_moltype(RNA, self.moltype): raise RuntimeError('Invalid MolType for this metric') class JC69Pair(_NucleicSeqPair): """calculator for pairwise alignments""" def __init__(self, *args, **kwargs): """states: the valid sequence states""" super(JC69Pair, self).__init__(*args, **kwargs) self.func = 
_jc69_from_matrix class TN93Pair(_NucleicSeqPair): """calculator for pairwise alignments""" def __init__(self, *args, **kwargs): """states: the valid sequence states""" super(TN93Pair, self).__init__(*args, **kwargs) self._freqs = zeros(self._dim, float64) self.pur_indices = get_purine_indices(self.moltype) self.pyr_indices = get_pyrimidine_indices(self.moltype) # matrix coordinates self.pyr_coords = get_matrix_diff_coords(self.pyr_indices) self.pur_coords = get_matrix_diff_coords(self.pur_indices) self.tv_coords = get_matrix_diff_coords(range(self._dim)) for coord in self.pur_coords + self.pyr_coords: self.tv_coords.remove(coord) # flattened self.pyr_coords = [i * 4 + j for i, j in self.pyr_coords] self.pur_coords = [i * 4 + j for i, j in self.pur_coords] self.tv_coords = [i * 4 + j for i, j in self.tv_coords] self.func = _tn93_from_matrix self._func_args = [self._freqs, self.pur_indices, self.pyr_indices, self.pur_coords, self.pyr_coords, self.tv_coords] class LogDetPair(_PairwiseDistance): """computes logdet distance between sequence pairs""" def __init__(self, use_tk_adjustment=True, *args, **kwargs): """Arguments: - use_tk_adjustment: use the correction of Tamura and Kumar 2002 """ super(LogDetPair, self).__init__(*args, **kwargs) self.func = _logdet self._func_args = [use_tk_adjustment] def run(self, use_tk_adjustment=None, *args, **kwargs): if use_tk_adjustment is not None: self._func_args = [use_tk_adjustment] super(LogDetPair, self).run(*args, **kwargs) PyCogent-1.5.3/cogent/evolve/parameter_controller.py000644 000765 000024 00000041124 12024702176 023617 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ This file defines a class for controlling the scope and heterogeneity of parameters involved in a maximum-likelihood based tree analysis. 
""" from __future__ import with_statement import numpy import pickle, warnings from cogent.core.tree import TreeError from cogent.evolve import likelihood_calculation from cogent.align import dp_calculation from cogent.evolve.likelihood_function import LikelihoodFunction as _LF from cogent.recalculation.scope import _indexed from cogent.maths.stats.information_criteria import aic, bic from cogent.align.pairwise import AlignableSeq from cogent.util.warning import discontinued, deprecated __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Andrew Butterfield", "Peter Maxwell", "Gavin Huttley", "Helen Lindsay"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.ed.au" __status__ = "Production" def _category_names(dimension, specified): if type(specified) is int: cats = ['%s%s' % (dimension, i) for i in range(specified)] else: cats = tuple(specified) assert len(cats) >= 1, cats assert len(set(cats)) == len(cats), ("%s names must be unique" % dimension) return list(cats) def load(filename): # first cut at saving pc's f = open(filename, 'rb') (version, info, pc) = pickle.load(f) assert version < 2.0, version pc.updateIntermediateValues() return pc class _LikelihoodParameterController(_LF): """A ParameterController works by setting parameter rules. For each parameter in the model the edges of the tree are be partitioned into groups that share one value. For usage see the setParamRule method. """ # Basically wrapper around the more generic recalulation.ParameterController # class, which doesn't know about trees. 
def __init__(self, model, tree, bins=1, loci=1, optimise_motif_probs=False, motif_probs_from_align=False, **kw): self.model = self._model = model self.tree = self._tree = tree self.seq_names = tree.getTipNames() self.locus_names = _category_names('locus', loci) self.bin_names = _category_names('bin', bins) self.posn_names = [str(i) for i in range(model.getWordLength())] self.motifs = self._motifs = model.getMotifs() self._mprob_motifs = list(model.getMprobAlphabet()) defn = self.makeLikelihoodDefn(**kw) super(_LF, self).__init__(defn) self.setDefaultParamRules() self.setDefaultTreeParameterRules() self.mprobs_from_alignment = motif_probs_from_align self.optimise_motif_probs = optimise_motif_probs self._name = '' self._format = {} def save(self, filename): f = open(filename, 'w') temp = {} try: for d in self.defns: temp[id(d)] = d.values del d.values pickle.dump((1.0, None, self), f) finally: for d in self.defns: if id(d) in temp: d.values = temp[id(d)] def setDefaultTreeParameterRules(self): """Lengths are set to the values found in the tree (if any), and free to be optimised independently. 
Other parameters are scoped based on the unique values found in the tree (if any) or default to having one value shared across the whole tree""" with self.updatesPostponed(): edges = self.tree.getEdgeVector() for par_name in self.model.getParamList(): try: values = dict([(edge.Name, edge.params[par_name]) for edge in edges if not edge.isroot()]) (uniq, index) = _indexed(values) except KeyError: continue # new parameter for (u, value) in enumerate(uniq): group = [edge for (edge, i) in index.items() if i==u] self.setParamRule(par_name, edges=group, init=value) for edge in edges: if edge.Length is not None: try: self.setParamRule('length', edge=edge.Name, init=edge.Length) except KeyError: # hopefully due to being a discrete model warnings.warn('Ignoring tree edge lengths', stacklevel=4) break def setMotifProbsFromData(self, align, locus=None, is_constant=None, include_ambiguity=False, is_independent=None, auto=False, pseudocount=None, **kwargs): if 'is_const' in kwargs: is_constant = kwargs.pop('is_const') deprecated('argument', 'is_const', 'is_constant', 1.6) counts = self.model.countMotifs(align, include_ambiguity=include_ambiguity) if is_constant is None: is_constant = not self.optimise_motif_probs if pseudocount is None: if is_constant: pseudocount = 0.0 else: pseudocount = 0.5 counts += pseudocount mprobs = counts/(1.0*sum(counts)) self.setMotifProbs(mprobs, locus=locus, is_constant=is_constant, is_independent=is_independent, auto=auto, **kwargs) def setMotifProbs(self, motif_probs, locus=None, bin=None, is_constant=None, is_independent=None, auto=False, **kwargs): if 'is_const' in kwargs: is_constant = kwargs.pop('is_const') deprecated('argument', 'is_const', 'is_constant', 1.6) motif_probs = self.model.adaptMotifProbs(motif_probs, auto=auto) if is_constant is None: is_constant = not self.optimise_motif_probs self.model.setParamControllerMotifProbs(self, motif_probs, is_constant=is_constant, bin=bin, locus=locus, is_independent=is_independent, **kwargs) if not 
auto: self.mprobs_from_alignment = False # should be done per-locus def setExpm(self, expm): assert expm in ['pade', 'either', 'eigen', 'checked'], expm self.setParamRule('expm', is_constant=True, value=expm) def makeCalculator(self, *args, **kw): if args: discontinued('method', "makeCalculator(aligns)", '1.6') # and shadowing a quite different superclass method. self.setAlignment(*args) if getattr(self, 'used_as_calculator', False): warnings.warn('PC used as two different calculators', stacklevel=2) self.used_as_calculator = True return self else: return super(_LF, self).makeCalculator(**kw) def _process_scope_info(self, edge=None, tip_names=None, edges=None, is_clade=None, is_stem=None, outgroup_name=None): """From information specifying the scope of a parameter derive a list of edge names""" if edges is not None: if tip_names or edge: raise TreeError("Only ONE of edge, edges or tip_names") elif edge is not None: if tip_names: raise TreeError("Only ONE of edge, edges or tip_names") edges = [edge] elif tip_names is None: edges = None # meaning all edges elif len(tip_names) != 2: raise TreeError("tip_names must contain 2 species") else: (species1, species2) = tip_names if is_stem is None: is_stem = False if is_clade is None: is_clade = not is_stem edges = self.tree.getEdgeNames(species1, species2, getstem=is_stem, getclade=is_clade, outgroup_name=outgroup_name) return edges def setParamRule(self, par_name, is_independent=None, is_constant=False, value=None, lower=None, init=None, upper=None, **scope_info): """Define a model constraint for par_name. Parameters can be set constant or split according to tree/bin scopes. Arguments: - par_name: The model parameter being modified. - is_constant, value: if True, the parameter is held constant at value, if provided, or the likelihood functions current value. - is_independent: whether the partition specified by scope/bin arguments are to be considered independent. 
- lower, init, upper: specify the lower bound, initial value and upper bound for optimisation. Can be set separately. - bin, bins: the name(s) of the bin to apply rule. - locus, loci: the name of the locus/loci to apply rule. - **scope_info: tree scope arguments - edge, edges: The name of the tree edge(s) affected by rule. - tip_names: a tuple of two tip names, specifying a tree scope to apply rule. - outgroup_name: A tip name that, provided along with tip_names, ensures a consistently specified tree scope. - is_clade: The rule applies to all edges descending from the most recent common ancestor defined by the tip_names+outgroup_name arguments. - is_stem: The rule applies to the edge preceding the most recent common ancestor defined by the tip_names+outgroup_name arguments. """ if 'is_const' in scope_info: is_constant = scope_info.pop('is_const') deprecated('argument', 'is_const', 'is_constant', 1.6) par_name = str(par_name) scopes = {} for (single, plural) in [ ('bin', 'bins'), ('locus', 'loci'), ('position', 'positions'), ('motif', 'motifs'), ]: if single in scope_info: v = scope_info.pop(single) if v: assert isinstance(v, basestring), ('%s=, maybe?' % plural) assert plural not in scope_info scopes[single] = [v] elif plural in scope_info: v = scope_info.pop(plural) if v: scopes[single] = v edges = self._process_scope_info(**scope_info) if edges: scopes['edge'] = edges if is_constant: assert not (init or lower or upper) elif init is not None: assert not value value = init self.assignAll(par_name, scopes, value, lower, upper, is_constant, is_independent) def setLocalClock(self, tip1name, tip2name): """Constrain branch lengths for tip1name and tip2name to be equal. This is a molecular clock condition. Currently only valid for tips connected to the same node. Note: This is just a convenient interface to setParamRule.
""" self.setParamRule("length", tip_names = [tip1name, tip2name], is_clade = 1, is_independent = 0) def setConstantLengths(self, tree=None, exclude_list=[]): """Constrains edge lengths to those in the tree. Arguments: - tree: must have the same topology as the current model. If not provided, the current tree length's are used. - exclude_list: a list of edge names whose branch lengths will be constrained. """ if tree is None: tree = self.tree with self.updatesPostponed(): for edge in tree.getEdgeVector(): if edge.Length is None or edge.Name in exclude_list: continue self.setParamRule("length", edge=edge.Name, is_constant=1, value=edge.Length) def getAic(self, second_order=False): """returns Aikake Information Criteria Arguments: - second_order: if true, the second-order AIC is returned, adjusted by the alignment length""" if second_order: sequence_length = sum(len(self.getParamValue('lht', locus=l).index) for l in self.locus_names) else: sequence_length = None lnL = self.getLogLikelihood() nfp = self.getNumFreeParams() return aic(lnL, nfp, sequence_length) def getBic(self): """returns the Bayesian Information Criteria""" sequence_length = sum(len(self.getParamValue('lht', locus=l).index) for l in self.locus_names) lnL = self.getLogLikelihood() nfp = self.getNumFreeParams() return bic(lnL, nfp, sequence_length) class AlignmentLikelihoodFunction(_LikelihoodParameterController): def setDefaultParamRules(self): try: self.assignAll( 'fixed_motif', None, value=-1, const=True, independent=True) except KeyError: pass def makeLikelihoodDefn(self, sites_independent=True, discrete_edges=None): defns = self.model.makeParamControllerDefns(bin_names=self.bin_names) if discrete_edges is not None: from discrete_markov import PartialyDiscretePsubsDefn defns['psubs'] = PartialyDiscretePsubsDefn( self.motifs, defns['psubs'], discrete_edges) return likelihood_calculation.makeTotalLogLikelihoodDefn( self.tree, defns['align'], defns['psubs'], defns['word_probs'], defns['bprobs'], 
self.bin_names, self.locus_names, sites_independent) def setAlignment(self, aligns, motif_pseudocount=None): """set the alignment to be used for computing the likelihood.""" if type(aligns) is not list: aligns = [aligns] assert len(aligns) == len(self.locus_names), len(aligns) tip_names = set(self.tree.getTipNames()) for index, aln in enumerate(aligns): if len(aligns) > 1: locus_name = "for locus '%s'" % self.locus_names[index] else: locus_name = "" assert not set(aln.getSeqNames()).symmetric_difference(tip_names),\ "Tree tip names %s and aln seq names %s don't match %s" % \ (self.tree.getTipNames(), aln.getSeqNames(), locus_name) assert not "root" in aln.getSeqNames(), "'root' is a reserved name." with self.updatesPostponed(): for (locus_name, align) in zip(self.locus_names, aligns): self.assignAll( 'alignment', {'locus':[locus_name]}, value=align, const=True) if self.mprobs_from_alignment: self.setMotifProbsFromData(align, locus=locus_name, auto=True, pseudocount=motif_pseudocount) class SequenceLikelihoodFunction(_LikelihoodParameterController): def setDefaultParamRules(self): pass def makeLikelihoodDefn(self, sites_independent=None, with_indel_params=True, kn=True): assert sites_independent is None or not sites_independent assert len(self.locus_names) == 1 return dp_calculation.makeForwardTreeDefn( self.model, self.tree, self.bin_names, with_indel_params=with_indel_params, kn=kn) def setSequences(self, seqs, locus=None): leaves = {} for (name, seq) in seqs.items(): # if it has uniq, it is probably already a likelihood tree leaf obj if hasattr(seq, 'uniq'): leaf = seq # XXX more checks - same alphabet as model, name etc ... else: leaf = self.model.convertSequence(seq, name) leaf = AlignableSeq(leaf) leaves[name] = leaf assert name != "root", "'root' is a reserved name."
self.setPogs(leaves, locus=locus) def setPogs(self, leaves, locus=None): with self.updatesPostponed(): for (name, pog) in leaves.items(): self.setParamRule('leaf', edge=name, value=pog, is_constant=True) if self.mprobs_from_alignment: counts = numpy.sum([pog.leaf.getMotifCounts() for pog in leaves.values()], 0) mprobs = counts/(1.0*sum(counts)) self.setMotifProbs(mprobs, locus=locus, is_constant=True, auto=True) PyCogent-1.5.3/cogent/evolve/predicate.py000644 000765 000024 00000021426 12024702176 021337 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Matching motifs: MotifChange("a", "g") Boolean logic: Any, All, Not (or &, |, ~) also: anypredicate.aliased('shortname') UserPredicate(f) """ import warnings import numpy __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" class _CallablePredicate(object): # A predicate in the context of a particular model def __init__(self, pred, model): self.model = model self.alphabet = model.getAlphabet() self.name = repr(pred) self.f = pred.interpret(model) self.__doc__ = pred.__doc__ def __call__(self, x, y): return self.f(x, y) def __repr__(self): return self.name def asciiArt(self): l = len(self.alphabet.getGapMotif()) rows = [] for i in range(l): row = [a[i] for a in list(self.alphabet)] rows.append(' '*(l+1) + ' '.join(row)) for y in self.alphabet: row = [] for x in self.alphabet: if not self.model.isinstantanious(x, y): c = ' ' elif self(x,y): c = '*' else: c = '-' row.append(c) rows.append(' '.join([y] + row)) return '\n'.join(rows) class predicate(object): def __and__(self, other): return All(self, other) def __or__(self, other): return Any(self, other) def __invert__(self): return Not(self) def __nonzero__(self): warnings.warn('alphabet predicate used as truth value. 
Use only binary operators: &, | and ~', stacklevel=2) return True def __eq__(self, other): warnings.warn('Warning: alphabet pair predicate used as value. Use parentheses') return self is other def aliased(self, new_name): n = PredicateAlias(new_name, self) n.__doc__ = self.__doc__ or repr(self) return n def makeModelPredicate(self, model): return _CallablePredicate(self, model) class PredicateAlias(predicate): def __init__(self, name, subpredicate): self.name = name self.subpredicate = subpredicate def __repr__(self): return self.name def interpret(self, model): subpred = self.subpredicate.interpret(model) return subpred class _UnaryPredicate(predicate): def __init__(self, subpredicate): assert isinstance(subpredicate, predicate), subpredicate self.subpredicate = subpredicate self.__doc__ = repr(self) def __repr__(self): if hasattr(self, '_op_repr'): return "%s(%s)" % (self._op_repr, self.subpredicate) else: return "%s(%s)" % (self.__class__.__name__, self.subpredicate) class _GenericPredicate(predicate): def __init__(self, *subpredicates): for p in subpredicates: assert isinstance(p, predicate), p self.subpredicates = subpredicates self.__doc__ = repr(self) def __repr__(self): if hasattr(self, '_op_repr'): return '(%s)' % (' %s ' % self._op_repr).join([repr(p) for p in self.subpredicates]) else: return '%s(%s)' % (self.__class__.__name__, ','.join(['(%s)' % repr(p) for p in self.subpredicates])) # Boolean logic on motif pair predicates class Not(_UnaryPredicate): _op_repr = '~' def interpret(self, model): subpred = self.subpredicate.interpret(model) def call(*args): return not subpred(*args) call.__doc__ = repr(self) return call class All(_GenericPredicate): _op_repr = '&' def interpret(self, model): subpreds = [p.interpret(model) for p in self.subpredicates] def call(*args): for subpredicate in subpreds: if not subpredicate(*args): return False return True call.__doc__ = repr(self) return call class Any(_GenericPredicate): _op_repr = '|' def interpret(self, 
model): subpreds = [p.interpret(model) for p in self.subpredicates] def call(*args): for subpredicate in subpreds: if subpredicate(*args): return True return False call.__doc__ = repr(self) return call class ModelSays(predicate): def __init__(self, name): self.name = name def __repr__(self): return self.name def interpret(self, model): return model.getPredefinedPredicate(self.name) class DirectedMotifChange(predicate): def __init__(self, from_motif, to_motif, diff_at = None): self.from_motif = from_motif self.motiflen = len(from_motif) self.to_motif = to_motif self.diff_at = diff_at def __repr__(self): if self.diff_at is not None: diff = '[%d]' % self.diff_at else: diff = '' return '%s>%s%s' % (self.from_motif, self.to_motif, diff) def testMotif(self, motifs, query): """positions where motif pattern is found in query""" positions = set() for offset in range(len(query)-self.motiflen+1): for (q,ms) in zip(query[offset: offset+self.motiflen], motifs): if q not in ms: break else: positions.add(offset) return positions def testMotifs(self, from_motifs, to_motifs, x, y): """positions where both motif patterns are found""" pre = self.testMotif(from_motifs, x) post = self.testMotif(to_motifs, y) return pre & post def interpret(self, model): """Make a callable function which implements this predicate specifically for 'alphabet'""" # may be looking for a 2nt pattern in a 3nt alphabet, but not # 3nt pattern in dinucleotide alphabet.
alphabet = model.getAlphabet() if alphabet.getMotifLen() < self.motiflen: raise ValueError("Alphabet motifs (%s) too short for %s (%s)" % (alphabet.getMotifLen(), repr(self), self.motiflen)) resolve = model.MolType.Ambiguities.__getitem__ from_motifs = [resolve(m) for m in self.from_motif] to_motifs = [resolve(m) for m in self.to_motif] def call(x, y): diffs = [X!=Y for (X,Y) in zip(x, y)] matches = [] for posn in self.testMotifs(from_motifs, to_motifs, x, y): diff = list(numpy.nonzero(diffs[posn:posn+self.motiflen])[0]) if diff and self.diff_at is None or diff == [self.diff_at]: matches.append(posn) return len(matches) == 1 call.__doc__ = repr(self) return call class UndirectedMotifChange(DirectedMotifChange): def __repr__(self): if self.diff_at is not None: diff = '[%d]' % self.diff_at else: diff = '' return '%s/%s%s' % (self.from_motif, self.to_motif, diff) def testMotifs(self, from_motifs, to_motifs, x, y): preF = self.testMotif(from_motifs, x) postF = self.testMotif(to_motifs, y) preR = self.testMotif(from_motifs, y) postR = self.testMotif(to_motifs, x) return (preF & postF) | (preR & postR) def MotifChange(x, y=None, forward_only=False, diff_at=None): if y is None: y = '' for i in range(len(x)): if i == diff_at or diff_at is None: y += '?' 
else: y += x[i] if forward_only: return DirectedMotifChange(x, y, diff_at=diff_at) else: return UndirectedMotifChange(x, y, diff_at=diff_at) class UserPredicate(predicate): def __init__(self, f): self.f = f def __repr__(self): return 'UserPredicate(%s)' % ( getattr(self.f, '__name__', None) or repr(self.f)) def interpret(self, model): return self.f silent = ModelSays('silent') replacement = ModelSays('replacement') def parse(rule): if ':' in rule: (label, rule) = rule.split(':') else: label = None if '@' in rule: (rule, diff_at) = rule.split('@') diff_at = int(diff_at) else: diff_at = None if '>' in rule or '/' in rule: forward_only = '>' in rule rule = rule.replace('>', '/') (x,y) = rule.split('/') if not y: y = None pred = MotifChange(x, y, forward_only, diff_at) else: pred = ModelSays(rule) if label: pred = pred.aliased(label) return pred PyCogent-1.5.3/cogent/evolve/simulate.py000644 000765 000024 00000011546 12024702176 021224 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Random sequences and random evolution of sequences in a tree""" import numpy import bisect __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" def argpicks(freqs, random_series): partition = numpy.add.accumulate(freqs) assert abs(partition[-1]-1.0) < 1e-6, (freqs, partition) while True: x = random_series.uniform(0.0,1.0) i = bisect.bisect_left(partition, x) yield i def argpick(freqs, random_series): return argpicks(freqs, random_series).next() def _randomMotifGenerator(random_series, motif_probs): motifs = motif_probs.keys() freqs = [motif_probs[m] for m in motifs] for i in argpicks(freqs, random_series): yield motifs[i] def evolveSequence(random_series, motifs, parent_seq, site_cats, psubs, preserved_sites=()): """Evolve a new sequence derived from parent_seq. 
Uses psubs[site_cats[i]] to pick a new motif derived from parent_seq[i]""" seq = [] randomMotifSources = {} for (i, parent_motif) in enumerate(parent_seq): if i in preserved_sites: edge_motif = preserved_sites[i] else: site_cat = site_cats[i] if (site_cat, parent_motif) not in randomMotifSources: mprobs = {} parent_motif_index = motifs.index(parent_motif) psub = psubs[site_cat] for (dest_motif_index, dest_motif) in enumerate(motifs): prob = psub[parent_motif_index, dest_motif_index] mprobs[dest_motif] = prob randomMotifSources[site_cat, parent_motif] = \ _randomMotifGenerator(random_series, mprobs) edge_motif = randomMotifSources[site_cat, parent_motif].next() seq.append(edge_motif) return seq def randomSequence(random_series, motif_probs, sequence_length): getRootRandomMotif = _randomMotifGenerator(random_series, motif_probs).next return [getRootRandomMotif() for i in range(sequence_length)] class AlignmentEvolver(object): # Encapsulates settings that are constant throughout the recursive generation # of a synthetic alignment. def __init__(self, random_series, orig_ambig, exclude_internal, bin_names, site_bins, psub_for, motifs): self.random_series = random_series self.orig_ambig = orig_ambig self.exclude_internal = exclude_internal self.bin_names = bin_names self.site_bins = site_bins self.psub_for = psub_for self.motifs = motifs def __call__(self, tree, root_sequence): #probsd = dict(enumerate(self.bin_probs)) #bprobs = _randomMotifGenerator(self.random_series, probsd) #site_bins = [bprobs.next() for c in range(len(root_sequence))] return self.generateSimulatedSeqs(tree, root_sequence) def generateSimulatedSeqs(self, parent, parent_seq): """recursively generate the descendant sequences by descending the tree from root. Each child will be set by mutating the parent motif based on the probs in the psub matrix of this edge. random_series - get a random number 0-1 by calling random_series.random() length - the desired alignment length parent - the edge structure.
parent_seq - the corresponding sequence. This will be mutated for each of its children, based on their psub matrices. """ # This depends on parameter names 'mprobs', 'alignment2', 'bprobs' and # 'psubs'. Might be better to integrate it into likelihood_calculation. if self.exclude_internal and parent.Children: simulated_sequences = {} else: simulated_sequences = {parent.Name : ''.join(parent_seq)} for edge in parent.Children: # The result for this edge - a list of motifs # Keep original ambiguity codes if edge.Name in self.orig_ambig: orig_seq_ambig = self.orig_ambig[edge.Name] else: orig_seq_ambig = {} # Matrix of substitution probabilities psubs = [self.psub_for(edge.Name, bin) for bin in self.bin_names] # Make the semi-random sequence for this edge. edge_seq = evolveSequence(self.random_series, self.motifs, parent_seq, self.site_bins, psubs, orig_seq_ambig) # Pass this new edge sequence on down the tree descendant_sequences = self.generateSimulatedSeqs( edge, edge_seq) simulated_sequences.update(descendant_sequences) return simulated_sequences PyCogent-1.5.3/cogent/evolve/solved_models.py000644 000765 000024 00000006160 11572303442 022234 0ustar00jrideoutstaff000000 000000 """P matrices for some DNA models can be calculated without going via the intermediate rate matrix Q. A Cython implementation of this calculation can be used when Q is not required, for example during likelihood tree optimisation. Equivalent pure python code is NOT provided because it is typically slower than the rate-matrix based alternative and provides no extra functionality.
""" from cogent.evolve.substitution_model import Nucleotide, CalcDefn from cogent.evolve.predicate import MotifChange from cogent.maths.matrix_exponentiation import FastExponentiator import numpy from cogent.util.modules import importVersionedModule, ExpectedImportError try: _solved_models = importVersionedModule('_solved_models', globals(), (1, 0), "only matrix exponentiating DNA models") except ExpectedImportError: _solved_models = None class PredefinedNucleotide(Nucleotide): _default_expm_setting = None # Instead of providing calcExchangeabilityMatrix this subclass overrrides # makeContinuousPsubDefn to bypass the Q / Qd step. def makeContinuousPsubDefn(self, word_probs, mprobs_matrix, distance, rate_params): # Only one set of mprobs will be used assert word_probs is mprobs_matrix # Order of bases is assumed later, so check it really is Y,Y,R,R: alphabet = self.getAlphabet() assert set(list(alphabet)[:2]) == set(['T', 'C']) assert set(list(alphabet)[2:]) == set(['G', 'A']) # Should produce the same P as an ordinary Q based model would: self.checkPsubCalculationsMatch() return CalcDefn(self.calcPsubMatrix, name='psubs')( word_probs, distance, *rate_params) def calcPsubMatrix(self, pi, time, kappa_y=1.0, kappa_r=None): """Is F81, HKY83 or TN93 when passed 0, 1 or 2 parameters""" if kappa_r is None: kappa_r = kappa_y result = numpy.empty([4,4], float) _solved_models.calc_TN93_P(self._do_scaling, pi, time, kappa_y, kappa_r, result) return result def checkPsubCalculationsMatch(self): pi = numpy.array([.1, .2, .3, .4]) params = [4,6][:len(self.parameter_order)] Q = self.calcQ(pi, pi, *params) P1 = FastExponentiator(Q)(.5) P2 = self.calcPsubMatrix(pi, .5, *params) assert numpy.allclose(P1, P2) def _solvedNucleotide(name, predicates, rate_matrix_required=True, **kw): if _solved_models is not None and not rate_matrix_required: klass = PredefinedNucleotide else: klass = Nucleotide return klass(name=name, predicates=predicates, model_gaps=False, **kw) kappa_y = 
MotifChange('T', 'C').aliased('kappa_y') kappa_r = MotifChange('A', 'G').aliased('kappa_r') kappa = (kappa_y | kappa_r).aliased('kappa') def TN93(**kw): """Tamura and Nei 1993 model""" return _solvedNucleotide('TN93', [kappa_y, kappa_r], recode_gaps=True, **kw) def HKY85(**kw): """Hasegawa, Kishino and Yano 1985 model""" return _solvedNucleotide('HKY85', [kappa], recode_gaps=True, **kw) def F81(**kw): """Felsenstein's 1981 model""" return _solvedNucleotide('F81', [], recode_gaps=True, **kw) PyCogent-1.5.3/cogent/evolve/substitution_calculation.py000644 000765 000024 00000004660 12024702176 024532 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import numpy import warnings from cogent.recalculation.definition import PositiveParamDefn, RatioParamDefn, \ CalculationDefn, MonotonicDefn, ProductDefn, ConstDefn, PartitionDefn, \ NonParamDefn, CallDefn, SelectForDimension, \ GammaDefn, WeightedPartitionDefn, CalcDefn from cogent.maths.matrix_exponentiation import PadeExponentiator, \ FastExponentiator, CheckedExponentiator, LinAlgError __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" # Custom subclasses of Defn (see cogent.recalculation) for use by substitution models.
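The PredefinedNucleotide models above compute P(t) for F81/HKY85/TN93 directly (via the Cython calc_TN93_P) rather than exponentiating Q. For the simplest of these, F81, the closed form is well known: P(t)_ij = exp(-beta*t)*delta_ij + (1 - exp(-beta*t))*pi_j, where beta = 1/(1 - sum(pi**2)) scales time to one expected substitution per site. The numpy sketch below is illustrative only; the helper name `f81_psub` is invented and is not PyCogent API:

```python
import numpy

def f81_psub(pi, t):
    """Closed-form F81 substitution probabilities (illustrative sketch).

    pi: stationary base frequencies summing to 1
    t: branch length in expected substitutions per site
    """
    pi = numpy.asarray(pi, float)
    beta = 1.0 / (1.0 - numpy.sum(pi ** 2))  # rate normalisation
    decay = numpy.exp(-beta * t)
    # off-diagonal entries relax toward pi_j; the diagonal keeps the
    # remaining probability of no net change
    return decay * numpy.eye(len(pi)) + (1.0 - decay) * pi[None, :]

P = f81_psub([0.1, 0.2, 0.3, 0.4], 0.5)
```

Sanity checks mirror checkPsubCalculationsMatch above: every row of P sums to 1, t=0 yields the identity, and as t grows each row converges to the stationary frequencies pi.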
class AlignmentAdaptDefn(CalculationDefn): name = 'leaf_likelihoods' def calc(self, model, alignment): return model.convertAlignment(alignment) class LengthDefn(PositiveParamDefn): name = 'length' valid_dimensions = ('edge',) independent_by_default = True upper = 10.0 class RateDefn(RatioParamDefn): name = 'rate' valid_dimensions = ('bin', 'locus') independent_by_default = True lower = 1e-3 upper = 1e+3 class SubstitutionParameterDefn(RatioParamDefn): valid_dimensions = ('edge', 'bin', 'locus') independent_by_default = False class ExpDefn(CalculationDefn): name = 'exp' def calc(self, expm): (allow_eigen, check_eigen, allow_pade) = { 'eigen': (True, False, False), 'checked': (True, True, False), 'pade': (False, False, True), 'either': (True, True, True), }[str(expm)] if not allow_eigen: return PadeExponentiator eigen = CheckedExponentiator if check_eigen else FastExponentiator if not allow_pade: return eigen else: def _both(Q, eigen=eigen): try: return eigen(Q) except (ArithmeticError, LinAlgError), detail: if not _both.given_expm_warning: warnings.warn("using slow exponentiator because '%s'" % str(detail)) _both.given_expm_warning = True return PadeExponentiator(Q) _both.given_expm_warning = False return _both PyCogent-1.5.3/cogent/evolve/substitution_model.py000644 000765 000024 00000114607 12024702176 023337 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ substitution_model.py Contains classes for defining Markov models of substitution. These classes depend on an Alphabet class member for defining the set of motifs that each represent a state in the Markov chain. An example of a 'dna' type alphabet motif is 'a'; an example of a 'codon' type motif is 'atg'. By default all models include the gap motif ('-' for a 'dna' alphabet or '---' for a 'codon' alphabet). This differs from software such as PAML, where gaps are treated as ambiguous states (specifically, as 'n').
The gap motif state can be excluded from the substitution model using the method excludeGapMotif(). To ensure the alignment and the substitution model are defined with the same alphabet, it is recommended that modifications be made to the substitution model alphabet and that this instance then be given to the alignment. The model's substitution rate parameters are represented as a dictionary with the parameter names as keys, and predicate functions as the values. These predicate functions compare a pair of motifs, returning True or False. Many such functions are provided as methods of the class. For instance, the istransition method is pertinent to dna based models. This method returns True if an 'a'/'g' or 'c'/'t' pair is passed to it, False otherwise. In this way the positioning of parameters in the instantaneous rate matrix (commonly called Q) is determined. >>> model = Nucleotide(equal_motif_probs=True) >>> model.setparameterrules({'alpha': model.istransition}) >>> parameter_controller = model.makeParamController(tree) """ import numpy from numpy.linalg import svd import warnings import inspect from cogent.core import moltype from cogent.evolve import parameter_controller, predicate, motif_prob_model from cogent.evolve.substitution_calculation import ( SubstitutionParameterDefn as ParamDefn, RateDefn, LengthDefn, ProductDefn, CallDefn, CalcDefn, PartitionDefn, NonParamDefn, AlignmentAdaptDefn, ExpDefn, ConstDefn, GammaDefn, MonotonicDefn, SelectForDimension, WeightedPartitionDefn) from cogent.evolve.discrete_markov import PsubMatrixDefn from cogent.evolve.likelihood_tree import makeLikelihoodTreeLeaf from cogent.maths.optimisers import ParameterOutOfBoundsError __author__ = "Peter Maxwell, Gavin Huttley and Andrew Butterfield" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Gavin Huttley", "Andrew Butterfield", "Peter Maxwell", "Matthew Wakefield", "Brett Easton", "Rob Knight", "Von Bing Yap"] __license__ = "GPL" __version__ =
"1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def predicate2matrix(alphabet, pred, mask=None): """From a test like istransition() produce an MxM boolean matrix""" M = len(alphabet) result = numpy.zeros([M,M], int) for i in range(M): for j in range(M): if mask is None or mask[i,j]: result[i,j] = pred(alphabet[i], alphabet[j]) return result def redundancyInPredicateMasks(preds): # Calculate the nullity of the predicates. If non-zero # there is some redundancy and the model will be overparameterised. if len(preds) <= 1: return 0 eqns = 1.0 * numpy.array([list(mask.flat) for mask in preds.values()]) svs = svd(eqns)[1] # count non-duplicate non-zeros singular values matrix_rank = len([sv for sv in svs if abs(sv) > 1e-8]) return len(preds) - matrix_rank def _maxWidthIfTruncated(pars, delim, each): # 'pars' is an array of lists of strings, how long would the longest # list representation be if the strings were truncated at 'each' # characters and joined together with 'delim'. 
return max([ sum([min(len(par), each) for par in par_list]) + len(delim) * (len(par_list)-1) for par_list in pars.flat]) def _isSymmetrical(matrix): return numpy.alltrue(numpy.alltrue(matrix == numpy.transpose(matrix))) def extend_docstring_from(cls, pre=False): def docstring_inheriting_decorator(fn): parts = [getattr(cls,fn.__name__).__doc__, fn.__doc__ or ''] if pre: parts.reverse() fn.__doc__ = ''.join(parts) return fn return docstring_inheriting_decorator class _SubstitutionModel(object): # Subclasses must provide # .makeParamControllerDefns() def __init__(self, alphabet, motif_probs=None, optimise_motif_probs=False, equal_motif_probs=False, motif_probs_from_data=None, motif_probs_alignment=None, mprob_model=None, model_gaps=False, recode_gaps=False, motif_length=None, name="", motifs=None): # subclasses can extend this incomplete docstring """ Alphabet: - alphabet - An Alphabet object - motif_length: Use a tuple alphabet based on 'alphabet'. - motifs: Use a subalphabet that only contains those motifs. - model_gaps: Whether the gap motif should be included as a state. - recode_gaps: Whether gaps in an alignment should be treated as an ambiguous state instead. Motif Probability: - motif_probs: Dictionary of probabilities. - equal_motif_probs: Flag to set alignment motif probs equal. - motif_probs_alignment: An alignment from which motif probs are set. If none of these options are set then motif probs will be derived from the data: ie the particular alignment provided later. - optimise_motif_probs: Treat like other free parameters. Any values set by the other motif_prob options will be used as initial values. - mprob_model: 'tuple', 'conditional' or 'monomer' to specify how tuple-alphabet (including codon) motif probs are used. """ # MISC assert len(alphabet) < 65, "Alphabet too big. 
Try explicitly "\ "setting alphabet to PROTEIN or DNA" self.name = name self._optimise_motif_probs = optimise_motif_probs # ALPHABET if recode_gaps: if model_gaps: warnings.warn("Converting gaps to wildcards AND modeling gaps") else: model_gaps = False self.recode_gaps = recode_gaps self.MolType = alphabet.MolType if model_gaps: alphabet = alphabet.withGapMotif() if motif_length > 1: alphabet = alphabet.getWordAlphabet(motif_length) if motifs is not None: alphabet = alphabet.getSubset(motifs) self.alphabet = alphabet self.gapmotif = alphabet.getGapMotif() self._word_length = alphabet.getMotifLen() # MOTIF PROB ALPHABET MAPPING if mprob_model is None: mprob_model = 'tuple' if self._word_length==1 else 'conditional' elif mprob_model == 'word': mprob_model = 'tuple' if model_gaps and mprob_model != 'tuple': raise ValueError("mprob_model must be 'tuple' to model gaps") isinst = self._isInstantaneous self._instantaneous_mask = predicate2matrix(self.alphabet, isinst) self._instantaneous_mask_f = self._instantaneous_mask * 1.0 self.mprob_model = motif_prob_model.makeModel(mprob_model, alphabet, self._instantaneous_mask_f) # MOTIF PROBS if equal_motif_probs: assert not (motif_probs or motif_probs_alignment), \ "Motif probs equal or provided but not both" motif_probs = self.mprob_model.makeEqualMotifProbs() elif motif_probs_alignment is not None: assert not motif_probs, \ "Motif probs from alignment or provided but not both" motif_probs = self.countMotifs(motif_probs_alignment) motif_probs = motif_probs.astype(float) / sum(motif_probs) assert len(alphabet) == len(motif_probs) motif_probs = dict(zip(alphabet, motif_probs)) if motif_probs: self.adaptMotifProbs(motif_probs) # to check self.motif_probs = motif_probs if motif_probs_from_data is None: motif_probs_from_data = False else: self.motif_probs = None if motif_probs_from_data is None: motif_probs_from_data = True self.motif_probs_from_align = motif_probs_from_data def getParamList(self): return [] def __str__(self): s = 
["\n%s (" % self.__class__.__name__ ] s.append("name = '%s'; type = '%s';" % (getattr(self, "name", None), getattr(self, "type", None))) if hasattr(self, "predicate_masks"): parlist = self.predicate_masks.keys() s.append("params = %s;" % parlist) motifs = self.getMotifs() s.append("number of motifs = %s;" % len(motifs)) s.append("motifs = %s)\n" % motifs) return " ".join(s) def getAlphabet(self): return self.alphabet def getMprobAlphabet(self): return self.mprob_model.getInputAlphabet() def getMotifs(self): return list(self.getAlphabet()) def getWordLength(self): return self._word_length def getMotifProbs(self): """Return the dictionary of motif probabilities.""" return self.motif_probs.copy() def setParamControllerMotifProbs(self, pc, mprobs, **kw): return self.mprob_model.setParamControllerMotifProbs(pc, mprobs, **kw) def makeLikelihoodFunction(self, tree, motif_probs_from_align=None, optimise_motif_probs=None, aligned=True, expm=None, digits=None, space=None, **kw): if motif_probs_from_align is None: motif_probs_from_align = self.motif_probs_from_align if optimise_motif_probs is None: optimise_motif_probs = self._optimise_motif_probs kw['optimise_motif_probs'] = optimise_motif_probs kw['motif_probs_from_align'] = motif_probs_from_align if aligned: klass = parameter_controller.AlignmentLikelihoodFunction else: alphabet = self.getAlphabet() assert alphabet.getGapMotif() not in alphabet klass = parameter_controller.SequenceLikelihoodFunction result = klass(self, tree, **kw) if self.motif_probs is not None: result.setMotifProbs(self.motif_probs, is_constant= not optimise_motif_probs, auto=True) if expm is None: expm = self._default_expm_setting if expm is not None: result.setExpm(expm) if digits or space: result.setTablesFormat(digits=digits, space=space) return result def makeParamController(self, tree, motif_probs_from_align=None, optimise_motif_probs=None, **kw): # deprecate return self.makeLikelihoodFunction(tree, motif_probs_from_align = motif_probs_from_align, 
optimise_motif_probs = optimise_motif_probs, **kw) def convertAlignment(self, alignment): # this is to support for everything but HMM result = {} for seq_name in alignment.getSeqNames(): sequence = alignment.getGappedSeq(seq_name, self.recode_gaps) result[seq_name] = self.convertSequence(sequence, seq_name) return result def convertSequence(self, sequence, name): # makeLikelihoodTreeLeaf, sort of an indexed profile where duplicate # columns stored once, so likelihoods only calc'd once return makeLikelihoodTreeLeaf(sequence, self.getAlphabet(), name) def countMotifs(self, alignment, include_ambiguity=False): return self.mprob_model.countMotifs(alignment, include_ambiguity, self.recode_gaps) def makeAlignmentDefn(self, model): align = NonParamDefn('alignment', ('locus',)) # The name of this matters, it's used in likelihood_function.py # to retrieve the correct (adapted) alignment. return AlignmentAdaptDefn(model, align) def adaptMotifProbs(self, motif_probs, auto=False): return self.mprob_model.adaptMotifProbs(motif_probs, auto=auto) def calcMonomerProbs(self, word_probs): # Not presently used, always go monomer->word instead return self.mprob_model.calcMonomerProbs(word_probs) def calcWordProbs(self, monomer_probs): return self.mprob_model.calcWordProbs(monomer_probs) def calcWordWeightMatrix(self, monomer_probs): return self.mprob_model.calcWordWeightMatrix(monomer_probs) def makeParamControllerDefns(self, bin_names, endAtQd=False): (input_probs, word_probs, mprobs_matrix) = \ self.mprob_model.makeMotifWordProbDefns() if len(bin_names) > 1: bprobs = PartitionDefn( [1.0/len(bin_names) for bin in bin_names], name = "bprobs", dimensions=['locus'], dimension=('bin', bin_names)) else: bprobs = None defns = { 'align': self.makeAlignmentDefn(ConstDefn(self, 'model')), 'bprobs': bprobs, 'word_probs': word_probs, } rate_params = self.makeRateParams(bprobs) if endAtQd: defns['Qd'] = self.makeQdDefn(word_probs, mprobs_matrix, rate_params) else: defns['psubs'] = 
self.makePsubsDefn( bprobs, word_probs, mprobs_matrix, rate_params) return defns class DiscreteSubstitutionModel(_SubstitutionModel): _default_expm_setting = None def _isInstantaneous(self, x, y): return True def getParamList(self): return [] def makeRateParams(self, bprobs): return [] def makePsubsDefn(self, bprobs, word_probs, mprobs_matrix, rate_params): assert len(rate_params) == 0 assert word_probs is mprobs_matrix, "Must use simple mprob model" motifs = tuple(self.getAlphabet()) return PsubMatrixDefn( name="psubs", dimension = ('motif', motifs), default=None, dimensions=('locus', 'edge')) class _ContinuousSubstitutionModel(_SubstitutionModel): # subclass must provide: # # - parameter_order: a list of parameter names corresponding to the # arguments of: # # - calcExchangeabilityMatrix(*params) # convert len(self.parameter_order) params to a matrix """A substitution model for which the rate matrix (P) is derived from an instantaneous rate matrix (Q). The nature of the parameters used to define Q is up to the subclasses. """ # At some point this can be made variable, and probably # the default changed to False long_indels_are_instantaneous = True _scalableQ = True _exponentiator = None _default_expm_setting = 'either' @extend_docstring_from(_SubstitutionModel) def __init__(self, alphabet, with_rate=False, ordered_param=None, distribution=None, partitioned_params=None, do_scaling=None, **kw): """ - with_rate: Add a 'rate' parameter which varies by bin. - ordered_param: name of a single parameter which distinguishes any bins. - distribution: choices of 'free' or 'gamma' or an instance of some distribution. Could probably just deprecate free - partitioned_params: names of params to be partitioned across bins - do_scaling: Scale branch lengths as the expected number of substitutions. Reduces the maximum substitution df by 1. """ _SubstitutionModel.__init__(self, alphabet, **kw) alphabet = self.getAlphabet() # as may be altered by recode_gaps etc. 
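The do_scaling option documented above normalises Q so that branch lengths read as expected substitutions per site (calcQ below does the equivalent using word_probs and the off-diagonal row totals). A minimal standalone numpy sketch of that normalisation on a toy two-state chain, not the class machinery itself:

```python
import numpy

def scale_q(q, motif_probs):
    # Fill the diagonal so rows sum to zero, then rescale so that the
    # expected flux -sum_i pi_i * Q_ii equals exactly 1.0.
    q = q - numpy.diag(q.sum(axis=1))
    rate = -(motif_probs * numpy.diag(q)).sum()
    return q / rate

q = numpy.array([[0.0, 2.0], [1.0, 0.0]])   # toy exchangeabilities
pi = numpy.array([0.5, 0.5])                # toy motif probs
scaled = scale_q(q, pi)
# a branch length of t now means t expected substitutions per site
assert abs(-(pi * numpy.diag(scaled)).sum() - 1.0) < 1e-12
```

With this scaling in place, dropping one free rate parameter (as the General model does when _do_scaling is set) removes the redundancy between the overall rate and branch lengths.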
if do_scaling is None: do_scaling = self._scalableQ if do_scaling and not self._scalableQ: raise ValueError("Can't autoscale a %s model" % type(self).__name__) self._do_scaling = do_scaling # BINS if not ordered_param: if ordered_param is not None: warnings.warn('ordered_param should be a string or None') ordered_param = None if distribution: if with_rate: ordered_param = 'rate' else: raise ValueError('distribution provided without ordered_param') elif not isinstance(ordered_param, str): warnings.warn('ordered_param should be a string or None') assert len(ordered_param) == 1, 'More than one ordered_param' ordered_param = ordered_param[0] assert ordered_param, "False value hidden in list" self.ordered_param = ordered_param if distribution == "gamma": distribution = GammaDefn elif distribution in [None, "free"]: distribution = MonotonicDefn elif isinstance(distribution, basestring): raise ValueError('Unknown distribution "%s"' % distribution) self.distrib_class = distribution if not partitioned_params: partitioned_params = () elif isinstance(partitioned_params, str): partitioned_params = (partitioned_params,) else: partitioned_params = tuple(partitioned_params) if self.ordered_param: if self.ordered_param not in partitioned_params: partitioned_params += (self.ordered_param,) self.partitioned_params = partitioned_params if 'rate' in partitioned_params: with_rate = True self.with_rate = with_rate # CACHED SHORTCUTS self._exponentiator = None #self._ident = numpy.identity(len(self.alphabet), float) def checkParamsExist(self): """Raise an error if the parameters specified to be partitioned or ordered don't actually exist.""" for param in self.partitioned_params: if param not in self.parameter_order and param != 'rate': desc = ['partitioned', 'ordered'][param==self.ordered_param] raise ValueError('%s param "%s" unknown' % (desc, param)) def _isInstantaneous(self, x, y): diffs = sum([X!=Y for (X,Y) in zip(x,y)]) return diffs == 1 or (diffs > 1 and 
self.long_indels_are_instantaneous and self._isAnyIndel(x, y)) def _isAnyIndel(self, x, y): """An indel of any length""" # Things get complicated when a contiguous indel of any length is OK: if x == y: return False gap_start = gap_end = gap_strand = None for (i, (X,Y)) in enumerate(zip(x,y)): G = self.gapmotif[i] if X != Y: if X != G and Y != G: return False # non-gap differences had their chance above elif gap_start is None: gap_start = i gap_strand = [X,Y].index(G) elif gap_end is not None or [X,Y].index(G) != gap_strand: return False # can't start a second gap else: pass # extend open gap elif gap_start is not None: gap_end = i return True def calcQ(self, word_probs, mprobs_matrix, *params): Q = self.calcExchangeabilityMatrix(word_probs, *params) Q *= mprobs_matrix row_totals = Q.sum(axis=1) Q -= numpy.diag(row_totals) if self._do_scaling: Q *= 1.0 / (word_probs * row_totals).sum() return Q def makeQdDefn(self, word_probs, mprobs_matrix, rate_params): """Diagonalized Q, ie: rate matrix prepared for exponentiation""" Q = CalcDefn(self.calcQ, name='Q')(word_probs, mprobs_matrix, *rate_params) expm = NonParamDefn('expm') exp = ExpDefn(expm) Qd = CallDefn(exp, Q, name='Qd') return Qd def _makeBinParamDefn(self, edge_par_name, bin_par_name, bprob_defn): # if no ordered param defined, behaves as old, everything indexed by edge if edge_par_name not in self.partitioned_params: return ParamDefn(dimensions=['bin'], name=bin_par_name) if edge_par_name == self.ordered_param: whole = self.distrib_class(bprob_defn, bin_par_name) else: # this forces them to average to one, but no forced order # this means you can't force a param value to be shared across bins # so 1st above approach has to be used whole = WeightedPartitionDefn(bprob_defn, bin_par_name+'_partn') whole.bin_names = bprob_defn.bin_names return SelectForDimension(whole, 'bin', name=bin_par_name) def makeRateParams(self, bprobs): params = [] for param_name in self.parameter_order: if bprobs is None or param_name 
not in self.partitioned_params: defn = ParamDefn(param_name) else: e_defn = ParamDefn(param_name, dimensions=['edge', 'locus']) # should be weighted by bprobs*rates not bprobs b_defn = self._makeBinParamDefn( param_name, param_name+'_factor', bprobs) defn = ProductDefn(b_defn, e_defn, name=param_name+'_BE') params.append(defn) return params def makeFundamentalParamControllerDefns(self, bin_names): """Everything one step short of the psubs, because cogent.align code needs to handle Q*t itself.""" defns = self.makeParamControllerDefns(bin_names, endAtQd=True) assert not 'length' in defns defns['length'] = LengthDefn() return defns def makePsubsDefn(self, bprobs, word_probs, mprobs_matrix, rate_params): distance = self.makeDistanceDefn(bprobs) P = self.makeContinuousPsubDefn(word_probs, mprobs_matrix, distance, rate_params) return P def makeDistanceDefn(self, bprobs): length = LengthDefn() if self.with_rate and bprobs is not None: b_rate = self._makeBinParamDefn('rate', 'rate', bprobs) distance = ProductDefn(length, b_rate, name="distance") else: distance = length return distance def makeContinuousPsubDefn(self, word_probs, mprobs_matrix, distance, rate_params): Qd = self.makeQdDefn(word_probs, mprobs_matrix, rate_params) P = CallDefn(Qd, distance, name='psubs') return P class General(_ContinuousSubstitutionModel): """A continuous substitution model with one free parameter for each and every possible instantaneous substitution.""" # k = self.param_pick[i,j], 0<=k<=N+1 # k==0: not instantaneous, should be 0.0 in Q # k<=N: apply Kth exchangeability parameter # k==N+1: no parameter, should be 1.0 in unscaled Q #@extend_docstring_from(_ContinuousSubstitutionModel) def __init__(self, alphabet, **kw): _ContinuousSubstitutionModel.__init__(self, alphabet, **kw) alphabet = self.getAlphabet() # as may be altered by recode_gaps etc. 
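The General model gives every instantaneous substitution its own exchangeability parameter, recorded in the param_pick index matrix built below; calcExchangeabilityMatrix then recovers the rate matrix by indexing a flat parameter array with numpy's take. A standalone sketch of that indexing trick on a hypothetical 3-state alphabet:

```python
import numpy

# param_pick maps matrix cells to parameter slots: 0 means not
# instantaneous (rate stays 0.0), 1..N selects the Nth free parameter,
# and the final slot is the reference rate fixed at 1.0 under scaling.
param_pick = numpy.array([[0, 1, 2],
                          [1, 0, 3],
                          [2, 3, 0]])
params = (0.5, 2.0)  # two free exchangeabilities; the third is fixed
R = numpy.array((0.0,) + params + (1.0,)).take(param_pick)
assert R[0, 1] == 0.5 and R[0, 2] == 2.0 and R[1, 2] == 1.0
assert R[0, 0] == 0.0  # diagonal cells picked slot 0
```

Because take broadcasts the lookup over the whole index matrix, the rate matrix is rebuilt in a single vectorised step on every evaluation of the likelihood.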
mask = self._instantaneous_mask N = len(alphabet) self.param_pick = numpy.zeros([N,N], int) self.parameter_order = [] for (i,x) in enumerate(alphabet): for j in numpy.flatnonzero(mask[i]): y = alphabet[j] self.parameter_order.append('%s/%s'%(x,y)) self.param_pick[i,j] = len(self.parameter_order) if self._do_scaling: const_param = self.parameter_order.pop() self.symmetric = False self.checkParamsExist() def calcExchangeabilityMatrix(self, mprobs, *params): return numpy.array((0.0,)+params+(1.0,)).take(self.param_pick) class GeneralStationary(_ContinuousSubstitutionModel): """A continuous substitution model with one free parameter for each and every possible instantaneous substitution, except the last in each column. As general as can be while still having stationary motif probabilities""" #@extend_docstring_from(_ContinuousSubstitutionModel) def __init__(self, alphabet, **kw): _ContinuousSubstitutionModel.__init__(self, alphabet, **kw) alphabet = self.getAlphabet() # as may be altered by recode_gaps etc. 
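GeneralStationary removes the last free rate in each column and solves for it so that the chosen motif probabilities remain stationary, i.e. pi.Q = 0. A quick numpy check of that property on a reversible toy Q, which is stationary by construction (this is an illustrative sketch, not the class's own constraint solver):

```python
import numpy

pi = numpy.array([0.1, 0.3, 0.6])
R = numpy.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 0.5],
                 [2.0, 0.5, 0.0]])  # symmetric exchangeabilities
Q = R * pi                          # Q_ij = R_ij * pi_j for i != j
Q -= numpy.diag(Q.sum(axis=1))      # rows sum to zero
# stationarity: flux into each state balances the flux out of it
assert numpy.allclose(numpy.dot(pi, Q), 0.0)
```

Non-reversible models do not get this for free, which is why calcExchangeabilityMatrix below computes the required R[i,j] from the row/column flux difference and raises ParameterOutOfBoundsError when no non-negative rate can balance the column.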
mask = self._instantaneous_mask N = len(alphabet) self.param_pick = numpy.zeros([N,N], int) self.parameter_order = [] self.last_in_column = [] for (d, (row, col)) in enumerate(zip(mask, mask.T)): row = list(numpy.flatnonzero(row[d:])+d) col = list(numpy.flatnonzero(col[d:])+d) if col: self.last_in_column.append((col.pop(), d)) else: assert not row inst = [(d,j) for j in row] + [(i,d) for i in col] for (i, j) in inst: (x,y) = [alphabet[k] for k in [i,j]] self.parameter_order.append('%s/%s'%(x,y)) self.param_pick[i,j] = len(self.parameter_order) if self._do_scaling: const_param = self.parameter_order.pop() self.symmetric = False self.checkParamsExist() def calcExchangeabilityMatrix(self, mprobs, *params): R = numpy.array((0.0,)+params+(1.0,)).take(self.param_pick) for (i,j) in self.last_in_column: assert i > j row_total = numpy.dot(mprobs, R[j]) col_total = numpy.dot(mprobs, R[:,j]) required = row_total - col_total if required < 0.0: raise ParameterOutOfBoundsError R[i,j] = required / mprobs[i] return R class Empirical(_ContinuousSubstitutionModel): """A continuous substitution model with a predefined instantaneous rate matrix.""" @extend_docstring_from(_ContinuousSubstitutionModel) def __init__(self, alphabet, rate_matrix, **kw): """ - rate_matrix: The instantaneous rate matrix """ _ContinuousSubstitutionModel.__init__(self, alphabet, **kw) alphabet = self.getAlphabet() # as may be altered by recode_gaps etc. 
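Empirical models take the exchangeabilities as given (protein matrices in the JTT/WAG family are supplied this way via EmpiricalProteinMatrix below), so the only validation is the shape, a zero diagonal, and the symmetry test _isSymmetrical. A standalone sketch of those checks with a made-up 3x3 matrix:

```python
import numpy

def is_symmetrical(matrix):
    # same test as the module's _isSymmetrical helper
    return bool((matrix == matrix.T).all())

rate_matrix = numpy.array([[0.0, 1.2, 0.3],
                           [1.2, 0.0, 2.1],
                           [0.3, 2.1, 0.0]])  # hypothetical values
n = rate_matrix.shape[0]
assert rate_matrix.shape == (n, n)
assert (numpy.diagonal(rate_matrix) == 0).all()
mask = rate_matrix != 0.0  # instantaneous-change mask
assert is_symmetrical(rate_matrix)
```

A zero off-diagonal entry simply marks that substitution as not instantaneous, which is why the mask is derived from the non-zero cells rather than stored separately.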
N = len(alphabet) assert rate_matrix.shape == (N, N) assert numpy.alltrue(numpy.diagonal(rate_matrix) == 0) self._instantaneous_mask_f = rate_matrix * 1.0 self._instantaneous_mask = (self._instantaneous_mask_f != 0.0) self.symmetric = _isSymmetrical(self._instantaneous_mask_f) self.parameter_order = [] self.checkParamsExist() def calcExchangeabilityMatrix(self, mprobs): return self._instantaneous_mask_f.copy() class SubstitutionModel(_ContinuousSubstitutionModel): """A continuous substitution model with only user-specified substitution parameters.""" @extend_docstring_from(_ContinuousSubstitutionModel) def __init__(self, alphabet, predicates=None, scales=None, **kw): """ - predicates: a dict of {name:predicate}. See cogent.evolve.predicate - scales: scale rules, dict with predicates """ self._canned_predicates = None _ContinuousSubstitutionModel.__init__(self, alphabet, **kw) (predicate_masks, predicate_order) = self._adaptPredicates(predicates or []) # Check for redundancy in predicates, ie: 1 or more than combine # to be equivalent to 1 or more others, or the distance params. # Give a clearer error in simple cases like always false or true. for (name, matrix) in predicate_masks.items(): if numpy.alltrue((matrix == 0).flat): raise ValueError("Predicate %s is always false." % name) predicates_plus_scale = predicate_masks.copy() predicates_plus_scale[None] = self._instantaneous_mask if self._do_scaling: for (name, matrix) in predicate_masks.items(): if numpy.alltrue((matrix == self._instantaneous_mask).flat): raise ValueError("Predicate %s is always true." 
% name) if redundancyInPredicateMasks(predicate_masks): raise ValueError("Redundancy in predicates.") if redundancyInPredicateMasks(predicates_plus_scale): raise ValueError("Some combination of predicates is" " equivalent to the overall rate parameter.") else: if redundancyInPredicateMasks(predicate_masks): raise ValueError("Redundancy in predicates.") if redundancyInPredicateMasks(predicates_plus_scale): warnings.warn("do_scaling=True would be more efficient than" " these overly general predicates") self.predicate_masks = predicate_masks self.parameter_order = [] self.predicate_indices = [] self.symmetric = _isSymmetrical(self._instantaneous_mask) for pred in predicate_order: mask = predicate_masks[pred] if not _isSymmetrical(mask): self.symmetric = False indices = numpy.nonzero(mask) assert numpy.alltrue(mask[indices] == 1) self.parameter_order.append(pred) self.predicate_indices.append(indices) if not self.symmetric: warnings.warn('Model not reversible') (self.scale_masks, scale_order) = self._adaptPredicates(scales or []) self.checkParamsExist() def calcExchangeabilityMatrix(self, mprobs, *params): assert len(params) == len(self.predicate_indices), self.parameter_order R = self._instantaneous_mask_f.copy() for (indices, par) in zip(self.predicate_indices, params): R[indices] *= par return R def asciiArt(self, delim='', delim2='|', max_width=70): """An ASCII-art table representing the model. 'delim' delimits parameter names, 'delim2' delimits motifs""" # Should be implemented with table module instead. 
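The constructor above rejects overparameterised predicate sets via redundancyInPredicateMasks (defined near the top of the module), which stacks the flattened masks as rows and compares their count against the matrix rank obtained from an SVD. A self-contained sketch of the same computation, using two toy masks whose sum forms a redundant third:

```python
import numpy
from numpy.linalg import svd

def redundancy(masks):
    # nullity of the stacked, flattened masks: anything > 0 means some
    # predicate is a linear combination of the others
    if len(masks) <= 1:
        return 0
    eqns = 1.0 * numpy.array([list(m.flat) for m in masks])
    singular_values = svd(eqns)[1]
    rank = len([sv for sv in singular_values if abs(sv) > 1e-8])
    return len(masks) - rank

a = numpy.array([[0, 1], [0, 0]])
b = numpy.array([[0, 0], [1, 0]])
assert redundancy([a, b]) == 0          # independent predicates
assert redundancy([a, b, a + b]) == 1   # third mask is redundant
```

The rank test catches subtle cases, such as a set of predicates that jointly reproduce the overall rate parameter, not just literal duplicates.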
pars = self.getMatrixParams() par_names = self.getParamList() longest = max([len(name) for name in (par_names+[' '])]) if delim: all_names_len = _maxWidthIfTruncated(pars, delim, 100) min_names_len = _maxWidthIfTruncated(pars, delim, 1) else: all_names_len = sum([len(name) for name in par_names]) min_names_len = len(par_names) # Find a width-per-motif that is as big as can be without being too big w = min_names_len while (w+1) * len(self.alphabet) < max_width and w < all_names_len: w += 1 # If not enough width truncate parameter names if w < all_names_len: each = w / len(par_names) if delim: while _maxWidthIfTruncated(pars, delim, each+1) <= w: each += 1 w = _maxWidthIfTruncated(pars, delim, each) else: w = each * len(par_names) else: each = longest rows = [] # Only show header if there is enough width for the motifs if self.alphabet.getMotifLen() <= w: header = [str(motif).center(w) for motif in self.alphabet] header = [' ' * self.alphabet.getMotifLen() + ' '] + header + [''] header = delim2.join(header) rows.append(header) rows.append(''.join([['-',delim2][c == delim2] for c in header])) # pars in sub-cols, should also offer pars in sub-rows? for (motif, row2) in zip(self.alphabet, pars): row = [] for par_list in row2: elt = [] for par in par_names: if par not in par_list: par = '' par = par[:each] if not delim: par = par.ljust(each) if par: elt.append(par) elt = delim.join(elt).ljust(w) row.append(elt) rows.append(delim2.join(([motif+' '] + row + ['']))) return '\n'.join(rows) def getMatrixParams(self): """Return the parameter assignment matrix.""" dim = len(self.alphabet) Pars = numpy.zeros([dim, dim], object) for x, y in [(x, y) for x in range(dim) for y in range(dim)]: Pars[x][y] = [] # a limitation of numpy. [x,y] = [] fails! 
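The per-cell masks consulted in this loop originate from predicate2matrix at the top of the module, which evaluates a motif-pair predicate over every alphabet cell. A self-contained sketch of that construction with a hypothetical istransition-style predicate (the real models build predicates through cogent.evolve.predicate):

```python
import numpy

def predicate2matrix(alphabet, pred):
    # MxM integer matrix: entry [i, j] records whether pred holds
    # for the motif pair (alphabet[i], alphabet[j])
    m = len(alphabet)
    result = numpy.zeros([m, m], int)
    for i in range(m):
        for j in range(m):
            result[i, j] = pred(alphabet[i], alphabet[j])
    return result

purines, pyrimidines = set('ag'), set('ct')
def istransition(x, y):
    return x != y and ({x, y} <= purines or {x, y} <= pyrimidines)

dna = ('t', 'c', 'a', 'g')
mask = predicate2matrix(dna, istransition)
assert mask[dna.index('a'), dna.index('g')] == 1  # transition
assert mask[dna.index('a'), dna.index('c')] == 0  # transversion
```

Each named parameter ends up with one such mask, and the loop below just reads those masks back to report which parameters apply to each cell of Q.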
if not self._instantaneous_mask[x, y]: continue for par in self.predicate_masks: if self.predicate_masks[par][x, y]: Pars[x, y].append(par) # sort the matrix entry to facilitate scaling calculations Pars[x, y].sort() return Pars def getParamList(self): """Return a list of parameter names.""" return self.predicate_masks.keys() def isInstantaneous(self, x, y): return self._isInstantaneous(x, y) def getSubstitutionRateValueFromQ(self, Q, motif_probs, pred): pred_mask = self._adaptPredicates([pred])[0].values()[0] pred_row_totals = numpy.sum(pred_mask * Q, axis=1) inst_row_totals = numpy.sum(self._instantaneous_mask * Q, axis=1) r = sum(pred_row_totals * motif_probs) t = sum(inst_row_totals * motif_probs) pred_size = numpy.sum(pred_mask.flat) inst_size = sum(self._instantaneous_mask.flat) return (r / pred_size) / ((t-r) / (inst_size-pred_size)) def getScaledLengthsFromQ(self, Q, motif_probs, length): lengths = {} for rule in self.scale_masks: lengths[rule] = length * self.getScaleFromQs( [Q], [1.0], motif_probs, rule) return lengths def getScaleFromQs(self, Qs, bin_probs, motif_probss, rule): rule = self.getPredicateMask(rule) weighted_scale = 0.0 bin_probs = numpy.asarray(bin_probs) for (Q, bin_prob, motif_probs) in zip(Qs, bin_probs, motif_probss): row_totals = numpy.sum(rule * Q, axis=1) motif_probs = numpy.asarray(motif_probs) word_probs = self.calcWordProbs(motif_probs) scale = sum(row_totals * word_probs) weighted_scale += bin_prob * scale return weighted_scale def getPredefinedPredicates(self): # overridden in subclasses return {'indel': predicate.parse('-/?')} def getPredefinedPredicate(self, name): # Called by predicate parsing code if self._canned_predicates is None: self._canned_predicates = self.getPredefinedPredicates() return self._canned_predicates[name].interpret(self) def _adaptPredicates(self, rules): # dict or list of callables, predicate objects or predicate strings if isinstance(rules, dict): rules = rules.items() else: rules = [(None, rule) for 
rule in rules] predicate_masks = {} order = [] for (key, pred) in rules: (label, mask) = self.adaptPredicate(pred, key) if label in predicate_masks: raise KeyError('Duplicate predicate name "%s"' % label) predicate_masks[label] = mask order.append(label) return predicate_masks, order def adaptPredicate(self, pred, label=None): if isinstance(pred, str): pred = predicate.parse(pred) elif callable(pred): pred = predicate.UserPredicate(pred) pred_func = pred.makeModelPredicate(self) label = label or repr(pred) mask = predicate2matrix( self.getAlphabet(), pred_func, mask=self._instantaneous_mask) return (label, mask) def getPredicateMask(self, pred): if pred in self.scale_masks: mask = self.scale_masks[pred] elif pred in self.predicate_masks: mask = self.predicate_masks[pred] else: (label, mask) = self.adaptPredicate(pred) return mask class _Nucleotide(SubstitutionModel): def getPredefinedPredicates(self): return { 'transition' : predicate.parse('R/R') | predicate.parse('Y/Y'), 'transversion' : predicate.parse('R/Y'), 'indel': predicate.parse('-/?'), } class Nucleotide(_Nucleotide): """A nucleotide substitution model.""" def __init__(self, **kw): SubstitutionModel.__init__(self, moltype.DNA.Alphabet, **kw) class Dinucleotide(_Nucleotide): """A dinucleotide substitution model.""" def __init__(self, **kw): SubstitutionModel.__init__(self, moltype.DNA.Alphabet, motif_length=2, **kw) class Protein(SubstitutionModel): """Base protein substitution model.""" def __init__(self, with_selenocysteine=False, **kw): alph = moltype.PROTEIN.Alphabet if not with_selenocysteine: alph = alph.getSubset('U', excluded=True) SubstitutionModel.__init__(self, alph, **kw) def EmpiricalProteinMatrix(matrix, motif_probs=None, optimise_motif_probs=False, recode_gaps=True, do_scaling=True, **kw): alph = moltype.PROTEIN.Alphabet.getSubset('U', excluded=True) return Empirical(alph, rate_matrix=matrix, motif_probs=motif_probs, model_gaps=False, recode_gaps=recode_gaps, do_scaling=do_scaling, 
optimise_motif_probs=optimise_motif_probs, **kw) class Codon(_Nucleotide): """Core substitution model for codons""" long_indels_are_instantaneous = True def __init__(self, alphabet=None, gc=None, **kw): if gc is not None: alphabet = moltype.CodonAlphabet(gc = gc) alphabet = alphabet or moltype.STANDARD_CODON SubstitutionModel.__init__(self, alphabet, **kw) def _isInstantaneous(self, x, y): if x == self.gapmotif or y == self.gapmotif: return x != y else: ndiffs = sum([X!=Y for (X,Y) in zip(x,y)]) return ndiffs == 1 def getPredefinedPredicates(self): gc = self.getAlphabet().getGeneticCode() def silent(x, y): return x != '---' and y != '---' and gc[x] == gc[y] def replacement(x, y): return x != '---' and y != '---' and gc[x] != gc[y] preds = _Nucleotide.getPredefinedPredicates(self) preds.update({ 'indel' : predicate.parse('???/---'), 'silent' : predicate.UserPredicate(silent), 'replacement' : predicate.UserPredicate(replacement), }) return preds PyCogent-1.5.3/cogent/draw/__init__.py000644 000765 000024 00000001611 12024702176 020565 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = [ 'dendrogram', 'dotplot', 'linear', 'colors', 'TrackDefn', 'Display', 'DisplayPolicy', 'Area', 'Arrow', 'BluntArrow', 'Box', 'Diamond'] + [ 'arrow_rates', 'dinuc', 'fancy_arrow', 'codon_usage','util'] __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight", "Zongzhi Liu", "Matthew Wakefield", "Stephanie Wilson"] __license__ = "GPL" __version__ = "1.5.3" __status__ = "Production" from cogent.draw.linear import (colors, TrackDefn, Display, DisplayPolicy, Area, Arrow, BluntArrow, Box, Diamond) try: import cogent.draw.matplotlib except ImportError: pass else: import warnings warnings.warn("You still have a cogent/draw/matplotlib subpackage" + " present, that will cause problems for matplotlib imports") PyCogent-1.5.3/cogent/draw/arrow_rates.py000644 000765 000024 00000022443 12024702176 021364 
0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from matplotlib import use, rc use('Agg') #suppress graphical rendering from pylab import rc, gcf, xlim, ylim, xticks, yticks, sqrt, text, clip, gca, \ array, dot, ravel, draw, show, savefig from fancy_arrow import arrow """Draws arrow plots representing rate matrices. Note: currently requires dict of dinuc freqs, but should modify to work with Rates objects from seqsim. Based on graphical displays by Noboru Sueoka. """ __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" rc('text', usetex=True) rates_to_bases={'r1':'AT', 'r2':'TA', 'r3':'GA','r4':'AG','r5':'CA','r6':'AC', \ 'r7':'GT', 'r8':'TG', 'r9':'CT','r10':'TC','r11':'GC','r12':'CG'} numbered_bases_to_rates = dict([(v,k) for k, v in rates_to_bases.items()]) lettered_bases_to_rates = dict([(v, 'r'+v) for k, v in rates_to_bases.items()]) def add_dicts(d1, d2): """Adds two dicts and returns the result.""" result = d1.copy() result.update(d2) return result def make_arrow_plot(data, size=4, display='length', shape='right', \ max_arrow_width=0.03, arrow_sep = 0.02, alpha=0.5, \ normalize_data=False, ec=None, labelcolor=None, \ head_starts_at_zero=True, rate_labels=lettered_bases_to_rates,\ graph_name=None, \ **kwargs): """Makes an arrow plot. Parameters: data: dict with probabilities for the bases and pair transitions. size: size of the graph in inches. display: 'length', 'width', or 'alpha' for arrow property to change. shape: 'full', 'left', or 'right' for full or half arrows. max_arrow_width: maximum width of an arrow, data coordinates. arrow_sep: separation between arrows in a pair, data coordinates. alpha: maximum opacity of arrows, default 0.5. **kwargs can be anything allowed by an Arrow object, e.g. linewidth and edgecolor. 
""" xlim(-0.5,1.5) ylim(-0.5,1.5) gcf().set_size_inches(size,size) xticks([]) yticks([]) max_text_size = size*12 min_text_size = size label_text_size = size*2.5 text_params={'ha':'center', 'va':'center', 'family':'sans-serif',\ 'fontweight':'bold'} r2 = sqrt(2) deltas = {\ 'AT':(1,0), 'TA':(-1,0), 'GA':(0,1), 'AG':(0,-1), 'CA':(-1/r2, 1/r2), 'AC':(1/r2, -1/r2), 'GT':(1/r2, 1/r2), 'TG':(-1/r2,-1/r2), 'CT':(0,1), 'TC':(0,-1), 'GC':(1,0), 'CG':(-1,0) } colors = {\ 'AT':'r', 'TA':'k', 'GA':'g', 'AG':'r', 'CA':'b', 'AC':'r', 'GT':'g', 'TG':'k', 'CT':'b', 'TC':'k', 'GC':'g', 'CG':'b' } label_positions = {\ 'AT':'center', 'TA':'center', 'GA':'center', 'AG':'center', 'CA':'left', 'AC':'left', 'GT':'left', 'TG':'left', 'CT':'center', 'TC':'center', 'GC':'center', 'CG':'center' } def do_fontsize(k): return float(clip(max_text_size*sqrt(data[k]),\ min_text_size,max_text_size)) A = text(0,1, '$A_3$', color='r', size=do_fontsize('A'), **text_params) T = text(1,1, '$T_3$', color='k', size=do_fontsize('T'), **text_params) G = text(0,0, '$G_3$', color='g', size=do_fontsize('G'), **text_params) C = text(1,0, '$C_3$', color='b', size=do_fontsize('C'), **text_params) arrow_h_offset = 0.25 #data coordinates, empirically determined max_arrow_length = 1 - 2*arrow_h_offset max_arrow_width = max_arrow_width max_head_width = 2.5*max_arrow_width max_head_length = 2*max_arrow_width arrow_params={'length_includes_head':True, 'shape':shape, \ 'head_starts_at_zero':head_starts_at_zero} ax = gca() sf = 0.6 #max arrow size represents this in data coords d = (r2/2 + arrow_h_offset - 0.5)/r2 #distance for diags r2v = arrow_sep/r2 #offset for diags #tuple of x, y for start position positions = {\ 'AT': (arrow_h_offset, 1+arrow_sep), 'TA': (1-arrow_h_offset, 1-arrow_sep), 'GA': (-arrow_sep, arrow_h_offset), 'AG': (arrow_sep, 1-arrow_h_offset), 'CA': (1-d-r2v, d-r2v), 'AC': (d+r2v, 1-d+r2v), 'GT': (d-r2v, d+r2v), 'TG': (1-d+r2v, 1-d-r2v), 'CT': (1-arrow_sep, arrow_h_offset), 'TC': (1+arrow_sep, 
            1-arrow_h_offset),
        'GC': (arrow_h_offset, arrow_sep),
        'CG': (1-arrow_h_offset, -arrow_sep),
        }

    if normalize_data:
        #find maximum value for rates, i.e. where keys are 2 chars long
        max_val = 0
        for k, v in data.items():
            if len(k) == 2:
                max_val = max(max_val, v)
        #divide rates by max val, multiply by arrow scale factor
        for k, v in data.items():
            data[k] = v/max_val*sf

    def draw_arrow(pair, alpha=alpha, ec=ec, labelcolor=labelcolor):
        #set the length of the arrow
        if display == 'length':
            length = max_head_length+(max_arrow_length-max_head_length)*\
                data[pair]/sf
        else:
            length = max_arrow_length
        #set the transparency of the arrow
        if display == 'alpha':
            alpha = min(data[pair]/sf, alpha)
        #set the width of the arrow
        if display == 'width':
            scale = data[pair]/sf
            width = max_arrow_width*scale
            head_width = max_head_width*scale
            head_length = max_head_length*scale
        else:
            width = max_arrow_width
            head_width = max_head_width
            head_length = max_head_length
        fc = colors[pair]
        ec = ec or fc
        x_scale, y_scale = deltas[pair]
        x_pos, y_pos = positions[pair]
        arrow(ax, x_pos, y_pos, x_scale*length, y_scale*length, \
            fc=fc, ec=ec, alpha=alpha, width=width, head_width=head_width, \
            head_length=head_length, **arrow_params)

        #figure out coordinates for text
        #if drawing relative to base: x and y are same as for arrow
        #dx and dy are one arrow width left and up
        #need to rotate based on direction of arrow, use x_scale and y_scale
        #as sin x and cos x?
sx, cx = y_scale, x_scale alo = arrow_label_offset = 3.5*max_arrow_width where = label_positions[pair] if where == 'left': orig_position = array([[alo, alo]]) elif where == 'absolute': orig_position = array([[max_arrow_length/2.0, alo]]) elif where == 'right': orig_position = array([[length-alo, alo]]) elif where == 'center': orig_position = array([[length/2.0, alo]]) else: raise ValueError, "Got unknown position parameter %s" % where M = array([[cx, sx],[-sx,cx]]) coords = dot(orig_position, M) + [[x_pos, y_pos]] x, y = ravel(coords) orig_label = rate_labels[pair] label = '$%s_{_{\mathrm{%s}}}$' % (orig_label[0], orig_label[1:]) text(x, y, label, size=label_text_size, ha='center', va='center', \ color=labelcolor or fc) for p in positions.keys(): draw_arrow(p) if graph_name is not None: savefig(graph_name) #test data all_on_max = dict([(i, 1) for i in 'TCAG'] + \ [(i+j, 0.6) for i in 'TCAG' for j in 'TCAG']) realistic_data = { 'A':0.4, 'T':0.3, 'G':0.5, 'C':0.2, 'AT':0.4, 'AC':0.3, 'AG':0.2, 'TA':0.2, 'TC':0.3, 'TG':0.4, 'CT':0.2, 'CG':0.3, 'CA':0.2, 'GA':0.1, 'GT':0.4, 'GC':0.1, } extreme_data = { 'A':0.75, 'T':0.10, 'G':0.10, 'C':0.05, 'AT':0.6, 'AC':0.3, 'AG':0.1, 'TA':0.02, 'TC':0.3, 'TG':0.01, 'CT':0.2, 'CG':0.5, 'CA':0.2, 'GA':0.1, 'GT':0.4, 'GC':0.2, } sample_data = { 'A':0.2137, 'T':0.3541, 'G':0.1946, 'C':0.2376, 'AT':0.0228, 'AC':0.0684, 'AG':0.2056, 'TA':0.0315, 'TC':0.0629, 'TG':0.0315, 'CT':0.1355, 'CG':0.0401, 'CA':0.0703, 'GA':0.1824, 'GT':0.0387, 'GC':0.1106, } if __name__ == '__main__': from sys import argv if len(argv) > 1: if argv[1] == 'full': d = all_on_max scaled = False elif argv[1] == 'extreme': d = extreme_data scaled = False elif argv[1] == 'realistic': d = realistic_data scaled = False elif argv[1] == 'sample': d = sample_data scaled = True else: d = all_on_max scaled=False if len(argv) > 2: display = argv[2] else: display = 'length' size = 4 gcf().set_size_inches(size,size) make_arrow_plot(d, display=display, linewidth=0.001, 
        edgecolor=None, normalize_data=scaled, head_starts_at_zero=True,
        size=size, graph_name='arrows.png')

PyCogent-1.5.3/cogent/draw/codon_usage.py

#!/usr/bin/env python
"""Provides different kinds of codon usage plots.

See individual docstrings for more info.
"""
from matplotlib import use, rc
use('Agg') #suppress graphical rendering
rc('text', usetex=True)
rc('font', family='serif') #required to match latex text and equations
from cogent.core.usage import UnsafeCodonUsage as CodonUsage
from cogent.draw.util import scatter_classic, \
    init_graph_display, init_ticks, set_axis_to_probs, \
    broadcast, plot_scatter, plot_filled_contour, \
    plot_contour_lines, standard_series_colors
from pylab import plot, savefig, gca, text, figlegend

__author__ = "Stephanie Wilson"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Stephanie Wilson"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

#module-level constants
#historical doublet order for fingerprint plot; not currently used, but
#same order that the colors were entered in. Matches Sueoka 2002.
doublet_order = ['GC','CG','GG','CU','CC','UC','AC','GU','UU','CA','AU',\ 'AA','AG','GA','UA','UG'] color_order = ["#000000","#FF0000","#00FF00","#FFFF00", "#CC99FF","#FFCC99","#CCFFFF","#C0C0C0", "#6D6D6D","#2353FF","#00FFFF","#FF8800", "#238853","#882353","#EC008C","#000099"] #map doublets to colors so we can make sure the same doublet always #gets the same colors doublets_to_colors = dict(zip(doublet_order, color_order)) #creates a dictionary for the amino acid labels, less to input aa_labels={'ALANINE':'GCN', 'ARGININE4':'CGN', 'GLYCINE':'GGN', 'LEUCINE4':'CTN', 'PROLINE':'CCN', 'SERINE4':'TCN', 'THREONINE':'ACN', 'VALINE':'GTN'} standard_series_colors=['k','r','g','b', 'm','c'] #scatterplot functions and helpers def plot_cai_p3_scatter(data, graph_name='cai_p3_scat.png', **kwargs): """Outputs a CAI vs P3 scatter plot. expects data as ([P3s_1, CAIs_1, P3s_2, CAIs_2, ...]) """ plot_scatter(data, graph_shape='sqr', graph_grid=None,\ x_label="$P_3$",y_label="CAI", prob_axes=True,**kwargs) savefig(graph_name) def plot_p12_p3(data, graph_name='p12_p3.png', **kwargs): """Outputs a P12 versus P3 scatter graph, optionally including regression. expects data as [P3_1, P12_1, P3_2, P12_2, ...n ]. 
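The comment above explains that doublets are mapped to fixed colors so that the same doublet always renders in the same color regardless of which blocks a given plot iterates over. As a standalone sanity check (the two constant lists are copied verbatim from the module; this snippet is illustrative and not part of PyCogent itself):

```python
# Constants copied from cogent.draw.codon_usage above.
doublet_order = ['GC','CG','GG','CU','CC','UC','AC','GU','UU','CA','AU',
                 'AA','AG','GA','UA','UG']
color_order = ["#000000","#FF0000","#00FF00","#FFFF00",
               "#CC99FF","#FFCC99","#CCFFFF","#C0C0C0",
               "#6D6D6D","#2353FF","#00FFFF","#FF8800",
               "#238853","#882353","#EC008C","#000099"]

# Pair each doublet with its historical color once, up front, so every
# later lookup is order-independent.
doublets_to_colors = dict(zip(doublet_order, color_order))

# All 16 doublets get exactly one fixed color.
assert len(doublets_to_colors) == 16
assert doublets_to_colors['GC'] == '#000000'
assert doublets_to_colors['UG'] == '#000099'
```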
""" plot_scatter(data, graph_shape='sqr', graph_grid='/',\ x_label="$P_3$",y_label="$P_{12}$", prob_axes=True, **kwargs) savefig(graph_name) def plot_p123_gc(data, graph_name='p123_gc.png', use_p3_as_x=False, **kwargs): """Output a scatter plot of p1,p2,p3 vs gc content Expects data as array with rows as GC, P1, P2, P3 p1=blue, p2=green, p3=red """ #unpack common x axis, and decide on series names if use_p3_as_x: series_names = ['$P_1$', '$P_2$'] colors=['b','g'] x_label='$P_3$' y_label='$P_{12}$' xy_pairs = [data[3], data[1], data[3], data[2]] else: series_names = ['$P_1$', '$P_2$', '$P_3$'] colors=['b','g','r'] x_label='GC' y_label='$P_{123}$' xy_pairs = [data[0], data[1], data[0], data[2], data[0], data[3]] #plot points and write graph plot_scatter(xy_pairs, graph_grid='/',x_label=x_label,y_label=y_label, series_names=series_names, prob_axes=True, **kwargs) savefig(graph_name) def plot_fingerprint(data, alpha=0.7, \ show_legend=True, graph_name='fingerprint.png', has_mean=True, which_blocks='quartets', multiple=False, graph_grid='t', prob_axes=True, \ edge_colors='k', **kwargs): """Outputs a bubble plot of four-codon amino acid blocks labeled with the colors from Sueoka 2002. takes: data: array-elements in the col order x, y, r of each of the four codon Amino Acids in the row order: ALA, ARG4, GLY, LEU4, PRO, SER, THR, VAL (for traditional fingerprint), or: UU -> GG (for 16-block fingerprint). last row is the mean (if has_mean is set True) **kwargs passed on to init_graph_display (these include graph_shape, graph_grid, x_label, y_label, dark, with_parens). title: will be printed on graph (default: 'Unknown Species') num_genes (number of genes contributing to graph: default None) NOTE: will not print if None.) 
size: of graph in inches (default = 8.0) alpha: transparency of bubbles (ranges from 0, transparent, to 1, opaque; default 0.7) show_legend: bool, default True, whether to print legend graph_name: name of file to write (default 'fingerprint.png') has_mean: whether the data contain the mean (default: True) which_blocks: which codon blocks to print (default is 'quartets' for the 4-codon amino acid blocks, but can also use 'all' for all quartets or 'split' for just the split quartets.) multiple: if False (the default), assumes it got a single block of data. Otherwise, assumes multiple blocks of data in a list or array. edge_colors: if multiple is True (ignored otherwise), uses this sequence of edge color strings to hand out edge colors to successive series. Will iterate over this, so can be a string of 1-letter color codes or a list of color names. note: that the data are always expected to be in the range (0,1) since we're plotting frequencies. axes, gid, etc. are hard-coded to these values. 
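`plot_fingerprint` above broadcasts a scalar `alpha` and the `edge_colors` sequence across however many data series were passed in. The real helper is `cogent.draw.util.broadcast`, which is imported rather than shown here; a hypothetical minimal stand-in with the behavior the docstring describes (repeat a scalar, or cycle a short sequence such as a string of 1-letter color codes) might look like:

```python
def broadcast(value, n):
    # Hypothetical stand-in for cogent.draw.util.broadcast: give each of
    # the n data series one value, repeating a scalar or cycling a sequence.
    if isinstance(value, (int, float)):
        return [value] * n
    return [value[i % len(value)] for i in range(n)]

# One alpha shared by all series; color codes handed out per series,
# wrapping around when there are more series than codes.
assert broadcast(0.7, 3) == [0.7, 0.7, 0.7]
assert broadcast('krb', 4) == ['k', 'r', 'b', 'k']
```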
""" #figure out which type of fingerprint plot we're doing, and get the #right colors if which_blocks == 'quartets': blocks = CodonUsage.SingleAABlocks elif which_blocks == 'split': blocks = CodonUsage.SplitBlocks else: blocks = CodonUsage.Blocks colors = [doublets_to_colors[i] for i in blocks] #formatting the labels in latex x_label="$G_3/(G_3+C_3)$" y_label="$A_3/(A_3+T_3)$" #initializing components of the graph font,label_font_size=init_graph_display(graph_shape='sqr', \ graph_grid=graph_grid, x_label=x_label, \ y_label=y_label, prob_axes=prob_axes, **kwargs) if not multiple: data = [data] alpha = broadcast(alpha, len(data)) edge_colors = broadcast(edge_colors, len(data)) for al, d, edge_color in zip(alpha, data, edge_colors): #skip this series if no data if d is None or not d.any(): continue for i, color in enumerate(colors): j = i+1 #note: doing these as slices because scatter_classic needs the #extra level of nesting patches = scatter_classic(d[i:j,0], d[i:j,1], s=(d[i:j,2]/2), c=color) #set alpha for the patches manually for p in patches: p.set_alpha(al) p.set_edgecolor(edge_color) #plot mean as its own point -- can't do cross with scatter if has_mean: mean_index = len(blocks) #next index after the blocks plot([d[mean_index,0]], [d[mean_index,1]], '-k+',markersize=label_font_size, alpha=al) abbrev = CodonUsage.BlockAbbreviations a = gca() #if show_legend is True prints a legend in the right center area if show_legend: legend_key = [abbrev[b] for b in blocks] #copy legend font properties from the x axis tick labels legend_font_props = \ a.xaxis.get_label().get_fontproperties().copy() legend_font_scale_factor = 0.7 curr_size = legend_font_props.get_size() legend_font_props.set_size(curr_size*legend_font_scale_factor) l = figlegend(a.patches[:len(blocks)], legend_key, prop=legend_font_props, loc='center right',borderpad=0.1,labelspacing=0.5, handlelength=1.0,handletextpad=0.5, borderaxespad=0.0) #fix transparency of patches for p in l.get_patches(): 
p.set_alpha(1) #initialize the ticks set_axis_to_probs() init_ticks(a, label_font_size) a.set_xticks([0, 0.5, 1]) a.set_yticks([0,0.5,1]) #output the figure if graph_name is not None: savefig(graph_name) #Contour plots and related functions def plot_cai_p3_contour(x_bin,y_bin,data,xy_data, graph_name='cai_contour.png', prob_axes=True, **kwargs): """Output a contour plot of cai vs p3 with colorbar on side takes: x_bin, y_bin, data (data matrix) label (default 'Unknown Species') num_genes (default 0 will not print, other numbers will) size: of graph in inches (default = 8.0) graph_name: default 'cai_contour.png' """ plot_data =[(x_bin,y_bin,data)] plot_filled_contour(plot_data, graph_grid='/',x_label="$P_3$", \ y_label="CAI", prob_axes=prob_axes, **kwargs) set_axis_to_probs() if graph_name is not None: savefig(graph_name) def plot_cai_p3_contourlines(x_bin,y_bin,data,xy_data, graph_name='cai_contourlines.png', prob_axes=True, **kwargs): """Output a contour plot of cai takes: x_bin, y_bin, data (data matrix) label (default 'Unknown Species') num_genes (default 0 will not print, other numbers will) size: of graph in inches (default = 8.0) graph_name: default 'cai_contourlines.png' """ plot_data =[(x_bin,y_bin,data)] plot_contour_lines(plot_data, graph_grid='/', x_label="$P_3$", \ y_label="CAI", prob_axes=prob_axes,**kwargs) if graph_name is not None: savefig(graph_name) def plot_p12_p3_contour(x_bin,y_bin,data,xy_data, graph_name='p12_p3_contour.png', prob_axes=True, **kwargs): """Outputs a P12 versus P3 contour graph and the mean equation of the plot takes: x_bin, y_bin, data (data matrix) label (default 'Unknown Species') num_genes (default 0 will not print, other numbers will) size: of graph in inches (default = 8.0) graph_name: default 'p12_p3_contourlines.png' """ plot_data =[(x_bin,y_bin,data)] plot_filled_contour(plot_data, graph_grid='/', x_label="$P_3$", \ y_label="$P_{12}$", prob_axes=prob_axes,**kwargs) set_axis_to_probs() if graph_name is not None: 
        savefig(graph_name)

def plot_p12_p3_contourlines(x_bin,y_bin,data,xy_data, prob_axes=True,\
    graph_name='p12_p3_contourlines.png', **kwargs):
    """Outputs a P12 versus P3 contourline graph and the mean equation of the plot

    takes: x_bin, y_bin, data (data matrix)
           label (default 'Unknown Species')
           num_genes (default 0 will not print, other numbers will)
           size: of graph in inches (default = 8.0)
           graph_name: default 'p12_p3_contourlines.png'
    """
    plot_data =[(x_bin,y_bin,data)]
    plot_contour_lines(plot_data, graph_grid='/', x_label="$P_3$",\
        y_label="$P_{12}$", prob_axes=prob_axes, **kwargs)
    set_axis_to_probs()
    if graph_name is not None:
        savefig(graph_name)

#Other graphs
def plot_pr2_bias(data, title='ALANINE', graph_name='pr2_bias.png', \
    num_genes='ignored', **kwargs):
    """Outputs a PR2-Bias plot of:
       -isotypic transversions (base swapping) with G3/(G3+C3) and A3/(A3+T3)
       -Transitions (deaminations) with G3/(G3+A3) and C3/(C3+T3)
       -Allotypic transversions (G- oxidations) with G3/(G3+T3) and C3/(C3+A3)

    takes: an array in the order:
           x, G3/(G3+C3), A3/(A3+T3), G3/(G3+A3), C3/(C3+T3), G3/(G3+T3),
           C3/(C3+A3)

           label: default 'ALANINE' one amino acid written out in caps:
           ALANINE, ARGININE4, GLYCINE, LEUCINE4, PROLINE, SERINE4,
           THREONINE, VALINE

           from one of the amino acids the program will add the acronym
           C2 type: ala(GCN), pro(CCN), ser4(TCN), thr(ACN)
           G2 type: arg4(CGN), and gly(GGN)
           T2 type: leu4(CTN), val(GTN)

           size: of graph in inches (default = 8.0)
           graph_name: default 'pr2_bias.png'
           num_genes: number of genes contributing to graph, currently ignored.
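The six PR2-bias ordinates listed in the `plot_pr2_bias` docstring above are simple ratios of third-codon-position counts. A small illustrative sketch of how they could be computed (`pr2_ratios` is a hypothetical helper, not part of the module, which expects raw third-position counts):

```python
def pr2_ratios(G3, C3, A3, T3):
    # The six PR2-bias ordinates described in the docstring above:
    # isotypic transversions, transitions, and allotypic transversions.
    return {
        'G/(G+C)': G3 / (G3 + C3),
        'A/(A+T)': A3 / (A3 + T3),
        'G/(G+A)': G3 / (G3 + A3),
        'C/(C+T)': C3 / (C3 + T3),
        'G/(G+T)': G3 / (G3 + T3),
        'C/(C+A)': C3 / (C3 + A3),
    }

r = pr2_ratios(G3=30, C3=20, A3=25, T3=25)
assert len(r) == 6
assert r['G/(G+C)'] == 0.6   # 30 / (30 + 20)
assert r['A/(A+T)'] == 0.5   # 25 / (25 + 25)
```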
""" #we can't put anything in the top right, so print num_genes after the title #if it was supplied #initializes the graph display and font font,label_font_size=init_graph_display(graph_shape='sqr', \ graph_grid='/', x_label="$P_3$", y_label="Y axis", prob_axes=True, \ title=title, **kwargs) #sets the marker_size relative to the font and thus the graph size marker_size = (label_font_size-1) #plots the pr2bias in order G3/(G3+C3),A3/(A3+T3), # G3/(G3/A3),C3/(C3+T3), # G3/(G3+T3),C3/(C3+A3) #colors and symbols coded from Sueoka 2002 plot(data[:,0], data[:,1], '-ko', c='k', markersize=marker_size) plot(data[:,0], data[:,2], '-kv', c='k', markersize=marker_size) plot(data[:,0], data[:,3], '-ro', c='r', markersize=marker_size) plot(data[:,0], data[:,4], '-rv', c='r', markersize=marker_size) plot(data[:,0], data[:,5], '-wo', c='k', mfc='w', markersize=marker_size) plot(data[:,0], data[:,6], '-wv', c='k', mfc='w', markersize=marker_size) #aaLabel based on the amino acid that is graphed #C2 type: ala(GCN), pro(CCN), ser4(TCN), thr(ACN) #G2 type: arg4 (CGN), an gly(GGN) #T2 type: leu4(CTN), val (GTN) (Sueoka 2002) text(.95, .05, aa_labels[title], font, verticalalignment='bottom', horizontalalignment='right') #output the figure set_axis_to_probs() if graph_name is not None: savefig(graph_name) PyCogent-1.5.3/cogent/draw/compatibility.py000644 000765 000024 00000046354 12024702176 021714 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Visualisation of phylogenetic compatibility within an alignment. Jakobsen & Easteal, CABIOS 12(4), 1996 Jakobsen, Wilson & Easteal, Mol. Biol. Evol. 
14(5), 1997 """ from __future__ import division import sys import math import numpy import numpy.random import operator import matplotlib.pyplot as plt import matplotlib.ticker import matplotlib.colors from cogent.draw.linear import Display __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" def order_to_cluster_similar(S, elts=None, start=None): """Order so as to keep the most similar parts adjacent to each other S is expected to be a square matrix, and the returned list of ordinals is len(S) long.""" position = {} unavailable = set() if elts is not None: if len(elts) < 2: return elts elts = set(elts) for x in range(len(S)): if x != start and x not in elts: unavailable.add(x) if start is not None: position[start] = (False, [start]) similarity = [numpy.unravel_index(p, S.shape) for p in numpy.argsort(S, axis=None)] for (x, y) in similarity[::-1]: if x==y or x in unavailable or y in unavailable: continue (x_end, x_run) = position.get(x, (None, [x])) (y_end, y_run) = position.get(y, (None, [y])) if x_run is y_run: continue if x_end is not None: if not x_end: x_run.reverse() unavailable.add(x) if y_end is not None: if y_end: y_run.reverse() unavailable.add(y) run = x_run + y_run position[run[0]] = (False, run) position[run[-1]] = (True, run) if start is not None: if run[-1] == start: run.reverse() run = run[1:] return run def tied_segments(scores): """(start, end) of each run of equal values in scores >>> tied_segments([1,1,1,2]) [(0, 3), (3, 4)] """ pos = numpy.flatnonzero(numpy.diff(scores)) pos = numpy.concatenate(([0],pos+1,[len(scores)])) return zip(pos[:-1],pos[1:]) def order_tied_to_cluster_similar(S, scores): """Use similarity measure S to make similar elements adjacent, but only to break ties in the primary order defined by the list of scores""" assert S.shape == 
(len(scores), len(scores)) new_order = [] start = None for (a,b) in tied_segments(scores): useful = range(a,b) if start is not None: useful.append(start) start = len(useful)-1 useful = numpy.array(useful) S2 = S[useful,:] S2 = S2[:,useful] sub_order = order_to_cluster_similar(S2, range(b-a), start) new_order.extend([useful[i] for i in sub_order]) start = new_order[-1] assert set(new_order) == set(range(len(scores))) return new_order def bit_encode(x, _bool2num=numpy.array(["0","1"]).take): """Convert a boolean array into an integer""" return int(_bool2num(x).tostring(), 2) def bit_decode(x, numseqs): """Convert an integer into a boolean array""" result = numpy.empty([numseqs], bool) bit = 1 << (numseqs-1) for i in range(numseqs): result[i] = bit & x bit >>= 1 return result def binary_partitions(alignment): """Returns (sites, columns, partitions) sites[informative column number] = alignment position number columns[informative column number] = distinct partition number partitions[distinct partition number] = (partition, mask) as ints """ sites = [] columns = [] partitions = [] partition_index = {} for (site, column) in enumerate(alignment.Positions): column = numpy.array(column) (A, T, C, G, R, Y, W, S, U) = [ bit_encode(column == N) for N in "ATCGRYWSU"] T |= U for split in [([A, G, R], [C, T, Y]), ([A, T, W], [G, C, S])]: halves = [] for char_group in split: X = reduce(operator.or_, char_group) if not (X & (X - 1)): break # fewer than 2 bits set in X halves.append(X) else: (X, Z) = sorted(halves) partition = (X,X|Z) if partition not in partition_index: partition_index[partition] = len(partitions) partitions.append(partition) sites.append(site) columns.append(partition_index[partition]) break # if R/Y split OK no need to consider W/S split. return (sites, columns, partitions) def min_edges(columns): """Given two boolean arrays each representing an informative alignment position, there are 4 possible combinations for each sequence: TT, TF, FT and FF. 
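The `bit_encode`/`bit_decode` pair above packs a boolean array describing one half of a binary partition into a plain integer (MSB first) and unpacks it again. A standalone Python 3 sketch of the same round trip, reimplemented without numpy purely for illustration (not the module's own implementation):

```python
def bit_encode(bits):
    # Pack a sequence of booleans into one integer, first element as
    # the most significant bit.
    return int(''.join('1' if b else '0' for b in bits), 2)

def bit_decode(x, n):
    # Unpack integer x back into a list of n booleans, MSB first.
    return [bool(x & (1 << (n - 1 - i))) for i in range(n)]

bits = [True, False, True, True]
code = bit_encode(bits)
assert code == 0b1011
assert bit_decode(code, 4) == bits
```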
    If N of these 4 possibilities are found then there must be at least
    N-1 tree edges on which mutations occurred.

    As a special case, the diagonal values are set to 0 rather than, as
    theory suggests, 1. This is simply a convenience for later drawing code"""
    N = len(columns)
    result = numpy.zeros([N, N], int)
    for i in range(0, N-1):
        (a, mask_a) = columns[i]
        for j in range(i+1, N):
            (b, mask_b) = columns[j]
            mask = mask_a & mask_b
            (na, nb) = (~a, ~b)
            combos = [c & mask for c in [a&b, a&nb, na&b, na&nb]]
            combos = [c for c in combos if c]
            result[i,j] = result[j,i] = len(combos) - 1
    return result

def neighbour_similarity_score(matrix):
    left = matrix[:-1]
    right = matrix[1:]
    upper = matrix[:,:-1]
    lower = matrix[:,1:]
    same = (lower == upper).sum() + (left == right).sum()
    neighbours = numpy.product(left.shape)+numpy.product(upper.shape)
    return same / neighbours

def shuffled(matrix):
    assert matrix.shape == (len(matrix), len(matrix)), matrix.shape
    index = numpy.random.permutation(numpy.arange(len(matrix)))
    return matrix[index,:][:,index]

def nss_significance(matrix, samples=10000):
    score = neighbour_similarity_score(matrix)
    scores = numpy.empty([samples])
    for i in range(samples):
        s = neighbour_similarity_score(shuffled(matrix))
        scores[i] = s
    scores.sort()
    p = (samples-scores.searchsorted(score)+1) / samples
    return (score, sum(scores)/samples, p)

def inter_region_average(a):
    return a.sum()/numpy.product(a.shape)

def intra_region_average(a):
    d = numpy.diag(a)  # ignore the diagonal
    return (a.sum()-d.sum())/(numpy.product(a.shape)-len(d))

def integer_tick_label(sites):
    def _formatfunc(x, pos, _sites=sites, _n=len(sites)):
        if 0 < x < _n:
            return str(_sites[int(x)])
        else:
            return ""
    return _formatfunc

def boolean_similarity(matrix):
    # same as numpy.equal.outer(matrix, matrix).trace(axis1=1, axis2=3)
    # but that would use much memory
    true = matrix.T.astype(int)
    false = (~matrix).T.astype(int)
    both_true = numpy.inner(true, true)
    both_false = numpy.inner(false, false)
    return both_true +
both_false def partimatrix(alignment, display=False, samples=0, s_limit=0, title="", include_incomplete=False, print_stats=True, max_site_labels=50): if print_stats: print "%s sequences in %s bp alignment" % ( alignment.getNumSeqs(), len(alignment)) (sites, columns, partitions) = binary_partitions(alignment) if print_stats: print "%s unique binary partitions from %s informative sites" % ( len(partitions), len(sites)) partpart = min_edges(partitions) # [partition,partition] partimatrix = partpart[columns,:] # [site, partition] sitematrix = partimatrix[:,columns] # [site, site] # RETICULATE, JE 1996 compatiblity = sitematrix <= 2 if print_stats: print "Overall compatibility %.6f" % intra_region_average(compatiblity) if samples == 0: print "Neighbour similarity score = %.6f" % \ neighbour_similarity_score(compatiblity) else: print "Neighbour similarity = %.6f, avg random = %.6f, p < %s" % \ nss_significance(compatiblity, samples=samples) # PARTIMATRIX, JWE 1997 # Remove the incomplete partitions with gaps or other ambiguities mask = 2**alignment.getNumSeqs()-1 complete = [i for (i,(x, xz)) in enumerate(partitions) if xz==mask] if not include_incomplete: partimatrix = partimatrix[:,complete] partitions = [partitions[i] for i in complete] # For scoring/ordering purposes, also remove the incomplete sequences complete_columns = [i for (i,c) in enumerate(columns) if c in complete] scoreable_partimatrix = partimatrix[complete_columns, :] # Order partitions by increasing conflict score conflict = (scoreable_partimatrix > 2).sum(axis=0) conflict_order = numpy.argsort(conflict) partimatrix = partimatrix[:, conflict_order] partitions = [partitions[i] for i in conflict_order] scoreable_partimatrix = partimatrix[complete_columns, :] support = (scoreable_partimatrix == 0).sum(axis=0) consist = (scoreable_partimatrix <= 2).sum(axis=0) conflict = (scoreable_partimatrix > 2).sum(axis=0) # Similarity measure between partitions O = boolean_similarity(scoreable_partimatrix <= 2) s = 
1.0*len(complete_columns) O = O.astype(float) / s p,q = consist/s, conflict/s E = numpy.outer(p,p) + numpy.outer(q,q) S = (O-E)/numpy.sqrt(E*(1-E)/s) # Order partitions for better visual grouping if "order_by_conflict": order = order_tied_to_cluster_similar(S, conflict) else: order = order_to_cluster_similar(S) half = len(order) // 2 if sum(conflict[order[:half]]) > sum(conflict[order[half:]]): order.reverse() partimatrix = partimatrix[:, order] conflict = conflict[order] support = support[order] partitions = [partitions[i] for i in order] if display: figwidth = 8.0 (c_size, p_size) = partimatrix.shape s_size = num_seqs = alignment.getNumSeqs() # Layout (including figure height) chosen to get aspect ratio of # 1.0 for the compatibility matrix, and if possible the other # matrices. if s_size > s_limit: # too many species to show s_size = 0 else: # distort squares to give enough space for species names extra = max(1.0, (12/80)/(figwidth/(c_size + p_size))) p_size *= numpy.sqrt(extra) s_size *= extra genemap = Display(alignment, recursive=s_size>0, colour_sequences=False, draw_bases=False) annot_width = max(genemap.height / 80, 0.1) figwidth = max(figwidth, figwidth/2 + annot_width) bar_height = 0.5 link_width = 0.3 x_margin = 0.60 y_margin = 0.35 xpad = 0.05 ypad = 0.2 (x, y) = (c_size + p_size, c_size + s_size) x_scale = y_scale = (figwidth-2*x_margin-xpad-link_width-annot_width)/x figheight = y_scale * y + 2*y_margin + 2*ypad + bar_height x_scale /= figwidth y_scale /= figheight x_margin /= figwidth y_margin /= figheight xpad /= figwidth ypad /= figheight bar_height /= figheight link_width /= figwidth annot_width /= figwidth (c_width, c_height) = (c_size*x_scale, c_size*y_scale) (p_width, s_height) = (p_size*x_scale, s_size*y_scale) vert = (x_margin + xpad + c_width) top = (y_margin + c_height + ypad) fig = plt.figure(figsize=(figwidth,figheight)) kw = dict(axisbg=fig.get_facecolor()) axC = fig.add_axes([x_margin, y_margin, c_width, c_height], **kw) axP = 
fig.add_axes([vert, y_margin, p_width, c_height], sharey=axC, **kw) axS = fig.add_axes([vert, top, p_width, s_height or .001], sharex=axP, **kw) axB = fig.add_axes([vert, top+ypad+s_height, p_width, bar_height], sharex=axP, **kw) axZ = fig.add_axes([vert+p_width, y_margin, link_width, c_height], frameon=False) axA = genemap.asAxes( fig, [vert+p_width+link_width, y_margin, annot_width, c_height], vertical=True, labeled=True) axP.yaxis.set_visible(False) #for ax in [axC, axP, axS]: #ax.set_aspect(adjustable='box', aspect='equal') fig.text(x_margin+c_width/2, .995, title, ha='center', va='top') if not s_size: axS.set_visible(False) # No ticks for these non-float dimensions for axes in [axB, axC, axS, axP]: for axis in [axes.xaxis, axes.yaxis]: for tick in axis.get_major_ticks(): tick.gridOn = False tick.tick1On = False tick.tick2On = False tick.label1.set_size(8) tick.label2.set_size(8) if axis is axes.xaxis: tick.label1.set_rotation('vertical') # Partition dimension for axis in [axS.xaxis, axP.xaxis, axB.xaxis, axB.yaxis]: axis.set_major_formatter(matplotlib.ticker.NullFormatter()) axis.set_minor_formatter(matplotlib.ticker.NullFormatter()) # Site dimension if c_size > max_site_labels: for axis in [axC.yaxis, axC.xaxis]: axis.set_visible(False) else: isl = integer_tick_label(sites) for axis in [axC.yaxis, axC.xaxis]: axis.set_minor_locator(matplotlib.ticker.IndexLocator(1,0)) axis.set_minor_formatter(matplotlib.ticker.NullFormatter()) axis.set_major_locator(matplotlib.ticker.IndexLocator(1,0.5)) axis.set_major_formatter(matplotlib.ticker.FuncFormatter(isl)) # Species dimension if s_size: seq_names = [name.split(' ')[0] for name in alignment.getSeqNames()] axS.yaxis.set_minor_locator(matplotlib.ticker.IndexLocator(1,0)) axS.yaxis.set_minor_formatter(matplotlib.ticker.NullFormatter()) axS.yaxis.set_major_locator(matplotlib.ticker.IndexLocator(1,0.5)) axS.yaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(seq_names)) #axS.yaxis.grid(False) #, 'minor') # Display 
the main matrices: compatibility and partimatrix
        axC.pcolorfast(compatiblity, cmap=plt.cm.gray)
        partishow = partimatrix <= 2
        axP.pcolorfast(partishow, cmap=plt.cm.gray)
        axP.set_autoscale_on(False)
        axC.plot([0,c_size], [0, c_size], color='lightgreen')
        (sx, sy) = numpy.nonzero(partimatrix.T==0)
        axP.scatter(sx+0.5, sy+0.5, color='lightgreen', marker='^', s=15)

        # Make [partition, sequence] matrix
        # Not a good idea with too many sequences
        if s_size:
            partseq1 = numpy.empty([len(partitions), num_seqs], bool)
            partseq2 = numpy.empty([len(partitions), num_seqs], bool)
            for (i, (x, xz)) in enumerate(partitions):
                partseq1[i] = bit_decode(x, num_seqs)
                partseq2[i] = bit_decode(xz^x, num_seqs)

            # Order sequences so as to place similar sequences adjacent
            O = boolean_similarity(partseq1)
            order = order_to_cluster_similar(O)
            partseq1 = partseq1[:,order]
            partseq2 = partseq2[:,order]
            seq_names = [seq_names[i] for i in order]

            axS.set_ylim(0, len(seq_names))
            axS.set_autoscale_on(False)
            for (halfpart,color) in [(partseq1, 'red'),(partseq2, 'blue')]:
                (sx, sy) = numpy.nonzero(halfpart)
                axS.scatter(sx+0.5, sy+0.5, color=color, marker='o')
            axS.grid(False)
            #axS.yaxis.tick_right()
            #axS.yaxis.set_label_position('right')

        # Bar chart of partition support and conflict scores
        #axB.set_autoscalex_on(False)
        if conflict.sum():
            axB.bar(numpy.arange(len(partitions)), -conflict/conflict.sum(),
                1.0, color='black', align='edge')
        if support.sum():
            axB.bar(numpy.arange(len(partitions)), +support/support.sum(),
                1.0, color='lightgreen', align='edge')
        axB.set_xlim(0.0, len(partitions))

        # Alignment features
        axA.set_ylim(0, len(alignment))
        axA.set_autoscale_on(False)
        axA.yaxis.set_major_formatter(
            matplotlib.ticker.FuncFormatter(lambda y,pos:str(int(y))))
        axA.yaxis.tick_right()
        axA.yaxis.set_label_position('right')
        axA.xaxis.tick_top()
        axA.xaxis.set_label_position('top')
        #axA.xaxis.set_visible(False)

        # "Zoom lines" linking informative-site coords to alignment coords
        from matplotlib.patches import PathPatch
        from
matplotlib.path import Path
        axZ.set_xlim(0.0,1.0)
        axZ.set_xticks([])
        axZ.set_ylim(0, len(alignment))
        axZ.set_yticks([])
        zoom = len(alignment) / len(sites)
        vertices = []
        for (i,p) in enumerate(sites):
            vertices.extend([(.1, (i+0.5)*zoom), (.9,p+0.5)])
            axA.axhspan(p, p+1, facecolor='green', edgecolor='green', alpha=0.3)
        ops = [Path.MOVETO, Path.LINETO] * (len(vertices)//2)
        path = Path(vertices, ops)
        axZ.add_patch(PathPatch(path, fill=False, linewidth=0.25))

        # interactive navigation messes up axZ. Could use callbacks but
        # probably not worth the extra complexity.
        for ax in [axC, axP, axS, axB, axZ, axA]:
            ax.set_navigate(False)

        return fig

if __name__ == '__main__':
    from cogent import LoadSeqs, DNA
    import sys, optparse, os.path
    parser = optparse.OptionParser("usage: %prog [options] alignment")
    parser.add_option("-p", "--print", action="store_true", default=True,
        dest="print_stats", help="print neighbour similarity score etc.")
    parser.add_option("-d", "--display", action="store_true", default=False,
        dest="display", help="show matrices via matplotlib")
    parser.add_option("-i", "--incomplete", action="store_true", default=False,
        dest="include_incomplete",
        help="include partitions containing ambiguities")
    parser.add_option("-t", "--taxalimit", dest="s_limit", default=20,
        type="int", help="maximum number of species that can be displayed")
    parser.add_option("-s", "--samples", dest="samples", default=10000,
        type="int", help="samples for significance test")
    (options, args) = parser.parse_args()
    if len(args) != 1:
        parser.print_help()
        sys.exit(1)
    alignment = LoadSeqs(args[0], moltype=DNA)
    kw = vars(options)
    kw['title'] = os.path.splitext(os.path.basename(args[0]))[0]
    fig = partimatrix(alignment, **kw)
    if fig:
        plt.show()

PyCogent-1.5.3/cogent/draw/dendrogram.py

#!/usr/bin/env python
"""Drawing trees.

Draws horizontal trees where the vertical spacing between taxa is constant.
Since dy is fixed, dendrograms can be either:
 - square:     dx = distance
 - not square: dx = max(0, sqrt(distance**2 - dy**2))

Also draws basic unrooted trees.

For drawing trees use either:
 - SquareDendrogram
 - StraightDendrogram
 - ContemporaneousDendrogram
 - ContemporaneousStraightDendrogram
 - ShelvedDendrogram
 - UnrootedDendrogram
"""

# Future:
#  - font styles
#  - orientation switch
# Layout gets more complicated for rooted tree styles if dy is allowed to vary,
# and constant-y is suitable for placing alongside a sequence alignment anyway.

from cogent.core.tree import TreeNode
import rlg2mpl
import matplotlib.colors
from matplotlib.patches import PathPatch, Polygon
from matplotlib.path import Path
from matplotlib.text import Text
import numpy

__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight", "Zongzhi Liu",
               "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ = "pm67nz@gmail.com"
__status__ = "Production"

to_rgb = matplotlib.colors.colorConverter.to_rgb

def _sign(x):
    """Returns 1 if x is positive, -1 if negative, and 0 if x is 0."""
    return x and x/abs(x)

def _first_non_none(values):
    for item in values:
        if item is not None:
            return item

def SimpleColormap(color0, color1, name=None):
    """Linear interpolation between any two colours"""
    c0 = to_rgb(color0)
    c1 = to_rgb(color1)
    cn = ['red', 'green', 'blue']
    d = dict((n,[(0,s,s),(1,e,e)]) for (n,s,e) in zip(cn, c0, c1))
    return matplotlib.colors.LinearSegmentedColormap(name, d)

class ScalarColormapShading(object):
    """Returns color interpolated between two colors based on scalar value.
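The square vs. not-square dx rules from the dendrogram module docstring above can be sketched as follows (`branch_dx` is a hypothetical helper for illustration, not part of the module):

```python
import math

def branch_dx(distance, dy, square=True):
    # Horizontal extent of a branch. A "square" dendrogram uses the raw
    # branch length; otherwise only the horizontal component remains
    # after the fixed vertical step dy, clamped at zero for short branches.
    if square:
        return distance
    return max(0.0, math.sqrt(max(distance**2 - dy**2, 0.0)))

assert branch_dx(5.0, 3.0) == 5.0                  # square: dx = distance
assert branch_dx(5.0, 3.0, square=False) == 4.0    # sqrt(25 - 9)
assert branch_dx(2.0, 3.0, square=False) == 0.0    # clamped: dy > distance
```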
Parameters: shade_param: parameter to look at for shading purposes min_val: minimum value of the parameter over the whole tree max_val: maximum value of the parameter over the whole tree color0: color to use when the parameter is at its minimum color1: color to use when the parameter is at its maximum Note: this is just a convenience wrapper for coloring. You can color the tree using any arbitrary function of the individual nodes by passing edge_color_callback to makeColorCallback (or to other objects that delegate to it). """ def __init__(self, shade_param, min_val, max_val, cmap): """Returns a new callback for SimpleScalarShading: f(obj) -> color""" assert max_val, 'Need to know the maximum before shading can be done' self.min_val = min_val self.max_val = max_val self.shade_param = shade_param self.cmap = cmap def __call__(self, edge): """Returns color given node.""" value = edge.params.get(self.shade_param, None) if value is None: return "grey" else: value_to_show = max(min(value, self.max_val), self.min_val) normed = (value_to_show-self.min_val)/(self.max_val-self.min_val) color = self.cmap(normed) return color def makeColorCallback(shade_param=None, min_value=0.0, max_value=None, edge_color="black", highlight_color="red", edge_color_callback=None, callback_returns_name=None, cmap=None): """Makes a callback f(node)->color using several strategies. The possibilities are: 1. You want all the nodes to have the same color. Pass in edge_color as something other than "black". 2. You want to shade the nodes from one color to another based on the value of a parameter. Pass the name of the parameter as a string to shade_param (e.g. the parameter might be "GC" for GC content). Pass in the max and min values (e.g. calculated from the range actually on the tree), and the colors for the "normal" (low) and "special" (high) values. The renderer will automatically calculate what the colors are. 3. 
You want some nodes to be one color, and other nodes to be highlighted
    in a different color.  Pass in these colors as edge_color (e.g. "blue")
    and highlight_color (e.g. "green").  Set a parameter to 0 (for normal) or
    1 (for highlight), and pass in the name of this parameter as shade_param
    (e.g. the parameter might be "is_mammal", which you would set to 0
    (False) or 1 (True) to highlight the mammals).

    4. You have f(node) -> color.  Pass in f as edge_color_callback.

    Alternatively, set the Color attribute of the dendrogram edges.
    """
    if callback_returns_name is not None:
        pass  # give deprecation warning?, no longer needed
    edge_color = to_rgb(edge_color)
    highlight_color = to_rgb(highlight_color)
    if edge_color_callback is not None:
        return lambda edge: edge_color_callback(edge)
    elif shade_param:
        if cmap is None:
            cmap = SimpleColormap(edge_color, highlight_color)
        return ScalarColormapShading(
                shade_param, min_value, max_value, cmap)
    else:
        return lambda edge: edge_color

class MatplotlibRenderer(object):
    """Returns a matplotlib renderer including font size, stroke width, etc.

    Note: see documentation for makeColorCallback above to figure out how to
    make it color things the way you want.

    Dynamically varying the stroke width is not yet implemented but should be.
""" def __init__(self, font_size=None, stroke_width=3, label_pad=None, **kw): self.calculated_edge_color = makeColorCallback(**kw) self.text_opts = {} if font_size is not None: self.text_opts['fontsize'] = font_size self.line_opts = {} if stroke_width is not None: self.line_opts['linewidth'] = stroke_width if label_pad is None: label_pad = 8 self.labelPadDistance = label_pad def edge_color(self, edge): if edge.Color is None: return self.calculated_edge_color(edge.original) else: return edge.Color def line(self, x1, y1, x2, y2, edge=None): opts = self.line_opts.copy() if edge is not None: opts['edgecolor'] = self.edge_color(edge) path = Path([(x1, y1), (x2, y2)], [Path.MOVETO, Path.LINETO]) return PathPatch(path, **opts) def polygon(self, vertices, color): opts = self.line_opts.copy() opts['color'] = color return Polygon(vertices, **opts) def string(self, x, y, string, ha=None, va=None, rotation=None, color=None): opts = self.text_opts.copy() if ha is not None: opts['ha'] = ha if va is not None: opts['va'] = va if rotation is not None: opts['rotation'] = rotation if color is not None: opts['color'] = color return Text(x, y, string, **opts) class DendrogramLabelStyle(object): """Label options""" def __init__(self, show_params=None, show_internal_labels=False, label_template=None, edge_label_callback=None): if edge_label_callback is None: if label_template is None: if hasattr(show_params, "__contains__"): if len(show_params) == 1: label_template = "%%(%s)s" % show_params[0] else: label_template = "\n".join( ["%s: %%(%s)s" % (p,p) for p in show_params]) elif show_params: label_template = "%s" else: label_template = "" def edge_label_callback(edge): try: if hasattr(label_template, 'substitute'): # A new style (as of Py 2.4?) 
string template return label_template.substitute(edge.params) else: return label_template % edge.params except KeyError: return "" # param missing - probably just the root edge self.edgeLabelCallback = edge_label_callback self.showInternalLabels = show_internal_labels def getEdgeLabel(self, edge): return self.edgeLabelCallback(edge) def getNodeLabel(self, edge): if edge.Name is not None: return edge.Name elif self.showInternalLabels or not edge.Children: return edge.original.Name else: return "" def ValidColorProperty(real_name, doc='A color name or other spec'): """Can only be set to Null or a valid color""" def getter(obj): return getattr(obj, real_name, None) def setter(obj, value): if value is not None: to_rgb(value) setattr(obj, real_name, value) def deleter(obj): setattr(obj, real_name, None) return property(getter, setter, deleter, doc) class _Dendrogram(rlg2mpl.Drawable, TreeNode): # One of these for each tree edge. Extra attributes: # depth - distance from root to bottom of edge # height - max distance from a decendant leaf to top of edge # width - number of decendant leaves # note these are named tree-wise, not geometricaly, so think # of a vertical tree (for this part anyway) # # x1, y1, x2, y2 - coordinates # these are horizontal / vertical as you would expect # # The algorithm is split into 4 passes over the tree for easier # code reuse - vertical drawing, new tree styles, new graphics # libraries etc. 
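The first of the four passes described in the comments above walks the tree once, pushing depth down from the root and pulling height and leaf counts up from the tips. A minimal sketch of that pass on a hypothetical `TinyNode` class (not part of PyCogent, no lengths-off or track-coordinate handling):

```python
class TinyNode(object):
    """Minimal tree node mirroring the bookkeeping of updateGeometry."""
    def __init__(self, length=1.0, children=None):
        self.length = length
        self.children = children or []

    def update_geometry(self, depth=0.0):
        self.depth = depth + self.length  # distance from root down to here
        if self.children:
            for c in self.children:
                c.update_geometry(self.depth)
            # max distance from a descendant leaf, plus own length
            self.height = max(c.height for c in self.children) + self.length
            self.leafcount = sum(c.leafcount for c in self.children)
        else:
            self.height = self.length  # a tip
            self.leafcount = 1

# root with one tip (length 2) and one internal edge (length 1)
# leading to tips of lengths 3 and 1
root = TinyNode(0.0, [TinyNode(2.0),
                      TinyNode(1.0, [TinyNode(3.0), TinyNode(1.0)])])
root.update_geometry()
```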
aspect_distorts_lengths = True def __init__(self, edge, use_lengths=True): children = [type(self)(child) for child in edge.Children] TreeNode.__init__(self, Params=edge.params.copy(), Children=children, Name=("" if children else edge.Name)) self.Length = edge.Length self.original = edge # for edge_color_callback self.Collapsed = False self.use_lengths_default = use_lengths # Colors are properties so that invalid color names are caught immediately Color = ValidColorProperty('_Color', 'Color of line segment') NameColor = ValidColorProperty('_NameColor', 'Color of node name') CladeColor = ValidColorProperty('_CladeColor', 'Color of collapsed descendants') def __repr__(self): return '%s %s %s %s' % ( self.depth, self.length, self.height, self.Children) def updateGeometry(self, use_lengths, depth=None, track_coordinates=None): """Calculate tree node attributes such as height and depth. Despite the name this first pass is ignorant of issues like scale and orientation""" if self.Length is None or not use_lengths: if depth is None: self.length = 0 else: self.length = 1 else: self.length = self.Length self.depth = (depth or 0) + self.length children = self.Children if children: for c in children: c.updateGeometry(use_lengths, self.depth, track_coordinates) self.height = max([c.height for c in children]) + self.length self.leafcount = sum([c.leafcount for c in children]) self.edgecount = sum([c.edgecount for c in children]) + 1 self.longest_label = max([c.longest_label for c in children], key=len) else: self.height = self.length self.leafcount = self.edgecount = 1 self.longest_label = self.Name or '' if track_coordinates is not None and self.Name != "root": self.track_y = track_coordinates[self.Name] else: self.track_y = 0 def coords(self, height, width): """Return list of [node_name, node_id, x, y, child_ids]""" self.asArtist(height, width) result = [] for node in self.postorder(include_self=True): result.append([node.Name, id(node), node.x2, node.y2] + [map(id, 
node.Children)])
        return result

    def makeFigure(self, width=None, height=None, margin=.25,
            use_lengths=None, **kw):
        (width, height), posn, kw = rlg2mpl.figureLayout(width, height,
                margin=0, default_aspect=0.5, leftovers=True, **kw)
        fig = self._makeFigure(width, height)
        ax = fig.add_axes(posn, frameon=False)
        width = 72 * posn[2] * fig.get_figwidth()
        height = 72 * posn[3] * fig.get_figheight()
        ax.set_xlim(0, width)
        ax.set_ylim(0, height)
        ax.set_xticks([])
        ax.set_yticks([])
        if use_lengths is None:
            use_lengths = self.use_lengths_default
        else:
            pass  # deprecate setting use_lengths here?
        if use_lengths and self.aspect_distorts_lengths:
            ax.set_aspect('equal')
        g = self.asArtist(width, height, use_lengths=use_lengths,
                margin=margin*72, **kw)
        ax.add_artist(g)
        return fig

    def asArtist(self, width, height, margin=20, use_lengths=None,
            scale_bar="left", show_params=None, show_internal_labels=False,
            label_template=None, edge_label_callback=None, shade_param=None,
            max_value=None, font_size=None, **kw):
        if use_lengths is None:
            use_lengths = self.use_lengths_default
        self.updateGeometry(use_lengths=use_lengths)
        if width <= 2 * margin:
            raise ValueError('%spt not wide enough for %spt margins' %
                    (width, margin))
        if height <= 2 * margin:
            raise ValueError('%spt not high enough for %spt margins' %
                    (height, margin))
        width -= 2 * margin
        height -= 2 * margin
        label_length = len(self.longest_label)
        label_width = label_length * 0.8 * (font_size or 10)  # not very accurate
        (left_labels, right_labels) = self.labelMargins(label_width)
        total_label_width = left_labels + right_labels
        if width < total_label_width:
            raise ValueError('%spt not wide enough for "%s"' %
                    (width, self.longest_label))
        scale = self.updateCoordinates(width-total_label_width, height)
        if shade_param is not None and max_value is None:
            for edge in self.postorder(include_self=True):
                sp = edge.params.get(shade_param, None)
                if max_value is None or sp > max_value:
                    max_value = sp
        renderer = MatplotlibRenderer(shade_param=
max_value=max_value, font_size=font_size, **kw) labelopts = {} for labelopt in ['show_params', 'show_internal_labels', 'label_template', 'edge_label_callback']: labelopts[labelopt] = locals()[labelopt] label_style = DendrogramLabelStyle(**labelopts) ss = self._draw(renderer, label_style) if use_lengths: # Placing the scale properly might take some work, # for now just always put it in a bottom corner. unit = 10**min(0.0, numpy.floor(numpy.log10(width/scale/2.0))) if scale_bar == "right": x1, x2 = (width-scale*unit, width) elif scale_bar == "left": x1, x2 = (-left_labels, scale*unit-left_labels) else: assert not scale_bar, scale_bar if scale_bar: ss.append(renderer.line(x1, 0.0, x2, 0.0)) ss.append(renderer.string((x1+x2)/2, 5, str(unit), va='bottom', ha='center')) g = rlg2mpl.Group(*ss) g.translate(margin+left_labels, margin) return g def _draw(self, renderer, label_style): g = [] g += self._draw_edge(renderer, label_style) if self.Collapsed: g += self._draw_collapsed_clade(renderer, label_style) else: g += self._draw_node(renderer, label_style) for child in self.Children: g += child._draw(renderer, label_style) g += self._draw_node_label(renderer, label_style) return g def _draw_node(self, renderer, label_style): g = [] # Joining line for square form if self.Children: cys = [c.y1 for c in self.Children] + [self.y2] if max(cys) > min(cys): g.append(renderer.line(self.x2, min(cys), self.x2, max(cys), self)) return g def _draw_edge(self, renderer, label_style): g = [] if ((self.x1, self.y1) == (self.x2, self.y2)): # avoid labeling zero length line, eg: root return g # Main line g.append(renderer.line(self.x1, self.y1, self.x2, self.y2, self)) # Edge Label text = label_style.getEdgeLabel(self) if text: midx, midy = (self.x1+self.x2)/2, (self.y1+self.y2)/2 if self.x1 == self.x2: rot = 0 else: rot = numpy.arctan((self.y2-self.y1)/(self.x2-self.x1)) midx += numpy.cos(rot+numpy.pi/2)*3 midy += numpy.sin(rot+numpy.pi/2)*3 g.append(renderer.string(midx, midy, text, 
ha='center', va='bottom', rotation=180/numpy.pi*rot)) return g def _draw_node_label(self, renderer, label_style): text = label_style.getNodeLabel(self) color = self.NameColor (x, ha, y, va) = self.getLabelCoordinates(text, renderer) return [renderer.string(x, y, text, ha=ha, va=va, color=color)] def _draw_collapsed_clade(self, renderer, label_style): text = label_style.getNodeLabel(self) color = _first_non_none([self.CladeColor, self.Color, 'black']) icolor = 'white' if sum(to_rgb(color))/3 < 0.5 else 'black' g = [] if not self.Children: return g (l,r,t,b), vertices = self.wedgeVertices() g.append(renderer.polygon(vertices, color)) if not b <= self.y2 <= t: # ShelvedDendrogram needs this extra line segment g.append(renderer.line(self.x2, self.y2, self.x2, b, self)) (x, ha, y, va) = self.getLabelCoordinates(text, renderer) g.append(renderer.string( (self.x2+r)/2, (t+b)/2, str(self.leafcount), ha=ha, va=va, color=icolor)) g.append(renderer.string( x-self.x2+r, y, text, ha=ha, va=va, color=self.NameColor)) return g def setCollapsed(self, collapsed=True, label=None, color=None): if color is not None: self.CladeColor = color if label is not None: self.Name = label self.Collapsed = collapsed class Dimensions(object): def __init__(self, xscale, yscale, total_tree_height): self.x = xscale self.y = yscale self.height = total_tree_height class _RootedDendrogram(_Dendrogram): """_RootedDendrogram subclasses provide yCoords and xCoords, which examine attributes of a node (its length, coodinates of its children) and return a tuple for start/end of the line representing the edge.""" def labelMargins(self, label_width): return (0, label_width) def widthRequired(self): return self.leafcount def xCoords(self, scale, x1): raise NotImplementedError def yCoords(self, scale, y1): raise NotImplementedError def updateCoordinates(self, width, height): xscale = width / self.height yscale = height / self.widthRequired() scale = Dimensions(xscale, yscale, self.height) # y coords done 
postorder, x preorder, y first. # so it has to be done in 2 passes. self.update_y_coordinates(scale) self.update_x_coordinates(scale) return xscale def update_y_coordinates(self, scale, y1=None): """The second pass through the tree. Y coordinates only depend on the shape of the tree and yscale""" if y1 is None: y1 = self.widthRequired() * scale.y child_y = y1 for child in self.Children: child.update_y_coordinates(scale, child_y) child_y -= child.widthRequired() * scale.y (self.y1, self.y2) = self.yCoords(scale, y1) def update_x_coordinates(self, scale, x1=0): """For non 'square' styles the x coordinates will depend (a bit) on the y coodinates, so they should be done first""" (self.x1, self.x2) = self.xCoords(scale, x1) for child in self.Children: child.update_x_coordinates(scale, self.x2) def getLabelCoordinates(self, text, renderer): return (self.x2+renderer.labelPadDistance, 'left', self.y2, 'center') class SquareDendrogram(_RootedDendrogram): aspect_distorts_lengths = False def yCoords(self, scale, y1): cys = [c.y1 for c in self.Children] if cys: y2 = (cys[0]+cys[-1]) / 2.0 else: y2 = y1 - 0.5 * scale.y return (y2, y2) def xCoords(self, scale, x1): dx = scale.x * self.length x2 = x1 + dx return (x1, x2) def wedgeVertices(self): tip_ys = [(c.y2 + self.y2)/2 for c in self.iterTips()] t,b = max(tip_ys), min(tip_ys) cxs = [c.x2 for c in self.iterTips()] l,r = min(cxs), max(cxs) return (l,r,t,b), [(self.x2, b), (self.x2, t), (l, t), (r, b)] class StraightDendrogram(_RootedDendrogram): def yCoords(self, scale, y1): # has a side effect of adjusting the child y1's to meet nodes' y2's cys = [c.y1 for c in self.Children] if cys: y2 = (cys[0]+cys[-1]) / 2.0 distances = [child.length for child in self.Children] closest_child = self.Children[distances.index(min(distances))] dy = closest_child.y1 - y2 max_dy = 0.8*max(5, closest_child.length*scale.x) if abs(dy) > max_dy: # 'moved', node.Name, y2, 'to within', max_dy, # 'of', closest_child.Name, closest_child.y1 y2 = 
closest_child.y1 - _sign(dy) * max_dy else: y2 = y1 - scale.y / 2.0 y1 = y2 for child in self.Children: child.y1 = y2 return (y1, y2) def xCoords(self, scale, x1): dx = self.length * scale.x dy = self.y2 - self.y1 dx = numpy.sqrt(max(dx**2 - dy**2, 1)) return (x1, x1 + dx) def wedgeVertices(self): tip_ys = [(c.y2 + self.y2)/2 for c in self.iterTips()] t,b = max(tip_ys), min(tip_ys) cxs = [c.x2 for c in self.iterTips()] l,r = min(cxs), max(cxs) vertices = [(self.x2, self.y2), (l, t), (r, b)] return (l,r,t,b), vertices class _ContemporaneousMixin(object): """A dendrogram with all of the tips lined up. Tidy but not suitable for displaying evolutionary distances accurately""" # Overrides init to change default for use_lengths def __init__(self, edge, use_lengths=False): super(_ContemporaneousMixin, self).__init__(edge, use_lengths) def xCoords(self, scale, x1): return (x1, (scale.height-(self.height-self.length))*scale.x) class ContemporaneousDendrogram(_ContemporaneousMixin, SquareDendrogram): pass class ContemporaneousStraightDendrogram(_ContemporaneousMixin, StraightDendrogram): pass class ShelvedDendrogram(ContemporaneousDendrogram): """A dendrogram in which internal nodes also get a row to themselves""" def widthRequired(self): return self.edgecount # as opposed to tipcount def yCoords(self, scale, y1): cys = [c.y1 for c in self.Children] if cys: y2 = cys[-1] - 1.0 * scale.y else: y2 = y1 - 0.5 * scale.y return (y2, y2) class AlignedShelvedDendrogram(ShelvedDendrogram): def update_y_coordinates(self, scale, y1=None): """The second pass through the tree. 
Y coordinates only depend on the shape of the tree and yscale"""
        for child in self.Children:
            child.update_y_coordinates(scale, None)
        (self.y1, self.y2) = self.yCoords(scale, None)

    def yCoords(self, scale, y1):
        if hasattr(self, 'track_y'):
            return (self.track_y, self.track_y)
        else:
            raise RuntimeError, self.Name

class UnrootedDendrogram(_Dendrogram):
    aspect_distorts_lengths = True

    def labelMargins(self, label_width):
        return (label_width, label_width)

    def wedgeVertices(self):
        tip_dists = [(c.depth-self.depth)*self.scale for c in self.iterTips()]
        (near, far) = (min(tip_dists), max(tip_dists))
        a = self.angle - 0.25 * self.wedge
        (x1, y1) = (self.x2+near*numpy.sin(a), self.y2+near*numpy.cos(a))
        a = self.angle + 0.25 * self.wedge
        (x2, y2) = (self.x2+far*numpy.sin(a), self.y2+far*numpy.cos(a))
        vertices = [(self.x2, self.y2), (x1, y1), (x2, y2)]
        return (self.x2, (x1+x2)/2, self.y2, (y1+y2)/2), vertices

    def updateCoordinates(self, width, height):
        angle = 2*numpy.pi / self.leafcount
        # this loop is a horrible brute force hack
        # there are better (but complex) ways to find
        # the best rotation of the tree to fit the display.
        best_scale = 0
        for i in range(60):
            direction = i/60.0*numpy.pi
            points = self._update_coordinates(1.0, 0, 0, direction, angle)
            xs = [x for (x,y) in points]
            ys = [y for (x,y) in points]
            scale = min(float(width)/(max(xs)-min(xs)),
                    float(height)/(max(ys)-min(ys)))
            scale *= 0.95  # extra margin for labels
            if scale > best_scale:
                best_scale = scale
                mid_x = width/2-((max(xs)+min(xs))/2)*scale
                mid_y = height/2-((max(ys)+min(ys))/2)*scale
                best_args = (scale, mid_x, mid_y, direction, angle)
        self._update_coordinates(*best_args)
        return best_scale

    def _update_coordinates(self, s, x1, y1, a, da):
        # Constant angle algorithm.  Should add maximum daylight step.
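The brute-force search in updateCoordinates just tries 60 candidate rotations and keeps the direction that maximises the scale at which the layout fits the display box. That selection logic can be sketched on bare point lists (`fit_scale` and `best_rotation` are hypothetical helpers, not PyCogent API):

```python
import math

def fit_scale(points, width, height, margin=0.95):
    """Largest uniform scale at which the point cloud fits a width x height box."""
    xs = [x for (x, y) in points]
    ys = [y for (x, y) in points]
    scale = min(float(width) / (max(xs) - min(xs)),
                float(height) / (max(ys) - min(ys)))
    return scale * margin  # leave a little room for labels

def best_rotation(points, width, height, steps=60):
    """Try several rotations of the layout, keep the one that fits best."""
    best = (0.0, None)
    for i in range(steps):
        a = i / float(steps) * math.pi
        rotated = [(x * math.cos(a) - y * math.sin(a),
                    x * math.sin(a) + y * math.cos(a)) for (x, y) in points]
        s = fit_scale(rotated, width, height)
        if s > best[0]:
            best = (s, a)
    return best

# a small non-collinear point cloud in a 100 x 100 box
points = [(0, 0), (2, 0), (0, 1)]
scale, angle = best_rotation(points, 100, 100)
```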
(x2, y2) = (x1+self.length*s*numpy.sin(a),
                y1+self.length*s*numpy.cos(a))
        (self.x1, self.y1, self.x2, self.y2, self.angle) = (x1, y1, x2, y2, a)
        if self.Collapsed:
            self.wedge = self.leafcount * da
            self.scale = s
            (l,r,t,b), vertices = self.wedgeVertices()
            return vertices
        a -= self.leafcount * da / 2
        if not self.Children:
            points = [(x2, y2)]
        else:
            points = []
            for (i,child) in enumerate(self.Children):
                ca = child.leafcount * da
                points += child._update_coordinates(s, x2, y2, a+ca/2, da)
                a += ca
        return points

    def getLabelCoordinates(self, text, renderer):
        (dx, dy) = (numpy.sin(self.angle), numpy.cos(self.angle))
        pad = renderer.labelPadDistance
        (x, y) = (self.x2+pad*dx, self.y2+pad*dy)
        if dx > abs(dy):
            return (x, 'left', y, 'center')
        elif -dx > abs(dy):
            return (x, 'right', y, 'center')
        elif dy > 0:
            return (x, 'center', y, 'bottom')
        else:
            return (x, 'center', y, 'top')

PyCogent-1.5.3/cogent/draw/dinuc.py

#!/usr/bin/env python
"""Provides a plot for visualizing dinucleotide frequencies.
""" from matplotlib import use, rc from matplotlib.font_manager import FontProperties use('Agg') #suppress graphical rendering from matplotlib.ticker import FixedFormatter from pylab import arange, axvline, array, plot, scatter, legend, gca, xlim, \ savefig __author__ = "Stephanie Wilson" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Stephanie Wilson"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" rc('text', usetex=True) rc('font', family='serif') #required to match latex text and equations colors = ["#000000","#FF0000","#00FF00","#FFFF00", "#CC99FF","#FFCC99","#CCFFFF","#C0C0C0", "#6D6D6D","#2353FF","#00FFFF","#FF8800", "#238853","#882353","#EC008C","#000099"] axis_names= [a+b for a in 'UCAG' for b in 'UCAG'] #place to put the averages, and the genes on x axis line_positions = arange(16) #standard_gene_type_formats provides arguments to plot() that depend on the type #of gene. All contents of this dict must be valid arguments and values for #plot()! standard_gene_type_formats = { 'hgt':{'marker':'d'}, 'ribosomal':{'marker':'s'}, None:{'marker':'o'}, } #offsets maps labels onto horizontal positions within each column. Offset of #0.5 means it will appear in the middle of the column. offsets = { 'hgt': 0.3, 'ribosomal':0.7, None: 0.5 } light_gray = '#CCCCCC' #define standard color for background lines def dinuc_plot(data, graph_name="ADinucTester.png", title = "", \ gene_type_formats=standard_gene_type_formats, \ background_line_color=light_gray, avg_formats={}, point_formats={}): """Returns dinucleotide usage plot from given species data. Data is a dict with the following structure: {name: {gene_type: [values]}} where name is e.g. the name of the species (will be displayed on graph), and gene_type will be the name of the type of gene (e.g. ribosomal, hgt, non-hgt, all). 
Values should be the 16-element array obtained from calling normalize()
    on the DinucUsage objects giving the DinucUsage frequencies, as
    fractions, for each of the 16 dinucleotides in alphabet order
    (TT -> GG).

    Calculates the average of each gene type on the fly and connects these
    averages with lines.  If you have precalculated the average, just pass
    it in as a single 'gene' (the average of an array with one item will be
    itself...).

    avg_formats should be passable to plot(), e.g. markersize.
    point_formats should be passable to scatter(), e.g. 'c'.
    """
    #constructing the vertical lines
    for a in line_positions:
        axvline(a,ymin=0,ymax=1, color=background_line_color)
    curr_color_index = 0
    lines = []
    labels = []
    for label, gene_types in sorted(data.items()):
        gene_types = data[label]
        curr_color = colors[curr_color_index]
        for type_label, usages in sorted(gene_types.items()):
            labels.append(':'.join(map(str, [label,type_label])))
            x_positions = line_positions + \
                offsets.get(type_label, offsets[None])
            curr_format = gene_type_formats.get(type_label, \
                gene_type_formats[None])
            #plot the averages...connected with a line for easier visual
            avg_format = curr_format.copy()
            avg_format.update(avg_formats)
            avg = sum(array(usages))/len(usages)
            curr_line = plot(x_positions, avg, color=curr_color,
                markerfacecolor=curr_color, label=label, **avg_format)
            lines.append(curr_line)
            point_format = curr_format.copy()
            if len(usages) > 1:  #got more than one gene: plot points
                point_format.update(point_formats)
                for u in usages:
                    scatter(x_positions, u, c=curr_color, **point_format)
        #get the next color, wrapping around to the start of the list if we
        #fall off the end (incrementing an int can never raise IndexError,
        #so the old try/except wrap-around could not fire)
        curr_color_index = (curr_color_index + 1) % len(colors)
    legend(lines, labels, loc='best', \
        prop=FontProperties(size='smaller'))
    a = gca()
    a.set_yticks(arange(0,1.01,.1))  #set the yaxis labels
    a.set_xticks(arange(0.5,16.5,1))
    a.xaxis.set_major_formatter(FixedFormatter(axis_names))
    xlim(0,16)
    if graph_name is not None:
        savefig(graph_name)
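The column arithmetic in dinuc_plot (a per-gene-type offset inside each of the 16 dinucleotide columns, plus an element-wise average of the usage vectors for each type) can be sketched without pylab; the helper names below are illustrative only:

```python
# same offsets table as the module: horizontal position within a column
offsets = {'hgt': 0.3, 'ribosomal': 0.7, None: 0.5}

def column_positions(type_label, n_columns=16):
    """x position of a gene type's points within each dinucleotide column."""
    off = offsets.get(type_label, offsets[None])
    return [i + off for i in range(n_columns)]

def average_usage(usages):
    """Element-wise mean of equal-length frequency vectors."""
    n = float(len(usages))
    return [sum(vals) / n for vals in zip(*usages)]

avg = average_usage([[0.0, 0.5], [0.5, 0.25]])  # [0.25, 0.375]
```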
PyCogent-1.5.3/cogent/draw/distribution_plots.py

#!/usr/bin/env python
"""This module contains functions for plotting distributions in various ways.

There are two different types of plotting functions: generate_box_plots()
plots several boxplots next to each other for easy comparison.
generate_comparative_plots() plots groupings of distributions at data points
along the x-axis.
"""

__author__ = "Jai Ram Rideout"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jai Ram Rideout"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jai Ram Rideout"
__email__ = "jai.rideout@gmail.com"
__status__ = "Production"

from matplotlib import use
use('Agg', warn=False)
from itertools import cycle
from math import isnan
from matplotlib.colors import colorConverter
from matplotlib.lines import Line2D
from matplotlib.patches import Polygon, Rectangle
from matplotlib.pyplot import boxplot, figure
from matplotlib.transforms import Bbox
from numpy import array, mean, random, sqrt, std

def generate_box_plots(distributions, x_values=None, x_tick_labels=None,
        title=None, x_label=None, y_label=None,
        x_tick_labels_orientation='vertical', y_min=None, y_max=None,
        whisker_length=1.5, box_width=0.5, box_color=None, figure_width=None,
        figure_height=None):
    """Returns a matplotlib.figure.Figure object containing a boxplot for
    each distribution.

    Arguments:
        - distributions: A list of lists containing each distribution.
        - x_values: A list indicating where each boxplot should be placed.
          Must be the same length as distributions if provided.
        - x_tick_labels: A list of labels to be used to label x-axis ticks.
        - title: A string containing the title of the plot.
        - x_label: A string containing the x-axis label.
        - y_label: A string containing the y-axis label.
        - x_tick_labels_orientation: A string specifying the orientation of
          the x-axis labels (either "vertical" or "horizontal").
- y_min: The minimum value of the y-axis. If None, uses matplotlib's autoscale. - y_max: The maximum value of the y-axis. If None, uses matplotlib's autoscale. - whisker_length: The length of the whiskers as a function of the IQR. For example, if 1.5, the whiskers extend to 1.5 * IQR. Anything outside of that range is seen as an outlier. - box_width: The width of each box in plot units. - box_color: The color of the boxes. If None, boxes will be the same color as the plot background. - figure_width: the width of the plot figure in inches. If not provided, will default to matplotlib's default figure width. - figure_height: the height of the plot figure in inches. If not provided, will default to matplotlib's default figure height. """ # Make sure our input makes sense. for distribution in distributions: if len(distribution) == 0: raise ValueError("Some of the provided distributions are empty.") try: map(float, distribution) except: raise ValueError("Each value in each distribution must be a " "number.") _validate_x_values(x_values, x_tick_labels, len(distributions)); # Create a new figure to plot our data on, and then plot the distributions. result, plot_axes = _create_plot() box_plot = boxplot(distributions, positions=x_values, whis=whisker_length, widths=box_width) if box_color is not None: _color_box_plot(plot_axes, box_plot, box_color) # Set up the various plotting options, such as x- and y-axis labels, plot # title, and x-axis values if they have been supplied. 
_set_axes_options(plot_axes, title, x_label, y_label, x_tick_labels=x_tick_labels, x_tick_labels_orientation=x_tick_labels_orientation, y_min=y_min, y_max=y_max) _set_figure_size(result, figure_width, figure_height) return result def generate_comparative_plots(plot_type, data, x_values=None, data_point_labels=None, distribution_labels=None, distribution_markers=None, x_label=None, y_label=None, title=None, x_tick_labels_orientation='vertical', y_min=None, y_max=None, whisker_length=1.5, error_bar_type='stdv', distribution_width=0.4, group_spacing=0.5, figure_width=None, figure_height=None): """Returns a Figure containing plots grouped at points along the x-axis. Arguments: - plot_type: A string indicating what type of plot should be created. Can be one of 'bar', 'scatter', or 'box', where 'bar' is a bar chart, 'scatter' is a scatter plot, and 'box' is a box plot. - data: A list of lists that represent each data point along the x-axis. Each data point contains lists of data for each distribution in the group at that point. This nesting allows for the grouping of distributions at each data point. - x_values: A list indicating the spacing along the x-axis. Must be the same length as the number of data points if provided. If not provided, plots will be spaced evenly. - data_point_labels: A list of strings containing the label for each data point. - distribution_labels: A list of strings containing the label for each distribution in a data point grouping. - distribution_markers: A list of matplotlib-compatible strings or tuples that indicate the color or symbol to be used to distinguish each distribution in a data point grouping. Colors will be used for bar charts or box plots, while symbols will be used for scatter plots. - x_label: A string containing the x-axis label. - y_label: A string containing the y-axis label. - title: A string containing the title of the plot. 
- x_tick_labels_orientation: A string specifying the orientation of the x-axis labels (either "vertical" or "horizontal"). - y_min: The minimum value of the y-axis. If None, uses matplotlib's autoscale. - y_max: The maximum value of the y-axis. If None, uses matplotlib's autoscale. - whisker_length: If plot_type is 'box', determines the length of the whiskers as a function of the IQR. For example, if 1.5, the whiskers extend to 1.5 * IQR. Anything outside of that range is seen as an outlier. If plot_type is not 'box', this parameter is ignored. - error_bar_type: A string specifying the type of error bars to use if plot_type is "bar". Can be either "stdv" (for standard deviation) or "sem" for the standard error of the mean. If plot_type is not "bar", this parameter is ignored. - distribution_width: The width in plot units of each individual distribution (e.g. each bar if the plot type is a bar chart, or the width of each box if the plot type is a boxplot). - group_spacing: The gap width in plot units between each data point (i.e. the width between each group of distributions). - figure_width: the width of the plot figure in inches. If not provided, will default to matplotlib's default figure width. - figure_height: the height of the plot figure in inches. If not provided, will default to matplotlib's default figure height. """ # Set up different behavior based on the plot type. if plot_type == 'bar': plotting_function = _plot_bar_data distribution_centered = False marker_type = 'colors' elif plot_type == 'scatter': plotting_function = _plot_scatter_data distribution_centered = True marker_type = 'symbols' elif plot_type == 'box': plotting_function = _plot_box_data distribution_centered = True marker_type = 'colors' else: raise ValueError("Invalid plot type '%s'. Supported plot types are " "'bar', 'scatter', or 'box'." 
% plot_type) num_points, num_distributions = _validate_input(data, x_values, data_point_labels, distribution_labels) # Create a list of matplotlib markers (colors or symbols) that can be used # to distinguish each of the distributions. If the user provided a list of # markers, use it and loop around to the beginning if there aren't enough # markers. If they didn't provide a list, or it was empty, use our own # predefined list of markers (again, loop around to the beginning if we # need more markers). distribution_markers = _get_distribution_markers(marker_type, distribution_markers, num_distributions) # Now calculate where each of the data points will start on the x-axis. x_locations = _calc_data_point_locations(x_values, num_points, num_distributions, distribution_width, group_spacing) assert (len(x_locations) == num_points), "The number of x_locations " +\ "does not match the number of data points." # Create the figure to put the plots on, as well as a list to store an # example of each distribution's plot (needed for the legend). result, plot_axes = _create_plot() # Iterate over each data point, and plot each of the distributions at that # data point. Increase the offset after each distribution is plotted, # so that the grouped distributions don't overlap. for point, x_pos in zip(data, x_locations): dist_offset = 0 for dist_index, dist, dist_marker in zip(range(num_distributions), point, distribution_markers): dist_location = x_pos + dist_offset distribution_plot_result = plotting_function(plot_axes, dist, dist_marker, distribution_width, dist_location, whisker_length, error_bar_type) dist_offset += distribution_width # Set up various plot options that are best set after the plotting is done. # The x-axis tick marks (one per data point) are centered on each group of # distributions. 
plot_axes.set_xticks(_calc_data_point_ticks(x_locations, num_distributions, distribution_width, distribution_centered)) _set_axes_options(plot_axes, title, x_label, y_label, x_values, data_point_labels, x_tick_labels_orientation, y_min, y_max) if distribution_labels is not None: _create_legend(plot_axes, distribution_markers, distribution_labels, marker_type) _set_figure_size(result, figure_width, figure_height) # matplotlib seems to sometimes plot points on the rightmost edge of the # plot without adding padding, so we need to add our own to both sides of # the plot. For some reason this has to go after the call to draw(), # otherwise matplotlib throws an exception saying it doesn't have a # renderer. plot_axes.set_xlim(plot_axes.get_xlim()[0] - group_spacing, plot_axes.get_xlim()[1] + group_spacing) return result def _validate_input(data, x_values, data_point_labels, distribution_labels): """Returns a tuple containing the number of data points and distributions in the data. Validates plotting options to make sure they are valid with the supplied data. """ if data is None or not data or isinstance(data, basestring): raise ValueError("The data must be a list type, and it cannot be " "None or empty.") num_points = len(data) num_distributions = len(data[0]) empty_data_error_msg = "The data must contain at least one data " + \ "point, and each data point must contain at " + \ "least one distribution to plot." if num_points == 0 or num_distributions == 0: raise ValueError(empty_data_error_msg) for point in data: if len(point) == 0: raise ValueError(empty_data_error_msg) if len(point) != num_distributions: raise ValueError("The number of distributions in each data point " "grouping must be the same for all data points.") # Make sure we have the right number of x values (one for each data point), # and make sure they are numbers. 
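The rectangular-input invariant enforced above (every data point must hold the same number of distributions) can be restated standalone. The helper below is illustrative only, not part of this module:

```python
def check_rectangular(data):
    # Mirrors the validation above: a non-empty list of non-empty,
    # equally sized groupings of distributions.
    if not data or isinstance(data, str):
        raise ValueError("data must be a non-empty list")
    width = len(data[0])
    if width == 0:
        raise ValueError("each data point needs at least one distribution")
    for point in data:
        if len(point) != width:
            raise ValueError("ragged data point groupings")
    return len(data), width

# Two data points, each grouping two distributions.
print(check_rectangular([[[1, 2], [3]], [[4], [5, 6]]]))  # (2, 2)
```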
_validate_x_values(x_values, data_point_labels, num_points) if (distribution_labels is not None and len(distribution_labels) != num_distributions): raise ValueError("The number of distribution labels must be equal " "to the number of distributions.") return num_points, num_distributions def _validate_x_values(x_values, x_tick_labels, num_expected_values): """Validates the x values provided by the user, making sure they are the correct length and are all numbers. Also validates the number of x-axis tick labels. Raises a ValueError if these conditions are not met. """ if x_values is not None: if len(x_values) != num_expected_values: raise ValueError("The number of x values must match the number " "of data points.") try: map(float, x_values) except (TypeError, ValueError): raise ValueError("Each x value must be a number.") if x_tick_labels is not None: if len(x_tick_labels) != num_expected_values: raise ValueError("The number of x-axis tick labels must match the " "number of data points.") def _get_distribution_markers(marker_type, marker_choices, num_markers): """Returns a list of length num_markers of valid matplotlib colors or symbols. The markers will be drawn from those found in marker_choices (if not None and not empty) or a list of predefined markers (determined by marker_type, which can be either 'colors' or 'symbols'). If there are not enough markers, the list of markers will be reused from the beginning again (as many times as are necessary). """ if num_markers < 0: raise ValueError("num_markers must be greater than or equal to zero.") if marker_choices is None or len(marker_choices) == 0: if marker_type == 'colors': marker_choices = ['b', 'g', 'r', 'c', 'm', 'y', 'w'] elif marker_type == 'symbols': marker_choices = \ ['s', 'o', '^', '>', 'v', '<', 'd', 'p', 'h', '8', '+', 'x'] else: raise ValueError("Invalid marker_type: '%s'. marker_type must be " "either 'colors' or 'symbols'."
% marker_type) if len(marker_choices) < num_markers: # We don't have enough markers to represent each distribution uniquely, # so let the user know. We'll add as many markers (starting from the # beginning of the list again) until we have enough, but the user # should still know because they may want to provide a new list of # markers. print ("There are not enough markers to uniquely represent each " "distribution in your dataset. You may want to provide a list " "of markers that is at least as large as the number of " "distributions in your dataset.") # Extend a copy so we don't mutate the caller's list of markers. marker_cycle = cycle(marker_choices[:]) marker_choices = marker_choices[:] while len(marker_choices) < num_markers: marker_choices.append(marker_cycle.next()) return marker_choices[:num_markers] def _calc_data_point_locations(x_values, num_points, num_distributions, dist_width, group_spacing): """Returns a numpy array of x-axis locations for each of the data points to start at. Note: A numpy array is returned so that the overloaded "+" operator can be used on the array. The x locations are spaced according to the spacing between points, and the width of each distribution grouping at each point. The x locations are also scaled by the x_values that may have been supplied by the user. If none are supplied, the x locations are evenly spaced. """ if dist_width <= 0 or group_spacing < 0: raise ValueError("The width of a distribution cannot be zero or " "negative. The width of the spacing between groups " "of distributions cannot be negative.") if x_values is None: # Evenly space the x locations. x_values = range(1, num_points + 1) assert (len(x_values) == num_points), "The number of x_values does not " +\ "match the number of data points." # Calculate the width of each grouping of distributions at a data point. # This is multiplied by the current x value to give us our final # absolute horizontal position for the current point.
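The marker-recycling behaviour described in _get_distribution_markers above (reuse the palette from the start once it runs out) can be restated with itertools alone. A standalone sketch, not the module function itself:

```python
from itertools import cycle, islice

def recycle_markers(choices, n):
    # Repeat the palette from the beginning until n markers are available.
    return list(islice(cycle(choices), n))

print(recycle_markers(['b', 'g', 'r'], 5))  # ['b', 'g', 'r', 'b', 'g']
```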
return array([(dist_width * num_distributions + group_spacing) * x_val\ for x_val in x_values]) def _calc_data_point_ticks(x_locations, num_distributions, distribution_width, distribution_centered): """Returns a 1D numpy array of x-axis tick positions. These positions will be centered on each data point. Set distribution_centered to True for scatter and box plots because their plot types naturally center over a given horizontal position. Bar charts should use distribution_centered = False because the leftmost edge of a bar starts at a given horizontal position and extends to the right for the width of the bar. """ dist_size = num_distributions - 1 if distribution_centered else\ num_distributions return x_locations + ((dist_size * distribution_width) / 2) def _create_plot(): """Creates a plot and returns the associated Figure and Axes objects.""" fig = figure() ax = fig.add_subplot(111) return fig, ax def _plot_bar_data(plot_axes, distribution, distribution_color, distribution_width, x_position, whisker_length, error_bar_type): """Returns the result of plotting a single bar in matplotlib.""" result = None # We do not want to plot empty distributions because matplotlib will not be # able to render them as PDFs. if len(distribution) > 0: avg = mean(distribution) if error_bar_type == 'stdv': error_bar = std(distribution) elif error_bar_type == 'sem': error_bar = std(distribution) / sqrt(len(distribution)) else: raise ValueError("Invalid error bar type '%s'. Supported error " "bar types are 'stdv' and 'sem'." 
% error_bar_type) result = plot_axes.bar(x_position, avg, distribution_width, yerr=error_bar, ecolor='black', facecolor=distribution_color) return result def _plot_scatter_data(plot_axes, distribution, distribution_symbol, distribution_width, x_position, whisker_length, error_bar_type): """Returns the result of plotting a single scatterplot in matplotlib.""" result = None x_vals = [x_position] * len(distribution) # matplotlib's scatter function doesn't like plotting empty data. if len(x_vals) > 0 and len(distribution) > 0: result = plot_axes.scatter(x_vals, distribution, marker=distribution_symbol, c='k') return result def _plot_box_data(plot_axes, distribution, distribution_color, distribution_width, x_position, whisker_length, error_bar_type): """Returns the result of plotting a single boxplot in matplotlib.""" box_plot = plot_axes.boxplot([distribution], positions=[x_position], widths=distribution_width, whis=whisker_length) _color_box_plot(plot_axes, box_plot, distribution_color) return box_plot def _color_box_plot(plot_axes, box_plot, color): """Fill each box in the box plot with the specified color. The box_plot argument must be the dictionary returned by the call to matplotlib's boxplot function, and the color argument must be a valid matplotlib color. """ # Note: the following code is largely taken from a matplotlib boxplot # example: # http://matplotlib.sourceforge.net/examples/pylab_examples/ # boxplot_demo2.html num_boxes = len(box_plot['boxes']) for box_num in range(num_boxes): box = box_plot['boxes'][box_num] boxX = [] boxY = [] # There are five points in the box. The first is the same as # the last. for j in range(5): boxX.append(box.get_xdata()[j]) boxY.append(box.get_ydata()[j]) boxCoords = zip(boxX,boxY) boxPolygon = Polygon(boxCoords, facecolor=color) plot_axes.add_patch(boxPolygon) # Draw the median lines back over what we just filled in with # color. 
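_plot_bar_data above sizes its error bars as either the population standard deviation ('stdv') or the standard error of the mean ('sem'). The relationship, sem = stdv / sqrt(n), is easy to check without numpy; the helpers below are a sketch of the same arithmetic, not module code:

```python
from math import sqrt

def stdv(xs):
    # Population standard deviation (ddof=0), matching numpy's std default.
    m = sum(xs) / float(len(xs))
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sem(xs):
    # Standard error of the mean, as used for the 'sem' error bar type.
    return stdv(xs) / sqrt(len(xs))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(stdv(data))  # 2.0
print(sem(data))   # 2.0 / sqrt(8)
```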
median = box_plot['medians'][box_num] medianX = [] medianY = [] for j in range(2): medianX.append(median.get_xdata()[j]) medianY.append(median.get_ydata()[j]) plot_axes.plot(medianX, medianY, 'black') def _set_axes_options(plot_axes, title=None, x_label=None, y_label=None, x_values=None, x_tick_labels=None, x_tick_labels_orientation='vertical', y_min=None, y_max=None): """Applies various labelling options to the plot axes.""" if title is not None: plot_axes.set_title(title) if x_label is not None: plot_axes.set_xlabel(x_label) if y_label is not None: plot_axes.set_ylabel(y_label) if (x_tick_labels_orientation != 'vertical' and x_tick_labels_orientation != 'horizontal'): raise ValueError("Invalid orientation for x-axis tick labels: %s. " "Valid orientations are 'vertical' or 'horizontal'." % x_tick_labels_orientation) # If labels are provided, always use them. If they aren't, use the x_values # that denote the spacing between data points as labels. If that isn't # available, simply label the data points in an incremental fashion, # i.e. 1, 2, 3,...,n, where n is the number of data points on the plot. if x_tick_labels is not None: labels = plot_axes.set_xticklabels(x_tick_labels, rotation=x_tick_labels_orientation) elif x_values is not None: labels = plot_axes.set_xticklabels(x_values, rotation=x_tick_labels_orientation) else: labels = plot_axes.set_xticklabels( range(1, len(plot_axes.get_xticklabels()) + 1), rotation=x_tick_labels_orientation) # Set the y-axis range if specified. if y_min is not None: plot_axes.set_ylim(bottom=float(y_min)) if y_max is not None: plot_axes.set_ylim(top=float(y_max)) def _create_legend(plot_axes, distribution_markers, distribution_labels, marker_type): """Creates a legend on the supplied axes.""" # We have to use a proxy artist for the legend because box plots currently # don't have a very useful legend in matplotlib, and using the default # legend for bar/scatterplots chokes on empty/null distributions.
# Note: This code is based on the following examples: # http://matplotlib.sourceforge.net/users/legend_guide.html # http://stackoverflow.com/a/11423554 if len(distribution_markers) != len(distribution_labels): raise ValueError("The number of distribution markers does not match " "the number of distribution labels.") if marker_type == 'colors': legend_proxy = [Rectangle((0, 0), 1, 1, fc=marker) for marker in distribution_markers] plot_axes.legend(legend_proxy, distribution_labels, loc='best') elif marker_type == 'symbols': legend_proxy = [Line2D(range(1), range(1), color='white', markerfacecolor='black', marker=marker) for marker in distribution_markers] plot_axes.legend(legend_proxy, distribution_labels, numpoints=3, scatterpoints=3, loc='best') else: raise ValueError("Invalid marker_type: '%s'. marker_type must be " "either 'colors' or 'symbols'." % marker_type) def _set_figure_size(fig, width=None, height=None): """Sets the plot figure size and makes room for axis labels, titles, etc. If both width and height are not provided, will use matplotlib defaults. Making room for labels will not always work, and if it fails, the user will be warned that their plot may have cut-off labels. """ # Set the size of the plot figure, then make room for the labels so they # don't get cut off. Must be done in this order. if width is not None and height is not None and width > 0 and height > 0: fig.set_size_inches(width, height) try: fig.tight_layout() except ValueError: print ("Warning: could not automatically resize plot to make room for " "axes labels and plot title. This can happen if the labels or " "title are extremely long and the plot size is too small. Your " "plot may have its labels and/or title cut-off. 
To fix this, " "try increasing the plot's size (in inches) and try again.") PyCogent-1.5.3/cogent/draw/dotplot.py000644 000765 000024 00000013435 12024702176 020522 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division from matplotlib.path import Path from matplotlib.patches import PathPatch from cogent.util.warning import discontinued from cogent.draw.linear import Display from cogent.draw.rlg2mpl import Drawable, figureLayout from cogent.align.align import dotplot __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" def suitable_threshold(window, desired_probability): """Use cumulative binomial distribution to find the number of identical bases which we expect a nucleotide window-mer to have with the desired probability""" cumulative_p = 0.0 for matches in range(window, 0, -1): mismatches = window - matches p = 0.75 ** mismatches for i in range(matches, 0, -1): # n p *= (i + mismatches) p /= i p *= 0.25 cumulative_p += p if cumulative_p > desired_probability: break return matches def _reinchify(figsize, posn, *args): (fw, fh) = figsize (x,y,w,h) = posn return [fw*x, fh*y, fw*w, fh*h] def comparison_display(seq1, seq2, left=.5, bottom=.5, **kw): """'Fat' annotated X and Y axes for a dotplot Returns a matplotlib axes object placed and scaled ready for plotting a sequence vs sequence comparison between the sequences (or alignments) seq1 and seq2, which are also displayed. 
The aspect ratio will depend on the sequence lengths as the sequences are drawn to the same scale""" import matplotlib.pyplot as plt (x1, y1, w1, h1) = _reinchify(*seq1.figureLayout( labeled=True, bottom=bottom, margin=0)) (x2, y2, w2, h2) = _reinchify(*seq2.figureLayout( labeled=False, bottom=left, margin=0)) # equalize points-per-base scales to get aspect ratio 1.0 ipb = min(w1/len(seq1), w2/len(seq2)) (w1, w2) = ipb*len(seq1), ipb*len(seq2) # Figure with correct aspect # Indent enough for labels and/or vertical display (w,h), posn = figureLayout(width=w1, height=w2, left=max(x1,y2+h2), bottom=y1+h1, **kw) fig = plt.figure(figsize=(w,h), facecolor='white') fw = fig.get_figwidth() fh = fig.get_figheight() # 2 sequence display axes x = seq1.asAxes(fig, [posn[0], posn[1]-h1/fh, posn[2], h1/fh]) y = seq2.asAxes(fig, [posn[0]-h2/fw, posn[1], h2/fw, posn[3]], vertical=True, labeled=False) # and 1 dotplot axes d = fig.add_axes(posn, sharex=x, sharey=y) d.xaxis.set_visible(False) d.yaxis.set_visible(False) return d class Display2D(Drawable): def __init__(self, seq1, seq2, **kw): if not isinstance(seq1, Display): seq1 = Display(seq1, **kw) if not isinstance(seq2, Display): seq2 = Display(seq2, **kw) self.seq1 = seq1.base self.seq1d = seq1 self.seq2 = seq2.base self.seq2d = seq2 self._cache = {} # Check inputs are sufficiently sequence-like assert len(self.seq1) == len(str(self.seq1)) assert len(self.seq2) == len(str(self.seq2)) def _calc_lines(self, window, threshold, min_gap): # Cache dotplot line segment coordinates as they can sometimes # be re-used at different resolutions, colours etc. 
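When no threshold is given, _calc_lines (continuing below) derives one from suitable_threshold, defined near the top of this module. Restating that helper standalone makes its contract easy to probe: it returns the smallest per-window match count whose cumulative binomial tail probability (match chance 0.25 per base) exceeds the target, so looser targets need fewer matching bases:

```python
def suitable_threshold(window, desired_probability):
    # Accumulate the binomial tail P(X >= matches) for a window of
    # independent bases with per-base match probability 0.25, stopping
    # at the smallest match count whose tail exceeds the target.
    cumulative_p = 0.0
    for matches in range(window, 0, -1):
        mismatches = window - matches
        p = 0.75 ** mismatches
        for i in range(matches, 0, -1):
            p *= (i + mismatches)
            p /= i
            p *= 0.25
        cumulative_p += p
        if cumulative_p > desired_probability:
            break
    return matches

print(suitable_threshold(4, 0.5))   # 1
print(suitable_threshold(4, 0.04))  # 3
```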
(len1, len2) = (len(self.seq1), len(self.seq2)) if threshold is None: universe = (len1-window) * (len2-window) acceptable_noise = min(len1, len2) / window threshold = suitable_threshold(window, acceptable_noise/universe) # print 'require %s / %s bases' % (threshold, window) # print 'expect %s / %s matching' % (acceptable_noise, universe) key = (min_gap, window, threshold) if not self._cache.has_key(key): fwd = dotplot(str(self.seq1), str(self.seq2), window, threshold, min_gap, None) if hasattr(self.seq1, "reversecomplement"): rev = dotplot(str(self.seq1.reversecomplement()), str(self.seq2), window, threshold, min_gap, None) rev = [((len1-x1,y1),(len1-x2,y2)) for ((x1,y1),(x2,y2)) in rev] else: rev = [] self._cache[key] = (fwd, rev) return self._cache[key] def makeFigure(self, window=20, join_gaps=None, min_gap=0, **kw): """Drawing of a line segment based dotplot with annotated axes""" # hard to pick min_gap without knowing pixels per base, and # matplotlib is reasonably fast anyway, so: if join_gaps is not None: discontinued('argument', 'join_gaps', '1.6') ax = comparison_display(self.seq1d, self.seq2d, **kw) (fwd, rev) = self._calc_lines(window, None, min_gap) for (lines, colour) in [(fwd, 'blue'), (rev, 'red')]: vertices = [] for segment in lines: vertices.extend(segment) if vertices: ops = [Path.MOVETO, Path.LINETO] * (len(vertices)//2) path = Path(vertices, ops) patch = PathPatch(path, edgecolor=colour, fill=False) ax.add_patch(patch) return ax.get_figure() def simplerMakeFigure(self): """Drawing of a matrix style dotplot with annotated axes""" import numpy ax = comparison_display(self.seq1, self.seq2) alphabet = self.seq1.MolType.Alphabet seq1 = alphabet.toIndices(self.seq1) seq2 = alphabet.toIndices(self.seq2) ax.pcolorfast(numpy.equal.outer(seq2, seq1)) return ax.get_figure() PyCogent-1.5.3/cogent/draw/fancy_arrow.py000644 000765 000024 00000006314 12024702176 021345 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Extension of matplotlib's arrows 
to give additional control. Contributed this back to the matplotlib developers, so may be obsolete. """ from matplotlib.patches import Polygon from numpy import dot, array, sqrt, concatenate __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class FancyArrow(Polygon): """Like Arrow, but lets you set head width and head height independently.""" def __init__(self, x, y, dx, dy, width=0.001, length_includes_head=False, \ head_width=None, head_length=None, shape='full', overhang=0, \ head_starts_at_zero=False,**kwargs): """Returns a new Arrow. length_includes_head: True if head is counted in calculating the length. shape: ['full', 'left', 'right'] overhang: distance that the arrow is swept back (0 overhang means triangular shape). head_starts_at_zero: if True, the head starts being drawn at coordinate 0 instead of ending at coordinate 0. 
""" if head_width is None: head_width = 3 * width if head_length is None: head_length = 1.5 * head_width distance = sqrt(dx**2 + dy**2) if length_includes_head: length=distance else: length=distance+head_length if not length: verts = [] #display nothing if empty else: #start by drawing horizontal arrow, point at (0,0) hw, hl, hs, lw = head_width, head_length, overhang, width left_half_arrow = array([ [0.0,0.0], #tip [-hl, -hw/2.0], #leftmost [-hl*(1-hs), -lw/2.0], #meets stem [-length, -lw/2.0], #bottom left [-length, 0], ]) #if we're not including the head, shift up by head length if not length_includes_head: left_half_arrow += [head_length, 0] #if the head starts at 0, shift up by another head length if head_starts_at_zero: left_half_arrow += [head_length/2.0, 0] #figure out the shape, and complete accordingly if shape == 'left': coords = left_half_arrow else: right_half_arrow = left_half_arrow*[1,-1] if shape == 'right': coords = right_half_arrow elif shape == 'full': coords=concatenate([left_half_arrow,right_half_arrow[::-1]]) else: raise ValueError, "Got unknown shape: %s" % shape cx = float(dx)/distance sx = float(dy)/distance M = array([[cx, sx],[-sx,cx]]) verts = dot(coords, M) + (x+dx, y+dy) Polygon.__init__(self, map(tuple, verts), **kwargs) def arrow(axis, x, y, dx, dy, **kwargs): """Draws arrow on specified axis from (x,y) to (x+dx,y+dy).""" a = FancyArrow(x, y, dx, dy, **kwargs) axis.add_artist(a) return a PyCogent-1.5.3/cogent/draw/legend.py000644 000765 000024 00000006200 12024702176 020263 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division from cogent.core import moltype, annotation from matplotlib.collections import PatchCollection from matplotlib.text import Text from matplotlib.transforms import Affine2D from cogent.draw.rlg2mpl import Group, Drawable, figureLayout from cogent.draw.linear import Display, DisplayPolicy __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The 
Cogent Project" __credits__ = ["Gavin Huttley", "Peter Maxwell", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class Legend(Drawable): """A class for drawing a legend for a display policy Arguments: - policy: a reference to a Display policy class""" def __init__(self, policy = DisplayPolicy): self.policy = policy def _makeSampleSequence(self, feature_type): seq = moltype.DNA.makeSequence('aaaccggttt' * 7) v = seq.addAnnotation(annotation.Feature, feature_type, feature_type, [(2,3)]) v = seq.addAnnotation(annotation.Feature, feature_type, feature_type, [(7,18)]) v = seq.addAnnotation(annotation.Feature, feature_type, feature_type, [(20,70)]) return seq def populateAxes(self, ax, columns = 3): """ Returns the legend as a matplotlib artist Arguments: - columns: the number of columns of feature / representation pairs """ ax.set_xlim(0, 600) ax.set_ylim(-800, 50) result = [] x = y = 0 for track in self.policy()._makeTrackDefns(): if track.tag is None or track.tag=="Graphs": continue ax.text(10, y*30, track.tag) y -= 1 for feature in track: seq = self._makeSampleSequence(feature) display = Display(seq, policy = self.policy, min_feature_height = 10, show_code = False, pad = 0,) sample = display.makeArtist() #trans = sample.get_transform() #offset = Affine2D() #offset.translate(x*600+20 / columns, y*30) sample.translate(x*600/columns+10, y*30) ax.add_artist(sample) ax.text(x*600/columns+90, y*30, feature) x += 1 if x % columns == 0: x = 0 y -= 1 if x: x = 0 y -= 1 ax.axhline((y+.7)*30) def makeFigure(self, margin=0, default_aspect=1.3, **kw): kw['margin'] = margin kw['default_aspect'] = default_aspect (width, height), posn, kw = figureLayout(leftovers=True, **kw) fig = self._makeFigure(width, height) ax = fig.add_axes(posn, adjustable="datalim", frame_on=False, xticks=[], yticks=[]) g = self.populateAxes(ax, **kw) return fig if __name__ == '__main__': 
Legend().showFigure() PyCogent-1.5.3/cogent/draw/linear.py000644 000765 000024 00000120755 12024702176 020313 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from __future__ import division import rlg2mpl import matplotlib.colors import matplotlib.ticker import matplotlib.transforms from matplotlib.text import Text from matplotlib.patches import PathPatch from matplotlib.font_manager import FontProperties from matplotlib.collections import (CircleCollection, PolyCollection, LineCollection, RegularPolyCollection) from matplotlib.transforms import (IdentityTransform, blended_transform_factory, Affine2DBase) import numpy import copy import warnings from cogent.core.location import Map, Span, _norm_slice from cogent.core.moltype import DNA __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Peter Maxwell", "Matthew Wakefield"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" # map - a 1D space. must support len() and in some cases [i] # track - a panel within a sequence display holding annotations # display - a collection of stacked tracks with a shared base sequence # base - the map that provides the X scale # annotation - an annotated disjoint location on a map # span - a contiguous part of an annotation # feature, code, variable, shade - types of annotation # TODO: subtracks, spectra/width variables, strides, windows # broken ends, vscales, zero lines, sub-pixel features. # fuzzy ends. more ids/qualifiers.
MAX_WIDTH = 72*8 def dna_shapes(): w = .5 / 2 s = .75 / 2 m = (w+s) / 2 r = 8 y = 5 n = (r+y) / 2 def rectangle(x,y): return [(-x, 0), (-x, y), (x, y), (x, 0)] shapes = {} for (motif, width, height) in [ ('A', w, r), ('G', s, r), ('C', s, y), ('T', w, y), ('R', m, r), ('Y', m, y), ('S', s, n), ('W', w, n), ('N', m, n)]: shapes[motif] = rectangle(width, height) return shapes dna_shapes = dna_shapes() class TransformScalePart(Affine2DBase): """Just the translation factors of the child transform, no rotation or translation. a: Child transform from which scale is extracted source_dims: the dimensions (0:X, 1:Y, 2:I) from which the resulting X and Y scales are taken.""" def __init__(self, a, source_dims=[0,1]): self.input_dims = a.input_dims self.output_dims = a.output_dims self._a = a self._scale_source_dims = source_dims self._mtx = a.get_affine().get_matrix().copy() self._mtx[:] = 0 self._mtx[2,2] = 1 self._invalid = True self.set_children(a) Affine2DBase.__init__(self) def get_matrix(self): if self._invalid: a = self._a.get_affine().get_matrix() for dim in [0,1]: sdim = self._scale_source_dims[dim] self._mtx[dim,dim] = a[sdim,sdim] self._inverted = None self._invalid = 0 return self._mtx class _Colors(object): """colors.white = to_rgb("white"), same as just using "white" except that this lookup also checks the color is valid""" def __getattr__(self, attr): return matplotlib.colors.colorConverter.to_rgb(attr) colors = _Colors() def llen(label, fontSize=10): # placeholder for better length-of-label code return len(label) * fontSize class TrackDefn(object): def __init__(self, tag, features): assert tag self.tag = tag self.feature_types = features def __iter__(self): return iter(self.feature_types) class Track(object): def __init__(self, tag, features, level=0, label=None, needs_border=False, max_y=None, height=None): assert tag if label is None: label = tag self.tag = tag self.label = label assert isinstance(label, str), (tag, label) self.features = features 
self.height = max(height, max([f.height for f in self.features])) self.range = max_y or max( # xxx this works for zero-based only [getattr(f, 'value', None) for f in self.features]) or 0 self.level = level self.needs_border = needs_border def getShapes(self, span, rotated, height, yrange=None, done_border=False): shape_list = [feature.shape(height, yrange or self.range, rotated) for feature in self.features] if self.needs_border and not done_border: border = rlg2mpl.Group( rlg2mpl.Line(span[0], 0, span[1], 0, strokeWidth=.5, strokeColor=colors.black), rlg2mpl.Line(span[0], height, span[1], height, strokeWidth=.5, strokeColor=colors.black) ) shape_list = [border] + shape_list return shape_list def __repr__(self): return "Track(%(tag)s,%(label)s)" % vars(self) class CompositeTrack(Track): """Overlayed tracks""" def __init__(self, tag, tracks, label=None): if label is None: labels = dict([(track.label, True) for track in tracks]).keys() if len(labels) == 1: label = labels[0] else: label = '' self.tag = tag self.label = label self.tracks = tracks self.height = max([track.height for track in tracks]) self.level = max([track.level for track in tracks]) self.range = max([track.range for track in tracks]) def getShapes(self, span, rotated, height, yrange=None, done_border=False): if yrange is None: yrange = self.range shape_list = [] for track in self.tracks: if track.needs_border and not done_border: border = rlg2mpl.Group( rlg2mpl.Line(span[0], 0, span[1], 0, strokeWidth=.5, strokeColor=colors.black), rlg2mpl.Line(span[0], height, span[1], height, strokeWidth=.5, strokeColor=colors.black) ) shape_list.append(border) done_border = True shape_list.extend(track.getShapes(span, rotated, height, yrange=yrange, done_border=True)) return shape_list class Annotation(object): """A map, a style, and some values""" def __init__(self, map, *args, **kw): self.map = map self.values = self._make_values(*args, **kw) def _make_values(self, *args, **kw): # override for variables etc. 
return [] def __repr__(self): return "%s at %s" % (type(self), getattr(self, 'map', '?')) # xxx styles do this. what about maps/ others def shape(self, height, yrange, rotated): g = rlg2mpl.Group() posn = 0 for span in self.map.spans: if not span.lost: g.add(self._item_shape( span, self.values[posn:posn+span.length], height, yrange, rotated)) posn += span.length return g class _SeqRepresentation(object): height = 20 y_offset = 10 x_offset = 0 def __init__(self, map, sequence, cvalues=None, colour_sequences=True, font_properties=None): self.font_properties = font_properties alphabet = self.alphabet = sequence.MolType.Alphabets.Degen alphabet_colours = None if cvalues: assert len(cvalues) == len(sequence) cvalues = numpy.asarray(cvalues) elif colour_sequences: colour_map = sequence.getColourScheme(colors) color_specs = [colour_map.get(m,'grey') for m in self.alphabet] alphabet_colours = numpy.array([ matplotlib.colors.colorConverter.to_rgba(c, alpha=.5) for c in color_specs]) # this could be faster if sequence were known to be a ModelSequence sequence = numpy.asarray(self.alphabet.toIndices(str(sequence))) posn = 0 used_count = 0 offsets = [] lengths = [] used = numpy.zeros([len(sequence)], bool) x_offset = self.x_offset * 1.0 for span in map.spans: if not (span.lost or span.Reverse): offsets.append(x_offset + span.Start - used_count) lengths.append(span.length) used[posn:posn+span.length] = True used_count += span.length posn += span.length seq = sequence[used] if cvalues is None: cvals = None else: cvals = cvalues[used] x_offsets = numpy.repeat(offsets, lengths) + numpy.arange(used_count) y_offsets = numpy.zeros_like(x_offsets) + self.y_offset offsets = numpy.vstack([x_offsets, y_offsets]).T self._calc_values(seq, cvals, alphabet_colours, offsets) def shape(self, height, yrange, rotated): raise NotImplementedError class _MultiShapeSeqRepresentation(_SeqRepresentation): def _calc_values(self, sequence, cvalues, alphabet_colours, offsets): motifs =
range(len(self.alphabet)) values = [] for motif in motifs: positions = numpy.flatnonzero(sequence==motif) if len(positions) == 0: continue if cvalues is not None: cvs = cvalues.take(positions, axis=0) elif alphabet_colours is not None: cvs = [alphabet_colours[motif]] else: cvs = [colors.black] values.append((motif, cvs, offsets[positions])) self.per_shape_values = values class _SingleShapeSeqRepresentation(_SeqRepresentation): def _calc_values(self, sequence, cvalues, alphabet_colours, offsets): if cvalues: cvs = cvalues elif alphabet_colours is not None: cvs = alphabet_colours.take(sequence, axis=0) else: cvs = [colors.black] self.cvalues = cvs self.offsets = offsets class SeqText(_MultiShapeSeqRepresentation): height = 20 x_offset = 0.5 def shape(self, height, yrange, rotated): rot = 0 if rotated: rot += 90 #if span.Reverse: rot+= 180 g = rlg2mpl.Group() kw = dict(ha='center', va='baseline', rotation=rot, font_properties=self.font_properties) for (motif, cvalues, offsets) in self.per_shape_values: letter = self.alphabet[motif] c = len(cvalues) for (i, (x,y)) in enumerate(offsets): s = Text(x, y, letter, color=cvalues[i%c], **kw) g.add(s) return g class SeqShapes(_MultiShapeSeqRepresentation): height = 10 x_offset = 0.5 y_offset = 0 def __init__(self, map, sequence, *args, **kw): super(SeqShapes, self).__init__(map, sequence, *args, **kw) default = dna_shapes['N'] self.shapes = [dna_shapes.get(m, default) for m in self.alphabet] self.rshapes = [[(y,x) for (x,y) in v] for v in self.shapes] def shape(self, height, yrange, rotated): g = rlg2mpl.Group() (X, Y, I) = (0, 1, 2) shapes = [self.shapes, self.rshapes][rotated] trans = TransformScalePart(g.combined_transform) artists = [] for (motif, cvalues, offsets) in self.per_shape_values: shape = shapes[motif] a = PolyCollection([shape], closed=True, facecolors=cvalues, edgecolors=cvalues, offsets=offsets, transOffset=g.combined_transform) g.add(a) a.set_transform(trans) return g class 
SeqDots(_SingleShapeSeqRepresentation): # Something wrong with this one. height = 5 x_offset = 0.5 y_offset = 5 def shape(self, height, yrange, rotated): g = rlg2mpl.Group() (X, Y, I) = (0, 1, 2) #scaled_axes = [[X, I], [I, Y]][rotated] scaled_axes = [[X, X], [Y, Y]][rotated] scaled_axes = [[X, Y], [X, Y]][rotated] trans = TransformScalePart(g.combined_transform, scaled_axes) a = CircleCollection([.5], edgecolors=self.cvalues, facecolors=self.cvalues, offsets=self.offsets, transOffset=g.combined_transform) g.add(a) a.set_transform(trans) return g class SeqLineSegments(_SingleShapeSeqRepresentation): height = 5 x_offset = 0.0 y_offset = 2.5 def shape(self, height, yrange, rotated): g = rlg2mpl.Group() trans = TransformScalePart(g.combined_transform) segment = [(.1,0),(.9,0)] if rotated: segment = [(y,x) for (x,y) in segment] a = LineCollection([segment], colors=self.cvalues, offsets=self.offsets, transOffset=g.combined_transform) a.set_linewidth(3) g.add(a) a.set_transform(trans) return g class SeqLine(object): height = 20 def __init__(self, map, *args, **kw): x_offset = 0.0 self.segments = [(span.Start+x_offset, span.End+x_offset) for span in map.spans if not span.lost] def shape(self, height, yrange, rotated): g = rlg2mpl.Group() trans = TransformScalePart(g.combined_transform) y = height/2.0 segments = [[(x1,y),(x2,y)] for (x1,x2) in self.segments] a = LineCollection(segments, edgecolor='k', facecolor='k') a.set_linewidth(2) g.add(a) a.set_transform(g.combined_transform) return g class Feature(Annotation): """An Annotation with a style and location rather than values""" def __init__(self, map, style, label=None, value=None): self.map = map self.style = style self.label = label self.value = value self.height = style.height #self.values = self._make_values(*args, **kw) def shape(self, height, yrange, rotated): return self.style(height, self.label, self.map, self.value, yrange, rotated) class _FeatureStyle(object): range_required = False def __init__(self, 
fill=True, color=colors.black, min_width=0.5, showLabel=False, height=1, thickness=0.6, closed=True, one_span=False, **kw): opts = {} if fill: opts['fillColor'] = color opts['strokeColor'] = None opts['strokeWidth'] = 0 else: opts['fillColor'] = colors.white # otherwise matplotlib blue! opts['strokeColor'] = color opts['strokeWidth'] = 1 self.filled = fill self.closed = closed or fill opts.update(kw) self.opts = opts self.min_width = min_width self.showLabel = showLabel self.height = height self.proportion_of_track = thickness self.one_span = one_span def __call__(self, height, label, map, value, yrange, rotated): #return self.FeatureClass(label, map) g = rlg2mpl.Group() last = first = None if self.range_required and not yrange: warnings.warn("'%s' graph values are all zero" % label) yrange = 1.0 if map.useful and self.one_span: map = map.getCoveringSpan() for (i, span) in enumerate(map.spans): #if last is not None: # g.add(rlg2mpl.Line(last, height, part.Start, height)) if span.lost or (value is None and self.range_required): continue if span.Reverse: (start, end) = (span.End, span.Start) (tidy_start, tidy_end) = (span.tidy_end, span.tidy_start) else: (start, end) = (span.Start, span.End) (tidy_start, tidy_end) = (span.tidy_start, span.tidy_end) shape = self._item_shape( start, end, tidy_start, tidy_end, height, value, yrange, rotated, last=i==len(map.spans)-1) g.add(shape) last = end if first is None: first = start if self.showLabel and label and last is not None and height > 7: font_height = 12 #self.label_font.get_size_in_points() text_width = llen(label, font_height) if (text_width < abs(first-last)): label_shape = Text( (first+last)/2, height/2, label, ha="center", va="center", rotation=[0,90][rotated], #font_properties=self.label_font, ) g.add(label_shape) else: pass #warnings.warn("couldn't fit feature label '%s'" % label) return g class _VariableThicknessFeatureStyle(_FeatureStyle): def _item_shape(self, start, end, tidy_start, tidy_end, height, value, 
yrange, rotated, last=False): if yrange: thickness = 1.0*value/yrange*height else: thickness = height*self.proportion_of_track return self._item_shape_scaled(start, end, tidy_start, tidy_end, height/2, max(2, thickness), rotated, last) class Box(_VariableThicknessFeatureStyle): arrow = False blunt = False def _item_shape_scaled(self, start, end, tidy_start, tidy_end, middle, thickness, rotated, last): (top, bottom) = (middle+thickness/2, middle-thickness/2) kw = dict(min_width=self.min_width, pointy=False, closed=self.closed, blunt=self.blunt, proportion_of_track=self.proportion_of_track) kw['rounded'] = tidy_start #kw['closed'] = self.closed or tidy_start end1 = rlg2mpl.End(start, end, bottom, top, **kw) kw['rounded'] = tidy_end #kw['closed'] = self.closed or tidy_end or self.filled kw['pointy'] = last and self.arrow end2 = rlg2mpl.End(end, start, top, bottom, **kw) path = end1 + end2 return PathPatch(path, **rlg2mpl.line_options(**self.opts)) class Arrow(Box): arrow = True blunt = False class BluntArrow(Box): arrow = True blunt = True class Diamond(_VariableThicknessFeatureStyle): """diamond""" def _item_shape_scaled(self, start, end, tidy_start, tidy_end, middle, thickness, rotated, last): x = (start+end)/2 spread = max(abs(start-end), self.min_width) / 2 return rlg2mpl.Polygon( [(x-spread, middle), (x, middle+thickness/2), (x+spread, middle), (x, middle-thickness/2)], **self.opts) class Line(_FeatureStyle): """For a line segment graph""" range_required = True def _item_shape(self, start, end, tidy_start, tidy_end, height, value, yrange, rotated, last=False): altitude = value * (height-1) / yrange #if self.orientation < 0: # altitude = height - altitude return rlg2mpl.Line(start, altitude, end, altitude, **self.opts) class Area(_FeatureStyle): """For a line segment graph""" range_required = True def _item_shape(self, start, end, tidy_start, tidy_end, height, value, yrange, rotated, last=False): altitude = value * (height-1) / yrange #if self.orientation < 0: # 
altitude = height - altitude if end < start: start, end = end, start tidy_start, tidy_end = tidy_end, tidy_start return rlg2mpl.Rect(start, 0, end-start, altitude, **self.opts) class DisplayPolicy(object): def _makeFeatureStyles(self): return { #gene structure 'misc_RNA': Box(True, colors.lightcyan), 'precursor_RNA': Box(True, colors.lightcyan), 'prim_transcript': Box(True, colors.lightcyan), "3'clip": Box(True, colors.lightcyan), "5'clip": Box(True, colors.lightcyan), 'mRNA': Box(True, colors.cyan), 'exon': Box(True, colors.cyan), 'intron': Box(False, colors.cyan, closed = False), "3'UTR": Box(True, colors.cyan), "5'UTR": Box(True, colors.cyan), 'CDS': Box(True, colors.blue), 'mat_peptide': Box(True, colors.blue), 'sig_peptide': Box(True, colors.navy), 'transit_peptide': Box(True, colors.navy), 'polyA_signal': Box(True, colors.lightgreen), 'polyA_site': Diamond(True, colors.lightgreen), 'gene': BluntArrow(False, colors.blue, showLabel=True, closed = False), 'operon': BluntArrow(False, colors.royalblue, showLabel=True, closed = False), #regulation 'attenuator': Box(False, colors.red), 'enhancer': Box(True, colors.green), 'CAAT_signal': Diamond(True, colors.blue), 'TATA_signal': Diamond(True, colors.teal), 'promoter': Box(False, colors.seagreen), 'GC_signal': Box(True, colors.purple), 'protein_bind': Box(True, colors.orange), 'misc_binding': Box(False, colors.black), '-10_signal': Diamond(True, colors.blue), '-35_signal': Diamond(True, colors.teal), 'terminator': Diamond(True, colors.red), 'misc_signal': Box(False, colors.maroon), 'rep_origin': Box(True, colors.linen), 'RBS': Diamond(True, colors.navy), #repeats 'repeat_region': Box(True, colors.brown), 'repeat_unit': Arrow(True, colors.brown), 'LTR': Box(False, colors.black), 'satellite': Box(False, colors.brown), 'stem_loop': Box(False, colors.dimgray), 'misc_structure': Box(False, colors.darkslategray), #rna genes 'rRNA': Arrow(False, colors.darkorchid, showLabel=True), 'scRNA': Arrow(False, colors.darkslateblue, 
showLabel=True), 'snRNA': Arrow(False, colors.darkviolet, showLabel=True), 'snoRNA': Arrow(False, colors.darkviolet, showLabel=True), 'tRNA': Arrow(False, colors.darkturquoise, showLabel=True), #sequence 'source': Box(False, colors.black, showLabel=True), 'misc_recomb': Box(False, colors.black, showLabel=True), 'variation': Diamond(True, colors.violet, showLabel=True), 'domain': Box(False, colors.darkorange, showLabel=True), 'bluediamond': Diamond(True, colors.blue), 'reddiamond': Diamond(True, colors.red), 'misc_feature': Box(True, colors.darkorange, showLabel=True), 'old_sequence': Box(False, colors.darkslategray), 'unsure': Diamond(False, colors.crimson, min_width=2,), 'misc_difference': Diamond(False, colors.darkorange), 'conflict': Box(False, colors.darkorange), 'modified_base': Diamond(True, colors.black), 'primer_bind': Arrow(False, colors.green, showLabel=True), 'STS': Box(False, colors.black), 'gap': Box(True, colors.gray), #graphs 'blueline': Line(False, colors.blue), 'redline': Line(False, colors.red), #other ##immune system specific #'C_region': Diamond(True, colors.mediumblue), #'N_region': Box(False, colors.linen), #'S_region': Box(False, colors.linen), #'V_region': Box(False, colors.linen), #'D_segment': Diamond(True, colors.mediumpurple), #'J_segment': Box(False, colors.linen), #'V_segment': Box(False, colors.linen), #'iDNA': Box(False, colors.grey), ##Mitocondria specific #'D-loop': Diamond(True, colors.linen), ##Bacterial element specific #'oriT': Box(False, colors.linen), } def _makeTrackDefns(self): return [TrackDefn(*args) for args in [ ('Gene Structure',[ 'misc_RNA', 'precursor_RNA', 'prim_transcript', "3'clip", "5'clip", 'mRNA', 'exon', 'intron', "3'UTR", "5'UTR", 'CDS', 'mat_peptide', 'sig_peptide', 'transit_peptide', 'polyA_signal', 'polyA_site', 'gene', 'operon', ]), ('Regulation',[ 'attenuator', 'enhancer', 'CAAT_signal', 'TATA_signal', 'promoter', 'GC_signal', 'protein_bind', 'misc_binding', '-10_signal', '-35_signal', 'terminator', 
'misc_signal', 'rep_origin', 'RBS', ]), ('Repeats',[ 'repeat_region', 'repeat_unit', 'LTR', 'satellite', 'stem_loop', 'misc_structure', ]), ('Rna Genes',[ 'rRNA', 'scRNA', 'snRNA', 'snoRNA', 'tRNA', ]), ('Sequence',[ 'source', 'misc_recomb', 'domain', 'variation', 'bluediamond', 'reddiamond', 'misc_feature', 'old_sequence', 'unsure', 'misc_difference', 'conflict', 'modified_base', 'primer_bind', 'STS', 'gap', ]), ('Graphs',[ 'blueline', 'redline', ]), ]] _default_ignored_features = ['C_region','N_region','S_region','V_region', 'D_segment','J_segment','V_segment','iDNA','D-loop','oriT',] _default_keep_unexpected_tracks = True dont_merge = [] show_text = None # auto draw_bases = None show_gaps = None colour_sequences = None seq_color_callback = None seqname = '' rowlen = None recursive = True def __init__(self, min_feature_height = 20, min_graph_height = None, ignored_features=None, keep_unexpected_tracks=None, **kw): self.seq_font = FontProperties(size=10) #self.label_font = FontProperties() if min_graph_height is None: min_graph_height = min_feature_height * 2 feature_styles = self._makeFeatureStyles() # yuk for style in feature_styles.values(): if style.range_required: style.height = max(style.height, min_graph_height) else: style.height = max(style.height, min_feature_height) self._track_defns = self._makeTrackDefns() if ignored_features is None: ignored_features = self._default_ignored_features self._ignored_features = ignored_features if keep_unexpected_tracks is None: keep_unexpected_tracks = self._default_keep_unexpected_tracks self.keep_unexpected_tracks = keep_unexpected_tracks if not hasattr(self, '_track_map'): self._track_map = {} for track_defn in self._track_defns: for (level, feature_tag) in enumerate(track_defn): feature_style = feature_styles[feature_tag] self._track_map[feature_tag] = ( track_defn, level, feature_style) for ft in self._ignored_features: self._track_map[ft] = (None, 0, None) for ft in feature_styles: if ft not in self._track_map: 
self._track_map[ft] = (None, level, feature_style) self.map = None self.depth = 0 self.orientation = -1 self.show_code = True self._logged_drops = [] self._setattrs(**kw) def _setattrs(self, **kw): for (n,v) in kw.items(): if not hasattr(self, n): warnings.warn('surprising kwarg "%s"' % n, stacklevel=3) if n.endswith('font'): assert isinstance(kw[n], FontProperties) setattr(self, n, v) def copy(self, **kw): new = copy.copy(self) new._setattrs(**kw) return new def at(self, map): if map is None: return self else: return self.copy(map=self.map[map], depth=self.depth+1) def mergeTracks(self, orig_tracks, keep_unexpected=None): # merge tracks with same names # order features within a track by level # xxx remerge tracks = {} orig_track_tags = [] for track in orig_tracks: if not track.tag in tracks: tracks[track.tag] = {} orig_track_tags.append(track.tag) # ordered list if not track.level in tracks[track.tag]: tracks[track.tag][track.level] = [] tracks[track.tag][track.level].append(track) track_order = [track.tag for track in self._track_defns if track.tag in tracks] unexpected = [tag for tag in orig_track_tags if tag not in track_order] if keep_unexpected is None: keep_unexpected = self.keep_unexpected_tracks if keep_unexpected: track_order += unexpected elif unexpected: warnings.warn('dropped tracks ' + ','.join(unexpected), stacklevel=2) sorted_tracks = [] for track_tag in track_order: annots = [] levels = tracks[track_tag].keys() levels.sort() for level in levels: annots.extend(tracks[track_tag][level]) if len(annots)> 1 and track_tag not in self.dont_merge: sorted_tracks.append(CompositeTrack(track_tag, annots)) else: sorted_tracks.extend(annots) return sorted_tracks def tracksForAlignment(self, alignment): annot_tracks = alignment.getAnnotationTracks(self) if self.recursive: if self.show_gaps is None: seqs_policy = self.copy(show_gaps=True) else: seqs_policy = self seq_tracks = alignment.getChildTracks(seqs_policy) else: seq_tracks = [] annot_tracks = 
self.mergeTracks(annot_tracks) return seq_tracks + annot_tracks def tracksForSequence(self, sequence=None): result = [] length = None if length is None and sequence is not None: length = len(sequence) label = getattr(self, 'seqname', '') if self.show_code and sequence is not None: # this should be based on resolution, not rowlen, but that's all # we have at this point if self.seq_color_callback is not None: cvalues = self.seq_color_callback(sequence) else: cvalues = None show_text = self.show_text draw_bases = self.draw_bases if draw_bases is None: draw_bases = self.rowlen <= 500 and sequence.MolType is DNA if show_text is None: show_text = self.rowlen <= 100 if show_text and self.rowlen <= 200: seqrepr_class = SeqText elif draw_bases: seqrepr_class = SeqShapes elif self.rowlen <= 1000 and (self.colour_sequences or cvalues is not None): seqrepr_class = SeqLineSegments elif self.show_gaps: seqrepr_class = SeqLine else: seqrepr_class = None if seqrepr_class is not None: colour_sequences = self.colour_sequences if colour_sequences is None: colour_sequences = seqrepr_class != SeqText feature = seqrepr_class(self.map, sequence, colour_sequences = colour_sequences, font_properties = self.seq_font, cvalues = cvalues) result.append(Track('seq', [feature], level=2, label=label)) else: pass # show label somewhere annot_tracks = sequence.getAnnotationTracks(self) return self.mergeTracks(annot_tracks + result) def getStyleDefnForFeature(self, feature): if feature.type in self._track_map: (track_defn, level, style) = self._track_map[feature.type] elif self.keep_unexpected_tracks: (track_defn, level, style) = self._track_map['misc_feature'] else: if feature.type not in self._logged_drops: warnings.warn('dropped feature ' + repr(feature.type)) self._logged_drops.append(feature.type) return (None, None, None) if track_defn is None: warnings.warn('dropped feature ' + repr(feature.type)) return (None, None, None) 
else: track_tag = track_defn.tag or feature.type return (track_tag, style, level) def tracksForFeature(self, feature): (track_tag, style, level) = self.getStyleDefnForFeature(feature) if style is None: return [] annot_tracks = feature.getAnnotationTracks(self) return annot_tracks + [Track(track_tag, [Feature(self.map, style, feature.Name)], level=level)] def tracksForVariable(self, variable): (track_tag, style, level) = self.getStyleDefnForFeature(variable) if style is None: return [] segments = [] max_y = 0.0 for ((x1, x2), y) in variable.xxy_list: map = self.map[x1:x2] segments.append(Feature(map, style, variable.Name, value=y)) if type(y) is tuple: y = max(y) if y > max_y: max_y = y return [Track(track_tag, segments, max_y=max_y, needs_border=True, label=variable.Name, level=level)] class Display(rlg2mpl.Drawable): """Holds a list of tracks and displays them all aligned base: A sequence, alignment, or anything else offering .getTracks(policy) policy: A DisplayPolicy subclass. pad: Gap between tracks in points. Other keyword arguments are used to modify the DisplayPolicy: Sequence display: show_text: Represent bases as characters. Slow. draw_bases: Represent bases as rectangles if MolType allows. show_gaps: Represent bases as line segments. colour_sequences: Colour code sequences if MolType allows. seq_color_callback: f(seq)->[colours] for flexible seq coloring. Layout: rowlen: wrap at this many characters per line. min_feature_height: minimum feature symbol height in points. min_graph_height: minimum height of any graphed features in points. Inclusion: recursive: include the sequences of the alignment. ignored_features: list of feature type tags to leave out. keep_unexpected_tracks: show features not assigned to a track by the policy. 
""" def __init__(self, base, policy=DisplayPolicy, _policy=None, pad=1, yrange=None, **kw): self.pad = pad self.base = base self.yrange = yrange assert len(base) > 0, len(base) if _policy is None: policy = policy(**kw).copy( map=Map([(0, len(base))], parent_length=len(base)), depth=0, rowlen=len(base)) else: policy = _policy self.policy = policy self.smap=Map([(0, len(base))], parent_length=len(base)) self._calc_tracks() def __len__(self): return len(self.smap.inverse()) def _calc_tracks(self): y = 0 self._tracks = [] for p in self.base.getTracks(self.policy)[::-1]: if not isinstance(p, Track): if not isinstance(p, list): p = [p] p = Track('', p) y2 = y + p.height + self.pad self._tracks.append((y+self.pad/2, (y+y2)/2, p)) y = y2 self.height = y if self.yrange is None: self.yrange = {} for (y, ym, p) in self._tracks: self.yrange[p.tag] = max(self.yrange.get(p.tag, 0), p.range) def copy(self, **kw): new = copy.copy(self) new.policy = self.policy.copy(**kw) new._calc_tracks() return new def __getitem__(self, slice): c = copy.copy(self) c.smap = self.smap.inverse()[slice].inverse() return c def makeArtist(self, vertical=False): g = rlg2mpl.Group() for (y, ym, p) in self._tracks: smap = self.smap.inverse() for s in p.getShapes( span=(smap.Start, smap.End), rotated=vertical, height=float(p.height), yrange=self.yrange[p.tag]): trans = matplotlib.transforms.Affine2D() trans.translate(0, y) s.set_transform(s.get_transform() + trans) g.add(s) if vertical: g.rotate(90) g.scale(-1.0, 1.0) return g def asAxes(self, fig, posn, labeled=True, vertical=False): ax = fig.add_axes(posn) self.applyScaleToAxes(ax, labeled=labeled, vertical=vertical) g = self.makeArtist(vertical=vertical) ax.add_artist(g) return ax def applyScaleToAxes(self, ax, labeled=True, vertical=False): (seqaxis, trackaxis) = [ax.xaxis, ax.yaxis] if vertical: (seqaxis, trackaxis) = (trackaxis, seqaxis) if not labeled: trackaxis.set_ticks([]) else: track_positions = [] track_labels = [] for (y, ym, p) in 
self._tracks: if p.height > 8: track_labels.append(p.label) track_positions.append(ym) trackaxis.set_ticks(track_positions) trackaxis.set_ticklabels(track_labels) if vertical: for tick in trackaxis.get_major_ticks(): tick.label1.set_rotation('vertical') tick.label2.set_rotation('vertical') seqaxis.set_major_formatter( matplotlib.ticker.FuncFormatter(lambda x,pos:str(int(x)))) smap = self.smap.inverse() seq_lim = (smap.Start, smap.End) if vertical: ax.set_ylim(*seq_lim) ax.set_xlim(0, self.height) else: ax.set_xlim(*seq_lim) ax.set_ylim(0, self.height) def figureLayout(self, labeled=True, vertical=False, width=None, height=None, left=None, **kw): if left is None: if labeled: left = max(len(p.label) for (y, ym, p) in self._tracks) left *= 12/72 * .5 # guess mixed chars, 12pt, inaccurate! else: left = 0 height = height or self.height/72 useful_width = len(self)*16/72 # ie bigish font, wide chars fkw = dict(leftovers=True, width=width, height=height, left=left, useful_width=useful_width, **kw) (w,h),posn,kw = rlg2mpl.figureLayout(**fkw) #points_per_base = w * posn[3] / len(self) if vertical: (w, h) = (h, w) posn[0:2] = reversed(posn[0:2]) posn[2:4] = reversed(posn[2:4]) return (w, h), posn, kw def makeFigure(self, width=None, height=None, rowlen=None, **kw): if rowlen: rows = [self[i:i+rowlen] for i in range(0, len(self), rowlen)] else: rows = [self] rowlen = len(self) kw.update(width=width, height=height) ((width, height), (x, y, w, h), kw) = self.figureLayout(**kw) N = len(rows) # since scales go below and titles go above, each row # gets the bottom margin, but not the top margin. 
vzoom = 1 + (y+h) * (N-1) fig = self._makeFigure(width, height * vzoom) for (i, row) in enumerate(rows): i = len(rows) - i - 1 posn = [x, (y+i*(y+h))/vzoom, w*len(row)/rowlen, h/vzoom] row.asAxes(fig, posn, **kw) return fig PyCogent-1.5.3/cogent/draw/multivariate_plot.py """Biplot and triplot for ordination results. """ from itertools import imap import pylab; from pylab import xlim, ylim, plot, scatter from matplotlib import cm from matplotlib.colors import rgb2hex, Normalize, Colormap from numpy import asarray, isscalar, concatenate, any from pdb import set_trace __author__ = "Zongzhi Liu" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Zongzhi Liu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Zongzhi Liu" __email__ = "zongzhi.liu@gmail.com" __status__ = "Development" def plot_ordination(res, choices=[1,2], axis_names='PC', constrained_names=None, samples_kw={}, species_kw={}, centroids_kw={}, biplot_kw={}, axline_kw={}): """plot the ordination result dict. - res: ordination result as a dict with 'eigvals': increasing eigen values as a 1darray. 'samples': sample points as a nx2 array. 'species': optional species points as a nx2. 'centroids': optional centroid points as a nx2. 'biplot': optional biplot arrowheads as a nx2. - choices=[1,2]: a 2 item list, including axes(base from 1) to choose - axis_names='PC': a name prefix or a list of names for each axis. - constrained_names=None: The prefix or a list of constrained axis names. - samples_kw, centroids_kw: control the plot of sample points and centroid points respectively. pass to scatter_points(label=, c=, s=, ...) - species_kw: control the plot of species points. pass to plot_points(points, cml=, label=, ...) - cml: a str for color+marker+linestyle. eg. 'r+:' - biplot_kw: pass to arrows(points_from, points_to, label=, ...) - axline_kw: pass to axhline(...) and axvline(...). 
choices = asarray(choices) -1 samples, species, centroids, biplot, evals = [asarray(res.get(k, None)) for k in ['samples', 'species', 'centroids', 'biplot', 'eigvals']] if isinstance(axis_names, str): axis_names = ['%s%i' % (axis_names, i+1) for i in range(len(evals))] # draw the axis lines axline_kw = dict({'color':'gray'}, **axline_kw) pylab.axvline(**axline_kw) pylab.axhline(**axline_kw) # calc percentages from evals and label them evals = asarray(evals) evals[evals<0] = 0 percs = 100 * (evals / evals.sum()) pylab.xlabel('%s - %.2f%%' % (axis_names[choices[0]], percs[choices[0]])) pylab.ylabel('%s - %.2f%%' % (axis_names[choices[1]], percs[choices[1]])) #plot the species points in red + if any(species): species_kw = dict({'cml': 'r+', 'label_kw':{'size':'smaller'}}, **species_kw) #set default plot_points(species[:, choices], **species_kw) # scatter the sample points in black default = {'c': 'k', 's': 50, 'alpha': 0.5, 'label_kw': {'size': 'medium'}} scatter_points(samples[:, choices], **dict(default, **samples_kw)) # scatter the centroids if any(centroids): default = {'c':'b', 's':0, 'label': 'X', 'label_kw':{'size':'larger', 'color':'b', 'ha':'center', 'va':'center'}} scatter_points(centroids[:, choices], **dict(default, **centroids_kw)) # arrow the biplot points if any(biplot): default = {'c':'b', 'label_kw':{'size':'larger', 'color':'b'}} arrows([[0,0]] * len(biplot), biplot[:, choices], **dict(default, **biplot_kw)) # calc the constrained percentage and title it if constrained_names: if isinstance(constrained_names, str): #a prefix constrained_names = [n for n in axis_names if n.startswith(constrained_names)] con_idxs = [i for i, n in enumerate(axis_names) if n in constrained_names] con_perc = percs[con_idxs].sum() pylab.title('%.2f%% constrained' % con_perc) #### # support functions def map_colors(colors, cmap=None, lut=None, mode='hexs', **norm_kw): """return a list of rgb tuples/hexs from color numbers. - colors: a seq of color numbers. - cmap: a Colormap or a name like 'jet' (pass to cm.get_cmap(cmap, lut)) - mode: one of ['hexs', 'tuples', 'arrays'] Ref: http://www.scipy.org/Cookbook/Matplotlib/Show_colormaps """ modes = ['hexs', 'tuples', 'arrays'] if mode not in modes: raise ValueError('mode must be one of %s, but got %s' % (modes, mode)) if not isinstance(cmap, Colormap): cmap = cm.get_cmap(cmap, lut=lut) rgba_arrays = cmap(Normalize(**norm_kw)(colors)) rgb_arrays = rgba_arrays[:, :-1] #without alpha if mode == 'arrays': return rgb_arrays elif mode == 'tuples': return list(imap(tuple, rgb_arrays)) else: # mode == 'hexs': return list(imap(rgb2hex, rgb_arrays)) def text_points(points, strs, **kw): """print each str at each point. - points: a seq of xy pairs. - strs: a str or a seq of strings with the length of points. **kw: params pass to pylab.text(x, y, s, **kw) - horizontalalignment or ha: ['center', 'left', 'right'] - verticalalignment or va: ['center', 'top', 'bottom'] - size, rotation, ... """ xs, ys = asarray(points, float).T if isinstance(strs, str): #vectorize strs strs = [strs] * len(xs) for x, y, s in zip(xs, ys, strs): if not s: continue pylab.text(x, y, s, **kw) def plot_points(points, cml=None, label=None, label_kw={}, **kw): """plot at each point (x,y). - points: a seq of xy pairs. - cml: a str combining color, marker and linestyle, default to be '-' - label, label_kw: pass to pylab.text(x,y, lbl, **kw) **kw: pass to plot(x,y,**kw) -color, marker(+,), linestyle(-, --, :,) Note: label was overwritten to be text label for each point. """ points = asarray(points, float) xs, ys = points.T if cml: pylab.plot(xs, ys, cml, **kw) else: pylab.plot(xs, ys, **kw) if label:#add label to each point text_points(points, label, **label_kw) def scatter_points(points, s=10, c='b', marker='o', label=None, label_kw={} , **kw): """scatter enhanced with points, markers, and labels. - points: a seq of xy-pairs - s=10: size or sizes of bubbles. 
- c='b': a color (str or tuple) or a list of colors - marker: a marker (str or tuple) or a list of markers. - label: a label (str) or a list of labels. - label_kw: will be passed to text() to print the labels. **kw: params pass to pylab.scatter() """ points = asarray(points, float) xs, ys = points.T num_points = len(xs) #vectorize size, color, and marker if isscalar(s): s = [s] * num_points if isinstance(c, (str, tuple)): c = [c] * num_points if isinstance(marker, (str, tuple)): marker = [marker] * num_points #number list to hex list if isinstance(c[0], (int, float)): c = map_colors(c) #scatter each point; colormap() will not work properly now. for xi, yi, si, ci, mi in zip(xs, ys, s, c, marker): scatter([xi], [yi], si, ci, mi, **kw) if label: #add label to each point text_points(points, label, **label_kw) def arrows(points_from, points_to, color='k', width=None, width_ratio=0.002, label=None, label_kw={}, margin_ratio=None, **kw): """draw arrows from each point_from to each point_to. - points_from, points_to: each is a seq of xy pairs. - color or c: arrow color(s). - width: arrow width(s), default to auto adjust with width_ratio. - width_ratio: set width to minimal_axisrange * width_ratio. - label=None: a list of strs to label the arrow heads. - label_kw: pass to pylab.text(x,y,txt,...) - margin_ratio: if provided as a number, canvas margins will be set to axisrange * margin_ratio. **kw: params pass to pylab.arrow(,...) """ # convert and validity check of the points points_from = asarray(points_from, float) points_to = asarray(points_to, float) num_points = len(points_from) if len(points_to) != num_points: raise ValueError('Lengths of the two groups of points do not match') if not width: #auto adjust arrow width min_range = min(asarray(xlim()).ptp(), asarray(ylim()).ptp()) width = min_range * width_ratio #vectorize colors and width if necessary color = kw.pop('c', color) if isscalar(color): color = [color] * num_points if isscalar(width): width = [width] * num_points #draw each arrow #?? length_include_head=True will break?? default = {'length_includes_head':False, 'alpha':0.75} kw = dict(default, **kw) for (x0, y0), (x1, y1), w, c in zip(points_from, points_to, width, color): pylab.arrow(x0, y0, x1-x0, y1-y0, edgecolor=c, facecolor=c, width=w, **kw) if label: #label the arrow heads text_points(points_to, label, **label_kw) #hack fix of axis limits, otherwise some of the arrows will be invisible #not necessary if the points were also scattered if margin_ratio is not None: x, y = concatenate((points_from, points_to)).T #all the points xmargin = x.ptp() * margin_ratio ymargin = y.ptp() * margin_ratio xlim(x.min()-xmargin, x.max()+xmargin) ylim(y.min()-ymargin, y.max()+ymargin) return PyCogent-1.5.3/cogent/draw/rlg2mpl.py """ReportLab Graphics -> Matplotlib helpers""" #linear, dotplot and dendrogram were originally written to target ReportLab Graphics rather #than Matplotlib. This module can be slowly boiled away as they become more matplotlib native. 
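# The map_colors helper in multivariate_plot.py above is, at heart, a three-step
# pipeline: normalize the raw numbers to [0, 1], look each value up in a colormap,
# and format the result as a hex string. A minimal stdlib-only sketch of that
# pipeline follows; the two-colour lerp_cmap stands in for matplotlib's
# cm.get_cmap/Normalize machinery, and map_colors_sketch and friends are
# illustrative names, not part of PyCogent.

```python
def normalize(values):
    """Linear rescale to [0, 1], like matplotlib.colors.Normalize."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard the degenerate all-equal case
    return [(v - lo) / span for v in values]

def lerp_cmap(t, start=(0.0, 0.0, 1.0), end=(1.0, 0.0, 0.0)):
    """Toy blue-to-red colormap: t in [0, 1] -> rgb tuple of floats."""
    return tuple(s + (e - s) * t for s, e in zip(start, end))

def rgb2hex(rgb):
    """Format an rgb tuple of floats as '#rrggbb' (cf. matplotlib's rgb2hex)."""
    return '#%02x%02x%02x' % tuple(int(round(255 * c)) for c in rgb)

def map_colors_sketch(colors):
    """Sketch of map_colors(..., mode='hexs') without matplotlib."""
    return [rgb2hex(lerp_cmap(t)) for t in normalize(colors)]

print(map_colors_sketch([0, 5, 10]))  # smallest value maps to pure blue, largest to pure red
```

# The real map_colors additionally supports 'tuples' and 'arrays' output modes
# and arbitrary named colormaps; scatter_points relies on the 'hexs' behaviour
# when it converts a numeric colour list before scattering.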
from __future__ import division from matplotlib.path import Path from matplotlib.lines import Line2D from matplotlib.text import Text import matplotlib.patches as mpatches import matplotlib.artist import matplotlib.transforms import matplotlib.colors import numpy from cogent.util.warning import discontinued __author__ = "Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Peter Maxwell" __email__ = "pm67nz@gmail.com" __status__ = "Production" def line_options(strokeColor='black', fillColor='black', strokeWidth=1): """Map from RLD line option names""" return dict(edgecolor=strokeColor, facecolor=fillColor, linewidth=strokeWidth) def Line(x1, y1, x2, y2, **kw): """Acts like the RLG shape class of the same name""" path = Path([(x1, y1), (x2, y2)], [Path.MOVETO, Path.LINETO]) return mpatches.PathPatch(path, **line_options(**kw)) def Rect(x, y, width, height, **kw): """Acts like the RLG shape class of the same name""" return mpatches.Rectangle((x,y),width,height,**line_options(**kw)) def Polygon(vertices, **kw): """Acts like the RLG shape class of the same name""" return mpatches.Polygon(vertices, **line_options(**kw)) def String(x, y, text, textAnchor='start', fontName=None, fontSize=10, fillColor='black', rotation=None): """Acts like the RLG shape class of the same name""" fontname = fontName fontsize = fontSize color = fillColor ha = {'start':'left', 'middle':'center', 'end':'right'}[textAnchor] va = 'baseline' mpl_kw = dict((n,v) for (n,v) in locals().items() if n.islower()) return Text(**mpl_kw) class Group(matplotlib.artist.Artist): """Acts like the RLG shape class of the same name Groups elements together. May apply a transform to its contents.""" def __init__(self, *elements): """Initial lists of elements may be provided to allow compact definitions in literal Python code. 
May or may not be useful.""" matplotlib.artist.Artist.__init__(self) self.contents = [] self.group_transform = matplotlib.transforms.Affine2D() self.outer_transform = matplotlib.transforms.TransformWrapper( self.get_transform()) self.combined_transform = self.group_transform + self.outer_transform for elt in elements: self.add(elt) def set_figure(self, fig): matplotlib.artist.Artist.set_figure(self, fig) for c in self.contents: c.set_figure(fig) def set_clip_path(self, patch): matplotlib.artist.Artist.set_clip_path(self, patch) for c in self.contents: c.set_clip_path(patch) def set_transform(self, transform): matplotlib.artist.Artist.set_transform(self, transform) self.outer_transform.set(self.get_transform()) def add(self, node, name=None): """Appends non-None child node to the 'contents' attribute. In addition, if a name is provided, it is subsequently accessible by name """ # propagates properties down node.set_transform(node.get_transform() + self.combined_transform) self.contents.append(node) def rotate(self, theta): """Convenience to help you set transforms""" self.group_transform.rotate_deg(theta) def translate(self, dx, dy): """Convenience to help you set transforms""" self.group_transform.translate(dx, dy) def scale(self, sx, sy): """Convenience to help you set transforms""" self.group_transform.scale(sx, sy) def draw(self, renderer, *args, **kw): for c in self.contents: c.draw(renderer, *args, **kw) def figureLayout(width=None, height=None, margin=0.25, aspect=None, default_aspect=0.75, useful_width=None, leftovers=False, **margins): """Width and height of a figure, plus a bounding box that nearly fills it, derived from defaults or provided margins. All input figures are in inches.""" left = margins.pop('left', 0) + margin right = margins.pop('right', 0) + margin top = margins.pop('top', 0) + margin bottom = margins.pop('bottom', 0) + margin default_width = 6 # use rcParams here? 
if useful_width: default_width = min(useful_width, default_width) width = width or default_width if aspect is not None: assert not height height = aspect * width else: height = height or default_aspect * width total_height = height + top + bottom total_width = width + left + right posn = [left/total_width, bottom/total_height, width/total_width, height/total_height] if leftovers: return (total_width, total_height), posn, margins else: assert not margins, margins.keys() return (total_width, total_height), posn class Drawable(object): # Superclass for objects which can generate a matplotlib figure, in order # to supply consistent and convenient showFigure() and drawToFile() # methods. # Subclasses must provide .makeFigure() which will make use of # _makeFigure() matplotlib.pyplot import done at runtime to give the # user every chance to change the matplotlib backend first def _makeFigure(self, width, height, **kw): import matplotlib.pyplot as plt fig = plt.figure(figsize=(width,height), facecolor='white') return fig def drawFigure(self, title=None, **kw): """Draw the figure. Extra arguments are forwarded to self.makeFigure()""" import matplotlib.pyplot as plt fig = self.makeFigure(**kw) if title is not None: fig.suptitle(title) plt.draw_if_interactive() def showFigure(self, title=None, **kw): """Make the figure and immediately pyplot.show() it. 
Extra arguments are forwarded to self.makeFigure()""" self.drawFigure(title, **kw) import matplotlib.pyplot as plt plt.show() def drawToFile(self, fname, **kw): """Save in a file named 'fname' Extra arguments are forwarded to self.makeFigure() unless they are valid for savefig()""" makefig_kw = {} savefig_kw = {} for (k,v) in kw.items(): if k in ['dpi', 'facecolor', 'edgecolor', 'orientation', 'papertype', 'format', 'transparent']: savefig_kw[k] = v else: makefig_kw[k] = v fig = self.makeFigure(**makefig_kw) fig.savefig(fname, **savefig_kw) def drawToPDF(self, filename, total_width=None, height=None, **kw): # Matches, as far as possible, old ReportLab version if total_width is not None: kw['width'] = total_width / 72 kw2 = {} for (k,v) in kw.items(): if k in ['wraps', 'border', 'withTrackLabelColumn']: discontinued('argument', "%s" % k, '1.6') else: kw2[k] = v kw2['format'] = 'pdf' if height: kw2['height'] = height / 72 return self.drawToFile(filename, **kw2) # For sequence feature styles: # Matplotlib has fancy_box and fancy_arrow. 
The code below is # similar, except that the two ends of the box have independent # styles: open, square, rounded, pointy, or blunt class PathBuilder(object): """Path, made up of straight lines and bezier curves.""" # Only used by the _End classes below def __init__(self): self.points = [] self.operators = [] def asPath(self): return Path(self.points, self.operators) def moveTo(self, x, y): self.points.append((x, y)) self.operators.append(Path.MOVETO) def lineTo(self, x, y): self.points.append((x, y)) self.operators.append(Path.LINETO) def curveTo(self, x1, y1, x2, y2, x3, y3): self.points.extend([(x1, y1), (x2, y2), (x3, y3)]) self.operators.extend([Path.CURVE4]*3) def closePath(self): self.points.append((0.0, 0.0)) # ignored self.operators.append(Path.CLOSEPOLY) class _End(object): def __init__(self, x_near, x_far, y_first, y_second, **kw): self.x_near = x_near self.x_far = x_far self.y_first = y_first self.y_second = y_second for (n, v) in kw.items(): setattr(self, n, v) def moveToStart(self, path): path.moveTo(*self.startPoint()) def drawToStart(self, path): path.lineTo(*self.startPoint()) def finish(self, path): path.closePath() def startPoint(self): return (self.x_near, self.y_first) def __add__(self, oppo): p = PathBuilder() self.moveToStart(p) self.drawEnd(p) oppo.drawToStart(p) oppo.drawEnd(p) self.finish(p) return p.asPath() class Open(_End): def finish(self, path): self.drawToStart(path) def drawEnd(self, path): path.moveTo(self.x_near, self.y_second) class Square(_End): def drawEnd(self, path): path.lineTo(self.x_near, self.y_second) class Rounded(_End): def startPoint(self): return (self.x_near + self.dx, self.y_first) def drawEnd(self, path): path.curveTo(self.x_near, self.y_first, self.x_near, self.y_first, self.x_near, self.y_first + self.dy) path.lineTo(self.x_near, self.y_second - self.dy) path.curveTo(self.x_near, self.y_second, self.x_near, self.y_second, self.x_near + self.dx, self.y_second) class Pointy(_End): def _effective_dx(self): return 
max(abs(self.dx), abs(self.dy))*self.dx/abs(self.dx) def startPoint(self): return (self.x_near + self._effective_dx(), self.y_first) def drawEnd(self, path): head_start = self.x_near + self._effective_dx() middle = (self.y_first + self.y_second) / 2 if self.blunt: for (x, y) in [ (head_start, self.y_first + self.dy), (self.x_near, self.y_first), (self.x_near, self.y_second), (head_start, self.y_second - self.dy), (head_start, self.y_second)]: path.lineTo(x, y) else: for (x, y) in [ (head_start, self.y_first + self.dy), (self.x_near, middle), (head_start, self.y_second - self.dy), (head_start, self.y_second)]: path.lineTo(x, y) def _sign(x): return x and x/abs(x) def End(x1, x2, y1, y2, closed=True, rounded=False, pointy=False, blunt=False, min_width=0.5, proportion_of_track=0.6): inwards = _sign(x2 - x1) span = max(abs(x2 - x1), min_width) thickness = abs(y1-y2) if pointy: head_size = min(thickness/20, span) head_size = max(head_size, min_width/2) height = thickness / proportion_of_track spare = (height - thickness) / 2 end = Pointy(x1, x2, y1, y2, dx=inwards*head_size/2, dy=_sign(y1-y2)*spare, blunt=blunt) elif rounded: ry = thickness/4 rx = min(span, ry) end = Rounded(x1, x2, y1, y2, dx=rx*inwards, dy=ry*_sign(y2-y1)) elif not closed: end = Open(x1, x2, y1, y2) else: end = Square(x1, x2, y1, y2) return end PyCogent-1.5.3/cogent/draw/util.py000644 000765 000024 00000102411 12024702176 020003 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides different kinds of generally useful plots using matplotlib. Some of these plots are enhancements of the matplotlib versions (e.g. hist()) or copies of plot types that have been withdrawn from matplotlib (e.g. scatter_classic). Notable capabilities include automated series coloring and drawing of regression lines, the ability to plot scatterplots with correlated histograms, etc. See individual docstrings for more info.
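The height normalization this module's enhanced hist routines provide (bar heights, rather than areas, summing to 1) can be illustrated independently in plain numpy — a standalone sketch, not the module's own implementation:

```python
import numpy as np

def height_normalized_hist(y, bins=10):
    # Bin the data, then divide by the total count so the bar *heights*
    # sum to 1 (matplotlib's density normalization makes the *area* 1).
    counts, edges = np.histogram(y, bins=bins)
    return counts / counts.sum(), edges

data = np.random.RandomState(0).normal(size=1000)
heights, edges = height_normalized_hist(data, bins=20)
```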
""" from __future__ import division from matplotlib import use, rc, rcParams __author__ = "Stephanie Wilson" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Stephanie Wilson"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" #use('Agg') #suppress graphical rendering #rc('text', usetex=True) rc('font', family='serif') #required to match latex text and equations try: import Image import ImageFilter except ImportError: Image = ImageFilter = None #only used to smooth contours: skip if no PIL from numpy import array, shape, fromstring, sqrt, zeros, pi from cogent.core.usage import UnsafeCodonUsage as CodonUsage from cogent.maths.stats.test import regress, correlation from pylab import plot, cm, savefig, gca, gcf, arange, text, subplot, \ asarray, iterable, searchsorted, sort, diff, concatenate, silent_list, \ is_string_like, Circle, mean, std, normpdf, legend, contourf, \ colorbar, ravel, imshow, contour from matplotlib.font_manager import FontProperties from os.path import split #module-level constants standard_series_colors=['k','r','g','b', 'm','c'] def hist(x, bins=10, normed='height', bottom=0, \ align='edge', orientation='vertical', width=None, axes=None, **kwargs): """Just like the matplotlib hist, but normalizes bar heights to 1. axes uses gca() by default (built-in hist is a method of Axes). Original docs from matplotlib: HIST(x, bins=10, normed=0, bottom=0, orientiation='vertical', **kwargs) Compute the histogram of x. bins is either an integer number of bins or a sequence giving the bins. x are the data to be binned. The return values is (n, bins, patches) If normed is true, the first element of the return tuple will be the counts normalized to form a probability density, ie, n/(len(x)*dbin) orientation = 'horizontal' | 'vertical'. If horizontal, barh will be used and the "bottom" kwarg will be the left. width: the width of the bars. 
If None, automatically compute the width. kwargs are used to update the properties of the hist bars """ if axes is None: axes = gca() if not axes._hold: axes.cla() n, bins = norm_hist_bins(x, bins, normed) if width is None: width = 0.9*(bins[1]-bins[0]) if orientation=='horizontal': patches = axes.barh(bins, n, height=width, left=bottom, \ align=align) else: patches = axes.bar(bins, n, width=width, bottom=bottom, \ align=align) for p in patches: p.update(kwargs) return n, bins, silent_list('Patch', patches) def norm_hist_bins(y, bins=10, normed='height'): """Just like the matplotlib mlab.hist, but can normalize by height. normed can be 'area' (produces matplotlib behavior, area is 1), any False value (no normalization), or any True value (normalization). Original docs from matplotlib: Return the histogram of y with bins equally sized bins. If bins is an array, use the bins. Return value is (n,x) where n is the count for each bin in x If normed is False, return the counts in the first element of the return tuple. If normed is True, return the probability density n/(len(y)*dbin) If y has rank>1, it will be raveled Credits: the Numeric 22 documentation """ y = asarray(y) if len(y.shape)>1: y = ravel(y) if not iterable(bins): ymin, ymax = min(y), max(y) if ymin==ymax: ymin -= 0.5 ymax += 0.5 if bins==1: bins=ymax dy = (ymax-ymin)/bins bins = ymin + dy*arange(bins) n = searchsorted(sort(y), bins) n = diff(concatenate([n, [len(y)]])) if normed: if normed == 'area': db = bins[1]-bins[0] else: db = 1.0 return 1/(len(y)*db)*n, bins else: return n, bins def scatter_classic(x, y, s=None, c='b'): """ SCATTER_CLASSIC(x, y, s=None, c='b') Make a scatter plot of x versus y. s is a size (in data coords) and can be either a scalar or an array of the same length as x or y. c is a color and can be a single color format string or an length(x) array of intensities which will be mapped by the colormap jet. 
If size is None a default size will be used Copied from older version of matplotlib -- removed in version 0.9.1 for whatever reason. """ self = gca() if not self._hold: self.cla() if is_string_like(c): c = [c]*len(x) elif not iterable(c): c = [c]*len(x) else: norm = normalize() norm(c) c = cm.jet(c) if s is None: s = [abs(0.015*(amax(y)-amin(y)))]*len(x) elif not iterable(s): s = [s]*len(x) if len(c)!=len(x): raise ValueError, 'c and x are not equal lengths' if len(s)!=len(x): raise ValueError, 's and x are not equal lengths' patches = [] for thisX, thisY, thisS, thisC in zip(x,y,s,c): circ = Circle( (thisX, thisY), radius=thisS, ) circ.set_facecolor(thisC) self.add_patch(circ) patches.append(circ) self.autoscale_view() return patches def as_species(name, leave_path=False): """Cleans up a filename into a species name, italicizing it in latex.""" #trim extension if present dot_location = name.rfind('.') if dot_location > -1: name = name[:dot_location] #get rid of _small if present -- used for debugging if name.endswith('_small'): name = name[:-len('_small')] if name.endswith('_codon_usage'): name = name[:-len('_codon_usage')] #get rid of path unless told to leave it name = split(name)[-1] #replace underscores with spaces name = name.replace('_', ' ') #make sure the first letter of the genus is caps, and not the first letter #of the species fields = name.split() fields[0] = fields[0].title() #assume second field is species name if len(fields) > 1: fields[1] = fields[1].lower() binomial = ' '.join(fields) if rcParams.get('text.usetex'): binomial = r'\emph{' + binomial + '}' return binomial def frac_to_psq(frac, graph_size): """Converts diameter as fraction of graph to points squared for scatter. frac: fraction of graph (e.g. 
.01 is 1% of graph size) graph_size: graph size in inches """ points = frac * graph_size * 72 return pi * (points/2.0)**2 def init_graph_display(title=None, aux_title=None, size=4.0, \ graph_shape='sqr', graph_grid=None, x_label='', y_label='', \ dark=False, with_parens=True, prob_axes=True, axes=None, num_genes=None): """Initializes a range of graph settings for standard plots. These settings include: - font sizes based on the size of the graph - graph shape - grid, including lines for x=y or at x and y = 0.5 - title, auxiliary title, and x and y axis labels Parameters: title: displayed on left of graph, at the top, latex-format string aux_title: displayed on top right of graph, latex-format string. typically used for number of genes. size: size of graph, in inches graph_shape: 'sqr' for square graphs, 'rect' for graphs that include a colorbar, '3to1' (width 3 to height 1) or '2to1' (width 2 to height 1) for wide graphs. graph_grid: background grid for the graph. Currently recognized grids are '/' (line at x=y) and 't' (cross at x=.5 and y=.5). x_label: label for x axis, latex-format string. y_label: label for y axis, latex-format string. dark: set to True if dark background, reverses text and tick colors. with_parens: if True (default), puts parens around auxiliary title returns font, label_font_size (for use in producing additional labels in calling function).
""" if dark: color='w' else: color='k' rect_scale_factor = 1.28 #need to allow for legend while keeping graph #square; empirically determined at 1.28 font_size = int(size*3-1) #want 11pt font w/ default graph size 4" sqr label_scale_factor = 0.8 label_font_size = font_size * label_scale_factor label_offset = label_font_size * 0.5 axis_label_font={'fontsize':font_size} font={'fontsize':font_size, 'color':color} if graph_shape == 'sqr': gcf().set_size_inches(size,size) elif graph_shape == 'rect': #scaling for sqr graphs with colorbar gcf().set_size_inches(size*rect_scale_factor,size) elif graph_shape == '3to1': gcf().set_size_inches(3*size, size) elif graph_shape == '2to1': gcf().set_size_inches(2*size, size) else: raise ValueError, "Got unknown graph shape %s" % graph_shape #set or create axes if axes is None: axes = gca() min_x, max_x =axes.get_xlim() min_y, max_y = axes.get_ylim() x_range = abs(max_x - min_x) y_range = abs(max_y - min_y) min_offset = (x_range * 0.05) + min_x #minimum offset, e.g. for text max_offset = max_y - (y_range * 0.05) #draw grid manually: these are in data coordinates. if graph_grid == 't': #grid lines at 0.5 on each axis, horiz & vertic axes.axvline(x=.5, ymin=0, ymax=1, color=color, linestyle=':') axes.axhline(y=.5, xmin=0, xmax=1, color=color, linestyle=':') elif graph_grid == '/': #diagonal gridlines from 0,0 to 1,1. axes.plot([0,1], color=color, linestyle=':') else: pass #ignore other choices #remove default grid axes.grid(False) #set x and y labels axes.set_ylabel(y_label, axis_label_font) axes.set_xlabel(x_label, axis_label_font) #add title/aux_title to graph directly. Note that we want #the tops of these to be fixed, and we want the label to be #left-justified and the number of genes to be right justified, #so that it still works when we resize the graph. 
if title is not None: axes.text(min_offset, max_offset, str(title), font, \ verticalalignment='top', horizontalalignment='left') #use num_genes as aux_title by default aux_title = num_genes or aux_title if aux_title is not None: if with_parens: aux_title='('+str(aux_title)+')' axes.text(max_offset, max_offset, str(aux_title), font, verticalalignment='top', horizontalalignment='right') if prob_axes: init_ticks(axes, label_font_size, dark) #set x and y label offsets -- currently though rcParams, but should be #able to do at instance level? #rc('xtick.major', pad=label_offset) #rc('ytick.major', pad=label_offset) return font, label_font_size def init_ticks(axes=None, label_font_size=None, dark=False): """Initializes ticks for fingerprint plots or other plots ranging from 0-1. takes axis argument a from a = gca(), or a specified axis sets the ticks to span from 0 to 1 with .1 intervals changes the size of the ticks and the corresponding number labels """ if axes is None: axes = gca() axes.set_xticks(arange(0,1.01,.1),) axes.set_yticks(arange(0,1.01,.1)) #reset sizes for x and y labels x = axes.get_xticklabels() y = axes.get_yticklabels() if label_font_size is not None: for l in axes.get_xticklabels() + axes.get_yticklabels(): l.set_fontsize(label_font_size) #if dark, need to reset color of internal ticks to white if dark: for l in axes.get_xticklines() + axes.get_yticklines(): l.set_markeredgecolor('white') def set_axis_to_probs(axes=None): """sets the axes to span from 0 to 1. Useful for forcing axes to range over probabilities. Axes are sometimes reset by other calls. """ #set axis for probabilities (range 0 to 1) if axes is None: axes = gca() axes.set_xlim([0,1]) axes.set_ylim([0,1]) def plot_regression_line(x,y,line_color='r', axes=None, prob_axes=False, \ axis_range=None): """Plots the regression line, and returns the equation. 
x and y are the x and y data for a single series line_color is a matplotlib color, will be used for the line axes is the name of the axes the regression will be plotted against prob_axes, if true, forces the axes to be between 0 and 1 range, if not None, forces the axes to be between (xmin, xmax, ymin, ymax). """ if axes is None: axes = gca() m, b = regress(x, y) r, significance = correlation(x,y) #set the a, b, and r values. a is the slope, b is the intercept. r_str = '%0.3g'% (r**2) m_str ='%0.3g' % m b_str = '%0.3g' % b #want to clip the line so it's contained entirely within the graph #coordinates. Basically, we need to find the values of y where x #is at x_min and x_max, and the values of x where y is at y_min and #y_max. #if we didn't set prob_axis or axis_range, just find empirical x and y if (not prob_axes) and (axis_range is None): x1, x2 = min(x), max(x) y1, y2 = m*x1 + b, m*x2 + b x_min, x_max = x1, x2 else: if prob_axes: x_min, x_max = 0, 1 y_min, y_max = 0, 1 else: #axis range must have been set x_min, x_max, y_min, y_max = axis_range #figure out bounds for x_min and y_min y_at_x_min = m*x_min + b if y_at_x_min < y_min: #too low: find x at y_min y1 = y_min x1 = (y_min-b)/m elif y_at_x_min > y_max: #too high: find x at y_max y1 = y_max x1 = (y_max-b)/m else: #just right x1, y1 = x_min, y_at_x_min y_at_x_max = m*x_max + b if y_at_x_max < y_min: #too low: find x at y_min y2 = y_min x2 = (y_min-b)/m elif y_at_x_max > y_max: #too high: find x at y_max y2 = y_max x2 = (y_max-b)/m else: #just right x2, y2 = x_max, y_at_x_max #need to check that the series wasn't entirely in range if (x_min <= x1 <= x_max) and (x_min <= x2 <= x_max): axes.plot([x1,x2],[y1,y2], color=line_color, linewidth=0.5) if b >= 0: sign_str = ' + ' else: sign_str = ' ' equation=''.join(['y= ',m_str,'x',sign_str,b_str,'\nr$^2$=',r_str]) return equation, line_color def add_regression_equations(equations, axes=None, prob_axes=False, \ horizontalalignment='right', verticalalignment='bottom'): 
"""Writes list of regression equations to graph. equations: list of regression equations size: size of the graph in inches """ if axes is None: axes = gca() if prob_axes: min_x, max_x = 0, 1 min_y, max_y = 0, 1 else: min_x, max_x = axes.get_xlim() min_y, max_y = axes.get_ylim() x_range = abs(max_x - min_x) y_range = abs(max_y - min_y) for i, (eq_text, eq_color) in enumerate(equations): axes.text((x_range * 0.98) + min_x, \ (y_range * 0.02 + min_y +(y_range * .1 * i)), \ str(eq_text), \ horizontalalignment=horizontalalignment, \ verticalalignment=verticalalignment, \ color=eq_color) def broadcast(i, n): """Broadcasts i to a vector of length n.""" try: i = list(i) except: i = [i] reps, leftovers = divmod(n, len(i)) return (i * reps) + i[:leftovers] #scatterplot functions and helpers def plot_scatter(data, series_names=None, \ series_color=standard_series_colors, line_color=standard_series_colors,\ alpha=0.25, marker_size=.015, scale_markers=True, show_legend=True,legend_loc='center right', show_regression=True, show_equation=True, prob_axes=False, size=8.0, axes=None, **kwargs): """helper plots one or more series of scatter data of specified color, calls the initializing functions, doesn't print graph takes: plotted_pairs, series_names, show_legend, legend_loc, and **kwargs passed on to init_graph_display (these include title, aux_title, size, graph_shape, graph_grid, x_label, y_label, dark, with_parens). plotted_pairs = (first_pos, second_pos, dot_color, line_color, alpha, show_regression, show_equation) returns the regression str equation (list) if regression is set true suppresses legend if series not named, even if show_legend is True. 
""" if not axes: axes = gca() #initialize fonts, shape and labels font,label_font_size=init_graph_display(prob_axes=prob_axes, \ size=size, axes=axes, **kwargs) equations = [] #figure out how many series there are, and scale vals accordingly num_series = int(len(data)/2) series_color = broadcast(series_color, num_series) line_color = broadcast(line_color, num_series) alpha = broadcast(alpha, num_series) marker_size = broadcast(marker_size, num_series) if scale_markers: marker_size = [frac_to_psq(m, size) for m in marker_size] series = [] for i in range(num_series): x, y = data[2*i], data[2*i+1] series.append(axes.scatter(x,y,s=marker_size[i],c=series_color[i],\ alpha=alpha[i])) #find the equation and plots the regression line if True if show_regression: equation = plot_regression_line(x,y,line_color[i], axes=axes, \ prob_axes=prob_axes) if show_equation: equations.append(equation) #will be (str, color) tuple #update graph size for new data axes.autoscale_view(tight=True) #print all the regression equations at once -- need to know how many if show_regression: add_regression_equations(equations, axes=axes, prob_axes=prob_axes) #clean up axes if necessary if show_legend and series_names: #suppress legend if series not named axes.legend(series, series_names, legend_loc) if prob_axes: set_axis_to_probs(axes) return equations, font #Contour plots and related functions def plot_filled_contour(plot_data, xy_data=None, show_regression=False, \ show_equation=False, fill_cmap=cm.hot, graph_shape='rect', \ num_contour_lines=10, prob_axes=False, **kwargs): """helper plots one or more series of contour data calls the initializing functions, doesn't output figure takes: plot_data, xy_data, show_regression, show_equation, fill_cmap, and **kwargs passed on to init_graph_display. 
plot_data: list of (x_bin, y_bin, data_matrix) tuples """ if show_regression: equation = plot_regression_line(xy_data[:,0],xy_data[:,1], \ prob_axes=prob_axes) if show_equation: add_regression_equations([equation]) #init graph display, rectangular due to needed colorbar space init_graph_display(graph_shape=graph_shape, **kwargs) #plots the contour data for x_bin,y_bin,data_matrix in plot_data: contourf(x_bin,y_bin,data_matrix, num_contour_lines, cmap=fill_cmap) #add the colorbar legend to the side colorbar() def plot_contour_lines(plot_data, xy_data=None, show_regression=False, \ show_equation=False, smooth_steps=0, num_contour_lines=10, \ label_contours=False, line_cmap=cm.hot, fill_cmap=cm.gray,dark=True, graph_shape='rect', prob_axes=False, **kwargs): """helper plots one or more series of contour line data calls the initializing functions, doesn't output figure takes: plot_data, xy_data, show_regression, show_equation, smooth_steps, num_contour_lines, label_contours, line_cmap, fill_cmap, graph_shape, and **kwargs passed on to init_graph_display.
plot_data: list of (x_bin, y_bin, data_matrix) tuples """ if prob_axes: extent = (0,1,0,1) else: a = gca() extent = a.get_xlim()+a.get_ylim() #init graph display, rectangular due to needed colorbar space init_graph_display(graph_shape=graph_shape, dark=dark, **kwargs) #plots the contour data for x_bin,y_bin,data in plot_data: orig_max = max(ravel(data)) scaled_data = (data/orig_max*255).astype('b') if smooth_steps and (Image is not None): orig_shape = data.shape im = Image.fromstring('L', data.shape, scaled_data) for i in range(smooth_steps): im = im.filter(ImageFilter.BLUR) new_data = fromstring(im.tostring(), 'b') data = reshape(new_data.astype('i')/255.0 * orig_max, orig_shape) if fill_cmap is not None: im = imshow(data, interpolation='bicubic', extent=extent, \ origin='lower', cmap=fill_cmap) result=contour(x_bin,y_bin,data, num_contour_lines, origin='lower',linewidths=2, extent=extent, cmap=line_cmap) if label_contours: clabel(result, fmt='%1.1g') #add the colorbar legend to the side cb = colorbar() cb.ax.axisbg = 'black' if show_regression: equation=plot_regression_line(xy_data[0],xy_data[1],prob_axes=prob_axes) if show_equation: add_regression_equations([equation]) def plot_histograms(data, graph_name='histogram.png', bins=20,\ normal_fit=True, normed=True, colors=None, linecolors=None, \ alpha=0.75, prob_axes=True, series_names=None, show_legend=False,\ y_label=None, **kwargs): """Outputs a histogram with multiple series (must provide a list of series). takes: data: list of arrays of values to plot (needs to be list of arrays so you can pass in arrays with different numbers of elements) graph_name: filename to write graph to bins: number of bins to use normal_fit: whether to show the normal curve best fitting the data normed: whether to normalize the histogram (e.g. so bars sum to 1) colors: list of colors to use for bars linecolors: list of colors to use for fit lines **kwargs are passed on to init_graph_display.
""" rc('patch', linewidth=.2) if y_label is None: if normed: y_label='Frequency' else: y_label='Count' num_series = len(data) if colors is None: if num_series == 1: colors = ['white'] else: colors = standard_series_colors if linecolors is None: if num_series == 1: linecolors = ['red'] else: linecolors = standard_series_colors init_graph_display(prob_axes=prob_axes, y_label=y_label, **kwargs) all_patches = [] for i, d in enumerate(data): fc = colors[i % len(colors)] lc = linecolors[i % len(linecolors)] counts, x_bins, patches = hist(d, bins=bins, normed=normed, \ alpha=alpha, facecolor=fc) all_patches.append(patches[0]) if normal_fit and len(d) > 1: maxv, minv = max(d), min(d) mu = mean(d) sigma = std(d) bin_width = x_bins[-1] - x_bins[-2] #want normpdf to extend over the range normpdf_bins = arange(minv,maxv,(maxv - minv)*.01) y = normpdf(normpdf_bins, mu, sigma) orig_area = sum(counts) * bin_width y = y * orig_area #normpdf area is 1 by default plot(normpdf_bins, y, linestyle='--', color=lc, linewidth=1) if show_legend and series_names: fp = FontProperties() fp.set_size('x-small') legend(all_patches, series_names, prop = fp) #output figure if graph name set -- otherwise, leave for further changes if graph_name is not None: savefig(graph_name) def plot_monte_histograms(data, graph_name='gene_histogram.png', bins=20,\ normal_fit=True, normed=True, colors=None, linecolors=None, \ alpha=0.75, prob_axes=True, series_names=None, show_legend=False,\ y_label=None, x_label=None, **kwargs): """Outputs a histogram with multiple series (must provide a list of series). Differs from regular histogram in that p-value works w/exactly two datasets, where the first dataset is the reference set. Calculates the mean of the reference set, and compares this to the second set (which is assumed to contain the means of many runs producing data comparable to the data in the reference set). 
takes: data: list of arrays of values to plot (needs to be list of arrays so you can pass in arrays with different numbers of elements) graph_name: filename to write graph to bins: number of bins to use normal_fit: whether to show the normal curve best fitting the data normed: whether to normalize the histogram (e.g. so bars sum to 1) colors: list of colors to use for bars linecolors: list of colors to use for fit lines **kwargs are passed on to init_graph_display. """ rc('patch', linewidth=.2) rc('font', size='x-small') rc('axes', linewidth=.2) rc('axes', labelsize=7) rc('xtick', labelsize=7) rc('ytick', labelsize=7) if y_label is None: if normed: y_label='Frequency' else: y_label='Count' num_series = len(data) if colors is None: if num_series == 1: colors = ['white'] else: colors = standard_series_colors if linecolors is None: if num_series == 1: linecolors = ['red'] else: linecolors = standard_series_colors init_graph_display(prob_axes=prob_axes, y_label=y_label, **kwargs) all_patches = [] for i, d in enumerate(data): fc = colors[i % len(colors)] lc = linecolors[i % len(linecolors)] counts, x_bins, patches = hist(d, bins=bins, normed=normed, \ alpha=alpha, facecolor=fc) all_patches.append(patches[0]) if normal_fit and len(d) > 1: mu = mean(d) sigma = std(d) minv = min(d) maxv = max(d) bin_width = x_bins[-1] - x_bins[-2] #set range for normpdf normpdf_bins = arange(minv,maxv,0.01*(maxv-minv)) y = normpdf(normpdf_bins, mu, sigma) orig_area = sum(counts) * bin_width y = y * orig_area #normpdf area is 1 by default plot(normpdf_bins, y, linestyle='--', color=lc, linewidth=1) font = { 'color': lc, 'fontsize': 11} text(mu, 0.0 , "*", font, verticalalignment='center', horizontalalignment='center') xlabel(x_label) if show_legend and series_names: fp = FontProperties() fp.set_size('x-small') legend(all_patches, series_names, prop = fp) #output figure if graph name set -- otherwise, leave for further changes if graph_name is not None: savefig(graph_name) def 
plot_scatter_with_histograms(data, graph_name='histo_scatter.png', \ graph_grid='/', prob_axes=False, bins=20, frac=0.9, scatter_alpha=0.5, \ hist_alpha=0.8, colors=standard_series_colors, normed='height', **kwargs): """Plots a scatter plot with histograms showing distribution of x and y. Data should be list of [x1, y1, x2, y2, ...]. """ #set up subplot coords tl=subplot(2,2,1) br=subplot(2,2,4) bl=subplot(2,2,3, sharex=tl, sharey=br) #get_position returns a Bbox relative to figure tl_coords = tl.get_position() bl_coords = bl.get_position() br_coords = br.get_position() left = tl_coords.xmin bottom = bl_coords.ymin width = br_coords.xmax - left height = tl_coords.ymax - bottom bl.set_position([left, bottom, frac*width, frac*height]) tl.set_position([left, bottom+(frac*height), frac*width, (1-frac)*height]) br.set_position([left+(frac*width), bottom, (1-frac)*width, frac*height]) #suppress frame and axis for histograms for i in [tl,br]: i.set_frame_on(False) i.xaxis.set_visible(False) i.yaxis.set_visible(False) plot_scatter(data=data, alpha=scatter_alpha, axes=bl, **kwargs) for i in range(0, len(data), 2): x, y = data[i], data[i+1] color = colors[int((i/2))%len(colors)] hist(x, facecolor=color, bins=bins, alpha=hist_alpha, normed=normed, axes=tl) hist(y, facecolor=color, bins=bins, alpha=hist_alpha, normed=normed, \ axes=br, orientation='horizontal') if prob_axes: bl.set_xlim(0,1) bl.set_ylim(0,1) br.set_ylim(0,1) tl.set_xlim(0,1) #output figure if graph name set -- otherwise, leave for further changes if graph_name is not None: savefig(graph_name) def format_contour_array(data, points_per_cell=20, bulk=0.8): """Formats [x,y] series of data into x_bins, y_bins and data for contour(). data: 2 x n array of float representing x,y coordinates points_per_cell: average points per unit cell in the bulk of the data, default 20 bulk: fraction containing the 'bulk' of the data in x and y, default 0.8 (i.e. 80% of the data will be used in the calculation).
returns: x-bin, y-bin, and a square matrix of frequencies to be plotted WARNING: Assumes x and y are in the range 0-1. """ #bind x and y data data_x = sort(data[0]) #note: numpy sort returns a sorted copy data_y = sort(data[1]) num_points = len(data_x) #calculate the x and y bounds holding the bulk of the data low_prob = (1-bulk)/2.0 low_tail = int(num_points*low_prob) high_tail = int(num_points*(1-low_prob)) x_low = data_x[low_tail] x_high = data_x[high_tail] y_low = data_y[low_tail] y_high = data_y[high_tail] #calculate the side length in the bulk that holds the right number of #points delta_x = x_high - x_low delta_y = y_high - y_low points_in_bulk = num_points * bulk #approximate: assumes no correlation area_of_bulk = delta_x * delta_y points_per_area = points_in_bulk/area_of_bulk side_length = sqrt(points_per_cell / points_per_area) #correct the side length so we get an integer number of bins. num_bins = int(1/side_length) corrected_side_length = 1.0/num_bins #figure out how many items are in each grid square in x and y # #this is the tricky part, because contour() takes as its data matrix #the points at the vertices of each cell, rather than the points at #the centers of each cell. this means that if we were going to make #a 3 x 3 grid, we actually have to estimate a 4 x 4 matrix that's offset #by half a unit cell in both x and y. # #if the data are between 0 and 1, the first and last bin in our range are #superfluous because searchsorted will put items before the first #bin into bin 0, and items after the last bin into bin n+1, where #n is the maximum index in the original array. for example, if we #have 3 bins, the values .33 and .66 would suffice to find the centers, #because anything below .33 gets index 0 and anything above .66 gets index #2 (anything between them gets index 1). incidentally, this prevents #issues with floating-point error and values slightly below 0 or above 1 #that might otherwise arise. 
# #however, for our 3 x 3 case, we actually want to start estimating at the #cell centered at 0, i.e. starting at -.33/2, so that we get the four #estimates centered at (rather than starting at) 0, .33, .66, and 1. #because the data are constrained to be between 0 and 1, we will need to #double the counts at the edges (and quadruple them at the corners) to get #a fair estimate of the density. csl = corrected_side_length #save typing below eps = csl/10 #don't ever want max value to be in the list precisely half_csl = .5*csl bins = arange(half_csl, 1+half_csl-eps, csl) x_coords = searchsorted(bins, data[0]) y_coords = searchsorted(bins, data[1]) #matrix has dimension 1 more than num bins, b/c can be above largest matrix = zeros((num_bins+1, num_bins+1)) #for some reason, need to swap x and y to match up with normal #scatter plots for coord in zip(y_coords, x_coords): matrix[coord] += 1 #we now have estimates of the densities at the edge of each of the #n x n cells in the grid. for example, if we have a 3 x 3 grid, we have #16 densities, one at the center of each grid cell (0, .33, .66, 1 in each #dimension). need to double the counts at edges to reflect places where #we can't observe data because of range restrictions. matrix[0]*=2 matrix[:,0]*=2 matrix[-1]*=2 matrix[:,-1]*=2 #return adjusted_bins as centers, rather than boundaries, of the range x_bins = csl*arange(num_bins+1) return x_bins, x_bins, matrix if __name__ == '__main__': from numpy.random import normal x = normal(0.3, 0.05, 1000) y = normal(0.5, 0.1, 1000) plot_scatter_with_histograms([x,x+y, y, (x+y)/2], prob_axes=True) PyCogent-1.5.3/cogent/db/__init__.py #!/usr/bin/env python """db: provides support libraries for database retrieval.
""" __all__ = ['ncbi', 'util', 'rfam', 'pdb', 'ensembl'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Mike Robeson", "Zongzhi Liu", "Gavin Huttley", "Hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" """Need to add: rfam RNA families database go Gene Ontology kegg KEGG metabolic pathways and related file formats pfam Protein families database pdb Protein Data Bank ndb Nucleotide Data Bank ('Atlas' files) """ PyCogent-1.5.3/cogent/db/ensembl/000755 000765 000024 00000000000 12024703626 017533 5ustar00jrideoutstaff000000 000000 PyCogent-1.5.3/cogent/db/ncbi.py000644 000765 000024 00000066756 12024702176 017416 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """To automate batch functions provided by EUtils (http://www.ncbi..nih.gov/entrez/eutils) search and fetch for sets of sequence information """ from urllib import urlopen, urlretrieve from xml.dom.minidom import parseString from xml.etree.ElementTree import parse from cogent.db.util import UrlGetter, expand_slice,\ make_lists_of_expanded_slices_of_set_size,make_lists_of_accessions_of_set_size from time import sleep from StringIO import StringIO from cogent.parse.record_finder import DelimitedRecordFinder, never_ignore from string import strip __author__ = "Mike Robeson" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Mike Robeson", "Rob Knight", "Zongzhi Liu"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Mike Robeson" __email__ = "mike.robeson@colorado.edu" __status__ = "Production" class QueryNotFoundError(Exception): pass #eutils_base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils' eutils_base='http://www.ncbi.nlm.nih.gov/entrez/eutils' #EUtils requires a tool and and email address default_tool_string = 'PyCogent' default_email_address = 'Michael.Robeson@colorado.edu' #databases last updated 7/22/05 
valid_databases=dict.fromkeys(["pubmed", "protein", "nucleotide", "structure",\ "genome", "books", "cancerchromosomes", "cdd", "domains", "gene", \ "genomeprj", "gensat", "geo", "gds", "homologene", "journals", "mesh",\ "ncbisearch", "nlmcatalog", "omim", "pmc", "popset", "probe", "pcassay",\ "pccompound", "pcsubstance", "snp", "taxonomy", "unigene", "unists"]) #rettypes last updated 7/22/05 #somehow, I don't think we'll be writing parsers for all these... #WARNING BY RK 4/13/09: THESE RETTYPES ARE HIGHLY MISLEADING AND NO LONGER #WORK. See this URL for the list of "official" rettypes, which is highly #incomplete and has some important omissions (e.g. rettype 'gi' is missing #but is the "official" replacement for 'GiList'): # http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html #In particular, use gb or gp for GenBank or GenPept, use gi for GiList, #use fasta for FASTA, and several other changes. #Until we get a complete accounting of what all the changes are, treat the #rettypes below with extreme caution and experiment in the interpreter. 
rettypes = {} rettypes['pubmed']='DocSum Brief Abstract Citation MEDLINE XML uilist ExternalLink ASN1 pubmed_pubmed pubmed_pubmed_refs pubmed_books_refs pubmed_cancerchromosomes pubmed_cdd pubmed_domains pubmed_gds pubmed_gene pubmed_gene_rif pubmed_genome pubmed_genomeprj pubmed_gensat pubmed_geo pubmed_homologene pubmed_nucleotide pubmed_omim pubmed_pcassay pubmed_pccompound pubmed_pccompound_mesh pubmed_pcsubstance pubmed_pcsubstance_mesh pubmed_pmc pubmed_pmc_refs pubmed_popset pubmed_probe pubmed_protein pubmed_snp pubmed_structure pubmed_unigene pubmed_unists' rettypes['protein']='DocSum ASN1 FASTA XML GenPept GiList graph fasta_xml igp_xml gpc_xml ExternalLink protein_protein protein_cdd protein_domains protein_gene protein_genome protein_genomeprj protein_homologene protein_nucleotide protein_nucleotide_mgc protein_omim protein_pcassay protein_pccompound protein_pcsubstance protein_pmc protein_popset protein_pubmed protein_snp protein_snp_genegenotype protein_structure protein_taxonomy protein_unigene' rettypes['nucleotide']='DocSum ASN1 FASTA XML GenBank GiList graph fasta_xml gb_xml gbc_xml ExternalLink nucleotide_comp_nucleotide nucleotide_nucleotide nucleotide_nucleotide_comp nucleotide_nucleotide_mrna nucleotide_comp_genome nucleotide_gene nucleotide_genome nucleotide_genome_samespecies nucleotide_gensat nucleotide_geo nucleotide_homologene nucleotide_mrna_genome nucleotide_omim nucleotide_pcassay nucleotide_pccompound nucleotide_pcsubstance nucleotide_pmc nucleotide_popset nucleotide_probe nucleotide_protein nucleotide_pubmed nucleotide_snp nucleotide_snp_genegenotype nucleotide_structure nucleotide_taxonomy nucleotide_unigene nucleotide_unists' rettypes['structure']='DocSum Brief Structure Summary uilist ExternalLink structure_domains structure_genome structure_nucleotide structure_omim structure_pcassay structure_pccompound structure_pcsubstance structure_pmc structure_protein structure_pubmed structure_snp structure_taxonomy' 
rettypes['genome']='DocSum ASN1 GenBank XML ExternalLink genome_genomeprj genome_nucleotide genome_nucleotide_comp genome_nucleotide_mrna genome_nucleotide_samespecies genome_omim genome_pmc genome_protein genome_pubmed genome_structure genome_taxonomy' rettypes['books']='DocSum Brief Books books_gene books_omim books_pmc_refs books_pubmed_refs' rettypes['cancerchromosomes']='DocSum SkyCghDetails SkyCghCommon SkyCghCommonVerbose cancerchromosomes_cancerchromosomes_casecell cancerchromosomes_cancerchromosomes_cellcase cancerchromosomes_cancerchromosomes_cytocgh cancerchromosomes_cancerchromosomes_cytoclincgh cancerchromosomes_cancerchromosomes_cytoclinsky cancerchromosomes_cancerchromosomes_cytodiagcgh cancerchromosomes_cancerchromosomes_cytodiagsky cancerchromosomes_cancerchromosomes_cytosky cancerchromosomes_cancerchromosomes_diag cancerchromosomes_cancerchromosomes_textual cancerchromosomes_pmc cancerchromosomes_pubmed' rettypes['cdd']='DocSum Brief uilist cdd_cdd_fused cdd_cdd_related cdd_gene cdd_homologene cdd_pmc cdd_protein cdd_pubmed cdd_taxonomy' rettypes['domains']='DocSum Brief uilist domains_domains_new domains_pmc domains_protein domains_pubmed domains_structure domains_taxonomy' rettypes['gene']='Default DocSum Brief ASN.1 XML Graphics gene_table uilist ExternalLink gene_books gene_cdd gene_gensat gene_geo gene_homologene gene_nucleotide gene_nucleotide_mgc gene_omim gene_pmc gene_probe gene_protein gene_pubmed gene_pubmed_rif gene_snp gene_snp_genegenotype gene_taxonomy gene_unigene gene_unists' rettypes['genomeprj']='DocSum Brief Overview genomeprj_genomeprj genomeprj_genome genomeprj_nucleotide genomeprj_nucleotide_mrna genomeprj_nucleotide_organella genomeprj_nucleotide_wgs genomeprj_pmc genomeprj_popset genomeprj_protein genomeprj_pubmed genomeprj_taxonomy' rettypes['gensat']='Group Detail DocSum Brief gensat_gensat gensat_gene gensat_geo gensat_nucleotide gensat_pmc gensat_pubmed gensat_taxonomy gensat_unigene' rettypes['geo']='DocSum Brief 
ExternalLink geo_geo_homologs geo_geo_prof geo_geo_seq geo_gds geo_gene geo_gensat geo_homologene geo_nucleotide geo_omim geo_pmc geo_pubmed geo_taxonomy geo_unigene' rettypes['gds']='DocSum Brief gds_gds gds_geo gds_pmc gds_pubmed gds_taxonomy' rettypes['homologene']='DocSum Brief HomoloGene AlignmentScores MultipleAlignment ASN1 XML FASTA homologene_homologene homologene_cdd homologene_gene homologene_geo homologene_nucleotide homologene_omim homologene_pmc homologene_protein homologene_pubmed homologene_snp homologene_snp_genegenotype homologene_taxonomy homologene_unigene' rettypes['journals']='DocSum full journals_PubMed journals_Protein journals_Nucleotide journals_Genome journals_Popset journals_PMC journals_nlmcatalog' rettypes['mesh']='Full DocSum Brief mesh_PubMed' rettypes['ncbisearch']='DocSum Brief Home+Page+View ncbisearch_ncbisearch' rettypes['nlmcatalog']='Brief DocSum XML Expanded Full Subject ExternalLink' rettypes['omim']='DocSum Detailed Synopsis Variants ASN1 XML ExternalLink omim_omim omim_books omim_gene omim_genome omim_geo omim_homologene omim_nucleotide omim_pmc omim_protein omim_pubmed omim_snp omim_snp_genegenotype omim_structure omim_unigene omim_unists' rettypes['pmc']='DocSum Brief XML TxTree pmc_books_refs pmc_cancerchromosomes pmc_cdd pmc_domains pmc_gds pmc_gene pmc_genome pmc_genomeprj pmc_gensat pmc_geo pmc_homologene pmc_nucleotide pmc_omim pmc_pccompound pmc_pcsubstance pmc_popset pmc_protein pmc_pubmed pmc_refs_pubmed pmc_snp pmc_structure pmc_taxonomy pmc_unists' rettypes['popset']='DocSum PS ASN1 XML GiList ExternalLink TxTree popset_genomeprj popset_nucleotide popset_protein popset_pubmed popset_taxonomy' rettypes['probe']='DocSum Brief ASN1 XML Probe probe_probe probe_gene probe_nucleotide probe_pubmed probe_taxonomy' rettypes['pcassay']='DocSum Brief uilist pcassay_nucleotide pcassay_pccompound pcassay_pccompound_active pcassay_pccompound_inactive pcassay_pcsubstance pcassay_pcsubstance_active pcassay_pcsubstance_inactive 
pcassay_protein pcassay_pubmed pcassay_structure' rettypes['pccompound']='Brief DocSum PROP SYNONYMS pc_fetch pccompound_pccompound_pulldown pccompound_pccompound_sameanytautomer_pulldown pccompound_pccompound_sameconnectivity_pulldown pccompound_pccompound_sameisotopic_pulldown pccompound_pccompound_samestereochem_pulldown pccompound_nucleotide pccompound_pcassay pccompound_pcassay_active pccompound_pcassay_inactive pccompound_pcsubstance pccompound_pmc pccompound_protein pccompound_pubmed pccompound_pubmed_mesh pccompound_structure' rettypes['pcsubstance']='Brief DocSum PROP SYNONYMS pc_fetch IDLIST pcsubstance_pcsubstance_pulldown pcsubstance_pcsubstance_same_pulldown pcsubstance_pcsubstance_sameanytautomer_pulldown pcsubstance_pcsubstance_sameconnectivity_pulldow pcsubstance_pcsubstance_sameisotopic_pulldown pcsubstance_pcsubstance_samestereochem_pulldown pcsubstance_mesh pcsubstance_nucleotide pcsubstance_pcassay pcsubstance_pcassay_active pcsubstance_pcassay_inactive pcsubstance_pccompound pcsubstance_pmc pcsubstance_protein pcsubstance_pubmed pcsubstance_pubmed_mesh pcsubstance_structure' rettypes['snp']='DocSum Brief FLT ASN1 XML FASTA RSR ssexemplar CHR FREQXML GENB GEN GENXML DocSet Batch uilist GbExp ExternalLink MergeStatus snp_snp_genegenotype snp_gene snp_homologene snp_nucleotide snp_omim snp_pmc snp_protein snp_pubmed snp_structure snp_taxonomy snp_unigene snp_unists' rettypes['taxonomy']='DocSum Brief TxUidList TxInfo XML TxTree ExternalLink taxonomy_protein taxonomy_nucleotide taxonomy_structure taxonomy_genome taxonomy_gene taxonomy_cdd taxonomy_domains taxonomy_gds taxonomy_genomeprj taxonomy_gensat taxonomy_homologene taxonomy_pmc taxonomy_popset taxonomy_probe taxonomy_pubmed taxonomy_snp taxonomy_unigene taxonomy_unists' rettypes['unigene']='DocSum Brief ExternalLink unigene_unigene unigene_unigene_expression unigene_unigene_homologous unigene_gene unigene_gensat unigene_geo unigene_homologene unigene_nucleotide unigene_nucleotide_mgc 
unigene_omim unigene_protein unigene_pubmed unigene_snp unigene_snp_genegenotype unigene_taxonomy unigene_unists' rettypes['unists']='DocSum Brief ExternalLink unists_gene unists_nucleotide unists_omim unists_pmc unists_pubmed unists_snp unists_taxonomy unists_unigene' #convert into dict of known rettypes for efficient lookups -- don't want to #scan list every time. for key, val in rettypes.items(): rettypes[key] = dict.fromkeys(val.split()) class ESearch(UrlGetter): """Performs an ESearch, getting a list of ids from an arbitrary query.""" PrintedFields = dict.fromkeys(['db', 'usehistory', 'term', 'retmax', 'retstart', 'tool', 'email']) Defaults = {'db':'nucleotide','usehistory':'y', 'retmax':1000, 'tool':default_tool_string, 'email':default_email_address} BaseUrl = eutils_base+'/esearch.fcgi?' class EFetch(UrlGetter): """Retrieves a list of primary ids. WARNING: retmax (the maximum number of records returned) defaults to 100 here, so larger result sets will be truncated unless you increase it for real searches. """ PrintedFields = dict.fromkeys(['db', 'rettype', 'retmode', 'query_key',\ 'WebEnv', 'retmax', 'retstart', 'id', 'tool', 'email']) Defaults = {'retmode':'text','rettype':'fasta','db':'nucleotide',\ 'retstart':0, 'retmax':100, 'tool':default_tool_string, \ 'email':default_email_address} BaseUrl = eutils_base+'/efetch.fcgi?' class ELink(UrlGetter): """Retrieves a list of ids from one db that link to another db.""" PrintedFields = dict.fromkeys(['db', 'id', 'reldate', 'mindate', 'maxdate', 'datetype', 'term', 'retmode', 'dbfrom', 'WebEnv', 'query_key', 'holding', 'cmd', 'tool', 'email']) Defaults = {'tool':default_tool_string, 'email':default_email_address} BaseUrl = eutils_base + '/elink.fcgi?'
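The ESearch, EFetch, and ELink classes above all rely on the same UrlGetter pattern: class-level Defaults seed instance attributes, and only keys listed in PrintedFields are serialized into the query string. A minimal, self-contained sketch of that pattern follows (Python 3 syntax; the class names here are illustrative stand-ins, not the real cogent API, and no network request is made):

```python
from urllib.parse import quote_plus

class UrlGetterSketch:
    """Sketch of the UrlGetter pattern: subclasses declare a BaseUrl,
    default parameters, and which fields appear in the query string."""
    Defaults = {}
    PrintedFields = {}
    BaseUrl = ''

    def __init__(self, **kwargs):
        # defaults first, then caller overrides -- mirrors the real class
        self.__dict__.update(self.Defaults)
        self.__dict__.update(kwargs)

    def __str__(self):
        # serialize only whitelisted fields; sorted for a stable URL
        return self.BaseUrl + '&'.join(
            quote_plus(k) + '=' + quote_plus(str(v))
            for k, v in sorted(self.__dict__.items())
            if k in self.PrintedFields)

class ESearchSketch(UrlGetterSketch):
    PrintedFields = dict.fromkeys(['db', 'term', 'retmax'])
    Defaults = {'db': 'nucleotide', 'retmax': 1000}
    BaseUrl = 'http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?'

url = str(ESearchSketch(term='angiotensin[ti] AND rodents[orgn]'))
```

Note how brackets and spaces in the Entrez term are percent-encoded by quote_plus, which is what keeps compound queries like `'9606[taxid] OR 28901[taxid]'` URL-safe.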
class ESearchResult(object): def __init__(self, **kwargs): self.__dict__.update(kwargs) def __str__(self): return str(self.__dict__) def id_list_constructor(id_list_node): """Takes an id_list xml node and converts it into list of ids as strings""" return [str_constructor(n) for n in id_list_node.childNodes \ if n.nodeType != n.TEXT_NODE] def int_constructor(node): """Makes an int out of node's first textnode child.""" return int(node.firstChild.data) def str_constructor(node): """Makes an str out of node's first textnode child.""" return str(node.firstChild.data) #the following are the only keys we explicitly handle now: #(note difference in capitalization from parameters passed in) esearch_constructors = {'Count':int_constructor, 'RetMax':int_constructor,\ 'RetStart':int_constructor, 'QueryKey':int_constructor, \ 'WebEnv':str_constructor, 'IdList':id_list_constructor} def ESearchResultParser(result_as_string): """Parses an ESearch result. Returns ESearchResult object.""" if '414 Request-URI Too Large' in result_as_string: raise ValueError, "Tried to pass too large an URI:\n" + result_as_string doc = parseString(result_as_string) #assume one query result -- may need to fix query = doc.childNodes[-1] result = {} for n in query.childNodes: #skip top-level text nodes if n.nodeType==n.TEXT_NODE: continue name = str(n.tagName) #who cares about unicode anyway... if name in esearch_constructors: result[name] = esearch_constructors[name](n) else: #just keep the data if we don't know what it is result[name] = n.toxml() return ESearchResult(**result) def ELinkResultParser(text): """Gets the linked ids out of a single ELink result. Does not use the XML parser because of problems with long results. Only handles cases where there is a single set of links between databases. 
""" result = [] in_links = False for line in text.splitlines(): if '' in line: in_links = True elif in_links and ('' in line): try: #expect line of form xxxx: want xxxx result.append(line.split('>', 1)[1].rsplit('<', 1)[0]) except (IndexError, TypeError): pass elif '' in line: #end of block break return result class EUtils(object): """Retrieves records from NCBI using EUtils.""" def __init__(self, filename=None, wait=0.5, retmax=100, url_limit=400, DEBUG=False, max_recs=None, **kwargs): self.__dict__.update(kwargs) self.filename = filename self.wait = wait self.retstart = 0 # was originally set to 1 self.DEBUG = DEBUG self.retmax = retmax self.url_limit = url_limit # limits url esearch term size self.max_recs = max_recs #adjust retmax if max_recs is set: no point getting more records if max_recs is not None and max_recs < retmax: self.retmax = max_recs def __getitem__(self, query): """Gets an query from NCBI. Assumes lists are lists of accessions. Returns a handle to the result (either in memory or file on disk). WARNING: result is not guaranteed to contain any data. """ #check if it's a slice if isinstance(query, slice): #query = expand_slice(query) queries = make_lists_of_expanded_slices_of_set_size(query) return self.grab_data(queries) #check if it's a list -- if so, delimit with ' ' if isinstance(query, list) or isinstance(query,tuple): #query = ' '.join(map(str, query)) queries = make_lists_of_accessions_of_set_size(query) return self.grab_data(queries) # most likey a general set of search terms #e.g. '9606[taxid] OR 28901[taxid]' . So just return. return self.grab_data([query]) def grab_data(self,queries): """Iterates through list of search terms and combines results. -queries : list of lists of accession lists / query items This will mostly only apply whe the user wants to download 1000s of sequences via accessions. This will superced the GenBank url length limit. So, we break up the accession list into sets of 400 terms per list. 
WARNING: if you _really_ have more than 300-400 terms similar to: 'angiotensin[ti] AND rodents[orgn]' The results will not be what you want anyway due to the limitations of the esearch url length at GenBank. You'll just end up returning sets of results from the broken-up, word-based search terms. """ #figure out where to put the data if self.filename: result = open(self.filename, 'w') else: result = StringIO() for query in queries: self.term=query search_query = ESearch(**self.__dict__) search_query.retmax = 0 #don't want the ids, just want to post search if self.DEBUG: print 'SEARCH QUERY:' print str(search_query) cookie = search_query.read() if self.DEBUG: print 'COOKIE:' print `cookie` search_result = ESearchResultParser(cookie) if self.DEBUG: print 'SEARCH RESULT:' print search_result try: self.query_key = search_result.QueryKey self.WebEnv = search_result.WebEnv except AttributeError: #The query_key and/or WebEnv not found! #GenBank occasionally does not return these when the user attempts #to fetch data only by Accession or UID. So we just #move on to extract the UID list directly from the search result try: self.id = ','.join(search_result.IdList) except AttributeError: raise QueryNotFoundError,\ "WebEnv or query_key not Found!
Query %s returned no results.\nURL was:\n%s" % \ (repr(query),str(search_query)) count = search_result.Count #wrap the fetch in a loop so we get all the results fetch_query = EFetch(**self.__dict__) curr_rec = 0 #check if we need to get additional ids if self.max_recs: #cut off at max_recs if set count = min(count, self.max_recs) retmax = min(self.retmax, self.max_recs) else: retmax = self.retmax while curr_rec < count: #do the fetch if count - curr_rec < self.retmax: fetch_query.retmax = count - curr_rec fetch_query.retstart = curr_rec if self.DEBUG: print 'FETCH QUERY' print 'CURR REC:', curr_rec, 'COUNT:', count print str(fetch_query) #return the result of the fetch curr = fetch_query.read() result.write(curr) if not curr.endswith('\n'): result.write('\n') curr_rec += retmax sleep(self.wait) #clean up after retrieval if self.filename: result.close() return open(self.filename, 'r') else: result.seek(0) return result #The following are convenience wrappers for some of the above functionality def get_primary_ids(term, retmax=100, max_recs=None, **kwargs): """Gets primary ids from query.""" search_result = None records_got = 0 if max_recs: retmax = min(retmax, max_recs) search_query = ESearch(term=term, retmax=retmax, **kwargs) while 1: cookie = search_query.read() if search_result is None: search_result = ESearchResultParser(cookie) else: search_result.IdList.extend(ESearchResultParser(cookie).IdList) #set the query key and WebEnv search_query.query_key = search_result.QueryKey search_query.WebEnv = search_result.WebEnv #if more results than retmax, keep adding results if max_recs: recs_to_get = min(max_recs, search_result.Count) else: recs_to_get = search_result.Count records_got += retmax if records_got >= recs_to_get: break elif recs_to_get - records_got < retmax: search_query.retmax = recs_to_get - records_got search_query.retstart = records_got return search_result.IdList def ids_to_taxon_ids(ids, db='nucleotide'): """Converts primary ids to taxon ids""" link 
= ELink(id=' '.join(ids), db='taxonomy', dbfrom=db, DEBUG=True) return ELinkResultParser(link.read()) def get_between_tags(line): """Returns portion of line between xml tags.""" return line.split('>', 1)[1].rsplit('<', 1)[0] def taxon_lineage_extractor(lines): """Extracts lineage from taxonomy record lines, not incl. species.""" for line in lines: if '<Lineage>' in line: #expect line of form <Lineage>xxxx</Lineage> where xxxx semicolon- #delimited between_tags = line.split('>', 1)[1].rsplit('<', 1)[0] yield map(strip, between_tags.split(';')) taxon_record_finder = DelimitedRecordFinder('</Taxon>', constructor=None, strict=False) def get_taxid_name_lineage(rec): """Returns taxon id, name, and lineage from single xml taxon record.""" tax_tag = '    <TaxId>' name_tag = '    <ScientificName>' lineage_tag = '    <Lineage>' taxid = name = lineage = None for line in rec: if line.startswith(tax_tag): taxid = get_between_tags(line) elif line.startswith(name_tag): name = get_between_tags(line) elif line.startswith(lineage_tag): lineage = map(strip, get_between_tags(line).split(';')) return taxid, name, lineage def get_taxa_names_lineages(lines): """Extracts taxon, name and lineage from each entry in an XML record.""" empty_result = (None, None, None) for rec in taxon_record_finder(lines): curr = get_taxid_name_lineage(rec) if curr != empty_result: yield curr #def taxon_ids_to_names_and_lineages(ids, retmax=1000): # """Yields taxon id, name and lineage for a set of taxon ids.""" # e = EUtils(db='taxonomy', rettype='TxInfo', retmode='xml', retmax=retmax, # DEBUG=False) # ids = fix_taxon_ids(ids) # result = e[ids].read().splitlines() # #print result # return get_taxa_names_lineages(result) def parse_taxonomy_using_elementtree_xml_parse(search_result): """Returns upper level XML taxonomy information from GenBank. search_result: StringIO object Returns list of all results in the form of: [{result_01},{result_02},{result_03}] For each dict the key and values would be: key,value = xml label, e.g.
[{'Lineage':'Bacteria; Proteobacteria...', 'TaxId':'28901', 'ScientificName':'Salmonella enterica'}, {...}...] """ xml_data = parse(search_result) xml_data_root = xml_data.getroot() tax_info_list = [''] l = [] for individual_result in xml_data_root: children = list(individual_result) d = {} for child in children: key = child.tag value = child.text.strip() # We only want to retain the upper-level taxonomy information # from the xml parser and ignore all the rest of the information. # May revisit this in the future so that we can extract # 'GeneticCode', 'GCId', 'GCName', etc... <-- These values at this # level have whitespace, so we just ignore. Must traverse deeper to # obtain this information. Again, may implement in the future if #needed if value == '': continue else: d[key] = value l.append(d) return l def taxon_ids_to_names_and_lineages(ids, retmax=1000): """Yields taxon id, name and lineage for a set of taxon ids.""" e = EUtils(db='taxonomy', rettype='xml', retmode='xml', retmax=retmax, DEBUG=False) fids = fix_taxon_ids(ids) #print '\nids: ',fids result = StringIO() result.write(e[fids].read()) result.seek(0) data = parse_taxonomy_using_elementtree_xml_parse(result) return [(i['TaxId'],i['ScientificName'],i['Lineage'])for i in data] def taxon_ids_to_lineages(ids, retmax=1000): """Returns full taxonomy (excluding species) from set of taxon ids. WARNING: Resulting lineages aren't in the same order as input. Use taxon_ids_to_name_and_lineage if you need the names and/or lineages associated with the specific ids. """ ids = fix_taxon_ids(ids) e = EUtils(db='taxonomy', rettype='xml', retmode='xml', retmax=retmax, DEBUG=False) result = e[ids].read().splitlines() #print result return taxon_lineage_extractor(result) #def taxon_ids_to_names(ids, retmax=1000): # """Returns names (e.g. species) from set of taxon ids. # # WARNING: Resulting lineages aren't in the same order as input. 
Use # taxon_ids_to_name_and_lineage if you need the names and/or lineages # associated with the specific ids. # """ # e = EUtils(db='taxonomy', rettype='brief', retmode='text', retmax=retmax, # DEBUG=False) # transformed_ids = fix_taxon_ids(ids) # return e[transformed_ids].read().splitlines() def taxon_ids_to_names(ids, retmax=1000): """Returns names (e.g. species) from set of taxon ids. WARNING: Resulting lineages aren't in the same order as input. Use taxon_ids_to_name_and_lineage if you need the names and/or lineages associated with the specific ids. """ e = EUtils(db='taxonomy', rettype='xml', retmode='xml', retmax=retmax, DEBUG=False) transformed_ids = fix_taxon_ids(ids) h = StringIO() h.write(e[transformed_ids].read()) h.seek(0) result = parse_taxonomy_using_elementtree_xml_parse(h) return [i['ScientificName'] for i in result] def fix_taxon_ids(ids): """Fixes list of taxonomy ids by adding [taxid] to each. Need to add taxid field restriction to each id because NCBI broke taxon id search around 3/07 and has no plans to fix it. """ if isinstance(ids, str): if not ids.endswith('[taxid]'): ids += '[taxid]' transformed_ids = ids else: transformed_ids = [] for i in ids: if not i.endswith('[taxid]'): i = i.strip() + '[taxid]' transformed_ids.append(i) transformed_ids = ' OR '.join(transformed_ids) return transformed_ids def get_unique_lineages(query, db='protein'): """Gets the unique lineages directly from a query.""" return set(map(tuple, taxon_ids_to_lineages(ids_to_taxon_ids( get_primary_ids(query,db=db),db=db)))) def get_unique_taxa(query, db='protein'): """Gets the unique taxa directly from a query.""" return set(taxon_ids_to_names(ids_to_taxon_ids(get_primary_ids(query,db=db),db=db))) if __name__ == '__main__': from sys import argv, exit if len(argv) < 5: print "Syntax: python ncbi.py db rettype retmax query."
exit() db = argv[1] rettype = argv[2] retmax = int(argv[3]) query = ' '.join(argv[4:]) print 'Query: ', query e = EUtils(db=db,rettype=rettype,retmax=retmax, DEBUG=True) print e[query].read() PyCogent-1.5.3/cogent/db/pdb.py #!/usr/bin/env python """Retrieves records by id from PDB, the Protein Data Bank.""" from cogent.db.util import UrlGetter __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" pdb_base='http://www.rcsb.org/pdb/files/' class Pdb(UrlGetter): """Returns a pdb file.""" BaseUrl = pdb_base Suffix='.pdb' Key=None def __str__(self): return self.BaseUrl + str(self.Key) + self.Suffix def __getitem__(self, item): """Returns handle to file containing specified PDB id.""" orig_key = self.Key self.Key = item.lower() result = self.open() self.Key = orig_key return result PyCogent-1.5.3/cogent/db/rfam.py #!/usr/bin/env python """Retrieves records by id from RFAM, the RNA families database.""" from cogent.db.util import UrlGetter __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight","Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" rfam_base='http://rfam.sanger.ac.uk/family/alignment/download/format?'
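The Pdb getter above builds each URL by temporarily swapping the requested id (lowercased) into Key, so one instance can serve many lookups. A tiny self-contained sketch of that URL construction (the class name here is a hypothetical stand-in; the real class additionally opens the URL via UrlGetter.open()):

```python
class PdbUrlSketch:
    """Sketch of Pdb.__getitem__'s URL construction:
    BaseUrl + lowercased PDB id + Suffix."""
    BaseUrl = 'http://www.rcsb.org/pdb/files/'
    Suffix = '.pdb'

    def url_for(self, item):
        # PDB ids are matched case-insensitively; the getter lowercases them
        return self.BaseUrl + str(item).lower() + self.Suffix

url = PdbUrlSketch().url_for('1HGU')
# -> 'http://www.rcsb.org/pdb/files/1hgu.pdb'
```

The Rfam getter that follows uses the same swap-open-restore idiom, but on its acc field and with an 'RF' prefix normalization instead of lowercasing.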
rfam_formats = dict.fromkeys('stockholm pfam fasta fastau'.split()) rfam_types = dict.fromkeys(['seed','full']) class Rfam(UrlGetter): """Returns an Rfam alignment file.""" Defaults={'alnType':'seed','format':'stockholm','acc':None,'nseLabels':'1', 'download':'0'} PrintedFields=dict.fromkeys('acc alnType nseLabels format download'.split()) BaseUrl = rfam_base def __getitem__(self, item): """Returns handle to file containing aln of specified rfam id.""" orig_acc = self.acc item = str(item).upper() if not item.startswith('RF'): item = 'RF'+item self.acc = item result = self.open() self.acc = orig_acc return result PyCogent-1.5.3/cogent/db/util.py #!/usr/bin/env python """Retrieve information from web databases. """ from urllib import urlopen, urlretrieve, quote_plus __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" class UrlGetter(object): Defaults = {} #override in derived classes -- default values PrintedFields = {} #override in derived classes -- fields to print BaseUrl = '' #override in derived classes KeyValDelimiter = '=' FieldDelimiter = '&' def __init__(self, **kwargs): """Returns new instance with arbitrary kwargs.""" self.__dict__.update(self.Defaults) self.__dict__.update(kwargs) self._temp_args = {} def __str__(self): to_get = self.__dict__.copy() to_get.update(self._temp_args) return self.BaseUrl + self.FieldDelimiter.join(\ [quote_plus(k)+self.KeyValDelimiter+quote_plus(str(v)) for k, v in to_get.items()\ if k in self.PrintedFields]) def open(self, **kwargs): """Returns a stream handle to URL result, temporarily overriding kwargs.""" self._temp_args = kwargs result = urlopen(str(self)) self._temp_args = {} return result def read(self, **kwargs): """Gets URL and reads into memory, temporarily
overriding kwargs.""" result = self.open(**kwargs) data = result.read() result.close() return data def retrieve(self, fname, **kwargs): """Gets URL and writes to file fname, temporarily overriding kwargs. Note: produces no return value.""" self._temp_args = kwargs urlretrieve(str(self), fname) self._temp_args = {} def expand_slice(s): """Takes a start and end accession, and gets the whole range. WARNING: Unlike standard slices, includes the last item in the range. In other words, obj[AF1001:AF1010] will include AF1010. Both accessions must have the same non-numeric prefix. """ start, step, end = s.start, s.step, s.stop #find where the number is start_index = last_nondigit_index(start) end_index = last_nondigit_index(end) prefix = start[:start_index] if prefix != end[:end_index]: raise TypeError, "Range start and end don't have same prefix" if not step: step = 1 range_start = long(start[start_index:]) range_end = long(end[end_index:]) field_width = str(len(start) - start_index) format_string = '%'+field_width+'.'+field_width+'d' return [prefix + format_string % i \ for i in range(range_start, range_end+1, step)] def make_lists_of_expanded_slices_of_set_size(s,size_limit=200): """Returns a list of accession terms from 'expand_slice'. GenBank URLs are limited in size. This helps break up larger lists of Accessions (e.g. thousands) into GenBank friendly sizes for downstream fetching. -s : slice of accessions -size_limit : max items each list should contain """ full_list = expand_slice(s) ls = len(full_list) l = [] for i in range(ls/size_limit+1): start = i * size_limit end = (i+1) * size_limit subset = full_list[start:end] l.append(' '.join(subset)) return l def make_lists_of_accessions_of_set_size(s,size_limit=200): """Returns list of search terms that contain accessions up to the size 'size_limit' This is to help make friendly GenBank urls for fetching large lists of accessions (1000s).
-s : list of accessions -size_limit : max items each list should contain """ ls = len(s) l = [] for i in range(ls/size_limit+1): start = i * size_limit end = (i+1) * size_limit subset = s[start:end] l.append(' '.join(subset)) return l def last_nondigit_index(s): """Returns the index of s such that s[i:] is numeric, or None.""" for i in range(len(s)): if s[i:].isdigit(): return i #if we get here, there weren't any trailing digits return None PyCogent-1.5.3/cogent/db/ensembl/__init__.py000644 000765 000024 00000001140 12024702176 021637 0ustar00jrideoutstaff000000 000000 from host import HostAccount from species import Species from genome import Genome from compara import Compara from util import NoItemError __all__ = ['assembly', 'compara', 'database', 'genome', 'host', 'name', 'region', 'related_region', 'sequence', 'species', 'util', 'HostAccount', 'Species', 'Genome', 'Compara'] __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" PyCogent-1.5.3/cogent/db/ensembl/assembly.py000644 000765 000024 00000042007 12024702176 021726 0ustar00jrideoutstaff000000 000000 import sqlalchemy as sql from cogent.core.location import Map from cogent.db.ensembl.species import Species as _Species from cogent.db.ensembl.util import asserted_one, convert_strand, DisplayString from cogent.db.ensembl.host import DbConnection __author__ = "Hua Ying" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Hua Ying"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Hua Ying" __email__ = "Hua.Ying@anu.edu.au" __status__ = "alpha" def location_query(table, query_start, query_end, start_col = 'seq_region_start', end_col = 'seq_region_end', query = None, where = 'overlap'): # TODO should we allow for spans, overlaps, within? 
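The expand_slice helper above expands an accession range inclusively (obj[AF1001:AF1010] includes AF1010) by splitting each accession into a shared prefix and a zero-padded numeric suffix. A standalone re-implementation sketch of that expansion (illustrative, not the library function, and simplified to a unit step):

```python
# Sketch of the inclusive accession-range expansion expand_slice performs:
# locate the numeric suffix, keep the prefix, and zero-pad the counter back
# to the original width. Standalone re-implementation for illustration.
def expand_range(start, end):
    # index where the trailing digits begin (cf. last_nondigit_index above)
    i = next(k for k in range(len(start)) if start[k:].isdigit())
    prefix, width = start[:i], len(start) - i
    return [prefix + str(n).zfill(width)
            for n in range(int(start[i:]), int(end[i:]) + 1)]

print(expand_range('AF1001', 'AF1005'))
# ['AF1001', 'AF1002', 'AF1003', 'AF1004', 'AF1005']
```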
# the union result is a complex query and has to be appended to any other queries # in which it's being employed # should we be setting default values here regarding the columns that start/end # are pulled from, or explicitly state which columns if query is None: query = sql.select([table]) if where == 'within': query.append_whereclause(sql.and_(table.c[start_col] < query_start, table.c[end_col] > query_end)) else: query.append_whereclause( sql.or_(sql.and_(table.c[start_col] < query_start, table.c[end_col] > query_end), sql.and_(table.c[start_col] >= query_start, table.c[start_col] <= query_end), sql.and_(table.c[end_col] >= query_start, table.c[end_col] <= query_end))) # the union is only being used here to order the results # that usage imposes the limitation this function must be appended to # other query components being built into a fuller SQL query # makes me think it shouldn't be here? query = query.order_by(table.c[start_col]) return query def _get_coord_type_and_seq_region_id(coord_name, core_db): seq_region_table = core_db.getTable('seq_region') rows = sql.select([seq_region_table]).\ where(seq_region_table.c.name == coord_name).execute().fetchall() species_coord_sys = CoordSystem(species=core_db.db_name.Species, core_db = core_db) try: selected_row = asserted_one(rows) except ValueError: selected_row = None for row in rows: # not a default_version if not row['coord_system_id'] in species_coord_sys: continue elif not selected_row: selected_row = row break if selected_row is None: raise ValueError("Ambiguous coordinate name: %s" % coord_name) coord_type = species_coord_sys[selected_row['coord_system_id']].name return selected_row, coord_type class Coordinate(object): def __init__(self, genome, CoordName, Start, End, Strand = 1, CoordType = None, seq_region_id = None, ensembl_coord=False): if not CoordType or not (seq_region_id or Start or End): seq_region_data, CoordType = \ _get_coord_type_and_seq_region_id(CoordName, genome.CoreDb) seq_region_id =
seq_region_data['seq_region_id'] Start = Start or 0 End = End or seq_region_data['length'] # TODO allow creation with just seq_region_id self.Species = genome.Species self.CoordType = DisplayString(CoordType, repr_length=4, with_quotes=False) self.CoordName = DisplayString(CoordName, repr_length=4, with_quotes=False) # if Start == End, we +1 to End, unless these are ensembl_coord's if ensembl_coord: Start -= 1 elif Start == End: End += 1 if Start > End: assert Strand == -1,\ "strand incorrect for start[%s] > end[%s]" % (Start, End) Start, End = End, Start self.Start = Start self.End = End self.Strand = convert_strand(Strand) self.seq_region_id = seq_region_id self.genome = genome def __len__(self): return self.End - self.Start def __cmp__(self, other): return cmp((self.CoordName,self.Start), (other.CoordName,other.Start)) def _get_ensembl_start(self): # ensembl counting starts from 1 return self.Start + 1 EnsemblStart = property(_get_ensembl_start) def _get_ensembl_end(self): return self.End EnsemblEnd = property(_get_ensembl_end) def __str__(self): return '%s:%s:%s:%d-%d:%d' % (self.Species, self.CoordType, self.CoordName, self.Start, self.End, self.Strand) def __repr__(self): my_type = self.__class__.__name__ name = _Species.getCommonName(self.Species) coord_type = self.CoordType c = '%s(%r,%r,%r,%d-%d,%d)'%(my_type, name, coord_type, self.CoordName, self.Start, self.End, self.Strand) return c.replace("'", "") def adopted(self, other, shift=False): """adopts the seq_region_id (including CoordName and CoordType) of another coordinate. Arguments: - shift: an int or True/False. If int, it's added to Start/End. 
If bool, other.Start is added to Start/End""" if type(shift) == bool: shift = [0, other.Start][shift] return self.__class__(other.genome, CoordName=other.CoordName, Start=self.Start+shift, End=self.End+shift, Strand=other.Strand, seq_region_id=other.seq_region_id) def shifted(self, value): """adds value to Start/End coords, returning a new instance.""" new = self.copy() new.Start += value new.End += value assert len(new) > 0, 'shift generated a negative length' return new def copy(self): """returns a copy""" return self.__class__(genome=self.genome, CoordName=self.CoordName, Start=self.Start, End=self.End, Strand = self.Strand, CoordType = self.CoordType, seq_region_id = self.seq_region_id) def resized(self, from_start, from_end): """returns a new resized Coordinate with the Start=self.Start+from_start and End = self.End+from_end. If you want to shift Start upstream, add a -ve number""" new = self.copy() new.Start += from_start new.End += from_end try: assert len(new) >= 0, 'resized generated a negative length: %s' % new except (ValueError, AssertionError): raise ValueError return new def makeRelativeTo(self, other, make_relative=True): """returns a new coordinate with attributes adopted from other, and positioned relative to other.""" if other.Strand != self.Strand: Start = other.End-self.End elif make_relative: Start = self.Start-other.Start else: Start = self.Start+other.Start End = Start+len(self) return self.__class__(other.genome, CoordName=other.CoordName, Start=Start, End=End, Strand=other.Strand, seq_region_id=other.seq_region_id) class _CoordRecord(object): """store one record of the coord""" def __init__(self, attrib, rank, name = None, coord_system_id=None): self.coord_system_id = coord_system_id self.name = name self.rank = rank self.attr = attrib def __str__(self): return "coord_system_id = %d; name = %s; rank = %d; attr = %s "\ % (self.coord_system_id, self.name, self.rank, self.attr) class CoordSystemCache(object): """store coord_system table from 
core database. (only read default_version as stated in attrib column) There are three ways to get information about coordinate system: (1) use coord_type (e.g. contig) which is at coord_system.c.name, and which are keys of _species_coord_systems[species] (2) use coord_system_id (e.g. 17 refers to chromosome) which are also keys of _species_coord_systems[species] (3) to get which level of system is used for storing dna table, check 'attrib' column of coord_system as default_version, sequence_level. """ # Problem: multiple species (for compara) --> organized as {species: coordsystem} # TODO: simplify _species_coord_systems? # we place each species coord-system in _species_coord_systems, once, so # this attribute is a very _public_ attribute, and serves as a cache to # reduce unnecessary lookups _species_coord_systems = {} columns = ['coord_system_id', 'name', 'rank', 'attrib'] # columns needed from coord_system table # the attrib property has sequence_level, which means this is the coordinate system employed for sequence def _set_species_system(self, core_db, species): if species in self._species_coord_systems: return self._species_coord_systems[species] = {} coord_table = core_db.getTable('coord_system') records = sql.select([coord_table]).where(coord_table.c.attrib.like('default%')).\ execute().fetchall() # only select default version for record in records: attr = self._species_coord_systems[species] for key in ['coord_system_id', 'name']: key_val = record[key] vals = {} for column in self.columns: val = record[column] if isinstance(val, set): # join items in set to one string try: val = ", ".join(val) except TypeError: pass vals[column] = val attr[key_val] = _CoordRecord(**vals) def _get_seq_level_system(self, species): """returns the sequence level system for species""" sp_sys = self._species_coord_systems[species] for key, val in sp_sys.items(): if 'sequence_level' in val.attr: return val.name raise RuntimeError, 'no coord system for %s' % species def __call__(self,
coord_type = None, core_db = None, species = None, seq_level=False): """coord_type can be coord_type or coord_system_id""" # TODO should only pass in core_db here, not that and Species, or just # the genome - what if someone wants to compare different ensembl # releases? keying by species is then a bad idea! better to key by # id(object) # change identifier to coord_system, handle either string val or int # (see MySQL table) as is this shouldn't be a __call__, see line 168 # for reason why we should have a method to set data: setSpeciesCoord # call then just returns the coords for the named species species = _Species.getSpeciesName(species or core_db.db_name.Species) self._set_species_system(core_db, species) if seq_level: result = self._get_seq_level_system(species) elif coord_type: result = self._species_coord_systems[species][coord_type] else: result = self._species_coord_systems[species] return result CoordSystem = CoordSystemCache() def _rank_checking(query_coord_type, target_coord_type, core_db, species): # assisting in constructing the query for the assembly table # in order to convert between coordinate systems, we need to establish the # ranking for coordinate types # rank defines the order of conversion between coord system 'levels' # chromosome has rank 1 # super contig has rank 2 # contig has rank 4 # clone has rank 3 # converting requires changing columns between 'asm' and 'cmp' # converting from clone -> contig, use 'asm' column # converting from contig -> clone, use 'cmp' column query_rank = CoordSystem(core_db = core_db, species = species, coord_type=query_coord_type).rank target_rank = CoordSystem(core_db = core_db, species = species, coord_type=target_coord_type).rank if query_rank < target_rank: query_prefix, target_prefix = 'asm', 'cmp' elif query_rank > target_rank: query_prefix, target_prefix = 'cmp', 'asm' else: query_prefix, target_prefix = '', '' return query_prefix, target_prefix def _get_equivalent_coords(query_coord, assembly_row,
query_prefix, target_prefix, target_coord_type): # TODO better function name start = query_coord.EnsemblStart end = query_coord.EnsemblEnd strand = query_coord.Strand ori = assembly_row['ori'] q_strand, t_strand = strand, strand * ori if 'seq_region' not in query_prefix: q_seq_region_id = assembly_row['%s_seq_region_id' % query_prefix] t_seq_region_id = assembly_row['%s_seq_region_id' % target_prefix] else: q_seq_region_id = assembly_row['_'.join([query_prefix, 'id'])] t_seq_region_id = assembly_row['_'.join([target_prefix, 'id'])] # d -- distance d_start = max(0, start - int(assembly_row['%s_start' % query_prefix])) d_end = max(0, int(assembly_row['%s_end' % query_prefix]) - end) # q -- query (to differ from the origin query block) q_start = int(assembly_row['%s_start'%query_prefix]) + d_start q_end = int(assembly_row['%s_end'%query_prefix]) - d_end if int(assembly_row['ori']) == -1: d_start, d_end = d_end, d_start # t -- target t_start = int(assembly_row['%s_start' % target_prefix]) + d_start t_end = int(assembly_row['%s_end' % target_prefix]) - d_end q_location = Coordinate(CoordName=query_coord.CoordName, Start=q_start, End=q_end, Strand=q_strand, CoordType=query_coord.CoordType, seq_region_id=q_seq_region_id, genome = query_coord.genome, ensembl_coord=True) t_location = Coordinate(CoordName=assembly_row['name'], Start=t_start, End=t_end, Strand=t_strand, CoordType=target_coord_type, seq_region_id=t_seq_region_id, genome = query_coord.genome, ensembl_coord=True) return [q_location, t_location] def assembly_exception_coordinate(loc): """returns a coordinate conversion for one with an assembly exception""" genome = loc.genome assemb_except_table = genome.CoreDb.getTable('assembly_exception') seq_region_table = genome.CoreDb.getTable('seq_region') query = sql.select([assemb_except_table, seq_region_table.c.name], sql.and_( assemb_except_table.c.seq_region_id == \ loc.seq_region_id, assemb_except_table.c.exc_seq_region_id == \ seq_region_table.c.seq_region_id)) 
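The EnsemblStart/EnsemblEnd properties used above convert between the Coordinate class's 0-based half-open positions and Ensembl's 1-based inclusive convention. A standalone sketch of that round trip (illustrative, not the library code):

```python
# Sketch of the 0-based half-open <-> 1-based inclusive conversion done by
# Coordinate.EnsemblStart/EnsemblEnd above. Standalone, for illustration.
def to_ensembl(start, end):
    # Ensembl counting starts from 1; the end position is already inclusive
    return start + 1, end

def from_ensembl(e_start, e_end):
    return e_start - 1, e_end

# a 10-base region starting at the origin
print(to_ensembl(0, 10))    # (1, 10)
print(from_ensembl(1, 10))  # (0, 10)
```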
query = location_query(assemb_except_table, loc.Start, loc.End, query = query) record = asserted_one(query.execute().fetchall()) s, conv_loc = _get_equivalent_coords(loc, record, "seq_region", "exc_seq_region", loc.CoordType) return conv_loc def get_coord_conversion(query_location,target_coord_type,core_db,where=None): """returns a list of [query_coord, target_coord] pairs mapping query_location into the target coordinate system""" where = where or 'overlap' # TODO better function name species = core_db.db_name.Species assert query_location.Species == species assembly = core_db.getTable('assembly') seq_region = core_db.getTable('seq_region') target_coord_system_id = CoordSystem(target_coord_type, core_db=core_db, species=species).coord_system_id query_prefix, target_prefix = _rank_checking(query_location.CoordType, target_coord_type, core_db, species) if query_prefix == target_prefix: return [[query_location, query_location]] # TODO: deal with query_prefix == target_prefix == '' --> could happen # when query features. query = sql.select([assembly, seq_region.c.name], sql.and_(assembly.c\ ['%s_seq_region_id' % target_prefix] == seq_region.c.seq_region_id, seq_region.c.coord_system_id == target_coord_system_id, assembly.c['%s_seq_region_id' % query_prefix] ==\ query_location.seq_region_id)) query = location_query(assembly, query_location.EnsemblStart, query_location.EnsemblEnd, start_col = "%s_start" % query_prefix, end_col = "%s_end" % query_prefix, query = query, where=where) assembly_rows = query.execute().fetchall() results = [] for assembly_row in assembly_rows: results.append(_get_equivalent_coords(query_location, assembly_row, query_prefix, target_prefix, target_coord_type)) return results PyCogent-1.5.3/cogent/db/ensembl/compara.py000644 000765 000024 00000045066 12024702176 021541 0ustar00jrideoutstaff000000 000000 import sqlalchemy as sql from numpy import empty from cogent.util.table import Table from cogent.db.ensembl.species import Species as _Species from cogent.db.ensembl.util import NoItemError, asserted_one from cogent.db.ensembl.host import
get_ensembl_account, get_latest_release from cogent.db.ensembl.database import Database from cogent.db.ensembl.assembly import Coordinate, location_query from cogent.db.ensembl.genome import Genome from cogent.db.ensembl.related_region import RelatedGenes, SyntenicRegions __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Hua Ying", "Jason Merkin"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" class Compara(object): """comparison among genomes""" def __init__(self, species, Release, account=None, pool_recycle=None, division=None): assert Release, 'invalid release specified' self.Release = str(Release) if account is None: account = get_ensembl_account(release=Release) self._account = account self._pool_recycle = pool_recycle self._compara_db = None sp = sorted([_Species.getSpeciesName(sp) for sp in set(species)]) self.Species = tuple(sp) self._genomes = {} self._attach_genomes() self._species_id_map = None self._species_db_map = None self._species_set = None self._method_species_link = None self.division = division def _attach_genomes(self): for species in self.Species: attr_name = _Species.getComparaName(species) genome = Genome(Species=species, Release=self.Release, account=self._account) self._genomes[species] = genome setattr(self, attr_name, genome) def __str__(self): my_type = self.__class__.__name__ return "%s(Species=%s; Release=%s; connected=%s)" % \ (my_type, self.Species, self.Release, self.ComparaDb is not None) def _connect_db(self): # TODO can the connection be all done in init?
connection = dict(account=self._account, release=self.Release, pool_recycle=self._pool_recycle) if self._compara_db is None: self._compara_db = Database(db_type='compara', division=self.division, **connection) def _get_compara_db(self): self._connect_db() return self._compara_db ComparaDb = property(_get_compara_db) def _make_species_id_map(self): """caches the taxon id's for the self.Species""" if self._species_id_map is not None: return self._species_id_map ncbi_table = self.ComparaDb.getTable('ncbi_taxa_name') condition = sql.select([ncbi_table.c.taxon_id, ncbi_table.c.name], ncbi_table.c.name.in_([sp for sp in self.Species])) # TODO this should make the dict values the actual Genome instances id_genome = [] for r in condition.execute(): id_genome += [(r['taxon_id'], self._genomes[r['name']])] self._species_id_map = dict(id_genome) return self._species_id_map taxon_id_species = property(_make_species_id_map) def _get_genome_db_ids(self): if self._species_db_map is not None: return self._species_db_map genome_db_table = self.ComparaDb.getTable('genome_db') query = sql.select([genome_db_table.c.genome_db_id, genome_db_table.c.taxon_id], genome_db_table.c.taxon_id.in_(self.taxon_id_species.keys())) records = query.execute() self._species_db_map = \ dict([(r['genome_db_id'],r['taxon_id']) for r in records]) return self._species_db_map genome_taxon = property(_get_genome_db_ids) def _get_species_set(self): if self._species_set is not None: return self._species_set # we make sure the species set contains all species species_set_table = self.ComparaDb.getTable('species_set') query = sql.select([species_set_table], species_set_table.c.genome_db_id.in_(self.genome_taxon.keys())) species_sets = {} for record in query.execute(): gen_id = record['genome_db_id'] sp_set_id = record['species_set_id'] if sp_set_id in species_sets: species_sets[sp_set_id].update([gen_id]) else: species_sets[sp_set_id] = set([gen_id]) expected = set(self.genome_taxon.keys()) species_set_ids = []
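The loop that follows keeps only the species sets containing every expected genome_db_id, via the subset test `expected <= gen_id`. A small illustration with made-up ids (the values below are hypothetical, not real Ensembl identifiers):

```python
# Illustration of the species-set filter used just below: a set qualifies
# only if it contains every expected genome_db_id. Ids are invented here.
expected = {9606, 10090}
species_sets = {11: {9606, 10090, 9598}, 12: {9606}}
qualifying = [sp_set for sp_set, genomes in species_sets.items()
              if expected <= genomes]
print(qualifying)  # [11]
```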
for sp_set, gen_id in species_sets.items(): if expected <= gen_id: species_set_ids.append(sp_set) self._species_set = species_set_ids return self._species_set species_set = property(_get_species_set) def _get_method_link_species_set(self): if self._method_species_link is not None: return self._method_species_link method_link_table = self.ComparaDb.getTable('method_link') query = sql.select([method_link_table], method_link_table.c['class'].like('%'+'alignment'+'%')) methods = query.execute().fetchall() method_link_ids = dict([(r['method_link_id'], r) for r in methods]) method_link_species_table = \ self.ComparaDb.getTable('method_link_species_set') query = sql.select([method_link_species_table], sql.and_( method_link_species_table.c.species_set_id.in_(self.species_set), method_link_species_table.c.method_link_id.in_( method_link_ids.keys()))) records = query.execute().fetchall() # store method_link_id, type, species_set_id, # method_link_species_set.name, class header = ['method_link_species_set_id', 'method_link_id', 'species_set_id', 'align_method', 'align_clade'] rows = [] for record in records: ml_id = record['method_link_id'] sp_set_id = record['species_set_id'] ml_sp_set_id = record['method_link_species_set_id'] clade_name = record['name'] aln_name = method_link_ids[ml_id]['type'] rows += [[ml_sp_set_id, ml_id, sp_set_id, aln_name, clade_name]] if rows == []: rows = empty((0,len(header))) t = Table(header=header, rows=rows, space=2, row_ids=True, title='Align Methods/Clades') self._method_species_link = t return t method_species_links = property(_get_method_link_species_set) def getRelatedGenes(self, gene_region=None, StableId=None, Relationship=None, DEBUG=False): """returns a RelatedGenes instance. 
Arguments: - gene_region: a Gene instance - StableId: ensembl stable_id identifier - Relationship: the types of related genes sought""" assert gene_region is not None or StableId is not None,\ "No identifier provided" assert Relationship is not None, "No Relationship specified" # TODO understand why this has become necessary to suppress warnings # in SQLAlchemy 0.6 Relationship = u'%s' % Relationship StableId = StableId or gene_region.StableId member_table = self.ComparaDb.getTable('member') homology_member_table = self.ComparaDb.getTable('homology_member') homology_table = self.ComparaDb.getTable('homology') member_ids = sql.select([member_table.c.member_id], member_table.c.stable_id == StableId) member_ids = [r['member_id'] for r in member_ids.execute()] if not member_ids: return None if DEBUG: print "member_ids", member_ids homology_ids = sql.select([homology_member_table.c.homology_id, homology_member_table.c.member_id], homology_member_table.c.member_id.in_(member_ids)) homology_ids = [r['homology_id'] for r in homology_ids.execute()] if not homology_ids: return None if DEBUG: print "1 - homology_ids", homology_ids homology_records = \ sql.select([homology_table.c.homology_id, homology_table.c.description, homology_table.c.method_link_species_set_id], sql.and_(homology_table.c.homology_id.in_(homology_ids), homology_table.c.description == Relationship)) homology_ids = [] for r in homology_records.execute(): homology_ids.append((r["homology_id"], (r["description"], r["method_link_species_set_id"]))) homology_ids = dict(homology_ids) if DEBUG: print "2 - homology_ids", homology_ids if not homology_ids: return None ortholog_ids = sql.select([homology_member_table.c.member_id, homology_member_table.c.homology_id], homology_member_table.c.homology_id.in_(homology_ids.keys())) ortholog_ids = dict([(r['member_id'], r['homology_id']) \ for r in ortholog_ids.execute()]) if DEBUG: print "ortholog_ids", ortholog_ids if not ortholog_ids: return None # could we have more 
than one here? relationships = set() for memid, homid in ortholog_ids.items(): relationships.update([homology_ids[homid][0]]) relationships = tuple(relationships) gene_set = sql.select([member_table], sql.and_(member_table.c.member_id.in_(ortholog_ids.keys()), member_table.c.taxon_id.in_(self.taxon_id_species.keys()))) data = [] for record in gene_set.execute(): genome = self.taxon_id_species[record['taxon_id']] StableId = record['stable_id'] gene = list(genome.getGenesMatching(StableId=StableId)) assert len(gene) == 1, "Error in selecting genes: %s" % gene gene = gene[0] gene.Location.Strand = record['chr_strand'] data += [gene] if not data: return None return RelatedGenes(self, data, Relationships=relationships) def _get_dnafrag_id_for_coord(self, coord): """returns the dnafrag_id for the coordnate""" dnafrag_table = self.ComparaDb.getTable('dnafrag') genome_db_table = self.ComparaDb.getTable('genome_db') # column renamed between versions prefix = coord.genome.Species.lower() if int(self.Release) > 58: prefix = _Species.getEnsemblDbPrefix(prefix) query = sql.select([dnafrag_table.c.dnafrag_id, dnafrag_table.c.coord_system_name], sql.and_(dnafrag_table.c.genome_db_id ==\ genome_db_table.c.genome_db_id, genome_db_table.c.name == prefix, dnafrag_table.c.name == coord.CoordName)) try: record = asserted_one(query.execute().fetchall()) dnafrag_id = record['dnafrag_id'] except NoItemError: raise RuntimeError, 'No DNA fragment identified' return dnafrag_id def _get_genomic_align_blocks_for_dna_frag_id(self, method_clade_id, dnafrag_id, coord): genomic_align_table = self.ComparaDb.getTable('genomic_align') query = sql.select([genomic_align_table.c.genomic_align_id, genomic_align_table.c.genomic_align_block_id], sql.and_(genomic_align_table.c.method_link_species_set_id ==\ method_clade_id, genomic_align_table.c.dnafrag_id == dnafrag_id)) query = location_query(genomic_align_table, coord.EnsemblStart, coord.EnsemblEnd, start_col = 'dnafrag_start', end_col = 'dnafrag_end', 
query = query) return query.execute().fetchall() def _get_joint_genomic_align_dnafrag(self, genomic_align_block_id): genomic_align_table = self.ComparaDb.getTable('genomic_align') dnafrag_table = self.ComparaDb.getTable('dnafrag') query = sql.select([genomic_align_table.c.genomic_align_id, genomic_align_table.c.genomic_align_block_id, genomic_align_table.c.dnafrag_start, genomic_align_table.c.dnafrag_end, genomic_align_table.c.dnafrag_strand, dnafrag_table], sql.and_(genomic_align_table.c.genomic_align_block_id == \ genomic_align_block_id, genomic_align_table.c.dnafrag_id == dnafrag_table.c.dnafrag_id, dnafrag_table.c.genome_db_id.in_(self.genome_taxon.keys()))) return query.execute().fetchall() def getSyntenicRegions(self, Species=None, CoordName=None, Start=None, End=None, Strand=1, ensembl_coord=False, region=None, align_method=None, align_clade=None, method_clade_id=None): """returns a SyntenicRegions instance Arguments: - Species: the species name - CoordName, Start, End, Strand: the coordinates for the region - ensembl_coord: whether the coordinates are in Ensembl form - region: a region instance or a location, in which case the CoordName etc .. arguments are ignored - align_method, align_clade: the alignment method and clade to use Note: the options for this instance can be found by printing the method_species_links attribute of this object. - method_clade_id: over-rides align_method/align_clade. 
The entry in method_species_links under method_link_species_set_id """ assert (align_method and align_clade) or method_clade_id, \ 'Must specify (align_method & align_clade) or method_clade_id' if method_clade_id is None: for row in self.method_species_links: if align_method.lower() in row['align_method'].lower() and\ align_clade.lower() in row['align_clade'].lower(): method_clade_id = row['method_link_species_set_id'] if method_clade_id is None: raise RuntimeError, "Invalid align_method[%s] or align_clade "\ "specified[%s]" % (align_method, align_clade) if region is None: ref_genome = self._genomes[_Species.getSpeciesName(Species)] region = ref_genome.makeLocation(CoordName=CoordName, Start=Start, End=End, Strand=Strand, ensembl_coord=ensembl_coord) elif hasattr(region, 'Location'): region = region.Location # make sure the genome instances match ref_genome = self._genomes[region.genome.Species] if ref_genome is not region.genome: # recreate region from our instance region = ref_genome.makeLocation(CoordName=region.CoordName, Start=region.Start, End=region.End, Strand=region.Strand) ref_dnafrag_id = self._get_dnafrag_id_for_coord(region) blocks=self._get_genomic_align_blocks_for_dna_frag_id(method_clade_id, ref_dnafrag_id, region) for block in blocks: genomic_align_block_id = block['genomic_align_block_id'] # we get joint records for these identifiers from records = self._get_joint_genomic_align_dnafrag( genomic_align_block_id) members = [] ref_location = None for record in records: taxon_id = self.genome_taxon[record.genome_db_id] genome = self.taxon_id_species[taxon_id] # we have a case where we get back different coordinate system # results for the ref genome.
We keep only those that match # the CoordName of region if genome is region.genome and \ record.name == region.CoordName: # this is the ref species and we adjust the ref_location # for this block diff_start = record.dnafrag_start-region.EnsemblStart shift_start = [0, diff_start][diff_start > 0] diff_end = record.dnafrag_end-region.EnsemblEnd shift_end = [diff_end, 0][diff_end > 0] try: ref_location = region.resized(shift_start, shift_end) except ValueError: # we've hit some ref genome fragment that matches # but whose coordinates aren't right continue elif genome is region.genome: continue members += [(genome, record)] assert ref_location is not None, "Failed to make the reference"\ " location" yield SyntenicRegions(self, members, ref_location=ref_location) def getDistinct(self, property_type): """returns the Ensembl data-bases distinct values for the named property_type. Arguments: - property_type: valid values are relationship""" property_type = property_type.lower() db = self.ComparaDb property_map = {'relationship': ('homology', 'description'), 'clade': ('method_link_species_set', 'name')} if property_type not in property_map: raise RuntimeError, "ERROR: Unknown property type: %s"%property_type table_name, column = property_map[property_type] return list(db.getDistinct(table_name, column)) PyCogent-1.5.3/cogent/db/ensembl/database.py000644 000765 000024 00000010212 12024702176 021644 0ustar00jrideoutstaff000000 000000 import sqlalchemy as sql from cogent.util import table as cogent_table from cogent.db.ensembl.host import DbConnection, get_db_name from cogent.util.misc import flatten __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Jason Merkin"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" class Database(object): """holds the data-base connection and table attributes""" def __init__(self, account, 
                 species=None, db_type=None, release=None,
                 pool_recycle=None, division=None):
        self._tables = {}
        self.db_name = get_db_name(account=account, species=species,
                                   release=release, db_type=db_type,
                                   division=division)
        if not self.db_name:
            raise RuntimeError, "%s db doesn't exist for '%s' on '%s'" % \
                                (db_type, species, account.host)
        else:
            self.db_name = self.db_name[0]
        self._db = DbConnection(account=account, db_name=self.db_name,
                                pool_recycle=pool_recycle)
        self._meta = sql.MetaData(self._db)
        self.Type = db_type

    def __str__(self):
        return str(self.db_name)

    def __cmp__(self, other):
        return cmp(self._db, other._db)

    def getTable(self, name):
        """returns the SQLAlchemy table instance"""
        table = self._tables.get(name, None)
        if table is None:
            c = self._db.execute("DESCRIBE %s" % name)
            custom_columns = []
            for r in c.fetchall():
                Field = r["Field"]
                Type = r["Type"]
                if "tinyint" in Type:
                    custom_columns.append(sql.Column(Field, sql.Integer))
            try:
                table = sql.Table(name, self._meta, autoload=True,
                                  extend_existing=True, *custom_columns)
            except TypeError:
                # new arg name not supported, try old
                table = sql.Table(name, self._meta, autoload=True,
                                  useexisting=True, *custom_columns)
            self._tables[name] = table
        return table

    def getDistinct(self, table_name, column):
        """returns the Ensembl database's distinct values for the named
        column.

        Arguments:
            - table_name: the database table name
            - column: valid values are biotype, status"""
        table = self.getTable(table_name)
        query = sql.select([table.c[column]], distinct=True)
        records = set()
        string_types = str, unicode
        for record in query.execute():
            if type(record) not in string_types and \
                    type(record[0]) not in string_types:
                # multi-dimensioned list/tuple
                record = flatten(record)
            elif type(record) not in string_types:
                # list/tuple of strings
                record = tuple(record)
            else:
                # a string
                record = [record]
            records.update(record)
        return records

    def tableHasColumn(self, table_name, column):
        """returns True if table has column"""
        table = self.getTable(table_name)
        return hasattr(table.c, column)

    def getTablesRowCount(self, table_name=None):
        """returns a cogent Table object with the row count for each table
        in the database

        Arguments:
            - table_name: database table name. If None, all database
              tables assessed."""
        if type(table_name) == str:
            table_name = (table_name,)
        elif table_name is None:
            self._meta.reflect()
            table_name = self._meta.tables.keys()
        rows = []
        for name in table_name:
            table = self.getTable(name)
            count = table.count().execute().fetchone()[0]
            rows.append(['%s.%s' % (self.db_name, name), count])
        return cogent_table.Table(header=['name', 'count'], rows=rows)

PyCogent-1.5.3/cogent/db/ensembl/feature_level.py

#!/usr/bin/env python
import sqlalchemy as sql

from cogent.util.table import Table
from cogent.db.ensembl.assembly import CoordSystem
from cogent.db.ensembl.species import Species as _Species

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"

class _FeatureLevelRecord(object):
    def __init__(self, feature_type, coord_system_names):
        self.feature_type = feature_type
        self.levels
= coord_system_names def __str__(self): return 'feature = %s; Levels = %s' % (self.feature_type, ', '.join(self.levels)) class FeatureCoordLevelsCache(object): _species_feature_levels = {} _species_feature_dbs = {} def __init__(self, species): self.Species = _Species.getSpeciesName(species) def __repr__(self): """print table format""" header = ['Type', 'Levels'] result = [] for species in self._species_feature_levels.keys(): feature_levels = self._species_feature_levels[species] collate = [] for feature in feature_levels.keys(): collate.append([feature, feature_levels[feature].levels]) t = Table(header, collate, title=species) result.append(str(t)) result = '\n'.join(result) return result def _get_meta_coord_records(self, db): meta_coord = db.getTable('meta_coord') if 'core' in str(db.db_name): query = sql.select([meta_coord]).where(meta_coord.c.table_name.\ in_(['gene', 'simple_feature', 'repeat_feature'])) query = query.order_by(meta_coord.c.table_name) elif 'variation' in str(db.db_name): query = sql.select([meta_coord]).where(meta_coord.c.table_name == 'variation_feature') else: assert 'otherfeature' in str(db.db_name) query = sql.select([meta_coord]).where(meta_coord.c.table_name == 'gene') records = query.execute().fetchall() return records def _add_species_feature_levels(self, species, records, db_type, coord_system): if db_type == 'core': features = ['cpg', 'repeat', 'gene', 'est'] tables = ['simple_feature', 'repeat_feature', 'gene', 'gene'] elif db_type == 'var': features, tables = ['variation'], ['variation_feature'] else: assert db_type == 'otherfeature' features, tables = ['est'], ['gene'] for feature, table_name in zip(features, tables): feature_coord_ids = [r['coord_system_id'] for r in records if r['table_name'] == table_name] feature_coord_systems = [coord_system[coord_id] for coord_id in feature_coord_ids] levels = [s.name for s in feature_coord_systems] self._species_feature_levels[species][feature] = _FeatureLevelRecord(feature, levels) def 
_set_species_feature_levels(self, species, core_db, feature_types, var_db, otherfeature_db): if species not in self._species_feature_levels: self._species_feature_levels[species] = {} self._species_feature_dbs[species] = [] coord_system = CoordSystem(core_db = core_db) if set(feature_types).intersection(set(['cpg', 'repeat', 'gene'])): if 'core_db' not in self._species_feature_dbs[species]: self._species_feature_dbs[species].append('core_db') records = self._get_meta_coord_records(core_db) self._add_species_feature_levels(species, records, 'core', coord_system) if 'variation' in feature_types: if 'var_db' not in self._species_feature_dbs[species]: self._species_feature_dbs[species].append('var_db') assert var_db is not None records = self._get_meta_coord_records(var_db) self._add_species_feature_levels(species, records, 'var', coord_system) if 'est' in feature_types: if 'otherfeature_db' not in self._species_feature_dbs[species]: self._species_feature_dbs[species].append('otherfeature_db') assert otherfeature_db is not None records = self._get_meta_coord_records(otherfeature_db) self._add_species_feature_levels(species, records, 'otherfeature', coord_system) def __call__(self, species = None, core_db=None, feature_types=None, var_db=None, otherfeature_db=None): if 'variation' in feature_types: assert var_db is not None species = _Species.getSpeciesName(core_db.db_name.Species or species) self._set_species_feature_levels(species, core_db, feature_types, var_db, otherfeature_db) return self._species_feature_levels[species] class FeatureCoordLevels(FeatureCoordLevelsCache): def __init__(self, species): self.Species = _Species.getSpeciesName(species) def __repr__(self): """print table format""" header = ['Type', 'Levels'] if self.Species not in self._species_feature_levels: result = '' else: collate = [] feature_levels = self._species_feature_levels[self.Species] for feature in feature_levels.keys(): record = feature_levels[feature] collate.append([feature, ', 
'.join(record.levels)]) result = str(Table(header, collate, title=self.Species)) return result PyCogent-1.5.3/cogent/db/ensembl/genome.py000644 000765 000024 00000061152 12024702176 021363 0ustar00jrideoutstaff000000 000000 import re import sqlalchemy as sql from cogent.db.ensembl.species import Species as _Species from cogent.db.ensembl.util import LazyRecord, asserted_one,\ convert_strand, DisplayString from cogent.db.ensembl.host import get_ensembl_account, get_latest_release from cogent.db.ensembl.database import Database from cogent.db.ensembl.assembly import CoordSystem, Coordinate, \ get_coord_conversion, location_query from cogent.db.ensembl.region import Gene, Variation, GenericRegion, \ CpGisland, Repeat, Est from cogent.db.ensembl.feature_level import FeatureCoordLevels from cogent.util.misc import flatten __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" class FeatureTypeCache(LazyRecord): """stores critical indices for different feature types""" def __init__(self, genome): super(FeatureTypeCache, self).__init__() self.genome = genome self._type_func_map = dict(CpGisland=self._get_cpg_island_analysis_id, Repeat=self._get_repeat_id) def _get_cpg_island_analysis_id(self): analysis_description_table = \ self.genome.CoreDb.getTable('analysis_description') query = sql.select([analysis_description_table.c.analysis_id], analysis_description_table.c.display_label.like('%CpG%')) record = asserted_one(query.execute()) self._table_rows['analysis_description'] = record quoted_limited = lambda x : DisplayString(x, with_quotes=True, num_words=2) self._populate_cache_from_record( [('CpGisland','analysis_id',quoted_limited)], 'analysis_description') def _get_cpg_island_id(self): return self._get_cached_value('CpGisland', self._get_cpg_island_analysis_id) CpGisland = 
property(_get_cpg_island_id) def _get_repeat_id(self): raise NotImplementedError Repeat = property(_get_repeat_id) def get(self, feature_type): """returns the analysis_id for feature_type""" try: func = self._type_func_map[feature_type] except KeyError: raise RuntimeError,"Unknown feature type: %s" % feature_type return self._get_cached_value(feature_type, func) class Genome(object): """An Ensembl Genome""" def __init__(self, Species, Release, account=None, pool_recycle=None): super(Genome, self).__init__() assert Release, 'invalid release specified' if account is None: account = get_ensembl_account(release=Release) self._account = account self._pool_recycle = pool_recycle # TODO: check Release may not be necessary because: assert Release above if Release is None: Release = get_latest_release(account=account) self._gen_release = None # TODO make name and release immutable properties self.Species = _Species.getSpeciesName(Species) self.Release = str(Release) # the db connections self._core_db = None self._var_db = None self._other_db = None self._feature_type_ids = FeatureTypeCache(self) self._feature_coord_levels = FeatureCoordLevels(self.Species) def __str__(self): my_type = self.__class__.__name__ return "%s(Species='%s'; Release='%s')" % (my_type, self.Species, self.Release) def __repr__(self): return self.__str__() def __cmp__(self, other): return cmp(self.CoreDb, other.CoreDb) def _connect_db(self, db_type): connection = dict(account=self._account, release=self.Release, species=self.Species, pool_recycle=self._pool_recycle) if self._core_db is None and db_type == 'core': self._core_db = Database(db_type='core', **connection) gen_rel = self.CoreDb.db_name.GeneralRelease gen_rel = int(re.findall(r'^\d+', str(gen_rel))[0]) self._gen_release = gen_rel elif self._var_db is None and db_type == 'variation': self._var_db = Database(db_type='variation', **connection) elif self._other_db is None and db_type == 'otherfeatures': self._other_db = 
Database(db_type='otherfeatures', **connection) def _get_core_db(self): self._connect_db('core') return self._core_db CoreDb = property(_get_core_db) def _get_var_db(self): self._connect_db('variation') return self._var_db VarDb = property(_get_var_db) def _get_other_db(self): self._connect_db('otherfeatures') return self._other_db OtherFeaturesDb = property(_get_other_db) @property def GeneralRelease(self): """returns True if the general Ensembl release is >= 65""" # General release is used here as to support Ensembl genomes if self._gen_release is None: self.CoreDb return self._gen_release def _get_biotype_description_condition(self, gene_table, Description=None, BioType=None, like=True): assert Description or BioType, "no valid argument provided" btype, descr = None, None if BioType: if like: btype = gene_table.c.biotype.like('%'+BioType+'%') else: btype = gene_table.c.biotype==BioType if Description: if like: descr = gene_table.c.description.like('%'+Description+'%') else: descr = gene_table.c.description.op('regexp')( '[[:<:]]%s[[:>:]]' % Description) if btype is not None and descr is not None: condition = sql.and_(btype, descr) elif btype is not None: condition = btype elif descr is not None: condition = descr return condition def _build_gene_query(self, db, condition, gene_table, gene_id_table, xref_table=None): if gene_id_table is None: # Ensembl releases later than >= 65 join_obj = gene_table select_obj = [gene_table] else: join_obj = gene_id_table.join(gene_table, gene_id_table.c.gene_id==gene_table.c.gene_id) select_obj = [gene_id_table.c.stable_id, gene_table] if db.Type == 'core': join_obj = join_obj.outerjoin(xref_table, gene_table.c.display_xref_id==xref_table.c.xref_id) select_obj.append(xref_table.c.display_label) query = sql.select(select_obj, from_obj=[join_obj], whereclause=condition) return query def _get_symbol_from_synonym(self, db, synonym): """returns the gene symbol for a synonym""" synonym_table = db.getTable('external_synonym') 
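The `_get_symbol_from_synonym` method built here is a plain equi-join of `external_synonym` against `xref` on `xref_id`, returning the distinct `display_label` for a synonym. The same lookup can be sketched with the stdlib `sqlite3` module; the table and column names below follow the Ensembl core schema, but the rows are made-up toy data, not real Ensembl content:

```python
import sqlite3

# in-memory stand-ins for the two Ensembl core tables involved
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xref (xref_id INTEGER, display_label TEXT);
    CREATE TABLE external_synonym (xref_id INTEGER, synonym TEXT);
    INSERT INTO xref VALUES (1, 'BRCA2');            -- hypothetical row
    INSERT INTO external_synonym VALUES (1, 'FANCD1');
""")

def symbol_from_synonym(conn, synonym):
    """returns the display_label whose xref carries the synonym, else None"""
    row = conn.execute(
        "SELECT DISTINCT x.display_label FROM xref x "
        "JOIN external_synonym s ON x.xref_id = s.xref_id "
        "WHERE s.synonym = ?", (synonym,)).fetchone()
    return row[0] if row else None
```

With the toy rows above, looking up the synonym 'FANCD1' recovers the symbol 'BRCA2', mirroring how `getGenesMatching` retries a failed symbol search via synonyms.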
xref_table = db.getTable('xref') joinclause = xref_table.join(synonym_table, xref_table.c.xref_id==synonym_table.c.xref_id) whereclause = synonym_table.c.synonym==synonym query = sql.select([xref_table.c.display_label], from_obj=[joinclause], whereclause=whereclause).distinct() result = query.execute().fetchall() if result: try: symbol = flatten(result)[0] except IndexError: symbol = None else: symbol = None return symbol def _get_gene_query(self, db, Symbol=None, Description=None, StableId=None, BioType=None, synonym=None, like=True): xref_table = [None, db.getTable('xref')][db.Type == 'core'] gene_table = db.getTable('gene') # after release 65, the gene_id_table is removed. The following is to maintain # support for earlier releases release_ge_65 = self.GeneralRelease >= 65 if release_ge_65: gene_id_table = None else: gene_id_table = db.getTable('gene_stable_id') assert Symbol or Description or StableId or BioType, "no valid argument provided" if Symbol: condition = xref_table.c.display_label==Symbol elif StableId and release_ge_65: condition = gene_table.c.stable_id==StableId elif StableId: condition = gene_id_table.c.stable_id==StableId else: condition = self._get_biotype_description_condition(gene_table, Description, BioType, like) query = self._build_gene_query(db, condition, gene_table, gene_id_table, xref_table) return query def makeLocation(self, CoordName, Start=None, End=None, Strand=1, ensembl_coord=False): """returns a location in the genome""" return Coordinate(self, CoordName=CoordName, Start=Start, End=End, Strand=Strand, ensembl_coord=ensembl_coord) def getGeneByStableId(self, StableId): """returns the gene matching StableId, or None if no record found""" query = self._get_gene_query(self.CoreDb, StableId=StableId) try: record = list(query.execute())[0] gene = Gene(self, self.CoreDb, data=record) except IndexError: gene = None return gene def getGenesMatching(self, Symbol=None, Description=None, StableId=None, BioType=None, like=True): """Symbol: 
HGC gene symbol, case doesn't matter description: a functional description StableId: the ensebl identifier BioType: the biological encoding type""" # TODO additional arguments to satisfy: external_ref, go_terms if Symbol is not None: Symbol = Symbol.lower() # biotype -> gene # description -> gene # Symbols -> xref # StableId -> gene_stable_id # XREF table calls # for gene symbols, these need to be matched against the display_label # attribute of core.xref table # for description, these need to be matched against the description # field of the xref table # TODO catch conditions where user passes in both a symbol and a # biotype args = dict(Symbol=Symbol, Description=Description, StableId=StableId, BioType=BioType, like=like) query = self._get_gene_query(self.CoreDb, **args) records = query.execute() if records.rowcount == 0 and Symbol is not None: # see if the symbol has a synonym Symbol = self._get_symbol_from_synonym(self.CoreDb, Symbol) if Symbol is not None: args['Symbol'] = Symbol records = self._get_gene_query(self.CoreDb, **args).execute() else: records = [] for record in records: gene = Gene(self, self.CoreDb, data=record) yield gene def getEstMatching(self, StableId): """returns an Est object from the otherfeatures db with the StableId""" query = self._get_gene_query(self.OtherFeaturesDb, StableId=StableId) records = query.execute() for record in records: yield Est(self,self.OtherFeaturesDb,StableId=StableId,data=record) def _get_seq_region_id(self, CoordName): """returns the seq_region_id for the provided CoordName""" seq_region_table = self.CoreDb.getTable('seq_region') coord_systems = CoordSystem(core_db=self.CoreDb) coord_system_ids = [k for k in coord_systems if not isinstance(k, str)] record = sql.select([seq_region_table.c.seq_region_id], sql.and_(seq_region_table.c.name == CoordName, seq_region_table.c.coord_system_id.in_(coord_system_ids))) record = asserted_one(record.execute().fetchall()) return record['seq_region_id'] def 
_get_simple_features(self, db, klass, target_coord, query_coord, where_feature): """returns feature_type records for the query_coord from the simple_feature table. The returned coord is referenced to target_coord. At present, only CpG islands being queried.""" simple_feature_table = db.getTable('simple_feature') feature_types = ['CpGisland'] feature_type_ids=[self._feature_type_ids.get(f) for f in feature_types] # fix the following query = sql.select([simple_feature_table], sql.and_(simple_feature_table.c.analysis_id.in_(feature_type_ids), simple_feature_table.c.seq_region_id == query_coord.seq_region_id)) query = location_query(simple_feature_table,query_coord.EnsemblStart, query_coord.EnsemblEnd, query=query, where=where_feature) records = query.execute() for record in records: coord = Coordinate(self, CoordName=query_coord.CoordName, Start=record['seq_region_start'], End = record['seq_region_end'], seq_region_id=record['seq_region_id'], Strand = record['seq_region_strand'], ensembl_coord=True) if query_coord.CoordName != target_coord.CoordName: coord = asserted_one(get_coord_conversion(coord, target_coord.CoordType, self.CoreDb))[1] # coord = coord.makeRelativeTo(query_coord) #TODO: fix here if query_coord and target_coord have different coordName # coord = coord.makeRelativeTo(target_coord, False) yield klass(self, db, Location=coord, Score=record['score']) def _get_repeat_features(self, db, klass, target_coord, query_coord, where_feature): """returns Repeat region instances""" # we build repeats using coordinates from repeat_feature table # the repeat_consensus_id is required to get the repeat name, class # and type repeat_feature_table = db.getTable('repeat_feature') query = sql.select([repeat_feature_table], repeat_feature_table.c.seq_region_id == query_coord.seq_region_id) query = location_query(repeat_feature_table, query_coord.EnsemblStart, query_coord.EnsemblEnd, query=query, where=where_feature) for record in query.execute(): coord = Coordinate(self, 
CoordName=query_coord.CoordName, Start=record['seq_region_start'], End = record['seq_region_end'], seq_region_id=record['seq_region_id'], Strand = record['seq_region_strand'], ensembl_coord=True) if query_coord.CoordName != target_coord.CoordName: coord = asserted_one(get_coord_conversion(coord, target_coord.CoordType, self.CoreDb))[1] # coord = coord.makeRelativeTo(query_coord) #TODO: fix here if query_coord and target_coord have different coordName # coord = coord.makeRelativeTo(target_coord, False) yield klass(self, db, Location=coord, Score=record['score'], data=record) def _get_gene_features(self, db, klass, target_coord, query_coord, where_feature): """returns all genes""" xref_table = [None, db.getTable('xref')][db.Type == 'core'] gene_table = db.getTable('gene') # after release 65, the gene_id_table is removed. The following is to maintain # support for earlier releases. if self.GeneralRelease >= 65: gene_id_table = None else: gene_id_table = db.getTable('gene_stable_id') # note gene records are at chromosome, not contig, level condition = gene_table.c.seq_region_id == query_coord.seq_region_id query = self._build_gene_query(db, condition, gene_table, gene_id_table, xref_table) query = location_query(gene_table, query_coord.EnsemblStart, query_coord.EnsemblEnd, query=query, where=where_feature) for record in query.execute(): new = Coordinate(self, CoordName=query_coord.CoordName, Start=record['seq_region_start'], End = record['seq_region_end'], Strand = record['seq_region_strand'], seq_region_id=record['seq_region_id'], ensembl_coord=True) if query_coord.CoordName != target_coord.CoordName: coord = asserted_one(get_coord_conversion(coord, target_coord.CoordType, self.CoreDb))[1] # TODO: check coord, used 'new' here. where is coord (above line) used? 
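This module repeatedly selects between two values by indexing a two-element list with a boolean, e.g. `xref_table = [None, db.getTable('xref')][db.Type == 'core']` just above, or `shift_start = [0, diff_start][diff_start > 0]` in the compara code. The idiom predates Python 2.5's conditional expression and works because `bool` subclasses `int`, so `False` indexes element 0 and `True` element 1. A self-contained illustration (the `pick` helper is hypothetical, not part of PyCogent):

```python
def pick(false_value, true_value, condition):
    # bool subclasses int: False -> index 0, True -> index 1
    return [false_value, true_value][bool(condition)]

assert pick(None, 'xref', 'core' == 'core') == 'xref'
assert pick(None, 'xref', 'est' == 'core') is None
# equivalent to the conditional expression available since Python 2.5:
assert ('xref' if 'core' == 'core' else None) == 'xref'
```

One caveat of the list-indexing form: both alternatives are evaluated eagerly, so unlike `x if c else y` it cannot guard an expensive or failing branch.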
gene = klass(self, db, Location=new, data=record) yield gene def _get_variation_features(self, db, klass, target_coord, query_coord, where_feature): """returns variation instances within the specified region""" # variation features at supercontig level var_feature_table = self.VarDb.getTable('variation_feature') # note gene records are at chromosome, not contig, level query = sql.select([var_feature_table], var_feature_table.c.seq_region_id == query_coord.seq_region_id) query = location_query(var_feature_table, query_coord.EnsemblStart, query_coord.EnsemblEnd, query=query, where=where_feature) for record in query.execute(): yield klass(self, self.CoreDb, Symbol=record['variation_name'], data=record) def _get_feature_coord_levels(self, feature_types): dbs = dict(core_db = self.CoreDb) if 'variation' in feature_types: dbs["var_db"] = self.VarDb if 'est' in feature_types: dbs["otherfeature_db"] = self.OtherFeaturesDb feature_coord_levels = self._feature_coord_levels(self.Species, feature_types = feature_types,**dbs) return feature_coord_levels def _feature_coord_levels(self): if str(self._feature_coord_levels): return self._feature_coord_levels feature_types = ['gene', 'est', 'variation', 'cpg', 'repeat'] feature_coord_levels = self._get_feature_coord_levels(feature_types) return self._feature_coord_levels FeatureCoordLevels = property(_feature_coord_levels) def getFeatures(self, region=None, feature_types=None, where_feature=None, CoordName=None, Start=None, End=None, Strand=None, ensembl_coord=False): """returns Region instances for the specified location""" if isinstance(feature_types, str): feature_types = [feature_types] feature_types = [ft.lower() for ft in feature_types] feature_coord_levels = self._get_feature_coord_levels(feature_types) if region is None: seq_region_id = self._get_seq_region_id(CoordName) region = Coordinate(self,CoordName=CoordName, Start=Start, End=End, Strand = convert_strand(Strand), seq_region_id=seq_region_id, 
ensembl_coord=ensembl_coord) elif hasattr(region, 'Location'): region = region.Location coord = region # the coordinate system at which locations are to be referenced, and # the processing function target_coords_funcs = \ dict(cpg = (self._get_simple_features, CpGisland), repeat = (self._get_repeat_features, Repeat), gene = (self._get_gene_features, Gene), est = (self._get_gene_features, Est), variation = (self._get_variation_features, Variation)) known_types = set(target_coords_funcs.keys()) if not set(feature_types) <= known_types: raise RuntimeError, 'Unknown feature[%s], valid feature_types \ are: %s' % (set(feature_types)^known_types, known_types) for feature_type in feature_types: target_func, target_class = target_coords_funcs[feature_type] db = self.CoreDb if feature_type == 'est': db = self.OtherFeaturesDb feature_coords = feature_coord_levels[feature_type].levels for feature_coord in feature_coords: chrom_other_coords = get_coord_conversion(coord, feature_coord, db, where=where_feature) for chrom_coord, other_coord in chrom_other_coords: for region in target_func(db, target_class, chrom_coord, other_coord, where_feature): yield region def getVariation(self, Effect=None, Symbol=None, like=True, validated=False): """returns a generator of Variation instances Arguments: - Effect: the coding impact, eg. nonsynonymous - like: Effect is exactly matched against records like that provided - Symbol: the external or ensembl identifier - returns the exact match - validated: variant has validation_status != None""" var_feature_table = self.VarDb.getTable('variation_feature') assert Effect or Symbol, "No arguments provided" # if we don't have Symbol, then we deal with Effect if Effect is not None: if like: query = \ var_feature_table.c.consequence_type.like('%'+Effect+'%') else: query = var_feature_table.c.consequence_type == Effect else: query = var_feature_table.c.variation_name == Symbol if validated: # in Release 65, the default validated status is now '' # why?? 
thanks Ensembl! null = None if int(self.Release) >= 65: null = '' query = sql.and_(query,var_feature_table.c.validation_status!=null) query = sql.select([var_feature_table], query).order_by(var_feature_table.c.seq_region_start) for record in query.execute(): yield Variation(self, self.CoreDb, Effect = Effect, Symbol=Symbol, data=record) def getRegion(self, region=None, CoordName=None, Start=None, End=None, Strand=None, ensembl_coord=False): """returns a single generic region for the specified coordinates Arguments: - region: a genomic region or a Coordinate instance - ensembl_coords: if True, follows indexing system of Ensembl where indexing starts at 1""" if region is None: seq_region_id = self._get_seq_region_id(CoordName) region = Coordinate(self,CoordName=CoordName, Start=Start, End=End, Strand = convert_strand(Strand), seq_region_id=seq_region_id, ensembl_coord=ensembl_coord) elif hasattr(region, 'Location'): region = region.Location return GenericRegion(self, self.CoreDb, CoordName=CoordName, Start=Start, End=End, Strand=Strand, Location=region, ensembl_coord=ensembl_coord) def getDistinct(self, property_type): """returns the Ensembl data-bases distinct values for the named property_type. 
Arguments: - property_type: valid values are biotype, status, effect""" property_type = property_type.lower() if property_type == 'effect': db = self.VarDb else: db = self.CoreDb property_map = {'effect': ('variation_feature', 'consequence_type'), 'biotype': ('gene', 'biotype'), 'status': ('gene', 'status')} if property_type not in property_map: raise RuntimeError,\ "ERROR: Unknown property type: %s" % property_type table_name, column = property_map[property_type] return list(db.getDistinct(table_name, column)) PyCogent-1.5.3/cogent/db/ensembl/host.py000644 000765 000024 00000011066 12024702176 021065 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python import MySQLdb import sqlalchemy as sql from cogent.util.table import Table from cogent.db.ensembl.species import Species from cogent.db.ensembl.name import EnsemblDbName from cogent.db.ensembl.util import asserted_one __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Jason Merkin"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" class HostAccount(object): """host account data""" def __init__(self, host, user, passwd, port=None): super(HostAccount, self).__init__() self.host = host self.user = user self.passwd = passwd self.port = port or 3306 def __cmp__(self, other): return cmp((self.host, self.user, self.port), (other.host, other.user, other.port)) def __str__(self): return '%s:%s@%s:%s' % (self.user,self.passwd,self.host,self.port) def get_ensembl_account(release=None): """returns an HostAccount for ensembl. 
Arguments: - release: if not specified, returns for the ensembl MySQL server hosting releases from 48""" port = [None, 5306][release is None or int(release) > 47] return HostAccount("ensembldb.ensembl.org", "anonymous", "", port=port) def _get_default_connection(): return "ensembldb.ensembl.org", "anonymous", "" class EngineCache(object): """storage of active connections, indexed by account, database name""" _db_account = {} def __call__(self, account, db_name=None, pool_recycle=None): """returns an active SQLAlchemy connection engine""" assert account and db_name,"Must provide an account and a db" pool_recycle = pool_recycle or 3600 if account not in self._db_account.get(db_name, []): if db_name == "PARENT": engine = MySQLdb.connect(host=account.host, user=account.user, passwd=account.passwd, port=account.port) else: engine = sql.create_engine("mysql://%s/%s" % (account, db_name), pool_recycle=pool_recycle) if db_name not in self._db_account: self._db_account[db_name] = {} self._db_account[db_name][account] = engine return self._db_account[db_name][account] DbConnection = EngineCache() def make_db_name_pattern(species=None, db_type=None, release=None): """returns a pattern for matching the db name against""" sep = r"%" pattern = "" if species: species = Species.getEnsemblDbPrefix(species) pattern = "%s%s" % (sep, species) if db_type: pattern = "%s%s%s" % (pattern, sep, db_type) if release: pattern = "%s%s%s" % (pattern, sep, release) assert pattern return "'%s%s'" % (pattern, sep) def get_db_name(account=None, species=None, db_type=None, release=None, division=None, DEBUG=False): """returns the listing of valid data-base names as EnsemblDbName objects""" if account is None: account = get_ensembl_account(release=release) if DEBUG: print "Connection To:", account print "Selecting For:", species, db_type, release server = DbConnection(account, db_name='PARENT') cursor = server.cursor() show = "SHOW DATABASES" if species or db_type or release: pattern = 
make_db_name_pattern(species, db_type, release) show = "%s LIKE %s" % (show, pattern) if DEBUG: print show cursor.execute(show) rows = cursor.fetchall() dbs = [] for row in rows: try: if division is not None and division not in row[0]: continue name = EnsemblDbName(row[0]) if (release is None or name.Release == str(release)) and\ (db_type is None or name.Type == db_type): dbs.append(name) except (IndexError, RuntimeError): if DEBUG: print "FAIL:", row[0] continue return dbs def get_latest_release(account = None): """returns the number of the latest release based on the compara db""" names = get_db_name(account=account, db_type="compara") compara = [] for name in names: compara += [int(name.Release)] return str(max(compara)) if __name__ == "__main__": eaccount = get_ensembl_account(release='48') print get_db_name(account=eaccount, release="48", db_type='compara') print get_latest_release() PyCogent-1.5.3/cogent/db/ensembl/name.py000644 000765 000024 00000006137 12024702176 021033 0ustar00jrideoutstaff000000 000000 import re from species import Species __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" _release = re.compile("\d+") def get_version_from_name(name): """returns the release and build identifiers from an ensembl db_name""" r = _release.search(name) if r is None: return None, None # first number run is release, followed by build # note, for the ensemblgenomes naming system, the second digit run is the # standard Ensembl release and the first is for the specified genome release = name[r.start(): r.end()] b = [s for s in _name_delim.split(name[r.end():]) if s] return release, b _name_delim = re.compile("_") def get_dbtype_from_name(name): """returns the data base type from the name""" try: name = _release.split(name) name = [s for s in 
_name_delim.split(name[0]) if s] except TypeError, msg: print "Error:" print name, type(name), msg raise dbtype = None if name[0] == "ensembl": dbtype = name[1] else: dbtype = "_".join(name[2:]) return dbtype def get_db_prefix(name): """returns the db prefix, typically an organism or `ensembl'""" name = _name_delim.split(name) if name[0] == "ensembl": prefix = "ensembl" elif len(name) > 4: prefix = "_".join(name[:2]) else: raise RuntimeError("Unknown name structure: %s" % "_".join(name)) return prefix class EnsemblDbName(object): """container for a db name, inferring different attributes from the name, such as species, version, build""" def __init__(self, db_name): """db_name: and Emsembl database name""" if isinstance(db_name, EnsemblDbName): db_name = db_name.Name self.Name = db_name self.Type = get_dbtype_from_name(db_name) self.Prefix = get_db_prefix(db_name) release, build = get_version_from_name(db_name) self.Release = release self.GeneralRelease = self.Release if len(build) == 1: if self.Type != 'compara': self.Build = build[0] else: self.Build = None self.GeneralRelease = build[0] elif build: self.Build = build[1] self.GeneralRelease = build[0] else: self.Build = None self.Species = None self.Species = Species.getSpeciesName(self.Prefix) def __repr__(self): build = ['', "; Build='%s'" % self.Build][self.Build != None] s = "db(Prefix='%s'; Type='%s'; Release='%s'%s)" % (self.Prefix, self.Type, self.Release, build) return s def __str__(self): return self.Name def __cmp__(self, other): if isinstance(other, type(self)): other = other.Name return cmp(self.Name, other) def __hash__(self): return hash(self.Name) PyCogent-1.5.3/cogent/db/ensembl/region.py000644 000765 000024 00000146005 12024702176 021375 0ustar00jrideoutstaff000000 000000 import sys import sqlalchemy as sql from cogent import DNA from cogent.core.annotation import Feature from cogent.core.location import Map from cogent.util.table import Table from cogent.db.ensembl.util import LazyRecord, 
        asserted_one, DisplayString, \
        NoItemError
from cogent.db.ensembl.assembly import Coordinate, CoordSystem, \
        location_query, assembly_exception_coordinate
from cogent.db.ensembl.sequence import get_sequence

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"

DEFAULT_PARENT_LENGTH = 2**30

# some common string display formatting
_quoted = lambda x : DisplayString(x, with_quotes=True)
_limit_words = lambda x : DisplayString(x, with_quotes=True, num_words=3)

class _Region(LazyRecord):
    """a simple genomic region object"""
    Type = None

    def __init__(self):
        super(_Region, self).__init__()
        self._attr_ensembl_table_map = None
        self._location_column_prefix = 'seq_region_'

    def __len__(self):
        return len(self.Location)

    def __cmp__(self, other):
        return cmp(self.Location, other.Location)

    def _make_location(self):
        row = self._table_rows[self._attr_ensembl_table_map['Location']]
        if row is None:
            return
        seq_region_id = row['%sid' % self._location_column_prefix]
        start = row['%sstart' % self._location_column_prefix]
        end = row['%send' % self._location_column_prefix]
        strand = row['%sstrand' % self._location_column_prefix]
        seq_region_table = self.db.getTable('seq_region')
        query = sql.select([seq_region_table.c.name],
                seq_region_table.c.seq_region_id == seq_region_id)
        result = asserted_one(query.execute().fetchall())
        coord_name = result['name']
        coord = Coordinate(genome = self.genome, CoordName=coord_name,
                        Start=start, End=end, Strand=strand,
                        seq_region_id=seq_region_id, ensembl_coord=True)
        self._cached['Location'] = coord

    def _get_location_record(self):
        """makes the Location data"""
        if not self._attr_ensembl_table_map['Location'] in self._table_rows:
            # we use a bit of magic to figure out what method will be required
            # this magic assumes the method for obtaining a record from a table
            # are
named _get_tablename_record dep_record_func = getattr(self, '_get_%s_record' % \ self._attr_ensembl_table_map['Location']) dep_record_func() self._make_location() def _get_location(self): return self._get_cached_value('Location', self._get_location_record) Location = property(_get_location) def _get_sequence(self): if 'Seq' not in self._cached: try: seq = get_sequence(self.Location) except NoItemError: try: alt_loc = assembly_exception_coordinate(self.Location) seq = get_sequence(alt_loc) except NoItemError: seq = DNA.makeSequence("N"*len(self)) seq.Name = str(self.Location) self._cached['Seq'] = seq return self._cached['Seq'] Seq = property(_get_sequence) def _get_symbol(self): # override in subclasses return None Symbol = property(_get_symbol) def getFeatures(self, feature_types, where_feature=None): """queries the parent genome for feature types corresponding to this region where_feature: the returned region can either lie 'within' this region, 'overlap' this region, or 'span' this region""" return self.genome.getFeatures(self.Location, feature_types=feature_types, where_feature=where_feature) def _get_variants(self): """constructs the variants attribute""" if 'Variants' not in self._cached: variants = self.genome.getFeatures(feature_types='variation', region=self) self._cached['Variants'] = tuple(variants) return self._cached['Variants'] Variants = property(_get_variants) def featureData(self, parent_map): symbol = self.Symbol or getattr(self, 'StableId', '') assert not parent_map.Reverse feat_map = parent_map[self.Location.Start:self.Location.End] if feat_map.useful: if self.Location.Strand == -1: # this map is relative to + strand feat_map = feat_map.reversed() data = (self.Type, str(symbol), feat_map) else: data = None return data def getAnnotatedSeq(self, feature_types=None, where_feature=None): regions = list(self.getFeatures(feature_types = feature_types, where_feature = where_feature)) # seq_map is on the + strand, regardless the actual strand of 
sequence seq_map = Map(locations = [(self.Location.Start, self.Location.End)], parent_length = DEFAULT_PARENT_LENGTH) seq_map = seq_map.inverse() for region in regions: data = region.featureData(seq_map) if data is None: continue # this will consider the strand information of actual sequence feature_map = [data[-1], data[-1].nucleicReversed()][self.Location.Strand == -1] self.Seq.addAnnotation(Feature, data[0], data[1], feature_map) if region.Type == 'gene': # TODO: SHOULD be much simplified sub_data = region.subFeatureData(seq_map) for feature_type, feature_name, feature_map in sub_data: if self.Location.Strand == -1: # again, change feature map to -1 strand sequence if # needed. feature_map = feature_map.nucleicReversed() self.Seq.addAnnotation(Feature, feature_type, feature_name, feature_map) return self.Seq class GenericRegion(_Region): """a generic genomic region""" Type = 'generic_region' def __init__(self, genome, db, Location=None, CoordName=None, Start=None, End=None, Strand=1, ensembl_coord=False): super(GenericRegion, self).__init__() self.genome = genome self.db = db if Location is None and CoordName: self._get_seq_region_record(str(CoordName)) if End is not None: assert self._table_rows['seq_region']['length'] > End, \ 'Requested End[%s] too large' % End seq_region_id = self._table_rows['seq_region']['seq_region_id'] Location = Coordinate(genome=genome, CoordName=str(CoordName), Start=Start, End=End, Strand=Strand, seq_region_id=seq_region_id, ensembl_coord=ensembl_coord) if Location is not None: self._cached['Location'] = Location def __str__(self): my_type = self.__class__.__name__ return "%s(Species='%s'; CoordName='%s'; Start=%s; End=%s;"\ " length=%s; Strand='%s')" % (my_type, self.genome.Species, self.Location.CoordName, self.Location.Start, self.Location.End, len(self), '-+'[self.Location.Strand>0]) def _get_seq_region_record(self, CoordName): # override the _Region class method, since, we take the provided Start # etc .. 
attributes # CoordName comes from seq_region_table.c.name # matched, by coord_system_id, to default coord system seq_region_table = self.genome.db.getTable('seq_region') coord_systems = CoordSystem(core_db=self.genome.CoreDb) coord_system_ids = [k for k in coord_systems if not isinstance(k, str)] record = sql.select([seq_region_table], sql.and_(seq_region_table.c.name == CoordName, seq_region_table.c.coord_system_id.in_(coord_system_ids))) record = asserted_one(record.execute().fetchall()) self._table_rows['seq_region'] = record class _StableRegion(GenericRegion): """region with a stable_id""" _member_types = None def __init__(self, genome, db, **kwargs): super(_StableRegion, self).__init__(genome, db, **kwargs) def __repr__(self): my_type = self.__class__.__name__ return '%s(%s; %s)' % (my_type, self.genome.Species, self.StableId) def _get_record_for_stable_id(self): # subclasses need to provide a function for loading the correct # record for obtaining a stable_id table_name = self._attr_ensembl_table_map['StableId'] if self.genome.GeneralRelease >= 65: func_name = '_get_%s_record' % (table_name + '_stable_id') else: func_name = '_get_%s_record' % table_name func = getattr(self, func_name) func() attr_column_map = [('StableId', 'stable_id', _quoted)] self._populate_cache_from_record(attr_column_map, table_name) def _get_stable_id(self): return self._get_cached_value('StableId', self._get_record_for_stable_id) StableId = property(_get_stable_id) def getMember(self, StableId, member_types=None): """returns the associated member with matching StableId or None if not found. Arguments: - member_types: the property to be searched, depends on self.Type. 
Transcripts for genes, Exons/TranslatedExons for Transcripts.""" member_types = member_types or self._member_types if type(member_types) == str: member_types = [member_types] for member_type in member_types: member = getattr(self, member_type, None) if member is None: raise AttributeError,\ "%s doesn't have property %s" % (self.Type, member_type) for element in member: if element.StableId == StableId: return element return None class Gene(_StableRegion): """a gene region""" Type = 'gene' _member_types = ['Transcripts'] def __init__(self, genome, db, StableId=None, Symbol=None, Location=None, data=None): """constructed by a genome instance""" super(Gene, self).__init__(genome, db, Location=Location) self._attr_ensembl_table_map = dict(StableId=['gene_stable_id', 'gene'][genome.GeneralRelease >= 65], Symbol='xref', Description='gene', BioType='gene', Location='gene', CanonicalTranscript='gene', Transcripts='transcript', Exons='transcript') if data is None: args = [dict(StableId=StableId), dict(Symbol=Symbol)][StableId is None] assert args data = asserted_one(list(self.genome._get_gene_query(db, **args).execute())) for name, func in \ [('StableId',self._get_gene_stable_id_record), ('BioType', self._get_gene_record), ('Description', self._get_gene_record), ('Symbol', self._get_xref_record), ('Location', self._get_gene_record)]: if name == 'Symbol' and 'display_label' not in data.keys(): # For EST continue self._table_rows[self._attr_ensembl_table_map[name]] = data func() # this populates the attributes def __str__(self): my_type = self.__class__.__name__ vals = ['%s=%r' % (key, val) for key, val in self._cached.items() \ if val is not None] vals.sort() vals.insert(0, "Species='%s'" % self.genome.Species) return '%s(%s)' % (my_type, '; '.join(vals)) def __repr__(self): my_type = self.__class__.__name__ vals = ['%s=%r' % (key, val) for key, val in self._cached.items() \ if val is not None] vals.sort() vals.insert(0, 'Species=%r' % self.genome.Species) return '%s(%s)' % 
(my_type, '; '.join(vals)) def _get_gene_record(self): """adds the gene data to self._table_rows""" attr_column_map = [('BioType', 'biotype', _quoted), ('Status', 'status', _quoted), ('Description', 'description', _limit_words)] # we set all the attributes that derive from this self._populate_cache_from_record(attr_column_map, 'gene') return def _get_gene_stable_id_record(self): """adds the gene_stable_id data to self._table_rows""" attr_column_map = [('StableId', 'stable_id', _quoted)] self._populate_cache_from_record(attr_column_map, self._attr_ensembl_table_map['StableId']) return def _get_xref_record(self): attr_column_map = [('Symbol','display_label', _quoted)] self._populate_cache_from_record(attr_column_map, 'xref') return def _get_biotype(self): return self._get_cached_value('BioType', self._get_gene_record) BioType = property(_get_biotype) def _get_symbol(self): if 'xref' in self._table_rows: return self._get_cached_value('Symbol', self._get_xref_record) self._set_null_values(['Symbol']) return self._cached['Symbol'] Symbol = property(_get_symbol) def _get_description(self): return self._get_cached_value('Description', self._get_gene_record) Description = property(_get_description) def _get_status(self): return self._get_cached_value('Status', self._get_gene_record) Status = property(_get_status) def _make_canonical_transcript(self): if not 'gene' in self._table_rows: self._get_gene_record() canonical_id = self._table_rows['gene']['canonical_transcript_id'] transcript_table = self.db.getTable('transcript') query = sql.select([transcript_table], transcript_table.c.transcript_id==canonical_id) records = query.execute().fetchall() assert len(records) == 1,\ "wrong number of records from CanonicalTranscript" record = records[0] transcript = Transcript(self.genome, self.db, canonical_id, data=record) self._cached['CanonicalTranscript'] = transcript def _get_canonical_transcript(self): return self._get_cached_value('CanonicalTranscript', 
self._make_canonical_transcript) CanonicalTranscript = property(_get_canonical_transcript) def _make_transcripts(self): if not 'gene' in self._table_rows: self._get_gene_record() gene_id = self._table_rows['gene']['gene_id'] transcript_table = self.db.getTable('transcript') query = sql.select([transcript_table], transcript_table.c.gene_id==gene_id) records = query.execute().fetchall() if not records: self._set_null_values(['Transcripts'], 'transcript') return transcripts = [] for record in records: transcript_id = record['transcript_id'] transcripts.append(Transcript(self.genome, self.db, transcript_id, data=record)) self._cached['Transcripts'] = tuple(transcripts) def _get_transcripts(self): return self._get_cached_value('Transcripts', self._make_transcripts) Transcripts = property(_get_transcripts) def subFeatureData(self, parent_map): """returns data for making a cogent Feature. These can be automatically applied to the Seq by the getAnnotatedSeq method. Returns None if self lies outside parent's span. """ features = [] for transcript in self.Transcripts: transcript_data = transcript.featureData(parent_map) if transcript_data: features.append(transcript_data) data = transcript.subFeatureData(parent_map) features.extend(data) return features def getCdsLengths(self): """returns the Cds lengths from transcripts with the same biotype. 
returns None if no transcripts.""" if self.Transcripts is self.NULL_VALUE: return None l = [ts.getCdsLength() for ts in self.Transcripts if ts.BioType == self.BioType] return l def getLongestCdsTranscript(self): """returns the Transcript with the longest Cds and the same biotype""" result = sorted([(ts.getCdsLength(), ts) for ts in self.Transcripts if ts.BioType == self.BioType]) if result: # last result is longest result = result[-1][1] return result class Transcript(_StableRegion): Type = 'transcript' _member_types = ['Exons', 'TranslatedExons'] def __init__(self, genome, db, transcript_id, data, Location=None): """created by Gene""" super(Transcript, self).__init__(genome, db, Location=Location) self._attr_ensembl_table_map = dict(StableId=['transcript_stable_id', 'transcript'][genome.GeneralRelease >= 65], Location='transcript', Status = 'transcript', TranslatedExons='translation') self._am_prot_coding = None self.transcript_id = transcript_id self._table_rows['transcript'] = data self._set_transcript_record() def _set_transcript_record(self): attr_column_map = [('BioType', 'biotype', _quoted), ('Status', 'status', _quoted)] self._populate_cache_from_record(attr_column_map, 'transcript') self._am_prot_coding=self._cached['BioType'].lower()=='protein_coding' def _get_status(self): return self._cached['Status'] Status = property(_get_status) def _get_biotype(self): return self._cached['BioType'] BioType = property(_get_biotype) def _get_transcript_stable_id_record(self): table_name = self._attr_ensembl_table_map['StableId'] if table_name in self._table_rows: return transcript_id = self.transcript_id table = self.db.getTable(table_name) query = sql.select([table], table.c.transcript_id == transcript_id) record = asserted_one(query.execute()) self._table_rows[table_name] = record def _get_exon_transcript_records(self): transcript_id = self.transcript_id exon_transcript_table = self.db.getTable('exon_transcript') query = sql.select([exon_transcript_table], 
exon_transcript_table.c.transcript_id == transcript_id) records = query.execute() exons = [] for record in records: exons.append(Exon(self.genome, self.db, record['exon_id'], record['rank'])) exons.sort() self._cached['Exons'] = tuple(exons) def _get_exons(self): return self._get_cached_value('Exons', self._get_exon_transcript_records) Exons = property(_get_exons) def _get_intron_transcript_records(self): if len(self.Exons) < 2: self._set_null_values(["Introns"]) return exon_positions = [(exon.Location.Start, exon.Location.End) for exon in self.Exons] exon_positions.sort() end = exon_positions[-1][-1] exon_map = Map(locations=exon_positions, parent_length=end) intron_map = exon_map.shadow() intron_positions = [(span.Start, span.End) for span in intron_map.spans if span.Start != 0] chrom = self.Location.CoordName strand = self.Location.Strand introns = [] rank = 1 if strand == -1: intron_positions.reverse() for s, e in intron_positions: coord = self.genome.makeLocation(CoordName=chrom, Start=s, End=e, Strand=strand, ensembl_coord=False) introns.append(Intron(self.genome, self.db, rank, self.StableId, coord)) rank += 1 self._cached['Introns'] = tuple(introns) def _get_introns(self): return self._get_cached_value('Introns', self._get_intron_transcript_records) Introns = property(_get_introns) def _get_translation_record(self): transcript_id = self.transcript_id translation_table = self.db.getTable('translation') query = sql.select([translation_table], translation_table.c.transcript_id == transcript_id) try: record = asserted_one(query.execute()) except NoItemError: self._set_null_values(['TranslatedExons'], 'translation') return self._table_rows['translation'] = record def _get_transcript(self): self._get_translation_record() record = self._table_rows['translation'] if record == self.NULL_VALUE: return start_exon_id = record['start_exon_id'] end_exon_id = record['end_exon_id'] # because this will be used to shift a coord. 
Note: these are relative # to the exon start but ignore strand, so we have to decide whether # the coord shifts need to be flipped seq_start = record['seq_start'] - 1 seq_end = record['seq_end'] flip_coords = self.Exons[0].Location.Strand == -1 start_index = None end_index = None for index, exon in enumerate(self.Exons): if exon.exon_id == start_exon_id: start_index = index if exon.exon_id == end_exon_id: end_index = index assert None not in (start_index, end_index), \ 'Error in matching transcript and exons' start_exon = self.Exons[start_index] if start_index == end_index: shift_start=[seq_start,len(start_exon)-seq_end][flip_coords] shift_end = [seq_end-len(start_exon), -1*seq_start][flip_coords] else: shift_start = [seq_start, 0][flip_coords] shift_end = [0, -1*seq_start][flip_coords] coord = start_exon.Location.resized(shift_start, shift_end) DEBUG = False if DEBUG: out = ['\nseq_start=%d; seq_end=%d' %(seq_start, seq_end), 'shift_start=%d; shift_end=%d'%(shift_start, shift_end), 'len=%s'% len(coord)] sys.stderr.write('\n'.join(map(str,out))+'\n') new_start_exon = Exon(self.genome, self.db, start_exon.exon_id, start_exon.Rank, Location=coord) translated_exons=(new_start_exon,)+\ self.Exons[start_index+1:end_index] if start_index != end_index: end_exon = self.Exons[end_index] shift_start = [0, len(end_exon)-seq_end][flip_coords] shift_end = [seq_end-len(end_exon), 0][flip_coords] coord=end_exon.Location.resized(shift_start, shift_end) new_end_exon = Exon(self.genome, self.db, end_exon.exon_id, end_exon.Rank, Location=coord) translated_exons += (new_end_exon,) self._cached['TranslatedExons'] = translated_exons def _get_translated_exons(self): return self._get_cached_value('TranslatedExons', self._get_transcript) TranslatedExons = property(_get_translated_exons) def _calculate_Utr_exons(self): # TODO clean up this code exons = self.Exons translated_exons = self.TranslatedExons num_exons = len(self.Exons) if not translated_exons: 
self._set_null_values(["UntranslatedExons5", "UntranslatedExons3"]) return untranslated_5exons, untranslated_3exons = [], [] start_exon, end_exon = translated_exons[0], translated_exons[-1] flip_coords = start_exon.Location.Strand == -1 for exon in exons[0:start_exon.Rank]: # get 5'UTR coord = exon.Location.copy() if exon.StableId == start_exon.StableId: coord.Start = [coord.Start, start_exon.Location.End][flip_coords] coord.End = [start_exon.Location.Start, coord.End][flip_coords] if len(coord) != 0: untranslated_5exons.append(Exon(self.genome, self.db, exon.exon_id, exon.Rank, Location = coord)) for exon in exons[end_exon.Rank -1: num_exons]: # get 3'UTR coord = exon.Location.copy() if exon.StableId == end_exon.StableId: coord.Start = [end_exon.Location.End, coord.Start][flip_coords] coord.End = [coord.End, end_exon.Location.Start][flip_coords] if len(coord) != 0: untranslated_3exons.append(Exon(self.genome, self.db, exon.exon_id, exon.Rank, Location = coord)) self._cached["UntranslatedExons5"] = tuple(untranslated_5exons) self._cached["UntranslatedExons3"] = tuple(untranslated_3exons) def _get_5prime_untranslated_exons(self): return self._get_cached_value("UntranslatedExons5", self._calculate_Utr_exons) UntranslatedExons5 = property(_get_5prime_untranslated_exons) def _get_3prime_untranslated_exons(self): return self._get_cached_value("UntranslatedExons3", self._calculate_Utr_exons) UntranslatedExons3 = property(_get_3prime_untranslated_exons) def _make_utr_seq(self): if self.UntranslatedExons5 is None and self.UntranslatedExons3 is None: self._cached["Utr5"] = self.NULL_VALUE self._cached["Utr3"] = self.NULL_VALUE return Utr5_seq, Utr3_seq = DNA.makeSequence(""), DNA.makeSequence("") for exon in self.UntranslatedExons5: Utr5_seq += exon.Seq for exon in self.UntranslatedExons3: Utr3_seq += exon.Seq self._cached["Utr5"] = Utr5_seq self._cached["Utr3"] = Utr3_seq def _get_utr5_seq(self): return self._get_cached_value("Utr5", self._make_utr_seq) Utr5 = 
property(_get_utr5_seq) def _get_utr3_seq(self): return self._get_cached_value("Utr3", self._make_utr_seq) Utr3 = property(_get_utr3_seq) def _make_cds_seq(self): if self.Exons is self.NULL_VALUE: self._cached['Cds'] = self.NULL_VALUE return exons = [self.Exons, self.TranslatedExons][self._am_prot_coding] full_seq = None for exon in exons: if full_seq is None: full_seq = exon.Seq continue full_seq += exon.Seq # check first exon PhaseStart is 0 and last exon PhaseEnd if exons[0].PhaseStart > 0: fill = DNA.makeSequence('N' * exons[0].PhaseStart, Name=full_seq.Name) full_seq = fill + full_seq if exons[-1].PhaseEnd > 0: fill = DNA.makeSequence('N' * exons[-1].PhaseEnd, Name=full_seq.Name) full_seq += fill self._cached['Cds'] = full_seq def _get_cds(self): return self._get_cached_value('Cds', self._make_cds_seq) Cds = property(_get_cds) def getCdsLength(self): """returns the length of the Cds. If this property is not available, returns None.""" if self.Cds is self.NULL_VALUE: return None exons = [self.Exons, self.TranslatedExons][self._am_prot_coding] return sum(map(len, exons)) def _make_protein_seq(self): if not self._am_prot_coding or self.Cds is self.NULL_VALUE: self._cached['ProteinSeq'] = self.NULL_VALUE return DEBUG = False # enforce multiple of 3 cds = self.Cds length = len(cds) cds = cds[: length - (length % 3)] try: cds = cds.withoutTerminalStopCodon() except AssertionError: if not DEBUG: raise out = ['\n****\nFAILED=%s' % self.StableId] for exon in self.TranslatedExons: out += ['TranslatedExon[rank=%d]\n' % exon.Rank, exon, exon.Location, '%s ... 
%s'%(exon.Seq[:20],exon.Seq[-20:])] sys.stderr.write('\n'.join(map(str, out))+'\n') raise self._cached['ProteinSeq'] = cds.getTranslation() def _get_protein_seq(self): return self._get_cached_value('ProteinSeq', self._make_protein_seq) ProteinSeq = property(_get_protein_seq) def _get_exon_feature_data(self, parent_map): """returns the exon feature data""" features = [] if self.Exons is self.NULL_VALUE: return features for exon in self.Exons: feature_data = exon.featureData(parent_map) if feature_data is None: continue features.append(feature_data) return features def _get_intron_feature_data(self, parent_map): """return the intron feature data""" features = [] if self.Introns is self.NULL_VALUE: return features for intron in self.Introns: feature_data = intron.featureData(parent_map) if feature_data is None: continue features.append(feature_data) return features def _get_translated_exon_feature_data(self, parent_map): """returns featureD data for translated exons""" features = [] if self.TranslatedExons is self.NULL_VALUE: return features cds_spans = [] for exon in self.TranslatedExons: feature_data = exon.featureData(parent_map) if feature_data is None: continue cds_spans.extend(feature_data[-1].spans) if cds_spans: # TODO: check strand cds_map = Map(spans = cds_spans, parent_length = parent_map.parent_length) features.append(('CDS', str(self.StableId), cds_map)) return features def _get_Utr_feature_data(self, parent_map): # TODO: Simplify this part features = [] utr5_spans, utr3_spans = [], [] for exon in self.UntranslatedExons5: feature_data = exon.featureData(parent_map) if feature_data is None: continue utr5_spans.extend(feature_data[-1].spans) for exon in self.UntranslatedExons3: feature_data = exon.featureData(parent_map) if feature_data is None: continue utr3_spans.extend(feature_data[-1].spans) if utr5_spans: utr5_map = Map(spans = utr5_spans, parent_length = parent_map.parent_length) features.append(("5'UTR", str(self.StableId), utr5_map)) if utr3_spans: 
utr3_map = Map(spans = utr3_spans, parent_length = parent_map.parent_length) features.append(("3'UTR", str(self.StableId), utr3_map)) return features def subFeatureData(self, parent_map): """returns data for making a cogent Feature. This can be automatically applied to the Seq by the getAnnotatedSeq method. Returns None if self lies outside parent's span. """ features = self._get_exon_feature_data(parent_map) features += self._get_intron_feature_data(parent_map) features += self._get_translated_exon_feature_data(parent_map) if self.TranslatedExons: features += self._get_Utr_feature_data(parent_map) return features class Exon(_StableRegion): Type = 'exon' def __init__(self, genome, db, exon_id, Rank, Location=None): """created by a Gene""" _StableRegion.__init__(self, genome, db, Location=Location) self._attr_ensembl_table_map = dict(StableId=['exon_stable_id', 'exon'][genome.GeneralRelease >= 65], Location='exon') self.exon_id = exon_id self.Rank = Rank def __str__(self): return self.__repr__() def __repr__(self): my_type = self.__class__.__name__ return '%s(StableId=%s, Rank=%s)' % (my_type, self.StableId, self.Rank) def __cmp__(self, other): return cmp(self.Rank, other.Rank) def _get_exon_stable_id_record(self): if self.genome.GeneralRelease >= 65: # release >= 65, data is just in the exon table self._get_exon_record() return table_name = self._attr_ensembl_table_map['StableId'] exon_stable_id_table = self.db.getTable(table_name) query = sql.select([exon_stable_id_table.c.stable_id], exon_stable_id_table.c.exon_id == self.exon_id) records = query.execute() record = asserted_one(records.fetchall()) self._table_rows[table_name] = record def _get_exon_record(self): # this will be called by _Region parent class to make the location exon_table = self.db.getTable('exon') query = sql.select([exon_table], exon_table.c.exon_id == self.exon_id) records = query.execute() record = asserted_one(records.fetchall()) self._table_rows['exon'] = record def _make_symbol(self): 
self._cached['Symbol'] = '%s-%s' % (self.StableId, self.Rank) def _get_symbol(self): return self._get_cached_value('Symbol', self._make_symbol) Symbol = property(_get_symbol) def _make_phase(self): """creates the exon phase attributes""" if 'exon' not in self._table_rows: self._get_exon_record() exon = self._table_rows['exon'] self._cached['PhaseStart'] = exon['phase'] self._cached['PhaseEnd'] = exon['end_phase'] @property def PhaseStart(self): """reading frame start for this exon""" return self._get_cached_value('PhaseStart', self._make_phase) @property def PhaseEnd(self): """reading frame end for this exon""" return self._get_cached_value('PhaseEnd', self._make_phase) class Intron(GenericRegion): Type = 'intron' def __init__(self, genome, db, rank, transcript_stable_id, Location=None): GenericRegion.__init__(self, genome, db, Location=Location) self.TranscriptStableId = transcript_stable_id self.Rank = rank def __str__(self): return self.__repr__() def __repr__(self): my_type = self.__class__.__name__ return '%s(TranscriptId=%s, Rank=%s)' % (my_type, self.TranscriptStableId, self.Rank) def _make_symbol(self): self._cached['Symbol'] = '%s-%s'%(self.TranscriptStableId, self.Rank) def _get_symbol(self): return self._get_cached_value('Symbol', self._make_symbol) Symbol = property(_get_symbol) class Est(Gene): """an EST region""" Type = 'est' def _set_to_string(val): if type(val) in (str, type(None)): return val val = list(val) while len(val) == 1 and type(val) in (tuple, list): val = val[0] return val class Variation(_Region): """genomic variation""" Type = 'variation' def __init__(self, genome, db=None, Effect=None, Symbol=None, data=None): self.genome = genome get_table = genome.VarDb.getTable self.variation_feature_table = get_table('variation_feature') self.transcript_variation_table = get_table('transcript_variation') self.flanking_sequence_table = get_table('flanking_sequence') self.allele_table = get_table('allele') try: self.allele_code_table = 
get_table('allele_code') except sql.exceptions.ProgrammingError: self.allele_code_table = None super(Variation, self).__init__() self._attr_ensembl_table_map = dict(Effect='variation_feature', Symbol='variation_feature', Validation='variation_feature', MapWeight='variation_feature', FlankingSeq='flanking_sequence', PeptideAlleles='transcript_variation', TranslationLocation='transcript_variation', Location='variation_feature', AlleleFreqs='allele') assert data is not None, 'Variation record created in an unusual way' for name, value, func in \ [('Effect',Effect,self._get_variation_table_record), ('Symbol',Symbol,self._get_variation_table_record)]: if value is not None: self._table_rows[self._attr_ensembl_table_map[name]] = data if func is not None: func() # this populates the attributes self.db = db or self.genome.CoreDb def __len__(self): """return the length of the longest allelic variant""" return max(map(len, self._split_alleles())) def __str__(self): my_type = self.__class__.__name__ return "%s(Symbol=%r; Effect=%r; Alleles=%r)" % \ (my_type, self.Symbol, self.Effect, self.Alleles) def _get_variation_table_record(self): consequence_type = 'consequence_type' if self.genome.GeneralRelease > 67: consequence_type += 's' # change to plural column name attr_name_map = [('Effect', consequence_type, _set_to_string), ('Alleles', 'allele_string', _quoted), ('Symbol', 'variation_name', _quoted), ('Validation', 'validation_status', _set_to_string), ('MapWeight', 'map_weight', int)] self._populate_cache_from_record(attr_name_map, 'variation_feature') # TODO handle obtaining the variation_feature if we were created in # any way other than through the Symbol or Effect def _get_seq_region_record(self, seq_region_id): # should this be on a parent class? or a generic function in assembly? 
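A note on the lookup pattern used just below (and throughout this module): single-row queries are funnelled through `asserted_one()`, imported from `cogent.db.ensembl.util`, which is expected to return the sole record of a result set and raise otherwise (elsewhere in this file a `NoItemError` is caught when no row matches). The following stand-alone sketch, with a hypothetical name so it does not shadow the real helper, illustrates that contract on plain Python iterables.

```python
# Illustrative sketch only -- asserted_one_demo is NOT part of PyCogent;
# it mimics the assumed contract of cogent.db.ensembl.util.asserted_one.
def asserted_one_demo(items):
    """return the single element of items; raise unless exactly one exists"""
    items = list(items)
    if len(items) != 1:
        raise RuntimeError("expected exactly one record, got %d" % len(items))
    return items[0]
```

For example, `asserted_one_demo(query.execute().fetchall())` would return the one matching `seq_region` row, or raise if the id were missing or ambiguous.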
seq_region_table = self.db.getTable('seq_region') query = sql.select([seq_region_table], seq_region_table.c.seq_region_id == seq_region_id) record = asserted_one(query.execute()) return record def _get_flanking_seq_data(self): # maps to flanking_sequence through variation_feature_id # if this fails, we grab from genomic sequence variation_id = self._table_rows['variation_feature']['variation_id'] flanking_seq_table = self.flanking_sequence_table query = sql.select([flanking_seq_table], flanking_seq_table.c.variation_id == variation_id) record = asserted_one(query.execute()) self._table_rows['flanking_sequence'] = record up_seq = record['up_seq'] down_seq = record['down_seq'] # the following two lines are because -- wait for it -- someone has # entered the string 'NULL' instead of NULL in the MySQL tables!!! up_seq = [up_seq, None][up_seq == 'NULL'] down_seq = [down_seq, None][down_seq == 'NULL'] seqs = dict(up=up_seq, down=down_seq) for name, seq in seqs.items(): if seq is not None: seq = DNA.makeSequence(seq) else: resized = [(-301, -1), (1, 301)][name == 'down'] if self.Location.Strand == -1: resized = [(1, 301), (-301, -1)][name == 'down'] flank = self.Location.resized(*resized) flanking = self.genome.getRegion(region=flank) seq = flanking.Seq seqs[name] = seq self._cached[('FlankingSeq')] = (seqs['up'][-300:],seqs['down'][:300]) def _get_flanking_seq(self): return self._get_cached_value('FlankingSeq', self._get_flanking_seq_data) FlankingSeq = property(_get_flanking_seq) def _get_effect(self): return self._get_cached_value('Effect', self._get_variation_table_record) Effect = property(_get_effect) def _get_alleles(self): return self._get_cached_value('Alleles', self._get_variation_table_record) Alleles = property(_get_alleles) def _get_allele_table_record(self): variation_id = self._table_rows['variation_feature']['variation_id'] allele_table = self.allele_table query = sql.select([allele_table], allele_table.c.variation_id == variation_id) records = [r for r in 
query.execute()] if len(records) == 0: self._cached[('AlleleFreqs')] = self.NULL_VALUE return # property change from >= 65, allele ids need to be looked up in # the allele_code table allele_code = self.allele_code_table self._table_rows['allele_table'] = records data = [] for rec in records: if not rec['sample_id']: continue if allele_code is None: allele = rec['allele'] else: allele_query = sql.select([allele_code.c.allele], allele_code.c.allele_code_id == rec['allele_code_id']) allele = list(allele_query.execute())[0][0] data.append((allele, rec['frequency'], rec['sample_id'])) if not data: self._cached[('AlleleFreqs')] = self.NULL_VALUE return table = Table(header='allele freq sample_id'.split(), rows=data) self._cached[('AlleleFreqs')] = table.sorted(['sample_id', 'allele']) def _get_allele_freqs(self): return self._get_cached_value('AlleleFreqs', self._get_allele_table_record) AlleleFreqs = property(_get_allele_freqs) def _get_symbol(self): return self._get_cached_value('Symbol', self._get_variation_table_record) Symbol = property(_get_symbol) def _get_validation(self): return self._get_cached_value('Validation', self._get_variation_table_record) Validation = property(_get_validation) def _get_map_weight(self): return self._get_cached_value('MapWeight', self._get_variation_table_record) MapWeight = property(_get_map_weight) def _get_transcript_record(self): if not 'variation_feature' in self._table_rows: raise NotImplementedError try: effects = [self.Effect.lower()] except AttributeError: effects = [v.lower() for v in self.Effect] effects = set(effects) nsyn = set(('non_synonymous_coding', 'non_synonymous_codon')) if not effects & nsyn: self._cached['PeptideAlleles'] = self.NULL_VALUE self._cached['TranslationLocation'] = self.NULL_VALUE return table_name = self._attr_ensembl_table_map['PeptideAlleles'] loc = lambda x: int(x)-1 # column name changed between releases, so we check to see which # one is being used for this instance and set the column strings # 
TODO can we modify the table on loading? This would give better # performance. if self.genome.VarDb.tableHasColumn(table_name, 'pep_allele_string'): pep_allele_string = 'pep_allele_string' consequence_type = 'consequence_types' else: pep_allele_string = 'peptide_allele_string' consequence_type = 'consequence_type' attr_column_map = [ ('PeptideAlleles', pep_allele_string, _quoted), ('TranslationLocation', 'translation_start', loc)] if table_name in self._table_rows: self._populate_cache_from_record(attr_column_map, table_name) return var_feature_record = self._table_rows['variation_feature'] var_feature_id = var_feature_record['variation_feature_id'] table = self.genome.VarDb.getTable(table_name) self_effect = set([self.Effect,[self.Effect]][type(self.Effect)==str]) query = sql.select([table.c.variation_feature_id, table.columns[pep_allele_string], table.c.translation_start, table.columns[consequence_type]], sql.and_(table.c.variation_feature_id == var_feature_id, table.columns[pep_allele_string] != None)) records = query.execute().fetchall() pep_alleles = [] translation_location = [] for record in records: if not record[consequence_type] & self_effect: continue pep_alleles += [record[pep_allele_string]] translation_location += [record['translation_start']] if not pep_alleles: print 'Expected at least a single record' raise RuntimeError # we only want unique allele strings allele_location = dict(zip(pep_alleles, translation_location)) pep_alleles = list(set(pep_alleles)) pep_alleles = [pep_alleles, pep_alleles[0]][len(pep_alleles)==1] if type(pep_alleles) != str: for pep_allele in pep_alleles: translation_location = allele_location[pep_allele] else: translation_location = allele_location[pep_alleles] self._table_rows[table_name] = dict(pep_allele_string=pep_alleles, translation_start=translation_location) self._populate_cache_from_record(attr_column_map, table_name) def _get_peptide_variation(self): return self._get_cached_value('PeptideAlleles', 
self._get_transcript_record) PeptideAlleles = property(_get_peptide_variation) def _get_translation_location(self): return self._get_cached_value('TranslationLocation', self._get_transcript_record) TranslationLocation = property(_get_translation_location) def _split_alleles(self): return self.Alleles.split('/') def _get_number_alleles(self): result = self._split_alleles() return len(result) NumAlleles = property(_get_number_alleles) class CpGisland(GenericRegion): Type = 'CpGisland' def __init__(self, genome, db, Location, Score): super(CpGisland, self).__init__(genome=genome, db=db, Location=Location) self.Score = Score def __str__(self): my_type = self.__class__.__name__ return "%s(CoordName='%s'; Start=%s; End=%s; length=%s;"\ " Strand='%s', Score=%.1f)" % (my_type, self.Location.CoordName, self.Location.Start, self.Location.End, len(self), '-+'[self.Location.Strand>0], self.Score) class Repeat(GenericRegion): Type = 'repeat' def __init__(self, genome, db, Location, Score, data): super(Repeat, self).__init__(genome=genome, db=db, Location=Location) self._attr_ensembl_table_map = dict(Symbol='repeat_consensus', RepeatType='repeat_consensus', RepeatClass='repeat_consensus', Consensus='repeat_consensus') self.Score = Score # assume always created from repeat_feature table self._table_rows['repeat_feature']= data def __str__(self): my_type = self.__class__.__name__ return "%s(CoordName='%s'; Start=%s; End=%s; length=%s;"\ " Strand='%s', Score=%.1f)" % (my_type, self.Location.CoordName, self.Location.Start, self.Location.End, len(self), '-+'[self.Location.Strand>0], self.Score) def _get_repeat_consensus_record(self): repeat_consensus_table = self.db.getTable('repeat_consensus') repeat_consensus_id = self._table_rows['repeat_feature']['repeat_consensus_id'] record = sql.select([repeat_consensus_table], repeat_consensus_table.c.repeat_consensus_id == repeat_consensus_id) record = asserted_one(record.execute().fetchall()) self._table_rows['repeat_consensus'] = record 
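The `Symbol`, `Validation`, `MapWeight`, and related properties above all route through `LazyRecord._get_cached_value`, which runs a fetch function on first access and serves later accesses from `self._cached`. A minimal standalone sketch of that pattern (class and attribute names here are illustrative, not PyCogent's API):

```python
class LazyRecordSketch(object):
    """Computes attribute values lazily; the fetch runs only on first access."""

    def __init__(self):
        self._cached = {}
        self.fetch_count = 0  # instrumentation for the demo only

    def _get_cached_value(self, attr_name, get_attr_func):
        if attr_name not in self._cached:
            get_attr_func()  # expected to populate self._cached[attr_name]
        return self._cached[attr_name]

    def _get_score_record(self):
        # stands in for a database query that fills attributes in one go
        self.fetch_count += 1
        self._cached['Score'] = 42

    Score = property(lambda self: self._get_cached_value(
        'Score', self._get_score_record))

rec = LazyRecordSketch()
print(rec.Score, rec.Score, rec.fetch_count)  # second access hits the cache
```

The payoff in PyCogent is that one table query can populate several cached attributes at once, so touching any one property avoids repeated round-trips to the Ensembl MySQL server.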
        limit_length = lambda x: DisplayString(x, repr_length=10)
        attr_column_map = [('Symbol', 'repeat_name', _quoted),
                           ('RepeatClass', 'repeat_class', _quoted),
                           ('RepeatType', 'repeat_type', _quoted),
                           ('Consensus', 'repeat_consensus', limit_length)]
        self._populate_cache_from_record(attr_column_map, 'repeat_consensus')

    def _get_symbol(self):
        return self._get_cached_value('Symbol',
                                      self._get_repeat_consensus_record)

    Symbol = property(_get_symbol)

    def _get_repeat_class(self):
        return self._get_cached_value('RepeatClass',
                                      self._get_repeat_consensus_record)

    RepeatClass = property(_get_repeat_class)

    def _get_repeat_type(self):
        return self._get_cached_value('RepeatType',
                                      self._get_repeat_consensus_record)

    RepeatType = property(_get_repeat_type)

    def _get_consensus(self):
        return self._get_cached_value('Consensus',
                                      self._get_repeat_consensus_record)

    Consensus = property(_get_consensus)

PyCogent-1.5.3/cogent/db/ensembl/related_region.py

from pprint import pprint

import sqlalchemy as sql

from cogent import DNA
from cogent.core.alignment import SequenceCollection, Alignment, Aligned
from cogent.parse import cigar
from cogent.db.ensembl.util import LazyRecord, asserted_one, NoItemError
from cogent.db.ensembl.assembly import location_query
from cogent.db.ensembl.species import Species

__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"

class _RelatedRegions(LazyRecord):
    # a basic related region, capable of providing the sequences
    # obtaining SyntenicRegions -- for getting aligned blocks -- is delegated
    # to compara
    Type = None

    def __init__(self):
        super(_RelatedRegions, self).__init__()

    def __str__(self):
        # temporary string method, just to demo correct assembly
        # TODO StableID, Species and Description
my_type = self.__class__.__name__ data = map(repr, self.Members) data.insert(0, '%s(' % my_type) data.append(')') return "\n\t".join(data) def getSeqCollection(self, feature_types=None, where_feature=None): """returns a SequenceCollection instance of the unaligned sequences""" seqs = [] for member in self.Members: if feature_types: seq = member.getAnnotatedSeq(feature_types,where_feature) else: seq = member.Seq if seq is None: continue seqs.append((seq.Name, seq)) return SequenceCollection(data=seqs, MolType=DNA) def getSeqLengths(self): """returns a vector of lengths""" return [len(member) for member in self.Members] def getSpeciesSet(self): """returns the latin names of self.Member species as a set""" return set([m.Location.Species for m in self.Members]) class RelatedGenes(_RelatedRegions): Type = 'related_genes' def __init__(self, compara, Members, Relationships): super(RelatedGenes, self).__init__() self.compara = compara self.Members = tuple(Members) self.Relationships = Relationships def __str__(self): my_type = self.__class__.__name__ display = ['%s:' % my_type, ' Relationships=%s' % self.Relationships] display += [' %s' % m for m in self.Members] return '\n'.join(display) def __repr__(self): return self.__str__() def getMaxCdsLengths(self): """returns the vector of maximum Cds lengths from member transcripts""" return [max(member.getCdsLengths()) for member in self.Members] class SyntenicRegion(LazyRecord): """a class that takes the genome, compara instances and is used to build Aligned sequences for Ensembl multiple alignments""" def __init__(self, parent, genome, identifiers_values, am_ref_member, Location=None): # create with method_link_species_set_id, at least, in # identifiers_values super(SyntenicRegion, self).__init__() self.parent = parent self.compara = parent.compara self.genome = genome self.am_ref_member = am_ref_member self.aln_map = None self.aln_loc = None self._make_map_func = [self._make_map_from_ref, self._make_ref_map][am_ref_member] if 
Location is not None: if hasattr(Location, 'Location'): # likely to be a feature region region = Location else: region = genome.getRegion(region=Location) self._cached['Region'] = region for identifier, value in dict(identifiers_values).items(): self._cached[identifier] = value def __len__(self): return len(self._get_cached_value('Region', self._make_map_func)) def _get_location(self): region = self._get_cached_value('Region', self._make_map_func) return region.Location Location = property(_get_location) def _get_region(self): region = self._get_cached_value('Region', self._make_map_func) return region Region = property(_get_region) def _get_cigar_record(self): genomic_align_table = \ self.parent.compara.ComparaDb.getTable('genomic_align') query = sql.select([genomic_align_table.c.cigar_line], genomic_align_table.c.genomic_align_id == \ self._cached['genomic_align_id']) record = asserted_one(query.execute()) self._cached['cigar_line'] = record['cigar_line'] return record def _get_cigar_line(self): return self._get_cached_value('cigar_line', self._get_cigar_record) cigar_line = property(_get_cigar_line) def _make_ref_map(self): if self.aln_map and self.aln_loc is not None: return ref_record = self._cached record_start = ref_record['dnafrag_start'] record_end = ref_record['dnafrag_end'] record_strand = ref_record['dnafrag_strand'] block_loc = self.genome.makeLocation(CoordName=ref_record['name'], Start=record_start, End=record_end, Strand=record_strand, ensembl_coord=True) ref_location = self.parent.ref_location relative_start = ref_location.Start-block_loc.Start relative_end = relative_start + len(ref_location) if block_loc.Strand != 1: relative_start = len(block_loc) - relative_end relative_end = relative_start + len(ref_location) aln_map, aln_loc = cigar.slice_cigar(self.cigar_line, relative_start, relative_end, by_align = False) self.aln_map = aln_map self.aln_loc = aln_loc region_loc = ref_location.copy() region_loc.Strand = block_loc.Strand region = 
            self.genome.getRegion(region=region_loc)
        self._cached['Region'] = region

    def _make_map_from_ref(self):
        # this is the 'other' species
        if self.aln_loc and self.aln_map is not None:
            return
        record = self._cached
        try:
            aln_map, aln_loc = cigar.slice_cigar(self.cigar_line,
                                                 self.parent.CigarStart,
                                                 self.parent.CigarEnd,
                                                 by_align=True)
            self.aln_map = aln_map
            self.aln_loc = aln_loc  # probably unnecessary to store??
            # we make a loc for the aligned region
            block_loc = self.genome.makeLocation(CoordName=record['name'],
                                        Start=record['dnafrag_start'],
                                        End=record['dnafrag_end'],
                                        Strand=record['dnafrag_strand'],
                                        ensembl_coord=True)
            relative_start = aln_loc[0]
            relative_end = aln_loc[1]
            # new location with correct length
            loc = block_loc.copy()
            loc.End = loc.Start + (relative_end - relative_start)
            if block_loc.Strand != 1:
                shift = len(block_loc) - relative_end
            else:
                shift = relative_start
            loc = loc.shifted(shift)
            region = self.genome.getRegion(region=loc)
        except IndexError:
            # TODO ask Hua where these index errors occur
            region = None
        self._cached['Region'] = region

    def _make_aligned(self, feature_types=None, where_feature=None):
        if self.aln_loc is None or self.aln_map is None:  # is this required?
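Both map-building methods above lean on `cigar.slice_cigar` to translate between ungapped sequence positions and alignment columns. A simplified, self-contained sketch of the underlying idea follows; this toy parser is an illustration of the coordinate mapping, not the `cogent.parse.cigar` API (which also builds Map objects):

```python
import re

def expand_cigar(cigar_line):
    """Yield (op, length) pairs from an Ensembl-style cigar line.
    'M' consumes both a sequence residue and an alignment column;
    'D' marks a gap in this sequence (column only). A bare letter
    with no count means length 1."""
    for count, op in re.findall(r'(\d*)([MD])', cigar_line):
        yield op, int(count) if count else 1

def seq_to_align_coords(cigar_line, start, end):
    """Map an ungapped-sequence interval [start, end) to the span of
    alignment columns it occupies (roughly the by_align=False case)."""
    seq_pos = col = 0
    columns = []
    for op, length in expand_cigar(cigar_line):
        if op == 'M':
            for _ in range(length):
                if start <= seq_pos < end:
                    columns.append(col)
                seq_pos += 1
                col += 1
        else:  # 'D': this sequence is gapped here, advance columns only
            col += length
    return (columns[0], columns[-1] + 1) if columns else None

# residues 0-4 aligned, a 2-column gap, then residues 5-7 aligned
print(seq_to_align_coords('5M2D3M', 3, 7))  # (3, 9)
```

The real function additionally handles the reverse direction (alignment columns to sequence spans), which is what `_make_map_from_ref` needs when projecting the reference member's slice onto the other genomes.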
self._make_map_func() region = self._cached['Region'] if region is None: self._cached['AlignedSeq'] = None return if feature_types: seq = region.getAnnotatedSeq(feature_types, where_feature) else: seq = region.Seq # we get the Seq objects to allow for copying of their annotations gapped_seq = Aligned(self.aln_map, seq) self._cached['AlignedSeq'] = gapped_seq def _get_aligned_seq(self): aligned = self._get_cached_value('AlignedSeq', self._make_aligned) return aligned AlignedSeq = property(_get_aligned_seq) def getAnnotatedAligned(self, feature_types, where_feature=None): """returns aligned seq annotated for the specified feature types""" region = self._get_cached_value('Region', self._make_map_func) if region is None: return None self._make_aligned(feature_types=feature_types, where_feature=where_feature) return self.AlignedSeq class SyntenicRegions(_RelatedRegions): Type = 'syntenic_regions' def __init__(self, compara, Members, ref_location): super(SyntenicRegions, self).__init__() self.compara = compara members = [] ref_member = None self.ref_location = ref_location for genome, data in Members: if genome is ref_location.genome: ref_member = SyntenicRegion(self, genome, dict(data), am_ref_member=True, Location=ref_location) else: members += [SyntenicRegion(self, genome, dict(data), am_ref_member=False)] assert ref_member is not None, "Can't match a member to ref_location" self.ref_member = ref_member self.Members = tuple([ref_member] + members) self.NumMembers = len(self.Members) self.aln_loc = None self._do_rc = None def __str__(self): my_type = self.__class__.__name__ display = ['%s:' % my_type] display += [' %r' % m.Location for m in self.Members \ if m.Region is not None] return '\n'.join(display) def __repr__(self): return self.__str__() def _populate_ref(self): """near (don't actually get the sequence) completes construction of ref sequence""" self.ref_member._make_map_func() self._cached['CigarStart'] = self.ref_member.aln_loc[0] self._cached['CigarEnd'] = 
self.ref_member.aln_loc[1] def _get_rc_state(self): """determines whether the ref_member strand is the same as that from the align block, if they diff we will rc the alignment, seqs, seq_names""" if self._do_rc is not None: return self._do_rc self._populate_ref() inferred = self.ref_member._cached['Region'].Location.Strand self._do_rc = self.ref_location.Strand != inferred return self._do_rc _rc = property(fget=_get_rc_state) def __len__(self): return self.CigarEnd - self.CigarStart def _get_ref_start(self): return self._get_cached_value('CigarStart', self._populate_ref) CigarStart = property(_get_ref_start) def _get_ref_end(self): return self._get_cached_value('CigarEnd', self._populate_ref) CigarEnd = property(_get_ref_end) def getAlignment(self, feature_types=None, where_feature=None, omit_redundant=True): """Arguments: - feature_types: annotations to be applied to the returned sequences - omit_redundant: exclude redundant gap positions""" seqs = [] annotations = {} for member in self.Members: if feature_types: seq = member.getAnnotatedAligned(feature_types, where_feature) else: seq = member.AlignedSeq if seq is None: continue name = seq.Name if self._rc: # names should reflect change to strand loc = member.Location.copy() loc.Strand *= -1 name = str(loc) annotations[name] = seq.data.annotations seq.Name = seq.data.Name = name seqs += [(name, seq)] if seqs is None: return None aln = Alignment(data=seqs, MolType=DNA) if self._rc: aln = aln.rc() if omit_redundant: aln = aln.filtered(lambda x: set(x) != set('-')) return aln PyCogent-1.5.3/cogent/db/ensembl/sequence.py000644 000765 000024 00000012645 12024702176 021724 0ustar00jrideoutstaff000000 000000 import sqlalchemy as sql from cogent import DNA from cogent.core.location import Map from cogent.db.ensembl.species import Species from cogent.db.ensembl.util import NoItemError, asserted_one from cogent.db.ensembl.assembly import CoordSystem, Coordinate, \ get_coord_conversion __author__ = "Gavin Huttley" 
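The `getAlignment` method above discards redundant columns with `aln.filtered(lambda x: set(x) != set('-'))`, i.e. any column made up entirely of gaps. The same filter can be sketched over plain strings (the helper name is illustrative, not PyCogent's API):

```python
def drop_all_gap_columns(seqs, gap='-'):
    """Remove alignment columns in which every sequence has a gap.

    A column survives when the set of its characters differs from
    {gap}, the same predicate used by Alignment.filtered above."""
    columns = zip(*seqs)  # transpose rows -> columns
    kept = [col for col in columns if set(col) != set(gap)]
    return [''.join(row) for row in zip(*kept)]  # transpose back

aln = ['AC--GT',
       'A---GT',
       'ACT-GT']
print(drop_all_gap_columns(aln))  # ['AC-GT', 'A--GT', 'ACTGT']
```

Columns that mix gaps with residues (like column 3 here) are kept, since removing them would delete sequence data from at least one member.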
__copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" # local reference to the sqlalchemy substring function substr = sql.sql.func.substr def _assemble_seq(frags, start, end, frag_positions): """returns a single string in which missing sequence is replaced by 'N'""" prev_end = start assert len(frag_positions) == len(frags), "Mismatched number of "\ "fragments and positions" assembled = [] for index, (frag_start, frag_end) in enumerate(frag_positions): diff = frag_start - prev_end assert diff >= 0, 'fragment position start < previous end: %s, %s' %\ (frag_start, prev_end) assembled += ['N'*diff, frags[index]] prev_end = frag_end diff = end - frag_end assert diff >= 0, 'end[%s] < previous frag_end[%s]' % (end, frag_end) assembled += ['N' * diff] return DNA.makeSequence(''.join(assembled)) def _make_coord(genome, coord_name, start, end, strand): """returns a Coordinate""" return Coordinate(CoordName=coord_name, Start=start, End=end, Strand=strand, genome=genome) def get_lower_coord_conversion(coord, species, core_db): coord_system = CoordSystem(species=species, core_db=core_db) seq_level_coord_type = CoordSystem(species=species,core_db=core_db, seq_level=True) query_rank = coord_system[coord.CoordType].rank seq_level_rank = coord_system[seq_level_coord_type].rank assemblies = None for rank in range(query_rank+1, seq_level_rank): coord_type = None for key in coord_system.keys(): if coord_system[key].rank == rank: coord_type = coord_system[key].name break if coord_type is None: continue assemblies = get_coord_conversion(coord, coord_type, core_db) if assemblies: break return assemblies def _get_sequence_from_direct_assembly(coord=None, DEBUG=False): # TODO clean up use of a coord genome = coord.genome # no matter what strand user provide, we get the + sequence first coord.Strand = 1 species = 
genome.Species coord_type = CoordSystem(species=species,core_db=genome.CoreDb, seq_level=True) if DEBUG: print 'Created Coordinate:',coord,coord.EnsemblStart,coord.EnsemblEnd print coord.CoordType, coord_type assemblies = get_coord_conversion(coord, coord_type, genome.CoreDb) if not assemblies: raise NoItemError, 'no assembly for %s' % coord dna = genome.CoreDb.getTable('dna') seqs, positions = [], [] for q_loc, t_loc in assemblies: assert q_loc.Strand == 1 length = len(t_loc) # get MySQL to do the string slicing via substr function query = sql.select([substr(dna.c.sequence, t_loc.EnsemblStart, length).label('sequence')], dna.c.seq_region_id == t_loc.seq_region_id) record = asserted_one(query.execute().fetchall()) seq = record['sequence'] seq = DNA.makeSequence(seq) if t_loc.Strand == -1: seq = seq.rc() seqs.append(str(seq)) positions.append((q_loc.Start, q_loc.End)) sequence = _assemble_seq(seqs, coord.Start, coord.End, positions) return sequence def _get_sequence_from_lower_assembly(coord, DEBUG): assemblies = get_lower_coord_conversion(coord, coord.genome.Species, coord.genome.CoreDb) if not assemblies: raise NoItemError, 'no assembly for %s' % coord if DEBUG: print '\nMedium_level_assemblies = ', assemblies seqs, positions = [], [] for q_loc, t_loc in assemblies: t_strand = t_loc.Strand temp_seq = _get_sequence_from_direct_assembly(t_loc, DEBUG) if t_strand == -1: temp_seq = temp_seq.rc() if DEBUG: print q_loc print t_loc print 'temp_seq = ', temp_seq[:10], '\n' seqs.append(str(temp_seq)) positions.append((q_loc.Start, q_loc.End)) sequence = _assemble_seq(seqs, coord.Start, coord.End, positions) return sequence def get_sequence(coord=None, genome=None, coord_name=None, start=None, end=None, strand=1, DEBUG=False): if coord is None: coord = _make_coord(genome, coord_name, start, end, 1) else: coord = coord.copy() strand = coord.Strand try: sequence = _get_sequence_from_direct_assembly(coord, DEBUG) except NoItemError: ## means there is no assembly, so we do a 
thorough assembly by converting according to the "rank" sequence = _get_sequence_from_lower_assembly(coord, DEBUG) if strand == -1: sequence = sequence.rc() return sequence PyCogent-1.5.3/cogent/db/ensembl/species.py000644 000765 000024 00000030105 12024702176 021536 0ustar00jrideoutstaff000000 000000 from cogent.util.table import Table from cogent.db.ensembl.util import CaseInsensitiveString __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley", "Jason Merkin"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" _species_common_map = [['Acyrthosiphon pisum', 'A.pisum'], ['Aedes aegypti', 'A.aegypti'], ['Anolis carolinensis', 'Anole lizard'], ['Anopheles gambiae', 'A.gambiae'], ['Apis mellifera', 'Honeybee'], ['Arabidopsis lyrata', 'A.lyrata'], ['Arabidopsis thaliana', 'A.thaliana'], ['Aspergillus clavatus', 'A.clavatus'], ['Aspergillus flavus', 'A.flavus'], ['Aspergillus fumigatus', 'A.fumigatus'], ['Aspergillus fumigatusa1163', 'A.fumigatusa1163'], ['Aspergillus nidulans', 'A.nidulans'], ['Aspergillus niger', 'A.niger'], ['Aspergillus oryzae', 'A.oryzae'], ['Aspergillus terreus', 'A.terreus'], ['Bacillus collection', 'Bacillus'], ['Borrelia collection', 'Borrelia'], ['Bos taurus', 'Cow'], ['Brachypodium distachyon', 'B.distachyon'], ['Buchnera collection', 'Buchnera'], ['Caenorhabditis brenneri', 'C.brenneri'], ['Caenorhabditis briggsae', 'C.briggsae'], ['Caenorhabditis elegans', 'C.elegans'], ['Caenorhabditis japonica', 'C.japonica'], ['Caenorhabditis remanei', 'C.remanei'], ['Callithrix jacchus', 'Marmoset'], ['Canis familiaris', 'Dog'], ['Cavia porcellus', 'Guinea Pig'], ['Choloepus hoffmanni', 'Sloth'], ['Ciona intestinalis', 'C.intestinalis'], ['Ciona savignyi', 'C.savignyi'], ['Culex quinquefasciatus', 'C.quinquefasciatus'], ['Danio rerio', 'Zebrafish'], ['Dasypus novemcinctus', 'Armadillo'], ['Dictyostelium 
discoideum', 'D.discoideum'], ['Dipodomys ordii', 'Kangaroo rat'], ['Drosophila ananassae', 'D.ananassae'], ['Drosophila erecta', 'D.erecta'], ['Drosophila grimshawi', 'D.grimshawi'], ['Drosophila melanogaster', 'D.melanogaster'], ['Drosophila mojavensis', 'D.mojavensis'], ['Drosophila persimilis', 'D.persimilis'], ['Drosophila pseudoobscura', 'D.pseudoobscura'], ['Drosophila sechellia', 'D.sechellia'], ['Drosophila simulans', 'D.simulans'], ['Drosophila virilis', 'D.virilis'], ['Drosophila willistoni', 'D.willistoni'], ['Drosophila yakuba', 'D.yakuba'], ['Echinops telfairi', 'Tenrec'], ['Equus caballus', 'Horse'], ['Erinaceus europaeus', 'Hedgehog'], ['Escherichia shigella', 'E.shigella'], ['Felis catus', 'Cat'], ['Gallus gallus', 'Chicken'], ['Gasterosteus aculeatus', 'Stickleback'], ['Gorilla gorilla', 'Gorilla'], ['Homo sapiens', 'Human'], ['Ixodes scapularis', 'I.scapularis'], ['Loxodonta africana', 'Elephant'], ['Macaca mulatta', 'Macaque'], ['Macropus eugenii', 'Wallaby'], ['Microcebus murinus', 'Mouse lemur'], ['Monodelphis domestica', 'Opossum'], ['Mus musculus', 'Mouse'], ['Mycobacterium collection', 'Mycobacterium'], ['Myotis lucifugus', 'Microbat'], ['Neisseria collection', 'Neisseria'], ['Nematostella vectensis', 'N.vectensis'], ['Neosartorya fischeri', 'N.fischeri'], ['Neurospora crassa', 'N.crassa'], ['Ochotona princeps', 'Pika'], ['Ornithorhynchus anatinus', 'Platypus'], ['Oryctolagus cuniculus', 'Rabbit'], ['Oryza indica', 'O.indica'], ['Oryza sativa', 'O.sativa'], ['Oryzias latipes', 'Medaka'], ['Otolemur garnettii', 'Bushbaby'], ['Pan troglodytes', 'Chimp'], ['Pediculus humanus', 'P.humanus'], ['Petromyzon marinus', 'Lamprey'], ['Phaeodactylum tricornutum', 'P.tricornutum'], ['Physcomitrella patens', 'P.patens'], ['Phytophthora infestans', 'P.infestans'], ['Phytophthora ramorum', 'P.ramorum'], ['Phytophthora sojae', 'P.sojae'], ['Plasmodium berghei', 'P.berghei'], ['Plasmodium chabaudi', 'P.chabaudi'], ['Plasmodium falciparum', 'P.falciparum'], 
['Plasmodium knowlesi', 'P.knowlesi'], ['Plasmodium vivax', 'P.vivax'], ['Pongo pygmaeus', 'Orangutan'], ['Populus trichocarpa', 'P.trichocarpa'], ['Pristionchus pacificus', 'P.pacificus'], ['Procavia capensis', 'Rock hyrax'], ['Pteropus vampyrus', 'Flying fox'], ['Puccinia graministritici', 'P.graministritici'], ['Pyrococcus collection', 'Pyrococcus'], ['Rattus norvegicus', 'Rat'], ['Saccharomyces cerevisiae', 'S.cerevisiae'], ['Schistosoma mansoni', 'S.mansoni'], ['Schizosaccharomyces pombe', 'S.pombe'], ['Sorex araneus', 'Shrew'], ['Sorghum bicolor', 'S.bicolor'], ['Spermophilus tridecemlineatus', 'Ground Squirrel'], ['Staphylococcus collection', 'Staphylococcus'], ['Streptococcus collection', 'Streptococcus'], ['Strongylocentrotus purpuratus', 'S.purpuratus'], ['Sus scrofa', 'Pig'], ['Taeniopygia guttata', 'Zebra finch'], ['Takifugu rubripes', 'Fugu'], ['Tarsius syrichta', 'Tarsier'], ['Tetraodon nigroviridis', 'Tetraodon'], ['Thalassiosira pseudonana', 'T.pseudonana'], ['Trichoplax adhaerens', 'T.adhaerens'], ['Tupaia belangeri', 'Tree Shrew'], ['Tursiops truncatus', 'Bottlenose dolphin'], ['Vicugna pacos', 'Alpaca'], ['Vitis vinifera', 'V.vinifera'], ['Wolbachia collection', 'Wolbachia'], ['Xenopus tropicalis', 'Xenopus'], ['Zea mays', 'Z.mays']] class SpeciesNameMap(dict): """mapping between common names and latin names""" def __init__(self, species_common = _species_common_map): """provides latin name:common name mappings""" self._species_common = {} self._common_species = {} self._species_ensembl = {} self._ensembl_species = {} for species_name, common_name in species_common: self.amendSpecies(CaseInsensitiveString(species_name), CaseInsensitiveString(common_name)) def __str__(self): rows = [] for common in self._common_species: species = self._common_species[common] ensembl = self._species_ensembl[species] rows += [[common, species, ensembl]] return str(Table(['Common Name', 'Species Name', 'Ensembl Db Prefix'], rows=rows, space=2).sorted()) def 
__repr__(self): return 'Available species: %s' % ("'"+\ "'; '".join(self._common_species.keys())+"'") def getCommonName(self, name): """returns the common name for the given name (which can be either a species name or the ensembl version)""" name = CaseInsensitiveString(name) if name in self._ensembl_species: name = self._ensembl_species[name] if name in self._species_common: common_name = self._species_common[name] elif name in self._common_species: common_name = name else: raise RuntimeError("Unknown species: %s" % name) return str(common_name) def getSpeciesName(self, name, level='ignore'): """returns the species name for the given common name""" name = CaseInsensitiveString(name) if name in self._species_common: return str(name) species_name = None level = level.lower().strip() name = name for data in [self._common_species, self._ensembl_species]: if name in data: species_name = data[name] if species_name is None: msg = "Unknown common name: %s" % name if level == 'raise': raise RuntimeError(msg) elif level == 'warn': print "WARN: %s" % msg return str(species_name) def getSpeciesNames(self): """returns the list of species names""" names = self._species_common.keys() names.sort() return [str(n) for n in names] def getEnsemblDbPrefix(self, name): """returns a string of the species name in the format used by ensembl""" name = CaseInsensitiveString(name) if name in self._common_species: name = self._common_species[name] try: species_name = self.getSpeciesName(name, level='raise') except RuntimeError: if name not in self._species_common: raise RuntimeError("Unknown name %s" % name) species_name = name return str(species_name.lower().replace(" ","_")) def getComparaName(self, name): """returns string matching a compara instance attribute name for a species""" name = self.getCommonName(name) if '.' 
in name: name = name.replace('.', '') else: name = name.title() name = name.split() return ''.join(name) def _purge_species(self, species_name): """removes a species record""" species_name = CaseInsensitiveString(species_name) if not species_name in self._species_common: return common_name = self._species_common.pop(species_name) ensembl_name= self._species_ensembl.pop(species_name) self._ensembl_species.pop(ensembl_name) self._common_species.pop(common_name) def amendSpecies(self, species_name, common_name): """add a new species, and common name""" species_name = CaseInsensitiveString(species_name) common_name = CaseInsensitiveString(common_name) assert "_" not in species_name,\ "'_' in species_name, not a Latin name?" self._purge_species(species_name) # remove if existing self._species_common[species_name] = common_name self._common_species[common_name] = species_name ensembl_name = species_name.lower().replace(" ","_") self._species_ensembl[species_name] = ensembl_name self._ensembl_species[ensembl_name] = species_name return Species = SpeciesNameMap() PyCogent-1.5.3/cogent/db/ensembl/util.py000644 000765 000024 00000007671 12024702176 021074 0ustar00jrideoutstaff000000 000000 import re from sqlalchemy import create_engine, MetaData, Table __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "Gavin.Huttley@anu.edu.au" __status__ = "alpha" class DisplayString(str): """provides a mechanism for customising the str() and repr() of objects""" def __new__(cls, arg, num_words=None, repr_length=None, with_quotes=False): new = str.__new__(cls, str(arg)) new.num_words = num_words new.repr_length = repr_length or len(str(arg)) new.with_quotes = with_quotes return new def __repr__(self): if self.num_words is not None: new = " ".join(self.split()[:self.num_words]) elif self.repr_length != len(self): new = 
self[:self.repr_length] else: new = self if len(self) > len(new): new += '...' new = [new, "'%s'" % new][self.with_quotes] return new

class CaseInsensitiveString(str):
    """A case insensitive string class. Comparisons are case insensitive."""
    def __new__(cls, arg, h=None):
        n = str.__new__(cls, str(arg))
        n._hash = hash(''.join(list(n)).lower())
        n._lower = ''.join(list(n)).lower()
        return n

    def __eq__(self, other):
        return self._lower == ''.join(list(other)).lower()

    def __hash__(self):
        # dict hashing done via lower case
        return self._hash

    def __str__(self):
        return ''.join(list(self))

class LazyRecord(object):
    """a convenience class for conducting lazy evaluations of class properties"""
    NULL_VALUE = None

    def __init__(self):
        """blind constructor of caches"""
        self._cached = {}
        self._table_rows = {}

    def _get_cached_value(self, attr_name, get_attr_func):
        if attr_name not in self._cached:
            get_attr_func()
        return self._cached[attr_name]

    def _populate_cache_from_record(self, attr_column, table_name):
        """attr_column: the attribute name <-> table column key mapping
        table_name: the key in _table_rows"""
        table = self._table_rows[table_name]
        for attr, column, func in attr_column:
            if attr not in self._cached:
                self._cached[attr] = func(table[column])

    def _set_null_values(self, attrs, table_name=None):
        for attr in attrs:
            self._cached[attr] = self.NULL_VALUE
        if table_name:
            self._table_rows[table_name] = self.NULL_VALUE

class NoItemError(Exception):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)

def convert_strand(val):
    """ensures a consistent internal representation of strand"""
    if isinstance(val, str):
        assert val in '-+', 'unknown strand "%s"' % val
        val = [-1, 1][val == '+']
    elif val is not None:
        val = [-1, 1][val > 0]
    else:
        val = 1
    return val

def asserted_one(items):
    """asserts items has a single value and returns it"""
    one = False
    for item in items:
        if one:
            raise ValueError('More than one: [%s]' % item.items())
        one = True
    if one:
        return item
    else:
        raise NoItemError('No items')

def what_columns(table):
    """shows what columns are in a table"""
    print [c.name for c in table.c]

def yield_selected(sqlalchemy_select, limit=100):
    """takes a SQLAlchemy select condition, yielding limit per db query

    purpose is to not get all records from the db in one go
    """
    # paginate with offset/limit; stop once a short (or empty) page arrives
    offset = 0
    while 1:
        results = sqlalchemy_select.offset(offset).limit(limit).execute()
        count = 0
        for result in results:
            yield result
            count += 1
        offset += limit
        if count == 0 or count < limit:
            break
PyCogent-1.5.3/cogent/data/__init__.py000644 000765 000024 00000000652 12024702176 020545 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
__all__ = ['energy_params', 'molecular_weight', 'protein_properties',
           'ligand_properties', 'nucleic_properties']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Amanda Birmingham", "Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/cogent/data/energy_params.py000644 000765 000024 00000006311 12024702176 021640 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python
"""EnergyParams class and instances of entropy and enthalpy params.

The enthalpy and entropy params for the 10 Watson-Crick nearest-neighbor
interactions, initiation corrections, and symmetry corrections are taken
from SantaLucia, PNAS vol 95 1460-1465

GC_INIT is the initiation parameter for duplexes that contain AT LEAST ONE
GC base pair, while AT_INIT is the initiation parameter for duplexes that
contain ONLY AT base pairs. (quoted from SantaLucia, Allawi, and
Seneviratne, Biochemistry, 1996, 3555-3562.)
The SYM term is added into the entropy or enthalpy calculation IFF the sequence is self-complementary (seq = revComp of seq) """ __author__ = "Amanda Birmingham" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Amanda Birmingham", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Amanda Birmingham" __email__ = "amanda.birmingham@thermofisher.com" __status__ = "Production" class EnergyParams(object): """A data structure to hold parameters used in energy calculations""" def __init__(self, nearest_neighbor_vals, gc_init, at_init, sym_correct): """Store the input params for later reference nearest_neighbor_vals: a dictionary or dictionary-castable object keyed by each nearest-neighbor pair and holding a value for each. gc_init: a floating-point castable value holding the initiation parameter for duplexes that contain AT LEAST ONE GC base pair at_init: a floating-point castable value holding the initiation parameter for duplexes that contain ONLY AT base pairs sym_correct: a floating-point castable value that is added into the calculation if a sequence is self-complementary """ self.nearestNeighbors = dict(nearest_neighbor_vals) self.gcInit = float(gc_init) self.atInit = float(at_init) self.symCorrection = float(sym_correct) #end __init__ #end EnergyParams #-------------------------- #Enthalpies are in kcal/mol, assumed to be salt concentration independent _NN_ENTHALPIES = { "AA":-7.9, "TT":-7.9, "AT":-7.2, "TA":-7.2, \ "CA":-8.5, "TG":-8.5, "GT":-8.4, "AC":-8.4, \ "CT":-7.8, "AG":-7.8, "GA":-8.2, "TC":-8.2, \ "CG":-10.6, "GC":-9.8, "GG":-8.0, "CC":-8.0} _ENTHALPY_GC_INIT = 0.1 _ENTHALPY_AT_INIT = 2.3 _ENTHALPY_SYM = 0 #-------------------------- #-------------------------- #Entropies are in cal/Kelvin*mol, at 1 M NaCl _NN_ENTROPIES = { "AA":-22.2, "TT":-22.2, "AT":-20.4, "TA":-21.3, \ "CA":-22.7, "TG":-22.7, "GT":-22.4, "AC":-22.4, \ "CT":-21.0, "AG":-21.0, "GA":-22.2, "TC":-22.2, \ "CG":-27.2, "GC":-24.4, "GG":-19.9, 
"CC":-19.9} _ENTROPY_GC_INIT = -2.8 _ENTROPY_AT_INIT = 4.1 _ENTROPY_SYM = -1.4 #-------------------------- #-------------------------- #Module level public EnergyParams instances (one for entropy, one for energy) ENTHALPY_PARAMS = EnergyParams(_NN_ENTHALPIES,_ENTHALPY_GC_INIT, \ _ENTHALPY_AT_INIT, _ENTHALPY_SYM) ENTROPY_PARAMS = EnergyParams(_NN_ENTROPIES, _ENTROPY_GC_INIT, \ _ENTROPY_AT_INIT, _ENTROPY_SYM) #-------------------------- PyCogent-1.5.3/cogent/data/ligand_properties.py000644 000765 000024 00000001134 12024702176 022514 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Properties of ligands are data from 'tables' about ligands and their atoms.""" __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Production" HOH_NAMES = ['H_HOH', 'H_WAT', 'H_DOH', 'H_HOD', 'H_DOD'] WATER_NAMES = HOH_NAMES LIGAND_ATOM_PROPERTIES = { ('H_HOH', ' O '): [1.60] } LIGAND_AREAIMOL_VDW_RADII = dict([(k, v[0]) for k, v in LIGAND_ATOM_PROPERTIES.iteritems()]) PyCogent-1.5.3/cogent/data/molecular_weight.py000644 000765 000024 00000003534 12024702176 022342 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env Python """Data for molecular weight calculations on proteins and nucleotides.""" __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" ProteinWeights = { 'A': 89.09, 'C': 121.16, 'D': 133.10, 'E': 147.13, 'F': 165.19, 'G': 75.07, 'H': 155.16, 'I': 131.18, 'K': 146.19, 'L': 131.18, 'M': 149.21, 'N': 132.12, 'P': 115.13, 'Q': 146.15, 'R': 174.20, 'S': 105.09, 'T': 119.12, 'V': 117.15, 'W': 204.23, 'Y': 181.19, 'U': 168.06, } RnaWeights = { 'A': 313.21, 'U': 290.17, 'C': 289.19, 'G': 329.21, } 
DnaWeights = { 'A': 297.21, 'T': 274.17, 'C': 273.19, 'G': 313.21, } ProteinWeightCorrection = 18.0 #terminal residues not dehydrated DnaWeightCorrection = 61.96 #assumes 5' monophosphate, 3' OH RnaWeightCorrection = DnaWeightCorrection class WeightCalculator(object): """Calculates molecular weight of a non-degenerate sequence.""" def __init__(self, Weights, Correction): """Returns a new WeightCalculator object (class, so serializable).""" self.Weights = Weights self.Correction = Correction def __call__(self, seq, correction=None): """Returns the molecular weight of a specified sequence.""" if not seq: return 0 if correction is None: correction = self.Correction get_mw = self.Weights.get return sum([get_mw(i, 0) for i in seq]) + correction DnaMW = WeightCalculator(DnaWeights, DnaWeightCorrection) RnaMW = WeightCalculator(RnaWeights, DnaWeightCorrection) ProteinMW = WeightCalculator(ProteinWeights, ProteinWeightCorrection) PyCogent-1.5.3/cogent/data/nucleic_properties.py000644 000765 000024 00000000332 12024702176 022677 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = [] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "" __email__ = "" __status__ = "Development" PyCogent-1.5.3/cogent/data/protein_properties.py000644 000765 000024 00000047065 12024702176 022753 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Properties of proteins are data from 'tables' about residues and their atoms.""" __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Production" AA_PROPERTIES = { 'ALA': (1.8, 0.0, 0.42, 0.00, 106.0, 'T', 'P', 'A', 'A', 'A', 'A', 9, 'A'), 'ARG': (-4.5, 1.0, -1.37, -1.88, 248.0, 'C', 'P', 'E', 'K', 'K', 'K', 15, 'R'), 'ASN': (-3.5, 0.0, -0.82, -1.03, 157.0, 
'D', 'P', 'E', 'E', 'E', 'N', 16, 'N'), 'ASP': (-3.5, -1.0, -1.05, -0.78, 163.0, 'C', 'P', 'E', 'E', 'E', 'D', 19, 'D'), 'ASX': (-3.5, -0.5, -1.05, -0.78, 160.0, None, 'P', 'E', 'E', 'E', None, None, 'B'), # 1kp0.pdb 'CYS': (2.5, 0.0, 1.34, -0.85, 135.0, 'T', 'H', 'L', 'L', 'C', 'C', 7, 'C'), 'GLN': (-3.5, 0.0, -0.30, -1.73, 198.0, 'D', 'P', 'E', 'E', 'E', 'Q', 17, 'Q'), 'GLU': (-3.5, -1.0, -0.87, -1.46, 194.0, 'C', 'P', 'E', 'E', 'E', 'E', 18, 'E'), 'GLX': (-3.5, -0.5, -0.87, -1.46, 196.0, None, 'P', 'E', 'E', 'E', None, None, 'Z'), 'GLY': (-0.4, 0.0, 0.00, 0.00, 84.0, 'T', 'P', 'A', 'A', 'G', 'G', 11, 'G'), 'HIS': (-3.2, 0.0, 0.18, -0.95, 184.0, 'R', 'P', 'E', 'H', 'H', 'H', 10, 'H'), 'ILE': (4.5, 0.0, 2.46, -0.76, 169.0, 'A', 'H', 'L', 'L', 'L', 'L', 1, 'I'), 'LEU': (3.8, 0.0, 2.32, -0.71, 164.0, 'A', 'H', 'L', 'L', 'L', 'L', 3, 'L'), 'LYS': (-3.9, 1.0, -1.35, -1.89, 205.0, 'C', 'P', 'E', 'K', 'K', 'K', 20, 'K'), 'MET': (1.9, 0.0, 1.68, -1.46, 188.0, 'D', 'H', 'L', 'L', 'L', 'L', 5, 'M'), 'MSE': (0.0, 0.0, 0.00, 0.00, 000.0, 'D', 'H', 'L', 'L', 'L', 'L', 5, 'M'), # SeMet 'PHE': (2.8, 0.0, 2.44, -0.62, 197.0, 'R', 'H', 'F', 'F', 'F', 'F', 2, 'F'), 'PRO': (-1.6, 0.0, 0.98, -0.06, 136.0, 'D', 'P', 'A', 'P', 'P', 'P', 13, 'P'), 'SEC': (2.5, 0.0, 1.34, -0.85, 135.0, 'T', 'H', 'L', 'L', 'C', 'C', 7, 'U'), # SeCys 'SER': (-0.8, 0.0, -0.05, -1.11, 130.0, 'T', 'P', 'A', 'S', 'S', 'S', 14, 'S'), 'THR': (-0.7, 0.0, 0.35, -1.08, 142.0, 'D', 'P', 'A', 'S', 'S', 'T', 12, 'T'), 'TRP': (-0.9, 0.0, 3.07, -0.99, 227.0, 'R', 'H', 'F', 'F', 'F', 'W', 6, 'W'), 'TYR': (-1.3, 0.0, 1.31, -1.13, 222.0, 'R', 'H', 'F', 'F', 'F', 'F', 8, 'Y'), 'UNK': (0.0, 0.0, 0.00, 0.00, 000.0, None, None, None, None, None, None, None, 'X'), 'VAL': (4.2, 0.0, 1.66, -0.43, 142.0, 'A', 'H', 'L', 'L', 'L', 'L', 4, 'V') } AA_NAMES_1 = [value_[-1] for value_ in AA_PROPERTIES.itervalues()] AA_NAMES_3 = [key_ for key_ in AA_PROPERTIES.iterkeys()] AA_NAMES_3to1 = dict([(key_, value_[-1]) for (key_, value_) in 
AA_PROPERTIES.iteritems()])
AA_GRAVY = dict([(key_, value_[0]) for (key_, value_) in
                 AA_PROPERTIES.iteritems()])
AA_CHARGE = dict([(key_, value_[1]) for (key_, value_) in
                  AA_PROPERTIES.iteritems()])
AA_SOLVATATION = dict([(key_, value_[2]) for (key_, value_) in
                       AA_PROPERTIES.iteritems()])
AA_ENTROPY = dict([(key_, value_[3]) for (key_, value_) in
                   AA_PROPERTIES.iteritems()])
AA_ASA = dict([(key_, value_[4]) for (key_, value_) in
               AA_PROPERTIES.iteritems()])
AA_5 = dict([(key_, value_[5]) for (key_, value_) in
             AA_PROPERTIES.iteritems()])  # (Aliphatic, aRomatic, Charged, Tiny, Diverse)
AA_2 = dict([(key_, value_[6]) for (key_, value_) in
             AA_PROPERTIES.iteritems()])  # (Hydrophobic, hydroPhilic)
AA_4MURPHY = dict([(key_, value_[7]) for (key_, value_) in
                   AA_PROPERTIES.iteritems()])
AA_8MURPHY = dict([(key_, value_[8]) for (key_, value_) in
                   AA_PROPERTIES.iteritems()])
AA_10MURPHY = dict([(key_, value_[9]) for (key_, value_) in
                    AA_PROPERTIES.iteritems()])
AA_15MURPHY = dict([(key_, value_[10]) for (key_, value_) in
                    AA_PROPERTIES.iteritems()])
AA_POLARITY = dict([(key_, value_[11]) for (key_, value_) in
                    AA_PROPERTIES.iteritems()])

#short alternative
AA_NAMES = AA_NAMES_3
AA_3to1 = AA_NAMES_3to1

AA_ATOMS = {
    'ILE': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG1', ' CG2', ' CD1',
            ' OXT', ' H ', ' H2 ', ' HA ', ' HB ', 'HG12', 'HG13', 'HG21',
            'HG22', 'HG23', 'HD11', 'HD12', 'HD13', ' HXT'],
    'GLN': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD ', ' OE1',
            ' NE2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HG2',
            ' HG3', 'HE21', 'HE22', ' HXT'],
    'GLX': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD ', ' XE1',
            ' XE2', ' HA ', ' H ', ' HB1', ' HB2', ' HG1', ' HG2', ' HD '],
    'GLY': [' N ', ' CA ', ' C ', ' O ', ' OXT', ' H ', ' H2 ', ' HA2',
            ' HA3', ' HXT'],
    'MSE': [' N ', ' CA ', ' C ', ' O ', ' OXT', ' CB ', ' CG ', 'SE ',
            ' CE ', ' H ', ' HN2', ' HA ', ' HXT', ' HB2', ' HB3', ' HG2',
            ' HG3', ' HE1', ' HE2', ' HE3'],
    'GLU': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', '
CD ', ' OE1', ' OE2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HG2', ' HG3', ' HE2', ' HXT'], 'CYS': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' SG ', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HG ', ' HXT'], 'ASP': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' OD1', ' OD2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HD2', ' HXT'], 'SER': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' OG ', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HG ', ' HXT'], 'PRO': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD ', ' OXT', ' H ', ' HA ', ' HB2', ' HB3', ' HG2', ' HG3', ' HD2', ' HD3', ' HXT'], 'ASX': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' XD1', ' XD2', ' HA ', ' H ', ' HB1', ' HB2', ' HG '], 'SEC': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' SEG', ' OD1', ' OD2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HD2', ' HXT'], 'ASN': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' OD1', ' ND2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', 'HD21', 'HD22', ' HXT'], 'VAL': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG1', ' CG2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB ', 'HG11', 'HG12', 'HG13', 'HG21', 'HG22', 'HG23', ' HXT'], 'THR': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' OG1', ' CG2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB ', ' HG1', 'HG21', 'HG22', 'HG23', ' HXT'], 'HIS': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' ND1', ' CD2', ' CE1', ' NE2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HD1', ' HD2', ' HE1', ' HE2', ' HXT'], 'UNK': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' OXT', ' H ', ' H2 ', ' HA ', ' HB1', ' HB2', ' HG1', ' HG2', ' HG3', ' HXT'], 'PHE': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD1', ' CD2', ' CE1', ' CE2', ' CZ ', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HD1', ' HD2', ' HE1', ' HE2', ' HZ ', ' HXT'], 'ALA': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' OXT', ' H ', ' H2 ', ' HA ', ' HB1', ' HB2', ' HB3', ' HXT'], 'MET': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' SD ', ' CE ', ' OXT', ' H ', ' H2 ', ' HA ', ' 
HB2', ' HB3', ' HG2', ' HG3', ' HE1', ' HE2', ' HE3', ' HXT'], 'LEU': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD1', ' CD2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HG ', 'HD11', 'HD12', 'HD13', 'HD21', 'HD22', 'HD23', ' HXT'], 'ARG': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD ', ' NE ', ' CZ ', ' NH1', ' NH2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HG2', ' HG3', ' HD2', ' HD3', ' HE ', 'HH11', 'HH12', 'HH21', 'HH22', ' HXT'], 'TRP': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD1', ' CD2', ' NE1', ' CE2', ' CE3', ' CZ2', ' CZ3', ' CH2', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HD1', ' HE1', ' HE3', ' HZ2', ' HZ3', ' HH2', ' HXT'], 'LYS': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD ', ' CE ', ' NZ ', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HG2', ' HG3', ' HD2', ' HD3', ' HE2', ' HE3', ' HZ1', ' HZ2', ' HZ3', ' HXT'], 'TYR': [' N ', ' CA ', ' C ', ' O ', ' CB ', ' CG ', ' CD1', ' CD2', ' CE1', ' CE2', ' CZ ', ' OH ', ' OXT', ' H ', ' H2 ', ' HA ', ' HB2', ' HB3', ' HD1', ' HD2', ' HE1', ' HE2', ' HH ', ' HXT']} # sort atoms in residue according to PDBv3 specification AA_ATOM_BACKBONE_ORDER = {'N':3, 'CA':2, 'C':1, 'O':0} AA_ATOM_REMOTE_ORDER = {'A':0, 'B':1, 'G':2, 'D':3, 'E':4, 'Z':5, \ 'H':6, 'X':7, '1':8, '2':9, 'N':9, '3':10} #H1, H2, H3 and NH2 in PDV v2.3AA_ AA_ATOM_PROPERTIES = { ('ALA', ' C '): [1.8], ('ALA', ' CA '): [1.8], ('ALA', ' CB '): [1.8], ('ALA', ' H '): [0.0], ('ALA', ' H2 '): [0.0], ('ALA', ' HA '): [0.0], ('ALA', ' HB1'): [0.0], ('ALA', ' HB2'): [0.0], ('ALA', ' HB3'): [0.0], ('ALA', ' HXT'): [0.0], ('ALA', ' N '): [1.65], ('ALA', ' O '): [1.60], ('ALA', ' OXT'): [1.60], ('ARG', ' C '): [1.8], ('ARG', ' CA '): [1.8], ('ARG', ' CB '): [1.8], ('ARG', ' CD '): [1.8], ('ARG', ' CG '): [1.8], ('ARG', ' CZ '): [1.8], ('ARG', ' H '): [0.0], ('ARG', ' H2 '): [0.0], ('ARG', ' HA '): [0.0], ('ARG', ' HB2'): [0.0], ('ARG', ' HB3'): [0.0], ('ARG', ' HD2'): [0.0], ('ARG', ' HD3'): [0.0], ('ARG', ' 
HE '): [0.0], ('ARG', ' HG2'): [0.0], ('ARG', ' HG3'): [0.0], ('ARG', ' HXT'): [0.0], ('ARG', ' N '): [1.65], ('ARG', ' NE '): [1.65], ('ARG', ' NH1'): [1.65], ('ARG', ' NH2'): [1.65], ('ARG', ' O '): [1.60], ('ARG', ' OXT'): [1.60], ('ARG', 'HH11'): [0.0], ('ARG', 'HH12'): [0.0], ('ARG', 'HH21'): [0.0], ('ARG', 'HH22'): [0.0], ('ASN', ' C '): [1.8], ('ASN', ' CA '): [1.8], ('ASN', ' CB '): [1.8], ('ASN', ' CG '): [1.8], ('ASN', ' H '): [0.0], ('ASN', ' H2 '): [0.0], ('ASN', ' HA '): [0.0], ('ASN', ' HB2'): [0.0], ('ASN', ' HB3'): [0.0], ('ASN', ' HXT'): [0.0], ('ASN', ' N '): [1.65], ('ASN', ' ND2'): [1.65], ('ASN', ' O '): [1.60], ('ASN', ' OD1'): [1.60], ('ASN', ' OXT'): [1.60], ('ASN', 'HD21'): [0.0], ('ASN', 'HD22'): [0.0], ('ASP', ' C '): [1.8], ('ASP', ' CA '): [1.8], ('ASP', ' CB '): [1.8], ('ASP', ' CG '): [1.8], ('ASP', ' H '): [0.0], ('ASP', ' H2 '): [0.0], ('ASP', ' HA '): [0.0], ('ASP', ' HB2'): [0.0], ('ASP', ' HB3'): [0.0], ('ASP', ' HD2'): [0.0], ('ASP', ' HXT'): [0.0], ('ASP', ' N '): [1.65], ('ASP', ' O '): [1.60], ('ASP', ' OD1'): [1.60], ('ASP', ' OD2'): [1.60], ('ASP', ' OXT'): [1.60], ('ASX', ' C '): [1.8], ('ASX', ' CA '): [1.8], ('ASX', ' CB '): [1.8], ('ASX', ' CG '): [1.8], ('ASX', ' H '): [0.0], ('ASX', ' HA '): [0.0], ('ASX', ' HB1'): [0.0], ('ASX', ' HB2'): [0.0], ('ASX', ' HG '): [0.0], ('ASX', ' N '): [1.65], ('ASX', ' O '): [1.60], ('ASX', ' XD1'): [1.65], ('ASX', ' XD2'): [1.65], ('CYS', ' C '): [1.8], ('CYS', ' CA '): [1.8], ('CYS', ' CB '): [1.8], ('CYS', ' H '): [0.0], ('CYS', ' H2 '): [0.0], ('CYS', ' HA '): [0.0], ('CYS', ' HB2'): [0.0], ('CYS', ' HB3'): [0.0], ('CYS', ' HG '): [0.0], ('CYS', ' HXT'): [0.0], ('CYS', ' N '): [1.65], ('CYS', ' O '): [1.60], ('CYS', ' OXT'): [1.60], ('CYS', ' SG '): [1.850], ('GLN', ' C '): [1.8], ('GLN', ' CA '): [1.8], ('GLN', ' CB '): [1.8], ('GLN', ' CD '): [1.8], ('GLN', ' CG '): [1.8], ('GLN', ' H '): [0.0], ('GLN', ' H2 '): [0.0], ('GLN', ' HA '): [0.0], ('GLN', ' HB2'): [0.0], ('GLN', ' 
HB3'): [0.0], ('GLN', ' HG2'): [0.0], ('GLN', ' HG3'): [0.0], ('GLN', ' HXT'): [0.0], ('GLN', ' N '): [1.65], ('GLN', ' NE2'): [1.65], ('GLN', ' O '): [1.60], ('GLN', ' OE1'): [1.60], ('GLN', ' OXT'): [1.60], ('GLN', 'HE21'): [0.0], ('GLN', 'HE22'): [0.0], ('GLU', ' C '): [1.8], ('GLU', ' CA '): [1.8], ('GLU', ' CB '): [1.8], ('GLU', ' CD '): [1.8], ('GLU', ' CG '): [1.8], ('GLU', ' H '): [0.0], ('GLU', ' H2 '): [0.0], ('GLU', ' HA '): [0.0], ('GLU', ' HB2'): [0.0], ('GLU', ' HB3'): [0.0], ('GLU', ' HE2'): [0.0], ('GLU', ' HG2'): [0.0], ('GLU', ' HG3'): [0.0], ('GLU', ' HXT'): [0.0], ('GLU', ' N '): [1.65], ('GLU', ' O '): [1.60], ('GLU', ' OE1'): [1.60], ('GLU', ' OE2'): [1.60], ('GLU', ' OXT'): [1.60], ('GLX', ' C '): [1.8], ('GLX', ' CA '): [1.8], ('GLX', ' CB '): [1.8], ('GLX', ' CD '): [1.8], ('GLX', ' CG '): [1.8], ('GLX', ' H '): [0.0], ('GLX', ' HA '): [0.0], ('GLX', ' HB1'): [0.0], ('GLX', ' HB2'): [0.0], ('GLX', ' HD '): [0.0], ('GLX', ' HG1'): [0.0], ('GLX', ' HG2'): [0.0], ('GLX', ' N '): [1.65], ('GLX', ' O '): [1.60], ('GLX', ' XE1'): [1.65], ('GLX', ' XE2'): [1.65], ('GLY', ' C '): [1.8], ('GLY', ' CA '): [1.8], ('GLY', ' H '): [0.0], ('GLY', ' H2 '): [0.0], ('GLY', ' HA2'): [0.0], ('GLY', ' HA3'): [0.0], ('GLY', ' HXT'): [0.0], ('GLY', ' N '): [1.65], ('GLY', ' O '): [1.60], ('GLY', ' OXT'): [1.60], ('HIS', ' C '): [1.8], ('HIS', ' CA '): [1.8], ('HIS', ' CB '): [1.8], ('HIS', ' CD2'): [1.8], ('HIS', ' CE1'): [1.8], ('HIS', ' CG '): [1.8], ('HIS', ' H '): [0.0], ('HIS', ' H2 '): [0.0], ('HIS', ' HA '): [0.0], ('HIS', ' HB2'): [0.0], ('HIS', ' HB3'): [0.0], ('HIS', ' HD1'): [0.0], ('HIS', ' HD2'): [0.0], ('HIS', ' HE1'): [0.0], ('HIS', ' HE2'): [0.0], ('HIS', ' HXT'): [0.0], ('HIS', ' N '): [1.65], ('HIS', ' ND1'): [1.65], ('HIS', ' NE2'): [1.65], ('HIS', ' O '): [1.60], ('HIS', ' OXT'): [1.60], ('ILE', ' C '): [1.8], ('ILE', ' CA '): [1.8], ('ILE', ' CB '): [1.8], ('ILE', ' CD1'): [1.8], ('ILE', ' CG1'): [1.8], ('ILE', ' CG2'): [1.8], ('ILE', ' H 
'): [0.0], ('ILE', ' H2 '): [0.0], ('ILE', ' HA '): [0.0], ('ILE', ' HB '): [0.0], ('ILE', ' HXT'): [0.0], ('ILE', ' N '): [1.65], ('ILE', ' O '): [1.60], ('ILE', ' OXT'): [1.60], ('ILE', 'HD11'): [0.0], ('ILE', 'HD12'): [0.0], ('ILE', 'HD13'): [0.0], ('ILE', 'HG12'): [0.0], ('ILE', 'HG13'): [0.0], ('ILE', 'HG21'): [0.0], ('ILE', 'HG22'): [0.0], ('ILE', 'HG23'): [0.0], ('LEU', ' C '): [1.8], ('LEU', ' CA '): [1.8], ('LEU', ' CB '): [1.8], ('LEU', ' CD1'): [1.8], ('LEU', ' CD2'): [1.8], ('LEU', ' CG '): [1.8], ('LEU', ' H '): [0.0], ('LEU', ' H2 '): [0.0], ('LEU', ' HA '): [0.0], ('LEU', ' HB2'): [0.0], ('LEU', ' HB3'): [0.0], ('LEU', ' HG '): [0.0], ('LEU', ' HXT'): [0.0], ('LEU', ' N '): [1.65], ('LEU', ' O '): [1.60], ('LEU', ' OXT'): [1.60], ('LEU', 'HD11'): [0.0], ('LEU', 'HD12'): [0.0], ('LEU', 'HD13'): [0.0], ('LEU', 'HD21'): [0.0], ('LEU', 'HD22'): [0.0], ('LEU', 'HD23'): [0.0], ('LYS', ' C '): [1.8], ('LYS', ' CA '): [1.8], ('LYS', ' CB '): [1.8], ('LYS', ' CD '): [1.8], ('LYS', ' CE '): [1.8], ('LYS', ' CG '): [1.8], ('LYS', ' H '): [0.0], ('LYS', ' H2 '): [0.0], ('LYS', ' HA '): [0.0], ('LYS', ' HB2'): [0.0], ('LYS', ' HB3'): [0.0], ('LYS', ' HD2'): [0.0], ('LYS', ' HD3'): [0.0], ('LYS', ' HE2'): [0.0], ('LYS', ' HE3'): [0.0], ('LYS', ' HG2'): [0.0], ('LYS', ' HG3'): [0.0], ('LYS', ' HXT'): [0.0], ('LYS', ' HZ1'): [0.0], ('LYS', ' HZ2'): [0.0], ('LYS', ' HZ3'): [0.0], ('LYS', ' N '): [1.65], ('LYS', ' NZ '): [1.65], ('LYS', ' O '): [1.60], ('LYS', ' OXT'): [1.60], ('MET', ' C '): [1.8], ('MET', ' CA '): [1.8], ('MET', ' CB '): [1.8], ('MET', ' CE '): [1.8], ('MET', ' CG '): [1.8], ('MET', ' H '): [0.0], ('MET', ' H2 '): [0.0], ('MET', ' HA '): [0.0], ('MET', ' HB2'): [0.0], ('MET', ' HB3'): [0.0], ('MET', ' HE1'): [0.0], ('MET', ' HE2'): [0.0], ('MET', ' HE3'): [0.0], ('MET', ' HG2'): [0.0], ('MET', ' HG3'): [0.0], ('MET', ' HXT'): [0.0], ('MET', ' N '): [1.65], ('MET', ' O '): [1.60], ('MET', ' OXT'): [1.60], ('MET', ' SD '): [1.850], ('MSE', ' C '): 
[1.8], ('MSE', ' CA '): [1.8], ('MSE', ' CB '): [1.8], ('MSE', ' CE '): [1.8], ('MSE', ' CG '): [1.8], ('MSE', ' H '): [0.0], ('MSE', ' HA '): [0.0], ('MSE', ' HB2'): [0.0], ('MSE', ' HB3'): [0.0], ('MSE', ' HE1'): [0.0], ('MSE', ' HE2'): [0.0], ('MSE', ' HE3'): [0.0], ('MSE', ' HG2'): [0.0], ('MSE', ' HG3'): [0.0], ('MSE', ' HN2'): [0.0], ('MSE', ' HXT'): [0.0], ('MSE', ' N '): [1.65], ('MSE', ' O '): [1.60], ('MSE', ' OXT'): [1.60], ('MSE', 'SE '): [1.90], ('PHE', ' C '): [1.8], ('PHE', ' CA '): [1.8], ('PHE', ' CB '): [1.8], ('PHE', ' CD1'): [1.8], ('PHE', ' CD2'): [1.8], ('PHE', ' CE1'): [1.8], ('PHE', ' CE2'): [1.8], ('PHE', ' CG '): [1.8], ('PHE', ' CZ '): [1.8], ('PHE', ' H '): [0.0], ('PHE', ' H2 '): [0.0], ('PHE', ' HA '): [0.0], ('PHE', ' HB2'): [0.0], ('PHE', ' HB3'): [0.0], ('PHE', ' HD1'): [0.0], ('PHE', ' HD2'): [0.0], ('PHE', ' HE1'): [0.0], ('PHE', ' HE2'): [0.0], ('PHE', ' HXT'): [0.0], ('PHE', ' HZ '): [0.0], ('PHE', ' N '): [1.65], ('PHE', ' O '): [1.60], ('PHE', ' OXT'): [1.60], ('PRO', ' C '): [1.8], ('PRO', ' CA '): [1.8], ('PRO', ' CB '): [1.8], ('PRO', ' CD '): [1.8], ('PRO', ' CG '): [1.8], ('PRO', ' H '): [0.0], ('PRO', ' HA '): [0.0], ('PRO', ' HB2'): [0.0], ('PRO', ' HB3'): [0.0], ('PRO', ' HD2'): [0.0], ('PRO', ' HD3'): [0.0], ('PRO', ' HG2'): [0.0], ('PRO', ' HG3'): [0.0], ('PRO', ' HXT'): [0.0], ('PRO', ' N '): [1.65], ('PRO', ' O '): [1.60], ('PRO', ' OXT'): [1.60], ('SEC', ' C '): [1.8], ('SEC', ' CA '): [1.8], ('SEC', ' CB '): [1.8], ('SEC', ' H '): [0.0], ('SEC', ' H2 '): [0.0], ('SEC', ' HA '): [0.0], ('SEC', ' HB2'): [0.0], ('SEC', ' HB3'): [0.0], ('SEC', ' HD2'): [0.0], ('SEC', ' HXT'): [0.0], ('SEC', ' N '): [1.65], ('SEC', ' O '): [1.60], ('SEC', ' OD1'): [1.60], ('SEC', ' OD2'): [1.60], ('SEC', ' OXT'): [1.60], ('SEC', ' SEG'): [1.85], ('SER', ' C '): [1.8], ('SER', ' CA '): [1.8], ('SER', ' CB '): [1.8], ('SER', ' H '): [0.0], ('SER', ' H2 '): [0.0], ('SER', ' HA '): [0.0], ('SER', ' HB2'): [0.0], ('SER', ' HB3'): [0.0], 
('SER', ' HG '): [0.0], ('SER', ' HXT'): [0.0], ('SER', ' N '): [1.65], ('SER', ' O '): [1.60], ('SER', ' OG '): [1.60], ('SER', ' OXT'): [1.60], ('THR', ' C '): [1.8], ('THR', ' CA '): [1.8], ('THR', ' CB '): [1.8], ('THR', ' CG2'): [1.8], ('THR', ' H '): [0.0], ('THR', ' H2 '): [0.0], ('THR', ' HA '): [0.0], ('THR', ' HB '): [0.0], ('THR', ' HG1'): [0.0], ('THR', ' HXT'): [0.0], ('THR', ' N '): [1.65], ('THR', ' O '): [1.60], ('THR', ' OG1'): [1.60], ('THR', ' OXT'): [1.60], ('THR', 'HG21'): [0.0], ('THR', 'HG22'): [0.0], ('THR', 'HG23'): [0.0], ('TRP', ' C '): [1.8], ('TRP', ' CA '): [1.8], ('TRP', ' CB '): [1.8], ('TRP', ' CD1'): [1.8], ('TRP', ' CD2'): [1.8], ('TRP', ' CE2'): [1.8], ('TRP', ' CE3'): [1.8], ('TRP', ' CG '): [1.8], ('TRP', ' CH2'): [1.8], ('TRP', ' CZ2'): [1.8], ('TRP', ' CZ3'): [1.8], ('TRP', ' H '): [0.0], ('TRP', ' H2 '): [0.0], ('TRP', ' HA '): [0.0], ('TRP', ' HB2'): [0.0], ('TRP', ' HB3'): [0.0], ('TRP', ' HD1'): [0.0], ('TRP', ' HE1'): [0.0], ('TRP', ' HE3'): [0.0], ('TRP', ' HH2'): [0.0], ('TRP', ' HXT'): [0.0], ('TRP', ' HZ2'): [0.0], ('TRP', ' HZ3'): [0.0], ('TRP', ' N '): [1.65], ('TRP', ' NE1'): [1.65], ('TRP', ' O '): [1.60], ('TRP', ' OXT'): [1.60], ('TYR', ' C '): [1.8], ('TYR', ' CA '): [1.8], ('TYR', ' CB '): [1.8], ('TYR', ' CD1'): [1.8], ('TYR', ' CD2'): [1.8], ('TYR', ' CE1'): [1.8], ('TYR', ' CE2'): [1.8], ('TYR', ' CG '): [1.8], ('TYR', ' CZ '): [1.8], ('TYR', ' H '): [0.0], ('TYR', ' H2 '): [0.0], ('TYR', ' HA '): [0.0], ('TYR', ' HB2'): [0.0], ('TYR', ' HB3'): [0.0], ('TYR', ' HD1'): [0.0], ('TYR', ' HD2'): [0.0], ('TYR', ' HE1'): [0.0], ('TYR', ' HE2'): [0.0], ('TYR', ' HH '): [0.0], ('TYR', ' HXT'): [0.0], ('TYR', ' N '): [1.65], ('TYR', ' O '): [1.60], ('TYR', ' OH '): [1.60], ('TYR', ' OXT'): [1.60], ('UNK', ' C '): [1.8], ('UNK', ' CA '): [1.8], ('UNK', ' CB '): [1.8], ('UNK', ' CG '): [1.8], ('UNK', ' H '): [0.0], ('UNK', ' H2 '): [0.0], ('UNK', ' HA '): [0.0], ('UNK', ' HB1'): [0.0], ('UNK', ' HB2'): [0.0], ('UNK', 
' HG1'): [0.0], ('UNK', ' HG2'): [0.0], ('UNK', ' HG3'): [0.0], ('UNK', ' HXT'): [0.0], ('UNK', ' N '): [1.65], ('UNK', ' O '): [1.60], ('UNK', ' OXT'): [1.60], ('VAL', ' C '): [1.8], ('VAL', ' CA '): [1.8], ('VAL', ' CB '): [1.8], ('VAL', ' CG1'): [1.8], ('VAL', ' CG2'): [1.8], ('VAL', ' H '): [0.0], ('VAL', ' H2 '): [0.0], ('VAL', ' HA '): [0.0], ('VAL', ' HB '): [0.0], ('VAL', ' HXT'): [0.0], ('VAL', ' N '): [1.65], ('VAL', ' O '): [1.60], ('VAL', ' OXT'): [1.60], ('VAL', 'HG11'): [0.0], ('VAL', 'HG12'): [0.0], ('VAL', 'HG13'): [0.0], ('VAL', 'HG21'): [0.0], ('VAL', 'HG22'): [0.0], ('VAL', 'HG23'): [0.0]} #AREAIMOL_VDW_RADII AREAIMOL_VDW_RADII = dict([(k, v[0]) for k, v in AA_ATOM_PROPERTIES.iteritems()]) DEFAULT_AREAIMOL_VDW_RADIUS = 1.7 PyCogent-1.5.3/cogent/core/__init__.py000644 000765 000024 00000001145 12024702176 020562 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python __all__ = ['alignment', 'alphabet', 'annotation', 'bitvector', 'entity', 'genetic_code', 'info', 'location', 'moltype', 'profile', 'sequence', 'tree', 'usage'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Gavin Huttley", "Rob Knight", "Sandra Smit", "Peter Maxwell", "Matthew Wakefield", "Greg Caporaso", "Marcin Cieslik" ] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" PyCogent-1.5.3/cogent/core/alignment.py000644 000765 000024 00000326543 12024702176 021015 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Code for handling multiple sequence alignments. In particular: - SequenceCollection handles both aligned and unaligned sequences. - Alignment and its subclasses handle multiple sequence alignments, storing the raw sequences and a gap map. Useful for very long alignments, e.g. genomics data. - DenseAlignment and its subclasses handle multiple sequence alignments as arrays of characters. 
Especially useful for short alignments that contain many sequences. WARNING: The various alignment objects try to guess the input type from the input, but this behavior has a few quirks. In particular, if the input is a sequence of two-item sequences (e.g. a list of two-character strings), each sequence will be unpacked and the first item will be used as the label, the second as the sequence. For example, Alignment(['AA','CC','AA']) produces an alignment of three 1-character strings labeled A, C and A respectively. The reason for this is that the common case is that you have passed in a stream of two-item label, sequence pairs. However, this can cause confusion when testing. """ from __future__ import division from types import GeneratorType from cogent.core.annotation import Map, _Annotatable import cogent #will use to get at cogent.parse.fasta.MinimalFastaParser, #which is a circular import otherwise. from cogent.format.alignment import save_to_filename from cogent.core.info import Info as InfoClass from cogent.core.sequence import frac_same, ModelSequence from cogent.maths.stats.util import Freqs from cogent.format.fasta import fasta_from_alignment from cogent.format.phylip import phylip_from_alignment from cogent.format.nexus import nexus_from_alignment from cogent.parse.gff import GffParser, parse_attributes from numpy import nonzero, array, logical_or, logical_and, logical_not, \ transpose, arange, zeros, ones, take, put, uint8, ndarray from numpy.random import randint, permutation from cogent.util.dict2d import Dict2D from copy import copy from cogent.core.profile import Profile __author__ = "Peter Maxwell and Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Rob Knight", "Gavin Huttley", "Jeremy Widmann", "Catherine Lozupone", "Matthew Wakefield", "Micah Hamady", "Daniel McDonald", "Jan Kosinski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" 
__status__ = "Production"

class DataError(Exception):
    pass

eps = 1e-6  #small number: 1-eps is almost 1, and is used for things like the
            #default number of gaps to allow in a column.

def assign_sequential_names(ignored, num_seqs, base_name='seq', start_at=0):
    """Returns list of num_seqs sequential, unique names.

    First argument is ignored; expect this to be set as a class attribute.
    """
    return ['%s_%s' % (base_name, i) for i in range(start_at, start_at+num_seqs)]

class SeqLabeler(object):
    """Allows flexible seq labeling in toFasta()."""

    def __init__(self, aln, label_f=assign_sequential_names, **kwargs):
        """Initializes a new seq labeler."""
        self._aln = aln
        self._label_f = label_f
        self._map = dict(zip(aln.Names, label_f(len(aln.Names), **kwargs)))

    def __call__(self, s):
        """Returns seq name from seq id"""
        return self._map[s.Name]

def coerce_to_string(s):
    """Converts an arbitrary sequence into a string."""
    if isinstance(s, str):      #if it's a string, OK as is
        return s
    if isinstance(s, Aligned):  #if it's an Aligned object, convert to string
        return str(s)
    curr = str(s)               #if its string is the same length, return that
    if len(curr) == len(s):
        return curr
    try:
        return ''.join(s)       #assume it's a seq of chars
    except (TypeError, ValueError):
        return ''.join(map(str, s))  #general case (slow, might not be correct)

def seqs_from_array(a, Alphabet=None):
    """SequenceCollection from array of pos x seq: names are integers.

    This is an InputHandler for SequenceCollection. It converts an arbitrary
    array of numbers into Sequence objects using seq_constructor, and leaves
    the sequences unlabeled.
    """
    return list(transpose(a)), None

def seqs_from_model_seqs(seqs, Alphabet=None):
    """Alignment from ModelSequence objects: seqs -> array, names from seqs.

    This is an InputHandler for SequenceCollection. It converts a list of
    Sequence objects with _data and Name properties into a SequenceCollection
    that uses those sequences.
    """
    return seqs, [s.Name for s in seqs]

def seqs_from_generic(seqs, Alphabet=None):
    """SequenceCollection from generic seq x pos data: seq of seqs of chars.

    This is an InputHandler for SequenceCollection. It converts a generic list
    (each item in the list will be mapped onto an object using
    seq_constructor) and assigns sequential integers (0-based) as names.
    """
    names = []
    for s in seqs:
        if hasattr(s, 'Name'):
            names.append(s.Name)
        else:
            names.append(None)
    return seqs, names

def seqs_from_fasta(seqs, Alphabet=None):
    """SequenceCollection from FASTA-format string or lines.

    This is an InputHandler for SequenceCollection. It converts a FASTA-format
    string or collection of lines into a SequenceCollection object, preserving
    order.
    """
    if isinstance(seqs, str):
        seqs = seqs.splitlines()
    names, seqs = zip(*list(cogent.parse.fasta.MinimalFastaParser(seqs)))
    return list(seqs), list(names)

def seqs_from_dict(seqs, Alphabet=None):
    """SequenceCollection from dict of {label:seq_as_str}.

    This is an InputHandler for SequenceCollection. It converts a dict in
    which the keys are the names and the values are the sequences (sequence
    only, no whitespace or other formatting) into a SequenceCollection.
    Because the dict doesn't preserve order, the result will not necessarily
    be in alphabetical order.
    """
    names, seqs = map(list, zip(*seqs.items()))
    return seqs, names

def seqs_from_kv_pairs(seqs, Alphabet=None):
    """SequenceCollection from list of (key, val) pairs.

    This is an InputHandler for SequenceCollection. It converts a list of
    (name, sequence) pairs (sequence only, no whitespace or other formatting)
    into a SequenceCollection. Because the list preserves order, the result
    will be in the same order as the input pairs.
    """
    names, seqs = map(list, zip(*seqs))
    return seqs, names

def seqs_from_aln(seqs, Alphabet=None):
    """SequenceCollection from existing SequenceCollection object: copies data.

    This is relatively inefficient: you should really use the copy() method
    instead, which duplicates the internal data structures.
    """
    return seqs.Seqs, seqs.Names

def seqs_from_empty(obj, *args, **kwargs):
    """SequenceCollection from empty data: raise exception."""
    raise ValueError, "Cannot create empty SequenceCollection."

class SequenceCollection(object):
    """Base class for Alignment, but also just stores unaligned seqs.

    Handles shared functionality: detecting the input type, writing out the
    sequences as different formats, translating the sequences, chopping off
    stop codons, looking up sequences by name, etc.

    A SequenceCollection must support:

    - input handlers for different data types
    - SeqData: behaves like list of lists of chars, holds seq data
    - Seqs: behaves like list of Sequence objects, iterable in name order
    - Names: behaves like list of names for the sequence objects
    - NamedSeqs: behaves like dict of {name:seq}
    - MolType: specifies what kind of sequences are in the collection
    """
    InputHandlers = {
        'array': seqs_from_array,
        'model_seqs': seqs_from_model_seqs,
        'generic': seqs_from_generic,
        'fasta': seqs_from_fasta,
        'collection': seqs_from_aln,
        'aln': seqs_from_aln,
        'dense_aln': seqs_from_aln,
        'dict': seqs_from_dict,
        'empty': seqs_from_empty,
        'kv_pairs': seqs_from_kv_pairs,
        }

    IsArray = set(['array', 'model_seqs'])

    DefaultNameFunction = assign_sequential_names

    def __init__(self, data, Names=None, Alphabet=None, MolType=None, \
                 Name=None, Info=None, conversion_f=None, is_array=False, \
                 force_same_data=False, remove_duplicate_names=False, \
                 label_to_name=None, suppress_named_seqs=False):
        """Initialize self with data and optionally Info.

        We are always going to convert to characters, so Sequence objects
        in the collection will lose additional special attributes they have.
        This is somewhat inefficient, so it might be worth revisiting this
        decision later.

        The handling of sequence names requires special attention.
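        The naming behavior described in this docstring can be sketched with
        plain Python (an illustrative stand-in, not this class's internal
        code):

```python
# Illustrative sketch of the two naming modes described in the
# SequenceCollection.__init__ docstring (plain Python, not cogent code).
unlabeled = ['AAC', 'GGT']            # data carries no names
names = ['b', 'a']

# Mode 1: data carries no names -> Names are handed out in order.
labeled = dict(zip(names, unlabeled))     # {'b': 'AAC', 'a': 'GGT'}

# Mode 2: data carries names (e.g. a dict) -> Names only sets the order;
# the sequences are not relabeled.
data = {'a': 'GGT', 'b': 'AAC'}
ordered = [(n, data[n]) for n in names]   # [('b', 'AAC'), ('a', 'GGT')]
```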
        Depending on the input data, we might get the names from the
        sequences themselves, or we might add them from Names that are
        passed in. However, the Names attribute controls the order that we
        examine the sequences in, so if it is passed in it should override
        the order that we got from the input data (e.g. you might pass in
        unlabeled sequences with the names ['b','a'], so that you want the
        first sequence to be called 'b' and the second to be called 'a', or
        you might pass in labeled sequences, e.g. as a dict, and the names
        ['b','a'], indicating that you want the sequence called b to be
        first and the sequence called a to be second despite the fact that
        they are in arbitrary order in the original input). In this second
        situation, it is important that the sequences not be relabeled.

        This is handled as follows. If the sequences are passed in using a
        method that does not carry the names with it, the Names that are
        passed in will be handed out to successive sequences. If the
        sequences are passed in using a method that does carry the names
        with it, the Names that are passed in will be used to order the
        sequences, but they will not be relabeled. Note that if you're
        passing in a data type that is already labeled (e.g. a list of
        Sequence objects) you _must_ have unique names beforehand.

        It's possible that this additional handling should be moved to a
        separate object; the motivation for having it on Alignment __init__
        is that it's easy for users to construct Alignment objects directly.

        Parameters:

        data:           Data to convert into a SequenceCollection
        Names:          Order of Names in the alignment. Should match the
                        names of the sequences (after processing by
                        label_to_name if present).
        Alphabet:       Alphabet to use for the alignment (primarily
                        important for DenseAlignment)
        MolType:        MolType to be applied to the Alignment and to each
                        seq.
        Name:           Name of the SequenceCollection.
        Info:           Info object to be attached to the alignment itself.
        conversion_f:   Function to convert string into sequence.
        is_array:       True if input is an array, False otherwise.
        force_same_data: True if data will be used as the same object.
        remove_duplicate_names: True if duplicate names are to be silently
                        deleted instead of raising errors.
        label_to_name:  if present, converts name into f(name).
        """
        #read all the data in if we were passed a generator
        if isinstance(data, GeneratorType):
            data = list(data)
        #set the Name
        self.Name = Name
        #figure out alphabet and moltype
        self.Alphabet, self.MolType = \
            self._get_alphabet_and_moltype(Alphabet, MolType, data)
        if not isinstance(Info, InfoClass):
            if Info:
                Info = InfoClass(Info)
            else:
                Info = InfoClass()
        self.Info = Info
        #if we're forcing the same data, skip the validation
        if force_same_data:
            self._force_same_data(data, Names)
            curr_seqs = data
        #otherwise, figure out what we got and coerce it into the right type
        else:
            per_seq_names, curr_seqs, name_order = \
                self._names_seqs_order(conversion_f, data, Names, is_array, \
                                       label_to_name, remove_duplicate_names, \
                                       Alphabet=self.Alphabet)
            self.Names = name_order
            #will take only the seqs and names that are in name_order
            if per_seq_names != name_order:
                good_indices = []
                for n in name_order:
                    good_indices.append(per_seq_names.index(n))
                if hasattr(curr_seqs, 'astype'):  #it's an array
                    #much faster to check than to raise exception in this case
                    curr_seqs = take(curr_seqs, good_indices, axis=0)
                else:
                    curr_seqs = [curr_seqs[i] for i in good_indices]
                per_seq_names = name_order
            #create NamedSeqs dict for fast lookups
            if not suppress_named_seqs:
                self.NamedSeqs = self._make_named_seqs(self.Names, curr_seqs)
        #Sequence objects behave like sequences of chars, so no difference
        #between Seqs and SeqData. Note that this differs for Alignments,
        #so be careful which you use if writing methods that should work for
        #both SequenceCollections and Alignments.
        self._set_additional_attributes(curr_seqs)

    def __str__(self):
        """Returns self in FASTA-format, respecting name order."""
        return ''.join(['>%s\n%s\n' % (name, self.getGappedSeq(name))
                        for name in self.Names])

    def _make_named_seqs(self, names, seqs):
        """Returns NamedSeqs: dict of name:seq."""
        name_seq_tuples = zip(names, seqs)
        for n, s in name_seq_tuples:
            s.Name = n
        return dict(name_seq_tuples)

    def _set_additional_attributes(self, curr_seqs):
        """Sets additional attributes based on current seqs: class-specific."""
        self.SeqData = curr_seqs
        self._seqs = curr_seqs
        try:
            self.SeqLen = max(map(len, curr_seqs))
        except ValueError:  #got empty sequence, for some reason?
            self.SeqLen = 0

    def _force_same_data(self, data, Names):
        """Forces dict that was passed in to be used as self.NamedSeqs"""
        self.NamedSeqs = data
        self.Names = Names or data.keys()

    def copy(self):
        """Returns deep copy of self."""
        result = self.__class__(self, MolType=self.MolType, Info=self.Info)
        return result

    def _get_alphabet_and_moltype(self, Alphabet, MolType, data):
        """Returns Alphabet and MolType, giving MolType precedence."""
        if Alphabet is None and MolType is None:
            if hasattr(data, 'MolType'):
                MolType = data.MolType
            elif hasattr(data, 'Alphabet'):
                Alphabet = data.Alphabet
            #check for containers
            else:
                curr_item = self._get_container_item(data)
                if hasattr(curr_item, 'MolType'):
                    MolType = curr_item.MolType
                elif hasattr(curr_item, 'Alphabet'):
                    Alphabet = curr_item.Alphabet
                else:
                    MolType = self.MolType  #will be BYTES by default
        if Alphabet is not None and MolType is None:
            MolType = Alphabet.MolType
        if MolType is not None and Alphabet is None:
            try:
                Alphabet = MolType.Alphabets.DegenGapped
            except AttributeError:
                Alphabet = MolType.Alphabet
        return Alphabet, MolType

    def _get_container_item(self, data):
        """Checks container for item with Alphabet or MolType"""
        curr_item = None
        if hasattr(data, 'itervalues'):
            curr_item = data.itervalues().next()
        else:
            try:
                curr_item = iter(data).next()
            except:
                pass
        return curr_item

    def _strip_duplicates(self, names, seqs):
        """Internal function to strip duplicates from list of names"""
        if len(set(names)) == len(names):
            return set(), names, seqs
        #if we got here, there are duplicates
        unique_names = {}
        duplicates = {}
        fixed_names = []
        fixed_seqs = []
        for n, s in zip(names, seqs):
            if n in unique_names:
                duplicates[n] = 1
            else:
                unique_names[n] = 1
                fixed_names.append(n)
                fixed_seqs.append(s)
        if type(seqs) is ndarray:
            fixed_seqs = array(fixed_seqs, seqs.dtype)
        return duplicates, fixed_names, fixed_seqs

    def _names_seqs_order(self, conversion_f, data, Names, is_array, \
                          label_to_name, remove_duplicate_names, Alphabet=None):
        """Internal function to figure out names, seqs, and name_order."""
        #figure out conversion function and whether it's an array
        if not conversion_f:
            input_type = self._guess_input_type(data)
            is_array = input_type in self.IsArray
            conversion_f = self.InputHandlers[input_type]
        #set seqs and names as properties
        if Alphabet:
            seqs, names = conversion_f(data, Alphabet=Alphabet)
        else:
            seqs, names = conversion_f(data)
        if names and label_to_name:
            names = map(label_to_name, names)
        curr_seqs = self._coerce_seqs(seqs, is_array)
        #if no names were passed in as Names, if we obtained them from
        #the seqs we should use them, but otherwise we should use the
        #default names
        if Names is None:
            if (names is None) or (None in names):
                per_seq_names = name_order = \
                    self.DefaultNameFunction(len(curr_seqs))
            else:  #got names from seqs
                per_seq_names = name_order = names
        else:
            #otherwise, names were passed in as Names: use this as the order
            #if we got names from the sequences, but otherwise assign the
            #names to successive sequences in order
            if (names is None) or (None in names):
                per_seq_names = name_order = Names
            else:  #got names from seqs, so assume name_order is in Names
                per_seq_names = names
                name_order = Names
        #check for duplicate names
        duplicates, fixed_names, fixed_seqs = \
            self._strip_duplicates(per_seq_names, curr_seqs)
        if duplicates:
            if remove_duplicate_names:
                per_seq_names, curr_seqs = fixed_names, fixed_seqs
                #if name_order doesn't have the same names as per_seq_names,
                #replace it with per_seq_names
                if (set(name_order) != set(per_seq_names)) or \
                   (len(name_order) != len(per_seq_names)):
                    name_order = per_seq_names
            else:
                raise ValueError, \
                    "Some names were not unique. Duplicates are:\n" + \
                    str(sorted(duplicates.keys()))
        return per_seq_names, curr_seqs, name_order

    def _coerce_seqs(self, seqs, is_array):
        """Controls how seqs are coerced in _names_seqs_order.

        Override in subclasses where this behavior should differ.
        """
        if is_array:
            seqs = map(str, map(self.MolType.ModelSeq, seqs))
        return map(self.MolType.Sequence, seqs)

    def _guess_input_type(self, data):
        """Guesses input type of data; returns result as key of InputHandlers.

        First checks whether data is an Alignment, then checks for some
        common string formats, then tries to do it based on string or array
        properties. Returns 'empty' if check fails, i.e. if it can't
        recognize the sequence as a specific type. Note that bad sequences
        are not guaranteed to return 'empty', and may be recognized as
        another type incorrectly.
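        The dispatch order can be mimicked with a simplified, self-contained
        sketch (illustrative only; the real method below also recognizes the
        alignment classes and numpy arrays):

```python
# Simplified mimic of the input-type guessing described above.
# This is an illustrative stand-in, not the module's actual method.
def guess_input_type(data):
    if isinstance(data, dict):
        return 'dict'
    if isinstance(data, str):
        return 'fasta' if data.startswith('>') else 'generic'
    try:
        first = next(iter(data))
    except (TypeError, StopIteration):
        return 'empty'
    if isinstance(first, str) and first.startswith('>'):
        return 'fasta'
    try:
        dict(data)          # (name, seq) pairs coerce cleanly to a dict
        return 'kv_pairs'
    except (TypeError, ValueError):
        return 'generic'
```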
        """
        if isinstance(data, DenseAlignment):
            return 'dense_aln'
        if isinstance(data, Alignment):
            return 'aln'
        if isinstance(data, SequenceCollection):
            return 'collection'
        if isinstance(data, dict):
            return 'dict'
        if isinstance(data, str):
            if data.startswith('>'):
                return 'fasta'
            else:
                return 'generic'
        first = None
        try:
            first = data[0]
        except (IndexError, TypeError):
            pass
        try:
            first = iter(data).next()
        except (IndexError, TypeError, StopIteration):
            pass
        if first is None:
            return 'empty'
        try:
            if isinstance(first, ModelSequence):  #model sequence base type
                return 'model_seqs'
            elif hasattr(first, 'dtype'):  #array object
                return 'array'
            elif isinstance(first, str) and first.startswith('>'):
                return 'fasta'
            else:
                try:
                    dict(data)
                    return 'kv_pairs'
                except (TypeError, ValueError):
                    pass
                return 'generic'
        except (IndexError, TypeError), e:
            return 'empty'

    def __cmp__(self, other):
        """cmp first tests as dict, then as str."""
        c = cmp(self.NamedSeqs, other)
        if not c:
            return 0
        else:
            return cmp(str(self), str(other))

    def keys(self):
        """keys uses self.Names, which defaults to known keys if None.

        Note: returns copy, not original.
        """
        return self.Names[:]

    def values(self):
        """values returns values corresponding to self.Names."""
        return [self.NamedSeqs[n] for n in self.Names]

    def items(self):
        """items returns (name, value) pairs."""
        return [(n, self.NamedSeqs[n]) for n in self.Names]

    def iterSeqs(self, seq_order=None):
        """Iterates over values (sequences) in the alignment, in order.

        seq_order: list of keys giving the order in which seqs will be
        returned. Defaults to self.Names. Note that only these sequences
        will be returned, and that KeyError will be raised if there are
        sequences in order that have been deleted from the Alignment. If
        self.Names is None, returns the sequences in the same order as
        self.NamedSeqs.values().

        Use map(f, self.seqs()) to apply the constructor f to each seq. f
        must accept a single list as an argument.

        Always returns references to the same objects that are values of the
        alignment.
        """
        ns = self.NamedSeqs
        get = ns.__getitem__
        for key in seq_order or self.Names:
            yield get(key)

    def _take_seqs(self):
        return list(self.iterSeqs())

    Seqs = property(_take_seqs)  #access as attribute if using default order.

    def takeSeqs(self, seqs, negate=False, **kwargs):
        """Returns new Alignment containing only specified seqs.

        Note that the seqs in the new alignment will be references to the
        same objects as the seqs in the old alignment.
        """
        get = self.NamedSeqs.__getitem__
        result = {}
        if 'MolType' not in kwargs:
            kwargs['MolType'] = self.MolType
        if negate:
            #copy everything except the specified seqs
            negated_names = []
            row_lookup = dict.fromkeys(seqs)
            for r, row in self.NamedSeqs.items():
                if r not in row_lookup:
                    result[r] = row
                    negated_names.append(r)
            seqs = negated_names  #remember to invert the list of names
        else:
            #copy only the specified seqs
            for r in seqs:
                result[r] = get(r)
        if result:
            return self.__class__(result, Names=seqs, **kwargs)
        else:
            return {}  #safe value; can't construct empty alignment

    def getSeqIndices(self, f, negate=False):
        """Returns list of keys of seqs where f(row) is True.

        List will be in the same order as self.Names, if present.
        """
        get = self.NamedSeqs.__getitem__
        #negate function if necessary
        if negate:
            new_f = lambda x: not f(x)
        else:
            new_f = f
        #get all the seqs where the function is True
        return [key for key in self.Names if new_f(get(key))]

    def takeSeqsIf(self, f, negate=False, **kwargs):
        """Returns new Alignment containing seqs where f(row) is True.

        Note that the seqs in the new Alignment are the same objects as the
        seqs in the old Alignment, not copies.
        """
        #pass negate to get SeqIndices
        return self.takeSeqs(self.getSeqIndices(f, negate), **kwargs)

    def iterItems(self, seq_order=None, pos_order=None):
        """Iterates over elements in the alignment.

        seq_order (names) can be used to select a subset of seqs.

        pos_order (positions) can be used to select a subset of positions.

        Always iterates along a seq first, then down a position (transposes
        normal order of a[i][j]; possibly, this should change).

        WARNING: Alignment.iterItems() is not the same as
        alignment.iteritems() (which is the built-in dict iteritems that
        iterates over key-value pairs).
        """
        if pos_order:
            for row in self.iterSeqs(seq_order):
                for i in pos_order:
                    yield row[i]
        else:
            for row in self.iterSeqs(seq_order):
                for i in row:
                    yield i

    Items = property(iterItems)

    def getItems(self, items, negate=False):
        """Returns list containing only specified items.

        items should be a list of (row_key, col_key) tuples.
        """
        get = self.NamedSeqs.__getitem__
        if negate:
            #have to cycle through every item and check that it's not in
            #the list of items to return
            item_lookup = dict.fromkeys(map(tuple, items))
            result = []
            for r in self.Names:
                curr_row = get(r)
                for c in range(len(curr_row)):
                    if (r, c) not in items:
                        result.append(curr_row[c])
            return result
        #otherwise, just pick the selected items out of the list
        else:
            return [get(row)[col] for row, col in items]

    def getItemIndices(self, f, negate=False):
        """Returns list of (key,val) tuples where f(self.NamedSeqs[key][val])."""
        get = self.NamedSeqs.__getitem__
        if negate:
            new_f = lambda x: not f(x)
        else:
            new_f = f
        result = []
        for row_label in self.Names:
            curr_row = get(row_label)
            for col_idx, item in enumerate(curr_row):
                if new_f(item):
                    result.append((row_label, col_idx))
        return result

    def getItemsIf(self, f, negate=False):
        """Returns list of items where f(self.NamedSeqs[row][col]) is True."""
        return self.getItems(self.getItemIndices(f, negate))

    def getSimilar(self, target, min_similarity=0.0, max_similarity=1.0, \
                   metric=frac_same, transform=None):
        """Returns new Alignment containing sequences similar to target.

        target: sequence object to compare to. Can be in the alignment.

        min_similarity: minimum similarity that will be kept. Default 0.0.

        max_similarity: maximum similarity that will be kept. Default 1.0.

        (Note that both min_similarity and max_similarity are inclusive.)

        metric: similarity function to use. Must be f(first_seq, second_seq).
        The default metric is fraction similarity, ranging from 0.0 (0%
        identical) to 1.0 (100% identical). The Sequence classes have lots
        of methods that can be passed in as unbound methods to act as the
        metric, e.g. fracSameGaps.

        transform: transformation function to use on the sequences before
        the metric is calculated. If None, uses the whole sequences in each
        case. A frequent transformation is a function that returns a
        specified range of a sequence, e.g. eliminating the ends. Note that
        the transform applies to both the real sequence and the target
        sequence.

        WARNING: if the transformation changes the type of the sequence
        (e.g. extracting a string from an RnaSequence object), distance
        metrics that depend on instance data of the original class may fail.
        """
        if transform:
            target = transform(target)
        m = lambda x: metric(target, x)
        if transform:
            def f(x):
                result = m(transform(x))
                return min_similarity <= result <= max_similarity
        else:
            def f(x):
                result = m(x)
                return min_similarity <= result <= max_similarity
        return self.takeSeqsIf(f)

    def distanceMatrix(self, f):
        """Returns Matrix containing pairwise distances between sequences.

        f is the distance function f(x,y) -> distance between x and y.

        It's often useful to pass an unbound method in as f.

        Does not assume that f(x,y) == f(y,x) or that f(x,x) == 0.
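        The pairwise loop can be sketched with a self-contained stand-in
        (the function names and the simple mismatch-fraction metric here are
        illustrative, not part of this module):

```python
# Stdlib-only sketch of a pairwise distance matrix over a dict of
# name -> sequence string, with a simple mismatch-fraction metric in
# place of a Sequence method. Illustrative only.
def mismatch_frac(x, y):
    # fraction of positions that differ (assumes equal lengths)
    return sum(a != b for a, b in zip(x, y)) / float(len(x))

def distance_matrix(named_seqs, f):
    result = {}
    for i in named_seqs:
        for j in named_seqs:
            result.setdefault(i, {})[j] = f(named_seqs[i], named_seqs[j])
    return result

d = distance_matrix({'a': 'AAC', 'b': 'AAT'}, mismatch_frac)
# d['a']['a'] == 0.0; d['a']['b'] is 1/3.
```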
        """
        get = self.NamedSeqs.__getitem__
        seqs = self.NamedSeqs.keys()
        result = Dict2D()
        for i in seqs:
            for j in seqs:
                d = f(get(i), get(j))
                if i not in result:
                    result[i] = {}
                if j not in result:
                    result[j] = {}
                result[i][j] = d
                result[j][i] = d
        return result

    def isRagged(self):
        """Returns True if alignment has sequences of different lengths."""
        seqs = self.Seqs            #Get all sequences in alignment
        length = len(seqs[0])       #Get length of first sequence
        for seq in seqs:
            #If lengths differ
            if length != len(seq):
                return True
        #lengths were all equal
        return False

    def toPhylip(self, generic_label=True, make_seqlabel=None):
        """Return alignment in PHYLIP format and mapping to sequence ids.

        Raises exception if invalid alignment.

        Arguments:
            - make_seqlabel: callback function that takes the seq object and
              returns a label str
        """
        return phylip_from_alignment(self, generic_label=generic_label,
                                     make_seqlabel=make_seqlabel)

    def toFasta(self, make_seqlabel=None):
        """Return alignment in Fasta format.

        Arguments:
            - make_seqlabel: callback function that takes the seq object and
              returns a label str
        """
        return fasta_from_alignment(self, make_seqlabel=make_seqlabel)

    def toNexus(self, seq_type, interleave_len=50):
        """Return alignment in NEXUS format and mapping to sequence ids.

        **NOTE** Note that every sequence in the alignment MUST come from a
        different species!! (You can concatenate multiple sequences from the
        same species together before building the tree.)

        seq_type: dna, rna, or protein

        Raises exception if invalid alignment.
        """
        return nexus_from_alignment(self, seq_type,
                                    interleave_len=interleave_len)

    def getIntMap(self, prefix='seq_'):
        """Returns a dict with names mapped to enumerated integer names.

        - prefix: prefix for sequence label. Default = 'seq_'
        - int_keys is a dict mapping int names to sorted original names.
        """
        get = self.NamedSeqs.__getitem__
        int_keys = dict([(prefix+str(i), k) for i, k in \
                         enumerate(sorted(self.NamedSeqs.keys()))])
        int_map = dict([(k, copy(get(v))) for k, v in int_keys.items()])
        return int_map, int_keys

    def getNumSeqs(self):
        """Returns the number of sequences in the alignment."""
        return len(self.NamedSeqs)

    def copyAnnotations(self, unaligned):
        """Copies annotations from seqs in unaligned to self, matching by name.

        Alignment programs like ClustalW don't preserve annotations, so this
        method is available to copy annotations off the unaligned sequences.

        unaligned should be a dictionary of Sequence instances.

        Ignores sequences that are not in self, so safe to use on larger
        dict of seqs that are not in the current collection/alignment.
        """
        for name, seq in unaligned.items():
            if name in self.NamedSeqs:
                self.NamedSeqs[name].copyAnnotations(seq)

    def annotateFromGff(self, f):
        """Copies annotations from gff-format file to self.

        Matches by name of sequence. This method expects a file handle, not
        the name of a file.

        Skips sequences in the file that are not in self.
        """
        for (name, source, feature, start, end, score, strand, frame,
             attributes, comments) in GffParser(f):
            if name in self.NamedSeqs:
                self.NamedSeqs[name].addFeature(
                    feature, parse_attributes(attributes), [(start, end)])

    def replaceSeqs(self, seqs):
        """Returns new alignment with same shape but with data taken from seqs.

        Primary use is for aligning codons from protein alignment, or, more
        generally, substituting in codons from a set of protein sequences
        (not necessarily aligned). For this reason, it takes characters from
        seqs three at a time rather than one at a time (i.e. 3 characters in
        seqs are put in place of 1 character in self).

        If seqs is an alignment, any gaps in it will be ignored.
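        The 3-for-1 substitution can be illustrated with a stdlib-only
        sketch that projects the gap pattern of an aligned protein onto its
        ungapped coding DNA (names here are hypothetical, not this module's
        API):

```python
# Illustrative sketch of codon back-translation: 3 DNA characters are
# substituted for each aligned protein character, and protein gaps
# become codon-width gaps. Not this module's actual implementation.
def codons_from_protein_alignment(aligned_protein, dna):
    codons = [dna[i:i+3] for i in range(0, len(dna), 3)]
    out = []
    for aa in aligned_protein:
        out.append('---' if aa == '-' else codons.pop(0))
    return ''.join(out)

aligned = codons_from_protein_alignment('M-K', 'ATGAAA')
# aligned == 'ATG---AAA'
```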
        """
        if hasattr(seqs, 'NamedSeqs'):
            seqs = seqs.NamedSeqs
        else:
            seqs = SequenceCollection(seqs).NamedSeqs
        new_seqs = []
        for label in self.Names:
            aligned = self.NamedSeqs[label]
            seq = seqs[label]
            if isinstance(seq, Aligned):
                seq = seq.data
            new_seqs.append((label, Aligned(aligned.map * 3, seq)))
        return self.__class__(new_seqs)

    def getGappedSeq(self, seq_name, recode_gaps=False):
        """Return a gapped Sequence object for the specified seqname.

        Note: return type may depend on what data was loaded into the
        SequenceCollection or Alignment.
        """
        return self.NamedSeqs[seq_name]

    def __add__(self, other):
        """Concatenates sequence data for same names"""
        aligned = isinstance(self, Alignment)
        if len(self.NamedSeqs) != len(other.NamedSeqs):
            raise ValueError("Alignments don't have same number of sequences")
        concatenated = []
        for name in self.Names:
            if name not in other.Names:
                raise ValueError("Right alignment doesn't have a '%s'" % name)
            new_seq = self.NamedSeqs[name] + other.NamedSeqs[name]
            concatenated.append(new_seq)
        new = self.__class__(MolType=self.MolType,
                             data=zip(self.Names, concatenated))
        if aligned:
            left = [a for a in self._shiftedAnnotations(new, 0) \
                    if a.map.End <= len(self)]
            right = [a for a in other._shiftedAnnotations(new, len(self)) \
                     if a.map.Start >= len(self)]
            new.annotations = left + right
        return new

    def addSeqs(self, other, before_name=None, after_name=None):
        """Returns new object of class self with sequences from other added.

        By default the sequence is appended to the end of the alignment;
        this can be changed by using either the before_name or after_name
        argument.

        Arguments:
            - other: same class as self or coerceable to that class
            - before_name: str - [default:None] name of the sequence before
              which the sequence is added
            - after_name: str - [default:None] name of the sequence after
              which the sequence is added

        If both before_name and after_name are specified, the seqs will be
        inserted using before_name.
        """
        assert not isinstance(other, str), "Must provide a series of seqs " + \
            "or an alignment"
        self_seq_class = self.Seqs[0].__class__
        try:
            combined = self.Seqs + other.Seqs
        except AttributeError:
            combined = self.Seqs + list(other)
        for seq in combined:
            assert seq.__class__ == self_seq_class, \
                "Seq classes different: Expected %s, Got %s" % \
                (seq.__class__, self_seq_class)
        combined_aln = self.__class__(data=combined)
        if before_name is None and after_name is None:
            return combined_aln
        if (before_name and before_name not in self.Names) \
            or \
            (after_name and after_name not in self.Names):
            name = before_name or after_name
            raise ValueError("The alignment doesn't have a sequence named "
                             "'{0}'".format(name))
        if before_name is not None:  # someone might have seqname of int(0)
            index = self.Names.index(before_name)
        elif after_name is not None:
            index = self.Names.index(after_name) + 1
        names_before = self.Names[:index]
        names_after = self.Names[index:]
        new_names = combined_aln.Names[len(self.Names):]
        aln_new = combined_aln.takeSeqs(new_names)
        if len(names_before) > 0:
            aln_before = self.takeSeqs(names_before)
            combined_aln = aln_before
            combined_aln = combined_aln.addSeqs(aln_new)
        else:
            combined_aln = aln_new
        if len(names_after) > 0:
            aln_after = self.takeSeqs(names_after)
            combined_aln = combined_aln.addSeqs(aln_after)
        return combined_aln

    def writeToFile(self, filename=None, format=None, **kwargs):
        """Write the alignment to a file, preserving order of sequences.

        Arguments:
            - filename: name of the sequence file
            - format: format of the sequence file

        If format is None, will attempt to infer format from the filename
        suffix.
        """
        if filename is None:
            raise DataError('no filename specified')
        # need to turn the alignment into a dictionary
        align_dict = {}
        for seq_name in self.Names:
            align_dict[seq_name] = str(self.NamedSeqs[seq_name])
        if format is None and '.' in filename:
            # allow extension to work if provided
            format = filename[filename.rfind(".")+1:]
        if 'order' not in kwargs:
            kwargs['order'] = self.Names
        save_to_filename(align_dict, filename, format, **kwargs)

    def __len__(self):
        """len of SequenceCollection returns length of longest sequence."""
        return self.SeqLen

    def getTranslation(self, gc=None, **kwargs):
        """Returns a new alignment object with the DNA sequences translated
        into amino acid sequences, using the current codon moltype.
        """
        translated = []
        aligned = isinstance(self, Alignment)
        # do the translation
        try:
            for seqname in self.Names:
                if aligned:
                    seq = self.getGappedSeq(seqname)
                else:
                    seq = self.NamedSeqs[seqname]
                pep = seq.getTranslation(gc)
                translated.append((seqname, pep))
            return self.__class__(translated, **kwargs)
        except AttributeError, msg:
            raise AttributeError, "%s -- %s" % (msg,
                                                "Did you set a DNA MolType?")

    def getSeq(self, seqname):
        """Return a sequence object for the specified seqname."""
        return self.NamedSeqs[seqname]

    def todict(self):
        """Returns the alignment as dict of names -> strings.

        Note: returns strings, NOT Sequence objects.
        """
        align_dict = {}
        for seq_name in self.Names:
            align_dict[seq_name] = str(self.NamedSeqs[seq_name])
        return align_dict

    def getPerSequenceAmbiguousPositions(self):
        """Returns dict of seq:{position:char} for ambiguous chars.

        Used in likelihood calculations.
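        The per-sequence scan can be sketched with an explicit set of DNA
        ambiguity codes standing in for MolType.isAmbiguity (the set and
        function names here are illustrative):

```python
# Stdlib-only sketch of finding ambiguous positions per sequence.
# DNA_AMBIGUITIES is an illustrative stand-in for MolType.isAmbiguity.
DNA_AMBIGUITIES = set('RYWSKMBDHVN?')

def ambiguous_positions(named_seqs):
    result = {}
    for name, seq in named_seqs.items():
        result[name] = dict((i, c) for i, c in enumerate(seq)
                            if c in DNA_AMBIGUITIES)
    return result

pos = ambiguous_positions({'a': 'ACNGT'})
# pos == {'a': {2: 'N'}}
```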
""" result = {} for name in self.Names: result[name] = ambig = {} for (i, motif) in enumerate(self.getGappedSeq(name)): if self.MolType.isAmbiguity(motif): ambig[i] = motif return result def degap(self, **kwargs): """Returns copy in which sequences have no gaps.""" new_seqs = [] aligned = isinstance(self, Alignment) for seq_name in self.Names: if aligned: seq = self.NamedSeqs[seq_name].data else: seq = self.NamedSeqs[seq_name] new_seqs.append((seq_name, seq.degap())) return SequenceCollection(MolType=self.MolType, data=new_seqs, **kwargs) def withModifiedTermini(self): """Changes the termini to include termini char instead of gapmotif. Useful to correct the standard gap char output by most alignment programs when aligned sequences have different ends. """ seqs = [] for name in self.Names: seq = self.NamedSeqs[name].withTerminiUnknown() seqs.append((name, seq)) return self.__class__(MolType=self.MolType, data=seqs) def hasTerminalStops(self, gc=None): """Returns True if any sequence has a terminal stop codon.""" stops = [] aligned = isinstance(self, Alignment) for seq_name in self.Names: if aligned: seq = self.NamedSeqs[seq_name].data else: seq = self.NamedSeqs[seq_name] stops.append(seq.hasTerminalStop(gc=gc)) return max(stops) def withoutTerminalStopCodons(self, gc=None, **kwargs): """Removes any terminal stop codons from the sequences""" new_seqs = [] aligned = isinstance(self, Alignment) for seq_name in self.Names: old_seq = self.NamedSeqs[seq_name] if aligned: new_seq = old_seq.data.withoutTerminalStopCodon(gc=gc) new_seq = Aligned(old_seq.map, new_seq) else: new_seq = old_seq.withoutTerminalStopCodon(gc=gc) new_seqs.append((seq_name, new_seq)) return self.__class__(MolType=self.MolType, data=new_seqs, **kwargs) def getSeqNames(self): """Return a list of sequence names.""" return self.Names[:] def getMotifProbs(self, alphabet=None, include_ambiguity=False, exclude_unobserved=False, allow_gap=False, pseudocount=0): """Return a dictionary of motif probs. 
Arguments: - include_ambiguity: if True resolved ambiguous codes are included in estimation of frequencies, default is False. - exclude_unobserved: if True, motifs that are not present in the alignment are excluded from the returned dictionary, default is False. - allow_gap: allow gap motif """ if alphabet is None: alphabet = self.MolType.Alphabet if allow_gap: alphabet = alphabet.Gapped counts = {} for seq_name in self.Names: sequence = self.NamedSeqs[seq_name] motif_len = alphabet.getMotifLen() if motif_len > 1: posns = range(0, len(sequence)+1-motif_len, motif_len) sequence = [sequence[i:i+motif_len] for i in posns] for motif in sequence: if not allow_gap: if self.MolType.Gap in motif: continue if motif in counts: counts[motif] += 1 else: counts[motif] = 1 probs = {} if not exclude_unobserved: for motif in alphabet: probs[motif] = pseudocount for (motif, count) in counts.items(): motif_set = alphabet.resolveAmbiguity(motif) if len(motif_set) > 1: if include_ambiguity: count = float(count) / len(motif_set) else: continue for motif in motif_set: probs[motif] = probs.get(motif, pseudocount) + count total = float(sum(probs.values())) for motif in probs: probs[motif] /= total return probs def getGapCount(self, seq_name): return len(self.NamedSeqs[seq_name].map.gaps()) def getSeqFreqs(self): """Returns Profile of counts: seq by character. See documentation for _get_freqs: this just wraps it and converts the result into a Profile object organized per-sequence (i.e. per row). """ return Profile(self._get_freqs(0), self.Alphabet) def _make_gaps_ok(self, allowed_gap_frac): """Makes the gaps_ok function used by omitGapPositions and omitGapSeqs. Need to make the function because if it's a method of Alignment, it has unwanted 'self' and 'allowed_gap_frac' parameters that impede the use of map() in takeSeqsIf. WARNING: may not work correctly if component sequences have gaps that are not the Alignment gap character. 
This is because the gaps are checked at the position level (and the positions are lists), rather than at the sequence level. Working around this issue would probably cause a significant speed penalty. """ def gaps_ok(seq): seq_len = len(seq) try: num_gaps = seq.countGaps() except AttributeError: num_gaps = len(filter(self.MolType.Gaps.__contains__, seq)) return num_gaps / seq_len <= allowed_gap_frac return gaps_ok def omitGapPositions(self, allowed_gap_frac=1-eps, del_seqs=False, \ allowed_frac_bad_cols=0, seq_constructor=None): """Returns new alignment where all cols have <= allowed_gap_frac gaps. allowed_gap_frac says what proportion of gaps is allowed in each column (default is 1-eps, i.e. all cols with at least one non-gap character are preserved). If del_seqs is True (default:False), deletes the sequences that don't have gaps where everything else does. Otherwise, just deletes the corresponding column from all sequences, in which case real data as well as gaps can be removed. Uses seq_constructor(seq) to make each new sequence object. Note: a sequence that is all gaps will not be deleted by del_seqs (even if all the positions have been deleted), since it has no non-gaps in positions that are being deleted for their gap content. Possibly, this decision should be revisited since it may be a surprising result (and there are more convenient ways to return the sequences that consist wholly of gaps). """ if seq_constructor is None: seq_constructor = self.MolType.Sequence gaps_ok = self._make_gaps_ok(allowed_gap_frac) #if we're not deleting the 'naughty' seqs that contribute to the #gaps, it's easy... if not del_seqs: return self.takePositionsIf(f=gaps_ok, \ seq_constructor=seq_constructor) #otherwise, we have to figure out which seqs to delete. #if we get here, we're doing del_seqs. 
        cols_to_delete = dict.fromkeys(self.getPositionIndices(gaps_ok, \
            negate=True))
        default_gap_f = self.MolType.Gaps.__contains__
        bad_cols_per_row = {}
        for key, row in self.NamedSeqs.items():
            try:
                is_gap = row.Alphabet.Gaps.__contains__
            except AttributeError:
                is_gap = default_gap_f
            
            for col in cols_to_delete:
                if not is_gap(str(row)[col]):
                    if key not in bad_cols_per_row:
                        bad_cols_per_row[key] = 1
                    else:
                        bad_cols_per_row[key] += 1
        #figure out which of the seqs we're deleting
        get = self.NamedSeqs.__getitem__
        seqs_to_delete = {}
        for key, count in bad_cols_per_row.items():
            if float(count)/len(get(key)) >= allowed_frac_bad_cols:
                seqs_to_delete[key] = True
        #It's _much_ more efficient to delete the seqs before the cols.
        good_seqs = self.takeSeqs(seqs_to_delete, negate=True)
        cols_to_keep = dict.fromkeys(range(self.SeqLen))
        for c in cols_to_delete:
            del cols_to_keep[c]
        if good_seqs:
            return good_seqs.takePositions(cols=cols_to_keep.keys(), \
                seq_constructor=seq_constructor)
        else:
            return {}
    
    def omitGapSeqs(self, allowed_gap_frac=0):
        """Returns new alignment with seqs that have <= allowed_gap_frac.
        
        allowed_gap_frac should be a fraction between 0 and 1 inclusive.
        Default is 0.
        """
        gaps_ok = self._make_gaps_ok(allowed_gap_frac)
        
        return self.takeSeqsIf(gaps_ok)
    
    def omitGapRuns(self, allowed_run=1):
        """Returns new alignment where all seqs have runs of gaps <=allowed_run.
        
        Note that seqs with exactly allowed_run gaps are not deleted.
        Default is for allowed_run to be 1 (i.e. no consecutive gaps allowed).
        
        Because the test for whether the current gap run exceeds the maximum
        allowed gap run is only triggered when there is at least one gap,
        even negative values for allowed_run will still let sequences with
        no gaps through.
""" def ok_gap_run(x): try: is_gap = x.Alphabet.Gaps.__contains__ except AttributeError: is_gap = self.MolType.Gaps.__contains__ curr_run = max_run = 0 for i in x: if is_gap(i): curr_run += 1 if curr_run > allowed_run: return False else: curr_run = 0 #can only get here if max_run was never exceeded (although this #does include the case where the sequence is empty) return True return self.takeSeqsIf(ok_gap_run) def omitSeqsTemplate(self, template_name, gap_fraction, gap_run): """Returns new alignment where all seqs are well aligned with template. gap_fraction = fraction of positions that either have a gap in the template but not in the seq or in the seq but not in the template gap_run = number of consecutive gaps tolerated in query relative to sequence or sequence relative to query """ template = self.NamedSeqs[template_name] gap_filter = make_gap_filter(template, gap_fraction, gap_run) return self.takeSeqsIf(gap_filter) def toDna(self): """Returns the alignment as DNA.""" seqs = [self.NamedSeqs[name].toDna() for name in self.Names] aln = self.__class__(data=seqs, Names=self.Names[:], Name=self.Name, Info=self.Info) if isinstance(self, _Annotatable) and self.annotations: aln.annotations = self.annotations[:] return aln def toRna(self): """Returns the alignment as RNA""" seqs = [self.NamedSeqs[name].toRna() for name in self.Names] aln = self.__class__(data=seqs, Names=self.Names[:], Name=self.Name, Info=self.Info) if isinstance(self, _Annotatable) and self.annotations: aln.annotations = self.annotations[:] return aln def rc(self): """Returns the reverse complement alignment""" seqs = [self.NamedSeqs[name].rc() for name in self.Names] rc = self.__class__(data=seqs, Names=self.Names[:], Name=self.Name, Info=self.Info) if isinstance(self, _Annotatable) and self.annotations: self._annotations_nucleic_reversed_on(rc) return rc def reversecomplement(self): """Returns the reverse complement alignment. 
        A synonym for rc."""
        return self.rc()
    
    def padSeqs(self, pad_length=None, **kwargs):
        """Returns copy in which sequences are padded to same length.
        
        pad_length: Length all sequences are to be padded to. Will pad to
            max sequence length if pad_length is None or less than max length.
        """
        #get max length
        max_len = max([len(s) for s in self.Seqs])
        #If a pad_length was passed in, make sure it is valid
        if pad_length is not None:
            pad_length = int(pad_length)
            if pad_length < max_len:
                raise ValueError, \
        "pad_length must be greater than or equal to maximum sequence length: %s"\
                    %(str(max_len))
        #pad_length is max sequence length.
        else:
            pad_length = max_len
        
        #Get new sequence list
        new_seqs = []
        aligned = isinstance(self, Alignment)
        #for each sequence, pad gaps to end
        for seq_name in self.Names:
            if aligned:
                seq = self.NamedSeqs[seq_name].data
            else:
                seq = self.NamedSeqs[seq_name]
            padded_seq = seq + '-'*(pad_length-len(seq))
            new_seqs.append((seq_name, padded_seq))
        
        #return new SequenceCollection object
        return SequenceCollection(MolType=self.MolType, data=new_seqs, **kwargs)


class Aligned(object):
    """One sequence in an alignment, a map between alignment coordinates and
    sequence coordinates"""
    
    def __init__(self, map, data, length=None):
        #Unlike the normal map constructor, here we take a list of pairs of
        #alignment coordinates, NOT a list of pairs of sequence coordinates
        if isinstance(map, list):
            map = Map(map, parent_length=length).inverse()
        self.map = map
        self.data = data
        if hasattr(data, 'Info'):
            self.Info = data.Info
        if hasattr(data, 'Name'):
            self.Name = data.Name
    
    def _get_moltype(self):
        return self.data.MolType
    MolType = property(_get_moltype)
    
    def copy(self, memo=None, _nil=[], constructor='ignored'):
        """Returns a shallow copy of self
        
        WARNING: cogent.core.sequence.Sequence does NOT implement a copy
        method, as such, the data member variable of the copied object will
        maintain reference to the original object.
        WARNING: cogent.core.location.Map does NOT implement a copy method,
        as such, the data member variable of the copied object will maintain
        reference to the original object.
        """
        return self.__class__(self.map, self.data)
    
    def __repr__(self):
        return '%s of %s' % (repr(self.map), repr(self.data))
    
    def withTerminiUnknown(self):
        return self.__class__(self.map.withTerminiUnknown(), self.data)
    
    def copyAnnotations(self, other):
        self.data.copyAnnotations(other)
    
    def annotateFromGff(self, f):
        self.data.annotate_from_gff(f)
    
    def addFeature(self, *args, **kwargs):
        self.data.addFeature(*args, **kwargs)
    
    def __str__(self):
        """Returns string representation of aligned sequence, incl. gaps."""
        return str(self.getGappedSeq())
    
    def __cmp__(self, other):
        """Compares based on string representations."""
        return cmp(str(self), str(other))
    
    def __iter__(self):
        """Iterates over sequence one motif (e.g. char) at a time, incl. gaps"""
        return self.data.gappedByMapMotifIter(self.map)
    
    def getGappedSeq(self, recode_gaps=False):
        """Returns sequence as an object, including gaps."""
        return self.data.gappedByMap(self.map, recode_gaps)
    
    def __len__(self):
        # these make it look like Aligned should be a subclass of Map,
        # but then you have to be careful with __getitem__, __init__ and
        # inverse.
        return len(self.map)
    
    def __add__(self, other):
        if self.data is other.data:
            (map, seq) = (self.map + other.map, self.data)
        else:
            seq = self.getGappedSeq() + other.getGappedSeq()
            (map, seq) = seq.parseOutGaps()
        return Aligned(map, seq)
    
    def __getitem__(self, slice):
        return Aligned(self.map[slice], self.data)
    
    def rc(self):
        return Aligned(self.map.reversed(), self.data)
    
    def toRna(self):
        return Aligned(self.map, self.data.toRna())
    
    def toDna(self):
        return Aligned(self.map, self.data.toDna())
    
    def getTracks(self, policy):
        policy = policy.at(self.map.inverse())
        return self.data.getTracks(policy)
    
    def remappedTo(self, map):
        #assert map is self.parent_map or ... ?
        #print 'REMAP', self.map, self
        #print 'ONTO', map, map.inverse()
        result = Aligned(map[self.map.inverse()].inverse(), self.data)
        #print 'GIVES', result.map, result
        #print
        return result
    
    def getAnnotationsMatching(self, alignment, *args):
        for annot in self.data.getAnnotationsMatching(*args):
            yield annot.remappedTo(alignment, self.map.inverse())
    
    def gapVector(self):
        """Returns gapVector of GappedSeq, for omitGapPositions."""
        return self.getGappedSeq().gapVector()
    
    def _masked_annotations(self, annot_types, mask_char, shadow):
        """returns a new aligned sequence with regions defined by align_spans
        and shadow masked."""
        new_data = self.data.withMaskedAnnotations(annot_types, mask_char,
                                                   shadow)
        # we remove the mask annotations from self and new_data
        return self.__class__(self.map, new_data)


class AlignmentI(object):
    """Alignment interface object. Contains methods shared by implementations.
    
    Note that subclasses should inherit both from AlignmentI and from
    SequenceCollection (typically).
    
    Alignments are expected to be immutable once created. No mechanism is
    provided for maintaining reference consistency if data in the alignment
    are modified.
    
    An Alignment is expected to be able to generate the following:
    - Seqs:         Sequence objects in the alignment, can turn themselves
                    into strings. These are usually thought of as "rows" in
                    an alignment.
    - Positions:    Vectors representing data in each position in the
                    alignment. These are usually thought of as "columns" in
                    an alignment.
    - SeqData:      Vectors representing data in each sequence in the
                    alignment, not necessarily guaranteed to turn themselves
                    into a string.
    - Items:        Iterator over the characters in the alignment.
    - Names:        List of names of sequences in the alignment. Used for
                    display order. A cheap way to omit or reorder sequences
                    is to modify the list of names.
    - NamedSeqs:    Dict of name -> seq object, used for lookup.
    - MolType:      MolType of the alignment.
""" DefaultGap = '-' #default gap character for padding GapChars = dict.fromkeys('-?') #default gap chars for comparisons def iterPositions(self, pos_order=None): """Iterates over positions in the alignment, in order. pos_order refers to a list of indices (ints) specifying the column order. This lets you rearrange positions if you want to (e.g. to pull out individual codon positions). Note that self.iterPositions() always returns new objects, by default lists of elements. Use map(f, self.iterPositions) to apply the constructor or function f to the resulting lists (f must take a single list as a parameter). Note that some sequences (e.g. ViennaStructures) have rules that prevent arbitrary strings of their symbols from being valid objects. Will raise IndexError if one of the indices in order exceeds the sequence length. This will always happen on ragged alignments: assign to self.SeqLen to set all sequences to the same length. """ get = self.NamedSeqs.__getitem__ pos_order = pos_order or xrange(self.SeqLen) seq_order = self.Names for pos in pos_order: yield [get(seq)[pos] for seq in seq_order] Positions = property(iterPositions) def takePositions(self, cols, negate=False, seq_constructor=None): """Returns new Alignment containing only specified positions. By default, the seqs will be lists, but an alternative constructor can be specified. Note that takePositions will fail on ragged positions. 
""" if seq_constructor is None: seq_constructor = self.MolType.Sequence result = {} #if we're negating, pick out all the positions except specified indices if negate: col_lookup = dict.fromkeys(cols) for key, row in self.NamedSeqs.items(): result[key] = seq_constructor([row[i] for i in range(len(row)) \ if i not in col_lookup]) #otherwise, just get the requested indices else: for key, row in self.NamedSeqs.items(): result[key] = seq_constructor([row[i] for i in cols]) return self.__class__(result, Names=self.Names) def getPositionIndices(self, f, negate=False): """Returns list of column indices for which f(col) is True.""" #negate f if necessary if negate: new_f = lambda x: not f(x) else: new_f = f return [i for i, col in enumerate(self.Positions) if new_f(col)] def takePositionsIf(self, f, negate=False, seq_constructor=None): """Returns new Alignment containing cols where f(col) is True. Note that the seqs in the new Alignment are always new objects. Default constructor is list(), but an alternative can be passed in. """ if seq_constructor is None: seq_constructor = self.MolType.Sequence return self.takePositions(self.getPositionIndices(f, negate), \ seq_constructor=seq_constructor) def IUPACConsensus(self, alphabet=None): """Returns string containing IUPAC consensus sequence of the alignment. """ if alphabet is None: alphabet = self.MolType consensus = [] degen = alphabet.degenerateFromSequence for col in self.Positions: consensus.append(degen(coerce_to_string(col))) return coerce_to_string(consensus) def columnFreqs(self, constructor=Freqs): """Returns list of Freqs with item counts for each column. """ return map(constructor, self.Positions) def columnProbs(self, constructor=Freqs): """Returns FrequencyDistribuutions w/ prob. of each item per column. Implemented as a list of normalized Freqs objects. 
""" freqs = self.columnFreqs(constructor) for fd in freqs: fd.normalize() return freqs def majorityConsensus(self, transform=None, constructor=Freqs): """Returns list containing most frequent item at each position. Optional parameter transform gives constructor for type to which result will be converted (useful when consensus should be same type as originals). """ col_freqs = self.columnFreqs(constructor) consensus = [freq.Mode for freq in col_freqs] if transform == str: return coerce_to_string(consensus) elif transform: return transform(consensus) else: return consensus def uncertainties(self, good_items=None): """Returns Shannon uncertainty at each position. Usage: information_list = alignment.information(good_items=None) If good_items is supplied, deletes any symbols that are not in good_items. """ uncertainties = [] #calculate column probabilities if necessary if hasattr(self, 'PositionumnProbs'): probs = self.PositionumnProbs else: probs = self.columnProbs() #calculate uncertainty for each column for prob in probs: #if there's a list of valid symbols, need to delete everything else if good_items: prob = prob.copy() #do not change original #get rid of any symbols not in good_items for symbol in prob.keys(): if symbol not in good_items: del prob[symbol] #normalize the probabilities and add to the list prob.normalize() uncertainties.append(prob.Uncertainty) return uncertainties def scoreMatrix(self): """Returns a position specific score matrix for the alignment.""" return Dict2D(dict([(i,Freqs(col)) for i, col in enumerate(self.Positions)])) def _get_freqs(self, index=None): """Gets array of freqs along index 0 (= positions) or 1 (= seqs). index: if 0, will calculate the frequency of each symbol in each position (=column) in the alignment. Will return 2D array where the first index is the position, and the second index is the index of the symbol in the alphabet. For example, for the TCAG DNA Alphabet, result[3][0] would store the count of T at position 3 (i.e. 
        the 4th position in the alignment).
        
        if 1, does the same thing except that the calculation is performed
        for each sequence, so the 2D array has the sequence index as the
        first index, and the symbol index as the second index. For example,
        for the TCAG DNA Alphabet, result[3][0] would store the count of T
        in the sequence at index 3 (i.e. the 4th sequence).
        
        First a DenseAlignment object is created, next the calculation is
        done on this object. It is important that the DenseAlignment is
        initialized with the same MolType and Alphabet as the original
        Alignment.
        """
        da = DenseAlignment(self, MolType=self.MolType, Alphabet=self.Alphabet)
        return da._get_freqs(index)
    
    def getPosFreqs(self):
        """Returns Profile of counts: position by character.
        
        See documentation for _get_freqs: this just wraps it and converts the
        result into a Profile object organized per-position (i.e. per column).
        """
        return Profile(self._get_freqs(1), self.Alphabet)
    
    def sample(self, n=None, with_replacement=False, motif_length=1, \
        randint=randint, permutation=permutation):
        """Returns random sample of positions from self, e.g. to bootstrap.
        
        Arguments:
            - n: the number of positions to sample from the alignment.
              Default is alignment length
            - with_replacement: boolean flag for determining if sampled
              positions are drawn with replacement
            - randint and permutation: functions for a random integer in a
              specified range, and a random permutation, respectively
        
        Notes:
            By default (resampling all positions without replacement),
            generates a permutation of the positions of the alignment.
        
            Setting with_replacement to True and otherwise leaving parameters
            as defaults generates a standard bootstrap resampling of the
            alignment.
""" population_size = len(self) // motif_length if not n: n = population_size if with_replacement: locations = randint(0, population_size, n) else: assert n <= population_size, (n, population_size, motif_length) locations = permutation(population_size)[:n] positions = [(loc*motif_length, (loc+1)*motif_length) for loc in locations] sample = Map(positions, parent_length=len(self)) return self.gappedByMap(sample, Info=self.Info) def slidingWindows(self, window, step, start=None, end=None): """Generator yielding new Alignments of given length and interval. Arguments: - window: The length of each returned alignment. - step: The interval between the start of the successive alignment objects returned. - start: first window start position - end: last window start position """ start = [start, 0][start is None] end = [end, len(self)-window+1][end is None] end = min(len(self)-window+1, end) if start < end and len(self)-end >= window-1: for pos in xrange(start, end, step): yield self[pos:pos+window] def aln_from_array(a, array_type=None, Alphabet=None): """Alignment from array of pos x seq: no change, names are integers. This is an InputHandler for Alignment. It converts an arbitrary array of numbers without change, but adds successive integer names (0-based) to each sequence (i.e. column) in the input a. Data type of input is unchanged. """ if array_type is None: result = a.copy() else: result = a.astype(array_type) return transpose(result), None def aln_from_model_seqs(seqs, array_type=None, Alphabet=None): """Alignment from ModelSequence objects: seqs -> array, names from seqs. This is an InputHandler for Alignment. It converts a list of Sequence objects with _data and Label properties into the character array Alignment needs. All sequences must be the same length. WARNING: Assumes that the ModelSeqs are already in the right alphabet. If this is not the case, e.g. 
    if you are putting sequences on a degenerate alphabet into a
    non-degenerate alignment or you are putting protein sequences into a
    DNA alignment, there will be problems with the alphabet mapping (i.e.
    the resulting sequences may be meaningless).
    
    WARNING: Data type of return array is not guaranteed -- check in caller!
    """
    data, names = [], []
    for s in seqs:
        data.append(s._data)
        names.append(s.Name)
    result = array(data)
    if array_type:
        result = result.astype(array_type)
    return result, names

def aln_from_generic(data, array_type=None, Alphabet=None):
    """Alignment from generic seq x pos data: sequence of sequences of chars.
    
    This is an InputHandler for Alignment. It converts a generic list (each
    item in the list will be mapped onto an Array object, with character
    transformations, all items must be the same length) into a numpy array,
    and assigns sequential integers (0-based) as names.
    
    WARNING: Data type of return array is not guaranteed -- check in caller!
    """
    result = array(map(Alphabet.toIndices, data))
    names = []
    for d in data:
        if hasattr(d, 'Name'):
            names.append(d.Name)
        else:
            names.append(None)
    if array_type:
        result = result.astype(array_type)
    return result, names

def aln_from_collection(seqs, array_type=None, Alphabet=None):
    """Alignment from SequenceCollection object, or its subclasses."""
    names = seqs.Names
    data = [seqs.NamedSeqs[i] for i in names]
    result = array(map(Alphabet.toIndices, data))
    if array_type:
        result = result.astype(array_type)
    return result, names

def aln_from_fasta(seqs, array_type=None, Alphabet=None):
    """Alignment from FASTA-format string or lines.
    
    This is an InputHandler for Alignment. It converts a FASTA-format string
    or collection of lines into an Alignment object. All sequences must be
    the same length.
    
    WARNING: Data type of return array is not guaranteed -- check in caller!
""" if isinstance(seqs, str): seqs = seqs.splitlines() return aln_from_model_seqs([ModelSequence(s, Name=l, Alphabet=Alphabet)\ for l, s in cogent.parse.fasta.MinimalFastaParser(seqs)], array_type) def aln_from_dict(aln, array_type=None, Alphabet=None): """Alignment from dict of {label:seq_as_str}. This is an InputHandler for Alignment. It converts a dict in which the keys are the names and the values are the sequences (sequence only, no whitespace or other formatting) into an alignment. Because the dict doesn't preserve order, the result will be in alphabetical order.""" names, seqs = zip(*sorted(aln.items())) result = array(map(Alphabet.toIndices, seqs), array_type) return result, list(names) def aln_from_kv_pairs(aln, array_type=None, Alphabet=None): """Alignment from sequence of (key, value) pairs. This is an InputHandler for Alignment. It converts a list in which the first item of each pair is the label and the second item is the sequence (sequence only, no whitespace or other formatting) into an alignment. Because the dict doesn't preserve order, the result will be in arbitrary order.""" names, seqs = zip(*aln) result = array(map(Alphabet.toIndices, seqs), array_type) return result, list(names) def aln_from_dense_aln(aln, array_type=None, Alphabet=None): """Alignment from existing DenseAlignment object: copies data. Retrieves data from Positions field. Uses copy(), so array data type should be unchanged. """ if array_type is None: result = aln.ArrayPositions.copy() else: result = aln.ArrayPositions.astype(array_type) return transpose(result), aln.Names[:] def aln_from_empty(obj, *args, **kwargs): """Alignment from empty data: raise exception.""" raise ValueError, "Cannot create empty alignment." #Implementation of Alignment base class class DenseAlignment(AlignmentI, SequenceCollection): """Holds a dense array representing a multiple sequence alignment. An Alignment is _often_, but not necessarily, an array of chars. 
    You might want to use some other data type for the alignment if you have
    a large number of symbols. For example, codons on an ungapped DNA
    alphabet has 4*4*4=64 entries so can fit in a standard char data type,
    but tripeptides on the 20-letter ungapped protein alphabet has
    20*20*20=8000 entries so can _not_ fit in a char and values will wrap
    around (i.e. you will get an unpredictable, wrong value for any item
    whose index is greater than the max value, e.g. 255 for uint8), so in
    this case you would need to use UInt16, which can hold 65536 values.
    
    DO NOT USE SIGNED DATA TYPES FOR YOUR ALIGNMENT ARRAY UNLESS YOU LOVE
    MISERY AND HARD-TO-DEBUG PROBLEMS.
    
    Implementation: aln[i] returns position i in the alignment.
    
    aln.Positions[i] returns the same as aln[i] -- usually, users think of
    this as a 'column', because alignment editors such as Clustal typically
    display each sequence as a row so a position that cuts across sequences
    is a column.
    
    aln.Seqs[i] returns a sequence, or 'row' of the alignment in standard
    terminology.
    
    WARNING: aln.Seqs and aln.Positions are different views of the same
    array, so if you change one you will change the other. This will no
    longer be true if you assign to Seqs or Positions directly, so don't do
    it. If you want to change the data in the whole array, always assign to
    a slice so that both views update: aln.Seqs[:] = x instead of
    aln.Seqs = x. If you get the two views out of sync, you will get all
    sorts of exceptions. No validation is performed on aln.Seqs and
    aln.Positions for performance reasons, so this can really get you into
    trouble.
    
    Alignments are immutable, though this is not enforced. If you change the
    data after the alignment is created, all sorts of bad things might
    happen.
    
    Class properties:
    
    Alphabet: should be an Alphabet object. Must provide mapping between
    items (possibly, but not necessarily, characters) in the alignment and
    indices of those characters in the resulting Alignment object.
    SequenceType: Constructor to use when building sequences. Default:
    Sequence.
    
    InputHandlers: dict of {input_type:input_handler} where input_handler is
    from the InputHandlers above and input_type is a result of the method
    self._guess_input_type (should always be a string).
    
    Creating a new array will always result in a new object unless you use
    the force_same_object=True parameter.
    
    WARNING: Rebinding the Names attribute in a DenseAlignment is not
    recommended because not all methods will use the updated name order.
    This is because the original sequence and name order are used to produce
    data structures that are cached for efficiency, and are not updated if
    you change the Names attribute.
    
    WARNING: DenseAlignment strips off Info objects from sequences that have
    them, primarily for efficiency.
    """
    MolType = None      #will be set to BYTES on moltype import
    Alphabet = None     #will be set to BYTES.Alphabet on moltype import
    
    InputHandlers = {
        'array':        aln_from_array,
        'model_seqs':   aln_from_model_seqs,
        'generic':      aln_from_generic,
        'fasta':        aln_from_fasta,
        'dense_aln':    aln_from_dense_aln,
        'aln':          aln_from_collection,
        'collection':   aln_from_collection,
        'dict':         aln_from_dict,
        'kv_pairs':     aln_from_kv_pairs,
        'empty':        aln_from_empty,
    }
    
    def __init__(self, *args, **kwargs):
        """Returns new DenseAlignment object. Inherits from SequenceCollection.
""" kwargs['suppress_named_seqs'] = True super(DenseAlignment, self).__init__(*args, **kwargs) self.ArrayPositions = transpose(\ self.SeqData.astype(self.Alphabet.ArrayType)) self.ArraySeqs = transpose(self.ArrayPositions) self.SeqData = self.ArraySeqs self.SeqLen = len(self.ArrayPositions) def _force_same_data(self, data, Names): """Forces array that was passed in to be used as self.ArrayPositions""" if isinstance(data, DenseAlignment): data = data._positions self.ArrayPositions = data self.Names = Names or self.DefaultNameFunction(len(data[0])) def _get_positions(self): """Override superclass Positions to return positions as symbols.""" return map(self.Alphabet.fromIndices, self.ArrayPositions) Positions = property(_get_positions) def _get_named_seqs(self): if not hasattr(self, '_named_seqs'): seqs = map(self.Alphabet.toString, self.ArraySeqs) if self.MolType: seqs = map(self.MolType.Sequence, seqs) self._named_seqs = self._make_named_seqs(self.Names, seqs) return self._named_seqs NamedSeqs = property(_get_named_seqs) def keys(self): """Supports dict-like interface: returns names as keys.""" return self.Names def values(self): """Supports dict-like interface: returns seqs as Sequence objects.""" return [self.Alphabet.MolType.ModelSeq(i, Alphabet=self.Alphabet) \ for i in self.ArraySeqs] def items(self): """Supports dict-like interface; returns (name, seq) pairs.""" return zip(self.keys(), self.values()) def __iter__(self): """iter(aln) iterates over positions, returning array slices. Each item in the result is be a position ('column' in standard terminology) within the alignment, with the sequneces in the same order as in the names. The result shares data with the original array, so if you change the result you change the Alignment. """ return iter(self.Positions) def __getitem__(self, item): """getitem delegates to self.Positions., returning array slices. The result is a column or slice of columns, supporting full slice functionality (including stride). 
        Use this to get a selection of positions from the alignment.
        
        Result shares data with the original array, so if you change the
        result you change the Alignment.
        """
        return self.Positions[item]
    
    def _coerce_seqs(self, seqs, is_array):
        """Controls how seqs are coerced in _names_seqs_order.
        
        Override in subclasses where this behavior should differ.
        """
        return seqs
    
    def getSubAlignment(self, seqs=None, pos=None, invert_seqs=False, \
        invert_pos=False):
        """Returns subalignment of specified sequences and positions.
        
        seqs and pos can be passed in as lists of sequence indices to keep
        or positions to keep.
        
        invert_seqs: if True (default False), gets everything _except_ the
        specified sequences.
        
        invert_pos: if True (default False), gets everything _except_ the
        specified positions.
        
        Unlike most of the other code that gets things out of an alignment,
        this method returns a new alignment that does NOT share data with
        the original alignment.
        """
        #figure out which positions to keep, and keep them
        if pos is not None:
            if invert_pos:
                pos_mask = ones(len(self.ArrayPositions))
                put(pos_mask, pos, 0)
                pos = nonzero(pos_mask)[0]
            data = take(self.ArrayPositions, pos, axis=0)
        else:
            data = self.ArrayPositions
        #figure out which sequences to keep, and keep them
        if seqs is not None:
            if invert_seqs:
                seq_mask = ones(len(self.ArraySeqs))
                put(seq_mask, seqs, 0)
                seqs = nonzero(seq_mask)[0]
            data = take(data, seqs, 1)
            names = [self.Names[i] for i in seqs]
        else:
            names = self.Names
        return self.__class__(data, map(str, names), self.Alphabet, \
            conversion_f=aln_from_array)
    
    def __str__(self):
        """Returns FASTA-format string.
        
        Should be able to handle joint alphabets, e.g. codons.
""" result = [] names = map(str, self.Names) max_label_length = max(map(len, names)) + 1 seq2str = self.Alphabet.fromIndices for l, s in zip(self.Names, self.ArraySeqs): result.append('>'+str(l)+'\n'+''.join(seq2str(s))) return '\n'.join(result) + '\n' def _get_freqs(self, index=None): """Gets array of freqs along index 0 (= positions) or 1 (= seqs). index: if 0, will calculate the frequency of each symbol in each position (=column) in the alignment. Will return 2D array where the first index is the position, and the second index is the index of the symbol in the alphabet. For example, for the TCAG DNA Alphabet, result[3][0] would store the count of T at position 3 (i.e. the 4th position in the alignment. if 1, does the same thing except that the calculation is performed for each sequence, so the 2D array has the sequence index as the first index, and the symbol index as the second index. For example, for the TCAG DNA Alphabet, result[3][0] would store the count of T in the sequence at index 3 (i.e. the 4th sequence). """ if index: a = self.ArrayPositions else: a = self.ArraySeqs count_f = self.Alphabet.counts return array(map(count_f, a)) def getPosFreqs(self): """Returns Profile of counts: position by character. See documentation for _get_freqs: this just wraps it and converts the result into a Profile object organized per-position (i.e. per column). """ return Profile(self._get_freqs(1), self.Alphabet) def getSeqEntropy(self): """Returns array containing Shannon entropy for each seq in self. Uses the profile object from getSeqFreqs (see docstring) to calculate the per-symbol entropy in each sequence in the alignment, i.e. the uncertainty about each symbol in each sequence (or row). This can be used to, for instance, filter low-complexity sequences. """ p = self.getSeqFreqs() p.normalizePositions() return p.rowUncertainty() def getPosEntropy(self): """Returns array containing Shannon entropy for each pos in self. 
        
        Uses the profile object from getPosFreqs (see docstring) to calculate
        the per-symbol entropy in each position in the alignment, i.e. the
        uncertainty about each symbol at each position (or column). This can
        be used to, for instance, detect the level of conservation at each
        position in an alignment.
        """
        p = self.getPosFreqs()
        p.normalizePositions()
        return p.rowUncertainty()
    
    def IUPACConsensus(self, alphabet=None):
        """Returns string containing IUPAC consensus sequence of the alignment.
        """
        if alphabet is None:
            alphabet = self.MolType
        consensus = []
        degen = alphabet.degenerateFromSequence
        for col in self.Positions:
            consensus.append(degen(str(alphabet.ModelSeq(col, \
                Alphabet=alphabet.Alphabets.DegenGapped))))
        return coerce_to_string(consensus)
    
    def _make_gaps_ok(self, allowed_gap_frac):
        """Makes the gaps_ok function used by omitGapPositions and omitGapSeqs.
        
        Need to make the function because if it's a method of Alignment, it
        has unwanted 'self' and 'allowed_gap_frac' parameters that impede the
        use of map() in takeSeqsIf.
        
        WARNING: may not work correctly if component sequences have gaps that
        are not the Alignment gap character. This is because the gaps are
        checked at the column level (and the positions are lists), rather
        than at the row level. Working around this issue would probably
        cause a significant speed penalty.
        """
        def gaps_ok(seq):
            seq_len = len(seq)
            if hasattr(seq, 'countGaps'):
                num_gaps = seq.countGaps()
            elif hasattr(seq, 'count'):
                num_gaps = seq.count(self.Alphabet.Gap)
            else:
                num_gaps = sum(seq == self.Alphabet.GapIndex)
            return num_gaps / seq_len <= allowed_gap_frac
        
        return gaps_ok
    
    def columnFreqs(self, constructor=Freqs):
        """Returns list of Freqs with item counts for each column.
        """
        return map(constructor, self.Positions)
    
    def sample(self, n=None, with_replacement=False, motif_length=1, \
        randint=randint, permutation=permutation):
        """Returns random sample of positions from self, e.g. to bootstrap.
Arguments: - n: the number of positions to sample from the alignment. Default is alignment length - with_replacement: boolean flag for determining if sampled positions - randint and permutation: functions for random integer in a specified range, and permutation, respectively. Notes: By default (resampling all positions without replacement), generates a permutation of the positions of the alignment. Setting with_replacement to True and otherwise leaving parameters as defaults generates a standard bootstrap resampling of the alignment. """ population_size = len(self) // motif_length if not n: n = population_size if with_replacement: locations = randint(0, population_size, n) else: assert n <= population_size, (n, population_size, motif_length) locations = permutation(population_size)[:n] #check if we need to convert coords for multi-width motifs if motif_length > 1: locations = (locations*motif_length).repeat(motif_length) wrapped_locations =locations.reshape((n,motif_length)) wrapped_locations += arange(motif_length) positions = take(self.ArrayPositions, locations, 0) result = self.__class__(positions.T,force_same_data=True, \ Info=self.Info, Names=self.Names) return result def aln_from_fasta_codons(seqs, array_type=None, Alphabet=None): """Codon alignment from FASTA-format string or lines. This is an InputHandler for taking a FASTA-format string of individual bases and converting it into an array by way of a CodonSequence object that groups triples of bases together and converts them into symbols on the codon alphabet (i.e. each group of 3 bases together is coded by a single symbol). This needs to override the normal aln_from_fasta InputHandler, which asssumes that it can convert the string into the array directly without this grouping step. 
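    The grouping step can be sketched in isolation (hypothetical helper name;
    the real code goes through CodonSequenceGap and the codon alphabet):

    ```python
    def group_codons(seq):
        # split a nucleotide string into non-overlapping triples
        if len(seq) % 3:
            raise ValueError('length %d is not a multiple of 3' % len(seq))
        return [seq[i:i + 3] for i in range(0, len(seq), 3)]

    codons = group_codons('ATGGCCTAA')
    ```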
""" if isinstance(seqs, str): seqs = seqs.split('\n') return aln_from_model_seqs([CodonSequenceGap(s, Label=l) for l, s \ in cogent.parse.fasta.MinimalFastaParser(seqs)]) def xsample(self, n=None, with_replacement=False, motif_length=1, \ random_series=random): """Returns random sample of positions from self, e.g. to bootstrap. Arguments: - n: the number of positions to sample from the alignment. Default is alignment length - with_replacement: boolean flag for determining if sampled positions - random_series: a random number generator with .randint(min,max) .random() methods Notes: By default (resampling all positions without replacement), generates a permutation of the positions of the alignment. Setting with_replacement to True and otherwise leaving parameters as defaults generates a standard bootstrap resampling of the alignment. """ population_size = len(self) // motif_length if not n: n = population_size if with_replacement: locations = [random_series.randint(0, population_size) for samp in xrange(n)] else: assert n <= population_size, (n, population_size, motif_length) locations = random_series.sample(xrange(population_size), n) positions = [(loc*motif_length, (loc+1)*motif_length) for loc in locations] sample = Map(positions, parent_length=len(self)) return self.gappedByMap(sample, Info=self.Info) class CodonDenseAlignment(DenseAlignment): """Stores alignment of gapped codons, no degenerate symbols.""" InputHandlers = { 'array':aln_from_array, 'seqs':aln_from_model_seqs, 'generic':aln_from_generic, 'fasta':aln_from_fasta_codons, 'dense_aln':aln_from_dense_aln, 'aln': aln_from_collection, 'collection':aln_from_collection, 'dict':aln_from_dict, 'empty':aln_from_empty, } def make_gap_filter(template, gap_fraction, gap_run): """Returns f(seq) -> True if no gap runs and acceptable gap fraction. Calculations relative to template. 
    gap_run = number of consecutive gaps allowed in either the template or seq
    gap_fraction = fraction of positions that either have a gap in the
    template but not in the seq or in the seq but not in the template

    NOTE: template and seq must both be ModelSequence objects.
    """
    template_gaps = array(template.gapVector())

    def result(seq):
        """Returns True if seq adheres to the gap threshold and gap fraction."""
        seq_gaps = array(seq.gapVector())
        #check if gap amount bad
        if sum(seq_gaps != template_gaps)/float(len(seq)) > gap_fraction:
            return False
        #check if gap runs bad
        if '\x01'*gap_run in logical_and(seq_gaps, \
                logical_not(template_gaps)).astype(uint8).tostring():
            return False
        #check if insertion runs bad
        elif '\x01'*gap_run in logical_and(template_gaps, \
                logical_not(seq_gaps)).astype(uint8).tostring():
            return False
        return True
    return result

class Alignment(_Annotatable, AlignmentI, SequenceCollection):
    MolType = None  #note: this is reset to ASCII in moltype module

    def __init__(self, *args, **kwargs):
        """Returns new Alignment object: see SequenceCollection."""
        SequenceCollection.__init__(self, *args, **kwargs)
        #need to convert seqs to Aligned objects
        seqs = self.SeqData
        names = self.Names
        self._motif_probs = {}
        self._type = self.MolType.gettype()
        lengths = map(len, self.SeqData)
        if lengths and (max(lengths) != min(lengths)):
            raise DataError, "Not all sequences are the same length:\n" + \
                "max is %s, min is %s" % (max(lengths), min(lengths))
        aligned_seqs = []
        for s, n in zip(seqs, names):
            if isinstance(s, Aligned):
                s.Name = n  #ensure consistency
                aligned_seqs.append(s)
            else:
                aligned_seqs.append(self._seq_to_aligned(s, n))
        self.NamedSeqs = self.AlignedSeqs = dict(zip(names, aligned_seqs))
        self.SeqData = self._seqs = aligned_seqs

    def _coerce_seqs(self, seqs, is_array):
        if not min([isinstance(seq, _Annotatable) or isinstance(seq, Aligned)
                for seq in seqs]):
            seqs = map(self.MolType.Sequence, seqs)
        return seqs

    def _seq_to_aligned(self, seq, key):
        """Converts seq to Aligned object
-- override in subclasses""" (map, seq) = self.MolType.Sequence(seq, key).parseOutGaps() return Aligned(map, seq) def getTracks(self, policy): # drawing code related # same as sequence but annotations go below sequence tracks return policy.tracksForAlignment(self) def getChildTracks(self, policy): """The only Alignment method required for cogent.draw""" tracks = [] for label in self.Names: seq = self.NamedSeqs[label] tracks += seq.getTracks(policy.copy(seqname=label)) return tracks def __repr__(self): seqs = [] limit = 10 delimiter = '' for (count, name) in enumerate(self.Names): if count == 3: seqs.append('...') break elts = list(self.getGappedSeq(name)[:limit+1]) if len(elts) > limit: elts.append('...') seqs.append("%s[%s]" % (name, delimiter.join(elts))) seqs = ', '.join(seqs) return "%s x %s %s alignment: %s" % (len(self.Names), self.SeqLen, self._type, seqs) def _mapped(self, slicemap): align = [] for name in self.Names: align.append((name, self.NamedSeqs[name][slicemap])) return self.__class__(MolType=self.MolType, data=align) def gappedByMap(self, keep, **kwargs): # keep is a Map seqs = [] for seq_name in self.Names: aligned = self.NamedSeqs[seq_name] seqmap = aligned.map[keep] seq = aligned.data.gappedByMap(seqmap) seqs.append((seq_name, seq)) return self.__class__(MolType=self.MolType, data=seqs, **kwargs) def projectAnnotation(self, seq_name, annot): target_aligned = self.NamedSeqs[seq_name] if annot.parent is not self: raise ValueError('Annotation does not belong to this alignment') return annot.remappedTo(target_aligned.data, target_aligned.map) def getProjectedAnnotations(self, seq_name, *args): aln_annots = self.getAnnotationsMatching(*args) return [self.projectAnnotation(seq_name, a) for a in aln_annots] def getAnnotationsFromSequence(self, seq_name, *args): aligned = self.NamedSeqs[seq_name] return aligned.getAnnotationsMatching(self, *args) def getAnnotationsFromAnySequence(self, *args): result = [] for seq_name in self.Names: 
result.extend(self.getAnnotationsFromSequence(seq_name, *args)) return result def getBySequenceAnnotation(self, seq_name, *args): result = [] for feature in self.getAnnotationsFromSequence(seq_name, *args): segment = self[feature.map.Start:feature.map.End] segment.Name = '%s "%s" %s to %s of %s' % ( feature.type, feature.Name, feature.map.Start, feature.map.End, self.Name or '') result.append(segment) return result def withMaskedAnnotations(self, annot_types, mask_char=None, shadow=False): """returns an alignment with annot_types regions replaced by mask_char if shadow is False, otherwise all other regions are masked. Arguments: - annot_types: annotation type(s) - mask_char: must be a character valid for the seq MolType. The default value is the most ambiguous character, eg. '?' for DNA - shadow: whether to mask the annotated regions, or everything but the annotated regions""" masked_seqs = [] for seq in self.Seqs: # we mask each sequence using these spans masked_seqs += [seq._masked_annotations(annot_types,mask_char,shadow)] new = self.__class__(data=masked_seqs, Info=self.Info, Name=self.Name) return new def variablePositions(self, include_gap_motif = True): """Return a list of variable position indexes. Arguments: - include_gap_motif: if False, sequences with a gap motif in a column are ignored.""" seqs = [self.getGappedSeq(n) for n in self.Names] seq1 = seqs[0] positions = zip(*seqs[1:]) result = [] for (position, (motif1, column)) in enumerate(zip(seq1,positions)): for motif in column: if motif != motif1: if include_gap_motif: result.append(position) break elif motif != '-' and motif1 != '-': result.append(position) break return result def filtered(self, predicate, motif_length=1, **kwargs): """The alignment positions where predicate(column) is true. Arguments: - predicate: a callback function that takes an tuple of motifs and returns True/False - motif_length: length of the motifs the sequences should be split into, eg. 
3 for filtering aligned codons.""" gv = [] kept = False seqs = [self.getGappedSeq(n).getInMotifSize(motif_length, **kwargs) for n in self.Names] positions = zip(*seqs) for (position, column) in enumerate(positions): keep = predicate(column) if kept != keep: gv.append(position*motif_length) kept = keep if kept: gv.append(len(positions)*motif_length) locations = [(gv[i], gv[i+1]) for i in range(0, len(gv), 2)] keep = Map(locations, parent_length=len(self)) return self.gappedByMap(keep, Info=self.Info) def getSeq(self, seqname): """Return a ungapped Sequence object for the specified seqname. Note: always returns Sequence object, not ModelSequence. """ return self.NamedSeqs[seqname].data def getGappedSeq(self, seq_name, recode_gaps=False): """Return a gapped Sequence object for the specified seqname. Note: always returns Sequence object, not ModelSequence. """ return self.NamedSeqs[seq_name].getGappedSeq(recode_gaps) def iterPositions(self, pos_order=None): """Iterates over positions in the alignment, in order. pos_order refers to a list of indices (ints) specifying the column order. This lets you rearrange positions if you want to (e.g. to pull out individual codon positions). Note that self.iterPositions() always returns new objects, by default lists of elements. Use map(f, self.iterPositions) to apply the constructor or function f to the resulting lists (f must take a single list as a parameter). Note that some sequences (e.g. ViennaStructures) have rules that prevent arbitrary strings of their symbols from being valid objects. Will raise IndexError if one of the indices in order exceeds the sequence length. This will always happen on ragged alignments: assign to self.SeqLen to set all sequences to the same length. 
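        A minimal stand-alone sketch of this column iteration (toy data;
        names and helper are hypothetical, not the Aligned-object machinery
        used here):

        ```python
        named_seqs = {'seq1': 'AC-G', 'seq2': 'ACTG'}
        names = ['seq1', 'seq2']

        def iter_positions(named, order, pos_order=None):
            # yield one list per column, optionally in a caller-chosen order
            seqs = [named[n] for n in order]
            length = len(seqs[0])
            indices = pos_order if pos_order is not None else range(length)
            for pos in indices:
                yield [s[pos] for s in seqs]

        columns = list(iter_positions(named_seqs, names))
        # rearranged positions, e.g. to pull out particular columns first
        reordered = list(iter_positions(named_seqs, names, pos_order=[3, 0]))
        ```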
""" get = self.NamedSeqs.__getitem__ pos_order = pos_order or xrange(self.SeqLen) seq_order = self.Names aligned_objs = [get(seq) for seq in seq_order] seqs = map(str, aligned_objs) for pos in pos_order: yield [seq[pos] for seq in seqs] Positions = property(iterPositions) def withGapsFrom(self, template): """Same alignment but overwritten with the gaps from 'template'""" if len(self) != len(template): raise ValueError("Template alignment must be same length") gap = self.Alphabet.Gap tgp = template.Alphabet.Gap result = {} for name in self.Names: seq = self.getGappedSeq(name) if name not in template.Names: raise ValueError("Template alignment doesn't have a '%s'" % name) gsq = template.getGappedSeq(name) assert len(gsq) == len(seq) combo = [] for (s,g) in zip(seq, gsq): if g == tgp: combo.append(gap) else: combo.append(s) result[name] = combo return Alignment(result, Alphabet=self.Alphabet.withGapMotif()) def getDegappedRelativeTo(self, name): """Remove all columns with gaps in sequence with given name. Returns Alignment object of the same class. Note that the seqs in the new Alignment are always new objects. Arguments: - name: sequence name """ if name not in self.Names: raise ValueError("The alignment doesn't have a sequence named '{0}'" .format(name)) gap = self.Alphabet.Gap non_gap_cols = [i for i, col in enumerate(self.getGappedSeq(name)) if col != gap] return self.takePositions(non_gap_cols) def addFromReferenceAln(self, ref_aln, before_name=None, after_name=None): """ Insert sequence(s) to self based on their alignment to a reference sequence. Assumes the first sequence in ref_aln.Names[0] is the reference. By default the sequence is appended to the end of the alignment, this can be changed by using either before_name or after_name arguments. Returns Alignment object of the same class. Arguments: - ref_aln: reference alignment (Alignment object/series) of reference sequence and sequences to add. 
New sequences in ref_aln (ref_aln.Names[1:] are sequences to add. If series is used as ref_aln, it must have the structure [['ref_name', SEQ], ['name', SEQ]] - before_name: name of the sequence before which sequence is added - after_name: name of the sequence after which sequence is added If both before_name and after_name are specified seqs will be inserted using before_name. Example: Aln1: -AC-DEFGHI (name: seq1) XXXXXX--XX (name: seq2) YYYY-YYYYY (name: seq3) Aln2: ACDEFGHI (name: seq1) KL--MNPR (name: seqX) KLACMNPR (name: seqY) KL--MNPR (name: seqZ) Out: -AC-DEFGHI (name: seq1) XXXXXX--XX (name: seq2) YYYY-YYYYY (name: seq3) -KL---MNPR (name: seqX) -KL-ACMNPR (name: seqY) -KL---MNPR (name: seqZ) """ if type(ref_aln) != type(self): # let the seq class try and guess ref_aln = self.__class__(ref_aln) ref_seq_name = ref_aln.Names[0] if ref_seq_name not in self.Names: raise ValueError, "The name of reference sequence ({0})"\ "not found in the alignment \n(names in the alignment:\n{1}\n)"\ .format(ref_seq_name, "\n".join(self.Names)) if str(ref_aln.getGappedSeq(ref_seq_name)) \ != str(self.getSeq(ref_seq_name)): raise ValueError, "Reference sequences are unequal."\ "The reference sequence must not contain gaps" temp_aln = None for seq_name in ref_aln.Names[1:]: if seq_name in self.Names: raise ValueError, "The name of a sequence being added ({0})"\ "is already present".format(seq_name) seq = ref_aln.getGappedSeq(seq_name) new_seq = Aligned(self.NamedSeqs[ref_seq_name].map, seq) if not temp_aln: temp_aln = self.__class__({new_seq.Name: str(new_seq)}) else: temp_aln = temp_aln.addSeqs(self.__class__({new_seq.Name: str(new_seq)})) aln = self.addSeqs(temp_aln, before_name, after_name) return aln PyCogent-1.5.3/cogent/core/alphabet.py000644 000765 000024 00000077416 12024702176 020621 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ alphabet.py Contains classes for representing alphabets, and more general ordinations that map between a set of symbols and 
indices for storing the results in tables.

The provided alphabets are those encountered in biological sequences, but
other alphabets are certainly possible.

WARNING: do not access the standard Alphabets directly. It is expected that
you will access them through the appropriate MolType. Until the moltype module
has been imported, the Alphabets will not know their MolType, which will cause
problems. It is often useful to create Alphabets and/or Enumerations on the
fly, however.

MolType provides services for resolving ambiguities, or providing the correct
ambiguity for recoding -- will move to its own module.
"""
from cogent.util.array import cartesian_product

import re
import string

from numpy import array, sum, transpose, remainder, zeros, arange, newaxis, \
    ravel, asarray, fromstring, take, uint8, uint16, uint32
from string import maketrans, translate
import numpy

Float = numpy.core.numerictypes.sctype2char(float)
Int = numpy.core.numerictypes.sctype2char(int)

__author__ = "Peter Maxwell, Gavin Huttley and Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight",
               "Andrew Butterfield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

class AlphabetError(Exception):
    pass

def get_array_type(num_elements):
    """Returns smallest array type that can contain sequence on num_elements.

    Used to figure out how large a data type is needed for the array in which
    elements are indices from an alphabet. If the data type is too small
    (e.g. you allocated an uint8 array, with 256 possible states (0-255), but
    your data actually have more than 256 states, e.g. tripeptide data with
    20*20*20 = 8000 states), when you assign a state larger than the data type
    can hold you'll get an unexpected result.
For example, assigning state 800 in an array that can only hold 256 different states will actually give you the result mod 256: >>> a = array(range(10), uint8) >>> a array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],'B') >>> a[0] = 800 >>> a array([32, 1, 2, 3, 4, 5, 6, 7, 8, 9],'B') ^^ NOTE: element 1 is _not_ 800, but instead 32 -- nasty surprise! Getting the size of the necessary array from the Alphabet is a good solution to this problem. WARNING: Will not overflow if somehow you manage to feed it an alphabet with more than 2**32 elements, but it seems unlikely that this will happen very often in practice... """ if num_elements <= 256: return uint8 elif num_elements <= 2**16: return uint16 return uint32 def _make_translation_tables(a): """Makes translation tables between chars and indices. Return value is a tuple containing (a) the translation table where s.translate(a) -> array data, i.e. mapping characters to numbers, and (b) the translation table where a.tostring().translate(s) -> string of the characters in the original alphabet (e.g. array of 0..4 converted to strings of UCAG...). This is useful for alphabets where the entries are all single characters (e.g.nucleotides or amino acids, but not codons) because we can use translate() on the input string to make the array of values we need instead of having to convert each character into a Python object and look it up in some mapping. Using translate() can be thousands of times faster, so it's almost always worth it if you have a choice. """ indices = ''.join(map(chr, range(len(a)))) chars = ''.join(a) return maketrans(indices, chars), maketrans(chars, indices) def _make_complement_array(a, complements): """Makes translation array between item indices and their complements.""" comps = [complements.get(i, i) for i in a] return array(map(a.index, comps)) class Enumeration(tuple): """An ordered set of objects, e.g. a list of taxon labels or sequence ids. An Enumeration maps items to indices, and vice versa. Immutable. 
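    The item-to-index mapping at the heart of an Enumeration can be sketched
    with a toy tuple subclass (a stand-in with hypothetical snake_case names,
    not the real class):

    ```python
    class MiniEnumeration(tuple):
        # toy stand-in: maps items to indices and back, like Enumeration
        def __init__(self, data):
            self._obj_to_index = dict(
                (item, i) for i, item in enumerate(self))

        def to_indices(self, seq):
            return [self._obj_to_index[item] for item in seq]

        def from_indices(self, indices):
            return [self[i] for i in indices]

    rna = MiniEnumeration('UCAG')
    ```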
Must initialize with a sequence of (hashable) objects, in order. This is the base class for Alphabets. An Alphabet is a special case of Enumeration in which all the objects are strings of the same length. Stored as a tuple, but remember that if the elements in the tuple are mutable you can still mutate them in-place. Don't do this if you want your enumeration to work in a predictable fashion. Optionally takes a Gap parameter that defines the standard gap that will be used for output or for operations that act on gaps. Typically, this will be '-' or None, depending on the application. """ def __new__(cls, data=[], Gap=None, MolType=None): """Returns a new Enumeration object. data can be any sequence that can be passed to the tuple() constructor. Takes Gap as an argument but ignores it (handled in __init__). """ return tuple.__new__(cls,data) def __init__(self, data=[], Gap=None, MolType=None): """Initializes self from data, and optionally a gap. An Enumeration object mainly provides the mapping between objects and order so that you can convert symbols on an enumeration into numeric indices (e.g. recoding UCAG as the numbers 0,1,2,3, or recoding the set of species ['Human', 'Mouse', 'Fly'] as indices 0, 1 and 2 in a matrix. Properties: _obj_to_index: dict mapping the objects onto indices for fast lookup. index: provides the index of an object. __getitem__: provides the object at a specified index. Shape: shape of the data, typically an n x 1 array. _allowed_range: stores the range in which the enumeration elements occur (used for summing items that match a particular symbol). Gap: item to be used as a gap -- should typically appear in data too. ArrayType: type of array needed to store all the symbols in the enumeration, e.g. if your enumeration has > 256 objects in it you need to use uint16, not uint8, because it will wrap around otherwise. 
Also constrains the types to unsigned integer types so you don't accidentally use negative numbers as indices (this is very bad when doing indexed lookups). """ self.MolType = MolType #check if motif lengths are homogeneous -- if so, set length try: motif_lengths = frozenset(map(len, self)) if len(motif_lengths) > 1: self._motiflen = None else: self._motiflen = list(motif_lengths)[0] except TypeError: #some motifs don't support __len__, e.g. ints self._motiflen = None #make the quick_motifset for fast lookups; check for duplicates. self._quick_motifset = frozenset(self) if len(self._quick_motifset) != len(self): #got duplicates: show user what they sent in raise TypeError, 'Alphabet initialized with duplicate values:\n' +\ str(self) self._obj_to_index = dict(zip(self, range(len(self)))) #handle gaps self.Gap = Gap if Gap and (Gap in self): gap_index = self.index(Gap) if gap_index >= 0: self.GapIndex = gap_index try: self._gapmotif = self.Gap * self._motiflen except TypeError: #self._motiflen was probably None self._gapmotif = self.Gap self.Shape = (len(self),) #_allowed_range provides for fast sums of matching items self._allowed_range = arange(len(self))[:,newaxis] self.ArrayType = get_array_type(len(self)) self._complement_array = None #set in moltypes.py for standard types def index(self, item): """Returns the index of a specified item. This goes through an extra object lookup. If you _really_ need speed, you can bind self._obj_to_index.__getitem__ directly, but this is not recommended because the internal implementation may change.""" return self._obj_to_index[item] def toIndices(self, data): """Returns sequence of indices from sequence of elements. Raises KeyError if some of the elements were not found. Expects data to be a sequence (e.g. list of tuple) of items that are in the Enumeration. Returns a list containing the index of each element in the input, in order. e.g. 
for the RNA alphabet ('U','C','A','G'), the sequence 'CCAU' would produce the result [1,1,2,0], returning the index of each element in the input. """ return map(self._obj_to_index.__getitem__, data) def isValid(self, seq): """Returns True if seq contains only items in self.""" try: self.toIndices(seq) return True except (KeyError, TypeError): return False def fromIndices(self, data): """Returns sequence of elements from sequence of indices. Specifically, takes as input a sequence of numbers corresponding to elements in the Enumeration (i.e. the numbers must all be < len(self). Returns a list of the items in the same order as the indices. Inverse of toIndices. e.g. for the DNA alphabet ('U','C','A','G'), the sequence [1,1,2,0] would produce the result 'CCAU', returning the element corresponding to each element in the input. """ #if it's a normal Python type, map will work try: return map(self.__getitem__, data) #otherwise, it's probably an array object. except TypeError: try: data = map(int, data) except (TypeError, ValueError): #might be char array? print "DATA", data print "FIRST MAP:", map(str, data) print "SECOND MAP:", map(ord, map(str, data)) data = map(ord, map(str, data)) return(map(self.__getitem__, data)) def __pow__(self, num): """Returns JointEnumeration with num copies of self. A JointEnumeration is an Enumeration of tuples on the original enumeration (although these may be mapped to a different data type, e.g. a JointAlphabet is still an Alphabet, so its members are fixed-length strings). For example, a trinucleotide alphabet (or codons) would be a JointEnumeration on the nucleotides. So RnaBases**3 is RnaCodons (except for some additional logic for converting between tuples of one-letter strings and the desired 3-letter strings). All subenumerations of a JointEnumeration made by __pow__ are identical. """ return JointEnumeration([self]*num, MolType=self.MolType) def __mul__(self, other): """Returns JointEnumeration between self and other. 
        Specifically, returns a JointEnumeration whose elements are (a,b)
        where a is an element of the first enumeration and b is an element of
        the second enumeration. For example, a JointEnumeration of 'ab' and
        'cd' would have the four elements ('a','c'), ('a','d'), ('b','c'),
        ('b','d').

        A JointEnumeration is an enumeration of tuples on more than one
        enumeration, where the first element in each tuple comes from the
        first enumeration, the second from the second enumeration, and so on.
        JointEnumerations are useful as the basis for contingency tables,
        transition matrices, counts of dinucleotides, etc.
        """
        if self.MolType is other.MolType:
            MolType = self.MolType
        else:
            MolType = None
        return JointEnumeration([self, other], MolType=MolType)

    def counts(self, a):
        """Returns array containing counts of each item in a.

        For example, on the enumeration 'UCAG', the sequence 'CCUG' would
        return the array [1,2,0,1] reflecting one count for the first item in
        the enumeration ('U'), two counts for the second item ('C'), no counts
        for the third item ('A'), and one count for the last item ('G').

        The result will always be a vector of Int with length equal to the
        length of the enumeration. We return Int and not an unsigned type
        because it's common to subtract counts, which produces surprising
        results on unsigned types (i.e. wraparound to maxint) unless the type
        is explicitly coerced by the user.

        Silently ignores any unrecognized indices, e.g. if your enumeration
        contains 'TCAG' and you get an 'X', the 'X' will be ignored because it
        has no index in the enumeration.
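        A plain-Python sketch of this counting behaviour (hypothetical
        helper; the real method returns a numpy vector of Int):

        ```python
        def symbol_counts(enumeration, seq):
            # count occurrences of each enumeration item; unknowns ignored
            index = dict((item, i) for i, item in enumerate(enumeration))
            counts = [0] * len(enumeration)
            for item in seq:
                if item in index:
                    counts[index[item]] += 1
            return counts

        counts = symbol_counts('UCAG', 'CCUG')
        ```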
""" try: data = ravel(a) except ValueError: #ravel failed; try coercing to array try: data = ravel(array(a)) except ValueError: #try mapping to string data = ravel(array(map(str, a))) return sum(asarray(self._allowed_range == data, Int), axis=-1) def _get_pairs(self): """Accessor for pairs, lazy evaluation.""" if not hasattr(self, '_pairs'): self._pairs = self**2 return self._pairs Pairs = property(_get_pairs) def _get_triples(self): """Accessor for triples, lazy evaluation.""" if not hasattr(self, '_triples'): self._triples = self**3 return self._triples Triples = property(_get_triples) class JointEnumeration(Enumeration): """Holds an enumeration composed of subenumerations. Immutable. JointEnumeration[i] will return tuple of items on each of the constituent alphabets. For example, a JointEnumeration between the enumerations 'ab' and 'cd' would have four elements: ('a','c'),('a','d'),('b','c'),('b','d'). (note that if doing a JointAlphabet, these would be strings, not tuples). Note that the two enumerations do not have to be the same, although it is often convenient if they are (e.g. pair enumerations that underlie substitution matrices). """ def __new__(cls, data=[], Gap=None, MolType=None): """Fills in the tuple with tuples from the enumerations in data.""" sub_enums = cls._coerce_enumerations(data) return Enumeration.__new__(cls, cartesian_product(sub_enums), \ MolType=MolType) def __init__(self, data=[], Gap=None, MolType=None): """Returns a new JointEnumeration object. See class docstring for info. Expects a list of Enumeration objects, or objects that can be coerced into Enumeration objects (basically, anything that can be a tuple). Does NOT have an independent concept of a gap -- gets the gaps from the constituent subenumerations. """ self.SubEnumerations = self._coerce_enumerations(data) sub_enum_lengths = map(len, self.SubEnumerations) #build factors for combining symbols. 
curr_factor = 1 sub_enum_factors = [curr_factor] for i in sub_enum_lengths[-1:0:-1]: curr_factor *= i sub_enum_factors = [curr_factor] + sub_enum_factors self._sub_enum_factors = transpose(array([sub_enum_factors])) try: #figure out the gaps correctly gaps = [i.Gap for i in self.SubEnumerations] self.Gap = tuple(gaps) gap_indices = array([i.GapIndex for i in self.SubEnumerations]) gap_indices *= sub_enum_factors self.GapIndex = sum(gap_indices) except (TypeError, AttributeError): #index not settable self.Gap = None super(JointEnumeration, self).__init__(self, self.Gap) #remember to reset shape after superclass init self.Shape = tuple(sub_enum_lengths) def _coerce_enumerations(cls, enums): """Coerces putative enumerations into Enumeration objects. For each object passed in, if it's an Enumeration object already, use that object without translation/conversion. If it isn't, call the Enumeration constructor on it and append the new Enumeration to the result. Note that this means you can construct JointEnumerations where the subenumerations have the same data but are different objects -- in general, you probably don't want to do this (i.e. you should make it into an Enumeration beforehand and pass n references to that Enumeration in as a list: a = Enumeration('abc') j = JointEnumeration([a,a,a]) ... not a = JointEnumeration(['abc','abc','abc']) """ result = [] for a in enums: if isinstance(a, Enumeration): result.append(a) else: result.append(Enumeration(a)) return result def packArrays(self, arrays): """Packs parallel arrays on subenums to single array on joint enum. WARNING: must pass arrays as single object. This method takes a single array in which each row is an array of indices on the appropriate subenumeration. For example, you might have arrays for the bases at the first, second, and third positions of each codon in a gene, and want to pack them together into a single Codons object. 
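        The factor arithmetic behind packing and unpacking can be sketched
        without numpy (hypothetical helpers; the real methods use the
        precomputed self._sub_enum_factors):

        ```python
        def pack_indices(rows, lengths):
            # combine parallel index rows into joint indices (row-major)
            factors = [1] * len(lengths)
            for i in range(len(lengths) - 2, -1, -1):
                factors[i] = factors[i + 1] * lengths[i + 1]
            n = len(rows[0])
            return [sum(f * row[j] for f, row in zip(factors, rows))
                    for j in range(n)]

        def unpack_indices(joint, lengths):
            # inverse of pack_indices: split joint indices back into rows
            rows = [[0] * len(joint) for _ in lengths]
            for j, value in enumerate(joint):
                for i in range(len(lengths) - 1, -1, -1):
                    rows[i][j] = value % lengths[i]
                    value //= lengths[i]
            return rows
        ```

        E.g. on a codon-like joint enumeration over three 4-letter alphabets,
        the index of (i1, i2, i3) is i1*16 + i2*4 + i3.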
packArrays() allows you to do this without having to explicitly interleave the arrays into a single sequence and then convert it back on the JointEnumeration. Notes: - Expects a single array object, where the rows (first dimension) correspond to information about the same set of data on a different enumeration (or, as in the case of codons, on the same enumeration but at a different position). This means that if you're constructing the array on the fly, the number of elements you have in each enumeration must be the same. - Arrays must already be converted into indices -- for example, you can't pass in raw strings as sequences, but you can pass in the data from cogent.seqsim.Sequence objects. - This method is the inverse of unpackArrays(). - Uses self.ArrayType to figure out the type of array to return (e.g. the amino acids may use a character array, but you need a larger data type to store indices on a JointEnumeration of pairs or triples of amino acids). """ return sum(self._sub_enum_factors * array(arrays, self.ArrayType), axis=0) def unpackArrays(self, a): """Unpacks array on joint enum to individual arrays on subenums. Returns result as single numpy array object. This method takes a single vector of indices on the appropriate JointEnumeration, and returns an array where the rows are, in order, vectors of the appropriate indices on each subenumeration. For example, you might have a sequence on the Codons enumeration, and want to unpack it into the corresponding sequences of the first, second, and third position bases in each codon. unpackArrays() allows you to do this . Notes: - Will always return a single array object, with number of rows equal to the number of subenumerations in self. - Will always return a value for each enumeration for each item, e.g. a sequence on the codon enumeration will always return three sequences on the individual enumerations that all have the same length (packed into a single array). 
- Output will always use the same typecode as the input array. """ a = array(a) num_enums = len(self.SubEnumerations) result = zeros((num_enums, len(a))) lengths = self.Shape # count backwards through the enumeration lengths and add to array for i in range(num_enums-1, -1, -1): length = lengths[i] result[i] = a % length a /= array(length,a.dtype.char) return result # the following, _coerce_enumerations, is a class method because we use # it in __new__ before we have an instance to call it on. _coerce_enumerations = classmethod(_coerce_enumerations) class Alphabet(Enumeration): """An ordered set of fixed-length strings, e.g. the 61 sense codons. Ambiguities (e.g. N for any base in DNA) are not considered part of the alphabet itself, although a sequence is valid on the alphabet even if it contains ambiguities that are known to the alphabet. A gap is considered a separate motif and is not part of the alphabet itself. The typical use is for the Alphabet to hold nucleic acid bases, amino acids, or codons. The MolType, if supplied, handles ambiguities, coercion of the sequence to the correct data type, and complementation (if appropriate). """ # make this exception available to objects calling alphabet methods. AlphabetError = AlphabetError def __new__(cls, motifset, Gap='-', MolType=None): """Returns a new Alphabet object.""" return Enumeration.__new__(cls, data=motifset, Gap=Gap, \ MolType=MolType) def __init__(self, motifset, Gap='-', MolType=None): """Returns a new Alphabet object.""" super(Alphabet, self).__init__(data=motifset, Gap=Gap, \ MolType=MolType) def getWordAlphabet(self, length): """Returns a new Alphabet object with items as length-n strings. Note that the result is not a JointEnumeration object, and cannot unpack its indices. However, the items in the result _are_ all strings.
""" crossproduct = [''] for a in range(length): n = [] for c in crossproduct: for m in self: n.append(m+c) crossproduct = n return Alphabet(crossproduct, MolType=self.MolType) def fromSequenceToArray(self, sequence): """Returns an array of indices corresponding to items in sequence. Unlike toIndices() in superclass, this method returns a numpy array object. It also breaks the seqeunce into items in the current alphabet (e.g. breaking a raw DNA sequence into codons), which toIndices() does not do. It also requires the sequence to be a Sequence object rather than an arbitrary string, tuple, etc. """ sequence = sequence.getInMotifSize(self._motiflen) return array(map(self.index, sequence)) def fromOrdinalsToSequence(self, data): """Returns a Sequence object corresponding to indices in data. Unlike fromIndices() in superclass, this method uses the MolType to coerce the result into a sequence of the correct class. Note that if the MolType is not set, this method will raise an AttributeError. """ result = '' return self.MolType.makeSequence(''.join(self[i] for i in data)) def fromAmbigToLikelihoods(self, motifs, dtype=Float): """Returns an array in which rows are motifs, columns are items in self. Result is an array of Float in which a[i][j] indicates whether the ith motif passed in as motifs is a symbol that matches the jth character in self. For example, on the DNA alphabet 'TCAG', the degenerate symbol 'Y' would correspond to the row [1,1,0,0] because Y is a degenerate symbol that encompasses T and C but not A or G. This code is similar to code in the Profile class, and should perhaps be merged with it (in particular, because there is nothing likelihood- specific about the resulting match table). 
""" result = zeros([len(motifs), len(self)], dtype) obj_to_index = self._obj_to_index for (u, ambig_motif) in enumerate(motifs): for motif in self.resolveAmbiguity(ambig_motif): result[u, obj_to_index[motif]] = 1.0 return result def getMotifLen(self): """Returns the length of the items in self, or None if they differ.""" return self._motiflen def getGapMotif(self): """Returns the motif that self is using as a gap. Note that this will typically be a multiple of self.Gap. """ return self._gapmotif def includesGapMotif(self): """Returns True if self includes the gap motif, False otherwise.""" return self._gapmotif in self def _with(self, motifset): """Returns a new Alphabet object with same class and moltype as self. Will always return a new Alphabet object even if the motifset is the same. """ return self.__class__(tuple(motifset), MolType=self.MolType) def withGapMotif(self): """Returns an Alphabet object resembling self but including the gap. Always returns the same object. """ if self.includesGapMotif(): return self if not hasattr(self, 'Gapped'): self.Gapped = self._with(list(self) + [self.getGapMotif()]) return self.Gapped def getSubset(self, motif_subset, excluded=False): """Returns a new Alphabet object containing a subset of motifs in self. Raises an exception if any of the items in the subset are not already in self. Always returns a new object. """ if isinstance(motif_subset, dict): motif_subset = [m for m in motif_subset if motif_subset[m]] for m in motif_subset: if m not in self: raise AlphabetError(m) if excluded: motif_subset = [m for m in self if m not in motif_subset] return self._with(motif_subset) def resolveAmbiguity(self, ambig_motif): """Returns set of symbols corresponding to ambig_motif. Handles multi-character symbols and screens against the set of valid motifs, unlike the MolType version. 
""" # shortcut easy case if ambig_motif in self._quick_motifset: return (ambig_motif,) # resolve each letter, and build the possible sub motifs ambiguities = self.MolType.Ambiguities motif_set = [''] ALL = self.MolType.Alphabet.withGapMotif() for character in ambig_motif: new_motifs = [] if character == '?': resolved = ALL elif character == '-': resolved = ['-'] else: try: resolved = ambiguities[character] except KeyError: raise AlphabetError(ambig_motif) for character2 in resolved: for motif in motif_set: new_motifs.append(''.join([motif, character2])) motif_set = new_motifs # delete sub motifs that are not to be included motif_set = [motif for motif in motif_set if motif in self._quick_motifset] if not motif_set: raise AlphabetError(ambig_motif) return tuple(motif_set) def adaptMotifProbs(self, motif_probs): """Prepare an array or dictionary of probabilities for use with this alphabet by checking size and order""" if hasattr(motif_probs, 'keys'): sample = motif_probs.keys()[0] if sample not in self: raise ValueError("Can't find motif %s in alphabet" % sample) motif_probs = numpy.array( [motif_probs[motif] for motif in self]) else: if len(motif_probs) != len(self): if len(motif_probs) != len(self): raise ValueError("Can't match %s probs to %s alphabet" % (len(motif_probs), len(self))) motif_probs = numpy.asarray(motif_probs) assert abs(sum(motif_probs)-1.0) < 0.0001, motif_probs return motif_probs class CharAlphabet(Alphabet): """Holds an alphabet whose items are single chars. The general Alphabet can hold items of any type, but this is inconvenient if your Alphabet is characters-only because you get back operations on the alphabet as tuples of single-character strings instead of the strings you probably want. Having a separate represntation for CharAlphabets also allows certain efficiencies, such as using translation tables to map characters and indices instead of having to extract each element searately for remapping. 
""" def __init__(self, data=[], Gap='-', MolType=None): """Initializes self from items. data should be a sequence (string, list, etc.) of characters that are in the alphabet, e.g. 'UCAG' for RNA. Gap should be a single character that represents the gap, e.g. '-'. """ super(CharAlphabet, self).__init__(data, Gap, MolType=MolType) self._indices_to_chars, self._chars_to_indices = \ _make_translation_tables(data) self._char_nums_to_indices = array(self._chars_to_indices,'c').view('B') self._indices_nums_to_chars = array(self._indices_to_chars, 'c') def fromString(self, data): """Returns array of indices from string containing elements. data should be a string on the alphabet, e.g. 'ACC' for the RNA alhabet 'UCAG' would return the array [2,1,1]. This is useful for converting strings into arrays of small integers on the alphabet, e.g. for reading a Sequence from a string. This is on the Alphabet, not the Sequence, because lots of objects (e.g. Profile, Alignment) also need to use it. """ return fromstring(translate(data, self._chars_to_indices), uint8) def isValid(self, seq): """Returns True if seq contains only items in self.""" try: if len(seq) == 0: #can't be invalid if empty return True ind = self.toIndices(seq) return max(ind) < len(self) and min(ind) >= 0 except (TypeError, KeyError): return False def fromArray(self, data): """Returns array of indices from array containing elements. This is useful if, instead of a string, you have an array of characters that's been converted into a numpy array. See fromString docstring for general behavior. """ return take(self._char_nums_to_indices, data.view('B')) def toChars(self, data): """Converts array of indices into array of elements. For example, on the 'UCAG' RNA alphabet, an array with the data [0,1,1] would return the characters [U,C,C] in a byte array. """ return take(self._indices_nums_to_chars, data.astype('B')) def toString(self, data, delimiter='\n'): """Converts array of data into string. 
For example, on the 'UCAG' RNA alphabet, an array with the data [0,1,1] would return the string 'UCC'. This is the most useful conversion mechanism, and is used by e.g. the Sequence object to convert the internal data back into strings for output. """ s = data.shape if not s: return '' elif len(s) == 1: return self.toChars(data).tostring() else: return delimiter.join([i.tostring() for i in self.toChars(data)]) PyCogent-1.5.3/cogent/core/annotation.py000644 000765 000024 00000027754 12024702176 021213 0ustar00jrideoutstaff000000 000000 from location import as_map, Map import numpy __author__ = "Peter Maxwell and Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" class _Annotatable(object): # default annotations = () # Subclasses should provide __init__, getOwnTracks, and a _mapped for use by # __getitem__ def _slicedAnnotations(self, new, slice): result = [] if self.annotations: slicemap = self._as_map(slice) #try: newmap = slicemap.inverse() #except ValueError, detail: # print "Annotations dropped because %s" % detail # return [] if slicemap.useful: for annot in self.annotations: if not annot.map.useful: continue if annot.map.Start < slicemap.End and \ annot.map.End > slicemap.Start: annot = annot.remappedTo(new, newmap) if annot.map.useful: result.append(annot) return result def _shiftedAnnotations(self, new, shift): result = [] if self.annotations: newmap = Map([(shift, shift+len(self))], parent_length=len(new)) for annot in self.annotations: annot = annot.remappedTo(new, newmap) result.append(annot) return result def _as_map(self, index): """Can take a slice, integer, map or feature, or even a list/tuple of those""" if type(index) in [list, tuple]: spans = [] for i in index: spans.extend(self._as_map(i).spans) map = Map(spans=spans, 
parent_length=len(self)) elif isinstance(index, _Feature): feature = index map = feature.map base = feature.parent containers = [] while feature and base is not self and hasattr(base, 'parent'): containers.append(base) base = base.parent if base is not self: raise ValueError("Can't map %s onto %s via %s" % (index, repr(self), containers)) for base in containers: feature = feature.remappedTo(base, base.map) index = map else: map = as_map(index, len(self)) return map def __getitem__(self, index): map = self._as_map(index) new = self._mapped(map) sliced_annots = self._slicedAnnotations(new, map) new.attachAnnotations(sliced_annots) return new def _mapped(self, map): raise NotImplementedError def getAnnotationTracks(self, policy): result = [] for annot in self.annotations: result.extend(annot.getTracks(policy)) return result def addAnnotation(self, klass, *args, **kw): annot = klass(self, *args, **kw) self.attachAnnotations([annot]) return annot def attachAnnotations(self, annots): for annot in annots: if annot.parent is not self: raise ValueError("doesn't belong here") if annot.attached: raise ValueError("already attached") if self.annotations is self.__class__.annotations: self.annotations = [] self.annotations.extend(annots) for annot in annots: annot.attached = True def detachAnnotations(self, annots): for annot in annots: if annot.parent is not self: raise ValueError("doesn't live here") for annot in annots: if annot.attached: self.annotations.remove(annot) annot.attached = False def addFeature(self, type, Name, spans): return self.addAnnotation(Feature, type, Name, spans) def getAnnotationsMatching(self, annotation_type, Name=None): result = [] for annotation in self.annotations: if annotation_type == annotation.type and ( Name is None or Name == annotation.Name): result.append(annotation) return result def getRegionCoveringAll(self, annotations): spans = [] annotation_types = [] for annot in annotations: spans.extend(annot.map.spans) if annot.type not in 
annotation_types: annotation_types.append(annot.type) map = Map(spans=spans, parent_length=len(self)) map = map.covered() # No overlaps Name = ','.join(annotation_types) return _Feature(self, map, type='region', Name=Name) def getByAnnotation(self, annotation_type, Name=None, ignore_partial=False): """yields the sequence segments corresponding to the specified annotation_type and Name one at a time. Arguments: - ignore_partial: if True, annotations that extend beyond the current sequence are ignored.""" for annotation in self.getAnnotationsMatching(annotation_type, Name): try: seq = self[annotation.map] except ValueError, msg: if ignore_partial: continue raise msg seq.Info['Name'] = annotation.Name yield seq def _annotations_nucleic_reversed_on(self, new): """applies self.annotations to new with coordinates adjusted for reverse complement.""" assert len(new) == len(self) annotations = [] for annot in self.annotations: new_map = annot.map.nucleicReversed() annotations.append(annot.__class__(new, new_map, annot)) new.attachAnnotations(annotations) class _Feature(_Annotatable): qualifier_names = ['type', 'Name'] def __init__(self, parent, map, original=None, **kw): assert isinstance(parent, _Annotatable), parent self.parent = parent self.attached = False self.map = map if hasattr(parent, 'base'): self.base = parent.base self.base_map = parent.base_map[self.map] else: self.base = parent self.base_map = map for n in self.qualifier_names: if n in kw: setattr(self, n, kw.pop(n)) else: setattr(self, n, getattr(original, n)) assert not kw, kw def attach(self): self.parent.attachAnnotations([self]) def detach(self): self.parent.detachAnnotations([self]) def _mapped(self, slicemap): Name = "%s of %s" % (repr(slicemap), self.Name) return _Feature(self, slicemap, type="slice", Name=Name) def getSlice(self, complete=True): """The corresponding sequence fragment. 
If 'complete' is true and the full length of this feature is not present in the sequence then this method will fail.""" map = self.base_map if not (complete or map.complete): map = map.withoutGaps() return self.base[map] def withoutLostSpans(self): """Keeps only the parts which are actually present in the underlying sequence""" if self.map.complete: return self keep = self.map.nongap() new = type(self)(self.parent, self.map[keep], original=self) if self.annotations: sliced_annots = self._slicedAnnotations(new, keep) new.attachAnnotations(sliced_annots) return new def asOneSpan(self): new_map = self.map.getCoveringSpan() return _Feature(self.parent, new_map, type="span", Name=self.Name) def getShadow(self): return _Feature(self.parent, self.map.shadow(), type='region', Name='not '+ self.Name) def __len__(self): return len(self.map) def __repr__(self): Name = getattr(self, 'Name', '') if Name: Name = ' "%s"' % Name return '%s%s at %s' % (self.type, Name, self.map) def remappedTo(self, grandparent, gmap): map = gmap[self.map] return self.__class__(grandparent, map, original=self) def getCoordinates(self): """returns sequence coordinates of this Feature as [(start1, end1), ...]""" coordinates = [(span.Start, span.End) for span in self.map.spans] return coordinates class AnnotatableFeature(_Feature): """These features can themselves be annotated.""" def _mapped(self, slicemap): new_map = self.map[slicemap] return _Feature(self.parent, new_map, type='slice', Name='') def remappedTo(self, grandparent, gmap): new = _Feature.remappedTo(self, grandparent, gmap) new.annotations = [annot for annot in self.annotations if annot.map.useful] return new def getTracks(self, policy): return policy.at(self.map).tracksForFeature(self) class Source(_Feature): # Has two maps - where it is on the sequence it annotates, and # where it is on the original sequence. 
type = 'source' def __init__(self, seq, map, accession, basemap): self.accession = accession self.Name = repr(basemap) + ' of ' + accession self.parent = seq self.attached = False self.map = map self.basemap = basemap def remappedTo(self, grandparent, gmap): new_map = gmap[self.map] # unlike other annotations, sources are divisible, so throw # away gaps. since they don't have annotations it's simple. ng = new_map.nongap() new_map = new_map[ng] basemap = self.basemap[ng] return self.__class__(grandparent, new_map, self.accession, basemap) def withoutLostSpans(self): return self def Feature(parent, type, Name, spans, value=None): if isinstance(spans, Map): map = spans assert map.parent_length == len(parent), (map, len(parent)) else: map = Map(locations=spans, parent_length=len(parent)) return AnnotatableFeature(parent, map, type=type, Name=Name) class _Variable(_Feature): qualifier_names = _Feature.qualifier_names + ['xxy_list'] def getTracks(self, policy): return policy.tracksForVariable(self) def withoutLostSpans(self): if self.map.complete: return self raise NotImplementedError def Variable(parent, type, Name, xxy_list): """A variable that has 2 x-components (start, end) and a single y component. 
Currently used by Vestige - BMC Bioinformatics, 6:130, 2005.""" start = min([min(x1, x2) for ((x1, x2), y) in xxy_list]) end = max([max(x1, x2) for ((x1, x2), y) in xxy_list]) if start != 0: xxy_list = [((x1-start, x2-start), y) for ((x1, x2), y) in xxy_list] end -= start #values = [location.Span(x1-start, x2-start, True, True, y) for ((x1, x2), y) in xxy] map = Map([(start, end)], parent_length=len(parent)) return _Variable(parent, map, type=type, Name=Name, xxy_list=xxy_list) class _SimpleVariable(_Feature): qualifier_names = _Feature.qualifier_names + ['data'] def getTracks(self, policy): return policy.tracks_for_value(self) def withoutLostSpans(self): if self.map.complete: return self keep = self.map.nongap() indices = numpy.concatenate([list(span) for span in keep.Spans]) data = numpy.asarray(self.data)[indices] new = type(self)(self.parent, self.map[keep], data=data, original=self) return new def SimpleVariable(parent, type, Name, data): """A simple variable type of annotation, such as a computed property of a sequence that varies spatially.""" assert len(data) == len(parent), (len(data), len(parent)) map = Map([(0, len(data))], parent_length=len(parent)) return _SimpleVariable(parent, map, type=type, Name=Name, data=data) PyCogent-1.5.3/cogent/core/bitvector.py000644 000765 000024 00000067147 12024702176 021042 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Classes for dealing with bitvectors: arrays of 1 and 0. bitvectors are often much faster than numpy arrays for bit operations, especially if several characters (e.g. of DNA) can be packed into a single byte. Usage: v = Bitvector('11000101') Provides the following major classes and factory functions: ShortBitvector and LongBitvector: subclass ImmutableBitvector and are produced by the factory function Bitvector(). Short uses an int, while Long uses a long. ShortBitvectors in particular are _very_ fast, but are limited to 31 bits.
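The core trick, minus the class machinery, can be sketched without cogent at all: a vector of bits is just an integer, and position-wise set operations are the machine's bitwise operators (names below are illustrative):

```python
# Two 8-bit vectors stored as plain integers
a = int('11000101', 2)
b = int('10100100', 2)

def as_bits(n):
    # fixed 8-bit width for display
    return format(n, '08b')

and_bits = as_bits(a & b)   # positions set in both vectors
or_bits = as_bits(a | b)    # positions set in either vector
xor_bits = as_bits(a ^ b)   # positions where the vectors differ
```

Here and_bits is '10000100', or_bits is '11100101', and xor_bits is '01100001'.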
Bitvector() will return the appropriate kind of vector for your sequence. MutableBitvector: always a LongBitvector. Can be changed through __getitem__, e.g. vec[3] = 0. Use freeze() and thaw() methods to convert between mutable and immutable Bitvectors (both types define both methods). VectorFromCases: constructs a Bitvector from letters in a string. VectorFromMatches: constructs a Bitvector from a text and a pattern, with 1 at the positions where there's a match. Pattern can be a string or a regex. PackedBases: subclasses LongBitvector: provides a way of storing nucleic acid bases compactly and in a format convenient for assessing sequence similarity. Note: there isn't a general VectorFromX factory function because if you have a function, you can always do Bitvector(map(func, items)), making VectorFromX trivial. """ from cogent.util.misc import Delegator import re from string import maketrans from operator import and_, or_, xor from numpy import log2 from sys import maxint __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Rob Knight", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Production" _bits_in_int = int(round(log2(maxint))) def is_nonzero_string_char(char): """Tests whether input character is not '0', returning '1' or '0'.""" if char == '0' or char == '': return '0' else: return '1' def is_nonzero_char(item): """Tests whether input item is nonzero, returning '1' or '0'.""" if isinstance(item, str): return is_nonzero_string_char(item) else: if item: return '1' else: return '0' def seq_to_bitstring(seq): """Converts sequence to string of 1 and 0, returning string of '1'/'0'.""" if not seq: return '' if isinstance(seq, str): return ''.join(map(is_nonzero_string_char, seq)) else: return ''.join(map(is_nonzero_char, seq)) def is_nonzero_string_int(char): """Tests whether input character 
is not '0', returning 1 or 0.""" if char == '0' or char == '': return 0 else: return 1 def is_nonzero_int(item): """Tests whether input item is nonzero, returning 1 or 0.""" if isinstance(item, str): return is_nonzero_string_int(item) else: if item: return 1 else: return 0 def seq_to_bitlist(seq): """Converts sequence to list of 1 and 0.""" if not seq: return [] if isinstance(seq, str): return map(is_nonzero_string_int, seq) else: return map(is_nonzero_int, seq) def num_to_bitstring(num, length): """Returns string of bits from number, truncated at length. Algorithm: While the number storing the bitvector is greater than zero, find the last digit as (num mod 2) and then bit-shift the string to the right to delete the last digit. Add the digits into a string from right to left (in reverse order). In other words, when length is too small for the number, takes the last n bits of the number in order. Warning: this is not terribly efficient, since the entire number must be rewritten at each shift step. In mutable bitvectors, use caching to reduce the number of conversions required. 
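Trimmed to essentials, the shift-and-mask loop described above behaves like this standalone copy (renamed, and runnable on modern Python as well):

```python
def num_to_bits(num, length):
    # peel off the low bit each step, filling the string right to left
    bits = [0] * length              # start with all zeroes
    while num and length:
        bits[length - 1] = 1 & num   # grab the lowest bit
        num >>= 1                    # shift it away
        length -= 1
    return ''.join(map(str, bits))
```

For example, num_to_bits(5, 8) gives '00000101', while num_to_bits(5, 2) truncates to the last two bits, '01'.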
""" bits = [0] * length #start with all zeroes #successively find the last bit of curr, replacing the #appropriate element of bits as each bit is processed and #removed while num and length: bits[length-1] = 1 & num num >>= 1 length -= 1 return ''.join(map(str, bits)) def bitcount(num, length, value=1): """Counts the bits in num that are set to value (1 or 0, default 1).""" one_count = 0 curr_length = length while num and curr_length: one_count += 1 & num num >>= 1 curr_length -= 1 if value: return one_count else: return length - one_count class ImmutableBitvector(object): """Generic interface for immutable bitvectors.""" def __init__(self, items='', length=None): """Returns new Bitvector.""" if length is not None: self._length = length else: self._length = len(items) def __str__(self): """Returns string representation of bitvector.""" return num_to_bitstring(self, self._length) def __len__(self): """Returns length of the bitvector.""" return self._length def op(self, other, func): """Performs bitwise op on self and other, returning new Bitvector.""" self_length = self._length if not isinstance(other, ImmutableBitvector): #coerce to comparable type try: other = other.freeze() other_length = other._length except: other = Bitvector(other, self_length) other_length = self_length else: other_length = other._length if self_length == other_length: #if they're the same length, just combine the vectors return Bitvector(func(self, other), self_length) else: #need to find which is larger, right-shift, and combine diff = self_length - other_length if diff < 0: #re-bind self and other so self is now longest #tuple unpacking swaps self and other self_length, other_length = other_length, self_length self, other = other, self #shift the shorter vector and combine with bitwise function return Bitvector(func(self >> abs(diff), other), other_length) def __getitem__(self, item): """Returns self[item] as 1 or 0, i.e. as integers and not strings. 
key should be an index less than the length of the bitvector; negative indexes are handled in the usual manner. Algorithm: uses bit masks to figure out whether the specific position is occupied. Caches masks in Bitvector._masks, so should be very fast when only a few different lengths of bitvector are used in a program. Will be somewhat inefficient if many large bitvectors of different sizes are each used occasionally. """ #check that there actually are items in the vector length = self._length if not length: raise IndexError, "Can't find items in empty vector!" if isinstance(item, slice): return Bitvector(''.join(map(str,[self[i] \ for i in range(*item.indices(length))]))) else: #transform keys to allow negative index if item < 0: item += length #check that key is in bounds if (item < 0) or (item >= length): raise IndexError, "Index %s out of range." % (item,) move_to = length - item - 1 if move_to >= _bits_in_int: result = self & (1L << move_to) else: result = self & (1 << move_to) if result: return 1 else: return 0 def bitcount(self, value=1): """Counts the bits in self that match value.""" return bitcount(self, self._length, value) def __repr__(self): """Produces standard object representation instead of int.""" c = self.__class__ return "<%s.%s object at %s>" % (c.__module__,c.__name__,hex(id(self))) def freeze(self): """Returns ImmutableBitvector containing data from self.""" return self def thaw(self): """Returns MutableBitvector containing data from self.""" return MutableBitvector(self) def stateChanges(self): """Returns list of indices where state changes from 0->1 or 1->0.""" #bail out if no elements rather than raising IndexError length = len(self) if not length: return [] bits = list(self) changes = [i for i in range(1, length) if self[i] != self[i-1]] return [0] + changes + [length] #always implicitly add start and end def divideSequence(self, seq, state=None): """Divides sequence into a list of subsequences, cut using stateChanges. 
The list will contain the slices for the indices containing the given state. If no state given (state = None) the list will consist of all of the subsequences, cut at the indices. The list and self[0] are returned as a tuple. Truncates whichever sequence is shorter. """ #bail out rather than raising IndexError if vector or sequence is empty if not (len(self) and seq): return ([], 0) cut_list = self.stateChanges() cut_seq = [] first = 0 cut_index = len(cut_list)-1 #If user supplied a specific state, return only pieces in that state if state is not None: #test whether we want to include the first segment or not exclude_start = self[0] != state for i in range(cut_index): if i % 2 == exclude_start: cut_seq.append(seq[cut_list[i]:cut_list[i+1]]) else: #No state supplied: return pieces in all states for i in range(cut_index): cut_seq.append(seq[cut_list[i]:cut_list[i+1]]) first = self[0] return filter(None, cut_seq), first class ShortBitvector(ImmutableBitvector, int): """Short bitvector, stored as an int.""" __slots__ = ['_length'] def __new__(cls, items='', length='ignored'): """Creates new bitvector from sequence of items.""" if isinstance(items, str): if items: #guard against empty string return int.__new__(cls, items, 2) else: return int.__new__(cls, 0) else: return int.__new__(cls, items) def __or__(self, other): """Returns position-wise OR (true if either true).""" if isinstance(other, long): return LongBitvector(self).__or__(other) else: return self.op(other, int.__or__) def __and__(self, other): """Returns position-wise AND (true if both true).""" if isinstance(other, long): return LongBitvector(self).__and__(other) else: return self.op(other, int.__and__) def __xor__(self, other): """Returns position-wise XOR of self and other (true if states differ). 
""" if isinstance(other, long): return LongBitvector(self).__xor__(other) else: return self.op(other, int.__xor__) def __invert__(self): """Returns complement (replace 1 with 0 and vice versa).""" length = self._length if length >= _bits_in_int: mask = (1L << length) - 1 else: mask = (1 << length) - 1 return Bitvector(mask & ~int(self), length) class LongBitvector(ImmutableBitvector, long): """Long bitvector, stored as a long.""" def __new__(cls, items='', length='ignored'): """Creates new bitvector from sequence of items.""" if isinstance(items, str): if items: #guard against empty string return long.__new__(cls, items, 2) else: return long.__new__(cls, 0) else: return long.__new__(cls, items) def __or__(self, other): """Returns position-wise OR (true if either true).""" return self.op(other, long.__or__) def __and__(self, other): """Returns position-wise AND (true if both true).""" return self.op(other, long.__and__) def __xor__(self, other): """Returns position-wise XOR of self and other (true if states differ). """ return self.op(other, long.__xor__) def __invert__(self): """Returns complement (replace 1 with 0 and vice versa).""" length = self._length return Bitvector(((1L << length) - 1) & ~long(self), length) def Bitvector(items='', length=None, constructor=None): """Factory function returning short or long Bitvector depending on length. Note: uses explict test rather than try/except to fix memory leak. Now is the only way to convert an arbitrary sequence of true/false values into a bitvector. """ #convert whatever was passed as 'items' into a number if isinstance(items, ImmutableBitvector): num = items if length is None: length = len(items) elif isinstance(items, int) or isinstance(items, long): num = items if length is None: raise TypeError, "Must specify length if initializing with number." 
else: bitstring = seq_to_bitstring(items) if bitstring: num = long(bitstring, 2) if length is None: length = len(items) else: num = 0 length = 0 #if the constructor was not passed explicitly, guess it if not constructor: if 0 <= num <= maxint: constructor = ShortBitvector else: constructor = LongBitvector return constructor(num, length) # Original version with memory leak under Python 2.3 # try: # return ShortBitvector(items, length) # except (OverflowError, TypeError): # return LongBitvector(items, length) class MutableBitvector(Delegator): """Array of bits (0 or 1) supporting set operations. Supports __setitem__. """ def __init__(self, items='', length=None): """Initializes the Bitvector class, storing data in long _vector. Items can be any sequence with members that evaluate to True or False. Private data: _vector: long representing the bitvector _string: string representation of the vector _is_current: boolean indicating whether the string is known to match the long data """ Delegator.__init__(self, None) self.replace(items, length) def replace(self, items='', length=None): """Replaces the contents of self with data in items. Items should be a sequence with elements that evaluate to True or False. If items is another BitVector, uses an efficient method based on the internal data. Otherwise, will loop through items evaluating each to True or False. Primarily used internally, but exposed publicly because it's often convenient to replace the contents of an existing bitvector. Usage: vec.replace(foo) where foo is a sequence or bitvector. """ vec = Bitvector(items, length) self._handler = vec #make sure attributes go to the new vec self._is_current = False self._string = '' def __str__(self): """Prints the bits in the vector as a string of '1' and '0'.""" #check that the string is up to date; when it is, return it.
if not self._is_current: self._string = str(self._handler) self._is_current = True return self._string def __len__(self): """Need to pass directly to handler.""" return len(self._handler) def __getitem__(self, *args): """Gets item from vector: needs to be passed through explicitly.""" return self._handler.__getitem__(*args) def __setitem__(self, item, value): """Sets self[item] to value: value must be 0 or 1, or '0' or '1'. Uses masks to alter the bit at a specific position. To set a bit to zero, make a string that is 1 everywhere except at that position and then use bitwise &. To set a bit to 1, make a string that is 0 everywhere except at that position and then use bitwise |. Warning: this is quite a slow operation, especially on long vectors. """ length = self._length #handle slice assignment. Warning: does not allow change in length! if isinstance(item, slice): for i, val in zip(range(*item.indices(length)), value): self[i] = is_nonzero_int(val) return #otherwise, check that the key is in range vec = self._handler curr_val = vec[item] val = is_nonzero_int(value) if val != curr_val: #need only do anything if value changed! #find index of negative values if item < 0: item += length #check that key is in bounds if (item < 0) or (item >= length): raise IndexError, "Index %s out of range." % (item,) #figure out offset move_to = length - item - 1 if value == 0: #make mask of '1', and combine with & result = vec &((1L << length) - 1) ^ (1L << move_to) elif value == 1: #make mask of '0', and combine with | result = vec | (1L << move_to) else: #complain if we got anything else raise ValueError, "Item not 0 or 1: " + str(value) self.replace(result) def __cmp__(self, other): """Comparison needs to be passed through explicitly.""" return cmp(self._handler, other) def op(self, other, func): """Returns position-wise op for self and other. self[i] == 1 if op(self[i], other[i]) == 1 Internal method: used to implement bitwise operations on pairs of bitvectors. 
Truncates the longer vector to fit the shorter one by right-shifting, i.e. the leftmost bits of the longer vector will be combined with all the bits of the shorter vector. Handles the case where both the operands are bitvectors specially for speed, to greatest advantage when they are equal length. Uses bool() to convert elements of sequences, in preference to checking whether they directly represent 0 or 1. This makes the operations more flexible, but beware of bad data! """ return self.__class__(self._handler.op(other, func)) def __or__(self, other): """Returns position-wise OR (true if either true). """ return self.op(other, or_) def __and__(self, other): """Returns position-wise AND (true if both true). """ return self.op(other, and_) def __xor__(self, other): """Returns position-wise XOR of self and other (true if states differ). """ return self.op(other, xor) def __invert__(self): """Returns complement (replace 1 with 0 and vice versa). """ return self.__class__(~self._handler) def thaw(self): """Returns mutable version of self.""" return self def freeze(self): """Returns immutable version of self.""" return self._handler def VectorFromCases(seq, constructor=LongBitvector): """Returns vector with 0(1) for lower(upper)case positions in sequence. Primarily used for identifying binding sites denoted by uppercase bases. """ return constructor(''.join(map(str, map(int, [i.isupper() for i in seq])))) def VectorFromMatches(seq, pattern, overlapping=1, constructor=LongBitvector): """Replaces self with 1(0) for each position in sequence (not) matching. Usage: vec.toMatches('agagac', 'aga', 1) == vec('111110') vec.toMatches('agagac', 'aga', 0) == vec('111000') The first argument must be a type that is searchable by a re object, typically a string. The second argument must be a pattern to search: this can be a string, a list of strings, or an re object. The third, optional argument specifies whether matches may overlap or not. 
If a list is passed, its elements will be converted into strings, and then into a regular expression using alternation. Lists of regular expressions are not supported directly. Zero-length patterns will typically never be counted as matching, rather than being counted as matching everywhere. """ #allocate space for the results seqlength = len(seq) result = [0] * seqlength if seqlength: #will return empty string if seq empty if isinstance(pattern, str): #handle strings by using the index string method patlength = len(pattern) #will skip zero-length patterns if patlength: #until we reach the end of the sequence, find the next #index that contains a match. When a match has been #processed by inserting '1' into the list at each position #that matches, either move to the next position (if matches #can overlap) or to the position after the end of the match #(if matches cannot overlap). start = 0 while start + patlength - 1 <= seqlength: try: match = seq.index(pattern, start) except ValueError: break #no more matches #insert '1' into the list in the appropriate slice result[match:match + patlength] = [1] * patlength #move to the first index where the next match could be if overlapping: start = match + 1 else: start = match + patlength else: #if not a string, assume regex #first, attempt to alternate the items in pattern as though it #were a sequence type try: pattern = re.compile('|'.join(map(str, pattern))) except: pass #use same strategy as above to locate matches of pattern, find #the lengths, insert '1' into the list at the appropriate #places, and move on to the next position where a match might #start. start = 0 while start < seqlength: match = pattern.search(seq, start) try: #find the length of the match, and the start and end #indexes. Use these to define the positions where '1' #will appear in the result. 
                    match_start, match_end = match.span()
                    match_length = match_end - match_start
                    result[match_start:match_end] = [1] * match_length
                    #figure out how many positions to advance for the next
                    #possible match start
                    if overlapping or (match_length == 0):
                        start = match_start + 1
                    else:
                        start = match_end
                except (AttributeError, TypeError):
                    #match is None if no more in string
                    break
    else:   #sequence was zero-length
        result = []
    return constructor(''.join(map(str, result)))

def VectorFromRuns(seq, length, constructor=LongBitvector):
    """Returns vector with 1 at positions specified by sequence of (start,len).

    seq should be a sequence of 2-item sequences, where the first element is
    the index where the run of 1's starts and the second element is the number
    of 1's in the run.

    length should be an int giving the total length of the sequence.
    """
    if not length:
        return constructor('')
    bits = ['0']*length
    for start, run in seq:
        if start + run > length:
            raise IndexError, "start %s + run %s exceeds seq length %s" % \
                (start, run, length)
        bits[start:start+run] = ['1']*run
    return constructor(''.join(bits))

def VectorFromSpans(seq, length, constructor=LongBitvector):
    """Returns vector with 1 at positions specified by sequence of (start,end).

    seq should be a sequence of 2-item sequences, where the first element is
    the index where the run of 1's starts and the second element is the index
    where the run stops (standard Python slice notation, i.e. the start and
    stop you would normally use in a slice, where the slice does not include
    the element at i[stop]).

    length should be an int giving the total length of the sequence.
    """
    if not length:
        return constructor('')
    bits = ['0']*length
    for start, stop in seq:
        if stop > length:
            raise IndexError, "stop %s exceeds seq length %s" % (stop, length)
        run = stop - start
        bits[start:stop] = ['1']*run
    return constructor(''.join(bits))

def VectorFromPositions(seq, length, constructor=LongBitvector):
    """Returns vector with 1 at positions specified by sequence of positions.
    seq should be a sequence of ints (position).

    length should be an int giving the total length of the sequence.
    """
    if not length:
        return constructor('')
    bits = ['0']*length
    for i in seq:
        bits[i] = '1'
    return constructor(''.join(bits))

class PackedBases(LongBitvector):
    """Stores unambiguous nucleotide sequences as bitvectors, 4 per byte.

    Rna controls whether __str__ produces RNA or DNA.

    Each base is stored as 2 bits. The first bit encodes purine/pyrimidine,
    while the second encodes weak/strong.

    NOTE: len() returns the length in _bits_, not the length in bases.
    """
    #translation table to turn bases into sequences. First bit encodes purine/
    #pyrimidine; second bit encodes weak/strong.
    _sequence_to_bits = maketrans('AGUTCagutc', '0122301223')
    _rna_bases = {0:'A', 1:'G', 2:'U', 3:'C'}
    _dna_bases = {0:'A', 1:'G', 2:'T', 3:'C'}

    def __new__(cls, sequence='', Rna='ignored'):
        """Packs bases in sequence into a 2-bit per base bitvector.

        Usage: vec.toBases('AcgcaAc') == vec('00110111000011')

        Encoding is to use the first bit as purine vs pyrimidine (purine = 0),
        and to use the second bit as weak vs. strong (weak = 0). This means
        that A == 00, G == 01, U == 10, and C == 11.
""" seq = sequence.translate(cls._sequence_to_bits) if not seq: return long.__new__(cls, 0L) else: return long.__new__(cls, long(seq, 4)) #note base 4 conversion def __init__(self, sequence='', Rna=True): """Returns new PackedBases object.""" self._length = len(sequence)*2 self.Rna = Rna def __str__(self): """Unpacks bases from sequence using encoding scheme in toBases.""" length = self._length/2 bits = [0] * length #allocate space using '00' pattern curr = self #successively find the last two bit of curr, replacing the #appropriate element of bits as each bit is processed and #removed while curr and length: bits[length-1] = curr & 3 curr >>= 2 length -= 1 #convert numbers back into string if self.Rna: num2base = self._rna_bases else: num2base = self._dna_bases return ''.join([num2base[i] for i in bits]) PyCogent-1.5.3/cogent/core/entity.py000644 000765 000024 00000127504 12024702176 020347 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides the entities, the building blocks of the SMRCA hierachy representation of a macromolecular structure. The MultiEntity class is a special Entity class to hold multiple instances of other entities. All Entities apart from the Atom can hold others and inherit from the MultiEntity. The Entity is the most basic class to deal with structural and molecular data. Do not use it directly since some functions depend on methods provided by sub-classes. Classes inheriting from MultiEntity have to provide some attributes during init e.g: self.level = a valid string inside the SMCRA hierarchy). Holders of entities are like normal MultiEntities, but are temporary and are outside the parent-children axes. 
""" import cogent from cogent.core.annotation import SimpleVariable from numpy import (sqrt, arctan2, power, array, mean, sum) from cogent.data.protein_properties import AA_NAMES, AA_ATOM_BACKBONE_ORDER, \ AA_ATOM_REMOTE_ORDER, AREAIMOL_VDW_RADII, \ DEFAULT_AREAIMOL_VDW_RADIUS, AA_NAMES_3to1 from cogent.data.ligand_properties import HOH_NAMES, LIGAND_AREAIMOL_VDW_RADII from operator import itemgetter, gt, ge, lt, le, eq, ne, or_, and_, contains, \ is_, is_not from collections import defaultdict from itertools import izip from copy import copy, deepcopy __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_ ' HIERARCHY = ['H', 'S', 'M', 'C', 'R', 'A'] AREAIMOL_VDW_RADII.update(LIGAND_AREAIMOL_VDW_RADII) # error while creating a structure (non-recoverable error) class ConstructionError(Exception): """Cannot unambiguously create a structure.""" pass # warning while creating a structure # (something wrong with the input, but recoverable) class ConstructionWarning(Exception): """Input violates some construction rules (contiguity).""" pass def sort_id_list(id_list, sort_tuple): """Sorts lists of id tuples. 
    The order is defined by the PDB file specification."""
    (hol_loc, str_loc, mod_loc, chn_loc, res_loc, at_loc) = sort_tuple
    # even a simple id is a tuple, this makes sorting general
    def space_last(ch_id1, ch_id2):
        # this is for chain sorting
        if ch_id1 == ' ' and ch_id2 != ' ':
            return 1
        if ch_id2 == ' ' and ch_id1 != ' ':
            return - 1
        if ch_id1 == ' ' and ch_id2 == ' ':
            return 0
        return cmp(ch_id1, ch_id2)

    def atom(at_id1, at_id2):
        # hydrogen atoms come last
        is_hydrogen1 = (at_id1[0] == 'H')
        is_hydrogen2 = (at_id2[0] == 'H')
        diff = cmp(is_hydrogen1, is_hydrogen2)
        # backbone comes first
        if not diff:
            order1 = AA_ATOM_BACKBONE_ORDER.get(at_id1)
            order2 = AA_ATOM_BACKBONE_ORDER.get(at_id2)
            diff = cmp(order2, order1)
        # (B)eta, (D)elta, (G)amma, .... o(X)t
        if not diff:
            remote1 = AA_ATOM_REMOTE_ORDER.get(at_id1[1:2])
            remote2 = AA_ATOM_REMOTE_ORDER.get(at_id2[1:2])
            diff = cmp(remote1, remote2)
        # branching comes last
        if not diff:
            diff = cmp(at_id1[2:4], at_id2[2:4])
        # SE vs CE - selenium first
        if not diff:
            alpha1 = ALPHABET.index(at_id1[0:1])
            alpha2 = ALPHABET.index(at_id2[0:1])
            diff = cmp(alpha2, alpha1)
        return diff

    def residue(res_id1, res_id2):
        r1, r2 = 1, 1
        if res_id1 in AA_NAMES: r1 = 2
        if res_id1 in HOH_NAMES: r1 = 0
        if res_id2 in AA_NAMES: r2 = 2
        if res_id2 in HOH_NAMES: r2 = 0
        if r1 is r2:
            return cmp(res_id1, res_id2)
        else:
            return cmp(r2, r1)

    # this assumes that the implementation of sorting is stable.
    # does it work for implementations other than CPython?
    if res_loc is not None:
        id_list.sort(key=itemgetter(res_loc), cmp=lambda x, y: residue(x[0], y[0])) # by res_name
    if at_loc is not None:
        id_list.sort(key=itemgetter(at_loc), cmp=lambda x, y: space_last(x[1], y[1])) # by alt_loc
    if at_loc is not None:
        id_list.sort(key=itemgetter(at_loc), cmp=lambda x, y: atom(x[0], y[0])) # by at_id
    if res_loc is not None:
        id_list.sort(key=itemgetter(res_loc), cmp=lambda x, y: cmp(x[2], y[2])) # by res_ic
    if res_loc is not None:
        id_list.sort(key=itemgetter(res_loc), cmp=lambda x, y: cmp(x[1], y[1])) # by res_id
    if chn_loc is not None:
        id_list.sort(key=itemgetter(chn_loc), cmp=space_last) # by chain
    if mod_loc is not None:
        id_list.sort(key=itemgetter(mod_loc)) # by model
    if str_loc is not None:
        id_list.sort(key=itemgetter(str_loc)) # by structure
    return id_list

def merge(dicts):
    """Merges multiple dictionaries into a new one."""
    master_dict = {}
    for dict_ in dicts:
        master_dict.update(dict_)
    return master_dict

def unique(lists):
    """Merges multiple iterables into a unique sorted tuple (sorted set)."""
    master_set = set()
    for set_ in lists:
        master_set.update(set_)
    return tuple(sorted(master_set))

class Entity(dict):
    """Container object that all entities inherit from. Inherits from dict."""
    def __init__(self, id, name=None, *args):
        # This class has to be sub-classed!
        # The masked attribute has to be set before the __init__ of an Entity,
        # because during __setstate__/__getstate__ sub-entities are iterated
        # by .values(), which relies on the masked attribute to decide which
        # children should be omitted.
        self.masked = False
        self.parent = None # mandatory parent attribute
        self.modified = True # modified on creation
        self.id = (id,) # ids are non-zero length tuples
        self.name = (name or id) # prefer name over duplicate id
        self.xtra = {} # mandatory xtra dict attribute
        # Dictionary that keeps additional properties
        dict.__init__(self, *args) # finish init as dictionary

    def __copy__(self):
        return deepcopy(self)

    def __deepcopy__(self, memo):
        new_state = self.__getstate__()
        new_instance = self.__new__(type(self))
        new_instance.__setstate__(new_state)
        return new_instance

    def __getstate__(self):
        new_state = copy(self.__dict__) # shallow
        new_state['parent'] = None
        return new_state

    def __setstate__(self, new_state):
        self.__dict__.update(new_state)

    def __repr__(self):
        """Default representation."""
        # mandatory getLevel from sub-class
        return "<Entity id=%s, level=%s>" % (self.getId(), self.getLevel())

    def __sub__(self, entity):
        """Override "-" as Euclidean distance between coordinates."""
        return sqrt(sum(pow(self.coords - entity.coords, 2)))

    def _setId(self, id):
        self.name = id[0]

    def _getId(self):
        return (self.name,)

    def getId(self):
        """Return the id."""
        return self._getId()

    def getFull_id(self):
        """Return the full id."""
        parent = self.getParent()
        if parent:
            full_id = parent.getFull_id()
        else:
            full_id = () # we create a tuple on the top
        full_id = full_id + self.getId() # merge tuples from the left
        return full_id

    def setId(self, id_=None):
        """Set the id.
Calls the ``_setId`` method.""" if (id_ and id_ != self.id) or (not id_ and (self.getId() != self.id)): self.id = (id_ or self.getId()) self.setModified(True, True) self._setId(self.id) if self.parent: self.parent.updateIds() def _setMasked(self, masked, force=False): if masked != self.masked or force: self.masked = masked # mask or unmask self.setModified(True, False) # set parents as modified def setMasked(self, *args, **kwargs): """Set masked flag (``masked``) ``True``.""" self._setMasked(True, *args, **kwargs) def setUnmasked(self, *args, **kwargs): """Set masked flag (``masked``) ``False``.""" self._setMasked(False, *args, **kwargs) def setModified(self, up=True, down=False): """Set modified flag (``modified``) ``True``.""" self.modified = True if up and self.parent: self.parent.setModified(True, False) def setUnmodified(self, up=False, down=False): """Set modified flag (``modified``) ``False``.""" self.modified = False if up and self.parent: self.parent.setUnmodified(True, False) def setParent(self, entity): """Set the parent ``Entity`` and adds oneself as the child.""" if self.parent != entity: # delete old parent self.delParent() # add new parent self.parent = entity self.parent.addChild(self) self.setModified(False, True) def delParent(self): """Detach mutually from the parent. 
        Sets both child and parent modified flags (``modified``) as ``True``."""
        if self.parent:
            self.parent.pop(self.getId())
            self.parent.setModified(True, False)
            self.parent = None
        self.setModified(False, True)

    def getModified(self):
        """Return value of the modified flag (``modified``)."""
        return self.modified

    def getMasked(self):
        """Return value of the masked flag (``masked``)."""
        return self.masked

    def setLevel(self, level):
        """Set level (``level``)."""
        self.level = level

    def getLevel(self):
        """Return level (``level``) in the hierarchy."""
        return self.level

    def setName(self, name):
        """Set name."""
        self.name = name
        self.setId()

    def getName(self):
        """Return name."""
        return self.name

    def getParent(self, level=None):
        """Return the parent ``Entity`` instance."""
        if not level:
            return self.parent
        elif level == self.level:
            return self
        return self.parent.getParent(level)

    def move(self, origin):
        """Subtract the origin coordinates from the coordinates (``coords``)."""
        self.coords = self.coords - origin

    def setCoords(self, coords):
        """Set the entity coordinates. Coordinates should be a ``numpy.array``."""
        self.coords = coords

    def getCoords(self):
        """Get the entity coordinates."""
        return self.coords

    def getScoords(self):
        """Return spherical (r, theta, phi) coordinates."""
        x, y, z = self.coords
        x2, y2, z2 = power(self.coords, 2)
        scoords = array([sqrt(x2 + y2 + z2), \
                         arctan2(sqrt(x2 + y2), z), \
                         arctan2(y, x)])
        return scoords

    def getCcoords(self):
        """Return redundant, polar, clustering-coordinates on the unit-sphere.
        This is only useful for clustering."""
        x, y, z = self.coords
        x2, y2, z2 = power(self.coords, 2)
        ccoords = array([arctan2(sqrt(y2 + z2), x), \
                         arctan2(sqrt(x2 + z2), y), \
                         arctan2(sqrt(x2 + y2), z)])
        return ccoords

    def setScoords(self):
        """Set ``entity.scoords``, see: getScoords."""
        self.scoords = self.getScoords()

    def setCcoords(self):
        """Set ``entity.ccoords``, see: getCcoords."""
        self.ccoords = self.getCcoords()

class MultiEntity(Entity):
    """The ``MultiEntity`` contains other ``Entity`` or ``MultiEntity`` instances."""
    def __init__(self, long_id, short_id=None, *args):
        self.index = HIERARCHY.index(self.level) # index corresponding to the hierarchy level
        self.table = dict([(level, {}) for level in HIERARCHY[self.index + 1:]]) # empty table
        Entity.__init__(self, long_id, short_id, *args)

    def __repr__(self):
        id_ = self.getId()
        return "<MultiEntity id=%s, holding=%s>" % (id_, len(self))

    def _link(self):
        """Recursively adds a parent pointer to children."""
        for child in self.itervalues(unmask=True):
            child.parent = self
            try:
                child._link()
            except AttributeError:
                pass

    def _unlink(self):
        """Recursively deletes the parent pointer from children."""
        for child in self.itervalues(unmask=True):
            child.parent = None
            try:
                child._unlink()
            except AttributeError:
                pass

    def __getstate__(self):
        new_dict = copy(self.__dict__) # shallow copy
        new_dict['parent'] = None # remove recursion
        new_children = []
        for child in self.itervalues(unmask=True):
            new_child_instance = deepcopy(child)
            new_children.append(new_child_instance)
        return (new_children, new_dict)

    def __setstate__(self, new_state):
        new_children, new_dict = new_state
        self.__dict__.update(new_dict)
        for child in new_children:
            self.addChild(child)

    def __copy__(self):
        return deepcopy(self)

    def __deepcopy__(self, memo):
        new_state = self.__getstate__()
        new_instance = self.__new__(type(self))
        new_instance.__setstate__(new_state)
        return new_instance

    def __iter__(self):
        return self.itervalues()

    def setSort_tuple(self, sort_tuple=None):
        """Set the ``sort_tuple`` attribute.
        The ``sort_tuple`` is a tuple needed by the ``sort_id_list`` function
        to correctly sort items within entities."""
        if sort_tuple:
            self.sort_tuple = sort_tuple
        else:
            # build the sort tuple
            sort_tuple = [None, None, None, None, None, None]
            key_length = len(self.keys()[0])
            stop_i = self.index + 2 # next level, open right [)
            start_i = stop_i - key_length # before all Nones
            indexes = range(start_i, stop_i) # Nones to change
            for value, index in enumerate(indexes):
                sort_tuple[index] = value
            self.sort_tuple = sort_tuple

    def getSort_tuple(self):
        """Return the ``sort_tuple`` attribute.

        If not set calls the ``setSort_tuple`` method first.
        See: ``setSort_tuple``."""
        if not hasattr(self, 'sort_tuple'):
            self.setSort_tuple()
        return self.sort_tuple

    def itervalues(self, unmask=False):
        return (v for v in super(MultiEntity, self).itervalues() if not v.masked or unmask)

    def iteritems(self, unmask=False):
        return ((k, v) for k, v in super(MultiEntity, self).iteritems() if not v.masked or unmask)

    def iterkeys(self, unmask=False):
        return (k for k, v in super(MultiEntity, self).iteritems() if not v.masked or unmask)

    def values(self, *args, **kwargs):
        return list(self.itervalues(*args, **kwargs))

    def items(self, *args, **kwargs):
        return list(self.iteritems(*args, **kwargs))

    def keys(self, *args, **kwargs):
        return list(self.iterkeys(*args, **kwargs))

    def __contains__(self, key, *args, **kwargs):
        return key in self.keys(*args, **kwargs)

    def sortedkeys(self, *args, **kwargs):
        list_ = sort_id_list(self.keys(*args, **kwargs), self.getSort_tuple())
        return list_

    def sortedvalues(self, *args, **kwargs):
        values = [self[i] for i in self.sortedkeys(*args, **kwargs)]
        return values

    def sorteditems(self, *args, **kwargs):
        items = [(i, self[i]) for i in self.sortedkeys()]
        return items

    def _setMasked(self, masked, force=False):
        """Set the masked flag (``masked``) recursively.
        If forced proceed even if the flag is already set correctly."""
        if masked != self.masked or force:
            # the second condition is when an entity has all children masked
            # but is not masked itself
            if masked:
                # we have to mask children
                for child in self.itervalues(): # only unmasked children
                    child.setMasked()
                    child.setModified(False, False)
            else:
                # we have to unmask children
                for child in self.itervalues(unmask=True):
                    if child.masked or force: # only masked children
                        child.setUnmasked(force=force)
                        child.setModified(False, False)
            self.masked = masked
            self.setModified(True, False) # set parents as modified

    def setModified(self, up=True, down=True):
        """Set the modified flag (``modified``) ``True``.

        If down proceeds recursively for all children.
        If up proceeds recursively for all parents."""
        self.modified = True
        if up and self.parent:
            self.parent.setModified(True, False)
        if down:
            for child in self.itervalues(unmask=True):
                child.setModified(False, True)

    def setUnmodified(self, up=False, down=False):
        """Set the modified (``modified``) flag ``False``.

        If down proceeds recursively for all children.
If up proceeds recursively for all parents.""" self.modified = False if up and self.parent: self.parent.setUnmodified(True, False) if down: for child in self.itervalues(unmask=True): child.setUnmodified(False, True) def _initChild(self, child): """Initialize a child (during construction).""" child.parent = self self[child.getId()] = child def addChild(self, child): """Add a child.""" child.setParent(self) child_id = child.getId() self[child_id] = child self.setModified(True, False) def delChild(self, child_id): """Remove a child.""" child = self.get(child_id) if child: child.delParent() self.setModified(True, False) def getChildren(self, ids=None, **kwargs): """Return a copy of the list of children.""" if ids: children = [] for (id_, child) in self.iteritems(**kwargs): if id_ in ids: children.append(child) else: children = self.values(**kwargs) return children def _setTable(self, entity): """Recursive helper method for ``entity.setTable``.""" for e in entity.itervalues(): self.table[e.getLevel()].update({e.getFull_id():e}) self._setTable(e) def setTable(self, force=True, unmodify=True): """Populate the children table (``table``) recursively with all children grouped into hierarchy levels. If forced is ``True`` the table will be updated even if the ``Entity`` instance is not modified. If unmodify is ``True`` the ``Entity`` modified flag (``modified``) will be set ``False`` afterwards.""" if self.modified or force: # a table is accurate as long as the contents of a dictionary do not # change. self.delTable() self._setTable(self) if unmodify: self.setUnmodified() def delTable(self): """Delete all children from the children-table (``table``). 
This does not modify the hierarchy.""" self.table = dict([(level, {}) for level in HIERARCHY[self.index + 1:]]) self.modified = True def getTable(self, level): """Return children of given level from the children-table (``table``).""" return self.table[level] def updateIds(self): """Update self with children ids.""" ids = [] for (id_, child) in self.iteritems(): new_id = child.getId() if id_ != new_id: ids.append((id_, new_id)) for (old_id, new_id) in ids: child = self.pop(old_id) self.update(((new_id, child),)) def getData(self, attr, xtra=False, method=False, forgiving=True, sorted=False): """Get data from children attributes, methods and xtra dicts as a list. If is ``True`` forgiving remove ``None`` values from the output. ``Nones`` are place-holders if a child does not have the requested data. If xtra is True the xtra dictionary (``xtra``) will be searched, if method is ``True`` the child attribute will be called.""" values = self.sortedvalues() if sorted else self.values() if xtra: # looking inside the xtra of children data = [child.xtra.get(attr) for child in values] # could get None else: # looking at attributes data = [] for child in values: try: if not method: data.append(getattr(child, attr)) else: data.append(getattr(child, attr)()) except AttributeError: # data.append(None) if forgiving: # remove Nones data = [point for point in data if point is not None] return data def propagateData(self, function, level, attr, **kwargs): """Propagate data from child level to this ``Entity`` instance. The function defines how children data should be transformed to become the parents data e.g. 
summed.""" if self.index <= HIERARCHY.index(level) - 2: for child in self.itervalues(): child.propagateData(function, level, attr, **kwargs) datas = self.getData(attr, **kwargs) if isinstance(function, basestring): function = eval(function) transformed_datas = function(datas) if kwargs.get('xtra'): self.xtra[attr] = transformed_datas else: setattr(self, attr, transformed_datas) return transformed_datas def countChildren(self, *args, **kwargs): """Count children based on ``getData``. Additional arguments and keyworded arguments are passed to the ``getData`` method.""" data = self.getData(*args, **kwargs) children = defaultdict(int) # by default returns 0 for d in data: children[d] += 1 return children def freqChildren(self, *args, **kwargs): """Frequency of children based on ``countChildren``. Additional arguments and keyworded arguments are passed to the ``countChildren`` method.""" children_count = self.countChildren(*args, **kwargs) lenght = float(len(self)) # it could be len(children_count)? for (key_, value_) in children_count.iteritems(): children_count[key_] = value_ / lenght return children_count def splitChildren(self, *args, **kwargs): """Splits children into groups children based on ``getData``. Additional arguments and keyworded arguments are passed to the ``getData`` method.""" kwargs['forgiving'] = False data = self.getData(*args, **kwargs) clusters = defaultdict(dict) # by default returns {} for (key, (id_, child)) in izip(data, self.iteritems()): clusters[key].update({id_:child}) return clusters def selectChildren(self, value, operator, *args, **kwargs): """Generic method to select children, based on ``getData``. Returns a dictionary of children indexed by ids. Compares the data item for each child using the operator name e.g. "eq" and a value e.g. "H_HOH". 
Additional arguments and keyworded arguments are passed to the ``getData`` method.""" kwargs['forgiving'] = False data = self.getData(*args, **kwargs) children = {} for (got, (id_, child)) in izip(data, self.iteritems()): if isinstance(operator, basestring): operator = eval(operator) if operator(value, got): children.update({id_:child}) return children def ornamentChildren(self, *args, **kwargs): """Return a list of (ornament, (id, child)) tuples, based on ``getData``. Useful for sorting see: Schwartzian transform. Forgiving is set False. Additional arguments and keyworded arguments are passed to the ``getData`` method.""" kwargs['forgiving'] = False data = self.getData(*args, **kwargs) children = [] for (got, (id_, child)) in izip(data, self.iteritems()): children.append((got, (id_, child))) return children def ornamentdictChildren(self, *args, **kwargs): """Return a dictionary of ornaments indexed by child ids, based on ``getData``. Forgiving is set False. Additional arguments and keyworded arguments are passed to the ``getData`` method.""" kwargs['forgiving'] = False data = self.getData(*args, **kwargs) propertydict = {} for (got, id_) in izip(data, self.iterkeys()): propertydict.update(((id_, got),)) return propertydict def stripChildren(self, *args, **kwargs): """Strips children based on selection criteria. See: ``selectChildren``. Additional arguments and keyworded arguments are passed to the ``selectChildren`` method.""" children_ids = self.selectChildren(*args, **kwargs).keys() for id_ in children_ids: self.delChild(id_) def maskChildren(self, *args, **kwargs): """Mask children based on selection criteria. See: ``selectChildren``. Additional arguments and keyworded arguments are passed to the ``selectChildren`` method.""" children = self.selectChildren(*args, **kwargs).itervalues() for child in children: child.setMasked() # child.setModified child.parent.setModified def unmaskChildren(self, *args, **kwargs): """Unmask children based on selection criteria. 
See: ``selectChildren``. Additional arguments and keyworded arguments are passed to the ``selectChildren`` method.""" children = self.selectChildren(*args, **kwargs).itervalues() for child in children: child.setUnmasked() # child.setModified child.parent.setModified def moveRecursively(self, origin): """Move ``Entity`` instance recursively to the origin.""" for child in self.itervalues(): try: child.moveRecursively(origin) except: # Atoms do not have this child.move(origin) pass self.setCoords() def setCoordsRecursively(self): """Set coordinates (``coords``) recursively. Useful if any child had its coordinates changed.""" for child in self.itervalues(): try: child.setCoordsRecursively() except: #Atoms do not have this pass self.setCoords() def setCoords(self, *args, **kwargs): """Set coordinates (``coords``) as a centroid of children coordinates. A subset of children can be selected for the calculation. See: ``Entity.selectChildren``. Additional arguments and keyworded arguments are passed to the ``getData`` method.""" # select only some children if args or kwargs: children = self.selectChildren(*args, **kwargs).values() else: children = self coords = [] for child in children: coords.append(child.getCoords()) self.coords = mean(coords, axis=0) def getCoords(self): """Returns the current coordinates (``coords``). Raises an ``AttributeError`` if not set.""" try: return self.coords except AttributeError: raise AttributeError, "Entity has coordinates not set." 
    def dispatch(self, method, *args, **kwargs):
        """Calls a method of all children with the given arguments and
        keyworded arguments."""
        for child in self.itervalues():
            getattr(child, method)(*args, **kwargs)


class Structure(MultiEntity):
    """The ``Structure`` instance contains ``Model`` instances."""

    def __init__(self, id, *args, **kwargs):
        self.level = 'S'
        MultiEntity.__init__(self, id, *args, **kwargs)

    def __repr__(self):
        return '<Structure id=%s>' % self.getId()

    def removeAltmodels(self):
        """Remove all models with an id != 0."""
        self.stripChildren((0,), 'ne', 'id', forgiving=False)

    def getDict(self):
        """See: ``Entity.getDict``."""
        return {'structure': self.getId()[0]}


class Model(MultiEntity):
    """The ``Model`` instance contains ``Chain`` instances."""

    def __init__(self, id, *args, **kwargs):
        self.level = 'M'
        MultiEntity.__init__(self, id, *args, **kwargs)

    def __repr__(self):
        return "<Model id=%s>" % self.getId()

    def getDict(self):
        """See: ``Entity.getDict``."""
        try:
            from_parent = self.parent.getDict()
        except AttributeError:
            # we are allowed to silence this because a structure id is not
            # required to write a proper pdb line.
            from_parent = {}
        from_parent.update({'model': self.getId()[0]})
        return from_parent


class Chain(MultiEntity):
    """The ``Chain`` instance contains ``Residue`` instances."""

    def __init__(self, id, *args, **kwargs):
        self.level = 'C'
        MultiEntity.__init__(self, id, *args, **kwargs)

    def __repr__(self):
        return "<Chain id=%s>" % self.getId()

    def removeHetero(self):
        """Remove residues with the hetero flag."""
        self.stripChildren('H', 'eq', 'h_flag', forgiving=False)

    def removeWater(self):
        """Remove water residues."""
        self.stripChildren('H_HOH', 'eq', 'name', forgiving=False)

    def residueCount(self):
        """Count residues based on ``name``."""
        return self.countChildren('name')

    def residueFreq(self):
        """Calculate residue frequency (based on ``name``)."""
        return self.freqChildren('name')

    def getSeq(self, moltype='PROTEIN'):
        """Returns a Sequence object from the ordered residues.
    The ``moltype`` determines the allowed residue names."""
        if moltype == 'PROTEIN':
            valid_names = AA_NAMES
            moltype = cogent.PROTEIN
        elif moltype == 'DNA':
            raise NotImplementedError('The sequence type: %s is not implemented' % moltype)
        elif moltype == 'RNA':
            raise NotImplementedError('The sequence type: %s is not implemented' % moltype)
        else:
            raise ValueError('The \'moltype\' is not supported.')
        aa = ResidueHolder('aa', self.selectChildren(valid_names, contains, 'name'))
        aa_noic = ResidueHolder('noic', aa.selectChildren(' ', eq, 'res_ic'))
        raw_seq = []
        full_ids = []
        for res in aa_noic.sortedvalues():
            raw_seq.append(AA_NAMES_3to1[res.name])
            full_ids.append(res.getFull_id()[1:])
        raw_seq = "".join(raw_seq)
        seq = cogent.Sequence(moltype, raw_seq, self.getName())
        seq.addAnnotation(SimpleVariable, 'entity_id', 'S_id', full_ids)
        return seq

    def getDict(self):
        """See: ``Entity.getDict``."""
        from_parent = self.parent.getDict()
        from_parent.update({'chain_id': self.getId()[0]})
        return from_parent


class Residue(MultiEntity):
    """The ``Residue`` instance contains ``Atom`` instances."""

    def __init__(self, res_long_id, h_flag, seg_id, *args, **kwargs):
        self.level = 'R'
        self.seg_id = seg_id
        self.h_flag = h_flag
        self.res_id = res_long_id[1]    # id number
        self.res_ic = res_long_id[2]    # insertion code
        MultiEntity.__init__(self, res_long_id, res_long_id[0], *args, **kwargs)

    def __repr__(self):
        res_name, res_id, res_ic = self.getId()[0]
        full_name = (res_name, res_id, res_ic)
        return "<Residue %s resseq=%s icode=%s>" % full_name

    def _getId(self):
        """Return the residue full id.
``(name, res_id, res_ic)``.""" return ((self.name, self.res_id, self.res_ic),) def _setId(self, id): """Set the residue id ``res_id``, name ``name`` and insertion code ``res_ic`` from a full id.""" (self.name, self.res_id, self.res_ic) = id[0] def removeHydrogens(self): """Remove hydrogen atoms.""" self.stripChildren(' H', 'eq', 'element', forgiving=False) def getSeg_id(self): """Return the segment id.""" return self.seg_id def setSeg_id(self, seg_id): """Set the segment id. This does not change the id.""" self.seg_id = seg_id def getIc(self): """Return the insertion code.""" return self.res_ic def setIc(self, res_ic): """Set the insertion code.""" self.res_ic = res_ic self.setId() def getRes_id(self): """Get the id.""" return self.res_id def setRes_id(self, res_id): """Set the id.""" self.res_id = res_id self.setId() def getH_flag(self): """Return the hetero flag.""" return self.h_flag def setH_flag(self, h_flag): """Sets the hetero flag. A valid flag is ' ' or 'H'. If 'H' the flag becomes part of the residue name i.e. H_XXX.""" if not h_flag in (' ', 'H'): raise AttributeError, "Only ' ' and 'H' hetero flags allowed." 
        if len(self.name) == 3:
            self.name = "%s_%s" % (h_flag, self.name)
        elif len(self.name) == 5:
            self.name = "%s_%s" % (h_flag, self.name[2:])
        else:
            raise ValueError, 'Non-standard residue name'
        self.h_flag = h_flag
        self.setId()

    def getDict(self):
        """See: ``Entity.getDict``."""
        from_parent = self.parent.getDict()
        if self.h_flag != ' ':
            at_type = 'HETATM'
        else:
            at_type = 'ATOM  '
        from_parent.update({'at_type': at_type,
                            'h_flag': self.h_flag,
                            'res_name': self.name,
                            'res_long_id': self.getId()[0],
                            'res_id': self.res_id,
                            'res_ic': self.res_ic,
                            'seg_id': self.seg_id,
                            })
        return from_parent


class Atom(Entity):
    """The ``Atom`` class contains no children."""

    def __init__(self, at_long_id, at_name, ser_num, coords, occupancy,
                 bfactor, element):
        self.level = 'A'
        self.index = HIERARCHY.index(self.level)
        self.coords = coords
        self.bfactor = bfactor
        self.occupancy = occupancy
        self.ser_num = ser_num
        self.at_id = at_long_id[0]
        self.alt_loc = at_long_id[1]
        self.table = dict([(level, {}) for level in HIERARCHY[self.index + 1:]])
        self.element = element
        Entity.__init__(self, at_long_id, at_name)

    def __nonzero__(self):
        return bool(self.id)

    def __repr__(self):
        return "<Atom %s>" % self.getId()

    def _getId(self):
        """Return the full id. The id of an atom is not its ' XX ' name,
        but this string after stripping left/right spaces. The full id is
        ``(at_id, alt_loc)``."""
        return ((self.at_id, self.alt_loc),)

    def _setId(self, id):
        """Set the atom id ``at_id`` and alternate location ``alt_loc`` from
        a full id. See: ``_getId``."""
        (self.at_id, self.alt_loc) = id[0]

    def setElement(self, element):
        """Set the atom element ``element``."""
        self.element = element

    def setName(self, name):
        """Set name and update the id."""
        self.name = name
        self.setAt_id(name.strip())

    def setAt_id(self, at_id):
        """Set id. An atom id should be derived from the atom name.
See: ``_getId``.""" self.at_id = at_id self.setId() def setAlt_loc(self, alt_loc): """Set alternate location identifier.""" self.alt_loc = alt_loc self.setId() def setSer_num(self, n): """Set serial number.""" self.ser_num = n def setBfactor(self, bfactor): """Set B-factor.""" self.bfactor = bfactor def setOccupancy(self, occupancy): """Set occupancy.""" self.occupancy = occupancy def setRadius(self, radius=None, radius_type=AREAIMOL_VDW_RADII, \ default_radius=DEFAULT_AREAIMOL_VDW_RADIUS): """Set radius, defaults to the AreaIMol VdW radius.""" if radius: self.radius = radius else: try: self.radius = radius_type[(self.parent.name, self.name)] except KeyError: self.radius = default_radius def getSer_num(self): """Return the serial number.""" return self.ser_num def getBfactor(self): """Return the B-factor.""" return self.bfactor def getOccupancy(self): """Return the occupancy.""" return self.occupancy def getRadius(self): """Return the radius.""" return self.radius def getDict(self): """See: ``Entity.getDict``.""" from_parent = self.parent.getDict() from_parent.update({'at_name': self.name, 'ser_num': self.ser_num, 'coords': self.coords, 'occupancy': self.occupancy, 'bfactor': self.bfactor, 'alt_loc': self.alt_loc, 'at_long_id': self.getId()[0], 'at_id': self.at_id, 'element': self.element}) return from_parent class Holder(MultiEntity): """The ``Holder`` instance exists outside the SMCRA hierarchy. 
    Elements in a ``Holder`` instance are indexed by the full id."""

    def __init__(self, name, *args):
        if not hasattr(self, 'level'):
            self.level = name
        MultiEntity.__init__(self, name, name, *args)

    def __repr__(self):
        return '<Holder level=%s name=%s>' % (self.level, self.getName())

    def addChild(self, child):
        """Add a child."""
        child_id = child.getFull_id()
        self[child_id] = child

    def delChild(self, child_id):
        """Remove a child."""
        self.pop(child_id)

    def updateIds(self):
        """Update self with children long ids."""
        ids = []
        for (id_, child) in self.iteritems():
            new_id = child.getFull_id()
            if id_ != new_id:
                ids.append((id_, new_id))
        for (old_id, new_id) in ids:
            child = self.pop(old_id)
            self.update(((new_id, child),))


class StructureHolder(Holder):
    """The ``StructureHolder`` contains ``Structure`` instances.

    See: ``Holder``."""

    def __init__(self, *args):
        self.level = 'H'
        Holder.__init__(self, *args)

    def __repr__(self):
        return "<StructureHolder name=%s>" % self.getName()


class ModelHolder(Holder):
    """The ``ModelHolder`` contains ``Model`` instances.

    See: ``Holder``."""

    def __init__(self, *args):
        self.level = 'S'
        Holder.__init__(self, *args)

    def __repr__(self):
        return "<ModelHolder name=%s>" % self.getName()


class ChainHolder(Holder):
    """The ``ChainHolder`` contains ``Chain`` instances.

    See: ``Holder``."""

    def __init__(self, *args):
        self.level = 'M'
        Holder.__init__(self, *args)

    def __repr__(self):
        return "<ChainHolder name=%s>" % self.getName()


class ResidueHolder(Holder):
    """The ``ResidueHolder`` contains ``Residue`` instances.

    See: ``Holder``."""

    def __init__(self, *args):
        self.level = 'C'
        Holder.__init__(self, *args)

    def __repr__(self):
        return "<ResidueHolder name=%s>" % self.getName()


class AtomHolder(Holder):
    """The ``AtomHolder`` contains ``Atom`` instances.

    See: ``Holder``."""

    def __init__(self, *args):
        self.level = 'R'
        Holder.__init__(self, *args)

    def __repr__(self):
        return "<AtomHolder name=%s>" % self.getName()


class StructureBuilder(object):
    """Constructs a ``Structure`` object.

    The ``StructureBuilder`` class is used by a parser class to parse a file
    into a ``Structure`` object.
    An instance of a ``StructureBuilder`` has methods to create ``Entity``
    instances and add them into the SMCRA hierarchy."""

    def __init__(self):
        self.structure = None

    def initStructure(self, structure_id):
        """Initialize a ``Structure`` instance."""
        self.structure = Structure(structure_id)

    def initModel(self, model_id):
        """Initialize a ``Model`` instance and add it as a child to the
        ``Structure`` instance. If a model is defined twice a
        ``ConstructionError`` is raised."""
        if not (model_id,) in self.structure:
            self.model = Model(model_id)
            self.model.junk = AtomHolder('junk')
            self.structure._initChild(self.model)
        else:
            raise ConstructionError

    def initChain(self, chain_id):
        """Initialize a ``Chain`` instance and add it as a child to the
        ``Model`` instance. If a chain is defined twice a
        ``ConstructionWarning`` is raised. This means that the model is not
        continuous."""
        if not (chain_id,) in self.model:
            self.chain = Chain(chain_id)
            self.model._initChild(self.chain)
        else:
            self.chain = self.model[(chain_id,)]
            raise ConstructionWarning, "Chain %s is not continuous" % chain_id

    def initSeg(self, seg_id):
        """Does not create an ``Entity`` instance, but updates the segment
        id, ``seg_id``, which is used to initialize ``Residue`` instances."""
        self.seg_id = seg_id

    def initResidue(self, res_long_id, res_name):
        """Initialize a ``Residue`` instance and add it as a child to the
        ``Chain`` instance. If a residue is defined twice a
        ``ConstructionWarning`` is raised. This means that the chain is not
        continuous."""
        if not (res_long_id,) in self.chain:
            self.residue = Residue(res_long_id, res_name, self.seg_id)
            self.chain._initChild(self.residue)
        else:
            self.residue = self.chain[(res_long_id,)]
            raise ConstructionWarning, "Residue %s%s%s is not continuous" % \
                res_long_id

    def initAtom(self, at_long_id, at_name, ser_num, coord, occupancy,
                 bfactor, element):
        """Initialize an ``Atom`` instance and add it as a child to the
        ``Residue`` instance.
    If an atom is defined twice a ``ConstructionError`` is raised and the
    ``Atom`` instance is added to the ``structure.model.junk`` ``Holder``
    instance."""
        if not (at_long_id,) in self.residue:
            self.atom = Atom(at_long_id, at_name, ser_num, coord, occupancy,
                             bfactor, element)
            self.residue._initChild(self.atom)
        else:
            full_id = (tuple(self.residue[(at_long_id,)].getFull_id()),
                       ser_num)
            self.model.junk._initChild(Atom(full_id, at_name, ser_num, coord,
                                            occupancy, bfactor, element))
            raise ConstructionError, 'Atom %s%s is defined twice.' % at_long_id

    def getStructure(self):
        """Update coordinates (``coords``), set the children-table
        (``table``) and return the ``Structure`` instance."""
        self.structure.setTable()
        self.structure.setCoordsRecursively()
        return self.structure

PyCogent-1.5.3/cogent/core/genetic_code.py

#!/usr/bin/env python
"""Translates RNA or DNA string to amino acid sequence.

NOTE: * is used to denote termination (as per NCBI standard).

NOTE: Although the genetic code objects convert DNA to RNA and vice versa,
lists of codons that they produce will be provided in DNA format.
"""
from string import maketrans
import re

__author__ = "Greg Caporaso and Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso", "Rob Knight", "Peter Maxwell"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "caporaso@colorado.edu"
__status__ = "Production"


class GeneticCodeError(Exception):
    pass

class GeneticCodeInitError(ValueError, GeneticCodeError):
    pass

class InvalidCodonError(KeyError, GeneticCodeError):
    pass

_dna_trans = maketrans('TCAG', 'AGTC')

def _simple_rc(seq):
    """simple reverse-complement: works only on unambiguous uppercase DNA"""
    return seq.translate(_dna_trans)[::-1]


class GeneticCode(object):
    """Holds codon to amino acid mapping, and vice versa.
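A minimal sketch of how such a codon-to-amino-acid mapping can be derived from a 64-character NCBI code string, mirroring the TCAG-major codon ordering this class uses (`STANDARD`, `codon_to_aa`, and `synonyms` below are illustrative names, not class attributes):

```python
# Codons are generated in TCAG-major order (TTT, TTC, TTA, TTG, TCT, ...)
# and zipped against the 64-character code string; inverting that mapping
# gives the synonym lists (all codons per amino acid).
STANDARD = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
nt = 'TCAG'
codons = [a + b + c for a in nt for b in nt for c in nt]

codon_to_aa = dict(zip(codons, STANDARD))

synonyms = {}
for codon, aa in codon_to_aa.items():
    synonyms.setdefault(aa, []).append(codon)
```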
Usage: gc = GeneticCode(CodeSequence) sgc = GeneticCode( 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG') sgc['UUU'] == 'F' sgc['TTT'] == 'F' sgc['F'] == ['TTT', 'TTC'] #in arbitrary order sgc['*'] == ['TAA', 'TAG', 'TGA'] #in arbitrary order CodeSequence : 64 character string containing NCBI genetic code translation GeneticCode is immutable once created. """ #class data: need the bases, the list of codons in UUU -> GGG order, and #a mapping from positions in the list back to codons. These should be the #same for all GeneticCode instances, and are immutable (therefore private). _nt = "TCAG" _codons = [a+b+c for a in _nt for b in _nt for c in _nt] def __init__(self, CodeSequence, ID=None, Name=None, StartCodonSequence=None): """Returns new GeneticCode object. CodeSequence : 64-character string containing NCBI representation of the genetic code. Raises GeneticCodeInitError if length != 64. """ if (len(CodeSequence) != 64): raise GeneticCodeInitError,\ "CodeSequence: %s has length %d, but expected 64"\ % (CodeSequence, len(CodeSequence)) self.CodeSequence = CodeSequence self.ID = ID self.Name = Name self.StartCodonSequence = StartCodonSequence start_codons = {} if StartCodonSequence: for codon, aa in zip(self._codons, StartCodonSequence): if aa != '-': start_codons[codon] = aa self.StartCodons = start_codons codon_lookup = dict(zip(self._codons, CodeSequence)) self.Codons = codon_lookup #create synonyms for each aa aa_lookup = {} for codon in self._codons: aa = codon_lookup[codon] if aa not in aa_lookup: aa_lookup[aa] = [codon] else: aa_lookup[aa].append(codon) self.Synonyms = aa_lookup sense_codons = codon_lookup.copy() #create sense codons stop_codons = self['*'] for c in stop_codons: del sense_codons[c] self.SenseCodons = sense_codons #create anticodons ac = {} for aa, codons in self.Synonyms.items(): ac[aa] = map(_simple_rc, codons) self.Anticodons = ac def _analyze_quartet(self, codons, aa): """Analyzes a quartet of codons and amino acids: 
returns list of lists. Each list contains one block, splitting at R/Y if necessary. codons should be a list of 4 codons. aa should be a list of 4 amino acid symbols. Possible states: - All amino acids are the same: returns list of one quartet. - Two groups of 2 aa: returns list of two doublets. - One group of 2 and 2 groups of 1: list of one doublet, 2 singles. - 4 groups of 1: four singles. Note: codon blocks like Ile in the standard code (AUU, AUC, AUA) will be split when they cross the R/Y boundary, so [[AUU, AUC], [AUA]]. This would also apply to a block like AUC AUA AUG -> [[AUC],[AUA,AUG]], although this latter pattern is not observed in the standard code. """ if aa[0] == aa[1]: first_doublet = True else: first_doublet = False if aa[2] == aa[3]: second_doublet = True else: second_doublet = False if first_doublet and second_doublet and aa[1] == aa[2]: return [codons] else: blocks = [] if first_doublet: blocks.append(codons[:2]) else: blocks.extend([[codons[0]],[codons[1]]]) if second_doublet: blocks.append(codons[2:]) else: blocks.extend([[codons[2]],[codons[3]]]) return blocks def _get_blocks(self): """Returns list of lists of codon blocks in the genetic code. A codon block can be: - a quartet, if all 4 XYn codons have the same amino acid. - a doublet, if XYt and XYc or XYa and XYg have the same aa. - a singlet, otherwise. Returns a list of the quartets, doublets, and singlets in the order UUU -> GGG. Note that a doublet cannot span the purine/pyrimidine boundary, and a quartet cannot span the boundary between two codon blocks whose first two bases differ. 
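The quartet-splitting rule described above can be sketched standalone (`analyze_quartet` below is an illustrative re-implementation of the private `_analyze_quartet` method, not the method itself):

```python
# A quartet of codons XY{T,C,A,G} stays whole only if all four amino acids
# match; otherwise it splits into pyrimidine (T/C) and purine (A/G) halves,
# each kept as a doublet if its two amino acids agree, or two singlets.
def analyze_quartet(codons, aa):
    first = aa[0] == aa[1]
    second = aa[2] == aa[3]
    if first and second and aa[1] == aa[2]:
        return [codons]
    blocks = []
    blocks += [codons[:2]] if first else [[codons[0]], [codons[1]]]
    blocks += [codons[2:]] if second else [[codons[2]], [codons[3]]]
    return blocks
```

For the standard-code Ile/Met quartet this reproduces the split across the R/Y boundary that the docstring describes.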
""" if hasattr(self, '_blocks'): return self._blocks else: blocks = [] curr_codons = [] curr_aa = [] for index, codon, aa in zip(range(64),self._codons,self.CodeSequence): #we're in a new block if it's a new quartet or a different aa new_quartet = not index % 4 if new_quartet and curr_codons: blocks.extend(self._analyze_quartet(curr_codons, curr_aa)) curr_codons = [] curr_aa = [] curr_codons.append(codon) curr_aa.append(aa) #don't forget to append last block if curr_codons: blocks.extend(self._analyze_quartet(curr_codons, curr_aa)) self._blocks = blocks return self._blocks Blocks = property(_get_blocks) def __str__(self): """Returns CodeSequence that constructs the GeneticCode.""" return self.CodeSequence def __repr__(self): """Returns reconstructable representation of the GeneticCode.""" return 'GeneticCode(%s)' % str(self) def __cmp__(self, other): """ Allows two GeneticCode objects to be compared to each other. Two GeneticCode objects are equal if they have equal CodeSequences. """ return cmp(str(self), str(other)) def __getitem__(self, item): """Returns amino acid corresponding to codon, or codons for an aa. Returns [] for empty list of codons, 'X' for unknown amino acid. """ item = str(item) if len(item) == 1: #amino acid return self.Synonyms.get(item, []) elif len(item) == 3: #codon key = item.upper() key = key.replace('U', 'T') return self.Codons.get(key, 'X') else: raise InvalidCodonError, "Codon or aa %s has wrong length" % item def translate(self, dna, start=0): """ Translates DNA to protein with current GeneticCode. dna = a string of nucleotides start = position to begin translation (used to implement frames) Returns string containing amino acid sequence. Translates the entire sequence: it is the caller's responsibility to find open reading frames. NOTE: should return Protein object when we have a class for it. 
""" if not dna: return '' if start + 1 > len(dna): raise ValueError, "Translation starts after end of RNA" return ''.join([self[dna[i:i+3]] for i in range(start, len(dna)-2, 3)]) def getStopIndices(self, dna, start=0): """returns indexes for stop codons in the specified frame""" stops = self['*'] stop_pattern = '(%s)' % '|'.join(stops) stop_pattern = re.compile(stop_pattern) seq = str(dna) found = [hit.start() for hit in stop_pattern.finditer(seq)] found = [index for index in found if index % 3 == start] return found def sixframes(self, dna): """Returns six-frame translation as dict containing {frame:translation} """ reverse = dna.rc() return [self.translate(dna, start) for start in range(3)] + \ [self.translate(reverse, start) for start in range(3)] def isStart(self, codon): """Returns True if codon is a start codon, False otherwise.""" fixed_codon = codon.upper().replace('U','T') return fixed_codon in self.StartCodons def isStop(self, codon): """Returns True if codon is a stop codon, False otherwise.""" return self[codon] == '*' def changes(self, other): """Returns dict of {codon:'XY'} for codons that differ. X is the string representation of the amino acid in self, Y is the string representation of the amino acid in other. Always returns a 2-character string. 
""" changes = {} try: other_code = other.CodeSequence except AttributeError: #try using other directly as sequence other_code = other for codon, old, new in zip(self._codons, self.CodeSequence, other_code): if old != new: changes[codon] = old+new return changes NcbiGeneticCodeData = [GeneticCode(*data) for data in [ [ 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG', 1, 'Standard Nuclear', '---M---------------M---------------M----------------------------', ], [ 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG', 2, 'Vertebrate Mitochondrial', '--------------------------------MMMM---------------M------------', ], [ 'FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG', 3, 'Yeast Mitochondrial', '----------------------------------MM----------------------------', ], [ 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG', 4, 'Mold, Protozoan, and Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma Nuclear', '--MM---------------M------------MMMM---------------M------------', ], [ 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG', 5, 'Invertebrate Mitochondrial', '---M----------------------------MMMM---------------M------------', ], [ 'FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG', 6, 'Ciliate, Dasycladacean and Hexamita Nuclear', '-----------------------------------M----------------------------', ], [ 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG', 9, 'Echinoderm and Flatworm Mitochondrial', '-----------------------------------M---------------M------------', ], [ 'FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG', 10, 'Euplotid Nuclear', '-----------------------------------M----------------------------', ], [ 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG', 11, 'Bacterial Nuclear and Plant Plastid', '---M---------------M------------MMMM---------------M------------', ], [ 
        'FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
        12,
        'Alternative Yeast Nuclear',
        '-------------------M---------------M----------------------------',
    ],
    [
        'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG',
        13,
        'Ascidian Mitochondrial',
        '-----------------------------------M----------------------------',
    ],
    [
        'FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG',
        14,
        'Alternative Flatworm Mitochondrial',
        '-----------------------------------M----------------------------',
    ],
    [
        'FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
        15,
        'Blepharisma Nuclear',
        '-----------------------------------M----------------------------',
    ],
    [
        'FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
        16,
        'Chlorophycean Mitochondrial',
        '-----------------------------------M----------------------------',
    ],
    [
        'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG',
        20,
        'Trematode Mitochondrial',
        '-----------------------------------M---------------M------------',
    ],
    [
        'FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
        22,
        'Scenedesmus obliquus Mitochondrial',
        '-----------------------------------M----------------------------',
    ],
    [
        'FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG',
        23,
        'Thraustochytrium Mitochondrial',
    ],
]]

#build dict of GeneticCodes keyed by ID (as int, not str)
GeneticCodes = dict([(i.ID, i) for i in NcbiGeneticCodeData])

#add str versions for convenience
for key, value in GeneticCodes.items():
    GeneticCodes[str(key)] = value

DEFAULT = GeneticCodes[1]

PyCogent-1.5.3/cogent/core/info.py

#!/usr/bin/env python
"""Provides Info, DbRef, DbRefs

Info is a dictionary and is the annotation object of a Sequence object.
""" from cogent.parse.record import MappedRecord from cogent.util.misc import Delegator, FunctionWrapper, ConstrainedDict __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Prototype" class DbRef(object): """Holds a database accession, and optionally other data. Accession: id in the database: str or int Db: database name: str Name: short name of the record: str Description: description of the record, possibly lengthy: str Data: any data associated with the record: arbitrary object str(DbRef) always returns the accession. """ def __init__(self, Accession, Db='', Name='', Description='', \ Data=None): """Returns new DbRef. str(DbRef) always returns the accession as a string. """ self.Accession = Accession self.Db = Db self.Name = Name self.Description = Description self.Data = Data def __str__(self): """Returns accession.""" return str(self.Accession) def __int__(self): """Tries to coerce accession to int.""" return int(self.Accession) def __cmp__(self, other): """Compares by accession: tries numeric first, then alphabetic""" try: return cmp(int(self), int(other)) except: return cmp(str(self), str(other)) def _make_list(obj): """Returns list corresponding to or containing obj, depending on type.""" if isinstance(obj, list): return obj elif isinstance(obj, tuple): return list(obj) else: return [obj] class DbRefs(MappedRecord, ConstrainedDict): """Holds Database -> [Accessions] mapping. The accessions for a particular database are always stored as a list. DbRefs will ultimately contain methods for actually getting the records from known databases. 
""" ValueMask = FunctionWrapper(_make_list) DefaultValue = [] KnownDatabases = dict.fromkeys(['RefSeq', 'GenBank', 'GenNucl', 'GenPept', 'GI', 'SwissProt', 'PIR', 'EMBL', 'DDBJ', 'NDB', 'PDB', 'Taxon', 'LocusLink', 'UniGene', 'OMIM', 'PubMed', 'COGS', 'CDD', 'Pfam', 'Rfam', 'GO', 'dbEST', 'IPI', 'rRNA', 'EC', 'HomoloGene', 'KEGG', 'BRENDA', 'EcoCyc', 'HumanCyc', 'BLOCKS']) class Info(MappedRecord, Delegator): """Dictionary that stores attributes for Sequence objects. Delegates to DbRefs for database IDs. """ Required = {'Refs':None} def __init__(self, *args, **kwargs): """Returns new Info object. Creates DbRefs if necessary.""" temp = dict(*args, **kwargs) if 'Refs' in temp: refs = temp['Refs'] if not isinstance(refs, DbRefs): refs = DbRefs(refs) else: refs = DbRefs() #move keys into refs if they belong there: allows init from flat dict for key, val in temp.items(): if key in KnownDatabases: refs[key] = val del temp[key] Delegator.__init__(self, refs) self['Refs'] = refs MappedRecord.__init__(self, temp) def __getattr__(self, attr): """Checks for attr in Refs first.""" if attr in KnownDatabases: return getattr(self.Refs, attr) else: return super(Info, self).__getattr__(attr) def __setattr__(self, attr, val): """Try to set in Refs first.""" if attr in KnownDatabases: return setattr(self.Refs, attr, val) else: return super(Info, self).__setattr__(attr, val) def __getitem__(self, item): """Checks for item in Refs first.""" if item in KnownDatabases: return getattr(self.Refs, item) else: return super(Info, self).__getitem__(item) def __setitem__(self, item, val): """Try to set in Refs first.""" if item in KnownDatabases: return setattr(self.Refs, item, val) else: return super(Info, self).__setitem__(item, val) def __contains__(self, item): """Checks for item in Refs first.""" if item in KnownDatabases: return item in self.Refs else: return super(Info, self).__contains__(item) PyCogent-1.5.3/cogent/core/location.py000644 000765 000024 00000073514 12024702176 020644 
0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Alignments and Sequences are _Annotatables _Annotatables hold a list of Maps. Maps can be Features, Variables or AlignedSequences. Maps have a list of Spans. Also provides Range and Point classes for dealing with parts of sequences. Span is a region with a start, an end, and a direction. Range is an ordered collection of Spans (note: Range does _not_ support the list interface, but you can always access Range.Spans directly). Map is like a Range but is immutable and is able to be nested, i.e. Maps can be defined relative to other Maps. Implementation Notes Span and Range behave much like Python's slices: a Span contains the element after its Start but does not contain the element after its End. It may help to think of the Span indices occurring _between_ the list elements: a b c d e | | | | | | 0 1 2 3 4 5 ...so that a Span whose Start is its End contains no elements (e.g. 2:2), and a Span whose End is 2 more than its start contains 2 elements (e.g. 2:4 has c and d), etc. Similarly, Span(0,2) does _not_ overlap Span(2,3), since the former contains a and b while the latter contains c. A Point is a Span whose Start and End refer to the same object, i.e. the same position in the sequence. A Point occurs between elements in the sequence, and so does not contain any elements itself. WARNING: this differs from the way e.g. NCBI handles sequence indices, where the sequence is 1-based, a single index is treated as containing one element, the point 3 contains exactly one element, 3, rather than no elements, and a range from 2:4 contains 2, 3 and 4, _not_ just 2 and 3. 
""" from cogent.util.misc import FunctionWrapper, ClassChecker, ConstrainedList, \ iterable from itertools import chain from string import strip from bisect import bisect_right, bisect_left import copy __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Peter Maxwell", "Matthew Wakefield", "Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Prototype" def _norm_index(i, length, default): """For converting s[:3] to s[0:3], s[-1] to s[len(s)-1] and s[0:lots] to s[0:len(s)]""" if i is None: i = default elif i < 0: i += length return min(max(i,0),length) def _norm_slice(index, length): """_norm_slice(slice(1, -2, 3), 10) -> (1,8,3)""" if isinstance(index, slice): start = _norm_index(index.start, length, 0) end = _norm_index(index.stop, length, length) return (start, end, index.step) else: start = index if start < 0: start += length if start >= length: raise IndexError(index) return (start, start+1, 1) def as_map(slice, length): """Take anything that might be used as a subscript: Integer, Slice, or Map, and return a Map.""" if isinstance(slice, (list, tuple)): spans = [] for i in slice: spans.extend(as_map(i, length).spans) map = Map(spans=spans, parent_length=length) elif isinstance(slice, Map): map = slice # reasons for failure when the following is not commented out # should be checked further #assert map.parent_length == length, (map, length) else: (lo, hi, step) = _norm_slice(slice, length) assert (step or 1) == 1 map = Map([(lo, hi)], parent_length=length) return map class SpanI(object): """Abstract interface for Span and Range objects. 
Required properties: Start, End (must both be numbers) """ __slots__ = [] #override in subclass def __contains__(self, other): """Returns True if other entirely contained in self.""" raise NotImplementedError def overlaps(self, other): """Returns True if any positions in self are also in other.""" raise NotImplementedError def reverse(self): """Reverses self.""" raise NotImplementedError def __iter__(self): """Iterates over indices contained in self.""" raise NotImplementedError def __str__(self): """Returns string representation of self.""" return '(%s,%s)' % (self.Start, self.End) def __len__(self): """Returns length of self.""" raise NotImplementedError def __cmp__(self): """Compares indices of self with indices of other.""" raise NotImplementedError def startsBefore(self, other): """Returns True if self starts before other or other.Start.""" try: return self.Start < other.Start except AttributeError: return self.Start < other def startsAfter(self, other): """Returns True if self starts after other or after other.Start.""" try: return self.Start > other.Start except AttributeError: return self.Start > other def startsAt(self, other): """Returns True if self starts at the same place as other.""" try: return self.Start == other.Start except AttributeError: return self.Start == other def startsInside(self, other): """Returns True if self's start in other or equal to other.""" try: return self.Start in other except (AttributeError, TypeError): #count other as empty span return False def endsBefore(self, other): """Returns True if self ends before other or other.End.""" try: return self.End < other.End except AttributeError: return self.End < other def endsAfter(self, other): """Returns True if self ends after other or after other.End.""" try: return self.End > other.End except AttributeError: return self.End > other def endsAt(self, other): """Returns True if self ends at the same place as other.""" try: return self.End == other.End except AttributeError: return 
self.End == other def endsInside(self, other): """Returns True if self's end is inside other or equal to other.""" try: return self.End in other except (AttributeError, TypeError): #count other as empty span return False class Span(SpanI): """A contiguous location, not much more than (start, end) Spans don't even know what map they are on. The only smarts the class has is the ability to slice correctly. Spans do not expect to be reverse-sliced (sl[5,3]) and treat positions as relative to themselves, not an underlying sequence (eg sl[:n] == sl[0:n]), so this slicing is very different to feature slicing. Spans may optionally have a value, which gets preserved when they are remapped etc.""" lost = False __slots__ = ( 'tidy_start', 'tidy_end', 'length', 'value', 'Start', 'End', 'Reverse') def __init__(self, Start, End=None, tidy_start=False, tidy_end=False, value=None, Reverse=False): self._new_init(Start, End, Reverse) self.tidy_start = tidy_start self.tidy_end = tidy_end self.value = value self.length = self.End - self.Start assert self.length >= 0 def _new_init(self, Start, End=None, Reverse=False): """Returns a new Span object, with Start, End, and Reverse properties. If End is not supplied, it is set to Start + 1 (providing a 1-element range). Reverse defaults to False. This should replace the current __init__ method when deprecated vars are removed.
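The coordinate conventions in Span._new_init (End defaulting to Start + 1, and reversed argument order being swapped so Start <= End always holds) can be sketched as a standalone helper; `normalize_span` is a hypothetical name, not part of the cogent API:

```python
def normalize_span(start, end=None):
    """Mirror Span._new_init's coordinate rules for plain integers."""
    if end is None:
        end = start + 1          # single-element span
    elif start > end:
        start, end = end, start  # always store start <= end
    return start, end
```

In the real class the original orientation is preserved separately via the Reverse flag rather than in the coordinates themselves.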
""" #special handling in case we were passed another Span if isinstance(Start, Span): assert End is None self.Start, self.End, self.Reverse = Start.Start, Start.End, \ Start.Reverse else: #reverse start and end so that start is always first if End is None: End = Start + 1 elif Start > End: Start, End = End, Start self.Start = Start self.End = End self.Reverse = Reverse def __setstate__(self, args): self.__init__(*args) def __getstate__(self): return (self.Start, self.End, self.tidy_start, self.tidy_end, \ self.value, self.Reverse) def __repr__(self): (start, end) = (self.Start, self.End) if self.Reverse: (end, start) = (start, end) return '%s:%s' % (start, end) def reversed(self): return self.__class__(self.Start, self.End, self.tidy_end, self.tidy_start, self.value, Reverse=not self.Reverse) def __getitem__(self, slice): start,end,step = _norm_slice(slice, self.length) assert (step or 1) == 1, slice assert start <= end, slice tidy_start = self.tidy_start and start==0 tidy_end = self.tidy_end and end == self.length if self.Reverse: (Start, End, Reverse) = (self.End-end, self.End-start, True) else: (Start, End, Reverse) = (self.Start+start, self.Start+end, False) return type(self)(Start, End, tidy_start, tidy_end, self.value, Reverse) def __mul__(self, scale): return Span(self.Start * scale, self.End * scale, self.tidy_start, self.tidy_end, self.value, self.Reverse) def __div__(self, scale): assert not self.Start % scale or self.End % scale return Span(self.Start // scale, self.End // scale, self.tidy_start, self.tidy_end, self.value, self.Reverse) def remapWith(self, map): """The list of spans corresponding to this span on its grandparent, ie: C is a span of a feature on B which itself is a feature on A, so to place C on A return that part of B (map) covered by C (self)""" (offsets, spans) = (map.offsets, map.spans) map_length = offsets[-1] + spans[-1].length # don't try to remap any non-corresponding end region(s) # this won't matter if all spans lie properly 
within their # parent maps, but that might not be true of Display slices. (zlo, zhi) = (max(0, self.Start), min(map_length, self.End)) # Find the right span(s) of the map first = bisect_right(offsets, zlo) - 1 last = bisect_left(offsets, zhi, first) -1 result = spans[first:last+1] # Cut off something at either end to get # the same position and length as 'self' if result: end_trim = offsets[last] + spans[last].length - zhi start_trim = zlo - offsets[first] if end_trim > 0: result[-1] = result[-1][:result[-1].length-end_trim] if start_trim > 0: result[0] = result[0][start_trim:] # May need to add a bit at either end if the span didn't lie entirely # within its parent map (eg: Display slice, inverse of feature map). if self.Start < 0: result.insert(0, LostSpan(-self.Start)) if self.End > map_length: result.append(LostSpan(self.End-map_length)) # If the ends of self are meaningful then so are the new ends, # but not any new internal breaks. if result: if self.tidy_start: result[0].tidy_start = True if self.tidy_end: result[-1].tidy_end = True # Deal with case where self is a reverse slice. if self.Reverse: result = [part.reversed() for part in result] result.reverse() if self.value is not None: result = [copy.copy(s) for s in result] for s in result: s.value = self.value return result def __contains__(self, other): """Returns True if other completely contained in self. other must either be a number or have Start and End properties. """ try: return other.Start >= self.Start and other.End <= self.End except AttributeError: #other is scalar: must be _less_ than self.End, #for the same reason that 3 is not in range(3). return other >= self.Start and other < self.End def overlaps(self, other): """Returns True if any positions in self are also in other.""" #remember to subtract 1 from the Ends, since self.End isn't really #in self... try: return (self.Start in other) or (other.Start in self) except AttributeError: #other was probably a number? 
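Span.__contains__ treats a span as a half-open interval, so a scalar equal to End is excluded "for the same reason that 3 is not in range(3)", while another span must lie entirely inside. A minimal standalone sketch of that logic (MiniSpan is a hypothetical stand-in, not the real class):

```python
class MiniSpan(object):
    """Stand-in for Span's containment semantics only."""
    def __init__(self, start, end):
        self.Start, self.End = start, end

    def __contains__(self, other):
        try:
            # another span: must be entirely inside self
            return other.Start >= self.Start and other.End <= self.End
        except AttributeError:
            # scalar: half-open interval, End itself excluded
            return self.Start <= other < self.End
```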
return other in self def reverse(self): """Reverses self.""" self.Reverse = not self.Reverse def reversedRelativeTo(self, length): """Returns a new span with positions adjusted relative to length. For use in reverse complementing of nucleic acids""" # if reverse complementing, the start becomes the length minus the end # position start = length - self.End assert start >= 0 end = start + self.length return self.__class__(start, end, value = self.value, Reverse = not self.Reverse) def __iter__(self): """Iterates over indices contained in self. NOTE: to make sure that the same items are contained whether going through the range in forward or reverse, need to adjust the indices by 1 if going backwards. """ if self.Reverse: return iter(xrange(self.End-1, self.Start-1, -1)) else: return iter(xrange(self.Start, self.End, 1)) def __str__(self): """Returns string representation of self.""" return '(%s,%s,%s)' % (self.Start, self.End, bool(self.Reverse)) def __len__(self): """Returns length of self.""" return self.End - self.Start def __cmp__(self, other): """Compares indices of self with indices of other.""" if hasattr(other, 'Start') and hasattr(other, 'End'): return cmp(self.Start, other.Start) or cmp(self.End, other.End) \ or cmp(self.Reverse, other.Reverse) else: return cmp(type(self), type(other)) class _LostSpan(object): """A placeholder span which doesn't exist in the underlying sequence""" __slots__ = ['length', 'value'] lost = True terminal = False def __init__(self, length, value=None): self.length = length self.value = value def __len__(self): return self.length def __setstate__(self, args): self.__init__(*args) def __getstate__(self): return (self.length, self.value) def __repr__(self): return '-%s-' % (self.length) def where(self, index): return None def reversed(self): return self def __getitem__(self, slice): (start,end,step) = _norm_slice(slice, self.length) assert (step or 1) == 1, slice return self.__class__(abs(end-start), self.value) def __mul__(self, 
scale): return LostSpan(self.length * scale, self.value) def __div__(self, scale): assert not self.length % scale return LostSpan(self.length // scale, self.value) def remapWith(self, map): return [self] def reversedRelativeTo(self, length): return self # Save memory by only making one of each small gap _lost_span_cache = {} def LostSpan(length, value=None): global _lost_span_cache if value is None and length < 1000: if length not in _lost_span_cache: _lost_span_cache[length] = _LostSpan(length, value) return _lost_span_cache[length] else: return _LostSpan(length, value) class TerminalPadding(_LostSpan): terminal = True def __repr__(self): return '?%s?' % (self.length) class Map(object): """A map holds a list of spans. """ def __init__(self, locations=None, spans=None, tidy=False, parent_length=None, termini_unknown=False): assert parent_length is not None if spans is None: spans = [] for (start, end) in locations: diff = 0 reverse = start > end if max(start, end) < 0 or min(start, end) > parent_length: raise RuntimeError("located outside sequence: %s" % \ str((start, end, parent_length))) elif min(start, end) < 0: diff = min(start, end) start = [start, 0][start < 0] end = [end, 0][end < 0] elif max(start, end) > parent_length: diff = max(start, end) - parent_length start = [start, parent_length][start > parent_length] end = [end, parent_length][end > parent_length] span = Span(start, end, tidy, tidy, Reverse=reverse) if diff < 0: spans += [LostSpan(-diff), span] elif diff > 0: spans += [span, LostSpan(diff)] else: spans += [span] self.offsets = [] self.useful = False self.complete = True self.Reverse = None posn = 0 for span in spans: self.offsets.append(posn) posn += span.length if span.lost: self.complete = False elif not self.useful: self.useful = True (self.Start, self.End) = (span.Start, span.End) self.Reverse = span.Reverse else: self.Start = min(self.Start, span.Start) self.End = max(self.End, span.End) if self.Reverse is not None and (span.Reverse != 
self.Reverse): self.Reverse = None if termini_unknown: if spans[0].lost: spans[0] = TerminalPadding(spans[0].length) if spans[-1].lost: spans[-1] = TerminalPadding(spans[-1].length) self.spans = spans self.length = posn self.parent_length = parent_length self.__inverse = None def __len__(self): return self.length def __repr__(self): return repr(self.spans) + '/%s' % self.parent_length def __getitem__(self, slice): # A possible shorter map at the same level slice = as_map(slice, len(self)) new_parts = [] for span in slice.spans: new_parts.extend(span.remapWith(self)) return Map(spans=new_parts, parent_length=self.parent_length) def __mul__(self, scale): # For Protein -> DNA new_parts = [] for span in self.spans: new_parts.append(span * scale) return Map(spans=new_parts, parent_length=self.parent_length*scale) def __div__(self, scale): # For DNA -> Protein new_parts = [] for span in self.spans: new_parts.append(span / scale) return Map(spans=new_parts, parent_length=self.parent_length // scale) def __add__(self, other): if other.parent_length != self.parent_length: raise ValueError("Those maps belong to different sequences") return Map(spans=self.spans + other.spans, parent_length=self.parent_length) def withTerminiUnknown(self): return Map(self, spans=self.spans[:], parent_length=self.parent_length, termini_unknown = True) def getCoveringSpan(self): if self.Reverse: span = (self.End, self.Start) else: span = (self.Start, self.End) return Map([span], parent_length=self.parent_length) def covered(self): """>>> Map([(10,20), (15, 25), (80, 90)]).covered().spans [Span(10,25), Span(80, 90)]""" delta = {} for span in self.spans: if span.lost: continue delta[span.Start] = delta.get(span.Start, 0) + 1 delta[span.End] = delta.get(span.End, 0) - 1 positions = delta.keys() positions.sort() last_y = y = 0 last_x = start = None result = [] for x in positions: y += delta[x] if x == last_x: continue if y and not last_y: assert start is None start = x elif last_y and not y: 
result.append((start, x)) start = None last_x = x last_y = y assert y == 0 return Map(result, parent_length=self.parent_length) def reversed(self): """Reversed location on same parent""" spans = [s.reversed() for s in self.spans] spans.reverse() return Map(spans=spans, parent_length=self.parent_length) def nucleicReversed(self): """Same location on reversed parent""" spans = [s.reversedRelativeTo(self.parent_length) for s in self.spans] return Map(spans=spans, parent_length=self.parent_length) def gaps(self): """The gaps (lost spans) in this map""" locations = [] offset = 0 for s in self.spans: if s.lost: locations.append((offset, offset+s.length)) offset += s.length return Map(locations, parent_length=len(self)) def shadow(self): """The 'negative' map of the spans not included in this map""" return self.inverse().gaps() def nongap(self): locations = [] offset = 0 for s in self.spans: if not s.lost: locations.append((offset, offset+s.length)) offset += s.length return Map(locations, parent_length=len(self)) def withoutGaps(self): return Map( spans = [s for s in self.spans if not s.lost], parent_length = self.parent_length) def inverse(self): if self.__inverse is None: self.__inverse = self._inverse() return self.__inverse def _inverse(self): # can't work if there are overlaps in the map # tidy ends don't survive inversion if self.parent_length is None: raise ValueError("Uninvertable. Parent length not known") posn = 0 temp = [] for span in self.spans: if not span.lost: if span.Reverse: temp.append((span.Start, span.End, posn+span.length, posn)) else: temp.append((span.Start, span.End, posn, posn+span.length)) posn += span.length temp.sort() new_spans = [] last_hi = 0 for (lo, hi, start, end) in temp: if lo > last_hi: new_spans.append(LostSpan(lo-last_hi)) elif lo < last_hi: raise ValueError, "Uninvertable. 
Overlap: %s < %s" % (lo, last_hi) new_spans.append(Span(start, end, Reverse=start>end)) last_hi = hi if self.parent_length > last_hi: new_spans.append(LostSpan(self.parent_length-last_hi)) return Map(spans=new_spans, parent_length=len(self)) class SpansOnly(ConstrainedList): """List that converts elements to Spans on addition.""" Mask = FunctionWrapper(Span) _constraint = ClassChecker(Span) class Range(SpanI): """Complex object consisting of many spans.""" def __init__(self, Spans=[]): """Returns a new Range object with data in Spans. """ result = SpansOnly() #need to check if we got a single Span, since they define __iter__. if isinstance(Spans, Span): result.append(Spans) elif hasattr(Spans, 'Spans'): #probably a single range object? result.extend(Spans.Spans) else: for s in iterable(Spans): if hasattr(s, 'Spans'): result.extend(s.Spans) else: result.append(s) self.Spans = result def __str__(self): """Returns string representation of self.""" return '(%s)' % ','.join(map(str, self.Spans)) def __len__(self): """Returns sum of span lengths. NOTE: if spans overlap, will count multiple times. Use reduce() to get rid of overlaps. 
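Map.covered() earlier merges overlapping spans with a sweep over +1/-1 deltas at each start and end position. The same algorithm on plain (start, end) pairs, as a standalone Python 3 sketch:

```python
def covered(locations):
    """Sweep-line union of half-open intervals, mirroring Map.covered()."""
    delta = {}
    for start, end in locations:
        delta[start] = delta.get(start, 0) + 1
        delta[end] = delta.get(end, 0) - 1
    depth = 0          # how many intervals are open at the current position
    start = None
    result = []
    for x in sorted(delta):
        prev_depth = depth
        depth += delta[x]
        if depth and not prev_depth:
            start = x           # coverage begins
        elif prev_depth and not depth:
            result.append((start, x))  # coverage ends
    return result
```

This matches the docstring example in the source: spans (10,20), (15,25), (80,90) collapse to (10,25) and (80,90).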
""" return sum(map(len, self.Spans)) def __cmp__(self, other): """Compares spans of self with indices of other.""" if hasattr(other, 'Spans'): return cmp(self.Spans, other.Spans) elif len(self.Spans) == 1 and hasattr(other, 'Start') and \ hasattr(other, 'End'): return cmp(self.Spans[0].Start, other.Start) or \ cmp(self.Spans[0].End, other.End) else: return object.__cmp__(self, other) def _get_start(self): """Finds earliest start of items in self.Spans.""" return min([i.Start for i in self.Spans]) Start = property(_get_start) def _get_end(self): """Finds latest end of items in self.Spans.""" return max([i.End for i in self.Spans]) End = property(_get_end) def _get_reverse(self): """Reverse is True if any piece is reversed.""" for i in self.Spans: if i.Reverse: return True return False Reverse = property(_get_reverse) def reverse(self): """Reverses all spans in self.""" for i in self.Spans: i.reverse() def __contains__(self, other): """Returns True if other completely contained in self. other must either be a number or have Start and End properties. 
""" if hasattr(other, 'Spans'): for curr in other.Spans: found = False for i in self.Spans: if curr in i: found = True break if not found: return False return True else: for i in self.Spans: if other in i: return True return False def overlaps(self, other): """Returns True if any positions in self are also in other.""" if hasattr(other, 'Spans'): for i in self.Spans: for j in other.Spans: if i.overlaps(j): return True else: for i in self.Spans: if i.overlaps(other): return True return False def overlapsExtent(self, other): """Returns True if any positions in self's extent also in other's.""" if hasattr(other, 'Extent'): return self.Extent.overlaps(other.Extent) else: return self.Extent.overlaps(other) def sort(self): """Sorts the spans in self.""" self.Spans.sort() def __iter__(self): """Iterates over indices contained in self.""" return chain(*[iter(i) for i in self.Spans]) def _get_extent(self): """Returns Span object representing the extent of self.""" return Span(self.Start, self.End) Extent = property(_get_extent) def simplify(self): """Reduces the spans in self in-place to get fewest spans. Will not condense spans with opposite directions. Will condense adjacent but nonoverlapping spans (e.g. (1,3) and (4,5)). """ forward = [] reverse = [] spans = self.Spans[:] spans.sort() for span in spans: if span.Reverse: direction = reverse else: direction = forward found_overlap = False for other in direction: if span.overlaps(other) or (span.Start == other.End) or \ (other.Start == span.End): #handle adjacent spans also other.Start = min(span.Start, other.Start) other.End = max(span.End, other.End) found_overlap = True break if not found_overlap: direction.append(span) self.Spans[:] = forward + reverse class Point(Span): """Point is a special case of Span, where Start always equals End. Note that, as per Python standard, a point is _between_ two elements in a sequence. In other words, a point does not contain any elements. 
If you want a single element, use a Span where End = Start + 1. A Point does have a direction (i.e. a Reverse property) to indicate where successive items would go if it were expanded. """ def __init__(self, Start, Reverse=False): """Returns new Point object.""" self.Reverse = Reverse self._start = Start def _get_start(self): """Returns self.Start.""" return self._start def _set_start(self, Start): """Sets self.Start and self.End.""" self._start = Start Start = property(_get_start, _set_start) End = Start #start and end are synonyms for the same property def RangeFromString(string, delimiter=','): """Returns Range object from string of the form 1-5,11,20,30-50. Ignores whitespace; expects values to be comma-delimited and positive. """ result = Range() pairs = map(strip, string.split(delimiter)) for p in pairs: if not p: #adjacent delimiters? continue if '-' in p: #treat as pair first, second = p.split('-') result.Spans.append(Span(int(first), int(second))) else: result.Spans.append(Span(int(p))) return result PyCogent-1.5.3/cogent/core/moltype.py #!/usr/bin/env python """ moltype.py MolType provides services for resolving ambiguities, or providing the correct ambiguity for recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments. One issue with MolTypes is that they need to know about Sequence, Alphabet, and other objects, but, at the same time, those objects need to know about the MolType. It is thus essential that the connection between these other types and the MolType can be made after the objects are created.
""" __author__ = "Peter Maxwell, Gavin Huttley and Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight", \ "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" from cogent.core.alphabet import CharAlphabet, Enumeration, Alphabet, \ AlphabetError, _make_complement_array from cogent.util.misc import FunctionWrapper, add_lowercase, iterable, if_ from cogent.util.transform import allchars, keep_chars from cogent.data.molecular_weight import DnaMW, RnaMW, ProteinMW from cogent.core.sequence import Sequence as DefaultSequence, RnaSequence, \ DnaSequence, ProteinSequence, ABSequence, NucleicAcidSequence, \ ByteSequence, ModelSequence, ModelNucleicAcidSequence, \ ModelDnaSequence, ModelRnaSequence, ModelDnaCodonSequence, \ ModelRnaCodonSequence, ModelProteinSequence, ProteinWithStopSequence,\ ModelProteinWithStopSequence from cogent.core.genetic_code import DEFAULT as DEFAULT_GENETIC_CODE, \ GeneticCodes from cogent.core.alignment import Alignment, DenseAlignment, \ SequenceCollection from random import choice import re import string import numpy from numpy import array, sum, transpose, remainder, zeros, arange, newaxis, \ ravel, asarray, fromstring, take, uint8, uint16, uint32 Float = numpy.core.numerictypes.sctype2char(float) Int = numpy.core.numerictypes.sctype2char(int) from string import maketrans, translate IUPAC_gap = '-' IUPAC_missing = '?' 
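The complement tables defined next map each base or IUPAC ambiguity code to its complement. In Python 3, such a table turns into a reverse-complement function via str.maketrans; this is a standalone sketch, not the MolType API (which builds a complement array through _make_complement_array instead):

```python
# DNA complement table copied from IUPAC_DNA_ambiguities_complements below
# ('X' is not an IUPAC code but is kept for repeatmasker compatibility).
DNA_COMPLEMENTS = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', '-': '-',
                   'M': 'K', 'K': 'M', 'N': 'N', 'R': 'Y', 'Y': 'R',
                   'W': 'W', 'S': 'S', 'X': 'X',
                   'V': 'B', 'B': 'V', 'H': 'D', 'D': 'H'}
_COMP_TABLE = str.maketrans(DNA_COMPLEMENTS)

def revcomp(seq):
    """Reverse complement of an uppercase DNA string (gaps pass through)."""
    return seq.translate(_COMP_TABLE)[::-1]
```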
IUPAC_DNA_chars = ['T','C','A','G'] IUPAC_DNA_ambiguities = { 'N': ('A','C','T','G'), 'R': ('A','G'), 'Y': ('C','T'), 'W': ('A','T'), 'S': ('C','G'), 'K': ('T','G'), 'M': ('C','A'), 'B': ('C','T','G'), 'D': ('A','T','G'), 'H': ('A','C','T'), 'V': ('A','C','G') } IUPAC_DNA_ambiguities_complements = { 'A':'T','C':'G','G':'C','T':'A', '-':'-', 'M':'K', 'K':'M', 'N':'N', 'R':'Y', 'Y':'R', 'W':'W', 'S':'S', 'X':'X', # not technically an IUPAC ambiguity, but used by repeatmasker 'V':'B', 'B':'V', 'H':'D', 'D':'H' } IUPAC_DNA_complements = { 'A':'T','C':'G','G':'C','T':'A', '-':'-', } IUPAC_RNA_chars = ['U','C','A','G'] #note change in standard order from DNA IUPAC_RNA_ambiguities = { 'N': ('A','C','U','G'), 'R': ('A','G'), 'Y': ('C','U'), 'W': ('A','U'), 'S': ('C','G'), 'K': ('U','G'), 'M': ('C','A'), 'B': ('C','U','G'), 'D': ('A','U','G'), 'H': ('A','C','U'), 'V': ('A','C','G') } IUPAC_RNA_ambiguities_complements = { 'A':'U','C':'G','G':'C','U':'A', '-':'-', 'M':'K', 'K':'M', 'N':'N', 'R':'Y', 'Y':'R', 'W':'W', 'S':'S', 'X':'X', # not technically an IUPAC ambiguity, but used by repeatmasker 'V':'B', 'B':'V', 'H':'D', 'D':'H' } IUPAC_RNA_complements = { 'A':'U','C':'G','G':'C','U':'A', '-':'-', } #Standard RNA pairing: GU pairs count as 'weak' pairs RnaStandardPairs = { ('A','U'): True, #True vs False for 'always' vs 'sometimes' pairing ('C','G'): True, ('G','C'): True, ('U','A'): True, ('G','U'): False, ('U','G'): False, } #Watson-Crick RNA pairing only: GU pairs don't count as pairs RnaWCPairs = { ('A','U'): True, ('C','G'): True, ('G','C'): True, ('U','A'): True, } #RNA pairing with GU counted as standard pairs RnaGUPairs = { ('A','U'): True, ('C','G'): True, ('G','C'): True, ('U','A'): True, ('G','U'): True, ('U','G'): True, } #RNA pairing with GU, AA, GA, CA and UU mismatches allowed as weak pairs RnaExtendedPairs = { ('A','U'): True, ('C','G'): True, ('G','C'): True, ('U','A'): True, ('G','U'): 
False, ('U','G'): False, ('A','A'): False, ('G','A'): False, ('A','G'): False, ('C','A'): False, ('A','C'): False, ('U','U'): False, } #Standard DNA pairing: only Watson-Crick pairs count as pairs DnaStandardPairs = { ('A','T'): True, ('C','G'): True, ('G','C'): True, ('T','A'): True, } # protein letters & ambiguity codes IUPAC_PROTEIN_chars = [ 'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y'] PROTEIN_WITH_STOP_chars = [ 'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y', '*'] IUPAC_PROTEIN_ambiguities = { 'B': ['N', 'D'], 'X': IUPAC_PROTEIN_chars, 'Z': ['Q', 'E'], } PROTEIN_WITH_STOP_ambiguities = { 'B': ['N', 'D'], 'X': PROTEIN_WITH_STOP_chars, 'Z': ['Q', 'E'], } class FoundMatch(Exception): """Raised when a match is found in a deep loop to skip many levels""" pass def make_matches(monomers=None, gaps=None, degenerates=None): """Makes a dict of symbol pairs (i,j) -> strictness. Strictness is True if i and j always match and False if they sometimes match (e.g. A always matches A, but W sometimes matches R). 
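The matches table that make_matches constructs can be previewed on a toy alphabet. This sketch mirrors its rules (identity pairs are strict True, gap-by-gap pairs are True, monomer/degenerate and degenerate/degenerate overlaps are weak False) without the FoundMatch exception used for flow control in the original:

```python
def make_matches(monomers, gaps, degenerates):
    """Build {(i, j): strictness}; True = always match, False = sometimes."""
    result = {}
    for i in monomers:                 # monomers match only themselves
        result[(i, i)] = True
    for i in gaps:                     # all gaps match all gaps
        for j in gaps:
            result[(i, j)] = True
    for i in monomers:                 # monomer vs degenerate containing it
        for j in degenerates:
            if i in degenerates[j]:
                result[(i, j)] = False
                result[(j, i)] = False
    for i in degenerates:              # degenerates sharing any monomer
        for j in degenerates:
            if set(degenerates[i]) & set(degenerates[j]):
                result[(i, j)] = False
    return result

matches = make_matches('AG', '-', {'R': 'AG'})
```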
""" result = {} #allow defaults to be left blank without problems monomers = monomers or {} gaps = gaps or {} degenerates = degenerates or {} #all monomers always match themselves and no other monomers for i in monomers: result[(i,i)] = True #all gaps always match all other gaps for i in gaps: for j in gaps: result[(i,j)] = True #monomers sometimes match degenerates that contain them for i in monomers: for j in degenerates: if i in degenerates[j]: result[(i,j)] = False result[(j,i)] = False #degenerates sometimes match degenerates that contain at least one of #the same monomers for i in degenerates: for j in degenerates: try: for i_symbol in degenerates[i]: if i_symbol in degenerates[j]: result[(i,j)] = False raise FoundMatch except FoundMatch: pass #flow control: break out of doubly nested loop return result def make_pairs(pairs=None, monomers=None, gaps=None, degenerates=None): """Makes a dict of symbol pairs (i,j) -> strictness. Expands pairs into all possible pairs using degen symbols. Strictness is True if i and j always pair, and False if they 'weakly' pair (e.g. GU pairs or if it is possible that they pair). If you want to make GU pairs count as 'always matching', pass in pairs that have (G,U) and (U, G) mapped to True rather than False. """ result = {} #allow defaults to be left blank without problems pairs = pairs or {} monomers = monomers or {} gaps = gaps or {} degenerates = degenerates or {} #add in the original pairs: should be complete monomer pairs result.update(pairs) #all gaps 'weakly' pair with each other for i in gaps: for j in gaps: result[(i,j)] = False #monomers sometimes pair with degenerates if the monomer's complement #is in the degenerate symbol for i in monomers: for j in degenerates: found = False try: for curr_j in degenerates[j]: #check if (i,curr_j) and/or (curr_j,i) is a valid pair: #not mutually required if pairs are not all commutative! 
if (i, curr_j) in pairs: result[(i,j)] = False found = True if (curr_j, i) in pairs: result[(j,i)] = False found = True if found: raise FoundMatch except FoundMatch: pass #flow control: break out of nested loop #degenerates sometimes pair with each other if the first degenerate #contains the complement of one of the bases in the second degenerate for i in degenerates: for j in degenerates: try: for curr_i in degenerates[i]: for curr_j in degenerates[j]: if (curr_i, curr_j) in pairs: result[(i,j)] = False raise FoundMatch except FoundMatch: pass #just using for flow control #don't forget the return value! return result #RnaPairingRules is a dict of {name:(base_pairs,degen_pairs)} where base_pairs #is a dict with the non-degenerate pairing rules and degen_pairs is a dict with #both the degenerate and non-degenerate pairing rules. #NOTE: uses make_pairs to augment the initial dict after construction. RnaPairingRules = { 'Standard': RnaStandardPairs, 'WC': RnaWCPairs, 'GU': RnaGUPairs, 'Extended': RnaExtendedPairs, } for k, v in RnaPairingRules.items(): RnaPairingRules[k] = (v, make_pairs(v)) class CoreObjectGroup(object): """Container relating gapped, ungapped, degen, and non-degen objects.""" _types = ['Base', 'Degen', 'Gap', 'DegenGap'] def __init__(self, Base, Degen=None, Gapped=None, DegenGapped=None): """Returns new CoreObjectGroup. 
Only Base is required""" self.Base = Base self.Degen = Degen self.Gapped = Gapped self.DegenGapped = DegenGapped self._items = [Base, Degen, Gapped, DegenGapped] self._set_relationships() def _set_relationships(self): """Sets relationships between the different "flavors".""" self.Base.Gapped = self.Gapped self.Base.Ungapped = self.Base self.Base.Degen = self.Degen self.Base.NonDegen = self.Base statements = [ "self.Degen.Gapped = self.DegenGapped", "self.Degen.Ungapped = self.Degen", "self.Degen.Degen = self.Degen", "self.Degen.NonDegen = self.Base", "self.Gapped.Gapped = self.Gapped", "self.Gapped.Ungapped = self.Base", "self.Gapped.Degen = self.DegenGapped", "self.Gapped.NonDegen = self.Gapped", "self.DegenGapped.Gapped = self.DegenGapped", "self.DegenGapped.Ungapped = self.Degen", "self.DegenGapped.Degen = self.DegenGapped", "self.DegenGapped.NonDegen = self.Gapped", ] for s in statements: try: exec(s) except AttributeError: pass def __getitem__(self, i): """Allows container to be indexed into, by type of object (e.g. 
Gap).""" return self.__dict__[i] def whichType(self, a): """Returns the type of an alphabet in self, or None if not present.""" return self._types[self._items.index(a)] class AlphabetGroup(CoreObjectGroup): """Container relating gapped, ungapped, degen, and non-degen alphabets.""" def __init__(self, chars, degens, gap=IUPAC_gap, missing=IUPAC_missing, \ MolType=None, constructor=None): """Returns new AlphabetGroup.""" if constructor is None: if max(map(len, chars)) == 1: constructor = CharAlphabet chars = ''.join(chars) degens = ''.join(degens) else: constructor = Alphabet #assume multi-char self.Base = constructor(chars, MolType=MolType) self.Degen = constructor(chars+degens, MolType=MolType) self.Gapped = constructor(chars+gap, gap, MolType=MolType) self.DegenGapped = constructor(chars+gap+degens+missing, gap, \ MolType=MolType) self._items = [self.Base, self.Degen, self.Gapped, self.DegenGapped] self._set_relationships() #set complements if MolType was specified if MolType is not None: comps = MolType.Complements for i in self._items: i._complement_array = _make_complement_array(i, comps) class MolType(object): """MolType: Handles operations that depend on the sequence type (e.g. DNA). The MolType knows how to connect alphabets, sequences, alignments, and so forth, and how to disambiguate ambiguous symbols and perform base pairing (where appropriate). WARNING: Objects passed to a MolType become associated with that MolType, i.e. if you pass ProteinSequence to a new MolType you make up, all ProteinSequences will now be associated with the new MolType. This may not be what you expect. Use preserve_existing_moltypes=True if you don't want to reset the moltype. """ def __init__(self, motifset, Gap=IUPAC_gap, Missing=IUPAC_missing,\ Gaps=None, Sequence=None, Ambiguities=None, label=None, Complements=None, Pairs=None, MWCalculator=None, \ add_lower=False, preserve_existing_moltypes=False, \ make_alphabet_group=False, ModelSeq=None): """Returns a new MolType object.
Note that the parameters are in flux. Currently: motifset: Alphabet or sequence of items in the default alphabet. Does not include degenerates. Gap: default gap symbol Missing: symbol for missing data Gaps: any other symbols that should be treated as gaps (doesn't have to include Gap or Missing; they will be silently added) Sequence: Class for constructing sequences. Ambiguities: dict of char:tuple, doesn't include gaps (these are hard-coded as - and ?, and added later. label: text label, don't know what this is used for. Unnecessary? Complements: dict of symbol:symbol showing how the non-degenerate single characters complement each other. Used for constructing on the fly the complement table, incl. support for mustPair and canPair. Pairs: dict in which keys are pairs of symbols that can pair with each other, values are True (must pair) or False (might pair). Currently, the meaning of GU pairs as 'weak' is conflated with the meaning of degenerate symbol pairs (which might pair with each other but don't necessarily, depending on how the symbol is resolved). This should be refactored. MWCalculator: f(seq) -> molecular weight. add_lower: if True (default: False) adds the lowercase versions of everything into the alphabet. Slated for deletion. preserve_existing_moltypes: if True (default: False), does not set the MolType of the things added in **kwargs to self. make_alphabet_group: if True, makes an AlphabetGroup relating the various alphabets to one another. ModelSeq: sequence type for modeling Note on "Degenerates" versus "Ambiguities": self.Degenerates contains _only_ mappings for degenerate symbols, whereas self.Ambiguities contains mappings for both degenerate and non-degenerate symbols. Sometimes you want one, sometimes the other, so both are provided. 
""" self.Gap = Gap self.Missing = Missing self.Gaps = frozenset([Gap, Missing]) if Gaps: self.Gaps = self.Gaps.union(frozenset(Gaps)) self.label = label #set the sequence constructor if Sequence is None: Sequence = ''.join #safe default string constructor elif not preserve_existing_moltypes: Sequence.MolType = self self.Sequence = Sequence #set the ambiguities ambigs = {self.Missing:tuple(motifset)+(self.Gap,),self.Gap:(self.Gap,)} if Ambiguities: ambigs.update(Ambiguities) for c in motifset: ambigs[c] = (c,) self.Ambiguities = ambigs #set Complements -- must set before we make the alphabet group self.Complements = Complements or {} if make_alphabet_group: #note: must use _original_ ambiguities here self.Alphabets = AlphabetGroup(motifset, Ambiguities, \ MolType=self) self.Alphabet = self.Alphabets.Base else: if isinstance(motifset, Enumeration): self.Alphabet = motifset elif max(len(motif) for motif in motifset) == 1: self.Alphabet = CharAlphabet(motifset, MolType=self) else: self.Alphabet = Alphabet(motifset, MolType=self) #set the other properties self.Degenerates = Ambiguities and Ambiguities.copy() or {} self.Degenerates[self.Missing] = ''.join(motifset)+self.Gap self.Matches = make_matches(motifset, self.Gaps, self.Degenerates) self.Pairs = Pairs and Pairs.copy() or {} self.Pairs.update(make_pairs(Pairs, motifset, self.Gaps, \ self.Degenerates)) self.MWCalculator = MWCalculator #add lowercase characters, if we're doing that if add_lower: self._add_lowercase() #cache various other data that make the calculations faster self._make_all() self._make_comp_table() # a gap can be a true gap char or a degenerate character, typically '?' 
# we therefore want to ensure consistent treatment across the definition # of characters as either gap or degenerate self.GapString = ''.join(self.Gaps) strict_gap = "".join(set(self.GapString) - set(self.Degenerates)) self.stripDegenerate = FunctionWrapper( keep_chars(strict_gap+''.join(self.Alphabet))) self.stripBad = FunctionWrapper(keep_chars(''.join(self.All))) to_keep = set(self.Alphabet) ^ set(self.Degenerates) - set(self.Gaps) self.stripBadAndGaps = FunctionWrapper(keep_chars(''.join(to_keep))) #make inverse degenerates from degenerates #ensure that lowercase versions also exist if appropriate inv_degens = {} for key, val in self.Degenerates.items(): inv_degens[frozenset(val)] = key.upper() if add_lower: inv_degens[frozenset(''.join(val).lower())] = key.lower() for m in self.Alphabet: inv_degens[frozenset(m)] = m if add_lower: inv_degens[frozenset(''.join(m).lower())] = m.lower() for m in self.Gaps: inv_degens[frozenset(m)] = m self.InverseDegenerates = inv_degens #set array type for modeling alphabets try: self.ArrayType = self.Alphabet.ArrayType except AttributeError: self.ArrayType = None #set modeling sequence self.ModelSeq = ModelSeq def __repr__(self): """String representation of MolType. WARNING: This doesn't allow you to reconstruct the object in its present incarnation. """ return 'MolType(%s)' % (self.Alphabet,) def gettype(self): """Returns type, e.g. 'dna', 'rna', 'protein'. Delete?""" return self.label def makeSequence(self, Seq, Name=None, **kwargs): """Returns sequence of correct type. Replace with just self.Sequence?""" return self.Sequence(Seq, Name, **kwargs) def verifySequence(self, seq, gaps_allowed=True, wildcards_allowed=True): """Checks whether sequence is valid on the default alphabet. Has special-case handling for gaps and wild-cards. 
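        A standalone sketch of the fast regex path described here
        (illustrative only; first_invalid_char is a hypothetical helper,
        not part of this module):

```python
import re

def first_invalid_char(seq, alphabet):
    """Return the first character of seq not in alphabet, or None."""
    # Negated character class: anything outside the allowed alphabet.
    nonalpha = re.compile('[^%s]' % re.escape(''.join(alphabet)))
    match = nonalpha.search(seq)
    return match.group() if match else None
```

        This mirrors the fast string path; the per-character loop below is
        the fallback for non-string sequences that the regex cannot handle.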
        This mechanism is probably useful to have in parallel with the
        validation routines that check specifically whether the sequence
        has gaps, degenerate symbols, etc., or that explicitly take an
        alphabet as input.
        """
        alpha = frozenset(self.Ambiguities)
        if gaps_allowed:
            alpha = alpha.union(self.Gaps)
        if wildcards_allowed:
            alpha = alpha.union(self.Missing)
        try:
            nonalpha = re.compile('[^%s]' % re.escape(''.join(alpha)))
            badchar = nonalpha.search(seq)
            if badchar:
                motif = badchar.group()
                raise AlphabetError(motif)
        except TypeError:  #not alphabetic sequence: try slow method
            for motif in seq:
                if motif not in alpha:
                    raise AlphabetError(motif)

    def isAmbiguity(self, querymotif):
        """Return True if querymotif is an ambiguity character in alphabet.

        Arguments:
            - querymotif: the motif being queried."""
        return len(self.Ambiguities[querymotif]) > 1

    def _whatAmbiguity(self, motifs):
        """The code that represents all of 'motifs', and minimal others.

        Does this duplicate DegenerateFromSequence directly?
        """
        most_specific = len(self.Alphabet) + 1
        result = self.Missing
        for (code, motifs2) in self.Ambiguities.items():
            for c in motifs:
                if c not in motifs2:
                    break
            else:
                if len(motifs2) < most_specific:
                    most_specific = len(motifs2)
                    result = code
        return result

    def whatAmbiguity(self, motifs):
        """The code that represents all of 'motifs', and minimal others.

        Does this duplicate DegenerateFromSequence directly?
""" if not hasattr(self, '_reverse_ambiguities'): self._reverse_ambiguities = {} motifs = frozenset(motifs) if motifs not in self._reverse_ambiguities: self._reverse_ambiguities[motifs] = self._whatAmbiguity(motifs) return self._reverse_ambiguities[motifs] def _add_lowercase(self): """Adds lowercase versions of keys and vals to each internal dict.""" for name in ['Alphabet', 'Degenerates', 'Gaps', 'Complements', 'Pairs', 'Matches']: curr = getattr(self, name) #temp hack to get around re-ordering if isinstance(curr, Alphabet): curr = tuple(curr) new = add_lowercase(curr) setattr(self, name, new) def _make_all(self): """Sets self.All, which contains all the symbols self knows about. Note that the value of items in self.All will be the string containing the possibly degenerate set of symbols that the items expand to. """ all = {} for i in self.Alphabet: curr = str(i) all[i] = i for key, val in self.Degenerates.items(): all[key] = val for i in self.Gaps: all[i] = i self.All = all def _make_comp_table(self): """Sets self.ComplementTable, which maps items onto their complements. Note: self.ComplementTable is only set if self.Complements exists. """ if self.Complements: self.ComplementTable = maketrans(''.join(self.Complements.keys()), ''.join(self.Complements.values())) def complement(self, item): """Returns complement of item, using data from self.Complements. Always tries to return same type as item: if item looks like a dict, will return list of keys. """ if not self.Complements: raise TypeError, \ "Tried to complement sequence using alphabet without complements." try: return item.translate(self.ComplementTable) except (AttributeError, TypeError): item = iterable(item) get = self.Complements.get return item.__class__([get(i, i) for i in item]) def rc(self, item): """Returns reverse complement of item w/ data from self.Complements. Always returns same type as input. 
""" comp = list(self.complement(item)) comp.reverse() if isinstance(item, str): return item.__class__(''.join(comp)) else: return item.__class__(comp) def __contains__(self, item): """A MolType contains every character it knows about.""" return item in self.All def __iter__(self): """A MolType iterates only over the characters in its Alphabet..""" return iter(self.Alphabet) def isGap(self, char): """Returns True if char is a gap.""" return char in self.Gaps def isGapped(self, sequence): """Returns True if sequence contains gaps.""" return self.firstGap(sequence) is not None def isDegenerate(self, sequence): """Returns True if sequence contains degenerate characters.""" return self.firstDegenerate(sequence) is not None def isValid(self, sequence): """Returns True if sequence contains no items that are not in self.""" try: return self.firstInvalid(sequence) is None except: return False def isStrict(self, sequence): """Returns True if sequence contains only items in self.Alphabet.""" try: return (len(sequence)==0) or (self.firstNonStrict(sequence) is None) except: return False def isValidOnAlphabet(self, sequence, alphabet=None): """Returns True if sequence contains only items in alphabet. Alphabet can actually be anything that implements __contains__. Defaults to self.Alphabet if not supplied. """ if alphabet is None: alphabet = self.Alphabet return first_index_in_set(sequence, alphabet) is not None def firstNotInAlphabet(self, sequence, alphabet=None): """Returns index of first item not in alphabet, or None. Defaults to self.Alphabet if alphabet not supplied. 
""" if alphabet is None: alphabet = self.Alphabet return first_index_in_set(sequence, alphabet) def firstGap(self, sequence): """Returns the index of the first gap in the sequence, or None.""" gap = self.Gaps for i, s in enumerate(sequence): if s in gap: return i return None def firstDegenerate(self, sequence): """Returns the index of first degenerate symbol in sequence, or None.""" degen = self.Degenerates for i, s in enumerate(sequence): if s in degen: return i return None def firstInvalid(self, sequence): """Returns the index of first invalid symbol in sequence, or None.""" all = self.All for i, s in enumerate(sequence): if not s in all: return i return None def firstNonStrict(self, sequence): """Returns the index of first non-strict symbol in sequence, or None.""" monomers = self.Alphabet for i, s in enumerate(sequence): if not s in monomers: return i return None def disambiguate(self, sequence, method='strip'): """Returns a non-degenerate sequence from a degenerate one. method can be 'strip' (deletes any characters not in monomers or gaps) or 'random'(assigns the possibilities at random, using equal frequencies). 
""" if method == 'strip': try: return sequence.__class__(self.stripDegenerate(sequence)) except: ambi = self.Degenerates def not_ambiguous(x): return not x in ambi return sequence.__class__(filter(not_ambiguous, sequence)) elif method == 'random': degen = self.Degenerates result = [] for i in sequence: if i in degen: result.append(choice(degen[i])) else: result.append(i) if isinstance(sequence, str): return sequence.__class__(''.join(result)) else: return sequence.__class__(result) else: raise NotImplementedError, "Got unknown method %s" % method def degap(self, sequence): """Deletes all gap characters from sequence.""" try: return sequence.__class__(sequence.translate( \ allchars, self.GapString)) except AttributeError: gap = self.Gaps def not_gap(x): return not x in gap return sequence.__class__(filter(not_gap, sequence)) def gapList(self, sequence): """Returns list of indices of all gaps in the sequence, or [].""" gaps = self.Gaps return [i for i, s in enumerate(sequence) if s in gaps] def gapVector(self, sequence): """Returns list of bool indicating gap or non-gap in sequence.""" return map(self.isGap, sequence) def gapMaps(self, sequence): """Returns tuple containing dicts mapping between gapped and ungapped. First element is a dict such that d[ungapped_coord] = gapped_coord. Second element is a dict such that d[gapped_coord] = ungapped_coord. Note that the dicts will be invalid if the sequence changes after the dicts are made. The gaps themselves are not in the dictionary, so use d.get() or test 'if pos in d' to avoid KeyErrors if looking up all elements in a gapped sequence. 
""" ungapped = {} gapped = {} num_gaps = 0 for i, is_gap in enumerate(self.gapVector(sequence)): if is_gap: num_gaps += 1 else: ungapped[i] = i - num_gaps gapped[i - num_gaps] = i return gapped, ungapped def countGaps(self, sequence): """Counts the gaps in the specified sequence.""" gaps = self.Gaps gap_count = 0 for s in sequence: if s in gaps: gap_count += 1 return gap_count def countDegenerate(self, sequence): """Counts the degenerate bases in the specified sequence.""" degen = self.Degenerates degen_count = 0 for s in sequence: if s in degen: degen_count += 1 return degen_count def possibilities(self, sequence): """Counts number of possible sequences matching the sequence. Uses self.Degenerates to decide how many possibilites there are at each position in the sequence. """ degen = self.Degenerates count = 1 for s in sequence: if s in degen: count *= len(degen[s]) return count def MW(self, sequence, method='random', delta=None): """Returns the molecular weight of the sequence. If the sequence is ambiguous, uses method (random or strip) to disambiguate the sequence. if delta is present, uses it instead of the standard weight adjustment. """ if not sequence: return 0 try: return self.MWCalculator(sequence, delta) except KeyError: #assume sequence was ambiguous return self.MWCalculator(self.disambiguate(sequence, method), delta) def canMatch(self, first, second): """Returns True if every pos in 1st could match same pos in 2nd. Truncates at length of shorter sequence. Gaps are only allowed to match other gaps. """ m = self.Matches for pair in zip(first, second): if pair not in m: return False return True def canMismatch(self, first, second): """Returns True if any position in 1st could cause a mismatch with 2nd. Truncates at length of shorter sequence. Gaps are always counted as matches. 
""" m = self.Matches if not first or not second: return False for pair in zip(first, second): if not m.get(pair, None): return True return False def mustMatch(self, first, second): """Returns True if all positions in 1st must match positions in second.""" return not self.canMismatch(first, second) def canPair(self, first, second): """Returns True if first and second could pair. Pairing occurs in reverse order, i.e. last position of second with first position of first, etc. Truncates at length of shorter sequence. Gaps are only allowed to pair with other gaps, and are counted as 'weak' (same category as GU and degenerate pairs). NOTE: second must be able to be reverse """ p = self.Pairs sec = list(second) sec.reverse() for pair in zip(first, sec): if pair not in p: return False return True def canMispair(self, first, second): """Returns True if any position in 1st could mispair with 2nd. Pairing occurs in reverse order, i.e. last position of second with first position of first, etc. Truncates at length of shorter sequence. Gaps are always counted as possible mispairs, as are weak pairs like GU. """ p = self.Pairs if not first or not second: return False sec = list(second) sec.reverse() for pair in zip(first, sec): if not p.get(pair, None): return True return False def mustPair(self, first, second): """Returns True if all positions in 1st must pair with second. Pairing occurs in reverse order, i.e. last position of second with first position of first, etc. """ return not self.canMispair(first, second) def degenerateFromSequence(self, sequence): """Returns least degenerate symbol corresponding to chars in sequence. First tries to look up in self.InverseDegenerates. Then disambiguates and tries to look up in self.InverseDegenerates. Then tries converting the case (tries uppercase before lowercase). Raises TypeError if conversion fails. 
""" symbols = frozenset(sequence) #check if symbols are already known inv_degens = self.InverseDegenerates result = inv_degens.get(symbols, None) if result: return result #then, try converting the symbols degens = self.All converted = set() for sym in symbols: for char in degens[sym]: converted.add(char) symbols = frozenset(converted) result = inv_degens.get(symbols, None) if result: return result #then, try converting case symbols = frozenset([s.upper() for s in symbols]) result = inv_degens.get(symbols, None) if result: return result symbols = frozenset([s.lower() for s in symbols]) result = inv_degens.get(symbols, None) if result: return result #finally, try to find the minimal subset containing the symbols symbols = frozenset([s.upper() for s in symbols]) lengths = {} for i in inv_degens: if symbols.issubset(i): lengths[len(i)] = i if lengths: #found at least some matches sorted = lengths.keys() sorted.sort() return inv_degens[lengths[sorted[0]]] #if we got here, nothing worked raise TypeError, "Cannot find degenerate char for symbols: %s" \ % symbols ASCII = MolType( # A default type for text read from a file etc. when we don't # want to prematurely assume DNA or Protein. 
Sequence = DefaultSequence, motifset = string.letters, Ambiguities = {}, label = 'text', ModelSeq = ModelSequence, ) DNA = MolType( Sequence = DnaSequence, motifset = IUPAC_DNA_chars, Ambiguities = IUPAC_DNA_ambiguities, label = "dna", MWCalculator = DnaMW, Complements = IUPAC_DNA_ambiguities_complements, Pairs = DnaStandardPairs, make_alphabet_group=True, ModelSeq = ModelDnaSequence, ) RNA = MolType( Sequence = RnaSequence, motifset = IUPAC_RNA_chars, Ambiguities = IUPAC_RNA_ambiguities, label = "rna", MWCalculator = RnaMW, Complements = IUPAC_RNA_ambiguities_complements, Pairs = RnaStandardPairs, make_alphabet_group=True, ModelSeq = ModelRnaSequence, ) PROTEIN = MolType( Sequence = ProteinSequence, motifset = IUPAC_PROTEIN_chars, Ambiguities = IUPAC_PROTEIN_ambiguities, MWCalculator = ProteinMW, make_alphabet_group=True, ModelSeq = ModelProteinSequence, label = "protein") PROTEIN_WITH_STOP = MolType( Sequence = ProteinWithStopSequence, motifset = PROTEIN_WITH_STOP_chars, Ambiguities = PROTEIN_WITH_STOP_ambiguities, MWCalculator = ProteinMW, make_alphabet_group=True, ModelSeq = ModelProteinWithStopSequence, label = "protein_with_stop") BYTES = MolType( # A default type for arbitrary chars read from a file etc. when we don't # want to prematurely assume _anything_ about the data. 
Sequence = ByteSequence, motifset = map(chr, range(256)), Ambiguities = {}, ModelSeq = ModelSequence, label = 'bytes') #following is a two-state MolType useful for testing AB = MolType( Sequence = ABSequence, motifset = 'ab', Ambiguities={}, ModelSeq = ModelSequence, label='ab') class _CodonAlphabet(Alphabet): """Codon alphabets are DNA TupleAlphabets with a genetic code attribute and some codon-specific methods""" def _with(self, motifs): a = Alphabet._with(self, motifs) a.__class__ = type(self) a._gc = self._gc return a def isCodingCodon(self, codon): return not self._gc.isStop(codon) def isStopCodon(self, codon): return self._gc.isStop(codon) def getGeneticCode(self): return self._gc def CodonAlphabet(gc=DEFAULT_GENETIC_CODE, include_stop_codons=False): if isinstance(gc, (int, basestring)): gc = GeneticCodes[gc] if include_stop_codons: motifset = list(gc.Codons) else: motifset = list(gc.SenseCodons) motifset = [codon.upper().replace('U', 'T') for codon in motifset] a = _CodonAlphabet(motifset, MolType=DNA) a._gc = gc return a def _method_codon_alphabet(ignore, *args, **kwargs): """If CodonAlphabet is set as a property, it gets self as extra 1st arg.""" return CodonAlphabet(*args, **kwargs) STANDARD_CODON = CodonAlphabet() #Modify NucleicAcidSequence to avoid circular import NucleicAcidSequence.CodonAlphabet = _method_codon_alphabet NucleicAcidSequence.PROTEIN = PROTEIN ModelRnaSequence.MolType = RNA ModelRnaSequence.Alphabet = RNA.Alphabets.DegenGapped ModelDnaSequence.MolType = DNA ModelDnaSequence.Alphabet = DNA.Alphabets.DegenGapped ModelProteinSequence.MolType = PROTEIN ModelProteinSequence.Alphabet = PROTEIN.Alphabets.DegenGapped ModelProteinWithStopSequence.MolType = PROTEIN_WITH_STOP ModelProteinWithStopSequence.Alphabet = PROTEIN_WITH_STOP.Alphabets.DegenGapped ModelSequence.Alphabet = BYTES.Alphabet DenseAlignment.Alphabet = BYTES.Alphabet DenseAlignment.MolType = BYTES ModelDnaCodonSequence.Alphabet = DNA.Alphabets.Base.Triples 
ModelRnaCodonSequence.Alphabet = RNA.Alphabets.Base.Triples

#Modify Alignment to avoid circular import
Alignment.MolType = ASCII
SequenceCollection.MolType = BYTES
PyCogent-1.5.3/cogent/core/profile.py
#!/usr/bin/env python
"""Provides Profile and ProfileError object and CharMeaningProfile

Owner: Sandra Smit (Sandra Smit)
"""
from __future__ import division
#SUPPORT2425
#from __future__ import with_statement
from string import maketrans, translate
from numpy import array, sum, transpose, reshape, ones, zeros,\
    take, float64, ravel, nonzero, log, put, concatenate, argmax, cumsum,\
    sort, argsort, searchsorted, logical_and, asarray, uint8, add, subtract,\
    multiply, divide, newaxis, alltrue, max, all, isfinite
#from numpy.oldnumeric import sum
from numpy.random import random
from cogent.util.array import euclidean_distance, row_degeneracy,\
    column_degeneracy, row_uncertainty, column_uncertainty, safe_log
from cogent.format.table import formattedCells
##SUPPORT2425
import numpy
#from cogent.util.unit_test import numpy_err

__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Gavin Huttley", "Rob Knight", "Peter Maxwell"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"

class ProfileError(Exception):
    """Error raised for exceptions occurring in the Profile object"""
    pass

class Profile(object):
    """Profile class
    """

    def __init__(self, Data, Alphabet, CharOrder=None):
        """Initializes a new Profile object.

        Data: numpy 2D array with the Profile data in it. Specifically, each
        row of the array corresponds to a position in the Alignment. Each
        column of the array corresponds to a character in the Alphabet.
        Alphabet: an Alphabet object or anything that can act as a list of
        characters

        CharOrder: optional list of characters to which the columns in the
        Data correspond.
        """
        self.Data = Data
        self.Alphabet = Alphabet
        if CharOrder is None:
            self.CharOrder = list(self.Alphabet)
        else:
            self.CharOrder = CharOrder
        #the translation table is needed for making consensus sequences,
        #but will fail if the alphabet isn't made of chars (in which case,
        #we'll just skip the translation table, and certain downstream
        #operations may fail).
        try:
            self._translation_table = self._make_translation_table()
        except:
            pass

    def __str__(self):
        """Returns string representation of self.Data"""
        return str(self.Data)

    def _make_translation_table(self):
        """Makes a translation table between the CharOrder and indices"""
        indices = ''.join(map(chr, range(len(self.CharOrder))))
        chars = ''.join(map(str, self.CharOrder))
        return maketrans(chars, indices)

    def hasValidData(self, err=1e-16):
        """Returns True if all rows in self.Data add up to one

        err -- float, maximum deviation from 1 allowed, default is 1e-16

        Rounding errors might occur, so a small deviation from 1 is allowed.
        The default tolerance is 1e-16.
        """
        obs_sums = sum(self.Data, 1)
        lower_bound = ones(len(self.Data)) - err
        upper_bound = ones(len(self.Data)) + err
        if (lower_bound <= obs_sums).all() and (obs_sums <= upper_bound).all():
            return True
        return False

    def hasValidAttributes(self):
        """Checks Alphabet, CharOrder, and size of self.Data"""
        if not reduce(logical_and, [c in self.Alphabet\
            for c in self.CharOrder]):
            return False
        elif self.Data.shape[1] != len(self.CharOrder):
            return False
        return True

    def isValid(self):
        """Check whether everything in the Profile is valid"""
        vd = self.hasValidData()
        va = self.hasValidAttributes()
        return vd and va

    def dataAt(self, pos, character=None):
        """Return data for a certain position (row!)
        and character (column)

        pos -- int, position (row) in the profile
        character -- str, character from the CharacterOrder

        If character is None, all data for the position is returned.
        """
        if not 0 <= pos < len(self.Data):
            raise ProfileError(\
                "Position %s is not present in the profile"%(pos))
        if character is None:
            return self.Data[pos,:]
        else:
            if character not in self.CharOrder:
                raise ProfileError(\
                "Character %s is not present in the profile's CharacterOrder"\
                %(character))
            return self.Data[pos, self.CharOrder.index(character)]

    def copy(self):
        """Returns a copy of the Profile object

        WARNING: the data, alphabet and the character order are the same
        object in the original and the copy. This means you can rebind the
        attributes, but modifying them will change them in both the original
        and the copy.
        """
        return self.__class__(self.Data, self.Alphabet, self.CharOrder)

    def normalizePositions(self):
        """Normalizes the data by position (the rows!) to one

        It does not make sense to normalize anything with negative numbers
        in there. However, the method does NOT check for that, because it
        would slow down the calculations too much. It will work, but you
        might get very unexpected results.

        The method will raise an error when one or more rows add up to zero.
        It checks explicitly for that to avoid OverflowErrors,
        ZeroDivisionErrors, and infinities in the results.

        WARNING: this method works in place with respect to the Profile
        object, not with respect to the Data attribute. Normalization
        rebinds self.Data to a new array.
        """
        row_sums = sum(self.Data, 1)
        if (row_sums == 0).any():
            zero_indices = nonzero(row_sums==0)[0].tolist()
            raise ProfileError,\
                "Can't normalize profile, rows at indices %s add up to zero"\
                %(zero_indices)
        else:
            self.Data = self.Data/row_sums[:,newaxis]

    def normalizeSequences(self):
        """Normalizes the data by sequences (the columns) to one

        It does not make sense to normalize anything with negative numbers
        in there.
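        A minimal numpy sketch of the two normalizations (by position in
        normalizePositions above, by sequence here), showing why only the
        row case needs newaxis for broadcasting:

```python
import numpy as np

data = np.array([[1., 1.],
                 [3., 1.]])
# By position (rows): divide each row by its sum; the column-vector shape
# (n, 1) is required so broadcasting lines up along rows.
by_position = data / data.sum(axis=1)[:, np.newaxis]
# By sequence (columns): divide each column by its sum; broadcasting
# already lines up along the last axis, so no newaxis is needed.
by_sequence = data / data.sum(axis=0)
```

        A zero row or column sum would divide by zero here, which is exactly
        the case these methods reject with a ProfileError.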
        However, the method does NOT check for that, because it would slow
        down the calculations too much. It will work, but you might get very
        unexpected results.

        The method will raise an error when one or more columns add up to
        zero. It checks explicitly for that to avoid OverflowErrors,
        ZeroDivisionErrors, and infinities in the results.

        WARNING: this method works in place with respect to the Profile
        object, not with respect to the Data attribute. Normalization
        rebinds self.Data to a new array.
        """
        col_sums = sum(self.Data, axis=0)
        if (col_sums == 0).any():
            zero_indices = nonzero(col_sums==0)[0].tolist()
            raise ProfileError,\
                "Can't normalize profile, columns at indices %s add up to zero"\
                %(zero_indices)
        else:
            self.Data = self.Data/col_sums

    def prettyPrint(self, include_header=False, transpose_data=False,\
        column_limit=None, col_sep='\t'):
        """Returns a string representation of the data and character order.

        include_header: whether to include the character order
        transpose_data: data as is (rows are positions) or transposed (rows
        are characters) to line it up with an alignment
        column_limit = int, maximum number of columns displayed
        col_sep = string, column separator
        """
        h = self.CharOrder
        d = self.Data
        if column_limit is None:
            max_col_idx = d.shape[1]
        else:
            max_col_idx = column_limit
        if include_header and not transpose_data:
            r = [h]+d.tolist()
        elif include_header and transpose_data:
            r = [[x] + y for x,y in zip(h,transpose(d).tolist())]
        elif transpose_data:
            r = transpose(d).tolist()
        else:
            r = d.tolist()
        # resize the result based on the column limit
        if column_limit is not None:
            r = [row[:column_limit] for row in r]
        # nicely format the table content, discard the header (already included)
        if r:
            new_header, formatted_res = formattedCells(r)
        else:
            formatted_res = r
        return '\n'.join([col_sep.join(map(str,i)) for i in formatted_res])

    def reduce(self,other,op=add,normalize_input=True,normalize_output=True):
        """Reduces two profiles with some operator and returns a new Profile

        other: Profile object
        op:
        operator (e.g. add, subtract, multiply, divide)
        normalize_input: whether the input profiles will be normalized
        before collapsing. The default is True.
        normalize_output: whether the output profile will be normalized.
        The default is True.

        This function is intended for use on normalized profiles. For safety
        it'll try to normalize the data before collapsing them. If you do
        not normalize your data and set normalize_input to False, you might
        get unexpected results.

        It does check whether self.Data and other.Data have the same shape.
        It does not check whether self and other have the same CharOrder.
        The resulting Profile gets the alphabet and char order from self.
        """
        if self.Data.shape != other.Data.shape:
            raise ProfileError,\
                "Cannot collapse profiles of different size: %s, %s"\
                %(self.Data.shape,other.Data.shape)
        if normalize_input:
            self.normalizePositions()
            other.normalizePositions()
        try:
            ##SUPPORT2425
            ori_err = numpy.geterr()
            numpy.seterr(divide='raise')
            try:
                new_data = op(self.Data, other.Data)
            finally:
                numpy.seterr(**ori_err)
            #with numpy_err(divide='raise'):
                #new_data = op(self.Data, other.Data)
        except (OverflowError, ZeroDivisionError, FloatingPointError):
            raise ProfileError, "Can't do operation on input profiles"
        result = Profile(new_data, self.Alphabet, self.CharOrder)
        if normalize_output:
            result.normalizePositions()
        return result

    def __add__(self,other):
        """Binary + operator: adds two profiles element-wise.

        Input and output are NOT normalized.
        """
        return self.reduce(other, op=add, normalize_input=False,\
            normalize_output=False)

    def __sub__(self,other):
        """Binary - operator: subtracts two profiles element-wise

        Input and output are NOT normalized.
        """
        return self.reduce(other, op=subtract, normalize_input=False,\
            normalize_output=False)

    def __mul__(self,other):
        """* operator: multiplies two profiles element-wise

        Input and output are NOT normalized.
""" return self.reduce(other, op=multiply, normalize_input=False,\ normalize_output=False) def __div__(self,other): """/ operator for old-style division: divides 2 profiles element-wise. Used when __future__.divsion not imported Input and output are NOT normalized. """ return self.reduce(other, op=divide, normalize_input=False,\ normalize_output=False) def __truediv__(self,other): """/ operator for new-style division: divides 2 profiles element-wise. Used when __future__.division is in action. Input and output are NOT normalized. """ return self.reduce(other, op=divide, normalize_input=False,\ normalize_output=False) def distance(self, other, method=euclidean_distance): """Returns the distance between two profiles other: Profile object method: function used to calculated the distance between two arrays. WARNING: In principle works only on profiles of the same size. However, when one of the two profiles is 1D (which shouldn't happen) and can be aligned with the other profile the distance is still calculated and may give unexpected results. """ try: return method(self.Data, other.Data) except ValueError: #frames not aligned raise ProfileError,\ "Profiles have different size (and are not aligned): %s %s"\ %(self.Data.shape,other.Data.shape) def toOddsMatrix(self, symbol_freqs=None): """Returns the OddsMatrix of a profile as a new Profile. symbol_freqs: per character array of background frequencies e.g. [.25,.25,.25,.25] for equal frequencies for each of the four bases. If no symbol frequencies are provided, all symbols will get equal freqs. The length of symbol freqs should match the number of columns in the profile! If symbol freqs contains a zero entry, a ProfileError is raised. This is done to prevent either a ZeroDivisionError (raised when zero is an int) or 'inf' in the resulting matrix (which happens when zero is a float). 
""" pl = self.Data.shape[1] #profile length #if symbol_freqs is None, create an array with equal frequencies if symbol_freqs is None: symbol_freqs = ones(pl)/pl else: symbol_freqs = array(symbol_freqs) #raise error when symbol_freqs has wrong length if len(symbol_freqs) != pl: raise ProfileError,\ "Length of symbol freqs should be %s, but is %s"\ %(pl,len(symbol_freqs)) #raise error when symbol freqs contains zero (to prevent #ZeroDivisionError or 'inf' in the resulting matrix) if sum(symbol_freqs != 0, 0) != len(symbol_freqs): raise ProfileError,\ "Symbol frequency is not allowed to be zero: %s"\ %(symbol_freqs) #calculate the OddsMatrix log_odds = self.Data/symbol_freqs return Profile(log_odds, self.Alphabet, self.CharOrder) def toLogOddsMatrix(self, symbol_freqs=None): """Returns the LogOddsMatrix of a profile as a new Profile/ symbol_freqs: per character array of background frequencies e.g. [.25,.25,.25,.25] for equal frequencies for each of the four bases. See toOddsMatrix for more information. """ odds = self.toOddsMatrix(symbol_freqs) log_odds = safe_log(odds.Data) return Profile(log_odds, self.Alphabet, self.CharOrder) def _score_indices(self, seq_indices, offset=0): """Returns score of the profile for each slice of the seq_indices seq_indices: translation of sequence into indices that match the characters in the CharOrder of the profile offset: where to start the matching procedure This function doesn't do any input validation. That is done in 'score' See method 'score' for more information. """ data = self.Data pl = len(data) #profile length (number of positions) sl = len(seq_indices) r = range(pl) #fixed range result = [] for starting_pos in range(offset, len(seq_indices)-pl+1): slice = seq_indices[starting_pos:starting_pos+pl] result.append(sum(array([data[i] for i in zip(r,slice)]), axis=0)) return array(result) def _score_profile(self, profile, offset=0): """Returns score of the profile against the input_profile. 
profile: Profile of a sequence or alignment that has to be scored offset: where to start the matching procedure This function doesn't do any input validation. That is done in 'score' See method 'score' for more information. """ data = self.Data self_l = len(data) #profile length other_l = len(profile.Data) #other profile length result = [] for start in range(offset,other_l-self_l+1): stop = start + self_l slice = profile.Data[start:stop,:] result.append(sum(self.Data*slice)) return array(result) def score(self, input_data, offset=0): """Returns a score of the profile against input_data (Profile or Seq). seq: Profile or Sequence object (or string) offset: starting index for searching in seq/profile Returns the score of the profile against all possible subsequences/ subprofiles of the input_data. This method determines how well a profile fits at different places in the sequence. This is very useful when the profile is a motif and you want to find the position in the sequence that best matches the profile/motif. 
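        The worked examples below can be reproduced with a minimal
        standalone scorer (a plain numpy sketch, not this class's
        implementation; sliding_scores is a hypothetical helper):

```python
import numpy as np

# Same profile as in the docstring examples; columns follow 'TCAG'.
profile = np.array([[.2, .4, .4, .0],
                    [.1, .0, .9, .0],
                    [.1, .2, .3, .4]])
char_order = 'TCAG'

def sliding_scores(seq, profile, char_order):
    """Score the profile at every offset where it fully fits in seq."""
    idx = [char_order.index(c) for c in seq]
    pl = len(profile)
    # At each start, add up profile[row, char] down the window.
    return [sum(profile[row, idx[start + row]] for row in range(pl))
            for start in range(len(idx) - pl + 1)]
```

        sliding_scores('TCAAGT', profile, char_order) reproduces the
        0.5, 1.6, 1.7, 0.5 scores of the Sequence Example.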
Sequence Example: ================= T C A G 0 .2 .4 .4 0 1 .1 0 .9 0 2 .1 .2 .3 .4 Sequence: TCAAGT pos 0: TCA -> 0.5 pos 1: CAA -> 1.6 pos 2: AAG -> 1.7 pos 3: AGT -> 0.5 So the subsequence starting at index 2 in the sequence has the best match with the motif Profile Example: ================ Profile: same as above Profile to score: T C A G 0 1 0 0 0 1 0 1 0 0 2 0 0 .5 .5 3 0 0 0 1 4 .25 .25 .25 .25 pos 0: rows 0,1,2 -> 0.55 pos 1: rows 1,2,3 -> 1.25 pos 2: rows 2,3,4 -> 0.45 """ #set up some local variables data = self.Data pl = len(data) #profile length is_profile = False #raise error if profile is empty if not data.any(): raise ProfileError,"Can't score an empty profile" #figure out what the input_data type is if isinstance(input_data,Profile): is_profile = True to_score_length = len(input_data.Data) #raise error if CharOrders don't match if self.CharOrder != input_data.CharOrder: raise ProfileError, "Profiles must have same character order" else: #assumes it get a sequence to_score_length = len(input_data) #Profile should fit at least once in the sequence/profile_to_score if to_score_length < pl: raise ProfileError,\ "Sequence or Profile to score should be at least %s "%(pl)+\ "characters long, but is %s."%(to_score_length) #offset should be valid if not offset <= (to_score_length - pl): raise ProfileError, "Offset must be <= %s, but is %s"\ %((to_score_length-pl), offset) #call the apropriate scoring function if is_profile: return self._score_profile(input_data, offset) else: #translate seq to indices if hasattr(self, '_translation_table'): seq_indices = array(map(ord,translate(str(input_data),\ self._translation_table))) else: #need to figure out where each item is in the charorder idx = self.CharOrder.index seq_indices = array(map(idx, input_data)) #raise error if some sequence characters are not in the CharOrder if (seq_indices > len(self.CharOrder)).any(): raise ProfileError,\ "Sequence contains characters that are not in the "+\ "CharOrder" #now the 
profile is scored against the list of indices return self._score_indices(seq_indices,offset) def rowUncertainty(self): """Returns the uncertainty (Shannon's entropy) for each row in profile Entropy is returned in BITS (not in NATS). """ if not self.Data.any(): return array([]) try: return row_uncertainty(self.Data) except ValueError: raise ProfileError,\ "Profile has to be two dimensional to calculate rowUncertainty" def columnUncertainty(self): """Returns uncertainty (Shannon's entropy) for each column in profile Uncertainty is returned in BITS (not in NATS). """ if not self.Data.any(): return array([]) try: return column_uncertainty(self.Data) except ValueError: raise ProfileError,\ "Profile has to be two dimensional to calculate columnUncertainty" def rowDegeneracy(self, cutoff=0.5): """Returns how many chars are needed to cover the cutoff value. cutoff: value that should be covered in each row For example: pos 0: [.1,.2,.3,.4] char order=TCAG. If cutoff=0.75 -> degeneracy = 3 (degenerate char for CAG) If cutoff=0.25 -> degeneracy = 1 (G alone covers this cutoff) If cutoff=0.5 -> degeneracy = 2 (degenerate char for AG) If the cutoff value is not reached in the row, the returned value will be clipped to the length of the character order (=the number of columns in the Profile). """ try: return row_degeneracy(self.Data,cutoff) except ValueError: raise ProfileError,\ "Profile has to be two dimensional to calculate rowDegeneracy" def columnDegeneracy(self, cutoff=0.5): """Returns how many chars are needed to cover the cutoff value See rowDegeneracy for more information.
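A minimal sketch of the row-degeneracy idea (illustrative only; the real work is done by helper functions such as row_degeneracy): sort a row's frequencies in descending order and count how many are needed before their running sum reaches the cutoff.

```python
def degeneracy(row, cutoff=0.5):
    """Number of most-frequent characters needed to cover `cutoff`.
    Clipped to len(row) when the cutoff is never reached."""
    total = 0.0
    for n, freq in enumerate(sorted(row, reverse=True), start=1):
        total += freq
        if total >= cutoff:
            return n
    return len(row)

# The docstring example: row [.1, .2, .3, .4] with char order TCAG
print(degeneracy([.1, .2, .3, .4], 0.75))  # -> 3
```
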
""" try: return column_degeneracy(self.Data,cutoff) except ValueError: raise ProfileError,\ "Profile has to be two dimensional to calculate columnDegeneracy" def rowMax(self): """Returns ara containing most frequent element in each row of the profile.""" return max(self.Data, 1) def toConsensus(self, cutoff=None, fully_degenerate=False,\ include_all=False): """Returns the consensus sequence from a profile. cutoff: cutoff value, determines how much should be covered in a position (row) of the profile. Example: pos 0 [.2,.1,.3,.4] (CharOrder: TCAG). To cover .65 (=cutoff) we need two characters: A and G, which results in the degenerate character R. fully_degenerate: determines whether the fully degenerate character is returned at a position. For the example above an 'N' would be returned. inlcude_all: all possibilities are included in the degenerate character. Example: row = UCAG = [.1,.3,.3,.3] cutoff = .4, consensus = 'V' (even though only 2 chars would be enough to reach the cutoff value). The Alphabet of the Profile should implement degenerateFromSequence. Note that cutoff has priority over fully_degenerate. In other words, if you specify a cutoff value and set fully_degenerate to true, the calculation will be done with the cutoff value. If nothing gets passed in, the maximum argument is chosen. In the first example above G will be returned. """ #set up some local variables co = array(self.CharOrder, 'c') alpha = self.Alphabet data = self.Data #determine the action. 
Cutoff takes priority over fully_degenerate if cutoff: result = [] degen = self.rowDegeneracy(cutoff) sorted = argsort(data) if include_all: #if include_all include all possiblilities in the degen char for row_idx, (num_to_keep, row) in enumerate(zip(degen,sorted)): to_take = [item for item in row[-num_to_keep:]\ if item in nonzero(data[row_idx])[0]] +\ [item for item in nonzero(data[row_idx] ==\ data[row_idx,row[-num_to_keep]])[0] if item in\ nonzero(data[row_idx])[0]] result.append(alpha.degenerateFromSequence(\ map(str,take(co, to_take, axis=0)))) else: for row_idx, (num_to_keep, row) in enumerate(zip(degen,sorted)): result.append(alpha.degenerateFromSequence(\ map(str,take(co, [item for item in row[-num_to_keep:]\ if item in nonzero(data[row_idx])[0]])))) elif not fully_degenerate: result = take(co, argmax(self.Data, axis=-1), axis=0) else: result = [] for row in self.Data: result.append(alpha.degenerateFromSequence(\ map(str,take(co, nonzero(row)[0], axis=0)))) return ''.join(map(str,result)) def randomIndices(self, force_accumulate=False, random_f = random): """Returns random indices matching current probability matrix. Stores cumulative sum (sort of) of probability matrix in self._accumulated; Use force_accumulate to reset if you change the matrix in place (which you shouldn't do anyway). The returned indices correspond to the characters in the CharOrder of the Profile. """ if force_accumulate or not hasattr(self, '_accumulated'): self._accumulated = cumsum(self.Data, 1) choices = random_f(len(self.Data)) return array([searchsorted(v, c) for v, c in\ zip(self._accumulated, choices)]) def randomSequence(self, force_accumulate=False, random_f = random): """Returns random sequence matching current probability matrix. Stores cumulative sum (sort of) of probability matrix in self._accumulated; Use force_accumulate to reset if you change the matrix in place (which you shouldn't do anyway). 
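The cumulative-sum trick described above can be sketched stand-alone (illustrative, with a seeded generator so the draws are repeatable): build a per-row CDF once, then map one uniform draw per row onto it with searchsorted.

```python
import numpy as np

# Sketch of sampling indices from a row-stochastic profile.
probs = np.array([[0.2, 0.4, 0.4, 0.0],
                  [0.1, 0.0, 0.9, 0.0]])
accumulated = np.cumsum(probs, axis=1)      # per-row cumulative sums (CDFs)

rng = np.random.RandomState(42)             # seeded for repeatability
choices = rng.random_sample(len(probs))     # one uniform draw per row
indices = np.array([np.searchsorted(cdf, u)
                    for cdf, u in zip(accumulated, choices)])

# every drawn index points at a character with nonzero probability
assert all(probs[row, i] > 0 for row, i in enumerate(indices))
```
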
""" co = self.CharOrder random_indices = self.randomIndices(force_accumulate,random_f) return ''.join(map(str,take(co,random_indices))) def CharMeaningProfile(alphabet, char_order=None, split_degenerates=False): """Returns a Profile with the meaning of each character in the alphabet alphabet: Alphabet object (should have 'Degenerates'if split_degenerates is set to True) char_order: string indicating the order of the characters in the profile split_degenerates: whether the meaning of degenerate symbols in the alphabet should be split up among the characters in the char order, or ignored. The returned profile has 255 rows (one for each ascii character) and one column for each character in the character order. The profile specifies the meaning of each character in the alphabet. Chars in the character order will count as a full character by themselves, degenerate characters might split their 'value' over several other charcters in the character order. Splitting up degenerates: only degenerate characters of which the full set of symbols it maps onto are in the character order are split up, others are ignored. E.g. in the DnaAlphabet, if the char order is TACG, ? (which maps to TCAG-) wouldn't be split up, 'R' (which maps to 'AG') would. Any degenerate characters IN the character order will NOT be split up. It doesn't make sense to split up a character that is in the char order because it would create an empty column in the profile, so it might as well be left out alltogether. Example 1: Alphabet = DnaAlphabet Character order = "TCAG" Split degenerates = False All the nonzero rows in the resulting profile are: 65: [0,0,1,0] (A) 67: [0,1,0,0] (C) 71: [0,0,0,1] (G) 84: [1,0,0,0] (T) All other rows will be [0,0,0,0]. Example 2: Alphabet = DnaAlphabet Character order = "AGN" Split degenerates = True All the nonzero rows in the resulting profile are: 65: [1,0,0] (A) 71: [0,1,0] (G) 78: [0,0,1] (N) 82: [.5,.5,0] (R) All other rows will be [0,0,0]. 
Errors are raised when the character order is empty or when there's a character in the character order that is not in the alphabet. """ if not char_order: #both testing for None and for empty string char_order = list(alphabet) char_order = array(char_order, 'c') lc = len(char_order) #length char_order #initialize the profile. 255 rows (one for each ascii char), one column #for each character in the character order result = zeros([255,lc],float64) if split_degenerates: degen = alphabet.Degenerates for degen_char in degen: #if all characters that the degenerate character maps onto are #in the character order, split its value up according to the #alphabet curr_degens = degen[degen_char] if all(map(char_order.__contains__, curr_degens)): contains = map(curr_degens.__contains__, char_order) result[ord(degen_char)] = \ array(contains, float)/len(curr_degens) #for each character in the character order, make an entry of ones and #zeros, matching the character order for c in char_order: c = str(c) if c not in alphabet: raise ValueError, "Found character in the character order "+\ "that is not in the specified alphabet: %s"%(c) result[ord(c)] = array(c*lc, 'c') == char_order return Profile(Data=result,Alphabet=alphabet,CharOrder=char_order) PyCogent-1.5.3/cogent/core/sequence.py #!/usr/bin/env python """Contains classes that represent biological sequence data. These provide generic biological sequence manipulation functions, plus functions that are critical for the EVOLVE calculations. WARNING: Do not import sequence classes directly! It is expected that you will access them through the moltype module. Sequence classes depend on information from the MolType that is _only_ available after MolType has been imported. Sequences are intended to be immutable. This is not enforced by the code for performance reasons, but don't alter the MolType or the sequence data after creation.
""" from __future__ import division from annotation import Map, Feature, _Annotatable from cogent.util.transform import keep_chars, for_seq, per_shortest, \ per_longest from cogent.util.misc import DistanceFromMatrix from cogent.core.genetic_code import DEFAULT as DEFAULT_GENETIC_CODE, \ GeneticCodes from cogent.parse import gff from cogent.format.fasta import fasta_from_sequences from cogent.core.info import Info as InfoClass from numpy import array, zeros, put, nonzero, take, ravel, compress, \ logical_or, logical_not, arange from numpy.random import permutation from operator import eq, ne from random import shuffle import re import warnings __author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley", "Matthew Wakefield", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" ARRAY_TYPE = type(array(1)) #standard distance functions: left because generally useful frac_same = for_seq(f=eq, aggregator=sum, normalizer=per_shortest) frac_diff = for_seq(f=ne, aggregator=sum, normalizer=per_shortest) class SequenceI(object): """Abstract class containing Sequence interface. Specifies methods that Sequence delegates to its MolType, and methods for detecting gaps. """ #String methods delegated to self._seq -- remember to override if self._seq #isn't a string in your base class, but it's probably better to make #self._seq a property that contains the string. 
LineWrap = None #used for formatting FASTA strings def __str__(self): """__str__ returns self._seq unmodified.""" return self._seq def toFasta(self, make_seqlabel=None): """Return string of self in FASTA format, no trailing newline Arguments: - make_seqlabel: callback function that takes the seq object and returns a label str """ return fasta_from_sequences([self], make_seqlabel = make_seqlabel, line_wrap=self.LineWrap) def translate(self, *args, **kwargs): """translate() delegates to self._seq.""" return self._seq.translate(*args, **kwargs) def count(self, item): """count() delegates to self._seq.""" return self._seq.count(item) def __cmp__(self, other): """__cmp__ compares based on the sequence string.""" return cmp(self._seq, other) def __hash__(self): """__hash__ behaves like the sequence string for dict lookup.""" return hash(self._seq) def __contains__(self, other): """__contains__ checks whether other is in the sequence string.""" return other in self._seq def shuffle(self): """returns a randomized copy of the Sequence object""" randomized_copy_list = list(self) shuffle(randomized_copy_list) return self.__class__(''.join(randomized_copy_list), Info=self.Info) def complement(self): """Returns complement of self, using data from MolType. Always tries to return same type as item: if item looks like a dict, will return list of keys. """ return self.__class__(self.MolType.complement(self), Info=self.Info) def stripDegenerate(self): """Removes degenerate bases by stripping them out of the sequence.""" return self.__class__(self.MolType.stripDegenerate(self), Info=self.Info) def stripBad(self): """Removes any symbols not in the alphabet.""" return self.__class__(self.MolType.stripBad(self), Info=self.Info) def stripBadAndGaps(self): """Removes any symbols not in the alphabet, and any gaps.""" return self.__class__(self.MolType.stripBadAndGaps(self), Info=self.Info) def rc(self): """Returns reverse complement of self w/ data from MolType. 
Always returns same type self. """ return self.__class__(self.MolType.rc(self), Info=self.Info) def isGapped(self): """Returns True if sequence contains gaps.""" return self.MolType.isGapped(self) def isGap(self, char=None): """Returns True if char is a gap. If char is not supplied, tests whether self is gaps only. """ if char is None: #no char - so test if self is all gaps return len(self) == self.countGaps() else: return self.MolType.isGap(char) def isDegenerate(self): """Returns True if sequence contains degenerate characters.""" return self.MolType.isDegenerate(self) def isValid(self): """Returns True if sequence contains no items absent from alphabet.""" return self.MolType.isValid(self) def isStrict(self): """Returns True if sequence contains only monomers.""" return self.MolType.isStrict(self) def firstGap(self): """Returns the index of the first gap in the sequence, or None.""" return self.MolType.firstGap(self) def firstDegenerate(self): """Returns the index of first degenerate symbol in sequence, or None.""" return self.MolType.firstDegenerate(self) def firstInvalid(self): """Returns the index of first invalid symbol in sequence, or None.""" return self.MolType.firstInvalid(self) def firstNonStrict(self): """Returns the index of first non-strict symbol in sequence, or None.""" return self.MolType.firstNonStrict(self) def disambiguate(self, method='strip'): """Returns a non-degenerate sequence from a degenerate one. method can be 'strip' (deletes any characters not in monomers or gaps) or 'random'(assigns the possibilities at random, using equal frequencies). 
""" return self.__class__(self.MolType.disambiguate(self, method), \ Info=self.Info) def degap(self): """Deletes all gap characters from sequence.""" return self.__class__(self.MolType.degap(self), Info=self.Info) def gapList(self): """Returns list of indices of all gaps in the sequence, or [].""" return self.MolType.gapList(self) def gapVector(self): """Returns vector of True or False according to which pos are gaps.""" return self.MolType.gapVector(self) def gapMaps(self): """Returns dicts mapping between gapped and ungapped positions.""" return self.MolType.gapMaps(self) def countGaps(self): """Counts the gaps in the specified sequence.""" return self.MolType.countGaps(self) def countDegenerate(self): """Counts the degenerate bases in the specified sequence.""" return self.MolType.countDegenerate(self) def possibilities(self): """Counts number of possible sequences matching the sequence. Uses self.Degenerates to decide how many possibilites there are at each position in the sequence. """ return self.MolType.possibilities(self) def MW(self, method='random', delta=None): """Returns the molecular weight of (one strand of) the sequence. If the sequence is ambiguous, uses method (random or strip) to disambiguate the sequence. If delta is passed in, adds delta per strand (default is None, which uses the alphabet default. Typically, this adds 18 Da for terminal water. However, note that the default nucleic acid weight assumes 5' monophosphate and 3' OH: pass in delta=18.0 if you want 5' OH as well. Note that this method only calculates the MW of the coding strand. If you want the MW of the reverse strand, add self.rc().MW(). DO NOT just multiply the MW by 2: the results may not be accurate due to strand bias, e.g. in mitochondrial genomes. """ return self.MolType.MW(self, method, delta) def canMatch(self, other): """Returns True if every pos in self could match same pos in other. Truncates at length of shorter sequence. Gaps are only allowed to match other gaps. 
""" return self.MolType.canMatch(self, other) def canMismatch(self, other): """Returns True if any position in self could mismatch with other. Truncates at length of shorter sequence. Gaps are always counted as matches. """ return self.MolType.canMismatch(self, other) def mustMatch(self, other): """Returns True if all positions in self must match positions in other.""" return self.MolType.mustMatch(self, other) def canPair(self, other): """Returns True if self and other could pair. Pairing occurs in reverse order, i.e. last position of other with first position of self, etc. Truncates at length of shorter sequence. Gaps are only allowed to pair with other gaps, and are counted as 'weak' (same category as GU and degenerate pairs). NOTE: second must be able to be reverse """ return self.MolType.canPair(self, other) def canMispair(self, other): """Returns True if any position in self could mispair with other. Pairing occurs in reverse order, i.e. last position of other with first position of self, etc. Truncates at length of shorter sequence. Gaps are always counted as possible mispairs, as are weak pairs like GU. """ return self.MolType.canMispair(self, other) def mustPair(self, other): """Returns True if all positions in self must pair with other. Pairing occurs in reverse order, i.e. last position of other with first position of self, etc. """ return not self.MolType.canMispair(self, other) def diff(self, other): """Returns number of differences between self and other. NOTE: truncates at the length of the shorter sequence. Case-sensitive. """ return self.distance(other) def distance(self, other, function=None): """Returns distance between self and other using function(i,j). other must be a sequence. function should be a function that takes two items and returns a number. To turn a 2D matrix into a function, use cogent.util.miscs.DistanceFromMatrix(matrix). NOTE: Truncates at the length of the shorter sequence. 
Note that the function acts on two _elements_ of the sequences, not the two sequences themselves (i.e. the behavior will be the same for every position in the sequences, such as identity scoring or a function derived from a distance matrix as suggested above). One limitation of this approach is that the distance function cannot use properties of the sequences themselves: for example, it cannot use the lengths of the sequences to normalize the scores as percent similarities or percent differences. If you want functions that act on the two sequences themselves, there is no particular advantage in making these functions methods of the first sequences by passing them in as parameters like the function in this method. It makes more sense to use them as standalone functions. The factory function cogent.util.transform.for_seq is useful for converting per-element functions into per-sequence functions, since it takes as parameters a per-element scoring function, a score aggregation function, and a normalization function (which itself takes the two sequences as parameters), returning a single function that combines these functions and that acts on two complete sequences. """ if function is None: #use identity scoring function function = lambda a, b : a != b distance = 0 for first, second in zip(self, other): distance += function(first, second) return distance def matrixDistance(self, other, matrix): """Returns distance between self and other using a score matrix. WARNING: the matrix must explicitly contain scores for the case where a position is the same in self and other (e.g. for a distance matrix, an identity between U and U might have a score of 0). The reason the scores for the 'diagonals' need to be passed explicitly is that for some kinds of distance matrices, e.g. log-odds matrices, the 'diagonal' scores differ from each other. If these elements are missing, this function will raise a KeyError at the first position that the two sequences are identical. 
""" return self.distance(other, DistanceFromMatrix(matrix)) def fracSame(self, other): """Returns fraction of positions where self and other are the same. Truncates at length of shorter sequence. Note that fracSame and fracDiff are both 0 if one sequence is empty. """ return frac_same(self, other) def fracDiff(self, other): """Returns fraction of positions where self and other differ. Truncates at length of shorter sequence. Note that fracSame and fracDiff are both 0 if one sequence is empty. """ return frac_diff(self, other) def fracSameGaps(self, other): """Returns fraction of positions where self and other share gap states. In other words, if self and other are both all gaps, or both all non-gaps, or both have gaps in the same places, fracSameGaps will return 1.0. If self is all gaps and other has no gaps, fracSameGaps will return 0.0. Returns 0 if one sequence is empty. Uses self's gap characters for both sequences. """ if not self or not other: return 0.0 is_gap = self.MolType.Gaps.__contains__ return sum([is_gap(i) == is_gap(j) for i,j in zip(self, other)]) \ /min(len(self),len(other)) def fracDiffGaps(self, other): """Returns frac. of positions where self and other's gap states differ. In other words, if self and other are both all gaps, or both all non-gaps, or both have gaps in the same places, fracDiffGaps will return 0.0. If self is all gaps and other has no gaps, fracDiffGaps will return 1.0. Returns 0 if one sequence is empty. Uses self's gap characters for both sequences. """ if not self or not other: return 0.0 return 1.0 - self.fracSameGaps(other) def fracSameNonGaps(self, other): """Returns fraction of non-gap positions where self matches other. Doesn't count any position where self or other has a gap. Truncates at the length of the shorter sequence. Returns 0 if one sequence is empty. 
""" if not self or not other: return 0.0 is_gap = self.MolType.Gaps.__contains__ count = 0 identities = 0 for i, j in zip(self, other): if is_gap(i) or is_gap(j): continue count += 1 if i == j: identities += 1 if count: return identities/count else: #there were no positions that weren't gaps return 0 def fracDiffNonGaps(self, other): """Returns fraction of non-gap positions where self differs from other. Doesn't count any position where self or other has a gap. Truncates at the length of the shorter sequence. Returns 0 if one sequence is empty. Note that this means that fracDiffNonGaps is _not_ the same as 1 - fracSameNonGaps, since both return 0 if one sequence is empty. """ if not self or not other: return 0.0 is_gap = self.MolType.Gaps.__contains__ count = 0 diffs = 0 for i, j in zip(self, other): if is_gap(i) or is_gap(j): continue count += 1 if i != j: diffs += 1 if count: return diffs/count else: #there were no positions that weren't gaps return 0 def fracSimilar(self, other, similar_pairs): """Returns fraction of positions where self[i] is similar to other[i]. similar_pairs must be a dict such that d[(i,j)] exists if i and j are to be counted as similar. Use PairsFromGroups in cogent.util.misc to construct such a dict from a list of lists of similar residues. Truncates at the length of the shorter sequence. Note: current implementation re-creates the distance function each time, so may be expensive compared to creating the distance function using for_seq separately. Returns 0 if one sequence is empty. 
""" if not self or not other: return 0.0 return for_seq(f = lambda x, y: (x,y) in similar_pairs, \ normalizer=per_shortest)(self, other) def withTerminiUnknown(self): """Returns copy of sequence with terminal gaps remapped as missing.""" gaps = self.gapVector() first_nongap = last_nongap = None for i, state in enumerate(gaps): if not state: if first_nongap is None: first_nongap = i last_nongap = i missing = self.MolType.Missing if first_nongap is None: #sequence was all gaps result = self.__class__([missing for i in len(self)],Info=self.Info) else: prefix = missing*first_nongap mid = str(self[first_nongap:last_nongap+1]) suffix = missing*(len(self)-last_nongap-1) result = self.__class__(prefix + mid + suffix, Info=self.Info) return result class Sequence(_Annotatable, SequenceI): """Holds the standard Sequence object. Immutable.""" MolType = None #connected to ACSII when moltype is imported def __init__(self, Seq='',Name=None, Info=None, check=True, \ preserve_case=False, gaps_allowed=True, wildcards_allowed=True): """Initialize a sequence. 
Arguments: Seq: the raw sequence string, default is '' Name: the sequence name check: if True (the default), validates against the MolType """ if Name is None and hasattr(Seq, 'Name'): Name = Seq.Name self.Name = Name orig_seq = Seq if isinstance(Seq, Sequence): Seq = Seq._seq elif isinstance(Seq, ModelSequence): Seq = str(Seq) elif type(Seq) is not str: try: Seq = ''.join(Seq) except TypeError: Seq = ''.join(map(str, Seq)) Seq = self._seq_filter(Seq) if not preserve_case and not Seq.isupper(): Seq = Seq.upper() self._seq = Seq if check: self.MolType.verifySequence(self._seq, gaps_allowed, \ wildcards_allowed) if not isinstance(Info, InfoClass): try: Info = InfoClass(Info) except TypeError: Info = InfoClass() if hasattr(orig_seq, 'Info'): try: Info.update(orig_seq.Info) except: pass self.Info = Info if isinstance(orig_seq, _Annotatable): self.copyAnnotations(orig_seq) def _seq_filter(self, seq): """Returns filtered seq; used to do DNA/RNA conversions.""" return seq def getColourScheme(self, colours): return {} #dict([(motif,colours.black) for motif in self.MolType]) def getColorScheme(self, colors): #alias to support US spelling return self.getColourScheme(colours=colors) def copyAnnotations(self, other): self.annotations = other.annotations[:] def annotateFromGff(self, f): first_seqname = None for (seqname, source, feature, start, end, score, strand, frame, attributes, comments) in gff.GffParser(f): if first_seqname is None: first_seqname = seqname else: assert seqname == first_seqname, (seqname, first_seqname) feat_label = gff.parse_attributes(attributes) self.addFeature(feature, feat_label, [(start, end)]) def withMaskedAnnotations(self, annot_types, mask_char=None, shadow=False): """returns a sequence with annot_types regions replaced by mask_char if shadow is False, otherwise all other regions are masked. Arguments: - annot_types: annotation type(s) - mask_char: must be a character valid for the seq MolType. 
The default value is the most ambiguous character, eg. '?' for DNA - shadow: whether to mask the annotated regions, or everything but the annotated regions""" if mask_char is None: ambigs = [(len(v), c) for c,v in self.MolType.Ambiguities.items()] ambigs.sort() mask_char = ambigs[-1][1] assert mask_char in self.MolType, 'Invalid mask_char %s' % mask_char annotations = [] annot_types = [annot_types, [annot_types]][isinstance(annot_types, str)] for annot_type in annot_types: annotations += self.getAnnotationsMatching(annot_type) region = self.getRegionCoveringAll(annotations) if shadow: region = region.getShadow() i = 0 segments = [] for b, e in region.getCoordinates(): segments.append(self._seq[i:b]) segments.append(mask_char * (e-b)) i = e segments.append(self._seq[i:]) new = self.__class__(''.join(segments), Name=self.Name, check=False, Info=self.Info) new.annotations = self.annotations[:] return new def gappedByMapSegmentIter(self, map, allow_gaps=True, recode_gaps=False): for span in map.spans: if span.lost: if allow_gaps: unknown = span.terminal or recode_gaps seg = "-?"[unknown] * span.length else: raise ValueError('Gap(s) in map %s' % map) else: seg = self._seq[span.Start:span.End] if span.Reverse: complement = self.MolType.complement seg = [complement(base) for base in seg[::-1]] seg = ''.join(seg) yield seg def gappedByMapMotifIter(self, map): for segment in self.gappedByMapSegmentIter(map): for motif in segment: yield motif def gappedByMap(self, map, recode_gaps=False): segments = self.gappedByMapSegmentIter(map, True, recode_gaps) new = self.__class__(''.join(segments), Name=self.Name, check=False, Info=self.Info) annots = self._slicedAnnotations(new, map) new.annotations = annots return new def _mapped(self, map): # Called by generic __getitem__ segments = self.gappedByMapSegmentIter(map, allow_gaps=False) new = self.__class__(''.join(segments), self.Name, Info=self.Info) return new def __add__(self, other): """Adds two sequences (other can be a string 
as well).""" if hasattr(other, 'MolType'): if self.MolType != other.MolType: raise ValueError, "MolTypes don't match: (%s,%s)" % \ (self.MolType, other.MolType) other_seq = other._seq else: other_seq = other new_seq = self.__class__(self._seq + other_seq) # Annotations which extend past the right end of the left sequence # or past the left end of the right sequence are dropped because # otherwise they will annotate the wrong part of the constructed # sequence. left = [a for a in self._shiftedAnnotations(new_seq, 0) if a.map.End <= len(self)] if hasattr(other, '_shiftedAnnotations'): right = [a for a in other._shiftedAnnotations(new_seq, len(self)) if a.map.Start >= len(self)] new_seq.annotations = left + right else: new_seq.annotations = left return new_seq def __repr__(self): myclass = '%s' % self.__class__.__name__ myclass = myclass.split('.')[-1] if len(self) > 10: seq = str(self._seq[:7]) + '... %s' % len(self) else: seq = str(self._seq) return "%s(%s)" % (myclass, seq) def getTracks(self, policy): return policy.tracksForSequence(self) def getName(self): """Return the sequence name -- should just use Name instead.""" return self.Name def __len__(self): return len(self._seq) def __iter__(self): return iter(self._seq) def gettype(self): """Return the sequence type.""" return self.MolType.label def resolveambiguities(self): """Returns a list of tuples of strings.""" ambigs = self.MolType.resolveAmbiguity return [ambigs(motif) for motif in self._seq] def slidingWindows(self, window, step, start=None, end=None): """Generator function that yield new sequence objects of a given length at a given interval. 
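The windowing logic can be sketched stand-alone (illustrative; the real method yields new Sequence objects and additionally clips user-supplied start/end values):

```python
def sliding_windows(seq, window, step, start=0, end=None):
    """Yield successive substrings of length `window`, `step` apart."""
    if end is None:
        end = len(seq) - window + 1
    end = min(len(seq) - window + 1, end)
    for pos in range(start, end, step):
        yield seq[pos:pos + window]

print(list(sliding_windows("TCAAGT", window=3, step=2)))  # ['TCA', 'AAG']
```
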
Arguments: - window: The length of the returned sequence - step: The interval between the start of the returned sequence objects - start: first window start position - end: last window start position """ start = [start, 0][start is None] end = [end, len(self)-window+1][end is None] end = min(len(self)-window+1, end) if start < end and len(self)-end >= window-1: for pos in xrange(start, end, step): yield self[pos:pos+window] def getInMotifSize(self, motif_length=1, log_warnings=True): """returns sequence as list of non-overlapping motifs Arguments: - motif_length: length of the motifs - log_warnings: whether to notify of an incomplete terminal motif""" seq = self._seq if motif_length == 1: return seq else: length = len(seq) remainder = length % motif_length if remainder and log_warnings: warnings.warn('Dropped remainder "%s" from end of sequence' % seq[-remainder:]) return [seq[i:i+motif_length] for i in range(0, length-remainder, motif_length)] def parseOutGaps(self): gapless = [] segments = [] nongap = re.compile('([^%s]+)' % re.escape("-")) for match in nongap.finditer(self._seq): segments.append(match.span()) gapless.append(match.group()) map = Map(segments, parent_length=len(self)).inverse() seq = self.__class__( ''.join(gapless), Name = self.getName(), Info=self.Info) if self.annotations: seq.annotations = [a.remappedTo(seq, map) for a in self.annotations] return (map, seq) class ProteinSequence(Sequence): """Holds the standard Protein sequence. MolType set in moltype module.""" pass class ProteinWithStopSequence(Sequence): """Holds the standard Protein sequence, allows for stop codon MolType set in moltype module """ pass class NucleicAcidSequence(Sequence): """Base class for DNA and RNA sequences. Abstract.""" PROTEIN = None #will set in moltype CodonAlphabet = None #will set in moltype def reversecomplement(self): """Converts a nucleic acid sequence to its reverse complement. 
Synonym for rc."""
        return self.rc()

    def rc(self):
        """Converts a nucleic acid sequence to its reverse complement."""
        complement = self.MolType.rc(self)
        rc = self.__class__(complement, Name=self.Name, Info=self.Info)
        self._annotations_nucleic_reversed_on(rc)
        return rc

    def _gc_from_arg(self, gc):
        # codon_alphabet is being deprecated in favor of genetic codes.
        if gc is None:
            gc = DEFAULT_GENETIC_CODE
        elif isinstance(gc, (int, basestring)):
            gc = GeneticCodes[gc]
        return gc

    def hasTerminalStop(self, gc=None):
        """Return True if the sequence has a terminal stop codon.

        Arguments:
            - gc: a genetic code"""
        gc = self._gc_from_arg(gc)
        codons = self._seq
        assert len(codons) % 3 == 0
        return codons and gc.isStop(codons[-3:])

    def withoutTerminalStopCodon(self, gc=None):
        gc = self._gc_from_arg(gc)
        codons = self._seq
        assert len(codons) % 3 == 0, "seq length not divisible by 3"
        if codons and gc.isStop(codons[-3:]):
            codons = codons[:-3]
        return self.__class__(codons, Name=self.Name, Info=self.Info)

    def getTranslation(self, gc=None):
        gc = self._gc_from_arg(gc)
        codon_alphabet = self.CodonAlphabet(gc).withGapMotif()
        # translate the codons
        translation = []
        for posn in range(0, len(self._seq) - 2, 3):
            orig_codon = self._seq[posn:posn + 3]
            resolved = codon_alphabet.resolveAmbiguity(orig_codon)
            trans = []
            for codon in resolved:
                if codon == '---':
                    aa = '-'
                else:
                    assert '-' not in codon
                    aa = gc[codon]
                    if aa == '*':
                        continue
                trans.append(aa)
            if not trans:
                raise ValueError(orig_codon)
            aa = self.PROTEIN.whatAmbiguity(trans)
            translation.append(aa)
        translation = self.PROTEIN.makeSequence(
            Seq=''.join(translation), Name=self.Name)
        return translation

    def getOrfPositions(self, gc=None, atg=False):
        gc = self._gc_from_arg(gc)
        orfs = []
        start = None
        protein = self.getTranslation(gc=gc)
        for (posn, aa) in enumerate(protein):
            posn *= 3
            if aa == '*':
                if start is not None:
                    orfs.append((start, posn))
                start = None
            else:
                if start is None:
                    if (not atg) or gc.isStart(self[posn:posn + 3]):
                        start = posn
        if start is not None:
orfs.append((start, posn+3)) return orfs def toRna(self): """Returns copy of self as RNA.""" return RnaSequence(self) def toDna(self): """Returns copy of self as DNA.""" return DnaSequence(self) class DnaSequence(NucleicAcidSequence): def getColourScheme(self, colours): return { 'A': colours.black, 'T': colours.red, 'C': colours.blue, 'G': colours.green, } def _seq_filter(self, seq): """Converts U to T.""" return seq.replace('u','t').replace('U','T') class RnaSequence(NucleicAcidSequence): def getColourScheme(self, colours): return { 'A': colours.black, 'U': colours.red, 'C': colours.blue, 'G': colours.green, } def _seq_filter(self, seq): """Converts T to U.""" return seq.replace('t','u').replace('T','U') class ABSequence(Sequence): """Used for two-state modeling; MolType set in moltypes.""" pass class ByteSequence(Sequence): """Used for storing arbitrary bytes.""" def __init__(self, Seq='', Name=None, Info=None, check=False, \ preserve_case=True): return super(ByteSequence, self).__init__(Seq, Name=Name, Info=Info, \ check=check, preserve_case=preserve_case) class ModelSequenceBase(object): """Holds the information for a non-degenerate sequence. Mutable. A ModelSequence is an array of indices of symbols, where those symbols are defined by an Alphabet. This representation of Sequence is convenient for counting symbol frequencies or tuple frequencies, remapping data (e.g. for reverse-complement), looking up model parameters, etc. Its main drawback is that the sequences can no longer be treated as strings, and conversion to/from strings can be fairly time-consuming. Also, any symbol not in the Alphabet cannot be represented at all. A sequence can have a Name, which will be used for output in formats such as FASTA. A sequence Class has an alphabet (which can be overridden in instances where necessary), a delimiter used for string conversions, a LineWrap for wrapping characters into lines for e.g. FASTA output. 
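# The index-array representation described in the ModelSequence docstring
# can be illustrated with a small standalone sketch. The alphabet and helper
# names here are hypothetical (not cogent's API); the point is that once
# symbols are small integers, frequency counting is a single vectorized pass.

```python
import numpy as np

ALPHABET = 'TCAG'  # hypothetical four-symbol alphabet
TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def to_indices(seq):
    # string -> array of small integers, one per symbol
    return np.array([TO_INDEX[c] for c in seq], dtype=np.uint8)

def symbol_counts(data, alpha_len=len(ALPHABET)):
    # per-symbol frequencies in one vectorized bincount
    return np.bincount(data, minlength=alpha_len)

def from_indices(data):
    # array of indices -> string: the (comparatively slow) reverse mapping
    return ''.join(ALPHABET[i] for i in data)
```

The round trip through `from_indices` shows the main cost of this representation: conversion to and from strings is where the time goes.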
Note that a ModelSequence _must_ have an Alphabet, not a MolType, because it is often important to store just a subset of the possible characters (e.g. the non-degenerate bases) for modeling purposes. """ Alphabet = None #REPLACE IN SUBCLASSES MolType = None #REPLACE IN SUBCLASSES Delimiter = '' #Used for string conversions LineWrap = 80 #Wrap sequences at 80 characters by default. def __init__(self, data='', Alphabet=None, Name=None, Info=None, \ check='ignored'): """Initializes sequence from data and alphabet. WARNING: Does not validate the data or alphabet for compatibility. This is for speed. Use isValid() to check whether the data is consistent with the alphabet. WARNING: If data has name and/or Info, gets ref to same object rather than copying in each case. """ if Name is None and hasattr(data, 'Name'): Name = data.Name if Info is None and hasattr(data, 'Info'): Info = data.Info #set the label self.Name = Name #override the class alphabet if supplied if Alphabet is not None: self.Alphabet = Alphabet #if we haven't already set self._data (e.g. 
in a subclass __init__), #guess the data type and set it here if not hasattr(self, '_data'): #if data is a sequence, copy its data and alphabet if isinstance(data, ModelSequence): self._data = data._data self.Alphabet = data.Alphabet #if it's an array elif type(data) == ARRAY_TYPE: self._data = data else: #may be set in subclass init self._from_sequence(data) self.MolType = self.Alphabet.MolType self.Info = Info def __getitem__(self, *args): """__getitem__ returns char or slice, as same class.""" if len(args) == 1 and not isinstance(args[0], slice): result = array([self._data[args[0]]]) else: result = self._data.__getitem__(*args) return self.__class__(result) def __cmp__(self, other): """__cmp__ compares based on string""" return cmp(str(self), other) def _from_sequence(self, data): """Fills self using the values in data, via the Alphabet.""" if self.Alphabet: self._data = array(self.Alphabet.toIndices(data), \ self.Alphabet.ArrayType) else: self._data = array(data) def __str__(self): """Uses alphabet to convert self to string, using delimiter.""" if hasattr(self.Alphabet, 'toString'): return self.Alphabet.toString(self._data) else: return self.Delimiter.join(map(str, \ self.Alphabet.fromIndices(self._data))) def __len__(self): """Returns length of data.""" return len(self._data) def toFasta(self, make_seqlabel=None): """Return string of self in FASTA format, no trailing newline Arguments: - make_seqlabel: callback function that takes the seq object and returns a label str """ return fasta_from_sequences([self], make_seqlabel = make_seqlabel, line_wrap=self.LineWrap) def toPhylip(self, name_len=28, label_len=30): """Return string of self in one line for PHYLIP, no newline. Default: max name length is 28, label length is 30. 
"""
        return str(self.Name)[:name_len].ljust(label_len) + str(self)

    def isValid(self):
        """Checks that no items in self are out of the Alphabet range."""
        return (self._data == self._data.clip(0, len(self.Alphabet) - 1)).all()

    def toKwords(self, k, overlapping=True):
        """Turns sequence into sequence of its k-words.

        Just returns array, not Sequence object."""
        alpha_len = len(self.Alphabet)
        seq = self._data
        seq_len = len(seq)
        if overlapping:
            num_words = seq_len - k + 1
        else:
            num_words, remainder = divmod(seq_len, k)
            last_index = num_words * k
        result = zeros(num_words)
        for i in range(k):
            if overlapping:
                curr_slice = seq[i:i + num_words]
            else:
                curr_slice = seq[i:last_index + i:k]
            result *= alpha_len
            result += curr_slice
        return result

    def __iter__(self):
        """iter returns characters of self, rather than slices."""
        if hasattr(self.Alphabet, 'toString'):
            return iter(self.Alphabet.toString(self._data))
        else:
            return iter(self.Alphabet.fromIndices(self._data))

    def tostring(self):
        """tostring delegates to self._data."""
        return self._data.tostring()

    def gaps(self):
        """Returns array containing 1 where self has gaps, 0 elsewhere.

        WARNING: Only checks for standard gap character (for speed), and
        does not check for ambiguous gaps, etc.
        """
        return self._data == self.Alphabet.GapIndex

    def nongaps(self):
        """Returns array containing 0 where self has gaps, 1 elsewhere.

        WARNING: Only checks for standard gap character (for speed), and
        does not check for ambiguous gaps, etc.
        """
        return self._data != self.Alphabet.GapIndex

    def regap(self, other, strip_existing_gaps=False):
        """Inserts elements of self into gaps specified by other.

        WARNING: Only checks for standard gap character (for speed), and
        does not check for ambiguous gaps, etc.
        """
        if strip_existing_gaps:
            s = self.degap()
        else:
            s = self
        c = self.__class__
        a = self.Alphabet.Gapped
        result = zeros(len(other), a.ArrayType) + a.GapIndex
        put(result, nonzero(other.nongaps()), s._data)
        return c(result)

    def degap(self):
        """Returns ungapped copy of self, not changing alphabet."""
        if not hasattr(self.Alphabet, 'Gap') or self.Alphabet.Gap is None:
            return self.copy()
        d = take(self._data, nonzero(logical_not(self.gapArray()))[0])
        return self.__class__(d, Alphabet=self.Alphabet, Name=self.Name,
                              Info=self.Info)

    def copy(self):
        """Returns copy of self, always separate object."""
        return self.__class__(self._data.copy(), Alphabet=self.Alphabet,
                              Name=self.Name, Info=self.Info)

    def __contains__(self, item):
        """Returns true if item in self (converts to strings)."""
        return item in str(self)

    def disambiguate(self, *args, **kwargs):
        """Disambiguates self using strings/moltype. Should recode on demand."""
        return self.__class__(self.MolType.disambiguate(str(self),
                              *args, **kwargs))

    def distance(self, other, function=None, use_indices=False):
        """Returns distance between self and other using function(i,j).

        other must be a sequence.

        function should be a function that takes two items and returns a
        number. To turn a 2D matrix into a function, use
        cogent.util.misc.DistanceFromMatrix(matrix).

        use_indices: if False, maps the indices onto items (e.g. assumes
        function relates the characters). If True, uses the indices directly.

        NOTE: Truncates at the length of the shorter sequence.

        Note that the function acts on two _elements_ of the sequences, not
        the two sequences themselves (i.e. the behavior will be the same for
        every position in the sequences, such as identity scoring or a
        function derived from a distance matrix as suggested above).
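# The distance() docstring above points at cogent.util.misc.DistanceFromMatrix.
# A minimal sketch of that idea (not the library's exact implementation) is a
# closure adapting a 2D mapping to the function(i, j) signature distance()
# expects:

```python
def distance_from_matrix(matrix):
    # Wrap a 2D mapping (dict-of-dicts or 2D array) as a two-argument
    # scoring function usable as the `function` parameter of distance().
    # As the docstrings here warn, diagonal entries (i == j) must be
    # present explicitly or lookups for identical positions will fail.
    def f(i, j):
        return matrix[i][j]
    return f
```

Any object supporting two levels of `[]` indexing works, so the same wrapper serves dicts keyed by characters and numpy arrays keyed by indices.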
One limitation of this approach is that the distance function cannot use properties of the sequences themselves: for example, it cannot use the lengths of the sequences to normalize the scores as percent similarities or percent differences. If you want functions that act on the two sequences themselves, there is no particular advantage in making these functions methods of the first sequences by passing them in as parameters like the function in this method. It makes more sense to use them as standalone functions. The factory function cogent.util.transform.for_seq is useful for converting per-element functions into per-sequence functions, since it takes as parameters a per-element scoring function, a score aggregation function, and a normalization function (which itself takes the two sequences as parameters), returning a single function that combines these functions and that acts on two complete sequences. """ if function is None: #use identity scoring shortest = min(len(self), len(other)) if not hasattr(other, '_data'): other = self.__class__(other) distance = (self._data[:shortest] != other._data[:shortest]).sum() else: distance = 0 if use_indices: self_seq = self._data if hasattr(other, '_data'): other_seq = other._data else: self_seq = self.Alphabet.fromIndices(self._data) if hasattr(other, '_data'): other_seq = other.Alphabet.fromIndices(other._data) else: other_seq = other for first, second in zip(self_seq, other_seq): distance += function(first, second) return distance def matrixDistance(self, other, matrix, use_indices=False): """Returns distance between self and other using a score matrix. if use_indices is True (default is False), assumes that matrix is an array using the same indices that self uses. WARNING: the matrix must explicitly contain scores for the case where a position is the same in self and other (e.g. for a distance matrix, an identity between U and U might have a score of 0). 
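# The for_seq factory described above can be sketched as follows. This is a
# hypothetical reimplementation, and the normalizer signature (receiving the
# score plus the two sequences) is an assumption based on the description,
# not cogent.util.transform's exact API.

```python
def for_seq(f, aggregator=sum, normalizer=None):
    # Lift a per-element scoring function f(a, b) into a function on two
    # whole sequences: score each aligned pair, aggregate the scores, then
    # optionally normalize using the sequences themselves (e.g. by length).
    def seq_func(first, second):
        score = aggregator(f(a, b) for a, b in zip(first, second))
        if normalizer is not None:
            score = normalizer(score, first, second)  # assumed signature
        return score
    return seq_func
```

With `f = lambda a, b: a != b` this yields a Hamming distance; adding a normalizer that divides by the shorter length turns it into a fractional difference.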
The reason the scores for the 'diagonals' need to be passed explicitly is that for some kinds of distance matrices, e.g. log-odds matrices, the 'diagonal' scores differ from each other. If these elements are missing, this function will raise a KeyError at the first position that the two sequences are identical. """ return self.distance(other, DistanceFromMatrix(matrix)) def shuffle(self): """Returns shuffled copy of self""" return self.__class__(permutation(self._data), Info=self.Info) def gapArray(self): """Returns array of 0/1 indicating whether each position is a gap.""" gap_indices = [] a = self.Alphabet for c in self.MolType.Gaps: if c in a: gap_indices.append(a.index(c)) gap_vector = None for i in gap_indices: if gap_vector is None: gap_vector = self._data == i else: gap_vector = logical_or(gap_vector, self._data == i) return gap_vector def gapIndices(self): """Returns array of indices of gapped positions in self.""" return self.gapArray().nonzero()[0] def fracSameGaps(self, other): """Returns fraction of positions where gaps match other's gaps. """ if not other: return 0 self_gaps = self.gapArray() if hasattr(other, 'gapArray'): other_gaps = other.gapArray() elif hasattr(other, 'gapVector'): other_gaps = array(other.gapVector()) else: other_gaps = array(self.MolType.gapVector(other)) min_len = min(len(self), len(other)) self_gaps, other_gaps = self_gaps[:min_len], other_gaps[:min_len] return (self_gaps == other_gaps).sum()/float(min_len) class ModelSequence(ModelSequenceBase, SequenceI): """ModelSequence provides an array-based implementation of Sequence. Use ModelSequenceBase if you need a stripped-down, fast implementation. ModelSequence implements everything that SequenceI implements. See docstrings for ModelSequenceBase and SequenceI for information about these respective classes. 
    """
    def stripBad(self):
        """Returns copy of self with bad chars excised"""
        valid_indices = self._data < len(self.Alphabet)
        result = compress(valid_indices, self._data)
        return self.__class__(result, Info=self.Info)

    def stripBadAndGaps(self):
        """Returns copy of self with bad chars and gaps excised."""
        gap_indices = map(self.Alphabet.index, self.MolType.Gaps)
        valid_indices = self._data < len(self.Alphabet)
        for i in gap_indices:
            valid_indices -= self._data == i
        result = compress(valid_indices, self._data)
        return self.__class__(result, Info=self.Info)

    def stripDegenerate(self):
        """Returns copy of self without degenerate symbols.

        NOTE: goes via string intermediate because some of the algorithms
        for resolving degenerates are complex. This could be optimized if
        speed becomes critical.
        """
        return self.__class__(self.MolType.stripDegenerate(str(self)),
                              Info=self.Info)

    def countGaps(self):
        """Returns count of gaps in self."""
        return self.gapArray().sum()

    def gapVector(self):
        """Returns list of bool containing whether each pos is a gap."""
        return map(bool, self.gapArray())

    def gapList(self):
        """Returns list of gap indices."""
        return list(self.gapIndices())

    def gapMaps(self):
        """Returns dicts mapping gapped/ungapped positions."""
        nongaps = logical_not(self.gapArray())
        indices = arange(len(self)).compress(nongaps)
        new_indices = arange(len(indices))
        return dict(zip(new_indices, indices)), dict(zip(indices, new_indices))

    def firstGap(self):
        """Returns position of first gap, or None."""
        a = self.gapIndices()
        try:
            return a[0]
        except IndexError:
            return None

    def isGapped(self):
        """Returns True if sequence contains gaps."""
        return len(self.gapIndices())

    def MW(self, *args, **kwargs):
        """Returns molecular weight.

        Works via string intermediate: could optimize using array of MW
        if speed becomes important.
        """
        return self.MolType.MW(str(self), *args, **kwargs)

    def fracSimilar(self, other, similar_pairs):
        """Returns fraction of positions where self[i] is similar to other[i].
similar_pairs must be a dict such that d[(i,j)] exists if i and j are to be counted as similar. Use PairsFromGroups in cogent.util.misc to construct such a dict from a list of lists of similar residues. Truncates at the length of the shorter sequence. Note: current implementation re-creates the distance function each time, so may be expensive compared to creating the distance function using for_seq separately. Returns 0 if one sequence is empty. NOTE: goes via string intermediate, could optimize using array if speed becomes important. Note that form of similar_pairs input would also have to change. """ if not self or not other: return 0.0 return for_seq(f = lambda x, y: (x,y) in similar_pairs, \ normalizer=per_shortest)(str(self), str(other)) class ModelNucleicAcidSequence(ModelSequence): """Abstract class defining ops for codons, translation, etc.""" def toCodons(self): """Returns copy of self in codon alphabet. Assumes ungapped.""" alpha_len = len(self.Alphabet) return ModelCodonSequence(alpha_len*(\ alpha_len*self._data[::3] + self._data[1::3]) + self._data[2::3], \ Name=self.Name, Alphabet=self.Alphabet.Triples) def complement(self): """Returns complement of sequence""" return self.__class__(self.Alphabet._complement_array.take(self._data),\ Info=self.Info) def rc(self): """Returns reverse-complement of sequence""" comp = self.Alphabet._complement_array.take(self._data) return self.__class__(comp[::-1], Info=self.Info) def toRna(self): """Returns self as RNA""" return ModelRnaSequence(self._data) def toDna(self): """Returns self as DNA""" return ModelDnaSequence(self._data) class ModelRnaSequence(ModelNucleicAcidSequence): MolType = None #set to RNA in moltype.py Alphabet = None #set to RNA.Alphabets.DegenGapped in moltype.py def __init__(self, data='', *args, **kwargs): """Returns new ModelRnaSequence, converting T -> U""" if hasattr(data, 'upper'): data = data.upper().replace('T','U') return super(ModelNucleicAcidSequence, self).__init__(data, \ *args, 
**kwargs)

class ModelDnaSequence(ModelNucleicAcidSequence):
    MolType = None   # set to DNA in moltype.py
    Alphabet = None  # set to DNA.Alphabets.DegenGapped in moltype.py

    def __init__(self, data='', *args, **kwargs):
        """Returns new ModelDnaSequence, converting U -> T"""
        if hasattr(data, 'upper'):
            data = data.upper().replace('U', 'T')
        return super(ModelNucleicAcidSequence, self).__init__(data,
            *args, **kwargs)

class ModelCodonSequence(ModelSequence):
    """Abstract base class for codon sequences, incl. string conversion."""
    SequenceClass = ModelNucleicAcidSequence

    def __str__(self):
        """Joins triplets together as string."""
        return self.Delimiter.join(map(''.join,
            self.Alphabet.fromIndices(self._data)))

    def _from_string(self, s):
        """Reads from a raw string, rather than a DnaSequence."""
        s = s.upper().replace('U', 'T')  # convert to uppercase DNA
        d = self.SequenceClass(s,
            Alphabet=self.Alphabet.SubEnumerations[0])
        self._data = d.toCodons()._data

    def __init__(self, data='', Alphabet=None, Name=None, Info=None):
        """Override __init__ to handle init from string."""
        if isinstance(data, str):
            self._from_string(data)
        ModelSequence.__init__(self, data, Alphabet, Name, Info=Info)

    def toCodons(self):
        """Converts self to codons -- in practice, just returns self.
Supports interface of other NucleicAcidSequences."""
        return self

    def toDna(self):
        """Returns a ModelDnaSequence from the data in self"""
        unpacked = self.Alphabet.unpackArrays(self._data)
        result = zeros((len(self._data), 3))
        for i, v in enumerate(unpacked):
            result[:, i] = v
        return ModelDnaSequence(ravel(result), Name=self.Name)

    def toRna(self):
        """Returns a ModelRnaSequence from the data in self."""
        unpacked = self.Alphabet.unpackArrays(self._data)
        result = zeros((len(self._data), 3))
        for i, v in enumerate(unpacked):
            result[:, i] = v
        return ModelRnaSequence(ravel(result), Name=self.Name)

class ModelDnaCodonSequence(ModelCodonSequence):
    """Holds non-degenerate DNA codon sequence."""
    Alphabet = None  # set to DNA.Alphabets.Base.Triples in moltype.py
    SequenceClass = ModelDnaSequence

class ModelRnaCodonSequence(ModelCodonSequence):
    """Holds non-degenerate RNA codon sequence."""
    Alphabet = None  # set to RNA.Alphabets.Base.Triples in moltype.py
    SequenceClass = ModelRnaSequence

    def _from_string(self, s):
        """Reads from a raw string, rather than an RnaSequence."""
        s = s.upper().replace('T', 'U')  # convert to uppercase RNA
        d = self.SequenceClass(s,
            Alphabet=self.Alphabet.SubEnumerations[0])
        self._data = d.toCodons()._data

class ModelProteinSequence(ModelSequence):
    MolType = None   # set to PROTEIN in moltype.py
    Alphabet = None  # set to PROTEIN.Alphabets.DegenGapped in moltype.py

class ModelProteinWithStopSequence(ModelSequence):
    MolType = None   # set to PROTEIN_WITH_STOP in moltype.py
    Alphabet = None  # set to PROTEIN_WITH_STOP.Alphabets.DegenGapped in moltype.py

PyCogent-1.5.3/cogent/core/tree.py

#!/usr/bin/env python
"""Classes for storing and manipulating a phylogenetic tree.

These trees can be either strictly binary, or have polytomies (multiple
children to a parent node).

Trees consist of Nodes connected by Edges (branches); each edge joins
two nodes.
The Tree can be created only from a newick formatted string read either
from file or from a string object. Other formats will be added as time
permits.

Tree can:
    -  Deal with either rooted or unrooted trees and can
       convert between these types.
    -  Return a sub-tree given a list of tip-names
    -  Identify an edge given two tip names. This method facilitates the
       statistical modelling by simplifying the syntax for specifying
       sub-regions of a tree.
    -  Assess whether two Tree instances represent the same topology.

Definition of relevant terms or abbreviations:
    -  edge: also known as a branch on a tree.
    -  node: the point at which two edges meet
    -  tip: a sequence or species
    -  clade: all and only the nodes (including tips) that descend
       from a node
    -  stem: the edge immediately preceding a clade
"""

from numpy import zeros, argsort, ceil, log
from copy import deepcopy
import re
from cogent.util.transform import comb
from cogent.maths.stats.test import correlation
from operator import or_
from cogent.util.misc import InverseDict
from random import shuffle, choice

__author__ = "Gavin Huttley, Peter Maxwell and Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Peter Maxwell", "Rob Knight",
               "Andrew Butterfield", "Catherine Lozupone", "Micah Hamady",
               "Jeremy Widmann", "Zongzhi Liu", "Daniel McDonald",
               "Justin Kuczynski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"

def distance_from_r_squared(m1, m2):
    """Estimates distance as 1-r^2: no correl = max distance"""
    return 1 - (correlation(m1.flat, m2.flat)[0])**2

def distance_from_r(m1, m2):
    """Estimates distance as (1-r)/2: neg correl = max distance"""
    return (1 - correlation(m1.flat, m2.flat)[0]) / 2

class TreeError(Exception):
    pass

class TreeNode(object):
    """Store information about a tree node. Mutable.

    Parameters:
        Name: label for the node, assumed to be unique.
Children: list of the node's children. Params: dict containing arbitrary parameters for the node. NameLoaded: ? """ _exclude_from_copy = dict.fromkeys(['_parent','Children']) def __init__(self, Name=None, Children=None, Parent=None, Params=None, \ NameLoaded=True, **kwargs): """Returns new TreeNode object.""" self.Name = Name self.NameLoaded = NameLoaded if Params is None: Params = {} self.params = Params self.Children = [] if Children is not None: self.extend(Children) self._parent = Parent if (Parent is not None) and not (self in Parent.Children): Parent.append(self) ### built-in methods and list interface support def __repr__(self): """Returns reconstructable string representation of tree. WARNING: Does not currently set the class to the right type. """ return 'Tree("%s")' % self.getNewick() def __str__(self): """Returns Newick-format string representation of tree.""" return self.getNewick() def compareName(self, other): """Compares TreeNode by name""" if self is other: return 0 try: return cmp(self.Name, other.Name) except AttributeError: return cmp(type(self), type(other)) def compareByNames(self, other): """Equality test for trees by name""" # if they are the same object then they must be the same tree... if self is other: return True self_names = self.getNodeNames() other_names = other.getNodeNames() self_names.sort() other_names.sort() return self_names == other_names def _to_self_child(self, i): """Converts i to self's type, with self as its parent. Cleans up refs from i's original parent, but doesn't give self ref to i. 
""" c = self.__class__ if isinstance(i, c): if i._parent not in (None, self): i._parent.Children.remove(i) else: i = c(i) i._parent = self return i def append(self, i): """Appends i to self.Children, in-place, cleaning up refs.""" self.Children.append(self._to_self_child(i)) def extend(self, items): """Extends self.Children by items, in-place, cleaning up refs.""" self.Children.extend(map(self._to_self_child, items)) def insert(self, index, i): """Inserts an item at specified position in self.Children.""" self.Children.insert(index, self._to_self_child(i)) def pop(self, index=-1): """Returns and deletes child of self at index (default: -1)""" result = self.Children.pop(index) result._parent = None return result def remove(self, target): """Removes node by name instead of identity. Returns True if node was present, False otherwise. """ if isinstance(target, TreeNode): target = target.Name for (i, curr_node) in enumerate(self.Children): if curr_node.Name == target: self.removeNode(curr_node) return True return False def __getitem__(self, i): """Node delegates slicing to Children; faster to access them directly.""" return self.Children[i] def __setitem__(self, i, val): """Node[i] = x sets the corresponding item in Children.""" curr = self.Children[i] if isinstance(i, slice): for c in curr: c._parent = None coerced_val = map(self._to_self_child, val) self.Children[i] = coerced_val[:] else: #assume we got a single index curr._parent = None coerced_val = self._to_self_child(val) self.Children[i] = coerced_val def __delitem__(self, i): """del node[i] deletes index or slice from self.Children.""" curr = self.Children[i] if isinstance(i, slice): for c in curr: c._parent = None else: curr._parent = None del self.Children[i] def __iter__(self): """Node iter iterates over the Children.""" return iter(self.Children) def __len__(self): """Node len returns number of children.""" return len(self.Children) #support for copy module def copyRecursive(self, memo=None, _nil=[], 
constructor='ignored'): """Returns copy of self's structure, including shallow copy of attrs. constructor is ignored; required to support old tree unit tests. """ result = self.__class__() efc = self._exclude_from_copy for k, v in self.__dict__.items(): if k not in efc: #avoid infinite recursion result.__dict__[k] = deepcopy(self.__dict__[k]) for c in self: result.append(c.copy()) return result def copy(self, memo=None, _nil=[], constructor='ignored'): """Returns a copy of self using an iterative approach""" def __copy_node(n): result = n.__class__() efc = n._exclude_from_copy for k,v in n.__dict__.items(): if k not in efc: result.__dict__[k] = deepcopy(n.__dict__[k]) return result root = __copy_node(self) nodes_stack = [[root, self, len(self.Children)]] while nodes_stack: #check the top node, any children left unvisited? top = nodes_stack[-1] new_top_node, old_top_node, unvisited_children = top if unvisited_children: top[2] -= 1 old_child = old_top_node.Children[-unvisited_children] new_child = __copy_node(old_child) new_top_node.append(new_child) nodes_stack.append([new_child, old_child, \ len(old_child.Children)]) else: #no unvisited children nodes_stack.pop() return root __deepcopy__ = deepcopy = copy def copyTopology(self, constructor=None): """Copies only the topology and labels of a tree, not any extra data. Useful when you want another copy of the tree with the same structure and labels, but want to e.g. assign different branch lengths and environments. Does not use deepcopy from the copy module, so _much_ faster than the copy() method. """ if constructor is None: constructor = self.__class__ children = [c.copyTopology(constructor) for c in self.Children] return constructor(Name=self.Name[:], Children=children) #support for basic tree operations -- finding objects and moving in the tree def _get_parent(self): """Accessor for parent. 
If using an algorithm that accesses Parent a lot, it will be much faster to access self._parent directly, but don't do it if mutating self._parent! (or, if you must, remember to clean up the refs). """ return self._parent def _set_parent(self, Parent): """Mutator for parent: cleans up refs in old parent.""" if self._parent is not None: self._parent.removeNode(self) self._parent = Parent if (Parent is not None) and (not self in Parent.Children): Parent.Children.append(self) Parent = property(_get_parent, _set_parent) def indexInParent(self): """Returns index of self in parent.""" return self._parent.Children.index(self) def isTip(self): """Returns True if the current node is a tip, i.e. has no children.""" return not self.Children def isRoot(self): """Returns True if the current is a root, i.e. has no parent.""" return self._parent is None def traverse(self, self_before=True, self_after=False, include_self=True): """Returns iterator over descendants. Iterative: safe for large trees. self_before includes each node before its descendants if True. self_after includes each node after its descendants if True. include_self includes the initial node if True. self_before and self_after are independent. If neither is True, only terminal nodes will be returned. Note that if self is terminal, it will only be included once even if self_before and self_after are both True. This is a depth-first traversal. Since the trees are not binary, preorder and postorder traversals are possible, but inorder traversals would depend on the data in the tree and are not handled here. 
""" if self_before: if self_after: return self.pre_and_postorder(include_self=include_self) else: return self.preorder(include_self=include_self) else: if self_after: return self.postorder(include_self=include_self) else: return self.tips(include_self=include_self) def levelorder(self, include_self=True): """Performs levelorder iteration over tree""" queue = [self] while queue: curr = queue.pop(0) if include_self or (curr is not self): yield curr if curr.Children: queue.extend(curr.Children) def preorder(self, include_self=True): """Performs preorder iteration over tree.""" stack = [self] while stack: curr = stack.pop() if include_self or (curr is not self): yield curr if curr.Children: stack.extend(curr.Children[::-1]) #20% faster than reversed def postorder(self, include_self=True): """Performs postorder iteration over tree. This is somewhat inelegant compared to saving the node and its index on the stack, but is 30% faster in the average case and 3x faster in the worst case (for a comb tree). 
Zongzhi Liu's slower but more compact version is: def postorder_zongzhi(self): stack = [[self, 0]] while stack: curr, child_idx = stack[-1] if child_idx < len(curr.Children): stack[-1][1] += 1 stack.append([curr.Children[child_idx], 0]) else: yield stack.pop()[0] """ child_index_stack = [0] curr = self curr_children = self.Children curr_children_len = len(curr_children) while 1: curr_index = child_index_stack[-1] #if there are children left, process them if curr_index < curr_children_len: curr_child = curr_children[curr_index] #if the current child has children, go there if curr_child.Children: child_index_stack.append(0) curr = curr_child curr_children = curr.Children curr_children_len = len(curr_children) curr_index = 0 #otherwise, yield that child else: yield curr_child child_index_stack[-1] += 1 #if there are no children left, return self, and move to #self's parent else: if include_self or (curr is not self): yield curr if curr is self: break curr = curr.Parent curr_children = curr.Children curr_children_len = len(curr_children) child_index_stack.pop() child_index_stack[-1] += 1 def pre_and_postorder(self, include_self=True): """Performs iteration over tree, visiting node before and after.""" #handle simple case first if not self.Children: if include_self: yield self raise StopIteration child_index_stack = [0] curr = self curr_children = self.Children while 1: curr_index = child_index_stack[-1] if not curr_index: if include_self or (curr is not self): yield curr #if there are children left, process them if curr_index < len(curr_children): curr_child = curr_children[curr_index] #if the current child has children, go there if curr_child.Children: child_index_stack.append(0) curr = curr_child curr_children = curr.Children curr_index = 0 #otherwise, yield that child else: yield curr_child child_index_stack[-1] += 1 #if there are no children left, return self, and move to #self's parent else: if include_self or (curr is not self): yield curr if curr is self: break 
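The compact stack-based postorder quoted in the docstring above can be exercised standalone. This is a sketch for illustration only; the minimal `Node` class is a stand-in, not PyCogent's TreeNode:

```python
# Minimal stand-in node class (illustrative only, not PyCogent's TreeNode).
class Node(object):
    def __init__(self, name, children=()):
        self.Name = name
        self.Children = list(children)

def postorder_zongzhi(root):
    # Each stack entry is [node, index of next child to visit].
    stack = [[root, 0]]
    while stack:
        curr, child_idx = stack[-1]
        if child_idx < len(curr.Children):
            stack[-1][1] += 1
            stack.append([curr.Children[child_idx], 0])
        else:
            yield stack.pop()[0]

tree = Node('root', [Node('a', [Node('b'), Node('c')]), Node('d')])
print([n.Name for n in postorder_zongzhi(tree)])  # ['b', 'c', 'a', 'd', 'root']
```

The method defined above trades this compactness for speed by keeping only child indices on the stack.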
curr = curr.Parent curr_children = curr.Children child_index_stack.pop() child_index_stack[-1] += 1 def traverse_recursive(self, self_before=True, self_after=False, \ include_self=True): """Returns iterator over descendants. IMPORTANT: read notes below. traverse_recursive is slower than traverse, and can lead to stack errors. However, you _must_ use traverse_recursive if you plan to modify the tree topology as you walk over it (e.g. in post-order), because the iterative methods use their own stack that is not updated if you alter the tree. self_before includes each node before its descendants if True. self_after includes each node after its descendants if True. include_self includes the initial node if True. self_before and self_after are independent. If neither is True, only terminal nodes will be returned. Note that if self is terminal, it will only be included once even if self_before and self_after are both True. This is a depth-first traversal. Since the trees are not binary, preorder and postorder traversals are possible, but inorder traversals would depend on the data in the tree and are not handled here. """ if self.Children: if self_before and include_self: yield self for child in self.Children: for i in child.traverse_recursive(self_before, self_after): yield i if self_after and include_self: yield self elif include_self: yield self def ancestors(self): """Returns all ancestors back to the root. Dynamically calculated.""" result = [] curr = self._parent while curr is not None: result.append(curr) curr = curr._parent return result def root(self): """Returns root of the tree self is in. Dynamically calculated.""" curr = self while curr._parent is not None: curr = curr._parent return curr def isroot(self): """Returns True if root of a tree, i.e. no parent.""" return self._parent is None def siblings(self): """Returns all nodes that are children of the same parent as self. Note: excludes self from the list. Dynamically calculated. 
""" if self._parent is None: return [] result = self._parent.Children[:] result.remove(self) return result def iterTips(self, include_self=False): """Iterates over tips descended from self, [] if self is a tip.""" #bail out in easy case if not self.Children: if include_self: yield self raise StopIteration #use stack-based method: robust to large trees stack = [self] while stack: curr = stack.pop() if curr.Children: stack.extend(curr.Children[::-1]) #20% faster than reversed else: yield curr def tips(self, include_self=False): """Returns tips descended from self, [] if self is a tip.""" return list(self.iterTips(include_self=include_self)) def iterNontips(self, include_self=False): """Iterates over nontips descended from self, [] if none. include_self, if True (default is False), will return the current node as part of the list of nontips if it is a nontip.""" for n in self.traverse(True, False, include_self): if n.Children: yield n def nontips(self, include_self=False): """Returns nontips descended from self.""" return list(self.iterNontips(include_self=include_self)) def istip(self): """Returns True if is tip, i.e. no children.""" return not self.Children def tipChildren(self): """Returns direct children of self that are tips.""" return [i for i in self.Children if not i.Children] def nonTipChildren(self): """Returns direct children in self that have descendants.""" return [i for i in self.Children if i.Children] def childGroups(self): """Returns list containing lists of children sharing a state. In other words, returns runs of tip and nontip children. """ #bail out in trivial cases of 0 or 1 item if not self.Children: return [] if len(self.Children) == 1: return [self.Children[0]] #otherwise, have to do it properly... 
result = [] curr = [] state = None for i in self.Children: curr_state = bool(i.Children) if curr_state == state: curr.append(i) else: if curr: result.append(curr) curr = [] curr.append(i) state = curr_state #handle last group result.append(curr) return result def lastCommonAncestor(self, other): """Finds last common ancestor of self and other, or None. Always tests by identity. """ my_lineage = set([id(node) for node in [self] + self.ancestors()]) curr = other while curr is not None: if id(curr) in my_lineage: return curr curr = curr._parent return None def lowestCommonAncestor(self, tipnames): """Lowest common ancestor for a list of tipnames This should be around O(H sqrt(n)), where H is height and n is the number of tips passed in. """ if len(tipnames) == 1: return self.getNodeMatchingName(tipnames[0]) tipnames = set(tipnames) tips = [tip for tip in self.tips() if tip.Name in tipnames] if len(tips) == 0: return None # scrub tree if hasattr(self, 'black'): for n in self.traverse(include_self=True): if hasattr(n, 'black'): delattr(n, 'black') for t in tips: prev = t curr = t.Parent while curr and not hasattr(curr,'black'): setattr(curr,'black',[prev]) prev = curr curr = curr.Parent # increase black count, multiple children lead to here if curr: curr.black.append(prev) curr = self while len(curr.black) == 1: curr = curr.black[0] return curr lca = lastCommonAncestor #for convenience #support for more advanced tree operations def separation(self, other): """Returns number of edges separating self and other.""" #detect trivial case if self is other: return 0 #otherwise, check the list of ancestors my_ancestors = dict.fromkeys(map(id, [self] + self.ancestors())) count = 0 while other is not None: if id(other) in my_ancestors: #need to figure out how many steps there were back from self curr = self while not(curr is None or curr is other): count += 1 curr = curr._parent return count else: count += 1 other = other._parent return None def descendantArray(self, 
                        tip_list=None):
        """Returns numpy array with nodes in rows and descendants in columns.

        A value of 1 indicates that the descendant is a descendant of that
        node. A value of 0 indicates that it is not.

        Also returns a list of nodes in the same order as they are listed in
        the array.

        tip_list is a list of the names of the tips that will be considered,
        in the order they will appear as columns in the final array. Internal
        nodes will appear as rows in preorder traversal order.
        """
        #get a list of internal nodes
        node_list = [node for node in self.traverse() if node.Children]
        node_list.sort()

        #get a list of tip names if one is not supplied
        if not tip_list:
            tip_list = [n.Name for n in self.tips()]
            tip_list.sort()

        #make a blank array of the right dimensions to alter
        result = zeros([len(node_list), len(tip_list)])

        #put 1 in the column for each child of each node
        for (i, node) in enumerate(node_list):
            children = [n.Name for n in node.tips()]
            for (j, dec) in enumerate(tip_list):
                if dec in children:
                    result[i,j] = 1

        return result, node_list

    def _default_tree_constructor(self):
        return TreeBuilder(constructor=self.__class__).edgeFromEdge

    def nameUnnamedNodes(self):
        """sets the Name property of unnamed nodes to an arbitrary value

        Internal nodes are often unnamed and so this function assigns a
        value for referencing."""
        #make a list of the names that are already in the tree
        names_in_use = []
        for node in self.traverse():
            if node.Name:
                names_in_use.append(node.Name)
        #assign unique names to nodes where Name is None
        name_index = 1
        for node in self.traverse():
            if not node.Name:
                new_name = 'node' + str(name_index)
                #choose a new name if name is already in tree
                while new_name in names_in_use:
                    name_index += 1
                    new_name = 'node' + str(name_index)
                node.Name = new_name
                names_in_use.append(new_name)
                name_index += 1

    def makeTreeArray(self, dec_list=None):
        """Makes an array with nodes in rows and descendants in columns.
        A value of 1 indicates that the descendant is a descendant of that
        node. A value of 0 indicates that it is not.

        Also returns a list of nodes in the same order as they are listed
        in the array.
        """
        #get a list of internal nodes
        node_list = [node for node in self.traverse() if node.Children]
        node_list.sort()

        #get a list of tips() Name if one is not supplied
        if not dec_list:
            dec_list = [dec.Name for dec in self.tips()]
            dec_list.sort()

        #make a blank array of the right dimensions to alter
        result = zeros((len(node_list), len(dec_list)))

        #put 1 in the column for each child of each node
        for i, node in enumerate(node_list):
            children = [dec.Name for dec in node.tips()]
            for j, dec in enumerate(dec_list):
                if dec in children:
                    result[i,j] = 1

        return result, node_list

    def removeDeleted(self, is_deleted):
        """Removes all nodes where is_deleted tests true.

        Internal nodes that have no children as a result of removing deleted
        are also removed.
        """
        #Traverse tree
        for node in list(self.traverse(self_before=False, self_after=True)):
            #if node is deleted
            if is_deleted(node):
                #Store current parent
                curr_parent = node.Parent
                #Set current node's parent to None (this deletes node)
                node.Parent = None
                #While there are no children at node and not at root
                while (curr_parent is not None) and (not curr_parent.Children):
                    #Save old parent
                    old_parent = curr_parent
                    #Get new parent
                    curr_parent = curr_parent.Parent
                    #remove old node from tree
                    old_parent.Parent = None

    def prune(self):
        """Reconstructs correct topology after nodes have been removed.

        Internal nodes with only one child will be removed and new
        connections will be made to reflect change.
        """
        #traverse tree to decide nodes to be removed.
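The removeDeleted/prune pattern described above can be sketched standalone: drop nodes matching a predicate (and internal nodes emptied as a result), then collapse single-child internal nodes. This sketch ignores branch lengths, which the real prune merges; the minimal `Node` class is a stand-in, not PyCogent's TreeNode:

```python
# Minimal stand-in node class (illustrative only, not PyCogent's TreeNode).
class Node(object):
    def __init__(self, name, children=()):
        self.Name = name
        self.Children = list(children)

def remove_where(node, is_deleted):
    """Drop flagged nodes (postorder), plus internal nodes emptied out."""
    kept = []
    for c in node.Children:
        was_internal = bool(c.Children)
        remove_where(c, is_deleted)
        if is_deleted(c) or (was_internal and not c.Children):
            continue
        kept.append(c)
    node.Children = kept

def collapse_single(node):
    """Collapse single-child internal nodes below node (node itself kept)."""
    for c in node.Children:
        collapse_single(c)
    node.Children = [c.Children[0] if len(c.Children) == 1 else c
                     for c in node.Children]

r = Node('r', [Node('i1', [Node('a'), Node('b')]), Node('i2', [Node('c')])])
remove_where(r, lambda n: n.Name in ('b', 'c'))
collapse_single(r)
print([c.Name for c in r.Children])  # ['a']
```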
nodes_to_remove = [] for node in self.traverse(): if (node.Parent is not None) and (len(node.Children)==1): nodes_to_remove.append(node) for node in nodes_to_remove: #save current parent curr_parent=node.Parent #save child child=node.Children[0] #remove current node by setting parent to None node.Parent=None #Connect child to current node's parent child.Parent=curr_parent def sameShape(self, other): """Ignores lengths and order, so trees should be sorted first""" if len(self.Children) != len(other.Children): return False if self.Children: for (self_child, other_child) in zip(self.Children, other.Children): if not self_child.sameShape(other_child): return False return True else: return self.Name == other.Name def getNewickRecursive(self, with_distances=False, semicolon=True, \ escape_name=True): """Return the newick string for this edge. Arguments: - with_distances: whether branch lengths are included. - semicolon: end tree string with a semicolon - escape_name: if any of these characters []'"(),:;_ exist in a nodes name, wrap the name in single quotes """ newick = [] subtrees = [child.getNewick(with_distances, semicolon=False) for child in self.Children] if subtrees: newick.append("(%s)" % ",".join(subtrees)) if self.NameLoaded: if self.Name is None: name = '' else: name = str(self.Name) if escape_name and not (name.startswith("'") and \ name.endswith("'")): if re.search("""[]['"(),:;_]""", name): name = "'%s'" % name.replace("'","''") else: name = name.replace(' ','_') newick.append(name) if isinstance(self, PhyloNode): if with_distances and self.Length is not None: newick.append(":%s" % self.Length) if semicolon: newick.append(";") return ''.join(newick) def getNewick(self, with_distances=False, semicolon=True, escape_name=True): """Return the newick string for this tree. Arguments: - with_distances: whether branch lengths are included. 
- semicolon: end tree string with a semicolon - escape_name: if any of these characters []'"(),:;_ exist in a nodes name, wrap the name in single quotes NOTE: This method returns the Newick representation of this node and its descendents. This method is a modification of an implementation by Zongzhi Liu """ result = ['('] nodes_stack = [[self, len(self.Children)]] node_count = 1 while nodes_stack: node_count += 1 #check the top node, any children left unvisited? top = nodes_stack[-1] top_node, num_unvisited_children = top if num_unvisited_children: #has any child unvisited top[1] -= 1 #decrease the #of children unvisited next_child = top_node.Children[-num_unvisited_children] # - for order #pre-visit if next_child.Children: result.append('(') nodes_stack.append([next_child, len(next_child.Children)]) else: #no unvisited children nodes_stack.pop() #post-visit if top_node.Children: result[-1] = ')' if top_node.NameLoaded: if top_node.Name is None: name = '' else: name = str(top_node.Name) if escape_name and not (name.startswith("'") and \ name.endswith("'")): if re.search("""[]['"(),:;_]""", name): name = "'%s'" % name.replace("'", "''") else: name = name.replace(' ','_') result.append(name) if isinstance(self, PhyloNode): if with_distances and top_node.Length is not None: #result.append(":%s" % top_node.Length) result[-1] = "%s:%s" % (result[-1], top_node.Length) result.append(',') len_result = len(result) if len_result == 2: # single node no name if semicolon: return ";" else: return '' elif len_result == 3: # single node with name if semicolon: return "%s;" % result[1] else: return result[1] else: if semicolon: result[-1] = ';' else: result.pop(-1) return ''.join(result) def removeNode(self, target): """Removes node by identity instead of value. Returns True if node was present, False otherwise. 
""" to_delete = None for i, curr_node in enumerate(self.Children): if curr_node is target: to_delete = i break if to_delete is None: return False else: del self[to_delete] return True def getEdgeNames(self, tip1name, tip2name, getclade, getstem, outgroup_name=None): """Return the list of stem and/or sub tree (clade) edge name(s). This is done by finding the common intersection, and then getting the list of names. If the clade traverses the root, then use the outgroup_name argument to ensure valid specification. Arguments: - tip1/2name: edge 1/2 names - getstem: whether the name of the clade stem edge is returned. - getclade: whether the names of the edges within the clade are returned - outgroup_name: if provided the calculation is done on a version of the tree re-rooted relative to the provided tip. Usage: The returned list can be used to specify subtrees for special parameterisation. For instance, say you want to allow the primates to have a different value of a particular parameter. In this case, provide the results of this method to the parameter controller method `setParamRule()` along with the parameter name etc.. """ # If outgroup specified put it at the top of the tree so that clades are # defined by their distance from it. This makes a temporary tree with # a named edge at it's root, but it's only used here then discarded. 
        if outgroup_name is not None:
            outgroup = self.getNodeMatchingName(outgroup_name)
            if outgroup.Children:
                raise TreeError('Outgroup (%s) must be a tip' % outgroup_name)
            self = outgroup.unrootedDeepcopy()

        join_edge = self.getConnectingNode(tip1name, tip2name)

        edge_names = []

        if getstem:
            if join_edge.isroot():
                raise TreeError('LCA(%s,%s) is the root and so has no stem' %
                                (tip1name, tip2name))
            else:
                edge_names.append(join_edge.Name)

        if getclade:
            #get the list of names contained by join_edge
            for child in join_edge.Children:
                branchnames = child.getNodeNames(includeself=1)
                edge_names.extend(branchnames)

        return edge_names

    def _getNeighboursExcept(self, parent=None):
        # For walking the tree as if it was unrooted.
        return [c for c in (tuple(self.Children) + (self.Parent,))
                if c is not None and c is not parent]

    def _getDistances(self, endpoints=None):
        """Iteratively calculates all of the root-to-tip and tip-to-tip
        distances, resulting in a tuple of:
            - A list of (name, path length) pairs.
            - A dictionary of (tip1, tip2): distance pairs
        """
        ## linearize the tips in postorder.
        # .__start, .__stop compose the slice in tip_order.
        if endpoints is None:
            tip_order = list(self.tips())
        else:
            tip_order = []
            for i, name in enumerate(endpoints):
                node = self.getNodeMatchingName(name)
                tip_order.append(node)
        for i, node in enumerate(tip_order):
            node.__start, node.__stop = i, i+1

        num_tips = len(tip_order)
        result = {}
        tipdistances = zeros((num_tips), float) #distances from tip to curr node

        def update_result():
            # set tip_tip distance between tips of different child
            for child1, child2 in comb(node.Children, 2):
                for tip1 in range(child1.__start, child1.__stop):
                    for tip2 in range(child2.__start, child2.__stop):
                        name1 = tip_order[tip1].Name
                        name2 = tip_order[tip2].Name
                        result[(name1, name2)] = \
                            tipdistances[tip1] + tipdistances[tip2]
                        result[(name2, name1)] = \
                            tipdistances[tip1] + tipdistances[tip2]

        for node in self.traverse(self_before=False, self_after=True):
            if not node.Children:
                continue
            ## subtree with solved child wedges
            starts, stops = [], [] #to calc ._start and ._stop for curr node
            for child in node.Children:
                if hasattr(child, 'Length') and child.Length is not None:
                    child_len = child.Length
                else:
                    child_len = 1 # default length
                tipdistances[child.__start : child.__stop] += child_len
                starts.append(child.__start); stops.append(child.__stop)
            node.__start, node.__stop = min(starts), max(stops)
            ## update result if necessary
            if len(node.Children) > 1: #not single child
                update_result()

        from_root = []
        for i, n in enumerate(tip_order):
            from_root.append((n.Name, tipdistances[i]))
        return from_root, result

    def getDistances(self, endpoints=None):
        """The distance matrix as a dictionary.

        Usage:
            Grabs the branch lengths (evolutionary distances) as a complete
            matrix (i.e. a,b and b,a).
        """
        (root_dists, endpoint_dists) = self._getDistances(endpoints)
        return endpoint_dists

    def setMaxTipTipDistance(self):
        """Propagate tip distance information up the tree

        This method was originally implemented by Julia Goodrich with the
        intent of being able to determine max tip to tip distances between
        nodes on large trees efficiently.
The code has been modified to track the specific tips the distance is between """ for n in self.postorder(): if n.isTip(): n.MaxDistTips = [[0.0, n.Name], [0.0, n.Name]] else: if len(n.Children) == 1: tip_a, tip_b = n.Children[0].MaxDistTips tip_a[0] += n.Children[0].Length or 0.0 tip_b[0] += n.Children[0].Length or 0.0 else: tip_info = [(max(c.MaxDistTips), c) for c in n.Children] dists = [i[0][0] for i in tip_info] best_idx = argsort(dists)[-2:] tip_a, child_a = tip_info[best_idx[0]] tip_b, child_b = tip_info[best_idx[1]] tip_a[0] += child_a.Length or 0.0 tip_b[0] += child_b.Length or 0.0 n.MaxDistTips = [tip_a, tip_b] def getMaxTipTipDistance(self): """Returns the max tip tip distance between any pair of tips Returns (dist, tip_names, internal_node) """ if not hasattr(self, 'MaxDistTips'): self.setMaxTipTipDistance() longest = 0.0 names = [None,None] best_node = None for n in self.nontips(include_self=True): tip_a, tip_b = n.MaxDistTips dist = (tip_a[0] + tip_b[0]) if dist > longest: longest = dist best_node = n names = [tip_a[1], tip_b[1]] return longest, names, best_node def maxTipTipDistance(self): """returns the max distance between any pair of tips Also returns the tip names that it is between as a tuple""" distmtx, tip_order = self.tipToTipDistances() idx_max = divmod(distmtx.argmax(),distmtx.shape[1]) max_pair = (tip_order[idx_max[0]].Name, tip_order[idx_max[1]].Name) return distmtx[idx_max], max_pair def _getSubTree(self, included_names, constructor=None, keep_root=False): """An equivalent node with possibly fewer children, or None""" # Renumber autonamed edges if constructor is None: constructor = self._default_tree_constructor() if self.Name in included_names: return self.deepcopy(constructor=constructor) else: # don't need to pass keep_root to children, though # internal nodes will be elminated this way children = [child._getSubTree(included_names, constructor) for child in self.Children] children = [child for child in children if child is not None] 
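The two-best-depths idea behind setMaxTipTipDistance above (each node keeps its two deepest tip distances, so the tree diameter falls out of one postorder pass) can be sketched standalone. The minimal `Node` class is a stand-in, not PyCogent's TreeNode:

```python
# Minimal stand-in node class (illustrative only, not PyCogent's TreeNode).
class Node(object):
    def __init__(self, name, length=0.0, children=()):
        self.Name, self.Length, self.Children = name, length, list(children)

def diameter(node):
    """Return (max tip-to-tip distance in node's subtree, max tip depth)."""
    if not node.Children:
        return 0.0, 0.0
    best, depths = 0.0, []
    for c in node.Children:
        sub_best, sub_depth = diameter(c)
        best = max(best, sub_best)
        depths.append(sub_depth + c.Length)
    depths.sort()
    if len(depths) > 1:
        # best path through this node uses its two deepest child wedges
        best = max(best, depths[-1] + depths[-2])
    return best, depths[-1]

t = Node('r', children=[Node('a', 2.0),
                        Node('i', 1.0, children=[Node('b', 3.0),
                                                 Node('c', 4.0)])])
print(diameter(t)[0])  # 7.0
```

The method above additionally records *which* tips realize the maximum, which this sketch omits.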
            if len(children) == 0:
                result = None
            elif len(children) == 1 and not keep_root:
                # Merge parameter dictionaries by adding lengths and making
                # weighted averages of other parameters. This should probably
                # be moved out of here into a ParameterSet class (Model?) or
                # tree subclass.
                params = {}
                child = children[0]
                if self.Length is not None and child.Length is not None:
                    shared_params = [n for (n, v) in self.params.items()
                            if v is not None
                            and child.params.get(n) is not None
                            and n != "length"]
                    length = self.Length + child.Length
                    if length:
                        params = dict([(n,
                                (self.params[n] * self.Length +
                                child.params[n] * child.Length) / length)
                                for n in shared_params])
                    params['length'] = length
                result = child
                result.params = params
            else:
                result = constructor(self, tuple(children))
            return result

    def getSubTree(self, name_list, ignore_missing=False, keep_root=False):
        """A new instance of a sub tree that contains all the otus that are
        listed in name_list.

        ignore_missing: if False, getSubTree will raise a ValueError if
        name_list contains names that aren't nodes in the tree

        keep_root: if False, the root of the subtree will be the last common
        ancestor of all nodes kept in the subtree. Root to tip distance is
        then (possibly) different from the original tree. If True, the root
        to tip distance remains constant, but root may only have one child
        node.
""" edge_names = set(self.getNodeNames(includeself=1, tipsonly=False)) if not ignore_missing: # this may take a long time for name in name_list: if name not in edge_names: raise ValueError("edge %s not found in tree" % name) new_tree = self._getSubTree(name_list, keep_root=keep_root) if new_tree is None: raise TreeError, "no tree created in make sub tree" elif new_tree.istip(): raise TreeError, "only a tip was returned from selecting sub tree" else: new_tree.Name = "root" # keep unrooted if len(self.Children) > 2: new_tree = new_tree.unrooted() return new_tree def _edgecount(self, parent, cache): """"The number of edges beyond 'parent' in the direction of 'self', unrooted""" neighbours = self._getNeighboursExcept(parent) key = (id(parent), id(self)) if key not in cache: cache[key] = 1 + sum([child._edgecount(self, cache) for child in neighbours]) return cache[key] def _imbalance(self, parent, cache): """The edge count from here, (except via 'parent'), divided into that from the heaviest neighbour, and that from the rest of them. 'cache' should be a dictionary that can be shared by calls to self.edgecount, it stores the edgecount for each node (from self) without having to put it on the tree itself.""" max_weight = 0 total_weight = 0 for child in self._getNeighboursExcept(parent): weight = child._edgecount(self, cache) total_weight += weight if weight > max_weight: max_weight = weight biggest_branch = child return (max_weight, total_weight-max_weight, biggest_branch) def _sorted(self, sort_order): """Score all the edges, sort them, and return minimum score and a sorted tree. 
""" # Only need to duplicate whole tree because of .Parent pointers constructor = self._default_tree_constructor() if not self.Children: tree = self.deepcopy(constructor) score = sort_order.index(self.Name) else: scored_subtrees = [child._sorted(sort_order) for child in self.Children] scored_subtrees.sort() children = tuple([child.deepcopy(constructor) for (score, child) in scored_subtrees]) tree = constructor(self, children) non_null_scores = [score for (score, child) in scored_subtrees if score is not None] score = (non_null_scores or [None])[0] return (score, tree) def sorted(self, sort_order=[]): """An equivalent tree sorted into a standard order. If this is not specified then alphabetical order is used. At each node starting from root, the algorithm will try to put the descendant which contains the lowest scoring tip on the left. """ tip_names = self.getTipNames() tip_names.sort() full_sort_order = sort_order + tip_names (score, tree) = self._sorted(full_sort_order) return tree def _asciiArt(self, char1='-', show_internal=True, compact=False): LEN = 10 PAD = ' ' * LEN PA = ' ' * (LEN-1) namestr = self.Name or '' # prevents name of NoneType if self.Children: mids = [] result = [] for c in self.Children: if c is self.Children[0]: char2 = '/' elif c is self.Children[-1]: char2 = '\\' else: char2 = '-' (clines, mid) = c._asciiArt(char2, show_internal, compact) mids.append(mid+len(result)) result.extend(clines) if not compact: result.append('') if not compact: result.pop() (lo, hi, end) = (mids[0], mids[-1], len(result)) prefixes = [PAD] * (lo+1) + [PA+'|'] * (hi-lo-1) + [PAD] * (end-hi) mid = (lo + hi) / 2 prefixes[mid] = char1 + '-'*(LEN-2) + prefixes[mid][-1] result = [p+l for (p,l) in zip(prefixes, result)] if show_internal: stem = result[mid] result[mid] = stem[0] + namestr + stem[len(namestr)+1:] return (result, mid) else: return ([char1 + '-' + namestr], 0) def asciiArt(self, show_internal=True, compact=False): """Returns a string containing an ascii drawing 
        of the tree.

        Arguments:
            - show_internal: includes internal edge names.
            - compact: use exactly one line per tip.
        """
        (lines, mid) = self._asciiArt(
                show_internal=show_internal, compact=compact)
        return '\n'.join(lines)

    def _getXmlLines(self, indent=0, parent_params=None):
        """Return the xml strings for this edge."""
        params = {}
        if parent_params is not None:
            params.update(parent_params)
        pad = '  ' * indent
        xml = ["%s<clade>" % pad]
        if self.NameLoaded:
            xml.append("%s   <name>%s</name>" % (pad, self.Name))
        for (n, v) in self.params.items():
            if v == params.get(n, None):
                continue
            xml.append("%s   <param><name>%s</name><value>%s</value></param>"
                    % (pad, n, v))
            params[n] = v
        for child in self.Children:
            xml.extend(child._getXmlLines(indent + 1, params))
        xml.append(pad + "</clade>")
        return xml

    def getXML(self):
        """Return XML formatted tree string."""
        header = ['<?xml version="1.0"?>']
        return '\n'.join(header + self._getXmlLines())

    def reassignNames(self, mapping, nodes=None):
        """Reassigns node names based on a mapping dict

        mapping : dict, old_name -> new_name
        nodes : specific nodes for renaming (such as just tips, etc...)
        """
        if nodes is None:
            nodes = self.traverse()

        for n in nodes:
            if n.Name in mapping:
                n.Name = mapping[n.Name]

    def multifurcating(self, num, eps=None, constructor=None, \
            name_unnamed=False):
        """Return a new tree with every node having num or fewer children

        num : the maximum number of children a node can have
        eps : default branch length to set if self or constructor is of
            PhyloNode type
        constructor : a TreeNode or subclass constructor.
If None, uses self """ if num < 2: raise TreeError, "Minimum number of children must be >= 2" if eps is None: eps = 0.0 if constructor is None: constructor = self.__class__ if hasattr(constructor, 'Length'): set_branchlength = True else: set_branchlength = False new_tree = self.copy() for n in new_tree.preorder(include_self=True): while len(n.Children) > num: new_node = constructor(Children=n.Children[-num:]) if set_branchlength: new_node.Length = eps n.append(new_node) if name_unnamed: alpha = 'abcdefghijklmnopqrstuvwxyz' alpha += alpha.upper() base = 'AUTOGENERATED_NAME_%s' # scale the random names by tree size s = int(ceil(log(len(new_tree.tips())))) for n in new_tree.nontips(): if n.Name is None: n.Name = base % ''.join([choice(alpha) for i in range(s)]) return new_tree def bifurcating(self, eps=None, constructor=None, name_unnamed=False): """Wrap multifurcating with a num of 2""" return self.multifurcating(2, eps, constructor, name_unnamed) def getNodesDict(self): """Returns a dict keyed by node name, value is node Will raise TreeError if non-unique names are encountered """ res = {} for n in self.traverse(): if n.Name in res: raise TreeError, "getNodesDict requires unique node names" else: res[n.Name] = n return res def subset(self): """Returns set of names that descend from specified node""" return frozenset([i.Name for i in self.tips()]) def subsets(self): """Returns all sets of names that come from specified node and its kids""" sets = [] for i in self.traverse(self_before=False, self_after=True, \ include_self=False): if not i.Children: i.__leaf_set = frozenset([i.Name]) else: leaf_set = reduce(or_, [c.__leaf_set for c in i.Children]) if len(leaf_set) > 1: sets.append(leaf_set) i.__leaf_set = leaf_set return frozenset(sets) def compareBySubsets(self, other, exclude_absent_taxa=False): """Returns fraction of overlapping subsets where self and other differ. Other is expected to be a tree object compatible with PhyloNode. 
Note: names present in only one of the two trees will count as mismatches: if you don't want this behavior, strip out the non-matching tips first. """ self_sets, other_sets = self.subsets(), other.subsets() if exclude_absent_taxa: in_both = self.subset() & other.subset() self_sets = [i & in_both for i in self_sets] self_sets = frozenset([i for i in self_sets if len(i) > 1]) other_sets = [i & in_both for i in other_sets] other_sets = frozenset([i for i in other_sets if len(i) > 1]) total_subsets = len(self_sets) + len(other_sets) intersection_length = len(self_sets & other_sets) if not total_subsets: #no common subsets after filtering, so max dist return 1 return 1 - 2*intersection_length/float(total_subsets) def tipToTipDistances(self, default_length=1): """Returns distance matrix between all pairs of tips, and a tip order. Warning: .__start and .__stop added to self and its descendants. tip_order contains the actual node objects, not their names (may be confusing in some cases). """ ## linearize the tips in postorder. # .__start, .__stop compose the slice in tip_order. 
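The linearization trick described in the comments above (tips laid out left-to-right, each internal node owning a contiguous [start, stop) slice of them, with child wedges summed into a running per-tip distance vector) can be sketched standalone. The minimal `Node` class is a stand-in, not PyCogent's TreeNode:

```python
from itertools import combinations

# Minimal stand-in node class (illustrative only, not PyCogent's TreeNode).
class Node(object):
    def __init__(self, name, length=1.0, children=()):
        self.Name, self.Length, self.Children = name, length, list(children)

def tip_to_tip(root):
    tips, dists = [], {}
    def assign(node):
        # postorder: give each node a contiguous [start, stop) tip slice
        if not node.Children:
            node.start, node.stop = len(tips), len(tips) + 1
            tips.append(node)
            return
        for c in node.Children:
            assign(c)
        node.start = node.Children[0].start
        node.stop = node.Children[-1].stop
    assign(root)
    tipdist = [0.0] * len(tips)  # distance from each tip up to current node
    def visit(node):
        for c in node.Children:
            visit(c)
        for c in node.Children:
            # extend each child wedge by the child's branch length
            for t in range(c.start, c.stop):
                tipdist[t] += c.Length
        for c1, c2 in combinations(node.Children, 2):
            # tips in different wedges meet at this node
            for t1 in range(c1.start, c1.stop):
                for t2 in range(c2.start, c2.stop):
                    d = tipdist[t1] + tipdist[t2]
                    dists[(tips[t1].Name, tips[t2].Name)] = d
                    dists[(tips[t2].Name, tips[t1].Name)] = d
    visit(root)
    return dists

t = Node('r', children=[Node('a', 2.0),
                        Node('i', 1.0, children=[Node('b', 3.0),
                                                 Node('c', 4.0)])])
print(tip_to_tip(t)[('a', 'c')])  # 7.0
```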
        tip_order = list(self.tips())
        for i, tip in enumerate(tip_order):
            tip.__start, tip.__stop = i, i+1

        num_tips = len(tip_order)
        result = zeros((num_tips, num_tips), float) #tip by tip matrix
        tipdistances = zeros((num_tips), float) #distances from tip to curr node

        def update_result():
            # set tip_tip distance between tips of different child
            for child1, child2 in comb(node.Children, 2):
                for tip1 in range(child1.__start, child1.__stop):
                    for tip2 in range(child2.__start, child2.__stop):
                        result[tip1, tip2] = \
                            tipdistances[tip1] + tipdistances[tip2]

        for node in self.traverse(self_before=False, self_after=True):
            if not node.Children:
                continue
            ## subtree with solved child wedges
            starts, stops = [], [] #to calc ._start and ._stop for curr node
            for child in node.Children:
                if hasattr(child, 'Length') and child.Length is not None:
                    child_len = child.Length
                else:
                    child_len = default_length
                tipdistances[child.__start : child.__stop] += child_len
                starts.append(child.__start); stops.append(child.__stop)
            node.__start, node.__stop = min(starts), max(stops)
            ## update result if necessary
            if len(node.Children) > 1: #not single child
                update_result()
        return result + result.T, tip_order

    def compareByTipDistances(self, other, dist_f=distance_from_r):
        """Compares self to other using tip-to-tip distance matrices.

        Value returned is dist_f(m1, m2) for the two matrices. Default is
        to use the Pearson correlation coefficient, with +1 giving a distance
        of 0 and -1 giving a distance of +1 (the maximum possible value).
        Depending on the application, you might instead want to use
        distance_from_r_squared, which counts correlations of both +1 and -1
        as identical (0 distance).

        Note: automatically strips out the names that don't match (this is
        necessary for this method because the distance between non-matching
        names and matching names is undefined in the tree where they don't
        match, and because we need to reorder the names in the two trees to
        match up the distance matrices).
""" self_names = [i.Name for i in self.tips()] other_names = [i.Name for i in other.tips()] common_names = frozenset(self_names) & frozenset(other_names) if not common_names: raise ValueError, "No names in common between the two trees.""" if len(common_names) <= 2: return 1 #the two trees must match by definition in this case #figure out correct order of the two name matrices self_order = [self_names.index(i) for i in common_names] other_order = [other_names.index(i) for i in common_names] self_matrix = self.tipToTipDistances()[0][self_order][:,self_order] other_matrix = other.tipToTipDistances()[0][other_order][:,other_order] return dist_f(self_matrix, other_matrix) class PhyloNode(TreeNode): def __init__(self, *args, **kwargs): length = kwargs.get('Length', None) params = kwargs.get('Params', {}) if 'length' not in params: params['length'] = length kwargs['Params'] = params super(PhyloNode, self).__init__(*args, **kwargs) def _set_length(self, value): if not hasattr(self, "params"): self.params = {} self.params["length"] = value def _get_length(self): return self.params.get("length", None) Length = property(_get_length, _set_length) def getNewick(self, with_distances=False, semicolon=True, escape_name=True): return TreeNode.getNewick(self, with_distances, semicolon, escape_name) def __str__(self): """Returns string version of self, with names and distances.""" return self.getNewick(with_distances=True) def distance(self, other): """Returns branch length between self and other.""" #never any length between self and other if self is other: return 0 #otherwise, find self's ancestors and find the first ancestor of #other that is in the list self_anc = self.ancestors() self_anc_dict = dict([(id(n),n) for n in self_anc]) self_anc_dict[id(self)] = self count = 0 while other is not None: if id(other) in self_anc_dict: #found the first shared ancestor -- need to sum other branch curr = self while curr is not other: if curr.Length: count += curr.Length curr = curr._parent 
return count
            else:
                if other.Length:
                    count += other.Length
                other = other._parent
        return None

    def totalDescendingBranchLength(self):
        """Returns total descending branch length from self"""
        return sum([n.Length for n in self.traverse(include_self=False) \
            if n.Length is not None])

    def tipsWithinDistance(self, distance):
        """Returns tips within specified distance from self

        Branch lengths of None will be interpreted as 0
        """
        def get_distance(d1, d2):
            if d2 is None:
                return d1
            else:
                return d1 + d2

        to_process = [(self, 0.0)]
        tips_to_save = []
        curr_node, curr_dist = to_process[0]
        seen = set([id(self)])
        while to_process:
            curr_node, curr_dist = to_process.pop(0)

            # have we found a tip within distance?
            if curr_node.isTip() and curr_node != self:
                tips_to_save.append(curr_node)
                continue

            # add the parent node if it is within distance
            parent_dist = get_distance(curr_dist, curr_node.Length)
            if curr_node.Parent is not None and parent_dist <= distance and \
                    id(curr_node.Parent) not in seen:
                to_process.append((curr_node.Parent, parent_dist))
                seen.add(id(curr_node.Parent))

            # add children if we haven't seen them and if they are in distance
            for child in curr_node.Children:
                if id(child) in seen:
                    continue
                seen.add(id(child))

                child_dist = get_distance(curr_dist, child.Length)
                if child_dist <= distance:
                    to_process.append((child, child_dist))

        return tips_to_save

    def prune(self):
        """Reconstructs correct tree after nodes have been removed.

        Internal nodes with only one child will be removed, and new
        connections and branch lengths will be made to reflect the change.
        """
        #traverse tree to decide nodes to be removed.
nodes_to_remove = [] for node in self.traverse(): if (node.Parent is not None) and (len(node.Children)==1): nodes_to_remove.append(node) for node in nodes_to_remove: #save current parent curr_parent=node.Parent #save child child=node.Children[0] #remove current node by setting parent to None node.Parent=None #Connect child to current node's parent child.Parent=curr_parent #Add the Length of the removed node to the Length of the Child if child.Length is None or node.Length is None: child.Length = child.Length or node.Length else: child.Length = child.Length + node.Length def unrootedDeepcopy(self, constructor=None, parent=None): # walks the tree unrooted-style, ie: treating self.Parent as just # another child 'parent' is where we got here from, ie: the neighbour # that we don't need to explore. if constructor is None: constructor = self._default_tree_constructor() neighbours = self._getNeighboursExcept(parent) children = [] for child in neighbours: children.append(child.unrootedDeepcopy(constructor, parent=self)) # we might be walking UP the tree, so: if parent is None: # base edge edge = None elif parent.Parent is self: # self's parent is becoming self's child, and edge params are stored # by the child edge = parent else: assert parent is self.Parent edge = self result = constructor(edge, tuple(children)) if parent is None: result.Name = "root" return result def balanced(self): """Tree 'rooted' here with no neighbour having > 50% of the edges. Usage: Using a balanced tree can substantially improve performance of the likelihood calculations. Note that the resulting tree has a different orientation with the effect that specifying clades or stems for model parameterisation should be done using the 'outgroup_name' argument. """ # this should work OK on ordinary 3-way trees, not so sure about # other cases. Given 3 neighbours, if one has > 50% of edges it # can only improve things to divide it up, worst case: # (51),25,24 -> (50,1),49. 
# If no neighbour has >50% we can't improve on where we are, eg: # (49),25,26 -> (20,19),51 last_edge = None edge = self known_weight = 0 cache = {} while 1: (max_weight, remaining_weight, next_edge) = edge._imbalance( last_edge, cache) known_weight += remaining_weight if max_weight <= known_weight+2: break last_edge = edge edge = next_edge known_weight += 1 return edge.unrootedDeepcopy() def sameTopology(self, other): """Tests whether two trees have the same topology.""" tip_names = self.getTipNames() root_at = tip_names[0] me = self.rootedWithTip(root_at).sorted(tip_names) them = other.rootedWithTip(root_at).sorted(tip_names) return self is other or me.sameShape(them) def unrooted(self): """A tree with at least 3 children at the root. """ constructor = self._default_tree_constructor() need_to_expand = len(self.Children) < 3 new_children = [] for oldnode in self.Children: if oldnode.Children and need_to_expand: for sib in oldnode.Children: sib = sib.deepcopy(constructor) if sib.Length is not None and oldnode.Length is not None: sib.Length += oldnode.Length new_children.append(sib) need_to_expand = False else: new_children.append(oldnode.deepcopy(constructor)) return constructor(self, new_children) def rootedAt(self, edge_name): """Return a new tree rooted at the provided node. Usage: This can be useful for drawing unrooted trees with an orientation that reflects knowledge of the true root location. """ newroot = self.getNodeMatchingName(edge_name) if not newroot.Children: raise TreeError("Can't use a tip (%s) as the root" % repr(edge_name)) return newroot.unrootedDeepcopy() def rootedWithTip(self, outgroup_name): """A new tree with the named tip as one of the root's children""" tip = self.getNodeMatchingName(outgroup_name) return tip.Parent.unrootedDeepcopy() def rootAtMidpoint(self): """ return a new tree rooted at midpoint of the two tips farthest apart this fn doesn't preserve the internal node naming or structure, but does keep tip to tip distances correct. 
        uses unrootedDeepcopy()
        """
        # max_dist, tip_names = tree.maxTipTipDistance() # this is slow
        max_dist, tip_names = self.maxTipTipDistance()
        half_max_dist = max_dist/2.0
        if max_dist == 0.0: # only pathological cases with no lengths
            return self.unrootedDeepcopy()
        # print tip_names
        tip1 = self.getNodeMatchingName(tip_names[0])
        tip2 = self.getNodeMatchingName(tip_names[1])
        lca = self.getConnectingNode(tip_names[0],tip_names[1]) # last common ancestor
        if tip1.distance(lca) > half_max_dist:
            climb_node = tip1
        else:
            climb_node = tip2

        dist_climbed = 0.0
        while dist_climbed + climb_node.Length < half_max_dist:
            dist_climbed += climb_node.Length
            climb_node = climb_node.Parent

        # now the midpoint is either on the branch to climb_node's parent,
        # or exactly at climb_node's parent
        # print dist_climbed, half_max_dist, 'dists cl hamax'
        if dist_climbed + climb_node.Length == half_max_dist:
            # climb to midpoint spot
            climb_node = climb_node.Parent
            if climb_node.isTip():
                raise RuntimeError('error trying to root tree at tip')
            else:
                # print climb_node.Name, 'clmb node'
                return climb_node.unrootedDeepcopy()
        else:
            # make a new node on climb_node's branch to its parent
            old_br_len = climb_node.Length
            new_root = type(self)()
            new_root.Parent = climb_node.Parent
            climb_node.Parent = new_root
            climb_node.Length = half_max_dist - dist_climbed
            new_root.Length = old_br_len - climb_node.Length
            return new_root.unrootedDeepcopy()

    def _find_midpoint_nodes(self, max_dist, tip_pair):
        """returns the nodes surrounding the maxTipTipDistance midpoint

        WAS used for midpoint rooting.
ORPHANED NOW max_dist: The maximum distance between any 2 tips tip_pair: Names of the two tips associated with max_dist """ half_max_dist = max_dist/2.0 #get a list of the nodes that separate the tip pair node_path = self.getConnectingEdges(tip_pair[0], tip_pair[1]) tip1 = self.getNodeMatchingName(tip_pair[0]) for index, node in enumerate(node_path): dist = tip1.distance(node) if dist > half_max_dist: return node, node_path[index-1] def setTipDistances(self): """Sets distance from each node to the most distant tip.""" for node in self.traverse(self_before=False, self_after=True): if node.Children: node.TipDistance = max([c.Length + c.TipDistance for \ c in node.Children]) else: node.TipDistance = 0 def scaleBranchLengths(self, max_length=100, ultrametric=False): """Scales BranchLengths in place to integers for ascii output. Warning: tree might not be exactly the length you specify. Set ultrametric=True if you want all the root-tip distances to end up precisely the same. """ self.setTipDistances() orig_max = max([n.TipDistance for n in self.traverse()]) if not ultrametric: #easy case -- just scale and round for node in self.traverse(): curr = node.Length if curr is not None: node.ScaledBranchLength = \ max(1, int(round(1.0*curr/orig_max*max_length))) else: #hard case -- need to make sure they all line up at the end for node in self.traverse(self_before=False, self_after=True): if not node.Children: #easy case: ignore tips node.DistanceUsed = 0 continue #if we get here, we know the node has children #figure out what distance we want to set for this node ideal_distance=int(round(node.TipDistance/orig_max*max_length)) min_distance = max([c.DistanceUsed for c in node.Children]) + 1 distance = max(min_distance, ideal_distance) for c in node.Children: c.ScaledBranchLength = distance - c.DistanceUsed node.DistanceUsed = distance #reset the BranchLengths for node in self.traverse(self_before=True, self_after=False): if node.Length is not None: node.Length = 
node.ScaledBranchLength
            if hasattr(node, 'ScaledBranchLength'):
                del node.ScaledBranchLength
            if hasattr(node, 'DistanceUsed'):
                del node.DistanceUsed
            if hasattr(node, 'TipDistance'):
                del node.TipDistance

    def _getDistances(self, endpoints=None):
        """Iteratively calculates all of the root-to-tip and tip-to-tip
        distances, resulting in a tuple of:
            - A list of (name, path length) pairs.
            - A dictionary of (tip1,tip2):distance pairs
        """
        ## linearize the tips in postorder.
        # .__start, .__stop compose the slice in tip_order.
        if endpoints is None:
            tip_order = list(self.tips())
        else:
            tip_order = []
            for i,name in enumerate(endpoints):
                node = self.getNodeMatchingName(name)
                tip_order.append(node)
        for i, node in enumerate(tip_order):
            node.__start, node.__stop = i, i+1

        num_tips = len(tip_order)
        result = {}
        tipdistances = zeros((num_tips), float) #distances from tip to curr node

        def update_result():
            # set tip_tip distance between tips of different child
            for child1, child2 in comb(node.Children, 2):
                for tip1 in range(child1.__start, child1.__stop):
                    for tip2 in range(child2.__start, child2.__stop):
                        name1 = tip_order[tip1].Name
                        name2 = tip_order[tip2].Name
                        result[(name1,name2)] = \
                            tipdistances[tip1] + tipdistances[tip2]
                        result[(name2,name1)] = \
                            tipdistances[tip1] + tipdistances[tip2]

        for node in self.traverse(self_before=False, self_after=True):
            if not node.Children:
                continue
            ## subtree with solved child wedges
            starts, stops = [], [] #to calc ._start and ._stop for curr node
            for child in node.Children:
                if hasattr(child, 'Length') and child.Length is not None:
                    child_len = child.Length
                else:
                    child_len = 1 # default length
                tipdistances[child.__start : child.__stop] += child_len
                starts.append(child.__start); stops.append(child.__stop)
            node.__start, node.__stop = min(starts), max(stops)
            ## update result if necessary
            if len(node.Children) > 1: #not single child
                update_result()

        from_root = []
        for i,n in enumerate(tip_order):
            from_root.append((n.Name, tipdistances[i]))
        return from_root,
result def getDistances(self, endpoints=None): """The distance matrix as a dictionary. Usage: Grabs the branch lengths (evolutionary distances) as a complete matrix (i.e. a,b and b,a).""" (root_dists, endpoint_dists) = self._getDistances(endpoints) return endpoint_dists def tipToTipDistances(self, endpoints=None, default_length=1): """Returns distance matrix between all pairs of tips, and a tip order. Warning: .__start and .__stop added to self and its descendants. tip_order contains the actual node objects, not their names (may be confusing in some cases). """ all_tips = self.tips() if endpoints is None: tip_order = list(all_tips) else: if isinstance(endpoints[0], PhyloNode): tip_order = endpoints else: tip_order = [self.getNodeMatchingName(n) for n in endpoints] ## linearize all tips in postorder # .__start, .__stop compose the slice in tip_order. for i, node in enumerate(all_tips): node.__start, node.__stop = i, i+1 # the result map provides index in the result matrix result_map = dict([(n.__start,i) for i,n in enumerate(tip_order)]) num_all_tips = len(all_tips) # total number of tips num_tips = len(tip_order) # total number of tips in result result = zeros((num_tips, num_tips), float) # tip by tip matrix tipdistances = zeros((num_all_tips), float) # dist from tip to curr node def update_result(): # set tip_tip distance between tips of different child for child1, child2 in comb(node.Children, 2): for tip1 in range(child1.__start, child1.__stop): if tip1 not in result_map: continue res_tip1 = result_map[tip1] for tip2 in range(child2.__start, child2.__stop): if tip2 not in result_map: continue result[res_tip1,result_map[tip2]] = \ tipdistances[tip1] + tipdistances[tip2] for node in self.traverse(self_before=False, self_after=True): if not node.Children: continue ## subtree with solved child wedges starts, stops = [], [] #to calc ._start and ._stop for curr node for child in node.Children: if hasattr(child, 'Length') and child.Length is not None: child_len = 
child.Length
                else:
                    child_len = default_length
                tipdistances[child.__start : child.__stop] += child_len
                starts.append(child.__start); stops.append(child.__stop)
            node.__start, node.__stop = min(starts), max(stops)
            ## update result if necessary
            if len(node.Children) > 1: #not single child
                update_result()
        return result+result.T, tip_order

    def compareByTipDistances(self, other, sample=None, dist_f=distance_from_r,\
        shuffle_f=shuffle):
        """Compares self to other using tip-to-tip distance matrices.

        Value returned is dist_f(m1, m2) for the two matrices. Default is to
        use the Pearson correlation coefficient, with +1 giving a distance of
        0 and -1 giving a distance of +1 (the maximum possible value).
        Depending on the application, you might instead want to use
        distance_from_r_squared, which counts correlations of both +1 and -1
        as identical (0 distance).

        Note: automatically strips out the names that don't match (this is
        necessary for this method because the distance between non-matching
        names and matching names is undefined in the tree where they don't
        match, and because we need to reorder the names in the two trees to
        match up the distance matrices).
        """
        self_names = dict([(i.Name, i) for i in self.tips()])
        other_names = dict([(i.Name, i) for i in other.tips()])
        common_names = frozenset(self_names.keys()) & \
            frozenset(other_names.keys())
        common_names = list(common_names)

        if not common_names:
            raise ValueError, "No names in common between the two trees."
        if len(common_names) <= 2:
            return 1 #the two trees must match by definition in this case

        if sample is not None:
            shuffle_f(common_names)
            common_names = common_names[:sample]

        self_nodes = [self_names[k] for k in common_names]
        other_nodes = [other_names[k] for k in common_names]

        self_matrix = self.tipToTipDistances(endpoints=self_nodes)[0]
        other_matrix = other.tipToTipDistances(endpoints=other_nodes)[0]

        return dist_f(self_matrix, other_matrix)

class TreeBuilder(object):
    # Some tree code which isn't needed once the tree is finished.
    # Mostly exists to give edges unique names
    # Children must be created before their parents.

    def __init__(self, mutable=False, constructor=PhyloNode):
        self._used_names = {'edge':-1}
        self._known_edges = {}
        self.TreeNodeClass = constructor

    def _unique_name(self, name):
        # Unnamed edges become edge.0, edge.1 edge.2 ...
        # Other duplicates go mouse mouse.2 mouse.3 ...
        if not name:
            name = 'edge'
        if name in self._used_names:
            self._used_names[name] += 1
            name += '.'
+ str(self._used_names[name])
            name = self._unique_name(name) # in case of names like 'edge.1.1'
        else:
            self._used_names[name] = 1
        return name

    def _params_for_edge(self, edge):
        # default is just to keep it
        return edge.params

    def edgeFromEdge(self, edge, children, params=None):
        """Callback for tree-to-tree transforms like getSubTree"""
        if edge is None:
            assert not params
            return self.createEdge(children, "root", {}, False)
        else:
            if params is None:
                params = self._params_for_edge(edge)
            return self.createEdge(
                children, edge.Name, params, nameLoaded=edge.NameLoaded)

    def createEdge(self, children, name, params, nameLoaded=True):
        """Callback for newick parser"""
        if children is None:
            children = []
        node = self.TreeNodeClass(
            Children = list(children),
            Name = self._unique_name(name),
            NameLoaded = nameLoaded and (name is not None),
            Params = params,
            )
        self._known_edges[id(node)] = node
        return node

PyCogent-1.5.3/cogent/core/usage.py

#!/usr/bin/env python
"""Classes for dealing with base, codon, and amino acid usage.
""" from __future__ import division from cogent.maths.stats.util import Freqs, Numbers, UnsafeFreqs from cogent.maths.stats.special import fix_rounding_error from cogent.util.array import euclidean_distance from cogent.util.misc import Delegator, FunctionWrapper, InverseDict from cogent.core.genetic_code import GeneticCodes, GeneticCode as GenCodeClass from cogent.core.info import Info as InfoClass from cogent.core.alphabet import CharAlphabet from string import upper from numpy import array, concatenate, sum, mean, isfinite, sqrt __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Rob Knight", "Sandra Smit", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Rob Knight" __email__ = "rob@spot.colorado.edu" __status__ = "Production" RnaBases = CharAlphabet('UCAG') DnaBases = CharAlphabet('TCAG') AminoAcids = CharAlphabet('ACDEFGHIKLMNPQRSTVWY*') # * denotes termination AB = CharAlphabet('ab') #used for testing Chars = CharAlphabet(''.join(map(chr, range(256))), '-') #used for raw chars RnaBasesGap = CharAlphabet('UCAG-', '-') DnaBasesGap = CharAlphabet('TCAG-', '-') AminoAcidsGap = CharAlphabet('ACDEFGHIKLMNPQRSTVWY*-', '-') DnaIupac = CharAlphabet('TCAGNVBHDKSWMYR') RnaIupac = CharAlphabet('UCAGNVBHDKSWMYR') AminoAcidsIupac = CharAlphabet('ACDEFGHIKLMNPQRSTVWY*XBZ') DnaIupacGap = CharAlphabet('TCAG-NVBHDKSWMYR', '-') RnaIupacGap = CharAlphabet('UCAG-NVBHDKSWMYR', '-') AminoAcidsIupacGap = CharAlphabet('ACDEFGHIKLMNPQRSTVWY*-XBZ', '-') RnaPairs = RnaBases**2 DnaPairs = DnaBases**2 RnaGapPairs = RnaBasesGap**2 DnaGapPairs = DnaBasesGap**2 AminoAcidPairs = AminoAcids**2 ABPairs = AB**2 #RnaBases = 'UCAG' #DnaBases = 'TCAG' RnaCodons = [i+j+k for i in RnaBases for j in RnaBases for k in RnaBases] DnaCodons = [i+j+k for i in DnaBases for j in DnaBases for k in DnaBases] #AminoAcids = 'ACDEFGHIKLMNPQRSTVWY*' SGC = GeneticCodes[1] RnaDinucs = [i+j for i in RnaBases for j in RnaBases] RnaToDna = 
dict(zip(RnaBases, DnaBases)) DnaToRna = dict(zip(DnaBases, RnaBases)) Bases = RnaBases #by default Codons = RnaCodons #by default _equal_bases = Freqs(Bases) _equal_codons = Freqs(Codons) _equal_amino_acids = Freqs(AminoAcids[:-1]) #exclude Stop for i in (_equal_bases, _equal_codons, _equal_amino_acids): i.normalize() empty_rna_codons = dict.fromkeys(RnaCodons, 0.0) empty_dna_codons = dict.fromkeys(DnaCodons, 0.0) def seq_to_codon_dict(seq, empty_codons=empty_dna_codons): """Converts sequence into codon dict.""" leftover = len(seq) % 3 if leftover: seq += 'A' * (3-leftover) result = empty_codons.copy() for i in range(0, len(seq), 3): curr = seq[i:i+3] if curr in result: #ignore others result[curr] += 1 return result def UnsafeCodonsFromString(seq, rna=False, formatted=False, **kwargs): if rna: d = empty_rna_codons if not formatted: seq = seq.upper().replace('T','U') else: d = empty_dna_codons if not formatted: seq = seq.upper().replace('U','T') return UnsafeCodonUsage(seq_to_codon_dict(seq,d), **kwargs) def key_to_rna(key): """Sets key to uppercase RNA.""" return key.upper().replace('T', 'U') def key_to_dna(key): """Sets key to uppercase DNA.""" return key.upper().replace('U', 'T') class InfoFreqs(Freqs, Delegator): """Like Freqs, but has an Info object storing additional data. Intended for holding base or codon frequencies that come from a particular sequence, so that the Info of the sequence can be preserved even if the sequence is deleted to free up memory. """ def __init__(self, data=None, Info=None, **kwargs): """Intializes BaseUsage with data, either sequence or dict of freqs. Ignores additional kwargs (e.g. to support copy). Makes the _handler for delegator accessible with the name Info. 
""" if Info is None: if hasattr(data, 'Info'): Info = data.Info else: Info = InfoClass() Delegator.__init__(self, Info) Freqs.__init__(self, data or [], **kwargs) def _get_info(self): """Accessor for Info.""" return self._handler def _set_info(self, obj): """Mutator for Info.""" self._handler = obj Info = property(_get_info, _set_info) class BaseUsageI(object): """Provides shared interface for BaseUsage classes. BaseUsage stores counts of the four DNA or RNA bases. """ def bases(self): """Supports bases/codons/positionalBases/aminoAcids interface.""" return self def codons(self): """Predicts codon frequencies from the base frequencies.""" result = {} base_copy = self.__class__(self) base_copy.normalize() for c in Codons: curr = 1 for i in c: curr *= base_copy[i] result[c] = curr return CodonUsage(result, self.Info) def positionalBases(self): """Returns PositionalBaseUsage with copy of self at each position.""" return PositionalBaseUsage(self.__class__(self), self.__class__(self), self.__class__(self), self.Info) def aminoAcids(self, genetic_code=SGC): """Predicts amino acid frequencies from the base frequencies.""" return self.codons().aminoAcids(genetic_code) def distance(self,other): """Calculates the distance between two BaseUsages. Distance is measured in three directions, CG-content, CU-content, and GU-content. """ return euclidean_distance(array(self.toCartesian()),\ array(other.toCartesian())) def content(self, string): """Gets the sum of bases specified in string. For example, self.content('GC') gets the GC content. """ return sum([self.get(i, 0) for i in string], 0) def toCartesian(self): """Returns tuple of x, y, z coordinates from BaseUsage. x=u+c, y=u+g, z=u+a Doesn't alter original object. """ return self['UC'], self['UG'], self['UA'] def fromCartesian(cls, *coords): """Returns new BaseUsage with A,C,G,U coordinates from UC,UG,UA. From UC,UG,UA to A,C,G(,U). 
This will only work when the original coordinates come from a simplex, where U+C+A+G=1 """ result = cls() x,y,z = coords u = fix_rounding_error((1-x-y-z)/-2) a, c, g = z-u, x-u, y-u result['A'] = a result['C'] = c result['G'] = g result['U'] = u return result fromCartesian = classmethod(fromCartesian) class BaseUsage(BaseUsageI, InfoFreqs): """Stores frequencies of the four bases, mapped to RNA. This class is convenient but inefficient, since it automatically maps any lookups to the uppercase RNA alphabet internally. Use UnsafeBaseUsage for speed when necessary. """ Mask = FunctionWrapper(key_to_rna) RequiredKeys = dict.fromkeys(Bases) def __getitem__(self, key): """Normalizes key and treats T=U.""" key = self.Mask(key) if len(key) == 2: #pair of bases, e.g. GC for GC content dup = BaseUsage(self) dup.normalize() return sum([dup.get(i,0) for i in key], 0) else: return super(BaseUsage, self).__getitem__(key) class UnsafeBaseUsage(BaseUsageI, UnsafeFreqs): """Stores frequencies of the four bases. Does not do any validation. This class avoids overriding most built-ins, so is much faster than BaseFreqs (although it is often less convenient). """ RequiredKeys = dict.fromkeys(Bases) Info = {} # for interface compatibility with InfoFreqs-based class class CodonUsageI(object): """Stores codon usage for a gene or species. Note that CodonUsage objects get their own reference to _default_code during creation, so changing CodonUsage._default_code will not change the GeneticCode of any CodonUsage object that has already been created. 
""" _default_code = SGC BlockAbbreviations = \ {'UU':'F/L', 'CU':'Leu', 'AU':'I/M', 'GU':'Val', \ 'UC':'Ser', 'CC':'Pro', 'AC':'Thr', 'GC':'Ala',\ 'UA':'Y/*','CA':'H/Q','AA':'N/K','GA':'D/E',\ 'UG':'C*W', 'CG':'Arg', 'AG':'S/R', 'GG':'Gly'} BlockNames = \ {'UU':'Phe/Leu', 'CU':'Leucine', 'AU':'Ile/Met', 'GU':'Valine', \ 'UC':'Serine', 'CC':'Proline', 'AC':'Threonine', 'GC':'Alanine',\ 'UA':'Tyr/Ter','CA':'His/Gln','AA':'Asn/Lys','GA':'Asp/Glu',\ 'UG':'Cys/Ter/Trp', 'CG':'Arginine', 'AG':'Ser/Arg', 'GG':'Glycine'} Blocks = [i+j for i in 'UCAG' for j in 'UCAG'] #UCAG order SingleAABlocks = ['GC','CG','GG','CU','CC','UC','AC','GU'] #alpha by aa SplitBlocks = ['UU', 'CA','AU','AA','AG','GA'] #UCAG order] AbbreviationsToBlocks = InverseDict(BlockAbbreviations) NamesToBlocks = InverseDict(BlockNames) BaseUsageClass = None #Must overrride in subclasses def bases(self, purge_unwanted=False): """Returns overall base counts.""" result = {} if purge_unwanted: data = self._purged_data() else: data = self for codon, freq in data.items(): for base in codon: if base in result: result[base] += freq else: result[base] = freq return self.BaseUsageClass(result, self.Info) def codons(self): """Supports codons/aminoAcids/bases/positionalBases interface.""" return self def rscu(self): """Normalizes self in-place to RSCU, relative synonymous codon usage. RSCU divides the frequency of each codon to the sum of the freqs for that codon's amino acid. 
""" gc = self.GeneticCode syn = gc.Synonyms aa_sums = {} for key, codons in syn.items(): aa_sums[key] = sum([self[c] for c in codons], 0) for codon in self: try: curr = self[codon] res = curr/aa_sums[gc[codon]] except (KeyError, ZeroDivisionError, FloatingPointError): pass else: if isfinite(res): self[codon] = res return self def _purged_data(self): """Copy of self's freqs after removing bad/stop codons and singlets.""" good_codons = self.RequiredKeys code = self.GeneticCode data = dict(self) #need copy, since we're deleting required keys #delete any bad codons for codon in self: if codon not in good_codons: del data[codon] #delete any stop codons in the current code for codon in code['*']: try: c = codon.upper().replace('T','U') del data[c] except KeyError: pass #don't care if it's not there #delete any single-item blocks in the current code (i.e. leaving #only doublets and quartets). for group in code.Blocks: if len(group) == 1: try: c = group[0].upper().replace('T','U') del data[c] except KeyError: pass #don't care if already deleted return data def positionalBases(self, purge_unwanted=False): """Calculates positional base usage. purge_unwanted controls whether or not to purge 1-codon groups, stop codons, and any codons containing degnerate bases before calculating the base usage (e.g. to get Noboru Sueoka's P3 measurement): default is False. Deletion of unwanted codons happens on a copy, not the original data. 
""" first = {} second = {} third = {} if purge_unwanted: #make a copy of the data and delete things from it data = self._purged_data() else: data = self for codon, freq in data.items(): try: p1, p2, p3 = codon except ValueError: continue #skip any incomplete codons if p1 in first: first[p1] += freq else: first[p1] = freq if p2 in second: second[p2] += freq else: second[p2] = freq if p3 in third: third[p3] += freq else: third[p3] = freq return PositionalBaseUsage(self.BaseUsageClass(first), \ self.BaseUsageClass(second), self.BaseUsageClass(third), self.Info) def positionalGC(self, purge_unwanted=True): """Returns GC, P1, P2 P3. Use purge_unwanted=False to get raw counts.""" p = self.positionalBases(purge_unwanted) p.normalize() result = [i['G'] + i['C'] for i in p] average = sum(result, 0)/3 return [average] + result def fingerprint(self, which_blocks='quartets', include_mean=True,\ normalize=True): """Returns fingerprint data for fingerprint plots. which_blocks: whether to include only the usual 4-codon quartets (the default), the split blocks only, or all blocks. include_mean: whether to include the mean (True). normalize: whether to normalize so that the quartets sum to 1 (True) """ if which_blocks == 'split': blocks = self.SplitBlocks elif which_blocks == 'quartets': blocks = self.SingleAABlocks elif which_blocks == 'all': blocks = self.Blocks else: raise "Got invalid option %s for which_blocks:\n" % which_blocks+\ " (valid options: 'split', 'quartets', 'all')." 
result = [] for b in blocks: #iterates over doublet string U, C, A, G = [self[b+i] for i in 'UCAG'] all = U+C+A+G if G+C: g_ratio = G/(G+C) else: g_ratio = 0.5 if A+U: a_ratio = A/(A+U) else: a_ratio=0.5 result.append([g_ratio, a_ratio, all]) result = array(result) if normalize: #make the shown bubbles sum to 1 sum_ = sum(result[:,-1]) if sum_: result[:,-1] /= sum_ if include_mean: #calculate mean from all codons third = self.positionalBases().Third U, C, A, G = [third[i] for i in 'UCAG'] if G+C: g_ratio = G/(G+C) else: g_ratio = 0.5 if A+U: a_ratio = A/(A+U) else: a_ratio=0.5 result = concatenate((result, array([[g_ratio,a_ratio,1]]))) return result def pr2bias(self, block): """Calculates PR2 bias for a specified block, e.g. 'AC' or 'UU'. Returns tuple of: (G/G+C, A/A+T, G/G+A, C/C+T, G/G+T, C/C+A) If any pair sums to zero, will raise ZeroDivisionError. block: codon block, e.g. 'AC', 'UU', etc. Any of the 16 doublets. """ U, C, A, G = [self[block+i] for i in 'UCAG'] return G/(G+C), A/(A+U), G/(G+A), C/(C+U), G/(G+U), C/(C+A) def aminoAcids(self, genetic_code=None): """Calculates amino acid usage, optionally using a specified code.""" if genetic_code is None: curr_code = self.GeneticCode elif isinstance(genetic_code, GenCodeClass): curr_code = genetic_code else: curr_code = GeneticCodes[genetic_code] aa = {} for codon, freq in self.items(): curr_aa = curr_code[codon] if curr_aa in aa: aa[curr_aa] += freq else: aa[curr_aa] = freq return AminoAcidUsage(aa, self.Info) class CodonUsage(CodonUsageI, InfoFreqs): """Stores frequencies of the 64 codons, mapped to RNA. This class is convenient but inefficient, since it automatically maps any lookups to the uppercase RNA alphabet internally. Use UnsafeBaseUsage for speed when necessary. 
""" Mask = FunctionWrapper(key_to_rna) RequiredKeys = RnaCodons BaseUsageClass = BaseUsage def __init__(self, data=None, Info=None, GeneticCode=None, \ Mask=None, ValueMask=None, Constraint=None): """Initializes new CodonUsage with Info and frequency data. Note: Mask, ValueMask and Constraint are ignored, but must be present to support copy() because of the ConstrainedContainer interface. """ #check if we have a sequence: if so, take it 3 bases at a time #this will (properly) fail on lists of tuples or anything else where #the items don't behave like strings. try: codons = [''.join(data[i:i+3]) for i in xrange(0, len(data), 3)] except: codons = data super(CodonUsage, self).__init__(codons, Info) if GeneticCode: if isinstance(GeneticCode, GenCodeClass): curr_code = GeneticCode else: curr_code = GeneticCodes[GeneticCode] else: curr_code = self._default_code self.__dict__['GeneticCode'] = curr_code def __getitem__(self, key): """Normalizes key and treats T=U.""" key = self.Mask(key) if len(key) == 2: #pair of bases, e.g. GC for GC content dup = BaseUsage(self) dup.normalize() return sum([dup.get(i,0) for i in key], 0) else: return super(CodonUsage, self).__getitem__(key) class UnsafeCodonUsage(CodonUsageI, UnsafeFreqs): """Stores frequencies of the four bases. Must access as RNA. This class avoids overriding most built-ins, so is much faster than CodonFreqs (although it is often less convenient). """ RequiredKeys = RnaCodons Info = {} # for interface compatibility with InfoFreqs-based class Gene=None #for CUTG compatibility Species=None # for CUTG compaitibility BaseUsageClass = UnsafeBaseUsage def __init__(self, data=None, Info=None, GeneticCode=None, \ Mask=None, ValueMask=None, Constraint=None): """Initializes new CodonUsage with Info and frequency data. Note: Mask, ValueMask and Constraint are ignored, but must be present to support copy() because of the ConstrainedContainer interface. 
""" #check if we have a sequence: if so, take it 3 bases at a time #this will (properly) fail on lists of tuples or anything else where #the items don't behave like strings. try: codons = [''.join(data[i:i+3]) for i in xrange(0, len(data), 3)] except: codons = data or {} UnsafeFreqs.__init__(self, codons) #set required keys for k in self.RequiredKeys: if k not in self: self[k] = 0.0 #flatten Info onto self directly for lookups if Info: self.__dict__.update(Info) self.Info = Info or {} if GeneticCode: if isinstance(GeneticCode, GenCodeClass): curr_code = GeneticCode else: curr_code = GeneticCodes[GeneticCode] else: curr_code = self._default_code self.GeneticCode = curr_code class PositionalBaseUsage(Delegator): """Stores a BaseUsage for each of the three codon positions.""" def __init__(self, First=None, Second=None, Third=None, Info=None): """Returns new PositionalBaseUsage with values for the 3 positions.""" Delegator.__init__(self, Info) self.__dict__['First'] = First or BaseUsage() self.__dict__['Second'] = Second or BaseUsage() self.__dict__['Third'] = Third or BaseUsage() def _get_info(self): """Accessor for Info.""" return self._handler def _set_info(self, obj): """Mutator for Info.""" self._handler = obj Info = property(_get_info, _set_info) def __getitem__(self, index): """Supports lookups by index.""" if index == 0 or index == -3: return self.First elif index == 1 or index == -2: return self.Second elif index == 2 or index == -1: return self.Third else: raise IndexError, "PositionalBaseUsage only has 3 positions." 
def normalize(self): """Normalizes each of the component base usages.""" self.First.normalize() self.Second.normalize() self.Third.normalize() def bases(self): """Returns distribution of the four bases, summed over positions.""" sum = BaseUsage(Info=self.Info) for i in self: sum +=i return sum def codons(self): """Returns codon distribution, calculated from positional freqs.""" result = {} first_copy, second_copy, third_copy = map(Freqs, self) first_copy.normalize() second_copy.normalize() third_copy.normalize() for c in Codons: result[c] = first_copy[c[0]] * second_copy[c[1]] * third_copy[c[2]] return CodonUsage(result, self.Info) def positionalBases(self): """Supports bases/codons/positionalBases/aminoAcids interface.""" return self def aminoAcids(self, genetic_code=None): """Returns amino acid distribution.""" return self.codons().aminoAcids(genetic_code) class AminoAcidUsage(InfoFreqs): """Stores counts ofthe 20 canonical amino acids.""" Mask = FunctionWrapper(upper) RequiredKeys = dict.fromkeys(AminoAcids) def bases(self, genetic_code=SGC, codon_usage=_equal_codons): """Predicts most likely set of base frequencies. Optionally uses a genetic code (default: standard genetic code) and codon usage (default: unbiased codon usage). """ result = self.codons(genetic_code, codon_usage).bases() result.normalize() return result def codons(self, genetic_code=SGC, codon_usage=_equal_codons): """Predicts most likely set of codon frequencies. Optionally uses genetic_code (to figure out which codons belong with each amino acid), and codon_usage (to get most likely codons for each amino acid). Defaults are the standard genetic code and unbiased codon frequencies. """ result = {} normalized = Freqs(self) normalized.normalize() for aa, aa_freq in normalized.items(): curr_codons = [c.upper().replace('T','U') for c in genetic_code[aa]] if not curr_codons: continue #code might be missing some amino acids? 
curr_codon_freqs = Numbers([codon_usage[c] for c in curr_codons]) curr_codon_freqs.normalize() for codon, c_freq in zip(curr_codons, curr_codon_freqs): result[codon] = c_freq * aa_freq return CodonUsage(result, self.info, genetic_code) def positionalBases(self, genetic_code=SGC, codon_usage=_equal_codons): """Predicts most likely set of positional base frequencies. Optionally uses a genetic code (default: standard genetic code) and codon usage (default: unbiased codon usage). """ return self.codons(genetic_code, codon_usage).positionalBases() def aminoAcids(self): """Supports bases/positionalBases/aminoAcids/codons interface.""" return self class DinucI(object): """Provides shared interface for DinucUsage classes. DinucUsage stores counts of the 16 doublets. """ def distance(self, other): """Calculates distance between two DinucUsage objects.""" result = 0 for k in self.RequiredKeys: result += (self[k]-other[k])**2 return sqrt(result) class DinucUsage(DinucI, InfoFreqs): """Stores frequencies of the 16 dinucleotides, mapped to RNA. This class is convenient but inefficient, since it automatically maps any lookups to the uppercase RNA alphabet internally. Use UnsafeBaseUsage for speed when necessary. """ Mask = FunctionWrapper(key_to_rna) RequiredKeys = RnaDinucs def __init__(self, data=None, Info=None, Overlapping=True, \ GeneticCode=None, Mask=None, ValueMask=None, Constraint=None): """Initializes new CodonUsage with Info and frequency data. Note: Mask, ValueMask and Constraint are ignored, but must be present to support copy() because of the ConstrainedContainer interface. """ #check if we have a sequence: if so, take it 3 bases at a time #this will (properly) fail on lists of tuples or anything else where #the items don't behave like strings. 
if Mask is not None: self.Mask = Mask if ValueMask is not None: self.ValueMask = ValueMask try: data = self.Mask(data) if Overlapping == '3-1': range_ = range(2, len(data)-1, 3) elif Overlapping: range_ = range(0, len(data)-1) else: range_ = range(0, len(data)-1, 2) dinucs = [''.join(data[i:i+2]) for i in range_] except: dinucs = data super(DinucUsage, self).__init__(dinucs, Info) def __getitem__(self, key): """Normalizes key and treats T=U.""" key = self.Mask(key) return super(DinucUsage, self).__getitem__(key) #some useful constants... EqualBases = BaseUsage() EqualBases = BaseUsage(_equal_bases) EqualPositionalBases = PositionalBaseUsage(BaseUsage(_equal_bases), BaseUsage(_equal_bases), BaseUsage(_equal_bases)) EqualAminoAcids = AminoAcidUsage(_equal_amino_acids) #excl. Stop EqualCodons = CodonUsage(_equal_codons) PyCogent-1.5.3/cogent/cluster/__init__.py000644 000765 000024 00000000750 12024702176 021314 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """cluster: provides tools for clustering """ __all__ = ['goodness_of_fit', 'metric_scaling', 'procrustes', 'UPGMA', 'nmds', 'approximate_mds'] __author__ = "" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Catherine Lozuopone", "Rob Knight", "Peter Maxwell", "Justin Kuczynski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Catherine Lozupone" __email__ = "lozupone@colorado.edu" __status__ = "Production" PyCogent-1.5.3/cogent/cluster/approximate_mds.py000644 000765 000024 00000073655 12024702176 022767 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Functions for doing fast multidimensional scaling of distances using Nystrom/LMDS approximation ans Split and Combine MDS. ===================== Nystrom ===================== Approximates an MDS mapping / a (partial) PCoA solution of an (unknown) full distance matrix using a k x n seed distance matrix. Use if you have a very high number of objects or if each distance caculation is expensive. 
Speedup comes from two factors: 1. Not all distances are calculated but only k x n. 2. Eigendecomposition is only applied to a k x k matrix. Calculations done after Platt (2005). See http://research.microsoft.com/apps/pubs/?id=69185 : ``This paper unifies the mathematical foundation of three multidimensional scaling algorithms: FastMap, MetricMap, and Landmark MDS (LMDS). All three algorithms are based on the Nystrom approximation of the eigenvectors and eigenvalues of a matrix. LMDS applies the basic Nystrom approximation, while FastMap and MetricMap use generalizations of Nystrom, including deflation and using more points to establish an embedding. Empirical experiments on the Reuters and Corel Image Features data sets show that the basic Nystrom approximation outperforms these generalizations: LMDS is more accurate than FastMap and MetricMap with roughly the same computation and can become even more accurate if allowed to be slower.`` Assume a full distance matrix D for N objects: / E | F \ D = |-----|----| \ F.t | G / The corresponding association matrix or centered inner-product matrix K is: / A | B \ K = |-----|----| \ B.t | C / where A and B are computed as follows: A_ij = - 0.50 * (E_ij^2 - 1/m SUM_p E_pj^2 - 1/m SUM_q E_iq^2 + 1/m^2 SUM_pq E_pq^2) B is computed as in Landmark MDS, because it's simpler and works better according to Platt: B_ij = - 0.50 * (F_ij^2 - 1/m SUM_q E_iq^2) In order to approximate an MDS mapping for the full matrix you only need E and F from D as seed matrix. This will mimic the distances for m seed objects. E is of dimension m x m and F of m x (N-m). E and F are then used to approximate an MDS solution x for the full distance matrix: x_ij = sqrt(g_j) * U_ij, if i<=m, and x_ij = SUM_p B_pi U_pj / sqrt(g_j) otherwise, where U_ij is the i'th component of the j'th eigenvector of A (see below) and g_j is the j'th eigenvalue of A. The index j only runs from 1 to k in order to make a k dimensional embedding.
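The equations above translate almost line for line into NumPy. The following self-contained sketch (modern ndarray syntax rather than this module's seed-matrix API; `landmark_mds_sketch` is an illustrative name) builds A and B from E and F and projects all N points:

```python
import numpy as np

def landmark_mds_sketch(E, F, dim):
    """Illustrative LMDS: E is the m x m seed distance matrix, F the
    m x (N-m) distances from seeds to the remaining objects.
    Returns N x dim coordinates, seeds first."""
    E2 = np.asarray(E, dtype=float) ** 2
    F2 = np.asarray(F, dtype=float) ** 2
    row = E2.mean(axis=1, keepdims=True)      # 1/m SUM_q E_iq^2
    col = E2.mean(axis=0, keepdims=True)      # 1/m SUM_p E_pj^2
    A = -0.5 * (E2 - row - col + E2.mean())   # A_ij as above
    B = -0.5 * (F2 - row)                     # LMDS version of B_ij
    g, U = np.linalg.eigh(A)                  # eigh returns ascending order
    order = np.argsort(g)[::-1][:dim]         # largest eigenvalues first
    g, U = np.abs(g[order]), U[:, order]      # abs, as recommended below
    seeds = U * np.sqrt(g)                    # x_ij = sqrt(g_j) * U_ij
    rest = (B.T @ U) / np.sqrt(g)             # SUM_p B_pi U_pj / sqrt(g_j)
    return np.vstack([seeds, rest])
```

For exact Euclidean input whose seeds affinely span the embedding space, this reproduces all pairwise distances, not just the k x n seeded ones.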
===================== SCMDS ===================== This is a Python/Numpy implementation of SCMDS: Tzeng J, Lu HH, Li WH. Multidimensional scaling for large genomic data sets. BMC Bioinformatics. 2008 Apr 4;9:179. PMID: 18394154 The basic idea is to avoid the computation and eigendecomposition of the full pairwise distance matrix. Instead only compute overlapping submatrices/tiles and their corresponding MDS separately. The solutions are then joined using an affine mapping approach. ================================================= """ from numpy import sign, floor, sqrt, power, mean, array from numpy import matrix, ones, dot, argsort, diag, eye from numpy import zeros, concatenate, ndarray, kron, argwhere from numpy.linalg import eig, eigh, qr from random import sample import time __author__ = "Andreas Wilm" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Fabian Sievers ", "Daniel McDonald ", "Antonio Gonzalez Pena"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Andreas Wilm" __email__ = "andreas.wilm@ucd.ie" __status__ = "Development" # print simple timings PRINT_TIMINGS = False def rowmeans(mat): """Returns a `column-vector` of row-means of the 2d input array/matrix """ if not len(mat.shape)==2: raise ValueError, "argument is not a 2D ndarray" #nrows = mat.shape[0] ## create a column vector (hack!) #cv = matrix(arange(float(nrows))) #cv = cv.T #for i in range(nrows): # cv[i] = mat[i].mean() # # As pointed out by Daniel the above is much simpler done in Numpy: cv = matrix(mean(mat, axis=1).reshape((mat.shape[0], 1))) return cv def affine_mapping(matrix_x, matrix_y): """Returns an affine mapping function. Arguments are two different MDS solutions/mappings (identical dimensions) on the same objects/distances. Affine mapping function: Y = UX + kron(b,ones(1,n)), UU' = I X = [x_1, x_2, ... , x_n]; x_j \in R^m Y = [y_1, y_2, ...
, y_n]; y_j \in R^m From Tzeng 2008: `The projection of xi,2 from q2 dimension to q1 dimension induces computational errors (too). To avoid this error, the sample number of the overlapping region is important. This sample number must be large enough so that the derived dimension of data is greater or equal to the real data` Notes: - This was called moving in scmdscale.m - We work on tranposes, i.e. coordinates for objects are in columns Arguments: - `matrix_x`: first mds solution (`reference`) - `matrix_y`: seconds mds solution Return: A tuple of unitary operator u and shifting operator b such that: y = ux + kron(b, ones(1,n)) """ if matrix_x.shape != matrix_y.shape: raise ValueError, \ "input matrices are not of same size" if not matrix_x.shape[0] <= matrix_x.shape[1]: raise ValueError, \ "input matrices should have more columns than rows" # Have to check if we have not more rows than columns, otherwise, # the qr function below might behave differently in the matlab # prototype, but not here. In addition this would mean that we # have more dimensions than overlapping objects which shouldn't # happen # # matlab code uses economic qr mode (see below) which we can't use # here because we need both return matrices. # # see # http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/qr.html # [Q,R] = qr(A,0) produces the economy-size decomposition. # If m > n, only the first n columns of Q and # the first n rows of R are computed. 
# If m<=n, this is the same as [Q,R] = qr(A) # # [Q,R] = qr(A), where A is m-by-n, produces # an m-by-n upper triangular matrix R and # an m-by-m unitary matrix Q so that A = Q*R # # That's why we check above with an assert ox = rowmeans(matrix_x) oy = rowmeans(matrix_y) mx = matrix_x - kron(ox, ones((1, matrix_x.shape[1]))) my = matrix_y - kron(oy, ones((1, matrix_x.shape[1]))) (qx, rx) = qr(mx) (qy, ry) = qr(my) # sign correction # # Daniel suggest to use something like # [arg]where(sign(a.diagonal()) != sign(b.diagonal())) and then # iterate over the results. Couldn't figure out how to do this # properly :( #idx = argwhere(sign(rx.diagonal()) != sign(ry.diagonal())) #for i in idx: # qy[:,i] *= -1 for i in range(qx.shape[1]): if sign(rx[i, i]) != sign(ry[i, i]): qy[:, i] *= -1 # matrix multiply: use '*' as all arguments are of type matrix ret_u = qy * qx.transpose() ret_b = oy - ret_u * ox return (ret_u, ret_b) def adjust_mds_to_ref(mds_ref, mds_add, n_overlap): """Transforms mds_add such that the overlap mds_ref and mds_add has same configuration. As overlap (n_overlap objects) we'll use the end of mds_ref and the beginning of mds_add Both matrices must be of same dimension (column numbers) but can have different number of objects (rows) because only overlap will be used. Arguments: - `mds_ref`: reference mds solution - `mds_add`: mds solution to adjust - `n_overlap`: overlap size between mds_ref and mds_add Return: Adjusted version of mds_add which matches configuration of mds_ref """ if mds_ref.shape[1] != mds_add.shape[1]: raise ValueError, \ "given mds solutions have different dimensions" if not (mds_ref.shape[0] >= n_overlap and mds_add.shape[0] >= n_overlap): raise ValueError, \ "not enough overlap between given mds mappings" # Use transposes for affine_mapping! 
overlap_ref = mds_ref.transpose()[:, -n_overlap:] overlap_add = mds_add.transpose()[:, 0:n_overlap] (unitary_op, shift_op) = affine_mapping(overlap_add, overlap_ref) # paranoia: unitary_op is of type matrix, make sure mds_add # is as well so that we can use '*' for matrix multiplication mds_add_adj = unitary_op * matrix(mds_add.transpose()) + \ kron(shift_op, ones((1, mds_add.shape[0]))) mds_add_adj = mds_add_adj.transpose() return mds_add_adj def recenter(joined_mds): """Recenter an Mds mapping that has been created by joining, i.e. move points so that center of gravity is zero. Note: Not sure if recenter is a proper name, because I'm not exactly sure what is happening here Matlab prototype from Tzeng et al. 2008: X = zero_sum(X); # subtract column means M = X'*X; [basis,L] = eig(M); Y = X*basis; return Y = Y(:,end:-1:1); Arguments: - `mds_combined`: Return: Recentered version of `mds_combined` """ # or should we cast explictely? if not isinstance(joined_mds, matrix): raise ValueError, "mds solution has to be of type matrix" # As pointed out by Daniel: the following two loop can be done in # one if you pass down the axis variable to means() # #colmean = [] #for i in range(joined_mds.shape[1]): # colmean.append(joined_mds[:, i].mean()) #for i in range(joined_mds.shape[0]): # joined_mds[i, :] = joined_mds[i, :] - colmean # joined_mds = joined_mds - joined_mds.mean(axis=0) matrix_m = dot(joined_mds.transpose(), joined_mds) (eigvals, eigvecs) = eig(matrix_m) # Note / Question: do we need sorting? # idxs_ascending = eigvals.argsort() # idxs_descending = eigvals.argsort()[::-1]# reverse! # eigvecs = eigvecs[idxs_ascending] # eigvals = eigvals[idxs_ascending] # joined_mds and eigvecs are of type matrix so use '*' for # matrix multiplication joined_mds = joined_mds * eigvecs # NOTE: the matlab protoype reverses the vector before # returning. 
We don't because I don't know why and results are # good return joined_mds def combine_mds(mds_ref, mds_add, n_overlap): """Returns a combination of the two MDS mappings mds_ref and mds_add. This is done by finding an affine mapping on the overlap between mds_ref and mds_add and changing mds_add accordingly. As overlap we use the last n_overlap objects/rows in mds_ref and the first n_overlap objects/rows in mds_add. The overlapping part will be replaced, i.e. the returned combination has the following row numbers: mds_ref.nrows + mds_add.nrows - overlap The combined version will eventually need recentering. See recenter() Arguments: - `mds_ref`: reference mds mapping - `mds_add`: mds mapping to add """ if mds_ref.shape[1] != mds_add.shape[1]: raise ValueError, \ "given mds solutions have different dimensions" if not mds_ref.shape[0] >= n_overlap: raise ValueError, \ "not enough items for overlap in mds_ref" if not mds_add.shape[0] >= n_overlap: raise ValueError, \ "not enough items for overlap in mds_add" mds_add_adj = adjust_mds_to_ref(mds_ref, mds_add, n_overlap) combined_mds = concatenate(( \ mds_ref[0:mds_ref.shape[0]-n_overlap, :], mds_add_adj)) return combined_mds def cmds_tzeng(distmat, dim = None): """Calculate classical multidimensional scaling on a distance matrix. Faster than default implementation of dim is smaller than distmat.shape Arguments: - `distmat`: distance matrix (non-complex, symmetric ndarray) - `dim`: wanted dimensionality of MDS mapping (defaults to distmat dim) Implementation as in Matlab prototype of SCMDS, see Tzeng J et al. (2008), PMID: 18394154 """ if not isinstance(distmat, ndarray): raise ValueError, \ "Input matrix is not a ndarray" (m, n) = distmat.shape if m != n: raise ValueError, \ "Input matrix is not a square matrix" if not dim: dim = n # power goes wrong here if distmat is ndarray because of matrix # multiplication syntax difference between array and # matrix. (doesn't affect gower's variant). 
be on the safe side # and convert explicitly (it's local only): distmat = matrix(distmat) h = eye(n) - ones((n, n))/n assocmat = -h * (power(distmat, 2)) * h / 2 #print "DEBUG assocmat[:3] = %s" % assocmat[:3] (eigvals, eigvecs) = eigh(assocmat) # Recommended treatment of negative eigenvalues (by Fabian): use # absolute value (reason: SVD does the same) eigvals = abs(eigvals) ind = argsort(eigvals)[::-1] eigvals = eigvals[ind] eigvecs = eigvecs[:, ind] eigvals = eigvals[:dim] eigvecs = eigvecs[:, :dim] eigval_diagmat = matrix(diag(sqrt(eigvals))) eigvecs = eigval_diagmat * eigvecs.transpose() return (eigvecs.transpose(), eigvals) class CombineMds(object): """ Convenience class for joining MDS mappings. Several mappings can be added. This uses the above Python/Numpy implementation of SCMDS. See Tzeng et al. 2008, PMID: 18394154 """ def __init__(self, mds_ref=None): """ Init with reference MDS """ self._mds = mds_ref self._need_centering = False def add(self, mds_add, overlap_size): """Add a new MDS mapping to the existing one """ if self._mds is None: self._mds = mds_add return if not self._mds.shape[0] >= overlap_size: raise ValueError, \ "not enough items for overlap in reference mds" if not mds_add.shape[0] >= overlap_size: raise ValueError, \ "not enough items for overlap in mds to add" self._need_centering = True self._mds = combine_mds(self._mds, mds_add, overlap_size) def getFinalMDS(self): """Get final, combined MDS solution """ if self._need_centering: self._mds = recenter(self._mds) return self._mds def calc_matrix_b(matrix_e, matrix_f): """Calculates k x n-k matrix b of association matrix K See eq (14) and (15) in Platt (2005) This is where Nystrom and LMDS differ: the LMDS version leaves the constant centering term out, which simplifies the computation.
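For comparison, the classical-scaling recipe used by cmds_tzeng above (double-centering, eigh, absolute eigenvalues, descending sort, scaling by the square roots of the eigenvalues) can be sketched with plain ndarrays; `classical_mds` is an illustrative name, not this module's API:

```python
import numpy as np

def classical_mds(D, dim):
    """Illustrative classical (Torgerson) MDS, as in cmds_tzeng."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix h
    K = -0.5 * H.dot(D ** 2).dot(H)          # association matrix
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.abs(eigvals)                # abs value, as recommended above
    order = np.argsort(eigvals)[::-1][:dim]  # descending order
    return eigvecs[:, order] * np.sqrt(eigvals[order])
```

Using ndarrays with .dot() sidesteps the array-vs-matrix operator pitfall the comment above warns about.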
Arguments: - `matrix_e`: k x k part of the kxn seed distance matrix - `matrix_f`: k x n-k part of the kxn seed distance matrix """ (nrows, ncols) = matrix_f.shape if matrix_e.shape[0] != matrix_e.shape[1]: raise ValueError, "matrix_e should be quadratic" if matrix_f.shape[0] != matrix_e.shape[0]: raise ValueError, \ "matrix_e and matrix_f should have same number of rows" nseeds = matrix_e.shape[0] # row_center_e was also precomputed in calc_matrix_a but # computation is cheap # row_center_e = zeros(nseeds) for i in range(nseeds): row_center_e[i] = power(matrix_e[i, :], 2).sum()/nseeds # The following is not needed in LMDS but part of original Nystrom # # ncols_f = matrix_f.shape[1] # col_center_f = zeros(ncols_f) # for i in range(ncols_f): # col_center_f[i] = power(matrix_f[:, i], 2).sum()/nseeds # # subtract col_center_f[j] below from result[i, j] and you have # nystrom. dont subtract it and you have lmds #result = zeros((nrows, ncols)) #for i in xrange(nrows): # for j in xrange(ncols): # # # - original one line version: # # # #result[i, j] = -0.50 * ( # # power(matrix_f[i, j], 2) - # # row_center_e[i]) # # # # - optimised version avoiding pow(x,2) # # 3xfaster on a 750x3000 seed-matrix # fij = matrix_f[i, j] # result[i, j] = -0.50 * ( # fij * fij - # row_center_e[i]) # # - optimised single line version of the code block above. pointed out # by daniel. 20xfaster on a 750x3000 seed-matrix. 
cloning idea # copied from # http://stackoverflow.com/questions/1550130/cloning-row-or-column-vectors result = -0.5 * (matrix_f**2 - array([row_center_e, ]*ncols).transpose()) return result def calc_matrix_a(matrix_e): """Calculates k x k matrix a of association matrix K see eq (13) in Platt from symmetrical matrix E A_ij = - 0.50 * (E_ij^2 - 1/m SUM_p E_pj^2 - 1/m SUM_q E_iq^2 + 1/m^2 SUM_q E_pq^2 Row and colum centering terms (E_pj and E_iq) are identical because we use a k x k submatrix of a symmetrical distance m equals here ncols or ncols of matrix_e we call it nseeds Arguments: - `matrix_e`: k x k part of the kxn seed distance matrix """ if matrix_e.shape[0] != matrix_e.shape[1]: raise ValueError, "matrix_e should be quadratic" nseeds = matrix_e.shape[0] row_center = zeros(nseeds) for i in range(nseeds): row_center[i] = power(matrix_e[i, :], 2).sum()/nseeds # E should be symmetric, i.e. column and row means are identical. # Why is that not mentioned in the papers? To be on the safe side # just do this: # col_center = zeros(nseeds) # for i in range(nseeds): # col_center[i] = power(matrix_e[:, i], 2).sum()/nseeds # or simply: col_center = row_center grand_center = power(matrix_e, 2).sum()/power(nseeds, 2) # E is symmetric and so is A, which is why we don't need to loop # over the whole thing # # FIXME: Optimize # result = zeros((nseeds, nseeds)) for i in range(nseeds): for j in range(i, nseeds): # avoid pow(x,2). it's slow eij_sq = matrix_e[i, j] * matrix_e[i, j] result[i, j] = -0.50 * ( eij_sq - col_center[j] - row_center[i] + grand_center) if i != j: result[j, i] = result[i, j] return result def build_seed_matrix(fullmat_dim, seedmat_dim, getdist, permute_order=True): """Builds a seed matrix of shape seedmat_dim x fullmat_dim Returns seed-matrix and indices to restore original order (needed if permute_order was True) Arguments: - `fullmat_dim`: dimension of the unknown (square, symmetric) "input" matrix - `seedmat_dim`: requested dimension of seed matrix. 
- `getdist`: distance function to compute distances. should take two arguments i,j with an index range of 0..fullmat_dim-1 - `permute_order`: if permute_order is false, seeds will be picked sequentially. otherwise randomly """ if not seedmat_dim < fullmat_dim: raise ValueError, \ "dimension of seed matrix must be smaller than that of full matrix" if not callable(getdist): raise ValueError, "distance getter function not callable" if permute_order: picked_seeds = sample(range(fullmat_dim), seedmat_dim) else: picked_seeds = range(seedmat_dim) #assert len(picked_seeds) == seedmat_dim, ( # "mismatch between number of picked seeds and seedmat dim.") # Putting picked seeds/indices at the front is not enough, # need to change/correct all indices to maintain consistency # used_index_order = range(fullmat_dim) picked_seeds.sort() # otherwise the below fails for i, seed_idx in enumerate(picked_seeds): used_index_order.pop(seed_idx-i) used_index_order = picked_seeds + used_index_order # Order is now determined in used_index_order # first seedmat_dim objects are seeds # now create seedmat # t0 = time.clock() seedmat = zeros((len(picked_seeds), fullmat_dim)) for i in range(len(picked_seeds)): for j in range(fullmat_dim): if i < j: seedmat[i, j] = getdist(used_index_order[i], used_index_order[j]) elif i == j: continue else: seedmat[i, j] = seedmat[j, i] restore_idxs = argsort(used_index_order) if PRINT_TIMINGS: print("TIMING(%s): Seedmat calculation took %f CPU secs" % (__name__, time.clock() - t0)) # Return the seedmatrix and the list of indices which can be used to # recreate original order return (seedmat, restore_idxs) def nystrom(seed_distmat, dim): """Computes an approximate MDS mapping of an (unknown) full distance matrix using a kxn seed distance matrix. 
Returned matrix has the shape seed_distmat.shape[1] x dim """ if not seed_distmat.shape[0] < seed_distmat.shape[1]: raise ValueError, \ "seed distance matrix should have less rows than column" if not dim <= seed_distmat.shape[0]: raise ValueError, \ "number of rows of seed matrix must be >= requested dim" nseeds = seed_distmat.shape[0] nfull = seed_distmat.shape[1] # matrix E: extract columns 1--nseed # matrix_e = seed_distmat[:, 0:nseeds] #print("INFO: Extracted Matrix E which is of shape %dx%d" % # (matrix_e.shape)) # matrix F: extract columns nseed+1--end # matrix_f = seed_distmat[:, nseeds:] #print("INFO: Extracted Matrix F which is of shape %dx%d" % # (matrix_f.shape)) # matrix A # #print("INFO: Computing Matrix A") t0 = time.clock() matrix_a = calc_matrix_a(matrix_e) if PRINT_TIMINGS: print("TIMING(%s): Computation of A took %f CPU secs" % (__name__, time.clock() - t0)) # matrix B # #print("INFO: Computing Matrix B") t0 = time.clock() matrix_b = calc_matrix_b(matrix_e, matrix_f) if PRINT_TIMINGS: print("TIMING(%s): Computation of B took %f CPU secs" % (__name__, time.clock() - t0)) #print("INFO: Eigendecomposing A") t0 = time.clock() # eigh: eigen decomposition for symmetric matrices # returns: w, v # w : ndarray, shape (M,) # The eigenvalues, not necessarily ordered. # v : ndarray, or matrix object if a is, shape (M, M) # The column v[:, i] is the normalized eigenvector corresponding # to the eigenvalue w[i]. # alternative is svd: [U, S, V] = numpy.linalg.svd(matrix_a) (eigval_a, eigvec_a) = eigh(matrix_a) if PRINT_TIMINGS: print("TIMING(%s): Eigendecomposition of A took %f CPU secs" % (__name__, time.clock() - t0)) # Sort descending ind = argsort(eigval_a) ind = ind[::-1] eigval_a = eigval_a[ind] eigvec_a = eigvec_a[:, ind] #print("INFO: Estimating MDS coords") t0 = time.clock() result = zeros((nfull, dim)) # X in Platt 2005 # Preventing negative eigenvalues by using abs value. Other option # is to set negative values to zero. 
Fabian recommends using # absolute values (as in SVD) sqrt_eigval_a = sqrt(abs(eigval_a)) for i in xrange(nfull): for j in xrange(dim): if i+1 <= nseeds: val = sqrt_eigval_a[j] * eigvec_a[i, j] else: # - original, unoptimised code-block # numerator = 0.0 # for p in xrange(nseeds): # numerator += matrix_b[p, i-nseeds] * eigvec_a[p, j] # val = numerator / sqrt_eigval_a[j] # # - optimisation attempt: actually slower # numerator = sum(matrix_b[p, i-nseeds] * eigvec_a[p, j] # for p in xrange(nseeds)) # # - slightly optimised version: twice as fast on a seedmat of # size 750 x 3000 #a_mb = array([matrix_b[p, i-nseeds] for p in xrange(nseeds)]) #a_eva = array([eigvec_a[p, j] for p in xrange(nseeds)]) #val = (a_mb*a_eva).sum() / sqrt_eigval_a[j] # # - optmisation suggested by daniel: # 100fold increase on a seedmat of size 750 x 3000 numerator = (matrix_b[:nseeds, i-nseeds] * eigvec_a[:nseeds, j]).sum() val = numerator / sqrt_eigval_a[j] result[i, j] = val if PRINT_TIMINGS: print("TIMING(%s): Actual MDS approximation took %f CPU secs" % (__name__, time.clock() - t0)) return result """ ================= Nystrom example implamentation: Fast computation of an approximate MDS mapping / PCoA of an (yet unknown) full distance matrix. Returned MDS coordinates have the shape num_objects x dim. Arguments: - `num_objects`: total number of objects to compute mapping for. - `num_seeds`: number of seeds objects. high means more exact solution, but the slower - `dim`: dimensionality of MDS mapping - `dist_func`: callable distance function. arguments should be i,j, with index range 0..num_objects-1 - `permute_order`: permute order of objects. recommended to avoid caveeats with ordered data that might lead to distorted results. permutation is random. run several times for benchmarking. 
def nystrom_frontend(num_objects, num_seeds, dim, dist_func, permute_order=True): (seed_distmat, restore_idxs) = build_seed_matrix( num_objects, num_seeds, dist_func, permute_order) #picked_seeds = argsort(restore_idxs)[:num_seeds] mds_coords = nystrom(seed_distmat, dim) # restoring original order in mds_coords, which has been # altered during seed matrix calculation return mds_coords[restore_idxs] ================= SCMDS example implementation: Fast MDS approximation SCMDS. Breaks (unknown) distance matrix into smaller chunks (tiles), computes MDS solutions for each of these and joins them to form one full approximation. Arguments: - `num_objects`: number of objects in distance matrix - `tile_size`: size of tiles/submatrices. the bigger, the slower but the better the approximation - `tile_overlap`: overlap of tiles. has to be bigger than dimensionality - `dim`: requested dimensionality of MDS approximation - `dist_func`: distance function to compute distance between two objects x and y. valid index range for x and y should be 0..num_objects-1 - `permute_order`: permute input order if True. reduces distortion. order of returned coordinates is kept fixed in either case. def scmds_frontend(num_objects, tile_size, tile_overlap, dim, dist_func, permute_order=True): if num_objects < tile_size: raise ValueError, \ "Number of objects cannot be smaller than tile size" if tile_overlap > tile_size: raise ValueError, \ "Tile overlap cannot be bigger than tile size" if dim > tile_overlap: raise ValueError, \ "Tile overlap must be at least as big as requested dimensionality" if not callable(dist_func): raise ValueError, "distance getter function not callable" t0_overall = time.clock() if permute_order: order = sample(range(num_objects), num_objects) else: order = range(num_objects) ntiles = floor((num_objects - tile_size) / \ (tile_size - tile_overlap))+1 assert ntiles != 0, "Internal error: can't use 0 tiles!" # loop over all ntiles, overlapping tiles.
apply mds to each # single one and join the solutions to the growing overall # solution # tile_no = 1 tile_start = 0 tile_end = tile_size + \ ((num_objects-tile_size) % (tile_size-tile_overlap)) comb_mds = CombineMds() while tile_end <= num_objects: # beware: tile_size is just the ideal tile_size, i.e. # tile_end-tile_start might not be the same, especially for # the last tile this_tile_size = tile_end-tile_start # construct a tile (submatrix) tile = zeros((this_tile_size, this_tile_size)) for i in xrange(this_tile_size): for j in xrange(i+1, this_tile_size): tile[i, j] = dist_func(order[i+tile_start], order[j+tile_start]) tile[j, i] = tile[i, j] #print("INFO: Working on tile with idxs %d:%d gives tile shape %d:%d" % \ # (tile_start, tile_end, tile.shape[0], tile.shape[1])) # Apply MDS on this tile # t0 = time.clock() (tile_eigvecs, tile_eigvals) = cmds_tzeng(tile, dim) if PRINT_TIMINGS: print("TIMING(%s): MDS on tile %d took %f CPU secs" % (__name__, tile_no, time.clock() - t0)) # # (slower) alternative is: # #(tile_tile_eigvecs, tile_eigval) = qiime_pcoa.pcoa(tile) # pcoa computes all dims so cut down #tile_tile_eigvecs = tile_tile_eigvecs[:, 0:dim] # Add MDS solution to the growing overall solution # t0 = time.clock() comb_mds.add(tile_eigvecs, tile_overlap) if PRINT_TIMINGS: print("TIMING(%s): adding of tile %d (shape %d:%d) took %f CPU secs" % (__name__, tile_no, tile_eigvecs.shape[0], tile_eigvecs.shape[1], time.clock() - t0)) tile_start = tile_end - tile_overlap tile_end = tile_end + tile_size - tile_overlap tile_no += 1 restore_idxs = argsort(order) result = comb_mds.getFinalMDS()[restore_idxs] if PRINT_TIMINGS: print("TIMING(%s): SCMDS took %f CPU secs" % (__name__, time.clock() - t0_overall)) return result """ PyCogent-1.5.3/cogent/cluster/goodness_of_fit.py000644 000765 000024 00000020154 12024702176 022724 0ustar00jrideoutstaff000000 000000 #!usr/bin/env python """Goodness of fit of Multidimensional Scaling Implements several functions that measure the 
degree of correspondence between the distances implied by an MDS solution and the original input distances. See for example: * Johnson & Wichern (2002): Applied Multivariate Statistical Analysis """ import numpy __author__ = "Andreas Wilm" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Andreas Wilm"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Andreas Wilm" __email__ = "andreas.wilm@ucd.ie" __status__ = "Production" class Stress(object): """Degree of correspondence between input distances and an MDS Stress measures the goodness of fit or degree of correspondence between distances implied by an MDS mapping and the original distances. There are many different variants of Stress; Kruskal's Stress or Stress-1 probably being the most popular one. """ def __init__(self, orig_distmat, mds_coords, apply_scaling=True): """Setup class by storing pointers to original matrix and calculating distances implied by MDS By default, implied distances (disparities) are scaled to match the range of the original distances. The calculation of the implied distances is the only really time intensive step, so __init__ will take some time for big matrices. Arguments: * orig_distmat (numpy array or numpy matrix) original distance matrix * mds_coords (numpy array) mds coordinates * apply_scaling (boolean) scale distances implied by an MDS mapping to match those of original distance matrix """ # don't use coercion here (orig_distmat to matrix, mds_coords # to array), to avoid conversion/copy overhead as these # arrays/matrices can potentially be quite big # assert isinstance(orig_distmat, numpy.ndarray), \ "orig_distmat is not a numpy.ndarray instance" assert isinstance(mds_coords, numpy.ndarray), \ "mds_coords is not a numpy.ndarray instance" # check array/matrix shapes and sizes assert len(orig_distmat.shape) == 2, \ "orig_distmat is not a 2D array/matrix." assert len(mds_coords.shape) == 2, \ "mds_coords is not a 2D array."
assert orig_distmat.shape[0] == mds_coords.shape[0], \ "orig_distmat and mds_coords do not have the same" \ " number of rows/objects." assert orig_distmat.shape[1] > mds_coords.shape[1], \ "orig_distmat shape bigger than mds_coords shape." \ " Possible argument mixup" self._orig_distmat = orig_distmat # compute distances implied by MDS and scale if requested (and # necessary) by the ratio of maxima of both matrices # self._reproduced_distmat = self._calc_pwdist(mds_coords) if apply_scaling: max_orig = self._orig_distmat.max() max_derived = self._reproduced_distmat.max() scale = max_derived / max_orig if scale != 1.0: self._reproduced_distmat = self._reproduced_distmat / scale def calcKruskalStress(self): """Calculate Kruskal's Stress AKA Stress-1 Kruskal's Stress or Stress-1 is defined as: sqrt( SUM_ij (d'(i,j) - d(i,j))^2 / SUM_ij d(i,j)^2 ) for i<j Returns * distance as float """ assert isinstance(mds_coords, numpy.ndarray), \ "mds_coords is not a numpy ndarray" result = numpy.zeros((mds_coords.shape[0], mds_coords.shape[0])) for i in range(mds_coords.shape[0]): row_i = mds_coords[i, :] for j in range(i+1, mds_coords.shape[0]): result[i, j] = self._calc_rowdist(row_i, mds_coords[j, :]) result[j, i] = result[i, j] return result PyCogent-1.5.3/cogent/cluster/metric_scaling.py000644 000765 000024 00000016136 12024702176 022545 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Functions for doing principal coordinates analysis on a distance matrix Calculations performed as described in: Principles of Multivariate analysis: A User's Perspective. W.J. Krzanowski Oxford University Press, 2000. p106. Note: There are differences in the signs of some of the eigenvectors between R's PCoA function (cmdscale) and the implementation provided here. This is due to numpy's eigh() function. The eigenvalues returned by eigh() match those returned by R's eigen() function, but some of the eigenvectors have swapped signs.
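The sign ambiguity described here can be seen directly; a minimal numpy sketch (not part of PyCogent), showing that if v is an eigenvector then -v is too:

```python
# Both v and -v satisfy m.v = lambda.v, so different LAPACK routines
# (DSYEVD vs DSYEVR) may legitimately return either sign.
import numpy

m = numpy.array([[2.0, 1.0],
                 [1.0, 2.0]])
eigvals, eigvecs = numpy.linalg.eigh(m)
v = eigvecs[:, 0]
assert numpy.allclose(numpy.dot(m, v), eigvals[0] * v)
assert numpy.allclose(numpy.dot(m, -v), eigvals[0] * -v)
```

Either sign gives the same PCoA result up to a reflection of that axis.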
Numpy's eigh() function uses a different set of Fortran routines (part of LAPACK) to calculate the eigenvectors than R does. Numpy uses DSYEVD and ZHEEVD, while R uses DSYEVR and ZHEEV. As far as the Fortran documentation goes, those routines should produce the same results, they just use different algorithms to obtain them. The differences in sign do not affect the overall results as they are just a different reflection. R's cmdscale documentation also confirms the possibility of obtaining different signs between different R platforms. Please feel free to send questions to jai.rideout@gmail.com. """ from numpy import shape, add, sum, sqrt, argsort, transpose, newaxis from numpy.linalg import eigh from cogent.util.dict2d import Dict2D from cogent.util.table import Table from cogent.cluster.UPGMA import inputs_from_dict2D __author__ = "Catherine Lozupone" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Catherine Lozuopone", "Rob Knight", "Peter Maxwell", "Gavin Huttley", "Justin Kuczynski", "Daniel McDonald", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Catherine Lozupone" __email__ = "lozupone@colorado.edu" __status__ = "Production" def PCoA(pairwise_distances): """runs principle coordinates analysis on a distance matrix Takes a dictionary with tuple pairs mapped to distances as input. Returns a cogent Table object. 
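A sketch of the pair-tuple dictionary PCoA expects (the labels and distances here are made up for illustration):

```python
# Each unordered pair of labels appears once, mapped to its distance.
# PCoA symmetrizes this internally via Dict2D before calling
# principal_coordinates_analysis.
pairwise_distances = {
    ('a', 'b'): 0.5,
    ('a', 'c'): 0.8,
    ('b', 'c'): 0.3,
}
# result = PCoA(pairwise_distances)  # returns a cogent Table object
```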
""" items_in_matrix = [] for i in pairwise_distances: if i[0] not in items_in_matrix: items_in_matrix.append(i[0]) if i[1] not in items_in_matrix: items_in_matrix.append(i[1]) dict2d_input = [(i[0], i[1], pairwise_distances[i]) for i in \ pairwise_distances] dict2d_input.extend([(i[1], i[0], pairwise_distances[i]) for i in \ pairwise_distances]) dict2d_input = Dict2D(dict2d_input, RowOrder=items_in_matrix, \ ColOrder=items_in_matrix, Pad=True, Default=0.0) matrix_a, node_order = inputs_from_dict2D(dict2d_input) point_matrix, eigvals = principal_coordinates_analysis(matrix_a) return output_pca(point_matrix, eigvals, items_in_matrix) def principal_coordinates_analysis(distance_matrix): """Takes a distance matrix and returns principal coordinate results point_matrix: each row is an axis and the columns are points within the axis eigvals: correspond to the rows and indicate the amount of the variation that that the axis in that row accounts for NOT NECESSARILY SORTED """ E_matrix = make_E_matrix(distance_matrix) F_matrix = make_F_matrix(E_matrix) eigvals, eigvecs = run_eig(F_matrix) #drop imaginary component, if we got one eigvals = eigvals.real eigvecs = eigvecs.real point_matrix = get_principal_coordinates(eigvals, eigvecs) return point_matrix, eigvals def make_E_matrix(dist_matrix): """takes a distance matrix (dissimilarity matrix) and returns an E matrix input and output matrices are numpy array objects of type Float squares and divides by -2 each element in the matrix """ return (dist_matrix * dist_matrix) / -2.0 def make_F_matrix(E_matrix): """takes an E matrix and returns an F matrix input is output of make_E_matrix for each element in matrix subtract mean of corresponding row and column and add the mean of all elements in the matrix """ num_rows, num_cols = shape(E_matrix) #make a vector of the means for each row and column #column_means = (add.reduce(E_matrix) / num_rows) column_means = (add.reduce(E_matrix) / num_rows)[:,newaxis] trans_matrix = 
transpose(E_matrix) row_sums = add.reduce(trans_matrix) row_means = row_sums / num_cols #calculate the mean of the whole matrix matrix_mean = sum(row_sums) / (num_rows * num_cols) #adjust each element in the E matrix to make the F matrix E_matrix -= row_means E_matrix -= column_means E_matrix += matrix_mean #for i, row in enumerate(E_matrix): # for j, val in enumerate(row): # E_matrix[i,j] = E_matrix[i,j] - row_means[i] - \ # column_means[j] + matrix_mean return E_matrix def run_eig(F_matrix): """takes an F-matrix and returns eigenvalues and eigenvectors""" #use eig to get vector of eigenvalues and matrix of eigenvectors #these are already normalized such that # vi'vi = 1 where vi' is the transpose of eigenvector i eigvals, eigvecs = eigh(F_matrix) #NOTE: numpy produces transpose of Numeric! return eigvals, eigvecs.transpose() def get_principal_coordinates(eigvals, eigvecs): """converts eigvals and eigvecs to point matrix normalized eigenvectors with eigenvalues""" #get the coordinates of the n points on the jth axis of the Euclidean #representation as the elements of (sqrt(eigvalj))eigvecj #must take the absolute value of the eigvals since they can be negative return eigvecs * sqrt(abs(eigvals))[:,newaxis] def output_pca(PCA_matrix, eigvals, names): """Creates a string output for principal coordinates analysis results. PCA_matrix and eigvals are generated with the get_principal_coordinates function. Names is a list of names that corresponds to the columns in the PCA_matrix. It is the order that samples were represented in the initial distance matrix. returns a cogent Table object""" output = [] #get order to output eigenvectors values. 
reports the eigvecs according #to their corresponding eigvals from greatest to least vector_order = list(argsort(eigvals)) vector_order.reverse() # make the eigenvector header line and append to output vec_num_header = ['vec_num-%d' % i for i in range(len(eigvals))] header = ['Label'] + vec_num_header #make data lines for eigenvectors and add to output rows = [] for name_i, name in enumerate(names): row = [name] for vec_i in vector_order: row.append(PCA_matrix[vec_i,name_i]) rows.append(row) eigenvectors = Table(header=header,rows=rows,digits=2,space=2, title='Eigenvectors') output.append('\n') # make the eigenvalue header line and append to output header = ['Label']+vec_num_header rows = [['eigenvalues']+[eigvals[vec_i] for vec_i in vector_order]] pcnts = (eigvals/sum(eigvals))*100 rows += [['var explained (%)']+[pcnts[vec_i] for vec_i in vector_order]] eigenvalues = Table(header=header,rows=rows,digits=2,space=2, title='Eigenvalues') return eigenvectors.appended('Type', eigenvalues, title='') PyCogent-1.5.3/cogent/cluster/nmds.py000644 000765 000024 00000044242 12024702176 020522 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """nonmetric multidimensional scaling (nmds) see, for example: Jan de Leeuw 2004 (monotone regression), Rencher 2002: Methods of multivariate analysis, and the original work: Kruskal 1964: Nonmetric multidimensional scaling """ from __future__ import division from numpy import array, multiply, sum, zeros, size, shape, diag, dot, mean,\ sqrt, transpose, trace, argsort, newaxis, finfo, all from numpy.random import seed, normal as random_gauss from numpy.linalg import norm, svd from operator import itemgetter import cogent.maths.scipy_optimize as optimize from cogent.cluster.metric_scaling import principal_coordinates_analysis __author__ = "Justin Kuczynski" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Justin Kuczynski", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Justin
Kuczynski" __email__ = "justinak@gmail.com" __status__ = "Development" class NMDS(object): """Generates points using nonmetric scaling Takes as input an n by n distance/dissimilarity matrix (hereafter called a dissimilarity matrix for clarity), and a desired dimension (k). Using the order of these dissimilarities only (the value is considered only to the extent that it determines the order), nonmetric_scaling constructs n points in k-dimensional space which correspond to the input matrix. The algorithm attempts to have the order of the pairwise distances between points correspond as closely as possible to the order of the dissimilarity matrix entries; if dissim[i,j] is the minimum dissimilarity, point i and point j should be the two points closest together. The algorithm is random in nature, and is not guaranteed to converge to the best solution. Furthermore, in general a dissimilarity matrix cannot be represented exactly by n points in k space, for k < n-1. Currently the convergence test is pretty basic, it just tests for a sufficiently small or negative relative improvement in stress, or if a sufficiently tiny value of stress has been reached. Alternatively, it will stop after max_iterations The basic algorithm is: - generate n points in k space - compute pairwise distances between these points - compare these distances with a set of pseudo-distances (dhats), which are constrained to increase in the same order as the input data - repeatedly adjust the points and dhats until the point distances have nearly the same order as the input data Note: increasing MIN_ABS_STRESS causes nans to return from stress fn """ def __init__(self, dissimilarity_mtx, initial_pts="pcoa", dimension=2, rand_seed=None, optimization_method=1, verbosity=1, max_iterations=50, setup_only=False, min_rel_improvement = 1e-3, min_abs_stress = 1e-5): """ Arguments: - dissimilarity_mtx: an n by n numpy float array representing the pairwise dissimilarity of items. 
0 on diagonals, symmetric under (i,j) -> (j,i) - initial_pts: "random" => random starting points, "pcoa" => pts from pcoa, or a numpy 2d array, ncols = dimension - dimension: the desired dimension k of the constructed points - rand_seed: used for testing - optimization_method: used when points are adjusted to minimize stress: 0 => justin k's ad hoc method of steepest descent 1 => cogent's scipy_optimize fmin_bfgs """ self.min_rel_improvement = min_rel_improvement self.min_abs_stress = min_abs_stress if dimension >= len(dissimilarity_mtx) - 1: raise RuntimeError("NMDS requires N-1 dimensions or fewer, "+\ "where N is the number of samples, or rows in the dissim matrix"+\ " got %s rows for a %s dimension NMDS" % \ (len(dissimilarity_mtx), dimension)) if rand_seed != None: seed(rand_seed) self.verbosity = verbosity num_points = len(dissimilarity_mtx) point_range = range(num_points) self.dimension = dimension self.optimization_method = optimization_method self._calc_dissim_order(dissimilarity_mtx, point_range) # sets self.order # note that in the rest of the code, only the order matters, the values # of the dissimilarity matrix aren't used if initial_pts == "random": self.points = self._get_initial_pts(dimension, point_range) elif initial_pts == "pcoa": pcoa_pts, pcoa_eigs = principal_coordinates_analysis(\ dissimilarity_mtx) order = argsort(pcoa_eigs)[::-1] # pos to small/neg pcoa_pts = pcoa_pts[order].T self.points = pcoa_pts[:,:dimension] else: self.points = initial_pts self.points = self._center(self.points) self._rescale() self._calc_distances() # dists relates to points, not to input data self._update_dhats() # dhats are constrained to be monotonic self._calc_stress() # self.stress is calculated from dists and dhats self.stresses = [self.stress] # stress is the metric of badness of fit used in this code # index 0 is the initial stress, with an initial set of # datapoints.
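The stress recorded here is Kruskal's Stress-1; a minimal standalone sketch of the formula used by _calc_stress, with made-up distance values:

```python
# stress = sqrt( sum((d - dhat)^2) / sum(d^2) ) over the pairwise
# distances d and their monotone pseudo-distances dhat.
import numpy

dists = numpy.array([1.0, 2.0, 3.0])   # distances between current points
dhats = numpy.array([1.0, 2.5, 2.5])   # monotone pseudo-distances
stress = numpy.sqrt(((dists - dhats) ** 2).sum() / (dists ** 2).sum())
# perfect agreement (dists == dhats) would give stress == 0
```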
index 1 corresponds to iteration 0 of the loop below if setup_only: return for i in range(max_iterations): if self.verbosity >= 1: print("nonmetric broad iteration, stress: ", i, self.stresses[-1]) if (self.stresses[-1] < self.min_abs_stress): if self.verbosity >= 1: print "stress below cutoff, done" break self._move_points() self._calc_distances() self._update_dhats() self._calc_stress() self.stresses.append(self.stress) if (self.stresses[-2]-self.stresses[-1]) / self.stresses[-2] <\ self.min_rel_improvement: if self.verbosity >= 1: print "iteration improvement minimal. converged." break # center and rotate the points, since pos, rotation is arbitrary # rotation is to align to principal axes of self.points self.points = self._center(self.points) u,s,vh = svd(self.points, full_matrices=False) S = diag(s) self.points = dot(u,S) # normalize the scaling, which should not change the stress self._rescale() @property def dhats(self): """The dhats in order.""" # Probably not required, but here in case needed for backward # compatibility. self._dhats is the full 2D array return [self._dhats[i,j] for (i,j) in self.order] @property def dists(self): """The dists in order""" # Probably not required, but here in case needed for backward # compatibility. self._dists is the full 2D array return [self._dists[i,j] for (i,j) in self.order] def getPoints(self): """Returns (ordered in a list) the n points in k space these are the algorithm's attempt at points corresponding to the input order of dissimilarities. Returns a numpy 'd' mtx, points in rows """ return self.points def getStress(self): """Returns a measure of the badness of fit not in percent, a typical number for 20 datapoints is .12""" return self.stresses[-1] def getDimension(self): """returns the dimensions in which the constructed points lie""" return self.dimension def _center(self, mtx): """translate all data (rows of the matrix) to center on the origin returns a shifted version of the input data. 
The new matrix is such that the center of mass of the row vectors is centered at the origin. Returns a numpy float ('d') array """ result = array(mtx, 'd') result -= mean(result, 0) # subtract each column's mean from each element in that column return result def _calc_dissim_order(self, dissim_mtx, point_range): """calculates the order of the dissim_mtx entries, puts in self.order First creates a list of dissim elements with structure [i, j, value], then sorts that by value and strips the value subelement. i and j correspond to the row and column of the input dissim matrix """ dissim_list = [] for i in point_range: for j in point_range: if j > i: dissim_list.append([i, j, dissim_mtx[i,j]]) dissim_list.sort(key = itemgetter(2)) for elem in dissim_list: elem.pop() self.order = dissim_list def _get_initial_pts(self, dimension, pt_range): """Generates points randomly with a gaussian distribution (sigma = 1) """ # nested list comprehension. Too dense for good readability? points = [[random_gauss(0., 1) for axis in range(dimension)] \ for pt_idx in pt_range] return array(points, 'd') def _calc_distances(self): """Update distances between the points""" diffv = self.points[newaxis, :, :] - self.points[:, newaxis, :] squared_dists = (diffv**2).sum(axis=-1) self._dists = sqrt(squared_dists) self._squared_dist_sums = squared_dists.sum(axis=-1) def _update_dhats(self): """Update dhats based on distances""" new_dhats = self._dists.copy() ordered_dhats = [new_dhats[i,j] for (i,j) in self.order] ordered_dhats = self._do_monotone_regression(ordered_dhats) for ((i,j),d) in zip(self.order, ordered_dhats): new_dhats[i,j] = new_dhats[j, i] = d self._dhats = new_dhats def _do_monotone_regression(self, dhats): """Performs a monotone regression on dhats, returning the result Assuming the input dhats are the values of the pairwise point distances, this algorithm minimizes the stress while enforcing monotonicity of the dhats.
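The block-merging scheme implemented in _do_monotone_regression below can be read as a standalone pool-adjacent-violators sketch:

```python
# Pool adjacent violators: scan left to right, merging any new block
# whose mean does not exceed the previous block's mean, so the block
# means come out monotone non-decreasing.
def monotone_regression(values):
    blocks = []  # entries: (mean, total, size)
    for v in values:
        mean, total, size = v, v, 1
        while blocks and mean <= blocks[-1][0]:
            _, prev_total, prev_size = blocks.pop()
            total += prev_total
            size += prev_size
            mean = total / size
        blocks.append((mean, total, size))
    result = []
    for mean, total, size in blocks:
        result.extend([mean] * size)
    return result

print(monotone_regression([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```

The out-of-order pair (3.0, 2.0) is pooled into a single block with mean 2.5, exactly the averaging step the docstring describes.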
Jan de Leeuw 2004 (monotone regression) has a rough outline of the algorithm. Basically, as we proceed along the ordered list, if an element is smaller than its preceding one, the two are averaged and grouped together in a block. The process is repeated until the blocks are monotonic, that is block i <= block i+1. """ blocklist = [] for top_dhat in dhats: top_total = top_dhat top_size = 1 while blocklist and top_dhat <= blocklist[-1][0]: (dhat, total, size) = blocklist.pop() top_total += total top_size += size top_dhat = top_total / top_size blocklist.append((top_dhat, top_total, top_size)) result_dhats = [] for (val, total, size) in blocklist: result_dhats.extend([val]*size) return result_dhats def _calc_stress(self): """calculates the stress, or badness of fit between the distances and dhats Caches some intermediate values for gradient calculations. """ diffs = (self._dists - self._dhats) diffs **= 2 self._squared_diff_sums = diffs.sum(axis=-1) self._total_squared_diff = self._squared_diff_sums.sum() / 2 self._total_squared_dist = self._squared_dist_sums.sum() / 2 self.stress = sqrt(self._total_squared_diff/self._total_squared_dist) def _nudged_stress(self, v, d, epsilon): """Calculates the stress with point v moved epsilon in the dth dimension """ delta_epsilon = zeros([self.dimension], float) delta_epsilon[d] = epsilon moved_point = self.points[v] + delta_epsilon squared_dists = ((moved_point - self.points)**2).sum(axis=-1) squared_dists[v] = 0.0 delta_squared_dist = squared_dists.sum() - self._squared_dist_sums[v] diffs = sqrt(squared_dists) - self._dhats[v] diffs **= 2 delta_squared_diff = diffs.sum() - self._squared_diff_sums[v] return sqrt( (self._total_squared_diff + delta_squared_diff) / (self._total_squared_dist + delta_squared_dist)) def _rescale(self): """ assumes centered, rescales to mean to-origin dist of 1 """ factor = array([norm(vec) for vec in self.points]).mean() self.points = self.points/factor def _move_points(self): """ this attempts to
move our points in such a manner as to minimize the stress metric, keeping dhats fixed. If the dists could be chosen without constraints, by assigning each dist[i,j] = dhat[i,j], stress would be zero. However, since the distances are computed from points, it is generally impossible to change the dists independently of each other. a basic algorithm is: - move points - recompute dists - recompute stress - if stress decreased, continue in the same manner, otherwise move points in a different manner self.points often serves as a starting point for optimization algorithms optimization algorithm 0 is justin's hack (steepest descent method) """ if self.optimization_method == 0: self._steep_descent_move() elif self.optimization_method == 1: numrows, numcols = shape(self.points) pts = self.points.ravel().copy() # odd behavior of scipy_optimize, possibly a bug there maxiter = 100 while True: if maxiter <= 1: raise RuntimeError("could not run scipy optimizer") try: optpts = optimize.fmin_bfgs( self._recalc_stress_from_pts, pts, fprime=self._calc_stress_gradients, disp=self.verbosity, maxiter=maxiter, gtol=1e-3) break except FloatingPointError: # floor maxiter = int(maxiter/2) self.points = optpts.reshape((numrows, numcols)) else: raise ValueError def _steep_descent_move(self, rel_step_size=1./100, precision=.00001, max_iters=100): """moves self.points. goal: minimize the stress. Uses steepest descent method. This is currently an ad-hoc minimization routine, using the method of steepest descent. The default parameters are only shown to work on a few simple cases, and aren't optimized. The gradient is calculated discretely, not via formula. Each variable (there are n points * k dimensions of variables), is adjusted, the stress measured, and the variable returned to its prior value. If a local minimum is larger than step_size, the algorithm cannot escape.
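The discrete gradient described here is a forward finite difference per coordinate; a standalone sketch (the quadratic test function is made up for illustration):

```python
# Perturb one coordinate by step, re-measure the objective, restore,
# and take (f(x + step*e_i) - f(x)) / step as the i-th gradient entry.
import numpy

def numeric_gradient(f, x, step):
    grad = numpy.zeros_like(x)
    f0 = f(x)
    for i in range(len(x)):
        x[i] += step
        grad[i] = (f(x) - f0) / step
        x[i] -= step
    return grad

g = numeric_gradient(lambda v: (v ** 2).sum(),
                     numpy.array([1.0, -2.0]), 1e-6)
# close to the analytic gradient [2, -4]
assert numpy.allclose(g, [2.0, -4.0], atol=1e-4)
```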
""" num_rows, num_cols = shape(self.points) avg_point_dist = sum([norm(point) for point in self.points])/num_rows step_size = avg_point_dist*rel_step_size for iter in range(max_iters): # initial values prestep_stress = self.stress.copy() gradient = zeros((num_rows, num_cols)) # get gradient for i in range(num_rows): for j in range(num_cols): self.points[i,j] += step_size self._calc_distances() self._calc_stress() delta_stress = self.stress - prestep_stress gradient[i,j] = delta_stress/step_size self.points[i,j] -= step_size grad_mag = norm(gradient) # step in the direction of the negative gradient for i in range(num_rows): for j in range(num_cols): self.points[i,j] -= step_size*gradient[i,j]/grad_mag self._calc_distances() self._calc_stress() newstress = self.stress.copy() # choose whether to iterate again if abs((newstress - prestep_stress)/prestep_stress) < precision: if self.verbosity >= 1: print("move pts converged after iteration: ", iter) break if iter == (max_iters - 1): if self.verbosity >= 1: print("move pts didn't converge in ", max_iters) def _recalc_stress_from_pts(self, pts): """returns an updated value for stress based on input pts a special function for use with external optimization routines. 
pts here is a 1D numpy array""" pts = pts.reshape(self.points.shape) changed = not all(pts == self.points) self.points = pts if changed: self._calc_distances() self._calc_stress() return self.stress def _calc_stress_gradients(self, pts): """Approx first derivatives of stress at pts, for optimisers""" epsilon = sqrt(finfo(float).eps) f0 = self._recalc_stress_from_pts(pts) grad = zeros(pts.shape, float) for k in range(len(pts)): (point, dim) = divmod(k, self.dimension) f1 = self._nudged_stress(point, dim, epsilon) grad[k] = (f1 - f0)/epsilon return grad def metaNMDS(iters, *args, **kwargs): """ runs NMDS, first with pcoa init, then iters times with random init returns NMDS object with lowest stress args, kwargs is passed to NMDS(), but must not have initial_pts must supply distance matrix """ results = [] kwargs['initial_pts'] = "pcoa" res1 = NMDS(*args,**kwargs) results.append(res1) kwargs['initial_pts'] = "random" for i in range(iters): results.append(NMDS(*args, **kwargs)) stresses = [nmds.getStress() for nmds in results] bestidx = stresses.index(min(stresses)) return results[bestidx] PyCogent-1.5.3/cogent/cluster/procrustes.py000644 000765 000024 00000013250 12024702176 021765 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Procrustes analysis. Main fn: procrustes See for example: Principles of Multivariate analysis, by Krzanowski """ from numpy.linalg import svd from numpy import array, sqrt, sum, zeros, trace, dot, transpose,\ divide, square, subtract, shape, any, abs, mean from numpy import append as numpy_append __author__ = "Justin Kuczynski" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Justin Kuczynski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Justin Kuczynski" __email__ = "justinak@gmail.com" __status__ = "Production" def procrustes(data1, data2): """Procrustes analysis, a similarity test for two data sets. 
Each input matrix is a set of points or vectors (the rows of the matrix) The dimension of the space is the number of columns of each matrix. Given two identically sized matrices, procrustes standardizes both such that: - trace(AA') = 1 (A' is the transpose, and the product is a standard matrix product). - Both sets of points are centered around the origin Procrustes then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize M^2 = sum(square(mtx1 - mtx2)), or the sum of the squares of the pointwise differences between the two input datasets If two data sets have different dimensionality (different number of columns), simply add columns of zeros to the smaller of the two. This function was not designed to handle datasets with different numbers of datapoints (rows) Arguments: - data1: matrix, n rows represent points in k (columns) space data1 is the reference data, after it is standardized, the data from data2 will be transformed to fit the pattern in data1 - data2: n rows of data in k space to be fit to data1. Must be the same shape (numrows, numcols) as data1 - both must have >1 unique points Returns: - mtx1: a standardized version of data1 - mtx2: the orientation of data2 that best fits data1. centered, but not necessarily trace(mtx2*mtx2') = 1 - disparity: a metric for the dissimilarity of the two datasets, disparity = M^2 defined above Notes: - The disparity should not depend on the order of the input matrices, but the output matrices will, as only the first output matrix is guaranteed to be scaled such that trace(AA') = 1. - duplicate datapoints are generally ok, duplicating a data point will increase its effect on the procrustes fit.
- the disparity scales as the number of points per input matrix """ SMALL_NUM = 1e-6 # used to check for zero values in added dimension # make local copies # mtx1 = array(data1.copy(),'d') # mtx2 = array(data2.copy(),'d') num_rows, num_cols = shape(data1) if (num_rows, num_cols) != shape(data2): raise ValueError("input matrices must be of same shape") if (num_rows == 0 or num_cols == 0): raise ValueError("input matrices must be >0 rows, >0 cols") # add a dimension to allow reflections (rotations in n + 1 dimensions) mtx1 = numpy_append(data1, zeros((num_rows, 1)), 1) mtx2 = numpy_append(data2, zeros((num_rows, 1)), 1) # standardize each matrix mtx1 = center(mtx1) mtx2 = center(mtx2) if ((not any(mtx1)) or (not any(mtx2))): raise ValueError("input matrices must contain >1 unique points") mtx1 = normalize(mtx1) mtx2 = normalize(mtx2) # transform mtx2 to minimize disparity (sum( (mtx1[i,j] - mtx2[i,j])^2) ) mtx2 = match_points(mtx1, mtx2) # WARNING: I haven't proven that after matching the matrices, no point has # a nonzero component in the added dimension. I believe it is true, # though, since the unchanged matrix has no points extending into # that dimension if any(abs(mtx2[:,-1]) > SMALL_NUM): raise StandardError("we have accidentally added a dimension to \ the matrix, and the vectors have nonzero components in that dimension") # strip extra dimension which was added to allow reflections mtx1 = mtx1[:,:-1] mtx2 = mtx2[:,:-1] disparity = get_disparity(mtx1, mtx2) return mtx1, mtx2, disparity def center(mtx): """translate all data (rows of the matrix) to center on the origin returns a shifted version of the input data. The new matrix is such that the center of mass of the row vectors is centered at the origin.
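The optimal rotation used by match_points comes from the SVD of mtx1' * mtx2; a standalone sketch (not the PyCogent function itself) showing that it exactly undoes a pure rotation:

```python
# For centered point sets, the orthogonal q minimizing ||mtx1 - mtx2.q||
# is q = vh' u' where u, s, vh = svd(mtx1' mtx2).
import numpy

def best_rotation(mtx1, mtx2):
    u, s, vh = numpy.linalg.svd(numpy.dot(mtx1.T, mtx2))
    return numpy.dot(vh.T, u.T)

a = numpy.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta = 0.5
rot = numpy.array([[numpy.cos(theta), -numpy.sin(theta)],
                   [numpy.sin(theta),  numpy.cos(theta)]])
b = numpy.dot(a, rot)          # a rigidly rotated copy of a
q = best_rotation(a, b)
# applying q maps the rotated copy back onto the original
assert numpy.allclose(numpy.dot(b, q), a)
```

A pure rotation can be undone exactly, so the resulting disparity would be ~0; procrustes additionally centers, normalizes, and scales by sum(s) before measuring it.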
Returns a numpy float ('d') array """ result = array(mtx, 'd') result -= mean(result, 0) # subtract each column's mean from each element in that column return result def normalize(mtx): """change scaling of data (in rows) such that trace(mtx*mtx') = 1 mtx' denotes the transpose of mtx """ result = array(mtx, 'd') num_pts, num_dims = shape(result) mag = trace(dot(result, transpose(result))) norm = sqrt(mag) result /= norm return result def match_points(mtx1, mtx2): """returns a transformed mtx2 that matches mtx1. returns a new matrix which is a transform of mtx2. Scales and rotates a copy of mtx2. See procrustes docs for details. """ u,s,vh = svd(dot(transpose(mtx1), mtx2)) q = dot(transpose(vh), transpose(u)) new_mtx2 = dot(mtx2, q) new_mtx2 *= sum(s) return new_mtx2 def get_disparity(mtx1, mtx2): """ returns a measure of the dissimilarity between two data sets returns M^2 = sum(square(mtx1 - mtx2)), the pointwise sum of squared differences""" return(sum(square(mtx1 - mtx2))) PyCogent-1.5.3/cogent/cluster/UPGMA.py000644 000765 000024 00000015542 12024702176 020433 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Functions to cluster using UPGMA upgma takes a dictionary of pair tuples mapped to distances as input. UPGMA_cluster takes an array and a list of PhyloNode objects corresponding to the array as input. Can also generate this type of input from a Dict2D using inputs_from_dict2D function.
Both return a PhyloNode object of the UPGMA cluster """ from numpy import array, ravel, argmin, take, sum, average, ma, diag from cogent.core.tree import PhyloNode from cogent.util.dict2d import Dict2D __author__ = "Catherine Lozupone" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Catherine Lozupone", "Rob Knight", "Peter Maxwell"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Catherine Lozupone" __email__ = "lozupone@colorado.edu" __status__ = "Production" import numpy numerictypes = numpy.core.numerictypes.sctype2char Float = numerictypes(float) BIG_NUM = 1e305 def upgma(pairwise_distances): """Uses the UPGMA algorithm to cluster sequences pairwise_distances: a dictionary with pair tuples mapped to a distance returns a PhyloNode object of the UPGMA cluster """ items_in_matrix = [] for i in pairwise_distances: if i[0] not in items_in_matrix: items_in_matrix.append(i[0]) if i[1] not in items_in_matrix: items_in_matrix.append(i[1]) dict2d_input = [(i[0], i[1], pairwise_distances[i]) for i in \ pairwise_distances] dict2d_input.extend([(i[1], i[0], pairwise_distances[i]) for i in \ pairwise_distances]) dict2d_input = Dict2D(dict2d_input, RowOrder=items_in_matrix, \ ColOrder=items_in_matrix, Pad=True, Default=BIG_NUM) matrix_a, node_order = inputs_from_dict2D(dict2d_input) tree = UPGMA_cluster(matrix_a, node_order, BIG_NUM) index = 0 for node in tree.traverse(): if not node.Parent: node.Name = 'root' elif not node.Name: node.Name = 'edge.' + str(index) index += 1 return tree def find_smallest_index(matrix): """returns the index of the smallest element in a numpy array for UPGMA clustering elements on the diagonal should first be substituted with a very large number so that they are always larger than the rest of the values in the array.""" #get the shape of the array as a tuple (e.g.
(3,3))
    shape = matrix.shape
    #turn into a 1 by x array and get the index of the lowest number
    matrix1D = ravel(matrix)
    lowest_index = argmin(matrix1D)
    #convert the lowest_index derived from matrix1D to one for the original
    #square matrix and return
    row_len = shape[0]
    return divmod(lowest_index, row_len)

def condense_matrix(matrix, smallest_index, large_value):
    """condenses the rows and columns indicated by smallest_index

    Smallest index is returned from find_smallest_index. For both the rows
    and columns, the values for the two indices are averaged. The resulting
    vector replaces the first index in the array and the second index is
    replaced by an array with large numbers so that it is never chosen
    again with find_smallest_index.
    """
    first_index, second_index = smallest_index
    #get the rows and make a new vector that has their average
    rows = take(matrix, smallest_index, 0)
    new_vector = average(rows, 0)
    #replace info in the row and column for first index with new_vector
    matrix[first_index] = new_vector
    matrix[:, first_index] = new_vector
    #replace the info in the row and column for the second index with
    #high numbers so that it is ignored
    matrix[second_index] = large_value
    matrix[:, second_index] = large_value
    return matrix

def condense_node_order(matrix, smallest_index, node_order):
    """condenses two nodes in node_order based on smallest_index info

    This function is used to create a tree while condensing a matrix with
    the condense_matrix function. The smallest_index is retrieved with
    find_smallest_index. The first index is replaced with a node object
    that combines the two nodes corresponding to the indices in node order.
    The second index in smallest_index is replaced with None.
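The find_smallest_index/condense_matrix cycle can be sketched standalone with numpy (illustrative values only; BIG stands in for this module's BIG_NUM, and the matrix is hypothetical):

```python
import numpy as np

BIG = 1e305
# 3x3 distance matrix with the diagonal pre-set to BIG, as UPGMA_cluster expects
m = np.array([[BIG, 2.0, 6.0],
              [2.0, BIG, 4.0],
              [6.0, 4.0, BIG]])
# find_smallest_index: flatten, argmin, then divmod back to (row, col)
i, j = divmod(np.argmin(m.ravel()), m.shape[0])
# condense_matrix: average the two rows, write the result over index i
new_row = m[[i, j]].mean(axis=0)
m[i] = new_row
m[:, i] = new_row
# blank out the second index with BIG so it is never picked again
m[j] = BIG
m[:, j] = BIG
```

After one cycle the merged cluster lives at index i and index j is retired; the loop in UPGMA_cluster repeats this until one node remains.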
    Also sets the branch length of the nodes to 1/2 of the distance between
    the nodes in the matrix"""
    index1, index2 = smallest_index
    node1 = node_order[index1]
    node2 = node_order[index2]
    #get the distance between the nodes and assign 1/2 the distance to the
    #Length property of each node
    distance = matrix[index1, index2]
    nodes = [node1, node2]
    d = distance / 2.0
    for n in nodes:
        if n.Children:
            n.Length = d - n.Children[0].TipLength
        else:
            n.Length = d
        n.TipLength = d
    #combine the two nodes into a new PhyloNode object
    new_node = PhyloNode()
    new_node.Children.append(node1)
    new_node.Children.append(node2)
    node1.Parent = new_node
    node2.Parent = new_node
    #replace the object at index1 with the combined node
    node_order[index1] = new_node
    #replace the object at index2 with None
    node_order[index2] = None
    return node_order

def UPGMA_cluster(matrix, node_order, large_number):
    """cluster with UPGMA

    matrix is a numpy array.
    node_order is a list of PhyloNode objects corresponding to the matrix.
    large_number will be assigned to the matrix during the process and
    should be much larger than any value already in the matrix.

    WARNING: Changes matrix in-place.
    WARNING: Expects matrix to already have diagonals assigned to
    large_number before this function is called.
    """
    num_entries = len(node_order)
    tree = None
    for i in range(num_entries - 1):
        smallest_index = find_smallest_index(matrix)
        index1, index2 = smallest_index
        #if smallest_index is on the diagonal set the diagonal to
        #large_number
        if index1 == index2:
            matrix[diag([True] * len(matrix))] = large_number
            smallest_index = find_smallest_index(matrix)
        row_order = condense_node_order(matrix, smallest_index, \
            node_order)
        matrix = condense_matrix(matrix, smallest_index, large_number)
        tree = node_order[smallest_index[0]]
    return tree

def inputs_from_dict2D(dict2d_matrix):
    """makes inputs for UPGMA_cluster from a Dict2D object

    Dict2D object is a distance matrix with labeled Rows.
    The diagonal elements should have a very large positive number assigned
    (e.g. 1e305).

    The returned array is a numpy array with the distances. PhyloNode_order
    is a list of PhyloNode objects with the Data property assigned from the
    Dict2D Row order.
    """
    matrix_lists = list(dict2d_matrix.Rows)
    matrix = array(matrix_lists, Float)
    row_order = dict2d_matrix.RowOrder
    PhyloNode_order = []
    for i in row_order:
        PhyloNode_order.append(PhyloNode(Name=i))
    return matrix, PhyloNode_order

PyCogent-1.5.3/cogent/app/__init__.py000644 000765 000024 00000002755 12024702176 020412 0ustar00jrideoutstaff000000 000000

#!/usr/bin/env python
"""apps: provides support libraries for controlling applications (local or web).
"""

__all__ = ['blast', 'carnac', 'cd_hit', 'clustalw', 'cmfinder', 'comrna',
           'consan', 'contrafold', 'cove', 'dialign', 'dynalign',
           'fasttree', 'fasttree_v1', 'foldalign', 'gctmpca', 'ilm',
           'knetfold', 'mfold', 'muscle', 'msms', 'nupack', 'parameters',
           'pfold', 'pknotsrg', 'raxml', 'rdp_classifier', 'rnaalifold',
           'rnaforester', 'rnashapes', 'rnaview', 'sfold', 'unafold',
           'util', 'vienna_package']

__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Catherine Lozupone", "Gavin Huttley",
               "Rob Knight", "Zongzhi Liu", "Sandra Smit", "Micah Hamady",
               "Greg Caporaso", "Mike Robeson", "Daniel McDonald",
               "Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

"""Need to add:
    tcoffee      alignment
    meme         motif finding
    phylip       phylogeny
    mrbayes      phylogeny
    paml         phylogeny

    other packages? web tools?
""" PyCogent-1.5.3/cogent/app/blast.py000644 000765 000024 00000132623 12024702176 017766 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controllers for blast family """ from string import strip from os import remove, access, F_OK, environ, path from cogent.app.parameters import FlagParameter, ValuedParameter, MixedParameter from cogent.app.util import CommandLineApplication, ResultPath, \ get_tmp_filename, guess_input_handler, ApplicationNotFoundError from cogent.parse.fasta import FastaFinder, LabeledRecordFinder, is_fasta_label from cogent.parse.blast import LastProteinIds9, QMEBlast9, QMEPsiBlast9, BlastResult from cogent.util.misc import app_path from random import choice from copy import copy __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Zongzhi Liu", "Micah Hamady", "Jeremy Widmann", "Catherine Lozupone", "Rob Knight","Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Prototype" class Blast(CommandLineApplication): """BLAST generic application controller""" _common_options ={ # defaults to non-redundant database #WARNING: This will only work if BLASTDB environment variable is set '-d':ValuedParameter('-',Name='d',Delimiter=' ', Value="nr"), # query file '-i':ValuedParameter('-',Name='i',Delimiter=' '), # Multiple Hits window size [Integer] '-A':ValuedParameter('-',Name='A',Delimiter=' '), # Threshold for extending hits [Integer] '-f':ValuedParameter('-',Name='f',Delimiter=' '), # Expectation value (E) [Real] '-e':ValuedParameter('-',Name='e',Delimiter=' ', Value="10.0"), # alignment view options: # 0 = pairwise, # 1 = query-anchored showing identities, # 2 = query-anchored no identities, # 3 = flat query-anchored, show identities, # 4 = flat query-anchored, no identities, # 5 = query-anchored no identities and blunt ends, # 6 = flat query-anchored, no identities and blunt ends, # 7 = XML 
Blast output, # 8 = Tabular output, # 9 = Tabular output with comments # 10 = ASN, text # 11 = ASN, binary [Integer] '-m':ValuedParameter('-',Name='m',Delimiter=' ', Value="9"), # Output File for Alignment [File Out] Optional '-o':ValuedParameter('-',Name='o',Delimiter=' '), # Filter query sequence with SEG [String] '-F':ValuedParameter('-',Name='F',Delimiter=' '), # Cost to open a gap [Integer] '-G':ValuedParameter('-',Name='G',Delimiter=' '), # Cost to extend a gap [Integer] '-E':ValuedParameter('-',Name='E',Delimiter=' '), # X dropoff value for gapped alignment (in bits) [Integer] # blastn 30, megablast 20, tblastx 0, all others 15 [Integer] '-X':ValuedParameter('-',Name='X',Delimiter=' '), # Show GI's in deflines [T/F] '-I':ValuedParameter('-',Name='I',Delimiter=' '), # Number of database seqs to show one-line descriptionss for [Integer] '-v':ValuedParameter('-',Name='v',Delimiter=' '), # Number of database sequence to show alignments for (B) [Integer] '-b':ValuedParameter('-',Name='b',Delimiter=' '), # Perform gapped alignment (not available with tblastx) [T/F] '-g':ValuedParameter('-',Name='g',Delimiter=' '), # Number of processors to use [Integer] '-a':ValuedParameter('-',Name='a',Delimiter=' ', Value="1"), # Believe the query defline [T/F] '-J':ValuedParameter('-',Name='J',Delimiter=' '), # SeqAlign file ('Believe the query defline' must be TRUE) [File Out] # Optional '-O':ValuedParameter('-',Name='O',Delimiter=' '), # Matrix [String] '-M':ValuedParameter('-',Name='M',Delimiter=' ', Value="BLOSUM62"), # Word size [Integer] (blastn 11, megablast 28, all others 3) '-W':ValuedParameter('-',Name='W',Delimiter=' '), # Effective length of the database (use zero for the real size) [Real] '-z':ValuedParameter('-',Name='z',Delimiter=' '), # Number of best hits from a region to keep [Integer] '-K':ValuedParameter('-',Name='K',Delimiter=' '), # 0 for multiple hit, 1 for single hit [Integer] '-P':ValuedParameter('-',Name='P',Delimiter=' '), # Effective length of the 
search space (use zero for real size) [Real] '-Y':ValuedParameter('-',Name='Y',Delimiter=' '), # Produce HTML output [T/F] '-T':ValuedParameter('-',Name='T',Delimiter=' ', Value="F"), # Restrict search of database to list of GI's [String] Optional '-l':ValuedParameter('-',Name='l',Delimiter=' '), # Use lower case filtering of FASTA sequence [T/F] Optional '-U':ValuedParameter('-',Name='U',Delimiter=' '), # Dropoff (X) for blast extensions in bits (default if zero) [Real] # blastn 20, megablast 10, all others 7 '-y':ValuedParameter('-',Name='y',Delimiter=' '), # X dropoff value for final gapped alignment (in bits) [Integer] # blastn/megablast 50, tblastx 0, all others 25 '-Z':ValuedParameter('-',Name='Z',Delimiter=' '), # Input File for PSI-BLAST Restart [File In] Optional '-R':ValuedParameter('-',Name='R',Delimiter=' '), } _executable = 'blastall' _parameters = {} _parameters.update(_common_options) def __init__(self, cur_options, command, blast_mat_root=None, extra_env="", params=None,InputHandler=None, SuppressStderr=None, SuppressStdout=None,WorkingDir=None,\ HALT_EXEC=False): """ Initialize blast """ # update options self._parameters.update(cur_options) # check if need to set env variable (for cgi calls) if blast_mat_root: self._command = "export BLASTMAT=%s;%s%s" % (blast_mat_root, extra_env, command) else: # Determine if blast is installed and raise an ApplicationError # if not -- this is done here so the user will get the most # informative error message available. self._error_on_missing_application(params) # Otherwise raise error about $BLASTMAT not being set if not ('BLASTMAT' in environ or \ access(path.expanduser("~/.ncbirc"), F_OK) or \ access(".ncbirc", F_OK)): ## SHOULD THIS BE CHANGED TO RAISE AN ApplicationError? 
                raise RuntimeError, blastmat_error_message
            self._command = command

        super(Blast, self).__init__(params=params,
            InputHandler=InputHandler, SuppressStderr=SuppressStderr,
            SuppressStdout=SuppressStdout, WorkingDir=WorkingDir,\
            HALT_EXEC=HALT_EXEC)

    def _error_on_missing_application(self, params):
        """ Raise an ApplicationNotFoundError if the app is not accessible
        """
        if not app_path('blastall'):
            raise ApplicationNotFoundError,\
             "Cannot find blastall. Is it installed? Is it in your path?"

    def _input_as_seqs(self, data):
        lines = []
        for i, s in enumerate(data):
            #will number the sequences 1,2,3,etc.
            lines.append(''.join(['>', str(i + 1)]))
            lines.append(s)
        return self._input_as_lines(lines)

    def _input_as_seq_id_seq_pairs(self, data):
        lines = []
        for seq_id, seq in data:
            lines.append(''.join(['>', str(seq_id)]))
            lines.append(seq)
        return self._input_as_lines(lines)

    def _input_as_lines(self, data):
        if data:
            self.Parameters['-i']\
                .on(super(Blast, self)._input_as_lines(data))
        return ''

    def _input_as_string(self, data):
        """Makes data the value of a specific parameter

        This method returns the empty string. The parameter will be printed
        automatically once set.
        """
        if data:
            self.Parameters['-i'].on(str(data))
        return ''

    def _input_as_multiline_string(self, data):
        if data:
            self.Parameters['-i']\
                .on(super(Blast, self)._input_as_multiline_string(data))
        return ''

    def _align_out_filename(self):
        if self.Parameters['-o'].isOn():
            aln_filename = self._absolute(str(self.Parameters['-o'].Value))
        else:
            raise ValueError, "No output file specified."
        return aln_filename

    def _get_result_paths(self, data):
        result = {}
        if self.Parameters['-o'].isOn():
            out_name = self._align_out_filename()
            result['BlastOut'] = ResultPath(Path=out_name, IsWritten=True)
        return result

blastmat_error_message =\
"""BLAST cannot run if the BLASTMAT environment variable is not set.

Usually, the BLASTMAT environment variable points to the NCBI data
directory, which contains matrices like PAM30 and PAM70, etc.
Alternatively, you may create a .ncbirc file to define these variables. From help file: 2) Create a .ncbirc file. In order for Standalone BLAST to operate, you have will need to have a .ncbirc file that contains the following lines: [NCBI] Data="path/data/" Where "path/data/" is the path to the location of the Standalone BLAST "data" subdirectory. For Example: Data=/root/blast/data The data subdirectory should automatically appear in the directory where the downloaded file was extracted. Please note that in many cases it may be necessary to delimit the entire path including the machine name and or the net work you are located on. Your systems administrator can help you if you do not know the entire path to the data subdirectory. Make sure that your .ncbirc file is either in the directory that you call the Standalone BLAST program from or in your root directory. """ class PsiBlast(Blast): """PSI-BLAST application controller - Prototype""" _options ={ # ASN.1 Scoremat input of checkpoint data: # 0: no scoremat input # 1: Restart is from ASCII scoremat checkpoint file, # 2: Restart is from binary scoremat checkpoint file [Integer] Optional '-q':ValuedParameter('-',Name='q',Delimiter=' '), # Output File for PSI-BLAST Matrix in ASCII [File Out] Optional '-Q':ValuedParameter('-',Name='Q',Delimiter=' '), # Start of required region in query [Integer] '-S':ValuedParameter('-',Name='S',Delimiter=' ', Value="1"), # ASN.1 Scoremat output of checkpoint data: # 0: no scoremat output # 1: Output is ASCII scoremat checkpoint file (requires -J), # 2: Output is binary scoremat checkpoint file (requires -J) Optional '-u':ValuedParameter('-',Name='u',Delimiter=' '), # Cost to decline alignment (disabled when 0) [Integer] '-L':ValuedParameter('-',Name='L',Delimiter=' ', Value="0"), # program option for PHI-BLAST [String] '-p':ValuedParameter('-',Name='p',Delimiter=' ', Value="blastpgp"), # Use composition based statistics [T/F] '-t':ValuedParameter('-',Name='t',Delimiter=' ', 
Value="T"), # Input Alignment File for PSI-BLAST Restart [File In] Optional '-B':ValuedParameter('-',Name='B',Delimiter=' '), # Number of bits to trigger gapping [Real] '-N':ValuedParameter('-',Name='N',Delimiter=' ', Value="22.0"), # End of required region in query (-1 indicates end of query) [Integer] '-H':ValuedParameter('-',Name='H',Delimiter=' ', Value="-1"), # e-value threshold for inclusion in multipass model [Real] '-h':ValuedParameter('-',Name='h',Delimiter=' ', Value="0.001"), # Constant in pseudocounts for multipass version [Integer] '-c':ValuedParameter('-',Name='c',Delimiter=' ', Value="9"), # Maximum number of passes to use in multipass version [Integer] '-j':ValuedParameter('-',Name='j',Delimiter=' ', Value="1"), # Output File for PSI-BLAST Checkpointing [File Out] Optional '-C':ValuedParameter('-',Name='C',Delimiter=' '), # Compute locally optimal Smith-Waterman alignments [T/F] '-s':ValuedParameter('-',Name='s',Delimiter=' ', Value="F"), # Hit File for PHI-BLAST [File In] '-k':ValuedParameter('-',Name='k',Delimiter=' '), } def __init__(self, blast_mat_root=None, params=None, extra_env="", InputHandler=None,SuppressStderr=None, SuppressStdout=None,WorkingDir=None, HALT_EXEC=False): """ Initialize the Psi-Blast""" super(PsiBlast, self).__init__(self._options, "blastpgp", extra_env=extra_env, blast_mat_root=blast_mat_root, params=params, InputHandler=InputHandler,SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout,WorkingDir=WorkingDir, HALT_EXEC=HALT_EXEC) # should probably go into blastall superclass. 
it's late, works for now BLASTALL_OPTIONS ={ # Use lower case filtering of FASTA sequence [T/F] Optional '-U':ValuedParameter('-',Name='U',Delimiter=' '), # Penalty for a nucleotide mismatch (blastn only) [Integer] # default = -3 '-q':ValuedParameter('-',Name='q',Delimiter=' '), # Reward for a nucleotide match (blastn only) [Integer] '-r':ValuedParameter('-',Name='r',Delimiter=' '), # Query Genetic code to use [Integer] default = 1 '-Q':ValuedParameter('-',Name='Q',Delimiter=' '), # DB Genetic code (for tblast[nx] only) [Integer] '-D':ValuedParameter('-',Name='D',Delimiter=' '), # Query strands to search against database (for blast[nx], and tblastx) # 3 is both, 1 is top, 2 is bottom [Integer] '-S':ValuedParameter('-',Name='S',Delimiter=' '), # Program Name '-p':ValuedParameter('-',Name='p',Delimiter=' '), # MegaBlast search [T/F] '-n':ValuedParameter('-',Name='n',Delimiter=' '), # Location on query sequence [String] Option '-L':ValuedParameter('-',Name='L',Delimiter=' '), # Frame shift penalty (OOF algorithm for blastx) [Integer] '-w':ValuedParameter('-',Name='w',Delimiter=' '), # Length of the largest intron allowed in tblastn for linking HSPs #(0 disables linking) [Integer] '-t':ValuedParameter('-',Name='t',Delimiter=' '), # Number of concatenated queries, for blastn and tblastn [Integer] '-B':ValuedParameter('-',Name='B',Delimiter=' '), } class Blastall(Blast): """blastall application controller - Prototype """ def __init__(self, blast_mat_root=None, params=None, extra_env="", InputHandler=None,SuppressStderr=None, SuppressStdout=None,WorkingDir=None, HALT_EXEC=False): """ Initialize the blastall""" super(Blastall, self).__init__(BLASTALL_OPTIONS, "blastall", blast_mat_root=blast_mat_root, extra_env=extra_env, params=params, InputHandler=InputHandler,SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout,WorkingDir=WorkingDir, HALT_EXEC=HALT_EXEC) class MpiBlast(Blast): """mpblast application controller - Prototype """ _mpi_options ={ # Produces verbose 
debugging output for each node, optionally logs the # output to a file '--debug':ValuedParameter('-',Name='--debug',Delimiter='='), # Set the scheduler process' MPI Rank (default is 1). Because the # scheduler uses very little CPU it can be useful to force the # scheduler to run on the same physical machine as the writer (rank 0). '--scheduler-rank':ValuedParameter('-',Name='--scheduler-rank', Delimiter='='), # Print the Altschul. et. al. 1997 paper reference instead of the # mpiBLAST paper reference. With this option mpiblast output is nearly # identical to NCBI-BLAST output. '--altschul-reference':FlagParameter(Prefix='--', Name='altschul-reference'), #Removes the local copy of the database from each node before # terminating execution '--removedb':FlagParameter(Prefix='--', Name='removedb'), # Sets the method of copying files that each worker will use. # Default = "cp" # * cp : use standard file system "cp" command. # Additional option is --concurrent. # * rcp : use rsh "rcp" command. Additonal option is --concurrent. # * scp : use ssh "scp" command. Additional option is --concurrent. # * mpi : use MPI_Send/MPI_Recv to copy files. # Additional option is --mpi-size. # * none : do not copy files,instead use shared storage as local storage '--copy-via':ValuedParameter('-',Name='--copy-via', Delimiter='='), # set the number of concurrent accesses to shared storage. Default = 1 '--concurrent':ValuedParameter('-',Name='--concurrent', Delimiter='='), # in bytes, set the maximum buffer size that MPI will use to send data # when transferring files. Default = 65536 '--mpi-size':ValuedParameter('-',Name='--mpi-size', Delimiter='='), # set whether file locking should be used to manage local fragment # lists. Defaults to off. When --concurrency > 1 defaults to on # [on|off] '--lock':ValuedParameter('-',Name='--lock', Delimiter='='), # When set, the writer will use the database on shared storage for # sequence lookup. 
Can drastically reduce overhead for some blastn # searches. '--disable-mpi-db':FlagParameter(Prefix='--', Name='disable-mpi-db'), # Under unix, sets the nice value for each mpiblast process. '--nice':ValuedParameter('-',Name='--nice', Delimiter='='), # Under unix, sets the nice value for each mpiblast process. '--config-file':ValuedParameter('--',Name='config-file', Delimiter='='), # Experimental. When set, mpiblast will read the output file and # attempt to continue a previously aborted run where it left off '--resume-run':FlagParameter(Prefix='--', Name='resume-run'), # print the mpiBLAST version '--version':FlagParameter(Prefix='--', Name='version'), } _mpi_options.update(BLASTALL_OPTIONS) def __init__(self, blast_mat_root=None, params=None, mpiblast_root="/usr/local/bin/", local_root="/var/scratch/mpiblastdata/", shared_root="/quicksand/hamady/data/blast/mpidb/", config_file="/quicksand2/downloads2/mpiblast/mpiblast.conf", num_db_frags=40, InputHandler=None,SuppressStderr=None, SuppressStdout=None,WorkingDir=None, HALT_EXEC=False): """ Initialize mpiblast""" if config_file: params["--config-file"] = config_file super(MpiBlast, self).__init__(self._mpi_options, "mpirun -np %d %smpiblast" % ((num_db_frags + 2), mpiblast_root), blast_mat_root=blast_mat_root, extra_env="export Local=%s; export Shared=%s;" %(local_root, shared_root), params=params, InputHandler=InputHandler,SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout,WorkingDir=WorkingDir, HALT_EXEC=HALT_EXEC) class FastaCmd(CommandLineApplication): """FastaCmd application controller - Prototype""" _options ={ # Database [String] Optional '-d':ValuedParameter('-',Name='d',Delimiter=' '), # Type of file # G - guess mode (look for protein, then nucleotide) # T - protein # F - nucleotide [String] Optional '-p':ValuedParameter('-',Name='p',Delimiter=' ', Value="G"), # Search str: GIs, accessions and loci may be used delimited by comma '-s':ValuedParameter('-',Name='s',Delimiter=' '), # Input file wilth 
GIs/accessions/loci for batch retrieval Optional '-i':ValuedParameter('-',Name='i',Delimiter=' '), # Retrieve duplicate accessions [T/F] Optional '-a':ValuedParameter('-',Name='a',Delimiter=' ', Value='F'), # Line length for sequence [Integer] Optional '-l':ValuedParameter('-',Name='l',Delimiter=' '), # Definition line should contain target gi only [T/F] Optional '-t':ValuedParameter('-',Name='t',Delimiter=' '), # Output file [File Out] Optional '-o':ValuedParameter('-',Name='o',Delimiter=' '), # Use Ctrl-A's as non-redundant defline separator [T/F] Optional '-c':ValuedParameter('-',Name='c',Delimiter=' '), # Dump the entire database in fasta format [T/F] Optional '-D':ValuedParameter('-',Name='D',Delimiter=' '), # Range of sequence to extract (Format: start,stop) # 0 in 'start' refers to the beginning of the sequence # 0 in 'stop' refers to the end of the sequence [String] Optional '-L':ValuedParameter('-',Name='L',Delimiter=' '), # Strand on subsequence (nucleotide only): 1 is top, 2 is bottom [Int] '-S':ValuedParameter('-',Name='S',Delimiter=' '), # Print taxonomic information for requested sequence(s) [T/F] '-T':ValuedParameter('-',Name='T',Delimiter=' '), # Print database information only (overrides all other options) [T/F] '-I':ValuedParameter('-',Name='I',Delimiter=' '), # Retrieve sequences with this PIG [Integer] Optional '-P':ValuedParameter('-',Name='P',Delimiter=' '), } _parameters = {} _parameters.update(_options) _command = 'fastacmd' def _input_as_lines(self,data): if data: self.Parameters['-i']\ .on(super(FastaCmd,self)._input_as_lines(data)) return '' def _input_as_seqs(self,data): lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _input_as_string(self,data): """Makes data the value of a specific parameter This method returns the empty string. The parameter will be printed automatically once set. 
""" if data: self.Parameters['-s'].on(data) return '' def _out_filename(self): if self.Parameters['-o'].isOn(): aln_filename = self._absolute(str(self.Parameters['-o'].Value)) else: raise ValueError, "No output file specified." return aln_filename def _get_result_paths(self,data): result = {} if self.Parameters['-o'].isOn(): out_name = self._out_filename() result['FastaOut'] = ResultPath(Path=out_name,IsWritten=True) return result def seqs_to_stream(seqs, ih): """Converts seqs into stream of FASTA records, depending on input handler. Each FASTA record will be a list of lines. """ if ih == '_input_as_multiline_string': recs = FastaFinder(seqs.split('\n')) elif ih == '_input_as_string': recs = FastaFinder(open(seqs)) elif ih == '_input_as_seqs': recs = [['>'+str(i), s] for i, s in enumerate(seqs)] elif ih == '_input_as_lines': recs = FastaFinder(seqs) else: raise TypeError, "Unknown input handler %s" % ih return recs #SOME FUNCTIONS TO EXECUTE THE MOST COMMON TASKS def blast_seqs(seqs, blast_constructor, blast_db=None, blast_mat_root=None, params={}, add_seq_names=True, out_filename=None, WorkingDir=None, SuppressStderr=None, SuppressStdout=None, input_handler=None, HALT_EXEC=False ): """Blast list of sequences. seqs: either file name or list of sequence objects or list of strings or single multiline string containing sequences. WARNING: DECISION RULES FOR INPUT HANDLING HAVE CHANGED. Decision rules for data are as follows. If it's s list, treat as lines, unless add_seq_names is true (in which case treat as list of seqs). If it's a string, test whether it has newlines. If it doesn't have newlines, assume it's a filename. If it does have newlines, it can't be a filename, so assume it's a multiline string containing sequences. If you want to skip the detection and force a specific type of input handler, use input_handler='your_favorite_handler'. add_seq_names: boolean. if True, sequence names are inserted in the list of sequences. 
    if False, it assumes seqs is a list of lines of some proper format
    that the program can handle
    """
    # set num keep
    if blast_db:
        params["-d"] = blast_db
    if out_filename:
        params["-o"] = out_filename
    ih = input_handler or guess_input_handler(seqs, add_seq_names)

    blast_app = blast_constructor(
            params=params,
            blast_mat_root=blast_mat_root,
            InputHandler=ih,
            WorkingDir=WorkingDir,
            SuppressStderr=SuppressStderr,
            SuppressStdout=SuppressStdout,
            HALT_EXEC=HALT_EXEC)

    return blast_app(seqs)

def fasta_cmd_get_seqs(acc_list,
        blast_db=None,
        is_protein=None,
        out_filename=None,
        params={},
        WorkingDir="/tmp",
        SuppressStderr=None,
        SuppressStdout=None):
    """Retrieve sequences for list of accessions """
    if is_protein is None:
        params["-p"] = 'G'
    elif is_protein:
        params["-p"] = 'T'
    else:
        params["-p"] = 'F'

    if blast_db:
        params["-d"] = blast_db
    if out_filename:
        params["-o"] = out_filename
    # turn off duplicate accessions
    params["-a"] = "F"

    # create the FastaCmd application controller
    fasta_cmd = FastaCmd(params=params,
            InputHandler='_input_as_string',
            WorkingDir=WorkingDir,
            SuppressStderr=SuppressStderr,
            SuppressStdout=SuppressStdout)

    # return results
    return fasta_cmd("\"%s\"" % ','.join(acc_list))

def fastacmd_is_crap(line):
    """Handles missing ids..."""
    return (not line) or line.isspace() or line.startswith('[')

FastaCmdFinder = LabeledRecordFinder(is_fasta_label, ignore=fastacmd_is_crap)

def seqs_from_fastacmd(acc_list, blast_db, is_protein=True):
    """Get dict of description:seq from fastacmd."""
    fasta_cmd_res = fasta_cmd_get_seqs(acc_list, blast_db=blast_db, \
        is_protein=is_protein)
    recs = FastaCmdFinder(fasta_cmd_res['StdOut'])
    result = {}
    for rec in recs:
        try:
            result[rec[0][1:].strip()] = ''.join(map(strip, rec[1:]))
        except IndexError:
            #maybe we didn't get a sequence?
            pass
    fasta_cmd_res.cleanUp()
    return result

def psiblast_n_neighbors(seqs,
        n=100,
        blast_db=None,
        core_threshold=1e-50,
        extra_threshold=1e-10,
        lower_threshold=1e-6,
        step=100,
        method="two-step",
        blast_mat_root=None,
        params={},
        add_seq_names=False,
        WorkingDir=None,
        SuppressStderr=None,
        SuppressStdout=None,
        input_handler=None,
        scorer=3,   #shotgun with 3 hits needed to keep
        second_db=None
        ):
    """PsiBlasts sequences, stopping when n neighbors are reached.

    core_threshold: threshold for the core profile (default: 1e-50)
    extra_threshold: threshold for pulling in additional seqs
    (default: 1e-10)
    lower_threshold: threshold for seqs in final round (default: 1e-6)

    seqs: either file name or list of sequence objects or list of strings
    or single multiline string containing sequences.

    If you want to skip the detection and force a specific type of input
    handler, use input_handler='your_favorite_handler'.

    add_seq_names: boolean. if True, sequence names are inserted in the
    list of sequences. if False, it assumes seqs is a list of lines of
    some proper format that the program can handle
    """
    if blast_db:
        params["-d"] = blast_db
    ih = input_handler or guess_input_handler(seqs, add_seq_names)
    recs = seqs_to_stream(seqs, ih)
    #checkpointing can only handle one seq...
    #set up the parameters for the core and additional runs
    max_iterations = params['-j']
    params['-j'] = 2    #won't checkpoint with single iteration
    app = PsiBlast(params=params,
            blast_mat_root=blast_mat_root,
            InputHandler='_input_as_lines',
            WorkingDir=WorkingDir,
            SuppressStderr=SuppressStderr,
            SuppressStdout=SuppressStdout,
            )
    result = {}
    for seq in recs:
        query_id = seq[0][1:].split(None, 1)[0]
        if method == "two-step":
            result[query_id] = ids_from_seq_two_step(seq, n, max_iterations, \
                app, core_threshold, extra_threshold, lower_threshold,
                second_db)
        elif method == "lower_threshold":
            result[query_id] = ids_from_seq_lower_threshold(seq, n, \
                max_iterations, app, core_threshold, lower_threshold, step)
        elif method == "iterative":
            result[query_id] = ids_from_seqs_iterative(seq, app, \
                QMEPsiBlast9, scorer, params['-j'], n)
        else:
            raise TypeError, "Got unknown method %s" % method
    params['-j'] = max_iterations
    return result

def ids_from_seq_two_step(seq, n, max_iterations, app, core_threshold, \
        extra_threshold, lower_threshold, second_db=None):
    """Returns ids that match a seq, using a 2-tiered strategy.

    Optionally uses a second database for the second search.
""" #first time through: reset 'h' and 'e' to core #-h is the e-value threshold for including seqs in the score matrix model app.Parameters['-h'].on(core_threshold) #-e is the e-value threshold for the final blast app.Parameters['-e'].on(core_threshold) checkpoints = [] ids = [] last_num_ids = None for i in range(max_iterations): if checkpoints: app.Parameters['-R'].on(checkpoints[-1]) curr_check = 'checkpoint_%s.chk' % i app.Parameters['-C'].on(curr_check) output = app(seq) #if we didn't write a checkpoint, bail out if not access(curr_check, F_OK): break #if we got here, we wrote a checkpoint file checkpoints.append(curr_check) result = list(output.get('BlastOut', output['StdOut'])) output.cleanUp() if result: ids = LastProteinIds9(result,keep_values=True,filter_identity=False) num_ids = len(ids) if num_ids >= n: break if num_ids == last_num_ids: break last_num_ids = num_ids #if we didn't write any checkpoints, second run won't work, so return ids if not checkpoints: return ids #if we got too many ids and don't have a second database, return the ids we got if (not second_db) and num_ids >= n: return ids #second time through: reset 'h' and 'e' to get extra hits, and switch the #database if appropriate app.Parameters['-h'].on(extra_threshold) app.Parameters['-e'].on(lower_threshold) if second_db: app.Parameters['-d'].on(second_db) for i in range(max_iterations): #will always have last_check if we get here app.Parameters['-R'].on(checkpoints[-1]) curr_check = 'checkpoint_b_%s.chk' % i app.Parameters['-C'].on(curr_check) output = app(seq) #bail out if we couldn't write a checkpoint if not access(curr_check, F_OK): break #if we got here, the checkpoint worked checkpoints.append(curr_check) result = list(output.get('BlastOut', output['StdOut'])) if result: ids = LastProteinIds9(result,keep_values=True,filter_identity=False) num_ids = len(ids) if num_ids >= n: break if num_ids == last_num_ids: break last_num_ids = num_ids #return the ids we got. 
may not be as many as we wanted. for c in checkpoints: remove(c) return ids class ThresholdFound(Exception): pass def ids_from_seq_lower_threshold(seq, n, max_iterations, app, core_threshold, \ lower_threshold, step=100): """Returns ids that match a seq, progressively relaxing the e-value threshold.""" last_num_ids = None checkpoints = [] cp_name_base = make_unique_str() # cache ids for each iteration # store { iteration_num:(core_threshold, [list of matching ids]) } all_ids = {} try: i=0 while 1: #-h is the e-value threshold for inclusion in the score matrix model app.Parameters['-h'].on(core_threshold) app.Parameters['-e'].on(core_threshold) if core_threshold > lower_threshold: raise ThresholdFound if checkpoints: #-R restarts from a previously stored file app.Parameters['-R'].on(checkpoints[-1]) #store the score model from this iteration curr_check = 'checkpoint_' + cp_name_base + '_' + str(i) + \ '.chk' app.Parameters['-C'].on(curr_check) output = app(seq) result = list(output.get('BlastOut', output['StdOut'])) #sometimes fails on first try -- don't know why, but this seems #to fix the problem while not result: output = app(seq) result = list(output.get('BlastOut', output['StdOut'])) ids = LastProteinIds9(result,keep_values=True,filter_identity=False) output.cleanUp() all_ids[i + 1] = (core_threshold, copy(ids)) if not access(curr_check, F_OK): raise ThresholdFound checkpoints.append(curr_check) num_ids = len(ids) if num_ids >= n: raise ThresholdFound last_num_ids = num_ids core_threshold *= step if i >= max_iterations - 1: #because max_iterations is 1-based raise ThresholdFound i += 1 except ThresholdFound: for c in checkpoints: remove(c) #turn app.Parameters['-R'] off so that for the next file it does not #try to read in a checkpoint file that is not there app.Parameters['-R'].off() return ids, i + 1, all_ids def make_unique_str(num_chars=20): """make a random string of characters for a temp filename""" chars = 'abcdefghijklmnopqrstuvwxyz' all_chars = chars + chars.upper() +
'0123456789' picks = list(all_chars) return ''.join([choice(picks) for i in range(num_chars)]) def make_subject_match_scorer(count): def subject_match_scorer(checked_ids): """From {subject:{query:score}} returns subject ids w/ >= count hits. Useful for eliminating subjects with few homologs. """ return [key for key, val in checked_ids.items() if len(val) >= count] return subject_match_scorer def make_shotgun_scorer(count): def shotgun_scorer(checked_ids): """From {subject:{query:score}} returns any ids w/ >= count hits. A hit counts towards a sequence's score if it was either the subject or the query, but we don't double-count (subject, query) pairs, i.e. if A hits B and B hits A, only one (A,B) hit will be counted, although it will be counted as both (A,B) and (B,A) (i.e. it will help preserve both A and B). """ result = {} for subject, val in checked_ids.items(): for query in val.keys(): if subject not in result: result[subject] = {} result[subject][query] = True if query not in result: result[query] = {} result[query][subject] = True return [key for key, val in result.items() if len(val) >= count] return shotgun_scorer def keep_everything_scorer(checked_ids): """Returns every query and every match in checked_ids, with best score.""" result = checked_ids.keys() for i in checked_ids.values(): result.extend(i.keys()) return dict.fromkeys(result).keys() def ids_from_seqs_iterative(seqs, app, query_parser, \ scorer=keep_everything_scorer, max_iterations=None, blast_db=None,\ max_seqs=None, ): """Gets the ids from each seq, then does each additional id until all done. If scorer is passed in as an int, uses shotgun scorer with that # hits.
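The symmetric counting that the shotgun scorer performs is easiest to see with a tiny worked example. The sketch below mirrors the make_shotgun_scorer closure above (using dict.setdefault for brevity); the ids and e-values in the sample dict are invented for illustration:

```python
def make_shotgun_scorer(count):
    """Mirror of the closure above: keep ids with >= count distinct partners."""
    def shotgun_scorer(checked_ids):
        result = {}
        for subject, val in checked_ids.items():
            for query in val.keys():
                # record the hit in both directions, without double-counting
                result.setdefault(subject, {})[query] = True
                result.setdefault(query, {})[subject] = True
        return [key for key, val in result.items() if len(val) >= count]
    return shotgun_scorer

# A hits B and C; B hits A. Partner sets: A -> {B, C}, B -> {A}, C -> {A}.
scorer = make_shotgun_scorer(2)
hits = {'A': {'B': 1e-5, 'C': 1e-4}, 'B': {'A': 1e-5}}
print(sorted(scorer(hits)))  # ['A'] -- only A has >= 2 distinct partners
```

Note that the reciprocal pair (A, B) counts once toward each of A and B, which is exactly the "help preserve both A and B" behavior described in the docstring above.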
""" if isinstance(scorer, int): scorer = make_shotgun_scorer(scorer) seqs_to_check = list(seqs) checked_ids = {} curr_iteration = 0 while seqs_to_check: unchecked_ids = {} #pass seqs to command all_output = app(seqs_to_check) output = all_output.get('BlastOut', all_output['StdOut']) for query_id, match_id, match_score in query_parser(output): if query_id not in checked_ids: checked_ids[query_id] = {} checked_ids[query_id][match_id] = match_score if match_id not in checked_ids: unchecked_ids[match_id] = True all_output.cleanUp() if unchecked_ids: seq_file = fasta_cmd_get_seqs(unchecked_ids.keys(), app.Parameters['-d'].Value)['StdOut'] seqs_to_check = [] for s in FastaCmdFinder(fasta_cmd_get_seqs(\ unchecked_ids.keys(), app.Parameters['-d'].Value)['StdOut']): seqs_to_check.extend(s) else: seqs_to_check = [] #bail out if max iterations or max seqs was defined and we've reached it curr_iteration += 1 if max_iterations and (curr_iteration >= max_iterations): break if max_seqs: curr = scorer(checked_ids) if len(curr) >= max_seqs: return curr return scorer(checked_ids) #scorer should return list of good ids def blastp(seqs, blast_db="nr", e_value="1e-20", max_hits=200, working_dir="/tmp", blast_mat_root=None, extra_params={}): """ Returns BlastResult from input seqs, using blastp. 
seqs: input sequences. blast_db: name of the formatted protein database to search, passed to blastall as -d (default nr). e_value: e-value cutoff, passed as -e. max_hits: maximum hits to report, passed as -b and -v. extra_params: dict of additional blastall parameters. """ # set up params to use with blastp params = { # matrix "-M":"BLOSUM62", # max procs "-a":"1", # expectation "-e":e_value, # max seqs to show "-b":max_hits, # max one line descriptions "-v":max_hits, # program "-p":"blastp" } params.update(extra_params) # blast blast_res = blast_seqs(seqs, Blastall, blast_mat_root=blast_mat_root, blast_db=blast_db, params=params, add_seq_names=False, WorkingDir=working_dir ) # get prot id map if blast_res['StdOut']: lines = [x for x in blast_res['StdOut']] return BlastResult(lines) return None def blastn(seqs, blast_db="nt", e_value="1e-20", max_hits=200, working_dir="/tmp", blast_mat_root=None, extra_params={}): """ Returns BlastResult from input seqs, using blastn. seqs: input sequences. blast_db: name of the formatted nucleotide database to search, passed to blastall as -d (default nt). e_value: e-value cutoff, passed as -e. max_hits: maximum hits to report, passed as -b and -v. extra_params: dict of additional blastall parameters. """ # set up params to use with blastn params = { # matrix "-M":"BLOSUM62", # max procs "-a":"1", # expectation "-e":e_value, # max seqs to show "-b":max_hits, # max one line descriptions "-v":max_hits, # program "-p":"blastn" } params.update(extra_params) # blast blast_res = blast_seqs(seqs, Blastall, blast_mat_root=blast_mat_root, blast_db=blast_db, params=params, add_seq_names=False, WorkingDir=working_dir ) # get hit id map if blast_res['StdOut']: lines = [x for x in blast_res['StdOut']] return BlastResult(lines) return None def blastx(seqs, params=None): """Returns BlastResults from input seqs, using blastx.""" raise NotImplementedError def tblastx(seqs, params=None): """Returns BlastResults from input seqs, using tblastx.""" raise NotImplementedError def psiblast(seqs, params=None): """Returns BlastResults from input seqs, using psiblast.""" raise NotImplementedError def reciprocal_best_blast_hit(query_id, db_1, db_2, exclude_self_hits=True,\ params=None): """Returns best hit in db_2 that maps back to query_id in db_1, or None. exclude_self_hits: if True (the default), returns the best hit that doesn't have the same id.
Otherwise, will return the same id if it is in both databases (assuming it's the same sequence in both). """ raise NotImplementedError #make with factory functions for the blast hits if __name__ == "__main__": print "Debug. examples of how i've been using." print "Example of straightforward BLAST" # WARNING: I changed a bunch of stuff to make testing easier, since nr doesn't # fit in memory on my laptop. I created a database 'eco' using formatdb on the # E. coli K12 fasta file from this URL: # ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/NC_000913.faa # Because we're blasting an archaeal sequence against one bacterial genome, I # relaxed the inclusion thresholds substantially. DO NOT USE THESE AGAINST NR! in_filename = "test_seq.fasta" out_filename = "test.out" # if blast env variable set, can just say 'nr' #BLAST_DB = "/home/hamady/quicksand/data/blast/db/nr" BLAST_DB = 'nr' #'nr' BLAST_MAT_ROOT="/home/hamady/apps/blast-2.2.9/data" #BLAST_MAT_ROOT='/Users/rob/ncbi/data' # set up params to use with iterative #print seqs_from_fastacmd(['16766313'], 'nr', True) #raise ValueError, "dbug" params = { # matrix "-M":"PAM70", # max procs "-a":2, # expect "-e":1e-15, # blastall # # program # "-p":"blastp", # psi-blast # max iterations "-j":2, # max seqs to show "-b":50, # inclusion "-h":1e-2, } in_seqs = """>stm:STMabcdef thrA; aspartokinase I MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTIGGQDA LPNISDAERIFSDLLAGLASAQPGFPLARLKMVVEQEFAQIKHVLHGISLLGQCPDSINA ALICRGEKMSIAIMAGLLEARGHRVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASQIP ADHMILMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQV PDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASDS DDNLPVKGISNLNNMAMFSVSGPGMKGMIGMAARVFAAMSRAGISVVLITQSSSEYSISF CVPQSDCARARRAMQDEFYLELKEGLLEPLAVTERLAIISVVGDGMRTLRGISAKFFAAL ARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGAL""" # The following should now give the same output: # # in_seqs = 'tiny.faa' #tiny.faa in cwd contains the sequence above # # in_seqs 
= """>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577 #MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFE #NELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLG #SVTENVIKKSNKPVLVVKRKNS""".split() #lines instead of multiline string # blast_res = blast_seqs(in_seqs, Blastall, blast_mat_root=BLAST_MAT_ROOT, add_seq_names=False, blast_db=BLAST_DB, params={'-p': 'blastp','-e': '1','-m': 9}, out_filename=out_filename) print [x for x in blast_res['StdOut']] print [x for x in blast_res['StdErr']] print blast_res #for x in blast_res['BlastOut']: # print x.rstrip() blast_res.cleanUp() #print '\n\n' #print "Example of psiblast_n_neighbors" #print "Method 1: two-step with high- and low-confidence matches" #print psiblast_n_neighbors(in_seqs, n=10, blast_db=BLAST_DB, \ # method="two-step", blast_mat_root=BLAST_MAT_ROOT,params=params,\ # core_threshold=1e-5, extra_threshold=1e-2, lower_threshold=1e-1) #print #print "Method 2: keep lowering threshold" #print psiblast_n_neighbors(in_seqs, n=10, blast_db=BLAST_DB, \ # method="lower_threshold", blast_mat_root=BLAST_MAT_ROOT,params=params, # core_threshold=1e-6, lower_threshold=1e-2) #print #print "Method 3: psi-blast shotgun" #print psiblast_n_neighbors(in_seqs, n=10, blast_db=BLAST_DB, \ # method="iterative", blast_mat_root=BLAST_MAT_ROOT,params=params, # core_threshold=1e-5, lower_threshold=1e-2) #print #print "Method 4: two-step with high- and low-confidence matches, diff dbs" #print psiblast_n_neighbors(in_seqs, n=10, blast_db=BLAST_DB, \ # method="two-step", blast_mat_root=BLAST_MAT_ROOT,params=params,\ # core_threshold=1e-5, extra_threshold=1e-2, lower_threshold=1e-1, second_db='stm') #print PyCogent-1.5.3/cogent/app/blat.py000644 000765 000024 00000040130 12024702176 017572 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for BLAT v34""" from cogent.app.parameters import FlagParameter, ValuedParameter, \ MixedParameter, FilePath from cogent.app.util import 
CommandLineApplication, ResultPath, \ ApplicationError, get_tmp_filename from cogent import DNA, PROTEIN from cogent.core.genetic_code import DEFAULT as standard_code from cogent.parse.fasta import MinimalFastaParser from os import remove from os.path import isabs __author__ = "Adam Robbins-Pianka" __copyright__ = "Copyright 2007-2012, The QIIME Project" __credits__ = ["Adam Robbins-Pianka", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Adam Robbins-Pianka" __email__ = "adam.robbinspianka@colorado.edu" __status__ = "Prototype" class Blat(CommandLineApplication): """BLAT generic application controller""" _command = 'blat' _input_handler = "_input_as_list" _database_types = ['dna', 'prot', 'dnax'] _query_types = ['dna', 'rna', 'prot', 'dnax', 'rnax'] _mask_types = ['lower', 'upper', 'out', 'file.out'] _out_types = ['psl', 'pslx', 'axt', 'maf', 'sim4', 'wublast', 'blast', 'blast8', 'blast9'] _valid_combinations = [('dna', 'dna'), ('dna', 'rna'), ('prot', 'prot'), ('dnax', 'prot'), ('dnax', 'dnax'), ('dnax', 'rnax')] _database = None _query = None _output = None _parameters = { # database type (dna, prot, or dnax, where dnax is DNA sequence # translated in six frames to protein) '-t':ValuedParameter('-',Delimiter='=',Name='t'), # query type (dna, rna, prot, dnax, rnax, where rnax is DNA sequence # translated in three frames to protein) '-q':ValuedParameter('-',Delimiter='=',Name='q'), # Use overused tile file N.ooc, and N should correspond to the tileSize '-ooc':ValuedParameter('-',Delimiter='=',Name='ooc', IsPath=True), # Sets the size of a match that triggers an alignment '-tileSize':ValuedParameter('-',Delimiter='=',Name='tileSize'), # Spacing between tiles. '-stepSize':ValuedParameter('-',Delimiter='=',Name='stepSize'), # If set to 1, allows one mismatch in the tile and still triggers # an alignment.
'-oneOff':ValuedParameter('-',Delimiter='=',Name='oneOff'), # sets the number of tile matches '-minMatch':ValuedParameter('-',Delimiter='=',Name='minMatch'), #sets the minimum score '-minScore':ValuedParameter('-',Delimiter='=',Name='minScore'), # sets the minimum sequence identity in percent '-minIdentity':ValuedParameter('-',Delimiter='=',Name='minIdentity'), # sets the size of the maximum gap between tiles in a clump '-maxGap':ValuedParameter('-',Delimiter='=',Name='maxGap'), # make an overused tile file. Target needs to be complete genome. '-makeOoc':ValuedParameter('-',Delimiter='=',Name='makeOoc', IsPath=True), # sets the number of repetitions of a tile allowed before it is marked # as overused '-repMatch':ValuedParameter('-',Delimiter='=',Name='repMatch'), # mask out repeats. Alignments won't be started in masked region but # may extend through it in nucleotide searches. Masked areas are # ignored entirely in protein or translated searches. Types are: # lower, upper, out, file.out (file.out - mask database according to # RepeatMasker file.out) '-mask':ValuedParameter('-',Delimiter='=',Name='mask'), # Mask out repeats in query sequence. Similar to -mask but for query # rather than target sequence '-qMask':ValuedParameter('-',Delimiter='=',Name='qMask'), # repeat bases will not be masked in any way, but matches in # repeat areas will be reported separately from matches in other # areas in the psl output '-repeats':ValuedParameter('-',Delimiter='=',Name='repeats'), # minimum percent divergence of repeats to allow them to be unmasked '-minRepDivergence':ValuedParameter('-',Delimiter='=', Name='minRepDivergence'), # output dot every N sequences to show program's progress '-dots':ValuedParameter('-',Delimiter='=',Name='dots'), # controls output file format. One of: # psl - Default.
Tab separated format, no sequence # pslx - Tab separated format with sequence # axt - blastz-associated axt format # maf - multiz-associated maf format # sim4 - similar to sim4 format # wublast - similar to wublast format # blast - similar to NCBI blast format # blast8- NCBI blast tabular format # blast9 - NCBI blast tabular format with comments '-out':ValuedParameter('-',Delimiter='=',Name='out'), # sets maximum intron size '-maxIntron':ValuedParameter('-',Delimiter='=',Name='maxIntron'), # suppress column headers in psl output '-noHead':FlagParameter('-',Name='noHead'), # trim leading poly-T '-trimT':FlagParameter('-',Name='trimT'), # do not trim trailing poly-A '-noTrimA':FlagParameter('-',Name='noTrimA'), # Remove poly-A tail from qSize as well as alignments in psl output '-trimHardA':FlagParameter('-',Name='trimHardA'), # run for fast DNA/DNA remapping - not allowing introns, # requiring high %ID '-fastMap':FlagParameter('-',Name='fastMap'), # for high quality mRNAs, look harder for small initial and terminal # exons '-fine':FlagParameter('-',Name='fine'), # Allows extension of alignment through large blocks of N's '-extendThroughN':FlagParameter('-',Name='extendThroughN') } def _get_result_paths(self, data): """Returns the file location for result output """ return {'output':ResultPath(data[2], IsWritten=True)} def _get_base_command(self): """Gets the command that will be run when the app controller is called. """ command_parts = [] cd_command = ''.join(['cd ',str(self.WorkingDir),';']) if self._command is None: raise ApplicationError, '_command has not been set.' 
command = self._command parameters = sorted([str(x) for x in self.Parameters.values() if str(x)]) synonyms = self._synonyms command_parts.append(cd_command) command_parts.append(command) command_parts.append(self._database) # Positional argument command_parts.append(self._query) # Positional argument command_parts += parameters if self._output: command_parts.append(self._output.Path) # Positional return self._command_delimiter.join(filter(None,command_parts)).strip() BaseCommand = property(_get_base_command) def _input_as_list(self, data): '''Takes the positional arguments as input in a list. The list input here should be [query_file_path, database_file_path, output_file_path]''' query, database, output = data if (not isabs(database)) \ or (not isabs(query)) \ or (not isabs(output)): raise ApplicationError, "Only absolute paths allowed.\n%s" %\ ', '.join(data) self._database = FilePath(database) self._query = FilePath(query) self._output = ResultPath(output, IsWritten=True) ## check parameters that can only take a particular set of values # check combination of database and query type if self.Parameters['-t'].isOn() and self.Parameters['-q'].isOn() and \ (self.Parameters['-t'].Value, self.Parameters['-q'].Value) not in \ self._valid_combinations: error_message = "Invalid combination of database and query " + \ "types ('%s', '%s').\n" % \ (self.Parameters['-t'].Value, self.Parameters['-q'].Value) error_message += "Must be one of: %s\n" % \ repr(self._valid_combinations) raise ApplicationError(error_message) # check database type if self.Parameters['-t'].isOn() and \ self.Parameters['-t'].Value not in self._database_types: error_message = "Invalid database type %s\n" % \ self.Parameters['-t'].Value error_message += "Allowed values: %s\n" % \ ', '.join(self._database_types) raise ApplicationError(error_message) # check query type if self.Parameters['-q'].isOn() and \ self.Parameters['-q'].Value not in self._query_types: error_message = "Invalid query type %s\n" % \
self.Parameters['-q'].Value error_message += "Allowed values: %s\n" % \ ', '.join(self._query_types) raise ApplicationError(error_message) # check mask type if self.Parameters['-mask'].isOn() and \ self.Parameters['-mask'].Value not in self._mask_types: error_message = "Invalid mask type %s\n" % \ self.Parameters['-mask'].Value error_message += "Allowed values: %s\n" % \ ', '.join(self._mask_types) raise ApplicationError(error_message) # check qmask type if self.Parameters['-qMask'].isOn() and \ self.Parameters['-qMask'].Value not in self._mask_types: error_message = "Invalid qMask type %s\n" % \ self.Parameters['-qMask'].Value error_message += "Allowed values: %s\n" % \ ', '.join(self._mask_types) raise ApplicationError(error_message) # check repeat type if self.Parameters['-repeats'].isOn() and \ self.Parameters['-repeats'].Value not in self._mask_types: error_message = "Invalid repeat type %s\n" % \ self.Parameters['-repeats'].Value error_message += "Allowed values: %s\n" % \ ', '.join(self._mask_types) raise ApplicationError(error_message) # check output format if self.Parameters['-out'].isOn() and \ self.Parameters['-out'].Value not in self._out_types: error_message = "Invalid output type %s\n" % \ self.Parameters['-out'].Value error_message += "Allowed values: %s\n" % \ ', '.join(self._out_types) raise ApplicationError(error_message) return '' def assign_reads_to_database(query_fasta_fp, database_fasta_fp, output_fp, params=None): """Assign a set of query sequences to a reference database query_fasta_fp : absolute file path to query sequences database_fasta_fp : absolute file path to the reference database output_fp : absolute file path of the output file to write params : dict of BLAT specific parameters. This method returns an open file object. The output format defaults to blast9 and should be parsable by the PyCogent BLAST parsers.
""" if params is None: params = {} if '-out' not in params: params['-out'] = 'blast9' blat = Blat(params = params) result = blat([query_fasta_fp, database_fasta_fp, output_fp]) return result['output'] def assign_dna_reads_to_dna_database(query_fasta_fp, database_fasta_fp, output_fp, params = {}): """Assign DNA reads to a database fasta of DNA sequences. Wraps assign_reads_to_database, setting database and query types. All parameters are set to default unless params is passed. query_fasta_fp: absolute path to the query fasta file containing DNA sequences. database_fasta_fp: absolute path to the database fasta file containing DNA sequences. output_fp: absolute path where the output file will be generated. params: optional. dict containing parameter settings to be used instead of default values. Cannot change database or query file types from dna and dna, respectively. This method returns an open file object. The output format defaults to blast9 and should be parsable by the PyCogent BLAST parsers. """ my_params = {'-t': 'dna', '-q': 'dna' } # if the user specified parameters other than default, then use them. # However, if they try to change the database or query types, raise an # applciation error. if '-t' in params or '-q' in params: raise ApplicationError("Cannot change database or query types when " +\ "using assign_dna_reads_to_dna_database. " +\ "Use assign_reads_to_database instead.\n") my_params.update(params) result = assign_reads_to_database(query_fasta_fp, database_fasta_fp, output_fp, my_params) return result def assign_dna_reads_to_protein_database(query_fasta_fp, database_fasta_fp, output_fp, temp_dir = "/tmp", params = {}): """Assign DNA reads to a database fasta of protein sequences. Wraps assign_reads_to_database, setting database and query types. All parameters are set to default unless params is passed. A temporary file must be written containing the translated sequences from the input query fasta file because BLAT cannot do this automatically. 
query_fasta_fp: absolute path to the query fasta file containing DNA sequences. database_fasta_fp: absolute path to the database fasta file containing protein sequences. output_fp: absolute path where the output file will be generated. temp_dir: optional. Change the location where the translated sequences will be written before being used as the query. Defaults to /tmp. params: optional. dict containing parameter settings to be used instead of default values. Cannot change database or query file types from protein and dna, respectively. This method returns an open file object. The output format defaults to blast9 and should be parsable by the PyCogent BLAST parsers. """ my_params = {'-t': 'prot', '-q': 'prot' } # make sure temp_dir specifies an absolute path if not isabs(temp_dir): raise ApplicationError, "temp_dir must be an absolute path." # if the user specified parameters other than default, then use them. # However, if they try to change the database or query types, raise an # application error. if '-t' in params or '-q' in params: raise ApplicationError, "Cannot change database or query types " + \ "when using " + \ "assign_dna_reads_to_protein_database. " + \ "Use assign_reads_to_database instead." my_params.update(params) # get six-frame translation of the input DNA sequences and write them to # a temporary file.
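The translation loop that follows relies on PyCogent's DNA.makeSequence and standard_code.sixframes. A rough, stdlib-only sketch of the same six-frame idea is below; the helper names (translate, six_frames) are hypothetical, and the toy codon table covers only the codons in the example sequence, not the full standard genetic code:

```python
# Toy sketch of six-frame translation. TABLE is deliberately tiny and only
# covers the codons that occur in the example; unknown codons become 'X'.
TABLE = {'ATG': 'M', 'GCC': 'A', 'AAA': 'K', 'TTT': 'F', 'GGC': 'G',
         'CAT': 'H', 'CCA': 'P'}
COMP = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

def translate(dna):
    # translate complete codons only, left to right
    return ''.join(TABLE.get(dna[i:i+3], 'X')
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    # reverse complement for the minus-strand frames
    rc = ''.join(COMP[b] for b in reversed(dna))
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])    # frames +1..+3
        frames[-(offset + 1)] = translate(rc[offset:])  # frames -1..-3
    return frames

frames = six_frames('ATGGCCAAA')
print(frames[1])   # 'MAK'
print(frames[-1])  # 'FGH' (from the reverse complement 'TTTGGCCAT')
```

Each input read thus yields six protein sequences, which is why the code below labels the temporary fasta entries with a `_frame_{frame}` suffix before handing them to BLAT as a protein query.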
tmp = get_tmp_filename(tmp_dir=temp_dir, result_constructor=str) tmp_out = open(tmp, 'w') for label, sequence in MinimalFastaParser(open(query_fasta_fp)): seq_id = label.split()[0] s = DNA.makeSequence(sequence) translations = standard_code.sixframes(s) frames = [1,2,3,-1,-2,-3] translations = dict(zip(frames, translations)) for frame, translation in sorted(translations.iteritems()): entry = '>{seq_id}_frame_{frame}\n{trans}\n' entry = entry.format(seq_id=seq_id, frame=frame, trans=translation) tmp_out.write(entry) tmp_out.close() result = assign_reads_to_database(tmp, database_fasta_fp, output_fp, \ params = my_params) remove(tmp) return result PyCogent-1.5.3/cogent/app/bwa.py000644 000765 000024 00000062714 12024702176 017435 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for BWA 0.6.2 (release 19 June 2012)""" from cogent.app.parameters import FlagParameter, ValuedParameter, \ MixedParameter, FilePath from cogent.app.util import CommandLineApplication, ResultPath, \ ApplicationError, get_tmp_filename from os.path import isabs __author__ = "Adam Robbins-Pianka" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Adam Robbins-Pianka", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Adam Robbins-Pianka" __email__ = "adam.robbinspianka@colorado.edu" __status__ = "Production" # helper functions for argument checking def is_int(x): # return true if it's an int return ((type(x) == int) or \ # or it's a string that is all digits (type(x) == str and x.isdigit()) or \ # otherwise return False False) def is_float(x): return (is_int(x) or \ # or if it's a float (type(x) == float) or \ # or it's a string with exactly one decimal and all digits on both sides of # the decimal (type(x) == str and '.' in x and all(map(str.isdigit, x.split('.', 1)))) \ # otherwise return False or False) #Base class class BWA(CommandLineApplication): """BWA generic application controller. 
Do not instantiate directly. Instead of instantiating this class, instantiate a subclass for each subcommand. Available subclasses are: BWA_index BWA_aln BWA_samse BWA_sampe BWA_bwasw """ # all subclasses will accept dictionaries as input that specify input # and output files. The required (and optional) types of input and output # files differ by subcommand. _input_handler = "_input_as_dict" # the main command. The program bwa should be in the PATH _command = "bwa" # holds the values of the dict handled by the input handler _input = {} # Each subclass can have a dictionary (keys = option names, e.g., -a # and values = boolean functions) called _valid_arguments # that specifies checks to be made on the parameters. def check_arguments(self): """Sanity check the arguments passed in. Uses the boolean functions specified in the subclasses in the _valid_arguments dictionary to determine if an argument is valid or invalid. """ for k, v in self.Parameters.iteritems(): if self.Parameters[k].isOn(): if k in self._valid_arguments: if not self._valid_arguments[k](v.Value): error_message = 'Invalid argument (%s) ' % v.Value error_message += 'for parameter %s\n' % k raise ApplicationError(error_message) def _get_base_command(self): """ Returns the full command string Overridden here because there are positional arguments (specifically the input and output files). """ command_parts = [] # Append a change directory to the beginning of the command to change # to self.WorkingDir before running the command # WorkingDir should be in quotes -- filenames might contain spaces cd_command = ''.join(['cd ',str(self.WorkingDir),';']) if self._command is None: raise ApplicationError, '_command has not been set.' command = self._command # also make sure there's a subcommand! if self._subcommand is None: raise ApplicationError, '_subcommand has not been set.' subcommand = self._subcommand # sorting makes testing easier, since the options will be written out # in alphabetical order.
Could of course use option parsing scripts # in cogent for this, but this works as well. parameters = sorted([str(x) for x in self.Parameters.values() if str(x)]) synonyms = self._synonyms command_parts.append(cd_command) command_parts.append(command) # add in subcommand command_parts.append(subcommand) command_parts += parameters # add in the positional arguments in the correct order for k in self._input_order: # this check is necessary to account for optional positional # arguments, such as the mate file for bwa bwasw # Note that the input handler will ensure that all required # parameters have valid values if k in self._input: command_parts.append(self._input[k]) return self._command_delimiter.join(command_parts).strip() BaseCommand = property(_get_base_command) def _input_as_dict(self, data): """Takes dictionary that sets input and output files. Valid keys for the dictionary are specified in the subclasses. File paths must be absolute. """ # clear self._input; ready to receive new input and output files self._input = {} # Check that the arguments to the # subcommand-specific parameters are valid self.check_arguments() # Ensure that we have all required input (file I/O) for k in self._input_order: # N.B.: optional positional arguments begin with underscore (_)! 
# (e.g., see _mate_in for bwa bwasw) if k[0] != '_' and k not in data: raise ApplicationError, "Missing required input %s" % k # Set values for input and output files for k in data: # check for unexpected keys in the dict if k not in self._input_order: error_message = "Invalid input arguments (%s)\n" % k error_message += "Valid keys are: %s" % repr(self._input_order) raise ApplicationError(error_message + '\n') # check for absolute paths if not isabs(data[k][0]): raise ApplicationError, "Only absolute paths allowed.\n%s" %\ repr(data) self._input[k] = data[k] # if there is a -f option to specify an output file, force the user to # use it (otherwise things go to stdout) if '-f' in self.Parameters and not self.Parameters['-f'].isOn(): raise ApplicationError, "Please specify an output file with -f" return '' class BWA_index(BWA): """Controls the "index" subcommand of the bwa application. Valid input keys are: fasta_in """ # the subcommand for bwa index _subcommand = "index" _parameters = { # which algorithm to use. # is # IS linear-time algorithm for constructing suffix array. It requires # 5.37N memory where N is the size of the database. IS is moderately # fast, but does not work with database larger than 2GB. IS is the # default algorithm due to its simplicity. The current codes for IS # algorithm are reimplemented by Yuta Mori. # # bwtsw # Algorithm implemented in BWT-SW. This method works with the whole # human genome, but it does not work with database smaller than 10MB # and it is usually slower than IS. # # DEFAULTs to auto-select (based on input fasta file size) '-a':ValuedParameter('-', Delimiter=' ', Name='a'), # prefix for the output index.
# DEFAULTs to the base name of the input fasta file '-p':ValuedParameter('-', Delimiter=' ', Name='p'), # index files named as .64.* instead of .* '-6':FlagParameter('-', Name='6') } # The -a option can take one of only two possible values # the -p option allows the user to specify a prefix; for our purposes, # this prefix should be an absolute path _valid_arguments = { '-a': lambda x: x in ['is', 'bwtsw'], '-p': isabs } # For the position specific arguments, this is the order that they will # be written in the base command # input file keys beginning with _ are optional inputs _input_order = ['fasta_in'] def _get_result_paths(self, data): """Gets the results for a run of bwa index. bwa index outputs 5 files when the index is created. The filename prefix will be the same as the input fasta, unless overridden with the -p option, and the 5 extensions are listed below: .amb .ann .bwt .pac .sa and these extensions (including the period) are the keys to the dictionary that is returned. """ # determine the names of the files. The name will be the same as the # input fasta file unless overridden with the -p option if self.Parameters['-p'].isOn(): prefix = self.Parameters['-p'].Value else: prefix = data['fasta_in'] # the 5 output file suffixes suffixes = ['.amb', '.ann', '.bwt', '.pac', '.sa'] out_files = {} for suffix in suffixes: out_files[suffix] = ResultPath(prefix+suffix, IsWritten=True) return out_files class BWA_aln(BWA): """Controls the "aln" subcommand of the bwa application.
Valid input keys are: prefix, fastq_in """ _parameters = { # max #diff (int) or missing prob under 0.02 err rate (float) [0.04] '-n': ValuedParameter('-', Delimiter=' ', Name='n'), #maximum number or fraction of gap opens [1] '-o': ValuedParameter('-', Delimiter=' ', Name='o'), #maximum number of gap extensions, -1 for disabling long gaps [-1] '-e': ValuedParameter('-', Delimiter=' ', Name='e'), #do not put an indel within bp towards the ends [5] '-i': ValuedParameter('-', Delimiter=' ', Name='i'), #maximum occurrences for extending a long deletion [10] '-d': ValuedParameter('-', Delimiter=' ', Name='d'), #seed length [32] '-l': ValuedParameter('-', Delimiter=' ', Name='l'), #maximum differences in the seed [2] '-k': ValuedParameter('-', Delimiter=' ', Name='k'), #maximum entries in the queue [2000000] '-m': ValuedParameter('-', Delimiter=' ', Name='m'), #number of threads [1] '-t': ValuedParameter('-', Delimiter=' ', Name='t'), #mismatch penalty [3] '-M': ValuedParameter('-', Delimiter=' ', Name='M'), #gap open penalty [11] '-O': ValuedParameter('-', Delimiter=' ', Name='O'), #gap extension penalty [4] '-E': ValuedParameter('-', Delimiter=' ', Name='E'), #stop searching when there are > equally best hits [30] '-R': ValuedParameter('-', Delimiter=' ', Name='R'), #quality threshold for read trimming down to 35bp [0] '-q': ValuedParameter('-', Delimiter=' ', Name='q'), #file to write output to instead of stdout '-f': ValuedParameter('-', Delimiter=' ', Name='f'), #length of barcode '-B': ValuedParameter('-', Delimiter=' ', Name='B'), #log-scaled gap penalty for long deletions '-L': FlagParameter('-', Name='L'), #non-iterative mode: search for all n-difference hits (slooow) '-N': FlagParameter('-', Name='N'), #the input is in the Illumina 1.3+ FASTQ-like format '-I': FlagParameter('-', Name='I'), #the input read file is in the BAM format '-b': FlagParameter('-', Name='b'), #use single-end reads only (effective with -b) '-0': FlagParameter('-', Name='0'), #use the 1st 
read in a pair (effective with -b) '-1': FlagParameter('-', Name='1'), #use the 2nd read in a pair (effective with -b) '-2': FlagParameter('-', Name='2'), #filter Casava-filtered sequences '-Y': FlagParameter('-', Name='Y') } # the subcommand for bwa aln _subcommand = 'aln' _valid_arguments = { # check to see if this is decimal numbers '-n': is_float, # check to see if these are integers '-o': is_int, '-e': is_int, '-i': is_int, '-d': is_int, '-l': is_int, '-k': is_int, '-m': is_int, '-t': is_int, '-M': is_int, '-O': is_int, '-E': is_int, '-R': is_int, '-q': is_int, '-B': is_int, # check to see if this is an absolute file path '-f': isabs } # input file keys beginning with _ are optional inputs _input_order = ['prefix', 'fastq_in'] def _get_result_paths(self, data): """Gets the result file for a bwa aln run. There is only one output file of a bwa aln run, a .sai file and it can be retrieved with the key 'output'. """ return {'output': ResultPath(self.Parameters['-f'].Value, IsWritten=True)} class BWA_samse(BWA): """Controls the "samse" subcommand of the bwa application. Valid input keys are: prefix, sai_in, fastq_in """ _parameters = { # Maximum number of alignments to output in the XA tag for reads # paired properly. If a read has more than this number of hits, the # XA tag will not be written '-n': ValuedParameter('-', Delimiter=' ', Name='n'), #file to write output to instead of stdout '-f': ValuedParameter('-', Delimiter=' ', Name='f'), # Specify the read group in a format like '@RG\tID:foo\tSM:bar' '-r': ValuedParameter('-', Delimiter=' ', Name='r') } # the subcommand for samse _subcommand = 'samse' _valid_arguments = { # make sure that this is an int '-n': is_int, # check to see if this is an absolute file path '-f': isabs } # input file keys beginning with _ are optional inputs _input_order = ['prefix', 'sai_in', 'fastq_in'] def _get_result_paths(self, data): """Gets the result file for a bwa samse run. 
There is only one output file of a bwa samse run, a .sam file and it can be retrieved with the key 'output'. """ return {'output': ResultPath(self.Parameters['-f'].Value, IsWritten=True)} class BWA_sampe(BWA): """Controls the "sampe" subcommand of the bwa application. Valid input keys are: prefix, sai1_in, sai2_in, fastq1_in, fastq2_in """ _parameters = { # Maximum insert size for a read pair to be considered being mapped # properly '-a': ValuedParameter('-', Delimiter=' ', Name='a'), # Maximum occurrences of a read for pairing '-o': ValuedParameter('-', Delimiter=' ', Name='o'), # Load the entire FM-index into memory to reduce disk operations '-P': FlagParameter('-', Name='P'), # maximum hits to output for paired reads [3] '-n': ValuedParameter('-', Delimiter=' ', Name='n'), # maximum hits to output for discordant pairs [10] '-N': ValuedParameter('-', Delimiter=' ', Name='N'), #file to write output to instead of stdout '-f': ValuedParameter('-', Delimiter=' ', Name='f'), # Specify the read group in a format like '@RG\tID:foo\tSM:bar' '-r': ValuedParameter('-', Delimiter=' ', Name='r'), # disable Smith-Waterman for the unmapped mate '-s': FlagParameter('-', Name='s'), # prior of chimeric rate (lower bound) [1.0e-05] '-c': ValuedParameter('-', Delimiter= ' ', Name='c'), # disable insert size estimate (force -s) '-A': FlagParameter('-', Name='A') } # the subcommand for sampe _subcommand = 'sampe' _valid_arguments = { # make sure this is a float '-c': is_float, # make sure these are all ints '-a': is_int, '-o': is_int, '-n': is_int, '-N': is_int, # check to see if this is an absolute file path '-f': isabs } # input file keys beginning with _ are optional inputs _input_order = ['prefix', 'sai1_in', 'sai2_in', 'fastq1_in', 'fastq2_in'] def _get_result_paths(self, data): """Gets the result file for a bwa sampe run. There is only one output file of a bwa sampe run, a .sam file, and it can be retrieved with the key 'output'. 
""" return {'output': ResultPath(self.Parameters['-f'].Value, IsWritten=True)} class BWA_bwasw(BWA): """Controls the "bwasw" subcommand of the bwa application. Valid input keys are: prefix, query_fasta, _query_fasta2 input keys beginning with an underscore are optional. """ _parameters = { #Score of a match [1] '-a': ValuedParameter('-', Delimiter=' ', Name='a'), #Mismatch penalty [3] '-b': ValuedParameter('-', Delimiter=' ', Name='b'), #Gap open penalty [5] '-q': ValuedParameter('-', Delimiter=' ', Name='q'), #Gap extension penalty. '-r': ValuedParameter('-', Delimiter=' ', Name='r'), # mask level [0.50] '-m': ValuedParameter('-', Delimiter=' ', Name='m'), #Number of threads in the multi-threading mode [1] '-t': ValuedParameter('-', Delimiter=' ', Name='t'), # file to output results to instead of stdout '-f': ValuedParameter('-', Delimiter=' ', Name='f'), #Band width in the banded alignment [33] '-w': ValuedParameter('-', Delimiter=' ', Name='w'), #Minimum score threshold divided by a [30] '-T': ValuedParameter('-', Delimiter=' ', Name='T'), #Coefficient for threshold adjustment according to query length. #Given an l-long query, the threshold for a hit to be retained is #a*max{T,c*log(l)}. [5.5] '-c': ValuedParameter('-', Delimiter=' ', Name='c'), #Z-best heuristics. Higher -z increases accuracy at the cost #of speed. [1] '-z': ValuedParameter('-', Delimiter=' ', Name='z'), #Maximum SA interval size for initiating a seed. Higher -s increases #accuracy at the cost of speed. [3] '-s': ValuedParameter('-', Delimiter=' ', Name='s'), #Minimum number of seeds supporting the resultant alignment to #trigger reverse alignment. 
[5] '-N': ValuedParameter('-', Delimiter=' ', Name='N'), # in SAM output, use hard clipping instead of soft clipping '-H': FlagParameter('-', Name='H'), # mark multi-part alignments as secondary '-M': FlagParameter('-', Name='M'), # skip Smith-Waterman read pairing '-S': FlagParameter('-', Name='S'), # ignore pairs with insert >= INT for inferring the size of distr # [20000] '-I': ValuedParameter('-', Delimiter=' ', Name='I') } # the subcommand for bwasw _subcommand = 'bwasw' # input file keys beginning with _ are optional inputs _input_order = ['prefix', 'query_fasta', '_query_fasta_2'] _valid_arguments = { # Make sure this is a float '-c': is_float, '-m': is_float, # Make sure these are ints '-a': is_int, '-b': is_int, '-q': is_int, '-r': is_int, '-t': is_int, '-w': is_int, '-T': is_int, '-z': is_int, '-s': is_int, '-N': is_int, '-I': is_int, # make sure this is an absolute path '-f': isabs } def _get_result_paths(self, data): """Gets the result file for a bwa bwasw run. There is only one output file of a bwa bwasw run, a .sam file, and it can be retrieved with the key 'output'. """ return {'output': ResultPath(self.Parameters['-f'].Value, IsWritten=True)} def create_bwa_index_from_fasta_file(fasta_in, params=None): """Create a BWA index from an input fasta file. fasta_in: the input fasta file from which to create the index params: dict of bwa index specific parameters This method returns a dictionary where the keys are the various output suffixes (.amb, .ann, .bwt, .pac, .sa) and the values are open file objects. The index prefix will be the same as fasta_in, unless the -p parameter is passed in params.
""" if params is None: params = {} # Instantiate the app controller index = BWA_index(params) # call the application, passing the fasta file in results = index({'fasta_in':fasta_in}) return results def assign_reads_to_database(query, database_fasta, out_path, params=None): """Assign a set of query sequences to a reference database database_fasta_fp: absolute file path to the reference database query_fasta_fp: absolute file path to query sequences output_fp: absolute file path of the file to be output params: dict of BWA specific parameters. * Specify which algorithm to use (bwa-short or bwasw) using the dict key "algorithm" * if algorithm is bwasw, specify params for the bwa bwasw subcommand * if algorithm is bwa-short, specify params for the bwa samse subcommand * if algorithm is bwa-short, must also specify params to use with bwa aln, which is used to get the sai file necessary to run samse. bwa aln params should be passed in using dict key "aln_params" and the associated value should be a dict of params for the bwa aln subcommand * if a temporary directory is not specified in params using dict key "temp_dir", it will be assumed to be /tmp This method returns an open file object (SAM format). """ if params is None: params = {} # set the output path params['-f'] = out_path # if the algorithm is not specified in the params dict, or the algorithm # is not recognized, raise an exception if 'algorithm' not in params: raise ApplicationError("Must specify which algorithm to use " + \ "('bwa-short' or 'bwasw')") elif params['algorithm'] not in ('bwa-short', 'bwasw'): raise ApplicationError('Unknown algorithm "%s". ' % \ params['algorithm'] + \ "Please enter either 'bwa-short' or 'bwasw'.") # if the temp directory is not specified, assume /tmp if 'temp_dir' not in params: params['temp_dir'] = '/tmp' # if the algorithm is bwa-short, we must build use bwa aln to get an sai # file before calling bwa samse on that sai file, so we need to know how # to run bwa aln. 
Therefore, we must ensure there's an entry containing # those parameters if params['algorithm'] == 'bwa-short': if 'aln_params' not in params: raise ApplicationError("With bwa-short, need to specify a key " + \ "'aln_params' and its value, a " + \ "dictionary to pass to bwa aln, since " + \ "bwa aln is an intermediate step when " + \ "doing bwa-short.") # we have this params dict, with "algorithm" and "temp_dir", etc which are # not for any of the subcommands, so make a new params dict that is the # same as the original minus these addendums subcommand_params = {} for k, v in params.iteritems(): if k not in ('algorithm', 'temp_dir', 'aln_params'): subcommand_params[k] = v # build index from database_fasta # get a temporary file name that is not in use index_prefix = get_tmp_filename(tmp_dir=params['temp_dir'], suffix='', \ result_constructor=str) create_bwa_index_from_fasta_file(database_fasta, {'-p': index_prefix}) # if the algorithm is bwasw, things are pretty simple. Just instantiate # the proper controller and set the files if params['algorithm'] == 'bwasw': bwa = BWA_bwasw(params = subcommand_params) files = {'prefix': index_prefix, 'query_fasta': query} # if the algorithm is bwa-short, it's not so simple elif params['algorithm'] == 'bwa-short': # we have to call bwa_aln to get the sai file needed for samse # use the aln_params we ensured we had above bwa_aln = BWA_aln(params = params['aln_params']) aln_files = {'prefix': index_prefix, 'fastq_in': query} # get the path to the sai file sai_file_path = bwa_aln(aln_files)['output'].name # we will use that sai file to run samse bwa = BWA_samse(params = subcommand_params) files = {'prefix': index_prefix, 'sai_in': sai_file_path, 'fastq_in': query} # run which ever app controller we decided was correct on the files # we set up result = bwa(files) # they both return a SAM file, so return that return result['output'] def assign_dna_reads_to_dna_database(query_fasta_fp, database_fasta_fp, out_fp, params = {}): """Wraps 
assign_reads_to_database, setting various parameters. The default settings are below, but may be overwritten and/or added to using the params dict: algorithm: bwasw """ my_params = {'algorithm': 'bwasw'} my_params.update(params) result = assign_reads_to_database(query_fasta_fp, database_fasta_fp, out_fp, my_params) return result def assign_dna_reads_to_protein_database(query_fasta_fp, database_fasta_fp, out_fp, temp_dir='/tmp', params = {}): """Wraps assign_reads_to_database, setting various parameters. Not yet implemented, as BWA can only align DNA reads to DNA databases. """ raise NotImplementedError, "BWA cannot at this point align DNA to protein" PyCogent-1.5.3/cogent/app/carnac.py000644 000765 000024 00000006471 12024702176 020111 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath from cogent.app.parameters import Parameter,FlagParameter,Parameters __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class Carnac(CommandLineApplication): """Application controller for the Carnac RNA folding application Info at: http://bioinfo.lifl.fr/carnac/index.html Options: -a Inhibit the energy correction that is automatically performed to create the initial set of potential stems. By default, the energy correction depends on the GC percentage of each sequence. -c Eliminate sequences that are too similar. The similarity threshold is 98%. -h Add hairpins that are present only in one sequence to the initial set of potential stems (may be time and space demanding).
""" #Limitation #if -c is turned on and file is deleted error in file handling in _get_result_paths _parameters = { '-c':FlagParameter(Prefix='-',Name='c',Value=False), '-a':FlagParameter(Prefix='-',Name='a'), '-h':FlagParameter(Prefix='-',Name='h')} _command = 'carnac' _input_handler='_input_as_string' def _get_result_paths(self,data): """Specifies the paths of output files generated by the application data: the data the instance of the application is called on Carnac produces it's output to a .ct, .eq, to the location of input file and .out files located in the same folder as the program is run from. graph and align file is also created. You always get back: StdOut,StdErr, and ExitStatus """ result={} name_counter = 0 ones='00' tens='0' count='' if not isinstance(data,list): #means data is file path=str(data) data=open(data).readlines() else: #data input as lines #path=''.join([self.WorkingDir,self._input_filename.split('/')[-1]]) path = ''.join(['/tmp/', self._input_filename.split('/')[-1]]) for item in data: if item.startswith('>'): name_counter += 1 if name_counter < 10: count=ones if name_counter > 9: count=tens if name_counter > 99: count='' name = item.strip('>\n') else: nr=name_counter result['ct%d' % nr] =\ ResultPath(Path=('%s%s%d.ct' % (path,count,nr))) result['eq%d' % nr] =\ ResultPath(Path=('%s%s%d.eq' % (path,count,nr))) result['out_seq%d' % nr] = \ ResultPath(Path=(''.join([self.WorkingDir,'Z_%s%d.%s.out'% \ (count,nr,name)]))) result['graph'] =\ ResultPath(Path=(self.WorkingDir+'graph.out')) result['align'] =\ ResultPath(Path=(self.WorkingDir+'alignment.out')) return result PyCogent-1.5.3/cogent/app/cd_hit.py000644 000765 000024 00000027224 12024702176 020113 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for CD-HIT v3.1.1""" import shutil from os import remove from cogent.app.parameters import ValuedParameter from cogent.app.util import CommandLineApplication, ResultPath,\ get_tmp_filename from 
cogent.core.moltype import RNA, DNA, PROTEIN from cogent.core.alignment import SequenceCollection from cogent.parse.fasta import MinimalFastaParser __author__ = "Daniel McDonald" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "mcdonadt@colorado.edu" __status__ = "Development" class CD_HIT(CommandLineApplication): """cd-hit Application Controller Use this version of CD-HIT if your MolType is PROTEIN """ _command = 'cd-hit' _input_handler = '_input_as_multiline_string' _parameters = { # input input filename in fasta format, required '-i':ValuedParameter('-',Name='i',Delimiter=' ',IsPath=True), # output filename, required '-o':ValuedParameter('-',Name='o',Delimiter=' ',IsPath=True), # sequence identity threshold, default 0.9 # this is the default cd-hit's "global sequence identity" calc'd as : # number of identical amino acids in alignment # divided by the full length of the shorter sequence '-c':ValuedParameter('-',Name='c',Delimiter=' '), # use global sequence identity, default 1 # if set to 0, then use local sequence identity, calculated as : # number of identical amino acids in alignment # divided by the length of the alignment # NOTE!!! 
don't use -G 0 unless you use alignment coverage controls # see options -aL, -AL, -aS, -AS '-G':ValuedParameter('-',Name='G',Delimiter=' '), # band_width of alignment, default 20 '-b':ValuedParameter('-',Name='b',Delimiter=' '), # max available memory (Mbyte), default 400 '-M':ValuedParameter('-',Name='M',Delimiter=' '), # word_length, default 8, see user's guide for choosing it '-n':ValuedParameter('-',Name='n',Delimiter=' '), # length of throw_away_sequences, default 10 '-l':ValuedParameter('-',Name='l',Delimiter=' '), # tolerance for redundance, default 2 '-t':ValuedParameter('-',Name='t',Delimiter=' '), # length of description in .clstr file, default 20 # if set to 0, it takes the fasta defline and stops at first space '-d':ValuedParameter('-',Name='d',Delimiter=' '), # length difference cutoff, default 0.0 # if set to 0.9, the shorter sequences need to be # at least 90% length of the representative of the cluster '-s':ValuedParameter('-',Name='s',Delimiter=' '), # length difference cutoff in amino acid, default 999999 # if set to 60, the length difference between the shorter sequences # and the representative of the cluster can not be bigger than 60 '-S':ValuedParameter('-',Name='S',Delimiter=' '), # alignment coverage for the longer sequence, default 0.0 # if set to 0.9, the alignment must cover 90% of the sequence '-aL':ValuedParameter('-',Name='aL',Delimiter=' '), # alignment coverage control for the longer sequence, default 99999999 # if set to 60, and the length of the sequence is 400, # then the alignment must be >= 340 (400-60) residues '-AL':ValuedParameter('-',Name='AL',Delimiter=' '), # alignment coverage for the shorter sequence, default 0.0 # if set to 0.9, the alignment must cover 90% of the sequence '-aS':ValuedParameter('-',Name='aS',Delimiter=' '), # alignment coverage control for the shorter sequence, default 99999999 # if set to 60, and the length of the sequence is 400, # then the alignment must be >= 340 (400-60) residues
'-AS':ValuedParameter('-',Name='AS',Delimiter=' '), # 1 or 0, default 0, by default, sequences are stored in RAM # if set to 1, sequences are stored on hard drive # it is recommended to use -B 1 for huge databases '-B':ValuedParameter('-',Name='B',Delimiter=' '), # 1 or 0, default 0 # if set to 1, print alignment overlap in .clstr file '-p':ValuedParameter('-',Name='p',Delimiter=' '), # 1 or 0, default 0 # by cd-hit's default algorithm, a sequence is clustered to the first # cluster that meets the threshold (fast cluster). If set to 1, the program # will cluster it into the most similar cluster that meets the threshold # (accurate but slow mode) # but either 1 or 0 won't change the representatives of final clusters '-g':ValuedParameter('-',Name='g',Delimiter=' '), # print this help '-h':ValuedParameter('-',Name='h',Delimiter=' ') } _synonyms = {'Similarity':'-c'} def getHelp(self): """Method that points to documentation""" help_str =\ """ CD-HIT is hosted as an open source project at: http://www.bioinformatics.org/cd-hit/ The following papers should be cited if this resource is used: "Clustering of highly homologous sequences to reduce the size of large protein database", Weizhong Li, Lukasz Jaroszewski & Adam Godzik Bioinformatics, (2001) 17:282-283 "Tolerating some redundancy significantly speeds up clustering of large protein databases", Weizhong Li, Lukasz Jaroszewski & Adam Godzik Bioinformatics, (2002) 18:77-82 """ return help_str def _input_as_multiline_string(self, data): """Writes data to tempfile and sets -i parameter data -- list of lines """ if data: self.Parameters['-i']\ .on(super(CD_HIT,self)._input_as_multiline_string(data)) return '' def _input_as_lines(self, data): """Writes data to tempfile and sets -i parameter data -- list of lines, ready to be written to file """ if data: self.Parameters['-i']\ .on(super(CD_HIT,self)._input_as_lines(data)) return '' def _input_as_seqs(self, data): """Creates a list of seqs to pass to _input_as_lines data -- list
like object of sequences """ lines = [] for i,s in enumerate(data): # will number the sequences 1,2,3, etc... lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _input_as_string(self, data): """Makes data the value of a specific parameter""" if data: self.Parameters['-i'].on(str(data)) return '' def _get_seqs_outfile(self): """Returns the absolute path to the seqs outfile""" if self.Parameters['-o'].isOn(): return self.Parameters['-o'].Value else: raise ValueError, "No output file specified" def _get_clstr_outfile(self): """Returns the absolute path to the clstr outfile""" if self.Parameters['-o'].isOn(): return ''.join([self.Parameters['-o'].Value, '.clstr']) else: raise ValueError, "No output file specified" def _get_result_paths(self, data): """Return dict of {key: ResultPath}""" result = {} result['FASTA'] = ResultPath(Path=self._get_seqs_outfile()) result['CLSTR'] = ResultPath(Path=self._get_clstr_outfile()) return result class CD_HIT_EST(CD_HIT): """cd-hit Application Controller Use this version of CD-HIT if your MolType is RNA or DNA """ _command = 'cd-hit-est' _input_handler = '_input_as_multiline_string' _parameters = CD_HIT._parameters.copy() _parameters.update({\ # 1 or 0, default 0, by default only +/+ strand alignment # if set to 1, do both +/+ & +/- alignments '-r':ValuedParameter('-',Name='r',Delimiter=' ') }) def cdhit_clusters_from_seqs(seqs, moltype, params=None): """Returns the CD-HIT clusters given seqs seqs : dict like collection of sequences moltype : cogent.core.moltype object params : cd-hit parameters NOTE: This method will call CD_HIT if moltype is PROTEIN, CD_HIT_EST if moltype is RNA/DNA, and raise if any other moltype is passed. """ # keys are not remapped. Tested against seq_ids of 100char length seqs = SequenceCollection(seqs, MolType=moltype) #Create mapping between abbreviated IDs and full IDs int_map, int_keys = seqs.getIntMap() #Create SequenceCollection from int_map.
int_map = SequenceCollection(int_map,MolType=moltype) # setup params and make sure the output argument is set if params is None: params = {} if '-o' not in params: params['-o'] = get_tmp_filename() # call the correct version of cd-hit based on moltype working_dir = get_tmp_filename() if moltype is PROTEIN: app = CD_HIT(WorkingDir=working_dir, params=params) elif moltype is RNA: app = CD_HIT_EST(WorkingDir=working_dir, params=params) elif moltype is DNA: app = CD_HIT_EST(WorkingDir=working_dir, params=params) else: raise ValueError, "Moltype must be either PROTEIN, RNA, or DNA" # grab result res = app(int_map.toFasta()) clusters = parse_cdhit_clstr_file(res['CLSTR'].readlines()) remapped_clusters = [] for c in clusters: curr = [int_keys[i] for i in c] remapped_clusters.append(curr) # perform cleanup res.cleanUp() shutil.rmtree(working_dir) remove(params['-o'] + '.bak.clstr') return remapped_clusters def cdhit_from_seqs(seqs, moltype, params=None): """Returns the CD-HIT results given seqs seqs : dict like collection of sequences moltype : cogent.core.moltype object params : cd-hit parameters NOTE: This method will call CD_HIT if moltype is PROTEIN, CD_HIT_EST if moltype is RNA/DNA, and raise if any other moltype is passed. """ # keys are not remapped.
Tested against seq_ids of 100char length seqs = SequenceCollection(seqs, MolType=moltype) # setup params and make sure the output argument is set if params is None: params = {} if '-o' not in params: params['-o'] = get_tmp_filename() # call the correct version of cd-hit based on moltype working_dir = get_tmp_filename() if moltype is PROTEIN: app = CD_HIT(WorkingDir=working_dir, params=params) elif moltype is RNA: app = CD_HIT_EST(WorkingDir=working_dir, params=params) elif moltype is DNA: app = CD_HIT_EST(WorkingDir=working_dir, params=params) else: raise ValueError, "Moltype must be either PROTEIN, RNA, or DNA" # grab result res = app(seqs.toFasta()) new_seqs = dict(MinimalFastaParser(res['FASTA'].readlines())) # perform cleanup res.cleanUp() shutil.rmtree(working_dir) remove(params['-o'] + '.bak.clstr') return SequenceCollection(new_seqs, MolType=moltype) def clean_cluster_seq_id(id): """Returns a cleaned cd-hit sequence id The cluster file has sequence ids in the form of: >some_id... """ return id[1:-3] def parse_cdhit_clstr_file(lines): """Returns a list of lists of sequence ids representing clusters""" clusters = [] curr_cluster = [] for l in lines: if l.startswith('>Cluster'): if not curr_cluster: continue clusters.append(curr_cluster) curr_cluster = [] else: curr_cluster.append(clean_cluster_seq_id(l.split()[2])) if curr_cluster: clusters.append(curr_cluster) return clusters PyCogent-1.5.3/cogent/app/clearcut.py000644 000765 000024 00000032715 12024702176 020464 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides an application controller for the commandline version of: Clearcut v1.0.8 """ from cogent.app.parameters import FlagParameter, ValuedParameter, \ MixedParameter from cogent.app.util import CommandLineApplication, ResultPath, get_tmp_filename from cogent.core.alignment import SequenceCollection, Alignment from cogent.core.moltype import DNA, RNA, PROTEIN from cogent.parse.tree import DndParser from cogent.core.tree import PhyloNode
from cogent.util.dict2d import Dict2D from cogent.format.table import phylipMatrix __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" MOLTYPE_MAP = {'DNA':'-D', 'RNA':'-D', 'PROTEIN':'-P', } class Clearcut(CommandLineApplication): """ clearcut application controller The parameters are organized by function to give some idea of how the program works. However, no restrictions are put on any combinations of parameters. Misuse of parameters can lead to errors or otherwise strange results. """ #General options. _general = {\ # --verbose. More Output. (Default:OFF) '-v':FlagParameter('-',Name='v'), # --quiet. Silent operation. (Default: ON) '-q':FlagParameter('-',Name='q',Value=True), # --seed=. Explicitly set the PRNG seed to a specific value. '-s':ValuedParameter('-',Name='s',Delimiter='='), # --norandom. Attempt joins deterministically. (Default: OFF) '-r':FlagParameter('-',Name='r'), # --shuffle. Randomly shuffle the distance matrix. (Default: OFF) '-S':FlagParameter('-',Name='S'), #--neighbor. Use traditional Neighbor-Joining algorithm. (Default: OFF) '-N':FlagParameter('-',Name='N'), } # Input file is distance matrix or alignment. Default expects distance # matrix. Output file is tree created by clearcut. _input = {\ # --in=. Input file '--in':ValuedParameter('--',Name='in',Delimiter='=',IsPath=True), # --stdin. Read input from STDIN. '-I':FlagParameter('-',Name='I'), # --distance. Input file is a distance matrix. (Default: ON) '-d':FlagParameter('-',Name='d',Value=True), # --alignment. Input file is a set of aligned sequences. # (Default: OFF) '-a':FlagParameter('-',Name='a'), # --DNA. Input alignment are DNA sequences. '-D':FlagParameter('-',Name='D'), # --protein. Input alignment are protein sequences. 
'-P':FlagParameter('-',Name='P'), } #Correction model for computing distance matrix (Default: NO Correction): _correction={\ # --jukes. Use Jukes-Cantor correction for computing distance matrix. '-j':FlagParameter('-',Name='j'), # --kimura. Use Kimura correction for distance matrix. '-k':FlagParameter('-',Name='k'), } _output={\ # --out=. Output file '--out':ValuedParameter('--',Name='out',Delimiter='=',IsPath=True), # --stdout. Output tree to STDOUT. '-O':FlagParameter('-',Name='O'), # --matrixout= Output distance matrix to specified file. '-m':ValuedParameter('-',Name='m',Delimiter='='), # --ntrees=. Output n trees. (Default: 1) '-n':ValuedParameter('-',Name='n',Delimiter='='), # --expblen. Exponential notation for branch lengths. (Default: OFF) '-e':FlagParameter('-',Name='e'), # --expdist. Exponential notation in distance output. (Default: OFF) '-E':FlagParameter('-',Name='E'), } #NOT SUPPORTED #'-h':FlagParameter('-','h'), #Help #'-V':FlagParameter('-','V'), #Version _parameters = {} _parameters.update(_general) _parameters.update(_input) _parameters.update(_correction) _parameters.update(_output) _command = 'clearcut' def getHelp(self): """Method that points to the Clearcut documentation.""" help_str =\ """ See Clearcut homepage at: http://bioinformatics.hungry.com/clearcut/ """ return help_str def _input_as_multiline_string(self, data): """Writes data to tempfile and sets -infile parameter data -- list of lines """ if data: self.Parameters['--in']\ .on(super(Clearcut,self)._input_as_multiline_string(data)) return '' def _input_as_lines(self,data): """Writes data to tempfile and sets -infile parameter data -- list of lines, ready to be written to file """ if data: self.Parameters['--in']\ .on(super(Clearcut,self)._input_as_lines(data)) return '' def _input_as_seqs(self,data): """writes sequences to tempfile and sets -infile parameter data -- list of sequences Adds numbering to the sequences: >1, >2, etc. 
""" lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _input_as_string(self,data): """Makes data the value of a specific parameter This method returns the empty string. The parameter will be printed automatically once set. """ if data: self.Parameters['--in'].on(data) return '' def _tree_filename(self): """Return name of file containing the alignment prefix -- str, prefix of alignment file. """ if self.Parameters['--out']: aln_filename = self._absolute(self.Parameters['--out'].Value) else: raise ValueError, "No tree output file specified." return aln_filename def _get_result_paths(self,data): """Return dict of {key: ResultPath} """ result = {} if self.Parameters['--out'].isOn(): out_name = self._tree_filename() result['Tree'] = ResultPath(Path=out_name,IsWritten=True) return result #SOME FUNCTIONS TO EXECUTE THE MOST COMMON TASKS def align_unaligned_seqs(seqs, moltype, params=None): """Returns an Alignment object from seqs. seqs: SequenceCollection object, or data that can be used to build one. moltype: a MolType object. DNA, RNA, or PROTEIN. params: dict of parameters to pass in to the Clearcut app controller. Result will be an Alignment object. """ #Clearcut does not support alignment raise NotImplementedError, """Clearcut does not support alignment.""" def align_and_build_tree(seqs, moltype, best_tree=False, params={}): """Returns an alignment and a tree from Sequences object seqs. seqs: SequenceCollection object, or data that can be used to build one. best_tree: if True (default:False), uses a slower but more accurate algorithm to build the tree. params: dict of parameters to pass in to the Clearcut app controller. The result will be a tuple containing an Alignment object and a cogent.core.tree.PhyloNode object (or None for the alignment and/or tree if either fails). 
""" #Clearcut does not support alignment raise NotImplementedError, """Clearcut does not support alignment.""" def build_tree_from_alignment(aln, moltype, best_tree=False, params={},\ working_dir='/tmp'): """Returns a tree from Alignment object aln. aln: an cogent.core.alignment.Alignment object, or data that can be used to build one. - Clearcut only accepts aligned sequences. Alignment object used to handle unaligned sequences. moltype: a cogent.core.moltype object. - NOTE: If moltype = RNA, we must convert to DNA since Clearcut v1.0.8 gives incorrect results if RNA is passed in. 'U' is treated as an incorrect character and is excluded from distance calculations. best_tree: if True (default:False), uses a slower but more accurate algorithm to build the tree. params: dict of parameters to pass in to the Clearcut app controller. The result will be an cogent.core.tree.PhyloNode object, or None if tree fails. """ params['--out'] = get_tmp_filename(working_dir) # Create instance of app controller, enable tree, disable alignment app = Clearcut(InputHandler='_input_as_multiline_string', params=params, \ WorkingDir=working_dir, SuppressStdout=True,\ SuppressStderr=True) #Input is an alignment app.Parameters['-a'].on() #Turn off input as distance matrix app.Parameters['-d'].off() #If moltype = RNA, we must convert to DNA. if moltype == RNA: moltype = DNA if best_tree: app.Parameters['-N'].on() #Turn on correct moltype moltype_string = moltype.label.upper() app.Parameters[MOLTYPE_MAP[moltype_string]].on() # Setup mapping. Clearcut clips identifiers. We will need to remap them. # Clearcut only accepts aligned sequences. Let Alignment object handle # unaligned sequences. 
    seq_aln = Alignment(aln,MolType=moltype)
    #get int mapping
    int_map, int_keys = seq_aln.getIntMap()
    #create new Alignment object with int_map
    int_map = Alignment(int_map)

    # Collect result
    result = app(int_map.toFasta())

    # Build tree
    tree = DndParser(result['Tree'].read(), constructor=PhyloNode)
    for node in tree.tips():
        node.Name = int_keys[node.Name]

    # Clean up
    result.cleanUp()
    del(seq_aln, app, result, int_map, int_keys, params)

    return tree

def add_seqs_to_alignment(seqs, aln, params=None):
    """Returns an Alignment object from seqs and existing Alignment.

    seqs: a cogent.core.sequence.Sequence object, or data that can be used
    to build one.

    aln: a cogent.core.alignment.Alignment object, or data that can be used
    to build one

    params: dict of parameters to pass in to the Clearcut app controller.
    """
    #Clearcut does not support alignment
    raise NotImplementedError, """Clearcut does not support alignment."""

def align_two_alignments(aln1, aln2, params=None):
    """Returns an Alignment object from two existing Alignments.

    aln1, aln2: cogent.core.alignment.Alignment objects, or data that can be
    used to build them.

    params: dict of parameters to pass in to the Clearcut app controller.
    """
    #Clearcut does not support alignment
    raise NotImplementedError, """Clearcut does not support alignment."""

def build_tree_from_distance_matrix(matrix, best_tree=False, params={},\
    working_dir='/tmp'):
    """Returns a tree from a distance matrix.

    matrix: a square Dict2D object (cogent.util.dict2d)

    best_tree: if True (default:False), uses a slower but more accurate
    algorithm to build the tree.

    params: dict of parameters to pass in to the Clearcut app controller.

    The result will be a cogent.core.tree.PhyloNode object, or None if the
    tree fails.
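    Clearcut truncates long sequence labels (to 10 characters), so the
    helper used by this function swaps each label for a short 'env_N' key
    before running clearcut and restores the original names on the tree
    tips afterwards. A minimal sketch of that round-trip with plain dicts
    (the labels here are illustrative, not from real data):

    ```python
    # Sketch of the label round-trip: long labels -> short 'env_N' keys
    # (sorted order fixes the numbering) -> back to the original labels.
    labels = ['sample_A_very_long_name', 'sample_B_very_long_name']
    int_keys = dict(('env_' + str(i), k) for i, k in enumerate(sorted(labels)))
    int_map = dict((v, k) for k, v in int_keys.items())
    # after clearcut returns, tip names like 'env_0' are mapped back:
    restored = [int_keys[int_map[name]] for name in labels]
    ```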
""" params['--out'] = get_tmp_filename(working_dir) # Create instance of app controller, enable tree, disable alignment app = Clearcut(InputHandler='_input_as_multiline_string', params=params, \ WorkingDir=working_dir, SuppressStdout=True,\ SuppressStderr=True) #Turn off input as alignment app.Parameters['-a'].off() #Input is a distance matrix app.Parameters['-d'].on() if best_tree: app.Parameters['-N'].on() # Turn the dict2d object into the expected input format matrix_input, int_keys = _matrix_input_from_dict2d(matrix) # Collect result result = app(matrix_input) # Build tree tree = DndParser(result['Tree'].read(), constructor=PhyloNode) # reassign to original names for node in tree.tips(): node.Name = int_keys[node.Name] # Clean up result.cleanUp() del(app, result, params) return tree def _matrix_input_from_dict2d(matrix): """makes input for running clearcut on a matrix from a dict2D object""" #clearcut truncates names to 10 char- need to rename before and #reassign after #make a dict of env_index:full name int_keys = dict([('env_' + str(i), k) for i,k in \ enumerate(sorted(matrix.keys()))]) #invert the dict int_map = {} for i in int_keys: int_map[int_keys[i]] = i #make a new dict2D object with the integer keys mapped to values instead of #the original names new_dists = [] for env1 in matrix: for env2 in matrix[env1]: new_dists.append((int_map[env1], int_map[env2], matrix[env1][env2])) int_map_dists = Dict2D(new_dists) #names will be fed into the phylipTable function - it is the int map names names = sorted(int_map_dists.keys()) rows = [] #populated rows with values based on the order of names #the following code will work for a square matrix only for index, key1 in enumerate(names): row = [] for key2 in names: row.append(str(int_map_dists[key1][key2])) rows.append(row) input_matrix = phylipMatrix(rows, names) #input needs a trailing whitespace or it will fail! 
input_matrix += '\n' return input_matrix, int_keys PyCogent-1.5.3/cogent/app/clustalw.py000644 000765 000024 00000070116 12024702176 020515 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides an application controller for the commandline version of: CLUSTALW v1.83 """ from cogent.app.parameters import FlagParameter, ValuedParameter, \ MixedParameter, FilePath from cogent.app.util import CommandLineApplication, ResultPath, remove from cogent.core.alignment import SequenceCollection, Alignment from cogent.parse.tree import DndParser from cogent.parse.clustal import ClustalParser from cogent.core.tree import PhyloNode from cogent.core.moltype import RNA, DNA, PROTEIN from numpy.random import randint __author__ = "Sandra Smit" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Micah Hamady", "Rob Knight", "Jeremy Widmann", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" class Clustalw(CommandLineApplication): """ clustalw application controller The parameters are organized by function to give some idea of how the program works. However, no restrictions are put on any combinations of parameters. Misuse of parameters can lead to errors or otherwise strange results. You are supposed to choose one action for the program to perform. (align, profile, sequences, tree, or bootstrap). If you choose multiple, only the dominant action (see order above) will be executed. By DEFAULT, the -align parameter is turned on. If you decide to turn another one on, you should turn '-align' off IN ADDITION! Some references to help pages are available in the 'getHelp' method. Some might be useful to you. 
""" _actions = {\ '-align':FlagParameter('-','align',Value=True), '-profile':FlagParameter('-','profile'), '-sequences':FlagParameter('-','sequences'), '-tree':FlagParameter('-','tree'), '-bootstrap':MixedParameter('-','bootstrap',Delimiter='=')} #sequence file for alignment, or alignment file for bootstrap and tree #actions _input = {'-infile':ValuedParameter('-','infile',Delimiter='=',IsPath=True)} # matrix and dnamatrix can be filenames as well, but not always. # They won't be treated as filenames and thus not quoted. # Therefore filepaths containing spaces might result in errors. _multiple_alignment={\ '-quicktree':FlagParameter('-','quicktree'), '-type':ValuedParameter('-','type',Delimiter='='), '-matrix':ValuedParameter('-','matrix',Delimiter='='), '-dnamatrix':ValuedParameter('-','dnamatrix',Delimiter='='), '-gapopen':ValuedParameter('-','gapopen',Delimiter='='), '-gapext':ValuedParameter('-','gapext',Delimiter='='), '-endgaps':FlagParameter('-','endgaps'), '-gapdist':ValuedParameter('-',Name='gapdist',Delimiter='='), '-nopgap':FlagParameter('-','nopgap'), '-nohgap':FlagParameter('-','nohgap'), '-hgapresidues':ValuedParameter('-','hgapresidues',Delimiter='='), '-maxdiv':ValuedParameter('-',Name='maxdiv',Delimiter='='), '-negative':FlagParameter('-','negative'), '-transweight':ValuedParameter('-',Name='transweight',Delimiter='='), '-newtree':ValuedParameter('-','newtree',Delimiter='=',IsPath=True), '-usetree':ValuedParameter('-','usetree',Delimiter='=',IsPath=True)} _fast_pairwise={\ '-ktuple':ValuedParameter('-',Name='ktuple',Delimiter='='), '-topdiags':ValuedParameter('-',Name='topdiags',Delimiter='='), '-window':ValuedParameter('-',Name='window',Delimiter='='), '-pairgap':ValuedParameter('-',Name='pairgap',Delimiter='='), '-score':ValuedParameter('-',Name='score',Delimiter='=')} # pwmatrix and pwdnamatrix can be filenames as well, but not always. # They won't be treated as filenames and thus not quoted. 
# Therefore filepaths containing spaces might result in errors. _slow_pairwise={\ '-pwmatrix':ValuedParameter('-',Name='pwmatrix',Delimiter='='), '-pwdnamatrix':ValuedParameter('-',Name='pwdnamatrix',Delimiter='='), '-pwgapopen':ValuedParameter('-',Name='pwgapopen',Delimiter='='), '-pwgapext':ValuedParameter('-',Name='pwgapext',Delimiter='=')} #plus -bootstrap _tree={\ '-kimura':FlagParameter('-',Name='kimura'), '-tossgaps':FlagParameter('-',Name='tossgaps'), '-bootlabels':ValuedParameter('-',Name='bootlabels',Delimiter='='), '-seed':ValuedParameter('-',Name='seed',Delimiter='='), '-outputtree':ValuedParameter('-',Name='outputtree',Delimiter='=')} _output={\ '-outfile':ValuedParameter('-',Name='outfile',Delimiter='=',\ IsPath=True), '-output':ValuedParameter('-',Name='output',Delimiter='='), '-case':ValuedParameter('-',Name='case',Delimiter='='), '-outorder':ValuedParameter('-',Name='outorder',Delimiter='='), '-seqnos':ValuedParameter('-',Name='seqnos',Delimiter='=')} _profile_alignment={\ '-profile1':ValuedParameter('-','profile1',Delimiter='=',IsPath=True), '-profile2':ValuedParameter('-','profile2',Delimiter='=',IsPath=True), '-usetree1':ValuedParameter('-','usetree1',Delimiter='=',IsPath=True), '-usetree2':ValuedParameter('-','usetree2',Delimiter='=',IsPath=True), '-newtree1':ValuedParameter('-','newtree1',Delimiter='=',IsPath=True), '-newtree2':ValuedParameter('-','newtree2',Delimiter='=',IsPath=True)} _structure_alignment={\ '-nosecstr1':FlagParameter('-',Name='nosecstr1'), '-nosecstr2':FlagParameter('-',Name='nosecstr2'), '-helixgap':ValuedParameter('-',Name='helixgap',Delimiter='='), '-strandgap':ValuedParameter('-',Name='strandgap',Delimiter='='), '-loopgap':ValuedParameter('-',Name='loopgap',Delimiter='='), '-terminalgap':ValuedParameter('-',Name='terminalgap',Delimiter='='), '-helixendin':ValuedParameter('-',Name='helixendin',Delimiter='='), '-helixendout':ValuedParameter('-',Name='helixendout',Delimiter='='), 
'-strandendin':ValuedParameter('-',Name='strandendin',Delimiter='='), '-strandendout':ValuedParameter('-',Name='strandendout',Delimiter='='), '-secstrout':ValuedParameter('-',Name='secstrout',Delimiter='=')} #NOT SUPPORTED #'-help':FlagParameter('-','help'), #'-check':FlagParameter('-','check'), #'-options':FlagParameter('-','options'), #'-convert':FlagParameter('-','convert'), #'-batch':FlagParameter('-','batch'), #'-noweights':FlagParameter('-','noweights'), #'-novgap':FlagParameter('-','novgap'), #'-debug':ValuedParameter('-',Name='debug',Delimiter='='), _parameters = {} _parameters.update(_actions) _parameters.update(_input) _parameters.update(_multiple_alignment) _parameters.update(_fast_pairwise) _parameters.update(_slow_pairwise) _parameters.update(_tree) _parameters.update(_output) _parameters.update(_profile_alignment) _parameters.update(_structure_alignment) _command = 'clustalw' def getHelp(self): """Methods that points to the documentation""" help_str =\ """ There are several help pages available online. For example: http://searchlauncher.bcm.tmc.edu/multi-align/Help/ clustalw_help_1.8.html http://hypernig.nig.ac.jp/homology/clustalw-e_help.html http://www.genebee.msu.su/clustal/help.html A page that give reasonable insight in use of the parameters: http://bioweb.pasteur.fr/seqanal/interfaces/clustalw.html """ return help_str def _input_as_multiline_string(self, data): """Writes data to tempfile and sets -infile parameter data -- list of lines """ if data: self.Parameters['-infile']\ .on(super(Clustalw,self)._input_as_multiline_string(data)) return '' def _input_as_lines(self,data): """Writes data to tempfile and sets -infile parameter data -- list of lines, ready to be written to file """ if data: self.Parameters['-infile']\ .on(super(Clustalw,self)._input_as_lines(data)) return '' def _input_as_seqs(self,data): """writes sequences to tempfile and sets -infile parameter data -- list of sequences Adds numbering to the sequences: >1, >2, etc. 
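        The renumbering described above can be sketched as a standalone
        helper (not part of the class; shown only to illustrate the
        transform this method applies before writing the tempfile):

        ```python
        # Sketch of the renumbering performed by _input_as_seqs: each
        # input sequence is written under a FASTA label '>1', '>2', ...
        def number_seqs(seqs):
            lines = []
            for i, s in enumerate(seqs):
                lines.append('>' + str(i + 1))
                lines.append(s)
            return lines
        ```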
""" lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _input_as_string(self,data): """Makes data the value of a specific parameter This method returns the empty string. The parameter will be printed automatically once set. """ if data: self.Parameters['-infile'].on(data) return '' def _suffix(self): """Return appropriate suffix for alignment file""" _output_formats={'GCG':'.msf', 'GDE':'.gde', 'PHYLIP':'.phy', 'PIR':'.pir', 'NEXUS':'.nxs'} if self.Parameters['-output'].isOn(): return _output_formats[self.Parameters['-output'].Value] else: return '.aln' def _aln_filename(self,prefix): """Return name of file containing the alignment prefix -- str, prefix of alignment file. """ if self.Parameters['-outfile'].isOn(): aln_filename = self._absolute(self.Parameters['-outfile'].Value) else: aln_filename = prefix + self._suffix() return aln_filename def _tempfile_as_multiline_string(self, data): """Write a multiline string to a temp file and return the filename. data: a multiline string to be written to a file. * Note: the result will be the filename as a FilePath object (which is a string subclass). 
""" filename = FilePath(self.getTmpFilename(self.TmpDir)) data_file = open(filename,'w') data_file.write(data) data_file.close() return filename def _get_result_paths(self,data): """Return dict of {key: ResultPath} """ #clustalw .aln is used when no or unkown output type specified _treeinfo_formats = {'nj':'.nj', 'dist':'.dst', 'nexus':'.tre'} result = {} par = self.Parameters abs = self._absolute if par['-align'].isOn(): prefix = par['-infile'].Value.rsplit('.', 1)[0] #prefix = par['-infile'].Value.split('.')[0] aln_filename = self._aln_filename(prefix) if par['-newtree'].isOn(): dnd_filename = abs(par['-newtree'].Value) elif par['-usetree'].isOn(): dnd_filename = abs(par['-usetree'].Value) else: dnd_filename = abs(prefix + '.dnd') result['Align'] = ResultPath(Path=aln_filename,IsWritten=True) result['Dendro'] = ResultPath(Path=dnd_filename,IsWritten=True) elif par['-profile'].isOn(): prefix1 = par['-profile1'].Value.rsplit('.', 1)[0] prefix2 = par['-profile2'].Value.rsplit('.', 1)[0] #prefix1 = par['-profile1'].Value.split('.')[0] #prefix2 = par['-profile2'].Value.split('.')[0] aln_filename = ''; aln_written = True dnd1_filename = ''; tree1_written = True dnd2_filename = ''; tree2_written = True aln_filename = self._aln_filename(prefix1) #usetree1 if par['-usetree1'].isOn(): tree1_written = False #usetree2 if par['-usetree2'].isOn(): tree2_written = False if par['-newtree1'].isOn(): dnd1_filename = abs(par['-newtree1'].Value) aln_written=False else: dnd1_filename = abs(prefix1 + '.dnd') if par['-newtree2'].isOn(): dnd2_filename = abs(par['-newtree2'].Value) aln_written=False else: dnd2_filename = abs(prefix2 + '.dnd') result['Align'] = ResultPath(Path=aln_filename, IsWritten=aln_written) result['Dendro1'] = ResultPath(Path=dnd1_filename, IsWritten=tree1_written) result['Dendro2'] = ResultPath(Path=dnd2_filename, IsWritten=tree2_written) elif par['-sequences'].isOn(): prefix1 = par['-profile1'].Value.rsplit('.', 1)[0] prefix2 = par['-profile2'].Value.rsplit('.', 
1)[0] #prefix1 = par['-profile1'].Value.split('.')[0] #alignment #prefix2 = par['-profile2'].Value.split('.')[0] #sequences aln_filename = ''; aln_written = True dnd_filename = ''; dnd_written = True aln_filename = self._aln_filename(prefix2) if par['-usetree'].isOn(): dnd_written = False elif par['-newtree'].isOn(): aln_written = False dnd_filename = abs(par['-newtree'].Value) else: dnd_filename = prefix2 + '.dnd' result['Align'] = ResultPath(Path=aln_filename,\ IsWritten=aln_written) result['Dendro'] = ResultPath(Path=dnd_filename,\ IsWritten=dnd_written) elif par['-tree'].isOn(): prefix = par['-infile'].Value.rsplit('.', 1)[0] #prefix = par['-infile'].Value.split('.')[0] tree_filename = ''; tree_written = True treeinfo_filename = ''; treeinfo_written = False tree_filename = prefix + '.ph' if par['-outputtree'].isOn() and\ par['-outputtree'].Value != 'phylip': treeinfo_filename = prefix +\ _treeinfo_formats[par['-outputtree'].Value] treeinfo_written = True result['Tree'] = ResultPath(Path=tree_filename,\ IsWritten=tree_written) result['TreeInfo'] = ResultPath(Path=treeinfo_filename,\ IsWritten=treeinfo_written) elif par['-bootstrap'].isOn(): prefix = par['-infile'].Value.rsplit('.', 1)[0] #prefix = par['-infile'].Value.split('.')[0] boottree_filename = prefix + '.phb' result['Tree'] = ResultPath(Path=boottree_filename,IsWritten=True) return result #SOME FUNCTIONS TO EXECUTE THE MOST COMMON TASKS def alignUnalignedSeqs(seqs,add_seq_names=True,WorkingDir=None,\ SuppressStderr=None,SuppressStdout=None): """Aligns unaligned sequences seqs: either list of sequence objects or list of strings add_seq_names: boolean. if True, sequence names are inserted in the list of sequences. 
if False, it assumes seqs is a list of lines of some proper format that the program can handle """ if add_seq_names: app = Clustalw(InputHandler='_input_as_seqs',\ WorkingDir=WorkingDir,SuppressStderr=SuppressStderr,\ SuppressStdout=SuppressStdout) else: app = Clustalw(InputHandler='_input_as_lines',\ WorkingDir=WorkingDir,SuppressStderr=SuppressStderr,\ SuppressStdout=SuppressStdout) return app(seqs) def alignUnalignedSeqsFromFile(filename,WorkingDir=None,SuppressStderr=None,\ SuppressStdout=None): """Aligns unaligned sequences from some file (file should be right format) filename: string, the filename of the file containing the sequences to be aligned in a valid format. """ app = Clustalw(WorkingDir=WorkingDir,SuppressStderr=SuppressStderr,\ SuppressStdout=SuppressStdout) return app(filename) def alignTwoAlignments(aln1,aln2,outfile,WorkingDir=None,SuppressStderr=None,\ SuppressStdout=None): """Aligns two alignments. Individual sequences are not realigned aln1: string, name of file containing the first alignment aln2: string, name of file containing the second alignment outfile: you're forced to specify an outfile name, because if you don't aln1 will be overwritten. So, if you want aln1 to be overwritten, you should specify the same filename. WARNING: a .dnd file is created with the same prefix as aln1. So an existing dendrogram might get overwritten. """ app = Clustalw({'-profile':None,'-profile1':aln1,\ '-profile2':aln2,'-outfile':outfile},SuppressStderr=\ SuppressStderr,WorkingDir=WorkingDir,SuppressStdout=SuppressStdout) app.Parameters['-align'].off() return app() def addSeqsToAlignment(aln1,seqs,outfile,WorkingDir=None,SuppressStderr=None,\ SuppressStdout=None): """Aligns sequences from second profile against first profile aln1: string, name of file containing the alignment seqs: string, name of file containing the sequences that should be added to the alignment. 
    outfile: string, name of the output file (the new alignment)
    """
    app = Clustalw({'-sequences':None,'-profile1':aln1,\
        '-profile2':seqs,'-outfile':outfile},SuppressStderr=\
        SuppressStderr,WorkingDir=WorkingDir, SuppressStdout=SuppressStdout)
    app.Parameters['-align'].off()
    return app()

def buildTreeFromAlignment(filename,WorkingDir=None,SuppressStderr=None):
    """Builds a new tree from an existing alignment

    filename: string, name of file containing the seqs or alignment
    """
    app = Clustalw({'-tree':None,'-infile':filename},SuppressStderr=\
        SuppressStderr,WorkingDir=WorkingDir)
    app.Parameters['-align'].off()
    return app()

def align_and_build_tree(seqs, moltype, best_tree=False, params=None):
    """Returns an alignment and a tree from Sequences object seqs.

    seqs: a cogent.core.alignment.SequenceCollection object, or data that
    can be used to build one.

    moltype: cogent.core.moltype.MolType object

    best_tree: if True (default:False), uses a slower but more accurate
    algorithm to build the tree.

    params: dict of parameters to pass in to the Clustal app controller.

    The result is a dict with keys 'Align' and 'Tree', holding a
    cogent.core.alignment.Alignment object and a cogent.core.tree.PhyloNode
    object respectively (or None for the alignment and/or tree if either
    fails).
    """
    aln = align_unaligned_seqs(seqs, moltype=moltype, params=params)
    tree = build_tree_from_alignment(aln, moltype, best_tree, params)
    return {'Align':aln,'Tree':tree}

def build_tree_from_alignment(aln, moltype, best_tree=False, params=None):
    """Returns a tree from Alignment object aln.

    aln: a cogent.core.alignment.Alignment object, or data that can be used
    to build one.

    moltype: cogent.core.moltype.MolType object

    best_tree: if True (default:False), uses a slower but more accurate
    algorithm to build the tree.

    params: dict of parameters to pass in to the Clustal app controller.

    The result will be a cogent.core.tree.PhyloNode object, or None if the
    tree fails.
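    As an illustration of the moltype handling described above, a
    standalone sketch that uses plain label strings in place of cogent
    MolType objects (not part of this module):

    ```python
    # Sketch of the moltype -> clustalw '-type' mapping: DNA and RNA both
    # map to 'd', PROTEIN maps to 'p', anything else is rejected.
    def clustal_type(moltype_label):
        if moltype_label in ('dna', 'rna'):
            return 'd'
        elif moltype_label == 'protein':
            return 'p'
        raise ValueError("moltype must be DNA, RNA, or PROTEIN")
    ```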
""" # Create instance of app controller, enable tree, disable alignment app = Clustalw(InputHandler='_input_as_multiline_string', params=params, \ WorkingDir='/tmp') app.Parameters['-align'].off() #Set params to empty dict if None. if params is None: params={} if moltype == DNA or moltype == RNA: params['-type'] = 'd' elif moltype == PROTEIN: params['-type'] = 'p' else: raise ValueError, "moltype must be DNA, RNA, or PROTEIN" # best_tree -> bootstrap if best_tree: if '-bootstrap' not in params: app.Parameters['-bootstrap'].on(1000) if '-seed' not in params: app.Parameters['-seed'].on(randint(0,1000)) if '-bootlabels' not in params: app.Parameters['-bootlabels'].on('nodes') else: app.Parameters['-tree'].on() # Setup mapping. Clustalw clips identifiers. We will need to remap them. seq_collection = SequenceCollection(aln) int_map, int_keys = seq_collection.getIntMap() int_map = SequenceCollection(int_map) # Collect result result = app(int_map.toFasta()) # Build tree tree = DndParser(result['Tree'].read(), constructor=PhyloNode) for node in tree.tips(): node.Name = int_keys[node.Name] # Clean up result.cleanUp() del(seq_collection, app, result, int_map, int_keys) return tree def bootstrap_tree_from_alignment(aln, seed=None, num_trees=None, params=None): """Returns a tree from Alignment object aln with bootstrap support values. aln: an cogent.core.alignment.Alignment object, or data that can be used to build one. seed: an interger, seed value to use num_trees: an integer, number of trees to bootstrap against params: dict of parameters to pass in to the Clustal app controller. The result will be an cogent.core.tree.PhyloNode object, or None if tree fails. If seed is not specifed in params, a random integer between 0-1000 is used. 
""" # Create instance of controllor, enable bootstrap, disable alignment,tree app = Clustalw(InputHandler='_input_as_multiline_string', params=params, \ WorkingDir='/tmp') app.Parameters['-align'].off() app.Parameters['-tree'].off() if app.Parameters['-bootstrap'].isOff(): if num_trees is None: num_trees = 1000 app.Parameters['-bootstrap'].on(num_trees) if app.Parameters['-seed'].isOff(): if seed is None: seed = randint(0,1000) app.Parameters['-seed'].on(seed) if app.Parameters['-bootlabels'].isOff(): app.Parameters['-bootlabels'].on("node") # Setup mapping. Clustalw clips identifiers. We will need to remap them. seq_collection = SequenceCollection(aln) int_map, int_keys = seq_collection.getIntMap() int_map = SequenceCollection(int_map) # Collect result result = app(int_map.toFasta()) # Build tree tree = DndParser(result['Tree'].read(), constructor=PhyloNode) for node in tree.tips(): node.Name = int_keys[node.Name] # Clean up result.cleanUp() del(seq_collection, app, result, int_map, int_keys) return tree def align_unaligned_seqs(seqs, moltype, params=None): """Returns an Alignment object from seqs. seqs: cogent.core.alignment.SequenceCollection object, or data that can be used to build one. moltype: a MolType object. DNA, RNA, or PROTEIN. params: dict of parameters to pass in to the Clustal app controller. Result will be a cogent.core.alignment.Alignment object. """ #create SequenceCollection object from seqs seq_collection = SequenceCollection(seqs,MolType=moltype) #Create mapping between abbreviated IDs and full IDs int_map, int_keys = seq_collection.getIntMap() #Create SequenceCollection from int_map. int_map = SequenceCollection(int_map,MolType=moltype) #Create Clustalw app. 
app = Clustalw(InputHandler='_input_as_multiline_string',params=params) #Get results using int_map as input to app res = app(int_map.toFasta()) #Get alignment as dict out of results alignment = dict(ClustalParser(res['Align'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): new_alignment[int_keys[k]]=v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment,MolType=moltype) #Clean up res.cleanUp() del(seq_collection,int_map,int_keys,app,res,alignment) return new_alignment def add_seqs_to_alignment(seqs, aln, moltype, params=None): """Returns an Alignment object from seqs and existing Alignment. seqs: a cogent.core.alignment.SequenceCollection object, or data that can be used to build one. aln: a cogent.core.alignment.Alignment object, or data that can be used to build one params: dict of parameters to pass in to the Clustal app controller. """ #create SequenceCollection object from seqs seq_collection = SequenceCollection(seqs,MolType=moltype) #Create mapping between abbreviated IDs and full IDs seq_int_map, seq_int_keys = seq_collection.getIntMap() #Create SequenceCollection from int_map. seq_int_map = SequenceCollection(seq_int_map,MolType=moltype) #create Alignment object from aln aln = Alignment(aln,MolType=moltype) #Create mapping between abbreviated IDs and full IDs aln_int_map, aln_int_keys = aln.getIntMap(prefix='seqn_') #Create SequenceCollection from int_map. aln_int_map = Alignment(aln_int_map,MolType=moltype) #Update seq_int_keys with aln_int_keys seq_int_keys.update(aln_int_keys) #Create Mafft app. 
app = Clustalw(InputHandler='_input_as_multiline_string',\ params=params, SuppressStderr=True) app.Parameters['-align'].off() app.Parameters['-infile'].off() app.Parameters['-sequences'].on() #Add aln_int_map as profile1 app.Parameters['-profile1'].on(\ app._tempfile_as_multiline_string(aln_int_map.toFasta())) #Add seq_int_map as profile2 app.Parameters['-profile2'].on(\ app._tempfile_as_multiline_string(seq_int_map.toFasta())) #Get results using int_map as input to app res = app() #Get alignment as dict out of results alignment = dict(ClustalParser(res['Align'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): new_alignment[seq_int_keys[k]]=v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment,MolType=moltype) #Clean up res.cleanUp() remove(app.Parameters['-profile1'].Value) remove(app.Parameters['-profile2'].Value) del(seq_collection,seq_int_map,seq_int_keys,\ aln,aln_int_map,aln_int_keys,app,res,alignment) return new_alignment def align_two_alignments(aln1, aln2, moltype, params=None): """Returns an Alignment object from two existing Alignments. aln1, aln2: cogent.core.alignment.Alignment objects, or data that can be used to build them. params: dict of parameters to pass in to the Clustal app controller. """ #create SequenceCollection object from seqs aln1 = Alignment(aln1,MolType=moltype) #Create mapping between abbreviated IDs and full IDs aln1_int_map, aln1_int_keys = aln1.getIntMap() #Create SequenceCollection from int_map. aln1_int_map = Alignment(aln1_int_map,MolType=moltype) #create Alignment object from aln aln2 = Alignment(aln2,MolType=moltype) #Create mapping between abbreviated IDs and full IDs aln2_int_map, aln2_int_keys = aln2.getIntMap(prefix='seqn_') #Create SequenceCollection from int_map. aln2_int_map = Alignment(aln2_int_map,MolType=moltype) #Update aln1_int_keys with aln2_int_keys aln1_int_keys.update(aln2_int_keys) #Create Mafft app. 
app = Clustalw(InputHandler='_input_as_multiline_string',\ params=params, SuppressStderr=True) app.Parameters['-align'].off() app.Parameters['-infile'].off() app.Parameters['-profile'].on() #Add aln_int_map as profile1 app.Parameters['-profile1'].on(\ app._tempfile_as_multiline_string(aln1_int_map.toFasta())) #Add seq_int_map as profile2 app.Parameters['-profile2'].on(\ app._tempfile_as_multiline_string(aln2_int_map.toFasta())) #Get results using int_map as input to app res = app() #Get alignment as dict out of results alignment = dict(ClustalParser(res['Align'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): new_alignment[aln1_int_keys[k]]=v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment,MolType=moltype) #Clean up res.cleanUp() remove(app.Parameters['-profile1'].Value) remove(app.Parameters['-profile2'].Value) del(aln1,aln1_int_map,aln1_int_keys,\ aln2,aln2_int_map,aln2_int_keys,app,res,alignment) return new_alignment PyCogent-1.5.3/cogent/app/cmfinder.py000644 000765 000024 00000020372 12024702176 020445 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides application controller for CMfinder v0.2""" import os from cogent.app.util import CommandLineApplication, \ CommandLineAppResult, ResultPath from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters from sys import exit from os.path import isfile __author__ = "Daniel McDonald and Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" if 'CMfinder' not in os.environ: raise RuntimeError, \ 'cmfinder cannot run if the CMfinder environment variable is not set.' 
class CMfinder(CommandLineApplication): """The application controller for CMfinder 0.2 application Options: -b Do not use BLAST search to locate anchors -v Verbose. Print running information, and save intermediate results -c The maximum number of candidates in each sequence. Default 40. No bigger than 100. -m The minimum length of candidates. Default 30 -M The maximum length of candidates. Default 100 -n The maximum number of output motifs. Default 3 -f The fraction of the sequences expected to contain the motif. Default 0.80 -s The number of stem-loops in the motif -h Show help """ #-n default is 3, set to 3 because of resultpath concerns _parameters = { '-b':FlagParameter(Prefix='-',Name='b',Value=True), '-v':FlagParameter(Prefix='-',Name='v'), '-c':ValuedParameter(Prefix='-',Name='c',Value=None,Delimiter=' '), '-m':ValuedParameter(Prefix='-',Name='m',Value=None,Delimiter=' '), '-M':ValuedParameter(Prefix='-',Name='M',Value=None,Delimiter=' '), '-n':ValuedParameter(Prefix='-',Name='n',Value=3,Delimiter=' '), '-f':ValuedParameter(Prefix='-',Name='f',Value=None,Delimiter=' '), '-s':ValuedParameter(Prefix='-',Name='s',Value=None,Delimiter=' ')} _command = 'cmfinder.pl' _input_handler = '_input_as_string' def _get_result_paths(self,data): """Specifies the paths of output files generated by the application data: the data the instance the application is called on CMfinder produces it's output in two files .align and .motif it also prints an output to sdtout. 
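        The probing described above can be sketched as a standalone helper
        that lists the candidate file names for a given '-n' value; the
        real method additionally checks which of these files actually
        exist on disk (names follow the '.motif.h1.N' / '.cm.h1.N'
        pattern used by this controller):

        ```python
        # Sketch of the output-file naming that _get_result_paths probes:
        # one '.motif.h1.N' and one '.cm.h1.N' file per motif N.
        def candidate_outputs(input_path, n_motifs):
            names = []
            for i in range(1, n_motifs + 1):
                names.append(input_path + '.motif.h1.' + str(i))
                names.append(input_path + '.cm.h1.' + str(i))
            return names
        ```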
""" result={} if not isinstance(data,list): inputPath = str(data) else: inputPath=self._input_filename itr=self.Parameters['-n'].Value for i in range(itr): nr=str(i+1) try: #unknown nr of output files f = open((inputPath+'.motif.h1.'+nr)) #if exists add to path f.close() result[('cm_'+nr)] =\ ResultPath(Path=(inputPath+'.cm.h1.'+nr)) result[('motif_'+nr)] =\ ResultPath(Path=(inputPath+'.motif.h1.'+nr)) except IOError: # else no more outputs break if self._input_filename is not None: result['_input_filename'] = ResultPath(self._input_filename) if isfile(self.WorkingDir+'latest.cm'): result['latest'] =\ ResultPath(Path=(self.WorkingDir+'latest.cm')) else: pass return result class CombMotif(CommandLineApplication): """ Application controller for the combmotif.pl program Only works for input as string since filnames are needed to located input """ _command = 'CombMotif.pl' _input_handler = '_input_as_string' def _input_as_string(self,data): """ Return data as a string """ input = str(data) +' '+str(data)+'.motif' return input def _input_as_lines(self,data): """ """ print 'Use input as string with cmfinder input_filename as input' exit(1) def _get_result_paths(self,data): """Specifies the paths of output files generated by the application data: the data the instance of the application is called on CombMotif will generate an output, the combination that was possible, the modified _get_result_path will detect that output and return the path to that output file. Since the output is not possible to predict one has to try all possible outputs. Assumes that one stem loop is used may correct this later """ result={} filename = str(data) motifList = [] mnr = 0 #motif number if not isinstance(data,list): inputPath = str(data) else: inputPath=self._input_filename for h in range(2): #numbers of stem loops in each motif, recommended 1-2 if h == 0: s = '.motif.h1.' else: s = '.motif.h2.' 
            for i in range(1,6):
                for x in range(1,6): #two combined motifs
                    if x == i:
                        continue
                    try:
                        z = str(i)
                        w = str(x)
                        file = filename+s+z+'.'+w
                        f = open(file)
                        #print 'found',file
                        f.close()
                        mnr += 1
                        n = str(mnr)
                        result['comb'+n] = ResultPath(Path=file)
                    except IOError:
                        pass
                    for y in range(1,6): # three combined motifs
                        if y == x:
                            continue
                        try:
                            z = str(i)
                            w = str(x)
                            q = str(y)
                            file = filename+s+z+'.'+w+'.'+q
                            f = open(file)
                            #print 'found', file
                            f.close()
                            mnr += 1
                            n = str(mnr)
                            result['comb'+n] = ResultPath(Path=file)
                        except IOError:
                            pass
                        for k in range(1,6): # four combined motifs
                            if k == y:
                                continue
                            try:
                                z = str(i)
                                w = str(x)
                                q = str(y)
                                v = str(k)
                                file = filename+s+z+'.'+w+'.'+q+'.'+v
                                f = open(file)
                                #print 'found',file
                                f.close()
                                mnr += 1
                                n = str(mnr)
                                result['comb'+n] = ResultPath(Path=file)
                            except IOError:
                                pass
                            for j in range(1,6): #five combined motifs
                                if j == k:
                                    continue
                                try:
                                    z = str(i)
                                    w = str(x)
                                    q = str(y)
                                    v = str(k)
                                    u = str(j)
                                    file = filename+s+z+'.'+w+'.'+q+'.'+v+'.'+u
                                    f = open(file)
                                    #print 'found',file
                                    f.close()
                                    mnr += 1
                                    n = str(mnr)
                                    result['comb'+n] = ResultPath(Path=file)
                                except IOError:
                                    pass
        if isfile(self.WorkingDir+'latest.cm'):
            result['latest'] = \
                ResultPath(Path=(self.WorkingDir+'latest.cm'))
        return result

PyCogent-1.5.3/cogent/app/comrna.py000644 000765 000024 00000005706 12024702176 020141 0ustar00jrideoutstaff000000 000000

#!/usr/bin/env python

from cogent.app.util import CommandLineApplication,\
    CommandLineAppResult, ResultPath
from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\
    MixedParameter,Parameters, _find_synonym, is_not_None

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class comRNA(CommandLineApplication):
    """Application controller for comRNA v1.80

    For comRNA options, type comRNA at the prompt.
    """
    _parameters
= { '-L':ValuedParameter(Prefix='',Name='L',Value=None,Delimiter=' '), '-E':ValuedParameter(Prefix='',Name='E',Value=None,Delimiter=' '), '-S':ValuedParameter(Prefix='',Name='S',Value=None,Delimiter=' '), '-Sh':ValuedParameter(Prefix='',Name='Sh',Value=None,Delimiter=' '), '-Sl':ValuedParameter(Prefix='',Name='Sl',Value=None,Delimiter=' '), '-P':ValuedParameter(Prefix='',Name='P',Value=None,Delimiter=' '), '-n':ValuedParameter(Prefix='',Name='n',Value=None,Delimiter=' '), '-x':ValuedParameter(Prefix='',Name='x',Value=None,Delimiter=' '), '-m':ValuedParameter(Prefix='',Name='m',Value=None,Delimiter=' '), '-tp':ValuedParameter(Prefix='',Name='tp',Value=None,Delimiter=' '), '-ts':ValuedParameter(Prefix='',Name='ts',Value=None,Delimiter=' '), '-a':ValuedParameter(Prefix='',Name='a',Value=None,Delimiter=' '), '-o':ValuedParameter(Prefix='',Name='o',Value=None,Delimiter=' '), '-c':ValuedParameter(Prefix='',Name='c',Value=None,Delimiter=' '), '-j':ValuedParameter(Prefix='',Name='j',Value=None,Delimiter=' '), '-r':ValuedParameter(Prefix='',Name='r',Value=None,Delimiter=' '), '-f':ValuedParameter(Prefix='',Name='f',Value=None,Delimiter=' '), '-v':ValuedParameter(Prefix='',Name='v',Value=None,Delimiter=' '), '-g':ValuedParameter(Prefix='',Name='g',Value=None,Delimiter=' '), '-d':ValuedParameter(Prefix='',Name='d',Value=None,Delimiter=' '), '-wa':ValuedParameter(Prefix='',Name='wa',Value=None,Delimiter=' '), '-wb':ValuedParameter(Prefix='',Name='wb',Value=None,Delimiter=' '), '-wc':ValuedParameter(Prefix='',Name='wc',Value=None,Delimiter=' '), '-wd':ValuedParameter(Prefix='',Name='wd',Value=None,Delimiter=' '), '-we':ValuedParameter(Prefix='',Name='we',Value=None,Delimiter=' '), '-pk':ValuedParameter(Prefix='',Name='pk',Value=None,Delimiter=' '), '-pg':ValuedParameter(Prefix='',Name='pg',Value=None,Delimiter=' '), '-pd':ValuedParameter(Prefix='',Name='pd',Value=None,Delimiter=' '), '-ps':ValuedParameter(Prefix='',Name='ps',Value=None,Delimiter=' '), 
'-pc':ValuedParameter(Prefix='',Name='pc',Value=None,Delimiter=' ')} _command = 'comRNA' _input_handler = '_input_as_string' PyCogent-1.5.3/cogent/app/consan.py000644 000765 000024 00000012605 12024702176 020137 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from os import remove, system, rmdir, mkdir from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath, FilePath, ApplicationError from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters, _find_synonym, is_not_None __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class Consan(CommandLineApplication): """Application controller for CONSAN v1.1""" _parameters = { '-m':ValuedParameter(Prefix='-',Name='m'), '-M':ValuedParameter(Prefix='-',Name='M'), '-C':ValuedParameter(Prefix='-',Name='C'), '-P':ValuedParameter(Prefix='-',Name='P'), '-V':FlagParameter(Prefix='-',Name='V'), '-f':FlagParameter(Prefix='-',Name='f'), '-x':FlagParameter(Prefix='-',Name='x'), '-t':FlagParameter(Prefix='-',Name='t')} _command = 'sfold' _input_handler='_input_as_string' def _input_as_string(self,data): """ Takes two files in a list as input eg. 
data = [path1,path2] """ inputFiles = ' '.join(data) self._input_filename = data return inputFiles def _input_as_lines(self,data): """ Writes to first sequences(fasta) in a list to two temp files data: a sequence to be written to a file, each element of the sequence will compose a line in the file Data should be in the following format: data = ['>tag1','sequence1','>tag2','sequence2'] Note: '\n' will be stripped off the end of each sequence element before writing to a file in order to avoid multiple new lines accidentally be written to a file """ inputFiles = '' self._input_filename = [] for i in range(2): filename = self.getTmpFilename(self.WorkingDir) self._input_filename.append(filename) data_file = open(filename,'w') if i == 0: data_to_file = '\n'.join(data[:2]) tmp1 = filename else: data_to_file = '\n'.join(data[2:]) tmp2 = filename data_file.write(data_to_file) data_file.close() inputFiles = ' '.join([tmp1,tmp2]) return inputFiles # need to override __call__ to remove files properly def __call__(self,data=None, remove_tmp=True): """Run the application with the specified kwargs on data data: anything that can be cast into a string or written out to a file. Usually either a list of things or a single string or number. 
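The command assembly performed inside this overridden __call__ can be sketched in isolation. build_command below is a hypothetical helper (not part of cogent); it mirrors the filter(None, ...)-then-join step the controller uses, so an empty input argument does not leave a double space in the shell line.

```python
def build_command(base_command, input_arg, outfile, errfile, delimiter=' '):
    """Mirror the command assembly in Consan.__call__: empty pieces are
    dropped by filter(None, ...) before the parts are joined with the
    command delimiter (a single space by default)."""
    parts = [base_command, str(input_arg), '>', str(outfile), '2>', str(errfile)]
    return delimiter.join(filter(None, parts))
```

For example, build_command('sfold seq1.fa seq2.fa', '', 'out.txt', 'err.txt') yields 'sfold seq1.fa seq2.fa > out.txt 2> err.txt' with no stray gap where the empty input argument was dropped.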
input_handler will be called on this data before it is passed as part of the command-line argument, so by creating your own input handlers you can customize what kind of data you want your application to accept remove_tmp: if True, removes tmp files """ input_handler = self.InputHandler suppress_stdout = self.SuppressStdout suppress_stderr = self.SuppressStderr if suppress_stdout: outfile = FilePath('/dev/null') else: outfile = self.getTmpFilename(self.TmpDir) if suppress_stderr: errfile = FilePath('/dev/null') else: errfile = FilePath(self.getTmpFilename(self.TmpDir)) if data is None: input_arg = '' else: input_arg = getattr(self,input_handler)(data) # Build up the command, consisting of a BaseCommand followed by # input and output (file) specifications command = self._command_delimiter.join(filter(None,\ [self.BaseCommand,str(input_arg),'>',str(outfile),'2>',\ str(errfile)])) if self.HaltExec: raise AssertionError, "Halted exec with command:\n" + command # The return value of system is a 16-bit number containing the signal # number that killed the process, and then the exit status. 
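The bit-shift decoding described in the comment above can be demonstrated directly. This is a POSIX-only sketch (it assumes a shell where `exit` works and a wait()-style status encoding); os.WEXITSTATUS is the portable spelling of the same extraction.

```python
import os

# On POSIX, os.system() returns a wait()-style 16-bit status: the low
# byte encodes the terminating signal, the next byte the exit code,
# hence the `>> 8` used by these controllers.
status = os.system('exit 7')   # run a shell command that exits with code 7
exit_code = status >> 8        # the shift the controller applies

# os.WEXITSTATUS performs the same extraction portably
assert exit_code == os.WEXITSTATUS(status)
```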
# We only want to keep the exit status so do a right bitwise shift to # get rid of the signal number byte tmp_dir = ''.join([self.WorkingDir, 'tmp']) mkdir(tmp_dir) exit_status = system(command) >> 8 rmdir(tmp_dir) # Determine if error should be raised due to exit status of # appliciation if not self._accept_exit_status(exit_status): raise ApplicationError, \ 'Unacceptable application exit status: %s, command: %s'\ % (str(exit_status),command) # open the stdout and stderr if not being suppressed out = None if not suppress_stdout: out = open(outfile,"r") err = None if not suppress_stderr: err = open(errfile,"r") result = CommandLineAppResult(out,err,exit_status,\ result_paths=self._get_result_paths(data)) # Clean up the input file if one was created if remove_tmp: if self._input_filename: for f in self._input_filename: remove(f) self._input_filename = None return result PyCogent-1.5.3/cogent/app/contrafold.py000644 000765 000024 00000001464 12024702176 021012 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters, _find_synonym, is_not_None __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class Contrafold(CommandLineApplication): """Application controler for CONTRAfold v1.0""" _parameters = {'predict':FlagParameter(Prefix='',Name='predict',Value=True), 'train':FlagParameter(Prefix='',Name='train')} _command = 'contrafold' _input_handler='_input_as_string' PyCogent-1.5.3/cogent/app/cove.py000644 000765 000024 00000015662 12024702176 017620 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, 
    ResultPath
from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\
    MixedParameter,Parameters, _find_synonym, is_not_None

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class Covet(CommandLineApplication):
    """Application controller for Covet

    Generate new models, by training them on example sequences.

    where options are:
     -a : make starting model from alignment
     -A : save alignments to filename.1, etc., for animation
     -b : each iteration, back up curr model to
     -f : use flat text save formats, portable but clumsy
     -G : gap-open prob 0 < gop < 1 for random alignment generation
     -h : print short help and version info
     -i : take start model from
     -m : do maximum likelihood model construction (slow!)
     -p : use prior in ; default is Laplace plus-one
     -s : set random() seed
     -X : gap-extend prob 0 < gex < 1 for random alignment generation
    """
    _parameters = {
        '-a':ValuedParameter(Prefix='-',Name='a',Delimiter=' '),
        '-A':ValuedParameter(Prefix='-',Name='A',Delimiter=' '),
        '-b':ValuedParameter(Prefix='-',Name='b',Delimiter=' '),
        '-f':FlagParameter(Prefix='-',Name='f'),
        '-G':ValuedParameter(Prefix='-',Name='G',Delimiter=' '),
        '-i':ValuedParameter(Prefix='-',Name='i',Delimiter=' '),
        '-m':FlagParameter(Prefix='-',Name='m'),
        '-p':ValuedParameter(Prefix='-',Name='p',Delimiter=' '),
        '-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '),
        '-X':ValuedParameter(Prefix='-',Name='X',Delimiter=' ')}
    _command = 'covet'
    _input_handler = '_input_as_string'

    def _input_as_string(self,filename):
        """Returns 'modelname' and 'filename' to redirect input to stdin"""
        return ' '.join([filename+'.cm',super(Covet,self)._input_as_string(filename)])

    def _input_as_lines(self,data):
        """Returns 'temp_filename' to redirect input to stdin"""
        filename = self._input_filename = self.getTmpFilename(self.WorkingDir)
        data_file = open(filename,'w')
        data_to_file = '\n'.join([str(d).strip('\n') for d in data])
        data_file.write(data_to_file)
        data_file.write('\n') #must end with new line
        data_file.close()
        return ' '.join([filename+'.cm',filename])

    def _get_result_paths(self,data):
        """Specifies the paths of output files generated by the application

        data: the data the instance of the application is called on

        covet writes the trained covariance model to a .cm file; it also
        prints output to stdout.
        """
        result={}
        if not isinstance(data,list):
            inputPath = data
        else:
            inputPath = ''.join([self._input_filename])
        result['cm'] = \
            ResultPath(Path=(inputPath+'.cm'))
        if self._input_filename is not None:
            result['_input_filename'] = ResultPath(self._input_filename)
        return result

class Coves(CommandLineApplication):
    """Application controller for Coves

    Computes the score of each whole sequence individually, and prints
    the scores. You might use it to detect sequences which, according to
    the model, don't belong to the same structural consensus; sequences
    which don't fit the model get negative scores.

    where options are:
     -a : show all pairs, not just Watson-Crick
     -g : set expected background GC composition (default 0.5)
     -m : mountain representation of structural alignment
     -s : secondary structure string representation of structural alignment
    """
    _parameters = {
        '-a':FlagParameter(Prefix='-',Name='a'),
        '-g':ValuedParameter(Prefix='-',Name='g',Delimiter=' '),
        '-m':FlagParameter(Prefix='-',Name='m'),
        '-s':FlagParameter(Prefix='-',Name='s',Value=True)}
    _command = 'coves'
    _input_handler = '_input_as_string'

    def _input_as_string(self,filename):
        """Returns 'modelname' and 'filename' to redirect input to stdin"""
        return ' '.join([filename+'.cm',super(Coves,self)._input_as_string(filename)])

class Covee(CommandLineApplication):
    """Application controller for Covee

    Emits a consensus structure prediction for the family.
where options are: -a : annotate all pairs, not just canonical ones -b : emit single most probable sequence -l : print as mountain landscape -s : set seed for random() EXPERIMENTAL OPTIONS: -L : calculate expected length distributions for states """ _parameters = { '-a':FlagParameter(Prefix='-',Name='a'), '-b':FlagParameter(Prefix='-',Name='b',Value=True), '-l':FlagParameter(Prefix='-',Name='l'), '-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '), '-L':FlagParameter(Prefix='-',Name='L')} _command = 'covee' _input_handler = '_input_as_string' def _input_as_string(self,filename): """Returns 'modelname' and 'filename' to redirect input to stdin""" return ' '.join([filename+'.cm',super(Covee,self)._input_as_string(filename)]) def _input_as_lines(self,data): """Returns 'temp_filename to redirect input to stdin""" return ''.join([data+'.cm',super(Covee,self)._input_as_lines(data)]) class Covea(CommandLineApplication): """Application controller for Covea here supported options are: -a : annotate all base pairs, not just canonical ones -h : print short help and version info -o : write alignment to in SELEX format -s : save individual alignment scores to Experimental options: -S : use small-memory variant of alignment algorithm """ _parameters = { '-a':FlagParameter(Prefix='-',Name='a'), '-o':ValuedParameter(Prefix='-',Name='o',Delimiter=' '), '-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '), '-S':FlagParameter(Prefix='-',Name='S')} _command = 'covea' _input_handler = '_input_as_string' def _input_as_string(self,filename): """Returns 'modelname' and 'filename' to redirect input to stdin""" return ' '.join([filename+'.cm',super(Covea,self)._input_as_string(filename)]) def _input_as_lines(self,data): """Returns 'temp_filename to redirect input to stdin""" return ''.join([data+'.cm',super(Covea,self)._input_as_lines(data)]) PyCogent-1.5.3/cogent/app/dialign.py000644 000765 000024 00000023777 12024702176 020301 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env 
python """ Application controller for dialign2-2 """ __author__ = "Gavin Huttley" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Gavin Huttley"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Gavin Huttley" __email__ = "gavin.huttley@anu.edu.au" __status__ = "Production" from cogent.app.parameters import FlagParameter, ValuedParameter from cogent.app.util import CommandLineApplication, ResultPath, \ get_tmp_filename, guess_input_handler from random import choice from cogent.parse.tree import DndParser from cogent.core.tree import PhyloNode from cogent.parse.fasta import MinimalFastaParser class Dialign(CommandLineApplication): """Dialign application controller""" _options ={ # -afc Creates additional output file "*.afc" containing data of # all fragments considered for alignment # WARNING: this file can be HUGE ! '-afc':FlagParameter(Prefix='-',Name='afc'), # -afc_v like "-afc" but verbose: fragments are explicitly printed # WARNING: this file can be EVEN BIGGER ! '-afc_v':FlagParameter(Prefix='-',Name='afc_v'), # -anc Anchored alignment. Requires a file .anc # containing anchor points. '-anc':FlagParameter(Prefix='-',Name='anc'), # -cs if segments are translated, not only the `Watson strand' # but also the `Crick strand' is looked at. '-cs':FlagParameter(Prefix='-',Name='cs'), # -cw additional output file in CLUSTAL W format. '-cw':FlagParameter(Prefix='-',Name='cw'), # -ds `dna alignment speed up' - non-translated nucleic acid # fragments are taken into account only if they start with # at least two matches. Speeds up DNA alignment at the expense # of sensitivity. '-ds':FlagParameter(Prefix='-',Name='ds'), # -fa additional output file in FASTA format. 
'-fa':FlagParameter(Prefix='-',Name='fa'), # -ff Creates file *.frg containing information about all # fragments that are part of the respective optimal pairwise # alignmnets plus information about consistency in the multiple # alignment '-ff':FlagParameter(Prefix='-',Name='ff'), # -fn output files are named . . '-fn':ValuedParameter('-',Name='fn',Delimiter=' ', IsPath=True), # # # -fop Creates file *.fop containing coordinates of all fragments # that are part of the respective pairwise alignments. '-fop':FlagParameter(Prefix='-',Name='fop'), # -fsm Creates file *.fsm containing coordinates of all fragments # that are part of the final alignment '-fsm':FlagParameter(Prefix='-',Name='fsm'), # -iw overlap weights switched off (by default, overlap weights are # used if up to 35 sequences are aligned). This option # speeds up the alignment but may lead to reduced alignment # quality. '-iw':FlagParameter(Prefix='-',Name='iw'), # -lgs `long genomic sequences' - combines the following options: # -ma, -thr 2, -lmax 30, -smin 8, -nta, -ff, # -fop, -ff, -cs, -ds, -pst '-lgs':FlagParameter(Prefix='-',Name='lgs'), # -lgs_t Like "-lgs" but with all segment pairs assessed at the # peptide level (rather than 'mixed alignments' as with the # "-lgs" option). Therefore faster than -lgs but not very # sensitive for non-coding regions. '-lgs_t':FlagParameter(Prefix='-',Name='lgs_t'), # -lmax maximum fragment length = x (default: x = 40 or x = 120 # for `translated' fragments). Shorter x speeds up the program # but may affect alignment quality. '-lmax':ValuedParameter('-',Name='lmax',Delimiter=' '), # -lo (Long Output) Additional file *.log with information abut # fragments selected for pairwise alignment and about # consistency in multi-alignment proceedure '-lo':FlagParameter(Prefix='-',Name='lo'), # -ma `mixed alignments' consisting of P-fragments and N-fragments # if nucleic acid sequences are aligned. 
'-ma':FlagParameter(Prefix='-',Name='ma'), # -mask residues not belonging to selected fragments are replaced # by `*' characters in output alignment (rather than being # printed in lower-case characters) '-mask':FlagParameter(Prefix='-',Name='mask'), # -mat Creates file *mat with substitution counts derived from the # fragments that have been selected for alignment '-mat':FlagParameter(Prefix='-',Name='mat'), # -mat_thr Like "-mat" but only fragments with weight score > t # are considered '-mat_thr':ValuedParameter('-',Name='mat_thr',Delimiter=' '), # -max_link "maximum linkage" clustering used to construct sequence tree # (instead of UPGMA). '-max_link':FlagParameter(Prefix='-',Name='max_link'), # -min_link "minimum linkage" clustering used. '-min_link':FlagParameter(Prefix='-',Name='min_link'), # # -mot "motif" option. '-mot':FlagParameter(Prefix='-',Name='mot'), # -msf separate output file in MSF format. '-msf':FlagParameter(Prefix='-',Name='msf'), # -n input sequences are nucleic acid sequences. No translation # of fragments. '-n':FlagParameter(Prefix='-',Name='n'), # -nt input sequences are nucleic acid sequences and `nucleic acid # segments' are translated to `peptide segments'. '-nt':FlagParameter(Prefix='-',Name='nt'), # -nta `no textual alignment' - textual alignment suppressed. This # option makes sense if other output files are of intrest -- # e.g. the fragment files created with -ff, -fop, -fsm or -lo '-nta':FlagParameter(Prefix='-',Name='nta'), # -o fast version, resulting alignments may be slightly different. '-o':FlagParameter(Prefix='-',Name='o'), # # -ow overlap weights enforced (By default, overlap weights are # used only if up to 35 sequences are aligned since calculating # overlap weights is time consuming). Warning: overlap weights # generally improve alignment quality but the running time # increases in the order O(n^4) with the number of sequences. 
# This is why, by default, overlap weights are used only for # sequence sets with < 35 sequences. '-ow':FlagParameter(Prefix='-',Name='ow'), # -pst "print status". Creates and updates a file *.sta with # information about the current status of the program run. # This option is recommended if large data sets are aligned # since it allows the user to estimate the remaining running # time. '-pst':FlagParameter(Prefix='-',Name='pst'), # -smin minimum similarity value for first residue pair (or codon # pair) in fragments. Speeds up protein alignment or alignment # of translated DNA fragments at the expense of sensitivity. '-smin':ValuedParameter('-',Name='smin',Delimiter=' '), # -stars maximum number of `*' characters indicating degree of # local similarity among sequences. By default, no stars # are used but numbers between 0 and 9, instead. '-stars':ValuedParameter('-',Name='stars',Delimiter=' '), # -stdo Results written to standard output. '-stdo':FlagParameter(Prefix='-',Name='stdo'), # -ta standard textual alignment printed (overrides suppression # of textual alignments in special options, e.g. -lgs) '-ta':FlagParameter(Prefix='-',Name='ta'), # -thr Threshold T = x. '-thr':ValuedParameter('-',Name='thr',Delimiter=' '), # -xfr "exclude fragments" - list of fragments can be specified # that are NOT considered for pairwise alignment '-xfr':FlagParameter(Prefix='-',Name='xfr'), } _parameters = {} _parameters.update(_options) _command = "dialign2-2" def _input_as_seqs(self,data): lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _align_out_filename(self): if self.Parameters['-fn'].isOn(): aln_filename = self._absolute(str(self.Parameters['-fn'].Value)) else: raise ValueError, "No output file specified." 
return aln_filename def _get_result_paths(self,data): result = {} if self.Parameters['-fn'].isOn(): out_name = self._align_out_filename() result['Align'] = ResultPath(Path=out_name,IsWritten=True) return result def getHelp(self): """Dialign help""" help_str = """ """ return help_str PyCogent-1.5.3/cogent/app/dotur.py000644 000765 000024 00000020127 12024702176 020011 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Provides an application controller for the commandline version of: DOTUR v1.53 """ import shutil from cogent.app.parameters import FlagParameter, ValuedParameter, \ MixedParameter from cogent.app.util import CommandLineApplication, ResultPath, \ get_tmp_filename, FilePath from cogent.core.alignment import SequenceCollection, Alignment from cogent.core.moltype import DNA, RNA, PROTEIN from cogent.format.table import phylipMatrix from cogent.parse.dotur import OtuListParser __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" class Dotur(CommandLineApplication): """Dotur application controller. """ # Options: _options = {\ # -i: Number of iterations (default = 1000) '-i':ValuedParameter('-',Name='i',Delimiter=' '),\ # -c: Clustering method - (f) furthest neighbor, (n) nearest # neighbor, (a) average neighbor (default = f) '-c':ValuedParameter('-',Name='c',Delimiter=' '),\ # -p: Precision of distances for output, increasing can # dramatically lengthen execution times - 10, 100, 1000, 10000 # (default = 100) '-p':ValuedParameter('-',Name='p',Delimiter=' '),\ # -l: Input file is lower triangular (default = square matrix) '-l':FlagParameter('-',Name='l'),\ # -r: Calculates rarefaction curves for each parameter, can # dramatically lengthen execution times. Simple rarefaction # curve always calculated. 
'-r':FlagParameter('-',Name='r'),\ # -stop: Stops clustering when cutoff has been reached. '-stop':FlagParameter('-',Name='stop'),\ # -wrep: Samples with replacement. '-wrep':FlagParameter('-',Name='wrep'),\ # -jumble: Jumble the order of the distance matrix. '-jumble':FlagParameter('-',Name='jumble'),\ # -sim: Converts similarity score to distance (D=1-S). '-sim':FlagParameter('-',Name='sim'),\ } _parameters = {} _parameters.update(_options) _input_handler = '_input_as_multiline_string' _command = 'dotur' def getHelp(self): """Method that points to the DOTUR documentation.""" help_str =\ """ See DOTUR Documentation page at: http://schloss.micro.umass.edu/software/dotur/documentation.html """ return help_str def _input_as_multiline_string(self, data): """Write a multiline string to a temp file and return the filename. data: a multiline string to be written to a file. * Note: the result will be the filename as a FilePath object (which is a string subclass). """ filename = self._input_filename = \ FilePath(self.getTmpFilename(self.WorkingDir)) data_file = open(filename,'w') data_file.write(data) data_file.close() return filename def _get_cluster_method(self): """Returns cluster method as string. """ if self.Parameters['-c'].isOn(): cluster_method = self._absolute(str(\ self.Parameters['-c'].Value))+'n' else: # f (furthest neighbor) is default cluster_method = 'fn' return cluster_method def _get_result_paths(self,data): """Return dict of {key: ResultPath} - NOTE: Only putting a few files on the results path. Add more here if needed. """ result = {} out_name = self._input_filename.split('.txt')[0] cluster_method = self._get_cluster_method() #only care about Otu, List and Rank, can add others later. 
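The output-naming convention implemented by _get_cluster_method can be sketched standalone. The helper names below are hypothetical (not part of cogent or DOTUR); they only reproduce the suffix logic shown in this controller, where furthest neighbor ('f') is the default and the method letter plus 'n' is appended to the output basename.

```python
def cluster_method_suffix(c_value=None):
    """DOTUR appends the clustering method to output basenames:
    'f' (furthest, the default) -> 'fn', 'n' -> 'nn', 'a' -> 'an'."""
    return (c_value or 'f') + 'n'

def dotur_output_names(basename, c_value=None):
    """Predict the .otu/.list/.rank paths the controller registers."""
    suffix = cluster_method_suffix(c_value)
    return dict((key, '%s.%s.%s' % (basename, suffix, key.lower()))
                for key in ('Otu', 'List', 'Rank'))
```

For example, dotur_output_names('dists') maps 'List' to 'dists.fn.list', matching the ResultPath entries built in _get_result_paths.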
result['Otu'] = ResultPath(Path=out_name+'.%s.otu'%(cluster_method)) result['List'] = ResultPath(Path=out_name+'.%s.list'%(cluster_method)) result['Rank'] = ResultPath(Path=out_name+'.%s.rank'%(cluster_method)) result['Rarefaction'] = \ ResultPath(Path=out_name+'.%s.rarefaction'%(cluster_method)) return result def remap_seq_names(otu_list, int_map): """Returns list with seq names remapped. - otu_list: list of lists containing sequence names in an OTU. - int_map: mapping between names in otu_list and original names. """ res = [] for otu in otu_list: curr_otu = [] for seq in otu: curr_otu.append(int_map[seq]) res.append(curr_otu) return res def dotur_from_alignment(aln,moltype,distance_function,params=None): """Returns dotur results given an alignment and distance function. - aln: An Alignment object or something that behaves like one. Sequences must be aligned. - moltype: cogent.core.moltype object. - distance_function: function that can be passed to distanceMatrix() method of SequenceCollection. Must be able to find distance between two sequences. - NOTE: This function will only return the parsed *.list file, as it contains the OTU identities. Dotur generates 23 output files, so if this is not the one you are looking for, check out the documentation and add the others to the result path. """ #construct Alignment object. This will handle unaligned sequences. aln = Alignment(aln, MolType=moltype) #need to make int map. int_map, int_keys = aln.getIntMap() #construct Alignment object from int map to use object functionality int_map = Alignment(int_map, MolType=moltype) order = sorted(int_map.Names) #Build distance matrix. d_matrix_dict = int_map.distanceMatrix(f=distance_function) d_matrix_dict.RowOrder=order d_matrix_dict.ColOrder=order #Get distance matrix in list form. d_matrix_list = d_matrix_dict.toLists() #must be strings to use phylipMatrix for i,line in enumerate(d_matrix_list): d_matrix_list[i]=map(str,line) #Get phylip formatted string. 
    phylip_matrix_string = phylipMatrix(rows=d_matrix_list,names=order)

    working_dir = get_tmp_filename(suffix='')
    app = Dotur(InputHandler='_input_as_multiline_string',\
        WorkingDir=working_dir,params=params)
    res = app(phylip_matrix_string)
    otu_list = OtuListParser(res['List'].readlines())
    #remap sequence names
    for i,otu in enumerate(otu_list):
        otu_list[i][2] = remap_seq_names(otu[2], int_keys)
    shutil.rmtree(app.WorkingDir)
    return otu_list

def dotur_from_file(distance_matrix_file_path,params=None):
    """Returns dotur results given a distance matrix file.

        - distance_matrix_file_path: Path to distance matrix file. This
            file must be a PHYLIP formatted square distance matrix. This
            format is available in cogent.format.table.
        - IMPORTANT NOTE: This distance matrix format allows only 10
            characters for the row labels in the distance matrix. Also,
            the IDs must be unique and ungapped to be useful when using
            dotur.
        - NOTE: This function will only return the parsed *.list file, as
            it contains the OTU identities. Dotur generates 23 output
            files, so if this is not the one you are looking for, check
            out the documentation and add the others to the result path.
    """
    # Read out the data from the distance_matrix_file_path.
    # This is important so we can run dotur in a temp directory and avoid
    # having to handle all 23 output files.
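The 10-character row-label limit noted above comes from the PHYLIP layout itself. The sketch below is a simplified, hypothetical stand-in for cogent.format.table.phylipMatrix (which the code above actually uses), assuming a plain square matrix with a fixed-width name field; it shows why IDs must be unique within their first 10 characters.

```python
def phylip_square_matrix(names, rows):
    """Write a square distance matrix in a PHYLIP-like layout:
    a sequence-count line, then one row per sequence with the label
    padded (and truncated) to a 10-character field."""
    lines = ['    %d' % len(names)]
    for name, row in zip(names, rows):
        label = name[:10].ljust(10)   # the 10-character limit
        lines.append(label + '  '.join(str(v) for v in row))
    return '\n'.join(lines)
```

Two IDs that agree in their first 10 characters would collapse to the same label here, which is exactly the failure mode the note warns about.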
    d_matrix_string = open(distance_matrix_file_path,'U').read()

    working_dir = get_tmp_filename(suffix='')
    app = Dotur(InputHandler='_input_as_multiline_string',\
        WorkingDir=working_dir,params=params)
    res = app(d_matrix_string)
    otu_list = OtuListParser(res['List'].readlines())
    shutil.rmtree(app.WorkingDir)
    return otu_list

PyCogent-1.5.3/cogent/app/dynalign.py000644 000765 000024 00000022130 12024702176 020455 0ustar00jrideoutstaff000000 000000

#!/usr/bin/env python

from cogent.app.util import CommandLineApplication,\
    CommandLineAppResult, ResultPath, ApplicationError
from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\
    MixedParameter,Parameters
from sys import platform
from os import remove,system,mkdir,getcwd,close

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class Dynalign(CommandLineApplication):
    """Application controller for Dynalign

    Input:
    Sequences for input in Dynalign are assumed to be in the following
    format:

    ;(first line of file) Comments must start with a ;semicolon
    ;There can be any number of comments
    A single line title must immediately follow:
    AAA GCGG UUTGTT UTCUTaaTCTXXXXUCAGG1

    where the terminal 1 is required at the end.

    -alignment          The alignment file that will be created
    -M                  Maximum separation parameter (larger than the
                        difference in length of the sequences)
    -gap_cost           0.4 good value
    -max_percent_diff   maximum % diff in free energy in suboptimal
                        struct. 20 good starting point
    -bp_window          specifies how different the suboptimal structures
                        must be from each other. 2 good starting point
    -align_window       specifies how different alignments must be, 1 good
                        starting point.
    -single_bp_inserts  specifies whether single base pair inserts are
                        allowed; 0 = no; 1 = yes
    -[savefile]         optional
    """
    _parameters = {
        '-alignment':ValuedParameter(Prefix='',Name='',Value='align',Delimiter=' '),
        '-M':ValuedParameter(Prefix='',Name='',Value=10,Delimiter=' '),
        '-gap_cost':ValuedParameter(Prefix='',Name='',Value=0.4,Delimiter=' '),
        '-max_structures':ValuedParameter(Prefix='',Name='',Value=8,Delimiter=' '),
        '-max_percent_diff':ValuedParameter(Prefix='',Name='',Value=20,\
            Delimiter=' '),
        '-bp_window':ValuedParameter(Prefix='',Name='',Value=2,Delimiter=' '),
        '-align_window':ValuedParameter(Prefix='',Name='',Value=1,Delimiter=' '),
        '-single_bp_inserts':ValuedParameter(Prefix='',Name='',Value=1,\
            Delimiter=' '),}
    _command = 'dynalign'
    _input_handler = '_input_as_string'

    def _input_as_string(self,data):
        """Return data as a string

        data: list with two file paths, e.g. data = ['path1','path2']
        """
        inputFiles = ''
        self._input_filename = []
        for i in data:
            self._input_filename.append(i)
            inputFiles = ' '.join([inputFiles,i])
        outputFile = ' '.join(['ct1', 'ct2'])
        inputFiles = ' '.join([inputFiles,outputFile])
        return inputFiles

    def _input_as_lines(self,data):
        """Write a seq of lines to a temp file and return the filename string

        data: a sequence to be written to a file, each element of the
        sequence will compose a line in the file,
        e.g. data = [file1,file2], where file1 and file2 are lists of
        lines -> [[List],[List]]

        Note: '\n' will be stripped off the end of each sequence element
        before writing to a file in order to avoid multiple new lines
        accidentally being written to a file
        """
        inputFiles = ''
        self._input_filename = []
        for el in data:
            filename = self.getTmpFilename(self.WorkingDir)
            self._input_filename.append(filename)
            data_file = open(filename,'w')
            data_to_file = '\n'.join([str(d).strip('\n') for d in el])
            data_file.write(data_to_file)
            data_file.close()
            inputFiles = ' '.join([inputFiles,filename])
        outputFiles = ' '.join(['ct1', 'ct2'])
        inputFiles = ' '.join([inputFiles,outputFiles])
        return inputFiles

    def _get_result_paths(self,data):
        """
        data: the data the instance of the application is called on
        """
        result = {}
        result['seq_1_ct'] = \
            ResultPath(Path=(self.WorkingDir+'ct1'))
        result['seq_2_ct'] = \
            ResultPath(Path=(self.WorkingDir+'ct2'))
        result['alignment'] = \
            ResultPath(Path=(self.WorkingDir+'align'))
        return result

    #Below from Cogent/app/util.py, modified to accommodate dynalign
    def __call__(self,data=None):
        """Run the application with the specified kwargs on data

        Overrides the __call__ function in util.py because of the special
        circumstances surrounding the command-line input.

        data: anything that can be cast into a string or written out to a
        file. Usually either a list of things or a single string or
        number. input_handler will be called on this data before it is
        passed as part of the command-line argument, so by creating your
        own input handlers you can customize what kind of data you want
        your application to accept
        """
        input_handler = self.InputHandler
        suppress_stdout = self.SuppressStdout
        suppress_stderr = self.SuppressStderr
        if suppress_stdout:
            outfile = '/dev/null'
        else:
            outfile = self.getTmpFilename(self.WorkingDir)
        if suppress_stderr:
            errfile = '/dev/null'
        else:
            errfile = self.getTmpFilename(self.WorkingDir)
        if data is None:
            input_arg = ''
        else:
            input_arg = getattr(self,input_handler)(data)

        # Build up the command, consisting of a BaseCommand followed by
        # input and output (file) specifications
        first,second = self.BaseCommand
        command = self._command_delimiter.join(filter(None,\
            [first,input_arg,second,'>',outfile,'2>',errfile]))
        if self.HaltExec:
            raise AssertionError, "Halted exec with command:\n" + command
        # The return value of system is a 16-bit number containing the signal
        # number that killed the process, and then the exit status.
# We only want to keep the exit status so do a right bitwise shift to # get rid of the signal number byte exit_status = system(command) >> 8 # Determine if error should be raised due to exit status of # application if not self._accept_exit_status(exit_status): raise ApplicationError, \ 'Unacceptable application exit status: %s, command: %s'\ % (str(exit_status),command) # open the stdout and stderr if not being suppressed out = None if not suppress_stdout: out = open(outfile,"r") err = None if not suppress_stderr: err = open(errfile,"r") result = CommandLineAppResult(out,err,exit_status,\ result_paths=self._get_result_paths(data)) # Clean up the input file if one was created if self._input_filename: for f in self._input_filename: remove(f) self._input_filename = None return result def _get_base_command(self): """ Returns the full command string Overrides the base class method in util.py because of the special circumstances surrounding the command line input. input_arg: the argument to the command which represents the input to the program, this will be a string, either representing input or a filename to get input from """ command_part1 = [] command_part2 = [] # Append a change directory to the beginning of the command to change # to self.WorkingDir before running the command cd_command = ''.join(['cd ',self.WorkingDir,';']) if self._command is None: raise ApplicationError, '_command has not been set.' 
command = self._command command_part1.append(cd_command) command_part1.append(command) lista = [self.Parameters['-alignment'],\ self.Parameters['-M'],\ self.Parameters['-gap_cost'],\ self.Parameters['-max_structures'],\ self.Parameters['-max_percent_diff'],\ self.Parameters['-bp_window'],\ self.Parameters['-align_window'],\ self.Parameters['-single_bp_inserts']] command_part2.append(self._command_delimiter.join(filter(\ None,(map(str,lista))))) return self._command_delimiter.join(command_part1).strip(),\ self._command_delimiter.join(command_part2).strip() BaseCommand = property(_get_base_command) PyCogent-1.5.3/cogent/app/fasttree.py000644 000765 000024 00000014306 12024702176 020473 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for FastTree designed for FastTree v1.1.0 . Also functions with v2.0.1, v2.1.0, and v2.1.3 though only with basic functionality""" from cogent.app.parameters import ValuedParameter, FlagParameter, \ MixedParameter from cogent.app.util import CommandLineApplication, FilePath, system, \ CommandLineAppResult, ResultPath, remove, ApplicationError from cogent.core.tree import PhyloNode from cogent.parse.tree import DndParser from cogent.core.moltype import DNA, RNA, PROTEIN from cogent.core.alignment import SequenceCollection __author__ = "Daniel McDonald" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald", "Justin Kuczynski"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "mcdonadt@colorado.edu" __status__ = "Development" class FastTree(CommandLineApplication): """FastTree application Controller""" _command = 'FastTree' _input_handler = '_input_as_multiline_string' _parameters = { '-quiet':FlagParameter('-',Name='quiet'), '-boot':ValuedParameter('-',Delimiter=' ',Name='boot'), '-seed':ValuedParameter('-',Delimiter=' ',Name='seed'), '-nni':ValuedParameter('-',Delimiter=' ',Name='nni'), '-slow':FlagParameter('-',Name='slow'), 
'-fastest':FlagParameter('-',Name='fastest'), '-top':FlagParameter('-',Name='top'), '-notop':FlagParameter('-',Name='notop'), '-topm':ValuedParameter('-',Delimiter=' ',Name='topm'), '-close':ValuedParameter('-',Delimiter=' ',Name='close'), '-refresh':ValuedParameter('-',Delimiter=' ',Name='refresh'), '-matrix':ValuedParameter('-',Delimiter=' ',Name='matrix'), '-nomatrix':FlagParameter('-',Name='nomatrix'), '-nj':FlagParameter('-',Name='nj'), '-bionj':FlagParameter('-',Name='bionj'), '-nt':FlagParameter('-',Name='nt'), '-n':ValuedParameter('-',Delimiter=' ',Name='n'), '-pseudo':MixedParameter('-',Delimiter=' ', Name='pseudo'), '-intree':ValuedParameter('-',Delimiter=' ',Name='intree'), '-spr':ValuedParameter('-',Delimiter=' ',Name='spr'), '-constraints':ValuedParameter('-',Delimiter=' ',\ Name='constraints'), '-constraintWeight':ValuedParameter('-',Delimiter=' ',\ Name='constraintWeight'),\ '-makematrix':ValuedParameter('-',Delimiter=' ',Name='makematrix')} def __call__(self,data=None, remove_tmp=True): """Run the application with the specified kwargs on data data: anything that can be cast into a string or written out to a file. Usually either a list of things or a single string or number. 
input_handler will be called on this data before it is passed as part of the command-line argument, so by creating your own input handlers you can customize what kind of data you want your application to accept remove_tmp: if True, removes tmp files NOTE: Override of the base class to handle redirected output """ input_handler = self.InputHandler suppress_stderr = self.SuppressStderr outfile = self.getTmpFilename(self.TmpDir) self._outfile = outfile if suppress_stderr: errfile = FilePath('/dev/null') else: errfile = FilePath(self.getTmpFilename(self.TmpDir)) if data is None: input_arg = '' else: input_arg = getattr(self,input_handler)(data) # Build up the command, consisting of a BaseCommand followed by # input and output (file) specifications command = self._command_delimiter.join(filter(None,\ [self.BaseCommand,str(input_arg),'>',str(outfile),'2>',\ str(errfile)])) if self.HaltExec: raise AssertionError, "Halted exec with command:\n" + command # The return value of system is a 16-bit number containing the signal # number that killed the process, and then the exit status. 
# We only want to keep the exit status so do a right bitwise shift to # get rid of the signal number byte exit_status = system(command) >> 8 # Determine if error should be raised due to exit status of # appliciation if not self._accept_exit_status(exit_status): raise ApplicationError, \ 'Unacceptable application exit status: %s, command: %s'\ % (str(exit_status),command) out = open(outfile,"r") err = None if not suppress_stderr: err = open(errfile,"r") result = CommandLineAppResult(out,err,exit_status,\ result_paths=self._get_result_paths(data)) # Clean up the input file if one was created if remove_tmp: if self._input_filename: remove(self._input_filename) self._input_filename = None return result def _get_result_paths(self, data): result = {} result['Tree'] = ResultPath(Path=self._outfile) return result def build_tree_from_alignment(aln, moltype, best_tree=False, params=None): """Returns a tree from alignment Will check MolType of aln object """ if params is None: params = {} if moltype == DNA or moltype == RNA: params['-nt'] = True elif moltype == PROTEIN: params['-nt'] = False else: raise ValueError, \ "FastTree does not support moltype: %s" % moltype.label if best_tree: params['-slow'] = True #Create mapping between abbreviated IDs and full IDs int_map, int_keys = aln.getIntMap() #Create SequenceCollection from int_map. 
int_map = SequenceCollection(int_map,MolType=moltype) app = FastTree(params=params) result = app(int_map.toFasta()) tree = DndParser(result['Tree'].read(), constructor=PhyloNode) #remap tip names for tip in tree.tips(): tip.Name = int_keys[tip.Name] return tree PyCogent-1.5.3/cogent/app/fasttree_v1.py000644 000765 000024 00000012700 12024702176 021075 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for FastTree v1.0""" from cogent.app.parameters import ValuedParameter, FlagParameter from cogent.app.util import CommandLineApplication, FilePath, system, \ CommandLineAppResult, ResultPath, remove, ApplicationError from cogent.core.tree import PhyloNode from cogent.parse.tree import DndParser from cogent.core.moltype import DNA, RNA, PROTEIN __author__ = "Daniel McDonald" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Daniel McDonald" __email__ = "mcdonadt@colorado.edu" __status__ = "Development" class FastTree(CommandLineApplication): """FastTree application Controller""" _command = 'FastTree' _input_handler = '_input_as_multiline_string' _parameters = { '-quiet':FlagParameter('-',Name='quiet'), '-boot':ValuedParameter('-',Delimiter=' ',Name='boot'), '-seed':ValuedParameter('-',Delimiter=' ',Name='seed'), '-nni':ValuedParameter('-',Delimiter=' ',Name='nni'), '-slow':FlagParameter('-',Name='slow'), '-fastest':FlagParameter('-',Name='fastest'), '-top':FlagParameter('-',Name='top'), '-notop':FlagParameter('-',Name='notop'), '-topm':ValuedParameter('-',Delimiter=' ',Name='topm'), '-close':ValuedParameter('-',Delimiter=' ',Name='close'), '-refresh':ValuedParameter('-',Delimiter=' ',Name='refresh'), '-matrix':ValuedParameter('-',Delimiter=' ',Name='matrix'), '-nomatrix':FlagParameter('-',Name='nomatrix'), '-nj':FlagParameter('-',Name='nj'), '-bionj':FlagParameter('-',Name='bionj'), '-nt':FlagParameter('-',Name='nt'), 
'-n':ValuedParameter('-',Delimiter=' ',Name='n')} #FastTree [-quiet] [-boot 1000] [-seed 1253] [-nni 10] [-slow | -fastest] # [-top | -notop] [-topm 1.0 [-close 0.75] [-refresh 0.8]] # [-matrix Matrix | -nomatrix] [-nj | -bionj] # [-nt] [-n 100] [alignment] > newick_tree def __call__(self,data=None, remove_tmp=True): """Run the application with the specified kwargs on data data: anything that can be cast into a string or written out to a file. Usually either a list of things or a single string or number. input_handler will be called on this data before it is passed as part of the command-line argument, so by creating your own input handlers you can customize what kind of data you want your application to accept remove_tmp: if True, removes tmp files NOTE: Override of the base class to handle redirected output """ input_handler = self.InputHandler suppress_stderr = self.SuppressStderr outfile = self.getTmpFilename(self.TmpDir) self._outfile = outfile if suppress_stderr: errfile = FilePath('/dev/null') else: errfile = FilePath(self.getTmpFilename(self.TmpDir)) if data is None: input_arg = '' else: input_arg = getattr(self,input_handler)(data) # Build up the command, consisting of a BaseCommand followed by # input and output (file) specifications command = self._command_delimiter.join(filter(None,\ [self.BaseCommand,str(input_arg),'>',str(outfile),'2>',\ str(errfile)])) if self.HaltExec: raise AssertionError, "Halted exec with command:\n" + command # The return value of system is a 16-bit number containing the signal # number that killed the process, and then the exit status. 
# We only want to keep the exit status so do a right bitwise shift to # get rid of the signal number byte exit_status = system(command) >> 8 # Determine if error should be raised due to exit status of # appliciation if not self._accept_exit_status(exit_status): raise ApplicationError, \ 'Unacceptable application exit status: %s, command: %s'\ % (str(exit_status),command) out = open(outfile,"r") err = None if not suppress_stderr: err = open(errfile,"r") result = CommandLineAppResult(out,err,exit_status,\ result_paths=self._get_result_paths(data)) # Clean up the input file if one was created if remove_tmp: if self._input_filename: remove(self._input_filename) self._input_filename = None return result def _get_result_paths(self, data): result = {} result['Tree'] = ResultPath(Path=self._outfile) return result def build_tree_from_alignment(aln, moltype, best_tree=False, params=None): """Returns a tree from alignment Will check MolType of aln object """ if params is None: params = {} if moltype == DNA or moltype == RNA: params['-nt'] = True elif moltype == PROTEIN: params['-nt'] = False else: raise ValueError, \ "FastTree does not support moltype: %s" % moltype.label app = FastTree(params=params) if best_tree: raise NotImplementedError, "best_tree not implemented yet" result = app(aln.toFasta()) tree = DndParser(result['Tree'].read(), constructor=PhyloNode) return tree PyCogent-1.5.3/cogent/app/foldalign.py000644 000765 000024 00000002707 12024702176 020617 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for Foldalign v2.0.3 application Foldalign takes two sequences as input, these sequences can be in the same file(2) or separate files. 
ex1 Foldalign file(2) ex2 Foldalign seq1 seq2 """ from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters, _find_synonym, is_not_None __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class foldalign(CommandLineApplication): """Application controller for foldalign RNA secondary structure prediction application """ _parameters = { '-max_length':ValuedParameter(Prefix='-',Name='max_length',Delimiter=' '), '-max_diff':ValuedParameter(Prefix='-',Name='max_diff',Delimiter=' '), '-score_matrix':ValuedParameter(Prefix='-',Name='score_matrix',Delimiter=' '), '-format':ValuedParameter(Prefix='-',Name='format',Delimiter=' '), '-plot_score':FlagParameter(Prefix='-',Name='plot_score'), '-global':FlagParameter(Prefix='-',Name='global'), '-summary':FlagParameter(Prefix='-',Name='summary'),} _command = 'foldalign' _input_handler = '_input_as_string' PyCogent-1.5.3/cogent/app/formatdb.py000755 000765 000024 00000020775 12024702176 020466 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # Author: Greg Caporaso (gregcaporaso@gmail.com) # formatdb.py """ Description File created on 16 Sep 2009. 
""" from __future__ import division from optparse import OptionParser from os.path import split, splitext from os import remove from glob import glob from cogent.app.util import CommandLineApplication, ResultPath, get_tmp_filename from cogent.app.parameters import ValuedParameter, FilePath __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Production" class FormatDb(CommandLineApplication): """ ApplicationController for formatting blast databases Currently contains a minimal parameter set. """ _command = 'formatdb' _parameters = {\ '-i':ValuedParameter(Prefix='-',Name='i',Delimiter=' ',IsPath=True),\ '-l':ValuedParameter(Prefix='-',Name='l',Delimiter=' ',IsPath=True),\ '-o':ValuedParameter(Prefix='-',Name='o',Delimiter=' ',Value='T'),\ '-p':ValuedParameter(Prefix='-',Name='p',Delimiter=' ',Value='F'),\ '-n':ValuedParameter(Prefix='-',Name='n',Delimiter=' ') } _input_handler = '_input_as_parameter' _suppress_stdout = True _suppress_stderr = True def _input_as_parameter(self,data): """ Set the input path and log path based on data (a fasta filepath) """ self.Parameters['-i'].on(data) # access data through self.Parameters so we know it's been cast # to a FilePath input_filepath = self.Parameters['-i'].Value input_file_dir, input_filename = split(input_filepath) input_file_base, input_file_ext = splitext(input_filename) # FIXME: the following all other options # formatdb ignores the working directory if not name is passed. 
self.Parameters['-l'].on(FilePath('%s.log') % input_filename) self.Parameters['-n'].on(FilePath(input_filename)) return '' def _get_result_paths(self,data): """ Build the dict of result filepaths """ # access data through self.Parameters so we know it's been cast # to a FilePath wd = self.WorkingDir db_name = self.Parameters['-n'].Value log_name = self.Parameters['-l'].Value result = {} result['log'] = ResultPath(Path=wd + log_name, IsWritten=True) if self.Parameters['-p'].Value == 'F': extensions = ['nhr','nin','nsq','nsd','nsi'] else: extensions = ['phr','pin','psq','psd','psi'] for extension in extensions: for file_path in glob(wd + (db_name + '*' + extension)): # this will match e.g. nr.01.psd and nr.psd key = file_path.split(db_name + '.')[1] result_path = ResultPath(Path=file_path, IsWritten=True) result[key] = result_path return result def _accept_exit_status(self,exit_status): """ Return True when the exit status was 0 """ return exit_status == 0 def build_blast_db_from_fasta_path(fasta_path,is_protein=False,\ output_dir=None,HALT_EXEC=False): """Build blast db from fasta_path; return db name and list of files created **If using to create temporary blast databases, you can call cogent.util.misc.remove_files(db_filepaths) to clean up all the files created by formatdb when you're done with the database. fasta_path: path to fasta file of sequences to build database from is_protein: True if working on protein seqs (default: False) output_dir: directory where output should be written (default: directory containing fasta_path) HALT_EXEC: halt just before running the formatdb command and print the command -- useful for debugging """ fasta_dir, fasta_filename = split(fasta_path) if not output_dir: output_dir = fasta_dir or '.' 
# Will cd to this directory, so just pass the filename # so the app is not confused by relative paths fasta_path = fasta_filename if not output_dir.endswith('/'): db_name = output_dir + '/' + fasta_filename else: db_name = output_dir + fasta_filename # instantiate the object fdb = FormatDb(WorkingDir=output_dir,HALT_EXEC=HALT_EXEC) if is_protein: fdb.Parameters['-p'].on('T') else: fdb.Parameters['-p'].on('F') app_result = fdb(fasta_path) db_filepaths = [] for v in app_result.values(): try: db_filepaths.append(v.name) except AttributeError: # not a file object, so no path to return pass return db_name, db_filepaths def build_blast_db_from_fasta_file(fasta_file,is_protein=False,\ output_dir=None,HALT_EXEC=False): """Build blast db from fasta_path; return db name and list of files created **If using to create temporary blast databases, you can call cogent.util.misc.remove_files(db_filepaths) to clean up all the files created by formatdb when you're done with the database. fasta_path: path to fasta file of sequences to build database from is_protein: True if working on protein seqs (default: False) output_dir: directory where output should be written (default: directory containing fasta_path) HALT_EXEC: halt just before running the formatdb command and print the command -- useful for debugging """ output_dir = output_dir or '.' 
fasta_path = get_tmp_filename(\ tmp_dir=output_dir, prefix="BLAST_temp_db_", suffix=".fasta") fasta_f = open(fasta_path,'w') for line in fasta_file: fasta_f.write('%s\n' % line.strip()) fasta_f.close() blast_db, db_filepaths = build_blast_db_from_fasta_path(\ fasta_path, is_protein=is_protein, output_dir=None, HALT_EXEC=HALT_EXEC) db_filepaths.append(fasta_path) return blast_db, db_filepaths def build_blast_db_from_seqs(seqs,is_protein=False,\ output_dir='./',HALT_EXEC=False): """Build blast db from seqs; return db name and list of files created **If using to create temporary blast databases, you can call cogent.util.misc.remove_files(db_filepaths) to clean up all the files created by formatdb when you're done with the database. seqs: sequence collection or alignment object is_protein: True if working on protein seqs (default: False) output_dir: directory where output should be written (default: current directory) HALT_EXEC: halt just before running the formatdb command and print the command -- useful for debugging """ # Build a temp filepath tmp_fasta_filepath = get_tmp_filename(\ prefix='Blast_tmp_db',suffix='.fasta') # open the temp file tmp_fasta_file = open(tmp_fasta_filepath,'w') # write the sequence collection to file tmp_fasta_file.write(seqs.toFasta()) tmp_fasta_file.close() # build the bast database db_name, db_filepaths = build_blast_db_from_fasta_path(\ tmp_fasta_filepath,is_protein=is_protein,\ output_dir=output_dir,HALT_EXEC=HALT_EXEC) # clean-up the temporary file remove(tmp_fasta_filepath) # return the results return db_name, db_filepaths def parse_command_line_parameters(): """ Parses command line arguments """ usage = 'usage: %prog [options] fasta_filepath' version = 'Version: %prog 0.1' parser = OptionParser(usage=usage, version=version) # A binary 'verbose' flag parser.add_option('-p','--is_protein',action='store_true',\ dest='is_protein',default=False,\ help='Pass if building db of protein sequences '+\ '[default: False, nucleotide db]') 
parser.add_option('-o','--output_dir',action='store',\ type='string',dest='output_dir',default=None, help='the output directory '+\ '[default: directory containing input fasta_filepath]') opts,args = parser.parse_args() num_args = 1 if len(args) != num_args: parser.error('Must provide single filepath to build database from.') return opts,args if __name__ == "__main__": opts,args = parse_command_line_parameters() fasta_filepath = args[0] is_protein = opts.is_protein output_dir = opts.output_dir db_name, db_filepaths = build_blast_db_from_fasta_path(\ fasta_filepath,is_protein=is_protein,output_dir=output_dir) PyCogent-1.5.3/cogent/app/gctmpca.py000755 000765 000024 00000027240 12024702176 020300 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python # Author: Greg Caporaso (gregcaporaso@gmail.com) # gctmpca.py """Application controller for the Generalized Continuous-Time Markov Process Coevolutionary Algorithm (GCTMPCA). GCTMPCA is presented in: Detecting coevolution in and among protein domains. Yeang CH, Haussler D., PLoS Comput Biol. 2007 Nov;3(11):e211. Detecting the coevolution of biosequences--an example of RNA interaction prediction. Yeang CH, Darot JF, Noller HF, Haussler D. Mol Biol Evol. 2007 Sep;24(9):2119-31. This code requires the GCTMPCA package to be installed. As of Nov. 2008, that software is available at: http://www.sns.ias.edu/~chyeang/coevolution_download.zip Note that the authors did not name their algorithm or software when they published it. GCTMPCA was suggested as a name by the first author via e-mail. 
""" from __future__ import division from cogent.app.util import CommandLineApplication, ResultPath,\ ApplicationError from cogent.app.parameters import FilePath from cogent.evolve.models import DSO78_freqs, DSO78_matrix __author__ = "Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Greg Caporaso" __email__ = "gregcaporaso@gmail.com" __status__ = "Beta" # Are these values in PyCogent somewhere? gctmpca_base_order = 'ACGU' default_gctmpca_rna_priors = {'A':0.2528,'C':0.2372,'G':0.3099,'U':0.2001} default_gctmpca_rna_sub_matrix = """-1.4150\t0.2372\t0.9777\t0.2001 0.2528\t-1.1940\t0.3099\t0.6313 0.7976\t0.2372\t-1.2349\t0.2001 0.2528\t0.7484\t0.3099\t-1.3111""" gctmpca_aa_order = 'ARNDCQEGHILKMFPSTWYV' # By default, the Gctmpca method used the Dayhoff 78 frequencies and rate matrix default_gctmpca_aa_priors = DSO78_freqs default_gctmpca_aa_sub_matrix = """-133.941451\t1.104408\t3.962336\t5.624640\t1.205064\t3.404695\t9.806940\t21.266880\t0.773214\t2.397590\t3.499637\t2.092532\t1.062216\t0.715896\t12.670000\t28.456993\t21.719082\t0.000000\t0.717984\t13.461344 2.352429\t-86.970372\t1.293824\t0.000000\t0.769902\t9.410730\t0.049530\t0.797508\t8.068320\t2.360704\t1.280355\t37.343648\t1.327770\t0.556808\t5.220040\t10.714858\t1.522092\t2.109294\t0.239328\t1.553232 8.538446\t1.308928\t-179.776579\t42.419160\t0.000000\t3.940265\t7.330440\t12.317068\t17.985630\t2.840222\t2.902138\t25.593276\t0.014753\t0.556808\t2.128560\t34.440615\t13.406118\t0.241362\t2.842020\t0.970770 10.455240\t0.000000\t36.590960\t-142.144945\t0.000000\t5.126170\t57.108090\t11.076500\t2.891148\t0.885264\t0.000000\t5.714222\t0.000000\t0.000000\t0.658840\t6.609815\t3.863772\t0.000000\t0.000000\t1.164924 
3.136572\t0.940792\t0.000000\t0.000000\t-26.760991\t0.000000\t0.000000\t0.974732\t0.941304\t1.622984\t0.000000\t0.000000\t0.000000\t0.000000\t0.962920\t11.201897\t0.936672\t0.000000\t2.871936\t3.171182 7.754303\t10.062384\t4.164496\t6.280848\t0.000000\t-124.487960\t35.463480\t2.481136\t20.372508\t0.663948\t6.231061\t12.313746\t1.681842\t0.000000\t7.754040\t3.896312\t3.102726\t0.000000\t0.000000\t2.265130 17.251146\t0.040904\t5.983936\t54.043416\t0.000000\t27.390580\t-136.769106\t7.177572\t1.445574\t2.250046\t0.938927\t6.680006\t0.442590\t0.000000\t2.584680\t5.496583\t1.990428\t0.000000\t0.658152\t2.394566 20.910480\t0.368136\t5.620048\t5.859000\t0.368214\t1.071140\t4.011930\t-65.418192\t0.336180\t0.000000\t0.597499\t2.173014\t0.250801\t0.596580\t1.723120\t16.281018\t1.756260\t0.000000\t0.000000\t3.494772 2.003921\t9.816960\t21.631120\t4.030992\t0.937272\t23.182530\t2.129790\t0.886120\t-88.051504\t0.258202\t3.755708\t2.092532\t0.000000\t1.909056\t4.763920\t2.435195\t1.287924\t0.283338\t3.799332\t2.847592 5.663255\t2.617856\t3.113264\t1.124928\t1.472856\t0.688590\t3.021330\t0.000000\t0.235326\t-128.487912\t21.936749\t3.702172\t4.957008\t7.795312\t0.608160\t1.669848\t11.240064\t0.000000\t1.106892\t57.534302 3.572207\t0.613560\t1.374688\t0.000000\t0.000000\t2.792615\t0.544830\t0.620284\t1.479192\t9.479702\t-53.327266\t1.448676\t7.774831\t6.244204\t1.621760\t1.182809\t1.931886\t0.482724\t0.837648\t11.325650 2.265302\t18.979456\t12.857376\t3.327912\t0.000000\t5.853015\t4.110990\t2.392524\t0.874068\t1.696756\t1.536426\t-74.828436\t3.584979\t0.000000\t1.672440\t6.679392\t7.961712\t0.000000\t0.388908\t0.647180 6.273144\t3.681360\t0.040432\t0.000000\t0.000000\t4.361070\t1.485900\t1.506404\t0.000000\t12.393696\t44.983139\t19.557126\t-125.902241\t3.659024\t0.861560\t4.313774\t6.088368\t0.000000\t0.000000\t16.697244 
1.568286\t0.572656\t0.566048\t0.000000\t0.000000\t0.000000\t0.000000\t1.329180\t1.613664\t7.229656\t13.401049\t0.000000\t1.357276\t-54.612411\t0.557480\t3.200542\t0.761046\t0.797544\t20.881368\t0.776616 21.781750\t4.213112\t1.698144\t0.609336\t0.636006\t5.853015\t2.526030\t3.012808\t3.160092\t0.442632\t2.731424\t2.655906\t0.250801\t0.437492\t-74.727653\t17.046365\t4.566276\t0.000000\t0.000000\t3.106464 35.634943\t6.299216\t20.013840\t4.452840\t5.389314\t2.142280\t3.912870\t20.735208\t1.176630\t0.885264\t1.451069\t7.726272\t0.914686\t1.829512\t12.416600\t-160.924378\t32.198100\t0.787050\t1.017144\t1.941540 32.324117\t1.063504\t9.258928\t3.093552\t0.535584\t2.027515\t1.684020\t2.658360\t0.739596\t7.082112\t2.816781\t10.945552\t1.534312\t0.517036\t3.953040\t38.267350\t-129.918557\t0.000000\t1.256472\t10.160726 0.000000\t8.221704\t0.929936\t0.000000\t0.000000\t0.000000\t0.000000\t0.000000\t0.907686\t0.000000\t3.926422\t0.000000\t0.000000\t3.022672\t0.000000\t5.218275\t0.000000\t-24.051571\t1.824876\t0.000000 2.091048\t0.327232\t3.841040\t0.000000\t3.213504\t0.000000\t1.089660\t0.000000\t4.269486\t1.364782\t2.389996\t1.046266\t0.000000\t27.760856\t0.000000\t2.365618\t2.458764\t0.640134\t-54.670490\t1.812104 18.122416\t0.981696\t0.606480\t0.843696\t1.640226\t1.338925\t1.832610\t4.785048\t1.479192\t32.791654\t14.937475\t0.804820\t3.806274\t0.477264\t2.432640\t2.087310\t9.191094\t0.000000\t0.837648\t-98.996468""" class Gctmpca(CommandLineApplication): """ App controller for the GCTMPCA algorithm for detecting sequence coevolution The Generalized Continuous-Time Markov Process Coevolutionary Algorithm (GCTMPCA) is presented in: Detecting coevolution in and among protein domains. Yeang CH, Haussler D., PLoS Comput Biol. 2007 Nov;3(11):e211. Detecting the coevolution of biosequences--an example of RNA interaction prediction. Yeang CH, Darot JF, Noller HF, Haussler D. Mol Biol Evol. 2007 Sep;24(9):2119-31. This code requires the GCTMPCA package to be installed. 
    As of 11/08, that software is available at:
    http://www.sns.ias.edu/~chyeang/coevolution_download.zip
    """
    _command = 'calculate_likelihood'
    _input_handler = '_gctmpca_cl_input'
    _data = {'mol_type':None,'comparison_type':0,'seqs1':None,\
             'seqs2':'-','tree1':None,'tree2':'-',\
             'seq_names':None,'species_tree':None,\
             'seq_to_species1':None,'seq_to_species2':'-',\
             'char_priors':None,'sub_matrix':None,'epsilon':0.7,\
             'max_gap_threshold':1.0,'max_seq_distance':1.0,\
             'covariation_threshold':0.0,'likelihood_threshold':0.0,\
             'output_path':None,'single_pair_only':0,'family_reps':'-',\
             'pos1':'','pos2':''}
    _parameter_order = ['mol_type','comparison_type','seqs1','seqs2',\
                        'tree1','tree2','seq_names','species_tree',\
                        'seq_to_species1','seq_to_species2','char_priors',\
                        'sub_matrix','epsilon','max_gap_threshold',\
                        'max_seq_distance','covariation_threshold',\
                        'likelihood_threshold','output_path',\
                        'single_pair_only','family_reps','pos1','pos2']
    _potential_paths = ['seqs1','tree1','seq_names',\
                        'species_tree','seq_to_species1']
    _mol_type_lookup = {'rna':0,'0':0,'protein':1,'1':1}
    _default_priors = {0:default_gctmpca_rna_priors,
                       1:default_gctmpca_aa_priors}
    _default_sub_matrix = {0:default_gctmpca_rna_sub_matrix,
                           1:default_gctmpca_aa_sub_matrix}
    _char_order = {0:gctmpca_base_order,1:gctmpca_aa_order}
    _required_parameters = {}.fromkeys(['mol_type','seqs1','tree1',\
                                        'seq_names','species_tree',\
                                        'seq_to_species1'])

    def _set_command_line_parameters(self,data):
        """ Get the right setting for each command line parameter """
        # This function could be cleaned up.
        # For each command line parameter, set it to the value passed in or
        # the default value.
        for p in self._parameter_order:
            if p not in data:
                if p in self._required_parameters:
                    raise ApplicationError,\
                     "Required parameter %s missing." % p
                else:
                    data[p] = self._data[p]
            # Write necessary files to disk -- need to modify this so paths
            # to existing files can be passed in.
            if p in self._potential_paths:
                try:
                    data[p] = self._input_as_lines(data[p])
                except TypeError:
                    pass
        if data['single_pair_only'] == 1 and \
           not (data['pos1'] and data['pos2']):
            raise ApplicationError,\
             "Must specify pos1 and pos2 if single_pair_only == 1."
        # Make sure the MolType is in the correct format (i.e., 1 or 0)
        data['mol_type'] = mol_type = \
            self._mol_type_lookup[str(data['mol_type']).lower()]
        char_order = self._char_order[mol_type]
        # If we didn't get several values as parameters, set the defaults.
        # These are done outside of the above loop b/c they require special
        # handling.
        if not data['char_priors']:
            data['char_priors'] = self._default_priors[mol_type]
        data['char_priors'] = \
            self._input_as_lines(\
                self._input_as_gctmpca_char_priors(\
                    data['char_priors'],char_order))
        if not data['sub_matrix']:
            data['sub_matrix'] = \
                self._input_as_multiline_string(\
                    self._default_sub_matrix[mol_type])
        else:
            data['sub_matrix'] = \
                self._input_as_lines(\
                    self._input_as_gctmpca_rate_matrix(\
                        data['sub_matrix'],char_order))
        if not data['output_path']:
            data['output_path'] = \
                self._input_as_path(self.getTmpFilename())
        return data

    def _gctmpca_cl_input(self,data):
        """ Write the list of 22 command line parameters to a string """
        # Get the right setting for each parameter
        data = self._set_command_line_parameters(data)
        # Explicitly disallow intermolecular experiments (I do this here to
        # make sure I'm looking at the final version of data)
        if data['comparison_type'] == 1:
            raise NotImplementedError,\
             "Intermolecular experiments currently supported only via coevolve_alignments."
        # Create the command line parameter string and return it
        return ' '.join([str(data[p]) for p in self._parameter_order]).strip()

    def _input_as_gctmpca_char_priors(self,priors,char_order):
        """convert dict of priors to string and write it to tmp file """
        # priors to be followed by a newline
        return ['\t'.join([str(priors[c]) for c in char_order]),'']

    def _input_as_gctmpca_rate_matrix(self,matrix,char_order):
        """convert 2D dict rate matrix to string and write it to tmp file """
        matrix_rows = []
        for c in char_order:
            matrix_rows.append('\t'.join([str(matrix[c][col_c]) \
                for col_c in char_order]))
        return matrix_rows

    def _get_result_paths(self,data):
        """A single file is written, w/ name specified in command line input """
        return {'output':ResultPath(Path=data['output_path'],IsWritten=True)}

PyCogent-1.5.3/cogent/app/guppy.py000644 000765 000024 00000016302 12024702176 020020 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
"""Application controller for guppy 1.1"""

__author__ = "Jesse Stombaugh"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jesse Stombaugh"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jesse Stombaugh"
__email__ = "jesse.stombaugh@colorado.edu"
__status__ = "Production"

from cogent.app.parameters import ValuedParameter, FlagParameter
from cogent.app.util import CommandLineApplication, FilePath, system, \
    CommandLineAppResult, ResultPath, remove, ApplicationError
from cogent.core.alignment import Alignment
from os.path import splitext,split,join
from os import listdir
from cogent.parse.tree import DndParser
from cogent.core.tree import PhyloNode

class Guppy(CommandLineApplication):
    """guppy Application Controller
    """
    _command = 'guppy'
    _input_handler = '_input_as_multiline_string'
    _parameters = {
        # visualizations
        # makes trees with edges fattened in proportion to the number of reads
        'fat': FlagParameter('', Name='fat'),
        # maps an arbitrary vector of the correct length to the tree
        'heat': FlagParameter('',
                              Name='heat'),
        # writes a taxonomically annotated reference tree and an induced
        # taxonomic tree
        'ref_tree': FlagParameter('', Name='ref_tree'),
        # makes one tree for each query sequence, showing uncertainty
        'sing': FlagParameter('', Name='sing'),
        # makes a tree with each of the reads represented as a pendant edge
        'tog': FlagParameter('', Name='tog'),
        # statistical comparison
        # draws the barycenter of a placement collection on the reference tree
        'bary': FlagParameter('', Name='bary'),
        # makes a phyloXML tree showing the bootstrap values
        'bootviz': FlagParameter('', Name='bootviz'),
        # calculates the EDPL uncertainty values for a collection of pqueries
        'edpl': FlagParameter('', Name='edpl'),
        # calculates the Kantorovich-Rubinstein distance and corresponding
        # p-values
        'kr': FlagParameter('', Name='kr'),
        # makes a heat tree
        'kr_heat': FlagParameter('', Name='kr_heat'),
        # performs edge principal components
        'pca': FlagParameter('', Name='pca'),
        # writes out differences of masses for the splits of the tree
        'splitify': FlagParameter('', Name='splitify'),
        # performs squash clustering
        'squash': FlagParameter('', Name='squash'),
        # classification
        # outputs classification information in a tabular or SQLite format
        'classify': FlagParameter('', Name='classify'),
        # utilities
        # check a reference package
        'check_refpkg': FlagParameter('', Name='check_refpkg'),
        # splits apart placements with multiplicity, undoing a round procedure
        'demulti': FlagParameter('', Name='demulti'),
        # prints out a pairwise distance matrix between the edges
        'distmat': FlagParameter('', Name='distmat'),
        # filters one or more placefiles by placement name
        'filter': FlagParameter('', Name='filter'),
        # writes the number of leaves of the reference tree and the number of
        # pqueries
        'info': FlagParameter('', Name='info'),
        # merges placefiles together
        'merge': FlagParameter('', Name='merge'),
        # restores duplicates to deduped placefiles
        'redup': FlagParameter('', Name='redup'),
        # clusters the placements by rounding branch lengths
        'round': FlagParameter('', Name='round'),
        # makes SQL enabling taxonomic querying of placement results
        'taxtable': FlagParameter('', Name='taxtable'),
        # converts old-style .place files to .json placement files
        'to_json': FlagParameter('', Name='to_json'),
        # Run the provided batch file of guppy commands
        'batch': FlagParameter('--', Name='batch'),
        # Print version and exit
        'version': FlagParameter('--', Name='version'),
        # Print a list of the available commands.
        'cmds': FlagParameter('--', Name='cmds'),
        # Display this list of options
        '--help': FlagParameter('--', Name='help'),
        # Display this list of options
        '-help': FlagParameter('-', Name='help'),
        }

    def getTmpFilename(self, tmp_dir='/tmp/',prefix='tmp',suffix='.json',\
        include_class_id=False,result_constructor=FilePath):
        """ Define Tmp filename to contain .json suffix, since guppy requires
            the suffix to be .json
        """
        return super(Guppy,self).getTmpFilename(tmp_dir=tmp_dir,
                                                prefix=prefix,
                                                suffix=suffix,
                                                include_class_id=include_class_id,
                                                result_constructor=result_constructor)

    def _handle_app_result_build_failure(self,out,err,exit_status,result_paths):
        """ Catch the error when files are not produced """
        raise ApplicationError, \
         'Guppy failed to produce an output file due to the following error: \n\n%s ' \
         % err.read()

    def _get_result_paths(self,data):
        basepath,basename=split(splitext(self._input_filename)[0])
        outfile_list=listdir(split(self._input_filename)[0])
        result = {}
        for i in outfile_list:
            if i.startswith(basename) and not i.endswith('.json') and \
               not i.endswith('.txt'):
                result['result'] = ResultPath(Path=join(basepath,i))
        return result

def build_tree_from_json_using_params(fname,output_dir='/tmp/',params={}):
    """Returns a tree from a json.

    fname: filepath to input json
    output_dir: location of output files
    params: dict of parameters to pass in to the Guppy app controller.

    The result will be a Tree.
""" # convert aln to fasta in case it is not already a fasta file ih = '_input_as_multiline_string' guppy_app = Guppy(params=params, InputHandler=ih, WorkingDir=output_dir, TmpDir=output_dir, SuppressStderr=True, SuppressStdout=True, HALT_EXEC=False) guppy_result = guppy_app(open(fname).read()) try: new_tree=guppy_result['result'].read() except: # catch the error of not producing any results and print the command # run so user can check error guppy_cmd=Guppy(params=params, InputHandler=ih, WorkingDir=output_dir, TmpDir=output_dir, SuppressStderr=True, SuppressStdout=True, HALT_EXEC=True) out_msg=guppy_cmd(open(fname).read()) tree = DndParser(new_tree, constructor=PhyloNode) guppy_result.cleanUp() return tree PyCogent-1.5.3/cogent/app/ilm.py000644 000765 000024 00000006757 12024702176 017452 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath from cogent.app.parameters import Parameter,ValuedParameter,Parameters __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class ILM(CommandLineApplication): """Application controller ILM application Predict a secondary structure given a score matrix Main options: -L l: minimum loop length (default=3) -V v: minimum virtual loop length (default=3) -H h: minimum helix length (default=3) -N n: number of helices selected per iteration (default=1) -I i: number of iterations before termination(default=unlimited) """ _parameters = { '-L':ValuedParameter(Prefix='-',Name='L',Delimiter=' '), '-V':ValuedParameter(Prefix='-',Name='V',Delimiter=' '), '-H':ValuedParameter(Prefix='-',Name='H',Delimiter=' '), '-N':ValuedParameter(Prefix='-',Name='N',Delimiter=' '), '-I':ValuedParameter(Prefix='-',Name='I',Delimiter=' ')} _command = 'ilm' 
    _input_handler = '_input_as_string'

class hlxplot(CommandLineApplication):
    """Application controller for the hlxplot application

    Compute a helix plot score matrix from a sequence alignment

    Options:
    -b B: Set bad pair penalty to B (Default = 2)
    -g G: Set good pair score to G (Default = 1)
    -h H: Set minimum helix length to H (Default = 2)
    -l L: Set minimum loop length to L (Default = 3)
    -s S: Set helix length score to S (Default = 2.0)
    -t  : Write output in text format (Default = Binary format)
    -x X: Set paired gap penalty to X (Default = 3)
    """
    _parameters = {
        '-b':ValuedParameter(Prefix='-',Name='b',Delimiter=' '),
        '-g':ValuedParameter(Prefix='-',Name='g',Delimiter=' '),
        '-h':ValuedParameter(Prefix='-',Name='h',Delimiter=' '),
        '-l':ValuedParameter(Prefix='-',Name='l',Delimiter=' '),
        '-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '),
        '-t':ValuedParameter(Prefix='-',Name='t',Delimiter=' '),
        '-x':ValuedParameter(Prefix='-',Name='x',Delimiter=' ')}
    _command = 'hlxplot'
    _input_handler = '_input_as_string'

class xhlxplot(CommandLineApplication):
    """Application controller for the xhlxplot application

    Compute an extended helix plot score matrix from a single sequence

    Options:
    -b B: Set bad pair penalty to B (Default = 200)
    -h H: Set minimum helix length to H (Default = 2)
    -l L: Set minimum loop length to L (Default = 3)
    -x X: Set paired gap penalty to X (Default = 500)
    -t  : Write output in text format (Default = Binary format)
    -c  : No closing GU (Default = allows closing GU)
    """
    _parameters = {
        '-b':ValuedParameter(Prefix='-',Name='b',Delimiter=' '),
        '-h':ValuedParameter(Prefix='-',Name='h',Delimiter=' '),
        '-l':ValuedParameter(Prefix='-',Name='l',Delimiter=' '),
        '-x':ValuedParameter(Prefix='-',Name='x',Delimiter=' '),
        '-t':ValuedParameter(Prefix='-',Name='t',Delimiter=' '),
        '-c':ValuedParameter(Prefix='-',Name='c',Delimiter=' ')}
    _command = 'xhlxplot'
    _input_handler = '_input_as_string'

PyCogent-1.5.3/cogent/app/infernal.py000644 000765 000024 00000212125 12024702176 020453 0ustar00jrideoutstaff000000 000000 
#!/usr/bin/env python
"""
Provides an application controller for the commandline version of:
Infernal 1.0 and 1.0.2 only.
"""
from cogent.app.parameters import FlagParameter, ValuedParameter, FilePath
from cogent.app.util import CommandLineApplication, ResultPath, get_tmp_filename
from cogent.parse.fasta import MinimalFastaParser
from cogent.parse.rfam import MinimalRfamParser, ChangedSequence, \
    ChangedRnaSequence, ChangedDnaSequence
from cogent.parse.infernal import CmsearchParser
from cogent.core.moltype import DNA, RNA
from cogent.core.alignment import SequenceCollection, Alignment, DataError
from cogent.format.stockholm import stockholm_from_alignment
from cogent.struct.rna2d import ViennaStructure, wuss_to_vienna
from os import remove

__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Development"

MOLTYPE_MAP = {'DNA':'--dna',\
               DNA:'--dna',\
               'RNA':'--rna',\
               RNA:'--rna',\
               }

SEQ_CONSTRUCTOR_MAP = {'DNA':ChangedDnaSequence,\
                       DNA:ChangedDnaSequence,\
                       'RNA':ChangedRnaSequence,\
                       RNA:ChangedRnaSequence,\
                       }

class Cmalign(CommandLineApplication):
    """cmalign application controller."""
    _options = {
        # -o <f> Save the alignment in Stockholm format to a file <f>. The
        # default is to write it to standard output.
        '-o':ValuedParameter(Prefix='-',Name='o',Delimiter=' '),\
        # -l Turn on the local alignment algorithm. Default is global.
        '-l':FlagParameter(Prefix='-',Name='l'),\
        # -p Annotate the alignment with posterior probabilities calculated
        # using the Inside and Outside algorithms.
        '-p':FlagParameter(Prefix='-',Name='p'),\
        # -q Quiet; suppress the verbose banner, and only print the resulting
        # alignment to stdout.
        '-q':FlagParameter(Prefix='-',Name='q'),\
        # --informat <s> Assert that the input seqfile is in format <s>.
Do not run # Babelfish format autodection. Acceptable formats are: FASTA, EMBL, # UNIPROT, GENBANK, and DDBJ. is case-insensitive. '--informat':ValuedParameter(Prefix='--',Name='informat',Delimiter=' '),\ # --mpi Run as an MPI parallel program. (see User's Guide for details). '--mpi':FlagParameter(Prefix='--',Name='mpi'),\ # Expert Options # --optacc Align sequences using the Durbin/Holmes optimal accuracy # algorithm. This is default behavior, so this option is probably useless. '--optacc':FlagParameter(Prefix='--',Name='optacc'),\ # --cyk Do not use the Durbin/Holmes optimal accuracy alignment to align the # sequences, instead use the CYK algorithm which determines the optimally # scoring alignment of the sequence to the model. '--cyk':FlagParameter(Prefix='--',Name='cyk'),\ # --sample Sample an alignment from the posterior distribution of # alignments. '--sample':FlagParameter(Prefix='--',Name='sample'),\ # -s Set the random number generator seed to , where is a # positive integer. This option can only be used in combination with # --sample. The default is to use time() to generate a different seed for # each run, which means that two different runs of cmalign --sample on the # same alignment will give slightly different results. You can use this # option to generate reproducible results. '-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '),\ # --viterbi Do not use the CM to align the sequences, instead use the HMM # Viterbi algorithm to align with a CM Plan 9 HMM. '--viterbi':FlagParameter(Prefix='--',Name='viterbi'),\ # --sub Turn on the sub model construction and alignment procedure. '--sub':FlagParameter(Prefix='--',Name='sub'),\ # --small Use the divide and conquer CYK alignment algorithm described in # SR Eddy, BMC Bioinformatics 3:18, 2002. '--small':FlagParameter(Prefix='--',Name='small'),\ # --hbanded This option is turned on by default. Accelerate alignment by # pruning away regions of the CM DP matrix that are deemed negligible by # an HMM. 
'--hbanded':FlagParameter(Prefix='--',Name='hbanded'),\ # --nonbanded Turns off HMM banding. '--nonbanded':FlagParameter(Prefix='--',Name='nonbanded'),\ # --tau Set the tail loss probability used during HMM band calculation # to . '--tau':ValuedParameter(Prefix='--',Name='tau',Delimiter=' '),\ # --mxsize Set the maximum allowable DP matrix size to megabytes. '--mxsize':ValuedParameter(Prefix='--',Name='mxsize',Delimiter=' '),\ # --rna Output the alignments as RNA sequence alignments. This is true by # default. '--rna':FlagParameter(Prefix='--',Name='rna'),\ # --dna Output the alignments as DNA sequence alignments. '--dna':FlagParameter(Prefix='--',Name='dna'),\ # --matchonly Only include match columns in the output alignment, do not # include any insertions relative to the consensus model. '--matchonly':FlagParameter(Prefix='--',Name='matchonly'),\ # --resonly Only include match columns in the output alignment that have at # least 1 residue (non-gap character) in them. '--resonly':FlagParameter(Prefix='--',Name='resonly'),\ # --fins Change the behavior of how insert emissions are placed in the # alignment. '--fins':FlagParameter(Prefix='--',Name='fins'),\ # --onepost Modifies behavior of the -p option. Use only one character # instead of two to annotate the posterior probability of each aligned # residue. '--onepost':FlagParameter(Prefix='--',Name='onepost'),\ # --withali Reads an alignment from file and aligns it as a single # object to the CM; e.g. the alignment in is held fixed. '--withali':ValuedParameter(Prefix='--',Name='withali',Delimiter=' '),\ # --withpknots Must be used in combination with --withali . Propogate # structural information for any pseudoknots that exist in to the # output alignment. '--withpknots':FlagParameter(Prefix='--',Name='withpknots'),\ # --rf Must be used in combination with --withali . 
Specify that the # alignment in has the same "#=GC RF" annotation as the alignment file # the CM was built from using cmbuild and further that the --rf option was # supplied to cmbuild when the CM was constructed. '--rf':FlagParameter(Prefix='--',Name='rf'),\ # --gapthresh Must be used in combination with --withali . Specify # that the --gapthresh option was supplied to cmbuild when the CM was # constructed from the alignment file . '--gapthresh':ValuedParameter(Prefix='--',Name='gapthresh',Delimiter=' '),\ # --tfile Dump tabular sequence tracebacks for each individual sequence # to a file . Primarily useful for debugging. '--tfile':ValuedParameter(Prefix='--',Name='tfile',Delimiter=' '),\ } _parameters = {} _parameters.update(_options) _command = "cmalign" _suppress_stderr=True def getHelp(self): """Method that points to the Infernal documentation.""" help_str = \ """ See Infernal documentation at: http://infernal.janelia.org/ """ return help_str def _tempfile_as_multiline_string(self, data): """Write a multiline string to a temp file and return the filename. data: a multiline string to be written to a file. * Note: the result will be the filename as a FilePath object (which is a string subclass). """ filename = FilePath(self.getTmpFilename(self.TmpDir)) data_file = open(filename,'w') data_file.write(data) data_file.close() return filename def _alignment_out_filename(self): if self.Parameters['-o'].isOn(): refined_filename = self._absolute(str(\ self.Parameters['-o'].Value)) else: raise ValueError, 'No alignment output file specified.' return refined_filename def _get_result_paths(self,data): result = {} if self.Parameters['-o'].isOn(): out_name = self._alignment_out_filename() result['Alignment'] = ResultPath(Path=out_name,IsWritten=True) return result class Cmbuild(CommandLineApplication): """cmbuild application controller.""" _options = { # -n Name the covariance model . (Does not work if alifile contains # more than one alignment). 
'-n':ValuedParameter(Prefix='-',Name='n',Delimiter=' '),\ # -A Append the CM to cmfile, if cmfile already exists. '-A':FlagParameter(Prefix='-',Name='A'),\ # -F Allow cmfile to be overwritten. Normally, if cmfile already exists, # cmbuild exits with an error unless the -A or -F option is set. '-F':FlagParameter(Prefix='-',Name='F'),\ # -v Run in verbose output mode instead of using the default single line # tabular format. This output format is similar to that used by older # versions of Infernal. '-v':FlagParameter(Prefix='-',Name='v'),\ # --iins Allow informative insert emissions for the CM. By default, all CM # insert emission scores are set to 0.0 bits. '--iins':FlagParameter(Prefix='--',Name='iins'),\ # --Wbeta Set the beta tail loss probability for query-dependent banding # (QDB) to The QDB algorithm is used to determine the maximium length # of a hit to the model. For more information on QDB see (Nawrocki and # Eddy, PLoS Computational Biology 3(3): e56). '--Wbeta':ValuedParameter(Prefix='--',Name='Wbeta',Delimiter=' '),\ # Expert Options # --rsearch Parameterize emission scores a la RSEARCH, using the # RIBOSUM matrix in file . For more information see the RSEARCH # publication (Klein and Eddy, BMC Bioinformatics 4:44, 2003). Actually, # the emission scores will not exactly With --rsearch enabled, all # alignments in alifile must contain exactly one sequence or the --call # option must also be enabled. '--rsearch':ValuedParameter(Prefix='--',Name='rsearch',Delimiter=' '),\ # --binary Save the model in a compact binary format. The default is a more # readable ASCII text format. '--binary':FlagParameter(Prefix='--',Name='binary'),\ # --rf Use reference coordinate annotation (#=GC RF line, in Stockholm) to # determine which columns are consensus, and which are inserts. '--rf':FlagParameter(Prefix='--',Name='rf'),\ # --gapthresh Set the gap threshold (used for determining which columns # are insertions versus consensus; see --rf above) to . 
The default is # 0.5. '--gapthresh':ValuedParameter(Prefix='--',Name='gapthresh',Delimiter=' '),\ # --ignorant Strip all base pair secondary structure information from all # input alignments in alifile before building the CM(s). '--ignorant':FlagParameter(Prefix='--',Name='ignorant'),\ # --wgsc Use the Gerstein/Sonnhammer/Chothia (GSC) weighting algorithm. # This is the default unless the number of sequences in the alignment # exceeds a cutoff (see --pbswitch), in which case the default becomes # the faster Henikoff position-based weighting scheme. '--wgsc':FlagParameter(Prefix='--',Name='wgsc'),\ # --wblosum Use the BLOSUM filtering algorithm to weight the sequences, # instead of the default GSC weighting. '--wblosum':FlagParameter(Prefix='--',Name='wblosum'),\ # --wpb Use the Henikoff position-based weighting scheme. This weighting # scheme is automatically used (overriding --wgsc and --wblosum) if the # number of sequences in the alignment exceeds a cutoff (see --pbswitch). '--wpb':FlagParameter(Prefix='--',Name='wpb'),\ # --wnone Turn sequence weighting off; e.g. explicitly set all sequence # weights to 1.0. '--wnone':FlagParameter(Prefix='--',Name='wnone'),\ # --wgiven Use sequence weights as given in annotation in the input # alignment file. If no weights were given, assume they are all 1.0. # The default is to determine new sequence weights by the Gerstein/ # Sonnhammer/Chothia algorithm, ignoring any annotated weights. '--wgiven':FlagParameter(Prefix='--',Name='wgiven'),\ # --pbswitch Set the cutoff for automatically switching the weighting # method to the Henikoff position-based weighting scheme to . If the # number of sequences in the alignment exceeds Henikoff weighting is # used. By default is 5000. '--pbswitch':ValuedParameter(Prefix='--',Name='pbswitch',Delimiter=' '),\ # --wid Controls the behavior of the --wblosum weighting option by # setting the percent identity for clustering the alignment to . 
        '--wid':ValuedParameter(Prefix='--',Name='wid',Delimiter=' '),\
        # --eent Use the entropy weighting strategy to determine the effective
        # sequence number that gives a target mean match state relative entropy.
        '--eent':FlagParameter(Prefix='--',Name='eent'),\
        # --enone Turn off the entropy weighting strategy. The effective
        # sequence number is just the number of sequences in the alignment.
        '--enone':FlagParameter(Prefix='--',Name='enone'),\
        # --ere Set the target mean match state entropy as <x>. By default the
        # target entropy is 1.46 bits.
        '--ere':ValuedParameter(Prefix='--',Name='ere',Delimiter=' '),\
        # --null Read a null model from <f>. The null model defines the
        # probability of each RNA nucleotide in background sequence, the
        # default is to use 0.25 for each nucleotide.
        '--null':ValuedParameter(Prefix='--',Name='null',Delimiter=' '),\
        # --prior Read a Dirichlet prior from <f>, replacing the default
        # mixture Dirichlet.
        '--prior':ValuedParameter(Prefix='--',Name='prior',Delimiter=' '),\
        # --ctarget <n> Cluster each alignment in alifile by percent identity:
        # find a cutoff percent id threshold that gives exactly <n> clusters
        # and build a separate CM from each cluster. If <n> is greater than
        # the number of sequences in the alignment the program will not
        # complain, and each sequence in the alignment will be its own
        # cluster. Each CM will have a positive integer appended to its name
        # indicating the order in which it was built.
        '--ctarget':ValuedParameter(Prefix='--',Name='ctarget',Delimiter=' '),\
        # --cmaxid <x> Cluster each sequence alignment in alifile by percent
        # identity. Define clusters at the cutoff fractional id similarity of
        # <x> and build a separate CM from each cluster.
        '--cmaxid':ValuedParameter(Prefix='--',Name='cmaxid',Delimiter=' '),\
        # --call Build a separate CM from each sequence in each alignment in
        # alifile. Naming of CMs takes place as described above for --ctarget.
'--call':FlagParameter(Prefix='--',Name='call'),\ # --corig After building multiple CMs using --ctarget, --cmindiff or --call # as described above, build a final CM using the complete original # alignment from alifile. '--corig':FlagParameter(Prefix='--',Name='corig'),\ # --cdump Dump the multiple alignments of each cluster to in # Stockholm format. This option only works in combination with --ctarget, # --cmindiff or --call. '--cdump':ValuedParameter(Prefix='--',Name='cdump',Delimiter=' '),\ # --refine Attempt to refine the alignment before building the CM using # expectation-maximization (EM). The final alignment (the alignment used # to build the CM that gets written to cmfile) is written to . '--refine':ValuedParameter(Prefix='--',Name='refine',Delimiter=' '),\ # --gibbs Modifies the behavior of --refine so Gibbs sampling is used # instead of EM. '--gibbs':FlagParameter(Prefix='--',Name='gibbs'),\ # -s Set the random seed to , where is a positive integer. # This option can only be used in combination with --gibbs. The default is # to use time() to generate a different seed for each run, which means # that two different runs of cmbuild --refine --gibbs on the same # alignment will give slightly different results. You can use this option # to generate reproducible results. '-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '),\ # -l With --refine, turn on the local alignment algorithm, which allows the # alignment to span two or more subsequences if necessary (e.g. if the # structures of the query model and target sequence are only partially # shared), allowing certain large insertions and deletions in the # structure to be penalized differently than normal indels. The default is # to globally align the query model to the target sequences. '-l':ValuedParameter(Prefix='-',Name='l',Delimiter=' '),\ # -a With --refine, print the scores of each individual sequence alignment. 
'-a':ValuedParameter(Prefix='-',Name='a',Delimiter=' '),\ # --cyk With --refine, align with the CYK algorithm. '--cyk':FlagParameter(Prefix='--',Name='cyk'),\ # --sub With --refine, turn on the sub model construction and alignment # procedure. '--sub':FlagParameter(Prefix='--',Name='sub'),\ # --nonbanded With --refine, do not use HMM bands to accelerate alignment. # Use the full CYK algorithm which is guaranteed to give the optimal # alignment. This will slow down the run significantly, especially for # large models. '--nonbanded':FlagParameter(Prefix='--',Name='nonbanded'),\ # --tau With --refine, set the tail loss probability used during HMM # band calculation to . This is the amount of probability mass within # the HMM posterior probabilities that is considered negligible. The # default value is 1E-7. In general, higher values will result in greater # acceleration, but increase the chance of missing the optimal alignment # due to the HMM bands. '--tau':ValuedParameter(Prefix='--',Name='tau',Delimiter=' '),\ # --fins With --refine, change the behavior of how insert emissions are # placed in the alignment. '--fins':FlagParameter(Prefix='--',Name='fins'),\ # --mxsize With --refine, set the maximum allowable matrix size for # alignment to megabytes. '--mxsize':ValuedParameter(Prefix='--',Name='mxsize',Delimiter=' '),\ # --rdump With --refine, output the intermediate alignments at each # iteration of the refinement procedure (as described above for --refine ) # to file . 
'--rdump':ValuedParameter(Prefix='--',Name='rdump',Delimiter=' '),\ } _parameters = {} _parameters.update(_options) _command = "cmbuild" _suppress_stderr=True def getHelp(self): """Method that points to the Infernal documentation.""" help_str = \ """ See Infernal documentation at: http://infernal.janelia.org/ """ return help_str def _refine_out_filename(self): if self.Parameters['--refine'].isOn(): refined_filename = self._absolute(str(\ self.Parameters['--refine'].Value)) else: raise ValueError, 'No refine output file specified.' return refined_filename def _cm_out_filename(self): if self.Parameters['-n'].isOn(): refined_filename = self._absolute(str(\ self.Parameters['-n'].Value)) else: raise ValueError, 'No cm output file specified.' return refined_filename def _tempfile_as_multiline_string(self, data): """Write a multiline string to a temp file and return the filename. data: a multiline string to be written to a file. * Note: the result will be the filename as a FilePath object (which is a string subclass). """ filename = FilePath(self.getTmpFilename(self.TmpDir)) data_file = open(filename,'w') data_file.write(data) data_file.close() return filename def _get_result_paths(self,data): result = {} if self.Parameters['--refine'].isOn(): out_name = self._refine_out_filename() result['Refined'] = ResultPath(Path=out_name,IsWritten=True) if self.Parameters['-n'].isOn(): cm_name = self._cm_out_filename() result['CmFile'] = ResultPath(Path=cm_name,IsWritten=True) return result class Cmcalibrate(CommandLineApplication): """cmcalibrate application controller.""" _options = { # -s Set the random number generator seed to , where is a # positive integer. The default is to use time() to generate a different # seed for each run, which means that two different runs of cmcalibrate on # the same CM will give slightly different E-value and HMM filter # threshold parameters. You can use this option to generate reproducible # results. 
'-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '),\ # --forecast Predict the running time of the calibration for cmfile and # provided options and exit, DO NOT perform the calibration. '--forecast':ValuedParameter(Prefix='--',Name='forecast',Delimiter=' '),\ # --mpi Run as an MPI parallel program. '--mpi':FlagParameter(Prefix='--',Name='mpi'),\ # Expert Options # --exp-cmL-glc Set the length of random sequence to search for the CM # glocal exponential tail fits to megabases (Mb). '--exp-cmL-glc':ValuedParameter(Prefix='--',Name='exp-cmL-glc',\ Delimiter=' '),\ # --exp-cmL-loc Set the length of random sequence to search for the CM # local exponential tail fits to megabases (Mb). '--exp-cmL-loc':ValuedParameter(Prefix='--',Name='exp-cmL-loc',\ Delimiter=' '),\ # --exp-hmmLn-glc Set the minimum random sequence length to search for # the HMM glocal exponential tail fits to megabases (Mb). '--exp-hmmLn-glc':ValuedParameter(Prefix='--',Name='exp-hmmLn-glc',\ Delimiter=' '),\ # --exp-hmmLn-loc Set the minimum random sequence length to search for # the HMM local exponential tail fits to megabases (Mb). '--exp-hmmLn-loc':ValuedParameter(Prefix='--',Name='exp-hmmLn-loc',\ Delimiter=' '),\ # --exp-hmmLx Set the maximum random sequence length to search when # determining HMM E-values to megabases (Mb). '--exp-hmmLx':ValuedParameter(Prefix='--',Name='exp-hmmLx',Delimiter=' '),\ # --exp-fract Set the HMM/CM fraction of dynamic programming # calculations to . '--exp-fract':ValuedParameter(Prefix='--',Name='exp-fract',Delimiter=' '),\ # --exp-tailn-cglc During E-value calibration of glocal CM search modes # fit the exponential tail to the high scores in the histogram tail that # includes hits per Mb searched. '--exp-tailn-cglc':ValuedParameter(Prefix='--',Name='exp-tailn-cglc',\ Delimiter=' '),\ # --exp-tailn-cloc During E-value calibration of local CM search modes # fit the exponential tail to the high scores in the histogram tail that # includes hits per Mb searched. 
    '--exp-tailn-cloc':ValuedParameter(Prefix='--',Name='exp-tailn-cloc',\
        Delimiter=' '),\
    # --exp-tailn-hglc <n> During E-value calibration of glocal HMM search modes
    # fit the exponential tail to the high scores in the histogram tail that
    # includes <n> hits per Mb searched.
    '--exp-tailn-hglc':ValuedParameter(Prefix='--',Name='exp-tailn-hglc',\
        Delimiter=' '),\
    # --exp-tailn-hloc <n> During E-value calibration of local HMM search modes
    # fit the exponential tail to the high scores in the histogram tail that
    # includes <n> hits per Mb searched.
    '--exp-tailn-hloc':ValuedParameter(Prefix='--',Name='exp-tailn-hloc',\
        Delimiter=' '),\
    # --exp-tailp <x> Ignore the --exp-tailn prefixed options and fit the
    # <x> fraction right tail of the histogram to exponential tails, for all
    # search modes.
    '--exp-tailp':ValuedParameter(Prefix='--',Name='exp-tailp',Delimiter=' '),\
    # --exp-tailxn <n> With --exp-tailp enforce that the maximum number of hits
    # in the tail that is fit is <n>.
    '--exp-tailxn':ValuedParameter(Prefix='--',Name='exp-tailxn',\
        Delimiter=' '),\
    # --exp-beta <x> During E-value calibration, by default query-dependent
    # banding (QDB) is used to accelerate the CM search algorithms with a beta
    # tail loss probability of 1E-15.
    '--exp-beta':ValuedParameter(Prefix='--',Name='exp-beta',Delimiter=' '),\
    # --exp-no-qdb Turn off QDB during E-value calibration. This will slow down
    # calibration, and is not recommended unless you plan on using --no-qdb in
    # cmsearch.
    '--exp-no-qdb':FlagParameter(Prefix='--',Name='exp-no-qdb'),\
    # --exp-hfile <f> Save the histograms fit for the E-value calibration to
    # file <f>. The format of this file is two tab delimited columns.
    '--exp-hfile':ValuedParameter(Prefix='--',Name='exp-hfile',Delimiter=' '),\
    # --exp-sfile <f> Save a survival plot for the E-value calibration to file
    # <f>. The format of this file is two tab delimited columns.
'--exp-sfile':ValuedParameter(Prefix='--',Name='exp-sfile',Delimiter=' '),\ # --exp-qqfile Save a quantile-quantile plot for the E-value calibration # to file . The format of this file is two tab delimited columns. '--exp-qqfile':ValuedParameter(Prefix='--',Name='exp-qqfile',\ Delimiter=' '),\ # --exp-ffile Save statistics on the exponential tail statistics to file # . The file will contain the lambda and mu values for exponential # tails fit to tails of different sizes. '--exp-ffile':ValuedParameter(Prefix='--',Name='exp-ffile',Delimiter=' '),\ # --fil-N Set the number of sequences sampled and searched for the HMM # filter threshold calibration to . By default, is 10,000. '--fil-N':ValuedParameter(Prefix='--',Name='fil-N',Delimiter=' '),\ # --fil-F Set the fraction of sample sequences the HMM filter must be # able to recognize, and allow to survive, to , where is a positive # real number less than or equal to 1.0. By default, is 0.995. '--fil-F':ValuedParameter(Prefix='--',Name='fil-F',Delimiter=' '),\ # --fil-xhmm Set the target number of dynamic programming calculations # for a HMM filtered CM QDB search with beta = 1E-7 to times the # number of calculations required to do an HMM search. By default, is # 2.0. '--fil-xhmm':ValuedParameter(Prefix='--',Name='fil-xhmm',Delimiter=' '),\ # --fil-tau Set the tail loss probability during HMM band calculation # for HMM filter threshold calibration to . '--fil-tau':ValuedParameter(Prefix='--',Name='fil-tau',Delimiter=' '),\ # --fil-gemit During HMM filter calibration, always sample sequences from a # globally configured CM, even when calibrating local modes. '--fil-gemit':FlagParameter(Prefix='--',Name='fil-gemit'),\ # --fil-dfile Save statistics on filter threshold calibration, including # HMM and CM scores for all sampled sequences, to file . '--fil-dfile':ValuedParameter(Prefix='--',Name='fil-dfile',Delimiter=' '),\ # --mxsize Set the maximum allowable DP matrix size to megabytes. 
'--mxsize':ValuedParameter(Prefix='--',Name='mxsize',Delimiter=' '),\ } _parameters = {} _parameters.update(_options) _command = "cmcalibrate" _suppress_stderr=True def getHelp(self): """Method that points to the Infernal documentation.""" help_str = \ """ See Infernal documentation at: http://infernal.janelia.org/ """ return help_str class Cmemit(CommandLineApplication): """cmemit application controller.""" _options = { # -o Save the synthetic sequences to file rather than writing them # to stdout. '-o':ValuedParameter(Prefix='-',Name='o',Delimiter=' '),\ # -n Generate sequences. Default is 10. '-n':ValuedParameter(Prefix='-',Name='n',Delimiter=' '),\ # -u Write the generated sequences in unaligned format (FASTA). This is the # default, so this option is probably useless. '-u':FlagParameter(Prefix='-',Name='u'),\ # -a Write the generated sequences in an aligned format (STOCKHOLM) with # consensus structure annotation rather than FASTA. '-a':FlagParameter(Prefix='-',Name='a'),\ # -c Predict a single majority-rule consensus sequence instead of sampling # sequences from the CM's probability distribution. '-c':FlagParameter(Prefix='-',Name='c'),\ # -l Configure the CMs into local mode before emitting sequences. See the # User's Guide for more information on locally configured CMs. '-l':FlagParameter(Prefix='-',Name='l'),\ # -s Set the random seed to , where is a positive integer. The # default is to use time() to generate a different seed for each run, # which means that two different runs of cmemit on the same CM will give # different results. You can use this option to generate reproducible # results. '-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '),\ # --rna Specify that the emitted sequences be output as RNA sequences. This # is true by default. '--rna':FlagParameter(Prefix='--',Name='rna'),\ # --dna Specify that the emitted sequences be output as DNA sequences. By # default, the output alphabet is RNA. 
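# The FlagParameter/ValuedParameter entries above all serialize the same way:
# a parameter contributes nothing to the command line until it is turned on,
# after which Prefix, Name, Delimiter, and Value are concatenated. A minimal
# stand-alone sketch of that pattern (simplified stand-ins, NOT the actual
# cogent.app classes):

```python
class FlagParameter:
    """Stand-in for cogent's FlagParameter: a bare on/off switch."""
    def __init__(self, Prefix, Name):
        self.Prefix, self.Name, self.Value = Prefix, Name, False
    def on(self):
        self.Value = True
    def isOn(self):
        return bool(self.Value)
    def __str__(self):
        return self.Prefix + self.Name

class ValuedParameter:
    """Stand-in for cogent's ValuedParameter: a switch carrying a value."""
    def __init__(self, Prefix, Name, Delimiter=' '):
        self.Prefix, self.Name, self.Delimiter = Prefix, Name, Delimiter
        self.Value = None
    def on(self, value):
        self.Value = value
    def isOn(self):
        return self.Value is not None
    def __str__(self):
        return '%s%s%s%s' % (self.Prefix, self.Name, self.Delimiter,
                             self.Value)

# Turning parameters on and joining only the ones that are on yields the
# fragment the controller appends to its _command.
params = {
    '--mpi': FlagParameter(Prefix='--', Name='mpi'),
    '-s': ValuedParameter(Prefix='-', Name='s', Delimiter=' '),
}
params['--mpi'].on()
params['-s'].on(42)
cmdline = 'cmcalibrate ' + ' '.join(
    str(p) for p in params.values() if p.isOn())
# cmdline is now "cmcalibrate --mpi -s 42"
```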
'--dna':FlagParameter(Prefix='--',Name='dna'),\ # --tfile Dump tabular sequence parsetrees (tracebacks) for each emitted # sequence to file . Primarily useful for debugging. '--tfile':ValuedParameter(Prefix='--',Name='tfile',Delimiter=' '),\ # --exp Exponentiate the emission and transition probabilities of the CM # by and then renormalize those distributions before emitting # sequences. '--exp':ValuedParameter(Prefix='--',Name='exp',Delimiter=' '),\ # --begin Truncate the resulting alignment by removing all residues # before consensus column , where is a positive integer no greater # than the consensus length of the CM. Must be used in combination with # --end and either -a or --shmm (a developer option). '--begin':ValuedParameter(Prefix='--',Name='begin',Delimiter=' '),\ # --end Truncate the resulting alignment by removing all residues after # consensus column , where is a positive integer no greater than # the consensus length of the CM. Must be used in combination with --begin # and either -a or --shmm (a developer option). '--end':ValuedParameter(Prefix='--',Name='end',Delimiter=' '),\ } _parameters = {} _parameters.update(_options) _command = "cmemit" _suppress_stderr=True def getHelp(self): """Method that points to the Infernal documentation.""" help_str = \ """ See Infernal documentation at: http://infernal.janelia.org/ """ return help_str class Cmscore(CommandLineApplication): """cmscore application controller.""" _options = { # -n Set the number of sequences to generate and align to . This # option is incompatible with the --infile option. '-n':ValuedParameter(Prefix='-',Name='n',Delimiter=' '),\ # -l Turn on the local alignment algorithm, which allows the alignment to # span two or more subsequences if necessary (e.g. if the structures of # the query model and target sequence are only partially shared), allowing # certain large insertions and deletions in the structure to be penalized # differently than normal indels. 
    # The default is to globally align the query model to the target
    # sequences.
    '-l':FlagParameter(Prefix='-',Name='l'),\
    # -s <n> Set the random seed to <n>, where <n> is a positive integer. The
    # default is to use time() to generate a different seed for each run,
    # which means that two different runs of cmscore on the same CM will give
    # different results. You can use this option to generate reproducible
    # results. The random number generator is used to generate sequences to
    # score, so -s is incompatible with the --infile option which supplies
    # the sequences to score in an input file.
    '-s':ValuedParameter(Prefix='-',Name='s',Delimiter=' '),\
    # -a Print individual timings and score comparisons for each sequence in
    # seqfile. By default only summary statistics are printed.
    '-a':FlagParameter(Prefix='-',Name='a'),\
    # --sub Turn on the sub model construction and alignment procedure.
    '--sub':FlagParameter(Prefix='--',Name='sub'),\
    # --mxsize <x> Set the maximum allowable DP matrix size to <x> megabytes.
    '--mxsize':ValuedParameter(Prefix='--',Name='mxsize',Delimiter=' '),\
    # --mpi Run as an MPI parallel program.
    '--mpi':FlagParameter(Prefix='--',Name='mpi'),\
    # Expert Options
    # --emit Generate sequences to score by sampling from the CM.
    '--emit':FlagParameter(Prefix='--',Name='emit'),\
    # --random Generate sequences to score by sampling from the CM's null
    # distribution. This option turns the --emit option off.
    '--random':FlagParameter(Prefix='--',Name='random'),\
    # --infile <f> Sequences to score are read from the file <f>. All the
    # sequences from <f> are read and scored; the -n and -s options are
    # incompatible with --infile.
    '--infile':ValuedParameter(Prefix='--',Name='infile',Delimiter=' '),\
    # --outfile <f> Save generated sequences that are scored to the file <f>
    # in FASTA format. This option is incompatible with the --infile option.
    '--outfile':ValuedParameter(Prefix='--',Name='outfile',Delimiter=' '),\
    # --Lmin <n> Must be used in combination with --random and --Lmax <n>.
'--Lmin':ValuedParameter(Prefix='--',Name='Lmin',Delimiter=' '),\ # --pad Must be used in combination with --emit and --search. Add cm->W # (max hit length) minus L (sequence length) residues to the 5' and 3' # end of each emitted sequence . '--pad':FlagParameter(Prefix='--',Name='pad'),\ # --hbanded Specify that the second stage alignment algorithm be HMM banded # CYK. This option is on by default. '--hbanded':FlagParameter(Prefix='--',Name='hbanded'),\ # --tau For stage 2 alignment, set the tail loss probability used during # HMM band calculation to . '--tau':ValuedParameter(Prefix='--',Name='tau',Delimiter=' '),\ # --aln2bands With --search, when calculating HMM bands, use an HMM # alignment algorithm instead of an HMM search algorithm. '--aln2bands':FlagParameter(Prefix='--',Name='aln2bands'),\ # --hsafe For stage 2 HMM banded alignment, realign any sequences with a # negative alignment score using non-banded CYK to guarantee finding the # optimal alignment. '--hsafe':FlagParameter(Prefix='--',Name='hsafe'),\ # --nonbanded Specify that the second stage alignment algorithm be standard, # non-banded, non-D&C CYK. When --nonbanded is enabled, the program fails # with a non-zero exit code and prints an error message if the parsetree # score for any sequence from stage 1 D&C alignment and stage 2 alignment # differs by more than 0.01 bits. In theory, this should never happen as # both algorithms are guaranteed to determine the optimal parsetree. For # larger RNAs (more than 300 residues) if memory is limiting, --nonbanded # should be used in combination with --scoreonly. '--nonbanded':FlagParameter(Prefix='--',Name='nonbanded'),\ # --scoreonly With --nonbanded during the second stage standard non-banded # CYK alignment, use the "score only" variant of the algorithm to save # memory, and don't recover a parse tree. '--scoreonly':FlagParameter(Prefix='--',Name='scoreonly'),\ # --viterbi Specify that the second stage alignment algorithm be Viterbi to # a CM Plan 9 HMM. 
    '--viterbi':FlagParameter(Prefix='--',Name='viterbi'),\
    # --search Run all algorithms in scanning mode, not alignment mode.
    '--search':FlagParameter(Prefix='--',Name='search'),\
    # --inside With --search compare the non-banded scanning Inside algorithm
    # to the HMM banded scanning Inside algorithm, instead of using CYK
    # versions.
    '--inside':FlagParameter(Prefix='--',Name='inside'),\
    # --forward With --search compare the scanning Forward scoring algorithm
    # against CYK.
    '--forward':FlagParameter(Prefix='--',Name='forward'),\
    # --taus <n> Specify the first alignment algorithm as non-banded D&C CYK,
    # and multiple stages of HMM banded CYK alignment. The first HMM banded
    # alignment will use tau=1E-<n>, which will be the highest value of tau
    # used. Must be used in combination with --taue.
    '--taus':ValuedParameter(Prefix='--',Name='taus',Delimiter=' '),\
    # --taue <n> Specify the first alignment algorithm as non-banded D&C CYK,
    # and multiple stages of HMM banded CYK alignment. The final HMM banded
    # alignment will use tau=1E-<n>, which will be the lowest value of tau
    # used. Must be used in combination with --taus.
    '--taue':ValuedParameter(Prefix='--',Name='taue',Delimiter=' '),\
    # --tfile <f> Print the parsetrees for each alignment of each sequence to
    # file <f>.
    '--tfile':ValuedParameter(Prefix='--',Name='tfile',Delimiter=' '),\
    }

    _parameters = {}
    _parameters.update(_options)
    _command = "cmscore"
    _suppress_stderr=True

    def getHelp(self):
        """Method that points to the Infernal documentation."""
        help_str = \
        """
        See Infernal documentation at:
        http://infernal.janelia.org/
        """
        return help_str

class Cmsearch(CommandLineApplication):
    """cmsearch application controller."""
    _options = {
    # -o <f> Save the high-scoring alignments of hits to a file <f>. The
    # default is to write them to standard output.
    '-o':ValuedParameter(Prefix='-',Name='o',Delimiter=' '),\
    # -g Turn on the 'glocal' alignment algorithm, local with respect to the
    # target database, and global with respect to the model. By default, the
    # local alignment algorithm is used, which is local with respect to both
    # the target sequence and the model.
    '-g':FlagParameter(Prefix='-',Name='g'),\
    # -p Append posterior probabilities to alignments of hits.
    '-p':FlagParameter(Prefix='-',Name='p'),\
    # -x Annotate non-compensatory basepairs and basepairs that include a gap
    # in the left and/or right half of the pair with x's in the alignments of
    # hits.
    '-x':FlagParameter(Prefix='-',Name='x'),\
    # -Z <x> Calculate E-values as if the target database size was <x>
    # megabases (Mb). Ignore the actual size of the database. This option is
    # only valid if the CM file has been calibrated. Warning: the predictions
    # for timings and survival fractions will be calculated as if the database
    # was of size <x> Mb, which means they will be inaccurate.
    '-Z':ValuedParameter(Prefix='-',Name='Z',Delimiter=' '),\
    # --toponly Only search the top (Watson) strand of the sequences in
    # seqfile. By default, both strands are searched.
    '--toponly':FlagParameter(Prefix='--',Name='toponly'),\
    # --bottomonly Only search the bottom (Crick) strand of the sequences in
    # seqfile. By default, both strands are searched.
    '--bottomonly':FlagParameter(Prefix='--',Name='bottomonly'),\
    # --forecast <n> Predict the running time of the search with provided
    # files and options and exit, DO NOT perform the search. This option is
    # only available with calibrated CM files.
    '--forecast':ValuedParameter(Prefix='--',Name='forecast',Delimiter=' '),\
    # --informat <s> Assert that the input seqfile is in format <s>. Do not
    # run Babelfish format autodetection. This increases the reliability of
    # the program somewhat, because the Babelfish can make mistakes;
    # particularly recommended for unattended, high-throughput runs of
    # Infernal. Acceptable formats are: FASTA, EMBL, UNIPROT, GENBANK, and
    # DDBJ. <s> is case-insensitive.
'--informat':ValuedParameter(Prefix='--',Name='informat',Delimiter=' '),\ # --mxsize Set the maximum allowable DP matrix size to megabytes. '--mxsize':ValuedParameter(Prefix='--',Name='mxsize',Delimiter=' '),\ # --mpi Run as an MPI parallel program. '--mpi':FlagParameter(Prefix='--',Name='mpi'),\ # Expert Options # --inside Use the Inside algorithm for the final round of searching. This # is true by default. '--inside':FlagParameter(Prefix='--',Name='inside'),\ # --cyk Use the CYK algorithm for the final round of searching. '--cyk':FlagParameter(Prefix='--',Name='cyk'),\ # --viterbi Search only with an HMM. This is much faster but less sensitive # than a CM search. Use the Viterbi algorithm for the HMM search. '--viterbi':FlagParameter(Prefix='--',Name='viterbi'),\ # --forward Search only with an HMM. This is much faster but less sensitive # than a CM search. Use the Forward algorithm for the HMM search. '--forward':FlagParameter(Prefix='--',Name='forward'),\ # -E Set the E-value cutoff for the per-sequence/strand ranked hit list # to , where is a positive real number. '-E':ValuedParameter(Prefix='-',Name='E',Delimiter=' '),\ # -T Set the bit score cutoff for the per-sequence ranked hit list to # , where is a positive real number. '-T':ValuedParameter(Prefix='-',Name='T',Delimiter=' '),\ # --nc Set the bit score cutoff as the NC cutoff value used by Rfam curators # as the noise cutoff score. '--nc':FlagParameter(Prefix='--',Name='nc'),\ # --ga Set the bit score cutoff as the GA cutoff value used by Rfam curators # as the gathering threshold. '--ga':FlagParameter(Prefix='--',Name='ga'),\ # --tc Set the bit score cutoff as the TC cutoff value used by Rfam curators # as the trusted cutoff. '--tc':FlagParameter(Prefix='--',Name='tc'),\ # --no-qdb Do not use query-dependent banding (QDB) for the final round of # search. 
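# The --informat restriction described above (a fixed, case-insensitive set of
# accepted format names) can be sketched as a small validator. This is a
# hypothetical helper for illustration, not part of cogent.app.infernal:

```python
# Formats listed in the --informat documentation above.
ALLOWED_FORMATS = frozenset(['FASTA', 'EMBL', 'UNIPROT', 'GENBANK', 'DDBJ'])

def validate_informat(fmt):
    """Return the normalized format name, or raise ValueError.

    Uppercasing implements the 'case-insensitive' behavior described in the
    option comment.
    """
    if fmt.upper() not in ALLOWED_FORMATS:
        raise ValueError('unsupported input format: %r' % fmt)
    return fmt.upper()
```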
    '--no-qdb':FlagParameter(Prefix='--',Name='no-qdb'),\
    # --beta <x> For query-dependent banding (QDB) during the final round of
    # search, set the beta parameter to <x> where <x> is any positive real
    # number less than 1.0.
    '--beta':ValuedParameter(Prefix='--',Name='beta',Delimiter=' '),\
    # --hbanded Use HMM bands to accelerate the final round of search.
    # Constraints for the CM search are derived from posterior probabilities
    # from an HMM. This is an experimental option and it is not recommended
    # for use unless you know exactly what you're doing.
    '--hbanded':FlagParameter(Prefix='--',Name='hbanded'),\
    # --tau <x> Set the tail loss probability during HMM band calculation to
    # <x>.
    '--tau':ValuedParameter(Prefix='--',Name='tau',Delimiter=' '),\
    # --fil-no-hmm Turn the HMM filter off.
    '--fil-no-hmm':FlagParameter(Prefix='--',Name='fil-no-hmm'),\
    # --fil-no-qdb Turn the QDB filter off.
    '--fil-no-qdb':FlagParameter(Prefix='--',Name='fil-no-qdb'),\
    # --fil-beta <x> For the QDB filter, set the beta parameter to <x> where
    # <x> is any positive real number less than 1.0.
    '--fil-beta':ValuedParameter(Prefix='--',Name='fil-beta',Delimiter=' '),\
    # --fil-T-qdb <x> Set the bit score cutoff for the QDB filter round to
    # <x>, where <x> is a positive real number.
    '--fil-T-qdb':ValuedParameter(Prefix='--',Name='fil-T-qdb',Delimiter=' '),\
    # --fil-T-hmm <x> Set the bit score cutoff for the HMM filter round to
    # <x>, where <x> is a positive real number.
    '--fil-T-hmm':ValuedParameter(Prefix='--',Name='fil-T-hmm',Delimiter=' '),\
    # --fil-E-qdb <x> Set the E-value cutoff for the QDB filter round to <x>,
    # where <x> is a positive real number. Hits with E-values better than
    # (less than) or equal to this threshold will survive and be passed to the
    # final round. This option is only available if the CM file has been
    # calibrated.
    '--fil-E-qdb':ValuedParameter(Prefix='--',Name='fil-E-qdb',Delimiter=' '),\
    # --fil-E-hmm <x> Set the E-value cutoff for the HMM filter round to <x>,
    # where <x> is a positive real number. Hits with E-values better than
    # (less than) or equal to this threshold will survive and be passed to the
    # next round, either a QDB filter round, or if the QDB filter is disabled,
    # to the final round of search. This option is only available if the CM
    # file has been calibrated.
    '--fil-E-hmm':ValuedParameter(Prefix='--',Name='fil-E-hmm',Delimiter=' '),\
    # --fil-Smax-hmm <x> Set the maximum predicted survival fraction for an
    # HMM filter as <x>, where <x> is a positive real number less than 1.0.
    '--fil-Smax-hmm':ValuedParameter(Prefix='--',Name='fil-Smax-hmm',\
        Delimiter=' '),\
    # --noalign Do not calculate and print alignments of each hit, only print
    # locations and scores.
    '--noalign':FlagParameter(Prefix='--',Name='noalign'),\
    # --aln-hbanded Use HMM bands to accelerate alignment during the hit
    # alignment stage.
    '--aln-hbanded':FlagParameter(Prefix='--',Name='aln-hbanded'),\
    # --aln-optacc Calculate alignments of hits from the final round of search
    # using the optimal accuracy algorithm, which computes the alignment that
    # maximizes the summed posterior probability of all aligned residues given
    # the model, which can be different from the highest scoring one.
    '--aln-optacc':FlagParameter(Prefix='--',Name='aln-optacc'),\
    # --tabfile <f> Create a new output file <f> and print tabular results to
    # it.
    '--tabfile':ValuedParameter(Prefix='--',Name='tabfile',Delimiter=' '),\
    # --gcfile <f> Create a new output file <f> and print statistics of the GC
    # content of the sequences in seqfile to it.
    '--gcfile':ValuedParameter(Prefix='--',Name='gcfile',Delimiter=' '),\
    # --rna Output the hit alignments as RNA sequence alignments. This is true
    # by default.
    '--rna':FlagParameter(Prefix='--',Name='rna'),\
    # --dna Output the hit alignments as DNA sequence alignments.
'--dna':FlagParameter(Prefix='--',Name='dna'),\ } _parameters = {} _parameters.update(_options) _command = "cmsearch" _suppress_stderr=True def getHelp(self): """Method that points to the Infernal documentation.""" help_str = \ """ See Infernal documentation at: http://infernal.janelia.org/ """ return help_str def _tabfile_out_filename(self): if self.Parameters['--tabfile'].isOn(): tabfile_filename = self._absolute(str(\ self.Parameters['--tabfile'].Value)) else: raise ValueError, 'No tabfile output file specified.' return tabfile_filename def _tempfile_as_multiline_string(self, data): """Write a multiline string to a temp file and return the filename. data: a multiline string to be written to a file. * Note: the result will be the filename as a FilePath object (which is a string subclass). """ filename = FilePath(self.getTmpFilename(self.TmpDir)) data_file = open(filename,'w') data_file.write(data) data_file.close() return filename def _get_result_paths(self,data): result = {} if self.Parameters['--tabfile'].isOn(): out_name = self._tabfile_out_filename() result['SearchResults'] = ResultPath(Path=out_name,IsWritten=True) return result class Cmstat(CommandLineApplication): """cmstat application controller.""" _options = { # -g Turn on the 'glocal' alignment algorithm, local with respect to the # target database, and global with respect to the model. By default, the # model is configured for local alignment which is local with respect to # both the target sequence and the model. '-g':FlagParameter(Prefix='-',Name='g'),\ # -m print general statistics on the models in cmfile and the alignment it # was built from. '-m':FlagParameter(Prefix='-',Name='m'),\ # -Z Calculate E-values as if the target database size was megabases # (Mb). Ignore the actual size of the database. This option is only valid # if the CM file has been calibrated. 
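# _get_result_paths in the controllers above all follow one pattern: an
# output file is registered under a result key only when its controlling
# parameter was turned on. A simplified stand-in of that pattern
# (hypothetical helper, not the cogent API):

```python
def get_result_paths(tabfile_path=None):
    """Return a dict naming expected output files.

    tabfile_path stands in for Parameters['--tabfile'].Value; passing None
    mirrors Parameters['--tabfile'].isOn() being False, in which case no
    result path is registered.
    """
    result = {}
    if tabfile_path is not None:
        result['SearchResults'] = tabfile_path
    return result
```

Callers then read results by key (e.g. `res['SearchResults']`) only when the
corresponding option was enabled, which is why the real methods raise when the
option is off but an output filename is requested.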
'-Z':ValuedParameter(Prefix='-',Name='Z',Delimiter=' '),\ # --all print all available statistics '--all':FlagParameter(Prefix='--',Name='all'),\ # --le print local E-value statistics. This option only works if cmfile has # been calibrated with cmcalibrate. '--le':FlagParameter(Prefix='--',Name='le'),\ # --ge print glocal E-value statistics. This option only works if cmfile has # been calibrated with cmcalibrate. '--ge':FlagParameter(Prefix='--',Name='ge'),\ # --beta With the --search option set the beta parameter for the query- # dependent banding algorithm stages to Beta is the probability mass # considered negligible during band calculation. The default is 1E-7. '--beta':ValuedParameter(Prefix='--',Name='beta',Delimiter=' '),\ # --qdbfile Save the query-dependent bands (QDBs) for each state to file # '--qdbfile':ValuedParameter(Prefix='--',Name='qdbfile',Delimiter=' '),\ # Expert Options # --lfi Print the HMM filter thresholds for the range of relevant CM bit # score cutoffs for searches with locally configured models using the # Inside algorithm. '--lfi':FlagParameter(Prefix='--',Name='lfi'),\ # --gfi Print the HMM filter thresholds for the range of relevant CM bit # score cutoffs for searches with globally configured models using the # Inside algorithm. '--gfi':FlagParameter(Prefix='--',Name='gfi'),\ # --lfc Print the HMM filter thresholds for the range of relevant CM bit # score cutoffs for searches with locally configured models using the CYK # algorithm. '--lfc':FlagParameter(Prefix='--',Name='lfc'),\ # --gfc Print the HMM filter thresholds for the range of relevant CM bit # score cutoffs for searches with globally configured models using the CYK # algorithm. '--gfc':FlagParameter(Prefix='--',Name='gfc'),\ # -E Print filter threshold statistics for an HMM filter if a final CM # E-value cutoff of were to be used for a run of cmsearch on 1 MB of # sequence. 
'-E':ValuedParameter(Prefix='-',Name='E',Delimiter=' '),\ # -T Print filter threshold statistics for an HMM filter if a final CM # bit score cutoff of were to be used for a run of cmsearch. '-T':ValuedParameter(Prefix='-',Name='T',Delimiter=' '),\ # --nc Print filter threshold statistics for an HMM filter if a CM bit score # cutoff equal to the Rfam NC cutoff were to be used for a run of # cmsearch. '--nc':FlagParameter(Prefix='--',Name='nc'),\ # --ga Print filter threshold statistics for an HMM filter if a CM bit score # cutoff of Rfam GA cutoff value were to be used for a run of cmsearch. '--ga':FlagParameter(Prefix='--',Name='ga'),\ # --tc Print filter threshold statistics for an HMM filter if a CM bit score # cutoff equal to the Rfam TC cutoff value were to be used for a run of # cmsearch. '--tc':FlagParameter(Prefix='--',Name='tc'),\ # --seqfile With the -E option, use the database size of the database in # instead of the default database size of 1 MB. '--seqfile':ValuedParameter(Prefix='--',Name='seqfile',Delimiter=' '),\ # --toponly In combination with --seqfile option, only consider the top # strand of the database in instead of both strands. --search perform # an experiment to determine how fast the CM(s) can search with different # search algorithms. '--toponly':FlagParameter(Prefix='--',Name='toponly'),\ # --cmL With the --search option set the length of sequence to search # with CM algorithms as residues. By default, is 1000. '--cmL':ValuedParameter(Prefix='--',Name='cmL',Delimiter=' '),\ # --hmmL With the --search option set the length of sequence to search # with HMM algorithms as residues. By default, is 100,000. '--hmmL':ValuedParameter(Prefix='--',Name='hmmL',Delimiter=' '),\ # --efile Save a plot of cmsearch HMM filter E value cutoffs versus CM # E-value cutoffs in xmgrace format to file . This option must be used # in combination with --lfi, --gfi, --lfc or --gfc. 
'--efile':ValuedParameter(Prefix='--',Name='efile',Delimiter=' '),\ # --bfile Save a plot of cmsearch HMM bit score cutoffs versus CM bit # score cutoffs in xmgrace format to file . This option must be used in # combination with --lfi, --gfi, --lfc or --gfc. '--bfile':ValuedParameter(Prefix='--',Name='bfile',Delimiter=' '),\ # --sfile Save a plot of cmsearch predicted survival fraction from the # HMM filter versus CM E value cutoff in xmgrace format to file . This # option must be used in combination with --lfi, --gfi, --lfc or --gfc. '--sfile':ValuedParameter(Prefix='--',Name='sfile',Delimiter=' '),\ # --xfile Save a plot of 'xhmm' versus CM E value cutoff in xmgrace # format to file 'xhmm' is the ratio of the number of dynamic # programming calculations predicted to be required for the HMM filter and # the CM search of the filter survivors versus the number of dynamic # programming calculations for the filter alone. This option must be # used in combination with --lfi, --gfi, --lfc or --gfc. '--xfile':ValuedParameter(Prefix='--',Name='xfile',Delimiter=' '),\ # --afile Save a plot of the predicted acceleration for an HMM filtered # search versus CM E value cutoff in xmgrace format to file . This # option must be used in combination with --lfi, --gfi, --lfc or --gfc. '--afile':ValuedParameter(Prefix='--',Name='afile',Delimiter=' '),\ # --bits With --efile, --sfile, --xfile, and --afile use CM bit score # cutoffs instead of CM E value cutoffs for the x-axis values of the plot. '--bits':FlagParameter(Prefix='--',Name='bits'),\ } _parameters = {} _parameters.update(_options) _command = "cmstat" _suppress_stderr=True def getHelp(self): """Method that points to the Infernal documentation.""" help_str = \ """ See Infernal documentation at: http://infernal.janelia.org/ """ return help_str def cmbuild_from_alignment(aln, structure_string, refine=False, \ return_alignment=False,params=None): """Uses cmbuild to build a CM file given an alignment and structure string. 
        - aln: an Alignment object or something that can be used to construct
            one.  All sequences must be the same length.
        - structure_string: vienna structure string representing the consensus
            structure for the sequences in aln.  Must be the same length as
            the alignment.
        - refine: refine the alignment and realign before building the cm.
            (Default=False)
        - return_alignment: Return (in Stockholm format) alignment file used
            to construct the CM file.  This will either be the original
            alignment and structure string passed in, or the refined alignment
            if --refine was used. (Default=False)
            - Note: This will be a string that can either be written to a
                file or parsed.
    """
    aln = Alignment(aln)
    if len(structure_string) != aln.SeqLen:
        raise ValueError, """Structure string is not same length as alignment.
Structure string is %s long. Alignment is %s long."""%(len(structure_string),\
            aln.SeqLen)
    else:
        struct_dict = {'SS_cons':structure_string}
    #Make new Cmbuild app instance.
    app = Cmbuild(InputHandler='_input_as_paths',WorkingDir='/tmp',\
        params=params)
    #turn on refine flag if True.
    if refine:
        app.Parameters['--refine'].on(get_tmp_filename(app.WorkingDir))
    #Get alignment in Stockholm format
    aln_file_string = stockholm_from_alignment(aln,GC_annotation=struct_dict)
    #get path to alignment filename
    aln_path = app._input_as_multiline_string(aln_file_string)
    cm_path = aln_path.split('.txt')[0]+'.cm'
    app.Parameters['-n'].on(cm_path)

    filepaths = [cm_path,aln_path]

    res = app(filepaths)

    cm_file = res['CmFile'].read()

    if return_alignment:
        #If alignment was refined, return refined alignment and structure,
        # otherwise return original alignment and structure.
        if refine:
            aln_file_string = res['Refined'].read()
        res.cleanUp()
        return cm_file, aln_file_string
    #Just return cm_file
    else:
        res.cleanUp()
        return cm_file

def cmbuild_from_file(stockholm_file_path, refine=False,return_alignment=False,\
    params=None):
    """Uses cmbuild to build a CM file given a stockholm file.
        - stockholm_file_path: a path to a stockholm file.  This file should
            contain a multiple sequence alignment formatted in Stockholm
            format.  This must contain a sequence structure line:
                #=GC SS_cons
        - refine: refine the alignment and realign before building the cm.
            (Default=False)
        - return_alignment: Return alignment and structure string used to
            construct the CM file.  This will either be the original alignment
            and structure string passed in, or the refined alignment if
            --refine was used. (Default=False)
    """
    #get alignment and structure string from stockholm file.
    info, aln, structure_string = \
        list(MinimalRfamParser(open(stockholm_file_path,'U'),\
            seq_constructor=ChangedSequence))[0]

    #call cmbuild_from_alignment.
    res = cmbuild_from_alignment(aln, structure_string, refine=refine, \
        return_alignment=return_alignment,params=params)
    return res

def cmalign_from_alignment(aln, structure_string, seqs, moltype,\
    include_aln=True,refine=False, return_stdout=False,params=None,\
    cmbuild_params=None):
    """Uses cmbuild to build a CM file, then cmalign to build an alignment.

        - aln: an Alignment object or something that can be used to construct
            one.  All sequences must be the same length.
        - structure_string: vienna structure string representing the consensus
            structure for the sequences in aln.  Must be the same length as
            the alignment.
        - seqs: SequenceCollection object or something that can be used to
            construct one, containing unaligned sequences that are to be
            aligned to the aligned sequences in aln.
        - moltype: Cogent moltype object.  Must be RNA or DNA.
        - include_aln: Boolean to include sequences in aln in final alignment.
            (Default=True)
        - refine: refine the alignment and realign before building the cm.
            (Default=False)
        - return_stdout: Boolean to return standard output from infernal.
            This includes alignment and structure bit scores and average
            probabilities for each sequence. (Default=False)
    """
    #NOTE: Must degap seqs or Infernal will seg fault!
seqs = SequenceCollection(seqs,MolType=moltype).degap() #Create mapping between abbreviated IDs and full IDs int_map, int_keys = seqs.getIntMap() #Create SequenceCollection from int_map. int_map = SequenceCollection(int_map,MolType=moltype) cm_file, aln_file_string = cmbuild_from_alignment(aln, structure_string,\ refine=refine,return_alignment=True,params=cmbuild_params) if params is None: params = {} params.update({MOLTYPE_MAP[moltype]:True}) app = Cmalign(InputHandler='_input_as_paths',WorkingDir='/tmp',\ params=params) app.Parameters['--informat'].on('FASTA') #files to remove that aren't cleaned up by ResultPath object to_remove = [] #turn on --withali flag if True. if include_aln: app.Parameters['--withali'].on(\ app._tempfile_as_multiline_string(aln_file_string)) #remove this file at end to_remove.append(app.Parameters['--withali'].Value) seqs_path = app._input_as_multiline_string(int_map.toFasta()) cm_path = app._tempfile_as_multiline_string(cm_file) #add cm_path to to_remove to_remove.append(cm_path) paths = [cm_path,seqs_path] app.Parameters['-o'].on(get_tmp_filename(app.WorkingDir)) res = app(paths) info, aligned, struct_string = \ list(MinimalRfamParser(res['Alignment'].readlines(),\ seq_constructor=SEQ_CONSTRUCTOR_MAP[moltype]))[0] #Make new dict mapping original IDs new_alignment={} for k,v in aligned.NamedSeqs.items(): new_alignment[int_keys.get(k,k)]=v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment,MolType=moltype) std_out = res['StdOut'].read() #clean up files res.cleanUp() for f in to_remove: remove(f) if return_stdout: return new_alignment, struct_string, std_out else: return new_alignment, struct_string def cmalign_from_file(cm_file_path, seqs, moltype, alignment_file_path=None,\ include_aln=False,return_stdout=False,params=None): """Uses cmalign to align seqs to alignment in cm_file_path. - cm_file_path: path to the file created by cmbuild, containing aligned sequences. 
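The `getIntMap()`/`int_keys` dance used above (abbreviate labels before handing sequences to the external tool, then restore them afterwards with `int_keys.get(k, k)`) is a recurring pattern in these controllers. The sketch below is a hypothetical re-implementation for illustration only, not cogent's `SequenceCollection.getIntMap` itself:

```python
def make_int_map(seqs, prefix="seq_"):
    """Relabel sequences with short, tool-safe IDs; return (renamed, reverse_map).

    Hypothetical stand-in for SequenceCollection.getIntMap().
    """
    int_map, int_keys = {}, {}
    for i, label in enumerate(sorted(seqs)):
        short = "%s%d" % (prefix, i)
        int_map[short] = seqs[label]
        int_keys[short] = label
    return int_map, int_keys

def restore_labels(aligned, int_keys):
    """Invert the relabeling, leaving unknown IDs untouched (int_keys.get(k, k))."""
    return dict((int_keys.get(k, k), v) for k, v in aligned.items())
```

Abbreviating labels this way protects against external programs that truncate or mangle long FASTA headers containing spaces or slashes.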
            This will be used to align sequences in seqs.
        - seqs: unaligned sequences that are to be aligned to the sequences in
            cm_file.
        - moltype: cogent.core.moltype object.  Must be DNA or RNA
        - alignment_file_path: path to stockholm alignment file used to create
            cm_file.
            __IMPORTANT__: This MUST be the same file used by cmbuild
            originally.  Only need to pass in this file if include_aln=True.
            This helper function will NOT check if the alignment file is
            correct so you must use it correctly.
        - include_aln: Boolean to include sequences in aln_file in final
            alignment.  (Default=False)
        - return_stdout: Boolean to return standard output from infernal.
            This includes alignment and structure bit scores and average
            probabilities for each sequence.  (Default=False)
    """
    #NOTE: Must degap seqs or Infernal will seg fault!
    seqs = SequenceCollection(seqs,MolType=moltype).degap()
    #Create mapping between abbreviated IDs and full IDs
    int_map, int_keys = seqs.getIntMap()
    #Create SequenceCollection from int_map.
    int_map = SequenceCollection(int_map,MolType=moltype)
    if params is None:
        params = {}
    params.update({MOLTYPE_MAP[moltype]:True})
    app = Cmalign(InputHandler='_input_as_paths',WorkingDir='/tmp',\
        params=params)
    app.Parameters['--informat'].on('FASTA')
    #turn on --withali flag if True.
    if include_aln:
        if alignment_file_path is None:
            raise DataError, """Must have path to alignment file used to build
                CM if include_aln=True."""
        else:
            app.Parameters['--withali'].on(alignment_file_path)
    seqs_path = app._input_as_multiline_string(int_map.toFasta())
    paths = [cm_file_path,seqs_path]
    app.Parameters['-o'].on(get_tmp_filename(app.WorkingDir))
    res = app(paths)
    info, aligned, struct_string = \
        list(MinimalRfamParser(res['Alignment'].readlines(),\
            seq_constructor=SEQ_CONSTRUCTOR_MAP[moltype]))[0]
    #Make new dict mapping original IDs
    new_alignment={}
    for k,v in aligned.items():
        new_alignment[int_keys.get(k,k)]=v
    #Create an Alignment object from alignment dict
    new_alignment = Alignment(new_alignment,MolType=moltype)
    std_out = res['StdOut'].read()
    res.cleanUp()
    if return_stdout:
        return new_alignment, struct_string, std_out
    else:
        return new_alignment, struct_string

def cmsearch_from_alignment(aln, structure_string, seqs, moltype, cutoff=0.0,\
    refine=False,params=None):
    """Uses cmbuild to build a CM file, then cmsearch to find homologs.

        - aln: an Alignment object or something that can be used to construct
            one.  All sequences must be the same length.
        - structure_string: vienna structure string representing the consensus
            structure for the sequences in aln.  Must be the same length as
            the alignment.
        - seqs: SequenceCollection object or something that can be used to
            construct one, containing unaligned sequences that are to be
            searched.
        - moltype: cogent.core.moltype object.  Must be DNA or RNA
        - cutoff: bitscore cutoff.  No sequences < cutoff will be kept in
            search results.  (Default=0.0).  Infernal documentation suggests a
            cutoff of log2(number nucleotides searching) will give most likely
            true homologs.
        - refine: refine the alignment and realign before building the cm.
            (Default=False)
    """
    #NOTE: Must degap seqs or Infernal will seg fault!
    seqs = SequenceCollection(seqs,MolType=moltype).degap()
    #Create mapping between abbreviated IDs and full IDs
    int_map, int_keys = seqs.getIntMap()
    #Create SequenceCollection from int_map.
    int_map = SequenceCollection(int_map,MolType=moltype)
    cm_file, aln_file_string = cmbuild_from_alignment(aln, structure_string,\
        refine=refine,return_alignment=True)
    app = Cmsearch(InputHandler='_input_as_paths',WorkingDir='/tmp',\
        params=params)
    app.Parameters['--informat'].on('FASTA')
    app.Parameters['-T'].on(cutoff)
    to_remove = []
    seqs_path = app._input_as_multiline_string(int_map.toFasta())
    cm_path = app._tempfile_as_multiline_string(cm_file)
    paths = [cm_path,seqs_path]
    to_remove.append(cm_path)
    app.Parameters['--tabfile'].on(get_tmp_filename(app.WorkingDir))
    res = app(paths)
    search_results = list(CmsearchParser(res['SearchResults'].readlines()))
    if search_results:
        for i,line in enumerate(search_results):
            label = line[1]
            search_results[i][1]=int_keys.get(label,label)
    res.cleanUp()
    for f in to_remove:
        remove(f)
    return search_results

def cmsearch_from_file(cm_file_path, seqs, moltype, cutoff=0.0, params=None):
    """Uses cmsearch to find homologs, given a pre-built CM file.

        - cm_file_path: path to the file created by cmbuild, containing
            aligned sequences.  This will be used to search sequences in seqs.
        - seqs: SequenceCollection object or something that can be used to
            construct one, containing unaligned sequences that are to be
            searched.
        - moltype: cogent.core.moltype object.  Must be DNA or RNA
        - cutoff: bitscore cutoff.  No sequences < cutoff will be kept in
            search results.  (Default=0.0).  Infernal documentation suggests a
            cutoff of log2(number nucleotides searching) will give most likely
            true homologs.
    """
    #NOTE: Must degap seqs or Infernal will seg fault!
    seqs = SequenceCollection(seqs,MolType=moltype).degap()
    #Create mapping between abbreviated IDs and full IDs
    int_map, int_keys = seqs.getIntMap()
    #Create SequenceCollection from int_map.
    int_map = SequenceCollection(int_map,MolType=moltype)
    app = Cmsearch(InputHandler='_input_as_paths',WorkingDir='/tmp',\
        params=params)
    app.Parameters['--informat'].on('FASTA')
    app.Parameters['-T'].on(cutoff)
    seqs_path = app._input_as_multiline_string(int_map.toFasta())
    paths = [cm_file_path,seqs_path]
    app.Parameters['--tabfile'].on(get_tmp_filename(app.WorkingDir))
    res = app(paths)
    search_results = list(CmsearchParser(res['SearchResults'].readlines()))
    if search_results:
        for i,line in enumerate(search_results):
            label = line[1]
            search_results[i][1]=int_keys.get(label,label)
    res.cleanUp()
    return search_results

PyCogent-1.5.3/cogent/app/knetfold.py
#!/usr/bin/env python
from cogent.app.util import CommandLineApplication,\
    CommandLineAppResult, ResultPath
from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\
    MixedParameter,Parameters, _find_synonym

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class Knetfold(CommandLineApplication):
    """Application controller for Knetfold v1.4.4b application"""

    _parameters = {'-i':ValuedParameter(Prefix='-',Name='i',Delimiter=' '),
                   '-n':ValuedParameter(Prefix='-',Name='n',Delimiter=' '),
                   '-d':ValuedParameter(Prefix='-',Name='d',Delimiter=' '),
                   '-m':ValuedParameter(Prefix='-',Name='m',Delimiter=' '),
                   '-q':ValuedParameter(Prefix='-',Name='q',Delimiter=' '),
                   '-o':ValuedParameter(Prefix='-',Name='o',Delimiter=' '),
                   '-r':ValuedParameter(Prefix='-',Name='r',Delimiter=' '),
                   '-f':ValuedParameter(Prefix='-',Name='f',Delimiter=' '),
                   '-h':ValuedParameter(Prefix='-',Name='h',Delimiter=' '),}

    _command = 'knetfold.pl'
    _input_handler = '_input_as_string'

    def _input_as_lines(self,data):
        """Infile needs to be set with parameter -i
        """
        filename = self._input_filename = self.getTmpFilename(self.WorkingDir)
        data_file = open(filename,'w')
        data_to_file = '\n'.join([str(d).strip('\n') for d in data])
        data_file.write(data_to_file)
        data_file.write('\n')
        data_file.close()
        #set input flag and give it the input filename
        self.Parameters['-i'].on(filename)
        return ''

    def _input_as_string(self,data):
        """Makes data the value of a specific parameter

        This method returns the empty string.  The parameter will be printed
        automatically once set.
        """
        if data:
            self.Parameters['-i'].on(data)
        return ''

    def _get_result_paths(self,data):
        """Adds output files to the resultpath"""
        result = {}
        if isinstance(data,list):
            filename=self._input_filename.split('/')[-1]
        else:
            filename=(data.split('/')[-1]).split('.')[0]
        #output files created in extensions list
        extensions = ['ct','coll','sec','fasta','pdf']
        #file = '%s%s%s' % (self.WorkingDir,filename,'_knet')
        file = ''.join([self.WorkingDir, filename, '_knet'])
        for ext in extensions:
            try:
                path = '%s.%s' % (file,ext)
                f = open(path)
                f.close()
                result[ext]=\
                    ResultPath(Path=(path))
            except IOError:
                pass
        number = 0
        #Unknown number of mx files, try/except to find all
        #file = '%s%s%s%d' % (self.WorkingDir,filename,'_knet.mx',number)
        file_base = ''.join([self.WorkingDir, filename, '_knet.mx'])
        while(True):
            try:
                file = file_base + str(number)
                f = open(file)
                result['%s%d' % ('mx',number)]= ResultPath(Path=(file))
                f.close()
                number +=1
            except IOError:
                #No more mx files
                break
        return result

PyCogent-1.5.3/cogent/app/mafft.py
#!/usr/bin/env python
"""
Provides an application controller for the commandline version of:
MAFFT v6.602
"""
from cogent.app.parameters import FlagParameter, ValuedParameter, FilePath
from cogent.app.util import CommandLineApplication, ResultPath, \
    get_tmp_filename
from random import choice
from cogent.parse.fasta import MinimalFastaParser
from cogent.core.moltype
import DNA, RNA, PROTEIN from cogent.core.alignment import SequenceCollection, Alignment from cogent.core.tree import PhyloNode from cogent.parse.tree import DndParser from os import remove __author__ = "Jeremy Widmann" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jeremy Widmann", "Daniel McDonald"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jeremy Widmann" __email__ = "jeremy.widmann@colorado.edu" __status__ = "Development" MOLTYPE_MAP = {'DNA':'--nuc',\ 'RNA':'--nuc',\ 'PROTEIN':'--amino',\ } class Mafft(CommandLineApplication): """Mafft application controller""" _options ={ # Algorithm # Automatically selects an appropriate strategy from L-INS-i, FFT-NS-i # and FFT-NS-2, according to data size. Default: off (always FFT-NS-2) '--auto':FlagParameter(Prefix='--',Name='auto'),\ # Distance is calculated based on the number of shared 6mers. Default: on '--6merpair':FlagParameter(Prefix='--',Name='6merpair'),\ # All pairwise alignments are computed with the Needleman-Wunsch algorithm. # More accurate but slower than --6merpair. Suitable for a set of globally # alignable sequences. Applicable to up to ~200 sequences. A combination # with --maxiterate 1000 is recommended (G-INS-i). Default: off # (6mer distance is used) '--globalpair':FlagParameter(Prefix='--',Name='globalpair'),\ # All pairwise alignments are computed with the Smith-Waterman algorithm. # More accurate but slower than --6merpair. Suitable for a set of locally # alignable sequences. Applicable to up to ~200 sequences. A combination # with --maxiterate 1000 is recommended (L-INS-i). Default: off # (6mer distance is used) '--localpair':FlagParameter(Prefix='--',Name='localpair'),\ # All pairwise alignments are computed with a local algorithm with the # generalized affine gap cost (Altschul 1998). More accurate but slower than # --6merpair. Suitable when large internal gaps are expected. Applicable to # up to ~200 sequences. 
    # A combination with --maxiterate 1000 is recommended
    # (E-INS-i). Default: off (6mer distance is used)
    '--genafpair':FlagParameter(Prefix='--',Name='genafpair'),\
    # All pairwise alignments are computed with FASTA (Pearson and Lipman 1988).
    # FASTA is required. Default: off (6mer distance is used)
    '--fastapair':FlagParameter(Prefix='--',Name='fastapair'),\
    # Weighting factor for the consistency term calculated from pairwise
    # alignments. Valid when either of --globalpair, --localpair, --genafpair,
    # --fastapair or --blastpair is selected. Default: 2.7
    '--weighti':ValuedParameter(Prefix='--',Name='weighti',Delimiter=' '),\
    # Guide tree is built number times in the progressive stage. Valid with
    # 6mer distance. Default: 2
    '--retree':ValuedParameter(Prefix='--',Name='retree',Delimiter=' '),\
    # number cycles of iterative refinement are performed. Default: 0
    '--maxiterate':ValuedParameter(Prefix='--',Name='maxiterate',\
        Delimiter=' '),\
    # Use FFT approximation in group-to-group alignment. Default: on
    '--fft':FlagParameter(Prefix='--',Name='fft'),\
    # Do not use FFT approximation in group-to-group alignment. Default: off
    '--nofft':FlagParameter(Prefix='--',Name='nofft'),\
    # Alignment score is not checked in the iterative refinement stage.
    # Default: off (score is checked)
    '--noscore':FlagParameter(Prefix='--',Name='noscore'),\
    # Use the Myers-Miller (1988) algorithm. Default: automatically turned on
    # when the alignment length exceeds 10,000 (aa/nt).
    '--memsave':FlagParameter(Prefix='--',Name='memsave'),\
    # Use a fast tree-building method (PartTree, Katoh and Toh 2007) with the
    # 6mer distance. Recommended when a large number (> ~10,000) of sequences
    # are input. Default: off
    '--parttree':FlagParameter(Prefix='--',Name='parttree'),\
    # The PartTree algorithm is used with distances based on DP. Slightly more
    # accurate and slower than --parttree. Recommended when a large number
    # (> ~10,000) of sequences are input. Default: off
    '--dpparttree':FlagParameter(Prefix='--',Name='dpparttree'),\
    # The PartTree algorithm is used with distances based on FASTA. Slightly
    # more accurate and slower than --parttree. Recommended when a large
    # number (> ~10,000) of sequences are input. FASTA is required.
    # Default: off
    '--fastaparttree':FlagParameter(Prefix='--',Name='fastaparttree'),\
    # The number of partitions in the PartTree algorithm. Default: 50
    '--partsize':ValuedParameter(Prefix='--',Name='partsize',Delimiter=' '),\
    # Do not make alignment larger than number sequences. Valid only with the
    # --*parttree options. Default: the number of input sequences
    '--groupsize':ValuedParameter(Prefix='--',Name='groupsize',Delimiter=' '),\
    # Parameter
    # Gap opening penalty at group-to-group alignment. Default: 1.53
    '--op':ValuedParameter(Prefix='--',Name='op',Delimiter=' '),\
    # Offset value, which works like gap extension penalty, for group-to-group
    # alignment. Default: 0.123
    '--ep':ValuedParameter(Prefix='--',Name='ep',Delimiter=' '),\
    # Gap opening penalty at local pairwise alignment. Valid when the
    # --localpair or --genafpair option is selected. Default: -2.00
    '--lop':ValuedParameter(Prefix='--',Name='lop',Delimiter=' '),\
    # Offset value at local pairwise alignment. Valid when the --localpair or
    # --genafpair option is selected. Default: 0.1
    '--lep':ValuedParameter(Prefix='--',Name='lep',Delimiter=' '),\
    # Gap extension penalty at local pairwise alignment. Valid when the
    # --localpair or --genafpair option is selected. Default: -0.1
    '--lexp':ValuedParameter(Prefix='--',Name='lexp',Delimiter=' '),\
    # Gap opening penalty to skip the alignment. Valid when the --genafpair
    # option is selected. Default: -6.00
    '--LOP':ValuedParameter(Prefix='--',Name='LOP',Delimiter=' '),\
    # Gap extension penalty to skip the alignment. Valid when the --genafpair
    # option is selected. Default: 0.00
    '--LEXP':ValuedParameter(Prefix='--',Name='LEXP',Delimiter=' '),\
    # BLOSUM number matrix (Henikoff and Henikoff 1992) is used. number=30, 45,
    # 62 or 80. Default: 62
    '--bl':ValuedParameter(Prefix='--',Name='bl',Delimiter=' '),\
    # JTT PAM number (Jones et al. 1992) matrix is used. number>0.
    # Default: BLOSUM62
    '--jtt':ValuedParameter(Prefix='--',Name='jtt',Delimiter=' '),\
    # Transmembrane PAM number (Jones et al. 1994) matrix is used. number>0.
    # Default: BLOSUM62
    '--tm':ValuedParameter(Prefix='--',Name='tm',Delimiter=' '),\
    # Use a user-defined AA scoring matrix. The format of matrixfile is the
    # same as that of BLAST. Ignored when nucleotide sequences are input.
    # Default: BLOSUM62
    '--aamatrix':ValuedParameter(Prefix='--',Name='aamatrix',Delimiter=' '),\
    # Incorporate the AA/nuc composition information into the scoring matrix.
    # Default: off
    '--fmodel':FlagParameter(Prefix='--',Name='fmodel'),\
    # Output
    # Output format: clustal format. Default: off (fasta format)
    '--clustalout':FlagParameter(Prefix='--',Name='clustalout'),\
    # Output order: same as input. Default: on
    '--inputorder':FlagParameter(Prefix='--',Name='inputorder'),\
    # Output order: aligned. Default: off (inputorder)
    '--reorder':FlagParameter(Prefix='--',Name='reorder'),\
    # Guide tree is output to the input.tree file. Default: off
    '--treeout':FlagParameter(Prefix='--',Name='treeout'),\
    # Do not report progress. Default: off
    '--quiet':FlagParameter(Prefix='--',Name='quiet'),\
    # Input
    # Assume the sequences are nucleotide. Default: auto
    '--nuc':FlagParameter(Prefix='--',Name='nuc'),\
    # Assume the sequences are amino acid. Default: auto
    '--amino':FlagParameter(Prefix='--',Name='amino'),\
    # Seed alignments given in alignment_n (fasta format) are aligned with
    # sequences in input. The alignment within every seed is preserved.
'--seed':ValuedParameter(Prefix='--',Name='seed',Delimiter=' '),\ } _parameters = {} _parameters.update(_options) _command = "mafft" _suppress_stderr=True def _input_as_seqs(self,data): lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _tree_out_filename(self): if self.Parameters['--treeout'].isOn(): tree_filename = self._absolute(str(self._input_filename))+'.tree' else: raise ValueError, "No tree output file specified." return tree_filename def _tempfile_as_multiline_string(self, data): """Write a multiline string to a temp file and return the filename. data: a multiline string to be written to a file. * Note: the result will be the filename as a FilePath object (which is a string subclass). """ filename = FilePath(self.getTmpFilename(self.TmpDir)) data_file = open(filename,'w') data_file.write(data) data_file.close() return filename def getHelp(self): """Method that points to the Mafft documentation.""" help_str = \ """ See Mafft documentation at: http://align.bmr.kyushu-u.ac.jp/mafft/software/manual/manual.html """ return help_str def _get_result_paths(self,data): result = {} if self.Parameters['--treeout'].isOn(): out_name = self._tree_out_filename() result['Tree'] = ResultPath(Path=out_name,IsWritten=True) return result def align_unaligned_seqs(seqs,moltype,params=None,accurate=False): """Aligns unaligned sequences seqs: either list of sequence objects or list of strings add_seq_names: boolean. if True, sequence names are inserted in the list of sequences. if False, it assumes seqs is a list of lines of some proper format that the program can handle """ #create SequenceCollection object from seqs seq_collection = SequenceCollection(seqs,MolType=moltype) #Create mapping between abbreviated IDs and full IDs int_map, int_keys = seq_collection.getIntMap() #Create SequenceCollection from int_map. 
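`_input_as_seqs` above numbers anonymous sequences 1, 2, 3, ... before handing them to MAFFT. Stripped of the application-controller machinery, the transformation is just the following (an illustrative sketch, not the controller method itself):

```python
def seqs_to_numbered_fasta(seqs):
    """Label raw sequence strings 1..N as FASTA records, as _input_as_seqs does."""
    lines = []
    for i, s in enumerate(seqs):
        lines.append(">%d" % (i + 1))  # sequences are numbered from 1
        lines.append(s)
    return "\n".join(lines)
```

For example, `seqs_to_numbered_fasta(["ACGU", "GGCC"])` yields `">1\nACGU\n>2\nGGCC"`, which is why the int-map machinery elsewhere in this module is needed to recover meaningful labels afterwards.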
int_map = SequenceCollection(int_map,MolType=moltype) #Create Mafft app. app = Mafft(InputHandler='_input_as_multiline_string',params=params) #Turn on correct moltype moltype_string = moltype.label.upper() app.Parameters[MOLTYPE_MAP[moltype_string]].on() #Do not report progress app.Parameters['--quiet'].on() #More accurate alignment, sacrificing performance. if accurate: app.Parameters['--globalpair'].on() app.Parameters['--maxiterate'].Value=1000 #Get results using int_map as input to app res = app(int_map.toFasta()) #Get alignment as dict out of results alignment = dict(MinimalFastaParser(res['StdOut'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): new_alignment[int_keys[k]]=v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment,MolType=moltype) #Clean up res.cleanUp() del(seq_collection,int_map,int_keys,app,res,alignment) return new_alignment def align_and_build_tree(seqs, moltype, best_tree=False, params={}): """Returns an alignment and a tree from Sequences object seqs. seqs: SequenceCollection object, or data that can be used to build one. best_tree: if True (default:False), uses a slower but more accurate algorithm to build the tree. params: dict of parameters to pass in to the Mafft app controller. The result will be a tuple containing an Alignment object and a cogent.core.tree.PhyloNode object (or None for the alignment and/or tree if either fails). """ #Current version of Mafft does not support tree building. raise NotImplementedError, """Current version of Mafft does not support tree building.""" def build_tree_from_alignment(aln, moltype, best_tree=False, params={},\ working_dir='/tmp'): """Returns a tree from Alignment object aln. aln: a cogent.core.alignment.Alignment object, or data that can be used to build one. best_tree: if True (default:False), uses a slower but more accurate algorithm to build the tree. 
NOTE: Mafft does not necessarily support best_tree option. Will only return guide tree used to align sequences. Passing best_tree = True will construct the guide tree 100 times instead of default 2 times. ***Mafft does allow you to get the guide tree back, but the IDs in the output guide tree do not match the original IDs in the fasta file and are impossible to map. Sent bug report to Mafft authors; possibly expect this option in future version.*** params: dict of parameters to pass in to the Mafft app controller. The result will be an cogent.core.tree.PhyloNode object, or None if tree fails. """ #Current version of Mafft does not support tree building. raise NotImplementedError, """Current version of Mafft does not support tree building.""" def add_seqs_to_alignment(seqs, aln, moltype, params=None, accurate=False): """Returns an Alignment object from seqs and existing Alignment. seqs: a cogent.core.sequence.Sequence object, or data that can be used to build one. aln: an cogent.core.alignment.Alignment object, or data that can be used to build one params: dict of parameters to pass in to the Mafft app controller. """ #create SequenceCollection object from seqs seq_collection = SequenceCollection(seqs,MolType=moltype) #Create mapping between abbreviated IDs and full IDs seq_int_map, seq_int_keys = seq_collection.getIntMap() #Create SequenceCollection from int_map. seq_int_map = SequenceCollection(seq_int_map,MolType=moltype) #create Alignment object from aln aln = Alignment(aln,MolType=moltype) #Create mapping between abbreviated IDs and full IDs aln_int_map, aln_int_keys = aln.getIntMap(prefix='seqn_') #Create SequenceCollection from int_map. aln_int_map = Alignment(aln_int_map,MolType=moltype) #Update seq_int_keys with aln_int_keys seq_int_keys.update(aln_int_keys) #Create Mafft app. 
app = Mafft(InputHandler='_input_as_multiline_string',\ params=params, SuppressStderr=True) #Turn on correct moltype moltype_string = moltype.label.upper() app.Parameters[MOLTYPE_MAP[moltype_string]].on() #Do not report progress app.Parameters['--quiet'].on() #Add aln_int_map as seed alignment app.Parameters['--seed'].on(\ app._tempfile_as_multiline_string(aln_int_map.toFasta())) #More accurate alignment, sacrificing performance. if accurate: app.Parameters['--globalpair'].on() app.Parameters['--maxiterate'].Value=1000 #Get results using int_map as input to app res = app(seq_int_map.toFasta()) #Get alignment as dict out of results alignment = dict(MinimalFastaParser(res['StdOut'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): key = k.replace('_seed_','') new_alignment[seq_int_keys[key]]=v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment,MolType=moltype) #Clean up res.cleanUp() remove(app.Parameters['--seed'].Value) del(seq_collection,seq_int_map,seq_int_keys,\ aln,aln_int_map,aln_int_keys,app,res,alignment) return new_alignment def align_two_alignments(aln1, aln2, moltype, params=None): """Returns an Alignment object from two existing Alignments. aln1, aln2: cogent.core.alignment.Alignment objects, or data that can be used to build them. - Mafft profile alignment only works with aligned sequences. Alignment object used to handle unaligned sequences. params: dict of parameters to pass in to the Mafft app controller. """ #create SequenceCollection object from seqs aln1 = Alignment(aln1,MolType=moltype) #Create mapping between abbreviated IDs and full IDs aln1_int_map, aln1_int_keys = aln1.getIntMap() #Create SequenceCollection from int_map. 
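`add_seqs_to_alignment` above strips the `_seed_` prefix MAFFT attaches to seed-alignment labels before mapping abbreviated IDs back to originals. That step in isolation (a hypothetical helper for illustration, not part of the controller):

```python
def strip_seed_marker(aligned, int_keys):
    """Drop the '_seed_' tag MAFFT adds to seed records, then restore labels.

    aligned: dict of MAFFT output labels -> aligned sequences.
    int_keys: map from abbreviated IDs back to original labels.
    """
    restored = {}
    for k, v in aligned.items():
        key = k.replace('_seed_', '')
        restored[int_keys.get(key, key)] = v
    return restored
```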
    aln1_int_map = Alignment(aln1_int_map,MolType=moltype)
    #create Alignment object from aln
    aln2 = Alignment(aln2,MolType=moltype)
    #Create mapping between abbreviated IDs and full IDs
    aln2_int_map, aln2_int_keys = aln2.getIntMap(prefix='seqn_')
    #Create SequenceCollection from int_map.
    aln2_int_map = Alignment(aln2_int_map,MolType=moltype)
    #Update aln1_int_keys with aln2_int_keys
    aln1_int_keys.update(aln2_int_keys)
    #Create Mafft app.
    app = Mafft(InputHandler='_input_as_paths',\
        params=params, SuppressStderr=False)
    app._command = 'mafft-profile'
    aln1_path = app._tempfile_as_multiline_string(aln1_int_map.toFasta())
    aln2_path = app._tempfile_as_multiline_string(aln2_int_map.toFasta())
    filepaths = [aln1_path,aln2_path]
    #Get results using int_map as input to app
    res = app(filepaths)
    #Get alignment as dict out of results
    alignment = dict(MinimalFastaParser(res['StdOut'].readlines()))
    #Make new dict mapping original IDs
    new_alignment = {}
    for k,v in alignment.items():
        key = k.replace('_seed_','')
        new_alignment[aln1_int_keys[key]]=v
    #Create an Alignment object from alignment dict
    new_alignment = Alignment(new_alignment,MolType=moltype)
    #Clean up
    res.cleanUp()
    remove(aln1_path)
    remove(aln2_path)
    remove('pre')
    remove('trace')
    del(aln1,aln1_int_map,aln1_int_keys,\
        aln2,aln2_int_map,aln2_int_keys,app,res,alignment)
    return new_alignment

PyCogent-1.5.3/cogent/app/mfold.py
#!/usr/bin/env python
"""Application controller for mfold v3.2 application

!!!To get mfold app.controller to work, TmpNameLen should be at most 10
(unknown reason).
"""
from cogent.app.util import CommandLineApplication,\
    CommandLineAppResult, ResultPath
from cogent.app.parameters import Parameter, ValuedParameter,Parameters
from random import choice

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class Mfold(CommandLineApplication):
    """Application controller for mfold 3.2 application"""

    #Not all parameters included!
    #skipped: NA_CONC,MG_CONC,LB_FR,ROT_ANG,START,STOP,REUSE
    _parameters = {
    'LC':ValuedParameter(Prefix='',Name='LC=',Value=None,Delimiter=''),
    'T':ValuedParameter(Prefix='',Name='T=',Value=None,Delimiter=''),
    'P':ValuedParameter(Prefix='',Name='P=',Value=None,Delimiter=''),
    'MAXBP':ValuedParameter(Prefix='',Name='MAXBP=',Value=None,Delimiter=''),
    'MAX':ValuedParameter(Prefix='',Name='MAX=',Value=30,Delimiter=''),
    'MAX_LP':ValuedParameter(Prefix='',Name='MAX_LP=',Value=None,Delimiter=''),
    'MAX_AS':ValuedParameter(Prefix='',Name='MAX_AS=',Value=None,Delimiter=''),
    'MODE':ValuedParameter(Prefix='',Name='MODE=',Value=None,Delimiter=''),
    }
    _command = 'mfold'
    _input_handler = '_input_as_string'

    def _input_as_string(self,filename):
        """
        mfold doesn't take full paths, so a tmp file is created in the working
        dir for mfold to read.
        """
        nr = choice(range(150))
        input_file = open(filename).readlines()
        filename = self._input_filename = 'mfold_in%d.txt' % nr
        data_file = open(filename,'w')
        data_to_file = '\n'.join([str(d).strip('\n') for d in input_file])
        data_file.write(data_to_file)
        data_file.close()
        data = '='.join(['SEQ',filename])
        return data

    def _input_as_lines(self,data):
        """
        Uses a fixed tmp filename since weird truncation of the generated
        filename sometimes occurred.
        """
        nr = choice(range(150))
        filename = self._input_filename = 'mfold_in%d.txt' % nr
        data_file = open(filename,'w')
        data_to_file = '\n'.join([str(d).strip('\n') for d in data])
        data_file.write(data_to_file)
        data_file.close()
        return '='.join(['SEQ',filename])

    def _get_result_paths(self,data):
        """Return a dict of ResultPath objects representing all possible output
        """
        result = {}
        itr=self.Parameters['MAX'].Value
        if itr is None:
            itr = 30
        filename=self._input_filename.split('/')[-1]
        for i in range(1,itr+1):
            try:
                ct = self.WorkingDir+filename+'_'+str(i)+'.ct'
                f = open(ct)
                f.close()
                result['ct'+str(i)] =\
                    ResultPath(Path=ct)
                pdf = self.WorkingDir+filename+'_'+str(i)+'.pdf'
                f = open(pdf)
                f.close()
                result['pdf'+str(i)] =\
                    ResultPath(Path=pdf)
            except IOError:
                pass
        result['ct_all'] =\
            ResultPath(Path=(self.WorkingDir+filename+'.ct'))
        name = self.WorkingDir+filename
        #output files
        files = ['log', 'ann', 'h-num', 'det', 'pnt', 'sav', 'ss-count',
                 '-local.seq', 'rnaml', 'out', 'plot', 'ps', '_1.ps', '_1.ss']
        for f in files:
            if f == '-local.seq':
                file = ''.join([name, f])
            elif f.startswith('_1'):
                file = ''.join([name, f])
            else:
                file = '.'.join([name, f])
            result['%s' % f] = ResultPath(Path=file)
        return result

PyCogent-1.5.3/cogent/app/mothur.py
#!/usr/bin/env python
"""Provides an application controller for the commandline version of mothur
Version 1.6.0
"""
from __future__ import with_statement
from operator import attrgetter
from os import path, getcwd, mkdir, remove, listdir, rmdir
import re
from shutil import copyfile
from subprocess import Popen
from tempfile import mkdtemp, NamedTemporaryFile
from cogent.app.parameters import ValuedParameter
from cogent.app.util import CommandLineApplication, ResultPath, \
    CommandLineAppResult, ApplicationError
from cogent.parse.mothur import parse_otu_list

__author__ = "Kyle Bittinger"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ =
["Kyle Bittinger"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kyle Bittinger" __email__ = "kylebittinger@gmail.com" __status__ = "Prototype" class Mothur(CommandLineApplication): """Mothur application controller """ _options = { # Clustering algorithm. Choices are furthest, nearest, and # average 'method': ValuedParameter( Name='method', Value='furthest', Delimiter='=', Prefix=''), # Cutoff distance for the distance matrix 'cutoff': ValuedParameter( Name='cutoff', Value=None, Delimiter='=', Prefix=''), # Minimum pairwise distance to consider for clustering 'precision': ValuedParameter( Name='precision', Value=None, Delimiter='=', Prefix=''), } _parameters = {} _parameters.update(_options) _input_handler = '_input_as_multiline_string' _command = 'mothur' def __init__(self, params=None, InputHandler=None, SuppressStderr=None, SuppressStdout=None, WorkingDir=None, TmpDir='/tmp', TmpNameLen=20, HALT_EXEC=False): """Initialize a Mothur application controller params: a dictionary mapping the Parameter id or synonym to its value (or None for FlagParameters or MixedParameters in flag mode) for Parameters that should be turned on InputHandler: this is the method to be run on data when it is passed into call. This should be a string containing the method name. The default is _input_as_string which casts data to a string before appending it to the command line argument SuppressStderr: if set to True, will route standard error to /dev/null, False by default SuppressStdout: if set to True, will route standard out to /dev/null, False by default WorkingDir: the directory where you want the application to run, default is the current working directory, but is useful to change in cases where the program being run creates output to its current working directory and you either don't want it to end up where you are running the program, or the user running the script doesn't have write access to the current working directory WARNING: WorkingDir MUST be an absolute path! 
TmpDir: the directory where temp files will be created, /tmp by default TmpNameLen: the length of the temp file name HALT_EXEC: if True, raises exception w/ command output just before execution, doesn't clean up temp files. Default False. """ super(Mothur, self).__init__( params=params, InputHandler=InputHandler, SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout, WorkingDir='', TmpDir='', TmpNameLen=TmpNameLen, HALT_EXEC=HALT_EXEC) # Prevent self.WorkingDir from being explicitly cast as a # FilePath object. This behavior does not seem necessary in # the parent's __init__() method, since the casting is # repeated in _set_WorkingDir(). if WorkingDir is not None: working_dir = WorkingDir else: working_dir = self._working_dir or getcwd() self.WorkingDir = working_dir self.TmpDir = TmpDir @staticmethod def getHelp(): """Returns link to online manual""" help = ( 'See manual, available on the MOTHUR wiki:\n' 'http://schloss.micro.umass.edu/mothur/' ) return help def __call__(self, data=None, remove_tmp=True): """Run the application with the specified kwargs on data data: anything that can be cast into a string or written out to a file. Usually either a list of things or a single string or number. input_handler will be called on this data before it is passed as part of the command-line argument, so by creating your own input handlers you can customize what kind of data you want your application to accept remove_tmp: if True, removes tmp files """ # Process the input data. 
Input filepath is stored in # self._input_filename getattr(self, self.InputHandler)(data) # Open temp files in 'w+' so they can be read back after seek(0) below if self.SuppressStdout: outfile = None else: outfile = open(self.getTmpFilename(self.TmpDir), 'w+') if self.SuppressStderr: errfile = None else: errfile = open(self.getTmpFilename(self.TmpDir), 'w+') args = [self._command, self._compile_mothur_script()] process = Popen( args, stdout=outfile, stderr=errfile, cwd=self.WorkingDir) exit_status = process.wait() if not self._accept_exit_status(exit_status): raise ApplicationError( 'Unacceptable application exit status: %s, command: %s' % \ (exit_status, args)) if outfile is not None: outfile.seek(0) if errfile is not None: errfile.seek(0) result = CommandLineAppResult( outfile, errfile, exit_status, result_paths=self._get_result_paths()) # Clean up the input file if one was created if remove_tmp: if self._input_filename: remove(self._input_filename) self._input_filename = None return result def _accept_exit_status(self, status): return int(status) == 0 def _compile_mothur_script(self): """Returns a Mothur batch script as a string""" def format_opts(*opts): """Formats a series of options for a Mothur script""" return ', '.join(filter(None, map(str, opts))) vars = { 'in': self._input_filename, 'unique': self._derive_unique_path(), 'dist': self._derive_dist_path(), 'names': self._derive_names_path(), 'cluster_opts' : format_opts( self.Parameters['method'], self.Parameters['cutoff'], self.Parameters['precision'], ), } script = ( '#' 'unique.seqs(fasta=%(in)s); ' 'dist.seqs(fasta=%(unique)s); ' 'read.dist(column=%(dist)s, name=%(names)s); ' 'cluster(%(cluster_opts)s)' % vars ) return script def _get_result_paths(self): paths = { 'distance matrix': self._derive_dist_path(), 'otu list': self._derive_list_path(), 'rank abundance': self._derive_rank_abundance_path(), 'species abundance': self._derive_species_abundance_path(), 'unique names': self._derive_names_path(), 'unique seqs': self._derive_unique_path(), 'log': self._derive_log_path(), }
return dict([(k, ResultPath(v)) for (k,v) in paths.items()]) # Methods to derive/guess output pathnames produced by MOTHUR. # TODO: test for input files that do not have a filetype extension def _derive_log_path(self): """Guess logfile path produced by Mothur This method checks the working directory for log files generated by Mothur. It will raise an ApplicationError if no log file can be found. Mothur generates log files named in a nondeterministic way, using the current time. We return the log file with the most recent time, although this may lead to incorrect log file detection if you are running many instances of mothur simultaneously. """ filenames = listdir(self.WorkingDir) lognames = [x for x in filenames if re.match("^mothur\.\d+\.logfile$", x)] if not lognames: raise ApplicationError( 'No log file detected in directory %s. Contents: \n\t%s' % ( self.WorkingDir, '\n\t'.join(filenames))) most_recent_logname = sorted(lognames, reverse=True)[0] return path.join(self.WorkingDir, most_recent_logname) def _derive_unique_path(self): """Guess unique sequences path produced by Mothur""" base, ext = path.splitext(self._input_filename) return '%s.unique%s' % (base, ext) def _derive_dist_path(self): """Guess distance matrix path produced by Mothur""" base, ext = path.splitext(self._input_filename) return '%s.unique.dist' % base def _derive_names_path(self): """Guess unique names file path produced by Mothur""" base, ext = path.splitext(self._input_filename) return '%s.names' % base def __get_method_abbrev(self): """Abbreviated form of clustering method parameter. Used to guess output filenames for MOTHUR.
""" abbrevs = { 'furthest': 'fn', 'nearest': 'nn', 'average': 'an', } if self.Parameters['method'].isOn(): method = self.Parameters['method'].Value else: method = self.Parameters['method'].Default return abbrevs[method] def _derive_list_path(self): """Guess otu list file path produced by Mothur""" base, ext = path.splitext(self._input_filename) return '%s.unique.%s.list' % (base, self.__get_method_abbrev()) def _derive_rank_abundance_path(self): """Guess rank abundance file path produced by Mothur""" base, ext = path.splitext(self._input_filename) return '%s.unique.%s.rabund' % (base, self.__get_method_abbrev()) def _derive_species_abundance_path(self): """Guess species abundance file path produced by Mothur""" base, ext = path.splitext(self._input_filename) return '%s.unique.%s.sabund' % (base, self.__get_method_abbrev()) def getTmpFilename(self, tmp_dir='/tmp', prefix='tmp', suffix='.txt'): """Returns a temporary filename Similar interface to tempfile.mktmp() """ # Override to change default constructor to str(). FilePath # objects muck up the Mothur script. return super(Mothur, self).getTmpFilename( tmp_dir=tmp_dir, prefix=prefix, suffix=suffix, result_constructor=str) # Temporary input file needs to be in the working directory, so we # override all input handlers. def _input_as_multiline_string(self, data): """Write multiline string to temp file, return filename data: a multiline string to be written to a file. 
""" self._input_filename = self.getTmpFilename(suffix='.fasta') with open(self._input_filename, 'w') as f: f.write(data) return self._input_filename def _input_as_lines(self, data): """Write sequence of lines to temp file, return filename data: a sequence to be written to a file, each element of the sequence will compose a line in the file * Note: '\n' will be stripped off the end of each sequence element before writing to a file in order to avoid multiple new lines accidentally be written to a file """ self._input_filename = self.getTmpFilename(suffix='.fasta') with open(self._input_filename, 'w') as f: # Use lazy iteration instead of list comprehension to # prevent reading entire file into memory for line in data: f.write(str(line).strip('\n')) f.write('\n') return self._input_filename def _input_as_path(self, data): """Copys the provided file to WorkingDir and returns the new filename data: path or filename """ self._input_filename = self.getTmpFilename(suffix='.fasta') copyfile(data, self._input_filename) return self._input_filename def _input_as_paths(self, data): raise NotImplementedError('Not applicable for MOTHUR controller.') def _input_as_string(self, data): raise NotImplementedError('Not applicable for MOTHUR controller.') # FilePath objects muck up the Mothur script, so we override the # property methods for self.WorkingDir def _get_WorkingDir(self): """Gets the working directory""" return self._curr_working_dir def _set_WorkingDir(self, path): """Sets the working directory """ self._curr_working_dir = path try: mkdir(self.WorkingDir) except OSError: # Directory already exists pass WorkingDir = property(_get_WorkingDir, _set_WorkingDir) def mothur_from_file(file): app = Mothur(InputHandler='_input_as_lines') result = app(file) # Force evaluation, so we can safely clean up files otus = list(parse_otu_list(result['otu list'])) result.cleanUp() return otus class MothurClassifySeqs(Mothur): _options = { 'reference': ValuedParameter( Name='reference', 
Value=None, Delimiter='=', Prefix=''), 'taxonomy': ValuedParameter( Name='taxonomy', Value=None, Delimiter='=', Prefix=''), 'cutoff': ValuedParameter( Name='cutoff', Value=None, Delimiter='=', Prefix=''), 'iters': ValuedParameter( Name='iters', Value=None, Delimiter='=', Prefix=''), 'ksize': ValuedParameter( Name='ksize', Value=None, Delimiter='=', Prefix=''), } _parameters = {} _parameters.update(_options) def _format_function_arguments(self, opts): """Format a series of function arguments in a Mothur script.""" params = [self.Parameters[x] for x in opts] return ', '.join(filter(None, map(str, params))) def _compile_mothur_script(self): """Returns a Mothur batch script as a string""" fasta = self._input_filename required_params = ["reference", "taxonomy"] for p in required_params: if self.Parameters[p].Value is None: raise ValueError("Must provide value for parameter %s" % p) optional_params = ["ksize", "cutoff", "iters"] args = self._format_function_arguments( required_params + optional_params) script = '#classify.seqs(fasta=%s, %s)' % (fasta, args) return script def _get_result_paths(self): input_base, ext = path.splitext(path.basename(self._input_filename)) result_by_suffix = { ".summary": "summary", ".taxonomy": "assignments", ".accnos": "accnos", } paths = {'log': self._derive_log_path()} input_dir = path.dirname(self._input_filename) for fn in listdir(input_dir): if fn.startswith(input_base): for suffix, result_key in result_by_suffix.items(): if fn.endswith(suffix): paths[result_key] = path.join(input_dir, fn) return dict([(k, ResultPath(v)) for (k,v) in paths.items()]) def parse_mothur_assignments(lines): for line in lines: line = line.strip() if not line: continue seq_id, _, assignment = line.partition("\t") toks = assignment.rstrip(";").split(";") lineage = [] conf = None for tok in toks: matchobj = re.match("(.+)\((\d+)\)$", tok) if matchobj: lineage.append(matchobj.group(1)) pct_conf = int(matchobj.group(2)) conf = pct_conf / 100.0 yield seq_id, 
lineage, conf def mothur_classify_file( query_file, ref_fp, tax_fp, cutoff=None, iters=None, ksize=None, output_fp=None): """Classify a set of sequences using Mothur's naive bayes method Dashes are used in Mothur to provide multiple filenames. A filepath with a dash typically breaks an otherwise valid command in Mothur. This wrapper script makes a copy of both files, ref_fp and tax_fp, to ensure that the path has no dashes. For convenience, we also ensure that each taxon list in the id-to-taxonomy file ends with a semicolon. """ tmp_ref_file = NamedTemporaryFile(suffix=".ref.fa") for line in open(ref_fp): tmp_ref_file.write(line) tmp_ref_file.seek(0) tmp_tax_file = NamedTemporaryFile(suffix=".tax.txt") for line in open(tax_fp): line = line.rstrip() if not line.endswith(";"): line = line + ";" tmp_tax_file.write(line) tmp_tax_file.write("\n") tmp_tax_file.seek(0) params = {"reference": tmp_ref_file.name, "taxonomy": tmp_tax_file.name} if cutoff is not None: params["cutoff"] = cutoff if ksize is not None: params["ksize"] = ksize if iters is not None: params["iters"] = iters app = MothurClassifySeqs(params, InputHandler='_input_as_lines') result = app(query_file) # Force evaluation so we can safely clean up files assignments = list(parse_mothur_assignments(result['assignments'])) result.cleanUp() if output_fp is not None: f = open(output_fp, "w") for query_id, taxa, conf in assignments: taxa_str = ";".join(taxa) f.write("%s\t%s\t%.2f\n" % (query_id, taxa_str, conf)) f.close() return None return dict((a, (b, c)) for a, b, c in assignments) PyCogent-1.5.3/cogent/app/msms.py000644 000765 000024 00000014015 12024702176 017632 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller and utility functions for MSMS (molecular surface calculation).""" import os import tempfile from cogent.app.util import CommandLineApplication, ResultPath from cogent.app.parameters import FlagParameter, ValuedParameter from cogent.format.xyzrn import XYZRNWriter from 
cogent.parse.msms import parse_VertFile __author__ = "Marcin Cieslik" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Marcin Cieslik"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Marcin Cieslik" __email__ = "mpc4p@virginia.edu" __status__ = "Development" class Msms(CommandLineApplication): """Application controller for MSMS. The default input is a ``Entity`` instance. Supported parameters: - probe_radius float : probe sphere radius, [1.5] - density float : surface points density, [1.0] - hdensity float : surface points high density, [3.0] - surface : triangulated or Analytical SES, [tses] - no_area : turns off the analytical surface area computation - noh : ignore atoms with radius 1.2 - no_rest_on_pbr : no restart if pb. during triangulation - no_rest : no restart if pb. are encountered - if filename : sphere input file - of filename : output for triangulated surface - af filename : area file - no_header : do not add comment line to the output - free_vertices : turns on computation for isolated RS vertices - all_components : compute all the surfaces components """ _parameters = { # -probe_radius float : probe sphere radius, [1.5] '-probe_radius':ValuedParameter(Prefix='-', Name='probe_radius', Delimiter=' '), # -density float : surface points density, [1.0] '-density':ValuedParameter(Prefix='-', Name='density', Delimiter=' '), # -hdensity float : surface points high density, [3.0] '-hdensity':ValuedParameter(Prefix='-', Name='hdensity', Delimiter=' '), # -surface : triangulated or Analytical SES, [tses] '-surface':ValuedParameter(Prefix='-', Name='surface', Delimiter=' '), # -no_area : turns off the analytical surface area computation '-no_area':FlagParameter(Prefix='-', Name='no_area', Value=False), # -noh : ignore atoms with radius 1.2 '-noh':FlagParameter(Prefix='-', Name='noh', Value=False), # -no_rest_on_pbr : no restart if pb. 
during triangulation '-no_rest_on_pbr':FlagParameter(Prefix='-', Name='no_rest_on_pbr', Value=False), # -no_rest : no restart if pb. are encountered '-no_rest':FlagParameter(Prefix='-', Name='no_rest', Value=False), # -if filename : sphere input file '-if':ValuedParameter(Prefix='-', Name='if', Delimiter=' ', IsPath=True), # -of filename : output for triangulated surface '-of':ValuedParameter(Prefix='-', Name='of', Delimiter=' ', IsPath=True), # -af filename : area file '-af':ValuedParameter(Prefix='-', Name='af', Delimiter=' ', IsPath=True), # -no_header : do not add comment line to the output '-no_header':FlagParameter(Prefix='-', Name='no_header', Value=False), # -free_vertices : turns on computation for isolated RS vertices '-free_vertices':FlagParameter(Prefix='-', Name='free_vertices', Value=False), # -all_components : compute all the surfaces components '-all_components':FlagParameter(Prefix='-', Name='all_components', Value=False), ####################### # -one_cavity #atoms at1 [at2][at3] : Compute the surface for an internal # cavity for which at least one atom is specified ####################### # -socketName servicename : socket connection from a client # -socketPort portNumber : socket connection from a client # -xdr : use xdr encoding over socket # -sinetd : inetd server connection } _command = "msms" _input_handler = '_input_from_entity' def _input_from_entity(self, data): """Allows feeding entities to msms.""" if data: # create temporary files and names.
fd, self._input_filename = tempfile.mkstemp() os.close(fd) # write XYZR data fh = open(self._input_filename, 'wb') XYZRNWriter(fh, data) fh.close() self.Parameters['-if'].on(self._input_filename) self.Parameters['-of'].on(self._input_filename) # msms appends .vert .face self.Parameters['-af'].on(self._input_filename) # msms appends .area if (not self.Parameters['-if'].isOn()) or \ (not self.Parameters['-of'].isOn()) or \ (not self.Parameters['-af'].isOn()): raise ValueError('All of -if, -of and -af have to be specified.') return "" def _get_result_paths(self, data): result = {} vert_file = self.Parameters['-of'].Value + '.vert' result['VertFile'] = ResultPath(Path=vert_file, IsWritten=True) face_file = self.Parameters['-of'].Value + '.face' result['FaceFile'] = ResultPath(Path=face_file, IsWritten=True) if not self.Parameters['-no_area'].Value: area_file = self.Parameters['-af'].Value + '.area' result['AreaFile'] = ResultPath(Path=area_file, IsWritten=True) return result def surface_xtra(entity, **kwargs): """Runs the MSMS application to create the molecular surface, which is a numpy array of 3D-coordinates. Arguments: * entity - an ``Entity`` instance. Additional keyword arguments are for the ``Msms`` application controller. Returns the numpy array and puts it into entity.xtra['MSMS_SURFACE'].
""" msms = Msms(**kwargs) res = msms(entity) surface = parse_VertFile(res['VertFile']) entity.xtra['MSMS_SURFACE'] = surface res.cleanUp() # remove all temporary files return surface PyCogent-1.5.3/cogent/app/muscle.py000644 000765 000024 00000071140 12024702176 020145 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for muscle 3.6 """ from os import remove from cogent.app.parameters import FlagParameter, ValuedParameter from cogent.app.util import CommandLineApplication, ResultPath, \ get_tmp_filename, guess_input_handler from random import choice from cogent.core.alignment import SequenceCollection, Alignment from cogent.parse.tree import DndParser from cogent.core.tree import PhyloNode from cogent.parse.fasta import MinimalFastaParser from cogent.util.warning import deprecated __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Zongzhi Liu", "Mike Robeson", "Catherine Lozupone", "Rob Knight", "Daniel McDonald", "Jeremy Widmann", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Prototype" class Muscle(CommandLineApplication): """Muscle application controller""" deprecated('class', 'cogent.app.muscle.Muscle', 'cogent.app.muscle_v38.Muscle', '1.6') _options ={ # Minimum spacing between anchor columns. [Integer] '-anchorspacing':ValuedParameter('-',Name='anchorspacing',Delimiter=' '), # Center parameter. Should be negative [Float] '-center':ValuedParameter('-',Name='center',Delimiter=' '), # Clustering method. cluster1 is used in iteration 1 # and 2, cluster2 in later iterations '-cluster1':ValuedParameter('-',Name='cluster1',Delimiter=' '), '-cluster2':ValuedParameter('-',Name='cluster2',Delimiter=' '), # Minimum length of diagonal. '-diaglength':ValuedParameter('-',Name='diaglength',Delimiter=' '), # Discard this many positions at ends of diagonal. 
'-diagmargin':ValuedParameter('-',Name='diagmargin',Delimiter=' '), # Distance measure for iteration 1. '-distance1':ValuedParameter('-',Name='distance1',Delimiter=' '), # Distance measure for iterations 2, 3 ... '-distance2':ValuedParameter('-',Name='distance2',Delimiter=' '), # The gap open score. Must be negative. '-gapopen':ValuedParameter('-',Name='gapopen',Delimiter=' '), # Window size for determining whether a region is hydrophobic. '-hydro':ValuedParameter('-',Name='hydro',Delimiter=' '), # Multiplier for gap open/close penalties in hydrophobic regions. '-hydrofactor':ValuedParameter('-',Name='hydrofactor',Delimiter=' '), # Where to find the input sequences. '-in':ValuedParameter('-',Name='in',Delimiter=' ', Quote="\""), '-in1':ValuedParameter('-',Name='in1',Delimiter=' ', Quote="\""), '-in2':ValuedParameter('-',Name='in2',Delimiter=' ', Quote="\""), # Log file name (delete existing file). '-log':ValuedParameter('-',Name='log',Delimiter=' '), # Log file name (append to existing file). '-loga':ValuedParameter('-',Name='loga',Delimiter=' '), # Maximum distance between two diagonals that allows them to merge # into one diagonal. '-maxdiagbreak':ValuedParameter('-',Name='maxdiagbreak',Delimiter=' '), # Maximum time to run in hours. The actual time may exceed the # requested limit by a few minutes. Decimals are allowed, so 1.5 # means one hour and 30 minutes. '-maxhours':ValuedParameter('-',Name='maxhours',Delimiter=' '), # Maximum number of iterations. '-maxiters':ValuedParameter('-',Name='maxiters',Delimiter=' '), # Maximum memory in Mb '-maxmb': ValuedParameter('-', Name='maxmb', Delimiter=' '), # Maximum number of new trees to build in iteration 2. '-maxtrees':ValuedParameter('-',Name='maxtrees',Delimiter=' '), # Minimum score a column must have to be an anchor. '-minbestcolscore':ValuedParameter('-',Name='minbestcolscore',Delimiter=' '), # Minimum smoothed score a column must have to be an anchor. 
'-minsmoothscore':ValuedParameter('-',Name='minsmoothscore',Delimiter=' '), # Objective score used by tree dependent refinement. # sp=sum-of-pairs score. # spf=sum-of-pairs score (dimer approximation) # spm=sp for < 100 seqs, otherwise spf # dp=dynamic programming score. # ps=average profile-sequence score. # xp=cross profile score. '-objscore':ValuedParameter('-',Name='objscore',Delimiter=' '), # Where to write the alignment. '-out':ValuedParameter('-',Name='out',Delimiter=' ', Quote="\""), # Where to write the file in phylip sequential format (v3.6 only). '-physout':ValuedParameter('-',Name='physout',Delimiter=' '), # Where to write the file in phylip interleaved format (v3.6 only). '-phyiout':ValuedParameter('-',Name='phyiout',Delimiter=' '), # Set to profile for aligning two alignments and adding seqs to an # existing alignment '-profile':FlagParameter(Prefix='-',Name='profile'), # Method used to root tree; root1 is used in iterations 1 and 2, root2 # in later iterations. '-root1':ValuedParameter('-',Name='root1',Delimiter=' '), '-root2':ValuedParameter('-',Name='root2',Delimiter=' '), # Sequence type. '-seqtype':ValuedParameter('-',Name='seqtype',Delimiter=' '), # Maximum value of column score for smoothing purposes. '-smoothscoreceil':ValuedParameter('-',Name='smoothscoreceil',Delimiter=' '), # Constant used in UPGMB clustering. Determines the relative fraction # of average linkage (SUEFF) vs. nearest-neighbor linkage (1 - SUEFF). '-SUEFF':ValuedParameter('-',Name='SUEFF',Delimiter=' '), # Save tree produced in first or second iteration to given file in # Newick (Phylip-compatible) format. '-tree1':ValuedParameter('-',Name='tree1',Delimiter=' ', Quote="\""), '-tree2':ValuedParameter('-',Name='tree2',Delimiter=' ', Quote="\""), # Sequence weighting scheme. # weight1 is used in iterations 1 and 2. # weight2 is used for tree-dependent refinement. # none=all sequences have equal weight. # henikoff=Henikoff & Henikoff weighting scheme.
# henikoffpb=Modified Henikoff scheme as used in PSI-BLAST. # clustalw=CLUSTALW method. # threeway=Gotoh three-way method. '-weight1':ValuedParameter('-',Name='weight1',Delimiter=' '), '-weight2':ValuedParameter('-',Name='weight2',Delimiter=' '), # Use anchor optimization in tree dependent refinement iterations '-anchors':FlagParameter(Prefix='-',Name='anchors'), # Write output in CLUSTALW format (default is FASTA). '-clw':FlagParameter(Prefix='-',Name='clw'), # Cluster sequences '-cluster':FlagParameter(Prefix='-',Name='cluster'), # neighborjoining is "unrecognized" #'-neighborjoining':FlagParameter(Prefix='-',Name='neighborjoining'), # Write output in CLUSTALW format with the "CLUSTAL W (1.81)" header # rather than the MUSCLE version. This is useful when a post-processing # step is picky about the file header. '-clwstrict':FlagParameter(Prefix='-',Name='clwstrict'), # Do not catch exceptions. '-core':FlagParameter(Prefix='-',Name='core'), # Write output in FASTA format. Alternatives include -clw, # -clwstrict, -msf and -html. '-fasta':FlagParameter(Prefix='-',Name='fasta'), # Group similar sequences together in the output. This is the default. # See also -stable. '-group':FlagParameter(Prefix='-',Name='group'), # Write output in HTML format (default is FASTA). '-html':FlagParameter(Prefix='-',Name='html'), # Use log-expectation profile score (VTML240). Alternatives are to use # -sp or -sv. This is the default for amino acid sequences. '-le':FlagParameter(Prefix='-',Name='le'), # Write output in MSF format (default is FASTA). '-msf':FlagParameter(Prefix='-',Name='msf'), # Disable anchor optimization. Default is -anchors. '-noanchors':FlagParameter(Prefix='-',Name='noanchors'), # Catch exceptions and give an error message if possible. '-nocore':FlagParameter(Prefix='-',Name='nocore'), # Do not display progress messages. '-quiet':FlagParameter(Prefix='-',Name='quiet'), # Input file is already aligned, skip first two iterations and begin # tree dependent refinement.
'-refine':FlagParameter(Prefix='-',Name='refine'), # Use sum-of-pairs protein profile score (PAM200). Default is -le. '-sp':FlagParameter(Prefix='-',Name='sp'), # Use sum-of-pairs nucleotide profile score (BLASTZ parameters). This # is the only option for nucleotides, and is therefore the default. '-spn':FlagParameter(Prefix='-',Name='spn'), # Preserve input order of sequences in output file. Default is to group # sequences by similarity (-group). '-stable':FlagParameter(Prefix='-',Name='stable'), # Use sum-of-pairs profile score (VTML240). Default is -le. '-sv':FlagParameter(Prefix='-',Name='sv'), # Diagonal optimization '-diags':FlagParameter(Prefix='-',Name='diags'), '-diags1':FlagParameter(Prefix='-',Name='diags1'), '-diags2':FlagParameter(Prefix='-',Name='diags2'), # Terminal gaps penalized with full penalty. # [1] Not fully supported in this version. '-termgapsfull':FlagParameter(Prefix='-',Name='termgapsfull'), # Terminal gaps penalized with half penalty. # [1] Not fully supported in this version. '-termgapshalf':FlagParameter(Prefix='-',Name='termgapshalf'), # Terminal gaps penalized with half penalty if gap relative to # longer sequence, otherwise with full penalty. # [1] Not fully supported in this version. '-termgapshalflonger':FlagParameter(Prefix='-',Name='termgapshalflonger'), # Write parameter settings and progress messages to log file. '-verbose':FlagParameter(Prefix='-',Name='verbose'), # Write version string to stdout and exit. '-version':FlagParameter(Prefix='-',Name='version'), } _parameters = {} _parameters.update(_options) _command = "muscle" def _input_as_seqs(self,data): lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. 
lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _input_as_lines(self,data): if data: self.Parameters['-in']\ .on(super(Muscle,self)._input_as_lines(data)) return '' def _input_as_string(self,data): """Makes data the value of a specific parameter This method returns the empty string. The parameter will be printed automatically once set. """ if data: self.Parameters['-in'].on(str(data)) return '' def _input_as_multiline_string(self, data): if data: self.Parameters['-in']\ .on(super(Muscle,self)._input_as_multiline_string(data)) return '' def _input_as_multifile(self, data): """For use with the -profile option This input handler expects data to be a tuple containing two filenames. Index 0 will be set to -in1 and index 1 to -in2 """ if data: try: filename1, filename2 = data except: raise ValueError, "Expected two filenames" self.Parameters['-in'].off() self.Parameters['-in1'].on(filename1) self.Parameters['-in2'].on(filename2) return '' def _align_out_filename(self): if self.Parameters['-out'].isOn(): aln_filename = self._absolute(str(self.Parameters['-out'].Value)) else: raise ValueError, "No output file specified." return aln_filename def _tree1_out_filename(self): if self.Parameters['-tree1'].isOn(): aln_filename = self._absolute(str(self.Parameters['-tree1'].Value)) else: raise ValueError, "No tree output file specified." return aln_filename def _tree2_out_filename(self): if self.Parameters['-tree2'].isOn(): tree_filename = self._absolute(str(self.Parameters['-tree2'].Value)) else: raise ValueError, "No tree output file specified." 
return tree_filename def _get_result_paths(self,data): result = {} if self.Parameters['-out'].isOn(): out_name = self._align_out_filename() result['MuscleOut'] = ResultPath(Path=out_name,IsWritten=True) if self.Parameters['-tree1'].isOn(): out_name = self._tree1_out_filename() result['Tree1Out'] = ResultPath(Path=out_name,IsWritten=True) if self.Parameters['-tree2'].isOn(): out_name = self._tree2_out_filename() result['Tree2Out'] = ResultPath(Path=out_name,IsWritten=True) return result def getHelp(self): """Muscle help""" help_str = """ """ return help_str #SOME FUNCTIONS TO EXECUTE THE MOST COMMON TASKS def muscle_seqs(seqs, add_seq_names=False, out_filename=None, input_handler=None, params={}, WorkingDir=None, SuppressStderr=None, SuppressStdout=None): """Muscle align list of sequences. seqs: a list of sequences as strings or objects, you must set add_seq_names=True or sequences in a multiline string, as read() from a fasta file or sequences in a list of lines, as readlines() from a fasta file or a fasta seq filename. == for eg, testcode for guessing #guess_input_handler should correctly identify input gih = guess_input_handler self.assertEqual(gih('abc.txt'), '_input_as_string') self.assertEqual(gih('>ab\nTCAG'), '_input_as_multiline_string') self.assertEqual(gih(['ACC','TGA'], True), '_input_as_seqs') self.assertEqual(gih(['>a','ACC','>b','TGA']), '_input_as_lines') == docstring for blast_seqs, apply to muscle_seqs == seqs: either file name or list of sequence objects or list of strings or single multiline string containing sequences. WARNING: DECISION RULES FOR INPUT HANDLING HAVE CHANGED. Decision rules for data are as follows. If it's a list, treat as lines, unless add_seq_names is true (in which case treat as list of seqs). If it's a string, test whether it has newlines. If it doesn't have newlines, assume it's a filename. If it does have newlines, it can't be a filename, so assume it's a multiline string containing sequences.
If you want to skip the detection and force a specific type of input handler, use input_handler='your_favorite_handler'. add_seq_names: boolean. if True, sequence names are inserted in the list of sequences. if False, it assumes seqs is a list of lines of some proper format that the program can handle Addl docs coming soon """ if out_filename: params["-out"] = out_filename #else: # params["-out"] = get_tmp_filename(WorkingDir) ih = input_handler or guess_input_handler(seqs, add_seq_names) muscle_app = Muscle( params=params, InputHandler=ih, WorkingDir=WorkingDir, SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout) return muscle_app(seqs) def cluster_seqs(seqs, neighbor_join=False, params={}, add_seq_names=True, WorkingDir=None, SuppressStderr=None, SuppressStdout=None, max_chars=1000000, max_hours=1.0, constructor=PhyloNode, clean_up=True ): """Muscle cluster list of sequences. seqs: either file name or list of sequence objects or list of strings or single multiline string containing sequences. Addl docs coming soon """ num_seqs = len(seqs) if num_seqs < 2: raise ValueError, "Muscle requires 2 or more sequences to cluster."
num_chars = sum(map(len, seqs)) if num_chars > max_chars: params["-maxiters"] = 2 params["-diags1"] = True params["-sv"] = True #params["-distance1"] = "kmer6_6" #params["-distance1"] = "kmer20_3" #params["-distance1"] = "kbit20_3" print "lots of chars, using fast align", num_chars params["-maxhours"] = max_hours #params["-maxiters"] = 10 #cluster_type = "upgmb" #if neighbor_join: # cluster_type = "neighborjoining" params["-cluster"] = True params["-tree1"] = get_tmp_filename(WorkingDir) muscle_res = muscle_seqs(seqs, params=params, add_seq_names=add_seq_names, WorkingDir=WorkingDir, SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout) tree = DndParser(muscle_res["Tree1Out"], constructor=constructor) if clean_up: muscle_res.cleanUp() return tree def aln_tree_seqs(seqs, input_handler=None, tree_type='neighborjoining', params={}, add_seq_names=True, WorkingDir=None, SuppressStderr=None, SuppressStdout=None, max_hours=5.0, constructor=PhyloNode, clean_up=True ): """Muscle align sequences and report tree from iteration 2. Unlike cluster_seqs, returns tree2 which is the tree made during the second muscle iteration (it should be more accurate than the cluster from the first iteration, which is built quickly from k-mer words) seqs: either file name or list of sequence objects or list of strings or single multiline string containing sequences.
tree_type: can be either neighborjoining (default) or upgmb for UPGMA clean_up: When true, will clean up output files """ params["-maxhours"] = max_hours if tree_type: params["-cluster2"] = tree_type params["-tree2"] = get_tmp_filename(WorkingDir) params["-out"] = get_tmp_filename(WorkingDir) muscle_res = muscle_seqs(seqs, input_handler=input_handler, params=params, add_seq_names=add_seq_names, WorkingDir=WorkingDir, SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout) tree = DndParser(muscle_res["Tree2Out"], constructor=constructor) aln = [line for line in muscle_res["MuscleOut"]] if clean_up: muscle_res.cleanUp() return tree, aln def fastest_aln_seqs(seqs, params={}, out_filename=None, add_seq_names=True, WorkingDir=None, SuppressStderr=None, SuppressStdout=None ): """Fastest (and least accurate) version of muscle seqs: either file name or list of sequence objects or list of strings or single multiline string containing sequences. Addl docs coming soon """ params["-maxiters"] = 1 params["-diags1"] = True params["-sv"] = True params["-distance1"] = "kbit20_3" muscle_res = muscle_seqs(seqs, params=params, add_seq_names=add_seq_names, out_filename=out_filename, WorkingDir=WorkingDir, SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout) return muscle_res def align_unaligned_seqs(seqs, moltype, params=None): """Returns an Alignment object from seqs. seqs: SequenceCollection object, or data that can be used to build one. moltype: a MolType object. DNA, RNA, or PROTEIN. params: dict of parameters to pass in to the Muscle app controller. Result will be an Alignment object. """ if not params: params = {} #create SequenceCollection object from seqs seq_collection = SequenceCollection(seqs,MolType=moltype) #Create mapping between abbreviated IDs and full IDs int_map, int_keys = seq_collection.getIntMap() #Create SequenceCollection from int_map. 
    int_map = SequenceCollection(int_map,MolType=moltype)
    #get temporary filename
    params.update({'-out':get_tmp_filename()})
    #Create Muscle app.
    app = Muscle(InputHandler='_input_as_multiline_string',\
        params=params)
    #Get results using int_map as input to app
    res = app(int_map.toFasta())
    #Get alignment as dict out of results
    alignment = dict(MinimalFastaParser(res['MuscleOut'].readlines()))
    #Make new dict mapping original IDs
    new_alignment = {}
    for k,v in alignment.items():
        new_alignment[int_keys[k]]=v
    #Create an Alignment object from alignment dict
    new_alignment = Alignment(new_alignment,MolType=moltype)
    #Clean up
    res.cleanUp()
    del(seq_collection,int_map,int_keys,app,res,alignment,params)
    return new_alignment


def align_and_build_tree(seqs, moltype, best_tree=False, params=None):
    """Returns an alignment and a tree from Sequences object seqs.

    seqs: a cogent.core.alignment.SequenceCollection object, or data that can
    be used to build one.

    moltype: cogent.core.moltype.MolType object

    best_tree: if True (default:False), uses a slower but more accurate
    algorithm to build the tree.

    params: dict of parameters to pass in to the Muscle app controller.

    The result will be a dict with keys 'Align' and 'Tree', containing a
    cogent.core.alignment.Alignment and a cogent.core.tree.PhyloNode object
    (or None for the alignment and/or tree if either fails).
    """
    aln = align_unaligned_seqs(seqs, moltype=moltype, params=params)
    tree = build_tree_from_alignment(aln, moltype, best_tree, params)
    return {'Align':aln, 'Tree':tree}


def build_tree_from_alignment(aln, moltype, best_tree=False, params=None):
    """Returns a tree from Alignment object aln.

    aln: a cogent.core.alignment.Alignment object, or data that can be used
    to build one.

    moltype: cogent.core.moltype.MolType object

    best_tree: unsupported

    params: dict of parameters to pass in to the Muscle app controller.

    The result will be a cogent.core.tree.PhyloNode object, or None if tree
    fails.
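The getIntMap round-trip used throughout these helpers (rename sequences to short IDs before calling muscle, then restore the originals from the returned output) can be sketched with plain dicts. `int_map` and `restore_names` are hypothetical helper names, not cogent's implementation.

```python
def int_map(seqs, prefix=''):
    # seqs: {original_name: sequence}. Returns ({short_id: sequence},
    # {short_id: original_name}) so external-program output can be relabeled.
    mapped, keys = {}, {}
    for i, name in enumerate(sorted(seqs)):
        short = '%s%d' % (prefix, i + 1)
        mapped[short] = seqs[name]
        keys[short] = name
    return mapped, keys

def restore_names(result, keys):
    # Map short IDs in an alignment dict back to the original names.
    return dict((keys[k], v) for k, v in result.items())
```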
""" # Create instance of app controller, enable tree, disable alignment app = Muscle(InputHandler='_input_as_multiline_string', params=params, \ WorkingDir='/tmp') app.Parameters['-cluster'].on() app.Parameters['-tree1'].on(get_tmp_filename(app.WorkingDir)) app.Parameters['-seqtype'].on(moltype.label) seq_collection = SequenceCollection(aln, MolType=moltype) #Create mapping between abbreviated IDs and full IDs int_map, int_keys = seq_collection.getIntMap() #Create SequenceCollection from int_map. int_map = SequenceCollection(int_map,MolType=moltype) # Collect result result = app(int_map.toFasta()) # Build tree tree = DndParser(result['Tree1Out'].read(), constructor=PhyloNode) for tip in tree.tips(): tip.Name = int_keys[tip.Name] # Clean up result.cleanUp() del(seq_collection, app, result) return tree def add_seqs_to_alignment(seqs, aln, params=None): """Returns an Alignment object from seqs and existing Alignment. seqs: a cogent.core.alignment.SequenceCollection object, or data that can be used to build one. aln: a cogent.core.alignment.Alignment object, or data that can be used to build one params: dict of parameters to pass in to the Muscle app controller. """ if not params: params = {} #create SequenceCollection object from seqs seqs_collection = SequenceCollection(seqs) #Create mapping between abbreviated IDs and full IDs seqs_int_map, seqs_int_keys = seqs_collection.getIntMap(prefix='seq_') #Create SequenceCollection from int_map. seqs_int_map = SequenceCollection(seqs_int_map) #create SequenceCollection object from aln aln_collection = SequenceCollection(aln) #Create mapping between abbreviated IDs and full IDs aln_int_map, aln_int_keys = aln_collection.getIntMap(prefix='aln_') #Create SequenceCollection from int_map. 
aln_int_map = SequenceCollection(aln_int_map) #set output and profile options params.update({'-out':get_tmp_filename(), '-profile':True}) #save seqs to tmp file seqs_filename = get_tmp_filename() seqs_out = open(seqs_filename,'w') seqs_out.write(seqs_int_map.toFasta()) seqs_out.close() #save aln to tmp file aln_filename = get_tmp_filename() aln_out = open(aln_filename, 'w') aln_out.write(aln_int_map.toFasta()) aln_out.close() #Create Muscle app and get results app = Muscle(InputHandler='_input_as_multifile', params=params) res = app((aln_filename, seqs_filename)) #Get alignment as dict out of results alignment = dict(MinimalFastaParser(res['MuscleOut'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): if k in seqs_int_keys: new_alignment[seqs_int_keys[k]] = v else: new_alignment[aln_int_keys[k]] = v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment) #Clean up res.cleanUp() del(seqs_collection, seqs_int_map, seqs_int_keys) del(aln_collection, aln_int_map, aln_int_keys) del(app, res, alignment, params) remove(seqs_filename) remove(aln_filename) return new_alignment def align_two_alignments(aln1, aln2, params=None): """Returns an Alignment object from two existing Alignments. aln1, aln2: cogent.core.alignment.Alignment objects, or data that can be used to build them. params: dict of parameters to pass in to the Muscle app controller. """ if not params: params = {} #create SequenceCollection object from aln1 aln1_collection = SequenceCollection(aln1) #Create mapping between abbreviated IDs and full IDs aln1_int_map, aln1_int_keys = aln1_collection.getIntMap(prefix='aln1_') #Create SequenceCollection from int_map. 
aln1_int_map = SequenceCollection(aln1_int_map) #create SequenceCollection object from aln2 aln2_collection = SequenceCollection(aln2) #Create mapping between abbreviated IDs and full IDs aln2_int_map, aln2_int_keys = aln2_collection.getIntMap(prefix='aln2_') #Create SequenceCollection from int_map. aln2_int_map = SequenceCollection(aln2_int_map) #set output and profile options params.update({'-out':get_tmp_filename(), '-profile':True}) #save aln1 to tmp file aln1_filename = get_tmp_filename() aln1_out = open(aln1_filename,'w') aln1_out.write(aln1_int_map.toFasta()) aln1_out.close() #save aln2 to tmp file aln2_filename = get_tmp_filename() aln2_out = open(aln2_filename, 'w') aln2_out.write(aln2_int_map.toFasta()) aln2_out.close() #Create Muscle app and get results app = Muscle(InputHandler='_input_as_multifile', params=params) res = app((aln1_filename, aln2_filename)) #Get alignment as dict out of results alignment = dict(MinimalFastaParser(res['MuscleOut'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): if k in aln1_int_keys: new_alignment[aln1_int_keys[k]] = v else: new_alignment[aln2_int_keys[k]] = v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment) #Clean up res.cleanUp() del(aln1_collection, aln1_int_map, aln1_int_keys) del(aln2_collection, aln2_int_map, aln2_int_keys) del(app, res, alignment, params) remove(aln1_filename) remove(aln2_filename) return new_alignment PyCogent-1.5.3/cogent/app/muscle_v38.py000644 000765 000024 00000070674 12024702176 020660 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for muscle 3.8 """ from os import remove from cogent.app.parameters import FlagParameter, ValuedParameter from cogent.app.util import CommandLineApplication, ResultPath, \ get_tmp_filename, guess_input_handler from random import choice from cogent.core.alignment import SequenceCollection, Alignment from cogent.parse.tree import DndParser 
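MinimalFastaParser, imported here, is used below as something that yields (label, sequence) pairs from FASTA lines. A stdlib-only sketch of that behaviour (an assumption inferred from how it is used, not cogent's actual parser):

```python
def parse_fasta(lines):
    # Yield (label, sequence) pairs from FASTA-formatted lines; multi-line
    # sequences are concatenated and the leading '>' is stripped from labels.
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if label is not None:
                yield label, ''.join(chunks)
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        yield label, ''.join(chunks)
```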
from cogent.core.tree import PhyloNode from cogent.parse.fasta import MinimalFastaParser __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Zongzhi Liu", "Mike Robeson", "Catherine Lozupone", "Rob Knight", "Daniel McDonald", "Jeremy Widmann", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Prototype" class Muscle(CommandLineApplication): """Muscle application controller""" _options ={ # Minimum spacing between anchor columns. [Integer] '-anchorspacing':ValuedParameter('-',Name='anchorspacing',Delimiter=' '), # Center parameter. Should be negative [Float] '-center':ValuedParameter('-',Name='center',Delimiter=' '), # Clustering method. cluster1 is used in iteration 1 # and 2, cluster2 in later iterations '-cluster1':ValuedParameter('-',Name='cluster1',Delimiter=' '), '-cluster2':ValuedParameter('-',Name='cluster2',Delimiter=' '), # Minimum length of diagonal. '-diaglength':ValuedParameter('-',Name='diaglength',Delimiter=' '), # Discard this many positions at ends of diagonal. '-diagmargin':ValuedParameter('-',Name='diagmargin',Delimiter=' '), # Distance measure for iteration 1. '-distance1':ValuedParameter('-',Name='distance1',Delimiter=' '), # Distance measure for iterations 2, 3 ... '-distance2':ValuedParameter('-',Name='distance2',Delimiter=' '), # The gap open score. Must be negative. '-gapopen':ValuedParameter('-',Name='gapopen',Delimiter=' '), # Window size for determining whether a region is hydrophobic. '-hydro':ValuedParameter('-',Name='hydro',Delimiter=' '), # Multiplier for gap open/close penalties in hydrophobic regions. '-hydrofactor':ValuedParameter('-',Name='hydrofactor',Delimiter=' '), # Where to find the input sequences. 
        '-in':ValuedParameter('-',Name='in',Delimiter=' ', Quote="\""),
        '-in1':ValuedParameter('-',Name='in1',Delimiter=' ', Quote="\""),
        '-in2':ValuedParameter('-',Name='in2',Delimiter=' ', Quote="\""),
        # Log file name (delete existing file).
        '-log':ValuedParameter('-',Name='log',Delimiter=' '),
        # Log file name (append to existing file).
        '-loga':ValuedParameter('-',Name='loga',Delimiter=' '),
        # Maximum distance between two diagonals that allows them to merge
        # into one diagonal.
        '-maxdiagbreak':ValuedParameter('-',Name='maxdiagbreak',Delimiter=' '),
        # Maximum time to run in hours. The actual time may exceed the
        # requested limit by a few minutes. Decimals are allowed, so 1.5
        # means one hour and 30 minutes.
        '-maxhours':ValuedParameter('-',Name='maxhours',Delimiter=' '),
        # Maximum number of iterations.
        '-maxiters':ValuedParameter('-',Name='maxiters',Delimiter=' '),
        # Maximum memory in Mb
        '-maxmb': ValuedParameter('-', Name='maxmb', Delimiter=' '),
        # Maximum number of new trees to build in iteration 2.
        '-maxtrees':ValuedParameter('-',Name='maxtrees',Delimiter=' '),
        # Minimum score a column must have to be an anchor.
        '-minbestcolscore':ValuedParameter('-',Name='minbestcolscore',Delimiter=' '),
        # Minimum smoothed score a column must have to be an anchor.
        '-minsmoothscore':ValuedParameter('-',Name='minsmoothscore',Delimiter=' '),
        # Objective score used by tree dependent refinement.
        # sp=sum-of-pairs score.
        # spf=sum-of-pairs score (dimer approximation)
        # spm=sp for < 100 seqs, otherwise spf
        # dp=dynamic programming score.
        # ps=average profile-sequence score.
        # xp=cross profile score.
        '-objscore':ValuedParameter('-',Name='objscore',Delimiter=' '),
        # Where to write the alignment.
        '-out':ValuedParameter('-',Name='out',Delimiter=' ', Quote="\""),
        # Where to write the file in phylip sequential format (v3.6 only).
        '-physout':ValuedParameter('-',Name='physout',Delimiter=' '),
        # Where to write the file in phylip interleaved format (v3.6 only).
'-phyiout':ValuedParameter('-',Name='phyiout',Delimiter=' '), # Set to profile for aligning two alignments and adding seqs to an # existing alignment '-profile':FlagParameter(Prefix='-',Name='profile'), # Method used to root tree; root1 is used in iteration 1 and 2, root2 # in later iterations. '-root1':ValuedParameter('-',Name='root1',Delimiter=' '), '-root2':ValuedParameter('-',Name='root2',Delimiter=' '), # Sequence type. '-seqtype':ValuedParameter('-',Name='seqtype',Delimiter=' '), # Maximum value of column score for smoothing purposes. '-smoothscoreceil':ValuedParameter('-',Name='smoothscoreceil',Delimiter=' '), # Constant used in UPGMB clustering. Determines the relative fraction # of average linkage (SUEFF) vs. nearest-neighbor linkage (1 . SUEFF). '-SUEFF':ValuedParameter('-',Name='SUEFF',Delimiter=' '), # Save tree produced in first or second iteration to given file in # Newick (Phylip-compatible) format. '-tree1':ValuedParameter('-',Name='tree1',Delimiter=' ', Quote="\""), '-tree2':ValuedParameter('-',Name='tree2',Delimiter=' ', Quote="\""), # Sequence weighting scheme. # weight1 is used in iterations 1 and 2. # weight2 is used for tree-dependent refinement. # none=all sequences have equal weight. # henikoff=Henikoff & Henikoff weighting scheme. # henikoffpb=Modified Henikoff scheme as used in PSI-BLAST. # clustalw=CLUSTALW method. # threeway=Gotoh three-way method. '-weight1':ValuedParameter('-',Name='weight1',Delimiter=' '), '-weight2':ValuedParameter('-',Name='weight2',Delimiter=' '), # Use anchor optimization in tree dependent refinement iterations '-anchors':FlagParameter(Prefix='-',Name='anchors'), # Write output in CLUSTALW format (default is FASTA). 
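The ValuedParameter/FlagParameter entries defined here render onto the muscle command line once switched on. A stdlib sketch of that rendering, assumed from the Prefix/Name/Delimiter/Quote constructor arguments rather than taken from cogent's implementation:

```python
def render_valued(name, value, prefix='-', delimiter=' ', quote=''):
    # A valued parameter renders as e.g. "-maxiters 2", or with quoting
    # as '-in "seqs.fasta"' when Quote='"' (as for -in/-out above).
    return '%s%s%s%s%s%s' % (prefix, name, delimiter, quote, value, quote)

def render_flag(name, prefix='-'):
    # A flag parameter carries no value: just "-quiet", "-clw", ...
    return '%s%s' % (prefix, name)
```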
'-clw':FlagParameter(Prefix='-',Name='clw'), # Cluster sequences '-clusteronly':FlagParameter(Prefix='-',Name='clusteronly'), # neighborjoining is "unrecognized" #'-neighborjoining':FlagParameter(Prefix='-',Name='neighborjoining'), # Write output in CLUSTALW format with the "CLUSTAL W (1.81)" header # rather than the MUSCLE version. This is useful when a post-processing # step is picky about the file header. '-clwstrict':FlagParameter(Prefix='-',Name='clwstrict'), # Do not catch exceptions. '-core':FlagParameter(Prefix='-',Name='core'), # Write output in FASTA format. Alternatives include .clw, # .clwstrict, .msf and .html. '-fasta':FlagParameter(Prefix='-',Name='fasta'), # Group similar sequences together in the output. This is the default. # See also .stable. '-group':FlagParameter(Prefix='-',Name='group'), # Write output in HTML format (default is FASTA). '-html':FlagParameter(Prefix='-',Name='html'), # Use log-expectation profile score (VTML240). Alternatives are to use # -sp or -sv. This is the default for amino acid sequences. '-le':FlagParameter(Prefix='-',Name='le'), # Write output in MSF format (default is FASTA). '-msf':FlagParameter(Prefix='-',Name='msf'), # Disable anchor optimization. Default is -anchors. '-noanchors':FlagParameter(Prefix='-',Name='noanchors'), # Catch exceptions and give an error message if possible. '-nocore':FlagParameter(Prefix='-',Name='nocore'), # Do not display progress messages. '-quiet':FlagParameter(Prefix='-',Name='quiet'), # Input file is already aligned, skip first two iterations and begin # tree dependent refinement. '-refine':FlagParameter(Prefix='-',Name='refine'), # Use sum-of-pairs protein profile score (PAM200). Default is -le. '-sp':FlagParameter(Prefix='-',Name='sp'), # Use sum-of-pairs nucleotide profile score (BLASTZ parameters). This # is the only option for nucleotides, and is therefore the default. '-spn':FlagParameter(Prefix='-',Name='spn'), # Preserve input order of sequences in output file. 
Default is to group # sequences by similarity (-group). '-stable':FlagParameter(Prefix='-',Name='stable'), # Use sum-of-pairs profile score (VTML240). Default is -le. '-sv':FlagParameter(Prefix='-',Name='sv'), # Diagonal optimization '-diags':FlagParameter(Prefix='-',Name='diags'), '-diags1':FlagParameter(Prefix='-',Name='diags1'), '-diags2':FlagParameter(Prefix='-',Name='diags2'), # Terminal gaps penalized with full penalty. # [1] Not fully supported in this version. '-termgapsfull':FlagParameter(Prefix='-',Name='termgapsfull'), # Terminal gaps penalized with half penalty. # [1] Not fully supported in this version. '-termgapshalf':FlagParameter(Prefix='-',Name='termgapshalf'), # Terminal gaps penalized with half penalty if gap relative to # longer sequence, otherwise with full penalty. # [1] Not fully supported in this version. '-termgapshalflonger':FlagParameter(Prefix='-',Name='termgapshalflonger'), # Write parameter settings and progress messages to log file. '-verbose':FlagParameter(Prefix='-',Name='verbose'), # Write version string to stdout and exit. '-version':FlagParameter(Prefix='-',Name='version'), } _parameters = {} _parameters.update(_options) _command = "muscle" def _input_as_seqs(self,data): lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _input_as_lines(self,data): if data: self.Parameters['-in']\ .on(super(Muscle,self)._input_as_lines(data)) return '' def _input_as_string(self,data): """Makes data the value of a specific parameter This method returns the empty string. The parameter will be printed automatically once set. 
""" if data: self.Parameters['-in'].on(str(data)) return '' def _input_as_multiline_string(self, data): if data: self.Parameters['-in']\ .on(super(Muscle,self)._input_as_multiline_string(data)) return '' def _input_as_multifile(self, data): """For use with the -profile option This input handler expects data to be a tuple containing two filenames. Index 0 will be set to -in1 and index 1 to -in2 """ if data: try: filename1, filename2 = data except: raise ValueError, "Expected two filenames" self.Parameters['-in'].off() self.Parameters['-in1'].on(filename1) self.Parameters['-in2'].on(filename2) return '' def _align_out_filename(self): if self.Parameters['-out'].isOn(): aln_filename = self._absolute(str(self.Parameters['-out'].Value)) else: raise ValueError, "No output file specified." return aln_filename def _tree1_out_filename(self): if self.Parameters['-tree1'].isOn(): aln_filename = self._absolute(str(self.Parameters['-tree1'].Value)) else: raise ValueError, "No tree output file specified." return aln_filename def _tree2_out_filename(self): if self.Parameters['-tree2'].isOn(): tree_filename = self._absolute(str(self.Parameters['-tree2'].Value)) else: raise ValueError, "No tree output file specified." 
return tree_filename def _get_result_paths(self,data): result = {} if self.Parameters['-out'].isOn(): out_name = self._align_out_filename() result['MuscleOut'] = ResultPath(Path=out_name,IsWritten=True) if self.Parameters['-tree1'].isOn(): out_name = self._tree1_out_filename() result['Tree1Out'] = ResultPath(Path=out_name,IsWritten=True) if self.Parameters['-tree2'].isOn(): out_name = self._tree2_out_filename() result['Tree2Out'] = ResultPath(Path=out_name,IsWritten=True) return result def getHelp(self): """Muscle help""" help_str = """ """ return help_str #SOME FUNCTIONS TO EXECUTE THE MOST COMMON TASKS def muscle_seqs(seqs, add_seq_names=False, out_filename=None, input_handler=None, params={}, WorkingDir=None, SuppressStderr=None, SuppressStdout=None): """Muscle align list of sequences. seqs: a list of sequences as strings or objects, you must set add_seq_names=True or sequences in a multiline string, as read() from a fasta file or sequences in a list of lines, as readlines() from a fasta file or a fasta seq filename. == for eg, testcode for guessing #guess_input_handler should correctly identify input gih = guess_input_handler self.assertEqual(gih('abc.txt'), '_input_as_string') self.assertEqual(gih('>ab\nTCAG'), '_input_as_multiline_string') self.assertEqual(gih(['ACC','TGA'], True), '_input_as_seqs') self.assertEqual(gih(['>a','ACC','>b','TGA']), '_input_as_lines') == docstring for blast_seqs, apply to muscle_seqs == seqs: either file name or list of sequence objects or list of strings or single multiline string containing sequences. WARNING: DECISION RULES FOR INPUT HANDLING HAVE CHANGED. Decision rules for data are as follows. If it's s list, treat as lines, unless add_seq_names is true (in which case treat as list of seqs). If it's a string, test whether it has newlines. If it doesn't have newlines, assume it's a filename. If it does have newlines, it can't be a filename, so assume it's a multiline string containing sequences. 
    If you want to skip the detection and force a specific type of input
    handler, use input_handler='your_favorite_handler'.

    add_seq_names: boolean. If True, sequence names are inserted in the list
    of sequences. If False, it assumes seqs is a list of lines in some proper
    format that the program can handle.

    Addl docs coming soon
    """
    if out_filename:
        params["-out"] = out_filename
    #else:
    #    params["-out"] = get_tmp_filename(WorkingDir)

    ih = input_handler or guess_input_handler(seqs, add_seq_names)

    muscle_app = Muscle(params=params,
                        InputHandler=ih,
                        WorkingDir=WorkingDir,
                        SuppressStderr=SuppressStderr,
                        SuppressStdout=SuppressStdout)
    return muscle_app(seqs)


def cluster_seqs(seqs, neighbor_join=False, params={}, add_seq_names=True,
                 WorkingDir=None, SuppressStderr=None, SuppressStdout=None,
                 max_chars=1000000, max_hours=1.0, constructor=PhyloNode,
                 clean_up=True):
    """Muscle cluster list of sequences.

    seqs: either file name or list of sequence objects or list of strings or
    single multiline string containing sequences.

    Addl docs coming soon
    """
    num_seqs = len(seqs)
    if num_seqs < 2:
        raise ValueError, "Muscle requires 2 or more sequences to cluster."
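The decision rules quoted in the muscle_seqs docstring above can be sketched as a standalone function. `guess_handler` is a hypothetical stand-in for cogent's guess_input_handler, written only to match the stated rules and the doctest-style examples in the docstring.

```python
def guess_handler(seqs, add_seq_names=False):
    # A list is treated as lines, unless add_seq_names says it is a list
    # of raw sequences that still need names inserted.
    if isinstance(seqs, list):
        return '_input_as_seqs' if add_seq_names else '_input_as_lines'
    # A string with newlines cannot be a filename: multiline sequence data.
    if '\n' in seqs:
        return '_input_as_multiline_string'
    # A plain string without newlines is assumed to be a filename.
    return '_input_as_string'
```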
    num_chars = sum(map(len, seqs))
    if num_chars > max_chars:
        params["-maxiters"] = 2
        params["-diags1"] = True
        params["-sv"] = True
        #params["-distance1"] = "kmer6_6"
        #params["-distance1"] = "kmer20_3"
        #params["-distance1"] = "kbit20_3"
        print "lots of chars, using fast align", num_chars
    params["-maxhours"] = max_hours
    #params["-maxiters"] = 10
    #cluster_type = "upgmb"
    #if neighbor_join:
    #    cluster_type = "neighborjoining"
    params["-clusteronly"] = True
    params["-tree1"] = get_tmp_filename(WorkingDir)
    muscle_res = muscle_seqs(seqs,
                             params=params,
                             add_seq_names=add_seq_names,
                             WorkingDir=WorkingDir,
                             SuppressStderr=SuppressStderr,
                             SuppressStdout=SuppressStdout)
    tree = DndParser(muscle_res["Tree1Out"], constructor=constructor)
    if clean_up:
        muscle_res.cleanUp()
    return tree


def aln_tree_seqs(seqs, input_handler=None, tree_type='neighborjoining',
                  params={}, add_seq_names=True, WorkingDir=None,
                  SuppressStderr=None, SuppressStdout=None, max_hours=5.0,
                  constructor=PhyloNode, clean_up=True):
    """Muscle align sequences and report tree from iteration2.

    Unlike cluster_seqs, returns tree2, which is the tree made during the
    second muscle iteration (it should be more accurate than the cluster from
    the first iteration, which is built quickly from k-mer words).

    seqs: either file name or list of sequence objects or list of strings or
    single multiline string containing sequences.
tree_type: can be either neighborjoining (default) or upgmb for UPGMA clean_up: When true, will clean up output files """ params["-maxhours"] = max_hours if tree_type: params["-cluster2"] = tree_type params["-tree2"] = get_tmp_filename(WorkingDir) params["-out"] = get_tmp_filename(WorkingDir) muscle_res = muscle_seqs(seqs, input_handler=input_handler, params=params, add_seq_names=add_seq_names, WorkingDir=WorkingDir, SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout) tree = DndParser(muscle_res["Tree2Out"], constructor=constructor) aln = [line for line in muscle_res["MuscleOut"]] if clean_up: muscle_res.cleanUp() return tree, aln def fastest_aln_seqs(seqs, params={}, out_filename=None, add_seq_names=True, WorkingDir=None, SuppressStderr=None, SuppressStdout=None ): """Fastest (and least accurate) version of muscle seqs: either file name or list of sequence objects or list of strings or single multiline string containing sequences. Addl docs coming soon """ params["-maxiters"] = 1 params["-diags1"] = True params["-sv"] = True params["-distance1"] = "kbit20_3" muscle_res = muscle_seqs(seqs, params=params, add_seq_names=add_seq_names, out_filename=out_filename, WorkingDir=WorkingDir, SuppressStderr=SuppressStderr, SuppressStdout=SuppressStdout) return muscle_res def align_unaligned_seqs(seqs, moltype, params=None): """Returns an Alignment object from seqs. seqs: SequenceCollection object, or data that can be used to build one. moltype: a MolType object. DNA, RNA, or PROTEIN. params: dict of parameters to pass in to the Muscle app controller. Result will be an Alignment object. """ if not params: params = {} #create SequenceCollection object from seqs seq_collection = SequenceCollection(seqs,MolType=moltype) #Create mapping between abbreviated IDs and full IDs int_map, int_keys = seq_collection.getIntMap() #Create SequenceCollection from int_map. 
    int_map = SequenceCollection(int_map,MolType=moltype)
    #get temporary filename
    params.update({'-out':get_tmp_filename()})
    #Create Muscle app.
    app = Muscle(InputHandler='_input_as_multiline_string',\
        params=params)
    #Get results using int_map as input to app
    res = app(int_map.toFasta())
    #Get alignment as dict out of results
    alignment = dict(MinimalFastaParser(res['MuscleOut'].readlines()))
    #Make new dict mapping original IDs
    new_alignment = {}
    for k,v in alignment.items():
        new_alignment[int_keys[k]]=v
    #Create an Alignment object from alignment dict
    new_alignment = Alignment(new_alignment,MolType=moltype)
    #Clean up
    res.cleanUp()
    del(seq_collection,int_map,int_keys,app,res,alignment,params)
    return new_alignment


def align_and_build_tree(seqs, moltype, best_tree=False, params=None):
    """Returns an alignment and a tree from Sequences object seqs.

    seqs: a cogent.core.alignment.SequenceCollection object, or data that can
    be used to build one.

    moltype: cogent.core.moltype.MolType object

    best_tree: if True (default:False), uses a slower but more accurate
    algorithm to build the tree.

    params: dict of parameters to pass in to the Muscle app controller.

    The result will be a dict with keys 'Align' and 'Tree', containing a
    cogent.core.alignment.Alignment and a cogent.core.tree.PhyloNode object
    (or None for the alignment and/or tree if either fails).
    """
    aln = align_unaligned_seqs(seqs, moltype=moltype, params=params)
    tree = build_tree_from_alignment(aln, moltype, best_tree, params)
    return {'Align':aln, 'Tree':tree}


def build_tree_from_alignment(aln, moltype, best_tree=False, params=None):
    """Returns a tree from Alignment object aln.

    aln: a cogent.core.alignment.Alignment object, or data that can be used
    to build one.

    moltype: cogent.core.moltype.MolType object

    best_tree: unsupported

    params: dict of parameters to pass in to the Muscle app controller.

    The result will be a cogent.core.tree.PhyloNode object, or None if tree
    fails.
""" # Create instance of app controller, enable tree, disable alignment app = Muscle(InputHandler='_input_as_multiline_string', params=params, \ WorkingDir='/tmp') app.Parameters['-clusteronly'].on() app.Parameters['-tree1'].on(get_tmp_filename(app.WorkingDir)) app.Parameters['-seqtype'].on(moltype.label) seq_collection = SequenceCollection(aln, MolType=moltype) #Create mapping between abbreviated IDs and full IDs int_map, int_keys = seq_collection.getIntMap() #Create SequenceCollection from int_map. int_map = SequenceCollection(int_map,MolType=moltype) # Collect result result = app(int_map.toFasta()) # Build tree tree = DndParser(result['Tree1Out'].read(), constructor=PhyloNode) for tip in tree.tips(): tip.Name = int_keys[tip.Name] # Clean up result.cleanUp() del(seq_collection, app, result) return tree def add_seqs_to_alignment(seqs, aln, params=None): """Returns an Alignment object from seqs and existing Alignment. seqs: a cogent.core.alignment.SequenceCollection object, or data that can be used to build one. aln: a cogent.core.alignment.Alignment object, or data that can be used to build one params: dict of parameters to pass in to the Muscle app controller. """ if not params: params = {} #create SequenceCollection object from seqs seqs_collection = SequenceCollection(seqs) #Create mapping between abbreviated IDs and full IDs seqs_int_map, seqs_int_keys = seqs_collection.getIntMap(prefix='seq_') #Create SequenceCollection from int_map. seqs_int_map = SequenceCollection(seqs_int_map) #create SequenceCollection object from aln aln_collection = SequenceCollection(aln) #Create mapping between abbreviated IDs and full IDs aln_int_map, aln_int_keys = aln_collection.getIntMap(prefix='aln_') #Create SequenceCollection from int_map. 
aln_int_map = SequenceCollection(aln_int_map) #set output and profile options params.update({'-out':get_tmp_filename(), '-profile':True}) #save seqs to tmp file seqs_filename = get_tmp_filename() seqs_out = open(seqs_filename,'w') seqs_out.write(seqs_int_map.toFasta()) seqs_out.close() #save aln to tmp file aln_filename = get_tmp_filename() aln_out = open(aln_filename, 'w') aln_out.write(aln_int_map.toFasta()) aln_out.close() #Create Muscle app and get results app = Muscle(InputHandler='_input_as_multifile', params=params) res = app((aln_filename, seqs_filename)) #Get alignment as dict out of results alignment = dict(MinimalFastaParser(res['MuscleOut'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): if k in seqs_int_keys: new_alignment[seqs_int_keys[k]] = v else: new_alignment[aln_int_keys[k]] = v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment) #Clean up res.cleanUp() del(seqs_collection, seqs_int_map, seqs_int_keys) del(aln_collection, aln_int_map, aln_int_keys) del(app, res, alignment, params) remove(seqs_filename) remove(aln_filename) return new_alignment def align_two_alignments(aln1, aln2, params=None): """Returns an Alignment object from two existing Alignments. aln1, aln2: cogent.core.alignment.Alignment objects, or data that can be used to build them. params: dict of parameters to pass in to the Muscle app controller. """ if not params: params = {} #create SequenceCollection object from aln1 aln1_collection = SequenceCollection(aln1) #Create mapping between abbreviated IDs and full IDs aln1_int_map, aln1_int_keys = aln1_collection.getIntMap(prefix='aln1_') #Create SequenceCollection from int_map. 
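The two-temp-file pattern used by add_seqs_to_alignment and align_two_alignments (write each collection to its own FASTA file, then hand both paths to the -profile run via _input_as_multifile) can be sketched with the stdlib. `write_fasta` is a hypothetical helper, not cogent's get_tmp_filename/toFasta machinery.

```python
import os
import tempfile

def write_fasta(seqs):
    # Write {name: sequence} to a temporary FASTA file and return its path;
    # the caller is responsible for removing it, as the functions above do
    # with remove() after the muscle run completes.
    fd, path = tempfile.mkstemp(suffix='.fasta')
    with os.fdopen(fd, 'w') as fh:
        for name in sorted(seqs):
            fh.write('>%s\n%s\n' % (name, seqs[name]))
    return path
```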
aln1_int_map = SequenceCollection(aln1_int_map) #create SequenceCollection object from aln2 aln2_collection = SequenceCollection(aln2) #Create mapping between abbreviated IDs and full IDs aln2_int_map, aln2_int_keys = aln2_collection.getIntMap(prefix='aln2_') #Create SequenceCollection from int_map. aln2_int_map = SequenceCollection(aln2_int_map) #set output and profile options params.update({'-out':get_tmp_filename(), '-profile':True}) #save aln1 to tmp file aln1_filename = get_tmp_filename() aln1_out = open(aln1_filename,'w') aln1_out.write(aln1_int_map.toFasta()) aln1_out.close() #save aln2 to tmp file aln2_filename = get_tmp_filename() aln2_out = open(aln2_filename, 'w') aln2_out.write(aln2_int_map.toFasta()) aln2_out.close() #Create Muscle app and get results app = Muscle(InputHandler='_input_as_multifile', params=params) res = app((aln1_filename, aln2_filename)) #Get alignment as dict out of results alignment = dict(MinimalFastaParser(res['MuscleOut'].readlines())) #Make new dict mapping original IDs new_alignment = {} for k,v in alignment.items(): if k in aln1_int_keys: new_alignment[aln1_int_keys[k]] = v else: new_alignment[aln2_int_keys[k]] = v #Create an Alignment object from alignment dict new_alignment = Alignment(new_alignment) #Clean up res.cleanUp() del(aln1_collection, aln1_int_map, aln1_int_keys) del(aln2_collection, aln2_int_map, aln2_int_keys) del(app, res, alignment, params) remove(aln1_filename) remove(aln2_filename) return new_alignment PyCogent-1.5.3/cogent/app/nupack.py000644 000765 000024 00000014151 12024702176 020135 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for NUPACK v1.2 application """ import shutil from os import remove, system, environ from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath, FilePath, ApplicationError from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters, _find_synonym, is_not_None __author__ = 
"Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" if 'NUPACK_DATA' not in environ: raise RuntimeError, \ 'NUPACK app controller requires the NUPACK_DATA environment variable' nupack_data_dir = environ['NUPACK_DATA'] nupack_data_dna = 'dataS_G.dna' nupack_data_rna = 'dataS_G.rna' class Nupack(CommandLineApplication): """Application controller for Nupack_1.2 application Predicts RNA secondary structure All pseudoknot-free secondary structures are allowed, as well as simple pseudoknots. """ _command = 'Fold.out' _input_handler = '_input_as_string' def _input_as_lines(self,data): """ Write a seq of lines to a temp file and return the filename string data: a sequence to be written to a file, each element of the sequence will compose a line in the file Note: '\n' will be stripped off the end of each sequence element before writing to a file in order to avoid multiple new lines accidentally be written to a file """ filename = self._input_filename = self.getTmpFilename(self.WorkingDir) data_file = open(filename,'w') data_to_file = '\n'.join([str(d).strip('\n') for d in data]) data_file.write(data_to_file) data_file.write('\n') #needs a new line att the end of input data_file.close() return filename def _get_result_paths(self,data): """Return a dict of ResultPath objects representing all possible output This dictionary will have keys based on the name that you'd like to access the file by in the CommandLineAppResult object that will be created, and the values which are ResultPath objects. 
""" result = {} try: f = open((self.WorkingDir+'out.pair')) f.close() result['pair'] =\ ResultPath(Path=(self.WorkingDir+'out.pair')) except IOError: pass try: f = open((self.WorkingDir+'out.ene')) f.close() result['ene'] =\ ResultPath(Path=(self.WorkingDir+'out.ene')) except IOError: pass return result def __call__(self,data=None, remove_tmp=True): """Run the application with the specified kwargs on data data: anything that can be cast into a string or written out to a file. Usually either a list of things or a single string or number. input_handler will be called on this data before it is passed as part of the command-line argument, so by creating your own input handlers you can customize what kind of data you want your application to accept remove_tmp: if True, removes tmp files """ input_handler = self.InputHandler suppress_stdout = self.SuppressStdout suppress_stderr = self.SuppressStderr if suppress_stdout: outfile = FilePath('/dev/null') else: outfile = self.getTmpFilename(self.TmpDir) if suppress_stderr: errfile = FilePath('/dev/null') else: errfile = FilePath(self.getTmpFilename(self.TmpDir)) if data is None: input_arg = '' else: input_arg = getattr(self,input_handler)(data) # Build up the command, consisting of a BaseCommand followed by # input and output (file) specifications command = self._command_delimiter.join(filter(None,\ [self.BaseCommand,str(input_arg),'>',str(outfile),'2>',\ str(errfile)])) if self.HaltExec: raise AssertionError, "Halted exec with command:\n" + command # copy over data files nupack_data_dna_src = '/'.join([nupack_data_dir, nupack_data_dna]) nupack_data_rna_src = '/'.join([nupack_data_dir, nupack_data_rna]) shutil.copy(nupack_data_dna_src, self.WorkingDir) shutil.copy(nupack_data_rna_src, self.WorkingDir) # The return value of system is a 16-bit number containing the signal # number that killed the process, and then the exit status. 
# We only want to keep the exit status so do a right bitwise shift to # get rid of the signal number byte # NOTE: we copy the data files to the working directory first exit_status = system(command) >> 8 # remove data files nupack_data_dna_dst = ''.join([self.WorkingDir, nupack_data_dna]) nupack_data_rna_dst = ''.join([self.WorkingDir, nupack_data_rna]) remove(nupack_data_dna_dst) remove(nupack_data_rna_dst) # Determine if error should be raised due to exit status of # appliciation if not self._accept_exit_status(exit_status): raise ApplicationError, \ 'Unacceptable application exit status: %s, command: %s'\ % (str(exit_status),command) # open the stdout and stderr if not being suppressed out = None if not suppress_stdout: out = open(outfile,"r") err = None if not suppress_stderr: err = open(errfile,"r") result = CommandLineAppResult(out,err,exit_status,\ result_paths=self._get_result_paths(data)) # Clean up the input file if one was created if remove_tmp: if self._input_filename: remove(self._input_filename) self._input_filename = None return result PyCogent-1.5.3/cogent/app/parameters.py000644 000765 000024 00000045003 12024702176 021017 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """ Provides Parameter, FlagParameter, ValuedParameter, MixedParameter, Parameters. 
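The wait-status decoding used in `__call__` above can be sketched in isolation. This is a standalone illustration of POSIX `os.system` semantics, not cogent code:

```python
import os

# On POSIX, os.system returns a 16-bit wait status: the high byte holds the
# exit code and the low byte encodes the signal that terminated the process.
status = os.system('exit 3')
exit_code = status >> 8      # keep only the exit-status byte
killed_by = status & 0x7f    # terminating signal number, 0 if exited normally
```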
These are intended to be used with the Application class and its subclasses """ from cogent.util.misc import MappedDict, FunctionWrapper from copy import deepcopy __author__ = "Rob Knight" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Sandra Smit", "Greg Caporaso", "Rob Knight"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Sandra Smit" __email__ = "sandra.smit@colorado.edu" __status__ = "Development" def is_not_None(x): """Returns True if x is not None""" return x is not None class ParameterError(ValueError): """Error raised when field in parameter is bad""" pass class FilePath(str): """ Hold paths for proper handling Paths in this sense are filenames, directory paths, or filepaths. Some examples include: file.txt ./path/to/file.txt ./path/to/dir/ /path/to/file.txt . / The purpose of this class is to allow all paths to be handled the same since they sometimes need to be treated differently than simple strings. For example, if a path has a space in it, and it is being passed to system, it needs to be wrapped in quotes. But, you wouldn't want it as a string wrapped in quotes b/c, e.g., isabs('"/absolute/path"') == False, b/c the first char is a ", not a /. * This would make more sense to call Path, but that conflicts with the ResultPath.Path attribute. I'm not sure what to do about this and want to see what others think. Once finalized, a global replace should take care of making the switch. """ def __new__(cls,path): try: return str.__new__(cls, path.strip('"')) except AttributeError: return str.__new__(cls,'') def __str__(self): """ wrap self in quotes, or return the empty string if self == '' """ if self == '': return '' return ''.join(['"',self,'"']) def __add__(self,other): return FilePath(''.join([self,other])) class Parameter(object): """Stores information regarding a parameter to an application. An abstract class. 
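The quoting behavior described for FilePath can be shown with a minimal standalone rendition (the class name `QuotedPath` is hypothetical; it mirrors the stripped-on-construction, quoted-on-render design, but is not the cogent class):

```python
from os.path import isabs

class QuotedPath(str):
    """Store the bare path; add double quotes only when rendered for the shell.

    Keeping the stored value unquoted means os.path functions such as
    isabs() still behave correctly on the object itself.
    """
    def __new__(cls, path):
        # strip any surrounding double quotes on construction
        return str.__new__(cls, path.strip('"'))

    def __str__(self):
        # wrap in quotes so paths with spaces survive shell interpolation
        return '' if self == '' else ''.join(['"', self, '"'])

p = QuotedPath('/tmp/my data.txt')
assert isabs(p)                     # bare value: still recognized as absolute
assert not isabs('"' + p + '"')     # quoted string: no longer absolute
```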
""" def __init__(self,Prefix,Name,Value=None,Delimiter=None,\ Quote=None,IsPath=None): """Initialize the Parameter object. Prefix: the character(s) preceding the name of the parameter (eg. '-' for a '-a' parameter) Name: the name of the parameter (eg. 'a' for a '-a' parameter) Value: the value of the parameter (eg. '9' in a '-t=9' parameter) The value is also used in subclasses to turn parameters on and off Delimiter: the character separating the identifier and the value, (eg. '=' for a '-t=9' command or ' ' for a '-t 9' parameter) Quote: the character to use when quoting the value (eg. "\"" for a '-l="hello" parameter). At this point asymmetrical quotes are not possible (ie. [4]) WARNING: You must escape the quote in most cases. IsPath: boolean indicating whether Value is a file path, and should therefore be cast to a FilePath object WARNING: Don't set Quote='"' and set IsPath=True. This would result in two sets of double quotes being wrapped around the path when it is printed, and the application would most likely fail. We explicitly disallow this with: if self.IsPath and self.Quote == '"': self.Quote = None Id: The combination of Prefix and Name is called the identifier (Id) of the parameter. (eg. '-a' for a '-a' parameter, or '-t' for a '-t=9' parameter) This is intended to be an abstract class and has no use otherwise. To subclass Parameter, the subclass should implement the following methods: __str__(): returns the parameter as a string when turned on, or as an empty string when turned off on(): turns the parameter on isOn(): return True if a parameter is on, otherwise False off(): turns the parameter off isOff(): return True if a parameter is off, otherwise False Whether a parameter is on or off can be specified in different ways in subclasses, your isOn() and isOff() methods should define this. 
Optionally you can overwrite __init__, but you should be sure to either call the superclass init or handle the setting of the self._default attribute (or things will break!) """ self.Name = Name self.Prefix = Prefix self.Delimiter = Delimiter self.Quote = Quote self.Value = Value self.IsPath = IsPath if self.IsPath and self.Quote == '"': self.Quote = None def _get_id(self): """Construct and return the identifier""" return ''.join(map(str,filter(is_not_None,[self.Prefix,self.Name]))) Id = property(_get_id) def __eq__(self,other): """Return True if two parameters are equal""" return (self.IsPath == other.IsPath) and\ (self.Name == other.Name) and\ (self.Prefix == other.Prefix) and\ (self.Delimiter == other.Delimiter) and \ (self.Quote == other.Quote) and \ (self.Value == other.Value) def __ne__(self,other): """Return True if two parameters are not equal to each other""" return not self == other class FlagParameter(Parameter): """Stores information regarding a flag parameter to an application""" def __init__(self,Prefix,Name,Value=False): """Initialize a FlagParameter object Prefix: the character(s) preceding the name of the parameter (eg. '-' for a '-a' parameter) Name: the name of the parameter (eg. 'a' for a '-a' parameter) Value: determines whether the flag is turned on or not; should be True to turn on, or False to turn off, False by default Id: The combination of Prefix and Name is called the identifier (Id) of the parameter. (eg. '-a' for a '-a' parameter, or '-t' for a '-t=9' parameter) Usage: f = FlagParameter(Prefix='-',Name='a') Parameter f is turned off by default, so it won't be used by the application until turned on. or f = FlagParameter(Prefix='+',Name='d',Value=True) Parameter f is turned on. It will be used by the application. """ super(FlagParameter,self).__init__(Name=Name,Prefix=Prefix,\ Value=Value,Delimiter=None,Quote=None) def __str__(self): """Return the parameter as a string. 
When turned on: string representation of the parameter When turned off: empty string """ if self.isOff(): return '' else: return ''.join(map(str,[self.Prefix,self.Name])) def isOn(self): """Returns True if the FlagParameter is turned on. A FlagParameter is turned on if its Value is True or evaluates to True. A FlagParameter is turned off if its Value is False or evaluates to False. """ if self.Value: return True return False def isOff(self): """Returns True if the parameter is turned off A FlagParameter is turned on if its Value is True or evaluates to True. A FlagParameter is turned off if its Value is False or evaluates to False. """ return not self.isOn() def on(self): """Turns the FlagParameter ON by setting its Value to True""" self.Value = True def off(self): """Turns the FlagParameter OFF by setting its Value to False""" self.Value = False class ValuedParameter(Parameter): """Stores information regarding a valued parameter to an application""" def __init__(self,Prefix,Name,Value=None,Delimiter=None,Quote=None,\ IsPath=False): """Initialize a ValuedParameter object. Prefix: the character(s) preceding the name of the parameter (eg. '-' for a '-a' parameter) Name: the name of the parameter (eg. 'a' for a '-a' parameter) Value: the value of the parameter (eg. '9' in a '-t=9' parameter) Delimiter: the character separating the identifier and the value, (eg. '=' for a '-t=9' command or ' ' for a '-t 9' parameter) Quote: the character to use when quoting the value (eg. "\"" for a '-l="hello" parameter). At this point asymmetrical quotes are not possible (ie. [4]) WARNING: You must escape the quote in most cases. IsPath: boolean indicating whether Value is a file path, and should therefore be cast to a FilePath object Id: The combination of Prefix and Name is called the identifier (Id) of the parameter. (eg. 
'-a' for a '-a' parameter, or '-t' for a '-t=9' parameter) Default: the default value of the parameter; this is defined as what is passed into init for Value and can not be changed after object initialization Usage: v = ValuedParameter(Prefix='-',Name='a',Delimiter=' ',Value=3) the parameter is turned on by default (value=3) and will be used by the application as '-a 3'. or v = ValuedParameter(Prefix='-',Name='d',Delimiter='=') the parameter is turned off by default and won't be used by the application unless turned on with some value. """ if IsPath and Value: Value=FilePath(Value) super(ValuedParameter,self).__init__(Name=Name,Prefix=Prefix,\ Value=Value,Delimiter=Delimiter,Quote=Quote,IsPath=IsPath) self._default = Value def __str__(self): """Return the parameter as a string When turned on: string representation of the parameter When turned off: empty string """ if self.isOff(): return '' else: parts = [self.Prefix,self.Name,self.Delimiter,\ self.Quote,self.Value,self.Quote] return ''.join(map(str,filter(is_not_None,parts))) def __eq__(self,other): """Return True if two parameters are equal""" return (self.Name == other.Name) and\ (self.Prefix == other.Prefix) and\ (self.Delimiter == other.Delimiter) and \ (self.Quote == other.Quote) and \ (self.Value == other.Value) and\ (self._default == other._default) def _get_default(self): """Get the default value of the ValuedParameter Accessed as a property to avoid the user changing this after initialization. """ return self._default Default = property(_get_default) def reset(self): """Reset Value of the ValuedParameter to the default""" self.Value = self._default def isOn(self): """Returns True if the ValuedParameter is turned on A ValuedParameter is turned on if its Value is not None. A ValuedParameter is turned off if its Value is None. """ if self.Value is not None: return True return False def isOff(self): """Returns True if the ValuedParameter is turned off A ValuedParameter is turned on if its Value is not None. 
A ValuedParameter is turned off if its Value is None. """ return not self.isOn() def on(self,val): """Turns the ValuedParameter ON by setting its Value to val An attempt to turn the parameter on with value 'None' will result in an error, since this is the same as turning the parameter off. """ if val is None: raise ParameterError,\ "Turning the ValuedParameter on with value None is the same as "+\ "turning it off. Use another value." elif self.IsPath: self.Value = FilePath(val) else: self.Value = val def off(self): """Turns the ValuedParameter OFF by setting its Value to None""" self.Value = None class MixedParameter(ValuedParameter): """Stores information regarding a mixed parameter to an application A mixed parameter is a mix between a FlagParameter and a ValuedParameter. When its Value is False, the parameter will be turned off. When its Value is set to None, the parameter will behave like a flag. When its Value is set to anything but None or False, it will behave like a ValuedParameter. Example: RNAfold [-d[0|1]] You can give either '-d' or '-d0' or '-d1' as input. """ def __init__(self,Prefix,Name,Value=False,Delimiter=None,Quote=None,\ IsPath=False): """Initialize a MixedParameter object Prefix: the character(s) preceding the name of the parameter (eg. '-' for a '-a' parameter) Name: the name of the parameter (eg. 'a' for a '-a' parameter) Value: the value of the parameter (eg. '9' in a '-t=9' parameter) Delimiter: the character separating the identifier and the value, (eg. '=' for a '-t=9' command or ' ' for a '-t 9' parameter) Quote: the character to use when quoting the value (eg. "\"" for a '-l="hello" parameter). At this point asymmetrical quotes are not possible (ie. [4]) WARNING: You must escape the quote in most cases. IsPath: boolean indicating whether Value is a file path, and should therefore be cast to a FilePath object Id: The combination of Prefix and Name is called the identifier (Id) of the parameter. (eg. 
'-a' for a '-a' parameter, or '-t' for a '-t=9' parameter) Default: the default value of the parameter; this is defined as what is passed into init for Value and can not be changed after object initialization Usage: m = MixedParameter(Prefix='-',Name='a',Delimiter=' ',Value=3) the parameter is turned on by default (value=3) and will be used by the application as '-a 3'. or m = MixedParameter(Prefix='-',Name='d',Delimiter='=',Value=None) the parameter is turned on by default as a flag parameter and will be used by the application as '-d'. or m = MixedParameter(Prefix='-',Name='d',Delimiter='=') the parameter is turned off by default (Value=False) and won't be used by the application unless turned on with some value. """ if IsPath and Value: Value=FilePath(Value) super(MixedParameter,self).__init__(Name=Name,Prefix=Prefix,\ Value=Value,Delimiter=Delimiter,Quote=Quote,IsPath=IsPath) def __str__(self): """Return the parameter as a string When turned on: string representation of the parameter When turned off: empty string """ if self.isOff(): return '' elif self.Value is None: return ''.join(map(str,[self.Prefix,self.Name])) else: parts = [self.Prefix,self.Name,self.Delimiter,\ self.Quote,self.Value,self.Quote] return ''.join(map(str,filter(is_not_None,parts))) def isOn(self): """Returns True if the MixedParameter is turned on A MixedParameter is turned on if its Value is not False. A MixedParameter is turned off if its Value is False. A MixedParameter is used as flag when its Value is None. A MixedParameter is used as ValuedParameter when its Value is anything but None or False. """ if self.Value is not False: return True return False def isOff(self): """Returns True if the MixedParameter is turned off A MixedParameter is turned on if its Value is not False. A MixedParameter is turned off if its Value is False. 
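The MixedParameter rules above amount to a three-way switch on Value. A standalone sketch of the rendering logic (a simplification that omits Quote and IsPath handling, not the cogent class itself):

```python
def render_mixed(prefix, name, value, delimiter=''):
    # MixedParameter semantics: False -> off, None -> bare flag,
    # any other value -> identifier + delimiter + value.
    if value is False:
        return ''
    if value is None:
        return prefix + name
    return '%s%s%s%s' % (prefix, name, delimiter, value)
```

For RNAfold's `-d[0|1]` option this yields `''`, `'-d'`, `'-d0'`, or `'-d1'` depending on the stored value.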
""" return not self.isOn() def on(self,val=None): """Turns the MixedParameter ON by setting its Value to val An attempt to turn the parameter on with value 'False' will result in an error, since this is the same as turning the parameter off. Turning the MixedParameter ON without a value or with value 'None' will let the parameter behave as a flag. """ if val is False: raise ParameterError,\ "Turning the ValuedParameter on with value False is the same as "+\ "turning it off. Use another value." elif self.IsPath: self.Value = FilePath(val) else: self.Value = val def off(self): """Turns the MixedParameter OFF by setting its Value to False""" self.Value = False def _find_synonym(synonyms): """ Returns function to lookup a key in synonyms dictionary. Inteded for use by the Parameters object. """ def check_key(key): if key in synonyms: return synonyms[key] return key return check_key class Parameters(MappedDict): """Parameters is a dictionary of Parameter objects. Parameters provides a mask that lets the user lookup and access parameters by its synonyms. """ def __init__(self,parameters={},synonyms={}): """Initialize the Parameters object. parameters: a dictionary of Parameter objects keyed by their identifier synonyms: a dictionary of synonyms. Keys are synonyms, values are parameter identifiers. 
""" mask = FunctionWrapper(_find_synonym(synonyms)) super(Parameters,self).__init__(data=deepcopy(parameters),Mask=mask) self.__setitem__ = self.setdefault = self.update =\ self.__delitem__ = self._raiseNotImplemented def _raiseNotImplemented(self,*args): """Raises an error for an attempt to change a Parameters object""" raise NotImplementedError, 'Parameters object is immutable' def all_off(self): """Turns all parameters in the dictionary off""" for v in self.values(): v.off() PyCogent-1.5.3/cogent/app/parsinsert.py000644 000765 000024 00000006421 12024702176 021047 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for ParsInsert designed for ParsInsert v1.03 """ __author__ = "Jesse Stombaugh" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Jesse Stombaugh"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Stombaugh" __email__ = "jesse.stombaugh@colorado.edu" __status__ = "Production" from cogent.app.parameters import ValuedParameter, FlagParameter, \ MixedParameter from cogent.app.util import CommandLineApplication, FilePath, system, \ CommandLineAppResult, ResultPath, remove, ApplicationError from cogent.core.tree import PhyloNode from cogent.parse.tree import DndParser from cogent.core.moltype import DNA, RNA, PROTEIN from cogent.core.alignment import SequenceCollection,Alignment from os.path import splitext, join,abspath from cogent.parse.phylip import get_align_for_phylip from StringIO import StringIO class ParsInsert(CommandLineApplication): """ParsInsert application Controller""" _command = 'ParsInsert' _input_handler = '_input_as_multiline_string' _parameters = { # read mask from this file '-m':ValuedParameter('-',Name='m',Delimiter=' '), # read core tree sequences from this file '-s':ValuedParameter('-',Name='s',Delimiter=' '), # read core tree from this file '-t':ValuedParameter('-',Name='t',Delimiter=' '), # read core tree taxomony from this file 
'-x':ValuedParameter('-',Name='x',Delimiter=' '), # output taxonomy for each insert sequence to this file '-o':ValuedParameter('-',Name='o',Delimiter=' '), # create log file '-l':ValuedParameter('-',Name='l',Delimiter=' '), # number of best matches to display '-n':ValuedParameter('-',Name='n',Delimiter=' '), #percent threshold cutoff '-c':ValuedParameter('-',Name='c',Delimiter=' '), } def _handle_app_result_build_failure(self,out,err,exit_status,result_paths): """ Catch the error when files are not produced """ raise ApplicationError, \ 'ParsInsert failed to produce an output file due to the following error: \n\n%s ' \ % err.read() def _get_result_paths(self,data): """ Get the resulting tree""" result = {} result['Tree'] = ResultPath(Path=splitext(self._input_filename)[0] + \ '.tree') return result def insert_sequences_into_tree(aln, moltype, params={}): """Returns a tree from placement of sequences """ # convert aln to phy since seq_names need fixed to run through parsinsert new_aln=get_align_for_phylip(StringIO(aln)) # convert aln to fasta in case it is not already a fasta file aln2 = Alignment(new_aln) seqs = aln2.toFasta() parsinsert_app = ParsInsert(params=params) result = parsinsert_app(seqs) # parse tree tree = DndParser(result['Tree'].read(), constructor=PhyloNode) # cleanup files result.cleanUp() return tree PyCogent-1.5.3/cogent/app/pfold.py000644 000765 000024 00000010406 12024702176 017757 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controllers for pfold application package Run in the same order as the order in this file(instructions from pfold author) [fasta2col,findphyl,mltree,scfg] """ import os from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = 
"Development" #IMPORTANT!!!! #pfold_path must be set manually to the bin dir in the install dir of pfold if 'PFOLD_BIN_DIR' not in os.environ: raise RuntimeError, \ "The pfold app controller requires PFOLD_BIN_DIR environment variable" else: pfold_path = os.environ['PFOLD_BIN_DIR'] class fasta2col(CommandLineApplication): """Application controller for fasta2col in Pfold package""" _command = 'fasta2col' _input_handler = '_input_as_string' def _input_as_string(self,filename): """Overrides _input_as_string in CommandLineApplication Input file need to be modified by sed""" sed = '| sed \'s/arbitrary/RNA/g\'' data = '%s %s' % (filename,sed) return data def _input_as_lines(self,data): """Overrides _input_as_lines in CommandLineApplication Input file need to be modified by sed""" filename = super(fasta2col,self)._input_as_lines(data) sed = '| sed \'s/arbitrary/RNA/g\'' data = '%s %s' % (filename,sed) return data class findphyl(CommandLineApplication): """Application controller for findphyl in Pfold package Find the phylogeny of the sequences using the neighbour joining approach""" _command = 'findphyl' _input_handler = '_input_as_string' def _input_as_string(self,filename): """Overrides _input_as_string in CommandLineApplication scfg.rate file need to be specified along with the input file""" file = '%s%s' % (pfold_path,'scfg.rate') data = '%s %s' % (file,filename) return data def _input_as_lines(self,data): """Overrides _input_as_lines in CommandLineApplication scfg.rate file need to be specified along with the input file""" filename = super(findphyl,self)._input_as_lines(data) file = '%s%s' % (pfold_path,'scfg.rate') data = '%s %s' % (file,filename) return data class mltree(CommandLineApplication): """Application controller for mltree in pfold package Performs a maximum likelihood estimate of the branch lengths""" _command = 'mltree' _input_handler = '_input_as_string' def _input_as_string(self,filename): """Overrides _input_as_string in CommandLineApplication 
scfg.rate file need to be specified along with the input file""" file = '%s%s' % (pfold_path,'scfg.rate') data = '%s %s' % (file,filename) return data def _input_as_lines(self,data): """Overrides _input_as_lines in CommandLineApplication scfg.rate file need to be specified along with the input file""" filename = super(mltree,self)._input_as_lines(data) file = '%s%s' % (pfold_path,'scfg.rate') data = '%s %s' % (file,filename) return data class scfg(CommandLineApplication): """Application controller for scfg in Pfold package Performs the analysis The file `article.grm' has the grammar and evolutionary model that is used for the analysis""" _command = 'scfg' _input_handler = '_input_as_string' def _input_as_string(self,filename): """Overrides _input_as_string in CommandLineApplication Additional input information about tree needed from article.grm file""" file = '%s %s%s' % ('--treeinfile',pfold_path,'article.grm') data = '%s %s' % (file,filename) return data def _input_as_lines(self,data): """Overrides _input_as_lines in CommandLineApplication Additional input information about tree needed from article.grm file""" filename = super(scfg,self)._input_as_lines(data) file = '%s %s%s' % ('--treeinfile',pfold_path,'article.grm') data = '%s %s' % (file,filename) return data PyCogent-1.5.3/cogent/app/pknotsrg.py000644 000765 000024 00000004452 12024702176 020526 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class PknotsRG(CommandLineApplication): """Application controller for PknotsRG v1.2 application Input: plain seqeunce 
pknotsRG is a tool for thermodynamic folding of RNA secondary structures, including the class of canonical simple recursive pseudoknots. Options: -m Use mfe strategy -f Use enf strategy -l Use loc strategy -s Show suboptimals -u no dangling bases (implies -s) -o no suboptimals inside pknots (implies -s -l) -e Set energy range for suboptimals (kcal/mole) -c Set energy range for suboptimals (%) [10] -n Set npp-value [0.3] -p Set pkinit-value [9] -k Set maximal pknot-length """ _parameters = { '-m':FlagParameter(Prefix='-',Name='m'), '-f':FlagParameter(Prefix='-',Name='f'), '-l':FlagParameter(Prefix='-',Name='l'), '-s':FlagParameter(Prefix='-',Name='s'), '-u':FlagParameter(Prefix='-',Name='u'), '-o':FlagParameter(Prefix='-',Name='o'), '-e':ValuedParameter(Prefix='-',Name='e',Delimiter=' '), '-c':ValuedParameter(Prefix='-',Name='c',Delimiter=' '), '-n':ValuedParameter(Prefix='-',Name='n',Delimiter=' '), '-p':ValuedParameter(Prefix='-',Name='p',Delimiter=' '), '-k':ValuedParameter(Prefix='-',Name='k',Delimiter=' ')} _command = 'pknotsRG-1.2-i386-linux-static' _input_handler = '_input_as_string' def _input_as_string(self,filename): """Returns '>filename' to redirect input to stdin""" return ''.join(['<',super(PknotsRG,self)._input_as_string(filename)]) def _input_as_lines(self,data): """Returns '>temp_filename to redirect input to stdin""" return ''.join(['<',super(PknotsRG,self)._input_as_lines(data)]) PyCogent-1.5.3/cogent/app/pplacer.py000644 000765 000024 00000020763 12024702176 020310 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for pplacer 1.1""" __author__ = "Kyle Bittinger" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Kyle Bittinger","Jesse Stombaugh"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kyle Bittinger" __email__ = "kylebittinger@gmail.com" __status__ = "Production" from cogent.app.parameters import ValuedParameter, FlagParameter from cogent.app.util import 
CommandLineApplication, FilePath, system, \ CommandLineAppResult, ResultPath, remove, ApplicationError, \ get_tmp_filename from cogent.core.alignment import Alignment from cogent.app.guppy import build_tree_from_json_using_params from os.path import splitext,abspath,join,split from StringIO import StringIO from cogent.parse.phylip import get_align_for_phylip from cogent.parse.tree import DndParser from cogent.core.tree import PhyloNode class Pplacer(CommandLineApplication): """pplacer Application Controller """ _command = 'pplacer' _input_handler = '_input_as_multiline_string' _parameters = { # -c Specify the path to the reference package. '-c': ValuedParameter('-', Name='c', Delimiter=' ', IsPath=True), # -t Specify the reference tree filename. '-t': ValuedParameter('-', Name='t', Delimiter=' ', IsPath=True), # -r Specify the reference alignment filename. '-r': ValuedParameter('-', Name='r', Delimiter=' ', IsPath=True), # -s Supply a phyml stats.txt or a RAxML info file giving the model parameters. '-s': ValuedParameter('-', Name='s', Delimiter=' ', IsPath=True), # -d Specify the directory containing the reference information. '-d': ValuedParameter('-', Name='d', Delimiter=' ', IsPath=True), # -p Calculate posterior probabilities. '-p': FlagParameter('-', Name='p'), # -m Substitution model. Protein: are LG, WAG, or JTT. Nucleotides: GTR. '-m': ValuedParameter('-', Name='m', Delimiter=' '), # --model-freqs Use model frequencies instead of reference alignment frequencies. '--model-freqs': FlagParameter('--', Name='model-freqs'), # --gamma-cats Number of categories for discrete gamma model. '--gamma-cats': ValuedParameter('--', Name='gamma-cats', Delimiter=' '), # --gamma-alpha Specify the shape parameter for a discrete gamma model. '--gamma-alpha': ValuedParameter('--', Name='gamma-alpha', Delimiter=' '), # --ml-tolerance 1st stage branch len optimization tolerance (2nd stage to 1e-5). Default: 0.01. 
'--ml-tolerance': ValuedParameter('--', Name='ml-tolerance', Delimiter=' '), # --pp-rel-err Relative error for the posterior probability calculation. Default is 0.01. '--pp-rel-err': ValuedParameter('--', Name='pp-rel-err', Delimiter=' '), # --unif-prior Use a uniform prior rather than exponential. '--unif-prior': FlagParameter('--', Name='unif-prior'), # --start-pend Starting pendant branch length. Default is 0.1. '--start-pend': ValuedParameter('--', Name='start-pend', Delimiter=' '), # --max-pend Set the maximum ML pendant branch length. Default is 2. '--max-pend': ValuedParameter('--', Name='max-pend', Delimiter=' '), # --max-strikes Maximum number of strikes for baseball. 0 -> no ball playing. Default is 6. '--max-strikes': ValuedParameter('--', Name='max-strikes', Delimiter=' '), # --strike-box Set the size of the strike box in log likelihood units. Default is 3. '--strike-box': ValuedParameter('--', Name='strike-box', Delimiter=' '), # --max-pitches Set the maximum number of pitches for baseball. Default is 40. '--max-pitches': ValuedParameter('--', Name='max-pitches', Delimiter=' '), # --fantasy Desired likelihood cutoff for fantasy baseball mode. 0 -> no fantasy. '--fantasy': ValuedParameter('--', Name='fantasy', Delimiter=' '), # --fantasy-frac Fraction of fragments to use when running fantasy baseball. Default is 0.1. '--fantasy-frac': ValuedParameter('--', Name='fantasy-frac', Delimiter=' '), # --write-masked Write alignment masked to the region without gaps in the query. '--write-masked': FlagParameter('--', Name='write-masked'), # --verbosity Set verbosity level. 0 is silent, and 2 is quite a lot. Default is 1. '--verbosity': ValuedParameter('--', Name='verbosity', Delimiter=' '), # --unfriendly Do not run friend finder pre-analysis. '--unfriendly': FlagParameter('--', Name='unfriendly'), # --out-dir Specify the directory to write place files to. 
        '--out-dir': ValuedParameter('--', Name='out-dir', Delimiter=' ',
                                     IsPath=True),

        # --pretend Only check out the files then report. Do not run the
        # analysis.
        '--pretend': FlagParameter('--', Name='pretend'),

        # --csv Make a CSV file with the results.
        '--csv': FlagParameter('--', Name='csv'),

        # --old-format Make an old-format placefile with the results.
        '--old-format': FlagParameter('--', Name='old-format'),

        # --diagnostic Write file describing the 'diagnostic' mutations for
        # various clades.
        '--diagnostic': FlagParameter('--', Name='diagnostic'),

        # --check-like Write out the likelihood of the reference tree,
        # calculated two ways.
        '--check-like': FlagParameter('--', Name='check-like'),

        # --version Write out the version number and exit.
        '--version': FlagParameter('--', Name='version'),

        # --help Display this list of options.
        '--help': FlagParameter('--', Name='help'),
    }

    def getTmpFilename(self, tmp_dir="/tmp", prefix='tmp', suffix='.fasta',
                       include_class_id=False, result_constructor=FilePath):
        """ Define Tmp filename to contain .fasta suffix, since pplacer
            requires the suffix to be .fasta """
        return super(Pplacer, self).getTmpFilename(tmp_dir=tmp_dir,
                                    prefix=prefix, suffix=suffix,
                                    include_class_id=include_class_id,
                                    result_constructor=result_constructor)

    def _handle_app_result_build_failure(self, out, err, exit_status,
                                         result_paths):
        """ Catch the error when files are not produced """
        raise ApplicationError, \
         'Pplacer failed to produce an output file due to the following error: \n\n%s ' \
         % out.read()

    def _get_result_paths(self, data):
        """ Define the output filepaths """
        output_dir = self.Parameters['--out-dir'].Value
        result = {}
        result['json'] = ResultPath(Path=join(output_dir,
                            splitext(split(self._input_filename)[-1])[0] + \
                            '.jplace'))
        return result

def insert_sequences_into_tree(aln, moltype, params={}, write_log=True):
    """Returns a tree from Alignment object aln.

    aln: an Alignment object, or data that can be used to build one.
    moltype: cogent.core.moltype.MolType object
    params: dict of parameters to pass in to the pplacer app controller.

    The result will be a PhyloNode tree object, or None if tree inference
    fails.
    """
    # convert aln to phylip, since seq names need to be fixed to run
    # through pplacer
    new_aln = get_align_for_phylip(StringIO(aln))

    # convert aln to fasta in case it is not already a fasta file
    aln2 = Alignment(new_aln)
    seqs = aln2.toFasta()

    ih = '_input_as_multiline_string'

    pplacer_app = Pplacer(params=params,
                          InputHandler=ih,
                          WorkingDir=None,
                          SuppressStderr=False,
                          SuppressStdout=False)

    pplacer_result = pplacer_app(seqs)

    # write a log file
    if write_log:
        log_fp = join(params["--out-dir"],
                      'log_pplacer_' + split(get_tmp_filename())[-1])
        log_file = open(log_fp, 'w')
        log_file.write(pplacer_result['StdOut'].read())
        log_file.close()

    # use guppy to convert the json file into a placement tree
    guppy_params = {'tog': None}
    new_tree = build_tree_from_json_using_params(pplacer_result['json'].name,
                                                 output_dir=params['--out-dir'],
                                                 params=guppy_params)

    pplacer_result.cleanUp()

    return new_tree

PyCogent-1.5.3/cogent/app/raxml.py

#!/usr/bin/env python
"""Application controller for RAxML (v7.0.3).

WARNING: Because of the use of the -x option, this version is no longer
compatible with RAxML version VI.
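The `.jplace` result path in `Pplacer._get_result_paths` is built by taking the base name of the query fasta file, swapping its suffix for `.jplace`, and joining it onto the `--out-dir` value. A minimal, self-contained sketch of that derivation (the file names below are hypothetical, not real pplacer output):

```python
# Sketch of the path logic in Pplacer._get_result_paths: split(...)[-1]
# drops the directory part of the input filename, splitext(...)[0] drops
# the .fasta suffix, and join(...) places the .jplace file in --out-dir.
from os.path import join, split, splitext

def jplace_path(out_dir, input_filename):
    base = splitext(split(input_filename)[-1])[0]
    return join(out_dir, base + '.jplace')

p = jplace_path('/tmp/out', '/tmp/tmp1234.fasta')
```

On POSIX systems this yields `/tmp/out/tmp1234.jplace`, matching the file pplacer writes when `--out-dir` is set.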
""" from cogent.app.parameters import FlagParameter, ValuedParameter, FilePath from cogent.app.util import CommandLineApplication, ResultPath, get_tmp_filename from cogent.core.tree import PhyloNode from cogent.core.alignment import Alignment from cogent.core.moltype import DNA, RNA, PROTEIN from cogent.util.warning import deprecated from random import choice, randint from os import walk from os.path import isabs from cogent.parse.tree import DndParser __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Catherine Lozupone", "Rob Knight", \ "Daniel McDonald", "Jai Ram Rideout"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Micah Hamady" __email__ = "hamady@colorado.edu" __status__ = "Prototype" class Raxml(CommandLineApplication): """RAxML application controller""" deprecated('class', 'cogent.app.raxml.Raxml', 'cogent.app.raxml_v730.Raxml', '1.6') _options ={ # Specify a column weight file name to assign individual wieghts to # each column of the alignment. Those weights must be integers # separated by any number and type of whitespaces whithin a separate # file, see file "example_weights" for an example. '-a':ValuedParameter('-',Name='a',Delimiter=' '), # Specify an integer number (random seed) for bootstrapping '-b':ValuedParameter('-',Name='b',Delimiter=' '), # Specify number of distinct rate catgories for raxml when # ModelOfEvolution is set to GTRCAT or HKY85CAT. # Individual per-site rates are categorized into numberOfCategories # rate categories to accelerate computations. (Default = 50) '-c':ValuedParameter('-',Name='c',Delimiter=' ', Value=50), # This option allows you to start the RAxML search with a complete # random starting tree instead of the default Maximum Parsimony # Starting tree. 
On smaller datasets (around 100-200 taxa) it has # been observed that this might sometimes yield topologies of distinct # local likelihood maxima which better correspond to empirical # expectations. '-d':FlagParameter('-',Name='d'), # This allows you to specify up to which likelihood difference. # Default is 0.1 log likelihood units, author recommends 1 or 2 to # rapidly evaluate different trees. '-e':ValuedParameter('-',Name='e',Delimiter=' ', Value=0.1), # select search algorithm: # d for normal hill-climbing search (Default) # when -f option is omitted this algorithm will be used # o old (slower) algorithm from v. 2.1.3 # c (check) just tests whether it can read the alignment # e (evaluate) to optimize model+branch lengths for given input tree # b (bipartition) draws bipartitions # s (split) splits into individual genes, provided with model file '-f':ValuedParameter('-',Name='f',Delimiter=' ', Value="d"), # select grouping file name: allows incomplete multifurcating constraint # tree in newick format -- resolves multifurcations randomly, adds # other taxa using parsimony insertion '-g':ValuedParameter('-', Name='g',Delimiter=' '), # prints help and exits '-h':FlagParameter('-', Name='h'), # allows initial rearrangement to be constrained, e.g. 10 means # insertion will not be more than 10 nodes away from original. # default is to pick a "good" setting. '-i':ValuedParameter('-', Name='i', Delimiter=' '), # writes checkpoints (off by default) '-j':FlagParameter('-', Name='j'), #specifies that RAxML will optimize model parameters (for GTRMIX and # GTRGAMMA) as well as calculating likelihoods for bootstrapped trees. 
'-k':FlagParameter('-', Name='k'), # Model of Nucleotide Substitution: # -m GTRGAMMA: GTR + Optimization of substitution rates + Gamma # -m GTRCAT: GTR + Optimization of substitution rates + Optimization # of site-specific evolutionary rates which are categorized into # numberOfCategories distinct rate categories for greater # computational efficiency # -m GTRMIX: Searches for GTRCAT, then switches to GTRGAMMA # Amino Acid Models # matrixName (see below): DAYHOFF, DCMUT, JTT, MTREV, WAG, RTREV, # CPREV, VT, BLOSUM62, MTMAM, GTR. # F means use empirical nucleotide frequencies (append to string) # -m PROTCATmatrixName[F]: uses site-specific rate categories # -m PROTGAMMAmatrixName[F]: uses Gamma # -m PROTMIXmatrixName[F]: switches between gamma and cat models # e.g. -m PROTCATBLOSUM62F would use protcat with BLOSUM62 and # empirical frequencies '-m':ValuedParameter('-',Name='m',Delimiter=' '), # Specifies the name of the output file. '-n':ValuedParameter('-',Name='n',Delimiter=' '), # Specifies the name of the outgroup (or outgroups: comma-delimited, # no spaces, should be monophyletic). '-o':ValuedParameter('-',Name='o',Delimiter=' '), # Specified MultipleModel file name, in format: # gene1 = 1-500 # gene2 = 501-1000 # (note: ranges can also be discontiguous, e.g. 1-100, 200-300, # or can specify codon ranges as e.g. 1-100/3, 2-100/3, 3-100/3)) '-q':ValuedParameter('-', Name='q', Delimiter=' '), # Name of the working directory where RAxML-V will write its output # files. '-w':ValuedParameter('-',Name='w',Delimiter=' '), # Constraint file name: allows a bifurcating Newick tree to be passed # in as a constraint file, other taxa will be added by parsimony. '-r':ValuedParameter('-',Name='r',Delimiter=' '), # Specify a random number seed for the parsimony inferences. This # allows you to reproduce your results and will help me debug the # program. 
This option HAS NO EFFECT in the parallel MPI version '-p':ValuedParameter('-',Name='p',Delimiter=' '), # specify the name of the alignment data file, in relaxed PHYLIP # format. '-s':ValuedParameter('-',Name='s',Delimiter=' '), # Specify a user starting tree file name in Newick format '-t':ValuedParameter('-',Name='t',Delimiter=' '), # Print the version '-v':FlagParameter('-',Name='v'), # Compute only randomized starting parsimony tree with RAxML, do not # optimize an ML analysis of the tree '-y':FlagParameter('-', Name='y'), # Multiple tree file, for use with -f b (to draw bipartitions onto the # common tree specified with -t) '-z':ValuedParameter('-', Name='z', Delimiter=' '), # Specifies number of runs on distinct starting trees. '-#':ValuedParameter('-', Name='#', Delimiter=' '), #Specify an integer number (random seed) to turn on rapid bootstrapping '-x':ValuedParameter('-', Name='x', Delimiter=' ') } _parameters = {} _parameters.update(_options) _command = "raxmlHPC" _out_format = "RAxML_%s.%s" def _format_output(self, outfile_name, out_type): """ Prepend proper output prefix to output filename """ outfile_name = self._absolute(outfile_name) outparts = outfile_name.split("/") outparts[-1] = self._out_format % (out_type, outparts[-1] ) return '/'.join(outparts) def _input_as_seqs(self,data): lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. lines.append(''.join(['>',str(i+1)])) lines.append(s) return self._input_as_lines(lines) def _input_as_lines(self,data): if data: self.Parameters['-s']\ .on(super(Raxml,self)._input_as_lines(data)) return '' def _input_as_string(self,data): """Makes data the value of a specific parameter This method returns the empty string. The parameter will be printed automatically once set. 
""" if data: self.Parameters['-in'].on(str(data)) return '' def _input_as_multiline_string(self, data): if data: self.Parameters['-s']\ .on(super(Raxml,self)._input_as_multiline_string(data)) return '' def _absolute(self,path): path = FilePath(path) if isabs(path): return path elif self.Parameters['-w'].isOn(): return self.Parameters['-w'].Value + path else: return self.WorkingDir + path def _log_out_filename(self): if self.Parameters['-n'].isOn(): return self._format_output(str(self.Parameters['-n'].Value), "log") else: raise ValueError, "No output file specified." def _info_out_filename(self): if self.Parameters['-n'].isOn(): return self._format_output(str(self.Parameters['-n'].Value), "info") else: raise ValueError, "No output file specified." def _parsimony_tree_out_filename(self): if self.Parameters['-n'].isOn(): return self._format_output(str(self.Parameters['-n'].Value), "parsimonyTree") else: raise ValueError, "No output file specified." def _result_tree_out_filename(self): if self.Parameters['-n'].isOn(): return self._format_output(str(self.Parameters['-n'].Value), "result") else: raise ValueError, "No output file specified." def _result_bootstrap_out_filename(self): if self.Parameters['-n'].isOn(): return self._format_output(str(self.Parameters['-n'].Value), \ "bootstrap") else: raise ValueError, "No output file specified" def _checkpoint_out_filenames(self): """ RAxML generates a crapload of checkpoint files so need to walk directory to collect names of all of them. """ out_filenames = [] if self.Parameters['-n'].isOn(): out_name = str(self.Parameters['-n'].Value) walk_root = self.WorkingDir if self.Parameters['-w'].isOn(): walk_root = str(self.Parameters['-w'].Value) for tup in walk(walk_root): dpath, dnames, dfiles = tup if dpath == walk_root: for gen_file in dfiles: if out_name in gen_file and "checkpoint" in gen_file: out_filenames.append(walk_root + gen_file) break else: raise ValueError, "No output file specified." 
        return out_filenames

    def _get_result_paths(self, data):
        result = {}
        result['Info'] = ResultPath(Path=self._info_out_filename(),
                                    IsWritten=True)
        if self.Parameters['-k'].isOn():
            result['Bootstrap'] = ResultPath(
                Path=self._result_bootstrap_out_filename(),
                IsWritten=True)
        else:
            result['Log'] = ResultPath(Path=self._log_out_filename(),
                                       IsWritten=True)
            result['ParsimonyTree'] = ResultPath(
                Path=self._parsimony_tree_out_filename(),
                IsWritten=True)
            result['Result'] = ResultPath(
                Path=self._result_tree_out_filename(),
                IsWritten=True)
        for checkpoint_file in self._checkpoint_out_filenames():
            checkpoint_num = checkpoint_file.split(".")[-1]
            try:
                checkpoint_num = int(checkpoint_num)
            except Exception, e:
                # include the offending filename in the message
                raise ValueError, \
                    "%s does not appear to be a valid checkpoint file" \
                    % checkpoint_file
            result['Checkpoint%d' % checkpoint_num] = ResultPath(
                Path=checkpoint_file,
                IsWritten=True)
        return result

#SOME FUNCTIONS TO EXECUTE THE MOST COMMON TASKS

def raxml_alignment(align_obj,
                    raxml_model="GTRCAT",
                    params={},
                    SuppressStderr=True,
                    SuppressStdout=True):
    """Run raxml on alignment object

    align_obj: Alignment object
    params: you can set any params except -w and -n

    returns: tuple (phylonode, parsimonyphylonode, log likelihood,
                    total exec time)
    """
    # generate temp filename for output
    params["-w"] = "/tmp/"
    params["-n"] = get_tmp_filename().split("/")[-1]
    params["-m"] = raxml_model
    ih = '_input_as_multiline_string'
    seqs, align_map = align_obj.toPhylip()

    # set up command
    raxml_app = Raxml(params=params,
                      InputHandler=ih,
                      WorkingDir=None,
                      SuppressStderr=SuppressStderr,
                      SuppressStdout=SuppressStdout)

    # run raxml
    ra = raxml_app(seqs)

    # generate tree
    tree_node = DndParser(ra["Result"])

    # generate parsimony tree
    parsimony_tree_node = DndParser(ra["ParsimonyTree"])

    # extract log likelihood from log file
    log_file = ra["Log"]
    total_exec_time = exec_time = log_likelihood = 0.0
    for line in log_file:
        exec_time, log_likelihood = map(float, line.split())
        total_exec_time += exec_time

    # remove output files
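The log-parsing loop in `raxml_alignment` relies on each line of a RAxML_log file holding an elapsed time followed by the current log likelihood; the final likelihood and the summed time are what the function returns. A self-contained sketch of that step (the sample lines below are illustrative, not real RAxML output):

```python
# Sketch of the RAxML_log parsing done in raxml_alignment: each line is
# "<elapsed seconds> <log likelihood>"; keep the last likelihood seen and
# accumulate the time column.
def parse_raxml_log(lines):
    total_exec_time = 0.0
    log_likelihood = 0.0
    for line in lines:
        exec_time, log_likelihood = map(float, line.split())
        total_exec_time += exec_time
    return log_likelihood, total_exec_time

sample_log = [
    "0.520000 -15000.713841",
    "1.310000 -14870.102511",
    "2.050000 -14865.332100",
]
final_lnl, total_time = parse_raxml_log(sample_log)
```

On the sample lines this returns the last column's final value as the log likelihood and the sum of the first column as the total execution time.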
    ra.cleanUp()

    return tree_node, parsimony_tree_node, log_likelihood, total_exec_time

def build_tree_from_alignment(aln, moltype, best_tree=False, params={}):
    """Returns a tree from Alignment object aln.

    aln: an Alignment object, or data that can be used to build one.
    moltype: cogent.core.moltype.MolType object
    best_tree: best_tree support is currently not implemented
    params: dict of parameters to pass in to the RAxML app controller.

    The result will be a PhyloNode tree object, or None if tree inference
    fails.
    """
    if best_tree:
        raise NotImplementedError

    if '-m' not in params:
        if moltype == DNA or moltype == RNA:
            params["-m"] = 'GTRMIX'
        elif moltype == PROTEIN:
            params["-m"] = 'PROTMIXGTR'
        else:
            raise ValueError, "Moltype must be either DNA, RNA, or PROTEIN"

    if not hasattr(aln, 'toPhylip'):
        aln = Alignment(aln)
    seqs, align_map = aln.toPhylip()

    # generate temp filename for output
    params["-w"] = "/tmp/"
    params["-n"] = get_tmp_filename().split("/")[-1]
    params["-k"] = True
    params["-x"] = randint(1,100000)
    ih = '_input_as_multiline_string'

    raxml_app = Raxml(params=params,
                      InputHandler=ih,
                      WorkingDir=None,
                      SuppressStderr=True,
                      SuppressStdout=True)

    raxml_result = raxml_app(seqs)

    tree = DndParser(raxml_result['Bootstrap'], constructor=PhyloNode)

    for node in tree.tips():
        node.Name = align_map[node.Name]

    raxml_result.cleanUp()

    return tree

PyCogent-1.5.3/cogent/app/raxml_v730.py

#!/usr/bin/env python
"""Application controller for RAxML (v7.3.0).

WARNING: Because of the use of the -x option, this version is no longer
compatible with RAxML version VI.
""" from cogent.app.parameters import FlagParameter, ValuedParameter, FilePath from cogent.app.util import CommandLineApplication, ResultPath, \ get_tmp_filename,ApplicationError from cogent.core.tree import PhyloNode from cogent.core.alignment import Alignment from cogent.core.moltype import DNA, RNA, PROTEIN from random import choice, randint from os import walk,listdir from os.path import isabs,join,split from cogent.parse.tree import DndParser import re from cogent.app.guppy import build_tree_from_json_using_params __author__ = "Micah Hamady" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Micah Hamady", "Catherine Lozupone", "Rob Knight", \ "Daniel McDonald","Jesse Stombaugh"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Jesse Stombaugh" __email__ = "jesse.stombaugh@colorado.edu" __status__ = "Production" class Raxml(CommandLineApplication): """RAxML application controller""" _options ={ # Specify a column weight file name to assign individual wieghts to # each column of the alignment. Those weights must be integers # separated by any number and type of whitespaces whithin a separate # file, see file "example_weights" for an example. '-a':ValuedParameter('-',Name='a',Delimiter=' '), # Specify one of the secondary structure substitution models implemented # in RAxML. The same nomenclature as in the PHASE manual is used, # available models: S6A, S6B, S6C, S6D, S6E, S7A, S7B, S7C, S7D, S7E, # S7F, S16, S16A, S16B # DEFAULT: 16-state GTR model (S16) '-A':ValuedParameter('-',Name='A',Delimiter=' '), # Specify an integer number (random seed) for bootstrapping '-b':ValuedParameter('-',Name='b',Delimiter=' '), # specify a floating point number between 0.0 and 1.0 that will be used # as cutoff threshold for the MR-based bootstopping criteria. The # recommended setting is 0.03. 
'-B':ValuedParameter('-',Name='B',Delimiter=' '), # Specify number of distinct rate catgories for raxml when # ModelOfEvolution is set to GTRCAT or HKY85CAT. # Individual per-site rates are categorized into numberOfCategories # rate categories to accelerate computations. (Default = 50) '-c':ValuedParameter('-',Name='c',Delimiter=' '), # Conduct model parameter optimization on gappy, partitioned multi-gene # alignments with per-partition branch length estimates (-M enabled) # using the fast method with pointer meshes described in: # Stamatakis and Ott: "Efficient computation of the phylogenetic # likelihood function on multi-gene alignments and multi-core # processors" # WARNING: We can not conduct useful tree searches using this method # yet! Does not work with Pthreads version. '-C':ValuedParameter('-',Name='C',Delimiter=' '), # This option allows you to start the RAxML search with a complete # random starting tree instead of the default Maximum Parsimony # Starting tree. On smaller datasets (around 100-200 taxa) it has # been observed that this might sometimes yield topologies of distinct # local likelihood maxima which better correspond to empirical # expectations. '-d':FlagParameter('-',Name='d'), # ML search convergence criterion. This will break off ML searches if # the relative Robinson-Foulds distance between the trees obtained from # two consecutive lazy SPR cycles is smaller or equal to 1%. Usage # recommended for very large datasets in terms of taxa. On trees with # more than 500 taxa this will yield execution time improvements of # approximately 50% While yielding only slightly worse trees. # DEFAULT: OFF '-D':ValuedParameter('-',Name='D'), # This allows you to specify up to which likelihood difference. # Default is 0.1 log likelihood units, author recommends 1 or 2 to # rapidly evaluate different trees. 
        '-e':ValuedParameter('-',Name='e',Delimiter=' '),

        # specify an exclude file name, that contains a specification of
        # alignment positions you wish to exclude. Format is similar to
        # Nexus, the file shall contain entries like "100-200 300-400", to
        # exclude a single column write, e.g., "100-100", if you use a mixed
        # model, an appropriately adapted model file will be written.
        '-E':ValuedParameter('-',Name='E',Delimiter=' '),

        # select search algorithm:
        #   a rapid Bootstrap analysis and search for best-scoring ML tree
        #     in one program run
        #   A compute marginal ancestral states on a ROOTED reference tree
        #     provided with "t" - ONLY IN 7.3.0
        #   b draw bipartition information on a tree provided with "-t"
        #     based on multiple trees (e.g., from a bootstrap) in a file
        #     specified by "-z"
        #   c check if the alignment can be properly read by RAxML
        #   d for normal hill-climbing search (Default)
        #     when -f option is omitted this algorithm will be used
        #   e optimize model+branch lengths for given input tree under
        #     GAMMA/GAMMAI only
        #   E execute very fast experimental tree search, at present only
        #     for testing
        #   F execute fast experimental tree search, at present only for
        #     testing
        #   g compute per site log Likelihoods for one or more trees passed
        #     via "-z" and write them to a file that can be read by CONSEL
        #     WARNING: does not print likelihoods in the original column
        #     order
        #   h compute log likelihood test (SH-test) between best tree passed
        #     via "-t" and a bunch of other trees passed via "-z"
        #   i EXPERIMENTAL do not use for real tree inferences: conducts a
        #     single cycle of fast lazy SPR moves on a given input tree, to
        #     be used in combination with -C and -M
        #   I EXPERIMENTAL do not use for real tree inferences: conducts a
        #     single cycle of thorough lazy SPR moves on a given input tree,
        #     to be used in combination with -C and -M
        #   j generate a bunch of bootstrapped alignment files from an
        #     original alignment file. You need to specify a seed with "-b"
        #     and the number of replicates with "-#"
        #   following "J" is for version 7.2.8
        #   J Compute SH-like support values on a given tree passed via
        #     "-t".
        #   m compare bipartitions between two bunches of trees passed via
        #     "-t" and "-z" respectively. This will return the Pearson
        #     correlation between all bipartitions found in the two tree
        #     files. A file called
        #     RAxML_bipartitionFrequencies.outputFileName will be printed
        #     that contains the pair-wise bipartition frequencies of the
        #     two sets
        #   n compute the log likelihood score of all trees contained in a
        #     tree file provided by "-z" under GAMMA or GAMMA+P-Invar
        #   o old (slower) algorithm from v. 2.1.3
        #   p perform pure stepwise MP addition of new sequences to an
        #     incomplete starting tree and exit
        #   r compute pairwise Robinson-Foulds (RF) distances between all
        #     pairs of trees in a tree file passed via "-z"; if the trees
        #     have node labels represented as integer support values the
        #     program will also compute two flavors of the weighted
        #     Robinson-Foulds (WRF) distance
        #   following "R" is for version 7.2.8
        #   R compute rogue taxa using new statistical method based on the
        #     evolutionary placement algorithm
        #     WARNING: this is experimental code - DEPRECATED IN 7.3.0
        #   s (split) splits into individual genes, provided with model file
        #   following "S" is for version 7.2.8
        #   S compute site-specific placement bias using a leave one out
        #     test inspired by the evolutionary placement algorithm
        #   t do randomized tree searches on one fixed starting tree
        #   u execute morphological weight calibration using maximum
        #     likelihood, this will return a weight vector. you need to
        #     provide a morphological alignment and a reference tree via
        #     "-t"
        #   U execute morphological weight calibration using parsimony,
        #     this will return a weight vector.
you need to provide a morphological # alignment and a reference tree via "-t" - DEPRECATED IN 7.3.0 # v classify a bunch of environmental sequences into a reference tree # using the slow heuristics without dynamic alignment you will # need to start RAxML with a non-comprehensive reference tree and # an alignment containing all sequences (reference + query) # w compute ELW test on a bunch of trees passed via "-z" # x compute pair-wise ML distances, ML model parameters will be # estimated on an MP starting tree or a user-defined tree passed # via "-t", only allowed for GAMMA-based models of rate # heterogeneity # y classify a bunch of environmental sequences into a reference tree # using the fast heuristics without dynamic alignment you will # need to start RAxML with a non-comprehensive reference tree and # an alignment containing all sequences (reference + query) '-f':ValuedParameter('-',Name='f',Delimiter=' ', Value="d"), # enable ML tree searches under CAT model for very large trees without # switching to GAMMA in the end (saves memory). This option can also be # used with the GAMMA models in order to avoid the thorough optimization # of the best-scoring ML tree in the end. # DEFAULT: OFF '-F':FlagParameter('-',Name='F'), # select grouping file name: allows incomplete multifurcating constraint # tree in newick format -- resolves multifurcations randomly, adds # other taxa using parsimony insertion '-g':ValuedParameter('-', Name='g',Delimiter=' '), # enable the ML-based evolutionary placement algorithm heuristics by # specifiyng a threshold value (fraction of insertion branches to be # evaluated using slow insertions under ML). 
'-G':FlagParameter('-', Name='G'), # prints help and exits '-h':FlagParameter('-', Name='h'), # enable the MP-based evolutionary placement algorithm heuristics # by specifiyng a threshold value (fraction of insertion branches to be # evaluated using slow insertions under ML) - DEPRECATED IN 7.3.0 #'-H':ValuedParameter('-', Name='H',Delimiter=' '), # allows initial rearrangement to be constrained, e.g. 10 means # insertion will not be more than 10 nodes away from original. # default is to pick a "good" setting. '-i':ValuedParameter('-', Name='i', Delimiter=' '), # a posteriori bootstopping analysis. Use: # "-I autoFC" for the frequency-based criterion # "-I autoMR" for the majority-rule consensus tree criterion # "-I autoMRE" for the extended majority-rule consensus tree criterion # "-I autoMRE_IGN" for metrics similar to MRE, but include # bipartitions under the threshold whether they are compatible # or not. This emulates MRE but is faster to compute. # You also need to pass a tree file containg several bootstrap # replicates via "-z" '-I':ValuedParameter('-', Name='I', Delimiter=' '), # writes checkpoints (off by default) '-j':FlagParameter('-', Name='j'), # Compute majority rule consensus tree with "-J MR" or extended majority # rule consensus tree with "-J MRE" or strict consensus tree with "-J # STRICT" You will need to provide a tree file containing several # UNROOTED trees via "-z" '-J':ValuedParameter('-', Name='J', Delimiter=' '), #specifies that RAxML will optimize model parameters (for GTRMIX and # GTRGAMMA) as well as calculating likelihoods for bootstrapped trees. '-k':FlagParameter('-', Name='k'), # Specify one of the multi-state substitution models (max 32 states) # implemented in RAxML. 
Available models are: ORDERED, MK, GTR '-K':ValuedParameter('-', Name='K', Delimiter=' '), # Model of Binary (Morphological), Nucleotide, Multi-State, or Amino # Acid Substitution:: # BINARY: # -m BINCAT : Optimization of site-specific evolutionary rates which # are categorized into numberOfCategories distinct rate categories # for greater computational efficiency. Final tree might be # evaluated automatically under BINGAMMA, depending on the tree # search option # -m BINCATI : Optimization of site-specific evolutionary rates which # are categorized into numberOfCategories distinct rate categories # for greater computational efficiency. Final tree might be # evaluated automatically under BINGAMMAI, depending on the tree # search option # -m BINGAMMA : GAMMA model of rate heterogeneity (alpha parameter # will be estimated) # -m BINGAMMAI : Same as BINGAMMA, but with estimate of proportion of # invariable sites # NUCLEOTIDES # -m GTRCAT: GTR + Optimization of substitution rates + Optimization # of site-specific evolutionary rates which are categorized into # numberOfCategories distinct rate categories for greater # computational efficiency # -m GTRCAT_FLOAT : Same as above but uses single-precision floating # point arithemtics instead of double-precision Usage only # recommened for testing, the code will run slower, but can save # almost 50% of memory. If you have problems with phylogenomic # datasets and large memory requirements you may give it a shot. # Keep in mind that numerical stability seems to be okay but needs # further testing. - DEPRECATED IN 7.3.0 # -m GTRCATI : GTR + Optimization of substitution rates + Optimization # of site-specific evolutionary rates which are categorized into # numberOfCategories distinct rate categories for greater # computational efficiency. 
Final tree might be evaluated under # GTRGAMMAI, depending on the tree search option # -m GTRGAMMA: GTR + Optimization of substitution rates + Gamma # -m GTRGAMMA_FLOAT : Same as GTRGAMMA, but also with # single-precision arithmetics, same cautionary notes as for # GTRCAT_FLOAT apply. - DEPRECATED IN 7.3.0 # -m GTRGAMMAI : Same as GTRGAMMA, but with estimate of proportion of # invariable sites # MULTI-STATE: # -m MULTICAT : Optimization of site-specific evolutionary rates which # are categorized into numberOfCategories distinct rate categories # for greater computational efficiency. Final tree might be # evaluated automatically under MULTIGAMMA, depending on the tree # search option # -m MULTICATI : Optimization of site-specific evolutionary rates # which are categorized into numberOfCategories distinct rate # categories for greater computational efficiency. Final tree # might be evaluated automatically under MULTIGAMMAI, depending on # the tree search option # -m MULTIGAMMA : GAMMA model of rate heterogeneity (alpha parameter # will be estimated) # -m MULTIGAMMAI : Same as MULTIGAMMA, but with estimate of proportion # of invariable sites # You can use up to 32 distinct character states to encode multi-state # regions, they must be used in the following order: 0, 1, 2, 3, 4, 5, # 6, 7, 8, 9, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, # T, U, V i.e., if you have 6 distinct character states you would use 0, # 1, 2, 3, 4, 5 to encode these. The substitution model for the # multi-state regions can be selected via the "-K" option # Amino Acid Models: # -m PROTCATmatrixName[F] : specified AA matrix + Optimization of # substitution rates + Optimization of site-specific evolutionary # rates which are categorized into numberOfCategories distinct # rate categories for greater computational efficiency. 
Final # tree might be evaluated automatically under # PROTGAMMAmatrixName[f], depending on the tree search option # -m PROTCATmatrixName[F]_FLOAT : PROTCAT with single precision # arithmetics, same cautionary notes as for GTRCAT_FLOAT apply # - DEPRECATED IN 7.3.0 # -m PROTCATImatrixName[F] : specified AA matrix + Optimization of # substitution rates + Optimization of site-specific # evolutionary rates which are categorized into numberOfCategories # distinct rate categories for greater computational efficiency. # Final tree might be evaluated automatically under # PROTGAMMAImatrixName[f], depending on the tree search option # -m PROTGAMMAmatrixName[F] : specified AA matrix + Optimization of # substitution rates + GAMMA model of rate heterogeneity (alpha # parameter will be estimated) # -m PROTGAMMAmatrixName[F]_FLOAT : PROTGAMMA with single precision # arithmetics, same cautionary notes as for GTRCAT_FLOAT apply # - DEPRECATED IN 7.3.0 # -m PROTGAMMAImatrixName[F] : Same as PROTGAMMAmatrixName[F], but # with estimate of proportion of invariable sites # Available AA substitution models: DAYHOFF, DCMUT, JTT, MTREV, WAG, # RTREV, CPREV, VT, BLOSUM62, MTMAM, LG, GTR. With the optional "F" # appendix you can specify if you want to use empirical base frequencies # Please note that for mixed models you can in addition specify the # per-gene AA model in the mixed model file (see manual for details). # Also note that if you estimate AA GTR parameters on a partitioned # dataset, they will be linked (estimated jointly) across all partitions # to avoid over-parametrization '-m':ValuedParameter('-',Name='m',Delimiter=' '), # Switch on estimation of individual per-partition branch lengths. Only # has effect when used in combination with "-q". Branch lengths for # individual partitions will be printed to separate files. A weighted # average of the branch lengths is computed by using the respective # partition lengths. 
# DEFAULT: OFF '-M':FlagParameter('-',Name='M'), # Specifies the name of the output file. '-n':ValuedParameter('-',Name='n',Delimiter=' '), # Specifies the name of the outgroup (or outgroups: comma-delimited, # no spaces, should be monophyletic). '-o':ValuedParameter('-',Name='o',Delimiter=' '), # Enable checkpointing using the dmtcp library available at # http://dmtcp.sourceforge.net/. This only works if you call the program # by preceded by the command "dmtcp_checkpoint" and if you compile a # dedicated binary using the appropriate Makefile. With "-O" you can # specify the interval between checkpoints in seconds. # DEFAULT: 3600.0 seconds - DEPRECATED IN 7.3.0 #'-O':ValuedParameter('-',Name='O',Delimiter=' ',Value=3600.0), # Specify a random number seed for the parsimony inferences. This allows # you to reproduce your results and will help me debug the program. '-p':ValuedParameter('-',Name='p',Delimiter=' '), # Specify the file name of a user-defined AA (Protein) substitution # model. This file must contain 420 entries, the first 400 being the AA # substitution rates (this must be a symmetric matrix) and the last 20 # are the empirical base frequencies '-P':ValuedParameter('-',Name='P',Delimiter=' '), # Specified MultipleModel file name, in format: # gene1 = 1-500 # gene2 = 501-1000 # (note: ranges can also be discontiguous, e.g. 1-100, 200-300, # or can specify codon ranges as e.g. 1-100/3, 2-100/3, 3-100/3)) '-q':ValuedParameter('-', Name='q', Delimiter=' '), # THE FOLLOWING "Q" is DEPRECATED IN 7.2.8 # Turn on computation of SH-like support values on tree. # DEFAULT: OFF '-Q':FlagParameter('-', Name='Q'), # Constraint file name: allows a bifurcating Newick tree to be passed # in as a constraint file, other taxa will be added by parsimony. 
'-r':ValuedParameter('-',Name='r',Delimiter=' '), # THE FOLLOWING "R" is IN 7.2.8 # Specify the file name of a binary model parameter file that has # previously been generated with RAxML using the -f e tree evaluation # option. The file name should be: RAxML_binaryModelParameters.runID '-R':ValuedParameter('-',Name='R',Delimiter=' '), # specify the name of the alignment data file, in relaxed PHYLIP # format. '-s':ValuedParameter('-',Name='s',Delimiter=' '), # Specify the name of a secondary structure file. The file can contain # "." for alignment columns that do not form part of a stem and # characters "()<>[]{}" to define stem regions and pseudoknots '-S':ValuedParameter('-',Name='S',Delimiter=' '), # Specify a user starting tree file name in Newick format '-t':ValuedParameter('-',Name='t',Delimiter=' '), # PTHREADS VERSION ONLY! Specify the number of threads you want to run. # Make sure to set "-T" to at most the number of CPUs you have on your # machine, otherwise, there will be a huge performance decrease! '-T':ValuedParameter('-',Name='T',Delimiter=' '), # THE FOLLOWING "U" is IN 7.2.8 # Try to save memory by using SEV-based implementation for gap columns # on large gappy alignments # WARNING: this will only work for DNA under GTRGAMMA and is still in an # experimental state. '-U':ValuedParameter('-',Name='U',Delimiter=' '), # Print the version '-v':FlagParameter('-',Name='v'), # Name of the working directory where RAxML-V will write its output # files. '-w':ValuedParameter('-',Name='w',Delimiter=' '), # THE FOLLOWING "W" is IN 7.2.8 # Sliding window size for leave-one-out site-specific placement bias # algorithm only effective when used in combination with "-f S" # DEFAULT: 100 sites '-W':ValuedParameter('-',Name='W',Delimiter=' '), # Specify an integer number (random seed) and turn on rapid # bootstrapping. 
CAUTION: unlike in version 7.0.4 RAxML will conduct # rapid BS replicates under the model of rate heterogeneity you # specified via "-m" and not by default under CAT '-x':ValuedParameter('-',Name='x',Delimiter=' '), # EXPERIMENTAL OPTION: This option will do a per-site estimate of # protein substitution models by looping over all given, fixed models # LG, WAG, JTT, etc and using their respective base frequencies to # independently assign a prot subst. model to each site via ML # optimization. At present this option only works with the GTR+GAMMA # model, unpartitioned datasets, and in the sequential version only. # DEFAULT: OFF '-X':FlagParameter('-', Name='X'), # Compute only randomized starting parsimony tree with RAxML, do not # optimize an ML analysis of the tree '-y':FlagParameter('-', Name='y'), # Do a more thorough parsimony tree search using a parsimony ratchet and # exit. Specify the number of ratchet searches via "-#" or "-N". This # has just been implemented for completeness, if you want a fast MP # implementation use TNT # DEFAULT: OFF - DEPRECATED IN 7.3.0 #'-Y':FlagParameter('-', Name='Y'), # Multiple tree file, for use with -f b (to draw bipartitions onto the # common tree specified with -t) '-z':ValuedParameter('-', Name='z', Delimiter=' '), # Specifies number of runs on distinct starting trees. '-#':ValuedParameter('-', Name='#', Delimiter=' ',Value=1), # Specifies number of runs on distinct starting trees. '-N':ValuedParameter('-', Name='N', Delimiter=' '), } _parameters = {} _parameters.update(_options) _command = "raxmlHPC" _out_format = "RAxML_%s.%s" def _format_output(self, outfile_name, out_type): """ Prepend proper output prefix to output filename """ outfile_name = self._absolute(outfile_name) outparts = outfile_name.split("/") outparts[-1] = self._out_format % (out_type, outparts[-1] ) return '/'.join(outparts) def _input_as_seqs(self,data): lines = [] for i,s in enumerate(data): #will number the sequences 1,2,3,etc. 
            lines.append(''.join(['>', str(i+1)]))
            lines.append(s)
        return self._input_as_lines(lines)

    def _input_as_lines(self, data):
        if data:
            self.Parameters['-s']\
                .on(super(Raxml, self)._input_as_lines(data))
        return ''

    def _input_as_string(self, data):
        """Makes data the value of a specific parameter

        This method returns the empty string. The parameter will be printed
        automatically once set.
        """
        if data:
            self.Parameters['-s'].on(str(data))
        return ''

    def _input_as_multiline_string(self, data):
        if data:
            self.Parameters['-s']\
                .on(super(Raxml, self)._input_as_multiline_string(data))
        return ''

    def _absolute(self, path):
        path = FilePath(path)
        if isabs(path):
            return path
        elif self.Parameters['-w'].isOn():
            return self.Parameters['-w'].Value + path
        else:
            return self.WorkingDir + path

    def _log_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "log")
        else:
            raise ValueError, "No output file specified."

    def _info_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "info")
        else:
            raise ValueError, "No output file specified."

    def _parsimony_tree_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "parsimonyTree")
        else:
            raise ValueError, "No output file specified."

    # added for tree-insertion
    def _originallabelled_tree_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "originalLabelledTree")
        else:
            raise ValueError, "No output file specified."

    # added for tree-insertion
    def _labelled_tree_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "labelledTree")
        else:
            raise ValueError, "No output file specified."
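The filename helpers above all funnel through `_format_output`, which rewrites the run name given via `-n` into RAxML's `RAxML_<type>.<run name>` naming convention. A standalone sketch of that convention (`raxml_output_name` is a hypothetical illustration, not part of the app controller):

```python
# Illustrative sketch of RAxML's output naming convention, mirroring
# _format_output above. raxml_output_name is a hypothetical helper and
# is not part of the Raxml app controller.
def raxml_output_name(outfile_name, out_type, out_format="RAxML_%s.%s"):
    # only the final path component gets the RAxML_<type>. prefix
    parts = outfile_name.split("/")
    parts[-1] = out_format % (out_type, parts[-1])
    return "/".join(parts)

# e.g. a run started with -n run1 -w /tmp/ writes its info file to
# /tmp/RAxML_info.run1
```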
    # added for tree-insertion
    def _classification_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "classification")
        else:
            raise ValueError, "No output file specified."

    # added for tree-insertion
    def _classificationlikelihoodweights_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "classificationLikelihoodWeights")
        else:
            raise ValueError, "No output file specified."

    # added for tree-insertion
    def _best_tree_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "bestTree")
        else:
            raise ValueError, "No output file specified."

    # added for tree-insertion
    def _entropy_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "entropy")
        else:
            raise ValueError, "No output file specified."

    # added for tree-insertion
    def _json_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "portableTree")
        else:
            raise ValueError, "No output file specified."

    # added for tree-insertion
    def _parsimony_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "equallyParsimoniousPlacements")
        else:
            raise ValueError, "No output file specified."

    def _result_tree_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "result")
        else:
            raise ValueError, "No output file specified."

    def _result_bootstrap_out_filename(self):
        if self.Parameters['-n'].isOn():
            return self._format_output(str(self.Parameters['-n'].Value),
                                       "bootstrap")
        else:
            raise ValueError, "No output file specified."

    def _checkpoint_out_filenames(self):
        """RAxML generates many checkpoint files, so we need to walk the
        working directory to collect the names of all of them.
""" out_filenames = [] if self.Parameters['-n'].isOn(): out_name = str(self.Parameters['-n'].Value) walk_root = self.WorkingDir if self.Parameters['-w'].isOn(): walk_root = str(self.Parameters['-w'].Value) for tup in walk(walk_root): dpath, dnames, dfiles = tup if dpath == walk_root: for gen_file in dfiles: if out_name in gen_file and "checkpoint" in gen_file: out_filenames.append(walk_root + gen_file) break else: raise ValueError, "No output file specified." return out_filenames def _handle_app_result_build_failure(self,out,err,exit_status,result_paths): """ Catch the error when files are not produced """ try: raise ApplicationError, \ 'RAxML failed to produce an output file due to the following error: \n\n%s ' \ % err.read() except: raise ApplicationError,\ 'RAxML failed to run properly.' def _get_result_paths(self,data): result = {} result['Info'] = ResultPath(Path=self._info_out_filename(), IsWritten=True) if self.Parameters['-k'].isOn(): result['Bootstrap'] = ResultPath( Path=self._result_bootstrap_out_filename(), IsWritten=True) elif self.Parameters["-f"].Value == 'v': #these were added to handle the results from tree-insertion result['Classification'] = ResultPath( Path=self._classification_out_filename(), IsWritten=True) result['ClassificationLikelihoodWeights'] = ResultPath( Path=self._classificationlikelihoodweights_out_filename(), IsWritten=True) result['OriginalLabelledTree'] = ResultPath( Path=self._originallabelled_tree_out_filename(), IsWritten=True) result['Result'] = ResultPath( Path=self._labelled_tree_out_filename(),IsWritten=True) result['entropy'] = ResultPath( Path=self._entropy_out_filename(),IsWritten=True) result['json'] = ResultPath( Path=self._json_out_filename()+'.jplace',IsWritten=True) elif self.Parameters["-f"].Value == 'y': #these were added to handle the results from tree-insertion result['Parsimony'] = ResultPath( Path=self._parsimony_out_filename(), IsWritten=True) result['OriginalLabelledTree'] = ResultPath( 
                Path=self._originallabelled_tree_out_filename(),
                IsWritten=True)
            result['json'] = ResultPath(
                Path=self._json_out_filename()+'.jplace', IsWritten=True)
        else:
            result['Log'] = ResultPath(Path=self._log_out_filename(),
                                       IsWritten=True)
            result['ParsimonyTree'] = ResultPath(
                Path=self._parsimony_tree_out_filename(), IsWritten=True)
            result['Result'] = ResultPath(
                Path=self._result_tree_out_filename(), IsWritten=True)
            # result['besttree'] = ResultPath(
            #     Path=self._best_tree_out_filename(), IsWritten=True)
        for checkpoint_file in self._checkpoint_out_filenames():
            checkpoint_num = checkpoint_file.split(".")[-1]
            try:
                checkpoint_num = int(checkpoint_num)
            except Exception, e:
                raise ValueError, \
                    "%s does not appear to be a valid checkpoint file" \
                    % checkpoint_file
            result['Checkpoint%d' % checkpoint_num] = ResultPath(
                Path=checkpoint_file, IsWritten=True)
        return result


# SOME FUNCTIONS TO EXECUTE THE MOST COMMON TASKS

def raxml_alignment(align_obj, raxml_model="GTRCAT", params={},
                    SuppressStderr=True, SuppressStdout=True):
    """Run raxml on alignment object

    align_obj: Alignment object
    params: you can set any params except -w and -n

    returns: tuple (phylonode, parsimonyphylonode, log likelihood,
        total exec time)
    """
    # generate temp filename for output
    params["-w"] = "/tmp/"
    params["-n"] = get_tmp_filename().split("/")[-1]
    params["-m"] = raxml_model
    params["-p"] = randint(1, 100000)
    ih = '_input_as_multiline_string'
    seqs, align_map = align_obj.toPhylip()
    #print params["-n"]

    # set up command
    raxml_app = Raxml(
        params=params,
        InputHandler=ih,
        WorkingDir=None,
        SuppressStderr=SuppressStderr,
        SuppressStdout=SuppressStdout)

    # run raxml
    ra = raxml_app(seqs)

    # generate tree
    tree_node = DndParser(ra["Result"])

    # generate parsimony tree
    parsimony_tree_node = DndParser(ra["ParsimonyTree"])

    # extract log likelihood from log file
    log_file = ra["Log"]
    total_exec_time = exec_time = log_likelihood = 0.0
    for line in log_file:
        exec_time, log_likelihood = map(float, line.split())
        total_exec_time += exec_time

    # remove output files
    ra.cleanUp()
    return tree_node, parsimony_tree_node, log_likelihood, total_exec_time


def build_tree_from_alignment(aln, moltype, best_tree=False, params={}):
    """Returns a tree from Alignment object aln.

    aln: a cogent.core.alignment.Alignment object, or data that can be
        used to build one.
    moltype: cogent.core.moltype.MolType object
    best_tree: best_tree support is currently not implemented
    params: dict of parameters to pass in to the RAxML app controller.

    The result will be a cogent.core.tree.PhyloNode object, or None if
    tree inference fails.
    """
    if best_tree:
        raise NotImplementedError

    if '-m' not in params:
        if moltype == DNA or moltype == RNA:
            #params["-m"] = 'GTRMIX'
            # in version 7.2.3, GTRMIX is no longer supported, but GTRCAT
            # behaves like GTRMIX (http://www.phylo.org/tools/raxmlhpc2.html)
            params["-m"] = 'GTRGAMMA'
        elif moltype == PROTEIN:
            params["-m"] = 'PROTGAMMAmatrixName'
        else:
            raise ValueError, "Moltype must be either DNA, RNA, or PROTEIN"

    if not hasattr(aln, 'toPhylip'):
        aln = Alignment(aln)
    seqs, align_map = aln.toPhylip()

    # generate temp filename for output
    params["-w"] = "/tmp/"
    params["-n"] = get_tmp_filename().split("/")[-1]
    params["-k"] = True
    params["-p"] = randint(1, 100000)
    params["-x"] = randint(1, 100000)
    ih = '_input_as_multiline_string'

    raxml_app = Raxml(params=params,
                      InputHandler=ih,
                      WorkingDir=None,
                      SuppressStderr=True,
                      SuppressStdout=True)

    raxml_result = raxml_app(seqs)
    tree = DndParser(raxml_result['Bootstrap'], constructor=PhyloNode)
    for node in tree.tips():
        node.Name = align_map[node.Name]

    raxml_result.cleanUp()
    return tree


def insert_sequences_into_tree(seqs, moltype, params={}, write_log=True):
    """Insert sequences into a tree.

    seqs: sequence data to insert, in a format accepted by the selected
        input handler
    moltype: cogent.core.moltype.MolType object
    params: dict of parameters to pass in to the RAxML app controller.

    The result will be a tree.
""" ih = '_input_as_multiline_string' raxml_app = Raxml(params=params, InputHandler=ih, WorkingDir=None, SuppressStderr=False, SuppressStdout=False, HALT_EXEC=False) raxml_result = raxml_app(seqs) # write a log file if write_log: log_fp = join(params["-w"],'log_raxml_'+split(get_tmp_filename())[-1]) log_file=open(log_fp,'w') log_file.write(raxml_result['StdOut'].read()) log_file.close() ''' # getting setup since parsimony doesn't output tree..only jplace, however # it is currently corrupt # use guppy to convert json file into a placement tree guppy_params={'tog':None} new_tree=build_tree_from_json_using_params(raxml_result['json'].name, \ output_dir=params["-w"], \ params=guppy_params) ''' # get tree from 'Result Names' new_tree=raxml_result['Result'].readlines() filtered_tree=re.sub('\[I\d+\]','',str(new_tree)) tree = DndParser(filtered_tree, constructor=PhyloNode) raxml_result.cleanUp() return tree PyCogent-1.5.3/cogent/app/rdp_classifier.py000644 000765 000024 00000052534 12024702176 021654 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python """Application controller for rdp_classifier-2.0 """ __author__ = "Kyle Bittinger" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Kyle Bittinger","Greg Caporaso"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Kyle Bittinger" __email__ = "kylebittinger@gmail.com" __status__ = "Prototype" import os.path import re from os import remove, environ, getenv, path from optparse import OptionParser from shutil import rmtree import tempfile from cogent.app.parameters import Parameter, ValuedParameter, Parameters from cogent.parse.fasta import MinimalFastaParser from cogent.app.util import CommandLineApplication, CommandLineAppResult, \ FilePath, ResultPath, guess_input_handler, system,\ ApplicationNotFoundError, ApplicationError from cogent.util.misc import app_path class RdpClassifier(CommandLineApplication): """RDP Classifier application controller The RDP Classifier program is 
distributed as a java archive (.jar) file. If the file 'rdp_classifier-2.2.jar' is not found in the current directory, the app controller uses the JAR file specified by the environment variable RDP_JAR_PATH. If this variable is not set, and 'rdp_classifier-2.2.jar' is not found in the current directory, the application controller raises an ApplicationNotFoundError. The RDP Classifier often requires memory in excess of Java's default 64M. To correct this situation, the authors recommend increasing the maximum heap size for the java virtual machine. An option '-Xmx' (default 1000M) is provided for this purpose. Details on this option may be found at http://java.sun.com/j2se/1.5.0/docs/tooldocs/solaris/java.html The classifier may optionally use a custom training set. The full path to the training set may be provided in the option '-training-data'. """ _input_handler = '_input_as_lines' _command = "rdp_classifier-2.2.jar" _options = { # output file name for classification assignment '-o': ValuedParameter('-', Name='o', Delimiter=' ', IsPath=True), # a property file contains the mapping of the training # files. Note: the training files and the property file should # be in the same directory. The default property file is set # to data/classifier/rRNAClassifier.properties. '-t': ValuedParameter('-', Name='t', Delimiter=' ', IsPath=True), # all tab delimited output format: [allrank|fixrank|db]. # Default is allrank. # # allrank: outputs the results for all ranks applied for # each sequence: seqname, orientation, taxon name, rank, # conf, ... # # fixrank: only outputs the results for fixed ranks in # order: no rank, domain, phylum, class, order, family, # genus # # db: outputs the seqname, trainset_no, tax_id, conf. 
This # is good for storing in a database '-f': ValuedParameter('-', Name='f', Delimiter=' '), } # The following are available in the attributes JvmParameters, # JarParameters, and PositionalParameters _jvm_synonyms = {} _jvm_parameters = { # Maximum heap size for JVM. '-Xmx': ValuedParameter('-', Name='Xmx', Delimiter='', Value='1000m'), } _parameters = {} _parameters.update(_options) _parameters.update(_jvm_parameters) def getHelp(self): """Returns documentation string""" # Summary paragraph copied from rdp_classifier-2.0, which is # licensed under the GPL 2.0 and Copyright 2008 Michigan State # University Board of Trustees help_str = """\ usage: ClassifierCmd [-f ] [-o ] [-q ] [-t ] -f,--format all tab delimited output format: [allrank|fixrank|db]. Default is allrank. allrank: outputs the results for all ranks applied for each sequence: seqname, orientation, taxon name, rank, conf, ... fixrank: only outputs the results for fixed ranks in order: no rank, domain, phylum, class, order, family, genus db: outputs the seqname, trainset_no, tax_id, conf. This is good for storing in a database -o,--outputFile output file name for classification assignment -q,--queryFile query file contains sequences in one of the following formats: Fasta, Genbank and EMBL -t,--train_propfile a property file contains the mapping of the training files. Note: the training files and the property file should be in the same directory. The default property file is set to data/classifier/rRNAClassifier.properties.""" return help_str def _accept_exit_status(self, status): """Returns false if an error occurred in execution """ return (status == 0) def _error_on_missing_application(self,params): """Raise an ApplicationNotFoundError if the app is not accessible In this case, checks for the java runtime and the RDP jar file. """ if not (os.path.exists('java') or app_path('java')): raise ApplicationNotFoundError( "Cannot find java runtime. Is it installed? 
Is it in your " "path?") jar_fp = self._get_jar_fp() if jar_fp is None: raise ApplicationNotFoundError( "JAR file not found in current directory and the RDP_JAR_PATH " "environment variable is not set. Please set RDP_JAR_PATH to " "the full pathname of the JAR file.") if not os.path.exists(jar_fp): raise ApplicationNotFoundError( "JAR file %s does not exist." % jar_fp) def _get_jar_fp(self): """Returns the full path to the JAR file. If the JAR file cannot be found in the current directory and the environment variable RDP_JAR_PATH is not set, returns None. """ # handles case where the jar file is in the current working directory if os.path.exists(self._command): return self._command # handles the case where the user has specified the location via # an environment variable elif 'RDP_JAR_PATH' in environ: return getenv('RDP_JAR_PATH') else: return None # Overridden to pull out JVM-specific command-line arguments. def _get_base_command(self): """Returns the base command plus command-line options. Does not include input file, output file, and training set. """ cd_command = ''.join(['cd ', str(self.WorkingDir), ';']) jvm_command = "java" jvm_arguments = self._commandline_join( [self.Parameters[k] for k in self._jvm_parameters]) jar_arguments = '-jar "%s"' % self._get_jar_fp() rdp_arguments = self._commandline_join( [self.Parameters[k] for k in self._options]) command_parts = [ cd_command, jvm_command, jvm_arguments, jar_arguments, rdp_arguments, '-q'] return self._commandline_join(command_parts).strip() BaseCommand = property(_get_base_command) def _commandline_join(self, tokens): """Formats a list of tokens as a shell command This seems to be a repeated pattern; may be useful in superclass. 
""" commands = filter(None, map(str, tokens)) return self._command_delimiter.join(commands).strip() def _get_result_paths(self,data): """ Return a dict of ResultPath objects representing all possible output """ assignment_fp = str(self.Parameters['-o'].Value).strip('"') if not os.path.isabs(assignment_fp): assignment_fp = os.path.relpath(assignment_fp, self.WorkingDir) return {'Assignments': ResultPath(assignment_fp, IsWritten=True)} class RdpTrainer(RdpClassifier): _input_handler = '_input_as_lines' TrainingClass = 'edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker' PropertiesFile = 'RdpClassifier.properties' _parameters = { 'taxonomy_file': ValuedParameter(None, None, IsPath=True), 'model_output_dir': ValuedParameter(None, None, IsPath=True), 'training_set_id': ValuedParameter(None, None, Value='1'), 'taxonomy_version': ValuedParameter(None, None, Value='version1'), 'modification_info': ValuedParameter(None, None, Value='cogent'), } _jvm_parameters = { # Maximum heap size for JVM. '-Xmx': ValuedParameter('-', Name='Xmx', Delimiter='', Value='1000m'), } _parameters.update(_jvm_parameters) def _get_base_command(self): """Returns the base command plus command-line options. Handles everything up to and including the classpath. The positional training parameters are added by the _input_handler_decorator method. """ cd_command = ''.join(['cd ', str(self.WorkingDir), ';']) jvm_command = "java" jvm_args = self._commandline_join( [self.Parameters[k] for k in self._jvm_parameters]) cp_args = '-cp "%s" %s' % (self._get_jar_fp(), self.TrainingClass) command_parts = [cd_command, jvm_command, jvm_args, cp_args] return self._commandline_join(command_parts).strip() BaseCommand = property(_get_base_command) def _set_input_handler(self, method_name): """Stores the selected input handler in a private attribute. """ self.__InputHandler = method_name def _get_input_handler(self): """Returns decorator that wraps the requested input handler. 
""" return '_input_handler_decorator' InputHandler = property(_get_input_handler, _set_input_handler) @property def ModelDir(self): """Absolute FilePath to the training output directory. """ model_dir = self.Parameters['model_output_dir'].Value absolute_model_dir = os.path.abspath(model_dir) return FilePath(absolute_model_dir) def _input_handler_decorator(self, data): """Adds positional parameters to selected input_handler's results. """ input_handler = getattr(self, self.__InputHandler) input_parts = [ self.Parameters['taxonomy_file'], input_handler(data), self.Parameters['training_set_id'], self.Parameters['taxonomy_version'], self.Parameters['modification_info'], self.ModelDir, ] return self._commandline_join(input_parts) def _get_result_paths(self, output_dir): """Return a dict of output files. """ # Only include the properties file here. Add the other result # paths in the __call__ method, so we can catch errors if an # output file is not written. self._write_properties_file() properties_fp = os.path.join(self.ModelDir, self.PropertiesFile) result_paths = { 'properties': ResultPath(properties_fp, IsWritten=True,) } return result_paths def _write_properties_file(self): """Write an RDP training properties file manually. """ # The properties file specifies the names of the files in the # training directory. We use the example properties file # directly from the rdp_classifier distribution, which lists # the default set of files created by the application. We # must write this file manually after generating the # training data. 
properties_fp = os.path.join(self.ModelDir, self.PropertiesFile) properties_file = open(properties_fp, 'w') properties_file.write( "# Sample ResourceBundle properties file\n" "bergeyTree=bergeyTrainingTree.xml\n" "probabilityList=genus_wordConditionalProbList.txt\n" "probabilityIndex=wordConditionalProbIndexArr.txt\n" "wordPrior=logWordPrior.txt\n" "classifierVersion=Naive Bayesian rRNA Classifier Version 1.0, " "November 2003\n" ) properties_file.close() def __call__(self, data=None, remove_tmp=True): """Run the application with the specified kwargs on data data: anything that can be cast into a string or written out to a file. Usually either a list of things or a single string or number. input_handler will be called on this data before it is passed as part of the command-line argument, so by creating your own input handlers you can customize what kind of data you want your application to accept remove_tmp: if True, removes tmp files """ result = super(RdpClassifier, self).__call__(data=data, remove_tmp=remove_tmp) training_files = { 'bergeyTree': 'bergeyTrainingTree.xml', 'probabilityList': 'genus_wordConditionalProbList.txt', 'probabilityIndex': 'wordConditionalProbIndexArr.txt', 'wordPrior': 'logWordPrior.txt', } for key, training_fn in sorted(training_files.items()): training_fp = os.path.join(self.ModelDir, training_fn) if not os.path.exists(training_fp): exception_msg = ( "Training output file %s not found. This may " "happen if an error occurred during the RDP training " "process. More details may be available in the " "standard error, printed below.\n\n" % training_fp ) stderr_msg = result["StdErr"].read() result["StdErr"].seek(0) raise ApplicationError(exception_msg + stderr_msg) # Not in try/except clause because we already know the # file exists. Failure would be truly exceptional, and we # want to maintain the original exception in that case. 
result[key] = open(training_fp) return result def parse_command_line_parameters(argv=None): """ Parses command line arguments """ usage =\ 'usage: %prog [options] input_sequences_filepath' version = 'Version: %prog ' + __version__ parser = OptionParser(usage=usage, version=version) parser.add_option('-o','--output_fp',action='store',\ type='string',dest='output_fp',help='Path to store '+\ 'output file [default: generated from input_sequences_filepath]') parser.add_option('-c','--min_confidence',action='store',\ type='float',dest='min_confidence',help='minimum confidence '+\ 'level to return a classification [default: %default]') parser.set_defaults(verbose=False,min_confidence=0.80) opts, args = parser.parse_args(argv) if len(args) != 1: parser.error('Exactly one argument is required.') return opts, args def assign_taxonomy( data, min_confidence=0.80, output_fp=None, training_data_fp=None, fixrank=True, max_memory=None): """Assign taxonomy to each sequence in data with the RDP classifier data: open fasta file object or list of fasta lines confidence: minimum support threshold to assign taxonomy to a sequence output_fp: path to write output; if not provided, result will be returned in a dict of {seq_id:(taxonomy_assignment,confidence)} """ # Going to iterate through this twice in succession, best to force # evaluation now data = list(data) # RDP classifier doesn't preserve identifiers with spaces # Use lookup table seq_id_lookup = {} for seq_id, seq in MinimalFastaParser(data): seq_id_lookup[seq_id.split()[0]] = seq_id app = RdpClassifier() if max_memory is not None: app.Parameters['-Xmx'].on(max_memory) temp_output_file = tempfile.NamedTemporaryFile( prefix='RdpAssignments_', suffix='.txt') app.Parameters['-o'].on(temp_output_file.name) if training_data_fp is not None: app.Parameters['-t'].on(training_data_fp) if fixrank: app.Parameters['-f'].on('fixrank') else: app.Parameters['-f'].on('allrank') app_result = app(data) assignments = {} # ShortSequenceException 
    # messages are written to stdout. Tag these IDs as unassignable.
    for line in app_result['StdOut']:
        excep = parse_rdp_exception(line)
        if excep is not None:
            _, rdp_id = excep
            orig_id = seq_id_lookup[rdp_id]
            assignments[orig_id] = ('Unassignable', 1.0)

    for line in app_result['Assignments']:
        rdp_id, direction, taxa = parse_rdp_assignment(line)
        if taxa[0][0] == "Root":
            taxa = taxa[1:]
        orig_id = seq_id_lookup[rdp_id]
        lineage, confidence = get_rdp_lineage(taxa, min_confidence)
        if lineage:
            assignments[orig_id] = (';'.join(lineage), confidence)
        else:
            assignments[orig_id] = ('Unclassified', 1.0)

    if output_fp:
        try:
            output_file = open(output_fp, 'w')
        except IOError:
            raise IOError("Can't open output file for writing: %s" % output_fp)
        for seq_id, assignment in assignments.items():
            lineage, confidence = assignment
            output_file.write(
                '%s\t%s\t%1.3f\n' % (seq_id, lineage, confidence))
        output_file.close()
        return None
    else:
        return assignments


def train_rdp_classifier(
        training_seqs_file, taxonomy_file, model_output_dir,
        max_memory=None):
    """ Train RDP Classifier, saving to model_output_dir

    training_seqs_file, taxonomy_file: file-like objects used to
        train the RDP Classifier (see RdpTrainer documentation for
        format of training data)

    model_output_dir: directory in which to save the files necessary to
        classify sequences according to the training data

    Once the model data has been generated, the RDP Classifier may
    classify new sequences against it: pass the resulting 'properties'
    file path as training_data_fp to assign_taxonomy.
    """
    app = RdpTrainer()
    if max_memory is not None:
        app.Parameters['-Xmx'].on(max_memory)
    temp_taxonomy_file = tempfile.NamedTemporaryFile(
        prefix='RdpTaxonomy_', suffix='.txt')
    temp_taxonomy_file.write(taxonomy_file.read())
    temp_taxonomy_file.seek(0)
    app.Parameters['taxonomy_file'].on(temp_taxonomy_file.name)
    app.Parameters['model_output_dir'].on(model_output_dir)
    return app(training_seqs_file)


def train_rdp_classifier_and_assign_taxonomy(
        training_seqs_file, taxonomy_file, seqs_to_classify,
        min_confidence=0.80, model_output_dir=None,
        classification_output_fp=None,
max_memory=None): """ Train RDP Classifier and assign taxonomy in one fell swoop The file objects training_seqs_file and taxonomy_file are used to train the RDP Classifier (see RdpTrainer documentation for details). Model data is stored in model_output_dir. If model_output_dir is not provided, a temporary directory is created and removed after classification. The sequences in seqs_to_classify are classified according to the model and filtered at the desired confidence level (default: 0.80). The results are saved to classification_output_fp if provided, otherwise a dict of {seq_id:(taxonomy_assignment,confidence)} is returned. """ if model_output_dir is None: training_dir = tempfile.mkdtemp(prefix='RdpTrainer_') else: training_dir = model_output_dir training_results = train_rdp_classifier( training_seqs_file, taxonomy_file, training_dir, max_memory=max_memory) training_data_fp = training_results['properties'].name assignment_results = assign_taxonomy( seqs_to_classify, min_confidence=min_confidence, output_fp=classification_output_fp, training_data_fp=training_data_fp, max_memory=max_memory, fixrank=False) if model_output_dir is None: rmtree(training_dir) return assignment_results def get_rdp_lineage(rdp_taxa, min_confidence): lineage = [] obs_confidence = 1.0 for taxon, rank, confidence in rdp_taxa: if confidence >= min_confidence: obs_confidence = confidence lineage.append(taxon) else: break return lineage, obs_confidence def parse_rdp_exception(line): if line.startswith('ShortSequenceException'): matchobj = re.search('recordID=(\S+)', line) if matchobj: rdp_id = matchobj.group(1) return ('ShortSequenceException', rdp_id) return None def parse_rdp_assignment(line): """Returns a list of assigned taxa from an RDP classification line """ toks = line.strip().split('\t') seq_id = toks.pop(0) direction = toks.pop(0) if ((len(toks) % 3) != 0): raise ValueError( "Expected assignments in a repeating series of (rank, name, " "confidence), received %s" % toks) assignments = 
        []
    # Fancy way to create list of triples using consecutive items from
    # input.  See grouper function in documentation for itertools for
    # more general example.
    itoks = iter(toks)
    for taxon, rank, confidence_str in zip(itoks, itoks, itoks):
        if not taxon:
            continue
        assignments.append(
            (taxon.strip('"'), rank, float(confidence_str)))
    return seq_id, direction, assignments


if __name__ == "__main__":
    opts, args = parse_command_line_parameters()
    assign_taxonomy(
        open(args[0]), min_confidence=opts.min_confidence,
        output_fp=opts.output_fp)

PyCogent-1.5.3/cogent/app/rdp_classifier20.py

#!/usr/bin/env python
"""Application controller for rdp_classifier-2.0
"""

__author__ = "Kyle Bittinger"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Kyle Bittinger", "Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kyle Bittinger"
__email__ = "kylebittinger@gmail.com"
__status__ = "Prototype"

import re
from os import remove, environ, getenv, path
from os.path import exists
from optparse import OptionParser
from shutil import rmtree
from tempfile import mkdtemp

from cogent.app.parameters import Parameter, ValuedParameter, Parameters
from cogent.parse.fasta import MinimalFastaParser
from cogent.app.rdp_classifier import RdpClassifier
from cogent.app.util import CommandLineApplication, CommandLineAppResult, \
    FilePath, ResultPath, guess_input_handler, system, \
    ApplicationNotFoundError, ApplicationError


class RdpClassifier20(CommandLineApplication):
    """RDP Classifier version 2.0 application controller

    The RDP Classifier program is distributed as a java archive (.jar)
    file.  If the file 'rdp_classifier-2.0.jar' is not found in the
    current directory, the app controller looks in the directory
    specified by the environment variable RDP_JAR_PATH.
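The `zip(itoks, itoks, itoks)` grouping trick used by `parse_rdp_assignment` above can be shown in isolation. The input line below is a made-up illustration of the tab-separated layout the parser expects, not real RDP output:

```python
# Zipping one iterator with itself three times pairs off consecutive
# tokens into (taxon, rank, confidence) triples, the same idiom
# parse_rdp_assignment uses. The line here is hypothetical.
line = 'seq1\t-\t"Bacteria"\tdomain\t1.0\tFirmicutes\tphylum\t0.95'
toks = line.strip().split('\t')
seq_id, direction = toks.pop(0), toks.pop(0)
itoks = iter(toks)
assignments = [(taxon.strip('"'), rank, float(conf))
               for taxon, rank, conf in zip(itoks, itoks, itoks)]
print(assignments)
```

Because all three arguments are the same iterator, each triple consumes three fresh tokens; a leftover token count not divisible by three is what the `ValueError` above guards against.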
If this variable is not set, and 'rdp_classifier-2.0.jar' is not found in the current directory, the application controller raises an ApplicationNotFoundError. The RDP Classifier often requires memory in excess of Java's default 64M. To correct this situation, the authors recommend increasing the maximum heap size for the java virtual machine. An option '-Xmx' (default 1000M) is provided for this purpose. Details on this option may be found at http://java.sun.com/j2se/1.5.0/docs/tooldocs/solaris/java.html The classifier may optionally use a custom training set. The full path to the training set may be provided in the option '-training-data'. """ _input_handler = '_input_as_multiline_string' _command = "rdp_classifier-2.0.jar" _options ={} # The following are available in the attributes JvmParameters, # JarParameters, and PositionalParameters _jvm_synonyms = {} _jvm_parameters = { # Maximum heap size for JVM. '-Xmx': ValuedParameter('-', Name='Xmx', Delimiter='', Value='1000m'), } _positional_synonyms = {} _positional_parameters = { '-training-data': ValuedParameter('', Name='', Delimiter='', Value='', IsPath=True), } _parameters = {} _parameters.update(_options) _parameters.update(_jvm_parameters) _parameters.update(_positional_parameters) def getHelp(self): """Returns documentation string""" # Summary paragraph copied from rdp_classifier-2.0, which is # licensed under the GPL 2.0 and Copyright 2008 Michigan State # University Board of Trustees help_str =\ """ Ribosomal Database Project - Classifier http://rdp.cme.msu.edu/classifier/ The RDP Classifier is a naive Bayesian classifier which was developed to provide rapid taxonomic placement based on rRNA sequence data. The RDP Classifier can rapidly and accurately classify bacterial 16s rRNA sequences into the new higher-order taxonomy proposed by Bergey's Trust. It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. 
The RDP Classifier is not limited to using the bacterial taxonomy proposed by the Bergey's editors. It worked equally well when trained on the NCBI taxonomy. The RDP Classifier likely can be adapted to additional phylogenetically coherent bacterial taxonomies. The following paper should be cited if this resource is used: Wang, Q, G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol. 73(16):5261-7. """ return help_str def __call__(self, data=None, remove_tmp=True): """Run the application with the specified kwargs on data data: anything that can be cast into a string or written out to a file. Usually either a list of things or a single string or number. input_handler will be called on this data before it is passed as part of the command-line argument, so by creating your own input handlers you can customize what kind of data you want your application to accept remove_tmp: if True, removes tmp files """ input_handler = self.InputHandler suppress_stdout = self.SuppressStdout suppress_stderr = self.SuppressStderr assignment_fp = FilePath(self.getTmpFilename(self.TmpDir)) if suppress_stdout: outfile = FilePath('/dev/null') else: outfile = FilePath(self.getTmpFilename(self.TmpDir)) if suppress_stderr: errfile = FilePath('/dev/null') else: errfile = FilePath(self.getTmpFilename(self.TmpDir)) if data is None: input_arg = '' else: input_arg = getattr(self,input_handler)(data) training_data = self.PositionalParameters['-training-data'] # Build up the command, consisting of a BaseCommand followed by # input and output (file) specifications command = self._commandline_join( [self.BaseCommand, input_arg, assignment_fp, training_data, '>', outfile, '2>', errfile,] ) if self.HaltExec: raise AssertionError, "Halted exec with command:\n" + command # The return value of system is a 16-bit number containing the signal # number that killed the process, and then the 
exit status. # We only want to keep the exit status so do a right bitwise shift to # get rid of the signal number byte exit_status = system(command) >> 8 # Determine if error should be raised due to exit status of # appliciation if not self._accept_exit_status(exit_status): raise ApplicationError, \ 'Unacceptable application exit status: %s, command: %s'\ % (str(exit_status),command) # open the stdout and stderr if not being suppressed out = None if not suppress_stdout: out = open(outfile,"r") err = None if not suppress_stderr: err = open(errfile,"r") result_paths = self._get_result_paths(data) result_paths['Assignments'] = ResultPath(assignment_fp) result = CommandLineAppResult( out, err, exit_status, result_paths=result_paths) # Clean up the input file if one was created if remove_tmp: if self._input_filename: remove(self._input_filename) self._input_filename = None return result def _accept_exit_status(self, status): """Returns false if an error occurred in execution """ return (status == 0) def _error_on_missing_application(self,params): """Raise an ApplicationNotFoundError if the app is not accessible """ command = self._get_jar_fp() if not exists(command): raise ApplicationNotFoundError,\ "Cannot find jar file. Is it installed? Is $RDP_JAR_PATH"+\ " set correctly?" def _get_jar_fp(self): """Returns the full path to the JAR file. Raises an ApplicationError if the JAR file cannot be found in the (1) current directory or (2) the path specified in the RDP_JAR_PATH environment variable. """ # handles case where the jar file is in the current working directory if exists(self._command): return self._command # handles the case where the user has specified the location via # an environment variable elif 'RDP_JAR_PATH' in environ: return getenv('RDP_JAR_PATH') # error otherwise else: raise ApplicationError,\ "$RDP_JAR_PATH is not set -- this must be set to use the"+\ " RDP classifier application controller." # Overridden to pull out JVM-specific command-line arguments. 
def _get_base_command(self): """Returns the base command plus command-line options. Does not include input file, output file, and training set. """ # Necessary? Preserve for consistency. if self._command is None: raise ApplicationError, '_command has not been set.' # Append a change directory to the beginning of the command to change # to self.WorkingDir before running the command # WorkingDir should be in quotes -- filenames might contain spaces cd_command = ''.join(['cd ',str(self.WorkingDir),';']) jvm_command = "java" jvm_arguments = self._commandline_join(self.JvmParameters.values()) jar_arguments = '-jar "%s"' % self._get_jar_fp() result = self._commandline_join( [cd_command, jvm_command, jvm_arguments, jar_arguments] ) return result BaseCommand = property(_get_base_command) def _commandline_join(self, tokens): """Formats a list of tokens as a shell command This seems to be a repeated pattern; may be useful in superclass. """ commands = filter(None, map(str, tokens)) return self._command_delimiter.join(commands).strip() @property def JvmParameters(self): return self.__extract_parameters('jvm') @property def PositionalParameters(self): return self.__extract_parameters('positional') def __extract_parameters(self, name): """Extracts parameters in self.__parameters from self.Parameters Allows the program to conveniently access a subset of user- adjusted parameters, which are stored in the Parameters attribute. Relies on the convention of providing dicts named according to "__parameters" and "__synonyms". The main parameters object is expected to be initialized with the contents of these dicts. This method will throw an exception if either convention is not adhered to. 
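The `_commandline_join` helper above is simple enough to sketch standalone: stringify every token, drop the empties, and join on the command delimiter (a single space is assumed here, matching the usual `CommandLineApplication` default):

```python
# Minimal re-creation of the _commandline_join pattern: cast each token
# to str, filter out empty strings, and join with a space.
def commandline_join(tokens, delimiter=' '):
    commands = [s for s in (str(t) for t in tokens) if s]
    return delimiter.join(commands).strip()

command = commandline_join(['cd /tmp;', 'java', '-Xmx1000m', '', '-jar "rdp.jar"'])
print(command)
```

Filtering after stringification lets optional arguments simply be passed as empty strings without leaving double spaces in the final shell command.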
""" parameters = getattr(self, '_' + name + '_parameters') synonyms = getattr(self, '_' + name + '_synonyms') result = Parameters(parameters, synonyms) for key in result.keys(): result[key] = self.Parameters[key] return result class RdpTrainer20(RdpClassifier20): _input_handler = '_input_as_lines' TrainingClass = 'edu/msu/cme/rdp/classifier/train/ClassifierTraineeMaker' PropertiesFile = 'RdpClassifier.properties' def __call__(self, training_seqs_file, taxonomy_file, model_output_dir, remove_tmp=True): return self._train_with_rdp_files( training_seqs_file, taxonomy_file, model_output_dir, remove_tmp) def _train_with_mapping_file(self, training_seqs_file, lineage_file, model_output_dir, remove_tmp=True): """Creates a set of training data for the RDP Classifier training_seqs_file: The set of training sequences, in fasta format. lineage_file: A File-like object that specifies a lineage for each sequence. Each line must contain a Sequence ID, followed by a tab, then followed by the assigned lineage. The taxa comprising the lineage must be separated with a comma. model_output_dir: Directory in which to store training data. remove_tmp: if True, removes tmp files To use the resulting model with the RdpClassifier, set '-training_data' to the following path: model_output_dir + RdpClassifier.PropertiesFile """ def _train_with_rdp_files(self, training_seqs_file, taxonomy_file, model_output_dir, remove_tmp=True): """Creates a set of training data for the RDP Classifier training_seqs_file: A pre-classified set of training sequences, in fasta-like format. Each sequence must be labelled with an identifier (no spaces) and an assigned lineage (taxa separated by ';'). Example of a valid label: ">seq1 ROOT;Ph1;Fam1;G1;" taxonomy_file: A File-like object that specifies a taxonomic heirarchy. Each line in the file must contain a '*'-separated list of the following items: Taxon ID, Taxon Name, Parent Taxon ID, Depth, and Rank. IDs should have an integer format. 
Example of a valid line: "1*Bacteria*0*0*domain" model_output_dir: Directory in which to store training data. remove_tmp: if True, removes tmp files To use the resulting model with the RdpClassifier, set '-training_data' to the following path: model_output_dir + RdpClassifier.PropertiesFile """ # Three extra pieces of information are required to create # training data. Unless we want built-in support for # versioned training sets, these may be set to sensible # defaults. training_set_id = '1' taxonomy_version = 'version1' modification_info = 'cogent' # The properties file specifies the names of the files in the # training directory. We use the example properties file # directly from the rdp_classifier distribution, which lists # the default set of files created by the application. We # must write this file explicitly after generating the # training data. properties = ( "# Sample ResourceBundle properties file\n" "bergeyTree=bergeyTrainingTree.xml\n" "probabilityList=genus_wordConditionalProbList.txt\n" "probabilityIndex=wordConditionalProbIndexArr.txt\n" "wordPrior=logWordPrior.txt\n" "classifierVersion=Naive Bayesian rRNA Classifier Version 1.0, November 2003\n" ) input_handler = self.InputHandler suppress_stdout = self.SuppressStdout suppress_stderr = self.SuppressStderr if suppress_stdout: outfile = FilePath('/dev/null') else: outfile = self.getTmpFilename(self.TmpDir) if suppress_stderr: errfile = FilePath('/dev/null') else: errfile = FilePath(self.getTmpFilename(self.TmpDir)) input_handler_function = getattr(self, input_handler) taxonomy_filename = input_handler_function(taxonomy_file) training_seqs_filename = input_handler_function(training_seqs_file) # Build up the command, consisting of a BaseCommand followed # by input and output (file) specifications # Example from rdp_classifier/sampledata/README: # java -Xmx400m -cp rdp_classifier-2.0.jar # edu/msu/cme/rdp/classifier/train/ClassifierTraineeMaker # mydata/mytaxon.txt mydata/mytrainseq.fasta 1 version1 
test # mydata command = self._commandline_join( [self.BaseCommand, taxonomy_filename, training_seqs_filename, training_set_id, taxonomy_version, modification_info, model_output_dir, '>', outfile, '2>', errfile] ) if self.HaltExec: raise AssertionError, "Halted exec with command:\n" + command # The return value of system is a 16-bit number containing the signal # number that killed the process, and then the exit status. # We only want to keep the exit status so do a right bitwise shift to # get rid of the signal number byte exit_status = system(command) >> 8 # Determine if error should be raised due to exit status of # appliciation if not self._accept_exit_status(exit_status): raise ApplicationError, \ 'Unacceptable application exit status: %s, command: %s'\ % (str(exit_status),command) # must write properties file to output directory manually properties_fp = path.join(model_output_dir, self.PropertiesFile) properties_file = open(properties_fp, 'w') properties_file.write(properties) properties_file.close() # open the stdout and stderr if not being suppressed out = None if not suppress_stdout: out = open(outfile,"r") err = None if not suppress_stderr: err = open(errfile,"r") result = CommandLineAppResult(out, err, exit_status, result_paths=self._get_result_paths(model_output_dir)) # Clean up the input files if remove_tmp: remove(taxonomy_filename) remove(training_seqs_filename) return result def _input_as_lines(self, data): """ Write a seq of lines to a temp file and return the filename string. This method has been overridden for RdpTrainer so that the _input_filename attribute is not assigned. data: a sequence to be written to a file, each element of the sequence will compose a line in the file * Note: the result will be the filename as a FilePath object (which is a string subclass). 
* Note: '\n' will be stripped off the end of each sequence element before writing to a file in order to avoid multiple new lines accidentally be written to a file """ filename = FilePath(self.getTmpFilename(self.TmpDir)) data_file = open(filename, 'w') # Parent method does not take advantage of laziness, due to # temporary variable that contains entire file contents -- # better to write explicit loop over lines in the data source, # storing only each line in turn. for line in data: line = str(line).strip('\n') data_file.write(line) data_file.write('\n') data_file.close() return filename def _get_result_paths(self, output_dir): files = { 'bergeyTree': 'bergeyTrainingTree.xml', 'probabilityList': 'genus_wordConditionalProbList.txt', 'probabilityIndex': 'wordConditionalProbIndexArr.txt', 'wordPrior': 'logWordPrior.txt', 'properties': self.PropertiesFile, } result_paths = {} for name, file in files.iteritems(): result_paths[name] = ResultPath( Path=path.join(output_dir, file), IsWritten=True) return result_paths # Overridden to pull out JVM-specific command-line arguments. def _get_base_command(self): """Returns the base command plus command-line options. Does not include input file, output file, and training set. """ # Necessary? Preserve for consistency. if self._command is None: raise ApplicationError, '_command has not been set.' 
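The lazy write loop described in `_input_as_lines` above, writing each line as it is produced instead of joining the whole file contents in memory, can be sketched like this (the helper name and sample lines are illustrative):

```python
import os
import tempfile

# Write each line as it is produced, stripping any trailing newline
# first so no accidental blank lines are written -- the same pattern
# as the overridden _input_as_lines.
def write_lines(lines, path):
    out = open(path, 'w')
    for line in lines:
        out.write(str(line).strip('\n'))
        out.write('\n')
    out.close()

fd, path = tempfile.mkstemp()
os.close(fd)
write_lines(['>seq1 ROOT;Ph1;Fam1;G1;', 'ACGUACGU\n'], path)
print(open(path).read())
```

Streaming one line at a time keeps memory flat even for large training sets, which is the stated motivation in the comment above.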
# Append a change directory to the beginning of the command to change # to self.WorkingDir before running the command # WorkingDir should be in quotes -- filenames might contain spaces cd_command = ''.join(['cd ',str(self.WorkingDir),';']) jvm_command = "java" jvm_arguments = self._commandline_join(self.JvmParameters.values()) jar_arguments = '-cp "%s"' % self._get_jar_fp() result = self._commandline_join( [cd_command, jvm_command, jvm_arguments, jar_arguments, self.TrainingClass] ) return result BaseCommand = property(_get_base_command) def parse_command_line_parameters(): """ Parses command line arguments """ usage =\ 'usage: %prog [options] input_sequences_filepath' version = 'Version: %prog ' + __version__ parser = OptionParser(usage=usage, version=version) parser.add_option('-o','--output_fp',action='store',\ type='string',dest='output_fp',help='Path to store '+\ 'output file [default: generated from input_sequences_filepath]') parser.add_option('-c','--min_confidence',action='store',\ type='float',dest='min_confidence',help='minimum confidence '+\ 'level to return a classification [default: %default]') parser.set_defaults(verbose=False,min_confidence=0.80) opts,args = parser.parse_args() num_args = 1 if len(args) != num_args: parser.error('Exactly one argument is required.') return opts,args def assign_taxonomy(data, min_confidence=0.80, output_fp=None, training_data_fp=None, max_memory=None): """ Assign taxonomy to each sequence in data with the RDP classifier data: open fasta file object or list of fasta lines confidence: minimum support threshold to assign taxonomy to a sequence output_fp: path to write output; if not provided, result will be returned in a dict of {seq_id:(taxonomy_assignment,confidence)} """ data = list(data) # build a map of seq identifiers as the RDP classifier doesn't # preserve these perfectly identifier_lookup = {} for seq_id, seq in MinimalFastaParser(data): identifier_lookup[seq_id.split()[0]] = seq_id # build the classifier object 
app = RdpClassifier20() if max_memory is not None: app.Parameters['-Xmx'].on(max_memory) if training_data_fp is not None: app.Parameters['-training-data'].on(training_data_fp) # apply the rdp app controller rdp_result = app('\n'.join(data)) # grab assignment output result_lines = rdp_result['Assignments'] # start a list to store the assignments results = {} # ShortSequenceException messages are written to stdout # Tag these ID's as unassignable stdout_lines = rdp_result['StdOut'] for line in stdout_lines: if line.startswith('ShortSequenceException'): matchobj = re.search('recordID=(\S+)', line) if matchobj: rdp_id = matchobj.group(1) orig_id = identifier_lookup[rdp_id] results[orig_id] = ('Unassignable', 1.0) # iterate over the identifier, assignment strings (this is a bit # of an abuse of the MinimalFastaParser, as these are not truely # fasta lines) for identifier, assignment_str in MinimalFastaParser(result_lines): # get the original identifier from the one in the rdp result identifier = identifier_lookup[\ identifier[:identifier.index('reverse=')].strip()] # build a list to store the assignments we're confident in # (i.e., the ones that have a confidence greater than min_confidence) confident_assignments = [] # keep track of the lowest acceptable confidence value that # has been encountered lowest_confidence = 0.0 # split the taxonomy assignment string assignment_fields = assignment_str.split(';') # iterate over (assignment, assignment confidence) pairs for i in range(0,len(assignment_fields),2): assignment = assignment_fields[i] try: assignment_confidence = float(assignment_fields[i+1]) except IndexError: break # check the confidence of the current assignment if assignment_confidence >= min_confidence: # if the current assignment confidence is greater than # the min, store the assignment and confidence value confident_assignments.append(assignment.strip()) lowest_confidence = assignment_confidence else: # otherwise, we've made it to the lowest assignment that 
                # met the confidence threshold, so bail out of the loop
                break
        # store the identifier, the semi-colon-separated assignments, and the
        # confidence for the last assignment
        results[identifier] = \
            (';'.join(confident_assignments), lowest_confidence)

    if output_fp:
        try:
            output_file = open(output_fp, 'w')
        except OSError:
            raise OSError, "Can't open output file for writing: %s" % output_fp
        for seq_id, values in results.items():
            output_file.write(
                '%s\t%s\t%1.3f\n' % (seq_id, values[0], values[1]))
        output_file.close()
        return None
    else:
        return results


def train_rdp_classifier(
        training_seqs_file, taxonomy_file, model_output_dir,
        max_memory=None):
    """ Train RDP Classifier, saving to model_output_dir

    training_seqs_file, taxonomy_file: file-like objects used to
    train the RDP Classifier (see RdpTrainer documentation for
    format of training data)

    model_output_dir: directory in which to save the files
    necessary to classify sequences according to the training data

    Once the model data has been generated, the RDP Classifier may be
    used to classify new sequences against the saved training set.
    """
    app = RdpTrainer20()
    if max_memory is not None:
        app.Parameters['-Xmx'].on(max_memory)
    return app(training_seqs_file, taxonomy_file, model_output_dir)


def train_rdp_classifier_and_assign_taxonomy(
        training_seqs_file, taxonomy_file, seqs_to_classify,
        min_confidence=0.80, model_output_dir=None,
        classification_output_fp=None, max_memory=None):
    """ Train RDP Classifier and assign taxonomy in one fell swoop

    The file objects training_seqs_file and taxonomy_file are used to
    train the RDP Classifier (see RdpTrainer documentation for
    details). Model data is stored in model_output_dir. If
    model_output_dir is not provided, a temporary directory is created
    and removed after classification.

    The sequences in seqs_to_classify are classified according to the
    model and filtered at the desired confidence level (default:
    0.80). The results are saved to classification_output_fp if
    provided, otherwise a dict of {seq_id: (taxonomy_assignment,
    confidence)} is returned.
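The confidence-threshold walk in `assign_taxonomy` above can be isolated into a small helper. This is a sketch only; the field layout is assumed to alternate taxon and confidence, separated by semicolons, as described for the RDP output above:

```python
# Keep leading (taxon, confidence) pairs while the confidence stays at
# or above min_confidence; report the last accepted confidence, as the
# loop in assign_taxonomy does.
def filter_assignments(assignment_str, min_confidence=0.80):
    fields = assignment_str.split(';')
    confident, lowest = [], 0.0
    for i in range(0, len(fields), 2):
        try:
            conf = float(fields[i + 1])
        except (IndexError, ValueError):
            break
        if conf >= min_confidence:
            confident.append(fields[i].strip())
            lowest = conf
        else:
            break
    return ';'.join(confident), lowest

result = filter_assignments('Bacteria;1.0;Firmicutes;0.95;Clostridia;0.60')
print(result)
```

Because confidences decrease down the lineage, stopping at the first sub-threshold value truncates the assignment at the deepest rank that is still trustworthy.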
""" if model_output_dir is None: training_dir = mkdtemp(prefix='RdpTrainer_') else: training_dir = model_output_dir trainer = RdpTrainer20() if max_memory is not None: trainer.Parameters['-Xmx'].on(max_memory) training_results = trainer( training_seqs_file, taxonomy_file, training_dir) training_data_fp = training_results['properties'].name assignment_results = assign_taxonomy( seqs_to_classify, min_confidence=min_confidence, output_fp=classification_output_fp, training_data_fp=training_data_fp, max_memory=max_memory) if model_output_dir is None: rmtree(training_dir) return assignment_results if __name__ == "__main__": opts,args = parse_command_line_parameters() assign_taxonomy(open(args[0]),min_confidence=opts.min_confidence,\ output_fp=opts.output_fp) PyCogent-1.5.3/cogent/app/rnaalifold.py000644 000765 000024 00000011675 12024702176 020777 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters, _find_synonym from cogent.core.alignment import Alignment from cogent.core.moltype import RNA from cogent.parse.fasta import MinimalFastaParser from cogent.parse.rnaalifold import rnaalifold_parser, MinimalRnaalifoldParser from cogent.format.clustal import clustal_from_alignment from cogent.struct.rna2d import ViennaStructure __author__ = "Shandy Wikman" __copyright__ = "Copyright 2007-2012, The Cogent Project" __contributors__ = ["Shandy Wikman","Jeremy Widmann"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class RNAalifold(CommandLineApplication): """Application controller for RNAalifold application reads aligned RNA sequences from stdin or file.aln and calculates their minimum free energy (mfe) structure, partition function (pf) and base pairing probability matrix. 
OPTIONS -cv Set the weight of the covariance term in the energy function to factor. Default is 1. -nc Set the penalty for non-compatible sequences in the covariance term of the energy function to factor. Default is 1. -E Score pairs with endgaps same as gap-gap pairs. -mis Output \"most informative sequence\" instead of simple consensus: For each column of the alignment output the set of nucleotides with frequence greater than average in IUPAC notation. -p Calculate the partition function and base pairing probability matrix in addition to the mfe structure. Default is calculation of mfe structure only. -noLP Avoid structures without lonely pairs (helices of length 1). In the mfe case structures with lonely pairs are strictly forbid- den. For partition function folding this disallows pairs that can only occur isolated. Setting this option provides a signif- icant speedup. The -T, -d, -4, -noGU, -noCloseGU, -e, -P, -nsp, options should work as in RNAfold If using -C constraints will be read from stdin, the alignment has to given as a filename on the command line. For more info see respective man pages. 
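For orientation, the mfe output that `MinimalRnaalifoldParser` consumes can be pulled apart with a small regex. The sample line below is a hypothetical illustration of typical Vienna-style output (dot-bracket consensus structure followed by the free energy in parentheses), not text taken from this file:

```python
import re

# Hypothetical RNAalifold-style stdout line: dot-bracket consensus
# structure, then the energy breakdown in parentheses. The exact
# format is an assumption based on typical Vienna RNA output.
line = '((((....)))) (-12.30 = -11.10 +  -1.20)'
match = re.match(r'([.()]+)\s+\(\s*(-?\d+\.\d+)', line)
structure = match.group(1)
energy = float(match.group(2))
print(structure, energy)
```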
""" _parameters = { '-cv':ValuedParameter(Prefix='-',Name='cv',Delimiter=' '), '-nc':ValuedParameter(Prefix='-',Name='nc',Delimiter=' '), '-E':FlagParameter(Prefix='-',Name='E'), '-mis':FlagParameter(Prefix='-',Name='mis'), '-noLP':FlagParameter(Prefix='-',Name='noLP'), '-T':ValuedParameter(Prefix='-',Name='T',Value=37,Delimiter=' '), '-4':FlagParameter(Prefix='-',Name=4), '-d':MixedParameter(Prefix='-',Name='d',Delimiter=''), '-noGU':FlagParameter(Prefix='-',Name='noGU'), '-noCloseGU':FlagParameter(Prefix='-',Name='noCloseGU'), '-e':ValuedParameter(Prefix='-',Name='e',Delimiter=' '), '-P':ValuedParameter(Prefix='-',Name='P',Delimiter=' '), '-nsp':ValuedParameter(Prefix='-',Name='nsp',Delimiter=' '), '-C':FlagParameter(Prefix='-',Name='C')} _synonyms = {'Temperature':'-T','Temp':'-T','EnergyRange':'-e'} _command = 'RNAalifold' _input_handler = '_input_as_string' def _get_result_paths(self, data): """Specify the paths of the output files generated by the application You always get back: StdOut, StdErr, and ExitStatus. In addition RNAalifold writes a file: alirna.ps. It seems that this file is always written (no exceptions found so far. The documentation says the application can produce a dotplot (alidot.ps), but it is unclear when this file is produced, and thus it is not added to the results dictionary. """ result = {} result['SS'] = ResultPath(Path=self.WorkingDir+'alirna.ps',\ IsWritten=True) return result def rnaalifold_from_alignment(aln,moltype=RNA,params=None): """Returns seq, pairs, folding energy for alignment. """ #Create Alignment object. Object will handle if seqs are unaligned. 
    # Create Alignment object. Object will handle if seqs are unaligned.
    # Use the moltype argument rather than hard-coding RNA, so the
    # parameter documented in the signature is actually honored.
    aln = Alignment(aln, MolType=moltype)
    int_map, int_keys = aln.getIntMap()
    app = RNAalifold(WorkingDir='/tmp',
                     InputHandler='_input_as_multiline_string', params=params)
    res = app(clustal_from_alignment(int_map))
    #seq,pairs,energy = rnaalifold_parser(res['StdOut'].readlines())
    pairs_list = MinimalRnaalifoldParser(res['StdOut'].readlines())
    res.cleanUp()
    return pairs_list


if __name__ == "__main__":
    from sys import argv
    aln_file = argv[1]
    aln = dict(MinimalFastaParser(open(aln_file, 'U')))
    res = rnaalifold_from_alignment(aln)
    print res

PyCogent-1.5.3/cogent/app/rnaforester.py

#!/usr/bin/env python
from os import mkdir, getcwd, system, remove, close
from random import choice
from cogent.util.misc import app_path
from cogent.app.util import CommandLineApplication, CommandLineAppResult, \
    ResultPath, ApplicationNotFoundError, ApplicationError
from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter, \
    MixedParameter, Parameters, _find_synonym, is_not_None

__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman", "Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"


class RNAforester(CommandLineApplication):
    """Application controller for RNAforester application"""
    # Not all parameters added!!
_parameters = { '-d':FlagParameter(Prefix='-',Name='d',Value=False), '-r':FlagParameter(Prefix='-',Name='r',Value=False), '-m':FlagParameter(Prefix='-',Name='m',Value=True), '-p':FlagParameter(Prefix='-',Name='p',Value=False)} _command1 = 'RNAshapes -C -c 20 -f' _command2 = 'RNAforester' _input_handler = '_input_as_string' def _input_as_lines(self,data): """ """ data = ' '.join([super(RNAforester,self)._input_as_lines(data),'-o f','|']) return data def _input_as_string(self,data): """Makes data the value of a specific parameter This method returns the empty string. The parameter will be printed automatically once set. """ data = ' '.join([data,'-o f','|']) return data def _error_on_missing_application(self,params): """ Raise an ApplicationNotFoundError if the app is not accessible """ if not app_path('RNAforester'): raise ApplicationNotFoundError,\ "Cannot find RNAforester. Is it installed? Is it in your path?" if not app_path('RNAshapes'): raise ApplicationNotFoundError,\ "Cannot find RNAshapes. Is it installed? Is it in your path?" #Override these functions to biuld up the command def __call__(self,data=None, remove_tmp=True): """Run the application with the specified args on data data: anything that can be cast into a string or written out to a file. Usually either a list of things or a single string or number. 
input_handler will be called on this data before it is passed as part of the command-line argument, so by creating your own input handlers you can customize what kind of data you want you application to accept """ input_handler = self.InputHandler suppress_stdout = self.SuppressStdout suppress_stderr = self.SuppressStderr if suppress_stdout: outfile = '/dev/null' else: outfile = self.getTmpFilename(self.WorkingDir) if suppress_stderr: errfile = '/dev/null' else: errfile = self.getTmpFilename(self.WorkingDir) if data is None: input_arg = '' else: input_arg = getattr(self,input_handler)(data) # Build up the command, consisting of a BaseCommand followed by # input and output (file) specifications first,second = self.BaseCommand command = self._command_delimiter.join(filter(None,\ [first,input_arg,second,'>',outfile,'2>',errfile])) #print 'COMMAND',command if self.HaltExec: raise AssertionError, "Halted exec with command:\n" + command # The return value of system is a 16-bit number containing the signal # number that killed the process, and then the exit status. 
# We only want to keep the exit status so do a right bitwise shift to # get rid of the signal number byte exit_status = system(command) >> 8 # Determine if error should be raised due to exit status of # appliciation if not self._accept_exit_status(exit_status): raise ApplicationError, \ 'Unacceptable application exit status: %s, command: %s'\ % (str(exit_status),command) # open the stdout and stderr if not being suppressed out = None if not suppress_stdout: out = open(outfile,"r") err = None if not suppress_stderr: err = open(errfile,"r") result = CommandLineAppResult(out,err,exit_status,\ result_paths=self._get_result_paths(data)) # Clean up the input file if one was created if remove_tmp: if self._input_filename: remove(self._input_filename) self._input_filename = None remove(''.join([self.WorkingDir, 'cluster.dot'])) remove(''.join([self.WorkingDir, 'test.out'])) remove(''.join([self.WorkingDir, 'ShapesStderr'])) return result def _get_base_command(self): """ Returns the full command string input_arg: the argument to the command which represents the input to the program, this will be a string, either representing input or a filename to get input from """ command_part1 = [] command_part2 = [] # Append a change directory to the beginning of the command to change # to self.WorkingDir before running the command cd_command = ''.join(['cd ',self.WorkingDir,';']) if self._command1 is None: raise ApplicationError, '_command has not been set.' 
parameters = self.Parameters command1 = self._command1 command2 = self._command2 command_part1.append(cd_command) command_part1.append(command1) command_part1.append(''.join(['2> ', self.WorkingDir, 'ShapesStderr'])) command_part2.append(command2) command_part2.append(self._command_delimiter.join(filter(\ None,(map(str,parameters.values()))))) return self._command_delimiter.join(command_part1).strip(),\ self._command_delimiter.join(command_part2).strip() BaseCommand = property(_get_base_command) PyCogent-1.5.3/cogent/app/rnashapes.py000644 000765 000024 00000012574 12024702176 020647 0ustar00jrideoutstaff000000 000000 #!/usr/bin/env python #file: RNAshaper.py """ Author: Shandy Wikman (ens01svn@cs.umu.se) Application controller for RNAshapes application Revision History: 2006 Shandy Wikman created file """ from cogent.app.util import CommandLineApplication,\ CommandLineAppResult, ResultPath from cogent.app.parameters import Parameter, FlagParameter, ValuedParameter,\ MixedParameter,Parameters, _find_synonym, is_not_None __author__ = "Daniel McDonald and Greg Caporaso" __copyright__ = "Copyright 2007-2012, The Cogent Project" __credits__ = ["Shandy Wikman"] __license__ = "GPL" __version__ = "1.5.3" __maintainer__ = "Shandy Wikman" __email__ = "ens01svn@cs.umu.se" __status__ = "Development" class RNAshapes(CommandLineApplication): """Application controller for RNAshapes application Options: -h Display this information -H
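The two-part command assembly in RNAforester's `__call__` and `_get_base_command` above, where RNAshapes output is piped into RNAforester, reduces to joining token lists around the input argument. The file names and the `-m` flag below are illustrative stand-ins:

```python
# Part 1 runs RNAshapes (after a cd; the stderr redirect is omitted
# here for brevity), part 2 runs RNAforester; the input handler appends
# '-o f |' so part 1 pipes into part 2. File names are made up.
first = 'cd /tmp; RNAshapes -C -c 20 -f'
input_arg = 'seqs.fasta -o f |'
second = 'RNAforester -m'
command = ' '.join(filter(None, [first, input_arg, second,
                                 '>', 'out.txt', '2>', 'err.txt']))
print(command)
```

Splitting `BaseCommand` into a `(first, second)` tuple is what lets the input argument, which names the RNAshapes input file and the pipe, be spliced between the two programs at call time.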